
Face Recognition:
A Comparative Analysis of PCA, LDA, PCA+LDA, and SVM

Mukul Chandail, Rahul Chandail

ECE657A, University of Waterloo,
Waterloo, Canada
{mchandai, rchandai}@uwaterloo.ca

Abstract. In this project we provide a comparative analysis of the face recognition algorithms PCA, LDA, PCA+LDA, and SVM. The project focuses on the identification aspect of face recognition. A standard dataset is used so that all of the above algorithms can be applied to the same data. The evaluation criterion chosen for this project is recognition accuracy, calculated for varying training and testing percentages of the original data. Based on this comparison we draw conclusions on the relative accuracy of the algorithms.

1 Introduction
Face recognition can be generally categorized into two types [19]:
Verification: Verification is the process of determining whether a given face matches a single pre-selected face. In other words, it can be thought of as one-to-one matching, for example, the login prompt on a personal phone or laptop.
Identification: Identification is the process of determining whether a given face matches any of the pre-stored faces in a dataset of different faces. In other words, it can be thought of as one-to-many matching, for example, the face identification systems used by the police or other public/private agencies to identify a given face.
For the purpose of this project we focus on the identification aspect of face recognition. Once training is finished for each individual algorithm, the testing set is fed as input to our system, and as output we get the number of successful classifications or identifications that have been performed.
The field of face recognition has always been one of the most studied areas of computer vision, especially in the field of pattern recognition. A number of different algorithms have been developed for face recognition, and there is a continuous effort to determine the best technique. As new algorithms are developed, they are frequently compared to existing ones. With such a vast diversity of recognition methodologies, a comparison between a few selected methods is important, yet often unavailable.

The different approaches can be roughly classified into categories such as eigenvector-based, feature-based, and statistical-model-based approaches [1]. A brief description of each approach follows:
Eigenvector based: depends upon the calculation of eigenfaces. Examples of this approach are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).
Feature based: the focus is primarily on local feature extraction, using features like the ears, nose, eyes, etc. An example is SIFT.
Statistical model based: this approach makes use of classifiers that are primarily statistical. An example of this approach is the Support Vector Machine.
For the purpose of this project we decided to focus on one very popular approach and one less popular approach. From the popular approach, i.e. the eigenface-based approach, we picked PCA and LDA. Feature-based and statistical-model-based approaches share similar popularity; however, we decided to go with SVM since it was a topic covered at the same time in the course lectures. Hence, the main aim of this project is to provide a comparative review of the accuracy results obtained by comparing Support Vector Machines, Principal Component Analysis, Linear Discriminant Analysis, and a hybrid combination of PCA and LDA [14]. For this purpose, the Yale Face Database was initially selected. However, in order to facilitate the implementation of LDA, we switched to the AT&T face database [13]. Comparison criteria are hard to establish or investigate at our current level. For example, in order to investigate the effect of environmental conditions on algorithm performance, an independent dataset with a varying but consistent setup would need to be created. Creating such a dataset is a non-trivial task, as it requires expertise. Through this project we therefore focus on a single analysis criterion: the change in accuracy of each algorithm as the training and testing data, as ratios of the original dataset, are varied.

2 Literature Review
Though the literature review covered multiple algorithms, methodologies, and their applications to face recognition, this section only covers the literature that directly corresponds to the methodologies used in this project. Also, for each algorithm multiple research papers were investigated; however, given the limited space and scope of the project, the section below only covers the primary research paper for each method.
2.1 Principal Component Analysis
Though different research papers were investigated for the development and implementation of the Principal Component Analysis algorithm, the primary focus was on the work done in [2]. The main motive of that paper was the development of an alternative method of face recognition, since conventional methods were limited by the use of impoverished face models and feature descriptions. It provided an unconventional approach based on the use of eigenvectors and eigenvalues in order to maximize the differences between multiple faces and their evaluations. The approach was shown to be quick, relatively simple, and to work well in a restricted environment. The experiments carried out demonstrated that the eigenface technique can be made to perform at very high accuracy; accuracy as high as 96% was achieved.
2.2 Linear Discriminant Analysis
The primary paper used for the development and implementation of the Linear Discriminant Analysis algorithm was the work done in [4]. This research primarily focused on the application of Linear Discriminant Analysis in the face recognition domain in order to maximize the between-class covariance while simultaneously minimizing the within-class covariance. The implementation aimed to investigate the discriminatory capability of various facial features. It makes use of feature extraction in which eigenfaces are the most distinct feature vectors. The paper then provides a comparison with the results obtained from the application of PCA on the same dataset. A number of results for face recognition and gender classification were also produced, which provide high accuracy with a small feature set; accuracy in the range of 90% was achieved in testing.
2.3 Hybrid Combination: PCA + LDA
[10] provides an analysis of the implementation of PCA + LDA. Both PCA and LDA provide decent results individually. However, PCA as a primary feature extractor is considered an insufficient discriminator: while performing dimension reduction or feature extraction during basis vector calculation, PCA tends to retain unwanted or redundant information, which clouds its discrimination capability. LDA, on the other hand, makes use of class information to maximize the ratio of between-class scatter to within-class scatter. However, LDA falters when it has to deal with high-dimensional data, and it fails completely when the dimension of the data is greater than the number of input samples. Hence in this paper the authors investigate face recognition performance using the combination of PCA + LDA. Inferences were drawn that a larger number of training samples allows fuller learning and leads to a higher recognition rate, and that the recognition rate increases with the feature vector dimension. It was concluded that, compared to standalone PCA, the PCA + LDA combination improved the recognition rate.
2.4 Support Vector Machine
The paper [7] presented face recognition techniques using linear support vector machines. It focuses on evaluating the performance of a component-based technique and two global techniques for face recognition with respect to robustness against facial pose changes. In the global system, a face detector was used to detect the whole face and extract it from the image; the extracted face was then used as input to the classifiers. The component-based system performed a similar detection and extraction, but extracted a set of 10 facial components and arranged them in a single feature vector. These feature vectors were then used for classification by linear SVMs. Among the two global systems, the first approach consisted of a single SVM for each person in the database. In the second approach, the images of each person were clustered and then used to train a set of view-specific SVM classifiers. These systems were tested on test sets containing faces rotated in depth up to about 40°. Based on the obtained results, the component-based system outperformed the global systems, leading to the conclusion that using facial components instead of the whole face pattern as input features significantly simplifies the task of face recognition.

2.5 Comparison
For the purpose of comparative analysis, a number of different research papers were briefly investigated in order to determine the best criteria for comparison. Due to the inclusion of SVM, which is a non-eigenvector-based approach, no research paper was found that specifically compares SVM with eigenvector-based techniques. Hence the accuracy rate with varying training and test set ratios was chosen as the primary comparison criterion. One of the research papers investigated was [11]: the main issues regarding statistical assessment of face recognition were addressed by evaluating the algorithms PCA, ICA1, ICA2, and LDA in different implementations within an identification framework. It was concluded that the exact choice of images placed in the gallery or in the probe set has a great effect on recognition results. Comparative recognition results between the said algorithms were obtained with a fixed gallery and permuted probe set, and with a fixed probe set and fixed gallery.
References for other research papers can be found in the References section of this report.

3 Description of the Methods Selected


Following are the methods that were selected for this project. Each method section starts with a brief description of the algorithm, followed by its implementation details.
The initialized dataset comprises 40*10*26 = 10400 images (for more information about the dataset, please refer to the Dataset section). The following description is based on the initial tests, where 100% of the original data was used for training and 30% of the original data was used for testing. Tests and test setups with varying training and testing ratios are described in the Testing section of this report.

3.1 PCA application


The aim of Principal Component Analysis is to maximize the variance of the data regardless of class information. It is a typical method for approximating the original data in a lower-dimensional subspace. The generic approach for using PCA for feature extraction is as follows: first, samples of n dimensions are given to PCA as input. As a result, PCA provides a subspace composed of orthogonal basis vectors. These basis vectors are then used to map the original n-dimensional input vectors to the new m-dimensional subspace, where m <= n. These are our new feature vectors with reduced dimensions. Finally, when a new feature vector is introduced for testing, it is first mapped to the reduced m-dimensional subspace and then classified using the chosen classifier [2].
Following are the steps that were followed to implement/train PCA on our selected dataset:
First we store all images in a matrix X. Each column of the matrix stores the column-wise grey values of one image (92*112 = 10304), and the number of columns equals the number of images (10400):
X = [10304 x 10400]
Next we centralize the data by subtracting the column-wise mean of the matrix from each column, and store the new matrix as
centX = X - mean(X)
This centralized data is then used to calculate the covariance matrix:
covX = centX * centX^T
Next, the eigenvectors V and eigenvalues D of covX are calculated. Based on the m highest eigenvalues, the corresponding m eigenvectors are selected as the orthogonal basis of the new subspace:
W = V(:, 1:m)
Finally, the original centralized data is projected onto the new subspace to obtain the dimension-reduced feature vectors:
Y = W^T * centX
Now the algorithm has been trained with our initial training dataset.
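For illustration, the steps above can be sketched in MATLAB roughly as follows. This is a minimal sketch rather than the exact project code; the variable names follow the notation above, and the number of retained components m = 100 is an assumed value.

% Minimal PCA training sketch (assumes X is the 10304 x 10400 training matrix)
m = 100;                              % assumed number of retained components
meanX = mean(X, 2);                   % column-wise mean (the "mean face")
centX = X - meanX;                    % centralize the data (implicit expansion)
covX  = centX * centX';               % covariance/scatter matrix
[V, D] = eig(covX);                   % eigenvectors V, eigenvalues on diag(D)
[~, idx] = sort(diag(D), 'descend');
W = V(:, idx(1:m));                   % m eigenvectors with the largest eigenvalues
Y = W' * centX;                       % reduced feature vectors, m x 10400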
3.2 LDA application
The primary aim of Linear Discriminant Analysis is to maximize class discrimination
capability. It achieves this by maximizing the between class scatter matrix and minimizing the within class scatter matrix. [4]

Following are the steps that were followed to implement/train LDA onto our selected dataset:
Firstly, we store the original image training dataset in a matrix X and then split the data into the individual class matrices X1, X2, X3, ..., X40.
Next we calculate the overall column-wise mean μ and the means of the individual classes μ1, μ2, μ3, ..., μ40.
With the above information we then calculate the within-class scatter matrix and the between-class scatter matrix.
The within-class scatter for each class is calculated as:
SW1 = Σ (x - μ1)(x - μ1)^T, summed over all samples x in class 1
Similarly, SW2, SW3, SW4, ..., SW40 are calculated.
The overall within-class scatter matrix is then:
SW = SW1 + SW2 + SW3 + ... + SW40
The between-class scatter for each class is calculated as:
SB1 = N1 (μ1 - μ)(μ1 - μ)^T, where N1 = 26*10 = 260 is the number of samples in class 1
Similarly, SB2, SB3, SB4, ..., SB40 are calculated.
SB = SB1 + SB2 + SB3 + ... + SB40
Finally, based on the between-class and within-class scatter matrices, the Fisher matrix is calculated:
FM = SW^-1 * SB
NOTE: It is crucial to ensure that the column dimension of our original matrix is less than the number of image samples present. If this is not the case, SW will be a singular matrix and LDA will not be able to proceed; LDA fails at this point.
Next, the eigenvectors V and eigenvalues D of FM are calculated. Based on the m highest eigenvalues, the corresponding m eigenvectors are selected as the orthogonal basis of the new subspace:
W = V(:, 1:m)
Finally, the original training data is projected onto the new subspace to obtain the dimension-reduced feature vectors:
Y = W^T * X
Now the algorithm has been trained with our initial training dataset.
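A rough MATLAB sketch of the scatter-matrix computation is shown below. It is illustrative only and assumes the training images of class i have already been gathered into a cell array entry Xc{i} (a d x Ni matrix); the choice m = 39 reflects the fact that SB has rank at most C-1 for C = 40 classes.

% Minimal LDA training sketch (assumes Xc{i} holds the d x Ni samples of class i)
C  = 40;  d = size(Xc{1}, 1);
mu = mean([Xc{:}], 2);                         % overall mean
SW = zeros(d);  SB = zeros(d);
for i = 1:C
    mui = mean(Xc{i}, 2);                      % class mean
    Di  = Xc{i} - mui;                         % centred class samples
    SW  = SW + Di * Di';                       % within-class scatter SWi
    Ni  = size(Xc{i}, 2);                      % samples in class i (26*10 = 260)
    SB  = SB + Ni * (mui - mu) * (mui - mu)';  % between-class scatter SBi
end
FM = SW \ SB;                                  % Fisher matrix, SW^-1 * SB
[V, D] = eig(FM);
[~, idx] = sort(real(diag(D)), 'descend');
m = 39;                                        % at most C-1 useful directions
W = real(V(:, idx(1:m)));                      % projection matrix
Y = W' * [Xc{:}];                              % reduced feature vectors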
3.3 PCA+LDA application
PCA + LDA aims to improve on the individual performances of PCA and LDA, and it also mitigates the shortcomings of LDA. A major drawback of LDA is that if the dimension of each vector/column of the dataset is greater than the number of samples/images, LDA fails completely. Hence, in order to remove this limitation of LDA, PCA + LDA is utilized. The original dataset is first subjected to PCA for initial dimension reduction and variance maximization. The new data is LDA-compliant, which ensures that LDA does not fail. This reduced-dimension feature vector subspace is then subjected to LDA for further dimension reduction. [10]
Initial data: X = [10304 x 10400]
This data is subjected to PCA reduction:
yX = W1^T * X, with yX = [m1 x 10400], where m1 << 10304
Finally, this reduced data is subjected to LDA:
Y = W2^T * yX, with Y = [m2 x 10400], where m2 << m1 << 10304
Now the system has been trained and is ready for testing.
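The two-stage projection can be sketched as follows. W1 is the PCA projection matrix of Section 3.1, and W2 is the LDA projection matrix obtained by running the steps of Section 3.2 on the PCA-reduced data; both are assumed to have been computed already.

% PCA + LDA projection sketch
yX = W1' * (X - mean(X, 2));   % PCA step: m1 x 10400, with m1 < number of samples
% ... the LDA steps of Section 3.2 are run on yX to obtain W2 ...
Y  = W2' * yX;                 % LDA step: final m2 x 10400 feature vectors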
3.4 SVM application
An SVM is a type of decision-boundary classifier. It performs pattern recognition between two classes by finding a decision surface (hyperplane) induced from the available samples. This hyperplane has maximum distance to the closest points in the training set, which are termed support vectors. The main idea of SVM is to produce a classifier that will work well on unseen examples, i.e., one that generalizes well. [18]
Consider the problem of separating a set of training vectors s = {x1, x2, ..., xn}, where each point xi, i = 1, 2, ..., n, belongs to one of two classes identified by the label yi ∈ {-1, 1}. Assuming that the data is linearly separable, a hyperplane of the form w·x + b = 0 will separate the two classes such that the distance to the support vectors is maximized. This hyperplane is called the optimal separating hyperplane (OSH).
Multi-class separation: Since SVM is a binary classifier, a multi-class pattern recognition system can be obtained by combining multiple SVMs. There are two basic strategies for solving multi-class problems with SVMs:
One-vs-all approach: In this approach, for q classes, q SVMs are trained. Each SVM separates a single class from all remaining classes.
One-vs-one approach: This is a pairwise approach that classifies between each pair of classes. For q class labels, q(q-1)/2 SVMs are trained, each separating one pair of classes.
Of the two approaches, the one-vs-one approach is preferable for training, as the one-vs-all approach often leads to ambiguous classification results. The run-time complexities of the two methods are similar: the one-vs-all approach requires the evaluation of q SVMs, whereas the pairwise approach requires the evaluation of q-1 SVMs.
Since the number of classes for face recognition is relatively small, the one-vs-one strategy was preferred over the one-vs-all approach. For this project the AT&T face database was used. The database contains greyscale face images of 40 individuals, with 10 face images per individual in different orientations, which was extended to 260 face images per individual. 80% of the database was used as the training set and 20% as the testing set. From the training set, class labels were created, with one class label for each individual, giving 40 class labels in total. From the training set, HOG features were extracted. HOG is short for Histogram of Oriented Gradients; a HOG feature extractor decomposes an image into small square cells and computes a histogram of oriented gradients in each cell.
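As an illustration, the HOG feature extraction might look like the following in MATLAB. This is a sketch assuming the Computer Vision Toolbox; trainImgs and trainLabels are assumed variable names holding the greyscale training images and their subject IDs.

% Sketch: HOG feature extraction for the SVM pipeline
numTrain  = numel(trainImgs);                        % cell array of 92x112 images
hogProbe  = extractHOGFeatures(trainImgs{1});        % probe the HOG vector length
trainFeat = zeros(numTrain, numel(hogProbe));
for k = 1:numTrain
    trainFeat(k, :) = extractHOGFeatures(trainImgs{k});   % one feature row per image
end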

4 Implementation Tools and Dataset description

4.1 MATLAB
MATLAB was the tool of choice for this project. It has a large set of ready-made functions and toolboxes that aid in the implementation of the above algorithms.
4.2 Dataset Description
For the purpose of comparative analysis between multiple algorithms/techniques, it is crucial to use a standard dataset. Since building an image dataset for face recognition involves a lot of time and expertise, we initially decided to use the Yale Face Database, which provides a decent diversity of individuals with different facial expressions and environmental conditions. This database consists of images of size 243x320 for 15 individuals with 11 different poses each. We started off with this dataset but soon realized that its dimensions were too high for LDA: upon storage, the matrix X formed was of dimension 77760x165. As explained above, such a matrix forces LDA to fail. In order to avoid LDA failure, we tried to extrapolate the image dataset by adding mirror images dusted with random noise. However, this proved to be cumbersome, since for each image we would have needed to add 475 additional images with varying noise. [14]
Hence we decided to shift from the Yale Face Database to the AT&T image dataset. The size of each image is 92x112 pixels, with 256 grey levels per pixel. The images are of 40 different individuals with 10 different poses per face. Upon storage this leads to an X matrix of dimension 10304x400. This is still not LDA-compliant and LDA will fail. Hence, to rectify this, 25 additional images were added for each original image by dusting its mirror image with different levels and variations of noise, which is far easier to do for 25 images than for 475. The new dataset is therefore of dimension 10304x(400x26), or 10304x10400; this forms our extended dataset. For the varying training and testing ratios, similar methods are adopted to keep the dataset LDA-compliant (a sketch of this augmentation is given below). [13]
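A sketch of this mirror-and-noise augmentation is shown below. It is illustrative only: the Gaussian noise variances are assumed values, and imnoise is from the Image Processing Toolbox.

% Sketch: generate 25 noisy mirror copies of one greyscale face image I
copies = cell(1, 25);
M = fliplr(I);                                   % mirrored face
for k = 1:25
    v = 0.001 * k;                               % assumed noise variance sweep
    copies{k} = imnoise(M, 'gaussian', 0, v);    % "dust" with Gaussian noise
end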

5 Testing: Test cases on the selected datasets and evaluation of performance
PCA: The testing data is stored in a matrix, with each column representing one testing image and the number of columns representing the number of test samples. This data is first centralized by subtracting the mean of the training matrix from each column vector. Next, the centralized data is projected onto the trained eigen-subspace in order to obtain the required weight vectors. Finally, a Euclidean classifier, which computes the Euclidean distance between the average class weight and the test input weights, is used to obtain classification results.
LDA: Similar to PCA the LDA testing data is stored in a matrix. This data is then
projected onto the trained eigen subspace. Finally, a Euclidean classifier is used to
obtain classification results.
PCA+LDA: For PCA + LDA the testing data is subjected to dimension reduction
twice. First it is projected onto the PCA reduced eigen subspace for initial weight
calculation. Next this PCA projected testing data is projected onto the LDA reduced
subspace in order to calculate the final weight vectors.
Based on the above procedure, we end up with reduced-dimension feature vectors (weight vectors) for each testing image. For testing, we calculate the Euclidean distance of each test weight vector from the average weight vector of each class in the training dataset. The minimum distance obtained for an individual test weight vector gives the class of that testing image. A threshold distance is set to prevent false classifications: if all calculated distances are above the threshold, the test vector is labelled as not classified.
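A minimal sketch of this Euclidean (nearest class mean) classifier follows, using assumed variable names: classMeans holds one average training weight vector per class, wTest is the projected test image, and thresh is the chosen threshold.

% Sketch: Euclidean nearest-class-mean classification with rejection threshold
dists = vecnorm(classMeans - wTest);     % distance to each of the 40 class means
[dmin, predictedClass] = min(dists);
if dmin > thresh
    predictedClass = 0;                  % above threshold: mark as not classified
end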
SVM: The training samples with extracted features were passed to a MATLAB ECOC classifier. An error-correcting output codes (ECOC) multiclass model uses a one-vs-one coding design with support vector machine (SVM) binary learners. After successful training, the test samples are classified using the MATLAB predict function with the trained ECOC classifier.
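A sketch of the training and prediction calls is shown below (Statistics and Machine Learning Toolbox). trainFeat and testFeat are the HOG feature matrices from Section 3.4, and trainLabels and testLabels are assumed numeric vectors of subject IDs.

% Sketch: ECOC model with linear SVM binary learners (one-vs-one by default)
svmTpl    = templateSVM('KernelFunction', 'linear');
ecocModel = fitcecoc(trainFeat, trainLabels, 'Learners', svmTpl);
predicted = predict(ecocModel, testFeat);            % predicted class per test image
accuracy  = mean(predicted == testLabels) * 100;     % recognition accuracy in percent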
The data is split into different training and testing sets for the purpose of testing. In order to support testing on LDA, the images in the dataset are repeated so as to preserve consistency of the dataset with the other algorithms (PCA, PCA+LDA, SVM).
For 100% training and 30% testing: for the initial testing and verification, the dataset of size 10304x10400 was split such that 100% of the dataset was used for training and 30% of the same dataset was used for testing. About 4% of the testing samples were dusted with random noise in order to introduce variation in the testing results.
The dataset was split into the following ratios (no added noise) for testing purpose:
90% Training and 10% Testing
80% Training and 20% Testing
70% Training and 30% Testing
60% Training and 40% Testing

6 Discussion of Results and Conclusion


Accuracy Table (recognition accuracy in %, per Train-Test split)

                100%-30%   90%-10%   80%-20%   70%-30%   60%-40%
PCA                95         95        91        87        85
LDA                96         95        89        80        73
PCA+LDA            99         99        95        91        90
SVM                97         96        90        86        84

The above results were obtained by varying the training and testing set percentages. A general declining trend in accuracy was observed for all algorithms as the training set size/ratio was decreased with a simultaneous increase in the testing set size. This is expected: with a decreasing training set, the algorithms are less and less trained for the new image or face sets. This is further obscured by the noise added to the sample images. Regardless of this, it is quite evident that PCA and SVM provided fairly consistent results. The best results were provided by the hybrid combination of PCA and LDA, which is again expected, as PCA+LDA enhances the discrimination capability of PCA and eliminates LDA's shortcomings that arise from insufficient data dimensions.
An interesting trend to notice is that of LDA. With a high number of training samples, LDA initially performed on par with PCA or even better. However, as the number of training samples was reduced, the accuracy of LDA fell drastically relative to the other algorithms. This decline in performance can be attributed to LDA's inability to handle few training samples: LDA's success essentially depends on the number of training samples being equal to or greater than the dimension of each feature vector. As the number of training samples declined, the training samples were duplicated to help LDA cope; this was done to preserve the fairness of the dataset. However, we believe this also skewed the discrimination capability of LDA, resulting in a steeper fall in accuracy.
All in all, the algorithms provided decent recognition accuracy. A good next step would be to test PCA, PCA+LDA, and SVM on a less constrained yet more diverse dataset in order to maximize the recognition capability. This could be further extended by investigating more robust classifiers. Upon further research we found that combinations such as PCA+SVM, LDA+SVM, and PCA+LDA+SVM produce good results; these will be investigated further, but are out of scope for the current project.

References
1. Parmar, Divyarajsinh N., and Brijesh B. Mehta, "Face Recognition Methods & Applications," arXiv preprint arXiv:1403.0485, 2014.
2. M. Turk, A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, pp. 71-86.
3. H. Moon, P.J. Phillips, "Computational and Performance Aspects of PCA-based Face Recognition Algorithms," Perception, Vol. 30, 2001, pp. 303-321.
4. K. Etemad, R. Chellappa, "Discriminant Analysis for Recognition of Human Face Images," Journal of the Optical Society of America A, Vol. 14, No. 8, August 1997, pp. 1724-1733.
5. P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," Proc. of the 4th European Conference on Computer Vision (ECCV'96), 15-18 April 1996, Cambridge, UK, pp. 45-58.
6. W. Zhao, A. Krishnaswamy, R. Chellappa, D.L. Swets, J. Weng, "Discriminant Analysis of Principal Components for Face Recognition," in Face Recognition: From Theory to Applications, H. Wechsler, P.J. Phillips, V. Bruce, F.F. Soulie, and T.S. Huang, eds., Springer-Verlag, Berlin, 1998, pp. 73-85.
7. B. Heisele, P. Ho, T. Poggio, "Face Recognition with Support Vector Machines: Global versus Component-Based Approach," Proc. of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vol. 2, IEEE, 2001.
8. G. Guo, S.Z. Li, K. Chan, "Face Recognition by Support Vector Machines," Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, 26-30 March 2000, Grenoble, France, pp. 196-201.
9. L.H. Chan, S.H. Salleh, C.M. Ting, A.K. Ariff, "Face Identification and Verification Using PCA and LDA," International Symposium on Information Technology (ITSim 2008), Kuala Lumpur, Malaysia, 2008, pp. 1-6.
10. J. Li, B. Zhao, H. Zhang, "Face Recognition Based on PCA and LDA Combination Feature Extraction," 1st International Conference on Information Science and Engineering (ICISE), IEEE, 2009.
11. K. Delac, M. Grgic, S. Grgic, "Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set," International Journal of Imaging Systems and Technology, Vol. 15, No. 5, 2005, pp. 252-260.
12. P.J. Phillips, Support Vector Machines Applied to Face Recognition, Vol. 285, US Department of Commerce, Technology Administration, NIST, 1998.
13. AT&T face database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
14. UCSD Computer Vision, Yale Face Database, http://vision.ucsd.edu/content/yale-facedatabase
15. A.V. Nefian, M.H. Hayes III, "Hidden Markov Models for Face Recognition," Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), Vol. 5, 12-15 May 1998, Seattle, Washington, USA, pp. 2721-2724.
16. T.F. Cootes, C.J. Taylor, "Statistical Models of Appearance for Computer Vision," Technical Report, University of Manchester, 125 pages.
17. A. Singh, S.K. Singh, S. Tiwari, "Comparison of Face Recognition Algorithms on Dummy Faces," The International Journal of Multimedia & Its Applications (IJMA), Vol. 4, No. 4, August 2012.
18. MathWorks, Face Recognition with MATLAB, http://www.mathworks.com/videos/face-recognition-with-matlab-100902.html
19. L.H. Chan, S.H. Salleh, C.M. Ting, A.K. Ariff, "Face Identification and Verification Using PCA and LDA," International Symposium on Information Technology (ITSim 2008), Kuala Lumpur, Malaysia, 2008, pp. 1-6.
