
Principal Component Analysis
Introduction:
PCA is a way of identifying patterns in data and of expressing the data
so as to highlight its similarities and differences.
Its aim is to reduce the dimensionality of an image vector while
retaining as much information as possible.
Once the patterns in the data are found, the data is compressed, i.e.
the number of dimensions is reduced, without losing much of the
information in the data.
Only the most dominant factors are retained.
Varieties of Samples:
Single sample, independent samples, and dependent samples.
Single-sample t: only one group; its mean is tested against a
hypothetical mean.
Independent-samples t: two means from two groups, with no relation
between the groups, e.g., each person is randomly assigned to only one
group.
Dependent-samples t: two means, where either the same people are in
both groups or the people are related, e.g., husband-wife, left
hand-right hand, hospital patient and visitor.
The t Distribution
We use t when the population variance is unknown (the usual case)
and the sample size is small (N < 100, also the usual case). If you use
a statistics package to test hypotheses about means, you will be using t.
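As a rough illustration of the single-sample case, the t statistic is the difference between the sample mean and the hypothetical mean, divided by the estimated standard error of the mean. The sketch below assumes made-up scores and a hypothetical mean of 5.0; the function name `one_sample_t` is introduced here only for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a single-sample t test against mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)          # estimated standard error of the mean
    return (mean - mu0) / se, n - 1

# Illustrative data only; not taken from the source.
scores = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
t, df = one_sample_t(scores, mu0=5.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A dependent-samples t can be computed the same way by passing the pairwise differences as `sample` and using `mu0=0`.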
PCA steps: transform the data matrix into a matrix with fewer columns:
Center the data (subtract the mean of each variable).
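Calculate the covariance matrix: $C = \frac{1}{N-1} X^{T} X$, with entries
$C_{i,j} = \frac{1}{N-1} \sum_{q=1}^{N} X_{q,i} \, X_{q,j}$,
where $X$ is the centered data matrix (one row per sample) and $N$ is the number of samples.
$C_{i,i}$ (diagonal) is the variance of variable i.
$C_{i,j}$ (off-diagonal) is the covariance between variables i and j.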
Calculate the eigenvectors of the covariance matrix (orthonormal)
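The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the helper names (`center`, `covariance`, `top_eigenpair`, `pca`) and the toy data are assumptions introduced here, and power iteration with deflation stands in for a library eigensolver only to keep the sketch free of dependencies.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """C[i][j] = 1/(N-1) * sum over samples q of X[q][i] * X[q][j]."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(c, iters=1000):
    """Power iteration: repeatedly multiply by C and renormalize.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    d = len(c)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v

def pca(rows, k):
    """Return the k largest (eigenvalue, unit eigenvector) pairs of the covariance matrix."""
    c = covariance(center(rows))
    d = len(c)
    pairs = []
    for _ in range(k):
        lam, v = top_eigenpair(c)
        pairs.append((lam, v))
        # Deflation: subtract the component just found so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

# Illustrative two-variable data set (one row per sample).
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
components = pca(data, k=2)
for lam, v in components:
    print(f"variance {lam:.4f} along direction {v}")
```

In practice a numerical library's symmetric eigensolver would normally replace `top_eigenpair`; the structure of the three steps stays the same.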
Principal Components
All principal components (PCs) start at the origin of the coordinate axes.
The first PC is the direction of maximum variance from the origin.
Subsequent PCs are orthogonal to the first PC and to each other, and
each describes the maximum residual variance.
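To make the "direction of maximum variance" idea concrete, the sketch below scans directions through the origin of centered two-dimensional data and keeps the one with the largest projected variance. It is a brute-force illustration under assumed toy data, not how PCA is computed in practice; the names `centered` and `projected_variance` are introduced here for the example.

```python
import math

def centered(points):
    """Shift 2-D points so that their mean sits at the origin."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

def projected_variance(points, angle_deg):
    """Sample variance of centered points projected onto a direction through the origin."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    scores = [x * ux + y * uy for x, y in points]
    # The points are centered, so the scores already have mean zero.
    return sum(s * s for s in scores) / (len(points) - 1)

# Illustrative data only.
toy = centered([(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
                (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)])

# Scan directions from 0 to 180 degrees in steps of 0.1 degree.
angles = [i / 10 for i in range(1800)]
pc1_angle = max(angles, key=lambda a: projected_variance(toy, a))
pc2_angle = (pc1_angle + 90.0) % 180.0   # in 2-D, the only orthogonal direction

var_pc1 = projected_variance(toy, pc1_angle)
var_pc2 = projected_variance(toy, pc2_angle)
print(f"PC1 at {pc1_angle:.1f} degrees, variance {var_pc1:.4f}")
print(f"PC2 at {pc2_angle:.1f} degrees, variance {var_pc2:.4f}")
```

In two dimensions the second PC is fixed once the first is chosen; the two variances add up to the total variance of the data.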
