Philosophy of PCA
Introduced by Pearson (1901) and
Hotelling (1933) to describe the
variation in a set of multivariate data in
terms of a set of uncorrelated variables
We typically have a data matrix of n
observations on p correlated variables
x1, x2, ..., xp
PCA looks for a transformation of the xi
into p new variables yi that are
uncorrelated
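This decorrelating transformation can be sketched with numpy on synthetic data (the data below are illustrative, not the heart-rate table that follows): project the centred data onto the eigenvectors of its covariance matrix and the resulting components have zero covariance with each other.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: n = 200 observations on p = 3 correlated variables.
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]          # induce correlation with x1
X[:, 2] += 0.5 * X[:, 0]

Xc = X - X.mean(axis=0)           # centre each column
C = np.cov(Xc, rowvar=False)      # p x p covariance matrix

# Eigenvectors of C define the transformation x -> y.
eigvals, eigvecs = np.linalg.eigh(C)
Y = Xc @ eigvecs                  # the new variables y1, ..., yp

# The new variables are uncorrelated: cov(Y) is (numerically) diagonal.
C_Y = np.cov(Y, rowvar=False)
print(np.round(C_Y, 6))
```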
The data matrix

case  ht (x1)  wt (x2)  age (x3)  sbp (x4)  heart rate (x5)
1     175      1225     25        117       56
2     156      1050     31        122       63
...   ...      ...      ...       ...       ...
n     202      1350     58        154       67
Reduce dimension

Each component $y_i$ has variance $\lambda_i$, and the proportions of total variance explained sum to one:

$\frac{\lambda_1}{\sum_j \lambda_j} + \frac{\lambda_2}{\sum_j \lambda_j} + \cdots + \frac{\lambda_p}{\sum_j \lambda_j} = 1$

so we keep only the first few components that account for most of the total.
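A quick numerical check of this identity (on arbitrary synthetic data): the eigenvalues of the covariance matrix sum to its trace, i.e. to the total variance, so the explained-variance proportions sum to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))          # illustrative data, n = 100, p = 4
C = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(C)
props = eigvals / eigvals.sum()        # proportion of variance per component

# Eigenvalues sum to the total variance (trace of C),
# so the proportions sum to 1.
print(np.isclose(eigvals.sum(), np.trace(C)))  # True
print(np.isclose(props.sum(), 1.0))            # True
```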
One good criterion: choose the new variables to have maximum variance. The variances and covariances of the original variables are collected in the matrix

$C = \begin{pmatrix} v(x_1) & c(x_1,x_2) & \cdots & c(x_1,x_p) \\ c(x_1,x_2) & v(x_2) & \cdots & c(x_2,x_p) \\ \vdots & \vdots & \ddots & \vdots \\ c(x_1,x_p) & c(x_2,x_p) & \cdots & v(x_p) \end{pmatrix}$
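The matrix above is just the sample covariance matrix; a minimal sketch (with illustrative random data) shows it can be built directly from the centred data and matches numpy's `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))              # n = 50 cases, p = 3 variables

Xc = X - X.mean(axis=0)
# v(xi) on the diagonal, c(xi, xj) off the diagonal:
C_manual = Xc.T @ Xc / (X.shape[0] - 1)
C_numpy = np.cov(X, rowvar=False)

print(np.allclose(C_manual, C_numpy))     # True
```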
And so we find that
the direction of the first component y1 is given by the
eigenvector a1 corresponding to the
largest eigenvalue of the matrix C
The second vector, orthogonal
(uncorrelated) to the first, is the one
with the second-highest variance,
which turns out to be the eigenvector
corresponding to the second-largest
eigenvalue
And so on
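In code, this ordering is a small detail worth noting: `np.linalg.eigh` returns eigenvalues in ascending order, while PCA wants them descending, so the eigenpairs must be sorted (illustrative data again).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
C = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order; PCA wants descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# First column of eigvecs = direction of y1 (largest variance),
# second column = orthogonal direction of second-largest variance, etc.
print(np.round(eigvals, 3))
```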
So PCA gives, for the 2x2 case,

$C = \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}, \qquad C - \lambda I = \begin{pmatrix} 1-\lambda & c \\ c & 1-\lambda \end{pmatrix}$

$\det(C - \lambda I) = (1-\lambda)^2 - c^2 = 0$

With $A = (a_1, a_2)^T$, the eigenvector equation is

$CA = \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_1 + c\,a_2 \\ c\,a_1 + a_2 \end{pmatrix} = \lambda \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$

Solving, we find the eigenvalues $\lambda_1 = 1 + c$ and $\lambda_2 = 1 - c$, with eigenvectors $A_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$ and $A_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$
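The 2x2 derivation above is easy to verify numerically for any particular value of c (c = 0.6 below is an arbitrary illustration): the eigenvalues come out as 1 - c and 1 + c, and the leading eigenvector has equal components of magnitude 1/sqrt(2).

```python
import numpy as np

c = 0.6                            # illustrative correlation
C = np.array([[1.0, c],
              [c, 1.0]])

eigvals, eigvecs = np.linalg.eigh(C)   # ascending order: 1 - c, then 1 + c

# det(C - lambda*I) = (1 - lambda)^2 - c^2 = 0  =>  lambda = 1 -/+ c
print(np.allclose(np.sort(eigvals), [1 - c, 1 + c]))   # True

# Leading eigenvector (up to sign) is (1, 1)/sqrt(2):
v_big = eigvecs[:, 1]              # column for the larger eigenvalue 1 + c
print(np.allclose(np.abs(v_big), 1 / np.sqrt(2)))      # True
```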
PCA is sensitive to scale, so standardise each variable first:

$X_i^* = \frac{X_i - \bar{X}_i}{s_i}$

(subtract the mean and divide by the standard deviation $s_i$, not the variance)
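A short sketch of why this matters, using two made-up variables on very different scales: after standardisation every variable has unit variance, so C becomes the correlation matrix and no variable dominates the components by scale alone.

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative variables on very different scales
# (e.g. height in cm, weight in g).
X = np.column_stack([rng.normal(170, 10, 300),
                     rng.normal(70000, 9000, 300)])

# Standardise: subtract the mean, divide by the standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Each standardised variable has unit variance, so cov(Z) is the
# correlation matrix of the original data.
print(np.round(np.cov(Z, rowvar=False).diagonal(), 6))
```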