3.1 Objective
Data X = {xij ; i=1,...,n, j=1,...,p} = {xi ; i=1,...,n}
The objective is to find a linear transformation X → Y such that
the 1st component of Y is the most interesting,
the 2nd is the 2nd most interesting,
the 3rd ....................................... etc.
i.e. we want to choose a new coordinate system so that the data, when
referred to this new system, Y, are such that
the 1st component contains most information,
the 2nd component contains the next most information, ... etc.
& with luck, the first few (2, 3 or 4 say) components contain nearly
all the information in the data & the remaining p − 2, 3 or 4 contain relatively
little information and can be discarded (i.e. the statistical analysis can
be concentrated on just the first few components; multivariate analysis
is much easier in 2 or 3 dimensions!)
Notes
A linear transformation X → Y is given by Y = XA where A is a p×p
matrix; it makes statistical sense to restrict attention to non-singular
matrices A. If A happens to be an orthogonal matrix, i.e. A′A = Ip (Ip the
p×p identity matrix), then the transformation X → Y is an orthogonal
transformation (i.e. just a rotation and/or a reflection of the n points in p-
dimensional space).
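The two defining properties above can be checked numerically. The sketch below (not from the notes; the angle and data are arbitrary choices for illustration) builds a 2×2 rotation matrix A and verifies that A′A = Ip and that inter-point distances are unchanged by Y = XA:

```r
set.seed(1)
theta <- pi / 6                                  # arbitrary rotation angle
A <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), 2, 2)    # 2x2 rotation matrix
X <- matrix(rnorm(20), ncol = 2)                 # n = 10 points in p = 2 dimensions
Y <- X %*% A                                     # the transformation X -> Y

all.equal(t(A) %*% A, diag(2))                   # A'A = I_p, so A is orthogonal
all.equal(c(dist(X)), c(dist(Y)))                # distances between points preserved
```

Both checks return TRUE for any orthogonal A, which is exactly why an orthogonal transformation is "just a rotation and/or a reflection".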
3.2 Specific
The basic idea is to find a set of orthogonal coordinates such that the
sample variances of the data with respect to these coordinates are in
decreasing order of magnitude, i.e. the projection of the points onto the
1st principal component has maximal variance among all such linear
projections, the projection onto the 2nd has maximal variance subject to
orthogonality with the first, projection onto the 3rd has maximal variance
subject to orthogonality with the first two, ..... etc.
Note
most interesting ≡ most information ≡ maximum variance
Theorem
This objective can be achieved by an eigenanalysis of the variance
matrix S of the data, i.e. the matrix of all two-way covariances between
the variables and variances along the diagonal.
i.e. if we measure the total variation in the original data as the sum of
the variances of the original components,
i.e. s11 + s22 + ... + spp = tr(S),
then the total variation of Y is λ1 + λ2 + ... + λp,
and Σλi = tr(S) (since the λi are the eigenvalues of S).
So we can interpret
λ1/Σλi ( = λ1/tr(S) )
as the proportion of the total variation explained by the first principal
component, and
(λ1 + λ2)/Σλi
as the proportion of the total variation explained by the first two
principal components, ...... etc.
If the first few p.c.s explain most of the variation in the data, then the
later p.c.s are redundant and little information is lost if they are
discarded (or ignored).
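These proportions can be computed directly from an eigenanalysis of S. A minimal sketch, using the four iris measurements as stand-in data (an assumption; the notes' own examples use the `ir` and `mtcars` data):

```r
X <- as.matrix(iris[, 1:4])
S <- var(X)                         # sample variance (covariance) matrix
lambda <- eigen(S)$values           # lambda_1 >= lambda_2 >= ... >= lambda_p

sum(lambda)                         # equals tr(S) = s11 + s22 + ... + spp
lambda[1] / sum(lambda)             # proportion explained by 1st p.c. (about 0.92 here)
cumsum(lambda) / sum(lambda)        # proportion explained by first k p.c.s, k = 1,...,p
```

Here a single component already explains over 90% of the total variation, so the later p.c.s could safely be discarded.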
If e.g. (λ1 + ... + λk)/(λ1 + ... + λp) is large,
say 80+%, then the (k+1)th, ..., pth components contain
relatively little information and can be discarded.
3.3 Scree-plot
If p is large, an informal way of choosing a good k is graphically, with a
scree-plot, i.e. plot (λ1 + ... + λj)/(λ1 + ... + λp) vs j. The graph is
necessarily monotonic increasing and concave (since the λj are in
decreasing order). Typically it will increase steeply for the first few
values of j (i.e. the first few p.c.s) and then begin to level off. The point
where it starts levelling off is the point where bringing in more p.c.s
brings less returns in terms of variance explained.
[Figure: scree-plot of (λ1 + ... + λj)/(λ1 + ... + λp) against j = 0, 1, 2, 3, 4, ..., p;
at the kink or elbow the graph suddenly flattens, so here take k = 3.]
Some formal tests of H: λp = 0 are available (but not used much), as are
expressions for confidence intervals for the λi.
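A scree-plot of the kind described above can be drawn in a few lines. A sketch, again using the iris measurements as stand-in data (an assumption):

```r
# eigenvalues of the variance matrix, largest first
lambda <- eigen(var(as.matrix(iris[, 1:4])))$values

# cumulative proportion of total variation explained by the first j p.c.s
cumprop <- cumsum(lambda) / sum(lambda)

plot(seq_along(cumprop), cumprop, type = "b",
     xlab = "j (number of p.c.s)",
     ylab = "cumulative proportion of variation")
# look for the elbow where the (increasing, concave) curve flattens
```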
> library(mva)
> ir.pca<-princomp(ir)
> ir.pca
Call:
princomp(x = ir)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4
2.0494032 0.4909714 0.2787259 0.1538707
[Figures: screeplot of the ir.pca variances, and pairwise scatterplots of
the scores on Comp.1, Comp.2, Comp.3 and Comp.4.]
Comments
We can then see that the first component a1 is proportional to the sum of
all the measurements. Large fowl will have all xi large, and so their
scores on the first principal component y1 (= x′a1) will be large;
similarly, small birds will have low scores on y1. If we produce a scatter
plot using the first p.c. as the horizontal axis, then the large birds will
appear on the right-hand side and small ones on the left. Thus the first
p.c. measures overall size.
The second component is of the form (skull) − (wing & leg), and so high
positive scores of y2 (=x'a2) will come from birds with large heads and
small wings and legs. If we plot y2 against y1 then the birds with
relatively small heads for the size of their wings and legs will appear at
the bottom of the plot and those with relatively big heads at the top. The
second p.c. measures overall body shape.
Note: It is typical that the first p.c. reflects overall size or level
i.e. objects often vary most in overall size or there may be large
variation in base level of response (e.g. in ratings data from
questionnaires)
Suggestion: consider ignoring the first p.c. if it only reveals the obvious
(i.e. look at proportions of variance explained excluding the first pc)
> mtcars.pca<-princomp(mtcars,cor=T)
> mtcars.pca
Call:
princomp(x = mtcars, cor = T)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
2.5706809 1.6280258 0.7919579 0.5192277 0.4727061 0.4599958 0.3677798 0.3505730
Comp.9 Comp.10 Comp.11
0.2775728 0.2281128 0.1484736
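The standard deviations printed by princomp() are the square roots of the eigenvalues, so squaring them recovers the λi and hence the proportions of variation:

```r
mtcars.pca <- princomp(mtcars, cor = TRUE)
lambda <- mtcars.pca$sdev^2                 # the eigenvalues lambda_i
round(cumsum(lambda) / sum(lambda), 2)      # cumulative proportions of variation
# with cor = TRUE, sum(lambda) = p = 11; the first two p.c.s explain about 84%
```

summary(mtcars.pca) prints the same proportions directly.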
[Figures: screeplot of the mtcars.pca variances (Comp.1 to Comp.11), and
pairwise scatterplots of the scores on Comp.1, Comp.2, Comp.3 and Comp.4.]
> loadings(mtcars.pca)
Comp.1 Comp.2 Comp.3
mpg -0.36 0.01 -0.22
cyl 0.37 0.04 -0.17
disp 0.36 -0.04 -0.06
hp 0.33 0.24 0.14
drat -0.29 0.27 0.16
wt 0.34 -0.14 0.34
qsec -0.20 -0.46 0.40
vs -0.30 -0.23 0.42
am -0.23 0.42 -0.20
gear -0.20 0.46 0.28
carb 0.21 0.41 0.52
Interpretation of loadings:
Comp.3: (less clear) but high scores on this component come from large,
heavy cars that do few miles to the gallon and have slow quarter-mile
times (high qsec).
3.7 Notes:
> plot(ir.pc[,1:2])
> eqscplot(ir.pc[,1:2])
> plot(mtcars.pc[,1:2])
> eqscplot(mtcars.pc[,1:2])
[Figures: scores on Comp.1 vs Comp.2 for ir.pc and for mtcars.pc, each
drawn with plot() and with eqscplot().]
Note that, unlike plot(), eqscplot() does not label the axes
automatically, and you need to do this by including
,xlab="first p.c.",ylab="second p.c."
in the call to it.
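For example (a sketch, assuming ir.pc holds the matrix of p.c. scores; here it is rebuilt from the iris measurements as a stand-in for the notes' ir data):

```r
library(MASS)                        # eqscplot() lives in MASS

ir.pca <- princomp(iris[, 1:4])      # stand-in for the notes' princomp(ir)
ir.pc  <- ir.pca$scores              # matrix of scores, one column per p.c.

# equal-scale plot of the first two p.c.s, with explicit axis labels
eqscplot(ir.pc[, 1:2], xlab = "first p.c.", ylab = "second p.c.")
```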
3.8 Biplots
> biplot(ir.pca)
[Figure: biplot of ir.pca: the 150 observations plotted by their scores on
Comp.1 and Comp.2, with arrows for the variables Sepal.Length,
Sepal.Width, Petal.Length and Petal.Width.]
[Figure: cumulative variance explained (0.3 to 1.0) against PC number
(0 to 9), with a cut-off line marked.]
Scree plot
[Figures: scree plot, and pairwise scatterplots of scores: PC 1 (36%) vs
PC 2 (23%), PC 2 vs PC 3 (16%), PC 3 vs PC 4 (12%), PC 4 vs PC 5 (6%),
and PC 5 vs PC 6 (3%).]
x=f+