
THE UNIVERSITY OF CHICAGO
Graduate School of Business
Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay

Lecture 6: Principal Component Analysis

1 Population Principal Components

A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through some linear combinations of these variables. Its general objectives are (a) dimension reduction and (b) interpretation. Consider $p$ random variables $X_1, \ldots, X_p$. Principal component analysis seeks to select a new coordinate system obtained by rotating the original system with $X_1, \ldots, X_p$ as the coordinate axes. The new axes represent the directions with maximum variability and provide a simpler and more parsimonious description of the covariance structure. Let $X = (X_1, \ldots, X_p)'$. Denote the covariance matrix of $X$ by $\Sigma$, whose eigenvalues are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. Let $Y_i = a_i'X = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p$ be a linear combination of $X$. Then, we have
$$\mathrm{Var}(Y_i) = \mathrm{Var}(a_i'X) = a_i'\Sigma a_i, \quad i = 1, \ldots, p, \qquad (1)$$
and, more generally, for $p$ such linear combinations,
$$\mathrm{Cov}(Y_i, Y_j) = a_i'\Sigma a_j, \quad i, j = 1, \ldots, p.$$
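As a numerical illustration of (1), the following R sketch evaluates $a_i'\Sigma a_i$ and $a_i'\Sigma a_j$ directly; the covariance matrix Sigma and the weight vectors a1 and a2 are made up for the example.

```r
## Hypothetical 3x3 covariance matrix (illustration only)
Sigma <- matrix(c(4, 2, 0,
                  2, 3, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)
a1 <- c(1, 0, 0)                   # weights of one linear combination Y1 = a1'X
a2 <- c(0, 1, -1)                  # weights of another linear combination Y2 = a2'X

t(a1) %*% Sigma %*% a1             # Var(Y1)  = a1' Sigma a1
t(a2) %*% Sigma %*% a2             # Var(Y2)  = a2' Sigma a2
t(a1) %*% Sigma %*% a2             # Cov(Y1, Y2) = a1' Sigma a2
```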

The principal components (PCs) are those linear combinations $Y_1, \ldots, Y_p$ that are uncorrelated and whose variances in (1) are as large as possible. Thus,

The first PC = the linear combination $a_1'X$ that maximizes $\mathrm{Var}(a_1'X)$ subject to $a_1'a_1 = 1$.

The second PC = the linear combination $a_2'X$ that maximizes $\mathrm{Var}(a_2'X)$ subject to $a_2'a_2 = 1$ and $\mathrm{Cov}(a_1'X, a_2'X) = 0$.

In general,

The $i$th PC = the linear combination $a_i'X$ that maximizes $\mathrm{Var}(a_i'X)$ subject to $a_i'a_i = 1$ and $\mathrm{Cov}(a_j'X, a_i'X) = 0$ for $j = 1, \ldots, i-1$.

Result 8.1. Let $\Sigma$ be the covariance matrix of the random vector $X = (X_1, \ldots, X_p)'$. Let the eigenvalue-eigenvector pairs of $\Sigma$ be $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. Then the $i$th PC of $X$ is given by
$$Y_i = e_i'X = e_{i1}X_1 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p.$$

Furthermore,
$$\mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \quad i = 1, \ldots, p, \qquad \mathrm{Cov}(Y_i, Y_j) = e_i'\Sigma e_j = 0, \quad i \neq j.$$
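A minimal R sketch of Result 8.1, reusing the hypothetical Sigma above: eigen() returns the eigenvalue-eigenvector pairs in decreasing order of the eigenvalues, the columns of the eigenvector matrix are the PC weight vectors $e_i$, and the covariance matrix of the PCs is diagonal with the eigenvalues on the diagonal.

```r
eig    <- eigen(Sigma)             # eigen-decomposition of the covariance matrix
lambda <- eig$values               # lambda_1 >= lambda_2 >= ... >= lambda_p
E      <- eig$vectors              # i-th column = e_i, the weights of the i-th PC

## E' Sigma E should equal diag(lambda_1, ..., lambda_p):
## Var(Y_i) = lambda_i on the diagonal, Cov(Y_i, Y_j) = 0 off the diagonal
round(t(E) %*% Sigma %*% E, 10)
```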

If some $\lambda_j$ are equal, then the choices of the corresponding coefficient vectors $e_j$ and, hence, $Y_j$ are not unique.

Proof. The result follows from Eq. (2.51) and (2.52) of the textbook.

Result 8.2. Let $\Sigma$ be the covariance matrix of the random vector $X = (X_1, \ldots, X_p)'$ and $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$ be the ordered (in decreasing order) eigenvalue-eigenvector pairs of $\Sigma$. Then
$$\sum_{i=1}^p \sigma_{ii} = \sum_{i=1}^p \mathrm{Var}(X_i) = \sum_{i=1}^p \lambda_i = \sum_{i=1}^p \mathrm{Var}(Y_i).$$

The proportion of total population variance explained by the $i$th PC is $\lambda_i / \sum_{j=1}^p \lambda_j$.
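Continuing the same sketch, Result 8.2 and the proportions of variance explained can be checked numerically:

```r
sum(diag(Sigma))                   # total variance: sigma_11 + ... + sigma_pp
sum(lambda)                        # equals lambda_1 + ... + lambda_p
prop <- lambda / sum(lambda)       # proportion of total variance explained by each PC
cumsum(prop)                       # cumulative proportion explained
```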

Let $\rho_{X,Y}$ denote the correlation coefficient between the random variables $X$ and $Y$.

Result 8.3. Let $Y_i = e_i'X$ be the $i$th PC of the random vector $X$ with covariance matrix $\Sigma$. Then,
$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \ldots, p.$$

Correlation matrix: Let $Z = D^{-1/2}(X - \mu)$, where $\mu = E(X)$ and $D = \mathrm{diag}\{\sigma_{11}, \ldots, \sigma_{pp}\}$ with $\sigma_{ii}$ being the $(i,i)$th element of the covariance matrix of $X$. Then the $i$th PC of $Z$ is $Y_i = e_i'D^{-1/2}(X - \mu)$, where the $(\lambda_i, e_i)$'s are the ordered eigenvalue-eigenvector pairs of $\rho = \mathrm{Cov}(Z)$. Moreover,
$$\sum_{i=1}^p \mathrm{Var}(Y_i) = \sum_{i=1}^p \mathrm{Var}(Z_i) = p, \qquad \text{and} \qquad \rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i}, \quad i, k = 1, \ldots, p.$$
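Both correlation results can be verified with the same hypothetical Sigma. The sketch below builds the correlation matrix with cov2cor(), extracts its eigenvalue-eigenvector pairs, and forms the matrices of correlations $\rho_{Y_i, X_k}$ and $\rho_{Y_i, Z_k}$; row $i$ contains the correlations of the $i$th PC with the original or standardized variables.

```r
## Result 8.3: rho_{Y_i, X_k} = e_{ik} sqrt(lambda_i) / sqrt(sigma_kk)
corr_YX <- t(E %*% diag(sqrt(lambda))) %*% diag(1 / sqrt(diag(Sigma)))

## PCs of the standardized variables: eigen-analysis of the correlation matrix
rho      <- cov2cor(Sigma)         # correlation matrix implied by Sigma
eig_z    <- eigen(rho)
lambda_z <- eig_z$values           # eigenvalues of rho
E_z      <- eig_z$vectors

sum(lambda_z)                      # equals p (= 3 here)
corr_YZ <- t(E_z %*% diag(sqrt(lambda_z)))   # rho_{Y_i, Z_k} = e_{ik} sqrt(lambda_i)
```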

2 Sample PC

Suppose that the data $x_1, \ldots, x_n$ represent $n$ independent draws from some $p$-dimensional population with mean $\mu$ and covariance matrix $\Sigma$. Let $\bar{x}$, $S$, and $R$ be the sample mean, sample covariance matrix, and sample correlation matrix, respectively. The sample PCs are the counterparts of the population PCs with $\Sigma$ and $\rho$ replaced by $S$ and $R$.
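A minimal sketch of the sample analysis using simulated data (the data, generated from the hypothetical Sigma of the earlier sketch, are purely illustrative); princomp() with cor = FALSE works with $S$, and with cor = TRUE it works with $R$.

```r
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3) %*% chol(Sigma)  # simulated sample, Cov ~ Sigma
colnames(x) <- c("X1", "X2", "X3")

pc_S <- princomp(x, cor = FALSE)   # sample PCs based on the covariance matrix S
pc_R <- princomp(x, cor = TRUE)    # sample PCs based on the correlation matrix R
summary(pc_S)                      # standard deviations and proportions of variance
pc_S$loadings                      # estimated eigenvectors (PC weights)
```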

2.1 The number of PCs

A scree plot is a plot of the eigenvalues of $S$ or $R$ in decreasing order, that is, a scatter plot of $(i, \hat{\lambda}_i)$, $i = 1, \ldots, p$. By looking for an elbow (bend) in the scree plot, we can determine the number of PCs.
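A sketch of a scree plot for the simulated data x above; the sample eigenvalues can be plotted directly, or screeplot() can be applied to a princomp object.

```r
lambda_hat <- eigen(cov(x))$values          # sample eigenvalues in decreasing order
plot(seq_along(lambda_hat), lambda_hat, type = "b",
     xlab = "component i", ylab = "eigenvalue", main = "Scree plot")
## equivalently: screeplot(princomp(x), type = "lines")
```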

2.2 Large sample inferences

Asymptotic results for the eigenvalues and eigenvectors are available only under the normality assumption and the condition that all eigenvalues are distinct and positive. Some limited results are beginning to appear for cases in which the variables are not normally distributed.

Anderson (1963, Annals of Mathematical Statistics) derived the following large sample distribution theory for the eigenvalues $\hat{\lambda} = (\hat{\lambda}_1, \ldots, \hat{\lambda}_p)'$ and eigenvectors $\hat{e}_1, \ldots, \hat{e}_p$ of the sample covariance matrix $S$:

1. Let $\Lambda$ be the diagonal matrix of the eigenvalues $\lambda_1, \ldots, \lambda_p$ of $\Sigma$. Then $\sqrt{n}(\hat{\lambda} - \lambda) \to N_p(0, 2\Lambda^2)$.

2. Let
$$E_i = \lambda_i \sum_{k=1, k \neq i}^p \frac{\lambda_k}{(\lambda_k - \lambda_i)^2}\, e_k e_k'.$$
Then $\sqrt{n}(\hat{e}_i - e_i) \to N_p(0, E_i)$.

3. Each $\hat{\lambda}_i$ is distributed independently of the elements of the associated $\hat{e}_i$.

Property 1 implies that, for large $n$, the sample eigenvalues $\hat{\lambda}_i$ are independently distributed. Moreover, $\hat{\lambda}_i$ has an approximate $N(\lambda_i, 2\lambda_i^2/n)$ distribution. Using this result, a large sample $100(1-\alpha)\%$ confidence interval for $\lambda_i$ is provided by
$$\frac{\hat{\lambda}_i}{1 + z(\alpha/2)\sqrt{2/n}} \leq \lambda_i \leq \frac{\hat{\lambda}_i}{1 - z(\alpha/2)\sqrt{2/n}},$$
where $z(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of the standard normal distribution.
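Assuming normal data with distinct, positive eigenvalues, the confidence interval above can be computed as in the following sketch (continuing with the simulated data x and taking $\alpha = 0.05$).

```r
n          <- nrow(x)
lambda_hat <- eigen(cov(x))$values          # sample eigenvalues
alpha      <- 0.05
z          <- qnorm(1 - alpha / 2)          # upper 100(alpha/2) percentile of N(0, 1)

lower <- lambda_hat / (1 + z * sqrt(2 / n))
upper <- lambda_hat / (1 - z * sqrt(2 / n))
cbind(lower, estimate = lambda_hat, upper)  # one interval per eigenvalue
```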

Property 2 implies that the $\hat{e}_i$'s are normally distributed about the corresponding $e_i$'s in large samples. The elements of each $\hat{e}_i$ are correlated, and the correlation depends to a large extent on the separation of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ and on the sample size $n$. Approximate standard errors for the coefficients $\hat{e}_{i,k}$ are given by the square roots of the diagonal elements of $(1/n)\hat{E}_i$, where $\hat{E}_i$ is derived from $E_i$ by substituting the $\hat{\lambda}_i$'s for the $\lambda_i$'s and the $\hat{e}_i$'s for the $e_i$'s.

Testing for an equal correlation structure: Consider the null hypothesis
$$H_0: \rho = \rho_0 = \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}$$
and the alternative hypothesis $H_1: \rho \neq \rho_0$. Lawley (1963) proposed a testing procedure. Let
$$\bar{r}_k = \frac{1}{p-1}\sum_{i=1, i \neq k}^p r_{ik}, \quad k = 1, 2, \ldots, p; \qquad \bar{r} = \frac{2}{p(p-1)}\sum_{i<k} r_{ik}; \qquad \hat{\gamma} = \frac{(p-1)^2[1 - (1-\bar{r})^2]}{p - (p-2)(1-\bar{r})^2},$$

where $\bar{r}_k$ is the average of the off-diagonal elements in the $k$th column of $R$ and $\bar{r}$ is the overall average of the off-diagonal elements. The large sample approximate $\alpha$-level test is to reject $H_0$ in favor of $H_1$ if
$$T = \frac{n-1}{(1-\bar{r})^2}\left[\sum_{i<k}(r_{ik} - \bar{r})^2 - \hat{\gamma}\sum_{k=1}^p (\bar{r}_k - \bar{r})^2\right] > \chi^2_{(p+1)(p-2)/2}(\alpha).$$
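A sketch of Lawley's test as described above, applied to the sample correlation matrix of the simulated data; the helper function lawley_test() is written here only for illustration.

```r
lawley_test <- function(x, alpha = 0.05) {
  R <- cor(x); n <- nrow(x); p <- ncol(x)
  off   <- upper.tri(R)                        # logical index of the pairs i < k
  r_k   <- (colSums(R) - 1) / (p - 1)          # average off-diagonal element of column k
  r_bar <- 2 * sum(R[off]) / (p * (p - 1))     # overall average off-diagonal element
  gamma_hat <- (p - 1)^2 * (1 - (1 - r_bar)^2) /
               (p - (p - 2) * (1 - r_bar)^2)
  T_stat <- (n - 1) / (1 - r_bar)^2 *
            (sum((R[off] - r_bar)^2) - gamma_hat * sum((r_k - r_bar)^2))
  crit <- qchisq(1 - alpha, df = (p + 1) * (p - 2) / 2)
  list(statistic = T_stat, critical_value = crit, reject_H0 = T_stat > crit)
}
lawley_test(x)   # test of equal correlation structure for the simulated data
```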

Remark: The R command for PC analysis is princomp.
