
Bethany Yollin

Principal Component Analysis

Due: March 15th, 2013

Introduction to principal component analysis


While graphical representations of data can be enough to formulate cursory inferences, finding meaningful patterns becomes increasingly difficult as data enters higher-dimensional spaces. Principal component analysis, or PCA, is a mathematical method by which to reduce data dimensionality in multivariate datasets without losing data significance.

Mathematical foundation for principal component analysis


Concepts from both statistics and linear algebra underlie the process of transforming a dataset to its principal components. The following sections provide a quick overview of the knowledge necessary to apply principal component analysis.

Statistics: covariance matrix

Covariance is a measure of how random variables change with respect to each other. Explicitly, the covariance between two random variables $X$ and $Y$ with $n$ observations is:

$$\mathrm{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \quad \text{where } X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \text{ and } Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \qquad (1)$$

If, on average, $x_i > \bar{x}$ when $y_i > \bar{y}$, or $x_i < \bar{x}$ when $y_i < \bar{y}$, the covariance computation will yield a positive value (i.e. the random variables $X$ and $Y$ are positively correlated). Conversely, if, on average, $x_i > \bar{x}$ when $y_i < \bar{y}$, or $x_i < \bar{x}$ when $y_i > \bar{y}$, the covariance computation will yield a negative value (i.e. the random variables $X$ and $Y$ are negatively correlated). It is the sign of the covariance that reveals the relationship between two random variables. For independent random variables the covariance is zero; however, $\mathrm{cov}(X, Y) = 0$ does not imply independence.

Now consider a collection of $m$ random variables $\{X_1, X_2, \ldots, X_m\}$ with $n$ observations. The covariances between all $m$ random variables can be collected in a covariance matrix, denoted $\Sigma$, in which the $i,j$ position is the covariance between the $i$th and $j$th random variable. In this manner:

$$\Sigma_{m,m} = \begin{pmatrix} \mathrm{cov}(X_1, X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_m) \\ \mathrm{cov}(X_2, X_1) & \mathrm{cov}(X_2, X_2) & \cdots & \mathrm{cov}(X_2, X_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(X_m, X_1) & \mathrm{cov}(X_m, X_2) & \cdots & \mathrm{cov}(X_m, X_m) \end{pmatrix} \qquad (2)$$

Since $\mathrm{cov}(X, Y) = \mathrm{cov}(Y, X)$, this matrix will be a symmetric, positive-definite matrix. A symmetric $m \times m$ matrix $A$ is positive-definite if the following is satisfied:

$$x^T A x > 0 \quad \forall\, x \in \mathbb{R}^m \text{ where } x \neq 0 \qquad (3)$$


This can be applied to the covariance matrix $\Sigma$ defined in equation (2):

$$X^T \Sigma X > 0 \quad \forall\, X \in \mathbb{R}^m \text{ where } X \neq 0 \qquad (4)$$

Linear algebra: eigenvectors, eigenvalues and eigendecomposition

Definition: Eigenvector. Let $A$ be a square $m \times m$ matrix and let $x \in \mathbb{R}^m$ where $x \neq 0$. Then $x$ is an eigenvector of $A$ if and only if there exists $\lambda \in \mathbb{R}$ such that $Ax = \lambda x$. In a more intuitive sense, if $T$ defines a linear mapping from $\mathbb{R}^m$ to $\mathbb{R}^m$ and $T(x) = Ax = \lambda x$, then the matrix-vector product of the transformation matrix $A$ and the eigenvector $x$ merely scales the eigenvector by a constant $\lambda$, termed the eigenvalue.

By simply rearranging the equation $Ax = \lambda x$, we find $(A - \lambda I)x = 0$. Since the eigenvector cannot be zero, we seek a non-trivial solution; therefore, $(A - \lambda I)$ must be a singular matrix whose determinant equals zero. In this manner:

$$\det(A - \lambda I) = \det \begin{pmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mm} - \lambda \end{pmatrix} = 0 \qquad (5)$$

Equation (5) is called the characteristic equation. Computing the determinant yields $p(\lambda)$, the characteristic polynomial, which can be factored in the form:

$$p(\lambda) = (\lambda - \lambda_1)^{n_1}(\lambda - \lambda_2)^{n_2} \cdots (\lambda - \lambda_m)^{n_m} = 0 \qquad (6)$$

where $n_i$ is the algebraic multiplicity of $\lambda_i$. For every eigenvalue $\lambda_i$, there is a specific eigenvector that satisfies the equation $(A - \lambda_i I)x = 0$.

Definition: Similar. If $A$ and $B$ are $m \times m$ matrices, then $A$ is similar to $B$ if there is an invertible matrix $P$ such that $B = P^{-1}AP$, or equivalently, $A = PBP^{-1}$. If two square matrices $A$ and $B$ are similar, then $A$ and $B$ have the same characteristic polynomial, implying $\det(A - \lambda I) = \det(B - \lambda I)$; it follows that both matrices have the same eigenvalues and algebraic multiplicities. In the special case in which matrix $A$ is similar to some diagonal matrix $D$, $A$ is said to be diagonalizable. (A diagonal matrix is characterized by the presence of zeros outside the main diagonal.)
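As a quick numerical sanity check, the following R sketch (with a small example matrix of my own choosing) verifies that a matrix and the similar matrix $P^{-1}AP$ share the same eigenvalues.

```r
# Similar matrices have the same characteristic polynomial, hence the same eigenvalues.
A <- matrix(c(2, 1, 0, 3), nrow = 2)   # arbitrary 2 x 2 example
P <- matrix(c(1, 1, 0, 1), nrow = 2)   # any invertible matrix
B <- solve(P) %*% A %*% P              # B = P^{-1} A P, so B is similar to A

eigen(A)$values                        # 3 2
eigen(B)$values                        # 3 2 (same eigenvalues, same multiplicities)
```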


Definition: Diagonalizable¹. An $m \times m$ matrix $A$ is said to be diagonalizable if there exists a diagonal matrix $D$ and an invertible matrix $P$ such that $A = PDP^{-1}$. As a corollary, the entries along the diagonal of $D$ are the eigenvalues of $A$ and the columns of $P$ are the eigenvectors of $A$ corresponding to their respective eigenvalues. In other words, an $m \times m$ matrix $A$ is diagonalizable if and only if there are enough eigenvectors to form a basis of $\mathbb{R}^m$. This basis is called an eigenvector basis of $\mathbb{R}^m$.

In the very special case that an $m \times m$ matrix $A$ is a symmetric, positive-definite matrix, $A$ is said to be orthogonally diagonalizable. Matrix $A$ will have $m$ independent and orthogonal eigenvectors, and as such, these eigenvectors form an orthonormal basis for $\mathbb{R}^m$. The inverse of an orthogonal matrix is the transpose of that matrix. Therefore:

$$A = PDP^{-1} = PDP^T = \begin{pmatrix} p_1 & p_2 & \cdots & p_m \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{pmatrix} \begin{pmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_m^T \end{pmatrix} \qquad (7)$$
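A short R sketch of the factorization in equation (7), using eigen() on a small symmetric matrix chosen purely for illustration:

```r
# Orthogonal diagonalization of a symmetric matrix: A = P D P^T (equation (7))
A <- matrix(c(4, 2, 2, 3), nrow = 2)   # small symmetric, positive-definite example
e <- eigen(A)
P <- e$vectors                          # columns are orthonormal eigenvectors
D <- diag(e$values)                     # eigenvalues on the diagonal, largest first

all.equal(A, P %*% D %*% t(P))          # TRUE: A is recovered from its eigendecomposition
all.equal(t(P) %*% P, diag(2))          # TRUE: P is orthogonal, so P^{-1} = P^T
```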

The concept of an orthonormal basis is quite significant with regard to linear transformations. A linear transformation that maps one orthogonal basis to another orthogonal basis, called an orthogonal or orthonormal transformation, necessarily preserves the lengths of vectors and the angles between vectors. Geometrically, the transformation merely amounts to a rotation or flip.
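As a small numerical illustration (my own example, not from the paper), an orthogonal matrix applied to a vector leaves its length unchanged:

```r
# Orthogonal transformations preserve lengths and angles: here, a rotation by 45 degrees.
theta <- pi / 4
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)  # rotation matrix
v <- c(3, 4)

sqrt(sum(v^2))           # 5
sqrt(sum((Q %*% v)^2))   # 5: the length is unchanged
```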

¹ Matrix diagonalization is sometimes termed eigendecomposition; this label may be more apt since it better conveys the decomposition of a matrix into the product of three other matrices, only one of which is actually diagonal.


Principal component analysis for time series data


Overview

In the following example, principal component analysis uses the eigenvectors and eigenvalues of the covariance matrix to map the data to an orthogonal basis upon which the variance is maximized along the coordinate axes. The first k principal components span a subspace that is said to contain the best k-dimensional view of the data.

Application

There are different types of marketable securities that the government issues to the public in order to finance the national debt; treasury bills, notes and bonds are basically IOUs that accrue interest over time. Federal Reserve Economic Data (FRED), provided by the Federal Reserve Bank of St. Louis, records key metrics that indicate the state of the American economy. Treasury rates are indicative of an administration's influence on the economy through monetary policy. Monetary policy is simply the adjustment of the money supply and interest rates to promote economic stability.

The dataset on which I performed principal component analysis pertains to the daily change in treasury rates for government-issued marketable securities. All the computations use the complete sample of 9 maturities with nearly 5,000 observations. Since the datasets for which PCA is used are usually quite large, most calculations are made using computer-based algorithms. This analysis was completed using the princomp function in R, an open-source statistical computing environment.
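The overall pipeline in R looks roughly like the sketch below. The file name and column names are hypothetical placeholders, not taken from the paper, and the lattice plotting steps are omitted.

```r
# Rough sketch of the analysis, assuming daily treasury rates with one column per maturity.
rates <- read.csv("treasury_rates.csv")             # hypothetical file: columns X3M, X6M, ..., X20Y
rate_changes <- as.data.frame(lapply(rates, diff))  # daily change in each rate

pca <- princomp(rate_changes)   # centers the data and diagonalizes the covariance matrix
summary(pca)                    # variance explained by each component
loadings(pca)                   # the eigenvectors (principal components)
```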


Data

The following graphs are plotted using the lattice package for data visualization. Chart 1 displays the raw data; Chart 2 displays the daily change in treasury rates.

Chart 1: Treasury rates (1994-present)
[Lattice panel plot of treasury rates versus time (1995-2010), one panel per maturity: 3M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y, 10Y, 20Y.]

Chart 2: Daily change in treasury rates (1994-present)


[Lattice panel plot of daily changes in treasury rates versus time (1995-2010), one panel per maturity: 3M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y, 10Y, 20Y.]


Chart 3 displays a series of yield curves. The yield curve is a snapshot of maturity yields at a single moment in time. Principal component analysis, as applied in this context, aims to determine common trends in the yield curve.

Chart 3: Yield curves
[Panel plot of yield curves (rate versus maturity, 0-20 years) on six dates: 1996-01-02, 1999-01-04, 2002-01-02, 2005-01-03, 2008-01-02, 2011-01-03.]


Step 1: Center the data

In order for PCA to work properly, it is necessary to center the data; this is done by subtracting the mean of the ith column vector from each entry of the ith column. In this manner, the mean of each column vector becomes zero.

Step 2: Compute the covariance matrix

The next step is to calculate the covariance matrix, $\Sigma$.

Table 1: Covariance matrix

        3M     6M     1Y     2Y     3Y     5Y     7Y    10Y    20Y
3M   30.87  19.54  15.79  13.30  12.89  11.67  10.16   8.72   6.43
6M   19.54  20.94  18.01  17.19  17.06  15.97  14.38  12.69   9.92
1Y   15.79  18.01  22.66  24.04  24.25  23.30  21.45  19.14  15.42
2Y   13.30  17.19  24.04  35.57  35.97  35.50  33.34  29.94  24.46
3Y   12.89  17.06  24.25  35.97  39.57  39.31  37.43  33.99  28.23
5Y   11.67  15.97  23.30  35.50  39.31  43.09  41.68  38.50  32.89
7Y   10.16  14.38  21.45  33.34  37.43  41.68  43.06  39.73  34.91
10Y   8.72  12.69  19.14  29.94  33.99  38.50  39.73  38.69  34.13
20Y   6.43   9.92  15.42  24.46  28.23  32.89  34.91  34.13  33.23

Multiplied by $10^4$ for display purposes.

Recall that this matrix is positive-definite as per Equation (4). Notice the symmetry in the matrix; this arises because $\mathrm{cov}(X, Y) = \mathrm{cov}(Y, X)$. Since this covariance matrix compares how the daily changes in treasury rates vary with respect to each other, it makes sense that the covariance between disparate maturities is less pronounced than the covariance between similar maturities. The values along the diagonal represent $\mathrm{cov}(X_i, X_i) = \mathrm{var}(X_i)$.
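A minimal R sketch of Steps 1 and 2, assuming the rate_changes data frame from the earlier snippet:

```r
# Step 1: center the data (subtract each column mean); Step 2: compute the covariance matrix.
centered <- scale(rate_changes, center = TRUE, scale = FALSE)
round(colMeans(centered), 12)    # every column mean is now (numerically) zero

Sigma <- cov(centered)           # 9 x 9 covariance matrix, as in Table 1
round(Sigma * 1e4, 2)            # scaled by 10^4 to match the table's display convention
```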


Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix

Thanks to computer-based algorithms, calculating the eigenvalues and eigenvectors of a 9 × 9 matrix takes fractions of a second. The eigenvectors of $\Sigma$ are the principal components of the data (Table 3). Since the eigenvectors of a positive-definite matrix are orthogonal, together they form a basis for $\mathbb{R}^9$. The first principal component, i.e. the eigenvector associated with the largest eigenvalue, is in the direction along which the data has the most variance. The second principal component, i.e. the eigenvector associated with the second largest eigenvalue, is in a direction orthogonal to the first eigenvector, and so on. In this manner, the eigenvalues give the variance described by each component. The significance of each eigenvalue can be visualized in a chart called a scree plot, which simply compares the magnitude of each eigenvalue to the sum of all the eigenvalues.

Table 2: Eigenvalues (ranked in order from greatest to lowest)

1st   0.02390036
2nd   0.00426753
3rd   0.00134145
4th   0.00053698
5th   0.00024657
6th   0.00018641
7th   0.00011940
8th   0.00008910
9th   0.00008000
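A sketch of this step in R, assuming Sigma from the previous snippet:

```r
# Step 3: eigendecomposition of the covariance matrix.
e <- eigen(Sigma)            # eigenvalues returned in decreasing order
e$values                     # the eigenvalues of Table 2
e$vectors                    # columns are the principal components of Table 3

# Percent of total variance explained by each component (the basis of the scree plot).
pct <- 100 * e$values / sum(e$values)
round(cumsum(pct), 1)        # cumulative percentages, cf. Chart 4
```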

Chart 4: Scree plot


[Scree plot of percent variance explained by each principal component; cumulative values labeled: PC1 77.7, PC2 91.5, PC3 95.9, PC4 97.7, PC5 98.5, PC6 99.1, PC7 99.5, PC8 99.7, PC9 100.]


As can be inferred from the graph, not all the principal components explain the same percentage of variance. In fact, the first three principal components together explain about 96% of the variance in the data.

Table 3: Eigenvectors or principal components (ranked by eigenvalue, greatest to lowest)

        1st     2nd     3rd     4th     5th     6th     7th     8th     9th
3M   -15.36  -69.81  -48.29  -48.33   14.52   -3.56    0.29   -0.15   -0.19
6M   -18.83  -47.74   -1.98   58.16  -54.73   31.19   -3.35    0.77    0.35
1Y   -25.33  -30.82   26.84   45.14   50.96  -54.51    8.91   -1.30    0.96
2Y   -36.04   -8.74   47.90  -19.43   26.51   44.47  -55.82   10.26   -7.24
3Y   -39.20   -0.64   35.85  -26.03   -8.84   18.50   66.80  -35.62   18.69
5Y   -41.57   11.84    6.80  -18.49  -33.79  -30.96   13.17   60.79  -42.13
7Y   -40.98   20.12  -16.11   -5.11  -23.02  -30.17  -33.54   -3.32   71.09
10Y  -38.03   23.26  -29.24    7.76   -2.97   -8.93  -20.20  -62.61  -51.78
20Y  -32.85   27.12  -46.95   27.61   41.66   42.12   25.00   31.58    9.36

Multiplied by $10^2$ for display purposes.

Step 4: Select a subset of the eigenvectors as a new basis

Chart 5 and Chart 6, in the following section, provide a way to visualize the first three eigenvectors; their shapes are drawn from the first three columns of the matrix in Table 3. I will choose to explain the variance in the data using the first three principal components. A short sketch of the corresponding projection follows.
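A minimal R sketch of the projection onto the first three eigenvectors, assuming centered and e from the previous snippets:

```r
# Step 4: project the centered observations onto the first k = 3 eigenvectors.
k <- 3
basis  <- e$vectors[, 1:k]     # the new (partial) orthonormal basis
scores <- centered %*% basis   # coordinates of each observation in that basis
dim(scores)                    # roughly 5000 x 3: the best 3-dimensional view of the data
```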

Chart 5: First three principal components

[Line plot of the first three principal component loadings versus maturity; legend: PC1 Parallel Shift, PC2 Tilt, PC3 Bend.]


Step 5: Interpret the results

Recall the yield curves displayed in Chart 3. If we plotted yield curves for all 5,000 observations, we might be able to observe some general trends in the behavior of the data, but this would be incredibly onerous and our observations would likely be subjective in nature. This is the point at which the idea of reducing data dimensionality becomes an extremely powerful tool. By observing the projection of the data onto the first few principal components, we are able to observe general trends in the data. Charts 5 and 6 are indicative of these common trends, and Chart 7 contextualizes the trends.

Chart 6: First three principal components
[Bar charts of the loadings of the first three principal components across maturities (3M-20Y), labeled Parallel Shift (PC1), Tilt (PC2), and Bend (PC3).]

Chart 7: Yield curve behavior (one direction)

The most prominent trend is the parallel shift, in which all the treasury rates either decrease or increase together. This can easily be observed in the raw time-series data.


Tilt is a less obvious phenomenon, but it does not arise entirely by chance. From time to time, the Fed conducts something called Operation Twist, in which the Fed purchases long-term bonds in order to increase their price and decrease their yields while simultaneously selling short-term bonds in order to decrease their price and increase their yields. In fact, the Fed is currently adopting this policy so as to make short-term loans more attractive to the public. Ultimately, this causes the yield curve to twist or tilt.

The last behavior, bend, is the least prominent of all the trends. In this instance, short-term rates are directly related to long-term rates and inversely related to mid-term rates. Only about 4% of the variance is characterized by this behavior.

Conclusion

Principal component analysis is a common mathematical method used in econometrics. By applying concepts from both statistics and linear algebra, patterns in multivariate datasets can be revealed. These patterns may have otherwise gone unnoticed, since visualizing data in higher dimensions is not easy.
