$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{1}$$
If, on average, $x_i > \bar{x}$ and $y_i > \bar{y}$, or $x_i < \bar{x}$ and $y_i < \bar{y}$, the covariance computation will yield a positive value (i.e. the random variables X and Y are positively correlated). Else, if, on average, $x_i > \bar{x}$ and $y_i < \bar{y}$, or $x_i < \bar{x}$ and $y_i > \bar{y}$, the covariance computation will yield a negative value (i.e. the random variables X and Y are negatively correlated). It is the sign of the covariance that reveals the relationship between two random variables. For independent random variables the covariance is zero; however, cov(X, Y) = 0 does not imply independence.

Now consider a collection of m random variables {X₁, X₂, ..., Xₘ}, each with n observations. The covariances between all m random variables can be collected in a covariance matrix, denoted Σ, in which the (i, j) entry is the covariance between the i-th and j-th random variables. In this manner:

$$\Sigma_{m,m} = \begin{bmatrix} \mathrm{cov}(X_1, X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_m) \\ \mathrm{cov}(X_2, X_1) & \mathrm{cov}(X_2, X_2) & \cdots & \mathrm{cov}(X_2, X_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(X_m, X_1) & \mathrm{cov}(X_m, X_2) & \cdots & \mathrm{cov}(X_m, X_m) \end{bmatrix} \tag{2}$$
Since cov(X, Y) = cov(Y, X), this matrix will be a symmetric, positive-definite matrix. A symmetric m × m matrix A, with column vectors {x₁, x₂, ..., xₘ}, is positive-definite if the following is satisfied:

$$x^T A x > 0 \quad \forall\, x \in \mathbb{R}^m \text{ where } x \neq 0 \tag{3}$$

Winter 2013 · Math 208 · Bethany Yollin
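As a minimal numeric illustration of this definition (assuming NumPy; the small matrix below is made up, not from the paper's data), positive-definiteness can be checked by sampling random nonzero vectors and evaluating the quadratic form:

```python
import numpy as np

# Hypothetical symmetric matrix, used only to illustrate the definition.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Sample many random nonzero vectors x and verify x^T A x > 0.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    if np.allclose(x, 0):
        continue
    # Quadratic form of a positive-definite matrix is strictly positive.
    assert x @ A @ x > 0
```

Sampling cannot prove positive-definiteness, of course; it only fails to find a counterexample. A definitive check is that all eigenvalues are positive.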
This can be applied to the covariance matrix defined in equation (2):

$$X^T \Sigma X > 0 \quad \forall\, X \in \mathbb{R}^m \text{ where } X \neq 0 \tag{4}$$

Linear algebra: eigenvectors, eigenvalues and eigendecomposition

Definition: Eigenvector
Let A be a square m × m matrix and let x ∈ ℝᵐ where x ≠ 0. Then x is an eigenvector for A if and only if there exists λ ∈ ℝ such that Ax = λx. In a more intuitive sense, if T defines a linear mapping from ℝᵐ to ℝᵐ and T(x) = Ax = λx, then the matrix-vector product of the transformation matrix A and the eigenvector x merely scales the eigenvector by a constant λ, termed the eigenvalue.

By rearranging the equation Ax = λx, the following can be found: (A − λI)x = 0. Since the eigenvector cannot be zero, we seek a non-trivial solution; therefore (A − λI) must be a singular matrix, for which the determinant equals zero. In this manner:

$$\det(A - \lambda I) = \det \begin{bmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mm} - \lambda \end{bmatrix} = 0 \tag{5}$$
Equation (5) is called the characteristic equation. Computing the determinant yields p(λ), the characteristic polynomial, which can be factored in the form:

$$p(\lambda) = (\lambda - \lambda_1)^{n_1} (\lambda - \lambda_2)^{n_2} \cdots (\lambda - \lambda_m)^{n_m} = 0 \tag{6}$$

where nᵢ is the algebraic multiplicity of λᵢ. For every eigenvalue λᵢ there is a specific eigenvector that satisfies the equation (A − λᵢI)x = 0.

Definition: Similar
If A and B are m × m matrices, then A is similar to B if there is an invertible matrix P such that B = P⁻¹AP, or equivalently, A = PBP⁻¹. If two square matrices A and B are similar, then A and B have the same characteristic polynomial, implying det(A − λI) = det(B − λI); it follows that both matrices have the same eigenvalues and algebraic multiplicities. In a special case of this property, in which matrix A is similar to some diagonal matrix D, A is said to be diagonalizable. (A diagonal matrix is characterized by the presence of zeros outside the main diagonal.)
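In practice one rarely factors the characteristic polynomial by hand. A short sketch with NumPy (the 2 × 2 matrix here is invented purely for illustration) confirms both the eigenvector equation and the determinant condition numerically:

```python
import numpy as np

# Illustrative matrix (not from the paper's data); eigenvalues are 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)

# Each column of eigvecs is an eigenvector satisfying A x = lambda x.
for lam, x in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ x, lam * x)

# Each eigenvalue is a root of the characteristic polynomial,
# i.e. det(A - lambda * I) = 0.
for lam in eigvals:
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)
```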
Definition: Diagonalizable¹
An m × m matrix A is said to be diagonalizable if there exist a diagonal matrix D and an invertible matrix P such that A = PDP⁻¹. As a corollary, the entries along the diagonal of D are the eigenvalues of A, and the columns of P are the eigenvectors of A corresponding to their respective eigenvalues. In other words, an m × m matrix A is diagonalizable if and only if there are enough eigenvectors to form a basis of ℝᵐ. This basis is called an eigenvector basis of ℝᵐ.

In the very special case that an m × m matrix A is a symmetric, positive-definite matrix, A is said to be orthogonally diagonalizable. Matrix A will have m independent and orthogonal eigenvectors, and as such, these eigenvectors form an orthonormal basis for ℝᵐ. The inverse of an orthogonal matrix is the transpose of that matrix. Therefore:

$$A = PDP^{-1} = PDP^T = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{bmatrix} \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_m^T \end{bmatrix} \tag{7}$$
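Equation (7) can be verified numerically; a sketch assuming NumPy, with an illustrative symmetric matrix (not the paper's covariance matrix):

```python
import numpy as np

# Illustrative symmetric matrix; eigenvalues are 1 and 3.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices and returns an
# orthonormal set of eigenvectors as the columns of P.
d, P = np.linalg.eigh(S)
D = np.diag(d)

# For an orthogonal matrix, the inverse equals the transpose: P^{-1} = P^T.
assert np.allclose(P @ P.T, np.eye(2))

# Eigendecomposition as in equation (7): S = P D P^T.
assert np.allclose(P @ D @ P.T, S)
```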
The concept of an orthonormal basis is quite significant with regard to linear transformations. A linear transformation that maps one orthogonal basis to another orthogonal basis, called an orthogonal or orthonormal transformation, will necessarily preserve the lengths of vectors and the angles between vectors. Geometrically, the transformation merely accounts for a rotation or a flip.
¹Matrix diagonalization is sometimes termed eigendecomposition; this label may be more apt since it better conveys the decomposing of a matrix into the product of three other matrices, only one of which is actually diagonal.
Data

The following graphs are plotted using the lattice package for data visualization. Chart 1 displays the raw data. Chart 2 displays the daily change in treasury rates.

Chart 1: Treasury rates (1994-present)
[Chart 1: nine-panel time series of treasury rates (3M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y, 10Y, 20Y) plotted against time, 1995–2010]
Chart 2: Daily change in treasury rates

[Chart 2: nine-panel time series of daily rate changes (3M through 20Y) plotted against time, 1995–2010]
Chart 3 displays a series of yield curves. The yield curve is a snapshot of maturity yields at a single moment in time. Principal component analysis, as applied in this context, will aim to determine common trends in the yield curve. Chart 3: Yield curves
[Chart 3: six yield-curve snapshots (rate versus maturity, 0–20Y) dated 1996-01-02, 1999-01-04, 2002-01-02, 2005-01-03, 2008-01-02, and 2011-01-03]
Step 1: Center the data
In order for PCA to work properly, it is necessary to center the data; this is done by subtracting the mean of the i-th column vector from each entry of the i-th column. In this manner, the mean of each column vector is now zero.

Step 2: Compute the covariance matrix
The next step is to calculate the covariance matrix, Σ.

Table 1: Covariance matrix

        3M     6M     1Y     2Y     3Y     5Y     7Y    10Y    20Y
3M   30.87  19.54  15.79  13.30  12.89  11.67  10.16   8.72   6.43
6M   19.54  20.94  18.01  17.19  17.06  15.97  14.38  12.69   9.92
1Y   15.79  18.01  22.66  24.04  24.25  23.30  21.45  19.14  15.42
2Y   13.30  17.19  24.04  35.57  35.97  35.50  33.34  29.94  24.46
3Y   12.89  17.06  24.25  35.97  39.57  39.31  37.43  33.99  28.23
5Y   11.67  15.97  23.30  35.50  39.31  43.09  41.68  38.50  32.89
7Y   10.16  14.38  21.45  33.34  37.43  41.68  43.06  39.73  34.91
10Y   8.72  12.69  19.14  29.94  33.99  38.50  39.73  38.69  34.13
20Y   6.43   9.92  15.42  24.46  28.23  32.89  34.91  34.13  33.23
Recall that this matrix is positive-definite, as per Equation (4). Notice the symmetry of the matrix; this arises because cov(X, Y) = cov(Y, X). Since this covariance matrix essentially compares how the daily changes in treasury rates move with respect to each other, it makes sense that the covariance between disparate maturities is less pronounced than the covariance between similar maturities. The values along the diagonal represent cov(Xᵢ, Xᵢ) = var(Xᵢ).
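Steps 1 and 2 can be sketched in Python with NumPy, assuming the daily rate changes form the rows of an array X. The array below is random stand-in data with the same shape (n observations of 9 maturities), not the actual treasury series:

```python
import numpy as np

# Stand-in data: n = 500 observations of m = 9 maturities (random,
# for illustration only; the real input is the daily rate changes).
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 9))

# Step 1: center each column so that its mean is zero.
Xc = X - X.mean(axis=0)
assert np.allclose(Xc.mean(axis=0), 0.0)

# Step 2: covariance matrix, Sigma = Xc^T Xc / (n - 1).
n = Xc.shape[0]
Sigma = (Xc.T @ Xc) / (n - 1)

# Agrees with NumPy's built-in estimator, and symmetric by construction.
assert np.allclose(Sigma, np.cov(X, rowvar=False))
assert np.allclose(Sigma, Sigma.T)
```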
Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix
Thanks to computer-based algorithms, calculating the eigenvalues and eigenvectors of a 9 × 9 matrix takes fractions of a second. The eigenvectors of Σ are the principal components of the data (Table 3). Since the eigenvectors of a positive-definite matrix are orthogonal, together they form a basis for ℝ⁹. The first principal component, i.e. the eigenvector associated with the largest eigenvalue, points in the direction along which the data has the most variance. The second principal component, i.e. the eigenvector associated with the second-largest eigenvalue, points in the direction orthogonal to the first eigenvector, and so on; in this manner, the eigenvalues give the total variance described by each component. The significance of each eigenvalue can be visualized in a chart called a scree plot, which simply compares the magnitude of each eigenvalue to the sum of all the eigenvalues.

Table 2: Eigenvalues (ranked in order from greatest to lowest, 1st through 9th)

[Chart 4: scree plot of the nine eigenvalues (PC1–PC9), annotated with cumulative percentage-of-variance labels 91.5, 95.9, 97.7, 98.5, 99.1, 99.5, 99.7, and 100]
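Step 3 amounts to an eigendecomposition of Σ followed by a sort from greatest to lowest eigenvalue; a sketch continuing with stand-in random data rather than the paper's covariance matrix:

```python
import numpy as np

# Stand-in covariance matrix built from random data (not Table 1's values).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 9))
Sigma = np.cov(X, rowvar=False)

# Sigma is symmetric, so eigh applies; it returns real eigenvalues
# in ascending order with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Rank from greatest to lowest, as in Tables 2 and 3.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scree-plot quantities: each eigenvalue's share of the total variance,
# and the cumulative share, which must reach 100%.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
assert np.isclose(cumulative[-1], 1.0)
```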
As can be inferred from the graph, not all the principal components explain the same percentage of variance. In fact, the first two or three principal components explain 96% of the variance in the data.

Table 3: Eigenvectors or principal components (ranked by eigenvalue, greatest to lowest)

        1st     2nd     3rd     4th     5th     6th     7th     8th     9th
3M   -15.36  -69.81  -48.29  -48.33   14.52   -3.56    0.29   -0.15   -0.19
6M   -18.83  -47.74   -1.98   58.16  -54.73   31.19   -3.35    0.77    0.35
1Y   -25.33  -30.82   26.84   45.14   50.96  -54.51    8.91   -1.30    0.96
2Y   -36.04   -8.74   47.90  -19.43   26.51   44.47  -55.82   10.26   -7.24
3Y   -39.20   -0.64   35.85  -26.03   -8.84   18.50   66.80  -35.62   18.69
5Y   -41.57   11.84    6.80  -18.49  -33.79  -30.96   13.17   60.79  -42.13
7Y   -40.98   20.12  -16.11   -5.11  -23.02  -30.17  -33.54   -3.32   71.09
10Y  -38.03   23.26  -29.24    7.76   -2.97   -8.93  -20.20  -62.61  -51.78
20Y  -32.85   27.12  -46.95   27.61   41.66   42.12   25.00   31.58    9.36
Step 4: Select a subset of the eigenvectors as a new basis
Chart 5 and Chart 6, in the following section, provide a way to visualize the first three eigenvectors. As can be seen, their shapes are drawn from the first three columns of the matrix in Table 3. I will choose to explain variance in the data using the first three principal components.

Chart 5: First three principal components
[Chart 5: the first three principal components plotted against maturity, 0–20Y]
Step 5: Interpret the results
Recall the yield curves displayed in Chart 3. If we plotted yield curves for all 5000 observations, we might be able to observe some general trends in the behavior of the data, but this would be incredibly onerous and our observations would likely be subjective in nature. This is the point at which reducing data dimensionality becomes an extremely powerful tool. By observing the projection of the data onto the first few principal components, we are able to observe general trends in the data. Charts 5 and 6 are indicative of these common trends, and Chart 7 contextualizes them.

Chart 6: First three principal components
[Chart 6: loadings of the first three principal components across maturities 3M–20Y, labeled Parallel Shift (PC1), Tilt (PC2), and Bend (PC3)]
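Projecting the centered observations onto the first three principal components reduces each 9-dimensional vector of yield changes to three scores; a sketch, again with stand-in random data in place of the treasury series:

```python
import numpy as np

# Stand-in centered data (random; the real input would be the centered
# daily rate changes from Steps 1 and 2).
rng = np.random.default_rng(1)
Xc = rng.standard_normal((500, 9))
Xc -= Xc.mean(axis=0)

# Eigendecomposition of the covariance matrix, ranked by eigenvalue.
Sigma = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]

# New basis: the first three principal components (columns of W).
W = eigvecs[:, order[:3]]

# Projection: each observation becomes a vector of three PC scores.
scores = Xc @ W
assert scores.shape == (500, 3)

# Reconstructing from only three components approximates the data.
Xhat = scores @ W.T
```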
The most prominent trend is the parallel shift, in which all the treasury rates either decrease or increase together. This can easily be observed in the raw time-series data.
Tilt is a less obvious phenomenon, but it does not occur entirely by chance. From time to time, the Fed conducts something called Operation Twist, in which it purchases long-term bonds in order to increase prices and decrease yields while simultaneously selling short-term bonds in order to decrease prices and increase yields. In fact, the Fed is currently adopting this policy so as to make short-term loans more attractive to the public. Ultimately, this causes the yield curve to twist or tilt.

The last behavior, bend, is the least prominent of all the trends. In this instance, short-term rates will be directly related to long-term rates and inversely related to mid-term rates. Only about 4% of the variance is characterized by this behavior.

Conclusion
Principal component analysis is a common mathematical method used in econometrics. By applying concepts from both statistics and linear algebra, patterns in multivariate datasets can be revealed. These patterns may have otherwise gone unnoticed, since visualizing data in higher dimensions is not easy.