Value Decomposition
—Multidimensional Scaling
[Figure: two scatter plots on axes running from -4 to 4. First panel: the 1st principal component and the rest; remove the rest for dimension reduction. Second panel: after projection.]
Principal Component Analysis (PCA)
—PCA is a way of expressing a high-dimensional data set via an alternative set of dimensions that are
– Orthogonal to each other
– Easy to rank by how “representative” they are
– Not always easy to interpret
—It is useful because it allows visualization of the data in the most representative dimensions
—It may (or may not) be useful for clustering and phenotype prediction
Principal Component Analysis
—Rotate the data so that the new coordinates are based on the correlation structure of the data
—Select a subset of the new coordinates with high variability, and use it for data visualization, summarization and clustering
—E.g. clustering tumor samples using PCA on expression profiles
PCA COMPUTATIONS
The eigenvectors and eigenvalues can be found by solving the corresponding set of eigenvalue equations.
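As a minimal sketch of this computation, assume the equations in question are the standard eigenvalue problem C v = λ v for the sample covariance matrix C of the centered data (the slides do not show C explicitly, so this is an assumption). The function name `pca_eig` and the toy data are hypothetical and only for illustration.

```python
import numpy as np

def pca_eig(X, k):
    """PCA of an (n_samples, n_features) array via the covariance eigenproblem."""
    # Center each feature so the covariance is taken about the mean.
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix C (features x features).
    C = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Reorder so the direction of largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the centered data onto the top-k principal components.
    scores = Xc @ eigvecs[:, :k]
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points stretched along the direction (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, t]) * 2.0 + rng.normal(scale=0.3, size=(200, 2))

eigvals, eigvecs, scores = pca_eig(X, k=1)
print("fraction of variance per component:", eigvals / eigvals.sum())
print("1st principal component direction:", eigvecs[:, 0])
```

Keeping only the first column of `scores` corresponds to keeping the 1st principal component and removing the rest. The eigenvectors are orthogonal, and the eigenvalues rank them by the variance they capture.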
The read my lips example
[Figure: a data panel and two PCA panels.]
Cell Cycle Experiments
Fig. 1. Normalized elutriation eigengenes. (a) Raster display of the expression of 14 eigengenes in 14 arrays. (b) Bar chart of the fractions of eigenexpression, showing that two eigengenes capture about 20% of the overall normalized expression each, and a high entropy d = 0.88. (c) Line-joined graphs of the expression levels of two eigengenes (red and blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase 2π/13, respectively.
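The "fractions of eigenexpression" and the entropy d quoted in this caption can be sketched as follows. This assumes the definitions commonly used in SVD analyses of expression data: the fraction for each eigengene is its squared singular value divided by the sum of all squared singular values, and d is the Shannon entropy of those fractions divided by the log of their number. The slides do not state these formulas, so treat both as assumptions; the function names and the random matrix are hypothetical.

```python
import numpy as np

def eigenexpression_fractions(E):
    """Fractions p_l = s_l^2 / sum_k s_k^2 from the singular values of E."""
    s = np.linalg.svd(E, compute_uv=False)
    return s**2 / np.sum(s**2)

def normalized_entropy(p):
    """Shannon entropy of the fractions, scaled to lie in [0, 1]."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(nz * np.log(nz)) / np.log(len(p)))

rng = np.random.default_rng(1)
# Hypothetical stand-in for a genes x arrays expression matrix (not real data).
E = rng.normal(size=(100, 14))

p = eigenexpression_fractions(E)
print("fractions of eigenexpression:", np.round(p, 3))
print("normalized entropy d:", round(normalized_entropy(p), 3))
```

Under these definitions, d near 0 means a single eigengene dominates, while d near 1 means the expression is spread almost evenly over all eigengenes.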
Fig. 2. Normalized elutriation expression in the subspace associated with the cell cycle. (a) Array correlation with one component along the y-axis vs. that with a second component along the x-axis, color-coded according to the classification of the arrays into the five cell cycle stages, M/G1 (yellow), G1 (green), S (blue), S/G2 (red), and G2/M (orange). The dashed unit and half-unit circles outline 100% and 25% of overall normalized array expression in the subspace spanned by the two components. (b) Correlation of each gene with one component vs. that with a second, for 784 cell cycle regulated genes, color-coded according to the classification by Spellman et al. (3).
Fig. 3. Genes sorted by relative correlation with two eigengenes of normalized elutriation. (a) Normalized elutriation expression of the sorted 5,981 genes in the 14 arrays, showing traveling wave of expression. (b) Eigenarrays expression; the expression of the two eigenarrays corresponding to these eigengenes displays the sorting. (c) Expression levels of the two eigenarrays (red and green) fit normalized sine and cosine functions of period Z = N - 1 = 5,980 and phase 2π/13 (blue), respectively.
Fig. 4. Rotated normalized α factor, CLB2, and CLN3 eigengenes. (a) Raster display of the rotated normalized eigengenes. (b) Two eigengenes capture 20% of the overall normalized expression each. (c) Expression levels of two eigengenes (red and blue) fit dashed graphs of normalized sine (red) and cosine (blue) of period T/2 = 66 min and phase π/4, respectively, and a third eigengene (green) fits dashed graph of normalized sine of period T = 112 min and phase -π/8, from t = 7 to t = 119 min during the cell cycle.
Fig. 5. Rotated normalized α factor, CLB2, and CLN3 expression in the subspace associated with the cell cycle. (a) Array correlation with one component along the y-axis vs. that with a second component along the x-axis, color-coded according to the classification of the arrays into the five cell cycle stages, M/G1 (yellow), G1 (green), S (blue), S/G2 (red), and G2/M (orange). The dashed unit and half-unit circles outline 100% and 25% of overall normalized array expression in the subspace spanned by the two components. (b) Correlation of each gene with one component vs. that with a second, for 638 cell cycle regulated genes, color-coded according to the classification by Spellman et al. (3).
Fig. 6. Genes sorted by relative correlation with two eigengenes of rotated normalized α factor, CLB2, and CLN3. (a) Normalized expression of the sorted 4,579 genes in the 22 arrays, showing traveling wave of expression from t = 0 to 119 min during the cell cycle and standing waves of expression in the CLB2- and CLN3-overactive arrays. (b) Eigenarrays expression; the expression of the two eigenarrays corresponding to these eigengenes displays the sorting. (c) Expression levels of the two eigenarrays (red and green) fit normalized sine and cosine functions of period Z = N - 1 = 4,578 and phase π/8 (blue), respectively.
MDS: Multidimensional Scaling
—PCA requires a vector representation of the data
—It can be interesting to start instead from pairwise distances between n points
—How can we do dimension reduction and visualization in this case?
—Find coordinates for the points in d-dimensional space such that the distances are preserved as well as possible
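One way to find such coordinates is classical (Torgerson) MDS, sketched below. The slides do not say which MDS variant is meant, so this choice is an assumption; it also assumes the input distances are Euclidean. The function name `classical_mds` and the toy points are hypothetical.

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed n points in d dimensions from an (n, n) matrix of pairwise distances."""
    n = D.shape[0]
    # Centering matrix: removes row and column means.
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give the Gram (inner product) matrix.
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]
    # Clip small negative eigenvalues caused by round-off or non-Euclidean input.
    lam = np.clip(eigvals[order], 0.0, None)
    # Coordinates are eigenvectors scaled by the square roots of the eigenvalues.
    return eigvecs[:, order] * np.sqrt(lam)

rng = np.random.default_rng(2)
# Hypothetical toy example: 6 points in the plane, known only through distances.
P = rng.normal(size=(6, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

Y = classical_mds(D, d=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print("largest distance error:", np.abs(D - D_hat).max())
```

The recovered coordinates `Y` match the original points only up to rotation, reflection and translation, since distances do not determine those. When the distances are not exactly Euclidean, or d is smaller than the true dimension, the distances are preserved only approximately.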
Multidimensional Scaling