
We study phenomena that cannot be directly observed

E.g. ego, personality, and intelligence in psychology: underlying factors that govern the observed data

We want to identify and operate with underlying latent factors rather than the observed data

E.g. topics in news articles, or transcription factors in genomics

"Beautiful car" and "gorgeous automobile" are closely related; so are "driver" and "automobile". But does your search engine know this? Working with the underlying factors reduces noise and error in results

We have too many observations and dimensions

Too many to reason about or obtain insights from, or to visualize; too much noise in the data. We need to reduce them to a smaller set of factors: a better representation of the data without losing much information. We can then build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition

Combinations of observed variables may be more effective bases for insights, even if physical meaning is obscure

Discover a new set of factors/dimensions/axes against which to represent, describe or evaluate the data

For more effective reasoning, insights, or better visualization, and to reduce noise in the data. Typically a smaller set of factors: dimension reduction. A better representation of the data without losing much information; more effective data analyses can be built on the reduced-dimensional space: classification, clustering, pattern recognition

These may be more effective bases for insights, even if their physical meaning is obscure. Observed data are then described in terms of these factors rather than in terms of the original variables/dimensions

Basic Concept

Areas of variance in data are where items can be best discriminated and key underlying phenomena observed

Areas of greatest signal in the data

If two variables are highly correlated, they are likely to represent highly related phenomena. If they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable

Parsimony; reduction in error

So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance. We want a smaller set of variables that explains most of the variance in the original data, in a more compact and insightful form

Basic Concept

What if the dependencies and correlations are not so strong or direct? And suppose you have 3 variables, or 4, or 5, or 10000? Look for the phenomena underlying the observed covariance/co-dependence in a set of variables

Once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance

These phenomena are called factors or principal components or independent components, depending on the methods used

Factor analysis: based on variance/covariance/correlation. Independent Component Analysis: based on independence

PCA is the most common form of factor analysis. The new variables/dimensions:

- are linear combinations of the original ones
- are uncorrelated with one another (orthogonal in the original dimension space)
- capture as much of the original variance in the data as possible
- are called Principal Components

http://www.cs.mcgill.ca/~sqrt/dimr/dimreductio

[Figure: data plotted against Original Variable A and Original Variable B, with the orthogonal directions PC 1 and PC 2 overlaid]

Orthogonal directions of greatest variance in data Projections along PC1 discriminate the data most along any one axis

Principal Components

First principal component is the direction of greatest variability (covariance) in the data Second is the next orthogonal (uncorrelated) direction of greatest variability

So first remove all the variability along the first component, and then find the next direction of greatest variability

And so on

Principle

A linear projection method to reduce the number of parameters. It transforms a set of correlated variables into a new set of uncorrelated variables and maps the data into a space of lower dimensionality. A form of unsupervised learning

Properties

It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables. The new axes are orthogonal and represent the directions of maximum variability

Data points are vectors in a multidimensional space. The projection of a vector x onto an axis (dimension) u is u·x. The direction of greatest variability is the one along which the average squared projection is greatest,

i.e. the u such that E((u·x)²) over all x is maximized (for simplicity we subtract the mean along each dimension, centering the original axis system at the centroid of all data points). This direction of u is the direction of the first Principal Component.

E((u·x)²) = E((u·x)(u·x)ᵀ) = E(u·x·xᵀ·uᵀ). The matrix C = x·xᵀ contains the correlations (similarities) of the original axes, based on how the data values project onto them. So we are looking for the u that maximizes u C uᵀ subject to u being unit-length. This is maximized when u is the principal eigenvector of the matrix C, in which case

u C uᵀ = λ u uᵀ = λ if u is unit-length, where λ is the principal eigenvalue of the correlation matrix C.

The eigenvalue λ denotes the amount of variability captured along that dimension.

Derivation: maximize uᵀ x xᵀ u subject to uᵀu = 1. Construct the Lagrangian uᵀ x xᵀ u − λ uᵀu and set the vector of partial derivatives to zero: x xᵀ u − λ u = (x xᵀ − λI) u = 0. Since u ≠ 0, u must be an eigenvector of x xᵀ with eigenvalue λ.

The first root is called the principal eigenvalue, which has an associated orthonormal (uᵀu = 1) eigenvector u. Subsequent roots are ordered such that λ₁ > λ₂ > … > λ_M, with rank(D) non-zero values. The eigenvectors form an orthonormal basis, i.e. uᵢᵀuⱼ = δᵢⱼ. The eigenvalue decomposition of x xᵀ is U Λ Uᵀ, where U = [u₁, u₂, …, u_M] and Λ = diag[λ₁, λ₂, …, λ_M]. Similarly, the eigenvalue decomposition of xᵀx is V Λ Vᵀ. The SVD is closely related to the above: x = U Λ^(1/2) Vᵀ, where U holds the left eigenvectors, V the right eigenvectors, and the singular values are the square roots of the eigenvalues.
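As a quick sanity check of this derivation, the following sketch (Python/NumPy, on made-up random data) verifies that the leading eigenvector of x·xᵀ carries the largest projected variance and that the singular values of the data matrix are the square roots of those eigenvalues. The variable names and the random data are illustrative assumptions, not from the slides.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))            # 5 dimensions, 200 hypothetical observations
X -= X.mean(axis=1, keepdims=True)       # center each dimension

C = X @ X.T                              # 5x5 scatter matrix (x x^T summed over the data)
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
u = eigvecs[:, -1]                       # principal eigenvector

# variance of the projections along u equals the principal eigenvalue
proj_var = np.sum((u @ X) ** 2)
print(np.isclose(proj_var, eigvals[-1]))              # True

# singular values of X are the square roots of the eigenvalues of X X^T
s = np.linalg.svd(X, compute_uv=False)
print(np.allclose(np.sort(s ** 2), np.sort(eigvals)))  # True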

Similarly for the next axis, etc. So, the new axes are the eigenvectors of the matrix of correlations of the original variables, which captures the similarities of the original variables based on how data samples project to them

Linear transformation

The first PC retains the greatest amount of variation in the sample The kth PC retains the kth greatest fraction of the variation in the sample The kth largest eigenvalue of the correlation matrix C is the variance in the sample along the kth PC The least-squares view: PCs are a series of linear least squares fits to a sample, each orthogonal to all previous ones

For n original dimensions, correlation matrix is nxn, and has up to n eigenvectors. So n PCs. Where does dimensionality reduction come from?

Dimensionality Reduction

Can ignore the components of lesser significance.

[Scree plot: variance (%) captured by PC1 through PC10, decreasing from left to right]

You do lose some information, but if the eigenvalues are small, you don't lose much

For n dimensions in the original data: calculate the n eigenvectors and eigenvalues; choose only the first p eigenvectors, based on their eigenvalues; the final data set then has only p dimensions
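A minimal sketch of this recipe in Python/NumPy. The function name, the random example data, and the choice of p are assumptions for illustration only.

import numpy as np

def pca_reduce(X, p):
    """Project n-dimensional rows of X onto the first p principal components."""
    Xc = X - X.mean(axis=0)                     # center each original dimension
    C = np.cov(Xc, rowvar=False)                # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues ascending
    order = np.argsort(eigvals)[::-1]           # reorder descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()         # fraction of variance per PC (scree info)
    return Xc @ eigvecs[:, :p], explained

# Example: 100 samples in 10 dimensions reduced to 3
X = np.random.default_rng(1).normal(size=(100, 10))
scores, explained = pca_reduce(X, p=3)
print(scores.shape, explained[:3].sum())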

PCA/FA

Principal Components Analysis

Extracts all the factors underlying a set of variables. The number of factors equals the number of variables, and together they completely explain the variance in each variable

Parsimony?!?!?!

Factor Analysis

Also known as Principal Axis Analysis; analyses only the shared variance

NO ERROR!

Documents are vectors in multi-dim Euclidean space

Each term is a dimension in the space: LOTS of them. The coordinate of doc d in dimension t is proportional to TF(d,t)

TF(d,t) is term frequency of t in document d

This gives a t×d matrix, for t terms and d documents

Queries are treated just like documents

Rank documents based on similarity of their vectors with the query vector

Using cosine distance of vectors
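A toy sketch of this vector-space ranking in Python/NumPy; the vocabulary, documents, and query are made up for illustration.

import numpy as np

vocab = ["car", "automobile", "driver", "elephant"]
docs = ["car driver car", "automobile driver", "elephant elephant"]
query = "car"

def tf_vector(text):
    """Term-frequency vector of a text over the fixed vocabulary."""
    words = text.split()
    return np.array([words.count(t) for t in vocab], dtype=float)

D = np.array([tf_vector(d) for d in docs])   # rows = documents, columns = terms
q = tf_vector(query)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b) / denom

scores = [cosine(q, d) for d in D]
print(sorted(range(len(docs)), key=lambda i: -scores[i]))  # documents ranked by similarity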

Problems

Looks for literal term matches

Terms in queries (especially short ones) don't always capture the user's information need well

Problems:

Synonymy: different words with the same meaning

Car and automobile

Polysemy: the same word with multiple meanings

Apple (fruit and company)

What if we could match against concepts that represent related words, rather than against the words themselves?

Example of Problems

-- Relevant docs may not have the query terms but may have many related terms -- Irrelevant docs may have the query terms but may not have any related terms

LSI uses statistically derived conceptual indices instead of individual words for retrieval. It assumes that there is some underlying or latent structure in word usage that is obscured by variability in word choice. Key idea: instead of representing documents and queries as vectors in a t-dimensional space of terms,

represent them (and the terms themselves) as vectors in a lower-dimensional space whose axes are concepts that effectively group together similar words. These axes are the Principal Components from PCA

Example

Suppose we have keywords

Car, automobile, driver, elephant

We want queries on car to also get docs about drivers and automobiles, but not about elephants

What if we could discover that the car, automobile, and driver axes are strongly correlated, but the elephant axis is not? How? Via correlations observed through documents. If docs A and B don't share any words with each other, but both share lots of words with doc C, then A and B will be considered similar. E.g. A has "car" and "driver", B has "automobile" and "driver"

When you scrunch down dimensions, small differences (noise) get glossed over, and you get the desired behavior


Take the vector representation of the query in the original term space and transform it to the new space.

It turns out that d_q = x_qᵀ U S⁻¹ places the query pseudo-document at the centroid of its corresponding terms' locations in the new space.

Similarity with existing docs is then computed by taking dot products with the rows of the document matrix in the reduced space

Definitions

Let t be the total number of index terms Let N be the number of documents Let (Mij) be a term-document matrix with t rows and N columns

Entries are TF-based weights

The matrix (Mij) can be decomposed into 3 matrices (SVD) as follows: (Mij) = (U) (S) (V)t

(U) is the matrix of eigenvectors derived from (M)(M)t (V)t is the matrix of eigenvectors derived from (M)t(M) (S) is an r x r diagonal matrix of singular values

r = min(t, N), that is, the rank of (Mij). The singular values are the positive square roots of the eigenvalues of (M)(M)ᵗ (equivalently, of (M)ᵗ(M))

Example

term               ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9
controllability     1   1   0   0   1   0   0   1
observability       1   0   0   0   1   1   0   1
realization         1   0   1   0   1   0   1   0
feedback            0   1   0   0   0   1   0   0
controller          0   1   0   0   1   1   0   0
observer            0   1   1   0   1   1   0   0
transfer function   0   0   0   0   1   1   0   0
polynomial          0   0   0   0   1   0   1   0
matrices            0   0   0   0   1   0   1   1

U (9x7) =
 0.3996  -0.1037   0.5606  -0.3717  -0.3919  -0.3482   0.1029
 0.4180  -0.0641   0.4878   0.1566   0.5771   0.1981  -0.1094
 0.3464  -0.4422  -0.3997  -0.5142   0.2787   0.0102  -0.2857
 0.1888   0.4615   0.0049  -0.0279  -0.2087   0.4193  -0.6629
 0.3602   0.3776  -0.0914   0.1596  -0.2045  -0.3701  -0.1023
 0.4075   0.3622  -0.3657  -0.2684  -0.0174   0.2711   0.5676
 0.2750   0.1667  -0.1303   0.4376   0.3844  -0.3066   0.1230
 0.2259  -0.3096  -0.3579   0.3127  -0.2406  -0.3122  -0.2611
 0.2958  -0.4232   0.0277   0.4305  -0.3800   0.5114   0.2010

S (7x7) = diag(3.9901, 2.2813, 1.6705, 1.3522, 1.1818, 0.6623, 0.6487)

V (8x7) =  (rows correspond to documents ch2–ch9)
 0.2917  -0.2674   0.3883  -0.5393   0.3926  -0.2112  -0.4505
 0.3399   0.4811   0.0649  -0.3760  -0.6959  -0.0421  -0.1462
 0.1889  -0.0351  -0.4582  -0.5788   0.2211   0.4247   0.4346
-0.0000  -0.0000  -0.0000  -0.0000   0.0000  -0.0000   0.0000
 0.6838  -0.1913  -0.1609   0.2535   0.0050  -0.5229   0.3636
 0.4134   0.5716  -0.0566   0.3383   0.4493   0.3198  -0.2839
 0.2176  -0.5151  -0.4369   0.1694  -0.2893   0.3161  -0.5330
 0.2791  -0.2591   0.6442   0.1593  -0.1648   0.5455   0.2998

This happens to be a rank-7 matrix, so only 7 dimensions are required. Singular values = square roots of the eigenvalues of A Aᵀ

The key idea is to map documents and queries into a lower dimensional space (i.e., composed of higher level concepts which are in fewer number than the index terms) Retrieval in this reduced concept space might be superior to retrieval in the space of index terms

In the matrix (S), select only the k largest values and keep the corresponding columns in (U) and (V)ᵗ. The resultant matrix (M)k is given by

(M)k = (U)k (S)k (V)tk where k, k < r, is the dimensionality of the concept space

k should be large enough to allow fitting the characteristics of the data, yet small enough to filter out the non-relevant representational detail
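A hedged sketch of the rank-k truncation just described, using NumPy's SVD. The helper name lsi_truncate and the random 0/1 matrix are illustrative assumptions.

import numpy as np

def lsi_truncate(M, k):
    """Return U_k, S_k, Vt_k and the rank-k approximation of a term-document matrix M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]
    Mk = Uk @ Sk @ Vtk          # best rank-k approximation in the matrix-norm sense
    return Uk, Sk, Vtk, Mk

# Example with a small random 0/1 term-document matrix
M = (np.random.default_rng(2).random((9, 8)) > 0.6).astype(float)
Uk, Sk, Vtk, Mk = lsi_truncate(M, k=2)
print(Mk.shape)                 # same shape as M, but only rank 2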

In the sense of having to find quantities that are not observable directly. Similarly, transcription factors in biology act as unobservable causal bridges between experimental conditions and gene expression

[Diagram: SVD of the m×n term-document matrix A (rows = terms, columns = documents) into U (m×r), S (r×r), and Vᵀ (r×n). The truncated version keeps Uk (m×k), Sk (k×k), and Vkᵀ (k×n), so that A ≈ Uk Sk Vkᵀ, still m×n.]

Singular Value Decomposition (SVD): convert the term-document matrix into 3 matrices U, S and V

Recreate Matrix: multiply to produce an approximate term-document matrix. Use the new matrix to process queries OR, better, map the query to the reduced space


Keeping only the first two columns of U and V and the top-left 2x2 block of S gives the rank-2 approximation. Formally, this is the rank-k (here k = 2) matrix that is closest to M in the matrix-norm sense.

U2 (9x2) =
 0.3996  -0.1037
 0.4180  -0.0641
 0.3464  -0.4422
 0.1888   0.4615
 0.3602   0.3776
 0.4075   0.3622
 0.2750   0.1667
 0.2259  -0.3096
 0.2958  -0.4232

S2 (2x2) = diag(3.9901, 2.2813)

V2 (8x2) =
 0.2917  -0.2674
 0.3399   0.4811
 0.1889  -0.0351
-0.0000  -0.0000
 0.6838  -0.1913
 0.4134   0.5716
 0.2176  -0.5151
 0.2791  -0.2591

Rank-2 reconstruction U2 S2 V2ᵀ (K = 2, 5 components ignored); rows are the terms in the order above (controllability … matrices), columns are ch2–ch9:

0.52835834 0.42813724 0.30949408 0.0 1.1355368 0.5239192 0.46880865 0.5063048
0.5256176 0.49655432 0.3201918 0.0 1.1684579 0.6059082 0.4382505 0.50338876
0.6729299 -0.015529543 0.29650056 0.0 1.1381099 -0.0052356124 0.82038856 0.6471
-0.0617774 0.76256883 0.10535021 0.0 0.3137232 0.9132189 -0.37838274 -0.06253
0.18889774 0.90294445 0.24125765 0.0 0.81799114 1.0865396 -0.1309748 0.17793834
0.25334513 0.95019233 0.27814224 0.0 0.9537667 1.1444798 -0.071810216 0.2397161
0.21838559 0.55592346 0.19392742 0.0 0.6775683 0.6709899 0.042878807 0.2077163
0.4517898 -0.033422917 0.19505836 0.0 0.75146574 -0.031091988 0.55994695 0.4345
0.60244554 -0.06330189 0.25684044 0.0 0.99175954 -0.06392482 0.75412846 0.5795

Compare with the original 0/1 term-document matrix above.

Rank-4 reconstruction U4 S4 V4ᵀ (K = 4, 3 components ignored):

1.1630535 0.67789733 0.17131016 0.0 0.85744447 0.30088043 -0.025483057 1.0295205
0.7278324 0.46981966 -0.1757451 0.0 1.0910251 0.6314231 0.11810507 1.0620605
0.78863835 0.20257005 1.0048805 0.0 1.0692837 -0.20266426 0.9943222 0.106248446
-0.03825318 0.7772852 0.12343567 0.0 0.30284256 0.89999276 -0.3883498 -0.06326774
0.013223715 0.8118903 0.18630582 0.0 0.8972661 1.1681904 -0.027708884 0.11395822
0.21186034 1.0470067 0.76812166 0.0 0.960058 1.0562774 0.1336124 -0.2116417
-0.18525022 0.31930918 -0.048827052 0.0 0.8625925 0.8834896 0.23821498 0.1617572
-0.008397698 -0.23121 0.2242676 0.0 0.9548515 0.14579195 0.89278513 0.1167786
0.30647483 -0.27917668 -0.101294056 0.0 1.1318822 0.13038804 0.83252335 0.70210195

Rank-6 reconstruction U6 S6 V6ᵀ (K = 6, one component ignored):

1.0299273 1.0099105 -0.029033005 0.0 0.9757162 0.019038305 0.035608776 0.98004794
0.96788234 -0.010319378 0.030770123 0.0 1.0258299 0.9798115 -0.03772955 1.0212346
0.9165214 -0.026921304 1.0805727 0.0 1.0673982 -0.052518982 0.9011715 0.055653755
-0.19373542 0.9372319 0.1868434 0.0 0.15639876 0.87798584 -0.22921464 0.12886547
-0.029890355 0.9903935 0.028769515 0.0 1.0242295 0.98121595 -0.03527296 0.020075336
0.16586632 1.0537577 0.8398298 0.0 0.8660687 1.1044582 0.19631699 -0.11030859
0.035988174 0.01172187 -0.03462495 0.0 0.9710446 1.0226605 0.04260301 -0.023878671
-0.07636017 -0.024632007 0.07358454 0.0 1.0615499 -0.048087567 0.909685 0.050844945
0.05863098 0.019081593 -0.056740552 0.0 0.95253044 0.03693092 1.0695065 0.96087193

Querying

To query for "feedback controller", the query vector would be q = [0 0 0 1 1 0 0 0 0]' (' indicates transpose). The document-space vector corresponding to q is given by Dq = q'*U2*inv(S2): a point at the centroid of the query terms' positions in the new space. For the "feedback controller" query vector, the result is Dq = [0.1376 0.3678]. To find the best document match, we compare the Dq vector against all the document vectors in the 2-dimensional V2 space; the document vector that is nearest in direction to Dq is the best match. The cosine values for the eight document vectors (ch2–ch9) against the query vector are:

-0.3747 0.9671 0.1735 -0.9413 0.0851 0.9642 -0.7265 -0.3805
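A sketch of this query-folding step in Python/NumPy, assuming Uk, Sk, and Vk come from a truncated SVD of the term-document matrix as above (Vk has one row per document). The function names are illustrative, not a prescribed API.

import numpy as np

def fold_in_query(q, Uk, Sk):
    """Project a query term vector q (length = number of terms) into the k-dim concept space."""
    return q @ Uk @ np.linalg.inv(Sk)

def rank_documents(dq, Vk):
    """Cosine similarity of the query pseudo-document against each document row of Vk."""
    sims = []
    for dv in Vk:
        denom = np.linalg.norm(dq) * np.linalg.norm(dv)
        sims.append(0.0 if denom == 0 else float(dq @ dv) / denom)
    return np.argsort(sims)[::-1], sims

# e.g. q = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0], float)  # "feedback controller"
# dq = fold_in_query(q, Uk, Sk); order, sims = rank_documents(dq, Vtk.T)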




[Plot: retrieval performance on Medline data as a function of K, the number of singular values used]

LSI analysis effectively does

Dimensionality reduction; noise reduction; exploitation of redundant data; correlation analysis; and query expansion (with related words)

Some of the individual effects can be achieved with simpler techniques (e.g. thesaurus construction); LSI does them together. LSI handles synonymy well, but not so much polysemy. Challenge: SVD is complex to compute (O(n³))

Needs to be updated as new documents are found/updated

SVD Properties

There is an implicit assumption that the observed data distribution is multivariate Gaussian. PCA/SVD can be considered a probabilistic generative model in which the latent variables are Gaussian; it is sub-optimal in likelihood terms for non-Gaussian distributions. SVD is employed in signal processing for noise filtering: the dominant subspace contains the majority of the information-bearing part of the signal. A similar rationale applies when using SVD for LSI

LSI Conclusions

SVD-defined bases provide precision/recall improvements over term matching

Interpretation is difficult; the optimal dimension is an open question; performance is variable on LARGE collections; supercomputing muscle is required

Clear interpretation of the decomposition; the optimal dimension is an open question; high variability of results due to nonlinear optimisation over a HUGE parameter space

Factor Analysis (e.g. PCA) is not the most sophisticated dimensionality reduction technique

Dimensionality reduction is a useful technique for any classification/regression problem

Text retrieval can be seen as a classification problem

Neural nets, support vector machines etc.

It cannot capture non-linear dependencies between original dimensions (e.g. data that are circularly distributed).

Limitations of PCA

Are the maximal-variance dimensions the relevant dimensions to preserve? Alternatives: Relevant Component Analysis (RCA), Fisher Discriminant Analysis (FDA)

Limitations of PCA

Should the goal be finding independent rather than pairwise-uncorrelated dimensions? Independent Component Analysis (ICA)

PCA vs ICA

Limitations of PCA

The reduction of dimensions for complex distributions may need non-linear processing: Curvilinear Component Analysis (CCA)

A non-linear extension of PCA: it preserves the proximity between points in the input space, i.e. the local topology of the distribution, and enables unfolding of some manifolds in the input data while keeping the local topology

Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces. Eigenfaces are the standardized face ingredients derived from the statistical analysis of many pictures of human faces. A human face may be considered to be a combination of these standard faces

To generate a set of eigenfaces:
1. A large set of digitized images of human faces is taken under the same lighting conditions.
2. The images are normalized to line up the eyes and mouths.
3. The eigenvectors of the covariance matrix of the statistical distribution of face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.

The principal eigenface looks like a bland, androgynous average human face

http://en.wikipedia.org/wiki/Image:Eigenfaces.png

When properly weighted, eigenfaces can be summed together to create an approximate grayscale rendering of a human face. Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces. Hence eigenfaces provide a means of applying data compression to faces for identification purposes. Similarly: Expert Object Recognition in Video
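A hedged sketch of the eigenface computation outlined above, assuming the face images have already been aligned and flattened into equal-length vectors; the function names and the SVD-based shortcut (avoiding the full pixel-by-pixel covariance matrix) are illustrative choices.

import numpy as np

def eigenfaces(face_vectors, k):
    """face_vectors: (num_faces, num_pixels) array. Returns the mean face and the top-k eigenfaces."""
    mean_face = face_vectors.mean(axis=0)
    A = face_vectors - mean_face                 # centered face data
    # SVD of the centered data gives the eigenvectors of the pixel-space covariance
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return mean_face, Vt[:k]                     # each row of Vt[:k] is an eigenface

def face_coefficients(face, mean_face, basis):
    """Weights of a face in the eigenface basis (used for recognition by nearest average)."""
    return basis @ (face - mean_face)

# faces = np.stack([img.ravel() for img in images])   # hypothetical aligned image stack
# mean_face, basis = eigenfaces(faces, k=15)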

Eigenfaces

Experiment and Results: data used here are from the ORL database of faces. Facial images of 16 persons, each with 10 views, are used. Training set contains 167 images; test set contains 163 images. First three eigenfaces:

Save average coefficients for each person. Classify new face as the person with the closest average. Recognition accuracy increases with number of eigenfaces till 15. Later eigenfaces do not help much with recognition.

[Plot: recognition accuracy vs. number of eigenfaces (0–150), for the training and validation sets]

34 patients, dimension of 8973 genes reduced to 2

Plot of 8973 genes, dimension of 34 patients reduced to 2

Data: a subset of sporulation data (477 genes) was classified into seven temporal patterns (Chu et al., 1998). The first 2 PCs contain 85.9% of the variation in the data (Figure 1a). The first 3 PCs contain 93.2% of the variation in the data (Figure 1b)

Sporulation Data

The patterns overlap around the origin in (1a). The patterns are much more separated in (1b).

Variation in processes: Chance

Natural variation inherent in a process. Cumulative effect of many small, unavoidable causes.

Assignable

Variations in raw material, machine tools, mechanical failure and human error. These are accountable circumstances and are normally larger.

Latent variables can sometimes be interpreted as measures of physical characteristics of a process, e.g. temperature or pressure. Variable reduction can increase the sensitivity of a control scheme to assignable causes. Application of PCA to monitoring is increasing

Start with a reference set defining normal operating conditions and look for assignable causes. Generate a set of indicator variables that best describe the dynamics of the process. Note that PCA is sensitive to data types
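One possible sketch of PCA-based monitoring along these lines, in Python/NumPy: fit PCA on the reference (in-control) data, then flag new samples whose squared prediction error is unusually large. The residual statistic and the threshold choice are illustrative assumptions, not a prescribed control limit.

import numpy as np

def fit_reference(X_ref, p):
    """Fit a p-component PCA model to reference (normal-operation) data."""
    mean = X_ref.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, Vt[:p]                          # retained loadings (p x n_vars)

def spe(x, mean, loadings):
    """Squared prediction error of one sample with respect to the PCA model."""
    xc = x - mean
    recon = loadings.T @ (loadings @ xc)         # project onto the PCs and back
    return float(np.sum((xc - recon) ** 2))

# X_ref = ... reference data, shape (n_samples, n_vars)
# mean, loadings = fit_reference(X_ref, p=3)
# alarm = spe(new_sample, mean, loadings) > threshold   # threshold chosen from X_ref residuals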

Cocktail-party Problem

Multiple (independent) sound sources in a room; multiple sensors receive signals that are mixtures of the original signals. The task is to estimate the original source signals from the mixtures of received signals. This can be viewed as Blind Source Separation, since the mixing parameters are not known

Cocktail party or Blind Source Separation (BSS) problem

An ill-posed problem, unless assumptions are made!

The most common assumption is that the source signals are statistically independent: knowing the value of one of them gives no information about the others. Methods based on this assumption are called Independent Component Analysis methods

statistical techniques for decomposing a complex data set into independent parts.

It can be shown that under some reasonable conditions, if the ICA assumption holds, then the source signals can be recovered up to permutation and scaling.

[Diagram: the signals from Microphone 1 and Microphone 2 are combined through unmixing weights W11, W12, W21, W22 to produce Separation 1 and Separation 2]

Four sources s1, s2, s3, s4; microphone i records xi(t) = ai1*s1(t) + ai2*s2(t) + ai3*s3(t) + ai4*s4(t), for i = 1..4. In vector-matrix notation, and dropping the index t, this is x = A*s

[Diagram: the sources s1–s4 are mixed through coefficients a11, a12, a13, a14 (and similarly for the other microphones) to give the recorded signals x1–x4.]

This is what the microphones record: a linear mixture of the sources, xi(t) = ai1*s1(t) + ai2*s2(t) + ai3*s3(t) + ai4*s4(t)

Recovered signals

BSS

Problem: Determine the source signals s, given only the mixtures x.

If we knew the mixing parameters aij, then we would just need to solve a linear system of equations. But we know neither aij nor si. ICA was initially developed to deal with problems closely related to the cocktail party problem; later it became evident that ICA has many other applications
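A hedged sketch of blind source separation for this setting, using scikit-learn's FastICA (one standard ICA implementation among several); the simulated sources and mixing matrix are made up for illustration.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                # source 2: square wave (non-Gaussian, independent)
S = np.c_[s1, s2]                          # true sources, shape (2000, 2)

A = np.array([[1.0, 0.5],                  # "unknown" mixing matrix, used only to simulate
              [0.4, 1.0]])
X = S @ A.T                                # what the microphones record: x = A s

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)               # recovered sources (up to permutation and scaling)
print(S_est.shape, ica.mixing_.shape)      # (2000, 2), (2, 2)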

E.g. from electrical recordings of brain activity at different locations on the scalp (EEG signals), recover the underlying components of brain activity

ICA is a statistical method whose goal is to decompose given multivariate data into a linear sum of statistically independent components. For example, given a two-dimensional vector x = [x1 x2]ᵀ, ICA aims at finding the decomposition

  [x1]   [a11]        [a12]
  [x2] = [a21] * s1 + [a22] * s2

i.e. x = a1*s1 + a2*s2, where a1, a2 are basis vectors and s1, s2 are basis coefficients. Constraint: the basis coefficients s1 and s2 are statistically independent.

Blind source separation; image denoising; medical signal processing (fMRI, ECG, EEG); modelling of the hippocampus and visual cortex; feature extraction, face recognition; compression, redundancy reduction; watermarking; clustering; time series analysis (stock market, microarray data); topic extraction; econometrics: finding hidden factors in financial data

Image denoising

[Figure: image denoising comparison showing the original image, the noisy image, the result of Wiener filtering, and the result of ICA filtering]

Approaches to ICA

Unsupervised Learning

Factorial coding (minimum entropy coding, redundancy reduction); maximum likelihood learning; nonlinear information maximization (entropy maximization); negentropy maximization; Bayesian learning

Higher-order moments or cumulants; joint approximate diagonalization; maximum likelihood estimation

In the heart, PET can:

quantify the extent of heart disease, and calculate myocardial blood flow or metabolism quantitatively

Static image

accumulating the data during the acquisition

Dynamic image

for quantitative analysis (e.g. blood flow, metabolism): the data are acquired sequentially at time intervals

[Diagram: a dynamic PET study as a sequence of frames (frame 1, 2, 3, …, n) stacked along the temporal axis, each frame covering the spatial dimensions]

[Diagram: the dynamic frames y1, y2, y3, …, yN (plus noise) are passed through an unmixing stage g(u) to yield components u1, u2, u3, …, uN representing elementary activities. The extracted independent components correspond to the right ventricle, the left ventricle, and tissue]

In cancer, PET can:

distinguish benign from malignant tumors, and stage cancer by showing metastases anywhere in the body

In the brain, PET can: calculate cerebral blood flow or metabolism quantitatively; positively diagnose Alzheimer's disease for early intervention; locate tumors in the brain and distinguish tumor from scar tissue; locate the focus of seizures for some patients with epilepsy; and find regions related to specific tasks like memory and behavior

Eigenfaces (PCA/FA): finds a linear data representation that best models the covariance structure of face image data (second-order relations). Factorial Faces (Factorial Coding/ICA): finds a linear data representation that best models the probability distribution of the face image data

[Illustration: a face image expressed as a weighted sum of basis images. With eigenfaces (PCA), face ≈ a1*(basis 1) + a2*(basis 2) + a3*(basis 3) + a4*(basis 4) + … + an*(basis n); with factorial-code/ICA bases, face ≈ b1*(basis 1) + b2*(basis 2) + … + bn*(basis n)]

Experimental Results

Figure 1. Sample images in the training set. (neutral expression, anger, and right-light-on from first session; smile and left-light-on from second session)

Figure 2. Sample images in the test set. (smile and left-light-on from first session; neutral expression, anger, and right-light-on from second session)

(a)

(b)

Figure 3. First 20 basis images: (a) in the eigenface method; (b) in the factorial code. They are ordered by column, then by row.

Experimental Results

Figure 1. The comparison of recognition performance using Nearest Neighbor: the eigenface method and Factorial Code Representation using 20 or 30 principal components in the snap-shot method;

Experimental Results

Figure 2. The comparison of recognition performance using MLP Classifier: the eigenface method and Factorial Code Representation using 20 or 50 principal components in the snap-shot method;

PCA vs ICA

Both are linear transforms, used for compression and classification.

PCA: focuses on uncorrelated and Gaussian components; second-order statistics; orthogonal transformation

ICA: focuses on independent and non-Gaussian components; higher-order statistics; non-orthogonal transformation

What if some components are Gaussian and some are non-Gaussian?

All non-Gaussian components can be estimated; for the Gaussian components, only a linear combination can be estimated. If there is only one Gaussian component, the model can be estimated
