PCA

Def: PCA is a statistical technique applied to complex data sets aiming at reducing the dimensionality of the data while simultaneously retaining the maximum amount of variance. Dimensionality reduction is achieved by creating a set of new variables called principal components (PCs), which are linear combinations of the original variables.

Informal method (for 2 variables): change the coordinate system so that the new x-axis points in the direction of maximum variance, project the data onto that axis, and ignore the second dimension.

1. Compute the covariance matrix Σ
2. Compute the eigenvalues and the corresponding eigenvectors of Σ (eigendecomposition Σ = VΛV^T)
3. Select the k biggest eigenvalues and their eigenvectors (V')
4. The k selected eigenvectors represent an orthogonal basis
5. Transform the original n × d data matrix D with the d × k basis V' (see the sketch below)
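
A minimal NumPy sketch of these five steps (purely illustrative; the function name pca and the random example data are assumptions, and the data are mean-centred before computing Σ):

import numpy as np

def pca(D, k):
    """Project the n x d data matrix D onto its first k principal components."""
    Dc = D - D.mean(axis=0)                    # centre the data
    Sigma = np.cov(Dc, rowvar=False)           # 1. d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # 2. eigendecomposition Sigma = V Lambda V^T
    order = np.argsort(eigvals)[::-1][:k]      # 3. indices of the k largest eigenvalues
    V_k = eigvecs[:, order]                    # 4. d x k orthogonal basis V'
    return Dc @ V_k                            # 5. n x k matrix of transformed data

# Example: reduce 5-dimensional data to 2 principal components
scores = pca(np.random.rand(100, 5), k=2)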

I.
- As PC1 usually captures the majority of the variance, the original (potentially high-dimensional) data can be projected onto a subset of a few essential principal components only, while higher dimensions can be discarded without a major loss of information

- The input data need to be continuous, real-valued variables measured on an interval or ratio scale, because standard PCA investigates patterns of covariance/correlation, which only makes sense for such variables

II.
- Covariance/correlation measures require that the relationship between each pair of variables is linear (a logarithmic transformation can be applied if the relationship is non-linear).
- Screening for outliers should be performed prior to the analysis, since outliers can affect the size of the covariance/correlation

- An alternative approach to calculating the principal components is to use the eigenvectors of the correlation matrix R as weighting coefficients. Recall that R is a symmetric matrix whose off-diagonal elements represent the Pearson correlation coefficients between pairs of variables in a data set. The entries on the main diagonal are all equal to 1 because they correspond to the correlation of a variable with itself
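
A small NumPy sketch of this equivalence (the random data matrix is a placeholder): PCA based on the correlation matrix R is the same as PCA based on the covariance matrix of the standardized (z-scored) variables.

import numpy as np

D = np.random.rand(100, 5)                     # placeholder data matrix

# R: symmetric, ones on the diagonal, Pearson correlations off the diagonal
R = np.corrcoef(D, rowvar=False)

# Equivalent route: standardize each variable, then take the covariance matrix
Z = (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)
assert np.allclose(R, np.cov(Z, rowvar=False))

# The eigenvectors of R serve as the weighting coefficients
eigvals, eigvecs = np.linalg.eigh(R)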

III.
- A major issue in PCA is to decide how many components to retain in order to keep a sufficient amount of variance while, at the same time, achieving a substantial reduction in dimensionality

- Probably the most popular approach is the scree plot, where the principal components are plotted on the x-axis in descending order against their corresponding eigenvalues. This gives a decreasing function showing the variance explained by each PC. The plot often shows a clear bend (the so-called "elbow") separating the 'most important' components from the 'least important' ones. All components to the right of the break point can be discarded. The disadvantage of this method is that visual inspection of the scree plot is a subjective way to identify the correct number of principal components. Furthermore, in some practical applications it might be difficult to determine the cut-off point where the slope of the line through the eigenvalues changes the most.

A good rule of thumb is to keep enough components to explain 85% of the variation.
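
A scikit-learn sketch of both criteria (the data matrix X and the plotting line are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)                    # placeholder data

pca = PCA().fit(X)
explained = pca.explained_variance_ratio_      # share of variance per PC, descending

# Scree-plot data: eigenvalues (pca.explained_variance_) against component number,
# e.g. plt.plot(range(1, X.shape[1] + 1), pca.explained_variance_, marker="o")

# Rule of thumb: keep the smallest k whose cumulative share reaches 85%
k = int(np.searchsorted(np.cumsum(explained), 0.85)) + 1
print(f"Keep {k} components to explain at least 85% of the variance")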


A deprecated method is Kaiser's rule: keep only the components whose eigenvalues are greater than 1.
A more sophisticated technique is parallel analysis. The method compares the eigenvalues generated from the data matrix to the eigenvalues generated from a Monte-Carlo simulated matrix created from random data of the same size. Here, PCA is performed on a simulated data set with as many variables and cases as there are in the original data set. Averaged eigenvalues from the simulated data are compared to the eigenvalues obtained from the real data. Components from the real data with eigenvalues lower than the eigenvalues for the simulated data are discarded.
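
A rough Monte-Carlo sketch of parallel analysis (the number of simulations and the use of standard-normal random data are assumptions):

import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    # Eigenvalues of the real data's correlation matrix, sorted in descending order
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    # Average eigenvalues over simulated data sets of the same size as X
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sim = np.zeros(d)
    for _ in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, d)), rowvar=False)
        sim += np.sort(np.linalg.eigvalsh(R))[::-1]
    sim /= n_sims
    # Keep components whose real eigenvalue exceeds the average simulated eigenvalue
    return int(np.sum(real > sim))

n_keep = parallel_analysis(np.random.rand(200, 10))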

IV. Interpretation

Principal components are interpreted based on the original variables which "load" on them. Loadings correspond to correlations or covariances between the original variables and the principal components. Variable loadings are stored in a loading matrix, A, which is produced by multiplying the matrix of eigenvectors by a diagonal matrix containing the square roots of the corresponding eigenvalues, i.e. A = V·Λ^(1/2)
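
In NumPy terms, a sketch of the loading matrix (continuing with a correlation-matrix decomposition; the placeholder data and variable names are assumptions):

import numpy as np

X = np.random.rand(200, 5)                     # placeholder data
eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]              # sort eigenpairs in descending order
eigvals, V = eigvals[order], V[:, order]

# Loading matrix A = V * diag(sqrt(eigenvalues));
# entry A[i, j] is the loading of variable i on principal component j
A = V @ np.diag(np.sqrt(eigvals))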

t-distributed stochastic neighbor embedding (t-SNE)


Def: t-Distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding. In this way, t-SNE maps the multi-dimensional data to a lower-dimensional space and attempts to find patterns in the data by identifying observed clusters based on the similarity of data points with multiple features. However, after this process, the input features are no longer identifiable, and you cannot make any inference based only on the output of t-SNE. Hence it is mainly a data exploration and visualization technique.

- can map d-dimensional data to 2 dimensions and visualize it very well


- Stochastic Neighbor Embedding (SNE) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities. The similarity of datapoint x_j to datapoint x_i is the conditional probability, p_{j|i}, that x_i would pick x_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at x_i (see the formula below).
- See also the Student t-distribution, whose heavy tails are used for the similarities in the low-dimensional map (hence the "t" in t-SNE).
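
The conditional similarity used in SNE, a Gaussian kernel of bandwidth σ_i centred at x_i (as in the original SNE/t-SNE formulation):

p_{j|i} = \frac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)}, \qquad p_{i|i} = 0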

PCA vs. t-SNE


o t-SNE is computationally expensive and can take several hours on million-sample datasets, where PCA will finish in seconds or minutes.
o PCA is a mathematical (deterministic) technique, but t-SNE is a probabilistic one.
o Linear dimensionality reduction algorithms like PCA concentrate on placing dissimilar data points far apart in the lower-dimensional representation. But in order to represent high-dimensional data on a low-dimensional, non-linear manifold, it is essential that similar data points are represented close together, which is something t-SNE does but PCA does not.
o In t-SNE, different runs with the same hyperparameters may produce different results; hence multiple plots must be observed before making any assessment with t-SNE, while this is not the case with PCA.
o Since PCA is a linear algorithm, it will not be able to capture complex polynomial relationships between features, while t-SNE is made to capture exactly that.
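
A usage sketch contrasting the two in scikit-learn (the data set and parameter values are illustrative; fixing random_state addresses the run-to-run variability mentioned above):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(1000, 50)                   # placeholder high-dimensional data

# PCA: linear and deterministic, runs in seconds
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear and stochastic, much slower on large data;
# PCA initialisation and a fixed random_state make runs more reproducible
X_tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)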
