
Principal Component Analysis (PCA) and K-means Clustering
Class Note
PETE 630 Geostatistics
PCA
• Principal component analysis (PCA) finds a set
of standardized linear combinations of the original
variables (called principal components) which are
orthogonal and, taken together, explain all of the
variance of the original data.
• In general there are as many principal
components as variables. In most cases, however,
the first few principal components explain most
of the data variance.
PCA Details
• PCA is an orthogonal linear transformation of the data.
• First, normalize the data to a mean of zero and
a variance of unity. This removes the undue
influence of the size or units of the variables
involved.
• Next, construct the covariance matrix (this is
actually a correlation matrix, since the data have
been normalized):

Σx x T



PCA Details
• From matrix algebra, the covariance (correlation) matrix
can be factored into a diagonal matrix and an orthogonal
matrix, Λ and Q respectively

$\Sigma = Q \Lambda Q^T$

• The principal components are given by the following

$\mathbf{y} = Q^T \mathbf{x}$
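A sketch of the factorization and the transform, again with hypothetical data; it takes the columns of Q to be the eigenvectors, which makes $\Sigma = Q \Lambda Q^T$ consistent with $\mathbf{y} = Q^T \mathbf{x}$.

```python
import numpy as np

# Sketch with hypothetical data: factor the covariance matrix as
# Sigma = Q Lambda Q^T (columns of Q are eigenvectors) and form the
# principal components y = Q^T x for every sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = (X - X.mean(0)) / X.std(0)          # standardized as on the earlier slide

Sigma = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(Sigma)      # eigenvalues ascending, vectors in columns
order = np.argsort(eigvals)[::-1]       # largest variance first
Lam, Q = eigvals[order], V[:, order]

Y = X @ Q                               # row-wise y = Q^T x
print(np.round(np.var(Y, axis=0, ddof=1), 3))  # matches the eigenvalues Lam
```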



PCA Example: Well Log Data Analysis

Variance given as:

$\sigma_x^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Covariance given as:

$\operatorname{Cov}(x, y) = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$



Covariance for Each Variable
Covariance matrix given by $\Sigma$, whose $(i, j)$ entry is
$\operatorname{Cov}(x_i, x_j)$, with the variances on the diagonal.

Advantage:
Reduction of the data to a single 6 x 6
covariance matrix (one row and column per log) instead of
the full multiple well log data set
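A sketch of this reduction, using synthetic stand-ins for the six logs in the notes (GR, NPHI, RHOB, DT, log LLD, log MSFL); real curves would be read from a LAS file instead.

```python
import numpy as np

# Sketch: 1000 hypothetical depth samples of 6 log curves collapse to a
# single 6 x 6 covariance matrix.
rng = np.random.default_rng(1)
logs = rng.normal(size=(1000, 6))            # placeholder for real logs
logs = (logs - logs.mean(0)) / logs.std(0)   # standardize each curve

Sigma = np.cov(logs, rowvar=False)           # 6 x 6 summary of the data
print(Sigma.shape)                           # (6, 6)
```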



Eigenvalues
The eigenvalues $\lambda$ of a matrix $A$ are given by the characteristic equation:

$\det(A - \lambda I) = 0$

Corresponding to each eigenvalue there is a non-trivial solution, i.e. $\mathbf{x} \neq 0$:

$A\mathbf{x} = \lambda \mathbf{x}$



Eigenvalues
The sum of the eigenvalues yields the overall variance of the data
set (it equals the trace of the covariance matrix).
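A quick numerical check of this identity on hypothetical data:

```python
import numpy as np

# Sketch: the eigenvalues of the covariance matrix sum to its trace,
# i.e. the total variance of the data set.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
Sigma = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(Sigma)
print(np.isclose(eigvals.sum(), np.trace(Sigma)))   # True
```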



PCA
• Each principal axis is a linear combination of the original
variables (two in the figure below)
• $PC_i = a_{i1}Y_1 + a_{i2}Y_2 + \dots + a_{in}Y_n$
• $a_{ij}$ is the coefficient for component $i$, multiplied by the
measured value of variable $j$
• PC 1 is simultaneously the direction of maximum
variance and a least-squares “line of best fit” (squared
distances of points away from PC 1 are minimized).
[Figure: scatter plot of Variable X1 (horizontal) vs. Variable X2 (vertical) with the orthogonal PC 1 and PC 2 axes overlaid]



PCA

                       PC1    PC2    PC3    PC4    PC5    PC6
GR                   -0.16   0.43  -0.09   0.06  -0.31  -0.02
NPHI                 -0.42  -0.16   0.19   0.86  -0.15   0.08
RHOB                  0.41   0.06  -0.76   0.42   0.27  -0.01
DT                   -0.46   0.21   0.05  -0.04   0.86  -0.09
log(LLD)              0.46   0.13   0.45   0.24   0.13  -0.71
log(MSFL)             0.46   0.20   0.42   0.15   0.25   0.70
Contribution, %       64.5   13.4    8.1    7.4    4.1    2.5
Cum. contribution, %  64.5   77.9   86.0   93.4   97.5  100.0

PC2 = 0.43(GR) − 0.16(NPHI) + 0.06(RHOB) + 0.21(DT) + 0.13 log(LLD) + 0.20 log(MSFL)
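A sketch of using the tabulated loadings; `logs_std` is a hypothetical standardized six-log matrix (columns ordered as in the table), and the eigenvalues shown are illustrative values chosen to be consistent with the contribution row, not taken from the notes.

```python
import numpy as np

# Sketch: a PC score is the dot product of the loading vector with the
# standardized measurements, as in the PC2 equation above.
pc2_loadings = np.array([0.43, -0.16, 0.06, 0.21, 0.13, 0.20])

rng = np.random.default_rng(3)
logs_std = rng.normal(size=(100, 6))     # placeholder for real log data
pc2_scores = logs_std @ pc2_loadings     # PC2 value at each depth sample
print(np.round(pc2_scores[:3], 3))

# Percent contribution of each PC = eigenvalue / sum of eigenvalues.
eigvals = np.array([3.87, 0.804, 0.486, 0.444, 0.246, 0.15])  # illustrative
print(np.round(100 * eigvals / eigvals.sum(), 1))
```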





PCA
[Figure: crossplot of PC 1 (horizontal, interpreted as tightness) vs. PC 2 (vertical)]



K-means Clustering
• A method of cluster analysis which aims to partition n
observations into k clusters in which each observation
belongs to the cluster with the nearest mean.
• Given a set of observations (x1, x2, …, xn), where each
observation is a d-dimensional real vector, k-means
clustering aims to partition the n observations into k sets
(k ≤ n), S = {S1, S2, …, Sk}, so as to minimize the
within-cluster sum of squares:
$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2$

where $\mu_i$ is the mean of the points in $S_i$.



K-means Algorithm
• Uses an iterative refinement technique.
• Given an initial set of k means $m_1, m_2, \dots, m_k$, the
algorithm proceeds by alternating between two steps:
 Assignment step: Assign each observation to the cluster with
the closest mean:


$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\| \le \left\| x_p - m_j^{(t)} \right\| \;\; \forall\, 1 \le j \le k \right\}$

where each $x_p$ goes into exactly one $S_i^{(t)}$, even if it could go
into two of them.
 Updating step: Calculate the new means to be the centroid of
the observations in the cluster.
$m_i^{(t+1)} = \dfrac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$
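A minimal sketch of the full loop; it assumes no cluster empties out during iteration and is not the notes' own code.

```python
import numpy as np

# Sketch of the two alternating steps. Ties in distance are broken by
# argmin (lowest index), so every point lands in exactly one cluster.
def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # initial means
    for _ in range(n_iter):
        # Assignment step: nearest current mean for every observation.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Updating step: centroid of each cluster becomes the new mean.
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_means, means):   # converged
            break
        means = new_means
    return means, labels
```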



Steps for Clustering
• Step 1: k initial “means” (in this case k = 3) are randomly
selected from the data set.
• Step 2: k clusters are created by associating every
observation with the nearest mean.
• Step 3: The centroid of each of the k clusters becomes
the new mean.
• Step 4: Steps 2 and 3 are repeated until convergence
is reached (a library sketch follows below).
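In practice a library implementation is typically used for these steps; a sketch with scikit-learn (assuming it is installed, and with hypothetical data) for k = 3:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: scikit-learn runs the same assignment/update iteration,
# with several random initializations (n_init) to guard against
# poor starting means.
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 2))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final means
print(km.inertia_)           # within-cluster sum of squares
```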



STEP 1
K initial “means” (in this case k = 3) are randomly selected from the data set.
[Figure: data points with three randomly chosen initial means k1, k2, k3]



STEP 2
K clusters are created by associating every observation with the nearest mean.
[Figure: every observation assigned to the nearest of the means k1, k2, k3]



STEP 3
The centroid of each of the k clusters becomes the new mean.

[Figure: means k1, k2, k3 relocated to the centroids of their clusters]



K-means Clustering: Step 4
Steps 2 and 3 are repeated until convergence is reached.

[Figure: converged clusters with the final means k1, k2, k3]



Use of K-Means Clustering
• Strengths
 Relatively efficient, simple, and easy way to classify a
given data set
 May find and terminate at a local optimum solution
• The global optimum may be sought with other
techniques (for example, genetic algorithms)
• Weaknesses
 The number of clusters, k, must be specified in advance
• There is no way of knowing beforehand how many clusters exist
 Sensitive to the initial conditions
• Different initial means can produce different clusterings
(see the restart sketch below)
 Clusters tend to come out roughly circular because
distance is used as the grouping criterion
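A common mitigation for the initialization sensitivity, sketched using the `kmeans` and `wcss` functions from the earlier sketches: run from several random starts and keep the assignment with the lowest within-cluster sum of squares.

```python
# Sketch: multiple random restarts of the earlier `kmeans` sketch,
# scored with the earlier `wcss` sketch; the best run is kept.
def best_of_restarts(X, k, restarts=10):
    best = None
    for seed in range(restarts):
        means, labels = kmeans(X, k, seed=seed)
        score = wcss(X, labels, k)
        if best is None or score < best[0]:
            best = (score, means, labels)
    return best   # (lowest WCSS, its means, its labels)
```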



Discriminant Analysis

 Discriminant analysis aims at classifying unknown samples into one of
several possible groups or classes.
 Classical discriminant function analysis attempts to develop a linear
equation (a linear weighted function of the variables) that best
differentiates between two different classes.
 The approach requires a training data set for which the assignments to
population groups are already known (a supervised method).
Discriminant Analysis

 If two data groups are plotted in a multidimensional space, they
appear as two data clouds with either a distinct separation or some
overlap.
 An axis is located on which the distance between the clouds is
maximized while the dispersion within each cloud is simultaneously
minimized.
 The new axis defines the linear discriminant function and is calculated
from the multivariate means, variances, and covariances of the data
groups.
 The data points of the two groups may be projected onto this axis to
collapse the multidimensional data into a single discriminant variable,
as in the sketch below.
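A sketch of locating this axis for two groups via Fisher's construction (one standard route, shown under the assumption of a pooled covariance; the groups here are synthetic):

```python
import numpy as np

# Sketch: w is proportional to the inverse pooled covariance times the
# difference of the group means; projecting onto w collapses the
# multidimensional data to a single discriminant variable.
rng = np.random.default_rng(6)
X1 = rng.normal(loc=0.0, size=(80, 4))   # training group 1 (synthetic)
X2 = rng.normal(loc=1.5, size=(80, 4))   # training group 2 (synthetic)

S_pooled = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
w = np.linalg.solve(S_pooled, X2.mean(0) - X1.mean(0))

z1, z2 = X1 @ w, X2 @ w        # one discriminant score per sample
print(z1.mean(), z2.mean())    # the groups separate along the new axis
```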
Discriminant Analysis
 We assume each data group is characterized by a group-specific
probability density function $f_c(\mathbf{x})$.
 If the prior probability $p_c$ of each group is also known, then by
Bayes' theorem the posterior distribution of the classes given the
observation $\mathbf{x}$ is

$p(c \mid \mathbf{x}) = \dfrac{p_c\, p(\mathbf{x} \mid c)}{p(\mathbf{x})} = \dfrac{p_c f_c(\mathbf{x})}{p(\mathbf{x})} \propto p_c f_c(\mathbf{x})$

 The observation is allocated to the group with the maximal posterior
probability.
Discriminant Analysis

 By the maximum likelihood rule, the observation is allocated to the
class that minimizes

$Q_c = -2 \log f_c(\mathbf{x}) - 2 \log p_c = (\mathbf{x} - \boldsymbol{\mu}_c)^T \Sigma_c^{-1} (\mathbf{x} - \boldsymbol{\mu}_c) + \log |\Sigma_c| - 2 \log p_c$

(the second equality assumes Gaussian group densities and drops the constant term).
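A sketch of this allocation rule; the group means, covariances, and priors below are illustrative placeholders, where in practice they would be estimated from the training set.

```python
import numpy as np

# Sketch: compute Q_c for each class and allocate x to the class with
# the smallest value (equivalently, the maximal posterior).
def q_score(x, mu, Sigma, prior):
    diff = x - mu
    return (diff @ np.linalg.solve(Sigma, diff)      # Mahalanobis term
            + np.log(np.linalg.det(Sigma))           # log |Sigma_c|
            - 2.0 * np.log(prior))                   # -2 log p_c

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # illustrative means
Sigmas = [np.eye(2), np.array([[1.0, 0.3], [0.3, 0.5]])]
priors = [0.6, 0.4]

x = np.array([1.5, 0.8])
scores = [q_score(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
print(int(np.argmin(scores)))   # index of the assigned class
```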

