
Principal Component Analysis (PCA) and K-means Clustering
Class Note
PETE 630 Geostatistics
PCA
• Principal component analysis (PCA) finds a set
of standardized linear combinations of the original
variables (called principal components) which are
orthogonal and, taken together, explain all of the
variance of the original data.
• In general there are as many principal
components as variables. In most cases, however,
the first few principal components explain most
of the data variance.
PCA Details
• PCA is an orthogonal linear transformation of the data.
• First, normalize the data to a mean of zero and
a variance of unity. This removes the undue
influence of the size or units of the variables
involved.
• Next, construct the covariance matrix (this is
actually a correlation matrix, since the data have
been normalized):

Σx x T



PCA Details
• From matrix algebra, the covariance (correlation) matrix
can be factored into a diagonal matrix and an orthogonal
matrix, Λ and Q respectively

$\Sigma = Q \Lambda Q^T$

• The principal components are given by the following

$\mathbf{y} = Q^T \mathbf{x}$
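A sketch of the factorization and the transform, again with hypothetical data; it takes the columns of Q to be the eigenvectors, which makes $\Sigma = Q \Lambda Q^T$ consistent with $\mathbf{y} = Q^T \mathbf{x}$.

```python
import numpy as np

# Sketch with hypothetical data: factor the covariance matrix as
# Sigma = Q Lambda Q^T (columns of Q are eigenvectors) and form the
# principal components y = Q^T x for every sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = (X - X.mean(0)) / X.std(0)          # standardized as on the earlier slide

Sigma = np.cov(X, rowvar=False)
eigvals, V = np.linalg.eigh(Sigma)      # eigenvalues ascending, vectors in columns
order = np.argsort(eigvals)[::-1]       # largest variance first
Lam, Q = eigvals[order], V[:, order]

Y = X @ Q                               # row-wise y = Q^T x
print(np.round(np.var(Y, axis=0, ddof=1), 3))  # matches the eigenvalues Lam
```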



PCA Example: Well Log Data Analysis

Variance given as:

$\sigma_x^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Covariance given as:

$\operatorname{Cov}(x, y) = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$



Covariance for Each Variable
Covariance matrix given by $\Sigma$, whose $(i, j)$ entry is
$\operatorname{Cov}(x_i, x_j)$, with the variances on the diagonal.

Advantage:
Reduction of the data to a single 6 x 6
covariance matrix (one row and column per log) instead of
the full multiple well log data set
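A sketch of this reduction, using synthetic stand-ins for the six logs in the notes (GR, NPHI, RHOB, DT, log LLD, log MSFL); real curves would be read from a LAS file instead.

```python
import numpy as np

# Sketch: 1000 hypothetical depth samples of 6 log curves collapse to a
# single 6 x 6 covariance matrix.
rng = np.random.default_rng(1)
logs = rng.normal(size=(1000, 6))            # placeholder for real logs
logs = (logs - logs.mean(0)) / logs.std(0)   # standardize each curve

Sigma = np.cov(logs, rowvar=False)           # 6 x 6 summary of the data
print(Sigma.shape)                           # (6, 6)
```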



Eigenvalues
The eigenvalues $\lambda$ of a matrix $A$ are given by the characteristic equation:

$\det(A - \lambda I) = 0$

Corresponding to each eigenvalue there is a non-trivial solution, i.e. $\mathbf{x} \neq 0$:

$A\mathbf{x} = \lambda \mathbf{x}$



Eigenvalues
The sum of the eigenvalues yields the overall variance of the data
set (it equals the trace of the covariance matrix).
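A quick numerical check of this identity on hypothetical data:

```python
import numpy as np

# Sketch: the eigenvalues of the covariance matrix sum to its trace,
# i.e. the total variance of the data set.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
Sigma = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(Sigma)
print(np.isclose(eigvals.sum(), np.trace(Sigma)))   # True
```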



PCA
• Each principal axis is a linear combination of the original
variables (two in the figure below)
• $PC_i = a_{i1}Y_1 + a_{i2}Y_2 + \dots + a_{in}Y_n$
• $a_{ij}$ is the coefficient for component $i$, multiplied by the
measured value of variable $j$
• PC 1 is simultaneously the direction of maximum
variance and a least-squares “line of best fit” (squared
distances of points away from PC 1 are minimized).
[Figure: scatter plot of Variable X1 (horizontal) vs. Variable X2 (vertical) with the orthogonal PC 1 and PC 2 axes overlaid]



PCA

                       PC1    PC2    PC3    PC4    PC5    PC6
GR                   -0.16   0.43  -0.09   0.06  -0.31  -0.02
NPHI                 -0.42  -0.16   0.19   0.86  -0.15   0.08
RHOB                  0.41   0.06  -0.76   0.42   0.27  -0.01
DT                   -0.46   0.21   0.05  -0.04   0.86  -0.09
log(LLD)              0.46   0.13   0.45   0.24   0.13  -0.71
log(MSFL)             0.46   0.20   0.42   0.15   0.25   0.70
Contribution, %       64.5   13.4    8.1    7.4    4.1    2.5
Cum. contribution, %  64.5   77.9   86.0   93.4   97.5  100.0

PC2 = 0.43(GR) − 0.16(NPHI) + 0.06(RHOB) + 0.21(DT) + 0.13 log(LLD) + 0.20 log(MSFL)
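A sketch of using the tabulated loadings; `logs_std` is a hypothetical standardized six-log matrix (columns ordered as in the table), and the eigenvalues shown are illustrative values chosen to be consistent with the contribution row, not taken from the notes.

```python
import numpy as np

# Sketch: a PC score is the dot product of the loading vector with the
# standardized measurements, as in the PC2 equation above.
pc2_loadings = np.array([0.43, -0.16, 0.06, 0.21, 0.13, 0.20])

rng = np.random.default_rng(3)
logs_std = rng.normal(size=(100, 6))     # placeholder for real log data
pc2_scores = logs_std @ pc2_loadings     # PC2 value at each depth sample
print(np.round(pc2_scores[:3], 3))

# Percent contribution of each PC = eigenvalue / sum of eigenvalues.
eigvals = np.array([3.87, 0.804, 0.486, 0.444, 0.246, 0.15])  # illustrative
print(np.round(100 * eigvals / eigvals.sum(), 1))
```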





PCA
[Figure: crossplot of PC 1 (horizontal, interpreted as tightness) vs. PC 2 (vertical)]



K-means Clustering
• A method of cluster analysis which aims to partition n
observations into k clusters in which each observation
belongs to the cluster with the nearest mean.
• Given a set of observations (x1, x2, …, xn), where each
observation is a d-dimensional real vector, k-means
clustering aims to partition the n observations into k sets
(k ≤ n), S = {S1, S2, …, Sk}, so as to minimize the
within-cluster sum of squares:
$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2$

where $\mu_i$ is the mean of the points in $S_i$.



K-means Algorithm
• Uses an iterative refinement technique.
• Given an initial set of k means $m_1, m_2, \dots, m_k$, the
algorithm proceeds by alternating between two steps:
 Assignment step: Assign each observation to the cluster with
the closest mean:


$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\| \le \left\| x_p - m_j^{(t)} \right\| \;\; \forall\, 1 \le j \le k \right\}$

where each $x_p$ goes into exactly one $S_i^{(t)}$, even if it could go
into two of them.
 Updating step: Calculate the new means to be the centroid of
the observations in the cluster.
$m_i^{(t+1)} = \dfrac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$
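A minimal sketch of the full loop; it assumes no cluster empties out during iteration and is not the notes' own code.

```python
import numpy as np

# Sketch of the two alternating steps. Ties in distance are broken by
# argmin (lowest index), so every point lands in exactly one cluster.
def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # initial means
    for _ in range(n_iter):
        # Assignment step: nearest current mean for every observation.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Updating step: centroid of each cluster becomes the new mean.
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_means, means):   # converged
            break
        means = new_means
    return means, labels
```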



Steps for Clustering
• Step 1: k initial “means” (in this case k = 3) are randomly
selected from the data set.
• Step 2: k clusters are created by associating every
observation with the nearest mean.
• Step 3: The centroid of each of the k clusters becomes
the new mean.
• Step 4: Steps 2 and 3 are repeated until convergence
is reached (a library sketch follows below).
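In practice a library implementation is typically used for these steps; a sketch with scikit-learn (assuming it is installed, and with hypothetical data) for k = 3:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: scikit-learn runs the same assignment/update iteration,
# with several random initializations (n_init) to guard against
# poor starting means.
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 2))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final means
print(km.inertia_)           # within-cluster sum of squares
```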



STEP 1
K initial “means” (in this case k = 3) are randomly selected from the data set.
[Figure: data points with three randomly chosen initial means k1, k2, k3]



STEP 2
K clusters are created by associating every observation with the nearest mean.
[Figure: every observation assigned to the nearest of the means k1, k2, k3]



STEP 3
The centroid of each of the k clusters becomes the new mean.

[Figure: means k1, k2, k3 relocated to the centroids of their clusters]



K-means Clustering: Step 4
Steps 2 and 3 are repeated until convergence is reached.

[Figure: converged clusters with the final means k1, k2, k3]



Use of K-Means Clustering
• Strengths
 Relatively efficient, simple, and easy way to classify a
given data set
 May find and terminate at a local optimum solution
• The global optimum may be sought with other
techniques (for example, genetic algorithms)
• Weaknesses
 The number of clusters, k, must be specified in advance
• There is no way of knowing beforehand how many clusters exist
 Sensitive to the initial conditions
• Different initial means can produce different clusterings
(see the restart sketch below)
 Clusters tend to come out roughly circular because
distance is used as the grouping criterion
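A common mitigation for the initialization sensitivity, sketched using the `kmeans` and `wcss` functions from the earlier sketches: run from several random starts and keep the assignment with the lowest within-cluster sum of squares.

```python
# Sketch: multiple random restarts of the earlier `kmeans` sketch,
# scored with the earlier `wcss` sketch; the best run is kept.
def best_of_restarts(X, k, restarts=10):
    best = None
    for seed in range(restarts):
        means, labels = kmeans(X, k, seed=seed)
        score = wcss(X, labels, k)
        if best is None or score < best[0]:
            best = (score, means, labels)
    return best   # (lowest WCSS, its means, its labels)
```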



Discriminant Analysis

 Discriminant analysis aims at classifying unknown samples into one of
several possible groups or classes.
 Classical discriminant function analysis attempts to develop a linear
equation (a linear weighted function of the variables) that best
differentiates between two different classes.
 The approach requires a training data set for which the assignments to
population groups are already known (a supervised method).
Discriminant Analysis

 If two data groups are plotted in a multidimensional space, they
appear as two data clouds with either a distinct separation or some
overlap.
 An axis is located on which the distance between the clouds is
maximized while the dispersion within each cloud is simultaneously
minimized.
 The new axis defines the linear discriminant function and is calculated
from the multivariate means, variances, and covariances of the data
groups.
 The data points of the two groups may be projected onto this axis to
collapse the multidimensional data into a single discriminant variable,
as in the sketch below.
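A sketch of locating this axis for two groups via Fisher's construction (one standard route, shown under the assumption of a pooled covariance; the groups here are synthetic):

```python
import numpy as np

# Sketch: w is proportional to the inverse pooled covariance times the
# difference of the group means; projecting onto w collapses the
# multidimensional data to a single discriminant variable.
rng = np.random.default_rng(6)
X1 = rng.normal(loc=0.0, size=(80, 4))   # training group 1 (synthetic)
X2 = rng.normal(loc=1.5, size=(80, 4))   # training group 2 (synthetic)

S_pooled = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
w = np.linalg.solve(S_pooled, X2.mean(0) - X1.mean(0))

z1, z2 = X1 @ w, X2 @ w        # one discriminant score per sample
print(z1.mean(), z2.mean())    # the groups separate along the new axis
```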
Discriminant Analysis
 We assume each data group is characterized by a group-specific
probability density function $f_c(\mathbf{x})$.
 If the prior probability $p_c$ of each group is also known, then by
Bayes' theorem the posterior distribution of the classes given the
observation $\mathbf{x}$ is

$p(c \mid \mathbf{x}) = \dfrac{p_c\, p(\mathbf{x} \mid c)}{p(\mathbf{x})} = \dfrac{p_c f_c(\mathbf{x})}{p(\mathbf{x})} \propto p_c f_c(\mathbf{x})$

 The observation is allocated to the group with the maximal posterior
probability.
Discriminant Analysis

 By the maximum likelihood rule, the observation is allocated to the
class that minimizes

$Q_c = -2 \log f_c(\mathbf{x}) - 2 \log p_c = (\mathbf{x} - \boldsymbol{\mu}_c)^T \Sigma_c^{-1} (\mathbf{x} - \boldsymbol{\mu}_c) + \log |\Sigma_c| - 2 \log p_c$

(the second equality assumes Gaussian group densities and drops the constant term).
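A sketch of this allocation rule; the group means, covariances, and priors below are illustrative placeholders, where in practice they would be estimated from the training set.

```python
import numpy as np

# Sketch: compute Q_c for each class and allocate x to the class with
# the smallest value (equivalently, the maximal posterior).
def q_score(x, mu, Sigma, prior):
    diff = x - mu
    return (diff @ np.linalg.solve(Sigma, diff)      # Mahalanobis term
            + np.log(np.linalg.det(Sigma))           # log |Sigma_c|
            - 2.0 * np.log(prior))                   # -2 log p_c

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # illustrative means
Sigmas = [np.eye(2), np.array([[1.0, 0.3], [0.3, 0.5]])]
priors = [0.6, 0.4]

x = np.array([1.5, 0.8])
scores = [q_score(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
print(int(np.argmin(scores)))   # index of the assigned class
```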

