
Matrix Decomposition and its

Application in Statistics
Part II
Chaiyaporn Khemapatapan, Ph.D.
DPU

Overview

Introduction
LU decomposition
QR decomposition
Cholesky decomposition
Jordan Decomposition
Spectral decomposition
Singular value decomposition
Applications

Characteristic Roots and Characteristic Vectors
Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx, where A is a square matrix; λ is then said to be a characteristic root (eigenvalue) of the matrix A corresponding to the characteristic vector x.
A characteristic root is unique, but a characteristic vector is not unique.
We calculate the characteristic roots from the characteristic equation |A − λI| = 0.
For λ = λᵢ the characteristic vector is the solution x of the homogeneous system of linear equations (A − λᵢI)x = 0.
Theorem: If A is a real symmetric matrix and λᵢ and λⱼ are two distinct latent roots of A, then the corresponding latent vectors xᵢ and xⱼ are orthogonal.
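As an illustration, a small MATLAB sketch (the matrix is an arbitrary symmetric example) that computes the characteristic roots and vectors and checks both the defining relation and the orthogonality theorem:

% Sketch: characteristic roots and vectors of a real symmetric matrix
A = [4 1 0; 1 3 1; 0 1 2];      % arbitrary real symmetric example
[P, D] = eig(A);                % columns of P: characteristic vectors; diag(D): characteristic roots
lambda1 = D(1,1);  x1 = P(:,1);
disp(norm(A*x1 - lambda1*x1))   % ~0: x1 solves (A - lambda1*I)x = 0
disp(P(:,1)'*P(:,2))            % ~0: vectors for distinct roots of a symmetric A are orthogonal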

Multiplicity
Algebraic multiplicity: the number of repetitions of a certain eigenvalue. If, for a certain matrix, λ = {3, 3, 4}, then the algebraic multiplicity of 3 is 2 (it appears twice) and the algebraic multiplicity of 4 is 1 (it appears once). This type of multiplicity is normally represented by the Greek letter μ, where μ(λᵢ) denotes the algebraic multiplicity of λᵢ.
Geometric multiplicity: the geometric multiplicity of an eigenvalue is the number of linearly independent eigenvectors associated with it.
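A brief MATLAB sketch (the matrix is a deliberately simple example) contrasting the two notions of multiplicity:

% Sketch: algebraic vs. geometric multiplicity of an eigenvalue
A = [3 1; 0 3];                        % the eigenvalue 3 is repeated twice
lam = eig(A);
alg = sum(abs(lam - 3) < 1e-9);        % algebraic multiplicity: 2
geo = size(null(A - 3*eye(2)), 2);     % geometric multiplicity (eigenspace dimension): 1
fprintf('algebraic = %d, geometric = %d\n', alg, geo)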

Jordan Decomposition
Camille Jordan (1838-1921) introduced this decomposition in 1870.
Let A be any n×n matrix. Then there exists a nonsingular (invertible) matrix P such that
J = P⁻¹AP, equivalently A = PJP⁻¹,
where J = diag(J_k1(λ₁), …, J_kr(λᵣ)) is the Jordan normal form, k₁ + k₂ + ⋯ + kᵣ = n, the λᵢ, i = 1, 2, …, r, are the characteristic roots, and kᵢ is the algebraic multiplicity of λᵢ.
The Jordan block J_k(λ) is the k×k matrix with λ on the diagonal and 1 on the superdiagonal:
J_k(λ) =
[ λ 1 0 … 0 ]
[ 0 λ 1 … 0 ]
[ ⋮         ⋱ 1 ]
[ 0 0 0 … λ ]
The Jordan decomposition is used in differential equations and time series analysis.
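A minimal MATLAB sketch (assuming the Symbolic Math Toolbox, which provides the jordan function, is available; the matrix is an arbitrary example):

% Sketch: Jordan decomposition A = P*J*P^(-1)   (requires Symbolic Math Toolbox)
A = sym([5 4 2; 0 1 -1; -1 -1 3]);   % arbitrary example matrix
[P, J] = jordan(A);                  % J: Jordan normal form, P: similarity transformation
disp(J)
disp(simplify(P*J*inv(P) - A))       % zero matrix confirms A = P*J*P^(-1)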

Spectral Decomposition
(Eigen Decomposition)
A. L. Cauchy established the Spectral Decomposition in 1829.

Let A be an m×m real symmetric matrix and let P be the matrix whose columns are the eigenvectors of A. We can decompose A as
A = PΛP⁻¹,
where Λ is the diagonal matrix of eigenvalues; the number of nonzero eigenvalues (counted with multiplicity) equals the rank of A.
Because A is symmetric, P can be taken orthogonal (Pᵀ = P⁻¹), so that
A = PΛPᵀ or Λ = PᵀAP.

Spectral Decomposition and Principal Component Analysis (Cont.)
By using the spectral decomposition we can write A = PΛPᵀ.
In multivariate analysis our data form a matrix. Suppose our data matrix is X and that X is mean centered (the variable means have been subtracted), and let the variance-covariance matrix be Σ. The variance-covariance matrix is real and symmetric.
Using the spectral decomposition we can write Σ = PΛPᵀ, where Λ is a diagonal matrix, Λ = diag(λ₁, λ₂, …, λₙ), with λ₁ ≥ λ₂ ≥ ⋯ ≥ λₙ.
tr(Σ) = total variation of the data = tr(Λ).

Spectral Decomposition and Principal Component Analysis (Cont.)
The principal component transformation is the transformation
Y = (X − μ)P,
where
E(Yᵢ) = 0,
V(Yᵢ) = λᵢ,
Cov(Yᵢ, Yⱼ) = 0 if i ≠ j,
V(Y₁) ≥ V(Y₂) ≥ ⋯ ≥ V(Yₙ), and
∑_{i=1}^{n} V(Yᵢ) = tr(Σ) = ∑_{i=1}^{n} λᵢ.

MATLAB code for Spectral Decomposition
[P, D] = eig(A)
where P = [p₁ p₂ …] is the matrix of eigenvectors in column direction (pᵢ is an eigenvector of A) and D (i.e., Λ) is the diagonal matrix of eigenvalues λᵢ. Numerically,
A*pᵢ − λᵢ*pᵢ ≈ 0 and A*P − P*D ≈ 0
(close to, but not exactly, zero). A worked sketch follows the application list below.
Application:
For Data Reduction.
Image Processing and Compression.
K-Selection for K-means clustering
Multivariate Outliers Detection
Noise Filtering
Trend detection in the observations.
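A short sketch tying the eig call to the principal component transformation of the previous slides (the data are simulated for illustration):

% Sketch: principal components via the spectral decomposition of the covariance matrix
X  = randn(100, 4);                 % simulated data: 100 observations of 4 variables
Xc = X - mean(X);                   % mean-centre each column
Sigma = cov(Xc);                    % sample variance-covariance matrix (real, symmetric)
[P, D] = eig(Sigma, 'vector');      % eigenvectors and eigenvalues
[D, idx] = sort(D, 'descend');      % order so that lambda_1 >= lambda_2 >= ...
P = P(:, idx);
Y = Xc * P;                         % principal component scores; V(Y_i) is approximately D(i)
disp([var(Y)', D])                  % the two columns agree; their common sum is trace(Sigma)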

Historical background of SVD


There are five mathematicians who were responsible for establishing the existence of the
singular value decomposition and developing its theory.

Eugenio Beltrami
(1835-1899)

Camille Jordan
(1838-1921)

James Joseph
Sylvester
(1814-1897)

Erhard Schmidt
(1876-1959)

Hermann Weyl
(1885-1955)

The singular value decomposition was originally developed by two mathematicians in the mid to late 1800s:
1. Eugenio Beltrami, 2. Camille Jordan.
Several other mathematicians took part in the final developments of the SVD, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s.
C. Eckart and G. Young proved the low-rank approximation property of the SVD (1936).

What is SVD?
Any real m×n matrix X, with n ≤ m, can be decomposed as
X = USVᵀ
U is an (m×m) or (m×n) column-orthonormal matrix (UᵀU = I) containing the eigenvectors of the symmetric matrix XXᵀ.
S is an (m×n) or (n×n) diagonal matrix containing the singular values of the matrix X. The number of nonzero diagonal elements of S equals the rank of X.
Vᵀ is an (n×n) row-orthonormal matrix (VᵀV = I); V contains the eigenvectors of the symmetric matrix XᵀX.

Dimension in SVD
(Diagram comparing the matrix dimensions of the economy-size and the normal-size SVD.)

Singular Value Decomposition (Cont.)


Theorem (Singular Value Decomposition): Let X be m×n of rank r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal matrix S with positive diagonal elements such that
X = USVᵀ.
Proof: Since X is m×n of rank r, r ≤ n ≤ m, XXᵀ and XᵀX are both of rank r (by the concept of the Gramian matrix) and of dimension m×m and n×n respectively. Since XXᵀ is a real symmetric m×m matrix A, we can write by spectral decomposition
A = XXᵀ = QDQᵀ,
where Q and D are, respectively, the matrices of characteristic vectors (eigenvectors) and corresponding characteristic roots (eigenvalues) of XXᵀ. Again, since XᵀX is a real symmetric n×n matrix B, we can write by spectral decomposition
B = XᵀX = RMRᵀ,

Singular Value Decomposition (Cont.)


where R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots.
Since XXᵀ and XᵀX are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write
D = [ Dᵣ 0 ; 0 0 ].
Also we can write
M = [ Mᵣ 0 ; 0 0 ].

Singular Value Decomposition (Cont.)


We know that the nonzero characteristic roots of XXᵀ and XᵀX are equal, so Dᵣ = Mᵣ.
Partition Q and R conformably with D and M, respectively, i.e., Q = (Qᵣ, Q*), R = (Rᵣ, R*), such that Qᵣ is m×r, Rᵣ is n×r, and they correspond, respectively, to the nonzero characteristic roots of XXᵀ and XᵀX. Now take
U = Qᵣ, V = Rᵣ, S = diag(d₁^(1/2), d₂^(1/2), …, dᵣ^(1/2)),
where the dᵢ, i = 1, 2, …, r, are the positive characteristic roots of XXᵀ and hence those of XᵀX as well (by the concept of the Gramian matrix).

Singular Value Decomposition (Cont.)


Now define G = Qᵣ Dᵣ^(1/2) Rᵣᵀ. We shall show that G = X, thus completing the proof.
GᵀG = (Qᵣ Dᵣ^(1/2) Rᵣᵀ)ᵀ (Qᵣ Dᵣ^(1/2) Rᵣᵀ)
    = Rᵣ Dᵣ^(1/2) Qᵣᵀ Qᵣ Dᵣ^(1/2) Rᵣᵀ
    = Rᵣ Dᵣ Rᵣᵀ
    = Rᵣ Mᵣ Rᵣᵀ
    = RMRᵀ
    = XᵀX.
Similarly,
GGᵀ = XXᵀ.
From the first relation we conclude that, for an orthogonal matrix, say P₁,
G = P₁X,
while from the second we conclude that, for an orthogonal matrix, say P₂,
G = XP₂.

Singular Value Decomposition (Cont.)


The preceding, however, implies that for the orthogonal matrices P₁, P₂ the matrix X satisfies
XXᵀ = P₁ XXᵀ P₁ᵀ, XᵀX = P₂ᵀ XᵀX P₂,
which in turn implies that
P₁ = I_m, P₂ = I_n.
Thus
X = G = Qᵣ Dᵣ^(1/2) Rᵣᵀ = USVᵀ.

MATLAB Code for SVD


The MATLAB command svd computes the matrix singular value decomposition.

s = svd(X) returns a vector of singular values.

[U,S,V] = svd(X) produces a diagonal matrix S of the same dimension as X, with


nonnegative diagonal elements in decreasing order, and unitary matrices U and
V so that X = U*S*V'.
[U,S,V] = svd(X,0) produces the "economy size" decomposition. If X is m-by-n
with m > n, then svd computes only the first n columns of U and S is n-by-n.
[U,S,V] = svd(X,'econ') also produces the "economy size" decomposition. If X
is m-by-n with m >= n, it is equivalent to svd(X,0). For m < n, only the first m
columns of V are computed and S is m-by-m.
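For example, a quick check of the economy-size form (the matrix size is arbitrary):

% Sketch: economy-size SVD and reconstruction check
X = randn(600, 50);                % tall matrix, m > n
[U, S, V] = svd(X, 'econ');        % U: 600-by-50, S: 50-by-50, V: 50-by-50
disp(norm(X - U*S*V', 'fro'))      % ~0: X is reconstructed exactly
disp(norm(U'*U - eye(50), 'fro'))  % ~0: the columns of U are orthonormal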

Decomposition in Diagram
(Flow chart: a matrix A is classified by shape. LU decomposition (not always unique); rectangular with full column rank: QR decomposition, SVD; square and symmetric: Cholesky decomposition, spectral decomposition; square and asymmetric: Jordan decomposition, similar diagonalization P⁻¹AP = Λ.)

Properties of SVD
Rewriting the SVD,
A = UΣVᵀ = ∑_{i=1}^{r} σᵢ uᵢ vᵢᵀ,
where
r = rank of A,
σᵢ = the i-th diagonal element of Σ,
uᵢ and vᵢ are the i-th columns of U and V respectively.

Properties of SVD
Low-rank approximation
Theorem: If A = UΣVᵀ is the SVD of A and the singular values are sorted as σ₁ ≥ σ₂ ≥ ⋯ ≥ σₙ, then for any l < r the best rank-l approximation to A is
Ã = ∑_{i=1}^{l} σᵢ uᵢ vᵢᵀ, with ‖A − Ã‖² = ∑_{i=l+1}^{r} σᵢ².

The low-rank approximation technique is very important for data compression.
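A MATLAB sketch of the rank-l truncation and of the error identity above (the matrix and l are placeholders):

% Sketch: best rank-l approximation and its squared Frobenius error
A = randn(50, 30);
[U, S, V] = svd(A);
l  = 5;                                       % chosen truncation rank (placeholder)
Al = U(:,1:l) * S(1:l,1:l) * V(:,1:l)';       % A_l = sum of the l leading sigma_i*u_i*v_i'
s  = diag(S);
disp([norm(A - Al, 'fro')^2, sum(s(l+1:end).^2)])   % equal: sum of the discarded sigma_i^2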

Low-rank Approximation
SVD can be used to compute optimal low-rank approximations.
The rank-k approximation of A is the matrix Ã of rank k such that
Ã = argmin_{X: rank(X) = k} ‖A − X‖_F,
where Ã and X are both m×n matrices and ‖·‖_F is the Frobenius norm,
‖A‖_F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} a_ij² )^(1/2).
If d₁, d₂, …, dₙ are the characteristic roots of AᵀA, then ‖A‖_F = ( ∑_{i=1}^{n} dᵢ )^(1/2).

Low-rank Approximation
Solution via SVD:
Ã = U diag(σ₁, …, σ_k, 0, …, 0) Vᵀ (set the smallest r − k singular values to zero).
In column notation this is a sum of k rank-1 matrices:
Ã = ∑_{i=1}^{k} σᵢ uᵢ vᵢᵀ.
(Figure: schematic of U, diag(σᵢ) and Vᵀ with all but the first k = 2 singular values zeroed out.)

Approximation error
How good (bad) is this approximation? It is the best possible, as measured by the Frobenius norm of the error:
min_{X: rank(X) = k} ‖A − X‖_F² = ‖A − Ã‖_F² = ∑_{i=k+1}^{r} σᵢ²,
where the σᵢ are ordered such that σᵢ ≥ σᵢ₊₁.

Row approximation and column approximation
Suppose R_i and C_j represent the i-th row and j-th column of A. The SVDs of A and Ã are
A = UΣVᵀ = ∑_{k=1}^{r} σ_k u_k v_kᵀ and Ã = U_l Σ_l V_lᵀ = ∑_{k=1}^{l} σ_k u_k v_kᵀ.
The SVD equation for R_i is
R_i = ∑_{k=1}^{r} u_ik σ_k v_kᵀ, i = 1, …, m,
and we can approximate R_i by
R_i^(l) = ∑_{k=1}^{l} u_ik σ_k v_kᵀ, l < r.
Also, the SVD equation for C_j is
C_j = ∑_{k=1}^{r} v_jk σ_k u_k, j = 1, 2, …, n,
and we can approximate C_j by
C_j^(l) = ∑_{k=1}^{l} v_jk σ_k u_k, l < r.

Least square solution in an inconsistent system
By using the SVD we can solve an inconsistent system; this gives the least square solution of
min_x ‖Ax − b‖².
The least square solution is x = A^g b, where A^g is the MP (Moore-Penrose) inverse of A.
The SVD of A^g is A^g = V S⁻¹ Uᵀ, in which only the nonzero singular values are inverted.
This can be written as
x = A^g b = ∑_{i=1}^{r} (uᵢᵀ b / σᵢ) vᵢ,
where r is the rank of A.
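A sketch of the least square solution through the SVD (the system is simulated, and A is assumed to have full column rank so that S is invertible):

% Sketch: least square solution x = A^g * b via the SVD
A = randn(20, 3);  b = randn(20, 1);   % overdetermined (generally inconsistent) system
[U, S, V] = svd(A, 'econ');
x = V * (S \ (U' * b));                % x = V * S^(-1) * U' * b
disp(norm(x - pinv(A)*b))              % ~0: agrees with the Moore-Penrose solution
disp(norm(x - A\b))                    % ~0: agrees with MATLAB's least squares solver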

Basic Results of SVD


SVD based PCA


If we reduce variables by using the SVD, then it performs like PCA.
Suppose X is a mean-centered data matrix. Then, writing the SVD X = UΣVᵀ, we have XV = UΣ.
Let Y = XV = UΣ. Then the first column of Y contains the first principal component scores, and so on.
SVD-based PCA is more numerically stable.
If the number of variables is greater than the number of observations, then SVD-based PCA will give efficient results (Antti Niemistö, Statistical Analysis of Gene Expression Microarray Data, 2005).
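A sketch of the SVD route to the principal component scores (the data are simulated for illustration):

% Sketch: PCA via the SVD of a mean-centred data matrix
X  = randn(100, 6);
Xc = X - mean(X);                        % mean-centre
[U, S, V] = svd(Xc, 'econ');
Y  = Xc * V;                             % PC scores; identical to U*S
disp(norm(Y - U*S, 'fro'))               % ~0
lambda = diag(S).^2 / (size(Xc,1) - 1);  % eigenvalues of the covariance matrix
disp([lambda, sort(eig(cov(Xc)), 'descend')])   % the two columns agree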

Application of SVD

Data reduction in both variables and observations.


Solving linear least square Problems
Image Processing and Compression.
K-Selection for K-means clustering
Multivariate Outliers Detection
Noise Filtering
Trend detection in the observations and the variables.


Origin of biplot

Gabriel (1971)
One of the most important advances in data analysis in recent decades.
Currently: more than 50,000 web pages; numerous academic publications; included in most statistical analysis packages.
Still a very new technique to most scientists.

Prof. Ruben Gabriel, the founder of the biplot
Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain

What is a biplot?
Biplot = bi + plot
plot
scatter plot of two rows OR of two columns, or
scatter plot summarizing the rows OR the columns

bi
BOTH rows AND columns

1 biplot >> 2 plots


Practical definition of a biplot


Any two-way table can be analyzed using a 2D-biplot as soon as it can be
sufficiently approximated by a rank-2 matrix. (Gabriel, 1971)
(Now 3D-biplots are also possible)

Matrix decomposition: a 4×3 genotype-by-environment (G-by-E) two-way table P is factorized into a genotype-marker matrix G and an environment-marker matrix E, P = G·E, and the genotype markers (G1-G4) and environment markers (E1-E3) are displayed together in a single biplot.
(Figure: the G-by-E table for g1-g4 and e1-e3, its factorization, and the resulting biplot.)

Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)
SVD: X_ij = ∑_{k=1}^{r} u_ik σ_k v_kj,
where r, the rank of Y, is the minimum number of PCs required to fully represent Y; the u_ik form the matrix characterising the rows, the σ_k are the singular values, and the v_kj form the matrix characterising the columns.
SVP: X_ij = ∑_{k=1}^{r} (u_ik σ_k^f)(σ_k^(1−f) v_kj), 0 ≤ f ≤ 1.
Common choices of f are f = 1, f = 0 and f = 1/2. The row scores u_ik σ_k^f and the column scores σ_k^(1−f) v_kj are plotted together: plot → biplot.

Biplot
The simplest biplot shows the first two PCs together with the projections of the axes of the original variables.
The x-axis represents the scores for the first principal component and the y-axis the scores for the second principal component.
The original variables are represented by arrows which
graphically indicate the proportion of the original variance
explained by the first two principal components.
The direction of the arrows indicates the relative loadings on
the first and second principal components.
Biplot analysis can help to understand the multivariate data
i) Graphically
ii) Effectively
iii) Conveniently.
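As an illustration of such a plot (assuming the Statistics and Machine Learning Toolbox, which provides pca, biplot and the fisheriris sample data), one way to draw a biplot in MATLAB:

% Sketch: biplot of the first two principal components of the iris data
load fisheriris                          % meas: 150-by-4 measurement matrix
[coeff, score] = pca(meas);              % loadings (coeff) and PC scores of the centred data
biplot(coeff(:,1:2), 'Scores', score(:,1:2), ...
       'VarLabels', {'Sepal L.','Sepal W.','Petal L.','Petal W.'})
xlabel('Comp. 1'), ylabel('Comp. 2')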


Biplot of Iris Data


(Biplot of the iris data: PC scores plotted on Comp. 1 vs. Comp. 2, with variable arrows for Sepal L., Sepal W., Petal L. and Petal W.; 1 = Setosa, 2 = Versicolor, 3 = Virginica.)

Image Compression Example


Pansy Flower image, collected from
http://www.ats.ucla.edu/stat/r/code/pansy.jpg

This image is 600×465 pixels.
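A sketch of how such rank-k approximations can be produced (assuming the image has been saved locally as pansy.jpg and that rgb2gray and imshow from the Image Processing Toolbox are available):

% Sketch: rank-k approximations of a grayscale version of the flower image
A = double(rgb2gray(imread('pansy.jpg')));      % image as a numeric matrix
[U, S, V] = svd(A);
for k = [1 5 20 50 150]
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';     % keep the k largest singular values
    figure, imshow(uint8(Ak))
    title(sprintf('Rank-%d approximation', k))
end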


Singular values of flowers image

Plot of the singular values


Low rank Approximation to flowers image

Rank-1 approximation

Rank-5 approximation

Low rank Approximation to flowers image

Rank-20 approximation

Rank-30 approximation

Low rank Approximation to flowers image

Rank-50 approximation

Rank-80 approximation

Low rank Approximation to flowers image

Rank-100 approximation

Rank-120 approximation

Low rank Approximation to flowers image

Rank-150 approximation

True Image

Outlier Detection Using SVD


Nishith and Nasser (2007, M.Sc. thesis) proposed a graphical method of outlier detection using SVD.
It is suitable for both general multivariate data and regression data. For this we construct the scatter plots of the first two PCs, and of the first and third PCs. We also draw a box in the scatter plot whose range is
median(1st PC) ± 3 mad(1st PC) on the x-axis and
median(2nd PC/3rd PC) ± 3 mad(2nd PC/3rd PC) on the y-axis,
where mad = median absolute deviation.
The points that fall outside the box can be considered extreme outliers. Points outside one side of the box are termed outliers. Along with this box we may construct another, smaller box bounded by the 2.5/2 MAD lines. A sketch of this rule is given below.
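A sketch of the box rule on the first two PC scores (Xc stands for a mean-centred data matrix, a placeholder; mad with second argument 1, from the Statistics and Machine Learning Toolbox, gives the median absolute deviation):

% Sketch: median +/- 3*mad box on the first two principal component scores
[~, ~, V] = svd(Xc, 'econ');                     % Xc: mean-centred data matrix (placeholder)
PC = Xc * V;                                     % principal component scores
m1 = median(PC(:,1));  s1 = mad(PC(:,1), 1);     % mad(., 1) = median absolute deviation
m2 = median(PC(:,2));  s2 = mad(PC(:,2), 1);
out = abs(PC(:,1) - m1) > 3*s1 | abs(PC(:,2) - m2) > 3*s2;   % points outside the box
scatter(PC(:,1), PC(:,2)), hold on
plot(PC(out,1), PC(out,2), 'rx')                 % flag the extreme outliers
rectangle('Position', [m1 - 3*s1, m2 - 3*s2, 6*s1, 6*s2])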

Outlier Detection Using SVD (Cont.)


HAWKINS-BRADU-KASS
(1984) DATA

Data set containing 75 observations with 14 influential observations.
Among them there are ten high leverage outliers (cases 1-10) and four high leverage points (cases 11-14) - Imon (2005).

Scatter plot of the Hawkins, Bradu and Kass data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.

Outlier Detection Using SVD (Cont.)

Scatter plot of the modified Brown data: (a) scatter plot of the first two PCs and (b) scatter plot of the first and third PCs.

MODIFIED BROWN DATA
Data set given by Brown (1980). Ryan (1997) pointed out that the original data on the 53 patients contain one outlier (observation number 24).
Imon and Hadi (2005) modified this data set by putting in two more outliers as cases 54 and 55. They also showed that observations 24, 54 and 55 are outliers by using the generalized standardized Pearson residual (GSPR).

Cluster Detection Using SVD


Singular Value Decomposition is also used for cluster
detection (Nishith, Nasser and Suboron, 2011).

The method for clustering data using the first three PCs is given below:
median(1st PC) ± k mad(1st PC) on the x-axis and
median(2nd PC/3rd PC) ± k mad(2nd PC/3rd PC) on the y-axis,
where mad = median absolute deviation and k = 1, 2, 3.


Principal stations in climate data

Climatic Variables
The climatic variables are:
1. Rainfall (RF), mm
2. Daily mean temperature (T-MEAN), °C
3. Maximum temperature (T-MAX), °C
4. Minimum temperature (T-MIN), °C
5. Day-time temperature (T-DAY), °C
6. Night-time temperature (T-NIGHT), °C
7. Daily mean water vapour pressure (VP), mbar
8. Daily mean wind speed (WS), m/sec
9. Hours of bright sunshine as a percentage of maximum possible sunshine hours (MPS), %
10. Solar radiation (SR), cal/cm²/day

Consequences of SVD
Generally, many missing values may be present in the data. The data may also contain unusual observations. Classical singular value decomposition cannot handle either type of problem.
Robust singular value decomposition can solve both types of problems.
Robust singular value decomposition can be obtained by an alternating L1 regression approach (Douglas M. Hawkins, Li Liu, and S. Stanley Young, 2001).

The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition
Initialize the leading left singular vector u₁ (there is no obvious choice for its initial values).
Fit the L1 regression coefficients c_j by minimizing ∑_{i=1}^{n} |x_ij − c_j u_i1|, j = 1, 2, …, p.
Calculate the right singular vector v₁ = c/‖c‖, where ‖·‖ refers to the Euclidean norm.
Again fit the L1 regression coefficients d_i by minimizing ∑_{j=1}^{p} |x_ij − d_i v_j1|, i = 1, 2, …, n.
Calculate the resulting estimate of the left singular vector u₁ = d/‖d‖.
Iterate this process until it converges.
For the second and subsequent terms of the SVD, X is replaced by a deflated matrix obtained by subtracting the most recently found term: X ← X − σ_k u_k v_kᵀ.
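A compact MATLAB sketch of one alternating-L1 cycle for the leading robust singular triple. This is a schematic reading of the steps above, not the authors' code; the starting vector, iteration cap and tolerance are arbitrary choices, and each scalar L1 fit is solved exactly as a weighted median.

% Sketch: alternating L1 regression for the leading robust singular triple of X
function [u, s, v] = robust_svd1(X)
    [n, p] = size(X);
    u = ones(n, 1) / sqrt(n);                         % arbitrary starting left vector
    for iter = 1:200
        c = arrayfun(@(j) l1fit(X(:,j), u), (1:p)');  % L1 fit of each column on u
        v = c / norm(c);                              % right singular vector estimate
        d = arrayfun(@(i) l1fit(X(i,:)', v), (1:n)'); % L1 fit of each row on v
        u_new = d / norm(d);                          % left singular vector estimate
        if norm(u_new - u) < 1e-8, u = u_new; break, end
        u = u_new;
    end
    s = norm(d);                                      % scale carried by the unnormalised d
end

function b = l1fit(y, u)
    % Scalar L1 regression argmin_b sum|y - b*u|: a weighted median of y./u
    % with weights |u| (assumes u has no zero entries).
    w = abs(u);  r = y ./ u;
    [r, idx] = sort(r);  w = w(idx);
    b = r(find(cumsum(w) >= sum(w)/2, 1));
end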

Clustering weather stations on the map using RSVD
(Map of the clustered weather stations.)

References

Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed., Springer-Verlag, New York.
Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 20, 197-208.
Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73-90.
Kumar, N., Nasser, M., and Sarker, S.C. (2011). A new singular value decomposition based robust graphical clustering technique and its application in climatic data. Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227-238.
Ryan, T.P. (1997). Modern Regression Methods, Wiley, New York.
Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions, SIAM, Philadelphia.
Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html

