
Matrix Decomposition

Ming Yang
Electrical and Computer Engineering
Northwestern University
Evanston, IL 60208
mya671@ece.northwestern.edu

Contents
1 Overview
2 Matrix Multiplication and Definitions
  2.1 Matrix Multiplication
  2.2 Special Matrix Definition
3 Singular Value Decomposition
  3.1 SVD decomposition
  3.2 Corollary of SVD
4 LU and Cholesky Decomposition
  4.1 Elementary Operation and Gaussian Transform
  4.2 LU decomposition
  4.3 Cholesky decomposition
5 QR Decomposition
  5.1 Householder Reflections and Givens Rotations
  5.2 Gram-Schmidt orthonormalization
  5.3 QR Decomposition
  5.4 Least Square Fitting
6 Schur Decomposition and Eigenvalue Decomposition
  6.1 Schur Decomposition
  6.2 Eigenvalue Decomposition
  6.3 Hessenberg Decomposition
7 Biconjugate Decomposition
  7.1 Biconjugate Decomposition
  7.2 Summary

1 Overview
"Matrix decomposition refers to the transformation of a given matrix into a given canonical form" [1]. When the given matrix is transformed into a product of canonical matrices, the process of producing this decomposition is also called "matrix factorization". Matrix decomposition is a fundamental theme in linear algebra and applied statistics with both scientific and engineering significance. The purposes of matrix decomposition typically involve two aspects: computational convenience and analytic simplicity. In practice, most matrix computations, such as matrix inversion, computing a determinant, solving a linear system, and least squares fitting, cannot be carried out efficiently in an explicit way; converting a difficult matrix computation into several easier tasks, such as solving triangular or diagonal systems, greatly facilitates the calculations. Data matrices representing numerical observations, such as proximity or correlation matrices, are often huge and hard to analyze, so decomposing them into lower-order or lower-rank canonical forms reveals their inherent characteristics and structure and helps to interpret their meaning readily.
This tutorial is primarily a summary of important matrix decomposition methods. We first present some basic concepts in Section 2 and then introduce several fundamental matrix decomposition methods in the successive sections, e.g. the SVD, LU, QR and eigenvalue decompositions. A unified view of matrix factorization derived from the Wedderburn rank-one reduction theorem is briefly discussed in Section 7.

2 Matrix Multiplication and Definitions


2.1 Matrix Multiplication
Suppose $A \in \mathbb{R}^{m\times r}$ and $B \in \mathbb{R}^{r\times n}$. The matrix multiplication $C = AB$ can be viewed from three different perspectives as follows.

Dot Product Matrix Multiply. Every element $c_{ij}$ of $C$ is the dot product of the row vector $a_i^T$ and the column vector $b_j$:
$$
A = \begin{pmatrix} a_1^T \\ \vdots \\ a_m^T \end{pmatrix},\; a_k \in \mathbb{R}^r, \qquad
B = (b_1, \ldots, b_n),\; b_k \in \mathbb{R}^r, \qquad
C = (c_{ij}),\; c_{ij} = a_i^T b_j.
$$

Column Combination Matrix Multiply. Every column $c_j$ of $C$ is a linear combination of the columns $a_k$ of $A$, with the entries $b_{kj}$ as the weight coefficients:
$$
A = (a_1, \ldots, a_r),\; a_i \in \mathbb{R}^m, \qquad
B = (b_1, \ldots, b_n),\; b_j \in \mathbb{R}^r, \qquad
C = (c_1, \ldots, c_n),\; c_j \in \mathbb{R}^m, \qquad
c_j = \sum_{k=1}^{r} b_{kj}\, a_k, \quad j = 1:n.
$$

Outer Product Matrix Multiply. $C$ is the sum of $r$ matrices, each of which is the outer product of a column vector of $A$ and the corresponding row vector of $B$, i.e. a rank-one matrix:
$$
A = (a_1, \ldots, a_r),\; a_i \in \mathbb{R}^m, \qquad
B = \begin{pmatrix} b_1^T \\ \vdots \\ b_r^T \end{pmatrix},\; b_k \in \mathbb{R}^n, \qquad
C = \sum_{k=1}^{r} a_k b_k^T.
$$
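To make the three views concrete, here is a small numerical sketch (an addition to the text; NumPy is only an assumed convenience, not something the tutorial prescribes) that builds the same product C in all three ways:

```python
import numpy as np

m, r, n = 4, 3, 5
rng = np.random.default_rng(0)
A, B = rng.standard_normal((m, r)), rng.standard_normal((r, n))

# Dot-product view: c_ij = a_i^T b_j
C_dot = np.array([[A[i, :] @ B[:, j] for j in range(n)] for i in range(m)])

# Column-combination view: c_j = sum_k b_kj * a_k
C_col = np.column_stack([sum(B[k, j] * A[:, k] for k in range(r)) for j in range(n)])

# Outer-product view: C = sum_k a_k b_k^T (a sum of r rank-one matrices)
C_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(r))

assert np.allclose(C_dot, A @ B) and np.allclose(C_col, A @ B) and np.allclose(C_outer, A @ B)
```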

2.2 Special Matrix Definition


Before further discussion, we first present definitions of some special matrices; here we follow the terms in [2].

Definition 1 A real matrix A is a symmetric matrix if it equals its own transpose, that is, $A = A^T$.

Definition 2 A complex matrix A is a Hermitian matrix if it equals its own complex conjugate transpose, that is, $A = A^H$.

Definition 3 A real matrix Q is an orthogonal matrix if the inverse of Q equals the transpose of Q, $Q^{-1} = Q^T$, that is, $QQ^T = Q^TQ = I$.

Definition 4 A complex matrix U is a unitary matrix if the inverse of U equals the complex conjugate transpose of U, $U^{-1} = U^H$, that is, $UU^H = U^HU = I$.

Definition 5 A matrix $A \in \mathbb{R}^{n\times n}$ is positive definite if $x^TAx > 0$ for all nonzero $x \in \mathbb{R}^n$. Positive definite matrices have positive definite principal submatrices, and all their diagonal entries are positive.

Definition 6 Let $S \subseteq \mathbb{R}^n$ be a subspace with orthonormal basis $V = (v_1, \ldots, v_k)$. Then $P = VV^T \in \mathbb{R}^{n\times n}$ is the orthogonal projection matrix onto $S$, satisfying $\mathrm{range}(P) = S$, $P^2 = P$, and $P^T = P$. $P$ is unique for the subspace $S$.

Hermitian and unitary matrices are the counterparts in $\mathbb{C}$ of symmetric and orthogonal matrices in $\mathbb{R}$; the following theorems stated in $\mathbb{R}$ can be readily transformed to the corresponding forms in $\mathbb{C}$ by replacing the transpose with the conjugate transpose and orthogonal matrices with unitary matrices. Therefore, for simplicity, we present most of the matrix decomposition results in $\mathbb{R}$.

3 Singular Value Decomposition
Suppose $A \in \mathbb{R}^{m\times n}$. The column vectors of $A$ span a subspace of $\mathbb{R}^m$, namely $\mathrm{range}(A)$; similarly $\mathrm{range}(A^T)$ is a subspace of $\mathbb{R}^n$. The two subspaces have the same dimension, equal to the rank of $A$. The SVD reveals orthonormal bases of $\mathrm{range}(A)$ and $\mathrm{range}(A^T)$ and the respective scale factors $\sigma_i$ simultaneously.

3.1 SVD decomposition


Theorem 1 Singular Value Decomposition (SVD). If $A \in \mathbb{R}^{m\times n}$, then there exist orthogonal matrices $U = (u_1, \ldots, u_m) \in \mathbb{R}^{m\times m}$, $V = (v_1, \ldots, v_n) \in \mathbb{R}^{n\times n}$ and a diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m\times n}$, $p = \min(m, n)$, such that
$$A = U\Sigma V^T, \quad \text{where } \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0.$$

Proof 1 Let $\sigma_1 = \|A\|_2 = \max_{\|v\|_2=1}\|Av\|_2$. Then there exist unit 2-norm vectors $u_1 \in \mathbb{R}^m$ and $v_1 \in \mathbb{R}^n$ such that
$$\|Av_1\|_2 = \sigma_1, \qquad u_1 = \frac{Av_1}{\sigma_1}, \qquad \text{therefore } Av_1 = \sigma_1 u_1.$$
Any orthonormal set can be extended to form an orthonormal basis for the whole space, so we can find $V_1 \in \mathbb{R}^{n\times(n-1)}$ and $U_1 \in \mathbb{R}^{m\times(m-1)}$ such that $V = (v_1\ V_1) \in \mathbb{R}^{n\times n}$ and $U = (u_1\ U_1) \in \mathbb{R}^{m\times m}$ are orthogonal. Thus
$$
A_1 \doteq \begin{pmatrix} u_1^T \\ U_1^T \end{pmatrix} A\,(v_1\ V_1)
= \begin{pmatrix} u_1^T A v_1 & u_1^T A V_1 \\ U_1^T A v_1 & U_1^T A V_1 \end{pmatrix}
= \begin{pmatrix} \sigma_1 \|u_1\|_2^2 & u_1^T A V_1 \\ \sigma_1 U_1^T u_1 & U_1^T A V_1 \end{pmatrix}
= \begin{pmatrix} \sigma_1 & u_1^T A V_1 \\ 0 & U_1^T A V_1 \end{pmatrix}.
$$
Let $(\sigma_1 \;\; u_1^T A V_1)^T = (\sigma_1 \;\; \omega^T)^T \in \mathbb{R}^n$. The 2-norm of the product of $A_1$ with this vector satisfies
$$
\left\| A_1 \begin{pmatrix} \sigma_1 \\ \omega \end{pmatrix} \right\|_2^2
= \left\| \begin{pmatrix} \sigma_1^2 + \omega^T\omega \\ \vdots \end{pmatrix} \right\|_2^2
\geq (\sigma_1^2 + \omega^T\omega)^2.
$$
So the 2-norm of the matrix $A_1$ satisfies
$$
\|A_1\|_2 = \sup_{x\in\mathbb{R}^n} \frac{\|A_1x\|_2}{\|x\|_2} \geq \frac{\sigma_1^2 + \omega^T\omega}{\sqrt{\sigma_1^2 + \omega^T\omega}} = \sqrt{\sigma_1^2 + \omega^T\omega},
$$
while $U$ and $V$ are both orthogonal and $\|A_1\|_2 = \|A\|_2 = \sigma_1$, so $\omega = 0$. An induction argument on the lower-right block $U_1^T A V_1$ completes the proof.

The $\sigma_i$ are the singular values of $A$, and the vectors $u_i$ and $v_i$ are the left and right singular vectors, respectively, which satisfy
$$Av_i = \sigma_i u_i \quad\text{and}\quad A^Tu_i = \sigma_i v_i.$$

3.2 Corollary of SVD
The SVD reveals many intrinsic properties of the matrix A and is numerically stable for calculations.

Corollary 1 If $A = U\Sigma V^T$ is an SVD of $A$ with $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > \sigma_{r+1} = \ldots = \sigma_p = 0$, we have the following statements:

1. $\mathrm{rank}(A) = r$.

2. $\mathrm{null}(A) = \mathrm{span}\{v_{r+1}, \ldots, v_n\}$.

3. $\mathrm{range}(A) = \mathrm{span}\{u_1, \ldots, u_r\}$.

4. $A = \sum_{j=1}^{r} \sigma_j u_j v_j^T = U_r\Sigma_rV_r^T$, where $U_r = (u_1, \ldots, u_r)$, $V_r = (v_1, \ldots, v_r)$, $\Sigma_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$.

5. $\|A\|_F = \sqrt{\sigma_1^2 + \ldots + \sigma_p^2}$.

6. $\|A\|_2 = \sigma_1$.

7. $\sigma_j = \sqrt{\lambda_j(A^TA)}$, $j = 1, \ldots, p$, where $\lambda_j(A^TA)$ is the $j$th largest eigenvalue of $A^TA$.

8. The $v_i$ are orthonormalized eigenvectors of $A^TA$ and the $u_i$ are orthonormalized eigenvectors of $AA^T$.

The SVD has been generalized to simultaneously diagonalize two matrices [3] and to decompose a matrix using different metrics in the normalizations [4].
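The statements of Corollary 1 are easy to check numerically. The following sketch (added here, assuming NumPy) verifies items 1, 5, 6 and 7 on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A)          # s holds sigma_1 >= ... >= sigma_p

# rank(A) equals the number of nonzero singular values (item 1)
assert np.linalg.matrix_rank(A) == np.sum(s > 1e-12)
# ||A||_F = sqrt(sum sigma_i^2) and ||A||_2 = sigma_1 (items 5 and 6)
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))
assert np.isclose(np.linalg.norm(A, 2), s[0])
# sigma_j = sqrt(lambda_j(A^T A)) (item 7)
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s**2, eigvals[:len(s)])
```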

4 LU and Cholesky Decomposition


Solving the linear system $Ax = b$ is a basic problem in linear algebra. Theoretically, when A is a non-singular square matrix there exists a unique solution $x = A^{-1}b$; however, the inverse of a matrix is typically not easy to compute. We therefore hope to transform A into triangular systems, which are much easier to solve by forward or backward substitution; this process is referred to as Gaussian elimination [5]. It can be summarized in matrix form as the LU decomposition, together with a series of variants that arise when A has extra properties.

4.1 Elementary Operation and Gaussian Transform


For a square matrix A, the following three operations are referred to as elementary row (column) operations of type 1, 2, and 3, respectively:

1. interchanging two rows (or columns) in A.

2. multiplying all elements of a row (or column) of A by some nonzero number.

3. adding to any row (or column) of A any other row (or column) of A multiplied by a nonzero number.

These operations can be implemented by pre- or post-multiplying by appropriate matrices called elementary matrices; the type 3 row elementary matrices have the following forms:
   
$$
E^{(3)} =
\begin{pmatrix}
1 &        &   &        &   \\
  & \ddots &   &        &   \\
  &        & 1 & \cdots & \tau \\
  &        &   & \ddots & \vdots \\
  &        &   &        & 1
\end{pmatrix}
\qquad\text{or}\qquad
E^{(3)} =
\begin{pmatrix}
1 &        &        &        &   \\
  & \ddots &        &        &   \\
  & \vdots & 1      &        &   \\
  & \tau   & \cdots & \ddots &   \\
  &        &        &        & 1
\end{pmatrix},
$$
i.e., the identity matrix with a single nonzero off-diagonal entry $\tau$ placed above or below the diagonal.
The Gaussian elimination process can be described as multiplication by type 3 lower triangular elementary matrices. For $x \in \mathbb{R}^n$ with $x_k \neq 0$, the Gaussian transformation is defined as the matrix $M_k = I - \tau e_k^T$, where the Gauss vector $\tau$ is
$$
\tau^T = (\underbrace{0, \ldots, 0}_{k}, \tau_{k+1}, \ldots, \tau_n), \quad \tau_i = \frac{x_i}{x_k},\; i = k+1:n,
\qquad\text{and}\qquad
e_k^T = (0, \ldots, 0, \underbrace{1}_{k\text{th}}, 0, \ldots, 0).
$$
Pre-multiplying $x$ by $M_k$ zeroes entries $k+1$ through $n$ of $x$:
$$
M_k x =
\begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -\tau_{k+1} & 1 & & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -\tau_n & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
$$

It is easy to verify that the Gaussian transform matrix is a product of lower triangular type 3 elementary matrices with $\det(M_k) = 1$. So by multiplying by a series of Gaussian transform matrices, the lower part of A can be gradually zeroed, provided that the pivots remain nonzero during the process. This process is summarized as the LU decomposition.
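A literal sketch of such a Gaussian transform (an illustrative addition, not from the original text) that zeroes the entries below a chosen pivot of a vector:

```python
import numpy as np

def gauss_transform(x, k):
    """M_k = I - tau e_k^T with tau_i = x_i / x_k (i > k); M_k x zeroes entries k+1..n of x."""
    n = len(x)
    tau = np.zeros(n)
    tau[k + 1:] = x[k + 1:] / x[k]
    return np.eye(n) - np.outer(tau, np.eye(n)[k])

x = np.array([2.0, 4.0, -6.0])
M0 = gauss_transform(x, 0)                 # zero the entries below the first one
assert np.allclose(M0 @ x, [2.0, 0.0, 0.0])
assert np.isclose(np.linalg.det(M0), 1.0)  # unit lower triangular
```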

4.2 LU decomposition
Theorem 2 LU Decomposition. Let $A \in \mathbb{R}^{n\times n}$ with all leading principal minors $\det(A(1:k, 1:k)) \neq 0$, $k = 1, \ldots, n-1$. Then there exist a unique unit lower triangular matrix L, with diagonal elements all equal to one, and a unique upper triangular matrix U such that $A = LU$, and $\det(A) = u_{11}u_{22}\cdots u_{nn}$.

Proof 2 Given $a_{11} \neq 0$ in A, we can find a Gaussian transform $M_1$ to zero $a_{21}, \ldots, a_{n1}$. Suppose at step $k-1$ we have $M_{k-1}\cdots M_1A = A^{(k-1)}$, and consider the $k\times k$ portion of this equation: since Gaussian transforms are unit lower triangular with determinant equal to one, $\det(A(1:k, 1:k)) = a_{11}^{(k-1)}\cdots a_{kk}^{(k-1)} \neq 0$. Therefore the $k$th pivot $a_{kk}^{(k-1)} \neq 0$, and we can proceed to find the Gaussian transform $M_k$.

If $A = L_1U_1 = L_2U_2$ are two LU decompositions of a non-singular A, then $L_2^{-1}L_1 = U_2U_1^{-1}$; since the left side of the equation is unit lower triangular while the right side is upper triangular, both matrices must be the identity. Hence $L_1 = L_2$ and $U_1 = U_2$.
If $A = LU$ then $\det(A) = \det(LU) = \det(L)\det(U) = u_{11}u_{22}\cdots u_{nn}$.
For the linear system $Ax = b$, if we pre-compute the LU decomposition $A = LU$, the problem reduces to solving two triangular systems, $Ly = b$ and $Ux = y$, which can be done much more readily. Moreover, when the system has to be solved for many different b, such as the solution of a certain circuit under different excitations, the LU decomposition method is very efficient.
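For illustration, the factor-once / solve-many pattern might look like the following sketch, assuming SciPy's lu_factor and lu_solve (tools not mentioned in the text):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

# Factor once: PA = LU (with partial pivoting)
lu, piv = lu_factor(A)

# Re-use the factorization for many right-hand sides b
for _ in range(3):
    b = rng.standard_normal(5)
    x = lu_solve((lu, piv), b)       # two triangular solves, no new factorization
    assert np.allclose(A @ x, b)
```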

4.3 Cholesky decomposition


If the matrix A has additional properties, the LU decomposition takes particular forms. In this section we present the specialized LU decompositions for symmetric and positive definite matrices. First we express the LU decomposition in an equivalent way.
Theorem 3 LDM$^T$ Decomposition. Let $A \in \mathbb{R}^{n\times n}$ with all leading principal minors $\det(A(1:k, 1:k)) \neq 0$, $k = 1, \ldots, n-1$. Then there exist unique unit lower triangular matrices L and M and a unique diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = LDM^T$.
If A has an LU decomposition $A = LU$, let $D = \mathrm{diag}(u_{11}, \ldots, u_{nn})$ and observe that $M^T = D^{-1}U$ is unit upper triangular. Thus $A = LU = LD(D^{-1}U) = LDM^T$. Uniqueness follows from the uniqueness of the LU decomposition.
Further, if A is symmetric, then $A = LDM^T = A^T = MDL^T$, so $MDL^T$ is also an LDM$^T$ decomposition of A. From uniqueness we have $L = M$.
Theorem 4 LDL$^T$ Decomposition. If $A \in \mathbb{R}^{n\times n}$ is a non-singular symmetric matrix with an LDM$^T$ decomposition $A = LDM^T$, then $L = M$ and $A = LDL^T$.
For a positive definite matrix A, the matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ in the LDM$^T$ decomposition has positive diagonal entries, so we can further specialize the LU decomposition for symmetric positive definite matrices.
Theorem 5 Cholesky Decomposition. If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, then there exists a unique lower triangular $G \in \mathbb{R}^{n\times n}$ with positive diagonal entries such that $A = GG^T$; G is referred to as the Cholesky triangle.
In the LDL$^T$ decomposition of a symmetric positive definite A, the entries of the diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ are all positive, so $G = L\,\mathrm{diag}(\sqrt{d_1}, \ldots, \sqrt{d_n})$ is lower triangular with positive diagonal entries and $A = GG^T$; the uniqueness follows from the uniqueness of the LDL$^T$ decomposition.
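As a small added example (NumPy assumed), the Cholesky triangle of a symmetric positive definite matrix can be computed and checked directly:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive definite by construction

G = np.linalg.cholesky(A)            # lower triangular Cholesky triangle
assert np.allclose(G @ G.T, A)
assert np.all(np.diag(G) > 0)        # positive diagonal entries
```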

5 QR Decomposition
If the linear system $Ax = b$ is overdetermined, namely $A \in \mathbb{R}^{m\times n}$ with $m \geq n$ and $b \in \mathbb{R}^m$, an exact solution may not exist, so we use the least squares solution minimizing $\|Ax - b\|_2$ instead. In this section we present several methods to construct the QR decomposition and show how to compute the least squares fit by the QR, LU and SVD decompositions.

5.1 Householder Reflections and Givens Rotations
Let $v \in \mathbb{R}^n$ be nonzero. An $n$-by-$n$ matrix P of the form
$$P = I - 2\frac{vv^T}{v^Tv}$$
is called a Householder reflection (or Householder matrix, or Householder transformation), and the vector $v$ is called a Householder vector. Geometrically, when a vector $x$ is multiplied by P, it is reflected in the hyperplane $\mathrm{span}\{v\}^{\perp}$, the orthogonal complement of $v$.
Householder reflections can be used to zero selected entries of a vector in a similar way to Gauss transformations; however, while Gauss transformations are unit lower triangular, Householder matrices are orthogonal and symmetric:
$$
PP^T = PP = \left(I - 2\frac{vv^T}{v^Tv}\right)\left(I - 2\frac{vv^T}{v^Tv}\right)
= I - 4\frac{vv^T}{v^Tv} + 4\frac{vv^Tvv^T}{(v^Tv)^2} = I.
$$
Given a vector $0 \neq x \in \mathbb{R}^n$, we show there exists a Householder reflection that zeroes all but the first element of $x$, so that $Px \in \mathrm{span}\{e_1\}$. Let $v = x + \alpha e_1$ with $\alpha = \|x\|_2$; then
$$
Px = x - \frac{2v^Tx}{v^Tv}v
= x - \frac{2(x^Tx + \alpha x_1)}{x^Tx + 2\alpha x_1 + \alpha^2}(x + \alpha e_1)
= x - \frac{2(\alpha^2 + \alpha x_1)}{2\alpha^2 + 2\alpha x_1}(x + \alpha e_1) = -\alpha e_1.
$$
Furthermore, to zero all elements except the first two entries of a vector $x = (x_1, x_2, \ldots, x_n)^T$, we can form the Householder vector $v' = x' + \|x'\|_2 e_1$, where $x' = (x_2, \ldots, x_n)^T$, and extend it to $v = (0, v'^T)^T$. Proceeding in this way, we can apply a series of Householder reflections to reduce a matrix A to upper triangular form.
Householder reflections introduce zeros in all but the first element of a vector, whereas Givens rotations can selectively zero a single element. For a vector $x = (x_1, \ldots, x_i, \ldots, x_k, \ldots, x_n)^T$ with $x_k \neq 0$, the following Givens rotation forces $x_k$ to zero:
$$
G(i, k, \theta) =
\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix},
$$
where $c = \cos(\theta) = \frac{x_i}{\sqrt{x_i^2 + x_k^2}}$ and $s = \sin(\theta) = \frac{x_k}{\sqrt{x_i^2 + x_k^2}}$. In a geometric view, a Givens rotation amounts to a rotation by $\theta$ in the $(i, k)$ coordinate plane. It is easy to check that $G(i, k, \theta)$ is also orthogonal, and by pre-multiplication with a series of Givens rotations we can zero the lower part of a matrix.
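The constructions above can be sketched in a few lines (an added illustration in NumPy, following the sign conventions used in this subsection):

```python
import numpy as np

def householder(x):
    """Return P = I - 2 v v^T / (v^T v) with v = x + ||x||_2 e_1, so that P x = -||x||_2 e_1."""
    v = x.astype(float).copy()
    v[0] += np.linalg.norm(x)
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

def givens(x, i, k):
    """Return G(i, k, theta) with c = x_i / r, s = x_k / r, so that (G x)_k = 0."""
    r = np.hypot(x[i], x[k])
    c, s = x[i] / r, x[k] / r
    G = np.eye(len(x))
    G[i, i], G[i, k], G[k, i], G[k, k] = c, s, -s, c
    return G

x = np.array([3.0, 1.0, 2.0])
assert np.allclose(householder(x) @ x, [-np.linalg.norm(x), 0.0, 0.0])
assert np.isclose((givens(x, 0, 2) @ x)[2], 0.0)
```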

5.2 Gram-Schmidt orthonormalization
If $A = (a_1, \ldots, a_n) \in \mathbb{R}^{m\times n}$ has linearly independent columns, then by subtracting from $a_k$ its projections onto the previously obtained $q_i$ ($i < k$) and normalizing appropriately, we can gradually orthonormalize A into an orthonormal set $Q = (q_1, \ldots, q_n)$ as follows:

$$
\begin{aligned}
q_1 &= a_1/r_{11}, & r_{11} &= \|a_1\|_2, \\
q_2 &= (a_2 - r_{12}q_1)/r_{22}, & r_{12} &= a_2^Tq_1,\; r_{22} = \|a_2 - r_{12}q_1\|_2, \\
q_3 &= (a_3 - r_{13}q_1 - r_{23}q_2)/r_{33}, & r_{13} &= a_3^Tq_1,\; r_{23} = a_3^Tq_2,\; r_{33} = \|a_3 - r_{13}q_1 - r_{23}q_2\|_2, \\
&\;\;\vdots \\
q_k &= \Big(a_k - \sum_{i=1}^{k-1} r_{ik}q_i\Big)/r_{kk}, & r_{ik} &= q_i^Ta_k,\; r_{kk} = \Big\|a_k - \sum_{i=1}^{k-1} r_{ik}q_i\Big\|_2.
\end{aligned}
$$

Therefore $a_k = \sum_{i=1}^{k} r_{ik}q_i$, i.e., A is the product of Q and an upper triangular matrix $R = (r_{ij})$; this process is called the Gram-Schmidt orthonormalization process.
This process is sensitive to roundoff errors. A modified version of the Gram-Schmidt process subtracts from each succeeding $a_i$ its projection onto $q_k$ as soon as $q_k$ is determined, instead of subtracting from $a_i$ the projections onto all previous $q_k$ at once. When $q_k$ is determined, we first subtract from $a_i$ ($i > k$) its projection onto $q_k$ and then normalize the new $a_{k+1}$ to get $q_{k+1}$:

for $k = 1$ to $n$
    $r_{kk} = \|a_k^{(k)}\|_2$, $\quad q_k = a_k^{(k)}/r_{kk}$
    $r_{ki} = q_k^Ta_i^{(k)}$, $\quad a_i^{(k+1)} = a_i^{(k)} - r_{ki}q_k$, for $i = k+1:n$

After Q is calculated, by sequentially substituting $a_k^{(k)}$ with the previous $a_k^{(i)}$, $i < k$, we can easily recover the representation of $a_k^{(1)}$, namely $a_k$, with respect to $q_1, \ldots, q_k$:
$$
q_k = a_k^{(k)}/r_{kk}
= \frac{1}{r_{kk}}\left(a_k^{(k-1)} - r_{k-1,k}\,q_{k-1}\right)
= \frac{1}{r_{kk}}\left(a_k^{(k-2)} - r_{k-2,k}\,q_{k-2} - r_{k-1,k}\,q_{k-1}\right)
= \cdots
= \frac{1}{r_{kk}}\left(a_k^{(1)} - \sum_{i=1}^{k-1} r_{ik}q_i\right).
$$
Thus $a_k = \sum_{i=1}^{k} r_{ik}q_i$, which again implies that A is the product of Q and an upper triangular $R = (r_{ij})$.
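A direct transcription of the modified Gram-Schmidt loop into code (an added sketch, assuming NumPy and full column rank) is:

```python
import numpy as np

def mgs_qr(A):
    """Thin QR by modified Gram-Schmidt: A = Q R with orthonormal Q (m x n) and upper triangular R."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for i in range(k + 1, n):
            R[k, i] = Q[:, k] @ A[:, i]          # r_ki = q_k^T a_i^(k)
            A[:, i] -= R[k, i] * Q[:, k]         # a_i^(k+1) = a_i^(k) - r_ki q_k
    return Q, R

A = np.random.default_rng(4).standard_normal((6, 3))
Q, R = mgs_qr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(3))
```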

5.3 QR Decomposition
Theorem 6 QR Decomposition. Let $A \in \mathbb{R}^{m\times n}$. Then there exist an orthogonal matrix $Q \in \mathbb{R}^{m\times m}$ and an upper triangular matrix $R \in \mathbb{R}^{m\times n}$ such that
$$A = QR.$$
All the methods in the previous subsections can be viewed as different constructive proofs of the QR decomposition, including Householder reflections, Givens rotations, and the Gram-Schmidt orthonormalization process and its modified version.

Corollary 2 Suppose $A \in \mathbb{R}^{m\times n}$ has full column rank ($m \geq n$) and $A = QR$ is a QR decomposition, with column partitions $A = (a_1, \ldots, a_n)$ and $Q = (q_1, \ldots, q_m)$. Then
$$
\mathrm{span}\{a_1, \ldots, a_k\} = \mathrm{span}\{q_1, \ldots, q_k\}, \quad k = 1:n,
$$
$$
\mathrm{range}(A) = \mathrm{span}\{q_1, \ldots, q_n\}, \qquad
\mathrm{range}(A)^{\perp} = \mathrm{span}\{q_{n+1}, \ldots, q_m\}.
$$
Let $Q_1 = (q_1, \ldots, q_n)$, so that $A = Q_1R_1$ with $R_1 \in \mathbb{R}^{n\times n}$; then $G = R_1^T$ is the lower triangular Cholesky factor of $A^TA$.
The first part of the corollary follows easily from the Gram-Schmidt process. For the second part, $A^TA = (Q_1R_1)^TQ_1R_1 = R_1^TR_1 = G^TG$, and $R_1$ is the unique upper triangular factor with positive diagonal entries.

5.4 Least Square Fitting


Consider the overdetermined system $Ax = b$, where the data matrix $A \in \mathbb{R}^{m\times n}$ and the observation vector $b \in \mathbb{R}^m$ with $m \geq n$; typically the system has no exact solution if b is not an element of $\mathrm{range}(A)$. The goal of the least squares fitting problem is to find $x \in \mathbb{R}^n$ minimizing $J = \|Ax - b\|_p$; for $p = 2$ the objective function is analytically tractable.
If A does not have full column rank, the solution to the LS fitting is not unique: if x minimizes J then $x + z$ with $z \in \mathrm{null}(A)$ is also a solution. Assuming A has full column rank, the unique LS solution $x_{LS}$ is given by the pseudo-inverse, $x_{LS} = A^{\dagger}b = (A^TA)^{-1}A^Tb$. As mentioned before, the inverse of a matrix is typically not easy to compute, so in practice the normal equation $A^TAx_{LS} = A^Tb$ is solved via the QR, LU or SVD decomposition.
Note that the 2-norm of a vector is invariant under orthogonal transformations. Suppose $A = QR$ is the QR decomposition; then
$$
Q^TA = R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}\begin{matrix} n \\ m-n \end{matrix},
$$
where $R_1$ is square upper triangular, and
$$
Q^Tb = \begin{pmatrix} c \\ d \end{pmatrix}\begin{matrix} n \\ m-n \end{matrix},
$$
so that
$$
\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 = \|R_1x - c\|_2^2 + \|d\|_2^2.
$$
Thus $x_{LS} \in \mathbb{R}^n$ can be readily obtained by back substitution of the upper triangular system $R_1x = c$. Once the QR decomposition of A is computed by Householder reflections or any other method, the full-rank LS problem can be solved by the above procedure.
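In code, the QR-based procedure might be sketched as follows (an addition; it uses NumPy's reduced QR rather than an explicit Householder implementation, and checks the result against the normal equations):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
A, b = rng.standard_normal((8, 3)), rng.standard_normal(8)

Q1, R1 = np.linalg.qr(A, mode='reduced')   # A = Q1 R1, R1 square upper triangular
c = Q1.T @ b
x_ls = solve_triangular(R1, c)             # back substitution for R1 x = c

# agrees with the normal-equation solution (A^T A) x = A^T b
assert np.allclose(x_ls, np.linalg.solve(A.T @ A, A.T @ b))
```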
An alternative method is to solve $A^TAx_{LS} = A^Tb$ via the LU decomposition of $A^TA$. Since $C = A^TA$ is a symmetric positive definite matrix, there exists a Cholesky decomposition $C = GG^T$, so solving two triangular systems gives the LS solution:
$$d = A^Tb, \qquad Gy = d, \qquad G^Tx_{LS} = y.$$
If A does not have full column rank, $A^TA$ may not be invertible; in that case we can use the SVD of A to solve the LS problem.

Theorem 7 Let $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = r$, and let $A = U\Sigma V^T$ be its SVD. If $U = (u_1, \ldots, u_m)$ and $V = (v_1, \ldots, v_n)$ are column partitions and $b \in \mathbb{R}^m$, then the LS solution to $Ax = b$ is
$$x_{LS} = \sum_{i=1}^{r}\frac{u_i^Tb}{\sigma_i}v_i.$$

Proof 3
$$
\|Ax - b\|_2^2 = \|(U^TAV)(V^Tx) - U^Tb\|_2^2 = \|\Sigma\alpha - U^Tb\|_2^2
= \sum_{i=1}^{r}(\sigma_i\alpha_i - u_i^Tb)^2 + \sum_{i=r+1}^{m}(u_i^Tb)^2,
$$
where $\alpha = V^Tx$. Clearly only the first sum depends on x, so $\alpha = \left(u_1^Tb/\sigma_1, \ldots, u_r^Tb/\sigma_r, 0, \ldots, 0\right)^T$ minimizes the fit, and thus
$$x_{LS} = V\alpha = \sum_{i=1}^{r}\frac{u_i^Tb}{\sigma_i}v_i.$$

In addition, given the SVD, the pseudo-inverse of $A \in \mathbb{R}^{m\times n}$ is defined as $A^{\dagger} = V\Sigma^{\dagger}U^T \in \mathbb{R}^{n\times m}$, where
$$\Sigma^{\dagger} = \mathrm{diag}\left(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0\right) \in \mathbb{R}^{n\times m}.$$
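A numerical sketch of Theorem 7 and the pseudo-inverse for a rank-deficient matrix (added here, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((6, 2))
A = B @ rng.standard_normal((2, 4))        # 6 x 4 matrix of rank 2 (rank deficient)
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)                      # numerical rank

# x_LS = sum_{i<=r} (u_i^T b / sigma_i) v_i
x_ls = sum((U[:, i] @ b) / s[i] * Vt[i, :] for i in range(r))

# same as applying the pseudo-inverse A^+ = V Sigma^+ U^T
assert np.allclose(x_ls, np.linalg.pinv(A) @ b)
```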

6 Schur Decomposition and Eigenvalue Decomposition


Given a square matrix A, we are interested in the simplest form B, over $\mathbb{C}$ or $\mathbb{R}$, obtainable by a unitary (orthogonal) similarity transform $A = QBQ^H$ or a general similarity transform $A = XBX^{-1}$. The matrix B reveals intrinsic information about A, since many attributes and structures of a matrix are invariant under similarity transforms.

Definition 7 Let $A \in \mathbb{C}^{n\times n}$. If there exists a non-zero vector $x \in \mathbb{C}^n$ satisfying $Ax = \lambda x$, $\lambda \in \mathbb{C}$, then $\lambda$ is called an eigenvalue of A and x is referred to as an eigenvector.

The eigenvalues are the n roots of A's characteristic polynomial $\det(\lambda I - A)$; the set of eigenvalues is also called the spectrum of A. The sum of the diagonal elements of A is referred to as the trace of A, and
$$\mathrm{trace}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i.$$

6.1 Schur Decomposition


Theorem 8 Schur Decomposition. Let $A \in \mathbb{C}^{n\times n}$; then there exists a unitary $Q \in \mathbb{C}^{n\times n}$ such that
$$Q^HAQ = T = D + N,$$
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and N is strictly upper triangular. If $Q = (q_1, \ldots, q_n)$ is a column partitioning of the unitary matrix Q, the $q_i$ are referred to as Schur vectors, and from $AQ = QT$ the Schur vectors satisfy
$$Aq_k = \lambda_kq_k + \sum_{i=1}^{k-1} n_{ik}q_i, \quad k = 1:n.$$

Proof 4 The theorem obviously holds when $n = 1$. Suppose $\lambda$ is an eigenvalue of A and $Ax = \lambda x$ with $x \in \mathbb{C}^n$ a unit vector. Then x can be extended to a unitary matrix $U = (x, u_2, \ldots, u_n)$, and
$$
AU = (Ax, Au_2, \ldots, Au_n) = (\lambda x, Au_2, \ldots, Au_n)
= U\begin{pmatrix} \lambda & \omega^T \\ 0 & C \end{pmatrix}.
$$
Suppose the theorem holds for matrices of order $n-1$, so there is a unitary $\tilde U$ such that $\tilde U^HC\tilde U$ is upper triangular. Then, letting $Q = U\,\mathrm{diag}(1, \tilde U)$, it is easy to verify that the theorem holds for order n.

For a real matrix A, the eigenvalues are either real or occur in complex conjugate pairs. In order to work entirely with real numbers, T becomes block upper triangular with either 1-by-1 or 2-by-2 diagonal blocks; this is called the real Schur decomposition.

Theorem 9 Real Schur Decomposition. Let $A \in \mathbb{R}^{n\times n}$; then there exists an orthogonal $Q \in \mathbb{R}^{n\times n}$ such that
$$
Q^TAQ = R = \begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1m} \\
0 & R_{22} & \cdots & R_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & R_{mm}
\end{pmatrix},
$$
where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate eigenvalues.

Proof 5 The theorem obviously holds for $n = 1$. Let $A \in \mathbb{R}^{n\times n}$. If A has a real eigenvalue $\lambda$, then A can be block triangularized and reduced to a problem of order $n-1$ as shown in the proof of the Schur decomposition. If A has a pair of complex conjugate eigenvalues $\lambda_{1,2} = \alpha \pm i\beta$, it is easy to see that the corresponding eigenvectors are also complex conjugate, $x_{1,2} = y \pm iz$, where y and z are real vectors:
$$
A(y + iz) = (\alpha + i\beta)(y + iz) \;\Rightarrow\; A\,(y\ z) = (y\ z)\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}.
$$
Since $\beta \neq 0$, y and z are linearly independent, so by the Gram-Schmidt process we can extend them to an orthogonal $Q = (q_1, q_2, q_3, \ldots, q_n)$ with $\mathrm{span}\{q_1, q_2\} = \mathrm{span}\{y, z\}$, such that
$$
Q^TAQ = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},
$$
where $R_{11}$ is a 2-by-2 matrix with eigenvalues $\lambda_{1,2} = \alpha \pm i\beta$. By induction the theorem holds.

Corollary 3 A is normal, namely AH A = AAH , if and only if there exists a unitary Q ∈ Cn×n such
that QH AQ = diag(λ1 , . . . , λn ).

Corollary 4 If A is a real symmetric matrix, then there exists an orthogonal $Q \in \mathbb{R}^{n\times n}$ such that $Q^TAQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Consider the real Schur decomposition of a symmetric A: R is then also symmetric, and since the eigenvalues of 2-by-2 symmetric matrices are real, no 2-by-2 blocks remain, so A can be diagonalized.
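As an added illustration (assuming SciPy's schur routine), the real Schur form and its diagonal specialization for a symmetric matrix can be checked as follows:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))

T, Q = schur(A, output='real')             # Q^T A Q = T, quasi upper triangular
assert np.allclose(Q @ T @ Q.T, A)

S = A + A.T                                # symmetric case: T becomes diagonal
Ts, Qs = schur(S)
assert np.allclose(Ts, np.diag(np.diag(Ts)))
```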

6.2 Eigenvalue Decomposition


Theorem 10 Block Diagonal Decomposition. Let $A \in \mathbb{C}^{n\times n}$ with a Schur decomposition
$$
Q^HAQ = T = \begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1q} \\
0 & T_{22} & \cdots & T_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & T_{qq}
\end{pmatrix}.
$$
Assume that the $T_{ii}$ are square and that the eigenvalues of $T_{ii}$ and $T_{jj}$ are different whenever $i \neq j$. Then there exists a nonsingular matrix $Y \in \mathbb{C}^{n\times n}$ such that
$$(Y^{-1}Q^H)A(QY) = \mathrm{diag}(T_{11}, \ldots, T_{qq}).$$

For a matrix $A \in \mathbb{C}^{n\times n}$, the multiplicity of the eigenvalue $\lambda_i$ as a root of the characteristic polynomial is referred to as the algebraic multiplicity of $\lambda_i$; the dimension of $\mathrm{null}(\lambda_iI - A)$ is called the geometric multiplicity of $\lambda_i$, which equals the number of independent eigenvectors associated with $\lambda_i$.

Corollary 5 Diagonal Decomposition. Let $A \in \mathbb{C}^{n\times n}$. There exists a non-singular $X \in \mathbb{C}^{n\times n}$ which diagonalizes A,
$$X^{-1}AX = \mathrm{diag}(\lambda_1, \ldots, \lambda_n),$$
if and only if the geometric multiplicity of every eigenvalue $\lambda_i$ equals its algebraic multiplicity.
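A small added example (NumPy assumed) contrasting a diagonalizable matrix with a defective one whose geometric multiplicity is smaller than its algebraic multiplicity:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric, hence diagonalizable

lam, X = np.linalg.eig(A)                  # columns of X are eigenvectors
assert np.allclose(X @ np.diag(lam) @ np.linalg.inv(X), A)     # A = X diag(lambda) X^{-1}

# a defective matrix: eigenvalue 1 with algebraic multiplicity 2 but geometric multiplicity 1
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
assert np.linalg.matrix_rank(J - np.eye(2)) == 1               # null(1*I - J) is one-dimensional
```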

Theorem 11 Jordan Decomposition. Let $A \in \mathbb{C}^{n\times n}$; then there exists a non-singular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \mathrm{diag}(J_1, \ldots, J_t)$, where
$$
J_i = \begin{pmatrix}
\lambda_i & 1 & & 0 \\
0 & \lambda_i & \ddots & \\
\vdots & & \ddots & 1 \\
0 & \cdots & 0 & \lambda_i
\end{pmatrix}
$$
is an $m_i$-by-$m_i$ matrix with $m_1 + \ldots + m_t = n$; the $J_i$ are referred to as Jordan blocks.

6.3 Hessenberg Decomposition
Theorem 12 Hessenberg Decomposition. Let $A \in \mathbb{R}^{n\times n}$; then there exists an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ such that
$$Q^TAQ = H,$$
where H is an upper Hessenberg matrix, i.e., the elements below the subdiagonal are zero.
Proof 6 We claim Q is a product of $n-2$ Householder matrices $P_1, \ldots, P_{n-2}$. We can find an order $n-1$ Householder reflection $\tilde P_1$ that zeroes all but the first entry of the first column of A below the diagonal: let $\alpha = (a_{21}, \ldots, a_{n1})^T$ and $\tilde P_1\alpha = (\tilde a_{21}, 0, \ldots, 0)^T$. Let $P_1 = \mathrm{diag}(1, \tilde P_1)$; note that Householder matrices are symmetric, so $P_1$ is symmetric, and
$$
P_1^TAP_1 =
\begin{pmatrix} 1 & 0 \\ 0 & \tilde P_1 \end{pmatrix}
\begin{pmatrix} a_{11} & \omega^T \\ \alpha & A_{22} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \tilde P_1 \end{pmatrix}
=
\begin{pmatrix} a_{11} & \omega^T\tilde P_1 \\ \tilde P_1\alpha & \tilde P_1A_{22}\tilde P_1 \end{pmatrix}
$$
is upper Hessenberg in its first column. Now suppose step $k-1$ has been done, i.e., we have found $k-1$ Householder matrices $P_1, \ldots, P_{k-1}$ such that
$$
(P_1\cdots P_{k-1})^TA\,(P_1\cdots P_{k-1}) =
\begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & b_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{pmatrix}
$$
is upper Hessenberg through its first $k-1$ columns. $B_{32}$ is a vector with $n-k$ elements, so we can find an order $n-k$ Householder matrix $\tilde P_k$ that zeroes all elements of $B_{32}$ except the first entry. Let $P_k = \mathrm{diag}(I_k, \tilde P_k)$; then
$$
(P_1\cdots P_k)^TA\,(P_1\cdots P_k) =
\begin{pmatrix} B_{11} & B_{12} & B_{13}\tilde P_k \\ B_{21} & b_{22} & B_{23}\tilde P_k \\ 0 & \tilde P_kB_{32} & \tilde P_kB_{33}\tilde P_k \end{pmatrix}
$$
is upper Hessenberg through its first k columns. By induction, the theorem holds.
If the matrix A is symmetric, the Hessenberg decomposition leads to a tridiagonal form of A. This claim can be easily verified by setting $\omega = \alpha$ and $B_{23} = B_{32}^T$ in the above proof:
$$
Q^TAQ = H =
\begin{pmatrix}
h_{11} & h_{12} & 0 & \cdots & 0 \\
h_{21} & h_{22} & \ddots & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & & h_{n-1,n} \\
0 & \cdots & 0 & h_{n,n-1} & h_{nn}
\end{pmatrix}.
$$
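For illustration (assuming SciPy's hessenberg routine, which is not part of the original text), the Hessenberg form and its tridiagonal specialization can be checked as follows:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))

H, Q = hessenberg(A, calc_q=True)          # Q^T A Q = H, zero below the subdiagonal
assert np.allclose(Q @ H @ Q.T, A)
assert np.allclose(np.tril(H, -2), 0)      # entries below the first subdiagonal vanish

S = A + A.T
Hs = hessenberg(S)                         # symmetric input yields a tridiagonal H
assert np.allclose(np.triu(Hs, 2), 0) and np.allclose(np.tril(Hs, -2), 0)
```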
The companion matrix decomposition is a non-orthogonal (non-unitary in the complex domain) analog of the Hessenberg decomposition, just as the Jordan decomposition relates to the Schur decomposition. A companion matrix has one of the following forms (or their transposes), which can be easily derived from the characteristic polynomial $\det(\lambda I - C) = c_0 + c_1\lambda + \ldots + c_{n-1}\lambda^{n-1} + \lambda^n$:
$$
C = \begin{pmatrix}
0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{n-1}
\end{pmatrix}
\qquad\text{or}\qquad
C = \begin{pmatrix}
-c_{n-1} & \cdots & -c_2 & -c_1 & -c_0 \\
1 & \cdots & 0 & 0 & 0 \\
0 & \ddots & 0 & 0 & 0 \\
\vdots & & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}.
$$

The Schur decomposition is an important means of computing eigenvalues. A practical iteration scheme based on the Hessenberg decomposition and the QR decomposition, called the QR iteration, proceeds as follows:

Hessenberg decomposition: $H_0 = U_0^TAU_0$
for $k = 1, 2, \ldots$
    QR decomposition: $H_{k-1} = U_kR_k$
    $H_k = R_kU_k$

The QR iteration converges to the Schur decomposition of A; please refer to [3] for details.
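A bare-bones, unshifted version of this iteration is sketched below for illustration only; practical implementations use shifts and deflation, as discussed in [3]. The test matrix is constructed with known, well-separated eigenvalues so that the plain iteration converges:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(9)
Q0, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q0 @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5]) @ Q0.T   # known, well separated eigenvalues

H = hessenberg(A)                          # H_0 = U_0^T A U_0
for _ in range(100):                       # unshifted QR iteration
    Qk, Rk = np.linalg.qr(H)               # H_{k-1} = Q_k R_k
    H = Rk @ Qk                            # H_k = R_k Q_k (similar to H_{k-1})

# the diagonal of H converges to the eigenvalues of A
assert np.allclose(np.sort(np.diag(H)), [0.5, 1.0, 2.0, 3.0, 5.0], atol=1e-6)
```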

7 Biconjugate Decomposition
7.1 Biconjugate Decomposition
A variety of matrix decomposition processes can be unified by the Wedderburn rank-one reduction theorem [6], including the Gram-Schmidt orthogonalization process and the LU, QR and SVD decompositions.

Theorem 13 If $A \in \mathbb{R}^{m\times n}$, and $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are vectors such that $\omega = y^TAx \neq 0$, then the matrix $B \doteq A - \omega^{-1}Axy^TA$ has rank exactly one less than the rank of A.

Proof 7 We show that the dimension of B's null space is one larger than that of A. For all $z \in \mathrm{null}(A)$, i.e. $Az = 0$, we get $Bz = 0$, so $\mathrm{null}(A) \subseteq \mathrm{null}(B)$. Conversely, for all $z \in \mathrm{null}(B)$,
$$0 = Bz = Az - \omega^{-1}Ax(y^TAz).$$
Let $k = \omega^{-1}y^TAz$, which is a scalar; thus
$$A(z - kx) = 0,$$
so $(z - kx) \in \mathrm{null}(A)$. Note that $Ax \neq 0$, so the null space of B is obtained from that of A by adding x to its basis, which increases the dimension of this space by one. Thus, the rank of B is one less than that of A.

Suppose $\mathrm{rank}(A) = r$. We can define a rank-reducing process that generates a sequence of Wedderburn matrices $\{A_k\}$ by
$$A_1 \doteq A, \qquad A_{k+1} \doteq A_k - \omega_k^{-1}A_kx_ky_k^TA_k,$$
for any vectors $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}^m$ satisfying $\omega_k = y_k^TA_kx_k \neq 0$. The sequence terminates in r steps since $\mathrm{rank}(A_k)$ decreases by exactly one at each step. This process can be summarized in matrix outer-product factorization form:
$$A = \Phi\Omega^{-1}\Psi^T \tag{1}$$
where $\Omega \doteq \mathrm{diag}(\omega_1, \ldots, \omega_r)$, $\Phi \doteq (\phi_1, \ldots, \phi_r) \in \mathbb{R}^{m\times r}$ and $\Psi \doteq (\psi_1, \ldots, \psi_r) \in \mathbb{R}^{n\times r}$ with
$$\phi_k \doteq A_kx_k, \qquad \psi_k \doteq A_k^Ty_k.$$
Further, equation (1) can be written as
$$
A = (A_1x_1, \ldots, A_rx_r)\,\Omega^{-1}\begin{pmatrix} y_1^TA_1 \\ \vdots \\ y_r^TA_r \end{pmatrix}. \tag{2}
$$
Note that every $A_k$ can be expressed in terms of A, so we can find $U = (u_1, \ldots, u_r) \in \mathbb{R}^{n\times r}$ and $V = (v_1, \ldots, v_r) \in \mathbb{R}^{m\times r}$ with $Au_k = A_kx_k$ and $v_k^TA = y_k^TA_k$:
$$
u_k \doteq x_k - \sum_{i=1}^{k-1}\frac{v_i^TAx_k}{v_i^TAu_i}u_i, \qquad
v_k \doteq y_k - \sum_{i=1}^{k-1}\frac{y_k^TAu_i}{v_i^TAu_i}v_i.
$$
Thus equation (1) can be rewritten as
$$V^TAU = \Omega, \tag{3}$$
$$A = AU\Omega^{-1}V^TA. \tag{4}$$

The matrix decomposition process in equations (1), (3) and (4) is referred to as the biconjugate decomposition in [6], which can be easily verified by substituting the Wedderburn matrices $\{A_k\}$ and $A_{r+1} = 0$. (U, V) is called an A-biconjugate pair and (X, Y) is said to be A-biconjugatable.
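A literal sketch of the rank-reducing sequence (an illustrative addition; it draws random x_k and y_k, assumes the resulting omega_k are nonzero, and verifies that the rank drops by one per step and that equation (1) holds):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 3))            # rank 3
Ak, r = A.copy(), np.linalg.matrix_rank(A)

Phi, Psi, omegas = [], [], []
for k in range(r):
    # pick any x_k, y_k with omega_k = y_k^T A_k x_k != 0 (random vectors, assumed nonzero)
    x, y = rng.standard_normal(3), rng.standard_normal(5)
    omega = y @ Ak @ x
    Phi.append(Ak @ x)                     # phi_k = A_k x_k
    Psi.append(Ak.T @ y)                   # psi_k = A_k^T y_k
    omegas.append(omega)
    Ak = Ak - np.outer(Ak @ x, y @ Ak) / omega   # Wedderburn rank-one reduction
    assert np.linalg.matrix_rank(Ak, tol=1e-8) == r - (k + 1)

Phi, Psi = np.column_stack(Phi), np.column_stack(Psi)
assert np.allclose(Phi @ np.diag(1.0 / np.array(omegas)) @ Psi.T, A)   # equation (1)
```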
Depending on the initial matrix A and the choice of the vector sets (X, Y), a variety of factorizations can be derived from the biconjugate decomposition. Here we list the results for some well-known matrix decompositions; please refer to [6] for details.

Gram-Schmidt Let A be the identity matrix and let (X, Y) be identical and contain the vectors for which an orthogonal basis is desired; then $U = V$ gives the resulting orthogonal basis.

LDM For $A \in \mathbb{R}^{n\times n}$ of rank n, if the A-biconjugatable (X, Y) are both the identity matrix, (I, I), then equation (3) provides the unique LDM$^T$ decomposition of A, with $A = V^{-T}\Omega U^{-1}$, where $V^{-T}$ and $U^{-T}$ are unit lower triangular matrices.

QR For $A \in \mathbb{R}^{n\times n}$ of rank n, if the A-biconjugatable (X, Y) is (I, A), it gives the biconjugate pair $(U, V) = (R_1^{-1}, Q\Psi)$ and $\Omega = \Psi^2$ in equation (3), where $\Psi$ is a diagonal matrix and $R_1$ is unit upper triangular; $R = \Psi R_1$ and Q give the QR decomposition $A = QR$.

SVD For $A \in \mathbb{R}^{m\times n}$ of rank r, with the SVD of A given as $A = U\Sigma V^T$, the A-biconjugatable (X, Y) is (V, U).

7.2 Summary


For $A \in \mathbb{C}^{m\times n}$, the decompositions discussed in this tutorial are related as follows:

SVD: $A = U\Sigma V^H$
QR: $A = QR$
(square) Schur: $A = QTQ^H$; (non-unitary) Jordan: $A = XJX^{-1}$; Eigen: $A = X\Sigma X^{-1}$
(real square) Real Schur: $A = QRQ^T$; (symmetric) Eigen: $A = X\Sigma X^T$
(real square) Hessenberg: $A = QHQ^T$; (non-orthogonal) Companion; (symmetric) Tri-diagonal
(non-singular square) LU (LDM): $A = LU = LDM^T$; (symmetric) LDL: $A = LDL^T$; (symmetric positive definite square) Cholesky: $A = GG^T$

References
[1] E. W. Weisstein. Matrix decomposition. MathWorld–A Wolfram Web Resource. [Online]. Available:
http://mathworld.wolfram.com/MatrixDecomposition.html

[2] R. Bronson, Matrix Methods: An Introduction, 2nd ed. Academic Press, 1991.

[3] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Johns Hopkins University Press, 1996.

[4] L. Hubert, J. Meulman, and W. Heiser, “Two purposes for matrix factorization: A historical appraisal,” SIAM
Review, vol. 42, no. 1, pp. 68–82, 2000.

[5] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd ed., W. Rheinboldt, Ed. Academic Press, 1985.

[6] M. T. Chu, R. E. Funderlic, and G. H. Golub, “A rank-one reduction formula and its applications to matrix
factorizations,” SIAM Review, vol. 37, no. 4, pp. 512–530, 1995.
