Matrix Analysis Notes

Contents

Preface

1 Preliminaries
    1.1 Euclidean space
    1.2 The four fundamental subspaces
    1.3 Projectors

2 Spectral Theory
    2.1 Eigenvalues and eigenvectors
    2.2 The Schur form
    2.3 The Jordan form

3 The Singular Value Decomposition
    3.1 Detour: Hermitian matrices
    3.2 Singular values and singular vectors
    3.3 Matrix approximation

4 Supplemental Material
    4.1 Gaussian Elimination and the LU Factorization
    4.2 The QR Factorization

References
Preface

These are notes for a refresher course offered to incoming graduate students in the Institute for Computational and Mathematical Engineering (ICME) at Stanford University. The course prepares students for the ICME core courses in numerical linear algebra (CME 302) and numerical optimization (CME 304) and the ICME Ph.D. qualifying examinations on these subjects.
These notes are based on the lecture notes of Mark Embree [2, 3] used to teach advanced undergraduate and graduate students in applied mathematics at Rice University, and they cover spectral theory, the singular value decomposition, and allied concepts. The course supplements these notes with material from Parts II and IV of Numerical Linear Algebra by Nick Trefethen and David Bau [4].
I thank Ernest Ryu, Michael Saunders, and Ed Schmerling for their
helpful comments and suggestions about my choice of topics and style of
presentation. I must also thank Mark Embree, for kindling my interest
in this beautiful subject, and Hannah, for her tireless support at home.
Stanford, California

Yuekai Sun

Preface

This edition of the refresher course notes has been edited to address some typographical errors, though I have probably also managed to introduce several new ones. Furthermore, a few sections have been slightly revised to present material in a different manner, and some material has been added to discuss topics not covered in the original version. The Jordan canonical form is now presented in a significantly different manner than in the original, and sections have been added on the real Schur form and some matrix factorizations. These sections are written somewhat differently than the existing material in the notes, as they are intended to provide a brief introduction to the material and then point the reader towards some resources for further study.
The bulk of the notes still closely mirrors the original version written a year ago by Yuekai, and I would like to thank him for the original construction of these notes.
Stanford, California

Anil Damle


1 Preliminaries

In this first chapter, we refresh basic concepts in matrix analysis. The


purpose is two-fold: (i) to ensure readers are familiar with the concepts
relevant to our later exposition and (ii) to standardize notation. We
expect readers to be familiar with these concepts; hence we omit all but
the most trivial proofs. Readers who desire a more detailed treatment
should refer to Part I and Lecture 6 of Trefethen and Bau's excellent
textbook [4].

1.1 Euclidean space

1.1.1 Vectors
The basic building blocks are complex numbers (scalars), which we denote by italicized Greek and Latin letters, e.g. $\alpha, a \in \mathbb{C}$. Using these scalars, we can build column vectors of length $n$, which we denote by lower-case bold-faced letters, e.g. $x \in \mathbb{C}^n$. The $j$th entry of $x$ is denoted by $x_j \in \mathbb{C}$. Sometimes, we shall emphasize that a scalar or vector has real entries: $x \in \mathbb{R}^n$, $v_i \in \mathbb{R}$.
A set of vectors is linearly independent provided the zero vector cannot be expressed as a nontrivial linear combination of the vectors in the set:
$$\alpha_1 x_1 + \cdots + \alpha_n x_n = 0 \implies \alpha_1 = \cdots = \alpha_n = 0.$$
Equivalently, no vector in the set can be expressed as a linear combination of the other vectors in the set.
A set $S$ is called a subspace if it is closed under vector addition and scalar multiplication; i.e. (i) if $x, y \in S$, then $x + y \in S$ and (ii) if $x \in S$ and $\alpha \in \mathbb{C}$, then $\alpha x \in S$.
The span of a set of vectors is the set of all possible linear combinations of the vectors in the set:
$$\mathrm{span}\{x_1, \ldots, x_n\} = \{\alpha_1 x_1 + \cdots + \alpha_n x_n \mid \alpha_1, \ldots, \alpha_n \in \mathbb{C}\}.$$
The span of a set of vectors is a subspace.
A basis for a subspace S is a set of linearly independent vectors that
span S. Bases are not unique but every basis for a subspace contains
the same number of vectors. We call this number the dimension of the
subspace $S$, written $\dim(S)$. If $S \subseteq \mathbb{C}^n$, then $\dim(S) \le n$.
We seek to extend our intuition about geometry naturally to $\mathbb{C}^n$, which implies we seek generalizations of the usual notions of angle and distance or length. First, we define the dot product between two vectors $x, y \in \mathbb{C}^n$ to be
$$\langle x, y \rangle := \sum_{i=1}^{n} \bar{x}_i y_i = x^* y,$$
where $x^*$ denotes the conjugate-transpose of $x$:
$$x^* = \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix} \in \mathbb{C}^{1\times n},$$
a row vector consisting of the complex conjugates of the entries of $x$. Sometimes we turn a column vector into a row vector without conjugating the entries; we call this the transpose: $x^T = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$. Note that $x^* = \bar{x}^T$, and if $x \in \mathbb{R}^n$, then $x^* = x^T$.
The dot product provides a notion of magnitude, or norm, of a vector $x \in \mathbb{C}^n$. This is the Euclidean norm of $x$:
$$\|x\|_2 = \sqrt{x^* x} = \sqrt{\sum_{i=1}^{n} |x_i|^2}.$$
Here the subscript is used to distinguish this specific norm. More generally, a norm must satisfy $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$. A norm also satisfies: (i) $\|\alpha x\| = |\alpha| \|x\|$ for all $\alpha \in \mathbb{C}$, and (ii) the triangle inequality, $\|x + y\| \le \|x\| + \|y\|$. We call a vector of norm 1 a unit vector.
The dot product satisfies the Cauchy-Schwarz inequality:
$$|x^* y| \le \|x\|_2 \|y\|_2, \tag{1.1}$$
a very useful inequality that appears in many mathematical subjects. Equality holds in (1.1) when $y$ is a scalar multiple of $x$.
The dot product also provides a notion of acute angle between two vectors $x, y \in \mathbb{C}^n$:
$$\angle(x, y) = \arccos \frac{|x^* y|}{\|x\|_2 \|y\|_2}.$$
This notion of angle is, in essence, a measure of the sharpness of the Cauchy-Schwarz inequality. The argument to arccos is always between 0 and 1, so the angle is always between 0 and $\pi/2$.
We say the vectors $x$ and $y$ are orthogonal if $x^* y = 0$, because $\arccos(0) = \pi/2$. We denote this case by $x \perp y$. This definition of angle and norm immediately yields a generalization of the Pythagorean Theorem to $\mathbb{C}^n$: if $x$ and $y$ are orthogonal, then
$$\|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2.$$
Two sets $U$ and $V$ are orthogonal if $u \perp v$ for every $u \in U$ and $v \in V$. The orthogonal complement of a set $U \subseteq \mathbb{C}^n$ is the set of vectors orthogonal to every $u \in U$, denoted by
$$U^\perp := \{v \in \mathbb{C}^n : u^* v = 0 \text{ for all } u \in U\}.$$
The sum of two subspaces $U$ and $V$ is denoted by
$$U + V := \{u + v : u \in U, v \in V\}.$$
If the two subspaces intersect trivially ($U \cap V = \{0\}$), then we call the sum of $U$ and $V$ a direct sum, denoted by $U \oplus V$. We use this special notation because in the case of a direct sum, every vector $x \in U \oplus V$ can be decomposed uniquely as $x = u + v$ for some $u \in U$ and $v \in V$.
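To make these definitions concrete, here is a minimal NumPy sketch (NumPy is an assumption of this illustration, not something used in the notes themselves; the vectors are arbitrary random examples) that computes the dot product, the Euclidean norms, and the angle between two complex vectors, and checks the Cauchy-Schwarz inequality numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    inner = np.vdot(x, y)          # <x, y> = x* y (np.vdot conjugates the first argument)
    norm_x = np.linalg.norm(x)     # ||x||_2 = sqrt(x* x)
    norm_y = np.linalg.norm(y)

    # Cauchy-Schwarz: |x* y| <= ||x||_2 ||y||_2
    assert abs(inner) <= norm_x * norm_y + 1e-12

    # Angle between x and y, always in [0, pi/2]
    angle = np.arccos(abs(inner) / (norm_x * norm_y))
    print(angle)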

Preliminaries

1.1.2 Matrices

Using scalars, we can also build matrices with $m$ rows and $n$ columns, which we denote by a bold capital letter, e.g. $A \in \mathbb{C}^{m\times n}$. The entry in the $j$th row and $k$th column is denoted by $a_{jk}$. We often split a matrix into columns $a_1, \ldots, a_n$ or rows $a_1^T, \ldots, a_m^T$. For example, we can represent $A \in \mathbb{C}^{3\times 2}$ as
$$A = \begin{bmatrix} a_1 & a_2 \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \end{bmatrix}.$$
The conjugate-transpose and transpose of a matrix are defined and denoted similarly to the respective operations for a vector: the $j,k$ entry of $A^*$ is $\bar{a}_{kj}$ and the $j,k$ entry of $A^T$ is $a_{kj}$. For our $3 \times 2$ example matrix $A$,
$$A^* = \begin{bmatrix} \bar{a}_{11} & \bar{a}_{21} & \bar{a}_{31} \\ \bar{a}_{12} & \bar{a}_{22} & \bar{a}_{32} \end{bmatrix}, \qquad A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}.$$
Matrices with the same number of rows and columns are called square matrices.
We can verify using a direct calculation that
$$(AB)^* = B^* A^* \quad \text{and} \quad (AB)^T = B^T A^T.$$
We often add and multiply matrices and vectors block-wise in the same way we carry out matrix multiplication entry-wise. For example, the Karush-Kuhn-Tucker (KKT) conditions for some classes of optimization problems can be expressed as
$$Hx + A^T y = g, \qquad Ax = b,$$
which can also be expressed as
$$\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} g \\ b \end{bmatrix}.$$


We can also generalize the dot product to handle $m \times n$ matrices:
$$\langle A, B \rangle = \mathrm{Tr}(A^* B),$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of a square matrix, i.e. the sum of its diagonal entries. The trace is a linear operator: it satisfies (i) $\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$ and (ii) $\mathrm{Tr}(\alpha A) = \alpha\,\mathrm{Tr}(A)$. The trace of a product is also invariant with respect to cyclic permutations of the product; i.e.
$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA).$$
Like the dot product, the trace product also provides a norm. This norm is called the Frobenius norm:
$$\|A\|_F := \sqrt{\mathrm{Tr}(A^* A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}.$$
The trace product and the Frobenius norm also satisfy the Cauchy-Schwarz inequality:
$$|\mathrm{Tr}(A^* B)| \le \|A\|_F \|B\|_F.$$
If $A$ is a scalar multiple of $B$, then equality is attained.
In fact, we can use any vector norm, for example the aforementioned Euclidean norm, to measure the magnitude of a matrix by the largest amount the matrix stretches vectors. We define the induced matrix norm of a matrix $A \in \mathbb{C}^{m\times n}$ as
$$\|A\| := \max_{x \ne 0} \frac{\|Ax\|}{\|x\|},$$
which may be equivalently expressed as
$$\|A\| := \max_{\|x\| = 1} \|Ax\|.$$
There is no simple formula for this norm, and we shall describe how to compute it after developing the requisite machinery.
These matrix norms (and many others) satisfy the same properties the Euclidean norm satisfies: (i) $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = 0$, (ii) $\|\alpha A\| = |\alpha| \|A\|$ for $\alpha \in \mathbb{C}$, and (iii) $\|A + B\| \le \|A\| + \|B\|$.

Both these norms are also submultiplicative; i.e. if $A \in \mathbb{C}^{m\times n}$ and $x \in \mathbb{C}^n$, then
$$\|Ax\| \le \|A\| \|x\|.$$
The result also holds if we replace $x$ by a matrix $B \in \mathbb{C}^{n\times p}$.
We say a matrix $A \in \mathbb{C}^{m\times n}$ is diagonal if all its off-diagonal entries are zero ($a_{ij} = 0$ if $i \ne j$). We say $A$ is upper triangular if all its entries below the main diagonal are zero ($a_{ij} = 0$ if $i > j$) and lower triangular if all its entries above the main diagonal are zero.
A square matrix is Hermitian if $A^* = A$ and symmetric if $A^T = A$. In the case of real matrices, these notions are the same and we call such matrices symmetric. In the case of complex matrices, Hermitian matrices are more common and [thankfully] more useful than symmetric matrices. We can also define skew-Hermitian and skew-symmetric matrices, which satisfy $A^* = -A$ and $A^T = -A$ respectively.
A square matrix $U$ is unitary if $U^* U = I$. We say the columns of $U$ are orthonormal. Since $U$ is a square matrix (it has $n$ orthonormal columns, each in $\mathbb{C}^n$), the columns of $U$ form an orthonormal basis for $\mathbb{C}^n$, and such bases are useful because we can express a vector $x$ as
$$x = (u_1^* x) u_1 + \cdots + (u_n^* x) u_n.$$
We also often encounter matrices $V \in \mathbb{C}^{n\times k}$, $n > k$, with orthonormal columns. Such matrices satisfy $V^* V = I \in \mathbb{C}^{k\times k}$; however, $V V^* \ne I \in \mathbb{C}^{n\times n}$. There is no universally accepted term for such matrices, although a commonly used term is subunitary. Premultiplying a vector by such a matrix preserves the norm because
$$\|Vx\|_2 = \sqrt{(Vx)^*(Vx)} = \sqrt{x^* V^* V x} = \sqrt{x^* x} = \|x\|_2.$$
The result also holds if we replace $x$ by a matrix $A$ and the Euclidean norm by either the induced matrix norm or the Frobenius norm:
$$\|UA\|_2 = \max_{\|x\|_2 = 1} \|UAx\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \|A\|_2.$$
The intuition behind these results is that the product $Ux$ is just a representation of $x$ in a new orthonormal basis, and such a change of basis should not affect the length/magnitude of $x$ or $A$. These results are known as the invariance of the Euclidean/induced matrix/Frobenius norm under unitary transformations.
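The following NumPy sketch (an illustration under the assumption that an orthogonal matrix Q is obtained from the QR factorization of a random matrix; none of this code appears in the original notes) checks the invariance of the Euclidean, induced 2-norm, and Frobenius norms under such transformations.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 6, 4
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # square matrix with orthonormal columns
    A = rng.standard_normal((m, n))
    x = rng.standard_normal(m)

    # ||Qx||_2 = ||x||_2, ||QA||_2 = ||A||_2, ||QA||_F = ||A||_F
    print(np.linalg.norm(Q @ x), np.linalg.norm(x))
    print(np.linalg.norm(Q @ A, 2), np.linalg.norm(A, 2))
    print(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro'))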

1.2 The four fundamental subspaces

The range or column space of a matrix $A \in \mathbb{C}^{m\times n}$, denoted by $\mathcal{R}(A)$, is given by
$$\mathcal{R}(A) := \{Ax \mid x \in \mathbb{C}^n\} \subseteq \mathbb{C}^m.$$
The null space or kernel of $A$, denoted by $\mathcal{N}(A)$, is given by
$$\mathcal{N}(A) := \{z \in \mathbb{C}^n \mid Az = 0\} \subseteq \mathbb{C}^n.$$
The ranges and null spaces of $A$ and $A^*$ are referred to collectively as the four fundamental subspaces of $A$.
The column rank of $A$ is the dimension of $\mathcal{R}(A)$; i.e. the number of linearly independent vectors in a basis for $\mathcal{R}(A)$. The row rank of $A$ is the dimension of $\mathcal{R}(A^*)$. The row rank of a matrix is the same as its column rank, so we refer to this quantity simply as the rank of the matrix.
The range and null space of $A$ are subspaces of $\mathbb{C}^m$ and $\mathbb{C}^n$ respectively, and the span of a set of vectors $\{x_1, \ldots, x_n\} \subseteq \mathbb{C}^m$ is the same as the range of the matrix whose columns are $x_1, \ldots, x_n$:
$$\mathrm{span}\{x_1, \ldots, x_n\} = \mathcal{R}\left(\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\right).$$
The set of vectors is linearly independent if
$$\mathcal{N}\left(\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\right) = \{0\}.$$
The four fundamental subspaces of $A$ satisfy what Gilbert Strang calls the Fundamental Theorem of Linear Algebra.

Theorem 1.1. If $A \in \mathbb{C}^{m\times n}$, its four fundamental subspaces satisfy
$$\mathcal{R}(A) \perp \mathcal{N}(A^*) \quad \text{and} \quad \mathcal{R}(A) \oplus \mathcal{N}(A^*) = \mathbb{C}^m,$$
$$\mathcal{R}(A^*) \perp \mathcal{N}(A) \quad \text{and} \quad \mathcal{R}(A^*) \oplus \mathcal{N}(A) = \mathbb{C}^n.$$
Thus, a vector $x \in \mathbb{C}^m$ can be expressed uniquely as $x = x_R + x_N$, where $x_R \in \mathcal{R}(A)$ and $x_N \in \mathcal{N}(A^*)$ are orthogonal.
A square matrix $A \in \mathbb{C}^{n\times n}$ is nonsingular or invertible if and only if the linear system $Ax = b$ has a unique solution $x$ for every $b \in \mathbb{C}^n$. We express the unique solution to such linear systems as $x = A^{-1} b$. Equivalently, $A$ is nonsingular if $\mathcal{R}(A) = \mathbb{C}^n$ or $\mathcal{N}(A) = \{0\}$. A square matrix that is not nonsingular is called singular, and rectangular matrices (matrices that are not square) are always singular.
If $A$ is nonsingular, then its inverse is unique. Thus, for a nonsingular matrix $A$, $(A^*)^{-1} = (A^{-1})^*$, and if both $A$ and $B$ are nonsingular, then their product is also nonsingular:
$$(AB)^{-1} = B^{-1} A^{-1}.$$

1.3 Projectors

A square matrix $P$ is called a projector if $P^2 = P$. The name projector hints at the action of these matrices. Imagine shining a lamp onto the subspace $\mathcal{R}(P)$ from some direction. The shadow cast by a vector $v$ represents $Pv$, the projection of $v$ onto $\mathcal{R}(P)$.
Our intuition about projectors says that if $v \in \mathcal{R}(P)$, then it should lie exactly on its shadow, so $Pv = v$. The mathematical notion of a projector agrees with this intuition: if $v \in \mathcal{R}(P)$, then there exists some $x$ such that $v = Px$ and
$$Pv = P^2 x = Px = v.$$
If $P$ is a projector, then $I - P$ is also a projector and $\mathcal{R}(I - P) = \mathcal{N}(P)$. Further, $\mathcal{R}(P) \cap \mathcal{N}(P) = \{0\}$, so a projector partitions $\mathbb{C}^n$ into a direct sum of two subspaces. We say such a pair of subspaces are complementary subspaces and the pair of projectors are complementary projectors.
If the projector $P$ is Hermitian, the fundamental theorem of linear algebra implies $\mathcal{R}(P) \perp \mathcal{N}(P)$. In this case we call $P$ an orthogonal projector because the space $P$ projects onto is orthogonal to the space it projects along. The orthogonal projection of a vector $v$ satisfies
$$\min_{w \in \mathcal{R}(P)} \|v - w\| = \|v - Pv\|.$$
In general, the induced norm of a projector satisfies $\|P\| \ge 1$ because $\|Pv\| = \|v\|$ for every $v \in \mathcal{R}(P)$. If $P$ is an orthogonal projector, then $\|P\| = 1$ because $P$ decomposes $v$ into $Pv + (I - P)v$, where $Pv$ and $(I - P)v$ are orthogonal. The converse is also true.

Theorem 1.2. A projector $P$ is orthogonal if and only if $\|P\| = 1$.

We can construct an orthogonal projector onto a subspace $S$ using an orthonormal basis for $S$. Let $u_1, \ldots, u_k$ be an orthonormal basis for $S$ and let $U$ be the matrix whose $j$th column is $u_j$. Then the matrix
$$P_S = U U^*$$
is an orthogonal projector onto $S$. Sometimes $S$ is one-dimensional; i.e. $S = \mathrm{span}\{v\}$. In this case, we use a special case of this formula:
$$P_S = \frac{v v^*}{v^* v}.$$
Although we can construct orthogonal projectors using nonorthonormal bases, this is rarely done in practice, so we skip this topic.
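As an illustration, the sketch below (NumPy assumed; the subspace S is the span of two arbitrary random vectors) builds the orthogonal projector P_S = U U* from an orthonormal basis obtained via QR and checks that it is idempotent, Hermitian, and gives the closest point in S to a random vector.

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 6, 2
    S = rng.standard_normal((n, k))        # columns span the subspace S
    U, _ = np.linalg.qr(S)                 # orthonormal basis for S
    P = U @ U.conj().T                     # orthogonal projector onto S

    assert np.allclose(P @ P, P)           # idempotent: P^2 = P
    assert np.allclose(P.conj().T, P)      # Hermitian: P* = P

    v = rng.standard_normal(n)
    w = S @ rng.standard_normal(k)         # an arbitrary vector in S
    # ||v - Pv|| <= ||v - w|| for every w in S
    assert np.linalg.norm(v - P @ v) <= np.linalg.norm(v - w) + 1e-12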

2 Spectral Theory

Matrices are useful in higher mathematics because of the insights available from spectral theory. This chapter develops the theory of eigenvalues, eigenvectors, and allied concepts via two approaches, the Schur form and the Jordan form. We do so because each approach yields different insights for a variety of problems, and we believe reconciling the two approaches enriches one's understanding of spectral theory.

2.1 Eigenvalues and eigenvectors

The resolvent of a matrix is a key to its set of eigenvalues: it not only reveals the existence of eigenvalues, but also indicates where they lie in the complex plane and how sensitive they are to perturbations.

Definition 2.1. The resolvent of a square matrix $A \in \mathbb{C}^{n\times n}$ is the matrix-valued mapping
$$R(z) := (zI - A)^{-1}.$$

The entries of the resolvent are rational functions of $z$, and the resolvent fails to exist at any $z \in \mathbb{C}$ that is a pole of one of those rational functions. These poles are the eigenvalues of the matrix $A$.


Definition 2.2. A scalar $\lambda \in \mathbb{C}$ is called an eigenvalue of a square matrix $A \in \mathbb{C}^{n\times n}$ if $\lambda I - A$ is singular; i.e. $R(\lambda)$ does not exist. A nonzero vector $u \in \mathbb{C}^n$ is called an eigenvector of $A$ associated with the eigenvalue $\lambda$ if $u \in \mathcal{N}(A - \lambda I)$ or, equivalently, $\lambda$ and $u$ satisfy the eigenvalue equation:
$$Au = \lambda u.$$
$\mathcal{N}(A - \lambda I)$ is often called the eigenspace of $A$ associated with the eigenvalue $\lambda$. The set of all eigenvalues of $A$ is called its spectrum:
$$\sigma(A) := \{\lambda \in \mathbb{C} : \lambda I - A \text{ is singular}\}.$$
An equivalent statement to $\lambda I - A$ being singular is $\det(\lambda I - A) = 0$, which characterizes the eigenvalues as the roots of a polynomial in $\lambda$.
How large can the eigenvalues of a matrix be? Is there a threshold
for |z| beyond which R(z) must exist? We shall answer these questions
using the classic Neumann series approach.
Theorem 2.1 (Neumann series). Let $A$ be a square matrix with $\|A\| < 1$. Then the matrix $I - A$ is nonsingular and
$$(I - A)^{-1} = \sum_{i=0}^{\infty} A^i, \qquad \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.$$

Proof. Consider the partial sums
$$S_k := \sum_{i=0}^{k} A^i.$$
The sequence of partial sums $\{S_k\}_{k=1}^{\infty}$ is a Cauchy sequence because
$$\|S_{k+m} - S_k\| = \left\| A^{k+1} \sum_{i=0}^{m-1} A^i \right\| \le \|A\|^{k+1} \sum_{i=0}^{m-1} \|A\|^i \le \frac{\|A\|^{k+1}}{1 - \|A\|},$$
which becomes arbitrarily small as $k \to \infty$. The space $\mathbb{C}^{n\times n}$ endowed with the matrix norm is complete. Thus, the sequence of partial sums has a limit, which we denote by $S_\infty$. We now verify that this limit is the desired inverse using a direct calculation:
$$(I - A) S_\infty = \lim_{k\to\infty} (I - A) S_k = \lim_{k\to\infty} \left( \sum_{i=0}^{k} A^i - \sum_{i=1}^{k+1} A^i \right) = \lim_{k\to\infty} \left( I - A^{k+1} \right) = I$$
because $A^{k+1} \to 0$ as $k \to \infty$. Thus, $I - A$ is nonsingular and $(I - A)^{-1} = S_\infty$. Further,
$$\|S_\infty\| = \lim_{k\to\infty} \|S_k\| \le \frac{1}{1 - \|A\|}.$$
The inequality holds because we can bound $\|S_k\|$ using the geometric series $\sum_{i=0}^{\infty} \|A\|^i = 1/(1 - \|A\|)$.
Theorem 2.1 yields a bound on $\|R(z)\|$ when $|z| > \|A\|$: writing $zI - A = z(I - A/z)$ with $\|A/z\| < 1$, we obtain
$$\|R(z)\| = \|(zI - A)^{-1}\| = \frac{1}{|z|} \left\| (I - A/z)^{-1} \right\| \le \frac{1}{|z|\left(1 - \|A\|/|z|\right)} = \frac{1}{|z| - \|A\|}.$$
Thus there can be no eigenvalues with magnitude larger than $\|A\|$, and
$$\sigma(A) \subseteq \{z \in \mathbb{C} : |z| \le \|A\|\}.$$
Of course, we can arrive at the same conclusion using the eigenvalue equation and the fact that matrix norms are submultiplicative:
$$|\lambda| \|u\| = \|Au\| \le \|A\| \|u\|.$$
We can ask a more fundamental question: do eigenvalues exist for every matrix? Can the spectrum of a matrix be empty?
Theorem 2.2. Every matrix $A \in \mathbb{C}^{n\times n}$ has at least one eigenvalue; i.e. $R(\cdot)$ has at least one pole.

Proof. The entries of the resolvent are analytic functions on any open set not containing an eigenvalue, because rational functions are analytic away from their poles. If $A$ has no eigenvalues, then the entries of $R(z)$ are analytic over the entire complex plane and $\|R(z)\|$ must be bounded on the disk $\{z \in \mathbb{C} : |z| \le \|A\|\}$. $\|R(z)\|$ is also bounded outside the disk (a consequence of Theorem 2.1), so by Liouville's Theorem, $R(z)$ must be constant. Theorem 2.1 also says $\|R(z)\| \to 0$ as $|z| \to \infty$, and so we must have $R(z) = 0$. This is a contradiction because $R(z)$, being the inverse of $zI - A$, is nonsingular whenever it exists. Hence, $A$ must have at least one eigenvalue.
Can we say more about the location of eigenvalues in the complex plane? The basic result in this area is a theorem named after the Belarusian mathematician Semyon A. Gershgorin.

Definition 2.3. The $i$th Gershgorin disc is the ball of radius $r_i = \sum_{j \ne i} |a_{ij}|$ centered at $a_{ii}$ in the complex plane; i.e.
$$D_i = \left\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \right\}.$$

Theorem 2.3 (Gershgorin's disc theorem). Every eigenvalue of $A$ lies in a Gershgorin disc.

Proof. Let $A \in \mathbb{C}^{n\times n}$ and let $\lambda, u$ be an eigenvalue-eigenvector pair; i.e. $Au = \lambda u$. Let $u_i$ be the largest entry of $u$ in magnitude and consider the $i$th row of this equation:
$$\sum_{j=1}^{n} a_{ij} u_j = \lambda u_i.$$
We can collect the terms containing $u_i$ and then divide by $u_i$ to obtain
$$\sum_{j \ne i} a_{ij} \frac{u_j}{u_i} = \lambda - a_{ii}.$$
Taking the magnitude of both sides yields
$$|\lambda - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \frac{|u_j|}{|u_i|} \le \sum_{j \ne i} |a_{ij}|.$$

We stated and proved the vanilla version of Gershgorin's disc theorem. There are stronger versions of the theorem. Informally, a general statement of the theorem says that if there are $k$ discs whose union is a connected set that is disjoint from the remaining discs, then exactly $k$ eigenvalues lie in the domain formed by the union of those $k$ discs.
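A small NumPy sketch (the 3-by-3 matrix is an arbitrary example chosen only for illustration) that computes the Gershgorin disc centers and radii and checks that every eigenvalue lies in at least one disc.

    import numpy as np

    A = np.array([[ 4.0, 1.0, 0.5],
                  [ 0.2, -2.0, 0.3],
                  [ 0.1, 0.4, 1.0]])

    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = sum_{j != i} |a_ij|

    for lam in np.linalg.eigvals(A):
        in_some_disc = np.any(np.abs(lam - centers) <= radii)
        print(lam, in_some_disc)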

2.2 The Schur form

The Schur form or Schur factorization of a square matrix is a representation as the similarity transform of a triangular matrix. Because the
Schur form exists for every square matrix, it is a very general theoretical tool. The Schur form enables matrix analysts to study the behavior
of general square matrices via study of the behavior of triangular matrices.
Theorem 2.4 (The Schur form). If $A \in \mathbb{C}^{n\times n}$, then there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$ and an upper triangular matrix $T \in \mathbb{C}^{n\times n}$ such that
$$A = U T U^*.$$

Proof. We shall prove the existence of the Schur form via mathematical induction. If $A \in \mathbb{C}^{1\times 1}$ ($A$ is a scalar), then we can express $A$ as $U T U^*$, where $U = 1$ and $T = A$.
Suppose the result holds for $(n-1) \times (n-1)$ matrices. If $A \in \mathbb{C}^{n\times n}$, then Theorem 2.2 guarantees the existence of at least one eigenvalue $\lambda$ and a unit-length eigenvector $u$ such that
$$A u = \lambda u.$$
Let $V$ be a matrix whose columns form an orthonormal basis for $\mathrm{span}\{u\}^\perp$, so that $\begin{bmatrix} u & V \end{bmatrix}$ is a unitary matrix. Then
$$\begin{bmatrix} u & V \end{bmatrix}^* A \begin{bmatrix} u & V \end{bmatrix} = \begin{bmatrix} u^* A u & u^* A V \\ V^* A u & V^* A V \end{bmatrix}.$$
Since $u$ is an eigenvector of $A$, we have $u^* A u = \lambda \|u\|^2 = \lambda$ and $V^* A u = \lambda V^* u = 0$, because the columns of $V$ are an orthonormal basis for $\mathrm{span}\{u\}^\perp$. Hence,
$$\begin{bmatrix} u & V \end{bmatrix}^* A \begin{bmatrix} u & V \end{bmatrix} = \begin{bmatrix} \lambda & u^* A V \\ 0 & V^* A V \end{bmatrix}.$$
$\begin{bmatrix} u & V \end{bmatrix}$ is unitary, so we can rearrange to obtain
$$A = \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \\ 0 & V^* A V \end{bmatrix} \begin{bmatrix} u & V \end{bmatrix}^*. \tag{2.1}$$
$V^* A V \in \mathbb{C}^{(n-1)\times(n-1)}$, so by the inductive hypothesis $V^* A V$ has a Schur form; i.e. there exist a unitary $\hat{U}$ and an upper triangular $\hat{T}$ such that
$$V^* A V = \hat{U} \hat{T} \hat{U}^*.$$
We substitute this Schur form of $V^* A V$ into (2.1) to obtain
$$A = \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \\ 0 & \hat{U} \hat{T} \hat{U}^* \end{bmatrix} \begin{bmatrix} u & V \end{bmatrix}^*
= \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \hat{U} \\ 0 & \hat{T} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix}^* \begin{bmatrix} u & V \end{bmatrix}^*
= \begin{bmatrix} u & V \hat{U} \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \hat{U} \\ 0 & \hat{T} \end{bmatrix} \begin{bmatrix} u & V \hat{U} \end{bmatrix}^*. \tag{2.2}$$
The matrix in the center is upper triangular because $\hat{T}$ is upper triangular. $\begin{bmatrix} u & V\hat{U} \end{bmatrix}$ is unitary because
$$\begin{bmatrix} u & V\hat{U} \end{bmatrix}^* \begin{bmatrix} u & V\hat{U} \end{bmatrix} = \begin{bmatrix} u^* u & u^* V \hat{U} \\ \hat{U}^* V^* u & \hat{U}^* V^* V \hat{U} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & I \end{bmatrix}.$$
This concludes the induction step.
Pre- and post-multiplying by a matrix and its inverse is called a similarity transform. Similarity transforms preserve the eigenvalues of a matrix: if $\lambda$ and $u$ are an eigenpair of $A$ and $V$ is a nonsingular matrix, then $\lambda$ and $Vu$ are an eigenpair of $VAV^{-1}$. We can verify this by a direct calculation:
$$VAV^{-1}(Vu) = VAu = \lambda Vu.$$
The Schur form of a matrix $A$ represents it as a similarity transform of a triangular matrix $T$. Thus, the eigenvalues of $A$ are the same as the eigenvalues of $T$. $T$ is an upper triangular matrix, hence its eigenvalues are its diagonal entries. This is true because $zI - T$ is singular if and only if $z$ equals one of the diagonal entries of $T$. The Schur form thus reveals the eigenvalues of a matrix.
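In practice the Schur form is computed with library routines rather than by the inductive construction above. The sketch below (SciPy assumed; the matrix is a random example) computes a complex Schur factorization A = U T U* and checks that the diagonal of T matches the eigenvalues of A.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))

    T, U = schur(A, output='complex')        # A = U T U*, T upper triangular
    assert np.allclose(U @ T @ U.conj().T, A)
    assert np.allclose(U.conj().T @ U, np.eye(5))

    # The eigenvalues of A appear on the diagonal of T (compare as sorted lists)
    print(np.sort_complex(np.diag(T)))
    print(np.sort_complex(np.linalg.eigvals(A)))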

2.2.1 Real Schur form

We observed that in the Schur factorization of a matrix, the upper triangular matrix $T$ contains the eigenvalues of $A$ as its diagonal entries. Thus, when a real matrix $A \in \mathbb{R}^{n\times n}$ has complex eigenvalues, its Schur form must necessarily contain complex numbers. However, in some cases it is convenient to consider a factorization of a real matrix that uses only real numbers.
In this case we may consider the so-called real Schur form of a matrix. We still factor a real matrix $A$ as
$$A = \tilde{V} \tilde{T} \tilde{V}^T,$$
where $\tilde{V}$ is orthogonal and both $\tilde{V}$ and $\tilde{T}$ contain only real numbers. However, there is a price to pay if we want the factorization to contain only real numbers. Specifically, we may now only say that $\tilde{T}$ is a block upper triangular matrix of the form
$$\tilde{T} = \begin{bmatrix} T_{1,1} & \cdots & * \\ & \ddots & \vdots \\ & & T_{m,m} \end{bmatrix},$$
where the diagonal blocks $T_{i,i}$ are either of size $1 \times 1$ or $2 \times 2$. In this case the eigenvalues of the diagonal blocks of $\tilde{T}$ reveal the eigenvalues of $A$: the $1 \times 1$ blocks on the diagonal are themselves eigenvalues of $A$, and the eigenvalues of the $2 \times 2$ blocks on the diagonal are also eigenvalues of $A$.
Since the eigenvalues of a real matrix $A$ may be characterized as the roots of a polynomial with real coefficients, we know that the complex eigenvalues must appear in conjugate pairs. Thus, each pair of complex conjugate eigenvalues of $A$ appears as the eigenvalues of one of the $2 \times 2$ blocks on the diagonal of $\tilde{T}$.
2.2.2 The spectral decomposition of Hermitian matrices

If $A$ is a Hermitian matrix ($A^* = A$), then its Schur form satisfies
$$A = U T U^* \quad \text{and} \quad A = A^* = (U T U^*)^* = U T^* U^*.$$
$U$ is nonsingular; hence $T = T^*$, so $T$ must be a diagonal matrix. Further, the diagonal entries of $T$ satisfy $t_{ii} = \bar{t}_{ii}$; thus the eigenvalues of $A$ are real. In this case, it is customary to denote $T$ by $D$ to emphasize the fact that it is a diagonal matrix:
$$A = U D U^*. \tag{2.3}$$
We call this the unitary diagonalization of a Hermitian matrix. Further, the columns of $U$ are unit-length eigenvectors of $A$ associated with the eigenvalues $\lambda_1, \ldots, \lambda_n$:
$$A u_i = \lambda_i u_i, \quad i = 1, \ldots, n.$$
Hence, not only are the eigenvalues of Hermitian matrices real, but the associated eigenvectors also form an orthonormal basis for $\mathbb{C}^n$.
We can expand the unitary diagonalization of a Hermitian matrix $A$ column-wise to resolve $A$ into a sum of orthogonal projectors:
$$A = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix} = \lambda_1 u_1 u_1^* + \cdots + \lambda_n u_n u_n^* = \lambda_1 P_1 + \cdots + \lambda_n P_n,$$
where $P_i = u_i u_i^*$, $i = 1, \ldots, n$. The matrices $P_i$ are orthogonal projectors onto the spans of the eigenvectors $u_i$ because $P_i^* = P_i$ and
$$P_i^2 = u_i (u_i^* u_i) u_i^* = u_i u_i^* = P_i.$$
Further, this set of orthogonal projectors is a resolution of the identity because the $P_i$ sum to
$$\sum_{i=1}^{n} P_i = \sum_{i=1}^{n} u_i u_i^* = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix} = U U^* = I.$$
Thus, we can express any Hermitian matrix as a sum of orthogonal projectors onto its eigenspaces weighted by the associated eigenvalues. This is called the spectral decomposition of a Hermitian matrix.

The fact that the Schur form of a Hermitian matrix is also a unitary diagonalization is actually true for the larger class of normal matrices, i.e. matrices $A \in \mathbb{C}^{n\times n}$ that satisfy
$$A^* A = A A^*.$$
We leave the proof as an exercise to the curious reader.
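The spectral decomposition is easy to verify numerically. The sketch below (NumPy; the Hermitian matrix is built from a random matrix purely for illustration) computes the unitary diagonalization with numpy.linalg.eigh and reassembles A as the weighted sum of orthogonal projectors.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (B + B.conj().T) / 2                       # Hermitian matrix

    lam, U = np.linalg.eigh(A)                     # real eigenvalues, unitary U
    assert np.allclose(U @ np.diag(lam) @ U.conj().T, A)

    # Spectral decomposition: A = sum_i lam_i * u_i u_i*
    P = [np.outer(U[:, i], U[:, i].conj()) for i in range(4)]
    A_rebuilt = sum(lam[i] * P[i] for i in range(4))
    assert np.allclose(A_rebuilt, A)
    assert np.allclose(sum(P), np.eye(4))          # resolution of the identity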

2.3 The Jordan form

In this section, we generalize the spectral decomposition of Hermitian


matrices to the Jordan form for general square matrices. First we generalize the spectral decomposition to n n matrices with n linearly
independent eigenvectors. Then, we provide a brief introduction to generalized eigenvectors and the Jordan form. The Jordan form is a fragile
object that is rarely computed in practice, but like the Schur form, its
generality makes it a useful theoretical tool.
2.3.1 The spectral decomposition of diagonalizable matrices

Suppose $A \in \mathbb{C}^{n\times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ and associated eigenvectors $u_1, \ldots, u_n$; i.e.
$$A u_1 = \lambda_1 u_1, \quad \ldots, \quad A u_n = \lambda_n u_n.$$
We can organize these $n$ eigenvalue equations into one matrix equation:
$$A \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} \lambda_1 u_1 & \cdots & \lambda_n u_n \end{bmatrix}, \quad \text{i.e.} \quad A U = U D.$$

Definition 2.4. An $n \times n$ matrix $A$ is diagonalizable if it has $n$ linearly independent eigenvectors.

Most square matrices (in a sense that can be made mathematically rigorous) are diagonalizable. For example, all normal matrices are diagonalizable. If an $n \times n$ matrix has $n$ distinct eigenvalues, then it must also have $n$ linearly independent eigenvectors, because eigenvectors associated with distinct eigenvalues are linearly independent.
Lemma 2.1. Eigenvectors associated with distinct eigenvalues are linearly independent.

Proof. Let $u_1, \ldots, u_p$ be a set of eigenvectors associated with distinct eigenvalues $\lambda_1, \ldots, \lambda_p$. Suppose that there exist $\alpha_1, \ldots, \alpha_p$, not all zero, such that
$$\alpha_1 u_1 + \cdots + \alpha_p u_p = 0.$$
We premultiply both sides by $(A - \lambda_j I)$, $j = 2, \ldots, p$, to obtain
$$\alpha_1 \prod_{j=2}^{p} (A - \lambda_j I)\, u_1 + \cdots + \alpha_p \prod_{j=2}^{p} (A - \lambda_j I)\, u_p = 0.$$
Because $(A - \lambda_i I) u_j = (\lambda_j - \lambda_i) u_j$ for $i \ne j$, every product on the left-hand side except the first contains a factor $(A - \lambda_i I) u_i = 0$, so we can ignore all but the first term in the sum on the left to yield
$$\alpha_1 \prod_{j=2}^{p} (A - \lambda_j I)\, u_1 = 0.$$
Since $(A - \lambda_j I) u_1 = (\lambda_1 - \lambda_j) u_1$, we can simplify this product to obtain
$$\alpha_1 \left[ \prod_{j=2}^{p} (\lambda_1 - \lambda_j) \right] u_1 = 0,$$
which can only be zero if $\alpha_1 = 0$, or some $\lambda_j = \lambda_1$, or $u_1 = 0$. The latter two are contradictions; hence $\alpha_1 = 0$. We can show that $\alpha_2, \ldots, \alpha_p$ are zero using a similar argument.
If $A$ is diagonalizable, then $U$ is nonsingular and we can rearrange the matrix form of the $n$ eigenvalue equations to obtain
$$A = U D U^{-1}. \tag{2.4}$$
This equation is very similar to (2.3), and we call it the diagonalization of a diagonalizable matrix.

We can expand (2.4) to obtain the spectral decomposition of a diagonalizable matrix $A$. Let $u_i$ denote the $i$th column of $U$ and $\hat{u}_i^*$ the $i$th row of $U^{-1}$. We have
$$A = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} \hat{u}_1^* \\ \vdots \\ \hat{u}_n^* \end{bmatrix} = \lambda_1 u_1 \hat{u}_1^* + \cdots + \lambda_n u_n \hat{u}_n^* = \lambda_1 P_1 + \cdots + \lambda_n P_n,$$
where $P_i = u_i \hat{u}_i^*$, $i = 1, \ldots, n$. The vectors $\hat{u}_1, \ldots, \hat{u}_n$ are sometimes called the left eigenvectors of $A$ because we can rearrange (2.4) to obtain
$$U^{-1} A = D U^{-1}.$$
After expanding this equation row-wise, we have
$$\begin{bmatrix} \hat{u}_1^* \\ \vdots \\ \hat{u}_n^* \end{bmatrix} A = \begin{bmatrix} \lambda_1 \hat{u}_1^* \\ \vdots \\ \lambda_n \hat{u}_n^* \end{bmatrix}.$$
Thus, we see that $A$ preserves the directions of $\hat{u}_1, \ldots, \hat{u}_n$ when they multiply it from the left. Similarly, $u_1, \ldots, u_n$ are also sometimes called the right eigenvectors of $A$. Although neither set of eigenvectors is orthogonal in general, each is linearly independent and forms a basis for $\mathbb{C}^n$. The left and right eigenvectors are biorthogonal; i.e.
$$\hat{u}_i^* u_j = \begin{cases} 0, & i \ne j, \\ 1, & i = j. \end{cases}$$
The matrices $P_i$ are projectors onto the spans of the eigenvectors $u_i$ because
$$P_i^2 = u_i (\hat{u}_i^* u_i) \hat{u}_i^* = u_i \hat{u}_i^* = P_i.$$
This set of projectors is also a resolution of the identity because the $P_i$ sum to
$$\sum_{i=1}^{n} P_i = \sum_{i=1}^{n} u_i \hat{u}_i^* = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \hat{u}_1^* \\ \vdots \\ \hat{u}_n^* \end{bmatrix} = U U^{-1} = I.$$
Thus, we can express any diagonalizable matrix as a sum of projectors onto its eigenspaces weighted by the associated eigenvalues. This is called the spectral decomposition of a diagonalizable matrix.
Note that the spectral decomposition of normal (in particular Hermitian) matrices is a special case of this spectral decomposition. If $A$ is normal, then the eigenvectors $u_i$ are orthonormal and the projectors $P_i$ are orthogonal; if $A$ is moreover Hermitian, the eigenvalues $\lambda_i$ are real.
2.3.2 Generalized eigenvectors

Although matrices that are not diagonalizable are rare, they do sometimes arise in practice, and we need a more general theory to handle these nondiagonalizable or so-called defective matrices. In this section, we develop the theory concerning generalized eigenvectors.

Definition 2.5. The algebraic multiplicity $a_i$ of the eigenvalue $\lambda_i$ equals the number of times $\lambda_i$ appears on the diagonal of the $T$ factor in the Schur form $A = U T U^*$.
The geometric multiplicity $g_i$ of the eigenvalue $\lambda_i$ equals the dimension of the eigenspace associated with $\lambda_i$; i.e. $g_i = \dim(\mathcal{N}(A - \lambda_i I))$.

The geometric multiplicity of an eigenvalue is always less than or equal to the algebraic multiplicity. If the sum of the geometric multiplicities is less than the sum of the algebraic multiplicities, then there are not enough eigenvectors to form a basis for $\mathbb{C}^n$ and the matrix is defective. For such matrices, we substitute generalized eigenvectors in lieu of the missing eigenvectors.

Definition 2.6. A nonzero vector $u \in \mathbb{C}^n$ is called a generalized eigenvector of $A$ associated with the eigenvalue $\lambda$ if $u \in \mathcal{N}((A - \lambda I)^{a})$, where $a$ is the algebraic multiplicity of $\lambda$.
$\mathcal{N}((A - \lambda I)^{a})$ is called the generalized eigenspace associated with the eigenvalue $\lambda$.

2.3.3 Jordan form

Previously we discussed diagonalizable matrices; however, not all matrices are diagonalizable. One such example is the $n \times n$ matrix
$$J_n(\lambda) = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix},$$
where the only eigenvalue of $J_n(\lambda)$ is $\lambda$, with algebraic multiplicity $n$. If we now consider
$$J_n(\lambda) - \lambda I = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix},$$
we observe that $\mathcal{N}(J_n(\lambda) - \lambda I) = \mathrm{span}\{e_1\}$. Thus, $\lambda$ has geometric multiplicity one, which implies that $J_n(\lambda)$ is not diagonalizable.
Motivated by such examples, we provide a brief introduction to the Jordan form. While presented here without its construction, the Jordan form is nevertheless of theoretical importance. Specifically, the Jordan form exists for all matrices. That being said, there is no stable way to compute the Jordan form, and thus it appears rarely in practice. A more comprehensive treatment of the material may be found in Chapter 3 of [5].
The aforementioned defective matrix is in fact an example of what is known as a Jordan block associated with the eigenvalue $\lambda$. More generally, an $n_i \times n_i$ matrix of the form
$$J_{n_i}(\lambda_i) = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}$$
is known as a Jordan block associated with the eigenvalue $\lambda_i$. These matrices serve as the building blocks of the Jordan form; by allowing the entries on the first superdiagonal we are able to circumvent potential discrepancies between the algebraic and geometric multiplicities of eigenvalues.

Theorem 2.5 (The Jordan form). Let $A \in \mathbb{C}^{n\times n}$. There exists a nonsingular matrix $U$ such that
$$A = U \begin{bmatrix} J_{n_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{n_k}(\lambda_k) \end{bmatrix} U^{-1},$$
where $n_1 + \cdots + n_k = n$, each $\lambda_i$ is an eigenvalue of $A$, and the elements of the set $\{\lambda_1, \ldots, \lambda_k\}$ may not be distinct.
2.3.4 The characteristic polynomial

The Jordan form enables us to define and analyze functions of matrices, such as the matrix exponential $\exp(A)$ and matrix polynomials. We conclude this chapter with a section about the characteristic polynomial of a matrix.

Definition 2.7. The characteristic polynomial of a matrix $A \in \mathbb{C}^{n\times n}$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_p$ and algebraic multiplicities $a_1, \ldots, a_p$ is the polynomial
$$p_A(z) = \prod_{i=1}^{p} (z - \lambda_i)^{a_i}.$$

A closely related polynomial is the minimal polynomial of a matrix.

Definition 2.8. The minimal polynomial of a matrix $A \in \mathbb{C}^{n\times n}$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_p$ and indices $d_1, \ldots, d_p$ is the polynomial
$$m_A(z) = \prod_{i=1}^{p} (z - \lambda_i)^{d_i}.$$

The index of an eigenvalue (the length of its longest Jordan chain) cannot be greater than its algebraic multiplicity; i.e. $d_i \le a_i$, so the minimal polynomial has degree no greater than $n$ and it divides the characteristic polynomial.
Using the Jordan form, it is easy to see that the minimal polynomial, and thus also the characteristic polynomial, annihilates the matrix. Since a matrix polynomial is a sum of matrix powers,
$$m_A(A) = \sum_{i=0}^{d} \alpha_i A^i = \sum_{i=0}^{d} \alpha_i \left( U J U^{-1} \right)^i = U \left( \sum_{i=0}^{d} \alpha_i J^i \right) U^{-1} = U\, m_A(J)\, U^{-1}$$
for some $d$ and $\alpha_0, \ldots, \alpha_d$. $J$ is a block diagonal matrix; thus $m_A(J)$ is in fact the block diagonal matrix with blocks $m_A(J_1), \ldots, m_A(J_p)$; i.e.
$$m_A(J) = \begin{bmatrix} m_A(J_1) & & \\ & \ddots & \\ & & m_A(J_p) \end{bmatrix}.$$
$m_A(J_i) = 0$ because $m_A(J_i)$ contains the factor $(J_i - \lambda_i I)^{d_i} = N_i^{d_i} = 0$, where $N_i$ is the nilpotent matrix with ones on the first superdiagonal and zeros elsewhere. We conclude that $m_A(J) = 0$; hence $m_A(A) = 0$.
This fact was first postulated by Arthur Cayley, who also proved it for the $2 \times 2$ and $3 \times 3$ cases, and is called the Cayley-Hamilton theorem. Sir William Hamilton proved the theorem for general square matrices.

Theorem 2.6 (Cayley-Hamilton). The characteristic polynomial $p_A(z)$ of a square matrix $A$ annihilates $A$; i.e.
$$p_A(A) = 0.$$
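The Cayley-Hamilton theorem is easy to check numerically. The sketch below (NumPy; the matrix is an arbitrary random example, and np.poly is used here only as a convenient way to get the characteristic polynomial coefficients) evaluates p_A(A) by Horner's rule with matrix arithmetic and verifies that the result is zero up to roundoff.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 4))

    c = np.poly(A)                 # coefficients of det(zI - A), highest degree first

    # Evaluate p_A(A) with Horner's rule using matrix powers of A
    P = np.zeros_like(A)
    for coeff in c:
        P = P @ A + coeff * np.eye(4)

    print(np.linalg.norm(P))       # ~ 0 up to roundoff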

3 The Singular Value Decomposition

The singular value decomposition (SVD) expresses a matrix as the product of a unitary matrix, a diagonal matrix, and a second unitary matrix. It is one of the most useful tools in the matrix analyst's toolbox because of its amazing range of applications. In this chapter, we shall construct the SVD and then explore some of its many applications in computational mathematics.

3.1 Detour: Hermitian matrices

The matrix $A^* A$, often called the Gram matrix or Gramian, is a Hermitian matrix that arises in diverse contexts. In Section 2.2, we proved that Hermitian matrices are unitarily diagonalizable; i.e. (i) their eigenvalues are real and (ii) their eigenvectors form orthonormal bases for $\mathbb{C}^n$.

Definition 3.1. The Rayleigh quotient of a matrix $A \in \mathbb{C}^{n\times n}$ evaluated at the nonzero vector $x \in \mathbb{C}^n$ is the scalar quantity
$$\frac{x^* A x}{x^* x} \in \mathbb{C}.$$

The Rayleigh quotient is named after Lord Rayleigh (John Strutt), and its behavior is well understood when $A$ is Hermitian. For example, if $A$ is Hermitian, its Rayleigh quotient is real and bounded by the smallest and largest eigenvalues of $A$. To see this, first express the argument of the Rayleigh quotient $x$ as a linear combination of the eigenvectors $u_1, \ldots, u_n$ of $A$:
$$x = \sum_{i=1}^{n} \alpha_i u_i = U a,$$
where $\alpha_i = u_i^* x$ because $u_1, \ldots, u_n$ is an orthonormal basis for $\mathbb{C}^n$, and $a = (\alpha_1, \ldots, \alpha_n)$. Thus, the Rayleigh quotient of $A$ evaluated at $x$ can be expressed as
$$\frac{x^* A x}{x^* x} = \frac{a^* U^* A U a}{a^* U^* U a} = \frac{a^* D a}{a^* a},$$
where the final equality holds because $A$ is Hermitian, hence unitarily diagonalizable.
We can further simplify this expression using the fact that $D$ is diagonal to obtain
$$\frac{x^* A x}{x^* x} = \frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2}.$$
Both the numerator and the denominator are real; hence the Rayleigh quotient is real. If we order the eigenvalues of $A$ such that
$$\lambda_1 \ge \cdots \ge \lambda_n,$$
we can bound the Rayleigh quotient above and below by $\lambda_1$ and $\lambda_n$:
$$\frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} \le \frac{\lambda_1 \left(|\alpha_1|^2 + \cdots + |\alpha_n|^2\right)}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} = \lambda_1,$$
$$\frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} \ge \frac{\lambda_n \left(|\alpha_1|^2 + \cdots + |\alpha_n|^2\right)}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} = \lambda_n.$$
Note that the bounds are attained by the eigenvectors associated with the largest and smallest eigenvalues respectively. Thus, these eigenvalues are the optimal values of optimization problems:
$$\lambda_1 = \max_{x \ne 0} \frac{x^* A x}{x^* x} \quad \text{and} \quad \lambda_n = \min_{x \ne 0} \frac{x^* A x}{x^* x}.$$

3.1.1 The Courant-Fischer minimax principle

How can the interior eigenvalues be characterized as the optimal values of optimization problems? Assume the usual setup: $A$ is a Hermitian matrix with eigenvalues ordered such that $\lambda_1 \ge \cdots \ge \lambda_n$ and eigenvectors $u_1, \ldots, u_n$.
The expression of the Rayleigh quotient as
$$\frac{x^* A x}{x^* x} = \frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2},$$
where $\alpha_i = u_i^* x$, hints at the solution. If we minimize the Rayleigh quotient over the orthogonal complement of $u_n$, then we can ignore the smallest eigenvalue $\lambda_n$, and the optimal value $\lambda_{n-1}$ is attained at $u_{n-1}$. Similarly, if we maximize the Rayleigh quotient over the orthogonal complement of $u_1$, then we can ignore the largest eigenvalue $\lambda_1$, and the optimal value $\lambda_2$ is attained at $u_2$.
In general, $\lambda_k$ can be characterized as the optimal value of
$$\lambda_k = \min_{x \perp V_{n-k}} \frac{x^* A x}{x^* x} = \max_{x \perp U_{k-1}} \frac{x^* A x}{x^* x},$$
where $U_{k-1}$ and $V_{n-k}$ denote the spans of the eigenvectors associated with the $k-1$ largest eigenvalues and the $n-k$ smallest eigenvalues respectively. Although this describes the interior eigenvalues, it is unsatisfactory because it requires knowledge of the eigenspaces associated with the more extreme eigenvalues.
The Courant-Fischer minimax principle circumvents this problem by describing the eigenvalues of $A$ using minimax problems.
Theorem 3.1. If $A$ is a Hermitian matrix and its eigenvalues are ordered such that $\lambda_1 \ge \cdots \ge \lambda_n$, then its eigenvalues satisfy
$$\lambda_k = \max_{\dim(S_k) = k} \; \min_{x \in S_k} \frac{x^* A x}{x^* x}.$$

Proof. Let $S_k$ be a $k$-dimensional subspace of $\mathbb{C}^n$ and let $V_{n-k+1}$ denote the span of the eigenvectors associated with the $n-k+1$ smallest eigenvalues; i.e. $V_{n-k+1} = \mathrm{span}\{u_k, \ldots, u_n\}$. Then their intersection $V_{n-k+1} \cap S_k$ must be nontrivial because $\dim(V_{n-k+1}) + \dim(S_k) > n$.
Choose a vector $v \in V_{n-k+1} \cap S_k$. $v$ is a linear combination of the eigenvectors $u_k, \ldots, u_n$:
$$v = \alpha_k u_k + \cdots + \alpha_n u_n,$$
so the Rayleigh quotient at $v$ can be bounded by
$$\frac{v^* A v}{v^* v} = \frac{\lambda_k |\alpha_k|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_k|^2 + \cdots + |\alpha_n|^2} \le \lambda_k.$$
The minimum of the Rayleigh quotient over $S_k$ can therefore be bounded by
$$\min_{x \in S_k} \frac{x^* A x}{x^* x} \le \frac{v^* A v}{v^* v} \le \lambda_k.$$
This bound is attained by $S_k = \mathrm{span}\{u_1, \ldots, u_k\}$, so $\lambda_k$ is the optimal value of the maximin problem:
$$\lambda_k = \max_{\dim(S_k)=k} \; \min_{x \in S_k} \frac{x^* A x}{x^* x}.$$

The Courant-Fischer minimax principle can also be stated in a minimax version. This version is easier to state if the eigenvalues are arranged such that $\lambda_1 \le \cdots \le \lambda_n$.

Lemma 3.1 (Courant-Fischer minimax principle). If $A$ is a Hermitian matrix and its eigenvalues are ordered such that $\lambda_1 \le \cdots \le \lambda_n$, then its eigenvalues satisfy
$$\lambda_k = \min_{\dim(S_k)=k} \; \max_{x \in S_k} \frac{x^* A x}{x^* x}.$$

3.1.2 Positive definite matrices

A class of Hermitian matrices that commonly arise in practice have Rayleigh quotients that are always positive. We say such matrices are positive definite. Positive definite matrices are desirable computationally because they permit the use of special algorithms designed to handle them. For example, the Cholesky factorization halves the effort required to solve a symmetric positive definite linear system when compared to Gaussian elimination.

Definition 3.2. Suppose $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix. Then $A$ is positive definite if its Rayleigh quotient is always positive; i.e.
$$\frac{x^* A x}{x^* x} > 0 \quad \text{for all nonzero } x \in \mathbb{C}^n.$$
An equivalent definition is: $A$ is positive definite if its eigenvalues are positive.

The two definitions are equivalent because if the Rayleigh quotient is always positive, then
$$\lambda_i = \frac{u_i^* A u_i}{u_i^* u_i} > 0.$$
On the other hand, the Rayleigh quotient is bounded above and below by the largest and smallest eigenvalues of $A$, so if the eigenvalues are positive, so is the Rayleigh quotient. We can also define the larger class of positive semidefinite matrices by allowing the Rayleigh quotient (and the eigenvalues) to be zero.
A third characterization of positive definite matrices involves a quantity called the Schur complement, named after the German mathematician Issai Schur (also of Schur form fame).

Definition 3.3. Suppose $A$ is a Hermitian matrix partitioned as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^* & A_{22} \end{bmatrix}.$$
If $A_{11}$ is nonsingular, the Schur complement of $A_{11}$ in $A$ is the quantity
$$S_{A_{11}} = A_{22} - A_{12}^* A_{11}^{-1} A_{12}.$$

The Schur complement arises often in optimization. For example, it arises when we minimize a quadratic form over some of its variables; i.e.
$$\min_{u} \; f(u, v) = \begin{bmatrix} u \\ v \end{bmatrix}^* \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^* & A_{22} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}.$$
The minimum value is $v^* S_{A_{11}} v$ because we can expand $f$ and complete the square in $u$ (assuming $A_{11}$ is positive definite so that $A_{11}^{1/2}$ exists) to obtain
$$f(u, v) = u^* A_{11} u + 2\,\mathrm{Re}\!\left(v^* A_{12}^* u\right) + v^* A_{22} v
= \left\| A_{11}^{1/2} u + A_{11}^{-1/2} A_{12} v \right\|^2 + v^* A_{22} v - v^* A_{12}^* A_{11}^{-1} A_{12} v
= \left\| A_{11}^{1/2} u + A_{11}^{-1/2} A_{12} v \right\|^2 + v^* S_{A_{11}} v.$$
We can set the first term to zero by choosing $u = -A_{11}^{-1} A_{12} v$, so the minimum over $u$ is $v^* S_{A_{11}} v$. This interpretation of the Schur complement leads to a characterization of positive definite matrices that is often used in optimization.

Lemma 3.2. Suppose $A$ is a Hermitian matrix partitioned as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^* & A_{22} \end{bmatrix}.$$
$A$ is positive definite if and only if $A_{11}$ and its Schur complement $S_{A_{11}}$ are both positive definite.

The proof of this lemma mainly uses ideas from convex optimization, so we refer to Appendix A of [1] for the details.
The Gram matrix arises in many contexts, such as the stiffness matrix in the finite element method. It is Hermitian and positive semidefinite because
$$\frac{x^* A^* A x}{x^* x} = \frac{\|Ax\|^2}{\|x\|^2} \ge 0.$$
Hence, it has $n$ nonnegative eigenvalues and its eigenvectors form an orthonormal basis for $\mathbb{C}^n$.
We are ready to start our tour of the SVD with its construction. Unlike the Jordan form, the SVD can be computed robustly, although the standard algorithm differs from our construction. We shall assume $m \ge n$ ($A$ is "tall and skinny"), but our results generalize readily to the case $m < n$ ($A$ is "short and fat").
Theorem 3.2. Suppose $A \in \mathbb{C}^{m\times n}$. Then there exist a unitary matrix $U \in \mathbb{C}^{m\times m}$, a diagonal matrix $\Sigma \in \mathbb{R}^{m\times n}$ with nonnegative entries, and a second unitary matrix $V \in \mathbb{C}^{n\times n}$ such that
$$A = U \Sigma V^*.$$

Proof. Let $A \in \mathbb{C}^{m\times n}$ and assume $m \ge n$. $A^* A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$ and associated unit-length eigenvectors $v_1, \ldots, v_n$. $A^* A$ is positive semidefinite, so the eigenvalues are nonnegative. Assume there are $r$ nonzero eigenvalues and that the eigenvalues are ordered such that
$$\lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0.$$
We define $\sigma_i = \sqrt{\lambda_i}$ and $u_i = \frac{1}{\sigma_i} A v_i$ for each $i$ such that $\sigma_i \ne 0$. The $\sigma_i$ are nonnegative because $A^* A$ is a positive semidefinite matrix. The vectors $u_1, \ldots, u_r$ are orthonormal because $A^* A$ is a Hermitian matrix, so its eigenvectors $v_1, \ldots, v_n$ are orthonormal. Hence, we have
$$u_i^* u_i = \frac{1}{\sigma_i^2} v_i^* A^* A v_i = \frac{\lambda_i}{\sigma_i^2} = 1, \qquad
u_i^* u_j = \frac{1}{\sigma_i \sigma_j} v_i^* A^* A v_j = \frac{\lambda_j}{\sigma_i \sigma_j} v_i^* v_j = 0 \quad (i \ne j).$$
Further, we have
$$A v_i = \sigma_i u_i, \qquad i = 1, \ldots, r.$$
Finally, we choose $u_{r+1}, \ldots, u_n$ to be an orthonormal set of vectors in $\mathrm{span}\{u_1, \ldots, u_r\}^\perp$. Note that we also have
$$A v_i = \sigma_i u_i = 0, \qquad i = r+1, \ldots, n,$$
because $v_i^* A^* A v_i = \lambda_i = 0$ for $i = r+1, \ldots, n$, and $v_i^* A^* A v_i = \|A v_i\|^2$ is zero if and only if $A v_i = 0$.

We can express both sets of equations in matrix form:
$$A \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix},$$
which we shall write concisely as
$$A V = \hat{U} \hat{\Sigma}.$$
We can rearrange to obtain the reduced SVD or thin SVD of $A$:
$$A = \hat{U} \hat{\Sigma} V^*.$$
$\hat{U}$ is a subunitary matrix; i.e. its columns are orthonormal but they do not span $\mathbb{C}^m$. To make $\hat{U}$ unitary, we can choose $u_{n+1}, \ldots, u_m$ to be an orthonormal basis for $\mathrm{span}\{u_1, \ldots, u_n\}^\perp$ and append these $m - n$ vectors to $\hat{U}$ to form
$$U = \begin{bmatrix} \hat{U} & u_{n+1} & \cdots & u_m \end{bmatrix}.$$
We must also choose $\Sigma$ so that $U \Sigma = \hat{U} \hat{\Sigma}$. The most obvious choice is
$$\Sigma = \begin{bmatrix} \hat{\Sigma} \\ 0 \end{bmatrix}.$$
Finally, we substitute $U$ and $\Sigma$ in lieu of $\hat{U}$ and $\hat{\Sigma}$ to obtain the [full] SVD of $A$:
$$A = U \Sigma V^*.$$

We can expand the SVD of $A$ column-wise to express $A$ as a sum of rank-1 matrices:
$$A = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix} \begin{bmatrix} v_1^* \\ \vdots \\ v_n^* \end{bmatrix} = \sigma_1 u_1 v_1^* + \cdots + \sigma_n u_n v_n^*.$$
This is the dyadic version of the SVD. Note that if $A$ has rank $r < n$, then we only need to sum the first $r$ rank-1 terms because in this case $\sigma_{r+1} = \cdots = \sigma_n = 0$.
We can interpret the SVD as an algebraic representation of a geometric fact: the image of the unit sphere under a matrix is a hyper-ellipse. This hyper-ellipse is the one whose major axes lie along $u_1, \ldots, u_n$ with (not necessarily nonzero) lengths $\sigma_1, \ldots, \sigma_n$ (assuming $m \ge n$). If $\mathrm{rank}(A) = r$, then $r$ of these axes have nonzero length. The pre-images of these axes are $v_1, \ldots, v_n$ because $AV = U\Sigma$.
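The sketch below (NumPy; a random rectangular matrix used purely for illustration) computes the full SVD and verifies both the matrix form A = U Sigma V* and the dyadic form constructed above.

    import numpy as np

    rng = np.random.default_rng(6)
    m, n = 6, 4
    A = rng.standard_normal((m, n))

    # Full SVD: U is m x m, Vh = V*, and s holds the singular values
    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    Sigma = np.zeros((m, n))
    Sigma[:n, :n] = np.diag(s)
    assert np.allclose(U @ Sigma @ Vh, A)

    # Dyadic form: A = sum_i sigma_i u_i v_i*
    A_dyadic = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(n))
    assert np.allclose(A_dyadic, A)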

3.2 Singular values and singular vectors

3.2.1 Singular values

The diagonal entries of $\Sigma$, $\sigma_1, \ldots, \sigma_n$, are called the singular values of $A$, and they are usually ordered such that
$$\sigma_1 \ge \cdots \ge \sigma_n.$$
Singular values reveal a variety of properties of matrices. For example, the number of nonzero singular values $r$ is the rank of the matrix. This is clear from both the algebraic and geometric interpretations of the SVD. The singular values also give rise to a class of matrix norms that includes the induced matrix norm and the Frobenius norm. Recall that here $\|\cdot\|$ refers strictly to the 2-norm.

Lemma 3.3. The induced norm of a matrix, $\|A\|$, is equal to its largest singular value.

Proof. The induced matrix norm is
$$\|A\| = \max_{\|x\|=1} \|Ax\|.$$
This quantity can be expressed in terms of the singular values. Let $A = U \Sigma V^*$ be the SVD of $A$. The induced matrix norm is invariant under unitary transformations, so
$$\|A\| = \|U \Sigma V^*\| = \|\Sigma\|.$$

$\Sigma$ is a diagonal matrix, so we can calculate its induced norm directly:
$$\|\Sigma x\|^2 = \sum_{i=1}^{r} \sigma_i^2 |x_i|^2.$$
This quantity is maximized over $\{x \in \mathbb{C}^n : \|x\| = 1\}$ by $x = e_1$, so
$$\|A\| = \|\Sigma\| = \sigma_1.$$

The Frobenius norm can also be expressed using singular values.

Lemma 3.4. The Frobenius norm of a matrix, $\|A\|_F$, is equal to the Euclidean norm of the vector of its singular values; i.e.
$$\|A\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^2}.$$

Proof. We leave the proof as an exercise to the reader. Hint: the Frobenius norm is also invariant under unitary transformations.
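A quick numerical check of Lemmas 3.3 and 3.4 (NumPy; the matrix is a random example chosen only for illustration): the induced 2-norm equals the largest singular value, the Frobenius norm equals the Euclidean norm of the vector of singular values, and the rank equals the number of nonzero singular values.

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 4))
    s = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order

    print(np.linalg.norm(A, 2), s[0])                        # induced 2-norm = sigma_1
    print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # Frobenius norm
    print(np.linalg.matrix_rank(A), np.sum(s > 1e-12))       # rank = # nonzero sigma_i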
3.2.2 Singular vectors

The vectors $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$ are called the left and right singular vectors of $A$ respectively. Subsets of these vectors form orthonormal bases for the four fundamental subspaces of $A$.
Lemma 3.5. Let $A \in \mathbb{C}^{m\times n}$ be a matrix of rank $r$. Then the four fundamental subspaces are spanned by
$$\mathcal{R}(A) = \mathrm{span}\{u_1, \ldots, u_r\}, \qquad \mathcal{N}(A^*) = \mathrm{span}\{u_{r+1}, \ldots, u_m\},$$
$$\mathcal{R}(A^*) = \mathrm{span}\{v_1, \ldots, v_r\}, \qquad \mathcal{N}(A) = \mathrm{span}\{v_{r+1}, \ldots, v_n\}.$$

Proof. The first $r$ left singular vectors, those associated with nonzero singular values, form a basis for $\mathcal{R}(A)$ because
$$Ax = U \Sigma V^* x = \sum_{i=1}^{n} \sigma_i (v_i^* x) u_i = \sum_{i=1}^{r} \sigma_i (v_i^* x) u_i.$$

The fundamental theorem of linear algebra says $\mathcal{R}(A) \oplus \mathcal{N}(A^*) = \mathbb{C}^m$, so the last $m - r$ left singular vectors must span $\mathcal{N}(A^*)$. We can apply this result to $A^*$ to see that the first $r$ and last $n - r$ right singular vectors span $\mathcal{R}(A^*)$ and $\mathcal{N}(A)$ respectively.
To check Lemma 3.5, we verify that the last $n - r$ right singular vectors, those associated with zero singular values, form a basis for $\mathcal{N}(A)$. First, note that $\dim(\mathcal{N}(A)) = \dim(\mathrm{span}\{v_{r+1}, \ldots, v_n\}) = n - r$. Second, their intersection also has dimension $n - r$ because any vector in $\mathrm{span}\{v_{r+1}, \ldots, v_n\}$ is in $\mathcal{N}(A)$. We check this by a direct calculation:
$$A \sum_{j=1}^{n-r} \alpha_j v_{r+j} = U \Sigma V^* \sum_{j=1}^{n-r} \alpha_j v_{r+j} = U \sum_{j=1}^{n-r} \alpha_j \sigma_{r+j} e_{r+j} = 0.$$
Hence, the SVD reveals the four fundamental subspaces of a matrix.


Like the eigenvalues and eigenvectors of Hermitian matrices, singular values and singular vectors also satisfy a variational principle. Let $x$ and $y$ both be unit vectors and consider the quadratic form $x^* A y$. First, we expand the quadratic form in terms of the SVD of $A$:
$$x^* A y = x^* U \Sigma V^* y = \sum_{i=1}^{n} \sigma_i (x^* u_i)(v_i^* y) \le \sigma_1,$$
where the last inequality (applied to the magnitude of the sum) holds because $x$ and $y$ are both unit vectors. This bound is attained by $x = u_1$ and $y = v_1$, so $\sigma_1$ is the optimal value of the optimization problem:
$$\sigma_1 = \max_{x, y \ne 0} \frac{x^* A y}{\|x\| \|y\|}.$$
To characterize the smaller singular values, we constrain $x$ and $y$ to lie in $\mathrm{span}\{u_1, \ldots, u_{k-1}\}^\perp$ and $\mathrm{span}\{v_1, \ldots, v_{k-1}\}^\perp$ respectively. Then we can ignore the $k-1$ largest singular values and
$$x^* A y = \sum_{i=k}^{n} \sigma_i (x^* u_i)(v_i^* y) \le \sigma_k.$$

This bound is attained by $x = u_k$ and $y = v_k$, so $\sigma_k$ is the optimal value of the optimization problem:
$$\sigma_k = \max_{\substack{x \perp \mathrm{span}\{u_1, \ldots, u_{k-1}\} \\ y \perp \mathrm{span}\{v_1, \ldots, v_{k-1}\}}} \frac{x^* A y}{\|x\| \|y\|}.$$
We summarize these results in a lemma.

Lemma 3.6. The $k$th largest singular value of a matrix satisfies
$$\sigma_k = \max_{\substack{x \perp \mathrm{span}\{u_1, \ldots, u_{k-1}\} \\ y \perp \mathrm{span}\{v_1, \ldots, v_{k-1}\}}} \frac{x^* A y}{\|x\| \|y\|}.$$

3.3 Matrix approximation

3.3.1 Low-rank matrix approximation

The SVD can be used to construct low-rank approximations to a matrix. Suppose $A \in \mathbb{C}^{m\times n}$ and assume $m \ge n$. Then the dyadic version of the SVD expresses $A$ as a sum of rank-1 matrices:
$$A = \sigma_1 u_1 v_1^* + \cdots + \sigma_n u_n v_n^*.$$
A natural way to approximate $A$ is to take only a partial sum:
$$A \approx A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*, \tag{3.1}$$
where $k < n$. This is a rank-$k$ approximation to $A$ because $u_1, \ldots, u_k$ are linearly independent. How good is this approximation?

Theorem 3.3. Suppose $A \in \mathbb{C}^{m\times n}$ and $k < n$. Then
$$\min_{\mathrm{rank}(B) = k} \|A - B\| = \sigma_{k+1}.$$
The minimum is attained by $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$.

Proof. We shall derive a lower bound for $\|A - B\|$ over the set $\{B \in \mathbb{C}^{m\times n} \mid \mathrm{rank}(B) = k\}$ and then show that the bound is attained by the rank-$k$ approximation to $A$ in (3.1).
Let $B \in \mathbb{C}^{m\times n}$ be a rank-$k$ matrix and let $V_{k+1}$ denote the span of the first $k+1$ right singular vectors of $A$. Then the intersection of $\mathcal{N}(B)$ and $V_{k+1}$ must be nontrivial because $\dim(\mathcal{N}(B)) + \dim(V_{k+1}) > n$.
Choose $z$ to be a unit-length vector in $\mathcal{N}(B) \cap V_{k+1}$. Since $z \in V_{k+1}$, we can express $z$ using the first $k+1$ right singular vectors $v_1, \ldots, v_{k+1}$:
$$z = \sum_{i=1}^{k+1} \alpha_i v_i.$$
We can now bound the quantity $\|(A - B)z\|$ from below as follows. First, observe that $z$ is also in $\mathcal{N}(B)$, so
$$\|(A - B)z\| = \|Az\| = \left\| U \Sigma V^* \sum_{i=1}^{k+1} \alpha_i v_i \right\|.$$
The Euclidean norm is invariant under unitary transformations, so
$$\left\| U \Sigma V^* \sum_{i=1}^{k+1} \alpha_i v_i \right\| = \left\| \Sigma \sum_{i=1}^{k+1} \alpha_i e_i \right\| = \left\| \sum_{i=1}^{k+1} \alpha_i \sigma_i e_i \right\|.$$
Because the vectors $\alpha_i \sigma_i e_i$ are orthogonal and $\sigma_i \ge \sigma_{k+1}$ for $i \le k+1$,
$$\left\| \sum_{i=1}^{k+1} \alpha_i \sigma_i e_i \right\| \ge \sigma_{k+1} \left\| \sum_{i=1}^{k+1} \alpha_i e_i \right\|.$$
Note that
$$\left\| \sum_{i=1}^{k+1} \alpha_i e_i \right\| = \left\| \sum_{i=1}^{k+1} \alpha_i V^* v_i \right\| = \|V^* z\| = 1,$$
so we can conclude
$$\|(A - B)z\| \ge \sigma_{k+1}.$$
Since $z$ is a unit vector, $\|(A - B)z\|$ is clearly a lower bound for $\|A - B\|$, so
$$\|A - B\| \ge \sigma_{k+1}.$$
This lower bound is attained by $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$ because
$$A - A_k = \sum_{i=1}^{n} \sigma_i u_i v_i^* - \sum_{i=1}^{k} \sigma_i u_i v_i^* = \sum_{i=k+1}^{n} \sigma_i u_i v_i^*.$$
This is the SVD of $A - A_k$, so $\|A - A_k\| = \sigma_{k+1}$.
This approximation to $A$ is also optimal in the Frobenius norm.

Theorem 3.4. Let $A \in \mathbb{C}^{m\times n}$ and $k < n$. Then
$$\min_{\mathrm{rank}(B)=k} \|A - B\|_F = \sqrt{\sum_{i=k+1}^{n} \sigma_i^2}.$$
The minimum is attained by $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$.

Proof. The proof is simple and we leave it as an exercise to the reader.
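The error formulas of Theorems 3.3 and 3.4 are easy to confirm numerically. The sketch below (NumPy; a random matrix and k = 2 chosen only for illustration) forms the rank-k truncation A_k from the SVD and compares both error norms with the singular-value expressions.

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((8, 6))
    U, s, Vh = np.linalg.svd(A, full_matrices=False)

    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # best rank-k approximation

    print(np.linalg.norm(A - A_k, 2), s[k])                            # = sigma_{k+1}
    print(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2)))   # = sqrt(sum_{i>k} sigma_i^2)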

3.3.2 The orthogonal Procrustes problem

The SVD also facilitates the solution of the orthogonal Procrustes problem, which asks how one should rotate a matrix $A$ so that it approximates another matrix $B$. Mathematically, we seek a solution to
$$\min_{Q^* Q = I} \|B - QA\|_F. \tag{3.2}$$
To solve the orthogonal Procrustes problem, we first expand the squared objective function:
$$\|QA - B\|_F^2 = \mathrm{Tr}(B^* B) - 2\,\mathrm{Re}\,\mathrm{Tr}(B^* Q A) + \mathrm{Tr}(A^* Q^* Q A) = \mathrm{Tr}(B^* B) - 2\,\mathrm{Re}\,\mathrm{Tr}(A B^* Q) + \mathrm{Tr}(A^* A).$$
Thus (3.2) is equivalent to maximizing the quantity $\mathrm{Re}\,\mathrm{Tr}(A B^* Q)$. Let $A B^* = U \Sigma V^*$ denote the SVD of $A B^*$; then
$$\mathrm{Tr}(A B^* Q) = \mathrm{Tr}(U \Sigma V^* Q) = \mathrm{Tr}(\Sigma V^* Q U) = \mathrm{Tr}(\Sigma R),$$
where $R = V^* Q U$. $R$ is a unitary matrix, so the magnitudes of its diagonal entries cannot be larger than 1, and
$$\mathrm{Re}\,\mathrm{Tr}(\Sigma R) = \mathrm{Re} \sum_{i=1}^{n} \sigma_i r_{ii} \le \sum_{i=1}^{n} \sigma_i.$$
This bound is attained if we set $Q = V U^*$, so that $R = I$.
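A minimal NumPy sketch of the Procrustes solution derived above (the matrices A and B are random examples; B is built as a noisy orthogonal transformation of A so that there is a natural matrix to recover). With A B* = U Sigma V*, the minimizer of ||B - QA||_F over unitary Q is Q = V U*.

    import numpy as np

    rng = np.random.default_rng(9)
    n, m = 4, 10
    A = rng.standard_normal((n, m))
    Q_true, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an orthogonal matrix to recover
    B = Q_true @ A + 0.01 * rng.standard_normal((n, m))     # noisy transformed copy of A

    # Procrustes solution: SVD of A B*, then Q = V U*
    U, s, Vh = np.linalg.svd(A @ B.conj().T)
    Q = Vh.conj().T @ U.conj().T

    print(np.linalg.norm(B - Q @ A, 'fro'))      # small residual
    print(np.linalg.norm(Q - Q_true, 'fro'))     # Q is close to the matrix used to build B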

4 Supplemental Material

There are several important topics covered in the refresher course that
are not explicitly discussed in these notes. This chapter acts as a link to
supplemental material for these topics. Specifically, we provide a very
brief introduction to each of the topics and list references for further
study.

4.1 Gaussian Elimination and the LU Factorization

Perhaps the simplest method for solving linear systems of the form
$$Ax = b, \tag{4.1}$$
where $A \in \mathbb{C}^{n\times n}$, is known as Gaussian elimination. This is the process of using successive elementary row operations to force the matrix $A$ into upper triangular form. Let us introduce a set of matrices $M_j$ such that for any vector $x \in \mathbb{C}^{n\times 1}$
$$M_j \begin{bmatrix} x_1 \\ \vdots \\ x_j \\ x_{j+1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_j \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
and $M_j$ has the form
$$M_j = I - w_j e_j^T$$
for some $w_j$. Specifically, we may choose $w_j$ to have entries
$$(w_j)_k = \begin{cases} 0, & k \le j, \\ x_k / x_j, & k > j. \end{cases}$$
We seek a sequence of these so-called Gauss transforms such that
$$M_{n-1} \cdots M_1 A = U, \tag{4.2}$$
where $U$ is upper triangular. It is important to note that such a sequence may not exist; however, there is a set of formal properties of $A$ that imply the existence of such a transform, see, e.g., Section 3.5 of [5] for details. The matrices $M_j$ are used to successively introduce zeros below the diagonal in column $j$ of the matrix $A$. The order in which these matrices are applied ensures that each successive application of a Gauss transform does not introduce any new nonzero elements below the diagonal. Since the matrices $M_j$ are all nonsingular, under the assumption that $A$ is nonsingular and that we never have to divide by zero in the construction of $M_j$, we may solve the linear system (4.1) by forming
$$Ux = M_{n-1} \cdots M_1 b$$
and then solving for $x$ via backward substitution. This procedure is what we know as Gaussian elimination.
This methodology is related to the LU factorization, where we attempt to factor $A$ into a unit lower triangular matrix $L$ and an upper triangular matrix $U$ such that
$$A = LU.$$
4.2. The QR Factorization

41

Under suitable assumptions on A such that the transform (4.2) exists,


let us define
L1 = Mn1 . . . M1 .
Observe that each of the Mj are unit lower triangular and thus the
product is unit lower triangular. Furthermore, because the inverse of a
unit lower triangular matrix is also unit lower triangular we may write
the so called LU decomposition of A as
A = LU.
Previously we made note of the fact that not all nonsingular matrices
A admit an LU factorization. However it turns out that for any nonsingular matrix A there exists a permutation P such that PA has an
LU factorization.
Theorem 4.1. Given any nonsingular matrix A Cnn there exists a
permutation matrix P, a unit lower triangular matrix L and an upper
triangular matrix U such that
PA = LU.
Proof. See, e.g., Section 3.5 of [5].
Further discussion of Gaussian elimination and the LU factorization
can be found in Lecture 20 of [4].
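The sketch below (SciPy assumed; a random nonsingular system used for illustration) computes the permuted factorization of Theorem 4.1 and uses it to solve a linear system by forward and backward substitution. Note that scipy.linalg.lu returns matrices with A = P L U, so the permutation in the statement PA = LU corresponds to its transpose.

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    rng = np.random.default_rng(10)
    n = 5
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    P, L, U = lu(A)                      # A = P @ L @ U, so P.T @ A = L @ U
    assert np.allclose(P @ L @ U, A)

    # Solve A x = b: first L y = P.T b (forward), then U x = y (backward)
    y = solve_triangular(L, P.T @ b, lower=True)
    x = solve_triangular(U, y, lower=False)
    assert np.allclose(A @ x, b)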

4.2 The QR Factorization

Another common and important factorization of a matrix $A \in \mathbb{C}^{m\times n}$ of rank $k$ is the QR factorization. Specifically, we seek to write
$$A = QR,$$
where $Q \in \mathbb{C}^{m\times k}$ has orthonormal columns and $R \in \mathbb{C}^{k\times n}$ is an upper triangular matrix. In some cases, the dimensions given here for $Q$ and $R$ actually make this the so-called reduced QR decomposition. For the sake of the brief discussion here we will focus on this form of the QR factorization and restrict our discussion to the case where $m \ge n$ and $A$ has full column rank, i.e. $k = n$. However, the factorization is in no way restricted to this situation.
Structurally, the goal of this factorization is to form an orthonormal basis $\{q_i\}_{i=1}^{n}$ for the range of $A$ with the property that
$$\mathrm{span}\{q_1, \ldots, q_p\} = \mathrm{span}\{a_1, \ldots, a_p\}, \qquad p = 1, \ldots, n.$$
Such a set of $q_i$ may be constructed via the Gram-Schmidt process, see, e.g., Lecture 7 of [4]. Under the assumptions here such a basis always exists, and we observe that we may therefore write
$$a_j = \sum_{i=1}^{j} r_{i,j} q_i, \qquad j = 1, \ldots, n.$$
If we form
$$Q = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix},$$
the nested nature of the orthonormal basis means that we may construct the $j$th column of $A$ from just the first $j$ columns of $Q$. By construction $Q^* Q = I$; however, unless $m = n$, $Q$ is not unitary. If we let
$$R = \begin{bmatrix} r_{1,1} & \cdots & r_{1,n} \\ & \ddots & \vdots \\ & & r_{n,n} \end{bmatrix},$$
then by construction we have the factorization
$$A = QR.$$
This section provides only a very brief introduction to the QR factorization. This factorization has a wide variety of uses and is incredibly powerful. It may also be constructed for matrices that do not adhere to the assumptions given here. Further discussion of this factorization can be found in Lecture 7 of [4].
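A short NumPy sketch (a random tall matrix with full column rank, chosen only for illustration) of the reduced QR factorization and the nested-span property: the p-th column of A is a combination of the first p columns of Q only.

    import numpy as np

    rng = np.random.default_rng(11)
    m, n = 8, 4
    A = rng.standard_normal((m, n))

    Q, R = np.linalg.qr(A)               # reduced QR: Q is m x n, R is n x n upper triangular
    assert np.allclose(Q @ R, A)
    assert np.allclose(Q.conj().T @ Q, np.eye(n))
    assert np.allclose(R, np.triu(R))

    # Nested spans: a_p uses only q_1, ..., q_p
    for p in range(1, n + 1):
        coeffs = Q.conj().T @ A[:, p - 1]
        assert np.allclose(coeffs[p:], 0)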

References

[1] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[2] M. Embree, Applied Matrix Analysis.
[3] M. Embree, Numerical Analysis.
[4] L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, 1997.
[5] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
