Contents

Preface
1 Preliminaries
1.1 Euclidean space
1.2 The four fundamental subspaces
1.3 Projectors
2 Spectral Theory
3 The Singular Value Decomposition
4 Supplemental Material
References
Preface
This edition of the refresher course notes has been edited to address
some typographical errors, though I have probably also managed to simultaneously introduce several new ones. Furthermore, a few sections
have been slightly revised to present material in a different manner and
some material has been added to discuss topics not covered in the original version. Jordan Canonical form is now presented in a significantly
different manner than the original and sections have been added on
the real Schur form and some matrix factorizations. These sections are
written somewhat differently than the existing material in the notes as
they are intended to provide a brief introduction to the material and
then point the reader towards some resources for further study.
The bulk of the notes still closely mirrors the original version written
a year ago by Yuekai, and I would like to thank him for the original
construction of these notes.
Stanford, California
Anil Damle
1 Preliminaries
1.1 Euclidean space

1.1.1 Vectors
$$\|x\|_2 = \sqrt{x^* x} = \sqrt{\sum_{i=1}^{n} |x_i|^2}.$$
Here the subscript is used to distinguish this specific norm. More generally, a norm must satisfy $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$. A norm also satisfies: (i) $\|\alpha x\| = |\alpha| \, \|x\|$ for all $\alpha \in \mathbb{C}$, and (ii) the triangle inequality: $\|x + y\| \le \|x\| + \|y\|$. We call a vector of norm 1 a unit vector.
The dot product satisfies the Cauchy-Schwarz inequality:
$$|x^* y| \le \|x\|_2 \, \|y\|_2. \qquad (1.1)$$
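As a quick numerical check (not part of the original notes), the following NumPy sketch verifies the definition of the Euclidean norm, the Cauchy-Schwarz inequality, and the triangle inequality on random complex vectors; the vectors and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# Euclidean norm computed from the definition ||x||_2 = sqrt(x^* x).
norm_x = np.sqrt(np.vdot(x, x).real)
assert np.isclose(norm_x, np.linalg.norm(x))

# Cauchy-Schwarz inequality: |x^* y| <= ||x||_2 ||y||_2.
assert abs(np.vdot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y)

# Triangle inequality: ||x + y|| <= ||x|| + ||y||.
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```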
1.1.2 Matrices
Using scalars, we can also build matrices with $m$ rows and $n$ columns, which we denote by a bold capital letter, e.g. $A \in \mathbb{C}^{m\times n}$. The entry in the $j$th row and $k$th column is denoted by $a_{jk}$. We often split a matrix into columns $a_1, \ldots, a_n$ or rows $a_1^T, \ldots, a_m^T$. For example, we can represent $A \in \mathbb{C}^{3\times 2}$ as
$$A = \begin{bmatrix} a_1 & a_2 \end{bmatrix} \qquad \text{or} \qquad A = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \end{bmatrix}.$$
The conjugate-transpose and transpose of a matrix are defined and denoted similarly to the respective operations for a vector: the $j,k$ entry of $A^*$ is $\bar{a}_{kj}$ and the $j,k$ entry of $A^T$ is $a_{kj}$. For our $3 \times 2$ example matrix $A$,
$$A^* = \begin{bmatrix} \bar{a}_{11} & \bar{a}_{21} & \bar{a}_{31} \\ \bar{a}_{12} & \bar{a}_{22} & \bar{a}_{32} \end{bmatrix}, \qquad A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}.$$
Matrices with the same number of rows and columns are called square
matrices.
We can verify using a direct calculation that
$$(AB)^* = B^* A^*, \qquad (AB)^T = B^T A^T.$$
We often add and multiply matrices and vectors block-wise in the same
way we carry out matrix multiplication entry-wise. For example, the
Karush-Kuhn-Tucker (KKT) conditions for some classes of optimization problems can be expressed as
$$H x + A^T y = g, \qquad A x = b,$$
which can also be expressed as
$$\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} g \\ b \end{bmatrix}.$$
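To make the block structure concrete, here is a small NumPy sketch (not from the original notes; the matrices $H$, $A$ and vectors $g$, $b$ are random placeholder data) that assembles the KKT matrix block-wise and checks that the blocks of the solution satisfy the two original equations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
H = rng.standard_normal((n, n))
H = H @ H.T + n * np.eye(n)            # make H symmetric positive definite
A = rng.standard_normal((m, n))
g = rng.standard_normal(n)
b = rng.standard_normal(m)

# Assemble [[H, A^T], [A, 0]] and the right-hand side [g; b] block-wise.
K = np.block([[H, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([g, b])

sol = np.linalg.solve(K, rhs)
x, y = sol[:n], sol[n:]

assert np.allclose(H @ x + A.T @ y, g)  # first block equation
assert np.allclose(A @ x, b)            # second block equation
```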
The trace product and the Frobenius norm also satisfy the Cauchy-Schwarz inequality:
$$|\operatorname{Tr}(A^* B)| \le \|A\|_F \, \|B\|_F.$$
If A is a scalar multiple of B, then equality is attained.
In fact, we can use any vector norm, for example the aforementioned Euclidean norm, to measure the magnitude of a matrix by the largest amount the matrix stretches vectors. We define the induced matrix norm of a matrix $A \in \mathbb{C}^{m\times n}$ as
$$\|A\| := \max_{x \ne 0} \frac{\|A x\|}{\|x\|} = \max_{\|x\| = 1} \|A x\|.$$
There is no simple formula for this norm and we shall describe how to
compute it after developing the requisite machinery.
These matrix norms (and many others) satisfy the same properties the Euclidean norm satisfies: (i) $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = 0$, (ii) $\|\alpha A\| = |\alpha| \, \|A\|$ for $\alpha \in \mathbb{C}$, and (iii) $\|A + B\| \le \|A\| + \|B\|$.
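The following sketch (an added illustration with a random test matrix) estimates the induced 2-norm by sampling the stretch ratio $\|Ax\|/\|x\|$ and compares it against the value NumPy reports; as noted above, a formula for this norm will only emerge after more machinery is developed.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

# Sample the ratio ||Ax|| / ||x|| over many random directions x.
X = rng.standard_normal((3, 10000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)

exact = np.linalg.norm(A, 2)            # the induced 2-norm computed by NumPy
assert ratios.max() <= exact + 1e-12    # no sampled direction exceeds the norm
print(ratios.max(), exact)              # the sampled maximum approaches the exact value
```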
1.2 The four fundamental subspaces
1.3 Projectors
In general, the induced norm of a projector satisfies $\|P\| \ge 1$ because $\|P v\| = \|v\|$ for every $v \in \mathcal{R}(P)$. If $P$ is an orthogonal projector, then $\|P\| = 1$ because $P$ decomposes $v$ into $P v + (I - P) v$, where $P v$ and $(I - P) v$ are orthogonal. The converse is also true.
The orthogonal projector onto $\operatorname{span}\{v\}$ for a nonzero vector $v$ is
$$P = \frac{v v^*}{v^* v}.$$
Although we can construct orthogonal projectors using nonorthonormal bases, this is rarely done in practice, so we skip this topic.
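A short NumPy illustration (added here, not in the original) of the projector $P = v v^*/(v^* v)$: it is idempotent, Hermitian, fixes its range, and has induced norm 1.

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Orthogonal projector onto span{v}.
P = np.outer(v, v.conj()) / np.vdot(v, v)

assert np.allclose(P @ P, P)          # idempotent: P^2 = P
assert np.allclose(P.conj().T, P)     # Hermitian, so the projector is orthogonal
assert np.allclose(P @ v, v)          # vectors in the range are left unchanged
print(np.linalg.norm(P, 2))           # induced 2-norm equals 1
```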
2 Spectral Theory
Matrices are useful in higher mathematics because of the insights available from spectral theory. This chapter develops the theory of eigenvalues, eigenvectors and allied concepts via two approaches, the Schur
form and the Jordan form. We do so because each approach yields different insights for a variety of problems, and we believe reconciling the
two approaches enriches one's understanding of spectral theory.
2.1
Theorem 2.1. If $A \in \mathbb{C}^{n\times n}$ satisfies $\|A\| < 1$, then $I - A$ is nonsingular,
$$(I - A)^{-1} = \sum_{i=0}^{\infty} A^i,$$
and
$$\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.$$
Proof. Consider the partial sums
$$S_k = \sum_{i=0}^{k} A^i.$$
The sequence of partial sums is Cauchy because
$$\|S_{k+m} - S_k\| = \Big\| \sum_{i=k+1}^{k+m} A^i \Big\| \le \sum_{i=k+1}^{k+m} \|A\|^i \le \|A\|^{k+1} \sum_{i=0}^{\infty} \|A\|^i = \frac{\|A\|^{k+1}}{1 - \|A\|},$$
which becomes arbitrarily small as $k \to \infty$. The space $\mathbb{C}^{n\times n}$ endowed with the matrix norm is complete. Thus, the sequence of partial sums has a limit, which we denote by $S_\infty$. We now verify that this limit is the desired inverse using a direct calculation:
$$(I - A) S_\infty = \lim_{k\to\infty} (I - A) S_k = \lim_{k\to\infty} \Big( \sum_{i=0}^{k} A^i - \sum_{i=1}^{k+1} A^i \Big) = \lim_{k\to\infty} \big( I - A^{k+1} \big) = I.$$
The bound on $\|(I - A)^{-1}\|$ holds because we can bound $\|S_k\|$ using the geometric series $\sum_{i=0}^{\infty} \|A\|^i = 1/(1 - \|A\|)$.
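A quick numerical illustration of Theorem 2.1 (added for these notes; the matrix is random and rescaled so that $\|A\| < 1$): the partial sums of the Neumann series converge to $(I - A)^{-1}$, and the norm bound holds.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
A *= 0.9 / np.linalg.norm(A, 2)        # rescale so that ||A|| = 0.9 < 1

# Partial sums S_k = I + A + ... + A^k.
S, term = np.eye(5), np.eye(5)
for _ in range(300):
    term = term @ A
    S += term

inv = np.linalg.inv(np.eye(5) - A)
assert np.allclose(S, inv)             # the series converges to (I - A)^{-1}
# The norm bound ||(I - A)^{-1}|| <= 1 / (1 - ||A||).
assert np.linalg.norm(inv, 2) <= 1 / (1 - np.linalg.norm(A, 2)) + 1e-10
```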
Theorem 2.1 yields a bound on $\|R(z)\|$ when $|z| > \|A\|$:
$$\|R(z)\| = \|(z I - A)^{-1}\| \le \frac{1}{|z| - \|A\|}.$$
Thus there can be no eigenvalues with magnitude larger than $\|A\|$ and
$$\Lambda(A) \subseteq \{ z \in \mathbb{C} : |z| \le \|A\| \}.$$
Of course, we can arrive at the same conclusion using the eigenvalue equation and the fact that matrix norms are submultiplicative:
$$|\lambda| \, \|u\| = \|A u\| \le \|A\| \, \|u\|.$$
We can ask a more fundamental question: do eigenvalues exist for
every matrix? Can the spectrum of a matrix be empty?
Theorem 2.2. Every matrix $A \in \mathbb{C}^{n\times n}$ has at least one eigenvalue; i.e. $R(z)$ has at least one pole.
Proof. The entries of the resolvent are analytic functions in an open set not containing an eigenvalue because rational functions are analytic over a set not containing their poles. If $A$ has no eigenvalues, then the entries of $R(z)$ are analytic over the entire complex plane and $\|R(z)\|$ must be bounded in the disk $\{ z \in \mathbb{C} : |z| \le \|A\| \}$. $\|R(z)\|$ is also bounded outside the disk (a consequence of Theorem 2.1), so by Liouville's Theorem, $R(z)$ must be constant. Theorem 2.1 also says $\|R(z)\| \to 0$ as $|z| \to \infty$, and so we must have $R(z) = 0$. This is a contradiction because if $A$ has no eigenvalues, $R(z)$ should be nonsingular. Hence, $A$ must have at least one eigenvalue.
Can we say more about the location of eigenvalues in the complex
plane? The basic result in this area is a theorem named after the Belarusian mathematician Semyon A. Gershgorin.
Definition 2.3. The $i$th Gershgorin disc is a ball of radius $r_i = \sum_{j \ne i} |a_{ij}|$ centered at $a_{ii}$ in the complex plane; i.e.
$$D_i = \Big\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \Big\}.$$
In its basic form, Gershgorin's disc theorem states that every eigenvalue of $A$ lies in at least one Gershgorin disc; i.e. $\Lambda(A) \subseteq \bigcup_{i=1}^{n} D_i$. There are stronger versions of the theorem. Informally, a general statement is that if there are $k$ discs whose union is a connected set that is disjoint from the remaining discs, then exactly $k$ eigenvalues lie in the domain formed by the union of those $k$ discs.
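The sketch below (an added illustration using a random test matrix) computes the Gershgorin discs and confirms that every eigenvalue lies in at least one of them.

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.diag([5.0, -3.0, 1.0, 8.0]) + 0.5 * rng.standard_normal((4, 4))

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)    # r_i = sum_{j != i} |a_ij|

for lam in np.linalg.eigvals(A):
    # each eigenvalue lies in at least one disc |z - a_ii| <= r_i
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
```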
2.2 The Schur form
The Schur form or Schur factorization of a square matrix is a representation as the similarity transform of a triangular matrix. Because the
Schur form exists for every square matrix, it is a very general theoretical tool. The Schur form enables matrix analysts to study the behavior
of general square matrices via study of the behavior of triangular matrices.
Theorem 2.4 (The Schur form). If $A \in \mathbb{C}^{n\times n}$, then there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$ and an upper triangular matrix $T \in \mathbb{C}^{n\times n}$ such that
$$A = U T U^*.$$
Proof. We shall prove the existence of the Schur form via mathematical induction. If $A \in \mathbb{C}^{1\times 1}$ ($A$ is a scalar), then we can express $A$ as $U T U^*$, where $U = 1$ and $T = A$.
Suppose the result holds for $(n-1) \times (n-1)$ matrices. If $A \in \mathbb{C}^{n\times n}$, then Theorem 2.2 guarantees the existence of at least one eigenvalue $\lambda$ and a unit length eigenvector $u$ such that
$$A u = \lambda u.$$
Let $V$ denote an orthonormal basis for $\operatorname{span}\{u\}^{\perp}$, so $\begin{bmatrix} u & V \end{bmatrix}$ is a unitary matrix. Then
$$\begin{bmatrix} u & V \end{bmatrix}^* A \begin{bmatrix} u & V \end{bmatrix} = \begin{bmatrix} u^* A u & u^* A V \\ V^* A u & V^* A V \end{bmatrix}.$$
$u$ is an eigenvector of $A$, so $u^* A u = \lambda \|u\|^2 = \lambda$ and $V^* u = 0$ because the columns of $V$ are an orthonormal basis for $\operatorname{span}\{u\}^{\perp}$. Hence,
$$\begin{bmatrix} u & V \end{bmatrix}^* A \begin{bmatrix} u & V \end{bmatrix} = \begin{bmatrix} \lambda & u^* A V \\ 0 & V^* A V \end{bmatrix}.$$
$\begin{bmatrix} u & V \end{bmatrix}$ is unitary, so we can rearrange to obtain
$$A = \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \\ 0 & V^* A V \end{bmatrix} \begin{bmatrix} u & V \end{bmatrix}^*. \qquad (2.1)$$
By the induction hypothesis, there exist a unitary matrix $\hat U \in \mathbb{C}^{(n-1)\times(n-1)}$ and an upper triangular matrix $T$ such that $V^* A V = \hat U T \hat U^*$. Substituting into (2.1),
$$A = \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \\ 0 & \hat U T \hat U^* \end{bmatrix} \begin{bmatrix} u & V \end{bmatrix}^* = \begin{bmatrix} u & V \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat U \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \hat U \\ 0 & T \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat U^* \end{bmatrix} \begin{bmatrix} u & V \end{bmatrix}^* = \begin{bmatrix} u & V \hat U \end{bmatrix} \begin{bmatrix} \lambda & u^* A V \hat U \\ 0 & T \end{bmatrix} \begin{bmatrix} u & V \hat U \end{bmatrix}^*. \qquad (2.2)$$
The matrix in the center is upper triangular because $T$ is upper triangular. $\begin{bmatrix} u & V \hat U \end{bmatrix}$ is unitary because
$$\begin{bmatrix} u & V \hat U \end{bmatrix}^* \begin{bmatrix} u & V \hat U \end{bmatrix} = \begin{bmatrix} u^* u & u^* V \hat U \\ \hat U^* V^* u & \hat U^* V^* V \hat U \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & I \end{bmatrix}.$$
This concludes the induction step.
Pre- and post-multiplying by a matrix and its inverse is called a similarity transform. Similarity transforms preserve the eigenvalues of a matrix. If $\lambda$ and $u$ are an eigenpair of $A$ and $V$ is a nonsingular matrix, then $\lambda$ and $V u$ are an eigenpair of $V A V^{-1}$. We can verify by a direct calculation:
$$V A V^{-1} (V u) = V A u = \lambda V u.$$
The Schur form of a matrix $A$ represents it as the similarity transform of a triangular matrix $T$. Thus, the eigenvalues of $A$ are the same as the eigenvalues of $T$. $T$ is an upper triangular matrix, hence its eigenvalues are its diagonal entries. This is true because $z I - T$ is singular if and only if $z$ equals one of the diagonal entries of $T$. The Schur form thus reveals the eigenvalues of a matrix.
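In practice the Schur form is computed numerically; the sketch below (not part of the original notes) uses SciPy's `schur` routine on a random matrix and checks the properties discussed above.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

# Complex Schur form A = U T U^* with U unitary and T upper triangular.
T, U = schur(A, output='complex')

assert np.allclose(U @ T @ U.conj().T, A)         # A = U T U^*
assert np.allclose(U.conj().T @ U, np.eye(5))     # U is unitary
assert np.allclose(np.tril(T, -1), 0)             # T is upper triangular
# The diagonal of T reveals the eigenvalues of A.
assert np.allclose(np.sort_complex(np.diag(T)),
                   np.sort_complex(np.linalg.eigvals(A)))
```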
2.2.1 The real Schur form

When $A$ is real, we can work entirely in real arithmetic by using the real Schur form, which represents $A$ as an orthogonal similarity transform of a quasi upper triangular matrix
$$\tilde T = \begin{bmatrix} T_{1,1} & \cdots & * \\ & \ddots & \vdots \\ & & T_{m,m} \end{bmatrix},$$
where the diagonal blocks, $T_{i,i}$, are either of size $1\times 1$ or $2\times 2$. In this case the eigenvalues of the diagonal blocks of $\tilde T$ reveal the eigenvalues of $A$: the $1\times 1$ blocks on the diagonal are themselves eigenvalues of $A$, and the eigenvalues of the $2\times 2$ blocks on the diagonal are also eigenvalues of $A$.
Since the eigenvalues of a real matrix $A$ may be characterized as the roots of a polynomial with real coefficients, we know that the complex eigenvalues must appear in conjugate pairs. Thus, we may observe that each pair of complex conjugate eigenvalues of $A$ appears as the eigenvalues of one of the $2\times 2$ blocks on the diagonal of $\tilde T$.
2.2.2

Writing the Schur form of a Hermitian matrix $A$ as $A = U T U^*$ with $U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}$ and $T = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, we have
$$A = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix} = \lambda_1 u_1 u_1^* + \cdots + \lambda_n u_n u_n^* = \lambda_1 P_1 + \cdots + \lambda_n P_n, \qquad (2.3)$$
where $P_i = u_i u_i^*$, $i = 1, \ldots, n$. The matrices $P_i$ are orthogonal projectors onto the spans of the eigenvectors $u_i$ because $P_i^* = P_i$ and
$$P_i^2 = u_i (u_i^* u_i) u_i^* = u_i u_i^* = P_i.$$
Further, this set of orthogonal projectors is a resolution of the identity because we can sum the matrices $P_i$ to yield
$$\sum_{i=1}^{n} P_i = \sum_{i=1}^{n} u_i u_i^* = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix} = U U^* = I.$$
Thus, we can express any Hermitian matrix as a sum of orthogonal
projectors onto its eigenspaces weighted by the associated eigenvalues.
This is called the spectral decomposition of a Hermitian matrix.
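As a small illustration (added here; the Hermitian matrix is randomly generated), the spectral decomposition can be reassembled from the output of `numpy.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                  # a random Hermitian matrix

lam, U = np.linalg.eigh(A)                # real eigenvalues, orthonormal eigenvectors

# Orthogonal projectors P_i = u_i u_i^* and the decomposition A = sum_i lambda_i P_i.
P = [np.outer(U[:, i], U[:, i].conj()) for i in range(4)]
assert np.allclose(sum(l * Pi for l, Pi in zip(lam, P)), A)
assert np.allclose(sum(P), np.eye(4))     # the projectors resolve the identity
```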
The fact that the Schur form of a Hermitian matrix is also the
unitary diagonalization is actually true for the larger class of normal
matrices; i.e. matrices $A \in \mathbb{C}^{n\times n}$ that satisfy
$$A^* A = A A^*.$$
We leave the proof as an exercise to the curious reader.
2.3
A matrix with $n$ distinct eigenvalues must also have $n$ linearly independent eigenvectors because eigenvectors associated with distinct eigenvalues are linearly independent.
Lemma 2.1. Eigenvectors associated with distinct eigenvalues are linearly independent.
Proof. Let $u_1, \ldots, u_p$ be a set of eigenvectors associated with distinct eigenvalues $\lambda_1, \ldots, \lambda_p$. Suppose that there exist $\alpha_1, \ldots, \alpha_p$, not all zero, such that
$$\alpha_1 u_1 + \cdots + \alpha_p u_p = 0.$$
We premultiply both sides by $\prod_{j=2}^{p} (\lambda_j I - A)$ to obtain
$$\alpha_1 \prod_{j=2}^{p} (\lambda_j I - A) u_1 + \cdots + \alpha_p \prod_{j=2}^{p} (\lambda_j I - A) u_p = 0.$$
Every term but the first vanishes because $(\lambda_j I - A) u_j = 0$, so
$$\alpha_1 \prod_{j=2}^{p} (\lambda_j I - A) u_1 = 0.$$
On the other hand,
$$\alpha_1 \prod_{j=2}^{p} (\lambda_j I - A) u_1 = \alpha_1 \prod_{j=2}^{p} (\lambda_j u_1 - A u_1) = \alpha_1 \prod_{j=2}^{p} (\lambda_j - \lambda_1) u_1,$$
and the product $\prod_{j=2}^{p} (\lambda_j - \lambda_1)$ is nonzero because the eigenvalues are distinct. Hence $\alpha_1 = 0$, and repeating the argument for each $u_i$ shows that every $\alpha_i = 0$, a contradiction.

If $A \in \mathbb{C}^{n\times n}$ has $n$ linearly independent eigenvectors $u_1, \ldots, u_n$ with associated eigenvalues $\lambda_1, \ldots, \lambda_n$, then collecting the eigenvalue equations $A u_i = \lambda_i u_i$ column by column gives $A U = U D$, and hence
$$A = U D U^{-1}, \qquad (2.4)$$
where $U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}$ and $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
This equation is very similar to (2.3) and we call this the diagonalization
of a diagonalizable matrix.
We can expand (2.4) to obtain the spectral decomposition of a diagonalizable matrix $A$. Let $u_i$ and $\tilde u_i^*$ denote the columns of $U$ and the rows of $U^{-1}$ respectively. We have
$$A = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} \tilde u_1^* \\ \vdots \\ \tilde u_n^* \end{bmatrix} = \lambda_1 u_1 \tilde u_1^* + \cdots + \lambda_n u_n \tilde u_n^* = \lambda_1 P_1 + \cdots + \lambda_n P_n,$$
where $P_i = u_i \tilde u_i^*$, $i = 1, \ldots, n$. The vectors $\tilde u_1, \ldots, \tilde u_n$ are sometimes called the left eigenvectors of $A$ because we can rearrange (2.4) to obtain
$$U^{-1} A = D U^{-1}.$$
After expanding this equation row-wise, we have
$$\begin{bmatrix} \tilde u_1^* \\ \vdots \\ \tilde u_n^* \end{bmatrix} A = \begin{bmatrix} \lambda_1 \tilde u_1^* \\ \vdots \\ \lambda_n \tilde u_n^* \end{bmatrix}.$$
Thus, we see that $A$ preserves the direction of $\tilde u_1, \ldots, \tilde u_n$ when they multiply $A$ from the left. Similarly, $u_1, \ldots, u_n$ are also sometimes called the right eigenvectors of $A$. Although the two sets of eigenvectors are not orthogonal in general, they are linearly independent and form bases for $\mathbb{C}^n$. The left and right eigenvectors are biorthogonal; i.e.
$$\tilde u_i^* u_j = \begin{cases} 0, & i \ne j, \\ 1, & i = j. \end{cases}$$
The matrices $P_i$ are projectors onto the spans of the eigenvectors $u_i$ because
$$P_i^2 = u_i (\tilde u_i^* u_i) \tilde u_i^* = u_i \tilde u_i^* = P_i.$$
This set of projectors is also a resolution of the identity because we can sum the $P_i$'s to yield
$$\sum_{i=1}^{n} P_i = \sum_{i=1}^{n} u_i \tilde u_i^* = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \tilde u_1^* \\ \vdots \\ \tilde u_n^* \end{bmatrix} = U U^{-1} = I.$$
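The same decomposition can be checked numerically for a (generically diagonalizable) nonsymmetric matrix; the sketch below is an added illustration in which the rows of $U^{-1}$ play the role of the left eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))             # almost surely diagonalizable

lam, U = np.linalg.eig(A)                   # right eigenvectors are the columns of U
Uinv = np.linalg.inv(U)                     # rows of U^{-1} give the left eigenvectors

assert np.allclose(Uinv @ U, np.eye(4))     # biorthogonality
# Oblique projectors P_i and the spectral decomposition A = sum_i lambda_i P_i.
P = [np.outer(U[:, i], Uinv[i, :]) for i in range(4)]
assert np.allclose(sum(l * Pi for l, Pi in zip(lam, P)), A)
assert np.allclose(sum(P), np.eye(4))       # resolution of the identity
```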
Generalized eigenvectors
Although matrices that are not diagonalizable are rare, they do sometimes arise in practice and we need a more general theory to handle
these nondiagonalizable or so-called defective matrices. In this section,
we develop the theory concerning generalized eigenvectors.
Definition 2.5. The algebraic multiplicity $a_i$ of the eigenvalue $\lambda_i$ equals the number of times $\lambda_i$ appears on the diagonal of the $T$ factor in the Schur form $A = U T U^*$.
The geometric multiplicity $g_i$ of the eigenvalue $\lambda_i$ equals the dimension of the eigenspace associated with $\lambda_i$; i.e. $g_i = \dim(\mathcal{N}(A - \lambda_i I))$.
The geometric multiplicity of an eigenvalue is always less than or
equal to the algebraic multiplicity. If the sum of the geometric multiplicities is less than the sum of algebraic multiplicities, then there
are not enough eigenvectors to form a basis for $\mathbb{C}^n$ and the matrix is
defective. For such matrices, we substitute generalized eigenvectors in
lieu of the missing eigenvectors.
Definition 2.6. A nonzero vector $u \in \mathbb{C}^n$ is called a generalized eigenvector of $A$ associated with the eigenvalue $\lambda$ if $u \in \mathcal{N}\big((A - \lambda I)^a\big)$, where $a$ is the algebraic multiplicity of $\lambda$.
$\mathcal{N}\big((A - \lambda I)^a\big)$ is called the generalized eigenspace associated with the eigenvalue $\lambda$.
2.3.3 Jordan form

Previously we discussed diagonalizable matrices; however, not all matrices are diagonalizable. One such example is the $n \times n$ matrix
$$J_n(\lambda) = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}.$$
Examining
$$J_n(\lambda) - \lambda I = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix},$$
we observe that $\mathcal{N}(J_n(\lambda) - \lambda I) = \operatorname{span}\{e_1\}$. Thus, $\lambda$ has geometric multiplicity one, which implies that $J_n(\lambda)$ is not diagonalizable.
Motivated by such examples we provide a brief introduction to the
Jordan form. While presented here absent its construction, the Jordan
form is nevertheless of theoretical importance. Specifically, the Jordan
form exists for all matrices. That being said, there is no stable way to
compute the Jordan form and thus it appears rarely in practice. A more
comprehensive treatment of the material may be found in Chapter 3
of [5].
The aforementioned defective matrix is in fact an example of what is known as a Jordan block. More generally, an $n_i \times n_i$ matrix of the form
$$J_{n_i}(\lambda_i) = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}$$
is known as a Jordan block associated with the eigenvalue $\lambda_i$. These matrices serve as the building blocks of the Jordan form, which expresses any $A \in \mathbb{C}^{n\times n}$ as
$$A = U \begin{bmatrix} J_{n_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{n_k}(\lambda_k) \end{bmatrix} U^{-1},$$
where $n_1 + \cdots + n_k = n$, each $\lambda_i$ is an eigenvalue of $A$, and the elements of the set $\{\lambda_1, \ldots, \lambda_k\}$ may not be distinct.
2.3.4 The characteristic polynomial
The Jordan form enables us to define and analyze functions of matrices, such as the matrix exponential exp(A) and matrix polynomials.
We conclude this chapter with a section about the characteristic polynomial of a matrix.
Definition 2.7. The characteristic polynomial of a matrix $A \in \mathbb{C}^{n\times n}$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_p$ and algebraic multiplicities $a_1, \ldots, a_p$ is the polynomial
$$p_A(z) = \prod_{i=1}^{p} (z - \lambda_i)^{a_i}.$$
Closely related is the minimal polynomial
$$m_A(z) = \prod_{i=1}^{p} (z - \lambda_i)^{d_i},$$
where $d_i$ is the index of $\lambda_i$, the size of the largest Jordan block associated with $\lambda_i$.
The index (the length of the longest Jordan chain) cannot be greater than the algebraic multiplicity of an eigenvalue; i.e. $d_i \le a_i$, so the minimal polynomial has degree no greater than $n$ and it divides the characteristic polynomial.
Using the Jordan form, it is easy to see that the minimal polynomial, and thus also the characteristic polynomial, annihilates the matrix. Since a matrix polynomial is a sum of matrix powers,
$$m_A(A) = \sum_{i=0}^{d} \alpha_i A^i = \sum_{i=0}^{d} \alpha_i \big( U J U^{-1} \big)^i = U \Big( \sum_{i=0}^{d} \alpha_i J^i \Big) U^{-1} = U\, m_A(J)\, U^{-1},$$
where $J$ denotes the block diagonal Jordan form of $A$. Because $J$ is block diagonal,
$$m_A(J) = \begin{bmatrix} m_A(J_1) & & \\ & \ddots & \\ & & m_A(J_p) \end{bmatrix},$$
and $m_A(J_i) = 0$ because $m_A(J_i)$ contains a factor $(J_i - \lambda_i I)^{d_i} = N_i^{d_i} = 0$, where $N_i$ is the nilpotent part of the Jordan block. We conclude that $m_A(J) = 0$; hence $m_A(A) = 0$.
This fact was first postulated by Arthur Cayley, who also proved it for the $2\times 2$ and $3\times 3$ cases, and is called the Cayley-Hamilton theorem. Sir William Hamilton proved another special case; the first proof for general square matrices is due to Frobenius.
Theorem 2.6 (Cayley-Hamilton). The characteristic polynomial $p_A(z)$ of a square matrix $A$ annihilates $A$; i.e.
$$p_A(A) = 0.$$
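A direct numerical check of the Cayley-Hamilton theorem (an added illustration; `numpy.poly` returns the coefficients of the characteristic polynomial of a square matrix):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)        # coefficients of p_A(z) = det(zI - A), highest degree first

# Evaluate p_A(A) as a matrix polynomial; Cayley-Hamilton says this is the zero matrix.
pA = sum(c * np.linalg.matrix_power(A, k)
         for k, c in zip(range(len(coeffs) - 1, -1, -1), coeffs))
assert np.allclose(pA, np.zeros((4, 4)), atol=1e-9)
```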
3 The Singular Value Decomposition
The singular value decomposition (SVD) expresses a matrix as the product of a unitary matrix, a diagonal matrix, and a second unitary matrix.
It is one of the most useful tools in the matrix analyst's toolbox because of its amazing range of applications. In this chapter, we shall
construct the SVD and then explore some of its many applications in
computational mathematics.
3.1
The matrix $A^* A$, often called the Gram matrix or Gramian, is a Hermitian matrix that arises in diverse contexts. In section 2.2, we proved that Hermitian matrices are unitarily diagonalizable; i.e. (i) their eigenvalues are real and (ii) their eigenvectors form orthonormal bases for $\mathbb{C}^n$.
Definition 3.1. The Rayleigh quotient of a matrix $A \in \mathbb{C}^{n\times n}$ evaluated at the nonzero vector $x \in \mathbb{C}^n$ is the scalar quantity
$$\frac{x^* A x}{x^* x} \in \mathbb{C}.$$
Suppose $A$ is Hermitian with unitary diagonalization $A = U D U^*$. Any nonzero $x \in \mathbb{C}^n$ can be expanded as $x = U a = \alpha_1 u_1 + \cdots + \alpha_n u_n$, where $\alpha_i = u_i^* x$ because $u_1, \ldots, u_n$ is an orthonormal basis for $\mathbb{C}^n$. Thus, the Rayleigh quotient of $A$ evaluated at $x$ can be expressed as
$$\frac{x^* A x}{x^* x} = \frac{a^* U^* A U a}{a^* U^* U a} = \frac{a^* D a}{a^* a},$$
where the final equality holds because A is Hermitian; hence unitarily
diagonalizable.
We can further simplify this expression using the fact that D is
diagonal to obtain
$$\frac{x^* A x}{x^* x} = \frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2}.$$
Both numerator and denominator are real; hence the Rayleigh quotient is real. If we order the eigenvalues of $A$ such that
$$\lambda_1 \ge \cdots \ge \lambda_n,$$
we can bound the Rayleigh quotient above and below by $\lambda_1$ and $\lambda_n$:
$$\frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} \le \frac{\lambda_1 \big( |\alpha_1|^2 + \cdots + |\alpha_n|^2 \big)}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} = \lambda_1,$$
$$\frac{\lambda_1 |\alpha_1|^2 + \cdots + \lambda_n |\alpha_n|^2}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} \ge \frac{\lambda_n \big( |\alpha_1|^2 + \cdots + |\alpha_n|^2 \big)}{|\alpha_1|^2 + \cdots + |\alpha_n|^2} = \lambda_n.$$
Note the bounds are attained by the eigenvectors associated with the largest and smallest eigenvalues respectively. Thus, these eigenvalues are the optimal values of the optimization problems:
$$\lambda_1 = \max_{x \ne 0} \frac{x^* A x}{x^* x} \qquad \text{and} \qquad \lambda_n = \min_{x \ne 0} \frac{x^* A x}{x^* x}.$$
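A quick sanity check (added here, using a random symmetric matrix): sampled Rayleigh quotients always land between the smallest and largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(10)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # random symmetric (hence Hermitian) matrix
lam = np.linalg.eigvalsh(A)              # eigenvalues in ascending order

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

samples = [rayleigh(rng.standard_normal(5)) for _ in range(1000)]
assert lam[0] - 1e-12 <= min(samples) and max(samples) <= lam[-1] + 1e-12
```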
3.1.1
The interior eigenvalues admit a similar characterization:
$$\lambda_k = \min_{x \perp \mathcal{V}_{n-k}} \frac{x^* A x}{x^* x} = \max_{x \perp \mathcal{U}_{k-1}} \frac{x^* A x}{x^* x},$$
where $\mathcal{U}_{k-1}$ and $\mathcal{V}_{n-k}$ denote the span of the eigenvectors associated with the $k-1$ largest eigenvalues and the $n-k$ smallest eigenvalues respectively. Although this describes the interior eigenvalues, it is unsatisfactory because it requires knowledge of the eigenspaces associated with the more extreme eigenvalues.
The Courant-Fischer minimax principle circumvents this problem by describing the eigenvalues of $A$ using minimax problems.
Theorem 3.1. If $A$ is a Hermitian matrix and its eigenvalues are ordered such that $\lambda_1 \ge \cdots \ge \lambda_n$, then its eigenvalues satisfy
$$\lambda_k = \max_{\dim(S_k) = k} \; \min_{x \in S_k,\, x \ne 0} \frac{x^* A x}{x^* x},$$
where the maximum is taken over all $k$-dimensional subspaces $S_k$ of $\mathbb{C}^n$.
Proof. If $S_k = \operatorname{span}\{u_1, \ldots, u_k\}$, then for every nonzero $v \in S_k$,
$$\frac{v^* A v}{v^* v} \ge \lambda_k,$$
so the maximin value is at least $\lambda_k$. Conversely, any $k$-dimensional subspace $S_k$ intersects $\operatorname{span}\{u_k, \ldots, u_n\}$ nontrivially, and for a nonzero $x$ in this intersection
$$\frac{x^* A x}{x^* x} \le \lambda_k.$$
Hence every $k$-dimensional subspace yields a minimum of at most $\lambda_k$, and
$$\lambda_k = \max_{\dim(S_k)=k} \; \min_{x \in S_k,\, x \ne 0} \frac{x^* A x}{x^* x}.$$
The Courant-Fischer minimax principle can also be stated in a minimax version. This version is easier to state if the eigenvalues are arranged such that $\lambda_1 \le \cdots \le \lambda_n$.
Lemma 3.1 (Courant-Fischer minimax principle). If $A$ is a Hermitian matrix and its eigenvalues are ordered such that $\lambda_1 \le \cdots \le \lambda_n$, then its eigenvalues satisfy
$$\lambda_k = \min_{\dim(S_k) = k} \; \max_{x \in S_k,\, x \ne 0} \frac{x^* A x}{x^* x}.$$

3.1.2
Positive definite matrices are desirable computationally because they permit the use of special algorithms designed to
handle these matrices. For example, the Cholesky factorization halves
the effort required to solve a symmetric positive definite linear system
when compared to Gaussian elimination.
Definition 3.2. Suppose $A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix. Then $A$ is positive definite if its Rayleigh quotient is always positive; i.e.
$$\frac{x^* A x}{x^* x} > 0 \quad \text{for all nonzero } x \in \mathbb{C}^n.$$
An equivalent definition is: $A$ is positive definite if its eigenvalues are positive.
The two definitions are equivalent because if the Rayleigh quotient is always positive, then
$$\lambda_i = \frac{u_i^* A u_i}{u_i^* u_i} > 0.$$
On the other hand, the Rayleigh quotient is bounded above and below
by the largest and smallest eigenvalues of A, so if the eigenvalues are
positive, so is the Rayleigh quotient. We can also define the larger class
of positive semidefinite matrices by allowing the Rayleigh quotient (and
eigenvalues) to be zero.
A third characterization of positive definite matrices involves a
quantity called the Schur complement, named after the German mathematician Issai Schur (also of Schur form fame).
Definition 3.3. Suppose $A$ is a Hermitian matrix partitioned as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
If $A_{11}$ is nonsingular, the Schur complement of $A_{11}$ in $A$ is the quantity
$$S_{A_{11}} = A_{22} - A_{12}^* A_{11}^{-1} A_{12}.$$
To interpret the Schur complement, partition $x$ conformally as $x = \begin{bmatrix} u \\ v \end{bmatrix}$ and, assuming $A_{11}$ is positive definite, complete the square in the quadratic form:
$$x^* A x = \big\| A_{11}^{1/2} u + A_{11}^{-1/2} A_{12} v \big\|^2 + v^* A_{22} v - v^* A_{12}^* A_{11}^{-1} A_{12} v = \big\| A_{11}^{1/2} u + A_{11}^{-1/2} A_{12} v \big\|^2 + v^* S_{A_{11}} v.$$
We can set the first term to zero by choosing $u = -A_{11}^{-1} A_{12} v$, so the minimum over $u$ is attained at $v^* S_{A_{11}} v$. This interpretation of the Schur complement leads to a characterization of positive definite matrices that is often used in optimization.
Lemma 3.2. Suppose $A$ is a Hermitian matrix partitioned as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
$A$ is positive definite if and only if $A_{11}$ and its Schur complement $S_{A_{11}}$ are both positive definite.
The proof of this lemma mainly uses ideas from convex optimization
so we refer to Appendix A in [1] for the details.
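The characterization in Lemma 3.2 is easy to test numerically; the sketch below (an added illustration with a random symmetric positive definite matrix) compares the eigenvalue test for positive definiteness with the test based on $A_{11}$ and its Schur complement.

```python
import numpy as np

rng = np.random.default_rng(11)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)              # a symmetric positive definite test matrix

k = 2
A11, A12, A22 = A[:k, :k], A[:k, k:], A[k:, k:]
S = A22 - A12.T @ np.linalg.solve(A11, A12)   # Schur complement of A11 in A

def is_pd(X):
    return bool(np.all(np.linalg.eigvalsh(X) > 0))

# A is positive definite exactly when A11 and S_{A11} are.
assert is_pd(A) == (is_pd(A11) and is_pd(S))
```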
The Gram matrix arises in many contexts, such as the stiffness
matrix in the finite element method. It is Hermitian and positive semidefinite because
$$\frac{x^* A^* A x}{x^* x} = \frac{\|A x\|^2}{\|x\|^2} \ge 0.$$
Hence, it has n nonnegative eigenvalues and the eigenvectors form an
orthonormal basis for $\mathbb{C}^n$.
We are ready to start our tour of the SVD with its construction. Unlike the Jordan form, the SVD can be computed robustly. Every matrix $A \in \mathbb{C}^{m\times n}$ has a singular value decomposition
$$A = U \Sigma V^*,$$
where $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ are unitary and $\Sigma \in \mathbb{R}^{m\times n}$ is diagonal with nonnegative entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$, the singular values of $A$.
Proof. Let $A \in \mathbb{C}^{m\times n}$ and assume $m \ge n$. $A^* A \in \mathbb{C}^{n\times n}$ is a Hermitian matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$ and associated unit-length eigenvectors $v_1, \ldots, v_n$. $A^* A$ is positive semidefinite, so the eigenvalues are nonnegative. Assume there are $r$ nonzero eigenvalues and that the eigenvalues are ordered such that
$$\lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0.$$
Set $\sigma_i = \sqrt{\lambda_i}$ and, for $i = 1, \ldots, r$, define $u_i = \frac{1}{\sigma_i} A v_i$. These vectors are orthonormal because, for $i \ne j$,
$$u_i^* u_j = \frac{1}{\sigma_i \sigma_j} v_i^* A^* A v_j = \frac{\lambda_j}{\sigma_i \sigma_j} v_i^* v_j = 0, \qquad u_i^* u_i = \frac{1}{\sigma_i^2} v_i^* A^* A v_i = \frac{\lambda_i}{\sigma_i^2} = 1.$$
Further, we have
$$A v_i = \sigma_i u_i, \quad i = 1, \ldots, r, \qquad A v_i = 0, \quad i = r+1, \ldots, n,$$
where the second set of equations holds because $\|A v_i\|^2 = v_i^* A^* A v_i = \lambda_i = 0$ for $i > r$.
Choosing $u_{r+1}, \ldots, u_n$ to be any orthonormal vectors orthogonal to $u_1, \ldots, u_r$, we may collect these relations as
$$A \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix},$$
which we shall write concisely as
$$A V = \hat U \hat\Sigma.$$
We can rearrange to obtain the reduced SVD or thin SVD of $A$:
$$A = \hat U \hat\Sigma V^*.$$
$\hat U$ is a subunitary matrix; i.e. its columns are orthonormal but they do not span $\mathbb{C}^m$. To make $\hat U$ unitary, we can choose $u_{n+1}, \ldots, u_m$ to be an orthonormal basis for $\operatorname{span}\{u_1, \ldots, u_n\}^{\perp}$ and append these $m - n$ vectors to $\hat U$ to form
$$U = \begin{bmatrix} \hat U & u_{n+1} & \cdots & u_m \end{bmatrix}.$$
We must also choose $\Sigma$ so that $U \Sigma = \hat U \hat\Sigma$. The most obvious choice is
$$\Sigma = \begin{bmatrix} \hat\Sigma \\ 0 \end{bmatrix}.$$
Expanding $A = U \Sigma V^*$ column by column gives
$$A = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix} \begin{bmatrix} v_1^* \\ \vdots \\ v_n^* \end{bmatrix} = \sigma_1 u_1 v_1^* + \cdots + \sigma_n u_n v_n^*.$$
This is the dyadic version of the SVD. Note that if $A$ has rank $r < n$, then we only need to sum the first $r$ rank-1 terms because in this case $\sigma_{r+1} = \cdots = \sigma_n = 0$.
We can interpret the SVD as an algebraic representation of a geometric fact: the image of the unit sphere under a matrix is a hyper-ellipse. This hyper-ellipse is the one whose major axes lie along $u_1, \ldots, u_n$ with (not necessarily nonzero) lengths $\sigma_1, \ldots, \sigma_n$ (assuming $m \ge n$). If $\operatorname{rank}(A) = r$, then $r$ of these axes will have nonzero length. The pre-images of these axes are $v_1, \ldots, v_n$ because $A V = U \Sigma$.
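A brief NumPy illustration (added to these notes) of the dyadic form and the relation $A v_i = \sigma_i u_i$ for a random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A)     # full SVD: U is 5x5, Vh is 3x3, s holds the singular values

# Dyadic form: A = sum_i sigma_i u_i v_i^*.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(3))
assert np.allclose(A_rebuilt, A)

# Each right singular vector is mapped to a scaled left singular vector.
for i in range(3):
    assert np.allclose(A @ Vh[i, :], s[i] * U[:, i])
```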
3.2
3.2.1
$$\sum_{i=1}^{r} \sigma_i^2 x_i^2.$$
Proof. We leave the proof as an exercise to the reader. Hint: the Frobenius norm is also invariant under unitary transformations.
3.2.2 Singular vectors

The vectors $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$ are called the left and right singular vectors of $A$ respectively. Subsets of these vectors form orthonormal bases for the four fundamental subspaces of $A$.
Lemma 3.5. Let $A \in \mathbb{C}^{m\times n}$ be a matrix of rank $r$. Then the four fundamental subspaces are spanned by
$$\mathcal{R}(A) = \operatorname{span}\{u_1, \ldots, u_r\}, \qquad \mathcal{N}(A^*) = \operatorname{span}\{u_{r+1}, \ldots, u_m\},$$
$$\mathcal{R}(A^*) = \operatorname{span}\{v_1, \ldots, v_r\}, \qquad \mathcal{N}(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}.$$
Proof. The first $r$ left singular vectors, associated with the nonzero singular values, form a basis for $\mathcal{R}(A)$ because
$$A x = U \Sigma V^* x = \sum_{i=1}^{n} \sigma_i (v_i^* x) u_i = \sum_{i=1}^{r} \sigma_i (v_i^* x) u_i.$$
Similarly, any linear combination of $v_{r+1}, \ldots, v_n$ lies in $\mathcal{N}(A)$ because
$$A \sum_{j=1}^{n-r} \beta_j v_{r+j} = U \Sigma V^* \sum_{j=1}^{n-r} \beta_j v_{r+j} = U \sum_{j=1}^{n-r} \beta_j \sigma_{r+j} e_{r+j} = 0.$$
The analogous statements for $\mathcal{R}(A^*)$ and $\mathcal{N}(A^*)$ follow by applying the same argument to $A^*$.
The singular values also solve certain optimization problems. If $x$ and $y$ are unit vectors, then expanding $A$ in its dyadic form gives
$$|x^* A y| = \Big| \sum_{i=1}^{n} \sigma_i (x^* u_i)(v_i^* y) \Big| \le \sigma_1 \sum_{i=1}^{n} |x^* u_i|\, |v_i^* y| \le \sigma_1,$$
where the last inequality holds because $x$ and $y$ are both unit vectors. This bound is attained by $x = u_1$ and $y = v_1$, so $\sigma_1$ is the optimal value of the optimization problem:
$$\sigma_1 = \max_{x, y \ne 0} \frac{|x^* A y|}{\|x\| \, \|y\|}.$$
A similar argument characterizes the remaining singular values: if $x \perp \operatorname{span}\{u_1, \ldots, u_{k-1}\}$ and $y \perp \operatorname{span}\{v_1, \ldots, v_{k-1}\}$ are unit vectors, then
$$|x^* A y| = \Big| \sum_{i=k}^{n} \sigma_i (x^* u_i)(v_i^* y) \Big| \le \sigma_k,$$
so
$$\sigma_k = \max_{\substack{x \perp \operatorname{span}\{u_1, \ldots, u_{k-1}\} \\ y \perp \operatorname{span}\{v_1, \ldots, v_{k-1}\}}} \frac{|x^* A y|}{\|x\| \, \|y\|}.$$

3.3 Matrix approximation

3.3.1 Low-rank matrix approximation
The SVD can be used to construct low-rank approximations to a matrix. Suppose $A \in \mathbb{C}^{m\times n}$ and assume $m \ge n$. Then, the dyadic version of the SVD expresses $A$ as a sum of rank-1 matrices:
$$A = \sigma_1 u_1 v_1^* + \cdots + \sigma_n u_n v_n^*.$$
A natural way to approximate $A$ is to take only a partial sum:
$$A \approx A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*. \qquad (3.1)$$
In fact, $A_k$ is the best rank-$k$ approximation to $A$ in the induced norm. To see this, let $B$ be any matrix of rank at most $k$. Since $\mathcal{N}(B)$ has dimension at least $n - k$, there is a unit vector $z \in \mathcal{N}(B) \cap \operatorname{span}\{v_1, \ldots, v_{k+1}\}$, which we may write as
$$z = \sum_{i=1}^{k+1} \gamma_i v_i.$$
We can now bound the quantity $\|(A - B) z\|$ from below as follows. First, observe that $(A - B) z = A z$ because $z \in \mathcal{N}(B)$, so
$$\|(A - B) z\| = \|A z\| = \Big\| U \Sigma V^* \sum_{i=1}^{k+1} \gamma_i v_i \Big\| = \Big\| \sum_{i=1}^{k+1} \sigma_i \gamma_i e_i \Big\| \ge \sigma_{k+1} \Big\| \sum_{i=1}^{k+1} \gamma_i e_i \Big\|.$$
Note that
$$\Big\| \sum_{i=1}^{k+1} \gamma_i e_i \Big\| = \Big\| \sum_{i=1}^{k+1} \gamma_i V^* v_i \Big\| = \|V^* z\| = 1,$$
so we can conclude
$$\|(A - B) z\| \ge \sigma_{k+1}.$$
$\|(A - B) z\|$ is clearly a lower bound for $\|A - B\|$, so
$$\|A - B\| \ge \sigma_{k+1}.$$
This lower bound is attained by $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$ because
$$A - A_k = \sum_{i=1}^{n} \sigma_i u_i v_i^* - \sum_{i=1}^{k} \sigma_i u_i v_i^* = \sum_{i=k+1}^{n} \sigma_i u_i v_i^*,$$
whose largest singular value, and hence induced norm, is $\sigma_{k+1}$. Thus, the best rank-$k$ approximation to $A$ in the induced norm is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$.
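Numerically, the truncated SVD in (3.1) and the error bound are easy to verify; the following sketch (an added illustration with random data) checks that $\|A - A_k\| = \sigma_{k+1}$ and that a random rank-$k$ matrix does no better.

```python
import numpy as np

rng = np.random.default_rng(13)
A = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = (U[:, :k] * s[:k]) @ Vh[:k, :]          # A_k = sum_{i<=k} sigma_i u_i v_i^*

assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])              # error equals sigma_{k+1}

B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))   # an arbitrary rank-k matrix
assert np.linalg.norm(A - B, 2) >= s[k] - 1e-12                 # it cannot beat A_k
```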
3.3.2 The orthogonal Procrustes problem

The SVD also facilitates the solution of the orthogonal Procrustes problem. It asks how one should rotate a matrix $A$ so that it approximates another matrix $B$. Mathematically, we seek a solution to
$$\min_{Q^* Q = I} \|B - Q A\|_F. \qquad (3.2)$$
Expanding the Frobenius norm shows that minimizing (3.2) is equivalent to maximizing $\mathrm{Re}\,\operatorname{Tr}(A B^* Q)$. Let $A B^*$ have the SVD $A B^* = U \Sigma V^*$ and let $R = V^* Q U$, which is unitary. Then
$$\mathrm{Re}\,\operatorname{Tr}(A B^* Q) = \mathrm{Re}\,\operatorname{Tr}(\Sigma R) = \sum_{i} \sigma_i\, \mathrm{Re}(r_{ii}) \le \sum_{i} \sigma_i,$$
because the entries of a unitary matrix satisfy $|r_{ii}| \le 1$. Equality is attained when $r_{ii} = 1$ for every $i$, i.e. when $R = I$, so the minimizer of (3.2) is $Q = V U^*$.
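The solution $Q = V U^*$ is straightforward to compute; the sketch below (an added illustration in which $B$ is a noisy rotation of random data) forms it from the SVD of $A B^*$ and checks that it fits at least as well as the rotation used to generate the data.

```python
import numpy as np

rng = np.random.default_rng(14)
A = rng.standard_normal((3, 5))
Q0 = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # a reference orthogonal matrix
B = Q0 @ A + 0.01 * rng.standard_normal((3, 5))     # B is (roughly) a rotated copy of A

# Procrustes solution: with A B^* = U Sigma V^*, the minimizer of ||B - QA||_F is Q = V U^*.
U, _, Vh = np.linalg.svd(A @ B.T)
Q = Vh.T @ U.T

assert np.allclose(Q.T @ Q, np.eye(3))              # Q is orthogonal
# Q fits at least as well as the rotation used to generate B.
assert np.linalg.norm(B - Q @ A) <= np.linalg.norm(B - Q0 @ A) + 1e-12
```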
4 Supplemental Material
There are several important topics covered in the refresher course that
are not explicitly discussed in these notes. This chapter acts as a link to
supplemental material for these topics. Specifically, we provide a very
brief introduction to each of the topics and list references for further
study.
4.1
Perhaps the simplest method for solving linear systems of the form
$$A x = b \qquad (4.1)$$
is built from matrices of the form $M_j = I - w_j e_j^T$, constructed such that for any vector $x \in \mathbb{C}^{n}$ with $x_j \ne 0$,
$$M_j \begin{bmatrix} x_1 \\ \vdots \\ x_j \\ x_{j+1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_j \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad \text{where} \quad (w_j)_k = \begin{cases} 0, & k \le j, \\ x_k / x_j, & k > j. \end{cases}$$
We seek a sequence of these so-called Gauss Transforms such that
$$M_{n-1} \cdots M_1 A = U, \qquad (4.2)$$
where $U$ is upper triangular. It is important to note that such a sequence may not exist; however, there is a set of formal properties of $A$ that imply the existence of such a transform, see, e.g., section 3.5 of [5] for details. The matrices $M_j$ are used to successively introduce zeros below the diagonal in column $j$ of the matrix $A$. The order in which these matrices are applied ensures that each successive application of a Gauss Transform does not introduce any new elements below the diagonal. Since the matrices $M_j$ are all nonsingular, under the assumption that $A$ is nonsingular and that we never have to divide by zero in the construction of $M_j$, we may solve the linear system (4.1) by forming
$$U x = M_{n-1} \cdots M_1 b$$
and then solving for $x$ via backward substitution. This procedure is what we know as Gaussian Elimination.
This methodology is related to the LU Factorization, where we attempt to factor $A$ into a unit lower triangular matrix $L$ and an upper triangular matrix $U$ such that
$$A = L U.$$
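In floating point arithmetic one normally uses a pivoted variant; the sketch below (an added illustration) calls SciPy's `lu`, which returns the factorization $A = PLU$, and then solves $Ax = b$ by triangular substitutions as described above.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(15)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

P, L, U = lu(A)                                   # SciPy returns A = P L U (partial pivoting)
assert np.allclose(P @ L @ U, A)
assert np.allclose(np.tril(L), L) and np.allclose(np.diag(L), 1.0)   # L is unit lower triangular
assert np.allclose(np.triu(U), U)                                    # U is upper triangular

# Solve A x = b: forward substitution with L, then backward substitution with U.
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y, lower=False)
assert np.allclose(A @ x, b)
```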
4.2 The QR Factorization
For concreteness, suppose $A \in \mathbb{C}^{m\times n}$ with columns $a_1, \ldots, a_n$ and rank $k$, and assume for now that $A$ has full column rank, i.e., $k = n$. However, the factorization is in no way restricted to this situation.
Structurally, the goal of this factorization is to form an orthonormal basis $\{q_i\}_{i=1}^{n}$ for the range of $A$ with the property that
$$\operatorname{span}\{q_1, \ldots, q_p\} = \operatorname{span}\{a_1, \ldots, a_p\}, \qquad p = 1, \ldots, n.$$
Such a set of $q_i$ may be constructed via the Gram-Schmidt process, see, e.g., Lecture 7 of [4]. Under the assumptions here such a basis always exists, and we observe that we may therefore write
$$a_j = \sum_{i=1}^{j} r_{i,j} q_i, \qquad j = 1, \ldots, n.$$
If we form
$$Q = \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix},$$
the nested nature of the orthonormal basis means that to construct the $j$th column of $A$ we only need the first $j$ columns of $Q$. By construction $Q^* Q = I$; however, unless $m = n$, $Q$ is not unitary. If we let
$$R = \begin{bmatrix} r_{1,1} & \cdots & r_{1,n} \\ & \ddots & \vdots \\ & & r_{n,n} \end{bmatrix},$$
then by construction we have a factorization
$$A = Q R.$$
This section only provides a very brief introduction to the QR factorization. This factorization has a wide variety of uses and is incredibly
powerful. It may also be constructed for matrices that do not adhere
to the assumptions given here. Further discussion of this factorization
can be found in Lecture 7 of [4].
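For completeness, a small NumPy illustration (added here; $A$ is random and has full column rank almost surely) of the reduced QR factorization and the nested-span property:

```python
import numpy as np

rng = np.random.default_rng(16)
A = rng.standard_normal((6, 4))

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 6x4 with orthonormal columns, R is 4x4

assert np.allclose(Q.T @ Q, np.eye(4))   # Q^* Q = I (Q is not square, hence not unitary)
assert np.allclose(np.triu(R), R)        # R is upper triangular
assert np.allclose(Q @ R, A)

# Nested spans: a_1, ..., a_p lie in span{q_1, ..., q_p} for every p.
for p in range(1, 5):
    proj = Q[:, :p] @ Q[:, :p].T         # orthogonal projector onto span{q_1, ..., q_p}
    assert np.allclose(proj @ A[:, :p], A[:, :p])
```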
References