You are on page 1of 181

Lecture Notes on Linear Algebra

T
AF

Arbind K Lal Sukant Pati


DR

July 10, 2016


2

DR
AF
T
Contents

1 Introduction to Matrices 7
1.1 Definition of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.A Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Operations on Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.A Multiplication of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.B Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 Some More Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.A sub-matrix of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
T

1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
AF
DR

2 System of Linear Equations over R 23


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.A Elementary Row Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 System of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.A Elementary Matrices and the Row-Reduced Echelon Form (RREF) . . . . 28
2.2.B Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.C Gauss-Jordan Elimination and System of Linear Equations . . . . . . . . 34
2.3 Square Matrices and System of Linear Equations . . . . . . . . . . . . . . . . . . 41
2.3.A Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.B Adjugate (classically Adjoint) of a Matrix . . . . . . . . . . . . . . . . . . 48
2.3.C Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4 Miscellaneous Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3 Vector Spaces 55
3.1 Vector Spaces: Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.A Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.1.B Linear Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.C Fundamental Subspaces Associated with a Matrix . . . . . . . . . . . . . 66
3.2 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.A Basic Results related to Linear Independence . . . . . . . . . . . . . . . . 69
3.2.B Application to Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4 CONTENTS

3.2.C Linear Independence and Uniqueness of Linear Combination . . . . . . . 71


3.3 Basis of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.A Main Results associated with Bases . . . . . . . . . . . . . . . . . . . . . 75
3.3.B Constructing a Basis of a Finite Dimensional Vector Space . . . . . . . . 76
3.4 Application to the subspaces of Cn . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.5 Ordered Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4 Linear Transformations 87
4.1 Definitions and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.A Algebra of Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . 96
4.3 Matrix of a linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3.A Dual Space* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4 Similarity of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
T

5 Inner Product Spaces 109


AF

5.1 Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109


DR

5.1.A Cauchy Schwartz Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . 111


5.1.B Angle between two Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.1.C Normed Linear Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.1.D Application to Fundamental Spaces . . . . . . . . . . . . . . . . . . . . . 117
5.1.E Properties of Orthonormal Vectors . . . . . . . . . . . . . . . . . . . . . . 119
5.2 Gram-Schmidt Orthogonalization Process . . . . . . . . . . . . . . . . . . . . . . 121
5.3 Orthogonal Operator and Rigid Motion . . . . . . . . . . . . . . . . . . . . . . . 125
5.4 Orthogonal Projections and Applications . . . . . . . . . . . . . . . . . . . . . . . 127
5.4.A Orthogonal Projections as Self-Adjoint Operators* . . . . . . . . . . . . . 131
5.5 QR Decomposition∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6 Eigenvalues, Eigenvectors and Diagonalization 139


6.1 Introduction and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.1.A Spectrum of an eigenvalue . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.2 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.2.A Schur’s Unitary Triangularization . . . . . . . . . . . . . . . . . . . . . . . 150
6.2.B Diagonalizability of some Special Matrices . . . . . . . . . . . . . . . . . . 152
6.2.C Cayley Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.3.A Sylvester’s law of inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
CONTENTS 5

7 Appendix 165
7.1 Permutation/Symmetric Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.2 Properties of Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.3 Uniqueness of RREF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.4 Dimension of W1 + W2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.5 When does Norm imply Inner Product . . . . . . . . . . . . . . . . . . . . . . . . 176

Index 178

T
AF
DR
6 CONTENTS

T
AF
DR
Chapter 1

Introduction to Matrices

1.1 Definition of a Matrix


Definition 1.1.1.1. Matrix A rectangular array of numbers is called a matrix.
The horizontal arrays of a matrix are called its rows and the vertical arrays are called its
columns. Let A be a matrix having m rows and n columns. Then, A is said to have order
T
m × n or is called a matrix of size m × n and can be represented in either of the following forms:
   
AF

a11 a12 · · · a1n a11 a12 · · · a1n


   
DR

 a21 a22 · · · a2n   a21 a22 · · · a2n 


   
A= . .. .. ..  or A =  . .. .. ..  ,
 .. . . .   .
. . . . 
   
am1 am2 · · · amn am1 am2 · · · amn

where aij is the entry at the intersection of the ith row and j th column. To be concise, one
writes Am×n = [aij ] or, in short, A = [aij ]. We will also use A[i, :] to denote the i-th row of
A, A[:, j] to denote the j-th column of A. We shall mostly be concerned with matrices having
complex numbers as entries.
" # " #
1 3+i 7 7
For example, if A = then A[1, :] = [1 3 + i 7], A[:, 3] = and
4 5 6 − 5i 6 − 5i
a22 = 5. In general, in row vector commas are inserted to differentiate between entries. Thus,
A[1, :] = [1, 3 + i, 7]. A matrix having only one column is called a column vector and a matrix
with only one row is called a row vector. All our vectors will be column vectors and will be
represented by bold letters. Thus, A[1, :] is a row vector and A[:, 3] is a column vector.
Definition 1.1.1.2. Equality of two Matrices Two matrices A = [aij ] and B = [bij ] having
the same order m × n are equal if aij = bij , for each i = 1, 2, . . . , m and j = 1, 2, . . . , n.
In other words, two matrices are said to be equal if they have the same order and their
corresponding entries are equal.
Example 1.1.1.3. The " linear system
# of equations 2x + 3y = 5 and 3x + 2y = 5 can be identified
2 3 5
with the matrix A = . Note that x and y are unknowns with the understanding
3 2 5
that x is associated with A[:, 1] and y is associated with A[:, 2].
8 CHAPTER 1. INTRODUCTION TO MATRICES

1.1.A Special Matrices

Definition 1.1.1.4. 1. A matrix in which each entry is zero is called a zero-matrix, denoted
0. For example, " # " #
0 0 0 0 0
02×2 = and 02×3 = .
0 0 0 0 0

2. A matrix that has the same number of rows as the number of columns, is called a square
matrix. A square matrix is said to have order n if it is an n × n matrix.

3. Let A = [aij ] be an n × n square matrix.

(a) Then the entries a11 , a22 , . . . , ann are called the diagonal entries the principal di-
agonal of A.
(b) Then A is said to be a diagonal matrix"if aij# = 0 for i 6= j, denoted diag[a11 , . . . , ann ].
4 0
For example, the zero matrix 0n and are a few diagonal matrices.
0 1
(c) If A = diag[a11 , . . . , ann ] and aii = d for all i = 1, . . . , n then the diagonal matrix A
is called a scalar matrix.
T
AF

(d) Then A = diag[1, . . . , 1] is called the identity matrix, denoted In , or in short I.


 
" # 1 0 0
DR

1 0  
For example, I2 = and I3 =  0 1 0.
0 1
0 0 1
(e) Then A is said to be an upper triangular matrix if aij = 0 for i > j.
(f) Then A is said to be a lower triangular matrix if aij = 0 for i < j.
(g) Then A is said to be triangular if it is an upper or a lower triangular matrix.
   
0 1 4 0 0 0
   
 
For example, 0 3 −1 is upper triangular, 1 0 0
 is lower triangular.
0 0 −2 0 1 1

4. An m × n matrix A = [aij ] is said to have an upper triangular form if aij = 0 for all
 
a11 a12 · · · a1n 0 0 1
  " #
 0 a22 · · · a2n   0 1 0

 1 2 0 0 1
    and
i > j. For example, the matrices  . .. .. ..  , 
 .. . . 
.   0 0 2  0 0 0 1 1

0 0 · · · ann 0 0 0
have upper triangular forms.

1.2 Operations on Matrices


Definition 1.1.2.1. Transpose of a Matrix Let A = [aij ] be an m × n matrix with real
entries. Then the transpose of A, denoted AT = [bij ], is an n × m matrix with bij = aji , for all
i, j.
1.2. OPERATIONS ON MATRICES 9

Definition 1.1.2.2. Conjugate Transpose of a Matrix Let A = [aij ] be an m × n matrix


with complex entries. Then the conjugate transpose of A, denoted A∗ , is an n × m matrix
with (A∗ )ij = aji , for all i, j, where for a ∈ C, a denotes the complex-conjugate of a.

Thus, if x is a column vector then T ∗


 x and x are row vectors and vice-versa. For example, if
" # 1 0 " # " #
1 4 5   1 4 + i 1 0
A= then A = A = 
∗ T  ∗
4 1, whereas if A = 0 1 − i then A = 4 − i 1 + i
0 1 2
5 2

and note that A 6= A .T

Theorem 1.1.2.3. For any matrix A, (A∗ )∗ = A. Thus, (AT )T = A.

Proof. Let A = [aij ], A∗ = [bij ] and (A∗ )∗ = [cij ]. Clearly, the order of A and (A∗ )∗ is the same.
Also, by definition cij = bji = aij = aij for all i, j and hence the result follows.

Remark 1.1.2.4. Note that transpose is studied whenever the entries of the matrix are real.
Since, we are allowing the matrix entries to be complex numbers, we will state and prove the
results for complex-conjugate. The readers should separately very that similar results hold for
transpose whenever the matrix has real entries.
T

Definition 1.1.2.5. Addition of Matrices Let A = [aij ] and B = [bij ] be two m × n


AF

matrices. Then, the sum of A and B, denoted A + B, is defined to be the matrix C = [cij ] with
DR

cij = aij + bij .

Definition 1.1.2.6. Multiplying a Scalar to a Matrix Let A = [aij ] be an m × n matrix.


Then the product of k ∈ C with A, denoted kA, is defined as kA = [kaij ].
" # " # " #
1 4 5 5 20 25 2 + i 8 + 4i 10 + 5i
For example, if A = then 5A = and (2+i)A = .
0 1 2 0 5 10 0 2 + i 4 + 2i

Theorem 1.1.2.7. Let A, B and C be matrices of order m × n, and let k, ℓ ∈ C. Then

1. A + B = B + A (commutativity).

2. (A + B) + C = A + (B + C) (associativity).

3. k(ℓA) = (kℓ)A.

4. (k + ℓ)A = kA + ℓA.

Proof. Part 1.
Let A = [aij ] and B = [bij ]. Then, by definition

A + B = [aij ] + [bij ] = [aij + bij ] = [bij + aij ] = [bij ] + [aij ] = B + A

as complex numbers commute. The reader is required to prove the other parts as all the results
follow from the properties of complex numbers.

Definition 1.1.2.8. Additive Inverse Let A be an m × n matrix.


10 CHAPTER 1. INTRODUCTION TO MATRICES

1. Then there exists a matrix B with A + B = 0. This matrix B is called the additive
inverse of A, and is denoted by −A = (−1)A.

2. Also, for the matrix 0m×n , A + 0 = 0 + A = A. Hence, the matrix 0m×n is called the
additive identity.

Exercise 1.1.2.9. 1. Find a 3 × 3 non-zero matrix A with real entries satisfying

(a) AT = A.
(b) AT = −A.

2. Find a 3 × 3 non-zero matrix A with complex entries satisfying

(a) A∗ = A.
(b) A∗ = −A.

3. Find the 3 × 3 matrix A = [aij ] satisfying aij = 1 if i 6= j and 2 otherwise.

4. Find the 3 × 3 matrix A = [aij ] satisfying aij = 1 if | i − j | ≤ 1 and 0 otherwise.


T

5. Find the 4 × 4 matrix A = [aij ] satisfying aij = i + j.


AF

6. Find the 4 × 4 matrix A = [aij ] satisfying aij = 2i+j .


DR

7. Suppose A + B = A. Then show that B = 0.

8. Suppose A + B = 0. Then show that B = (−1)A = [−aij ].


 
1 + i −1 " #
  2 3 −1
9. Let A = 
 2 3 ∗ ∗
 and B = 1 1 − i 2 . Compute A + B and B + A .
i 1

1.2.A Multiplication of Matrices

Definition 1.1.2.10. Matrix Multiplication / Product Let A = [aij ] be an m × n matrix


and B = [bij ] be an n × r matrix. The product of A and B, denoted AB, is a matrix C = [cij ]
of order m × r with
n
X
cij = aik bkj = ai1 b1j + ai2 b2j + · · · + ain bnj , 1 ≤ i ≤ m, 1 ≤ j ≤ r.
k=1

Thus, AB is defined if and only if number


 of columns
 of A = number of rows of B.
" # α β γ δ
a b c  
For example, if A = and B = x y z t 

 then
d e f
u v w s
" #
aα + bx + cu aβ + by + cv aγ + bz + cw aδ + bt + cs
AB = . (1.1.2.1)
dα + ex + f u dβ + ey + f v dγ + ez + f w dδ + et + f s
1.2. OPERATIONS ON MATRICES 11

Thus, note that the rows of the matrix AB can be written directly as

(AB)[1, :] = a [α, β, γ, δ] + b [x, y, z, t] + c [u, v, w, s] = aB[1, :] + bB[2, :] + cB[3, :]


(AB)[2, :] = dB[1, :] + eB[2, :] + f B[3, :] (1.1.2.2)

and similarly, the columns of the matrix AB can be written directly as


" #
aα + bx + cu
(AB)[:, 1] = = α A[:, 1] + x A[:, 2] + u A[:, 3], (1.1.2.3)
dα + ex + f u

(AB)[:, 2] = β A[:, 1] + y A[:, 2] + v A[:, 3], · · · , (AB)[:, 4] = δ A[:, 1] + t A[:, 2] + s A[:, 3].

Remark 1.1.2.11. Observe the following:

1. In this example, while AB is defined, the product BA is not defined. However, for square
matrices A and B of the same order, both the product AB and BA are defined.

2. The product AB corresponds to operating on the rows of the matrix B (see Equation (1.1.2.2)).
This is row method for calculating the matrix product.
T
3. The product AB also corresponds to operating on the columns of the matrix A (see Equa-
AF

tion (1.1.2.3)). This is column method for calculating the matrix product.
DR

4. Let A and B be two matrices such that the product AB is defined. Then verify that
 
A[1, :]B
 
 A[2, :]B 
 
AB =  ..  = [A B[:, 1], A B[:, 2], . . . , A B[:, p]]. (1.1.2.4)
 . 
 
A[n, :]B

  
1 2 1 0 −1
0
   
Example 1.1.2.12. Let A =   
1 0 1 and B = 0 0 1. Use the row/column
0 −1 1 0 −1 1
method of matrix multiplication to

1. find the second row of the matrix AB.


Solution: By Remark 1.1.2.11.4, (AB)[2, :] = A[2, :]B and hence

(AB)[2, :] = 1 · [1, 0, −1] + 0 · [0, 0, 1] + 1 · [0, −1, 1] = [1, −1, 0].

2. find the third column of the matrix AB.


Solution: Again, by Remark 1.1.2.11.4, (AB)[:, 3] = A B[:, 3] and hence
       
1 2 0 1
       
(AB)[:, 3] = −1 ·        
1 + 1 ·  0  + 1 · 1 = 0 .
0 −1 1 0
12 CHAPTER 1. INTRODUCTION TO MATRICES

Definition 1.1.2.13. [Commutativity of Matrix Product] Two square matrices A and B are
said to commute if AB = BA.

Remark 1.1.2.14. Note that if A is a square matrix of order n and if B is a scalar matrix
of order n then
" AB#= BA. In"general,
# the matrix product is not
" commutative.
# " # For example,
1 1 1 0 2 0 1 1
consider A = and B = . Then verify that AB = 6= = BA.
0 0 1 0 0 0 1 1

Theorem 1.1.2.15. Suppose that the matrices A, B and C are so chosen that the matrix
multiplications are defined.

1. Then (AB)C = A(BC). That is, the matrix multiplication is associative.

2. For any k ∈ R, (kA)B = k(AB) = A(kB).

3. Then A(B + C) = AB + AC. That is, multiplication distributes over addition.

4. If A is an n × n matrix then AIn = In A = A.

5. Now let A be a square matrix of order n and D = diag[d1 , d2 , . . . , dn ]. Then


T

• (DA)[i, :] = di A[i, :], for 1 ≤ i ≤ n, and


AF

• (AD)[:, j] = dj A[:, j], for 1 ≤ j ≤ n.


DR

Proof. Part 1. Let A = [aij ]m×n , B = [bij ]n×p and C = [cij ]p×q . Then
p
X n
X
(BC)kj = bkℓ cℓj and (AB)iℓ = aik bkℓ .
ℓ=1 k=1

Therefore,
n
X n
X p
X p
n X
   X 
A(BC) ij = aik BC kj = aik bkℓ cℓj = aik bkℓ cℓj
k=1 k=1 ℓ=1 k=1 ℓ=1
Xn Xp p
X Xn X t
   
= aik bkℓ cℓj = aik bkℓ cℓj = AB c
iℓ ℓj
= (AB)C ij
.
k=1 ℓ=1 ℓ=1 k=1 ℓ=1

Part 5. As D is a diagonal matrix, using Remark 1.1.2.11.4, we have


X
(DA)[i, :] = D[i, :] A = 0 · A[j, :] + di A[i, :].
j6=i

Using a similar argument, the next part follows. The other parts are left for the reader.

Exercise 1.1.2.16. 1. Find a 2 × 2 non-zero matrix A satisfying A2 = 0.

2. Find a 2 × 2 non-zero matrix A satisfying A2 = A and A 6= I2 .

3. Find 2 × 2 non-zero matrices A, B and C satisfying AB = AC but B 6= C. That is, the


cancelation law doesn’t hold.
1.2. OPERATIONS ON MATRICES 13
 
0 1 0
 
4. Let A =   2 3 3 3 2
0 0 1. Compute A and A . Is A = I? Determine aA + bA + cA .
1 0 0

5. Let A and B be two m × n matrices. Then prove that (A + B)∗ = A∗ + B ∗ .

6. Let A be a 1 × n matrix and B be an n × 1 matrix. Then verify that AB is a 1 × 1 matrix,


whereas BA has order n × n.

7. Let A and B be two matrices such that the matrix product AB is defined.

(a) Prove that (AB)∗ = B ∗ A∗ .


(b) If A[1, :] = 0∗ then (AB)[1, :] = 0∗ .
(c) If B[:, 1] = 0 then (AB)[:, 1] = 0.
(d) If A[i, :] = A[j, :] for some i and j then (AB)[i, :] = (AB)[j, :].
(e) If B[:, i] = B[:, j] for some i and j then (AB)[:, i] = (AB)[:, j].
   
1 1 + i −2 1 0
   

8. Let A =  1  
i  and B =  0 1
−2 . Compute
T
AF

−i 1 1 −1 + i 1
DR

(a) A − A∗ , A + A∗ , (3AB)∗ − 4B ∗ A and 3A − 2A∗ .


(b) (AB)[1, :], (AB)[3, :], (AB)[:, 1] and (AB)[:, 2].
(c) (B ∗ A∗ )[:, 1], (B ∗ A∗ )[:, 3], (B ∗ A∗ )[1, :] and (B ∗ A∗ )[2, :].
 
" # 0 1 1
0 1  
9. Let A = and B =   0 0 1  . Guess a formula for An and B n and prove it?
0 0
0 0 0
   
" # 1 1 1 1 1 1
1 1    
 1   2
10. Let A =
0 1
, B = 0 1  and C = 1 1 1. Is it true that A − 2A + I = 0?
0 0 1 1 1 1
3 2
What is B − 3B + 3B − I? 3
Is C = 3C ?2

11. Construct the matrices A and B satisfying the following statements.

(a) The product AB is defined but BA is not defined.


(b) The products AB and BA are defined but they have different orders.
(c) The products AB and BA are defined, they have the same order but AB 6= BA.

12. Let a,
 b and
 c be indeterminate.
 Then, can we find A with complex entries satisfying
a a+b " # " #
    a a · b
A   
 b  =  b − c ? What if A b = a ? Give reasons for your answer.
c 3a − 5b + c
14 CHAPTER 1. INTRODUCTION TO MATRICES

1.2.B Inverse of a Matrix

Definition 1.1.2.17. [Inverse of a Matrix] Let A be a square matrix of order n.

1. A square matrix B is said to be a left inverse of A if BA = In .

2. A square matrix C is called a right inverse of A, if AC = In .

3. A matrix A is said to be invertible (or is said to have an inverse) if there exists a matrix
B such that AB = BA = In .

Lemma 1.1.2.18. Let A be an n × n matrix. Suppose that there exist n × n matrices B and C
such that AB = In and CA = In then B = C.

Proof. Note that C = CIn = C(AB) = (CA)B = In B = B.

Remark 1.1.2.19. 1. Lemma 1.1.2.18 implies that whenever A is invertible, the inverse is
unique.

2. Therefore the inverse of A is denoted by A−1 . That is, AA−1 = A−1 A = I.


" #
a b
Example 1.1.2.20. 1. Let A = .
c d
T

" #
AF

1 d −b
(a) If ad − bc 6= 0. Then verify that A−1 = ad−bc .
−c a
DR

" # " #
2 3 7 −3
(b) In particular, the inverse of equals 21 .
4 7 −4 2
(c) If ad − bc = 0 then prove that either A[1, :] = 0∗ or A[:, 1] = 0 or A[2, :] = αA[1, :] or
A[:, 2] = αA[:, 1] for some α ∈ C. Hence, prove that A is not invertible.
" # " # " #
1 2 1 0 4 2
(d) The matrices , and do not have inverses.
0 0 4 0 6 3
   
1 2 3 −2 0 1
   
2. Let A =  2 3 4 . Then A−1 =  0
  3 −2 .

3 4 6 1 −2 1
   
1 1 1 1 1 2
   
3. Prove that the matrices A = 
 1 1 1 and B = 1 0 1 are not invertible.
  
1 1 1 0 1 1
Solution: Suppose there exists C such that CA = AC = I. Then, using matrix product

A[1, :]C = (AC)[1, :] = I[1, :] = [1, 0, 0] and A[2, :]C = (AC)[2, :] = I[2, :] = [0, 1, 0].

But A[1, :] = A[2, :] and thus [1, 0, 0] = [0, 1, 0], a contradiction.


Similarly, if there exists D such that BD = DB = I then

DB[:, 1] = (DB)[:, 1] = I[:, 1], DB[:, 2] = (DB)[:, 2] = I[:, 2] and DB[:, 3] = I[:, 3].

But B[:, 3] = B[:, 1] + B[:, 2] and hence I[:, 3] = I[:, 1] + I[:, 2], a contradiction.
1.2. OPERATIONS ON MATRICES 15

Theorem 1.1.2.21. Let A and B be two invertible matrices. Then

1. (A−1 )−1 = A.

2. (AB)−1 = B −1 A−1 .

3. (A∗ )−1 = (A−1 )∗ .

Proof. Proof of Part 1. Let B = A−1 be the inverse of A. Then AB = BA = I. Thus, by


definition, B is invertible and B −1 = A. Or equivalently, (A−1 )−1 = A.
Proof of Part 2. By associativity (AB)(B −1 A−1 ) = A(BB −1 )A−1 = I = (B −1 A−1 )(AB).
Proof of Part 3. As AA−1 = A−1 A = I, we get (AA−1 )∗ = (A−1 A)∗ = I ∗ . Or equivalently,
(A−1 )∗ A∗ = A∗ (A−1 )∗ = I. Thus, by definition (A∗ )−1 = (A−1 )∗ .
We will again come back to the study of invertible matrices in Sections 2.2 and 2.3.A.

Exercise 1.1.2.22. 1. Let A be an invertible matrix. Then (A−1 )r = A−r for all integer r.
" # " #
− cos(θ) sin(θ) cos(θ) sin(θ)
2. Find the inverse of and .
sin(θ) cos(θ) − sin(θ) cos(θ)
T

3. Let A1 , . . . , Ar be invertible matrices. Then the matrix B = A1 A2 · · · Ar is also invertible.


AF

4. Let x∗ = [1 + i, 2, 3] and y∗ = [2, −1 + i, 4]. Prove that x∗ y is invertible but xy∗ is not
DR

invertible.

5. Let A be an n × n invertible matrix. Then prove that

(a) A[i, :] 6= 0T for any i.


(b) A[:, j] 6= 0 for any j.
(c) A[i, :] 6= A[j, :] for any i and j.
(d) A[:, i] 6= A[:, j] for any i and j.
(e) A[3, :] 6= αA[1, :] + βA[2, :] for any α, β ∈ C, whenever n ≥ 3.
(f ) A[:, 3] 6= αA[:, 1] + βA[:, 2] for any α, β ∈ C, whenever n ≥ 3.
" #
−1 1 2
6. Determine A that satisfies (I + 3A) = .
2 1
 
−2 0 1
 
7. Determine A that satisfies (I − A)−1 = 
0 3 −2
. [See Example 1.1.2.20.2].
1 −2 1

1

8. Let A be a square matrix satisfying A3 + A − 2I = 0. Prove that A−1 = 2 A2 + I .

9. Let A = [aij ] be an invertible matrix. If B = [pi−j aij ] for some p ∈ C, p 6= 0 then


determine B −1 .
16 CHAPTER 1. INTRODUCTION TO MATRICES

1.3 Some More Special Matrices


Definition 1.1.3.1. 1. Let A be a square matrix with real entries. Then, A is called
" #
1 3
(a) symmetric if AT = A. For example, A = .
3 2
" #
0 3
(b) skew-symmetric if AT = −A. For example, A = .
−3 0
" #
T T 1 1 1
(c) orthogonal if AA = A A = I. For example, A = √ .
2 1 −1

2. Let A be a square matrix with complex entries. Then, A is called


 
∗ ∗ 1 i
(a) Normal if A A = AA . For example, is a normal matrix.
i 1
" #
∗ 1 1+i
(b) Hermitian if A = A. For example, A = .
1−i 2
" #
∗ 0 1+i
(c) skew-Hermitian if A = −A. For example, A = .
T
−1 + i 0
AF

" #
1 1 + i 1
(d) unitary if AA∗ = A∗ A = I. For example, A = √ .
DR

3 −1 1 − i
" #
1 0
3. A matrix A is said to be idempotent if A2 = A. For example, A = is idempotent.
1 0

4. A matrix that is symmetric and idempotent is called a projection matrix. For example,
let u ∈ Rn be a column vector with uT u = 1 then A = uuT is an idempotent matrix. More-
1
over, A is symmetric and hence is a projection matrix. In particular, let u = √ (1, 2)T
5
and A = uuT . Then uT u = 1 and for any vector x = (x1 , x2 )T ∈ R2 note that
 
∗ ∗ x1 + 2x2 x1 + 2x2 2x1 + 4x2 T
Ax = (uu )x = u(u x) = √ u= , .
5 5 5

Thus, Ax is the feet of the perpendicular from the point x on the vector [1 2]T .

5. A square matrix A is said to be nilpotent if there exists a positive integer n such that
An = 0. The least positive integer k for which Ak = 0 is called the order of nilpotency.
For example, if A = [aij ] is an n × n matrix with aij equal to 1 if i − j = 1 and 0, otherwise
then An = 0 and Aℓ 6= 0 for 1 ≤ ℓ ≤ n − 1.

Exercise 1.1.3.2. 1. Let A be a complex square matrix. Then S1 = 21 (A+A∗ ) is Hermitian,


S2 = 21 (A − A∗ ) is skew-Hermitian, and A = S1 + S2 .

2. Let A and B be two lower triangular matrices. Then prove that AB is a lower triangular
matrix. A similar statement holds for upper triangular matrices.
1.3. SOME MORE SPECIAL MATRICES 17

3. Let A and B be Hermitian matrices. Then prove that AB is Hermitian if and only if
AB = BA.

4. Show that the diagonal entries of a skew-Hermitian matrix are zero or purely imaginary.

5. Let A, B be skew-Hermitian matrices with AB = BA. Is the matrix AB Hermitian or


skew-Hermitian?

6. Let A be a Hermitian matrix of order n with A2 = 0. Is it necessarily true that A = 0?

7. Let A be a nilpotent matrix. Prove that there exists a matrix B such that B(I + A) = I =
(I + A)B [If Ak = 0 then look at I − A + A2 − · · · + (−1)k−1 Ak−1 ].
   
1 0 0 1 0 0
8. Are the matrices 0 cos θ − sin θ and 0 cos θ sin θ  orthogonal, for θ ∈ [0, 2π]?
0 sin θ cos θ 0 sin θ − cos θ

9. Let {u1 , u2 , u3 } be three vectors in R3 such that u∗i ui = 1, for 1 ≤ i ≤ 3, and u∗i uj = 0
whenever i 6= j. Then prove that the 3 × 3 matrix

(a) U = [u1 u2 u3 ] satisfies U ∗ U = I. Thus, U U ∗ = I.


T
AF

(b) A = ui u∗i , for 1 ≤ i ≤ 3, satisfies A2 = A. Is A symmetric?


DR

(c) A = ui u∗i + uj u∗j , for i 6= j, satisfies A2 = A. Is A symmetric?

10. Verify that the matrices in Exercises 9.9b and 9.9c are projection matrices.

11. Let A and B be two n × n orthogonal matrices. Then prove that AB is also an orthogonal
matrix.

1.3.A sub-matrix of a Matrix

Definition 1.1.3.3. A matrix obtained by deleting some of the rows and/or columns of a matrix
is said to be a sub-matrix of the given matrix.
" # " # " #
1 4 5 1 1 5
For example, if A = then [1], [2], , [1 5], , A are a few sub-matrices of A.
0 1 2 0 0 2
" # " #
1 4 1 4
But the matrices and are not sub-matrices of A (the reader is advised to give
1 0 0 2
reasons).
Let A be an n×m matrix and B be an m×p matrix.
" # Suppose r < m. Then, we can decompose
H
the matrices A and B as A = [P Q] and B = , where P has order n × r and H has order
K
r × p. That is, the matrices P and Q are sub-matrices of A and P consists of the first r columns
of A and Q consists of the last m − r columns of A. Similarly, H and K are sub-matrices of B
and H consists of the first r rows of B and K consists of the last m − r rows of B. We now
prove the following important theorem.
18 CHAPTER 1. INTRODUCTION TO MATRICES
" #
H
Theorem 1.1.3.4. Let A = [aij ] = [P Q] and B = [bij ] = be defined as above. Then
K

AB = P H + QK.

Proof. The matrix products P H and QK are valid as the order of the matrices P, H, Q and K
are respectively, n × r, r × p, n × (m − r) and (m − r) × p. Also, the matrices P H and QK are
of the same order and hence their sum is justified. Now, let P = [Pij ], Q = [Qij ], H = [Hij ],
and K = [kij ]. Then, for 1 ≤ i ≤ n and 1 ≤ j ≤ p, we have
m
X r
X m
X r
X m
X
(AB)ij = aik bkj = aik bkj + aik bkj = Pik Hkj + Qik Kkj
k=1 k=1 k=r+1 k=1 k=r+1
= (P H)ij + (QK)ij = (P H + QK)ij .

Thus, the required result follows.

Remark 1.1.3.5. Theorem 1.1.3.4 is very useful due to the following reasons:

1. The order of the matrices P, Q, H and K are smaller than that of A or B.

2. The matrices P, Q, H and K can be further partitioned so as to form blocks that are either
T
AF

identity or zero or matrices that have nice forms. This partition may be quite useful during
different matrix operations.
DR

3. If we want to prove results using induction then after proving the initial step, one assume
the result for all r×r sub-matrices and then try to prove it for (r+1)×(r+1) sub-matrices.
 
" # a b " #" #
1 2 0   1 2 a b
For example, if A = and B =  
 c d  then AB = 2 5 c d .
2 5 0
e f
m m s s
" 1 #2 "1 2#
Suppose A = n1 P Q and B = r1 E F . Then the matrices P, Q, R, S
n2 R S r2 G H
and E, F, G, H, are called the blocks of the matrices A and B, respectively. Note that even
if A + B is defined, the orders
" of P and E#need not be the same. But, if the block sums
P +E Q+F
are defined then A + B = . Similarly, if the product AB is defined, the
R+G S+H
product" P E may not be defined.
# Again, if the block products are defined, one can verify that
P E + QG P F + QH
AB = . That is, once a partition of A is fixed, the partition of B has
RE + SG RF + SH
to be properly chosen for purposes of block addition or multiplication.

Exercise 1.1.3.6. 1. Complete the proofs of Theorems 1.1.2.7 and 1.1.2.15.


     
1/2 0 0 1 0 0 2 2 2 6
     
2. Let A =   
 0 1 0 , B = −2 1 0  
 and C = 2 1 2 5 . Compute
0 0 1 −3 0 1 3 3 4 10
1.3. SOME MORE SPECIAL MATRICES 19

(a) (AC)[1, :],


(b) (B(AC))[1, :], (B(AC))[2, :] and (B(AC))[3, :].
(c) Note that (B(AC))[:, 1] + (B(AC))[:, 2] + (B(AC))[:, 3] − (B(AC))[:, 4] = 0.
(d) Let xT = [1, 1, 1, −1]. Use previous result to prove Cx = 0.
" # " # " # " #
x1 y1 cos α − sin α cos(2θ) sin(2θ)
3. Let x = ,y= , A= and B = . Then
x2 y2 sin α cos α sin(2θ) − cos(2θ)

(a) prove that y = Ax gives the counter-clockwise rotation through an angle α.


(b) prove that y = Bx gives the reflection about the line y = tan(θ)x.
(c) compute y = (AB)x and y = (BA)x. Do they correspond to reflection? If yes, then
about which line?
(d) furthermore if y = Cx gives the counter-clockwise rotation through β and y = Dx
gives the reflection about the line y = tan(δ) x, respectively. Then prove that
i. AC = CA and y = (AC)x gives the counter-clockwise rotation through α + β.
ii. y = (BD)x and y = (DB)x give rotations. Which angles do they represent?
T

4. Fix a unit vector a ∈ Rn and define f : Rn → Rn by f (y) = 2(aT y)a − y. Does this
AF

function give a reflection about the line that contains the points 0 and a.
DR

5. Consider the two coordinate transformations


x1 = a11 y1 + a12 y2 y1 = b11 z1 + b12 z2
and .
x2 = a21 y1 + a22 y2 y2 = b21 z1 + b22 z2

(a) Compose the two transformations to express x1 , x2 in terms of z1 , z2 .


(b) If xT = [x1 , x2 ], yT = [y1 , y2 ] and zT = [z1 , z2 ] then find matrices A, B and C such
that x = Ay, y = Bz and x = Cz.
(c) Is C = AB? Give reasons for your answer.

6. For An×n = [aij ], the trace of A, denoted Tr(A), is defined by Tr(A) = a11 + a22 + · · · +
ann .
" # " #
3 2 4 −3
(a) Compute Tr(A) for A = and A = .
2 2 −5 1
" # " # " # " # " #
1 1 1 1 1 1
(b) Let A be a matrix with A =2 and A =3 . If B = then
2 2 −2 −2 2 −2
compute Tr(AB).
(c) Let A and B be two square matrices of the same order. Then prove that
i. Tr(A + B) = Tr(A) + Tr(B).
ii. Tr(AB) = tr(BA).
(d) Prove that one cannot find matrices A and B such that AB − BA = cI for any c 6= 0.
20 CHAPTER 1. INTRODUCTION TO MATRICES

7. Let A and B be two m × n matrices with complex entries. Then prove that

(a) Ax = 0 for all n × 1 vector x implies that A = 0, the zero matrix.

(b) Ax = Bx for all n × 1 vector x implies that A = B.

8. Let A be an n × n matrix such that AB = BA for all n × n matrices B. Then prove that
A is a scalar matrix. That is, A = αI for some α ∈ C.
" #
1 2 3
9. Let A = .
2 1 1

(a) Find a matrix B such that AB = I2 .

(b) What can you say about the number of such matrices? Give reasons for your answer.

(c) Does there exist a matrix C such that CA = I3 ? Give reasons for your answer.
   
1 0 0 1 1 2 2 1
   
 0 1 1 1   1 1 2 1 
10. Let A =    . Compute the matrix product AB
 and B =  
 0 1 1 0   1 1 1 1 
T
AF

0 1 0 1 −1 1 −1 1
using the block matrix multiplication.
DR

"#
Q P
11. Let A = . If P, Q and R are Hermitian, is the matrix A Hermitian?
Q R
" #
A11 A12
12. Let A = , where A11 is an n × n invertible matrix and c ∈ C.
A21 c

(a) If p = c − A21 A−1


11 A12 is non-zero, prove that
" # " #
A−1 0 1 A−1 h i
11 11 A12
B= + A21 A−1
11 −1
0 0 p −1

is the inverse of A.
   
0 −1 2 0 −1 2
   
(b) Use the above to find the inverse of 
 1 1 4 
 and  3
 1 4 .
−2 1 1 −2 5 −3

13. Let x be an n × 1 vector with real entries and satisfying xT x = 1.

(a) Define A = In − 2xxT . Prove that A is symmetric and A2 = I. The matrix A is


commonly known as the Householder matrix.

(b) Let α 6= 1 be a real number and define A = In − αxxT . Prove that A is symmetric
and invertible [The inverse is also of the form In + βxxT for some value of β].
1.4. SUMMARY 21

14. Let A be an invertible matrix of order n and let x and y be two n × 1 vectors with real
entries. Also, let β be a real number such that α = 1 + βyT A−1 x 6= 0. Then prove the
famous Shermon-Morrison formula
β −1 T −1
(A + βxyT )−1 = A−1 − A xy A .
α
This formula gives the information about the inverse when an invertible matrix is modified
by a rank one matrix.

15. Suppose the matrices B and C are invertible and the involved partitioned products are
defined, then prove that
" #−1 " #
A B 0 C −1
= .
C 0 B −1 −B −1 AC −1

16. Let J be an n × n matrix having each entry 1.

(a) Prove that J 2 = nJ.


(b) Let α1 , α2 , β1 , β2 ∈ R. Prove that there exist α3 , β3 ∈ R such that
T

(α1 In + β1 J) · (α2 In + β2 J) = α3 In + β3 J.
AF
DR

(c) Let α, β ∈ R with α 6= 0 and α + nβ 6= 0 and define A = αIn + βJ. Prove that A is
invertible.

17. Let A be an upper triangular matrix. If A∗ A = AA∗ then prove that A is a diagonal
matrix. The same holds for lower triangular matrix.

18. Let A be an m × n matrix. Then a matrix G of order n × m is called a generalized


inverse of A" if AGA# = A. For example, a generalized inverse of the matrix A = [1, 2] is a
1 − 2α
matrix G = , for all α ∈ R. A generalized inverse G is called a pseudo inverse
α
or a Moore-Penrose inverse if GAG = G and the matrices AG and GA are symmetric.
2
Check that for α = the matrix G is a pseudo inverse of A.
5

1.4 Summary
In this chapter, we started with the definition of a matrix and came across lots of examples. In
particular, the following examples were important:

1. The zero matrix of size m × n, denoted 0m×n or 0.

2. The identity matrix of size n × n, denoted In or I.

3. Triangular matrices.

4. Hermitian/Symmetric matrices.
22 CHAPTER 1. INTRODUCTION TO MATRICES

5. Skew-Hermitian/skew-symmetric matrices.

6. Unitary/Orthogonal matrices.

7. Idempotent matrices.

8. nilpotent matrices.

We also learnt product of two matrices. Even though it seemed complicated, it basically tells
that multiplying by a matrix on the

1. left to a matrix A is same as operating on the rows of A.

2. right to a matrix A is same as operating on the columns of A.

T
AF
DR
Chapter 2

System of Linear Equations over R

2.1 Introduction

Let us look at some examples of linear systems.

1. Suppose a, b ∈ R. Consider the system ax = b in the unknown x. If


T

(a) a 6= 0 then the system has a unique solution x = ab .


AF

(b) a = 0 and
DR

i. b 6= 0 then the system has no solution.


ii. b = 0 then the system has infinite number of solutions, namely all x ∈ R.

2. Consider a linear system with 2 equations in 2 unknowns. The equation ax + by = c in


the unknowns x and y represents a line in R2 if either a 6= 0 or b 6= 0. Thus the solution
set of the system
a1 x + b1 y = c1 , a2 x + b2 y = c2

is given by the points of intersection of the two lines. The different cases are illustrated
by examples (see Figure 1). Figure 1??).

(a) Unique Solution


x + 2y = 1 and x + 3y = 1. The unique solution is [x, y]T = [1, 0]T .
Observe that in this case, a1 b2 − a2 b1 6= 0.

(b) Infinite Number of Solutions


x + 2y = 1 and 2x + 4y = 2. As both equations represent the same line, the solution
set is [x, y]T = [1 − 2y, y]T = [1, 0]T + y[−2, 1]T with y arbitrary. Observe that

i. a1 b2 − a2 b1 = 0, a1 c2 − a2 c1 = 0 and b1 c2 − b2 c1 = 0.
ii. the vector [1, 0]T corresponds to the solution x = 1, y = 0 of the given system.
iii. the vector [−2, 1]T corresponds to the solution x = −2, y = 1 of the system
x + 2y = 0, 2x + 4y = 0.
24 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS OVER R

(c) No Solution
x + 2y = 1 and 2x + 4y = 3. The equations represent a pair of parallel lines and
hence there is no point of intersection. Observe that in this case, a1 b2 − a2 b1 = 0 but
a1 c2 − a2 c1 6= 0.

3. As a last example, consider 3 equations in 3 unknowns.


A linear equation ax + by + cz = d represent a plane in R3 provided [a, b, c] 6= [0, 0, 0].
Here, we have to look at the points of intersection of the three given planes.

(a) Unique Solution


Consider the system x+y +z = 3, x+4y +2z = 7 and 4x+10y −z = 13. The unique
solution to this system is [x, y, z]T = [1, 1, 1]T , i.e., the three planes intersect
at a point.
(b) Infinite Number of Solutions
Consider the system x + y + z = 3, x + 2y + 2z = 5 and 3x + 4y + 4z = 11. The
solution set is [x, y, z]T = [1, 2 − z, z]T = [1, 2, 0]T + z[0, −1, 1]T , with z arbitrary.
Observe the following:
i. Here, the three planes intersect in a line.
T
AF

ii. The vector [1, 2, 0]T corresponds to the solution x = 1, y = 2 and z = 0 of the
linear system x + y + z = 3, x + 2y + 2z = 5 and 3x + 4y + 4z = 11. Also,
DR

the vector [0, −1, 1]T corresponds to the solution x = 0, y = −1 and z = 1 of the
linear system x + y + z = 0, x + 2y + 2z = 0 and 3x + 4y + 4z = 0.
(c) No Solution
The system x + y + z = 3, 2x + 2y + 2z = 5 and 3x + 3y + 3z = 3 has no solution. In
this case, we have three parallel planes. The readers are advised to supply the proof.

Definition 2.2.1.1. [Linear System] A system of m linear equations in n unknowns x1 , x2 , . . . , xn


is a set of equations of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. ..
. . (2.2.1.1)
am1 x1 + am2 x2 + · · · + amn xn = bm

where for 1 ≤ i ≤ n and 1 ≤ j ≤ m; aij , bi ∈ R. Linear System (2.2.1.1) is called homogeneous


if b1 = 0 = b2 = · · · = bm and non-homogeneous , otherwise.
     
a11 a12 · · · a1n x1 b1
     
 a21 a22 · · · a2n   x2   b2 
     
Let A =  . . . .  , x =  .  and b =  . . Then (2.2.1.1) can be re-written
 .. .. .. ..   ..   .. 
     
am1 am2 · · · amn xn bm
2.1. INTRODUCTION 25

as Ax = b. In this setup, the matrix A is called the coefficient matrix and the block matrix
[A b] is called the augmented matrix of the linear system (2.2.1.1).

Remark 2.2.1.2. Consider the augmented matrix [A b] of the linear system Ax = b, where A
is an m × n matrix, b and x are column vectors of appropriate size. If xT = [x1 , . . . , xn ] then
it is important to note that

1. the unknown x1 corresponds to the column ([A b])[:, 1].

2. in general, for j = 1, 2, . . . , n, the unknown xj corresponds to the column ([A b])[:, j].

3. the vector b = ([A b])[:, n + 1].

4. for i = 1, 2, . . . , m, the ith equation corresponds to the row ([A b])[i, :].

Definition 2.2.1.3. [Consistent, Inconsistent] A linear system is called consistent if it admits


a solution and is called inconsistent if it admits no solution. For example, the homogeneous
system Ax = 0 is always consistent as 0 is a solution whereas the system x + y = 2, 2x + 2y = 1
is inconsistent.

Definition 2.2.1.4. Consider the linear system Ax = b. Then the corresponding linear sys-
T
AF

tem Ax = 0 is called the associated homogeneous system. As mentioned in the previous


paragraph, the associated homogeneous system is always consistent.
DR

Definition 2.2.1.5. [Solution of a Linear System] A solution of Ax = b is a vector y such


that Ay indeed equals b. The set of all solutions
 is called the
 solution set of the system. For
    
1 1 1 3  1 
 
example, the solution set of Ax = b, with A =  1 4 2  and b =  7  equals 1 .
  
4 10 −1 13 1

The readers are advised to supply the proof of the next theorem that gives information about
the solution set of a homogeneous system.

Theorem 2.2.1.6. Consider the homogeneous linear system Ax = 0.

1. Then x = 0, the zero vector, is always a solution.

2. Let u 6= 0 be a solution of Ax = 0. Then, y = cu is also a solution for all c ∈ R.


P
k
3. Let u1 , . . . , uk be solutions of Ax = 0. Then ai ui is also a solution of Ax = 0, for all
i=1
ai ∈ R, 1 ≤ i ≤ k.

Remark 2.2.1.7. Consider the homogeneous system Ax = 0. Then

1. the vector 0 is called the trivial solution.

2. a non-zero "
solution
# is called a non-trivial
" # solution. For example, for the system Ax = 0,
1 1 1
where A = , the vector x = is a non-trivial solution.
1 1 −1
26 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS OVER R

3. Thus, by Theorem 2.2.1.6, the existence of a non-trivial solution of Ax = 0 is equivalent


to having an infinite number of solutions for the system Ax = 0.

4. Let u, v be two distinct solutions of the non-homogeneous system Ax = b. Then xh = u−v


is a solution of the homogeneous system Ax = 0. That is, any two solutions of Ax = b
differ by a solution of the associated homogeneous system Ax = 0. Or equivalently, the
solution set of Ax = b is of the form, {x0 + xh }, where x0 is a particular solution of
Ax = b and xh is a solution of the associated homogeneous system Ax = 0.

Exercise 2.2.1.8. 1. Consider a system of 2 equations in 3 unknowns. If this system is


consistent then how many solutions does it have?

2. Define a linear system of 3 equations in 2 unknowns such that the system is inconsistent.

3. Define a linear system of 4 equations in 3 unknowns such that the system is inconsistent
whereas it has three equations which form a consistent system.

4. Let Ax = b be a system of m equations and n unknowns. Then

(a) determine the possible solution set if m ≥ 3 and n = 2.


T

(b) determine the possible solution set if m ≥ 4 and n = 3.


AF

(c) can this system have exactly two distinct solutions?


DR

(d) can have only a finitely many (greater than 1) solutions?

2.1.A Elementary Row Operations

Example 2.2.1.9. Solve the linear system y + z = 2, 2x + 3z = 5, x + y + z = 3.


 
0 1 1 2
 
Solution: Let B0 = [A b], the augmented matrix. Then B0 = 
2 0 3 5. We now

1 1 1 3
systematically proceed to get the solution.

1. Interchange 1st and 2nd equation (interchange B0 [1, :] and B0 [2, :] to get B1 ).
 
2x + 3z =5 2 0 3 5
 
y+z =2 B1 = 
 0 1 1 2 .

x+y+z =3 1 1 1 3

1
2. In the new system, multiply 1st equation by 1
2 (multiply B1 [1, :] by to get B2 ).
2
 
x + 23 z = 5
2 1 0 23 5
2

y+z =2 B2 = 
0 1 1 2
.
x+y+z =3 1 1 1 3
2.1. INTRODUCTION 27

3. In the new system, replace 3rd equation by 3rd equation minus 1st equation (replace
B2 [3, :] by B2 [3, :] − B2 [1, :] to get B3 ).
 
x + 23 z = 5
2 1 0 3
2
5
2

y+z =2 B3 = 
0 1 1 2
.
y − 21 z = 1
2 0 1 − 12 1
2

4. In the new system, replace 3rd equation by 3rd equation minus 2nd equation (replace
B3 [3, :] by B3 [3, :] − B3 [2, :] to get B4 ).
 
x + 32 z = 5
2 1 0 3
2
5
2
 
y+z =2 B4 = 
0 1 1 2 .
− 23 z = − 32 0 0 − 32 −23

−2
5. In the new system, multiply 3rd equation by −2
3 (multiply B4 [3, :] by to get B5 ).
3
 
x + 32 z = 5
2 1 0 3
2
5
2

y+z =2 B5 = 
0 1 1 2
.
T

z =1 0 0 1 1
AF

The last equation gives z = 1. Using this, the second equation gives y = 1. Finally, the
DR

first equation gives x = 1. Hence, the solution set is {[x, y, z]T | [x, y, z] = [1, 1, 1]}, a unique
solution.
In Example 2.2.1.9, observe that for each operation on the system of linear equations there
is a corresponding operation on the row of the augmented matrix. We use this idea to define
elementary row operations and the equivalence of two linear systems.

Definition 2.2.1.10. [Elementary Row Operations] Let A be an m × n matrix. Then the


elementary row operations are

1. Eij : Interchange of A[i, :] and A[j, :].

2. Ek (c) for c 6= 0: Multiply A[k, :] by c.

3. Eij (c) for c 6= 0: Replace A[i, :] by A[i, :] + cA[j, :].

Definition 2.2.1.11. [Equivalent Linear Systems] Consider the linear systems Ax = b and
Cx = d with corresponding augmented matrices, [A b] and [C d], respectively. Then the two
linear systems are said to be equivalent if [C d] can be obtained from [A b] by application of
a finite number of elementary row operations.

Definition 2.2.1.12. [Equivalent Matrices] Two matrices are said to be equivalent if one can
be obtained from the other by a finite number of elementary row operations.

Thus, note that the linear systems at each step in Example 2.2.1.9 are equivalent to each other.
We now prove that the solution set of two equivalent linear systems are same.
28 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS OVER R

Lemma 2.2.1.13. Let Cx = d be the linear system obtained from Ax = b by application of a


single elementary row operation. Then Ax = b and Cx = d have the same solution set.

Proof. We prove the result for the elementary row operation Ejk (c) with c 6= 0. The reader is
advised to prove the result for the other two elementary operations.
In this case, the systems Ax = b and Cx = d vary only in the j th equation. So, we need
to show that y satisfies the j th equation of Ax = b if and only if y satisfies the j th equation
of Cx = d. So, let yT = [α , . . . , α ]. Then, the j th and kth equations of
1 n Ax = b are
aj1 α1 + · · · + ajn αn = bj and ak1 α1 + · · · + akn αn = bk . Therefore, we see that αi ’s satisfy

(aj1 + cak1 )α1 + · · · + (ajn + cakn )αn = bj + cbk . (2.2.1.2)

Also, by definition the j th equation of Cx = d equals

(aj1 + cak1 )x1 + · · · + (ajn + cakn )xn = bj + cbk . (2.2.1.3)

Therefore, using Equation (2.2.1.2), we see that yT = [α1 , . . . , αn ] is also a solution for Equation
(2.2.1.3). Now, use a similar argument to show that if zT = [β1 , . . . , βn ] is a solution of Cx = d
then it is also a solution of Ax = b. Hence, the required result follows.
T

The readers are advised to use Lemma 2.2.1.13 as an induction step to prove the main result
AF

of this subsection which is stated next.


DR

Theorem 2.2.1.14. Let Ax = b and Cx = d be two equivalent linear systems. Then they have
the same solution set.

2.2 System of Linear Equations


In the previous section, we saw that if one linear system can be obtained from another by a
repeated application of elementary row operations then the two linear systems have the same
solution set. Sometimes it helps to imagine an elementary row operation as a product on the
left by elementary matrix. In this section, we will try to understand this relationship and use
them to first obtain results for the system of linear equations and then to the theory of square
matrices.

2.2.A Elementary Matrices and the Row-Reduced Echelon Form (RREF)

Definition 2.2.2.1. A square matrix E of order n is called an elementary matrix if it is


obtained by applying exactly one elementary row operation to the identity matrix In .

Remark 2.2.2.2. The elementary matrices are of three types and they correspond to elementary
row operations.

1. Eij : Matrix obtained by applying elementary row operation Eij to In .

2. Ek (c) for c 6= 0: Matrix obtained by applying elementary row operation Ek (C) to In .


2.2. SYSTEM OF LINEAR EQUATIONS 29

3. Eij (c) for c 6= 0: Matrix obtained by applying elementary row operation Eij (c) to In .

Example 2.2.2.3.
 1. In particular,
 forn = 3 and a real number
 c 6= 0, one 
has 
1 0 0 c 0 0 1 0 0 1 0 0
       
E23 =   
0 0 1, E1 (c) = 0 1 0 , E31 (c) = 0 1 0 and E23 (c) = 0 1 c.
   
0 1 0 0 0 1 c 0 1 0 0 1
   
1 2 3 0 1 2 3 0
   
2. Let A = 
2 0 3 4  and B = 
3 4 5 6
. Then verify that
3 4 5 6 2 0 3 4
    
1 0 0 1 2 3 0 1 2 3 0
    

E23 A = 0    
0 3 4 = 3 4 5 6
0 1 2  = B.
0 1 0 3 4 5 6 2 0 3 4
     
1 1 1 1 1 1 1 1 1
3. Let A = 1 2 2. Then E21 (−1)E32 (−2)A = E21 (−1)1 2 2 = 0 1 1.
2 4 4 0 0 0 0 0 0
 
 
0 1 1 2 1 0 0 1
T
 

4. Let A = 2  
0 3 5. Then verify that E31 (−2)E13 E31 (−1)A = 0 1 1 2.
AF

1 1 1 3 0 0 3 3
DR

Exercise 2.2.2.4. 1. Which of the following matrices are elementary?


           
1
2 0 1 0 0 1 −1 0 1 0 0 0 0 1 0 0 1
  2         
0 1 0 ,  0 1 0        
    , 0 1 0 , 5 1 0 , 0 1 0 , 1 0 0 .
0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0

" #
2 1
2. Find elementary matrices E1 , . . . , Ek such that Ek · · · E1 = I2 .
1 2
 
1 1 1
 
3. Determine elementary matrices F1 , . . . , Fk such that Ek · · · E1 
 0 1 1  = I3 .

0 0 3

Remark 2.2.2.5. Observe that

1. (Eij )−1 = Eij as Eij Eij = I = Eij Eij .

2. Let c 6= 0. Then (Ek (c))−1 = Ek (1/c) as Ek (c)Ek (1/c) = I = Ek (1/c)Ek (c).

3. Let c 6= 0. Then (Eij (c))−1 = Eij (−c) as Eij (c)Eij (−c) = I = Eij (−c)Eij (c).

Thus, each elementary matrix is invertible and the inverse is also an elementary matrix.

Based on the above observation and the fact that product of invertible matrices is invertible,
the readers are advised to prove the next result.
30 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS OVER R

Lemma 2.2.2.6. Prove that applying elementary row operations is equivalent to multiplying on
the left by the corresponding elementary matrix and vice-versa.

Proposition 2.2.2.7. Let A and B be two equivalent matrices. Then prove that B = E1 · · · Ek A,
for some elementary matrices E1 , . . . , Ek .

Proof. By definition of equivalence, the matrix B can be obtained from A by a finite number
of elementary row operations. But by Lemma 2.2.2.6, each elementary row operation on A
corresponds to multiplying on the left of A by an elementary matrix. Thus, the required result
follows.
We now give a direct prove of Theorem 2.2.1.14. To do so, we state the theorem once again.

Theorem 2.2.2.8. Let Ax = b and Cx = d be two equivalent linear systems. Then they have
the same solution set.

Proof. Let E1 , . . . , Ek be the elementary matrices such that E1 · · · Ek [A b] = [C d]. Then, by


definition of matrix product and Remark 2.2.2.5

E1 · · · Ek A = C, E1 · · · Ek b = d and A = (E1 · · · Ek )−1 C, b = (E1 · · · Ek )−1 d. (2.2.2.1)


T

Now assume that Ay = b holds. Then, by Equation (2.2.2.1)


AF
DR

Cy = E1 · · · Ek Ay = E1 · · · Ek b = d. (2.2.2.2)

On the other hand if Cz = d holds then using Equation (2.2.2.1), we have

Az = (E1 · · · Ek )−1 Cz = (E1 · · · Ek )−1 d = b. (2.2.2.3)

Therefore, using Equations (2.2.2.2) and (2.2.2.3) the required result follows.
As an immediate corollary, the readers are advised to prove the following result.

Corollary 2.2.2.9. Let A and B be two equivalent matrices. Then the systems Ax = 0 and
Bx = 0 have the same solution set.
   
1 0 0 1 0 a
Example 2.2.2.10. Are the matrices A = 0 1 0 and B = 0 1 b equivalent?
0 0 1 0 0 0
 
a
Solution: No, as  b  is a solution of Bx = 0 but it isn’t a solution of Ax = 0.
−1
Definition 2.2.2.11. [Pivot/Leading Term] Let A be a non-zero matrix. Then, a pivot/leading
term is the first (from left) nonzero element of a non-zero row in A and the column containing
the pivot term is called the pivot column. If aij is a pivot then we denote it by aij . For
 
0 3 4 2
 
example, the entries a12 and a23 are pivots in A = 
0 0 0 0 . Thus, the columns 1 and
0 0 2 1
2 are pivot columns.
2.2. SYSTEM OF LINEAR EQUATIONS 31

Definition 2.2.2.12. [Echelon Form] A matrix A is in echelon form (EF) (ladder like) if

1. pivot of the (i + 1)-th row comes to the right of the i-th.

2. entries below the pivot in a pivotal column are 0.

3. the zero rows are at the bottom.


Example 2.2.2.13. 1. The following matrices are in echelon form.
     
1 2 0 5  
0 2 4 2 1 1 0 2 3 1 0 0
    0 2 0 6
0 1  4   and  0 0 .
 0 1 ,  0 0 0 3 ,  0 0 0 1
1
0 0 0 0 0 0 0 0 1 0 0 1
0 0 0 0
2. The
 following matrices
 are not in echelon 
form (determine the rule(s) that fail).
0 1 4 2 1 1 0 2 3
   
0  
0 0 and  0 0 0 0 1
 0 .
0 0 1 1 0 0 0 1 4

Definition 2.2.2.14. [Row-Reduced Echelon Form] A matrix C is said to be in the row-


reduced echelon form (RREF) if
T
1. C is already in echelon form,
AF

2. pivot of each non-zero row is 1,


DR

3. every other entry in the pivotal column is zero.

A matrix in RREF is also called a row-reduced echelon matrix.


Example 2.2.2.15. 1. The following matrices are in RREF.
     
  1 0 0 5
0 1 0 −2 0 1 3 0 1 1 0 0 0
  0 1 0 6  
0 1     and 0 0
 0 1 , 0 0 0 1 ,  0 0 1 2  0 0 1 .
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0
2. The following matrices are not in RREF (determine the rule(s) that fail).
     
0 3 3 0 0 1 3 0 0 1 3 1
0 0 0 1 , 0 0 0 0, 0 0 0 1 .
0 0 0 0 0 0 0 1 0 0 0 0
The proof of the next result is beyond the scope of this book and hence is omitted.

Theorem 2.2.2.16. Let A and B be two matrices in RREF. If they are equivalent then A = B.

As an immediate corollary, we obtain the following important result.

Corollary 2.2.2.17. The RREF of a matrix is unique.

Proof. Suppose there exists a matrix A with two different RREFs, say B and C. As the RREFs
are obtained by multiplication of elementary matrices there exist elementary matrices E1 , . . . , Ek
and F1 , . . . , Fℓ such that B = E1 · · · Ek A and C = F1 · · · Fℓ A. Thus,

B = E1 · · · Ek A = E1 · · · Ek (F1 · · · Fℓ )−1 C = E1 · · · Ek Fℓ−1 · · · F1−1 C.


32 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS OVER R

As inverse of elementary matrices are elementary matrices, we see that the matrices B and C
are equivalent. As B and C are in RREF, using Theorem 2.2.2.16, we see that B = C.

Theorem 2.2.2.18. Let A be an m × n matrix. If B consists of the first s columns of A then


RREF(B) equals the first s columns of RREF(A).

Proof. Let us write F = RREF(A). By definition of RREF, there exist elementary matrices
E1 , . . . , Ek such that E1 · · · Ek A = F . Then, by matrix multiplication

E1 · · · Ek A = [E1 · · · Ek A[:, 1], . . . , E1 · · · Ek A[:, n]] = [F [:, 1], . . . , F [:, n]].

Thus, E1 · · · Ek B = [E1 · · · Ek A[:, 1], . . . , E1 · · · Ek A[:, s]] = [F [:, 1], . . . , F [:, s]]. Since the matrix
F is in RREF, by definition, it’s first s columns are also in RREF. Hence, by Corollary 2.2.2.17
we see that RREF(B) = [F [:, 1], . . . , F [:, s]]. Thus, the required result follows.
Let A an m × n matrix. Then by Corollary 2.2.2.17, it’s RREF is unique. We use it to define
our next object.

2.2.B Rank of a Matrix


T

Definition 2.2.2.19. [Row-Rank of a Matrix] Let A be an m × n matrix and let the number of
AF

pivots (number of non-zero rows) in it’s RREF. Then the row rank of A, denoted row-rank(A),
DR

equals r. For example, row-rank(In ) = n and row-rank(0) = 0.

Remark 2.2.2.20. 1. Even though, row-rank is defined using the RREF of a matrix, we just
need to compute the echelon form as the number of non-zero rows/pivots do not change
when we proceed to compute the RREF from the echelon form.

2. Let A be an m × n matrix. Then by the definition of RREF, the number of pivots, say r,
satisfies r ≤ min{m, n}. Thus, row-rank(A) ≤ min{m, n}.

Example 2.2.2.21. Determine the row-rank of the following matrices.

1. diag(d1 , . . . , dn ).
Solution: Let S = {i | 1 ≤ i ≤ n, di 6= 0}. Then, verify that row-rank equals the number
of elements in S.
 
1 2 1 1
 
2. A =  2 3 1 2.

1 1 2 1
Solution: The echelon form of A is obtained as follows:
     
1 2 1 1 1 2 1 1 1 2 1 1
    E32 (−1)  
2 3 1 2 E31 (−1),E

21 (−2)
 0 0 −1 −1 0 .
  0 −1 −1  →  
1 1 2 1 0 −1 1 0 0 0 −2 0

As the echelon form of A has 3 non-zero rows row-rank(A) = 3.


2.2. SYSTEM OF LINEAR EQUATIONS 33
 
1 2 1 1 1
 
3. A = 
2 3 1 2 2.

1 1 0 1 1
Solution: row-rank(A) = 2 as the echelon form of A (given below) has two non-zero rows:
     
1 2 1 1 1 1 2 1 1 1 1 2 1 1 1
    (−1)  
2 3 1 2 2 E31 (−1),E→
21 (−2)
0 −1 −1 0 0 E32→ 0 −1 −1 0 0 .
     
1 1 0 1 1 0 −1 −1 0 0 0 0 0 0 0

Remark 2.2.2.22. Let Ax = b be a linear system with m equations in n unknowns. Let


RREF([A b]) = [C d], row-rank(A) = r and row-rank([A b]) = ra .
1. Then, using Theorem 2.2.2.18 conclude that r ≤ ra .
2. If r < ra then again using Theorem 2.2.2.18, note that ra = r + 1 and ([C d])[:, n + 1] has
a pivot at the (r + 1)-th place. Hence, by definition of RREF, ([C d])[r + 1, :] = [0T , 1].
3. If r = ra then ([C d])[:, n + 1] has no pivot. Thus, [0T , 1] is not a row of [C d].

Now, consider an m×n matrix A and an elementary matrix E of order n. Then the product AE
corresponds to applying column transformation on the matrix A. Therefore, for each elementary
matrix, there is a corresponding column transformation as well. We summarize these ideas as
follows.

Definition 2.2.2.23. The column transformations obtained by right multiplication of elementary
matrices are called column operations. For example, if A = $\begin{bmatrix} 1 & 2 & 3 & 1\\ 2 & 0 & 3 & 2\\ 3 & 4 & 5 & 3 \end{bmatrix}$ then
$$AE_{23} = \begin{bmatrix} 1 & 3 & 2 & 1\\ 2 & 3 & 0 & 2\\ 3 & 5 & 4 & 3 \end{bmatrix} \quad\text{and}\quad AE_{14}(-1) = \begin{bmatrix} 1 & 2 & 3 & 0\\ 2 & 0 & 3 & 0\\ 3 & 4 & 5 & 0 \end{bmatrix}.$$
Remark 2.2.2.24 (Rank of a Matrix). 1. The idea of row-rank was based on RREF and
RREF was the result of systematically applying a finite number of elementary row op-
erations. So, starting with a matrix A, we can systematically apply a finite number of
elementary column operations (see Definition 2.2.2.23) to get a matrix which in some
sense looks similar to RREF, call it say B, and then use the non-zero columns in that to
define the column-rank. Note that B will have the following form:

(a) The zero columns appear after the non-zero columns.


(b) The first non-zero entry of a non-zero column is 1, called the pivot.
(c) The pivots in non-zero column move down in successive columns.

2. It will be proved later that row-rank(A) = column-rank(A). Thus, we just talk of the
“rank”, denoted Rank(A).

We are now ready to prove a few results associated with the rank of a matrix.

Theorem 2.2.2.25. Let A be a matrix of rank r. Then there exist a finite number of elementary
matrices E1 , . . . , Es and F1 , . . . , Fℓ such that
" #
Ir 0
E1 · · · Es A F1 · · · Fℓ = .
0 0
Proof. Let C = RREF(A). As Rank(A) = r, by definition of RREF, there exist elementary
matrices E1 , . . . , Es such that C = E1 · · · Es A. Note that C has r pivots and they appear in
columns, say i1 < i2 < · · · < ir .
Now, let D be the matrix obtained from C by successively multiplying the elementary matrices
$E_{j\,i_j}$, for 1 ≤ j ≤ r, on the right of C. Then observe that $D = \begin{bmatrix} I_r & B\\ 0 & 0 \end{bmatrix}$, where B is a matrix of
an appropriate size.
As the (1, 1) block of D is an identity matrix, the block (1, 2) can be made the zero matrix by
elementary column operations to D. Thus, the required result follows.
" # " #
2 4 8 1 0 0
Exercise 2.2.2.26. 1. Let A = and B = . Find P and Q such that
1 3 2 0 1 0
B = P AQ.
2. Let A be a matrix of rank r. Then prove that there exist invertible matrices $B_i$, $C_i$ such that
$$B_1 A = \begin{bmatrix} R_1 & R_2\\ 0 & 0 \end{bmatrix},\quad AC_1 = \begin{bmatrix} S_1 & 0\\ S_3 & 0 \end{bmatrix},\quad B_2 A C_2 = \begin{bmatrix} A_1 & 0\\ 0 & 0 \end{bmatrix}\quad\text{and}\quad B_3 A C_3 = \begin{bmatrix} I_r & 0\\ 0 & 0 \end{bmatrix},$$
where the (1, 1) block of each matrix is of size r × r. Also, prove that $A_1$ is an invertible matrix.

3. Let A be an m × n matrix of rank r. Then prove that A can be written as A = BC, where
both B and C have rank r and B is of size m × r and C is of size r × n.

4. Prove that if the product AB is defined and Rank(A) = Rank(AB) then A = ABX for some
matrix X. Similarly, if BA is defined and Rank(A) = Rank(BA) then A = Y BA for some
matrix Y . [Hint: Choose invertible matrices P, Q satisfying $P AQ = \begin{bmatrix} A_1 & 0\\ 0 & 0 \end{bmatrix}$, $P(AB) = (P AQ)(Q^{-1} B) = \begin{bmatrix} A_2 & A_3\\ 0 & 0 \end{bmatrix}$. Now find an invertible matrix R such that $P(AB)R = \begin{bmatrix} C & 0\\ 0 & 0 \end{bmatrix}$. Now, define $X = R \begin{bmatrix} C^{-1} A_1 & 0\\ 0 & 0 \end{bmatrix} Q^{-1}$ to get the required result.]

5. Prove that if AB is defined then Rank(AB) ≤ Rank(A) and Rank(AB) ≤ Rank(B).

6. Let P and Q be invertible matrices such that the matrix product P AQ is defined. Then
prove that Rank(P AQ) = Rank(A).

7. Prove that if A + B is defined then Rank(A + B) ≤ Rank(A) + Rank(B).

2.2.C Gauss-Jordan Elimination and System of Linear Equations

Let A be an m × n matrix. We now present an algorithm, commonly known as the Gauss-Jordan


Elimination (GJE) method, to compute the RREF of A.

1. Input: A.
2. Output: a matrix B in RREF such that A is row equivalent to B.
3. Step 1: Put ‘Region’ = A.
4. Step 2: If all entries in the Region are 0, STOP. Else, in the Region, find the leftmost
nonzero column and find its topmost non-zero entry. Suppose this non-zero entry is aij .
Box it. This is a pivot.
5. Step 3: Interchange the row containing the pivot with the top row of the region. Also, make
the pivot entry 1. Use this pivot to make the other entries in the pivotal column 0.
6. Step 4: Put Region = the submatrix below and to the right of the current pivot. Now,
go to step 2.
Important: The process will stop, as we can get at most min{m, n} pivots.
 
Example 2.2.2.27. Apply GJE to $A = \begin{bmatrix} 0 & 2 & 3 & 7\\ 1 & 1 & 1 & 1\\ 1 & 3 & 4 & 8\\ 0 & 0 & 0 & 1 \end{bmatrix}$.
1. Region = A as A ≠ 0.
2. Then $E_{12} A = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 2 & 3 & 7\\ 1 & 3 & 4 & 8\\ 0 & 0 & 0 & 1 \end{bmatrix}$. Also, $E_{31}(-1)E_{12} A = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 2 & 3 & 7\\ 0 & 2 & 3 & 7\\ 0 & 0 & 0 & 1 \end{bmatrix} = B$ (say).
3. Now, Region $= \begin{bmatrix} 2 & 3 & 7\\ 2 & 3 & 7\\ 0 & 0 & 1 \end{bmatrix} \ne 0$. Let $C = \begin{bmatrix} 2 & 3 & 7\\ 2 & 3 & 7\\ 0 & 0 & 1 \end{bmatrix}$.
4. Then $E_1(\tfrac12)C = \begin{bmatrix} 1 & 3/2 & 7/2\\ 2 & 3 & 7\\ 0 & 0 & 1 \end{bmatrix}$ and $E_{21}(-2)E_1(\tfrac12)C = \begin{bmatrix} 1 & 3/2 & 7/2\\ 0 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}$. Or equivalently,
$E_{32}(-2)E_2(\tfrac12)B = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 3/2 & 7/2\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$.
5. Thus, $E_{34} E_{32}(-2)E_2(\tfrac12)E_{31}(-1)E_{12} A = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 3/2 & 7/2\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}$. Hence,
$$E_{12}(-1)\,E_{13}(-1)\,E_{23}(-\tfrac72)\,E_{34}\, E_{32}(-2)\,E_2(\tfrac12)\,E_{31}(-1)\,E_{12}\, A = \begin{bmatrix} 1 & 0 & -1/2 & 0\\ 0 & 1 & 3/2 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
6. As the matrix A has been multiplied with elementary matrices on the left, the RREF matrix $\begin{bmatrix} 1 & 0 & -1/2 & 0\\ 0 & 1 & 3/2 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}$ is row equivalent to A.
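Steps 1–4 of the GJE algorithm translate almost line by line into code. The sketch below is a minimal Python implementation added for illustration (the notes themselves contain no code); it works on lists of floats and uses a small tolerance to decide when an entry counts as zero.

```python
# A minimal sketch of Gauss-Jordan Elimination following Steps 1-4 above.
def rref(A, tol=1e-12):
    A = [row[:] for row in A]          # work on a copy
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):               # scan columns left to right (Step 2)
        # find the topmost non-zero entry of this column within the Region
        pr = next((r for r in range(pivot_row, m) if abs(A[r][col]) > tol), None)
        if pr is None:
            continue                   # no pivot in this column
        A[pivot_row], A[pr] = A[pr], A[pivot_row]      # Step 3: interchange rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]   # make the pivot 1
        for r in range(m):             # make the other entries in the column 0
            if r != pivot_row and abs(A[r][col]) > tol:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1                 # Step 4: shrink the Region
        if pivot_row == m:
            break
    return A

A = [[0, 2, 3, 7], [1, 1, 1, 1], [1, 3, 4, 8], [0, 0, 0, 1]]
for row in rref(A):
    print(row)   # reproduces the RREF computed in Example 2.2.2.27
```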

Remark 2.2.2.28 (Gauss Elimination (GE)). GE is similar to the GJE except that
1. the pivots need not be made 1 and
2. the entries above the pivots need not be made 0.
   
For example, if A = $\begin{bmatrix} 1 & 1 & 1\\ 1 & 2 & 3\\ 1 & 4 & 9 \end{bmatrix}$ then GE gives $E_{32}(-3)E_{21}(-1)E_{31}(-1)A = \begin{bmatrix} 1 & 1 & 1\\ 0 & 1 & 2\\ 0 & 0 & 2 \end{bmatrix}$.
Thus, Gauss-Jordan Elimination may be viewed as an extension of the Gauss Elimination.

Example 2.2.2.29. Consider the system Ax = b with A a matrix of order 3 × 3 and A[:, 1] ≠ 0.
If [C d] = RREF([A b]) then the possible choices for [C d] are given below:
1. $\begin{bmatrix} 1 & 0 & 0 & d_1\\ 0 & 1 & 0 & d_2\\ 0 & 0 & 1 & d_3 \end{bmatrix}$. Here, Ax = b is consistent with the unique solution $\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} d_1\\ d_2\\ d_3 \end{bmatrix}$.
2. $\begin{bmatrix} 1 & 0 & \alpha & 0\\ 0 & 1 & \beta & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & \alpha & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$ or $\begin{bmatrix} 1 & \alpha & \beta & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}$. Here, Ax = b is inconsistent for any
choice of α, β.
3. $\begin{bmatrix} 1 & 0 & \alpha & d_1\\ 0 & 1 & \beta & d_2\\ 0 & 0 & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & \alpha & 0 & d_1\\ 0 & 0 & 1 & d_2\\ 0 & 0 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} 1 & \alpha & \beta & d_1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$. Here, Ax = b is consistent and has an
infinite number of solutions for every choice of α, β.

Exercise 2.2.2.30. 1. Let Ax = b be a linear system of m equations in 2 unknowns. What


are the possible choices for RREF([A b]) if m ≥ 1?

2. Find the row-reduced echelon form of the following matrices:
$$\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 3\\ 3 & 0 & 7 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 1 & 3\\ 0 & 0 & 1 & 3\\ 1 & 1 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & -1 & 1\\ -2 & 0 & 3\\ -5 & 1 & 0 \end{bmatrix},\quad \begin{bmatrix} -1 & -1 & -2 & 3\\ 3 & 3 & -3 & -3\\ 1 & 1 & 2 & 2\\ -1 & -1 & 2 & -2 \end{bmatrix}.$$

3. Find the solutions of the linear system of equations using Gauss-Jordan method.

x +y −2u +v = 2
z +u +2v = 3
v +w = 3
v +2w = 5

Now, using Proposition 2.2.2.7, Theorem 2.2.2.8 and the definition of RREF of a matrix, we
obtain the following remark.

Remark 2.2.2.31. Let RREF([A b]) = [C d]. Then



1. there exist elementary matrices, say E1 , . . . , Ek , such that E1 · · · Ek [A b] = [C d]. Thus,


the GJE (or the GE) is equivalent to multiplying by a finite number of elementary matrices
on the left of [A b].
2. by Theorem 2.2.2.18 RREF(A) = C.

Definition 2.2.2.32. [Basic, Free Variables] Consider the linear system Ax = b. If RREF([A b]) =
[C d] then the unknowns

1. corresponding to the pivotal columns of C are called the basic variables.

2. that are not basic are called free variables.


 
Example 2.2.2.33. 1. Let RREF([A b]) = $\begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 1 & 2\\ 0 & 0 & 0 & 0 \end{bmatrix}$. Then, Ax = b has
$\{[x, y, z]^T \mid [x, y, z] = [1, 2 - z, z] = [1, 2, 0] + z[0, -1, 1],\ z \text{ arbitrary}\}$
as its solution set. Note that x and y are basic variables and z is the free variable.
 
2. Let RREF([A b]) = $\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$. Then, the system Ax = b has no solution as
RREF([A b])[3, :] = [0, 0, 0, 1], which corresponds to the equation 0 · x + 0 · y + 0 · z = 1.

3. Suppose the system Ax = b is consistent and RREF(A) has r non-zero rows. Then the
system has r basic variables and n − r free variables.
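The basic/free split can also be read off mechanically. The sketch below assumes SymPy is available (an assumption of this illustration, not a tool of the notes); Matrix.rref() returns the RREF together with the indices of the pivotal columns, so their complement gives the free variables.

```python
# Identifying basic and free variables, as in Example 2.2.2.33.1 (SymPy sketch).
from sympy import Matrix

aug = Matrix([[1, 0, 0, 1],
              [0, 1, 1, 2],
              [0, 0, 0, 0]])          # this is RREF([A b]) from item 1 above
C = aug[:, :3]                        # coefficient part C
_, pivot_cols = C.rref()              # indices of the pivotal columns

basic = set(pivot_cols)                   # {0, 1}: x and y are basic variables
free = set(range(C.shape[1])) - basic     # {2}:    z is the free variable
print(sorted(basic), sorted(free))
```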

We now prove the main result in the theory of linear system (recall Remark 2.2.2.22).

Theorem 2.2.2.34. Let A be an m × n matrix and let RREF([A b]) = [C d], Rank(A) = r
and Rank([A b]) = ra . Then Ax = b

1. is inconsistent if r < ra

2. is consistent if r = ra . Furthermore, Ax = b has

(a) a unique solution if r = n.


(b) infinite number of solutions if r < n. In this case, the solution set equals

{x0 + k1 u1 + k2 u2 + · · · + kn−r un−r | ki ∈ R, 1 ≤ i ≤ n − r},

where x0 , u1 , . . . , un−r ∈ Rn with Ax0 = b and Aui = 0, for 1 ≤ i ≤ n − r.

Proof. Part 1: As r < ra , by Remark 2.2.2.22 ([C d])[r + 1, :] = [0T , 1]. Then, this row
corresponds to the linear equation

0 · x1 + 0 · x2 + · · · + 0 · xn = 1

which clearly has no solution. Thus, by definition and Theorem 2.2.1.14, Ax = b is inconsistent.

Part 2: As r = ra , by Remark 2.2.2.22, [C d] doesn’t have a row of the form [0T , 1] and there
are r pivots in C. Suppose the pivots appear in columns i1 , . . . , ir with 1 ≤ i1 < · · · < ir ≤ n.
Thus, the unknowns xij , for 1 ≤ j ≤ r, are basic variables and the remaining n − r variables,
say xt1 , . . . , xtn−r , are free variables with t1 < · · · < tn−r . Since C is in RREF, in terms of the
free variables and basic variables, the ℓ-th row of [C d], for ℓ, 1 ≤ ℓ ≤ r, corresponds to the
equation
$$x_{i_\ell} + \sum_{k=1}^{n-r} c_{\ell t_k} x_{t_k} = d_\ell \iff x_{i_\ell} = d_\ell - \sum_{k=1}^{n-r} c_{\ell t_k} x_{t_k}.$$

Hence, the solution set of the system Cx = d is given by
$$\begin{bmatrix} x_{i_1}\\ \vdots\\ x_{i_r}\\ x_{t_1}\\ x_{t_2}\\ \vdots\\ x_{t_{n-r}} \end{bmatrix} = \begin{bmatrix} d_1 - \sum_{k=1}^{n-r} c_{1 t_k} x_{t_k}\\ \vdots\\ d_r - \sum_{k=1}^{n-r} c_{r t_k} x_{t_k}\\ x_{t_1}\\ x_{t_2}\\ \vdots\\ x_{t_{n-r}} \end{bmatrix} = \begin{bmatrix} d_1\\ \vdots\\ d_r\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix} + x_{t_1}\begin{bmatrix} -c_{1 t_1}\\ \vdots\\ -c_{r t_1}\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix} + x_{t_2}\begin{bmatrix} -c_{1 t_2}\\ \vdots\\ -c_{r t_2}\\ 0\\ 1\\ \vdots\\ 0 \end{bmatrix} + \cdots + x_{t_{n-r}}\begin{bmatrix} -c_{1 t_{n-r}}\\ \vdots\\ -c_{r t_{n-r}}\\ 0\\ 0\\ \vdots\\ 1 \end{bmatrix}. \qquad (2.2.2.4)$$

Thus, by Theorem 2.2.1.14 the system Ax = b is consistent. In case of Part 2a, r = n and hence
there are no free variables. Thus, the unique solution equals $x_i = d_i$, for 1 ≤ i ≤ n.
In case of Part 2b, define $x_0 = [d_1, \ldots, d_r, 0, 0, \ldots, 0]^T$ and
$u_1 = [-c_{1 t_1}, \ldots, -c_{r t_1}, 1, 0, \ldots, 0]^T,\ \ldots,\ u_{n-r} = [-c_{1 t_{n-r}}, \ldots, -c_{r t_{n-r}}, 0, \ldots, 0, 1]^T$. Then, it can
be easily verified that Ax0 = b and for 1 ≤ i ≤ n − r, Aui = 0 and by Equation (2.2.2.4) the
solution set has indeed the required form, where ki corresponds to the free variable xti . As there
is at least one free variable the system has infinite number of solutions. Thus, the proof of the
theorem is complete.
Let A be an m × n matrix. Then by Remark 2.2.2.20, Rank(A) ≤ m and hence using Theo-
rem 2.2.2.34 the next result follows.

Corollary 2.2.2.35. Let A be a matrix of order m × n and consider Ax = 0. If


1. Rank(A) = r < min{m, n} then Ax = 0 has infinitely many solutions.
2. m < n, then Ax = 0 has infinitely many solutions.

Thus, in either case, the homogeneous system Ax = 0 has at least one non-trivial solution.

Remark 2.2.2.36. Let A be an m × n matrix. Then Theorem 2.2.2.34 implies that

1. the linear system Ax = b is consistent if and only if Rank(A) = Rank([A b]).



2. the vectors associated to the free variables in Equation (2.2.2.4) are solutions to the asso-
ciated homogeneous system Ax = 0.
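The structure "particular solution plus solutions of the homogeneous system" can be checked numerically on any consistent example. The NumPy sketch below is illustrative (the matrix is chosen only for this example, and NumPy itself is an assumption): a least-squares solve gives one particular solution, the SVD gives a null-space basis, and shifting by null-space vectors leaves Ax unchanged.

```python
# A numerical illustration of Theorem 2.2.2.34.2b.
import numpy as np

A = np.array([[1., 2., 1., 1.],
              [2., 3., 1., 2.],
              [1., 1., 0., 1.]])
b = np.array([4., 7., 3.])           # a consistent right-hand side for this A

x0 = np.linalg.lstsq(A, b, rcond=None)[0]      # one particular solution
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-10):]            # rows spanning the null space of A

print(np.allclose(A @ x0, b))                  # True: x0 solves Ax = b
for u in null_basis:
    print(np.allclose(A @ (x0 + 2.5 * u), b))  # True: x0 + k*u also solves Ax = b
```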

We end this subsection with some applications.

Example 2.2.2.37. 1. Determine the equation of the line/circle that passes through the
points (−1, 4), (0, 1) and (1, 4).
Solution: The general equation of a line/circle in the Euclidean plane is given by a(x^2 + y^2) + bx + cy + d = 0, where a, b, c and d are unknowns. Since this curve passes through
the given points, we get a homogeneous system in 3 equations and 4 unknowns, namely
$$\begin{bmatrix} (-1)^2 + 4^2 & -1 & 4 & 1\\ 0^2 + 1^2 & 0 & 1 & 1\\ 1^2 + 4^2 & 1 & 4 & 1 \end{bmatrix}\begin{bmatrix} a\\ b\\ c\\ d \end{bmatrix} = 0.$$
Solving this system, we get $[a, b, c, d] = [\tfrac{3}{13}d,\ 0,\ -\tfrac{16}{13}d,\ d]$.
Hence, taking d = 13, the equation of the required circle is 3(x^2 + y^2) − 16y + 13 = 0.

2. Determine the equation of the plane that contains the points (1, 1, 1), (1, 3, 2) and (2, −1, 2).
Solution: The general equation of a plane in space is given by ax + by + cz + d = 0, where
a, b, c and d are unknowns. Since this plane passes through the 3 given points, we get a
homogeneous system in 3 equations and 4 unknowns. So, it has a non-trivial solution,
namely $[a, b, c, d] = [-\tfrac{4}{3}d,\ -\tfrac{1}{3}d,\ \tfrac{2}{3}d,\ d]$. Hence, taking d = 3, the equation of the required
plane is −4x − y + 2z + 3 = 0.
3. Let A = $\begin{bmatrix} 2 & 3 & 4\\ 0 & -1 & 0\\ 0 & -3 & 4 \end{bmatrix}$. Then
(a) find a non-zero x ∈ R^3 such that Ax = 2x.
(b) does there exist a non-zero vector y ∈ R^3 such that Ay = 4y?
Solution of Part 3a: Solving Ax = 2x is equivalent to solving (A − 2I)x = 0, whose augmented matrix equals $\begin{bmatrix} 0 & 3 & 4 & 0\\ 0 & -3 & 0 & 0\\ 0 & -3 & 2 & 0 \end{bmatrix}$. Verify that $x = [1, 0, 0]^T$ is a non-zero solution.
Part 3b: As above, Ay = 4y is equivalent to solving (A − 4I)y = 0, whose augmented matrix equals $\begin{bmatrix} -2 & 3 & 4 & 0\\ 0 & -5 & 0 & 0\\ 0 & -3 & 0 & 0 \end{bmatrix}$. Now, verify that $y = [2, 0, 1]^T$ is a non-zero solution.
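A quick numerical check of Part 3 (an illustrative NumPy sketch; NumPy is an assumption and is not used in the notes):

```python
# Verifying the answers to Parts 3a and 3b numerically.
import numpy as np

A = np.array([[2, 3, 4],
              [0, -1, 0],
              [0, -3, 4]])
x = np.array([1, 0, 0])
y = np.array([2, 0, 1])

print(np.array_equal(A @ x, 2 * x))   # True: Ax = 2x
print(np.array_equal(A @ y, 4 * y))   # True: Ay = 4y
```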

Exercise 2.2.2.38. 1. In the first part of this chapter 3 figures (see Figure ??) were given to
illustrate different cases in Euclidean plane (2 equations in 2 unknowns). It is well known
that in the case of Euclidean space (3 equations in 3 unknowns) there

(a) is a way to place the 3 planes so that the system has a unique solution.
(b) are 4 distinct ways to place the 3 planes so that the system has no solution.

(c) are 3 distinct ways to place the 3 planes so that the system has an infinite number of
solutions.

Determine the position of planes by drawing diagram to explain the above cases. Do these
diagrams and the RREF matrices that appear in Example 2.2.2.29 have any relationship?
Justify your answer.

2. Determine the equation of the curve y = ax2 + bx + c that passes through the points
(−1, 4), (0, 1) and (1, 4).

3. Solve the following linear systems.

(a) x + y + z + w = 0, x − y + z + w = 0 and −x + y + 3z + 3w = 0.
(b) x + 2y = 1, x + y + z = 4 and 3y + 2z = 1.
(c) x + y + z = 3, x + y − z = 1 and x + y + 7z = 6.
(d) x + y + z = 3, x + y − z = 1 and x + y + 4z = 6.
(e) x + y + z = 3, x + y − z = 1, x + y + 4z = 6 and x + y − 4z = −1.

4. For what values of c and k do the following systems have i) no solution, ii) a unique solution
and iii) an infinite number of solutions?

(a) x + y + z = 3, x + 2y + cz = 4, 2x + 3y + 2cz = k.
(b) x + y + z = 3, x + y + 2cz = 7, x + 2y + 3cz = k.
(c) x + y + 2z = 3, x + 2y + cz = 5, x + 2y + 4z = k.
(d) kx + y + z = 1, x + ky + z = 1, x + y + kz = 1.
(e) x + 2y − z = 1, 2x + 3y + kz = 3, x + ky + 3z = 2.
(f ) x − 2y = 1, x − y + kz = 1, ky + 4z = 6.

5. For what values of a do the following systems have i) no solution, ii) a unique solution
and iii) an infinite number of solutions?

(a) x + 2y + 3z = 4, 2x + 5y + 5z = 6, 2x + (a2 − 6)z = a + 20.


(b) x + y + z = 3, 2x + 5y + 4z = a, 3x + (a2 − 8)z = 12.

6. Find the condition(s) on x, y, z so that the system of linear equations given below (in the
unknowns a, b and c) is consistent?

(a) a + 2b − 3c = x, 2a + 6b − 11c = y, a − 2b + 7c = z
(b) a + b + 5c = x, a + 3c = y, 2a − b + 4c = z
(c) a + 2b + 3c = x, 2a + 4b + 6c = y, 3a + 6b + 9c = z

7. Let A be an n × n matrix. If the system A^2 x = 0 has a non-trivial solution then show that
Ax = 0 also has a non-trivial solution.

8. Prove that 5 distinct points are needed to specify a general conic in Euclidean plane.

9. Let u = (1, 1, −2)T and v = (−1, 2, 3)T . Find condition on x, y and z such that the system
cu + dv = (x, y, z)T in the unknowns c and d is consistent.

10. Consider the linear system Ax = b in m equations and 3 unknowns. Then, for each of
the given solution set, determine the possible choices of m? Further, for each choice of m,
determine a choice of A and b.
(a) (1, 1, 1)T is the only solution.
(b) (1, 1, 1)T is the only solution.
(c) {(1, 1, 1)T + c(1, 2, 1)T |c ∈ R} as the solution set.
(d) {c(1, 2, 1)T |c ∈ R} as the solution set.
(e) {(1, 1, 1)T + c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.
(f ) {c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.

2.3 Square Matrices and System of Linear Equations


In this section the coefficient matrix of the linear system Ax = b will be a square matrix. We
start with proving a few equivalent conditions that relate different ideas.

Theorem 2.2.3.1. Let A be a square matrix of order n. Then the following statements are
equivalent.

1. A is invertible.

2. The homogeneous system Ax = 0 has only the trivial solution.

3. Rank(A) = n.

4. The RREF of A is In .

5. A is a product of elementary matrices.

Proof. 1 =⇒ 2: As A is invertible, A−1 exists and A−1 A = In . So, if x0 is any solution of
the homogeneous system Ax = 0, then

x0 = In · x0 = (A−1 A)x0 = A−1 (Ax0 ) = A−1 0 = 0.

Hence, 0 is the only solution of the homogeneous system Ax = 0.


2 =⇒ 3: If possible, let Rank(A) = r < n. Then, by Corollary 2.2.2.35, the homogeneous
system Ax = 0 has infinitely many solutions, a contradiction. Thus, A has full rank.
3 =⇒ 4 Suppose Rank(A) = n and let B = RREF(A). As B = [bij ] is a square matrix of
order n, each column of B contains a pivot. Since the pivots move to the right as we go down
the row, the i-th pivot must be at position bii , for 1 ≤ i ≤ n. Thus, B = In and hence, the
RREF of A is In .

4 =⇒ 5 Suppose RREF(A) = In . Then using Proposition 2.2.2.7, there exist elementary


matrices E1 , . . . , Ek such that E1 · · · Ek A = In . Or equivalently,

A = (E1 · · · Ek )−1 = Ek−1 · · · E1−1 (2.2.3.1)

which gives the desired result as by Remark 2.2.2.5 we know that the inverse of an elementary
matrix is also an elementary matrix.
5 =⇒ 1 Suppose A = E1 · · · Ek for some elementary matrices E1 , . . . , Ek . As the elemen-
tary matrices are invertible (see Remark 2.2.2.5) and the product of invertible matrices is also
invertible, we get the required result.
As an immediate consequence of Theorem 2.2.3.1, we have the following important result which
implies that one needs to compute either the left or the right inverse to prove invertibility.

Corollary 2.2.3.2. Let A be a square matrix of order n. Suppose there exists a matrix

1. C such that CA = In . Then A−1 exists.


2. B such that AB = In . Then A−1 exists.

Proof. Part 1: Let CA = In for some matrix C and let x0 be a solution of the homogeneous
system Ax = 0. Then Ax0 = 0 and

x0 = In · x0 = (CA)x0 = C(Ax0 ) = C 0 = 0.

Thus, the homogeneous system Ax = 0 has only the trivial solution. Hence, using Theo-
rem 2.2.3.1, the matrix A is invertible.
Part 2: Using the first part, B is invertible. Hence, B −1 = A or equivalently A−1 = B and
thus A is invertible as well.
Another important consequence of Theorem 2.2.3.1 is stated next which uses Equation (2.2.3.1)
to get the required result. This result is used to compute the inverse of a matrix using the Gauss-
Jordan Elimination.

Corollary 2.2.3.3. Let A be an invertible matrix of order n. Suppose there exist elementary
matrices E1 , . . . , Ek such that E1 · · · Ek A = In . Then A−1 = E1 · · · Ek .

Remark 2.2.3.4. Let A be an n × n matrix. Apply GJE to [A In ] and let RREF([A In ]) =


[B C]. If B = In , then A−1 = C or else A is not invertible.
 
Example 2.2.3.5. Use GJE to find the inverse of A = $\begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}$.
 
Solution: Applying GJE to $[A \mid I_3] = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0 & 1 & 0\\ 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}$ gives
$$[A \mid I_3] \xrightarrow{E_{13}} \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 1\\ 0 & 1 & 1 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 \end{bmatrix} \xrightarrow{E_{13}(-1),\ E_{23}(-1)} \begin{bmatrix} 1 & 1 & 0 & -1 & 0 & 1\\ 0 & 1 & 0 & -1 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 \end{bmatrix} \xrightarrow{E_{12}(-1)} \begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 1\\ 0 & 1 & 0 & -1 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 \end{bmatrix}.$$
Thus, $A^{-1} = \begin{bmatrix} 0 & -1 & 1\\ -1 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix}$.
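Remark 2.2.3.4 can be turned into a short program: row-reduce [A | I] and read off the right block. The NumPy sketch below is illustrative only (NumPy and the partial-pivoting choice are assumptions of this illustration) and reproduces the inverse computed in Example 2.2.3.5.

```python
# Computing an inverse by applying GJE to [A | I], as in Remark 2.2.3.4.
import numpy as np

def inverse_by_gje(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the augmented matrix [A | I]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # pick a non-zero pivot in column i
        if abs(M[p, i]) < 1e-12:
            raise ValueError("A is not invertible")
        M[[i, p]] = M[[p, i]]                # interchange rows
        M[i] /= M[i, i]                      # make the pivot 1
        for r in range(n):                   # clear the rest of the column
            if r != i:
                M[r] -= M[r, i] * M[i]
    return M[:, n:]                          # RREF([A | I]) = [I | A^{-1}]

A = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
print(inverse_by_gje(A))   # matches the inverse found in Example 2.2.3.5
```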

Exercise 2.2.3.6. 1. Find


 the inverse
 ofthe following
 matrices
 using
 GJE.
1 2 3 1 3 3 2 1 1 0 0 2
       
(i)      
1 3 2 (ii) 2 3 2 (iii) 1 2 1 (iv) 0 2
 1.
2 4 7 2 4 7 1 1 2 2 1 1
2. Let A and B be two matrices having positive entries and of order 1 × 2 and 2 × 1, respectively.
Which of BA or AB is invertible? Give reasons.

3. Let A be an n × m matrix and B be an m × n matrix. Prove that

(a) I − BA is invertible if and only if I − AB is invertible [Use Theorem 2.2.3.1.2].


(b) if I − AB is invertible then (I − BA)−1 = I + B(I − AB)−1 A.
(c) if I − AB is invertible then (I − BA)−1 B = B(I − AB)−1 .
(d) if A, B and A + B are invertible then (A−1 + B −1 )−1 = A(A + B)−1 B.

4. Let A be a square matrix. Then prove that


(a) A is invertible if and only if AT A is invertible.
(b) A is invertible if and only if AAT is invertible.

We end this section by giving two more equivalent conditions for a matrix to be invertible.

Theorem 2.2.3.7. The following statements are equivalent for an n × n matrix A.

1. A is invertible.

2. The system Ax = b has a unique solution for every b.

3. The system Ax = b is consistent for every b.

Proof. 1 =⇒ 2 Note that x0 = A−1 b is the unique solution of Ax = b.


2 =⇒ 3 The system is consistent as Ax = b has a solution.

3 =⇒ 1 For 1 ≤ i ≤ n, define eTi = In [i, :]. By assumption, the linear system Ax = ei has
a solution, say xi , for 1 ≤ i ≤ n. Define a matrix B = [x1 , . . . , xn ]. Then

AB = A[x1 , x2 . . . , xn ] = [Ax1 , Ax2 . . . , Axn ] = [e1 , e2 . . . , en ] = In .

Therefore, by Corollary 2.2.3.2, the matrix A is invertible.


We now give an immediate application of Theorem 2.2.3.7 and Theorem 2.2.3.1 without proof.

Theorem 2.2.3.8. The following two statements cannot hold together for an n × n matrix A.

1. The system Ax = b has a unique solution for every b.

2. The system Ax = 0 has a non-trivial solution.

Exercise 2.2.3.9. 1. Let A and B be square matrices of order n with B = P A, for an


invertible matrix P . Then prove that A is invertible if and only if B is invertible.

2. Let A and B be two m × n matrices. Then prove that A and B are equivalent if and only
if B = P A, where P is product of elementary matrices. When is this P unique?
3. Let bT = [1, 2, −1, −2]. Suppose A is a 4 × 4 matrix such that the linear system Ax = b
has no solution. Mark each of the statements given below as true or false.

(a) The homogeneous system Ax = 0 has only the trivial solution.

(b) The matrix A is invertible.


(c) Let cT = [−1, −2, 1, 2]. Then the system Ax = c has no solution.
(d) Let B = RREF(A). Then

i. B[4, :] = [0, 0, 0, 0].


ii. B[4, :] = [0, 0, 0, 1].
iii. B[3, :] = [0, 0, 0, 0].
iv. B[3, :] = [0, 0, 0, 1].
v. B[3, :] = [0, 0, 1, α], where α is any real number.

2.3.A Determinant

In this section, we associate a number with each square matrix. To start with, let A be an n × n

matrix. Then, for 1 ≤ i, j ≤ k and 1 ≤ αi , βj ≤ n, by A(α1 , . . . , αk β1 , . . . , βℓ ) we mean that
submatrix of A that is obtained by deleting the rows corresponding to αi ’s and the columns
corresponding to βj ’s of A.
 
Example 2.2.3.10. For A = $\begin{bmatrix} 1 & 2 & 3\\ 1 & 3 & 2\\ 2 & 4 & 7 \end{bmatrix}$, A(1 | 2) = $\begin{bmatrix} 1 & 2\\ 2 & 7 \end{bmatrix}$ and A(1, 2 | 1, 3) = [4].

With the notations as above, we are ready to give an inductive definition of the determinant
of a square matrix. The advanced students can find the actual definition of the determinant
in Appendix 7.7.1.22, where it is proved that the definition given below corresponds to the
expansion of determinant along the first row.

Definition 2.2.3.11. [Determinant] Let A be a square matrix of order n. Then the determinant
of A, denoted det(A) (or | A |), is defined by
$$\det(A) = \begin{cases} a, & \text{if } A = [a]\ (n = 1),\\[4pt] \sum\limits_{j=1}^{n} (-1)^{1+j}\, a_{1j}\, \det\!\big(A(1 \mid j)\big), & \text{otherwise.} \end{cases}$$

Example 2.2.3.12. 1. Let A = [−2]. Then det(A) = | A | = −2.


" #
a b
2. Let A = . Then, det(A) = | A | = a det(A(1 | 1)) − b det(A(1 | 2)) = ad − bc. For
c d
" #
1 2 1 2

example, if A = then det(A) = = 1 · 5 − 2 · 3 = −1.
3 5 3 5

3. Let A = [aij ] be a 3 × 3 matrix. Then,
$$\det(A) = |A| = a_{11}\det(A(1|1)) - a_{12}\det(A(1|2)) + a_{13}\det(A(1|3)) = a_{11}\begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{vmatrix} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}).$$
For A = $\begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 1 & 2 & 2 \end{bmatrix}$, $|A| = 1\cdot\begin{vmatrix} 3 & 1\\ 2 & 2 \end{vmatrix} - 2\cdot\begin{vmatrix} 2 & 1\\ 1 & 2 \end{vmatrix} + 3\cdot\begin{vmatrix} 2 & 3\\ 1 & 2 \end{vmatrix} = 4 - 2(3) + 3(1) = 1$.
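Definition 2.2.3.11 is a recursion on the order of the matrix, so it can be transcribed directly into code. The Python sketch below is illustrative only (the notes contain no code): cofactor expansion costs O(n!) operations and is practical only for very small matrices.

```python
# A direct transcription of Definition 2.2.3.11: expansion along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A(1 | j+1): delete the first row and the (j+1)-th column
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 5]]))                    # -1, as in Example 2.2.3.12.2
print(det([[1, 2, 3], [2, 3, 1], [1, 2, 2]]))   # 1,  as in Example 2.2.3.12.3
```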

Exercise 2.2.3.13. Find the determinant of the following matrices.
$$\text{i) } \begin{bmatrix} 1 & 2 & 7 & 8\\ 0 & 4 & 3 & 2\\ 0 & 0 & 2 & 3\\ 0 & 0 & 0 & 5 \end{bmatrix}\quad \text{ii) } \begin{bmatrix} 3 & 0 & 0 & 1\\ 0 & 2 & 0 & 5\\ 6 & -7 & 1 & 0\\ 3 & 2 & 0 & 6 \end{bmatrix}\quad \text{iii) } \begin{bmatrix} 1 & a & a^2\\ 1 & b & b^2\\ 1 & c & c^2 \end{bmatrix}.$$

Definition 2.2.3.14. [Singular, Non-Singular] A matrix A is said to be singular if det(A) = 0
and is called non-singular if det(A) ≠ 0.

The next result relates the determinant with row operations. For proof, see Appendix 7.2.

Theorem 2.2.3.15. Let A be an n × n matrix. If

1. B = Eij A, for 1 ≤ i ≠ j ≤ n, then det(B) = − det(A).

2. B = Ei (c)A, for c ≠ 0, 1 ≤ i ≤ n, then det(B) = c det(A).

3. B = Eij (c)A, for c ≠ 0 and 1 ≤ i ≠ j ≤ n, then det(B) = det(A).



4. A[i, :]T = 0, for some i, 1 ≤ i ≤ n, then det(A) = 0.

5. A[i, :] = A[j, :] for some 1 ≤ i ≠ j ≤ n, then det(A) = 0.

6. A is a triangular matrix with d1 , . . . , dn on the diagonal then det(A) = d1 · · · dn .




Example 2.2.3.16. 1. Since
$$\begin{vmatrix} 2 & 2 & 6\\ 1 & 3 & 2\\ 1 & 1 & 2 \end{vmatrix} \xrightarrow{E_1(\frac12)} \begin{vmatrix} 1 & 1 & 3\\ 1 & 3 & 2\\ 1 & 1 & 2 \end{vmatrix} \xrightarrow{E_{21}(-1),\ E_{31}(-1)} \begin{vmatrix} 1 & 1 & 3\\ 0 & 2 & -1\\ 0 & 0 & -1 \end{vmatrix},$$
using Theorem 2.2.3.15, we see that, for A = $\begin{bmatrix} 2 & 2 & 6\\ 1 & 3 & 2\\ 1 & 1 & 2 \end{bmatrix}$, det(A) = 2 · (1 · 2 · (−1)) = −4, where
the first 2 appears from the elementary matrix $E_1(\frac12)$.
 
2. For A = $\begin{bmatrix} 2 & 2 & 6 & 8\\ 1 & 1 & 2 & 4\\ 1 & 3 & 2 & 6\\ 3 & 3 & 5 & 8 \end{bmatrix}$ verify that $|A| \xrightarrow{E_1(\frac12)} \begin{vmatrix} 1 & 1 & 3 & 4\\ 1 & 1 & 2 & 4\\ 1 & 3 & 2 & 6\\ 3 & 3 & 5 & 8 \end{vmatrix}$. Now, a successive
application of $E_{21}(-1)$, $E_{31}(-1)$ and $E_{41}(-3)$ gives $\begin{vmatrix} 1 & 1 & 3 & 4\\ 0 & 0 & -1 & 0\\ 0 & 2 & -1 & 2\\ 0 & 0 & -4 & -4 \end{vmatrix}$ and then applying $E_{32}$
and $E_{43}(-4)$, we get $\begin{vmatrix} 1 & 1 & 3 & 4\\ 0 & 2 & -1 & 2\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -4 \end{vmatrix}$. Thus, by Theorem 2.2.3.15, det(A) = 2 · (−1) · (8) = −16,
as 2 gets contributed due to $E_1(\frac12)$ and −1 due to $E_{32}$.
Exercise 2.2.3.17. Use Theorem 2.2.3.15 to arrive at the answer.
     
1. Let A = $\begin{bmatrix} a & b & c\\ e & f & g\\ h & j & \ell \end{bmatrix}$, B = $\begin{bmatrix} a & b & c\\ e & f & g\\ \alpha h & \alpha j & \alpha \ell \end{bmatrix}$ and $C^T = \begin{bmatrix} a & e & \alpha a + \beta e + h\\ b & f & \alpha b + \beta f + j\\ c & g & \alpha c + \beta g + \ell \end{bmatrix}$ for some
complex numbers α and β. Prove that det(B) = α det(A) and det(C) = det(A).
complex numbers α and β. Prove that det(B) = α det(A) and det(C) = det(A).


2. Prove that 3 divides $\begin{vmatrix} 1 & 2 & 1\\ 3 & 3 & 5\\ 2 & 1 & 3 \end{vmatrix}$ and that $\begin{vmatrix} 1 & -1 & 0\\ -1 & 0 & 1\\ 0 & 1 & -1 \end{vmatrix} = 0$.

By Theorem 2.2.3.15.6 det(In ) = 1. The next result about the determinant of the elementary
matrices is an immediate consequence of Theorem 2.2.3.15 and hence the proof is omitted.

Corollary 2.2.3.18. Fix a positive integer n. Then

1. det(Eij ) = −1.

2. For c ≠ 0, det(Ek (c)) = c.

3. For c ≠ 0, det(Eij (c)) = 1.

Remark 2.2.3.19. Theorem 2.2.3.15.1 implies that the determinant can be calculated by ex-
panding along any row. Hence, the readers are advised to verify that
$$\det(A) = \sum_{j=1}^{n} (-1)^{k+j}\, a_{kj}\, \det(A(k \mid j)), \quad \text{for } 1 \le k \le n.$$

Example 2.2.3.20. Let us use Remark 2.2.3.19 to compute the determinant.
1. $\begin{vmatrix} 2 & 2 & 6\\ 0 & 1 & 2\\ 1 & 2 & 1 \end{vmatrix} = (-1)^{2+2}\cdot\begin{vmatrix} 2 & 6\\ 1 & 1 \end{vmatrix} + (-1)^{2+3}\cdot 2\cdot\begin{vmatrix} 2 & 2\\ 1 & 2 \end{vmatrix} = (2-6) - 2(4-2) = -8.$
2. $\begin{vmatrix} 2 & 2 & 1\\ 0 & 1 & 0\\ 1 & 2 & 1 \end{vmatrix} = (-1)^{2+2}\cdot\begin{vmatrix} 2 & 1\\ 1 & 1 \end{vmatrix} = 2 - 1 = 1.$
3. $\begin{vmatrix} 2 & 2 & 6 & 1\\ 0 & 0 & 2 & 1\\ 0 & 1 & 2 & 0\\ 1 & 2 & 1 & 1 \end{vmatrix} = (-1)^{2+3}\cdot 2\cdot\begin{vmatrix} 2 & 2 & 1\\ 0 & 1 & 0\\ 1 & 2 & 1 \end{vmatrix} + (-1)^{2+4}\cdot\begin{vmatrix} 2 & 2 & 6\\ 0 & 1 & 2\\ 1 & 2 & 1 \end{vmatrix} = -2\cdot 1 + (-8) = -10.$

Remark 2.2.3.21. 1. Let $u^T = [u_1, u_2]$, $v^T = [v_1, v_2] \in \mathbb{R}^2$. Now, consider the parallelogram
on vertices $P = [0, 0]^T$, $Q = u$, $R = u + v$ and $S = v$ (see Figure 3). Then Area(PQRS) =
$|u_1 v_2 - u_2 v_1|$, the absolute value of $\begin{vmatrix} u_1 & u_2\\ v_1 & v_2 \end{vmatrix}$.

[Figure 3: Parallelepiped with vertices P, Q, R and S as base.]

Recall that the dot product of $u^T$, $v^T$ is denoted $u \bullet v = u_1 v_1 + u_2 v_2$, the length of u is denoted
$\ell(u) = \sqrt{u_1^2 + u_2^2}$, and $\cos(\theta) = \frac{u \bullet v}{\ell(u)\ell(v)}$, where θ is the angle between u and v. Therefore
$$\text{Area}(PQRS) = \ell(u)\ell(v)\sin(\theta) = \ell(u)\ell(v)\sqrt{1 - \left(\frac{u \bullet v}{\ell(u)\ell(v)}\right)^2} = \sqrt{\ell(u)^2\ell(v)^2 - (u \bullet v)^2} = \sqrt{(u_1 v_2 - u_2 v_1)^2} = |u_1 v_2 - u_2 v_1|.$$
That is, in $\mathbb{R}^2$, the determinant is ± times the area of the parallelogram.



2. Consider Figure 3 again. Let u = [u1 , u2 , u3 ]T , v = [v1 , v2 , v3 ]T , w = [w1 , w2 , w3 ]T ∈ R3 .


Then u • v = u1 v1 + u2 v2 + u3 v3 and the cross product of u and v, denoted

u × v = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ).

Also, the vector u × v is perpendicular to the plane containing both u and v. So, if
u3 = v3 = 0 then one can think of u and v as vectors in the XY -plane and in this case
ℓ(u × v) = | u1 v2 − u2 v1 | = Area(P QRS). Hence, if γ is the angle between the vector w
and the vector u × v then


$$\text{volume}(P) = \text{Area}(PQRS) \cdot \text{height} = |\,w \bullet (u \times v)\,| = \pm\begin{vmatrix} w_1 & w_2 & w_3\\ u_1 & u_2 & u_3\\ v_1 & v_2 & v_3 \end{vmatrix}.$$

3. In general, for an n × n matrix A, | det(A) | satisfies all the properties associated with the
volume of the n-dimensional parallelepiped. The actual proof is beyond the scope of this
book. But, one can observe the following:
(a) The volume of the n-dimensional unit cube is 1 = det(In ).
(b) If one vector is multiplied by c ≠ 0 then the volume either reduces or expands by a factor of | c |.
The same holds for the determinant.



(c) Recall that if the height and base of a parallelogram is not changed then the area
remains the same. This corresponds to replacing the perpendicular vector with the
perpendicular vector plus c ≠ 0 times the base vector for different choices of c.
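A small numerical illustration of item 2 above, assuming NumPy (an assumption of this sketch, not a tool of the notes): the scalar triple product and the 3 × 3 determinant give the same volume.

```python
# Volume of the parallelepiped spanned by u, v, w: |w . (u x v)| equals |det|
# of the matrix with rows w, u, v (Remark 2.2.3.21.2).
import numpy as np

u = np.array([1., 0., 0.])
v = np.array([1., 2., 0.])
w = np.array([0., 1., 3.])

vol_triple = abs(np.dot(w, np.cross(u, v)))
vol_det = abs(np.linalg.det(np.vstack([w, u, v])))
print(vol_triple, vol_det)     # both 6.0
```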

2.3.B Adjugate (classically Adjoint) of a Matrix

Definition 2.2.3.22. [Minor, Cofactor] Let A be an n × n matrix. Then the


1. (i, j)th minor of A, denoted Aij = det (A(i | j)), for 1 ≤ i, j ≤ n.

2. (i, j)th cofactor of A, denoted Cij = (−1)i+j Aij .

3. the Adjugate (classically Adjoint) of A, denoted Adj(A) = [bij ] with bij = Cji , for 1 ≤ i, j ≤ n.
For example, for A = $\begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 1 & 2 & 2 \end{bmatrix}$, $A_{11} = \det(A(1|1)) = \begin{vmatrix} 3 & 1\\ 2 & 2 \end{vmatrix} = 4$, $A_{12} = \begin{vmatrix} 2 & 1\\ 1 & 2 \end{vmatrix} = 3$, $A_{13} = \begin{vmatrix} 2 & 3\\ 1 & 2 \end{vmatrix} = 1$, ..., $A_{32} = \begin{vmatrix} 1 & 3\\ 2 & 1 \end{vmatrix} = -5$ and $A_{33} = \begin{vmatrix} 1 & 2\\ 2 & 3 \end{vmatrix} = -1$. So,
$$\mathrm{Adj}(A) = \begin{bmatrix} (-1)^{1+1}A_{11} & (-1)^{2+1}A_{21} & (-1)^{3+1}A_{31}\\ (-1)^{1+2}A_{12} & (-1)^{2+2}A_{22} & (-1)^{3+2}A_{32}\\ (-1)^{1+3}A_{13} & (-1)^{2+3}A_{23} & (-1)^{3+3}A_{33} \end{bmatrix} = \begin{bmatrix} 4 & 2 & -7\\ -3 & -1 & 5\\ 1 & 0 & -1 \end{bmatrix}.$$

(−1)1+3 A13 (−1)2+3 A23 (−1)3+3 A33 1 0 −1

We now prove a very important result that relates adjugate matrix with the inverse.

Theorem 2.2.3.23. Let A be an n × n matrix. Then



1. for 1 ≤ i ≤ n, $\sum\limits_{j=1}^{n} a_{ij} C_{ij} = \sum\limits_{j=1}^{n} a_{ij} (-1)^{i+j} A_{ij} = \det(A)$,

2. for i ≠ ℓ, $\sum\limits_{j=1}^{n} a_{ij} C_{\ell j} = \sum\limits_{j=1}^{n} a_{ij} (-1)^{\ell+j} A_{\ell j} = 0$, and

3. A(Adj(A)) = det(A) In . Thus,
$$\text{whenever } \det(A) \ne 0 \text{ one has } A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A). \qquad (2.2.3.2)$$

Proof. Part 1: It follows directly from Remark 2.2.3.19 and the definition of the cofactor.
Part 2: Fix positive integers i, ℓ with 1 ≤ i 6= ℓ ≤ n and let B = [bij ] be a square matrix
with B[ℓ, :] = A[i, :] and B[t, :] = A[t, :], for t 6= ℓ. As ℓ 6= i, B[ℓ, :] = B[i, :] and thus, by
Theorem 2.2.3.15.5, det(B) = 0. As A(ℓ | j) = B(ℓ | j), for 1 ≤ j ≤ n, using Remark 2.2.3.19
$$0 = \det(B) = \sum_{j=1}^{n} (-1)^{\ell+j} b_{\ell j} \det\big(B(\ell \mid j)\big) = \sum_{j=1}^{n} (-1)^{\ell+j} a_{ij} \det\big(B(\ell \mid j)\big) = \sum_{j=1}^{n} (-1)^{\ell+j} a_{ij} \det\big(A(\ell \mid j)\big) = \sum_{j=1}^{n} a_{ij} C_{\ell j}. \qquad (2.2.3.3)$$
This completes the proof of Part 2.
Part 3: Using Equation (2.2.3.3) and Remark 2.2.3.19, observe that
$$\big(A\,\mathrm{Adj}(A)\big)_{ij} = \sum_{k=1}^{n} a_{ik}\big(\mathrm{Adj}(A)\big)_{kj} = \sum_{k=1}^{n} a_{ik} C_{jk} = \begin{cases} 0, & \text{if } i \ne j,\\ \det(A), & \text{if } i = j. \end{cases}$$
Thus, A(Adj(A)) = det(A) In . Therefore, if det(A) ≠ 0 then $A\left(\frac{1}{\det(A)}\,\mathrm{Adj}(A)\right) = I_n$. Hence,
by Corollary 2.2.3.2, $A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A)$.
   
Example 2.2.3.24. For A = $\begin{bmatrix} 1 & -1 & 0\\ 0 & 1 & 1\\ 1 & 2 & 1 \end{bmatrix}$, $\mathrm{Adj}(A) = \begin{bmatrix} -1 & 1 & -1\\ 1 & 1 & -1\\ -1 & -3 & 1 \end{bmatrix}$ and det(A) = −2. Thus,
by Theorem 2.2.3.23.3, $A^{-1} = \begin{bmatrix} 1/2 & -1/2 & 1/2\\ -1/2 & -1/2 & 1/2\\ 1/2 & 3/2 & -1/2 \end{bmatrix}$.
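The definitions of cofactor and adjugate, together with the identity A Adj(A) = det(A) In of Theorem 2.2.3.23.3, can be verified directly in code. The NumPy sketch below is illustrative only (NumPy is an assumption of this illustration).

```python
# Adj(A) from its definition (cofactors, then transpose), checked on Example 2.2.3.24.
import numpy as np

def adjugate(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)   # A(i | j)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)        # cofactor C_ij
    return C.T                                                      # Adj(A)[i, j] = C_ji

A = np.array([[1, -1, 0], [0, 1, 1], [1, 2, 1]])
adj = adjugate(A)
print(np.round(adj))                                          # matches Example 2.2.3.24
print(np.allclose(A @ adj, np.linalg.det(A) * np.eye(3)))     # True: A Adj(A) = det(A) I
```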
Let A be a non-singular matrix. Then, by Theorem 2.2.3.23.3, $A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A)$. Thus
A Adj(A) = Adj(A) A = det(A) In and this completes the proof of the next result.

Corollary 2.2.3.25. Let A be a non-singular matrix. Then
$$\sum_{i=1}^{n} a_{ij} C_{ik} = \begin{cases} \det(A), & \text{if } j = k,\\ 0, & \text{if } j \ne k. \end{cases}$$

The next result gives another equivalent condition for a square matrix to be invertible.

Theorem 2.2.3.26. A square matrix A is non-singular if and only if A is invertible.



Proof. Let A be non-singular. Then det(A) ≠ 0 and hence, by Theorem 2.2.3.23.3, $A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A)$ exists.
Now, let us assume that A is invertible. Then, using Theorem 2.2.3.1, A = E1 · · · Ek , a product
of elementary matrices. Also, by Corollary 2.2.3.18, det(Ei ) ≠ 0 for 1 ≤ i ≤ k. Thus, a repeated
application of Parts 1, 2 and 3 of Theorem 2.2.3.15 gives det(A) ≠ 0.
The next result relates the determinant of product of two matrices with their determinants.

Theorem 2.2.3.27. Let A and B be square matrices of order n. Then

det(AB) = det(A) · det(B) = det(BA).

Proof. Step 1. Let A be non-singular. Then, by Theorem 2.2.3.23.3, A is invertible and by


Theorem 2.2.3.1, A = E1 · · · Ek , a product of elementary matrices. Then a repeated application
of Parts 1, 2 and 3 of Theorem 2.2.3.15 gives the desired result as

det(AB) = det(E1 · · · Ek B) = det(E1 ) det(E2 · · · Ek B) = det(E1 ) det(E2 ) det(E3 · · · Ek B)


= · · · = det(E1 ) · · · det(Ek ) det(B) = · · · = det(E1 E2 · · · Ek ) det(B)
= det(A) det(B).

Step 2. Let A be singular. Then, by Theorem 2.2.3.26, A is not invertible. So, by Theorem 2.2.3.1
and Exercise 2.2.2.26.2 there exists an invertible matrix P such that P A = C, where
$C = \begin{bmatrix} C_1\\ 0 \end{bmatrix}$. So, $A = P^{-1}\begin{bmatrix} C_1\\ 0 \end{bmatrix}$. As P is invertible, using Part 1, we have
$$\det(AB) = \det\!\left(P^{-1}\begin{bmatrix} C_1\\ 0 \end{bmatrix} B\right) = \det\!\left(P^{-1}\begin{bmatrix} C_1 B\\ 0 \end{bmatrix}\right) = \det(P^{-1}) \cdot \det\!\left(\begin{bmatrix} C_1 B\\ 0 \end{bmatrix}\right) = \det(P^{-1}) \cdot 0 = 0 = 0 \cdot \det(B) = \det(A)\det(B).$$

Thus, the proof of the theorem is complete.


The next result relates the determinant of a matrix with the determinant of its transpose.
Thus, the determinant can be computed by expanding along any column as well.

Theorem 2.2.3.28. Let A be a square matrix. Then det(A) = det(AT ).

Proof. If A is non-singular, Corollary 2.2.3.25 gives det(A) = det(AT ).
If A is singular then, by Theorem 2.2.3.26, A is not invertible. So, AT is also not invertible
and hence, by Theorem 2.2.3.26, det(AT ) = 0 = det(A).

Example 2.2.3.29. Let A be an orthogonal matrix then, by definition, AAT = I. Thus, by


Theorems 2.2.3.27 and 2.2.3.28

1 = det(I) = det(AAT ) = det(A) det(AT ) = det(A) det(A) = (det(A))2 .


   2 
Hence det A = ±1. In particular, if A = $\begin{bmatrix} a & b\\ c & d \end{bmatrix}$ then $I = AA^T = \begin{bmatrix} a^2 + b^2 & ac + bd\\ ac + bd & c^2 + d^2 \end{bmatrix}$.
1. Thus, a2 + b2 = 1 and hence there exists θ ∈ [0, 2π] such that a = cos θ and b = sin θ.

2. As ac + bd = 0, we get c = r sin θ and d = −r cos θ, for some r ∈ R. But, c2 + d2 = 1


implies that either c = sin θ and d = − cos θ or c = − sin θ and d = cos θ.
   
3. Thus, $A = \begin{bmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{bmatrix}$ or $A = \begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}$.
4. For $A = \begin{bmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{bmatrix}$, det(A) = −1. Then A represents a reflection across the line
y = mx. Determine m (see Exercise 3.3b).
5. For $A = \begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}$, det(A) = 1. Then A represents a rotation through the angle α.
Determine α (see Exercise 3.3a).
Exercise 2.2.3.30. 1. Let A be an n × n upper triangular matrix with non-zero entries on
the diagonal. Then prove that A−1 is also an upper triangular matrix.
2. Let A be an n × n matrix. Then prove that det(A) = 0 if either

(a) A[i, :]T = 0 or A[:, i] = 0, for some i, 1 ≤ i ≤ n, or
(b) A[i, :] = cA[j, :] or A[:, i] = cA[:, j], for some c ∈ C and for some i ≠ j.

3. (a) Prove that $\begin{vmatrix} a & b & c\\ e & f & g\\ h & j & \ell \end{vmatrix} = \begin{vmatrix} a & e & \alpha a + \beta e + h\\ b & f & \alpha b + \beta f + j\\ c & g & \alpha c + \beta g + \ell \end{vmatrix}$ for some α, β ∈ C.

(b) Prove that 17 divides $\begin{vmatrix} 3 & 1 & 1\\ 4 & 8 & 1\\ 0 & 7 & 9 \end{vmatrix}$.

2.3.C Cramer’s Rule

Let A be a square matrix. Then combining Theorem 2.2.3.7 and Theorem 2.2.3.26, one has the
following result.

Corollary 2.2.3.31. Let A be a square matrix. Then the following statements are equivalent:

1. A is invertible.

2. The linear system Ax = b has a unique solution for every b.

3. det(A) ≠ 0.

Thus, Ax = b has a unique solution for every b if and only if det(A) ≠ 0. The next theorem
gives a direct method of finding the solution of the linear system Ax = b when det(A) ≠ 0.

Theorem 2.2.3.32 (Cramer’s Rule). Let A be an n × n non-singular matrix. Then the unique
solution of the linear system Ax = b with xT = [x1 , . . . , xn ] is given by

$$x_j = \frac{\det(A_j)}{\det(A)}, \quad \text{for } j = 1, 2, \ldots, n,$$

where Aj is the matrix obtained from A by replacing A[:, j] by b.



Proof. Since det(A) ≠ 0, A is invertible. Thus, there exists an invertible matrix P such that
P A = In and P [A | b] = [I | P b]. Let d = P b. Then Ax = b has the unique solution xj = dj ,
for 1 ≤ j ≤ n. Also, [e1 , . . . , en ] = I = P A = [P A[:, 1], . . . , P A[:, n]]. Thus,
P Aj = P [A[:, 1], . . . , A[:, j − 1], b, A[:, j + 1], . . . , A[:, n]]
= [P A[:, 1], . . . , P A[:, j − 1], P b, P A[:, j + 1], . . . , P A[:, n]]
= [e1 , . . . , ej−1 , d, ej+1 , . . . , en ].
Thus, det(P Aj ) = dj , for 1 ≤ j ≤ n. Also,
$$d_j = \frac{d_j}{1} = \frac{\det(P A_j)}{\det(P A)} = \frac{\det(P)\det(A_j)}{\det(P)\det(A)} = \frac{\det(A_j)}{\det(A)}.$$
Hence, $x_j = \frac{\det(A_j)}{\det(A)}$ and the required result follows.
   
Example 2.2.3.33. Solve Ax = b using Cramer’s rule, where A = $\begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 1 & 2 & 2 \end{bmatrix}$ and b = $\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$.
Solution: Check that det(A) = 1 and $x^T = [-1, 1, 0]$ as
$$x_1 = \begin{vmatrix} 1 & 2 & 3\\ 1 & 3 & 1\\ 1 & 2 & 2 \end{vmatrix} = -1, \quad x_2 = \begin{vmatrix} 1 & 1 & 3\\ 2 & 1 & 1\\ 1 & 1 & 2 \end{vmatrix} = 1, \quad\text{and } x_3 = \begin{vmatrix} 1 & 2 & 1\\ 2 & 3 & 1\\ 1 & 2 & 1 \end{vmatrix} = 0.$$
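Cramer's rule is easy to code once a determinant routine is available. The NumPy sketch below is illustrative (it assumes NumPy and a non-singular A) and reproduces the solution of Example 2.2.3.33.

```python
# Cramer's rule (Theorem 2.2.3.32): replace column j of A by b and take the
# ratio of determinants.
import numpy as np

def cramer(A, b):
    A = np.asarray(A, dtype=float)
    detA = np.linalg.det(A)
    x = np.empty(A.shape[1])
    for j in range(A.shape[1]):
        Aj = A.copy()
        Aj[:, j] = b                      # A_j: column j replaced by b
        x[j] = np.linalg.det(Aj) / detA
    return x

A = [[1, 2, 3], [2, 3, 1], [1, 2, 2]]
b = [1, 1, 1]
print(np.round(cramer(A, b), 10))         # [-1.  1.  0.], as in Example 2.2.3.33
```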

2.4 Miscellaneous Exercises


" # " #
A11 A12 B11 B12
Exercise 2.2.4.1. 1. Suppose A−1 = B with A = and B = . Also,
A21 A22 B21 B22
assume that A11 is invertible and define P = A22 − A21 A−1 11 A12 . Then prove that
   " #
I 0 A11 A12 A11 A12
(a) = ,
−A21 A−1
11 I A21 A22 0 A22 − A21 A−1 11 A12
" #
A−1
11 + (A −1
11 A12 )P −1 (A A−1 )
21 11 −(A −1
11 A12 )P −1
(b) P is invertible and B = .
−P −1 (A21 A−111 ) P −1

2. Determine necessary and sufficient condition for a triangular matrix to be invertible.

3. Let A be a unitary matrix then what can you say about | det(A) |?

4. Let A be a 2 × 2 matrix with Tr(A) = 0 and det(A) = 0. Then A is a nilpotent matrix.

5. Let A and B be two non-singular matrices. Are the matrices A+B and A−B non-singular?
Justify your answer.

6. Let A be an n × n matrix. Prove that the following statements are equivalent:

(a) A is not invertible.


(b) Rank(A) 6= n.

(c) det(A) = 0.

(d) A is not row-equivalent to In .

(e) The homogeneous system Ax = 0 has a non-trivial solution.

(f ) The system Ax = b is either inconsistent or it has an infinite number of solutions.

(g) A is not a product of elementary matrices.

7. For what value(s) of λ does the following systems have non-trivial solutions? Also, for
each value of λ, determine a non-trivial solution.

(a) (λ − 2)x + y = 0, x + (λ + 2)y = 0.

(b) λx + 3y = 0, (λ + 6)y = 0.

8. Let a1 , . . . , an ∈ C and define A = [aij ]n×n with $a_{ij} = a_i^{\,j-1}$. Prove that $\det(A) = \prod\limits_{1 \le i < j \le n} (a_j - a_i)$.
This matrix is usually called the Vandermonde matrix.

9. Let A = [aij ]n×n with aij = max{i, j}. Prove that det A = (−1)n−1 n.
10. Solve the following system of equations by Cramer’s rule.

i) x + y + z − w = 1, x + y − z + w = 2, 2x + y + z − w = 7, x + y + z + w = 3.
ii) x − y + z − w = 1, x + y − z + w = 2, 2x + y − z − w = 7, x − y − z + w = 3.

11. Let p ∈ C, p ≠ 0. Let A = [aij ] and B = [bij ] be two n × n matrices with $b_{ij} = p^{\,i-j} a_{ij}$, for
1 ≤ i, j ≤ n. Then compute det(B) in terms of det(A).

12. The position of an element aij of a determinant is called even or odd according as i + j is
even or odd. Prove that if all the entries in

(a) odd positions are multiplied with −1 then the value of determinant doesn’t change.

(b) even positions are multiplied with −1 then the value of determinant

i. does not change if the matrix is of even order.


ii. is multiplied by −1 if the matrix is of odd order.

13. Let A be a Hermitian matrix. Prove that det A is a real number.

14. Let A be an n × n matrix. Then A is invertible if and only if Adj(A) is invertible.

15. Let A and B be invertible matrices. Prove that Adj(AB) = Adj(B)Adj(A).


" #
A B
16. Let A be an invertible matrix and let P = . Then show that Rank(P ) = n if and
C D
only if D = CA−1 B.

2.5 Summary
In this chapter, we started with a system of m linear equations in n unknowns, wrote it formally
as Ax = b and, in turn, passed to the augmented matrix [A | b]. Then the basic operations on
equations led to multiplication by elementary matrices on the left of [A | b], thus giving
us the RREF, which in turn gave us the rank of a matrix. If Rank(A) = r and Rank([A | b]) = ra
and

1. r < ra then the linear system Ax = b is inconsistent.

2. r = ra then the linear system Ax = b is consistent. Furthermore, if

(a) r = n then the system Ax = b has a unique solution.


(b) r < n then the system Ax = b has an infinite number of solutions.

We have also seen that the following conditions are equivalent for an n × n matrix A.

1. A is invertible.

2. The homogeneous system Ax = 0 has only the trivial solution.


3. The row reduced echelon form of A is I.

4. A is a product of elementary matrices.

5. The system Ax = b has a unique solution for every b.

6. The system Ax = b has a solution for every b.

7. rank(A) = n.

8. det(A) 6= 0.

So, overall we have learnt to solve the following type of problems:

1. Solving the linear system Ax = b. This idea will lead to the question “is the vector b a
linear combination of the columns of A”?

2. Solving the linear system Ax = 0. This will lead to the question “are the columns of A
linearly independent/dependent”? In particular, if Ax = 0 has

(a) a unique solution then the columns of A are linearly independent.


(b) else, the columns of A are linearly dependent.
Chapter 3

Vector Spaces

In this chapter, we will mainly be concerned with finite dimensional vector spaces over R or C.
The last section consists of results for infinite dimensional vector spaces that are similar to, but
different from, the finite dimensional case. We have given lots of examples of vector
spaces that are infinite dimensional or are vector spaces over fields different from R and
C. See the appendix for some ideas about fields other than R and C.
3.1 Vector Spaces: Definition and Examples

In this chapter, F denotes either R, the set of real numbers or C, the set of complex numbers.
Let A be an m × n complex matrix and let V denote the solution set of the homogeneous
system Ax = 0. Then, by Theorem 2.2.1.6, V satisfies:

1. 0 ∈ V as A0 = 0.

2. if x ∈ V then αx ∈ V, for all α ∈ C. In particular, for α = −1, −x ∈ V.

3. if x, y ∈ V then, for any α, β ∈ C, αx + βy ∈ V.

4. if x, y, z ∈ V then, (x + y) + z = x + (y + z).

That is, the solution set of a homogeneous linear system satisfies some nice properties. The
Euclidean plane, R2 and the Euclidean space, R3 , also satisfy the above properties. In this
chapter, our aim is to understand sets that satisfy such properties. We start with the following
definition.

Definition 3.3.1.1. [Vector Space] A vector space V over F, denoted V(F) or in short V (if
the field F is clear from the context), is a non-empty set, satisfying the following axioms:

1. Vector Addition: To every pair u, v ∈ V there corresponds a unique element u ⊕ v ∈ V


(called the addition of vectors) such that

(a) u ⊕ v = v ⊕ u (Commutative law).


(b) (u ⊕ v) ⊕ w = u ⊕ (v ⊕ w) (Associative law).

(c) V has a unique element, denoted 0, called the zero vector that satisfies u ⊕ 0 = u,
for every u ∈ V (called the additive identity).
(d) for every u ∈ V there is a unique element −u ∈ V that satisfies u ⊕ (−u) = 0 (called
the additive inverse).

2. Scalar Multiplication: For each u ∈ V and α ∈ F, there corresponds a unique element


α ⊙ u in V (called the scalar multiplication) such that

(a) α · (β ⊙ u) = (αβ) ⊙ u for every α, β ∈ F and u ∈ V (· is multiplication in F).


(b) 1 ⊙ u = u for every u ∈ V, where 1 ∈ F.

3. Distributive Laws: relating vector addition with scalar multiplication


For any α, β ∈ F and u, v ∈ V, the following distributive laws hold:

(a) α ⊙ (u ⊕ v) = (α ⊙ u) ⊕ (α ⊙ v).
(b) (α + β) ⊙ u = (α ⊙ u) ⊕ (β ⊙ u) (+ is addition in F).
Definition 3.3.1.2. 1. The number 0 ∈ F is the zero scalar, whereas 0 ∈ V is the zero vector.
2. The elements of F are called scalars.
3. The elements of V are called vectors.

4. If F = R then V is called a real vector space.

5. If F = C then V is called a complex vector space.


6. In general, a vector space over R or C is called a linear space.

Some interesting consequences of Definition 3.3.1.1 are stated next. Intuitively, these results
seem obvious but for better understanding of the axioms it is desirable to go through the proof.

Theorem 3.3.1.3. Let V be a vector space over F. Then

1. u ⊕ v = u implies v = 0.

2. α ⊙ u = 0 if and only if either u = 0 or α = 0.

3. (−1) ⊙ u = −u, for every u ∈ V.

Proof. Part 1: By Axiom 3.3.1.1.1d, for each u ∈ V there exists −u ∈ V such that −u ⊕ u = 0.
Hence, u ⊕ v = u is equivalent to

−u ⊕ (u ⊕ v) = −u ⊕ u ⇐⇒ (−u ⊕ u) ⊕ v = 0 ⇐⇒ 0 ⊕ v = 0 ⇐⇒ v = 0.

Part 2: As 0 = 0 ⊕ 0, using Axiom 3.3.1.1.3, we have

α ⊙ 0 = α ⊙ (0 ⊕ 0) = (α ⊙ 0) ⊕ (α ⊙ 0).

Thus, using Part 1, α ⊙ 0 = 0 for any α ∈ F. In the same way, using Axiom 3.3.1.1.3b,

0 ⊙ u = (0 + 0) ⊙ u = (0 ⊙ u) ⊕ (0 ⊙ u).

Hence, using Part 1, one has 0 ⊙ u = 0 for any u ∈ V.


Now suppose α ⊙ u = 0. If α = 0 then the proof is over. Therefore, assume that α ≠ 0, α ∈ F.
Then, (α)−1 ∈ F and

0 = (α)−1 ⊙ 0 = (α)−1 ⊙ (α ⊙ u) = ((α)−1 · α) ⊙ u = 1 ⊙ u = u

as 1 ⊙ u = u for every vector u ∈ V (see Axiom 3.3.1.1.2b). Thus, if α ≠ 0 and α ⊙ u = 0 then u = 0.


Part 3: As 0 = 0 · u = (1 + (−1))u = u ⊕ (−1) · u, one has (−1) · u = −u.

Example 3.3.1.4. The readers are advised to justify the statements given below.

1. Let A be an m × n matrix with complex entries with Rank(A) = r ≤ n. Then, using


Theorem 2.2.2.34, V = {x | Ax = 0} is a vector space.

2. Consider R with the usual addition and multiplication. That is, ⊕ ≡ + and ⊙ ≡ ·. Then,
R forms a real vector space.

3. Let R2 = {(x1 , x2 )T | x1 , x2 ∈ R}. Then, for x1 , x2 , y1 , y2 ∈ R and α ∈ R, define

(x1 , x2 )T ⊕ (y1 , y2 )T = (x1 + y1 , x2 + y2 )T and α ⊙ (x1 , x2 )T = (αx1 , αx2 )T .


Verify that R2 is a real vector space.

4. Let Rn = {(a1 , . . . , an )T | ai ∈ R, 1 ≤ i ≤ n}. For u = (a1 , . . . , an )T , v = (b1 , . . . , bn )T ∈ V


and α ∈ R, define

u ⊕ v = (a1 + b1 , . . . , an + bn )T and α ⊙ u = (αa1 , . . . , αan )T

(called component wise operations). Then, V is a real vector space. The vector
space Rn is called the real vector space of n-tuples.


Recall that the symbol i represents the complex number $\sqrt{-1}$.

5. Consider C = {x + iy | x, y ∈ R}, the set of complex numbers. Let z1 = x1 + iy1 and


z2 = x2 + iy2 and define z1 ⊕ z2 = (x1 + x2 ) + i(y1 + y2 ). For scalar multiplication,

(a) let α ∈ R. Then α ⊙ z1 = (αx1 ) + i(αy1 ) and we call C the real vector space.
(b) let α + iβ ∈ C. Then (α + iβ) ⊙ (x1 + iy1 ) = (αx1 − βy1 ) + i(αy1 + βx1 ) and we call
C the complex vector space.

6. Let Cn = {(z1 , . . . , zn )T | zi ∈ C, 1 ≤ i ≤ n}. For z = (z1 , . . . , zn ), w = (w1 , . . . , wn )T ∈ Cn


and α ∈ F, define

z + w = (z1 + w1 , . . . , zn + wn )T , and α ⊙ z = (αz1 , . . . , αzn )T .

Then, verify that Cn forms a vector space over C (called complex vector space) as well as
over R (called real vector space). In general, we assume Cn to be a complex vector space.

Remark 3.3.1.5. If F = C then i(1, 0) = (i, 0) is allowed. Whereas, if F = R then i(1, 0)
doesn’t make sense as i ∉ R.

7. Fix m, n ∈ N and let Mm,n (C) = {Am×n = [aij ] | aij ∈ C}. For A, B ∈ Mm,n (C) and
α ∈ C, define (A + αB)ij = aij + αbij . Then Mm,n (C) is a complex vector space. If m = n,
the vector space Mm,n (C) is denoted by Mn (C).

8. Let S be a non-empty set and let RS = {f | f is a function from S to R}. For f, g ∈ RS


and α ∈ R, define (f + αg)(x) = f (x) + αg(x), for all x ∈ S. Then, RS is a real vector
space. In particular,
(a) for S = N, observe that RN , consisting of all real sequences, forms a real vector space.
(b) Let V be the set of all bounded real sequences. Then V is a real vector space.
(c) Let V be the set of all real sequences that converge to 0. Then V is a real vector
space.
(d) Let S be the set of all real sequences that converge to 1. Then check that S is not a
vector space. Determine the conditions that fail.

9. Fix a, b ∈ R with a < b and let C([a, b], R) = {f : [a, b] → R | f is continuous}. Then
C([a, b], R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ [a, b], is a real vector space.

10. Let C(R, R) = {f : R → R | f is continuous}. Then C(R, R) with (f + αg)(x) = f (x) +


αg(x), for all x ∈ R, is a real vector space.

11. Fix a < b ∈ R and let C 2 ((a, b), R) = {f : (a, b) → R | f ′′ exists and f ′′ is continuous}.
Then C 2 ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b), is a real vector space.

12. Fix a < b ∈ R and let C ∞ ((a, b), R) = {f : (a, b) → R | f is infinitely differentiable}. Then
C ∞ ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b) is a real vector space.

13. Fix a < b ∈ R. Then V = {f : (a, b) → R | f ′′ + f ′ + 2f = 0} is a real vector space.

14. Let R[x] = {p(x) | p(x) is a real polynomial in x}. Then, with the usual addition of
polynomials and α(a0 + a1 x + · · · + an xn ) = (αa0 ) + · · · + (αan )xn , for α ∈ R, gives R[x]
a real vector space structure.

15. Fix n ∈ N and let R[x; n] = {p(x) ∈ R[x] | p(x) has degree ≤ n}.Then, with the usual
addition of polynomials and α(a0 + a1 x + · · · + an xn ) = (αa0 ) + · · · + (αan )xn , for α ∈ R,
gives R[x; n] a real vector space structure.

16. Let C[x] = {p(x) | p(x) is a complex polynomial in x}. Then, with the usual addition of
polynomials and α(a0 + a1 x + · · · + an xn ) = (αa0 ) + · · · + (αan )xn , for α ∈ C, C[x] gets
a complex vector space structure.

17. Let V = {0}. Then V is a real as well as a complex vector space.

18. Let R+ = {x ∈ R | x > 0}. Then



(a) R+ is not a vector space under usual operations of addition and scalar multiplication.
(b) R+ is a real vector space with 1 as the additive identity if we define

u ⊕ v = u · v and α ⊙ u = uα for all u, v ∈ R+ and α ∈ R.

19. For any α ∈ R and x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , define

x ⊕ y = (x1 + y1 + 1, x2 + y2 − 3)T and α ⊙ x = (αx1 + α − 1, αx2 − 3α + 3)T .

Then R2 is a real vector space with (−1, 3)T as the additive identity.

20. Let V = {A = [aij ] ∈ Mn (C) | a11 = 0}. Then V is a complex vector space.

21. Let V = {A = [aij ] ∈ Mn (C) | A = A∗ }. Then V is a real vector space but not a complex
vector space.

22. Let V and W be vector spaces over F, with operations (+, •) and (⊕, ⊙), respectively. Let
V × W = {(v, w) | v ∈ V, w ∈ W}. Then V × W forms a vector space over F, if for every
(v1 , w1 ), (v2 , w2 ) ∈ V × W and α ∈ R, we define

(v1 , w1 ) ⊕′ (v2 , w2 ) = (v1 + v2 , w1 ⊕ w2 ), and


α ◦ (v1 , w1 ) = (α • v1 , α ⊙ w1 ).

v1 + v2 and w1 ⊕ w2 on the right hand side mean vector addition in V and W, respectively.
Similarly, α • v1 and α ⊙ w1 correspond to scalar multiplication in V and W, respectively.

23. Let Q be the set of scalars. Then R is a vector space over Q. As e, π − 2 ∉ Q, these real
numbers are vectors but not scalars in this space.

24. Similarly, C is a vector space over Q. Since e − π, i + 2, i ∉ Q, these complex numbers
are vectors but not scalars in this space.

25. Let Z5 = {0, 1, 2, 3, 4} with addition and multiplication, respectively, given by

+ | 0 1 2 3 4          · | 0 1 2 3 4
0 | 0 1 2 3 4          0 | 0 0 0 0 0
1 | 1 2 3 4 0          1 | 0 1 2 3 4
2 | 2 3 4 0 1          2 | 0 2 4 1 3
3 | 3 4 0 1 2          3 | 0 3 1 4 2
4 | 4 0 1 2 3          4 | 0 4 3 2 1
Then, V = {(a, b) | a, b ∈ Z5 } is a vector space having 25 vectors.

Note that all our vector spaces, except the last three, are linear spaces.

From now on, we will use ‘u + v’ for ‘u ⊕ v’ and ‘αu or α · u’ for ‘α ⊙ u’.

Exercise 3.3.1.6. 1. Verify the axioms for vector spaces that appear in Example 3.3.1.4.

2. Does the set V given below form a real/complex or both real and complex vector space?
Give reasons for your answer.
(a) For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , define x + y = (x1 + y1 , x2 + y2 )T and
αx = (αx1 , 0)T for all α ∈ R.
  
(b) Let V = $\left\{ \begin{bmatrix} a & b\\ c & d \end{bmatrix} \;\middle|\; a, b, c, d \in \mathbb{C},\ a + c = 0 \right\}$.

(c) Let V = $\left\{ \begin{bmatrix} a & b\\ c & d \end{bmatrix} \;\middle|\; a = b,\ a, b, c, d \in \mathbb{C} \right\}$.
(d) Let V = {(x, y, z)T | x + y + z = 1}.
(e) Let V = {(x, y)T ∈ R2 | x · y = 0}.
(f ) Let V = {(x, y)T ∈ R2 | x = y 2 }.
(g) Let V = {α(1, 1, 1)T + β(1, 1, −1)T | α, β ∈ R}.
(h) Let V = R with x ⊕ y = x − y and α ⊙ x = −αx, for all x, y ∈ V and α ∈ R.
(i) Let V = R2 . Define (x1 , y1 )T ⊕(x2 , y2 )T = (x1 +x2 , 0)T and α⊙(x1 , y1 )T = (αx1 , 0)T ,
for α, x1 , x2 , y1 , y2 ∈ R.

3.1.A Subspaces
Definition 3.3.1.7. [Vector Subspace] Let V be a vector space over F. Then, a non-empty

subset S of V is called a subspace of V if S is also a vector space with vector addition and
scalar multiplication inherited from V.

Example 3.3.1.8. 1. If V is a vector space then V and {0} are subspaces, called trivial
subspaces.

2. The real vector space R has no non-trivial subspace.

3. W = {x ∈ R3 | [1, 2, −1]x = 0} is a plane in R3 containing 0 (so a subspace).


 
4. W = $\left\{ x \in \mathbb{R}^3 \;\middle|\; \begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & -1 \end{bmatrix} x = 0 \right\}$ is a line in R3 containing 0 (so a subspace).

5. The vector space R[x; n] is a subspace of R[x].

6. Prove that C 2 (a, b) is a subspace of C(a, b).

7. Prove that W = {(x, 0)T ∈ R2 | x ∈ R} is a subspace of R2 .

8. Is the set of sequences converging to 0 a subspace of the set of all bounded sequences?

9. Let V be the vector space of Example 3.3.1.4.19. Then


(a) S = {(x, 0)T | x ∈ R} is not a subspace of V as (x, 0)T ⊕ (y, 0)T = (x + y + 1, −3)T ∉ S.
(b) W = {(x, 3)T | x ∈ R} is a subspace of V.

10. The vector space R+ defined in Example 3.3.1.4.18 is not a subspace of R.



Let V(F) be a vector space and W ⊆ V, W ≠ ∅. We now prove a result which implies that to
check that W is a subspace, we need to verify only one condition.

Theorem 3.3.1.9. Let V(F) be a vector space and W ⊆ V, W ≠ ∅. Then W is a subspace of V
if and only if αu + βv ∈ W whenever α, β ∈ F and u, v ∈ W.

Proof. Let W be a subspace of V and let u, v ∈ W. Then, for every α, β ∈ F, αu, βv ∈ W and
hence αu + βv ∈ W.
Now, we assume that αu + βv ∈ W, whenever α, β ∈ F and u, v ∈ W. To show, W is a
subspace of V:

1. Taking α = 1 and β = 1, we see that u + v ∈ W, for every u, v ∈ W.

2. Taking α = 0 and β = 0, we see that 0 ∈ W.

3. Taking β = 0, we see that αu ∈ W, for every α ∈ F and u ∈ W. Hence, using Theo-


rem 3.3.1.3.3, −u = (−1)u ∈ W as well.

4. The commutative and associative laws of vector addition hold as they hold in V.

5. The axioms related with scalar multiplication and the distributive laws also hold as they
hold in V.

Thus, one obtains the required result.

Exercise 3.3.1.10. 1. Determine all the subspaces of R and R2 .

2. Prove that a line in R2 is a subspace if and only if it passes through (0, 0) ∈ R2 .

3. Are all the sets given below subspaces of C([−1, 1])?

(a) W = {f ∈ C([−1, 1]) | f (1/2) = 0}.


(b) W = {f ∈ C([−1, 1]) | f (−1/2) = 0, f (1/2) = 0}.
(c) W = {f ∈ C([−1, 1]) | f ′ ( 14 ) exists }.

4. Are all the sets given below subspaces of R[x]?

(a) W = {f (x) ∈ R[x] | deg(f (x)) = 3}.


(b) W = {f (x) ∈ R[x] | deg(f (x)) = 0}.
(c) W = {f (x) ∈ R[x] | f (1) = 0}.
(d) W = {f (x) ∈ R[x] | f (0) = 0, f (1/2) = 0}.

5. Which of the following are subspaces of Rn (R)?

(a) {(x1 , x2 , . . . , xn )T | x1 ≥ 0}.


(b) {(x1 , x2 , . . . , xn )T | x1 is rational}.
(c) {(x1 , x2 , . . . , xn )T | | x1 | ≤ 1}.

6. Among the following, determine the subspaces of the complex vector space Cn ?

(a) {(z1 , z2 , . . . , zn )T | z1 is real }.

(b) {(z1 , z2 , . . . , zn )T | z1 + z2 = z3 }.

(c) {(z1 , z2 , . . . , zn )T | | z1 |=| z2 |}.

7. Fix n ∈ N. Then, is W a subspace of Mn (R), where

(a) W = {A ∈ Mn (R) | A is upper triangular}?

(b) W = {A ∈ Mn (R) | A is symmetric}?

(c) W = {A ∈ Mn (R) | A is skew-symmetric}?

(d) W = {A | A is a diagonal matrix}?

(e) W = {A | trace(A) = 0}?

(f ) W = {A ∈ Mn (R) | AT = 2A}?

(g) W = {A = [aij ] | a11 + a21 + a34 = 0}?



8. Fix n ∈ N. Then, is W = {A = [aij ] | a11 + a22 = 0} a subspace of the complex vector



space Mn (C)? What if Mn (C) is a real vector space?



9. Prove that the following sets are not subspaces of Mn (R).

(a) G = {A ∈ Mn (R) | det(A) = 0}.

(b) G = {A ∈ Mn (R) | det(A) 6= 0}.

(c) G = {A ∈ Mn (R) | det(A) = 1}.

3.1.B Linear Span

Definition 3.3.1.11. [Linear Combination] Let V be a vector space over F and let u1 , . . . , un ∈
V. Then, a vector u ∈ V is said to be a linear combination of u1 , . . . , un if we can find scalars
α1 , . . . , αn ∈ F such that u = α1 u1 + · · · + αn un .

Example 3.3.1.12. 1. (3, 4, 3) = 2(1, 1, 1) + (1, 2, 1) but we cannot find a, b ∈ R such that
(3, 4, 5) = a(1, 1, 1) + b(1, 2, 1).

2. Is (4, 5, 5) a linear combination of (1, 0, 0), (2, 1, 0), and (3, 3, 1)?
Solution: (4, 5, 5) is a linear combination if the linear system

a(1, 0, 0) + b(2, 1, 0) + c(3, 3, 1) = (4, 5, 5) (3.3.1.1)

in the unknowns a, b, c ∈ R has a solution. Clearly, Equation (3.3.1.1) has solution a =


9, b = −10 and c = 5.

3. Is (4, 5, 5) a linear combination of the vectors (1, 2, 3), (−1, 1, 4) and (3, 3, 2)?
Solution: The vector (4, 5, 5) is a linear combination if the linear system

a(1, 2, 3) + b(−1, 1, 4) + c(3, 3, 2) = (4, 5, 5) (3.3.1.2)

 a, b, c ∈ R has a solution. The RREF of the corresponding augmented


in the unknowns
1 0 2 3
 
matrix equals 0 1 −1 −1

, implying infinite number of solutions. For example,
0 0 0 0
(4, 5, 5) = 3(1, 2, 3) − (−1, 1, 4).

4. Is (4, 5, 5) a linear combination of the vectors (1, 2, 1), (1, 0, −1) and (1, 1, 0)?
Solution: The vector (4, 5, 5) is a linear combination if the linear system

a(1, 2, 1) + b(1, 0, −1) + c(1, 1, 0) = (4, 5, 5) (3.3.1.3)

 a, b, c ∈ R has a solution. The RREF of the corresponding augmented


in the unknowns
1 0 1/2 5/2
 

matrix equals 0 1 21 3 . So, Equation (3.3.1.3) has no solution. Thus, (4, 5, 5) is
2 
T

0 0 0 1
AF

not a linear combination of the given vectors.
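The consistency checks above are small linear systems and can be reproduced mechanically; a minimal Python/numpy sketch for items 2 and 4 (the rank comparison is one of several equivalent ways to decide consistency):

```python
import numpy as np

# Item 4: columns are (1,2,1), (1,0,-1), (1,1,0); target is (4,5,5).
A = np.array([[1., 1., 1.],
              [2., 0., 1.],
              [1., -1., 0.]])
b = np.array([4., 5., 5.])

# b is a linear combination of the columns of A iff rank(A) == rank([A | b]).
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(np.column_stack([A, b])))
# 2 3  -> inconsistent, so (4,5,5) is not a linear combination of these vectors

# Item 2: columns are (1,0,0), (2,1,0), (3,3,1); here the system has a unique solution.
A2 = np.array([[1., 2., 3.],
               [0., 1., 3.],
               [0., 0., 1.]])
print(np.linalg.solve(A2, b))   # [  9. -10.   5.]  i.e. a = 9, b = -10, c = 5
```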



Exercise 3.3.1.13. 1. Let x ∈ R3 . Prove that xT is a linear combination of (1, 0, 0), (2, 1, 0)
and (3, 3, 1). Is this linear combination unique? That is, does there exist (a, b, c) 6= (e, f, g)
with xT = a(1, 0, 0) + b(2, 1, 0) + c(3, 3, 1) = e(1, 0, 0) + f (2, 1, 0) + g(3, 3, 1)?

2. Find condition(s) on x, y, z ∈ R such that (x, y, z) is a linear combination of


(a) (1, 2, 3), (−1, 1, 4) and (3, 3, 2).
(b) (1, 2, 1), (1, 0, −1) and (1, 1, 0).
(c) (1, 1, 1), (1, 1, 0) and (1, −1, 0).

Definition 3.3.1.14. [Linear Span] Let V be a vector space over F and S = {u1 , . . . , un } ⊆ V.
Then, the linear span of S, denoted LS(S), equals

LS(S) = {α1 u1 + · · · + αn un | αi ∈ F, 1 ≤ i ≤ n}.

If S is an empty set, we define LS(S) = {0}.

Example 3.3.1.15. For the set S given below, determine LS(S).

1. S = {(1, 0)T , (0, 1)T } ⊆ R2 .


Solution: LS(S) = {a(1, 0)T + b(0, 1)T | a, b ∈ R} = {(a, b)T | a, b ∈ R} = R2 .

2. S = {(1, 1, 1)T , (2, 1, 3)T }. What is the geometrical representation of LS(S)?


Solution: LS(S) = {a(1, 1, 1)T + b(2, 1, 3)T | a, b ∈ R} = {(a + 2b, a + b, a + 3b)T | a, b ∈
R}. For geometrical representation, we need to find conditions on x, y and z such that

(a + 2b, a + b, a + 3b) = (x, y, z). Or equivalently, the system a + 2b = x, a + b = y, a + 3b = z in the unknowns a and b has a solution. Check that the RREF of the augmented matrix equals $\begin{bmatrix} 1 & 0 & 2y - x\\ 0 & 1 & x - y\\ 0 & 0 & z + y - 2x \end{bmatrix}$. Thus, we need 2x − y − z = 0. Hence, LS(S) is a plane given by

LS(S) = {a(1, 1, 1)T + b(2, 1, 3)T | a, b ∈ R} = {(x, y, z)T ∈ R3 | 2x − y − z = 0}.

3. S = {(1, 2, 1)T , (1, 0, −1)T , (1, 1, 0)T }. What is the geometrical representation of LS(S)?
Solution: As above, we need to find condition(s) on x, y, z such that the linear system

a(1, 2, 1) + b(1, 0, −1) + c(1, 1, 0) = (x, y, z) (3.3.1.4)

in theunknowns a, b, c is always
 consistent. An application of GJE to Equation (3.3.1.4)
x+y
1 0 1 3
 
gives 
 0 1 1
2
2x−y
3
. Thus,

0 0 0 x−y+z

LS(S) = {(x, y, z)T ∈ R3 | x − y + z = 0}.



4. S = {(1, 2, 3)T , (−1, 1, 4)T , (3, 3, 2)T }.



Solution: As above, need to find condition(s) on x, y, z such that the linear system

a(1, 2, 3) + b(−1, 1, 4) + c(3, 3, 2) = (x, y, z)

in the unknowns a, b, c is always consistent. An application of GJE method gives


5x − 7y + 3z = 0 as the required condition. Thus,

LS(S) = {(x, y, z)T ∈ R3 | 5x − 7y + 3z = 0}.

5. S = {(1, 2, 3, 4)T , (−1, 1, 4, 5)T , (3, 3, 2, 3)T } ⊆ R4 .


Solution: The readers are advised to show that

LS(S) = {(x, y, z, w)T ∈ R4 | 2x − 3y + w = 0, 5x − 7y + 3z = 0}.

Exercise 3.3.1.16. For each S, determine the geometric representation of LS(S).

1. S = {−1} ⊆ R.

2. S = {π} ⊆ R.

3. S = {(1, 0, 1)T , (0, 1, 0)T , (3, 0, 3)T } ⊆ R3 .

4. S = {(1, 2, 1)T , (2, 0, 1)T , (1, 1, 1)T } ⊆ R3 .

5. S = {(1, 0, 1, 1)T , (0, 1, 0, 1)T , (3, 0, 3, 1)T } ⊆ R4 .



Definition 3.3.1.17. [Finite Dimensional Vector Space] Let V be a vector space over F. Then
V is called finite dimensional if there exists S ⊆ V, such that S has finite number of elements
and V = LS(S). If such an S does not exist then V is called infinite dimensional.

Example 3.3.1.18. 1. {(1, 2)T , (2, 1)T } spans R2 . Thus, R2 is finite dimensional.

2. {1, 1 + x, 1 − x + x2 , x3 , x4 , x5 } spans C[x; 5]. Thus, C[x; 5] is finite dimensional.

3. Fix n ∈ N. Then, R[x; n] is finite dimensional as R[x; n] = LS({1, x, x2 , . . . , xn }).

4. C[x] is not finite dimensional as the degree of a polynomial can be any large positive
integer. Indeed, verify that C[x] = LS({1, x, x2 , . . . , xn , . . .}).

5. The vector space R over Q is infinite dimensional.

6. The vector space C over Q is infinite dimensional.

Lemma 3.3.1.19 (Linear Span is a Subspace). Let V be a vector space over F and S ⊆ V.
Then LS(S) is a subspace of V.

Proof. By definition, 0 ∈ LS(S). So, LS(S) is non-empty. Let u, v ∈ LS(S). To show,



au + bv ∈ LS(S) for all a, b ∈ F. As u, v ∈ LS(S), there exist n ∈ N, vectors wi ∈ S and scalars



αi , βi ∈ F such that u = α1 w1 + · · · + αn wn and v = β1 w1 + · · · + βn wn . Hence,



au + bv = (aα1 + bβ1 )w1 + · · · + (aαn + bβn )wn ∈ LS(S)

as aαi + bβi ∈ F for 1 ≤ i ≤ n. Thus, by Theorem 3.3.1.9, LS(S) is a vector subspace.

Remark 3.3.1.20. Let V be a vector space over F. If W is a subspace of V and S ⊆ W then


LS(S) is a subspace of W as well.

Theorem 3.3.1.21. Let V be a vector space over F and S ⊆ V. Then LS(S) is the smallest
subspace of V containing S.

Proof. For every u ∈ S, u = 1 · u ∈ LS(S). Thus, S ⊆ LS(S). Need to show that LS(S) is the
smallest subspace of V containing S. So, let W be any subspace of V containing S. Then, by
Remark 3.3.1.20, LS(S) ⊆ W and hence the result follows.

Definition 3.3.1.22. Let V be a vector space over F.


1. Let S and T be two subsets of V. Then, the sum of S and T , denoted S + T equals
{s + t|s ∈ S, t ∈ T }. For example,
(a) if V = R, S = {0, 1, 2, 3, 4, 5, 6} and T = {5, 10, 15} then S + T = {5, 6, . . . , 21}.
     
(b) if V = R2 , S = {(1, 1)T } and T = {(−1, 1)T } then S + T = {(0, 2)T }.
(c) if V = R2 , S = {(1, 1)T } and T = LS((−1, 1)T ) then S + T = {(1, 1)T + c(−1, 1)T | c ∈ R}.
2. Let P and Q be two subspaces of V. Then, we define their sum, denoted P + Q, as
P + Q = {u + v | u ∈ P, v ∈ Q}. For example, P + Q = R2 , if

(a) P = {(x, 0)T | x ∈ R} and Q = {(0, x)T | x ∈ R} as (x, y) = (x, 0) + (0, y).
(b) P = {(x, 0)T | x ∈ R} and Q = {(x, x)T | x ∈ R} as (x, y) = (x − y, 0) + (y, y).
(c) P = LS((1, 2)T ) and Q = LS((2, 1)T ) as $(x, y) = \frac{2y - x}{3}(1, 2) + \frac{2x - y}{3}(2, 1)$.
We leave the proof of the next result for readers.

Lemma 3.3.1.23. Let V be a vector space over F and let P and Q be two subspaces of V. Then
P + Q is the smallest subspace of V containing both P and Q.

Exercise 3.3.1.24. 1. Let a ∈ R2 , a 6= 0. Then show that {x ∈ R2 | aT x = 0} is a


non-trivial subspace of R2 . Geometrically, what does this set represent in R2 ?

2. Find all subspaces of R3 .

3. Prove that {(x, y, z)T ∈ R3 | ax + by + cz = d} is a subspace of R3 if and only if d = 0.

4. Let W = {f (x) ∈ R[x] | deg(f (x)) = 5}. Prove that W is not a subspace of R[x].

5. Determine all subspaces of the vector space in Example 3.3.1.4.19.


     
6. Let U = $\left\{\begin{bmatrix} a & b\\ c & 0 \end{bmatrix} \mid a, b, c \in R\right\}$ and W = $\left\{\begin{bmatrix} a & 0\\ 0 & d \end{bmatrix} \mid a, d \in R\right\}$ be subspaces of M2 (R).
Determine U ∩ W. Is M2 (R) = U ∪ W? What is U + W?



7. Let W and U be two subspaces of a vector space V over F.

(a) Prove that W ∩ U is a subspace of V.


(b) Give examples of W and U such that W ∪ U is not a subspace of V.
(c) Determine conditions on W and U such that W ∪ U is a subspace of V.
(d) Prove that LS(W ∪ U) = W + U.

8. Let S = {x1 , x2 , x3 , x4 }, where x1 = (1, 0, 0)T , x2 = (1, 1, 0)T , x3 = (1, 2, 0)T and x4 =
(1, 1, 1)T . Then, determine all xi such that LS(S) = LS(S \ {xi }).

9. Let W = LS((1, 0, 0)T , (1, 1, 0)T ) and U = LS((1, 1, 1)T ). Prove that W + U = R3 and
W ∩ U = {0}. If u ∈ R3 , determine uW ∈ W and uU ∈ U such that u = uW + uU . Is it
necessary that uW and uU are unique?

10. Let W = LS((1, −1, 0), (1, 1, 0)) and U = LS((1, 1, 1), (1, 2, 1)). Prove that W + U = R3
and W ∩ U 6= {0}. Find u ∈ R3 such that when we write u = uW + uU , with uW ∈ W and
uU ∈ U, the vectors uW and uU are not unique.

3.1.C Fundamental Subspaces Associated with a Matrix

Definition 3.3.1.25. Let A ∈ Mm,n (C). Then, we use the functions P : Cn → Cm and
Q : Cm → Cn defined by P (x) = Ax and Q(y) = A∗ y, respectively, for x ∈ Cn and y ∈ Cm , to
get the four fundamental subspaces associated with A, namely,

1. Col(A) = {Ax | x ∈ Cn } = Rng(P ) is a subspace of Cm , called the Column / Range


space. Observe that Col(A) is the linear span of columns of A.
2. Null(A) = {x | Ax = 0, x ∈ Cn } = Null(P ) is a subspace of Cn , called the Null space.
3. Col(A∗ ) = {A∗ x | x ∈ Cn } = Rng(Q) is the linear span of rows of A = [aij ]. If
A ∈ Mm,n (R) then Col(A∗ ) reduces to Row(A) = {xT A | x ∈ Rm }, the row space of A.
4. Null(A∗ ) = {x | A∗ x = 0, x ∈ Cm } = Null(Q). If A ∈ Mm,n (R) then Null(A∗ ) reduces to Null(AT ) = {x ∈ Rm | xT A = 0T }.
 
Example 3.3.1.26. 1. Let A = $\begin{bmatrix} 1 & 1 & 1\\ 2 & 0 & 1\\ 1 & -1 & 0 \end{bmatrix}$. Then

(a) Col(A) = {x = (x1 , x2 , x3 )T ∈ R3 | x1 − x2 + x3 = 0}.


(b) Row(A) = {x = (x1 , x2 , x3 )T ∈ R3 | x1 + x2 − 2x3 = 0}.
(c) Null(A) = LS((1, 1, −2)T ).
(d) Null(AT ) = LS((1, −1, 1)T ).
 
2. Let A = $\begin{bmatrix} 1 & 1 & 0 & -1\\ 1 & -1 & 1 & 2\\ 2 & 0 & 1 & 1 \end{bmatrix}$. Then

(a) Col(A) = {x = (x1 , x2 , x3 )T ∈ R3 | x1 + x2 − x3 = 0}.


(b) Row(A) = {x = (x1 , x2 , x3 , x4 )T ∈ R4 | x1 − x2 − 2x3 = 0, x1 − 3x2 − 2x4 = 0}.
(c) Null(A) = LS({(1, −1, −2, 0)T , (1, −3, 0, −2)T }).
(d) Null(AT ) = LS((1, 1, −1)T ).
 
3. Let A = $\begin{bmatrix} 1 & i & 2i\\ i & -2 & -3\\ 1 & 1 & 1+i \end{bmatrix}$. Then

(a) Col(A) = {(x1 , x2 , x3 ) ∈ C3 | (2 + i)x1 − (1 − i)x2 − x3 = 0}.


(b) Col(A∗ ) = {(x1 , x2 , x3 ) ∈ C3 | ix1 − x2 + x3 = 0}.
(c) Null(A) = LS((i, 1, −1)T ).
(d) Null(A∗ ) = LS((−2 + i, 1 + i, 1)T ).

Remark 3.3.1.27. Let A ∈ Mm,n (R). Then, in Example 3.3.1.26, observe that the direction
ratios of Col(A) matches vector(s) in Null(AT ). Similarly, the direction ratios of Row(A)
matches with vectors in Null(A). What are the relationships in case A ∈ Mm,n (C)? We will
come back to these spaces again and again.

Let V be a vector space over either R or C. Then, we have learnt that


1. for any S ⊆ V, LS(S) is again a vector space. Moreover, LS(S) is the smallest subspace
containing S.

2. unless S = ∅, LS(S) has infinite number of vectors.

Therefore, the following questions arise:

(a) Are there conditions under which LS(S1 ) = LS(S2 ) for S1 6= S2 ?


(b) Is it always possible to find S so that LS(S) = V?
(c) Suppose we have found S ⊆ V such that LS(S) = V. Can we find the minimum
number of vectors in S?

We try to answer these questions in the subsequent sections.

3.2 Linear Independence


Definition 3.3.2.1. [Linear Independence and Dependence] Let S = {u1 , . . . , um } be a non-
empty subset of a vector space V over F. Then the set S is said to be linearly independent
if the system of linear equations

α1 u1 + α2 u2 + · · · + αm um = 0, (3.3.2.1)

in the unknowns αi ’s, 1 ≤ i ≤ m, has only the trivial solution. If the system (3.3.2.1) has a

non-trivial solution then the set S is said to be linearly dependent.



If the set S has infinitely many vectors then S is said to be linearly independent if for every
finite subset T of S, T is linearly independent, else S is linearly dependent.

Let V be a vector space over F and S = {u1 , . . . , um } ⊆ V with S 6= ∅. Then, one needs to
solve the linear system of equation

α1 u1 + α2 u2 + · · · + αm um = 0 (3.3.2.2)

in the unknowns α1 , . . . , αm ∈ F. If α1 = · · · = αm = 0 is the only solution of (3.3.2.2) then


S is a linearly independent subset of V. Otherwise, S is a linearly dependent subset of V. Since
one is solving a linear system over F, linear independence and dependence depend on F, the set
of scalars.

Example 3.3.2.2. Is the set S a linear independent set? Give reasons.

1. Let S = {(1, 2, 1)T , (2, 1, 4)T , (3, 3, 5)T }.


Solution: Consider the system a(1, 2, 1) + b(2, 1, 4) + c(3, 3, 5) = (0, 0, 0) in the unknowns
a, b and c. As rank of coefficient matrix is 2 < 3, the number of unknowns, the system has
a non-trivial solution. Thus, S is a linearly dependent subset of R3 .

2. Let S = {(1, 1, 1)T , (1, 1, 0)T , (1, 0, 1)T }.


Solution: Consider the system a(1, 1, 1) + b(1, 1, 0) + c(1, 0, 1) = (0, 0, 0) in the unknowns
a, b and c. As rank of coefficient matrix is 3 = the number of unknowns, the system has
only the trivial solution. Hence, S is a linearly independent subset of R3 .

3. Consider C as a complex vector space and let S = {1, i}.


Solution: Since C is a complex vector space, i · 1 + (−1)i = i − i = 0. So, S is a linearly dependent subset of the complex vector space C.

4. Consider C as a real vector space and let S = {1, i}.


Solution: Consider the linear system a · 1 + b · i = 0, in the unknowns a, b ∈ R. Since
a, b ∈ R, equating real and imaginary parts, we get a = b = 0. So, S is a linearly independent subset of the real vector space C.

5. Let A ∈ Mm,n (C). If Rank(A) < min{m, n} then the rows of A are linearly dependent.
Solution: Let B = RREF(A). Then, there exists an invertible matrix P = [pij ] such that
B = P A. Since Rank(A) < min{m, n}, B[m, :] = 0T . Thus, $0^T = B[m, :] = \sum_{i=1}^{m} p_{mi} A[i, :]$. As P is invertible, at least one pmi ≠ 0. Thus, the required result follows.
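The rank computations used in items 1 and 2 above are easy to reproduce; a small numpy sketch (the helper name is ours):

```python
import numpy as np

def is_linearly_independent(vectors):
    """The vectors are linearly independent iff the matrix having them as
    columns has rank equal to the number of vectors."""
    A = np.array(vectors, dtype=float).T
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent([(1, 2, 1), (2, 1, 4), (3, 3, 5)]))  # False (rank 2 < 3)
print(is_linearly_independent([(1, 1, 1), (1, 1, 0), (1, 0, 1)]))  # True  (rank 3)
```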

3.2.A Basic Results related to Linear Independence

The reader is expected to supply the proof of parts that are not given.

Proposition 3.3.2.3. Let V be a vector space over F. Then,



1. 0, the zero-vector, cannot belong to a linearly independent set.



2. every non-empty subset of a linearly independent set in V is also linearly independent.

3. a set containing a linearly dependent set of V is also linearly dependent.

Proof. Let S = {0 = u1 , u2 , . . . , un }. Then, 1 · u1 + 0 · u2 + · · · + 0 · un = 0. Hence, the system


α1 u1 + · · · + αm um = 0 has a non-trivial solution [α1 , α2 , . . . , αn ] = [1, 0 . . . , 0]. Thus, the set S
is linearly dependent.

Theorem 3.3.2.4. Let V be a vector space over F. Let S = {u1 , . . . , uk } ⊆ V with S 6= ∅. If


T ⊆ LS(S) such that m = | T | > k then T is a linearly dependent set.

Proof. Let T = {w1 , . . . , wm }. As wi ∈ LS(S), there exist aij ∈ F such that wi = ai1 u1 + · · · + aik uk , for 1 ≤ i ≤ m. So,

$$\begin{bmatrix} w_1\\ \vdots\\ w_m \end{bmatrix} = \begin{bmatrix} a_{11}u_1 + \cdots + a_{1k}u_k\\ \vdots\\ a_{m1}u_1 + \cdots + a_{mk}u_k \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mk} \end{bmatrix} \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix}.$$

As m > k, using Corollary 2.2.2.35, the system $A^T x = 0$ has a non-trivial solution, say $x^T = [\alpha_1 , \ldots , \alpha_m] \neq 0^T$. That is, $\sum_{i=1}^{m} \alpha_i A^T[:, i] = 0$. Or equivalently, $\sum_{i=1}^{m} \alpha_i A[i, :] = 0^T$. Thus,

$$\sum_{i=1}^{m} \alpha_i w_i = \sum_{i=1}^{m} \alpha_i \left( A[i, :] \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} \right) = \left( \sum_{i=1}^{m} \alpha_i A[i, :] \right) \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} = 0^T \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} = 0.$$

Thus, the set T is linearly dependent.



Corollary 3.3.2.5. Fix n ∈ N. Then, any set S ⊆ Rn with | S | ≥ n + 1 is linearly dependent.

Proof. Observe that Rn = LS({e1 , . . . , en }), where ei = In [:, i], the i-th column of In . Hence,
the required result follows using Theorem 3.3.2.4.

Theorem 3.3.2.6. Let V be a vector space over F and let S be a linearly independent subset of
V. Suppose v ∈ V. Then S ∪ {v} is linearly dependent if and only if v ∈ LS(S).

Proof. Let us assume that S ∪ {v} is linearly dependent. Then, there exist vi ∈ S such that the
system α1 v1 + · · · + αp vp + αp+1 v = 0 has a non-trivial solution, say αi = ci , for 1 ≤ i ≤ p + 1.
As the solution is non-trivial one of the ci ’s is non-zero. We claim that cp+1 6= 0.
For, if cp+1 = 0 then the system α1 v1 + · · · + αp vp = 0 in the unknowns α1 , . . . , αp has a
non-trivial solution [c1 , . . . , cp ]. This contradicts Proposition 3.3.2.3.2 as {v1 , . . . , vp } is a subset
of the linearly independent set S. Thus, cp+1 6= 0 and we get

$v = -\frac{1}{c_{p+1}}(c_1 v_1 + \cdots + c_p v_p) \in LS(v_1 , \ldots , v_p)$
as $-\frac{c_i}{c_{p+1}} \in F$, for 1 ≤ i ≤ p. Hence, v is a linear combination of v1 , . . . , vp .
Now, assume that v ∈ LS(S). Then, there exists ci ∈ F, not all zero and vi ∈ S such that
$v = \sum_{i=1}^{p} c_i v_i$. Thus, the system α1 v1 + · · · + αp vp + αp+1 v = 0 in the unknowns αi ’s has a

non-trivial solution [c1 , . . . , cp , −1]. Hence, S ∪ {v} is linearly dependent.


We now state a very important corollary of Theorem 3.3.2.6 without proof.

Corollary 3.3.2.7. Let V be a vector space over F and let S = {u1 , . . . , un } ⊆ V with u1 6= 0.
If S is

1. linearly dependent then there exists k, 2 ≤ k ≤ n with LS(u1 , . . . , uk ) = LS(u1 , . . . , uk−1 ).

2. linearly independent then v ∈ V \ LS(S) if and only if S ∪ {v} = {u1 , . . . , un , v} is also a


linearly independent subset of V.

3. linearly independent then LS(S) = V if and only if each proper superset of S is linearly
dependent.

3.2.B Application to Matrices

We leave the proof of the next result for readers.

Theorem 3.3.2.8. The following statements are equivalent for A ∈ Mn (C).


1. A is invertible.

2. The columns of A are linearly independent.

3. The rows of A are linearly independent.

A generalization of Theorem 3.3.2.8 is stated next.



Theorem 3.3.2.9. Let A ∈ Mm,n (C) with B = RREF(A). Then, the rows of A corresponding
to the pivotal rows of B are linearly independent. Also, the columns of A corresponding to the
pivotal columns of B are linearly independent.

Proof. Pivotal rows of B are linearly independent due to the pivotal 1’s. Now, let B1 be the
submatrix of B consisting of the pivotal rows of B. Let A1 be the submatrix of A which gives
B1 . As the RREF of a matrix is unique (see Corollary 2.2.2.17) there exists an invertible matrix
Q such that QA1 = B1 . So, if there exists c 6= 0 such that cT A1 = 0T then

0T = cT A1 = cT (Q−1 B1 ) = (cT Q−1 )B1 = dT B1 ,

with dT = cT Q−1 6= 0T as Q is an invertible matrix (see Theorem 2.2.3.1). Thus, it contradicts


the linear independence of the pivotal rows of B.
Let B[:, i1 ], . . . , B[:, ir ] be the pivotal columns of B which are linearly independent due to
pivotal 1’s. Let B = P A then the corresponding columns of A satisfy

[A[:, i1 ], . . . , A[:, ir ]] = P −1 [B[:, i1 ], . . . , B[:, ir ]].

Hence, the required result follows as P is an invertible matrix.


We give an example for better understanding.

Example 3.3.2.10. Let A = $\begin{bmatrix} 1 & 1 & 1 & 0\\ 1 & 0 & -1 & 1\\ 2 & 1 & 0 & 1\\ 1 & 1 & 1 & 2 \end{bmatrix}$ with RREF(A) = B = $\begin{bmatrix} 1 & 0 & -1 & 0\\ 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}$. Then
B[:, 3] = −B[:, 1]+2B[:, 2]. Thus, A[:, 3] = −A[:, 1]+2A[:, 2]. As the 1-st, 2-nd and 4-th columns
of B are linearly independent, the set {A[:, 1], A[:, 2], A[:, 4]} is linearly independent. Also, note
that during the application of GJE to get RREF, we have interchanged the 3-rd and 4-th rows.
Hence, the rows A[1, :], A[2, :] and A[4, :] are linearly independent.
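The computation is easy to reproduce with exact arithmetic; a sympy sketch (Matrix.rref returns the RREF together with the 0-based indices of the pivotal columns):

```python
from sympy import Matrix

A = Matrix([[1, 1, 1, 0],
            [1, 0, -1, 1],
            [2, 1, 0, 1],
            [1, 1, 1, 2]])

B, pivots = A.rref()
print(pivots)                                   # (0, 1, 3): columns 1, 2 and 4 are pivotal
print(B.col(2) == -B.col(0) + 2 * B.col(1))     # True: B[:,3] = -B[:,1] + 2 B[:,2]
print(A.col(2) == -A.col(0) + 2 * A.col(1))     # True: the same relation holds for A
```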

3.2.C Linear Independence and Uniqueness of Linear Combination

We end this section with a result that states that linear combination with respect to linearly
independent set is unique.

Lemma 3.3.2.11. Let S be a linearly independent set in a vector space V over F. Then each
v ∈ LS(S) is a unique linear combination vectors from S.

Proof. On the contrary, suppose there exists v ∈ LS(S) such that v = α1 v1 + · · · + αp vp and
v = β1 v1 + · · · + βp vp , for αi , βi ∈ F and vi ∈ S, for 1 ≤ i ≤ p. Equating the two expressions
for v gives
(α1 − β1 )v1 + · · · + (αp − βp )vp = 0. (3.3.2.3)

As {v1 , . . . , vp } ⊆ S is a linearly independent subset in LS(S), the system c1 v1 + · · · + cp vp = 0


in the unknowns c1 , . . . , cp has only the trivial solution. Thus, each of the scalars αi − βi ,
appearing in Equation (3.3.2.3), must be equal to 0. That is, αi − βi = 0 , for 1 ≤ i ≤ p. Thus,
for 1 ≤ i ≤ p, αi = βi and the result follows.

Exercise 3.3.2.12. 1. Consider the Euclidean plane R2 . Let u1 = (1, 0)T . Determine con-
dition on u2 such that {u1 , u2 } is a linearly independent subset of R2 .

2. Let S = {(1, 1, 1, 1)T , (1, −1, 1, 2)T , (1, 1, −1, 1)T } ⊆ R4 . Does (1, 1, 2, 1)T ∈ LS(S)? Fur-
thermore, determine conditions on x, y, z and u such that (x, y, z, u)T ∈ LS(S).

3. Show that S = {(1, 2, 3)T , (−2, 1, 1)T , (8, 6, 10)T } ⊆ R3 is linearly dependent.

4. Prove that {u1 , . . . , un } ⊆ Cn is linearly independent if and only if {Au1 , . . . , Aun } is


linearly independent for every invertible matrix A.

5. Let V be a complex vector space and let A ∈ Mn (C) be invertible. Then {u1 , . . . , un } ⊆ V
is a linearly independent if and only if {w1 , . . . , wn } ⊆ V is linearly independent, where
Pn
wi = aij uj , for 1 ≤ i ≤ n.
j=1

6. Find u, v, w ∈ R4 such that {u, v, w} is linearly dependent whereas {u, v}, {u, w} and
{v, w} are linearly independent.

7. Is {(1, 0)T , (i, 0)T } a linearly independent subset of the real vector space C2 ?

8. Suppose V is a collection of vectors such that V is a real as well as a complex vector space.

Then prove that {u1 , . . . , uk , iu1 , . . . , iuk } is a linearly independent subset of V over R if

and only if {u1 , . . . , uk } is a linear independent subset of V over C.

9. Let V be a vector space and M be a subspace of V. For u, v ∈ V \M , define K = LS(M, u)


and H = LS(M, v). Then prove that v ∈ K if and only if u ∈ H.

10. Let A ∈ Mn (R). Suppose x, y ∈ Rn \ {0} such that Ax = 3x and Ay = 2y. Then prove
that x and y are linearly independent.
 
11. Let A = $\begin{bmatrix} 2 & 1 & 3\\ 4 & -1 & 3\\ 3 & -2 & 5 \end{bmatrix}$. Determine x, y, z ∈ R3 \ {0} such that Ax = 6x, Ay = 2y and
Az = −2z. Use the vectors x, y and z obtained above to prove the following.

(a) A2 v = 4v, where v = cy + dz for any c, d ∈ R.


(b) The set {x, y, z} is linearly independent.
(c) Let P = [x, y, z] be a 3 × 3 matrix. Then P is invertible.
 
(d) Let D = $\begin{bmatrix} 6 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & -2 \end{bmatrix}$. Then AP = P D.

12. Prove that the rows/columns of


(a) A ∈ Mn (C) are linearly independent if and only if det(A) 6= 0.
(b) A ∈ Mn (C) span Cn if and only if A is an invertible matrix.

(c) a skew-symmetric matrix A of odd order are linearly dependent.

13. Let P and Q be subspaces of Rn such that P + Q = Rn and P ∩ Q = {0}. Prove that each
u ∈ Rn is uniquely expressible as u = uP + uQ , where uP ∈ P and uQ ∈ Q.

3.3 Basis of a Vector Space


Definition 3.3.3.1. Let S be a subset of a set T . Then S is said to be a maximal subset of
T having property P if
1. S has property P and
2. no proper superset of S contained in T has property P .

Example 3.3.3.2. Let T = {2, 3, 4, 7, 8, 10, 12, 13, 14, 15}. Then a maximal subset of T of
consecutive integers is S = {2, 3, 4}. Other maximal subsets are {7, 8}, {10} and {12, 13, 14, 15}.
Note that {12, 13} is not maximal. Why?

Definition 3.3.3.3. Let V be a vector space over F. Then S is called a maximal linearly
independent subset of V if

1. S is linearly independent and


2. no proper superset of S in V is linearly independent.

Example 3.3.3.4. 1. In R3 , the set S = {e1 , e2 } is linearly independent but not maximal
as S ∪ {(1, 1, 1)T } is a linearly independent set containing S.
2. In R3 , S = {(1, 0, 0)T , (1, 1, 0)T , (1, 1, −1)T } is a maximal linearly independent set as any
collection of 4 or more vectors from R3 is linearly dependent (see Corollary 3.3.2.5).
3. Let S = {v1 , . . . , vk } ⊆ Rn . Now, form the matrix A = [v1 , . . . , vk ] and let B = RREF(A).
Then, using Theorem 3.3.2.9, we see that if B[:, i1 ], . . . , B[:, ir ] are the pivotal columns of
B then {vi1 , . . . , vir } is a maximal linearly independent subset of S.
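Item 3 can be turned directly into a small routine; a sympy sketch (the helper name is ours, and the vectors of Exercise 3.3.1.24.8 are used as a test case):

```python
from sympy import Matrix

def maximal_independent_subset(vectors):
    """Return the vectors indexed by the pivotal columns of [v1 ... vk];
    by Theorem 3.3.2.9 this is a maximal linearly independent subset."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    _, pivots = A.rref()
    return [vectors[i] for i in pivots]

S = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 1, 1)]
print(maximal_independent_subset(S))    # [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
```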

Theorem 3.3.3.5. Let V be a vector space over F and S a linearly independent set in V. Then
S is maximal linearly independent if and only if LS(S) = V.

Proof. Let v ∈ V. As S is linearly independent, using Corollary 3.3.2.7.2, the set S ∪ {v} is
linearly independent if and only if v ∈ V \ LS(S). Thus, the required result follows.
Let V = LS(S) for some set S with | S | = k. Then, using Theorem 3.3.2.4, we see that if
T ⊆ V is linearly independent then | T | ≤ k. Hence, a maximal linearly independent subset
of V can have at most k vectors. Thus, we arrive at the following important result.

Theorem 3.3.3.6. Let V be a vector space over F and let S and T be two finite maximal linearly
independent subsets of V. Then | S | = | T | .

Proof. By Theorem 3.3.3.5, S and T are maximal linearly independent if and only if LS(S) =
V = LS(T ). Now, use the previous paragraph to get the required result.

Definition 3.3.3.7. Let V be a vector space over F with V 6= {0}. Suppose V has a finite
maximal linearly independent set S. Then | S | is called the dimension of V, denoted dim(V).
By convention, dim({0}) = 0.
Example 3.3.3.8. 1. As {π} is a maximal linearly independent subset of R, dim(R) = 1.
2. As {(1, 0, 1)T , (0, 1, 1)T , (1, 1, 0)T } ⊆ R3 is maximal linearly independent, dim(R3 ) = 3.
3. As {e1 , . . . , en } is a maximal linearly independent set in Rn , dim(Rn ) = n.
4. As {e1 , . . . , en } is a maximal linearly independent subset of the complex vector space Cn ,
dim(Cn ) = n.
5. Using Exercise 3.3.2.12.8, {e1 , . . . , en , ie1 , . . . , ien } is a maximal linearly independent sub-
set of the real vector space Cn . Thus, as a real vector space dim(Cn ) = 2n.
6. Let S = {v1 , . . . , vk } ⊆ Rn . Define A = [v1 , . . . , vk ]. Then, using Example 3.3.3.4.3, we
see that dim(LS(S)) = Rank(A).

Definition 3.3.3.9. [Basis of a Vector Space] Let V be a vector space over F with V 6= {0}.
Then a maximal linearly independent subset of V is called a basis of V. The vectors in a basis
are called basis vectors. Note that a basis of {0} is either not defined or is the empty set.

Definition 3.3.3.10. Let V be a vector space over F with V 6= {0}. Then a set S ⊆ V is called

minimal spanning if LS(S) = V and no proper subset of S spans V.



Example 3.3.3.11. [Standard Basis] Fix n ∈ N and let ei = In [:, i], the i-th column of the
identity matrix. Then B = {e1 , . . . , en } is called the standard basis of Rn or Cn . In particular,

1. B = {e1 } = {1} is a standard basis of R over R.

2. B = {e1 , e2 } with eT1 = (1, 0)T and eT2 = (0, 1)T is the standard basis of R2 .

3. B = {e1 , e2 , e3 } = {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is the standard basis of R3 .

4. {1} is a basis of C over C.

5. {e1 , . . . , en , ie1 , . . . , ien } is a basis of Cn over R. So, {1, i} is a basis of C over R.


Example 3.3.3.12. 1. Note that {−2} is a basis and a minimal spanning set of R.
2. Let u1 , u2 , u3 ∈ R2 . Then, {u1 , u2 , u3 } can neither be a basis nor a minimal spanning
subset of R2 .
3. {(1, 1, −1)T , (1, −1, 1)T , (−1, 1, 1)T } is a basis and a minimal spanning subset of R3 .
4. Let V = {(x, y, 0)T | x, y ∈ R} ⊆ R3 . Then B = {(1, 0, 0)T , (1, 3, 0)T } is a basis of V.
5. Let V = {(x, y, z)T ∈ R3 | x + y − z = 0} ⊆ R3 . As each element (x, y, z)T ∈ V satisfies
x + y − z = 0. Or equivalently z = x + y, we see that

(x, y, z) = (x, y, x + y) = (x, 0, x) + (0, y, y) = x(1, 0, 1) + y(0, 1, 1).

Hence, {(1, 0, 1)T , (0, 1, 1)T } forms a basis of V.



6. Let S = {1, 2, . . . , n} and consider the vector space RS (see Example 3.3.1.4.8). Then, for 1 ≤ i ≤ n, define ei (j) = 1 if j = i and ei (j) = 0 if j ≠ i. Prove that B = {e1 , . . . , en } is a linearly
independent set. Is it a basis of RS ?

7. Let S = Rn and consider the vector space RS (see Example 3.3.1.4.8). For 1 ≤ i ≤ n,

define the functions ei (x) = ei (x1 , . . . , xn ) = xi . Then, verify that {e1 , . . . , en } is a
linearly independent set. Is it a basis of RS ?

8. Let S = {v1 , . . . , vk } ⊆ Rn . Define A = [v1 , . . . , vk ]. Then, using Theorem 3.3.2.9, the


columns of A corresponding to the pivotal columns in RREF(A) form a basis as well as a
minimal spanning subset of LS(S).

9. Let S = {a1 , . . . , an }. Then S


( recall that R is a real vector space (see Example 8). For
1 if j = i
1 ≤ i ≤ n, define fi (aj ) = . Then, verify that {f1 , . . . , fn } is a basis of
0 otherwise
RS . What can you say if S is a countable set?

10. Recall the vector space C[a, b], where a < b ∈ R. For each α ∈ [a, b] ∩ Q, define fα (x) = x − α, for all x ∈ [a, b]. Is the set {fα | α ∈ [a, b] ∩ Q} linearly independent? What if α ∈ [a, b]?

Can we write any function in C[a, b] as a finite linear combination? Give reasons for your

answer.

3.3.A Main Results associated with Bases

Theorem 3.3.3.13. Let V 6= {0} be a vector space over F. Then the following statements are
equivalent.
1. B is a basis (maximal linearly independent subset) of V.

2. B is linearly independent and it spans V.

3. B is a minimal spanning set of V.

Proof. 1 ⇒ 2 By definition, every basis is a maximal linearly independent subset of V.


Thus, using Corollary 3.3.2.7.2, we see that B spans V.
2 ⇒ 3 Let S be a linearly independent set that spans V. As S is linearly independent,
for any x ∈ S, x ∈
/ LS (S − {x}). Hence LS (S − {x}) 6= V.
3 ⇒ 1 If B is linearly dependent then using Corollary 3.3.2.7.1 B is not minimal spanning.
A contradiction. Hence, B is linearly independent.
We now need to show that B is a maximal linearly independent set. Since B spans V, for any
x ∈ V \ B, the set B ∪ {x} is linearly dependent. That is, every proper superset of B is linearly
dependent. Hence, the required result follows.
Now, using Lemma 3.3.2.11, we get the following result.

Remark 3.3.3.14. Let B be a basis of a vector space V over F. Then, for each v ∈ V, there
exist unique ui ∈ B and unique αi ∈ F, for 1 ≤ i ≤ n, such that $v = \sum_{i=1}^{n} \alpha_i u_i$.

The next result is generally known as ”every linearly independent set can be extended to form
a basis in a finite dimensional vector space”.

Theorem 3.3.3.15. Let V be a vector space over F with dim(V) = n. If S is a linearly


independent subset of V then there exists a basis T of V such that S ⊆ T .

Proof. If LS(S) = V, we are done. Else, choose u1 ∈ V \ LS(S). Thus, by Corollary 3.3.2.7.2, the set S ∪ {u1 } is linearly independent. We repeat this process; since dim(V) = n, no linearly independent set has more than n vectors, so after finitely many steps we obtain a linearly independent set T ⊇ S with LS(T ) = V. By Theorem 3.3.3.13, this T is indeed a required basis.

3.3.B Constructing a Basis of a Finite Dimensional Vector Space

We end this section with an algorithm which is based on the proof of the previous theorem.

Step 1: Let v1 ∈ V with v1 6= 0. Then {v1 } is linearly independent.

Step 2: If V = LS(v1 ), we have got a basis of V. Else, pick v2 ∈ V \ LS(v1 ). Then by


Corollary 3.3.2.7.2, {v1 , v2 } is linearly independent.

Step i: Either V = LS(v1 , . . . , vi ) or LS(v1 , . . . , vi ) 6= V. In the first case, {v1 , . . . , vi } is


a basis of V. Else, pick vi+1 ∈ V \ LS(v1 , . . . , vi ). Then, by Corollary 3.3.2.7.2, the set

{v1 , . . . , vi+1 } is linearly independent.


This process will finally end as V is a finite dimensional vector space.
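For subspaces of Rn the construction above can be run mechanically; a minimal numpy sketch (our own helper: membership in the current span is tested via a rank comparison, and the standard basis vectors are tried as candidates):

```python
import numpy as np

def extend_to_basis(S, dim):
    """Extend the linearly independent list S to a basis of R^dim."""
    basis = [np.asarray(v, dtype=float) for v in S]
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = 1.0
        # e lies outside LS(basis) exactly when appending it raises the rank.
        if np.linalg.matrix_rank(np.vstack(basis + [e])) > len(basis):
            basis.append(e)
        if len(basis) == dim:
            break
    return basis

print(extend_to_basis([(1, 1, -2)], 3))
# e.g. [array([ 1.,  1., -2.]), array([1., 0., 0.]), array([0., 1., 0.])]
```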

Exercise 3.3.3.16. 1. Let B = {u1 , . . . , un } be a basis of a vector space V over F. Then,


does the condition $\sum_{i=1}^{n} \alpha_i u_i = 0$ in the αi ’s imply that αi = 0, for 1 ≤ i ≤ n?

2. Find a basis of R3 containing the vector (1, 1, −2)T .

3. Find a basis of R3 containing the vector (1, 1, −2)T and (1, 2, −1)T .

4. Is it possible to find a basis of R4 containing the vectors (1, 1, 1, −2)T , (1, 2, −1, 1)T and
(1, −2, 7, −11)T ?

5. Let S = {v1 , . . . , vp } be a subset of a vector space V over F. Suppose LS(S) = V but S


is not a linearly independent set. Then does this imply that each v ∈ V is expressible in
more than one way as a linear combination of vectors from S?

6. Show that B = {(1, 0, 1)T , (1, i, 0)T , (1, 1, 1 − i)T } is a basis of C3 over C.

7. Find a basis of C3 over R containing the basis B given in Example 3.3.3.16.6.

8. Determine a basis and dimension of W = {(x, y, z, w)T ∈ R4 | x + y − z + w = 0}.

9. Find a basis of V = {(x, y, z, u) ∈ R4 | x − y − z = 0, x + z − u = 0}.


 
10. Let A = $\begin{bmatrix} 1 & 0 & 1 & 1 & 0\\ 0 & 1 & 2 & 3 & 0\\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$. Find a basis of V = {x ∈ R5 | Ax = 0}.

11. Prove that B = {1, x, . . . , xn , . . .} is a basis of R[x]. B is called the standard basis of R[x].

12. Let uT = (1, 1, −2), vT = (−1, 2, 3) and wT = (1, 10, 1). Find a basis of L(u, v, w).
Determine a geometrical representation of LS(u, v, w).

13. Let V be a vector space of dimension n. Then any set

(a) consisting of n linearly independent vectors forms a basis of V.


(b) S in V having n vectors with LS(S) = V forms a basis of V.

14. Let {v1 , . . . , vn } be a basis of Cn . Then prove that the two matrices B = [v1 , . . . , vn ] and $C = \begin{bmatrix} v_1^T\\ \vdots\\ v_n^T \end{bmatrix}$ are invertible.

15. Let W1 and W2 be two subspaces of a vector space V such that W1 ⊆ W2 . Show that
W1 = W2 if and only if dim(W1 ) = dim(W2 ).

16. Consider the vector space C([−π, π]) over R. For each n ∈ N, define en (x) = sin(nx).
Then prove that S = {en | n ∈ N} is linearly independent. [Hint: Need to show that every
finite subset of S is linearly independent. So, on the contrary assume that there exists ℓ ∈ N and

functions ek1 , . . . , ekℓ such that α1 ek1 + · · · + αℓ ekℓ = 0, for some αt ≠ 0 with 1 ≤ t ≤ ℓ. But, the
above system is equivalent to looking at α1 sin(k1 x) + · · · + αℓ sin(kℓ x) = 0 for all x ∈ [−π, π]. Now

in the integral $\int_{-\pi}^{\pi} \sin(mx)\,\big(\alpha_1 \sin(k_1 x) + \cdots + \alpha_\ell \sin(k_\ell x)\big)\, dx$

replace m with ki ’s to show that αi = 0, for all i, 1 ≤ i ≤ ℓ to get the required contradiction.]

17. Let V be a vector space over F with dim(V) = n. If W1 is a k-dimensional subspace of V


then prove that there exists a subspace W2 of V such that W1 ∩ W2 = {0}, W1 + W2 = V
and dim(W2 ) = n − k. Also, prove that for each v ∈ V there exist unique vectors w1 ∈ W1
and w2 ∈ W2 such that v = w1 + w2 . The subspace W2 is called the complementary
subspace of W1 in V.

18. Is the set W = {p(x) ∈ R[x; 4] | p(−1) = p(1) = 0} a subspace of R[x; 4]? If yes, find its
dimension.

3.4 Application to the subspaces of Cn


In this subsection, we will study results that are intrinsic to the understanding of linear algebra
from the point of view of matrices, especially the fundamental subspaces (see Definition 3.3.1.25)
associated with matrices. We start with an example.
 
Example 3.3.4.1. 1. Compute the fundamental subspaces for A = $\begin{bmatrix} 1 & 1 & 1 & -2\\ 1 & 2 & -1 & 1\\ 1 & -2 & 7 & -11 \end{bmatrix}$.
Solution: Verify the following

(a) Col(A∗ ) = Row(A) = {(x, y, z, u)T ∈ C4 | 3x − 2y = z, 5x − 3y + u = 0}.


(b) Col(A)= Row(A∗ ) = {(x, y, z)T ∈ C3 | 4x − 3y − z = 0}.
(c) Null(A) = {(x, y, z, u)T ∈ C4 | x + 3z − 5u = 0, y − 2z + 3u = 0}.
(d) Null(A∗ ) = {(x, y, z)T ∈ C3 | x + 4z = 0, y − 3z = 0}.
 
2. Let A = $\begin{bmatrix} 1 & 1 & 0 & 1 & 1 & 0 & -1\\ 0 & 0 & 1 & 2 & 3 & 0 & -2\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}$. Find a basis and dimension of Null(A).
Solution: Writing the basic variables x1 , x3 and x6 in terms of the free variables x2 , x4 , x5 and x7 , we get x1 = x7 − x2 − x4 − x5 , x3 = 2x7 − 2x4 − 3x5 and x6 = −x7 . Hence, the
solution set has the form
           
$$\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6\\ x_7 \end{bmatrix} = \begin{bmatrix} x_7 - x_2 - x_4 - x_5\\ x_2\\ 2x_7 - 2x_4 - 3x_5\\ x_4\\ x_5\\ -x_7\\ x_7 \end{bmatrix} = x_2\begin{bmatrix} -1\\ 1\\ 0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix} + x_4\begin{bmatrix} -1\\ 0\\ -2\\ 1\\ 0\\ 0\\ 0 \end{bmatrix} + x_5\begin{bmatrix} -1\\ 0\\ -3\\ 0\\ 1\\ 0\\ 0 \end{bmatrix} + x_7\begin{bmatrix} 1\\ 0\\ 2\\ 0\\ 0\\ -1\\ 1 \end{bmatrix}. \qquad (3.3.4.1)$$

Now, let $u_1^T = [-1, 1, 0, 0, 0, 0, 0]$, $u_2^T = [-1, 0, -2, 1, 0, 0, 0]$, $u_3^T = [-1, 0, -3, 0, 1, 0, 0]$ and $u_4^T = [1, 0, 2, 0, 0, -1, 1]$. Then S = {u1 , u2 , u3 , u4 } is a basis of Null(A). The
reasons for S to be a basis are as follows:

(a) By Equation (3.3.4.1) Null(A) = LS(S).


(b) For Linear independence, the homogeneous system c1 u1 + c2 u2 + c3 u3 + c4 u4 = 0 in
the unknowns c1 , c2 , c3 and c4 has only the trivial solution as
i. u4 is the only vector with a non-zero entry at the 7-th place (u4 corresponds to
x7 ) and hence c4 = 0.
ii. u3 is the only vector with a non-zero entry at the 5-th place (u3 corresponds to
x5 ) and hence c3 = 0.
iii. Similar arguments hold for the unknowns c2 and c1 .
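The same basis can be obtained directly with exact arithmetic; a sympy sketch (nullspace() returns one basis vector per free variable, and the output matches u1 , . . . , u4 above):

```python
from sympy import Matrix

A = Matrix([[1, 1, 0, 1, 1, 0, -1],
            [0, 0, 1, 2, 3, 0, -2],
            [0, 0, 0, 0, 0, 1, 1]])

for v in A.nullspace():
    print(list(v))
# [-1, 1, 0, 0, 0, 0, 0]
# [-1, 0, -2, 1, 0, 0, 0]
# [-1, 0, -3, 0, 1, 0, 0]
# [1, 0, 2, 0, 0, -1, 1]
```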
   
Exercise 3.3.4.2. Let A = $\begin{bmatrix} 1 & 2 & 1 & 3 & 2\\ 0 & 2 & 2 & 2 & 4\\ 2 & -2 & 4 & 0 & 8\\ 4 & 2 & 5 & 6 & 10 \end{bmatrix}$ and B = $\begin{bmatrix} 2 & 4 & 0 & 6\\ -1 & 0 & -2 & 5\\ -3 & -5 & 1 & -4\\ -1 & -1 & 1 & 2 \end{bmatrix}$.

1. Find RREF(A) and RREF(B).

2. Find invertible matrices P1 and P2 such that P1 A = RREF(A) and P2 B = RREF(B).

3. Find bases for Col(A), Col(A∗ ), Col(B) and Col(B ∗ ).



4. Find bases of Null(A), Null(A∗ ), Null(B) and Null(B ∗ ).

5. Find the dimensions of all the vector subspaces so obtained.

6. Does there exist relationship between the bases of different spaces?

Lemma 3.3.4.3. Let A ∈ Mm×n (C) and let E be an elementary matrix. If


1. B = EA then Col(A∗ ) = Col(B ∗ ) and Col(AT ) = Col(B T ). Hence dim(Col(A∗ )) = dim(Col(B ∗ )) and dim(Col(AT )) = dim(Col(B T )).
2. B = AE then Col(A) = Col(B) and Col($\overline{A}$) = Col($\overline{B}$). Hence dim(Col(A)) = dim(Col(B)) and dim(Col($\overline{A}$)) = dim(Col($\overline{B}$)).

Proof. Note that B = EA if and only if $\overline{B} = \overline{E}\,\overline{A}$. As E is invertible, A and B are equivalent and hence they have the same RREF. Also, $\overline{E}$ is invertible as well and hence $\overline{A}$ and $\overline{B}$ have the same RREF. Now, use Theorem 3.3.2.9 to get the required result.
For the second part, note that B ∗ = E ∗ A∗ and E ∗ is invertible. Hence, using the first part
Col((A∗ )∗ ) = Col((B ∗ )∗ ), or equivalently, Col(A) = Col(B).
Let A ∈ Mm×n (C) and let B = RREF(A). Then as an immediate application of Lemma 3.3.4.3,
we get dim(Col(A∗ )) = Row rank(A). Hence, dim(Col(A)) = Column rank(A) as well. We

now prove that Row rank(A) = Column rank(A).



Theorem 3.3.4.4. Let A ∈ Mm×n (C). Then Row rank(A) = Column rank(A).

Proof. Let Row rank(A) = r = dim(Col(AT )). Then there exist i1 , . . . , ir such that {A[i1 , :], . . . , A[ir , :]} form a basis of Col(AT ). Then, $B = \begin{bmatrix} A[i_1, :]\\ \vdots\\ A[i_r, :] \end{bmatrix}$ is an r × n matrix and its rows
are a basis of Col(AT ). Therefore, there exist αij ∈ C, 1 ≤ i ≤ m, 1 ≤ j ≤ r such that
A[t, :] = [αt1 , . . . , αtr ]B, for 1 ≤ t ≤ m. So, using matrix multiplication (see Remark 1.1.2.11.4)
   
$$A = \begin{bmatrix} A[1, :]\\ \vdots\\ A[m, :] \end{bmatrix} = \begin{bmatrix} [\alpha_{11}, \ldots, \alpha_{1r}]B\\ \vdots\\ [\alpha_{m1}, \ldots, \alpha_{mr}]B \end{bmatrix} = CB,$$
where C = [αij ] is an m × r matrix. Thus, matrix multiplication implies that each column of
A is a linear combination of r columns of C. Hence, Column rank(A) = dim(Col(A)) ≤ r =
Row rank. A similar argument gives Row rank(A) ≤ Column rank(A). Hence, we have the
required result.
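A quick numerical sanity check of the theorem (numpy sketch: the rank of A and of A^T must agree):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6)).astype(float)   # a random 4 x 6 matrix

print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))   # True
```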

Remark 3.3.4.5. The proof also shows that for every A ∈ Mm×n (C) of rank r there exists
matrices Br×n and Cm×r , each of rank r, such that A = CB.

Let W1 and W2 be two subspaces of a vector space V over F. Then recall that (see Exer-
cise 3.3.1.24.7d) W1 + W2 = {u + v | u ∈ W1 , v ∈ W2 } = LS(W1 ∪ W2 ) is the smallest
subspace of V containing both W1 and W2 . We now state a result similar to a result in Venn
diagram that states | A | + | B | = | A ∪ B | + | A ∩ B |, whenever the sets A and B are
finite (for a proof, see Appendix 7.7.4.1).

Theorem 3.3.4.6. Let V be a finite dimensional vector space over F. If W1 and W2 are two
subspaces of V then

dim(W1 ) + dim(W2 ) = dim(W1 + W2 ) + dim(W1 ∩ W2 ). (3.3.4.2)

For better understanding, we give an example for finite subsets of Rn . The example uses
Theorem 3.3.2.9 to obtain bases of LS(S), for different choices S. The readers are advised to
see Example 3.3.2.9 before proceeding further.

Example 3.3.4.7. Let V and W be two subspaces of R5 with V = {(v, w, x, y, z)T ∈ R5 | v + x + z = 3y}
and W = {(v, w, x, y, z)T ∈ R5 | w − x = z, v = y}. Find bases of V and W containing a basis
of V ∩ W.
Solution: (v, w, x, y, z)T ∈ V ∩ W if v, w, x, y and z satisfy v + x − 3y + z = 0, w − x − z = 0
and v = y. The solution of the system is given by

(v, w, x, y, z)T = (y, 2y, x, y, 2y − x)T = y(1, 2, 0, 1, 2)T + x(0, 0, 1, 0, −1)T .

Thus, B = {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T } is a basis of V ∩ W. Similarly, a basis of V is given


by C = {(−1, 0, 1, 0, 0)T , (0, 1, 0, 0, 0)T , (3, 0, 0, 1, 0)T , (−1, 0, 0, 0, 1)T } and that of W is given by

D = {(1, 0, 0, 1, 0)T , (0, 1, 1, 0, 0)T , (0, 1, 0, 0, 1)T }. To find the required basis form a matrix whose

rows are the vectors in B, C and D (see Equation(3.3.4.3)) and apply row operations other than

Eij . Then after a few row operations, we get


  
$$\begin{bmatrix} 1 & 2 & 0 & 1 & 2\\ 0 & 0 & 1 & 0 & -1\\ -1 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 3 & 0 & 0 & 1 & 0\\ -1 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 & 0 & 1 & 2\\ 0 & 0 & 1 & 0 & -1\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 3\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \qquad (3.3.4.3)$$

Thus, a required basis of V is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 0)T , (0, 0, 0, 1, 3)T }. Sim-
ilarly, a required basis of W is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 1)T }.
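The dimensions appearing in this example also give a concrete check of Theorem 3.3.4.6; a numpy sketch (V and W are encoded by their defining equations, so each dimension is 5 minus the rank of the constraint rows, and dim(V + W) is the rank of the stacked bases C and D found above):

```python
import numpy as np

# Coordinates ordered as (v, w, x, y, z).
V_eqs = np.array([[1., 0., 1., -3., 1.]])                    # v + x - 3y + z = 0
W_eqs = np.array([[0., 1., -1., 0., -1.],                    # w - x - z = 0
                  [1., 0., 0., -1., 0.]])                    # v - y = 0

dim_V   = 5 - np.linalg.matrix_rank(V_eqs)                           # 4
dim_W   = 5 - np.linalg.matrix_rank(W_eqs)                           # 3
dim_cap = 5 - np.linalg.matrix_rank(np.vstack([V_eqs, W_eqs]))       # dim(V ∩ W) = 2

C = [(-1, 0, 1, 0, 0), (0, 1, 0, 0, 0), (3, 0, 0, 1, 0), (-1, 0, 0, 0, 1)]   # basis of V
D = [(1, 0, 0, 1, 0), (0, 1, 1, 0, 0), (0, 1, 0, 0, 1)]                      # basis of W
dim_sum = np.linalg.matrix_rank(np.array(C + D, dtype=float))                # dim(V + W) = 5

print(dim_V + dim_W == dim_sum + dim_cap)    # True: 4 + 3 = 5 + 2
```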

Exercise 3.3.4.8. 1. Give an example to show that if A and B are equivalent then Col(A)
need not equal Col(B).

2. Let V = {(x, y, z, w)T ∈ R4 | x + y − z + w = 0, x + y + z + w = 0, x + 2y = 0} and


W = {(x, y, z, w)T ∈ R4 | x − y − z + w = 0, x + 2y − w = 0} be two subspaces of R4 . Find
bases and dimensions of V, W, V ∩ W and V + W.

3. Let W1 and W2 be 4-dimensional subspaces of a vector space V of dimension 7. Then


prove that dim(W1 ∩ W2 ) ≥ 1.

4. Let W1 and W2 be two subspaces of a vector space V. If dim(W1 ) + dim(W2 ) > dim(V),
then prove that dim(W1 ∩ W2 ) ≥ 1.

5. Let A ∈ Mm×n (C) with m < n. Prove that the columns of A are linearly dependent.

We now prove the rank-nullity theorem and give some of it’s consequences.

Theorem 3.3.4.9 (Rank-Nullity Theorem). Let A ∈ Mm×n (C). Then

dim(Col(A)) + dim(Null(A)) = n. (3.3.4.4)

Proof. Let dim(Null(A)) = r ≤ n and let B = {u1 , . . . , ur } be a basis of Null(A). Since B is a linearly independent set in Cn , extend it to a basis {u1 , . . . , un } of Cn . Then,

Col(A) = LS(Au1 , . . . , Aun ) = LS(0, . . . , 0, Aur+1 , . . . , Aun ) = LS(Aur+1 , . . . , Aun ).

So, C = {Aur+1 , . . . , Aun } spans Col(A). We further need to show that C is linearly indepen-
dent. So, consider the linear system

α1 Aur+1 + · · · + αn−r Aun = 0 ⇔ A(α1 ur+1 + · · · + αn−r un ) = 0 (3.3.4.5)



in the unknowns α1 , . . . , αn−r . Thus, α1 ur+1 + · · · + αn−r un ∈ Null(A) = LS(B). Therefore,



there exist scalars βj , 1 ≤ j ≤ r, such that $\sum_{i=1}^{n-r} \alpha_i u_{r+i} = \sum_{j=1}^{r} \beta_j u_j$. Or equivalently,

β1 u1 + · · · + βr ur − α1 ur+1 − · · · − αn−r un = 0. (3.3.4.6)

As B is a linearly independent set, the only solution of Equation (3.3.4.6) is

αi = 0, for 1 ≤ i ≤ n − r and βj = 0, for 1 ≤ j ≤ r.

In other words, we have shown that the only solution of Equation (3.3.4.5) is the trivial solution.
Hence, {Aur+1 , . . . , Aun } is a basis of Col(A). Thus, the required result follows.
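A direct check of Equation (3.3.4.4) on the matrix of Example 3.3.4.1.2 (sympy sketch):

```python
from sympy import Matrix

A = Matrix([[1, 1, 0, 1, 1, 0, -1],
            [0, 0, 1, 2, 3, 0, -2],
            [0, 0, 0, 0, 0, 1, 1]])

dim_col  = A.rank()                   # dim(Col(A)) = 3
dim_null = len(A.nullspace())         # dim(Null(A)) = 4
print(dim_col + dim_null == A.cols)   # True: 3 + 4 = 7 = n
```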
Theorem 3.3.4.9 is part of what is known as the fundamental theorem of linear algebra (see
Theorem 5.5.1.23). The following are some of the consequences of the rank-nullity theorem. The
proof is left as an exercise for the reader.

Exercise 3.3.4.10. 1. Let A ∈ Mm,n (C). If

(a) n > m then the system Ax = 0 has infinitely many solutions,


(b) n < m then there exists b ∈ Rm \ {0} such that Ax = b is inconsistent.

2. The following statements are equivalent for an m × n matrix A.

(a) Rank (A) = k.


(b) There exist a set of k rows of A that are linearly independent.
(c) There exist a set of k columns of A that are linearly independent.

(d) dim(Col(A)) = k.
(e) There exists a k × k submatrix B of A with det(B) 6= 0. Further, the determinant of
every (k + 1) × (k + 1) submatrix of A is zero.
(f ) There exists a linearly independent subset {b1 , . . . , bk } of Rm such that the system
Ax = bi , for 1 ≤ i ≤ k, is consistent.
(g) dim(Null(A)) = n − k.

3.5 Ordered Bases


Let V be a vector subspace of Cn for some n ∈ N with dim(V) = k. Then, a basis of V may
not look like a standard basis. Our problem may force us to look for some other basis. In such
a case, it is always helpful to fix the vectors in a particular order and then concentrate only on
the coefficients of the vectors as was done for the system of linear equations where we didn’t
worry about the unknowns. It also may happen that k is very-very small as compared to n and
hence it is always better to work with vectors of small size.

Definition 3.3.5.1. [Ordered Basis, Basis Matrix] Let V be a vector space over F with a basis

B = {u1 , . . . , un }. Then an ordered basis for V is a basis B together with a one-to-one


correspondence between B and {1, 2, . . . , n}. Since there is an order among the elements of B,

we write B = [u1 , . . . , un ]. The matrix B = [u1 , . . . , un ] is called the basis matrix.

Thus, B = [u1 , u2 , . . . , un ] is different from C = [u2 , u3 , . . . , un , u1 ] and both of them are


different from D = [un , un−1 , . . . , u2 , u1 ] even though they have the same set of vectors as
elements. We now define the notion of coordinates of a vector with respect to an ordered basis.

Definition 3.3.5.2. [Coordinates of a Vector] Let B = [v1 , . . . , vn ] be the basis matrix cor-
responding to an ordered basis B of V. Since B is a basis of V, for each v ∈ V, there exist βi , 1 ≤ i ≤ n, such that $v = \sum_{i=1}^{n} \beta_i v_i = B\begin{bmatrix} \beta_1\\ \vdots\\ \beta_n \end{bmatrix}$. The column vector $\begin{bmatrix} \beta_1\\ \vdots\\ \beta_n \end{bmatrix}$ is called the coordinates of v with respect to B, denoted [v]B . Thus, in this notation, v = B[v]B .
        
Example 3.3.5.3. 1. Let B = $\left[\begin{bmatrix} 1\\ 1 \end{bmatrix}, \begin{bmatrix} 1\\ 2 \end{bmatrix}\right]$ be an ordered basis of R2 . Then, $\begin{bmatrix} \pi\\ e \end{bmatrix} = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} \begin{bmatrix} \pi\\ e \end{bmatrix}_B$. Thus, $\begin{bmatrix} \pi\\ e \end{bmatrix}_B = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}^{-1} \begin{bmatrix} \pi\\ e \end{bmatrix}$.

2. Consider the vector space R[x; 2] with basis {1 − x, 1 + x, x2 }. Then an ordered basis can
either be B = [1 − x, 1 + x, x2 ] or C = [1 + x, 1 − x, x2 ] or .... Note that there are 3! different
ordered bases. Also, for a0 + a1 x + a2 x2 ∈ R[x; 2], one has
$$a_0 + a_1 x + a_2 x^2 = [1 - x,\ 1 + x,\ x^2] \begin{bmatrix} \frac{a_0 - a_1}{2}\\[2pt] \frac{a_0 + a_1}{2}\\[2pt] a_2 \end{bmatrix}.$$
Thus, $[a_0 + a_1 x + a_2 x^2]_B = \begin{bmatrix} \frac{a_0 - a_1}{2}\\[2pt] \frac{a_0 + a_1}{2}\\[2pt] a_2 \end{bmatrix}$, whereas $[a_0 + a_1 x + a_2 x^2]_C = \begin{bmatrix} \frac{a_0 + a_1}{2}\\[2pt] \frac{a_0 - a_1}{2}\\[2pt] a_2 \end{bmatrix}$.

3. Let {(x, y, z)T | x + y = z}. If B = [(−1,


 V = 
T T
 1, 0) , (1, 0, 1) ] is an ordered basis of V then
x −1 1   3  
y  =  1 0 y and hence 4   =
4
.
x+y 7
z B 0 1 7 B

4. Let V = {(v, w, x, y, z)T ∈ R5 | w − x = z, v = y, v + x + z = 3y}. So, if B = [(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T ] is an ordered basis then $\left[(3, 6, 5, 3, 1)^T\right]_B = \begin{bmatrix} 3\\ 5 \end{bmatrix}$.

Let V be a vector space over F with dim(V) = n. Let A = [v1 , . . . , vn ] and B = [u1 , . . . , un ] be
basis matrices corresponding to the ordered bases B and C, respectively. So, using the notation
in Definition 3.3.5.2, we have

A = [v1 , . . . , vn ] = [B[v1 ]C , . . . , B[vn ]C ] = B [[v1 ]C , . . . , [vn ]C ] .

So, the matrix [A]C = [[v1 ]C , . . . , [vn ]C ], denoted [B]C , is called the matrix of B with respect
to the ordered basis C or the change of basis matrix from B to C. We now summarize the

above discussion.

Theorem 3.3.5.4. Let V be a vector space over F with dim(V) = n. Let B = [v1 , . . . , vn ] and

C = [u1 , . . . , un ] be two ordered bases of V.


1. Then, the matrix [B]C is invertible and [w]C = [B]C [w]B , for all w ∈ V.
2. Similarly, verify that [C]B is invertible and [w]B = [C]B [w]C , for all w ∈ V.
3. Furthermore, ([B]C )−1 = [C]B .

Proof. We prove all the parts together. Let A = [v1 , . . . , vn ], B = [u1 , . . . , un ], C = [B]C and D = [C]B . Then, by the previous paragraph, A = BC. Similarly,

B = [u1 , . . . , un ] = [A[u1 ]B , . . . , A[un ]B ] = A [[u1 ]B , . . . , [un ]B ] = AD.

But, by Exercise 3.3.3.16.14, A and B are invertible and thus C = B −1 A and D = A−1 B are
−1
invertible as well. Clearly, C −1 = B −1 A = A−1 B = D which proves the third part. For the
first two parts, note that for any w ∈ V, w = A[w]B , w = B[w]C . Hence,

B[w]C = w = A[w]B = BC[w]B = B[B]C [w]B

and thus [w]C = [B]C [w]B . Similarly, [w]B = [C]B [w]C and the required result follows.
Example 3.3.5.5. 1. Note that if C = [e1 , . . . , en ] then

[A]C = [[v1 ]C , . . . , [vn ]C ] = [v1 , . . . , vn ] = A.

   
2. Suppose B = (1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T and C = (1, 1, 1)T , (1, −1, 1)T , (1, 1, 0)T are
two bases of R3 . Then, verify the statements in the previous result.

        −1    
(a) Then $\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x\\ y\\ z \end{bmatrix}_B$. Thus, $\begin{bmatrix} x\\ y\\ z \end{bmatrix}_B = \begin{bmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} x - y\\ y - z\\ z \end{bmatrix}$.
(b) Similarly, $\begin{bmatrix} x\\ y\\ z \end{bmatrix}_C = \begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ 1 & 1 & 0 \end{bmatrix}^{-1} \begin{bmatrix} x\\ y\\ z \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -1 & 1 & 2\\ 1 & -1 & 0\\ 2 & 0 & -2 \end{bmatrix} \begin{bmatrix} x\\ y\\ z \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -x + y + 2z\\ x - y\\ 2x - 2z \end{bmatrix}$.
(c) Verify that $[B]_C = \begin{bmatrix} -1/2 & 0 & 1\\ 1/2 & 0 & 0\\ 1 & 1 & 0 \end{bmatrix}$ and $[C]_B = \begin{bmatrix} 0 & 2 & 0\\ 0 & -2 & 1\\ 1 & 1 & 0 \end{bmatrix}$.
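The identities of Theorem 3.3.5.4 can be verified numerically for these two bases; a numpy sketch (w is an arbitrary test vector):

```python
import numpy as np

A  = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])    # basis matrix of B
Bc = np.array([[1., 1., 1.], [1., -1., 1.], [1., 1., 0.]])   # basis matrix of C

B_to_C = np.linalg.inv(Bc) @ A          # [B]_C
C_to_B = np.linalg.inv(A) @ Bc          # [C]_B

w = np.array([2., -1., 3.])
w_B = np.linalg.solve(A, w)             # [w]_B
w_C = np.linalg.solve(Bc, w)            # [w]_C

print(np.allclose(w_C, B_to_C @ w_B))            # True: [w]_C = [B]_C [w]_B
print(np.allclose(B_to_C @ C_to_B, np.eye(3)))   # True: ([B]_C)^{-1} = [C]_B
```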
Remark 3.3.5.6. Let V be a vector space over F with B = [v1 , . . . , vn ] as an ordered basis.
Then, by Theorem 3.3.5.4, [v]B is an element of Fn , for each v ∈ V. Therefore, if
1. F = R then the elements of V look like the elements of Rn .
2. F = C then the elements of V look like the elements of Cn .
   
Exercise 3.3.5.7. Let B = (1, 2, 0)T , (1, 3, 2)T , (0, 1, 3)T and C = (1, 2, 1)T , (0, 1, 2)T , (1, 4, 6)T
be two ordered bases of R3 . Find the change of basis matrix

1. P from B to C.

2. Q from C to B.

3. from the standard basis of R3 to B. What do you notice?

Is it true that P Q = I = QP ? Give reasons for your answer.

3.6 Summary
In this chapter, we defined vector spaces over F. The set F was either R or C. To define a vector
space, we start with a non-empty set V of vectors and F the set of scalars. We also needed to
do the following:

1. first define vector addition and scalar multiplication and

2. then verify the axioms in Definition 3.3.1.1.

If all axioms in Definition 3.3.1.1 are satisfied then V is a vector space over F. If W was a
non-empty subset of a vector space V over F then for W to be a subspace, we only need to check
whether the vector addition and scalar multiplication inherited from that in V hold in W.
We then learnt linear combination of vectors and the linear span of vectors. It was also shown
that the linear span of a subset S of a vector space V is the smallest subspace of V containing
S. Also, to check whether a given vector v is a linear combination of u1 , . . . , un , we needed to
solve the linear system c1 u1 + · · · + cn un = v in the unknowns c1 , . . . , cn . Or equivalently, the
system Ax = b, where in some sense A[:, i] = ui , 1 ≤ i ≤ n, xT = [c1 , . . . , cn ] and b = v. It
was also shown that the geometrical representation of the linear span of S = {u1 , . . . , un } is
equivalent to finding conditions in the entries of b such that Ax = b was always consistent.

Then, we learnt linear independence and dependence. A set S = {u1 , . . . , un } is linearly


independent set in the vector space V over F if the homogeneous system Ax = 0 has only the
trivial solution in F. Else S is linearly dependent, where as before the columns of A correspond
to the vectors ui ’s.
We then talked about the maximal linearly independent set (coming from the homogeneous
system) and the minimal spanning set (coming from the non-homogeneous system) and culmi-
nating in the notion of the basis of a finite dimensional vector space V over F. The following
important results were proved.

1. A linearly independent set can be extended to form a basis of V.

2. Any two bases of V have the same number of elements.

This number was defined as the dimension of V, denoted dim(V).


Now let A ∈ Mn (R). Then, combining a few results from the previous chapter, we have the
following equivalent conditions.

1. A is invertible.

2. The homogeneous system Ax = 0 has only the trivial solution.



3. RREF(A) = In .

4. A is a product of elementary matrices.

5. The system Ax = b has a unique solution for every b.

6. The system Ax = b has a solution for every b.

7. Rank(A) = n.

8. det(A) 6= 0.

9. Col(AT ) = Row(A) = Rn .

10. Rows of A form a basis of Rn .

11. Col(A) = Rn .

12. Columns of A form a basis of Rn .

13. Null(A) = {0}.



Chapter 4

Linear Transformations

4.1 Definitions and Basic Properties


In the previous chapter, it was shown that if V is a real vector space with dim(V) = n then with
respect to an ordered basis, the elements of V were column vectors of size n. So, in some sense
the vector in V look like elements of Rn . In this chapter, we concretize this idea. We also show
that matrices give rise to functions between two finite dimensional vector spaces. To do so, we

start with the definition of functions over vector spaces that commute with the operations of
vector addition and scalar multiplication.

Definition 4.4.1.1. [Linear Transformation, Linear Operator] Let V and W be vector spaces
over F. A function (map) T : V → W is called a linear transformation if for all α ∈ F and
u, v ∈ V the function T satisfies

T (α · u) = α ⊙ T (u) and T (u + v) = T (u) ⊕ T (v),

where +, · are binary operations in V and ⊕, ⊙ are the binary operations in W. By L(V, W), we
denote the set of all linear transformations from V to W. In particular, if W = V then the linear
transformation T is called a linear operator and the corresponding set of linear operators is
denoted by L(V).

Definition 4.4.1.2. [Equality of Linear Transformation] Let S, T ∈ L(V, W). Then, S and T
are said to be equal if T (x) = S(x), for all x ∈ V.

We now give examples of linear transformations.

Example 4.4.1.3. 1. Let V be a vector space. Then, the maps Id, 0 ∈ L(V), where

(a) Id(v) = v, for all v ∈ V, is commonly called the identity operator.


(b) 0(v) = 0, for all v ∈ V, is commonly called the zero operator.

2. Let V and W be two vector spaces over F. Then, 0 ∈ L(V, W), where 0(v) = 0, for all
v ∈ V, is commonly called the zero transformation.

3. The map T (x) = x, for all x ∈ R, is an element of L(R) as T (ax) = ax = aT (x) and
T (x + y) = x + y = T (x) + T (y).

4. The map T (x) = (x, 3x)T , for all x ∈ R, is an element of L(R, R2 ) as T (λx) = (λx, 3λx)T = λ(x, 3x)T = λT (x) and T (x + y) = (x + y, 3(x + y))T = (x, 3x)T + (y, 3y)T = T (x) + T (y).

5. Let V, W and Z be vector spaces over F. Then, for any T ∈ L(V, W) and S ∈ L(W, Z), the

map S ◦T ∈ L(V, Z), where (S ◦T )(v) = S T (v) , for all v ∈ V, is called the composition
of maps. The readers should verify that S ◦ T , in short ST , is an element of L(V, Z).

6. Fix a ∈ Rn and define T (x) = hx, ai, for all x ∈ Rn . Then T ∈ L(Rn , R). For example, if
(a) a = (1, . . . , 1)T then $T(x) = \sum_{i=1}^{n} x_i$, for all x ∈ Rn .
(b) a = ei , for a fixed i, 1 ≤ i ≤ n, then Ti (x) = xi , for all x ∈ Rn .

7. Define T : R2 → R3 by T (x, y)T = (x + y, 2x − y, x + 3y)T . Then T ∈ L(R2 , R3 ) with
T (e1 ) = (1, 2, 1)T and T (e2 ) = (1, −1, 3)T .

8. Let A ∈ Mm×n (C). Define TA (x) = Ax, for every x ∈ Cn . Then, TA ∈ L(Cn , Cm ). Thus,
for each A ∈ Mm,n (C), there exists a map TA ∈ L(Cn , Cm ).


9. Define T : Rn+1 → R[x; n] by T (a1 , . . . , an+1 )T = a1 + a2 x + · · · + an+1 xn , for

(a1 , . . . , an+1 ) ∈ Rn+1 . Then T is a linear transformation.

10. Fix A ∈ Mn (C). Then TA : Mn (C) → Mn (C) and SA : Mn (C) → C are both linear
transformations, where TA (B) = BA∗ and SA (B) = Tr(BA∗ ), for every B ∈ Mn (C).
11. The map T : R[x; n] → R[x; n] defined by T (f (x)) = d/dx f (x) = f ′ (x), for all f (x) ∈ R[x; n],
is a linear transformation.
12. The maps T, S : R[x] → R[x] defined by T (f (x)) = d/dx f (x) and S(f (x)) = ∫_0^x f (t)dt, for all
f (x) ∈ R[x], are linear transformations. Is it true that T S = Id? What about ST ?

13. Recall the vector space RN in Example 3.3.1.4.8. Now, define maps T, S : RN → RN
by T ({a1 , a2 , . . .}) = {0, a1 , a2 , . . .} and S({a1 , a2 , . . .}) = {a2 , a3 , . . .}. Then, T and S,
commonly called the shift operators, are linear operators with exactly one of ST or T S
as the Id map.

14. Recall the vector space C(R, R) (see Example 3.3.1.4.10). Then, the map g = T (f ), for
each f ∈ C(R, R), defined by g(x) = ∫_0^x f (t)dt is an element of L(C(R, R)). For example,
(T (sin))(x) = ∫_0^x sin(t)dt = 1 − cos(x), for all x ∈ R.
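
The matrix map TA of item 8 above is easy to experiment with. The following sketch (in Python with NumPy; the particular matrix and vectors are illustrative choices, not part of the notes) checks the two defining properties of a linear transformation for TA numerically.

```python
import numpy as np

# Numerically check that T_A(x) = Ax respects scalar multiplication and addition.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))      # A is 3 x 4, so T_A : R^4 -> R^3
u = rng.standard_normal(4)
v = rng.standard_normal(4)
alpha = 2.5

T = lambda x: A @ x                  # the map T_A

print(np.allclose(T(alpha * u), alpha * T(u)))   # True: T(alpha u) = alpha T(u)
print(np.allclose(T(u + v), T(u) + T(v)))        # True: T(u + v) = T(u) + T(v)
```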

We now prove that any linear transformation sends the zero vector to the zero vector.

Proposition 4.4.1.4. Let T ∈ L(V, W). Suppose that 0V is the zero vector in V and 0W is the
zero vector of W. Then T (0V ) = 0W .

Proof. Since 0V = 0V + 0V , we have T (0V ) = T (0V + 0V ) = T (0V ) + T (0V ). Adding the additive
inverse of T (0V ) ∈ W to both sides gives T (0V ) = 0W .
From now on, 0 will be used as the zero vector of both the domain and the codomain. Theorem 4.4.1.6
below states that a linear transformation is completely determined by its images of the vectors in a basis of the domain space.

Example 4.4.1.5. Does there exist a linear transformation

1. T : V → W such that T (v) ≠ 0, for all v ∈ V?


Solution: No, as T (0) = 0 (see Proposition 4.4.1.4).

2. T : R → R such that T (x) = x2 , for all x ∈ R?


Solution: No, as T (ax) = (ax)2 = a2 x2 = a2 T (x) ≠ aT (x), unless a = 0, 1.

3. T : R → R such that T (x) = √x, for all x ∈ R?
Solution: No, as T (ax) = √(ax) = √a √x ≠ a √x = aT (x), unless a = 0, 1.

4. T : R → R such that T (x) = sin(x), for all x ∈ R?
Solution: No; for example, T (2 · π/2) = sin(π) = 0 while 2T (π/2) = 2 sin(π/2) = 2, so T (ax) ≠ aT (x) in general.

5. T : R → R such that T (5) = 10 and T (10) = 5?


Solution: No, as T (10) = T (5 + 5) = T (5) + T (5) = 10 + 10 = 20 ≠ 5.

6. T : R → R such that T (5) = π and T (e) = π?
Solution: No, as 5T (1) = T (5) = π implies that T (1) = π/5. So, T (e) = eT (1) = eπ/5 ≠ π.
7. T : R2 → R2 such that T ((x, y)T ) = (x + y, 2)T ?
Solution: No, as T (0) 6= 0.

8. T : R2 → R2 such that T ((x, y)T ) = (x + y, xy)T ?


Solution: No, as T ((2, 2)T ) = (4, 4)T ≠ 2(2, 1)T = 2T ((1, 1)T ).

Theorem 4.4.1.6. Let V and W be two vector spaces over F and let T ∈ L(V, W). Then T is
completely determined by the images of the basis vectors of V under T .

Proof. Let {u1 , . . . , un } be a basis of V and v ∈ V. Then, there exist c1 , . . . , cn ∈ F such that
v = ∑_{i=1}^{n} ci ui = [u1 , . . . , un ] (c1 , . . . , cn )T . Hence, by definition,

T (v) = T (c1 u1 + · · · + cn un ) = c1 T (u1 ) + · · · + cn T (un ) = [T (u1 ), . . . , T (un )] (c1 , . . . , cn )T .

Thus, the required result follows.


As a direct application, we have the following result.

Corollary 4.4.1.7 (Riesz Representation Theorem). Let T ∈ L(Rn , R). Then, there exists
a ∈ Rn such that T (x) = aT x.

Proof. By Theorem 4.4.1.6, T is known if we know the image of T on {e1 , . . . , en }, the standard
basis of Rn . As T is given, for 1 ≤ i ≤ n, T (ei ) = ai , for some ai ∈ R. So, let us take
a = (a1 , . . . , an )T . Then, for x = (x1 , . . . , xn )T ∈ Rn ,
T (x) = T ( ∑_{i=1}^{n} xi ei ) = ∑_{i=1}^{n} xi T (ei ) = ∑_{i=1}^{n} xi ai = aT x.

Thus, the required result follows.
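
The proof above is constructive: the Riesz vector a is obtained by evaluating T at the standard basis vectors. A small sketch (Python/NumPy; the functional T below is a hypothetical example, not taken from the notes) illustrates this.

```python
import numpy as np

# Recover the Riesz vector a of a linear functional T on R^n from a_i = T(e_i),
# then check that T(x) = a^T x.
n = 4
T = lambda x: 3 * x[0] - x[1] + 2 * x[3]      # a hypothetical linear functional

e = np.eye(n)
a = np.array([T(e[i]) for i in range(n)])     # a = (T(e_1), ..., T(e_n))^T

x = np.array([1.0, -2.0, 5.0, 0.5])
print(np.isclose(T(x), a @ x))                # True
```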

Example 4.4.1.8. Does there exist a linear transformation


1. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((1, −1)T ) = (5, 10)T ?
Solution: Yes, as the set {(1, 1)T , (1, −1)T } is a basis of R2 (the matrix with these columns is invertible).
Also, T (a(1, 1)T + b(1, −1)T ) = a T ((1, 1)T ) + b T ((1, −1)T ) = a(1, 2)T + b(5, 10)T . Since every
(x, y)T can be written as (x, y)T = ((x + y)/2)(1, 1)T + ((x − y)/2)(1, −1)T , we get

T ((x, y)T ) = ((x + y)/2)(1, 2)T + ((x − y)/2)(5, 10)T = (3x − 2y, 6x − 4y)T

(a numerical sketch of this construction appears after this example).

2. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 10)T ?
Solution: Yes, as (5, 10)T = T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T is consistent.
To construct one such linear transformation, let {(1, 1)T , u} be a basis of R2 and define
T (u) = v = (v1 , v2 )T , for some v ∈ R2 . For example, if u = (1, 0)T then, writing
(x, y)T = y(1, 1)T + (x − y)(1, 0)T , we get T ((x, y)T ) = y(1, 2)T + (x − y)v.

3. T : R2 → R2 such that Rng(T ) = {T (x) | x ∈ R2 } = LS{(1, π)T }?


Solution: Yes. Define T (e1 ) = (1, π)T and T (e2 ) = 0, or T (e1 ) = (1, π)T and T (e2 ) =
a(1, π)T , for some a ∈ R.
4. T : R2 → R2 such that Rng(T ) = {T (x) | x ∈ R2 } = R2 ?
Solution: Yes. Define T (e1 ) = (1, π)T and T (e2 ) = (π, e)T . Or, let {u, v} be a basis of
R2 and define T (e1 ) = u and T (e2 ) = v.
5. T : R2 → R2 such that Rng(T ) = {T (x) | x ∈ R2 } = {0}?
Solution: Yes. Define T (e1 ) = 0 and T (e2 ) = 0.
6. T : R2 → R2 such that Null(T ) = {x ∈ R2 | T (x) = 0} = LS{(1, π)T }?
Solution: Yes. Take the basis {(1, π)T , (1, 0)T } of R2 and define T ((1, π)T ) = 0 and
T ((1, 0)T ) = u, for any u ≠ 0.
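
The construction in item 1 above amounts to a matrix computation: if the columns of B are the basis vectors and the columns of W are their prescribed images, then the matrix of T in the standard basis is W B^{-1}. The following sketch (Python/NumPy, illustrative only) carries this out for item 1.

```python
import numpy as np

# Basis vectors (1,1)^T, (1,-1)^T as columns of B; their images (1,2)^T, (5,10)^T
# as columns of W.  The matrix of T in the standard basis is then W B^{-1}.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]])
W = np.array([[1.0,  5.0],
              [2.0, 10.0]])

M = W @ np.linalg.inv(B)
print(M)              # [[ 3. -2.]
                      #  [ 6. -4.]]   i.e. T((x, y)^T) = (3x - 2y, 6x - 4y)^T

x = np.array([4.0, 1.0])
print(M @ x)          # [10. 20.], matching (3*4 - 2*1, 6*4 - 4*1)
```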

Exercise 4.4.1.9. 1. Let V be a vector space and let a ∈ V. Then the map Ta : V → V
defined by Ta (x) = x + a, for all x ∈ V is called the translation map. Prove that
Ta ∈ L(V) if and only if a = 0.

2. Are the maps T : V → W given below, linear transformations?

(a) Let V = R2 and W = R3 with T ((x, y)T ) = (x + y + 1, 2x − y, x + 3y)T .


(b) Let V = W = R2 with T ((x, y)T ) = (x − y, x2 − y 2 )T .
(c) Let V = W = R2 with T ((x, y)T ) = (x − y, | x | )T .
(d) Let V = R2 and W = R4 with T ((x, y)T ) = (x + y, x − y, 2x + y, 3x − 4y)T .
(e) Let V = W = R4 with T ((x, y, z, w)T ) = (z, x, w, y)T .

3. Which of the following maps T : M2 (R) → M2 (R) are linear operators?

(a) T (A) = AT (b) T (A) = I + A (c) T (A) = A2


(d) T (A) = BAB −1 , where B is a fixed 2 × 2 matrix.

4. Prove that a map T : R → R is a linear transformation if and only if there exists a unique
c ∈ R such that T (x) = cx, for every x ∈ R.

5. Let A ∈ Mn (C) and define TA : Cn → Cn by TA (x) = Ax for every x ∈ Cn . Prove that


for any positive integer k, TAk (x) = Ak x.

6. Use matrices to give examples of linear operators T, S : R3 → R3 that satisfy:



(a) T ≠ 0, T 2 ≠ 0, T 3 = 0.
(b) T ≠ 0, S ≠ 0, S ◦ T ≠ 0, T ◦ S = 0.
(c) S 2 = T 2 , S ≠ T .
(d) T 2 = I, T ≠ I.

7. Let T : Rn → Rn be a linear operator with T ≠ 0 and T 2 = 0. Prove that there exists a


vector x ∈ Rn such that the set {x, T (x)} is linearly independent.

8. Fix a positive integer p and let T : Rn → Rn be a linear operator with T k ≠ 0 for


1 ≤ k ≤ p and T p+1 = 0. Then prove that there exists a vector x ∈ Rn such that the set
{x, T (x), . . . , T p (x)} is linearly independent.

9. Let T : Rn → Rm be a linear transformation with T (x0 ) = y0 for some x0 ∈ Rn and


y0 ∈ Rm . Define T −1 (y0 ) = {x ∈ Rn : T (x) = y0 }. Then prove that for every x ∈ T −1 (y0 )
there exists z ∈ T −1 (0) such that x = x0 + z. Also, prove that T −1 (y0 ) is a subspace of
Rn if and only if 0 ∈ T −1 (y0 ).

10. Define a map T : C → C by T (z) = z, the complex conjugate of z. Is T a linear transfor-


mation over the real vector space C?

11. Prove that there exists infinitely many linear transformations T : R3 → R2 such that
T ((1, −1, 1)T ) = (1, 2)T and T ((−1, 1, 2)T ) = (1, 0)T ?

12. Does there exist a linear transformation T : R3 → R2 such that



(a) T ((1, 0, 1)T ) = (1, 2)T , T ((0, 1, 1)T ) = (1, 0)T and T ((1, 1, 1)T ) = (2, 3)T ?
(b) T ((1, 0, 1)T ) = (1, 2)T , T ((0, 1, 1)T ) = (1, 0)T and T ((1, 1, 2)T ) = (2, 3)T ?

13. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, x + y + z, x + y + 3z)T . Find


the value of k for which there exists a vector x ∈ R3 such that T (x) = (9, 3, k)T .

14. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x − 2y + 2z, −2x + 5y + 2z, 8x + y + 4z)T .
Find x ∈ R3 such that T (x) = (1, 1, −1)T .

15. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + y + 3z, 4x − y + 3z, 3x − 2y + 5z)T .


Determine x, y, z ∈ R3 \ {0} such that T (x) = 6x, T (y) = 2y and T (z) = −2z. Is the set
{x, y, z} linearly independent?

16. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, −y, −3y + 4z)T . Determine
x, y, z ∈ R3 \ {0} such that T (x) = 2x, T (y) = 4y and T (z) = −z. Is the set {x, y, z}
linearly independent?

17. Let n ∈ N. Does there exist a linear transformation T : R3 → Rn such that T ((1, 1, −2)T ) =
x, T ((−1, 2, 3)T ) = y and T ((1, 10, 1)T ) = z

(a) with z = x + y?
(b) with z = cx + dy, for some c, d ∈ R?

18. For each matrix A given below, define T ∈ L(R2 ) by T (x) = Ax. What do these linear
operators signify geometrically?
 √     √      
1 3 −1

1 1 −1 1 1 − 3
√ 0 −1 cos 2π 3  − sin 2π
3
(a) A ∈ , √ , , , 2π 2π .
2 1 3 2 1 1 2 3 1 1 0 sin cos
            3 3
−1 0 1 0 1 1 1 1 1 2 0 0 1 0
(b) A ∈ , , , , , .
0 1 0 −1 2 1 −1 5 2 4 0 1 0 0
 √     √   2π
 2π
 
1 3 1 1 1 1 1 1 3 cos sin
(c) A ∈ √ ,√ , √ , 3  3  .
2 1 − 3 2 1 −1 2 3 −1 sin 2π3 − cos 2π 3

19. Find all functions f : R2 → R2 that fix the line y = x and send (x1 , y1 ), for x1 ≠ y1 , to
its mirror image along the line y = x. Or equivalently, f satisfies

(a) f (x, x) = (x, x) and


(b) f (x, y) = (y, x) for all (x, y) ∈ R2 .

20. Consider the space C3 over C. If f ∈ L(C3 ) with f (x) = x, f (y) = (1 + i)y and f (z) =
(2 + 3i)z, for x, y, z ∈ C3 \ {0} then prove that {x, y, z} form a basis of C3 .

4.2 Rank-Nullity Theorem


The readers are advised to see Theorem 3.3.4.9 on Page 81 for clarity and similarity with the
results in this section. We start with the following result.

Theorem 4.4.2.1. Let V and W be two vector spaces over F and let T ∈ L(V, W). If S ⊆ V is
linearly dependent then T (S) = {T (v) | v ∈ S} is linearly dependent.

Proof. As S is linearly dependent, there exist k ∈ N and vi ∈ S, for 1 ≤ i ≤ k, such that the
system ∑_{i=1}^{k} xi vi = 0, in the unknowns xi , has a non-trivial solution, say xi = ai ∈ F, 1 ≤ i ≤ k.
Thus, ∑_{i=1}^{k} ai vi = 0. Now, consider the system ∑_{i=1}^{k} yi T (vi ) = 0, in the unknowns yi . Then,

∑_{i=1}^{k} ai T (vi ) = ∑_{i=1}^{k} T (ai vi ) = T ( ∑_{i=1}^{k} ai vi ) = T (0) = 0.

Thus, the ai ’s give a non-trivial solution of ∑_{i=1}^{k} yi T (vi ) = 0 and hence the required result follows.
As an immediate corollary, we get the following result.

Remark 4.4.2.2. Let V and W be two vector spaces over F and let T ∈ L(V, W). If S ⊆ V is
such that T (S) is linearly independent, then S is linearly independent.

We now give some important definitions.


Definition 4.4.2.3. [Range Space and Null Space] Let V and W be two vector spaces over F
and let T ∈ L(V, W). Then, we define

1. Rng(T ) = {T (x) | x ∈ V} and call it the range space of T and

2. Null(T ) = {x ∈ V | T (x) = 0} and call it the null space of T .

Example 4.4.2.4. Determine Rng(T ) and Null(T ) of T ∈ L(R3 , R4 ), where we define T ((x, y, z)T ) =
(x − y + z, y − z, x, 2x − 5y + 5z)T .
Solution: Consider the standard basis {e1 , e2 , e3 } of R3 . Then

Rng(T ) = LS(T (e1 ), T (e2 ), T (e3 )) = LS((1, 0, 1, 2)T , (−1, 1, 0, −5)T , (1, −1, 0, 5)T )
= LS((1, 0, 1, 2)T , (1, −1, 0, 5)T ) = {λ(1, 0, 1, 2)T + β(1, −1, 0, 5)T | λ, β ∈ R}
= {(λ + β, −β, λ, 2λ + 5β)T : λ, β ∈ R}
= {(x, y, z, w)T ∈ R4 | x + y − z = 0, 5y − 2z + w = 0}

and

Null(T ) = {(x, y, z)T ∈ R3 : T ((x, y, z)T ) = 0}


= {(x, y, z)T ∈ R3 : (x − y + z, y − z, x, 2x − 5y + 5z)T = 0}
= {(x, y, z)T ∈ R3 : x − y + z = 0, y − z = 0, x = 0, 2x − 5y + 5z = 0}
= {(x, y, z)T ∈ R3 : y − z = 0, x = 0}
= {(0, y, y)T ∈ R3 : y ∈ R} = LS((0, 1, 1)T )
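
Since T is the matrix map x ↦ Ax for the 4 × 3 matrix A read off from the formula above, Rng(T) and Null(T) can also be computed numerically. A sketch (Python/NumPy, illustrative):

```python
import numpy as np

# The matrix of T((x,y,z)^T) = (x-y+z, y-z, x, 2x-5y+5z)^T in the standard bases.
A = np.array([[1., -1.,  1.],
              [0.,  1., -1.],
              [1.,  0.,  0.],
              [2., -5.,  5.]])

rank = np.linalg.matrix_rank(A)
print(rank)                     # 2 = dim Rng(T), so dim Null(T) = 3 - 2 = 1

# Right-singular vectors for (numerically) zero singular values span Null(A).
U, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]
print(null_vec / null_vec[1])   # proportional to (0, 1, 1)^T, as found above
```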

Exercise 4.4.2.5. 1. Let V and W be two vector spaces over F and let T ∈ L(V, W). Then

(a) Rng(T ) is a subspace of W.


(b) Null(T ) is a subspace of V.

Furthermore, if V is finite dimensional then


(a) dim(Null(T )) ≤ dim(V).
(b) dim(Rng(T )) is finite and whenever dim(W) is finite dim(Rng(T )) ≤ dim(W).

2. Describe Null(D) and Rng(D), where D ∈ L(R[x; n]) and is defined by D(f (x)) = f ′ (x).
Note that Rng(D) ⊆ R[x; n − 1].

3. Define T ∈ L(R3 ) by T (e1 ) = e1 + e3 , T (e2 ) = e2 + e3 and T (e3 ) = −e3 . Then

(a) determine T ((x, y, z)T ), for x, y, z ∈ R.


(b) determine Null(T ) and Rng(T ).
(c) is it true that T 3 = T ?

4. Find T ∈ L(R3 ) for which Rng(T ) = LS (1, 2, 0)T , (0, 1, 1)T , (1, 3, 1)T .

5. Let V and W be two vector spaces over F. If {v1 , . . . , vn } is a basis of V and w1 , . . . , wn ∈ W then
prove that there exists a unique T ∈ L(V, W) such that T (vi ) = wi , for i = 1, . . . , n.
Definition 4.4.2.6. [Rank and Nullity] Let V and W be two vector spaces over F. If
T ∈ L(V, W) and dim(V) is finite then we define Rank(T ) = dim(Rng(T )) and
Nullity(T ) = dim(Null(T )).

We now prove the rank-nullity Theorem. The proof of this result is similar to the proof of
Theorem 3.3.4.9. We give it again for the sake of completeness.

Theorem 4.4.2.7 (Rank-Nullity Theorem). Let V and W be two vector spaces over F. If
T ∈ L(V, W) and dim(V) is finite then

Rank(T ) + Nullity(T ) = dim(Rng(T )) + dim(Null(T )) = dim(V).

Proof. By Exercise 4.4.2.5.1.1a, dim(Null(T )) ≤ dim(V). Let B be a basis of Null(T ). We
extend it to form a basis C of V. So, by definition, Rng(T ) = LS({T (v) | v ∈ C}) = LS({T (v) | v ∈
C \ B}). We claim that {T (v) | v ∈ C \ B} is a linearly independent subset of W.
If possible, let the claim be false. Then, there exist v1 , . . . , vk ∈ C \ B and a = [a1 , . . . , ak ]T
such that a ≠ 0 and ∑_{i=1}^{k} ai T (vi ) = 0. Thus, we see that

T ( ∑_{i=1}^{k} ai vi ) = ∑_{i=1}^{k} ai T (vi ) = 0.

That is, ∑_{i=1}^{k} ai vi ∈ Null(T ). Hence, there exist b1 , . . . , bℓ ∈ F and u1 , . . . , uℓ ∈ B such that
∑_{i=1}^{k} ai vi = ∑_{j=1}^{ℓ} bj uj . Or equivalently, the system ∑_{i=1}^{k} xi vi + ∑_{j=1}^{ℓ} yj uj = 0, in the unknowns xi ’s
and yj ’s, has a non-trivial solution [a1 , . . . , ak , −b1 , . . . , −bℓ ]T (non-trivial as a ≠ 0). Hence,
S = {v1 , . . . , vk , u1 , . . . , uℓ } is linearly dependent in V, a contradiction to S ⊆ C. That is,

dim(Rng(T )) + dim(Null(T )) = |C \ B| + |B| = |C| = dim(V).

Thus, we have proved the required result.
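
For matrix maps the theorem can be checked numerically: with T_A : R^n → R^m, T_A(x) = Ax, one has Rank(T_A) + Nullity(T_A) = n. A sketch (Python, using NumPy and SciPy; the random matrix is an illustrative choice):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
m, n = 4, 7
A = rng.standard_normal((m, n))            # T_A : R^7 -> R^4

rank = np.linalg.matrix_rank(A)            # dim Rng(T_A)
nullity = null_space(A).shape[1]           # dim Null(T_A): number of basis vectors returned
print(rank, nullity, rank + nullity == n)  # 4 3 True
```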


As an immediate corollary, we have the following result.

Corollary 4.4.2.8. Let V be a vector space over F with dim(V) = n and let S, T ∈ L(V). Then
1. Nullity(T ) + Nullity(S) ≥ Nullity(ST ) ≥ max{Nullity(T ), Nullity(S)}.
2. min{Rank(S), Rank(T )} ≥ Rank(ST ) ≥ Rank(S) + Rank(T ) − n.

Proof. The proof of Part 2 is omitted as it directly follows from Part 1 and Theorem 4.4.2.7.
Part 1: We first prove the second inequality. Suppose v ∈ Null(T ). Then (ST )(v) = S(T (v)) =
S(0) = 0 gives Null(T ) ⊆ Null(ST ). Thus, Nullity(T ) ≤ Nullity(ST ).
By Theorem 4.4.2.7, Nullity(S) ≤ Nullity(ST ) is equivalent to Rank(ST ) ≤ Rank(S), which
holds as Rng(T ) ⊆ V implies Rng(ST ) = S(Rng(T )) ⊆ S(V) = Rng(S).
To prove the first inequality, let {v1 , . . . , vk } be a basis of Null(T ). Then {v1 , . . . , vk } ⊆
Null(ST ). So, let us extend it to get a basis {v1 , . . . , vk , u1 , . . . , uℓ } of Null(ST ).
Claim: {T (u1 ), T (u2 ), . . . , T (uℓ )} is a linearly independent subset of Null(S).
Clearly, {T (u1 ), . . . , T (uℓ )} ⊆ Null(S). Now, consider the system c1 T (u1 ) + · · · + cℓ T (uℓ ) = 0
in the unknowns c1 , . . . , cℓ . As T ∈ L(V), we get T ( ∑_{i=1}^{ℓ} ci ui ) = 0. Thus, ∑_{i=1}^{ℓ} ci ui ∈ Null(T ).
Hence, ∑_{i=1}^{ℓ} ci ui is a linear combination of v1 , . . . , vk . Therefore,

c1 u1 + · · · + cℓ uℓ = α1 v1 + · · · + αk vk (4.4.2.1)

for some scalars α1 , . . . , αk . But by assumption, {v1 , . . . , vk , u1 , . . . , uℓ } is a basis of Null(ST )


and hence linearly independent. Therefore, the only solution of Equation (4.4.2.1) is given by
ci = 0 for 1 ≤ i ≤ ℓ and αj = 0 for 1 ≤ j ≤ k. Thus, we have proved the claim. Hence,
Nullity(S) ≥ ℓ and Nullity(ST ) = k + ℓ ≤ Nullity(T ) + Nullity(S).
Exercise 4.4.2.9. 1. Let A ∈ Mn (R) with A2 = A. Define T ∈ L(Rn , Rn ) by T (v) = Av,
for all v ∈ Rn . Then prove that

(a) T 2 = T . Equivalently, T (Id − T ) = 0.


(b) Null(T ) ∩ Rng(T ) = {0}.
(c) Rn = Rng(T ) + Null(T ). [Hint: x = T (x) + (Id − T )(x)]
 
2. Define T ∈ L(R3 , R2 ) by T ((x, y, z)T ) = (x − y + z, x + 2z)T . Find a basis and the dimension of Rng(T )
and Null(T ).
 
3. Let zi ∈ C, for 1 ≤ i ≤ k. Define T ∈ L(C[x; n], Ck ) by T (P (z)) = (P (z1 ), . . . , P (zk )). If
zi ’s are distinct then for each k ≥ 1, determine Rank(T ).

4.2.A Algebra of Linear Transformation

We start with the following definition.

Definition 4.4.2.10. [Sum and Scalar Multiplication of Linear Transformations] Let V, W be


vector spaces over F and let S, T ∈ L(V, W). Then, we define the point-wise
1. sum of S and T , denoted S + T , by (S + T )(v) = S(v) + T (v), for all v ∈ V.
2. scalar multiplication, denoted cT for c ∈ F, by (cT )(v) = c (T (v)), for all v ∈ V.

Theorem 4.4.2.11. Let V and W be vector spaces over F. Then L(V, W) is a vector space over
F. Furthermore, if dim V = n and dim W = m, then dim L(V, W) = mn.

Proof. It can be easily verified that for S, T ∈ L(V, W), if we define (S + αT )(v) = S(v) + αT (v)
(point-wise addition and scalar multiplication) then L(V, W) is indeed a vector space over F.
We now prove the other part. So, let us assume that B = {v1 , . . . , vn } and C = {w1 , . . . , wm }
are bases of V and W, respectively. For 1 ≤ i ≤ n, 1 ≤ j ≤ m, we define the functions fij on the
basis vectors of V by fij (vk ) = wj , if k = i, and fij (vk ) = 0, if k ≠ i.
For other vectors of V, we extend the definition by linearity. That is, if v = ∑_{s=1}^{n} αs vs then

fij (v) = fij ( ∑_{s=1}^{n} αs vs ) = ∑_{s=1}^{n} αs fij (vs ) = αi fij (vi ) = αi wj .     (4.4.2.2)

Thus, fij ∈ L(V, W).


Claim: {fij |1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).
So, let us consider the linear system ∑_{i=1}^{n} ∑_{j=1}^{m} cij fij = 0, in the unknowns cij ’s for 1 ≤ i ≤ n, 1 ≤
j ≤ m. Using the point-wise addition and scalar multiplication, we get

0 = 0(vk ) = ( ∑_{i=1}^{n} ∑_{j=1}^{m} cij fij )(vk ) = ∑_{i=1}^{n} ∑_{j=1}^{m} cij fij (vk ) = ∑_{j=1}^{m} ckj wj .

But, the set {w1 , . . . , wm } is linearly independent and hence the only solution equals ckj = 0,
for 1 ≤ j ≤ m. Now, as we vary k from 1 to n, we see that cij = 0, for 1 ≤ j ≤ m and 1 ≤ i ≤ n.
Thus, we have proved the linear independence.
Now, let us prove that LS ({fij |1 ≤ i ≤ n, 1 ≤ j ≤ m}) = L(V, W). So, let f ∈ L(V, W).
Then, f (vs ) ∈ W and hence there exist βst ’s such that f (vs ) = ∑_{t=1}^{m} βst wt , for 1 ≤ s ≤ n. So, if
v = ∑_{s=1}^{n} αs vs ∈ V then, using Equation (4.4.2.2), we get

f (v) = f ( ∑_{s=1}^{n} αs vs ) = ∑_{s=1}^{n} αs f (vs ) = ∑_{s=1}^{n} αs ( ∑_{t=1}^{m} βst wt ) = ∑_{s=1}^{n} ∑_{t=1}^{m} βst (αs wt )
     = ∑_{s=1}^{n} ∑_{t=1}^{m} βst fst (vs ) = ( ∑_{s=1}^{n} ∑_{t=1}^{m} βst fst )(v).

Since the above is true for every v ∈ V, LS ({fij |1 ≤ i ≤ n, 1 ≤ j ≤ m}) = L(V, W) and thus
the required result follows.

Definition 4.4.2.12. Let f : S → T be any function.


1. Then, a function g : T → S is called a left inverse of f if (g ◦ f )(x) = x, for all x ∈ S.
That is, g ◦ f = Id, the identity function on S.
2. Then, a function h : T → S is called a right inverse of f if (f ◦ h)(y) = y, for all y ∈ T .
That is, f ◦ h = Id, the identity function on T .
3. Then f is said to be invertible if it has a right inverse and a left inverse.

Remark 4.4.2.13. Let f : S → T be invertible. Then, it can be easily shown that any right
inverse and any left inverse are the same. Thus, the inverse function is unique and is denoted
by f −1 . The reader should prove that f is invertible if and only if f is both one-one and onto.

Theorem 4.4.2.14. Let V and W be two vector spaces over F and let T ∈ L(V, W). Also
assume that T is one-one and onto. Then

1. for each w ∈ W, the set T −1 (w) = {v ∈ V | T (v) = w} contains exactly one element, i.e., |T −1 (w)| = 1.

2. the map T −1 ∈ L(W, V), where one defines T −1 (w) = v whenever T (v) = w.

Proof. Part 1. As T is onto, for each w ∈ W there exists v ∈ V such that T (v) = w. So,
T −1 (w) ≠ ∅. Now, let us assume that there exist vectors v1 , v2 ∈ V such that T (v1 ) = T (v2 ).
Then T is one-one implies v1 = v2 . Hence, |T −1 (w)| = 1. This completes the proof of Part 1.
Part 2. We need to show that T −1 (α1 w1 +α2 w2 ) = α1 T −1 (w1 )+α2 T −1 (w2 ), for all α1 , α2 ∈ F
and w1 , w2 ∈ W. Note that by Part 1, there exist unique vectors v1 , v2 ∈ V such that T −1 (w1 ) =
v1 and T −1 (w2 ) = v2 . Or equivalently, T (v1 ) = w1 and T (v2 ) = w2 . So, T (α1 v1 + α2 v2 ) =
α1 w1 + α2 w2 , for all α1 , α2 ∈ F. Hence, by definition of T −1 , for all α1 , α2 ∈ F, we get

T −1 (α1 w1 + α2 w2 ) = α1 v1 + α2 v2 = α1 T −1 (w1 ) + α2 T −1 (w2 ).

Thus the proof of Part 2 is complete.

Definition 4.4.2.15. [Inverse Linear Transformation] Let V and W be two vector spaces over
F and let T ∈ L(V, W). If T is one-one and onto then T −1 ∈ L(W, V), where T −1 (w) = v
whenever T (v) = w. The map T −1 is called the inverse of the linear transformation T .
Example 4.4.2.16. 1. Let T : R2 → R2 be defined by T ((x, y)T ) = (x + y, x − y)T . Then
T −1 ((x, y)T ) = ((x + y)/2, (x − y)/2)T as (T ◦ T −1 )((x, y)T ) = T (T −1 ((x, y)T )) = T (((x + y)/2, (x − y)/2)T ) = (x, y)T . Thus,
the map T −1 is indeed the inverse of T .
2. Define T ∈ L(Rn+1 , R[x; n]) by T ((a1 , . . . , an+1 )) = ∑_{i=1}^{n+1} ai x^{i−1} , for (a1 , . . . , an+1 ) ∈ Rn+1 .
Then, one defines T −1 ( ∑_{i=1}^{n+1} ai x^{i−1} ) = (a1 , . . . , an+1 ), for all ∑_{i=1}^{n+1} ai x^{i−1} ∈ R[x; n]. Verify
that T −1 ∈ L(R[x; n], Rn+1 ).

Definition 4.4.2.17. Let V and W be two vector spaces over F and let T ∈ L(V, W). Then, T
is said to be singular if there exists v ∈ V such that v ≠ 0 but T (v) = 0. If such a v ∈ V does
not exist then T is called non-singular.
 
Example 4.4.2.18. Let T ∈ L(R2 , R3 ) be defined by T ((x, y)T ) = (x, y, 0)T . Then, verify that T is
non-singular. Is T invertible?

We now prove a result that relates non-singularity with linear independence.

Theorem 4.4.2.19. Let V and W be two vector spaces over F and let T ∈ L(V, W). Then the
following statements are equivalent.

1. T is one-one.

2. T is non-singular.

3. Whenever S ⊆ V is linearly independent then T (S) is necessarily linearly independent.

Proof. 1⇒2 Let T be singular. Then, there exists v ≠ 0 such that T (v) = 0 = T (0). This
implies that T is not one-one, a contradiction.
2⇒3 Let S ⊆ V be linearly independent. If possible, let T (S) be linearly dependent.
Then, there exist v1 , . . . , vk ∈ S and α = (α1 , . . . , αk )T ≠ 0 such that ∑_{i=1}^{k} αi T (vi ) = 0.
Thus, T ( ∑_{i=1}^{k} αi vi ) = 0. But T is non-singular and hence we get ∑_{i=1}^{k} αi vi = 0 with α ≠ 0, a
contradiction to S being a linearly independent set.
3⇒1 Suppose that T is not one-one. Then, there exist x, y ∈ V such that x ≠ y but
T (x) = T (y). Thus, we have obtained S = {x − y}, a linearly independent subset of V with
T (S) = {0}, a linearly dependent set, a contradiction. Thus, the required result follows.

Definition 4.4.2.20. Let V and W be two vector spaces over F and let T ∈ L(V, W). Then, T
is said to be an isomorphism if T is one-one and onto. The vector spaces V and W are said to
be isomorphic, denoted V ∼ = W, if there is an isomorphism from V to W.

We now give a formal proof of the statement in Remark 3.3.5.6.

Theorem 4.4.2.21. Let V be an n-dimensional vector space over F. Then V ≅ Fn .

Proof. Let {v1 , . . . , vn } be a basis of V and {e1 , . . . , en } the standard basis of Fn . Now define
T (vi ) = ei , for 1 ≤ i ≤ n, and T ( ∑_{i=1}^{n} αi vi ) = ∑_{i=1}^{n} αi ei , for α1 , . . . , αn ∈ F. Then, it is easy to
observe that T ∈ L(V, Fn ), T is one-one and onto. Hence, T is an isomorphism.
We now summarize the different characterizations of an invertible linear operator on a finite dimen-
sional vector space. The proof basically uses the rank-nullity theorem.

Theorem 4.4.2.22. Let V be a vector space over F with dim V = n. Then the following
statements are equivalent for T ∈ L(V).

1. T is one-one.
2. Null(T ) = {0}.
3. Rank(T ) = n.
4. T is onto.
5. T is an isomorphism.
6. If {v1 , . . . , vn } is a basis for V then so is {T (v1 ), . . . , T (vn )}.
7. T is non-singular.
8. T is invertible.

Proof. 1 ⇒2 Let x ∈ Null(T ). Then T (x) = 0 = T (0). So, T is one-one implies x = 0.


Thus Null(T ) = {0}.
2 ⇒3 As Null(T ) = {0}, Nullity(T ) = 0 and hence by Theorem 4.4.2.7 Rank(T ) = n.
3 ⇒4 As Rank(T ) = n, Rng(T ) ⊆ V and dim(V) = n, we get Rng(T ) = V. Thus T is
onto.
4 ⇒1 As T is onto, dim(Rng(T )) = n. So, by Theorem 4.4.2.7 Null(T ) = {0}. Now, let
x, y ∈ V such that T (x) = T (y). Or equivalently, x − y ∈ Null(T ) = {0}. Thus x = y and T
is one-one.
The equivalence of 1 and 4 gives the equivalence with 5. Also, using Theorem 4.4.2.19, one has
the equivalence of 1, 6 and 7. Further note that the equivalence of 1 and 4 with Theorem 4.4.2.14
implies that T is invertible. For the other way implication, note that by definition T is invertible
implies that T is one-one and onto. Thus, all the statements are equivalent.

Exercise 4.4.2.23. Let V and W be two vector spaces over F and let T ∈ L(V, W). If dim(V)
is finite then prove that

1. T cannot be onto if dim(V) < dim(W).

2. T cannot be one-one if dim(V) > dim(W).

4.3 Matrix of a linear transformation


In Example 4.4.1.3.8, we saw that for each A ∈ Mm×n (C) there exists a linear transformation
T ∈ L(Cn , Cm ) given by T (x) = Ax, for each x ∈ Cn . In this section, we prove that if V
and W are vector spaces over F with dimensions n and m, respectively, then any T ∈ L(V, W)
corresponds to an m × n matrix. Before proceeding further, the readers should recall the results
on ordered basis (see Section 3.5).
So, let B = [v1 , . . . , vn ] and C = [w1 , . . . , wm ] be ordered bases of V and W, respectively.
Then, recall that if A = [v1 , . . . , vn ] and B = [w1 , . . . , wm ] then v = A[v]B and w = B[w]C , for
all v ∈ V and w ∈ W. So, if T ∈ L(V, W) then, for any v ∈ V,

B[T(v)]C = T (v) = T (A[v]B ) = T (A)[v]B = [T (v1 ), . . . , T (vn )][v]B


= [B[T (v1 )]C , . . . , B[T (vn )]C ] [v]B = B [[T (v1 )]C , . . . , [T (vn )]C ] [v]B .

As B is invertible, we get [T(v)]C = [[T (v1 )]C , . . . , [T (vn )]C ] [v]B . Note that the matrix
[[T (v1 )]C , . . . , [T (vn )]C ], denoted T [B, C], is an m × n matrix and is unique as the i-th column
equals [T (vi )]C , for 1 ≤ i ≤ n. So, we immediately have the following definition and result.

Definition 4.4.3.1. [Matrix of a Linear Transformation] Let B = [v1 , . . . , vn ] and C =


[w1 , . . . , wm ] be ordered bases of V and W, respectively. If T ∈ L(V, W) then the matrix T [B, C]
is called the coordinate matrix of T or the matrix of the linear transformation T with
respect to the basis B and C, respectively. When there is no mention of bases, we take the
standard bases and denote the matrix by [T ].

Theorem 4.4.3.2. Let B = [v1 , . . . , vn ] and C = [w1 , . . . , wm ] be ordered bases of V and W,


respectively. If T ∈ L(V, W) then there exists a matrix A ∈ Mm×n (F) with

A = T [B, C] = [[T (v1 )]C , . . . , [T (vn )]C ] and [T (x)]C = A [x]B , for all x ∈ V.

We now give a few examples to understand the above discussion and Theorem 4.4.3.2.

Figure 4.1: Counter-clockwise Rotation by an angle θ

Example 4.4.3.3. 1. Let T ∈ L(R2 ) represent a counterclockwise rotation by an angle θ, 0 ≤
θ < 2π. Then, using Figure 4.1, x = OP cos α and y = OP sin α, so verify that

(x′ , y ′ )T = (OP ′ cos(α + θ), OP ′ sin(α + θ))T = (OP (cos α cos θ − sin α sin θ), OP (sin α cos θ + cos α sin θ))T = (x cos θ − y sin θ, x sin θ + y cos θ)T .

Or equivalently, the matrix in the standard ordered basis of R2 equals


" #
h i cos θ − sin θ
[T ] = T (e1 ), T (e2 ) = . (4.4.3.1)
sin θ cos θ

2. Let T ∈ L(R2 ) with T ((x, y)T ) = (x + y, x − y)T .


 
(a) Then [T ] = [[T (e1 )], [T (e2 )]] has columns T (e1 ) = (1, 1)T and T (e2 ) = (1, −1)T .
(b) On the image space take the ordered basis C = [(1, 0)T , (1, 1)T ]. Then
[T ] = [[T (e1 )]C , [T (e2 )]C ] = [[(1, 1)T ]C , [(1, −1)T ]C ] has columns (0, 1)T and (2, −1)T .
(c) In the above, let the ordered basis of the domain space be B = [(−1, 1)T , (3, 1)T ]. Then
T [B, C] = [[T ((−1, 1)T )]C , [T ((3, 1)T )]C ] = [[(0, −2)T ]C , [(4, 2)T ]C ] has columns (2, −2)T and (2, 2)T .

3. Let B = [e1 , e2 ] and C = [e1 + e2 , e1 − e2 ] be two ordered bases of R2 . Then Compute


T [B, B] and T [C, C], where T ((x, y)T T
 + y, x − 2y) .
 ) = (x  
1 1 −1 −1 1 1 1
Solution: Let A = Id2 and B = . Then, A = Id2 and B = . So,
1 −1 2 1 −1

            " #


1 0 1 1 1 1
T [B, B] = T , T = , = and
0 B
1 B
1 B −2 B 1 −2
            " 1 3
#
1 1 2 0
T [C, C] = T , T = , = 23 2
1 C
−1 C
−1 C
3 C − 3
2 2

       
2 −1 2 0 −1 0
as =B and =B .
−1 C −1 3 C 3

4. Let T ∈ L(R3 , R2 ) be defined by T ((x, y, z)T ) = (x + y − z, x + z)T . Determine [T ].


Solution: By definition, [T ] = [[T (e1 )], [T (e2 )], [T (e3 )]] has columns (1, 1)T , (1, 0)T and (−1, 1)T .

5. Define T ∈ L(C3 ) by T (x) = x, for all x ∈ C3 . Determine the coordinates with respect to

   
the ordered basis B = e1 , e2 , e3 and C = (1, 0, 0), (1, 1, 0), (1, 1, 1) .

Solution: By definition, verify that


 
      
1 0 0 1 −1 0
 
T [B, C] = [[T (e1 )]C , [T (e2 )]C , [T (e3 )]C ] =  0 , 1 , 0  = 
   
 0 1 −1

0 C 0 C 1 C 0 0 1

and  
      
1 1 1 1 1 1
 
T [C, B] = 0 , 1 , 1  =  
0 1 1 .
0 B 0 B 1 B 0 0 1

Thus, verify that T [C, B]−1 = T [B, C] and T [B, B] = T [C, C] = I3 as the given map is indeed
the identity map.

6. Fix A ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Ax, for all x ∈ Cn . If B is the standard
basis of Cn then

[T ][:, i] = [T (ei )]B = [A(ei )]B = [A[:, i]]B = A[:, i].

7. Fix A ∈ Mm,n (C) and define T ∈ L(Cn , Cm ) by T (x) = Ax, for all x ∈ Cn . Let B and C
be the standard ordered bases of Cn and Cm , respectively. Then T [B, C] = A as

(T [B, C])[:, i] = [T (ei )]C = [A(ei )]C = [A[:, i]]C = A[:, i].

8. Fix A ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Ax, for all x ∈ Cn . Let B = [v1 , . . . , vn ]
and C = [u1 , . . . , un ] be two ordered bases of Cn with respective matrices B and C. Then

T [B, C] = [[T (v1 )]C , . . . , [T (vn )]C ] = [C −1 T (v1 ), . . . , C −1 T (vn )]
= [C −1 Av1 , . . . , C −1 Avn ] = C −1 A[v1 , . . . , vn ] = C −1 AB
(see the numerical sketch after this example).

In particular, if
(a) B = C then T [B, B] = B −1 AB.
(b) A = In so that T = Id then Id[B, C] = C −1 B, an invertible matrix. Similarly,
Id[C, B] = B −1 C. So, Id[C, B] · Id[B, C] = (B −1 C)(C −1 B) = In .
(c) A = In so that T = Id and B = C then Id[B, B] = In .
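
Item 8 above gives a recipe that is easy to verify numerically: T[B, C] = C^{-1} A B, and this matrix converts B-coordinates of x into C-coordinates of T(x). A sketch (Python/NumPy; the random matrices are illustrative choices whose columns, generically, do form bases):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))        # T(x) = Ax on R^3
B = rng.standard_normal((3, 3))        # columns: ordered basis B
C = rng.standard_normal((3, 3))        # columns: ordered basis C

T_BC = np.linalg.inv(C) @ A @ B        # T[B, C] = C^{-1} A B

x = rng.standard_normal(3)
x_B = np.linalg.solve(B, x)            # [x]_B
Tx_C = np.linalg.solve(C, A @ x)       # [T(x)]_C
print(np.allclose(Tx_C, T_BC @ x_B))   # True: [T(x)]_C = T[B, C] [x]_B
```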

Exercise 4.4.3.4. 1. Let T ∈ L(R2 ) represent the reflection about the line y = mx. Find
its matrix with respect to the standard ordered basis of R2 .

2. Let T ∈ L(R3 ) represent the reflection about the X-axis. Find its matrix with respect to
the standard ordered basis of R3 .

3. Let T ∈ L(R3 ) represent the counterclockwise rotation around the positive Z-axis by an
angle θ, 0 ≤ θ < 2π. Find its matrix with respect to the standard ordered basis of R3 .
[Hint: Is the matrix with columns (cos θ, sin θ, 0)T , (− sin θ, cos θ, 0)T and (0, 0, 1)T the required matrix?]

4. Define a function D ∈ L(R[x; n]) by D(f (x)) = f ′ (x). Find the matrix of D with respect
to the standard ordered basis of R[x; n]. Observe that Rng(D) ⊆ R[x; n − 1].

4.3.A Dual Space*

Definition 4.4.3.5. Let V be a vector space over F. Then a map T ∈ L(V, F) is called a linear
functional on V.

Example 4.4.3.6. The following linear transformations are linear functionals.


1. T (A) = trace(A) for T ∈ L(Mn (R), R).
2. T (f ) = ∫_a^b f (t)dt for T ∈ L(C([a, b], R), R).
3. T (f ) = ∫_a^b t2 f (t)dt for T ∈ L(C([a, b], R), R).

Exercise 4.4.3.7. Let V be a vector space. Suppose there exists v ∈ V such that f (v) = 0, for
all f ∈ V∗ . Then prove that v = 0.

Definition 4.4.3.8. Let V be a vector space over F. Then L(V, F) is called the dual space of
V and is denoted by V∗ . The double dual space of V, denoted V∗∗ , is the dual space of V∗ .

We first give an immediate corollary of Theorem 4.4.2.21.



Corollary 4.4.3.9. Let V and W be vector spaces over F with dim V = n and dim W = m.

1. Then L(V, W) ≅ Fmn . Moreover, {fij | 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).

2. In particular, if W = F then L(V, F) = V∗ ≅ Fn . Moreover, if {v1 , . . . , vn } is a basis of V
then the set {fi | 1 ≤ i ≤ n} is a basis of V∗ , where fi (vk ) = 1 if k = i, and fi (vk ) = 0 if k ≠ i.

So, we see that V∗ can be understood through a basis of V. Thus, one can understand V∗∗
again via a basis of V∗ . But, the question arises “can we understand it directly via the vector
space V itself?” We answer this in affirmative by giving a canonical isomorphism from V to V∗∗ .
To do so, for each v ∈ V, we define a map Lv : V∗ → F by Lv (f ) = f (v), for each f ∈ V∗ . Then
Lv is a linear functional as

Lv (αf + g) = (αf + g) (v) = αf (v) + g(v) = αLv (f ) + Lv (g).

So, for each v ∈ V, we have obtained a linear functional Lv ∈ V∗∗ . We use it to give the required
canonical isomorphism.

Theorem 4.4.3.10. Let V be a vector space over F. If dim(V) = n then the canonical map

T : V → V∗∗ defined by T (v) = Lv is an isomorphism.



Proof. Note that the map T satisfies the following:

1. For each f ∈ V∗ , note that

Lαv+u (f ) = f (αv + u) = αf (v) + f (u) = αLv (f ) + Lu (f ) = (αLv + Lu ) (f ).

Thus, Lαv+u = αLv + Lu . Hence, T (αv + u) = αT (v) + T (u). Thus, T is a linear


transformation.

2. We now show that T is one-one. So, suppose that T (v) = T (u), for some u, v ∈ V.
Then, Lv = Lu . That is, Lv (f ) = Lu (f ), for all f ∈ V∗ . Or equivalently, f (v − u) = 0, for
all f ∈ V∗ . Hence, by Exercise 4.4.3.7 v − u = 0. So, v = u. Therefore T is one-one.

Thus, T gives an inclusion map from V to V∗∗ . Further, applying Corollary 4.4.3.9.2 to V∗ ,
gives dim(V∗∗ ) = dim(V∗ ) = n. Hence, the required result follows.
We now give a few immediate consequences of Theorem 4.4.3.10.

Corollary 4.4.3.11. Let V be a vector space of dimension n with basis B = {v1 , . . . , vn }.

1. Then, a basis of V∗∗ , the double dual of V, equals D = {Lv1 , . . . , Lvn }. Thus, for each
T ∈ V∗∗ there exists α ∈ V such that T (f ) = f (α), for all f ∈ V∗ .

2. If C = {f1 , . . . , fn } is the dual basis of V∗ defined using the basis B (see Corollary 4.4.3.9.2)
then D is indeed the dual basis of V∗∗ obtained using the basis C of V∗ . Thus, each basis
of V∗ is the dual basis of some basis of V.

Proof. Part 1 is direct as T : V → V∗∗ was a canonical inclusion map. For Part 2, we need to
show that
Lvi (fj ) = 1 if j = i and Lvi (fj ) = 0 if j ≠ i, or equivalently, fj (vi ) = 1 if j = i and fj (vi ) = 0 if j ≠ i,

which indeed holds true using Corollary 4.4.3.9.2.


Let V be a finite dimensional vector space. Then Corollary 4.4.3.11 implies that the spaces V
and V∗ are naturally dual to each other.
We are now ready to prove the main result of this subsection. To start with, let T ∈ L(V, W).
Then, we want to define a map Tb : W∗ → V∗ . For g ∈ W∗ , Tb(g) ∈ V∗ is a linear functional, so
it needs to be evaluated at elements of V. Thus, we define Tb(g)(v) = g (T (v)), for all v ∈ V.
Note that Tb ∈ L(W∗ , V∗ ) as for every g, h ∈ W∗ ,
   
Tb(αg + h) (v) = (αg + h) (T (v)) = αg (T (v)) + h (T (v)) = αTb(g) + Tb(h) (v),

for all v ∈ V implies that Tb(αg + h) = αTb(g) + Tb(h).

Theorem 4.4.3.12. Let V and W be two vector spaces over F with ordered bases B = [v1 , . . . , vn ]

and C = [w1 , . . . , wm ], respectively. Also, let B ∗ = [f1 , . . . , fn ] and C ∗ = [g1 , . . . , gm ] be the



corresponding ordered bases of the dual spaces V∗ and W∗ , respectively. Then, Tb[C ∗ , B ∗ ] =

(T [B, C])T , the transpose of the coordinate matrix T .


hh i h i i
Proof. Note that we need to compute Tb[C ∗ , B ∗ ] = Tb(g1 ) ∗
, . . . , b(gm )
T ∗
and prove that
B B
it  of the matrix T [B, C]. So, let T [B, C] = [[T (v1 )]C , . . . , [T (vn )]C ] =
 equals the transpose
a11 a12 · · · a1n
 a21 a22 · · · a2n 
 
 .. .. . . .. . Thus, to prove the required result, we need to show that
 . . . . 
am1 am2 · · · amn
 
aj1
h i  aj2  X
n
 
Tb(gj ) = [f1 , . . . , fn ]  .  = ajk fk , for 1 ≤ j ≤ m. (4.4.3.2)
B∗  .. 
k=1
ajn
 
P
n P
n
Now, recall that the functionals fi ’s and gj ’s satisfy αk fk (vt ) = αk (fk (vt )) = αt , for
k=1 k=1
1 ≤ t ≤ n and [gj (w1 ), . . . , gj (wm )] = eTj , a row vector with 1 at the j-th place and 0, elsewhere.
So, let C = [w1 , . . . , wm ] and evaluate Tb(gj ) at vt ’s, the elements of B.
 
Tb(gj ) (vt ) = gj (T (vt )) = gj (C [T (vt )]C ) = [gj (w1 ), . . . , gj (wm )] [T (vt )]C
 
a1t !
a  X n
T  2t 
= ej  .  = ajt = ajk fk (vt ).
 .. 
k=1
amt

P
n
Thus, the linear functional Tb(gj ) and ajk fk are equal at vt , for 1 ≤ t ≤ n, the basis vectors
k=1
P
n
of V. Hence Tb(gj ) = ajk fk which gives Equation (4.4.3.2).
k=1

Remark 4.4.3.13. The proof of Theorem 4.4.3.12 also shows the following.
1. For each T ∈ L(V, W) there exists a unique map Tb ∈ L(W∗ , V∗ ) such that
 
b
T (g) (v) = g (T (v)) , for each g ∈ W∗ .

2. The coordinate matrices T [B, C] and Tb[C ∗ , B ∗ ] are transpose of each other, where the or-
dered bases B ∗ of V∗ and C ∗ of W∗ correspond, respectively, to the ordered bases B of V
and C of W.

3. Thus, the results on matrices and its transpose can be re-written in the language a vector
space and its dual space.

4.4 Similarity of Matrices


Let V be a vector space over F with dim(V) = n and ordered basis B. Then any T ∈ L(V)
corresponds to a matrix in Mn (F). What happens if the ordered basis is changed? We
answer this in this subsection.


(V, B, n) −→ (W, C, m) −→ (Z, D, p), via T [B, C]m×n and S[C, D]p×m , with (ST )[B, D]p×n = S[C, D] · T [B, C]

Figure 4.2: Composition of Linear Transformations

Theorem 4.4.4.1 (Composition of Linear Transformations). Let V, W and Z be finite dimen-


sional vector spaces over F with ordered bases B, C and D, respectively. Also, let T ∈ L(V, W)
and S ∈ L(W, Z). Then S ◦ T = ST ∈ L(V, Z) (see Figure 4.2) and

(ST ) [B, D] = S[C, D] · T [B, C].

Proof. Let B = [u1 , . . . , un ], C = [v1 , . . . , vm ] and D = [w1 , . . . , wp ] be the ordered bases of


V, W and Z, respectively. Then using Theorem 4.4.3.2, we have

(ST )[B, D] = [[ST (u1 )]D , . . . , [ST (un )]D ] = [[S(T (u1 ))]D , . . . , [S(T (un ))]D ]
= [S[C, D] [T (u1 )]C , . . . , S[C, D] [T (un )]C ]
= S[C, D] [[T (u1 )]C , . . . , [T (un )]C ] = S[C, D] · T [B, C].

Hence, the proof of the theorem is complete.
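
For matrix maps the theorem can be verified directly: with T(x) = A1 x, S(y) = A2 y and bases whose vectors form the columns of invertible matrices B, C and D, the coordinate matrices multiply as stated. A sketch (Python/NumPy, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 4, 3, 2
A1 = rng.standard_normal((m, n))           # T : R^n -> R^m
A2 = rng.standard_normal((p, m))           # S : R^m -> R^p
B = rng.standard_normal((n, n))            # columns: ordered basis of V = R^n
C = rng.standard_normal((m, m))            # columns: ordered basis of W = R^m
D = rng.standard_normal((p, p))            # columns: ordered basis of Z = R^p

T_BC = np.linalg.solve(C, A1 @ B)          # T[B, C] = C^{-1} A1 B
S_CD = np.linalg.solve(D, A2 @ C)          # S[C, D] = D^{-1} A2 C
ST_BD = np.linalg.solve(D, A2 @ A1 @ B)    # (ST)[B, D] = D^{-1} (A2 A1) B

print(np.allclose(ST_BD, S_CD @ T_BC))     # True
```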


As an immediate corollary of Theorem 4.4.4.1 we have the following result.

Theorem 4.4.4.2 (Inverse of a Linear Transformation). Let V be a vector space with dim(V) =
n. If T ∈ L(V) is invertible then, for any ordered basis B, (T [B, B])−1 = T −1 [B, B]. That is, the
coordinate matrix of T is invertible.

Proof. As T is invertible, T T −1 = Id. Thus, Example 4.4.3.3.8c and Theorem 4.4.4.1 imply

In = Id[B, B] = (T T −1 )[B, B] = T [B, B] T −1 [B, B].

Hence, by definition of inverse, T −1 [B, B] = (T [B, B])−1 and the required result follows.

Exercise 4.4.4.3. Find the matrix of the linear transformations given below.

1. Define T ∈ L(R3 ) by T (x1 ) = x2 , T (x2 ) = x3 and T (x3 ) = x1 . Find T [B, B], where
 
B = [x1 , x2 , x3 ] is an ordered basis of R3 . Is T invertible?
 
2. Let B = 1, x, x2 , x3 be an ordered basis of R[x; 3] and define T ∈ L(R[x; 3]) by T (1) = 1,
T (x) = 1 + x, T (x2 ) = (1 + x)2 and T (x3 ) = (1 + x)3 . Prove that T is invertible. Also,
find T [B, B] and T −1 [B, B].

Let V be a finite dimensional vector space. Then, the next result answers the question “what
happens to the matrix T [B, B] if the ordered basis B changes to C?”


Figure 4.3: Commutative Diagram for Similarity of Matrices

Theorem 4.4.4.4. Let B = [u1 , . . . , un ] and C = [v1 , . . . , vn ] be two ordered bases of V and Id
the identity operator. Then, for any linear operator T ∈ L(V)

T [C, C] = (Id[C, B])−1 · T [B, B] · Id[C, B]. (4.4.4.1)

Proof. The proof uses Theorem 4.4.4.1 to represent T [B, C] as (Id ◦ T )[B, C] and (T ◦ Id)[B, C]
(see Figure 4.3 for clarity). Now, by Theorem 4.4.4.1, T [B, C] = (Id ◦ T )[B, C] = Id[B, C] · T [B, B]
and T [B, C] = (T ◦ Id)[B, C] = T [C, C] · Id[B, C]. Hence, T [C, C] · Id[B, C] = Id[B, C] · T [B, B] and
hence T [C, C] = (Id[C, B])−1 · T [B, B] · Id[C, B] and the result follows.
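
Theorem 4.4.4.4 can likewise be checked numerically for an operator T(x) = Ax on R^3 and two ordered bases given by the columns of (generically invertible) random matrices B and C. A sketch (Python/NumPy, illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))           # T(x) = Ax
B = rng.standard_normal((3, 3))           # columns: ordered basis B
C = rng.standard_normal((3, 3))           # columns: ordered basis C

T_BB = np.linalg.solve(B, A @ B)          # T[B, B] = B^{-1} A B
T_CC = np.linalg.solve(C, A @ C)          # T[C, C] = C^{-1} A C
Id_CB = np.linalg.solve(B, C)             # Id[C, B] = B^{-1} C

print(np.allclose(T_CC, np.linalg.inv(Id_CB) @ T_BB @ Id_CB))   # True
```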
Let V be a vector space and let T ∈ L(V). If dim(V) = n then every ordered basis B of V
gives an n × n matrix T [B, B]. So, as we change the ordered basis, the coordinate matrix of T
changes. Theorem 4.4.4.4 tells us that all these matrices are related by an invertible matrix.
Thus we are led to the following remark and the definition.

Remark 4.4.4.5. As T [C, C] = Id[B, C] · T [B, B] · Id[C, B], the matrix Id[B, C] is called the B : C
change of basis matrix (also, see Theorem 3.3.5.4).

Definition 4.4.4.6. [Similar Matrices] Let B, C ∈ Mn (C). Then, B and C are said to be
similar if there exists a non-singular matrix P such that P −1 BP = C ⇔ BP = P C.
   
Example 4.4.4.7. Let B = [1 + x, 1 + 2x + x2 , 2 + x] and C = [1, 1 + x, 1 + x + x2 ] be ordered bases
of R[x; 2]. Then, for Id(a + bx + cx2 ) = a + bx + cx2 , verify that Id[B, C]−1 = Id[C, B],
where

Id[C, B] = [[1]B , [1 + x]B , [1 + x + x2 ]B ] has columns (−1, 0, 1)T , (1, 0, 0)T and (−2, 1, 1)T , and
Id[B, C] = [[1 + x]C , [1 + 2x + x2 ]C , [2 + x]C ] has columns (0, 1, 0)T , (−1, 1, 1)T and (1, 1, 0)T .

Exercise 4.4.4.8. 1. Define T ∈ L(R3 ) by T ((x, y, z)T ) = (x+y+2z, x−y−3z, 2x+3y+z)T .


 
Let B be the standard ordered basis and C = (1, 1, 1), (1, −1, 1), (1, 1, 2) be another ordered

basis of R3 . Then find



(a) matrices T [B, B] and T [C, C].


(b) the matrix P such that P −1 T [B, B] P = T [C, C].

2. Let V be a vector space with dim(V) = n. Let T ∈ L(V) satisfy T n−1 ≠ 0 but T n = 0.

(a) Prove that there exists u ∈ V with {u, T (u), . . . , T n−1 (u)}, a basis of V.
 
(b) If B = [u, T (u), . . . , T n−1 (u)] then T [B, B] is the n × n matrix whose (i + 1, i)-th entry
is 1, for 1 ≤ i ≤ n − 1, and whose remaining entries are 0 (1’s on the sub-diagonal, zeros elsewhere).
(c) Let A be an n × n matrix satisfying An−1 ≠ 0 but An = 0. Then prove that A is
similar to the matrix given in part (b) above.

3. Let V, W be vector spaces over F with dim(V) = n and dim(W) = m and ordered bases
B and C, respectively. Define IB,C : L(V, W) → Mm,n (F) by IB,C (T ) = T [B, C]. Show that
IB,C is an isomorphism. Thus, when bases are fixed, the number of m × n matrices is same
as the number of linear transformations.

4.5 Summary
Chapter 5

Inner Product Spaces

5.1 Definition and Basic Properties


Recall the dot product in R2 and R3 . The dot product helped us to compute the length of vectors
and the angle between vectors. This enabled us to rephrase geometrical problems in R2 and R3
in the language of vectors. We generalize the idea of the dot product to achieve similar goals for a
general vector space.

Definition 5.5.1.1. [Inner Product] Let V be a vector space over F. An inner product over

V, denoted by h , i, is a map from V × V to F satisfying

1. hau + bv, wi = ahu, wi + bhv, wi, for all u, v, w ∈ V and a, b ∈ F,

2. hu, vi = \overline{hv, ui}, the complex conjugate of hv, ui, for all u, v ∈ V and

3. hu, ui ≥ 0 for all u ∈ V. Furthermore, equality holds if and only if u = 0.

Remark 5.5.1.2. Using the definition of inner product, we immediately observe that
1. hv, αwi = hαw, vi = αhw, vi = αhv, wi, for all α ∈ F and v, w ∈ V.

2. If hu, vi = 0 for all v ∈ V then in particular hu, ui = 0. Hence, u = 0.

Definition 5.5.1.3. [Inner Product Space] Let V be a vector space with an inner product h , i.
Then (V, h , i) is called an inner product space (in short, ips).

Example 5.5.1.4. Examples 1 and 2 that appear below are called the standard inner prod-
uct or the dot product on Rn and Cn , respectively. Whenever an inner product is not clearly
mentioned, it will be assumed to be the standard inner product.

1. For u = (u1 , . . . , un )T , v = (v1 , . . . , vn )T ∈ Rn define hu, vi = u1 v1 + · · · + un vn = vT u.



Then h , i is indeed an inner product and hence Rn , h , i is an ips.

2. For u = (u1 , . . . , un )∗ , v = (v1 , . . . , vn )∗ ∈ Cn define hu, vi = u1 v1 + · · · + un vn = v∗ u.



Then Cn , h , i is an ips.
3. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 and the matrix A with rows (4, −1) and (−1, 2), define hx, yi = yT Ax. Then,
h , i is an inner product as hx, xi = (x1 − x2 )^2 + 3x1^2 + x2^2 (a numerical sketch appears after this example).
 
4. Fix A to be the symmetric matrix with rows (a, b) and (b, c), where a, c > 0 and ac > b^2 . Then hx, yi = yT Ax is an inner product on R2
as hx, xi = ax1^2 + 2bx1 x2 + cx2^2 = a(x1 + (b/a)x2 )^2 + (1/a)(ac − b^2 ) x2^2 .

5. Verify that for x = (x1 , x2 , x3 )T , y = (y1 , y2 , y3 )T ∈ R3 , hx, yi = 10x1 y1 + 3x1 y2 + 3x2 y1 +


2x2 y2 + x2 y3 + x3 y2 + x3 y3 defines an inner product.

6. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , we define three maps that satisfy at least one
condition out of the three conditions for an inner product. Determine the condition which
is not satisfied. Give reasons for your answer.

(a) hx, yi = x1 y1 .

(b) hx, yi = x21 + y12 + x22 + y22 .

(c) hx, yi = x1 y13 + x2 y23 .



7. Let A ∈ Mn (C) be a Hermitian matrix. Then, for x, y ∈ Cn , define hx, yi = y∗ Ax. Then,

h , i satisfies hx, yi = \overline{hy, xi} and hx + αz, yi = hx, yi + αhz, yi, for all x, y, z ∈ Cn and
α ∈ C. Do there exist conditions on A such that hx, xi ≥ 0 for all x ∈ Cn ? This will be
answered in the affirmative in the chapter on eigenvalues and eigenvectors.

8. For A, B ∈ Mn (R), define hA, Bi = tr(B T A). Then



hA + B, Ci = Tr C T (A + B) = Tr(C T A) + Tr(C T B) = hA, Ci + hB, Ci and
hA, Bi = Tr(B T A) = Tr( (B T A)T ) = Tr(AT B) = hB, Ai.

If A = [aij ] then hA, Ai = Tr(AT A) = ∑_{i=1}^{n} (AT A)ii = ∑_{i,j=1}^{n} aij aij = ∑_{i,j=1}^{n} aij^2 and therefore,
hA, Ai > 0 for every non-zero matrix A.

9. Consider the complex vector space C[−1, 1] and define hf, gi = ∫_{−1}^{1} f (x) \overline{g(x)} dx. Then

(a) hf , f i = ∫_{−1}^{1} | f (x) |^2 dx ≥ 0 as | f (x) |^2 ≥ 0, and this integral is 0 if and only if f ≡ 0,
as f is continuous.
(b) \overline{hg, f i} = \overline{ ∫_{−1}^{1} g(x) \overline{f (x)} dx } = ∫_{−1}^{1} \overline{g(x)} f (x) dx = ∫_{−1}^{1} f (x) \overline{g(x)} dx = hf , gi.
(c) hf + g, hi = ∫_{−1}^{1} (f + g)(x) \overline{h(x)} dx = ∫_{−1}^{1} [f (x)\overline{h(x)} + g(x)\overline{h(x)}] dx = hf , hi + hg, hi.
(d) hαf , gi = ∫_{−1}^{1} (αf (x)) \overline{g(x)} dx = α ∫_{−1}^{1} f (x) \overline{g(x)} dx = αhf , gi.

(e) Fix an ordered basis B = [u1 , . . . , un ] of a complex vector space V. Then, for any
u, v ∈ V with [u]B = (a1 , . . . , an )T and [v]B = (b1 , . . . , bn )T , define hu, vi = ∑_{i=1}^{n} ai \overline{bi} . Then, h , i is
indeed an inner product on V. So, any finite dimensional vector space can be endowed
with an inner product.
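
Two of the inner products above are easy to experiment with numerically: the weighted product hx, yi = yT Ax of item 3 and the trace inner product hA, Bi = Tr(BT A) of item 8. A sketch (Python/NumPy; the random vectors and matrices are illustrative choices):

```python
import numpy as np

A = np.array([[4., -1.],
              [-1., 2.]])                        # symmetric, positive definite
ip = lambda x, y: y @ A @ x                      # <x, y> = y^T A x

rng = np.random.default_rng(5)
x, y = rng.standard_normal(2), rng.standard_normal(2)
print(np.isclose(ip(x, y), ip(y, x)))            # symmetry (real case)
print(ip(x, x) > 0)                              # positivity for x != 0

M = rng.standard_normal((3, 3))
frob = lambda M, N: np.trace(N.T @ M)            # <M, N> = tr(N^T M)
print(np.isclose(frob(M, M), np.sum(M * M)))     # <M, M> = sum of squared entries
```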

5.1.A Cauchy-Schwarz Inequality

As hu, ui > 0, for all u ≠ 0, we use the inner product to define the length of a vector.

Definition 5.5.1.5. [Length/Norm of a Vector] Let V be a vector space over F. Then for any
vector u ∈ V, we define the length (norm) of u, denoted kuk = √hu, ui, the positive square
u
root. A vector of norm 1 is called a unit vector. Thus, is called the unit vector in the
kuk
direction of u.

Example 5.5.1.6. 1. Let V be an ips and u ∈ V. Then, for any scalar α, kαuk = | α | · kuk.
2. Let u = (1, −1, 2, −3)T ∈ R4 . Then kuk = √(1 + 1 + 4 + 9) = √15. Thus, (1/√15) u and
−(1/√15) u are vectors of norm 1. Moreover, (1/√15) u is the unit vector in the direction of u.

Exercise 5.5.1.7. 1. Let u = (−1, 1, 2, 3, 7)T ∈ R5 . Find all α ∈ R such that kαuk = 1.
DR

2. Let u = (−1, 1, 2, 3, 7)T ∈ C5 . Find all α ∈ C such that kαuk = 1.



3. Prove that kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ), for all x, y ∈ Rn . This equality is
called the Parallelogram Law as in a parallelogram the sum of square of the lengths of
the diagonals is equal to twice the sum of squares of the lengths of the sides.

4. Apollonius’ Identity: Let the length of the sides of a triangle be a, b, c ∈ R and that of
the median be d ∈ R. If the median is drawn on the side with length a then prove that
b^2 + c^2 = 2(d^2 + (a/2)^2 ).
5. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then prove that if α ∈ C with
| α | > 1 then A − αI is invertible.

6. Let u = (1, 2)T , v = (2, −1)T ∈ R2 . Then, does there exist an inner product in R2 such
that kuk = 1, kvk = 1 and hu, vi = 0? [Hint: Let A be the symmetric matrix with rows (a, b) and (b, c) and define hx, yi = yT Ax.
Use the given conditions to get a linear system of 3 equations in the unknowns a, b, c.]

7. Let x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 . Then hx, yi = 3x1 y1 − x1 y2 − x2 y1 + x2 y2 defines


an inner product. Use this inner product to find

(a) the angle between e1 = (1, 0)T and e2 = (0, 1)T .


(b) v ∈ R2 such that hv, e1 i = 0.
(c) x, y ∈ R2 such that kxk = kyk = 1 and hx, yi = 0.

A very useful and fundamental inequality concerning the inner product, commonly called the Cauchy-Schwarz
inequality, is proved next.

Theorem 5.5.1.8 (Cauchy-Bunyakovskii-Schwarz inequality). Let V be an inner product space


over F. Then, for any u, v ∈ V
| hu, vi | ≤ kuk kvk. (5.5.1.1)

Moreover, equality holds in Inequality


 (5.5.1.1)
 if and only if u and v are linearly dependent.
u u
Furthermore, if u 6= 0 then v = v, .
kuk kuk
Proof. If u = 0 then Inequality (5.5.1.1) holds. Hence, let u ≠ 0. Then, by Definition 5.5.1.1.3,
hλu + v, λu + vi ≥ 0 for all λ ∈ F and v ∈ V. In particular, for λ = −hv, ui/kuk2 ,

0 ≤ hλu + v, λu + vi = λ\overline{λ} kuk2 + λhu, vi + \overline{λ}hv, ui + kvk2
  = |hv, ui|2 /kuk2 − (hv, ui/kuk2 ) hu, vi − (\overline{hv, ui}/kuk2 ) hv, ui + kvk2 = kvk2 − |hv, ui|2 /kuk2 .

Or, in other words, | hv, ui |2 ≤ kuk2 kvk2 and the proof of the inequality is over.
Now, note that equality holds in Inequality (5.5.1.1) if and only if hλu + v, λu + vi = 0, or
equivalently, λu + v = 0. Hence, u and v are linearly dependent. Moreover,

0 = h0, ui = hλu + v, ui = λhu, ui + hv, ui

implies that v = −λu = (hv, ui/kuk2 ) u = hv, u/kuki (u/kuk).
Corollary 5.5.1.9. Let x, y ∈ Rn . Then ( ∑_{i=1}^{n} xi yi )^2 ≤ ( ∑_{i=1}^{n} xi^2 ) ( ∑_{i=1}^{n} yi^2 ).
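
The inequality is easily illustrated numerically for the standard inner product on R^n; equality occurs exactly for linearly dependent vectors. A sketch (Python/NumPy, illustrative random vectors):

```python
import numpy as np

rng = np.random.default_rng(6)
u, v = rng.standard_normal(5), rng.standard_normal(5)

lhs = abs(u @ v)
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs)                                # True: |<u, v>| <= ||u|| ||v||

w = -3.0 * u                                     # w is a scalar multiple of u
print(np.isclose(abs(u @ w), np.linalg.norm(u) * np.linalg.norm(w)))   # True: equality
```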

5.1.B Angle between two Vectors

Let V be a real vector space. Then, for u, v ∈ V \ {0}, the Cauchy-Schwarz inequality implies that
−1 ≤ hu, vi/(kuk kvk) ≤ 1. We use this together with the properties of the cosine function to define the
angle between two vectors in an inner product space.

Definition 5.5.1.10. [Angle between two vectors] Let V be a real vector space. If θ ∈ [0, π] is
the angle between u, v ∈ V \ {0} then we define

cos θ = hu, vi / (kuk kvk).
Example 5.5.1.11. 1. Take (1, 0)T , (1, 1)T ∈ R2 . Then cos θ = 1/√2. So θ = π/4.

2. Take (1, 1, 0)T , (1, 1, 1)T ∈ R3 . Then the angle between them, say β, equals cos−1 (2/√6).

3. The angle depends on the inner product. Take hx, yi = 2x1 y1 + x1 y2 + x2 y1 + x2 y2 on R2 . Then the angle
between (1, 0)T , (1, 1)T ∈ R2 equals cos−1 (3/√10).

4. As hx, yi = hy, xi for any real vector space, the angle between x and y is same as the
angle between y and x.
 
5. Let a, b ∈ R with a, b > 0. Then prove that (a + b)(1/a + 1/b) ≥ 4.
6. For 1 ≤ i ≤ n, let ai ∈ R with ai > 0. Then, use Corollary 5.5.1.9 to show that
( ∑_{i=1}^{n} ai ) ( ∑_{i=1}^{n} 1/ai ) ≥ n^2 .
7. Prove that | z1 + · · · + zn | ≤ √( n( | z1 |^2 + · · · + | zn |^2 ) ), for z1 , . . . , zn ∈ C. When does
the equality hold?
8. Let V be an ips. If u, v ∈ V with kuk = 1, kvk = 1 and hu, vi = 1 then prove that u = αv
for some α ∈ F. Is α = 1?
Figure 2: Triangle with vertices A, B and C

We will now prove that if A, B and C are the vertices of a triangle (see Figure 5.1.B) and a, b
and c, respectively, are the lengths of the corresponding sides then cos(A) = (b^2 + c^2 − a^2 )/(2bc). This in
turn implies that the angle between vectors has been rightly defined.

Lemma 5.5.1.12. Let A, B and C be the vertices of a triangle (see Figure 5.1.B) with corre-
sponding side lengths a, b and c, respectively, in a real inner product space V then

cos(A) = (b^2 + c^2 − a^2 )/(2bc).
Proof. Let 0, u and v be the coordinates of the vertices A, B and C, respectively, of the triangle
ABC. Then AB ~ = u, AC~ = v and BC ~ = v − u. Thus, we need to prove that

cos(A) = (kvk2 + kuk2 − kv − uk2 )/(2kvk kuk) ⇔ kvk2 + kuk2 − kv − uk2 = 2 kvk kuk cos(A).

Now, by definition kv−uk2 = kvk2 +kuk2 −2hv, ui and hence kvk2 +kuk2 −kv−uk2 = 2 hu, vi.
As hv, ui = kvk kuk cos(A), the required result follows.

Definition 5.5.1.13. [Orthogonality] Let V be an inner product space over R. Then


1. the vectors u, v ∈ V are called orthogonal/perpendicular if hu, vi = 0.
2. Let S ⊆ V. Then, the orthogonal complement of S in V, denoted S ⊥ , equals

S ⊥ = {v ∈ V : hv, wi = 0, for all w ∈ S}.

Example 5.5.1.14. 1. 0 is orthogonal to every vector as h0, xi = 0 for all x ∈ V.

2. If V is a vector space over R or C then 0 is the only vector that is orthogonal to itself.

3. Let V = R.

(a) S = {0}. Then S ⊥ = R.


(b) S = R, Then S ⊥ = {0}.
(c) Let S be any subset of R containing a non-zero real number. Then S ⊥ = {0}.

4. Let u = (1, 2)T . What is u⊥ in R2 ?


Solution: {(x, y)T ∈ R2 | x + 2y = 0}. Is this Null(u)? Note that (2, −1)T is a basis of
u⊥ and for any vector x ∈ R2 ,
 
x = hx, ui u/kuk2 + ( x − hx, ui u/kuk2 ) = ((x1 + 2x2 )/5) (1, 2)T + ((2x1 − x2 )/5) (2, −1)T

is a decomposition of x into two vectors, one parallel to u and the other parallel to u⊥ .

5. Fix u = (1, 1, 1, 1)T , v = (1, 1, −1, 0)T ∈ R4 . Determine z, w ∈ R4 such that u = z + w


with the condition that z is parallel to v and w is orthogonal to v.
Solution: As z is parallel to v, z = kv = (k, k, −k, 0)T , for some k ∈ R. Since w is
orthogonal to v the vector w = (a, b, c, d)T satisfies a + b − c = 0. Thus, c = a + b and

(1, 1, 1, 1)T = u = z + w = (k, k, −k, 0)T + (a, b, a + b, d)T .


Comparing the corresponding coordinates gives the linear system d = 1, a + k = 1,
b + k = 1 and a + b − k = 1 in the unknowns a, b, d and k. Thus, solving for a, b, d and k
gives z = (1/3)(1, 1, −1, 0)T and w = (1/3)(2, 2, 4, 3)T (a numerical sketch of this decomposition appears after this example).
6. Let x, y ∈ Rn then prove that

(a) hx, yi = 0 ⇐⇒ kx − yk2 = kxk2 + kyk2 (Pythagoras Theorem).


Solution: Use kx − yk2 = kxk2 + kyk2 − 2hx, yi to get the required result.
(b) kxk = kyk ⇐⇒ hx + y, x − yi = 0 (x and y form adjacent sides of a rhombus as the
diagonals x + y and x − y are orthogonal).
Solution: Use hx + y, x − yi = kxk2 − kyk2 to get the required result.
(c) 4hx, yi = kx + yk2 − kx − yk2 (polarization identity in Rn ).
Solution: Just expand the right hand side to get the required result.
(d) kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 (parallelogram law: the sum of squares of
the diagonals of a parallelogram equals twice the sum of squares of its sides).
Solution: Just expand the left hand side to get the required result.

7. Let P = (1, 1, 1)T , Q = (2, 1, 3)T and R = (−1, 1, 2)T be three vertices of a triangle in R3 .
Compute the angle between the sides P Q and P R.
Solution: Method 1: Note that P~Q = (2, 1, 3)T − (1, 1, 1)T = (1, 0, 2)T , P~R = (−2, 0, 1)T
and R~Q = (3, 0, 1)T . As hP~Q, P~Ri = 0, the angle between the sides P Q and P R is π/2.
Method 2: kP Qk = √5, kP Rk = √5 and kQRk = √10. As kQRk2 = kP Qk2 + kP Rk2 ,
by the Pythagoras theorem, the angle between the sides P Q and P R is π/2.
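
The orthogonal decomposition used in items 4 and 5 above is a one-line computation. A sketch (Python/NumPy) for the data of item 5:

```python
import numpy as np

u = np.array([1., 1., 1., 1.])
v = np.array([1., 1., -1., 0.])

z = (u @ v) / (v @ v) * v          # component of u parallel to v
w = u - z                          # component of u orthogonal to v
print(z)                           # [ 0.333  0.333 -0.333  0.   ] = (1/3)(1,1,-1,0)
print(w)                           # [ 0.667  0.667  1.333  1.   ] = (1/3)(2,2,4,3)
print(np.isclose(w @ v, 0.0))      # True
```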
Exercise 5.5.1.15. 1. Let V be an ips.

(a) If S ⊆ V then S ⊥ is a subspace of V and S ⊥ = (LS(S))⊥ .


(b) Furthermore, if V is finite dimensional then S ⊥ and LS(S) are complementary. That
is, V = LS(S) + S ⊥ . Equivalently, hu, wi = 0, for all u ∈ LS(S) and w ∈ S ⊥ .

2. Consider R3 with the standard inner product. Find


(a) S ⊥ for S = {(1, 1, 1)T , (0, 1, −1)T } and S = LS((1, 1, 1)T , (0, 1, −1)T ).
(b) vectors v, w ∈ R3 such that v, w, u = (1, 1, 1)T are mutually orthogonal.
(c) the line passing through (1, 1, −1)T and parallel to (a, b, c) ≠ 0.
(d) the plane containing (1, 1, −1)T with (a, b, c) ≠ 0 as the normal vector.
(e) the area of the parallelogram with three vertices 0T , (1, 2, −2)T and (2, 3, 0)T .
(f ) the area of the parallelogram when kxk = 5, kx − yk = 8 and kx + yk = 14.
(g) the plane containing (2, −2, 1)T and perpendicular to the line with parametric equation
x = t − 1, y = 3t + 2, z = t + 1.
(h) the plane containing the lines (1, 2, −2) + t(1, 1, 0) and (1, 2, −2) + t(0, 1, 2).
(i) k such that cos−1 (hu, vi) = π/3, where u = (1, −1, 1)T and v = (1, k, 1)T .
(j) the plane containing (1, 1, 2)T and orthogonal to the line with parametric equation
x = 2 + t, y = 3 and z = 1 − t.

(k) a parametric equation of a line containing (1, −2, 1)T and orthogonal to x+3y+2z = 1.
AF

3. Let P = (3, 0, 2)T , Q = (1, 2, −1)T and R = (2, −1, 1)T be three points in R3 . Then,
DR

(a) find the area of the triangle with vertices P, Q and R.


(b) find the area of the parallelogram built on vectors P~Q and QR.
~
(c) find a nonzero vector orthogonal to the plane of the above triangle.

(d) find all vectors x orthogonal to P~Q and QR
~ with kxk = 2.
(e) the volume of the parallelepiped built on vectors P~Q and QR
~ and x, where x is one
of the vectors found in Part 3d. Do you think the volume would be different if you
choose the other vector x?

4. Let p1 be a plane containing A = (1, 2, 3)T and (2, −1, 1)T as its normal vector. Then

(a) find the equation of the plane p2 that is parallel to p1 and contains (−1, 2, −3)T .
(b) calculate the distance between the planes p1 and p2 .

5. In the parallelogram ABCD, ABkDC and ADkBC and A = (−2, 1, 3)T , B = (−1, 2, 2)T
and C = (−3, 1, 5)T . Find the

(a) coordinates of the point D,


(b) cosine of the angle BCD.
(c) area of the triangle ABC
(d) volume of the parallelepiped determined by AB, AD and (0, 0, −7)T .

6. Let W = {(x, y, z, w)T ∈ R4 : x + y + z − w = 0}. Find a basis of W⊥ .


7. Recall the ips Mn (R) (see Example 5.5.1.4.8). If W = {A ∈ Mn (R) | AT = A} then W⊥ ?
116 CHAPTER 5. INNER PRODUCT SPACES

5.1.C Normed Linear Space

To proceed further, recall that a vector space over R or C was a linear space.

Definition 5.5.1.16. Let V be a linear space.


1. Then, a norm on V is a function f (x) = kxk from V to R such that
(a) kxk ≥ 0 for all x ∈ V and if kxk = 0 then x = 0.
(b) kαxk = | α | kxk for all α ∈ F and x ∈ V.
(c) kx + yk ≤ kxk + kyk for all x, y ∈ V (triangle inequality).

2. A linear space with a norm on it is called a normed linear space (nls).



Theorem 5.5.1.17. Let V be a normed linear space and x, y ∈ V. Then kxk − kyk ≤ kx − yk.

Proof. As kxk = kx − y + yk ≤ kx − yk + kyk one has kxk − kyk ≤ kx − yk. Similarly, one
obtains kyk − kxk ≤ ky − xk = kx − yk. Combining the two, the required result follows.
p
Example 5.5.1.18. 1. On R3 , kxk = x21 + x22 + x23 is a norm. Also, observe that this
p
norm corresponds to hx, xi, where h, i is the standard inner product.
p
2. Let V be an ips. Is it true that f (x) = hx, xi is a norm?
T
AF

Solution: Yes. The readers should verify the first two conditions. For the third condition,
recalling the Cauchy-Schwartz inequality, we get
DR

f (x + y)2 = hx + y, x + yi = hx, xi + hx, yi + hy, xi + hy, yi


≤ kxk2 + kxkkyk + kxkkyk + kyk2 = (f (x) + f (y))2 .
p
Thus, kxk = hx, xi is a norm, called the norm induced by the inner product h·, ·i.
Exercise 5.5.1.19. 1. Let V be an ips. Then

4hx, yi = kx + yk2 − kx − yk2 + ikx + iyk2 − ikx − iyk2 (Polarization Identity).

2. Consider the complex vector space Cn . If x, y ∈ Cn then prove that

(a) If x 6= 0 then kx + ixk2 = kxk2 + kixk2 , even though hx, ixi =


6 0.
(b) hx, yi = 0 whenever kx + yk2 = kxk2 + kyk2 and kx + iyk2 = kxk2 + kiyk2 .

The next result is stated without proof as the proof is beyond the scope of this book.

Theorem 5.5.1.20. Let k·k be a norm on a nls V. Then, k·k is induced by some inner product
if and only if k · k satisfies the parallelogram law: kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 .
Example 5.5.1.21. 1. For x = (x1 , x2 )T ∈ R2 , we define kxk1 = |x1 | + |x2 |. Verify that
kxk1 is indeed a norm. But, for x = e1 and y = e2 , 2(kxk2 + kyk2 ) = 4 whereas

kx + yk2 + kx − yk2 = k(1, 1)k2 + k(1, −1)k2 = (|1| + |1|)2 + (|1| + | − 1|)2 = 8.

So, the parallelogram law fails. Thus, kxk1 is not induced by any inner product in R2 .
5.1. DEFINITION AND BASIC PROPERTIES 117

2. Does there exist an inner product in R2 such that kxk = max{|x1 |, |x2 |}?
3. If k · k is a norm in V then d(x, y) = kx − yk, for x, y ∈ V, defines a distance function as
(a) d(x, x) = 0, for each x ∈ V.
(b) using the triangle inequality, for any z ∈ V, we have

d(x, y) = kx−yk = k (x − z)−(y − z) k ≤ k (x − z) k+k (y − z) k = d(x, z)+d(z, y).

5.1.D Application to Fundamental Spaces

We end this section by proving the fundamental theorem of linear algebra. So, the readers are
advised to recall the four fundamental subspaces and also to go through Theorem 3.3.4.9 (the
rank-nullity theorem for matrices). We start with the following result.

Lemma 5.5.1.22. Let A ∈ Mm,n (R). Then Null(A) = Null(AT A).

Proof. Let x ∈ Null(A). Then Ax = 0. So, (AT A)x = AT (Ax) = AT 0 = 0. Thus,


x ∈ Null(AT A). That is, Null(A) ⊆ Null(AT A).
Suppose that x ∈ Null(AT A). Then (AT A)x = 0 and 0 = xT 0 = xT (AT A)x = (Ax)T (Ax) =
kAxk2 . Thus, Ax = 0 and the required result follows.
T
AF

Theorem 5.5.1.23 (Fundamental Theorem of Linear Algebra). Let A ∈ Mn (C). Then


DR

1. dim(Null(A)) + dim(Col(A)) = n.
⊥ ⊥
2. Null(A) = Col(A∗ ) and Null(A∗ ) = Col(A) .

3. dim(Col(A)) = dim(Col(A∗ )).

Proof. Part 1: Proved in Theorem 3.3.4.9.


Part 2: We first prove that Null(A) ⊆ Col(A∗ )⊥ . Let x ∈ Null(A). Then Ax = 0 and

0 = h0, ui = hAx, ui = u∗ Ax = (A∗ u)∗ x = hx, A∗ ui, for all u ∈ Cn .

But Col(A∗ ) = {A∗ u | u ∈ Cn }. Thus, x ∈ Col(A∗ )⊥ and Null(A) ⊆ Col(A∗ )⊥ .


We now prove that Col(A∗ )⊥ ⊆ Null(A). Let x ∈ Col(A∗ )⊥ . Then, for every y ∈ Cn ,

0 = hx, A∗ yi = (A∗ y)∗ x = y∗ (A∗ )∗ x = y∗ Ax = hAx, yi.

In particular, for y = Ax ∈ Cn , we get kAxk2 = 0. Hence Ax = 0. That is, x ∈ Null(A).


Thus, the proof of the first equality in Part 2 is over. We omit the second equality as it proceeds
on the same lines as above.
Part 3: Use the first two parts to get the required result.
Hence the proof of the fundamental theorem is complete.
We now give some implications of the above theorem.

Corollary 5.5.1.24. Let A ∈ Mn (R). Then the function T : Col(AT ) → Col(A) defined by
T (x) = Ax is one-one and onto.
118 CHAPTER 5. INNER PRODUCT SPACES

Proof. In view of Theorem 5.5.1.23.3, we just need to show that the map is one-one. So, let
us assume that there exist x, y ∈ Col(AT ) such that T (x) = T (y). Or equivalently, Ax =
Ay. Thus, x − y ∈ Null(A) = (Col(AT ))⊥ (by Theorem 5.5.1.23.2). Therefore, x − y ∈
(Col(AT ))⊥ ∩ Col(AT ) = {0} (by Example 2). Thus, x = y and hence the map is one-one.

Remark 5.5.1.25. Let A ∈ Mm,n (R).

1. Then the spaces Col(A) and Null(AT ) are not only orthogonal but are orthogonal com-
plement of each other.

2. Further if Rank(A) = r then, using Corollary 5.5.1.24, we observe the following:

(a) If i1 , . . . , ir are the pivot rows of A then {A(A[i1 , :]T ), . . . , A(A[ir , :]T )} form a basis
of Col(A).
(b) Similarly, if j1 , . . . , jr are the pivot columns of A then {AT (A[:, j1 ]), . . . , AT (A[:, jr ])}
form a basis of Col(AT ).
(c) So, if we choose the rows and columns corresponding to the pivot entries then the
corresponding r × r submatrix of A is invertible.

The readers should look at Example 3.3.1.26 and Remark 3.3.1.27. We give one more example.
T
AF

 
1 1 0
Example 5.5.1.26. Let A = 2 1 1. Then verify that
DR

3 2 1
1. {(0, 1, 1)t , (1, 1, 2)T } is a basis of Col(A).

2. {(1, 1, −1)T } is a basis of Null(AT ).

3. Null(AT ) = (Col(A))⊥ .

Exercise 5.5.1.27. 1. Find distinct subspaces W1 and W2

(a) in R2 such that W1 and W2 are orthogonal but not orthogonal complement.
(b) in R3 such that W1 6= {0} and W2 6= {0} are orthogonal, but not orthogonal comple-
ment.

2. Let A ∈ Mm,n (C). Then Null(A) = Null(A∗ A).

3. Let A ∈ Mm,n (R). Then Col(A) = Col(AT A).

4. Let A ∈ Mm,n (R). Then Rank(A) = n if and only if Rank(AT A) = n.

5. Let A ∈ Mm,n (C). Then, for every

(a) x ∈ Rn , x = u + v, where u ∈ Col(AT ) and v ∈ Null(A) are unique.


(b) y ∈ Rm , y = w + z, where w ∈ Col(A) and z ∈ Null(AT ) are unique.

For more information related with the fundamental theorem of linear algebra the interested
readers are advised to see the article “The Fundamental Theorem of Linear Algebra, Gilbert
Strang, The American Mathematical Monthly, Vol. 100, No. 9, Nov., 1993, pp. 848 - 855.”
5.1. DEFINITION AND BASIC PROPERTIES 119

5.1.E Properties of Orthonormal Vectors

At the end of the previous section, we saw that Col(A) is orthogonal to Null(AT ). So, in this
section, we try to understand the orthogonal vectors.

Definition 5.5.1.28. [Orthonormal Set] Let V be an ips. Then a non-empty set


S = {v1 , . . . , vn } ⊆ V is called an orthonormal set if
1. hui , uj i = 0, for all 1 ≤ i 6= j ≤ n. That is, vi and vj are mutually orthogonal, for
1 ≤ i 6= j ≤ n.
2. kvi k = 1, for 1 ≤ i ≤ n

If S is also a basis of V then S is called an orthonormal basis of V.

Example 5.5.1.29. 1. A few orthonormal sets in R2 are


  1 1  1 1
(1, 0)T , (0, 1)T , √ (1, 1)T , √ (1, −1)T and √ (2, 1)T , √ (1, −2)T .
2 2 5 5

2. Let S = {e1 , . . . , en } be the standard basis of Rn . Then S is an orthonormal set as


T

(a) kei k = 1, for 1 ≤ i ≤ n.


AF

(b) hei , ej i = 0, for 1 ≤ i 6= j ≤ n.


DR

h iT h iT h iT 
1 1 √1 1 √1 2 √1 1
3. The set √
3
, − 3 , 3 , 0, 2 , 2 , 6 , 6 , − 6
√ √ √ √ is an orthonormal in R3 .


4. Recall that hf (x), g(x)i = f (x)g(x)dx defines the standard inner product in C[−π, π].
−π
Consider S = {1} ∪ {em | m ≥ 1} ∪ {fn | n ≥ 1}, where 1(x) = 1, em (x) = cos(mx) and
fn (x) = sin(nx), for all m, n ≥ 1 and for all x ∈ [−π, π]. Then
(a) S is a linearly independent set.
(b) k1k2 = 2π, kem k2 = π and kfn k2 = π.
(c) the functions in S are orthogonal.
     
1 1 1
Hence, √ ∪ √ em | m ≥ 1 ∪ √ fn | n ≥ 1 is an orthonormal set in C[−π, π].
2π π π
Theorem 5.5.1.30. Let V be an ips with {u1 , . . . , un } as a set of mutually orthogonal vectors.

1. Then the set {u1 , . . . , un } is linearly independent.


P
n P
n P
n
2. Let v = αi ui ∈ V. Then kvk2 = k αi ui k2 = | αi |2 kui k2 ;
i=1 i=1 i=1

P
n
3. Let v = αi ui ∈ V. So, for 1 ≤ i ≤ n, if kui k = 1 then αi = hv, ui i. That is,
i=1
P
n P
n
v= hv, ui iui and kvk2 = | hv, ui i |2 .
i=1 i=1

4. Let dim(V) = n. Then hv, ui i = 0 for all i = 1, 2, . . . , n if and only if v = 0.


120 CHAPTER 5. INNER PRODUCT SPACES

Proof. Part 1: Consider the linear system c1 u1 + · · · + cn un = 0 in the unknowns c1 , . . . , cn .


As h0, ui = 0 and huj , ui i = 0, for all j 6= i, we have
n
X
0 = h0, ui i = hc1 u1 + · · · + cn un , ui i = cj huj , ui i = ci hui , ui i.
j=1

As ui 6= 0, hui , ui i =
6 0 and therefore ci = 0, for 1 ≤ i ≤ n. Thus, the above linear system has
only the trivial solution. Hence, this completes the proof of Part 1.
Part 2: A similar argument gives
n
* n n
+ n
* n
+
X X X X X
2
k αi ui k = αi ui , αi ui = αi ui , αj uj
i=1 i=1 i=1 i=1 j=1
n
X n
X n
X n
X
= αi αj hui , uj i = αi αi hui , ui i = | αi |2 kui k2 .
i=1 j=1 i=1 i=1
* +
P
n P
n
Part 3: If kui k = 1, for 1 ≤ i ≤ n then hv, ui i = αj uj , ui = αj huj , ui i = αj .
j=1 j=1
Part 4: Follows directly using Part 3 as {u1 , . . . , un } is a basis of V.
 
T

Remark 5.5.1.31. Using Theorem 5.5.1.30, we see that if B = v1 , . . . , vn is an ordered


AF

 
hu, v1 i
 
orthonormal basis of an ips V then for each u ∈ V, [u]B =  ... . Thus, in place of solving
DR

hu, vn i
a linear system to get the coordinates of a vector, we just need to compute the inner product with
basis vectors.

Exercise 5.5.1.32. 1. Find v, w ∈ R3 such that v, w, (1, −1, −2)T are mutually orthogonal.
       " x+y #
1 1 x √
1 1 2 2 .
2. Let B = √2 , √2 be an ordered basis of R . Then = x−y
1 −1 y B √
2
        √ 
1 1 1 2 3
 1   1   1   3 T  −1 
3. For the ordered basis B = √3 1 , √2 −1 , √6 1 of R , [(2, 3, 1) ]B =  √2 .
1 0 −2 √3
6

4. Let S = {u1 , . . . , un } ⊆ Rn and define A = [u1 , . . . , un ]. Then prove that A is an orthog-


onal matrix if and only if S is an orthonormal basis of Rn .

5. Let A be an n × n orthogonal matrix. Then prove that

(a) the rows/columns of A form an orthonormal basis of Rn .


(b) for any two vectors x, y ∈ Rn , hAx, Ayi = hx, yi Orthogonal matrices preserve
angle.
(c) for any vector x ∈ Rn , kAxk = kxk Orthogonal matrices preserve length.

6. Let A be an n × n unitary matrix. Then prove that


5.2. GRAM-SCHMIDT ORTHOGONALIZATION PROCESS 121

(a) the rows/columns of A form an orthonormal basis of the complex vector space Cn .
(b) for any two vectors x, y ∈ Cn , hAx, Ayi = hx, yi Unitary matrices preserve
angle.
(c) for any vector x ∈ Cn , kAxk = kxk Unitary matrices preserve length.

7. Let A, B ∈ Mn (C) be two unitary matrices. Then prove that AB and BA are unitary
matrices.
P P
8. If A = [aij ] and B = [bij ] are unitarily equivalent then prove that |aij |2 = |bij |2 .
ij ij

9. Let A be an n × n upper triangular matrix. If A is also an orthogonal matrix then A is a


diagonal matrix with diagonal entries ±1.

5.2 Gram-Schmidt Orthogonalization Process


In view of the importance of Theorem 5.5.1.30, we inquire into the question of extracting an
orthonormal basis from a given basis. The process of extracting an orthonormal basis from a
finite linearly independent set is called the Gram-Schmidt Orthogonalization process. We
T
AF

first consider a few examples.


DR

Example 5.5.2.1. Which point on the plane P is closest to the point, say Q?

P lane − P
0 y

Solution: Let y be the foot of the perpendicular from Q on P . Thus, by Pythagoras Theorem,
this point is unique. So, the question arises: how do we find y?
−→ − −→

Note that yQ gives a normal vector of the plane P . Hence, y = Q − yQ. So, need to find a
−→
way to compute yQ, a line on the plane passing through 0 and y.

Thus, we see that given u, v ∈ V \ {0}, we need to find two vectors, say y and z, such that y
is parallel to u and z is perpendicular to u. Thus, y = u cos(θ) and z = u sin(θ), where θ is the
angle between u and v.
R v P
~ = hv,ui
~ =v−
OR hv,ui
u OQ kuk2
u
kuk2
u
Q
O θ

Figure 3: Decomposition of vector v


u
We do this as follows (see Figure 5.2). Let û = be the unit vector in the direction
kuk
~
of u. Then using trigonometry, cos(θ) = kOQk ~ ~
~ . Hence kOQk = kOP k cos(θ). Now using
kOP k
122 CHAPTER 5. INNER PRODUCT SPACES

~ hv,ui hv,ui
Definition 5.5.1.10, kOQk = kvk kvk kuk = kuk , where the absolute value is taken as the
length/norm is a positive quantity. Thus,
 
~ = kOQk
~ û = v, u u
OQ .
kuk kuk
   
~ = v, u u u u ~
Hence, y = OQ kuk and z = v − v, . In literature, the vector y = OQ
kuk kuk kuk
is called the orthogonal projection of v on u, denoted Proju (v). Thus,
 
u u hv, ui
Proju (v) = v, ~
and kProju (v)k = kOQk = . (5.5.2.1)
kuk kuk kuk

~ = kP~Qk =
Moreover, the distance of u from the point P equals kORk u
v − hv, kuk i u
kuk .

Example 5.5.2.2. 1. Determine the foot of the perpendicular from the point (1, 2, 3) on the
XY -plane.
Solution: Verify that the required point is (1, 2, 0)?

2. Determine the foot of the perpendicular from the point Q = (1, 2, 3, 4) on the plane
generated by (1, 1, 0, 0), (1, 0, 1, 0) and (0, 1, 1, 1).
T
AF

Answer: (x, y, z, w) lies on the plane x− y − z + 2w = 0 ⇔ h(1, −1, −1, 2), (x, y, z, w)i = 0.
DR

So, the required point equals

1 1
(1, 2, 3, 4) − h(1, 2, 3, 4), √ (1, −1, −1, 2)i √ (1, −1, −1, 2)
7 7
4 1
= (1, 2, 3, 4) − (1, −1, −1, 2) = (3, 18, 25, 20).
7 7

3. Determine the projection of v = (1, 1, 1, 1)T on u = (1, 1, −1, 0)T .


u
Solution: By Equation (5.5.2.1), we have Projv (u) = hv, ui = 31 (1, 1, −1, 0)T and
kuk2
w = (1, 1, 1, 1)T − Projv (u) = 31 (2, 2, 4, 3)T is orthogonal to u.

4. Let u = (1, 1, 1, 1)T , v = (1, 1, −1, 0)T , w = (1, 1, 0, −1)T ∈ R4 . Write v = v1 + v2 , where
v1 is parallel to u and v2 is orthogonal to u. Also, write w = w1 + w2 + w3 such that w1
is parallel to u, w2 is parallel to v2 and w3 is orthogonal to both u and v2 .
Solution: Note that

u 1 1 T is parallel to u.
(a) v1 = Proju (v) = hv, ui kuk2 = 4 u = 4 (1, 1, 1, 1)

(b) v2 = v − 41 u = 41 (3, 3, −5, −1)T is orthogonal to u.

Note that Proju (w) is parallel to u and Projv2 (w) is parallel to v2 . Hence, we have

u 1 1 T is parallel to u,
(a) w1 = Proju (w) = hw, ui kuk2 = 4 u = 4 (1, 1, 1, 1)

7
(b) w2 = Projv2 (w) = hw, v2 i kvv22k2 = 44 (3, 3, −5, −1)
T is parallel to v2 and
3 T
(c) w3 = w − w1 − w2 = 11 (1, 1, 2, −4) is orthogonal to both u and v2 .
5.2. GRAM-SCHMIDT ORTHOGONALIZATION PROCESS 123

That is, from the given vector subtract all the orthogonal projections/components. If the new
vector is non-zero then this vector is orthogonal to the previous ones. This idea is generalized
to give the Gram-Schmidt Orthogonalization process.

Theorem 5.5.2.3 (Gram-Schmidt Orthogonalization Process). Let V be an ips. If {v1 , . . . , vn }


is a set of linearly independent vectors in V then there exists an orthonormal set {w1 , . . . , wn }
in V. Furthermore, LS(w1 , . . . , wi ) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ n.

Proof. Note that for orthonormality, we need kwi k = 1, for 1 ≤ i ≤ n and hwi , wj i = 0, for
1 ≤ i 6= j ≤ n. Also, by Corollary 3.3.2.7.2, vi ∈
/ LS(v1 , . . . , vi−1 ), for 2 ≤ i ≤ n, as {v1 , . . . , vn }
is a linearly independent set. We are now ready to prove the result by induction.
v1
Step 1: Define w1 = then LS(v1 ) = LS(w1 ).
kv1 k
u2
Step 2: Define u2 = v2 − hv2 , w1 iw1 . Then u2 6= 0 as v2 6∈ LS(v1 ). So, let w2 = .
ku2 k
Note that {w1 , w2 } is orthonormal and LS(w1 , w2 ) = LS(v1 , v2 ).
Step 3: For induction, assume that we have obtained an orthonormal set {w1 , . . . , wk−1 } such
that LS(v1 , . . . , vk−1 ) = LS(w1 , . . . , wk−1 ). Now, note that
P
k−1 P
k−1
uk = vk − hvk , wi iwi = vk − Projwi (vk ) 6= 0 as vk ∈
/ LS(v1 , . . . , vk−1 ). So, let us put
T

i=1 i=1
uk
AF

wk = . Then, {w1 , . . . , wk } is orthonormal as kwk k = 1 and


kuk k
DR

k−1
X k−1
X
kuk khwk , w1 i = huk , w1 i = hvk − hvk , wi iwi , w1 i = hvk , w1 i − h hvk , wi iwi , w1 i
i=1 i=1
k−1
X
= hvk , w1 i − hvk , wi ihwi , w1 i = hvk , w1 i − hvk , w1 i = 0.
i=1

Similarly, hwk , wi i = 0, for 2 ≤ i ≤ k − 1. Clearly, wk = uk /kuk k ∈ LS(w1 , . . . , wk−1 , vk ). So,


wk ∈ LS(v1 , . . . , vk ).
P
k−1
As vk = kuk kwk + hvk , wi iwi , we get vk ∈ LS(w1 , . . . , wk ). Hence, by the principle of
i=1
mathematical induction LS(w1 , . . . , wk ) = LS(v1 , . . . , vk ) and the required result follows.
We now illustrate the Gram-Schmidt process with a few examples.

Example 5.5.2.4. 1. Let S = {(1, −1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)} ⊆ R4 . Find an orthonor-
mal set T such that LS(S) = LS(T ).
Solution: Let v1 = (1, 0, 1, 0)T , v2 = (0, 1, 0, 1)T and v3 = (1, −1, 1, 1)T . Then
w1 = √1 (1, 0, 1, 0)T . As hv2 , w1 i = 0, we get w2 = √12 (0, 1, 0, 1)T . For the third vec-
2
tor, let u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = (0, −1, 0, 1)T . Thus, w3 = √1 (0, −1, 0, 1)T .
2

 T  T 1 T  T
2. Let S = {v1 = 2 0 0 , v2 = 23 2 0 , v3 = 2
3
2 0 , v4 = 1 1 1 }. Find an
orthonormal set T such that LS(S) = LS(T ).
 T
Solution: Take w1 = kvv11 k = 1 0 0 = e1 . For the second vector, consider u2 =
 T  T
v2 − 23 w1 = 0 2 0 . So, put w2 = kuu22 k = 0 1 0 = e2 .
124 CHAPTER 5. INNER PRODUCT SPACES

P
2
For the third vector, let u3 = v3 − hv3 , wi iwi = (0, 0, 0)T . So, v3 ∈ LS((w1 , w2 )). Or
i=1
equivalently, the set {v1 , v2 , v3 } is linearly dependent.
P
2
So, for again computing the third vector, define u4 = v4 − hv4 , wi iwi . Then, u4 =
i=1
v4 − w1 − w2 = e3 . So w4 = e3 . Hence, T = {w1 , w2 , w4 } = {e1 , e2 , e3 }.

3. Find an orthonormal set in R3 containing (1, 2, 1)T .




Solution: Let (x, y, z)T ∈ R3 with (1, 2, 1), (x, y, z) = 0. Thus,

(x, y, z) = (−2y − z, y, z) = y(−2, 1, 0) + z(−1, 0, 1).

Observe that (−2, 1, 0) and (−1, 0, 1) are orthogonal to (1, 2, 1) but are themselves not
orthogonal.
Method 1: Apply Gram-Schmidt process to { √16 (1, 2, 1)T , (−2, 1, 0)T , (−1, 0, 1)T } ⊆ R3 .
Method 2: Valid only in R3 using the cross product of two vectors.
In either case, verify that { √16 (1, 2, 1), −1

5
(2, −1, 0), √−1
30
(1, 2, −5)} is the required set.

We now state two immediate corollaries without proof.


T
AF

Corollary 5.5.2.5. Let V 6= {0} be an ips. If


DR

1. V is finite dimensional then V has an orthonormal basis.


2. S is a non-empty orthonormal set and dim(V) is finite then S can be extended to form an
orthonormal basis of V.

Remark 5.5.2.6. Let S = {v1 , . . . , vn } =


6 {0} be a non-empty subset of a finite dimensional
vector space V. If we apply Gram-Schmidt process to
1. S then we obtain an orthonormal basis of LS(v1 , . . . , vn ).
2. a re-arrangement of elements of S then we may obtain another orthonormal basis of
LS(v1 , . . . , vn ). But, observe that the size of the two bases will be the same.
Exercise 5.5.2.7. 1. Let V be an ips with B = {v1 , . . . , vn } as a basis. Then prove that B
Pn
is orthonormal if and only if for each x ∈ V, x = hx, vi ivi . [Hint: Since B is a basis,
i=1
each x ∈ V has a unique linear combination in terms of vi ’s.]
2. Let S be a subset of V having 101 elements. Suppose that the application of the Gram-
Schmidt process yields u5 = 0. Does it imply that LS(v1 , . . . , v5 ) = LS(v1 , . . . , v4 )? Give
reasons for your answer.
P
k
3. Let B = {v1 , . . . , vn } be an orthonormal set in Rn . For 1 ≤ k ≤ n, define Ak = vi viT .
i=1
Then prove that ATk = Ak and A2k = Ak . Thus, Ak ’s are projection matrices.
4. Determine an orthonormal basis of R4 containing (1, −2, 1, 3)T and (2, 1, −3, 1)T .
5. Let x ∈ Rn with kxk = 1.
5.3. ORTHOGONAL OPERATOR AND RIGID MOTION 125

(a) Then prove that {x} can be extended to form an orthonormal basis of Rn .
(b) Let the extended basis be {x,x2 , . . . , xn } and B = [e  1 , . . . , en ] the standard ordered
basis of Rn . Prove that A = [x]B , [x2 ]B , . . . , [xn ]B is an orthogonal matrix.

6. Let v, w ∈ Rn , n ≥ 1 with kuk = kwk = 1. Prove that there exists an orthogonal matrix A
such that Av = w. Prove also that A can be chosen such that det(A) = 1.

7. Let (V, h , i) be an n-dimensional ips. If u ∈ V with kuk = 1 then give reasons for the
following statements.

(a) Let S ⊥ = {v ∈ V | hv, ui = 0}. Then dim(S ⊥ ) = n − 1.


(b) Let 0 6= β ∈ F. Then S = {v ∈ V : hv, ui = β} is not a subspace of V.
(c) Let v ∈ V. Then v = v0 + hv, uiu for a vector v0 ∈ S ⊥ . That is, V = LS(u, S ⊥ ).

5.3 Orthogonal Operator and Rigid Motion


We now give the definition and a few properties of an orthogonal operator.
T

Definition 5.5.3.1. [Orthogonal Operator] Let V be a vector space. Then, a linear operator
AF

T : V → V is said to be an orthogonal operator if kT (x)k = kxk, for all x ∈ V.


DR

Example 5.5.3.2. Each T ∈ L(V) given below is an orthogonal operator.


1. Fix a unit vector a ∈ V and define T (x) = 2ha, xia − x, for all x ∈ V.


Solution: Note that Proja (x) = ha, xia. So, ha, xia, x − ha, xia = 0. Also, by
Pythagoras theorem kx − ha, xiak2 = kxk2 − (ha, xi)2 . Thus,

kT (x)k2 = k(ha, xia) + (ha, xia − x)k2 = kha, xiak2 + kx − ha, xiak2 = kxk2 .

  
cos θ − sin θ x
2. Let n = 2, V = R2 and 0 ≤ θ < 2π. Now define T (x) = .
sin θ cos θ y

We now show that an operator is orthogonal if and only if it preserves the angle.

Theorem 5.5.3.3. Let T ∈ L(V). Then, the following statements are equivalent.
1. T is an orthogonal operator.

2. hT (x), T (y)i = hx, yi, for all x, y ∈ V. That is, T preserves inner product.

Proof. 1 ⇒ 2 Let T be an orthogonal operator. Then, kT (x + y)k2 = kx + yk2 . So,


kT (x)k2 + kT (y)k2 + 2hT (x), T (y)i = kT (x) + T (y)k2 = kT (x + y)k2 = kxk2 + kyk2 + 2hx, yi.
Thus, using definition again hT (x), T (y)i = hx, yi.
2 ⇒ 1 If hT (x), T (y)i = hx, yi, for all x, y ∈ V then T is an orthogonal operator as
kT (x)k2 = hT (x), T (x)i = hx, xi = kxk2 .
As an immediate corollary, we obtain the following result.
126 CHAPTER 5. INNER PRODUCT SPACES

Corollary 5.5.3.4. Let T ∈ L(V). Then T is an orthogonal operator if and only if “for every
orthonormal basis {u1 , . . . , un } of V, {T (u1 ), . . . , T (un )} is an orthonormal basis of V”. Thus,
if B is an orthonormal ordered basis of V then T [B, B] is an orthogonal matrix.

Definition 5.5.3.5. [Isometry, Rigid Motion] Let V be a vector space. Then, a map T : V → V
is said to be an isometry or a rigid motion if kT (x) − T (y)k = kx − yk, for all x, y ∈ V.
That is, an isometry is distance preserving.

Observe that if T and S are two rigid motions then ST is also a rigid motion. Furthermore,
it is clear from the definition that every rigid motion is invertible.

Example 5.5.3.6. The maps given below are rigid motions/isometry.

1. Let V be a linear space with norm k · k. If a ∈ V then the translation map Ta : V → V


(see Exercise 1), defined by Ta (x) = x + a for all x ∈ V, is an isometry/rigid motion as

kTa (x) − Ta (y)k = k (x + a) − (y + a) k = kx − yk.


T

2. Let V be an ips. Then, using Theorem 5.5.3.3, we see that every orthogonal operator is
AF

an isometry.
DR

We now prove that every rigid motion that fixes origin is an orthogonal operator.

Theorem 5.5.3.7. Let V be a real ips. Then, the following statements are equivalent for any
map T : V → V.

1. T is a rigid motion that fixes origin.

2. T is linear and hT (x), T (y)i = hx, yi, for all x, y ∈ V (preserves inner product).

3. T is an orthogonal operator.

Proof. We have already seen the equivalence of Part 2 and Part 3 in Theorem 5.5.3.3. Let us
now prove the equivalence of Part 1 and Part 2/Part 3.
If T is an orthogonal operator then T (0) = 0 and kT (x) − T (y)k = kT (x − y)k = kx − yk.
This proves Part 3 implies Part 1.
We now prove Part 1 implies Part 2. So, let T be a rigid motion that fixes 0. Thus, T (0) = 0
and kT (x) − T (y)k = kx − yk, for all x, y ∈ V. Hence, in particular for y = 0, we have
kT (x)k = kxk, for all x ∈ V. So,

kT (x)k2 + kT (y)k2 − 2hT (x), T (y)i = hT (x) − T (y), T (x) − T (y)i = kT (x) − T (y)k2
= kx − yk2 = hx − y, x − yi
= kxk2 + kyk2 − 2hx, yi.
5.4. ORTHOGONAL PROJECTIONS AND APPLICATIONS 127

Thus, using kT (x)k = kxk, for all x ∈ V, we get hT (x), T (y)i = hx, yi, for all x, y ∈ V. Now, to
prove T is linear, we use hT (x), T (y)i = hx, yi in 3-rd and 4-th line to get

kT (x + y) − (T (x) + T (y)) k2 = hT (x + y) − (T (x) + T (y)) , T (x + y) − (T (x) + T (y))i


= hT (x + y), T (x + y)i − 2 hT (x + y), T (x)i
−2 hT (x + y), T (y)i + hT (x) + T (y), T (x) + T (y)i
= hx + y, x + yi − 2hx + y, xi − 2hx + y, yi
+hT (x), T (x)i + 2hT (x), T (y)i + hT (y), T (y)i
= −hx + y, x + yi + hx, xi + 2hx, yi + hy, yi = 0.

Thus, T (x + y) − (T (x) + T (y)) = 0 and hence T (x + y) = T (x) + T (y). A similar calculation


gives T (αx) = αT (x) and hence T is linear.

5.4 Orthogonal Projections and Applications


Till now, our main interest was to understand the linear system Ax = b from different points
of view. But, in most practical situations the system has no solution. So, we try to find x such
T

that the vector err = b − Ax has minimum norm. The next result gives the existence of an
AF

orthogonal subspace of a finite dimensional inner product space.


DR

Theorem 5.5.4.1 (Decomposition). Let V be an ips having W as a finite dimensional subspace.


P
k
Suppose {f1 , . . . , fk } is an orthonormal basis of W. Then, for each b ∈ V, y = hb, fi ifi is the
i=1
only closest point in W from b. Furthermore, b − y ∈ W⊥ .
P
k
Proof. Clearly y = hb, fi ifi ∈ W. As the closet point is the feet of the perpendicular, we need
i=1
to show that b − y ∈ W⊥ . To do so, we verify that hb − y, fi i = 0, for 1 ≤ i ≤ k.
* k + k
X X
hb − y, fi i = hb, fi i − hb, fj ifj , fi = hb, fi i − hb, fj ihfj , fi i = hb, fi i − hb, fi i = 0.
j=1 j=1

Also, note that for each w ∈ W , y − w ∈ W and hence

kb − wk2 = kb − y + y − wk2 = kb − yk2 + ky − wk2 ≥ kb − yk2 .

Thus, y is the closet point in W from b. Now, use Pythagoras theorem to conclude that y is
unique. Thus, the required result follows.
We now give a definition and then an implication of Theorem 5.5.4.1.

Definition 5.5.4.2. [Orthogonal Projection] Let W be a finite dimensional subspace of an ips


V. Then, by Theorem 5.5.4.1, for each v ∈ V there exist unique vectors w ∈ W and u ∈ W⊥
with v = w + u. We thus define the orthogonal projection of V onto W, denoted PW , by

PW : V → V by PW (v) = w.

The vector w is called the projection of v on W.


128 CHAPTER 5. INNER PRODUCT SPACES

Remark 5.5.4.3. Let A ∈ Mm,n (R). Then, to find the orthogonal projection y of a vector b
on Col(A), we can use either of the following ideas:
P
k
1. Determine an orthonormal basis {f1 , . . . , fk } of Col(A) and get y = hb, fi ifi .
i=1

2. By Remark 5.5.1.25.1, the spaces Col(A) and Null(AT ) are completely orthogonal. Hence,
every b ∈ Rm equals b = u + v for unique u ∈ Col(A) and v ∈ Null(AT ). Thus, using
Definition 5.5.4.2 and Theorem 5.5.4.1, y = u.

Before proceeding to projections, we give an application of Theorem 5.5.4.1 to a linear system.

Corollary 5.5.4.4. Let A ∈ Mm,n (R) and b ∈ Rm . Then, every least square solution of Ax = b
is a solution of the system AT Ax = AT b. Conversely, every solution of AT Ax = AT b is a least
square solution of Ax = b.

Proof. Let W = Col(A). Then, by Remark 5.5.4.3, b = y + v, where y ∈ W, v ∈ Null(AT )


and min{kb − wk |w ∈ W} = kb − yk.
As y ∈ W there exists x0 ∈ Rn such that Ax0 = y. That is, x0 is the least square solution of
Ax = b. Hence,
T

(AT A)x0 = AT (Ax0 ) = AT y = AT (b − v) = AT b − 0 = AT b.


AF

Conversely, we need to show that min{kb − Axk |x ∈ Rn } = kb − Ax1 k, where x1 ∈ Rn is a


DR

solution of AT Ax = AT b. Thus, AT (Ax1 −b) = 0. Hence, for any x ∈ Rn , hb−Ax1 , A(x−x1 )i =


(x − x1 )T AT (b − Ax1 ) = (x − x1 )T 0 = 0. Thus,

kb − Axk2 = kb − Ax1 + Ax1 − Axk2 = kb − Ax1 k2 + kAx1 − Axk2 ≥ kb − yk2 .

Hence, the required result follows.


The above corollary gives the following result.

Corollary 5.5.4.5. Let A ∈ Mm,n (R) and b ∈ Rm . If


1. AT A is invertible then the least square solution of Ax = b equals x = (AT A)−1 AT b.
2. AT A is not invertible then the least square solution of Ax = b equals x = (AT A)− AT b,
where (AT A)− is the pseudo-inverse of AT A.

Proof. Part 1 directly follows from Corollary 5.5.4.5. For Part 1, let W = Col(A). Then, by
Remark 5.5.4.3, b = y + v, where y ∈ W and v ∈ Null(AT ). So, AT b = AT (y + v) = AT y.
Since y ∈ W, there exists x0 ∈ Rn such that Ax0 = y. Thus, AT b = AT Ax0 . Now, using the
definition of pseudo-inverse (see Exercise 1.1.3.6.18), we see that

(AA A) (AT A)− AT b = (AT A)(AT A)− (AT A)x0 = (AT A)x0 = AT b.

Thus, we see that (AT A)− AT b is a solution of the system AT Ax = AT b. Hence, by Corol-
lary 5.5.4.4, the required result follows.
We now give a few examples to understand projections.
5.4. ORTHOGONAL PROJECTIONS AND APPLICATIONS 129

Example 5.5.4.6. Use the fundamental theorem of linear algebra to compute the vector of the
orthogonal projection.

1. Determine the projection of (1, 1, 1, 1, 1)T on Null ([1, −1, 1, −1, 1]).
Solution: Here A = [1, −1, 1, −1, 1]. So, a basis of Col(AT ) equals {(1, −1, 1, −1, 1)T }
and that of Null(A) equals {(1, 1, 0, 0, 0)T , (1, 0, −1, 0, 0)T , (1, 0, 0, 1, 0)T , (1, 0, 0, 0, −1)T }.
Then, 
thesolution of the linear system   
1 1 1 1 1 1 6
1 1 0 0 0 −1 −4
    1 
Bx =   
1, where B = 0 −1 0 0 1  
 equals x = 5  6 . Thus, the projection is
1 0 0 1 0 −1 −4
1 0 0 0 −1 1 1
1  2
6(1, 1, 0, 0, 0)T − 4(1, 0, −1, 0, 0)T + 6(1, 0, 0, 1, 0)T − 4(1, 0, 0, 0, −1)T = (2, 3, 2, 3, 2)T .
5 5
T
2. Determine the projection of (1, 1, 1) on Null ([1, 1, −1]).
Solution: Here A = [1, 1, −1]. So, a basis of Null(A) equals {(1, −1, 0)T , (1, 0, 1)T } and
that of 
Col(A T ) equals {(1, 1, −1)T }. Then, the solution of the linear system
    
1 1 1 1 −2
    1 
Bx = 1 , where B = −1 0 1 equals x = 4 . Thus, the projection is
3
1 0 1 −1 1
T

1  2
(−2)(1, −1, 0)T + 4(1, 0, 1)T = (1, 1, 2)T .
AF

3 3

3. Determine the projection of (1, 1, 1)T on Col [1, 2, 1]T .
DR

Solution: Here, AT = [1, 2, 1], a basis of Col(A) equals {(1, 2, 1)T } and that of Null(AT )
equals {(1, T T
 0, −1) , (2, −1,
 0) }. Then,
 using the solution of the linear system
1 1 2 1
2
Bx = 1, where B =  0 −1 2 gives (1, 2, 1)T as the required vector.
3
1 −1 0 1

To use the first idea in Remark 5.5.4.3, we prove the following result.

Theorem 5.5.4.7. [Matrix of Orthogonal Projection] Let W be a subspace of an ips V with


P
k
dim(W) < ∞. If {f1 , . . . , fk } is an orthonormal basis of W then PW = fi fiT .
i=1
 
P
k P
k  Pk
Proof. Let v ∈ V. Then PW v = fi fiT v= fi fiT v = hv, fi ifi . As PW v indeed gives
i=1 i=1 i=1
the only closet point (see Theorem 5.5.4.1), the required result follows.

Example 5.5.4.8. In each of the following, determine the matrix of the orthogonal projection.
Also, verify that PW + PW⊥ = I. What can you say about Rank(PW⊥ ) and Rank(PW )? Also,
verify the orthogonal projection vectors obtained in Example 5.5.4.6.

1. W = {(x1 , . . . , x5 )T ∈ R5 | x1 − x2 + x3 −
x4 + x5 = 0} = Null ([1, −1, 1, −1, 1]).
       

 1 0 1 −2  

  2 
   1

0
 
−1
  1  


Solution: An orthonormal basis of W is √2  1
0, √ 1, √  0 , √  3  . Thus,
1 1
  2   6   30  

 0 1 0 −3

 

 
0 0 −2 −2
130 CHAPTER 5. INNER PRODUCT SPACES
   
4 1 −1 1 −1 1 −1 1 −1 1
 1 4 1 −1 1 −1 1 −1 1 −1
P4 1

 1  
PW = T
fi fi = −1 1 4 1 −1

and PW⊥ =  1 −1 1 −1 1.
5  5
i=1 1 −1 1 4 1 −1 1 −1 1 −1
−1 1 −1 1 4 1 −1 1 −1 1
2. W = {(x, y, z)T ∈ R3 | x + y − z = 0} = Null ([1,1, −1]). 
⊥ 1 1
Solution: Note {(1, 1, −1)} is a basis of W and √ (1, −1, 0), √ (1, 1, 2) an orthonor-
2 6
mal basis of W. So,
   
1 1 −1 2 −1 1
1  
 and PW = 1 −1 2 1 .

PW⊥ =  1 1 −1
3  3 
−1 −1 1 1 1 2

Verify that PW + PW⊥ = I3 , Rank(PW⊥ ) = 2 and Rank(PW ) = 1.



3. W = LS( (1, 2, 1) ) = Col [1, 2, 1]T ⊆ R3 .
Solution: Using Example 5.5.2.4.3 and Equation (5.5.2.1)

W⊥ = LS({(−2, 1, 0), (−1, 0, 1)}) = LS({(−2, 1, 0), (1, 2, −5)}).


   
T

1 2 1 5 −2 −1
   
AF

So, PW = 16   1
2 4 2 and PW⊥ = 6 −2 2 −2.

DR

1 2 1 −1 −2 5

We advise the readers to give a proof of the next result.

Theorem 5.5.4.9. Let {f1 , . . . , fk } be an orthonormal basis of a subspace W of Rn . If {f1 , . . . , fn }


Pk Pn
is an extended orthonormal basis of Rn , PW = fi fiT and PW⊥ = fi fiT then prove that
i=1 i=k+1
1. In − PW = PW⊥ .
2. (PW )T = PW and (PW⊥ )T = PW⊥ . That is, PW and PW⊥ are symmetric.
3. (PW )2 = PW and (PW⊥ )2 = PW⊥ . That is, PW and PW⊥ are idempotent.
4. PW ◦ PW⊥ = PW⊥ ◦ PW = 0.
Exercise 5.5.4.10. 1. Let W = {(x, y, z, w) ∈ R4 : x = y, z = w} be a subspace of R4 .
Determine the matrix of the orthogonal projection.
2. Let PW1 and PW2 be the orthogonal projections of R2 onto W1 = {(x, 0) : x ∈ R} and
W2 = {(x, x) : x ∈ R}, respectively. Note that PW1 ◦ PW2 is a projection onto W1 . But, it
is not an orthogonal projection. Hence or otherwise, conclude that the composition of two
orthogonal projections need not be an orthogonal projection?
 
1 1
3. Let A = . Then A is idempotent but not symmetric. Now, define P : R2 → R2 by
0 0
P (v) = Av, for all v ∈ R2 . Then

(a) P is idempotent.
(b) Null(P ) ∩ Rng(P ) = Null(A) ∩ Col(A) = {0}.
5.4. ORTHOGONAL PROJECTIONS AND APPLICATIONS 131

(c) R2 = Null(P ) + Rng(P ). But, (Rng(P ))⊥ = (Col(A))⊥ 6= Null(A).


(d) Since (Col(A))⊥ 6= Null(A), the map P is not an orthogonal projector. In this case,
P is called a projection of R2 onto Rng(P ) along Null(P ).

4. Find all 2 × 2 real matrices A such that A2 = A. Hence, or otherwise, determine all
projection operators of R2 .
5. Let W be an (n − 1)-dimensional subspace of Rn with ordered basis BW = [f1 , . . . , fn−1 ].
Suppose B = [f1 , . . . , fn−1 , fn ] is an orthogonal ordered basis of Rn obtained by extending
P
n−1
BW . Now, define a function Q : Rn → Rn by Q(v) = hv, fn ifn − hv, fi ifi . Then
i=1

(a) Q fixes every vector in W⊥ .


(b) Q sends every vector w ∈ W to −w.
(c) Q ◦ Q = In .

The function Q is called the reflection operator with respect to W⊥ .

Theorem 5.5.4.11 (Bessel’s Inequality). Let V be an ips with {v1 , · · · , vn } as an orthogonal set.
n
X Xn
| hu, vk i |2 2 hu, vk i
≤ kuk ∈
T
Then , for each u V. Equality holds if and only if u = vk .
kvk k2 kvk k2
AF

k=1 k=1

vk P
n
DR

Proof. For 1 ≤ k ≤ n, define wk = and u0 = hu, wk iwk . Then, by Theorem 5.5.4.1 u0 is


kvk k k=1
the nearest the vector to u in LS(v1 , · · · , vn ). Also, hu − u0 , u0 i = 0. So,
kuk2 = ku − u0 + u0 k2 = ku − u0 k2 + ku0 k2 ≥ ku0 k2 . Thus, we have obtained the required
inequality. The equality is attained if and only if u − u0 = 0 or equivalently, u = u0 .
u

u0
0

We now give a generalization of the pythagoras theorem. The proof is left as an exercise for
the reader.

Theorem 5.5.4.12 (Parseval’s formula). Let V be an ips with {v1 , · · · , vn } as an orthonormal


P
n
basis of V. Then, for each x, y ∈ V, hx, yi = hx, vi ihy, vi i. Furthermore, if x = y then
i=1
P
n
kxk2 = | hx, vi i |2 , giving us a generalization of the Pythagoras Theorem.
i=1

Exercise 5.5.4.13. Let A ∈ Mm,n (R). Then there exists a unique B such that hAx, yi =
hx, Byi, for all x ∈ Rn , y ∈ Rm . In fact B = AT .

5.4.A Orthogonal Projections as Self-Adjoint Operators*


Theorem 5.5.4.9 implies that the matrix of the projection operator is symmetric. We use this
idea to proceed further.
132 CHAPTER 5. INNER PRODUCT SPACES

Definition 5.5.4.14. [Self-Adjoint Operator] Let V be an ips with inner product h , i. A linear
operator P : V → V is called self-adjoint if hP (v), ui = hv, P (u)i, for every u, v ∈ V.

A careful understanding of the examples given below shows that self-adjoint operators and
Hermitian matrices are related. It also shows that the vector spaces Cn and Rn can be decom-
posed in terms of the null space and column space of Hermitian matrices. They also follow
directly from the fundamental theorem of linear algebra.

Example 5.5.4.15. 1. Let A be an n × n real symmetric matrix. If P : Rn → Rn is defined


by P (x) = Ax, for every x ∈ Rn then

(a) P is a self adjoint operator as A = AT , for every x, y ∈ Rn , implies

hP (x), yi = (yT )Ax = (yT )AT x = (Ay)T x = hx, Ayi = hx, P (y)i.

(b) Null(P ) = (Rng(P ))⊥ as A = AT . Thus, Rn = Null(P ) ⊕ Rng(P ).

2. Let A be an n × n Hermitian matrix. If P : Cn → Cn is defined by P (z) = Az, for all


z ∈ Cn then using similar arguments (see Example 5.5.4.15.1) prove the following:
T

(a) P is a self-adjoint operator.


AF

(b) Null(P ) = (Rng(P ))⊥ as A = A∗ . Thus, Cn = Null(P ) ⊕ Rng(P ).


DR

We now state and prove the main result related with orthogonal projection operators.

Theorem 5.5.4.16. Let V be a finite dimensional ips. If V = W ⊕ W⊥ then the orthogonal


projectors PW : V → V on W and PW⊥ : V → V on W⊥ satisfy

1. Null(PW ) = {v ∈ V : PW (v) = 0} = W⊥ = Rng(PW⊥ ).

2. Rng(PW ) = {PW (v) : v ∈ V} = W = Null(PW⊥ ).

3. PW ◦ PW = PW , PW⊥ ◦ PW⊥ = PW⊥ (Idempotent).

4. PW⊥ ◦ PW = 0V and PW ◦ PW⊥ = 0V , where 0V (v) = 0, for all v ∈ V

5. PW + PW⊥ = IV , where IV (v) = v, for all v ∈ V.

6. The operators PW and PW⊥ are self-adjoint.

Proof. Part 1: As V = W⊕W⊥, for each u ∈ W⊥ , one uniquely writes u = 0+u, where 0 ∈ W
and u ∈ W⊥ . Hence, by definition, PW (u) = 0 and PW⊥ (u) = u. Thus, W⊥ ⊆ Null(PW ) and
W⊥ ⊆ Rng(PW⊥ ).
Now suppose that v ∈ Null(PW ). So, PW (v) = 0. As V = W ⊕ W⊥ , v = w + u, for unique
w ∈ W and unique u ∈ W⊥ . So, by definition, PW (v) = w. Thus, w = PW (v) = 0. That is,
v = 0 + u = u ∈ W⊥ . Thus, Null(PW ) ⊆ W⊥ .
A similar argument implies Rng(PW⊥ ) ⊆ W ⊥ and thus completing the proof of the first part.
Part 2: Use an argument similar to the proof of Part 1.
5.4. ORTHOGONAL PROJECTIONS AND APPLICATIONS 133

Part 3, Part 4 and Part 5: Let v ∈ V. Then v = w + u, for unique w ∈ W and unique
u ∈ W⊥ . Thus, by definition,

(PW ◦ PW )(v) = PW PW (v) = PW (w) = w and PW (v) = w

(PW⊥ ◦ PW )(v) = PW⊥ PW (v) = PW⊥ (w) = 0 and
(PW ⊕ PW⊥ )(v) = PW (v) + PW⊥ (v) = w + u = v = IV (v).

Hence, PW ◦ PW = PW , PW⊥ ◦ PW = 0V and IV = PW ⊕ PW⊥ .


Part 6: Let u = w1 + x1 and v = w2 + x2 , for unique w1 , w2 ∈ W and unique x1 , x2 ∈ W⊥ .
Then, by definition, hwi , xj i = 0 for 1 ≤ i, j ≤ 2. Thus,

hPW (u), vi = hw1 , vi = hw1 , w2 i = hu, w2 i = hu, PW (v)i

and the proof of the theorem is complete.

Remark 5.5.4.17. Theorem 5.5.4.16 gives us the following:

1. The orthogonal projectors PW and PW⊥ are idempotent and self-adjoint.

2. Let v ∈ V. Then v − PW (v) = (IV − PW )(v) = PW⊥ (v) ∈ W⊥ . Thus, hv − PW (v), wi = 0,


T
AF

for every v ∈ V and w ∈ W.


DR

3. As PW (v) − w ∈ W, for each v ∈ V and w ∈ W, we have

kv − wk2 = kv − PW (v) + PW (v) − wk2


= kv − PW (v)k2 + kPW (v) − wk2 + 2hv − PW (v), PW (v) − wi
= kv − PW (v)k2 + kPW (v) − wk2 .

Therefore, kv − wk ≥ kv − PW (v)k and equality holds if and only if w = PW (v). Since


PW (v) ∈ W, we see that

d(v, W) = inf {kv − wk : w ∈ W } = kv − PW (v)k.

That is, PW (v) is the vector nearest to v ∈ W. This can also be stated as: the vector
PW (v) solves the following minimization problem:

inf kv − wk = kv − PW (v)k.
w∈W

The next theorem is a generalization of Theorem 5.5.4.16. We omit the proof as the arguments
are similar and uses the following:
Let V be a finite dimensional ips with V = W1 ⊕ · · · ⊕ Wk , for certain subspaces Wi ’s of V.
Then, for each v ∈ V there exist unique vectors v1 , . . . , vk such that

1. vi ∈ Wi , for 1 ≤ i ≤ k,

2. hvi , vj i = 0 for each vi ∈ Wi , vj ∈ Wj , 1 ≤ i 6= j ≤ k and


134 CHAPTER 5. INNER PRODUCT SPACES

3. v = v1 + · · · + vk .

Theorem 5.5.4.18. Let V be a finite dimensional ips with subspaces W1 , . . . , Wk of V such


that V = W1 ⊕ · · · ⊕ Wk . Then, for each i, j, 1 ≤ i 6= j ≤ k, there exist orthogonal projectors
PWi : V → V of V onto Wi satisfying the following:

1. Null(PWi ) = Wi⊥ = W1 ⊕ W2 ⊕ · · · ⊕ Wi−1 ⊕ Wi+1 ⊕ · · · ⊕ Wk .

2. Rng(PWi ) = Wi .

3. PWi ◦ PWi = PWi .

4. PWi ◦ PWj = 0V .

5. PWi is a self-adjoint operator, and

6. IV = PW1 ⊕ PW2 ⊕ · · · ⊕ PWk .

5.5 QR Decomposition∗
T

The next result gives the proof of the QR decomposition for real matrices. The readers are
AF

advised to prove similar results for matrices with complex entries. This decomposition and its
DR

generalizations are helpful in the numerical calculations related with eigenvalue problems (see
Chapter 6).

Theorem 5.5.5.1 (QR Decomposition). Let A ∈ Mn (R) be invertible. Then there exist matrices
Q and R such that Q is orthogonal and R is upper triangular with A = QR. Furthermore, if
det(A) 6= 0 then the diagonal entries of R can be chosen to be positive. Also, in this case, the
decomposition is unique.

Proof. As A is invertible, it’s columns form a basis of Rn . So, an application of the Gram-Schmidt
orthonormalization process to {A[:, 1], . . . , A[:, n]} gives an orthonormal basis {v1 , . . . , vn } of Rn
satisfying
LS(A[:, 1], . . . , A[:, i]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ n.

Since A[:, i] ∈ LS(v1 , . . . , vi ), for 1 ≤ i ≤ n, there exist αji ∈ R, 1 ≤ j ≤  i, such that


  α11 α12 · · · α1n
α1i  
 0 α22 · · · α2n 
   
A[:, i] = [v1 , . . . , vi ] ... . Thus, if Q = [v1 , . . . , vn ] and R =  . .. .. ..  then
 .. . . . 
αii  
0 0 · · · αnn
1. Q is an orthogonal matrix (see Exercise 5.5.1.32.4),
2. R is an upper triangular matrix, and
3. A = QR.

Thus, this completes the proof of the first part. Note that
5.5. QR DECOMPOSITION∗ 135

1. αii 6= 0, for 1 ≤ i ≤ n, as A[:, 1] 6= 0 and A[:, i] ∈


/ LS(v1 , . . . , vi−1 ).
2. if αii < 0, for some i, 1 ≤ i ≤ n then we can replace vi in Q by −vi to get a new Q ad R
in which the diagonal entries of R are positive.

Uniqueness: suppose Q1 R1 = Q2 R2 for some orthogonal matrices Qi ’s and upper triangular


matrices Ri ’s with positive diagonal entries. As Qi ’s and Ri ’s are invertible, we get Q−1
2 Q1 =
R2 R1−1 . Now, using
1. Exercises 2.2.3.30.1, 1.1.3.2.2, the matrix R2 R1−1 is an upper triangular matrix.
2. Exercises 1.1.3.2.11, Q−1
2 Q1 is an orthogonal matrix.

So, the matrix R2 R1−1 is an orthogonal upper triangular matrix and hence, by Exercise 1.1.3.6.17,
R2 R1−1 = In . So, R2 = R1 and therefore Q2 = Q1 .
Let A be an n × k matrix with Rank(A) = r. Then, by Remark 5.5.2.6, an application
of the Gram-Schmidt orthonormalization process to columns of A yields an orthonormal set
{v1 , . . . , vr } ⊆ Rn such that

LS(A[:, 1], . . . , A[:, j]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ j ≤ k.


T

Hence, proceeding on the lines of the above theorem, we have the following result.
AF
DR

Theorem 5.5.5.2 (Generalized QR Decomposition). Let A be an n × k matrix of rank r. Then


A = QR, where

1. Q = [v1 , . . . , vr ] is an n × r matrix with QT Q = Ir ,

2. LS(A[:, 1], . . . , A[:, j]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ j ≤ k and

3. R is an r × k matrix with Rank(R) = r.


 
1 0 1 2
 
0 1 −1 1
Example 5.5.5.3. 1. Let A = 
1 0 1
. Find an orthogonal matrix Q and an upper
 1

0 1 1 1
triangular matrix R such that A = QR.
Solution: From Example 5.5.2.4, we know that w1 = √1 (1, 0, 1, 0)T , w2 = √1 (0, 1, 0, 1)T
2 2
and w3 = √1 (0, −1, 0, 1)T . We now compute w4 . If v4 = (2, 1, 1, 1)T then
2

1
u4 = v4 − hv4 , w1 iw1 − hv4 , w2 iw2 − hv4 , w3 iw3 = (1, 0, −1, 0)T .
2
Thus, w4 = √1 (−1, 0, 1, 0)T . Hence, we see that A = QR with
2   √ √ 
√1 0 0 √1 2 0 2 − √32
 2 2   √ √ 
   1
 0 √2 √2 0 
−1   0 2 0 − 2
Q= w1 , . . . , w4 =  1  and R = 
 0 √ .
√
 2 0
−1
0 √2    0 2 0  
0 √ 1 √1
0 0 0 0 √1
2 2 2
136 CHAPTER 5. INNER PRODUCT SPACES
 
1 1 1 0
 
−1 0 −2 1
2. Let A =   . Find a 4 × 3 matrix Q satisfying QT Q = I3 and an upper
1 1 1 0 
 
1 0 2 1
triangular matrix R such that A = QR.
Solution: Let us apply the Gram-Schmidt orthonormalization process to the columns of
A. As v1 = (1, −1, 1, 1)T , we get w1 = 12 v1 . Let v2 = (1, 0, 1, 0)T . Then

1
u2 = v2 − hv2 , w1 iw1 = (1, 0, 1, 0)T − w1 = (1, 1, 1, −1)T .
2
Hence, w2 = 21 (1, 1, 1, −1)T . Let v3 = (1, −2, 1, 2)T . Then

u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = v3 − 3w1 + w2 = 0.

So, we again take v3 = (0, 1, 0, 1)T . Then

u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = v3 − 0w1 − 0w2 = v3 .

So, w3 = √1 (0, 1, 0, 1)T . Hence,


2
T

 
1 1  
AF

2 2 0
 −1 1  2 1 3 0
 2 √1   
DR

 2 2  and R = 
Q = [v1 , v2 , v3 ] =  1   0 1 −1 0  .
 2
1
2 0 √ 
1 −1 1
0 0 0 2

2 2 2

The readers are advised to check the following:

(a) Rank(A) = 3,
(b) A = QR with QT Q = I3 , and
(c) R is a 3 × 4 upper triangular matrix with Rank(R) = 3.

Remark 5.5.5.4. Let A ∈ Mm,n (R).


 
hv1 , A[:, 1]i hv1 , A[:, 2]i hv1 , A[:, 3]i ···
 0 hv2 , A[:, 2]i hv2 , A[:, 3]i · · ·
 
1. If A = QR with Q = [v1 , . . . , vn ] then R =  0 0 hv3 , A[:, 3]i · · · .
 
.. .. ..
. . .
In case Rank(A) < n then a slight modification gives the matrix R.
2. Further, let Rank(A) = n.
(a) Then AT A is invertible (see Exercise 5.5.1.27.4).
(b) By Theorem 5.5.5.2, A = QR with Q a matrix of size m×n and R an upper triangular
matrix of size n × n. Also, QT Q = In and Rank(R) = n.
(c) Thus, AT A = RT QT QR = RT R. As AT A is invertible, the matrix RT R is invertible.
Since R is a square matrix, by Exercise 4.4a, the matrix R itself is invertible. Hence,
(RT R)−1 = R−1 (RT )−1 .
5.6. SUMMARY 137

(d) So, if Q = [v1 , . . . , vn ] then

A(AT A)−1 AT = QR(RT R)−1 RT QT = (QR)(R−1 (RT )−1 )RT QT = QQT .

(e) Hence, using Theorem 5.5.4.7, we see that the matrix



v1T n
T −1 T T  ..  X
P = A(A A) A = QQ = [v1 , . . . , vn ] .  = vi viT
vnT i=1

is the orthogonal projection matrix on Col(A).

3. Further, let Rank(A) = r < n. If j1 , . . . , jr are the pivot columns of A then Col(A) =
Col(B), where B = [A[:, j1 ], . . . , A[:, jr ]] is an m × r matrix with Rank(B) = r. So, using
Part 2e we see that B(B T B)−1 B T is the orthogonal projection matrix on Col(A). So,
compute RREF of A and choose columns of A corresponding to the pivot columns.

5.6 Summary
T
AF

In the previous chapter, we learnt that if V is vector space over F with dim(V) = n then V
basically looks like Fn . Also, any subspace of Fn is either Col(A) or Null(A) or both, for some
DR

matrix A with entries from F.


So, we started this chapter with inner product, a generalization of the dot product in R3
or Rn . We used the inner product to define the length/norm of a vector. The norm has the
property that “the norm of a vector is zero if and only if the vector itself is the zero vector”.
We then proved the Cauchy-Bunyakovskii-Schwartz Inequality which helped us in defining the
angle between two vector. Thus, one can talk of geometrical problems in Rn and proved some
geometrical results.
We then independently defined the notion of a norm in Rn and showed that a norm is induced
by an inner product if and only if the norm satisfies the parallelogram law (sum of squares of
the diagonal equals twice the sum of square of the two non-parallel sides).
The next subsection dealt with the fundamental theorem of linear algebra where we showed
that if A ∈ Mm,n (C) then

1. dim(Null(A)) + dim(Col(A)) = n.
⊥ ⊥
2. Null(A) = Col(A∗ ) and Null(A∗ ) = Col(A) .

3. dim(Col(A)) = dim(Col(A∗ )).

We then saw that having an orthonormal basis is an asset as determining the

1. coordinates of a vector boils down to computing the inner product.

2. projection of a vector on a subspace boils down to finding an orthonormal basis of the


subspace and then summing the corresponding rank 1 matrices.
138 CHAPTER 5. INNER PRODUCT SPACES

So, the question arises, how do we compute an orthonormal basis? This is where we came
across the Gram-Schmidt Orthonormalization process. This algorithm helps us to determine
an orthonormal basis of LS(S) for any finite subset S of a vector space. This also lead to the
QR-decomposition of a matrix.
Thus, we observe the following about the linear system Ax = b. If
1. b ∈ Col(A) then we can use the Gauss-Jordan method to get a solution.
2. b ∈
/ Col(A) then in most cases we need a vector x such that the least square error between
b and Ax is minimum. We saw that this minimum is attained by the projection of b on
Col(A). Also, this vector can be obtained either using the fundamental theorem of linear
algebra or by computing the matrix B(B T B)−1 B T , where the columns of B are either the
pivot columns of A or a basis of Col(A).

T
AF
DR
Chapter 6

Eigenvalues, Eigenvectors and


Diagonalization

6.1 Introduction and Definitions


In this chapter, every matrix is an element of Mn (C) and x = (x1 , . . . , xn )T ∈ Cn , for some
T

n ∈ N. We start with a few examples to motivate this chapter.


     
AF

1 2 3 2 x
Example 6.6.1.1. 1. Let A = and B = and x = . Then,
2 1 2 3 y
DR

      
1 1 2 1 1
(a) A magnifies the nonzero vector three times as =3 . Verify that
1 2 1 1 1
   
1 1
B =5 and hence B magnifies 5 times.
1 1
     
1 1 1
(b) A behaves by changing the direction of as = −1 , whereas B fixes it.
−1 −1 −1
(x + y)2 (x − y)2 (x + y)2 (x − y)2
(c) xT Ax = 3 − and xT Bx = 5 + . So, maximum
2 2 2 2
and minimum displacement
  lines x + y = 0 and x − y = 0, where
occurs along
1 1
x + y = (x, y) and x − y = (x, y) .
1 −1
(d) the curve xT Ax = 1 represents a hyperbola, where as the curve xT Bx = 1 represents
an ellipse (see Figure 6.1 drawn using the package ”MATHEMATICA”).
 
1 2
2. Let C = , a non-symmetric matrix. Then, does there exist a nonzero x ∈ C2 which
1 3
gets magnified by C?
So, we need x 6= 0 and α ∈ C such that Cx = αx ⇔ [αI2 − C]x = 0. As x 6= 0,
[αI2 − C]x = 0 has a solution if and only if det[αI − A] = 0. But,
 
α−1 −2
det[αI − A] = det = α2 − 4α + 1.
−1 α − 3
 √ 
√ √ 1 + 3 √ −2
So, α = 2± 3. For α = 2+ 3, verify that the x 6= 0 that satisfies x=0
−1 3−1
140 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZATION

20 2

10 1
2

0 0 0

-2
-10 -1

-4

-20 -2
-20 -10 0 10 20 -2 -1 0 1 2 -4 -2 0 2 4

Figure 6.1: A Hyperbola and two Ellipses (first one has orthogonal axes)
.

√
 √ 
3−1 √ 3+1
equals x = . Similarly, for α = 2 − 3, the vector x = satisfies
1 −1
 √ 
1− 3 √ −2 x = 0. In this example,
−1 − 3 − 1
√  √ 
3−1 3+1
(a) we still have magnifications in the directions and .
1 −1

(b) the maximum/minimum displacements do not occur along the lines ( 3− 1)x+ y = 0
T


AF

and ( 3 + 1)x − y = 0 (see the third curve in Figure 6.1).


√ √
DR

(c) the lines ( 3 − 1)x + y = 0 and ( 3 + 1)x − y = 0 are not orthogonal.

3. Let A be a real symmetric matrix. Consider the following problem:

Maximize (Minimize) xT Ax such that x ∈ Rn and xT x = 1.

To solve this, consider the Lagrangian


n
n X n
!
X X
L(x, λ) = xT Ax − λ(xT x − 1) = aij xi xj − λ x2i − 1 .
i=1 j=1 i=1

Partially differentiating L(x, λ) with respect to xi for 1 ≤ i ≤ n, we get


∂L
= 2a11 x1 + 2a12 x2 + · · · + 2a1n xn − 2λx1 ,
∂x1
.. ..
.=.
∂L
= 2an1 x1 + 2an2 x2 + · · · + 2ann xn − 2λxn .
∂xn
Therefore, to get the points of extremum, we solve for
 
T ∂L ∂L ∂L T ∂L
0 = , ,..., = = 2(Ax − λx).
∂x1 ∂x2 ∂xn ∂x
Thus, to solve the extremal problem, we need λ ∈ R, x ∈ Rn such that x 6= 0 and Ax = λx.

We observe the following about the matrices A, B and C that appear in Example 6.6.1.1.
√ √
1. det(A) = −3 = 3 × −1, det(B) = 5 = 5 × 1 and det(C) = 1 = (2 + 3) × (2 − 3).
6.1. INTRODUCTION AND DEFINITIONS 141
√ √
2. Tr(A) = 2 = 3 − 1, Tr(B) = 6 = 5 + 1 and det(C) = 4 = (2 + 3) + (2 − 3).
    √  √ 
1 1 3−1 3+1
3. Both the sets , and , are linearly independent.
1 −1 1 −1
   
1 1
4. If v1 = and v2 = and S = [v1 , v2 ] then
1 −1
   
3 0 −1 3 0
(a) AS = [Av1 , Av2 ] = [3v1 , −v2 ] = S ⇔ S AS = = diag(3, −1).
0 −1 0 −1
   
5 0 5 0
(b) BS = [Bv1 , Bv2 ] = [5v1 , v2 ] = S ⇔ S −1 AS = = diag(5, 1).
0 1 0 1
1 1
(c) Let u1 = √ v1 and u2 = √ v2 . Then, u1 and u2 are orthonormal unit vectors.
2 2
That is, if U = [u1 , u2 ] then I = U U ∗ = u1 u∗1 + u2 u∗2 and
i. A = 3u1 u∗1 − u2 u∗2 .
ii. B = 5u1 u∗1 + u2 u∗2 .
√  √ 
3−1 3+1
5. If v1 = and v2 = and S = [v1 , v2 ] then
1 −1
 √ 
−1 2+ 3 0√ √ √
S CS = = diag(2 + 3, 2 − 3).
0 2− 3
T
AF

Thus, we see that given A ∈ Mn (C), the number λ ∈ C and x ∈ Cn , x 6= 0 satisfying Ax = λx


have certain nice properties. For example, there exists a basis of C2 in which the matrices A, B
DR

and C behave like diagonal matrices. To understand the ideas better, we start with the following
definitions.

Definition 6.6.1.2. Let A ∈ Mn (C). Then,


1. the equation
Ax = λx ⇔ (A − λIn )x = 0 (6.6.1.1)

is called the eigen-condition.


2. an α ∈ C is called a characteristic value/root or eigenvalue or latent root of A if
there exists a non-zero vector x satisfying Ax = αx.
3. a non-zero vector x satisfying Equation (6.6.1.1) is called a characteristic vector or
eigenvector or invariant/latent vector of A corresponding to λ.
4. the tuple (α, x) with x 6= 0 and Ax = αx is called an eigen-pair or characteristic-pair.
5. for an eigenvalue α ∈ C, Null(A − αI) = {x ∈ Rn |Ax = αx} is called the eigen-space
or characteristic vector space of A corresponding to α.

Theorem 6.6.1.3. Let A ∈ Mn (C) and α ∈ C. Then, the following statements are equivalent.
1. α is an eigenvalue of A.
2. det(A − αIn ) = 0.

Proof. We know that α is an eigenvalue of A if any only if the system (A − αIn )x = 0 has a
non-trivial solution. By Theorem 2.2.2.34 this holds if and only if det(A − αI) = 0.
142 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZATION

Definition 6.6.1.4. Let A ∈ Mn (C). Then,


1. det(A − λI) is a polynomial of degree n in λ and is called the characteristic polynomial
of A, denoted pA (λ), or in short p(λ).
2. the equation pA (λ) = 0 is called the characteristic equation of A.

We thus observe the following.

Remark 6.6.1.5. 1. Let A ∈ Mn (C). If α ∈ C is a root of pA (λ) = 0 then α is an eigenvalue.


As Null(A − αI) is a subspace of Cn , the following statements hold.
(a) (α, x) is an eigen-pair of A if and only if (α, cx) is an eigen-pair of A, for c ∈ C \{0}.
(b) If x1 , . . . , xr are linearly independent eigenvectors of A for the eigenvalue α then
Pr
ci xi , with at least one ci 6= 0, is also an eigenvector of A for α.
i=1

Hence, if S is a collection of eigenvectors, S needs to be linearly independent.

2. A − αI is singular. Therefore, if Rank(A − αI) = r then r < n. Hence, by Theo-


rem 2.2.2.34, the system (A − αI)x = 0 has n − r linearly independent solutions.

Almost all books in mathematics differentiate between characteristic value and eigenvalue as
T
AF

the ideas change when one moves from complex numbers to any other scalar field. We give the
following example for clarity.
DR

Remark 6.6.1.6. Let A ∈ M2 (F). Then A induces a map T ∈ L(F2 ) defined by T (x) = Ax,
for all x ∈ F2 . We use this idea to understand the difference.
" #
0 1
1. Let A = . Then pA (λ) = λ2 + 1. So, ±i are the roots of p(λ) = 0 in C. Hence,
−1 0

(a) A has (i, (1, i)T ) and (−i, (i, 1)T ) as eigen-pairs or characteristic-pairs.
(b) A has no characteristic value over R.
 
1 2 √
2. Let A = . Then 2 ± 3 are the roots of the characteristic equation. Hence,
1 3
(a) A has characteristic values or eigenvalues over R.
(b) A has no characteristic value over Q.

Let us look at some more examples.

Example 6.6.1.7. 1. Let A = diag(d1 , . . . , dn ) with di ∈ C, 1 ≤ i ≤ n. Then p(λ) =


Q
n
(λ − di ) and thus verify that (d1 , e1 ), . . . , (dn , en ) are the eigen-pairs.
i=1
" #
1 1
2. Let A = . Then p(λ) = (1 − λ)2 . Hence, 1 is a repeated eigenvalue. But the
0 1
complete solution of the system (A − I2 )x = 0 equals x = ce1 , for c ∈ C. Hence using
Remark 6.6.1.5.1, e1 is an eigenvector. Therefore, 1 is a repeated eigenvalue whereas
there is only one eigenvector.
" #
1 0
3. Let A = . Then, 1 is a repeated eigenvalue of A. In this case, (A − I2 )x = 0 has
0 1
a solution for every x ∈ C2 . Hence, any two linearly independent vectors xt , yt from
C2 gives (1, x) and (1, y) as the two eigen-pairs for A. In general, if S = {x1 , . . . , xn } is a
basis of Cn then (1, x1 ), . . . , (1, xn ) are eigen-pairs of In , the identity matrix.
" # " #! " #!
√ 3√ √ 3√
1 3
4. Let A = . Then, 3 + 2i, 2+ 2i and 3 − 2i, 2− 2i are the eigen-
−2 5 1 1
pairs of A.
" #      
1 −1 i 1
5. Let A = . Then, 1 + i, and 1 − i, are the eigen-pairs of A.
1 1 1 i
 
0 1 0
6. Verify that A = 0 0 1 has the eigenvalue 0 repeated 3 times with e1 as the only
0 0 0
eigenvector as Ax = 0 with x = (x1 , x2 , x3 )T implies x2 = 0 = x3 .
 
0 1 0 0 0
0 0 1 0 0
 
7. Verify that A =  
0 0 0 0 0 has the eigenvalue 0 repeated 5 times with e1 and e4 as
T

0 0 0 0 1
AF

0 0 0 0 0
DR

the only eigenvectors as Ax = 0 with x = (x1 , x2 , x3 )T implies x2 = 0 = x3 = x5 . Note


that the diagonal blocks of A are nilpotent matrices.

Exercise 6.6.1.8. 1. Let A ∈ Mn (R). Then, prove that

(a) if α ∈ σ(A) then αk ∈ σ(Ak ), for all k ∈ N.


(b) if A is invertible and α ∈ σ(A) then αk ∈ σ(Ak ), for all k ∈ Z.

2. "
Find eigen-pairs
# over
" C, for each
# of "the following matrices:
# " #
1 1+i i 1+i cos θ − sin θ cos θ sin θ
, , and .
1−i 1 −1 + i i sin θ cos θ sin θ − cos θ

3. Let A = [aij] ∈ Mn(C) with ∑_{j=1}^{n} aij = a, for all 1 ≤ i ≤ n. Then prove that a is an eigenvalue of A. What is the corresponding eigenvector?

4. Prove that the matrices A and AT have the same set of eigenvalues. Construct a 2 × 2
matrix A such that the eigenvectors of A and AT are different.

5. Let A be an idempotent matrix. Then prove that its eigenvalues are either 0 or 1 or both.

6. Let A be a nilpotent matrix. Then prove that its eigenvalues are all 0.

Theorem 6.6.1.9. Let λ1, . . . , λn, not necessarily distinct, be the eigenvalues of A = [aij] ∈ Mn(C). Then det(A) = ∏_{i=1}^{n} λi and Tr(A) = ∑_{i=1}^{n} aii = ∑_{i=1}^{n} λi.

Proof. Since λ1, . . . , λn are the eigenvalues of A, by definition,

det(A − xIn) = (−1)^n ∏_{i=1}^{n} (x − λi)    (6.6.1.2)

is an identity in x as polynomials. Therefore, by substituting x = 0 in Equation (6.6.1.2), we get det(A) = (−1)^n (−1)^n ∏_{i=1}^{n} λi = ∏_{i=1}^{n} λi. Also,

det(A − xIn) = det [ a11 − x  a12  · · ·  a1n ;  a21  a22 − x  · · ·  a2n ;  · · · ;  an1  an2  · · ·  ann − x ]    (6.6.1.3)
             = a0 − x a1 + · · · + (−1)^{n−1} x^{n−1} a_{n−1} + (−1)^n x^n    (6.6.1.4)

for some a0, a1, . . . , a_{n−1} ∈ C. Then, a_{n−1}, the coefficient of (−1)^{n−1} x^{n−1}, comes from the term

(a11 − x)(a22 − x) · · · (ann − x).

So, a_{n−1} = ∑_{i=1}^{n} aii = Tr(A), the trace of A. Also, from Equations (6.6.1.2) and (6.6.1.4), we have

a0 − x a1 + · · · + (−1)^{n−1} x^{n−1} a_{n−1} + (−1)^n x^n = (−1)^n ∏_{i=1}^{n} (x − λi).

Therefore, comparing the coefficients of (−1)^{n−1} x^{n−1}, we have

Tr(A) = a_{n−1} = (−1) ( (−1) ∑_{i=1}^{n} λi ) = ∑_{i=1}^{n} λi.

Hence, we get the required result.
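A quick numerical illustration of Theorem 6.6.1.9 (a sketch that is not part of the original notes, assuming NumPy; the matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])

eigvals = np.linalg.eigvals(A)

# det(A) equals the product of the eigenvalues.
assert np.isclose(np.prod(eigvals), np.linalg.det(A))
# Tr(A) equals the sum of the eigenvalues.
assert np.isclose(np.sum(eigvals), np.trace(A))
```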

Exercise 6.6.1.10. 1. Let A be a 3 × 3 orthogonal matrix (AAT = I). If det(A) = 1, then


prove that there exists v ∈ R3 \ {0} such that Av = v.

2. Let A ∈ M2n+1 (R) with AT = −A. Then prove that 0 is an eigenvalue of A.

3. Let A ∈ Mn (C). Then, A is invertible if and only if 0 is not an eigenvalue of A.

6.1.A Spectrum of an eigenvalue

Definition 6.6.1.11. Let A ∈ Mn (C). Then,


1. the collection of eigenvalues of A, counting with multiplicities, is called the spectrum of
A, denoted σ(A).
2. the multiplicity of α ∈ σ(A) as a root of the characteristic polynomial is called the algebraic multiplicity of α, denoted Alg.Mulα(A).
3. for α ∈ σ(A), dim(Null(A − αI)) is called the geometric multiplicity of α, denoted Geo.Mulα(A).

We now state the following observations.



Remark 6.6.1.12. Let A ∈ Mn (C).


1. Then, for each α ∈ σ(A), using Theorem 2.2.2.34 dim(Null(A − αI)) ≥ 1. So, we have
at least one eigenvector.
2. If the algebraic multiplicity of α ∈ σ(A) is r ≥ 2 then Example 6.6.1.7.7 shows that A need not have r linearly independent eigenvectors corresponding to α.

Theorem 6.6.1.13. Let A and B be two similar matrices. Then


1. α ∈ σ(A) if and only if α ∈ σ(B).
2. for each α ∈ σ(A), Alg.Mulα (A) = Alg.Mulα (B) and Geo.Mulα (A) = Geo.Mulα (B).

Proof. Since A and B are similar, there exists an invertible matrix S such that A = SBS⁻¹.
So, α ∈ σ(A) if and only if α ∈ σ(B) as

det(A − xI) = det(SBS⁻¹ − xI) = det( S(B − xI)S⁻¹ )
            = det(S) det(B − xI) det(S⁻¹) = det(B − xI).    (6.6.1.5)

Note that Equation (6.6.1.5) also implies that Alg.Mulα(A) = Alg.Mulα(B). We will now show that Geo.Mulα(A) = Geo.Mulα(B).
So, let Q1 = {v1, . . . , vk} be a basis of Null(A − αI). Then, B = S⁻¹AS implies that Q2 = {S⁻¹v1, . . . , S⁻¹vk} ⊆ Null(B − αI). Since Q1 is linearly independent and S is invertible, we get that Q2 is linearly independent. So, Geo.Mulα(A) ≤ Geo.Mulα(B). Now, we can start with eigenvectors of B and use similar arguments to get Geo.Mulα(B) ≤ Geo.Mulα(A) and hence the required result follows.

Remark 6.6.1.14. Let A ∈ Mn(C). Then, for any invertible matrix B, the matrices AB and BA = B(AB)B⁻¹ are similar. Hence, in this case the matrices AB and BA have
1. the same set of eigenvalues.
2. Alg.Mulα(AB) = Alg.Mulα(BA), for each α ∈ σ(AB).
3. Geo.Mulα(AB) = Geo.Mulα(BA), for each α ∈ σ(AB).

We will now give a relation between the geometric multiplicity and the algebraic multiplicity.

Theorem 6.6.1.15. Let A ∈ Mn (C). Then, for α ∈ σ(A), Geo.Mulα (A) ≤ Alg.Mulα (A).

Proof. Let Geo.Mulα (A) = k. Suppose Q1 = {v1 , . . . , vk } is an orthonormal basis of Null(A−


αI). Extend Q1 to get {v1 , . . . , vk , vk+1 , . . . , vn } as an orthonormal basis of Cn . Put P =
[v1 , . . . , vk , vk+1 , . . . , vn ]. Then P ∗ = P −1 and

P*AP = P*[Av1, . . . , Avk, Avk+1, . . . , Avn] = [v1*; . . . ; vn*] [αv1, . . . , αvk, ∗, . . . , ∗].

Since the vi's are orthonormal, vi*vj = δij, so the first k columns of P*AP are αe1, . . . , αek. Hence

P*AP = [ αIk  ∗ ; 0  D ],

where D is the lower right (n − k) × (n − k) block. Then

pA(x) = det(A − xI) = det(P*AP − xI) = (α − x)^k det(D − xI).

So, Alg.Mulα (A) = Alg.Mulα (P ∗ AP ) ≥ k = Geo.Mulα (A).
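The two multiplicities are easy to compare numerically. A small sketch (not from the original notes), assuming NumPy and SciPy are available; the matrix is the one from Example 6.6.1.7.2:

```python
import numpy as np
from scipy.linalg import null_space

# Matrix from Example 6.6.1.7.2: the eigenvalue 1 is repeated.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
alpha = 1.0

# Algebraic multiplicity: multiplicity of alpha as a root of p_A.
eigvals = np.linalg.eigvals(A)
alg_mul = int(np.sum(np.isclose(eigvals, alpha)))

# Geometric multiplicity: dim Null(A - alpha I).
geo_mul = null_space(A - alpha * np.eye(2)).shape[1]

print(alg_mul, geo_mul)   # 2 and 1, so Geo.Mul <= Alg.Mul (Theorem 6.6.1.15)
```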

Exercise 6.6.1.16. 1. Let A ∈ Mm×n (R) and B ∈ Mn×m (R).

(a) If α ∈ σ(AB) and α ≠ 0 then


i. α ∈ σ(BA).
ii. Alg.Mulα (AB) = Alg.Mulα (BA).
iii. Geo.Mulα (AB) = Geo.Mulα (BA).

(b) If 0 ∈ σ(AB) and n = m then Alg.Mul0 (AB) = Alg.Mul0 (BA) as there are n
eigenvalues, counted with multiplicity.
(c) Give an example to show that Geo.Mul0 (AB) need not equal Geo.Mul0 (BA) even
when n = m.

2. Let A ∈ Mn (R) be an invertible matrix and let x, y ∈ Rn with x ≠ 0 and yᵀA⁻¹x ≠ 0.


Define B = xyT A−1 . Then prove that

(a) λ0 = yT A−1 x is an eigenvalue of B of multiplicity 1.



(b) 0 is an eigenvalue of B of multiplicity n − 1 [Hint: Use Exercise 6.6.1.16.1a].

(c) 1 + αλ0 is an eigenvalue of I + αB of multiplicity 1, for any α ∈ R.

(d) 1 is an eigenvalue of I + αB of multiplicity n − 1, for any α ∈ R.

(e) det(A + αxyᵀ) equals (1 + αλ0) det(A), for any α ∈ R. This result is known as the Sherman–Morrison formula for the determinant.

3. Let A, B ∈ M2 (R) such that det(A) = det(B) and Tr(A) = Tr(B).

(a) Do A and B have the same set of eigenvalues?

(b) Give examples to show that the matrices A and B need not be similar.

4. Let A, B ∈ Mn (R). Also, let (λ1, u) and (λ2, v) be eigen-pairs of A and B, respectively.

(a) If u = αv for some α ∈ R then (λ1 + λ2 , u) is an eigen-pair for A + B.

(b) Give an example to show that if u and v are linearly independent then λ1 + λ2 need
not be an eigenvalue of A + B.

5. Let A ∈ Mn(R) be an invertible matrix with eigen-pairs (λ1, u1), . . . , (λn, un). Then prove that B = {u1, . . . , un} forms a basis of Rn. If [b]B = (c1, . . . , cn)ᵀ then the system Ax = b has the unique solution
   x = (c1/λ1) u1 + (c2/λ2) u2 + · · · + (cn/λn) un.

6.2 Diagonalization
Let A ∈ Mn (C) and let T ∈ L(Cn ) be defined by T (x) = Ax, for all x ∈ Cn . In this section, we
first find conditions under which one can obtain a basis B of Cn such that T [B, B] is a diagonal
matrix. And, then it is shown that normal matrices satisfy the above conditions. To start with,
we have the following definition.

Definition 6.6.2.1. [Matrix Diagonalization] A matrix A is said to be diagonalizable if A is


similar to a diagonal matrix. Or equivalently, P −1 AP = D ⇔ AP = P D, for some diagonal
matrix D and invertible matrix P.

Example 6.6.2.2. 1. Let A = [ 0 1; 0 0 ]. Then, A cannot be diagonalized.
solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1 , d2 ). Thus,
by Theorem 6.6.1.13, {d1 , d2 } = σ(D) = σ(A) = {0, 0}. Hence, D = 0 and therefore,
A = SDS −1 = 0, a contradiction.
2. Let A = [ 2 1 1; 0 2 1; 0 0 2 ]. Then, A cannot be diagonalized.
solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1 , d2 , d3 ). Thus,
by Theorem 6.6.1.13, {d1 , d2 , d3 } = σ(D) = σ(A) = {2, 2, 2}. Hence, D = 2I3 and

therefore, A = SDS −1 = 2I3 , a contradiction.



" #      
0 1 i −i
DR

3. Let A = . Then i, and −i, are two eigen-pairs of A. Define


−1 0 1 1
" # " #
i −i −i 0
U = √12 . Then U ∗ U = I2 = U U ∗ and U ∗ AU = .
1 1 0 i
Theorem 6.6.2.3. Let A ∈ Mn (R).
1. Let S be an invertible matrix such that S −1 AS = diag(d1 , . . . , dn ). Then, for 1 ≤ i ≤ n,
the i-th column of S is an eigenvector of A corresponding to di .
2. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof. Let S = [u1 , . . . , un ]. Then AS = SD gives

[Au1 , . . . , Aun ] = A [u1 , . . . , un ] = AS = SD = S diag(d1 , . . . , dn ) = [d1 u1 , . . . , dn un ] .

Or equivalently, Aui = di ui , for 1 ≤ i ≤ n. As S is invertible, {u1 , . . . , un } are linearly


independent. Hence, (di , ui ), for 1 ≤ i ≤ n, are eigen-pairs of A. This proves Part 1 and “only
if” part of Part 2.
Conversely, let {u1 , . . . , un } be n linearly independent eigenvectors of A corresponding to
eigenvalues α1, . . . , αn. Then, by Theorem 3.3.2.8, S = [u1, . . . , un] is non-singular and

AS = [Au1, . . . , Aun] = [α1u1, . . . , αnun] = [u1, . . . , un] diag(α1, . . . , αn) = SD,

where D = diag(α1 , . . . , αn ). Therefore, S −1 AS = D and hence A is diagonalizable.
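Theorem 6.6.2.3 can be seen at work numerically. A small sketch (not part of the original notes), assuming NumPy; the matrix is the symmetric one from Exercise 6.6.2.9.1 below:

```python
import numpy as np

# A symmetric matrix with 3 linearly independent eigenvectors,
# hence diagonalizable by Theorem 6.6.2.3.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

eigvals, S = np.linalg.eig(A)     # columns of S are eigenvectors

# S is invertible (its columns are linearly independent), and
# S^{-1} A S is the diagonal matrix of the eigenvalues.
D = np.linalg.inv(S) @ A @ S
assert np.allclose(D, np.diag(eigvals), atol=1e-10)
```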



Theorem 6.6.2.4. Let (α1 , v1 ), . . . , (αk , vk ) be k eigen-pairs of A ∈ Mn (C) with αi ’s distinct.


Then, {v1 , . . . , vk } is linearly independent.

Proof. Suppose {v1, . . . , vk} is linearly dependent. Then, there exists a smallest ℓ ∈ {1, . . . , k − 1} such that vℓ+1 = β1v1 + · · · + βℓvℓ, and by the minimality of ℓ we must have βℓ ≠ 0. So,

αℓ+1 vℓ+1 = αℓ+1 β1 v1 + · · · + αℓ+1 βℓ vℓ . (6.6.2.1)

and

αℓ+1 vℓ+1 = Avℓ+1 = A (β1 v1 + · · · + βℓ vℓ ) = α1 β1 v1 + · · · + αℓ βℓ vℓ . (6.6.2.2)

Now, subtracting Equation (6.6.2.2) from Equation (6.6.2.1), we get

0 = (αℓ+1 − α1 ) β1 v1 + · · · + (αℓ+1 − αℓ ) βℓ vℓ .

So, vℓ ∈ LS(v1 , . . . , vℓ−1 ), a contradiction to the choice of ℓ. Thus, the required result follows.
An immediate corollary of Theorem 6.6.2.3 and Theorem 6.6.2.4 is stated next without proof.

Corollary 6.6.2.5. Let A ∈ Mn (C) have n distinct eigenvalues. Then A is diagonalizable.

The converse of Theorem 6.6.2.4 is not true as In has n linearly independent eigenvectors

corresponding to the eigenvalue 1, repeated n times.



Corollary 6.6.2.6. Let α1, . . . , αk be k distinct eigenvalues of A ∈ Mn(C). Also, for 1 ≤ i ≤ k, let dim(Null(A − αiIn)) = ni. Then A has ∑_{i=1}^{k} ni linearly independent eigenvectors.

Proof. For 1 ≤ i ≤ k, let Si = {ui1, . . . , uini} be a basis of Null(A − αiIn). Then, we need to prove that ⋃_{i=1}^{k} Si is linearly independent. To do so, denote pj(A) = ∏_{i=1, i≠j}^{k} (A − αiIn), for 1 ≤ j ≤ k. Then note that pj(A) is a polynomial in A of degree k − 1 and

pj(A)u = 0, if u ∈ Null(A − αiIn) for some i ≠ j,  and  pj(A)u = ∏_{i≠j} (αj − αi) u, if u ∈ Null(A − αjIn).    (6.6.2.3)

So, to prove that ⋃_{i=1}^{k} Si is linearly independent, consider the linear system

c11u11 + · · · + c1n1u1n1 + · · · + ck1uk1 + · · · + cknkuknk = 0

in the unknowns cij's. Now, applying the matrix pj(A) and using Equation (6.6.2.3), we get

∏_{i≠j} (αj − αi) ( cj1uj1 + · · · + cjnjujnj ) = 0.

But ∏_{i≠j} (αj − αi) ≠ 0 as the αi's are distinct. Hence, cj1uj1 + · · · + cjnjujnj = 0. As Sj is a basis of Null(A − αjIn), we get cjt = 0, for 1 ≤ t ≤ nj. Thus, the required result follows.

Corollary 6.6.2.7. Let A ∈ Mn (C) with distinct eigenvalues α1 , . . . , αk . Then A is diagonaliz-


able if and only if Geo.Mulαi (A) = Alg.Mulαi (A), for each 1 ≤ i ≤ k.

Proof. Let Alg.Mulαi(A) = mi. Then ∑_{i=1}^{k} mi = n. Let Geo.Mulαi(A) = ni, for 1 ≤ i ≤ k. Then, by Corollary 6.6.2.6, A has ∑_{i=1}^{k} ni linearly independent eigenvectors. Also, by Theorem 6.6.1.15, ni ≤ mi, for 1 ≤ i ≤ k.
Now, let A be diagonalizable. Then, by Theorem 6.6.2.3, A has n linearly independent eigenvectors. So, ∑_{i=1}^{k} ni ≥ n. As ni ≤ mi and ∑_{i=1}^{k} mi = n, we get ni = mi, for 1 ≤ i ≤ k.
Now, assume that Geo.Mulαi(A) = Alg.Mulαi(A), for 1 ≤ i ≤ k. Then, for each i, 1 ≤ i ≤ k, ni = mi. Thus, A has ∑_{i=1}^{k} ni = ∑_{i=1}^{k} mi = n linearly independent eigenvectors. Hence, by Theorem 6.6.2.3, A is diagonalizable.

Example 6.6.2.8. Let A = [ 2 1 1; 1 2 1; 0 −1 1 ]. Then ( 1, (1, 0, −1)ᵀ ) and ( 2, (1, 1, −1)ᵀ ) are the only eigen-pairs (up to scalar multiples). Hence, by Theorem 6.6.2.3, A is not diagonalizable.

Exercise 6.6.2.9. 1. Is the matrix A = [ 2 1 1; 1 2 1; 1 1 2 ] diagonalizable?

" #
A 0
2. Let A ∈ Mn (R) and B ∈ Mm (R). Suppose C = . Then prove that C is diagonal-
DR

0 B
izable if and only if both A and B are diagonalizable.

3. Let Jn be an n × n matrix with all entries 1. Then, prove that Geo.Muln(Jn) = Alg.Muln(Jn) = 1 and Geo.Mul0(Jn) = Alg.Mul0(Jn) = n − 1.

4. Let A = [aij ] ∈ Mn (R), where aij = a, if i = j and b, otherwise. Then, verify that
A = (a − b)In + bJn . Hence, or otherwise determine the eigenvalues and eigenvectors of
Jn . Is A diagonalizable?

5. Let T : R5 −→ R5 be a linear operator with Rank(T − I) = 3 and

Null(T ) = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 + x4 + x5 = 0, x2 + x3 = 0}.

(a) Determine the eigenvalues of T ?


(b) For each distinct eigenvalue α of T , determine Geo.Mulα (T ).
(c) Is T diagonalizable? Justify your answer.

6. Let A ∈ Mn (R) with A 6= 0 but A2 = 0. Prove that A cannot be diagonalized.

7. Are the followingmatrices diagonalizable?


1 3 2 1    
  1 0 −1 1 −3 3 " #
0 2 3 1     2 i

i)   , ii) 0 0 1  , iii) 0 −5 6 and iv) .
    
0 0 −1 1 i 0
0 2 0 0 −3 4
0 0 0 4

6.2.A Schur’s Unitary Triangularization

We now prove one of the most important results in diagonalization, called the Schur’s Lemma
or Schur’s unitary triangularization.

Lemma 6.6.2.10 (Schur's unitary triangularization (SUT)). Let A ∈ Mn(C). Then there exists a unitary matrix U such that U*AU is an upper triangular matrix. Further, if A ∈ Mn(R) and σ(A) has real entries then U is a real orthogonal matrix.

Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let n > 1
and assume the result to be true for k < n and prove it for n.
Let (λ1, x1) be an eigen-pair of A with ‖x1‖ = 1. Now, extend it to form an orthonormal basis {x1, x2, . . . , xn} of Cn and define X = [x1, x2, . . . , xn]. Then X is a unitary matrix and

X*AX = X*[Ax1, Ax2, . . . , Axn] = [x1*; x2*; . . . ; xn*] [λ1x1, Ax2, . . . , Axn] = [ λ1  ∗ ; 0  B ],    (6.6.2.4)
where B ∈ Mn−1(C). Now, by the induction hypothesis there exists a unitary matrix U ∈ Mn−1(C) such that U*BU = T is an upper triangular matrix. Define Û = X [ 1 0; 0 U ]. Then, using Exercise 7.7, the matrix Û is unitary and

Û*AÛ = [ 1 0; 0 U* ] X*AX [ 1 0; 0 U ] = [ 1 0; 0 U* ] [ λ1 ∗; 0 B ] [ 1 0; 0 U ]
      = [ λ1 ∗; 0 U*B ] [ 1 0; 0 U ] = [ λ1 ∗; 0 U*BU ] = [ λ1 ∗; 0 T ].

Since T is upper triangular, [ λ1 ∗; 0 T ] is upper triangular.
Further, if A ∈ Mn (R) and σ(A) has real entries then x1 ∈ Rn with Ax1 = λ1 x1 . Now, one
uses induction once again to get the required result.

Remark 6.6.2.11. Let A ∈ Mn (C). Then, by Schur’s Lemma there exists a unitary matrix U
such that U ∗ AU = T = [tij ], a triangular matrix. Thus,

{α1 , . . . , αn } = σ(A) = σ(U ∗ AU ) = {t11 , . . . , tnn }. (6.6.2.5)

Furthermore, we can get the αi ’s in the diagonal of T in any prescribed order.
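Schur's decomposition is also available numerically. The following sketch (not part of the original notes) uses scipy.linalg.schur, which returns exactly such a pair (T, U):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[4.0, 4.0],
              [0.0, 4.0]])          # any square matrix works

# Complex Schur form: A = U T U*, with T upper triangular.
T, U = schur(A, output='complex')

assert np.allclose(U @ T @ U.conj().T, A)          # A = U T U*
assert np.allclose(U.conj().T @ U, np.eye(2))      # U is unitary
# The diagonal of T carries the eigenvalues of A (Remark 6.6.2.11).
assert np.allclose(np.sort(np.diag(T)), np.sort(np.linalg.eigvals(A)))
```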

Definition 6.6.2.12. [Unitary Equivalence] Let A, B ∈ Mn (C). Then A and B are said to be
unitarily equivalent/similar if there exists a unitary matrix U such that A = U ∗ BU .

Exercise 6.6.2.13. Use the exercises given below to conclude that the upper triangular matrix obtained in "Schur's Lemma" need not be unique.

1. Prove that B = [ 2 −1 3√2; 0 1 √2; 0 0 3 ] and C = [ 2 1 3√2; 0 1 −√2; 0 0 3 ] are unitarily equivalent.

2. Prove that D = [ 2 0 3√2; 1 1 √2; 0 0 1 ] and E = [ 2 0 3√2; −1 1 −√2; 0 0 1 ] are unitarily equivalent.

3. Let A1 = [ 2 1 4; 0 1 2; 0 0 1 ] and A2 = [ 1 1 4; 0 2 2; 0 0 3 ]. Then prove that
   (a) A1 and D are unitarily equivalent.
   (b) A2 and B are unitarily equivalent.
   (c) Do the above results contradict Exercise 5.5.1.32.6.6c? Give reasons for your answer.

4. Prove that A = [ 1 1 1; 0 2 1; 0 0 3 ] and B = [ 2 −1 √2; 0 1 0; 0 0 3 ] are unitarily equivalent.

5. Let A be a normal matrix. If all the eigenvalues of A are 0 then prove that A = 0. What
happens if all the eigenvalues of A are 1?

6. Let A ∈ Mn (C). Then prove that if x*Ax = 0, for all x ∈ Cn, then A = 0. Do these


results hold for arbitrary matrices?
" # " #
4 4 10 9
7. Show that the matrices A = and B = are similar. Is it possible to find
0 4 −4 −2
a unitary matrix U such that A = U ∗ BU ?

Remark 6.6.2.14. We know that if two matrices are unitarily equivalent then they are neces-
sarily similar as U ∗ = U −1 , for every unitary matrix U . But, similarity doesn’t imply unitary
equivalence (see Exercise 6.6.2.13.7). In numerical calculations, unitary transformations are
preferred as compared to similarity transformations due to the following main reasons:

1. Exercise 5.5.1.32.6.6c implies that ‖Ax‖ = ‖x‖, whenever A is a unitary matrix. This need not be true under a similarity change of basis.

2. As U −1 = U ∗ , for a unitary matrix, unitary equivalence is computationally simpler.

3. Also, computation of “conjugate transpose” doesn’t create round-off error in calculation.

We use Lemma 6.6.2.10 to give another proof of Theorem 6.6.1.9.


Corollary 6.6.2.15. Let A ∈ Mn(C). If σ(A) = {α1, . . . , αn} then det(A) = ∏_{i=1}^{n} αi and Tr(A) = ∑_{i=1}^{n} αi.

Proof. By Schur's Lemma there exists a unitary matrix U such that U*AU = T = [tij], a triangular matrix. By Remark 6.6.2.11, σ(A) = σ(T). Hence, det(A) = det(T) = ∏_{i=1}^{n} tii = ∏_{i=1}^{n} αi and Tr(A) = Tr(A(UU*)) = Tr(U*(AU)) = Tr(T) = ∑_{i=1}^{n} tii = ∑_{i=1}^{n} αi.

6.2.B Diagonalizability of some Special Matrices

We now use Schur’s unitary triangularization Lemma to state the main theorem of this subsec-
tion. Also, recall that A is said to be a normal matrix if AA∗ = A∗ A.

Theorem 6.6.2.16 (Spectral Theorem for Normal Matrices). Let A ∈ Mn (C). If A is a normal
matrix then there exists a unitary matrix U such that U ∗ AU = diag(α1 , . . . , αn ).

Proof. By Schur's Lemma there exists a unitary matrix U such that U*AU = T = [tij], a triangular matrix. Since A is normal, we see that

T*T = (U*AU)*(U*AU) = U*A*AU = U*AA*U = (U*AU)(U*AU)* = TT*.

Thus, T is an upper triangular matrix with T*T = TT*. Hence, by Exercise 1.1.3.6.17, T is a diagonal matrix and this completes the proof.

Exercise 6.6.2.17. Let A ∈ Mn (C). If A is either a Hermitian, skew-Hermitian or Unitary



matrix then A is a normal matrix.

We re-write Theorem 6.6.2.16 in another form to indicate that A can be decomposed into a linear combination of orthogonal projectors onto its eigen-spaces. Thus, it is independent of the choice of eigenvectors.

Remark 6.6.2.18. Let A ∈ Mn (C) be a normal matrix with eigenvalues α1 , . . . , αn .


1. Then there exists a unitary matrix U = [u1 , . . . , un ] such that
(a) In = u1 u∗1 + · · · + un u∗n .
(b) the columns of U form a set of orthonormal eigenvectors for A (use Theorem 6.6.2.3).
(c) A = A · In = A (u1 u∗1 + · · · + un u∗n ) = α1 u1 u∗1 + · · · + αn un u∗n .

2. Let α1 , . . . , αk be the distinct eigenvalues of A. Also, let Wi = Null(A − αi In ), for


1 ≤ i ≤ k, be the corresponding eigen-spaces.
(a) Then, we can group the ui ’s such that they form an orthonormal basis of Wi , for
1 ≤ i ≤ k. Hence, Cn = W1 ⊕ · · · ⊕ Wk .
(b) If Pi is the orthogonal projector onto Wi, for 1 ≤ i ≤ k, then A = α1P1 + · · · + αkPk. Thus, A depends only on the eigen-spaces and not on the computed eigenvectors.
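The decomposition in Remark 6.6.2.18.1c can be checked numerically. A small sketch (not part of the original notes), assuming NumPy and using a real symmetric (hence Hermitian, hence normal) matrix as an example:

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# eigh returns real eigenvalues and orthonormal eigenvectors (columns of U).
alpha, U = np.linalg.eigh(A)

# In = u1 u1* + ... + un un*  and  A = sum_i alpha_i u_i u_i*.
I_rebuilt = sum(np.outer(U[:, i], U[:, i].conj()) for i in range(3))
A_rebuilt = sum(alpha[i] * np.outer(U[:, i], U[:, i].conj()) for i in range(3))

assert np.allclose(I_rebuilt, np.eye(3))
assert np.allclose(A_rebuilt, A)
```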

We now give the spectral theorem for Hermitian matrices.

Theorem 6.6.2.19. Let A ∈ Mn (C) be a Hermitian matrix. Then

1. the eigenvalues αi , for 1 ≤ i ≤ n, of A are real.



2. there exists a unitary matrix U such that U ∗ AU = D, where D = diag(α1 , . . . , αn ).

Proof. The second part is immediate from Theorem 6.6.2.16 and hence the proof is omitted.
For Part 1, let (α, x) be an eigen-pair. Then Ax = αx. As A is Hermitian, A* = A. Thus, x*A = x*A* = (Ax)* = (αx)* = ᾱ x*. Hence, using x*A = ᾱ x*, we get

ᾱ x*x = (ᾱ x*)x = (x*A)x = x*(Ax) = x*(αx) = α x*x.

As x is an eigenvector, x ≠ 0. Hence, ‖x‖² = x*x ≠ 0. Thus ᾱ = α. That is, α ∈ R.


As an immediate corollary of Theorem 6.6.2.19 and the second part of Lemma 6.6.2.10, we
give the following result without proof.

Corollary 6.6.2.20. Let A ∈ Mn (R) be symmetric. Then A = U diag(α1 , . . . , αn )U ∗ , where

1. the αi ’s are all real,

2. the columns of U can be chosen to have real entries,

3. the eigenvectors that correspond to the columns of U form an orthonormal basis of Rn .



Exercise 6.6.2.21. 1. Let A be a skew-symmetric matrix. Then the eigenvalues of A are



either zero or purely imaginary and A is unitarily diagonalizable.



2. Let A be a skew-Hermitian matrix. Then, A is unitarily diagonalizable.

3. Let A be a normal matrix with (λ, x) as an eigen-pair. Then

(a) (A*)^k x, for k ∈ Z+, is also an eigenvector of A corresponding to λ.

(b) (λ̄, x) is an eigen-pair for A*. [Hint: Verify ‖A*x − λ̄x‖² = ‖Ax − λx‖².]

4. Let A be an n × n unitary matrix. Then

(a) |λ| = 1 for any eigenvalue λ of A.


(b) the eigenvectors x, y corresponding to distinct eigenvalues are orthogonal.

5. Let A be a 2 × 2 orthogonal matrix. Then prove the following:


" #
cos θ − sin θ
(a) if det(A) = 1 then A = , for some θ, 0 ≤ θ < 2π. That is, A
sin θ cos θ
counterclockwise rotates every point in R2 by an angle θ.
" #
cos θ sin θ
(b) if det A = −1 then A = , for some θ, 0 ≤ θ < 2π. That is, A
sin θ − cos θ
reflects every point in R2 about a line passing through origin. Determine"this line.
#
1 0
Or equivalently, there exists a non-singular matrix P such that P −1 AP = .
0 −1

6. Let A be a 3 × 3 orthogonal matrix. Then prove the following:



(a) if det(A) = 1 then A is a rotation about a fixed axis, in the sense that A has an
eigen-pair (1, x) such that the restriction of A to the plane x⊥ is a two dimensional
rotation in x⊥ .
(b) if det A = −1 then A corresponds to a reflection through a plane P , followed by a
rotation about the line through origin that is orthogonal to P .

7. Let A be a normal matrix. Then prove that Rank(A) equals the number of non-zero
eigenvalues of A.

6.2.C Cayley Hamilton Theorem

Let A ∈ Mn(C). Then, in Theorem 6.6.1.9, we saw that

pA(x) = det(A − xI) = (−1)^n [ x^n − a_{n−1}x^{n−1} + a_{n−2}x^{n−2} + · · · + (−1)^{n−1}a_1x + (−1)^n a_0 ]    (6.6.2.6)

for certain ai ∈ C, 0 ≤ i ≤ n − 1. Also, if α is an eigenvalue of A then pA(α) = 0. So, x^n − a_{n−1}x^{n−1} + a_{n−2}x^{n−2} + · · · + (−1)^{n−1}a_1x + (−1)^n a_0 = 0 is satisfied by n complex numbers. It turns out that the expression

A^n − a_{n−1}A^{n−1} + a_{n−2}A^{n−2} + · · · + (−1)^{n−1}a_1A + (−1)^n a_0 I = 0



holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton

Theorem. We give a proof using Schur’s unitary triangularization. To do so, we look at


multiplication of certain upper triangular matrices.

Lemma 6.6.2.22. Let A1 , . . . , An ∈ Mn (C) be upper triangular matrices such that the (i, i)-th
entry of Ai equals 0, for 1 ≤ i ≤ n. Then, A1 A2 · · · An = 0.

Proof. We use induction to prove that the first k columns of A1A2 · · · Ak are 0, for 1 ≤ k ≤ n.


The result is clearly true for k = 1 as the first column of A1 is 0. For clarity, we show that the
first two columns of A1 A2 is 0. Let B = A1 A2 . Then, by matrix multiplication

B[:, i] = A1 [:, 1](A2 )1i + A1 [:, 2](A2 )2i + · · · + A1 [:, n](A2 )ni = 0 + · · · + 0 = 0

as A1[:, 1] = 0 and (A2)ji = 0, for i = 1, 2 and j ≥ 2. So, assume that the first n − 1 columns of C = A1 · · · An−1 are 0 and let B = CAn. Then, for 1 ≤ i ≤ n, we see that

B[:, i] = C[:, 1](An)1i + C[:, 2](An)2i + · · · + C[:, n](An)ni = 0 + · · · + 0 = 0

as C[:, j] = 0, for 1 ≤ j ≤ n − 1, and (An)ni = 0 for every i (the n-th row of An is zero, since An is upper triangular with (An)nn = 0). Thus, by the induction hypothesis


the required result follows.
We now prove the Cayley Hamilton Theorem using Schur’s unitary triangularization.

Theorem 6.6.2.23 (Cayley Hamilton Theorem). Let A ∈ Mn (C). Then A satisfies its charac-
teristic equation. That is, if pA (x) = det(A−xIn ) = a0 −xa1 +· · ·+(−1)n−1 an−1 xn−1 +(−1)n xn
then
An − an−1 An−1 + an−2 An−2 + · · · + (−1)n−1 a1 A + (−1)n a0 I = 0
holds true as a matrix identity.

Proof. Let σ(A) = {α1, . . . , αn}, so that pA(x) = (−1)^n ∏_{i=1}^{n} (x − αi). By Schur's unitary triangularization there exists a unitary matrix U such that U*AU = T, an upper triangular matrix with tii = αi, for 1 ≤ i ≤ n. Now, observe that if Ai = T − αiI then the Ai's satisfy the conditions of Lemma 6.6.2.22. Hence,

(T − α1I) · · · (T − αnI) = 0.

Therefore,

pA(A) = (−1)^n ∏_{i=1}^{n} (A − αiI) = (−1)^n ∏_{i=1}^{n} (UTU* − αiUIU*) = (−1)^n U [ (T − α1I) · · · (T − αnI) ] U* = (−1)^n U 0 U* = 0.

Thus, the required result follows.


We now give some examples and then implications of the Cayley Hamilton Theorem.

Remark 6.6.2.24. 1. Let A = [ 1 2; 1 −3 ]. Then, pA(x) = x² + 2x − 5. Hence, verify that

A² + 2A − 5I2 = [ 3 −4; −2 11 ] + 2 [ 1 2; 1 −3 ] − 5 [ 1 0; 0 1 ] = 0.

Further, verify that A⁻¹ = (1/5)(A + 2I2) = (1/5) [ 3 2; 1 −1 ].
2. Let A = [ 0 1; 0 0 ]. Then pA(x) = x². So, even though A ≠ 0, A² = 0.

3. For A = [ 0 0 1; 0 0 0; 0 0 0 ], pA(x) = x³. Thus, by the Cayley Hamilton Theorem A³ = 0. But, it turns out that A² = 0.

4. Let A ∈ Mn (C) with pA (x) = a0 − xa1 + · · · + (−1)n−1 an−1 xn−1 + (−1)n xn .

(a) Then, for any ℓ ∈ N, the division algorithm gives α0 , α1 , . . . , αn−1 ∈ C and a polyno-
mial f (x) with coefficients from C such that

xℓ = f (x)pA (x) + α0 + xα1 + · · · + xn−1 αn−1 .

Hence, by the Cayley Hamilton Theorem, Aℓ = α0 I + α1 A + · · · + αn−1 An−1 .


i. Thus, to compute any power of A, one only needs to apply the division algorithm to get the αi's and to know A^i, for 1 ≤ i ≤ n − 1. This is quite helpful in numerical computation as computing powers takes much more time than division.
ii. Note that LS(I, A, A², . . .) is a subspace of Mn(C). Also, dim(Mn(C)) = n². But, the above argument implies that dim(LS(I, A, A², . . .)) ≤ n.
iii. In the language of graph theory, it says the following: “Let G be a graph on n
vertices and A its adjacency matrix. Suppose there is no path of length n − 1 or
less from a vertex v to a vertex u in G. Then, G doesn’t have a path from v to u
of any length. That is, the graph G is disconnected and v and u are in different
components of G.”

(b) Suppose A is non-singular. Then, by definition, a0 = det(A) ≠ 0. Hence,

A⁻¹ = (1/a0) [ a1I − a2A + · · · + (−1)^{n−2}a_{n−1}A^{n−2} + (−1)^{n−1}A^{n−1} ].

This matrix identity can be used to calculate the inverse (see the numerical sketch following this remark).

(c) The above also implies that if A is invertible then A⁻¹ ∈ LS(I, A, A², . . .). That is, A⁻¹ is a linear combination of the matrices I, A, . . . , A^{n−1}.
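A small numerical sketch of Remark 6.6.2.24.4 (not part of the original notes, assuming NumPy), using the 2 × 2 matrix from Remark 6.6.2.24.1:

```python
import numpy as np

# Matrix from Remark 6.6.2.24.1: p_A(x) = x^2 + 2x - 5.
A = np.array([[1.0, 2.0],
              [1.0, -3.0]])
I = np.eye(2)

# Cayley-Hamilton: A^2 + 2A - 5I = 0 holds as a matrix identity.
assert np.allclose(A @ A + 2 * A - 5 * I, 0)

# Hence A^{-1} = (A + 2I)/5, a linear combination of I and A,
# as in Remark 6.6.2.24.4b.
A_inv = (A + 2 * I) / 5
assert np.allclose(A @ A_inv, I)

# Powers of A also stay in LS(I, A): from A^2 = 5I - 2A one gets,
# for example, A^3 = 5A - 2A^2 = -10I + 9A.
assert np.allclose(np.linalg.matrix_power(A, 3), -10 * I + 9 * A)
```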
Exercise 6.6.2.25. Find the inverse of [ 2 3 4; 5 6 7; 1 1 2 ], [ −1 −1 1; 1 −1 1; 0 1 1 ] and [ 1 −2 −1; −2 1 −1; 0 −1 2 ] by the Cayley Hamilton Theorem.

Exercise 6.6.2.26. Miscellaneous Exercises: (below, (x; y) denotes the column vector obtained by stacking x over y)

1. Let B be an m × n matrix and A = [ 0 B; Bᵀ 0 ]. Then, prove that ( λ, (x; y) ) is an eigen-pair of A if and only if ( −λ, (x; −y) ) is an eigen-pair of A.

2. Let B and C be n × n matrices and A = [ B C; −C B ]. Then, prove the following:

   (a) if s is a real eigenvalue of A with corresponding eigenvector (x; y) then s is also an eigenvalue corresponding to the eigenvector (−y; x).

   (b) if s + it is a complex eigenvalue of A with corresponding eigenvector (x + iy; −y + ix) then s − it is also an eigenvalue of A with corresponding eigenvector (x − iy; −y − ix).

   (c) (s + it, x + iy) is an eigen-pair of B + iC if and only if (s − it, x − iy) is an eigen-pair of B − iC.

   (d) ( s + it, (x + iy; −y + ix) ) is an eigen-pair of A if and only if (s + it, x + iy) is an eigen-pair of B + iC.

   (e) det(A) = |det(B + iC)|².

We end this chapter with an application to the study of conic sections in analytic geometry.

6.3 Quadratic Forms


Definition 6.6.3.1. Let A ∈ Mn (C). Then A is said to be
1. positive semi-definite(psd) if x∗ Ax ≥ 0, for all x ∈ Cn .
2. positive definite(pd) if x∗ Ax > 0, for all x ∈ Cn \ {0}.
3. negative semi-definite (nsd) if x*Ax ≤ 0, for all x ∈ Cn.
4. negative definite (nd) if x*Ax < 0, for all x ∈ Cn \ {0}.

Remark 6.6.3.2. Let A = [aij ] ∈ Mn (C) be positive semi-definite (positive definite/negative


semi-definite or negative definite) matrix. Then, A is Hermitian.
Solution: By definition, aii = e∗i Aei ∈ R. Also, aii + ajj + aij + aji = (ei + ej )∗ A(ei + ej ) ∈ R.
So, Im(aij ) = −Im(aji ). Similarly, aii + ajj + iaij − iaji = (ei + iej )∗ A(ei + iej ) ∈ R implies
that Re(aij ) = Re(aji ).

Example 6.6.3.3. 1. Let A = [ 2 1; 1 2 ]. Then, A is positive definite.

2. Let A = [ 1 1; 1 1 ]. Then, A is positive semi-definite but not positive definite.

3. Let A = [ −2 1; 1 −2 ]. Then, A is negative definite.

4. Let A = [ −1 1; 1 −1 ]. Then, A is negative semi-definite.

5. Let A = [ 0 1; 1 −1 ]. Then, A is neither positive semi-definite, nor positive definite, nor negative semi-definite, nor negative definite.

Theorem 6.6.3.4. Let A ∈ Mn (C). Then the following statements are equivalent.
1. A is positive semi-definite.

2. A∗ = A and each eigenvalue of A is non-negative.



3. A = B ∗ B, for some B ∈ Mn (C).

Proof. 1 ⇒ 2: Let A be positive semi-definite. Then, by Remark 6.6.3.2, A is Hermitian. If (α, v) is an eigen-pair of A then α‖v‖² = v*Av ≥ 0. So, α ≥ 0.
2 ⇒ 3: Let σ(A) = {α1, . . . , αn}. Then, by the spectral theorem, there exists a unitary matrix U such that U*AU = D with D = diag(α1, . . . , αn). As αi ≥ 0, for 1 ≤ i ≤ n, define D^{1/2} = diag(√α1, . . . , √αn) and B = D^{1/2}U*. Then, A = U D^{1/2} [ D^{1/2} U* ] = B*B.
3 ⇒ 1: Let A = B*B. Then, for x ∈ Cn, x*Ax = x*B*Bx = ‖Bx‖² ≥ 0. Thus, the required result follows.
A similar argument gives the next result and hence the proof is omitted.

Theorem 6.6.3.5. Let A ∈ Mn (C). Then the following statements are equivalent.
1. A is positive definite.
2. A∗ = A and each eigenvalue of A is positive.
3. A = B ∗ B, for a non-singular matrix B ∈ Mn (C).
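Theorems 6.6.3.4 and 6.6.3.5 suggest two practical tests for definiteness. A small sketch (not part of the original notes, assuming NumPy): check the eigenvalues of a Hermitian matrix, or attempt a factorization A = B*B via a Cholesky decomposition.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # Hermitian (here: real symmetric)

# Test 1: A is positive definite iff all eigenvalues are positive.
assert np.all(np.linalg.eigvalsh(A) > 0)

# Test 2: Cholesky gives A = L L*, i.e. A = B*B with B = L*;
# it raises LinAlgError exactly when A is not positive definite.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.conj().T, A)

# x*Ax > 0 for a few random non-zero vectors, as in Definition 6.6.3.1.
for x in np.random.randn(5, 2):
    assert x @ A @ x > 0
```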

Definition 6.6.3.6. Let V be a vector space over F. Then,


1. for a fixed m ∈ N, a function f : Vm → F is called an m-multilinear function if f is
linear in each component. That is,

f(v1, . . . , vi−1, vi + αu, vi+1, . . . , vm) = f(v1, . . . , vi−1, vi, vi+1, . . . , vm) + α f(v1, . . . , vi−1, u, vi+1, . . . , vm)

for α ∈ F, u ∈ V and vi ∈ V, for 1 ≤ i ≤ m.



2. An m-multilinear form is also called an m-form.


3. A 2-form is called a bilinear form.

Definition 6.6.3.7. [Sesquilinear, Hermitian and Quadratic Forms] Let A = [aij ] ∈ Mn (C) be
a Hermitian matrix and let x, y ∈ Cn . Then a sesquilinear form in x, y ∈ Cn is defined as
H(x, y) = y∗ Ax. In particular, H(x, x), denoted H(x), is called the Hermitian form. In case
A ∈ Mn (R), H(x) is called the quadratic form.

Remark 6.6.3.8. Observe that


1. if A = In then the bilinear/sesquilinear form reduces to the standard inner product.
2. H(x, y) is ‘linear’ in the first component and ‘conjugate linear’ in the second component.
3. the Hermitian form H(x) is a real number. Hence, for α ∈ R, the equation H(x) = α,
represents a conic in Cn .
Example 6.6.3.9. 1. Let vi ∈ Cn , for 1 ≤ i ≤ n. Then, f (v1 , . . . , vn ) = det ([v1 , . . . , vn ])
is an n-form on Cn .
2. Let A ∈ Mn (R). Then, f (x, y) = yT Ax, for x, y ∈ Rn , is a bilinear form on Rn .
" #  
T

1 2−i ∗ x
3. Let A = . Then A = A and for x = , verify that
AF

2+i 2 y
DR

H(x) = x∗ Ax = |x|2 + 2|y|2 + 2Re ((2 − i)xy)

where ‘Re’ denotes the real part of a complex number is a sesquilinear form.

6.3.A Sylvester’s law of inertia

The main idea of this section is to express H(x) as a sum or difference of squares. Since H(x) is a quadratic in x, replacing x by cx, for c ∈ C, just gives a multiplication factor of |c|². Hence, one needs to study only the normalized vectors. Also, in Example 6.6.1.1, we expressed

xᵀAx = 3 (x + y)²/2 − (x − y)²/2   and   xᵀBx = 5 (x + y)²/2 + (x − y)²/2.

But, we can also express them as xᵀAx = 2(x + y)² − (x² + y²) and xᵀBx = 2(x + y)² + (x² + y²). Note that the first expression clearly gives the directions of maximum and minimum displacement, or the axes of the curves that they represent, whereas such deductions cannot be made from the second expression. So, in this subsection, we proceed to clarify these ideas.
Let A ∈ Mn(C) be Hermitian. Then, by Theorem 6.6.2.19, σ(A) = {α1, . . . , αn} ⊆ R and there exists a unitary matrix U such that U*AU = D = diag(α1, . . . , αn) (after reordering so that α1, . . . , αp > 0, αp+1, . . . , αr < 0 and αr+1 = · · · = αn = 0). Let x = Uz. Then ‖x‖ = 1 and U unitary implies that ‖z‖ = 1. If z = (z1, . . . , zn)* then

H(x) = z*U*AUz = z*Dz = ∑_{i=1}^{n} αi|zi|² = ∑_{i=1}^{p} |√αi zi|² − ∑_{i=p+1}^{r} |√|αi| zi|².    (6.6.3.7)

Thus, the possible values of H(x) depend only on the eigenvalues of A. Since U is an invertible
matrix, the components zi ’s of z = U ∗ x are commonly known as the linearly independent linear

forms. Note that each zi is a linear expression in the components of x. Also, note that in
Equation (6.6.3.7), p corresponds to the number of positive eigenvalues and r − p to the number
of negative eigenvalues. So, as a next result, we show that in any expression of H(x) as a sum or
difference of n absolute squares of linearly independent linear forms, the number p (respectively,
r − p) gives the number of positive (respectively, negative) eigenvalues of A. This is popularly
known as the ‘Sylvester’s law of inertia’.

Lemma 6.6.3.10 (Sylvester’s law of inertia). Let A ∈ Mn (C) be a Hermitian matrix and let
x ∈ Cn . Then every Hermitian form H(x) = x∗ Ax, in n variables can be written as

H(x) = |y1 |2 + · · · + |yp |2 − |yp+1 |2 − · · · − |yr |2

where y1 , . . . , yr are linearly independent linear forms in the components of x and the integers
p and r satisfying 0 ≤ p ≤ r ≤ n, depend only on A.

Proof. Equation (6.6.3.7) implies that H(x) has the required form. We only need to show that
p and r are uniquely determined by A. Hence, let us assume on the contrary that there exist
p, q, r, s ∈ N with p > q such that
H(x) = |y1|² + · · · + |yp|² − |yp+1|² − · · · − |yr|²
     = |z1|² + · · · + |zq|² − |zq+1|² − · · · − |zs|²,    (6.6.3.8)

where y = (y1, . . . , yn)* = Mx and z = (z1, . . . , zn)* = Nx for some invertible matrices M and N. Hence, z = By, for B = NM⁻¹, an invertible matrix. Let us write Y1 = (y1, . . . , yp)*, Z1 = (z1, . . . , zq)* and B = [ B1 B2; B3 B4 ], where B1 is a q × p matrix. As p > q, the homogeneous linear system B1Y1 = 0 has a non-zero solution, say Ỹ1 = (ỹ1, . . . , ỹp)*, and let ỹ = (Ỹ1; 0) ∈ Cn. Choose x with Mx = ỹ. Then Z1 = B1Ỹ1 = 0 and thus, using Equation (6.6.3.8), we have

H(x) = |ỹ1|² + |ỹ2|² + · · · + |ỹp|² = −(|zq+1|² + · · · + |zs|²).

Now, this can hold only if ỹ1 = · · · = ỹp = 0, a contradiction to Ỹ1 ≠ 0. Hence p = q. Similarly, the case r > s can be resolved. Thus, the proof of the lemma is over.

Remark 6.6.3.11. Since A is Hermitian, Rank(A) equals the number of non-zero eigenvalues.
Hence, Rank(A) = r. The number r is called the rank and the number r − 2p is called the
inertial degree of the Hermitian form H(x).

Definition 6.6.3.12. [Associated Quadratic Form] Let f(x, y) = ax² + 2hxy + by² + 2fx + 2gy + c be a general quadratic in x and y, with coefficients from R. Then,

H(x) = xᵀAx = [ x  y ] [ a h; h b ] [ x; y ] = ax² + 2hxy + by²

is called the associated quadratic form of the conic f (x, y) = 0.



We now obtain conditions on the eigenvalues of A, corresponding to the associated quadratic


form, to characterize conic sections in R2 , with respect to the standard inner product.

Proposition 6.6.3.13. Consider the general quadratic f (x, y), for a, b, c, g, f, h ∈ R. Then
f (x, y) = 0 represents

1. an ellipse or a circle if ab − h2 > 0,

2. a parabola or a pair of parallel lines if ab − h2 = 0,

3. a hyperbola or a pair of intersecting lines if ab − h2 < 0.

Proof. As A is symmetric, by Corollary 6.6.2.20, A = U diag(α1 , α2 )U T , where U = [u1 , u2 ] is an


orthogonal matrix, with (α1 , u1 ) and (α2 , u2 ) as eigen-pairs of A. Let [u, v] = xT U . As u1 and
u2 are orthogonal, u and v represent orthogonal lines passing through origin in the (x, y)-plane.
In most cases, these lines form the principal axes of the conic.
We also have xT Ax = α1 u2 + α2 v 2 and hence f (x, y) = 0 reduces to

α1 u² + α2 v² + 2g1 u + 2f1 v + c = 0.    (6.6.3.9)

for some g1 , f1 ∈ R. Now, we consider different cases depending of the values of α1 , α2 :



1. If α1 = 0 = α2 then A = 0 and Equation (6.6.3.9) gives the straight line 2gx + 2f y + c = 0.



2. if α1 = 0 and α2 ≠ 0 then ab − h² = det(A) = α1α2 = 0. So, after dividing by α2,



Equation (6.6.3.9) reduces to (v + d1 )2 = d2 u + d3 , for some d1 , d2 , d3 ∈ R. Hence, let us


look at the possible subcases:
(a) Let d2 = d3 = 0. Then v + d1 = 0 is a pair of coincident lines.
(b) Let d2 = 0, d3 ≠ 0.
    i. If d3 > 0, then we get a pair of parallel lines given by v = −d1 ± √(d3/α2).
ii. If d3 < 0, the solution set of the corresponding conic is an empty set.
(c) If d2 6= 0. Then the given equation is of the form Y 2 = 4aX for some translates
X = x + α and Y = y + β and thus represents a parabola.
Let H(x) = x² + 4y² + 4xy be the associated quadratic form for a class of curves. Then, A = [ 1 2; 2 4 ], α1 = 0, α2 = 5 and v = x + 2y. Now, let d1 = −3 and vary d2 and d3 to get different curves (see Figure 6.2, drawn using the package "MATHEMATICA").

3. α1 > 0 and α2 < 0. Then ab − h² = det(A) = α1α2 < 0. If α2 = −β2, for β2 > 0, then
Equation (6.6.3.9) reduces to

α1 (u + d1 )2 − β2 (v + d2 )2 = d3 , for some d1 , d2 , d3 ∈ R (6.6.3.10)

whose understanding requires the following subcases:


(a) If d3 = 0 then Equation (6.6.3.10) equals
( √α1 (u + d1) + √β2 (v + d2) ) · ( √α1 (u + d1) − √β2 (v + d2) ) = 0

or equivalently, a pair of intersecting straight lines in the (u, v)-plane.



Figure 6.2: Curves for d2 = 0 = d3, d2 = 0, d3 = 1 and d2 = 1, d3 = 1

(b) Without loss of generality, let d3 > 0. Then Equation (6.6.3.10) equals

α1(u + d1)²/d3 − β2(v + d2)²/d3 = 1,

or equivalently, a hyperbola with orthogonal principal axes u + d1 = 0 and v + d2 = 0.
Let H(x) = 10x² − 5y² + 20xy be the associated quadratic form for a class of curves. Then, A = [ 10 10; 10 −5 ], α1 = 15, α2 = −10 and √5 u = 2x + y, √5 v = x − 2y. Now, let d1 = 1/√5, d2 = −1/√5 to get 3(2x + y + 1)² − 2(x − 2y − 1)² = d3. Now vary d3 to get different curves (see Figure 6.3, drawn using the package "MATHEMATICA").

4. α1, α2 > 0. Then, ab − h² = det(A) = α1α2 > 0 and Equation (6.6.3.9) reduces to

α1(u + d1)² + α2(v + d2)² = d3, for some d1, d2, d3 ∈ R.    (6.6.3.11)

We consider the following three subcases to understand this.
(a) If d3 = 0 then the solution set is the single point u + d1 = 0 = v + d2, the intersection of the two orthogonal lines u + d1 = 0 and v + d2 = 0.
(b) If d3 < 0 then the solution set of Equation (6.6.3.11) is an empty set.
(c) If d3 > 0 then Equation (6.6.3.11) reduces to α1(u + d1)²/d3 + α2(v + d2)²/d3 = 1, an ellipse or circle with u + d1 = 0 and v + d2 = 0 as the orthogonal principal axes.
Let H(x) = 6x² + 9y² + 4xy be the associated quadratic form for a class of curves. Then, A = [ 6 2; 2 9 ], α1 = 10, α2 = 5 and √5 u = x + 2y, √5 v = 2x − y. Now, let d1 = 1/√5, d2 = −1/√5 to get 2(x + 2y + 1)² + (2x − y − 1)² = d3. Now vary d3 to get different curves (see Figure 6.4, drawn using the package "MATHEMATICA").

Thus, we have considered all the possible cases and the required result follows.
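The classification in Proposition 6.6.3.13 is easy to carry out numerically for a given quadratic. A small sketch (not part of the original notes, assuming NumPy), using the quadratic from case 4 above:

```python
import numpy as np

# Associated quadratic form 6x^2 + 9y^2 + 4xy, i.e. A = [[a, h], [h, b]].
a, h, b = 6.0, 2.0, 9.0
A = np.array([[a, h],
              [h, b]])

# ab - h^2 = det(A) = alpha1 * alpha2, the product of the eigenvalues.
alphas, U = np.linalg.eigh(A)
assert np.isclose(a * b - h**2, np.prod(alphas))

if a * b - h**2 > 0:
    kind = "ellipse/circle (possibly degenerate)"
elif np.isclose(a * b - h**2, 0):
    kind = "parabola or a pair of parallel lines"
else:
    kind = "hyperbola or a pair of intersecting lines"

# The columns of U (eigenvectors) give the principal axes u, v.
print(kind, U)
```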

Figure 6.3: Curves for d3 = 0, d3 = 1 and d3 = −1

Figure 6.4: Curves for d3 = 0 and d3 = 5

" # " #
x   u
Remark 6.6.3.14. Observe that the condition = u1 u2 implies that the principal
y v
axes of the conic are functions of the eigenvectors u1 and u2 .

Exercise 6.6.3.15. Sketch the graph of the following surfaces:

1. x2 + 2xy + y 2 − 6x − 10y = 3.

2. 2x2 + 6xy + 3y 2 − 12x − 6y = 5.

3. 4x2 − 4xy + 2y 2 + 12x − 8y = 10.



4. 2x2 − 6xy + 5y 2 − 10x + 4y = 7.

As a last application,
  we  consider
 a quadraticin 3 variables, namely x, y and z. To do so, let
a d e x l y1
     
A = d b f , x = y , b = m and y = y2  with 
e f c z n y3

f (x, y, z) = xT Ax + 2bT x + q (6.6.3.12)


= ax2 + by 2 + cz 2 + 2dxy + 2exz + 2f yz + 2lx + 2my + 2nz + q(6.6.3.13)

Then, we now observe the following:

1. As A is symmetric, P T AP = diag(α1 , α2 , α3 ), where P = [u1 , u2 , u3 ] is an orthogonal


matrix and (αi , ui ), for i = 1, 2, 3 are eigen-pairs of A.

2. Let y = P T x. Then f (x, y, z) reduces to

g(y1 , y2 , y3 ) = α1 y12 + α2 y22 + α3 y32 + 2l1 y1 + 2l2 y2 + 2l3 y3 + q. (6.6.3.14)

3. Depending on the values of the αi's, rewrite g(y1, y2, y3) to determine the center and the planes of symmetry of f(x, y, z) = 0.



Example 6.6.3.16. Determine the following quadrics f (x, y, z) = 0, where



1. f (x, y, z) = 2x2 + 2y 2 + 2z 2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2.

2. f (x, y, z) = 3x2 − y 2 + z 2 + 10.

3. f (x, y, z) = 3x2 − y 2 + z 2 − 10.

4. f (x, y, z) = 3x2 − y 2 + z − 10.


   
2 1 1 2
   
Solution: Part 1 Here, A = 1 2 1, b = 1
  
 and q = 2. So, the orthogonal matrices
1 1 2 2
   
√1 √1 √1 4 0 0
 13 −12 16   
P = √
 3

2
√  and P AP = 0 1 0. Hence, f (x, y, z) = 0 reduces to
6 
T
 
√1 0 −2
√ 0 0 1
3 6

5 1 1 9
4(y1 + √ )2 + (y2 + √ )2 + (y3 − √ )2 = .
4 3 2 6 12
   −5   
√ −3
x 4 3 4
√−1 
2 2 2 9 
So, the standard form of the quadric is 4z1 + z2 + z3 = 12 , where y = P  2  = 14  is
 
√1 −3
z 4
6
the center and x + y + z = 0, x − y = 0 and x + y − 2z = 0 as the principal axes.
Part 2 Here f(x, y, z) = 0 reduces to y²/10 − 3x²/10 − z²/10 = 1, which is the equation of a hyperboloid
consisting of two sheets with center 0 and the axes x, y and z as the principal axes.

Part 3 Here f(x, y, z) = 0 reduces to 3x²/10 − y²/10 + z²/10 = 1, which is the equation of a hyperboloid
consisting of one sheet with center 0 and the axes x, y and z as the principal axes.
Part 4 Here f (x, y, z) = 0 reduces to z = y 2 − 3x2 + 10 which is the equation of a hyperbolic
paraboloid.
The different curves are given in Figure 6.5. These curves have been drawn using the package
”MATHEMATICA”.

Figure 6.5: Ellipsoid, Hyperboloid of two sheets and one sheet, Hyperbolic Paraboloid
Chapter 7

Appendix

7.1 Permutation/Symmetric Groups


Definition 7.7.1.1. For a positive integer n, denote [n] = {1, 2, . . . , n}. A function f : A → B
is called
1. one-one/injective if f(x) = f(y), for x, y ∈ A, necessarily implies that x = y.

2. onto/surjective if for each b ∈ B there exists a ∈ A such that f (a) = b.



3. a bijection if f is both one-one and onto.



Example 7.7.1.2. Let A = {1, 2, 3}, B = {a, b, c, d} and C = {α, β, γ}. Then, the function
1. j : A → B defined by j(1) = a, j(2) = c and j(3) = c is neither one-one nor onto.
2. f : A → B defined by f (1) = a, f (2) = c and f (3) = d is one-one but not onto.
3. g : B → C defined by g(a) = α, g(b) = β, g(c) = α and g(d) = γ is onto but not one-one.
4. h : B → A defined by h(a) = 2, h(b) = 2, h(c) = 3 and h(d) = 1 is onto.
5. h ◦ f : A → A is a bijection.
6. g ◦ f : A → C is neither one-one nor onto.

Remark 7.7.1.3. Let f : A → B and g : B → C be functions. Then, the composition of


functions, denoted g ◦ f , is a function from A to C defined by (g ◦ f )(a) = g(f (a)). Also, if
1. f and g are one-one then g ◦ f is one-one.
2. f and g are onto then g ◦ f is onto.

Thus, if f and g are bijections then so is g ◦ f .

Definition 7.7.1.4. A function f : [n] → [n] is called a permutation on n elements if f is a


bijection. For example, f, g : [2] → [2] defined by f (1) = 1, f (2) = 2 and g(1) = 2, g(2) = 1 are
permutations.

Exercise 7.7.1.5. Let S3 be the set consisting of all permutation on 3 elements. Then prove
that S3 has 6 elements. Moreover, they are one of the 6 functions given below.

1. f1 (1) = 1, f1 (2) = 2 and f1 (3) = 3.


2. f2 (1) = 1, f2 (2) = 3 and f2 (3) = 2.
3. f3 (1) = 2, f3 (2) = 1 and f3 (3) = 3.
4. f4 (1) = 2, f4 (2) = 3 and f4 (3) = 1.
5. f5 (1) = 3, f5 (2) = 1 and f5 (3) = 2.
6. f6 (1) = 3, f6 (2) = 2 and f6 (3) = 1.

Remark 7.7.1.6. Let f : [n] → [n] be a bijection. Then, the inverse of f, denoted f⁻¹, is defined by f⁻¹(m) = ℓ whenever f(ℓ) = m, for m ∈ [n]; it is well defined and f⁻¹ is a bijection.
For example, in Exercise 7.7.1.5, note that fi−1 = fi , for i = 1, 2, 3, 6 and f4−1 = f5 .

Remark 7.7.1.7. Let Sn = {f : [n] → [n] | f is a permutation}. Then, Sn has n! elements and
forms a group with respect to composition of functions, called product, due to the following.

1. Let f ∈ Sn . Then
(a) f can be written as f = ( 1  2  · · ·  n ; f(1)  f(2)  · · ·  f(n) ), called the two row notation.

(b) f is one-one. Hence, {f(1), f(2), . . . , f(n)} = [n] and thus, f(1) ∈ [n], f(2) ∈ [n] \ {f(1)}, . . . , and finally f(n) is the unique element of [n] \ {f(1), . . . , f(n−1)}. Therefore, there are n choices for f(1), n − 1 choices for f(2) and so on. Hence, the number of elements in Sn equals n(n − 1) · · · 2 · 1 = n!.

2. By Remark 7.7.1.3, f ◦ g ∈ Sn , for any f, g ∈ Sn .

3. Also associativity holds as f ◦ (g ◦ h) = (f ◦ g) ◦ h for all functions f, g and h.

4. Sn has a special permutation called the identity permutation, denoted Idn , such that
Idn (i) = i, for 1 ≤ i ≤ n.

5. For each f ∈ Sn , f −1 ∈ Sn and is called the inverse of f as f ◦ f −1 = f −1 ◦ f = Idn .

Lemma 7.7.1.8. Fix a positive integer n. Then, the group Sn satisfies the following:

1. Fix an element f ∈ Sn . Then Sn = {f ◦ g : g ∈ Sn } = {g ◦ f : g ∈ Sn }.

2. Sn = {g−1 : g ∈ Sn }.

Proof. Part 1: Note that for each α ∈ Sn the functions f −1 ◦α, α◦f −1 ∈ Sn and α = f ◦(f −1 ◦α)
as well as α = (α ◦ f −1 ) ◦ f .
Part 2: Note that for each f ∈ Sn , by definition, (f −1 )−1 = f . Hence the result holds.

Definition 7.7.1.9. Let f ∈ Sn. Then, the number of inversions of f, denoted n(f), equals

n(f) = | {(i, j) : i < j, f(i) > f(j)} |
     = ∑_{i=1}^{n} | {j : i + 1 ≤ j ≤ n, f(j) < f(i)} |,  using the two row notation.    (7.7.1.1)
Example 7.7.1.10. 1. For f = ( 1 2 3 4 ; 3 2 1 4 ), n(f) = |{(1, 2), (1, 3), (2, 3)}| = 3.

2. In Exercise 7.7.1.5, n(f5 ) = 2 + 0 = 2.


3. Let f = ( 1 2 3 4 5 6 7 8 9 ; 4 2 3 5 1 9 8 7 6 ). Then n(f) = 3 + 1 + 1 + 1 + 0 + 3 + 2 + 1 = 12.

Definition 7.7.1.11. [Cycle Notation] Let f ∈ Sn. Suppose there exist r, 2 ≤ r ≤ n, and i1, . . . , ir ∈ [n] such that f(i1) = i2, f(i2) = i3, . . . , f(ir) = i1 and f(j) = j for all j ≠ i1, . . . , ir. Then, we represent such a permutation by f = (i1, i2, . . . , ir) and call it an r-cycle. For example, f = ( 1 2 3 4 5 ; 4 2 3 5 1 ) = (1, 4, 5) and ( 1 2 3 4 5 ; 1 3 2 4 5 ) = (2, 3).
Remark 7.7.1.12. 1. One also write the r-cycle (i1 , i2 , . . . , ir ) as (i2 , i3 , . . . , ir , i1 ) and so
on. For example, (1, 4, 5) = (4, 5, 1) = (5, 1, 4).
2. The permutation f = ( 1 2 3 4 5 ; 4 3 2 5 1 ) is not a cycle.
3. Let f = (1, 3, 5, 4) and g = (2, 4, 1) be two cycles. Then, their product, denoted f ◦ g or

(1, 3, 5, 4)(2, 4, 1) equals (1, 2)(3, 5, 4). The calculation proceeds as (the arrows indicate the

images):

1 → 2. Note (f ◦ g)(1) = f (g(1)) = f (2) = 2.


2 → 4 → 1 as (f ◦ g)(2) = f (g(2)) = f (4) = 1. So, (1, 2) forms a cycle.
3 → 5 as (f ◦ g)(3) = f (g(3)) = f (3) = 5.
5 → 4 as (f ◦ g)(5) = f (g(5)) = f (5) = 4.
4 → 1 → 3 as (f ◦ g)(4) = f (g(4)) = f (1) = 3. So, the other cycle is (3, 5, 4).

4. Let f = (1, 4, 5) and g = (2, 4, 1) be two permutations. Then, (1, 4, 5)(2, 4, 1) = (1, 2, 5)(4) =
(1, 2, 5) as 1 → 2, 2 → 4 → 5, 5 → 1, 4 → 1 → 4 and
(2, 4, 1)(1, 4, 5) = (1)(2, 4, 5) = (2, 4, 5) as 1 → 4 → 1, 2 → 4, 4 → 5, 5 → 1 → 2.
5. Even though ( 1 2 3 4 5 ; 4 3 2 5 1 ) is not a cycle, verify that it is a product of the cycles
(1, 4, 5) and (2, 3).

Definition 7.7.1.13. A permutation f ∈ Sn is called a transposition if there exist m, r ∈ [n]


such that f = (m, r).

Remark 7.7.1.14. Verify that


1. (2, 4, 5) = (2, 5)(2, 4) = (4, 2)(4, 5) = (5, 4)(5, 2) = (1, 2)(1, 5)(1, 4)(1, 2).

2. in general, the r-cycle (i1 , . . . , ir ) = (1, i1 )(1, ir )(1, ir−1 ) · · · (1, i2 )(1, i1 ).

3. So, every r-cycle can be written as product of transpositions. Furthermore, they can be
written using the n transpositions (1, 2), (1, 3), . . . , (1, n).

With the above definitions, we state and prove two important results.

Theorem 7.7.1.15. Let f ∈ Sn . Then f can be written as product of transpositions.

Proof. Note that using use Remark 7.7.1.14, we just need to show that f can be written as
product of disjoint cycles.
Consider the set S = {1, f (1), f (2) (1) = (f ◦ f )(1), f (3) (1) = (f ◦ (f ◦ f ))(1), . . .}. As S is an
infinite set and each f (i) (1) ∈ [n], there exist i, j with 0 ≤ i < j ≤ n such that f (i) (1) = f (j) (1).
Now, let j1 be the least positive integer such that f (i) (1) = f (j1 ) (1), for some i, 0 ≤ i < j1 .
Then, we claim that i = 0.
For if, i − 1 ≥ 0 then j1 − 1 ≥ 1 and the condition that f is one-one gives
   
f (i−1) (1) = (f −1 ◦ f (i) )(1) = f −1 f (i) (1) = f −1 f (j1 ) (1) = (f −1 ◦ f (j1 ) )(1) = f (j1−1) (1).

Thus, we see that the repetition has occurred at the (j1 − 1)-th instant, contradicting the
assumption that j1 was the least such positive integer. Hence, we conclude that i = 0. Thus,
(1, f (1), f (2) (1), . . . , f (j1 −1) (1)) is one of the cycles in f .
Now, choose i1 ∈ [n] \ {1, f (1), f (2) (1), . . . , f (j1 −1) (1)} and proceed as above to get another
cycle. Let the new cycle be (i1, f(i1), . . . , f^{(j2−1)}(i1)). Then, since f is one-one, it follows that

{ 1, f(1), f^{(2)}(1), . . . , f^{(j1−1)}(1) } ∩ { i1, f(i1), . . . , f^{(j2−1)}(i1) } = ∅.

So, the above process needs to be repeated at most n times to get all the disjoint cycles. Thus,
the required result follows.
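The algorithm in this proof (follow the orbit of each element until it returns to its starting point) is straightforward to implement. A small sketch (not part of the original notes):

```python
def disjoint_cycles(f):
    """Decompose a permutation of [n] = {1, ..., n} into disjoint cycles.

    f is a dict mapping i -> f(i) for i in 1..n.
    Cycles of length 1 are suppressed, as in the remark that follows.
    """
    n = len(f)
    seen, cycles = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        # Follow start, f(start), f(f(start)), ... until it returns.
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = f[j]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

# ( 1 2 3 4 5 6 7 8 9 ; 4 2 3 5 1 9 8 7 6 ) = (1,4,5)(6,9)(7,8).
f = {1: 4, 2: 2, 3: 3, 4: 5, 5: 1, 6: 9, 7: 8, 8: 7, 9: 6}
print(disjoint_cycles(f))   # [(1, 4, 5), (6, 9), (7, 8)]
```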

Remark 7.7.1.16. Note that when one writes a permutation as product of disjoint cycles, cycles
of length 1 are suppressed so as to match Definition 7.7.1.11. For example, the algorithm in the
proof of Theorem 7.7.1.15 implies

1. Using Remark 7.7.1.14.3, we see that every permutation can be written as product of the
n transpositions (1, 2), (1, 3), . . . , (1, n).
!
1 2 3 4 5
2. = (1)(2, 4, 5)(3) = (2, 4, 5).
1 4 3 5 2
!
1 2 3 4 5 6 7 8 9
3. = (1, 4, 5)(2)(3)(6, 9)(7, 8) = (1, 4, 5)(6, 9)(7, 8).
4 2 3 5 1 9 8 7 6

Note that Id3 = (1, 2)(1, 2) = (1, 2)(2, 3)(1, 2)(1, 3), as well. The question arises, is it possible
to write Idn as a product of odd number of transpositions? The next lemma answers this
question in negative.

Lemma 7.7.1.17. Suppose there exist transpositions fi , 1 ≤ i ≤ t, such that

Idn = f1 ◦ f2 ◦ · · · ◦ ft ,

then t is even.

Proof. We will prove the result by mathematical induction. Observe that t 6= 1 as Idn is not a
transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds for
all expressions in which the number of transpositions t ≤ k. Now, let t = k + 1.
Suppose f1 = (m, r) and let ℓ, s ∈ [n] \ {m, r}. Then, the possible choices for the com-
position f1 ◦ f2 are (m, r)(m, r) = Idn , (m, r)(m, ℓ) = (r, ℓ)(r, m), (m, r)(r, ℓ) = (ℓ, r)(ℓ, m)
and (m, r)(ℓ, s) = (ℓ, s)(m, r). In the first case, f1 and f2 can be removed to obtain Idn =
f3 ◦ f4 ◦ · · · ◦ ft , where the number of transpositions is t − 2 = k − 1 < k. So, by mathematical
induction, t − 2 is even and hence t is also even.
In the remaining cases, the expression for f1 ◦ f2 is replaced by their counterparts to obtain
another expression for Idn . But in the new expression for Idn , m doesn’t appear in the first trans-
position, but appears in the second transposition. The shifting of m to the right can continue
till the number of transpositions reduces by 2 (which in turn gives the result by mathematical
induction). For if, the shifting of m to the right doesn’t reduce the number of transpositions
then m will get shifted to the right and will appear only in the right most transposition. Then,
this expression for Idn does not fix m whereas Idn (m) = m. So, the later case leads us to a
contradiction. Hence, the shifting of m to the right will surely lead to an expression in which the
number of transpositions at some stage is t − 2 = k − 1. At this stage, one applies mathematical induction to get the required result.



Theorem 7.7.1.18. Let f ∈ Sn . If there exist transpositions g1 , . . . , gk and h1 , . . . , hℓ with



f = g1 ◦ g2 ◦ · · · ◦ gk = h1 ◦ h2 ◦ · · · ◦ hℓ

then, either k and ℓ are both even or both odd.

Proof. As g1 ◦ · · · ◦ gk = h1 ◦ · · · ◦ hℓ and h−1 = h for any transposition h ∈ Sn , we have

Idn = g1 ◦ g2 ◦ · · · ◦ gk ◦ hℓ ◦ hℓ−1 ◦ · · · ◦ h1 .

Hence by Lemma 7.7.1.17, k + ℓ is even. Thus, either k and ℓ are both even or both odd.

Definition 7.7.1.19. [Even and Odd Permutation] A permutation f ∈ Sn is called an


1. even permutation if f can be written as product of even number of transpositions.

2. odd permutation if f can be written as a product of odd number of transpositions.

Definition 7.7.1.20. Observe that if f and g are both even or both odd permutations, then
f ◦ g and g ◦ f are both even. Whereas, if one of them is odd and the other even then f ◦ g and
g ◦ f are both odd. We use this to define a function sgn : Sn → {1, −1}, called the signature
of a permutation, by
sgn(f) = 1 if f is an even permutation, and sgn(f) = −1 if f is an odd permutation.

Example 7.7.1.21. Consider the set Sn . Then



1. by Lemma 7.7.1.17, Idn is an even permutation and sgn(Idn ) = 1.

2. a transposition, say f , is an odd permutation and hence sgn(f ) = −1

3. using Definition 7.7.1.20, sgn(f ◦ g) = sgn(f) · sgn(g) for any two permutations f, g ∈ Sn.

We are now ready to define determinant of a square matrix A.

Definition 7.7.1.22. Let A = [aij ] be an n × n matrix with complex entries. Then the deter-
minant of A, denoted det(A), is defined as

det(A) = ∑_{g∈Sn} sgn(g) a_{1g(1)} a_{2g(2)} · · · a_{ng(n)} = ∑_{g∈Sn} sgn(g) ∏_{i=1}^{n} a_{ig(i)}.    (7.7.1.2)

For example, if S2 = {Id, f = (1, 2)} then for A = [ 1 2; 2 1 ], det(A) = sgn(Id) · a_{1Id(1)} a_{2Id(2)} + sgn(f) · a_{1f(1)} a_{2f(2)} = 1 · a11a22 + (−1) a12a21 = 1 − 4 = −3.

Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems com-
plicated at first glance, it is very helpful in proving the results related to "properties of the
determinant". We will do so in the next section. As another example, we verify that this
definition also matches the familiar one for 3 × 3 matrices. So, let A = [aij] be a 3 × 3 matrix
and let f1, . . . , f6 denote the six permutations in S3. Then, using Equation (7.7.1.2),
$$\det(A) = \sum_{\sigma \in S_3} \mathrm{sgn}(\sigma) \prod_{i=1}^{3} a_{i\sigma(i)} = \sum_{k=1}^{6} \mathrm{sgn}(f_k) \prod_{i=1}^{3} a_{i f_k(i)}
= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.$$
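
Equation (7.7.1.2) can also be evaluated directly for small n. The sketch below is illustrative only (it is not part of the notes; numpy is assumed to be available) and compares the permutation-sum formula with numpy's built-in determinant.

```python
# Illustrative sketch: the permutation-sum formula (7.7.1.2) for det(A).
from itertools import permutations
import numpy as np

def sgn(p):
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return 1 if inv % 2 == 0 else -1

def det_by_permutations(A):
    n = A.shape[0]
    # sum over all sigma in S_n of sgn(sigma) * prod_i a_{i, sigma(i)}
    return sum(sgn(s) * np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [7.0, 8.0, 9.0]])
print(det_by_permutations(A), np.linalg.det(A))  # the two values agree up to rounding
```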

7.2 Properties of Determinant


Theorem 7.7.2.1 (Properties of Determinant). Let A = [aij ] be an n × n matrix.

1. If A[i, :] = 0T for some i then det(A) = 0.

2. If B = Ei (c)A, for some c ≠ 0 and i ∈ [n] then det(B) = c det(A).

3. If B = Eij A, for some i ≠ j then det(B) = − det(A).

4. If A[i, :] = A[j, :] for some i ≠ j then det(A) = 0.

5. Let B and C be two n×n matrices. If there exists m ∈ [n] such that B[i, :] = C[i, :] = A[i, :]
for all i 6= m and C[m, :] = A[m, :] + B[m, :] then det(C) = det(A) + det(B).

6. If B = Eij (c)A, for some c ≠ 0 and i ≠ j, then det(B) = det(A).

7. If A is a triangular matrix then det(A) = a11 · · · ann , the product of the diagonal entries.

8. If E is an n × n elementary matrix then det(EA) = det(E) det(A).

9. A is invertible if and only if det(A) ≠ 0.

10. If B is an n × n matrix then det(AB) = det(A) det(B).

11. If AT denotes the transpose of the matrix A then det(A) = det(AT ).

Proof. Part 1: Note that each term in det(A) is a product containing exactly one entry from each
row of A. So, each term has a factor coming from A[i, :] = 0^T and is therefore zero. Thus, det(A) = 0.
Part 2: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = cA[i, :]. So,
$$\det(B) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} b_{k\sigma(k)}\Big) b_{i\sigma(i)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} a_{k\sigma(k)}\Big) c\, a_{i\sigma(i)}
= c \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{k=1}^{n} a_{k\sigma(k)} = c \det(A).$$
Part 3: Let τ = (i, j). Then sgn(τ) = −1 and, by Lemma 7.7.1.8, Sn = {σ ◦ τ : σ ∈ Sn}. Since
B = Eij A, we have B[i, :] = A[j, :], B[j, :] = A[i, :] and B[k, :] = A[k, :] for k ≠ i, j. Hence,
$$\begin{aligned}
\det(B) &= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{k=1}^{n} b_{k\sigma(k)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma \circ \tau) \prod_{k=1}^{n} b_{k,(\sigma \circ \tau)(k)} \\
&= \sum_{\sigma \in S_n} \mathrm{sgn}(\tau)\, \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i,j} b_{k\sigma(k)}\Big) b_{i\sigma(j)}\, b_{j\sigma(i)}
= -\sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{k=1}^{n} a_{k\sigma(k)} = -\det(A).
\end{aligned}$$

Part 4: As A[i, :] = A[j, :], A = Eij A. Hence, by Part 3, det(A) = − det(A). Thus, det(A) = 0.
Part 5: By assumption, C[i, :] = B[i, :] = A[i, :] for i ≠ m and C[m, :] = A[m, :] + B[m, :]. So,
$$\begin{aligned}
\det(C) &= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} c_{i\sigma(i)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{i \neq m} c_{i\sigma(i)}\Big) c_{m\sigma(m)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{i \neq m} c_{i\sigma(i)}\Big) \big(a_{m\sigma(m)} + b_{m\sigma(m)}\big) \\
&= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)} + \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} b_{i\sigma(i)} = \det(A) + \det(B).
\end{aligned}$$

Part 6: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = A[i, :] + cA[j, :]. So,
$$\begin{aligned}
\det(B) &= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{k=1}^{n} b_{k\sigma(k)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} b_{k\sigma(k)}\Big) b_{i\sigma(i)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} a_{k\sigma(k)}\Big) \big(a_{i\sigma(i)} + c\, a_{j\sigma(i)}\big) \\
&= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} a_{k\sigma(k)}\Big) a_{i\sigma(i)} + c \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \Big(\prod_{k \neq i} a_{k\sigma(k)}\Big) a_{j\sigma(i)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{k=1}^{n} a_{k\sigma(k)} + c \cdot 0 = \det(A),
\end{aligned}$$
where the second sum vanishes by Part 4, as it is the determinant of the matrix obtained from A
by replacing A[i, :] with A[j, :], a matrix with two equal rows.

Part 7: Observe that if σ ∈ Sn and σ ≠ Idn, then σ moves at least one element of [n]. Thus, there
exist m, m′ ∈ [n] (depending on σ) such that σ(m) > m and σ(m′) < m′. So, if A is triangular
(upper or lower), then at least one of a_{mσ(m)} and a_{m′σ(m′)} is zero. Hence, for each σ ≠ Idn,
$\prod_{i=1}^{n} a_{i\sigma(i)} = 0$, and therefore $\det(A) = \prod_{i=1}^{n} a_{ii}$, as required.
Part 8: Using Part 7, det(In) = 1. By definition, Eij = Eij In, Ei(c) = Ei(c) In and
Eij(c) = Eij(c) In, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(Ei(c)) = c, det(Eij) = −1
and det(Eij(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A).
Part 9: Suppose A is invertible. Then, by Theorem 2.2.3.1, A = E1 · · · Ek, for some elementary
matrices E1, . . . , Ek. So, a repeated application of Part 8 implies det(A) = det(E1) · · · det(Ek) ≠ 0,
as det(Ei) ≠ 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume
that A is not invertible. Then, by Theorem 2.2.3.1, Rank(A) < n. So, by Exercise 2.2.2.26.2,
there exist elementary matrices E1, . . . , Ek such that $E_1 \cdots E_k A = \begin{bmatrix} B \\ 0 \end{bmatrix}$. Therefore, Part 1
and a repeated application of Part 8 give
$$\det(E_1) \cdots \det(E_k) \det(A) = \det(E_1 \cdots E_k A) = \det\left(\begin{bmatrix} B \\ 0 \end{bmatrix}\right) = 0.$$
As det(Ei) ≠ 0, for 1 ≤ i ≤ k, we have det(A) = 0, a contradiction. Thus, A is invertible.
Part 10: Let A be invertible. Then by Theorem 2.2.3.1, A = E1 · · · Ek , for some elementary
matrices E1 , . . . , Ek . So, applying Part 8 repeatedly gives det(A) = det(E1 ) · · · det(Ek ) and

det(AB) = det(E1 · · · Ek B) = det(E1 ) · · · det(Ek ) det(B) = det(A) det(B).

In case A is not invertible, by Part 9, det(A) = 0. Also, in this case AB is not invertible (if AB
were invertible, then A would be invertible as well). So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
Part 11: Let B = [bij] = A^T. Then bij = aji, for 1 ≤ i, j ≤ n. By Lemma 7.7.1.8, we know
that Sn = {σ^{-1} : σ ∈ Sn}. As σ ◦ σ^{-1} = Idn, sgn(σ) = sgn(σ^{-1}). Hence,
$$\det(B) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} b_{i\sigma(i)}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma^{-1}) \prod_{i=1}^{n} b_{\sigma^{-1}(i),\, i}
= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma^{-1}) \prod_{i=1}^{n} a_{i,\, \sigma^{-1}(i)} = \det(A),$$
where the last equality holds because σ^{-1} runs over all of Sn as σ does.
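
The statements above are easy to probe numerically. The sketch below is illustrative only (it is not part of the notes; numpy is assumed to be available) and checks Parts 3, 10 and 11 on random matrices.

```python
# Illustrative numeric check of Parts 3, 10 and 11 of Theorem 7.7.2.1.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Part 3: interchanging two rows changes the sign of the determinant.
A_swapped = A.copy()
A_swapped[[0, 2]] = A_swapped[[2, 0]]
assert np.isclose(np.linalg.det(A_swapped), -np.linalg.det(A))

# Part 10: det(AB) = det(A) det(B).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Part 11: det(A) = det(A^T).
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
print("Parts 3, 10 and 11 verified on a random example")
```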

Remark 7.7.2.2. 1. As det(A) = det(AT ), we observe that in Theorem 7.7.2.1, the condi-
tion on “row” can be replaced by the condition on “column”.

2. Let A = [aij ] be a matrix satisfying a11 = 1 and a1j = 0, for 2 ≤ j ≤ n. Let B = A(1 | 1),
the submatrix of A obtained by removing the first row and the first column. Then, prove
that det(A) = det(B).
Proof: Let σ ∈ Sn with σ(1) = 1. Then σ fixes 1, so a disjoint cycle representation of σ only
involves the numbers {2, 3, . . . , n}. That is, we can think of σ as an element of Sn−1. Moreover,
if σ(1) ≠ 1, then a_{1σ(1)} = 0, so such permutations contribute nothing, while a_{11} = 1. Hence,
$$\det(A) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)}
= \sum_{\substack{\sigma \in S_n \\ \sigma(1)=1}} \mathrm{sgn}(\sigma) \prod_{i=2}^{n} a_{i\sigma(i)}
= \sum_{\sigma \in S_{n-1}} \mathrm{sgn}(\sigma) \prod_{i=1}^{n-1} b_{i\sigma(i)} = \det(B).$$

We now relate this definition of determinant with the one given in Definition 2.2.3.11.
Theorem 7.7.2.3. Let A be an n × n matrix. Then $\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det\big(A(1 \mid j)\big)$,
where recall that A(1 | j) is the submatrix of A obtained by removing the 1st row and the j-th
column.
Proof. For 1 ≤ j ≤ n, define the n × n matrix
$$B_j = \begin{bmatrix} 0 & 0 & \cdots & a_{1j} & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix},$$
whose first row has a_{1j} in the j-th position and zeros elsewhere. Also, for
each matrix Bj, we define the n × n matrix Cj by

1. Cj [:, 1] = Bj [:, j],

2. Cj [:, i] = Bj [:, i − 1], for 2 ≤ i ≤ j and

3. Cj [:, k] = Bj [:, k] for k ≥ j + 1.

Also, observe that Bj ’s have been defined to satisfy B1 [1, :] + · · · + Bn [1, :] = A[1, :] and
Bj [i, :] = A[i, :] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 7.7.2.1.5,
$$\det(A) = \sum_{j=1}^{n} \det(B_j). \qquad (7.7.2.3)$$

Let us now compute det(Bj), for 1 ≤ j ≤ n. Note that Cj is obtained from Bj by j − 1 successive
interchanges of adjacent columns. Then, by Theorem 7.7.2.1.3 (applied to columns via Remark
7.7.2.2.1), we get det(Bj) = (−1)^{j−1} det(Cj). So, using Remark 7.7.2.2.2, Theorem 7.7.2.1.2
and Equation (7.7.2.3), we have
$$\det(A) = \sum_{j=1}^{n} (-1)^{j-1} \det(C_j) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det\big(A(1 \mid j)\big).$$

Thus, we have shown that the expansion used to define the determinant in Chapter 2 agrees with Equation (7.7.1.2).
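
The recursion behind Theorem 7.7.2.3 can be implemented directly. The following sketch is illustrative only (it is not part of the notes; numpy is assumed to be available) and compares the first-row expansion with numpy's determinant.

```python
# Illustrative sketch: expansion along the first row (Theorem 7.7.2.3).
import numpy as np

def det_first_row_expansion(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # A(1 | j): delete the first row and the j-th column (0-based j here,
        # so the sign (-1)**j matches (-1)**(1 + j) with 1-based indexing).
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_first_row_expansion(minor)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
print(det_first_row_expansion(A), np.linalg.det(A))  # both agree up to rounding
```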



7.3 Uniqueness of RREF


Definition 7.7.3.1. Fix n ∈ N. Then, for each f ∈ Sn, we associate an n × n matrix, denoted
P^f = [pij], such that pij = δ_{j, f(i)}, where δ_{i,j} = 1 if i = j and 0 otherwise is the
Kronecker delta function. The matrix P^f is called the permutation matrix corresponding to
the permutation f. For example, I2, corresponding to Id2, and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = E_{12}$, corresponding to
the permutation (1, 2), are the two permutation matrices of order 2 × 2.

Remark 7.7.3.2. Recall that in Remark 7.7.1.16.1, it was observed that each permutation is a
product of the transpositions (1, 2), . . . , (1, n).
1. Verify that the elementary matrix Eij is the permutation matrix corresponding to the trans-
position (i, j) .
2. Thus, every permutation matrix is a product of the elementary matrices E1j , 2 ≤ j ≤ n.
   
3. For n = 3, the permutation matrices are I3,
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = E_{23} = E_{12}E_{13}E_{12}, \quad
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = E_{12}, \quad
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} = E_{12}E_{13}, \quad
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = E_{13}E_{12} \quad \text{and} \quad
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = E_{13}.$$
4. Let f ∈ Sn and P f = [pij ] be the corresponding permutation matrix. Since pij = δ_{j, f(i)} and
{f (1), . . . , f (n)} = [n], each entry of P f is either 0 or 1. Furthermore, every row and
column of P f has exactly one non-zero entry. This non-zero entry is a 1 and appears at
the position (i, f (i)).
5. By the previous paragraph, we see that when a permutation matrix is multiplied with A
(a) from the left, then it permutes the rows of A;
(b) from the right, then it permutes the columns of A.

6. P is a permutation matrix if and only if P has exactly one 1 in each row and column and all other entries 0.
Solution: Suppose P has exactly one 1 in each row and column and all other entries 0. Then P
is a square matrix, say n × n, as the number of rows and the number of columns both equal the
number of 1's. Now, apply GJE to P. Since each row and column contains exactly one 1, these
1's are the pivots, and rows only need to be interchanged to bring P to RREF. So, P is converted
to In by multiplying on the left with matrices of the form Eij. Thus, the RREF of P is In and P
is indeed a product of Eij's, i.e., a permutation matrix. The other implication has already been
explained above.
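
The observations in this remark are easy to check in code. The following sketch is illustrative only (it is not part of the notes; numpy is assumed to be available): it builds P^f from f and verifies that left multiplication by P^f permutes the rows of a matrix.

```python
# Illustrative sketch: permutation matrices and row permutations (Remark 7.7.3.2).
import numpy as np

def perm_matrix(f):
    # P^f has its only 1 in row i at column f(i)  (0-based indices here).
    n = len(f)
    P = np.zeros((n, n))
    for i in range(n):
        P[i, f[i]] = 1.0
    return P

f = (2, 0, 1)                       # the permutation 0 -> 2, 1 -> 0, 2 -> 1
P = perm_matrix(f)
A = np.arange(9.0).reshape(3, 3)

# Left multiplication permutes rows: row i of P @ A equals row f(i) of A.
assert np.array_equal(P @ A, A[list(f), :])
# P is a product of row interchanges, so its determinant is sgn(f) = +1 or -1.
print(np.linalg.det(P))
```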

We are now ready to prove Theorem 2.2.2.16.

Theorem 7.7.3.3. Let A and B be two matrices in RREF. If they are equivalent then A = B.

Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices
A, B ≠ 0. On the contrary, assume that A ≠ B. Then, there exists the smallest k such that
A[:, k] ≠ B[:, k] but A[:, i] = B[:, i], for 1 ≤ i ≤ k − 1. For 1 ≤ j ≤ n, we define matrices
Aj = [A[:, 1], . . . , A[:, j]] and Bj = [B[:, 1], . . . , B[:, j]]. Then, by the choice of k, Ak−1 = Bk−1 .
Also, let there be r pivots in Ak−1 . To get our result, we consider the following three cases.

Case 1: Neither A[:, k] nor B[:, k] is pivotal. As Ak−1 = Bk−1 and the number of pivots is r,
we can find a permutation matrix P such that
$$A_k P = \begin{bmatrix} I_r & X \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B_k P = \begin{bmatrix} I_r & Y \\ 0 & 0 \end{bmatrix}.$$
But, Ak and Bk are equivalent and thus there exists an invertible matrix C such that Ak = CBk. That
is,
$$\begin{bmatrix} I_r & X \\ 0 & 0 \end{bmatrix} = A_k P = C B_k P
= \begin{bmatrix} C_1 & C_2 \\ C_3 & C_4 \end{bmatrix} \begin{bmatrix} I_r & Y \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} C_1 & C_1 Y \\ C_3 & C_3 Y \end{bmatrix}.$$
Hence, C1 = Ir and therefore X = C1 Y = Y, contradicting A[:, k] ≠ B[:, k].
Case 2: A[:, k] is pivotal but B[:, k] is non-pivotal (the case with the roles of A and B interchanged
is identical). Let {i1 , . . . , iℓ } be the pivotal columns of Ak. Define F = [A[:, i1 ], . . . , A[:, iℓ ]] and
G = [B[:, i1 ], . . . , B[:, iℓ ]]. So, using matrix multiplication, F and G are equivalent, they are in
RREF and are of the same size. But,
$$F = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ & & \ddots & & \\ 0 & \cdots & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
G = \begin{bmatrix} 1 & 0 & \cdots & 0 & b_{1k} \\ 0 & 1 & \cdots & 0 & b_{2k} \\ & & \ddots & & \\ 0 & \cdots & 0 & 1 & b_{rk} \\ 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Clearly, F and G are not equivalent (they have different ranks), a contradiction.


Case 3: Both A[:, k] and B[:, k] are pivotal. As Ak−1 = Bk−1 and they have r pivots, the next
pivot will appear in the (r + 1)-th row. But, both A[:, k] and B[:, k] are pivotal and hence they
will have 1 in the (r + 1)-th entry and 0, everywhere else. Thus, A[:, k] = B[:, k], a contradiction.
Therefore, combining all the three cases, we get the required result.
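
Theorem 7.7.3.3 can be observed in practice: row-equivalent matrices always reduce to the same RREF. The sketch below is illustrative only (it is not part of the notes); it uses sympy's Matrix.rref, and the particular matrices are ours.

```python
# Illustrative sketch: row-equivalent matrices have the same RREF (Theorem 7.7.3.3).
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 7],
            [1, 0, 1]])
# Any invertible C yields B = C*A that is row equivalent to A
# (C is a product of elementary matrices).
C = Matrix([[3, 0, 0],
            [1, 1, 0],
            [2, 5, 1]])   # lower triangular with nonzero diagonal, hence invertible
B = C * A

# rref() returns (matrix in RREF, tuple of pivot column indices).
assert A.rref()[0] == B.rref()[0]
print(A.rref()[0])
```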

7.4 Dimension of W1 + W2
Theorem 7.7.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two
subspaces of V. Then

dim(W1 ) + dim(W2 ) = dim(W1 + W2 ) + dim(W1 ∩ W2 ). (7.7.4.4)

Proof. Since W1 ∩ W2 is a vector subspace of V , let B = {u1 , . . . , ur } be a basis of W1 ∩ W2 .


As W1 ∩ W2 is a subspace of both W1 and W2 , we extend the basis B to form a basis B1 =
{u1 , . . . , ur , v1 , . . . , vs } of W1 and a basis B2 = {u1 , . . . , ur , w1 , . . . , wt } of W2 .
We now prove that D = {u1 , . . . , ur , v1 , . . . , vs , w1 , . . . , wt } is a basis of W1 + W2 . To do
this, we show that

1. D is a linearly independent subset of V and

2. LS(D) = W1 + W2 .

The second part can be easily verified. For the first part, consider the linear system
$$\alpha_1 u_1 + \cdots + \alpha_r u_r + \beta_1 v_1 + \cdots + \beta_s v_s + \gamma_1 w_1 + \cdots + \gamma_t w_t = 0 \qquad (7.7.4.5)$$

in the unknowns αi 's, βj 's and γk 's. We re-write the system as
$$\alpha_1 u_1 + \cdots + \alpha_r u_r + \beta_1 v_1 + \cdots + \beta_s v_s = -(\gamma_1 w_1 + \cdots + \gamma_t w_t).$$
Write $x = -\sum_{k=1}^{t} \gamma_k w_k$. Then x ∈ LS(B2) = W2. Also, $x = \sum_{i=1}^{r} \alpha_i u_i + \sum_{j=1}^{s} \beta_j v_j$, so
x ∈ LS(B1) = W1. Hence, x ∈ W1 ∩ W2 and therefore there exist scalars δ1, . . . , δr such that
$x = \sum_{j=1}^{r} \delta_j u_j$. Substituting this representation of x in Equation (7.7.4.5), we get
$$(\alpha_1 - \delta_1) u_1 + \cdots + (\alpha_r - \delta_r) u_r + \beta_1 v_1 + \cdots + \beta_s v_s = 0.$$
So, using Exercise 3.3.3.16.1 (B1 is linearly independent), αi − δi = 0, for 1 ≤ i ≤ r, and βj = 0,
for 1 ≤ j ≤ s. Thus, the system (7.7.4.5) reduces to
$$\alpha_1 u_1 + \cdots + \alpha_r u_r + \gamma_1 w_1 + \cdots + \gamma_t w_t = 0,$$
which, by the linear independence of B2, has αi = 0 for 1 ≤ i ≤ r and γk = 0 for 1 ≤ k ≤ t as its
only solution. Hence, the linear system (7.7.4.5) has only the trivial solution. Therefore, the set D
is linearly independent and D is indeed a basis of W1 + W2. We now count the vectors in
the sets B, B1 , B2 and D to get the required result.
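
For subspaces given as column spaces, Equation (7.7.4.4) can be checked with ranks: if W1 = Col(A) and W2 = Col(B), then dim(W1 + W2) = rank([A B]), so the theorem forces dim(W1 ∩ W2) = rank(A) + rank(B) − rank([A B]). The sketch below is illustrative only (it is not part of the notes; numpy is assumed to be available), with matrices chosen so that the two column spaces visibly share the vector (1, 0, 1, 0)^T.

```python
# Illustrative check of dim(W1) + dim(W2) = dim(W1 + W2) + dim(W1 ∩ W2)
# for column spaces W1 = Col(A) and W2 = Col(B) in R^4.
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])          # dim W1 = 2
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])          # dim W2 = 2; first column equals A's first column

r1 = np.linalg.matrix_rank(A)
r2 = np.linalg.matrix_rank(B)
r_sum = np.linalg.matrix_rank(np.hstack([A, B]))   # dim(W1 + W2)
dim_intersection = r1 + r2 - r_sum                 # forced by Theorem 7.7.4.1

print(r1, r2, r_sum, dim_intersection)             # prints: 2 2 3 1
```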



7.5 When does Norm imply Inner Product

In this section, we prove the following result. A generalization of this result to complex vector
spaces is left as an exercise for the reader, as it requires similar ideas.

Theorem 7.7.5.1. Let V be a real vector space. A norm ‖ · ‖ on V is induced by an inner product if
and only if, for all x, y ∈ V, the norm satisfies
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \quad \text{(parallelogram law)}. \qquad (7.7.5.6)$$

Proof. Suppose that ‖ · ‖ is indeed induced by an inner product. Then, by Exercise 5.5.1.7.3, the
result follows.
So, let us assume that ‖ · ‖ satisfies the parallelogram law. We need to define an inner
product. We claim that the function f : V × V → R defined by

$$f(x, y) = \frac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right), \quad \text{for all } x, y \in V,$$

satisfies the required conditions for an inner product. So, let us proceed to do so.
Step 1: Clearly, for each x ∈ V, f(x, 0) = 0 and $f(x, x) = \frac{1}{4}\|x + x\|^2 = \|x\|^2$. Thus,
f(x, x) ≥ 0. Further, f(x, x) = 0 if and only if x = 0.

Step 2: By definition f (x, y) = f (y, x) for all x, y ∈ V.




Step 3: By the parallelogram law, $\|x + y\|^2 - \|x - y\|^2 = 2\left( \|x + y\|^2 - \|x\|^2 - \|y\|^2 \right)$. Or equivalently,
$$2 f(x, y) = \|x + y\|^2 - \|x\|^2 - \|y\|^2, \quad \text{for } x, y \in V. \qquad (7.7.5.7)$$

Thus, for x, y, z ∈ V, we have
$$\begin{aligned}
4\left(f(x, y) + f(z, y)\right) &= \|x + y\|^2 - \|x - y\|^2 + \|z + y\|^2 - \|z - y\|^2 \\
&= 2\left( \|x + y\|^2 + \|z + y\|^2 - \|x\|^2 - \|z\|^2 - 2\|y\|^2 \right) \\
&= \|x + z + 2y\|^2 + \|x - z\|^2 - \|x + z\|^2 - \|x - z\|^2 - 4\|y\|^2 \\
&= \|x + z + 2y\|^2 - \|x + z\|^2 - \|2y\|^2 \\
&= 2 f(x + z, 2y) \quad \text{using Equation (7.7.5.7),} \qquad (7.7.5.8)
\end{aligned}$$
where the third equality applies the parallelogram law to the pairs x + y, z + y and x, z.
Now, substituting z = 0 in Equation (7.7.5.8) and using Equation (7.7.5.7), we get
2f(x, y) = f(x, 2y) and hence 4f(x + z, y) = 2f(x + z, 2y) = 4(f(x, y) + f(z, y)). Thus,
$$f(x + z, y) = f(x, y) + f(z, y), \quad \text{for all } x, y, z \in V. \qquad (7.7.5.9)$$

Step 4: Using Equation (7.7.5.9), f(x, y) = f(y, x) and the principle of mathematical induction,
it follows that nf(x, y) = f(nx, y), for all x, y ∈ V and n ∈ N. Another application
of Equation (7.7.5.9), together with f(0, y) = 0, implies that nf(x, y) = f(nx, y), for all x, y ∈ V
and n ∈ Z. Also, for m ≠ 0,
$$m\, f\!\left(\frac{n}{m}\, x, y\right) = f\!\left(m \cdot \frac{n}{m}\, x, y\right) = f(nx, y) = n\, f(x, y).$$
Hence, we see that for all x, y ∈ V and a ∈ Q, f(ax, y) = a f(x, y).

Step 5: Fix u, v ∈ V and define a function g : R → R by
$$g(x) = f(xu, v) - x f(u, v)
= \frac{1}{2}\left( \|xu + v\|^2 - \|xu\|^2 - \|v\|^2 \right) - \frac{x}{2}\left( \|u + v\|^2 - \|u\|^2 - \|v\|^2 \right).$$
Then, by the previous step, g(x) = 0 for all x ∈ Q. So, if g is a continuous function, then
continuity implies g(x) = 0 for all x ∈ R, and hence f(xu, v) = x f(u, v) for all x ∈ R.
Note that the second term of g(x) is a constant multiple of x and hence continuous, and
$\|xu\|^2 = x^2 \|u\|^2$ is continuous as well. Using a similar reason, it is enough to show that
$g_1(x) = \|xu + v\|$, for fixed vectors u, v ∈ V, is continuous. To do so, note that
$$\|x_1 u + v\| = \|(x_1 - x_2)u + x_2 u + v\| \leq \|(x_1 - x_2)u\| + \|x_2 u + v\|.$$
Thus, $\big|\, \|x_1 u + v\| - \|x_2 u + v\| \,\big| \leq \|(x_1 - x_2)u\|$. Hence, taking the limit as x1 → x2, we
get $\lim_{x_1 \to x_2} \|x_1 u + v\| = \|x_2 u + v\|$.

Thus, we have proved the continuity of g, and hence the proof of the required result is complete.
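
Numerically, Theorem 7.7.5.1 can be illustrated as follows (an illustrative sketch, not part of the notes; numpy is assumed to be available): the polarization formula recovers the standard inner product from the Euclidean norm, while the 1-norm fails the parallelogram law (7.7.5.6) and hence is induced by no inner product.

```python
# Illustrative sketch: polarization identity and the parallelogram law.
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)

def f(x, y, norm):
    # the candidate inner product built from a norm, as in Theorem 7.7.5.1
    return 0.25 * (norm(x + y) ** 2 - norm(x - y) ** 2)

euclid = np.linalg.norm
# For the Euclidean norm, f recovers the usual inner product <x, y>.
assert np.isclose(f(x, y, euclid), np.dot(x, y))

# The 1-norm violates the parallelogram law, e.g. for the standard basis of R^2.
one_norm = lambda v: np.linalg.norm(v, 1)
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lhs = one_norm(e1 + e2) ** 2 + one_norm(e1 - e2) ** 2   # = 4 + 4 = 8
rhs = 2 * one_norm(e1) ** 2 + 2 * one_norm(e2) ** 2     # = 2 + 2 = 4
print(lhs, rhs)   # 8.0 4.0, so the 1-norm comes from no inner product
```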
Index

m-form, 157 Square Matrix, 8


Transpose of a Matrix, 8
Addition of Matrices, 9
Triangular Matrix, 8
Additive Inverse, 9
Upper Triangular Matrix, 8
Adjugate of a Matrix, 48
Zero Matrix, 8
Algebraic Multiplicity, 144
Determinant
Basic Variables, 37 Properties, 170
Basis Matrix, 82 Determinant of a Square Matrix, 45, 170
Basis of a Vector Space, 74 Diagonalizable Matrices, 147
Bilinear Form, 157 Double Dual Space, 102
Dual Space, 102

Cauchy-Schwartz Inequality, 112



Cayley Hamilton Theorem, 154 Eigen-Condition, 141



Characteristic Equation, 142 Eigen-pair, 141


Characteristic Polynomial, 142 Eigen-space, 141
Characteristic Root, 141 Eigenvalue, 141
Characteristic Value, 142 Eigenvector, 141
Characteristic Vector, 141 Elementary Matrices, 28
Characteristic-pair, 141 Elementary Row Operations, 27
Cofactor Matrix, 48 Equality of Linear Transformations, 87
Column Operations, 33 Equality of two Matrices, 7
Column-Rank of a Matrix, 33 Equivalent Matrices, 27
Complex Vector Space, 56
Finite Dimensional Vector Space, 65
Conjugate Transpose of a Matrix, 9
Free Variables, 37
Coordinate Matrix, 100
Function
Coordinates of a Vector, 82
bijective, 165
Definition injective, 165
Conjugate Transpose of a Matrix, 9 Inverse, 97
Diagonal Matrix, 8 Left Inverse, 97
Equality of two Matrices, 7 Multilinear, 157
Identity Matrix, 8 one-one, 165
Lower Triangular Matrix, 8 onto, 165
Matrix, 7 Right Inverse, 97
Principal Diagonal, 8 surjective, 165

Fundamental Theorem of Linear Algebra, 117 Trivial Solution, 25


Linear Transformation, 87
Gauss Elimination Method, 36
Composition, 105
Geometric Multiplicity, 144
Equality, 87
Gram-Schmidt Orthogonalization Process, 123
Inverse, 97, 106

Hermitian Form, 158 Isomorphism, 98

Inertial degree, 159 Matrix, 100


Matrix Product, 105
Identity Operator, 87 Non-Singular, 98
Identity Transformation, 87 Null Space, 93
Inertial degree, 159 Nullity, 94
Inner Product, 109 Range Space, 93
Inner Product Space, 109 Rank, 94
Inverse of a Linear Transformation, 97 Scalar Product, 96
Inverse of a Matrix, 14 Singular, 98
Isometry, 126 Sum, 96

Latent Roots, 141 Matrix, 7



Leading Term, 30, 31 Addition, 9


Linear Algebra Adjugate, 48

Fundamental Theorem, 117 Change of Basis, 83


Linear Combination of Vectors, 62 Cofactor, 48
Linear Dependence, 68 Column-Rank, 33
Linear Functional, 102 Determinant, 45
linear Independence, 68 Diagonalization, 147
Linear Operator, 87 Echelon Form, 31
Linear Space Eigen-pair, 141
Norm, 116 Eigenvalue, 141
Linear Span of Vectors, 63 Eigenvector, 141
Linear System, 24 Elementary, 28
Associated Homogeneous System, 25 Generalized Inverse, 21
Augmented Matrix, 25 Hermitian, 16
Coefficient Matrix, 25 Idempotent, 16
Consistent, 25 Inverse, 14
Equivalent Systems, 27 Minor, 48
Homogeneous, 24 Negative definite, 156
Inconsistent, 25 Negative semi-definite, 156
Non-Homogeneous, 24 Nilpotent, 16
Non-trivial Solution, 25 Non-Singular, 45
Solution, 25 Normal, 16
Solution Set, 25 Orthogonal, 16, 129

Permutation, 174 Orthogonal Operators, 125


Positive definite, 156 Orthogonal Projection, 127, 129
Positive semi-definite, 156 Orthogonal Vectors, 113
Product of Matrices, 10 Orthonormal Basis, 119
Projection, 16 Orthonormal Set, 119
Pseudo, 21 Orthonormal Vectors, 119
Quadratic Form, 159
Permutation, 165
Reflection, 131
Cycle notation, 167
Row Echelon Form, 31
Even, 169
Row Equivalence, 27
Odd, 169
Row-Rank, 32
Permutation:sgn function, 169
Row-Reduced Echelon Form, 31
Permutation:Signature, 169
Scalar Multiplication, 9
Pivot Term, 30
Singular, 45
Properties of Determinant, 170
Skew-Hermitian, 16
Skew-Symmetric, 16 QR Decomposition, 134
Spectrum, 144 Generalized, 135

sub-matrix, 17 Quadratic Form, 158, 159



Symmetric, 16

Trace, 19 Rank-Nullity Theorem, 94


Unitary, 16 Real Vector Space, 56
Upper Triangular Form, 8 Reflection Operator, 131
Matrix Equality, 7 Reisz Representation Theorem, 89
Matrix Multiplication, 10 Rigid Motion, 126
Matrix Multiplication / Product, 10 Row Operations
Matrix of a Linear Transformation, 100 Elementary, 27
Maximal linear independent set, 73 Row-Rank of a Matrix, 32
Maximal subset, 73 Row-Reduced Echelon Form, 31
Minor of a Matrix, 48
Self-Adjoint Operator, 132
Multilinear function, 157
Sesquilinear Form, 158
Multiplying a Scalar to a Matrix, 9
Similar Matrices, 107
Non-Singular Matrix, 45 Singular Matrix, 45
Solution Set of a Linear System, 25
Operations Space
Column, 33 Column Space, 66
Operator Linear, 56
Self-Adjoint, 132 Normed Linear, 116
Order of Nilpotency, 16 Null Space, 66
Ordered Basis, 82 Range, 66
Orthogonal Complement, 115 Row Space, 66

Spectrum of a Matrix, 144 Linear Combination, 62


Square Matrix Linear Dependence, 68
Bilinear Form, 157 Linear Independence, 68
Determinant, 170 Linear Span, 63
Sesquilinear Form, 158 Mutually Orthogonal, 119
sub-matrix of a Matrix, 17 Norm, 111
Subset Orthogonal, 113
Maximal, 73 Orthonormal, 119
Maximal linear independent, 73
Zero Operator, 87
Subspace
Zero Transformation, 87
Linear Span, 65
Orthogonal Complement, 113
Sum, 65
Sum of two Matrices, 9
System of Linear Equations, 24

Trace of a Matrix, 19
Transpose of a Matrix, 8

Trivial Subspace, 60

Unit Vector, 111


Unitary Equivalence, 150
Unitary Similar, 150

Vector Space, 55
Basis, 74
Complex, 56
Complex n-tuple, 57
Dimension of M + N , 175
Finite Dimensional, 65
Infinite Dimensional, 65
Inner Product, 109
Isomorphic, 98
Minimal spanning set, 74
Real, 56
Real n-tuple, 57
Subspace, 60
Vector Subspace, 60
Vectors
Angle, 112
Coordinates, 82
Length, 111
