You are on page 1of 166

LINEAR SYSTEMS OF EQUATIONS

AND
MATRIX COMPUTATIONS
1. DIRECT METHODS FOR SOLVING LINEAR SYSTEMS OF EQUATIONS

1.1 SIMPLE GAUSSIAN ELIMINATION METHOD


Consider a system of n equations in n unknowns,

a11x1 + a12x2 + …. + a1nxn = y1


a21x1 + a22x2 + …. + a2nxn = y2
… … … … …
an1x1 + an2x2 + …. + annxn = yn

We shall assume that this system has a unique solution and proceed to describe the
simple “Gaussian Elimination Method”, (from now on abbreviated as GEM),Page 2 of 11
for finding the solution. The method reduces the system to an upper triangular system
using elementary row operations (ERO).

Let A(1) denote the coefficient matrix A.

 a 11( 1 ) a 12( 1 ) ..... a 1( 1n ) 


 
 (1 )
a 21 (1 )
a 22 ..... a 2( 1n) 
 ..... ...... ..... ...... 
A = 
(1)

 ..... ...... ...... ...... 
 
 a n( 11 ) a n( 12) ...... (1 )
a nn 

where aij(1) = aij

Let
 y1(1) 
 (1) 
 y2 
y(1) =  M 
 
 y (1) 
 n 

where yi(1) = yi

We assume a11(1) ≠ 0

Then by ERO applied to A(1) , (that is, subtracting suitable multiples of the first
row from the remaining rows), reduce all entries below a11(1) to zero. Let the
resulting matrix be denoted by A(2).

Ri +mi(11)R1
A(1) 
→ A(2)

2
ai(11)
where mi(11) = − ; i > 1.
a11(1)

Note A(2) is of the form


 a11(1) a12(1) ... ... a1(n1) 
 
 0 a22 ( 2)
... ... a2( 2n) 
 0 a ( 2) ... ... a3( n2) 
A(2) =  32
 M M M M M 
 ( 2) ( 2) 
 0 an 2 ... ... ann 
Notice that the above row operations on A(1) can be effected by premultiplying
A(1) by M(1) where

 1 0 0 L 0 0
 (1) 
 m21 
(1)
M = m31  (1)
I n −1 
 
 M 
 (1) 
 mn1 
(In-1 being the n-1 × n-1 identity matrix).
i.e.
M(1) A(1) = A(2)

Let
y(2) = M(1) y(1)
i.e.
R i + m i 1 R1
y (1)   → y ( 2 )
Then the system Ax = y is equivalent to

A(2)x = y(2)

Next we assume

(2)
a22 ≠0

and reduce all entries below this to zero by ERO

Ri +mi(22)
A(2) → A(3) ;

3
ai(22)
m ( 2)
i2 = − ( 2) ; i>3
a22
Here

1 0 0 L 0
 
0 1 0 L 0
 0 m ( 2) 
M ( 2) = 32

 0 m42 
( 2)
I n −2
 
M M 
 0 m ( 2) 
 n2 

and
M(2) A(2) = A(3) ; M(2) y(2) = y(3) ;

and A(3) is of the form

 a11(1) a12(1) ... ... a1(n1) 


 
 0 (2)
a22 ( 2)
a23 ... a2( 2n) 
A( 3 ) = 0 0 ( 3)
a33 ... a3(3n) 


 M M M ... M 
 (3) 
 0 0 an( 33) ... ann 

We next assume a33(3)


≠ 0 and proceed to make entries below this as zero. We
thus get M , M , …. , M(r) where
(1) (2)

1 0 L 0 
 
0 1 L 0 0 r ×( n − r ) 
M M M M 
 
0 0 L 1
M (r ) = 
0 0 L mr( +r )1r 
 
0 0 L mr( +r )2 r I n −r 
M M M M 
 
0 0 L mnr( r ) 
 

4
 a11(1) ... ... ... ... a1(n1) 
 
 0 ( 2)
a22 ... ... ... a2( rn) 
 
M 0 arr( r ) ... ... arn( r ) 
M (r )
A (r )
=A ( r +1)
=
 M M 0 ar( r++1r1+) 1 ... ar( +r 1+n1) 
 
 M M M ... ... ... 
 0 anr( r ++11) ( r +1) 
 0 0 ... ann 
M(r) y(r) = y(r+1)

At each stage we assume arr( r ) ≠ 0 .

Proceeding thus we get,

M(1), M(2), …. , M(n-1) such that

M(n-1) M(n-2) …. M(1) A(1) = A(n) ; M(n-1) M(n-2) …. M(1) y(1) = y(n)
 a11(1) a12(1) L a1(n1) 
 
 ( 2)
a22 L a2( 2n) 
A( n ) = 
where  
 O 
 ( n) 
 ann 
which is an upper triangular matrix and the given system is equivalent to

A(n)x = y(n)

Since this is an upper triangular, this can be solved by back substitution; and
hence the system can be solved easily.

Note further that each M(r) is a lower triangular matrix with all diagonal entries as
1. Thus determinant of M(r) is 1 for every r. Now,

A(n) = M(n-1) …. M(1) A(1)

Thus

det A(n) = det M(n-1) det M(n-2) …. det M(1) det A(1)

det A(n) = det A(1) = det A since A = A(1)

5
Now A(n) is an upper triangular matrix and hence its determinant is a11(1) a22
(2) (n)
L ann .
Thus det A is given by

det A = a11(1) a22


(2) ( n)
L ann

Thus the simple GEM can be used to solve the system Ax = y and also to
evaluate det A provided aii( i ) ≠ 0 for each i.
Further note that M(1), M(2), …. , M(n-1) are lower triangular, and nonsingular as
their det = 1 and hence not zero. They are all therefore invertible and their
inverses are all lower triangular, i.e. if L = M(n-1) M(n-2) …. M(1) then L is lower
triangular and nonsingular and L-1 is also lower triangular.

Now LA = LA(1) = M(n-1) M(n-2) …. M(1) A(1) = A(n)

Therefore A = L-1 A(n)

Now L-1 is lower triangular which we denote by L and A(n) is upper triangular
which we denote by U, and we thus get the so called LU decomposition

A = LU

of a given matrix A – as a product of a lower triangular matrix with an upper


triangular matrix. This is another application of the simple GEM. REMEMBER IF
AT ANY STAGE WE GET aii(1) = 0 WE CANNOT PROCEED FURTHER WITH
THE SIMPLE GEM.

EXAMPLE:

Consider the system

x1 + x2 + 2x3 = 4
2x1 - x2 + x3 = 2
x1 + 2x2 =3

Here

 1 1 2 4
   
A =  2 −1 1 y = 2
1 2 0 3
   

6
1 1 2 1 1 2 
  R 2 − 2 R1  
= 2 −1 → −3 − 3  = A (2)
(1)
A 1 0
1 0 
R 3 − R1 0 − 2 
 2  1

a11(1) = 1 ≠ 0
(1)
m21 = −2
(1)
m31 = −1
(2)
a22 = −3 ≠ 0

 1 0 0 4  4 
     
M (1)
=  − 2 1 0 y (1 )
=  2  →  − 6  = y (2) = M (1 )
y (1 )
 − 1 0 1 3  −1
     

1
R3 + R 2 1 1 2 
3
 
→ −3 − 3  = A ( 3) = −3
(2) (3)
A 0 a33
0 − 3 
 0

(2)
m31 =1 3

1 0 
 0  4 
 
M (2)
= 0 1 0 y (3)
= M (2)
y (2)
= − 6
   − 3
0 1 1  
 3 

Therefore the given system is equivalent to A(3)x = y(3) ,i.e.,

x1 + x2 + 2x3 = 4

-3x2 - 3x3 = -6

- 3x3 = -3

Back Substitution

x3 = 1

-3x2 - 3 = - 6 ⇒ -3x2 = -3 ⇒ x2 = 1

7
x1 + 1 + 2 = 4 ⇒ x1 = 1

Thus the solution of the given system is,

 x1   1 
   
x =  x 2  = 1
 x  1
 3  
The determinant of the given matrix A is

a11(1) a22 a33 = (1)(−3)(−3) = 9.


(2) (3)

Now

1 0 0
(M )  
( −1)
(1)
=  2 1 0
1 0 1
 

1 0 0 

(M ( ) )
( −1)
= 0 0
2
1
 
0 − 1 1
 3 

L = M(2) M(-1)

L-1= ( M (2) M (1) ) = ( M (1) ) (M )


−1 −1 (2) −1

 0 
 1 0 0  1 0
 
=  2 1 0  0 1 0
 1 0 1  
  0 − 1 1
 3 

1 0 0 

L = L(-1) =  2 1 0
 
1 − 1 1
 3 

1 1 2 
 
U=A (n)
=A (3)
=  0 − 3 − 3
 0 0 − 3
 

8
Therefore A = LU i.e.,

1 0 
1 1 2  0 1 1 2 
 
A=  2 −1 1  =  2 1 0  0 − 3 − 3
1 2 0    0 0 − 3
  1 − 1 1  
 3 

is the LU decomposition of the given matrix A.

We observed that in order to apply simple GEM we need arr( r ) ≠ 0 for each
stage r. This may not be satisfied always. So we have to modify the simple
GEM in order to overcome this situation. Further, even if the condition arr( r ) ≠ 0 is
satisfied at each stage, simple GEM may not be a very accurate method to use.
What do we mean by this? Consider, as an example, the following system:

(0.000003) x1 + (0.213472) x2 + (0.332147) x3 = 0.235262


(0.215512) x1 + (0.375623) x2 + (0.476625) x3 = 0.127653
(0.173257) x1 + (0.663257) x2 + (0.625675) x3 = 0.285321

Let us do the computations to 6 significant digits.

Here,

 0 .000003 0 .213472 0 .332147 


 
A(1) =  0 .215512 0 .375623 0 .476625 
 0 . 173257 0 .625675 
 0 . 663257

 0 . 235262 
 
y (1)
=  0 . 127653  a11(1) = 0.000003 ≠ 0
 0 . 285321 
 

(1)
a21 0.215512
(1)
m21 =− (1)
=− = −71837.3
a11 0.000003

(1)
a31 0.173257
(1)
m31 =− (1)
=− = −57752.3
a11 0.000003

9
 1 0 0  0.235262 
   
M (1)
=  − 71837 .3 1 0 ; y (2) (1)
=M y(1)
=  − 16900.5 
 − 57752 . 3 1   − 13586.6 
 0  

 0.000003 0.213472 0.332147 


 
A(2) = M(1) A(1) =  0 − 15334 .9 − 23860 .0 
 − 12327 .8 − 19181 .7 
 0

(2)
a22 = −15334.9 ≠ 0

(2)
a32 −12327.8
(2)
m32 =− =− = −0.803905
(2)
a22 −15334.9

1 0 0
 
M(2) =  0 1 0
 0 − 0.803905 1 
 

 0 .235262 
 
y (3)
=M (2) (2)
y =  − 16900 . 5 
 − 0 . 20000 
 

 0.000003 0.213472 0.332147 


 
A(3) = M(2) A(2) =  0 − 15334 .9 − 23860 .0 
 − 0.50000 
 0 0

Thus the given system is equivalent to the upper triangular system

A(3)x = y(3)

Back substitution yields,

x3 = 0.40 00 00
x2 = 0.47 97 23
x1 = -1.33 33 3

10
This compares poorly with the correct answers (to 10 digits) given by

x1 = 0.67 41 21 46 94
x2 = 0.05 32 03 93 39.1
x3 = -0.99 12 89 42 52

Thus we see that the simple Gaussian Elimination method needs modification in
order to handle the situations that may lead to arr( r ) = 0 for some r or situations as
arising in the above example. In order to do this we introduce the idea of Partial
Pivoting in the next section.

11
1.2 GAUSS ELIMINATION METHOD WITH PARTIAL PIVOTING

In the Gaussian Elimination method discussed in the previous section, at the r th


stage we reduced all the entries in the rth column, below the r th principal diagonal
entry as zero. In the partial pivoting method, before we do this we look at the
entries in the r th diagonal and below it and then pick the one that has the largest
absolute value and we bring it to the diagonal position by a row interchange, and
then reduce the entries below the r th diagonal as zero. When we incorporate
this idea at each stage of the Gaussian elimination process we get the GAUSS
ELIMINATION METHOD WITH PARTIAL PIVOTING. We now illustrate this with
a few examples:

Note: The elementary row operations on the matrix A and the vector y can be
simultaneously carried out by introducing the ”Augmented matrix”, Aaug which is
obtained by appending y as an additional column at the end.

Example 1:

x1 + x2 + 2 x3 = 4
2x1 – x2 + x3 = 2
x1 + 2x2 =3

We have

1 1 2 4
 
Aaug =  2 − 1 1 2 
1 0 3 
 2
1st Stage: The pivot has to be chosen as 2 as this is the largest absolute valued
entry in the first column. Therefore we do

2 −1 1 2
Aaug →
R12 
1 1 2 4

1 3 
 2 0

Therefore we have

0 1 0 2 −1 1
   
M(1) =  1 0 0  and M(1) A(1) = A(2) = 1 1 2
0 1  1 2 0 
 0 

12
2
(1) (1) (2)
 
M A =y = 4
3
 

Next we have

 2 −1 1 2 
1 
R2 − R1
2 0 3 3 3
 
(2)
Aaug 2 2
1
R3 − R1  0 5 −1 2 
 
2
2 2

Here

 1 2 1 
 0 0   −1
M(2) =  − 1 1 0 ; M A = A = 0
(2) (2) (3) 3 3 
 2   2 2 
− 1  0 − 1 
0 1  5
 2   2 2

2
2
 
(2) (3)
M y =y = 3
2
 
5
Now at the next stage the pivot is since this is the entry with the largest
2
absolute value in the 1st column of the next submatrix. So we have to do another
row interchange.

Therefore

 2 −1 1 2 

(3)
Aavg →
R23 0 5 −1 2
 2 2 
 0 3 3 3 
 2 2 

13
 2 −1 1 
1 0 0 
 
(3)
M = 0 0 1 M(3) A(3) = A(4) =  0 5 −1 
 2 2
0 0   0 3
 1 3 
 2 2 

2
(3) (3) (4)
 
M y =y = 2
3
 

Next we have

 
 2 −1 1 2 
 
3
R2 − R2
(4)
Aavg → 5
0
5 −1 2 
2 2
 9 9 
0 0
 5 5

Here

1 2 
 0 0   −1 1 
M(4) =  0 1 0 M A =A = 0
(4) (4) (5) 5 − 1 
   2 2
0 −3 1  0 9 
 5  
0
5 

 2 
 
M(4) y(4) = y(5) =  2 
 
9 
 5

This completes the reduction and we have that the given system is equivalent to
the system

A(5)x = y(5)

i.e.

2x1 – x2 + x3 = 2

14
5 1
x2 - x3 = 2
2 2
9 9
x3 =
5 5

We now get the solution by back substitution:

The 3rd equation gives,

x3 = 1

using this in second equation we get

5 1 5 5
x2 - = 2 giving x2 = and hence x2 = 1.
2 2 2 2

Using the values of x1 and x2 in the first equation we get

2x1 – 1 + 1 = 2 giving x1 = 1

Thus we get the solution of the system as x1 = 1, x2 = 1, x3 = 1; the same as we


had obtained with the simple Gaussian elimination method earlier.

Example 2:
Let us now apply the Gaussian elimination method with partial pivoting to the
following example:

(0.000003)x1 + (0.213472)x2 + (0.332147) x3 = 0.235262


(0.215512)x1 + (0.375623)x2 + (0.476625) x3 = 0.127653
(0.173257)x1 + (0.663257)x2 + (0.625675) x3 = 0.285321,

the system to which we had earlier applied the simple GEM and had obtained
solutions which were far away from the correct solutions.

Note that

 0 .000003 0 .213472 0.332147 


 
A =  0 .215512 0 .375623 0 .476625 
 0 .173257 0 .625675 
 0 .663257

 0 .235262 
 
y =  0 .127653 
 0 . 285321 
 

15
We observe that at the first stage we must choose 0.215512 as the pivot. So we
have

 0 .215512 0 .375623 0 .476625 


 
A(1)
= A → A R12 (2)
=  0 .000003 0 .213472 0.332147 
 0 .173257 0 .625675 
 0 .663257

 0 . 127653  0 1 0
   
y(1) = y →
R12
y(2) =  0 . 235262  M(1) =  1 0 0
 0 .285321  0 1 
   0

Next stage we make all entries below 1st diagonal as zero

R2 + m21R1  0.215512 0.375623 0.476625 


 
A ( 2)
A ( 3)
= 0 0.213467 0.332140 
R3 + m21R1  0.361282 0.242501 
 0

where

a 21 0.000003
m21 = - =- = - 0.000014
a11 0.215512

a31 0.173257
m31 = - =- = - 0.803932
a11 0.215512

 1 0 0  0 .127653 
   
M(2) =  − 0.000014 1 0  ; y(3)
=M (2)
y(2)
=  0 .235260 
 − 0.803932 0 1   0 .182697 
   

In the next stage we observe that we must choose 0.361282 as the pivot. Thus
we have to interchange 2nd and 3rd row. We get,

16
 0 .215512 0 .375623 0 .476625 
 
A(3) →
R
A(4) = 
23 0 0.361282 0.242501 
 0.332140 
 0 0.213467

1 0 0  0 .127653 
   
M = 0
(3) 0 1 y (4)
=M (3)
y(3)
=  0 .182697 
0 0   0 .235260 
 1  

Now reduce the entry below 2nd diagonal as zero

 0 .215512 0 .375623 0.476625 


 
A(4) R3 + m32 R2
 → A5 =  0 0 .361282 0.242501 
 0 .188856 
 0 0

0.213467
m32 = - = - 0.590860
0.361282

1 0 0  0 .127653 
 0 
 
M(4) =  0 1 y(5)
=M(4)
y(4)
=  0.182697 
0 − 0.59086 1   0 .127312 
  

Thus the given system is equivalent to

A(5) x = y(5)

which is an upper triangular system and can be solved by back substitution to get

x3 = 0.674122
x2 = 0.053205 ,
x1 = - 0.991291

which compares well with the 10 decimal accurate solution given at the end of
section 1.1(page11). Notice that while we got very bad errors in the solutions
while using simple GEM whereas we have come around this difficulty by using
partial pivoting.

17
1.3 DETERMINANT EVALUATION

We observe that even in the partial pivoting method we get matrices

M(k), M(k-1), …. ,M(1) such that

M(k) M(k-1) …. M(1) A is upper triangular


and
therefore

det M(k) det M(k-1) …. det M(1) det A = Product of the diagonal entries in the final
upper triangular matrix.

Now det M(i) = 1 if it refers to the process of nullifying entries below the diagonal
to zero; and

det M(i) = -1 if it refers to a row interchange necessary for a partial pivoting.

Therefore det M(k) …. det M(1) = (-1)m where m is the number of row inverses
effected in the reduction.

Therefore det A = (-1)m product of the diagonal entries in the final upper
triangular matrix.

In our example 1 above, we had M(1), M(2), M(3), M(4) of which M(1) and M(3)
referred to row interchanges. Thus therefore there were two row interchanges
and hence

 5  9 
det A = (−1)2 (2)    = 9.
 2  5 
In example 2 also we had M(1), M(3) as row interchange matrices and
therefore det A = (-1)2 (0.215512) (0.361282) (0.188856) = 0.013608

LU decomposition:

Notice that the M matrices corresponding to row interchanges are no


longer lower triangular. (See M(1) & M(3) in the two examples.) Thus,

M(k) M(k-1) . . . . . M(1)

is not a lower triangular matrix in general and hence using partial pivoting we
cannot get LU decomposition in general.

18
1.4 GAUSS JORDAN METHOD

In this method we continue the partial pivoting method further to reduce,


using Elementary Row Operations, the diagonal entries where the nonzero pivots
are located to 1 and all other entries in the columns containing nonzero pivots to
zero. The resulting matrix is called the Row Reduced Echelon (RRE) form of the
given matrix, and is denoted by AR. These Elementary Row Transformations
correspondingly reduce the vector y to a form which we denote by yR. This is the
same as saying that the augmented matrix Aaug is reduced to the matrix (AR|yR)

Remark:

In case in the reduction process at some stage if we get


arr = ar +1r = L = ar +1n = 0, then even partial pivoting does not bring any nonzero
entry to rth diagonal because there is no nonzero entry available. In such a case
A is singular matrix and we proceed to the RRE form to get the general solution
of the system. As observed earlier, in the case A is nonsingular, Gauss-Jordan
Method leads to AR = In and the product of corresponding M(i) gives us A-1.

1.5 L U DECOMPOSITIONS

We shall now consider the LU decomposition of matrices. Suppose A is


an nxn matrix. If L and U are lower and upper triangular nxn matrices
respectively such that A = LU, we say that this is a LU decomposition of A. Note
that LU decomposition is not unique. For example if A = LU is a decomposition
then A = Lα Uα is also a LU decomposition where α ≠ 0 is any scalar and Lα = α
L and Uα = 1/α U.

Suppose we have a LU decomposition A = LU. Then, the system, Ax = y,


can be solved as follows:

Set Ux = z …………… (1)

Then the system Ax = y can be written as,

LUx = y,
i.e.,
Lz = y ……………..(2)

Now (2) is a triangular system – infact lower triangular and hence we can solve it
by forward substitution to get z.

Substituting this z in (1) we get an upper triangular system for x and this can be
solved by back substitution.

19
Further if A = LU is a LU decomposition then, det. A can be
calculated as
det. A = det. L . det. U
= l11 l22 ….lnn u11u22 …..unn

where lii are the diagonal entries of L, and uii are the diagonal entries of U.

Also A-1 can be obtained from an LU decomposition as A-1 = U-1 L-1.

Thus an LU decomposition helps to break a given system into Triangular


systems; to find the determinant of a given matrix; and to find the inverse of a
given matrix.

We shall now give methods to find LU decomposition of a matrix.


Basically, we shall be considering three cases. First, we shall consider the
decomposition of a Tridiagonal matrix; secondly the Doolittles’s method for a
general matrix, and thirdly the Cholesky’s method for a symmetric matrix.

I. TRIDIAGONAL MATRIX

Let

 b1 a2 0 0 .... 0
 
 c1 b2 a3 0 .... 0
0 c2 b3 a4 .... 0
 
A =  .... .... .... .... .... .... 
 .... .... .... .... .... .... 
 
0 .... 0 cn − 2 bn −1 an 
0 bn 
 .... .... 0 cn −1

be an nxn tridiagonal matrix. We seek a LU decomposition for this. First we shall


give some preliminaries.

Let δi denote the determinant of the ith principal minor of A

20
b1 a2 0 .... 0
c1 b2 a3 .... 0
.... .... .... .... ....
δi =
.... .... .... .... ....
0 .... ci − 2 bi −1 ai
0 .... 0 c i −1 bi

Expanding by the last row we get,

δi = bi δi-1 – ci-1 ai δi-2 ; i = 2,3,4, ……..


δ1 = b1 ……..(I)

We define δ0 = 1

From (I) assuming that δi are all nonzero we get

δi δ
= bi − c i −1 a i i − 2
δ i −1 δ i −1

δi
Setting = ki this can be written as
δ i −1

ai
bi = k i + ci −1 ...............................( II )
k i −1

Now we seek a decomposition of the form A = LU where,

 1 0 .... .... .... 0   u1 α 2 0 .... 0 


 w 1 0 .... .... 0   
 0 u α .... 0 
 1  2 3

L =  0 w2 1 0 .... 0  ; U =  .... .... .... .... .... 

 
 .... .... .... .... .... ....   0 0 .... u n −1 α n 
 0 0 .... .... w  0 0 .... 0 u 
 n −1 1   n 

i.e. we need the lower triangular and upper triangular parts also to be
‘Tridiagonal’ triangular.

21
Note that if A = (aij) then because A is Tridiagonal, aij is nonzero only when
i and j differ by 1. i.e. only ai-1i, aii, aii+1 are nonzero. In fact,

ai-1i = ai
aii = bi …………….. (III)
ai+1i = ci

In the case of L and U we have

li +1i = wi
lii = 1 …………….. (IV)
lij = 0 if i) j>i or ii) j<i with i-j ≥ 2.

uii +1 = αi+1
uii = ui ……………………. (V)

uii = 0 if i) i>j or ii) i<j with j-I ≥ 2.

Now A = LU is what is needed.

Therefore,
n
a ij = ∑ lik u kj …………….. (VI)
k =1
Therefore
n
a i −1i = ∑ li −1k u ki
k =1

Using (III), (IV) and (V) we get

a i = li −1i −1ui −1i = α i

Therefore

αi = ai ………………….. (VII)

This straight away gives us the off diagonal entries of U.

From (VI) we also get


n
a ii = ∑ lik u ki
k =1

22
= Lii−1U i−1i + LiiU ii

Therefore

b i = w i − 1α i + u i …………….. (VIII)

From (VI) we get further,

n
a i +1i = ∑
k =1
li +1 k u k i

= li +1i uii + li +1i +1ui +1i

ci = wi u i

Thus ci = wi u i …………….. (IX)

Using (IX) in (VIII) we get (also using αI = ai)

c i −1 a i
bi = + ui
u i −1

Therefore

c i −1 a i
bi = u i + …………….. (X)
u i −1

Comparing (X) with (II) we get

δi
ui = ki = …………….. (XI)
δ i −1

using this in (IX) we get

ci ciδ i−1
wi = = …………….. (XII)
ui δi

From (VII) we get

23
α i = ai …………….. (XIII)

(XI), (XII) and (XIII) completely determine the matrices L and U and hence
we get the LU decomposition.

Note : We can apply this method only when δI are all nonzero. i.e. all the
principal minors have nonzero determinant.

Example :
 2 −2 0 0 0 
 
− 2 1 1 0 0 
Let A= 0 −2 5 −2 0 
 
 0 0 9 −3 1 
 0 − 1 
 0 0 3

Let us now find the LU decomposition as explained above.

We have
b1 = 2 b2 = 1 b3 = 5 b4 = -3 b5 = -1
c1 = -2 c2 = -2 c3 = 9 c4 = 3
a2 = -2 a3 = 1 a4 = -2 a5 = 1

We have
δ0 = 1
δ1 = 2
δ2 = b2 δ1 – a2 c1 δ0 = 2-4 = -2
δ3 = b3 δ2 – a3 c2 δ1 = (-10) – (-2) (2) = -6
δ4 = b4 δ3 – a4 c3 δ2
= (-3) (-6) – (-18) (-2) = -18
δ5 = b5 δ4 – a5 c4 δ3
= (-1) (-18) – (3) (-6)
= 36.

Note δ1,δ2,δ3,δ4,δ5 are all nonzero. So we can apply the above method.

Therefore by (XI) we get

δ1 δ −2 δ −6
u1 = = 2; u 2 = 2 = = −1; u 3 = 3 = =3
δ0 δ1 2 δ2 − 2

24
δ 4 − 18 δ 5 36
u4 = = = 3 ; and u5 = = = −2
δ3 − 6 δ 4 − 18
From (XII) we get
c1 − 2
w1 = = = −1
u1 2
c2 − 2
w2 = = =2
u2 − 1
c3 9
w3 = = =3
u3 3
c4 3
w4 = = =1
u4 3

From (XIII) we get

α 2 = a2 = −2
α 3 = a3 = 1
α 4 = a4 = −2
α 5 = a5 = 1

Thus;

 1 0 0 0 0 2 −2 0 0 0 
   
−1 1 0 0 0 0 −1 1 0 0 
L = 0 2 1 0 0 U = 0 0 3 −2 0 
  ;  
 0 0 3 1 0 0 0 0 3 1 
 0 1  0 − 2 
 0 0 1  0 0 0

25
In the above method we had made all the diagonal entries of L as 1. This
will facilitate solving the triangular system Lz = y (equation (2) on page 19).
However by choosing these diagonals as 1 it may be that the ui, the diagonal
entries in U become small thus creating problems in back substitution for the
system Ux = z (equation (1) on page 19). In order to avoid this situation
Wilkinson suggests that in any triangular decomposition choose the diagonal
entries of L and U to be of the same magnitude. This can be achieved as
follows:
We seek
A = LU
where
 l1 
 
w2 l2
L= 
 O 
 
 wn −1ln 

 u1 α 2 0 .... 0 
 
 0 u2 α 3 .... .... 
U =  .... .... .... .... .... 
 
0 0 .... u n −1 αn 
0 u n 
 0 .... 0

lii = li
Now li +1i = wi

lii = 0 if i) j>i or ii) j<i with i-j ≥ 2

uii = ui

ui +1i = α i +1

uij = 0 if i) i>j; or ii) j>i with j-i ≥ 2

Now (VII), (VIII) and (IX) change as follows:

26
n
ai = a i −1i = ∑ li −1k uki = li −1i −1ui −1i = li −1α i
k −1

Therefore
ai = li −1α i …………….. (VII`)

n
bi = a ii = ∑ lik uki = lii −1ui −1i + lii uii
k −1

= wi −1α i + li ui
Therefore

bi = wi −1α i + li u i …………….. (VIII`)

n
ci = a i +1i = ∑ li +1k u ki
k −1

= li+1iuii

= wiu i

ci = wiu i …………….. (IX`)

From (VIII`) we get using (VII`) and (IX`)


c i −1 a i
bi = . + liu i
u i −1 l i −1

a i c i −1
= + liu i
l i −1 u i −1

ai ci −1
bi = + pi …………….. (X`)
pi −1

27
where
pi = l i ui

Comparing (X`) with (II) we get

pi = ki =
δi
δ i −1
Therefore

li u i =
δi
δ i −1

δi
we choose li = δ i −1 …………….. (XIV)

δi δi
ui =  sgn 
 δ i −1  δ i −1 …………….. (XV)

Thus li and ui have same magnitude. These then can be used to get wi and αi
from (VII`) and (IX`). We get finally,

δi  δ  δ
li = ; u i =  sgn . i  i …………….. (XI`)
δ i −1 δ i −1  δ i −1

ci
wi = ……………... (XII`)
ui
α i = ai l …………….. (XIII`)
i −1

These are the generalizations of formulae (XI), (XII) and (XIII).

Let us apply this to our example matrix (on page 23).


We get;

28
δ0 = 1 δ1 = 2 δ2 = -2 δ3 = -6 δ4 = -18 δ5 =
36

b1 = 2 b2 = 1 b3 = 5 b4 = -3 b5 = -1

c1 = -2 c2 = -2 c3 = 9 c4 = 3

a1 = -2 a3 = 1 a4 = -2 a5 = 1

We get δ1/δ0 = 2 ; δ2/δ1 = -1 ; δ3/δ2 = 3 ; δ4/δ3 = 3 ; δ5/δ4 = -2

Thus from (XI`) we get


l1 = 2 u1 = 2

l2 = 1 u2 = −1

l3 = 3 u3 = 3

l4 = 3 u4 = 3

From (XII`) we get

C1 − 2 C2 − 2
w1 = = =− 2 ; w2 = = =2 ;
u1 2 u2 −1

C3 9 C4 3
w3 = = =3 3 ; w4 = = = 3
u3 3 u4 3

From (XIII`) we get

a2 − 2 a3 1
α2 = = =− 2 ; α3 = = =1 ;
l1 2 l2 1

29
a4 − 2 a5 1
α4 = = ; α5 = =
l3 3 l4 3

Thus, we have LU decomposition,

 2 − 2 0 0 0 
 2 −2 0 0 0   2 0 0 0 0 
 0 
 −2 1  −1 1 0 0 
 1 0 0  − 2 1 0 0 0  
  0 0 3 −2 0 
A =  0 −2 5 −2 0  =  0 2 3 0 0  3
 
9 −3 1   0   0 1 

0 0  0 3 3 3 0  0 0 3
0 0 0 3 −1  3
 
2 
14444 − 2 
0 0 0 3
4244444 3 0 0 0 0
L 144444 4 2444444 3
U

in which the L and U have corresponding diagonal elements having the same
magnitude.

30
1.6 DOOLITTLE’S LU DECOMPOSITION

We shall now consider the LU decomposition of a general matrix. The


method we describe is due to Doolittle.
Let A = (a ij ). We seek as in the case of a tridiagonal matrix, an LU decomposition
B B

in which the diagonal entries l ii of L are all 1. Let L = (l ii ) ; U = (u ij ). Since L is a B B B B B B

lower triangular matrix, we have


l ij = 0 if j > i ; and by our choice, lii = 1 .
B B

Similarly, since U is an upper triangular matrix, we have


u ij = 0 if i > j.
B B

We determine L and U as follows : The 1 st row of U and 1 st column of L are P


P
P
P

determined as follows :
n
a 11 = ∑
k =1
l1k u k 1

= l 11 u 11 B B B B since l 1k = 0 for k>1


B B

= u 11 B B since l 11 = 1.
B B

Therefore
u11 = a11
In general,
n
a1 j = ∑
k =1
l 1 k u kj

= l11u1 j . since l 1k = 0 for k>1 B B

= u 1j B B since l 11 = 1. B B

⇒ u 1j = a 1j . . . . . . . . . (I)
B B B B

Thus the first row of U is the same as the first row of A. The first column of L is
determined as follows:
n
a j1 = ∑
k =1
l jk u k 1

= l j1 u 11 B B B B since u k1 = 0 if k>1
B B

⇒ l j1 = a j1 /u 11 . . . . . . . . . (II)
B B B B B B

Note : u 11 is already obtained from (I).


B B

31
Thus (I) and (II) determine respectively the first row of U and first column of L.
The other rows of U and columns of L are determined recursively as given below:
Suppose we have determined the first i-1 rows of U and the first i-1 columns of L.
Now we proceed to describe how one then determines the i th row of U and i th P P P P

column of L. Since first i-1 rows of U have been determined, this means, u kj ; are B B

all known for 1 ≤ k ≤ i-1 ; 1 ≤ j ≤ n. Similarly, since first i-1 columns are known
for L, this means, l ik are all known for 1 ≤ i ≤ n ; 1 ≤ k ≤ i-1.
B B

Now
n
a ij = ∑
k =1
l ik u kj

i
= ∑
k =1
l ik u kj
since l ik = 0 for k>i B B

i −1
= ∑l
k =1
ik u kj + l ii u ij

i −1
= ∑l
k =1
ik u kj + u ij since l ii = 1. B B

i−1
⇒ u ij = a ij − ∑
k =1
l ik u kj . . . . . ... . . . .(III)

Note that on the RHS we have a ij which is known from the given matrix. Also the B B

sum on the RHS involves l ik for 1 ≤ k ≤ i-1 which are all known because they B B

involve entries in the first i-1 columns of L ; and they also involve u kj ; 1 ≤ k ≤ i-1 B B

which are also known since they involve only the entries in the first i-1 rows of U.
Thus (III) determines the i th row of U in terms of the known given matrix and
P
P

quantities determined upto the previous stage. Now we describe how to get the
i th column of L :
P
P

n
a ji = ∑
k =1
l jk u ki

i
= ∑k =1
l jk u ki Since u ki = 0 if k>i B B

i −1
= ∑l
k =1
jk u ki + l ji u ii

1  i−1

⇒ l ji =
u ii 
a ji − ∑
k =1
l jk u ki  …..(IV)

32
Once again we note the RHS involves u ii , which has been determined using (III); B B

a ji which is from the given matrix; l jk ; 1 ≤ k ≤ i-1 and hence only entries in the first
B B B B

i-1 columns of L; and u ki , 1 ≤ k ≤ i-1 and hence only entries in the first i-1 rows of B B

U. Thus RHS in (IV) is completely known and hence l ji , the entries in the i th B B P
P

column of L are completely determined by (IV).


Summarizing, Doolittle’s procedure is as follows:
l ii = 1; 1 st row U = 1 st row of A ; Step 1 determining 1 st row of U and
B B P P P P P P

l j1 = a j1 /u 11
B B B B B B 1 st column of L.
P
P

For i ≥ 2; we determine, (by (III) and (IV)),


i−1
u ij = a ij − ∑k =1
l ik u kj ; j = i, i+1, i+2, …….,n

(Note for j<i we have u ij = 0) B B

1  i −1

l ji =
u ii 
a ji − ∑ k =1
l jk u ki  ; j = i, i+1, i+2,…..,n

(Note for j<i we have l ji = 0) B B

We observe that the method fails if u ii = 0 for some i. B B

U Example:
Let
 2 1 −1 3 
 
− 2 2 6 − 4
A=
4 14 19 4 
 
 6 − 
 0 6 12 
Let us determine the Doolittle decomposition for this matrix.

U First step:
1 st row of U : same as 1 st row of A.
P P P P

Therefore u 11 = 2 ; u 12 = 1 ; u 13 = -1 ; u 14 = 3 B B B B B B B B

U 1 st column of L:
UPU
UPU
U

l 11 = 1;
B B

l 21 = a 21 /u 11 = -2/2 = -1.
B B B B B B

l 31 = a 31 /u 11 = 4/2 = 2.
B B B B B B

33
l 41 = a 41 /u 11 = 6/2 = 3.
B B B B B B

U Second step: U

2 nd row of U : u 12 = 0 (Because upper triangular)


P P B B

u 22 = a 22 – l 21 u 12 = 2 – (-1) (1) = 3.
B B B B B B B B

u 23 = a 23 – l 21 u 13 = 6 – (-1) (-1) = 5.
B B B B B B B B

u 24 = a 24 – l 21 u 14 = - 4 – (-1) (3) = -1.


B B B B B B B B

2 nd column of L : l 12 = 0 (Because lower triangular)


P
P
B B

l 22 = 1.
B B

l 32 = (a 32 – l 31 u 12 ) /u 22
B B B B B B B B B

B B = [14 – (2)(1)]/3 = 4.
l 42 = (a 42 – l 41 u 12 ) /u 22
B B B B B B B B B

B B = [0 – (3)(1)]/3 = -1.

U Third Step: U

3 rd row of U: u 31 = 0
P
P
B B (because U is upper triangular )
u 32 = 0B B

u 33 = a 33 – l 31 u 13 – l 32 u 23
B B B B B B B B B B B B

= 19 – (2) (-1) – (4)(5) = 1.


u 34 = a 34 – l 31 u 14 – l 32 u 24
B B B B B B B B B B B B

= 4 – (2) (3) – (4)(-1) = 2.

3 rd column of L : l 13 = 0
P
P
B B (because L is lower triangular)
l 23 = 0
B B B B

l 33 =1
B B

l 43 = (a 43 – l 41 u 13 – l 42 u 23 )/ u 33
B B B B B B B B B B B B B B

= [-6 – (3) (-1) – (-1) (5)]/1


= 2.

34
U Fourth Step: U

4 th row of U: u 41 = 0
P P B B

u 42 = 0B B (because upper triangular)


u 43 = 0B B

u 44 = a 44 – l 41 u 14 – l 42 u 24 – l 43 u 34
B B B B B B B B B B B B B B B B

= 12 – (3) (3) – (-1) (-1) – (2) (2)


= -2.

4 th column of L : l 14 = 0 = l 24 = l 34 Because lower triangular


P P B B B B B B

l 44 = 1.
B B

Thus
1 0 0 0 2 1 −1 3 
   
−1 1 0 0 0 3 5 −1
L= ; U = . . . . . . . . . . .(V)
2 4 1 0 0 0 1 2 
   
 3 −1 1  0 0 0 − 2 
 2 
and
A = LU.
This gives us the LU decomposition by Doolittle’s method for the given A.
As we observed in the case of the LU decomposition of a tridiagonal matrix;
it is not advisable to choose the l ii as 1; but to choose in such a way that the B B

diagonal entries of L and the corresponding diagonal entries of U are of the same
magnitude. We describe this procedure as follows:

Once again 1 st row and 1 st column of U & L respectively is our first concern:
P
P
P
P

U Step 1: U a 11 = l 11 u 11
B B B B B

Choose l11 = a11 ; u11 = (sgn .a11 ) a11


n

Next aij = ∑ l1k ukj = l11u1 j as l1k = 0 for k > 1


k =1

a1 j
⇒ u ij =
l1 1
Thus note that u 1j have been scaled now as compared to what we did earlier.
B B

Similarly,

35
a
l j1 = j1
u11
These determine the first row of U and first column of L. Suppose we have
determined the first i-1 rows of U and first i-1 columns of L. We determine now
the i th row of U and i th column of L as follows:
P
P
P
P

n
a ii = ∑
k =1
l ik u ki

i
= ∑
k =1
l ik u k i for l ik = 0 if k>i
B B

i−1
= ∑k =1
l ik u ki + l ii u ii

Therefore
i −1
l ii u ii = a ii − ∑ l ik u ki = p i , say
k =1

i −1
Choose l ii = pi = a ii − ∑ l ik u ki
k =1

u ii = − sgn pi pi

n i
aij = ∑ lik ukj = ∑ lik u kj Q lik = 0 for k > i
k −1 k =1

i −1
= ∑l k =1
ik u kj + l ii u ij

 i −1

⇒ u ij =  a ij − ∑ l ik u kj  l ii
 
B B

k =1

determining the i th row of U. P


P

n
a ji = ∑l
k =1
jk u ki

36
i
= ∑l
k =1
jk u ki Q u k i = 0 if k > i

i −1
= ∑k =1
l jk u ki + l ji u ii

 i −1

⇒ l ji =  a ji − ∑ l jk u ki  u ii ,
B B

 k =1 

thus determining the i th column of L. P


P

Let us now apply this to matrix A in the example in page 32.


U First Step: U

l 11 u 11 = a 11 = 2 ∴ l 11 = 2 ; u 11 = 2
a12 1 a 1 a 3
u12 = = u13 = 13 = − ; u14 = 14 =
l11 2 l11 2 l11 2
1 1 3
u11 = 2 ; u12 = ; u13 = − ; u14 =
2 2 2
a 21 2
l 21 = =− =− 2
u11 2
a 31 4
l 31 = = =2 2
u 11 2
a 41 6
l 41 = = =3 2
u 11 2
Therefore
l11 = 2
l 21 = − 2
l31 = 2 2
l 41 = 3 2

U Second Step: U

l 22 u 22 = a 22 − l 21u12

37
(
 1 
=2− − 2  =3 )
 2

∴ l 22 = 3; u 22 = 3

u23 =
( a23 − l21u13 )
l22
  1 
6 − − 2  −

( 
2 
)
= = 5
3 3

u24 =
( a24 − l21u14 )
l22

  3 
(
( −4 ) − − 2  
 2 
)
= =− 1
3 3
Therefore
5 1
u21 = 0; u22 = 3; u23 = ; u 24 = −
3 3

l32 =
( a32 − l31u12 )
u22
  1 
14 − 2 2  ( 
 2 
)
=
3

=4 3

l42 =
( a42 − l41u12 )
u22
  1 
(
0 − 3 2  
 2 
)
=
3

=− 3

38
Therefore
l12 = 0
l 22 = 3
l32 = 4 3
l 42 = 3
U Third Step: U

l33u33 = a33 − l31u13 − l32u 23

( )
 1 
= 19 − 2 2  −
 5 
 − 4 3   ( )
 2  3
=1
∴ l33 = 1; u 33 = 1

u34 =
( a34 − l31u14 − l32u24 )
l33
  1  

(
 3 
 4 − 2 2 
 2
)
− 4 3 −

 
3  
( )
= 
1
=2

∴u31 = 0; u32 = 0; u33 = 1, u34 = 2

l43 =
[ a43 − l41u13 − l42u23 ]
u33

  5 
 
(
 1 
 −6 − 3 2  −
2
− − 3 
 
)3


( )
=
1
=2
Therefore
 l 13 = 0
l = 0 
 23
 l 33 = 1 
 
 l 43 = 2

39
U Fourth Step: U

l44u44 = a44 − l41u14 − l42u24 − l43u34

( )
 3 
= 12 − 3 2  (
 1 
 − − 3  − )
 − (2 )(2 )
 2  3
= -2

∴ l44 = 2; u 44 = − 2

∴ u 41 = 0; u 42 = 0; u 43 = 0; u 44 = − 2

 l 14 = 0 
l = 0 
 24
 l 34 = 0 
 
 l 44 = 2 

Thus we get the LU decompositions,


 1 1 3 
 2 0 0 0   2 − 
   2 2 2 
− 2 3 0 0   5 1 
L=  ; U = 0 3 −
2 2 4 3 1 0  3 3
 2 
3 − 3 2   0 0 1
 2 2

 0 0 0 − 2 

in which l ii = u ii , i.e. the corresponding diagonal entries of L and U have the


same magnitude.

U Note: Compare this with the L and U of page 35. What is the difference.
U

The U we have obtained above can be obtained from the U of page 34 by


(1) replacing the ‘numbers’ in the diagonal of the U of Page 34 by their square
roots of their magnitude and keeping the same sign. Thus the first
diagonal 2 is replaced by 2 ; 2 nd diagonal 3 is replaced by 3 , third
P P

th
diagonal 1 by 1 and 4 diagonal –2 by - 2 .
P
P
These then give the
diagonals of the U we have obtained above.

40
(2) Divide each entry to the right of a diagonal in the U of page 35 by these
replaced diagonals.

Thus 1 st row of the U of Page 34 changes to 1 st row of U in page 40


P P P P

2 nd row of the U of Page 34 changes to 2 nd row of U in page 40


P
P
P
P

3 rd row of the U of Page 34 changes to 3 rd row of U in page 40


P P P P

4 th row of the U of Page 34 changes to 4 th row of U in page 40


P
P
P
P

This gives the U of page 40 from that of page 35.


The L in page 40 can be obtained from the L of page 35 as follows:
(1) Replace the diagonals in L by magnitude of the diagonals in U of page 40.
(2) Multiply each entry below the diagonal of L by this new diagonal entry.
We get the L of page 35 changing to the L of page 40.

41
1.7 DOOLITTLE’S METHOD WITH ROW INTERCHANGES

We have seen that Doolittle factorization of a matrix A may fail the moment
at stage i we encounter a uii which is zero. This occurrence corresponds to the
occurrence of zero pivot at the ith stage of simple Gaussian elimination method.
Just as we avoided this problem in the Gaussian elimination method by
introducing partial pivoting we can adopt this procedure in the modified Doolittle’s
procedure. The Doolittle’s method which is used to factorize A as LU is used
from the point of view of reducing the system
Ax = y
to two triangular systems
Lz = y
Ux = z
as already mentioned on page 19.
Thus instead of actually looking for a factorization A = LU we shall be
looking for a system,
A*x = y*
and for which A* has LU decomposition.
We illustrate this by the following example: The basic idea is at each stage
calculate all the uii that one can get by the permutation of rows of the matrix and
choose that matrix which gives the maximum absolute value for uii.
As an example consider the system
Ax = y
where
 3 1 − 2 − 1  3 
   
2 − 2 2 3  − 8
A=
1 5 − 4 − 1
y=  
  3
3 1   
 − 1
 2 3   
We want LU decomposition for some matrix that is obtained from A by row
interchanges.
We keep lii = 1 for all i .
Stage 1:
1st diagonal of U. By Doolittle decomposition,
u11 = a11 = 3
If we interchange 2nd or 3rd or 4th rows with 1st row and then find the u11 for the
new matrix we get respectively u11 = 2 or 1 or 3. Thus interchange of rows does

42
not give any advantage at this stage as we have already got 3, without row
interchange, for u11.

So we keep the matrix as it is and calculate 1st row of U, by Doolittle’s method.


u11 = 3; u12 = a12 = 1; u13 = a13 = −2; u14 = −1
The first column of L:
a 21 2 a 1 a 3
l11 = 1; l 21 = = ; l31 = 31 = ; l 41 = 41 = = 1.
u11 3 u11 3 u11 3
Thus
 1 0 0 0 
2 
 3 1 0 0 
L is of the form   ; and
 13 l 32 1 0 
 
 1 l 42 l43 l 44 

3 1 −2 −1 
0 u u23 u24 
U is of the form 
22
; A and Y remaining unchanged.
0 0 u33 u34 
 
0 0 0 u44 

Stage 2

We now calculate the second diagonal of U: By Doolittle’s method we have

u 22 = a22 − l 21u12
 2
= − 2 −   (1) = −
8
 3 3
Suppose we interchange 2nd row with 3rd row of A and calculate u22 : our new a22
is 5.
But note that l21 and l31 get interchanged. Therefore new l21 is 1/3.
Suppose instead of above we interchange 2nd row with 4th row of A:
New a22 = 1 and new l21 = 1 and therefore new u22 = 1 – (1) (1) = 0

43
Of these 14/3 has largest absolute value. So we prefer this. Therefore we
interchange 2nd and 3rd row.

3 1 −2 − 1  3 
   
1 5 −4 − 1  3 
NewA =  ; Newy = 
2 −2 2 3  − 8
   
3 3   − 1
 1 2  
 1 0 0 0 3 1 −2 −1 
   
 13 1 0 0  0 14 3 * *
NewL =   ; NewU =  
 23 * 1 0 0 0 * *
  0
 0 0 * 
 1 * * 1

Now we do the Doolittle calculation for this new matrix to get 2nd row of U and 2nd
column of L.

u 23 = a 23 − l 21u13
1
= (− 4 ) −  (− 2 ) = −
10
 
3 3

u 24 = a 24 − l 21u14
1
= (− 1) −  (− 1) = −
2
3 3

2nd column of L:
l 32 = [a 32 − l 31u12 ] ÷ u 22
  2   14
=  (− 2 ) −   (1 ) ÷
 3  3
4
= −
7
l42 = [a42 − l41u12 ] ÷ u11

= [3 − (1 )(1 )] ÷
14
3
=0

44
 1 0 0 0
 1 
 3
1 0 0
Therefore new L has form  
 2 −4 1 0
3 7
 
 1 0 * 1 

3 1 −2 −1 
 
 0 14 3 − 10 −2 
New U has form  3 3

0 0 * * 
0 * 
 0 0

This completes the 2nd stage of our computation.

Note: We had three choices of u22 to be calculated, namely –8/3, 14/3, 0 before
we chose 14/3. It appears that we are doing more work than Doolittle. But this is
not really so. For, observe, that the rejected u22 namely – 8/3 and 0 when divided
by the chosen u22 namely 14/3 give the entries of L below the second diagonal.

3rd Stage:
3rd diagonal of U:
u 33 = a 33 − l 31u13 − l 32 u 23
 2  4  10 
= 2 −  (− 2) −  −  − 
 3  7  3 
10
=
7
Suppose we interchange 3rd row and 4th row of new A obtained in 2nd stage. We
get new a33 = 2.
But in L also the second column gets 3rd and 4th row interchanges
Therefore new l31 = 1 and new l32 = 0
Therefore new u33 = a33 – l31 u13 – l32 u23
 10 
= 2 − (1)(− 2) + (0 ) − 
 3
= 4.

45
Of these two choices of u33 we have 4 has the largest magnitude. So we
interchange 3rd and 4th rows of the matrix of 2nd stage to get

 3 1 − 2 − 1  3 
   
 1 5 − 4 − 1  3 
NewA =   NewY =  
3 1 2 3 −1
   
2 − 2 2 3   − 8
  

 1 0 0 0 3 1 −2 −1 
   
 13 1 0 0  0 14 3 − 10 −2 
NewL =   ; NewU = 
3 3

 1 0 1 0
0 0 4 * 
 2  
−4 * 1 0 0 0 * 
 3 7 

Now for this set up we calculate the 3rd stage entries as in Doolittle’s method:
u 34 = a34 − l 31u14 − l32 u 24
 2
= 3 − (1)(− 1) − (0 ) −  = 4
 3

l 43 = (a 43 − l 41u13 − l 42 u 23 ) ÷ u 33
  2  4  10 
= 2 −  (− 2) −  −  −  ÷ 4
  3  7  3 
= 5/14.
 1 0 00 3 1 −2 −1 
   
 13 1 0 0  0 14 3 − 10 −2 
∴ NewL =   ; NewU = 
3 3

 1 0 1 0
0 0 4 4 
 2  
−4 5 1 0 0 0 * 
 3 7 14 

Note: The rejected u33 divided by chosen u33 gives l43.

46
4th Stage
u 44 = [a44 − l 41u14 − l 42 u 24 − l 43u 34 ]
 2  4  2   5 
= 3 −  (− 1) −  −  −  −  (4)
 3  7  3   14 
= 13/7.

3 1 −2 − 1  3 
   
1 5 −4 − 1  3 
∴ NewA = A * =  NewY = Y *
=
3 1 2 3  − 1
   
2 −2 3   − 8
 2  

New L = L* , New U = U*

 1 0 0 0 3 1 −2 −1 
   
 13 1 0 0  0 14 3 − 10 −2 
3 3
L =
*
 ;U = 
*
,
 1 0 1 0 0 0 4 4 
 2 −4 5 1   0 0 0 13 
 3 7 14   7

and A* = L*U*

The given system Ax=y is equivalent to the system

A*x=y*

and hence can be split into the triangular systems,

L * z = y*

U*x = z

Now L*z = y* gives by forward substitution:

47
Z1 =3

1
z1 + z 2 = 3 ⇒ z 2 = 3 − 1 = 2
3

z1 + z 3 = − 1 ⇒ z 3 = − 1 − z 1 = − 4

2 4 5
z1 − z 2 + z 3 + z 4 = −8
3 7 14

2 4  5 
  (3 ) −   (2 ) +   (− 4 ) + z 4 = − 8
3 7  14 

52
⇒ z4 = −
7
 3 
 
 2 
∴z = −4 
 52 
− 
 7 

Therefore U*x = z gives by back-substitution;

13 52
x4 = − therefore x4 = -4.
7 7

4x3 + 4x4 = −4 ⇒ x3 + x4 = −1 ⇒ x3 = −1− x4 = 3

therefore x3 = 3
14 10 2
x2 − x3 − x 4 = 2
3 3 3

48
 10   2 
x 2 −  (3) − (− 4 ) = 2
14
3  3   3

⇒ x2 = 2

3 x1 + x 2 − 2 x3 − x 4 = 3

⇒ 3x1 + 2 − 6 + 4 − 3 ⇒ x1 = 1

Therefore the solution of the given system is

 1 
 
 2 
x = 
3 
 
− 4
 

Some Remarks:

The factorization of a matrix A as the product of lower and upper triangular


matrices is by no means unique. In fact, the diagonal elements of one or the
other factor can be chosen arbitrarily; all the remaining elements of the upper
and lower triangular matrices may then be uniquely determined as in Doolittle’s
method; which is the case when we choose all the diagonal entries of L as 1.
The name of Crout is often associated with triangular decomposition methods,
and in Crout’s method the diagonal elements of U are all chosen as unity. Apart
from this, there is little distinction, as regards procedure or accuracy, between the
two methods.

As already mentioned, Wilkinson’s (see Page 25), suggestion is to get a LU


decomposition in which l ii = u ii ;1 ≤ i ≤ n .

We finally look at the Cholesky decomposition for a symmetric matrix:

Let A be a symmetric matrix.

49
Let A = LU be its LU decomposition

Then AT = UT LT where the superscript T denotes the Transpose.


We have
UT is lower triangular and LT is upper triangular
Therefore UTLT is a decomposition of AT as product of lower and upper triangular
matrices. But AT = A since A is symmetric.
Therefore LU = UTLT
We ask the question whether we can choose L as UT; so that
A = UTU (or same as LLT)
In that case, determining U automatically gets L = UT
We now do the Doolittle method for this. Note that it is enough to determine the
rows of U.
Stage 1: 1st row of U:
n n
a11 = ∑l1k uk1 = ∑ukl 2 since l1k = uk1 Q L = UT
k =1 k =1

= u112 since u k 1 = 0 for k>1 as U is upper triangular.

∴ u 11 = a 11

n n
a1i = ∑ l1k u ki = ∑ u k1u ki
k =1 k =1

= u 11 u 1 i since uk 1 = 0 for k > 1

u11 = a11

∴ u1i = a1i / u11 determines the first row of U.


and hence the first column of L.

Having determined the 1st i-1 rows of U; we determine the ith row of U as follows:

n n
a ii = ∑
k =1
l ik u k i = ∑ k =1
u k l 2 s i n c e l ik = u k i

50
i
= ∑k =1
u ki
2
s in c e u ki = 0 fo r k > i

i −1
= ∑uk =1
ki
2
+ u ii 2

i −1
∴ u ii 2 = a ii − ∑k =1
u ki 2

i −1
∴ u ii = a ii − ∑k =1
u ki 2 ;

( Note that uki are known for k ≤ i -1,because 1st i-1 rows have already been
obtained).

n n
a ij = ∑l
k =1
ik u kj = ∑u
k =1
ki u kj Now we need uij for j > i

i
= ∑ u ki u kj Because uki = 0 for k > i
k =1

i −1
= ∑u
k =1
ki u kj + u ii u ij

Therefore

 i −1

u ij =  a ij −


k =1
u ki u kj  ÷ u ii

51
 i −1
 u ii = a ii − ∑ u ki
2

 k =1
∴
u =  a − 
i −1

 ij  ij ∑ u ki kj  ÷ u ij
u

 k = 1

determines the ith row of U in terms of the previous rows. Thus we get U and L is
U1. This is called CHOLESKY decomposition.

Example:

 1 −1 1 1
 
−1 5 − 3 3 
Let A = 
1 −3 3 1
 
 1 
 3 1 10 

This is a symmetric matrix. Let us find the Cholesky decomposition.

1st row of U

u11 = a11 = 1

u12 = a12 ÷ u11 = − 1

u13 = a13 ÷ u11 = 1
u = a14 ÷ u11 = 1
 14

2nd row of U

u = a − u 212 = 5 − 1 = 2
 22 22

u 23 = (a 23 − u12 u13 ) ÷ u 22 = (− 3 − (− 1)(1)) ÷ 2 = −1


u = (a − u u ) ÷ u = (3 − (− 1)(1)) ÷ 2 = 2
 24 24 12 14 22

3rd row of U

52
u = a − u 213 − u 2 23 = 3 − 1 − 1 = 1

33 33

u 34 = (a34 − u13u14 − u 23u 24 ) ÷ u 33 = (1 − (1)(1) − (− 1)(2 )) ÷ 1 = 2

4th row of U

u 44 = a44 − u 214 − u 2 24 − u 2 34 = 10 − 1 − 4 − 4 = 1

1 −1 1 1 1 0 0 0
   
0 2 −1 2 −1 2 0 0
∴U =  ∴U 1 = L = 
2 1 0
and
0 0 1 1 −1
   
0 0 1  1 2 1 
 0  2

A = LU
= LLT
= UTU

53
2. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEM OF EQUATIONS

2.1 Introduction To Iterative Methods

In general an iterative scheme is as follows:

We have an nxn matrix M and we want to get the solution of the system

x = Mx + y ……………………..(1)

We obtain the solution x as the limit of a sequence of vectors, {x k } which are obtained as
follows:

We start with any initial vector x(0), and calculate x(k) from,

x(k) = Mx(k-1) + y ……………….(2)

for k = 1,2,3, ….. successively.

We shall mention that a necessary and sufficient condition for the sequence of vectors
(k)
x to converge to a solution x of (1) is that the spectral radius M sp of the iterating
matrix M is less than 1 or if M for some matrix norm. (We shall introduce the notion of
norm formally in the next unit).

We shall now consider some iterative schemes for solving systems of linear equations,

Ax = y …………….(3)

We write this system in detail as

a11 x1 + a12 x 2 + ..... + a1n xn = y1


a 21 x1 + a 22 x2 + ..... + a 2 n xn = y 2
. . . . . . . .(4)
...... ...... ......
a n1 x1 + a n 2 x2 + ..... + a nn xn = y n
We have

 a11 a12 K a1n 


 
a a 22 K a2 n 
A =  21 . . . . . . . . . . . (5)
K K K K
 
 a n1 an 2 K a nn 

54
We denote by D, L, U the matrices

 a11 0 ... ... 0 


 
 0 a 22 ... ... 0 
D= 0 0 a 33 ... 0 .......... .......... ......( 6)
 
 ... ... ... ... ... 
 
 0 0 ... ... a nn 

the diagonal part of A; and

 0 0 K K 0
 
 a21 0 K K 0
L =  a31 a32 0 K 0  ..................................(7)
 
K K K K K
a K an ,n −1 0 
 n1 an 2

the lower triangular part of A; and

 0 a12 ... ... a1n 


 
0 0 a23 ... a2 n 
U = M M M M M  ........................................(8)
 
 ... ... ... ... an −1,n −1 
0 0 0 
 0 ...

the upper triangular part of A.

Note that,

A = D + L + U……………………… (9).
We assume that aii ≠ 0 ; i = 1, 2, ……, n …………(10)
so that D-1 exists.

We now describe two important iterative schemes, in the next section, for solving the
system (3).

55
2.2 JACOBI ITERATION

We write the system as in (4) of section 2.1 as

a11 x1 = −a12 x2 − a13 x3 ..... − a1n xn + y1


a 22 x2 = −a 21 x1 − a 23 x3 ..... − a 2 n xn + y 2 . . . . . . . .(11)
...... ...... ......
a nn xn = −a n1 x1 − a n 2 x2 ..... − a nn−1 x n−1 + y n

We start with an initial vector,

 x1(0) 
 (0) 
=  2  . . . . . . . .(12)
x
x(
0)
 M 
 (0) 
 xn 

and substitute this vector for x in the RHS of (11) and calculate x1,x2, ….., xn and this
vector is called x(1). We now substitute this vector in the RHS of (11) to calculate again
x1, x2, ….., xn and call this new vector as x(2) and continue this procedure to calculate the
sequence {x (k) } . Thus,

The equation (11) can be written as,

Dx = - (L + U) x + y …………………. (13)

which we can write as

x = -D-1 (L+U) x +D-1 y,

giving
x = J x + yˆ ……………… (14)

where

J = -D-1 (L + U) …………….(15)

and, we get

x(0) starting vector


…………….(16)
(k−1)
x = Jx
(k )
+ yˆ for k =1,2,........

57
as the iterative scheme. This is similar to (2 in section 2.1) with the iterating matrix M as
J = -D-1 (L + U); J is called the Jacobi Iteration Matrix. The scheme will converge to the
solution x of our system if J sp < 1 . We shall see an easier condition below:
We have

 1 
 a11 
 1 
-1  a22 
D = 
 O 
 1 

 ann 

and therefore

 0 −
a12

a13
.... −
a1n 
 a11 a11 a11 
 
 − a21 a23 a2n 
0 − .... −
J = −D ( L + U ) = 
−1 a22 a22 a22 
 .... .... .... .... .... 
 
 − an1 a
− n2 −
an,n−1
0 
 ....
 ann ann ann 

Now therefore the ith Absolute row sum for J is

Ri = ∑
aij
=
(a i1 + ai 2 + .... + ai ,i −1 + ai ,i +1 + .... + ain )
j ≠i aii aii

∴ If Ri <1 for every i =1,2,3,…..,n

then

J ∞
= max{R1 ,....., Rn } < 1

and we have convergence.

Now Ri < 1 means

58
ai1 + ai 2 + ..... + ai ,i −1 + ai ,i +1 + ..... + ain < aii

i.e. in each row of A the sum of the absolute values of the non diagonal entries is
dominated by the absolute value of the diagonal entry (in which case A is called ‘strictly
row diagonally dominant’). Thus the Jacobi iteration scheme for the system (3)
converges if A is strictly row diagonally dominant (Of course, this condition may not be
satisfied) and still Jacobi iteration scheme may converge if J sp < 1.

Example 1:

Consider the system

x1 + 2x2 – 2x3 = 1
x1 + x2 + x3 = 0 ………….(I)
2x1 + 2x2 + x3 = 0

Let us apply the Jacobi iteration scheme with the initial vector as

0
 
x (0)
= θ = 0 ………….(II)
0
 
1 2 − 2 1 0 0
   
We have A =  1 1 1  ; D = 0 1 0
2 1  0 1 
 2  0

0 2 − 2 1
   
L +U = 1 0 1  ; y =  0
2 0   0
 2  

 0 − 2 + 2 1
   
J = − D −1 (L + U ) =  − 1 0 − 1  ;
−1
yˆ = D y =  0 
− 2 − 2 0   0
   

Thus the Jacobi scheme (16) becomes

0
(0 )  
x = 0
0
 

59
x( ) = Jx(
k−1)
+ yˆ,
k
k = 1, 2,.......

1
(1)  
( 0)
∴ x = Jx + yˆ = yˆ =  0  since x (0) is the zero vector.
 0
 

 0 − 2 + 2  1   1 
(2 ) (1)     
x = Jx + yˆ =  − 1 0 − 1  0  +  0 
 − 2 − 2 0  0   0 
    

 0  1  1 
     
=  − 1 +  0 =  − 1 
 − 2  0  − 2
     

 0 − 2 2  1   1 
(3 ) (2 )     
x = Jx + yˆ =  − 1 0 − 1 − 1  +  0 
 − 2 − 2 0  − 2   0 
    

 − 2   1   − 1
     
=  1  +  0 =  1 
 0   0  0 
     

 0 − 2 2  − 1  1 
    
x (4 ) = Jx (3) + yˆ =  − 1 0 − 1 1  +  0 
 − 2 − 2 0  0   0 
    

 − 2   1   − 1
     
=  1  +  0  =  1  = x (3 )
 0  0  0 
     

∴ x(4) = x(5) = x(6) = ………. = x(3)

∴ x(k) = x(3) and x(k) converges to x(3)

60
∴ The solution is

 −1
(k) (3)  
x = lim x =x = 1 
k →∞
0
 

We can easily check that this is the exact solution.

Here, there is no convergence problem at all.

Example 2:

8x1 + 2x2 – 2x3 = 8


x1 - 8x2 + 3x3 = 19
2x1 + x2 + 9x3 = 30
 0
 
Let us apply Jacobi iteration scheme starting with x (0 ) =  0
 0
 

 1 0 0 
 8 0 0  8 
 
We have D =  0 − 8 0 ∴ D −1 = 0 −1 0 
 8 
0 0 9 
  0 0 1 
 9

 0 − 0.25
+ 0.25 
 
J = − D −1 (L + U ) =  + 0.125 0.375 
0
 − 0.22222 − 0.11111 0 

 1 
−1
 
yˆ = D y =  − 2 .375 
 3.33333 
 

Now the matrix is such that

a11 = 8 and a12 + a13 = 2 + 2 = 4 ∴ a 11 > a 12 + a13

a22 = 8 and a21 + a23 = 1 + 3 = 4; ∴ a 22 > a 21 + a 23

61
a33 = 9 and a31 + a32 = 2 + 1 = 3 ∴ a 33 > a 31 + a 32

Thus we have strict row diagonally dominant matrix A. Hence the Jacobi iteration
scheme will converge. The scheme is,
 0
 
x 0 =  0
 0
 
x ( k ) = Jx ( k −1) + yˆ

 0 − 0.25 0.25 
 
=  0.125 0 0.375  x ( k −1) + yˆ
 − 0.22222 − 0.11111 0 

 1 
(1)  
x = yˆ =  − 2 .375 
 3 .33333 
 

We continue the iteration until the components of x(k) and x(k+1) differ by at most, say;
3x10-5 , that is, x ( k +1 ) − x ( k ) ≤ 3 x10 − 5 , we get x (1) − x (0 ) = 3 .33333 . So we
∞ ∞

continue

 2.42708 
(2 ) (1)  
x = Jx + yˆ =  − 1.00000  x (2 ) − x (1 ) = 1.42708 ≥∈

 3.37500 
 

 2 .09375 
(3 ) (2 )  
x = Jx + yˆ =  − 0 .80599 ; x (3 ) − x ( 2 ) = 0.46991 ≥∈

 2 .90509 
 

 1 .92777 
(4 ) (3 )  
x = Jx + yˆ =  − 1 .02387 ; x ( 4 ) − x (3 ) = 0 .21788 ≥∈

 2 .95761 
 

62
 1 .99537 
(5 ) (4 )  
x = Jx + yˆ =  − 1 .02492 ; x (5 ) − x ( 4 ) = 0 .06760 ≥∈

 3 .01870 
 

 2 .01091 
(6 )  
x = Jx (5 ) + yˆ =  − 0 .99356 ; x ( 6 ) − x (5 ) = 0.03136 ≥∈

 3 .00380 
 

 1 .99934 
(7 ) (6 )  
x = Jx + yˆ =  − 0 .99721 ; x ( 7 ) − x (6 ) = 0 .01157 ≥∈

 2 .99686 
 

 1 .99852 
(8 )  
x = Jx (7 ) + yˆ =  − 1 .00126 ; x (8 ) − x ( 7 ) = 0 .00405 ≥∈

 2 .99984 
 

 2 .00027 
(9 ) (8 )  
x = Jx + yˆ =  − 1 .00025 ; x (9 ) − x (8 ) = 0 .00176 ≥∈

 3 .00047 
 

 2 .00018 
 
x (10 ) = Jx (9 ) + yˆ =  − 0 .99979 ; x (10 ) − x (9 ) = 0 .00050 ≥∈

 2 .99997 
 

 1 .99994 
(11 ) (10 )   ( ) ( )
x = Jx + yˆ =  − 0 .99999 ; x 11 − x 10 = 0 .00024 ≥∈

 2 .99994 
 

 1 .99998 
(12 ) (11 )  
x = Jx + yˆ =  − 1 .00003 ; x (12 ) − x (11 ) = 0 .00008 ≥∈

 3 .00001 
 

63
 2 .00001 
(13 ) (12 )  
x = Jx + yˆ =  − 1 .00000 ; x (13 ) − x (12 ) = 0 .00003 =∈

 3 .00001 
 

Hence the solution is x1=2 ; x2=-1, x3=3.00001


(The Exact solution is x1 = 1, x2 = -2, x3 =3).

64
2.3 GAUSS – SEIDEL METHOD

Once again we consider the system

Ax = y …………….. (I)

In the Jacobi scheme we used the values of x2( k ) , x3( k ) ,L , xn( k ) obtained in the kth
iteration, in place of x2, x3, ….., xn in the first equation,
a11 x1 + a12 x2 + ..... + a1n xn = y1

to calculate x1( k +1) from

a11 x1( k +1) = − a12 x2( k ) − a13 x3( k ) ..... − a1n xn( k ) + y1

Similarly, in the ith equation, ( i = 2,3,L , n ) , we used the values,


(k ) (k ) (k )
x , x ,L , x , x ,L , x
1 2 i −1
(k )
i +1
(k )
n , in place of x1 , x2 ,L , xi −1 , xi +1 ,L , xn to calculate xi( k +1)
from

aii xi( k +1) = − ai1 x1( k ) − ai 2 x2( k ) − ...... − ai ,i −1 xi(−k1) − ai ,i +1 xi(+k1) − ..... − ain xn( k ) + yi LL (*)

What Gauss – Seidel suggests is that having obtained x1( k +1) from the first
equation use this value for x1 in the second equation to calculate x2( k +1) from

a22 x2( k +1) = − a21 x1( k +1) − a23 x3( k ) − ...... − a2 n xn( k ) + y2

and use these values of x1( k +1) , x2( k +1) , in the 3rd equation to calculate x3( k +1) , and so
on. Thus in the equation (*) use x1( k +1) , L , xi(−k1+1) in place of x1( k ) , x2( k ) , L , xi(−k1) to get
the following modification of the i -th equation to calculate xi( k +1) :

aii xi( k +1) = − ai1 x1( k +1) − ai 2 x2( k +1) − ...... − ai ,i −1 xi(−k1+1) − ai ,i +1 xi(+k1) − ai ,i + 2 xi(+k2) ..... − ain xn( k ) + yi
In matrix notation we can write this as,

Dx ( k +1) = − Lx ( k +1) − Ux ( k ) + y

which can be rewritten as,

(D + L )x (k +1) = −Ux (k ) + y , and hence


x (k +1 ) = − (D + L ) Ux k + (D + L ) y
−1 −1

65
Thus we get the Gauss – Seidel iteration scheme as,

x(0) initial guess

x (k +1) = Gx ( k ) + yˆ ……..(II)

where,

G = -(D+L)-1U

is the Gauss – Seidel iteration matrix, and

yˆ = (D + L ) y
−1

The scheme converges if and only if G sp


< 1. Of course, the scheme will
converge if G < 1 in some matrix norm. But some matrix norm, G ≥ 1 does not
mean that the scheme will diverge. The acid test for convergence is G sp
< 1.

We shall now consider some examples.

Example 3:

Let us consider the system

x1 + 2x2 – 2x3 = 1
x1 + x2 + x3 = 0 ,
2x1 + 2x2 + x3 = 0

considered in example 1 on page 59; and for which the Jacobi scheme gave the
exact solution in the 3rd iteration. (see page 60). We shall now try to apply the
Gauss – Seidel scheme for this system. We have,

1 2 − 2 1
   
A = 1 1 1  ; y =  0
2 1   0
 2  

1 0 0 0 −2 2 
   
D + L = 1 1 0 ; −U =  0 0 −1
2 1  0 0 
 2  0

66
 1 0 0
 
(D + L ) −1
= −1 1 0
 0 −2 1 

Thus,

1 0 0   0 −2 2 
G = −( D + L) U =  −1 1 0   0 0 −1 

−1

 0 −2 1   0 0 0 
  

From above we get the Gauss – Seidel iteration matrix as,

0 −2 2 
 
G = 0 2 − 3
0 2 
 0

Since G is triangular we get its eigenvalues immediately, as its diagonal entries.


Thus
λ1 = 0, λ2 = 2, λ3 = 2 are the three eigenvalues. Therefore,

G sp
= 2 >1
Hence the Gauss – Seidel scheme for this system will not converge. Thus for
this system the Jacobi scheme converges so rapidly giving the exact solution in
the third iteration itself whereas the Gauss – Seidel scheme does not converge.

Example 4:

Consider the system

1 1
x1 − x2 − x3 = 1
2 2
x1 + x 2 + x 3 = 0
1 1
− x1 − x 2 + x 3 = 0
2 2

Let us apply the Gauss – Seidel scheme to this system. We have,

67
 1 −1 −1 
 2 2 1
 
A= 1 1 1  ; y =  0
   0
− 1 −1 1   
 2 2 

 1 0 0   1 0 0 
 
(D + L)
−1
D+L= 1 1 0 ; =  −1 1 0 ,
   
− 1 −1 1  0 1 1
 2 2   2 

0 1 1 
 2 2
−U =  0 0 −1  .
 
0 0 0 
 

Thus,

G = −(D + L)^(−1) U = [  1    0   0 ] [ 0  1/2  1/2 ]
                      [ −1    1   0 ] [ 0   0   −1  ]
                      [  0   1/2  1 ] [ 0   0    0  ]

∴ G = [ 0   1/2   1/2 ]
      [ 0  −1/2  −3/2 ]          ............(*)
      [ 0    0   −1/2 ]

is the Gauss – Seidel matrix for this system.

The Gauss – Seidel scheme is

x^(k+1) = G x^(k) + ŷ ,   x^(0) = (0, 0, 0)^T

68
where

 1 0 01  1 
 
   
yˆ = ( D + L )
−1
y =  −1 1 0   0  =  −1 ;
    
 0 1 10  0 
 2 
and
where G is given (*).

Notice that G is upper triangular and hence we readily get the eigenvalues of G
as its diagonal entries. Thus the eigenvalues of G are λ1 = 0, λ2 = −1/2, λ3 = −1/2.
Hence ||G||_sp = 1/2 < 1. Hence in this example the Gauss – Seidel scheme
will converge.

Let us now carry out a few steps of the Gauss – Seidel iteration, since we
have now been assured of convergence. (We shall first do some exact
calculations).

 0  1   1 
     
x (1 ) = Gx (0 ) + yˆ = G  0  +  − 1 =  − 1
 0  0   0 
     

 1   1 
(2 ) (1 )    
x = Gx + yˆ = G  − 1 +  − 1 
 0   0 
   

0 1 1 
 2 2  1   1 
= 0 − 1 − 3   −1 +  −1
 2 2    
 0 0 − 1   0   0 
 2 

 1− 1 
 2 
= − 1− 1
 (2 ) 

 
 0 
 

69
 1− 1 + 1 2 
 
( )
2 2
x (3 ) = Gx ( 2 ) 
+ y = − 1− 2 +
ˆ 1 1 
 22 
 0 
 

If we continue this process we get


 1− 2 +
1 1 2 − ..... +
(− 1)k −1

k −1


 2 2 
x (k )  
= − 1 − 2 +
1 1 − ..... + (− 1 ) k −1


  22 2 k −1  
 0 
 
 

Clearly,

 1 − 1 + 1 2 − 1 3 + 1 4 ..... 
 2 2 2 2 
x(
k) 
(
→  − 1 − 1 2 + 1 2 − 1 3 ..... 

2 2


)
 0 
 

and by summing up the geometric series we get,

 2 3 
(k )  
x →  − 2 3
 0 
 

which is the exact solution.

Of course, here ‘we’ knew ‘a priori’ that the sequence is going to sum up
neatly for each component and so we did exact calculation. If we had not noticed
this we still would have carried out the computations as follows:

 1 
 
x (1 ) = Gx (0 ) + yˆ =  − 1  as before
 0 
 

70
 0 .5 
(2 ) (1)  
x = Gx + yˆ =  − 0 .5 
 0 
 

 0 .625 
(3 ) (2 )  
x = Gx + yˆ =  − 0 .625 
 0 

 0 .6875 
(4 ) (3 )  
x = Gx + yˆ =  − 0 .6875 
 
 0 

 0 .65625 
 
x (5 ) = Gx (4 ) + yˆ =  − 0 .65625  ; x (5 ) − x ( 4 ) = 0 .03125

 
 0 

 0 .671875 
 
x (6 ) = Gx (5 ) + yˆ =  − 0 .671875  ; x (6 ) − x (5 ) = 0.015625

 
 0 

 0 . 664062 
(7 )  
x =  − 0 . 664062  ; x ( 7 ) − x (6 ) = 0 .007813

 
 0 
 0 . 667969 
 
x (8 ) =  − 0 . 667969  ; x (8 ) − x (7 ) = 0 .003907

 
 0 

 0 . 666016 
(9 )  
x =  − 0 . 666016  ; x (9 ) − x (8 ) = 0 .001953

 
 0 

 0 . 666504 
(10 )  
x =  − 0 . 666504 ; x (10 ) − x (9 ) = 0 .000488

 
 0 

71
Since the error is now < 10^(−3) we may stop here and take x(10) as our solution for the
system. Or we may improve our accuracy by doing more iterations, to get,

 0 . 666748   0 . 666626   0 . 666687 


     
x (11 ) =  − 0 . 666748  ; x
(12 )
=  − 0 . 666626  ; x
(13 )
=  − 0 . 666687 
     
 0   0   0 

 0 . 666656 
(14 )  
x =  − 0 . 666656  x (14 ) − x (13 ) = 0 .000031 < 10 −4

 
 0 

and hence we can take x(14) as our solution within error 10^(−4).

Let us now try to apply the Jacobi scheme for this system. We have

 1 −1 −1 
 2 2
A= 1 1 1  ; and therefore,
 
− 1 −1 1 
 2 2 

 0 1 1 
 2 2
J =  −1 0 −1 
 
1 1 0 
 2 2 

We have the characteristic polynomial of J as

              |  λ    −1/2  −1/2 |
|λI − J|  =   |  1      λ     1  |   =  (λ + 1/2)(λ² − λ/2 + 1)
              | −1/2  −1/2    λ  |

Thus the eigenvalues of J are

λ1 = −1/2 ;   λ2 = 1/4 + (i/4)√15 ;   λ3 = 1/4 − (i/4)√15

∴ |λ1| = 1/2 ;   |λ2| = |λ3| = √(1/16 + 15/16) = 1

72
∴ ||J||_sp = 1, which is not less than 1. Thus the Jacobi scheme for this system will not
converge.

Thus, in example 3 we had a system for which the Jacobi scheme converged
but Gauss – Seidel scheme did not converge; whereas in example 4 above we
have a system for which the Jacobi scheme does not converge, but the Gauss –
Seidel scheme converges. Thus, these two examples demonstrate that, in
general, it is not ‘correct’ to say that one scheme is better than the other.

Let us now consider another example.

Example 5:

2x1 – x2 =y1
-x1 + 2x2 – x3 = y2
-x2 + 2x3 –x4 =y3
-x3 + 2x4 = y4

Here

 2 −1 0 0 
 
−1 2 −1 0 
A= ,
0 − 1 2 − 1
 
 0 − 
 0 1 2 

is a symmetric tridiagonal matrix.

The Jacobi matrix for this scheme is

 0 1 0 0 
 2 
 1 0 1 0 
J =  2 2 
 0 1 0 1 
 2 2 
 
0 0 1 0 
 2 

The characteristic equation is,

16 λ 4 - 12 λ 2 + 1 = 0 ………………(CJ)

Set λ 2 = α

73
Therefore

16α2 - 12α + 1 = 0 ………………(CJ1)

∴ The eigenvalues λ are the square roots of the roots of (CJ1).

Thus the eigenvalues of J are ± 0.3090; ± 0.8090.

Hence

||J||_sp = 0.8090 < 1 ; and so the Jacobi scheme will converge.

The Gauss – Seidel matrix for the system is found as follows:

 2 0 0 0
 
−1 2
(D + L ) = 
0 0
0 −1 2 0
 
 0 0 −1 2 

0 1 0 0
 
0 0 1 0
−U = 
0 0 0 1
 
0 0 
 0 0

 1 0 0 0 
 2 
 1 1 0 0 
(D + L ) = 
−1 4 2
 1 1 1 0 
 8 4 2 
 1 1 1 1 
 16 8 4 2

74
 1 0 0 0 
 2   0 1 0 0 
 1 1 0 0   
0 0 1 0 
G = − (D + L ) U =   
−1 4 2
 1 1 1 0   0 0 0 1 
 8 4 2   
 1 1 1 1   0 0 0 0 
 16 8 4 2 
0 1 0 0 
 2 
0 1 1 0 
=  4 2 
0 1 1 1 
 8 4 2 
 1 1 1 
0
 16 8 4 

The characteristic equation of G is

λI − G = 0 , which becomes in this case

16λ4 − 12λ3 + λ 2 = 0.......... .......... (C G )

This can be factored as

λ 2 (16λ2 − 12λ + 1) = 0

Thus the eigenvalues of G are roots of

λ2 = 0 ; and

16λ2 - 12λ + 1 = 0 ………….(CG1)

Thus one of the eigenvalues of G is 0 (repeated twice), and two eigenvalues of G


are roots of (CG1). Notice that roots of (CG1) are same as those of (CJ1). Thus
nonzero eigenvalues of G are squares of eigenvalues of J. ∴ the nonzero
eigenvalues of G are,

0.0955, 0.6545.

Thus,

||G||_sp = 0.6545 < 1

Thus the Gauss – Seidel scheme also converges. Observe that

75
||G||_sp = ||J||_sp²  ;   ||G||_sp < ||J||_sp

Thus the Gauss – Seidel scheme converges faster than the Jacobi scheme.

For many classes of problems where both schemes converge, it is the Gauss –
Seidel scheme that converges faster. We shall not go into any further details of
this aspect.
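
As an illustrative check of Example 5 (not part of the original notes), the following NumPy sketch computes the spectral radii of the Jacobi and Gauss–Seidel iteration matrices and confirms that ρ(G) = ρ(J)²:

```python
import numpy as np

A = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

J = -np.linalg.inv(D) @ (L + U)      # Jacobi iteration matrix
G = -np.linalg.inv(D + L) @ U        # Gauss-Seidel iteration matrix

rho = lambda M: max(abs(np.linalg.eigvals(M)))
print(round(rho(J), 4))   # ~0.8090
print(round(rho(G), 4))   # ~0.6545  (= rho(J)**2)
```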

76
2.4 SUCCESSIVE OVERRELAXATION (SOR) METHOD

We shall now consider SOR method for the system

Ax = y ………..(I)

We take a scalar parameter ω ≠ 0 and multiply both sides of (I) by ω to get an equivalent
system,

ωAx = ωy ………………(II)

Now as before we split the given matrix as

A = (D + L + U )

We write (II) as

(ωD + ωL + ωU)x = ωy,

i.e.

(ωD + ωL)x = − ωUx + ωy

i.e.

(D + ωL)x + (ω-1) Dx = - ωUx +ωy

i.e.

(D + ωL)x = - [(ω – 1)D + ωU]x + ωy

i.e.
x = - (D + ωL)-1 [(ω-1)D + ωU]x + ω [D + ωL]-1y.

We thus get the SOR scheme as

x^(k+1) = M_ω x^(k) + ŷ   ……………(III)
x^(0) = zero vector (initial guess)

where,

M_ω = −(D + ωL)^(−1) [(ω − 1)D + ωU]

77
and
ŷ = ω (D + ωL)^(−1) y

Mω is the SOR matrix for the system.

Notice that if ω = 1 we get the Gauss – Seidel scheme. The strategy is to choose ω such
that M ω sp is < 1, and is as small as possible so that the scheme converges as rapidly as
possible. This is easier said than achieved. How does one choose ω? It can be shown
that convergence cannot be achieved if ω ≥ 2. (We assume ω > 0). ‘Usually’ ω is chosen
between 1 and 2. Of course, one must analyse ||M_ω||_sp as a function of ω and find that
value ω0 of ω for which this is minimum, and work with this value ω0.
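
A minimal Python/NumPy sketch of one SOR sweep (an illustration added here, not from the original notes; the stopping rule is an arbitrary choice). Each component is relaxed between its old value and the Gauss–Seidel update, which reproduces the splitting (D + ωL)x^(k+1) = −[(ω − 1)D + ωU]x^(k) + ωy:

```python
import numpy as np

def sor(A, y, omega, x0=None, tol=1e-10, max_iter=500):
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # plain Gauss-Seidel value for component i
            gs = (y[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x_old[i + 1:]) / A[i, i]
            # relax between the old value and the Gauss-Seidel value
            x[i] = (1 - omega) * x_old[i] + omega * gs
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x
```

With ω = 1 this reduces to the Gauss–Seidel scheme, as noted above.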

Let us consider an example of this aspect.

Example 6:

Consider the system given in example 5 in section 2.3.

For that system,

Mω = −(D + ωL)^(−1) [(ω − 1)D + ωU]

   = [ 1−ω              ω/2                    0                    0            ]
     [ ω/2 − ω²/2       1 − ω + ω²/4           ω/2                  0            ]
     [ ω²/4 − ω³/4      ω/2 − ω²/2 + ω³/8      1 − ω + ω²/4         ω/2          ]
     [ ω³/8 − ω⁴/8      ω²/4 − ω³/4 + ω⁴/16    ω/2 − ω²/2 + ω³/8    1 − ω + ω²/4 ]

and the characteristic equation is

16 (ω − 1 + λ)⁴ − 12 ω² λ (ω − 1 + λ)² + ω⁴ λ² = 0   ..................(C_Mω)

Thus the eigenvalues of Mω are roots of the above equation. Now when is λ = 0 a root?
If λ = 0 we get, from (CMω),

16(ω − 1)⁴ = 0 ⇒ ω = 1, i.e. as in the Gauss – Seidel case. So let us take ω ≠ 1; then
λ = 0 is not a root. So we can divide the above equation (C_Mω) by ω⁴λ² to get

78
16 [ (ω − 1 + λ)² / (ω² λ) ]²  −  12 [ (ω − 1 + λ)² / (ω² λ) ]  +  1  =  0

Setting

µ² = (ω − 1 + λ)² / (ω² λ)

we get

16 µ 4 − 12µ 2 + 1 = 0

which is the same as (CJ). Thus

µ = ± 0 . 3090 ; ± 0 . 8090 .

Now

(ω − 1 + λ)² / (ω² λ) = µ² = 0.0955 or 0.6545 ……….(*)
2

Thus, this can be simplified as

λ = (1/2) µ² ω² − (ω − 1) ± µω [ (1/4) µ² ω² − (ω − 1) ]^(1/2)

as the eigenvalues of Mω.

With ω = 1.2 and using the two values of µ² in (*), we get the eigenvalues

λ = 0.4545, 0.0880, −0.1312 ± i(0.1509).

The modulus of each complex root is 0.2.

Thus ||M_ω||_sp when ω = 1.2 is 0.4545, which is less than ||J||_sp = 0.8090 and
||G||_sp = 0.6545 computed in Example 5 in section 2.3. Thus for this system, SOR
with ω = 1.2 is faster than the Jacobi and Gauss – Seidel schemes.

79
We can show that in this example, when ω = ω0 = 1.2596, the spectral radius ||M_{ω0}||_sp is
smaller than ||M_ω||_sp for any other ω. We have

||M_{1.2596}||_sp = 0.2596

Thus the SOR scheme with ω = 1.2596 will be the method which converges fastest.

Note:
We had ||M_{1.2}||_sp = 0.4545 and ||M_{1.2596}||_sp = 0.2596.

Thus a small change in the value of ω brings about a significant change in the spectral
radius ||M_ω||_sp.
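
The dependence of the spectral radius on ω can also be explored numerically. The following sketch (an added illustration, not part of the original notes) scans a few values of ω for the matrix A of Example 5:

```python
import numpy as np

A = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)

def spectral_radius_sor(w):
    # SOR iteration matrix M_w = -(D + wL)^{-1} [ (w-1)D + wU ]
    M = -np.linalg.inv(D + w * L) @ ((w - 1) * D + w * U)
    return max(abs(np.linalg.eigvals(M)))

for w in (1.0, 1.2, 1.2596, 1.5):
    print(w, round(spectral_radius_sor(w), 4))
# expected roughly: 1.0 -> 0.6545, 1.2 -> 0.4545, 1.2596 -> 0.2596
```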

80
3. REVIEW OF PROPERTIES OF EIGENVALUES AND EIGENVECTORS

3.1 EIGENVALUES AND EIGENVECTORS


We shall now review some basic facts from matrix theory.
Let A be an nxn matrix. A scalar α is called an eigenvalue of A if there
exists a nonzero nx1 vector x such that

Ax = αx

Example:
Let
 − 9 4 4
 
A =  − 8 3 4
 − 16 7 
 8

α = −1

Consider
 1
 
x =  2 .
 0
 
We have
 − 9 4 4  1   − 1   1
      
Ax =  − 8 3 4  2  =  − 2  = −1 2 
 − 16 8 7  0   0   0
      
= (− 1)x = αx

Hence α = −1 is such that there exists a nonzero vector x such that Ax = αx.
Thus α is an eigenvalue of A.

1
Similarly, if we take α = 3, x =  1  we find that
 2
 
Ax = αx. Thus, α = 3 is also an eigenvalue of A.
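
A quick numerical check of this example (an added illustration; the library call simply confirms the hand computation):

```python
import numpy as np

A = np.array([[ -9., 4., 4.],
              [ -8., 3., 4.],
              [-16., 8., 7.]])
print(np.round(np.linalg.eigvals(A), 6))   # eigenvalues -1, -1, 3 (in some order)

x = np.array([1., 2., 0.])
print(A @ x)                               # [-1, -2, 0] = (-1) * x
```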

Let α be an eigenvalue of A. Then any nonzero x such that Ax = αx is


called an eigenvector of A.
Let α be an eigenvalue of A. Let,

81
{
Wα = x ∈ C n : Ax = αx }
Then we have the following properties of Wα :

(i) Wα is nonempty, since the zero vector, (which we denote by θ ), is in


Wα , that is , Aθ n = θ n = αθ n .
(ii) x, y ∈ Wα ⇒ Ax = αx , Ay = αy
⇒ A(x + y) = α(x + y)
⇒ x + y ∈ Wα

(iii) For any constant k, we have


kAx = kαx = α(kx)
⇒ A(kx) = α(kx)
⇒ kx ∈ Wα

Thus Wα is a subspace of Cn. This is called the characteristic subspace or the


eigensubspace corresponding to the eigenvalue α.

Example: Consider the A in the example on page 81. We have seen that α = -1
is an eigenvalue of A. What is W( −1) , the eigensubspace corresponding to –1?

We want to find all x such that

Ax = -x, that is,


(A+I)x = θ, that is,
we want to find all solutions of the homogeneous system Mx = θ ; where

 −8 4 4
 
M = A +I =  −8 4 4
 − 16 8 
 8

We now can use our row reduction to find the general solution of the system.

1 − 1 − 1
R 2 − R1
 −8 4 4 1  2 2
M →  0 0   
− R1
0 8
→ 0 0 0 
R 3 − 2 R1    
 0 0 0 0 0 0 
 

82
Thus,  x1 = (1/2) x2 + (1/2) x3

Thus the general solution of (A + I)x = θ is

[ (1/2)x2 + (1/2)x3 ]        [ 1/2 ]        [ 1/2 ]
[       x2          ]  = x2  [  1  ]  + x3  [  0  ]
[       x3          ]        [  0  ]        [  1  ]

= A1 (1, 2, 0)^T + A2 (1, 0, 2)^T

where A1 and A2 are arbitrary constants.

Thus W(−1) consists of all vectors of the form

A1 (1, 2, 0)^T + A2 (1, 0, 2)^T .

Note: The vectors (1, 2, 0)^T and (1, 0, 2)^T form a basis for W(−1) and therefore

dim W(−1) = 2.

What is W(3) the eigensubspace corresponding to the eigenvalue 3 for the above
matrix?

We need to find all solutions of Ax = 3x,


i.e., Ax – 3x = θ
i.e., Nx = θ
where

83
 − 12 4 4
 
N = A − 3I =  − 8 0 4
 − 16 4 
 8

Again we use row reduction

 − 12 4   − 12 4 
 4

4

R 1 − 2 R 3 and R1 − 4 R 1
N  3    3 → 0 −8 4   
R + R
→ 0 − 8
4 3 4 
 3 3 

3 3
 0 8 − 
4  0 0 0 
 3 3
∴ 12 x1 = 4 x 2 + 4 x 3
8 4
x2 = x3 ∴ x3 = 2 x2
3 3
∴12 x1 = 4 x 2 + 8 x 2 = 12 x 2
∴ x 2 = x1
∴ x 2 = x1 ; x 3 = 2 x 2 = 2 x1
∴ The general solution is

[  x1 ]        [ 1 ]
[  x1 ]  = x1  [ 1 ]
[ 2x1 ]        [ 2 ]

Thus W(3) consists of all vectors of the form

 1 
 
κ  1 
 
 2 

Where κ is an arbitrary constant.

84
1
 
Note: The vector  1  forms a basis for W(3) and hence
 2
 
dim. W(3) = 1.

Now when can a scalar α be an eigenvalue of a matrix A of order n? We


shall now investigate this question. Suppose α is an eigenvalue of A.

This ⇒ There is a nonzero vector x such that Ax = αx.


⇒ (A − αI) x = θ and x ≠ θ
⇒ The system ( A − α I ) x = θ has at least one nonzero solution.
⇒ nullity (A - αI) ≥ 1
⇒ rank (A - αI) < n
⇒ (A - αI) is singular
⇒ det. (A - αI) = 0

Thus, α is an eigenvalue of A ⇒ det. (A - αI) = 0.


Conversely, α is a scalar such that det. (A - αI) = 0.
This ⇒ (A - αI) is singular
⇒ rank (A - αI) < n
⇒ nullity (A - αI) ≥ 1
⇒ The system ( A − α I ) x = θ has nonzero solution.
⇒ α is an eigenvalue of A.

Thus, α is a scalar such that det. (A - αI) = 0 ⇒ α is an eigenvalue.


Combining the two we get,
α is an eigenvalue of A
⇔ det. (A - αI) = 0
⇔ det. (αI - A) = 0

Now let C(λ) = det. (λI - A)

Thus we see that,


“The eigenvalues λ of a matrix A are precisely the roots of
C(λ) = det. (λI - A)”.

85
We have,

λ − a 11 − a 12 K − a 1n
− a 21 λ − a 22 K − a 2n
C (λ ) = K K K K
K K K K
− a n1 − a n2 K λ − a nn

= λⁿ − (a11 + … + ann) λ^(n−1) + … + (−1)ⁿ det A

Thus ; C(λ) is a polynomial of degree n. Note the ‘leading’ coefficient of C(λ) is


one and hence C(λ) is a ‘monic’ polynomial of degree n. This is called
CHARACTERISTIC POLYNOMIAL of A. The roots of the characteristic
polynomial are the eigenvalues of A. The equation C(λ) = 0 is called the
characteristic equation.

Sum of the roots of C(λ) = Sum of the eigenvalues of A


= a11 + . . . . . . + ann ,
and this is called the TRACE of A.

Product of the roots of C(λ) = Product of the eigenvalues of A


= det. A.

In our example in page 81 we have

 −9 4 4
 
A =  −8 3 4
 − 16 7 
 8
∴ C(λ) = det(λI − A) =
    | λ+9   −4    −4  |
    |  8    λ−3   −4  |
    |  16   −8   λ−7  |

  (C1 + C2 + C3)
    | λ+1   −4    −4  |
    | λ+1   λ−3   −4  |
    | λ+1   −8   λ−7  |

            | 1   −4    −4  |
  = (λ+1)   | 1   λ−3   −4  |
            | 1   −8   λ−7  |

86
  (R2 − R1, R3 − R1)
            | 1   −4     −4  |
  = (λ+1)   | 0   λ+1     0  |
            | 0   −4    λ−3  |

  = (λ+1)(λ+1)(λ−3)

  = (λ+1)² (λ−3)

Thus the characteristic polynomial is


C(λ) = (λ + 1)² (λ − 3)

The eigenvalues are –1 (repeated twice) and 3.

Sum of eigenvalues = (-1) + (-1) + 3 = 1


= Trace A = Sum of diagonal entries.

Product of eigenvalues = (-1) (-1) (3) = 3 = det. A.

Thus, if A is an nxn matrix, we define the CHARACTERISTIC POLYNOMIAL as,


C (λ ) = λI − A . . . . . . . . . . . . .(1)
and observe that this is a monic polynomial of degree n. When we factorize this
as,
C(λ) = (λ − λ1)^(a1) (λ − λ2)^(a2) …… (λ − λk)^(ak)  . . . . . . . .(2)

where λ1, λ2, . . . . . ., λk are the distinct roots; these distinct roots are the distinct
eigenvalues of A and the multiplicities of these roots are called the algebraic
multiplicities of these eigenvalues of A. Thus when C(λ) is as in (2), the distinct
eigenvalues are λ1, λ2, . . . . . ., λk and the algebraic multiplicities of these
eigenvalues are respectively, a1, a2, . . . . . , ak.

For the matrix in Example in page 81 we have found the characteristic


polynomial on page 86 as
C(λ) = (λ + 1)² (λ − 3)

Thus the distinct eigenvalues of this matrix are λ1 = -1 ; and λ2 = 3 and their
algebraic multiplicities are respectively a1 = 2 ; a2 = 1.

If λi is an eigenvalue of A, the eigensubspace corresponding to λi is W_{λi},

defined as

W_{λi} = { x ∈ Cⁿ : Ax = λi x }
87
The dimension of W_{λi} is called the GEOMETRIC MULTIPLICITY of the
eigenvalue λi and is denoted by gi.

Again for the matrix on page 81, we have found on pages 83 and 84
respectively that, dim W(−1) = 2 ; and dim. W(3) = 1. Thus the geometric
multiplicities of the eigenvalues λ1 = -1 and λ2 = 3 are respectively g1 = 2 ; g2 = 1.
Notice that in this example it turns out that a1 = g1 = 2 ; and a2 = g2 = 1. In
general this may not be so. It can be shown that for any matrix A having C(λ) as
in (2),

1 ≤ gi ≤ ai ; 1 ≤ i ≤ k . . . . . . . . . . . .(3)

i.e., for any eigenvalue λ i of A, that is,


1 ≤ geometric multiplicity ≤ algebraic multiplicity for any eigenvalue.

We shall now study the properties of the eigenvalues and eigenvectors of


a matrix. We shall start with a preliminary remark on Lagrange Interpolation
polynomials :

Let α1, α2, . . . . . . . ., αs be s distinct scalars, (i.e., αi ≠ αj if i ≠ j ).


Consider,

p_i(λ) = [ (λ − α1)(λ − α2) … (λ − α_{i−1})(λ − α_{i+1}) … (λ − α_s) ]
         / [ (α_i − α1)(α_i − α2) … (α_i − α_{i−1})(α_i − α_{i+1}) … (α_i − α_s) ]

       = ∏_{1 ≤ j ≤ s, j ≠ i}  (λ − α_j) / (α_i − α_j)     for i = 1, 2, . . . , s   . . . . (4)

Then pi(λ) are all polynomials of degree s-1.

Further notice that pi (α1 ) = K= pi (αi−1 ) = pi (αi+1 ) = K= pi (αs ) = 0


pi (αi ) = 1

Thus pi(λ)are all polynomials of degree s-1 such that,

1if i = j
( )
pi α j = δij =  . . . . . . . . . . (5)
0 if i ≠ j

88
We call these the Lagrange Interpolation polynomials. If p(λ) is any polynomial
of degree ≤ s-1 then it can be written as a linear combination of p1(λ),p2(λ), . . .,
ps(λ) as follows:

p (λ ) = p (α 1 ) p1 (λ ) + p (α 2 ) p 2 (λ ) + L + p (α s ) p s (λ ) . . . . (6)

s
= ∑ p (α ) p (λ )
i =1
i i

With this preliminary, we now proceed to study the properties of the


eigenvalues and eigenvectors of an nxn matrix A.

Let λ1, . . . . , λk be the distinct eigenvalues of A. Let φ1, φ2, . . . , φk be


eigenvectors corresponding to these eigenvalues respectively ; i.e., φi are
nonzero vectors such that
Aφi = λiφi , i=1,2,…,k . . . . . . . . . . .(6)

From (6) it follows that

A²φi = A(Aφi) = A(λi φi) = λi Aφi = λi² φi
A³φi = A(A²φi) = A(λi² φi) = λi² Aφi = λi³ φi
and by induction we get

A m φ i = λ i m φ i for any integer m ≥ 0 . . . . . . . . . . .(7)


(We interpret A0 as I).

Now,
p ( λ ) = a 0 + a1 λ + K K + a s λ s

be any polynomial. We define p(A) as the matrix,


p ( A ) = a 0 I + a1 A + K K + a s A s
Now
p ( A )φ i = ( a 0 I + a 1 A + K K + a s A s )φ i
= a 0φ i + a1 A φ i + K K + a s A sφ i
= a 0 φ i + a 1λ i φ i + K K + a s λ i s φ i by (6)

= ( a 0 + a 1λ i + K K + a s λ i s ) φ i
= p ( λ i )φ i .

89
Thus,

If λi is any eigenvalue of A and φi is an eigenvector corresponding to λi ,then


for any polynomial p(λ) we have p ( A )φ i = p ( λ i )φ i .
PROPERTY I

Now we shall prove that the eigenvectors, φ1, φ2, . . . . , φk corresponding


to the distinct eigenvalues λ1, λ2, . . . . , λk of A, are linearly independent .

In order to establish this linear independence, we must show that

C1φ1 + C2φ2 + K+ CKφK = θn ⇒ C1 = C2 = K = CK = 0 . . . (8)

Now if in (4) & (5) we take s = k ; αi = λi, (I=1,2,..,s) then we get the Lagrange
Interpolation polynomials as

(λ − λ j )
p i (λ ) = ∏ ; i = 1,2,….., k …………(9)
1≤ j ≤ k
j≠ i
(λ i − λ j )
and

p i (λ j ) = δ ij …………(10)

Now,

C1φ1 + C2φ2 + .... + Ckφk = θ n

For 1≤ i ≤ k,

pi ( A)[C1φ1 + C 2φ 2 + .... + C k φk ] = pi ( A)θ n = θ n

⇒ C1 pi ( A)φ1 + C 2 pi ( A)φ 2 + .... + C k pi ( A)φ k = θ n

⇒ C1 pi (λ1 )φ1 + C 2 pi (λ2 )φ 2 + .... + C k pi (λ k )φ k = θ n , (by property I on page 86)

⇒ Ciφi = θ ;1 ≤ i ≤ k ; by (10)
⇒ Ci = 0;1 ≤ i ≤ k since φi are nonzero vectors

Thus

90
C1φ1 + C2φ2 + .... + Ckφk = θn ⇒ C1 = C2 = .... = Ck = 0, proving (8). Thus
we have

Eigen vectors corresponding to distinct eigenvalues of A are linearly


independent.

PROPERTY II

91
3.2 SIMILAR MATRICES

We shall now introduce the idea of similar matrices and study the properties
of similar matrices.

DEFINITION

An nxn matrix A is said to be similar to a nxn matrix B if there exists a


nonsingular nxn matrix P such that,

P-1 A P = B

We then write,

A∼B

Properties of Similar Matrices

(1) Since I-1 A I = A it follows that A ∼ A

(2) A ∼ B ⇒ ∃ P, nonsingular such that., P-1 A P = B

⇒ A = P B P-1

⇒ A = Q-1 B P, where Q = P-1 is nonsingular

⇒ ∃ nonsingular Q show that Q-1 B Q = A


⇒ B∼A

Thus

A∼B ⇒ B∼A

(3) Similarly, we can show that

A ∼ B, B ∼ C ⇒ A ∼ C.

(4) Properties (1), (2) and (3) above show that similarity is an equivalence
relation on the set of all nxn matrices.

(5) Let A and B be similar matrices. Then there exists a nonsingular matrix P
such that

A = P-1 B P.

91
Now, let CA(λ) and CB (λ) be the characteristic polynomials of A and B
respectively. We have,

C_A(λ) = |λI − A| = |λI − P⁻¹BP|

       = |λP⁻¹P − P⁻¹BP|

       = |P⁻¹(λI − B)P|

       = |P⁻¹| |λI − B| |P|

       = |λI − B|     (since |P⁻¹| |P| = 1)

       = C_B(λ)

Thus “ SIMILAR MATRICES HAVE THE SAME CHARACTERISTIC


POLYNOMIALS ”.

(6) Let A and B be similar matrices. Then there exists a nonsingular matrix P
such that

A = P-1 B P

Now for any positive integer k, we have

A^k = (P⁻¹BP)(P⁻¹BP) ….. (P⁻¹BP)     (k factors)

    = P⁻¹ B^k P      (since P P⁻¹ = I)

Therefore,

Ak = On ⇔ P-1 Bk P = On

⇔ Bk = On

“ Thus if A and B are similar matrices then Ak = On ⇔ Bk = On ”.

(7) Let A and B be similar matrices, say A = P⁻¹BP, and let
p(λ) = a0 + a1λ + ….. + ak λ^k be any polynomial.

Then

92
p ( A ) = a 0 I + a1 A + ..... + a k A k

= a0 I + a1 P −1 BP + a 2 P −1 B 2 P + ..... + a k P −1 B k P

[ ]
= P −1 a 0 I + a1 B + a 2 B 2 + ..... + a k B k P

= P −1 p (B )P

Thus

p ( A) = O n ⇔ P −1 p (B )P = On

⇔ p ( B ) = On

Thus “ IF A and B ARE SIMILAR MATRICES THEN FOR ANY POLYNOMIAL


p(λ); p (A) = On ⇔ p (B) = On ”.

(8) Let A be any matrix. By A(A) we denote the set of all polynomials p(λ) such
that

p(A) = On, i.e.

A (A) = {p(λ) : p(A) = On}

Now from (6) it follows that,

“ IF A AND B ARE SIMILAR MATRICES THEN A(A) = A (B) ”.

The set A (A) is called the set of “ ANNIHILATING POLYNOMIALS OF A ”.


Thus similar matrices have the same set of annihilating polynomials.

We shall discuss more about annihilating polynomials later.

We now investigate the following question? Given an nxn matrix A, when is it


similar to a “simple matrix”? What are simple matrices? The simplest matrix we
know is the zero matrix On. Now A ∼ On ⇔ There is a nonsingular matrix P such
that A = P-1On P = On.

∴ “ THE ONLY MATRIX SIMILAR TO On IS On ITSELF ”.

The next simple matrix we know is the identity matrix In. Now A ∼ In ⇔
there is a nonsingular P such that A = P-1 In P ⇔ A = In.

93
Thus “THE ONLY MATRIX SIMILAR TO In IS ITSELF ”.
Similarly the only matrix similar to a scalar matrix kIn , (where k is a scalar),is kIn
itself.

The next class of simple matrices are the DIAGONAL MATRICES. So we


now ask the question “ Which type of nxn matrices are similar to diagonal
matrices”?

Suppose now A is an nxn matrix; and A is similar to a diagonal matrix,

 λ1 
 
 λ2 
D=
 O 
 
 λn 
(λi not necessarily distinct).

Then there exists a nonsingular matrix P such that

P-1 A P = D

AP = PD ………..(1)

 p11 p12 ..... p1i ..... p1n 


 
p p22 ..... p2i ..... p2 n 
LetP =  21
 M M M M M M 
 
 pn1 pn 2 ..... pni ..... pnn 

 a11 a12 ..... a1n 


 
a a 22 ..... a 2n 
A =  21
..... ..... ..... ..... 
 
a a nn 
 n1 an2 .....

 p1 i 
 p2i 
L etPi =   denote the ith column of P.
 M 
 
 p ni 

94
Now the ith column of AP is

 a11 p1i + a12 p 2 i + ..... + a1n pni 


 
 a 21 p1 i + a 22 p 2 i + ..... + a 2 n p ni 
 ............................................ 
 
 an1 p1i + a n 2 p 2 i + ..... + a nn pni 

which is equal to APi.


Thus the ith column of AP, the l.h.s. of (1), is APi.

Now the ith column of P D is

 p1i λi   p1i 
   
 p 2 i λi  = λ  p 2 i  = λ P
 M  i
 M  i i

   
 p ni λi   pni 

Thus the ith column of P D, the r.h.s. of (1), is λi Pi. Since l.h.s. = r.h.s. by (1) we
have

APi = λi Pi ; i = 1, 2, …., n ……………..(2)

Note that since P is nonsingular no column of P can be zero vector. Thus none
of the column vectors Pi are zero. Thus we conclude that,

“IF A IS SIMILAR TO A DIAGONAL MATRIX D THEN THE DIAGONAL


ENTRIES OF D MUST BE THE EIGENVALUES OF A AND IF P-1AP = D THEN
THE ith COLUMN VECTOR OF P MUST BE AN EIGENVECTOR CORRESPON
DING TO THE EIGENVALUE WHICH IS THE ith DIAGONAL ENTRY OF D”.

Note:

The n columns of P must be linearly independent since P is nonsingular and thus


these n columns give us n linearly independent eigenvectors of A

Thus the above result can be restated as follows: A is similar to a
diagonal matrix D with P⁻¹AP = D ⇒ A has n linearly independent eigenvectors;
taking these as the columns of P we get P⁻¹AP = D, where the ith diagonal
entry of D is the eigenvalue corresponding to the ith eigenvector, namely the ith
column vector of P.

95
Conversely, it is now obvious that if A has n linearly independent
eigenvectors then A is similar to a diagonal matrix D and if P is the matrix whose
ith column is the eigenvector, then D is P-1 A P and ith diagonal entry of D is the
eigenvalue corresponding to the ith eigenvector.

When does then a matrix have n linearly independent eigenvectors? It can


be shown that a matrix A has n linearly independent eigenvectors ⇔ the
algebraic multiplicity of each eigenvalue of A is equal to its geometric multiplicity.
Thus

A IS SIMILAR TO A DIAGONAL MATRIX ⇔


FOR EVERY EIGENVALUE OF A, ITS ALGEBRAIC MULTIPLICITY IS
EQUAL TO ITS GEOMETRIC MULTPLICITY”.

RECALL; if C (λ ) = (λ − λ1 ) 1 (λ − λ2 ) 2 .....(λ − λk ) k where λ 1, λ 2, …..,


a a a

λ k are the distinct eigenvalues of A, then ai is called the algebraic multiplicity of


the eigenvalue λ i. Further, let

ω i = {x : Ax = λi x}

be the eigensubspace corresponding to λ i. Then gi = dim ωi is called the


geometric multiplicity of λ i.

Therefore, we have,

“ If A is an nxn matrix with C (λ ) = (λ − λ1 ) a1 L (λ − λk ) ak where λ 1, ….., λ k are the


district eigenvalues of A, then A is similar to a diagonal matrix ⇔ ai = gi (=dimωi)
; 1≤ i ≤ k”.

Example:

Let us now consider

 − 9 4 4
 
A =  − 8 3 4
 − 16 8 7 
 

On page 87, we found the characteristic polynomial of A as

C( λ ) = ( λ +1)2 ( λ - 3)

Thus λ 1 = -1 ; a1 = 2
λ 2 = 3 ; a2 = 1

96
On pages 83 and 84 we found,

W 1 = eigensubspace corresponding to λ = -1

 1  1 
    
=  x : x = A1  2  + A2  0  
  0  2 
    

W 2 = eigensubspace corresponding to λ = 3

  1 
  
=  x : x = k  1 
  2 
  

Thus dim W 1 = 2 ∴ g1 = 2
dim W 2 = 1 ∴ g2 = 1

Thus a1 = 2 = g1 and a2 = 1 = g2, and hence A must be similar to a diagonal
matrix. How do we get P such that P⁻¹AP is a diagonal matrix?


Recall the columns of P must be linearly independent eigenvectors. From ω1 we
1 1
   
get two linearly eigenvectors, namely,  2  and  0  ; and from ω2 we get third as
0  2
   
1
 
1 .
 2
 
Thus if we take these as columns and write

P = [ 1  1  1 ]
    [ 2  0  1 ]
    [ 0  2  2 ]

then

P⁻¹ = [  1   0  −1/2 ]
      [  2  −1  −1/2 ]
      [ −2   1    1  ]

and it can be verified that

97
 1 0 − 1   − 9 4  1 1
 2  4

1

P AP =  2
−1
−1 − 1  −8
2 
3 4  2 0 1

− 2 1 1   − 16 8 7   0 2 2 
 

−1 0 0
 
= 0 −1 0  a diagonal matrix., whose diagonal entries are the
 0 3 
 0
eigenvalues of the matrix A.
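
The computation above is easy to verify numerically; a short illustrative sketch (not part of the original notes):

```python
import numpy as np

A = np.array([[ -9., 4., 4.],
              [ -8., 3., 4.],
              [-16., 8., 7.]])
P = np.array([[1., 1., 1.],
              [2., 0., 1.],
              [0., 2., 2.]])   # columns are the eigenvectors found above
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))          # diag(-1, -1, 3)
```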

Thus we can conclude that A is similar to a diagonal matrix, i.e., P-1 AP = D


⇒ A has n linearly independent eigenvectors namely the n columns of P.

Conversely, A has n linearly independent eigenvectors.


⇒ P-1 AP is a diagonal matrix where the columns of P are taken to be the n
linearly independent eigenvectors.

We shall now see a class of matrices for which it is easy to decide


whether they are similar to a diagonal matrix; and in which case the P-1 is easy to
compute. But we shall first introduce some preliminaries.

 x1   y1 
   
 x2  y 
If x =   ; y =  2  are any two vectors in Cn, we define the INNER
M M
   
x  y 
 n  n
PRODUCT OF x with y (which is denoted by (x,y)) as,
(x, y) = x1 ȳ1 + x2 ȳ2 + … + xn ȳn = Σ_{i=1}^{n} xi ȳi

Example 1:

 i   1 
   
If x =  2 + i ; y =  1 − i  ; then,
 −1   i 
   

( x , y ) = i .1 + (2 + i )(1 − i ) + (− 1 )(i )

98
= i + (2 + i )(1 + i ) + (− 1 )(− i )= 1 + 5 i

()
Whereas ( y , x ) = 1 i + (1 − i )(2 + i ) + (i )(− 1 ) = 1 − 5i

We now observe some of the properties of the inner product, below:

(1) For any vector x in Cn, we have


n n
(x , x ) = ∑ xi x i = ∑ xi
2
,
i =1 i =1
which is real ≥ 0. Further,
n
(x , x ) = 0 ⇔ ∑ x i
2
= 0
i=1

⇔ x i = 0 ;1 ≤ i ≤ n
⇔ x = θn
Thus,
(x,x) is real and ≥ 0 and (x,x)= 0 ⇔ x = θn

n  n 
(2) (x , y ) = ∑ xi y i = 
 ∑ yi xi 

i =1  i =1 
= (y , x )
Thus,
(x , y ) = ( y , x )
(3) For any complex number α, we have,
n n
(α x , y ) = ∑ (α x i ) y i =α ∑ xi y i
i =1 i =1

= α ( x, y )

Thus
(αx,y) = α (x,y) for any complex number α.
We note,
( x , α y ) = (α y , x ) by (2)
= α ( y , x ) = α (y , x ) = α (x , y )
n
(4) (x + y, z ) = ∑ (x i + y i )z i
i=1

99
n n
= ∑ xi z i + ∑ yi z i
i =1 i =1

= ( x, z ) + ( y, z )

Thus

(x + y, z) = (x,z) + (y,z) and similarly

(x, y + z) = (x, y) + (x, z)

We say that two vectors x and y are ORTHOGONAL if (x, y) = 0.

1  − 1
   
Example (1) If x =  i ; y =  i ,
− i 0
   

then,
(x, y ) = 1(− 1) + i (i ) + (− i )(0 )
= -1 + 1 = 0
Thus x and y are orthogonal.

1  − 1
   
(2) If x =  i , y =  a 
 − i 1
   

then
(x, y) = −1 + i ā − i
∴ x, y orthogonal
⇔ −(1 + i) + i ā = 0
⇔ ā = (1 + i)/i = −i(1 + i) = 1 − i
⇔ a = 1 + i

100
3.3 HERMITIAN MATRICES

Let A = (aij); be an nxn matrix. We define the Hermitian conjugate of A,


denoted by A*, as A* = (a*ij) where a*ij = āji .

A* is the conjugate of the transpose of A.

 1 i
Example 1: A =  
− i i

1 − i 
Transpose of A =  
i i 

1 i 
∴ A* =  
 − i − i 

 1 i
Example 2: A =  
 − i 2
1 − i 
Transpose of A =  
i 2 

 1 i
∴ A* =  
 − i 2

Observe that in Example 1. A* ≠ A, whereas in Example 2, A* = A.

DEFINITION: An nxn matrix A is said to be HERMITIAN if


A* = A.

We now state some properties of Hermitian matrices.

(1) If A = (aij), A* = (a*ij), and A = A*, then aii = a*ii = āii , so each aii is real.

Thus the DIAGONAL ENTRIES OF A HERMITIAN MATRIX ARE REAL.

 x1   y1 
   
 x2  y 
(2) Let x =   ; y =  2  be any two vectors in Cn and A a Hermitian matrix.
M M
   
x  y 
 n  n

101
Let
 ( Ax )1   ( Ay )1 
   
 ( Ax )2   ( Ay ) 2 
Ax =  ; Ay =  M 
M
   
 ( Ax )   ( Ay ) 
 n   n 

We have
n n
( Ax )i = ∑a ij x j ; ( Ay )j = ∑a ji yi.
j =1 i =1

Now
n
( Ax , y ) = ∑ ( Ax )i y i
i =1
n  n 
= ∑ ∑ 
 a ij x j  y i
i =1  j =1 
n
 n 
= ∑j =1
x j  ∑ a ij y i 
 i =1 

n
 n 
= ∑j =1
x j  ∑ a ij y i 
 i=1 

n
 n 
= ∑j =1
x j∑ a
 i =1
ji yi 

(Qaij = a ji since A = A* )

(Ay )
n
= ∑j =1
x j j

= (x, Ay)

Thus IF A IS HERMITIAN THEN


(Ax, y) = (x, Ay)
FOR ANY TWO VECTORS x, y.

102
(3) Let λ be any eigenvalue of a Hermitian matrix A. Then there is an x ∈ Cn, x ≠
θn such that
Ax = λx.
Now, since A is Hermitian we have,
λ ( x , x ) = (λ x , x ) = ( Ax , x )
= ( x , Ax )
= (x , λ x )
= λ (x , x )

(
∴ λ − λ )(x , x ) = 0 . But (x , x ) ≠ 0Q x ≠ θ n

∴ λ − λ = 0∴ λ = λ ∴ λ is real.

THUS THE EIGENVALUES OF A HERMITIAN MATRIX ARE ALL REAL.

(4) Let λ, µ be two different eigenvalues of a Hermitian matrix A and x, y their


corresponding eigenvectors. We have,

Ax = λx and Ay = µy
and λ, µ are real by (3).

Now,
λ (x , y ) = (λ x , y )
= (Ax , y )
= (x , Ay ) by ( 2 )
= (x , µ y )
= µ (x , y )
= µ (x , y ) since µ is real .
Hence we get
(λ − µ )(x , y ) = 0 .
Butλ ≠ µ
So we get (x,y) = 0 ⇒ x and y are orthogonal.

THUS IF A IS A HERMITIAN MATRIX THEN THE EIGENVECTORS


CORRESPONDING TO ITS DISTINCT EIGENVALUES ARE ORTHOGONAL.

103
3.4 GRAMM – SCHMIDT ORTHONORMALIZATION

Let U1, U2, …., Uk be k linearly independent vectors in Cn spanning a subspace W .


The Gramm – Schmidt process is the method to get an orthonormal set
φ1 , φ 2 ,....., φ k such that the subspace ω spanned by U1, ….., Uk is the same as
the subspace spanned by φ1 ,....., φ k thus providing an orthonormal basis for W.

The process goes as follows:

Let ψ 1 = U 1 ;

ψ1 ψ1
φ1 = = Note φ1 = 1
ψ1 (ψ 1 ,ψ 1 )
(We have used the symbol x to denote the norm ( x, x ) of a vector x)

Next, let,

ψ 2 = U 2 − (U 2 , φ1 )φ1

Note that

(ψ2, φ1) = (U2, φ1) − ((U2, φ1)φ1, φ1)
         = (U2, φ1) − (U2, φ1)(φ1, φ1)
         = (U2, φ1) − (U2, φ1) = 0      (since (φ1, φ1) = 1).

Hence we get

ψ2 ⊥ φ1.

Let
ψ2
φ2 = ; clearly φ 2 = 1, φ1 = 1, (φ1 , φ 2 ) = 0
ψ2

Also

x = α1 U1 + α2 U2
  ⇔ x = α1 ψ1 + α2 ( ψ2 + (U2, φ1) φ1 )
  ⇔ x = α1 ||ψ1|| φ1 + α2 [ ||ψ2|| φ2 + (U2, φ1) φ1 ]

104
  ⇔ x = β1 φ1 + β2 φ2 ,   where

β1 = α1 ||ψ1|| + α2 (U2, φ1) ,   β2 = α2 ||ψ2||

Thus x ∈ subspace spanned by U1, U2

⇔ x ∈ subspace spanned by φ1, φ2.

Thus φ1, φ2 is an orthonormal basis for the subspace [U1,U2].

Having defined φ1, φ2,….., φi-1 we define φi as follows:

ψi = Ui − Σ_{p=1}^{i−1} (Ui, φp) φp ;   clearly (ψi, φp) = 0 for 1 ≤ p ≤ i−1,

and

φi = ψi / ||ψi||

Obviously ||φi|| = 1 and (φi, φj) = 0 for 1 ≤ j ≤ i − 1,

and x ∈ the subspace spanned by U1,U2,…,Ui which we denote by


[U1, U2, ….., Ui]
⇔ x ∈ the subspace spanned by φ1 , φ 2 ,L , φ i which we denote by [φ1, ….., φi].
Thus φ1, φ2, ….., φi is an orthonormal basis for [U1, ….., Ui].
Thus at the kth stage we get an orthonormal basis φ1, …., φk for [U1, ….., Uk].
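
A minimal Python/NumPy sketch of this process (added for illustration; the function name is an assumption). It follows the recursion above, subtracting from each Ui its projections (Ui, φp)φp onto the already constructed φ's and then normalizing:

```python
import numpy as np

def gram_schmidt(U):
    """U: n x k array whose columns are linearly independent vectors.
    Returns an n x k array whose columns are the orthonormal phi_i."""
    Q = []
    for u in U.T:
        psi = u.astype(complex)
        for phi in Q:
            # np.vdot(phi, u) = sum_j u_j * conj(phi_j) = (u, phi) in the notation above
            psi = psi - np.vdot(phi, u) * phi
        Q.append(psi / np.linalg.norm(psi))
    return np.column_stack(Q)

# Vectors of the example that follows
U = np.array([[1., 1., 2.],
              [1., 1., 3.],
              [1., -1., 1.],
              [0., 0., 0.]])
print(np.round(gram_schmidt(U).real, 4))
```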

Example:

1    2
   1   
1    3
Let U 1 =  ;U 2 =  1 ;U 3 =  
1
   − 1 1
 
0 0   0
     

be l.i. vectors in R4. Let us find an orthonormal basis for the subspace ω
spanned by U1, U2, U3 using the Gramm – Schmidt process.

105
1 1
   
1 ψ1 1 1
ψ 1 = U 1 =  ; φ1 = =
1
  (ψ 1 , ψ 1 ) 3  1 
0  0
   


 1 
 
 3 
 1 
φ1 =  3 
 1 
 
 3 
 
 0 

ψ 2 = U 2 − (U 2 , φ1 )φ1

 1 
 
 1   3
   1 
 1   1 1 1 
=   −  + −  3
−1 3  
   3 3
 1 
 0 
   3
 0 

1 
 1   3
  1 
 1   3
=  −
− 1  1 
 
 0   3 
   0 
 

 2 
 3 
 2  4 4 16 2 6
= 3  and ψ 2 = + + =
− 4  9 9 9 3
 3
 0 

106
 2   1 6 
 3 
 
ψ2 3  2   1
∴φ2 = =  3 = 6 
ψ2 2 6 − 4   −2 
 3   6

 0   0 

Thus

 1 
 6 
 1 
φ2 =  6 
− 2 
 6
 0 
 

Finally,

ψ 3 = U 3 − (U 3 , φ1 )φ1 − (U 3 , φ 2 )φ 2

 1   1 
 2  3   6 
   1   1 
 3   6    3 
=   −   3 −   6 
   
    1   −2
1 3 6
 0  3   6
   0   0 
   

 2   2   1 2 
   
 3  2  1 
=   −   −  2
     − 1
1 2
 0  0 
     0 

− 1 
 2
 1 
= 2 
 0 
 0 
 

107
ψ3 = 1 +1 =
4 4
1 = 1
2 2

 − 1   − 1 
 2 2
ψ3  1   1 
∴ φ3 = = 2 2  =  2 
ψ3  0   0 
 0    
   0 

Thus the required orthonormal basis for W, the subspace spanned by U1,U2, U3
is φ1, φ2, φ3, where

 1   1  − 1 
 3  6  
    2
1 1  1 
φ1 =  
3 ;φ 2 =  
6 ;φ 3 =  2 
     0 
−
1 2
 3 6  
  0 
 0   0 
   

Note that these φi are mutually orthogonal and have, each, ‘length’ one.

We now get back to Hermitian matrices. We had seen that the


eigenvalues of a Hermitian matrix are all real; and that the eigenvectors
corresponding to district eigenvalues are mutually orthogonal. We can further
show the following: (We shall not give a proof here, but illustrate with an
example).

Let A be any nxn Hermitian matrix. Let


C(λ) = (λ − λ1)^(a1) (λ − λ2)^(a2) ….. (λ − λk)^(ak) be its characteristic polynomial, where λ1, λ2,

….., λk are its distinct eigenvalues and a1, ….., ak are their algebraic multiplicities.
If W i is the characteristic subspace, (eigensubspace), corresponding to the
eigenvalue λi ; that is,

ω i = {x : Ax = λ i x }

then it can be shown that dim Wi = ai.

We choose any basis for ωi and orthonormalize it by G-S process and get an
orthonormal basis for ωi. If we now take all these orthonormal - basis vectors for
ω1, . .,ωk and write them as the columns of a matrix P then

108
P*AP
will be a diagonal matrix.

Example :

 6 −2 2 
 
A = − 2 3 − 1
 2 −1 3 

Notice that

A* = Āᵀ = Aᵀ = A,  since A is real and symmetric.

Thus the matrix A is Hermitian.

Characteristic Polynomial of A:

λ −6 2 −2
λI − A = 2 λ −3 1
−2 1 λ −3

λ −2 2 (λ − 2 ) 0
 
→R1 + 2 R 2
2 λ −3 1
−2 1 λ −3

1 2 0
= (λ − 2 ) 2 λ − 3 1
− 2 1 λ − 3

R 2 − 2 R1
1 2 0
→ = (λ − 2 ) 0 λ −7 1
R 3 + 2 R1
0 5 λ −3

= (λ − 2 )[(λ − 7 )(λ − 3) − 5]

109
[
= (λ − 2 ) λ2 − 10λ + 16 ]
= (λ − 2)(λ − 2)(λ − 8)

= (λ − 2 ) (λ − 8 )
2

Thus
C (λ ) = (λ − 2 ) (λ − 8)
2

∴ λ1 = 2 a1 = 2

λ2 = 8 a2 = 1

The characteristic subspaces:

W1 = {x : Ax = 2x}

= {x : ( A − 2 I )x = θ }

i.e. We have to solve

(A – 2I) x = θ

 4 −2 2  x1   0 
    
i.e.  − 2 1 − 1 x 2  =  0 
 2 −1 1  x 3   0 

⇒ 2x1 – x2 + x3 = 0

⇒ x3 = - 2x1 + x2

 x1 

∴x =  x2 ; x , x
 1 2 arbitrary
 −2 x + x 
 1 2 

  α  
   
∴ ω1 =  x : x =  β  ; α , β scalars 
  − 2α + β  
   

110
∴ A basis for W 1 is

 1   0
   
U 1 =  0 ; U 2 =  1 
 − 2 1
   

We now orthonormalize this:

 1  ψ
 
ψ1 = U1 =  0  ψ1 = 5 φ1 = 1

 − 2 ψ 1
 

 1 
 
 5 
∴ φ1 =  0 
 2 
− 5 

ψ 2 = U 2 − (U 2 , φ1 )φ1

 1 
0  
   2  5 
=  1  −  −  0 
1  5  2 
  − 5 

 2 
 0  5 
   
=  1 +  0 
 1  − 4 
   
 5

2 
 5
= 1 
1 
 
 5

111
4 1 30 30
ψ2 = +1+ = =
25 25 25 5

 2 
2   
ψ2 5  5   30 
∴φ2 = =  1 = 5 
ψ2 30  1   30 
 
 5  1 
 30 

∴ φ1, φ2 is an orthonormal basis for W 1.

W2 = {x : Ax = 8x}

= {x : ( A − 8I )x = θ }

So we have to solve

(A-8I) x = θ i.e.

− 2 −2 2  x1   0 
    
− 2 −5 − 1  x 2  =  0 
 2 −1 − 5  x 3   0 

This yields x1 = -x2 + x3 and therefore the general solution is

 2γ   2 
   
 γ = γ − 1
 γ   1 
   

 2
 
∴ Basis : U 3 =  − 1
1
 

∴ Orthonormalize: only one step:

2
 
ψ 3 = U 3 =  − 1
1
 

112
 2 
 2  
 6 
ψ3 1  
φ3 = = −
  
1 = − 1 
ψ3 6   6
 1  1 
 6 

 1 2 2 
 
 5 30 6 
∴ If P =  0 5 − 1 
30 6
 2 1 1 
− 
 5 30 6 

Then

P* = P1 and

2 0 0
 
P AP = P AP =  0
* 1
2 0 ;
0 8 
 0

a diagonal matrix.

113
3.5 VECTOR AND MATRIX NORMS

Consider the space,

 x  
R 2 =  x =  1 ; x1 , x 2 ∈ R ,
  x2  

x 
our ‘usual’ two-dimensional plane. If x =  1  is any vector in this space we
 x2 
define its ‘usual’ ‘length’ or ‘norm’ as

||x|| = ( x1² + x2² )^(1/2)

We observe that

(i) x ≥ 0 for every vector x in R2,


x = 0 if and only if x is θ;

(ii) αx = α x for any scalar α; for any vector x.

(iii) x + y ≤ x + y for any two vectors x and y.


(The inequality (iii) is usually referred to as the triangle inequality).

We now generalize this idea to define the concept of a norm on Cn or Rn.

The norm can be thought of intuitively as a rule which associates with


each vector x in V, a real number x , and more precisely as a function from the
collection of vectors to the real numbers, satisfying the following properties:

(i) x ≥ 0 for every x ∈ V and


x = 0 if and only if x = θ;

(ii) α x = α x for every scalar α and every vector x in V,

(iii) x + y ≤ x + y for every x, y in V.

114
Examples of Vector Norms on Cn and Rn

 x1 
 
 x2 
Let x =   be any vector x in Cn (or Rn)
M
 
 x n 

We can define various norms as follows:

(1) ||x||_2 = [ Σ_{i=1}^{n} |xi|² ]^(1/2) = ( |x1|² + |x2|² + ..... + |xn|² )^(1/2)

(2) ||x||_1 = |x1| + |x2| + .... + |xn| = Σ_{i=1}^{n} |xi|

In general for 1 ≤ p < ∞ we can define,

(3) ||x||_p = ( Σ_{i=1}^{n} |xi|^p )^(1/p)

If we set p = 2 in (3) we get ||x||_2 as in (1), and if we set p = 1 in (3) we get ||x||_1 as
in (2).

(4) ||x||_∞ = max{ |x1|, |x2|, ....., |xn| }

All these can be verified to satisfy the above mentioned properties (i), (ii) and (iii)
required of a norm. Thus these give several types of norms on Cn and Rn.

Example:

 1 
 
(1) Let x =  − 2  in R3
 − 1
 

Then

x 1 = 1+1+ 2 = 4

115
||x||_2 = (1 + 4 + 1)^(1/2) = √6

||x||_∞ = max{1, 2, 1} = 2

||x||_4 = (1 + 1 + 2⁴)^(1/4) = 18^(1/4)

 1 
 
(2) Let x =  i  in C3
 − 2i 
 

Then

x 1 = 1+ 2 +1 = 4
1

= (1 + 4 + 1) = 6
2
x 2

x ∞
= max .{1, 2,1} = 2
1

( )
1
3
= 1 + 2 +1 = 10
3 3 3 3
x 3

Consider a sequence { x^(k) }_{k=1}^{∞} of vectors in Cⁿ (or Rⁿ):

 x1( k ) 
 (k ) 
x
x (k ) = 2 
 M 
 ( k ) 
 xn 

 x1 
 
x 
Suppose x= 2  ∈ C ( or R )
n n
M
 
x 
 n 

116
DEFINITION:

We say that the sequence { x^(k) } of vectors CONVERGES to the vector x
as k tends to infinity if the sequence of numbers { x1^(k) } converges to the
number x1, { x2^(k) } converges to x2, …., and { xn^(k) } converges to xn, i.e.

As k → ∞; xi( k ) → xi for every i=1, 2, …., n.

Example:

 
Let x^(k) = ( 1/k ,  1 − 2/k ,  1/(k²+1) )^T be a sequence of vectors in R³,

and let x = (0, 1, 0)^T. Here

x1^(k) = 1/k → 0 = x1
x2^(k) = 1 − 2/k → 1 = x2
x3^(k) = 1/(k²+1) → 0 = x3

∴ xi^(k) → xi for i = 1, 2, 3.

∴ x^(k) → x

If {x (k ) } is a sequence of vectors such that in some norm , the sequence of


real numbers, x ( k ) − x converges to the real number 0 then we say that the
sequence of vectors converges to x with respect to this norm. We then write,

x ( k ) → x

For example consider the sequence,

117
 1 
 
 k 
=  1 −  in R3 as before and,
2
x(k)
 k 
 1 
 2 
 k + 1

0
 
x = 1
0
 

We have

 1 
 
 k 
x (k ) − x =  − 
2
 k 
 1 
 2 
 k + 1

Now

1 2 1
x (k ) − x = + + 2 →0
1 k k k +1

∴ x ( k ) →
1
x
Similarly

1 2 1  2
x (k ) − x = max . , , 2  = → 0

 k k k + 1 k

∴ x (k )  
∞ → x
1
 1 2 1  2
x (k ) −x = 2 + 2 + 2 
→0
2
 k k (
k +1 
2
 )
∴ x ( k )  2 → x

118
Also,

1
 1  2  p 1  p
x (k ) − x =  p +  + p 
→0
p
 k k ( )
k 2 + 1 

∴ x(k ) 
p → x ∀ p ;1 ≤ p ≤ ∞
It can be shown that
“ IF A SEQUENCE {x (k ) }OF VECTORS IN Cn (or Rn) CONVERGES TO A
VECTOR x IN Cn (or Rn) WITH RESPECT TO ONE VECTOR NORM THEN THE
SEQUENCE CONVERGES TO x WITH RESPECT TO ALL VECTOR NORMS
AND ALSO THE SEQUENCE CONVERGES TO x ACCORDING TO
DEFINITION ON PAGE 113 . CONVERSELY IF A SEQUENCE CONVERGES
TO x AS PER DEFINITION ON PAGE 113 THEN IT CONVERGES WITH
RESPECT TO ALL VECTOR NORMS”.

Thus when we want to check the convergence of a particular sequence of


vectors we can choose that norm which is convenient to that sequence.

MATRIX NORMS

Let M be the set of all nxn matrices (real or complex). A matrix norm is a
function from the collection of matrices to the real numbers, whose value at any
matrix A is denoted by A having the following properties:
(i) A ≥ 0 for all matrices A
A = 0 if and only if A = On,

(ii) αA = α A for every scalar α and every matrix A,

(iii) A + B ≤ A + B for all matrices A and B,

(iv) AB ≤ A B for all matrices A and B.

Before we give examples of matrix norms we shall see a method of getting a


matrix norm starting with a vector norm.

Suppose ||·|| is a vector norm. Then consider the quantity ||Ax|| / ||x|| (where A is an nxn
matrix), for x ≠ θn. This gives us an idea of the proportion by which the matrix A has

119
distorted the length of x. Suppose we take the maximum distortion as we vary x
over all vectors. We get

max_{x ≠ θn}  ||Ax|| / ||x|| ,

a real number. We define

||A|| = max_{x ≠ θn}  ||Ax|| / ||x||

We can show that this is a matrix norm; it is called the matrix norm
subordinate to the vector norm ||·||. We can also show that

||A|| = max_{x ≠ θn}  ||Ax|| / ||x||  =  max_{||x|| = 1}  ||Ax||

For example,

||A||_1 = max_{||x||_1 = 1} ||Ax||_1

||A||_2 = max_{||x||_2 = 1} ||Ax||_2

||A||_∞ = max_{||x||_∞ = 1} ||Ax||_∞

||A||_p = max_{||x||_p = 1} ||Ax||_p

How hard or easy is it to compute these matrix norms? We shall give some idea
of computing ||A||_1, ||A||_∞ and ||A||_2 for a matrix A.

Let

120
 a 11 a 12 ..... a1n 
 
 a 21 a 22 ..... a2n 
A=
..... ..... ..... ..... 
 
a a nn 
 n1 an2 .....

The sum of the absolute values of the entries in the ith column is called the
absolute column sum and is denoted by Ci. We have

C1 = |a11| + |a21| + |a31| + ..... + |an1| = Σ_{i=1}^{n} |ai1|

C2 = |a12| + |a22| + |a32| + ..... + |an2| = Σ_{i=1}^{n} |ai2|

….. ….. ….. ….. ….. ….. …..

Cj = Σ_{i=1}^{n} |aij| ;   1 ≤ j ≤ n

Thus we have n absolute column sums, C1 , C2, ….., Cn.

Let
C = max .{C1 , C2 ,.....,Cn }

This is called the maximum absolute column sum. We can show that,

||A||_1 = C = max{ C1, ....., Cn } = max_{1 ≤ j ≤ n}  Σ_{i=1}^{n} |aij|

For example, if

 1 2 − 3
 
A =  −1 0 1 ,
− 3 − 4 
 2
then
C 1 = 1 + 1 + 3 = 5;
C 2 = 2 + 0 + 2 = 4; and C = max. {5, 4, 8} = 8
C3 = 3 + 1 + 4 = 8

121
∴ ||A||_1 = 8

Similarly we denote by Ri the sum of the absolute values of the entries in


the ith row
R1 = |a11| + |a12| + ..... + |a1n| = Σ_{j=1}^{n} |a1j|

R2 = |a21| + |a22| + ..... + |a2n| = Σ_{j=1}^{n} |a2j|

….. ….. ….. ….. ….. ….

Ri = |ai1| + |ai2| + ..... + |ain| = Σ_{j=1}^{n} |aij|

and define R, the maximum absolute row sum as,

R = max {R1, ….., Rn}

It can be show that,

||A||_∞ = R = max{ R1, ....., Rn } = max_{1 ≤ i ≤ n}  Σ_{j=1}^{n} |aij|
For example, for the matrix

 1 2 − 3
 
A =  −1 0 1 , we have
−3 − 4 
 2

R1 = 1 + 2 + 3 = 6;
R2 = 1 + 0 + 1 =2; and R = max {6, 2, 9}= 9
R3 = 3 + 2 + 4 = 9

∴ ||A||_∞ = 9
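
These two norms are one-liners in NumPy; a quick illustrative check for the example matrix (not part of the original notes):

```python
import numpy as np

A = np.array([[ 1.,  2., -3.],
              [-1.,  0.,  1.],
              [-3.,  2., -4.]])
print(np.abs(A).sum(axis=0).max())   # ||A||_1   = maximum absolute column sum = 8
print(np.abs(A).sum(axis=1).max())   # ||A||_inf = maximum absolute row sum    = 9
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf))   # the same values via NumPy
```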

122
The computation of ||A||_1 and ||A||_∞ for a matrix is thus fairly easy.
However, the computation of ||A||_2 is not so easy; but it is somewhat easier in the
case of a Hermitian matrix.

Let A be any nxn matrix, and let

C(λ) = (λ − λ1)^(a1) ….. (λ − λk)^(ak) be its characteristic polynomial, where

λ1, λ2, ….., λk are the distinct characteristic values of A.

Let

P = max{ |λ1|, |λ2|, … , |λk| }

This is called the spectral radius of A and is also denoted by ||A||_sp.

It can be shown that for a Hermitian matrix A,

||A||_2 = P = ||A||_sp

For example, for the matrix,

 6 −2 2
 
A = − 2 3 − 1
 2 −1 3 

which is Hermitian we found on page 106, the distinct eigenvalues as λ1 = 2; λ2 =


8

∴ ||A||_sp = P = max{2, 8} = 8

∴ ||A||_2 = ||A||_sp = 8

If A is any general nxn matrix (not Hermitian) then let B = A* A. Then


B* = A* A = B, and hence B is Hermitian and its eigenvalues are real and in fact
its eigenvalues are nonnegative. Let the eigenvalues (distinct) of B be µ1, µ2,
….., µr. Then let

µ = max {µ1, µ2, ….., µr}

123
We can show that

||A||_2 = √µ = √( max{ µ1, ....., µr } )

It follows from the definition of the matrix norm subordinate to a vector norm that

||A|| = max_{x ≠ θn}  ||Ax|| / ||x|| .

∴ For any x in Cⁿ or Rⁿ we have, if x ≠ θn,

||Ax|| / ||x||  ≤  max_{x ≠ θn}  ||Ax|| / ||x||  =  ||A|| ,

and therefore

||Ax|| ≤ ||A|| ||x||    for all x ≠ θn.

But this is obvious for x = θn

Thus if ||A|| is the matrix norm subordinate to the vector norm ||x|| then

||Ax|| ≤ ||A|| ||x||

for every vector x in Cn (or Rn).

124
125
UNIT 4
EIGEN VALUE COMPUTATIONS

125
4.1 COMPUTATION OF EIGEN VALUES

In this section we shall discuss some standard methods for computing the
eigenvalues of an nxn matrix. We shall also briefly discuss some methods for
computing the eigenvectors corresponding to the eigenvalues.

We shall first discuss some results regarding the general location of the
eigenvalues.

Let A = (aij) be an nxn matrix; and let λ1, λ2, ….., λn be its eigenvalues
(including multiplicities). We defined

P = ||A||_sp = max{ |λ1|, |λ2|, ....., |λn| }

Thus if we draw a circle of radius P about the origin in the complex plane, then
all the eigenvalues of A will lie on or inside this closed disc. Thus we have

(A) If A is an nxn matrix then all the eigenvalues of A lie in the closed disc
{λ : |λ| ≤ P} in the complex plane.

This result give us a disc inside which all the eigenvalues of A are located.
However, to locate this circle we need P and to find P we need the eigenvalues.
Thus this result is not practically useful. However, from a theoretical point of
view, this suggests the possibility of locating all the eigenvalues in some disc.
We shall now look for other discs which can be easily located and inside which
the eigenvalues can all be trapped.

Let ||A|| be any matrix norm. Then it can be shown that P ≤ ||A||. Thus if
we draw a disc of radius ||A|| with the origin as centre, then this disc will be at least
as big as the disc given in (A) above and hence will trap all the eigenvalues.
Thus the idea is to use a matrix norm which is easy to compute. For example,
we can use ||A||_∞ or ||A||_1, which are easily computed as the Maximum Absolute
Row Sum (MARS) and the Maximum Absolute Column Sum (MACS) respectively,
that is,
||A||_∞ = max_{1 ≤ i ≤ n}  Σ_{j=1}^{n} |aij|    and

126
||A||_1 = max_{1 ≤ j ≤ n}  Σ_{i=1}^{n} |aij| .

Thus we have,

(B) If A is an nxn matrix then all its eigenvalues are trapped in the closed disc
{λ : |λ| ≤ ||A||_∞} or the disc {λ : |λ| ≤ ||A||_1}. (The idea is to use ||A||_∞ if it is smaller
than ||A||_1, and ||A||_1 if it is smaller than ||A||_∞.)

COROLLORY

(C) If A is Hermitian, all its eigenvalues are real and hence all the eigenvalues
lie in the intervals,

{λ : −P ≤ λ ≤ P}     by (A),

{λ : −||A||_∞ ≤ λ ≤ ||A||_∞}

{λ : −||A||_1 ≤ λ ≤ ||A||_1}     by (B).

Example 1:

 1 −1 2
 
Let A = −1 2 3
 1 0 
 2

Here ‘Row sums’ are P1 = 4 P2 = 6 P3 = 3

∴ ||A||_∞ = MARS = 6

Thus the eigenvalues all lie in the disc {λ : |λ| ≤ 6}.

The ‘Column sums’ are C1 = 3, C2 = 5, C3 = 5.

127
∴ ||A||_1 = MACS = 5

∴ The eigenvalues all lie in the disc {λ : |λ| ≤ 5}.

In this example ||A||_1 = 5 < ||A||_∞ = 6, and hence we use ||A||_1 and get the smaller
disc {λ : |λ| ≤ 5}, inside which all eigenvalues are located.

The above results locate all the eigenvalues in one disc. The next set of
results try to isolate these eigenvalues to some extent in smaller discs. These
results are due to GERSCHGORIN.

Let A = (aij) be an nxn matrix.

The diagonal entries are

ξ1 = a11 ; ξ 2 = a22 ; ….., ξ n = ann ;

Now let Pi denote the sum of the absolute values of the off-diagonal entries of A
in the ith row.

Pi = |ai1| + |ai2| + ..... + |a_{i,i−1}| + |a_{i,i+1}| + ..... + |ain|

Now consider the discs:

G1 : centre ξ1 ; radius P1 : {λ : |λ − ξ1| ≤ P1}

G2 : centre ξ2 ; radius P2 : {λ : |λ − ξ2| ≤ P2}

and in general

Gi : centre ξi ; radius Pi : {λ : |λ − ξi| ≤ Pi}

128
Thus we get n discs G1, G2, ….., Gn. These are called the GERSCHGORIN
DISCS of the matrix A.

The first result of Gerschgorin is the following:

(D) All eigenvalues of A must lie within the union of these Gerschgorin discs.
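
The discs are trivial to compute. A small illustrative sketch (not part of the original notes; the helper name is an assumption), applied to the matrix of Example 2 below:

```python
import numpy as np

def gerschgorin_discs(A):
    """Return (centre, radius) for each Gerschgorin disc of a square matrix."""
    A = np.asarray(A)
    centres = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centres)   # off-diagonal absolute row sums
    return list(zip(centres, radii))

A = np.array([[1., -1., 0.],
              [0.,  4., 1.],
              [1.,  3., -5.]])
print(gerschgorin_discs(A))    # [(1, 1), (4, 1), (-5, 4)]
print(np.linalg.eigvals(A))    # each eigenvalue lies in one of the discs
```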

Example 2:

1 −1 0 
 
Let A = 0 4 1 
3 − 5 
 1

The Gerschgorin discs are found as follows:

ξ1 = (1,0) ; ξ2 = (4,0) ; ξ3 = (-5,0)

P1 = 1 ; P2 = 1 ; P3 = 4

G1 : Centre (1,0) radius 1


G2 : Centre (4,0) radius 1
G3 : Centre (-5,0) radius 4.


129
Thus every eigenvalue of A must lie in one of these three discs.

Example 3:

 10 4 1 
 
Let A= 1 10 0 .5 
 1.5 − 3 20 

(It can be shown that the eigenvalues are exactly λ1 = 8, λ2 = 12, λ3 = 20).

Now for this matrix we have,

ξ1 = (10,0) ξ2 = (10,0) ξ3 = 20
P1 = 5 P2 = 1.5 P3 = 4.5

Thus we have the three Gerschgorin discs

G1 = {λ : |λ − 10| ≤ 5}
G2 = {λ : |λ − 10| ≤ 1.5}
G3 = {λ : |λ − 20| ≤ 4.5}


130
Thus all the eigenvalues of A are in these discs. But notice that our exact
eigenvalues are 8, 12 and 20. Thus no eigenvalue lies in G2; one
eigenvalue lies in G3 (namely 20) and two lie in G1 (namely 8 and 12).

Example 4:

1 0 1
 
Let A = 1 2 0
1 5 
 0

Now,

ξ1 = (1,0) ξ2 = (2,0) ξ3 = (5,0)


P1 = 1 P2 = 1 P3 = 1

The Gerschgorin discs are


G1 = {λ : |λ − 1| ≤ 1}
G2 = {λ : |λ − 2| ≤ 1}
G3 = {λ : |λ − 5| ≤ 1}


Thus every eigenvalue of A must lie in the union of these three discs.

131
In example 2, all the Gerschgorin discs were isolated; and in examples 3
and 4 some discs intersected and others were isolated. The next Gerschgorin
result identifies the location of the eigenvalues in such cases.

(E) If m of the Gerschgorin discs intersect to form a common connected region


and the remaining discs are isolated from this region then exactly m
eigenvalues lie in this common region. In particular, if a Gerschgorin disc is
isolated from all the rest then exactly one eigenvalue lies in this disc.

Thus in example 2 we have all three isolated discs and thus each disc will
trap exactly one eigenvalue.

In example 3; G1and G2 intersected to form the connected (shaded) region


and this is isolated from G3. Thus the shaded region has two eigenvalues and
G3 has one eigenvalue.

In example 4, G1and G2 intersected to form a connected region (shaded


portion) and this is isolated from G3. Thus the shaded portion has two
eigenvalues and G3 has one eigenvalue.
REMARK:

In the case of Hermitian matrices, since all the eigenvalues are real, the
Gerschgorin discs, Gi = {λ : |λ − aii| ≤ Pi} = {λ : |λ − ξi| ≤ Pi}, can be replaced by the
Gerschgorin intervals,

Gi = {λ : |λ − ξi| ≤ Pi} = {λ : ξi − Pi ≤ λ ≤ ξi + Pi}

Example 5:

 1 −1 1 

Let A = −1 5 0 
 
 1 0 −1 
 2

Note A is Hermitian. (In fact A is real symmetric)

132
Here; ξ1 = (1,0) P1 = 2
ξ2 = (5,0) P2 = 1
ξ3 = (-1/2,0) P3 = 1

Thus the Gerschgorin intervals are

G1 : -1≤ λ ≤ 3
G2 : 4 ≤ λ ≤ 6
G3 : -3/2 ≤ λ ≤ ½


Note that G1 and G3 intersect and give a connected region, -3/2 ≤ λ ≤ 3; and this
is isolated from G2 : 4 ≤ λ ≤ 6. Thus there will be two eigenvalues in –3/2 ≤ λ ≤
3 and one eigenvalue in 4 ≤ λ ≤ 6.

All the above results (A), (B), (C), (D), and (E) give us a location of the
eigenvalues inside some discs and if the radii of these discs are small then the
centers of these circles give us a good approximations of the eigenvalues.
However if these discs are of large radius then we have to improve these
approximations substantially. We shall now discuss this aspect of computing
the eigenvalues more accurately. We shall first discuss the problem of
computing the eigenvalues of a real symmetric matrix.

4.2 COMPUTATION OF THE EIGENVALUES OF A REAL SYMMETRIC


MATRIX

We shall first discuss the method of reducing the given real symmetric matrix to
a real symmetric tridiagonal matrix which is similar to the given matrix and then
computing the eigenvalues of a real symmetric tridiagonal matrix. Thus the
process of determining the eigenvalues of A = (aij), a real symmetric matrix,
involves two steps:

STEP 1:

Find a real symmetric tridiagonal matrix T which is similar to A.

133
STEP 2:

Find the eigenvalues of T. (The eigenvalues of A will be same as those of


T since A and T are similar).

We shall first discuss step 2.

134
4.3 EIGENVALUES OF A REAL SYMMETRIC TRIDIAGONAL MATRIX

Let T = [ a1  b1   0    0    0    ....   0      ]
        [ b1  a2   b2   0    0    ....   0      ]
        [ 0   b2   a3   b3   0    ....   0      ]
        [ ....                                  ]
        [ 0   .... 0  b_{n−2}  a_{n−1}  b_{n−1} ]
        [ 0   .... 0    0     b_{n−1}   a_n     ]

be a real symmetric tridiagonal matrix.

Let us find Pn (λ) = det [T - λI]

   | a1−λ   b1     0     .....   .....      0      |
   |  b1   a2−λ    b2      0     .....      0      |
 = | ..... .....  .....  .....   .....    .....    |
   |  0    .....    0   b_{n−2}  a_{n−1}−λ  b_{n−1}|
   |  0    .....  .....    0     b_{n−1}   a_n−λ   |

The eigenvalues of T are precisely the roots of Pn (λ) = 0

(Without loss of generality we assume bi ≠ 0 for all i. For if bi = 0 for some i then
the above determinant reduces to two diagonal blocks of the same type and
thus the problem reduces to that of the same type involving smaller sized
matrices).

We define Pi (λ) to be the ith principal minor of the above determinant. We have

135
P0(λ) = 1
P1(λ) = a1 − λ                                               …….. (I)
Pi(λ) = (ai − λ) P_{i−1}(λ) − b_{i−1}² P_{i−2}(λ) ,   i = 2, ….., n

We are interested in finding the zeros of Pn(λ). To do this we analyse the
polynomials P0(λ), P1(λ), ....., Pn(λ).

Let C be any real number. Compute P0(C), P1(C), ....., Pn(C) (which can be
calculated recursively by (I)). Let N(C) denote the number of agreements in sign between
two consecutive terms in the above sequence of values P0(C), P1(C), ....., Pn(C). If for
some i, Pi(C) = 0, we assign to it the same sign as that of P_{i−1}(C). Then we
have

(F) There are exactly N (C) eigenvalues of T that are ≥ C.
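
The count N(C) is easy to mechanise. A small illustrative sketch (not part of the original notes; the function name is an assumption) follows the recursion (I) and the sign-agreement rule above:

```python
import numpy as np

def sturm_count(a, b, C):
    """a: diagonal entries a_1..a_n;  b: off-diagonals b_1..b_{n-1}.
    Returns N(C), the number of eigenvalues of T that are >= C."""
    p_prev, p = 1.0, a[0] - C                     # P_0(C), P_1(C)
    signs = [np.sign(p_prev), np.sign(p) if p != 0 else np.sign(p_prev)]
    for i in range(1, len(a)):
        # P_{i+1}(C) = (a_{i+1} - C) P_i(C) - b_i^2 P_{i-1}(C)
        p_prev, p = p, (a[i] - C) * p - b[i - 1] ** 2 * p_prev
        signs.append(np.sign(p) if p != 0 else signs[-1])
    return sum(signs[i] == signs[i + 1] for i in range(len(signs) - 1))

# Example 7 matrix below: diagonal [1, -1, 2, 3], off-diagonals [2, 4, -1]
print(sturm_count([1., -1., 2., 3.], [2., 4., -1.], 0.0))   # 3 eigenvalues >= 0
```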

Example:

If for an example we have an 8 x 8 matrix T (real symmetric tridiagonal)


giving use to,

P0 (1 ) = 1
P1 (1 ) = 2
P2 (1 ) = − 3
P3 (1 ) = − 2
P4 (1 ) = 6
P5 (1 ) = − 1
P6 (1 ) = 0
P7 (1 ) = 4
P8 (1 ) = − 2

Here the consecutive pairs,

136
P 0 (1 ), P1 (1 )
P 2 (1 ), P 3 (1 )
P 5 (1 ), P 6 (1 )
agree in sign.
(Since P6(1) = 0, we assign its sign to be the same as that of P5(1).)

Thus three pairs of sign agreements are achieved. Thus N (C) = 3; and there
will be 3 eigenvalues of T greater than or equal to 1; and the remaining 5
eigen values are < 1.

It is this idea of result (F) that will be combined with (A), (B), (C), (D) and
(E) of the previous section and clever repeated applications of (F) that will
locate the eigenvalues of T. We now explain this by means of an example.

Example 7:

1 2 0 0 
 
2 −1 4 0 
Let T =
0 4 2 − 1
 
0 0 − 1 3 

Here we have

Absolute Row sum 1 = 3


Absolute Row sum 2 = 7
Absolute Row sum 3 = 7
Absolute Row sum 4 = 4

and therefore

‖T‖∞ = MARS = 7

(Note: since T is symmetric we have MARS = MACS and therefore ‖T‖₁ = ‖T‖∞.) Thus by our result (C), the eigenvalues all lie in the interval −7 ≤ λ ≤ 7.

[Number line: the interval [−7, 7] containing all the eigenvalues.]

Now the Gerschgorin (discs) intervals are as follows:

G1 : Centre 1 radius : 2 ∴ G1 : [-1, 3]


G2 : Centre -1 radius : 6 ∴ G2 : [-7, 5]
G3 : Centre 2 radius : 5 ∴ G3 : [-3, 7]
G4 : Centre 3 radius : 1 ∴ G4 : [2, 4]

We see that G1, G2, G3 and G4 all intersect to form one single connected region
[-7, 7]. Thus by (E) there will be 4 eigenvalues in [-7, 7]. This gives therefore
the same information as we obtained above using (C). Thus so far we know
only that all the eigenvalues are in [-7, 7]. Now we shall see how we use (F) to
locate the eigenvalues.

First of all, let us see how many eigenvalues are ≥ 0. Let C = 0; finding N(0) gives the number of eigenvalues ≥ 0.

Now
\[
T - \lambda I = \begin{pmatrix}
1-\lambda & 2 & 0 & 0 \\
2 & -1-\lambda & 4 & 0 \\
0 & 4 & 2-\lambda & -1 \\
0 & 0 & -1 & 3-\lambda
\end{pmatrix}
\]
so that

P_0(λ) = 1
P_1(λ) = 1 − λ
P_2(λ) = −(1 + λ) P_1(λ) − 4 P_0(λ)
P_3(λ) = (2 − λ) P_2(λ) − 16 P_1(λ)
P_4(λ) = (3 − λ) P_3(λ) − P_2(λ)

Now, we have,

P_0(0) = 1
P_1(0) = 1
P_2(0) = −5
P_3(0) = −26
P_4(0) = −73

We have

P_0(0), P_1(0)
P_2(0), P_3(0)
P_3(0), P_4(0)

as three consecutive pairs having sign agreements.

∴ N(0) = 3

∴ There are three eigenvalues ≥ 0, and hence one eigenvalue < 0;

i.e. there are three eigenvalues in [0, 7] and there is one eigenvalue in [−7, 0].

[Fig. 1: number line showing one eigenvalue in [−7, 0] and three eigenvalues in [0, 7].]

Let us take C = -1 and calculate N (C). We have

P_0(−1) = 1
P_1(−1) = 2
P_2(−1) = −4
P_3(−1) = −44
P_4(−1) = −172

Again we have N(−1) = 3, so there are three eigenvalues ≥ −1. Comparing this with Fig. 1, we get

[Fig. 2: number line showing one eigenvalue in [−7, −1] and three eigenvalues in [0, 7].]

Let us take the mid point of [−7, −1], the interval in which the negative eigenvalue lies.

So let C = -4.

P_0(−4) = 1
P_1(−4) = 5
P_2(−4) = 11
P_3(−4) = −14
P_4(−4) = −109

Again there are three pairs of sign agreements, so N(−4) = 3 and there are 3 eigenvalues ≥ −4. Comparing with Fig. 2 we get

that the negative eigenvalue is in [−7, −4] ………..(*)

Let us try the mid point C = −5.5. We have

P_0(−5.5) = 1
P_1(−5.5) = 6.5
P_2(−5.5) = 25.25
P_3(−5.5) = 85.375
P_4(−5.5) = 700.4375

∴ N(−5.5) = 4, so 4 eigenvalues are ≥ −5.5. Combining this with (*) and Fig. 2 we get that the negative eigenvalue is in [−5.5, −4].

We again take the mid point C, calculate N(C), and determine in which half of the current interval the negative eigenvalue lies; we continue this bisection process until we trap the negative eigenvalue in as small an interval as necessary.

Now let us look at the eigenvalues ≥ 0. From Fig. 2 we have three eigenvalues in [0, 7]. Let us take C = 1:

P_0(1) = 1
P_1(1) = 0
P_2(1) = −4
P_3(1) = −4
P_4(1) = −4

∴ N(1) = 3, so all three of these eigenvalues are ≥ 1 ………….. (**)

C = 2:

P_0(2) = 1
P_1(2) = −1
P_2(2) = −1
P_3(2) = 16
P_4(2) = 17

∴ N(2) = 2, so there are two eigenvalues ≥ 2. Combining this with (**) we get one eigenvalue in [1, 2) and two in [2, 7].

C = 3:

P_0(3) = 1
P_1(3) = −2
P_2(3) = 4
P_3(3) = 28
P_4(3) = −4

∴ N(3) = 1, so there is one eigenvalue ≥ 3. Combining this with the above observations we get

one eigenvalue in [1, 2)
one eigenvalue in [2, 3)
one eigenvalue in [3, 7)
Let us locate the eigenvalue in [3, 7] a little better. Take C = mid point = 5

P_0(5) = 1
P_1(5) = −4
P_2(5) = 20
P_3(5) = 4
P_4(5) = −28

∴ N(5) = 1, so this eigenvalue is ≥ 5, i.e. it lies in [5, 7].

Let us take mid point C = 6

P_0(6) = 1
P_1(6) = −5
P_2(6) = 31
P_3(6) = −44
P_4(6) = 101

∴ N(6) = 0, so no eigenvalue is ≥ 6; hence the eigenvalue is in [5, 6).

Thus combining all, we have,

one eigenvalue in [-5.5, -4)


one eigenvalue in [1, 2)
one eigenvalue in [2, 3)
one eigenvalue in [5, 6)

Each one of these locations can be narrowed down further by the bisection process applied to each of these intervals.
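This bisection is equally mechanical. A possible Python sketch, reusing the sturm_count function from the earlier illustration (again only a sketch, under the stated assumption on the interval):

def locate_eigenvalue(a, b, lo, hi, tol=1e-6):
    # Assumes exactly one eigenvalue of T lies in [lo, hi), i.e.
    # sturm_count(a, b, lo) - sturm_count(a, b, hi) == 1.
    target = sturm_count(a, b, hi)       # number of eigenvalues >= hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(a, b, mid) > target:
            lo = mid                     # the trapped eigenvalue is still >= mid
        else:
            hi = mid                     # the trapped eigenvalue is < mid
    return 0.5 * (lo + hi)

For the matrix of the example above, locate_eigenvalue([1, -1, 2, 3], [2, 4, -1], 5.0, 6.0) narrows down the eigenvalue already trapped in [5, 6).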

We shall now discuss the method of obtaining a real symmetric tridiagonal T similar to a given real symmetric matrix A.

4.4 TRIDIAGONALIZATION OF A REAL SYMMETRIC MATRIX

Let A = (aij) be a real symmetric n×n matrix. Our aim is to get a real symmetric tridiagonal matrix T such that T is similar to A. The process of obtaining this T is called the Givens–Householder scheme. The idea is to first find a reduction which annihilates the off-tridiagonal entries in the first row and first column of A, and then to use this idea repeatedly. We shall first see some preliminaries.

Let
\[
U = \begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_n \end{pmatrix}
\]
be a real n×1 nonzero vector.

Then H = UU^T is an n×n real symmetric matrix. Let α be a real number (which we shall choose suitably) and consider

P = I − αH = I − αUU^T   ………… (I)

We shall choose α such that P is its own inverse. (Note that P^T = P.) So we need

P² = I

i.e.
(I − αH)(I − αH) = I
i.e.
(I − αUU^T)(I − αUU^T) = I

I − 2αUU^T + α²UU^T UU^T = I

So we choose α such that

α²UU^T UU^T = 2αUU^T

Obviously we choose α ≠ 0, because otherwise we get P = I and no new transformation. Hence we need

α UU^T UU^T = 2 UU^T

But

U^T U = U_1² + U_2² + ….. + U_n² = ‖U‖²

is a nonzero real number (since U is a nonzero vector), and thus we have

α (U^T U) UU^T = 2 UU^T

and hence

α = 2 / (U^T U)   …………. (II)

Thus if U is an n×1 nonzero vector and α is as in (II), then P defined by

P = I − αUU^T   …………. (III)

is such that

P = P^T = P^{-1}   …………. (IV)
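As a small illustrative check of (II)–(IV) (the vector U below is arbitrary and chosen only for the illustration):

import numpy as np

U = np.array([[3.0], [1.0], [2.0]])          # any nonzero n x 1 vector
alpha = 2.0 / float(U.T @ U)                 # formula (II)
P = np.eye(3) - alpha * (U @ U.T)            # formula (III)
print(np.allclose(P, P.T))                   # True: P is symmetric
print(np.allclose(P @ P, np.eye(3)))         # True: P is its own inverse, as in (IV)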

Now we go back to our problem of tridiagonalization of A. Our first aim is to find a P of the form (IV) such that P^T A P = PAP has the off-tridiagonal entries in the 1st row and 1st column equal to zero. We can choose this P as follows:

Let

s² = a21² + a31² + ….. + an1²   ................ (V)

(the sum of the squares of the entries below the diagonal entry in the 1st column of A)

Let s be the nonnegative square root of s².

Let
\[
U = \begin{pmatrix} 0 \\ a_{21} + s\,\operatorname{sgn}(a_{21}) \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix}
\qquad \cdots\cdots \text{(VI)}
\]

Thus U is the same as the 1st column of A except that the 1st component is taken as 0 and the second component is a variation of the second component of the 1st column of A. All other components of U are the same as the corresponding components of the 1st column of A.

Then

\[
\alpha = \left(\frac{U^T U}{2}\right)^{-1}
= \left(\frac{(a_{21} + s\,\operatorname{sgn}(a_{21}))^{2} + a_{31}^{2} + a_{41}^{2} + \cdots + a_{n1}^{2}}{2}\right)^{-1}
\]
\[
= \left(\frac{a_{21}^{2} + s^{2} + 2s|a_{21}| + a_{31}^{2} + \cdots + a_{n1}^{2}}{2}\right)^{-1}
= \left(\frac{\left(a_{21}^{2} + a_{31}^{2} + \cdots + a_{n1}^{2}\right) + s^{2} + 2s|a_{21}|}{2}\right)^{-1}
\]
\[
= \left(\frac{2\left[s^{2} + s|a_{21}|\right]}{2}\right)^{-1}
\qquad\left(\text{since } s^{2} = a_{21}^{2} + a_{31}^{2} + \cdots + a_{n1}^{2}\right)
\]
\[
\therefore\quad \alpha = \frac{1}{s^{2} + s|a_{21}|} \qquad \cdots\cdots \text{(VII)}
\]

Thus if α is as in (VII) and U is as in (VI), where s is as in (V), then

P = I − αUU^T

is such that P = P^T = P^{-1}, and it can be shown that

A2 = PA1P = PAP   (letting A1 = A)

is similar to A and has the off-tridiagonal entries in the 1st row and 1st column equal to 0.

Now we apply the same procedure to the matrix obtained by ignoring the 1st column and 1st row of A2.

Thus we now choose

s² = a32² + a42² + ….. + an2²

(where now the aij denote the entries of A2)

(i.e. s² is the sum of squares of the entries below the diagonal entry in the second column of A2)

s = positive square root of s²

 0 
 
 0 
 a + ( sign .a ) s 
U =  32 32

 a 42 
 M 
 
 
 a n2 

1
α =
s + s a 32
2

P = I - α UUT

Then

A3 = PA2P

has the off-tridiagonal entries in the 1st and 2nd rows and columns equal to zero. We proceed similarly, annihilating all off-tridiagonal entries, and get T, real symmetric tridiagonal and similar to A.

Note: For an n×n matrix we get the tridiagonalization in n − 2 steps.
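As an illustration, the whole reduction can be sketched in Python as follows; the sketch follows (V)–(VII) at each of the n − 2 steps, the only safeguard being to skip a column that is already in tridiagonal form (the function name tridiagonalize is ours):

import numpy as np

def tridiagonalize(A):
    # Givens-Householder reduction of a real symmetric matrix to a similar
    # real symmetric tridiagonal matrix.
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        s2 = np.sum(T[k + 1:, k] ** 2)              # formula (V) for the current column
        s = np.sqrt(s2)
        if s == 0.0:                                # column already tridiagonal: skip
            continue
        sgn = 1.0 if T[k + 1, k] >= 0 else -1.0
        U = np.zeros(n)
        U[k + 1] = T[k + 1, k] + s * sgn            # formula (VI)
        U[k + 2:] = T[k + 2:, k]
        alpha = 1.0 / (s2 + s * abs(T[k + 1, k]))   # formula (VII)
        P = np.eye(n) - alpha * np.outer(U, U)      # P = P^T = P^{-1}
        T = P @ T @ P                               # similarity transform
    return T

Applied to the 4 × 4 matrix of the example below, tridiagonalize reproduces, up to rounding, the tridiagonal matrix A3 obtained there by hand.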

Example:

5 4 1 1
 
4 5 1 1
A=
1 1 4 2
 
1 4 
 1 2

A is a real symmetric 4 × 4 matrix. Thus we get the tridiagonalization after 4 − 2 = 2 steps.

Step 1:

s² = 4² + 1² + 1² = 18
s = √18 = 4.24264

α = 1/(s² + s|a21|) = 1/(18 + (4.24264)(4)) = 1/34.97056 = 0.02860

\[
U = \begin{pmatrix} 0 \\ a_{21} + s\,\operatorname{sgn}(a_{21}) \\ a_{31} \\ a_{41} \end{pmatrix}
= \begin{pmatrix} 0 \\ 4 + 4.24264 \\ 1 \\ 1 \end{pmatrix}
\]

With this α, U, we get

1 0 0 0 
 
 0 − 0.94281 − 0.23570 − 0.23570 
P = I − αUU t = 
0 − 0.23570 0.97140 − 0.02860 
 
 0 − 0.23570 − 0.02860 0.97140 

 5 − 4 .24264 0 0 
 
 − 4 .24264 6 − 1 − 1
A2 = PAP = 
0 −1 3 . 5 1 .5 
 
 0 −1 1 .5 3 .5 

Step 2

s² = (−1)² + (−1)² = 2
s = √2 = 1.41421

α = 1/(s² + s|a32|) = 1/(2 + (1.41421)(1)) = 1/3.41421 = 0.29289

 0   0   0 
     
 0   0   0 
U =  =  =
a + s sgn .a 32 − 1 − 1.41421 − 2.41421
 32     
   −1   − 1 
 a 42    

P = I - α UUT

1 0 0 0 
 
0 1 0 0 
=
0 0 − 0 . 70711 − 0 .70711 
 
0 − 0 . 70711 0 . 70711 
 0

A3 = PA2P

 5 − 4 .24264 0 0
 
 − 4.24264 6 1 .41421 0
=
0 1.41421 5 0
 
 2 
 0 0 0

which is tridiagonal.

Thus the Givens–Householder scheme for finding the eigenvalues involves two steps, namely,

STEP 1: Find a real symmetric tridiagonal T similar to A (by the method described above).

STEP 2: Find the eigenvalues of T (by the method of Sturm sequences and bisection described earlier).

However, it must be mentioned that this method is used mostly to calculate the
eigenvalue of the largest modulus or to sharpen the calculations done by some other
method.

If one wants to calculate all the eigenvalues at the same time then one uses the
Jacobi iteration which we now describe.

4.5 JACOBI ITERATION FOR FINDING EIGENVALUES OF A REAL
SYMMETRIC MATRIX

Some Preliminaries:

a a12 
Let A =  11  be a real symmetric matrix.
 a12 a 22 

Let
\[
P = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]
(where we choose |θ| ≤ π/4 for purposes of convergence of the scheme).

Note that
\[
P^{t} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\qquad\text{and}\qquad
P^{t} P = P P^{t} = I.
\]
Thus P is an orthogonal matrix.

Now
\[
A_1 = P^{t} A P
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]
\[
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} a_{11}\cos\theta + a_{12}\sin\theta & -a_{11}\sin\theta + a_{12}\cos\theta \\ a_{12}\cos\theta + a_{22}\sin\theta & -a_{12}\sin\theta + a_{22}\cos\theta \end{pmatrix}
\]
\[
= \begin{pmatrix}
a_{11}\cos^{2}\theta + 2a_{12}\sin\theta\cos\theta + a_{22}\sin^{2}\theta &
(-a_{11}+a_{22})\sin\theta\cos\theta + a_{12}(\cos^{2}\theta - \sin^{2}\theta) \\
(-a_{11}+a_{22})\sin\theta\cos\theta + a_{12}(\cos^{2}\theta - \sin^{2}\theta) &
a_{11}\sin^{2}\theta - 2a_{12}\sin\theta\cos\theta + a_{22}\cos^{2}\theta
\end{pmatrix}
\]

Thus if we choose θ such that,

(− a11 + a 22 )sin θ cos θ + a12 (cos 2 θ − sin 2 θ ) = 0 . . . (I)

We get the entries in (1,2) position and (2,1) position of A1 as zero.

(I) gives

\[
\left(\frac{-a_{11}+a_{22}}{2}\right)\sin 2\theta + a_{12}\cos 2\theta = 0
\;\Rightarrow\; a_{12}\cos 2\theta = \frac{a_{11}-a_{22}}{2}\,\sin 2\theta
\]
\[
\Rightarrow\; \tan 2\theta = \frac{2a_{12}}{a_{11}-a_{22}}
= \frac{2a_{12}\,\operatorname{sgn}(a_{11}-a_{22})}{|a_{11}-a_{22}|}
= \frac{\alpha}{\beta},\ \text{say} \qquad \cdots\cdots \text{(II)}
\]

where

α = 2a12 sgn(a11 − a22)   . . . . . (III)
β = |a11 − a22|           . . . . . (IV)

\[
\therefore\; \sec^{2} 2\theta = 1 + \tan^{2} 2\theta = 1 + \frac{\alpha^{2}}{\beta^{2}} = \frac{\alpha^{2}+\beta^{2}}{\beta^{2}} \quad \text{from (II)}
\]
\[
\therefore\; \cos^{2} 2\theta = \frac{\beta^{2}}{\alpha^{2}+\beta^{2}}
\;\Rightarrow\; \cos 2\theta = \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}
\;\Rightarrow\; 2\cos^{2}\theta - 1 = \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}
\]
\[
\Rightarrow\; \cos\theta = \sqrt{\frac{1}{2}\left(1 + \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}\right)} \qquad \cdots\cdots \text{(V)}
\]

and

\[
2\sin\theta\cos\theta = \sin 2\theta = \sqrt{1-\cos^{2} 2\theta}
= \sqrt{1 - \frac{\beta^{2}}{\alpha^{2}+\beta^{2}}}
= \sqrt{\frac{\alpha^{2}}{\alpha^{2}+\beta^{2}}}
= \frac{\alpha}{\sqrt{\alpha^{2}+\beta^{2}}}
\]
\[
\therefore\; \sin\theta = \frac{\alpha}{2\cos\theta\,\sqrt{\alpha^{2}+\beta^{2}}} \qquad \cdots\cdots \text{(VI)}
\]

(V) and (VI) give sinθ, cos θ and if we choose

 cos θ − sin θ 
P =   with these values of cosθ, sinθ, then
 sin θ cos θ 

PtAP = A1 has (2,1) and (1,2) entries as zero.

We now generalize this idea.

Let A = (aij) be an nxn real symmetric matrix.

Let 1 ≤ q < p ≤ n. (Instead of the (1, 2) position above, we now choose the (q, p) position.)

Consider,

α = 2aqp sgn(aqq − app)   . . . . . . . (A)

β = |aqq − app|   . . . . . . . (B)

\[
\cos\theta = \sqrt{\frac{1}{2}\left(1 + \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}\right)} \qquad \text{(C)}
\]
\[
\sin\theta = \frac{1}{2\cos\theta}\cdot\frac{\alpha}{\sqrt{\alpha^{2}+\beta^{2}}} \qquad \text{(D)}
\]

and let P be the n×n matrix which agrees with the identity matrix except in the (q, q), (q, p), (p, q) and (p, p) positions:

\[
P = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & \cos\theta & \cdots & -\sin\theta & & \\
& & \vdots & \ddots & \vdots & & \\
& & \sin\theta & \cdots & \cos\theta & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\]

(cos θ in the (q, q) and (p, p) positions, −sin θ in the (q, p) position, sin θ in the (p, q) position, 1 elsewhere on the diagonal and 0 elsewhere off the diagonal).

Then A1 = P^t AP has the entries in the (q, p) and (p, q) positions equal to zero.

In fact A1 differs from A only in the qth row, pth row, qth column and pth column, and it can be shown that these new entries are

a¹qi = aqi cos θ + api sin θ
a¹pi = −aqi sin θ + api cos θ        i ≠ q, p   (qth row, pth row)  . . (E)

a¹iq = aiq cos θ + aip sin θ
a¹ip = −aiq sin θ + aip cos θ        i ≠ q, p   (qth column, pth column)  . . (F)

a¹qq = aqq cos²θ + 2aqp sin θ cos θ + app sin²θ
a¹pp = aqq sin²θ − 2aqp sin θ cos θ + app cos²θ   . . . (G)
a¹qp = a¹pq = 0.

Now the Jacobi iteration is as follows.

Let A = (aij) be n×n real symmetric.

Find 1 ≤ q < p ≤ n such that |aqp| is the largest among the absolute values of all the off-diagonal entries of A.

For this q, p find P as above and let A1 = P^t AP. A1 can be obtained as follows: except for the qth and pth rows and the qth and pth columns, the rows and columns of A1 are the same as the corresponding rows and columns of A; the qth row, pth row, qth column and pth column are obtained from (E), (F), (G).

Now A1 has 0 in the (q, p) and (p, q) positions.

Replace A by A1 and repeat the process. The process converges to a diagonal matrix, the diagonal entries of which give the eigenvalues of A.
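As an illustration, a Python sketch of this iteration might read as follows; the rotation is built from (A)–(D) above, the case aqq = app (where (D) is not directly applicable) is handled by taking θ = π/4, and the stopping rule is a simple tolerance on the largest off-diagonal entry (all of these choices are ours):

import numpy as np

def jacobi_eigenvalues(A, tol=1e-10, max_rotations=500):
    # Classical Jacobi iteration: at each step zero the off-diagonal entry
    # of largest modulus using the rotation built from (A)-(D).
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(max_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        q, p = np.unravel_index(np.argmax(off), off.shape)
        q, p = min(q, p), max(q, p)                  # ensure q < p
        if off[q, p] < tol:
            break
        beta = abs(A[q, q] - A[p, p])
        if beta == 0.0:
            c = s = np.sqrt(0.5)                     # theta = pi/4 when a_qq = a_pp
        else:
            alpha = 2.0 * A[q, p] * np.sign(A[q, q] - A[p, p])
            r = np.hypot(alpha, beta)
            c = np.sqrt(0.5 * (1.0 + beta / r))      # formula (C)
            s = alpha / (2.0 * c * r)                # formula (D)
        P = np.eye(n)
        P[q, q] = P[p, p] = c
        P[q, p], P[p, q] = -s, s
        A = P.T @ A @ P                              # zeroes the (q, p), (p, q) entries
    return np.diag(A)

On the 4 × 4 matrix of the example below, this sketch returns (up to rounding) the eigenvalues 5.78305, 12.71986, −5.60024 and 2.09733 found there.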

Example:
7 3 2 1
 
3 9 − 2 4
A = 
2 − 2 − 4 2
 
1 3 
 4 2

The off-diagonal entry of largest modulus is in the (2, 4) position.

∴ q = 2, p = 4.

α = 2 sgn(aqq − app)·aqp = 2 sgn(a22 − a44)·a24 = (2)(1)(4) = 8

β = |aqq − app| = |9 − 3| = 6

∴ α² + β² = 100 ;  √(α² + β²) = 10

\[
\therefore\quad \cos\theta = \sqrt{\frac{1}{2}\left(1 + \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}\right)}
= \sqrt{\frac{1}{2}\left(1 + \frac{6}{10}\right)} = \sqrt{0.8} = 0.89442
\]
\[
\sin\theta = \frac{1}{2\cos\theta}\cdot\frac{\alpha}{\sqrt{\alpha^{2}+\beta^{2}}}
= \frac{1}{2(0.89442)}\cdot\frac{8}{10} = 0.44721
\]
1 0 0 0 
 
0 0 . 89442 0 − 0 . 44721 
∴ P =  
0 0 1 0
 
0 0 . 89442 
 0 . 44721 0

A1 = P^t AP will have a¹24 = a¹42 = 0.

The other entries that differ from those of A are a¹21, a¹22, a¹23; a¹41, a¹43, a¹44 (of course, by symmetry, the corresponding reflected entries also change).

We have

a¹21 = a21 cos θ + a41 sin θ = 3.1305
a¹41 = −a21 sin θ + a41 cos θ = −0.44721
a¹23 = a23 cos θ + a43 sin θ = −0.89443
a¹43 = −a23 sin θ + a43 cos θ = 2.68328
a¹22 = a22 cos²θ + 2a24 sin θ cos θ + a44 sin²θ = 11
a¹44 = a22 sin²θ − 2a24 sin θ cos θ + a44 cos²θ = 1

 7 3.1305 2 − 0.44721 
 
 3.1305 11 − 0.89443 0.0000 
∴ A1 = 
2 − 0.89443 −4 2.68328 
 
 − 0.44721 1.00000 
 0 2.68328

Now we repeat the process with this matrix.

The off-diagonal entry of largest absolute value is now in the (1, 2) position.

∴ q = 1, p = 2.

β = |aqq − app| = |a11 − a22| = |7 − 11| = 4

α = 2aqp sgn(aqq − app) = 2(3.1305)(−1) = −6.2610

α² + β² = 55.200121 ;  √(α² + β²) = 7.42968

\[
\cos\theta = \sqrt{\frac{1}{2}\left(1 + \frac{\beta}{\sqrt{\alpha^{2}+\beta^{2}}}\right)} = 0.87704 ;
\qquad
\sin\theta = \frac{1}{2\cos\theta}\cdot\frac{\alpha}{\sqrt{\alpha^{2}+\beta^{2}}} = -0.48043
\]

∴ The entries that change are

a¹12 = a¹21 = 0
a¹13 = a13 cos θ + a23 sin θ = 2.18378
a¹23 = −a13 sin θ + a23 cos θ = 0.17641
a¹14 = a14 cos θ + a24 sin θ = −0.39222
a¹24 = −a14 sin θ + a24 cos θ = −0.21485
a¹11 = a11 cos²θ + 2a12 sin θ cos θ + a22 sin²θ = 5.28516
a¹22 = a11 sin²θ − 2a12 sin θ cos θ + a22 cos²θ = 12.71484

and the new matrix is

\[
\begin{pmatrix}
5.28516 & 0 & 2.18378 & -0.39222 \\
0 & 12.71484 & 0.17641 & -0.21485 \\
2.18378 & 0.17641 & -4 & 2.68328 \\
-0.39222 & -0.21485 & 2.68328 & 1
\end{pmatrix}
\]

Now we repeat with q = 3, p = 4, and so on. At the 12th step we get the diagonal matrix

\[
\begin{pmatrix}
5.78305 & 0 & 0 & 0 \\
0 & 12.71986 & 0 & 0 \\
0 & 0 & -5.60024 & 0 \\
0 & 0 & 0 & 2.09733
\end{pmatrix}
\]

giving the eigenvalues of A as 5.78305, 12.71986, −5.60024, 2.09733.

Note: At each stage, when we choose the (q, p) position and apply the above transformation to get the new matrix A1, the sum of squares of the off-diagonal entries of A1 is less than that of A by 2a²qp.

4.6 The Q R decomposition:

Let A be an n×n real nonsingular matrix. A matrix Q is said to be an orthogonal matrix if QQ^T = I = Q^T Q. We shall see that, given the matrix A, we can find an orthogonal matrix Q and an upper triangular matrix R (with rii > 0) such that A = QR; this is called the QR decomposition of A. The Q and R are found as follows:

Let a(1) ; a(2) ; ……. , a(n) be the columns of A

q(1) ; q(2) ; ……… , q(n) be the columns of Q

r(1) , r(2) , ………., r(n) be the columns of R.

Note: Since Q is orthogonal we have

‖q(1)‖₂ = ‖q(2)‖₂ = ....... = ‖q(n)‖₂ = 1   ………….. (A)

(q(i), q(j)) = 0 if i ≠ j   …………….. (B)

and since R is upper triangular we have

\[
r^{(i)} = \begin{pmatrix} r_{1i} \\ r_{2i} \\ \vdots \\ r_{ii} \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\qquad \cdots\cdots \text{(C)}
\]

Also the ith column of QR is Qr(i), and therefore

ith column of QR = r1i q(1) + r2i q(2) + ..... + rii q(i)   .................... (D)

We want A = QR.

Comparing 1st column on both sides we get

a(1) = 1st column of QR = Qr(1) = r11 q(1)   (by (D))

∴ ‖a(1)‖₂ = r11 ‖q(1)‖₂ = r11   (since r11 > 0 and ‖q(1)‖₂ = 1 by (A))

∴ r11 = ‖a(1)‖₂   …………… (E)

and q(1) = (1/r11) a(1)

giving the 1st columns of R and Q.

Next, comparing the second columns on both sides we get

a(2) = Qr(2) = r12 q(1) + r22 q(2)   ……….. (*)

Therefore from (*) we get

(a(2), q(1)) = r12 (q(1), q(1)) + r22 (q(2), q(1)) = r12

since (q(1), q(1)) = ‖q(1)‖₂² = 1 by (A) and (q(2), q(1)) = 0 by (B).

∴ r12 = (a(2), q(1))   ............ (F)

∴ (*) gives

r22 q(2) = a(2) − r12 q(1)

and ∴ r22 ‖q(2)‖₂ = ‖a(2) − r12 q(1)‖₂

∴ r22 = ‖a(2) − r12 q(1)‖₂   .................. (G)

and

q(2) = (1/r22) [a(2) − r12 q(1)]   .................. (H)

(F), (G), (H) give the 2nd columns of Q and R. We can proceed in this way: having got the first i − 1 columns of Q and R, we get the ith columns of Q and R as follows:

r1i = (a(i), q(1)) ; r2i = (a(i), q(2)) ; ............ ; ri−1,i = (a(i), q(i−1))

rii = ‖a(i) − r1i q(1) − r2i q(2) − ....... − ri−1,i q(i−1)‖₂

q(i) = (1/rii) [a(i) − r1i q(1) − r2i q(2) − .......... − ri−1,i q(i−1)]
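As an illustration, the column-by-column construction just described can be sketched in Python as follows (assuming A is square and nonsingular so that every rii > 0; classical Gram–Schmidt as written here can lose orthogonality on ill-conditioned matrices, which these notes do not address):

import numpy as np

def qr_decompose(A):
    # Build Q and R column by column using the formulas above.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.zeros((n, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = A[:, i] @ Q[:, j]      # r_ji = (a^(i), q^(j))
            v -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(v)          # r_ii = || a^(i) - sum_j r_ji q^(j) ||_2
        Q[:, i] = v / R[i, i]                # q^(i)
    return Q, R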
Example:

1 2 1
 
A = 1 0 1
0 1 1 

1st column of Q and R:

r11 = ‖a(1)‖₂ = √(1² + 1²) = √2

\[
q^{(1)} = \frac{1}{r_{11}}\, a^{(1)} = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}
\]

2nd column of Q and R:

\[
r_{12} = (a^{(2)}, q^{(1)})
= \left(\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}\right)
= \frac{2}{\sqrt{2}} = \sqrt{2}
\]
\[
r_{22} = \left\| a^{(2)} - r_{12}\, q^{(1)} \right\|_2
= \left\| \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right\|_2
= \left\| \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \right\|_2 = \sqrt{3}
\]
\[
q^{(2)} = \frac{1}{r_{22}} \left[ a^{(2)} - r_{12}\, q^{(1)} \right]
= \begin{pmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}
\]

3rd column of Q and R:

\[
r_{13} = (a^{(3)}, q^{(1)})
= \left(\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}\right)
= \frac{2}{\sqrt{2}} = \sqrt{2}
\]
\[
r_{23} = (a^{(3)}, q^{(2)}) = \frac{1}{\sqrt{3}}
\]
\[
r_{33} = \left\| a^{(3)} - r_{13}\, q^{(1)} - r_{23}\, q^{(2)} \right\|_2
= \left\| \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \frac{1}{3}\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \right\|_2
= \left\| \begin{pmatrix} -1/3 \\ 1/3 \\ 2/3 \end{pmatrix} \right\|_2
= \sqrt{\frac{1}{9} + \frac{1}{9} + \frac{4}{9}} = \sqrt{\frac{2}{3}} = \frac{\sqrt{2}}{\sqrt{3}}
\]

and

\[
q^{(3)} = \frac{1}{r_{33}} \left[ a^{(3)} - r_{13}\, q^{(1)} - r_{23}\, q^{(2)} \right]
= \frac{\sqrt{3}}{\sqrt{2}} \begin{pmatrix} -1/3 \\ 1/3 \\ 2/3 \end{pmatrix}
= \begin{pmatrix} -1/\sqrt{6} \\ 1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}
\]

 1 1 1   
 −  
 2 3 6  2 2 2 
 1 1 1   1 
∴Q =  −  ; R = 0 3 
 2 3 6   3 
 1 2   2 
 0   0 0 
 3 3   3 

and

1 2 1
 
QR =  1 0 1 = A
0 1 1

giving us QR decomposition of A.
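As a check, running the qr_decompose sketch given earlier on this matrix reproduces the Q and R computed by hand, up to rounding:

import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q, R = qr_decompose(A)            # qr_decompose as sketched above
print(np.round(Q, 5))             # columns approximate q(1), q(2), q(3)
print(np.round(R, 5))             # upper triangular with positive diagonal
print(np.allclose(Q @ R, A))      # True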

The QR algorithm

Let A be any nonsingular nxn matrix.


Let A = A1 = Q1 R1 be its QR decomposition.
Let A2 = R1 Q1. Then find the QR decomposition of A2 as A2 = Q2 R2
Define A3 = R2 Q2 ; find QR decomposition of A3 as

A3 = Q3 R3.

Keep repeating the process. Thus

A1 = Q1 R1
A2 = R1 Q1
and the ith step is

Ai = Ri-1 Qi-1
Ai = Qi Ri

Then Ai ‘ converges’ to an upper triangular matrix exhibiting the eigenvalues of
A along the diagonal.
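As an illustration, a bare-bones Python sketch of this iteration is given below; practical implementations add shifts and deflation, which are not covered in these notes, and "convergence" is meant in the loose sense just described:

import numpy as np

def qr_algorithm(A, iterations=200):
    # Unshifted QR iteration: A_{i+1} = R_i Q_i where A_i = Q_i R_i.
    A = np.array(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(A)    # any QR decomposition will do, e.g. qr_decompose above
        A = R @ Q                 # similar to the previous iterate: R Q = Q^T (Q R) Q
    return A                      # approximately upper triangular; eigenvalues on the diagonal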
