
Lecture Notes of Matrix Computations

Wen-Wei Lin

Department of Mathematics National Tsing Hua University Hsinchu, Taiwan 30043, R.O.C.

May 5, 2008


Contents

I  On the Numerical Solutions of Linear Systems

1  Introduction
   1.1  Mathematical auxiliary, definitions and relations
        1.1.1  Vectors and matrices
        1.1.2  Rank and orthogonality
        1.1.3  Special matrices
        1.1.4  Eigenvalues and Eigenvectors
   1.2  Norms and eigenvalues
   1.3  The Sensitivity of Linear System Ax = b
        1.3.1  Backward error and Forward error
        1.3.2  An SVD Analysis
        1.3.3  Normwise Forward Error Bound
        1.3.4  Componentwise Forward Error Bound
        1.3.5  Derivation of Condition Number of Ax = b
        1.3.6  Normwise Backward Error
        1.3.7  Componentwise Backward Error
        1.3.8  Determinants and Nearness to Singularity

2  Numerical methods for solving linear systems
   2.1  Elementary matrices
   2.2  LR-factorization
   2.3  Gaussian elimination
        2.3.1  Practical implementation
        2.3.2  LDR- and LL^T-factorizations
        2.3.3  Error estimation for linear systems
        2.3.4  Error analysis for Gaussian algorithm
        2.3.5  A priori error estimate for backward error bound of LR-factorization
        2.3.6  Improving and Estimating Accuracy
   2.4  Special Linear Systems
        2.4.1  Toeplitz Systems
        2.4.2  Banded Systems
        2.4.3  Symmetric Indefinite Systems

3  Orthogonalization and least squares methods
   3.1  QR-factorization (QR-decomposition)
        3.1.1  Householder transformation
        3.1.2  Gram-Schmidt method
        3.1.3  Givens method
   3.2  Overdetermined linear Systems - Least Squares Methods
        3.2.1  Rank Deficiency I: QR with column pivoting
        3.2.2  Rank Deficiency II: The Singular Value Decomposition
        3.2.3  The Sensitivity of the Least Squares Problem
        3.2.4  Condition number of a Rectangular Matrix
        3.2.5  Iterative Improvement

4  Iterative Methods for Solving Large Linear Systems
   4.1  General procedures for the construction of iterative methods
        4.1.1  Some theorems and definitions
        4.1.2  The theorems of Stein-Rosenberg
        4.1.3  Sufficient conditions for convergence of TSM and SSM
   4.2  Relaxation Methods (Successive Over-Relaxation (SOR) Method)
        4.2.1  Determination of the Optimal Parameter for 2-consistently Ordered Matrices
        4.2.2  Practical Determination of Relaxation Parameter ω_b
        4.2.3  Break-off Criterion for SOR Method
   4.3  Application to Finite Difference Methods: Model Problem (Example 4.1.3)
   4.4  Block Iterative Methods
   4.5  The ADI method of Peaceman and Rachford
        4.5.1  ADI method (alternating-direction implicit iterative method)
        4.5.2  The algorithm of Buneman for the solution of the discretized Poisson Equation
        4.5.3  Comparison with Iterative Methods
   4.6  Derivation and Properties of the Conjugate Gradient Method
        4.6.1  A Variational Problem, Steepest Descent Method (Gradient Method)
        4.6.2  Conjugate gradient method
        4.6.3  Practical Implementation
        4.6.4  Convergence of CG-method
   4.7  CG-method as an iterative method, preconditioning
        4.7.1  A new point of view of PCG
   4.8  Incomplete Cholesky Decomposition
   4.9  Chebychev Semi-Iteration Acceleration Method
        4.9.1  Connection with SOR Method
        4.9.2  Practical Performance
   4.10 GCG-type Methods for Nonsymmetric Linear Systems
        4.10.1 GCG method (Generalized Conjugate Gradient)
        4.10.2 BCG method (A: unsymmetric)
   4.11 CGS (Conjugate Gradient Squared), a fast Lanczos-type solver for nonsymmetric linear systems
        4.11.1 The polynomial equivalent method of the CG method
        4.11.2 Squaring the CG algorithm: CGS Algorithm
   4.12 Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
   4.13 A Transpose-Free Quasi-minimal Residual Algorithm for Nonsymmetric Linear Systems
        4.13.1 Quasi-Minimal Residual Approach
        4.13.2 Derivation of actual implementation of TFQMR
        4.13.3 TFQMR Algorithm
   4.14 GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems
        4.14.1 FOM algorithm: Full orthogonalization method
        4.14.2 The generalized minimal residual (GMRES) algorithm
        4.14.3 Practical Implementation: Consider QR factorization of H_k
        4.14.4 Theoretical Aspect of GMRES

II  On the Numerical Solutions of Eigenvalue Problems

5  The Unsymmetric Eigenvalue Problem
   5.1  Orthogonal Projections and C-S Decomposition
   5.2  Perturbation Theory
   5.3  Power Iterations
        5.3.1  Power Method
        5.3.2  Inverse Power Iteration
        5.3.3  Connection with Newton-method
        5.3.4  Orthogonal Iteration
   5.4  QR-algorithm (QR-method, QR-iteration)
        5.4.1  The Practical QR Algorithm
        5.4.2  Single-shift QR-iteration
        5.4.3  Double Shift QR iteration
        5.4.4  Ordering Eigenvalues in the Real Schur Form
   5.5  LR, LRC and QR algorithms for positive definite matrices
   5.6  qd-algorithm (Quotient Difference)
        5.6.1  The qd-algorithm for positive definite matrices

6  The Symmetric Eigenvalue Problem
   6.1  Properties, Decomposition, Perturbation Theory
   6.2  Tridiagonalization and the Symmetric QR-algorithm
   6.3  Once Again: The Singular Value Decomposition
   6.4  Jacobi Methods
   6.5  Some Special Methods
        6.5.1  Bisection method for tridiagonal symmetric matrices
        6.5.2  Rayleigh Quotient Iteration
        6.5.3  Orthogonal Iteration with Ritz Acceleration
   6.6  Generalized Definite Eigenvalue Problem Ax = λBx
        6.6.1  Generalized definite eigenvalue problem

7  Lanczos Methods
   7.1  The Lanczos Algorithm
   7.2  Applications to linear Systems and Least Squares
        7.2.1  Symmetric Positive Definite System
        7.2.2  Bidiagonalization and the SVD
   7.3  Unsymmetric Lanczos Method

8  Arnoldi Method
   8.1  Arnoldi decompositions
   8.2  Krylov decompositions
        8.2.1  Reduction to Arnoldi form
   8.3  The implicitly restarted Arnoldi method
        8.3.1  Filter polynomials
        8.3.2  Implicitly restarted Arnoldi

9  Jacobi-Davidson method
   9.1  JOCC (Jacobi Orthogonal Component Correction)
   9.2  Davidson method
   9.3  Jacobi-Davidson method
        9.3.1  Jacobi-Davidson method as an accelerated Newton Scheme
        9.3.2  Jacobi-Davidson with harmonic Ritz values
   9.4  Jacobi-Davidson Type method for Generalized Eigenproblems

List of Tables

1.1  Some definitions for matrices
3.1  Solving the LS problem (m ≥ n)
4.1  Comparison results for Jacobi, Gauss-Seidel, SOR and ADI methods
4.2  Number of iterations and operations for Jacobi, Gauss-Seidel and SOR methods
4.3  Convergence rate of q_k

List of Figures

1.1  Relationship between backward and forward errors
4.1  Figure of ρ(L_ω_b)
4.2  Geometrical view of μ_i^(1)(ω) and μ_i^(2)(ω)
5.1  Orthogonal projection

Part I On the Numerical Solutions of Linear Systems

Chapter 1 Introduction
1.1  Mathematical auxiliary, definitions and relations

1.1.1  Vectors and matrices

A ∈ K^{m×n}, where K = R or C:

               [ a_11  ...  a_1n ]
A = [a_ij]  =  [  ...        ... ]
               [ a_m1  ...  a_mn ]

Product of matrices (K^{m×n} × K^{n×p} → K^{m×p}): C = AB, where c_ij = Σ_{k=1}^n a_ik b_kj, i = 1,...,m, j = 1,...,p.

Transpose (R^{m×n} → R^{n×m}): C = A^T, where c_ij = a_ji ∈ R.

Conjugate transpose (C^{m×n} → C^{n×m}): C = A^* or C = A^H, where c_ij = ā_ji ∈ C.

Differentiation (R^{m×n} → R^{m×n}): Let C(t) = (c_ij(t)). Then Ċ(t) = [ċ_ij(t)].

If A, B ∈ K^{n×n} satisfy AB = I, then B is the inverse of A and is denoted by A^{-1}. If A^{-1} exists, then A is said to be nonsingular; otherwise, A is singular. A is nonsingular if and only if det(A) ≠ 0.

If A ∈ K^{m×n}, x ∈ K^n and y = Ax, then y_i = Σ_{j=1}^n a_ij x_j, i = 1,...,m.

Outer product of x ∈ K^m and y ∈ K^n:

          [ x_1 ȳ_1  ...  x_1 ȳ_n ]
  x y^* = [   ...           ...   ]  ∈ K^{m×n}.
          [ x_m ȳ_1  ...  x_m ȳ_n ]

Inner product of x and y ∈ K^n:

  (x, y) := x^T y = Σ_{i=1}^n x_i y_i = y^T x ∈ R,
  (x, y) := x^* y = Σ_{i=1}^n x̄_i y_i ∈ C.

Sherman-Morrison Formula: Let A ∈ R^{n×n} be nonsingular and u, v ∈ R^n. If v^T A^{-1} u ≠ -1, then

  (A + u v^T)^{-1} = A^{-1} - (1 + v^T A^{-1} u)^{-1} A^{-1} u v^T A^{-1}.     (1.1.1)

Sherman-Morrison-Woodbury Formula: Let A ∈ R^{n×n} be nonsingular and U, V ∈ R^{n×k}. If (I + V^T A^{-1} U) is invertible, then

  (A + U V^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}.

Proof of (1.1.1):

  (A + u v^T)[A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u)]
    = I + [u v^T A^{-1} (1 + v^T A^{-1} u) - u v^T A^{-1} - u v^T A^{-1} u v^T A^{-1}] / (1 + v^T A^{-1} u)
    = I + [u (v^T A^{-1} u) v^T A^{-1} - u v^T A^{-1} u v^T A^{-1}] / (1 + v^T A^{-1} u) = I.

Example 1.1.1 A 5×5 matrix A written as A = B + u v^T, a rank-one modification of B, so that A^{-1} is obtained from B^{-1} via (1.1.1).
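A quick numerical check of (1.1.1) — a minimal NumPy sketch; the particular test matrix and vectors below are arbitrary choices for illustration, not data from the text:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)

    Ainv = np.linalg.inv(A)
    beta = 1.0 + v @ Ainv @ u                          # must be nonzero for (1.1.1)
    # Sherman-Morrison: (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
    sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / beta
    direct = np.linalg.inv(A + np.outer(u, v))
    print(np.allclose(sm, direct))                     # True up to roundoff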

1.1.2  Rank and orthogonality

Let A ∈ R^{m×n}. Then

  R(A) = {y ∈ R^m | y = Ax for some x ∈ R^n} ⊆ R^m is the range space of A.
  N(A) = {x ∈ R^n | Ax = 0} ⊆ R^n is the null space of A.
  rank(A) = dim[R(A)] = the number of maximal linearly independent columns of A.
  rank(A) = rank(A^T).
  dim(N(A)) + rank(A) = n.
  If m = n, then A is nonsingular ⟺ N(A) = {0} ⟺ rank(A) = n.

Let {x_1, ..., x_p} ⊆ R^n. Then {x_1, ..., x_p} is said to be orthogonal if x_i^T x_j = 0 for i ≠ j, and orthonormal if x_i^T x_j = δ_ij, where δ_ij = 0 if i ≠ j and δ_ij = 1 if i = j.

  S^⊥ = {y ∈ R^m | y^T x = 0, for x ∈ S} = orthogonal complement of S.
  R^n = R(A^T) ⊕ N(A),  R^m = R(A) ⊕ N(A^T).
  R(A^T)^⊥ = N(A),  R(A)^⊥ = N(A^T).

  A ∈ R^{n×n}                                  A ∈ C^{n×n}
  symmetric: A^T = A                           Hermitian: A^* = A (A^H = A)
  skew-symmetric: A^T = -A                     skew-Hermitian: A^* = -A
  positive definite: x^T A x > 0, x ≠ 0        positive definite: x^* A x > 0, x ≠ 0
  non-negative definite: x^T A x ≥ 0           non-negative definite: x^* A x ≥ 0
  indefinite: (x^T A x)(y^T A y) < 0           indefinite: (x^* A x)(y^* A y) < 0
     for some x, y                                for some x, y
  orthogonal: A^T A = I_n                      unitary: A^* A = I_n
  normal: A^T A = A A^T                        normal: A^* A = A A^*
  positive: a_ij > 0
  non-negative: a_ij ≥ 0

            Table 1.1: Some definitions for matrices.

1.1.3  Special matrices

Let A ∈ K^{n×n}. Then the matrix A is
  diagonal if a_ij = 0 for i ≠ j. Denote D = diag(d_1, ..., d_n) ∈ D_n, the set of diagonal matrices;
  tridiagonal if a_ij = 0 for |i - j| > 1;
  upper bi-diagonal if a_ij = 0 for i > j or j > i + 1;
  (strictly) upper triangular if a_ij = 0 for i > j (i ≥ j);
  upper Hessenberg if a_ij = 0 for i > j + 1.
(Note: the lower-triangular cases are defined analogously.)

Sparse matrix: a matrix with about n^{1+r} nonzero entries, where r < 1 (usually between 0.2 and 0.5). If n = 1000 and r = 0.9, then n^{1+r} ≈ 501187.

Example 1.1.2 If S is skew-symmetric, then I - S is nonsingular and (I - S)^{-1}(I + S) is orthogonal (the Cayley transformation of S).
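A small NumPy check of Example 1.1.2 — a sketch with a randomly generated skew-symmetric S (the specific S is an arbitrary choice for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4))
    S = B - B.T                                          # skew-symmetric: S^T = -S
    Q = np.linalg.solve(np.eye(4) - S, np.eye(4) + S)    # (I - S)^{-1} (I + S)
    print(np.allclose(Q.T @ Q, np.eye(4)))               # True: Q is orthogonal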

1.1.4  Eigenvalues and Eigenvectors

Definition 1.1.1 Let A ∈ C^{n×n}. Then λ ∈ C is called an eigenvalue of A if there exists x ≠ 0, x ∈ C^n, with Ax = λx, and x is called an eigenvector corresponding to λ.

Notations:
  σ(A) := spectrum of A = the set of eigenvalues of A.
  ρ(A) := spectral radius of A = max{|λ| : λ ∈ σ(A)}.
  λ ∈ σ(A) ⟺ det(A - λI) = 0.
  p(λ) = det(λI - A) = the characteristic polynomial of A.
  p(λ) = Π_{i=1}^s (λ - λ_i)^{m(λ_i)}, where λ_i ≠ λ_j (for i ≠ j) and Σ_{i=1}^s m(λ_i) = n.
  m(λ_i) = the algebraic multiplicity of λ_i.
  n(λ_i) = n - rank(A - λ_i I) = the geometric multiplicity of λ_i.
  1 ≤ n(λ_i) ≤ m(λ_i).

Definition 1.1.2 If there is some i such that n(λ_i) < m(λ_i), then A is called degenerated. The following statements are equivalent:
  (1) there are n linearly independent eigenvectors;
  (2) A is diagonalizable, i.e., there is a nonsingular matrix T such that T^{-1} A T ∈ D_n;
  (3) for each λ ∈ σ(A), it holds that m(λ) = n(λ).
If A is degenerated, then eigenvectors and principal vectors derive the Jordan form of A. (See Gantmacher: Matrix Theory I, II.)

Theorem 1.1.1 (Schur)
  (1) Let A ∈ C^{n×n}. There is a unitary matrix U such that U^* A U (= U^{-1} A U) is upper triangular.
  (2) Let A ∈ R^{n×n}. There is an orthogonal matrix Q such that Q^T A Q (= Q^{-1} A Q) is quasi-upper triangular, i.e., an upper triangular matrix possibly with nonzero subdiagonal elements in non-consecutive positions.
  (3) A is normal if and only if there is a unitary U such that U^* A U = D diagonal.
  (4) A is Hermitian if and only if A is normal and σ(A) ⊆ R.
  (5) A is symmetric if and only if there is an orthogonal U such that U^T A U = D diagonal and σ(A) ⊆ R.

1.2  Norms and eigenvalues

Let X be a vector space over K = R or C.

Definition 1.2.1 (Vector norms) Let N be a real-valued function defined on X (N: X → R_+). Then N is a (vector) norm if
  N1: N(αx) = |α| N(x), α ∈ K, for x ∈ X;
  N2: N(x + y) ≤ N(x) + N(y), for x, y ∈ X;
  N3: N(x) = 0 if and only if x = 0.
The usual notation is ‖x‖ = N(x).

Example 1.2.1 Let X = C^n, p ≥ 1. Then ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p} is a p-norm. Especially,

  ‖x‖_1 = Σ_{i=1}^n |x_i|                 (1-norm),
  ‖x‖_2 = (Σ_{i=1}^n |x_i|^2)^{1/2}       (2-norm = Euclidean norm),
  ‖x‖_∞ = max_{1≤i≤n} |x_i|               (∞-norm = maximum norm).

Lemma 1.2.1 N(x) is a continuous function in the components x_1, ..., x_n of x.

Proof:

  |N(x) - N(y)| ≤ N(x - y) ≤ Σ_{j=1}^n |x_j - y_j| N(e_j) ≤ ‖x - y‖_∞ Σ_{j=1}^n N(e_j).

Theorem 1.2.1 (Equivalence of norms) Let N and M be two norms on C^n. Then there are constants c_1, c_2 > 0 such that

  c_1 M(x) ≤ N(x) ≤ c_2 M(x),  for all x ∈ C^n.

Proof: Without loss of generality (W.L.O.G.) we can assume that M(x) = ‖x‖_∞ and N is arbitrary. We claim that

  c_1 ‖x‖_∞ ≤ N(x) ≤ c_2 ‖x‖_∞,

equivalently, c_1 ≤ N(z) ≤ c_2 for all z ∈ S = {z ∈ C^n | ‖z‖_∞ = 1}. From Lemma 1.2.1, N is continuous on S (closed and bounded). By the maximum and minimum principle, there are c_1, c_2 ≥ 0 and z_1, z_2 ∈ S such that c_1 = N(z_1) ≤ N(z) ≤ N(z_2) = c_2. If c_1 = 0, then N(z_1) = 0, and thus z_1 = 0. This contradicts that z_1 ∈ S.

Remark 1.2.1 Theorem 1.2.1 does not hold in infinite dimensional spaces.

Definition 1.2.2 (Matrix norms) Let A ∈ C^{m×n}. A real-valued function ‖·‖: C^{m×n} → R_+ satisfying
  N1: ‖αA‖ = |α| ‖A‖;
  N2: ‖A + B‖ ≤ ‖A‖ + ‖B‖;
  N3: ‖A‖ = 0 if and only if A = 0;
  N4: ‖AB‖ ≤ ‖A‖ ‖B‖;
  N5: ‖Ax‖_v ≤ ‖A‖ ‖x‖_v  (matrix and vector norms are compatible for some ‖·‖_v)
is called a matrix norm. If ‖·‖ satisfies N1 to N4, then it is called a multiplicative or algebra norm.

Example 1.2.2 (Frobenius norm) Let ‖A‖_F = [Σ_{i,j=1}^n |a_ij|^2]^{1/2}. Then

  ‖AB‖_F = (Σ_{i,j} |Σ_k a_ik b_kj|^2)^{1/2}
         ≤ (Σ_{i,j} {Σ_k |a_ik|^2}{Σ_k |b_kj|^2})^{1/2}     (Cauchy-Schwarz inequality)
         = (Σ_i Σ_k |a_ik|^2)^{1/2} (Σ_j Σ_k |b_kj|^2)^{1/2}
         = ‖A‖_F ‖B‖_F.     (1.2.1)

This implies that N4 holds. Furthermore, by the Cauchy-Schwarz inequality we have

  ‖Ax‖_2 = (Σ_i |Σ_j a_ij x_j|^2)^{1/2} ≤ (Σ_i (Σ_j |a_ij|^2)(Σ_j |x_j|^2))^{1/2} = ‖A‖_F ‖x‖_2.     (1.2.2)

This implies that N5 holds. Also, N1, N2 and N3 hold obviously. (Here, ‖I‖_F = √n.)

Example 1.2.3 (Operator norm) Given a vector norm ‖·‖. An associated (induced) matrix norm is defined by

  ‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ = max_{x≠0} ‖Ax‖/‖x‖.     (1.2.3)

Then N5 holds immediately. On the other hand,

  ‖(AB)x‖ = ‖A(Bx)‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖ ‖x‖     (1.2.4)

for all x ≠ 0. This implies that

  ‖AB‖ ≤ ‖A‖ ‖B‖,     (1.2.5)

so N4 holds. (Here ‖I‖ = 1.)

In the following, we present and verify three useful matrix norms:

  ‖A‖_1 = sup_{x≠0} ‖Ax‖_1/‖x‖_1 = max_{1≤j≤n} Σ_{i=1}^m |a_ij|,     (1.2.6)
  ‖A‖_∞ = sup_{x≠0} ‖Ax‖_∞/‖x‖_∞ = max_{1≤i≤m} Σ_{j=1}^n |a_ij|,     (1.2.7)
  ‖A‖_2 = sup_{x≠0} ‖Ax‖_2/‖x‖_2 = √(ρ(A^* A)).                      (1.2.8)

Proof of (1.2.6):

  ‖Ax‖_1 = Σ_i |Σ_j a_ij x_j| ≤ Σ_i Σ_j |a_ij| |x_j| = Σ_j |x_j| Σ_i |a_ij|.

Let C_1 := Σ_i |a_ik| = max_j Σ_i |a_ij|. Then ‖Ax‖_1 ≤ C_1 ‖x‖_1, thus ‖A‖_1 ≤ C_1. On the other hand, ‖e_k‖_1 = 1 and ‖Ae_k‖_1 = Σ_{i=1}^m |a_ik| = C_1.

Proof of (1.2.7):

  ‖Ax‖_∞ = max_i |Σ_j a_ij x_j| ≤ max_i Σ_j |a_ij x_j| ≤ (max_i Σ_j |a_ij|) ‖x‖_∞ = (Σ_j |a_kj|) ‖x‖_∞ =: C_∞ ‖x‖_∞.

This implies that ‖A‖_∞ ≤ C_∞. If A = 0, there is nothing to prove. Assume that A ≠ 0 and the k-th row of A is nonzero. Define z = [z_j] ∈ C^n by

  z_j = ā_kj / |a_kj|  if a_kj ≠ 0,    z_j = 1  if a_kj = 0.

Then ‖z‖_∞ = 1 and a_kj z_j = |a_kj| for j = 1, ..., n. It follows that

  ‖A‖_∞ ≥ ‖Az‖_∞ = max_i |Σ_j a_ij z_j| ≥ |Σ_j a_kj z_j| = Σ_{j=1}^n |a_kj| = C_∞.

Thus ‖A‖_∞ = max_{1≤i≤m} Σ_{j=1}^n |a_ij|.

Proof of (1.2.8): Let λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0 be the eigenvalues of A^* A. There are mutually orthonormal vectors v_j, j = 1, ..., n, such that (A^* A)v_j = λ_j v_j. Let x = Σ_j α_j v_j. Since ‖Ax‖_2^2 = (Ax, Ax) = (x, A^* Ax),

  ‖Ax‖_2^2 = (Σ_j α_j v_j, Σ_j λ_j α_j v_j) = Σ_j λ_j |α_j|^2 ≤ λ_1 ‖x‖_2^2.

Therefore ‖A‖_2^2 ≤ λ_1. Equality follows by choosing x = v_1, for which ‖Av_1‖_2^2 = (v_1, λ_1 v_1) = λ_1. So we have ‖A‖_2 = √(ρ(A^* A)).
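A quick NumPy sanity check of the formulas (1.2.6)-(1.2.8) — a sketch on an arbitrary random matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 4))

    norm1   = np.abs(A).sum(axis=0).max()                     # max column sum, (1.2.6)
    norminf = np.abs(A).sum(axis=1).max()                     # max row sum, (1.2.7)
    norm2   = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))    # sqrt of rho(A^T A), (1.2.8)

    print(np.isclose(norm1,   np.linalg.norm(A, 1)))
    print(np.isclose(norminf, np.linalg.norm(A, np.inf)))
    print(np.isclose(norm2,   np.linalg.norm(A, 2)))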
Example 1.2.4 (Dual norm) Let 1/p + 1/q = 1. Then ‖·‖_q is the dual norm of ‖·‖_p (p = 1 ⟺ q = ∞). It follows from an application of the Hölder inequality that |y^* x| ≤ ‖x‖_p ‖y‖_q.

Theorem 1.2.2 Let A ∈ C^{n×n}. Then for any operator norm ‖·‖ it holds that

  ρ(A) ≤ ‖A‖.

Moreover, for any ε > 0, there exists an operator norm ‖·‖_ε such that ‖A‖_ε ≤ ρ(A) + ε.

Proof: Let |λ| = ρ(A) and x be an associated eigenvector with ‖x‖ = 1. Then

  ρ(A) = |λ| = ‖λx‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖ = ‖A‖.

On the other hand, there is a unitary matrix U such that A = U^* R U, where R is upper triangular. Let D_t = diag(t, t^2, ..., t^n). Then D_t R D_t^{-1} is upper triangular with the same diagonal as R and with off-diagonal entries t^{-(j-i)} r_ij (i < j). For t > 0 sufficiently large, the sum of all absolute values of the off-diagonal elements of D_t R D_t^{-1} is less than ε, so ‖D_t R D_t^{-1}‖_1 ≤ ρ(A) + ε for sufficiently large t(ε) > 0. Define for any B

  ‖B‖_ε := ‖D_t U B U^* D_t^{-1}‖_1 = ‖(U^* D_t^{-1})^{-1} B (U^* D_t^{-1})‖_1.

This is an operator norm and ‖A‖_ε = ‖D_t R D_t^{-1}‖_1 ≤ ρ(A) + ε.

Remark 1.2.2 For unitary U and V,

  ‖U A V‖_F = ‖A‖_F    (since ‖U A‖_F^2 = ‖U a_1‖_2^2 + ... + ‖U a_n‖_2^2),     (1.2.9)
  ‖U A V‖_2 = ‖A‖_2    (since ρ(A^* A) = ρ(A A^*)).                             (1.2.10)

Theorem 1.2.3 (Singular Value Decomposition (SVD)) Let A ∈ C^{m×n}. Then there exist unitary matrices U = [u_1, ..., u_m] ∈ C^{m×m} and V = [v_1, ..., v_n] ∈ C^{n×n} such that

  U^* A V = diag(σ_1, ..., σ_p) = Σ,

where p = min{m, n} and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. (Here, σ_i denotes the i-th largest singular value of A.)

Proof: There are x ∈ C^n, y ∈ C^m with ‖x‖_2 = ‖y‖_2 = 1 such that Ax = σy, where σ = ‖A‖_2 (since ‖A‖_2 = sup_{‖x‖_2=1} ‖Ax‖_2). Let V = [x, V_1] ∈ C^{n×n} and U = [y, U_1] ∈ C^{m×m} be unitary. Then

  A_1 := U^* A V = [ σ  w^* ]
                   [ 0  B   ].

Since

  ‖A_1 [σ; w]‖_2^2 ≥ (σ^2 + w^* w)^2,

it follows that ‖A_1‖_2^2 ≥ σ^2 + w^* w. But σ^2 = ‖A‖_2^2 = ‖A_1‖_2^2, so w = 0. Hence the theorem holds by induction.

Remark 1.2.3 ‖A‖_2 = √(ρ(A^* A)) = σ_1 = the maximal singular value of A.
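A short NumPy illustration of Theorem 1.2.3 and Remark 1.2.3 — a sketch on an arbitrary matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3))

    U, s, Vh = np.linalg.svd(A)                      # A = U diag(s) V^*
    Sigma = np.zeros((5, 3))
    np.fill_diagonal(Sigma, s)
    print(np.allclose(U @ Sigma @ Vh, A))            # reconstruction of A
    print(np.isclose(s[0], np.linalg.norm(A, 2)))    # sigma_1 = ||A||_2 (Remark 1.2.3)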

Let A = U Σ V^*. Then we have

  ‖ABC‖_F = ‖U Σ V^* B C‖_F = ‖Σ V^* B C‖_F ≤ σ_1 ‖B C‖_F = ‖A‖_2 ‖B C‖_F.

This implies

  ‖ABC‖_F ≤ ‖A‖_2 ‖B‖_F ‖C‖_2.     (1.2.11)

In addition, by (1.2.2) and (1.2.11), we get

  ‖A‖_2 ≤ ‖A‖_F ≤ √n ‖A‖_2.     (1.2.12)

Theorem 1.2.4 Let A ∈ C^{n×n}. The following statements are equivalent:
  (1) lim_{m→∞} A^m = 0;  (2) lim_{m→∞} A^m x = 0 for all x;  (3) ρ(A) < 1.

Proof: (1) ⟹ (2): trivial. (2) ⟹ (3): Let λ ∈ σ(A), i.e., Ax = λx with x ≠ 0. This implies A^m x = λ^m x → 0 as m → ∞. Thus |λ| < 1, i.e., ρ(A) < 1. (3) ⟹ (1): There is a norm ‖·‖ with ‖A‖ < 1 (by Theorem 1.2.2). Therefore ‖A^m‖ ≤ ‖A‖^m → 0, i.e., A^m → 0.

Theorem 1.2.5 It holds that

  ρ(A) = lim_{k→∞} ‖A^k‖^{1/k},

where ‖·‖ is an operator norm.

Proof: Since ρ(A)^k = ρ(A^k) ≤ ‖A^k‖, we have ρ(A) ≤ ‖A^k‖^{1/k} for k = 1, 2, .... If ε > 0, then Ã = [ρ(A) + ε]^{-1} A has spectral radius < 1 and by Theorem 1.2.4 we have ‖Ã^k‖ → 0 as k → ∞. There is an N = N(ε, A) such that ‖Ã^k‖ < 1 for all k ≥ N. Thus ‖A^k‖ ≤ [ρ(A) + ε]^k for all k ≥ N, or ‖A^k‖^{1/k} ≤ ρ(A) + ε for all k ≥ N. Since ρ(A) ≤ ‖A^k‖^{1/k} and k, ε are arbitrary, lim_{k→∞} ‖A^k‖^{1/k} exists and equals ρ(A).

Theorem 1.2.6 Let A ∈ C^{n×n} and ρ(A) < 1. Then (I - A)^{-1} exists and

  (I - A)^{-1} = I + A + A^2 + ... .

Proof: Since ρ(A) < 1, the eigenvalues of (I - A) are nonzero. Therefore, by Theorem 1.2.4, (I - A)^{-1} exists and

  (I - A)(I + A + A^2 + ... + A^m) = I - A^{m+1} → I.
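A small numerical illustration of Theorem 1.2.5 — a sketch; the matrix is arbitrary and the exponent k = 50 is just a convenient truncation of the limit:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    rho = np.max(np.abs(np.linalg.eigvals(A)))                        # spectral radius
    k = 50
    approx = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
    print(rho, approx)     # ||A^k||^{1/k} approaches rho(A) as k grows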

Corollary 1.2.1 If ‖A‖ < 1, then (I - A)^{-1} exists and

  ‖(I - A)^{-1}‖ ≤ 1 / (1 - ‖A‖).

Proof: Since ρ(A) ≤ ‖A‖ < 1 (by Theorem 1.2.2),

  ‖(I - A)^{-1}‖ = ‖Σ_{i=0}^∞ A^i‖ ≤ Σ_{i=0}^∞ ‖A‖^i = (1 - ‖A‖)^{-1}.

Theorem 1.2.7 (Without proof) For A ∈ K^{n×n} the following statements are equivalent:
  (1) There is a multiplicative norm p with p(A^k) ≤ 1, k = 1, 2, ....
  (2) For each multiplicative norm p the powers p(A^k) are uniformly bounded, i.e., there exists an M(p) < ∞ such that p(A^k) ≤ M(p), k = 0, 1, 2, ....
  (3) ρ(A) ≤ 1 and every eigenvalue λ with |λ| = 1 is not degenerated (i.e., m(λ) = n(λ)).
(See Householder's book: The Theory of Matrices in Numerical Analysis, pp. 45-47.)

In the following we prove some important inequalities of vector norms and matrix norms.

(a) It holds that

  1 ≤ ‖x‖_p / ‖x‖_q ≤ n^{(q-p)/(pq)},  (p ≤ q).     (1.2.13)

Proof: Claim ‖x‖_q ≤ ‖x‖_p (p ≤ q): we have

  ‖x‖_q = ‖x‖_p ‖x/‖x‖_p‖_q ≤ C_{p,q} ‖x‖_p,  where  C_{p,q} = max_{‖e‖_p=1} ‖e‖_q,  e = (e_1, ..., e_n)^T.

We now show that C_{p,q} ≤ 1. From p ≤ q and |e_i| ≤ 1 we have

  ‖e‖_q^q = Σ_{i=1}^n |e_i|^q ≤ Σ_{i=1}^n |e_i|^p = 1.

Hence C_{p,q} ≤ 1, thus ‖x‖_q ≤ ‖x‖_p. To prove the second inequality, let γ = q/p > 1. The Jensen inequality holds for a convex function φ:

  φ(∫ f dμ) ≤ ∫ (φ ∘ f) dμ,  μ(Ω) = 1.

If we take φ(x) = x^γ, then

  (∫_Ω |f|^p dμ)^{q/p} ≤ ∫_Ω |f|^q dμ.

Consider the discrete measure with Σ_{i=1}^n 1/n = 1 and f(i) = |x_i|. It follows that

  ((1/n) Σ_{i=1}^n |x_i|^p)^{q/p} ≤ (1/n) Σ_{i=1}^n |x_i|^q.

Hence n^{-1/p} ‖x‖_p ≤ n^{-1/q} ‖x‖_q, i.e., ‖x‖_p ≤ n^{(q-p)/(pq)} ‖x‖_q.

(b) It holds that

  1 ≤ ‖x‖_p / ‖x‖_∞ ≤ n^{1/p}.     (1.2.14)

Proof: Let q → ∞ and show lim_{q→∞} ‖x‖_q = ‖x‖_∞:

  ‖x‖_∞ = |x_k| = (|x_k|^q)^{1/q} ≤ (Σ_{i=1}^n |x_i|^q)^{1/q} = ‖x‖_q.

On the other hand,

  ‖x‖_q = (Σ_{i=1}^n |x_i|^q)^{1/q} ≤ (n ‖x‖_∞^q)^{1/q} = n^{1/q} ‖x‖_∞,

which implies that lim_{q→∞} ‖x‖_q = ‖x‖_∞. The claim then follows from (1.2.13).

(c) It holds that

  max_{1≤j≤n} ‖a_j‖_p ≤ ‖A‖_p ≤ n^{(p-1)/p} max_{1≤j≤n} ‖a_j‖_p,     (1.2.15)

where A = [a_1, ..., a_n] ∈ R^{m×n}.

Proof: The first inequality holds obviously. To show the second inequality, for ‖y‖_p = 1 we have

  ‖Ay‖_p ≤ Σ_{j=1}^n |y_j| ‖a_j‖_p ≤ ‖y‖_1 max_j ‖a_j‖_p ≤ n^{(p-1)/p} max_j ‖a_j‖_p     (by (1.2.13)).

(d) It holds that

  max_{i,j} |a_ij| ≤ ‖A‖_p ≤ n^{(p-1)/p} m^{1/p} max_{i,j} |a_ij|,     (1.2.16)

where A ∈ R^{m×n}. Proof: By (1.2.14) and (1.2.15) immediately.

(e) It holds that

  m^{(1-p)/p} ‖A‖_1 ≤ ‖A‖_p ≤ n^{(p-1)/p} ‖A‖_1.     (1.2.17)

Proof: By (1.2.15) and (1.2.13) immediately.

(f) By the Hölder inequality we have (see the Appendix below)

  |y^* x| ≤ ‖x‖_p ‖y‖_q,  1/p + 1/q = 1,  or  max{|x^* y| : ‖y‖_q = 1} = ‖x‖_p.     (1.2.18)

Then it holds that

  ‖A‖_p = ‖A^T‖_q.     (1.2.19)

Proof: By (1.2.18) we have

  ‖A‖_p = max_{‖x‖_p=1} ‖Ax‖_p = max_{‖x‖_p=1} max_{‖y‖_q=1} |(Ax)^T y| = max_{‖y‖_q=1} max_{‖x‖_p=1} |x^T (A^T y)| = max_{‖y‖_q=1} ‖A^T y‖_q = ‖A^T‖_q.

(g) It holds that

  n^{-1/p} ‖A‖_∞ ≤ ‖A‖_p ≤ m^{1/p} ‖A‖_∞.     (1.2.20)

Proof: By (1.2.17) applied to A^T ∈ R^{n×m} (with exponent q) and (1.2.19), we get

  ‖A‖_p = ‖A^T‖_q ≤ m^{(q-1)/q} ‖A^T‖_1 = m^{1/p} ‖A‖_∞,
  ‖A‖_p = ‖A^T‖_q ≥ n^{(1-q)/q} ‖A^T‖_1 = n^{-1/p} ‖A‖_∞     (1/p + 1/q = 1).

(h) It holds that

  ‖A‖_2^2 ≤ ‖A‖_p ‖A‖_q.     (1.2.21)

Proof: By (1.2.19) we have

  ‖A‖_2^2 = ρ(A^T A) = ‖A^T A‖_2 ≤ ‖A^T A‖_p ≤ ‖A^T‖_p ‖A‖_p = ‖A‖_q ‖A‖_p.

The middle inequality holds by the following statement: let S be a symmetric matrix; then ‖S‖_2 ≤ ‖S‖ for any matrix operator norm ‖·‖. Indeed, since |λ| ≤ ‖S‖ for all λ ∈ σ(S),

  ‖S‖_2^2 = ρ(S^T S) = ρ(S^2) = (max_{λ∈σ(S)} |λ|)^2 = |λ_max|^2 ≤ ‖S‖^2,

which implies ‖S‖_2 ≤ ‖S‖.

(i) For A ∈ R^{m×n} and q ≥ p ≥ 1, it holds that

  n^{(p-q)/(pq)} ‖A‖_q ≤ ‖A‖_p ≤ m^{(q-p)/(pq)} ‖A‖_q.     (1.2.22)

Proof: By (1.2.13), we get

  ‖A‖_p = max_{‖x‖_p=1} ‖Ax‖_p ≤ max_{‖x‖_q≤1} m^{(q-p)/(pq)} ‖Ax‖_q = m^{(q-p)/(pq)} ‖A‖_q,

and the lower bound follows analogously.

Appendix: proof of the Hölder inequality and (1.2.18).

Taking φ(x) = e^x in Jensen's inequality we have

  exp(∫ f dμ) ≤ ∫ e^f dμ.

Let Ω be the finite set {p_1, ..., p_n}, μ({p_i}) = 1/n, f(p_i) = x_i. Then

  exp((x_1 + ... + x_n)/n) ≤ (e^{x_1} + ... + e^{x_n})/n.

Taking y_i = e^{x_i}, we have

  (y_1 ... y_n)^{1/n} ≤ (y_1 + ... + y_n)/n.

Taking μ({p_i}) = q_i > 0 with Σ_{i=1}^n q_i = 1, we have

  y_1^{q_1} ... y_n^{q_n} ≤ q_1 y_1 + ... + q_n y_n.     (1.2.23)

Let ξ_i = x_i / ‖x‖_p and η_i = y_i / ‖y‖_q, where x = [x_1, ..., x_n]^T and y = [y_1, ..., y_n]^T. By (1.2.23) we have

  ξ_i η_i ≤ ξ_i^p / p + η_i^q / q.

Since ‖ξ‖_p = 1 and ‖η‖_q = 1, it holds that

  Σ_{i=1}^n ξ_i η_i ≤ 1/p + 1/q = 1.

Thus |x^T y| ≤ ‖x‖_p ‖y‖_q.

To show max{|x^T y| : ‖x‖_p = 1} = ‖y‖_q, take x_i = y_i^{q-1} / ‖y‖_q^{q/p}. Since (q - 1)p = q,

  ‖x‖_p^p = Σ_{i=1}^n |y_i|^{(q-1)p} / ‖y‖_q^q = 1,

and

  Σ_{i=1}^n x_i y_i = Σ_{i=1}^n |y_i|^q / ‖y‖_q^{q/p} = ‖y‖_q^q / ‖y‖_q^{q/p} = ‖y‖_q.

The following two properties are useful in later sections.
(i)  There exists a z with ‖z‖_p = 1 such that z^T y = ‖y‖_q. Let ẑ = z / ‖y‖_q. Then ẑ^T y = 1 and ‖ẑ‖_p = 1 / ‖y‖_q.
(ii) From the duality, ‖y‖ = max_{‖u‖_D=1} |y^T u| = y^T z for some z with ‖z‖_D = 1. Let ẑ = z / ‖y‖. Then ẑ^T y = 1 and ‖ẑ‖_D = 1 / ‖y‖.

1.3  The Sensitivity of Linear System Ax = b

1.3.1  Backward error and Forward error

Let x = F(a). We define backward and forward errors as in Figure 1.1: a perturbation Δa of the input is a backward error, and the resulting perturbation Δx of the output is the forward error. In Figure 1.1, x + Δx = F(a + Δa) is called a mixed forward-backward error, where |Δx| ≤ ε|x| and |Δa| ≤ η|a|.

Definition 1.3.1
  (i) An algorithm is backward stable if, for all a, it produces a computed x̂ with a small backward error, i.e., x̂ = F(a + Δa) with Δa small.
  (ii) An algorithm is numerically stable if it is stable in the mixed forward-backward error sense, i.e., x̂ + Δx = F(a + Δa) with both Δa and Δx small.
  (iii) A method which produces answers with forward errors of similar magnitude to those produced by a backward stable method is called forward stable.

[Figure 1.1: Relationship between backward and forward errors.]

Remark 1.3.1
  (i) Backward stable ⟹ forward stable, but not vice versa!
  (ii) Forward error ≲ condition number × backward error. Consider

    x̂ - x = F(a + Δa) - F(a) = F'(a)Δa + F''(a + θΔa)(Δa)^2 / 2,  θ ∈ (0, 1).

  Then we have

    (x̂ - x)/x = (a F'(a)/F(a)) (Δa/a) + O((Δa)^2).

The quantity C(a) = |a F'(a)/F(a)| is called the condition number of F. If x or F is a vector, then the condition number is defined in a similar way using norms, and it measures the maximum relative change, which is attained for some, but not all, Δa.

A priori error estimate!  A posteriori error estimate!
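A tiny numerical illustration of the scalar condition number C(a) = |a F'(a)/F(a)| — a sketch; the choice F(a) = √a (with C(a) = 1/2) is my own example, not one from the text:

    import numpy as np

    F = np.sqrt
    a, da = 2.0, 1e-8                              # input and a small backward error
    C = abs(a * (0.5 / np.sqrt(a)) / F(a))         # condition number, = 1/2 for sqrt

    forward_rel = abs(F(a + da) - F(a)) / abs(F(a))
    backward_rel = abs(da) / abs(a)
    print(C, forward_rel / backward_rel)           # forward error ~ C * backward error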

1.3.2  An SVD Analysis

Let A = Σ_{i=1}^n σ_i u_i v_i^T = U Σ V^T be a singular value decomposition (SVD) of A. Then

  x = A^{-1} b = (U Σ V^T)^{-1} b = Σ_{i=1}^n (u_i^T b / σ_i) v_i.

Let cos(θ) = |u_n^T b| / ‖b‖_2 and 0 < ε < σ_n, and let y solve the perturbed system

  (A - ε u_n v_n^T) y = b + ε (u_n^T b) u_n.

Then ‖y - x‖_2 ≥ (ε / (σ_n - ε)) ‖x‖_2 cos(θ). Indeed, let E = diag{0, ..., 0, ε}. Then (Σ - E) V^T y = U^T b + ε (u_n^T b) e_n. Therefore,

  y - x = V (Σ - E)^{-1} [U^T b + ε (u_n^T b) e_n] - V Σ^{-1} U^T b
        = V ((Σ - E)^{-1} - Σ^{-1}) U^T b + ε (u_n^T b)(σ_n - ε)^{-1} v_n
        = V diag(0, ..., 0, ε / (σ_n (σ_n - ε))) U^T b + ε (u_n^T b)(σ_n - ε)^{-1} v_n
        = (ε / (σ_n (σ_n - ε))) (u_n^T b) v_n + ε (u_n^T b)(σ_n - ε)^{-1} v_n
        = (ε (1 + σ_n) / (σ_n (σ_n - ε))) (u_n^T b) v_n.

From the inequality ‖x‖_2 ≤ ‖A^{-1}‖_2 ‖b‖_2 = ‖b‖_2 / σ_n we have

  ‖y - x‖_2 / ‖x‖_2 ≥ (ε (1 + σ_n) / (σ_n - ε)) |u_n^T b| / ‖b‖_2 = (ε (1 + σ_n) / (σ_n - ε)) cos(θ).

Theorem 1.3.1 A is nonsingular and and (A + E )1 A1 E A1

A1 E = r < 1. Then A + E is nonsingular /(1 r).

Proof:: Since A is nonsingular, A+E = A(I F ), where F = A1 E . Since F = r < 1, 1 . Then it follows that I F is nonsingular (by Corollary 1.2.1) and (I F )1 < 1 r (A + E )1 = (I F )1 A1 = (A + E )1 and (A + E )1 A1 = A1 E (A + E )1 . It follows that (A + E )1 A1 A1 E (A + E )1 A1 2 E . 1r A1 1r

Lemma 1.3.1 Let Ax = b, (A + A)y = b + b, where A A and b b . If (A) = r < 1, then A +A is nonsingular y r and x 1+ , where (A) = A A1 . 1r

1.3 The Sensitivity of Linear System Ax = b 19 1 1 Proof: Since A A < A A = r < 1, it follows that A + A is nonsingular. 1 From the equality (I + A A)y = x + A1 b follows that y (I + A1 A)1 ( x + A1 1 ( x + A1 b ) 1r 1 b = ( x +r ). 1r A x follows the lemma. b )

From b = Ax A

1.3.3

Normwise Forward Error Bound


xy x

Theorem 1.3.2 If the assumption of Lemma 1.3.1 holds, then Proof:: Since y x = A1 b A1 Ay , we have y x A1 So by Lemma 1.3.1 it holds yx x (A) b b + A1 A y .

2 (A). 1r

A x 1+r 2 (A)(1 + )= (A). 1r 1r

+ (A)

y x

1.3.4

Componentwise Forward Error Bound

Theorem 1.3.3 Let Ax = b and (A + A)y = b + b, where | A | | A | and x | b | | b |. If (A) = r < 1, then (A + A) is nonsingular and y 12 | x r 1 1 A || A | . Here | A || A | is called a Skeel condition number. Proof:: Since A A and b b , the assumptions of Lemma 1.3.1 y 1+r are satised in -norm. So, A + A is nonsingular and x 1 . r 1 1 Since y x = A b A Ay , we have | y x | | A1 || b | + | A1 || A || y | | A1 || b | + | A1 || A || y | | A1 || A | (| x | + | y |). By taking -norm, we have yx

| A1 || A | =

( x
.

1+r x 1r

2 | A1 || A | 1r

20

Chapter 1. Introduction

1.3.5
Let

Derivation of Condition Number of Ax = b


(A + F )x() = b + f with x(0) = x.

Then we have x (0) = A1 (f F x) and x() = x + x (0) + o(2 ). Therefore, x() x x A1 { f + x F } + o(2 ).

Dene condition number (A) := A x() x x

A1 . Then we have (A)(A + b ) + o(2 ),

where A = F / A and b = f / b .

1.3.6  Normwise Backward Error

Theorem 1.3.4 Let y be the computed solution of Ax = b. Then the normwise backward error

  η(y) := min{ ε | (A + ΔA) y = b + Δb,  ‖ΔA‖ ≤ ε ‖A‖,  ‖Δb‖ ≤ ε ‖b‖ }

is given by

  η(y) = ‖r‖ / (‖A‖ ‖y‖ + ‖b‖),     (1.3.24)

where r = b - Ay is the residual.

Proof: The right hand side of (1.3.24) is a lower bound for η(y), since (A + ΔA)y = b + Δb implies r = ΔA y - Δb and hence ‖r‖ ≤ ε(‖A‖ ‖y‖ + ‖b‖). This bound is attained for the perturbation (by construction!)

  ΔA_min = (‖A‖ ‖y‖ / (‖A‖ ‖y‖ + ‖b‖)) r z^T,   Δb_min = -(‖b‖ / (‖A‖ ‖y‖ + ‖b‖)) r,

where z is the dual vector of y, i.e., z^T y = 1 and ‖z‖_D = 1/‖y‖.
Check: ΔA_min = η(y) ‖A‖ in norm; that is, we must show ‖r z^T‖ = ‖r‖ ‖z‖_D. Since

  ‖r z^T‖ = max_{‖u‖=1} ‖(r z^T) u‖ = ‖r‖ max_{‖u‖=1} |z^T u| = ‖r‖ ‖z‖_D = ‖r‖ / ‖y‖,

we are done. Similarly, ‖Δb_min‖ = η(y) ‖b‖.


1.3.7  Componentwise Backward Error

Theorem 1.3.5 The componentwise backward error

  ω(y) := min{ ε | (A + ΔA) y = b + Δb,  |ΔA| ≤ ε|A|,  |Δb| ≤ ε|b| }

is given by

  ω(y) = max_i |r|_i / (|A| |y| + |b|)_i,     (1.3.25)

where r = b - Ay. (Note: ξ/0 = 0 if ξ = 0; ξ/0 = ∞ if ξ ≠ 0.)

Proof: The right hand side of (1.3.25) is a lower bound for ω(y), and this bound is attained for the perturbations ΔA = D_1 |A| D_2 and Δb = -D_1 |b|, where D_1 = diag(r_i / (|A| |y| + |b|)_i) and D_2 = diag(sign(y_i)).

Remark 1.3.2 Theorems 1.3.4 and 1.3.5 are a posteriori error estimates.
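A short NumPy sketch that evaluates both backward-error formulas (1.3.24) and (1.3.25) for an approximate solution; the test data and the choice of the infinity norm are arbitrary:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 6
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    y = np.linalg.solve(A, b) + 1e-10 * rng.standard_normal(n)   # slightly perturbed solution

    r = b - A @ y
    # normwise backward error (1.3.24), infinity norm
    eta = np.linalg.norm(r, np.inf) / (np.linalg.norm(A, np.inf) * np.linalg.norm(y, np.inf)
                                       + np.linalg.norm(b, np.inf))
    # componentwise backward error (1.3.25)
    denom = np.abs(A) @ np.abs(y) + np.abs(b)
    omega = np.max(np.where(denom > 0, np.abs(r) / np.where(denom > 0, denom, 1.0),
                            np.where(np.abs(r) > 0, np.inf, 0.0)))
    print(eta, omega)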

1.3.8

Determinants and Nearness to Singularity


        [ 1  -1  ...  -1 ]                 [ 1  1  2  ...  2^{n-2} ]
  B_n = [    1   ...  -1 ],     B_n^{-1} = [    1  1  ...  2^{n-3} ],
        [        ...  -1 ]                 [       ...         1  ]
        [ 0            1 ]                 [ 0                 1  ]

Then det(B_n) = 1 while κ_∞(B_n) = n 2^{n-1}, e.g., κ_∞(B_30) ≥ 10^8: B_n is nearly singular although its determinant equals 1.

  D_n = diag(10^{-1}, ..., 10^{-1}).

Then det(D_n) = 10^{-n}, while κ_p(D_n) = 1 and σ_n(D_n) = 10^{-1}: D_n is perfectly conditioned although its determinant is tiny.


Chapter 2 Numerical methods for solving linear systems


Let A Cnn be a nonsingular matrix. We want to solve the linear system Ax = b by (a) Direct methods (nite steps); Iterative methods (convergence). (See Chapter 4)

2.1

Elementary matrices

x1 y x1 y 1 n . . . Let X = Kn and x, y X. Then y x K, xy = . . . . The eigenvalues xn y xn y 1 n of xy are {0, , 0, y x}, since rank(xy ) = 1 by (xy )z = (y z )x and (xy )x = (y x)x. Denition 2.1.1 A matrix of the form I xy is called an elementary matrix. The eigenvalues of (I xy ) are {1, 1, , 1, 1 y x}. Compute (I xy )(I xy ) = I ( + y x)xy . If y x 1 = 0 and let =
, y x1

( K, x, y Kn )

(2.1.1)

(2.1.2)

then + y x = 0. We have 1 1 + = y x. (2.1.3)

(I xy )1 = (I xy ),

Example 2.1.1 Let x Kn , and x x = 1. Let H = {z : z x = 0} and Q = I 2xx (Q = Q , Q1 = Q).

Then Q reects each vector with respect to the hyperplane H . Let y = x + w, w H . Then, we have Qy = Qx + Qw = x + w 2(x w)x = x + w.

24 Chapter 2. Numerical methods for solving linear systems Example 2.1.2 Let y = ei = the i-th column of unit matrix and x = li = [0, , 0, li+1,i , , ln,i ]T . Then, 1 ... 1 T I + li ei = (2.1.4) li+1,i . ... . . ln,i 1 Since eT i li = 0, we have From the equality
T T T T T T T (I + l1 eT 1 )(I + l2 e2 ) = I + l1 e1 + l2 e2 + l1 (e1 l2 )e2 = I + l1 e1 + l2 e2 1 (I + li eT = (I li eT i ) i ).

(2.1.5)

follows that
T T T T T (I + l1 eT 1 ) (I + li ei ) (I + ln1 en1 ) = I + l1 e1 + l2 e2 + + ln1 en1 1 . 0 l21 . . = . . (2.1.6) . . .. .. . . ln1 ln,n1 1

Theorem 2.1.1 A lower triangular with 1 on the diagonal can be written as the product of n 1 elementary matrices of the form (2.1.4).
T 1 T Remark 2.1.1 (I + l1 eT = (l ln1 eT 1 + . . . + ln1 en1 ) n1 ) . . . (I l1 e1 ) which can not be simplied as in (2.1.6).
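A small NumPy check of (2.1.6) — a sketch showing that the product of the elementary matrices I + l_i e_i^T equals I plus the sum of the l_i e_i^T, i.e., the unit lower triangular matrix whose i-th column below the diagonal is l_i; the size n = 4 and the random entries are arbitrary:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 4
    L_prod = np.eye(n)
    L_sum = np.eye(n)
    for i in range(n - 1):
        l = np.zeros(n)
        l[i + 1:] = rng.standard_normal(n - i - 1)       # l_i is zero in positions 1..i
        E = np.eye(n) + np.outer(l, np.eye(n)[i])        # I + l_i e_i^T
        L_prod = L_prod @ E
        L_sum += np.outer(l, np.eye(n)[i])
    print(np.allclose(L_prod, L_sum))                    # (2.1.6): product = I + sum l_i e_i^T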

2.2

LR-factorization

Denition 2.2.1 Given A Cnn , a lower triangular matrix L and an upper triangular matrix R. If A = LR, then the product LR is called a LR-factorization (or LRdecomposition) of A. Basic problem: Given b = 0, b Kn . Find a vector l1 = [0, l21 , . . . , ln1 ]T and c K such that (I l1 eT 1 )b = ce1 . Solution: b1 = c, bi li1 b1 = 0, b1 = 0 , b1 = 0 , i = 2, . . . , n.

it has no solution (since b = 0), then c = b1 , li1 = bi /b1 , i = 2, . . . , n.

2.2 LR-factorization 25 Construction of LR-factorization: (0) (0) (0) (0) Let A = A(0) = [a1 | . . . | an ]. Apply basic problem to a1 : If a11 = 0, then there (0) (0) T exists L1 = I l1 eT 1 such that (I l1 e1 )a1 = a11 e1 . Thus (0) A(1) = L1 A(0) = [L1 a1 | . . . | L1 a(0) ] = n a11 0 . . . 0 The i-th step: A(i) = Li A(i1) = Li Li1 . . . L1 A(0) (0) a11 (1) 0 a22 . .. . . 0 . . . . . a(i1) = . ii . . . (i) . . . 0 ai+1,i+1 . . . . . . . . . . . . . 0 If aii
(i1) (0)

a12 . . . a1n (1) (1) a22 a2n . . . . . . (1) (1) an2 . . . ann

(0)

(0)

. (2.2.1)

a1n (1) a2n . . . ain


(i1)

(0)

(2.2.2)

ai+1,n . . . ann
(i)

(i)

an,i+1

(i)

= 0, for i = 1, . . . , n 1, then the method is executable and we have that A(n1) = Ln1 . . . L1 A(0) = R (2.2.3)

is an upper triangular matrix. Thus, A = LR. Explicit representation of L:


1 T Li = I li eT L i , i = I + li ei 1 1 T T L = L 1 . . . Ln1 = (I + l1 e1 ) . . . (I + ln1 en1 ) T = I + l1 eT (by (2.1.6)). 1 + . . . + ln1 en1

Theorem 2.2.1 Let A be nonsingular. Then A has an LR-factorization (A=LR) if and only if ki := det(Ai ) = 0, where Ai is the leading principal matrix of A, i.e., a11 . . . a1i . . . Ai = . . . , ai1 . . . aii for i = 1, . . . , n 1. Proof: (Necessity ): Since A = LR, we have r11 r1i l11 a11 . . . a1i . .. . . . . . . O O ... . . . = . rii li1 . . . lii ai1 . . . aii

26 Chapter 2. Numerical methods for solving linear systems From det(A) = 0 follows that det(L) = 0 and det(R) = 0. Thus, ljj = 0 and rjj = 0, for j = 1, . . . , n. Hence ki = l11 . . . lii r11 . . . rii = 0. (Suciency ): From (2.2.2) we have
1 1 (i) A(0) = (L 1 . . . L i )A .

Consider the (i + 1)-th leading principle determinant. From (2.2.3) we have a11 . . . ai,i+1 . . . . . . ai+1 . . . ai+1,i+1 1 0 a11
(0)

.. . l21 . . .. ... = . . . ... ... . . li+1,1 li+1,i 1


(0) (1) (i)

a12 a22

(0)

...

aii
(i1)

(1)

. . . . . . ai,i+1 (i) ai+1,i+1


(i1)

0
(i)

Thus, ki = 1 a11 a22 . . . ai+1,i+1 = 0 which implies ai+1,i+1 = 0. Therefore, the LRfactorization of A exists. Theorem 2.2.2 If a nonsingular matrix A has an LR-factorization with A = LR and l11 = = lnn = 1, then the factorization is unique.
1 1 Proof: Let A = L1 R1 = L2 R2 . Then L 2 L1 = R2 R1 = I .

Corollary 2.2.1 If a nonsingular matrix A has an LR-factorization with A = LDR, where D is diagonal, L and RT are unit lower triangular (with one on the diagonal) if and only if ki = 0. Theorem 2.2.3 Let A be a nonsingular matrix. Then there exists a permutation P , such that P A has an LR-factorization. (Proof ): By construction! Consider (2.2.2): There is a permutation Pi , which inter(i1) changes the i-th row with a row of index large than i, such that 0 = aii ( Pi A(i1) ). This procedure is executable, for i = 1, . . . , n 1. So we have Ln1 Pn1 . . . Li Pi . . . L1 P1 A(0) = R. Let P be a permutation which aects only elements i + 1, , n. It holds
1 T P (I li eT = I (P li )eT i )P i = I li ei = Li , 1 (eT = eT i P i )

(2.2.4)

i is lower triangular. Hence we have where L i P. P Li = L Now write all Pi in (2.2.4) to the right as n2 . . . L 1 Pn1 . . . P1 A(0) = R. Ln1 L n2 L 1 and P = Pn1 P1 . Then we have P A = LR with L1 = Ln1 L (2.2.5)

2.3 Gaussian elimination

2.3

Gaussian elimination
Practical implementation
Ax = b

27

2.3.1

Given a linear system (2.3.1) with A nonsingular. We rst assume that A has an LR-factorization. i.e., A = LR. Thus LRx = b. We then (i) solve Ly = b; (ii) solve Rx = y . These imply that LRx = Ly = b. From (2.2.4), we have Ln1 . . . L2 L1 (A | b) = (R | L1 b). Algorithm 2.3.1 (without permutation) For k = 1, . . . , n 1, if akk = 0 then stop (); else j := akj (j = k + 1, . . . , n); for i = k + 1, . . . , n, := aik /akk , aik := ; for j = k + 1, . . . , n, aij := aij j , bj := bj bk . For x: (back substitution!) xn = bn /ann ; for i = n 1, n 2, . . . , 1, xi = (bi n j =i+1 aij xj )/aii . Cost of computation (one multiplication + one addition one op): (i) LR-factorization: n3 /3 n/3 ops; (ii) Computation of y : n(n 1)/2 ops; (iii) Computation of x: n(n + 1)/2 ops. For A1 : 4/3n3 n3 /3 + kn2 (k = n linear systems). Pivoting: (a) Partial pivoting; (b) Complete pivoting. From (2.2.2), we have 0 . . . = . . . . . . 0 a11
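A direct NumPy transcription of Algorithm 2.3.1 together with the back substitution — a sketch without pivoting, valid only when all pivots a_kk stay nonzero; the 3×3 test system is arbitrary:

    import numpy as np

    def gauss_solve_no_pivot(A, b):
        # Gaussian elimination without pivoting (Algorithm 2.3.1), then back substitution
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):
            if A[k, k] == 0.0:
                raise ZeroDivisionError("zero pivot; pivoting would be required")
            for i in range(k + 1, n):
                m = A[i, k] / A[k, k]                  # multiplier l_{ik}
                A[i, k + 1:] -= m * A[k, k + 1:]
                b[i] -= m * b[k]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):                 # back substitution
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 2.0], [1.0, 2.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(np.allclose(gauss_solve_no_pivot(A, b), np.linalg.solve(A, b)))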
(0)

...

ak1,k1 0 . . .
(k2)

akk . . . ank
(k1)

a1n . . . akn . . .

(0)

A(k1)

ak1,n
(k1)

(k2)

...

(k1)

ann

(k1)

28 For (a):

Chapter 2. Numerical methods for solving linear systems

Find a p {k, . . . , n} such that |apk | = maxkin |aik | (rk = p) (2.3.2) swap akj , bk and apj , bp respectively, (j = 1, . . . , n). Replacing () in Algorithm 2.3.1 by (2.3.2), we have a new factorization of A with partial pivoting, i.e., P A = LR (by Theorem 2.2.1) and |lij | 1 for i, j = 1, . . . , n. For solving linear system Ax = b, we use P Ax = P b L(Rx) = P T b b. It needs extra n(n 1)/2 comparisons. For (b): Find p, q {k, . . . , n} such that |apq | max |aij |, (rk := p, ck := q ) swap akj , bk and apj , bp respectively, (j = k, . . . , n), swap aik and aiq (i = 1, . . . , n).
ki,j n

(2.3.3)

Replacing () in Algorithm 2.3.1 by (2.3.3), we also have a new factorization of A with complete pivoting, i.e., P A = LR (by Theorem 2.2.1) and |lij | 1, for i, j = 1, . . . , n. For solving linear system Ax = b, we use P A(T x) = P b LRx = b x = x. It needs n3 /3 comparisons. Example 2.3.1 Let A = Then (A) = A A1 Without pivoting: L = R = LR =

104 1 be in three decimal-digit oating point arithmetic. 1 1 4. A is well-conditioned. , f l(1/104 ) = 104 , , f l(1 104 1) = 104 . = 104 1 1 0 = 104 1 1 1 = A. 1 . 2

1 0 f l(1/104 ) 1

104 1 0 f l(1 104 1) 1 0 104 1 104 1 0 104

Here a22 entirely lost from computation. It is numerically unstable. Let Ax = Then x

1 1 . But Ly = solves y1 = 1 and y2 = f l(2 104 1) = 104 , Rx =y 1 2 solves x 2 = f l((104 )/(104 )) = 1, x 1 = f l((1 1)/104 ) = 0. We have an erroneous 8 solution with cond(L), cond(R) 10 . Partial pivoting: L = R = 1 0 4 f l(10 /1) 1 1 1 0 f l(1 104 ) = = 1 0 4 10 1 1 1 0 1 . ,

L and R are both well-conditioned.

2.3 Gaussian elimination

29

2.3.2

LDR- and LL -factorizations

Let A = LDR as in Corollary 2.2.1. Algorithm 2.3.2 (Crouts factorization or compact method) For k = 1, . . . , n, for p = 1, 2, . . . , k 1, rp := dp apk , p := akp dp , 1 dk := akk k p=1 akp rp , if dk = 0, then stop; else for i = k + 1, . . . , n, 1 aik := (aik k p=1 aip rp )/dk , 1 aki := (aki k p=1 p api )/dk . Cost: n3 /3 ops. With partial pivoting: see Wilkinson EVP pp.225-. Advantage: One can use double precision for inner product. Theorem 2.3.1 If A is nonsingular, real and symmetric, then A has a unique LDLT factorization, where D is diagonal and L is a unit lower triangular matrix (with one on the diagonal). Proof: A = LDR = AT = RT DLT . It implies L = RT . Theorem 2.3.2 If A is symmetric and positive denite, then there exists a lower triangular G Rnn with positive diagonal elements such that A = GGT . Proof: A is symmetric positive denite xT Ax 0, for all nonzero vector x Rnn ki 0, for i = 1, , n, all eigenvalues of A are positive. From Corollary 2.2.1 and Theorem 2.3.1 we have A = LDLT . From L1 ALT = D 1/2 1/2 1 T follows that dk = (eT ek ) > 0. Thus, G = Ldiag{d1 , , dn } is real, and k L )A(L then A = GGT . Algorithm 2.3.3 (Cholesky factorization) Let A be symmetric positive denite. To nd a lower triangular matrix G such that A = GGT . For k = 1, 2, . . . , n, 1 2 1/2 ; akk := (akk k p=1 akp ) for i = k + 1, . . . , n, 1 aik = (aik k p=1 aip akp )/akk . Cost: n3 /6 ops. Remark 2.3.1 For solving symmetric, indenite systems: See Golub/ Van Loan Matrix Computation pp. 159-168. P AP T = LDLT , D is 1 1 or 2 2 block-diagonal matrix, P is a permutation and L is lower triangular with one on the diagonal.
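A compact NumPy version of the Cholesky factorization of Algorithm 2.3.3 — a sketch assuming A is symmetric positive definite; the test matrix is arbitrary:

    import numpy as np

    def cholesky_lower(A):
        # returns lower triangular G with A = G G^T (Algorithm 2.3.3)
        A = A.astype(float)
        n = A.shape[0]
        G = np.zeros_like(A)
        for k in range(n):
            G[k, k] = np.sqrt(A[k, k] - G[k, :k] @ G[k, :k])
            for i in range(k + 1, n):
                G[i, k] = (A[i, k] - G[i, :k] @ G[k, :k]) / G[k, k]
        return G

    A = np.array([[4.0, 2.0, 2.0], [2.0, 3.0, 1.0], [2.0, 1.0, 3.0]])   # SPD test matrix
    G = cholesky_lower(A)
    print(np.allclose(G @ G.T, A))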

30

Chapter 2. Numerical methods for solving linear systems

2.3.3

Error estimation for linear systems


Ax = b, (2.3.4)

Consider the linear system and the perturbed linear system (A + A)(x + x) = b + b, where A and b are errors of measure or round-o in factorization. Denition 2.3.1 Let be an operator norm and A be nonsingular. Then (A) = 1 A A is a condition number of A corresponding to . Theorem 2.3.3 (Forward error bound) Let x be the solution of the (2.3.4) and x+x be the solution of the perturbed linear system (2.3.5). If A A1 < 1, then x x 1 Proof: From (2.3.5) we have (A + A)x + Ax + Ax = b + b. Thus, x = (A + A)1 [(A)x b]. Here, Corollary 2.7 implies that (A + A)1 exists. Now, (A + A)1 = (I + A1 A)1 A1 A1 On the other hand, b = Ax implies b A x . So, 1 A . x b From (2.3.7) follows that x inequality (2.3.6) is proved.
A1 1 A1 A A A

(2.3.5)

A b + A b

(2.3.6)

(2.3.7)

1 1 A1 A

(2.3.8) x + b ). By using (2.3.8), the

( A

Remark 2.3.2 If (A) is large, then A (for the linear system Ax = b) is called illconditioned, else well-conditioned.

2.3.4

Error analysis for Gaussian algorithm

A computer in characterized by four integers: (a) the machine base ; (b) the precision t; (c) the underow limit L; (d) the overow limit U . Dene the set of oating point numbers. F = {f = 0.d1 d2 dt e | 0 di < , d1 = 0, L e U } {0}. (2.3.9)

2.3 Gaussian elimination 31 L1 U t Let G = {x R | m |x| M } {0}, where m = and M = (1 ) are the minimal and maximal numbers of F \ {0} in absolute value, respectively. We dene an operator f l : G F by f l(x) = the nearest c F to x by rounding arithmetic. One can show that f l satises f l(x) = x(1 + ), || eps, (2.3.10)

1t . (If = 2, then eps = 2t ). It follows that where eps = 1 2 f l(a b) = (a b)(1 + ) or f l(a b) = (a b)/(1 + ), where || eps and = +, , , /. Algorithm 2.3.4 Given x, y Rn . The following algorithm computes xT y and stores the result in s. s = 0, for k = 1, . . . , n, s = s + xk yk . Theorem 2.3.4 If n2t 0.01, then
n n

f l(
k=1

xk y k ) =
k=1 p k=1

xk yk [1 + 1.01(n + 2 k )k 2t ], |k | 1

Proof: Let sp = f l(

xk yk ) be the partial sum in Algorithm 2.3.4. Then s1 = x1 y1 (1 + 1 )

with |1 | eps and for p = 2, . . . , n, sp = f l[sp1 + f l(xp yp )] = [sp1 + xp yp (1 + p )](1 + p ) with |p |, |p | eps. Therefore
n

f l(xT y ) = sn =
k=1

xk yk (1 + k ),

where (1 + k ) = (1 + k )
n

n j =k (1

+ j ), and 1 0. Thus,
n

f l(
k=1

xk y k ) =
k=1

xk yk [1 + 1.01(n + 2 k )k 2t ].

(2.3.11)

The result follows immediately from the following useful Lemma.

32 Chapter 2. Numerical methods for solving linear systems t Lemma 2.3.5 If (1 + ) = n and n2t 0.01, then k=1 (1 + k ), where |k | 2
n

(1 + k ) = 1 + 1.01n2t with || 1.
k=1

Proof: From assumption it is easily seen that


n

(1 2 )

t n

(1 + k ) (1 + 2t )n .
k=1

(2.3.12)

Expanding the Taylor expression of (1 x)n as 1 < x < 1, we get (1 x)n = 1 nx + Hence (1 2t )n 1 n2t . Now, we estimate the upper bound of (1 + 2t )n : x2 x3 x x 2x2 e =1+x+ + + = 1 + x + x(1 + + + ). 2! 3! 2 3 4!
x

n(n 1) (1 x)n2 x2 1 nx. 2 (2.3.13)

If 0 x 0.01, then 1 1 + x ex 1 + x + 0.01x ex 1 + 1.01x 2 (2.3.14)

(Here, we use the fact e0.01 < 2 to the last inequality.) Let x = 2t . Then the left inequality of (2.3.14) implies t (1 + 2t )n e2 n (2.3.15) Let x = 2t n. Then the second inequality of (2.3.14) implies e2 From (2.3.15) and (2.3.16) we have (1 + 2t )n 1 + 1.01n2t . , R be the Let the exact LR-factorization of A be L and R (A = LR) and let L LR-factorization of A by using Gaussian Algorithm (without pivoting). There are two possibilities: | and |R R |. (i) Forward error analysis: Estimate |L L R be the exact LR-factorization of a perturbed (ii) Backward error analysis: Let L = A + F . Then F will be estimated, i.e., |F | ?. matrix A
t n

1 + 1.01n2t

(2.3.16)

2.3 Gaussian elimination

2.3.5

` Apriori error estimate for backward error bound of LRfactorization


A(k+1) = Lk A(k) ,

33

From (2.2.2) we have for k = 1, 2, . . . , n 1 (k) (k) f l(aik /akk ), i k + 1. (k+1) = aij (A(1) = A). Denote the entries of A(k) by aij and let lik = From (2.2.2) we know that 0; for i k + 1, j = k (k) (k ) f l(aij f l(lik akj )); for i k + 1, j k + 1 (k) otherwise. aij ;
(k) (k) (k)

(2.3.17)

From (2.3.10) we have lik = (aik /akk )(1 + ik ) with |ik | 2t . Then aik lik akk + aij ik = 0, Let aik ik ik . From (2.3.10) we also have aij
(k+1) (k) (k) (k) (k) (k)

for i k + 1.

(2.3.18)

= f l(aij f l(lik akj )) =


(k) (aij

(k)

(k)

(2.3.19)

(k) (lik akj (1

+ ij )))/(1 + ij )

with |ij |, |ij | 2t . Then aij


(k ) (k+1)

= aij lik akj lik akj ij + aij


(k+1)

(k )

(k )

(k)

(k+1)

ij ,

for i, j k + 1.
(k)

(2.3.20)

Let ij lik akj ij + aij ij which is the computational error of aij in A(k+1) . From (2.3.17), (2.3.18) and (2.3.20) we obtain (k) (k) (k) aij lik akk + ij ; for i k + 1, j = k (k+1) (k) (k) (k) (2.3.21) aij = aij lik akj + ij ; for i k + 1, j k + 1 (k) (k) aij + ij ; otherwise, where ij
(k)

(k)

(k) for i k + 1, j = k, aij ij ; (k) (k+1) = lik akj ij aij ij ; for i k + 1, j k + 1 0; otherwise.
(k)

(2.3.22)

Let E (k) be the error matrix with entries ij . Then (2.3.21) can be written as A(k+1) = A(k) Mk A(k) + E (k) , where Mk = 0 .. . 0 lk+1,k . .. . . . ln,k 0 (2.3.23)

(2.3.24)

34 Chapter 2. Numerical methods for solving linear systems For k = 1, 2 . . . , n 1, we add the n 1 equations in (2.3.23) together and get M1 A(1) + M2 A(2) + + Mn1 A(n1) + A(n) = A(1) + E (1) + + E (n1) . From (2.3.17) we know that the k -th row of A(k) is equal to the k -th row of A(k+1) , , A(n) , respectively and from (2.3.24) we also have Mk A(k) = Mk A(n) = Mk R. Thus, Then R = A + E, L where L= 1 l21 . . . 1 .. O and E = E (1) + + E (n1) . 1 (2.3.26) (2.3.25) = A(1) + E (1) + + E (n1) . (M1 + M2 + + Mn1 + I )R

ln1 . . . ln,n1

Now we assume that the partial pivotings in Gaussian Elimination are already ar(k) ranged such that pivot element akk has the maximal absolute value. So, we have |lik | 1. Let (k ) = max |aij |/ A . (2.3.27)
i,j,k

Then

|aij | A

(k)

(2.3.28)

From (2.3.22) and (2.3.28) follows that t 2 ; for i k + 1, j = k, (k ) 21t ; for i k + 1, j k + 1, |ij | A 0; otherwise. Therefore, |E
(k )

(2.3.29)

| A
2 t

0 0 . . .

0 1 . . .

0 2 . . .

0 2 . . . 2

. (2.3.30)

0 1 2 From (2.3.26) we get 0 1 1 . . . 0 2 3 . . . 0 2 4 . . . 0 2 4 . . .

t |E | A 2 1 3 5 1 3 5 Hence we have the following theorem.

2n 4 2n 4 2n 3 2n 2

0 2 4 . . .

(2.3.31)

2.3 Gaussian elimination 35 Theorem 2.3.6 The LR-factorization L and R of A using Gaussian Elimination with partial pivoting satises R = A + E, L where E Proof:
n

n2 A

t 2

(2.3.32)

(
j =1

(2j 1) 1) < n2 A

t 2 .

and R , i.e., Now we shall solve the linear system Ax = b by using the factorization L = b and Rx = y. Ly For Ly = b: From Algorithm 2.3.1 we have y1 = f l(b1 /l11 ), li1 y1 li2 y2 li,i1 yi1 + bi yi = f l lii for i = 2, 3, . . . , n. From (2.3.10) we have y1 = b1 /l11 (1 + 11 ), with |11 | 2t f l(li1 y1 li2 y2 li,i1 yi1 )+bi yi = f l ( ) lii (1+ii ) = f l(li1 y1 li2 y2 li,i1 yi1 )+bi , with |ii |, | | 2t . ii l (1+ )(1+ )
ii ii ii

(2.3.33)

(2.3.34)

Applying Theorem 2.3.4 we get f l(li1 y1 li2 y2 li,i1 yi1 ) = li1 (1 + i1 )y1 li,i1 (1 + i,i1 )yi1 , where |i1 | (i 1)1.01 2t ; for i = 2, 3, , n, i = 2, 3, , n, |ij | (i + 1 j )1.01 2t ; for j = 2, 3, , i 1. So, (2.3.34) can be written as l11 (1 + 11 )y1 = b1 , l (1 + i1 )y1 + + li,i1 (1 + i,i1 )yi1 + lii (1 + ii )(1 + ii )yi = bi , i1 for i = 2, 3, , n. or (L + L)y = b. (2.3.37)

(2.3.35)

(2.3.36)

36 Chapter 2. Numerical methods for solving linear systems From (2.3.35) (2.3.36) and (2.3.37) follow that |l11 | 0 |l21 | 2|l22 | 2|l31 | 2|l32 | 2|l33 | |L| 1.01 2t ... . 3|l41 | 3|l42 | 2|l43 | . . . . . . . . .. .. . . . (n 1)|ln1 | (n 1)|ln2 | (n 2)|ln3 | 2|ln,n1 | 2|lnn | (2.3.38) This implies, L

n(n + 1) n(n + 1) 1.01 2t max |lij | 1.01 2t . i,j 2 2

(2.3.39)

Theorem 2.3.7 For lower triangular linear system Ly = b, if y is the exact solution of (L + L)y = b, then L satises (2.3.38) and (2.3.39). = b and Rx = y , respectively, the Applying Theorem 2.3.7 to the linear system Ly solution x satises + L )(R + R )x = b (L or R + ( L )R +L ( R ) + ( L )( R ))x = b. (L )R +L ( R ) + ( L )( R )]x = b. [A + E + ( L and R satisfy The entries of L |lij | 1, and |rij | A Therefore, we get L R n, n A
, .

(2.3.40)

R = A + E , substituting this equation into (2.3.40) we get Since L (2.3.41)

L R L Let

(2.3.42)
n(n+1) 1.01 2

2 ,

n(n+1) 1.012t . 2

In practical implementation we usually have n2 2t << 1. So it holds

n2 A

. (2.3.43) R

)R +L ( R ) + ( L )( R ). A = E + ( L R E + L 1.01(n3 + 3n2 ) A + L t 2 R + L

Then, (2.3.32) and (2.3.42) we get A


(2.3.44)

2.3 Gaussian elimination 37 Theorem 2.3.8 For a linear system Ax = b the solution x computed by Gaussian Elimination with partial pivoting is the exact solution of the equation (A + A)x = b and A satises (2.3.43) and (2.3.44). Remark 2.3.3 The quantity dened by (2.3.27) is called a growth factor. The growth factor measures how large the numbers become during the process of elimination. In practice, is usually of order 10 for partial pivot selection. But it can be as large as = 2n1 , when A = 1 0 1 1 . . . 1 . . . . . . 1 1 1 1 0 0 0 ... ... . . . ... ... 0 1 1 1 1 1 1 . 1 1 1

Better estimates hold for special types of matrices. For example in the case of upper Hessenberg matrices, that is, matrices of the form . ... . . . .. A = . .. .. . . . . 0 the bound (n 1) can be shown. (Hessenberg matrices arise in eigenvalus problems.) For tridiagonal matrices 1 2 0 . . 2 . . . . ... ... ... A = ... ... n 0 n n it can even be shown that 2 holds for partial pivot selection. Hence, Gaussian elimination is quite numerically stable in this case. For complete pivot selection, Wilkinson (1965) has shown that |ak ij | f (k ) max |aij |
i,j

with the function

f (k ) := k 2 [21 3 2 4 3 k (k1) ] 2 . This function grows relatively slowly with k : k 10 20 50 100 f (k ) 19 67 530 3300

38 Chapter 2. Numerical methods for solving linear systems Even this estimate is too pessimistic in practice. Up until now, no matrix has been found which fails to satisfy |aij | (k + 1) max |aij | k = 1, 2, ..., n 1,
i,j (k )

when complete pivot selection is used. This indicates that Gaussian elimination with complete pivot selection is usually a stable process. Despite this, partial pivot selection is preferred in practice, for the most part, because: (i) Complete pivot selection is more costly than partial pivot selection. (To compute A(i) , the maximum from among (n i + 1)2 elements must be determined instead of n i + 1 elements as in partial pivot selection.) (ii) Special structures in a matrix, i.e. the band structure of a tridiagonal matrix, are destroyed in complete pivot selection.

2.3.6

Improving and Estimating Accuracy

Iterarive Improvement: Suppose that the linear system Ax = b has been solved via the LR-factorization P A = LR. Now we want to improve the accuracy of the computed solution x. We compute r = b Ax, Ly = P r, Rz = y, (2.3.45) xnew = x + z. Then in exact arithmatic we have Axnew = A(x + z ) = (b r) + Az = b. Unfortunately, r = f l(b Ax) renders an xnew that is no more accurate than x. It is necessary to compute the residual b Ax with extended precision oating arithmetic. Algorithm 2.3.5 Compute P A = LR (t-digit) Repeat: r := b Ax (2t-digit) Solve Ly = P r for y (t-digit) Solve Rz = y for z (t-digit) Update x = x + z (t-digit) This is referred to as an iterative improvement. From (2.3.45) we have ri = bi ai1 x1 ai2 x2 ain xn . Now, ri can be roughly estimated by 2t maxj |aij | |xj |. That is r 2t A x . (2.3.47) (2.3.46)
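A minimal sketch of the refinement loop of Algorithm 2.3.5 in NumPy/SciPy; here the residual is simply evaluated in the working precision rather than in extended precision, so the code only illustrates the structure of the iteration, not its full accuracy benefit:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve   # LU with partial pivoting, P A = L R

    rng = np.random.default_rng(7)
    n = 8
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    lu, piv = lu_factor(A)
    x = lu_solve((lu, piv), b)
    for _ in range(3):                  # a few refinement sweeps
        r = b - A @ x                   # ideally computed with 2t-digit precision
        z = lu_solve((lu, piv), r)
        x = x + z
    print(np.linalg.norm(b - A @ x, np.inf))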

2.3 Gaussian elimination Let e = x A1 b = A1 (Ax b) = A1 r. Then we have e A1 From (2.3.47) follows that e A1 2t A x = 2t cond(A) x . Let cond(A) = 2p , 0 < p < t, (p is integer). Then we have e / x 2(tp) . r .

39

(2.3.48)

(2.3.49) (2.3.50)

From (2.3.50) we know that x has q = t p correct signicant digits. Since r is computed by double precision, so we can assume that it has at least t correct signicant digits. Therefore for solving Az = r according to (2.3.50) the solution z (comparing with e = A1 r) has q -digits accuracy so that xnew = x + z has usually 2q -digits accuracy. From above discussion, the accuracy of xnew is improved about q -digits after one iteration. Hence we stop the iteration, when the number of the iterates k (say!) saties kq t. From above we have z / x e / x 2q = 2t 2p . (2.3.51) From (2.3.49) and (2.3.51) we have cond(A) = 2t ( z / x ). By (2.3.51) we get q = log2 ( x t ) and k = z log2 (
x z

In the following we shall give a further discussion of convergence of the iterative improvement. From Theorem 2.3.8 we know that z in Algorithm 5.5 is computed by (A+A)z = r. That is A(I + F )z = r, (2.3.52) where F = A1 A. Theorem 2.3.9 Let the sequence of vectors {xv } be the sequence of improved solutions in Algorithm 5.5 for solving Ax = b and x = A1 b be the exact solution. Assume that Fk in (2.3.52) satises Fk < 1/2 for all k . Then {xk } converges to x , i.e., limv xk x = 0. Proof: From (2.3.52) and rk = b Axk we have A(I + Fk )zk = b Axk . Since A is nonsingular, multiplying both sides of (2.3.53) by A1 we get (I + Fk )zk = x xk . (2.3.53)

40 Chapter 2. Numerical methods for solving linear systems From xk+1 = xk + zk we have (I + Fk )(xk+1 xk ) = x xk , i.e., (I + Fk )xk+1 = Fk xk + x . Subtracting both sides of (2.3.54) from (I + Fk )x we get (I + Fk )(xk+1 x ) = Fk (xk x ). Applying Corollary 1.2.1 we have xk+1 x = (I + Fk )1 Fk (xk x ). Hence, xk+1 x Fk Let = /(1 ). Then xk x k 1 x1 x . But < 1/2 follows < 1. This implies convergence of Algorithm 2.3.5. Corollary 2.3.1 If 1.01(n3 + 3n2 )2t A then Algorithm 2.3.5 converges. Proof: From (2.3.52) and (2.3.44) follows that Fk 1.01(n3 + 3n2 )2t cond(A) < 1/2. A1 < 1/2, xk x xk x . 1 Fk 1 (2.3.54)

2.4

Special Linear Systems

2.4.1

Toeplitz Systems

Definition 2.4.1 (i) T ∈ R^{n×n} is called a Toeplitz matrix if there exist scalars r_{-n+1}, ..., r_0, ..., r_{n-1} such that t_{ij} = r_{j-i} for all i, j. E.g., for n = 4,

T = [ r_0     r_1     r_2     r_3
      r_{-1}  r_0     r_1     r_2
      r_{-2}  r_{-1}  r_0     r_1
      r_{-3}  r_{-2}  r_{-1}  r_0 ].

(ii) B ∈ R^{n×n} is called a persymmetric matrix if it is symmetric about its northeast-southwest (anti-)diagonal, i.e., b_{ij} = b_{n-j+1,n-i+1} for all i, j. Equivalently, B = E B^T E, where E = [e_n, ..., e_1] is the exchange (reversal) matrix.

Given scalars r_1, ..., r_{n-1} such that the matrices

T_k = [ 1        r_1      ...  r_{k-1}
        r_1      1        ...  r_{k-2}
        ...                    ...
        r_{k-1}  r_{k-2}  ...  1       ]

are all positive definite, for k = 1, ..., n. Three algorithms will be described:

(a) Durbin's Algorithm for the Yule-Walker problem T_n y = -(r_1, ..., r_n)^T.
(b) Levinson's Algorithm for a general right-hand side T_n x = b.
(c) Trench's Algorithm for computing B = T_n^{-1}.

To (a): Let E_k = [e_k, ..., e_1]. Suppose the k-th order Yule-Walker system T_k y = -(r_1, ..., r_k)^T = -r has been solved. Consider the (k+1)-st order system

[ T_k       E_k r ] [ z ]       [ r       ]
[ r^T E_k   1     ] [ α ]  = -  [ r_{k+1} ],

which can be solved in O(k) flops. Observe that

z = T_k^{-1}(-r - α E_k r) = y - α T_k^{-1} E_k r   (2.4.55)

and

α = -r_{k+1} - r^T E_k z.   (2.4.56)

Since T_k^{-1} is persymmetric, T_k^{-1} E_k = E_k T_k^{-1}, and hence z = y + α E_k y. Substituting into (2.4.56) we get

α = -r_{k+1} - r^T E_k (y + α E_k y),  i.e.,  α = -(r_{k+1} + r^T E_k y) / (1 + r^T y).

Here 1 + r^T y is positive, because T_{k+1} is positive definite and

[ I          0 ] [ T_k       E_k r ] [ I   E_k y ]     [ T_k   0         ]
[ (E_k y)^T  1 ] [ r^T E_k   1     ] [ 0   1     ]  =  [ 0     1 + r^T y ].

Algorithm 2.4.1 (Durbin Algorithm, 1960) Solves T_k y^{(k)} = -r^{(k)} = -(r_1, ..., r_k)^T for k = 1, ..., n:
y^{(1)} = -r_1;
for k = 1, ..., n-1:
  β_k = 1 + r^{(k)T} y^{(k)},
  α_k = -(r_{k+1} + r^{(k)T} E_k y^{(k)}) / β_k,
  z^{(k)} = y^{(k)} + α_k E_k y^{(k)},
  y^{(k+1)} = [ z^{(k)} ; α_k ].
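A minimal NumPy sketch of the Durbin recursion just stated (our own illustration, not from the notes). It assumes the leading principal submatrices T_k are positive definite and returns y = y^{(n)} solving T_n y = -(r_1, ..., r_n)^T; the β-update uses the "further reduction" derived below.

```python
import numpy as np

def durbin(r):
    """Solve the Yule-Walker system T_n y = -r, where T_n is the symmetric positive
    definite Toeplitz matrix with first row (1, r[0], ..., r[n-2]).  O(n^2) flops."""
    n = len(r)
    y = np.zeros(n)
    y[0] = -r[0]
    beta, alpha = 1.0, -r[0]
    for k in range(1, n):
        beta = (1.0 - alpha**2) * beta                 # beta_k = (1 - alpha_{k-1}^2) beta_{k-1}
        alpha = -(r[k] + r[:k][::-1] @ y[:k]) / beta   # alpha_k
        y[:k] = y[:k] + alpha * y[:k][::-1]            # z^{(k)} = y^{(k)} + alpha_k E_k y^{(k)}
        y[k] = alpha                                   # append alpha_k
    return y

# quick check against a dense solve
rng = np.random.default_rng(0)
r = rng.uniform(-0.15, 0.15, size=6)
T = np.eye(6) + sum(r[k-1] * (np.eye(6, k=k) + np.eye(6, k=-k)) for k in range(1, 6))
assert np.allclose(durbin(r), np.linalg.solve(T, -r))
```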

This algorithm requires 3n^2/2 flops to generate y = y^{(n)}.

Further reduction:

β_k = 1 + r^{(k)T} y^{(k)} = 1 + [r^{(k-1)T}, r_k] [ y^{(k-1)} + α_{k-1} E_{k-1} y^{(k-1)} ; α_{k-1} ]
    = 1 + r^{(k-1)T} y^{(k-1)} + α_{k-1} ( r^{(k-1)T} E_{k-1} y^{(k-1)} + r_k )
    = β_{k-1} + α_{k-1} ( -α_{k-1} β_{k-1} ) = (1 - α_{k-1}^2) β_{k-1}.

To (b): Suppose

T_k x = b = (b_1, ..., b_k)^T, for 1 ≤ k ≤ n,   (2.4.57)

has been solved; we want to solve the bordered system

[ T_k       E_k r ] [ ν ]     [ b       ]
[ r^T E_k   1     ] [ μ ]  =  [ b_{k+1} ],   (2.4.58)

where r = (r_1, ..., r_k)^T. Since ν = T_k^{-1}(b - μ E_k r) = x + μ E_k y, it follows that

μ = b_{k+1} - r^T E_k ν = b_{k+1} - r^T E_k x - μ r^T y,  i.e.,  μ = (b_{k+1} - r^T E_k x) / (1 + r^T y).

We can thus effect the transition from (2.4.57) to (2.4.58) in O(k) flops, and we can solve T_n x = b by solving the systems T_k x^{(k)} = b^{(k)} = (b_1, ..., b_k)^T and T_k y^{(k)} = -r^{(k)} = -(r_1, ..., r_k)^T side by side. This needs 2n^2 flops. See Algorithm Levinson (1947) in Matrix Computations, pp. 128-129, for details.

To (c): Write

T_n^{-1} = [ B     ν ]          T_n = [ A       E r ]
           [ ν^T   γ ],                [ r^T E   1   ],

where A = T_{n-1}, E = E_{n-1} and r = (r_1, ..., r_{n-1})^T. From the equation

[ A       E r ] [ ν ]     [ 0 ]
[ r^T E   1   ] [ γ ]  =  [ 1 ]

it follows that A ν = -γ E r = -γ E (r_1, ..., r_{n-1})^T and γ = 1 - r^T E ν. If y is the solution of the (n-1)-st order Yule-Walker system A y = -r, then γ = 1/(1 + r^T y) and ν = γ E y. Thus the last row and column of T_n^{-1} are readily obtained. Since A B + E r ν^T = I_{n-1}, we have

B = A^{-1} - A^{-1} E r ν^T = A^{-1} + ν ν^T / γ.

Since A = T_{n-1} is nonsingular and Toeplitz, its inverse is persymmetric. Thus

b_{ij} = (A^{-1})_{ij} + ν_i ν_j / γ = (A^{-1})_{n-j,n-i} + ν_i ν_j / γ = b_{n-j,n-i} + (ν_i ν_j - ν_{n-i} ν_{n-j}) / γ.

It needs 7n^2/4 flops. See Algorithm Trench (1964) in Matrix Computations, p. 132, for details.

2.4.2

Banded Systems

Definition 2.4.2 Let A be an n × n matrix. A is called a (p, q)-banded matrix if a_{ij} = 0 whenever i - j > p or j - i > q; that is, all nonzero entries of A lie in a band of p subdiagonals and q superdiagonals around the main diagonal, and p and q are the lower and upper band widths, respectively.

Example 2.4.1 (1, 1): tridiagonal matrix; (1, n-1): upper Hessenberg matrix; (n-1, 1): lower Hessenberg matrix.

Theorem 2.4.1 Let A be a (p, q)-banded matrix. Suppose A has an LR-factorization (A = LR). Then L is a (p, 0)-banded and R a (0, q)-banded matrix, respectively.

Algorithm 2.4.2 See Algorithm 4.3.1 in Matrix Computations, p. 150.

Theorem 2.4.2 Let A be a (p, q)-banded nonsingular matrix. If Gaussian elimination with partial pivoting is used to compute Gaussian transformations L_j = I - l_j e_j^T, for j = 1, ..., n-1, and permutations P_1, ..., P_{n-1} such that L_{n-1}P_{n-1} ... L_1P_1 A = R is upper triangular, then R is a (0, p+q)-banded matrix and l_{ij} = 0 whenever i ≤ j or i > j + p. (Since the j-th column of L is a permutation of the Gaussian vector l_j, it follows that L has at most p + 1 nonzero elements per column.)
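As an illustration of this banded structure (a sketch of ours, not from the notes): for a (1,1)-banded, i.e. tridiagonal, matrix the LR-factorization without pivoting reduces to a three-term recurrence, so the whole solve costs O(n).

```python
import numpy as np

def tridiag_solve(a, d, c, b):
    """Solve A x = b where A is tridiagonal with subdiagonal a (length n-1),
    diagonal d (length n) and superdiagonal c (length n-1).
    LR-factorization without pivoting: L is (1,0)-banded, R is (0,1)-banded."""
    n = len(d)
    r = d.astype(float).copy()       # diagonal of R
    y = b.astype(float).copy()       # solves L y = b on the fly
    for i in range(1, n):
        l = a[i-1] / r[i-1]          # the single nonzero of L below the diagonal
        r[i] -= l * c[i-1]
        y[i] -= l * y[i-1]
    x = np.empty(n)
    x[-1] = y[-1] / r[-1]
    for i in range(n - 2, -1, -1):   # back substitution with the (0,1)-banded R
        x[i] = (y[i] - c[i] * x[i+1]) / r[i]
    return x

A = np.diag([4.0]*5) + np.diag([1.0]*4, 1) + np.diag([-1.0]*4, -1)
b = np.arange(1.0, 6.0)
x = tridiag_solve(np.array([-1.0]*4), np.array([4.0]*5), np.array([1.0]*4), b)
assert np.allclose(x, np.linalg.solve(A, b))
```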


2.4.3

Symmetric Indefinite Systems

Consider the linear system Ax = b, where A ∈ R^{n×n} is symmetric but indefinite. There is a method using n^3/6 flops due to Aasen (1971) that computes the factorization PAP^T = LTL^T, where L = [l_{ij}] is unit lower triangular, P is a permutation chosen such that |l_{ij}| ≤ 1, and T is tridiagonal. An alternative to the factorization PAP^T = LTL^T is the computation of PAP^T = LDL^T, where D is block diagonal with 1-by-1 and 2-by-2 blocks on the diagonal, L = [l_{ij}] is unit lower triangular, and P is a permutation chosen such that |l_{ij}| ≤ 1. Bunch and Parlett (1971) proposed a pivot strategy to do this; n^3/6 flops are required. Unfortunately the overall process requires between n^3/12 and n^3/6 comparisons. A better method described by Bunch and Kaufman (1977) requires n^3/6 flops and only O(n^2) comparisons. For a detailed discussion of this subsection see pp. 159-168 in Matrix Computations.

Chapter 3 Orthogonalization and least squares methods


3.1

QR-factorization (QR-decomposition)

3.1.1

Householder transformation

Definition 3.1.1 A complex m × n matrix R = [r_{ij}] is called an upper (lower) triangular matrix if r_{ij} = 0 for i > j (i < j).

Example 3.1.1 (1) m = n: R is square upper triangular; (2) m < n: R = [R_1, S], where R_1 is m × m upper triangular; (3) m > n: R = [R_1; 0], where R_1 is n × n upper triangular and the last m - n rows are zero.

Definition 3.1.2 Given A ∈ C^{m×n}, Q ∈ C^{m×m} unitary and R ∈ C^{m×n} upper triangular as in Example 3.1.1 such that A = QR. Then the product is called a QR-factorization of A.

Basic problem: Given b ∈ C^n, find a vector w ∈ C^n with w*w = 1 and c ∈ C such that

(I - 2ww*)b = c e_1.   (3.1.1)

Solution (Householder transformation):
(1) b = 0: w arbitrary (in general w = 0) and c = 0.
(2) b ≠ 0:

c = -(b_1/|b_1|) ||b||_2, if b_1 ≠ 0;   c = ||b||_2, if b_1 = 0,   (3.1.2)

w = u / ||u||_2  with  u = (b_1 - c, b_2, ..., b_n)^T  and  ||u||_2^2 = 2 ||b||_2 ( ||b||_2 + |b_1| ).   (3.1.3)
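A small numerical check of (3.1.1)-(3.1.3) for real b (our own sketch, not part of the notes): build w and c as above and verify that (I - 2ww^T)b has all entries except the first equal to zero.

```python
import numpy as np

def householder_vector(b):
    """Return (w, c) with ||w||_2 = 1 such that (I - 2 w w^T) b = c e_1."""
    b = np.asarray(b, dtype=float)
    normb = np.linalg.norm(b)
    if normb == 0.0:
        return np.zeros_like(b), 0.0
    # sign of c chosen opposite to b[0] to avoid cancellation in b[0] - c
    c = -np.sign(b[0]) * normb if b[0] != 0 else normb
    u = b.copy()
    u[0] -= c
    return u / np.linalg.norm(u), c

b = np.array([3.0, 1.0, -2.0])
w, c = householder_vector(b)
Hb = b - 2.0 * w * (w @ b)            # (I - 2 w w^T) b without forming the matrix
assert np.allclose(Hb, np.array([c, 0.0, 0.0]))
assert np.isclose(abs(c), np.linalg.norm(b))
```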

Theorem 3.1.1 Any complex m × n matrix A can be factorized into a product A = QR, where Q is m × m unitary and R is m × n upper triangular.
Proof: Let A^{(0)} = A = [a_1^{(0)} | a_2^{(0)} | ... | a_n^{(0)}]. Find Q_1 = I - 2w_1w_1* such that Q_1 a_1^{(0)} = c_1 e_1. Then

A^{(1)} = Q_1 A^{(0)} = [Q_1 a_1^{(0)}, Q_1 a_2^{(0)}, ..., Q_1 a_n^{(0)}] = [ c_1  *          ...  *
                                                                              0    a_2^{(1)}  ...  a_n^{(1)} ].   (3.1.4)

Find Q_2 = [ 1  0 ; 0  I - 2w_2w_2* ] such that (I - 2w_2w_2*) a_2^{(1)} = c_2 e_1. Then

A^{(2)} = Q_2 A^{(1)} = [ c_1  *    *          ...  *
                          0    c_2  *          ...  *
                          0    0    a_3^{(2)}  ...  a_n^{(2)} ].

We continue this process. After l - 1 steps, with l = min(m, n), the matrix A^{(l-1)} = Q_{l-1} ... Q_1 A =: R is upper triangular.
Then A = QR, where Q = Q_1* ... Q_{l-1}*.
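A compact sketch of the construction in the proof (ours, not from the notes): one Householder reflector per column, R overwritten in place and Q accumulated explicitly; the reflector is built exactly as in (3.1.2)-(3.1.3).

```python
import numpy as np

def householder_qr(A):
    """Return Q (m x m, orthogonal) and R (m x n, upper triangular) with A = Q R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    R, Q = A.copy(), np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        c = -np.sign(x[0]) * normx if x[0] != 0 else normx   # as in (3.1.2)
        u = x.copy(); u[0] -= c
        w = u / np.linalg.norm(u)                            # as in (3.1.3)
        R[k:, k:] -= 2.0 * np.outer(w, w @ R[k:, k:])        # A^{(k)} = Q_k A^{(k-1)}
        Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ w, w)         # Q = Q_1^* Q_2^* ...
    return Q, np.triu(R)

A = np.random.default_rng(1).standard_normal((6, 4))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(6))
```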

Remark 3.1.1 We usually call the method in Theorem 3.1.1 as Householder method. (Algorithm ??). Theorem 3.1.2 Let A be a nonsingular n n matrix. Then the QR- factorization is essentially unique. That is, if A = Q1 R1 = Q2 R2 , then there is a unitary diagonal matrix D = diag (di ) with |di | = 1 such that Q1 = Q2 D and DR1 = R2 .
Proof: Let A = Q_1 R_1 = Q_2 R_2. Then Q_2* Q_1 = R_2 R_1^{-1} = D must be a diagonal unitary matrix.

Remark 3.1.2 The QR-factorization is unique, if it is required that the diagonal elements of R are positive. Corollary 3.1.1 A is an arbitrary m n-matrix. The following factorizations exist: (i) A = LQ, where Q is n n unitary and L is m n lower triangular. (ii) A = QL, where Q is m m unitary and L is m n lower triangular. (iii) A = RQ, where Q is n n unitary and R is m n upper triangular.

3.1 QR-factorization (QR-decomposition) Proof: (i) A has a QR-factorization. Then (ii) Let Pm = A = QR A = R Q (i).


O 1

1 O

. Then by Theorem 3.1.1 we have Pm APn = QR. This implies (ii). A = (Pm QPm )(Pm RPn ) QL

(iii) A has a QL-factorization (from (ii)), i.e., A = QL. This implies A = L Q (iii). Cost of Householder method Consider that the multiplications in (3.1.4) can be computed in the form u1 u )A = (I vu (I 2w1 w1 )A = (I 1 )A 2 b 2 + |b1 | b 2 1 = A vu 1 A := A vw . So the rst step for a m n-matrix A requires; c1 : m multiplications, 1 root; 4k 2 : 1 multiplication; v : m divisions (= multiplications); w: mn multiplications; A(1) = A vw : m(n 1) multiplications. Similarly, for the j -th step m and n are replaced by m j +1 and n j +1, respectively. Let l = min(m, n). Then the number of multiplications is
l1

[2(m j + 1)(n j + 1) + (m j + 2)]


j =1

(3.1.5)

= l(l 1)[

2l 1 (m + n) 5/2] + (l 1)(2mn + 3m + 2n + 4) 3 (= mn2 1/3n3 , if m n).


n1

Especially, for m = n, it needs [2(n j + 1)2 + m j + 2] = 2/3n3 + 3/2n2 + 11/6n 4


j =1 ops and (l + n 2) roots. To compute Q = Q 1 Ql1 , it requires

(3.1.6)

2[m2 n mn2 + n3 /3] multiplications (m n). Remark 3.1.3 Let A = QR be a QR-factorization A. Then we have A A = R Q QR = R R.

(3.1.7)

If A has full column rank and we require that the diagonal elements of R are positive, then we obtain the Cholesky factorization of A A.


3.1.2

Gram-Schmidt method

Remark 3.1.4 Theorem 3.1.1 (or Algorithm ??) can be used to solved orthonormal basis (OB) problem. (OB) : Given linearly independent vectors a1 , , an Rn1 . Find an orthonormal basis for span{a1 , , an }. If A = [a1 , , an ] = QR with Q = [q1 , , qn ], and R = [rij ], then
k

ak =
i=1

rik qi .

(3.1.8)

By assumption rank(A) = n and (3.1.8) it implies rkk = 0. So, we have 1 qk = (ak rkk
k 1

rik qi ).
i=1

(3.1.9)

1 The vector qk can be thought as a unit vector in the direction of zk = ak k i=1 sik qi . T To ensure that zk q1 , , qk1 we choose sik = qi ak , for i = 1, , k 1. This leads to the Classical Gram-Schmidt (CGS) Algorithm for solving (OB) problem.

Algorithm 3.1.1 (Classical Gram-Schmidt (CGS) Algorithm) Given A ∈ R^{m×n} with rank(A) = n. We compute A = QR, where Q ∈ R^{m×n} has orthonormal columns and R ∈ R^{n×n} is upper triangular.
For i = 1, ..., n,
  q_i = a_i;
  for j = 1, ..., i-1:
    r_{ji} = q_j^T a_i,  q_i = q_i - r_{ji} q_j,
  end for
  r_{ii} = ||q_i||_2,  q_i = q_i / r_{ii},
end for

Disadvantage: The CGS method has very poor numerical properties if some columns of A are nearly linearly dependent.
Advantage: The method requires mn^2 multiplications (m ≥ n).

Remark 3.1.5 Modified Gram-Schmidt (MGS): Write A = Σ_{i=1}^n q_i r_i^T. Define A^{(k)} by

[0, A^{(k)}] = A - Σ_{i=1}^{k-1} q_i r_i^T = Σ_{i=k}^{n} q_i r_i^T.   (3.1.10)

It follows that if A^{(k)} = [z, B], z ∈ R^m, B ∈ R^{m×(n-k)}, then r_{kk} = ||z||_2 and q_k = z / r_{kk} by (3.1.9). Compute [r_{k,k+1}, ..., r_{kn}] = q_k^T B. Next step: A^{(k+1)} = B - q_k [r_{k,k+1}, ..., r_{kn}].

Algorithm 3.1.2 (MGS) Given A ∈ R^{m×n} with rank(A) = n. We compute A = QR, where Q ∈ R^{m×n} has orthonormal columns and R ∈ R^{n×n} is upper triangular.
For i = 1, ..., n,
  q_i = a_i;
  for j = 1, ..., i-1:
    r_{ji} = q_j^T q_i,  q_i = q_i - r_{ji} q_j,
  end for
  r_{ii} = ||q_i||_2,  q_i = q_i / r_{ii},
end for

The MGS requires mn^2 multiplications.

Remark 3.1.6 MGS computes the QR factorization so that at the k-th step the k-th column of Q and the k-th row of R are computed; in CGS, at the k-th step the k-th column of Q and the k-th column of R are computed.

Advantages for the OB problem (m ≥ n):
(i) The Householder method requires mn^2 - n^3/3 flops to get the factorization A = QR and another mn^2 - n^3/3 flops to get the first n columns of Q, whereas MGS requires only mn^2 flops. Thus for the problem of finding an orthonormal basis of range(A), MGS is about twice as efficient as Householder orthogonalization.
(ii) MGS is numerically stable.
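A side-by-side sketch of Algorithms 3.1.1 and 3.1.2 (ours, not from the notes). The only difference is whether r_{ji} is formed from the original column a_i (CGS) or from the partially orthogonalized q_i (MGS); on ill-conditioned A the loss of orthogonality ||Q^T Q - I|| is typically far worse for CGS.

```python
import numpy as np

def gram_schmidt(A, modified=True):
    """Column-by-column QR: orthonormal Q (m x n) and upper triangular R (n x n)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for i in range(n):
        q = A[:, i].copy()
        for j in range(i):
            # MGS projects the current (already updated) q, CGS the original column
            R[j, i] = Q[:, j] @ (q if modified else A[:, i])
            q -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)
        Q[:, i] = q / R[i, i]
    return Q, R

# an ill-conditioned test matrix (segment of the Hilbert matrix)
A = np.array([[1.0 / (i + j + 1) for j in range(6)] for i in range(8)])
for modified in (False, True):
    Q, R = gram_schmidt(A, modified)
    print("MGS" if modified else "CGS",
          "loss of orthogonality:", np.linalg.norm(Q.T @ Q - np.eye(6)))
```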

3.1.3

Givens method
Basic problem: Given (a, b)^T ∈ R^2, find c, s ∈ R with c^2 + s^2 = 1 such that

[ c   s ] [ a ]     [ k ]
[ -s  c ] [ b ]  =  [ 0 ],

where c = cos θ and s = sin θ. Solution:

c = 1, s = 0, k = a, if b = 0;   c = a/√(a^2+b^2), s = b/√(a^2+b^2), k = √(a^2+b^2), if b ≠ 0.   (3.1.11)

Let G(i, j, θ) denote the identity matrix modified so that its (i,i) and (j,j) entries equal cos θ, its (i,j) entry equals sin θ and its (j,i) entry equals -sin θ. Then G(i, j, θ) is called a Givens rotation in the (i, j)-coordinate plane. In the matrix Ã = G(i, j, θ)A, the rows with index ≠ i, j are the same as in A and

ã_{ik} = cos(θ) a_{ik} + sin(θ) a_{jk}, for k = 1, ..., n,
ã_{jk} = -sin(θ) a_{ik} + cos(θ) a_{jk}, for k = 1, ..., n.

Algorithm 3.1.3 (Givens orthogonalization) Given A ∈ R^{m×n}. The following algorithm overwrites A with Q^T A = R, where Q is orthogonal and R is upper triangular.
For q = 2, ..., m,
  for p = 1, 2, ..., min{q-1, n},
    find c = cos θ and s = sin θ as in (3.1.11) such that [ c  s ; -s  c ] [ a_{pp} ; a_{qp} ] = [ *  ; 0 ],
    A := G(p, q, θ) A.

This algorithm requires 2n^2(m - n/3) flops. Fast Givens method (see Matrix Computations, pp. 205-209): a modification of the Givens method based on fast Givens rotations requires about n^2(m - n/3) flops.
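A small sketch of (3.1.11) and its use (ours, not from the notes): compute one rotation and apply it to two rows of a matrix to annihilate a single subdiagonal entry.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c^2 + s^2 = 1 such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def apply_givens(A, i, j, c, s):
    """Overwrite rows i and j of A with the rotated rows; all other rows are untouched."""
    Ai, Aj = A[i].copy(), A[j].copy()
    A[i] =  c * Ai + s * Aj
    A[j] = -s * Ai + c * Aj

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])
c, s = givens(A[0, 0], A[1, 0])
apply_givens(A, 0, 1, c, s)
assert np.isclose(A[1, 0], 0.0) and np.isclose(A[0, 0], 5.0)   # hypot(3, 4) = 5
```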

3.2

Overdetermined linear Systems - Least Squares Methods


Given A ∈ R^{m×n}, b ∈ R^m and m > n. Consider the least squares (LS) problem:

min_{x ∈ R^n} ||Ax - b||_2.   (3.2.1)

Let X be the set of minimizers, X = {x ∈ R^n | ||Ax - b||_2 = min!}. It is easy to see the following properties:

x ∈ X  ⟺  A^T(b - Ax) = 0.   (3.2.2)
X is convex.   (3.2.3)
X has a unique element x_LS having minimal 2-norm.   (3.2.4)
X = {x_LS}  ⟺  rank(A) = n.   (3.2.5)

For x ∈ R^n we refer to r = b - Ax as its residual. A^T(b - Ax) = 0 is referred to as the normal equation. The minimum sum is defined by ρ_LS^2 = ||A x_LS - b||_2^2. If we let φ(x) = (1/2)||Ax - b||_2^2, then ∇φ(x) = A^T(Ax - b).

Theorem 3.2.1 Let A = Σ_{i=1}^r σ_i u_i v_i^T, with r = rank(A), U = [u_1, ..., u_m] and V = [v_1, ..., v_n], be the SVD of A ∈ R^{m×n} (m ≥ n). If b ∈ R^m, then

x_LS = Σ_{i=1}^r (u_i^T b / σ_i) v_i   (3.2.6)

and

ρ_LS^2 = Σ_{i=r+1}^m (u_i^T b)^2.   (3.2.7)

Proof: For any x ∈ R^n we have

||Ax - b||_2^2 = ||(U^T A V)(V^T x) - U^T b||_2^2 = Σ_{i=1}^r (σ_i α_i - u_i^T b)^2 + Σ_{i=r+1}^m (u_i^T b)^2,

where α = V^T x. Clearly, if x solves the LS-problem, then α_i = u_i^T b / σ_i for i = 1, ..., r. If we set α_{r+1} = ... = α_n = 0, then x = x_LS.
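A direct NumPy transcription of (3.2.6)-(3.2.7) (our own sketch, not from the notes): form the SVD, keep the singular values above a tolerance as the numerical rank r, and assemble x_LS and ρ_LS.

```python
import numpy as np

def svd_least_squares(A, b, tol=1e-12):
    """Minimal 2-norm least squares solution x_LS and residual norm rho_LS via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol * s[0]))                 # numerical rank
    coeffs = (U[:, :r].T @ b) / s[:r]               # u_i^T b / sigma_i, i = 1..r
    x_ls = Vt[:r].T @ coeffs                        # sum (u_i^T b / sigma_i) v_i     (3.2.6)
    rho_ls = np.linalg.norm(U[:, r:].T @ b)         # sqrt(sum_{i>r} (u_i^T b)^2)     (3.2.7)
    return x_ls, rho_ls

A = np.random.default_rng(2).standard_normal((8, 3)) @ np.diag([1.0, 1.0, 0.0])  # rank 2
b = np.random.default_rng(3).standard_normal(8)
x_ls, rho = svd_least_squares(A, b)
assert np.allclose(x_ls, np.linalg.pinv(A) @ b)
```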
Remark 3.2.1 If we define A^+ by A^+ = V Σ^+ U^T, where Σ^+ = diag(σ_1^{-1}, ..., σ_r^{-1}, 0, ..., 0) ∈ R^{n×m}, then x_LS = A^+ b and ρ_LS = ||(I - AA^+)b||_2. A^+ is referred to as the pseudo-inverse of A. A^+ is defined to be the unique matrix X ∈ R^{n×m} that satisfies the Moore-Penrose conditions:

(i) AXA = A,   (ii) XAX = X,   (iii) (AX)^T = AX,   (iv) (XA)^T = XA.   (3.2.8)

(3.2.8)

Existence of X is easy to check by taking X = A+ . Now, we show the uniqueness of X . Suppose X and Y satisfying the conditions (i)(iv). Then X = = = = XAX = X (AY A)X = X (AY A)Y (AY A)X (XA)(Y A)Y (AY )(AX ) = (XA)T (Y A)T Y (AY )T (AX )T (AXA)T Y T Y Y T (AXA)T = AT Y T Y Y T AT Y (AY A)Y = Y AY = Y.

If rank(A) = n (m ≥ n), then A^+ = (A^T A)^{-1} A^T. If rank(A) = m (m ≤ n), then A^+ = A^T(AA^T)^{-1}. If m = n = rank(A), then A^+ = A^{-1}.

For the case rank(A) = n:

Algorithm 3.2.1 (Normal equations) Given A ∈ R^{m×n} (m ≥ n) with rank(A) = n and b ∈ R^m. This algorithm computes the solution of the LS-problem min{||Ax - b||_2; x ∈ R^n}:
Compute d := A^T b and form C := A^T A; compute the Cholesky factorization C = R^T R (see Remark 6.1). Solve R^T y = d and R x_LS = y.

Algorithm 3.2.2 (Householder and Givens orthogonalizations) Given A ∈ R^{m×n} (m ≥ n) with rank(A) = n and b ∈ R^m. This algorithm computes the solution of the LS-problem min{||Ax - b||_2; x ∈ R^n}:
Compute the QR-factorization Q^T A = [R_1; 0] by the Householder or Givens method (here R_1 is upper triangular). Then

||Ax - b||_2^2 = ||Q^T A x - Q^T b||_2^2 = ||R_1 x - c||_2^2 + ||d||_2^2,

where Q^T b = [c; d]. Thus x_LS = R_1^{-1} c (since rank(A) = rank(R_1) = n) and ρ_LS^2 = ||d||_2^2.
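The QR route of Algorithm 3.2.2 in a few lines (a sketch of ours, not from the notes), using the library's Householder-based QR; the normal-equation route of Algorithm 3.2.1 is shown alongside for comparison.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

def ls_via_qr(A, b):
    """x_LS = R1^{-1} c from the reduced QR factorization A = Q1 R1 (rank(A) = n assumed)."""
    Q1, R1 = np.linalg.qr(A, mode='reduced')
    return solve_triangular(R1, Q1.T @ b)

def ls_via_normal_equations(A, b):
    """Solve A^T A x = A^T b with a Cholesky factorization (squares the condition number)."""
    c, low = cho_factor(A.T @ A)
    return cho_solve((c, low), A.T @ b)

A = np.random.default_rng(4).standard_normal((10, 4))
b = np.random.default_rng(5).standard_normal(10)
assert np.allclose(ls_via_qr(A, b), ls_via_normal_equations(A, b))
```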

52 Chapter 3. Orthogonalization and least squares methods Algorithm 3.2.3 (Modied Gram-Schmidt) Given A Rmn (m n) with rank(A) = n and b Rm . The solution of min Ax b 2 is given by: nn Compute A = Q1 R1 , where Q1 Rmn with QT upper tri1 Q1 = In and R1 R T T angular. Then the normal equation (A A)x = A b is transformed to the linear system 1 T R 1 x = QT 1 b xLS = R1 Q1 b. For the case rank(A) < n: Problem: (i) How to nd a solution to the LS-problem? (ii) How to nd the unique solution having minimal 2-norm? (iii) How to compute xLS reliably with innite conditioned A ? Denition 3.2.1 Let A be a m n matrix with rank(A) = r (r m, n). The factorization A = BC with B Rmr and C Rrn is called a full rank factorization, provided that B has full column rank and C has full row rank. Theorem 3.2.2 If A = BC is a full rank factorization, then A+ = C + B + = C T (CC T )1 (B T B )1 B T . Proof: From assumption follows that B + B = (B T B )1 B T B = Ir , CC + = CC T (CC T )1 = Ir . We calculate (3.2.8) with A(C + B + )A = BCC + B + BC = BC = A, (C + B + )A(C + B + ) = C + B + BCC + B + = C + B + , A(C + B + ) = BCC + B + = BB + symmetric, (C + B + )A = C + B + BC = C + C symmetric. These imply that X = C + B + satises (3.2.8). It follows A+ = C + B + . Unfortunately, if rank(A) < n, then the QR-factorization does not necessarily produce a full rank factorization of A. For example 1 1 1 A = [a1 , a2 , a3 ] = [q1 , q2 , q3 ] 0 0 1 . 0 0 1 Fortunately,we have the following two methods to produce a full rank factorization of A. (3.2.9)


3.2.1

Rank Deficiency I : QR with column pivoting

Algorithm ?? can be modified in a simple way so as to produce a full rank factorization of A:

A Π = QR,   R = [ R_{11}  R_{12} ]  } r
                [ 0       0      ]  } m-r,   (3.2.10)

where r = rank(A) < n (m ≥ n), Q is orthogonal, R_{11} is nonsingular upper triangular (r × r), and Π is a permutation. Once (3.2.10) is computed, the LS-problem can be readily solved:

||Ax - b||_2^2 = ||(Q^T A Π)(Π^T x) - Q^T b||_2^2 = ||R_{11} y - (c - R_{12} z)||_2^2 + ||d||_2^2,

where

Π^T x = [ y ]  } r            Q^T b = [ c ]  } r
        [ z ]  } n-r,                 [ d ]  } m-r.

Thus if ||Ax - b||_2 = min!, then we must have

x = Π [ R_{11}^{-1}(c - R_{12} z) ]
      [ z                         ].

If z is set to zero, we obtain the basic solution

x_B = Π [ R_{11}^{-1} c ]
        [ 0             ].

The basic solution is not the solution with minimal 2-norm, unless the submatrix R_{12} is zero, since

||x_LS||_2 = min_{z ∈ R^{n-r}} || x_B - Π [ R_{11}^{-1} R_{12} ] z ||_2.   (3.2.11)
                                          [ -I_{n-r}          ]

We now solve the LS-problem (3.2.11) by using Algorithms 3.2.1 to 3.2.3.

Algorithm 3.2.4 Given A ∈ R^{m×n} with rank(A) = r < n. The following algorithm computes the factorization A Π = QR defined by (3.2.10). The element a_{ij} is overwritten by r_{ij} (i ≤ j). The permutation Π = [e_{c_1}, ..., e_{c_n}] is determined by choosing, at each step, the column of maximal norm.

c_j := j,  r̄_j := Σ_{i=1}^m a_{ij}^2  (j = 1, ..., n).
For k = 1, ..., n:
  determine p with k ≤ p ≤ n so that r̄_p = max_{k ≤ j ≤ n} r̄_j;
  if r̄_p = 0 then stop; else
    interchange c_k and c_p, r̄_k and r̄_p, and a_{ik} and a_{ip}, for i = 1, ..., m;
    determine a Householder matrix Q̃_k such that Q̃_k (a_{kk}, ..., a_{mk})^T = (*, 0, ..., 0)^T;
    A := diag(I_{k-1}, Q̃_k) A;  r̄_j := r̄_j - a_{kj}^2 (j = k+1, ..., n).


Chapter 3. Orthogonalization and least squares methods This algorithm requires 2mnr r2 (m + n) + 2r3 /3 ops. Algorithm 3.2.4 produces the full rank factorization (3.2.10) of A. We have the following important relations: |r11 | |r22 | . . . |rrr |, rjj = 0, j = r + 1, . . . , n, |rii | |rik |, i = 1, . . . , r, k = i + 1, . . . , n. (3.2.12)
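For experiments, a column-pivoted QR as in (3.2.10) is available in SciPy. The sketch below (ours, not from the notes) estimates the numerical rank from the diagonal of R, in the spirit of the relations (3.2.12), and forms the basic solution x_B.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def basic_solution(A, b, tol=1e-10):
    """QR with column pivoting: A[:, perm] = Q R; basic solution from the leading r columns."""
    Q, R, perm = qr(A, pivoting=True)
    diag = np.abs(np.diag(R))
    r = int(np.sum(diag > tol * diag[0]))        # estimated rank from |r_11| >= |r_22| >= ...
    c = Q.T[:r] @ b
    y = solve_triangular(R[:r, :r], c)           # R_11 y = c
    x = np.zeros(A.shape[1])
    x[perm[:r]] = y                              # undo the permutation; z = 0
    return x, r

A = np.random.default_rng(6).standard_normal((9, 5))
A[:, 4] = A[:, 0] + A[:, 1]                      # force rank(A) = 4
b = np.random.default_rng(7).standard_normal(9)
x_b, r = basic_solution(A, b)
assert r == 4
assert np.allclose(A.T @ (A @ x_b - b), 0.0, atol=1e-8)   # x_B satisfies the normal equations
```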

Here, r = rank(A) < n, and R = (rjj ). In the following we show another application of the full rank factorization for solving the LS-problem. Algorithm 3.2.5 (Compute xLS = A+ b directly) (i) Compute (3.2.10): A = QR ( Q(1) | Q(2) )
r + (1) + (1) (ii) (A)+ = R1 Q = R1 Q . + (iii) Compute R1 : + T T 1 Either: R1 = R1 (R1 R1 ) (since R1 has full row rank) + T T 1 (1)T (A) = R1 (R1 R1 ) Q .
+ T

R1 0

}r }m-r

, A = Q(1) R1 .

using Householder transformation (Algorithm ??) such that QR T = Or: Find Q 1 T , where T Rrr is upper triangular. 0 (1) , Q (2) ) RT = Q (1) T + Q (2) 0 = Q (1) T . Let QT := (Q 1 T + (1) R1 (1)T )+ (T T )+ = Q (1) (T T )1 . R1 = T T Q = (Q T (1) (T T )1 Q(1) . (A)+ = Q (iv) Since min Ax b 2 = min A(T x) b xLS = (A )+ b .
2

( T x )LS = (A )+ b

Remark 3.2.2 Unfortunately, QR with column pivoting is not entirely reliable as a method for detecting near rank deciency. For example: 1 c c c 1 c c n1 Tn (c) = diag(1, s, , s ) . . . . . . 0 1

c2 + s2 = 1, c, s > 0.

If n = 100, c = 0.2, then n =0.3679e 8. But this matrix is unaltered by Algorithm 3.2.4. However,the degree of unreliability is somewhat like that for Gaussian elimination with partial pivoting, a method that works very well in practice.

3.2 Overdetermined linear Systems - Least Squares Methods

55

3.2.2

Rank Deficiency II : The Singular Value Decomposition

Algorithm 3.2.6 (Householder Bidiagonalization) Given A Rmn (m n). The T following algorithm overwrite A with UB AVB = B , where B is upper bidiagonal and UB and VB are orthogonal. For k = 1, , n, k of order Determine a Householder matrix U akk . . . Uk . = . . amk k )A, A := diag(Ik1 , U k of order n k + 1 such that If k 2, then determine a Householder matrix V k = (, 0, , 0), [ak,k+1 , , akn ]V k ). A := Adiag(Ik , V This algorithm requires 2mn2 2/3n3 ops. Algorithm 3.2.7 (R-Bidiagonalization) when m method of bidiagonalization. n we can use the following faster R1 , where R1 Rnn is 0 n k + 1 such that 0 , . . . 0

(1) Compute an orthogonal Q1 Rmm such that QT 1A = upper triangular.

n n (2) Applying Algorithm 3.2.6 to R1 , we get QT 2 R1 VB = B1 , where Q2 , VB R nn orthogonal and B1 R upper bidiagonal. T (3) Dene UB = Q1 diag(Q2 , Imn ). Then UB AVB =

B1 0

B bidiagonal.

This algorithm require mn2 + n3 . It involves fewer compuations comparing with Algorithm 7.6 (2mn2 2/3n3 ) whenever m 5/3n. Once the bidiagonalization of A has been achieved,the next step in the Golub-Reinsch SVD algorithm is to zero out the super diagonal elements in B . Unfortunately, we must defer our discussion of this iteration until Chapter 5 since it requires an understanding of the symmetric QR algorithm for eigenvalues. That is, it computes orthogonal matrices U and V such that T BV = = diag(1 , , n ). U By dening U = UB U and V = VB V , we see that U T AV = is the SVD of A.

56

Chapter 3. Orthogonalization and least squares methods Algorithms Algorithm 3.2.1 Normal equations Algorithm 3.2.2 Householder orthogonalization Algorithm 3.2.3 Modied Gram-Schmidt Algorithm 3.1.3 Givens orthogonalization Algorithm 3.2.6 Householder Bidiagonalization Algorithm 3.2.7 R-Bidiagonalization LINPACK Golub-Reinsch SVD Algorithm 3.2.5 QR-with column pivoting Alg. 3.2.7+SVD Chan SVD Table 3.1: Solving the LS problem (m n) Flop Counts mn2 /2 + n3 /6 mn2 n3 /3 mn2 2mn2 2/3n3 2mn2 2/3n3 mn2 + n3 2mn2 + 4n3 2mnr mr2 + 1/3r3 mn2 + 11/2n3

rank(A)=n

rank(A) < n

Remark 3.2.3 If the LINPACK SVD Algorithm is applied with eps=1017 to 1 c c c 1 c c T100 (0.2) = diag(1, s, , sn1 ) , . . . . . . 0 1 then n = 0.367805646308792467 108 . Remark 3.2.4 As we mentioned before, when solving the LS problem via the SVD, only and V have to be computed (see (3.2.6)). Table 3.1 compares the eciency of this approach with the other algorithms that we have presented.

3.2.3

The Sensitivity of the Least Squares Problem


k i=1 T i ui vi ,

Corollary 3.2.1 (of Theorem 1.2.3) Let U = [u1 , , um ], V = [v1 , , vn ] and U AV = = diag (1 , , r , 0, , 0). If k < r = rank(A) and Ak = Then
rank(B )=k

min

AB

= A Ak

= k+1 .

Proof: Since U T Ak V = diag(1 , , k , 0, , 0), it follows rank(Ak ) = k and that A Ak


2

= U T (A Ak )V

= diag(0, , 0, k+1 , , r )

= k+1 .

Suppose B Rmn and rank(B ) = k , i.e., there are orthogonal vectors x1 , , xnk such that N (B ) = span{x1 , , xnk }. This implies span{x1 , , xnk } span{v1 , , vk+1 } = {0}.

3.2 Overdetermined linear Systems - Least Squares Methods Let z be a unit vector in the intersection set. Then Bz = 0 and Az = Thus,
k+1 k+1 i=1

57
T i (vi z )ui .

AB

2 2

(A B )z

2 2

= Az

2 2

=
i=1

2 T 2 i (vi z )2 k +1 .

3.2.4

Condition number of a Rectangular Matrix

Let A Rmn , rank(A) = n, 2 (A) = max (A)/min (A). (i) The method of normal equation:
xRn

min Ax b

AT Ax = AT b.

(a) C = AT A, d = AT b. (b) Compute the Cholesky factorization C = GGT . (c) Solve Gy = d and GT xLS = y . Then x LS xLS xLS
2

eps2 (AT A) = eps2 (A)2 . + o(2 ),

x x F f (A) + x A b where (A + F ) x = b + f and Ax = b. (ii) LS solution via QR factorization Ax b


2 2

= QT Ax QT b

2 2

= R1 x c
2.

2 2

+ d 2 2,

1 xLS = R1 c , LS = d

Numerically, trouble can be expected wherever 2 (A) = 2 (R) 1/eps. But this is in contrast to normal equation, Cholesky factorization becomes problematical once 2 (A) is in the neighborhood of 1/ eps. Remark 3.2.5 A
2

(AT A)1 AT T 1 A 2 2 (A A)

2 2

= 2 (A), = 2 (A)2 .

Theorem 3.2.3 Let A Rmn , (m n), b = 0. Suppose that x, r, x , r satisfy Ax b = min!, r = b Ax, LS = r (A + A) x (b + b) 2 = min!, r = (b + b) (A + A) x.
2,

58 If

Chapter 3. Orthogonalization and least squares methods = max { A 2 b 2 n (A) , }< A 2 b 2 1 (A) LS = 1, b 2

and sin = then x x x 2 r r b 2


2 2

22 (A) + tan 2 (A)2 } + O(2 ) cos

and

(1 + 22 (A)) min(1, m n) + O(2 ).

Proof: Let E = A/ and f = b/. Since A 2 < n (A), by previous Corollary follows that rank(A + E ) = n for t [0, ]. [t = A + tE = A + A. If rank(A + A) = k < n, then A (A + A) 2 = A 2 A Ak 2 = k+1 n . Contradiction! So min A B 2 = A Ak 2
rank(B )=k

= A

k i=1

T i ui vi

= k+1 ]. (A + tE )T (A + tE )x(t) = (A + tE )T (b + tf ). (3.2.13)

Hence we have, Since x(t) is continuously dierentiable for all t [0, ], x = x(0) and x = (), it follows that x = x + x (0) + O(2 ) and x x x (0) = x x
2

+ O(2 ).

Dierentiating (3.2.13) and setting t = 0 then we have E T Ax + AT Ex + AT Ax (0) = AT f + E T b. Thus, x (0) = (AT A)1 AT (f Ex) + (AT A)1 E T r. From f
2

and E x x x 2
2

follows
2

{ A +

(AT A)1 AT A
2 2 2

2(

b A
2

LS A 2 x

+ 1)
2

(AT A)1 2 } + O(2 ).

Since AT (AxLS b) = 0, AxLS AxLS b and then b Ax and A


2 2 2 2

+ Ax b
2 2

2 2

= b

2 2

2 2

2 LS .

3.2 Overdetermined linear Systems - Least Squares Methods Thus, x x 2 1 sin eps{2 (A)( + 1) + 2 (A)2 } + O(2 ). x 2 cos cos Furthermore, by
sin cos

59

LS , 2 b 2 2 LS 2

we have

x x x 2

eps(2 (A) + 2 (A)2 LS ). ( : small )

Remark 3.2.6 Normal equation: eps 2 (A)2 . QR-approach: eps(2 (A) + LS 2 (A)2 ). (i) If LS is small and 2 (A) is large, then QR is better than the normal equation. (ii) The normal equation approach involves about half of the arithmetic when m and does not requires as much storage. n

(iii) The QR approach is applicable to a wider class of matrices because the Cholesky to AT A break down before the back substitution process on QT A = R.

3.2.5

Iterative Improvement
Im A AT 0 r x = b 0 , b Ax
2

= min!

r + Ax = b, f (k) g (k ) This implies,

AT r = 0 AT Ax = AT b. Thus, = b 0 I A AT 0 r(k+1) x(k+1) r(k) x(k) and I A AT 0 p(k) z (k ) implies that p(k) z (k ) = f (k) g (k ) .

r(k) x(k)

+ f g

If A = QR = Q

R1 , then 0 In 0 T R1 f1 f2 QT p =

I A AT 0 0 Imn 0

p = z R1 h 0 f2 z 0

f1 = f2 , g

where QT f =

h T T h = g h = R1 g . Then . Thus, R1 f2 P =Q h f2 .

1 (f1 h), z = R1

60

Chapter 3. Orthogonalization and least squares methods

Chapter 4 Iterative Methods for Solving Large Linear Systems


4.1 General procedures for the construction of iterative methods

Given a linear system of nonsingular A Ax = b. Let A=F G (4.1.2) (4.1.1)

with F nonsingular. Then (4.1.1) is equivalent to F x = Gx + b; or letting T = F 1 G and f = F 1 b we have x = T x + f. Set x(k+1) = T x(k) + f, where x(0) is given. Then the solution x of (4.1.1) is determined by iteration. (4.1.4) (4.1.3)

Example 4.1.1 We consider the standard decomposition of A A = D L R, (4.1.5)

where A = [a_{ij}]_{i,j=1}^n, D = diag(a_{11}, a_{22}, ..., a_{nn}), -L is the strictly lower triangular part of A (the entries a_{ij} with i > j) and -R is the strictly upper triangular part of A (the entries a_{ij} with i < j). For a_{ii} ≠ 0, i = 1, ..., n, D is nonsingular. If we choose F = D and G = L + R in (4.1.2), we obtain the Total-step method (Jacobi method):

x^{(k+1)} = D^{-1}(L + R) x^{(k)} + D^{-1} b   (4.1.6)

or, componentwise,

x_j^{(k+1)} = (1/a_{jj}) ( - Σ_{i≠j} a_{ji} x_i^{(k)} + b_j ),  j = 1, ..., n,  k = 0, 1, ....   (4.1.7)

Example 4.1.2 If D - L is nonsingular in (4.1.5), then the choices F = D - L and G = R in (4.1.2) are possible and yield the so-called Single-step method (Gauss-Seidel method):

x^{(k+1)} = (D - L)^{-1} R x^{(k)} + (D - L)^{-1} b   (4.1.8)

or, componentwise,

x_j^{(k+1)} = (1/a_{jj}) ( - Σ_{i<j} a_{ji} x_i^{(k+1)} - Σ_{i>j} a_{ji} x_i^{(k)} + b_j ),  j = 1, ..., n,  k = 0, 1, ....   (4.1.9)
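A short sketch of (4.1.7) and (4.1.9) (ours, not part of the notes), iterating both methods on a small diagonally dominant system; with such a matrix both converge, and Gauss-Seidel typically needs fewer sweeps.

```python
import numpy as np

def jacobi_step(A, b, x):
    """One Total-step (Jacobi) sweep: every component uses only the old iterate x^(k)."""
    D = np.diag(A)
    return (b - (A @ x - D * x)) / D

def gauss_seidel_step(A, b, x):
    """One Single-step (Gauss-Seidel) sweep: components i < j already hold x^(k+1)."""
    x = x.copy()
    for j in range(len(b)):
        x[j] = (b[j] - A[j, :j] @ x[:j] - A[j, j+1:] @ x[j+1:]) / A[j, j]
    return x

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
x_exact = np.linalg.solve(A, b)
for step in (jacobi_step, gauss_seidel_step):
    x = np.zeros(3)
    for k in range(50):
        x = step(A, b, x)
    print(step.__name__, "error after 50 sweeps:", np.linalg.norm(x - x_exact))
```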

- Total-Step method = TSM = Jacobi method.
- Single-Step method = SSM = Gauss-Seidel method.

We now consider (4.1.1)-(4.1.4) once again.

Theorem 4.1.1 Let 1 not be an eigenvalue of T and let x* be the unique solution of (4.1.3). The sequence x^{(k+1)} = T x^{(k)} + f converges to x* for an arbitrary initial vector x^{(0)} if and only if ρ(T) < 1.

Proof: We define the error ε^{(k)} = x^{(k)} - x*. Then

ε^{(k)} = x^{(k)} - x* = T x^{(k-1)} + f - T x* - f = T ε^{(k-1)},   (4.1.10)

or ε^{(k)} = T^k ε^{(0)}. Theorem 1.2.4 shows that ε^{(k)} → 0 for every ε^{(0)} if and only if ρ(T) < 1.

We now compare Examples 4.1.1 and 4.1.2 from two points of view: (i) flop count per iteration step; (ii) convergence speed. Let ||·|| be a vector norm and ||T|| the corresponding operator norm. Then

||ε^{(m)}|| / ||ε^{(0)}|| ≤ ||T^m ε^{(0)}|| / ||ε^{(0)}|| ≤ ||T^m||.   (4.1.11)

Here ||T^m||^{1/m} is a measure of the average reduction of the error per iteration step over m steps. We call

R_m(T) = -ln( ||T^m||^{1/m} ) = -(1/m) ln ||T^m||   (4.1.12)

the average convergence rate for m iterations. The larger R_m(T) is, the better the convergence rate. Let σ = ( ||ε^{(m)}|| / ||ε^{(0)}|| )^{1/m}. From (4.1.11) and (4.1.12) we get

σ ≤ ||T^m||^{1/m} = e^{-R_m(T)},  hence  σ^{1/R_m(T)} ≤ 1/e.

That is, after 1/R_m(T) steps, on average, the error is reduced by a factor of at least 1/e. Since R_m(T) is not easy to determine, we consider m → ∞. Since

lim_{m→∞} ||T^m||^{1/m} = ρ(T),

it follows that

R_∞(T) = lim_{m→∞} R_m(T) = -ln ρ(T).   (4.1.13)

R is called the asymptotic convergence rate. It holds always Rm (T ) R (T ). Example 4.1.3 Consider the Dirichlet boundary-value problem (Model problem): u uxx uyy = f (x, y ), 0 < x, y < 1, u(x, y ) = 0 (x, y ) , for the unit square := {x, y |0 < x, y < 1} R2 with boundary . To solve (4.1.14) by means of a dierence methods, one replaces the dierential operator by a dierence operator. Let h := {(xi , yi )|i, j = 1, , N + 1}, h := {(xi , 0), (xi , 1), (0, yj ), (1, yj )|i, j = 0, 1, , N + 1}, (4.1.14)

64 Chapter 4. Iterative Methods for Solving Large Linear Systems where xi = ih, yj = jh, i, j = 0, 1, , N + 1, h := N1 , N 1, an integer. +1 The dierential operator uxx uyy can be replaced for all (xi , yi ) h by the dierence operator: 4ui,j ui1,j ui+1,j ui,j 1 ui,j +1 (4.1.15) h2 up to an error i,j . Therefore for suciently small h one can thus expect that the solution zi,j , for i, j = 1, , N of the linear system 4zi,j zi1,j zi+1,j zi,j 1 zi,j +1 = h2 fi,j , i, j = 1, , N, z0,j = zN +1,j = zi,0 = zi,N +1 = 0, i, j = 0, 1, , N + 1, (4.1.16)

obtained from (4.1.15) by omitting the error i,j , agrees approximately with the ui,j . Let z = [z1,1 , z2,1 , , zN,1 , z1,2 , , zN,2 , , z1,N , , zN,N ]T and b = h2 [f1,1 , , fN,1 , f1,2 , , fN,2 , , f1,N , , fN,N ]T . (4.1.17b) Then (4.1.16) is equivalent to a linear system Az = b with the N 2 N 2 matrix. 1 1 .. .. .. . . . 1 .. . 1 1 4 4 4 1 ... ... ... 1 .. . 4 . .. . 1 ... ... ... ... ... ... 1 1 4 1 .. .. .. . . . 1 .. .. . . 1 1 1 4 (4.1.17a)

1 ... A =

..

... 1

1 . . 1 . . . . ... 1 .. . ... ...

A A 2,1 . 2,2 ..

A1,1 A1,2

... .. . AN,N 1 AN 1,N AN,N .

(4.1.18)

Let A = D L R. The matrix J = D1 (L + R) belongs to the Jacobi method (TSM). The N 2 eigenvalues and eigenvectors of J can be determined explicitly. We can verify at

4.1 General procedures for the construction of iterative methods once, by substitution, that N 2 vectors z (k,l) , k, l = 1, , N with components zi,j := sin satisfy Jz (k,l) = (k,l) z (k,l) with 1 k l (k,l) := (cos + cos ), 1 k, l N. 2 N +1 N +1
(k,l)

65

ki lj sin , 1 i, j N, N +1 N +1 (4.1.19)

J thus has eigenvalues (k,l) , 1 k, l N . Then we have (J ) = 1,1 and R (J ) = ln(1 These show that (i) TSM converges; Nevertheless, (ii) Diminution of h will not only enlarge the op counts per step, but also the convergence speed will drastically make smaller. sectionSome Remarks on nonnegative matrices 2 h2 = cos =1 + O(h4 ) N +1 2 2 h2 2 h2 + O(h4 )) = + O(h4 ). 2 2 (4.1.20)

(4.1.21)

4.1.1

Some theorems and denitions


(T ): A measure of quality for convergence.

Denition 4.1.1 A real m n-matrix A = (aik ) is called nonnegative (positive), denoted by A 0 (A > 0), if aik 0 (> 0), i = 1, , m, k = 1, , n. Remark 4.1.1 Let Kn = {x|xi 0, i = 1, , n} Rn . It holds A Rmn , A 0 AKn Km . Especially, for m = n, A 0 AK K, K = K is a cone. = {1, 2, , n}. Let N , I = Denition 4.1.2 An m n-matrix A is called reducible, if there is a subset I N such that i I , j I aij = 0. A is not reducible A is irreducible. , I = N Remark 4.1.2 G(A) is the directed graph associated with the matrix A. If A is an n n-matrix, then G(A) consists of n vertices P1 , , Pn and there is an (oriented) arc Pi Pj in G(A) precisely if aij = 0.

66

Chapter 4. Iterative Methods for Solving Large Linear Systems It is easily shown that A is irreducible if and only if the graph G(A) is connected in the sense that for each pair of vertices (Pi , Pj ) in G(A) there is an oriented path from Pi to Pj . i.e., if i = j , there is a sequence of indices i = i1 , i2 , , is = j such that (ai1 ,i2 ais1 ,is ) = 0. Lemma 4.1.1 If A 0 is an irreducible n n matrix, then (I + A)n1 > 0. Proof: It is sucient to prove for any x 0, (I + A)n1 x > 0. Let xk+1 = (I + A)xk be a sequence of nonnegative vectors, for 0 k n 2 with x0 = x. We now verify that xk+1 has fewer zero components than does xk for every 0 k n 2. Since xk+1 = xk + Axk , it is clear that xk+1 has no more zero components than xk . If xk+1 and xk has exactly the same number of zero components, then for a suitable permutation P we have P xk+1 = Then 0 , P xk = 0 0 0 , > 0, > 0, , Rm , 1 m n. A11 A12 A21 A22 0

This implies A21 = 0. But A21 0 and > 0, it follows A21 = 0. It contradicts that A is irreducible. Thus xk+1 has fewer components and xk has at most (n k 1) zero component. Hence xn1 = (I + A)n1 x0 is a positive vector. (See also Miroslav Fiedler: Special Matrices and their applications in Numerical Mathematics for the following theorems.) Lemma 4.1.2 If A, B are squared matrices and |A| B , then (A) (B ). In particular, (A) (|A|). Proof: Suppose |A| B , but (A) > (B ). Let s satisfy (A) > s > (B ), P = ( 1 )A s 1 1 1 and Q = ( s )B . Then (P ) = s (A) > 1, (Q) = s (B ) < 1. This means that lim Qk = 0. But |P k | |P |k Qk this implies lim P k = 0, i.e., (P ) < 1. Contradiction!
k 0

Lemma 4.1.3 Let A 0, z 0. If is a real number satises Az > z , then (A) > . Proof: Assume 0. Clearly, z = 0. Since Az > z , there is an > 0 such that Az ( + )z . It means that B = ( + )1 A satises Bz z . Thus, B k z B k1 z z, for k > 0 (integer). Hence B k does not converge to the null matrix. This implies, (B ) 1 and (A) + > . Theorem 4.1.4 (Perron-Frobenius Theorem) Let A 0 be irreducible. Then (A) is a simple positive eigenvalue of A and there is a positive eigenvector belonging to (A). No nonnegative eigenvector belongs to any other eigenvalue of A.

4.1 General procedures for the construction of iterative methods 67 Remark 4.1.3 (A) is called a Perron root of A. The eigenvector corresponding to (A) is called a Perron vector. Lemma 4.1.1 (Perron Lemma) If A > 0, then (A) is a positive eigenvalue of A and there is only one linearly independent eigenvector corresponding to the eigenvalue (A). Moreover, this eigenvector may be chosen to be positive. Proof: The lemma holds for n = 1. Let n > 1 and A > 0. There exists an eigenvalue of A such that (A) = ||. Let Au = u, u = 0. (4.1.1) Since |v + w| |v | + |w|, if v, w C, , R+ , then = holds exits complex unit such that v 0 and w 0. Generalization: Since | Then
n i=1

i vi |

n i=1

i |vi |, for v1 , . . ., vn C and 1 , . . ., n R+ .

= holds complex unit such that vi 0, i = 1, , n. Use this result to show u in (4.1.1) has the property that there is a complex unit such that ui 0, for i = 1, , n. (4.1.2) To prove this, assume (4.1.2) does not hold. Then we have
n n

|||uk | = |
j =1

akj uj | <
j =1

akj |uj |

in k -th equation of (4.1.1). By the above statement, this is true for k = 1, , n. Thus, A|u| > |||u|. From Lemma 4.1.3 follows that || < (A), which contradicts that || = (A). Therefore, the inequality (4.1.2) implies v = u, v = 0 nonnegative and from (4.1.1) follows Av = v. (4.1.3) If vk = 0 and thus vk > 0, then the k-th equation in (4.1.3) gives > 0. Hence = (A) and using (4.1.3) again follows v > 0. In particular, we have proved the implication: if is an eigenvalue such that || = (A) and if u is an associated eigenvector then |u| > 0. Suppose that there are two linearly independent eigenvectors v = (vi ) and w = (wi ), belonging to . As v = 0, there is an integer k such that vk = 0. The vector z = 1 )v is also an eigenvector of A belonging to . Since z = 0, but zk = 0, this w (wk vk contradicts the proved results in above which states that |z | > 0. Corollary 4.1.1 Let A > 0. Then || < (A) for every eigenvalue = (A).

68 Chapter 4. Iterative Methods for Solving Large Linear Systems Proof: || (A) for all eigenvalues of A. Suppose || = (A) and Ax = x, x = 0. By Perron Lemma there is an w = ei x > 0 for some R such that Aw = w. But then = (A). Contradictions! Proof of Theorem 4.1.4: Since A 0 irreducible, (I + A)n1 is positive by Lemma 4.1.1. Also, (I + AT )n1 = ((I + A)n1 )T is positive. By Perron Lemma there is an y > 0 such that y T (I + A)n1 = ((I + A)n1 )y T . Let be the eigenvalue of A satisfying || = (A) and Ax = x, x = 0. Further, 2 (A)|x| (A)A|x| = A(A)|x| A2 |x|, and in general k (A)|x| Ak |x|, for k = 1, 2, . Hence (1 + (A))n1 |x| (I + A)n1 |x|. Multiplying y T from left it implies (1 + (A))n1 (y T |x|) y T (I + A)n1 |x|. From (4.1.4) follows that R.H.S = ((I + A)n1 )y T |x|. Since y T |x| > 0, it implies (1 + (A))n1 ((I + A)n1 ). (4.1.7) (4.1.6) (4.1.5) (4.1.4)

The eigenvalues of (I + A)n1 are of the form (1 + )n1 , where is an eigenvalue of A. Hence there is an eigenvalue of A such that |(1 + )n1 | = ((I + A)n1 ). On the other hand, we have || (A). Substituting into (4.1.7), we get (1 + (A))n1 |(1 + )n1 | and further 1 + (A) |1 + | 1 + || 1 + (A). Since the left-hand and right-hand sides coincide, we have equality everywhere. Thus 0 and hence = (A). Equality is valid in all the inequalities that we have added, i.e., in (4.1.5). For k = 1, it follows A|x| = (A)|x| or A|x| = |x|. In view of (4.1.6) and (4.1.8) follows (I + A)n1 |x| = |1 + |n1 |x| = ((I + A)n1 )|x|. (4.1.8)

4.1 General procedures for the construction of iterative methods 69 Using Perrons Lemma, we get |x| > 0. From this we know that there is only one linearly independent eigenvector belonging to eigenvalue by the same argument as that used in the last paragraph of the proof of Perrons Lemma. Moreover, (A) > 0 as A is distinct from the null matrix (n > 1)!. Consequently, we want to claim: (A) is a simple eigenvalue of A if and only if (i) there is a unique linearly independent eigenvector of A to , say u and also only one linearly independent eigenvector of AT belonging to , say v . (ii) v T u = 0. Indeed, only one linearly independent eigenvector of A, say u, belongs to (A). Moreover u > 0. Similarly, AT 0 irreducible. The respective eigenvector v of AT (to (A)) can be chosen positive as well v > 0. Therefore v T u > 0 and by Schur Lemma follows that (A) is simple. Finally, we show that no nonnegative eigenvector belongs to any other eigenvalue. Suppose Az = z, z 0 and = (A). We have shown that AT has a positive eigenvector, say w > 0. Then, AT w = (A)w. But, wT Az = wT z = (wT z ), i.e., wT Az = (A)(wT z ), which is a contradiction in view of (A) = 0 and wT z > 0. Theorem 4.1.5 Let A 0, x > 0. Dene the quotients: (Ax)i 1 qi (x) = xi xi Then
1in n

aik xk , for i = 1, , n.
k=1

(4.1.9)

min qi (x) (A) max qi (x).


1in

(4.1.10)

If A is irreducible, then it holds additionally, either q1 = q2 = qn (then x = z, qi = (A)) or


1in

(4.1.11) (4.1.12)

min qi (x) < (A) < max qi (x).


1in

Proof: We rst assume that A is irreducible. Then AT is irreducible. From Theorem 4.1.4 there exists y > 0 such that AT y = (AT )y = y . Since Ax = Qx with Q = diag (q1 , , qn ), it follows
n n

qi yi xi = y Qx = y Ax = y x =
i=1 i=1

y i xi

70 or

Chapter 4. Iterative Methods for Solving Large Linear Systems


n

(qi )yi xi = 0.
i=1

Now there is either qi = 0, for all i = 1, , n, that is (4.1.11) holds or there is a qi = . Since yi xi > 0, so (4.1.12) holds. (4.1.10) follows from the consideration of the limiting case. Theorem 4.1.6 The statements in Theorem 4.1.5 can be formulated as: Let A 0, x > 0. (4.1.10) corresponds: Ax x , (4.1.13) Ax x . Let A 0, irreducible, x > 0. (4.1.12) corresponds : Ax x, Ax = x Ax x, Ax = x < , < . (4.1.14)

Theorem 4.1.7 (Perron and Frobenius 1907-1912, see Varga pp.30) Let A 0 irreducible. Then (i) = (A) is a simple eigenvalue; (ii) There is a positive eigenvector z associated to , i.e., Az = z, z > 0; (iii) If Ax = x, x 0, then = , x = z , > 0. i.e, if x is any nonnegative eigenvector of A, then x is a multiplicity of z ; (iv) A B, A = B = (A) < (B ). Note that (i), (ii) and (iii) follows by Theorem 4.1.4 immediately. The proof of (iv) follows from Lemma 4.1.12 in Appendix. Theorem 4.1.8 (See Varga pp.46) If A 0, then (i) = (A) is an eigenvalue. (ii) There is a z 0, z = 0 with Az = z . (iii) A B = (A) (B ). Note that If A 0 reducible, then A is a limit point of irreducible nonnegative matrices. Hence some parts of Theorem 4.1.7 are preserved.

4.1 General procedures for the construction of iterative methods

71

Appendix
Let A = [aij ] 0 be irreducible and x 0 be any vector. Let rx min
xi >0 n j =1

aij xj xi 0.

Then, rx = sup{ 0 | Ax x}. Consider r = sup {rx }.


x>0,x=0

(4.1.15) (4.1.16)

Since rx and rx have the same value for all > 0, we only consider x = 1 and x 0. Let P = {x | x 0, x = 1} and Q = {y | (I + A)n1 x, x P }. From Lemma 4.1.1 follows Q consists only of positive vector. Multiplying Ax rx x by (I + A)n1 , we get Ay rx y (by (9.15)). Thus ry rx . The quantity r of (4.1.16) can be dened equivalently as r = sup{ry }.
y Q

(4.1.17)

Note that ry : Q R taking its maximum. As P is compact, so is Q, and as ry is a continuous function on Q, there exists a positive z for which Az rz (4.1.18)

and no vector w 0 exists for which Aw > rw. All non-negative nonzero z satifying (4.1.18) is called an extremal vector of the matrix A. Lemma 4.1.9 Let A 0 be irreducible. The quantity r of (4.1.16) is positive. Moreover, each extremal vector z is a positive eigenvector of A with corresponding eigenvalue r. i.e., Az = rz, z > 0. Proof: If is positive and i = 1, then since A is irreducible, no row of A can vanish. Thus no component of A can vanish. Thus r > 0. Proving that r > 0. Let z be an extremal vector which Az rz = , 0. If = 0, then some component of is positive. Multiplying both sides by (I + A)n1 we get Aw rw > 0, w = (I + A)n1 z > 0. Therefore, rw > r which contradicts (4.1.17). Thus Az = rz . Since w > 0 and w = (1 + r)n1 z , it follows z > 0.

72 Chapter 4. Iterative Methods for Solving Large Linear Systems Lemma 4.1.10 Let A = [aij ] 0 be irreducible and B = [bij ] be a complex matrix with |B | A. If is any eigenvalue of B , then | | r, (4.1.19)

where r is the positive constant of (4.1.16). Moreover, equality in (4.1.19) holds, i.e., = rei , if and only if, |B | = A, and B has the form B = ei DAD1 , where D is diagonal whose diagonal entries have modulus unity. Proof: If By = y, y = 0, then yi =
n j =1

(4.1.20)

bij yi , 1 i n. Thus,

| ||y | |B ||y | A|y |. This implies, | | r|y| r. Hence, (4.1.19) is proved. If | | = r, then |y | is an extremal vector of A. From Lemma 4.1.9 follows that |y | is a positive eigenvector of A corresponding to the eigenvalue r. Thus, r|y | = |B ||y | = A|y |. Since |y | > 0, from (4.1.21) and |B | A follows |B | = A. For vector y , (|y | > 0), we set D = diag Then y = D |y | Setting = rei , then By = y can be written as C |y | = r |y | where C = ei D1 BD. From (4.1.21) and (4.1.24) follows that C |y | = |B ||y | = A|y |. (4.1.26) (4.1.25) (4.1.24) (4.1.23) y yn , , |y 1 | |y n | . (4.1.22) (4.1.21)

From the denition of C in (4.1.25) follows that |C | = |B |. Combining with (4.1.22) we have |C | = |B | = A (4.1.27) Thus, from (4.1.26) we conclude that C |y | = |C ||y |, and as |y | > 0, follows C = |C |, and thus C = A from (4.1.27). Combing this result with (4.1.25) gives B = ei DAD1 . Conversely, it is obvious that B has the form in (4.1.20), then |B | = A. So, B has an eigenvalue with | | = r. Setting B = A in Lemma 4.1.10, we have

4.1 General procedures for the construction of iterative methods 73 Corollary 4.1.11 If A 0 is irreducible, then the positive eigenvalue r of Lemma 4.1.9 equals the spectral radius (A) of A. Lemma 4.1.12 If A 0 is irreducible and B is any principal squared submatrix of A, then (B ) < (A). Proof: There is a permutation P such that C= A11 0 0 0 and P AP T = A11 A12 A21 A22 .

Clearly, 0 C P AP T and (C ) = (B ) = (A11 ). But as C = |C | = P AP T follows that (B ) < (A).

4.1.2

The theorems of Stein-Rosenberg

Remark 4.1.4 Let D be nonsingular in the standard decomposition (4.1.5) A = D L R. = D 1 A = D L R , where D = I, L = D1 L and R = D1 R. Then we Consider A have 1 (L +R ) D1 (L + R) = D1 L + D1 R = D and L )1 R. (D L)1 R = (I D1 L)1 D1 R = (D

When we investigate TSM and SSM, we can without loss of generality suppose that D = I . Therefore in the following paragraph we assume that A = I L R. The iteration matrices of TSM and SSM become J = L + R, H = (I L)1 R, (4.1.29) (4.1.30) (4.1.28)

respectively. If L 0 and R 0, then J and H = (I L)1 R = (I + L + + Ln1 )R are nonnegative. Here, we have Ln = 0. Theorem 4.1.13 Let A = I L R, L 0, R 0, n 2. Then precisely one of the following relationships holds: (i) 0 = (H ) = (J ), (ii) 0 < (H ) < (J ) < 1, (iii) (H ) = (J ) = 1, (iv) (H ) > (J ) > 1.

74 Chapter 4. Iterative Methods for Solving Large Linear Systems Proof: We will only give the proof of the case when A is irreducible. Hence the case (i) does not occur. If A is reducible, then we can transform the reducible matrices into irreducible matrices by using the normalform method. The method is very skillful and behind our discussion, so we assume that A is irreducible. (a) claim: (H ) > 0. Let z > 0 be given. Then b = (I L)1 Rz 0. Certainly Rz = 0, thus b = . Because Rz + LRz + + Ln1 Rz = 0. Hence I = {i | bi = 0} = {1, 2, , n} = N Rz = b Lb, for i I we have 0 = bi =
k>i

rik zk +
k<i

lik bk

and

rik = 0, i I, k > i, k I, lik = 0, i I, k < i, k I,

and aik = 0 for all i I, k I . Since A is irreducible, it follows that I = . For b > 0 bi and from Theorem 4.1.5 follows that 0 < min z (H ). i
1in

(b) Let x 0 be the eigenvector of H corresponding to H = (H ) (by Theorem 4.1.8). Let J = (J ). Since (I L)1 Rx = H x, thus 1 1 Rx = x Lx or x = (L + R)x. H H Since A is irreducible, we can conclude that L + 1 R is also irreducible. According to H Theorem 4.1.7 (iii) we have 1 1 = (L + R) (4.1.31) H and x > 0. Now we dene the real value function 1 (t) = (L + R), t > 0. (4.1.32) t From Theorem 4.1.7 (iv) we can conclude that (t) is strictly (monotonic) decreasing in t. On the other hand, t(t) = (tL + R), t > 0 is strictly (monotone) increasing in t. (case 1) H < 1: Since J = (1), it implies that J = (1) = (L + R) > (H L + R) = H (L + (case2) H = 1: (L + R) = J = 1. (case 3) H > 1: J = (1) > (H ) = 1 and J = (1) = (L + R) < (H L + R) = H (L + 1 R ) = H . H 1 ) = H . (by (4.1.31)) H R

Theorem 4.1.14 If the o-diagonal elements in A (A = I L R) are nonpositive, then SSM is convergent if and only if TSM is convergent. Furthermore, SSM is asymptotically faster. Proof: The result follows immediately from theorem 4.1.13 and (4.1.13).

4.1 General procedures for the construction of iterative methods

75

4.1.3

Sucient conditions for convergence of TSM and SSM

Denition 4.1.3 A real matrix B is called an M -matrix if bij 0, i = j and B 1 exists with B 1 0. In the following theorems we give some important equivalent conditions of the M matrix. Theorem 4.1.15 Let B be a real matrix with bij 0 for i = j . Then the following statements are equivalent. (i) B is an M matrix. (ii) There exists a vector v > 0 so that Bv > 0. (iii) B has a decomposition B = sI C with C 0 and (C ) < s. (iv) For each decomposition B = D C with D = diag (di ) and C 0, it holds: di > 0, i = 1, 2, , n, and (D1 C ) < 1. (v) There is a decomposition B = D C , with D = diag (di ) and C 0 it holds: di > 0, i = 1, 2, , n and (D1 C ) < 1. Further, if B is irreducible, then (6) is equivalent to (1)-(5). (vi) There exists a vector v > 0 so that Bv 0, = 0. Proof: (i) (ii) : Let e = (1, , 1)T . Since B 1 0 is nonsingular it follows v = B 1 e > 0 and Bv = B (B 1 e) = e > 0. (ii) (iii) : Let s > max(bii ). It follows B = sI C with C 0. There exists a v > 0 with Bv = sv Cv (via (ii)), also sv > Cv . From the statement (4.1.13) in Theorem 4.1.6 follows (C ) < s. (iii) (i) : B = sI C = s(I 1 C ). For ( 1 C ) < 1 and from Theorem 1.2.6 follows s s C )1 = that there exists a series expansion (I 1 s (I 1 C )1 0. are nonnegative, we get B 1 = 1 s s
=0

(1 C )k . Since the terms in sum s

(ii) (iv) : From Bv = Dv Cv > 0 follows Dv > Cv 0 and di > 0, for i = 1, 2, , n. Hence D1 0 and v > D1 Cv 0. From (4.1.13) follows that (D1 C ) < 1. (iv) (v) : Trivial. (v) (i) : Since (D1 C ) < 1, it follows from Theorem 1.2.6 that (I D1 C )1 exists and equals to

(D1 C )k . Since the terms in sum are nonnegative, we have

(I D1 C )1 is nonnegative and B 1 = (I D1 C )1 D1 0. (ii) (vi) : Trivial.

k=0

76 Chapter 4. Iterative Methods for Solving Large Linear Systems (vi) (v) : Consider the decomposition B = D C , with di = bii . Let {I = i | di 0}. From di vi k=i cik vk 0 follows cik = 0 for i I , and k = i. Since Bv 0, = 0 I = {1, , n}. But B is irreducible I = and di > 0. Hence for Dv >, = Cv also v >, = D1 Cv and (4.1.14) show that (D1 C ) < 1. Remark 4.1.5 Theorem 4.1.15 can also be described as follows: If aij 0, i = j , then TSM and SSM converge if and only if A is an M -matrix. Proof: By (i) (iv) and (i) (v) of the previous theorem and Theorem 4.1.14. Lemma 4.1.16 Let A be an arbitrary complex matrix and dene |A| = [|aij |]. If |A| C , then (A) (C ). Especially (A) (|A|). Proof: There is a x = 0 with Ax = x and || = (A). Hence
n n n

(A)|xi | = |
k=1

aik xk |
k=1

|aik ||xk |
k=1

cik |xk |.

Thus, (A)|x| C |x|. If |x| > 0, then from (4.1.13) we have (A) (C ). Otherwise, let I = {i | xi = 0} and CI be the matrix, which consists of the ith row and ith column of C with i I . Then we have (A)|xI | CI |xI |. Here |xI | consists of ith component of |x| with i I . Then from |xI | > 0 and (4.1.13) follows (A) (CI ). We now ll CI with zero up to an n n I . Then C I C . Thus, (CI ) = (C I ) (C ) (by Theorem 4.1.8(iii)). matrix C Theorem 4.1.17 Let A be an arbitrary complex matrix. It satises either (Strong Row Sum Criterion): |aij | < |aii |, i = 1, , n.
j =i

(4.1.33)

or (Weak Row Sum Criterion): |aij | |aii |, i = 1, , n,


j =i

< |ai0 i0 |, at least one i0 , for A irreducible. Then TSM and SSM both are convergent.

(4.1.34)

Proof: Let A = D L R. From (4.1.33) and (4.1.34) D must be nonsingular and then as in Remark 4.1.4 we can w.l.o.g. assume that D = I . Now let B = I |L| |R|. Then (4.1.33) can be written as Be > 0. From Theorem 4.1.15(ii) and (i) follows that B is an M -matrix. (4.1.34) can be written as Be 0, Be = 0. Since A is irreducible, also B , from Theorem 4.1.15 (vi) and (i) follows that B is an M -matrix.

4.2 Relaxation Methods (Successive Over-Relaxation (SOR) Method ) Especially, from theorem 4.1.15(i), (iv) and Theorem 4.1.13 follows that (|L| + |R|) < 1 and ((I |L|)1 |R|) < 1. Now Lemma 4.1.16 shows that (L + R) (|L| + |R|) < 1. So TSM is convergent. Similarly, ((I L)1 R) = (R + LR + + Ln1 R) (|R| + |L||R| + + |L|n1 |R|) = ((I |L|)1 |R|) < 1. So SSM is convergent.

77

4.2

Relaxation Methods (Successive Over-Relaxation (SOR) Method )


A=DLR

Consider the standard decomposition (4.1.5)

for solving the linear system (4.1.1) Ax = b. The single-step method (SSM) (D L)xi+1 = Rx(i) + b can be written in the form x(i+1) = x(i) + {D1 Lx(i+1) + D1 Rx(i) + D1 b x(i) } := x(i) + v (i) . Consider a general form of (4.2.1) x(i+1) = x(i) + v (i) with constant . Also (4.2.2) can be written as Dx(i+1) = Dx(i) + Lx(i+1) + Rx(i) + b Dx(i) . Then x(i+1) = (D L)1 ((1 )D + R)x(i) + (D L)1 b. We now assume that D = I as above. Then (4.2.3) becomes x(i+1) = (I L)1 ((1 )I + R)x(i) + (I L)1 b with the iteration matrix L := (I L)1 ((1 )I + R). (4.2.5) (4.2.4) (4.2.3) (4.2.2) (4.2.1)

These methods is called for < 1 : under relaxation, = 1 : single-step method, > 1 : over relaxation. (In general: relaxation methods.) We now try to choose an such that (L ) is possibly small. But this is only under some special assumptions possible. we rst list a few qualitative results about (L ).

78 Chapter 4. Iterative Methods for Solving Large Linear Systems Theorem 4.2.1 Let A = D L L be hermitian and positive denite. Then the relaxation method is convergent for 0 < < 2. Proof: We claim that each eigenvalue of L has absolute value smaller than 1 (i.e., (L ) < 1). Let (L ). Then there is an x = 0 with (D L)x = ((1 )D + L )x. It holds obviously 2(D L) = ((2 )D + (D 2L)) = ((2 )D + A + (L L)) and 2[(1 )D + L ] = (2 )D + (D + 2L ) = (2 )D A + (L L). Hence multiplying (4.2.6) by x we get ((2 )x Dx + x Ax + x (L L)x) = (2 )x Dx x Ax + x (L L)x or by d = x Dx > 0, a := x Ax > 0 and x (L L)x := is, s R we get ((2 )d + a + is) = (2 )d a + is. Dividing above equation by and setting = (2 )/ , we get {d + a + is} = d a + is. For 0 < < 2 we have > 0 and d + is is in the right half plane. Therefore the distance +isa from a to d + is is smaller than that from a. So we have || = | d | < 1. d+is+a Theorem 4.2.2 Let A be Hermitian and nonsingular with positive diagonal. If SSM converges, then A is positive denite. Proof: Let A = D L L . For any matrix C it holds: A (I AC )A(I CA) = A A + ACA + AC A AC ACA = AC (C 1 + C 1 A)CA. For special case that C = (D L)1 we have C 1 + C 1 A = D L + D L D + L + L = D and I CA = (D L)1 (D L A) = (D L)1 L = H. Hence we obtain A H AH = A(D L) D(D L)1 A =: B (4.2.6)

4.2 Relaxation Methods (Successive Over-Relaxation (SOR) Method ) 79 1 Thus H AH = A B. D is positive denite, obviously so is B (since (D L) A is nonsingular). Because (H ) < 1, for any 0 C n the sequence {m } m=1 dened by m := H m 0 converges to zero. Therefore the sequence {m Am } also converges to m=1 zero. Furthermore, we have
m+1 Am+1 = m H AHm = m Am m Bm < m Am ,

(4.2.7)

because B > 0 is positive denite. If A is not positive denite, then there is a 0 Cn \{0} with 0 A0 0. This is a contradiction that {m Am } 0 and (4.2.7).

4.2.1

Determination of the Optimal Parameter for 2-consistly Ordered Matrices

For an important class of matrices the more qualitative assertions of Theorems 4.2.1 and 4.2.2 can be considerably sharpened. This is the class of consistly ordered matrices. The optimal parameter b with (Lb ) = min (L )

can be determined. We consider A = I L R. Denition 4.2.1 A is called 2-consistly ordered, if the eigenvalues of L + 1 R are independent of . Example 4.2.1 A = L + 1 R = 0 R L 0 + I, = I 0 0 I 0 R L 0 I 0 1 0 I .

0 1 R L 0

This shows that L + 1 R is similar to L + R, so the eigenvalues are independent to . A is 2-consistently ordered. Let A = I L R, J = L + R. Let s1 , s2 , . . . denote the lengths of all closed oriented path (oriented cycles) Pi Pk1 Pk2 Pks (i) = Pi in G(J ) which leads from Pi to Pi . Denoting by li the greatest common divisor: li = (i) (i) g.c.d.(s1 , s2 , ). Denition 4.2.2 The Graph G(J ) is called 2-cyclic if l1 = l2 = = ln = 2 and weakly 2-cyclic if all li are even. Denition 4.2.3 The matrix A has property A if there exists a permutation P such that D1 M1 with D1 and D2 diagonal. P AP T = M2 D2
(i) (i)

80 Chapter 4. Iterative Methods for Solving Large Linear Systems Theorem 4.2.3 For every nn matrix A with property A and aii = 0, i = 1, , n, there = D(I L R) of the permuted matrix A := P AP T exists a permutation P such that A is 2-consistly ordered. Proof: There is a permutation P such that P AP T = with D= D1 0 0 D2 D1 M1 M 2 D2 = D(I L R)
1 0 D1 M1 0 0

, L=

0 0 1 D2 M2 0

and R =

For = 0, we have J () = = where S := 0


1 D2 M2 1 , S J (1)S 1 1 D1 M1 0

= S

0
1 D2 M2

1 D1 M1 0

1 S

I1 0 . 0 I2

Theorem 4.2.4 An irreducible matrix A has property A if and only if G(J ) is weakly 2-cyclic. (Without proof !) Example 4.2.2 Block tridiagonal D1 A A = 21 matrices A12 D2 .. . .. .. . . AN 1,N DN .

AN,N 1

If all Di are nonsingular, then 1 0 1 D1 A12 0 . ... 1 . 0 . D2 A21 J () = . . . 1 1 . . . . . . DN 1 AN 1,N 1 0 DN AN,N 1 0

1 , with S = diag {I1 , I2 , , N 1 IN }. Thus which obeys the relation J () = S J (1)S A is 2-consistly ordered.

The other description: G(L + R) is bipartite. 1 b1 0 . ... c1 . . Example 4.2.3 A = is 2-consistly ordered. The eigenvalues . . .. . . bn1 0 cn1 1 are the roots of det(A I ) = 0. The coecients of above equation appear only those 1 products bi ci . For L + 1 R, we substitute bi and ci by bi and ci , respectively, then the products are still bi ci . Therefore eigenvalues are independent of .

Examples 4.2.1 and 4.2.2 are 2-cyclic.

Example 4.2.4 For

A = [ 1  a  b ; 0  1  0 ; c  d  1 ],

the matrix αL + α^{-1}R has the same pattern, with a, b multiplied by α^{-1} and c, d by α. The coefficients of its characteristic polynomial are independent of α, so A is 2-consistently ordered. But G(L + R) is not bipartite, so A is not 2-cyclic.

If A is 2-consistently ordered, then L + R and −(L + R) (α = −1) have the same eigenvalues; the nonzero eigenvalues of L + R therefore appear in pairs ±μ_i. Hence

det(μI − L − R) = μ^m ∏_{i=1}^{r} (μ² − μ_i²),   n = 2r + m (m = 0 possible).    (4.2.8)

Theorem 4.2.5 Let A be 2-consistently ordered, a_ii = 1, ω ≠ 0. Then the following hold:

(i) If λ ≠ 0 is an eigenvalue of L_ω and μ satisfies the equation

(λ + ω − 1)² = λ ω² μ²,    (4.2.9)

then μ is an eigenvalue of L + R (and so is −μ).

(ii) If μ is an eigenvalue of L + R and λ satisfies equation (4.2.9), then λ is an eigenvalue of L_ω.

Remark 4.2.1 If ω = 1, then λ = μ², and ρ((I − L)^{-1}R) = (ρ(L + R))².

Proof: We first prove the identity

det(μI − sL − rR) = det(μI − √(sr)(L + R)).    (4.2.10)

Both sides are polynomials in μ of the form μ^n + ..., and

sL + rR = √(sr)( √(s/r) L + √(r/s) R ) = √(sr)( αL + α^{-1}R ),   α = √(s/r),

so if sr ≠ 0, then sL + rR and √(sr)(L + R) have the same eigenvalues; this is obviously also true for sr = 0. Both polynomials in (4.2.10) thus have the same roots, so they are identical.

Because det(I − ωL) = 1,

φ(λ) := det(λI − L_ω) = det(I − ωL) det(λI − L_ω) = det(λ(I − ωL) − (1 − ω)I − ωR) = det((λ + ω − 1)I − λωL − ωR),

so λ is an eigenvalue of L_ω if and only if φ(λ) = 0. From (4.2.10) follows

φ(λ) = det((λ + ω − 1)I − √λ ω(L + R)),

and thus, from (4.2.8),

φ(λ) = (λ + ω − 1)^m ∏_{i=1}^{r} ((λ + ω − 1)² − λω²μ_i²),    (4.2.11)

where the μ_i are the eigenvalues of L + R. Therefore, if μ is an eigenvalue of L + R and λ satisfies (4.2.9), then φ(λ) = 0, so λ is an eigenvalue of L_ω. This shows (ii).

Now let λ ≠ 0 be an eigenvalue of L_ω; then one factor in (4.2.11) must vanish. Let μ satisfy (4.2.9). Then:

(i) μ ≠ 0: from (4.2.9) follows λ + ω − 1 ≠ 0, so (λ + ω − 1)² = λω²μ_i² for one i (from (4.2.11)) and (λ + ω − 1)² = λω²μ² (from (4.2.9)). This shows that μ² = μ_i², i.e., μ = ±μ_i, so μ is an eigenvalue of L + R.

(ii) μ = 0: we have λ + ω − 1 = 0 and 0 = φ(λ) = det((λ + ω − 1)I − √λ ω(L + R)) = det(−√λ ω(L + R)), i.e., L + R is singular, so μ = 0 is an eigenvalue of L + R.

Theorem 4.2.6 Let A = I − L − R be 2-consistently ordered. If L + R has only real eigenvalues and satisfies ρ(L + R) < 1, then

ρ(L_{ω_b}) = ω_b − 1 < ρ(L_ω)   for ω ≠ ω_b,    (4.2.12)

where

ω_b = 2 / (1 + √(1 − ρ²(L + R)))

(ω_b is obtained by solving (4.2.9)).

[Figure 4.1: graph of ρ(L_ω) as a function of ω.]

One has in general, with μ̄ := ρ(L + R),

ρ(L_ω) = ω − 1,   for ω_b ≤ ω ≤ 2,
ρ(L_ω) = 1 − ω + ½ω²μ̄² + ωμ̄ √(1 − ω + ¼ω²μ̄²),   for 0 < ω ≤ ω_b.    (4.2.13)

Remark: We first prove the following theorem, due to Kahan: for arbitrary matrices A it holds that

ρ(L_ω) ≥ |ω − 1|,   for all ω.    (4.2.14)

Proof: Since det(I − ωL) = 1 for all ω, the characteristic polynomial φ(λ) of L_ω is

φ(λ) = det(λI − L_ω) = det((I − ωL)(λI − L_ω)) = det((λ + ω − 1)I − λωL − ωR).

Because

∏_{i=1}^{n} λ_i(L_ω) = ±φ(0) = ±det((ω − 1)I − ωR) = ±(ω − 1)^n,

it follows immediately that

ρ(L_ω) = max_i |λ_i(L_ω)| ≥ |ω − 1|.

Proof of Theorem 4.2.6: By assumption the eigenvalues μ_i of L + R are real and −ρ(L + R) ≤ μ_i ≤ ρ(L + R) < 1. For a fixed ω ∈ (0, 2) (by (4.2.14) in the Remark it suffices to consider this interval) and for each μ_i there are two eigenvalues λ_i^{(1)}(ω, μ_i) and λ_i^{(2)}(ω, μ_i) of L_ω, obtained by solving the quadratic equation (4.2.9) in λ. Geometrically, λ_i^{(1)}(ω) and λ_i^{(2)}(ω) are obtained as the abscissae of the points of intersection of the straight line g_ω(λ) = (λ + ω − 1)/ω and the parabola m_i(λ) := μ_i √λ (see Figure 4.2). The line g_ω(λ) has slope 1/ω and passes through the point (1, 1). If g_ω(λ) and m_i(λ) do not intersect, then λ_i^{(1)}(ω) and λ_i^{(2)}(ω) are complex conjugate with modulus |ω − 1| (from (4.2.9)). Evidently

ρ(L_ω) = max_i ( |λ_i^{(1)}(ω)|, |λ_i^{(2)}(ω)| ) = max( |λ^{(1)}(ω)|, |λ^{(2)}(ω)| ),

where λ^{(1)}(ω), λ^{(2)}(ω) are obtained by intersecting g_ω(λ) with m(λ) := μ̄ √λ, μ̄ = ρ(L + R) = max_i |μ_i|. By solving (4.2.9) with μ = ρ(L + R) for λ, one verifies (4.2.13) immediately, and with it the remaining assertions of the theorem.
[Figure 4.2: Geometrical view of λ_i^{(1)}(ω) and λ_i^{(2)}(ω): the intersections of the line g_ω(λ) with the parabola m_i(λ) = μ_i √λ.]

84

Chapter 4. Iterative Methods for Solving Large Linear Systems

4.2.2

Practical Determination of the Relaxation Parameter ω_b


For ω ∈ [1, ω_b], from (4.2.13) in Theorem 4.2.6 we have

ρ(L_ω) = ¼ [ ωμ̄ + √(ω²μ̄² − 4(ω − 1)) ]²    (4.2.15)

or

μ̄ = (ρ(L_ω) + ω − 1) / (ω √(ρ(L_ω))).    (4.2.16)

Here μ̄ := ρ(L + R). If μ̄ is a simple eigenvalue, then ρ(L_ω) is also a simple eigenvalue (see the proof of Theorem 4.2.6). So one can determine an approximation for ρ(L_ω) using the power method (see later for details!): let {x^(k)}_{k=1}^∞ be the sequence of iterates generated by the SOR iteration with parameter ω, and let e^(k) = x^(k) − x be the error vector (here Ax = b), which satisfies e^(k) = L_ω^k e^(0). We define d^(k) := x^(k+1) − x^(k), for k ∈ N. Then we have

d^(k) = x^(k+1) − x^(k) = e^(k+1) − e^(k) = (L_ω − I)e^(k) = (L_ω − I)L_ω^k e^(0) = L_ω^k (L_ω − I)e^(0) = L_ω^k d^(0).

Hence d^(k) = L_ω^k d^(0). For sufficiently large k ∈ N we compute

q_k := max_{1≤i≤n} |x_i^(k+1) − x_i^(k)| / |x_i^(k) − x_i^(k−1)|,    (4.2.17)

which is a good approximation for ρ(L_ω). We also determine the corresponding approximation for μ̄ by (4.2.16) and the corresponding optimal parameter as (by Theorem 4.2.6)

ω' = 2 / (1 + [1 − (q_k + ω − 1)² / (ω² q_k)]^{1/2}).    (4.2.18)
4.2.3

Break-o Criterion for SOR Method


1 d(k) 1 (L )

From d(k) = (L I )e(k) follows (for (L ) < 1) that e(k) = (L I )1 d(k) and then e(k)

With an estimate q < 1 for (L ) one obtains the break-o criterion for a given R+ d(k) d(k )
/

(k+1)

(1 q ), (for absolute error), (1 q ), (for relative error).

The estimate qk in (4.2.17) for the spectral radius (L ) of SOR method is theoretically justied, if b . But during the computation we cannot guarantee that the new also satises b . Then an oscillation of qk at may occur, and 1 qk can be considerably larger than 1 (L ); the break-o criterion may be satised too early. It is better to take q := max(qk , 1) instead of qk .

4.3 Application to Finite Dierence Methods: Model Problem (Example 4.1.3) 85 Algorithm 4.2.1 (Successive Over-Relaxation Method) Let A Rnn and b Rn . Let A = D L R with D nonsingular. Suppose that A is 2-consistly ordered, all eigenvalues of J := D1 (L + R) are real and (J ) is a simple eigenvalue of J satisfying (J ) < 1. [We apply a simple strategy to the following Algorithm, to perform a new updating after p iterative steps (p 5).] Step 1: Choose a bound for machine precision R+ and a positive integer p, and a initial vector x(0) Rn . Let := 1, q := 1 and k := 0. Step 2: (Iterative step): Compute for i = 1, . . . , n,
(k+1) xi

= (1

(k ) )xi

+ aii

i1 (k+1) aij xj j =1

+
j =i+1

aij xj bi .

(k )

If k is not positive integral multiplicity of p, then go to Step 4. Step 3: (Adaptation of the Estimate of the Optimal Parameter): Compute (k+1) (k ) |x i xi | . q := max (k) (k1) 1in |x | i xi If q > 1, then go to Step 5. 2 Let q := max(q, 1) and := . 2
1+ 1
(q + 1) 2 q

Step 4: (Break-o criterion): If


1in

max xi

(k+1)

xi

(k)

1in

max xi

(k+1)

(1 q ),

then stop. Step 5: Let k := k + 1 and go to step 2.

4.3

Application to Finite Dierence Methods: Model Problem (Example 4.1.3)

We consider the Dirichlet boundary-value problem (Model problem) as in Example 8.3. We shall solve a linear system Az = b of the N 2 N 2 matrix A as in (4.1.18). To Jacobi method: The iterative matrix is 1 J = L + R = (4I A). 4 Graph G(J ) (N = 3) is connected and weakly 2-cyclic. Thus, A is irreducible and has property A. It is easily seen that A is 2-consistly ordered (Exercise!). To Gauss-Seidel method: The iterative matrix is H = (I L)1 R.

86 Chapter 4. Iterative Methods for Solving Large Linear Systems From the Remark of Theorem 4.2.5 and (4.1.20) follows that (H ) = (J )2 = cos2 . N +1 2 1 + sin N +1

According to Theorem 4.2.6 the optimal relaxation parameter b and (Lb ) are given by b = and 2 1+ 1 cos2
N +1

(3.1)

cos2 N +1 (Lb ) = . (1 + sin N )2 +1

(3.2)

The number k = k (N ) with (J )k = (Lb ) indicates that the k steps of Jacobi method produce the same reduction as one step of the optimal relaxation method. Clearly, k = ln (Lb )/ ln (J ). Now for small z one has ln(1 + z ) = z z 2 /2 + O(z 3 ) and for large N we have cos Thus that ln (J ) = Similarly, ln (Lb ) = 2[ln (J ) ln(1 + sin = 2[ )] N +1 N +1 =1 2 1 + O( 4 ). 2 2(N + 1) N (3.3)

2 1 + O( 4 ). 2 2(N + 1) N

2 2 1 + + O( 3 )] 2 2 2(N + 1) N + 1 2(N + 1) N 2 1 = + O( 3 ) (for large N ). N +1 N and 4(N + 1) . (3.4) The optimal relaxation method is more than N times as fast as the Jacobi method. The quantities k = k (N ) RJ := RH RLb ln 10 0.467(N + 1)2 . ln (J ) 1 RJ 0.234(N + 1)2 := 2 ln 10 0.367(N + 1) := ln (Lb ) (3.5) (3.6) (3.7)

indicate the number of iterations required in the Jacobi, the Gauss-Seidel method, and the optimal relaxation method, respectively, in order to reduce the error by a factor of 1/10.

4.4 Block Iterative Methods

4.4

Block Iterative Methods


A11 A1N . . ... . A= . . . , AN 1 AN N

87

A natural block structure

where Aii are square matrices. In addition, if all Aii are nonsingular, we introduce block iterative methods relative to the given partition of A, which is analogous to (4.1.5): A = D L R with A11 0 0 0 0 A22 0 0 := , . . 0 . 0 0 0 0 0 AN N 0 0 0 . .. . . . A := .21 . , .. .. . . . 0 AN 1 AN,N 1 0 0 A12 A1N . . ... . . 0 .. := . . ... . . AN 1,N 0 0 0

(4.1a)

(4.1b)

One obtains the block Jacobi method (block total-step method) for the solution of Ax = b by choosing in (4.1.4) analogously to (4.1.6) or (4.1.7), F := D . One thus obtains D x(i+1) = b + (L + R )x(i) or Ajj xj
(i+1)

(4.2) (4.3)

= bj
k =j

Ajk xk , for j = 1, . . . , N, i = 0, 1, 2, .

(i)

We must solve system of linear equations of the form Ajj z = y , j = 1, , N . By the methods of Chapter 2, a triangular factorization (or a Cholesky factorization, etc.) Ajj = Lj Rj we can reduce Ajj z = y to the two triangular systems Lj u = y and Rj z = u. For the matrix A in Example 8.3 (model agonal N N matrices. 4 1 0 0 ... ... 0 1 Ajj = . . . . . . 1 0 0 0 1 4 problem): Here Ajj are positive denite tridi 0 0 . ... . . , Lj = . . . .. .. 0 . . 0 .

88 Chapter 4. Iterative Methods for Solving Large Linear Systems The rate of convergence of (4.3) is determined by (J ) of the matrix J := L + R
1 1 with L := D L and R := D R . One can analogously to (4.1.8) dene a block Gauss-Seidel method (block single-step method): H := (I L )1 R

or Ajj xj
(i+1)

= bj
k<j

Ajk xk

(i+1)

k>i

Ajk xk , for j = 1, , N, i = 0, 1, 2, .

(i)

(4.4)

As in Section 10, one can also introduce block relaxation methods through the choice
1 L = (I L ) [(1 )I + R ]

(4.5) (4.6)

and x(i+1) = (I L )1 ((1 )I + R )x(i) + (I L )1 b. If one denes A as 2-consistly ordered whenever the eigenvalues of the matrices J () = L + 1 R are independent of . Optimal relaxation factors are determined as in Theorem 4.2.6 with the help of (J ). For the model problem (Example 8.3), relative to the partition given in (8.18), (J ) can again be determined explicitly. One nds (J ) = cos N +1 < (J ). 2 cos N +1 (4.7)

For the corresponding optimal block relaxation method one has asymptotically for N , k (L b ) (Lwb ) with k = 2 (Exercise!). The number of iterations is reduced by a factor 2 compared to the ordinary optimal relaxation method.

4.5
4.5.1

The ADI method of Peaceman and Rachford


ADI method (alternating-direction implicit iterative method)

Slightly generalizing the model problem (4.1.14), we consider the Poisson problem uxx uyy + u(x, y ) = f (x, y ), for (x, y ) , (4.5.1) u(x, y ) = 0, for (x, y ) , where = {(x, y ) | 0 < x < 1, 0 < y < 1} R2 with boundary . Here > 0 is a constant and f : R continous function. Using the same discretization and the same notation as in Example 8.3, one obtains 4zij zi1,j zi+1,j zi,j 1 zi,j +1 + h2 zij = h2 fij , 1 i, j N (4.5.2)

4.5 The ADI method of Peaceman and Rachford 89 with z0j = zN +1,j = zi,0 = zi,N +1 = 0, 0 i, j N + 1 for the approximate values zij of uij = u(xi , yj ). To the decomposition 4zij zi1,j zi+1,j zi,j 1 zi,j +1 + h2 zij (2zij zi1,j zi+1,j ) + (2zij zi,j 1 zi,j +1 ) + h2 zij ,

(4.5.3)

there corresponds a decomposition of the matrix A if the system Az = b, of the form A = H + V + (H : Horizontal, V : Vertical). Here H , V , are dened by wij = 2zij zi1,j zi+1,j , if w = Hz, wij = 2zij zi,j 1 zi,j +1 , if w = V z, wij = h2 zij , if w = z. (4.5.4a) (4.5.4b) (4.5.4c)

is a diagonal matrix with nonnegative elements, H and V are both symmetric and positive denite, where H = [] and V = []. A = H + V + is now transformed equivalently into 1 1 (H + + rI )z = (rI V )z + b 2 2 and also 1 1 (V + + rI )z = (rI H )z + b. 2 2 1 1 , one obtains ADI Here r is an arbitrary real number. Let H1 := H + 2 , V1 := V + 2 method: (H1 + ri+1 I )z (i+1/2) = (ri+1 I V1 )z (i) + b, (V1 + ri+1 I )z (i+1) = (ri+1 I H1 )z (i+1/2) + b. (4.5.5) (4.5.6)

With suitable ordering of the variables zij , the matrices H1 + ri+1 I and V1 + ri+1 I are positive denite tridiagonal matrices (assuming ri+1 0), so that the systems (4.5.5) and (4.5.6) can easily be solved for z (i+1/2) and z (i+1) via a Cholesky factorization. Eliminating z (i+1/2) in (4.5.5) and (4.5.6) we get z (i+1) = Tri+1 z (i) + gri+1 (b) with Tr := (V1 + rI )1 (rI H1 )(H1 + rI )1 (rI V1 ), gr (b) := (V1 + rI )1 [I + (rI H1 )(H1 + rI )1 ]b. (4.5.8) (4.5.9) (4.5.7)

For the error fi := z (i) z it follows from (4.5.7) and the relation z = Tri+1 z + gri+1 (b) by subtraction, that fi+1 = Tri+1 fi , (4.5.10) and therefore fm = Trm Trm1 Tr1 f0 . (4.5.11) In view of (4.5.10) and (4.5.11), ri are to be determined so that the spectral radius (Trm , , Tr1 ) becomes as small as possible. For the case ri = r:

90 Chapter 4. Iterative Methods for Solving Large Linear Systems Theorem 4.5.1 Under the assumption that H1 and V1 are positive denite, one has (Tr ) < 1, for all r > 0. Proof: V1 and H1 are positive denite. Therefore (V1 + rI )1 , and (H1 + rI )1 exist, for r > 0, and hence also Tr of (4.5.8). The matrix r := (V1 + rI )Tr (V1 + rI )1 T = [(rI H1 )(H1 + rI )1 ][rI V1 )(V1 + rI )1 ] r ). The matrix H := (rI H1 )(H1 + rI )1 has is similar to Tr . Hence (Tr ) = (T the eigenvalues (r j )/(r + j ), where j = j (H1 ) are the eigenvalues of H1 . Since ) < 1. Since H1 also H r > 0, j > 0 it follows that |(r j )/(r + j )| < 1 and thus (H are symmetric, we have 2 = (H ) < 1. H In the same way one has := (rI V1 )(V1 + rI )1 . Thus Let V r ) T r (T
2

< 1.

< 1.

The eigenvalues of Tr can be exhibited by H1 z (k,l) = k z (k,l) , V1 z (k,l) = l z (k,l) , Tr z (k,l) = (k,l) z (k,l) , where zij
(k,l) lj ki := sin N sin N , 1 i, j N , with +1 +1

(4.5.12a) (4.5.12b) (4.5.12c)

(k,l) = so that

(r l )(r k ) j , j := 4 sin2 , (r + l )(r + k ) 2(N + 1) r j r + j


2

(4.5.13)

(Tr ) = max One nally nds a result (Exercise!):

1j N

min (Tr ) = (Lb ) =


r>0

cos2 (1 + sin

N +1 N +1

)2

where b characterizes the best (ordinary) relaxation method. The best ADI method assuming constant choice of parameters, has the same rate of convergence for the model problem as the optimal ordinary relaxation method. Since the individual iteration step in ADI method is a great deal more expensive than in the relaxation method, the ADI method would appear to be inferior. This is certainly true if r = r1 = r2 = is chosen. For the case ri = r:

4.5 The ADI method of Peaceman and Rachford 91 However, if one makes use of the option to choose a separate parameter ri in each step, the picture changes in favor of the ADI method. Indeed
(k,l) Tri Tr1 z (k,l) = r z (k,l) , i r1

where
k,l) ( ri r1

(rj l )(rj k ) . (rj + l )(rj + k ) j =1


(k,l)

Choosing rj := j , for j = 1, , N , we have rN , ,r1 = 0, for 1 k, l N , so that by the linear independence of the z (k,l) , TrN Tr1 = 0. With this special choice of the rj , the ADI method for the model problem terminates after N steps with the exact solution. This is a happy coincidence, which is due to the following essential assumptions: (1) H1 and V1 have in common a set of eigenvectors which span the whole space. (2) The eigenvalues of H1 and V1 are known. Theorem 4.5.2 For Two Hermitian matrices H1 and V1 Cnn , there exist n linearly independent (orthogonal) vectors z1 , , zn , which are common eigenvectors of H1 and V1 , H1 zi = i zi , V1 zi = i zi , for i = 1, , n, (4.5.14) if and only if H1 commutes with V1 , i.e., H1 V1 = V1 H1 . Proof: : From (4.5.14) it follows that H1 V1 zi = i i zi = V1 H1 zi , for i = 1, 2, , n. Since the zi form a basis in Cn , it follows at once that H1 V1 = V1 H1 . : Let H1 V1 = V1 H1 . Let 1 < < r be the eigenvalues of V1 with the multiplicities (i ), i = 1, , r. According to Theorem 1.1.1 there exists a unitary matrix U with 1 I1 0 ... V := U V1 U = . 0 r Ir 1 = V H 1 , with H 1 := U H1 U . We From H1 V1 = V1 H1 it follows immediately that H 1 analogously to V : partition H H11 H1r . . 1 = . H . . . . Hr1 Hrr By multiplying out 1 V = V H 1, H

one obtains Hij = 0, for i = j , since i = j . The Hii are Hermitian of order (i ). There 1 U ... i = i (diagonal). For U = i such that U i Hii U exist unitary matrices U Ur

92 Chapter 4. Iterative Methods for Solving Large Linear Systems nn C , since Hij = 0, for i = j , it follows the relations 1 .. ) H1 (U U ) = U H 1U = H = (U U , . r ) = (U U )H , i.e., H1 (U U and ) V1 (U U ) = U V U = V (U U ) = (U U )V , i.e., V1 (U U )ei can be taken as n common orthogonal eigenvectors of H1 and V1 . so that zi := (U U We now assume in the following discussion that H1 and V1 are two positive denite commuting n n matrices with (4.5.14) and that two numbers , are given such that 0 < i , i , for i = 1, , n. Then Tr z i = gives the problem:
m

(r i )(r i ) zi , for r > 0, i = 1, 2, , n. (r + i )(r + i )

(Trm , , Tr1 ) = max

1in

j =1 m

(rj i )(rj i ) (rj + i )(rj + i ) rj x . rj + x


2

max

(4.5.15)

j =1

For a given m, it is natural to choose ri > 0, i = 1, , m, so that the function


m

(r1 , , rm ) := max

j =1

rj x , rj + x

(4.5.16)

becomes as small as possible. For each m it can be shown that there are uniquely determined number r i with < r i < , i = 1, , m, such that dm (, ) := ( r1 , . . . , r m ) =
ri >0, 1im

min

(r1 , . . . , rm ).

(4.5.17)

The optimal parameter r 1 , , r m can even be given explicitly, for each m, in term of elliptic functions [see Young (1971) pp.518-525]. In the special case m = 2k , the relevant results will now be presented without proof [see Young (1971), Varga (1962)]. (m) (m) Let ri , i = 1, 2 , m, denote the optimal ADI parameters for m = 2k . The ri and dm (, ) can be computed recursively by means of Gausss arithmetic-geometric mean algorithm. It can be shown that d2n (, ) = dn ( , + ). 2 (4.5.18)

4.5 The ADI method of Peaceman and Rachford 93 (n) (2n) The optimal parameter of the minimax problem (4.5.17), ri and ri , being related by ri
(n)

ri

(2n)

+ /ri 2

(2n)

, i = 1, 2, , n.

(4.5.19)

Dene 0 := , 0 := . Then j +1 := Thus d2k (0 , 0 ) = d2k1 (1 , 1 ) = k k = d1 (k , k ) = . (Exercise!) k + k j j , j +1 := j + j , j = 0, 1, , k 1. 2 (4.5.20)

(4.5.21)

(1) The solution of d1 (k , k ) can be found with r1 = k k . The optimal ADI parameter (m) ri , i = 1, , m = 2k can be computed as follows: (0) (i) s1 := k k . (ii) For j = 0, 1, , k 1, determine si the 2j quadratic equations in x,
(j +1)

, i = 1, 2, , 2j +1 as the 2j +1 solutions of

k1j k1j 1 (j ) ), i = 1, 2, , 2j . si = (x + 2 x (iii) Put ri


(j ) (m)

(4.5.22)

:= si , i = 1, 2, . . . , m = 2k .

(k)

The si , i = 1, 2, , 2j are just the optimal ADI parameters for the interval [kj , kj ]. Let us use these formulas to study the model problem (8.14)(8.16), with m = 2k xed, and the asymptotic behavior of d2k (, ) as N . For and we take the best possible bounds = 4 sin2 We then have dm (, ) 1 4 m N , = 4 sin2 = 4 cos2 . 2(N + 1) 2(N + 1) 2(N + 1) 4(N + 1) (4.5.23)

(4.5.24)

as N , m := 2k . Proof of (4.5.24): By induction on k . Let ck := k /k . One obtains from ((4.5.20) and (4.5.21) that 1 ck (4.5.25) d2k (, ) = 1 + ck and c2 k+1 = 2ck . 1 + c2 k (4.5.26)

94 Chapter 4. Iterative Methods for Solving Large Linear Systems In order to prove (4.5.24), it suces to show that ck 2 2k , N . 4(N + 1) (4.5.27)

It follows then from (4.5.25) that for N , d2k (, ) 1 2ck . But (4.5.27) is true for k = 0, by using c0 = tan . 2(N + 1) 2(N + 1) Thus, if (4.5.27) is valid for some k 0, then it is also valid for k + 1, because from (4.5.26) we have at once ck+1 2ck , as N . In practice, the parameter ri are often repeated cyclically, i.e., one chooses a xed m (m) (m = 2k ), then determines approximately the optimal ADI parameter ri belonging to this m, and nally takes for the ADI method the parameters rjm+i := ri
(m)

for i = 1, 2, , m, j = 0, 1, .

If m individual steps of the ADI method are considered a big iteration step, then the quantity ln 10 ln (T rm , . . . , Tr1 ) indicates how many big iteration steps are required to reduce the error by a factor of 1/10, i.e., ln 10 (m) RADI = m ln (Trm , . . . , Tr1 ) indicates how many ordinary ADI steps, on the average, are required for the same purpose. In case of the model problem one obtains for the optimal choice of parameter and m = 2k , by virtue of (4.5.15) and (4.5.24), (Trm , . . . , Tr1 ) dm (, )2 1 8 m ln (Trm , . . . , Tr1 ) 8 m so that m m 4(N + 1) ln(10) , N . (4.5.28) 8 Comparing to (3.5)-(3.7) shows that for m > 1 the ADI method converges considerably faster than the optimal ordinary relaxation method. This convergence behavior establishes the practical signicance of the ADI method. RADI
(m)

, N , 4(N + 1) , N , 4(N + 1)

4.5.2

The algorithm of Buneman for the solution of the discretized Poisson Equation
uxx uyy + u = f (x, y ), for (x, y ) , u(x, y ) = 0, for (x, y ) ,

Consider the possion problem (4.5.29)

4.5 The ADI method of Peaceman and Rachford 95 2 where {(x, y ) | 0 < x < a, 0 < y < b} R , > 0 is a constant and f : R is a continuous function. Discretizing (4.5.29): for the approximate zij of u(xi , yj ), xi = ix, yj = jy , x a/(p + 1), y b/(q + 1). We obtain the equation: zi1,j + 2zi,j zi+1,j zi,j 1 + 2zi,j zi,j +1 + + zi,j = fij = f (xi , yj ), x2 y 2 for i = 1, 2, , p, j = 1, 2, , q . Together with the boundary values z0,j zp+1,j 0, for j = 0, 1, . . . , q + 1, zi,0 zi,q+1 0, for i = 0, 1, . . . , p + 1.
T T T T Let z = [z1 , z2 , , zq ] , zj = [z1j , z2j , , zpj ]T . Then (4.5.30) can be written in the forms Mz = b (4.5.31)

(4.5.30)

with

I M =

. A .. ... ... I I A

b=

b1 b2 . . . bq

, (4.5.32)

where I = Ip , A is a p p Hermitian tridiagonal matrix, and M consists of q block rows and columns. We describe here only Buneman algorithm (1969). For related method see also Hockney (1969) and Swarztranber (1977). Now, (4.5.32) can be written as: Az1 + z2 = b1 , zj 1 + Azj + zj +1 = bj , j = 2, 3, , q 1, (4.5.33) zq1 + Azq = bq , from the three consecutive equations zj 2 +Azj 1 +zj = bj 1 , zj 1 +Azj +zj +1 = bj , zj +Azj +1 +zj +2 = bj +1 . One can for all even j = 2, 4, . . . eliminate zj 1 and zj +1 by subtracting A times the second equation from the sum of the others: zj 2 + (2I A2 )zj + zj +2 = bj 1 Abj + bj +1 . For q odd, we obtain the reduced system 2I A2 I 0 z2 z4 2 ... I 2I A . . . .. .. . . I 2 zq 1 0 I 2I A

b1 + b3 Ab2 b3 + b5 Ab4 . . . bq2 + bq Abq1

. (4.5.34)

96 Chapter 4. Iterative Methods for Solving Large Linear Systems A solution {z2 , z4 , ....., zq1 } of (4.5.34) is known, then {z1 , z3 , . . .} can be determined by (from(4.5.33)): A A 0 .. . A 0 z1 z3 . . . zq = b1 z2 b3 z2 z4 . . . bq zq1 . (4.5.35)

Thus, (4.5.34) has the same structure as (4.5.32): M (1) z (1) = b(1) with M
(1)

z (1) =

. A(1) . . .. .. . . 0 I (1) z1 z2 (1) z2 z4 . . . . . . I zq1


(1)

A(1)

0 I A(1) ,

, A(1) 2I A2 , b(1) = ,

b1 (1) b2 . . . bq1
(1)

(1)

b1 + b3 Ab2 b3 + b5 Ab4 . . . bq2 + bq Abq1

zq1

so that the reduction procedure just described can be applied to M (1) again. In general, for q = q0 = 2k+1 1, we obtain a sequence of A(r) and b(r) according to: Set A(0) = A, b(0) = bj , j = 1, 2, ...., q0 , q (0) = q = 2k+1 1. For r = 0, 1, 2, . . . , k 1 : (1) A(r+1) 2I (A(r) )2 , (2) bj
(r+1)

(4.5.36)
(r)

b2j 1 + b2j +1 A(r) b2j , j = 1, 2, . . . , 2kr 1 ( qr+1 ).

(r )

(r)

For each stage r + 1, r = 0, 1, ..., k 1, one has a linear system M (r+1) z (r+1) = b(r+1) or A(r+1) I 0 . A(r+1) . . .. .. . . I I 0 I A(r+1) z1 (r+1) z2 . . . zqr+1
(r+1) (r+1)

b1 (r+1) b2 . . . bqr+1
(r+1)

(r+1)

4.5 The ADI method of Peaceman and Rachford 97 (r+1) (r) (r) (r) Its solution z furnishes the subvectors with even indices of z of the system M z = (r) b in stage r, (r) (r+1) z2 z1 z (r) z (r+1) 4 2 . . , . . . . zqr1
(r)

zqr+1

(r+1)

while the subvector with odd indices of z (r) can be obtained by solving (r) (r) (r) b 1 z2 z1 A(r) z (r) b(r) z (r) z (r) A(r) 3 3 2 4 . . = . . . . . . . . (r) ( r ) ( r ) ( r ) A bqr zqr1 zqr From A(r) , b(r) produced by (4.5.36), the solution z := z (0) of (4.5.32) is thus obtained by the following procedure (13.37) (say!): Algorithm 4.5.1 (0) Initialization: Determine z (k) = z1 (k) b1 . (1) For r = k 1, k 2, ...., 0, (a) Put z2j := zj
(r ) (r+1) (k)

by solving A(k) z (k) = b(k) =

, j = 1, 2, . . . , qr+1 = 2kr 1,
(r)

(b) For j = 1, 3, 5, . . . , qr , compute zj


(r) (r)

by solving
(r )

A(r) zj = bj zj 1 zj +1 (2) Put z := z (0) .

(r)

(z0 := zqr +1 := 0).

(r)

(r)

Remark 4.5.1 (4.5.36) and Algorithm 4.5.1 are still unsatisfactory, as it has serious numerical drawbacks. We have the following disadvantages: (1) A(r+1) = 2I (A(r) )2 in (I) of (4.5.36) is very expensive, the tridiagonal matrix A(0) = A as r increases, very quickly turns into a dense matrix. So that, the computation of (A(r) )2 and the solution of (1b) of Algorithm 4.5.1 become very expensive. (2) The magnitude of A(r) grows 4 1 . 1 4 . . A = A0 = .. .. . . 0 1 exponentially: For 0 A0 4, A(r) A(r1) , 1 4

42 .

Both drawbacks can be avoided by a suitable reformulation of the algorithm. The explicit computation of A(r) is avoided if one exploits the fact that A(r) can be represented as a product of tridiagonal matrices.

98 Chapter 4. Iterative Methods for Solving Large Linear Systems Theorem 4.5.3 One has for all r 0,
2r

A
(r)

(r)

=
j =1

[(A + 2cosj I )],

(r )

where j := (2j 1)/2r+1 , for j = 1, 2, . . . , 2r . Proof: By (1) of (4.5.36), one has A(0) = A, A(r+1) = 2I (A(r) )2 , so that there exists a polynomial Pr (t) of degree 2r such that A(r) = Pr (A). Evidently, Pr satisfy P0 (t) = t, Pr+1 (t) = 2 (Pr (t))2 , so that Pr has the form Pr (t) = (t)2 + . Pr (2 cos ) = 2 cos(2r ).
r

(4.5.37)

(4.5.38) (4.5.39)

By induction, using the substitution t = 2 cos , we get

The formula is trivial for r = 0. If it is valid for some r 0, then it is also valid for r + 1, since Pr+1 (2 cos ) = 2 (Pr (2 cos ))2 = 2 4 cos2 (2r ) = 2 cos(2 2r ). In view of (4.5.39), Pr (t) has the 2r distinct real zeros 2j 1 ), j = 1, 2, . . . , 2r , 2r+1 and therefore by (4.5.38), the product representation tj = 2 cos(
2r

Pr (t) =
j =1

[(t tj )].

From this, by virtue of (4.5.37), the assertion of Theorem follows immediately. In practice, to reduce the systems A(r) u = b in (1b) of Algorithm 4.5.1 with A(r) , recursively to the solution of 2r systems with tridiagonal matrices Aj := A 2 cos j I, as follows: A1 u1 = b (r ) A2 u2 = u1 . . .
(r) (r) (r ) (r )

j = 1, 2, . . . , 2r ,

u1 u2

(4.5.40)

A2r u2r = u2r 1 u2r u := u2r .

4.5 The ADI method of Peaceman and Rachford 99 (r ) Remark 4.5.2 (i) It is easily veried, the tridiagonal matrices Aj are positive denite. One can use Cholesky decomposition for the systems. (ii) The numerical instability which occurs in (4.5.36)(2) because of the exponential growth of A(r) can be avoided. Buneman (1969) suggested that by introducing in place of the bj (r) (r) (r) pj , qj , j = 1, 2, ..., qr , which are related to bj : bj = A(r) pj + qj ,
(r) (r ) (r) (r)

other vectors

j = 1, 2, ..., qr ,

(4.5.41)

which can be computed as follows: (0) (0) (0) Set pj := 0, qj := bj = bj , j = 1, 2, ..., qr . For r = 0, 1, .., k 1 : for j = 1, 2, ..., qr+1 : Compute (r+1) (r) (r) (r ) (r) (1) pj := p2j (A(r) )1 [p2j 1 + p2j +1 + q2j ], (r+1) (r ) (r) (r+1) (2) qj := q2j 1 + q2j +1 2pj .
(r+1)

(4.5.42)

The computation of pj in (4.5.42)(1) is as in (4.5.40). The solution u of A(r) u = (r ) (r) (r) p2j 1 + p2j +1 q2j with the factorization of A(r) in Theorem 4.5.3 and then computing (r+1) pj from u by means of pj
(r+1)

:= p2j u.
(r )

(r )

Let us prove by induction on r that pj , qj in (4.5.42) satisfy the relation (4.5.41). For r = 0 (4.5.41) is trivial. Assume that (4.5.41) holds true for some r 0. Because of (4.5.36)(2) and A(r+1) = 2I (A(r) )2 , we then have bj
(r+1)

(r )

= b2j +1 + b2j 1 A(r) b2j


(r) (r) (r) (r)

(r)

(r)

(r) (r) (r) (r ) (r)

= A(r) p2j +1 + q2j +1 + A(r) p2j 1 + q2j 1 A(r) [A(r) p2j + q2j ] = A(r) [p2j +1 + p2j 1 q2j ] + A(r+1) p2j + q2j 1 + q2j +1 2p2j
(r) (r) (r) (r ) (r) (r) (r) (r) (r ) (r) (r ) (r) (r) (r ) (r) (r) (r ) (r)

= A(r+1) p2j + (A(r) )1 {[2I A(r+1) ][p2j +1 + p2j 1 q2j ]} + q2j 1 + q2j +1 2p2j = A(r+1) {p2j (A(r) )1 [p2j +1 + p2j 1 q2j ]} + q2j 1 + q2j +1 2pj = A(r+1) pj
(r+1) (r+1)

+ qj

(r+1)

.
(r ) (r) (r)

By (4.5.41) we can express bj in Algorithm 4.5.1 in terms of pj , qj and obtain, for (r) example, from (1b) of Algorithm 4.5.1 for zj the system A(r) zj = A(r) pj + qj zj 1 zj +1 ,
(r) (r) (r) (r) (r )

100 Chapter 4. Iterative Methods for Solving Large Linear Systems which can be solved by determining u of A(r) u = qj zj 1 zj +1 , and put zj := u + pj . Replacing the bj in (4.5.36) and Algorithm 4.5.1 systematically (r) (r) by pj and qj one obtains: Algorithm 4.5.2 (Algorithm of Buneman) Consider the system (4.5.32), with q = 2k+1 1. (0) (0) (0) Initialization: Put pj := 0, qj := bj , j = 1, 2, . . . , q0 := q . (1) For r = 0, 1, . . . , k 1, For j = 1, 2, ..., qr+1 := 2kr 1 :
(r) (r) (r) (r ) (r ) (r ) (r ) (r) (r)

Compute u of A(r) u = p2j 1 + p2j +1 q2j by the factorization of (r+1) (r) (r) (r) (r) Theorem 4.5.3 and put pj := p2j u, qj := q2j 1 + q2j +1 (r+1) 2pj . (2) Determine u of the systems A(k) u = q1 , and put z (k) := z1 := p1 + u. (3) For r = k 1, k 2, ..., 0, (a) Put z2j := zj for j = 1, 2, ..., qr+1 . (r) (r) (r) (b) For j = 1, 3, 5, ..., qr determine the solution u of A(r) u = qj zj 1 zj +1 (r ) (r) and put zj := pj + u. (4) Put z := z (0) . Remark 4.5.3 This method is very ecient: For the model problem (4.1.14) (a = 1 = b, p = q = N = 2k+1 1), with its N 2 unknowns, on requires about 3kN 2 3N 2 log2 N multiplications and about the same number of additions.
(r) (r+1) (k) (k) (k)

4.5.3

Comparison with Iterative Methods


uxx uyy = 2 2 sin x sin y, for (x, y ) , , u(x, y ) = 0, for (x, y ) ,

Consider the special model problem (4.5.43)

where = {(x, y )|0 < x, y < 1}, which has the exact solution u(x, y ) = sin x sin y . Using the discretization we have Az = b, A as in (4.1.18), b = 2h2 2 u (4.5.44)

with u := [ u11 , u 21 , . . . , u N 1 , . . . , u 1N , . . . , u N N ]T and u ij := u(xi , yj ) = sin ih sin jh, h = 1/(N + 1).

4.5 The ADI method of Peaceman and Rachford Method k N r(i) Jacobi 5 3.5 103 10 1.2 103 Gauss-Seidel 5 3.0 103 10 1.1 103 25 5.6 103 Relaxation 5 1.6 103 10 0.9 103 25 0.6 103 50 1.0 102 ADI 2 5 0.7 103 10 4.4 103 25 2.0 102 4 5 1.2 103 10 0.8 103 25 1.6 105 50 3.6 104

101 i 60 235 33 127 600 13 28 77 180 9 12 16 9 13 14 14

Table 4.1: Comparison results for Jacobi, Gauss-Seidel, SOR and ADI methods

Remark 4.5.4 The vector b in (4.5.44) is an eigenvector of J = (4I A)/4, also an eigenvector of A. We have Jb = b with = cos h. The exact solution of (4.5.44) can be found z := h2 2 u . 2(1 cos h) (4.5.45)

As a measure for the error we took the residual, weighted by 1/h2 : r (r) := 1 h2 Az (i) b

We start with z (0) := 0 ( r(0) = 2 2 20). We show the table computed by Jacobi, Gauss-Seidel, SOR and ADI methods respectively: Since the Algorithm of Buneman in 13.2 is a noniterative method which yields the exact solution of (4.5.44) in a nite number of steps at the expense of about 3N 2 log2 N multiplications. From (4.5.45), by Taylor expansion in powers of h, we have zu =
2 h2 2(1cos h)

u =

h2 2 u + O(h4 ), 12
2 2

so that the error z u , in as much as u 1, satises z u h12 + O(h4 ). In order to compute z with an error of the order h2 , the needed number of iterations and operations for the Jacobi, Gauss-Seidel and SOR methods are shown in Table 4.2. (m) For a given N , RADI is minimized for m ln[4(N +1)/ ], in which case m 4(N + 1)/ e. The ADI method with optimal choice of m and optimal choice of parameter thus requires

RADI log10 (N + 1)2 3.60(log10 N )2

(m)

102

Chapter 4. Iterative Methods for Solving Large Linear Systems Method No. of iterations No. of operations 5N 4 log10 N 2.5N 4 log10 N 3.6N 3 log10 N

Jacobi 0.467(N + 1)2 log10 (N + 1)2 N 2 log10 N Gauss-Seidel 0.234(N + 1)2 log10 (N + 1)2 0.5N 2 log10 N Optimal SOR 0.36(N + 1) log10 (N + 1)2 0.72N log10 N

Table 4.2: Number of iterations and operations for Jacobi, Gauss-Seidel and SOR methods iterations to approximate the solution z of (4.5.44) with error h2 . The ADI requires about 8N 2 multiplications per iteration, so that the total number of operations is about 28.8N 2 (log10 N )2 . The Buneman method, according to 13.2 requires only 3N 2 log2 N 10N 2 log10 N multiplications for the computation of the exact solution of (4.5.44). This clearly shows the superiority of Buneman method.

4.6

Derivation and Properties of the Conjugate Gradient Method

Let A Rnn be a symmetric positive denite (s.p.d.) matrix. Here n is very large and A is sparse. Consider the linear system Ax = b.

4.6.1

A Variational Problem, Steepest Descent Method (Gradient Method).


1 1 aik xi xk F (x) = xT Ax bT x = 2 2 i,k=1
n n

Consider the functional F : Rn R with b i xi .


i=1

(4.6.1)

Then it holds: Theorem 4.6.1 For a vector x the following statements are equivalent: (i) F (x ) < F (x), for all x = x , (ii) Ax = b. (4.6.2)

Proof: From assumption there exists z0 = A1 b and F (x) can be rewritten as 1 1 T F (x) = (x z0 )T A(x z0 ) z0 Az0 . (4.6.3) 2 2 Since A is positive denite, F (x) has a minimum at x = z0 and only at x = z0 , it follows the assertion. Therefore, the solution of the linear system Ax = b is equal to the solution of the minimization problem 1 (4.6.4) F (x) = xT Ax bT x = min!. 2

4.6 Derivation and Properties of the Conjugate Gradient Method 103 Method of the steepest descent Let xk be an approximate of the exact solution x and pk be a search direction. We want to nd an k+1 such that F (xk + k+1 pk ) < F (xk ). Set xk+1 := xk + k+1 pk . This leads to the following basic problem. Basic Problem: Given x, p = 0, nd 0 such that (0 ) = F (x + 0 p) = min! Solution: Since F (x + p) = 1 (x + p)T A(x + p) bT (x + p) 2 1 2 T = p Ap + (pT Ax pT b) + F (x), 2 0 = (b Ax)T p rT p = , pT Ap pT Ap (4.6.5)

it follows that if we take

where r = b Ax = gradF (x) = residual, then x + 0 p is the minimal solution. Moreover, 1 (rT p)2 F (x + 0 p) = F (x) . (4.6.6) 2 pT Ap Steepest Descent Method with Optimal Choice k+1 (Determine k via the given data x0 , p0 , p1 , ): Let
T rk pk pk , rk = b Axk , T pk Apk T pk )2 1 (rk F (xk+1 ) = F (xk ) , k = 0, 1, 2, . 2 pT k Apk

xk+1 = xk +

(4.6.7) (4.6.8)

Then, it holds
T rk +1 pk = 0.

(4.6.9)

Since

d F (xk + pk ) = gradF (xk + pk )T pk , d


Tp rk k , T pk Apk

as in (4.6.5) k+1 =

it follows that gradF (xk + k+1 pk )T pk = 0. Thus


T (b Axk+1 )T pk = rk +1 pk = 0,

hence (4.6.9) holds. Steepest Descent Method (Gradient Method) Let : Rn R be a dierential function on x. Then it holds (x + p) (x) = (x)T p + O().

104

Chapter 4. Iterative Methods for Solving Large Linear Systems

(x) (i.e., the largest descent) for all p The right hand side takes minimum at p = (x) with p = 1 (neglect O()). Hence, it suggests to choose

pk = gradF (xk ) = b Axk . Gradient Method: Given x0 , for k = 1, 2, rk1 = b Axk1 , if rk1 = 0, then stop; else T k1 rk1 k = r , xk = xk1 + k rk1 . T r Ark1
k1

(4.6.10)

(4.6.11)

Cost in each step: compute Axk1 (Ark1 does not need to compute). To prove the convergence of Gradient method, we need the Kontorowitsch inequality: Let 1 2 n > 0, i 0,
n n n

i = 1. Then it holds 1 + n n 2 ). 1 (4.6.12)

i=1 1 j j j =1

i i
i=1

1 (1 + n )2 = ( 41 n 4

Proof of (4.6.12): Consider the n points Pi = (i , 1/i ). Let B be the region between y = 1/x and the straight line through P1 , Pn . The slope of the straight line P1 Pn is 1/n 1/1 1 = . n 1 n 1 The point P =
n i=1

i Pi lies in B . Maximize xy , for all (x, y ) B . The point (, ) which

lies on P1 Pn is a maximum for and has the coordinates: = n + (1 )1 , and = Since 0 = d 1 1 [(n + (1 )1 )( + (1 ) )] d n 1 d 2 n 1 = [ + (1 )2 + (1 )( + )] d 1 n n 1 = 2 + 2( 1) + (1 2)( + ) 1 n n 1 = (1 2)( + 2), 1 n 1 1 + (1 ) . n 1

it follows = 1/2. Hence 1 1 1 (1 + n )2 1 . = (1 + n )( + ) = 4 1 n 4 1 n

4.6 Derivation and Properties of the Conjugate Gradient Method 105 So (4.6.12) holds. Another form: Let A be s.p.d. (symmetric positive denite) and 1 2 n > 0 T be the eigenvalues of A. Let x be a vector with x 2 2 = x x = 1, then it holds xT Ax xT A1 x 1 (1 + n )2 1 = ( 4 1 n 4 1 + n n 2 ). 1 (4.6.13)

Proof of (4.6.13): Let U be an orthogonal matrix satisfying U AU T = = diag(1 , , n ). Then we have n xT Ax = xT U T U x = y T y =


i=1 2 i yi

(y := U x).

Similarly, x A x=y y=
T 1 T 1

n 2 yi i=1

1 . i

From (4.6.12) follows (4.6.13). Theorem 4.6.2 If xk , xk1 are two approximations of the gradient method (4.6.11) for solving Ax = b and 1 2 n > 0 are the eigenvalues of A, then it holds: 1 1 n 2 1 F (xk ) + bT A1 b ( ) [F (xk1 ) + bT A1 b], 2 1 + n 2 i.e., xk x where x
A A

(4.6.14a)

1 n ) xk1 x 1 + n

A,

(4.6.14b)

xT Ax. Thus the gradient method is convergent.

Proof: By computation, F (xk ) + = = = = = 1 T 1 1 b A b = (xk x )T A(xk x ) 2 2 1 (xk1 x + k rk1 )T A(xk1 x + k rk1 ) ( since A(xk1 x ) = rk1 ) 2 1 T 2 T [(xk1 x )T A(xk1 x ) 2k rk 1 rk1 + k rk1 Ark1 ] 2 T 2 (rk 1 T 1 rk1 ) [rk1 A1 rk1 T ] 2 rk1 Ark1 T 2 (rk 1 T 1 rk1 ) 1 r A rk1 [1 T ] T 1 2 k 1 rk1 Ark1 rk 1 A rk1 1 T 41 n rk1 A1 rk1 [1 ] ( from (4.6.13)) 2 (1 + n )2 1 1 n 2 [F (xk1 ) + bT A1 b]( ). 2 1 + n

1 n If the condition number of A (= 1 /n ) is large, then 1. The gradient method 1 +n converges very slowly. Hence this method is not recommendable.

106

Chapter 4. Iterative Methods for Solving Large Linear Systems

4.6.2

Conjugate gradient method

It is favorable to choose that the search directions {pi } as mutually A-conjugate, where A is symmetric positive denite. Denition 4.6.1 Two vectors p and q are called A-conjugate (A-orthogonal), if pT Aq = 0. Remark 4.6.1 Let A be symmetric positive denite. Then there exists a unique s.p.d. B such that B 2 = A. Denote B = A1/2 . Then pT Aq = (A1/2 p)T (A1/2 q ). Lemma 4.6.3 Let p0 , . . . , pr = 0 be pairwisely A-conjugate. Then they are linearly independent. Proof: From 0 =
r j =0 r r

cj pj follows that

pT k A(
j =0

cj pj ) = 0 =
j =0

T cj pT k Apj = ck pk Apk ,

so ck = 0, for k = 1, . . . , r. Theorem 4.6.4 Let A be s.p.d. and p0 , . . . , pn1 be nonzero pairwisely A-conjugate vectors. Then n1 pj pT j 1 A = . (4.6.15) T p Ap j j j =0
T T Remark 4.6.2 A = I , U = (p0 , . . . , pn1 ), pT = I and i pi = 1, pi pj = 0, i = j . U U T I = U U . Then pT 0 . T T I = (p0 , . . . , pn1 ) . . = p0 p0 + + pn1 pn1 . pT n1 A Proof of Theorem 4.6.4: Since p i =
1/2 p i pT Ap i i

are orthonormal, for i = 0, 1, . . . , n 1,

we have

n1 p T I = p 0 p 0 T + . . . + p n1
n1

=
i=0

1/2 A1/2 pi pT i A = A1/2 pT Ap i i

n1

i=0

pi pT i pT Ap i i

A1/2 .

Thus,
n1

1/2

IA

1/2

=A

=
i=0

pi pT i . T pi Api

4.6 Derivation and Properties of the Conjugate Gradient Method 107 1 Remark 4.6.3 Let Ax = b and x0 be an arbitrary vector. Then from x x0 = A (b Ax0 ) and (4.6.15) follows that
n1

x = x0 +
i=0

pT i (b Ax0 ) pi . (pT i Api )

(4.6.16)

Theorem 4.6.5 Let A be s.p.d. and p0 , . . . , pn1 Rn \{0} be pairwisely A-orthogonal. Given x0 and let r0 = b Ax0 . For k = 0, . . . , n 1, let k = xk+1 rk+1 pT k rk , T pk Apk = xk + k pk , = rk k Apk . (4.6.17) (4.6.18) (4.6.19)

Then the following statements hold: (i) rk = b Axk . (By induction). (ii) xk+1 minimizes F (x) (see (4.6.1)) on x = xk + pk , R. (iii) xn = A1 b = x . (iv) xk minimizes F (x) on the ane subspace x0 + Sk , where Sk = Span{p0 , . . . , pk1 }. Proof: (i): By Induction and using (4.6.18) (4.6.19). (ii): From (4.6.5) and (i). (iii): It is enough to show that xk (which dened in (4.6.18)) corresponds with the partial sum in (4.6.16), i.e.,
k1

xk = x0 +
=0

pT (b Ax0 ) p . pT Ap

Then it follows that xn = x from (4.6.16). From (4.6.17) and (4.6.18) we have
k1 k1

xk = x0 +
=0

p = x0 +
=0

pT (b Ax ) p . pT Ap

To show that
T pT (b Ax ) = p (b Ax0 ).

(4.6.20)

From xk x0 =

k1 =0

p we obtain
k 1

pT k Axk

pT k Ax0

=
=0

pT k Ap = 0.

So (4.6.20) holds. (iv): From (4.6.19) and (4.6.17) follows that


T T pT k rk+1 = pk rk k pk Apk = 0.

108 Chapter 4. Iterative Methods for Solving Large Linear Systems From (4.6.18), (4.6.19) and by the fact that rk+s rk+s+1 = k+s Apk+s and pk are orthogonal (for s 1) follows that
T T pT k rk+1 = pk rk+2 = . . . = pk rn = 0.

Hence we have pT i rk = 0, i = 0, . . . , k 1, k = 1, 2, . . . , n. (i.e., i < k ). We now consider F (x) on x0 + Sk :


k 1

(4.6.21)

F ( x0 +
i=0

i pi ) = (0 , . . . , k1 ).
i

F (x) is minimal on x0 + Sk if and only if all derivatives = [gradF (x0 + s


k 1

vanish at x. But (4.6.22)

i pi )]T ps , s = 0, 1, . . . , k 1.
i=0

If x = xk , then gradF (x) = rk . From (4.6.21) follows that (xk ) = 0, for s = 0, 1, . . . , k 1. s Another proof of (iv): For arbitrary d Rn it holds F (x0 + d) F (x0 ) = 1 1 (x0 + d)T A(x0 + d) bT (x0 + d) xT Ax0 + bT x0 2 2 0 1 T = d Ad dT (b Ax0 ). 2

So for d =

k1 i=0

i pi we have
k1

F (x0 +
i=0

1 i pi ) = F (x0 ) + ( i pi )T A( j pj ) 2 i=0 j =0 = F (x0 ) + 1 2


k 1

k 1

k 1

k 1

i pT i (b Ax0 )
i=0

T [i2 pT i Api 2pi (b Ax0 )i ] = min!. i=0

(4.6.23)

The equation (4.6.23) holds if and only if i2 pT i Api 2i pi (b Ax0 ) = min! i = 0, . . . , k 1, if and only if i = pT pT i (b Ax0 ) i ri = = i T T pi Api pi Api
k 1 i=0

from (4.6.20) and (4.6.17). Thus xk = x0 +

i pi minimizes F on x0 + span{p0 , . . . , pk1 }.

4.6 Derivation and Properties of the Conjugate Gradient Method 109 T Remark 4.6.4 The following conditions are equivalent: (i) pi Apj = 0, i = j , AT conjugate, (ii) pT i rk = 0, i < k , (iii) ri rj , i = j . Proof of (iii):
T T T T pT i rk = 0 (ri + i1 pi1 )rk , i < k ri rk = 0, i < k ri rj = 0, i = j.

Remark 4.6.5 It holds < p0 , p1 , , pk >=< r0 , r1 , , rk >=< r0 , Ar0 , , Ak r0 > Since p1 = r1 + 0 p0 = r1 + 0 r0 , r1 = r0 0 Ar0 , by induction, we have r2 = r1 0 Ap1 = r1 0 A(r1 + 0 r0 ) = r0 0 Ar0 0 A(r0 0 Ar0 + 0 r0 ). Algorithm 4.6.1 (Method of conjugate directions) Let A be s.p.d., b and x0 Rn . Given p0 , . . . , pn1 Rn \{0} pairwisely A-orthogonal. r0 = b Ax0 , For k = 0, . . . , n 1, k rk k = pp T Ap , xk +1 = xk + k pk , k k rk+1 = rk k Apk = b Axk+1 , end for From Theorem 4.6.5 we get xn = A1 b.

4.6.3

Practical Implementation

In the k -th step a direction pk which is A-orthogonal to p0 , . . . , pk1 must be determined. It allows for A-orthogonalization of rk against p0 , . . . , pk1 (see (4.6.21)). Let rk = 0, F (x) decreases strictly in the direction rk . For > 0 small, we have F (xk rk ) < F (xk ). It follows that F takes its minimum at a point (= xk ) on x0 + span{p0 , . . . , pk1 , rk }. So it holds xk+1 = xk , i.e., k = 0. This derives that Conjugate Gradient method. Algorithm 4.6.2 (Conjugate Gradient method (CG-method), (Stiefel-Hestenes, 1952)) Let A be s.p.d., b Rn , choose x0 Rn , r0 = b Ax0 = p0 . If r0 = 0, then N = 0 stop; otherwise for k = 0, 1, . . . pT rk , (a) k = Tk pk Apk (b) xk+1 = xk + k pk , (c) rk+1 = rk k Apk = b Axk+1 , if rk+1 = 0, let N = k + 1, stop. T rk +1 Apk (d) k = , T pk Apk (e) pk+1 = rk+1 + k pk . Theorem 4.6.6 The CG-method holds

(4.6.24)

110 Chapter 4. Iterative Methods for Solving Large Linear Systems (i) If k steps of CG-method are executable, i.e., ri = 0, for i = 0, . . . , k , then pi = 0, i k and pT i Apj = 0 for i, j k , i = j . (ii) The CG-method breaks down after N steps for rN = 0 and N n. (iii) xN = A1 b. Proof: (i): By induction on k , it is trivial for k = 0. Suppose that (i) is true until k and rk+1 = 0. Then pk+ 1 is well-dened. we want to verify that (a) pk+1 = 0, (b) pT k+1 Apj = 0, for j = 0, 1, . . . , k . T T T For (a): First, it holds rk +1 pk = rk pk k pk Apk = 0 by (4.6.24)(c). Let pk+1 = 0. Then T T from (4.6.24)(e) we have rk+1 = k pk = 0. So, k = 0, hence 0 = rk +1 pk = k pk pk = 0. This is a contradiction, so pk+1 = 0. For (b): From (4.6.24)(d) and (e), we have
T T pT k+1 Apk = rk+1 Apk + k pk Apk = 0.

Let j < k , from (4.6.24)(e) we have


T T T pT k+1 Apj = rk+1 Apj + k pk Apj = rk+1 Apj .

(4.6.25)

It is enough to show that Apj span{p0 , ...., pj +1 }, j < k. (4.6.26)

Then from the relation pT i rj = 0, i < j k + 1, which has been proved in (4.6.21), follows (b). Claim (4.6.26): For rj = 0, it holds that j = 0. (4.6.24)(c) shows that Apj = 1 (rj rj +1 ) span{r0 , ...., rj +1 }. j

(4.6.24)(e) shows that span{r0 , ...., rj +1 } = span{p0 , ...., pj +1 } with r0 = p0 , so is (4.6.26). +1 (ii): Since {pi }k i=0 = 0 and are mutually A-orthogonal, p0 , ..., pk+1 are linearly independent. Hence there exists a N n with rN = 0. This follows xN = A1 b. Advantage:(1) Break-down in nite steps. (2) Less cost in each step: one matrix vector.
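For reference, a direct transcription of Algorithm 4.6.2 into Python/NumPy is sketched below. The test matrix and the tolerance are arbitrary; in exact arithmetic the method would terminate after at most n steps, and in floating point it is stopped on a relative residual.

import numpy as np

def cg(A, b, x0=None, tol=1e-12, max_iter=None):
    """Sketch of Algorithm 4.6.2 (conjugate gradient method) for s.p.d. A."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)          # (4.6.24)(a)
        x = x + alpha * p                   # (b)
        r = r - alpha * Ap                  # (c)
        beta = -(r @ Ap) / (p @ Ap)         # (d)
        p = r + beta * p                    # (e)
    return x, max_iter

rng = np.random.default_rng(2)
M = rng.standard_normal((200, 200))
A = M.T @ M + 10*np.eye(200)                # arbitrary s.p.d. test matrix
b = rng.standard_normal(200)
x, its = cg(A, b)
print("iterations:", its, " residual:", np.linalg.norm(b - A @ x))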

4.6.4

Convergence of CG-method

Consider the following A-norm with A being s.p.d. x


A

= (xT Ax)1/2 .

(4.6.27)

Let x = A1 b. Then from (4.6.3) we have 1 1 F (x) F (x ) = (x x )T A(x x ) = x x 2 2


2 A,

(4.6.28)

4.6 Derivation and Properties of the Conjugate Gradient Method 111 where xk is the k -th iterate of CG-method. From Theorem 4.6.5 xk minimizes the functional F on x0 + span{p0 , ...., pk1 }. Hence it holds xk x
A

y x

A,

y x0 + span{p0 , .., pk1 }.

(4.6.29)

From (4.6.24)(c)(e) it is easily seen that both pk and rk can be written as linear combination of r0 , Ar0 , . . . , Ak1 r0 . If y x0 + span{p0 , . . . , pk1 }, then y = x0 + c1 r0 + c2 Ar0 + ..... + ck Ak1 r0 = x0 + Pk1 (A)r0 , where Pk1 is a polynomial of degree k 1. But r0 = b Ax0 = A(x x0 ), thus y x = (x x ) + Pk1 (A)A(x x0 ) k (A)(x0 x ), = [I APk1 (A)] (x0 x ) = P k k and where degree P k (0) = 1. P (4.6.30) (4.6.31)

k is a polynomial of degree k and satises (4.6.31), then Conversely, if P k (A)(x0 x ) x0 + Sk . x + P k is a polynomial of degree k with P k (0) = 1, then Hence (4.6.29) means that if P xk x
A

k (A)(x0 x ) P

A.

(4.6.32)

Lemma 4.6.7 Let A be s.p.d. It holds for every polynominal Qk of degree k that max
x=0

Qk (A)x x A

= (Qk (A)) = max{|Qk ()| : eigenvalue of A}.

(4.6.33)

Proof: Qk (A)x x 2 A
2 A

xT Qk (A)AQk (A)x xT Ax (A1/2 x)T Qk (A)Qk (A)(A1/2 x) = (Let z := A1/2 x) (A1/2 x)(A1/2 x) z T Qk (A)2 z (Qk (A)2 ) = 2 (Qk (A)). = T z z =

Equality holds for suitable x, hence the rst equality is shown. The second equality holds by the fact that Qk () is an eigenvalue of Qk (A), where is an eigenvalue of A. From (4.6.33) we have that xk x
A

k (A)) x0 x (P

A,

(4.6.34)

k k and P k (0) = 1. where degree P Replacement problem for (4.6.34): For 0 < a < b, minmax{|Pk ()| : a b, for all polynomials of degree k with Pk (0) = 1} (4.6.35)

112 Chapter 4. Iterative Methods for Solving Large Linear Systems (if a = 0, it is clearly min max{| Pk () |} = 1). We use Chebychev polynomials of the rst kind for the solution. They are dened by T0 (t) = 1, T1 (t) = t, Tk+1 (t) = 2tTk (t) Tk1 (t). (4.6.36)

it holds Tk (cos ) = cos(k) by using cos((k + 1)) + cos((k 1)) = 2 cos cos k. Especially, j Tk (cos ) = cos(j ) = (1)j , for j = 0, . . . , k, k i.e. Tk takes maximal value one at k + 1 positions in [1, 1] with alternating sign. In addition (Exercise!), we have 1 (4.6.37) Tk (t) = [(t + t2 1)k + (t t2 1)k ]. 2 Lemma 4.6.8 The solution of the problem (4.6.35) is given by Qk (t) = Tk 2t a b ba Tk a+b , ab

i.e., for all Pk of degree k with Pk (0) = 1 it holds


[a,b]

max |Qk ()| max |Pk ()|.


[a,b]

Proof: Qk (0) = 1. If t runs through the interval [a, b], then (2t a b)/(b a) runs through the interval [1, 1]. Hence, in [a, b], Qk (t) has k + 1 extreme with alternating a+b 1 sign and absolute value = |Tk ( a ) |. b If there are a Pk with max {|Pk ()| : [a, b]} < , then Qk Pk has the same sign as Qk of the extremal values, so Qk Pk changes sign at k + 1 positions. Hence Qk Pk has k roots, in addition a root zero. This contradicts that degree(Qk Pk ) k . Lemma 4.6.9 It holds = Tk where c =
1 +1 1

b+a ab

1 Tk
b+a ba

2ck = 2ck , 2 k 1+c

(4.6.38)

and = b/a.
b+a ba

Proof: For t =

+1 , 1

we compute t+ t2 +1 1= = c1 1 1 1= = c. +1

and t Hence from (4.6.37) follows =

t2

2 2ck = 2ck . k k 2 k c +c 1+c

4.7 CG-method as an iterative method, preconditioning Theorem 4.6.10 CG-method satises the following error estimate xk x where c =
1 , +1 A

113 (4.6.39)

2ck x0 x

A,

1 n

and 1 n > 0 are the eigenvalues of A. (Pk (A)) x0 x A max {|Pk ()| : 1 n } x0 x

Proof: From (4.6.34) we have xk x


A A,

for all Pk of degree k with Pk (0) = 1. From Lemma 4.6.8 and Lemma 4.6.9 follows that xk x
A

max {|Qk ()| : 1 n } x0 x 2ck x0 x A .

Remark 4.6.6 To compare with Gradient method (see (4.6.14b)): Let xG k be the k th iterate of Gradient method. Then xG k But x
A

1 n 1 + n

x0 x

A.

because in general method.

1 n 1 1 = > = c, 1 + n +1 +1 . Therefore the CG-method is much better than Gradient

4.7

CG-method as an iterative method, preconditioning


Ax = b. (4.7.1) (4.7.2)

Consider the linear system of a symmetric positive denite matrix A Let C be a nonsingular symmetric matrix and consider a new linear system x A = b = C 1 AC 1 s.p.d., with A b = C 1 b and x = Cx. Applying CG-method to (4.7.2) it yields: x Choose x 0 , r 0 = bA 0 = p 0 . If r 0 = 0, stop, otherwise for k = 0, 1, 2, . . ., 1 (a) k = p T k /p T AC 1 p k , kr kC (b) x k+1 = x k + k p k , (c) r k+1 = r k k C 1 AC 1 p k , if r k+1 = 0 stop; otherwise, T 1 (d) k = r k AC 1 p k /p k C 1 AC 1 p k , +1 C k p (e) p k+1 = r k+1 + k .

(4.7.3)

114 Chapter 4. Iterative Methods for Solving Large Linear Systems Simplication: Let C 1 p k = pk , xk = C 1 x k , z k = C 1 r k . Then x rk = C r k = C bA k = C C 1 b C 1 AC 1 Cxk = b Axk .

Algorithm 4.7.1 (Preconditioned CG-method (PCG)) M = C 2 , choose x0 = C 1 x 0 , r0 = b Ax0 , solve M p0 = r0 . If r0 = 0 stop, otherwise for k = 0, 1, 2, ...., T (a) k = pT k rk /pk Apk , (b) xk+1 = xk + k pk , (c) rk+1 = rk k Apk , if rk+1 = 0, stop; otherwise M zk+1 = rk+1 , T T (d) k = zk +1 Apk /pk Apk , (e) pk+1 = zk+1 + k pk .

(4.7.4)

Algorithm 4.7.1 is CG-method with preconditioner M . If M = I , then it is CGmethod. Additional cost per step: solve one linear system M z = r for z . Advantage: cond(M 1/2 AM 1/2 ) cond(A).

4.7.1

A new point of view of PCG

From (4.6.21) and Theorem 4.6.6 follows that pi T rk = 0 for i < k , i.e., (ri T +i1 pi1 T )rk = ri T rk = 0, i < k and pi T Apj = 0, i = j . That is, the CG method requires ri T rj = 0, i = j. T So, the PCG method satises pi T C 1 AC 1 pj = 0 r j r j = 0, i = j and requires
T ziT M zj = ri M 1 M M 1 rj = ri T M 1 rj = ri T C 1 C 1 rj = ri T rj = 0,

i = j.

Consider the iteration (in two parameters): xk+1 = xk1 + k+1 (k zk + xk xk1 ) (4.7.5)

with k and k+1 being two undetermined parameters. Let A = M N . Then from M zk = rk b Axk follows that M zk+1 = b A (xk1 + k+1 (k zk + xk xk1 )) = M zk1 k+1 [k (M N )zk + M (zk1 zk )] For PCG method {k , k+1 } are computed so that zp T M zq = 0, p = q, p, q = 0, 1, , n 1. (4.7.7) (4.7.6)

Since M > 0, there is some k n such that zk = 0. Thus, xk = x, the iteration converges no more than n steps. We show that (4.7.7) holds by induction. Assume zp T M z q = 0 , p = q, p, q = 0, 1, , k (4.7.8)

4.7 CG-method as an iterative method, preconditioning holds until k . If we choose k = zk T M zk then zk T M zk+1 = 0 and if we choose k+1 = then
T zk 1 M zk+1 = 0.

115

zk T (M N )zk ,

z T N zk 1 k T k 1 zk1 M zk1

(4.7.9)

We want to simplify k+1 . From (4.7.6) follows that M zk = M zk2 k (k1 (M N )zk1 + M (zk2 zk1 )) . Multiplying (4.7.10) by zk T and from (4.7.8) we get z k T N zk 1 = z k T M zk k k1 . (4.7.11) (4.7.10)

T T Since zk 1 N zk = zk Zzk1 , from (4.7.11) the equation (4.7.9) becomes

k+1 =

k zk T M zk 1 1 T k1 zk1 M zk1 k

(4.7.12)

From (4.7.6) for j < k 1 we have zj T M zk+1 = k k+1 zj T N zk . But (4.7.6) holds for j < k 1, M zj +1 = M zj 1 j +1 (j (M N )zj + M (zj 1 zj )) . Multiplying (4.7.14) by zk T we get zk T N zj = 0. Since N = N T , it follows that zj T M zk+1 = 0, for j < k 1. (4.7.14) (4.7.13)

Thus, we proved that zp T M zq = 0, p = q , p, q = 0, 1, , n 1. Consider (4.7.5) again xk+1 = xk1 + k+1 (k zk + xk xk1 ). Since M zk = rk = b Axk , if we set k+1 = k = 1, then xk+1 = M 1 (b Axk ) + xk xk + zk . (4.7.15)

116 Chapter 4. Iterative Methods for Solving Large Linear Systems Here zk is referred to as a correction term. Write A = M N . Then (4.7.15) becomes M xk+1 = b Axk + M xk = N xk + b. Recall the Iterative Improvement in Subsection 2.3.6: Solve Ax = b, rk = b Axk , Azk = rk , M zk = rk . xk+1 = xk + zk .

(4.7.16)

(i) Jacobi method (k+1 = k = 1): A = D (L + R), xk+1 = xk + D1 (b Axk ). (ii) Gauss-Seidel (k+1 = k = 1): A = (D L) R,

xk+1 = xk + (D L)1 (b Axk ). i.e.,


j 1 (k+1) xj n k+1) ajp x( p p=1 p=j +1 (k) ajp xp + xj 1 xj (k) (k)

= bj

x1 . . . (k+1) xj 1 (k) = xj + bj (aj 1 , . . . , aj,j 1 , 1, aj,j +1 , . . . , ajn ) x(k) j . . . (k) xn (iii) SOR-method (k+1 = 1, k = ): Solve Ax = b. Write

(k+1)

(D = I ).

A = (D L) ((1 )D + R) M N. Then xk+1 = = = = = (D L)1 (R + (1 )D)xk + (D L)1 b (D L)1 ((D L) A)xk + (D L)1 b (I (D L)1 A)xk + (D L)1 b xk + (D L)1 (b Axk ) xk + M 1 rk .

4.7 CG-method as an iterative method, preconditioning i.e.,


j 1 (k+1) xj n k+1) ajp x( p p=1

117

= bj

p=j +1

k) ajp x( p

+ (1 )xj

(k)

x1 . . . (k+1) xj 1 (k ) = xj + bj (aj 1 , , aj,j 1 , 1, aj,j +1 , , ajn ) x(k) j . . . (k) xn (iv) Chebychev Semi-iterative method (later!) (k+1 = k+1 , k = ): xk+1 = xk1 + k+1 (zk + xk xk1 ) .

(k+1)

We can think of the scalars k+1 , k in (4.7.5) as acceleration parameters that can be chosen to speed the convergence of the iteration M xk+1 = N xk + b. Hence any iterative method based on the splitting A = M N can be accelerated by the Conjugate Gradient Algorithm so long as M (the preconditioner) is symmetric and positive denite. Choices of M (Criterion): (i) cond(M 1/2 AM 1/2 ) is nearly by 1, i.e., M 1/2 AM 1/2 I, A M . (ii) The linear system M z = r must be easily solved. e.g. M = LLT (see Section 16.) (iii) M is symmetric positive denite. Explanation: Why we need to use preconditioning for solving the linear system Ax = b. Fixed Point Principle: x = b Ax + x = (I A)x + b. Thus x = Bx + b with B I A. Fixed Point Iteration: xi+1 = Bxi + b. Let ei = xi x. Then ei+1 = Bei = B i e0 . Thus {ei } 0 if and only if (B ) < 1. Hence we want to nd an M so that M 1 A I with A = M N . Consider M 1 Ax = M 1 b, then xi+1 = I M 1 A xi + M 1 b = I M 1 (M N ) xi + M 1 b, = M 1 N xi + M 1 b. (4.7.17)

118 Chapter 4. Iterative Methods for Solving Large Linear Systems Here A = M N is called a splitting iterative scheme and M z = r should be easily solvable. The iteration (4.7.17) is called a preconditioned xed point iteration. Jacobi: A = D (L + R). Gauss-Seidel: A = (D L) R. SOR (Successive Over Relaxation): Ax = b, Ax = b, ( > 1), A = D L R = (D L) [(1 )D + R] M N. This implies, xi+1 = (D L)1 [(1 )D + R]xi + (D L)1 b = M 1 N xi + M 1 b (M 1 N = I (D L)1 A). SSOR (Symmetric Successive Over Relaxation): L LT . Let M : = D L, N : = (1 )D + LT , Then from the iterations M xi+1/2 = N xi + b, T T M xi+1 = N xi+1/2 + b, follows that xi+1 =
T T 1 M N M N x i + b

A is symmetric and A = D

and

T M = D LT , T N = (1 )D + L.

T T 1 T Gxi + M N M + M b 1 Gxi + M ( ) b.

But ((1 )D + L) (D L)1 + I = (L D D + 2D)(D L)1 + I = I + (2 )D(D L)1 + I = (2 )D(D L)1 , Thus M ( )1 = D LT then M ( ) = 1 (D L)D1 D LT (2 ) (D L)D1 D LT , ( = 1). (4.7.18)
1

(2 )D(D L)1 ,

4.8 Incomplete Cholesky Decomposition 119 1/2 1/2 1/2 For a suitable the condition number of M ( ) AM ( ) , i.e., cond(M ( ) AM ( )1/2 ), can be considered smaller than cond(A). Axelsson(1976) showed (without proof): Let = max
x=0

xT Dx ( cond(A)) xT Ax

and = max
x=0

xT (LD1 LT 1 D )x 1 4 . T x Ax 4
1/2

Then cond M ( ) for =


1+2 1/2

AM ( )

1+

(2 )2 4

= ( )

2 , (2 +1)/2

( ) is minimal and ( ) = 1/2 + 1 + 2

(1/2 + ). Especially cond(A).

cond M ( )1/2 AM ( )1/2

(1/2 + )cond(A)

Disadvantage : , in general are unknown. SSOR + Conjugate Gradient method. SSOR + Chebychev Semi-iterative Acceleration (later!)

4.8

Incomplete Cholesky Decomposition

Let A be sparse and symmetric positive denite. Consider the Cholesky decomposition of A = LLT . L is a lower triangular matrix with lii > 0 (i = 1, ..., n). L can be heavily occupied (ll-in). Consider the following decomposition A = LLT N, (4.8.1)

where L is a lower triangular matrix with prescribed reserved pattern E and N is small.

Reserved pattern: E ⊂ {1, ..., n} × {1, ..., n} with (i, i) ∈ E, i = 1, ..., n, and (i, j) ∈ E ⟺ (j, i) ∈ E.

For a given reserved pattern E we construct the matrices L and N as in (4.8.1) with

(i) A = LL^T − N,    (4.8.2a)
(ii) L lower triangular with l_ii > 0 and l_ij = 0 if (i, j) ∉ E,    (4.8.2b)
(iii) N = (n_ij), n_ij = 0 if (i, j) ∈ E.    (4.8.2c)

First step: Consider the Cholesky decomposition of A,

A = [ a_11, a_1^T ; a_1, A_1 ] = [ √a_11, 0 ; a_1/√a_11, I ] [ 1, 0 ; 0, Ã_1 ] [ √a_11, a_1^T/√a_11 ; 0, I ] ≡ L_1 [ 1, 0 ; 0, Ã_1 ] L_1^T,

where Ã_1 = A_1 − a_1 a_1^T / a_11.

For the Incomplete Cholesky decomposition the first step is modified as follows. Define b_1 = (b_21, ..., b_n1)^T and c_1 = (c_21, ..., c_n1)^T by

b_j1 = { a_j1, (j,1) ∈ E;  0, otherwise },    c_j1 = b_j1 − a_j1 = { 0, (j,1) ∈ E;  −a_j1, otherwise }.    (4.8.3)

Then

A = [ a_11, b_1^T ; b_1, A_1 ] − [ 0, c_1^T ; c_1, 0 ] ≡ B̃_0 − C_1.    (4.8.4)

Computing the Cholesky decomposition of B̃_0, we get

B̃_0 = [ √a_11, 0 ; b_1/√a_11, I ] [ 1, 0 ; 0, B_1 ] [ √a_11, b_1^T/√a_11 ; 0, I ] ≡ L_1 B̃_1 L_1^T    (4.8.5)

with

B_1 = A_1 − b_1 b_1^T / a_11.    (4.8.6)

Then

A = L_1 B̃_1 L_1^T − C_1.    (4.8.7)

Consequently, apply the same (incomplete) step to B_1, i.e. B̃_1 = L_2 B̃_2 L_2^T − C_2, where L_2 = [ 1, 0 ; 0, ∗ ] has a leading 1 in position (1,1) and a lower triangular block below, and C_2 = [ 0, 0 ; 0, ∗ ] has zero first row and column. Thus

A = L_1 L_2 B̃_2 L_2^T L_1^T − L_1 C_2 L_1^T − C_1,    (4.8.8)

and so on, hence

A = L_1 ⋯ L_n I L_n^T ⋯ L_1^T − C_{n−1} − C_{n−2} − ⋯ − C_1    (4.8.9)

with

L = L_1 ⋯ L_n and N = C_1 + C_2 + ⋯ + C_{n−1}.    (4.8.10)
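A dense (for readability) Python/NumPy sketch of the construction (4.8.3)-(4.8.10): an incomplete Cholesky factorization that keeps l_ij only on a prescribed reserved pattern E, passed here as a boolean matrix. Choosing E = E(A), the sparsity pattern of A, gives the usual IC(0) variant. The test matrix is an illustrative symmetric M-matrix; a breakdown s_m ≤ 0 (see the discussion following Lemma 4.8.1 below) raises an error.

```python
import numpy as np

def incomplete_cholesky(A, E):
    """Incomplete Cholesky A = L L^T - N with l_ij = 0 outside the pattern E.
    E is a boolean (n x n) array; (i, i) must belong to E for all i."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        s = A[j, j] - np.dot(L[j, :j], L[j, :j])
        if s <= 0.0:
            raise ValueError(f"breakdown: s_{j} = {s} <= 0 (pattern too poor / A not an M-matrix)")
        L[j, j] = np.sqrt(s)
        for i in range(j + 1, n):
            if E[i, j]:
                L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
            # else: l_ij = 0, the discrepancy is collected in N
    N = L @ L.T - A
    return L, N

if __name__ == "__main__":
    A = np.array([[ 4., -1.,  0., -1.],
                  [-1.,  4., -1.,  0.],
                  [ 0., -1.,  4., -1.],
                  [-1.,  0., -1.,  4.]])   # symmetric M-matrix
    E = (A != 0.0)                         # IC(0): keep exactly the pattern of A
    L, N = incomplete_cholesky(A, E)
    print(np.allclose(A, L @ L.T - N))     # True: A = L L^T - N with n_ij = 0 on E
```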

Lemma 4.8.1 Let A be s.p.d. and E a reserved pattern. Then there is at most one decomposition A = LL^T − N which satisfies the conditions
(4.8.2b): L is lower triangular with l_ii > 0 and l_ij = 0 if (i, j) ∉ E;
(4.8.2c): N = (n_ij), n_ij = 0 if (i, j) ∈ E.

Proof: Let A = LL^T − N = L̃L̃^T − Ñ. Then a_11 = l_11² = l̃_11², so l_11 = l̃_11 (since l_11, l̃_11 are positive). Also a_k1 = l_k1 l_11 − n_k1 = l̃_k1 l̃_11 − ñ_k1, so we have

If (k, 1) ∈ E ⟹ n_k1 = ñ_k1 = 0 ⟹ l_k1 = l̃_k1 = a_k1 / l_11,    (4.8.11a)
If (k, 1) ∉ E ⟹ l_k1 = l̃_k1 = 0 ⟹ n_k1 = ñ_k1 = −a_k1.    (4.8.11b)

Suppose that l_ki = l̃_ki, n_ki = ñ_ki for k = i, ..., n, 1 ≤ i ≤ m − 1. Then from

a_mm = l_mm² + Σ_{k=1}^{m−1} l_mk² = l̃_mm² + Σ_{k=1}^{m−1} l̃_mk²

follows that l_mm = l̃_mm. Also from

a_rm = l_rm l_mm + Σ_{k=1}^{m−1} l_rk l_mk − n_rm = l̃_rm l̃_mm + Σ_{k=1}^{m−1} l̃_rk l̃_mk − ñ_rm

and (4.8.11) follows that n_rm = ñ_rm and l_rm = l̃_rm (r ≥ m).

The Incomplete Cholesky decomposition may not exist if

s_m := a_mm − Σ_{k=1}^{m−1} (l_mk)² ≤ 0.

Example 4.8.1 Let

A = [  1  −1   0   2
      −1   2  −1   0
       0  −1   2  −3
       2   0  −3  10 ].

The Cholesky decomposition of A has the factor

L = [  1   0   0   0
      −1   1   0   0
       0  −1   1   0
       2   2  −1   1 ].

Consider the Incomplete Cholesky decomposition with pattern E = E(A), the sparsity pattern of A. The procedure (4.8.3)-(4.8.10) can be performed on A until the computation of l_44 (see the proof of Lemma 4.8.1), where

l_44² = a_44 − l_41² − l_42² − l_43² = 10 − 4 − 0 − 9 = −3 < 0.

The Incomplete Cholesky decomposition does not exist for this pattern E. Now take the tridiagonal pattern E = {(i, j) : |i − j| ≤ 1}; then L exists and

L = [  1   0   0   0
      −1   1   0   0
       0  −1   1   0
       0   0  −3   1 ].

We look for classes of matrices for which the Incomplete Cholesky decomposition has no breakdown. Such classes are the M-matrices and the H-matrices.

Definition 4.8.1 A ∈ R^{n×n} is an M-matrix if there is a decomposition A = σI − B with B ≥ 0 (B ≥ 0 ⟺ b_ij ≥ 0 for i, j = 1, ..., n) and ρ(B) = max{|λ| : λ is an eigenvalue of B} < σ. Equivalently: a_ij ≤ 0 for i ≠ j and A^{−1} ≥ 0.

Lemma 4.8.2 Let A be symmetric with a_ij ≤ 0 for i ≠ j. Then the following statements are equivalent:
(i) A is an M-matrix.
(ii) A is s.p.d.

Proof: (i) ⟹ (ii): A = σI − B with ρ(B) < σ. The eigenvalues of A have the form σ − μ, where μ is an eigenvalue of B and |μ| < σ. Since μ is real, σ − μ > 0 for all eigenvalues μ, so A has only positive eigenvalues. Thus (ii) holds. (ii) ⟹ (i): Since a_ij ≤ 0 (i ≠ j), there is a decomposition A = σI − B with B ≥ 0 (for example σ = max_i a_ii). Claim: ρ(B) < σ. By the Perron-Frobenius Theorem 4.1.7, ρ(B) is an eigenvalue of B. Thus σ − ρ(B) is an eigenvalue of A, so σ − ρ(B) > 0. Then (i) holds.

Theorem 4.8.3 Let A be a symmetric M-matrix. Then the Incomplete Cholesky method described in (4.8.3)-(4.8.10) is executable and yields a decomposition A = LL^T − N which satisfies (4.8.2).

Proof: It is sufficient to show that the matrix B_1 constructed by (4.8.3)-(4.8.7) is again a symmetric M-matrix.
(i) We first claim: B̃_0 is an M-matrix. We have A = B̃_0 − C_1, and B̃_0 arises from A by neglecting only nonpositive off-diagonal elements. There is a k > 0 such that A = kI − Ã with Ã ≥ 0; then B̃_0 = kI − B̂_0 with 0 ≤ B̂_0 ≤ Ã. By the Perron-Frobenius Theorem 4.1.7 it follows that ρ(B̂_0) ≤ ρ(Ã) < k, so B̃_0 is an M-matrix.
(ii) Hence B̃_0 is also positive definite (Lemma 4.8.2), and therefore B_1 (from B̃_0 = L_1 B̃_1 L_1^T) is positive definite. B_1 = A_1 − b_1 b_1^T / a_11 has nonpositive off-diagonal elements, so B_1 is an M-matrix (by Lemma 4.8.2).

Definition 4.8.2 Let A ∈ R^{n×n}. A decomposition A = B − C is called regular (a regular splitting) if B^{−1} ≥ 0 and C ≥ 0.

Theorem 4.8.4 Let A^{−1} ≥ 0 and let A = B − C be a regular decomposition. Then ρ(B^{−1}C) < 1, i.e., the iterative method Bx_{k+1} = Cx_k + b for Ax = b converges for every x_0.

Proof: Since T = B^{−1}C ≥ 0 and B^{−1}(B − C) = B^{−1}A = I − T, it follows that (I − T)A^{−1} = B^{−1}.

Then

0 ≤ Σ_{i=0}^{k} T^i B^{−1} = Σ_{i=0}^{k} T^i (I − T) A^{−1} = (I − T^{k+1}) A^{−1} ≤ A^{−1}.

That is, the monotone sequence Σ_{i=0}^{k} T^i B^{−1} is uniformly bounded. Hence T^k B^{−1} → 0 for k → ∞, then T^k → 0 and ρ(T) < 1.

Theorem 4.8.5 If A^{−1} ≥ 0 and A = B_1 − C_1 = B_2 − C_2 are two regular decompositions with 0 ≤ C_1 ≤ C_2, then ρ(B_1^{−1}C_1) ≤ ρ(B_2^{−1}C_2).

Proof: Let A = B − C with A^{−1} ≥ 0. Then

ρ(B^{−1}C) = ρ((A + C)^{−1}C) = ρ([A(I + A^{−1}C)]^{−1}C) = ρ((I + A^{−1}C)^{−1} A^{−1}C) = ρ(A^{−1}C) / (1 + ρ(A^{−1}C))

[since σ ↦ σ/(1 + σ) is monotone for σ ≥ 0]. Because 0 ≤ C_1 ≤ C_2 it follows that ρ(A^{−1}C_1) ≤ ρ(A^{−1}C_2). Then

ρ(B_1^{−1}C_1) = ρ(A^{−1}C_1)/(1 + ρ(A^{−1}C_1)) ≤ ρ(A^{−1}C_2)/(1 + ρ(A^{−1}C_2)) = ρ(B_2^{−1}C_2),

since σ/(1 + σ) is monotone for σ > 0.

Theorem 4.8.6 If A is a symmetric M-matrix, then the decomposition A = LL^T − N according to Theorem 4.8.3 is a regular decomposition.

Proof: For an M-matrix each factor L_j in (4.8.9) has a nonnegative inverse: L_j differs from the identity only in its j-th column, so L_j^{−1} is obtained by rescaling the diagonal entry and negating the (nonpositive) subdiagonal part of that column (from (I + l e_j^T)^{−1} = I − l e_j^T when e_j^T l = 0). Hence (LL^T)^{−1} = L^{−T} L^{−1} ≥ 0. Moreover N = C_1 + C_2 + ⋯ + C_{n−1} ≥ 0, since all C_i ≥ 0.

Definition 4.8.3 A ∈ R^{n×n} is called an H-matrix if the matrix H = H(A) defined by

h_ij = { a_ii, if i = j;  −|a_ij|, if i ≠ j }

is an M-matrix.

Theorem 4.8.7 (Manteuffel) For any symmetric H-matrix A and any symmetric reserved pattern E there exists a uniquely determined Incomplete Cholesky decomposition of A which satisfies (4.8.2). [Exercise!]

History:
(i) CG-method, Hestenes-Stiefel (1952).
(ii) CG-method as iterative method, Reid (1971).
(iii) CG-method with preconditioning, Concus-Golub-O'Leary (1976).
(iv) Incomplete Cholesky decomposition, Meijerink-van der Vorst (1977).
(v) Nonsymmetric matrices, H-matrices, Incomplete Cholesky decomposition, Manteuffel (1979).

Other preconditionings:

(i) A in block form A = [A_ij] with blocks A_ij. Take M = diag(A_11, ..., A_kk).
(ii) Try the Incomplete Cholesky decomposition: a breakdown can be avoided in two ways. If z_i = a_ii − Σ_{k=1}^{i−1} l_ik² ≤ 0 (breakdown), then either set l_ii = 1 and go on, or set l_ik = 0 (k = 1, ..., i − 1) until z_i > 0 (change the reserved pattern E).
(iii) A is an arbitrary nonsingular matrix with all principal minors ≠ 0. Then A = LDR exists, where D is diagonal and L and R^T are unit lower triangular. Consider the following generalization of the Incomplete Cholesky decomposition.

Theorem 4.8.8 (Generalization) Let A be an n × n matrix and E an arbitrary reserved pattern with (i, i) ∈ E, i = 1, 2, ..., n. A decomposition of the form A = LDR − N which satisfies
(i) L is lower triangular, l_ii = 1, l_ij = 0 if (i, j) ∉ E,
(ii) R is upper triangular, r_ii = 1, r_ij = 0 if (i, j) ∉ E,
(iii) D is diagonal and nonsingular,
(iv) N = (n_ij), n_ij = 0 for (i, j) ∈ E,
is uniquely determined. (The decomposition exists for almost all matrices.)

4.9 Chebychev Semi-Iteration Acceleration Method


x = T x + f, T = M 1 N and f = M 1 b. (4.9.1)

Consider the linear system Ax = b. The splitting A = M N leads to the form

The basic iterative method of (4.9.1) is xk+1 = T xk + f. How to modify the convergence rate? Denition 4.9.1 The iterative method (4.9.2) is called symmetrizable, if there is a matrix W with detW = 0 and such that W (I T )W 1 is symmetric positive denite. Example 4.9.1 Let A and M be s.p.d., A = M N and T = M 1 N , then I T = I M 1 N = M 1 (M N ) = M 1 A. Set W = M 1/2 . Thus, W (I T )W 1 = M 1/2 M 1 AM 1/2 = M 1/2 AM 1/2 s.p.d. (i): M = diag(aii ) Jacobi method. (D L)D1 (D LT ) SSOR-method. (ii): M = (21 ) (iii): M = LLT Incomplete Cholesky decomposition. (iv): M = I xk+1 = (I A)xk + b Richardson method. Lemma 4.9.1 If (4.9.2) is symmetrizable, then the eigenvalues i of T are real and satisfy i < 1, for i = 1, 2, . . . , n. (4.9.3) (4.9.2)

4.9 Chebychev Semi-Iteration Acceleration Method 125 1 Proof: Since W (I T )W is s.p.d., the eigenvalues 1 i of I T are large than zero. Thus i are real and (4.9.3) holds. Denition 4.9.2 Let xk+1 = T xk + f be symmetrizable. The iterative method u0 = x 0 , uk+1 = (T uk + f ) + (1 )uk = (T + (1 )I )uk + f T uk + f. is called an Extrapolation method of (4.9.2). Remark 4.9.1 T = T + (1 )I is a new iterative matrix (T1 = T ). T arises from 1 1 the decomposition A = M (N + ( 1)M ). Theorem 4.9.2 If (4.9.2) is symmetrizable and T has the eigenvalues satisfying 1 2 n < 1, then it holds for = 22 > 0 that 1 2 1 > (T ) = n 1 = min (T ). 2 1 n (4.9.5)

(4.9.4)

Proof: Eigenvalues of T are i + (1 ) = 1 + (i 1). Consider the problem min max |1 + (ui 1)| = min!
i

|1 + (n 1)| = |1 + (1 1)|, 1 + (n 1) = (1 n ) 1 (otherwise 1 = n ).


2 This implies = = 21 , then 1 + (n 1) = n From (4.9.2) and (4.9.4) follows that k k n 1 . 21 n

uk =
i=0

aki xi , and
i=0

aki = 1

with suitable aki . Hence, we have the following idea: Find a sequence {aki }, k = 1, 2, . . ., i = 0, 1, 2, . . . , k and
k

k i=0

aki = 1 such that (4.9.6)

uk =
i=0

aki xi , u0 = x0

is a good approximation of x (Ax = b). Hereby the cost of computation of uk should not be more expensive than xk . Error: Let ek = xk x , ek = T k e0 , e0 = x0 x = u0 x = d0 . (4.9.7) Hence,
k

dk = uk x =
i=0 k

aki (xi x )
k

(4.9.8)

=
i=0

aki T e0 = (
ki

aki T i )e0

= Pk (T )e0 = Pk (T )d0 ,

126 where

Chapter 4. Iterative Methods for Solving Large Linear Systems


k

Pk () =
i=0

aki i

(4.9.9)

is a polynomial in with Pk (1) = 1. Problem: Find Pk such that (Pk (T )) is possible small. Remark 4.9.2 Let x
W

= W x 2 . Then T
W

= max

Tx W x=0 x W W T W 1 W x 2 = max x=0 Wx 2 1 = WTW 2 = (T ),


W -norm

because W T W 1 is symmetric. We take dk


W

on both sides of (4.9.8) and have (4.9.10)

Pk (T ) W d0 W = W Pk (T )W 1 2 d0 2 Pk (W T W 1 ) 2 d0 W = (Pk (T )) d0 W .

Replacement problem: Let 1 > n 1 be the eigenvalues of T . Determine min [{max |Pk ()| : 1 n } : deg(Pk ) k, Pk (1) = 1] . Solution of (4.9.11): The replacement problem (4.6.35) max{|Pk ()| : 0 < a b} = min!, Pk (0) = 1 has the solution Qk (t) = Tk ( 2t b a ) ba Tk ( b+a ). ab (4.9.11)

Substituting t 1 , 1 t, (1 , n ) (1 n , 1 1 ), the problem (4.9.11) can be transformed to the problem (4.6.35). Hence, the solution of (4.9.11) is given by Qk (t) = Tk (
k i=0 k

2t 1 n ) 1 n

Tk (

2 1 n ). 1 n

(4.9.12)

Write Qk (t) :=

aki ti . Then we have

uk =
i=0

aki xi ,

which is called the optimal Chebychev semi-iterative method. Eective Computation of uk : Using recursion of Tk as in (4.6.36), we get T0 (t) = 1, T1 (t) = t, Tk+1 (t) = 2tTk (t) Tk1 (t).

4.9 Chebychev Semi-Iteration Acceleration Method Transforming Tk (t) to the form of Qk (t) as in (4.9.12) we get Q0 (t) = 1, Q1 (t) = and Qk+1 (t) = [pt + (1 p)]ck+1 Qk (t) + (1 ck+1 ) Qk1 (t), where 2 2Tk (1/r) 1 n , ck+1 = and r = . 2 1 n rTk+1 (1/r) 2 1 n Claim: (4.9.13b) p= Qk+1 (t) = Tk+1 = 2t 1 n 1 n Tk+1 1 r 2t 1 n = pt + (1 p) 2 1 n

127

(4.9.13a) (4.9.13b) (4.9.14)

1 2t 1 n 2t 1 n 2t 1 n 2 Tk Tk 1 Tk+1 (1/r) 1 n 1 n 1 n 21 n 1 n 2t1 n Tk1 1 n Tk1 2 2t 1 n Tk 1 n 2Tk (1/r) 1 n r = 1 n rTk+1 (1/r) 1 n Tk (1/r) Tk1 (1/r) TK +1 2
1 n

= Ck+1 [pt + (1 p)]Qk (t) [1 Ck+1 ]Qk1 (t), since r and 1 Ck+1 = 1 2Tk (1/r) rTk+1 (1/r) 2Tk (1/r) = rTk+1 (1/r) rTk+1 (1/r) rTk1 (1/r) Tk1 (1/r) = = . rTk+1 (1/r) Tk+1 (1/r) 2t 1 n 1 n = 2t 1 n = pt + (1 p) 2 1 n

Recursion for uk : dk+1 = Qk+1 (T )d0 = (pT + (1 p)I )ck+1 Qk (T )d0 + (1 ck+1 )Qk1 (T )d0 , x = (pT + (1 p)I )ck+1 x + (1 ck+1 )x + p(I T )x ck+1 . Adding above two equations together we get uk+1 = [pT + (1 p)I ]ck+1 uk + (1 ck+1 )uk1 + ck+1 pf = ck+1 p {T uk + f uk } + ck+1 uk + (1 ck+1 ) uk1 . Then we obtain the optimal Chebychev semi-iterative Algorithm. Algorithm 4.9.1 (Optimal Chebychev semi-iterative Algorithm)
1 n 2 Let r = 2 , p = 21 , c1 = 2 1 n n u 0 = x0 , u1 = p(T u0 + f ) + (1 p)u0 For k = 1, 2, , uk+1 = ck+1 [p(T uk + f ) + (1 p)uk ] + (1 ck+1 ) uk1 , ck+1 = (1 r2 /4 ck )1 .

(4.9.15)
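A Python/NumPy sketch of Algorithm 4.9.1. For concreteness it accelerates the Jacobi splitting M = D = diag(A), so that T = I − D^{−1}A and f = D^{−1}b; the 1D Poisson test matrix and the exact eigenvalue bounds ±cos(π/(n+1)) of its Jacobi iteration matrix are illustrative assumptions, not part of the notes.

```python
import numpy as np

def chebychev_semi_iteration(A, b, lam_min, lam_max, maxit=1000, tol=1e-10):
    """Optimal Chebychev acceleration (Algorithm 4.9.1) of the Jacobi iteration.
    lam_min, lam_max: (estimates of) the extreme eigenvalues of T = I - D^{-1}A."""
    D = np.diag(A)
    f = b / D
    T = lambda v: v - (A @ v) / D                    # v -> T v
    r = (lam_max - lam_min) / (2.0 - lam_max - lam_min)
    p = 2.0 / (2.0 - lam_max - lam_min)
    u_prev = np.zeros_like(b)                        # u_0 = x_0 = 0
    u = p * (T(u_prev) + f) + (1.0 - p) * u_prev     # u_1
    c = 2.0                                          # c_1 = 2
    for k in range(1, maxit):
        c = 1.0 / (1.0 - (r * r / 4.0) * c)          # c_{k+1} = (1 - r^2 c_k / 4)^{-1}
        u_next = c * (p * (T(u) + f) + (1.0 - p) * u) + (1.0 - c) * u_prev
        u_prev, u = u, u_next
        if np.linalg.norm(b - A @ u) < tol * np.linalg.norm(b):
            break
    return u, k

if __name__ == "__main__":
    n = 50
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    mu = np.cos(np.pi / (n + 1))                     # Jacobi eigenvalues lie in (-mu, mu)
    x, k = chebychev_semi_iteration(A, np.ones(n), -mu, mu)
    print(k, np.linalg.norm(np.ones(n) - A @ x))
```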

Remark 4.9.3 Here u_{k+1} can be rewritten as the three-term recursive formula with two parameters as in (4.7.5):

u_{k+1} = c_{k+1}[p(Tu_k + f) + (1 − p)u_k] + (1 − c_{k+1})u_{k−1}
        = c_{k+1}[pM^{−1}((M − A)u_k + b) + (1 − p)u_k] + u_{k−1} − c_{k+1}u_{k−1}
        = c_{k+1}[u_k + pM^{−1}(b − Au_k) − u_{k−1}] + u_{k−1}
        = c_{k+1}[u_k + p z_k − u_{k−1}] + u_{k−1},

where M zk = b Auk . Recursion for ck : Since c1 = thus Tk+1 It follows 1 ck+1 Then we have ck+1 = Error estimate: It holds uk x
W

2t0 2 = = 2, rT1 (1/r) r1 r 1 r 1 r


1 r 1 r

1 r

2 = Tk r
1 r 1 r

Tk1

(from (4.6.36)). r2 ck . 4 (4.9.16)

rTk+1 = 2Tk

r2 2Tk1 =1 4 rTk

=1

1 1 n with r = . 2 (1 (r /4) ck ) 2 1 n 2 1 n 1 n
1

Tk

u0 x

W.

(4.9.17)

Proof: From (4.9.10) and (4.9.12) we have dk


W

= Qk (T )d0 W (Qk (T )) d0 W max {|Qk ()| : 1 n } d0 Tk 2 1 n 1 n


1

d0

W.

We want to estimate the quantity qk := |Tk (1/r)|1 (see also Lemma 4.6.9). From (4.6.37) we have k k 1 1 1 + 1 r2 1 1 r2 Tk = + r 2 r r 1 (1 + 1 r2 )k + (1 1 r2 )k = 2 (r2 )k/2 1 (1 + 1 r2 )k + (1 1 r2 )k = 2 (1 + 1 r2 )(1 1 r2 ) k/2 1 1 k/2 c + ck/2 k/2 , = 2 2c

λ_n     κ      q_4      j       j̃   |   q_8        j       j̃
0.8     5     0.0426    8       14   |   9.06(-4)   17-18   31
0.9    10     0.1449    9-10    18   |   1.06(-2)   22-23   43
0.95   20     0.3159   11-12    22   |   5.25(-2)   29-30   57
0.99  100     0.7464   14-15    29   |   3.86(-1)   47      95

Table 4.3: Convergence rate q_k (case λ_1 = 0, with κ = 1/(1 − λ_n)); j and j̃ are the numbers of steps for which ((κ − 1)/(κ + 1))^j and λ_n^{j̃}, respectively, reach the same reduction as q_4 (resp. q_8).

1r where c = 1 < 1. Thus qk 2ck/2 . Rewrite the eigenvalues of I T as i = 1 i , 1+ 1r2 1 2 n > 0. Then

r= Thus, from c =
11r2 1+ 1r2

1 n 1 n 1 = = , 2 1 n 1 + n +1 =
1 +1 2

1 n

follows 1 +1
k

qk 2

(4.9.18)
W

That is, after k steps of the Chebychev semi-iterative method the residual uk x reduced by a factor 2
1 +1 k

is

from the original residual u0 x


1

W.

n If min = 1 = 0, then qk = Tk 2 . Table 4.3 shows the convergence rate of the n quantity qk . All above statements are true, if we replace n by n (n n ) and 1 by 1 (1 1 ), because is still in [1 , n ] for all eigenvalue of T .

Example 4.9.2 Let 1 > = (T ). If we set n = , 1 = , then p and r dened in (4.9.14) become p = 1 and r = , respectively. Algorithm 4.9.1 can be simplied by u0 = x 0 , u1 = T u0 + f, uk+1 = ck+1 (T uk + f ) + (1 ck+1 )uk1 , ck+1 = 1 2 /4 ck
1

(4.9.19)

with c1 = 2.

Also, Algorithm 4.9.1 can be written by the form of (4.9.19), by replacing T by T = Tp = (pT + (1 p)I ) and it leads to uk+1 = ck+1 (Tp uk + f ) + (1 ck+1 ) uk1 . Here p1 + (1 p) =
1 n 21 n

(4.9.20)

and pn + (1 p) =

n 1 21 n

are eigenvalues of Tp .

Remark 4.9.4 (i) In (4.9.15) it holds (r = ) c2 > c3 > c4 > , and lim ck =
k

2 . 1 + 1 r2

(Exercise!)

130 Chapter 4. Iterative Methods for Solving Large Linear Systems (ii) If T is symmetric, then by (4.9.12) we get Qk (T )
2

= max {|Qk (i )| : i is an eigenvalue of T } max {|Qk ()| : } 1 = Tk , ( = (T )). 1 (b 1)k/2 = k/2 = , c + ck/2 1 + (b 1)k
1

(4.9.21)

where c =

1+

2 12 = b 1 with b =
1

1+

12

4.9.1
Recall

Connection with SOR Method

(i) The SOR method solves linear system Ax = b (standard decomposition A = I L R): x(i+1) = (I L)1 ((1 )I + R)x(i) + (I L)1 b = L x(i) + (I L)1 b, 0<<2 (4.9.22)

(ii) A = I L R is called 2consistly ordered, if the eigenvalues of L + 1 R are independent of , (iii) (Theorem) A = I L R and A is 2consistly ordered. If A has real eigenvalues and (L + R) < 1, then it holds b 1 = (Lb ) < (L ), where b =
1+

= b ,

(4.9.23)

2 12 (L+R)

Consider (4.9.1) again x = T x + f, Assume that all eigenvalues of T are real and (T ) < 1. Then the following linear system (of order 2n) is equivalent to (4.9.1). x = T y + f, y = T x + f. That is, if x solves (4.9.1), then (4.9.25) (4.9.24) A = M N, T = M 1 N, f = M 1 b.

z1 x solves solves (4.9.25), reversely, if z = z2 x (4.9.25), then z1 = z2 solves (4.9.1). Because z1 z2 = T (z1 z2 ) and 1 is not an eigenvalue of T , so z1 = z2 . Let z= x y , J= 0 T T 0 , h= f f .

4.9 Chebychev Semi-Iteration Acceleration Method Then (4.9.25) can be written as z = Jz + h and I J is 2consistly ordered. Applying SOR method to (4.9.26) we get J = L + R := and (I L)zi+1 = ((1 )I + R)zi + h. Let zi = xi . Then we have yi I 0 T I hence xi+1 = (1 )xi + T yi + f = {T yi + f xi } + xi , yi+1 = T xi+1 + (1 )yi + f = {T xi+1 + f yi } + yi , The optimal value b for (4.9.27) is given by b = 1 1+ 1 2 (J ) . xi+1 yi+1 = (1 )I T 0 (1 )I xi yi + f f , 0 0 T 0 + 0 T 0 0

131 (4.9.26)

(4.9.27)

(4.9.28a) (4.9.28b)

Lemma 4.9.3 It holds (J ) = (T ) { (T )}, where (T ) = spectrum of T . Especially (T ) = (J ). Proof: Let (T ). There exists x = 0 with T x = x. Then J x x = x x and J x x = x x .

Thus we have (J ) (T ) { (T )}. On the other hand, from J 2 =

T2 0 follows 0 T2 that if is an eigenvalue of J , then 2 = 2 for one (T ), so = or . Thus (J ) (T ) { (T )}.

We then have

ω_b = 2 / (1 + √(1 − ρ²(T))),    ρ(L_{ω_b}) = ω_b − 1 = (1 − √(1 − ρ²(T))) / (1 + √(1 − ρ²(T))).    (4.9.29)
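A tiny numeric illustration of (4.9.29): given ρ(T) < 1, the optimal relaxation parameter ω_b and the resulting ρ(L_{ω_b}) = ω_b − 1 can be tabulated directly. The sample values of ρ(T) below are chosen only for demonstration.

```python
import numpy as np

def optimal_sor_parameter(rho_T):
    """omega_b and rho(L_{omega_b}) = omega_b - 1 from (4.9.29), for rho(T) < 1."""
    omega_b = 2.0 / (1.0 + np.sqrt(1.0 - rho_T ** 2))
    return omega_b, omega_b - 1.0

for rho in (0.8, 0.9, 0.95, 0.99):
    w, r = optimal_sor_parameter(rho)
    print(f"rho(T) = {rho:5.2f}   omega_b = {w:6.4f}   rho(L_omega_b) = {r:6.4f}")
```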



4.9.2

Practical Performance
x0 , y 0 , x 1 , y 1 , x 2 , y 2 , . . . 0 , 1 , 2 , 3 , 4 , 5 , . . . 2i = xi , 2i+1 = yi , i = 0, 1, 2, . . .

Then (4.9.28) can be written as i+1 = b {T i + f i1 } + i1 , i = 1, 2, with 0 = x0 and 1 = y0 = T x0 + f . Comparing (4.9.30) with (4.9.19) we get ui+1 = ci+1 {T ui + f ui1 } + ui1 , i = 1, 2, . (4.9.31) (4.9.30)

Since ci converges to b , the optimal Chebychev acceleration method is referred to as a variant SOR method. Error estimate of (4.9.30): Write (4.9.30) as k+1 = b {T k + f k1 } + k1 , 0 = x0 , 1 = T 0 + f. Let k = k x . (Ax = b) Then we have 0 = 0 x , 1 = T 0 , k+1 = b T k + (1 b )k1 . Since x = b {T x + f x } + x , it follow that k = rk (T )0 , where rk (x) is a polynomial of degree k , and r0 = 1, r1 (t) = t, rk+1 (t) = b trk (t) + (1 b )rk1 (t). (4.9.33) (4.9.32)

(4.9.34)

Either solve this dierence equation or reduce to Chebychev polynomials of 2nd kind. sk+1 (t) = 2tsk (t) sk1 (t), s0 (t) = 1, s1 (t) = 2t.

4.10 GCG-type Methods for Nonsymmetric Linear Systems 133 In fact sk (cos ) = sin((k + 1))/ sin . One can estimate rk (T ) (see Varga p.146) by: Let T be Hermitian. Then rk (T ) = max{|rk (i )| : i is an eigenvalue of T } = max{|rk ()| : (T ) (T )} = (b 1)k/2 1 + k This implies
k

1 2 (T ) b 1 .

lim rk (T )

1/k

From (4.9.21) follows that


k

lim Qk (T )

1/k

b 1 .

4.10

GCG-type Methods for Nonsymmetric Linear Systems


1 F (x) = xT Ax xT b 2 Ax = b min F (x) = F (x ) n
xR

Recall: A is s.p.d. Consider the quadratic functional (4.10.1)

Consider 1 1 (x) = (b Ax)T A1 (b Ax) = F (x) + bT A1 b, 2 2 where 1 bT A1 b is a constant. Then 2 1 Ax = b (x ) = min (x) = [min F (x)] + bT A1 b n n xR xR 2 CG-method: Given x0 , r0 = p0 = b Ax0 for k = 0, 1, . . . T k = rk pk /pT k Apk , xk+1 = xk + k pk , rk+1 = rk k Apk ( b Axk+1 ) pk+1 = rk+1 + k pk T T T T k = rk +1 Apk /pk Apk (= rk+1 rk+1 /rk rk ) end for
T T Numerator: rk +1 ((rk rk+1 )/k ) = (rk+1 rk+1 )/k T T T Denominator: pT k Apk = (rk + k1 pk1 )((rk rk+1 )/k ) = (rk rk )/k .

(4.10.2)

Remark 4.10.1 CG method does not need to compute any parameters. It only needs matrix vector and inner product of vectors. Hence it can not destroy the sparse structure of the matrix A.

134 Chapter 4. Iterative Methods for Solving Large Linear Systems The vectors rk and pk generated by CG-method satisfy: pT i<k i rk = (pi , rk ) = 0, T ri rj = (ri , rj ) = 0, i = j T pi Apj = (pi , Apj ) = 0, i = j xk+1 = x0 +
k i=0

i pi minimizes F (x) over x = x0 + < p0 , , pk >.

4.10.1

GCG method(Generalized Conjugate Gradient)

GCG method is developed to minimize the residual of the linear equation under some special functional. In conjugate gradient method we take 1 1 1 (x) = (b Ax)T A1 (b Ax) = rT A1 r = r 2 2 2 where x A1 = xT A1 x. Let A be a unsymmetric matrix. Consider the functional 1 f (x) = (b Ax)T P (b Ax), 2 where P is s.p.d. Thus f (x) > 0, unless x = A1 b f (x ) = 0, so x minimizes the functional f (x). Dierent choices of P: (i) P = A1 (A is s.p.d.) CG method (classical) (ii) P = I GCR method (Generalized Conjugate residual). 1 1 f (x) = (b Ax)T (b Ax) = r 2 2 Here {ri } forms A-conjugate. (iii) Consider M 1 Ax = M 1 b. Take P = M T M > 0 GCGLS method (Generalized Conjugate Gradient Least Square). (iv) Similar to (iii), take P = (A + AT )/2 (note: P is not positive denite) and M = (A + AT )/2 we get GCG method (by Concus, Golub and Widlund). In general, P is not necessary to be taken positive denite, but it must be symmetric (P T = P ). Therefore, the minimality property does not hold. Let (x, y )o = xT P y = (x, y )o = (y, x)o .
2 2 2 A1 ,

4.10 GCG-type Methods for Nonsymmetric Linear Systems Algorithm 4.10.1 (GCG method) Given x0 , r0 = p0 = b Ax0 for k = 0, 1, k = (rk , Apk )o /(Apk , Apk )o xk+1 = xk + k pk rk+1 = rk k Apk ( b Axk+1 ) i
(k)

135

(4.10.3a) (4.10.3b) (4.10.3c) i = 0, 1, , k (4.10.3d) (4.10.3e)

= (Ark+1 , Api )o /(Api , Api )o ,


k

pk+1 = rk+1 +
i=0

i pi

(k)

end for In GCG method, the choice of {i }k i=1 satisfy: (rk+1 , Api )o = 0, (rk+1 , Ari )o = 0, (Api , Apj )o = 0, ik ik i=j (4.10.4a) (4.10.4b) (4.10.4c)
(k )

1 T Theorem 4.10.1 xk+1 = x0 + k i=0 k pi minimizes f (x) = 2 (b Ax) P (b Ax) over x = x0 + < p0 , , pk >, where P is s.p.d.

(The proof is the same as that of classical CG method). If P is indenite, which is allowed in GCG method, then the minimality property does not hold. xk+1 is the critical point of f (x) over x = x0 + < p0 , , pk >. Question: Can the GCG method break down? i.e., Can k in GCG method be zero? Consider the numerator of k : (rk , Apk ) = = = = (rk , Ark )o [by (4.10.3e) and (4.10.4a) ] T rk P Ark T T rk A P rk [Take transpose] T T (P A+A P ) rk rk . 2

(4.10.5)

From (4.10.5), if (P A + AT P ) is positive denite, then k = 0 unless rk = 0. Hence if the matrix A satises (P A + AT P ) positive denite, then GCG method can not break down. From GCG method, rk and pk can be rewritten by rk = k (A)r0 , pk = k (A)r0 , (4.10.6a) (4.10.6b)

where k and k are polynomials of degree k with k (0) = 1 [by (4.10.3c), (4.10.3e)]. From (4.10.6a), (4.10.6b) and (4.10.4b) follows that (rk+1 , Ai+1 r0 )o = 0, i = 0, 1, , k.
(k)

(4.10.7)

From (4.10.6a), (4.10.6b) and (4.10.3d), the numerator of i

can be expressed by (4.10.8)

T T T T (Ark+1 , Api )o = rk +1 A P Api = rk+1 A P Ai (A)r0 .

136 Chapter 4. Iterative Methods for Solving Large Linear Systems T If A P can be expressed by AT P = P s (A), (4.10.9) where s is some polynomial of degree s. Then (4.10.8) can be written by
T T (Ark+1 , Api )o = rk +1 A P Ai (A)r0 T = rk+1 P s (A)Ai (A)r0 = (rk+1 , As (A)i (A)r0 )o .

(4.10.10)

From (4.10.7) we know that if s + i k , then (4.10.10) is zero, i.e.,(Ark+1 , Api )o = 0. (k) Hence i = 0, i = 0, 1, , k s. But only in the special case s will be small. For instance, (i) In classical CG method, A is s.p.d, P is taking by A1 . Then AT P = AA1 = I = (k) A1 A = A1 1 (A), where 1 (x) = x, s = 1. So, i = 0, for all i + 1 k , it is only (k) k = 0. (ii) Concus, Golub and Widlund proposed GCG method, it solves M 1 Ax = M 1 b. (A: unsymmetric), where M = (A + AT )/2 and P = (A + AT )/2 (P may be indenite). Check condition (4.10.9): (M 1 A)T P = AT M 1 M = AT = M (2I M 1 A) = P (2I M 1 A). Then s (M 1 A) = 2I M 1 A, where 1 (x) = 2 x, s = 1. Thus i use rk+1 and pk to construct pk+1 . Check condition AT P + P A: (M 1 A)T M + M M 1 A = AT + A indenite The method can possibly break down. (iii) The other case s = 1 is BCG (BiCG) (See next paragraph). Remark 4.10.2 Except the above three cases, the degree s is usually very large. That is, we need to save all directions pi (i = 0, 1, , k ) in order to construct pk+1 satisfying the conjugate orthogonalization condition (4.10.4c). In GCG method, each iteration step k needs to save 2k + 5 vectors (xk+1 , rk+1 , pk+1 , {Api }k i=0 , {pi }i=0 ), k + 3 inner products (Here k is the iteration number). Hence, if k is large, then the space of storage and the computation cost can become very large and can not be acceptable. So, GCG method, in general, has some practical diculty. Such as GCR, GMRES (by SAAD) methods, they preserve the optimality (p > 0), but it is too expensive (s is very large). Modication: (i) Restarted: If GCG method does not converge after m + 1 iterations, then we take xk+1 as x0 and restart GCG method. There are at most 2m + 5 saving vectors. (ii) Truncated: The most expensive step of GCG method is to compute i , i = 0, 1, , k so that pk+1 satises (4.10.4c). We now release the condition (4.10.4c) to require that pk+1 and the nearest m direction {p|i}k i=km+1 satisfy the conjugate orthogonalization condition.
(k) (k)

= 0, i = 0, 1, , k 1. Therefore we only

4.10 GCG-type Methods for Nonsymmetric Linear Systems

137

4.10.2

BCG method (A: unsymmetric)

BCG method is similar to the CG method, it does not need to save the search direction. But the norm of the residual produced by BCG method does not preserve the minimal property. Solve Ax = b by considering AT y = c (phantom). Let = A Consider T Z (P = P T ) with Z = Take P = A A 0 0 AT , x = x y , b= b c .

x A = b. 0 I . This implies I 0

T Z = Z A and A T P = P A. A x From (4.10.9) we know that s = 1 for A = b. Hence it only needs to save one direction pk as in the classical CG method. x Algorithm 4.10.2 (Apply GCG method to A = b) x0 r0 x , p 0 = r 0 = bA 0 = x 0 r 0 for k = 0, 1, . . . p p p k = ( rk , A k )o /(A k , A k )o , x k+1 = x k + k p k , p r k+1 = r k k A k , p k+1 = r k+1 + k p k k = (Ar k+1 , Ap k )o /(Ap k , Ap k )o . end for Given x0 = Algorithm 4.10.3 (Simplication (BCG method)) Given x0 , p0 = r0 = b Ax0 Choose r 0 , p 0 = r 0 for k = 0, 1, . . . k = ( rk , rk )/( pk , Apk ), xk+1 = xk + k pk , rk+1 = rk k Apk r k+1 = r k k AT p k k = ( rk+1 , rk+1 )/( rk , rk ) pk+1 = rk+1 + k pk , p k+1 = r k+1 + k p k . end for p p From above we have (A k , A k )o = (Apk , AT pk ) 0 AT A1 0 Apk AT p k = 2( pk , Apk ). .

138 Chapter 4. Iterative Methods for Solving Large Linear Systems BCG method satises the following relations:
T T rk p i = r k pi = 0, T T T pk A p i = p k Api = 0, T T rk r i = r k ri = 0,

i<k i<k i<k

(4.10.11a) (4.10.11b) (4.10.11c)

Denition 4.10.1 (4.10.11c) and (4.10.11b) are called biorthogonality and biconjugacy condition, respectively. Property 4.10.1 (i) In BCG method, the residual of the linear equation does not satisfy the minimal property, because P is taken by T Z = P =A 0 AT A1 0

and P is symmetric, but not positive denite. The minimal value of the functional f (x) may not exist. T P + P A )/2 is not positive denite. (ii) BCG method can break down, because Z = (A From above discussion, k can be zero. But this case occurs very few. GCG GCR, GCR(k ) BCG Orthomin(k ) CGS Orthodir BiCGSTAB Orthores QMR GMRES(m) TFQMR FOM Axelsson LS

4.11
4.11.1

CGS (Conjugate Gradient Squared), A fast Lanczostype solver for nonsymmetric linear systems
The polynomial equivalent method of the CG method

Consider rst A is s.p.d. Then the CG method r0 = b Ax0 = p0 for i = 0, 1, 2, ai = (ri , pi )/(pi , Api ) = (ri , ri )/(pi , Api ) xi+1 = xi + ai pi ri+1 = ri ai Api pi+1 = ri+1 + bi pi bi = (ri+1 , Api )/(pi , Api ) = (ri+1 , ri+1 )/(ri , ri )

4.11 CGS (Conjugate Gradient Squared), A fast Lanczos-type solver for nonsymmetric linear systems 139 is equivalent to r0 = b Ax0 , p1 = 1, 1 = 1 for n = 0, 1, 2, T n = rn rn , n = n /n1 pn = rn + n pn1 n = pT n Apn , n = n /n rn+1 = rn n Apn xn+1 = xn + n pn (rn = b Axn )
T 1 A rn = minxx0 +Kn b Ax Remark 4.11.1 1. En = rn T rm = n nm , 2. rn A1

pT n Apm = n nm

From the structure of the new form of the CG method, we write rn = n (A)r0 , pn = n (A)r0

where n and n are polynomial of degree n. Dene 0 ( ) 1 and 1 ( ) 0. Then we nd pn = n (A)r0 + n n1 (A)r0 n (A)r0 (4.11.12a) with n ( ) n ( ) + n n1 ( ), and rn+1 = n (A)r0 n An (A)r0 n+1 (A)r0 with n+1 ( ) n ( ) n n ( ). (4.11.13b) The CG method can be re-iterpreted as an algorithm for generating a system of (orthogonal) polynomials. Dene the symmetric bilinear form (, ) by (, ) = [(A)r0 ]T (A)r0 . We have (, ) 0. Since A is symmetric, we can write
T (, ) = r0 (A) (A)r0 .

(4.11.12b) (4.11.13a)

Furthermore, from the associate law of matrices (, ) = (, ) for any polynomial , , . Here (, ) is semidenite, thus (, ) = 0 may occur! The polynomial equivalent method of the CG method : 0 1, 1 0, 1 = 1 for n = 0, 1, 2, n = (n , n ), n = n /n1 n = n + n n1 n = (n , n ), n = n /n n+1 = n n n .

140 Chapter 4. Iterative Methods for Solving Large Linear Systems where ( ) = . The minimization property reads En = (n , 1 n ) = min We also have (i , j ) = 0, (i , j ) = 0, i = j from (ri , rj ) = 0, i = j from (pi , Apj ) = 0, i = j. i = j. (, 1 ) . (0)2

P N

Theorem 4.11.1 Let [, ] be any symmetric bilinear form satisfying [, ] = [, ] , , P N

Let the sequence of n and n be constructed according to PE algorithm, but using [, ] instead (, ). Then as long as the algorithm does not break down by zero division, then n and n satisfy [n , m ] = n nm , [n , m ] = n nm with ( ) . Proof: By induction we prove the following statement: [n1 , k ] = n1 n1,k , [n , k ] = 0 (4.11.14)

n 0, 1 k n 1 with 1 = 0. If n = 0, this is true since 1 ( ) 0. Suppose (4.11.14) holds for n m and let k < m. Then by PE algorithm, it holds [m , k ] = [m , k ] + m [m1 , k ]. (4.11.15)

Substitute k = (k k+1 )/k in the rst term. The second term is zero for k < m 1, by hypothesis. Thus [m , k ] = [m , k ] [m , m+1 ] + m m1 m1,k , k k m 1.

If k < m 1, then [m , k ] = 0. For k = m 1 we have [m , m1 ] = m /m1 + m m1 = 0, which proves rst part of (4.11.14) for n = m + 1. Second Part: Write [m+1 , k ] = [m , k ] m [m , k ], k m.

If k m 1, then [m+1 , k ] = 0 by hypothesis. Using the algorithm and choosing k = m we get [m+1 , m ] = [m , m + m m1 ] m [m , m ] = m m m = 0, which proves the second part of (4.11.14).

4.11 CGS (Conjugate Gradient Squared), A fast Lanczos-type solver for nonsymmetric linear systems 141 By Induction (4.11.14) is valid for all n. Finally, writing k = k k k1 , k 0, it implies [n , k ] = [n , k ] n [n , k1 ] = 0, k < n. Together with the rst part of (4.11.14), we prove the theorem. The theorem is valid as long as the algorithm does not break down. For this reason we shall use orthogonal polynomial for n and n , whether or not the bilinear forms involved are inner products. In the following, we want to generalize the CG Algorithm to the nonsymmetric case. Consider Ax = b, A : nonsymmetric. Given x0 , r0 = b Ax0 , let r 0 be a suitably chosen vector. Dene [, ] by
T [, ] = r 0 (A) (A)r0 = ((AT ) r0 )T (A)r0 T and dene p1 = p 1 = 0. (If A symmetric : (, ) = r0 (A) (A)r0 ). Then we have

rn = n (A)r0 , pn = n (A)r0 ,

r n = n (AT ) r0 , p n = n (AT ) r0

with n and n according to (4.11.12b) and (4.11.13b). Indeed, these vectors can be produced by the Bi-Conjugate Gradient algorithm: Algorithm 4.11.1 (Bi-Conjugate Gradient algorithm) Given r0 = b Ax0 , p1 = p 1 and r 0 arbitrary For n = 0, 1, T n = r n rn , n = n /n1 pn = rn + n pn1 , p n = r n + n p n1 T n = p n Apn , n = n /n rn+1 = rn n Apn , r n+1 = r n n AT p n xn+1 = xn + n pn .
T T Property 4.11.1 rn = b Axn , rk r j = 0, j = k and pT j = 0, j = k. kA p

Remark 4.11.2 The Bi-Conjugate Gradient method is equivalent to the Lanczos biorthogonalization method. Km = span(Vm ) = span(r0 , Ar0 , , Am1 r0 ) = span(p0 , p1 , , pm1 ), Lm = span(Wm ) = span( r 0 , AT r 0 , , (AT )m1 r 0 ) = span( p0 , p 1 , , p m1 ). Remark 4.11.3 In practice r 0 is often chosen equal to r0 . Then, if A is not too far from being S.P.D., the bilinear expressions [, ] and [, ] will be positive semi-denite, and the algorithm will converge in the same way, and by the same argument as does the ordinary CG algorithm in the SPD-case!

142

Chapter 4. Iterative Methods for Solving Large Linear Systems

4.11.2

Squaring the CG algorithm: CGS Algorithm

Assume that Bi-CG is converging well. Then rn 0 as n . Because rn = n (A)r0 , n (A) behaves like contracting operators. Expect: n (AT ) behaves like contracting operators (i.e., r n 0). But quasi-residuals r n is not exploited, they need to be computed for the n and n . Disadvantage: Work of Bi-CG is twice the work of CG and in general AT v is not easy to compute. Especially if A is stored with a general data structure. Improvement: Using Polynomial equivalent algorithm to CG. Since n = [n , n ] and n = [n , n ], [, ] has the property [, ] = [, ]. Let 0 = 1. Then n = [0 , 2 n ],
2 ]. n = [0 , n

n+1 = n n n , n = n + n n1 .
2 Purpose: (i) Find an algorithm that generates the polynomials 2 n and n rather than n and n . (ii) Compute the approximation solution xn with rn = 2 n (A)r0 as residuals (try to T interpret). Because n = r 0 rn with rn = 2 ( A ) r , r and p 0 n n need not to be computed. n 2 2 How to compute n and n ? 2 2 2 n = [n + n n1 ]2 = 2 n + 2n n n1 + n n1 , 2 2 2 2 2 2 n+1 = [n n n ] = n 2n n n + n n .

Since n n = n [n + n n1 ] = 2 n + n n n1 ,
2 we only need to compute n n1 , 2 n and n . Now dene for n 0 :

n = 2 n, Algorithm 4.11.2 (CGS)

n = n n1 ,

2 n1 = n 1 .

0 1. 0 1 0, 1 = 1. for n = 0, 1, n = [1, n ], n = n /n1 Yn = n + n n n = Yn + n (n + n n1 ) n = [1, n ], n = n /n , ( ) = , n+1 = Yn n n n+1 = n n (Yn + n+1 )

4.12 Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems 143 Algorithm 4.11.3 (CGS Variant) Dene rn = n (A)r0 , qn = n (A)r0 , pn = n (A)r0 , r0 = b Ax0 , q0 = p1 = 0, 1 = 1 for n = 0, 1, T n = r 0 rn , n = n /n1 un = rn + n qn pn = un + n (qn + n pn1 ) vn = Apn T n = r 0 vn , n = n /n qn+1 = un n vn rn+1 = rn n A(un + qn+1 ) xn+1 = xn + n (un + qn+1 ). Since r0 = b Ax0 , rn+1 rn = A(xn xn+1 ), we have that rn = b Axn . So this algorithm produces xn of which the residual satisfy rn = 2 n (A)r0 . Remark 4.11.4 Each step requires twice the amount of work necessary for symmetric CG. However the contracting eect of n (A) is used twice each step. The work is not more than for Bi-CG and working with AT is avoided.
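A sketch of the CGS variant above (Algorithm 4.11.3), again with r̃_0 = r_0. As noted in the remark, each step uses two multiplications with A and none with A^T; the test matrix is an illustrative assumption.

```python
import numpy as np

def cgs(A, b, x0=None, tol=1e-10, maxit=None):
    """Conjugate Gradient Squared (the CGS variant quoted above), with r~_0 = r_0."""
    n = b.size
    maxit = 2 * n if maxit is None else maxit
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    rt = r.copy()
    q = np.zeros(n)
    p = np.zeros(n)
    rho_old = 1.0                          # rho_{-1} = 1
    for k in range(maxit):
        rho = rt @ r
        beta = rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v = A @ p
        alpha = rho / (rt @ v)
        q = u - alpha * v
        w = u + q
        x += alpha * w
        r -= alpha * (A @ w)
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x, k + 1

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((40, 40)) + 40 * np.eye(40)
    b = rng.standard_normal(40)
    x, its = cgs(A, b)
    print(its, np.linalg.norm(b - A @ x))
```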

4.12

Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
Given x0 , r0 = b Ax0 , ( r0 , r 0 ) = 0 , 0 = 1, p 0 = p0 = 0. For i = 1, 2, 3, i = ( ri1 , ri1 ) i = (i /i1 ) pi = ri1 + i pi1 p i = r i1 + i p i1 vi = Api i = i /( pi , vi ) xi = xi1 + i pi Stop here, if xi is accurate enough. ri = ri1 i vi = ri1 i Api r i = r i1 i AT p i end for

Algorithm 4.12.1 (Bi-CG method)

Property 4.12.1 (i) rj r 0 , . . . , r j 1 and r j r0 , . . . , rj 1 . (ii) three-term recurrence relations between {rj } and {r j }.

144 Chapter 4. Iterative Methods for Solving Large Linear Systems (iii) It terminates within n steps, but no minimal property.
BiCG BiCG Since rj = j (A)r0 and r j = j (AT ) r0 , it implies that

(rj , r i ) = j (A)r0 , i (AT ) r0 = (i (A)j (A)r0 , r 0 ) = 0, Algorithm 4.12.2 (CGS method)

i < j.

Given x0 , r0 = b Ax0 , (r0 , r 0 ) = 0, r 0 = r0 , 0 = 1, p0 = q0 = 0. For i = 1, 2, 3, i = ( r0 , ri1 ) = i /i1 u = ri1 + qi1 pi = u + (qi1 + pi1 ) v = Api = i /( r0 , v ) qi = u v w = u + qi xi = xi1 + w Stop here, if xi is accurate enough. ri = ri1 Aw end for
CGS We have rj = i (A)2 r0 . BiCG From Bi-CG method we have ri = i (A)r0 and pi+1 = i (A)r0 . Thus we get

i (A)r0 = (i (A) + i+1 i1 (A)) r0 , and i (A)r0 = (i1 (A) i Ai1 (A)) r0 , where i = i + i+1 i1 and i = i1 i i1 . Since i (A)r0 , j (AT ) r0 = 0, it holds that i (A)r0 r 0 , AT r 0 , . . . , (AT )i1 r 0 if and only if ( j (A)i (A)r0 , r 0 , ) = 0 for some polynomial j of degree j < i for j = 0, 1, , i 1. In Bi-CG method, CGS = 2 we take j = j r j = j (AT ) r0 and exploit it in CGS to get rj j (A)r0 . Now ri = i (A)i (A)r0 . How to choose i polynomial of degree i so that ri satises the minimum. Like polynomial, we can determine the optimal parameters of i so that ri satises the minimum. But the optimal parameters for the Chebychev polynomial are in general not easily obtainable. Now we take i i (x), where j < i,

4.12 Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems 145 i (x) = (1 1 x)(1 2 x) (1 i x). Here j are suitable constants to be selected. Dene rj = j (A)j (A)r0 . Then ri = i (A)i (A)r0 = (1 i A)i1 (A) (i1 (A) i Ai1 (A)) r0 = {(i1 (A)i1 (A) i Ai1 (A)i1 (A))} r0 i A {(i1 (A)i1 (A) i Ai1 (A)i1 (A))} r0 = ri1 i Api i A(ri1 i Api ) and pi+1 = = = = i (A)i (A)r0 i (A) (i (A) + i+1 i1 (A)) r0 i (A)i (A)r0 + i+1 (1 i A)i1 (A)i1 (A)r0 i (A)i (A)r0 + i+1 i1 (A)i1 (A)r0 i+1 i Ai1 (A)i1 (A)r0 = ri + i+1 (pi i Api ).

Recover the constants i , i , and i in Bi-CG method. We now compute i : Let i+1 = ( r0 , i (A)i (A)r0 ) = i (AT ) r0 , i (A)r0 . From Bi-CG we have i (A)r0 all vectors i1 (AT ) r0 , where i1 is an arbitrary polynomial of degree i 1. Consider the highest order term of i (AT ) (when computing i+1 ) is (1)i 1 2 i (AT )i . From Bi-CG method, we also have i+1 = i (AT ) r0 , i (A)r0 . The highest order term of i (AT ) is (1)i 1 i (AT )i . Thus i = ( i / i1 ) (i1 /i1 ) , because i = 1 i1 (AT )i1 r 0 , i1 (A)r0 i = T i 2 i1 (1 i2 (A ) r 0 , i2 (A)r0 ) 1 i1 T i1 0 , i1 (A)r0 1 i1 1 i1 (A ) r = 1 i2 T i2 , 0 i2 (A)r0 1 i2 1 i2 (A ) r = ( i / i1 ) (i1 /i1 ) .

Similarly, we can compute i and i . Let ri = ri1 Ay, xi = xi1 + y (side product).

Compute i so that ri = i (A)(A)r0 is minimized in 2-norm as a function of i .

146 Chapter 4. Iterative Methods for Solving Large Linear Systems Algorithm 4.12.3 (Bi-CGSTAB method) Given x0 , r0 = b Ax0 , r 0 arbitrary, such that ( r0 , r0 ) = 0, e.g. r 0 = r0 , 0 = = 0 = 1, v0 = p0 = 0 For i = 1, 2, 3, i = ( r0 , ri1 ) = (i /i1 )(/i1 ) pi = ri1 + (pi1 i1 vi1 ) vi = Api = i /( r0 , vi ) s = ri1 vi t = As i = (t, s)/(t, t) xi = xi1 + pi + i s (= xi1 + pi + i (ri1 Api )) Stop here, if xi is accurate enough. ri = s i t [= ri1 Api i A(ri1 Api ) = ri1 A(pi + i (ri1 Api )] end for Preconditioned Bi-CGSTAB-P: Rewrite Ax = b as x A = b
1 1 = K1 with A AK2 ,

1 1 where x = K2 x and b = K1 b. Then 1 1 1 p i K1 pi , v i K1 vi , r i K1 ri , 1 1 K1 ti , x s K1 s i , t K 2 xi , T r 0 K1 r 0 .

4.13

A Transpose-Free Qusi-minimal Residual Algorithm for Nonsymmetric Linear Systems

T r0 = 0, e.g. r 0 = r0 . We know that Given x0 , r0 = b Ax0 and r 0 arbitrary such that r 0

wT (b AxBCG ) = 0, n

w Kn ( r0 , AT ),

xBCG x0 + Kn (r0 , A). n

, generated by Bi-CG is dened by Petrov-Galerkin method. The nth iterate, xBCG n


BCG = n (A)r0 , n Pn , n (0) = 1. rn CGS = (n (A))2 r0 , xn x0 + K2n (r0 , A). rn BiCGST AB = n (A)n (A)r0 , xn x0 + K2n (r0 , A). rn

4.13 A Transpose-Free Qusi-minimal Residual Algorithm for Nonsymmetric Linear Systems 147 Algorithm 4.13.1 (CGS Algorithm) Choose x0 RN , set p0 = u0 = r0 = b Ax0 , v0 = Ap0 , Choose r 0 such that 0 = r T r0 = 0, for n = 0, 1, 2, T n1 = r 0 vn1 , n1 = n1 /n1 , qn = un1 n1 vn1 , xn = xn1 + n1 (un1 + qn ), rn = rn1 n1 A(un1 + qn ), If xn converges, stop; T n = r 0 rn , n = n /n1 , un = rn + n qn , pn = un + n (qn + n pn1 ), vn = Apn . end for Note that n1 = 0 for all n, and un1 = n1 (A)n1 (A)r0 , where n , n are generated by n ( ) = n ( ) + n n1 ( ), and n ( ) = n1 ( ) n1 n1 ( ). (4.13.4) 0 1 (4.13.3) qn = n (A)n1 (A)r0 , (4.13.2) (4.13.1)

4.13.1
Set

Quasi-Minimal Residual Approach


un1 , if m = 2n 1, odd qn , if m = 2n, even

ym = and wm =

(4.13.5)

2 n (A)r0 , n (A)n1 (A)r0 ,

if m = 2n + 1, odd if m = 2n, even

(4.13.6)

CGS CGS . Using (4.13.2) and (4.13.4) we get = 2 From rn n (A)r0 follows that w2n+1 = rn

n1 (A) = A1

1 n1

(n1 (A) n (A)).

Multiply above equation by n (A), then the vectors in (4.13.5) and (4.13.6) are related by Aym = 1
(m1)/2

(wm wm+1 ).

(4.13.7)

148 Chapter 4. Iterative Methods for Solving Large Linear Systems By (4.13.1), (m1)/2 in (4.13.7) = 0. Let Ym = [y1 , y2 , , ym ], Then from (4.13.7) we get
(e) AYm = Wm+1 Bm ,

Wm+1 = [w1 , , wm , wm+1 ]. (4.13.8)

where

(e) Bm

1 0 1 1 .. .. = . . 1 1 0 1

diag(0 , 0 , 1 , 1 , , )1 (4.13.9)

(m1)/2

is an (m + 1) m lower bidiagonal matrix. By (4.13.3), (4.13.4) and (4.13.1) we have that polynomials n and n are of full degree n. With (4.13.2) and (4.13.5) it implies Km (r0 , A) = span{y1 , y2 , , ym } = {Ym z | z Rm }. But any possible iterate xm must lie in x0 + Km (r0 , A). Thus xm = x0 + Ym z for some z Rm . (4.13.11) (4.13.10)

CGS From (4.13.8) and w1 = r0 (see w2n+1 = rn ) follows that the residual satises

rm = r0 AYm z = Wm+1 (e1 Let

(m+1)

(e) Bm z ).

(4.13.12) (4.13.13)

m+1 = diag(w1 , w2 , , wm+1 ), be any scaling matrix, rewrite (4.13.12) as

wk > 0,

1 (e) rm = Wm+1 m+1 (fm+1 Hm z ),

(4.13.14) (4.13.15)

where

fm+1 = 1 e1

(m+1)

(e) (e) Hm = m+1 Bm .

We now dene the m-th iterate, xm , of the transpose-free quasi-minimal residual method (TFQMR) by xm = x0 + Ym z m , (4.13.16) where zm is the solution of the least squares problem
(e) zm m := fm+1 Hm 2 (e) z = min fm+1 Hm m z R (e) 2

(4.13.17)

By (4.13.9), (4.13.13) and (4.13.15) it implies that Hm has full column rank m. Then zm is uniquely dened by (4.13.17). In general, we set wk = wk 2 , k = 1 , , m + 1.

1 This implies that all columns of Wm+1 m+1 are unit vectors.

4.13 A Transpose-Free Qusi-minimal Residual Algorithm for Nonsymmetric Linear Systems 149

4.13.2

Derivation of actual implementation of TFQMR


x m = x0 + Ym z m ,
1 z m = Hm fm ,

Consider (4.13.18) . where


(e) Hm =

Hm

and fm+1 =

fm

By (4.13.9), (4.13.13) and (4.13.15) follows Hm nonsingular, thus z m = [0 , 0 , 1 , , and


(e) m+1 = fm+1 Hm z m 2 . (m1)/2

]T

(4.13.19) (4.13.20)

Comparing (4.13.18) and (4.13.19) with update formula for iterate xCGS in CGS Algon rithm we get x 2n = xCGS . (4.13.21) n Lemma 4.13.1 Let w1 > 0, m 1 and
(e) Hm =

hm+1,m eT m

Hm

Hm1 0 hm+1,m

(e)

(4.13.22)

be an (m + 1) m upper Hessenberg matrix of full column rank m. For k = m 1, m, let zk Rk denote the solution of the least-square problem k := min fk+1 Hk z 2 ,
z Rk (e) +1 fk+1 = w1 ek Rk+1 . 1

(4.13.23)

1 Moreover, assume that Hm in (4.13.22) is nonsingular. Set z m := Hm fm . Then

zm = (1 c2 m) m = m1 m cm , where m = 1 m1

zm1 0

2 + c2 m , mz

(4.13.24) (4.13.25)

(e) fm+1 Hm z m 2 ,

cm =

1 . 2 1 + m

(4.13.26)

4.13.3

TFQMR Algorithm
2 m . xm = (1 c2 m )xm1 + cm x

From (4.13.24), (4.13.11) and (4.13.18) are connected by (4.13.27)

By (4.13.25), (4.13.26) and (4.13.20) follows that m = wm+1 , m1 cm = 1 2 1 + m and m = m1 m cm . (4.13.28)

150 Setting

Chapter 4. Iterative Methods for Solving Large Linear Systems dm = 1


(m1)/2

( xm xm1 ).

(4.13.29)

Rewrite (4.13.27) and get xm = xm1 + m dm , where m = c2 m


(m1)/2

(4.13.30)

. By (4.13.18) and (4.13.19) we get z m = [0 , 1 , ,


(m1)/2

x m = x0 + Ym z m , and thus

]T ,

x m = x m1 +

(m1)/2

ym .

Together with (4.13.29) and (4.13.30) (m replaced by m 1) we have dm = ym +


2 where m 1 := 1c2 m . c2 m1 2 m 1 m1 dm1 , (m1)/2

(4.13.31)

Remark 4.13.1 dm = 1 1 ( xm1 + ym xm1 ) = ym + [ xm1 xm1 ] 1 = ym + ( xm1 xm2 m1 dm1 ) 1 1 dm1 m1 dm1 ) = ym + ( m1 )dm1 = ym + ( 1 c2 1 m1 1 = ym + ( 2 m1 )dm1 = ym + (m1 ( 2 m1 ))dm1 . cm1 cm1

From (4.13.5) and (4.13.6), qn and un in CGS Algorithm follows y2n = y2n1 n1 vn1 , y2n+1 = w2n+1 + n v2n . (4.13.32)

Multiplying the update formula for pn in CGS Algorithm by A we get vn = Ay2n+1 + n (Ay2n + n vn1 ), for vn = Apn . By (4.13.7) wm s can be generated by wm+1 = wm
(m1)/2

(4.13.33)

Aym .

(4.13.34)

Combining (4.13.28), (4.13.30)-(4.13.34) we get the TFQMR Algorithm in standard weighting strategy k = wk 2 .

Algorithm 4.13.2 (TFQMR Algorithm)
Choose x_0 ∈ R^N. Set w_1 = y_1 = r_0 = b − Ax_0, v_0 = Ay_1, d_0 = 0, τ_0 = ||r_0||_2, θ_0 = 0, η_0 = 0; choose r̃_0 such that ρ_0 = r̃_0^T r_0 ≠ 0.
For n = 1, 2, ... do
    σ_{n−1} = r̃_0^T v_{n−1},  α_{n−1} = ρ_{n−1}/σ_{n−1},  y_{2n} = y_{2n−1} − α_{n−1} v_{n−1};
    For m = 2n − 1, 2n do
        w_{m+1} = w_m − α_{n−1} A y_m,
        θ_m = ||w_{m+1}||_2 / τ_{m−1},  c_m = 1/√(1 + θ_m²),  τ_m = τ_{m−1} θ_m c_m,  η_m = c_m² α_{n−1},
        d_m = y_m + (θ_{m−1}² η_{m−1} / α_{n−1}) d_{m−1},
        x_m = x_{m−1} + η_m d_m;
        If x_m converges, stop;
    End for
    ρ_n = r̃_0^T w_{2n+1},  β_n = ρ_n/ρ_{n−1},  y_{2n+1} = w_{2n+1} + β_n y_{2n},
    v_n = A y_{2n+1} + β_n (A y_{2n} + β_n v_{n−1}).
End for
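A direct Python/NumPy transcription of Algorithm 4.13.2 with the standard weights ω_k = ||w_k||_2 and r̃_0 = r_0. The convergence test on the true residual and the test matrix are illustrative simplifications (a cheaper test based on the quasi-residual quantity τ_m is commonly used).

```python
import numpy as np

def tfqmr(A, b, x0=None, tol=1e-10, maxit=None):
    """Transpose-free QMR (Algorithm 4.13.2), standard weights, r~_0 = r_0."""
    n_dim = b.size
    maxit = n_dim if maxit is None else maxit
    x = np.zeros(n_dim) if x0 is None else x0.copy()
    r0 = b - A @ x
    w = r0.copy()
    y = [r0.copy(), np.zeros(n_dim)]       # y[0] = y_{2n-1}, y[1] = y_{2n}
    v = A @ y[0]
    d = np.zeros(n_dim)
    tau = np.linalg.norm(r0)
    theta = eta = 0.0
    rt = r0.copy()
    rho = rt @ r0
    for n in range(1, maxit + 1):
        sigma = rt @ v
        alpha = rho / sigma
        y[1] = y[0] - alpha * v
        for m in range(2):                  # m = 2n-1 (odd) and m = 2n (even)
            w = w - alpha * (A @ y[m])
            theta_new = np.linalg.norm(w) / tau
            c = 1.0 / np.sqrt(1.0 + theta_new ** 2)
            tau = tau * theta_new * c
            eta_new = c * c * alpha
            d = y[m] + (theta ** 2 * eta / alpha) * d
            x = x + eta_new * d
            theta, eta = theta_new, eta_new
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            break
        rho_new = rt @ w
        beta = rho_new / rho
        y[0] = w + beta * y[1]
        v = A @ y[0] + beta * (A @ y[1] + beta * v)
        rho = rho_new
    return x, n

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    A = rng.standard_normal((40, 40)) + 40 * np.eye(40)
    b = rng.standard_normal(40)
    x, its = tfqmr(A, b)
    print(its, np.linalg.norm(b - A @ x))
```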


4.14 GMRES: Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems

Algorithm 4.14.1 (GCR) Input: Given x0 , compute p0 = r0 = b Ax0 ; Output: solution of linear system Ax = b. Iterate i = 0, 1, 2, , compute i = (ri , Api )/(Api , Api ), xi+1 = xi + i pi , ri+1 = ri i Api b Axi , pi+1 = ri+1 + End;
(i) j i j =0

j pj ,

(i)

are chosen so that (Api+1 , Apj ) = 0 , for 0 j i.

It requires that ½(A^T + A) be symmetric positive definite.

Example 4.14.1 Let

A = [ 0  1 ; −1  0 ]  and  b = [ 1 ; 1 ].

Take x0 = 0. Then we obtain the following results: For i = 0 in Algorithm 4.14.1, we have that 0 = 0 which implies that x1 = x0 and r1 = r0 . Thus p1 = 0. For i = 1 in Algorithm 4.14.1, we see that a division by zero when computing 1 and break down.
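The breakdown is easy to reproduce numerically; the short sketch below (with the matrix and right-hand side of Example 4.14.1) shows that α_0 = 0, so x_1 = x_0 and r_1 = r_0, and that the next direction p_1 vanishes, which makes the computation of α_1 a division by zero.

```python
import numpy as np

A = np.array([[0., 1.],
              [-1., 0.]])
b = np.array([1., 1.])

r0 = b.copy()                                  # x0 = 0, so r0 = b and p0 = r0
p0 = r0.copy()
Ap0 = A @ p0
alpha0 = (r0 @ Ap0) / (Ap0 @ Ap0)
print("alpha_0 =", alpha0)                     # 0.0: hence x1 = x0 and r1 = r0
beta0 = -((A @ r0) @ Ap0) / (Ap0 @ Ap0)        # chosen so that (A p1, A p0) = 0
p1 = r0 + beta0 * p0
print("p_1 =", p1)                             # zero vector -> alpha_1 would divide by zero
```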



4.14.1 FOM algorithm: Full orthogonalization method

For GMRES method, (a) CANNOT break down, unless it has already converged. (b) 1/2 storage required than GCR, (c) 1/3 fewer arithmetic operations than GCR Main goal: Find orthogonal basis for Kk = {r0 , Ar0 , , Ak1 r0 }, i.e., span(Kk ) =< v1 , , vk >, where vi vj for i = j. Theorem 4.14.1 (Implicit Q theorem) Let AQ1 = Q1 H1 and AQ2 = Q2 H2 , where H1 , H2 are Hessenberg and Q1 , Q2 are unitary with Q1 e1 = Q2 e1 = q1 . Then Q1 = Q2 and H1 = H2 . Proof: Let q2 qn ] h11 h12 h1n . ... . h21 h22 . . ... ... ... . 0 . . ... ... ... . . hn1,n 0 0 hn,n1 hnn .

A[q1 q2 qn ] = [q1

(4.14.1)

Then we have Aq1 = h11 q1 + h21 q2 . Since q1 q2 , it implies that


h11 = q1 Aq1 /q1 q1 .

(4.14.2)

From (4.14.2), we get that q 2 h21 q2 = Aq1 h11 q1 . That is q2 = q 2/ q 2 Similarly, from (4.14.1), Aq2 = h12 q1 + h22 q2 + h32 q3 , where
h12 = q1 Aq2 2

and

h21 = q 2 2.

and

h22 = q2 Aq2 .

Let q 3 = Aq2 h12 q1 + h22 q2 .

4.14 GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems Then q3 = q 3/ q 3
2

153

and

h32 = q 3 ,

and so on. Therefore, [q1 , , qn ] are uniquely determined by q1 . Thus, uniqueness holds. Let Kn = [v1 , Av1 , , An1 v1 ] with v1 2 = 1 is nonsingular. Kn = Un Rn and Un e1 = v1 . Then 0 0 . .. . . 1 . . .. .. . . . AKn = Kn Cn = [v1 , Av1 , , An1 v1 ] (4.14.3) . . . . 0 . . . . .. ... 0 . . . . 0 0 1 Since Kn is nonsingular, (4.14.3) implies that
1 1 1 A = Kn Cn Kn = (Un Rn )Cn (Rn Un ).

That is
1 AUn = Un (Rn Cn Rn ), 1 where (Rn Cn Rn ) is Hessenberg and Un e1 = v1 . Because < Un >=< Kn >, nd AVn = (i) (i) Vn Hn by any method with Vn e1 = v1 , then it holds that Vn = Un , i.e., vn = un for i = 1 , , n.

Algorithm 4.14.2 (Arnoldi algorithm) Input: Given v1 with v1 2 = 1; Output: Arnoldi factorization: AVk = Vk Hk + hk+1,k vk+1 eT k. Iterate j = 1, 2, , compute hij = (Avj , vi ) for i = 1, 2, , j , v j +1 = Avj j i=1 hij vi , hj +1,j = v j +1 2 , vj +1 = v j +1 /hj +1,j . End; Remark 4.14.1 (a) Let Vk = [v1 , , vk ] Rnk where vj , for j = 1, . . . , k , is generated by Arnoldi algorithm. Then Hk VkT AVk is upper k k Hessenberg. (b) Arnoldis original method was a Galerkin method for approximate the eigenvalue of A by Hk . In order to solve Ax = b by the Galerkin method using < Kk >< Vk >, we seek an approximate solution xk = x0 + zk with zk Kk =< r0 , Ar0 , , Ak1 r0 > and r0 = b Ax0 . Denition 4.14.1 {xk } is said to be satised the Galerkin condition if rk b Axk is orthogonal to Kk for each k .

154 Chapter 4. Iterative Methods for Solving Large Linear Systems The Galerkin method can be stated as that nd xk = x0 + z k such that (b Axk , v ) = 0, which is equivalent to nd zk Vk yk Vk such that (r0 Azk , v ) = 0, Substituting (4.14.5) into (4.14.6), we get VkT (r0 AVk yk ) = 0, which implies that yk = (VkT AVk )1 r0 e1 . (4.14.7) v Vk . (4.14.6) (4.14.5) v Vk , with zk Vk (4.14.4)

Since Vk is computed by the Arnoldi algorithm with v1 = r0 / r0 , yk in (4.14.7) can be represented as


1 yk = Hk r0 e1 .

Substituting it into (4.14.5) and (4.14.4), we get


1 xk = x0 + Vk Hk r0 e1 .

Using the result that AVk = Vk Hk + hk+1,k vk+1 eT k , rk can be reformulated as rk = b Axk = r0 AVk yk = r0 (Vk Hk + hk+1,k vk+1 eT k )yk T = r0 Vk r0 e1 hk+1,k eT k yk vk+1 = (hk+1,k ek yk )vk+1 . Algorithm 4.14.3 (FOM algorithm: Full orthogonalization method) Input: choose x0 , compute r0 = b Ax0 and v1 = r0 / r0 ; Output: solution of linear system Ax = b. Iterate j = 1, 2, , k , compute hij = (Avj , vi ) for i = 1, 2, , j , v j +1 = Avj j i=1 hij vi , hj +1,j = v j +1 2 , vj +1 = v j +1 /hj +1,j . End; Form the solution: 1 e1 . xk = x0 + Vk yk , where yk = r0 Hk

4.14 GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems 155 In practice, k is chosen such that the approximate solution xk will be suciently accurate. Fortunately, it is simple to determine a posteriori when k is suciently large without having to explicitly compute xk . Furthermore, we have b Axk = hk+1,k |eT k yk | Property 4.14.1 (FOM) (a) rk //vk+1 ri rj , i=j

(b) FOM does NOT break down If the degree of the minimal polynomial of v1 is at least k , and the matrix Hk is nonsingular. (c) The process terminates at most N steps. A diculty with the full orthogonalization method is that it becomes increasingly expensive when k increases. There are two distinct ways of avoiding this diculty. (i) restart the algorithm every m steps (ii) vi+1 are only orthogonal to the previous incomplete FOM( ). vectors. Hk is then banded, then we have

A drawback of these truncation techniques is the lack of any theory concerning the global convergence of these truncation technique. Such a theory is dicult because there is NO optimality property similar to that of CG method. Therefore, we consider GMRES which satises an optimality property.

4.14.2

The generalized minimal residual (GMRES) algorithm

The approximate solution of the form x0 + zk , which minimizes the residual norm over zk Kk , can in principle be obtained by following algorithms: The ORTHODIR algorithm of Jea and Young; the generalized conjugate residual method (GCR); GMRES. Let Vk = [v1 , , vk ] , k = H R(k+1)k .

h1,1 h2,1 : 0

: hk,k1 0

h1,k h2,k : hk,k hk+1,k

By Arnoldi algorithm, we have k. AVk = Vk+1 H To solve the least square problem:
z Kk

(4.14.8)

min ro Az

= min b A(xo + z ) 2 ,
z Kk

(4.14.9)

156 Chapter 4. Iterative Methods for Solving Large Linear Systems where Kk =< ro , Aro , , Ak1 ro >=< v1 , , vk > with v1 = rroo 2 . Set z = Vk y , the least square problem (4.14.9) is equivalent to
y Rk

min J (y ) = min v1 AVk y 2,


y Rk

= ro 2 .

(4.14.10)

Using (4.14.8), we have k y] J (y ) = Vk+1 [e1 H


2

k y 2. = e1 H

(4.14.11)

Hence, the solution of the least square (4.14.9) is xk = xo + Vk yk , where yk minimize the function J (y ) dened by (4.14.11) over y Rk . Algorithm 4.14.4 (GMRES algorithm) Input: choose x0 , compute r0 = b Ax0 and v1 = r0 / r0 ; Output: solution of linear system Ax = b. Iterate j = 1, 2, , k , compute hij = (Avj , vi ) for i = 1, 2, , j , v j +1 = Avj j i=1 hij vi , hj +1,j = v j +1 2 , vj +1 = v j +1 /hj +1,j . End; Form the solution: xk = x0 + Vk yk , where yk minimizes J (y ) in (4.14.11). Diculties: when k is increasing, storage for vj , like k , the number of multiplications is like 1 k2N . 2 Algorithm 4.14.5 (GMRES(m) algorithm) Input: choose x0 , compute r0 = b Ax0 and v1 = r0 / r0 ; Output: solution of linear system Ax = b. Iterate j = 1, 2, , m, compute hij = (Avj , vi ) for i = 1, 2, , j , v j +1 = Avj j i=1 hij vi , hj +1,j = v j +1 2 , vj +1 = v j +1 /hj +1,j . End; Form the solution: xm = x0 + Vm ym , where ym minimizes e1 Hm y for y Rm . Restart: Compute rm = b Axm , if rm is small , then stop, else , Compute x0 = xm and v1 = rm / rm , GoTo Iterate step.
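A compact Python/NumPy sketch of GMRES(m): each cycle builds an Arnoldi basis, solves the small least squares problem min_y ||βe_1 − H̄_m y||_2 and restarts. The small problem is solved here with a generic least squares routine for brevity; the Givens-rotation update of Section 4.14.3 is the efficient way to do it. Test matrix, m and tolerance are illustrative assumptions.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES cycle: minimize ||beta e_1 - H_bar y||_2 over y and return x_0 + V_m y."""
    n = b.size
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                              # Arnoldi (modified Gram-Schmidt)
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(m + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)     # in practice: Givens rotations
    return x0 + V[:, :m] @ y

def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=50):
    """GMRES(m): restart every m steps (as in Algorithm 4.14.5)."""
    x = np.zeros_like(b)
    for _ in range(max_restarts):
        x = gmres_cycle(A, b, x, m)
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            break
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    A = rng.standard_normal((80, 80)) + 80 * np.eye(80)
    b = rng.standard_normal(80)
    x = gmres_restarted(A, b, m=20)
    print(np.linalg.norm(b - A @ x))
```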

4.14 GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems

4.14.3

Practical Implementation: Consider QR factorization157 of Hk

Consider the matrix Hk , and let us suppose that we want to solve the least squares problem: min e1 Hk y 2
y Rk

Assume Givens rotations Fi , i = 1 . . . , j such that (j +1)j . Fj F1 Hj = Fj . . . F 1 0 = Rj R 0 0 0 0 0 0 In order to obtain Rj +1 we must start by premultiptying the new column by the previous rotations. + + + + 0 + + Hj +1 = Fj . . . Fi Hj +1 = 0 0 + + 0 0 0 + 0 r 0 h 0 0 0 0 + The principal upper (j + 1) j submatrix of the above matrix is nothing but Rj , and h := hj +2,j +1 is not aected by the previous rotations. The next rotation Fj +1 dened by cj +1 r/(r2 + h2 )1/2 , sj +1 = h/(r2 + h2 )1/2 . Thus, after k steps of the above process, we have achieved Qk Hk = Rk where Qk is a (k + 1) (k + 1) unitary matrix and J (y ) = e1 Hk y = Qk [e1 Hk y ] = gk Rk y , (4.14.12)

where gk Qk e1 . Since the last row of Rk is a zero row, the minimization of (4.14.12) 1 gk , where Rk and gk are removed the last row of Rk and the last is achieved at yk = Rk component of gk , respectively. Proposition 4.14.1 rk = b Axk =| The (k+1)-st component of gk |.

To avoid the extra computation needed to obtain xk explicitly we suggest an ecient implementation of the last step of GMRES(m). To compute xm we need to compute Hm

158 Chapter 4. Iterative Methods for Solving Large Linear Systems and v1 , . . . , vm . Since v1 , , vm are known, we need to compute hi,m , for i = 1, . . . , m +1, of the form h11 . . . h1m1 h1m . . . . h21 . . . .. .. . . . . 0 hm,m1 hmm 0 hm+1,m with hi,m = (Avm , vi ), for i m. Here hm+1,m satises
m m

h2 m+1,m = Avm
i=1

him vi

= Avm

i=1

h2 i,m ,

because
m

Avm
i=1

him vi = hm+1,m vm+1 ,

vm+1 vi , for i = 1, . . . , m.

Now we will show how to compute rm = b Axm from vi s i = 1, . . . , m and Avm . From (4.14.11) the residual vector can be expressed as rm = Vm+1 [e1 Hm ym ]. Dene t [t1 , t2 , . . . , tm+1 ]T e1 Hm ym . Then
m

vm = (
i=1 m

ti vi ) + tm+1 vm+1 ti vi ) + tm+1


i=1

= ( =

1 hm+1,m

[Avm
i=1

hi,m vi ]

tm+1 Avm + hm+1,m

(ti tm+1 hi,m /hm+1,m )vi .


i=1

Assume the rst m 1 Arnoldi steps have been performed that the rst m 1 columns of Hm and the rst m vectors vi , i = 1, . . . , m are available. Since we will not normalize vi at every step, we do not have explicitly vi but rather wi = i vi , i are some known scaling coecient (e.g., i = vi ). We have shown that rm is a linear combination of Avm and vi s, i = 1, . . . , m. Hence after m steps we do not need vm+1 . (Note that computing vm+1 and its norm costs (2m + 1)n multiplications. So elimination of its computation is a signicant saving). So using v1 , . . . , vm and Avm we can compute restarting vector v1 := rm / rm and dont need to compute vm+1 . Then rm = tm+1 Avm + hm+1,m
m

(ti tm+1 hi,m /hm+1,m )vi .


i=1

By Proposition 4.14.1 it holds that rm / rm 2 .

rm

2 =|

the (k + 1)-st component of gk |. So v1 :=

4.14 GMRES: Generalized Minimal Residual Algorithm for solving Nonsymmetric Linear Systems

4.14.4

Theoretical Aspect of GMRES

159

GMRES cannot break down! GCR can break down when $A$ is not positive real, i.e., when $\tfrac12(A + A^T)$ is not symmetric positive definite. We assume that the first $m$ Arnoldi vectors can be constructed, that is, $h_{j+1,j} \neq 0$ for $j = 1,2,\dots,m$. In fact, if $h_{j+2,j+1} \neq 0$, the diagonal element $r_{j+1,j+1}$ of $R_{j+1}$ satisfies
$$ r_{j+1,j+1} = c_{j+1}\,r - s_{j+1}\,h_{j+2,j+1} = (r^2 + h_{j+2,j+1}^2)^{1/2} > 0. $$
Hence the diagonal elements of $R_m$ do not vanish and the least squares problem $J(y) = \min \|g_m - R_m y\|_2$ can be solved, establishing that the algorithm cannot break down if $h_{j+1,j} \neq 0$ for $j = 1,\dots,m$. Thus the only potential difficulty is that during the Arnoldi process we encounter $h_{j+1,j} = 0$.

(i) If $h_{j+1,j} = 0$, then from Arnoldi's algorithm it is easily seen that $AV_j = V_j H_j$, which means that the Krylov space $\mathcal K_j$ spanned by $V_j$ is invariant. Note that if $A$ is nonsingular then the eigenvalues of $H_j$ are nonzero. $J(y)$ in (4.14.10) at the $j$th step becomes
$$ J(y) = \|\beta v_1 - AV_j y\| = \|\beta v_1 - V_j H_j y\| = \|V_j[\beta e_1 - H_j y]\| = \|\beta e_1 - H_j y\|. $$
Since $H_j$ is nonsingular, this function is minimized by $y = \beta H_j^{-1} e_1$ and the corresponding minimum norm is zero, i.e., the solution $x_j$ is exact.

Conversely, assume $x_j$ is the exact solution and the $x_i$, $i=1,\dots,j-1$, are not, i.e., $r_j = 0$ but $r_i \neq 0$ for $i = 0,1,\dots,j-1$. From Proposition 4.14.1 we know that $\|r_j\| = |s_j\, e_{j-1}^T g_{j-1}| = |s_j|\,\|r_{j-1}\| = 0$. Then $s_j = 0$ (since $\|r_{j-1}\| \neq 0$), which implies $h_{j+1,j} = 0$, i.e., the algorithm breaks down and $\tilde v_{j+1} = 0$, which proves the result.

(ii) $\tilde v_{j+1} = 0$ and $\tilde v_i \neq 0$ for $i = 1,\dots,j$ $\iff$ the degree of the minimal polynomial of $r_0 = \beta v_1$ is equal to $j$.

($\Leftarrow$) Assume that there exists a polynomial $p_j$ of degree $j$ such that $p_j(A)v_1 = 0$ and $p_j$ is the polynomial of lowest degree for which this is true. Then $\mathcal K_{j+1} = \langle v_1, Av_1, \dots, A^j v_1\rangle = \mathcal K_j$, so $\tilde v_{j+1} \in \mathcal K_{j+1} = \mathcal K_j$ and $\tilde v_{j+1} \perp \mathcal K_j$; hence $\tilde v_{j+1} = 0$. Moreover, if $\tilde v_i = 0$ for some $i \le j$, then there is a polynomial $p_i$ of degree $i$ such that $p_i(A)v_1 = 0$, contradicting the minimality of $p_j$.

($\Rightarrow$) There is a polynomial $p_j$ of degree $j$ such that $p_j(A)v_1 = 0$ (by assumption $\tilde v_{j+1} = 0$ and $\tilde v_i \neq 0$, $i = 1,\dots,j$), and $p_j$ is the polynomial of lowest degree for which this is true: otherwise we would have $\tilde v_i = 0$ for some $i < j+1$ by the first part of this proof, a contradiction.

Proposition 4.14.2 The solution $x_j$ produced by GMRES at step $j$ is exact, which is equivalent to:
(i) the algorithm breaks down at step $j$;
(ii) $\tilde v_{j+1} = 0$;
(iii) $h_{j+1,j} = 0$;
(iv) the degree of the minimal polynomial of $r_0$ is $j$.

Corollary 4.14.1 For an $n \times n$ problem GMRES terminates in at most $n$ steps.

This uncommon type of breakdown is sometimes referred to as a "lucky" breakdown in the context of the Lanczos algorithm.

Proposition 4.14.3 Suppose that $A$ is diagonalizable, $A = XDX^{-1}$, and let
$$ \varepsilon^{(m)} = \min_{p\in P_m,\,p(0)=1}\ \max_{\lambda_i\in\sigma(A)} |p(\lambda_i)|. $$
Then $\|r_{m+1}\| \le \kappa(X)\,\varepsilon^{(m)}\,\|r_0\|$, where $\kappa(X) = \|X\|\,\|X^{-1}\|$. When $A$ is positive real with symmetric part $M$, it holds that $\|r_m\| \le [1-\alpha/\beta]^{m/2}\|r_0\|$, where $\alpha = (\lambda_{\min}(M))^2$ and $\beta = \lambda_{\max}(A^TA)$. This proves the convergence of GMRES($m$) for all $m$ when $A$ is positive real.

Theorem 4.14.2 Assume $\lambda_1,\dots,\lambda_\nu$ are the eigenvalues of $A$ with nonpositive (negative) real parts and that the other eigenvalues are enclosed in a circle centered at $C$, with $C > 0$, and radius $R$ with $C > R$. Then
$$ \varepsilon^{(m)} \le \max_{j=\nu+1,\dots,N}\ \prod_{i=1}^{\nu}\frac{|\lambda_i-\lambda_j|}{|\lambda_i|}\,\Bigl(\frac{R}{C}\Bigr)^{m-\nu} \le \Bigl(\frac{D}{d}\Bigr)^{\nu}\Bigl(\frac{R}{C}\Bigr)^{m-\nu}, $$
where
$$ D = \max_{i=1,\dots,\nu;\ j=\nu+1,\dots,N}|\lambda_i-\lambda_j| \qquad\text{and}\qquad d = \min_{i=1,\dots,\nu}|\lambda_i|. $$
Proof: Consider $p(z) = r(z)q(z)$, where $r(z) = (1 - z/\lambda_1)\cdots(1 - z/\lambda_\nu)$ and $q(z)$ is an arbitrary polynomial of degree $\le m-\nu$ with $q(0) = 1$. Since $p(0) = 1$ and $p(\lambda_i) = 0$ for $i = 1,\dots,\nu$, we have
$$ \varepsilon^{(m)} \le \max_{j=\nu+1,\dots,N}|p(\lambda_j)| \le \max_{j=\nu+1,\dots,N}|r(\lambda_j)|\cdot\max_{j=\nu+1,\dots,N}|q(\lambda_j)|. $$
It is easily seen that
$$ \max_{j=\nu+1,\dots,N}|r(\lambda_j)| = \max_{j=\nu+1,\dots,N}\prod_{i=1}^{\nu}\frac{|\lambda_i-\lambda_j|}{|\lambda_i|} \le \Bigl(\frac{D}{d}\Bigr)^{\nu}. $$
By the maximum principle, the maximum of $|q(z)|$ over $z \in \{\lambda_j\}_{j=\nu+1}^{N}$ is no larger than its maximum over the circle that encloses that set. Taking $q(z) = [(C-z)/C]^{m-\nu}$, whose maximum modulus on the circle is $(R/C)^{m-\nu}$, yields the desired result.

Corollary 4.14.2 Under the assumptions of Proposition 4.14.3 and Theorem 4.14.2, GMRES($m$) converges for any initial $x_0$ if
$$ m > \frac{\log\bigl(D\,C\,\kappa(X)^{1/\nu}/(d\,R)\bigr)}{\log(C/R)}. $$

Part II On the Numerical Solutions of Eigenvalue Problems

Chapter 5 The Unsymmetric Eigenvalue Problem


Generalized eigenvalue problem (GEVP): Given $A, B \in \mathbb{C}^{n\times n}$, determine $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda Bx$. Then $\lambda$ is called an eigenvalue of the pencil $A - \lambda B$ (or of the pair $(A,B)$) and $x$ is called an eigenvector corresponding to $\lambda$. $\lambda$ is an eigenvalue of $A - \lambda B$ $\iff$ $\det(A - \lambda B) = 0$. (Write $\sigma(A,B) \equiv \{z \in \mathbb{C} \mid \det(A - zB) = 0\}$.)

Definition 5.0.2 A pencil $A - \lambda B$ ($A, B \in \mathbb{R}^{m\times n}$), or a pair $(A,B)$, is called regular if (i) $A$ and $B$ are square matrices of order $n$, and (ii) $\det(A - \lambda B) \not\equiv 0$. In all other cases ($m \neq n$, or $m = n$ but $\det(A - \lambda B) \equiv 0$) the pencil is called singular. For the detailed algebraic structure of a pencil $A - \lambda B$ see Matrix Theory II, Chapter XII (Gantmacher 1959).

Eigenvalue problem (EVP): the special case of the GEVP when $B = I$: find $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda x$. Then $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector corresponding to $\lambda$.

Definition 5.0.3 (a) $\sigma(A) = \{\lambda \in \mathbb{C} \mid \det(A - \lambda I) = 0\}$ is called the spectrum of $A$. (b) $\rho(A) = \max\{|\lambda| : \lambda \in \sigma(A)\}$ is called the (spectral) radius of $\sigma(A)$. (c) $P(\lambda) = \det(\lambda I - A)$ is called the characteristic polynomial of $A$. Let
$$ P(\lambda) = \prod_{i=1}^{s}(\lambda - \lambda_i)^{m(\lambda_i)}, \qquad \lambda_i \neq \lambda_j\ (i \neq j), \qquad \sum_{i=1}^{s} m(\lambda_i) = n. $$

Example 5.0.2 $A = \begin{bmatrix} 2 & 2\\ 0 & 3\end{bmatrix}$, $B = \begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}$ $\Rightarrow$ $\det(A - \lambda B) = 3(2-\lambda)$ and $\sigma(A,B) = \{2\}$.

Example 5.0.3 $A = \begin{bmatrix} 1 & 2\\ 0 & 3\end{bmatrix}$, $B = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}$ $\Rightarrow$ $\det(A - \lambda B) = 3$ and $\sigma(A,B) = \emptyset$.

Example 5.0.4 $A = \begin{bmatrix} 1 & 2\\ 0 & 0\end{bmatrix}$, $B = \begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}$ $\Rightarrow$ $\det(A - \lambda B) \equiv 0$ and $\sigma(A,B) = \mathbb{C}$.

Example 5.0.5 For the pencil of Example 5.0.2, $\det(A - \lambda B) = 3(2-\lambda)$: $Ax = \lambda Bx \Rightarrow \lambda = 2$; with $\mu = 1/\lambda$, $Bx = \mu Ax \Rightarrow \mu = 0, \tfrac12 \Rightarrow \lambda = \infty, 2$. Hence $\sigma(A,B) = \{2, \infty\}$.

Example 5.0.6 For the pencil of Example 5.0.3, $\det(A - \lambda B) = 3$: there is no solution for $\lambda$; with $\mu = 1/\lambda$, $Bx = \mu Ax \Rightarrow \mu = 0, 0$ (multiple). Hence $\sigma(A,B) = \{\infty, \infty\}$.

Let $m(\lambda_i) :=$ algebraic multiplicity of $\lambda_i$ and $n(\lambda_i) := n - \operatorname{rank}(A - \lambda_i I) =$ geometric multiplicity. Then $1 \le n(\lambda_i) \le m(\lambda_i)$. If for some $i$, $n(\lambda_i) < m(\lambda_i)$, then $A$ is degenerate (defective). The following statements are equivalent:
(a) $A$ is diagonalizable: there exists a nonsingular matrix $T$ such that $T^{-1}AT = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$;
(b) there are $n$ linearly independent eigenvectors;
(c) $A$ is nondefective, i.e., for all $\lambda \in \sigma(A)$: $m(\lambda) = n(\lambda)$.
If $A$ is defective, then eigenvectors together with principal vectors lead to the Jordan form.

Theorem 5.0.3 (Jordan decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \operatorname{diag}(J_1,\dots,J_t)$, where
$$ J_i = \begin{bmatrix} \lambda_i & 1 & & 0\\ & \ddots & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda_i \end{bmatrix} $$
is $m_i \times m_i$ and $m_1 + \cdots + m_t = n$.

Theorem 5.0.4 (Schur decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a unitary matrix $U \in \mathbb{C}^{n\times n}$ such that $U^*AU\ (= U^{-1}AU)$ is upper triangular. Moreover:
- $A$ normal (i.e., $AA^* = A^*A$) $\iff$ there is a unitary $U$ such that $U^*AU = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$, i.e., $Au_i = \lambda_i u_i$, $u_i^*u_j = \delta_{ij}$;
- $A$ hermitian (i.e., $A^* = A$) $\Rightarrow$ $A$ is normal and $\sigma(A) \subseteq \mathbb{R}$;
- $A$ symmetric (i.e., $A^T = A$, $A \in \mathbb{R}^{n\times n}$) $\Rightarrow$ there is an orthogonal $U$ such that $U^TAU = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$ and $\sigma(A) \subseteq \mathbb{R}$.

5.1 Orthogonal Projections and C-S Decomposition
Definition 5.1.1 Let $S \subseteq \mathbb{R}^n$ be a subspace. $P \in \mathbb{R}^{n\times n}$ is the orthogonal projection onto $S$ if
$$ \operatorname{Range}(P) = S, \qquad P^2 = P, \qquad P^T = P, \tag{5.1.1} $$
where $\operatorname{Range}(P) = R(P) = \{y \in \mathbb{R}^n \mid y = Px \text{ for some } x \in \mathbb{R}^n\}$.

Remark 5.1.1 If $x \in \mathbb{R}^n$, then $Px \in S$ and $(I - P)x \perp S$.

Example 5.1.1 $P = vv^T/v^Tv$ is the orthogonal projection onto $S = \operatorname{span}\{v\}$, $v \in \mathbb{R}^n$.

Figure 5.1: Orthogonal projection of $x$ onto $S = \operatorname{span}\{v\}$.

Remark 5.1.2 (i) If $P_1$ and $P_2$ are orthogonal projections, then for any $z \in \mathbb{R}^n$ we have
$$ \|(P_1 - P_2)z\|_2^2 = (P_1z)^T(I - P_2)z + (P_2z)^T(I - P_1)z. \tag{5.1.2} $$
If $R(P_1) = R(P_2) = S$, then the right-hand side of (5.1.2) is zero. Thus the orthogonal projection onto a subspace is unique. (ii) If $V = [v_1,\dots,v_k]$ is an orthonormal basis for $S$, then $P = VV^T$ is the unique orthogonal projection onto $S$.

Definition 5.1.2 Suppose $S_1$ and $S_2$ are subspaces of $\mathbb{R}^n$ with $\dim(S_1) = \dim(S_2)$. We define the distance between $S_1$ and $S_2$ by
$$ \operatorname{dist}(S_1, S_2) = \|P_1 - P_2\|_2, \tag{5.1.3} $$
where $P_i$ is the orthogonal projection onto $S_i$, $i = 1, 2$.
Remark 5.1.3 By considering the case of one-dimensional subspaces in $\mathbb{R}^2$, we obtain a geometrical interpretation of $\operatorname{dist}(\cdot,\cdot)$. Suppose $S_1 = \operatorname{span}\{x\}$ and $S_2 = \operatorname{span}\{y\}$ with $\|x\|_2 = \|y\|_2 = 1$, and assume that $x^Ty = \cos\theta$, $\theta \in [0, \pi/2]$. The difference between the projections onto these spaces satisfies
$$ P_1 - P_2 = xx^T - yy^T = x[x - (y^Tx)y]^T - [y - (x^Ty)x]\,y^T. $$
If $\theta = 0$ ($x = \pm y$), then $\operatorname{dist}(S_1,S_2) = \|P_1 - P_2\|_2 = \sin\theta = 0$. If $\theta \neq 0$, then
$$ U_x = [u_1, u_2] = \bigl[x,\ [y - (y^Tx)x]/\sin\theta\bigr], \qquad V_x = [v_1, v_2] = \bigl[[x - (x^Ty)y]/\sin\theta,\ y\bigr] $$
are defined and orthogonal. It follows that $P_1 - P_2 = U_x\operatorname{diag}[\sin\theta, -\sin\theta]V_x^T$ is the SVD of $P_1 - P_2$. Consequently $\operatorname{dist}(S_1,S_2) = \sin\theta$, the sine of the angle between the two subspaces.

Theorem 5.1.1 (C-S decomposition, Davis/Kahan (1970) or Stewart (1977)) If $Q = \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{bmatrix}$ is orthogonal with $Q_{11} \in \mathbb{R}^{k\times k}$ and $Q_{22} \in \mathbb{R}^{j\times j}$ ($k \ge j$), then there exist orthogonal matrices $U_1, V_1 \in \mathbb{R}^{k\times k}$ and $U_2, V_2 \in \mathbb{R}^{j\times j}$ such that
$$ \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T\end{bmatrix}\begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{bmatrix}\begin{bmatrix} V_1 & 0\\ 0 & V_2\end{bmatrix} = \begin{bmatrix} I & 0 & 0\\ 0 & C & S\\ 0 & -S & C\end{bmatrix}, \tag{5.1.4} $$
where $C = \operatorname{diag}(c_1,\dots,c_j)$, $c_i = \cos\theta_i$, $S = \operatorname{diag}(s_1,\dots,s_j)$, $s_i = \sin\theta_i$, and $0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_j \le \pi/2$.

Lemma 5.1.1 Let $Q = \begin{bmatrix} Q_1\\ Q_2\end{bmatrix}$ be orthogonal with $Q_1 \in \mathbb{R}^{n\times n}$. Then there are unitary matrices $U_1$, $U_2$ and $W$ such that
$$ \begin{bmatrix} U_1^T & 0\\ 0 & U_2^T\end{bmatrix}\begin{bmatrix} Q_1\\ Q_2\end{bmatrix}W = \begin{bmatrix} C\\ S\end{bmatrix}, $$
where $C = \operatorname{diag}(c_1,\dots,c_n) \ge 0$ and $S = \operatorname{diag}(s_1,\dots,s_n) \ge 0$ with $c_i^2 + s_i^2 = 1$, $i = 1,\dots,n$.

Proof: Let $U_1^TQ_1W = C$ be the SVD of $Q_1$. Consider
$$ \begin{bmatrix} U_1^T & 0\\ 0 & I\end{bmatrix}\begin{bmatrix} Q_1\\ Q_2\end{bmatrix}W = \begin{bmatrix} C\\ Q_2W\end{bmatrix}. $$
This matrix has orthonormal columns. Define $\tilde Q_2 = Q_2W$. Then $C^2 + \tilde Q_2^T\tilde Q_2 = I$, so $\tilde Q_2^T\tilde Q_2 = I - C^2$ is diagonal, which means that the nonzero columns of $\tilde Q_2$ are orthogonal to one another. If all the columns of $\tilde Q_2$ are nonzero, set $S^2 = \tilde Q_2^T\tilde Q_2$ and $U_2 = \tilde Q_2S^{-1}$; then $U_2^TU_2 = I$ and $U_2^T\tilde Q_2 = S$, and the decomposition follows. If $\tilde Q_2$ has zero columns, normalize the nonzero columns and replace the zero columns with an orthonormal basis for the orthogonal complement of the column space of $\tilde Q_2$. It is easily verified that $U_2$ so defined is orthogonal and that $S \equiv U_2^T\tilde Q_2$ is diagonal; the decomposition again follows.

Theorem 5.1.2 (C-S decomposition) Let the unitary matrix $W \in \mathbb{C}^{n\times n}$ be partitioned in the form $W = \begin{bmatrix} W_{11} & W_{12}\\ W_{21} & W_{22}\end{bmatrix}$, where $W_{11} \in \mathbb{C}^{r\times r}$ with $r \le \tfrac n2$. Then there exist unitary matrices $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$, with $U_1, V_1 \in \mathbb{C}^{r\times r}$, such that
$$ U^*WV = \begin{bmatrix} \Gamma & -\Sigma & 0\\ \Sigma & \Gamma & 0\\ 0 & 0 & I\end{bmatrix}\begin{matrix} \}r\\ \}r\\ \}n-2r\end{matrix}, \tag{5.1.5} $$
where $\Gamma = \operatorname{diag}(\gamma_1,\dots,\gamma_r) \ge 0$ and $\Sigma = \operatorname{diag}(\sigma_1,\dots,\sigma_r) \ge 0$ with $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1,\dots,r$.

Proof: Let $\Gamma = U_1^*W_{11}V_1$ be the SVD of $W_{11}$ with the diagonal elements of $\Gamma$ ordered as $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_k < 1 = \gamma_{k+1} = \cdots = \gamma_r$, i.e., $\Gamma = \operatorname{diag}(\tilde\Gamma, I_{r-k})$. The matrix $\begin{bmatrix} W_{11}\\ W_{21}\end{bmatrix}V_1$ has orthonormal columns, hence
$$ I = V_1^*\begin{bmatrix} W_{11}\\ W_{21}\end{bmatrix}^*\begin{bmatrix} W_{11}\\ W_{21}\end{bmatrix}V_1 = \Gamma^2 + (W_{21}V_1)^*(W_{21}V_1). $$
Since $I$ and $\Gamma^2$ are diagonal, $(W_{21}V_1)^*(W_{21}V_1)$ is diagonal, so the columns of $W_{21}V_1$ are orthogonal. Since the $i$th diagonal entry of $I - \Gamma^2$ is the squared norm of the $i$th column of $W_{21}V_1$, only the first $k$ ($k \le r \le n-r$) columns of $W_{21}V_1$ are nonzero. Let $\tilde U_2 \in \mathbb{C}^{(n-r)\times(n-r)}$ be unitary whose first $k$ columns are the normalized columns of $W_{21}V_1$. Then $W_{21}V_1 = \tilde U_2\begin{bmatrix}\Sigma\\ 0\end{bmatrix}$, where $\Sigma = \operatorname{diag}(\sigma_1,\dots,\sigma_k,0,\dots,0) \equiv \operatorname{diag}(\tilde\Sigma, 0)$. Since
$$ \operatorname{diag}(U_1,\tilde U_2)^*\begin{bmatrix} W_{11}\\ W_{21}\end{bmatrix}V_1 = \begin{bmatrix} \Gamma\\ \Sigma\\ 0\end{bmatrix} $$
has orthonormal columns, we have $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1,\dots,r$ (so $\tilde\Sigma$ is nonsingular). By the same argument as above there is a unitary $V_2 \in \mathbb{C}^{(n-r)\times(n-r)}$ such that $U_1^*W_{12}V_2 = (T, 0)$, where $T = \operatorname{diag}(\tau_1,\dots,\tau_r)$ with $\tau_i \le 0$. Since $\gamma_i^2 + \tau_i^2 = 1$, it follows from $\gamma_i^2 + \sigma_i^2 = 1$ that $T = -\Sigma$. Set $U = \operatorname{diag}(U_1,\tilde U_2)$ and $V = \operatorname{diag}(V_1, V_2)$. Then $X = U^*WV$ can be partitioned in the form
$$ X = \begin{bmatrix} \tilde\Gamma & 0 & -\tilde\Sigma & 0 & 0\\ 0 & I & 0 & 0 & 0\\ \tilde\Sigma & 0 & X_{33} & X_{34} & X_{35}\\ 0 & 0 & X_{43} & X_{44} & X_{45}\\ 0 & 0 & X_{53} & X_{54} & X_{55}\end{bmatrix}\begin{matrix} \}k\\ \}r-k\\ \}k\\ \}r-k\\ \}n-2r\end{matrix}. $$
Since columns 1 and 4 are orthogonal, it follows that $\tilde\Sigma X_{34} = 0$; thus $X_{34} = 0$ (since $\tilde\Sigma$ is nonsingular). Likewise $X_{35}, X_{43}, X_{53} = 0$. From the orthogonality of columns 1 and 3 it follows that $-\tilde\Gamma\tilde\Sigma + \tilde\Sigma X_{33} = 0$, so $X_{33} = \tilde\Gamma$. The matrix $\tilde U_3 = \begin{bmatrix} X_{44} & X_{45}\\ X_{54} & X_{55}\end{bmatrix}$ is unitary. Set $U_2 = \operatorname{diag}(I_k,\tilde U_3)^H\tilde U_2$ and $U = \operatorname{diag}(U_1, U_2)$. Then $U^*WV = \operatorname{diag}(I_{r+k},\tilde U_3^*)X$ with
$$ X = \begin{bmatrix} \tilde\Gamma & 0 & -\tilde\Sigma & 0 & 0\\ 0 & I & 0 & 0 & 0\\ \tilde\Sigma & 0 & \tilde\Gamma & 0 & 0\\ 0 & 0 & 0 & I & 0\\ 0 & 0 & 0 & 0 & I\end{bmatrix}. $$
The theorem is proved.

Theorem 5.1.3 Let $W = [W_1, W_2]$ and $Z = [Z_1, Z_2]$ be orthogonal, where $W_1, Z_1 \in \mathbb{R}^{n\times k}$ and $W_2, Z_2 \in \mathbb{R}^{n\times(n-k)}$. If $S_1 = R(W_1)$ and $S_2 = R(Z_1)$, then
$$ \operatorname{dist}(S_1, S_2) = \sqrt{1 - \sigma_{\min}^2(W_1^TZ_1)}. \tag{5.1.6} $$

Proof: Let $Q = W^TZ$ and assume that $k \ge j = n-k$. Let the C-S decomposition of $Q$ be given by (5.1.4), with $Q_{ij} = W_i^TZ_j$, $i, j = 1, 2$. It follows that
$$ \|W_1^TZ_2\|_2 = \|W_2^TZ_1\|_2 = s_j = \sqrt{1 - c_j^2} = \sqrt{1 - \sigma_{\min}^2(W_1^TZ_1)}. $$
Since $W_1W_1^T$ and $Z_1Z_1^T$ are the orthogonal projections onto $S_1$ and $S_2$, respectively, we have
$$ \operatorname{dist}(S_1,S_2) = \|W_1W_1^T - Z_1Z_1^T\|_2 = \|W^T(W_1W_1^T - Z_1Z_1^T)Z\|_2 = \left\|\begin{bmatrix} 0 & W_1^TZ_2\\ -W_2^TZ_1 & 0\end{bmatrix}\right\|_2 = s_j. $$
If $k < j$, apply the above argument with $Q = [W_2, W_1]^T[Z_2, Z_1]$ and note that $\sigma_{\min}(W_2^TZ_2) = \sigma_{\min}(W_1^TZ_1)$, which again gives $\operatorname{dist}(S_1,S_2) = s_j$.
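Equation (5.1.6) gives a direct way to compute the distance between two subspaces from orthonormal bases. Below is a minimal NumPy check of (5.1.6) against Definition 5.1.2; the helper name and the use of random test subspaces are illustrative assumptions.

```python
import numpy as np

def subspace_distance(W1, Z1):
    """dist(S1,S2) = sqrt(1 - sigma_min(W1^T Z1)^2) for orthonormal bases W1, Z1, cf. (5.1.6)."""
    smin = np.linalg.svd(W1.T @ Z1, compute_uv=False).min()
    return np.sqrt(max(0.0, 1.0 - smin**2))

# consistency check with dist(S1,S2) = ||P1 - P2||_2 from (5.1.3)
rng = np.random.default_rng(0)
n, k = 7, 3
W1, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of S1
Z1, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of S2
P1, P2 = W1 @ W1.T, Z1 @ Z1.T
assert abs(subspace_distance(W1, Z1) - np.linalg.norm(P1 - P2, 2)) < 1e-12
```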

5.2 Perturbation Theory
Theorem 5.2.1 (Gerschgorin circle theorem) If $X^{-1}AX = D + F$, with $D = \operatorname{diag}(d_1,\dots,d_n)$ and $F$ having zero diagonal entries, then $\sigma(A) \subseteq \bigcup_{i=1}^n D_i$, where
$$ D_i = \Bigl\{z \in \mathbb{C} : |z - d_i| \le \sum_{j=1,\,j\neq i}^{n}|f_{ij}|\Bigr\}. $$
Proof: Suppose $\lambda \in \sigma(A)$ and assume without loss of generality that $\lambda \neq d_i$ for $i = 1,\dots,n$. Since $(D - \lambda I) + F$ is singular, it follows that
$$ 1 \le \|(D - \lambda I)^{-1}F\|_\infty = \sum_{j=1}^{n}|f_{kj}|\,/\,|d_k - \lambda| $$
for some $k$ ($1 \le k \le n$). But this implies that $\lambda \in D_k$.

Corollary 5.2.1 If the union $M_1 = \bigcup_{j=1}^{k}D_{i_j}$ of $k$ discs $D_{i_j}$, $j = 1,\dots,k$, and the union $M_2$ of the remaining discs are disjoint, then $M_1$ contains exactly $k$ eigenvalues of $A$ and $M_2$ exactly $n-k$ eigenvalues.

Proof: Let $B = X^{-1}AX = D + F$ and, for $t \in [0,1]$, set $B_t := D + tF$, so that $B_0 = D$ and $B_1 = B$. The eigenvalues of $B_t$ are continuous functions of $t$. Applying Theorem 5.2.1 to $B_t$, one finds that for $t = 0$ there are exactly $k$ eigenvalues of $B_0$ in $M_1$ and $n-k$ in $M_2$ (counting multiple eigenvalues). Since for $0 \le t \le 1$ all eigenvalues of $B_t$ likewise must lie in these discs, it follows for reasons of continuity that also $k$ eigenvalues of $A$ lie in $M_1$ and the remaining $n-k$ in $M_2$.

Remark 5.2.1 Take $X = I$, $A = \operatorname{diag}(A) + \operatorname{offdiag}(A)$, and consider the transformation $A \mapsto \Delta^{-1}A\Delta$ with $\Delta = \operatorname{diag}(\delta_1,\dots,\delta_n)$. The Gerschgorin discs become
$$ D_i = \Bigl\{z \in \mathbb{C} : |z - a_{ii}| \le \sum_{k=1,\,k\neq i}^{n}|a_{ik}|\frac{\delta_k}{\delta_i} =: \rho_i\Bigr\}. $$
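The discs of Remark 5.2.1 are trivial to evaluate numerically. A minimal NumPy sketch (the function name and the optional scaling vector are illustrative assumptions, not part of the notes):

```python
import numpy as np

def gerschgorin_discs(A, delta=None):
    """Return (centers, radii) of the Gerschgorin discs of A, optionally after the
    similarity transformation diag(delta)^(-1) A diag(delta) of Remark 5.2.1."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    d = np.ones(n) if delta is None else np.asarray(delta, dtype=float)
    centers = np.diag(A).copy()
    radii = np.array([sum(abs(A[i, k]) * d[k] / d[i] for k in range(n) if k != i)
                      for i in range(n)])
    return centers, radii

# Example 5.2.1-style use: a scaling diag(1, k*eps, k*eps) shrinks the first disc.
eps = 0.01
A = np.array([[1.0, eps, eps], [eps, 2.0, eps], [eps, eps, 2.0]])
print(gerschgorin_discs(A))
print(gerschgorin_discs(A, delta=[1.0, 1.02 * eps, 1.02 * eps]))
```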

Example 5.2.1 Let
$$ A = \begin{bmatrix} 1 & \varepsilon & \varepsilon\\ \varepsilon & 2 & \varepsilon\\ \varepsilon & \varepsilon & 2\end{bmatrix}, \qquad D_1 = \{z : |z-1| \le 2\varepsilon\},\quad D_2 = D_3 = \{z : |z-2| \le 2\varepsilon\}, \qquad 0 < \varepsilon \ll 1. $$
Transformation with $\Delta = \operatorname{diag}(1, k\varepsilon, k\varepsilon)$, $k > 0$, yields
$$ \tilde A = \Delta^{-1}A\Delta = \begin{bmatrix} 1 & k\varepsilon^2 & k\varepsilon^2\\ 1/k & 2 & \varepsilon\\ 1/k & \varepsilon & 2\end{bmatrix}. $$
For $\tilde A$ we have $\rho_1 = 2k\varepsilon^2$ and $\rho_2 = \rho_3 = \frac1k + \varepsilon$. The discs $D_1$ and $D_2 = D_3$ for $\tilde A$ are disjoint if
$$ \rho_1 + \rho_2 = 2k\varepsilon^2 + \frac1k + \varepsilon < 1. $$
For this to be true we must clearly have $k > 1$. The optimal value $k^*$, for which $D_1$ and $D_2$ (for $\tilde A$) touch one another, is obtained from $\rho_1 + \rho_2 = 1$. One finds
$$ k^* = \frac{2}{1 - \varepsilon + \sqrt{(1-\varepsilon)^2 - 8\varepsilon^2}} = 1 + \varepsilon + O(\varepsilon^2), $$
and thus $\rho_1 = 2k^*\varepsilon^2 = 2\varepsilon^2 + O(\varepsilon^3)$. Through the transformation $A \mapsto \Delta^{-1}A\Delta$ the radius $\rho_1$ of $D_1$ can thus be reduced from the initial $2\varepsilon$ to about $2\varepsilon^2$.

Theorem 5.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$, then
$$ \min_{\lambda\in\sigma(A)}|\lambda - \mu| \le \kappa_p(X)\,\|E\|_p, $$
where $\|\cdot\|_p$ is the $p$-norm and $\kappa_p(X) = \|X\|_p\|X^{-1}\|_p$.

Proof: We need only consider the case $\mu \notin \sigma(A)$. Since $X^{-1}(A + E - \mu I)X$ is singular, so is $I + (D - \mu I)^{-1}(X^{-1}EX)$. Thus
$$ 1 \le \|(D - \mu I)^{-1}(X^{-1}EX)\|_p \le \|(D - \mu I)^{-1}\|_p\,\|X^{-1}\|_p\,\|E\|_p\,\|X\|_p = \frac{\kappa_p(X)\,\|E\|_p}{\min_{\lambda\in\sigma(A)}|\lambda - \mu|}. $$

Theorem 5.2.3 Let $Q^*AQ = D + N$ be a Schur decomposition of $A$ with $D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$ and $N$ strictly upper triangular ($N^n = 0$). If $\mu \in \sigma(A+E)$, then
$$ \min_{\lambda\in\sigma(A)}|\lambda - \mu| \le \max\{\theta, \theta^{1/n}\}, \qquad\text{where } \theta = \|E\|_2\sum_{k=0}^{n-1}\|N\|_2^k. $$
Proof: Define $\delta = \min_{\lambda\in\sigma(A)}|\lambda - \mu|$. The theorem is true if $\delta = 0$. If $\delta > 0$, then $I - (\mu I - A)^{-1}E$ is singular and we have
$$ 1 \le \|(\mu I - A)^{-1}E\|_2 \le \|(\mu I - A)^{-1}\|_2\,\|E\|_2 = \|[(\mu I - D) - N]^{-1}\|_2\,\|E\|_2. $$
Since $(\mu I - D)$ is diagonal with $\|(\mu I - D)^{-1}\|_2 \le \delta^{-1}$, it follows that $[(\mu I - D)^{-1}N]^n = 0$, and therefore
$$ [(\mu I - D) - N]^{-1} = \sum_{k=0}^{n-1}[(\mu I - D)^{-1}N]^k(\mu I - D)^{-1}. $$
Hence we have
$$ 1 \le \frac{\|E\|_2}{\delta}\,\max\Bigl\{1, \frac{1}{\delta^{n-1}}\Bigr\}\sum_{k=0}^{n-1}\|N\|_2^k, $$
from which the theorem readily follows.

Example 5.2.2 If
$$ A = \begin{bmatrix} 1 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 4.001\end{bmatrix}, \qquad E = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0.001 & 0 & 0\end{bmatrix}, $$
then $\sigma(A+E) \doteq \{1.0001,\ 4.0582,\ 3.9427\}$, and $A$'s matrix of eigenvectors satisfies $\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 5.2.2 has order $10^4$, but the Schur bound in Theorem 5.2.3 has order $10^0$.

Theorems 5.2.2 and 5.2.3 each indicate potential eigenvalue sensitivity if $A$ is nonnormal. Specifically, if $\kappa_2(X)$ and $\|N\|_2^{n-1}$ are large, then small changes in $A$ can induce large changes in the eigenvalues.

Example 5.2.3 If
$$ A = \begin{bmatrix} 0 & I_9\\ 0 & 0\end{bmatrix}, \qquad E = \begin{bmatrix} 0 & 0\\ 10^{-10} & 0\end{bmatrix}, $$
then for all $\lambda \in \sigma(A)$ and $\tilde\lambda \in \sigma(A+E)$, $|\lambda - \tilde\lambda| = 10^{-1}$. So a change of order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues.

Let $\lambda$ be a simple eigenvalue of $A \in \mathbb{C}^{n\times n}$ and let $x$ and $y$ satisfy $Ax = \lambda x$ and $y^*A = \lambda y^*$ with $\|x\|_2 = \|y\|_2 = 1$. Using classical results from function theory, it can be shown that there exist differentiable $x(\varepsilon)$ and $\lambda(\varepsilon)$ such that $(A + \varepsilon F)x(\varepsilon) = \lambda(\varepsilon)x(\varepsilon)$ with $\|x(\varepsilon)\|_2 = 1$, $\|F\|_2 \le 1$, and such that $\lambda(0) = \lambda$ and $x(0) = x$. Differentiating and setting $\varepsilon = 0$:
$$ A\dot x(0) + Fx = \lambda\dot x(0) + \dot\lambda(0)x. $$
Applying $y^*$ to both sides and dividing by $y^*x$ gives $\dot\lambda(0) = y^*Fx/(y^*x)$.

Consider $f(x,y) = y^n + p_{n-1}(x)y^{n-1} + p_{n-2}(x)y^{n-2} + \cdots + p_1(x)y + p_0(x)$. Fix $x$; then $f(x,y) = 0$ has $n$ roots $y_1(x),\dots,y_n(x)$, and $f(0,y) = 0$ has $n$ roots $y_1(0),\dots,y_n(0)$.

Theorem 5.2.4 Suppose $y_i(0)$ is a simple root of $f(0,y) = 0$. Then there is $\delta_i > 0$ such that there is a simple root $y_i(x)$ of $f(x,y) = 0$ given by
$$ y_i(x) = y_i(0) + p_{i1}x + p_{i2}x^2 + \cdots \quad (\text{may terminate!}), $$
where the series is convergent for $|x| < \delta_i$ (so $y_i(x) \to y_i(0)$ as $x \to 0$).

Theorem 5.2.5 If $y_1(0) = \cdots = y_m(0)$ is a root of multiplicity $m$ of $f(0,y) = 0$, then there exists $\delta > 0$ such that there are exactly $m$ zeros of $f(x,y) = 0$ when $|x| < \delta$, having the following properties:
(a) $\sum_{i=1}^{r}m_i = m$, $m_i \ge 0$: the $m$ roots fall into $r$ groups;
(b) the roots in the group of size $m_i$ are the $m_i$ values of a series $y_1(0) + p_{i1}z + p_{i2}z^2 + \cdots$ corresponding to the $m_i$ different values of $z$ defined by $z = x^{1/m_i}$.

Let $\lambda_1$ be a simple root of $A$ and $x_1$ the corresponding eigenvector. Since $\lambda_1$ is simple, $(A - \lambda_1I)$ has at least one nonzero minor of order $n-1$; suppose it lies in the first $n-1$ rows of $(A - \lambda_1I)$. Take $x_1 = (A_{n1}, A_{n2}, \dots, A_{nn})^T$, where $A_{ni}$ is the cofactor of $a_{ni}$. Then
$$ (A - \lambda_1I)\begin{bmatrix} A_{n1}\\ A_{n2}\\ \vdots\\ A_{nn}\end{bmatrix} = \begin{bmatrix} 0\\ 0\\ \vdots\\ 0\end{bmatrix}, $$
since $\sum_{j=1}^n a_{nj}A_{nj} = \det(A - \lambda_1I) = 0$. Each cofactor $A_{ni}$ is a polynomial in $\lambda_1$ of degree not greater than $n-1$. Let $\lambda_1(\varepsilon)$ be the simple eigenvalue of $A + \varepsilon F$ and $x_1(\varepsilon)$ the corresponding eigenvector. Then the elements of $x_1(\varepsilon)$ are polynomials in $\lambda_1(\varepsilon)$ and $\varepsilon$. Since the power series for $\lambda_1(\varepsilon)$ is convergent for small $\varepsilon$, $x_1(\varepsilon) = x_1 + \varepsilon z_1 + \varepsilon^2 z_2 + \cdots$ is a convergent power series.

From the differentiated relation above, $|\dot\lambda(0)| = \dfrac{|y^*Fx|}{|y^*x|} \le \dfrac{1}{|y^*x|}$, and the upper bound is attained if $F = yx^*$. We refer to the reciprocal of $s(\lambda) \equiv |y^*x|$ as the condition number of the eigenvalue $\lambda$. Since
$$ \lambda(\varepsilon) = \lambda(0) + \dot\lambda(0)\varepsilon + O(\varepsilon^2), $$
an eigenvalue may be perturbed by an amount $\varepsilon/s(\lambda)$; if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple. A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue. In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$ such that $\lambda$ is a repeated eigenvalue of $A + E$ and
$$ \|E\|_2 \le \frac{s(\lambda)}{\sqrt{1 - s^2(\lambda)}}; $$
this is proved in Wilkinson (1972).

Example 5.2.4 If
$$ A = \begin{bmatrix} 1 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 4.001\end{bmatrix}, \qquad E = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0.001 & 0 & 0\end{bmatrix}, $$
then $\sigma(A+E) \doteq \{1.0001,\ 4.0582,\ 3.9427\}$ and $s(1) \approx 0.79\times 10^0$, $s(4) \approx 0.16\times 10^{-3}$, $s(4.001) \approx 0.16\times 10^{-3}$. Observe that $\|E\|_2/s(\lambda)$ is a good estimate of the perturbation that each eigenvalue undergoes.

If $\lambda$ is a repeated eigenvalue, the eigenvalue sensitivity question is more complicated. For example, if $A = \begin{bmatrix} 1 & a\\ 0 & 1\end{bmatrix}$ and $F = \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}$, then $\sigma(A + \varepsilon F) = \{1 \pm \sqrt{\varepsilon a}\}$. Note that if $a \neq 0$ then the eigenvalues of $A + \varepsilon F$ are not differentiable at zero; their rate of change at the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then $O(\varepsilon)$ perturbations in $A$ result in $O(\varepsilon^{1/p})$ perturbations in $\lambda$, where $p \ge 2$ (see Wilkinson, AEP, p. 77, for a more detailed discussion).

We now consider the perturbation of invariant subspaces. Assume $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues $\lambda_1,\dots,\lambda_n$ and $\|F\|_2 = 1$. We have
$$ (A + \varepsilon F)x_k(\varepsilon) = \lambda_k(\varepsilon)x_k(\varepsilon), \qquad \|x_k(\varepsilon)\|_2 = 1, $$

and
$$ y_k(\varepsilon)^*(A + \varepsilon F) = \lambda_k(\varepsilon)y_k(\varepsilon)^*, \qquad \|y_k(\varepsilon)\|_2 = 1, $$
for $k = 1,\dots,n$, where each $\lambda_k(\varepsilon)$, $x_k(\varepsilon)$ and $y_k(\varepsilon)$ is differentiable. Setting $\varepsilon = 0$ in the differentiated first relation gives
$$ A\dot x_k(0) + Fx_k = \lambda_k\dot x_k(0) + \dot\lambda_k(0)x_k, $$
where $\lambda_k = \lambda_k(0)$ and $x_k = x_k(0)$. Since $\{x_i\}_{i=1}^n$ is linearly independent, write $\dot x_k(0) = \sum_{i=1}^n a_ix_i$, so we have
$$ \sum_{i=1,\,i\neq k}^{n}a_i(\lambda_i - \lambda_k)x_i + Fx_k = \dot\lambda_k(0)x_k. $$
But $y_i(0)^*x_k = y_i^*x_k = 0$ for $i \neq k$, and thus
$$ a_i = \frac{y_i^*Fx_k}{(\lambda_k - \lambda_i)\,y_i^*x_i}, \qquad i \neq k. $$
Hence the Taylor expansion for $x_k(\varepsilon)$ is
$$ x_k(\varepsilon) = x_k + \varepsilon\sum_{i=1,\,i\neq k}^{n}\frac{y_i^*Fx_k}{(\lambda_k - \lambda_i)\,y_i^*x_i}\,x_i + O(\varepsilon^2). $$
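The two quantities that govern this first-order analysis, the eigenvalue condition numbers $s(\lambda_k) = |y_k^*x_k|$ and the separations $\lambda_k - \lambda_i$, are easy to evaluate numerically. The following NumPy sketch is illustrative only; it assumes distinct eigenvalues and obtains left eigenvectors from $A^*$ (the function name is an assumption, not from the notes).

```python
import numpy as np

def eigen_condition_numbers(A):
    """Return (eigenvalues, s) with s[k] = |y_k^* x_k| for each eigenvalue of A,
    where x_k, y_k are unit right/left eigenvectors (distinct eigenvalues assumed)."""
    lam, X = np.linalg.eig(A)            # right eigenvectors (columns of X)
    mu, Y = np.linalg.eig(A.conj().T)    # eigenvectors of A^*  ->  left eigenvectors of A
    order = [np.argmin(np.abs(mu.conj() - l)) for l in lam]   # match y_k to lambda_k
    s = np.empty(len(lam))
    for k, j in enumerate(order):
        x = X[:, k] / np.linalg.norm(X[:, k])
        y = Y[:, j] / np.linalg.norm(Y[:, j])
        s[k] = abs(np.vdot(y, x))
    return lam, s

A = np.array([[1.0, 2, 3], [0, 4, 5], [0, 0, 4.001]])   # matrix of Example 5.2.4
lam, s = eigen_condition_numbers(A)
print(lam, s)      # s(4) and s(4.001) come out of order 1e-4, as in the notes
```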

Thus the sensitivity of $x_k$ depends upon both the eigenvalue sensitivity and the separation of $\lambda_k$ from the other eigenvalues.

Example 5.2.5 If $A = \begin{bmatrix} 1.01 & 0.01\\ 0.00 & 0.99\end{bmatrix}$, then $\lambda = 0.99$ has condition $\frac{1}{s(0.99)} \approx 1.118$ and associated eigenvector $x = (0.4472, -0.8944)^T$. On the other hand, the eigenvalue $\lambda = 1.00$ of the nearby matrix $A + E = \begin{bmatrix} 1.01 & 0.01\\ 0.00 & 1.00\end{bmatrix}$ has eigenvector $x = (0.7071, -0.7071)^T$.

Suppose
$$ Q^*AQ = T = \begin{bmatrix} T_{11} & T_{12}\\ 0 & T_{22}\end{bmatrix}\begin{matrix} \}p\\ \}q = n-p\end{matrix} \tag{5.2.1} $$
is a Schur decomposition of $A$ with
$$ Q = [\underbrace{Q_1}_{p}, \underbrace{Q_2}_{n-p}]. \tag{5.2.2} $$

Definition 5.2.1 We define the separation between $T_{11}$ and $T_{22}$ by
$$ \operatorname{sep}_F(T_{11}, T_{22}) = \min_{Z\neq 0}\frac{\|T_{11}Z - ZT_{22}\|_F}{\|Z\|_F}. $$

Definition 5.2.2 Let $\mathcal X$ be a subspace of $\mathbb{C}^n$. $\mathcal X$ is called an invariant subspace of $A \in \mathbb{C}^{n\times n}$ if $A\mathcal X \subseteq \mathcal X$ (i.e., $x \in \mathcal X \Rightarrow Ax \in \mathcal X$).

Theorem 5.2.6 Let $A \in \mathbb{C}^{n\times n}$ and $V \in \mathbb{C}^{n\times r}$ with $\operatorname{rank}(V) = r$. Then the following are equivalent:
(a) there exists $S \in \mathbb{C}^{r\times r}$ such that $AV = VS$;
(b) $R(V)$ is an invariant subspace of $A$.
Proof: Trivial!

Remark 5.2.2 (a) If $Sz = \lambda z$, $z \neq 0$, then $\lambda$ is an eigenvalue of $A$ with eigenvector $Vz$. (b) If $V$ is a basis of $\mathcal X$, then $\tilde V = V(V^*V)^{-1/2}$ is an orthonormal basis of $\mathcal X$.

Theorem 5.2.7 Let $A \in \mathbb{C}^{n\times n}$ and $Q = (Q_1, Q_2)$ be unitary. Then the following are equivalent:
(a) if $Q^*AQ = B = \begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}$, then $B_{21} = 0$;
(b) $R(Q_1)$ is an invariant subspace of $A$.
Proof: $Q^*AQ = B \iff AQ = QB = (Q_1, Q_2)\begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}$, thus $AQ_1 = Q_1B_{11} + Q_2B_{21}$.
(a) If $B_{21} = 0$, then $AQ_1 = Q_1B_{11}$, so $R(Q_1)$ is an invariant subspace of $A$ (by Theorem 5.2.6).
(b) If $R(Q_1)$ is an invariant subspace, there exists $S$ such that $AQ_1 = Q_1S = Q_1B_{11} + Q_2B_{21}$. Multiplying by $Q_1^*$ gives $S = Q_1^*Q_1S = Q_1^*Q_1B_{11} + Q_1^*Q_2B_{21}$, so $S = B_{11}$; hence $Q_2B_{21} = 0$, and multiplying by $Q_2^*$ gives $B_{21} = 0$.

Theorem 5.2.8 Suppose (5.2.1) and (5.2.2) hold, and for $E \in \mathbb{C}^{n\times n}$ partition $Q^*EQ$ as
$$ Q^*EQ = \begin{bmatrix} E_{11} & E_{12}\\ E_{21} & E_{22}\end{bmatrix} $$
with $E_{11} \in \mathbb{R}^{p\times p}$ and $E_{22} \in \mathbb{R}^{(n-p)\times(n-p)}$. If
$$ \delta = \operatorname{sep}_2(T_{11}, T_{22}) - \|E_{11}\|_2 - \|E_{22}\|_2 > 0 \qquad\text{and}\qquad \|E_{21}\|_2\,(\|T_{12}\|_2 + \|E_{12}\|_2) \le \delta^2/4, $$
then there exists $P \in \mathbb{C}^{(n-p)\times p}$ with
$$ \|P\|_2 \le 2\,\frac{\|E_{21}\|_2}{\delta} $$
such that the columns of $\hat Q_1 = (Q_1 + Q_2P)(I + P^*P)^{-1/2}$ form an orthonormal basis for an invariant subspace of $A + E$. (See Stewart 1973.)
5.2 Perturbation Theory Lemma 5.2.1 Let {sm } and {pm } be two sequence dened by sm+1 = sm /(1 2pm sm ), and s0 = , satisfying 4 2 < 1. (Here , , > 0) p0 = pm+1 = p2 m sm+1 , m = 0, 1, 2,

175

(5.2.3) (5.2.4) (5.2.5)

Then {sm } is monotonic increasing and bounded above; {pm } is monotonic decreasing, converges quadratically to zero. Proof: Let xm = s m p m , From (5.2.3) we have
2 2 2 2 xm+1 = sm+1 pm+1 = p2 m sm /(1 2pm sm ) = xm /(1 2xm ) ,

m = 0, 1, 2, .

(5.2.6)

(5.2.7)

(5.2.5) can be written as 0 < x0 < Consider x = f (x), By df (x) 2x = , dx (1 2x)3 df (x) |x=0 = 0 dx : The equation (5.2.9) has zeros 0 and 1/4 in [0, 1/2 ). Under Condition (5.2.8) the iteration xm as in (5.2.7) must be monotone decreasing converges quadratically to zero. (Issacson &Keller Analysis of Num. Method 1996, Chapter 3 1.) Thus we know that f (x) is dierentiable and monotonic increasing in [0, 1/2 ), and sm+1 1 2xm = =1+ = 1 + tm , sm 1 2xm 1 2xm where tm is monotone decreasing, converges quadratically to zero, hence sm+1 sj +1 = s0 (1 + tj ) = s0 s j j =0 j =0
m m

1 1 . (since x0 = s0 p0 = 2 < ) 4 4 f (x) = x2 /(1 2x)2 , x 0.

(5.2.8)

(5.2.9)

monotone increasing, and converges to s0


j =0

(1 + tj ) < , so pm =

xm monotone desm

creasing, and quadratically convergent to zero.

Theorem 5.2.9 Let
$$ PA_{12}P + PA_{11} - A_{22}P - A_{21} = 0 \tag{5.2.10} $$
be the quadratic matrix equation in $P \in \mathbb{C}^{(n-l)\times l}$, where
$$ A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix} \quad(1 \le l \le n), \qquad \sigma(A_{11}) \cap \sigma(A_{22}) = \emptyset. $$
Define the operator $T$ by
$$ TQ \equiv QA_{11} - A_{22}Q, \qquad Q \in \mathbb{C}^{(n-l)\times l}, \tag{5.2.11} $$
and let
$$ \eta = \|A_{12}\|, \qquad \sigma = \|T^{-1}\| = \sup_{\|P\|=1}\|T^{-1}P\|, \qquad \gamma = \|A_{21}\|. \tag{5.2.12, 5.2.13} $$
If
$$ 4\eta\gamma\sigma^2 < 1, \tag{5.2.14} $$
then the following iteration produces a solution $P$ of (5.2.10) satisfying
$$ \|P\| \le 2\gamma\sigma, \tag{5.2.15} $$
and the iteration converges quadratically.

Iteration: let $A_m = \begin{bmatrix} A_{11}^{(m)} & A_{12}^{(m)}\\ A_{21}^{(m)} & A_{22}^{(m)}\end{bmatrix}$, $A_0 = A$.
(i) Solve
$$ T_mP_m \equiv P_mA_{11}^{(m)} - A_{22}^{(m)}P_m = A_{21}^{(m)} \tag{5.2.16} $$
and get $P_m \in \mathbb{C}^{(n-l)\times l}$;
(ii) compute
$$ A_{11}^{(m+1)} = A_{11}^{(m)} + A_{12}^{(m)}P_m, \qquad A_{22}^{(m+1)} = A_{22}^{(m)} - P_mA_{12}^{(m)}, \qquad A_{21}^{(m+1)} = -P_mA_{12}^{(m)}P_m, \qquad A_{12}^{(m+1)} = A_{12}^{(m)}; $$
go to (i) and solve for $P_{m+1}$. Then
$$ P = \lim_{m\to\infty}\sum_{i=0}^{m}P_i \tag{5.2.17} $$
is a solution of (5.2.10) and satisfies (5.2.15).

5.2 Perturbation Theory Proof: (a) Prove that for m = 0, 1, 2, ,


1 Tm = m ,

177
1 Tm

exist: denote = 0 ), (5.2.18) (5.2.19)

( T = T0 , Pm

then 4 By induction, m = 0, from (A11 ) (5.2.12)-(5.2.14) it holds 4 A12 P0 A12 m < 1. (A22 ) = we have T0 = T is nonsingular. From T 1 A21 4 2 < 1.

0 = 4

1 1 Suppose Tm exists, and (5.2.19) holds, prove that Tm +1 exists and

4 From the denition

A12

Pm+1

m+1 < 1. QA11 A22 Q


1

sep(A11 , A22 ) = inf

Q =1

and the existence of T 1 follows sep(A11 , A22 ) = T 1 property of sep follows sep(A11
(m+1)

= 1 , and by the perturbation


(m)

, A22

(m+1) (m)

) = sep(A11 + A12 Pm , A22 Pm A12 ) Pm A12 (5.2.20)

(m)

sep(A11 , A22 ) A12 Pm 1 2 A12 Pm m > 0. m From

(m)

sep(A11 , A22 ) min{|1 2 | : 1 (A11 ), 2 (A22 )}. We have (A11


(m+1)

(A22

(m+1)

1 ) = , hence Tm +1 exists and

sep(A11 From (5.2.20) it follows

(m+1)

, A22

(m+1)

1 ) = Tm +1

1 = m +1 .

m+1

m 12 A12 Pm m

(5.2.21)

Substitute (5.2.19) into (5.2.21), we get m+1 2m , and


1 Pm+1 Tm +1 +1 Am m+1 21

Pm

A12 <

1 2

Pm

Hence 2 A12 Pm+1 m+1 2 A12 Pm m < 1/2.

1 exists for all m = 0, 1, 2, and (5.2.19) holds. This proved that Tm

178 (b) Prove Pm satisfying From follows Dene {qm } by

Chapter 5. The Unsymmetric Eigenvalue Problem is quadratic convergence to zero. Construct sequences {qm }, {sm }, {pm } A21
(m)

qm , A21
(m+1)

m sm ,

Pm pm .

(5.2.22) (5.2.23) (5.2.24) (5.2.25) (5.2.26)

= Pm A12 Pm Pm
2

A21

(m+1)

A12

p2 m.

qm+1 = p2 m, m+1

q0 = ;

From (5.2.21) we have

sm . 1 2pm sm s0 = ;

Dene {sm } by sm+1 = From (5.2.16) we have


1 Pm Tm

sm , 1 2pm sm

(5.2.27)

A21

(m)

= m

A21

(m)

sm qm . (5.2.28)

Dene {pm } by

pm+1 = sm+1 qm+1 = p2 m sm+1 ,

p0 = .

By Lemma 5.2.1 follows that {pm } 0 monotone and form (5.2.22) follows that Pm 0 quadratically. (c) Prove P (m) P and (5.2.15) holds. According to the method as in Lemma 5.2.1. Construct {xm } (see (5.2.6),(5.2.7) ), that is xm+1 = and then pm+1 = x2 m , (1 2xm )2 sm+1 = sm 1 2xm (5.2.29)

xm+1 xm = pm . sm+1 1 2xm 1 . 4

(5.2.30)

By induction! For all m = 1, 2, we have 1 pm < pm1 , 2 In fact, substitute xm < (5.2.31)

x0 2 1 = < 2 1 2x0 1 2 2

(5.2.32)

1 into (5.2.30) and get p1 < p0 ; From (5.2.29) and (5.2.32) it follows that 2 1 x1 = x0 1 2x0
2

<

1 . 4

5.2 Perturbation Theory For m = 1, (5.2.31) holds. Suppose for m (5.2.31) holds, form (5.2.30), we have 1 pm+1 < pm ; 2

179

xm 1 1 by (5.2.29) it holds xm+1 = < , that is (5.2.31) holds for m + 1. Hence 1 2xm 4 (5.2.31) holds for all nature number m. Therefore pm < p0 /2m , m = 1, 2, , hence p(m) converges, where
m m

p Let

(m)

=
i=0

pi <
i=0

1 2i

p0 = 2 1

1 2m+1

p0 .

(5.2.33)

(m)

=
i=0

Pi .

From (5.2.22),(5.2.28) and (5.2.33) follows that


m m

(m)

i=0

Pi
i=0

pi < 2 1

1 2m+1

p0 = 2 1

1 2m+1

Let m , then (5.2.15) holds. By (b) the limit matrix P as in (5.2.17) is quadratic convergence. Theorem 5.2.10 Let A, E Cnn , Z1 Cnl be the eigenmatrix of A corresponding H to A11 Cll (i.e. AZ1 = Z1 A11 ) and Z1 Z1 = I, 1 l n. Let Z = (Z1 , Z2 ) be unitary. Denote Z AZ = A11 A12 0 A22 , Z EZ = E11 E12 E21 E22 T 1 . ( E11 + E22

Dene T as in (5.2.11). Suppose (A11 ) ) < 1. Let = If T 1 ( E11 ,

(A22 ) = and

T 1

E22 )

= A12

E12 ,

= E21

(5.2.34)

4 2 < 1, then there exists P C(nl)l with P 2 such that

(5.2.35)

1 = Z1 + Z2 P Cnl Z

(5.2.36)

= A + E corresponding to A = A11 + E11 + (A12 + E12 )P . is the eigenmatrix of A

180

Chapter 5. The Unsymmetric Eigenvalue Problem Il 0 Proof: Prove that there exists L = with P 2 such that P Inl L1 A11 + E11 A12 + E12 E21 A22 + E22 L= A 11 0 . (5.2.37)

This is resulted from solving the following equation P (A12 + E12 )P + P (A11 + E11 ) (A22 + E22 )P E21 = 0. Let P P (A11 + E11 ) (A22 + E22 )P. T By (5.2.34),(5.2.35) and 1 = { inf T { inf we have 4 (A12 + E12 ) 1 T
2 P =1 P =1

(5.2.38)

P (A11 + E11 ) (A22 + E22 )P }1 sup


P =1

P A11 A22 P T ( E11


1

P E11 E22 P }1 = ,

T 1

E22 )

E21 4 2 < 1.

Because the condition (5.2.14) in Theorem 5.2.9 is satised, by Theorem 5.2.9, the equation (5.2.38) has a solution P satisfying P 2 . Then it follows the result from (5.2.37). Remark 5.2.3 Normalized Z1 + Z2 P (Z1 + Z2 P )(I + P H P ) dist(Z1 , (Z1 + Z2 P )(I + P H P ) = = = = Example 5.2.6 6 1 A= 1 4 0 0
1 2 1 2

. Consider

)
1 2

2 H [Z1 (Z1 + Z2 P )(I + P H P ) 1 min 2 1 min [(I + P H P )


1 2

1 [max (I + P H P )]1 1 P 1+ 1 1+ P
2 2 2

2 2

Let n = 3, l = 2, k = 1, 0.5 0.1 0.3 1 A11 A12 0 = , E = 0.4 0.3 0.2 = A21 A22 1 0.3 0.2 0.3

E11 E12 , E21 E22

5.3 Power Iterations 6.5 1.1 1.3 = A + E = 0.6 4.3 0.2 = A 0.3 0. 2 1.3 5 1 0 The Jordan form of A is 0 5 0 , 0 0 1 1 0 The eigenmatrix of A is Z1 = 0 1 , 0 0
1

181 11 A 12 A 22 A21 A .

(A11 ) = {5, 5},

(A22 ) = {1}.

which satises AZ1 = Z1 A11 .


1

3 5 1 Question 1: Compute T = = , and A12 = 1, E12 = 0.3, 1 3 8 E21 = 0.5, E11 = 0.7, E22 = 0.3, to make sure the conditions in Theorem 117 5.2.10, which are = 0.6, = 1.3, = 0.5. Then check 4 2 = < 1, i.e., 125 (5.2.35) is satised. . For all m = 0, 1, 2, , we solve Question 2: From theorem 5.2.9, take A0 = A Pm A11 A22 Pm = A21 , and get Pm =
(m+1) (m) A21 Sm , (m) (m) A11 (m) (m) (m)

where Sm =
(m+1)

A22 0

(m)

0 (m) A22

.
(m+1)

Compute A11 = A11 + A12 Pm , A22 And then go back to solve Pm+1 . Then P Compute A21
(m) =?

= A22 Pm A12 and A21


m

(m)

= Pm A12 Pm .

(m)

=
i=0

Pi .
=?

Compute A4 =

when m = 0, 1, 2, 3. and Pm I 0 I 0 A ?. (3) (3) P I P I I P (3) =?.

when m = 0, 1, 2, 3.

(3) 1 Z 1 Compute Z =

= A(4) Compute A 11 =?.

5.3 Power Iterations

Given $A \in \mathbb{C}^{n\times n}$ and a unitary $U_0 \in \mathbb{C}^{n\times n}$, consider the following iteration:
$$ T_0 = U_0^*AU_0; \qquad \text{for } k = 1,2,3,\dots:\quad T_{k-1} = U_kR_k \ (\text{QR factorization of } T_{k-1}), \quad T_k = R_kU_k. \tag{5.3.1, 5.3.2} $$
Since $T_k = (U_0U_1\cdots U_k)^*A(U_0U_1\cdots U_k)$, it is obvious that each $T_k$ is unitarily similar to $A$. Does (5.3.2) always converge to a Schur decomposition of $A$? Iteration (5.3.1) is called the QR iteration (see Section 5.4).

182

Chapter 5. The Unsymmetric Eigenvalue Problem

5.3.1

Power Method
Axi = i xi , i = 1, , n (5.3.3) (5.3.4)

Let $A$ be a diagonalizable matrix,
$$ Ax_i = \lambda_ix_i, \quad i = 1,\dots,n, \qquad |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|, \tag{5.3.3, 5.3.4} $$
and let $u_0 \neq 0$ be a given vector. From the expansion
$$ u_0 = \sum_{i=1}^{n}\alpha_ix_i \tag{5.3.5} $$
it follows that
$$ A^su_0 = \sum_{i=1}^{n}\alpha_i\lambda_i^sx_i = \lambda_1^s\Bigl\{\alpha_1x_1 + \sum_{i=2}^{n}\alpha_i\Bigl(\frac{\lambda_i}{\lambda_1}\Bigr)^sx_i\Bigr\}. \tag{5.3.6} $$
Thus the sequence $\lambda_1^{-s}A^su_0$ converges to a multiple of $x_1$. We consider two possibilities of normalization:

(A) with a given vector norm: for $i = 0,1,2,\dots$,
$$ v_{i+1} = Au_i, \qquad k_{i+1} = \|v_{i+1}\|, \qquad u_{i+1} = v_{i+1}/k_{i+1}, \qquad\text{with initial } u_0. \tag{5.3.7} $$

Theorem 5.3.1 Under the assumption (5.3.4) and with $\alpha_1 \neq 0$ in (5.3.5), the sequence defined by (5.3.7) satisfies
$$ \lim_{i\to\infty}k_i = |\lambda_1|, \qquad \lim_{i\to\infty}\varepsilon^iu_i = \frac{\alpha_1x_1}{\|\alpha_1x_1\|}, \qquad\text{where } \varepsilon = \frac{|\lambda_1|}{\lambda_1}. $$
Proof: It is obvious that
$$ u_s = A^su_0/\|A^su_0\|, \qquad k_s = \|A^su_0\|/\|A^{s-1}u_0\|. \tag{5.3.8} $$
From (5.3.6) it follows that $|\lambda_1|^{-s}\|A^su_0\| \to \|\alpha_1x_1\|$ and $|\lambda_1|^{-(s-1)}\|A^{s-1}u_0\| \to \|\alpha_1x_1\|$, and then
$$ |\lambda_1|^{-1}k_s = |\lambda_1|^{-1}\|A^su_0\|/\|A^{s-1}u_0\| \to 1. $$

From (5.3.6) follows now for s s us = s 1 x1 x1 1 = . 1 x1 x1 |1 | (5.3.9)

(B) with a linear functional $\ell$: for $i = 0,1,2,\dots$,
$$ v_{i+1} = Au_i, \qquad k_{i+1} = \ell(v_{i+1})\quad(\text{e.g. } e_n^Tv_{i+1} \text{ or } e_1^Tv_{i+1}), \qquad u_{i+1} = v_{i+1}/k_{i+1}, \qquad\text{with initial } u_0. \tag{5.3.10} $$

Theorem 5.3.2 Under the assumptions of Theorem 5.3.1, consider the method defined by (5.3.10) and suppose that $\ell(v_i) \neq 0$ for $i = 1,2,\dots$, and $\ell(x_1) \neq 0$. Then
$$ \lim_{i\to\infty}k_i = \lambda_1 \qquad\text{and}\qquad \lim_{i\to\infty}u_i = \frac{x_1}{\ell(x_1)}. $$
Proof: As above one shows that $u_i = A^iu_0/\ell(A^iu_0)$ and $k_i = \ell(A^iu_0)/\ell(A^{i-1}u_0)$. From (5.3.6) we get, for $s \to \infty$, $\lambda_1^{-s}\ell(A^su_0) \to \alpha_1\ell(x_1)$ and $\lambda_1^{-(s-1)}\ell(A^{s-1}u_0) \to \alpha_1\ell(x_1)$, thus $\lambda_1^{-1}k_s \to 1$. Similarly, for $i \to \infty$,
$$ u_i = \frac{A^iu_0}{\ell(A^iu_0)} = \frac{\alpha_1x_1 + \sum_{j=2}^{n}\alpha_j(\lambda_j/\lambda_1)^ix_j}{\ell\bigl(\alpha_1x_1 + \sum_{j=2}^{n}\alpha_j(\lambda_j/\lambda_1)^ix_j\bigr)} \longrightarrow \frac{x_1}{\ell(x_1)}. \tag{5.3.11} $$

Remark 5.3.1 (a) As linear functional $\ell$ one will usually choose a fixed component $k$: $\ell(x) = x_k$. (b) The above argument also holds if $\lambda_1$ is a multiple eigenvalue. (c) For the iteration (5.3.10) one has
$$ k_s = \frac{\ell(A^su_0)}{\ell(A^{s-1}u_0)} = \lambda_1\,\frac{\alpha_1\ell(x_1) + \sum_{j=2}^{n}\alpha_j(\lambda_j/\lambda_1)^s\ell(x_j)}{\alpha_1\ell(x_1) + \sum_{j=2}^{n}\alpha_j(\lambda_j/\lambda_1)^{s-1}\ell(x_j)} = \lambda_1\Bigl(1 + O\bigl(|\lambda_2/\lambda_1|^{s-1}\bigr)\Bigr). \tag{5.3.12} $$

2 2 That is the convergence depends on | |. In the case | | = 1 the iteration does 1 1 2 not converge. Sometimes one can make the number | 1 | small if we replace A with A + I , then the eigenvalue i of A are transformed into i + and the convergence i + s |) . But this correction is not remarkable. The will be described by (maxi=1 | 1 + more useful method is use the inverse iteration . (See later !)
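The normalized iteration (5.3.10) is only a few lines of code. Below is a minimal NumPy sketch of variant (B) with $\ell(x) = x_n$ (the last component); the function name and the stopping test are illustrative assumptions, not part of the notes.

```python
import numpy as np

def power_method(A, u0, maxit=200, tol=1e-12):
    """Power method (5.3.10): returns an approximation (k, u) to (lambda_1, x_1/l(x_1)),
    using the linear functional l(x) = x[-1] (last component)."""
    u = np.array(u0, dtype=float)
    k_old = 0.0
    for _ in range(maxit):
        v = A @ u                      # v_{i+1} = A u_i
        k = v[-1]                      # k_{i+1} = l(v_{i+1})
        u = v / k                      # u_{i+1} = v_{i+1} / k_{i+1}
        if abs(k - k_old) <= tol * abs(k):
            break
        k_old = k
    return k, u

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(power_method(A, np.ones(3)))     # k converges to the dominant eigenvalue
```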

184 Chapter 5. The Unsymmetric Eigenvalue Problem Now consider the case : A real and 2 , |1 | > |3 | |n |. 1 = We can choose x2 = x 1 such that 1x Ax1 = 1 x1 , Ax 1 = 1 = 2 x 1 . Let u0 be real and let
n

(5.3.13)

u0 = 1 x 1 + 1x 1 +
i3

i xi , 1 = ei , 1 = ei .

Then from (5.3.6) and (5.3.10) we have A u0 =


s

1 s 1 x1
s

1 s x + 1 1 + x1 + e

i s i xi
i3 n

= {e

i(+s )

i(+s )

x 1 +
i3

i (

i s ) xi }.

It happens oscillation without convergence! Let 1 ) = 2 p q, p = 1 + 1 q = 1 1. h() = ( 1 )( Then


n

(A

s+2

pA

s+1

s h( 1) x qA )u0 = 1 1 h(1 ) x1 + 1 1 + 1
s s =0 =0 i=3

i h(i )i s xi .

Together with
n

l(A u0 ) = r {e follows

i(+s )

l(x1 ) + e

i(+s )

l( x1 ) +
i=3

i (

i s ) l(xi )}

(As+2 pAs+1 qAs )u0 ks+2 ks+1 us+2 pks+1 us+1 qus = 0. l(As u0 )

In this limit case us+2 , us+1 and us are linearly dependent. For x s we determine ps and qs such that ks+2 ks+1 us+2 ps ks+1 us+1 qs us 2 = min! We have to project the lot of ks+2 ks+1 us+2 on the plane determined by ks+1 us+1 and us , this leads ks+2 ks+1 us+2 pks+1 us+1 qus us+i , i = 0, 1 or
T uT s+1 us+1 us+1 us uT uT s us s us+1

ps ks+1 qs

= ks+1 ks+2

uT s+1 us+2 uT s us+2

(5.3.14)

We can show that ps p, qs q .

5.3 Power Iterations

185

5.3.2

Inverse Power Iteration

Let $\mu$ be an approximation to the eigenvalue $\lambda_1$, i.e., $\mu \approx \lambda_1$. Then $(\mu I - A)^{-1}$ has the eigenvalues $(\mu - \lambda_1)^{-1}, (\mu - \lambda_2)^{-1}, \dots, (\mu - \lambda_n)^{-1}$. Substituting $A$ by $(\mu I - A)^{-1}$, the convergence is determined by $\max_{i\neq 1}\bigl|\frac{\mu - \lambda_1}{\mu - \lambda_i}\bigr|$. Consider: for $i = 0,1,2,\dots$,
$$ v_{i+1} = (\mu I - A)^{-1}u_i, \qquad k_{i+1} = \ell(v_{i+1}), \qquad u_{i+1} = v_{i+1}/k_{i+1}, \qquad\text{with initial vector } u_0. \tag{5.3.15} $$
Let $A$ and $u_0$ be given and satisfy (5.3.3) and (5.3.5), respectively. Then we have the following theorem.

Theorem 5.3.3 If $|\mu - \lambda_1| < |\mu - \lambda_i|$ for $i \neq 1$, and if $\alpha_1 \neq 0$, $\ell(x_1) \neq 0$, and $\ell(v_i) \neq 0$ for all $i$ in (5.3.15), then
$$ \lim_{i\to\infty}k_i = \frac{1}{\mu - \lambda_1}, \qquad \lim_{i\to\infty}u_i = \frac{x_1}{\ell(x_1)}. $$

Variant I: (5.3.15) with constant $\mu$.

Variant II: updating $\mu$. Given $\mu^{(0)} = \mu$ and $u_0$: for $i = 0,1,2,\dots$,
$$ v_{i+1} = (\mu^{(i)}I - A)^{-1}u_i, \qquad k_{i+1} = \ell(v_{i+1}), \qquad u_{i+1} = \frac{v_{i+1}}{k_{i+1}}, \qquad \mu^{(i+1)} = \mu^{(i)} - \frac{1}{k_{i+1}}. \tag{5.3.16, 5.3.17} $$
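Variant II updates the shift from the reciprocal of the normalization factor; for symmetric matrices the same idea with the Rayleigh quotient as shift gives Variant III below. The following NumPy sketch shows both update rules; it solves the shifted systems with a dense factorization and is illustrative only (function name, argument names and the convergence test are assumptions, not from the notes).

```python
import numpy as np

def inverse_iteration(A, mu, u0, variant="II", maxit=50, tol=1e-12):
    """Inverse iteration (5.3.15)-(5.3.17) with l(x) = last component.
    variant "I"  : fixed shift mu
    variant "II" : shift update mu <- mu - 1/k_{i+1}
    variant "III": Rayleigh-quotient shift (A symmetric), cf. (5.3.25)-(5.3.26)"""
    n = len(u0)
    u = np.array(u0, dtype=float)
    u /= np.linalg.norm(u)
    for _ in range(maxit):
        v = np.linalg.solve(mu * np.eye(n) - A, u)   # v_{i+1} = (mu I - A)^{-1} u_i
        k = v[-1]                                    # k_{i+1} = l(v_{i+1})
        u = v / np.linalg.norm(v)                    # normalize (by norm, for stability)
        if variant == "II":
            mu = mu - 1.0 / k
        elif variant == "III":
            mu = u @ (A @ u)
        if np.linalg.norm(A @ u - mu * u) <= tol * np.linalg.norm(A, 2):
            break
    return mu, u

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(inverse_iteration(A, 4.4, np.ones(3), variant="III"))
```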

Show that: The method (5.3.17) is quadratic convergence. Let (i) 1 , ui x1 , and l(x1 ) = 1. The remaining components of x1 are smaller than 1 (Here l(z ) = z1 ). Let
n

um = (1 + and

(m) 1 )x1

+
j =2

j xj ,

(m)

(m) = |(m) 1 |

(5.3.18)

m) (m) ). m = max(|1 |, , |( n |,

(m)

(5.3.19)

Claim: There exist a constant M independent on m with


2 . m+1 M m

(5.3.20)

Let i

(m)

i . Then we have vm+1 = km+1 um+1 = 1 + 1 x1 + (m) 1


n

j =2

j xj (m) j

(5.3.21)

186 and (m+1) = (m) 1 km+1

Chapter 5. The Unsymmetric Eigenvalue Problem


n

= (m) (m 1 )[(1 + 1 ) +
j =2

j ((m) 1 ) xj,1 ]1 . (m) j (5.3.22)

= (m) ((m) 1 )(1 + O(m )) = 1 + m O(m ). From (5.3.21), um+1 = km+1 um+1 km+1
j 2

= [(1 + 1 )x1 + = [ x1 +
j 2

j ((m) 1 ) xj ][1 + 1 + (m) j

j 2

j ((m) 1 ) xj,1 ]1 (m) j

j ((m) 1 ) xj ] [1 + (1 + 1 )((m) j )
(m+1)

j 2

j ((m) 1 ) xj,1 ]1 . (1 + 1 )((m) j )


2 ) 1+O(m

= (1 + 1 with i
(m+1)

(m+1)

)x1 +
j 2

xj

2 2 = O(m ). This implies m+1 M m .

5.3.3

Connection with Newton-method

Consider the nonlinear equations Au u = 0, l T u = 1, for n + 1 unknowns u and . Let F Newton method for (5.3.23): ui+1 i+1 where F Multiplying with F and simplify
u (i)

:=

Au u lT u 1

= 0.

(5.3.23)

=
(i+1)

F
(i)

F
(i)

,
(i)

A I u lT 0

and write the rst n equations and the last equation separately lT ui+1 = 1. (5.3.24)

(A i I )ui+1 = (i+1 i )ui ,

We see that (5.3.24) identies with (5.3.17) and is also quadratic convergence.

5.3 Power Iterations Variant III: A is real symmetric and m+1 = Rayleigh Quotient Give u0 with u0 2 = 1 and 0 = uT 0 Au0 , For m = 0, 1, 2, . . . , vm+1 = (m I A)1 um , m+1 um+1 = vvm , +1 2 T m+1 = um+1 Aum+1 . End Claim: The iteration (5.3.25) is cubic convergence. The eigenvectors xi of A form an orthonormal system xT i xj = ij . As above, let i (m) (i = i ) :
(m)

187

(5.3.25)

(5.3.26)

and m be dened in (5.3.18) and (5.3.19). From (5.3.26) follows


n

um 2 2 So 1
n 2 2 m

= 1 = (1 + 1 ) +
j 2

2 j

= 1 + 2 1 +
j =1

2 j.

2 = O(m ). That is 2 (m) = uT m Aum = 1 (1 + 1 ) + j 2 n

j 2 j

(5.3.27)

= 1 + 21 1 +
j =1 2 Thus (m) = O(m ). On the other hand, T vm +1 vm+1

2 j 2 j = 1 + O (m ).

(1 + 1 )2 = + ((m) 1 )2

j 2

2 j ((m) j )2 2 (m)2 j }. (1 + 1 )2 ((m) j )2


6 ) 1+O(m

1 + 1 2 = | | {1 + (m) 1

j 2

Therefore um+1 = ( 1 + 1 x1 + (m) 1 (m) 1 j 6 ))( xj )(1 + O(m ) (m) j 1 + 1

j 2

= [ x1 +
j 2

j (m) 6 )) xj ](1 + O(m (1 + 1 )((m) j ) )x1 +


j 2

= (1 + 1 with | j
(m+1)

(m+1)

(m+1)

xj

3 (j = 1, , n). As in (5.3.27) we have | M m 6 2 |(m+1) 1 | = O(m +1 ) = O (m ).

188

Chapter 5. The Unsymmetric Eigenvalue Problem

5.3.4

Orthogonal Iteration
For $k = 1, 2, \dots$: $\quad Z_k = AQ_{k-1}, \qquad Q_kR_k = Z_k$ (QR decomposition).

Given Q0 Cnp with orthogonal columns and 1 p < n.

(5.3.28)
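Orthogonal iteration (5.3.28) generalizes the power method to a $p$-dimensional dominant invariant subspace: each step is one block product followed by a thin QR factorization. A minimal NumPy sketch (the function name, iteration count, and test matrix are illustrative assumptions):

```python
import numpy as np

def orthogonal_iteration(A, p, steps=100, seed=0):
    """Orthogonal iteration (5.3.28): returns Q_k whose columns approximate an
    orthonormal basis of the dominant p-dimensional invariant subspace of A."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], p)))   # Q_0
    for _ in range(steps):
        Z = A @ Q                      # Z_k = A Q_{k-1}
        Q, _ = np.linalg.qr(Z)         # Q_k R_k = Z_k
    return Q

A = np.diag([5.0, 4.0, 1.0, 0.5]) + 0.01 * np.ones((4, 4))
Q = orthogonal_iteration(A, p=2)
print(np.sort(np.linalg.eigvals(Q.T @ A @ Q)))   # approximates the two largest eigenvalues
```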

Note that if p = 1 this is just the power method. Suppose that Q AQ = T = diag (i ) + N, |1 | |n | is a Schur decomposition of A and partition Q, T and N as follows
p n p

(5.3.29)

Q = [ Q , Q ], T =

T11 T12 0 T22

N=

N11 N12 0 N22

(5.3.30)

If |p | > |p+1 | we dene Dp (A) = R(Q ) Range(Q ) is a dominant invariant subspace . It is the unique subspace associated with associated with 1 , , p . The following theorem (without proof see Golub/Vanloan p.215) shows that the subspace p+1 k R(Qk ) generated by (5.3.28) converges to Dp (A) at a rate proportional to | | under p reasonable assumptions. Theorem 5.3.4 Let the Schur form of A be given by (5.3.29) and (5.3.30). Assume that |p | > |p+1 | and that 0 satises (1 + )|p | > N
F.

If Q0 Cnp with Q 0 Q0 = Ip and d = dist[Dp (A ), R(Q0 )] < 1, then Qk generated by (5.3.28) satisfy

T12 F |p+1 | + N F /(1 + ) k (1 + )n2 [1 + ][ ] . dist[Dp (A), R(Qk )] 2 sep(T11 , T12 ) |p | N F /(1 + ) 1d When is chosen large enough then the theorem essentially shows that dist[Dp (A), R(Qk )] c|p+1 /p |k , where c depends on sep(T11 , T12 ) and N F . Needless to say, the convergence can be very slow if the gap between |p | and |p+1 | is not suciently wide. To prove this theorem we need to prove the following two lemmas 5.3.1 and 5.3.3. Lemma 5.3.1 Let T = T11 T12 0 T22 and dene the linear operator : Cpq Cpq by

(X ) = T11 X XT22 . Then is nonsingular (T11 ) (T22 ) = . if is nonsingular and (Z ) = T12 , Ip Z then Y 1 T Y = diag (T11 , T22 ), where Y = . 0 Iq

5.3 Power Iterations Proof: =: Suppose (X ) = 0 for X = 0 and U XV = r 0 0 0 , r = diag (i ), r = rank (X ).

189

Substituting into T11 X = XT22 gives A11 A12 A21 A22 where U T11 U = (Aij ) and V T22 V = (Bij ). By comparing blocks, we see that A21 = B12 = 0 and (A11 ) = (B11 ). Conversely, = (A11 ) = (B11 ) (T11 ) (T22 ). =: If (T11 ) (T22 ), then there are x = 0, y = 0 satisfy T11 x = x and y T22 = y . This implies (xy ) = 0. Finally, if nonsingular, then Z exists and Y 1 T Y = T11 T11 Z ZT22 + T12 0 T22 = T11 0 0 T22 . r 0 0 0 = r 0 0 0 B11 B12 B21 B22 ,

{ Another proof } For A Cmm and B Cmm dene the Kronecker product of A and B by a11 B a1m B . . m mnmn . AB = . . = [aij B ]i,j =1 C . . am1 B amm B Let C = [c1 , , cn ] Cmm . Dene c1 . mn1 vec(C ) = . . . C cn Consider the linear matrix equation AX XB = C. Lemma 5.3.2 vec(AX XB ) = (I A B T I )vec(X ). Proof: (AX )j = AXj vec(AX ) = (I A)vec(X ),
n

(5.3.31)

(XB )j =
k=1

bkj Xk = [b1j I, , bnj I ]vec(X ).

190 By linearity of vec we have

Chapter 5. The Unsymmetric Eigenvalue Problem

vec(AX XB ) = vec(AX ) vec(XB ) = [(I A) (B T I )]vec(X ). Let G = [(I A B T I )], X = vec(X ), r = vec(C ). Then the equation (5.3.31) is equivalent to Gx = r and the equation (5.3.31) has a unique solution (A) (B ) = . There are unitary Q1 , Z1 such that s1 0 0 r1 . .. Q . , Z1 BZ1 = B1 = . . 0 . 1 AQ1 = A1 = 0 sm 0 0 rm (5.3.31) becomes
Q 1 AQ1 Q1 XZ1 Q1 XZ1 Z1 BZ1 = Q1 CZ1 C1 A1 X1 X1 B1 = C1 , where X1 = Q 1 XZ1 G1 x1 = r1 ,

where G1 = [I A1 B1 I ] and x1 = vec(X1 ), r1 = vec(C1 ). Also det(G1 ) =


1im 1jn

(ri rj ).

Hence we have (A) (B ) = (rj sj ) = 0 (i = 1, m, j = 1, n) det(G1 ) = 0 G1 x1 = r1 , has a unique solution. the equation (5.3.31) has a unique solution X . Exercise: (a) Consider the linear matrix equation AXB CXD = R where A, C Cmm , B, D Cnn and X, R Cmn . The equation has a unique solution (A, C ) (B, D) = . (b) Consider AX Y B = R, where A, B, C, D, X, Y, R, S Cmn . The equation CX Y D = S, has a unique solution (X, Y ) (A, C ) (B, D) = .

Lemma 5.3.3 Let Q AQ = T = D + N (Schur decomposition). D is diagonal and N is strictly upper triangular. Let = max{| | : det(A I ) = 0} and = min{| | : det(A I )} = 0. If 0, then Ak
2

(1 + )n1 [|| +

N F k ] , k 0. 1+
F,

(5.3.32)

If A is nonsingular and 0 satises (1 + )|| > N Ak


2

then ]k , k 0. (5.3.33)

(1 + )n1 [

1 || N
F /(1 + )

5.3 Power Iterations 191 2 n1 Proof: For 0, dene = diag(1, 1+, (1+) , , (1+) ) and 2 ( ) = (1+)n1 . 1 But N N F /(1 + ), thus F Ak
2 1 = Tk 2 = (D + N 1 )k 2 ( )[ D 2 + N 1 2 ]k (1 + )n1 [|| + N F /(1 + )]k . F, 2

On the other hand, if A is nonsingular and (1 + )|| > N D1 N and thus, Ak


2 1 2

then

<1

1 = T k 2 = (I + D1 N 1 )1 D1 ]k 2 ( )[ D1 2 /[1 D1 N 1 2 ]k 1 (1 + )n1 [ ]k || N F /(1 + )

{ proof of Theorem 5.3.4: } By induction Ak Q0 = Qk (Rk R1 ). By substituting (5.3.29), (5.3.30) into this equality we get V0 W0 = Vk Wk (Rk , , R1 ),

p(np) where Vk = Q such Qk and Wk = Q Qk . Using Lemma 5.3.1 there is an X C that 1 T11 T12 I X T11 0 I X = . 0 T22 0 I 0 T22 0 I

Moreover since sep(T11 , T22 ) = the smallest singular value of (X ) = T11 X XT22 . From (X ) = T12 follows X F T12 F /sep(T11 , T22 ). Thus
k T11 0 k 0 T22

V0 XW0 W0

Vk XWk Wk

(Rk , , R1 ).

Assume V0 XW0 is nonsingular. Then


k k Wk = T22 W0 (V0 XW0 )1 T11 (Vk XWk ).

From Theorem 5.1.3 follows that dist[Dp (A), R(Qk )] = Q Qk Then


k dist[Dp (A), R(Qk )] T22 2 2

= Wk 2 .

(V0 XW0 )1

k T11 2 [1 + X

F ].

(5.3.34)

192 Chapter 5. The Unsymmetric Eigenvalue Problem We prove V0 XW0 is nonsingular. From A Q = QT follows that
A (Q Q X ) = (Q Q X )T11 ,

which implies orthogonal column of Z = (Q Q X )(I + XX ) 2 are a basis of Dp (A ). Also 1 (V0 XW0 ) = (I + XX ) 2 Z Q0 . This implies p (V0 XW0 ) p (Z Q0 ) = p (V0 XW0 ) p (Z Q0 ) = Hence V0 XW0 is invertible and (V0 XW0 )1
k T22 2 2

1 d2 > 0.

1 . 1d2 F /(1

By Lemma 5.3.1 we get

(1 + )np1 [|p+1 | + N (1 + )p1 /[|p | N

+ )]k .

and
k T11 2 F /(1

+ )]k .

Substituting into (5.3.34) the theorem is proved.

5.4

QR-algorithm (QR-method, QR-iteration)


AU = U R,

Theorem 5.4.1 (Schur Theorem) There exists a unitary matrix U such that

where R is upper triangular. Iteration method (from Vojerodin): Set U0 = I, For i = 0, 1, 2, AUi = Ui+1 Ri+1 , End If Ui converges to U , then for i
Ri+1 = Ui +1 AUi U AU.

(an QR factorization of AUi .)

(5.4.1)

We now dene
Qi = Ui 1 Ui , Ai+1 = Ui AUi .

(5.4.2)

Then from (5.4.1) we have


Ai = Ui 1 AUi1 = Ui1 Ui Ri = Qi Ri .

On the other hand from (5.4.1) substituting i by i 1 we get


Ri Ui 1 = Ui A

5.4 QR-algorithm (QR-method, QR-iteration) and thus Ri Qi = Ri Ui 1 Ui = Ui AUi = Ai+1 . So (5.4.1) for U0 = I and A1 = A is equivalent to: For i = 1, 2, 3, Ai = Qi Ri (QR factorization of Ai ), Ai+1 = Ri Qi . End

193

(5.4.3) (5.4.4)

Equations (5.4.3)-(5.4.4) describe the basic form of QR algorithm. We prove two important results. Let Pi = Q1 Q2 Qi , Si = Ri Ri1 R1 . (5.4.5) Then hold Ai+1 = Pi APi = Si ASi1 , i = 1, 2, Ai = Pi Si i = 1, 2, . (5.4.6) (5.4.7)

(5.4.6) is evident. (5.4.7) can be proved by induction. For i = 1, A1 = Q1 R1 , Suppose (5.4.7) holds for i. Then Ai+1 = APi Si = Pi Ai+1 Si (from (5.4.6) ) = Pi Qi+1 Ri+1 Si = Pi+1 Si+1 . Theorem 5.4.2 Let A Cnn with eigenvalues i under the following assumptions: (a) |1 | > |2 | > |n | > 0; (b) The factorization A = X X 1 (5.4.9) with X 1 = Y and = diag(1 , , n ) holds. Here Y has an LR factorization. Then QR algorithm converges. Furthermore (a) limi ajk = 0, for j > k , where Ai = (ajk ); (b) limi akk = k , for k = 1, , n. Remark 5.4.1 Assumption (5.4.9) is not essential for convergence of the QR algorithm. If the assumption is not satised, the QR algorithm still converges, only the eigenvalues on the diagonal no longer necessary appear ordered in absolute values, i.e. (b) is replaced (i) by (b) limi akk = (k) , k = 1, 2 , n, where is a permutation of {1, 2, , n}. ( See Wilkinson pp.519 )
(i) (i) (i)

(5.4.8)

194 Chapter 5. The Unsymmetric Eigenvalue Problem Proof: { of Theorem 5.4.2 } Let X = QR be the QR factorization of X with rii > 0 and Y = LU be the LR factorization of Y with ii = 1. Since A = X X 1 = QRR1 Q , we have Q AQ = RR1 (5.4.10) is an upper-triangular matrix with diagonal elements i ordered in absolute value as in (5.4.8). Now As = X s X 1 = QRs LU = QRs Ls s U and since i < k, 0, i 1, i = k, (s Ls )ik = ik ( )s = k 0, i > k as s ,

where s Ls = I + Es , with lims Es = 0. Therefore As = QR(I + Es )s U = Q(I + REs R1 )Rs U = Q(I + Fs )Rs U with lims Fs = 0. From the conclusion of QR factorization the matrices Q and R (rii > 0) depend continuously on A (A = QR). But I = I I is the QR factorization of I , therefore it holds for the QR factorization: s R s . I + Fs = Q s = I and lims R s = I. From (5.4.7) we have Thus for Fs 0, we have lims Q s )(R s Rs U ) = Ps Rs . As = (QQ So from the uniqueness of QR factorization there exists a unitary diagonal matrix Ds with s Q. Ps Ds = QQ Thus from (5.4.6) we have
Di Ai+1 Di = Di Pi APi Di Q AQ = RR1 .

(5.4.11)

The assertions (a) and (b) are proved.


i Remark 5.4.2 One can show that lims Qs = diag ( | ). That is in general Qs does i| not converge to I and then Ps does not converge. Therefore Ds does not converge to I and (5.4.11) shows that the elements of As over the diagonal elements oscillate and only converge in absolute values.

Let A be diagonalizable and the eigenvalues such that |1 | = = |1 | > |1 +1 | = = |2 | > = |s | (5.4.12)

with s = n. We dene a block partition of n n matrix B in s2 blocks Bk for k, = 1, 2, , s B = [Bk ]s k, =1 .

5.4 QR-algorithm (QR-method, QR-iteration) 195 Theorem 5.4.3 (Wilkinson) Let A be diagonalizable and satisfy (5.4.12) and (5.4.9). (i) Then it holds for the blocks Ajk of Ai that (a) limi Ajk = 0,
(i)

j > k;
(i)

(b) The eigenvalues of Akk converges to the eigenvalues k1 +1 , , k . Special case: If A is real and all the eigenvalues have conjugate eigenvalues. Then + + + + + + + + + + + + Ai + 0 dierent absolute value except + + + + + .

Theorem 5.4.4 Let A be an upper Hessenberg matrix. Then the matrices Qi and Ai in (5.4.3) and (5.4.4) are also upper Hessenberg matrices.
1 1 Proof: It is obvious from Ai+1 = Ri Ai Ri and Qi = Ai Ri .

5.4.1

The Practical QR Algorithm

In the following paragraph we will develop an useful QR algorithm for real matrix A. We will concentrate on developing the iteration Compute orthogonal Q0 such that H0 = QT 0 AQ0 is upper Hessenberg. For k = 1, 2, 3, Compute QR factorization Hk = Qk Rk ; Set Hk+1 = Rk Qk ; (5.4.13) End Here A Rnn , Qi Rnn is orthogonal and Ri Rnn is upper triangular. Theorem 5.4.5 (Real Schur Decomposition) If thogonal Q Rnn such that R11 R12 0 R21 QT AQ = . . .. . . . . . 0 0 A Rnn , then there exists an orR1m R2m . . . Rmm (5.4.14)

where each Rii is either 1 1 or 2 2 matrix having complex conjugate eigenvalues.

196 Chapter 5. The Unsymmetric Eigenvalue Problem Proof: Let k be the number of complex conjugate pair in (A). We prove the theorem by induction on k . The theorem holds if k = 0. Now suppose that k 1. If = +i (A) and = 0, then there exists vectors y and z Rn (z = 0) such that A(y + iz ) = ( + i)(y + iz ), i.e., A[y, z ] = [y, z ] .

The assumption that = 0 implies that y and z span a two dimensional, real invariant subspace for A. It then follows that U T AU = T11 T12 0 T22 }. with (T11 ) = {,

so that U T T22 U has the require structure. By induction, there exists an orthogonal U ). The theorem follows by setting Q = U diag (I2 , U Algorithm 5.4.1 (Hessenberg QR step) Input: Given the upper Hessenberg matrix H Rnn ; = RQ; Compute QR factorization of H : H = QR and overwrite H with H For k = 1, , n 1, 2 Determine ck and sk with c2 k + sk = 1 such that ck s k hkk , = sk ck hk+1,k 0 For j = k, , n, hkj ck sk hkj = . hk+1,j sk ck hk+1,j End; End; For k = 1, , n 1, For i = 1, , k + 1, ck sk [hik , hi,k+1 ] [hik , hi,k+1 ] . sk ck End; End; This algorithm requires 4n2 ops. Moreover, since QT = J (n 1, n, n1 ) J (1, 2, = QR is upper Hessenberg. Thus the QR iteration preserves 1 ) is lower Hessenberg H Hessenberg structure. We now describe how the Hessenberg decomposition QT 0 AQ0 = H =upper Hessenberg to be computed. Algorithm 5.4.2 (Householder Reduction to Hessenberg Form) Given A Rnn . The following algorithm overwrites A with H = QT 0 AQ0 , where H is upper Hessenberg and Q0 = P1 Pn2 is a product of Householder matrices.

5.4 QR-algorithm (QR-method, QR-iteration) For k = 1, , n 2, of order n k such that Determine a Householder matrix P k ak+1,k . 0 . . k P . = . . . . . . 0 an,k T k ). Compute A Pk APk where Pk = diag (Ik , P End;
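Algorithms 5.4.1 and 5.4.2 translate directly into code: a Householder reduction to Hessenberg form, followed by one QR step carried out with Givens rotations, which preserves the Hessenberg structure. The sketch below uses explicit dense arithmetic and NumPy only; the helper names are illustrative assumptions, not part of the notes.

```python
import numpy as np

def hessenberg_reduce(A):
    """Algorithm 5.4.2: orthogonal reduction H = Q0^T A Q0 to upper Hessenberg form."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q0 = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k].copy()
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) == 0:
            continue
        v /= np.linalg.norm(v)
        Pk = np.eye(n)
        Pk[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)   # P_k = diag(I_k, I - 2 v v^T)
        H = Pk @ H @ Pk
        Q0 = Q0 @ Pk
    return H, Q0

def hessenberg_qr_step(H):
    """Algorithm 5.4.1: one unshifted QR step H -> R Q using Givens rotations (O(n^2))."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    cs = []
    for k in range(n - 1):                           # H := Q^T H  (upper triangular)
        a, b = H[k, k], H[k + 1, k]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0 else (a / r, b / r)
        cs.append((c, s))
        G = np.array([[c, s], [-s, c]])
        H[k:k + 2, k:] = G @ H[k:k + 2, k:]
    for k, (c, s) in enumerate(cs):                  # H := H Q  (back to Hessenberg form)
        G = np.array([[c, -s], [s, c]])
        H[:k + 2, k:k + 2] = H[:k + 2, k:k + 2] @ G
    return H
```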

197

This algorithm requires 5 n3 ops. Q0 can be stored in factored form below the 3 subdiagonal A. If Q0 is explicitly formed, an additional 2 n3 ops are required. 3 Theorem 5.4.6 (Implicit Q Theorem) Suppose Q = [q1 , , qn ] and V = [v1 , , vn ] are orthogonal matrices with QT AQ = H and V T AV = G are upper Hessenberg. Let k denote the smallest positive integer for which hk+1,k = 0 with the convention that k = n, if H is unreduced. If v1 = q1 , then vi = qi and |hi,i1 | = |gi,i1 |, for i = 2, , k . Moreover if k < n then gk+1,k = 0. Proof: Dene W = V T Q = [w1 , , wn ] orthogonal, and observe GW = W H . For i = 2, k , we have
i1

hi,i1 wi = Gwi1
j =1

hj,i1 wj

Since w1 = e1 , it follows that [w1 , , wk ] is upper triangular and thus wi = ei T for i = 2, , k . Since wi = V T qi and hi,i1 = wi Gwi1 , it follows that vi = qi and |hi,i1 | = |gi,i1 | for i = 2, , k . If hk+1,k = 0, then ignoring signs we have
T T gk+1,k = eT k+1 Gek = ek+1 GW ek = (ek+1 W )(Hek ) k k

= eT k+1
i=1

hik W ei =
i=1

hik eT k+1 ei = 0.

Remark 5.4.3 The gist of the implicit Q theorem is that if QT AQ = H and Z T AZ = G are each unreduced upper Hessenberg matrices and Q and Z have the same rst column, then G and H are essentially equal in the sense that G = D1 HD, where D = diag (1, , 1). We now return to Hessenberg QR iteration in (5.4.13): Give orthogonal Q0 such that H = QT 0 AQ0 is upper Hessenberg. For k = 1, 2, 3, H = QR, (QR factorization) H := RQ, (upper Hessenberg) End

198 Chapter 5. The Unsymmetric Eigenvalue Problem Without loss of generality we may assume that each Hessenberg matrix produced by (5.4.13) is unreduced. If not, then at some stage we have H= H11 H12 0 H22 with H11 Rpp (1 p < n).

The problem decouples into two small problems involving H11 and H22 . The term deation is also used in this context, usually when p = n 1 or n 2. In practice, decoupling occurs whenever a subdiagonal entry in H is suitably small. For example in EISPACK if |hp+1,p | eps(|hp,p | + |hp+1,p+1 |), (5.4.15) then hp+1,p is declared to be zero. Now we will investigate how the convergence (5.4.13) can be accelerated by incorporating shifts. Let R and consider the iteration Give orthogonal Q0 such that H = QT 0 AQ0 is upper Hessenberg. For k = 1, 2, H I = QR, (QR factorization) H = RQ + I, End

(5.4.16)

The scale is refereed to a shift. Each matrix H in (5.4.16) is similar to A, since RQ + I = QT (QR + I )Q = QT HQ. If we order the eigenvalues i of A so that |1 | |n |, then Theorem p+1 k | . Of 5.4.5 says that the p-th subdiagonal entry in H converges to zero with rate | p course if p = p+1 then there is no convergence at all. But if is much closer to n than to the other eigenvalues, the convergence is required. Theorem 5.4.7 Let be an eigenvalues of an n n unreduced Hessenberg matrix H . = RQ + I , where (H I ) = QR is the QR decomposition of H I , then If H n,n1 = 0 and h nn = . h Proof: If H is unreduced, then so is the upper Hessenberg matrix H I . Since QT (H I ) = R is singular and since it can be shown that |rii | |hi+1,i |, i = 1, 2, , n 1, (5.4.17)

is equal to (0, , 0, ). it follows that rm = 0. Consequently, the bottom row of H

5.4.2

Single-shift QR-iteration
Give orthogonal Q0 such that H = QT 0 AQ0 is upper Hessenberg. For k = 1, 2, , Hi hnn I = Qi Ri , (QR factorization) Hi+1 := Ri Qi + hnn I, End

(5.4.18)
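With the Rayleigh shift $\mu = h_{nn}$ of (5.4.18), each sweep is a shifted QR step on the active Hessenberg block followed by a deflation test like (5.4.15). A compact NumPy sketch for matrices whose eigenvalues are real (complex conjugate pairs need the double shift of the next subsection); it uses a dense QR factorization rather than Givens rotations for brevity, and the names and tolerances are illustrative assumptions.

```python
import numpy as np

def single_shift_qr(H, tol=1e-12, maxit=500):
    """Single-shift QR iteration (5.4.18) on an upper Hessenberg matrix H with real
    eigenvalues; deflates the trailing 1x1 block when |h_{n,n-1}| is negligible."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    eigs = []
    while n > 1:
        for _ in range(maxit):
            mu = H[n - 1, n - 1]                        # shift = h_nn
            Q, R = np.linalg.qr(H[:n, :n] - mu * np.eye(n))
            H[:n, :n] = R @ Q + mu * np.eye(n)          # H := RQ + mu*I
            if abs(H[n - 1, n - 2]) <= tol * (abs(H[n - 2, n - 2]) + abs(H[n - 1, n - 1])):
                break
        eigs.append(H[n - 1, n - 1])                    # deflate the converged eigenvalue
        n -= 1
    eigs.append(H[0, 0])
    return np.array(eigs)
```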

5.4 QR-algorithm (QR-method, QR-iteration) Quadratic convergence If the (n, n 1) entry converges to zero and let H= hnn then one step of the single shift QR algorithm leads:

199

= RQ + hnn I. QR = H hnn I, H After n 2 steps in the reduction of H hnn I a is given by And we have (n, n 1) entry in H n,n1 = h If 2 b . 2 + a 2 to upper triangular we have . b 0

a, then it is clear that (n, n 1) entry has order 2 .

5.4.3

Double Shift QR iteration

If at some stage the eigenvalues a1 and a2 of

hmm hmn (m = n 1) are complex, for hnm hnn then hnn would tend to be a poor approximate eigenvalue. A way around this diculty is to perform two single shift QR steps in succession, using a1 and a2 as shifts: H a1 I H1 H1 a2 I H2 = = = = Q1 R1 , R1 Q1 + a1 I, Q2 R2 , R2 Q2 + a2 I.

(5.4.19)

We then have (Q1 Q2 )(R2 R1 ) = = = = where M = (H a1 I )(H a2 I ). (5.4.21) Q1 (H1 a2 I )R1 = Q1 (R1 Q1 + a1 I a2 I )R1 (Q1 R1 )(Q1 R1 ) + a1 (Q1 R1 ) a2 (Q1 R1 ) (H a1 I )(H a1 I ) + a1 (H a1 I ) a2 (H a1 I ) (H a1 I )(H a2 I ) = M, (5.4.20)

200 Chapter 5. The Unsymmetric Eigenvalue Problem Note that M is a real matrix, since M = H 2 sH + tI, where s = a1 + a2 = hmm + hnn R and t = a1 a2 = hmm hnn hmn hnm R. Thus, (5.4.20) is the QR factorization of a real matrix, and we may choose Q1 and Q2 so that Z = Q1 Q2 is real orthogonal. It follows that
T H2 = Q 2 H1 Q2 = Q2 (Q1 HQ1 )Q2 = (Q1 Q2 ) H (Q1 Q2 ) = Z HZ

is real. A real H2 could be guaranteed if we (a) explicitly form the real matrix M = H 2 sH + tI ; (b) compute the real QR decomposition M = ZR and (c) set H2 = Z T HZ. But since (a) requires O(n3 ) ops, this is not a practical course. In light of the Implicit Q theorem, however, it is possible to eect the transition from H to H2 in O(n2 ) ops if we (a ) compute M e1 , the rst column of M ; (b ) determine Householder Matrix P0 such that P0 (M e1 ) = e1 , ( = 0); (c ) compute Householder matrices P1 , , Pn2 such that if Z1 = P0 P1 Pn2 the T Z1 HZ1 is upper Hessenberg and the rst column of Z and Z1 are the same. If T T Z HZ and Z1 HZ1 are both unreduced upper Hessenberg, then they are essentially equal. Since M e1 = (x, y, z, 0, , 0)T , where x = h2 11 + h12 h21 sh11 + t, y = h21 (h11 + h22 s), z = h21 h32 . So, a similarity transformation with P0 only changes rows and T columns 1, 2 and 3. Since P0 HP0 has the form , 0 0 0 0 0 0 0 it follows 0 0 0 0 that 0 0 0

P1

0 0 0 0

P2

0 0 0 0

0 0 0

5.4 QR-algorithm 201 (QR-method, QR-iteration) P4 0 0 P3 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 k , Ink3 ), P k is 3 3-Householder matrix. The applicability of Theorem Pk = diag (Ik , P 5.4.6 (Implicit Q-theorem) follows from that Pk e1 = e1 , for k = 1, , n 2, and that P0 and Z have the same rst column. Hence Z1 e1 = Ze1 . Algorithm 5.4.3 (Francis QR step) Given H Rnn unreduced whose trailing 2 2 principal submatrix has eigenvalues a1 and a2 , the following algorithm overwrites H with Z T HZ , where Z = P1 Pn2 is a product of Householder matrices and Z T (H a1 I )(H a2 I ) is upper triangular. Set m := n 1; s := hmm + hnn ; t := hmm hnn hmn hnm ; x := h2 n + h12 h21 sh11 + t; y := h21 (h11 + h22 s); z := h21 h32 ; For k = 0, , n 2, If k < n 2, then R33 such that Determine a Householder matrix Pk x Pk y = 0 ; z 0 Set T k , Ink3 ; H := Pk HPk , Pk = diag Ik , P else determine a Householder matrix Pn2 R22 such that n2 x = ; P y 0 Set T H := Pn2 HPn 2 , Pn2 = diag In2 , Pn2 ; End if x := hk+2,k+1 ; y := hk+3,k+1 ; If k < n 3, then z := hk+4,k+1 ; End for; This algorithm requires 6n2 ops. If Z is accumulated into a given orthogonal matrix, an additional 6n2 ops are necessary.

Algorithm 5.4.4 (QR Algorithm) Given A Rnn and a tolerance , this algorithm computes the real schur decomposition QT AQ = T . A is overwritten with the Hessenberg decomposition.

202

Chapter 5. The Unsymmetric Eigenvalue Problem Using Algorithm 5.4.2 to compute the Hessenberg decomposition QT AQ = H , where Q = P1 Pn2 and H is Hessenberg; Repeat: Set to zero all subdiagonal elements that satisfy |hi,i1 | (|hii | + |hi1,i1 |); Find the largest non-negative q and the smallest non-negative p such that p H11 H12 H13 npq , 0 H22 H23 H= q 0 0 H33 p npq q where H33 is upper quasi-triangular and H22 is unreduced (Note: either p or q may be zero). If q = n, then upper triangularize all 2 2 diagonal blocks in H that have real eigenvalues, accumulate the orthogonal transformations if necessary, and quit. Apply a Francis QR-step to H22 : H22 := Z T H22 Z ; If Q and T are desired, then Q := Q diag(Ip , Z, Iq ); Set H12 := H12 Z and H23 := Z T H23 ; Go To Repeat.

This algorithm requires 15n3 ops, if Q and T are computed. If only the eigenvalues are desired, then 8n3 ops are necessary.

5.4.4

Ordering Eigenvalues in the Real Schur From

If QT AQ =

T11 T12 with T11 Rpp and (T11 ) (T22 ) = , then the rst p 0 T22 columns of Q span the unique invariant subspace associated with (T11 ). Unfortunately, the Francis iteration leads QT R AQF = TF in which the eigenvalues appear somewhat randomly along the diagonal of TF . We need a method for computing an orthogonal matrix QD such that QT D TF QD is upper quasitriangular with appropriate eigenvalues ordering. Let A R22 , suppose QT F AQF = TF = 1 t12 0 2 , 1 = 2 .

Note that TF x = 2 x, where x = QT Dx = 0 . If Q = QF QD , then

t12 . Let QD be a given rotation such that 2 1

T QT AQ e1 = QT D TF (QD e1 ) = 2 QD (QD e1 ) = 2 e1

and so QT AQ =

2 t12 0 1

5.4 QR-algorithm (QR-method, QR-iteration) 203 Using this technique, we can move any subset of (A) to the top of T s diagonal. See Algorithm 7, 61 pp.241 (Golub & Van Loan: Matrix Computations). The swapping gets a little more complicated when T has 2 2 blocks. See Ruhe (1970) and Stewart (1976). Block Diagonalization Let

T = \begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{pmatrix} \begin{matrix} \}n_1 \\ \}n_2 \\ \\ \}n_q \end{matrix}   (5.4.22)

be a partitioning of some real Schur form Q^T A Q = T ∈ R^{n×n} such that λ(T_{11}), …, λ(T_{qq}) are disjoint. There exists a matrix Y such that Y^{−1} T Y = diag(T_{11}, …, T_{qq}). A practical procedure for determining Y is now given together with an analysis of Y's sensitivity as a function of the above partitioning. Partition I_n = [E_1, …, E_q] conformally with T and define Y_{ij} ∈ R^{n×n} as follows:
Y_{ij} = I_n + E_i Z_{ij} E_j^T,  i < j,  Z_{ij} ∈ R^{n_i×n_j}.

It follows that if Y_{ij}^{−1} T Y_{ij} = \tilde{T}, then T and \tilde{T} are identical except that
\tilde{T}_{ij} = T_{ii} Z_{ij} − Z_{ij} T_{jj} + T_{ij},
\tilde{T}_{ik} = T_{ik} − Z_{ij} T_{jk}  (k = j+1, …, q),
\tilde{T}_{kj} = T_{ki} Z_{ij} + T_{kj}  (k = 1, …, i−1).
This \tilde{T}_{ij} can be zeroed (take F = T_{ii}, G = T_{jj}, C = −T_{ij}) provided we have an algorithm for solving the Sylvester equation
F Z − Z G = C,   (5.4.23)
where F ∈ R^{p×p} and G ∈ R^{r×r} are given upper quasi-triangular and C ∈ R^{p×r}.

Bartels and Stewart (1972): Let C = [c_1, …, c_r] and Z = [z_1, …, z_r] be column partitionings. If g_{k+1,k} = 0, then by comparing columns in (5.4.23) we find
F z_k − \sum_{i=1}^{k} g_{ik} z_i = c_k.
Thus, once we know z_1, …, z_{k−1}, we can solve the quasi-triangular system
(F − g_{kk} I) z_k = c_k + \sum_{i=1}^{k−1} g_{ik} z_i
for z_k. If g_{k+1,k} ≠ 0, then z_k and z_{k+1} can be found simultaneously by solving the 2p×2p system
\begin{pmatrix} F − g_{kk} I & −g_{mk} I \\ −g_{km} I & F − g_{mm} I \end{pmatrix} \begin{pmatrix} z_k \\ z_m \end{pmatrix} = \begin{pmatrix} c_k \\ c_m \end{pmatrix} + \sum_{i=1}^{k−1} \begin{pmatrix} g_{ik} z_i \\ g_{im} z_i \end{pmatrix}   (m = k+1).   (5.4.24)
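As an illustration, here is a minimal NumPy sketch of the column recurrence for the easy case in which G is genuinely upper triangular, so that the coupled 2p×2p system (5.4.24) never arises. The function name is ours, and the triangular structure of F − g_{kk}I is not exploited in the solve.

```python
import numpy as np

def sylvester_triangular(F, G, C):
    """Solve F Z - Z G = C column by column, assuming F (p x p) and
    G (r x r) are upper triangular (no 2x2 bumps handled here)."""
    p, r = C.shape
    Z = np.zeros((p, r))
    I = np.eye(p)
    for k in range(r):
        rhs = C[:, k] + Z[:, :k] @ G[:k, k]        # c_k + sum_{i<k} g_ik z_i
        Z[:, k] = np.linalg.solve(F - G[k, k] * I, rhs)   # (F - g_kk I) z_k = rhs
    return Z
```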

By reordering the equations according to the permutation (1, p+1, 2, p+2, …, p, 2p), a banded system is obtained that can be solved in O(p^2) flops. The details may be found in Bartels and Stewart (1972); see also Algorithms 7.6.2 and 7.6.3, p. 243 (Golub & Van Loan: Matrix Computations).

Connection with variant inverse iteration

Now let A ∈ C^{n×n}. Consider the QR algorithm with respect to a sequence {k_i}_{i=1}^∞ of shifts:
A_1 = A,  (A_i − k_i I) = Q_i R_i,  A_{i+1} = R_i Q_i + k_i I,

P_i = Q_1 Q_2 ⋯ Q_i.

Theorem 5.4.8 Let p_s denote the last column of P_s. Then the sequence {p_s}_{s=1}^∞ is generated by the following variant of inverse iteration:
p_0 = e_n,  k_1 = p_0^{*} A p_0;  for s = 0, 1, 2, …:
  \tilde{p}_{s+1} = (A − k_{s+1} I)^{−*} p_s,
  r_{s+1} = (\tilde{p}_{s+1}^{*} \tilde{p}_{s+1})^{−1/2},
  p_{s+1} = r_{s+1} \tilde{p}_{s+1},
  k_{s+2} = p_{s+1}^{*} A p_{s+1}.

Proof: A P_s = P_s A_{s+1} implies
P_{s+1} = P_s Q_{s+1} = P_s (A_{s+1} − k_{s+1} I) R_{s+1}^{−1} = (A − k_{s+1} I) P_s R_{s+1}^{−1},
and therefore, taking the inverse conjugate transpose and using P_s^{−*} = P_s,
P_{s+1} = (A − k_{s+1} I)^{−*} P_s R_{s+1}^{*}.
If we denote by r the last diagonal element of R_{s+1}^{*}, then p_{s+1} = r (A − k_{s+1} I)^{−*} p_s = r \tilde{p}_{s+1}. Since ‖p_{s+1}‖_2 = 1, it follows that |r| = (\tilde{p}_{s+1}^{*} \tilde{p}_{s+1})^{−1/2} = r_{s+1}.

Deflation: remove a computed eigenvalue and eigenvector from a matrix.

(a) Deflation of Hotelling: A is real and symmetric. Let λ_1 and x_1 be the computed eigenvalue and eigenvector, respectively, with x_1^T x_1 = 1. Then B = A − λ_1 x_1 x_1^T satisfies
B x_j = A x_j − λ_1 x_1 x_1^T x_j = \begin{cases} 0 \cdot x_j, & j = 1, \\ λ_j x_j, & j ≠ 1, \end{cases}  j = 1, …, n,
where A x_j = λ_j x_j. Thus B has the eigenvalues {0, λ_2, …, λ_n}.
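A two-line numerical check of Hotelling deflation in NumPy (the example matrix is our own; it is only meant to show that λ_1 is replaced by 0 while the other eigenvalues are untouched):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam1, x1 = 3.0, np.array([1.0, 1.0]) / np.sqrt(2)   # eigenpair of A, x1^T x1 = 1
B = A - lam1 * np.outer(x1, x1)                     # Hotelling deflation
print(np.linalg.eigvalsh(B))                        # [0.0, 1.0]: lambda_1 -> 0
```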

(b) Deflation of Wielandt: Let A be arbitrary. We use the fact that a left eigenvector y to λ and a right eigenvector x to μ with λ ≠ μ are orthogonal:
0 = (y^T A)x − y^T(Ax) = λ y^T x − μ y^T x = (λ − μ) y^T x.
Let λ_1 and x_1 be the given eigenvalue and eigenvector, respectively, and let u ≠ 0 be a vector with u^T x_1 ≠ 0. Set B = A − x_1 u^T. From
B x_1 = λ_1 x_1 − (u^T x_1) x_1 = (λ_1 − u^T x_1) x_1
it follows that the eigenvalue λ_1 is transformed to λ_1 − u^T x_1. If μ ≠ λ_1 is an eigenvalue, then from y^T A = μ y^T (y ≠ 0) and
y^T B = y^T A − (y^T x_1) u^T = μ y^T
it follows that μ is also an eigenvalue of B. The right eigenvectors, however, are changed.

(c) Deflation with similarity transformation: A is arbitrary. Let x_1, λ_1 be given with A x_1 = λ_1 x_1. Find a matrix H such that H x_1 = k e_1 (k ≠ 0). Then
H A H^{−1} (H x_1) = λ_1 H x_1,  i.e.,  H A H^{−1} e_1 = λ_1 e_1.
That is, H A H^{−1} has the form
H A H^{−1} = \begin{pmatrix} λ_1 & b^T \\ 0 & B \end{pmatrix},
and B has the eigenvalues λ(A) \ {λ_1}.

5.5 LR, LRC and QR algorithms for positive definite matrices


(a) LR-algorithm: Given a matrix A, consider
A_1 := A;  for i = 1, 2, 3, …:  A_i = L_i R_i (LR-factorization of A_i),  A_{i+1} := R_i L_i.   (5.5.1)
From (5.4.5)-(5.4.7) we have
P_i := L_1 ⋯ L_i,  S_i := R_i ⋯ R_1,  A_{i+1} := P_i^{−1} A P_i = S_i A S_i^{−1},   (5.5.2)
A^i = P_i S_i.   (5.5.3)
There is a convergence theorem analogous to Theorem 5.4.2. Advantage: less computational cost per step. Disadvantage: the LR factorization does not always exist.

(b) LRC-algorithm: Let A be symmetric positive definite. Then the LR factorization exists, so we have the following iterative algorithm:
A_1 := A;  for i = 1, 2, 3, …:  A_i = L_i L_i^T (Cholesky factorization of A_i),  A_{i+1} := L_i^T L_i.   (5.5.4)
Similar to (5.4.5)-(5.4.7) we also have
P_k := L_1 L_2 ⋯ L_k,  A_{k+1} = P_k^{−1} A P_k = P_k^T A P_k^{−T},   (5.5.5)
A^k = P_k P_k^T.   (5.5.6)
Because the iterates A_i remain positive definite, the LRC algorithm is always performable.

Theorem 5.5.1 Let A be symmetric positive definite with eigenvalues λ_1, …, λ_n. Then the LRC algorithm converges: the sequence A_k converges to a diagonal matrix with the eigenvalues of A on the diagonal. If Λ = diag(λ_i) with λ_1 > λ_2 > ⋯ > λ_n > 0, A = U Λ U^T, and U^T has an LR factorization, then A_k converges to Λ.

Proof: Let L_k = (ℓ_{ij}^{(k)}) and s_m^{(k)} = \sum_{i=1}^{m} a_{ii}^{(k)}, 1 ≤ m ≤ n. Since all A_k are positive definite and a_{ii}^{(k)} > 0, we have
s_m^{(k)} ≤ \sum_{i=1}^{n} a_{ii}^{(k)} = trace of A_k = trace of A.

Thus the s_m^{(k)} are bounded. From A_k = L_k L_k^T follows a_{ii}^{(k)} = \sum_{p=1}^{i} |ℓ_{ip}^{(k)}|^2. From A_{k+1} = L_k^T L_k follows a_{ii}^{(k+1)} = \sum_{p=i}^{n} |ℓ_{pi}^{(k)}|^2. Hence
s_m^{(k)} = \sum_{i=1}^{m} \sum_{p=1}^{i} |ℓ_{ip}^{(k)}|^2  and  s_m^{(k+1)} = \sum_{i=1}^{m} \sum_{p=i}^{n} |ℓ_{pi}^{(k)}|^2.
A sketch of the index ranges shows clearly that s_m^{(k)} ≤ s_m^{(k+1)}. So s_m^{(k)} converges, hence a_{ii}^{(k)} = s_i^{(k)} − s_{i−1}^{(k)} converges, and s_m^{(k+1)} − s_m^{(k)} → 0. This shows that
s_m^{(k+1)} − s_m^{(k)} = \sum_{j=m+1}^{n} \sum_{p=1}^{m} |ℓ_{jp}^{(k)}|^2 → 0,
so ℓ_{jp}^{(k)} → 0 for p ≠ j; and since a_{ii}^{(k)} = \sum_{p=1}^{i−1} |ℓ_{ip}^{(k)}|^2 + (ℓ_{ii}^{(k)})^2 with ℓ_{ii}^{(k)} > 0, the ℓ_{ii}^{(k)} converge. So L_i converges to a diagonal matrix, and hence A_i = L_i L_i^T converges to a diagonal matrix.

Second part: From A = U Λ U^T and U^T = LR follows
A^s = U Λ^s U^T = R^T L^T Λ^s L R  (s = 2t) = R^T Λ^t (Λ^{−t} L^T Λ^t)(Λ^t L Λ^{−t}) Λ^t R.
Since Λ^t L Λ^{−t} = I + E_t with E_t → 0, and by continuity of the LL^T-factorization, we have
(Λ^{−t} L^T Λ^t)(Λ^t L Λ^{−t}) = (I + E_t)^T (I + E_t) = \tilde{L}_s \tilde{L}_s^T,  \tilde{L}_s → I,
and therefore A^s = (R^T Λ^t \tilde{L}_s)(R^T Λ^t \tilde{L}_s)^T = \tilde{P}_s \tilde{P}_s^T.

We now have two different LL^T-decompositions of A^s. Hence there is a unitary diagonal matrix D_s with
P_s D_s = R^T Λ^t \tilde{L}_s,
and hence
D_s^{−1} A_{s+1} D_s = D_s^{−1} P_s^{−1} A P_s D_s = \tilde{L}_s^{−1} Λ^{−t} R^{−T} A R^T Λ^t \tilde{L}_s = \tilde{L}_s^{−1} Λ^{−t} (L^T Λ L^{−T}) Λ^t \tilde{L}_s.
Since A = U Λ U^{−1} = R^T L^T Λ L^{−T} R^{−T} and L^T Λ L^{−T} is upper triangular with diagonal Λ, it holds that Λ^{−t} (L^T Λ L^{−T}) Λ^t → Λ; and because \tilde{L}_s → I, it holds that D_s^{−1} A_{s+1} D_s → Λ, hence also A_{s+1} → Λ.

Remark 5.5.1 (i) One can also develop shift strategies and deflation techniques for the LR and LRC algorithms as in the QR algorithm. (ii) If A is a (k, k)-band matrix, then L_1 is a (k, 0)-band matrix and therefore A_2 = L_1^T L_1 is also a (k, k)-band matrix. The band structure is preserved.

(c) QR-algorithm for positive definite matrices: We apply the QR-algorithm (5.4.3)-(5.4.4) to symmetric matrices. From A_{i+1} = Q_i^{*} A_i Q_i it follows that all A_i are symmetric.

Theorem 5.5.2 The QR algorithm converges for positive definite matrices.

The proof follows immediately from Theorem 5.5.3 below. We consider now the iteration of the QR algorithm A_{i+1} = Q_i^{*} A_i Q_i and the iterations of the LRC algorithm \tilde{A}_i = \tilde{L}_i \tilde{L}_i^T, \tilde{A}_{i+1} = \tilde{L}_i^T \tilde{L}_i.

Theorem 5.5.3 The (i+1)-th iterate A_{i+1} of the QR algorithm for positive definite A corresponds to the (2i+1)-th iterate \tilde{A}_{2i+1} of the LRC algorithm, for i = 0, 1, 2, ….

Proof: From (5.4.5)-(5.4.7) we have P_i := Q_1 ⋯ Q_i, S_i := R_i ⋯ R_1 and
A^i = P_i S_i,  A_{i+1} = S_i A S_i^{−1}.   (5.5.7)
From (5.5.7) follows
A^{2i} = (A^i)^{*} A^i = S_i^{*} P_i^{*} P_i S_i = S_i^{*} S_i.   (5.5.8)
Similarly, from (5.5.2) and (5.5.3) with \tilde{P}_i = \tilde{L}_1 ⋯ \tilde{L}_i, we have
\tilde{A}^i = \tilde{P}_i \tilde{P}_i^T,  \tilde{A}_{i+1} = \tilde{P}_i^T A \tilde{P}_i^{−T}.   (5.5.9)

On the other hand, from (5.5.9) with i → 2i follows
\tilde{A}^{2i} = \tilde{P}_{2i} \tilde{P}_{2i}^T.
From the uniqueness of the LL^T-factorization with positive diagonal follows S_i^T = \tilde{P}_{2i}, and hence according to (5.5.8)-(5.5.9) it holds
A_{i+1} = S_i A S_i^{−1} = \tilde{P}_{2i}^T A \tilde{P}_{2i}^{−T} = \tilde{A}_{2i+1}.
The proof of Theorem 5.5.2 is now evident from Theorem 5.5.1 and Theorem 5.5.3.

Remark 5.5.2 For positive definite matrices, two steps of the LL^T algorithm cost about as much as one step of the QR algorithm. This shows that the QR algorithm is much more favorable.
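For illustration, a minimal NumPy sketch of the LRC iteration (5.5.4); the function name, the fixed step count, and the small example matrix are our own choices.

```python
import numpy as np

def lrc_iteration(A, steps=50):
    """Cholesky (LRC) iteration: A_k = L_k L_k^T, A_{k+1} = L_k^T L_k.
    For s.p.d. A with distinct eigenvalues the iterates converge to a
    diagonal matrix carrying the eigenvalues (Theorem 5.5.1)."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        L = np.linalg.cholesky(Ak)   # A_k = L L^T
        Ak = L.T @ L                 # A_{k+1} = L^T L
    return Ak

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.round(lrc_iteration(A), 6))            # nearly diagonal, largest eigenvalue first
print(np.sort(np.linalg.eigvalsh(A))[::-1])     # eigenvalues for comparison
```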

5.6 qd-algorithm (Quotient Difference)

We indicated in Remark 5.5.1(ii) that the band structure is preserved by the LR algorithm. Let A = A_1 be a (k, m)-band matrix. Then all L_i are (k, 0)-, all R_i are (0, m)-, and all A_i are (k, m)-band matrices, respectively. In particular, tridiagonal form is preserved. A transformation of the LR-algorithm for tridiagonal matrices leads to the qd-algorithm. A tridiagonal matrix
\tilde{A} = \begin{pmatrix} α_1 & β_2 & & 0 \\ γ_2 & α_2 & β_3 & \\ & \ddots & \ddots & \ddots \\ 0 & & γ_n & α_n \end{pmatrix}   (5.6.1)
can be transformed, for β_i ≠ 0 (i = 2, …, n), with D = diag(1, β_2, β_2 β_3, …, β_2 ⋯ β_n), to A = D \tilde{A} D^{−1} of the form
A = \begin{pmatrix} α_1 & 1 & & 0 \\ \tilde{β}_2 & α_2 & 1 & \\ & \ddots & \ddots & \ddots \\ 0 & & \tilde{β}_n & α_n \end{pmatrix}   (5.6.2)
with \tilde{α}_i = α_i and \tilde{β}_i = γ_i β_i. Hence without loss of generality we can study the form (5.6.2) for tridiagonal matrices. We now apply the LR-algorithm to (5.6.2):
A_s = \begin{pmatrix} α_1^s & 1 & & 0 \\ β_2^s & α_2^s & 1 & \\ & \ddots & \ddots & \ddots \\ 0 & & β_n^s & α_n^s \end{pmatrix},  L_s = \begin{pmatrix} 1 & & & 0 \\ e_2^s & 1 & & \\ & \ddots & \ddots & \\ 0 & & e_n^s & 1 \end{pmatrix},  R_s = \begin{pmatrix} q_1^s & 1 & & 0 \\ & q_2^s & \ddots & \\ & & \ddots & 1 \\ 0 & & & q_n^s \end{pmatrix}.   (5.6.3)

The LR factorization A_s = L_s R_s can be obtained by element comparison:
(1,1):  α_1^s = q_1^s,
(i,i−1):  β_i^s = e_i^s q_{i−1}^s,  i = 2, …, n,
(i,i):  α_i^s = e_i^s + q_i^s,  i = 2, …, n,
(i,i+1):  1 = 1 · 1,  i = 1, …, n−1.   (5.6.4)
s s s s s s We can determine es i , qi from above equations for a given As in the sequence q1 , e2 , q2 , e3 , q3 , s . . . , qn and compute As+1 = Rs Ls by s+1 s + es i = qi i+1 , i = 1, , n 1 s s+1 n = qn , (5.6.5) s +1 s s i = 2, , n. i = qi ei ,

We write s + 1 instead of s in (5.6.4), then we can eliminate As+1 and obtain


s+1 +1 s+1 s (i =)es + qi = qi + es i+1 , i = 1, , n i . s+1 s+1 s+1 s s (i =)ei qi1 = qi ei , i = 2, , n

(5.6.6)

For the convenience of notation we suppose es 1 = 0, es n+1 = 0, s = 1, 2, . (5.6.7)

The equations (5.6.6) can be represented by the qd-scheme and the Rhomben rules: qd-Scheme (es 1 s+1 e1
s =) 0 q1 es 2 s+1 +1 s = 0 q1 es q2 2 s+1 q2

.. ..

s s . qn 1 en s+1 s+1 s . qn qn 1 en s+1 qn

0 = es n+1 +1 0 = es n+1

The rst equations in (5.6.6) can be formulated as sum rule:


s qi +1 es i

..

. es i+1

..

.
s+1 qi

The sum of elements of upper rows is equal to the sum of elements of lower rows. Thus,
s+1 s+1 s . = qi + es qi i+1 ei

(5.6.8)

The second equations in (5.6.6) can be formulated as product rule: es i


s+1 qi 1

..

.
s qi

..

.
+1 es i

210 Chapter 5. The Unsymmetric Eigenvalue Problem The product of elements of upper rows is equal to the product of elements of lower rows. Thus, s es i qi s+1 ei = s+1 . (5.6.9) qi1 With these rules a new qd-rows can be determined by sum and product rules from left to right. Start according to (5.6.4) with s = 1. The formulas (5.6.8)(5.6.9) interpret the name quotient-dierence algorithm.

5.6.1

The qd-algorithm for positive denite matrix

in (5.6.1) is positive denite, then det A > 0, and it also holds for A because If A 1 det D = det A > 0. This also holds for principal determinants det A = det D det A h1 , , hn of A. They are positive and equal to principal determinants of A, respectively. In general we have Lemma 5.6.1 If a matrix B is diagonal similar to a positive denite matrix C , then all principal determinants of B are positive. Lemma 5.6.2 A matrix in the form (5.6.2) is diagonal similar to a symmetric tridiagonal matrix, if and only if, i > 0, for i = 2, , n. Especially this matrix is irreducible. Proof: If i > 0, then D1 AD is symmetric, where D = diag(1, t2 , t2 t3 , , t2 tn ), ti := i .

= D1 AD symmetric, then Reversely, if D is a diagonal matrix, D = diag (di ) and A a i,i+1 = di+1 di = a i+1,i = i (di /di+1 ) and di+1 di = 0. So i = ( ai,i+1 )2 > 0. Theorem 5.6.1 The qd-algorithm converges for irreducible, symmetric positive denite is irreducible and positive denite, then it holds the quantridiagonal matrices. i.e. If A tities computed from (5.6.2) (5.6.4) (5.6.8)(5.6.9): es i > 0,
s > 0, qi s

lim es i = 0,
s = 0, lim qi

i = 2, , n, i = 1, , n.

(5.6.10) (5.6.11)

and satisfy Hereby i , i = 1, , n are the eigenvalues of A 1 > 2 > > n > 0. (5.6.12)

Proof: Let hk i be the i-th principal determinant of Ak . We rst show that by induction on k : k k ek i > 0, i = 2, , n, qi > 0, hi > 0, i = 1, , n. For A = A1 , Lemma 5.6.1 shows that: h1 i > 0, i = 1, , n. In addition we have from As = Ls Rs that s s hs i = 1, , n. (5.6.13) i = qi qi ,

5.6 qd-algorithm (Quotient Dierence) 211 1 1 Hence for s = 1 : qi > 0, i = 1, , n. From Lemma 5.6.2 follows i = i > 0, i = 2, , n, so from (5.6.4) we get
1 e1 i = i 1 qi 1 > 0.

We suppose the above assertion is true until k 1, then from (5.6.5) follows
k 1 k 1 ei > 0, ik = qi

so from Lemma 5.6.2 and Lemma 5.6.1 we have that Ak is diagonal similar to a symmetric . Hence all hk matrix, which must be positive denite, because Ak is similar to A i > 0. k k Therefore from (5.6.13) qi and from (5.6.4) ei are also positive. We now show that k lim ek i = 0, lim qi = qi > 0.
k k s+1 From (5.6.6) for i = n, qn s +1 = qn converges and es n +1 s + es = qn follows n s+1 qn approaches to s s that qn is monotone decreasing, so qn zero. Adding the following equations

together

k+1 k +1 qn = qn ek n , k+2 k+1 k+1 +2 qn ek 1 = qn1 + en n1 , . . . k+ +1 k+ k+ k+ +1 qn = qn + en+1 en ,

we get that
k+1 k+2 k+ +1 k k+1 k + k+ +1 qn + qn = qn + qn 1 + + qn 1 + + qn en ,

i.e.,
k+ +1 +1 k = k en .

The sequence k is positive, monotone decreasing, so it converges, for = 1, , n 1. k Hence q converges to a number q 0, thus limk ek = 0. So lims Ls = I and hence q1 1 0 ... ... 1 lim As = lim Ls Rs = lim Rs = . .. s s s . 0 qn . It is necessary to show that qi are in This shows that qi are the eigenvalues of A and A s s qi decreasing order. Suppose qi /qi1 > 1 for one i, then also holds for all s, qi 1 > 1. This contradicts that +1 s s s es = es i qi qi1 and ei 0. i On the other hand, qi = qi1 is not possible, since a tridiagonal matrix with nonzero subdiagonal only possesses simple eigenvalues. Remark 5.6.1 It is remarkable that the qd-algorithm has the advanced applications in the numerical mathematics for the computation of roots of polynomials.

212

Chapter 5. The Unsymmetric Eigenvalue Problem

Chapter 6 The Symmetric Eigenvalue problem


6.1 Properties, Decomposition, Perturbation Theory
A Hermitian A = A A = (aik ), aik = a ki , i, k = 1, , n. T A symmetric A = A, A = A aik = aki , aik = a ik , i, k = 1, , n. Theorem 6.1.1 (Schur Decomposition for Hermitian matrices) If A Cnn is Hermitian (real symmetric), then there exists a unitary (orthogonal) Q such that Q AQ = diag (1 , , n ), Aqi = i qi , i = 1, , n, Q = [q1 , , qn ]. (1.1)

Proof: Let Q AQ = T be the Schur Decomposition of A. It follows that T must be a direct sum of 1 1 and 2 2 matrices, since T is Hermitian. But a 2 2 Hermitian matrix can not have complex eigenvalues. Consequently, T has no 2 2 block along its diagonal. Classical techniques: There are extremely eetive techniques based on the minimax principle, for investigating the eigenvalues of the sum of two symmetric matrices. Let X be a symmetric matrix dened by X= a aT diag (i ) (i = 1, , n).

We wish to relate the eigenvalues of X with the i . Suppose that only s of the components of a are non-zero. If aj is zero, then j is an eigenvalue of X . There exists a permutation P such that bT 0 , 0 Y = P T XP = b diag (i ) 0 0 diag (i ) where no component of b is zero, diag (i ) is of order s, and diag (i ) is of order n 1 s.

214 Chapter 6. The Symmetric Eigenvalue problem The eigenvalues of X are therefore i together with those of the matrix Z dened by Z= bT b diag (i ) .

If s = 0, Z is the single element and hence the eigenvalues of X are diag (i ) and . Otherwise examine the characteristic polynominal of Z :
s s

( )
i=1

(i )
j =1

b2 j
i=j

(i ) = 0.

(1.1.1)

Suppose that there are only t distinct values among the i . W.l.o.g. we may take them to be 1 , , t with multiplicities r1 , r2 , , rt respectively, so that r1 + r2 + + rt = s. Clearly the left-hand side of (1.1.1) has the factor
t

(i )ri 1 ,
i=1

so that i is an eigenvalue of Z of multiplicity (ri 1). ri Dividing (1.1.1) by t i=1 (i ) we see that the remaining eigenvalues of Z are the roots of
t

0 = ( )
i=1

1 c2 f (), i (i )

(1.1.2)

2 where c2 i is the sum of the ri values bj associated with i and is therefore strictly positive. A graph of f () against is given as follows, where it is assumed that distinct i are in decreasing order. It is immediately evident that the t+1 roots of = f () which we denote by 1 , 2 , , t+1 satisfy

> 1 > 1 ; i1 > i > i (i = 2, 3, , t); t > t+1 > The n eigenvalues of X therefore fall into three sets:

(1.1.3)

(1) The eigenvalues 1 , , n1s corresponding to the zero ai . These are equal to n 1 s of the i . (2) s t eigenvalues consisting of ri 1 values equal to i (i = 1, 2, , t). These are equal to a further s t of the i . (3) t + 1 eigenvalues equal to i satisfying (1.1.3). If t = 0 then 1 = . Let the eigenvalues of X be denoted by 1 2 n . Then it is an immediate consequence of our enumeration of the i above that if the i are also arranged in nonincreasing order then 1 1 2 2 n1 n . In other words the i separate the i at least in the weak sense. (1.1.4)

6.1 Properties, Decomposition, Perturbation Theory 215 Consider now the eigenvalues of X derived from X by replacing and . The eigenvalues of X will equal to those of X as far as sets (1) and (2) are concerned. Let us denote those in (3) by 1 , 2 , , t+1 . Now for > 0, we have df =1+ d
t

i=1

c2 i > 1, (i )2

(1.1.5)

and hence i i lies between 0 and . We may write i i = mi ( ), (1.1.6)

+1 where 0 mi 1 and t i=1 mi = 1. If t = 0 then 1 = and 1 = and 1 1 = . Hence we may write in all cases

i i = mi ( ),
+1 where 0 mi 1 and t i=1 mi = 1. Since the other eigenvalues of X and X are equal, we have established a correspondence between n eigenvalues 1 , , n and 1 , , n of X and X respectively.

i i = mi ( ),
n

(1.1.7)

0 mi 1,
i=1

mi = 1 ,

where mi = 0 for the eigenvalues from sets (1) and (2). Now let C = A + B , where A and B are symmetric and B is of rank 1. There exists 0 an orthogonal matrix R such that RT BR = , = 0. Let 0 RT AR = aT a An1 .

Then there is an orthogonal matrix S of order n-1 such that S T An1 S = diag (i ), and if we dene Q by Q=R then Q is orthogonal and QT (A + B )Q = bT b diag (i ) + 0 0 , 1 0 0 S ,

where b = S T a, the eigenvalues of A and of (A + B ) are therefore those of bT b diag (i ) and bT + b diag (i ) ,

216 Chapter 6. The Symmetric Eigenvalue problem and if we denote these eigenvalues by i and i in decreasing order, then they satisfy i i = mi ,
n

(1.1.8) mi = 1.

0 mi 1
i=1

Hence when B is added to A, all eigenvalues of the latter are shifted by an amount which lies between zero and the eigenvalue of B . We summary the above discussion as the following theorem. Theorem 6.1.2 Supose B = A + c cT , where A Rnn is symmetric, c Rn has unit 2-norm and R. If 0 then i (B ) [i (A), i1 (A)], while if 0 then i (B ) [i+1 (A), i (A)], In either case i (B ) = i (A) + mi , where m1 + m2 + + mn = 1 and mi 0. Let i (A) denote the ith largest eigenvalue of A. Then n (A) n1 (A) 1 (A). Denition 6.1.1 If A = A ,x = 0, then R[x] = xT Ax xT x (1.2) i = 1, 2, . . . , n 1. i = 2, 3, . . . , n,

is called the Rayleigh-Quotient of x, sometimes denoted by R[x, A]. Theorem 6.1.3 If A = A , then it holds n (A) R[x] 1 (A). Proof: From (1.1) we have x Ax x U AU x y y R[x] = = = = xx x U U x yy
n 2 i=1 i |yi | , (y n 2 i=1 |yi |

(1.3)

= U x).

(1.4)

Thus R[x] is a convex combination of i , it follows (1.3). Corollary 6.1.4 1 (A) = max R[x]
x=0

and

n (A) = min R[x].


x=0

6.1 Properties, Decomposition, Perturbation Theory 217 Theorem 6.1.5 (Weyl) If A is Hermitian with eigenvalues 1 2 n and eigenvectors u1 , , un , then it holds i = max{R[x] : x = 0, x uk , k = 1, , i 1}, for i = 1, , n. Proof: It is clear for i = 1. Let i > 1. If x u1 , , ui1 then u k x = 0, for k = 1, , i 1. So y = U x satises yk = 0, k = 1, , i 1 (Here U = [u1 , , un ]). It follows from (1.4) that n 2 j =i j yj i . R[x] = n 2 j =i yj For x = ui , we have R[x] = i , so (1.5) holds. Theorem 6.1.6 (Courant-Fischer) Under above assumptions we have i = i = i =
{p1 , ,pn }l.i.

(1.5)

min

{max{R[x] : x = 0, x pk , k = 1, , i 1}} {max{R[x] : x S \ {0}}}.

(1.6) (1.7) (1.8)

dimS =n+1i dimS =i

min

max {min{R[x] : x S \ {0}}}.

Proof: (1.6) (1.7) trivial. (1.7) (1.8): Applying (1.7) to A, we then have i (A) = That is n+1i (A) =
dimS =n+1i dimS =n+1i

min

{max{R[x] : x S \ {0}}}.

max

{min{R[x] : x S \ {0}}}.

(Use max(ai ) = min(ai ), min(ai ) = max(ai )). By substituting i n + 1 i follows (1.8). Claim (1.6): Since 1 = maxx=0 (R[x]), for i = 1 it is true. Consider i > 1 : Let p1 , , pi1 = 0 be given. The linear system pk x = 0, uk x = 0 , k = 1, , i 1, k = i + 1, , n

has a solution x = 0, because of n 1 homogenous equations with n variables. Let U = [u1 , , un ]. Then R[x] = x U U x = x U U x
i 2 j =1 j |U x|j i 2 j =1 |U x|j

i .

But p k x = 0, k = 1, , i 1 so max{R[x] : x pk , k = 1, , i 1} i .

218 This implies


{pk }k=1

Chapter 6. The Symmetric Eigenvalue problem i min {max{R[x] : x pk , k = 1, , i 1}}. i1

Now set pk = uk , k = 1, , i 1. By (1.5) we have the equality (1.6). Theorem 6.1.7 (Separation theorem) A is Hermitian with eigenvalues n n1 1 . Let a11 a1,n1 . . . An1 = . . . an1,1 an1,n1 be the n 1 principal submatrix of A with eigenvalues n1 1 . Then it holds s+1 s s , for s = 1, , n 1. Proof: Let z = x 0 Cn , where x Cn1 . Then x An1 x z Az = . x x zz Applying (1.5) to An1 we have s = max{ x An1 x : 0 = x Cn1 , x ui , i = 1, , s 1} x x z Az ui = max{ : 0 = z Cn , z , eT n z = 0, i = 1, , s 1} 0 z z min max{R[z ] : z pi , i = 1, , s} = s+1 (By (1.6)). s
{pi }i=1 l.i.

therefore s+1 s . Here ui is the eigenvector of An1 . Now set A A then s+1 (A) s (An1 ). Thus ns (A) ns (An1 ). It follows ns (A) ns (An1 ). Hence we have ns ns . By setting s n s, we have s s . Theorem 6.1.8 (Separation theorem) Let 1 n be the eigenvalues of A and 1 n1 be the eigenvalues of B , where B is obtained by scratching a row and the same column of A, then s+1 s s . Further consequence are: If B consists of by scratching two rows and the coresponding columns of B , i.e., A B B , then we have i+2 i+1 i i i and i+2 i i . In general: Let B be the principal submatrix of A of order n r, then i+r (A) i (B ) i (A), i = 1, , n r.

6.1 Properties, Decomposition, Perturbation Theory 219 Theorem 6.1.9 (Perturbation theorem) Let A, E be Hermitian. Then it holds i (A) + n (E ) i (A + E ) i (A) + 1 (E ), Proof: For x = 0, R[x, A + E ] = R[x, A] + R[x, E ]. Thus R[x, A] + n (E ) R[x, A + E ] R[x, A] + 1 (E ). Applying (1.6) we get i (A) + n (E ) i (A + E ) i (A) + 1 (E ). i = 1, , n. (1.9)

Corollary 6.1.10 (Monotonic theorem) If E is positive semidenite, then it holds i (A + E ) i (A). Corollary 6.1.11 (Weyls theorem) It holds |i (A + E ) i (A)| max{1 (E ), n (E )} = max{|i (E )|, i = 1, , n} = (E ) = E 2 = spectral radius of E. Theorem 6.1.12 (Homann-Wielandt) If A, E are Hermitian matrices, then
n n

(i (A + E ) i (A)) E
i=1

2 F

=(
i=1

i (E )2 ) 2 .

Proof: Later! Denition 6.1.2 A matrix B = (bij ) is called double stochastic(d.s.), if (1) bij 0. (2) n n j =1 bij = j =1 bji = 1, for i, j = 1, , n. Remark: The d.s. matrices form a convex set D. is double stochastic. Example: Let W be orthogonal and W = (wik ). Then (|wik |2 ) = W Example: Let P be a permutation matrix. Then P is double stochastic (Extreme point of D). Theorem 6.1.13 (Birkho ) D is the convex closure of the permutation matrices, that is, for B D, there exists 1 , , r and P1 , , Pr permutations such that
r r

B=
i=1

i pi ,

i 0,
i=1

i = 1.

220 (Without Proof !)

Chapter 6. The Symmetric Eigenvalue problem

Remark: Let l be a linear functional from D into R. Then it holds


r P P erm.

min l(P ) l(B ) = l(


i=1

i pi ) max l(P ).
P P erm.

Proof of Homann-Wielandt theorem 6.1.12: Let A = U U , = diag (1 (A), , n (A)),

i ). V , = diag (1 (A + E ), , n (A + E )) diag ( A+E =V Then V E = A (A + E ) = U U V V U )U = V (V U W )U = V (W and since W = V U is unitary, we have


n

2 F

= =

W W
n

2 F

=
i,k=1

i )|2 |wik (k

k |2 = l(W ) l(P ) (for some P ) |wik |2 |i


i,k=1

= (|wik |) is in D). (Hereby W


n

=
k=1

(k) |2 (for some permutation ) | k


n

= min
k=1 n

(k) |2 | k

=
k=1

(k (A) k (A + E ))2 . (Exercise!)

Perturbation theorem of invariant subspaces ( eigenvectors ) Theorem 6.1.14 A Rnn symmetric, S Rmm symmetric and AQ1 Q1 S = E1 with Q1 Rnm , QT 1 Q1 = I m . Then there exist eigenvalues 1 , , m of A such that |i i (S )| E1 2 . (1.11) (1.10)

6.1 Properties, Decomposition, Perturbation Theory Proof: Extend Q1 to an orthogonal matrix Q = (Q1 , Q2 ), then QT AQ = =
T QT 1 AQ1 Q1 AQ2 T Q2 AQ1 QT 2 AQ2

221

= +

T S + QT E1 Q2 1 E1 T T Q2 E 1 Q2 AQ2 T QT 1 E1 E1 Q2 QT X 2 E1

(by (1.10))

S 0 T 0 Q2 AQ2 X

B + E.

T Here QT 1 E1 = Q1 AQ1 S is symmetric. Corollary 6.1.10 results

|i i (S )| E 2 . Show that: E 2 = E1 2 for suitable X . It holds E1 2 E 2 . The equality holds immediately from the Extension Theorem of Kahan(1967): H Extension Lemma: Let R = , H = H . There exists a W such that the extend B H B matrix A = satises A 2 = R 2 . B W Proof of Extension Lemma: Let = R 2 . For any choice of W we have 2 A2 2 (by separation theorem). The theorem requires that for some W the matrix 2 A2 is positive semidenite. Take any > , show that 2 A2 > 0 for some W depending on . Then a limiting argument show that, as + , lim W ( ) exists. = B . Write A = (R, R ). Then For any W : Dene R W 2 A2 = and 2 RR = where [I + R( 2 R R)1 R ]R, U ( ) = 2 R 2 2 2 1 V ( ) = [I B ( H ) B ].
2 2 2 2 Since 2 > 2 = R 2 2 = R R 2 = RR 2 , R R, RR and H are all positive denite. By Sylvesters Inertia theorem we have V ( ) positive denite. U ( ) depends on W .

I 0 L I I 0 K I

2 R R 0 0 U ( ) 2 H 2 0 0 V ( )

I L 0 I I K 0 I

The trick of the proof: To nd a W such that U ( ) = V ( ), and then from Sylvesters follows 2 A2 > 0. First we prove that W ( ) = BH ( 2 H 2 )1 B = B ( 2 H 2 )1 HB .

222 From above

Chapter 6. The Symmetric Eigenvalue problem

U ( ) = 2 BB W 2 (BH + W B )( 2 H 2 B B )1 (HB + B W ). Consider ( 2 H 2 B B )1 = ( 2 H 2 )1 + ( 2 H 2 )1 B [I B ( 2 H 2 )1 B ]1 B ( 2 H 2 )1 (Scherrman-Morrison formula) S + SB XBS, where S = ( 2 H 2 )1 and X = (I B ( 2 H 2 )1 B )1 = (I BSB )1 . Set Y = BSHB . Then by SH = HS we get U ( ) = = + = Then = W 2 + W Y + Y XY + W (I X 1 )XY + Y W + W (I X 1 )W +Y X (I X 1 )W + W (I X 1 )X (I X 1 )W = Y XY + W XY + Y XW + W XW = (Y + W )X (Y + W ) = 0. Thus W ( ) = Y = BSHB = B ( 2 H 2 )1 HB . The matrix W ( ) is a rational, and therefore meromorphic function of complex variable . Its only singularities are poles in any neighborhood of which W 2 must be unbounded. However W 2 A 2 < for all > and thus W ( ) must be regular at = and so W () = lim+ W ( ). By continuity of norm we have A()
2

2 BB W 2 (BH + HB )(S + SB XBS )(HB + B W ) 2 BB W 2 BSH 2 B + W Y + Y XY + W BSB XY + Y W W BSB W + Y XBSB W + W BSB W + W BSB XBSB W V ( ) + (remainder term).

= lim+ A( )

= .

Generalized Extension Theorem (C. Davis-Kahan-Weinberger) Given H, B, E arbitary, then there exists W with H E B W = max{
2

H B

,
2

(H, E ) 2 }.

6.1 Properties, Decomposition, Perturbation Theory So for suitable X we have E


2

223

QT 1 E1 QT 2 E1

= QT E 1

= E1 2 .

Theorem 6.1.15 A Rnn and S Rmm are symmetric and AX1 X1 S = E1 , where X1 Rnm satises m (X1 ) > 0, then there exists eigenvalues 1 , , m of A such that |i i (S )| E1 2 /m (X1 ). Proof: Let X1 = Q1 R1 be the QR-decomposition of X1 . By substituting into AX1 X1 S = E1 we get AQ1 Q1 S = F1 , where S1 = R1 SR1 1 and F1 = E1 R1 1 . The theorem follows by applying theorem 6.1.14 and noting that (S ) = (S1 ) and F1 2 E1 2 /m (X1 ). The eigenvalue bounds in theorem 6.1.14 depend on the size of the residual of the approximate invariant subspace, i.e., upon the size of AQ1 Q1 S . The following theorem tells how to choose S so that this quantity is minimized when = F . Theorem 6.1.16 If A Rnn is symmetric and Q1 Rnm satises QT 1 Q1 = Im , then
S Rmm

min

AQ1 Q1 S

= AQ1 Q1 (QT 1 AQ1 )

= (I Q1 QT 1 )AQ1

F.

Proof: Let Q2 Rn(nm) be such that Q = [Q1 , Q2 ] is orthogonal. For any S Rmm we have AQ1 Q1 S
2 F

= QT AQ1 QT Q1 S

2 F

= QT 1 AQ1 S

2 F

+ QT 2 AQ1

2 F.

Clearly, the minimizing S is given by S = Q1 T AQ1 . Theorem 6.1.17 Suppose A Rnn is symmetric and Q1 Rnk satises QT 1 Q1 = Ik . If Z T (QT 1 AQ1 )Z = diag (1 , k ) = D is the Schur decomposition of QT 1 AQ1 and Q1 Z = [y1 , , yk ], then Ayi i yi
2

= |(I Q1 QT 1 )AQ1 Zei

(I Q1 QT 1 )AQ1

for i = 1, , k . The i are called Ritz values, the yi are called Ritz vectors, and the (i , yi ) are called Ritz pairs. Proof: Ayi i yi =AQ1 Zei Q1 ZDei =(AQ1 Q1 (QT 1 AQ1 ))Zei . The theorem follows by taking norms. Denition 6.1.3 The inertia of a symmetric matrix A is a triplet of integers (m, z, p), where m, z and p are the number of negative, zero and positive elements of (A).

224 Chapter 6. The Symmetric Eigenvalue problem Theorem 6.1.18 (Sylvester Law of Interia) If A Rnn is symmetric and X Rnn in nonsingular, then A and X T AX have the same inertia. Proof: Suppose r (A) > 0 and dene the subspace S0 Rn by S0 = Span{X 1 q1 , , X 1 qr }, qi = 0, where Aqi = i (A)qi and i = 1, , r. From the Minimax characterization of r (X T AX ) we have y T (X T AX )y y T (X T AX )y r (X T AX ) = max min min . y S0 dim(S )=r y S yT y yT y Now for any y Rn we have y T (X T X )y n (X )2 , while for y S0 . yT y It is clear that y T (X T AX )y r (A). y T (X T X )y y T (X T AX )y y T (X T X )y } r (A)n (X )2 . T T T y (X X )y y y

Thus, r (X T AX ) min{
y S0

An analogous argument with the roles of A and X T AX reversed shows that r (A) r (X T AX )n (X 1 )2 = r (X T AX )/1 (X )2 . It follows that A and X T AX have the same number of positive eigenvalues. If we apply this result to A, we conclude that A and X T AX have the same number of negative eigenvalues. Obviously, the number of zero eigenvalues possessed by each matrix is also the same.

6.2 Tridiagonalization and the Symmetric QR-algorithm

6.2

Tridiagonalization and the Symmetric QR-algorithm

225

We now investigate how the practical QR algorithm develop in Chapter 1 can be specialized when A Rnn is symmetric. There are three obvious observations: (a) If Q0 T AQ0 = H is upper Hessenberg, then H = H T must be tridiagonal. (b) Symmetry and tridiagonal band structure are preserved when a single shift QR step is performed. (c) There is no need to consider complex shift, since (A) R. Algorithm 2.1 (Householder Tridiagonalization) Given symmetric A Rnn , the following algorithm overwrites A with Q0 T AQ0 = T , where T is tridiagonal and Q0 = P1 Pn2 is the product of Householder transformations. For k = 1, 2, , n 2, determine a Householder Pk Rnk such that ak+1,k 0 . = . k P . . . . . ank 0 T k ). A := Pk AP , Pk = diag (Ik , P
k

This algorithm requires 2 n3 ops. If Q0 is required, it can be formed with an additional 3 (2/3)n3 ops. We now consider the single shift QR iteration for symmetric matrices. T = QT (tridiagonal) 0 AQ0 , For k = 0, 1, T I = QR, (QR decomposition) T := RQ + I. Single Shift: Denote T by a1 b2 .

(6.2.1)

.. . b a T = 2 .2 . . . . . bn bn an

We can set (a) = an or (b) a more eective choice to shift by the eigenvalues of an1 bn that is closer to an . This is known as the Wilkinson shift and is given by bn a n = an + d sign(d) d2 + b2 n, d = (an1 an )/2. where (6.2.2)

Wilkinson (1968) has shown that (6.2.2) is cubically convergent with either shift strategy, but gives heuristic reasons why (6.2.2) is prefered.

226 Chapter 6. The Symmetric Eigenvalue problem Implicit Shift: As in the unsymmetric QR iteration, it is possible to shift implicitly in (6.2.1). Let c = cos() and s = sin() by computed such that c s s c
T

a1 b2

(as in (6.2.1)) and J1 = J (1, 2, ). then J1 e1 = Qe1 , where QT T Q = RQ + I = T J1 T J 1 =


T

+ 0 0

0 0

+ 0

0 0

0 0 0

We are thus in a position to apply implicit Q theorem provided we can compute rotations J2 , , Jn1 with the property that if Z = J1 Jn1 then Ze1 = J1 e1 = Qe1 and Z T T Z is tridiagonal. T :=
T J2 T J2

0 0 0 0 0 0

+ 0 0 0

0 0 0 0

0 + 0 0

0 0 0 0 0 0

, T :=
T J3 T J3

0 0 0

0 0

0 +

0 0

0 0 +

T :=

T J4 T J4

Algorithm 2.2 (Implicit Symmetric QR step with Wilkinson Shift) Given an unreduced symmetric tridiagonal matrix T Rnn , the following algorithm overwrites T T T Z , where Z = J1 Jn1 is the product of Givens rotation with Z T (T I ) is with Z upper triangular and is Wilkinson shift. d := (tn1,n1 tnn )/2,
2 2 = tnn t2 n,n1 /[d + sign(d) d + tn,n1 ],

x := t11 , z := t21 , For k = 1, , n 1, determine c = cos(), s = sin() c s x such that = , s c z 0 T T := Jk T Jk , Jk = J (k, k + 1, ). If k < n 1, then x := tk+1,k , z = tk+2,k .

6.2 Tridiagonalization and the Symmetric QR-algorithm 227 nn Algorithm 2.3 (Symmetric QR algorithm) Given symmetric matrix A R and a T tolerance , the following algorithm overwrites A with Q AQ = D + E , where Q is orthogonal, D is diagonal and E saties E 2 eps A 2 . Using Algorithm 2.1 compute A := (P1 Pn1 )T A(P1 Pn2 ) = T. Repeat set ai+1,i and ai,i+1 to zero if |ai+1,i | = |ai,i+1 | (|aii | + |ai+1,i+1 |) for any i = 1, , n 1. Find the largest q and the smallest p such that if A11 0 0 }p A = 0 A22 0 }n p q 0 0 A33 }q then A33 is diagonal and A22 has no zero subdiagonal elements. If q = n then stop. Iq )T A diag (Ip , Z, Iq ) , Apply algorithm 2.2 to A22 , A = diag (Ip , Z, Go to Repeat. This algorithm requires about (2/3)n3 ops and about 5n3 ops if Q is accumulated.

228

6.3

Once Again:The Singular Value Decomposition


2 2 V T (AT A)V = diag (1 , , n ) Rnn and 2 2 U T (AAT )U = diag (1 , , n , 0, , 0) Rnn .

Chapter 6. The Symmetric Eigenvalue problem

Let A Rmn . If U T AV = diag (1 , , n ) is the SV D of A (m n) then (3.1) (3.2)

Moreover if U =

U1 , U 2

and we dene the orthogonal Q by 1 Q= 2 V V 0 U1 U1 2 U2 ,

then QT

0 AT A 0

Q = diag (1 , , n , 1 , , n , 0, , 0).

(3.3)

These connections to the symmetric eigenvalue problem allow us to develop an algorithm for SV D as previous section. Theorem 6.3.1 If A Rmn , then for k = 1, , min{m, n}, k (A) = max { min y T Ax Ax 2 } = max {min }. dimS =k xS x 2 y 2 x 2

dimS =k,dimT =k xS,y T

Proof: Exercise! Prove theorem 6.1.5 (Weyl) and theorem 6.1.6 (Courant-Fisher)! By applying theorem 6.1.9 to to AT A we obtain Corollary 6.3.2 If A and A + E are in Rmn (m n), then for k = 1, 2, , n |k (A + E ) k (A)| 1 (E ) = E 2 . Corollary 6.3.3 Let A = [a1 , , an ] be a column partitioning of A Rmn (m n). If Ar = [a1 , , ar ], then for r = 1, , n 1, 1 (Ar+1 ) 1 (Ar ) 2 (Ar+1 ) r (Ar+1 ) r (Ar ) r+1 (Ar+1 ). Theorem 6.3.4 If A and A + E are in Rmn (m n), then
n

0 AT A 0

and

0 (A + E )T (A + E ) 0

and theorem6.1.8

[k (A + E ) k (A)]2 E
k=1

2 F.

Proof: Apply Theorem 6.1.12 to

0 AT A 0

and

0 (A + E ) T (A + E ) 0

We now show a variant of the QR algorithm can be used to comput SV D of a matrix. Equation (3.1) suggests:

6.3 Once Again:The Singular Value Decomposition (a) Form C = AT A; (b) Use the symmetric QR algorithm to compute V1T CV1 = diag ( 2 ); (c) Use QR with column pivoting to upper triangularize B = AV1 : U T (AV1 ) = R.

229

Since R has orthogonal columns, it follows that U T A(V1 ) is diagonal. A preferable method for computing the SV D is described in Golub and Kahan(1965). The rst step is to reduce A to upper bidiagonal form using algorithm 7.5 or 7.6 in part I: d1 f2 .. . d2 B . . T . f n . UB AVB = = dn 0 The remaining problem is thus to compute the SV D of B . Consider applying an implicit QR step (algorithm 8.2) to the tridiagonal matrix T = B T B : (a) Compute the eigenvalue of
2 d2 n + fn . 2 d2 dm fn m + fm 2 dm fn d2 n + fn

(m = n 1) that is closer to

(b) Compute c1 = cos 1 and s1 = sin 1 such that c1 s1 s1 c1 and set J1 = J (1, 2, 1 ). (c) Compute Givens rotations J2 , , Jn1 such that if Q = J1 Jn1 then QT T Q is tridiagonal and Qe1 = J1 e1 . Note that these calculations require the explicit formation of B T B , which is unwise in the numerical standpoint. Suppose instead that we apply Givens rotation J1 above to B directly. This gives + B := BJ1 = , n = 5. d2 1 d1 f2 = 0 ,

230 Chapter 6. The Symmetric Eigenvalue problem Determine Givens rotations U1 , V2 , U2 , , Vn1 and Un1 to chase the nonzero element down the diagonal:

+ T B := U1 B= + T B := U2 B =

, ,

B := BV2 =

B := BV3 = +

as follows The process terminates with a new bidiagonal B

T T = ( Un T B 1 U1 )B (J1 V2 Vn1 ) = U B V .

e1 = Qe1 . Since each Vi has the form Vi = J (i, i + 1, i ), i = 2, , n 1, it follows that V By implicit Q theorem we can assert that V and Q are essentially the same. Thus we can =B B T by working directly on the bidiagonal implicitly eect the transition from T to T matrix. It is necessary for these claims to hold that the underlying tridiagonal matrices be unreduced. This is the condition for the performance of implicit QR method. d1 f2 . d2 . . Let B = . If (B T B )i,i+1 = fi+1 di = 0, then: ... fn dn Either fi+1 = 0: B is reduced to B = Or di =0: What happens?
B1 B2

two small problems.

6.3 Once Again:The Singular For Example 0 0 B= 0 0 0 0 0 0

Value Decomposition 0 0 0 0 0 0 0 0 0 , (d2 = 0, n = 5) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

231

B := J1 (2, 3, )B = Rotation in (2, 3) B := J2 (2, 4, )B = Rotation in (2, 4) B := J3 (2, 5, )B = Rotation in(2, 5) |fi | (|di1 | + |di |) where and

Criteria: For smallness within B s band are usually of the the form |di | B , is a small multiple of the unit roundo.

Algorithm 3.1 (Golub-Kahan SV D Step) B Rnn is bidiagonal having nonzero subdiagonal and diagonal, the following algorithm overwrites B with the bidiagonal matrix =U T BV , where U and V are orthogonal and V is essentially the orthogonal matrix B that would be obtained by applying algorithm 8.2 to T = B T B. Let be the eigenvalue of the trailing 2 2 sumatrix of T = B T B that is closer to tnn . y = t11 z = t12 For k = 1, , n 1, Determine c = cos and s = sin such that c s [y , z ] = [ , 0] s c B = BJ (k, k + 1, ) y = bkk z = bk+1,k Determine c = cos and s = sin such that c s y = s c z 0 T B := J (k, k + 1) B If k < n 1, then y := bk,k+1 , z := bk,k+2 .

232 Chapter 6. The Symmetric Eigenvalue problem requires 4mn This algorithm requires 20n ops and 2n square roots. Accumulating U requires 4n2 ops. ops and V Algorithm 3.2 (The SV D Algorithm) Given A Rmn (m n) and a tolerance, the following algorithm overwrites A with U T AV = D + E , where U Rmn is orthogonal, V Rnn is orthogonal, D Rmn is diagonal, and E satises E 2 eps A 2 . Using algorithm 7.5 or 7.6 in Part I to compute the bidiagonalization A := (U1 Un )T A(V1 Vn2 ). Repeat Set ai,i+1 to zero if |ai,i+1 | (|aii | + |ai+1,i+1 |) for any i = 1, , n 1. Find the largest q and the smallest p such that A11 0 0 p 0 A22 0 n p q A= 0 0 A33 q 0 0 0 mn Then A33 is diagonal and A22 has a nonzero subdiagonal. If q = n then stop. If any diagonal entry in A22 is zero then zero the subdiagonal entry in the same row and go to Repeat. Apply algorithm 3.1 to A22 , , Iq+mn )T A diag (Ip , V , Iq ). A := diag (Ip , U Go to Repeat

6.4 Jacobi Methods

6.4

Jacobi Methods

233

Jacobi(1846) proposed a method for reducing a Hermitian matrix A = A Cnn to diagonal form using Givens rotations. Let A Cnn be a Hermitian matrix, there exists a unitary U such that U AU = diag (1 , , n ). (4.1) The Jacobi method constructs U as the product of innite many two dimensional Givens rotations. Fix indices i, k, i = k , Given a Givens Rotation
1 0 Uik = 0 .. . 1 . . . . . . . . . i e cos . . . . . . . . . sin . . . . . . . . . i . . . . . . . . . sin . . . . . . . . . 0 1

..

. 1

.
k

(4.2)

ei cos . . . 1 . . . . . . 0 k

..

Hereby , are free parameters, if A is real symmetric then = 0. Set V = Uik , B = V AV . Then s = i, k asj , ei cosaij sinakj , s = i j = i, k (4.3) bsj = i sinaij + e cosakj , s = k bsi = ei cosasi sinask , bsk = sinasi + ei cosask , s = i, k s = i, k (4.4)

bik = sincosei (aii akk ) + e2i (cos2 sin2 )aki bki = bik 2 b = cos aii + sin2 akk sincos[ei aki + ei aik ] ii bkk = sin2 aii + cos2 akk sincos[ei aki + ei aik ] We denote here the Frobenius norm (Hilbert-Schmidt norm) by (A) = dene the outer norm by g (A) =
i=k i,k

(4.5)

|aik |2 and

|aik |2 ,

which is only a seminorm (that is g (A) = 0 A = 0 does not hold). We also have (U A) = (A) = (AV ) for unitary U, V . Therefore (A) = (V AV ) = (B ),

234 that is

Chapter 6. The Symmetric Eigenvalue problem |ajs |2 =


j,s j,s

|bjs |2 .

(4.6)

On the other hand one computes |aii |2 + |akk |2 + 2|aik |2 = |bii |2 + |bkk |2 + 2|bik |2 . Together with bjj = ajj j = i, k follows from (4.6) and (4.7) |ajs |2 =
j =s j =s

(4.7)

|bjs |2 + 2|aik |2 2|bik |2

or g 2 (A) 2|aik |2 + 2|bik |2 = g 2 (B ). (4.8) To make g (B ) as small as possible we choose the free parameters , such that bik = 0. Multiplying the rst equation in (4.5) by 2ei and set bik to zero: sin2(akk aii ) = 2ei cos2 aik 2ei sin2 a ik . (4.9)

We exclude the trivial case aik = 0 (then set V = I ). Suppose that aik = 0. Compare the imaginary part in (4.9) which results 0 = Im(aik ei ). This equation holds for = argaik . From aik ei = |aik |. (4.9) leads to 2|aik |(cos2 sin2 ) = sin2(akk aii ), where cot2 = akk aii . 2|aik | (4.10)

(4.10) has exactly one solution in ( , ]. The choice = argaik + leads to the 4 4 same matrix B . For symmetric A, we choose = 0, then is obtained by cot2 = akk aii . 2aik

So the Jacobi method proceeds: A0 := A, an iteration sequence A0 , A1 , is con Am Vm , Am = (am structed by Am+1 := Vm ik ). Hereby Vm has the form of (4.2). The underlying pivot pairs i, k of Vm is formed according to a rule of choice so that the un+1 = 0. derlying , are chosen satisfying am ik Choice rules: (1) choose (i, k ) such that
m |am ik | = max |ajs |. j =s

This is the classical Jacobi method.

6.4 Jacobi Methods 235 Theorem 6.4.1 Let A be Hermitian. V = Uik is as in (4.2) where (i, k ) are chosen so that |aik | is maximal with , according to (4.10). Let B = V AV . Then it holds g 2 (B ) p2 g 2 (A) with p= n2 n 2 < 1. n2 n (4.11)

Proof: There are n2 n o-diagonal elements, so g 2 (A) (n2 n)|aik |2 . Thus |aik |2 1 g 2 (A), hence n2 n n2 n 2 2 g 2 (B ) = g 2 (A) 2|aik |2 g (A). n2 n Theorem 6.4.2 The classcial Jacobi method converges, that is, there exists a diagonal matrix so that limm Am = . Proof: From (4.11) follows g (Am ) 0, so ars 0 for all r = s. It remains to show the convergence of diagonal elements. From (4.5) and (4.10) follows that |bii aii | = | sin2 (akk aii ) |aik |2 sin cos | = |aik | |2 sin2 cot 2 2 sin cos | sin | |aik |. = |aik | | cos Analogously, |bkk akk | |aik |. If now i, k are the pivot indices of Am , then from above we have m+1 m m |a m jj ajj | |aik | g (Am ) p g (A). Thus
+q m m+1 |a m am + + pm+q1 |g (A) jj | |p + p jj (m)

pm g (A). 1p

This shows that the convergence of diagonal. Schonage (1964) and Van Kempen (1966) show that for k large enough there is a 1) constant c such that g (Ak+N ) cg (Ak )2 , N = n(n2 , i.e., quadratic convergence. An earlier result established by Henrici (1958) when A has distinct eigenvalue. (2) choose (i, k ) cyclically, e.g., (i, k ) = (1, 2), (1, 3), . . . , (1, n); (2, 3), . . . , (2, n); . . . ; (n 1, n); (1, 2), (1, 3), . . .. This is the cyclic Jacobi method. Algorithm 4.1 (Serial Jacobi cyclic Jacobi) Given a symmetric A Rnn and eps, the following algorithm overwrites A with U T AU = D + E , where U is orthogonal, D is diagonal, and E has a zero diagonal and satises E F A F : := A F Do until g (A) 2 For p = 1, 2, . . . , n 1, For q = p + 1, , n, Find J = J (p, q, ) such that the (p, q ) entry of J T AJ is zero, A := J T AJ.

236 Chapter 6. The Symmetric Eigenvalue problem 3 This algorithm requires 2n op per sweep. An additional 2n3 op are required if U is accumulated. (Hereby it is customary to refer to each set of n(n21) rotations as a sweep). A proof of quadratic convergence see Wilkinson (1962) and Van kempen (1966). Remark 4.1 In classical Jacobi method for each update O(n2 ) comparsions are required in order to locate the largest o-diagonal element. Thus much more time is spent by searching than updating. So the cyclic Jacobi method is considerably faster than classical Jacobi method. (3) When implementing serical Jacobi method, it is sensible to skip the annihilation of aik if its modulus is less than some small (sweep-dependent) parameter, because the net reduction of g (A) is not worth to cost. This leads to what is called threshold Jacobi method. Given a threshold value , choose the indices pair (i, k ) as in (2). But perform the m rotation only for |am ik | > . If all |aik | , then we substitute by /2 and so on. Details concering this variant of Jacobis algorithm may be found in Wilkinson (AEP p.277). Remark 4.2 (1) Although the serial Jacobi method (2) and (3) converge quadratically, it is not competitive with symmetric QR algorithm. One sweep of Jacobi requires as many ops as a complete computation of symmetric QR algorithm. However, the Jacobi iteration is attractive, for example, the matrix A might be close to a diagonal form. In this situation, the QR algorithm loses its advantage. (2) The Jacobi iteration is adapted to parallel computation. A given computational task, such as a sweep, can be shared among the various CPUs thereby reducing the overall computation time. (3) In practice we usually apply the choice (2) or (3). (4) It is not necessary to determine explicitly in (4.10), since only c = cos and s = sin aii )2 4s2 +4s4 are needed. From (4.10) follows 1 = (akk , a quadratic equation in s2 . The s2 (1s2 ) 4|aik |2 sign is determined by (4.10).

6.5 Some Special Methods

6.5

Some Special Methods


Bisection method for tridiagonal symmetric matrices
Write ... 0 . . . .. .. 0 . . 0 bn1 an .

237

6.5.1

Let A be tridiagonal, real and symmetric. a 1 b1 0 b1 a2 b2 0 b2 a 3 A= .. . . . 0 . 0 ... Let Ak be the k th principal submatrix a1 b1 b1 a 2 0 b2 Ak = . . 0 . 0 ... and

..

(5.1)

bn1

...

b2 a3 .. .. . . 0

0 . . . .. . ... bk1 0 bk1 ak

fk () = det(Ik Ak ), for k = 1, , n. (fn () = Characteristic polynomial of A.) Write f0 () = 1 and f1 () = a1 we have the recursive formula: fk () = ( ak )fk1 () b2 k1 fk2 (), k = 2, . . . , n. It holds:

(5.2)

(5.3)

Theorem 6.5.1 If bi = 0 in (5.1) for i = 1, . . . , n, then fk () has k real simple roots, k = 0, . . . , n. For 1 k n 1 the roots of fk () separate the roots of fk+1 (). Proof: Since Ak is real symmetric, it follows form (5.2) that the roots of fk () are real. The rank of Ik Ak is at least k 1 (scratsch the rst row and k -th column, and then consider bi = 0), therefore the dimension of the zero spaces of Ik Ak is not bigger than one, so we have simple roots. n = 2 : f1 has the root a1 and f2 (a1 ) = b2 1 < 0 (from (5.3), k = 2 and = a1 ), both roots of f2 must lie on the right and left sides of a1 , respectively. Suppose the assertion is true for k = 2, . . . , n 1, we shall prove that it is also true for k = n. It only needs to show that the roots of fn1 separate the roots of fn . Let 1 > 2 > > n1 be the roots of fn1 . From (5.3) we have fn (i ) = b2 n1 fn2 (i ), 2 fn (i+1 ) = bn1 fn2 (i+1 ). (5.4)

238 Chapter 6. The Symmetric Eigenvalue problem The roots of fn2 separate the roots of fn1 , there exists exactly one root of fn2 between i and i+1 , that is, sgnfn2 (i ) = sgnfn2 (i+1 ). Therefore it holds also for fn (because of (5.4)), so there is at least one root of fn in (i+1 , i ) from Rolle theorem, for i = n2 1, . . . , n 2. It is fn (1 ) = b2 + and all roots n1 fn2 (1 ) < 0, since fn2 () = of fn2 are on the left side of 1 . On the other hand fn for , so there exists an other root of fn in (1 , ). Similarly, we can show that there is a root of fn in (, n1 ). This shows that fn has n distinct, simple roots, which are separated by the roots of fn1 . The sequence of functions f0 , f1 , , fn satises in each bounded interval [a, b] the following conditions: (S1) fi (x) is continuous, i = 0, . . . , n. (S2) f0 (x) has constant sign in [a, b]. x) = 0 fi1 ( x)fi+1 ( x) < 0, i = 1, . . . , n 1, (S3) fi ( fn ( x) = 0 fn1 ( x) = 0. (S4) if x is a root of fn and h > 0 small, then fn ( x h) fn ( x + h) = 1 and sgn = +1. fn1 ( x h) fn1 ( x + h)

sgn

(S1) and (S2) are trivial, (S3) can be proved by (5.3) and f0 = 1 : fi+1 ( x) = b2 x), i fi1 ( so fi1 ( x)fi+1 ( x) 0. If fi1 ( x) = 0, then from (5.3) fi2 ( x) = 0 f0 ( x) = 0. Contradiction! So fi1 ( x)fi+1 ( x) < 0. For (S4): It is clear for largest root x , the others follow from induction. Denition 6.5.1 A sequence of functions with (S1)-(S4) is called a Sturm chain on [a, b]. If x [a, b], then f0 (x), f1 (x), , fn (x) are well-dened. Let 1 V (x) = 2
n1

|sgnfi (x) sgnfi+1 (x)|.


i=0

(5.5)

For fi (x) = 0, i = 0, . . . , n.V (x) is the number of the sign change of the sequence f0 (x), . . . , fn (x). If fk (x) = 0, 1 k n 1, then V (x) is no diernce, whether sgn 0 is dened by 0, 1 or 1. Only sgnfn (x) must be dened for fn (x) = 0, we set fn (x) = 0 sgnfn (x) := sgnfn1 (x). (5.6)

Theorem 6.5.2 Let f0 , . . . , fn be a Sturm chain on [a, b] and fn (a)fn (b) = 0. Then fn (x) has m = V (a) V (b) roots in [a, b].

6.5 Some Special Methods 239 Proof: x runs from a to b, what happens with V (x)? V (x) is constant in an interval, if all fk (x) = 0, k = 0, . . . , n, x [a, b]. (a) x runs through a root x of fk (x), 1 k n 1. If follows from (S3) that V (x) remains constant. (b) x runs through a root x of fn (x). Then from (S4) a sign changes is lost. So V (a) V (b) = the number of roots of fk (x) in (a, b). For special case as in (5.2), fk () is the characteristic polynomial of Ak . Since fk () for , so V (b) = 0 for large enough b. Theorem 6.5.3 If fi (x) are dened as in (5.2) and V (x) as in (5.5), then holds V (a) = the number of eigenvalues of A which are larger than a. Proof: (1) fn (a) = 0. Apply theorem 6.5.2 for large b. (2) fn (a) = 0, for > 0 small sgnfi (a + ) = sgnfi (a), i = 0, , n 1 and sgnfn (a + ) = sgnfn1 (a + ) from (S4). Thus V (a) = V (a + ) for > 0. So by theorem 6.5.3 V (a) = the number of eigenvalues of A, which are large than a + for arbitrary small > 0. Calculation of the eigenvalues Theorem 6.5.3 will be used as the basic tool of the bisection method in locating and separating the roots of fn (). Let 1 > 2 > . . . > n be the eigenvalues of A as in (5.1) and A is irreducible (i.e., bi = 0). Using the Gerschgorin circle theorem 5.2.1 all eigenvalues lie in [a, b], with a = min {ai |bi | |bi1 |}
1in

b = max {ai + |bi | + |bi1 |},


1in

where b0 = bn = 0. We use the bisection method on [a, b] to divide it into smaller subintervals. Theorem 6.5.3 is used to determine how many roots are contained in a subinterval, and we seek to obtain subintervals that will contain the desired root. If some eigenvalues are nearly equal, then we continue subdividing until the root is found with sucient accuracy. Let a(0) , b(0) be found with V (a(0) ) k , V (b(0) ) < k . Then by theorem 6.5.3 we have k (a(0) , b(0) ]. Determine a(0) + b(0) ) = v. V( 2 a(0) + b(0) (1) v k a(1) := , b := b(0) 2 a(0) + b(0) v < k a(1) := a(0) , b(1) := , 2 we have k (a(1) , b(1) ]. So k is always contained in a smaller interval. The evaluation (i) b(i) of V ( a + ) is simply computed by (5.3). 2

240 Example 11.1 Consider

Chapter 6. The Symmetric Eigenvalue problem T = 2 0 ... .. . 1 2 . . . 0 .. .. .. . . . . . . .. .. .. 0 ... 0 1 1 0 . . . . 1 2

By Gershgorin theorem all eigenvalues lie in [0, 4]. 0 and 4 are not the eigenvalues of T (Check!). The roots of T are labeled as 0 < 6 5 . . . 1 < 4. The roots can be found by continuing the bisection method. f6 () 0.0 7.0 4.0 7.0 2.0 -1.0 1.0 1.0 0.5 -1.421875 3.0 1.0 3.5 -1.421875 V () Comment 6 6 > 0 0 1 < 4 3 4 < 2 < 3 4 5 < 1 < 4 < 2 5 0 < 6 < 0.5 < 5 < 1 2 2 < 3 < 3 < 2 1 3 < 2 < 3.5 < 1 < 4

Remark 5.1 Although all roots of a tridiagonal matrix may be found by this technique, it is generally faster in that case to use the QR algorithm. With large matrices, we usually do not want all roots, so the method of this section are preferable. If we only want some certain specic roots, for example, the ve largest or all roots in a given interval, it is easy to locate them by using theorem 6.5.3.

6.5.2

Rayleigh Quotient Iteration

Suppose A Rnn is symmetric and x = 0 is a given vector. A simple dierentiation reveals that xT Ax = R[x] T (5.7) x x minimizes (A I )x 2 . The scalar r(x) is called the Rayleigh quotient of x. If x is an approximate eigenvector, then r(x) is a reasonable choice for the corresponding eigenvalue. On the other hand, if is an approximate eigenvalue, then inverse iteration tells us that the solution to (A I )x = b will almost always be a good approximate eigenvector. Combining these two ideas lead to the Rayleigh-quotient iteration: Given x0 with x0 2 = 1. For k = 0, 1, . . . k = R[xk ] Solve (A k I )zk+1 = xk for zk+1 xk+1 = zk+1 / zk+1 2 .

(5.8)

6.5 Some Special Methods 241 Parlett (1974) has shown that (5.8) converges globally and the loccally cubically. (See also Chapter I).

6.5.3

Orthogonal Iteration with Ritz Acceleration


Given Q0 Rnp with QT 0 Q0 = Ip . For k = 0, 1, . . . Zk = AQk1 , Qk Rk = Zk (QR-decomposition).

(5.9)

Let QT AQ = diag (i ) be the Schur decomposition of A and Q = [q1 , . . . , qn ], and |1 | > |2 | > . . . |n |. If follows from theorem 5.3.4 that if d = dist[D (A), R(Q0 )] < 1, then dist[D (A), R(Qk )] We know that (Stewart 1976) if Rk = [rij ] then |rii i | = O(|
(k) (k)

p+1 k 1 | | . 1 d2 p

i+1 k | ), i

i = 1, . . . , p.

This can be an unacceptably slow rate of convergence if i and i+1 are of nearly equal modulus. This diculty can be surmounted by replacing Qk with its Ritz Vectors at each step: Given Q0 Rnp with QT 0 Q0 = Ip . For k = 0, 1, . . . Zk = AQk1 , k Rk = Zk (QR decomposition), Q (5.10) T Sk = Qk AQk , T Uk Sk Uk = Dk (Schur decomposition), k Uk . Qk = Q It can be shown that if
(k) Dk = diag ((k) , . . . , p )

and p+1 k | , i

(k) |1 | |p |,

(k )

then |i i (A)| = |
(k) (k)

i = 1, . . . , p.
(k)

Thus the Ritz values i converge in a more favorable rate than the rii in (5.9). For details, see Stewart (1969) and Parletts book chapters 11 and 14.

242

6.6
6.6.1

Generalized Denite Eigenvalue Problem Ax = Bx


Generalized denite eigenvalue problem
Ax = Bx, (6.1)

Chapter 6. The Symmetric Eigenvalue problem

where A, B Rnn are symmetric and B is positive denite. (In practice A, B are very large and sparse). Theorem 6.6.1 The eigenvalue problem (6.1) has n real eigenvalues i associated with egienvectors xi satisfying Axi = i Bxi , i = 1, . . . , n. (6.2)
T Here {xi }n i=1 can be chosen such that xi Bxj = ij (B-orthogonal), i, j = 1, . . . , n.

Proof: Let B = LLT be the Cholesky decomposition of B . Then Axi = i Bxi Axi = i LLT xi L1 ALT (LT xi ) = i (LT xi ) Czi = i zi , where C L1 AL1 symmetric and zi LT xi . Since i are the eigenvalues of the symmetric matrix C , they T are real. The vectors zi can be chosen pairwisely orthogonal, i.e., ziT zi = ij = xT i LL xj = xT i Bxj . Let X = [x1 , , xn ]. Then from above we have X T BX = I and (X T AX )ij = T = j xT i Bxj = j ij which implies X AX = = diag (1 , , n ). That is, A, B are simultaneously diagonalizable by a congruence transformations. xT i Axj Numerical methods for (6.1): (a) Bisection method, (b) Coordinate relaxation, (c) Method of steepest descent. (a) Bisection methods: Basic tool: Sylvester law of inertia Denition 6.6.1 Two real, symmetric matrices A, B are called congruent, if there exists a nonsingular C such that A = C T BC. We denote it by A B . Dention 6.6.2 The inertia of a symmetric matrix A is a triplet of integers in(A) = ( (A), (A), (A)) (A) = the number of positive eigenvalues of A (geometry multiplicity), (A) = the number of negative eigenvalues of A (geometry multiplicity), (A) = n rank (A) = the number of zero eigenvalues of A. (6.4)
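A minimal NumPy sketch of the reduction used in this proof: factor B = LL^T, form C = L^{-1} A L^{-T}, solve the symmetric eigenproblem for C, and map the eigenvectors back by x_i = L^{-T} z_i, which makes them B-orthogonal. The function name is our own.

```python
import numpy as np

def sym_def_eig(A, B):
    """Solve A x = lambda B x, A symmetric, B symmetric positive definite,
    via the Cholesky reduction of Theorem 6.6.1."""
    L = np.linalg.cholesky(B)                     # B = L L^T
    C = np.linalg.solve(L, np.linalg.solve(L, A.T).T)   # C = L^{-1} A L^{-T}
    lam, Z = np.linalg.eigh(C)                    # C z_i = lambda_i z_i
    X = np.linalg.solve(L.T, Z)                   # x_i = L^{-T} z_i, X^T B X = I
    return lam, X
```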
c

(6.3)

6.6 Generalized Denite Eigenvalue Problem Ax = Bx 243 Theorem 6.6.2 (Sylvester law of inertia) Two real, symmetric matrices are congruent if and only if they have the same inertia. Proof: (1) A, B real and symmetric. Suppose in(A) = in(B ), there exist orthogonal U and V such that U AU T = 1 = diag (1 (A), , n (A)) with 1 (A) n (A) and V BV T = 2 = diag (1 (B ), , n (B )) with 1 (B ) n (B ). Claim: 1 is congruent to 2 . Since in(A) = in(B ), it holds either i (A)i (B ) > 0 or i (A) = i (B ) = 0. Set D = diag (di ), where di = Then DT 2 D = 1 , so A B . (2) Suppose A B . Claim: in(A) = in(B ). Let A = C T BC , U AU T = 1 and V BV T = 2 as above. These imply 1 = P T 2 P , where P = V T CU T . Assume that in(A) = in(B ). Clearly (A) = (B ) (A) = (B ). Without loss of generality we can suppose (A) < (B ). The homogenous linear system xi = 0 , (P x)i = 0,
n c c i (A) , i (B )

1,

if i (A)i (B ) > 0 . if i (A)i (B ) = 0

i = 1, , (A), i = (B ) + 1, , n,

(6.5)

has a nonzero solution x = 0, since it has fewer than n equations. With this x we have 0
i=1 n T T T i (A)x2 i = x 1 x = x P 2 P x

=
i=1 (B )

i (B )(P x)2 i > 0 i (B )(P x)2 i.


i=1

On the other hand, P is nonsingular and x ≠ 0, so Px ≠ 0; by (6.5) its last n - π(B) components vanish, hence there is an i (1 ≤ i ≤ π(B)) with (Px)_i ≠ 0, and the last sum is strictly positive, a contradiction.

Second part of the proof: we show that B and C^T B C have the same inertia. Because they have the same rank, it is sufficient to show that π(B) = π(C^T B C). If λ_r(B) > 0, let B q_i = λ_i(B) q_i and S_0 = span{C^{-1}q_1, ..., C^{-1}q_r}. Then

  λ_r(C^T B C) = max_{dim S = r}  min_{x ∈ S, x ≠ 0}  (x^T C^T B C x)/(x^T x)
               ≥ min_{x ∈ S_0, x ≠ 0}  (x^T C^T B C x)/(x^T x)
               = min_{x ∈ S_0, x ≠ 0}  [(x^T C^T B C x)/(x^T C^T C x)] · [(x^T C^T C x)/(x^T x)]
               ≥ [ min_{x ∈ S_0, x ≠ 0} (x^T C^T B C x)/(x^T C^T C x) ] · [ min_{x ≠ 0} (x^T C^T C x)/(x^T x) ]
               ≥ λ_r(B) σ_n(C)^2 > 0
               (since x ∈ S_0 implies Cx ∈ span{q_1, ..., q_r}),
where σ_1(C) ≥ ... ≥ σ_n(C) > 0 are the singular values of C. So we have λ_r(C^T B C) > 0 and π(C^T B C) ≥ π(B). Exchanging the roles of B and C^T B C we obtain π(C^T B C) = π(B).

Important inequality: from the above we have λ_r(C^T B C) ≥ λ_r(B) σ_n(C)^2. Exchanging B and C^T B C we obtain

  λ_r(B) ≥ λ_r(C^T B C) σ_n(C^{-1})^2 = λ_r(C^T B C) / σ_1(C)^2.

This implies

  σ_n(C)^2 ≤ λ_r(C^T B C) / λ_r(B) ≤ σ_1(C)^2.   (6.6)
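A quick numerical illustration of the bound (6.6), with a randomly generated symmetric B and nonsingular C; all names are local to this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n)); B = (M + M.T) / 2       # symmetric, possibly indefinite
C = rng.standard_normal((n, n))                          # nonsingular with probability 1

lamB = np.sort(np.linalg.eigvalsh(B))[::-1]              # eigenvalues, decreasing
lamCBC = np.sort(np.linalg.eigvalsh(C.T @ B @ C))[::-1]
s = np.linalg.svd(C, compute_uv=False)                   # sigma_1 >= ... >= sigma_n

for r in range(n):
    if lamB[r] > 0:                                      # compare the positive eigenvalues
        ratio = lamCBC[r] / lamB[r]
        assert s[-1] ** 2 - 1e-10 <= ratio <= s[0] ** 2 + 1e-10   # inequality (6.6)
```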

The same bounds hold for the negative eigenvalues of B and C^T B C.

Corollary 6.6.3 If A = C^T B C with C nonsingular (A ≈ B), then for the nonzero eigenvalues

  σ_1(C)^2 ≥ λ_r(A)/λ_r(B) ≥ σ_n(C)^2.

Lemma 6.6.4 Let A be nonsingular, real and symmetric, and assume it has an LR-factorization

  A = LR,   (6.7)

where L is lower triangular with l_ii = 1 and R is upper triangular with r_ii ≠ 0, i = 1, ..., n. Then

  π(A) = #{i : r_ii > 0}   and   ν(A) = #{i : r_ii < 0}.

Proof: Let D = diag(r_ii). Then R̃ = D^{-1}R has ones on the diagonal and A = LR = LDR̃. Since A = A^T = R̃^T D L^T, and a factorization of A into a unit lower triangular times an upper triangular factor is unique, it follows that L = R̃^T. So A = LDL^T, hence A ≈ D and in(A) = in(D). But π(D) = #{i : r_ii > 0}, which proves the assertion.

Theorem 6.6.5 Let A, B be real and symmetric, B positive definite, and let λ be a given real number. Then

  π(A - λB) = #{eigenvalues of (6.1) larger than λ},
  ν(A - λB) = #{eigenvalues of (6.1) smaller than λ},
  δ(A - λB) = multiplicity of λ as an eigenvalue of (6.1).

Proof: Ax = λBx ⟺ Cy = λy, where C = L^{-1}AL^{-T}, B = LL^T and y = L^T x. By Theorem 6.6.2 (Sylvester's law of inertia) we have

  in(A - λB) = in( L^{-1}(A - λB)L^{-T} ) = in(C - λI).

Since C - λI has the eigenvalues λ_i - λ, we have π(A - λB) = #{i : λ_i - λ > 0} = #{i : λ_i > λ}. The assertions for ν(A - λB) and δ(A - λB) follow in the same way.

Remark 6.1 Theorem 6.6.5 leads to a bisection method for (6.1). If [a, b] is an interval which contains the desired eigenvalues, then by computing in(A - ((a+b)/2)B) we know whether the desired eigenvalues lie in [a, (a+b)/2] or in [(a+b)/2, b]. This requires the LU (or LDL^T) decomposition of A - λB, which in general is indefinite.
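A minimal sketch of the bisection idea in Remark 6.1, assuming dense matrices: the inertia of A - λB is read off from the block-diagonal factor of an LDL^T factorization (scipy.linalg.ldl), which plays the role of the triangular factorization above and also handles the indefinite case.

```python
import numpy as np
from scipy.linalg import ldl

def count_eigs_below(A, B, sigma):
    """Number of eigenvalues of A x = lambda B x smaller than sigma,
    i.e. nu(A - sigma*B) in the notation of Theorem 6.6.5."""
    _, D, _ = ldl(A - sigma * B)      # A - sigma*B = L D L^T, D block diagonal
    eigs_D = np.linalg.eigvalsh(D)    # 1x1/2x2 blocks; their signs give the inertia
    return int(np.sum(eigs_D < 0))

def bisect_eig(A, B, k, a, b, tol=1e-10):
    """Approximate the k-th smallest eigenvalue of (6.1) inside [a, b] by bisection."""
    while b - a > tol:
        mid = (a + b) / 2
        if count_eigs_below(A, B, mid) >= k:
            b = mid                    # at least k eigenvalues lie below mid
        else:
            a = mid
    return (a + b) / 2
```

Each step requires one factorization of A - σB; for sparse A, B one would use a sparse symmetric indefinite (or sparse LU) factorization instead, but the counting argument is identical.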

(b) Method of coordinate relaxation. This method requires only the computation of Ax and Bx. Consider the generalized Rayleigh quotient

  R[x] = (x^T A x)/(x^T B x).   (6.8)

Let z = L^T x, C = L^{-1}AL^{-T} and B = LL^T. C is symmetric; let Cu_i = λ_i u_i with λ_1 ≥ λ_2 ≥ ... ≥ λ_n. By Theorem 6.1.6 we have

  λ_i = max{ R[z, C] = (z^T C z)/(z^T z) : z ⊥ u_j, j < i, z ≠ 0 }.

From (6.8) it follows that

  R[x] = (x^T A x)/(x^T B x) = (z^T L^{-1}AL^{-T} z)/(z^T z) = (z^T C z)/(z^T z).

Therefore we have the following version of the connection between the eigenvalues and the Rayleigh quotient of the generalized definite eigenvalue problem (6.1).

Theorem 6.6.6 Let λ_1 ≥ ... ≥ λ_n be the eigenvalues of Ax = λBx satisfying Ax_i = λ_i Bx_i, i = 1, ..., n. Then

  λ_i = max{ R[x] = (x^T A x)/(x^T B x) : x^T B x_j = 0, j < i, x ≠ 0 }.   (6.9)

Proof: Ax_i = λ_i Bx_i ⟺ Cu_i = λ_i u_i with u_i = L^T x_i. Let z = L^T x; then z ⊥ u_j ⟺ z^T u_j = 0 ⟺ x^T L L^T x_j = 0 ⟺ x^T B x_j = 0. These imply that

  { (z^T C z)/(z^T z) : z ⊥ u_j, j < i, z ≠ 0 } = { (x^T A x)/(x^T B x) : x^T B x_j = 0, j < i, x ≠ 0 }.

Taking the maximum, (6.9) follows from (1.5). Similarly, Theorem 6.1.6 (Courant-Fischer) can be transferred to:

Theorem 6.6.7 For the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n of Ax = λBx it holds that

  λ_i = min_{{p_1,...,p_{i-1}}, p_j ≠ 0}   max_{p_j^T x = 0, 1 ≤ j < i, x ≠ 0}   (x^T A x)/(x^T B x),   (6.10)

  λ_i = min_{dim S = n+1-i}   max_{x ∈ S, x ≠ 0}   (x^T A x)/(x^T B x),   (6.11)

  λ_i = max_{dim S = i}   min_{x ∈ S, x ≠ 0}   (x^T A x)/(x^T B x).   (6.12)
Theorem 6.1.7 (separation theorem) can be transferred to:

Theorem 6.6.8 Let A, B be real and symmetric with B positive definite, and let A_{n-1} and B_{n-1} be obtained by deleting the last row and column of A and B, respectively. For the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n of Ax = λBx and μ_1 ≥ μ_2 ≥ ... ≥ μ_{n-1} of A_{n-1}x = μB_{n-1}x it holds that

  λ_{s+1} ≤ μ_s ≤ λ_s,   s = 1, ..., n-1,   (6.13)

and

  λ_1 = max_{x ≠ 0} R[x],   λ_n = min_{x ≠ 0} R[x].   (6.14)
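A short numerical check of the interlacing property (6.13), using scipy's generalized symmetric eigensolver; the matrices are random and serve only as an illustration.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n)); A = (M + M.T) / 2
N = rng.standard_normal((n, n)); B = N @ N.T + n * np.eye(n)      # s.p.d.

lam = eigh(A, B, eigvals_only=True)[::-1]                          # lambda_1 >= ... >= lambda_n
mu = eigh(A[:-1, :-1], B[:-1, :-1], eigvals_only=True)[::-1]       # leading (n-1) x (n-1) sections

for s in range(n - 1):                                             # (6.13): lambda_{s+1} <= mu_s <= lambda_s
    assert lam[s + 1] - 1e-10 <= mu[s] <= lam[s] + 1e-10
```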

Problem: how to compute the smallest eigenvalue λ_n and its associated eigenvector?

Idea: minimize the Rayleigh quotient R[x] over a two-dimensional subspace.

Basic problem: given two linearly independent vectors x and y, minimize R over the subspace spanned by x and y. Let x' = x + ϑy. Then

  R[x'] = ( (x + ϑy)^T A (x + ϑy) ) / ( (x + ϑy)^T B (x + ϑy) ) = (α + 2ϑf + ϑ²p) / (β + 2ϑg + ϑ²q),   (6.15)

where α = x^T Ax, β = x^T Bx, f = x^T Ay, g = x^T By, p = y^T Ay and q = y^T By. Let

  Â = [ α  f ; f  p ],   B̂ = [ β  g ; g  q ],   x̂ = (1, ϑ)^T.   (6.16)

Then R[x'] = (x̂^T Â x̂)/(x̂^T B̂ x̂), where Â, B̂ are symmetric and B̂ is positive definite. Applying (6.14) to Â and B̂ we get that R[x'] attains its minimum λ_R, where λ_R is the smallest eigenvalue of the 2×2 problem Â x̂ = λ B̂ x̂. That is,

  det(Â - λ_R B̂) = 0,   a quadratic equation in λ_R.   (6.17)

Compute the associated eigenvector x̂ = (ξ, η)^T (belonging to λ_R) from one of the equations of (Â - λ_R B̂)x̂ = 0:

  (α - λ_R β)ξ + (f - λ_R g)η = 0,   (6.18)
  (f - λ_R g)ξ + (p - λ_R q)η = 0.   (6.19)

Case 1: p - λ_R q ≠ 0 (note p/q = y^T Ay / y^T By = R[y] > λ_R). Set ξ = 1; from (6.19) it follows that

  η = -(f - λ_R g)/(p - λ_R q),   (6.20)

and x' = x + ηy is the solution of the basic problem. Case 1 is called the normal case.

Case 2: p - λ_R q = 0. This implies f - λ_R g = 0, because 0 = det(Â - λ_R B̂) = (α - λ_R β)(p - λ_R q) - (f - λ_R g)².
  (a) If α - λ_R β ≠ 0, then ξ = 0 and η is arbitrary. Set x' = y.
  (b) If α - λ_R β = 0, then Â = λ_R B̂, so R[x̃] = λ_R for all x̃ ∈ span(x, y). Set x' = x.

The method of coordinate relaxation: given a starting vector y_1 ≠ 0, the vector y_{i+1} is determined from y_i as follows:

  set x = y_i and y = e_k with k = i mod n, solve the basic problem with respect to x and y,
  let x' be its solution, and set y_{i+1} = x'/|x'|.   (6.21)

We obtain a sequence of vectors y_1, y_2, y_3, ... such that R[y_1] ≥ R[y_2] ≥ R[y_3] ≥ ... ≥ λ_n.

Remark on the computational cost (see also the sketch after item (4)):

(1) Compute Â, B̂: with

  p = y^T Ay = e_k^T A e_k = a_kk,   q = e_k^T B e_k = b_kk,   u = Ax,   v = Bx,

and then

  f = y^T Ax = e_k^T u = u_k,   g = y^T Bx = e_k^T v = v_k,   α = x^T Ax = x^T u,   β = x^T v,

construct Â and B̂.

(2) Solve the quadratic equation det(Â - λ_R B̂) = 0 for λ_R.

(3) Form x' = x + ηe_k.

(4) Ax' and Bx' (for the next step) can be computed implicitly, using the updates

  Ax' = Ax + ηAe_k,   (Ax')_j = u_j + ηa_jk,
  Bx' = Bx + ηBe_k,   (Bx')_j = v_j + ηb_jk.
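A minimal sketch of one sweep of the coordinate relaxation just described, following items (1)-(4) above (no overrelaxation, and the normalization y_{i+1} = x'/|x'| is omitted for brevity); all names are local to this illustration and the non-normal case is simply skipped.

```python
import numpy as np

def coordinate_relaxation_sweep(A, B, x):
    """One sweep (k = 0,...,n-1) of coordinate relaxation for minimizing
    R[x] = (x^T A x)/(x^T B x); returns the updated x together with Ax and Bx."""
    x = np.array(x, dtype=float)
    u, v = A @ x, B @ x                    # u = Ax, v = Bx, kept up to date implicitly
    for k in range(len(x)):
        p, q = A[k, k], B[k, k]            # y = e_k, so p = a_kk, q = b_kk
        f, g = u[k], v[k]
        alpha, beta = x @ u, x @ v
        # smallest root of det(A_hat - lam*B_hat) = 0 (the 2x2 basic problem)
        a2 = beta * q - g * g
        a1 = -(alpha * q + beta * p - 2.0 * f * g)
        a0 = alpha * p - f * f
        lam = np.min(np.roots([a2, a1, a0]).real)
        denom = p - lam * q
        if abs(denom) < 1e-14:             # non-normal case: skip this coordinate
            continue
        eta = -(f - lam * g) / denom       # formula (6.20)
        x[k] += eta                        # x' = x + eta * e_k
        u += eta * A[:, k]                 # (Ax')_j = u_j + eta * a_jk
        v += eta * B[:, k]                 # (Bx')_j = v_j + eta * b_jk
    return x, u, v
```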

Remark 6.2 If R[y_1] < min_i (a_ii/b_ii) = min_i R[e_i], then only the normal case occurs: indeed

  λ_R q - p = λ_R b_kk - a_kk ≤ R[y_1] b_kk - a_kk < 0

(using λ_R ≤ R[y_i] ≤ R[y_1], b_kk > 0 and R[y_1] < a_kk/b_kk). Hence p - λ_R q ≠ 0, a normal case.

Theorem 6.6.9 Let

  R[y_1] < min_i (a_ii/b_ii).   (6.22)

Then

  lim_{i→∞} R[y_i] = λ.   (6.23)

Here λ is an eigenvalue of (6.1) Ax = λBx, and each accumulation point of {y_i} is an associated eigenvector to λ.

Corollary 6.6.10 If (6.22) holds and

  R[y_1] < λ_{n-1},   (6.24)

then lim_{i→∞} R[y_i] = λ_n. If λ_n is simple, then

  lim_{i→∞} y_i = y exists and y is an eigenvector to λ_n.

Proof of Theorem 6.6.9: Only the normal case occurs. The vector y_{i+1} is a function of e_k (k = i mod n) and y_i; write y_{i+1} = T_k(y_i). The function T_k is continuous in y_i, since in the normal case the solution x' of the basic problem depends continuously on the given x. Since R[y_1] ≥ R[y_2] ≥ ... ≥ λ_n, the limit λ = lim_{i→∞} R[y_i] exists. We show that λ is an eigenvalue, and in addition that every accumulation point y of the sequence {y_i} satisfies Ay = λBy.

Let y_{r(i)} be a convergent subsequence of {y_i}, i.e., lim_{i→∞} y_{r(i)} = y. Without loss of generality there are infinitely many r(i) with r(i) ≡ 1 (mod n), so y = lim_{i→∞} y_{nk(i)+1} and R[y] = λ. Since T_1 is continuous,

  T_1 y = lim_{i→∞} T_1 y_{nk(i)+1} = lim_{i→∞} y_{nk(i)+2},

which implies R[T_1 y] = λ. Thus R[T_1 y] = R[y] = λ, which forces η = 0, hence y = T_1 y and f - λg = 0. Since here f = (Ay)_1 and g = (By)_1, we obtain (Ay)_1 = λ(By)_1. Likewise T_2 is continuous, and

  T_2 T_1 y = T_2 y = lim_{i→∞} T_2 y_{nk(i)+2} = lim_{i→∞} y_{nk(i)+3},
so λ = R[y] = R[T_2 y]. As above we obtain η = 0 and therefore y = T_2 y, so f = λg, thus (Ay)_2 = λ(By)_2, and so on. It follows that Ay = λBy.

Proof of Corollary 6.6.10: The first part is trivial, since λ_n is the unique eigenvalue smaller than λ_{n-1}. The normalized eigenvectors to λ_n are ±x/|x|, where Ax = λ_n Bx, and these two possible accumulation points are separated. If y_i ≈ x/|x|, then Ay_i ≈ λ_n By_i; this gives f ≈ λ_n g, so η ≈ 0 and thus y_{i+1} ≈ y_i. A second accumulation point therefore cannot appear.

As with relaxation methods for linear systems, one can introduce an overcorrection x' = x + ωηy with 1 < ω < 2 in the normal case (Case 1), instead of x' = x + ηy. The discussion above leads to the following algorithm.

Algorithm 6.1 (Coordinate overrelaxation for the smallest eigenvalue of the symmetric definite problem Ax = λBx) Let A, B ∈ R^{n×n} be symmetric and B positive definite.

Step 1: Choose a relaxation factor ω ∈ (1, 2), tolerances δ, ε ∈ R_+ and a starting vector x ∈ R^n \ {0}. Compute a := x^T Ax, b := x^T Bx and r := a/b.

Step 2: Set Rmax := Rmin := r. For j = 1, 2, ..., n:
  compute f := Σ_{k=1}^n a_jk x_k, g := Σ_{k=1}^n b_jk x_k, p := a_jj, q := b_jj, and
  determine the smallest eigenvalue r_1 of the 2×2 pencil

    det( [ a  f ; f  p ] - r_1 [ b  g ; g  q ] ) = 0.

  (2.1) If |p - r_1 q| > δ, then set
        η := -ω (f - r_1 g)/(p - r_1 q),  x_j := x_j + η,
        a := a + 2ηf + η²p,  b := b + 2ηg + η²q,  r := a/b.
  (2.2) If |p - r_1 q| ≤ δ and |a - r_1 b| > δ, then set x := e_j, a := p, b := q, r := a/b.
  (2.3) If |p - r_1 q| ≤ δ and |a - r_1 b| ≤ δ, then stop.
  Set Rmax := max(r, Rmax) and Rmin := min(r, Rmin).

Step 3: If Rmin/Rmax < 1 - ε, then go to Step 2; otherwise stop.

Detailed discussions of how to determine the optimal ω can be found in: H.R. Schwarz, Numer. Math. 23, 135-151 (1974), and H.R. Schwarz, Finite Elemente, Teubner Verlag.

(c) Method of steepest descent. Recall that at a point x_k a function φ: R^n → R decreases most rapidly in the direction of the negative gradient -∇φ(x_k); the resulting method is called the gradient or steepest descent method. Here we take

  φ(x) = R[x] = (x^T A x)/(x^T B x),

and it holds that

  Grad φ(x) = [2/(x^T Bx)²] [ (x^T Bx)Ax - (x^T Ax)Bx ] = [2/(x^T Bx)] ( Ax - R[x]Bx ).   (6.25)

Thus Grad R[x] (= Grad φ(x)) = 0 ⟺ R[x] is an eigenvalue and x is an associated eigenvector ⟺ x is a stationary point of R[x].
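A small finite-difference check of the gradient formula (6.25); the step size and the test matrices are arbitrary choices for this illustration.

```python
import numpy as np

def R(A, B, x):
    return (x @ A @ x) / (x @ B @ x)

def grad_R(A, B, x):
    return 2.0 / (x @ B @ x) * (A @ x - R(A, B, x) * (B @ x))   # formula (6.25)

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n)); A = (M + M.T) / 2
N = rng.standard_normal((n, n)); B = N @ N.T + n * np.eye(n)
x = rng.standard_normal(n)

h = 1e-6
fd = np.array([(R(A, B, x + h * e) - R(A, B, x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.max(np.abs(fd - grad_R(A, B, x))))    # should be of the order of h^2
```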

Method of steepest descent: given y_1 ≠ 0, the vector y_{i+1} is determined from y_i as follows.

  (1) Search direction p_i = -(A - R[y_i]B) y_i. If p_i = 0, stop.   (6.26)
  (2) Otherwise solve the basic problem with x = y_i and y = p_i. If x' is its solution, set
      y_{i+1} = x'/‖x'‖   (6.27)
      and go to (1).
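A minimal sketch of this gradient iteration (6.26)-(6.27), reusing the 2×2 basic problem from the coordinate relaxation section; it assumes the normal case throughout (cf. Lemma 6.6.11 below for B = I), and all names are local to the illustration.

```python
import numpy as np

def rayleigh(A, B, x):
    return (x @ A @ x) / (x @ B @ x)

def steepest_descent_eig(A, B, y, maxit=500, tol=1e-12):
    """Approximate the smallest eigenpair of A x = lambda B x by minimizing R[x]
    along the direction p = -(A - R[y] B) y, cf. (6.26)-(6.27)."""
    y = y / np.linalg.norm(y)
    for _ in range(maxit):
        r = rayleigh(A, B, y)
        p = -(A @ y - r * (B @ y))               # search direction (6.26)
        if np.linalg.norm(p) < tol:
            break
        # basic problem on span{y, p}: 2x2 generalized eigenvalue problem
        alpha, beta = y @ A @ y, y @ B @ y
        f, g = y @ A @ p, y @ B @ p
        pp, qq = p @ A @ p, p @ B @ p
        a2 = beta * qq - g * g
        a1 = -(alpha * qq + beta * pp - 2.0 * f * g)
        a0 = alpha * pp - f * f
        lam = np.min(np.roots([a2, a1, a0]).real)
        eta = -(f - lam * g) / (pp - lam * qq)   # normal case, formula (6.20)
        y = y + eta * p
        y = y / np.linalg.norm(y)                # (6.27)
    return rayleigh(A, B, y), y
```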
Lemma 6.6.11 Let B = I. Then

  p_i^T (A - λ_R B) y_i = -p_i^T p_i,   where λ_R = R[y_{i+1}],   (6.28)
  p_i^T (A - λ_R B) p_i > 0,   if p_i ≠ 0.   (6.29)

In particular only the normal case occurs, so the mapping T with y_{i+1} = T(y_i) is continuous.

Proof: Since p_i^T y_i = 0 (by direct computation), we have

  p_i^T (A - λ_R B) y_i = p_i^T (A - R[y_i]B) y_i + (R[y_i] - λ_R) p_i^T B y_i = -p_i^T p_i.

If p_i ≠ 0, then f - λ_R g = p_i^T A y_i - λ_R p_i^T B y_i = -p_i^T p_i ≠ 0 by (6.28). From (6.18) and f - λ_R g ≠ 0 it follows that ξ ≠ 0, hence the minimum is not attained at y = p_i, so R[p_i] > λ_R. Thus

  p_i^T (A - λ_R B) p_i = p_i^T (A - R[p_i]B) p_i + (R[p_i] - λ_R) p_i^T B p_i > 0,

so (6.29) holds; that is, p - λ_R q ≠ 0, we are in the normal case, and T is continuous.


Theorem 6.6.12 Let B = I, and let the sequence of vectors {y_i} be generated by the method of steepest descent (6.26)-(6.27). Then, with r_i = R[y_i]:
  (1) lim_{i→∞} r_i = λ is an eigenvalue of A;
  (2) each accumulation point of {y_i} is an eigenvector of A corresponding to λ;
  (3) if y_1 = Σ_{k=1}^n ε_k x_k is the expansion of the starting vector in the normalized eigenvectors {x_k}_{k=1}^n of A (Ax_i = λ_i x_i with λ_n ≤ λ_{n-1} ≤ ... ≤ λ_1) and ε_n ≠ 0, then lim_{i→∞} r_i = λ_n.

Proof: Since r1 r2 n , there exists the limit point with limi ri = . Let z be an accumulation point of {yi }iN , i.e., z = lim yn(i) ,
i

R[z ] = lim R[yn(i) ] = .


i

Since T (T : yi yi+1 ) is continuous, so


i

lim T yn(i) = lim yn(i)+1 = T lim yn(i) = T z.


i i

This implies R[T z ] = lim R[yn(i)+1 ] = .


i

From R[T z ] = R[z ] and = 0 we have T z = z . (Since Grad R[z ] = 0, z is the eigenvector to R[z ] = ). Thus (1),(2) are established. Claim (3): Let yi =
n k=1 i+1 i i k xk . Prove that k is determined by k . Since n i (k ri )k xk , k=1

pi = (A ri I )yi = from (6.28)(6.29) follows i = Since y i + i p i =


k=1

pT f ri+1 g i (A ri+1 B )yi = T > 0. p ri+1 q pi (A ri+1 B )pi


n i k [1 + i (k ri )]Bxk ,

we get yi+1 yi + i pi = = yi + i pi

k=1

k (1 + i (k ri )) i Bxk . n 2 2 1/2 i s=1 (s ) (1 + i (k ri )) )

252 This implies


i+1 r =

Chapter 6. The Symmetric Eigenvalue problem


i r (1 + i (r ri )) . n i )2 (1 + ( r ))2 ( i s i s s=1

(6.30)

We then have

i+1 i n n = i+1 i r r

1 + i (n ri ) 1 + i (r ri )

(6.31)

Assume that {ri } does not converge to n , but to r > n . Then


n(i) n(i) 0. 1 and n yn(i) xr = r

On the other hand, since r ri < 0, n ri < 0, i < 0 and n ri < r ri , we have 1 + i (n ri ) > 1. 1 + i (r ri )
i+1 i n n > . i+1 i r r

From (6.31) follows that

This contradicts that n

n(i)

0.


Further methods for the symmetric eigenvalue problem Ax = λBx

Let A, B ∈ R^{n×n} be symmetric with B positive definite. Reduction to an ordinary eigenvalue problem:
  (1) B^{-1}Ax = λx; the symmetry is lost.
  (2) B = LL^T (Cholesky), L^{-1}AL^{-T}z = λz with z = L^T x.


Remark: for a given vector z we can compute L^{-1}AL^{-T}z as follows: compute z_1 from L^T z_1 = z by backward substitution, then form z_2 = Az_1, and compute z_3 from Lz_3 = z_2 by forward substitution. In this way the sparsity of B (and hence of L) and of A can be exploited. Caution: the matrix L^{-1}AL^{-T} itself is in general dense.

  (3) Theoretically, B has a (unique) positive definite square root B^{1/2}, i.e. B^{1/2}B^{1/2} = B, and B^{-1/2}AB^{-1/2}z = λz. The computation of B^{1/2} is expensive: with B = UDU^T, U orthogonal and D diagonal with positive entries, B = (UD^{1/2}U^T)(UD^{1/2}U^T), so B^{1/2} = UD^{1/2}U^T, where D = diag(d_i) and D^{1/2} = diag(√d_i).

Consider now Ax = λBx with A, B symmetric and B positive definite, and let λ_1 ≥ λ_2 ≥ ... ≥ λ_n be the eigenvalues of (6.1). Recall the power method and inverse iteration for B = I.

Power method: given x_0 ≠ 0, for i = 0, 1, 2, ...

  y_{i+1} = Ax_i,   k_{i+1} = ‖y_{i+1}‖,   x_{i+1} = y_{i+1}/k_{i+1}.   (6.32)

Then x_i converges to the eigenvector of the dominant eigenvalue λ_1 and k_i → |λ_1| as i → ∞.

Inverse power method: given x_0 ≠ 0, for i = 0, 1, 2, ...

  ρ_i = x_i^T A x_i / x_i^T x_i   (Rayleigh quotient),
  (A - ρ_i I) x_{i+1} = k_{i+1} x_i,   with k_{i+1} chosen so that ‖x_{i+1}‖ = 1.   (6.33)

This Rayleigh quotient iteration converges cubically.

Transferred to problem (6.1), i.e. applying (6.32) to B^{-1}A, the power method reads: given x_0 ≠ 0, for i = 0, 1, 2, ...

  B y_{i+1} = Ax_i,   k_{i+1} = ‖y_{i+1}‖,   x_{i+1} = y_{i+1}/k_{i+1}.   (6.34)
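A minimal sketch of the power iteration (6.34), solving with a Cholesky factor of B computed once (scipy's cho_factor/cho_solve); the normalization uses the Euclidean norm, and the stopping test is an arbitrary choice for the illustration.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def pencil_power_method(A, B, x0, maxit=200, tol=1e-10):
    """Power method (6.34) for A x = lambda B x: approximates the dominant eigenpair."""
    cB = cho_factor(B)                       # factor B = L L^T once
    x = x0 / np.linalg.norm(x0)
    k_old = 0.0
    for _ in range(maxit):
        y = cho_solve(cB, A @ x)             # solve B y_{i+1} = A x_i
        k = np.linalg.norm(y)                # k_{i+1} -> |lambda_1|
        x = y / k
        if abs(k - k_old) < tol * k:
            break
        k_old = k
    rho = (x @ A @ x) / (x @ B @ x)          # Rayleigh quotient: better estimate of lambda_1
    return rho, x
```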

We must solve one linear system with B in each step; in general a Cholesky decomposition of B is used for this.

Inverse power method for Ax = λBx: given x_0 ≠ 0, for i = 0, 1, 2, ...

  ρ_i = x_i^T A x_i / x_i^T B x_i,
  (A - ρ_i B) x_{i+1} = k_{i+1} B x_i,   with k_{i+1} chosen so that ‖x_{i+1}‖_B = 1.   (6.35)
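A minimal sketch of the inverse iteration (6.35) with Rayleigh-quotient shifts, assuming dense matrices; each step refactorizes A - ρ_i B, which is what makes the method expensive per step but rapidly convergent. Names and tolerances are local to this illustration.

```python
import numpy as np

def rq_inverse_iteration(A, B, x0, maxit=20, tol=1e-12):
    """Rayleigh-quotient inverse iteration for the pencil (A, B), B s.p.d.; cf. (6.35)."""
    x = x0 / np.sqrt(x0 @ B @ x0)              # normalize in the B-norm
    for _ in range(maxit):
        rho = (x @ A @ x) / (x @ B @ x)        # Rayleigh quotient shift
        try:
            y = np.linalg.solve(A - rho * B, B @ x)
        except np.linalg.LinAlgError:
            break                              # shift hit an eigenvalue exactly
        x_new = y / np.sqrt(y @ B @ y)         # k_{i+1} chosen so that ||x||_B = 1
        if np.linalg.norm(x_new - np.sign(x_new @ B @ x) * x) < tol:
            x = x_new
            break
        x = x_new
    return (x @ A @ x) / (x @ B @ x), x
```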

Reduction: Let B = LLT . Substitute A by L1 ALT in (6.33) then we have (here xi zi ): z T L1 ALT zi xT i Axi i = i = , where LT zi = xi , T T zi zi xi Bxi and (L1 ALT i I )zi+1 = ki+1 zi (A i B )xi+1 = ki+1 Bxi . Let 1 > 2 n be the eigenvalues of A B . Then the power iteration (6.34) converges to 1 . Let {x i }n i=1 be the complete system of eigenvectors,i.e., x T j = ij and x T j = i ij for all i, j = 1, , n. i Bx i Ax Let y1 =
n 1 i , j =1 ci x

yk = yk+1 =

n k i . i=1 ci x n

Then it holds
n n

+1 ck x i i i=1

=B A
i=1

ck i ix

=
i=1

ck i . i i x

+1 k k 1 This implies that ck = i ck i , and thus ci = i ci . Therefore, we have i n 1 k 1 1 {c1 x

yk =

+
=2

k 1 ) c x }. 1

Normalizing yk we get that xk converges to x 1 . Cost of computation: Matrix vector Axi , Solve the linear system Byi+1 = Axi . Determination of the eigenvalue: ki+1 |1 |. Although we have ki+1 |1 |, the better approximation of 1 is R[xi ]. Let xi = x 1 + d, where d span{x 2 , x n } and d = 1. Then R[xi ] = R[ x1 + d ] = ( x1 + d)T A( x1 + d ) ( x1 + d)T B ( x1 + d ) 2 T T T 1 1 + d Ad x 1 Ax x 1 Ax = T + O( 2 ) = T 2 T x 1 B x 1 + d Bd x 1 B x 1 2 = 1 + O( ).

6.6 Generalized Denite Eigenvalue Problem Ax = Bx Error of eigenvalue (error of eigenvector)2 .

255

Compute the other eigenvalues and eigenvectors:


Suppose 1 , x 1 are computed. Power method does not converge to 1 , x 1 , if it satises xT 1 = 0, i Bx i = 0, 1, 2, . (6.36)

1 = 0, then all iterate xi satisfy (6.36) theoretically If we start with x0 satisfying xT 0 Bx 1 (since c1 = 0). Because of roundo error we shall perform the reorthogonalization: By i+1 = Axi , i+1 ) x1 , yi+1 = y i+1 ( xT i By xi+1 = yi+1 / yi+1 B . In general: Suppose 1 , p , x 1 , x p are computed, then we perofrm the following reorthogonalization: By i+1 = Axi , yi+1 = y i+1 p xT i+1 ) xj , j By j =1 ( xi+1 = yi+1 / yi+1 B . Here xT j = 0, for j = 1, , p, and i = 0, 1, 2, . i Bx Simultaneous vector-iteration: Determine the p (p > 1) largest eigenvalues and the associated eigenvectors of (6.1). (i) (i) Compute simultaneuously the approximations x1 , , xp . Let
i) X (i) = (x1 , , x( p ) Rnp . (i)

(6.37)

We demand X (i) satises the following relation: X (i) BX (i) = Ip .


T

(6.38)

(i) Since x T are nearly B -orthogonal. From (6.34) we coni Bxj = ij , the columns of X (i) struct X by BY (i) = AX (i1) (6.39)

and then X (i) = Y (i) Ci , (6.40) where Ci Rpp and is chosen such that (6.38) is satised. We have the following methods for determining Ci . (a) Apply orthogonalization algorithm to the columns of Y (i) , then Ci is an upper triangular matrix. (i) (i) (i) (i) Let Y (i) = (y1 , , yp ) and X (i) = (x1 , , xp ). For k = 1, , p,
1 hk = y k k =1 (yk Bx )x , (i) 1/2 xk = hk /(hT . k Bhk ) (i) (i)T (i) (i)

256 Chapter 6. The Symmetric Eigenvalue problem (0) (i) The rst column x1 is the same as that we apply power method to x1 . Convergence can be slow. (b) Dene Gi = Y (i) BY (i) , then Gi is positive denite. There exists an orthogonal matrix Vi and Di = diag (dj ) with d1 d2 dp > 0 such that Gi = Vi Di ViT . Let X (i) = Y (i) Ci where Ci = Vi Di Check X (i) BX (i) = CiT Y (i) BY (i) Ci = (Vi Di = Di
1/2
T T T

1/2

and Di

1/2

= diag (1/ d1 , , 1/ dp ).

(6.41)

1/2 T

) Gi (Vi Di

1/2

ViT Gi Vi Di

1/2

= Ip .

So the columns of X (i) are B -orthogonal. Method (b) brings the approximations in the correct order. Example: Let X (1) = (x2 , x3 , x1 ), where xT i Bxj = ij and Axi = i Bxi , i, j = 1, 2, 3. Method (a): X (i) = X (1) , Y (2) = B 1 AX (1) = (2 x2 , 3 x3 , 1 x1 ). Then X (2) = (x2 , x3 , x1 ) = X (1) . Method (b): 2 2 0 0 1 0 2 2 0 0 0 2 G2 = , D2 = 3 2 0 0 2 0 0 1 0 1 0 0 1/2 V2 = 0 0 1 , C2 = V2 D2 = 0 1 1 0 0 1 X (2) = Y (2) C2 = (x1 , x2 , x3 ). Method (b) forces the eigenvectors in the correct order. (6.39) and (6.40) imply Treppen iteration (F.L. Bauer 1957) : For B = I : AX (i1) = Y (i) = X (i) Ci1 = X (i) Ri , where Ri is upper triangular. p = n: See the connection with QR Algorithm. Ai = X (i1) AX (i1) , Qi = X (i1) X (i) , Ai = Qi Ri , Ai+1 = Ri Qi .
T T

1 0 2 1 0 . 3 0 0

0 0 , 2 3

Then

(6.42)

6.6 Generalized Denite Eigenvalue Problem Ax = Bx 257 T 1 T (i) p < n: and B = LL positive denite: Treppen iteration for L AL leads to Z : L1 ALT Z (i1) = Z (i) Ri , Let X (i) = LT Z (i) , rewrite (6.43) to X (i) : AX (i1) = BX (i) Ri and Improvement: B = I . Recall Theorem 6.6.13 A is real and symmetric, Q Rnp orthogonal and S Rpp symmetric, then for an eigenvalue i (S ) of S there exists an eigenvalue ki (A) of A such that |i (S ) ki (A)| AQ QS 2 , i = 1, , p. Theorem 6.6.14 Let S0 = QT AQ, then AQ QS0 for all symmetric matrix S Rpp . For given orthogonal matrix X (i) , if we construct Si = X (i) AX (i) , then the eigenvalues of Si are the optimal approximations to the eigenvalues of A (optimal error estimation). Also good error estimation for eigenvectors. From Si z = z follows that AX (i) z X (i) z = (AX (i) X (i) Si )z, A(X (i) z ) (X (i) z )
2
T

Z (i) Z (i) = Ip .

(6.43)

(6.44)

X (i) LT LX (i) = Ip = X (i) BX (i) .

2,F

AQ QS

2,F

AX (i) X (i) Si

z 2.

So X (i) z is a good approximation to an eigenvector of A. B =positive denite: Given n p matrix S with rank (S ) = p. Let S = span(S ). Find a new base of S , which presents a good approximation to eigenvectors of Ax = Bx. (6.1) is equivalent to : x = B 1/2 AB 1/2 , x A = x with A = B 1/2 x. Orthonormalize B 1/2 S (S B 1/2 S ) and results = B 1/2 S (S T BS )1/2 . S (6.46) (6.45)

S are = Ip ). From above we know that the eigenvalue i of H = S T A T S (CheckS a good approximation to an eigenvalue of (6.45), so of (6.1) and g i is the associated

258 Chapter 6. The Symmetric Eigenvalue problem g eigenvector, then S g i is a good approximation to an eigenvector of (6.45), so B 1/2 S i an approximation to an eigenvector of (6.1). Rewrite A, B , S : = (S T BS )1/2 S T B 1/2 B 1/2 AB 1/2 B 1/2 S (S T BS )1/2 H = (S T BS )1/2 (S T AS )(S T BS )1/2 g Then H i corresponds to (S T AS i S T BS ) (S T BS )1/2 g i = 0,
gi

i.e. (As i Bs )gi = 0 with As = S T AS, Bs = S T BS. (6.47)

If S is given, construct As , Bs . The eigenvalues of As z = Bs z are good approximations to eigenvalues of (6.1). Compute the eigenvectors gi of (6.47), then Sgi are approximations to the eigenvectors of (6.1). Some variant simultaneuous vector iterations (B = I ): (a) (1) (2) (3) (4) Y ( ) = AX ( 1) , Orthonormalize Y ( ) = Q R (QR decomposition), Compute H = QT AQ , Solve the complete eigenvalue system for H , H = G GT , G : orthogonal and : diagonal, (5) X ( ) = Q G (The element of are in decreasing order).

(6.48)

The computation of (1) and (3) are expansive, it can be avoided by the following way. Since the invariant subspaces and eigenvectors of A and A2 are equal, so we can consider 2 the matrix A2 instead of A. The eigenvectors of QT A Q are the good approximations for the eigenvectors of A. Compute
Ip 2 1 T ( 1) 1 . AA2 AX ( 1) R QT A Q = ( R ) X 1 (Here Q = AX ( 1) R from (1) (2) above) T 1 T 1 ) . R = ( R R = R
T

So we have the following new method: (b) (1) (2) (3) (4) (5) Y ( ) = AX ( 1) , Orthonormalize Y ( ) = Q R , = R R T , Compute H = P 2 P T , P : orthogonal, : diagonal, Solve H X ( ) = Q P .

(6.49)

6.6 Generalized Denite Eigenvalue Problem Ax = Bx (c) Third variant compution of Q : Find F such that Y ( ) F is orthogonal. So
T ( ) F Y Y ( ) F = I
T

259

(6.50)

and

T 1 T 1 Y ( ) Y ( ) = F F = (F F ) .

On the other hand Y ( ) F diagonalize A2 , i.e.,


2 T ( ) 2 ( ) F = Y A Y F diagonal.
T

(6.51)

From (6.51) and because of Y ( ) = AX ( 1) follows


( 1) 2 T T AA2 AX ( 1) F = F F . = F X Ip T Thus I = F F and then F is orthogonal. Using (6.50), we have T 1 1 1 H = Y ( ) Y ( ) = (F )2 ( F ) 1 = ( F )T 2 ( F ) .
T T

ortho.

diag. ortho.

The diagonal elements of 2 are the eigenvalues of H and the column of F are the eigenvectors of H , therefore we can compute F as follows: (1)Y ( ) = AX ( 1) , = Y ( )T Y ( ) , (2)Compute H = B 2 B T complete eigensystem of H , (3)Compute H 1 ( ) (4)X ( ) = Y ( ) B F ). (= Y The cost of computation of (6.52) is more favorable than of (6.49).

(6.52)

260

Chapter 6. The Symmetric Eigenvalue problem

Chapter 7 Lanczos Methods


In this chapter we develop the Lanczos method, a technique that is applicable to large sparse, symmetric eigenproblems. The method involves tridiagonalizing the given matrix A. However, unlike the Householder approach, no intermediate (an full) submatrices are generated. Equally important, information about A s extremal eigenvalues tends to emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm particularly useful in situations where a few of A s largest or smallest eigenvalues are desired.

7.1

The Lanczos Algorithm

Suppose A Rnn is large, sparse and symmetric. There exists an orthogonal matrix Q, which transforms A to a tridiagonal matrix T . QT AQ = T tridiagonal. (1.1)

Remark (1) Such Q can be generated by Householder transformations or Givens rotations. (2) Almost for all A (i.e. all eigenvalues are distinct) and almost for any q1 Rn with q1 2 = 1, there exists an orthogonal matrix Q with rst column q1 satisfying (1.1). q1 determines T uniquely up to the sign of the columns (that is, we can multiply each column with -1). Let (x Rn ) K [x, A, m] = [x, Ax, A2 x, , Am1 x] Rnm . K [x, A, m] is called a Krylov-matrix. Let K(x, A, m) = Range(K [x, A, m]) = Span(x, Ax, , Am1 x). K(x, A, m) is called the Krylov-subspace generated by K [x, A, m]. Remark: For each H Cnm or Rnm (m n) with rank (H ) = m, there exists an Q Cnm or Rnm and an upper triangular R Cmm or Rmm with Q Q = Im such that H = QR. (1.4) (1.3) (1.2)

262 Q is uniquely determined, if we require all rii > 0.

Chapter 7. Lanczos Methods

Theorem 7.1.1 Let A be symmetric (Hermitian), 1 m n be given and dim K(x, A , m) = m then (a) If K [x, A, m] = Qm Rm (1.5) is an QR decomposition, then Q m AQm = Tm is an m m tridiagonal matrix and satises AQm = Qm Tm + rm eT m, (b) Let x
2

Q m rm = 0.

(1.6)

= 1. If Qm C nm with the rst column x and Q m Qm = Im and satises AQm = Qm Tm + rm eT m,

where Tm is tridiagonal, then


m1 K [x, A, m] = [x, Ax, , Am1 x] = Qm [e1 , Tm e1 , , Tm e1 ]

(1.7)

is an QR decomposition of K [x, A, m] . Proof: (a) Since AK(x, A, j ) K(x, A, j + 1), From (1.5) we have Span(q1 , , qi ) = K(x, A, i), So we have qi+1 K(x, A, i) AK(x, A, i 1) = A(span(q1 , , qi1 )). This implies
qi +1 Aqj = 0, (1.8)

j < m. i m.

(1.8) (1.9)

j = 1, , i 1,

i + 1 m.

That is
(Q m AQm )ij = (Tm )ij = qi Aqj = 0 for i > j + 1.

So Tm is upper Hessenberg and then tridiagonal (since Tm is Hermitian). It remains to show (1.6). Since [x, Ax, , Am1 x] = Qm Rm and 0 0 + Am xeT m, 0

. 1 .. AK [x, A, m] = K [x, A, m] .. .. . . 0 we have 0 0 1

. 1 .. AQm Rm = Qm Rm .. .. . . 0 1 0

m T m T + Qm Q m A xem + (I Qm Qm )A xem .

7.1 The Lanczos Algorithm Then 0 . 1 .. AQm = Qm [Rm .. . 0 0 . 1 .. = Qm [Rm .. . 0

263 0 .. 1 . 0 0 .. 1 . 0 m T 1 m T 1 + Q m A xem ]Rm + (I Qm Qm )A xem Rm 1 T m T m Rm + Q m A xem ] + (I Qm Qm )A x em


rm

= Qm H m + r m e T m with Qm rm = 0,

where Hm is an upper Hessenberg matrix. But Q m AQm = Hm is Hermitian, so Hm = Tm is tridiagonal. (b) We check (1.7): x = Qm e1 coincides the rst column. Suppose that i-th columns are equal, i.e.
i1 Ai1 x = Qm Tm e1 i i1 A x = AQm Tm e1 i1 = (Qm Tm + rm eT m )Tm e1 i i1 = Q m Tm e1 + rm eT m Tm e1 . i1 i i But eT m Tm e1 = 0 for i < m. Therefore, A x = Qm T e1 the (i + 1)-th columns are equal. m1 It is clearly that (e1 , Tm e1 , , Tm e1 ) is an upper triangular matrix.

Theorem 7.1.2 If x = q1 with q1

= 1 satises

rank (K [x, A, n]) = n (that is {x, Ax, , An1 x} are linearly independent), then there exists an unitary matrix Q with rst column q1 such that Q AQ = T is tridiagonal. Proof: From Theorem 7.1.1(a) m = n we have Qm = Q unitary and AQ = QT . Uniqueness: Let Q AQ = T , AQ =T and Q1 e1 = Qe 1 Q R K [q1 , A, n] = QR = Q Q = QD, R = DR. Substitute Q by QD, where D = diag ( 1 , ,
n)

with | i | = 1. Then

(QD) A(QD) = D Q AQD = D T D = tridiagonal. So Q is unique up to multiplying the columns of Q by a factor with | | = 1.

In the following paragraph we will investigate the Lanczos algorithm for the real case, i.e., A Rnn .

264 Chapter 7. Lanczos Methods How to nd an orthogonal matrix Q = (q1 , , qn ) with QT Q = In such that QT AQ = T = tridiagonal and Q is almost uniquely determined. Let AQ = QT, Q = [q1 , , qn ] and T = 1 1 0 .. .. . . 1 .. .. . . n1 0 n1 n . (1.10)

It implies that the j -th column of (1.10) forms: Aqj = j 1 qj 1 + j qj + j qj +1 ,


T for j = 1, , n with 0 = n = 0. By multiplying (1.11) by qj we obtain T Aqj = j . qj

(1.11)

(1.12)

Dene rj = (A j I )qj j 1 qj 1 . Then rj = j qj +1 with j = rj and if j = 0 then qj +1 = rj /j . So we can determine the unknown j , j , qj in the following order: Given q1 , 1 , r1 , 1 , q2 , 2 , r2 2 , q3 , . The above formula dene the Lanczos iterations: j = 0, r0 = q1 , 0 = 1 , q0 = 0 Do while (j = 0) qj +1 = rj /j , j := j + 1 T Aqj , j = qj rj = (A j I )qj j 1 qj 1 , j = rj 2 . (1.14)
2

(1.13)

(1.15)

There is no loss of generality in choosing the j to be positive. The qj are called Lanczos T (Aqj j 1 qj 1 ), the vectors. With careful overwriting and use of the formula j = qj whole process can be implemented with only a pair of n-vectors. Algorithm 7.1.1 (Lanczos Algorithm): Given a symmetric A Rnn and w Rn having unit 2-norm. The following algorithm computes a j j symmetric tridiagonal matrix Tj with the property that (Tj ) (A).

7.1 The Lanczos Algorithm 265 The diagonal and subdiagonal elements of Tj are stored in 1 , , j and 1 , , j 1 respectively. vi := 0 (i = 1, , n) 0 := 1 j := 0 Do while (j = 0) if (j = 0), then for i = 1, , n, t := wi , wi := vi /j , vi := j t. v := Aw + v, j := j + 1, j := wT v, v := v j w, j := v 2 . Remark (1) If the sparity is exploited and only kn ops are involved in each call (Aw) (k n), then each Lanczos step requires about (4+k )n ops to execute. (2) The iteration stops before complete tridiagonalizaton if q1 is contained in a proper invariant subspace. From the iteration (1.15) we have 1 1 rm . .. . m1 1 . . A(q1 , , qm ) = (q1 , , qm ) + (0, , 0, m qm+1 ) .. .. . . rm eT m m1 m m = 0 if and only if rm = 0. This implies A(q1 , , qm ) = (q1 , , qm )Tm . That is Range(q1 , , qm ) = Range(K [q1 , A, m]) is the invariant subspace of A and the eigenvalues of Tm are the eigenvalues of A. Theorem 7.1.3 Let A be symmetric and q1 be a given vector with q1 2 = 1. The Lanczos iterations (1.15) runs until j = m where m = rank [q1 , Aq1 , , An1 q1 ]. Moreover, for j = 1, , m we have AQj = Qj Tj + rj eT j with Tj = 1 1 1 .. . .. . .. .. . . j 1 j and Qj = [q1 , , qj ] (1.16)

j 1

has orthonormal columns satisfying Range(Qj ) = K(q1 , A, j ).

266 Chapter 7. Lanczos Methods Proof: By induction on j . Suppose the iteration has produced Qj = [q1 , , qj ] such that Range(Qj ) = K(q1 , A, j ) and QT j Qj = Ij . It is easy to see from (1.15) that (1.16) holds. Thus T T QT j AQj = Tj + Qj rj ej .
T Since i = qi Aqi for i = 1, , j and T T T qi +1 Aqi = qi+1 (i qi+1 + i qi + i1 qi1 ) = qi+1 (i qi+1 ) = i T for i = 1, , j 1 we have QT j AQj = Tj . Consequently Qj rj = 0.

If rj = 0 then qj +1 = rj / rj

is orthogonal to q1 , , qj and

qj +1 Span{Aqj , qj , qj 1 } K(q1 , A, j + 1). Thus QT j +1 Qj +1 = Ij +1 and Range(Qj +1 ) = K(q1 , A, j + 1). On the other hand, if rj = 0, then AQj = Qj Tj . This says that Range(Qj ) = K(q1 , A, j ) is invariant. From this we conclude that j = m = dim[K(q1 , A, n)]. Encountering a zero j in the Lanczos iteration is a welcome event in that it signals the computation of an exact invariant subspace. However an exactly zero or even small j is rarly in practice. Consequently, other explanations for the convergence of Tj s eigenvalues must be sought. Theorem 7.1.4 Suppose that j steps of the Lanczos algorithm have been performed and that T Sj Tj Sj = diag (1 , , j ) is the Schur decomposition of the tridiagonal matrix Tj , if Yj Rnj is dened by Yj = [y1 , , yj ] = Qj Sj then for i = 1, , j we have Ayi i yi where Sj = (spq ). Proof: Post-multiplying (1.16) by Sj gives AYj = Yj diag (1 , , j ) + rj eT j Sj , i.e., Ayi = i yi + rj (eT j Sj ei ) , i = 1, , j. The proof is complete by taking norms and recalling rj
2 2

= |j ||sji |

= | j |

Remark: The theorem provides error bounds for Tj s eigenvalues:


(A)

min |i | |j ||sji | i = 1, , j.

7.1 The Lanczos Algorithm Note that in section 10 the (i , yi ) are Ritz pairs for the subspace R(Qj ).

267

T If we use the Lanczos method to compute AQj = Qj Tj + rj eT j and set E = ww where = 1 and w = aqj + brj , then it can be shown that T (A + E )Qj = Qj (Tj + a2 ej eT j ) + (1 + ab)rj ej .

If 0 = 1 + ab, then the eigenvalues of the tridiagonal matrix j = Tj + a2 ej eT T j are also eigenvalues of A + E . We may then conclude from theorem 6.1.2 that the interval [i (Tj ), i1 (Tj )] where i = 2, , j , each contains an eigenvalue of A + E . of A. One possibility is to choose a2 Suppose we have an approximate eigenvalue so that j ) = (j + a2 )pj 1 ( ) 2 pj 2 ( ) = 0, j I det(T j 1 using (5.3). where the polynomial pi (x) = det(Ti xIi ) can be evaluated at The following theorems are known as the Kaniel-Paige theory for the estimation of eigenvalues which obtained via the Lanczos algorithm. Theorem 7.1.5 Let A be n n symmetric matrix with eigenvalues 1 n and corresponding orthonormal eigenvectors z1 , , zn . If 1 j are the eigenvalues of Tj obtained after j steps of the Lanczos iteration, then (1 n ) tan (1 )2 , 1 1 1 [cj 1 (1 + 21 )]2
T where cos 1 = |q1 z1 |, 1 = (1 2 )/(2 n ) and cj 1 is the Chebychev polynomial of degree j 1.

Proof: From Courant-Fischer theorem we have 1 = max


y =0

y T Tj y (Qj y )T A(Qj y ) wT Aw = max = max . y =0 (Qj y )T (Qj y ) 0=wK(q1 ,A,j ) w T w yT y

Since 1 is the maximum of wT Aw/wT w over all nonzero w, it follows that 1 1 . To obtain the lower bound for 1 , note that 1 = max
pPj 1 T p(A)Ap(A)q1 q1 , T q1 p(A)2 q1

where Pj 1 is the set of all j 1 degree polynomials. If


n

q1 =
i=1

di zi
n 2 2 i=1 di p(i ) i n 2 2 i=1 di p(i )

then

T p(A)Ap(A)q1 q1 = T q1 p(A)2 q1

268 1 (1 n ) d2 1 p(1

n i=2 )2 +

Chapter 7. Lanczos Methods 2 d2 p i (i ) . n 2 2 i=2 di p(i )

We can make the lower bound tight by selecting a polynomial p(x) that is large at x = 1 in comparison to its value at the remaining eigenvalues. Set p(x) = cj 1 [1 + 2 x n ], 2 n

where cj 1 (z ) is the (j 1)-th Chebychev polynomial generated by cj (z ) = 2zcj 1 (z ) cj 2 (z ), c0 = 1, c1 = z.

These polynomials are bounded by unity on [-1,1]. It follows that |p(i )| is bounded by unity for i = 2, , n while p(1 ) = cj 1 (1 + 21 ). Thus, 1 1 (1 n ) (1 d2 1 1) . 2 2 d1 cj 1 (1 + 21 )

2 The desired lower bound is obtained by noting that tan (1 )2 = (1 d2 1 )/d1 .

Corollary 7.1.6 Using the same notation as the theorem 7.1.5 n j n + (1 n ) tan2 (n ) , c2 j 1 (1 + 2n )

T where n = (n1 n )/(1 n1 ) and cos (n ) = |q1 zn |.

Proof: Apply theorem 7.1.5 with A replaced by A. Example: Lj 1 1


1 [Cj 1 (2 2

1)]2

1 [Cj 1 (1 + 21 )]2

Rj 1 = ( 1 /2 1.5 1.01

2 2(j 1) ) 1

power methed j=25 1.4 10 /3.5 109 2.8 104 /6.2 101
27

j=5 1.1 10 /3.9 102 5.6 101 /9.2 101


4

Lj 1 /Rj 1 Lj 1 /Rj 1

Rounding errors greatly aect the behavior of algorithm 7.1.1, the Lanczos iteration. The basic diculty is caused by loss of orthogonality among the Lanczos vectors. To avoid these diculties we can reorthogonalize the Lanczos vectors. (1) Complete reorthogonalization: Orthogonalize qj to all q1 , , qj 1 by
j 1

qj := qj
i=1

T (qj qi )qi .

7.1 The Lanczos Algorithm 269 If we incorporate the Householder computations into the Lanczos process, we can produce Lanczos vectors that are orthogonal to working accuracy:

r0 := q1 (given unit vector) T T Determine P0 = I 2v0 v0 /v0 v0 so P0 r0 = e1 T 1 := q1 Aq1 Do j = 1, , n 1, rj := (A j )qj j 1 qj 1 (0 q0 0), w := (Pj 1 P0 )rj . T T /vj vj such that Determine Pj = I 2vj vj Pj w = (w1 , , wj , j , 0, , 0)T . qj +1 := (P0 Pj )ej +1 , T j +1 := qj +1 Aqj +1 . This is the complete reorthogonalization Lanczos scheme. (2) Selective reorthogonalization: A remarkable, somewhat ironic consequence of the Paige (1971) error analysis is that loss of orthogonality goes hand in hand with convergence of a Ritz pair. For details of (1) and (2) see the books: Parlett: Symmetric Eigenvalue problem (1980) pp.257 Golub & Van Loan: Matrix computation (1981) pp.332 Theorem 7.1.7 (Paige Theorem) Let yi = Qj Si , i = 1, . . . , j (Ritz vector). Then T yi qj +1 = rii /ji rii /(j sji ), where rii = O(), Recall that (round-o-error).

Ayi i yi 2 = |j ||sji | |ji | O() (very small), yj span(Qj ), rii T qj +1 yi = O () (very small) ji O() yes! if |ji | = O(1) = O(1) no! if |ji | = O(). Loss of orthogonality !

(1) Selective Reorthogonalization: Select good Ritz vectors ( |ji | o( ) ) and do reorthogonalization (ie. qi+1 good Ritz vector). (2) Restart: full reorthogonalization. Restart in m-steps (m = 30 50).

270 Lanczos method Given q1 = 0

Chapter 7. Lanczos Methods

j = 1, 2, ..., m T j = qj Aqj rj = (A j I )qj + j 1 qj 1 j = rj 2 , if j = 0, otherwise stop . qj +1 = rj /j end m

A(Qm ) = Qm Tm + rm eT m, where QT m Qm = Im , Tm = sT m Tm sm = m = 1 1 . 1 . . ... 0 1 0 ... 0 m ... ... m1 0 m1 m .

, Ritz value

A(Qm sj ) Qj (Qm sj ) 2 = jm |j ||sjm |, S = [s1 , , sm ], j = 1, . . . , m. Paige Theorem Since AQj = Qj Tj + rj eT j , let AQj Qj Tj = rj eT j + Fj
T I QT j Qj = Cj + j + Cj ,

(1.17) (1.18)

where Cj is strictly upper triangular and j is diagonal. (For simplicity, suppose (Cj )i,i+1 = 0 and i = 0.) Ritz vector yi Qj s i. 1 0 .. Ritz value sT j (Tj sj = j sj ). . j Tj sj = 0 j Theorem 7.1.8 (Paige Theorem) Assume (a) Sj and j are exact ! ( j n) T T (b) local orthogonality is maintained. ( i.e. qi+1 qi = 0, i = 1, . . . , j 1, rj qj = 0, and (Cj )i,i+1 = 0 ). Let
T FjT Qj QT j Fj = Kj Kj ,

j Tj Tj j Nj NjT ,
T Gj = Sj (Kj + Nj )Sj (rik ).

Then

7.1 The Lanczos Algorithm T (1) yi qj +1 = rii /ji , where yi = Qj Si , ji = j Sji . (2) For i = k ,
T (i k )yi yk = rii (

271 ( )

Sjk Sji ) rkk ( ) (rik rki ). Sji Sjk

(1.19)

Proof: From (1.17), AQj Qj Tj = rj eT j + Fj . (1.17) is multiplied from left by QT j


T T T T QT j AQj Qj Qj Tj = Qj rj ej + Qj Fj . T T T T (9.2.7)T QT j A Qj Tj Qj Qj = ej rj Qj + Fj Qj T T T (QT j j )ej ej (Qj j ) T T =(Cj Tj Tj Cj ) + (Cj Tj Tj Cj ) + (j Tj Tj j ) + FjT Qj Qj FjT T T T =(Cj Tj Tj Cj ) + (Cj Tj Tj Cj ) + (Nj NjT ) + (Kj Kj ) T (QT j rj )ej = Cj Tj Tj Cj + Nj + Kj SiT ( ) Si gives T yi qj +1 ji = SiT (Cj Tj Tj Cj )Si + SiT (Nj + Kj )Si =(SiT Cj Si )i i (SiT Cj Si ) + rii rii T yi qj +1 = ji (9.2.6) can be obtained by SiT (9.2.7) Sk . T Remark: To ( ): yi qj +1 = rii , ji

(1.20)

( )

i = 1, . . . , j . (1.21)

T yi qj +1 =

rii = ji

O(esp), if |ji | = O(1)(not converge!) O(1), if |ji | = O(esp)(converge for(j , yj ))

T qj +1 yi = O (1), qj +1 is not orthogonal to < Qj >, Qj Sj = yj (1) Full Reorthogonalization by MGS

qj +1 q1 , . . . , qj
j

qj +1 := qj +1
i=1

T (qj +1 qi )qi .

(2) Selective Reorthogonalization by MGS If |ji | = O( eps), (j , yj ) good Ritz pair Do qj +1 q1 , . . . , qj Else not to do Reorthogonalization (3) Restart after m-steps (Do full Reorthogonalization) (4) Partial Reorthogonalization Do reorthogonalization with previous (k=5) Lanczos vectors {q1 , . . . , qk }

272 (B)To (9.2.6): The duplicate pairs can occur! T i = k , (i k ) yi yk = O(esp) O(1), if yi = yk Qi Qk How to avoid the duplicate pairs ? The implicit Restart Lanczos algorithm: Let {(i , xi )}n i=1 : eigenpair of A u1 =
k k

Chapter 7. Lanczos Methods

ri xi +

ri xi
n

i=1

i=k+1

P (A)u1 P () : Filter poly of degree m k =


i=1

ri P (i )xi +
i=k+1

ri P (i )xi unexpected

expected

(1) k+1 , . . . , m [a, b] 1 , . . . , k /[a, b] P () = Chebychev poly of degree m-k (2) u1 , . . . , uk , uk+1 , . . . , um : Ritz values

expected

unexpected

P (t) = (t k+1 ) (t m ) Implicit Restarted Algorithm: AQk = Qk Tk + k qk+1 eT k, k < m rk Lanczos AQm = Qm Tm + m qm+1 eT m choose a lter poly of degree m-k P (t) = (t 1 ) (t mk ), 1 , . . . , mk : convergent Ritz values. mathemetician:P (A)u1 = (A 1 ) (A mk )u1 := q1 Apply Lanczos on q1 . (A K1 I )Qm = Qm (Tm K1 I ) +m qm+1 eT m u1 R1 (QR f actorization) (A K1 I ) Qm u1 = (Qm u1 )(R1 u1 ) + m qm+1 (eT m u1 ) Qm (1) AQm (1) = Qm (1) Tm (1) + m qm bm+1 (1)T (1) (1) where bm+1 (1)T = eT m u1 (0, . . . , 0, um1,m , um,m ) (1) Tm = R1 u1 + K1 I Tridiag Remark: The rst column of Qm (1) e1 = Qm u1 e1 = (A 1 I )q1 q1 (1) Repeat this process with 2 , . . . , mk = AQm (mk) = Qm (mk) Tm (mk) + m qm+1 bm+1 (mk)T Qm (mk) e1 := q1 = (A K1 I ) (A Kmk I )q1 Tumcate: (mk) (mk) (mk) (mk) T + tk+1,k qk+1 eT Tkk = Qk AQk k + k um,k qm+1 ek (mk) (mk) , , Tkk = Tkk Let Qk = Qk

7.1 The Lanczos Algorithm (mk) k = ||tk+1,k qk+1 + k um,k qm+1 ||2 , qk+1 =
Pk , k

273 + k um,k qm+1 ,

where Pk = tk+1,k qk+1 k qk+1 eT k.

(mk)

AQk = Qk = Qk Tkk + (1) Implicit Restarted Lanczos (2) Krylov-Schur cycle

[2] The symmetric eigenvalue problem , Parlett(1981) CH.11 Approximation from a subspace CH.12 Krylov subspace Assumption: A: symmetric, Azi = i zi , i = 1, . . . , n. 1 2 n n 1 T (x) = (x, A) = xxTAx x 1 Given a subspace S (m) =< F >=< F (F T F ) 2 >< Q > Rayleigh-Ritz-Quotient procedure T H := (Q) = Q AQ, QT Q = I Hgi = i gi , (i gi ) : Ritz pair yi = Qi gi (extension by Q) Check : {(i , yi )}m approximate eigenpair? i=1 Ayi i yi 2 T ol, ri = Ayi i yi residual Optimality (1) minmax: j = j (A) = min max (f, A) j n j
F R f F

j := min max (f, A), j m j m j


G S f G

= j j (H ) Gj S m QGj = Gj j := min max (S.H ) = j = j (H ), j = 1, . . . , m.


Gj Rm S Gj

(2) Optimal Residual: Let R(B ) = AQ QB R(H ) 2 R(B ) 2 AQ QH 2 AQ QB 2 (3) Projection on S m : Hgi = i gi , i = 1, . . . , m QT AQgi = i gi QQT A(Qgi ) = i (Qgi ) Qgi = yi , QQT yi = Q(QT Q)gi = yi PQ = QQT projection on < Q > (QQT )A(QQT )yi = i yi (PA APA )yi = i yi PA (Ayi i yi ) = 0 ri = Ayi i yi < Q >= S m Theorem 7.1.9 H = QT AQ, i (A), s.t.|j j | R 2 = AQ QH

274 (By extension Thm) Theorem 7.1.10


m i=1

Chapter 7. Lanczos Methods

(j j )2 2 R

2 F

some j

Wiedlanclt - Homann Theorem 7.1.11 Let y be a unit vector = (y ), be an EW of A closed to , z be the EV. Let r =min |i (A) |. Then (1)| | r(y ) 2 /r. (2)| sin | r(y ) /r, where r(y ) = Ay y , = (y, z ). Proof: Claim(2): Decompose y = z cos + w sin ,z T w = 0. r(y ) = z ( ) cos + (A )w sin . Az = z z T (A )w = 0 2 2 2 2 r(y ) 2 2 = ( ) cos + (A )w 2 sin , (i )2 i2 , w = i zi and |wT (A )(A )w| = r2 ( i = i2 ) = r2 , 2 2 r(y ) 2 2 (A )w 2 sin r(y ) 2 | sin | r .
i = i = i =

Claim(1): r(y )y (ri = Ayi i yi < Q >) ie. 0 = y T r(y ) = ( ) cos2 + wT (A )w sin2 T A )w 2 Thus cos =w ( sin2 T A )w cos2 Let sin2 k = w ( sin2 1 = k+1 wT (A)w
w (A )w k similarly, cos2 = k+1 =w T (A)w 2 T r(y ) 2 = ( )w (A )(A )w/wT (A )w (A )(A )zi = (i )(i )zi positive denite >0 wT (A )(A )w = |i ||i |zi2
T

r
i =

|i

|zi2
2 2

i =

r|
i =

(i )zi2 | = rwT (A )w

| |

r(y ) r

100 years old and still alive : Eigenvalue problems Hank / G. Gloub / Van der Vorst / 2000 A priori bounds for interior Ritzvalues T Given S m =< Q > subspace, {(i , yi )}m i=1 Ritz pairs of H = Q AQ Azi = i zi , i = 1, . . . , n. Lemma 7.1.12 For each j m for any unit s S m satisfying sT zi = 0, i = 1, . . . , j 1.
j 1 i=1

Then j j (s)+

(1 i ) sin2 i

7.1 The Lanczos Algorithm


j 1

275 (1 i ) sin2 i ,
i=1

(s)+

where i = (yi , zi ).
j 1

Proof: Take s = t+ t=
m i=j i=1

ri yi ,

ri yi , tyi , i = 1, . . . , j 1.

Assumption: sT zi = 0, i = 1, . . . , j 1. Find bound of ri , i = 1, . . . , j 1. |ri | = |sT yi | = |sT (yi zi cos )| s 2 | sin |. T yi zi cos i 2 2 = (yi zi cos i ) (yi zi cos i ) = 1 cos2 i cos2 i + cos2 i = 1 cos2 i = sin2 i T tT Ayi = 0, and yi Ayk = 0, i = k , i = 1, . . . , j 1. T T T Ayk , i = k , i,k=1,. . . ,m. 0 = gi (Q AQ)gk = yi (s) = tT At+
j 1 i=1 T 2 (yi Ayi )ri j 1 i=1 2 (i 1 )ri

(s) 1 = tT (A 1 )t+
tT (A tT t
1 )t

j 1

i=1 j 1 i=1

2 (i 1) ri

(t) 1 +

(i 1 ) sin2 i

and (t) j , tyi , i = 1, . . . , j 1 Assortion ! Let ij = (zi , yj ), i = 1, . . . , n , j = 1, . . . , m ,ii = i


n

yj =
i=1

zi cos ij

(1.22) (1.23)

| cos ij | | sin i |
n j 1

cos2 ij = sin2 j
i=j +1 i=1

cos2 ij

(1.24)

Lemma 7.1.13 For each j = 1, . . . , m,


j 1

sin j [(j j )+
i=1

(j +1 i ) sin2 i ]/(j +1 j )

(1.25)

Remark: T T T yi = 0 , i = j ) (yi cos i zi )| ( yj zi | = |yj prove (9.3.10):| cos ij | = |yj yj 2 yi cos i zi 2 | sin i | ( |(yi cos i zi )T (yi cos i zi )| = sin2 i ) claim (1.24): yj =
n

zi cos ij

i=1

276 1 = (yj , y j ) =

n i=1

Chapter 7. Lanczos Methods cos2 ij


n j 1

1 cos2 jj = sin2 j =
i=j +1

cos2 ij +
i=1 h

cos2 ij

(1.26)

Proof: By (9.3.9), (yj , A j I ) = j j =


j 1

(i j ) cos2 ij (i j ) cos2 ij
n i=j +1

i=1

j j +
i=1

(j i ) cos2 ij =

i=j +1

(j +1 j )
(9.3.13)

cos2 ij
j 1 i=1

(j +1 j )(sin2 j

cos2 ij )

Solve sin2 j and use (9.3.10) Inequation (9.3.12) explanation: A priori bound for interior Ritz values By Lemma 7.1.12, 7.1.13 j = 1: 1 (s), sT z1 = 0(Lemma 7.1.12) (s)1 1 1 , sT z1 = 0(Lemma 7.1.13) j = 1: sin2 1 2 1 2 1 j = 2: 2 (s) + (1 1 ) sin2 1 ( )1 (s) + (1 1 ) 2 1 sT z1 = sT z2 = 0, T z1 = 0(Lemma 7.1.12) j = 2: sin2 2
(Lemma7.1.13)

(2 2 ) +

j =1,j =2

(3 1 ) sin2 1 3 2 3 1 (t)1 ( ) 3 2 2 1

. . .

(t)1 ) 2 ] + [(s) + (1 1 )( 2 1

Chapter 12 Krylov subspace Azj = j zj , j = 1, 2, . . . , n K m (f ) = [f, Af, . . . , Am1 f ] Sm = Km (f ) = f, Af, . . . , Am1 f created by Lanczos(A:symmetric) or Arnoldi(A:unsymmetric) Sm = Qm H m = ( QT m AQm ), Hm Sj = j Sj , yj = Qm Sj , j = 1, . . . , m (j , yj ) : Rayleigh - Ritz pair , j :R-Ritz value , yj : R-Ritz vector
m Lemma 7.1.14 Let {(i , yi )}m i=1 be Ritz pairs of K (f ), then (A)f yk (k ) = 0, k = 1, . . . , m, where P m1

Proof: Let ( ) = ( k ) ( ), ( ) P m2 T T yk (A)f =yk (A k ) (A)f K m (f ) T = rk (A)f = 0 ( rk Q = Km (f )) exercise!

7.1 The Lanczos Algorithm [2] The symmetric eigenvalue problem , Parlett(1981)

277

Denition 7.1.15 ( ) = ( i ), k ( ) =
i=1

( ) . ( k )

Corollary 7.1.16 yk =

k (A)f k (A)f

Proof: k (i ) = 0, i = k , i = k Lemma7.1.14 k (A)f yi , i = k k (A)f k (A)f yk yk = . k (A)f Lemma 7.1.17 Let H be the normalized projection of f orthogonal to Z j , Z j span(z1 , . . . , zj ). For each P m1 , j m , sin (f, Z j ) (A)h 2 ( (A)f, A j I ) (n j )[ ] cos (f, Z j ) | (j |) Proof: = (f, Z j )(= cos1 f Z j ) f = g cos + h sin , Z j is invariant. S (A)f = (A)g cos + (A)h sin Zj (Z j ) g (A j I ) 2 (A)g cos2 + h (A j I ) 2 (A)h sin2 (s, A j I ) = (A)f 2 1 2 n (a) v (A j I )v 0, v Z j , in particular, v = (A)g v Av ( j , v Z j ) v v (b) w (A j I )w (n j ) w 2 , w (Z j ) , in particular, w = (A)h (A)h sin 2 reduction by (a), (b) (s, A j I ) (n j )[ ] (A)f and s
2

(1.27)

= (A)f

2 (j ) cos2 (f, zj ), where f =

(f zi )zi

(9.2.4). The Error Bound of Kaniel and Saad: The Error bounds come from choosing P m1 in Lemma 7.1.17 s.t. (i) | (j )| is large,while (A)h is small as possible,and (ii) (s, A j I ) 0 To (i): By Chebychev poly:
n

i=1

2 (i ) cos2 (f, zj )
n i=j +1

(A)h

i=j +1

cos2 (f, zj )
[j +1 ,n ]

max 2 (i )
i>j

max

2 ( ) 2 ( )

Chebychev poly solves min nj


P

[j +1 ,n ]

max

278 Chapter 7. Lanczos Methods To (ii): (a) 0 j j (Cauchy interlace Theorem) (b) j j (s, A j I ), syi , i = 1, . . . , j 1 (By minmax theorem)
j 1

(c) j j (s, A j I )+
i=1

(n i ) sin2 (yi , zi )

if szi , i = 1, . . . , j 1, (by Chap11 Lemma 7.1.12) Theorem 7.1.18 (Saad) Let 1 n be the Ritz values from Km (f ) (by Lanczos or Arnoldi) For j = 1, . . . , m, where r = 0 j j (n j )[
k n sin (f, Z j ) ( ) k j

j 1

k=1

j j +1 j +1 n
j 1 k=1

cos (f, Z j )Tmj (1 + 2r)

]2

and tan (zj , Km )

k n sin (f, Z j ) ( ) k j

cos (f, Z j )Tmj (1 + 2r)

Proof: Apply Lemma 7.1.17, Lemma 7.1.14, To ensure (b), require syi , i = 1, . . . , j 1 By Lemma 7.1.14, we construct ( ) = ( 1 ) ( j 1 ) ( ), P mj = 0 (A)f yi , i = 1, . . . , j 1 By Lemma 7.1.17 for this ( ) : (A)h (A 1 ) (A j 1 ) (A)h , h Zj | (j )| |(j 1 ) (j j 1 )|| (j )| n k | ( )| | max , [j +1 , j ] | (j )| k=1 n k j 1 n k | ( )| | min | max | (j )| P mj j k=1 j k j 1 n k 1 = | | k=1 j k Tmj (1 + 2r ) |
j 1 (i )

(1.28)

[1, 1], t = (t [j +1 , n ] t
(9.2.4),(9.2.5)

2t j +1 n ) n j +1
j 1

0 j j

(n j )[

k=1 j cos (f, Z )Tmj (1

k m sin (f, Z j ) ( ) k j

+ 2r )

]2

To prove the second inequality: is chosen to satisfy (i ) = 0, i = 1, . . . , j 1 s = (A)f = zj (j ) cos (f, zj ) + (A)h sin

7.1 The Lanczos Algorithm sin (f, Z j ) (A)h tan (s, zj ) = , cos (f, zj )| (j )| where ( ) = ( 1 ) ( j 1 ) ( ), ( ) P mj is chosen by chebychev poly as above inequality.
1

279

Theorem 7.1.19 Let m . . . 1 be Royleigh-Ritz values of Km (f ), Azj = sin (f, Z j ) j zj , j = n, . . . , 1, n 1 , 0 j j (j 1 )[


1
k n ) ( k j

k=j +1

n k ( ) k j

cos (f, zj )Tmj (1 + 2r)

]2 ,

sin (f, Z j ) k=j +1 j 1 j m , tan(zj , K ) where r = [ ]2 . n j 1 cos (f, zj ) Tmj (1 + 2r) Theorem 7.1.20 (Kaniel) By (c) and Lemma 7.1.12 of Chap11 s = (A)f = (A 1 ) (A j 1 ) (A)f By Lemma 7.1.17 and Lemma 7.1.14
k n sin (f, Z j ) ( ) k j

j 1

0 j j (n j )[
j 1

k=1

cos (f, zj )Tmj (1 + 2r)

]2

+
k=1

(n k ) sin2 (yk , zk )
j 1

(j j )+ and sin2 (yj , zj ) j j +1 where r = . j +1 n


k=1

(j +1 k ) sin2 (yk , zk ) j +1 j

280

7.2
7.2.1

Applications to linear Systems and Least Squares


Symmetric Positive Denite System

Chapter 7. Lanczos Methods

Recall: Let A be symmetric positive denite and Ax = b. Then x minimizes the functional 1 (x) = xT Ax bT x. (2.1) 2 An approximate minimizer of can be regarded as an approximate solution to Ax = b. One way to produce a sequence {xj } that converges to x is to generate a sequence of orthonormal vectors {qj } and to let xj minimize over span{q1 , , qj }, where j = 1, , n. Let Qj = [q1 , , qj ]. Since 1 T T x span{q1 , , qj } (x) = y T (QT j AQj )y y (Qj b) 2 for some y Rj , it follows that xj = Qj yj , where
T (QT j AQj )yj = Qj b.

(2.2) (2.3)

Note that Axn = b. We now consider how this approach to solving Ax = b can be made eective when A is large and sparse. There are two hurdles to overcome: (1) the linear system (2.3) must be easily solved; (2) we must be able to compute xj without having to refer to q1 , , qj explicitly as (2.2) suggests. To (1): we use Lanczos algorithm algorithm 7.1.1 to generate the qi . After j steps we obtain AQj = Qj Tj + rj eT (2.4) j, where 1 1 0 and Tj yj = QT j b. (2.5)

.. . 1 2 Tj = Q T AQ = j j .. .. . . j 1 0 j 1 j

With this approach, (2.3) becomes a symmetric positive denite tridiagonal system which may be solved by LDLT Cholesky decomposition, i.e., Tj = Lj Dj LT j, where 1 Lj = 2 0 0 . ... . . .. .. . . 0 j 1 and Dj = d1 0 0 (2.6)

..

0 . dj

7.2 Applications to linear Systems and Least Squares Comparsion of the entries of (2.6): d1 i i di Note that we need only calculate j = j 1 /dj 1 dj = j j 1 j in order to obtain Lj and Dj from Lj 1 and Dj 1 . To (2): Trick: we dene Cj = [c1 , , cj ] Rnj and pj Rj by the equations = Qj Cj LT j Lj Dj pj = QT jb and observe that
T 1 T xj = Qj Tj1 QT j b = Qj (Lj Dj Lj ) Qj b = Cj pj .

281

= = = =

1 , 2, , j, i1 /di1 , i i1 i .

(2.7)

(2.8)

(2.9)

It follows from (2.9) that [c1 , 2 c1 + c2 , , j cj 1 + cj ] = [q1 , , qj ], and therefore Cj = [Cj 1 , cj ], 0 Lj 1 Dj 1 0 0 j dj 1 dj cj = qj j cj 1 . . If we set pj = [1 , , j ]T in Lj Dj pj = QT j b, then that equation becomes j 1 j 1 2 . . . = T qj 1 b T b qj
T q1 b T q2 b . . .

Since Lj 1 Dj 1 pj 1 = QT j 1 b, it follows that pj = and thus xj = Cj pj = Cj 1 pj 1 + j cj = xj 1 + j cj . This is precisely the kind of recursive formula for xj that we need. Together with (2.8) and (2.9) it enables us to make the transition from (qj 1 , cj 1 , xj 1 ) to (qj , cj , xj ) with a minimal amount of work and storage. pj 1 j ,
T j = ( q j b j dj 1 j 1 )/dj

282 Chapter 7. Lanczos Methods A further simplication results if we set q1 = b/0 where 0 = b 2 . For this choice of T a Lanczos starting vector we see that qi b = 0 for i = 2, 3, . It follows from (2.4) that
T T Axj = AQj yj = Qj Tj yj + rj eT j yj = Qj Qj b + rj ej yj

= b + rj eT j yj . Thus, if j = rj 2 = 0 in the Lanczos iteration, then Axj = b. Moreover, since Axj b 2 = j |e T j yj |, the iteration provides estimates of the current residual. Algorithm 7.2.1 Given b Rn and a symmetric positive denite A Rnn . The following algorithm computes x Rn such that Ax = b.
T Aq1 , d1 = 1 , c1 = q1 , x1 = b/1 . 0 = b 2 , q1 = b/0 , 1 = q1 For j = 1, , n 1, rj = (A j )qj j 1 qj 1 (0 q0 0), j = rj 2 , If j = 0 then Set x = xj and stop; else qj +1 = rj /j , T j +1 = qj +1 Aqj +1 , j +1 = j /dj , dj +1 = j +1 j +1 j , j +1 = j +1 dj j /dj +1 , cj +1 = qj +1 j +1 cj , xj +1 = xj + j +1 cj +1 , x = xn .

This algorithm requires one matrix-vector multiplication and 5n ops per iteration. Symmetric Indenite Systems A key feature in the above development is the idea of computing LDLT Cholesky decomposition of tridiagonal Tj . Unfortunately, this is potentially unstable if A, and consequently Tj , is not positive denite. Paige and Saunders (1975) had developed the recursion for xj by an LQ decomposition of Tj . At the j -th step of the iteration we will Given rotations J1 , , Jj 1 such that d1 0 e2 d2 Tj J1 Jj 1 = Lj = f3 e3 d3 . . . . .. .. .. 0 fj ej dj Note that with this factorization, xj is given by xj = Qj yj = Qj Tj1 QT j b = Wj s j ,

7.2 Applications to linear Systems and Least Squares where Wj Rnj and sj Rj are dened by Wj = Qj J1 Jj 1 and Lj sj = QT j b.

283

Scrutiny of these equations enables one to develop a formula for computing xj from xj 1 and an easily computed multiple of wj , the last column of Wj . Connection of Algorithm 14.1 and CG method: Let xL j : Iterative vector generated by Algorithm 7.2.1

xCG : Iterative vector generated by CG method with , xCG = 0. i 0


CG Since r0 = b Ax0 = b = pCG 0 , then CG xCG = 0 p0 = 1

bT b b = xL 1. bT Ab

Claim: xCG = xL i i for i = 1, 2, , (1) CG method (A variant version): x0 = 0, r0 = b, For k = 1, , n, if rk1 = 0 then set x = xk1 and quit. T T else k = rk (1 0), 1 rk1 /rk2 rk2 pk = rk1 + k pk1 (p1 r0 ), T T k = rk 1 rk1 /pk Apk , xk = xk 1 + k p k , rk = rk1 k Apk , x = xn . Dene Rk = [r0 , , rk1 ] Rnk and Bk = 1 2 1 0 .. 0 . . .. . k 1

(2.10)

From pj = rj 1 + j pj 1 (j = 2, , k ) and p1 = r0 it follows Rk = Pk Bk . Since the columns of Pk = [p1 , , pk ] are A-conjugate, we see that
T T T diag (pT ARk = Bk Rk 1 Ap1 , , pk Apk )Bk

is tridiagonal. Since span{p1 , , pk }=span{r0 , , rk1 }=span{b, Ab, , Ak1 b} and r0 , , rk1 are mutually orthogonal, it follows that if
k

= diag (0 , , k1 ),

i = ri 2 ,

284 Chapter 7. Lanczos Methods 1 then the columns of Rk k form an orthonormal basis for span{b, Ab, , Ak1 b}. Consequently the columns of this matrix are essentially the Lanczos vectors of algorithm L CG 7.2.1, i.e., qi = ri 1 /i1 (i = 1, , k ). Moreover, Tk =
1 T T k Bk diag (pi Api )Bk 1 k

The diagonal and subdiagonal of this matrix involve quantities that are readily available during the conjugate gradient iteration. Thus, we can obtain good estimate of A s extremal eigenvalues (and condition number) as we generate the xk in (2.11). pCG = cL i i constant. Show that: cL i are A-orthogonal.
T Cj LT j = Qj Cj = Qj Lj

1 T T 1 T T Cj ACj = L = L j Qj AQj Lj j Tj Lj 1 T T = L = Dj . j Lj Dj Lj Lj

So {ci }j i=1 are A-orthogonal.


1 T (2) It is well known that xCG minimizes the functional (x) = 2 x Ax bT x in the j 1 T j 1 L subspace span{r0 , Ar0 , , A r0 } and xj minimize (x) = 2 x Ax bT x in the subspace span{q1 , , qj }. We also know that K [q1 , A, j ] = Qj Rj which implies K(q1 , A, j ) =span {q1 , , qj }. But q1 = b/ b 2 , r0 = b, so span {r0 , Ar0 , , Aj 1 r0 } = K(q1 , A, j ) =span {q1 , , qj } therefore we have xCG = xL j j.

7.2.2

Bidiagonalization and the SVD


U = [u1 , , um ], V = [v1 , , vn ], U T U = Im , V T V = In , 0 . . (2.12)

Suppose U T AV = B the bidiagonalization of A Rmn and that (2.11)

and

B= 0 0

1 .. .

.. ..

. n1 n 0

Recall that this decomposition serves as a front end for the SV D algorithm. Unfortunately, if A is large and sparse, then we can expect large, dense submatrices to arise during the Householder transformation for the bidiagonalization. It would be nice to develop a method for computing B directly without any orthogonal update of the matrix A. We compare columns in the equations AV = U B and AT U = V B T : Avj = j uj + j 1 uj 1 , 0 u0 0, AT uj = j vj + j vj +1 , n vn+1 0,

7.2 Applications to linear Systems and Least Squares for j = 1, , n. Dene rj = Avj j 1 uj 1 and pj = AT uj j vj . We may conclude that j = rj 2 , vj +1 = pj /j , uj = rj /j , j = pj 2 .

285

These equations dene the Lanczos method for bidiagonaling a rectangular matrix (by Paige (1974)): Given v1 Rn , with unit 2-norm. r1 = Av1 , 1 = r1 2 . For j = 1, , n, If j = 0 then stop; else uj = rj /j , pj = AT uj j vj , j = pj 2 , If j = 0 then stop; else vj +1 = pj /j , rj +1 = Avj +1 j uj , j +1 = rj +1 2 .

(2.13)

It is essentially equivalent to applying the Lanczos tridiagonalization scheme to the symO A . We know that metric matrix C = AT 0 i (C ) = i (A) = n+mi+1 (C ) for i = 1, , n. Becauseof this, the large singular values of the bidiagonal matrix 1 1 0 ... ... Bj = tend to be very good approximations to the large singular ... j 1 0 j values of A. Least Squares: As detailed in chapter III the full-rank LS problem min Ax b solved by the bidiagonalization (2.11)-(2.12). In particular,
n 2

can be

xLS = V yLS =
i=1

ai vi ,

T T where y = (a1 , , an )T solves the bidiagonal system By = (uT 1 b, , un b) .

Disadvantage: Note that because B is upper bidiagonal, we cannot solve for y until the bidiagonalization is complete, and we are required to save the vectors v_1, ..., v_n — an unhappy circumstance if n is very large.
Modification: This can be accomplished more favorably if A is reduced to lower bidiagonal form:
U^T A V = B = [ α_1; β_1 α_2; β_2 ⋱; ⋱ α_n; β_n; 0 ],   m ≥ n + 1,
where V = [v_1, ..., v_n] and U = [u_1, ..., u_m]. It is straightforward to develop a Lanczos procedure very similar to (2.13). Let V_j = [v_1, ..., v_j], U_j = [u_1, ..., u_j] and
B_j = [ α_1; β_1 α_2; ⋱ ⋱; α_j; β_j ] ∈ R^{(j+1)×j},
and consider minimizing ||Ax − b||_2 over all vectors of the form x = V_j y, y ∈ R^j. Since
||A V_j y − b||_2^2 = ||U^T A V_j y − U^T b||_2^2 = ||B_j y − U_{j+1}^T b||_2^2 + Σ_{i=j+2}^{m} (u_i^T b)^2,

it follows that x_j = V_j y_j is the minimizer of the LS problem over span{V_j}, where y_j minimizes the (j+1) × j LS problem
min_y ||B_j y − U_{j+1}^T b||_2.
Since B_j is lower bidiagonal, it is easy to compute Jacobi rotations J_1, ..., J_j such that
J_j ⋯ J_1 B_j = [ R_j; 0 ]
is upper bidiagonal. Let
J_j ⋯ J_1 U_{j+1}^T b = [ d_j; ũ ].
Then
||B_j y − U_{j+1}^T b||_2 = ||J_j ⋯ J_1 B_j y − J_j ⋯ J_1 U_{j+1}^T b||_2 = || [R_j; 0] y − [d_j; ũ] ||_2.
So y_j = R_j^{-1} d_j and x_j = V_j y_j = V_j R_j^{-1} d_j ≡ W_j d_j. Let
W_j = (W_{j-1}, w_j),  w_j = (v_j − w_{j-1} r_{j-1,j}) / r_{jj}
(r_{j-1,j} and r_{jj} are elements of R_j), so R_j can be computed from R_{j-1}. Similarly, d_j = [d_{j-1}; ζ_j], and x_j can be obtained from x_{j-1}:
x_j = W_j d_j = (W_{j-1}, w_j) [d_{j-1}; ζ_j] = W_{j-1} d_{j-1} + ζ_j w_j.
Thus x_j = x_{j-1} + ζ_j w_j.
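The elimination step above — rotating the lower bidiagonal B_j to an upper bidiagonal R_j and carrying the rotations through the right-hand side — is the core of the LSQR method of Paige and Saunders. The sketch below (Python/NumPy; helper names and the small test data are ours) performs it with explicitly formed matrices rather than the memory-lean column-by-column recurrences, just to make the algebra visible.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def solve_lower_bidiag_ls(alphas, betas, g):
    """Solve min ||B_j y - g||_2, where B_j is (j+1) x j lower bidiagonal with
    alphas on the diagonal and betas on the subdiagonal, by rotating B_j to
    upper bidiagonal R_j.  A dense sketch of the elimination used in LSQR."""
    j = len(alphas)
    B = np.zeros((j + 1, j))
    for i in range(j):
        B[i, i] = alphas[i]
        B[i + 1, i] = betas[i]
    d = g.astype(float).copy()
    for i in range(j):                       # zero out B[i+1, i]
        c, s = givens(B[i, i], B[i + 1, i])
        G = np.array([[c, s], [-s, c]])
        B[i:i + 2, :] = G @ B[i:i + 2, :]
        d[i:i + 2] = G @ d[i:i + 2]
    R, dj = B[:j, :], d[:j]                  # upper bidiagonal part, rotated rhs
    y = np.linalg.solve(R, dj)               # back substitution
    return y

alphas = np.array([3.0, 2.5, 4.0])
betas = np.array([1.0, 0.7, 0.3])
g = np.array([1.0, 2.0, 0.5, -1.0])
y = solve_lower_bidiag_ls(alphas, betas, g)
Bfull = np.vstack([np.diag(alphas), np.zeros(3)]) + np.diag(betas, -1)[:, :3]
print(np.allclose(y, np.linalg.lstsq(Bfull, g, rcond=None)[0]))
```

In LSQR proper, R_j, d_j and W_j are updated one column at a time, so only a handful of length-n vectors ever need to be stored.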

For details see Paige–Saunders (1978).

Error estimation for the LS-problem
Continuity of A^+: consider the map from R^{m×n} to R^{n×m} defined by A ↦ A^+.
Lemma 7.2.2 If {A_i} converges to A and rank(A_i) = rank(A) = n, then {A_i^+} also converges to A^+.
Proof: Since lim_{i→∞} A_i^T A_i = A^T A is nonsingular,
A_i^+ = (A_i^T A_i)^{-1} A_i^T → (A^T A)^{-1} A^T = A^+.

Example: Let
A_δ = [ 1 0; 0 δ ],  δ > 0,   and   A_0 = [ 1 0; 0 0 ].
Then A_δ → A_0 as δ → 0 and rank(A_0) < 2 = rank(A_δ). But
A_δ^+ = [ 1 0; 0 1/δ ]  does not converge to  A_0^+ = [ 1 0; 0 0 ]  as δ → 0.
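Numerically the same discontinuity is easy to observe; the snippet below (Python/NumPy) is only an illustration of the example above.

```python
import numpy as np

# pinv is not continuous when the rank drops in the limit
A0 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
for delta in [1e-2, 1e-6, 1e-10]:
    Ad = np.array([[1.0, 0.0],
                   [0.0, delta]])
    print(delta,
          np.linalg.norm(Ad - A0),                                   # -> 0
          np.linalg.norm(np.linalg.pinv(Ad) - np.linalg.pinv(A0)))   # = 1/delta -> infinity
```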

Theorem 7.2.3 Let A, B ∈ R^{m×n}. Then
||A^+ − B^+||_F ≤ √2 ||A − B||_F max{ ||A^+||_2^2, ||B^+||_2^2 }.
(Without proof.)
Remark: It does not follow that A → B implies A^+ → B^+, because A^+ can diverge to ∞; see the example above.
Theorem 7.2.4 If rank(A) = rank(B), then
||A^+ − B^+||_F ≤ μ ||A^+||_2 ||B^+||_2 ||A − B||_F,
where
μ = √2  if rank(A) < min(m, n),   μ = 1  if rank(A) = min(m, n).
Pseudo-inverse of A: A^+ is the unique solution of the equations
A A^+ A = A,  A^+ A A^+ = A^+,  (A A^+)^* = A A^+,  (A^+ A)^* = A^+ A.
P_A = A A^+ is Hermitian, P_A is idempotent, and R(P_A) = R(A); thus P_A is the orthogonal projection onto R(A). Similarly, R_A = A^+ A is the orthogonal projection onto R(A^*). Furthermore,
ρ_LS^2 = ||b − A A^+ b||_2^2 = ||(I − A A^+) b||_2^2.
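All of these properties can be checked in a few lines; the sketch below (Python/NumPy, using numpy.linalg.pinv on a random matrix) verifies the four Penrose conditions, the projection property of P_A = A A^+, and the expression for the LS residual ρ_LS.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))
Ap = np.linalg.pinv(A)
PA = A @ Ap                                   # orthogonal projection onto R(A)
RA = Ap @ A                                   # orthogonal projection onto R(A^T)
checks = [
    np.allclose(A @ Ap @ A, A),               # A A+ A = A
    np.allclose(Ap @ A @ Ap, Ap),             # A+ A A+ = A+
    np.allclose(PA, PA.T),                    # (A A+)^* = A A+
    np.allclose(RA, RA.T),                    # (A+ A)^* = A+ A
    np.allclose(PA @ PA, PA),                 # P_A idempotent
]
b = rng.standard_normal(7)
rho_ls = np.linalg.norm((np.eye(7) - PA) @ b)  # LS residual norm
x_ls = Ap @ b
print(checks, np.isclose(rho_ls, np.linalg.norm(b - A @ x_ls)))
```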

Banach Lemma: ||B^{-1} − A^{-1}|| ≤ ||A − B|| ||A^{-1}|| ||B^{-1}||.
Proof: With B = A + ΔA, from ((A + ΔA)^{-1} − A^{-1})(A + ΔA) = I − I − A^{-1} ΔA = −A^{-1} ΔA the lemma follows immediately.

Theorem 7.2.5 (1) The product P_B P_A^⊥ can be written in the form
P_B P_A^⊥ = (B^+)^* R_B E^* P_A^⊥,   where P_A^⊥ = I − P_A, B = A + E.
Thus ||P_B P_A^⊥|| ≤ ||B^+||_2 ||E||.
(2) If rank(A) = rank(B), then ||P_B P_A^⊥|| = ||P_A P_B^⊥|| ≤ min{ ||B^+||_2, ||A^+||_2 } ||E||.
Proof:
P_B P_A^⊥ = (B B^+)^* P_A^⊥ = (B^+)^* B^* P_A^⊥ = (B^+)^* (A + E)^* P_A^⊥ = (B^+)^* E^* P_A^⊥
         = (B^+ B B^+)^* E^* P_A^⊥ = (B^+)^* R_B E^* P_A^⊥    (||R_B|| ≤ 1, ||P_A^⊥|| ≤ 1),
where we used A^* P_A^⊥ = A^*(I − A A^+) = 0. Part (2) follows from the fact that rank(A) = rank(B) implies ||P_B P_A^⊥|| = ||P_A P_B^⊥|| (using the C-S decomposition). Exercise!

Theorem 7.2.6 It holds


F1 F2 F3 B + A+ = B + PB ERA A+ + B + PB PA RB RA A+ . B + A+ = B + PB ERA A+ + (B B )+ RB E PA RB E PA (AA )+ .

Proof: B + BB + (B A)A+ AA+ + B + BB + (I AA+ ) (I B + B )(A+ A)A+ = B + (B A)A+ + B + (I AA+ ) (I B + B )A+ = B + A+ (Substitute PB = BB + , E = B A, RA = AA+ , .). Theorem 7.2.7 If B = A + E , then B + A+ F 2 E

+ 2 max{ A+ 2 2, B 2 }.

Proof: Suppose rank (B ) rank (A). Then the column spaces of F1 and F2 are orthogonal to the column space of F3 . Hence B + A+
2 F

= F1 + F2
2 F

2 F

+ F3

2 F

((I B + B )B + = 0).
2 F + PB PA 2 F ).

Since F1 + F2 = B + (PB EA+ PA + PB PA ), we have

F1 + F2

+ B+ 2 2 ( PB EA PA

By theorem 7.2.5 and 7.2.6 follows that = Thus F1 + F2


F 2 + 2 2 PB EA+ PA 2 F + PB PA F PB EA F + PB PA F + 2 + 2 2 + 2 PB EA+ 2 F + PB EA F = EA F E F A 2.

A+ F3

B+

(PB PA = PB ERA A+ = PB EA+ ).

By theorem 7.2.6 we have


F

A+ A+

2 2

RA F = A+ RB A+ A+ ERB

2 2 2

RA RB E F.

The nal bound is symmetric in A and B , it also holds when rank (B ) rank (A).

Theorem 7.2.8 If rank(A) = rank(B), then
||B^+ − A^+||_F ≤ √2 ||A^+||_2 ||B^+||_2 ||E||_F
(see Wedin (1973)). From the above we have
||B^+ − A^+||_F / ||B^+||_2 ≤ √2 κ_2(A) ||E||_F / ||A||_2.
This bound implies that as E approaches zero, the relative error in B^+ approaches zero, which further implies that B^+ approaches A^+.
Corollary 7.2.9 lim_{B→A} B^+ = A^+  ⟺  rank(A) = rank(B) as B approaches A.

(See Stewart 1977.)

Perturbation of solutions of the LS-problem
We first state two corollaries of the SVD theorem.
Theorem 7.2.10 (SVD) If A ∈ R^{m×n}, then there exist orthogonal matrices U = [u_1, ..., u_m] ∈ R^{m×m} and V = [v_1, ..., v_n] ∈ R^{n×n} such that
U^T A V = diag(σ_1, ..., σ_p),  p = min(m, n),  where σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_p ≥ 0.
Corollary 7.2.11 If the SVD is given by Theorem 7.2.10 and σ_1 ≥ ⋯ ≥ σ_r > σ_{r+1} = ⋯ = σ_p = 0, then
(1) rank(A) = r.
(2) N(A) = span{v_{r+1}, ..., v_n}.
(3) Range(A) = span{u_1, ..., u_r}.
(4) A = Σ_{i=1}^{r} σ_i u_i v_i^T = U_r Σ_r V_r^T, where U_r = [u_1, ..., u_r], V_r = [v_1, ..., v_r] and Σ_r = diag(σ_1, ..., σ_r).
(5) ||A||_F^2 = σ_1^2 + ⋯ + σ_r^2.
(6) ||A||_2 = σ_1.
Proof: exercise!
Corollary 7.2.12 Let the SVD of A ∈ R^{m×n} be given by Theorem 7.2.10. If k < r = rank(A) and A_k = Σ_{i=1}^{k} σ_i u_i v_i^T, then
min_{rank(X)=k, X ∈ R^{m×n}} ||A − X||_2 = ||A − A_k||_2 = σ_{k+1}.     (2.14)

Proof: Let X ∈ R^{m×n} with rank(X) = k, and let τ_1 ≥ ⋯ ≥ τ_n ≥ 0 be the singular values of X. Since A = X + (A − X) and τ_{k+1} = 0, we have
σ_{k+1} = |σ_{k+1} − τ_{k+1}| ≤ ||A − X||_2.
For the matrix A_k = U Σ̃ V^T with Σ̃ = diag(σ_1, ..., σ_k, 0, ..., 0) we have
||A − A_k||_2 = ||U(Σ − Σ̃)V^T||_2 = σ_{k+1}.
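Equation (2.14) is the Eckart–Young property of the truncated SVD; the snippet below (Python/NumPy) simply confirms it on a random matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # ||A - A_k||_2 = sigma_{k+1}
```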

LS-problem: ||Ax − b||_2 = min!  ⟹  x_LS = A^+ b.
Perturbed LS-problem: ||(A + E)y − (b + f)||_2 = min!  ⟹  y = (A + E)^+ (b + f).

Lemma 7.2.13 Let A, E ∈ R^{m×n} and rank(A) = r.
(1) If rank(A + E) > r, then ||(A + E)^+||_2 ≥ 1/||E||_2.
(2) If rank(A + E) ≤ r and ||A^+||_2 ||E||_2 < 1, then rank(A + E) = r and
||(A + E)^+||_2 ≤ ||A^+||_2 / (1 − ||A^+||_2 ||E||_2).
Proof: Let τ_1 ≥ ⋯ ≥ τ_n be the singular values of A + E.
(1): If τ_k is the smallest nonzero singular value, then k ≥ r + 1 because rank(A + E) > r. By Corollary 7.2.12 we have ||E||_2 = ||(A + E) − A||_2 ≥ σ_{r+1}(A + E) ≥ τ_k, and therefore ||(A + E)^+||_2 = 1/τ_k ≥ 1/||E||_2.
(2): Let σ_1 ≥ ⋯ ≥ σ_n be the singular values of A; then σ_r ≠ 0 because rank(A) = r, and ||A^+||_2 = 1/σ_r. Since ||A^+||_2 ||E||_2 < 1 we have ||E||_2 < σ_r, and then by Corollary 7.2.12 it must be that rank(A + E) ≥ r, so rank(A + E) = r. By Weyl's theorem (Theorem 6.1.5) we have τ_r ≥ σ_r − ||E||_2, and furthermore σ_r − ||E||_2 > 0, so one obtains
||(A + E)^+||_2 = 1/τ_r ≤ 1/(σ_r − ||E||_2) = ||A^+||_2 / (1 − ||A^+||_2 ||E||_2).

Lemma 7.2.14 Let A, E ∈ R^{m×n}, b, f ∈ R^m, x = A^+ b, y = (A + E)^+ (b + f) and r = b − Ax. Then
y − x = [ −(A+E)^+ E A^+ + (A+E)^+ (I − A A^+) − (I − (A+E)^+(A+E)) A^+ ] b + (A+E)^+ f
     = −(A+E)^+ E x + (A+E)^+ (A+E)^{+T} E^T r + (I − (A+E)^+(A+E)) E^T A^{+T} x + (A+E)^+ f.
Proof: y − x = [(A+E)^+ − A^+] b + (A+E)^+ f, and for (A+E)^+ − A^+ one has the decomposition
(A+E)^+ − A^+ = −(A+E)^+ E A^+ + (A+E)^+ − A^+ + (A+E)^+ (A + E − A) A^+
            = −(A+E)^+ E A^+ + (A+E)^+ (I − A A^+) − (I − (A+E)^+(A+E)) A^+.
Let C := A + E. Applying the properties of the generalized inverse to C we obtain C^+ = C^+ C C^+ = C^+ C^{+T} C^T, and A^T (I − A A^+) = A^T − A^T A A^+ = A^T − A^T A^{+T} A^T = A^T − A^T = 0; also A^+ = A^T A^{+T} A^+ and (I − C^+ C) C^T = 0. Hence it holds that
C^+ (I − A A^+) = C^+ C^{+T} E^T (I − A A^+)   and   (I − C^+ C) A^+ = −(I − C^+ C) E^T A^{+T} A^+.

If we substitute this into the second and third terms of the decomposition of (A+E)^+ − A^+, then we obtain the result (using r = (I − A A^+) b and x = A^+ b):
y − x = [ −(A+E)^+ E A^+ + (A+E)^+ (A+E)^{+T} E^T (I − A A^+) + (I − (A+E)^+(A+E)) E^T A^{+T} A^+ ] b + (A+E)^+ f
     = −(A+E)^+ E x + (A+E)^+ (A+E)^{+T} E^T r + (I − (A+E)^+(A+E)) E^T A^{+T} x + (A+E)^+ f.

Theorem 7.2.15 Let A, E ∈ R^{m×n}, b, f ∈ R^m, x = A^+ b ≠ 0, y = (A + E)^+ (b + f) and r = b − Ax. If rank(A) = r, rank(A + E) ≤ r and ||A^+||_2 ||E||_2 < 1, then
||y − x||_2 / ||x||_2 ≤ (||A||_2 ||A^+||_2 / (1 − ||A^+||_2 ||E||_2)) [ 2 ||E||_2/||A||_2 + (||A^+||_2/(1 − ||A^+||_2 ||E||_2)) (||E||_2/||A||_2)(||r||_2/||x||_2) + ||f||_2/(||A||_2 ||x||_2) ].

Proof: From Lemma 7.2.14 it follows that
||y − x||_2 ≤ ||(A+E)^+||_2 [ ||E||_2 ||x||_2 + ||(A+E)^+||_2 ||E||_2 ||r||_2 + ||f||_2 ] + ||I − (A+E)^+(A+E)||_2 ||E||_2 ||A^+||_2 ||x||_2.
Since I − (A+E)^+(A+E) is symmetric and (I − (A+E)^+(A+E))^2 = I − (A+E)^+(A+E), we have ||I − (A+E)^+(A+E)||_2 = 1 if (A+E)^+(A+E) ≠ I. Together with the estimate of Lemma 7.2.13(2) we obtain
||y − x||_2 ≤ (||A^+||_2 / (1 − ||A^+||_2 ||E||_2)) [ 2 ||E||_2 ||x||_2 + ||f||_2 + (||A^+||_2/(1 − ||A^+||_2 ||E||_2)) ||E||_2 ||r||_2 ]
and hence
||y − x||_2 / ||x||_2 ≤ (||A||_2 ||A^+||_2 / (1 − ||A^+||_2 ||E||_2)) [ 2 ||E||_2/||A||_2 + ||f||_2/(||A||_2 ||x||_2) + (||A^+||_2/(1 − ||A^+||_2 ||E||_2)) (||E||_2/||A||_2)(||r||_2/||x||_2) ].

7.3 Unsymmetric Lanczos Method

Suppose A ∈ R^{n×n} and that a nonsingular matrix X exists such that
X^{-1} A X = T = [ α_1 β_1              ]
                 [ γ_1 α_2  ⋱           ]
                 [      ⋱   ⋱   β_{n-1} ]
                 [          γ_{n-1} α_n ].
Let X = [x_1, ..., x_n] and X^{-T} = Y = [y_1, ..., y_n]. Comparing columns in AX = XT and A^T Y = Y T^T we find that
A x_j = β_{j-1} x_{j-1} + α_j x_j + γ_j x_{j+1},   β_0 x_0 ≡ 0,
A^T y_j = γ_{j-1} y_{j-1} + α_j y_j + β_j y_{j+1},   γ_0 y_0 ≡ 0,
for j = 1, ..., n−1. These equations together with Y^T X = I_n imply α_j = y_j^T A x_j and
γ_j x_{j+1} = r_j ≡ (A − α_j I) x_j − β_{j-1} x_{j-1},
β_j y_{j+1} = p_j ≡ (A − α_j I)^T y_j − γ_{j-1} y_{j-1}.     (3.1)

There is some flexibility in choosing the scale factors γ_j and β_j. A canonical choice is to set γ_j = ||r_j||_2 and β_j = x_{j+1}^T p_j, giving the biorthogonalization method of Lanczos:

Given x_1, y_1 ∈ R^n with x_1^T x_1 = y_1^T y_1 = 1.
For j = 1, ..., n−1:
  α_j = y_j^T A x_j,
  r_j = (A − α_j I) x_j − β_{j-1} x_{j-1}   (β_0 x_0 ≡ 0),   γ_j = ||r_j||_2,
  If γ_j > 0 then x_{j+1} = r_j/γ_j else stop;
  p_j = (A − α_j I)^T y_j − γ_{j-1} y_{j-1}   (γ_0 y_0 ≡ 0),   β_j = x_{j+1}^T p_j,
  If β_j ≠ 0 then y_{j+1} = p_j/β_j else stop;
α_n = x_n^T A y_n.

(3.2)
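A direct transcription of (3.2) is short; the sketch below (Python/NumPy; function name and random test matrix are ours) uses the normalization y_1^T x_1 = 1 — a common convention, slightly different from the x_1^T x_1 = y_1^T y_1 = 1 normalization stated above — and checks the biorthogonality Y_j^T X_j ≈ I. No look-ahead is attempted, so a serious breakdown simply stops the iteration.

```python
import numpy as np

def unsym_lanczos(A, x1, y1, steps):
    """Two-sided (biorthogonal) Lanczos as in (3.2): builds X_j, Y_j with
    Y_j^T X_j = I and tridiagonal T_j so that A X_j = X_j T_j + r_j e_j^T
    in exact arithmetic.  Minimal sketch, no look-ahead for breakdowns."""
    x = x1 / np.linalg.norm(x1)
    y = y1 / (y1 @ x)                         # enforce y_1^T x_1 = 1
    X, Y = [x], [y]
    alpha = np.zeros(steps); beta = np.zeros(steps); gamma = np.zeros(steps)
    x_old = np.zeros_like(x); y_old = np.zeros_like(y)
    beta_old = gamma_old = 0.0
    for j in range(steps):
        alpha[j] = Y[j] @ (A @ X[j])
        r = A @ X[j] - alpha[j] * X[j] - beta_old * x_old
        p = A.T @ Y[j] - alpha[j] * Y[j] - gamma_old * y_old
        gamma[j] = np.linalg.norm(r)
        if gamma[j] == 0:
            break
        x_new = r / gamma[j]
        beta[j] = x_new @ p
        if beta[j] == 0:
            break                             # (serious) breakdown
        y_new = p / beta[j]
        x_old, y_old = X[j], Y[j]
        beta_old, gamma_old = beta[j], gamma[j]
        X.append(x_new); Y.append(y_new)
    j = len(X) - 1
    T = np.diag(alpha[:j]) + np.diag(beta[:j-1], 1) + np.diag(gamma[:j-1], -1)
    return np.column_stack(X[:j]), np.column_stack(Y[:j]), T

rng = np.random.default_rng(4)
A = rng.standard_normal((40, 40))
X, Y, T = unsym_lanczos(A, rng.standard_normal(40), rng.standard_normal(40), 15)
print(np.max(np.abs(Y.T @ X - np.eye(X.shape[1]))))   # biorthogonality (may degrade)
```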

Define X_j = [x_1, ..., x_j], Y_j = [y_1, ..., y_j] and T_j to be the leading j × j principal submatrix of T. It is easy to verify that
A X_j = X_j T_j + r_j e_j^T,
A^T Y_j = Y_j T_j^T + p_j e_j^T.     (3.3)
Remark: (1) p_j^T r_j = γ_j β_j x_{j+1}^T y_{j+1} = γ_j β_j from (3.1).
(2) Breakdown of the algorithm (3.2) occurs if p_j^T r_j = 0:
(a) r_j = 0 (γ_j = 0): then X_j spans an invariant subspace of A (by (3.3)).
(b) p_j = 0 (β_j = 0): then Y_j spans an invariant subspace of A^T (by (3.3)).

(c) p_j^T r_j = 0 but p_j ≠ 0 and r_j ≠ 0: then (3.2) breaks down (serious breakdown), and we restart the algorithm (3.2) with a new starting vector.
(3) If p_j^T r_j is very small, then γ_j or β_j is small. Hence y_{j+1} or x_{j+1} become large, so the algorithm (3.2) is unstable.
Definition 7.3.1 An upper Hessenberg matrix H = (h_{ij}) is called unreduced if h_{i+1,i} ≠ 0 for i = 1, ..., n−1 (that is, the subdiagonal entries are nonzero). A tridiagonal matrix T = (t_{ij}) is called unreduced if t_{i,i-1} ≠ 0 for i = 2, ..., n and t_{i,i+1} ≠ 0 for i = 1, ..., n−1.
Theorem 7.3.2 Let A ∈ R^{n×n}. Then
(1) If x_1 ≠ 0 is such that K[x_1, A, n] = [x_1, Ax_1, ..., A^{n-1}x_1] is nonsingular, and if X is a nonsingular matrix such that K[x_1, A, n] = XR, where R is an upper triangular matrix, then H = X^{-1}AX is an unreduced upper Hessenberg matrix.
(2) Let X be a nonsingular matrix with first column x_1. If H = X^{-1}AX is an upper Hessenberg matrix, then K[x_1, A, n] = X K[e_1, H, n] ≡ XR, where R is an upper triangular matrix. Furthermore, if H is unreduced, then R is nonsingular.
(3) If H = X^{-1}AX and H̃ = Y^{-1}AY, where H and H̃ are both upper Hessenberg matrices, H is unreduced, and the first columns x_1 and y_1 of X and Y, respectively, are linearly dependent, then J = X^{-1}Y is an upper triangular matrix and H̃ = J^{-1}HJ.
Proof: ad(1): Since x_1, Ax_1, ..., A^{n-1}x_1 are linearly independent, A^n x_1 is a linear combination of {x_1, Ax_1, ..., A^{n-1}x_1}, i.e., there exist c_0, ..., c_{n-1} such that
A^n x_1 = Σ_{i=0}^{n-1} c_i A^i x_1.
Let
C = [ 0 ⋯ 0 c_0 ]
    [ 1 ⋱     c_1 ]
    [   ⋱  0   ⋮  ]
    [ 0   1 c_{n-1} ]
(the companion matrix of the c_i). Then we have
K[x_1, A, n] C = [Ax_1, A^2 x_1, ..., A^{n-1}x_1, A^n x_1] = A K[x_1, A, n].
Thus XRC = AXR. We then have X^{-1}AX = RCR^{-1} ≡ H, an unreduced upper Hessenberg matrix.
ad(2): From A = XHX^{-1} it follows that A^i x_1 = XH^i X^{-1} x_1 = XH^i e_1. Then
K[x_1, A, n] = [x_1, Ax_1, ..., A^{n-1}x_1] = [Xe_1, XHe_1, ..., XH^{n-1}e_1] = X[e_1, He_1, ..., H^{n-1}e_1].

If H is upper Hessenberg, then R = [e_1, He_1, ..., H^{n-1}e_1] is upper triangular. If H is an unreduced upper Hessenberg matrix, then R is nonsingular, since r_11 = 1, r_22 = h_21, r_33 = h_21 h_32, and so on.
ad(3): Let y_1 = αx_1 with α ≠ 0 (without loss of generality take α = 1). Applying (2) to H, it follows that K[x_1, A, n] = XR_1; applying (2) to H̃ we also have K[y_1, A, n] = YR_2. Here R_1 and R_2 are upper triangular. Since y_1 = x_1, we get K[x_1, A, n] = XR_1 = YR_2.
Since R_1 is nonsingular, by (2) R_2 is also nonsingular and X^{-1}Y = R_1 R_2^{-1} ≡ J (upper triangular). So
H̃ = Y^{-1}AY = (Y^{-1}X) X^{-1}AX (X^{-1}Y) = J^{-1}HJ.

Theorem 7.3.3 Let A ∈ R^{n×n} and x, y ∈ R^n with K[x, A, n] and K[y, A^T, n] nonsingular. Then
(1) If B = K[y, A^T, n]^T K[x, A, n] = (y^T A^{i+j-2} x)_{i,j=1,...,n} has a decomposition B = LDL^T, where L is lower triangular with l_{ii} = 1 and D is diagonal (that is, all leading principal minors of B are nonzero), and if X = K[x, A, n] L^{-T}, then T = X^{-1}AX is an unreduced tridiagonal matrix.
(2) Let X, Y be nonsingular with
(a) T = X^{-1}AX and T̃ = Y^{-1}AY unreduced tridiagonal,
(b) the first columns of X and Y linearly dependent,
(c) the first rows of X^{-1} and Y^{-1} linearly dependent.
Then X^{-1}Y = D is diagonal and T̃ = D^{-1}TD.
(3) If T = X^{-1}AX is unreduced tridiagonal, x is the first column of X and y^T is the first row of X^{-1}, then B = K[y, A^T, n]^T K[x, A, n] has an LDL^T decomposition.
Proof: ad(1): X = K[x, A, n] L^{-T} ⟺ X L^T = K[x, A, n], so the first column of X is x. From B = LDL^T it follows that K[y, A^T, n]^T = LDL^T K[x, A, n]^{-1}, and then
K[x, A, n] = X L^T,     (3.4)
K[y, A^T, n] = K[x, A, n]^{-T} LDL^T = X^{-T} D L^T.     (3.5)
Apply Theorem 7.3.2(1) to (3.4): X^{-1}AX is unreduced upper Hessenberg. Apply Theorem 7.3.2(1) to (3.5): X^T A^T X^{-T} = (X^{-1}AX)^T is unreduced upper Hessenberg.
So X^{-1}AX is an unreduced tridiagonal matrix.
ad(2): T and T̃ are unreduced upper Hessenberg, so by Theorem 7.3.2(3) X^{-1}Y is upper triangular. On the other hand, since
T^T = X^T A^T X^{-T}  and  T̃^T = Y^T A^T Y^{-T}
are unreduced upper Hessenberg, by Theorem 7.3.2(3) we also have Y^T X^{-T} = (X^{-1}Y)^T upper triangular. Thus X^{-1}Y is both upper and lower triangular, so X^{-1}Y is diagonal.
ad(3): exercise!


Chapter 8 Arnoldi Method


8.1 Arnoldi decompositions
Suppose that the columns of K_{k+1} are linearly independent and let K_{k+1} = U_{k+1} R_{k+1} be the QR factorization of K_{k+1}.
Theorem 8.1.1 Let ||u_1||_2 = 1 and the columns of K_{k+1}(A, u_1) be linearly independent. Let U_{k+1} = [u_1 ⋯ u_{k+1}] be the Q-factor of K_{k+1}. Then there is a (k+1) × k unreduced upper Hessenberg matrix
Ĥ_k = [ ĥ_11 ĥ_12 ⋯ ĥ_1k ]
      [ ĥ_21 ĥ_22 ⋯ ĥ_2k ]
      [      ⋱  ⋱   ⋮    ]
      [      ĥ_{k,k-1} ĥ_kk ]
      [              ĥ_{k+1,k} ],   with ĥ_{i+1,i} ≠ 0,
such that
A U_k = U_{k+1} Ĥ_k.     (8.1.1)
Conversely, if U_{k+1} is orthonormal and satisfies (8.1.1), where Ĥ_k is a (k+1) × k unreduced upper Hessenberg matrix, then U_{k+1} is the Q-factor of K_{k+1}(A, u_1).
Proof: (⇒) Let K_k = U_k R_k be the QR factorization and S_k = R_k^{-1}. Then
A U_k = A K_k S_k = K_{k+1} [ 0; S_k ] = U_{k+1} R_{k+1} [ 0; S_k ] = U_{k+1} Ĥ_k,
where
Ĥ_k = R_{k+1} [ 0; S_k ].
It implies that Ĥ_k is a (k+1) × k Hessenberg matrix and
ĥ_{i+1,i} = r_{i+1,i+1} s_{ii} = r_{i+1,i+1} / r_{ii}.
Thus, by the nonsingularity of R_k, Ĥ_k is unreduced.
(⇐) If k = 1, then A u_1 = ĥ_11 u_1 + ĥ_21 u_2. It follows that
K_2(A, u_1) = [u_1  A u_1] = [u_1  u_2] [ 1 ĥ_11; 0 ĥ_21 ].
Since [u_1 u_2] is orthonormal, [u_1 u_2] is the Q-factor of K_2. Assume U_k is the Q-factor of K_k(A, u_1), i.e., K_k(A, u_1) = U_k R_k with R_k upper triangular. If we partition
Ĥ_k = [ Ĥ_{k-1}  h_k ; 0  ĥ_{k+1,k} ],
then from (8.1.1)
K_{k+1}(A, u_1) = [ K_k(A, u_1)   A u_k ]
             = [ U_k R_k   U_k h_k + ĥ_{k+1,k} u_{k+1} ]
             = [ U_k  u_{k+1} ] [ R_k  h_k ; 0  ĥ_{k+1,k} ].
Hence U_{k+1} is the Q-factor of K_{k+1}.

Definition 8.1.1 Let U_{k+1} ∈ C^{n×(k+1)} be orthonormal. If there is a (k+1) × k unreduced upper Hessenberg matrix Ĥ_k such that
A U_k = U_{k+1} Ĥ_k,     (8.1.2)
then (8.1.2) is called an Arnoldi decomposition of order k. If Ĥ_k is reduced, we say the Arnoldi decomposition is reduced.
Partition
Ĥ_k = [ H_k ; h_{k+1,k} e_k^T ]
and set β_k = h_{k+1,k}. Then (8.1.2) is equivalent to
A U_k = U_k H_k + β_k u_{k+1} e_k^T.

8.1 Arnoldi decompositions 299 Theorem 8.1.2 Suppose the Krylov sequence Kk+1 (A, u1 ) does not terminate at k + 1. Then up to scaling of the columns of Uk+1 , the Arnoldi decomposition of Kk+1 is unique. Proof: Since the Krylov sequence Kk+1 (A, u1 ) does not terminate at k + 1, the columns of Kk+1 (A, u1 ) are linearly independent. By Theorem 8.1.1, there is an unreduced matrix Hk and k = 0 such that AUk = Uk Hk + k uk+1 eT k, (8.1.3)

where Uk+1 = [Uk uk+1 ] is an orthonormal basis for Kk+1 (A, u1 ). Suppose there is another k = 0 k+1 = [U k u k and orthonormal basis U k+1 ] for Kk+1 (A, u1 ), unreduced matrix H such that k u k = U k H k + AU k+1 eT k. Then we claim that H uk+1 = 0. U k k such that For otherwise there is a column u j of U u j = uk+1 + Uk a, Hence Au j = Auk+1 + AUk a which implies that Au j contains a component along Ak+1 u1 . Since the Krylov sequence Kk+1 (A, u1 ) does not terminate at k + 1, we have Kk+2 (A, u1 ) = Kk+1 (A, u1 ). Therefore, Au j lies in Kk+2 (A, u1 ) but not in Kk+1 (A, u1 ) which is a contradiction. k+1 are orthonormal bases for Kk+1 (A, u1 ) and U H uk+1 = 0, it follows Since Uk+1 and U k that k ) R(Uk ) = R(U that is k Q Uk = U for some unitary matrix Q. Hence k u k Q) + k Q) = ( U k Q)(QH H k+1 (eT A(U k Q), or k u k Q) + AUk = Uk (QH H k+1 eT k Q. (8.1.4) and
H u k+1 = 0, Uk

= 0.

300 Chapter 8. Arnoldi Method H On premultiplying (8.1.3) and (8.1.4) by Uk , we obtain


H k Q. H k = Uk AUk = QH H

Similarly, premultiplying by uH k+1 , we obtain


H H k+1 )eT Q. k eT k = uk+1 AUk = k (uk+1 u k

It follows that the last row of Q is k eT k , where |k | = 1. Since the norm of the last column of Q is one, the last column of Q is k ek . Since Hk is unreduced, it follows from the implicit Q theorem that Q = diag(1 , , k ), |j | = 1, j = 1, . . . , k.

k Q is the same as U k . Subtracting (8.1.4) from (8.1.3), Thus up to column scaling Uk = U we nd that k u k uk+1 = k k+1 so that up to scaling uk+1 and u k+1 are the same. Theorem 8.1.3 Let the orthonormal matrix Uk+1 satisfy k, AUk = Uk+1 H k is Hessenberg. Then H k is reduced if and only if R(Uk ) contains an eigenspace where H of A. k is reduced, say that hj +1,j = 0. Partition Proof: () Suppose that H k = H H11 H12 0 H22 and Uk = [ U11 U12 ],

where H11 is an j j matrix and U11 is consisted the rst j columns of Uk+1 . Then A[ U11 U12 ] = [ U11 U12 uk+1 ] It implies that AU11 = U11 H11 so that U11 is an eigenbasis of A. k is unre() Suppose that A has an eigenspace that is a subset of R(Uk ) and H duced. Let (, Uk w) for some w be an eigenpair of A. Then k Uk )w 0 = (A I )Uk w = (Uk+1 H k Uk+1 I w, = Uk+1 H w = Uk+1 H 0 H11 H12 0 H22 .

8.1 Arnoldi decompositions where = H Hk I hk+1,k eT k .

301

Since Ĥ is unreduced, the matrix U_{k+1} Ĥ is of full column rank. It follows that w = 0, which is a contradiction.
Write the k-th column of the Arnoldi decomposition A U_k = U_k H_k + β_k u_{k+1} e_k^T in the form
A u_k = U_k h_k + β_k u_{k+1}.
Then from the orthonormality of U_{k+1} we have h_k = U_k^H A u_k. Since β_k u_{k+1} = A u_k − U_k h_k and ||u_{k+1}||_2 = 1, we must have
β_k = ||A u_k − U_k h_k||_2   and   u_{k+1} = β_k^{-1} (A u_k − U_k h_k).

Algorithm 8.1.1 (Arnoldi process)
1. For k = 1, 2, ...
2.   h_k = U_k^H A u_k
3.   v = A u_k − U_k h_k
4.   β_k = ĥ_{k+1,k} = ||v||_2
5.   u_{k+1} = v / β_k
6.   Ĥ_k = [ Ĥ_{k-1}  h_k ; 0  ĥ_{k+1,k} ]
7. end for k

The computation of u_{k+1} is actually a form of the well-known Gram–Schmidt algorithm. In the presence of inexact arithmetic, cancellation in statement 3 can cause it to fail to produce orthogonal vectors. The cure is a process called reorthogonalization.

Algorithm 8.1.2 (Reorthogonalized Arnoldi process)
For k = 1, 2, ...
  h_k = U_k^H A u_k
  v = A u_k − U_k h_k
  w = U_k^H v
  h_k = h_k + w
  v = v − U_k w
  β_k = ĥ_{k+1,k} = ||v||_2
  u_{k+1} = v / β_k
  Ĥ_k = [ Ĥ_{k-1}  h_k ; 0  ĥ_{k+1,k} ]
end for k
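A compact implementation of Algorithms 8.1.1/8.1.2 is given below (Python/NumPy; the function name, the reorth flag and the random test matrix are ours). It returns U_{k+1} and the (k+1) × k Hessenberg matrix Ĥ and checks both the Arnoldi relation (8.1.1) and orthonormality.

```python
import numpy as np

def arnoldi(A, u1, k, reorth=True):
    """Arnoldi process: builds U_{k+1} with orthonormal columns and a
    (k+1) x k upper Hessenberg H with A U_k = U_{k+1} H.  One pass of
    Gram-Schmidt plus an optional second pass (reorthogonalization),
    as in Algorithm 8.1.2."""
    n = len(u1)
    U = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    U[:, 0] = u1 / np.linalg.norm(u1)
    for j in range(k):
        v = A @ U[:, j]
        h = U[:, :j + 1].T @ v
        v = v - U[:, :j + 1] @ h
        if reorth:                            # second Gram-Schmidt pass
            w = U[:, :j + 1].T @ v
            h = h + w
            v = v - U[:, :j + 1] @ w
        H[:j + 1, j] = h
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] == 0:                  # invariant subspace found
            return U[:, :j + 1], H[:j + 1, :j + 1]
        U[:, j + 1] = v / H[j + 1, j]
    return U, H

rng = np.random.default_rng(5)
A = rng.standard_normal((100, 100))
U, H = arnoldi(A, rng.standard_normal(100), 20)
print(np.linalg.norm(A @ U[:, :20] - U @ H))      # ~ machine precision
print(np.linalg.norm(U.T @ U - np.eye(21)))       # orthonormality
```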

302 Chapter 8. Arnoldi Method (k ) (k) (k) (k ) Let yi be an eigenvector of Hk associated with the eigenvalue i and xi = Uk yi the Ritz approximate eigenvector. Theorem 8.1.4 (A i I )xi and therefore, (A i I )xi
(k) (k ) 2 (k) (k) (k)

= hk+1,k eT k yi uk+1 . = |hk+1,k || eT k yi |.


(k )

8.2

Krylov decompositions
AUk = Uk Bk + uk+1 bH k+1

Denition 8.2.1 Let u1 , u2 , . . . , uk+1 be linearly independent and let Uk = [u1 uk ].

is called a Krylov decomposition of order k . R(Uk+1 ) is called the space spanned by the decomposition. Two Krylov decompositions spanning the same spaces are said to be equivalent. Let [V v ]H be any left inverse for Uk+1 . Then it follows that Bk = V H AUk and
H bH k+1 = v AUk .

In particular, Bk is a Rayleigh quotient of A. Let AUk = Uk Bk + uk+1 bH k+1 be a Krylov decomposition and Q be nonsingular. That is k AUk = Uk+1 B with k = B Bk bH k+1 . (8.2.5)

Then we get an equivalent Krylov decomposition of (8.2.5) in the form A(Uk Q) = = Uk+1 Q 0 0 1 Q 0 0 1 Q 1 Bk Q bH k+1 Q (8.2.6)
1

k Q B

Uk Q uk+1

= (Uk Q)(Q1 BQ) + uk+1 (bH k+1 Q). The two Krylov decompositions (8.2.5) and (8.2.6) are said to be similar. Let u k+1 = uk+1 Uk a.

Since u1 , . . . , uk , uk+1 are linearly independent, we have = 0. Then it follows that k+1 (bH AUk = Uk (Bk + abH k+1 ). k+1 ) + u Since R([Uk uk+1 ]) = R([Uk u k+1 ]), this Krylov decomposition is equivalent to (8.2.5).

8.2 Krylov decompositions 303 Theorem 8.2.1 Every Krylov decomposition is equivalent to a (possibly reduced) Arnoldi decomposition. Proof: Let AU = U B + ubH be a Krylov decomposition and let R U =U be the QR factorization of U . Then = A(U R1 ) = (U R1 )(RBR1 ) + u(bH R1 ) U B + u AU bH is an equivalent decomposition. Let u = 1 (u U a) be a vector with u
2

= 1 such that U H u = 0. Then

=U (B + a B +u AU bH ) + u ( bH ) U bH is an equivalent orthonormal Krylov decomposition. Let Q be a unitary matrix such that bH Q = b 2 eT k is upper Hessenberg. Then the equivalent decomposition and QH BQ A(U Q ) = (U Q)(QH BQ )+u B + AU ( b H Q) U b 2u eT k is a possibly reduced Arnoldi decomposition where Hu Hu U = QH U = QH RH U H u = 0.

8.2.1
Let

Reduction to Arnoldi form

AU = U B + ubH be the Krylov decomposition with B Ckk . Let H1 be a Householder transformation such that bH H1 = ek .

Reduce H_1^H B H_1 to Hessenberg form by a sequence of Householder transformations H_2, ..., H_{k-1} applied from the right and the left (the step-by-step matrix illustration is omitted here). Let Q = H_1 H_2 ⋯ H_{k-1}. Then Q^H B Q is upper Hessenberg and
T bH Q = (bH H1 )(H2 Hk1 ) = eT k (H2 Hk1 ) = ek .

Therefore, the Krylov decomposition A(U Q) = (U Q)(QH BQ) + ueT k is an Arnoldi decomposition. (8.2.7)

8.3
Let

The implicitly restarted Arnoldi method


AUk = Uk Hk + k uk+1 eT k

be an Arnoldi decomposition. In principle, we can keep expanding the Arnoldi decomposition until the Ritz pairs have converged. Unfortunately, this is limited by the amount of memory required to store U_k, so the Arnoldi process must be restarted once k becomes so large that we cannot store U_k. Two approaches are the implicitly restarted method and the Krylov–Schur decomposition. Both choose a new starting vector for the underlying Krylov sequence; a natural choice would be a linear combination of the Ritz vectors that we are interested in.

8.3 The implicitly restarted Arnoldi method

305

8.3.1

Filter polynomials

Assume A has a complete system of eigenpairs (i , xi ) and we are interested in the rst k of these eigenpairs. Expand u1 in the form
k n

u1 =
i=1

i x i +
i=k+1

i x i .

If p is any polynomial, we have


k n

p(A)u1 =
i=1

i p(i )xi +
i=k+1

i p(i )xi .

Choose p so that the values p(λ_i) (i = k+1, ..., n) are small compared to the values p(λ_i) (i = 1, ..., k). Then p(A)u_1 is rich in the components of the x_i that we want and deficient in the ones that we do not want; p is called a filter polynomial. Suppose we have Ritz values θ_1, ..., θ_m and θ_{k+1}, ..., θ_m are not interesting. Then take p(t) = (t − θ_{k+1}) ⋯ (t − θ_m).

8.3.2
Let

Implicitly restarted Arnoldi

AUm = Um Hm + m um+1 eT m

(8.3.8)

be an Arnoldi decomposition with order m. Choose a lter polynomial p of degree m k and use the implicit restarting process to reduce the decomposition to a decomposition k u k = U k H k + AU k+1 eT k of order k with starting vector p(A)u1 . Let 1 , . . . , m be eigenvalues of Hm and suppose that 1 , . . . , mk correspond to the part of the spectrum we are not interested in. Then take p(t) = (t 1 )(t 2 ) (t mk ). The starting vector p(A)u1 is equal to p(A)u1 = (A mk I ) (A 2 I )(A 1 I )u1 = (A mk I ) [ [(A 2 I ) [(A 1 I )u1 ]]] .

306 Chapter 8. Arnoldi Method In the rst, we construct an Arnoldi decomposition with starting vector (A 1 I )u1 . From (8.3.8), we have (A 1 I )Um = Um (Hm 1 I ) + m um+1 eT m = Um Q1 R1 + m um+1 eT , m where Hm 1 I = Q1 R1 is the QR factorization of Hm 1 I . Postmultiplying by Q1 , we get (A 1 I )(Um Q1 ) = (Um Q1 )(R1 Q1 ) + m um+1 (eT m Q1 ) . It implies that
(1) (1) (1) AUm = Um Hm + m um+1 bm+1 , (1)H

(8.3.9)

where
(1) Um = Um Q1 , (1) (1) Hm = R1 Q1 + 1 I,

bm+1 = eT m Q1 .
(1)

(1)H

(Hm : one step of single shifted QR algorithm) Theorem 8.3.1 Let Hm be an unreduced Hessenberg matrix. Then Hm has the form
(1) Hm = (1) m is unreduced. where H (1) m h12 H 0 1

Proof: Let Hm 1 I = Q1 R1 be the QR factorization of Hm 1 I with Q1 = G(1, 2, 1 ) G(m 1, m, m1 ) where G(i, i + 1, i ) for i = 1, . . . , m 1 are Given rotations. Since Hm is unreduced upper Hessenberg, i.e., the subdiagonal elements of Hm are nonzero, we get i = 0 and (R1 )ii = 0 for i = 1, . . . , m 1. (R1 )mm = 0. Using the results of (8.3.10), (8.3.11) and (8.3.12), we get
(1) Hm = R1 Q1 + 1 I = R1 G(1, 2, 1 ) G(m 1, m, m1 ) + 1 I (1) m H h12 , = 0 1 (1) m where H is unreduced.

for i = 1, . . . , m 1

(8.3.10) (8.3.11) (8.3.12)

Since 1 is an eigenvalue of Hm , we have that Hm 1 I is singular and then

8.3 The implicitly restarted Arnoldi method Remark 8.3.1 Um is orthonormal.


(1)

307

Since Hm is upper Hessenberg and Q1 is the Q-factor of the QR factorization of (1) Hm 1 I , it implies that Q1 and Hm are also upper Hessenberg. The vector bm+1 = eT m Q1 has the form bm+1 =
(1)H (1)H

0 qm1,m qm,m
(1)

(1)

(1)

i.e., only the last two components of bm+1 are nonzero. For on postmultiplying (8.3.9) by e1 , we get
(1) (A 1 I )u1 = (A 1 I )(Um e1 ) = Um R1 e1 = r11 u1 . (1) (1)

Since Hm is unreduced, r11 is nonzero. Therefore, the rst column of Um is a multiple of (A 1 I )u1 . By the denition of Hm , we get
(1) H Q1 H m Q1 = Q1 (R1 Q1 + 1 I )QH 1 = Q1 R 1 + 1 I = H m . (1)

(1)

(1)

Therefore, 1 , 2 , . . . , m are also eigenvalues of Hm . Similarly,


(1) (1) (1) (A 2 I )Um = Um (Hm 2 I ) + m um+1 bm+1 (1) = Um Q2 R2 + m um+1 bm+1 , (1)H (1)H

(1)

(8.3.13)

where
(1) Hm 2 I = Q2 R 2

is the QR factorization of Hm 2 I with upper Hessenberg matrix Q2 . Postmultiplying by Q2 , we get


(1) (1) (A 2 I )(Um Q2 ) = ( Um Q2 )(R2 Q2 ) + m um+1 (bm+1 Q2 ). (1)H

(1)

It implies that
(2) (2) (2) AUm = Um Hm + m um+1 bm+1 , (2)H

where
(2) (1) Um Um Q2

is orthonormal,
(2) R2 Q2 + 2 I = Hm

Hm2

(2)

308 Chapter 8. Arnoldi Method (2) is upper Hessenberg with unreduced matrix Hm2 and
(1) T bm+1 bm+1 Q2 = qm1,m eH m1 Q2 + qm,m em Q2 (2)H (1)H (1)

For on postmultiplying (8.3.13) by e1 , we get


(2) (1) R2 e1 = r11 u1 . e1 ) = Um (A 2 I )u1 = (A 2 I )(Um (1) (2) (2)

Since Hm is unreduced, r11 is nonzero. Therefore, the rst column of Um is a multiple (1) (1) of (A 2 I )u1 = 1/r11 (A 2 I )(A 1 I )u1 . Repeating this process with 3 , . . . , mk , the result will be a Krylov decomposition
(mk) (mk) (mk) AUm = Um Hm + m um+1 bm+1 (mk)H

(1)

(2)

(2)

with the following properties i. Um


(mk)

is orthonormal. is upper Hessenberg.


(mk)H

ii. Hm

(mk)

iii. The rst k 1 components of bm+1 iv. The rst column of Um


(mk)

are zero.

is a multiple of (A 1 I ) (A mk I )u1 .

Corollary 8.3.1 Let 1 , . . . , m be eigenvalues of Hm . If the implicitly restarted QR step (mk) has the form is performed with shifts 1 , . . . , mk , then the matrix Hm
(mk) Hm =

Hkk 0

(mk)

Hk,mk T (mk)

(mk)

where T^{(m−k)} is an upper triangular matrix with the Ritz values μ_1, ..., μ_{m−k} on its diagonal. (The notes illustrate the case k = 3, m = 6 with a schematic picture of the decomposition, showing that only the trailing entries of the residual row are nonzero; the picture is omitted here.)

Therefore, the first k columns of the decomposition can be written in the form
A U_k^{(m−k)} = U_k^{(m−k)} H_{kk}^{(m−k)} + h_{k+1,k} u_{k+1}^{(m−k)} e_k^T + β_k q_{m,k} u_{m+1} e_k^T,
where U_k^{(m−k)} consists of the first k columns of U_m^{(m−k)}, H_{kk}^{(m−k)} is the leading principal submatrix of order k of H_m^{(m−k)}, and q_{m,k} is from the matrix Q = Q_1 ⋯ Q_{m−k}. Hence if we set
Ũ_k = U_k^{(m−k)},
H̃_k = H_{kk}^{(m−k)},
β̃_k = || h_{k+1,k} u_{k+1}^{(m−k)} + β_k q_{m,k} u_{m+1} ||_2,
ũ_{k+1} = β̃_k^{-1} ( h_{k+1,k} u_{k+1}^{(m−k)} + β_k q_{m,k} u_{m+1} ),
then
A Ũ_k = Ũ_k H̃_k + β̃_k ũ_{k+1} e_k^T
is an Arnoldi decomposition whose starting vector is proportional to (A − κ_1 I) ⋯ (A − κ_{m−k} I) u_1.
- We avoid any matrix–vector multiplications in forming the new starting vector.
- We get the Arnoldi decomposition of order k for free.
- For large n the major cost will be in computing U Q.
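The bookkeeping above is easier to see in code. The sketch below (Python/NumPy; it reuses the arnoldi() sketch given after Algorithm 8.1.2, and all other names are ours) performs one implicit restart by applying explicit shifted QR steps to the small matrix H_m — explicit QR on H is used for clarity instead of bulge chasing — and then truncates to order k, checking the restarted Arnoldi relation.

```python
import numpy as np

def implicit_restart(U, H, beta, u_next, shifts):
    """One implicit restart: given A U = U H + beta * u_next e_m^T of order m,
    apply one explicit QR step per shift to H, accumulate Q, and truncate to
    order k = m - len(shifts).  The new starting vector is proportional to
    prod_i (A - shift_i I) u_1."""
    m = H.shape[0]
    Q_total = np.eye(m)
    for mu in shifts:
        Q, R = np.linalg.qr(H - mu * np.eye(m))
        H = R @ Q + mu * np.eye(m)            # similarity transform, stays Hessenberg
        Q_total = Q_total @ Q
    k = m - len(shifts)
    U_new = U @ Q_total
    b = beta * Q_total[m - 1, :]              # residual row: beta * e_m^T Q
    w = H[k, k - 1] * U_new[:, k] + b[k - 1] * u_next
    beta_k = np.linalg.norm(w)
    return U_new[:, :k], H[:k, :k], beta_k, w / beta_k

rng = np.random.default_rng(6)
n, m, k = 60, 12, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                             # symmetric, so Ritz values are real
U, Hbar = arnoldi(A, rng.standard_normal(n), m)   # sketch from Algorithm 8.1.2
H, beta, u_next = Hbar[:m, :m], Hbar[m, m - 1], U[:, m]
shifts = np.sort(np.linalg.eigvals(H).real)[:m - k]   # discard smallest Ritz values
Uk, Hk, bk, uk1 = implicit_restart(U[:, :m], H, beta, u_next, shifts)
resid = A @ Uk - (Uk @ Hk + bk * np.outer(uk1, np.eye(k)[k - 1]))
print(np.linalg.norm(resid))                  # ~ machine precision
```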


Chapter 9 Jacobi-Davidson method


References: (i) Sleijpen and Van der Vorst, SIAM J. Matrix Anal. Appl., Vol. 17, pp. 401–425, 1996; (ii) BIT, Vol. 36, pp. 595–633, 1996; (iii) SIAM J. Sci. Comput., Vol. 20, pp. 94–125, 1998; (iv) Lehoucq and Meerbergen, Using generalized Cayley transformations within an inexact rational Krylov subspace method, 2001.

9.1 JOCC (Jacobi Orthogonal Component Correction)

Consider
Ax = λx,
where A is a nonsymmetric diagonally dominant matrix (i.e., |a_ii| > Σ_{j≠i} |a_ij|). Let
A = [ α  c^T ;  b  F ],
with α being the largest diagonal element. For the eigenvector x = [1; z] = e_1 + [0; z] (the component [0; z] is the correction), A[1; z] = λ[1; z] gives
λ = α + c^T z,     (9.1.1)
(F − λI) z = −b.     (9.1.2)

Jacobi proposed to solve (9.1.2) by the following Jacobi iteration with z_1 = 0:
for k = 1, 2, ...
  θ_k = α + c^T z_k,
  (D − θ_k I) z_{k+1} = (D − F) z_k − b,     (9.1.3)
end, where D = diag(F).
Remark: θ_k is not a Ritz value.

9.2 Davidson method

Davidson's method can be viewed as an accelerated JOCC method. Assume u_k = [1; z_k] is the current approximation to the eigenvector x of Ax = λx and θ_k is the associated eigenvalue approximation. The residual is
r_k = (A − θ_k I) u_k = [ α − θ_k + c^T z_k ; (F − θ_k I) z_k + b ].     (9.2.4)
Davidson (1975) proposed computing t_k from
(D_A − θ_k I) t_k = −r_k,   D_A = diag(A).     (9.2.5)
For the component y_k of t_k orthogonal to u_1 = [1; 0], i.e., t_k = [0; y_k],     (9.2.6)
equation (9.2.5) gives
(D − θ_k I) y_k = −(F − θ_k I) z_k − b = (D − F) z_k − (D − θ_k I) z_k − b,
with D = diag(F), and hence
(D − θ_k I)(z_k + y_k) = (D − F) z_k − b.     (9.2.7)
If we let (D − θ_k I) z_{k+1} := (D − θ_k I)(z_k + y_k) and compare (9.2.7) with (9.1.3), then z_k + y_k is the z_{k+1} that we would have obtained with one step of JOCC starting from z_k. Davidson suggested instead computing a Ritz value/vector of A with respect to the growing subspace
9.3 Jacobi Davidson method Sk+1 =< u1 , . . . , uk , u k+1 >=< u1 , . . . , uk , tk > O.B. = < v1 , . . . , vk+1 > (orthog basis) where u1 = e1 , u k+1 = uk + y k = uk + tk = yk = 0 + 0 yk , 0 yk 0 , = u1

313

i.e. compute a Ritz pair (k+1 , uk+1 ) which is nearest the target value. Then compute rk+1 = (A k+1 I )uk+1 GOTO (9.2.4)

9.3

Jacobi Davidson method

In fact, we want to find the orthogonal complement of the current approximation u_k with respect to the desired eigenvector u; we are interested in seeing explicitly what happens in the subspace u_k^⊥. Let
B = (I − u_k u_k^T) A (I − u_k u_k^T),   u_k^T u_k = 1,     (9.3.8)
so that
A = B + A u_k u_k^T + u_k u_k^T A − θ_k u_k u_k^T,
where (θ_k, u_k) is a given Ritz pair with θ_k = u_k^T A u_k / u_k^T u_k. We are in search of an eigenvalue λ of A close to θ_k; we want a correction v ⊥ u_k such that
A(u_k + v) = λ(u_k + v).     (9.3.9)
By (9.3.8) and B u_k = 0,
(B − λI) v = −r + (λ − θ_k − u_k^T A v) u_k,   where r = A u_k − θ_k u_k.     (9.3.10)
Since the left-hand side and r have no component in u_k, it follows that
(B − λI) v = −r,   v ⊥ u_k.     (9.3.11)
Replacing λ by θ_k gives
(B − θ_k I) v = −r,   v ⊥ u_k,     (9.3.12)
or equivalently
(I − u_k u_k^T)(A − θ_k I) t = −r,   t ⊥ u_k.     (9.3.13)
Remark:

(1) If we take v = −r, we obtain the Arnoldi or Lanczos method.
(2) If we take v = −(D_A − θ_k I)^{-1} r_k, we obtain the Davidson method.
(3) Select a suitable approximation t̃ ⊥ u_k to the solution of (B − θ_k I) t = −r_k, t ⊥ u_k: the Jacobi-Davidson method.

Jacobi-Davidson Algorithm (SIMAX 1996):
1. Start: Choose v_1 with ||v_1||_2 = 1; w_1 = A v_1, h_11 = v_1^T w_1.
   Set V_1 = [v_1], W_1 = [w_1], H_1 = [h_11], u = v_1, θ = h_11. Compute r = w_1 − θu.
2. Iterate until convergence do
3. Inner loop: For k = 1, ..., m−1 do
   a. Solve (approximately) t ⊥ u from (I − uu^T)(A − θI)(I − uu^T) t = −r.
   b. Orthogonalize t against V_k by MGS: v_{k+1} = (t − V_k(V_k^T t))/||·||_2, V_{k+1} = [V_k | v_{k+1}].
   c. Compute H_{k+1} = V_{k+1}^T A V_{k+1}.
   d. Compute the largest (or nearest-to-target) eigenpair (θ, s) of H_{k+1}, ||s||_2 = 1.
   e. Compute the Ritz vector u := V_{k+1} s.
   f. Compute the residual r = Au − θu.
   g. Test convergence.
4. Restart: set V_1 = [u], H_1 = [θ]; go to 3.

The main part is (9.3.12) or (9.3.13); the main step is solving for the correction vector t with t ⊥ u_k from the correction equation
(I − u_k u_k^T) A(θ_k) (I − u_k u_k^T) t = −r_k,     (9.3.14)

where A(θ_k) ≡ A − θ_k I. There are three methods to solve for t.
(a) Method I: Use a preconditioned iterative method, e.g., GMRES, to solve (9.3.14). The method uses a preconditioner
M_p ≡ (I − u_k u_k^T) M (I − u_k u_k^T) ≈ (I − u_k u_k^T) A(θ_k) (I − u_k u_k^T),
where M is an approximation of A(θ_k), together with an iterative method for Eq. (9.3.14). In each of the iterative steps one needs to solve the linear system
M_p t = y,   t ⊥ u_k     (9.3.15)
for a given y. Since t ⊥ u_k, Eq. (9.3.15) can be rewritten as
(I − u_k u_k^T) M t = y   ⟹   M t = (u_k^T M t) u_k + y ≡ ε_k u_k + y.
Hence t = M^{-1} y + ε_k M^{-1} u_k, where
ε_k = −(u_k^T M^{-1} y) / (u_k^T M^{-1} u_k)
so that u_k^T t = 0. Let A(θ_k) = L + D + U. Then M = (D + L) D^{-1} (D + U) is an SSOR preconditioner of A(θ_k).
(b) Method II: Since t ⊥ u_k, Eq. (9.3.14) can be rewritten as
A(θ_k) t = (u_k^T A(θ_k) t) u_k − r_k ≡ ε u_k − r_k.     (9.3.16)
Let t_1 and t_2 be approximate solutions of the linear systems
A(θ_k) t = r_k   and   A(θ_k) t = u_k,
respectively. Then the approximate solution of (9.3.16) is t̃ = −t_1 + ε t_2, where ε is chosen so that u_k^T t̃ = 0, i.e.,
ε = (u_k^T t_1) / (u_k^T t_2).
For the special case t_1 = M^{-1} r_k, t_2 = M^{-1} u_k, where M is an approximation of A(θ_k), the approximate solution of (9.3.16) is
t̃ = −M^{-1} r_k + ε M^{-1} u_k   with   ε = (u_k^T M^{-1} r_k) / (u_k^T M^{-1} u_k).
(c) Method III: Eq. (9.3.16) implies that
t = ε A(θ_k)^{-1} u_k − A(θ_k)^{-1} r_k = ε A(θ_k)^{-1} u_k − u_k.
Let t_1 be an approximate solution of the linear system A(θ_k) t = u_k. Then the approximate solution of (9.3.16) is
t̃ = ε t_1 − u_k   with   ε = 1 / (u_k^T t_1).     (9.3.17)

Solve
(I − u_k u_k^T)(A − θ_k I)(I − u_k u_k^T) t = −r,   t ⊥ u_k.     (9.3.18)
Since t ⊥ u_k we have (I − u_k u_k^T) t = t, so (9.3.18) becomes
(I − u_k u_k^T)(A − θ_k I) t = −r,  t ⊥ u_k   ⟺   (A − θ_k I) t = ε u_k − r.
Determine ε such that t ⊥ u_k: from
t = ε (A − θ_k I)^{-1} u_k − (A − θ_k I)^{-1} r   and   t^T u_k = 0
we get
ε = (u_k^T (A − θ_k I)^{-1} r) / (u_k^T (A − θ_k I)^{-1} u_k).     (9.3.19)
Choose a preconditioner M ≈ A − θ_k I; then
t̃ = ε M^{-1} u_k − M^{-1} r,   where ε = (u_k^T M^{-1} r) / (u_k^T M^{-1} u_k).     (9.3.20)
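The one-step formula (9.3.20) is simple to implement. The sketch below (Python/NumPy; all names and the diagonal-plus-noise test matrix are ours) applies it with the Davidson-like diagonal preconditioner M = D_A − θI and checks that the computed correction is orthogonal to u. In practice one would instead run a few steps of (preconditioned) GMRES on the projected operator (I − uu^T)(A − θI)(I − uu^T); this is only the cheap special case discussed above.

```python
import numpy as np

def jd_correction(theta, u, r, M_solve):
    """One-step approximate solution (9.3.20) of the JD correction equation:
    t = eps * M^{-1} u - M^{-1} r with eps chosen so that u^T t = 0,
    where M ~ A - theta*I and M_solve(y) applies M^{-1}."""
    z1 = M_solve(u)
    z2 = M_solve(r)
    eps = (u @ z2) / (u @ z1)
    t = eps * z1 - z2
    return t - u * (u @ t)          # explicit re-projection onto span{u}^perp

rng = np.random.default_rng(7)
n = 50
A = np.diag(np.arange(1.0, n + 1)) + 1e-2 * rng.standard_normal((n, n))
u = rng.standard_normal(n); u /= np.linalg.norm(u)
theta = u @ A @ u
r = A @ u - theta * u
d = np.diag(A) - theta              # diagonal (Davidson-like) preconditioner
t = jd_correction(theta, u, r, lambda y: y / d)
print(abs(u @ t))                   # ~ 0: the correction is orthogonal to u
```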

Remark:
(1) If we choose ε = 0 and M = D_A − θ_k I, we get the Davidson method: t̃ = −M^{-1} r (in this case t̃ is not orthogonal to u_k).
(2) If we choose ε = 0 and M = I, we get the Arnoldi or Lanczos method.
(3) (9.3.17) is equivalent to t̃ = ε(A − θ_k I)^{-1} u_k − u_k (t̃ ⊥ u_k), i.e., mathematically equivalent to shift-invert iteration (locally quadratic convergence). In finite arithmetic the vector (A − θ_k I)^{-1} u_k may make a very small angle with u_k, so that it would be impossible to compute a significant orthogonal search direction from it alone.
Discussion: let A = D_A + E be strongly diagonally dominant, ||E|| ≪ ||D_A||, with D_A = diag(a_11, ..., a_nn) and a_11 the largest diagonal element.
For the Davidson method:

9.3 Jacobi Davidson method From r = Auk k uk = (DA + E ) uk k uk , = (DA k I )1 r = uk + (DA k I )1 Euk t (DA k I )1 Euk is not small compared with uk Davidson method works well for diagonal dominant problem. (DA k I )1 Euk not small compared with uk , k a11 , is expected to recover some part of signicant digits. t To Jacobi-Davidson method: = (DA k I )1 uk (DA k I )1 r, t uk t 1 T u M r is well-determined by = Tk 1 , M = (DA k I ) . uk M uk

317

(DA k I )1 uk

= =

1 uT r kM (DA k I )1 uk T 1 uk M uk

uk (DA k I )1 r uk (DA k I )1 uk (DA k I )1 r .

(DA k I )1 uk

uk r { (DA k I )1 uk , (DA k I )1 r} is linearly independent and (DA k I )1 uk (DA k I )1 r . There will be hardly any cacellation in the . computation of t Remark: = combination of ( Shift-Invert ) and ( Davidson ) , t where Shift-Invert = (DA k I )1 uk , Davidson = (DA k )1 r. Consider Ax = x, be simple. Lemma 9.3.1 Consider with T x = 0. Then the map Fp I is a bijection from to . Extension : Fp = I
T uuT uT u x T T x

(A I ) I

x T T x

(A I ) I

uuT uT u

t = r. t u, r u, t u r u .
Fp

Proof: Let y and Fp y = 0 Claim : y = 0 ? x T Fp y = 0 . I (A I ) I Tx

x T T x

y = 0.

(A I ) y x. y and x Ker(A I )2 . is simple. y x. But y , x / y = 0

318 Chapter 9. Jacobi-Davidson method Theorem 9.3.2 Assume that correction equation is solved exactly in each step of JDalgorithm. Choose = Au u T I T u u T (A I ) I T u t = r, t (9.3.21)

I P T x /0. Assume uk = u x, k = k , k Then if uk is suciently chosen to x, then uk x locally quadratically convergent k =


T k Auk T k uk

Proof: Ax = x. Let x = u + z, z then (A I ) z = (A I ) u + ( ) x = r + ( ) x (9.3.22)

Consider the exact solution z1 of (9.3.21) (I P ) (A I ) z1 = (I P ) r x (u + z1 ) = z z1 and z = x u It suces to show that x (u + z1 ) = z z1 = O z


2

( (I P ) r = r)

(9.3.23)

(9.3.24)

Multiplying (9.3.22) by (I P ) and subtracting the result from (9.3.23) yields (I P ) (A I ) (z z1 ) = ( ) (I P ) z + ( ) (I P ) u Multiplying (9.3.22) by T and using r , = uT /0 kx ( ) (I P ) z = T (A I ) z (I P ) z T x (9.3.27) T (A I ) z T x (9.3.26) (9.3.25)

From (9.3.25), lemma9.3.1 and (I P ) u = 0 z z1 = = = o (I P ) (A k I ) |k (I P ) (A k I ) |k z


2 1

( ) (I P ) z
T k (A k I ) z (I P ) z T k x

9.3 Jacobi Davidson method

319

9.3.1

Jacobi Davidson method as on accelerated Newton Scheme

Ax = x, : simple. Choose T x = 1 T x = 0 Consider nonlinear equation F (x) = 0 F (u) = Au u , = (u) = T Au T u

( u = 1) or T u = 1 Au (u) u T =0 F (u) = choose (u) = TAu u T u=1 F : {u| T u = 1} In particular, r F (u) = Au (u) u If uk x, the next Newton approximation uk+1 is given by uk+1 = uk + t, where t .
T T T uT k+1 = 1 = (uk + t) = 1 + t t = 0

F |u=uk t = F (uk ) = r u uk+1 = uk F |u=uk u


1

F (uk )

The Jacobian of F acts on and is given by F |u=uk t = u I uk T T uk (A k I ) t, t T Au u T u

Au (u) u = Au F u

= A ( u) I

T Au u T + T u u T A

(wT u)2 T Au u T A T = A I + u T u ( T u)2

On the other hand, I u T T u (A I ) = A I T Au u T A T + 2 u T T u ( u)

Hence the correction equation of Newton method read as : t , uk T I T (A k I ) t = r uk correction equation of JD

320

Chapter 9. Jacobi-Davidson method

9.3.2 Jacobi-Davidson with harmonic Ritz values

Ritz values: with respect to V_k ⊂ C^n, (θ_k, u_k) is a Ritz pair if
A u_k − θ_k u_k ⊥ V_k.     (9.3.28)
Harmonic Ritz values (using the inverse of A implicitly): θ_k ∈ C is a harmonic Ritz value of A with respect to W_k if θ_k^{-1} is a Ritz value of A^{-1} with respect to W_k. The goal is to avoid computing A^{-1}.
Remark: If A is a normal matrix, then A^{-1} is normal.
Theorem 9.3.3 Let V_k = ⟨v_1, v_2, ..., v_k⟩. Then θ_k ∈ C is a harmonic Ritz value of A with respect to W_k = A V_k if and only if
A u_k − θ_k u_k ⊥ A V_k   for some u_k ∈ V_k.     (9.3.29)
If A V_k = W_k = ⟨w_1, ..., w_k⟩ and
H_k = (W_k^T V_k)^{-1} W_k^T A V_k,     (9.3.30)
then (9.3.29) ⟺ H_k s = θ_k s with u_k = V_k s.
Remark: Compute H_k := (V_k^T A^T V_k)^{-1} V_k^T A^T A V_k and the eigenpairs H_k s = θ_k s.
Proof: By (9.3.28), (θ_k^{-1}, A u_k) is a Ritz pair of A^{-1} with respect to W_k = A V_k for some u_k ∈ V_k:
A^{-1}(A u_k) − θ_k^{-1}(A u_k) ⊥ A V_k ⟺ −θ_k^{-1}(A u_k − θ_k u_k) ⊥ A V_k ⟺ A u_k − θ_k u_k ⊥ A V_k,
which is (9.3.29). Moreover,
W_k^T (A u_k − θ_k u_k) = 0 ⟺ W_k^T (A V_k s − θ_k V_k s) = 0 ⟺ W_k^T A V_k s = θ_k W_k^T V_k s.
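The remark above reduces harmonic Ritz extraction to a small generalized eigenproblem; the sketch below (Python/NumPy + scipy.linalg.eig; the function name and test data are ours) computes harmonic Ritz pairs of a random matrix with respect to a random orthonormal V and verifies the defining orthogonality condition (9.3.29).

```python
import numpy as np
from scipy.linalg import eig

def harmonic_ritz(A, V):
    """Harmonic Ritz values of A w.r.t. W = A V (Theorem 9.3.3): eigenvalues
    of H = (W^T V)^{-1} W^T A V, computed as the generalized eigenproblem
    (V^T A^T A V) s = theta (V^T A^T V) s.  Dense sketch."""
    W = A @ V
    H1 = W.T @ W                 # V^T A^T A V
    H2 = W.T @ V                 # V^T A^T V
    theta, S = eig(H1, H2)
    return theta, V @ S          # harmonic Ritz values and vectors u = V s

rng = np.random.default_rng(8)
A = rng.standard_normal((80, 80))
V, _ = np.linalg.qr(rng.standard_normal((80, 10)))
theta, U = harmonic_ritz(A, V)
res = A @ U[:, 0] - theta[0] * U[:, 0]
# residual is orthogonal to A V, as (9.3.29) requires
print(np.linalg.norm((A @ V).T @ res) / np.linalg.norm(res))
```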

Remark: If Vk Rnn (In general, Vk Rnk ) Hk :=


1 Hk T Wk Vk 1 T Wk AVk

T T Wk AVk = Vk1 Wk 1 A

Bi-orthogonalization basis construction T Vk =< v1 , . . . , vk >, Wk =< w1 , . . . , wk >, Wk = AVk and Lk = Wk Vk (Lower triangular), we say Vk and Wk be bi-orthogonal. 1 T T Wk AVk Let Hk = Wk Vk

9.3 Jacobi Davidson method If (k , S ) is eigenpair of Hk (k , Vk S ) is a harmonic Ritz pair. We bi-orthogonalize t with respect to Vk and Wk . t 1 T := t Vk L t k Wk t, vk = t Vk = {v }, Wk = {w}. Vk+1 =< Vk |vk+1 >, Wk+1 =< Wk |wk+1 >. Correction equation:
k k (A k I ) I wk t = r I wk T T k uk k uk t wk = Auk , where Hk S = k S, uk = Vk S

321

u wT

u wT

Solve correction equation: (A k I ) t = uk r t = (A k I )1 uk (A k I )1 r T T T (A k I )1 r (A k I )1 uk wk 0 = wk t = wk M A k I preconditioner , = wTkM 1 u k k Remark: Gk = QT k AQk Hk = VkT AVk


1 1

w T M 1 r

VkT AT AVk VkT AVk

1 Hk = VkT AT AVk 1 1 G k = Hk

Algorithm 1 : JD with Ritz value and orthogonal Algorithm 2 : JD with harmonic Ritz value and bi-orthogonal (1) Start : choose v1 ( v1 2 = 1), w1 = Av1 T T l11 = w1 v1 , h11 = w1 w1 l = 1, V1 = [v1 ], W1 = [w1 ], L1 = [l11 ], H1 = [h11 ] 11 u = v1 , w = w 1 , = h . Compute r = Au u. l11 (2) Iterate until convergence do : (3) Inner loop. For k = l1 , . . . , m 1 do Solve approximation t w I uwT uwT ( A I ) I t = r wT u wT u uuT uuT I T (A I ) I T t = r u u u u

To solve Bi-orthogonalize t against Vk and Wk .


1 T = t Vk L t k Wk t,

vk+1 = Vk t

t 2 t

= t Vk VkT Vk t

322 Compute

Chapter 9. Jacobi-Davidson method wk+1 = Avk+1 Wk+1 = [Wk |wk+1 ] Vk+1 = [Vk |vk+1 ] .

Compute
T Lk+1 = Wk +1 Vk+1 1 T Hk+1 = Lk+1 Wk +1 Vk+1

= VkT +1 AVk+1 Hk+1 = VkT AVk

T VkT +1 A AVk+1

Compute the smallest, largest eigenpair (k , s) of Hk+1 . V S Compute the harmonic Ritz vector ui = Vk+1 S k+1 Compute w := Au. Compute r := Au u (= w u). Test convergence ? STOP if no go to Inner loop. (4) Restart.

9.4

Jacobi-Davidson Type method for Generalized Eigenproblems


Ax = Bx (||2 + ||2 = 1)

Generalized eigenvalue problem (1) ((, ), x) eigenpair. The updating process for approx. EV: Let (, u) be Ritz pair of (A, B ). Then r Au Bu u. The goal is to nd an update z for u: (2) = (3) z u and A(u + z ) = B (u + z ). u A(u+z ) = u B (u+z) z u and (I uu )(A B )(I u u = r (Au Bu)
u Au , u Bu

uu )z u u

In practice, = (4)

z u and uu )(A B ) |u z = r (Au Bu) u u

(I

9.4 Jacobi-Davidson Type method for Generalized Eigenproblems Other projection for approx. EV: Assume r Au Bu w for some w. We look for an update z of u which is orthog. to u ( u u). i.e. (5) zu and A(u + z ) = B (u + z ).

323

For convenience, u = Bu. Similarly, select w w and consider P = = (6) With (7) and u w Au w Bu , = , a = Au u, b = Bu u w u w u ww I w w u. (A B )x = 0 P (A B )(Qx + (I Q)x) = 0 & (I P )(A B )(Qx + (I Q)x) = 0. ww uu and Q = w w u u

(5) is equiv. to w A(u+z ) = w B (u+z) (8) zu & (I ww )(A B )u w z w = (a b) ( )u . In practice, let = and r = a b = Au Bu.

Lemma 9.4.1 Let (A B )x = 0. Consider w and u with u x = 0 and (Bx) w = 0. Then the map Bxw xu Fp (I )(A B )(I ) w Bx u x is a bijection from u onto w . Proof: Suppose y u and Fp y = 0 = y = 0. Theorem 9.4.1 Choose w = Bu. Assume u and w conv. and u x and w Bx 0. Then if the initial u x, the seq. of u conv. to x quadratically and = w Au/w Bu . Proof: Suppose (A B )x = 0, with x = u + z for z u . Then (10) (A B )z = (A B )u + ( )Bx = r + ( )Bx.

324 Solve (11)

Chapter 9. Jacobi-Davidson method

(I P )(A B ) |u z1 = (I P )r. x (u + z1 ) = z z1 and z = x u. It suces to show x (u + z1 ) = z z1 = O( z 2 ).

(I P ) (10) (11) = (I P )(A B )(z z1 ) = ( )(I P )Bz + ( )(I P )Bu. w (10) and using r w = (12) By assumption and (12) = ( )(I P )Bz = w (A B )z (I P )Bz w Bx = O( z 2 ), = w (A B )z . w Bx

provided (I P )(A B )|u = Bu) nonsing. (by Lemma 1) and (I P )Bu = 0. ( w In practice, w = w = Bu, u = B w. Equiv. formulation for the correction: correction: (13) is equiv. to (14) A B w u 0 z = r 0 , (I ww )(A B )|u z1 = r, w w z1 u .

where = w (A B )z/w w. Theorem 9.4.2 The solution (13) is given by (15) with =

z = (A B )1 (r + w ) = u + (A B )1 w u u . u (A B )1 w

Proof: With z in (15) u z , and (A B )z = r + w . Since r w (13) holds.

Bibliography
[1] R. Barrett, M. Berry, T. F. Chan, and J. Demmel, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, 1994.
[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems, SIAM, Philadelphia, 2000.
[3] B. N. Datta, Numerical Linear Algebra and Applications, Brooks/Cole, Pacific Grove, 1995.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, 1996.
[5] L. A. Hageman and D. M. Young, Applied Iterative Methods, Academic Press, New York, 1981.
[6] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, 2002.
[7] G. W. Stewart, Introduction to Matrix Computations, Academic Press, New York, 1973.
[8] G. W. Stewart, Matrix Algorithms, SIAM, Philadelphia, 1998.
[9] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford Science Publications, 1965.

Index
A-conjugate, 106
LL^T factorization, 29
2-consistently ordered, 79
2-cyclic, 79
Backward error, 16
Backward stable, 16
BCG method, 137
Bi-Conjugate Gradient algorithm, 141
Cholesky factorization, 29
Classical Gram-Schmidt Algorithm, 48
compact method, 29
Condition number, 17, 20
Conjugate gradient method, 106
Crout's factorization, 29
Durbin algorithm, 41
Forward stable, 17
Forward error, 16
Gauss-Seidel method, 62
GCG method, 134
Givens rotation, 49
Gradient method, 103
H-matrix, 123
Implicit Q Theorem, 197
irreducible, 65
Jacobi method, 62
Kronecker product, 189
LDR factorization, 29
LR factorization, 24
Modified Gram-Schmidt, 48
Norms: dual, 10; matrix, 7; Frobenius norm, 8; operator, 8; vector, 6
Numerical stable, 16
Perron root, 67
Perron Lemma, 67
Perron vector, 67
Perron-Frobenius Theorem, 66
Persymmetric matrix, 40
Preconditioned CG-method, 114
property A, 79
reducible, 65
Schur Theorem, 6
Sherman-Morrison Formula, 4
Sherman-Morrison-Woodbury Formula, 4
Single-step method, 62
Singular Value Decomposition (SVD), 10
SOR, 77
Steepest descent method, 103
Stein-Rosenberg, 73
Toeplitz matrix, 40
Total-step method, 62
Yule-Walker system, 41
