
A tutorial on: Robust iterative methods for Sparse Linear Systems (Part 1)
Yousef Saad
University of Minnesota
Computer Science and Engineering
CEA-EDF-INRIA school
Sophia Antipolis, Mar. 30 - Apr. 3, 2009
Outline
Part 1
Introduction, sparse matrices and sparsity
Basic projection methods and (briefly) Krylov subspace methods
Preconditioned iterations
Preconditioning techniques
Part 2
Multilevel preconditioners
Nonsymmetric permutations
Parallel implementations
Parallel Preconditioners
Software
Matlab codes
Material of this course will be supported by matlab scripts for
demos.
The matlab suite used for the demos is located here:
http://www.cs.umn.edu/saad/software
[Note: updated on the occasion of each tutorial I give. Make sure to get the most recent version.]
INTRODUCTION TO SPARSE MATRICES
Typical Problem:
Physical Model
  -> Nonlinear PDEs
  -> Discretization
  -> Linearization (Newton)
  -> Sequence of Sparse Linear Systems Ax = b
Linear System Solvers: recent work
Problem considered: Linear systems
Ax = b
Can view the problem from somewhat different angles:
Discretized problem coming from a PDE
An algebraic system of equations [ignore origin]
Sometimes a system of equations where A is not explicitly available
Linear System Solvers: general view
[Diagram: solution methods range from general purpose to specialized.
General purpose (A x = b): direct sparse solvers, iterative methods, preconditioned Krylov methods.
Specialized (Δu = f + b.c.): fast Poisson solvers, multigrid methods.]
Introduction: Linear System Solvers
Much of the recent work on solvers has focused on:
(1) Parallel implementation: scalable performance
(2) Improving robustness, developing more general preconditioners
A few observations
Problems are getting harder for sparse direct methods (more 3-D models, much bigger problems, ...)
Problems are also getting difficult for iterative methods. Cause: more complex models - away from Poisson.
Researchers in iterative methods are borrowing techniques from direct methods: preconditioners.
The reverse is also happening: direct methods are being adapted for use as preconditioners.
What are sparse matrices?
Common definition: "... matrices that allow special techniques to take advantage of the large number of zero elements and the structure."
A few applications of sparse matrices: structural engineering, reservoir simulation, electrical networks, optimization problems, ...
Goal: much less storage and work than dense computations.
Observation: A^{-1} is usually dense, but L and U in the LU factorization may be reasonably sparse (if a good technique is used).
Nonzero patterns of a few sparse matrices
[Figures: ARC130 - unsymmetric matrix from a laser problem (A. R. Curtis, Oct. 1974); SHERMAN5 - fully implicit black oil simulator, 16 x 23 x 3 grid, 3 unknowns]
Sparse matrices in Matlab
Explore the scripts Lap2D, mark (provided in matlab suite) for
generating sparse matrices
Explore the command spy
Explore the command sparse
Run the demos titled demo sparse0 and demo sparse1
Load the matrix can 256.mat from the Florida set. Show its
pattern
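A minimal sketch of the kind of exploration suggested above (not part of the tutorial's matlab suite; the matrix and commands below are only illustrative):

% build a small sparse matrix and look at its pattern
n = 100;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % 1-D Laplacian (tridiagonal)
nnz(A)                                   % number of stored nonzeros
spy(A)                                   % plot the nonzero pattern
% load can_256.mat   % from the Florida collection; then spy the loaded matrix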
Sparse matrices - continued
Main goal of Sparse Matrix Techniques: To perform standard
matrix computations economically, i.e., without storing the zeros
Example: To add two square dense matrices of size n requires O(n^2) operations. To add two sparse matrices A and B requires O(nnz(A) + nnz(B)) operations, where nnz(X) = number of nonzero elements of a matrix X.
For typical Finite Element / Finite Difference matrices, the number of nonzero elements is O(n).
Graph Representations of Sparse Matrices
Graph theory is a fundamental tool in sparse matrix techniques.
Graph G = (V, E) of an n x n matrix A is defined by:
Vertices V = {1, 2, ..., n}.
Edges E = {(i, j) | a_ij ≠ 0}.
The graph is undirected if the matrix has a symmetric structure: a_ij ≠ 0 iff a_ji ≠ 0.
[Figure: adjacency graph of a small matrix A - the nonzero pattern of A and the corresponding graph.]
For any matrix A, what is the graph of A^2? [Interpret in terms of paths in the graph of A.]
Background: Direct versus iterative methods
Direct methods: based on sparse Gaussian elimination, sparse Cholesky, ...
Iterative methods: compute a sequence of iterates which converge to the solution - preconditioned Krylov methods, ...
Remark: these two classes of methods have always been in competition.
40 years ago, solving a system with n = 10,000 was a challenge. Now you can solve it in < 1 sec. on a laptop.
Sparse direct methods have made huge gains in efficiency. As a result, they are hard to beat for 2-D problems.
3-D problems are much more challenging for direct solvers [inherent to the underlying graph]: iterative methods become mandatory.
Similar situation: problems with many unknowns per grid point.
Get the best of both worlds: turn direct methods into preconditioners.
Remarks: No robust black-box iterative solvers.
Robustness often conflicts with efficiency.
However, the situation has improved in the last decade.
The line between direct and iterative solvers is blurring.
PROJECTION METHODS
One-dimensional projection processes
Steepest descent. Problem: Ax = b, with A SPD.
Define: f(x) = 1/2 ||x - x*||_A^2 = 1/2 (A(x - x*), (x - x*))
Note: 1. f(x) = 1/2 (Ax, x) - (b, x) + constant
      2. ∇f(x) = Ax - b  →  descent direction = b - Ax ≡ r
Idea: take a step of the form x_new = x + αr which minimizes f(x).
Best α = (r, r)/(Ar, r).
Iteration:
  r ← b - Ax
  α ← (r, r)/(Ar, r)
  x ← x + αr
Can show: convergence is guaranteed if A is SPD.
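A minimal matlab sketch of this iteration (illustrative only; assumes A is SPD, and the function name and stopping test are not from the tutorial):

function x = steepest_descent(A, b, x, tol, maxit)
for it = 1:maxit
  r = b - A*x;                    % residual = descent direction
  if norm(r) < tol, break; end
  alpha = (r'*r)/(r'*(A*r));      % optimal step length
  x = x + alpha*r;
end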
Residual norm steepest descent: now A is arbitrary.
Minimize instead: f(x) = 1/2 ||b - Ax||_2^2, in the direction -∇f. Here -∇f(x) = A^T (b - Ax) = A^T r.
Iteration:
  r ← b - Ax,  d = A^T r
  α ← ||d||_2^2 / ||Ad||_2^2
  x ← x + αd
Important note: equivalent to the usual steepest descent applied to the normal equations A^T A x = A^T b.
Converges under the condition that A is nonsingular.
But convergence can be very slow.
Minimal residual iteration: assume A is positive definite (A + A^T is SPD).
The objective function is still 1/2 ||b - Ax||_2^2, but the direction of search is r = b - Ax instead of -∇f(x).
Iteration:
  r ← b - Ax
  α ← (Ar, r)/(Ar, Ar)
  x ← x + αr
Each step minimizes f(x) = ||b - Ax||_2^2 in the direction r.
Converges under the condition that A + A^T is SPD.
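A minimal matlab sketch of the MR iteration (illustrative; assumes A + A^T is SPD, names are not from the tutorial):

function x = minimal_residual(A, b, x, tol, maxit)
for it = 1:maxit
  r = b - A*x;
  if norm(r) < tol, break; end
  Ar = A*r;
  alpha = (Ar'*r)/(Ar'*Ar);       % minimizes ||b - A(x + alpha*r)||_2
  x = x + alpha*r;
end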
Common feature of these techniques: x_new = x + αd, where d = a certain direction.
α is defined to optimize a certain quadratic function.
Equivalent to determining α by an orthogonality constraint.
Example: in MR, x(α) = x + αd, with d = b - Ax.
min_α ||b - Ax(α)||_2 is reached iff b - Ax(α) ⊥ Ad.
One-dimensional projection methods - can we generalize to m-dimensional techniques?
General Projection Methods
Initial problem: b - Ax = 0
Given two subspaces K and L of R^N, define the approximate problem:
  Find x̃ ∈ K such that b - Ax̃ ⊥ L
This leads to a small linear system ("projected problem"). This is a basic projection step. Typically, a sequence of such steps is applied.
With a nonzero initial guess x_0, the approximate problem is
  Find x̃ ∈ x_0 + K such that b - Ax̃ ⊥ L
Write x̃ = x_0 + δ and r_0 = b - Ax_0. This leads to a system for δ:
  Find δ ∈ K such that r_0 - Aδ ⊥ L
Matrix representation:
Let V = [v_1, ..., v_m] be a basis of K and W = [w_1, ..., w_m] a basis of L.
Then, writing the approximate solution as x̃ = x_0 + δ ≡ x_0 + V y, where y is a vector of R^m, the Petrov-Galerkin condition yields
  W^T (r_0 - A V y) = 0
and therefore
  x̃ = x_0 + V [W^T A V]^{-1} W^T r_0
Remark: in practice W^T A V is known from the algorithm and has a simple structure [tridiagonal, Hessenberg, ...].
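A minimal matlab sketch of one such projection step (illustrative only; V and W are assumed to hold the basis vectors as columns):

function x = projection_step(A, b, x0, V, W)
r0 = b - A*x0;                 % initial residual
y  = (W'*(A*V)) \ (W'*r0);     % small m x m projected system
x  = x0 + V*y;                 % corrected approximation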
Prototype Projection Method
Until convergence Do:
1. Select a pair of subspaces K and L;
2. Choose bases V = [v_1, ..., v_m] for K and W = [w_1, ..., w_m] for L.
3. Compute
     r ← b - Ax,
     y ← (W^T A V)^{-1} W^T r,
     x ← x + V y.
Operator Form Representation
Let P be the orthogonal projector onto K, and
Q the (oblique) projector onto K and orthogonally to L:
  Px ∈ K,  x - Px ⊥ K
  Qx ∈ K,  x - Qx ⊥ L
[Figure: the P and Q projectors acting on a vector x.]
The approximate problem amounts to solving
  Q(b - Ax) = 0,  x ∈ K
or, in operator form,
  Q(b - APx) = 0
Question: what accuracy can one expect?
Let x* be the exact solution. Then
1) We cannot get better accuracy than ||(I - P)x*||_2, i.e.,
     ||x̃ - x*||_2 ≥ ||(I - P)x*||_2
2) The residual of the exact solution for the approximate problem satisfies:
     ||b - QAPx*||_2 ≤ ||QA(I - P)||_2 ||(I - P)x*||_2
Two important particular cases:
1. L = AK. Then ||b - Ax̃||_2 = min_{z∈K} ||b - Az||_2
   → class of minimal residual methods: CR, GCR, ORTHOMIN, GMRES, CGNR, ...
2. L = K → class of Galerkin, or orthogonal, projection methods.
   When A is SPD, then ||x* - x̃||_A = min_{z∈K} ||x* - z||_A.
One-dimensional projection processes
  K = span{d}  and  L = span{e}
Then x̃ = x + αd, and the Petrov-Galerkin condition r - Aδ ⊥ e yields
  α = (r, e)/(Ad, e)
(I) Steepest descent: K = span{r}, L = K
(II) Residual norm steepest descent: K = span{A^T r}, L = AK
(III) Minimal residual iteration: K = span{r}, L = AK
Krylov Subspace Methods
Principle: projection methods on Krylov subspaces:
  K_m(A, v_1) = span{v_1, Av_1, ..., A^{m-1} v_1}
probably the most important class of iterative methods.
many variants exist, depending on the subspace L.
Simple properties of K_m. Let μ = degree of the minimal polynomial of v:
  K_m = {p(A)v | p = polynomial of degree ≤ m-1}
  K_m = K_μ for all m ≥ μ. Moreover, K_μ is invariant under A.
  dim(K_m) = m iff μ ≥ m.
Overview
1. K = K_m(A, r_0), L = K: Full Orthogonalization Method [YS 81], ORTHORES [Young & Jea 82], Axelsson 81.
2. K = K_m(A, r_0), L = AK: GMRES [YS & Schultz], GCR, Orthomin (Vinsome 1980), Orthodir (Young & Jea 83), Axelsson's CGLS 83, ...
3. K = K_m(A, r_0), L = K_m(A^T, w): Bi-CG (Fletcher 75)
Many variants of (3) avoid the transpose: CGS (Sonneveld 84), BiCGSTAB (Van der Vorst 92), TFQMR (Freund 93), ...
BASIC RELAXATION METHODS
Basic Relaxation Schemes
Relaxation schemes: based on the decomposition A = D - E - F
[Figure: D = diagonal of A, -E = strict lower part, -F = strict upper part.]
D = diag(A), -E = strict lower part of A, and -F its strict upper part.
For example, the Gauss-Seidel iteration:
  (D - E) x^{(k+1)} = F x^{(k)} + b
These were the most common iterative procedures 50 years ago. However, nowadays they are seldom used by themselves.
Still used as smoothers in multigrid schemes, or sometimes as preconditioners to Krylov subspace methods.
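A minimal matlab sketch of the Gauss-Seidel iteration above (illustrative; dense splitting, no convergence test):

function x = gauss_seidel(A, b, x, maxit)
D = diag(diag(A));
E = -tril(A,-1);               % strict lower part (note the sign convention)
F = -triu(A, 1);               % strict upper part
for k = 1:maxit
  x = (D - E) \ (F*x + b);     % (D - E) x_{k+1} = F x_k + b
end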
Iteration matrices
Jacobi, Gauss-Seidel, SOR, and SSOR iterations are of the form
  x^{(k+1)} = M x^{(k)} + f
with
  M_Jac = D^{-1}(E + F) = I - D^{-1} A
  M_GS(A) = (D - E)^{-1} F = I - (D - E)^{-1} A
  M_SOR(A) = (D - ωE)^{-1} (ωF + (1-ω)D) = I - (ω^{-1}D - E)^{-1} A
  M_SSOR(A) = I - (2ω^{-1} - 1)(ω^{-1}D - F)^{-1} D (ω^{-1}D - E)^{-1} A
            = I - ω(2-ω)(D - ωF)^{-1} D (D - ωE)^{-1} A
An observation & Introduction to Preconditioning
The iteration x^{(k+1)} = M x^{(k)} + f is attempting to solve (I - M)x = f. Since M is of the form M = I - P^{-1}A, this system can be rewritten as
  P^{-1} A x = P^{-1} b
where, for SSOR, we have
  P_SSOR = (D - ωE) D^{-1} (D - ωF)
referred to as the SSOR "preconditioning matrix".
In other words:
  Relaxation Scheme ⟺ Preconditioned Fixed-Point Iteration
PRECONDITIONING
Preconditioning Basic principles
The basic idea is to use the Krylov subspace method on a modified system such as
  M^{-1} A x = M^{-1} b.
The matrix M^{-1}A need not be formed explicitly; we only need to solve Mw = v whenever needed.
Consequence: the fundamental requirement is that it should be easy to compute M^{-1}v for an arbitrary vector v.
Left, Right, and Split preconditioning
Left preconditioning:   M^{-1} A x = M^{-1} b
Right preconditioning:  A M^{-1} u = b,  with x = M^{-1} u
Split preconditioning:  M_L^{-1} A M_R^{-1} u = M_L^{-1} b,  with x = M_R^{-1} u
[Assume M is factored: M = M_L M_R.]
Preconditioned CG (PCG)
Assume: A and M are both SPD.
Applying CG directly to M^{-1}Ax = M^{-1}b or AM^{-1}u = b won't work, because the coefficient matrices are not symmetric.
Alternative: when M = LL^T, use the split preconditioner option.
Second alternative: observe that M^{-1}A is self-adjoint w.r.t. the M inner product:
  (M^{-1}Ax, y)_M = (Ax, y) = (x, Ay) = (x, M^{-1}Ay)_M
Preconditioned CG (PCG)
ALGORITHM 1: Preconditioned Conjugate Gradient
1. Compute r_0 := b - Ax_0, z_0 := M^{-1} r_0, and p_0 := z_0
2. For j = 0, 1, ..., until convergence Do:
3.    α_j := (r_j, z_j)/(Ap_j, p_j)
4.    x_{j+1} := x_j + α_j p_j
5.    r_{j+1} := r_j - α_j Ap_j
6.    z_{j+1} := M^{-1} r_{j+1}
7.    β_j := (r_{j+1}, z_{j+1})/(r_j, z_j)
8.    p_{j+1} := z_{j+1} + β_j p_j
9. EndDo
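A minimal matlab sketch of PCG (illustrative; Msolve is assumed to be a function handle that applies M^{-1} to a vector):

function x = pcg_sketch(A, b, x, Msolve, tol, maxit)
r = b - A*x;  z = Msolve(r);  p = z;
for j = 1:maxit
  Ap    = A*p;
  alpha = (r'*z)/(p'*Ap);
  x     = x + alpha*p;
  rnew  = r - alpha*Ap;
  if norm(rnew) < tol, break; end
  znew  = Msolve(rnew);
  beta  = (rnew'*znew)/(r'*z);
  p     = znew + beta*p;
  r = rnew;  z = znew;
end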
Note: M^{-1}A is also self-adjoint with respect to (., .)_A:
  (M^{-1}Ax, y)_A = (AM^{-1}Ax, y) = (x, AM^{-1}Ay) = (x, M^{-1}Ay)_A
Can obtain a similar algorithm.
Assume that M is given by a Cholesky product M = LL^T.
Then, another possibility: the split preconditioning option, which applies CG to the system
  L^{-1} A L^{-T} u = L^{-1} b,  with x = L^{-T} u
Notation: Â = L^{-1} A L^{-T}. All quantities related to the preconditioned system are indicated by a hat.
ALGORITHM 2: CG with Split Preconditioner
1. Compute r_0 := b - Ax_0; r̂_0 = L^{-1} r_0; and p_0 := L^{-T} r̂_0.
2. For j = 0, 1, ..., until convergence Do:
3.    α_j := (r̂_j, r̂_j)/(Ap_j, p_j)
4.    x_{j+1} := x_j + α_j p_j
5.    r̂_{j+1} := r̂_j - α_j L^{-1} Ap_j
6.    β_j := (r̂_{j+1}, r̂_{j+1})/(r̂_j, r̂_j)
7.    p_{j+1} := L^{-T} r̂_{j+1} + β_j p_j
8. EndDo
The x_j's produced by the above algorithm and by PCG are identical (if the same initial guess is used).
Flexible accelerators
Question: What can we do in case M is defined only approximately, i.e., if it can vary from one step to the next?
Applications:
Iterative techniques as preconditioners: Block-SOR, SSOR, Multigrid, etc.
"Chaotic" relaxation-type preconditioners (e.g., in a parallel computing environment)
Mixing preconditioners: mixing coarse-mesh / fine-mesh preconditioners.
ALGORITHM 3: GMRES - No preconditioning
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b - Ax_0, β = ||r_0||_2 and v_1 = r_0/β.
   For j = 1, ..., m do
     Compute w := Av_j
     For i = 1, ..., j do:
       h_{i,j} := (w, v_i)
       w := w - h_{i,j} v_i
     EndDo
     h_{j+1,j} = ||w||_2;  v_{j+1} = w/h_{j+1,j}
   EndDo
   Define V_m := [v_1, ..., v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + V_m y_m where
   y_m = argmin_y ||βe_1 - H̄_m y||_2 and e_1 = [1, 0, ..., 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM 4: GMRES with (right) Preconditioning
1. Start: Choose x_0 and a dimension m.
2. Arnoldi process:
   Compute r_0 = b - Ax_0, β = ||r_0||_2 and v_1 = r_0/β.
   For j = 1, ..., m do
     Compute z_j := M^{-1} v_j
     Compute w := A z_j
     For i = 1, ..., j do:
       h_{i,j} := (w, v_i)
       w := w - h_{i,j} v_i
     EndDo
     h_{j+1,j} = ||w||_2;  v_{j+1} = w/h_{j+1,j}
   EndDo
   Define V_m := [v_1, ..., v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: x_m = x_0 + M^{-1} V_m y_m where
   y_m = argmin_y ||βe_1 - H̄_m y||_2 and e_1 = [1, 0, ..., 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM 5: GMRES with variable preconditioner (FGMRES)
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b - Ax_0, β = ||r_0||_2 and v_1 = r_0/β.
   For j = 1, ..., m do
     Compute z_j := M_j^{-1} v_j;  Compute w := A z_j;
     For i = 1, ..., j do:
       h_{i,j} := (w, v_i)
       w := w - h_{i,j} v_i
     EndDo
     h_{j+1,j} = ||w||_2;  v_{j+1} = w/h_{j+1,j}
   EndDo
   Define Z_m := [z_1, ..., z_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + Z_m y_m where
   y_m = argmin_y ||βe_1 - H̄_m y||_2 and e_1 = [1, 0, ..., 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
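A compact matlab sketch of this flexible variant (illustrative only; prec is a function handle z = prec(v, j) applying the step-dependent preconditioner, and no breakdown test is included):

function x = fgmres_sketch(A, b, x, m, prec, tol, ncycles)
n = length(b);
for cycle = 1:ncycles
  r = b - A*x;  beta = norm(r);
  if beta < tol, return; end
  V = zeros(n,m+1);  Z = zeros(n,m);  H = zeros(m+1,m);
  V(:,1) = r/beta;
  for j = 1:m
    Z(:,j) = prec(V(:,j), j);        % z_j = M_j^{-1} v_j  (may change with j)
    w = A*Z(:,j);
    for i = 1:j                      % modified Gram-Schmidt
      H(i,j) = w'*V(:,i);
      w = w - H(i,j)*V(:,i);
    end
    H(j+1,j) = norm(w);
    V(:,j+1) = w/H(j+1,j);
  end
  e1 = zeros(m+1,1);  e1(1) = beta;
  y = H \ e1;                        % least-squares: argmin ||beta*e1 - H*y||_2
  x = x + Z*y;                       % note: the correction uses Z_m, not V_m
end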
Properties
x_m minimizes ||b - Ax_m||_2 over Span{Z_m}.
If Az_j = v_j (i.e., if preconditioning is "exact" at step j) then the approximation x_j is exact.
If M_j is constant, then the method is equivalent to right-preconditioned GMRES.
Additional costs:
  Arithmetic: none.
  Memory: must save the additional set of vectors {z_j}_{j=1,...,m}.
Advantage: flexibility.
Standard preconditioners
Simplest preconditioner: M = Diag(A) → poor convergence.
Next to simplest: SSOR, M = (D - ωE) D^{-1} (D - ωF)
Still simple but often more efficient: ILU(0).
ILU(p) - ILU with level of fill p - more complex.
Class of ILU preconditioners with threshold.
Class of approximate inverse preconditioners.
Class of multilevel ILU preconditioners: Multigrid, Algebraic Multigrid, M-level ILU, ...
The SOR/SSOR preconditioner
[Figure: splitting of A into D (diagonal), -E (strict lower part), -F (strict upper part).]
SOR preconditioning:
  M_SOR = (D - ωE)
SSOR preconditioning:
  M_SSOR = (D - ωE) D^{-1} (D - ωF)
M_SSOR = LU, with L = unit lower triangular and U = upper triangular. One solve with M_SSOR costs about the same as a MAT-VEC.
k-step SOR (resp. SSOR) preconditioning: k steps of SOR (resp. SSOR).
Questions: Best ω? For preconditioning, one can take ω = 1:
  M = (D - E) D^{-1} (D - F)
Observe: M = LU + R with R = E D^{-1} F.
Best k? k = 1 is rarely the best. There can be a substantial difference in performance.
Exercise: write a matlab script for the k-step SSOR preconditioner (ω = 1); a possible sketch follows.
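One possible sketch (illustrative; it applies k SSOR sweeps with ω = 1 to Mz = v, i.e., a preconditioner solve):

function z = ssor_k_prec(A, v, k)
D = diag(diag(A));  E = -tril(A,-1);  F = -triu(A,1);
z = zeros(size(v));
for step = 1:k
  z = (D - E) \ (F*z + v);     % forward (Gauss-Seidel) sweep
  z = (D - F) \ (E*z + v);     % backward sweep -> SSOR
end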
[Figure: iteration times versus k for SOR(k)-preconditioned GMRES.]
ILU(0) and IC(0) preconditioners
Notation: NZ(X) = {(i, j) | X_{i,j} ≠ 0}
Formal definition of ILU(0):
  A = LU + R
  NZ(L) ∪ NZ(U) = NZ(A)
  r_{ij} = 0 for (i, j) ∈ NZ(A)
This does not define ILU(0) in a unique way.
Constructive definition: compute the LU factorization of A, but drop any fill-in in L and U outside of Struct(A).
ILU factorizations are often based on the i, k, j version of GE.
What is the IKJ version of GE?
[Figure: the different computational patterns (loop orderings) of Gaussian elimination - KJI, IJK, IKJ, JKI, ...]
ALGORITHM 6: Gaussian Elimination - IKJ Variant
1. For i = 2, ..., n Do:
2.    For k = 1, ..., i-1 Do:
3.       a_{ik} := a_{ik}/a_{kk}
4.       For j = k+1, ..., n Do:
5.          a_{ij} := a_{ij} - a_{ik} a_{kj}
6.       EndDo
7.    EndDo
8. EndDo
[Figure: during step i of the IKJ variant, parts of the matrix are "accessed and modified", "accessed but not modified", or "not accessed".]
ILU(0) - zero-fill ILU
ALGORITHM 7: ILU(0)
For i = 1, ..., N Do:
  For k = 1, ..., i-1 and if (i, k) ∈ NZ(A) Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    For j = k+1, ..., N and if (i, j) ∈ NZ(A) Do:
      Compute a_{ij} := a_{ij} - a_{ik} a_{kj}
    EndFor
  EndFor
EndFor
When A is SPD, the ILU(0) factorization = the Incomplete Cholesky factorization IC(0). Meijerink and Van der Vorst [1977].
Typical eigenvalue distribution of preconditioned matrix
Pattern of ILU(0) for 5-point matrix
Stencils and ILU factorization
Stencils of A and the L and U parts of A:
Higher order ILU factorization
Higher accuracy incomplete Cholesky: for regularly structured problems, IC(p) allows p additional diagonals in L.
Can be generalized to irregular sparse matrices using the notion of level of fill-in [Watts III, 1979].
Initially:
  Lev_{ij} = 0 for a_{ij} ≠ 0,  Lev_{ij} = ∞ for a_{ij} = 0
At a given step i of Gaussian elimination:
  Lev_{ij} = min{ Lev_{ij} ; Lev_{ik} + Lev_{kj} + 1 }
ILU(p) strategy: drop anything with level of fill-in exceeding p.
* Increasing the level of fill-in usually results in a more accurate ILU and...
* ...typically in fewer steps and fewer arithmetic operations.
ILU(1)
ALGORITHM 8: ILU(p)
For i = 2, ..., n Do:
  For each k = 1, ..., i-1 and if a_{ik} ≠ 0 Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    Compute a_{i,*} := a_{i,*} - a_{ik} a_{k,*}
    Update the levels of a_{i,*}
  EndFor
  Replace any element in row i with lev(a_{ij}) > p by zero.
EndFor
The algorithm can be split into a symbolic and a numerical phase. The level-of-fill is handled in the symbolic phase.
ILU with threshold - generic algorithms
ILU(p) factorizations are based on structure only, not on numerical values → potential problems for non-M-matrices.
One remedy: ILU with threshold (generic name: ILUT).
Two broad approaches:
First approach [derived from direct solvers]: use any (direct) sparse solver and incorporate a dropping strategy. [Munksgaard (?), Osterby & Zlatev, Sameh & Zlatev [90], D. Young et al. (Boeing), etc.]
Second approach [derived from the iterative solvers viewpoint]:
1. use a (row or column) version of the (i, k, j) version of GE;
2. apply a drop strategy to the element l_{ik} as it is computed;
3. perform the linear combinations to get a_{i*}; use a full row expansion of a_{i*};
4. apply a drop strategy to fill-ins.
ILU with threshold: ILUT(k, ε)
Do the i, k, j version of Gaussian Elimination (GE).
During the i-th step of GE, discard any pivot or fill-in whose value is below ε ||row_i(A)||.
Once the i-th row of L + U (L-part + U-part) is computed, retain only the k largest elements in both parts.
Advantages: controlled fill-in; smaller memory overhead.
Easy to implement. Can be made quite inexpensive.
[Figure: typical curve of CPU time versus the numerical threshold ε.]
Restarting methods for linear systems
Motivation/Goal: to use the information generated during the current GMRES loop to improve convergence at the next GMRES restart.
References:
R. A. Nicolaides (87): Deflated CG.
R. Morgan (92): Deflated GMRES.
A. Chapman, YS (93): Deflated GMRES + preconditioning.
S. Kharchenko & A. Yeremin (92): pole placement ideas.
K. Burrage, J. Erhel, and B. Pohl (93): Deflated GMRES.
E. de Sturler: use SVD information in GMRES.
Can help improve convergence and prevent stagnation of GMRES in some cases.
Generally speaking: one should not expect to solve very hard problems with eigenvalue deflation preconditioning alone.
Question: can the same effects be achieved with block-Krylov methods?
Using the Flexible GMRES framework
Method: deflation can be achieved by enriching the Krylov subspace with approximate eigenvectors obtained from previous runs. We can use Flexible GMRES and append these vectors at the end. [See R. Morgan (92), Chapman & YS (95).]
Vectors v_1, ..., v_{m-p} = standard Arnoldi vectors.
Vectors v_{m-p+1}, ..., v_m = computed as in FGMRES, where the new vectors z_j are previously computed eigenvectors.
Storage: we need to store v_1, ..., v_m and z_{m-p+1}, ..., z_m → p additional vectors, with typically p << m.
GMRES with deflation
1. Deflated Arnoldi process: r_0 := b - Ax_0, v_1 := r_0/(β := ||r_0||_2).
   For j = 1, ..., m do
     If j ≤ m-p then z_j := v_j, else z_j := u_{j-(m-p)} (eigenvector)
     w = A z_j
     For i = 1, ..., j do:
       h_{i,j} := (w, v_i)
       w := w - h_{i,j} v_i
     EndDo
     h_{j+1,j} = ||w||_2,  v_{j+1} = w/||w||_2.
   EndDo
   Define Z_m := [z_1, ..., z_m] and H̄_m = {h_{i,j}}.
2. Form the approximate solution:
   Compute x_m = x_0 + Z_m y_m where y_m = argmin_y ||βe_1 - H̄_m y||_2.
3. Get the next eigenvector estimates u_1, ..., u_p from H̄_m, V_m, Z_m, ...
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 1.
Question 1: which eigenvectors to add?
Answer: those associated with the smallest eigenvalues.
Question 2: how to compute eigenvectors from F-GMRES?
Answer: use the relation
  A Z_m = V_{m+1} H̄_m
Approximate eigenpair: (λ, u) with u = Z_m y.
The Galerkin condition r ⊥ A Z_m gives the generalized problem
  H̄_m^H H̄_m y = λ H̄_m^H V_{m+1}^H Z_m y
In addition, in GMRES: H̄_m = Q_m R̄_m, so H̄_m^H H̄_m = R̄_m^H R̄_m.
See: Morgan (1993).
An example: Shell problems
Can be very hard to solve!
A matrix of size N = 38,002, with Nz = 949,452 nonzero elements.
Actually symmetric - not exploited in the test.
Most simplistic methods fail.
ILUT(50,0) does not work, even with GMRES(80).
This is an example where a large subspace is required.
[Figure: shell problem, N = 38002, Nz = 949452, m = 80 - log10 of the residual versus the number of GMRES steps, for 0, 5, 10, 20, and 30 added eigenvectors.]
An example: Euler's equations on an unstructured mesh
Contributed by Larry Wigton from Boeing.
Size = 3,864 (966 mesh points).
Nonzero elements: 238,252 (about 62 per row).
Difficult to solve in spite of its small size.
Results with ILUT(lfil, ε):

  lfil | Iterations (tol = 10^-8) | estimate of ||(LU)^{-1}||
  -----+--------------------------+--------------------------
  100  |                          | 0.19E+56
  110  |                          | 0.34E+9
  120  | 30                       | 0.70E+5
  130  | 25                       | 0.33E+7
  140  | 20                       | 0.17E+4
  150  | 19                       | 0.69E+4
Results with Block Jacobi Preconditioning with Eigenvalue Deflation
Reduction in the residual norm after 1200 GMRES steps with m = 49:

          4x4 blocks   16x16 blocks
  p = 0   0.8 E 0      0.8 E 0
  p = 4   0.8 E 0      4.0 E-5
  p = 8   1.2 E-2      2.9 E-7
  p = 12  1.9 E-2      3.8 E-6
Theory (Hermitian case only)
Assume that A is SPD and let K = K_m + W, where W is such that
  dist(AW, U) = ε
with U = the exact invariant subspace associated with λ_1, ..., λ_s.
Then the residual r obtained from the minimal residual projection process onto the augmented Krylov subspace K satisfies the inequality
  ||r||_2 ≤ ||r_0|| ( 1/T_m^2(γ) + ε^2 )^{1/2}
where
  γ = (λ_n + λ_{s+1}) / (λ_n - λ_{s+1}),
and T_m is the Chebyshev polynomial of degree m of the 1st kind.
See [YS, SIMAX vol. 4, pp 43-66 (1997)] for other results.
Crout-based ILUT (ILUTC)
Terminology: Crout versions of LU compute the k-th row of U and the k-th column of L at the k-th step.
Computational pattern:
[Figure: black = part computed at step k, blue = part accessed.]
Main advantages:
1. Less expensive than ILUT (avoids sorting).
2. Allows better techniques for dropping.
References:
[1] M. Jones and P. Plassmann. An improved incomplete Cholesky factorization. ACM Transactions on Mathematical Software, 21:5-17, 1995.
[2] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. Algorithms and data structures for sparse symmetric Gaussian elimination. SIAM Journal on Scientific Computing, 2:225-237, 1981.
[3] M. Bollhöfer. A robust ILU with pivoting based on monitoring the growth of the inverse factors. Linear Algebra and its Applications, 338(1-3):201-218, 2001.
[4] N. Li, Y. Saad, and E. Chow. Crout versions of ILU. MSI technical report, 2002.
Crout LU (dense case)
Go back to the delayed-update algorithm (the IKJ algorithm) and observe: we could do both a column and a row version.
[Figure. Left: U computed by rows. Right: L computed by columns.]
Note: entries 1 : k-1 in the k-th row of the figure need not be computed; they are available from the already computed columns of L.
A similar observation holds for L (right).
ALGORITHM 9: Crout LU Factorization (dense case)
1. For k = 1 : n Do:
2.    For i = 1 : k-1 and if a_{ki} ≠ 0 Do:
3.       a_{k,k:n} = a_{k,k:n} - a_{ki} * a_{i,k:n}
4.    EndDo
5.    For i = 1 : k-1 and if a_{ik} ≠ 0 Do:
6.       a_{k+1:n,k} = a_{k+1:n,k} - a_{ik} * a_{k+1:n,i}
7.    EndDo
8.    a_{ik} = a_{ik}/a_{kk} for i = k+1, ..., n
9. EndDo
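A small dense matlab sketch of this algorithm (illustrative; no pivoting):

function [L,U] = crout_lu(A)
n = size(A,1);
for k = 1:n
  for i = 1:k-1                      % update the k-th row of U
    A(k,k:n) = A(k,k:n) - A(k,i)*A(i,k:n);
  end
  for i = 1:k-1                      % update the k-th column of L
    A(k+1:n,k) = A(k+1:n,k) - A(i,k)*A(k+1:n,i);
  end
  A(k+1:n,k) = A(k+1:n,k)/A(k,k);    % scale the column of L
end
L = tril(A,-1) + eye(n);
U = triu(A);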
Comparison with standard techniques
[Figure: preconditioning time vs. Lfil for RAEFSKY3, for ILUC (solid), row-ILUT (circles), column-ILUT (triangles) and row-ILUT with Binary Search Trees (stars).]
Inverse-based dropping strategies
Method developed mainly by Matthias Bollhöfer.
Observation: the norm of the inverses of the factors is more important than the errors in the factors themselves. If A = L̃Ũ + E, then
  L̃^{-1} A Ũ^{-1} = I + L̃^{-1} E Ũ^{-1}
In many cases ||L̃^{-1}|| and ||Ũ^{-1}|| are *very* large → bad.
In contrast, assume A = LU is the exact LU factorization and
  L̃^{-1} = L^{-1} + X,   Ũ^{-1} = U^{-1} + Y.
Then:
  L̃^{-1} A Ũ^{-1} = (L^{-1} + X) A (U^{-1} + Y) = I + AY + XA + XY.
X, Y small → preconditioned matrix close to the identity.
Let L_k = the matrix made of the first k rows of L and the last n-k rows of the identity matrix.
Consider a term l_{jk} with j > k that is dropped at step k. The perturbed matrix L̃_k differs from L_k by l_{jk} e_j e_k^T. Note: L_k e_j = e_j, so
  L̃_k = L_k - l_{jk} e_j e_k^T = L_k (I - l_{jk} e_j e_k^T)
  L̃_k^{-1} = (I - l_{jk} e_j e_k^T)^{-1} L_k^{-1} = L_k^{-1} + l_{jk} e_j e_k^T L_k^{-1}
→ the j-th row of the inverse of L_k is perturbed by l_{jk} times the k-th row of L_k^{-1}.
Need to limit the norm of this perturbing row, i.e.,
  |l_{jk}| ||e_k^T L_k^{-1}||
should be small.
L^{-1} is not available. Bollhöfer's idea: use techniques for estimating condition numbers [see, e.g., Golub and Van Loan].
ALGORITHM 10: Estimating the norms ||e_k^T L^{-1}||
1. Set ν_1 = 1 and ν_i = 0 for i = 2, ..., n
2. For k = 2, ..., n do
3.    ν_+ = 1 - ν_k ;  ν_- = -1 - ν_k ;
4.    if |ν_+| > |ν_-| then ν_k = ν_+ else ν_k = ν_-
5.    For j = k+1 : n and for l_{jk} ≠ 0 Do
6.       ν_j = ν_j + ν_k l_{jk}
7.    EndDo
8. EndDo
The idea fits very well with Crout ILU [Na Li, YS, E. Chow, 2004].
The end of Part-1
Given the following script for the IKJ version of LU, write a matlab script for ILU(0). A possible sketch of a solution is given after the script.

function [L,U] = ikj(A)
%---------------------------------------------
% function [L,U] = ikj (A)
% LU factorization of A. Uses ikj variant of GE
%---------------------------------------------
n = size(A,1);
for i=1:n
  for k=1:i-1
    A(i,k) = A(i,k)/A(k,k);  %% ! div by zero
    A(i,k+1:n) = A(i,k+1:n) - A(i,k)*A(k,k+1:n);
  end
end
L = diag(ones(n,1)) + tril(A,-1);
U = triu(A);
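One possible sketch for the exercise (illustrative; a dense-storage version that simply restricts the updates to the nonzero pattern of A - a real implementation would use sparse row storage):

function [L,U] = ilu0_sketch(A)
% ILU(0): same loops as ikj, but only positions in NZ(A) are updated.
n = size(A,1);
P = (A ~= 0);                      % nonzero pattern of A
for i=1:n
  for k=1:i-1
    if P(i,k)
      A(i,k) = A(i,k)/A(k,k);
      for j=k+1:n
        if P(i,j)
          A(i,j) = A(i,j) - A(i,k)*A(k,j);
        end
      end
    end
  end
end
L = diag(ones(n,1)) + tril(A,-1);
U = triu(A);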
Explore the solvers in the package ITSOL:
www.cs.umn.edu/saad/software