Numerical Analysis
Mathematics
Imperial College London
Contents
1 Applied Linear Algebra
  1.1 Orthogonality
  1.2 Gram-Schmidt
  1.3 QR Factorization
  1.4 Cauchy-Schwarz inequality
  1.5 Gradients and Hessians
  1.6 Generalized inner product
  1.7 Cholesky Factorization
  1.8 Least Square Problems
      1.8.1 General Least Squares Case
  1.9 A more abstract approach
  1.10 Orthogonal Polynomials
2 Polynomial interpolation
  2.1 Divided difference
  2.2 Finding the error
  2.3 Best Approximation
  2.4 Piecewise Polynomial Interpolation
3 Quadrature (Numerical Integration)
Chapter 1

Applied Linear Algebra
1.1 Orthogonality
Definition. Let $a, b \in \mathbb{R}^n$. We define the inner product of $a, b$ to be
\[
\langle a, b\rangle = a^T b = \sum_{i=1}^{n} a_i b_i .
\]
Also define the outer product of $a$ and $b$ to be
\[
ab^T = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
\begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix}
= \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & \ddots & \vdots \\ a_n b_1 & \cdots & a_n b_n \end{pmatrix}.
\]
Note. Note that:

1. The inner product is symmetric:
\[
\langle a, b\rangle = \sum_{i=1}^{n} a_i b_i = \sum_{i=1}^{n} b_i a_i = \langle b, a\rangle
\]
for all $a, b \in \mathbb{R}^n$.

2. The inner product is linear with respect to the second argument:
\[
\langle a, \alpha b + \beta c\rangle = \sum_{i=1}^{n} a_i (\alpha b_i + \beta c_i)
= \alpha \sum_{i=1}^{n} a_i b_i + \beta \sum_{i=1}^{n} a_i c_i
= \alpha \langle a, b\rangle + \beta \langle a, c\rangle
\]
for all $a, b, c \in \mathbb{R}^n$, $\alpha, \beta \in \mathbb{R}$.

3. From 1. and 2. we get that the inner product is also linear with respect to the first argument.

4. Observe that
\[
\langle a, a\rangle = \sum_{i=1}^{n} a_i^2 \ge 0 .
\]
Definition. Let
\[
\|a\| = \bigl[\langle a, a\rangle\bigr]^{1/2}
\]
be the length (or norm) of $a$.

Definition. We say that $a, b \in \mathbb{R}^n$, $a, b \neq 0$, are orthogonal if $\langle a, b\rangle = 0$.
Example.

Claim. If $a, b \in \mathbb{R}^n$ are orthogonal, then $\|a + b\|^2 = \|a\|^2 + \|b\|^2$.

Proof.
\begin{align*}
\|a + b\|^2 &\stackrel{\text{def}}{=} \langle a + b, a + b\rangle \\
&= \langle a + b, a\rangle + \langle a + b, b\rangle \\
&= \|a\|^2 + \|b\|^2 + 2\langle a, b\rangle \\
&= \|a\|^2 + \|b\|^2 .
\end{align*}
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^{n}$, $q_k \in \mathbb{R}^n$, $q_k \neq 0$ for $k = 1, \dots, n$, is orthogonal if $\langle q_j, q_k\rangle = 0$ for $j, k = 1, \dots, n$, $j \neq k$.

As a useful shorthand, introduce the Kronecker Delta notation
\[
\delta_{jk} = \begin{cases} 1 & j = k, \\ 0 & j \neq k. \end{cases}
\]

Example. For the $n \times n$ identity matrix $I$ we have $I_{jk} = \delta_{jk}$.
Definition. A set of non-trivial vectors $\{q_k\}_{k=1}^{n}$, $q_k \in \mathbb{R}^n$, $q_k \neq 0$ for $k = 1, \dots, n$, is orthonormal if
\[
\langle q_j, q_k\rangle = \delta_{jk} \quad \text{for } j, k = 1, \dots, n.
\]

Note. A set of vectors is orthonormal if it is orthogonal and each vector has unit length.
Definition. A set of vectors $\{a_k\}_{k=1}^{n}$, $a_k \in \mathbb{R}^m$ for $k = 1, \dots, n$, is linearly independent if
\[
\sum_{k=1}^{n} c_k a_k = 0
\]
implies $c_k = 0$ for $k = 1, \dots, n$. The set $\{a_k\}_{k=1}^{n}$ is linearly dependent if there exist coefficients $c_k \in \mathbb{R}$, $k = 1, \dots, n$, not all zero, such that
\[
\sum_{k=1}^{n} c_k a_k = 0 .
\]
Note. Recall that for $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, $a_k \in \mathbb{R}^m$ for $k = 1, \dots, n$:

(1) If the only solution to $Ac = 0$ is $c = 0$ then $\{a_k\}_{k=1}^{n}$ are linearly independent.

(2) If there exists $c \neq 0$ such that $Ac = 0$ then $\{a_k\}_{k=1}^{n}$ are linearly dependent.

(3) Restrict to $m = n$ (so that $A$ is square). If $A^{-1}$ exists then the rows (columns) of $A$ are linearly independent. If $\{a_k\}_{k=1}^{n}$ are linearly independent they form a basis for $\mathbb{R}^n$ and each vector $x \in \mathbb{R}^n$ can be uniquely expressed as a combination of the $a_i$s.
Lemma 1.1. Let $\{a_k\}_{k=1}^{n}$, $a_k \in \mathbb{R}^m$, $k = 1, \dots, n$, be orthogonal. Then $\{a_k\}_{k=1}^{n}$ is linearly independent.

Proof. If $\sum_{k=1}^{n} c_k a_k = 0$ then for $1 \le j \le n$
\[
\Bigl\langle \sum_{k=1}^{n} c_k a_k, a_j \Bigr\rangle = \langle 0, a_j\rangle = 0
\;\Rightarrow\;
\sum_{k=1}^{n} c_k \langle a_k, a_j\rangle = 0
\;\Rightarrow\;
c_j \langle a_j, a_j\rangle = c_j \|a_j\|^2 = 0 .
\]
Since the $a_j$s are non-trivial, $c_j = 0$. Repeat for all $j = 1, \dots, n$.
Remark 1.1. Linear independence does not imply orthogonality. For example take $n = m = 2$ and $a_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$ and $a_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, which are clearly linearly independent but not orthogonal.
1.2 Gram-Schmidt

Algorithm 1 Classical Gram-Schmidt Algorithm (CGS)
1: $v_1 = a_1$
2: $q_1 = v_1 / \|v_1\|$
3: for $k = 2$ to $n$ do
4:   $v_k = a_k - \sum_{j=1}^{k-1} \langle a_k, q_j\rangle q_j$
5:   $q_k = v_k / \|v_k\|$
6: end for
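As an illustrative sketch (not part of the printed notes), the CGS loop above translates directly into plain Python; the input vectors $a_1 = (2,0)^T$, $a_2 = (3,1)^T$ are the pair from Remark 1.1.

```python
import math

def cgs(a):
    """Classical Gram-Schmidt: a is a list of n linearly independent
    vectors (lists) in R^m; returns the orthonormal vectors q_1..q_n."""
    q = []
    for ak in a:
        # v_k = a_k - sum_{j<k} <a_k, q_j> q_j
        v = list(ak)
        for qj in q:
            c = sum(x * y for x, y in zip(ak, qj))
            v = [vi - c * qi for vi, qi in zip(v, qj)]
        norm = math.sqrt(sum(vi * vi for vi in v))   # ||v_k||
        q.append([vi / norm for vi in v])            # q_k = v_k / ||v_k||
    return q

q = cgs([[2.0, 0.0], [3.0, 1.0]])
```

For this input the algorithm yields $q_1 = (1,0)^T$ and $q_2 = (0,1)^T$, as the hand computation in the proof below confirms.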
Claim. Given $\{a_i\}_{i=1}^{n}$, $a_i \in \mathbb{R}^m$, $i = 1, \dots, n$, linearly independent (so $n \le m$), CGS finds $\{q_i\}_{i=1}^{n}$, $q_i \in \mathbb{R}^m$, $i = 1, \dots, n$, orthonormal, i.e. $\langle q_i, q_j\rangle = \delta_{ij}$ for $i, j = 1, \dots, n$, with $\mathrm{Span}\{a_i\}_{i=1}^{n} = \mathrm{Span}\{q_i\}_{i=1}^{n}$.
Proof. Since $\{a_i\}_{i=1}^{n}$ are linearly independent, $a_i \neq 0$ for $i = 1, \dots, n$. For $k = 1$ we get
\[
q_1 = \frac{v_1}{\|v_1\|}, \qquad
\|q_1\| = \bigl[\langle q_1, q_1\rangle\bigr]^{1/2}
= \Bigl[\frac{1}{\|v_1\|^2}\langle v_1, v_1\rangle\Bigr]^{1/2} = 1 .
\]
For $k = 2$: from the code we have
\[
v_2 = a_2 - \langle a_2, q_1\rangle q_1 . \qquad (*)
\]
Check that $v_2$ is orthogonal to $q_1$:
\[
\langle v_2, q_1\rangle = \langle a_2, q_1\rangle - \langle a_2, q_1\rangle \underbrace{\langle q_1, q_1\rangle}_{=1} = 0 .
\]
We need to check that $\|v_2\| \neq 0$. If $v_2 = 0$, then by $(*)$, $a_2$ equals $\langle a_2, q_1\rangle q_1$, which is a multiple of $a_1$; a contradiction to the linear independence of $\{a_i\}_{i=1}^{n}$.

Therefore $v_2 \neq 0$; $q_2$ has unit length and is a multiple of $v_2$, and hence $\{q_i\}_{i=1}^{2}$ is orthonormal. Clearly $\mathrm{Span}\{a_i\}_{i=1}^{2} = \mathrm{Span}\{q_i\}_{i=1}^{2}$.
Assume the statement is true for $k - 1$, i.e. that $\{q_i\}_{i=1}^{k-1}$ is orthonormal and
\[
q_j = \text{linear combination of } \{a_i\}_{i=1}^{j}, \qquad
a_j = \text{linear combination of } \{q_i\}_{i=1}^{j},
\quad \text{for } j = 1, \dots, k - 1. \qquad (**)
\]
Set
\[
v_k = a_k - \sum_{i=1}^{k-1} \langle a_k, q_i\rangle q_i .
\]
Then $v_k$ is orthogonal to all $q_j$, $j = 1, \dots, k - 1$:
\[
\langle v_k, q_j\rangle = \langle a_k, q_j\rangle - \sum_{i=1}^{k-1} \langle a_k, q_i\rangle \underbrace{\langle q_i, q_j\rangle}_{\delta_{ij}}
= \langle a_k, q_j\rangle - \langle a_k, q_j\rangle = 0 .
\]
If $v_k = 0$ then
\[
a_k = \sum_{i=1}^{k-1} \langle a_k, q_i\rangle q_i
= \text{linear combination of } \{q_i\}_{i=1}^{k-1}
= \text{linear combination of } \{a_i\}_{i=1}^{k-1}
\]
by $(**)$; a contradiction to $\{a_i\}_{i=1}^{n}$ being linearly independent.

Hence $v_k \neq 0$. From $q_k = \frac{v_k}{\|v_k\|}$ we get $\{q_i\}_{i=1}^{k}$ orthonormal. Since $q_k$ is a linear combination of $\{q_i\}_{i=1}^{k-1}$ and $a_k$, it is a linear combination of $\{a_i\}_{i=1}^{k}$ by $(**)$. Similarly by $(**)$, $a_k$ is a linear combination of $\{q_i\}_{i=1}^{k}$.

Hence the result follows by induction.
1.3 QR Factorization

Look at CGS from a different viewpoint. For $\{a_i\}_{i=1}^{n}$, CGS gives $\{q_i\}_{i=1}^{n}$ orthonormal. Let
\[
A_{m\times n} = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}, \qquad
\hat{Q}_{m\times n} = \begin{pmatrix} q_1 & \dots & q_n \end{pmatrix}.
\]
Let $\hat{R} \in \mathbb{R}^{n\times n}$ be an upper triangular matrix,
\[
\hat{R}_{lk} = \begin{cases} r_{lk} & l \le k, \\ 0 & l > k, \end{cases}
\]
and define $e^{(n)}_k \in \mathbb{R}^n$ by $(e^{(n)}_k)_j = \delta_{kj}$ for $j = 1, \dots, n$. Then clearly for any $B \in \mathbb{R}^{m\times n}$, $B e^{(n)}_k$ is the $k$-th column of $B$. From CGS we have $a_1 = \|v_1\| q_1$; let $r_{11} = \|a_1\|$. Also for $k = 1, \dots, n$
\[
A e^{(n)}_k = a_k = v_k + \sum_{i=1}^{k-1} \langle a_k, q_i\rangle q_i
= \|v_k\| q_k + \sum_{i=1}^{k-1} \langle a_k, q_i\rangle q_i
= \sum_{i=1}^{k} r_{ik} q_i = \hat{Q} \hat{R} e^{(n)}_k ,
\]
where $r_{kk} = \|v_k\| > 0$ and $r_{ik} = \langle a_k, q_i\rangle$. Hence $A = \hat{Q}\hat{R}$.

Expressing $A \in \mathbb{R}^{m\times n}$ as a product of $\hat{Q} \in \mathbb{R}^{m\times n}$ with orthonormal columns and $\hat{R} \in \mathbb{R}^{n\times n}$ upper triangular with positive diagonal entries is called the reduced QR factorisation of $A$.

Now take $Q \in \mathbb{R}^{m\times m}$,
\[
Q = \begin{pmatrix} \hat{Q} & q_{n+1} & \dots & q_m \end{pmatrix},
\]
with $q_{n+1}, \dots, q_m$ chosen so that the columns of $Q$ are orthonormal, and $R \in \mathbb{R}^{m\times n}$,
\[
R = \begin{pmatrix} \hat{R} \\ 0 \end{pmatrix}.
\]
6 1.3. QR FACTORIZATION
Clearly, R is upper triangular matrix (as
R is). Call A expressed as a product of Q
and R as the QR factorisation of A. QR factorisation
Observe the product of Q with Q
T
_
Q
T
Q
jk
= q
T
j
q
k
= q
i
, q
k
)
=
jk
so Q
T
Q = I
(m)
and also Q
T
= Q
1
.
Denition. Matrix Q R
mm
is called orthogonal if Q
T
Q = I
(m)
. orthogonal
Proposition 1.2. Orthogonal matrices preserve length and angle, i.e. if $Q \in \mathbb{R}^{m\times m}$ and $Q^T Q = I^{(m)}$ then for all $v, w \in \mathbb{R}^m$

(1) $\langle Qv, Qw\rangle = \langle v, w\rangle$ (angle preserved),

(2) $\|Qv\| = \|v\|$ (length preserved).

Proof. For $v, w \in \mathbb{R}^m$
\[
\langle Qv, Qw\rangle = (Qv)^T Qw = (v^T Q^T) Qw = v^T I^{(m)} w = \langle v, w\rangle .
\]
Also
\[
\|Qv\| = \bigl[\langle Qv, Qv\rangle\bigr]^{1/2} \stackrel{(1)}{=} \bigl[\langle v, v\rangle\bigr]^{1/2} = \|v\| .
\]
Proposition 1.3. If $Q_1, Q_2 \in \mathbb{R}^{m\times m}$ are orthogonal, then $Q_1 Q_2$ is orthogonal.

Proof.
\[
(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I^{(m)} .
\]
Example. For $m = 2$ let
\[
Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
Clearly $Q$ is orthogonal and rotates a vector in $\mathbb{R}^2$ by an angle $\theta$ around the origin.
Definition. Define the Givens Rotation Matrix $G_{pq}(\theta) \in \mathbb{R}^{m\times m}$, $p < q \le m$, as the identity matrix with rows and columns $p$ and $q$ modified:
\[
G_{pq}(\theta) = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & \cos\theta & \cdots & \sin\theta & \\
& & \vdots & \ddots & \vdots & \\
& & -\sin\theta & \cdots & \cos\theta & \\
& & & & & \ddots \\
& & & & & & 1
\end{pmatrix},
\]
i.e. the $j$-th column of $G_{pq}(\theta)$ is
\[
e^{(m)}_j \ \text{(with 1 in the $j$-th row)} \quad \text{if } j \neq p \text{ and } j \neq q,
\]
\[
e^{(m)}_p \cos\theta - e^{(m)}_q \sin\theta \quad \text{if } j = p,
\]
\[
e^{(m)}_p \sin\theta + e^{(m)}_q \cos\theta \quad \text{if } j = q.
\]
Note. The length of every column of $G_{pq}(\theta)$ is 1 and the columns of $G_{pq}(\theta)$ are orthogonal; $G_{pq}(\theta)$ is an orthogonal matrix.

For $A, B \in \mathbb{R}^{m\times n}$ consider
\[
G_{pq}(\theta) A = B .
\]
All rows of $B$ are the same as those of $A$, except for rows $p$ and $q$. The aim is to obtain a QR factorisation of $A$ using a sequence of Givens rotations.
Example. For $m = 3$, $n = 2$,
\[
A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}.
\]
Take a sequence of Givens rotations so that $A$ is transformed into $R$ upper triangular. Choose $G_{12}(\theta)$ so that
\[
A^{(1)} = G_{12}(\theta) A = \begin{pmatrix} * & * \\ 0 & * \\ 12 & 13 \end{pmatrix}.
\]
Choose $\theta$ such that (since $G_{12}$ preserves length)
\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}.
\]
We get
\[
G_{12}(\theta) = \begin{pmatrix} \frac{3}{5} & \frac{4}{5} & 0 \\ -\frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad\text{and so}\quad
A^{(1)} = \begin{pmatrix} 5 & 39 \\ 0 & -52 \\ 12 & 13 \end{pmatrix}.
\]
Choose $G_{13}(\theta)$ as the next rotation since it does not affect row 2, so $A^{(2)}_{21}$ stays 0 ($G_{23}(\theta)$ would not work). We want the first column of $A^{(2)}$ to be a multiple of $e^{(3)}_1$. Since $G_{13}(\theta)$ preserves length, we know the multiple is $(5^2 + 12^2)^{1/2} = 13$. So
\[
G_{13}(\theta) = \begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix}
\quad\text{and so}\quad
A^{(2)} = G_{13}(\theta) A^{(1)} = \begin{pmatrix} 13 & 27 \\ 0 & -52 \\ 0 & -31 \end{pmatrix}.
\]
Now choose $G_{23}(\theta)$,
\[
G_{23}(\theta) = \frac{1}{\sqrt{3665}} \begin{pmatrix} \sqrt{3665} & 0 & 0 \\ 0 & -52 & -31 \\ 0 & 31 & -52 \end{pmatrix},
\]
to get
\[
R = A^{(3)} = G_{23}(\theta) A^{(2)} = \begin{pmatrix} 13 & 27 \\ 0 & \sqrt{3665} \\ 0 & 0 \end{pmatrix}.
\]
So
\[
R = \underbrace{G_{23}(\theta) G_{13}(\theta) G_{12}(\theta)}_{G} A
\]
with $G$ being orthogonal since Givens rotations are orthogonal. Then
\[
R = GA \;\Rightarrow\; G^T G A = G^T R \;\Rightarrow\; A = QR
\]
with $Q = G^T$.

In general, we want to solve $Ax = b$ for $A \in \mathbb{R}^{m\times n}$. We apply a sequence of Givens rotations $G$ to take $A$ to $R$ upper triangular, to get an equivalent system
\[
GAx = Rx = Gb = c .
\]
If $m > n$ and $c_i \neq 0$ for some $i = n+1, \dots, m$ then there is no solution $x$ to $Rx = c$ and the system is said to be inconsistent. Otherwise there exists a unique solution $x$ which can be found by backward substitution.
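The Givens reduction above can be sketched in plain Python (an illustrative implementation, not from the notes): each rotation zeroes one subdiagonal entry, accumulating the product $G$ so that $R = GA$. The input is the $3 \times 2$ matrix from the worked example.

```python
import math

def givens_qr(A):
    """Reduce A (m x n, m >= n) to upper triangular R by Givens rotations,
    accumulating the orthogonal product G with R = G A; returns (G, R).
    Entries (p, k) with p > k are zeroed column by column."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    G = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(n):
        for p in range(k + 1, m):
            r = math.hypot(R[k][k], R[p][k])
            if r == 0.0:
                continue
            c, s = R[k][k] / r, R[p][k] / r     # cos(theta), sin(theta)
            for M in (R, G):                    # rotate rows k and p
                for j in range(len(M[0])):
                    mk, mp = M[k][j], M[p][j]
                    M[k][j] = c * mk + s * mp
                    M[p][j] = -s * mk + c * mp
    return G, R

A = [[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]]
G, R = givens_qr(A)
```

Running this reproduces the hand computation: $R_{11} = 13$, $R_{12} = 27$, $R_{22} = \sqrt{3665}$, and the entries below the diagonal are zero.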
1.4 Cauchy-Schwarz inequality

For vectors $a, b \in \mathbb{R}^3$,
\[
a \cdot b = |a||b| \cos\theta .
\]
We generalize this to $\mathbb{R}^n$.

Theorem 1.4 (Cauchy-Schwarz inequality). For $a, b \in \mathbb{R}^n$
\[
|\langle a, b\rangle| \le \|a\| \|b\|
\]
with equality iff $a$ and $b$ are linearly dependent.

Proof. If $a = 0$ then $\langle a, b\rangle = 0$ for all $b \in \mathbb{R}^n$ and so the inequality is trivially true. If $a \neq 0$ then let $q = \frac{a}{\|a\|}$ and $c = b - \langle b, q\rangle q$, so that
\[
\langle c, q\rangle = \langle b, q\rangle - \langle b, q\rangle \langle q, q\rangle = 0 .
\]
We have
\begin{align*}
0 \le \|c\|^2 = \langle c, c\rangle
&= \langle c, b - \langle b, q\rangle q\rangle \\
&= \langle c, b\rangle - \langle b, q\rangle \langle c, q\rangle \\
&= \langle c, b\rangle \\
&= \langle b - \langle b, q\rangle q, b\rangle \\
&= \|b\|^2 - \bigl[\langle b, q\rangle\bigr]^2 \\
&= \|b\|^2 - \bigl[\langle b, a\rangle\bigr]^2 / \|a\|^2 ,
\end{align*}
hence $|\langle a, b\rangle| \le \|a\| \|b\|$, with equality iff $c = 0$, i.e.
\[
b = \langle b, q\rangle q = \langle b, a\rangle a / \|a\|^2 ,
\]
i.e. $a, b$ are linearly dependent.
1.5 Gradients and Hessians

For a function of one variable $f : \mathbb{R} \to \mathbb{R}$ we have a Taylor series
\[
f(a + h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + O(h^3) .
\]
Now consider functions of $n$ variables, i.e. $f : \mathbb{R}^n \to \mathbb{R}$. Write $f(x)$ where $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$. We define the partial derivative $\frac{\partial f}{\partial x_i}$ of $f$ with respect to $x_i$ to be the derivative of $f$ when taking all $x_j$, $j \neq i$, as constants.

Example. For $n = 2$, $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $f(x) = f(x_1, x_2) = \sin x_1 \sin x_2$. Then the first derivatives are
\[
\frac{\partial f}{\partial x_1}(x) = \cos x_1 \sin x_2, \qquad
\frac{\partial f}{\partial x_2}(x) = \sin x_1 \cos x_2 .
\]
Generally, the second derivatives are
\[
\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\Bigl(\frac{\partial f}{\partial x_j}(x)\Bigr) = \frac{\partial}{\partial x_j}\Bigl(\frac{\partial f}{\partial x_i}(x)\Bigr)
\]
for $i, j = 1, \dots, n$ and $f$ sufficiently smooth.

Example. $f(x) = \sin x_1 \cos x_2$. Then
\[
\frac{\partial^2 f}{\partial x_1^2}(x) = \frac{\partial}{\partial x_1}\Bigl(\frac{\partial f}{\partial x_1}(x)\Bigr) = -\sin x_1 \cos x_2 ,
\qquad
\frac{\partial^2 f}{\partial x_2^2}(x) = \frac{\partial}{\partial x_2}\Bigl(\frac{\partial f}{\partial x_2}(x)\Bigr) = -\sin x_1 \cos x_2 ,
\]
\[
\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \frac{\partial}{\partial x_2}\Bigl(\frac{\partial f}{\partial x_1}(x)\Bigr) = -\cos x_1 \sin x_2 .
\]
Chain Rule

For $f : \mathbb{R} \to \mathbb{R}$, $f(x)$, we can change the variable $x$ so that $x = x(t)$ or $t = t(x)$ and define $w(t) = f(x(t))$. Then
\[
\frac{dw}{dt}(t) = \frac{df}{dx}(x(t)) \frac{dx}{dt}(t) .
\]
Generalize to $n$ variables: if $w(t) = f(x(t))$ then
\[
\frac{dw}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x(t)) \frac{dx_i}{dt}(t) .
\]

Example. For $n = 2$, $f(x) = \sin x_1 \sin x_2$, $x_1(t) = t^2$, $x_2(t) = \cos t$, and hence $w(t) = \sin(t^2)\sin(\cos t)$. We have
\[
\frac{dw}{dt} = 2t \cos(t^2)\sin(\cos t) + \sin(t^2)\cos(\cos t)(-\sin t)
= \frac{\partial f}{\partial x_1}(x(t))\frac{dx_1}{dt}(t) + \frac{\partial f}{\partial x_2}(x(t))\frac{dx_2}{dt}(t) .
\]
For general $w(t) = f(a + th)$,
\[
\frac{d^m w}{dt^m} = \Bigl(\sum_{i=1}^{n} h_i \frac{\partial}{\partial x_i}\Bigr)^m f(a + th) .
\]
Now we can generalize the Taylor series to get
\[
f(a + h) = f(a) + \sum_{i=1}^{n} h_i \frac{\partial}{\partial x_i} f(a)
+ \frac{1}{2} \Bigl(\sum_{i=1}^{n} h_i \frac{\partial}{\partial x_i}\Bigr)\Bigl(\sum_{j=1}^{n} h_j \frac{\partial}{\partial x_j}\Bigr) f(a) + O(\|h\|^3) .
\]
Definition. For a function $f : \mathbb{R}^n \to \mathbb{R}$, call the vector $\nabla f(x) \in \mathbb{R}^n$,
\[
\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{pmatrix},
\]
the gradient of $f$ at $x$.

Definition. For a function $f : \mathbb{R}^n \to \mathbb{R}$, call the matrix $D^2 f(x) \in \mathbb{R}^{n\times n}$,
\[
\bigl[D^2 f(x)\bigr]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),
\]
the Hessian of $f$ at $x$.

We can now rewrite the Taylor series as
\[
f(a + h) = f(a) + h^T \nabla f(a) + \frac{1}{2} h^T D^2 f(a) h + O(\|h\|^3) .
\]
Example. Let $f(x) = x^T A x$ for all $x \in \mathbb{R}^n$, where $A \in \mathbb{R}^{n\times n}$ is a given symmetric matrix. Find $\nabla f(x)$ and $D^2 f(x)$.

We get
\[
f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j
\]
and so
\[
\frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} \frac{\partial}{\partial x_p}(x_i x_j)
= \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} \Bigl[ x_j \frac{\partial x_i}{\partial x_p} + x_i \frac{\partial x_j}{\partial x_p} \Bigr].
\]
Also
\[
\frac{\partial x_i}{\partial x_p} = \begin{cases} 1 & \text{if } i = p, \\ 0 & \text{if } i \neq p \end{cases} \;=\; \delta_{ip} .
\]
Therefore
\[
\frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} (\delta_{ip} x_j + x_i \delta_{jp})
= \sum_{j=1}^{n} A_{pj} x_j + \sum_{i=1}^{n} A_{ip} x_i
\]
and therefore
\[
[\nabla f(x)]_p = [Ax]_p + [A^T x]_p , \qquad
\nabla f(x) = Ax + A^T x = 2Ax \ \text{ if } A \text{ is symmetric.}
\]
For the Hessian we get
\[
\frac{\partial^2 f}{\partial x_q \partial x_p}(x) = \frac{\partial}{\partial x_q}\Bigl(\frac{\partial f}{\partial x_p}(x)\Bigr)
= \frac{\partial}{\partial x_q}\bigl([Ax]_p + [A^T x]_p\bigr)
= \frac{\partial}{\partial x_q}\Bigl(\sum_{j=1}^{n} A_{pj} x_j + \sum_{i=1}^{n} A_{ip} x_i\Bigr)
= A_{pq} + (A^T)_{pq}
\]
and so for $A$ symmetric, $D^2 f(x) = 2A$. Note the analogy with derivatives of functions of one variable:
\[
f(x) = a x^2, \qquad f(x) = x^T A x,
\]
\[
f''(x) = 2a, \qquad D^2 f(x) = 2A .
\]
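The identity $\nabla f(x) = 2Ax$ for symmetric $A$ can be checked numerically with central finite differences; a minimal sketch (the matrix and evaluation point below are arbitrary illustrative choices, not from the notes):

```python
def f(x, A):
    # f(x) = x^T A x
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def grad_fd(x, A, h=1e-6):
    # central-difference approximation of the gradient of f at x
    n = len(x)
    g = []
    for p in range(n):
        xp, xm = x[:], x[:]
        xp[p] += h
        xm[p] -= h
        g.append((f(xp, A) - f(xm, A)) / (2 * h))
    return g

A = [[2.0, 1.0], [1.0, 3.0]]                      # symmetric test matrix
x = [0.5, -1.0]
g = grad_fd(x, A)
exact = [2 * sum(A[p][j] * x[j] for j in range(2)) for p in range(2)]  # 2Ax
```

Since $f$ is quadratic, the central difference agrees with $2Ax$ up to rounding error.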
Definition. A function $f : \mathbb{R}^n \to \mathbb{R}$ has a local maximum [minimum] at $a$ if for all $u \in \mathbb{R}^n$, $\|u\| = 1$, there exists $\epsilon > 0$ such that
\[
f(a + hu) \le f(a) \quad [\ge]
\]
for all $h \in [0, \epsilon]$.

For $n = 1$, $f'(a) = 0$ and
\[
f(a + h) = f(a) + h f'(a) + \frac{1}{2} h^2 f''(a) + O(h^3)
= f(a) + \frac{1}{2} h^2 f''(a) + O(h^3)
\;\le [\ge]\; f(a)
\]
for small $h$.
Proposition 1.5. For $f : \mathbb{R}^n \to \mathbb{R}$, if $\nabla f(a) \neq 0$ then $f(x)$ does not have a local minimum or maximum at $x = a$, i.e. $\nabla f(a) = 0$ is a necessary condition for $f(x)$ to have a local minimum or maximum at $x = a$.

Proof. We show that $f$ does not have a maximum at $a$ (the argument for a minimum is analogous). Let $h \ge 0$ and consider
\[
f(a + hu) = f(a) + h u^T \nabla f(a) + O(h^2) .
\]
Let
\[
u = \frac{\nabla f(a)}{\|\nabla f(a)\|}
\]
so that $\|u\| = 1$. Then
\[
f(a + hu) = f(a) + h \frac{\|\nabla f(a)\|^2}{\|\nabla f(a)\|} + O(h^2)
= f(a) + \underbrace{h \|\nabla f(a)\|}_{> 0} + O(h^2) > f(a)
\]
for small $h > 0$.
Points $a$ where $\nabla f(a) = 0$ are called stationary points of $f(x)$.
Proposition 1.6. If $\nabla f(a) = 0$ and $w^T D^2 f(a) w > 0$ [$< 0$] for all $w \in \mathbb{R}^n$, $w \neq 0$, then $f(x)$ has a local minimum [maximum] at $x = a$.

Proof. Take $u$ such that $\|u\| = 1$ (and so $u \neq 0$). Then
\[
f(a + hu) = f(a) + h \underbrace{u^T \nabla f(a)}_{= 0} + \frac{1}{2} \underbrace{h^2}_{\ge 0} \underbrace{u^T D^2 f(a) u}_{> 0\ [< 0]} + O(h^3)
\;\ge [\le]\; f(a) .
\]
Example. Consider $f : \mathbb{R}^2 \to \mathbb{R}$ with a stationary point at $a = (1, 1)$ whose Hessian there is
\[
D^2 f(a) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\[2pt]
\frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2}
\end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2 I^{(2)} .
\]
Check that for all $w \in \mathbb{R}^2$, $w \neq 0$,
\[
w^T D^2 f(a) w = 2 w^T w = 2\|w\|^2 > 0 .
\]
So $f$ has a local minimum at $(1, 1)$.
Definition. Call a matrix $A \in \mathbb{R}^{n\times n}$

- positive definite if $x^T A x > 0$,
- negative definite if $x^T A x < 0$,
- non-negative definite if $x^T A x \ge 0$,
- non-positive definite if $x^T A x \le 0$,

for all $x \in \mathbb{R}^n$, $x \neq 0$.
Note. Clearly, a positive (negative) definite matrix $A \in \mathbb{R}^{n\times n}$ is invertible, since there is no $x \in \mathbb{R}^n$, $x \neq 0$, such that $Ax = 0$; if there were, then $x^T A x = x^T 0 = 0$, a contradiction.
Example. For $n = 2$, $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$,
\[
x^T A x = x_1^2 + x_2^2 - 2 x_1 x_2 = (x_1 - x_2)^2 \ge 0,
\]
so $A$ is non-negative definite but not positive definite.
Using this definition, we can restate Proposition 1.6:

Proposition 1.7. If $\nabla f(a) = 0$ and $D^2 f(a)$ is positive [negative] definite then $a$ is a local minimum [maximum] of $f$.
1.6 Generalized inner product

Definition. Let $A \in \mathbb{R}^{n\times n}$ be a symmetric positive definite matrix. Define the inner product $\langle \cdot, \cdot\rangle_A$ by
\[
\langle v, u\rangle_A = u^T A v
\]
for all $v, u \in \mathbb{R}^n$.

Note. We previously worked with $\langle v, u\rangle_I = u^T v$.

Check that the required properties of an inner product still hold:

symmetry:
\[
\langle u, v\rangle_A = v^T A u = (v^T A u)^T = u^T A^T v = u^T A v = \langle v, u\rangle_A ,
\]

linearity:
\[
\langle u, \alpha v + \beta w\rangle_A = \alpha \langle u, v\rangle_A + \beta \langle u, w\rangle_A , \qquad
\langle \alpha u + \beta v, w\rangle_A = \alpha \langle u, w\rangle_A + \beta \langle v, w\rangle_A
\]
for all $u, v, w \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$.
Definition. For a symmetric positive definite matrix $A \in \mathbb{R}^{n\times n}$ define the length $\|\cdot\|_A$ of a vector $u \in \mathbb{R}^n$ as
\[
\|u\|_A = \bigl(\langle u, u\rangle_A\bigr)^{1/2} .
\]
Theorem 1.8 (Generalised Cauchy-Schwarz inequality). If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite then
\[
|\langle a, b\rangle_A| \le \|a\|_A \|b\|_A
\]
for all $a, b \in \mathbb{R}^n$, with equality iff $a, b$ are linearly dependent.

Proof. Replace $\langle \cdot, \cdot\rangle$ by $\langle \cdot, \cdot\rangle_A$ and $\|\cdot\|$ by $\|\cdot\|_A$ in the proof of the Cauchy-Schwarz inequality.
1.7 Cholesky Factorization

An easy method of generating symmetric and positive definite matrices:

Proposition 1.9. If $P \in \mathbb{R}^{n\times n}$ is invertible, then $A = P^T P$ is symmetric and positive definite.

Proof. The matrix $A$ is symmetric since
\[
A^T = (P^T P)^T = P^T P = A .
\]
It is positive definite since
\[
x^T A x = x^T (P^T P) x = (Px)^T (Px) = \|Px\|^2 \ge 0
\]
for all $x \in \mathbb{R}^n$. Also if $x^T A x = 0$ then $\|Px\| = 0$, so $Px = 0$ and hence $x = 0$ since $P$ is invertible.
We now prove the reverse direction.

Cholesky Factorisation

Theorem 1.10. Let $A \in \mathbb{R}^{n\times n}$ be any symmetric positive definite matrix. Then there exists an invertible $P \in \mathbb{R}^{n\times n}$ such that $A = P^T P$. Furthermore, we can choose $P$ to be upper triangular with $P_{ii} > 0$, $i = 1, \dots, n$, in which case we say that $A = P^T P$ is a Cholesky Factorisation (Decomposition) of $A$.
Algorithm 2 Apply CGS with $\langle \cdot, \cdot\rangle_A$ to $\{v_i\}_{i=1}^{n}$
1: $w_1 = v_1$
2: $u_1 = w_1 / \|w_1\|_A$
3: for $k = 2$ to $n$ do
4:   $w_k = v_k - \sum_{j=1}^{k-1} \langle v_k, u_j\rangle_A u_j$
5:   $u_k = w_k / \|w_k\|_A$
6: end for
Proof. Let $\{v_i\}_{i=1}^{n}$ be any $n$ linearly independent vectors in $\mathbb{R}^n$. Using the inner product induced by $A$, we apply Gram-Schmidt (with this inner product) to $\{v_i\}_{i=1}^{n}$ to get $\{u_i\}_{i=1}^{n}$. Let $U = \begin{pmatrix} u_1 & \dots & u_n \end{pmatrix} \in \mathbb{R}^{n\times n}$.

Then (this is a proof of 1.1 generalized)
\[
\bigl[U^T A U\bigr]_{ij} = u_i^T A u_j = \langle u_i, u_j\rangle_A = \delta_{ij}
\]
for $i, j = 1, \dots, n$. So $U^T A U = I^{(n)}$.

Does $U^{-1}$ exist? This requires $\{u_i\}_{i=1}^{n}$ to be linearly independent. Suppose there exists $c \in \mathbb{R}^n$ such that $\sum_{i=1}^{n} c_i u_i = 0$. Then
\[
\sum_{i=1}^{n} c_i A u_i = A 0 = 0
\;\Rightarrow\;
u_j^T \sum_{i=1}^{n} c_i A u_i = 0
\;\Rightarrow\;
\sum_{i=1}^{n} c_i \langle u_i, u_j\rangle_A = 0
\;\Rightarrow\;
c_j = 0
\]
for $j = 1, \dots, n$, and so $c = 0$ and $\{u_i\}_{i=1}^{n}$ are linearly independent.

So $U^{-1}$ exists and
\[
U^{-1} U = I^{(n)} = \bigl[I^{(n)}\bigr]^T = \bigl[U^{-1} U\bigr]^T = U^T (U^{-1})^T ,
\]
and therefore $(U^T)^{-1} = (U^{-1})^T$. We let $P = U^{-1}$ (so $P$ is invertible). Observe that
\[
P^T = (U^{-1})^T = (U^T)^{-1} .
\]
Therefore
\[
P^T P = P^T I^{(n)} P = P^T U^T A U P = A .
\]
To find $P$ upper triangular with $P_{ii} > 0$, we need to choose $\{v_i\}_{i=1}^{n}$ to be a particular basis for $\mathbb{R}^n$: for $i = 1, \dots, n$ let $v_i = e^{(n)}_i$ (where $(e^{(n)}_i)_j = \delta_{ij}$ for $i, j = 1, \dots, n$). Clearly, the matrix $U$ from CGS is upper triangular since each $u_i$ is a linear combination of $e^{(n)}_1, \dots, e^{(n)}_i$. To show that $U_{ii} > 0$, observe that $U_{ii} = (u_i)_i = (w_i / \|w_i\|_A)_i$ and that
\[
w_i = e^{(n)}_i - \sum_{j=1}^{i-1} \langle e^{(n)}_i, u_j\rangle_A u_j .
\]
Since $(u_j)_k = 0$ for $k > j$, we have that $(w_i)_i = (e^{(n)}_i)_i = 1$. Hence $U$ is upper triangular with $U_{ii} > 0$.

Now choose $P$ to be $U^{-1}$. Then
\[
UP = I^{(n)}
\;\Rightarrow\;
U \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix} = \begin{pmatrix} e^{(n)}_1 & \dots & e^{(n)}_n \end{pmatrix}.
\]
For each $i = 1, \dots, n$ solve $U p_i = e^{(n)}_i$: clearly $(p_i)_j = 0$ for $j = i + 1, \dots, n$ and $(p_i)_i = 1 / U_{ii} > 0$, so $P$ is upper triangular with $P_{ii} > 0$ for $i = 1, \dots, n$.
Proposition 1.11. Let $A \in \mathbb{R}^{n\times n}$ be symmetric positive definite. Then $A_{kk} > 0$ for $k = 1, \dots, n$ and $|A_{jk}| < (A_{jj})^{1/2} (A_{kk})^{1/2}$ for $j, k = 1, \dots, n$, $j \neq k$.

Proof. Since $A$ is symmetric positive definite, by the previous theorem there exists an invertible $P$ such that $A = P^T P$. Let
\[
P = \begin{pmatrix} p_1 & \dots & p_n \end{pmatrix}.
\]
Then
\[
A_{jk} = p_j^T p_k = \langle p_j, p_k\rangle
\]
for $j, k = 1, \dots, n$. So $A_{kk} = \|p_k\|^2 > 0$ as $p_k \neq 0$ ($P$ is invertible and so $\{p_i\}_{i=1}^{n}$ are linearly independent).

Also
\[
|A_{jk}| = |\langle p_j, p_k\rangle| < \|p_j\| \|p_k\| = (A_{jj})^{1/2} (A_{kk})^{1/2}
\]
by Cauchy-Schwarz (strict inequality as $p_j$ and $p_k$ are linearly independent).
Computing the Cholesky Decomposition

Given $A$ symmetric positive definite, we can find $L = P^T$ lower triangular with $L_{ii} > 0$ such that $A = L L^T$ by applying CGS with $\langle \cdot, \cdot\rangle_A$ to $\{e_i\}_{i=1}^{n}$ to get $\{u_i\}_{i=1}^{n}$ and putting $P = U^{-1} = [u_1, \dots, u_n]^{-1}$.

There is an easier way. Let $L = [l_1, \dots, l_n] \in \mathbb{R}^{n\times n}$ and $A = L L^T$. Then
\[
A_{ij} = \sum_{k=1}^{n} L_{ik} (L^T)_{kj} = \sum_{k=1}^{n} (l_k)_i (l_k)_j . \qquad (*)
\]
Also
\[
(l_k l_k^T)_{ij} = (l_k)_i (l_k^T)_j = (l_k)_i (l_k)_j .
\]
So from $(*)$ we get
\[
A_{ij} = \sum_{k=1}^{n} (l_k l_k^T)_{ij}
\quad\Rightarrow\quad
A = \sum_{k=1}^{n} l_k l_k^T .
\]
Example. For $n = 3$, find a Cholesky Decomposition of
\[
A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{5}{2} & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix},
\]
i.e. find lower triangular $L$, $L_{ii} > 0$, $i = 1, \dots, n$, such that $A = L L^T$.

We need to check that $A$ is symmetric (clear) and positive definite (it is good to verify the conditions from 1.11). Take arbitrary $x \in \mathbb{R}^3$. Firstly,
\begin{align*}
x^T A x &= \sum_{i=1}^{3} \sum_{j=1}^{3} A_{ij} x_i x_j
= 2x_1^2 + \tfrac{5}{2} x_2^2 + \tfrac{5}{2} x_3^2 - 2 x_1 x_2 - 2 x_2 x_3 \\
&\ge 2x_1^2 + \tfrac{5}{2} x_2^2 + \tfrac{5}{2} x_3^2 - (x_1^2 + x_2^2) - (x_2^2 + x_3^2) \\
&= x_1^2 + \tfrac{1}{2} x_2^2 + \tfrac{3}{2} x_3^2 > 0
\end{align*}
for $x \neq 0$.

So let $L = [l_1, l_2, l_3]$ be lower triangular. Then
\[
A = L L^T = \sum_{k=1}^{3} l_k l_k^T = l_1 l_1^T + l_2 l_2^T + l_3 l_3^T .
\]
Since $L$ is lower triangular,
\[
l_1 l_1^T = \begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}, \quad
l_2 l_2^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & a^2 & ab \\ 0 & ab & b^2 \end{pmatrix}, \quad
l_3 l_3^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & * \end{pmatrix}.
\]
Therefore the first column of $A$ is generated by $l_1$ alone, i.e.
\[
l_1 = \frac{A e_1}{\sqrt{A_{11}}} = \frac{A e_1}{\|e_1\|_A} :
\]
$(l_1)_1 (l_1)_1 = 2$, $(l_1)_1 (l_1)_2 = -1$, $(l_1)_1 (l_1)_3 = 0$. Thus $l_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}$, a multiple of the first column of $A$.

Define $A^{(1)}$ so that $A^{(1)} = l_2 l_2^T + l_3 l_3^T$:
\[
A^{(1)} = A - l_1 l_1^T = A - \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix}.
\]
By the same reasoning $l_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}$, a multiple of the second column of $A^{(1)}$.

Define $A^{(2)}$ so that $A^{(2)} = l_3 l_3^T$:
\[
A^{(2)} = A^{(1)} - l_2 l_2^T = A^{(1)} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{1}{2} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}
\]
and so $l_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}$, a multiple of the third column of $A^{(2)}$.

Putting these together gives
\[
L = \frac{1}{\sqrt{2}} \begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}.
\]
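The constructive "peel off one outer product at a time" procedure in the example above can be sketched in plain Python (an illustrative implementation, not from the notes), and checked on the same $3 \times 3$ matrix:

```python
import math

def cholesky_outer(A):
    """Outer-product Cholesky following the constructive algorithm:
    l_k is the k-th column of A^(k-1) divided by the square root of its
    diagonal entry, then A^(k) = A^(k-1) - l_k l_k^T."""
    n = len(A)
    M = [row[:] for row in A]          # M plays the role of A^(k-1)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        d = math.sqrt(M[k][k])
        for i in range(k, n):
            L[i][k] = M[i][k] / d      # l_k = A^(k-1) e_k / sqrt(A^(k-1)_kk)
        for i in range(n):             # subtract the outer product l_k l_k^T
            for j in range(n):
                M[i][j] -= L[i][k] * L[j][k]
    return L

A = [[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]]
L = cholesky_outer(A)
```

The result matches the hand computation: $L_{11} = 2/\sqrt{2} = \sqrt{2}$, and $L L^T$ reproduces $A$.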
Now consider the above constructive algorithm in the general case, i.e. $A \in \mathbb{R}^{n\times n}$ symmetric positive definite. Since $A_{11} > 0$, we can start the algorithm by defining
\[
l_1 = \frac{A e_1}{\sqrt{A_{11}}} .
\]
Then $A^{(1)} = A - l_1 l_1^T$ is symmetric (since $A$ and $l_1 l_1^T$ are symmetric) and has the form
\[
A^{(1)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix}
\]
with $B$ symmetric. To continue, we need to show that $B$ is positive definite and so $B_{kk} > 0$.

Theorem 1.12. The matrix $B \in \mathbb{R}^{(n-1)\times(n-1)}$ defined above is positive definite.

Proof. We need to show that $u^T B u > 0$ for all $u \in \mathbb{R}^{n-1}$, $u \neq 0$. Take $u \in \mathbb{R}^{n-1}$, $u \neq 0$. Construct $v = \begin{pmatrix} 0 \\ u \end{pmatrix} \in \mathbb{R}^n$ (hence $v \neq 0$); $e_1^T v = 0$ means that $e_1$ and $v$ are linearly independent. Then
\[
A^{(1)} = A - \frac{(A e_1)(A e_1)^T}{\|e_1\|_A^2}, \qquad
v^T A^{(1)} v = u^T B u .
\]
So
\[
u^T B u = v^T A v - \frac{(e_1^T A v)^2}{\|e_1\|_A^2}
= \frac{\|v\|_A^2 \|e_1\|_A^2 - \bigl[\langle e_1, v\rangle_A\bigr]^2}{\|e_1\|_A^2} .
\]
By Cauchy-Schwarz, $|\langle e_1, v\rangle_A| < \|e_1\|_A \|v\|_A$. Hence $u^T B u > 0$.

Also $B_{11} > 0$ and so $A^{(1)}_{22} > 0$; the procedure can continue.
Application of the Cholesky Decomposition

Given $A \in \mathbb{R}^{n\times n}$ symmetric positive definite, we can find $L$ lower triangular with $L_{ii} > 0$ such that $A = L L^T$. To solve $Ax = b$ for given $b \in \mathbb{R}^n$: we get
\[
L L^T x = b
\]
and let $z = L^T x$. Solve $L z = b$ by forward substitution,
\[
z_1 = b_1 / L_{11}, \qquad
z_k = \Bigl(b_k - \sum_{j=1}^{k-1} L_{kj} z_j\Bigr) / L_{kk}
\]
for $k = 2, \dots, n$. Having $z$, solve $L^T x = z$ by backward substitution.
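The two triangular solves can be sketched as follows (an illustrative implementation, not from the notes; the $2 \times 2$ factor $L$ and right-hand side below are made-up test data):

```python
def solve_cholesky(L, b):
    """Solve L L^T x = b: forward substitution for L z = b,
    then backward substitution for L^T x = z."""
    n = len(b)
    z = [0.0] * n
    for k in range(n):                 # forward: z_k = (b_k - sum L_kj z_j) / L_kk
        z[k] = (b[k] - sum(L[k][j] * z[j] for j in range(k))) / L[k][k]
    x = [0.0] * n
    for k in reversed(range(n)):       # backward: (L^T)_kj = L_jk
        x[k] = (z[k] - sum(L[j][k] * x[j] for j in range(k + 1, n))) / L[k][k]
    return x

L = [[2.0, 0.0], [1.0, 1.0]]           # A = L L^T = [[4, 2], [2, 2]]
x = solve_cholesky(L, [8.0, 6.0])      # b = A (1, 2)^T = (8, 6)^T
```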
1.8 Least Square Problems

Example. Take a pendulum with length $l$, measure the period $T$ and estimate $g$ (the acceleration due to gravity). Since $T = 2\pi\sqrt{l/g}$, we have
\[
L = \sqrt{l}, \qquad C = \frac{2\pi}{\sqrt{g}}, \qquad CL = T .
\]
Do $m$ experiments to get
\[
LC = T
\]
with $L, T \in \mathbb{R}^m$. Plot the data ($T_i$ against $L_i$) and fit a straight line through the data. Choose $C$ to minimize the sum of squares of the errors, i.e. such that
\[
\sum_{i=1}^{m} (T_i - C L_i)^2 = \|T - CL\|^2 = \langle T - CL, T - CL\rangle
= \|T\|^2 - 2C\langle L, T\rangle + C^2 \|L\|^2 = S
\]
is minimal. The derivative
\[
\frac{dS}{dC} = -2\langle L, T\rangle + 2C\|L\|^2
\]
equals 0 iff $C = \frac{\langle L, T\rangle}{\|L\|^2}$. Check the second derivative:
\[
\frac{d^2 S}{dC^2} = 2\|L\|^2 > 0 .
\]
Take $C^* = \frac{\langle L, T\rangle}{\|L\|^2}$. Then
\[
\langle T - C^* L, L\rangle = \langle T, L\rangle - C^* \|L\|^2 = 0,
\]
i.e. the choice of $C^*$ makes $T - C^* L$ perpendicular to $L$.
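The one-parameter fit $C^* = \langle L, T\rangle / \|L\|^2$ is a one-liner; the measurement values below are hypothetical, chosen only to illustrate the computation:

```python
# hypothetical measurements: Ls[i] = sqrt(l_i), Ts[i] = measured period
Ls = [1.0, 1.4, 1.7, 2.0]
Ts = [2.05, 2.82, 3.35, 4.10]

# C* = <L, T> / ||L||^2, the minimiser of S(C) = sum (T_i - C L_i)^2
C = sum(l * t for l, t in zip(Ls, Ts)) / sum(l * l for l in Ls)
```

The optimality condition can be verified directly: perturbing $C$ in either direction does not decrease the sum of squares.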
1.8.1 General Least Squares Case

Given $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) and $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ such that $Ax = b$. For $m > n$ there is in general no solution, as we have an overdetermined system. We are concerned with finding $x^* \in \mathbb{R}^n$ which minimizes $\|Ax - b\|$ over $x$. Let
\begin{align*}
Q(x) = \|Ax - b\|^2 &= \langle Ax - b, Ax - b\rangle \\
&= (Ax - b)^T (Ax - b) \\
&= (x^T A^T - b^T)(Ax - b) \\
&= x^T A^T A x - b^T A x - x^T A^T b + b^T b \\
&= x^T A^T A x - 2 b^T A x + \|b\|^2 \\
&= x^T G x - 2 \phi^T x + \|b\|^2
\end{align*}
where
\[
G = A^T A \in \mathbb{R}^{n\times n}, \qquad \phi = A^T b \in \mathbb{R}^n .
\]
Note that $G$ is symmetric.

Take derivatives of $Q$ to get
\[
\nabla Q(x) = 2(Gx - \phi), \qquad D^2 Q(x) = 2G .
\]
Theorem 1.13. Let $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) with linearly independent columns and $b \in \mathbb{R}^m$. Then $A^T A \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Moreover, the $x^* \in \mathbb{R}^n$ solving $A^T A x^* = A^T b$ is the unique minimum of $Q(x) = \|Ax - b\|^2$ over $x \in \mathbb{R}^n$.

Note. The equations $A^T A x^* = A^T b$ are called the normal equations.

Proof. Let $A = [a_1, \dots, a_n]$ with $\{a_i\}_{i=1}^{n}$ linearly independent. Then for any $c \in \mathbb{R}^n$
\[
c^T A^T A c = (Ac)^T Ac = \|Ac\|^2 \ge 0
\]
with equality iff $Ac = 0$, i.e. when $c = 0$ since $\{a_i\}_{i=1}^{n}$ is linearly independent. Hence $A^T A$ is positive definite.

To find the minimum of $Q(x)$, find $x^*$ such that $\nabla Q(x^*) = 0$ and $D^2 Q(x^*)$ is positive definite. We get
\[
\nabla Q(x) = 2(Gx - \phi) = 2(A^T A x - A^T b), \qquad
D^2 Q(x) = 2G = 2 A^T A .
\]
Therefore $x^*$ has to solve $A^T A x = A^T b$. As $A^T A$ is positive definite, $(A^T A)^{-1}$ exists, hence there exists a unique $x^*$ solving $A^T A x = A^T b$. As $D^2 Q(x^*)$ is positive definite, $x^*$ is the unique minimum.

Example. For a two-parameter fit $x^* \in \mathbb{R}^2$: solve the normal equations
\[
A^T A x^* = A^T b
\]
to get
\[
x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix}.
\]
In practice it is not a good idea to solve the normal equations, since the matrix $A^T A$ is generally badly conditioned. A matrix $B \in \mathbb{R}^{n\times n}$ is ill-conditioned if small changes to $b$ lead to large changes in the solution of $Bx = b$: if in
\[
B(x + \delta x) = b + \delta b
\]
$\delta x$ is large for small $\delta b$.

Instead, we can find $x^*$ with the QR approach of Section 1.3: apply Givens rotations to get $GAx = Rx = Gb = c$ and solve the first $n$ equations by backward substitution. We claim that the resulting $x^* \in \mathbb{R}^n$ is the least squares solution of $Ax = b$, and the norm of the remaining entries $c_{n+1}, \dots, c_m$ equals $\|Ax^* - b\|$.
1.9 A more abstract approach

A more abstract definition of the inner product:

Definition. Let $V$ be a real vector space. An inner product on $V \times V$ is a function $\langle \cdot, \cdot\rangle : V \times V \to \mathbb{R}$ such that, for all $u, v, w \in V$, $\alpha, \beta \in \mathbb{R}$,

(1) $\langle \alpha u + \beta v, w\rangle = \alpha \langle u, w\rangle + \beta \langle v, w\rangle$,

(2) $\langle u, v\rangle = \langle v, u\rangle$,

(3) $\langle u, u\rangle \ge 0$ with equality iff $u = 0$.

An inner product induces a norm $\|u\| = (\langle u, u\rangle)^{1/2}$ for all $u \in V$. This implies $\|u\| = 0$ iff $u = 0$.

Example. Let $V = C[a, b]$ be the continuous functions over $[a, b]$. Let $w \in C[a, b]$ with $w(x) > 0$ for all $x \in [a, b]$. Define $\langle f, g\rangle = \int_a^b w(x) f(x) g(x)\,dx$. Clearly (1) and (2) hold. Also
\[
\langle f, f\rangle = \int_a^b w(x) \bigl(f(x)\bigr)^2 dx \ge 0
\]
and $\langle f, f\rangle = 0$ implies $f = 0$.
Let $V$ be a real vector space with inner product $\langle \cdot, \cdot\rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\phi_i\}_{i=1}^{n}$. Given $v \in V$, find $u^* \in U$ such that $\|v - u^*\| \le \|v - u\|$ for all $u \in U$.

Example. Let $V = C[a, b]$ and $\langle f, g\rangle = \int_a^b f(x) g(x)\,dx$ (i.e. $w(x) = 1$). Let $U$ be the polynomials of degree $\le n - 1$ with basis $\phi_i = x^{i-1}$.
We have $u \in U$ implies $u = \sum_{i=1}^{n} \beta_i \phi_i$ with $\beta_i \in \mathbb{R}$. Also $u^* \in U$ implies $u^* = \sum_{i=1}^{n} \alpha_i \phi_i$ with $\alpha_i \in \mathbb{R}$. Therefore
\[
\|v - u^*\|^2 \le \|v - u\|^2
\;\Longleftrightarrow\;
\Bigl\| v - \sum_{i=1}^{n} \alpha_i \phi_i \Bigr\|^2 \le \Bigl\| v - \sum_{j=1}^{n} \beta_j \phi_j \Bigr\|^2 .
\]
Let $E(\beta) = \bigl\| v - \sum_{i=1}^{n} \beta_i \phi_i \bigr\|^2$. Now we have to find $\alpha \in \mathbb{R}^n$ such that $E(\alpha) \le E(\beta)$ for all $\beta \in \mathbb{R}^n$. We have
\begin{align*}
E(\beta) &= \Bigl\langle v - \sum_{j=1}^{n} \beta_j \phi_j, \; v - \sum_{i=1}^{n} \beta_i \phi_i \Bigr\rangle \\
&= \|v\|^2 - 2 \sum_{i=1}^{n} \beta_i \langle v, \phi_i\rangle + \sum_{i=1}^{n} \sum_{j=1}^{n} \beta_i \beta_j \langle \phi_i, \phi_j\rangle .
\end{align*}
Let $\gamma \in \mathbb{R}^n$ where $\gamma_i = \langle v, \phi_i\rangle$, and let $G \in \mathbb{R}^{n\times n}$ where $G_{ij} = \langle \phi_i, \phi_j\rangle$. Now we have
\[
E(\beta) = \|v\|^2 - 2\beta^T \gamma + \beta^T G \beta, \qquad
\nabla E(\beta) = -2\gamma + 2G\beta, \qquad
D^2 E(\beta) = 2G .
\]
So $\alpha^*$ minimises $E(\beta)$ if $\nabla E(\alpha^*) = 0$. This is equivalent to $G\alpha^* = \gamma$. The matrix $G$ is called the Gram matrix and depends on the basis for $U$. It is sometimes written as $G(\phi_1, \dots, \phi_n)$.
Lemma 1.14. Let $\{\phi_i\}_{i=1}^{n}$ be a basis of $U$. Let $G \in \mathbb{R}^{n\times n}$ be such that $G_{ij} = \langle \phi_i, \phi_j\rangle$. Then $G$ is positive definite.

Proof. Check that for any $\beta \in \mathbb{R}^n$
\[
\beta^T G \beta = \sum_{i=1}^{n} \sum_{j=1}^{n} \beta_i \beta_j \langle \phi_i, \phi_j\rangle
= \Bigl\langle \sum_{i=1}^{n} \beta_i \phi_i, \sum_{j=1}^{n} \beta_j \phi_j \Bigr\rangle
= \Bigl\| \sum_{i=1}^{n} \beta_i \phi_i \Bigr\|^2 \ge 0 .
\]
This only equals zero if $\sum_{i=1}^{n} \beta_i \phi_i = 0$. As the $\phi_i$s are linearly independent this implies $\beta = 0$. Therefore $\beta^T G \beta > 0$ for all $\beta \neq 0$.
As $G$ is positive definite, we can deduce that $G^{-1}$ exists, and therefore there is a unique $\alpha^* \in \mathbb{R}^n$ solving $G\alpha^* = \gamma$, i.e. $\nabla E(\alpha^*) = 0$, and therefore $\alpha^*$ is a global minimum of $E(\beta)$.
Theorem 1.15 (Orthogonality Property). Finding $\alpha^* \in \mathbb{R}^n$ which minimises $E(\beta)$ is equivalent to finding $u^* = \sum_{i=1}^{n} \alpha^*_i \phi_i \in U$ such that $\langle v - u^*, u\rangle = 0$ for all $u \in U$.

Proof. $G\alpha^* = \gamma$ implies that $\beta^T G \alpha^* = \beta^T \gamma$ for all $\beta \in \mathbb{R}^n$. Conversely, taking $\beta = e^{(n)}_i$ gives $(G\alpha^*)_i = \gamma_i$; repeating for $i = 1, \dots, n$ shows $G\alpha^* = \gamma$ is equivalent to $\beta^T G \alpha^* = \beta^T \gamma$ for all $\beta \in \mathbb{R}^n$. So
\[
\beta^T G \alpha^* = \beta^T \gamma
\;\Longleftrightarrow\;
\sum_{i=1}^{n} \sum_{j=1}^{n} \beta_i G_{ij} \alpha^*_j = \sum_{i=1}^{n} \beta_i \gamma_i
\;\Longleftrightarrow\;
\Bigl\langle \sum_{i=1}^{n} \beta_i \phi_i, \sum_{j=1}^{n} \alpha^*_j \phi_j \Bigr\rangle = \Bigl\langle \sum_{i=1}^{n} \beta_i \phi_i, v \Bigr\rangle
\;\Longleftrightarrow\;
\langle u, u^*\rangle = \langle u, v\rangle
\;\Longleftrightarrow\;
\langle v - u^*, u\rangle = 0,
\]
where $u = \sum_{i=1}^{n} \beta_i \phi_i$ ranges over $U$.
Example. Let $V = C[0, 1]$ and $\langle f, g\rangle = \int_0^1 f(x) g(x)\,dx$, and let $U = P_{n-1}$. Take $\phi_i = x^{i-1}$. Given $v \in V$, find $u^* = \sum_{i=1}^{n} \alpha^*_i x^{i-1}$ such that
\[
\|v - u^*\| \le \|v - u\|
\;\Longleftrightarrow\;
\|v - u^*\|^2 \le \|v - u\|^2
\;\Longleftrightarrow\;
\int_0^1 (v - u^*)^2 dx \le \int_0^1 (v - u)^2 dx
\]
for all $u \in U$. We now have to solve the normal equations $G\alpha^* = \gamma$, where
\[
\gamma_i = \langle v, \phi_i\rangle = \int_0^1 v(x) x^{i-1} dx, \qquad
G_{ij} = \langle \phi_i, \phi_j\rangle = \int_0^1 x^{i-1} x^{j-1} dx = \int_0^1 x^{i+j-2} dx = \frac{1}{i + j - 1} .
\]
This gives the Hilbert Matrix
\[
G = \begin{pmatrix}
1 & \frac{1}{2} & \dots & \frac{1}{n} \\[2pt]
\frac{1}{2} & \frac{1}{3} & \dots & \frac{1}{n+1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{n} & \frac{1}{n+1} & \dots & \frac{1}{2n-1}
\end{pmatrix},
\]
which is very badly conditioned, as its columns become nearly linearly dependent as $n \to \infty$.
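The near-singularity of the Hilbert matrix can be seen from its rapidly shrinking determinant. The sketch below (illustrative, not from the notes) builds $G_{ij} = 1/(i+j-1)$ exactly with rationals and evaluates small determinants by Laplace expansion:

```python
from fractions import Fraction

def hilbert(n):
    # exact Gram matrix of {x^(i-1)} on [0,1]: G_ij = 1/(i+j-1), 1-based
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def det(M):
    # Laplace expansion along the first row (fine for tiny matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

d2, d3, d4 = det(hilbert(2)), det(hilbert(3)), det(hilbert(4))
```

Already for $n = 2, 3$ the determinants are $1/12$ and $1/2160$, and they keep collapsing toward zero as $n$ grows.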
We need to change basis; we have two options:

1. We can use the Gram-Schmidt algorithm to change the basis to an orthonormal basis $\{\phi_i\}_{i=1}^{n}$ where $\langle \phi_i, \phi_j\rangle = \delta_{ij}$. This implies that $G = I$.

2. We can also create an orthogonal basis $\{\phi_i\}_{i=1}^{n}$ where $\langle \phi_i, \phi_j\rangle = 0$ for $i \neq j$. Now $G$ is diagonal with $G_{ii} = \|\phi_i\|^2 > 0$. We have
\[
\alpha^*_i = \frac{\gamma_i}{\|\phi_i\|^2}
\quad\text{and therefore}\quad
u^* = \sum_{i=1}^{n} \frac{\langle v, \phi_i\rangle}{\|\phi_i\|^2}\, \phi_i .
\]
Example. Let $V = \mathbb{R}^m$ and let $\langle a, b\rangle = a^T b$. Let $U = \mathrm{Span}\{a_i\}_{i=1}^{n}$ with $n \le m$, so that $\{a_i\}_{i=1}^{n}$ is a basis for $U$. Given $v \in \mathbb{R}^m$, we want to find $u^* = \sum_{i=1}^{n} \alpha^*_i a_i$ such that $\|v - u^*\| \le \|v - u\|$ for all $u \in U$, i.e. solve $G\alpha^* = \gamma$, where
\[
\gamma_i = \langle v, a_i\rangle = a_i^T v, \qquad
G_{ij} = \langle a_i, a_j\rangle = a_i^T a_j .
\]
Let $A = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}$, so $A^T A = G$ and $\gamma = A^T v$. We can deduce that
\[
A^T A \alpha^* = A^T v ,
\]
the normal equations for $A\alpha^* = v$. But $A^T A$ is ill-conditioned, so we shouldn't solve these normal equations; use the QR approach instead.
1.10 Orthogonal Polynomials

Let $V = C[a, b]$ and $\langle f, g\rangle = \int_a^b w(x) f(x) g(x)\,dx$, where $w$ is the weight function: $w \in C(a, b)$ with $w \ge 0$ and possibly a finite number of zeros. This is required for the integral to be well-defined:
\[
|\langle f, g\rangle| = \Bigl| \int_a^b w(x) f(x) g(x)\,dx \Bigr|
\le \int_a^b w(x) |f(x) g(x)|\,dx
\le \int_a^b w(x)\,dx \; \max_{a \le x \le b} |f(x)| \max_{a \le x \le b} |g(x)| .
\]
Therefore $\langle \cdot, \cdot\rangle$ is well-defined if $\int_a^b w(x)\,dx < \infty$.

Let $U = P_n$ be the polynomials of degree $\le n$. The natural basis $\{x^i\}_{i=0}^{n}$ leads to an ill-conditioned Gram matrix. We will construct a new basis $\{\phi_i\}_{i=0}^{n}$ for $P_n$, where $\phi_j(x)$ is a monic polynomial of degree $j$, i.e.
\[
\phi_j(x) = x^j + \sum_{i=0}^{j-1} a_{ij} x^i .
\]
Theorem 1.16. Monic orthogonal polynomials $\phi_j \in P_j$ satisfy the three term recurrence relation, for $j \ge 1$,
\[
\phi_{j+1}(x) = (x - a_j)\phi_j(x) - b_j \phi_{j-1}(x)
\]
where
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}
\quad\text{and}\quad
b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2} .
\]
Proof. Let $\phi_j \in P_j$ be monic. This implies that
\[
\underbrace{\phi_{j+1}(x) - x\phi_j(x)}_{\in P_j} = \sum_{k=0}^{j} c_k \phi_k(x)
\]
for some coefficients $c_k$, since $\{\phi_k\}_{k=0}^{j}$ is a basis for $P_j$. Now we need to find the $c_k$. We have
\[
\Bigl\langle \sum_{k=0}^{j} c_k \phi_k(x), \phi_i(x) \Bigr\rangle
= \langle \phi_{j+1}(x) - x\phi_j(x), \phi_i(x)\rangle .
\]
But $\phi_{j+1}$ is orthogonal to $\phi_k$ for $k = 0, \dots, j$; moreover $\phi_j$ is orthogonal to $\phi_k$ for $k = 0, \dots, j-1$, and therefore $\phi_j$ is orthogonal to any $p \in P_{j-1}$, as $\{\phi_k\}_{k=0}^{j-1}$ is a basis for $P_{j-1}$. Then for $i = 0, \dots, j$
\[
c_i \|\phi_i\|^2 = \langle \phi_{j+1}, \phi_i\rangle - \langle x\phi_j, \phi_i\rangle
= -\langle \phi_j, x\phi_i\rangle .
\]
We have $x\phi_i \in P_{i+1}$ and hence $\langle \phi_j, x\phi_i\rangle = 0$ if $i \le j - 2$. Since $c_i \|\phi_i\|^2 = -\langle \phi_j, x\phi_i\rangle$, we have $c_i = 0$ for $i = 0, \dots, j - 2$. Hence
\[
\phi_{j+1}(x) - x\phi_j(x) = c_{j-1}\phi_{j-1}(x) + c_j \phi_j(x),
\]
which implies
\[
\phi_{j+1}(x) = (x + c_j)\phi_j(x) + c_{j-1}\phi_{j-1}(x) .
\]
We have
\[
c_{j-1} = -\frac{\langle \phi_j, x\phi_{j-1}\rangle}{\|\phi_{j-1}\|^2}, \qquad
c_j = -\frac{\langle \phi_j, x\phi_j\rangle}{\|\phi_j\|^2} .
\]
Now note that
\[
\langle \phi_j, x\phi_{j-1}\rangle = \underbrace{\langle \phi_j, x\phi_{j-1} - \phi_j\rangle}_{= 0} + \langle \phi_j, \phi_j\rangle,
\]
since $x\phi_{j-1} - \phi_j \in P_{j-1}$. Therefore $c_{j-1} = -\frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}$. Set $b_j = -c_{j-1}$ and $a_j = -c_j$.
To apply this theorem we need $\phi_0(x) = 1$ and $\phi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ must be chosen such that $\langle \phi_1, \phi_0\rangle = 0$, i.e.
\[
\langle x - a_0, 1\rangle = 0
\;\Rightarrow\;
a_0 \langle 1, 1\rangle = \langle x, 1\rangle
\;\Rightarrow\;
a_0 = \frac{\langle x, 1\rangle}{\|1\|^2} = \frac{\langle x\phi_0, \phi_0\rangle}{\|\phi_0\|^2} .
\]
We use the theorem for $j \ge 0$ by setting $\phi_{-1}(x) = 0$. Thus
\[
\phi_{j+1}(x) = (x - a_j)\phi_j(x) - b_j \phi_{j-1}(x)
\]
where $j \ge 0$ and
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}, \qquad
b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}, \qquad
\phi_0(x) = 1, \quad \phi_{-1}(x) = 0 .
\]
Remark 1.2. Recall that $g(x)$ is even iff $g(-x) = g(x)$, in which case $\int_{-c}^{c} g(x)\,dx = 2\int_{0}^{c} g(x)\,dx$; and $g(x)$ is odd iff $g(-x) = -g(x)$, in which case $\int_{-c}^{c} g(x)\,dx = 0$.
Example. Let $\langle f, g \rangle = \int_{-1}^{1} f(x) g(x)\,dx$ be our inner product (i.e. $w(x) = 1$). We shall apply our method with $j = 0$ to this case. We have $\phi_{-1}(x) = 0$ and $\phi_1(x) = x - a_0$, where
$$a_0 = \frac{\langle x\phi_0, \phi_0 \rangle}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x\,dx}{\int_{-1}^{1} 1\,dx} = 0$$
(since $x$ is an odd function), which implies $\phi_1(x) = x$. Using the method with $j = 1$ we deduce $\phi_2(x) = (x - a_1)\phi_1(x) - b_1 \phi_0(x) = x^2 - a_1 x - b_1$. Then
$$a_1 = \frac{\langle x\phi_1, \phi_1 \rangle}{\|\phi_1\|^2} = \frac{\int_{-1}^{1} x^3\,dx}{\|\phi_1\|^2} = 0, \qquad b_1 = \frac{\|\phi_1\|^2}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x^2\,dx}{\int_{-1}^{1} 1\,dx} = \frac{1}{3}.$$
So $\phi_2(x) = x^2 - \frac{1}{3}$, and we can continue in this manner.
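The recurrence can be run numerically. The sketch below (not part of the notes) reproduces $\phi_2 = x^2 - 1/3$ and the next polynomial $\phi_3 = x^3 - (3/5)x$ for $w(x) = 1$ on $[-1, 1]$, evaluating the inner products with a high-order Gauss-Legendre rule, which is exact for the polynomial integrands here.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Monic orthogonal polynomials via phi_{j+1} = (x - a_j) phi_j - b_j phi_{j-1},
# with phi_{-1} = 0 and phi_0 = 1, for the weight w(x) = 1 on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(30)

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], via Gauss-Legendre."""
    return weights @ (p(nodes) * q(nodes))

x = P([0.0, 1.0])
prev, cur = P([0.0]), P([1.0])   # phi_{-1} and phi_0
polys = [cur]
for j in range(3):
    a = inner(x * cur, cur) / inner(cur, cur)
    b = inner(cur, cur) / inner(prev, prev) if j > 0 else 0.0  # b_0 unused
    prev, cur = cur, (x - a) * cur - b * prev
    polys.append(cur)

print(polys[2].coef)  # coefficients of phi_2: approximately [-1/3, 0, 1]
print(polys[3].coef)  # coefficients of phi_3: approximately [0, -3/5, 0, 1]
```

These are the monic Legendre polynomials, up to rounding error in the quadrature.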
Recall now our original problem. Given $f \in C[a, b]$, we wish to find $p_n^* \in P_n$ such that $\|f - p_n^*\| \le \|f - p_n\|$ for all $p_n \in P_n$.

We find an orthogonal basis $\{\phi_j\}_{j=0}^{n}$ for $P_n$ and write $p_n^* = \sum_{j=0}^{n} \alpha_j \phi_j(x)$. We solve the normal equations $G\alpha = \beta$ with $G \in \mathbb{R}^{(n+1)\times(n+1)}$, where for $i, j = 0, \dots, n$
$$G_{ij} = \langle \phi_i, \phi_j \rangle = \begin{cases} 0 & \text{if } i \ne j, \\ \|\phi_i\|^2 & \text{if } i = j, \end{cases} \qquad \beta_i = \langle f, \phi_i \rangle,$$
so that
$$\alpha_i = \frac{\beta_i}{G_{ii}} = \frac{\langle f, \phi_i \rangle}{\|\phi_i\|^2}.$$
This implies that
$$p_n^*(x) = \sum_{j=0}^{n} \frac{\langle f, \phi_j \rangle}{\|\phi_j\|^2}\, \phi_j(x)$$
is the best approximation to $f$.
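As an illustrative sketch (the function $f(x) = e^x$ and the basis below are example choices, not from the notes), the coefficient formula $\alpha_j = \langle f, \phi_j \rangle / \|\phi_j\|^2$ can be checked numerically: the residual $f - p_2^*$ should be orthogonal to every basis polynomial.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Best least-squares approximation p_2* to f(x) = exp(x) on [-1, 1], w = 1,
# using the monic orthogonal basis phi_0 = 1, phi_1 = x, phi_2 = x^2 - 1/3.
# The Gram matrix is diagonal, so alpha_j = <f, phi_j> / ||phi_j||^2.
nodes, weights = np.polynomial.legendre.leggauss(40)

def inner(u, v):
    return weights @ (u(nodes) * v(nodes))

f = np.exp
basis = [P([1.0]), P([0.0, 1.0]), P([-1/3, 0.0, 1.0])]
alpha = [inner(f, phi) / inner(phi, phi) for phi in basis]
p_star = sum(a * phi for a, phi in zip(alpha, basis))

# The residual f - p_2* is orthogonal to each basis polynomial.
residual = lambda t: f(t) - p_star(t)
print([inner(residual, phi) for phi in basis])  # each entry numerically zero
```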
Example. Show that the polynomials $T_k(x) = \cos(k \cos^{-1}(x))$ for $-1 \le x \le 1$ are orthogonal with respect to the inner product $\langle f, g \rangle = \int_{-1}^{1} (1 - x^2)^{-1/2} f(x) g(x)\,dx$. Does $T_k(x)$ belong to $P_k$?

We have
$$T_0(x) = \cos 0 = 1, \qquad T_1(x) = \cos(\cos^{-1} x) = x.$$
Let us use the change of variable $\theta = \cos^{-1} x$, so $x = \cos\theta$. Now we can write $T_k(x) = \cos k\theta$. Using $\cos((k+1)\theta) + \cos((k-1)\theta) = 2 \cos k\theta \cos\theta$, we deduce
$$T_{k+1}(x) + T_{k-1}(x) = 2x T_k(x), \qquad \text{i.e.} \qquad T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x).$$
We have
$$T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1, \qquad T_3(x) = 2xT_2(x) - T_1(x) = 2^2 x^3 - 3x.$$
By induction $T_k(x) \in P_k$, and the coefficient of $x^k$ is $2^{k-1}$. Using $x = \cos\theta$ (so that $dx = -\sin\theta\,d\theta$ and $(1 - x^2)^{-1/2} = (\sin\theta)^{-1}$ on $(0, \pi)$, which cancel),
$$\langle T_k, T_j \rangle = \int_{-1}^{1} (1 - x^2)^{-1/2} T_k(x) T_j(x)\,dx = \int_{0}^{\pi} \cos(k\theta)\cos(j\theta)\,d\theta = \frac{1}{2} \int_{0}^{\pi} \left[ \cos((j+k)\theta) + \cos((j-k)\theta) \right] d\theta = \begin{cases} 0 & \text{if } j \ne k, \\ \pi/2 & \text{if } j = k \ne 0, \\ \pi & \text{if } j = k = 0. \end{cases}$$
We call the $T_k(x)$ the Chebyshev polynomials.
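The orthogonality relations can be checked numerically through the same $\theta$-substitution; this is an illustrative sketch (grid size and tolerances are assumptions).

```python
import numpy as np

# Check <T_k, T_j> = integral over [0, pi] of cos(k t) cos(j t) dt,
# using a composite trapezoidal rule on a fine uniform grid.
t = np.linspace(0.0, np.pi, 20001)

def integrate(y):
    """Composite trapezoidal rule on the uniform grid t."""
    dt = t[1] - t[0]
    return dt * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def chebyshev_inner(k, j):
    return integrate(np.cos(k * t) * np.cos(j * t))

print(chebyshev_inner(2, 3))  # close to 0      (j != k)
print(chebyshev_inner(3, 3))  # close to pi/2   (j = k != 0)
print(chebyshev_inner(0, 0))  # close to pi     (j = k = 0)
```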
Chapter 2
Polynomial interpolation
Given data $(z_j, f_j)_{j=0}^{n}$ with $z_j, f_j \in \mathbb{C}$ and the $z_j$ distinct, we want to find a polynomial $p_n(z) \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0, \dots, n$. We call such a $p_n$ the interpolating polynomial. To prove that this polynomial exists:
Lemma 2.1 (Lagrange Basis Function). Let
$$l_j(z) = \prod_{k=0,\,k\ne j}^{n} \frac{z - z_k}{z_j - z_k}$$
for $j = 0, \dots, n$. Then $l_j(z) \in P_n$ and $l_j(z_r) = \delta_{jr}$ for $j, r = 0, \dots, n$.

Proof. For $j = 0, \dots, n$, $l_j(z)$ is a product of $n$ factors of the form $\frac{z - z_k}{z_j - z_k}$ and therefore $l_j(z) \in P_n$. We have
$$l_j(z_r) = \prod_{k=0,\,k\ne j}^{n} \frac{z_r - z_k}{z_j - z_k}$$
for $r = 0, \dots, n$. If $r = j$, then clearly $l_j(z_r) = 1$. Otherwise, the factor with $k = r$ is $\frac{z_r - z_r}{z_j - z_r} = 0$ and so $l_j(z_r) = 0$. Hence $l_j(z_r) = \delta_{jr}$.
Lemma 2.2. The interpolating polynomial $p_n(z) \in P_n$ for data $(z_j, f_j)_{j=0}^{n}$ with $z_j$ distinct is
$$p_n(z) = \sum_{j=0}^{n} f_j\, l_j(z).$$

Note. We call $p_n$ in this form the Lagrange form of the interpolating polynomial.

Proof. We have $p_n(z) \in P_n$ since each $l_j(z) \in P_n$. Also, by the previous lemma, for $r = 0, \dots, n$,
$$p_n(z_r) = \sum_{j=0}^{n} f_j\, l_j(z_r) = \sum_{j=0}^{n} f_j\, \delta_{jr} = f_r.$$
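The Lagrange form lends itself to a direct implementation; here is a minimal sketch (the data values are hypothetical) that evaluates $p_n(z) = \sum_j f_j l_j(z)$ exactly as written above.

```python
# Evaluate the interpolating polynomial in Lagrange form at a point z,
# for distinct nodes z_nodes and data f_values.
def lagrange_eval(z_nodes, f_values, z):
    total = 0.0
    for j, zj in enumerate(z_nodes):
        lj = 1.0                       # build l_j(z) factor by factor
        for k, zk in enumerate(z_nodes):
            if k != j:
                lj *= (z - zk) / (zj - zk)
        total += f_values[j] * lj
    return total

# p_2 through (0, 1), (1, 3), (4, 9): it reproduces the data at the nodes.
print(lagrange_eval([0.0, 1.0, 4.0], [1.0, 3.0, 9.0], 1.0))  # 3.0
```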
Lemma 2.3 (Fundamental Theorem of Algebra). Let $p_n(z) = \sum_{i=0}^{n} a_i z^i \in P_n$ where $a_i \in \mathbb{C}$. Then $p_n(z)$ has at most $n$ roots in $\mathbb{C}$ unless $a_i = 0$ for $i = 0, \dots, n$.
Lemma 2.4. Given $(z_j, f_j)_{j=0}^{n}$ with $z_j$ distinct, there exists a unique interpolating polynomial $p_n(z) \in P_n$.

Proof. Assume the contrary, i.e. that there exists $q_n \in P_n$, $q_n \ne p_n$, such that $p_n(z_j) = q_n(z_j) = f_j$ for $j = 0, \dots, n$. Consider the polynomial $(p_n - q_n) \in P_n$. Then
$$(p_n - q_n)(z_j) = p_n(z_j) - q_n(z_j) = 0$$
for $j = 0, \dots, n$. Hence $(p_n - q_n)$ has $n + 1$ roots and therefore, by the Fundamental Theorem of Algebra, is $0$, i.e. $p_n = q_n$. Hence $p_n$ is unique.
Example (of interpolating polynomial). For $n = 2$, find $p_2 \in P_2$ such that $p_2(0) = a$, $p_2(1) = b$ and $p_2(4) = c$. We get
$$l_0(z) = \frac{(z - z_1)(z - z_2)}{(z_0 - z_1)(z_0 - z_2)} = \frac{(z - 1)(z - 4)}{(0 - 1)(0 - 4)} = \frac{1}{4}(z^2 - 5z + 4),$$
$$l_1(z) = \frac{(z - 0)(z - 4)}{(1 - 0)(1 - 4)} = -\frac{1}{3}(z^2 - 4z),$$
$$l_2(z) = \frac{(z - 0)(z - 1)}{(4 - 0)(4 - 1)} = \frac{1}{12}(z^2 - z).$$
Hence
$$p_2(z) = a\,l_0(z) + b\,l_1(z) + c\,l_2(z) = \left( \frac{a}{4} - \frac{b}{3} + \frac{c}{12} \right) z^2 - \left( \frac{5a}{4} - \frac{4b}{3} + \frac{c}{12} \right) z + a$$
in Lagrange form.
We are interested in finding the coefficients of the interpolating polynomial in the canonical form
$$p_n(z) = \sum_{k=0}^{n} a_k z^k.$$
Consider the equations
$$p_n(z_j) = \sum_{k=0}^{n} a_k z_j^k = f_j$$
for $j = 0, \dots, n$. We get the system of equations
$$\underbrace{\begin{pmatrix} 1 & z_0 & z_0^2 & \cdots & z_0^n \\ 1 & z_1 & z_1^2 & \cdots & z_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & z_n & z_n^2 & \cdots & z_n^n \end{pmatrix}}_{V} \begin{pmatrix} a_0 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} f_0 \\ \vdots \\ f_n \end{pmatrix}.$$
We call $V$ the Vandermonde matrix, so we need to solve $Va = f$. In general, $V$ is ill-conditioned. With the canonical basis $\{z^k\}_{k=0}^{n}$ we must solve $Va = f$; if we instead use the Lagrange basis $\{l_k(z)\}_{k=0}^{n}$, the system becomes $Ia = f$. However, the Lagrange basis has to be constructed.
Assume we have found $p_{n-1} \in P_{n-1}$ interpolating $(z_j, f_j)_{j=0}^{n-1}$ and are given a new data point $(z_n, f_n)$. One cannot reuse $p_{n-1}$ in Lagrange form to compute $p_n$, since it is necessary to compute a whole new Lagrange basis for $P_n$.
We now look for an alternative construction. If $p_{n-1} \in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$ for $j = 0, \dots, n-1$, let $p_n \in P_n$ be such that $p_n(z_j) = f_j$ for $j = 0, \dots, n$ and
$$p_n(z) = p_{n-1}(z) + c \prod_{k=0}^{n-1} (z - z_k).$$
Clearly $p_n(z_j) = p_{n-1}(z_j) = f_j$ for $j = 0, \dots, n-1$. Choose $c \in \mathbb{C}$ such that
$$p_n(z_n) = p_{n-1}(z_n) + c \prod_{k=0}^{n-1} (z_n - z_k) = f_n,$$
that is,
$$c = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1} (z_n - z_k)}.$$
Therefore $c$ depends on $(z_j, f_j)_{j=0}^{n}$. We will use the notation $c = f[z_0, z_1, \dots, z_n]$, so that
$$p_n(z) = p_{n-1}(z) + f[z_0, z_1, \dots, z_n] \prod_{k=0}^{n-1} (z - z_k).$$
That is, the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \dots, z_n]$.
Note that since the $p_n$ with $p_n(z_j) = f_j$, $j = 0, \dots, n$, is unique,
$$f[z_{\sigma(0)}, \dots, z_{\sigma(n)}] = f[z_0, \dots, z_n]$$
for any permutation $\sigma$ of $\{0, 1, \dots, n\}$.
Lemma 2.5. For $(z_j, f_j)_{j=0}^{n}$ with $z_j, f_j \in \mathbb{C}$ and $z_j$ distinct,
$$f[z_0, z_1, \dots, z_n] = \sum_{j=0}^{n} \frac{f_j}{\prod_{k=0,\,k\ne j}^{n} (z_j - z_k)}.$$
Furthermore, if $f_j = f(z_j)$, $j = 0, \dots, n$, for some function $f(z)$, then $f[z_0, \dots, z_n] = 0$ if $f \in P_{n-1}$.
Proof. Compare the coefficient of $z^n$ in the Lagrange form of $p_n$ with that in
$$p_n(z) = p_{n-1}(z) + f[z_0, \dots, z_n] \prod_{k=0}^{n-1} (z - z_k).$$
The Lagrange form is
$$p_n(z) = \sum_{j=0}^{n} f_j \prod_{k=0,\,k\ne j}^{n} \frac{z - z_k}{z_j - z_k},$$
so the coefficient of $z^n$ in the Lagrange form is
$$\sum_{j=0}^{n} \frac{f_j}{\prod_{k=0,\,k\ne j}^{n} (z_j - z_k)} = f[z_0, \dots, z_n].$$
If $f_j = f(z_j)$ for some $f \in P_{n-1}$, then $p_n = f \in P_{n-1}$, as the interpolating polynomial is unique. Therefore the coefficient of $z^n$ in $p_n$, namely $f[z_0, \dots, z_n]$, is $0$.
Note that
$$p_n(z) = p_{n-1}(z) + f[z_0, \dots, z_n] \prod_{k=0}^{n-1} (z - z_k),$$
$$p_{n-1}(z) = p_{n-2}(z) + f[z_0, \dots, z_{n-1}] \prod_{k=0}^{n-2} (z - z_k),$$
$$\vdots$$
$$p_1(z) = p_0(z) + f[z_0, z_1](z - z_0),$$
$$p_0(z) = f_0 = f[z_0],$$
and so we can write
$$p_n(z) = f[z_0] + \sum_{j=1}^{n} f[z_0, \dots, z_j] \prod_{k=0}^{j-1} (z - z_k).$$
We call this the Newton form of the interpolating polynomial.
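The Newton form suggests the following sketch (example data are hypothetical): build the coefficients $f[z_0, \dots, z_j]$ with the divided-difference recurrence of the next section, then evaluate by nested multiplication.

```python
# Divided differences via
# f[z_i,...,z_{i+k}] = (f[z_i,...,z_{i+k-1}] - f[z_{i+1},...,z_{i+k}]) / (z_i - z_{i+k}).
def divided_differences(z, f):
    """Return the Newton coefficients [f[z_0], f[z_0,z_1], ..., f[z_0,...,z_n]]."""
    n = len(z)
    table = list(f)
    coeffs = [table[0]]
    for k in range(1, n):
        table = [(table[i] - table[i + 1]) / (z[i] - z[i + k])
                 for i in range(n - k)]
        coeffs.append(table[0])
    return coeffs

def newton_eval(z, coeffs, t):
    """Evaluate f[z_0] + sum_j f[z_0..z_j] prod_{k<j} (t - z_k), Horner-style."""
    result = coeffs[-1]
    for c, zk in zip(reversed(coeffs[:-1]), reversed(z[:len(coeffs) - 1])):
        result = result * (t - zk) + c
    return result

# Data (0, 1), (1, 3), (4, 9) are collinear, so the top coefficient vanishes.
print(divided_differences([0.0, 1.0, 4.0], [1.0, 3.0, 9.0]))  # [1.0, 2.0, 0.0]
print(newton_eval([0.0, 1.0, 4.0], [1.0, 2.0, 0.0], 4.0))     # 9.0
```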
2.1 Divided difference

We call $f[z_0, \dots, z_n]$ the divided difference.

Theorem 2.6. For any distinct complex numbers $z_0, z_1, \dots, z_{n+1}$, the divided difference satisfies the recurrence
$$f[z_0, z_1, \dots, z_{n+1}] = \frac{f[z_0, \dots, z_n] - f[z_1, \dots, z_{n+1}]}{z_0 - z_{n+1}}.$$
Proof. Given $(z_j, f_j)_{j=0}^{n+1}$, we construct $p_n, q_n \in P_n$ such that $p_n(z_j) = f_j$ for $j = 0, \dots, n$ and $q_n(z_j) = f_j$ for $j = 1, \dots, n+1$. Observe that $f[z_0, \dots, z_n]$ is the coefficient of $z^n$ in $p_n(z)$ and that $f[z_1, \dots, z_{n+1}]$ is the coefficient of $z^n$ in $q_n(z)$. Then
$$r_{n+1}(z) = \frac{(z - z_{n+1})\, p_n(z) - (z - z_0)\, q_n(z)}{z_0 - z_{n+1}} \in P_{n+1},$$
and hence
$$r_{n+1}(z_0) = p_n(z_0) = f_0, \qquad r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1},$$
$$r_{n+1}(z_j) = \frac{(z_j - z_{n+1}) f_j - (z_j - z_0) f_j}{z_0 - z_{n+1}} = f_j$$
for $j = 1, \dots, n$. Therefore $r_{n+1}(z)$ is the interpolating polynomial of $(z_j, f_j)_{j=0}^{n+1}$. Since $f[z_0, \dots, z_{n+1}]$ is the coefficient of $z^{n+1}$ in $r_{n+1}(z)$,
$$f[z_0, \dots, z_{n+1}] = \frac{f[z_0, \dots, z_n] - f[z_1, \dots, z_{n+1}]}{z_0 - z_{n+1}}.$$
Recall the Newton form
$$p_n(z) = f[z_0] + \sum_{j=1}^{n} f[z_0, \dots, z_j] \prod_{k=0}^{j-1} (z - z_k).$$
Theorem 2.6 gives a recurrence relation
$$f[z_0, \dots, z_{j+1}] = \frac{f[z_0, \dots, z_j] - f[z_1, \dots, z_{j+1}]}{z_0 - z_{j+1}}.$$
We can construct a divided difference table:
$$\begin{array}{lllll}
z_0: & f[z_0] & & & \\
z_1: & f[z_1] & f[z_0, z_1] & & \\
z_2: & f[z_2] & f[z_1, z_2] & f[z_0, z_1, z_2] & \\
\vdots & \vdots & \vdots & & \ddots \\
z_n: & f[z_n] & f[z_{n-1}, z_n] & \cdots & f[z_0, \dots, z_n].
\end{array}$$
Note that the diagonal entries appear in the Newton form of $p_n(z)$.
Example. For $n = 2$ and
$$(z_j, f_j)_{j=0}^{2} = \{(0, a),\ (1, b),\ (4, c)\},$$
we have
$$f[z_0] = f_0 = a,$$
$$f[z_1] = f_1 = b, \qquad f[z_0, z_1] = \frac{f[z_0] - f[z_1]}{z_0 - z_1} = \frac{a - b}{-1} = b - a,$$
$$f[z_2] = f_2 = c, \qquad f[z_1, z_2] = \frac{f[z_1] - f[z_2]}{z_1 - z_2} = \frac{b - c}{-3} = \frac{c - b}{3}.$$
Therefore
$$f[z_0, z_1, z_2] = \frac{(b - a) - \frac{c - b}{3}}{z_0 - z_2} = \frac{(b - a) - \frac{c - b}{3}}{-4} = \frac{a}{4} - \frac{b}{3} + \frac{c}{12},$$
and so the Newton form of $p_2(z)$ is
$$p_2(z) = a + (b - a)(z - z_0) + \left( \frac{a}{4} - \frac{b}{3} + \frac{c}{12} \right)(z - z_0)(z - z_1).$$
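The arithmetic of this example is easy to spot-check numerically (the values of $a, b, c$ below are arbitrary test choices):

```python
# For nodes 0, 1, 4, the second divided difference should equal a/4 - b/3 + c/12.
a, b, c = 2.0, -1.0, 5.0
f01 = (a - b) / (0 - 1)        # f[z_0, z_1] = b - a
f12 = (b - c) / (1 - 4)        # f[z_1, z_2] = (c - b)/3
f012 = (f01 - f12) / (0 - 4)   # f[z_0, z_1, z_2]
print(abs(f012 - (a / 4 - b / 3 + c / 12)) < 1e-12)  # True
```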
2.2 Finding the error

Theorem 2.7. Let $p_n(z)$ interpolate $f(z)$ at $n + 1$ distinct points $\{z_j\}_{j=0}^{n}$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is
$$e(z) = f[z_0, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k)$$
for $z \ne z_j$, and $e(z_j) = 0$ for $j = 0, \dots, n$.

Proof. The polynomial $p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n}$. Add the new distinct point $z$; the Newton form of the resulting $p_{n+1}$ is
$$p_{n+1}(z) = p_n(z) + f[z_0, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k).$$
Since $p_{n+1}$ interpolates $f$ at the added point, $p_{n+1}(z) = f(z)$, and therefore
$$e(z) = f(z) - p_n(z) = f[z_0, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k).$$
Theorem 2.8. Let $f \in C^n[a, b]$ and let $\{x_i\}_{i=0}^{n}$ be distinct points in $[a, b]$. Then there exists $\xi \in [a, b]$ such that
$$f[x_0, \dots, x_n] = \frac{f^{(n)}(\xi)}{n!}.$$

Proof. The error $e(x) = f(x) - p_n(x)$ has $n + 1$ distinct zeros in $[a, b]$, so by repeated application of Rolle's theorem $e^{(n)}$ has a zero $\xi \in [a, b]$. From the Newton form,
$$p_n(x) = p_{n-1}(x) + f[x_0, \dots, x_n] \prod_{i=0}^{n-1} (x - x_i) = f[x_0, \dots, x_n]\, x^n + \cdots.$$
Therefore
$$p_n^{(n)}(x) = n!\, f[x_0, \dots, x_n]$$
and hence, as $e^{(n)}(\xi) = 0$,
$$f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!\, f[x_0, \dots, x_n].$$
Theorem 2.9. Let $f \in C^{n+1}[a, b]$ and let $\{x_i\}_{i=0}^{n}$ be distinct points in $[a, b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^{n}$, then $e(x) = f(x) - p_n(x)$ satisfies
$$|e(x)| \le \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a \le y \le b} \left| f^{(n+1)}(y) \right|.$$

Proof. From Theorem 2.8 (applied with the extra point $x$), there exists $\xi_x \in [a, b]$ such that
$$e(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} \prod_{k=0}^{n} (x - x_k).$$
Therefore
$$|e(x)| = \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \left| f^{(n+1)}(\xi_x) \right| \le \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a \le y \le b} \left| f^{(n+1)}(y) \right|,$$
where below we write $\|g\|_\infty = \max_{a \le x \le b} |g(x)|$ for the supremum norm.
Note. Beware that
$$\|f - p_n\|_\infty \not\to 0$$
as $n \to \infty$ in all cases.

Example.
1. Let $[a, b] = [-\frac{1}{2}, \frac{1}{2}]$, $f(x) = e^x$, $x_i \in [a, b]$, $i = 0, \dots, n$. Then $|x - x_i| \le 1$ and so $\|\prod_{i=0}^{n} (x - x_i)\|_\infty \le 1$. Also $\|f^{(n+1)}\|_\infty = \|e^x\|_\infty = e^{1/2}$. Therefore
$$\|f - p_n\|_\infty \le \frac{e^{1/2}}{(n+1)!} \to 0$$
as $n \to \infty$.
2. For any $[a, b]$ and $f(x) = \cos x$, $\|f^{(n+1)}\|_\infty \le 1$. Also
$$\|\cos - p_n\|_\infty \le \frac{(b - a)^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$. Therefore $p_n(x) \to \cos x$ for all $x$.
3. Let $[a, b] = [0, 1]$, $f(x) = (1 + x)^{-1}$. Then $\|f^{(n+1)}\|_\infty = (n+1)!$. Hence the bound gives only
$$\|f - p_n\|_\infty \le \frac{1}{(n+1)!} (n+1)! = 1 \not\to 0$$
as $n \to \infty$.
2.3 Best Approximation

Given $[a, b]$ and $f \in C[a, b]$, we want to choose interpolation points $\{x_k\}_{k=0}^{n}$ in $[a, b]$ to minimize $\|\prod_{k=0}^{n} (x - x_k)\|_\infty$, i.e. to find
$$\min_{\{x_k\}_{k=0}^{n}} \left\| \prod_{k=0}^{n} (x - x_k) \right\|_\infty,$$
i.e.
$$\min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty.$$
Consider the more general problem: to find
$$\min_{q_n \in P_n} \|g - q_n\|_\infty,$$
that is, to find $q_n^*$ such that
$$\|g - q_n^*\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$. We call such a $q_n^* \in P_n$ the best approximation.
Theorem 2.10 (Equioscillation Property). Let $g \in C[a, b]$ and $n \ge 0$. Suppose there exist $q_n^* \in P_n$ and $n + 2$ distinct points $\{x_j\}_{j=0}^{n+1}$,
$$a \le x_0 < x_1 < \cdots < x_{n+1} \le b,$$
such that
$$g(x_j) - q_n^*(x_j) = \sigma (-1)^j \|g - q_n^*\|_\infty$$
for $j = 0, \dots, n+1$, where $\sigma = \pm 1$. Then $q_n^*$ is the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$, that is,
$$\|g - q_n^*\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$.

Note. We call $\{x_k\}_{k=0}^{n+1}$ the equioscillation points.
Proof. Let $E = \|g - q_n^*\|_\infty$. If $E = 0$, then $q_n^* = g$ is the best approximation. If $E > 0$, suppose that there exists $q_n \in P_n$ such that $\|g - q_n\|_\infty < E$. Consider $q_n - q_n^* \in P_n$ at the $n + 2$ points $\{x_j\}_{j=0}^{n+1}$:
$$q_n(x_j) - q_n^*(x_j) = (q_n(x_j) - g(x_j)) + (g(x_j) - q_n^*(x_j)) = \sigma (-1)^j E + \eta_j$$
with $\eta_j = q_n(x_j) - g(x_j) \in \mathbb{R}$ and $|\eta_j| < E$. Thus
$$\operatorname{sgn}\big((q_n - q_n^*)(x_j)\big) = \operatorname{sgn}\big(\sigma (-1)^j E\big),$$
and therefore $q_n - q_n^* \in P_n$ changes sign $n + 1$ times and hence has $n + 1$ roots. Then by the Fundamental Theorem of Algebra $q_n \equiv q_n^*$: a contradiction, so such a $q_n$ does not exist and $q_n^*$ is the best approximation.
Theorem 2.11 (Chebyshev Equioscillation Theorem). Let $g \in C[a, b]$ and $n \ge 0$. Then there exists a unique $q_n^* \in P_n$ satisfying the equioscillation property, and
$$\|g - q_n^*\|_\infty \le \|g - q_n\|_\infty$$
for all $q_n \in P_n$.

Note. Construction of $q_n^*$ is difficult in general, which is why we often use best approximation in the least squares sense instead. But if $g(x) = x^{n+1}$, the construction of $q_n^*$ is easy.
Lemma 2.12. If $g(x) = x^{n+1}$ on $[-1, 1]$, then the best approximation to $g$ from $P_n$ with respect to $\|\cdot\|_\infty$ is
$$q_n^*(x) = x^{n+1} - 2^{-n} T_{n+1}(x),$$
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n + 1$, i.e. $T_{n+1}(x) = \cos((n+1)\cos^{-1} x)$.

Proof. We first need to show that $q_n^*$ is really in $P_n$: recall that
$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x),$$
and so $T_{n+1}(x) = 2^n x^{n+1} + \cdots$. Therefore $q_n^* \in P_n$.

The error is $x^{n+1} - q_n^*(x) = 2^{-n} T_{n+1}(x)$ for $x \in [-1, 1]$. Change the variable: $x = \cos\theta$, so $\theta = \cos^{-1} x \in [0, \pi]$. Then $T_{n+1}(x) = \cos((n+1)\theta)$. Hence
$$\|x^{n+1} - q_n^*\|_\infty = 2^{-n} \max_{-1 \le x \le 1} |\cos((n+1)\cos^{-1} x)| = 2^{-n}.$$
Choose
$$\theta_j = \frac{j\pi}{n+1} \quad \text{for } j = 0, \dots, n+1, \qquad \text{so that } x_j = \cos\theta_j = \cos\frac{j\pi}{n+1}.$$
Then
$$T_{n+1}(x_j) = \cos((n+1)\theta_j) = \cos(j\pi) = (-1)^j.$$
Hence the error $x^{n+1} - q_n^*(x) = 2^{-n} T_{n+1}(x)$ equioscillates at these $n + 2$ points, and thus $q_n^*$ is the best approximation to $x^{n+1}$ in $P_n$.
Note. The points are equally spaced in terms of $\theta$, but clustered around the end points $\pm 1$ in terms of $x$.

Example. The optimal interpolation points are the zeros of the error $x^{n+1} - q_n^*$. Therefore
$$\prod_{j=0}^{n} (x - x_j) = x^{n+1} - q_n^* = 2^{-n} T_{n+1}(x).$$
Choose
$$\theta_j = \frac{(2j+1)\pi}{2(n+1)}, \qquad \text{so that } x_j = \cos\left( \frac{(2j+1)\pi}{2(n+1)} \right).$$
Then
$$T_{n+1}(x_j) = \cos((n+1)\theta_j) = \cos\left( \frac{(2j+1)\pi}{2} \right) = 0.$$
Therefore
$$\left\{ \cos\frac{(2j+1)\pi}{2(n+1)} \right\}_{j=0}^{n}$$
are the optimal Chebyshev interpolation points for $p_n \in P_n$ on $[-1, 1]$.
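As a numerical sketch (grid resolution and the choice $n = 8$ are assumptions), one can compare $\max |\prod_j (x - x_j)|$ for Chebyshev versus equally spaced nodes; the Chebyshev choice attains $2^{-n}$.

```python
import numpy as np

n = 8
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # T_{n+1} zeros
equi = np.linspace(-1.0, 1.0, n + 1)

xs = np.linspace(-1.0, 1.0, 100001)   # fine grid approximating the sup norm

def node_poly_max(node_set):
    prod = np.ones_like(xs)
    for xj in node_set:
        prod *= xs - xj
    return np.abs(prod).max()

print(node_poly_max(cheb))  # close to 2**-8 = 0.00390625
print(node_poly_max(equi))  # noticeably larger
```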
We generalize this to an interval $[a, b]$. For $x \in [a, b]$, introduce $t = \frac{2x - (a+b)}{b-a} \in [-1, 1]$, so $x = \frac{1}{2}[(b-a)t + (a+b)]$. Then the optimal interpolation points for $[a, b]$ are
$$x_j = \frac{1}{2} \left[ (b-a) \cos\left( \frac{(2j+1)\pi}{2(n+1)} \right) + (a+b) \right]$$
for $j = 0, \dots, n$.

Proof. We need to find
$$\min_{\{x_j\}_{j=0}^{n} \subset [a,b]} \left\| \prod_{j=0}^{n} (x - x_j) \right\|_\infty.$$
In the variable $t$, a monic polynomial of degree $n+1$ in $x$ becomes $\left( \frac{b-a}{2} \right)^{n+1}$ times a monic polynomial of degree $n+1$ in $t$, so by Lemma 2.12 the minimizing monic polynomial is
$$\prod_{j=0}^{n} (x - x_j) = \left( \frac{b-a}{2} \right)^{n+1} 2^{-n}\, T_{n+1}\!\left( \frac{2x - (a+b)}{b-a} \right).$$
Using the equioscillation property, the equioscillation points are $t_j = \cos\frac{j\pi}{n+1}$, i.e.
$$x_j = \frac{(b-a)\cos\frac{j\pi}{n+1} + a + b}{2},$$
while the interpolation points are the zeros given above.
2.4 Piecewise Polynomial Interpolation

Given points $\{x_i\}_{i=0}^{n}$ with $x_0 = a$, $x_n = b$ and $x_j - x_{j-1} = h$, we can use linear interpolation on each subinterval $[x_{j-1}, x_j]$ for $j = 1, \dots, n$. Define, for $x \in [x_{j-1}, x_j]$, $j = 1, \dots, n$,
$$P_L(x) = f(x_{j-1}) + \frac{x - x_{j-1}}{h} \left( f(x_j) - f(x_{j-1}) \right),$$
and so $P_L(x_{j-1}) = f(x_{j-1})$, $P_L(x_j) = f(x_j)$. The error is
$$\|f - P_L\|_\infty = \max_{a \le x \le b} |f(x) - P_L(x)| = \max_{j=1,\dots,n} \left[ \max_{x_{j-1} \le x \le x_j} |f(x) - P_L(x)| \right] = \max_{j=1,\dots,n} \left[ \max_{x_{j-1} \le x \le x_j} \frac{|(x - x_{j-1})(x - x_j)|}{2!}\, |f''(z_j)| \right]$$
where $z_j \in (x_{j-1}, x_j)$. Since the maximum of $|(x - x_{j-1})(x - x_j)|$ occurs at $x = (x_{j-1} + x_j)/2$,
$$\|f - P_L\|_\infty \le \max_{j=1,\dots,n} \left[ \frac{h^2}{8} |f''(z_j)| \right] \le \frac{h^2}{8} \|f''\|_\infty.$$
Then as $h \to 0$, $P_L \to f$ provided $f \in C^2[a, b]$. We can generalize this method to piecewise quadratics, cubics etc.
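The $h^2/8$ bound can be observed numerically; this sketch (the test function $f(x) = \sin x$ on $[0, \pi]$, with $\|f''\|_\infty = 1$, is an assumed example) shows the error sitting below the bound and dropping by roughly a factor of four when $h$ is halved.

```python
import numpy as np

def pl_error(n):
    """Max error of the piecewise linear interpolant of sin on [0, pi], n panels."""
    nodes = np.linspace(0.0, np.pi, n + 1)
    xs = np.linspace(0.0, np.pi, 20001)
    p = np.interp(xs, nodes, np.sin(nodes))   # piecewise linear interpolant
    return np.abs(np.sin(xs) - p).max()

for n in (10, 20, 40):
    h = np.pi / n
    print(pl_error(n), h**2 / 8)  # observed error stays below the h^2/8 bound
```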
Chapter 3
Quadrature (Numerical
Integration)
We are given an interval $[a, b]$ and a weight function $w(x) \in C(a, b)$ such that $w(x) > 0$ except at a finite number of zeros, and $\int_a^b w(x)\,dx < \infty$. Now, given a function $f(x)$, we want to approximate
$$I(f) = \int_a^b w(x) f(x)\,dx$$
by approximating $f(x)$ by an interpolating polynomial $p_n(x)$, that is, we approximate $I(f)$ by
$$I_n(f) = I(p_n) = \int_a^b w(x) p_n(x)\,dx.$$
The Lagrange form of $p_n(x)$ is
$$p_n(x) = \sum_{k=0}^{n} f(x_k)\, l_k(x), \qquad l_k(x) = \prod_{j=0,\,j\ne k}^{n} \frac{x - x_j}{x_k - x_j}.$$
Hence
$$I(p_n) = \int_a^b w(x) \left[ \sum_{k=0}^{n} f(x_k)\, l_k(x) \right] dx = \sum_{k=0}^{n} f(x_k) \int_a^b w(x)\, l_k(x)\,dx = \sum_{k=0}^{n} w_k f(x_k),$$
where $w_k = \int_a^b w(x)\, l_k(x)\,dx$ for $k = 0, \dots, n$.
Example. Let $[a, b] = [0, 1]$ and $w(x) = x^{-1/2}$, with $n = 1$, $x_0 = 0$, $x_1 = 1$. Approximate $I(f) = \int_0^1 x^{-1/2} f(x)\,dx$ by
$$I_1(f) = \sum_{k=0}^{1} w_k f(x_k),$$
where
$$w_0 = \int_0^1 x^{-1/2}(1 - x)\,dx = \left[ \frac{x^{1/2}}{1/2} - \frac{x^{3/2}}{3/2} \right]_0^1 = \frac{4}{3}, \qquad w_1 = \int_0^1 x^{-1/2}\, x\,dx = \frac{2}{3}.$$
Hence
$$I_1(f) = \frac{4}{3} f(x_0) + \frac{2}{3} f(x_1).$$
If instead $w(x) \equiv 1$, we get
$$I_1(f) = \frac{1}{2} \left[ f(x_0) + f(x_1) \right],$$
the trapezium rule.
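Since the rule integrates $P_1$ exactly against the weight, a quick check (with arbitrary test values of alpha and beta) is possible using $\int_0^1 x^{-1/2}\,dx = 2$ and $\int_0^1 x^{1/2}\,dx = 2/3$:

```python
# For f(x) = alpha + beta*x, the exact weighted integral is 2*alpha + (2/3)*beta,
# and the rule I_1(f) = (4/3) f(0) + (2/3) f(1) should reproduce it exactly.
alpha, beta = 1.7, -0.4
f = lambda x: alpha + beta * x
approx = 4 / 3 * f(0.0) + 2 / 3 * f(1.0)
exact = 2 * alpha + 2 / 3 * beta
print(abs(approx - exact) < 1e-12)  # True
```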
In general, the error of approximation is
$$|I(f) - I_n(f)| = \left| \int_a^b w(x) \left[ f(x) - p_n(x) \right] dx \right| \le \left( \int_a^b w(x)\,dx \right) \|f - p_n\|_\infty.$$
The error is zero if $f \in P_n$, regardless of the interpolation (sampling) points $\{x_k\}_{k=0}^{n}$. Otherwise, we can choose $\{x_k\}_{k=0}^{n}$ in a smart way so that $I_n(f) = I(f)$ for all $f \in P_m$, where $m > n$ is as large as possible.
Lemma 3.1. The orthogonal polynomial $\phi_n$ has $n$ distinct roots in $[a, b]$.

Proof. Let $\ell$ denote the number of sign changes of $\phi_n$ in $[a, b]$, occurring at points $x_1, \dots, x_\ell$. If $\ell < n$, let $q_\ell(x) = (x - x_1) \cdots (x - x_\ell)$. Then $\phi_n q_\ell$ does not change sign in $[a, b]$, and there are two possibilities. If it is positive, then
$$\langle \phi_n, q_\ell \rangle = \int_a^b w(x) \underbrace{\phi_n(x) q_\ell(x)}_{>0 \text{ except at the } x_i} dx > 0;$$
if it is negative, then
$$\langle \phi_n, -q_\ell \rangle = -\int_a^b w(x) \underbrace{\phi_n(x) q_\ell(x)}_{<0 \text{ except at the } x_i} dx > 0.$$
Either way we have a contradiction, as $\phi_n$ is the orthogonal polynomial of degree $n$, i.e. it is orthogonal to all polynomials in $P_{n-1}$, and $q_\ell \in P_\ell \subseteq P_{n-1}$. Therefore $\ell \ge n$, and since $\phi_n$ has degree $n$, it has exactly $n$ distinct roots in $[a, b]$.
Theorem 3.2. Let $w \in C(a, b)$ with $w > 0$ except at a finite number of points, and $\int_a^b w(x)\,dx < \infty$. Let $\phi_{n+1}$ be the orthogonal polynomial of degree $n + 1$ associated with the inner product
$$\langle g_1, g_2 \rangle = \int_a^b w(x) g_1(x) g_2(x)\,dx.$$
Let $\{x_i\}_{i=0}^{n}$, $x_i \in [a, b]$, be the $n + 1$ distinct zeros of $\phi_{n+1}$ (see the lemma above). If we approximate
$$I(f) = \int_a^b w(x) f(x)\,dx$$
by $I_n(f) = I(p_n)$, where $p_n \in P_n$ is such that $p_n(x_i) = f(x_i)$ for $i = 0, \dots, n$, then
$$I_n(f) = \sum_{i=0}^{n} w_i f(x_i), \qquad w_i = \int_a^b w(x)\, l_i(x)\,dx, \qquad l_i(x) = \prod_{j=0,\,j\ne i}^{n} \frac{x - x_j}{x_i - x_j}$$
for $i = 0, \dots, n$. Moreover, $I_n(f) = I(f)$ for all $f \in P_{2n+1}$.
Proof. Let $f \in P_{2n+1}$. Then $f - p_n \in P_{2n+1}$ has roots at $\{x_i\}_{i=0}^{n}$, and therefore $f - p_n = q_n \phi_{n+1}$ for some $q_n \in P_n$. Then
$$I(f) - I_n(f) = I(f) - I(p_n) = \int_a^b w(x) \left[ f(x) - p_n(x) \right] dx = \int_a^b w(x)\, q_n(x)\, \phi_{n+1}(x)\,dx = \langle q_n, \phi_{n+1} \rangle = 0,$$
as $\phi_{n+1}$ is the orthogonal polynomial of degree $n + 1$. Hence $I_n(f) = I(f)$ for all $f \in P_{2n+1}$.
With $n + 1$ sampling points, it is not possible to choose $w_i$, $i = 0, \dots, n$, such that $I_n(f) = I(f)$ for all $f \in P_{2n+2}$. Consider $f(x) = \prod_{i=0}^{n} (x - x_i)^2 \in P_{2n+2}$: clearly $I(f) > 0$, but $I_n(f) = 0$.

Choosing the sampling points as the roots of $\phi_{n+1}$ is called Gaussian quadrature.
Example. Let $[a, b] = [-1, 1]$, $w \equiv 1$ and
$$\langle g_1, g_2 \rangle = \int_{-1}^{1} g_1(x) g_2(x)\,dx.$$
For $n = 1$,
$$I_1(f) = w_0 f(x_0) + w_1 f(x_1).$$
Recall that $\phi_2(x) = x^2 - 1/3$, and so $x_0 = -1/\sqrt{3}$, $x_1 = 1/\sqrt{3}$. Now we determine $w_0, w_1$. Observe that
$$I_1(1) = w_0 + w_1 = I(1) = \int_{-1}^{1} 1\,dx = 2, \qquad I_1(x) = \frac{1}{\sqrt{3}}(-w_0 + w_1) = I(x) = \int_{-1}^{1} x\,dx = 0,$$
and hence $w_0 = w_1 = 1$. Therefore $I_1(f) = f(-1/\sqrt{3}) + f(1/\sqrt{3})$. Also $I_1(x^2) = 2/3 = I(x^2)$ and $I_1(x^3) = 0 = I(x^3)$.

For $n = 2$, $\phi_3(x) = x^3 - (3/5)x$, and so $x_0 = -\sqrt{3/5}$, $x_1 = 0$, $x_2 = \sqrt{3/5}$. Therefore
$$I_2(f) = w_0 f(-\sqrt{3/5}) + w_1 f(0) + w_2 f(\sqrt{3/5}).$$
By Theorem 3.2 this is exact for all $f \in P_5$, so in particular
$$I_2(1) = w_0 + w_1 + w_2 = 2, \qquad I_2(x) = -\sqrt{3/5}\, w_0 + \sqrt{3/5}\, w_2 = 0, \qquad I_2(x^2) = \frac{3}{5} w_0 + \frac{3}{5} w_2 = \frac{2}{3}.$$
Hence $w_0 = w_2 = 5/9$, $w_1 = 8/9$, and
$$I_2(f) = \frac{1}{9} \left[ 5 f(-\sqrt{3/5}) + 8 f(0) + 5 f(\sqrt{3/5}) \right].$$
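The two rules derived above can be checked for their promised degrees of exactness ($2n + 1 = 3$ and $5$ respectively); this is a minimal sketch.

```python
import numpy as np

def gauss2(f):
    """Two-point Gauss rule on [-1, 1]: nodes at the zeros of phi_2."""
    return f(-1 / np.sqrt(3)) + f(1 / np.sqrt(3))

def gauss3(f):
    """Three-point Gauss rule on [-1, 1]: nodes at the zeros of phi_3."""
    r = np.sqrt(3 / 5)
    return (5 * f(-r) + 8 * f(0.0) + 5 * f(r)) / 9

# Exact integrals of x^k over [-1, 1]: 0 for odd k, 2/(k+1) for even k.
for k in range(4):                       # gauss2 exact up to degree 3
    exact = 0.0 if k % 2 else 2 / (k + 1)
    assert abs(gauss2(lambda x: x**k) - exact) < 1e-14
for k in range(6):                       # gauss3 exact up to degree 5
    exact = 0.0 if k % 2 else 2 / (k + 1)
    assert abs(gauss3(lambda x: x**k) - exact) < 1e-14
print("both rules exact on their promised degrees")
```

The counterexample above also shows up here: $f(x) = \prod_i (x - x_i)^2$ has $I(f) > 0$ but both rules return $0$ on it.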