
Chapter 6

System of Nonlinear
Equations

Finding the zeros of a given function f , i.e., finding an argument x for which

f (x) = 0 (6.1)

where f : R^n → R^n, is a classical problem arising from many areas of applications.
Except in linear problems, root-finding invariably proceeds by iteration —
starting from some approximate trial solution, a useful algorithm will improve
the solution until some predetermined convergence criterion is satisfied.
Unlike most of the iterative methods used for linear systems, having a good
first-guess for the solution of a nonlinear system usually is crucial in determining
the success of an iterative process. Such methods, among which the Newton-
Raphson method is the most celebrated, are called local methods.
Thus far, general-purpose global methods are not available. For special
classes of problems, such as solving systems of polynomials, important progress
has recently been made. The homotopy method, having connections with dif-
ferential equations, is one such approach.
Another difficulty often associated with solving nonlinear equations is
detecting the existence of one or more solutions. A nonlinear set of equations
may have no (real) solutions at all. Contrariwise, it may have more than one
solution. Ideally, one should resort to some other means, such as degree
theory, to determine that a nonlinear equation does theoretically have a solution
before computing an approximate solution numerically. Applying a numerical
method blindly risks being misled to a wrong answer even though
the method appears to behave nicely.
6.1 The Newton-Raphson Method
An iterative scheme for solving the system (6.1) generally takes the form

xk+1 = ρ(xk ), k = 0, 1, . . . (6.2)


where ρ : R^n → R^n is a fixed function and x0 ∈ R^n is a given starting value.
It is hoped that the sequence {xk} will converge to a point ξ as k → ∞. If this
is so, then ξ must be a fixed point of ρ. It is desirable that f(ξ) = 0. It is also
important to know how quickly the sequence {xk} converges.
Suppose the system (6.1) has a solution at ξ and suppose that f is sufficiently
smooth. Let xk be an approximation to ξ. Considering the Taylor series
expansion of f about xk, we have

f(x) = f(xk) + f′(xk)(x − xk) + O(‖x − xk‖²)   (6.3)

where f′ ∈ R^{n×n} denotes the Jacobian matrix of f. If, in particular, we take
x = ξ, then

0 ≈ f(xk) + f′(xk)(ξ − xk).   (6.4)
From (6.4), we are motivated to think that the quantity

xk+1 := xk + sk   (6.5)

where sk satisfies the linear system

f′(xk) sk = −f(xk)   (6.6)

should be a better approximation to ξ than xk. In other words, if f′ is nonsingular,
then we have derived a special iterative scheme (6.2) with

ρ(x) := x − f′(x)⁻¹ f(x).   (6.7)

This is the well-known Newton-Raphson method.
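As an illustration (not part of the original notes), the following is a minimal NumPy sketch of the iteration (6.5)-(6.6); the test system, tolerance, and iteration cap are arbitrary choices for demonstration.

import numpy as np

def newton_system(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration x_{k+1} = x_k + s_k with f'(x_k) s_k = -f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        # solve the linear system (6.6) rather than forming the inverse in (6.7)
        s = np.linalg.solve(fprime(x), -fx)
        x = x + s
    return x

# example: intersect the circle x^2 + y^2 = 4 with the curve x*y = 1
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
J = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])
print(newton_system(f, J, [2.0, 0.5]))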


There are many ways to define a sensible iteration function ρ other than
(6.7). Besides numerous research papers, a classical reference on this topic is
the book “Iterative Solution of Nonlinear Equations in Several Variables” by
Ortega and Rheinboldt. An earlier book is “Iterative Methods for the Solution
of Equations” by Traub.

Definition 6.1.1 Let ρ : R^n → R^n be an iteration function. Let ξ be a fixed
point of ρ. The iterative scheme (6.2) defined by ρ is said to be a method of
order p if for every initial point x0 in a neighborhood N(ξ), the generated sequence
{xk} satisfies

‖xk+1 − ξ‖ ≤ C‖xk − ξ‖^p   (6.8)

for a certain constant C.

The Newton method is best known for its quadratic convergence. Toward
this, we first prove a useful lemma.

Lemma 6.1.1 Let f : R^n → R^n be continuously differentiable in an open convex
set D ⊂ R^n. Suppose a constant γ exists such that ‖f′(x) − f′(y)‖ ≤ γ‖x − y‖
for all x, y ∈ D. Then ‖f(x) − f(y) − f′(y)(x − y)‖ ≤ (γ/2)‖x − y‖².

(pf): By the line integral, f(x) − f(y) = ∫₀¹ f′(y + t(x − y))(x − y) dt. So

f(x) − f(y) − f′(y)(x − y) = ∫₀¹ [f′(y + t(x − y)) − f′(y)](x − y) dt.

It follows that

‖f(x) − f(y) − f′(y)(x − y)‖ ≤ ∫₀¹ ‖f′(y + t(x − y)) − f′(y)‖ ‖x − y‖ dt
≤ ∫₀¹ γt‖x − y‖² dt = (γ/2)‖x − y‖².
Theorem 6.1.1 Let f : R^n → R^n be continuously differentiable in an open
convex set D ⊂ R^n. Assume that there exist ξ ∈ D and β, γ > 0 such that
(1) f(ξ) = 0,
(2) f′(ξ)⁻¹ exists,
(3) ‖f′(ξ)⁻¹‖ ≤ β, and
(4) ‖f′(x) − f′(y)‖ ≤ γ‖x − y‖ for x, y in a neighborhood of ξ.
Then there exists ε > 0 such that for every x0 ∈ N(ξ, ε), the sequence {xk}
defined by (6.5) and (6.6) is well defined, converges to ξ, and satisfies

‖xk+1 − ξ‖ ≤ βγ‖xk − ξ‖².   (6.9)
(pf): By continuity of f′, choose ε ≤ 1/(2βγ) small enough that N(ξ, ε) lies in the
neighborhood in (4) and f′(x) is nonsingular for all x ∈ N(ξ, ε). For k = 0, we
already know ‖x0 − ξ‖ < ε. So

‖f′(ξ)⁻¹(f′(x0) − f′(ξ))‖ ≤ ‖f′(ξ)⁻¹‖ ‖f′(x0) − f′(ξ)‖ ≤ βγ‖x0 − ξ‖ ≤ 1/2.

By the Banach lemma (Theorem 2.1.1 of the earlier notes),

‖f′(x0)⁻¹‖ = ‖[f′(ξ) + (f′(x0) − f′(ξ))]⁻¹‖
≤ ‖f′(ξ)⁻¹‖ / (1 − ‖f′(ξ)⁻¹(f′(x0) − f′(ξ))‖) ≤ 2‖f′(ξ)⁻¹‖ ≤ 2β.

Now x1 − ξ = x0 − ξ − f′(x0)⁻¹f(x0) = x0 − ξ − f′(x0)⁻¹(f(x0) − f(ξ)) =
f′(x0)⁻¹[f(ξ) − f(x0) − f′(x0)(ξ − x0)]. So

‖x1 − ξ‖ ≤ ‖f′(x0)⁻¹‖ ‖f(ξ) − f(x0) − f′(x0)(ξ − x0)‖
≤ 2β (γ/2) ‖ξ − x0‖² = βγ‖x0 − ξ‖²   (by Lemma 6.1.1)
≤ βγε‖x0 − ξ‖ ≤ (1/2)‖x0 − ξ‖ < ε.

The proof is now completed by induction.


Remarks. (1) The above theorem states that the Newton method converges
quadratically if f′(ξ) is nonsingular and if the starting point is close enough to ξ.
(2) At each step of the Newton method, an evaluation of the Jacobian matrix
f′(xk) is required. Also, a linear system (6.6) needs to be solved. All of this
means that the Newton method is an expensive method. So, modifications of
the Newton method that make it more efficient are essential in practice.

6.2 The Broyden Method


Consider that in the one-dimensional case, the derivative f′(xk) may be approximated
by the finite difference quotient

Bk := (f(xk) − f(xk−1)) / (xk − xk−1).   (6.10)

This choice results in the so-called secant method:

xk+1 = xk − Bk⁻¹ f(xk)   (6.11)

and it can be proved that the rate of convergence is p = (1 + √5)/2 ≈ 1.618. In the
n-dimensional case, we reformulate the relationship (6.10) as

Bk(xk − xk−1) = f(xk) − f(xk−1)   (6.12)


which is known as the quasi-Newton condition. If we further write
sk := xk+1 − xk   (6.13)
∆fk := f(xk+1) − f(xk)   (6.14)
Bk+1 = Bk + Ck,   (6.15)

then (6.12) is equivalent to

Ck sk = ∆fk − Bk sk.   (6.16)

Let wk ∈ R^n be an arbitrary vector such that wk^T sk ≠ 0. Then obviously the
matrix

Ck := (1/(wk^T sk)) (∆fk − Bk sk) wk^T   (6.17)
satisfies the quasi-Newton condition (6.16).
Definition 6.2.1 If wk := sk, then

Ck := (1/(sk^T sk)) (∆fk − Bk sk) sk^T   (6.18)

is known as Broyden's first method. If wk := Bk^T sk, then

Ck := (1/(sk^T Bk sk)) (∆fk − Bk sk) sk^T Bk   (6.19)

is known as Broyden's second method.

Theorem 6.2.1 Let sk, ∆fk ∈ R^n be given. The matrix Ck given by (6.18) is
the minimal change of Bk in the Frobenius norm such that Bk+1 = Bk + Ck
satisfies the quasi-Newton condition Bk+1 sk = ∆fk.

(pf): Let C̃k denote another possible change of Bk such that B̃k+1 := Bk + C̃k
satisfies B̃k+1 sk = ∆fk. Then

‖Ck‖F = ‖Bk+1 − Bk‖F = ‖(1/(sk^T sk)) (∆fk − Bk sk) sk^T‖F
= ‖(1/(sk^T sk)) (B̃k+1 sk − Bk sk) sk^T‖F
≤ ‖B̃k+1 − Bk‖F ‖sk sk^T/(sk^T sk)‖F = ‖C̃k‖F.

Algorithm 6.2.1 (Broyden's Method)

Given an initial guess x0,
Approximate f′(x0) by a matrix B0 (say, by a finite difference method),
For k = 0, 1, . . .
  Solve Bk dk = f(xk) for dk,
  Determine λk for which ‖f(xk − λk dk)‖2 is approximately minimized,
  xk+1 := xk − λk dk,
  sk := xk+1 − xk,
  ∆fk := f(xk+1) − f(xk),
  Bk+1 := Bk + (1/(sk^T sk)) (∆fk − Bk sk) sk^T.
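The following NumPy sketch (an illustration added here, not from the original notes) implements Algorithm 6.2.1 with the simplifying assumptions B0 = I and λk ≡ 1 (no line search); the test system is the same hypothetical one used above.

import numpy as np

def broyden(f, x0, B0=None, tol=1e-10, max_iter=100):
    """Broyden's first method with the rank-one update (6.18); lambda_k = 1."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size) if B0 is None else np.asarray(B0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        d = np.linalg.solve(B, fx)        # solve B_k d_k = f(x_k)
        x_new = x - d                     # a line search could be inserted here
        fx_new = f(x_new)
        s, df = x_new - x, fx_new - fx
        # rank-one update: B_{k+1} = B_k + (df - B s) s^T / (s^T s)
        B = B + np.outer(df - B @ s, s) / (s @ s)
        x, fx = x_new, fx_new
    return x

f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
print(broyden(f, [2.0, 0.5]))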

Lemma 6.2.1 Let f : R^n → R^n be continuously differentiable on an open
convex set D ⊂ R^n. Suppose there exists a constant γ such that ‖f′(x) −
f′(y)‖ ≤ γ‖x − y‖ for x, y ∈ D. Then for any x, y, ξ ∈ D, ‖f(x) −
f(y) − f′(ξ)(x − y)‖ ≤ (γ/2)(‖x − ξ‖ + ‖y − ξ‖)‖x − y‖.

(pf): The proof parallels that of Lemma 6.1.1. By the line integral,
‖f(x) − f(y) − f′(ξ)(x − y)‖ = ‖∫₀¹ [f′(y + t(x − y)) − f′(ξ)](x − y) dt‖
≤ γ‖x − y‖ ∫₀¹ ‖y + t(x − y) − ξ‖ dt ≤ γ‖x − y‖ ∫₀¹ {t‖x − ξ‖ + (1 − t)‖y − ξ‖} dt
= (γ/2)(‖x − ξ‖ + ‖y − ξ‖)‖x − y‖. ⊕

Lemma 6.2.2 Let f : R^n → R^n be continuously differentiable on an open
convex set D ⊂ R^n. Suppose there exists a constant γ such that ‖f′(x) −
f′(y)‖ ≤ γ‖x − y‖ for x, y ∈ D. Then for xk+1, xk ∈ D it holds that ‖Bk+1 −
f′(ξ)‖ ≤ ‖Bk − f′(ξ)‖ + (γ/2)(‖xk+1 − ξ‖ + ‖xk − ξ‖).

(pf): By definition,

Bk+1 − f′(ξ) = Bk − f′(ξ) + (∆fk − Bk sk)sk^T/(sk^T sk)
= (Bk − f′(ξ))(I − sk sk^T/(sk^T sk)) + (∆fk − f′(ξ)sk)sk^T/(sk^T sk).

Taking norms, we have

‖Bk+1 − f′(ξ)‖ ≤ ‖Bk − f′(ξ)‖ ‖I − sk sk^T/(sk^T sk)‖ + ‖(∆fk − f′(ξ)sk)sk^T/(sk^T sk)‖.

Observe that ‖I − sk sk^T/(sk^T sk)‖ ≤ 1. The last term is estimated by

‖(∆fk − f′(ξ)sk)sk^T/(sk^T sk)‖ = ‖[f(xk+1) − f(xk) − f′(ξ)(xk+1 − xk)]sk^T/(sk^T sk)‖
≤ (γ/2)(‖xk+1 − ξ‖ + ‖xk − ξ‖)

by the preceding lemma. ⊕

Theorem 6.2.2 Let f : R^n → R^n be continuously differentiable on an open
convex set D ⊂ R^n. Suppose there exist ξ ∈ D and β, γ > 0 such that

1. f(ξ) = 0,
2. f′(ξ)⁻¹ exists,
3. ‖f′(ξ)⁻¹‖ ≤ β, and
4. ‖f′(x) − f′(y)‖ ≤ γ‖x − y‖ for x, y in a neighborhood of ξ.

Then there exist positive constants δ1, δ2 such that if ‖x0 − ξ‖ < δ1 and ‖B0 −
f′(ξ)‖ ≤ δ2, then Broyden's method is well defined, converges to ξ, and satisfies

‖xk+1 − ξ‖ ≤ ck‖xk − ξ‖   (6.20)

with lim_{k→∞} ck = 0. (This behavior is called superlinear convergence.)

(pf): Choose δ2 ≤ 1/(6β) and δ1 ≤ 2δ2/(5γ). Then ‖f′(ξ)⁻¹B0 − I‖ ≤ βδ2 ≤ 1/6. By
the Banach lemma, B0⁻¹ exists. So x1 can be defined. Furthermore,

‖B0⁻¹‖ = ‖(f′(ξ) + (B0 − f′(ξ)))⁻¹‖ ≤ ‖f′(ξ)⁻¹‖/(1 − ‖f′(ξ)⁻¹‖‖B0 − f′(ξ)‖) ≤ β/(1 − βδ2).   (6.21)

Thus

‖e1‖ := ‖x1 − ξ‖ = ‖x0 − B0⁻¹(f(x0) − f(ξ)) − ξ‖
= ‖−B0⁻¹[f(x0) − f(ξ) − B0(x0 − ξ)]‖
= ‖B0⁻¹[f(x0) − f(ξ) − f′(ξ)(x0 − ξ) + (f′(ξ) − B0)(x0 − ξ)]‖
≤ (β/(1 − βδ2)) [(γ/2)‖e0‖² + δ2‖e0‖] ≤ (β/(1 − βδ2)) [(γ/2)δ1 + δ2] ‖e0‖
≤ (β/(1 − βδ2)) (6δ2/5) ‖e0‖ ≤ (6/5)·(1/6)/(1 − 1/6) ‖e0‖ < (1/2)‖e0‖.

By Lemma 6.2.2, we know

‖B1 − f′(ξ)‖ ≤ ‖B0 − f′(ξ)‖ + (γ/2)(‖x1 − ξ‖ + ‖x0 − ξ‖)
≤ δ2 + (γ/2)(3/2)‖e0‖ ≤ δ2 (1 + (γ/2)(3/2)(2/(5γ)))
= (1 + 3/10)δ2 ≤ (3/2)δ2.

Thus ‖f′(ξ)⁻¹B1 − I‖ ≤ 2βδ2 ≤ 1/3. By the Banach lemma, B1⁻¹ exists and

‖B1⁻¹‖ ≤ ‖f′(ξ)⁻¹‖/(1 − ‖f′(ξ)⁻¹‖‖B1 − f′(ξ)‖) ≤ β/(1 − 2βδ2) ≤ (3/2)β.   (6.22)

We can now estimate

‖e2‖ := ‖x2 − ξ‖ = ‖x1 − B1⁻¹(f(x1) − f(ξ)) − ξ‖
= ‖−B1⁻¹[f(x1) − f(ξ) − B1 e1]‖
= ‖B1⁻¹[f(x1) − f(ξ) − f′(ξ)e1 + (f′(ξ) − B1)e1]‖
≤ (3β/2)[(γ/2)‖e1‖ + (3/2)δ2]‖e1‖ ≤ (3β/2)[(γ/2)(δ1/2) + (3/2)δ2]‖e1‖
≤ (3βδ2/2)[(γ/2)(1/2)(2/(5γ)) + 3/2]‖e1‖ ≤ (1/4)(16/10)‖e0‖ < (1/2)‖e0‖.

Continuing, we see that

‖B2 − f′(ξ)‖ ≤ ‖B1 − f′(ξ)‖ + (γ/2)(‖e2‖ + ‖e1‖)
≤ (13/10)δ2 + (γ/2)(3/2)‖e1‖ ≤ δ2 (1 + 3/10 + (γ/2)(3/2)(1/2)(2/(5γ)))
= (1 + 3/10 + (1/2)(3/10))δ2 ≤ 2δ2.

The proof is now completed by induction.


Remark. In addition to saving functional evaluations of f′(x), Broyden's method
has another important advantage: the matrix factorization of Bk+1 can
easily be updated.
For simplicity, we consider the basic form

B+ = Bc + uv^T   (6.23)

where u and v represent two column vectors in R^n. Suppose

Bc = Qc Rc   (6.24)

is already known. We want to find the QR decomposition of B+. Assume

B+ = Q+ R+.   (6.25)

Let w := Qc^T u. Then B+ = Qc(Rc + wv^T). If the QR decomposition of Rc + wv^T
is Q̃R̃, then B+ = (Qc Q̃)R̃ and we are done. But how do we find Q̃ and R̃? The point
is that wv^T is only a rank-one matrix. So the QR decomposition of Rc + wv^T
is much cheaper if we perform the orthogonalization process carefully.
We first recall the 2-dimensional rotation matrix.

Definition 6.2.2 A 2-dimensional rotation matrix is a matrix R(θ) of the form

R(θ) = [  cos θ   sin θ ]
       [ −sin θ   cos θ ],   (6.26)

or equivalently, a matrix R(α, β) of the form

R(α, β) = (1/√(α² + β²)) [  α   β ]
                         [ −β   α ].   (6.27)

Definition 6.2.3 A Jacobi rotation matrix is a matrix J(s, t; α, β) ∈ R^{n×n} which,
with ∆ := √(α² + β²), ᾱ := α/∆, β̄ := β/∆, agrees with the identity matrix except
in rows and columns s and t (the s-th and t-th rows), where

J(s, s) = J(t, t) = ᾱ,   J(s, t) = β̄,   J(t, s) = −β̄.   (6.28)

Remarks. (1) It is easy to see that J(s, t; α, β) applied to a vector a = (a1, . . . , an)^T
leaves all entries unchanged except the s-th and t-th, which become

(Ja)_s = ᾱ a_s + β̄ a_t,   (Ja)_t = −β̄ a_s + ᾱ a_t.   (6.29)

(2) Let v = [v1, v2]^T be a 2-dimensional vector. Then R(θ)v rotates v clockwise
through the angle θ (with the convention (6.26)). Indeed, R(v1, v2)v = [‖v‖, 0]^T and
R(v2, −v1)v = [0, ‖v‖]^T.
Consider now the QR decomposition of the matrix Rc + wv^T. Note that the rows of
wv^T are w1 v^T, . . . , wn v^T. Let c1 := √(w_{n−1}² + wn²). Then with
Q̂1 := J(n−1, n; w_{n−1}, wn) we have

Q̂1 wv^T = [w1 v^T, . . . , w_{n−2} v^T, c1 v^T, 0]^T,

that is, the last row is annihilated and the (n−1)-st row becomes c1 v^T. Letting
c2 := √(w_{n−2}² + c1²) and Q̂2 := J(n−2, n−1; w_{n−2}, c1), we have

Q̂2 Q̂1 wv^T = [w1 v^T, . . . , w_{n−3} v^T, c2 v^T, 0, 0]^T.

Continuing this process for n − 1 iterations, we obtain

Q̂_{n−1} . . . Q̂1 (Rc + wv^T) = Q̂_{n−1} . . . Q̂1 Rc + [‖w‖ v^T, 0, . . . , 0]^T.   (6.30)

We note that Q̂_{n−1} . . . Q̂1 Rc is at worst an upper Hessenberg matrix. So the
matrix Rc + wv^T has been reduced by rotation matrices to an upper Hessenberg
matrix. We can now continue with a sequence of plane rotations to change the
upper Hessenberg matrix into an upper triangular matrix (recall how the QR
algorithm works!).
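To make the two sweeps of rotations concrete, here is a NumPy sketch (an added illustration, not from the original notes) of the O(n²) rank-one QR update just described; it assumes a square Qc and uses explicit 2-by-2 Givens blocks in place of the Jacobi rotations J(s, t; α, β).

import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def qr_rank_one_update(Q, R, u, v):
    """Given B = Q R, return QR factors of B + u v^T using plane rotations."""
    n = R.shape[0]
    Q, R = Q.copy(), R.astype(float).copy()
    w = Q.T @ u
    # sweep 1: rotate w into ||w|| e_1 from the bottom up; R becomes Hessenberg
    for i in range(n - 1, 0, -1):
        c, s = givens(w[i - 1], w[i])
        G = np.array([[c, s], [-s, c]])
        w[i - 1:i + 1] = G @ w[i - 1:i + 1]
        R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
        Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
    R[0, :] += w[0] * v                       # add ||w|| e_1 v^T, cf. (6.30)
    # sweep 2: chase the subdiagonal to restore upper triangular form
    for i in range(n - 1):
        c, s = givens(R[i, i], R[i + 1, i])
        G = np.array([[c, s], [-s, c]])
        R[i:i + 2, :] = G @ R[i:i + 2, :]
        Q[:, i:i + 2] = Q[:, i:i + 2] @ G.T
    return Q, R

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q, R = np.linalg.qr(B)
u, v = rng.standard_normal(5), rng.standard_normal(5)
Q1, R1 = qr_rank_one_update(Q, R, u, v)
print(np.allclose(Q1 @ R1, B + np.outer(u, v)))   # expect True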

6.3 Sturm Sequences


The fundamental theorem of algebra asserts that a polynomial p(x) of degree n,

p(x) = an x^n + a_{n−1} x^{n−1} + . . . + a1 x + a0,   (6.31)

has exactly n (complex) roots, counting multiplicities. If all the coefficients are
real numbers, it is often desirable to determine the number of real roots of p(x)
in a specified interval (a, b), where either a or b may be infinite. Toward this end,
the concept of Sturm sequences offers a very useful technique.
Definition 6.3.1 A sequence

p(x) = f1(x), f2(x), . . . , fm(x)   (6.32)

of real polynomials is called a Sturm sequence on an interval (a, b) if

1. The last polynomial fm(x) does not vanish in (a, b);
2. At any zero ξ of fk(x), k = 2, . . . , m − 1, the two adjacent polynomials are
nonzero and have opposite signs, that is,

fk−1(ξ) fk+1(ξ) < 0.   (6.33)

Definition 6.3.2 Let {fk(x)} be a Sturm sequence on (a, b), and let x0 ∈ (a, b)
be a point at which f1(x0) ≠ 0. We define V(x0) to be the number of changes of sign
of {fk(x0)}, zero values being ignored. If a is finite, then V(a) is defined as
V(a + ε), where ε is such that no fk(x) vanishes in (a, a + ε). If a = −∞, then
V(a) is defined to be the number of changes of sign of {lim_{x→−∞} fk(x)}. Similarly,
V(b) is defined.

Definition 6.3.3 Let R(x) be any rational function. The Cauchy index of R(x)
on (a, b), denoted by Iab R(x), is defined to be the difference between the number
of jumps of R(x) from −∞ to +∞ and the number of jumps from +∞ to −∞
as x goes from a to b, excluding the endpoints. That is, at every pole of R(x)
in (a, b) add 1 to the Cauchy index if R(x) → −∞ on the left of the pole and
R(x) → +∞ on the right of the pole, and subtract 1 if vice versa.

Theorem 6.3.1 (Sturm) If fk(x), k = 1, . . . , m, is a Sturm sequence on an
interval (a, b) and neither f1(a) nor f1(b) equals 0, then

I_a^b (f2(x)/f1(x)) = V(a) − V(b).   (6.34)

(pf): We first claim that the value of V(x) does not change when x passes
through a zero of fk(x), k = 2, . . . , m − 1. To see this, suppose fk(ξ) = 0.
Then by (6.33), fk+1(ξ)fk−1(ξ) < 0. If fk(x) changes sign at x = ξ, then for a
sufficiently small perturbation h > 0 the signs of fk−1(x), fk(x) and fk+1(x) at
ξ − h, ξ, ξ + h display one of the following four patterns (rows are fk−1, fk, fk+1):

  − − −     + + +     − − −     + + +
  − 0 +     − 0 +     + 0 −     + 0 −
  + + +     − − −     + + +     − − −

In each case, V(ξ − h) = V(ξ) = V(ξ + h). This is also true if fk(x) does not change
sign at x = ξ. Thus V(x) can change only when f1(x) goes through 0. If ξ is a zero of
f1(x), it is not a zero of f2(x) because of property 2 of Sturm sequences. Therefore,
f2(x) has the same sign on both sides of ξ. If ξ is a zero of f1(x) of even multiplicity,
then V(x) does not change as x increases through ξ and there is no contribution to the
Cauchy index. If the zero is of odd multiplicity, then V(x) will increase by 1 if
f1(x) and f2(x) have the same sign to the left of ξ (patterns (− 0 +) over (− − −),
or (+ 0 −) over (+ + +), for f1 over f2), and will decrease by 1 if the signs to the
left are different (patterns (− 0 +) over (+ + +), or (+ 0 −) over (− − −)).
Correspondingly, for zeros of odd multiplicity there is a −1 contribution to the
Cauchy index if the signs of f1(x) and f2(x) are the same to the left of ξ, and a +1
contribution if they are different. This establishes the theorem. ⊕
We now apply the Sturm theorem to find the real roots of p(x) in an interval
(a, b). Consider the sequence of functions fk (x), k = 1, . . . , m where

f1(x) = p(x)
f2(x) = p′(x)
fj−1(x) = qj−1(x) fj(x) − fj+1(x),  j = 2, . . . , m − 1   (6.35)
fm−1(x) = qm−1(x) fm(x)

where qj−1 (x) is the quotient and fj+1 (x) is the negative of the remainder
when fj−1 (x) is divided by fj (x). Thus {fk (x)} is a sequence of polyno-
mials of decreasing degree which eventually must terminate in a polynomial
fm (x), m ≤ n + 1, which divides fm−1 (x) (why?). The polynomial fm (x) is the

greatest common divisor of f1 (x) and f2 (x) and also of every other member of
the sequence (6.35). (This is the so-called Euclidean algorithm.)
Suppose fm(x) does not vanish in (a, b), so that the first condition of Definition
6.3.1 is satisfied. If fk(ξ) = 0 for some k, k = 2, . . . , m − 1, then fk−1(ξ) =
−fk+1(ξ) by (6.35). Moreover, fk+1(ξ) ≠ 0, since otherwise fm(ξ) would
also be 0 (why?). Thus the sequence {fk(x)} is a Sturm sequence when fm(x)
does not vanish in (a, b).
If fm(x) does vanish in (a, b), then we use the sequence {fk(x)/fm(x)} instead.
Not only is this a Sturm sequence, but both sides of (6.34) are the same for this
sequence and for the sequence {fk(x)}. Therefore, we can use the two sequences
interchangeably in applying Sturm's theorem.
Now for the sequence {fk(x)} defined by (6.35), we write

f2(x)/f1(x) = p′(x)/p(x) = Σ_{j=1}^{p} nj/(x − aj) + R1(x)   (6.36)

where the aj, j = 1, . . . , p, are the distinct real zeros of p(x), nj is the multiplicity
of the zero aj, and R1(x) has no poles on the real axis. Since the nj are all
positive, I_a^b (p′(x)/p(x)) is equal to the number of distinct real zeros of p(x) in
the interval (a, b). Therefore, we have the following theorem:
Theorem 6.3.2 The number of distinct real zeros of the polynomial p(x) in the
interval (a, b) is equal to V(a) − V(b) if neither p(a) nor p(b) is equal to 0.
Example. Consider the polynomial

p(x) = x^6 + 4x^5 + 4x^4 − x^2 − 4x − 4.   (6.37)

Using (6.35), we calculate

f1(x) = x^6 + 4x^5 + 4x^4 − x^2 − 4x − 4
f2(x) = 6x^5 + 20x^4 + 16x^3 − 2x − 4
f3(x) = 4x^4 + 8x^3 + 3x^2 + 14x + 16   (6.38)
f4(x) = x^3 + 6x^2 + 12x + 8
f5(x) = −17x^2 − 58x − 48
f6(x) = −x − 2

where the coefficients have been made integers by multiplying by suitable positive
constants. For some sample values of x the signs of the fk(x) are:

                    −∞    ∞    0    −1   +1   −24/17
f1(x)                +    +    −     0    0      +
f2(x)                −    +    −     −    +      −
f3(x)                +    +    +     +    +      −        (6.39)
f4(x)                −    +    +     +    +      +
f5(x)                −    −    −     −    −      0
f6(x)                +    −    −     −    −      −
# of sign changes    4    1    2     2    1      3

Thus we conclude there are three distinct real zeros, two negative and one posi-
tive. Moreover, there are two distinct zeros in (−∞, −1] and three in (−∞, +1].
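As an illustration (added here, not in the original notes), the following Python sketch builds the sequence (6.35) with numpy's polynomial helpers and counts sign changes; the use of ±10^6 as stand-ins for ±∞ is an arbitrary choice.

import numpy as np

def sturm_sequence(p):
    """Build the Sturm sequence (6.35) for a numpy.poly1d polynomial p."""
    seq = [p, p.deriv()]
    while seq[-1].order > 0:
        _, rem = np.polydiv(seq[-2].coeffs, seq[-1].coeffs)
        if np.allclose(rem, 0):
            break
        seq.append(np.poly1d(-rem))       # negative of the remainder
    return seq

def V(seq, x):
    """Number of sign changes of the sequence at x, zero values ignored."""
    signs = [s for s in (np.sign(f(x)) for f in seq) if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)

# p(x) = x^6 + 4x^5 + 4x^4 - x^2 - 4x - 4, as in the example above
p = np.poly1d([1, 4, 4, 0, -1, -4, -4])
seq = sturm_sequence(p)
print(V(seq, -1e6) - V(seq, 1e6))   # distinct real zeros: expect 3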

6.4 Bairstow’s Method


A real polynomial may have complex conjugate roots. In order to find the
complex roots, most methods would have to begin at a complex starting point
and be carried out in complex arithmetic. Bairstow’s method avoids complex
arithmetic.
The roots of a quadratic polynomial

d(x) = x2 − rx − q (6.40)

are obviously known. For a polynomial p(x), we write

p(x) = p1 (x)(x2 − rx − q) + Ax + B. (6.41)

The coefficients of the remainder depend upon r and q. The idea of Bairstow’s
method is to determine r and q so that

A(r, q) = 0 (6.42)
B(r, q) = 0 (6.43)

Applying Newton's method to (6.42)–(6.43), we need to compute

( r_{i+1} )   ( r_i )   ( ∂A/∂r  ∂A/∂q )⁻¹            ( A(r_i, q_i) )
( q_{i+1} ) = ( q_i ) − ( ∂B/∂r  ∂B/∂q )  |_(r_i,q_i)  ( B(r_i, q_i) ).   (6.44)

Upon differentiating, we observe that

0 ≡ ∂p(x)/∂r = (∂p1/∂r)(x)(x² − rx − q) − p1(x) x + (∂A/∂r) x + ∂B/∂r   (6.45)
0 ≡ ∂p(x)/∂q = (∂p1/∂q)(x)(x² − rx − q) − p1(x) + (∂A/∂q) x + ∂B/∂q.   (6.46)
Let p1(x) be further divided by d(x) and denote

p1(x) = p2(x)(x² − rx − q) + Ãx + B̃.   (6.47)

Assuming the two roots x0 and x1 of d(x) are distinct, it follows that

p1(xi) = Ãxi + B̃.   (6.48)

Substituting (6.48) into (6.45) and (6.46) at x = xi, we obtain

−xi(Ãxi + B̃) + (∂A/∂r) xi + ∂B/∂r = 0   (6.49)
−(Ãxi + B̃) + (∂A/∂q) xi + ∂B/∂q = 0.   (6.50)

From the second set of equations (6.50) we have (since x0 ≠ x1)

∂A/∂q = Ã,   ∂B/∂q = B̃.   (6.51)

From the first set of equations (6.49), using xi² = rxi + q, we have

−Ã(rxi + q) + xi(∂A/∂r − B̃) + ∂B/∂r = 0.   (6.52)

Therefore, we know

∂A/∂r = B̃ + Ãr   (6.53)
∂B/∂r = Ãq.   (6.54)
The values of A, B can be obtained without difficulty. Suppose

p(x) = an x^n + . . . + a1 x + a0   (6.55)
p1(x) = b_{n−2} x^{n−2} + . . . + b0.   (6.56)

By comparing coefficients in (6.41), we obtain

b_{n−2} = an,   (6.57)
b_{n−3} = b_{n−2} r + a_{n−1},   (6.58)
b_{n−k} = b_{n−k+2} q + b_{n−k+1} r + a_{n−k+2},  k = 4, . . . , n,   (6.59)
A = b1 q + b0 r + a1,   (6.60)
B = b0 q + a0.   (6.61)

Similarly, by using (6.47), we can obtain the values of à and B̃.
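The Newton iteration (6.44) on (r, q) can be sketched as follows (an added illustration, not from the original notes); it computes A, B, Ã, B̃ by polynomial division instead of the recurrences (6.57)-(6.61), and the test polynomial x⁴ − 1 with the starting guess (r, q) = (0, −0.5) is an arbitrary choice that converges to the factor x² + 1.

import numpy as np

def divide_by_quadratic(coeffs, r, q):
    """p = p1*(x^2 - r x - q) + A x + B, coefficients from high to low degree."""
    quot, rem = np.polydiv(coeffs, [1.0, -r, -q])
    rem = np.atleast_1d(rem)
    A = rem[-2] if rem.size > 1 else 0.0
    return quot, A, rem[-1]

def bairstow(coeffs, r=0.0, q=0.0, tol=1e-12, max_iter=100):
    """One Bairstow pass: find r, q so that x^2 - r x - q divides p(x)."""
    for _ in range(max_iter):
        p1, A, B = divide_by_quadratic(coeffs, r, q)
        if abs(A) + abs(B) < tol:
            break
        _, At, Bt = divide_by_quadratic(p1, r, q)   # A-tilde, B-tilde
        # Jacobian of (A, B) w.r.t. (r, q), cf. (6.51), (6.53), (6.54)
        J = np.array([[At * r + Bt, At],
                      [At * q,      Bt]])
        dr, dq = np.linalg.solve(J, [-A, -B])
        r, q = r + dr, q + dq
    return np.roots([1.0, -r, -q]), r, q

roots, r, q = bairstow([1.0, 0.0, 0.0, 0.0, -1.0], r=0.0, q=-0.5)
print(roots, r, q)   # converges to r = 0, q = -1, i.e. the complex pair +/- i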


Chapter 7

Approximation Theory

The primary aim of a general approximation is to represent non-arithmetic


quantities by arithmetic quantities so that the accuracy can be ascertained to
a desired degree. Secondly, we are also concerned with the amount of compu-
tation required to achieve this accuracy. These general notions are applicable
to functions f (x) as well as to functionals F (f ) (A functional is a mapping
from the set of functions to the set of real or complex numbers). Typical ex-
amples of quantities to be approximated are transcendental functions, integrals
and derivatives of functions, and solutions of differential or algebraic equations.
Depending upon the nature of the quantity to be approximated, different techniques are used
for different problems.

A complicated function f (x) usually is approximated by an easier function


of the form φ(x; a0 , . . . , an ) where a0 , . . . , an are parameters to be determined
so as to characterize the best approximation of f . Depending on the sense in
which the approximation is realized, there are three types of approaches:

1. Interpolatory approximation: The parameters ai are chosen so that on a


fixed prescribed set of points xi , i = 0, 1, . . . , n, we have

φ(xi ; a0 , . . . , an ) = f (xi ) := fi . (7.1)

Sometimes, we even further require that, for each i, the first ri derivatives
of φ agree with those of f at xi .


2. Least-squares approximation: The parameters ai are chosen so as to minimize

‖f(x) − φ(x; a0, . . . , an)‖2.   (7.2)

3. Min-max approximation: The parameters ai are chosen so as to minimize

‖f(x) − φ(x; a0, . . . , an)‖∞.   (7.3)

Definition 7.0.1 We say φ is a linear approximation of f if φ depends linearly
on the parameters ai, that is, if

φ(x; a0, . . . , an) = a0 ϕ0(x) + . . . + an ϕn(x)   (7.4)

where the ϕi(x) are given and fixed functions.

Choosing ϕi(x) = x^i, the approximating function φ becomes a polynomial.
In this case, the theory for all three types of approximation above is well
established. The solution of the min-max approximation problem is the so-called
Chebyshev polynomial. We state without proof two fundamental results
concerning the first two types of approximation:

Theorem 7.0.1 Let f(x) be a piecewise continuous function over the interval
[a, b]. Then for any ε > 0, there exist an integer n and numbers a0, . . . , an such
that ∫_a^b {f(x) − Σ_{i=0}^{n} ai x^i}² dx < ε.

Theorem 7.0.2 (Weierstrass Approximation Theorem) Let f(x) be a continuous
function on [a, b]. For any ε > 0, there exist an integer n and a polynomial
pn(x) of degree n such that max_{x∈[a,b]} |f(x) − pn(x)| < ε. In fact, if [a, b] = [0, 1],
then the Bernstein polynomial

Bn(x) := Σ_{k=0}^{n} C(n, k) x^k (1 − x)^{n−k} f(k/n)   (7.5)

converges to f(x) as n → ∞.

In this chapter, we shall consider only interpolatory approximation.
Choosing ϕj(x) = x^j, we have the so-called polynomial interpolation; choosing
ϕj(x) = e^{ijx}, we have the so-called trigonometric interpolation. The so-called
rational interpolation, where

φ(x; a0, . . . , an, b0, . . . , bm) = (a0 ϕ0(x) + . . . + an ϕn(x)) / (b0 ϕ0(x) + . . . + bm ϕm(x)),   (7.6)

is an important non-linear interpolation.

7.1 Lagrangian Interpolation Formula

Theorem 7.1.1 Let f ∈ C[a, b]. Let xi, i = 1, . . . , n, be n distinct points in
[a, b]. There exists a unique polynomial p(x) of degree ≤ n − 1 such that p(xi) =
f(xi). In fact,

p(x) = Σ_{i=1}^{n} f(xi) ℓi(x)   (7.7)

where

ℓi(x) := Π_{j=1, j≠i}^{n} (x − xj)/(xi − xj).   (7.8)

In the case when f ∈ C^n[a, b], we also have

E(x) := f(x) − p(x) = (f^{(n)}(ξ)/n!) Π_{j=1}^{n} (x − xj)   (7.9)

where min{x1, . . . , xn, x} < ξ < max{x1, . . . , xn, x}.

(pf): Suppose p(x) = Σ_{k=0}^{n−1} ak x^k where the coefficients ak are to be
determined. Then the conditions p(xi) = Σ_{k=0}^{n−1} ak xi^k = f(xi), i = 1, . . . , n,
can be written in the form

[ 1   x1   . . .   x1^{n−1} ] [ a0      ]   [ f(x1) ]
[ .    .            .       ] [  .      ] = [   .   ]   (7.10)
[ 1   xn   . . .   xn^{n−1} ] [ a_{n−1} ]   [ f(xn) ].

The coefficient matrix, known as the Vandermonde matrix, has determinant
Π_{i>j} (xi − xj). Since all the xi are distinct, we can uniquely solve (7.10) for the
unknowns a0, . . . , a_{n−1}.

Note that each ℓi(x) is a polynomial of degree n − 1 and ℓi(xj) = δij, the
Kronecker delta. Therefore, by uniqueness, (7.7) is proved.
Let x0 ∈ [a, b] and x0 ≠ xi for any i = 1, . . . , n. Construct the C^n-function

F(x) = f(x) − p(x) − (f(x0) − p(x0)) Π_{i=1}^{n} (x − xi) / Π_{i=1}^{n} (x0 − xi).

It is easy to see that F(xi) = 0 for i = 0, 1, . . . , n. By Rolle's theorem, there
exists ξ between x0, . . . , xn such that F^{(n)}(ξ) = 0. It follows that

f^{(n)}(ξ) − (f(x0) − p(x0)) n! / Π_{i=1}^{n} (x0 − xi) = 0.

Thus E(x0) = f(x0) − p(x0) = (f^{(n)}(ξ)/n!) Π_{i=1}^{n} (x0 − xi). Since x0 is arbitrary, the
theorem is proved. ⊕

Definition 7.1.1 The polynomial p(x) defined by (7.7) is called the Lagrange
interpolation polynomial.

Remark. The evaluation of a polynomial p(x) = a0 + a1 x + . . . + an xn for


x = ξ may be done by the so called Horner scheme:

p(ξ) = (. . . ((an ξ + an−1 )ξ + an−2 )ξ . . . + a1 )ξ + a0 (7.11)

which only takes n multiplications and n additions.
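A tiny Python illustration of the Horner scheme (7.11), added here for concreteness (the test polynomial is an arbitrary choice):

def horner(coeffs, x):
    """Evaluate p(x) = a_n x^n + ... + a_0 with n multiplications, cf. (7.11).
    coeffs are listed from the highest degree a_n down to a_0."""
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result

# p(x) = 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3
print(horner([2, -6, 2, -1], 3))   # -> 5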

Remark. While theoretically important, Lagrange’s formula is, in general, not


efficient for applications. The efficiency is especially bad when new interpolating
points are added, since then the entire formula is changed. In contrast, Newton’s
interpolation formula, being equivalent to the Lagrange’s formula mathemati-
cally, is much more efficient.

Remark. Suppose polynomials are used to interpolate the function

f(x) = 1/(1 + 25x²)   (7.12)

in the interval [−1, 1] at equally spaced points. Runge (1901) discovered that as
the degree n of the interpolating polynomial pn(x) tends toward infinity, pn(x)
diverges in the intervals 0.726 . . . ≤ |x| < 1, while pn(x) works pretty well in the
central portion of the interval.

7.2 Newton’s Interpolation Formula

Interpolating a function by a very high degree polynomial is not advisable in


practice. One reason is because we have seen the danger of evaluating high de-
gree polynomials (e.g. the Wilkinson’s polynomial and the Runge’s function).
Another reason is because local interpolation (as opposed to global interpola-
tion) usually is sufficient for approximation.

One usually starts to interpolate a function over a smaller set of support
points. If this approximation is not accurate enough, one then updates the current
interpolating polynomial by adding more support points. Unfortunately, each
time the data set is changed, Lagrange's formula must be entirely recomputed.
For this reason, Newton's interpolation formula is preferred to Lagrange's
interpolation formula.

Let Pi0 i1 ...ik (x) represent the k-th degree polynomial for which

Pi0 i1 ...ik (xij ) = f (xij ) (7.13)

for j = 0, . . . , k.

Theorem 7.2.1 The recursion formula

P_{i0 i1 ... ik}(x) = [(x − x_{i0}) P_{i1 ... ik}(x) − (x − x_{ik}) P_{i0 ... i_{k−1}}(x)] / (x_{ik} − x_{i0})   (7.14)

holds.

(pf): Denote the right-hand side of (7.14) by R(x). Observe that R(x) is a
polynomial of degree ≤ k. By definition, it is easy to see that R(x_{ij}) = f(x_{ij})
for all j = 0, . . . , k. That is, R(x) interpolates the same set of data as does the
polynomial P_{i0 i1 ... ik}(x). By Theorem 7.1.1 the assertion is proved. ⊕

The difference P_{i0 i1 ... ik}(x) − P_{i0 i1 ... i_{k−1}}(x) is a k-th degree polynomial which
vanishes at x_{ij} for j = 0, . . . , k − 1. Thus we may write

P_{i0 i1 ... ik}(x) = P_{i0 i1 ... i_{k−1}}(x) + f_{i0 ... ik} (x − x_{i0})(x − x_{i1}) . . . (x − x_{i_{k−1}}).   (7.15)

The leading coefficients f_{i0 ... ik} can be determined recursively from the formula
(7.14), i.e.,

f_{i0 ... ik} = (f_{i1 ... ik} − f_{i0 ... i_{k−1}}) / (x_{ik} − x_{i0})   (7.16)

where f_{i1 ... ik} and f_{i0 ... i_{k−1}} are the leading coefficients of the polynomials P_{i1 ... ik}(x)
and P_{i0 ... i_{k−1}}(x), respectively.

Remark. Note that the formula (7.16) starts from fi0 = f (xi0 ).

Remark. The polynomial Pi0 ...ik (x) is uniquely determined by the set of sup-
port data {(xij , fij )}. The polynomial is invariant to any permutation of the
indices i0 , . . . , ik . Therefore, the divided differences (7.16) are invariant to per-
mutation of the indices.

Definition 7.2.1 Let x0, . . . , xk be support arguments (not necessarily in any
particular order) over the interval [a, b]. We define Newton's divided differences as
follows:

f[x0] := f(x0)   (7.17)
f[x0, x1] := (f[x1] − f[x0]) / (x1 − x0)   (7.18)
f[x0, . . . , xk] := (f[x1, . . . , xk] − f[x0, . . . , xk−1]) / (xk − x0).   (7.19)

It follows that the k-th degree polynomial that interpolates the set of support
data {(xi, fi) | i = 0, . . . , k} is given by

P_{x0 ... xk}(x) = f[x0] + f[x0, x1](x − x0)
+ . . . + f[x0, . . . , xk](x − x0)(x − x1) . . . (x − xk−1).   (7.20)
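The following Python sketch (an added illustration, not part of the original notes) builds the divided-difference coefficients (7.17)-(7.19) and evaluates the Newton form (7.20) with a Horner-like recurrence; the test function 1/(1+x) and its nodes are arbitrary choices.

import numpy as np

def divided_differences(x, f):
    """Return f[x0], f[x0,x1], ..., f[x0,...,xk] for the Newton form (7.20)."""
    x = np.asarray(x, dtype=float)
    coef = np.array(f, dtype=float)
    for j in range(1, len(x)):
        # after pass j, coef[i] = f[x_{i-j}, ..., x_i] for i >= j
        coef[j:] = (coef[j:] - coef[j-1:-1]) / (x[j:] - x[:-j])
    return coef

def newton_eval(x_nodes, coef, t):
    """Evaluate (7.20) at t using nested multiplication."""
    result = coef[-1]
    for c, xi in zip(coef[-2::-1], x_nodes[-2::-1]):
        result = result * (t - xi) + c
    return result

xs = [0.0, 1.0, 2.0, 3.0]
fs = [1.0 / (1.0 + x) for x in xs]
c = divided_differences(xs, fs)
print(newton_eval(xs, c, 1.5), 1 / 2.5)   # interpolant vs exact value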

7.3 Osculatory Interpolation

Given nodes {xi}, i = 1, . . . , k, and values ai^(0), . . . , ai^(ri), where the ri are
nonnegative integers, we want to construct a polynomial P(x) such that

P^(j)(xi) = ai^(j)   (7.21)

for i = 1, . . . , k and j = 0, . . . , ri. Such a polynomial is said to be an osculatory
interpolating polynomial of a function f if ai^(j) = f^(j)(xi).

Remark. The degree of P(x) is at most Σ_{i=1}^{k} (ri + 1) − 1.

Theorem 7.3.1 Given the nodes {xi}, i = 1, . . . , k, and values {ai^(j)}, j = 0, . . . , ri,
there exists a unique polynomial satisfying (7.21).

(pf): For i = 1, . . . , k, denote

qi(x) = ci^(0) + ci^(1)(x − xi) + . . . + ci^(ri)(x − xi)^{ri}   (7.22)
P(x) = q1(x) + (x − x1)^{r1+1} q2(x) + . . .
     + (x − x1)^{r1+1}(x − x2)^{r2+1} . . . (x − x_{k−1})^{r_{k−1}+1} qk(x).   (7.23)

Then P(x) is of degree ≤ Σ_{i=1}^{k}(ri + 1) − 1. Now P^(j)(x1) = a1^(j) for j = 0, . . . , r1
implies a1^(0) = c1^(0), . . . , a1^(j) = c1^(j) j!. So q1(x) is determined with c1^(j) = a1^(j)/j!.
Now we rewrite (7.23) as

R(x) := (P(x) − q1(x)) / (x − x1)^{r1+1} = q2(x) + (x − x2)^{r2+1} q3(x) + . . .

Note that R^(j)(x2) is known for j = 0, . . . , r2 since P^(j)(x2) is known. Thus
all c2^(j), hence q2(x), may be determined. This procedure can be continued to
determine all the qi(x). For uniqueness, suppose Q(x) = P1(x) − P2(x) where P1(x)
and P2(x) are two polynomials satisfying the theorem. Then Q(x) is of degree
≤ Σ_{i=1}^{k}(ri + 1) − 1, and has a zero at each xi of multiplicity ri + 1. Counting
multiplicities, Q(x) has Σ_{i=1}^{k}(ri + 1) zeros. This is possible only if Q(x) ≡ 0.

Examples. (1) Suppose k = 1, x1 = a, r1 = n − 1. Then the polynomial (7.23)
becomes P(x) = Σ_{j=0}^{n−1} f^(j)(a)(x − a)^j / j!, which is the Taylor polynomial of f at
x = x1.

(2) Suppose ri = 1 for all i = 1, . . . , k; that is, suppose values of f(xi) and
f′(xi) are to be interpolated. Then the resulting (2k − 1)-degree polynomial
is called the Hermite interpolating polynomial. Recall that the (k − 1)-degree
polynomial

ℓi(x) = Π_{j=1, j≠i}^{k} (x − xj)/(xi − xj)   (7.24)

has the property

ℓi(xj) = δij.   (7.25)

Define

hi(x) = [1 − 2(x − xi)ℓi′(xi)] ℓi²(x)   (7.26)
gi(x) = (x − xi) ℓi²(x).   (7.27)

Note that both hi(x) and gi(x) are of degree 2k − 1. Furthermore,

hi(xj) = δij;
gi(xj) = 0;   (7.28)
hi′(xj) = [1 − 2(x − xi)ℓi′(xi)] 2ℓi(x)ℓi′(x) − 2ℓi′(xi)ℓi²(x) |_{x=xj} = 0;
gi′(xj) = (x − xi) 2ℓi(x)ℓi′(x) + ℓi²(x) |_{x=xj} = δij.

So the Hermite interpolating polynomial can be written down as

P(x) = Σ_{i=1}^{k} [f(xi) hi(x) + f′(xi) gi(x)].   (7.29)

(3) Suppose ri = 0 for all i. Then the polynomial becomes P(x) = c1 + c2(x −
x1) + . . . + ck(x − x1) . . . (x − xk−1), which is exactly Newton's formula.
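For example (2), the Hermite formula (7.29) can be sketched directly in Python (an added illustration; the test data f = sin, f′ = cos at three nodes is an arbitrary choice):

import numpy as np

def hermite_interpolant(x_nodes, f_vals, fp_vals):
    """Callable Hermite interpolant built from (7.24)-(7.29)."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    k = len(x_nodes)

    def ell(i, t):
        return np.prod([(t - x_nodes[j]) / (x_nodes[i] - x_nodes[j])
                        for j in range(k) if j != i])

    def ell_prime_at_node(i):
        # l_i'(x_i) = sum_{j != i} 1 / (x_i - x_j)
        return sum(1.0 / (x_nodes[i] - x_nodes[j]) for j in range(k) if j != i)

    def P(t):
        total = 0.0
        for i in range(k):
            li = ell(i, t)
            hi = (1.0 - 2.0 * (t - x_nodes[i]) * ell_prime_at_node(i)) * li**2
            gi = (t - x_nodes[i]) * li**2
            total += f_vals[i] * hi + fp_vals[i] * gi
        return total
    return P

xs = [0.0, np.pi / 2, np.pi]
P = hermite_interpolant(xs, np.sin(xs), np.cos(xs))
print(P(1.0), np.sin(1.0))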

7.4 Spline Interpolation

Thus far for a given function f of an interval [a, b], the interpolation has been
to construct a polynomial over the entire interval [a, b]. There are at least two
disadvantages for the global approximation:

1. For better accuracy, we need to supply more support data. But then
the degree of the resultant polynomial gets higher, and such a polynomial is
difficult to work with.
2. Suppose f is not smooth enough. Then the error estimate for a high-degree
polynomial is difficult to establish. In fact, it is not clear whether
or not the accuracy will increase with an increasing number of support
data.

As an alternative way of approximation, spline interpolation is a local
approximation of a function f which, nonetheless, yields globally smooth curves
and is less likely to exhibit the large oscillations characteristic of high-degree
polynomials. (Ref: A Practical Guide to Splines, Springer-Verlag, 1978, by C.
de Boor.)

We demonstrate the idea with the cubic spline as follows.

Definition 7.4.1 Let the interval [a, b] be partitioned into a = x1 < x2 < . . . <
xn = b. A function p(x) is said to be a cubic spline of f on the partition if

1. The restriction of p(x) on each subinterval [xi , xi+1 ], i = 1, . . . , n − 1 is a


cubic polynomial;
2. p(xi ) = f (xi ), i = 1, . . . , n;
3. p′(xi) and p″(xi) are continuous at each xi, i = 2, . . . , n − 1.

Since there are n − 1 subintervals, condition (1) requires totally 4(n − 1)


coefficients to be determined. Condition (2) is equivalent to 2(n − 2) + 2 equa-
tions. Condition (3) is equivalent to (n − 2) + (n − 2) equations. Thus we still
need two more conditions to completely determine the cubic spline.

Definition 7.4.2 A cubic spline p(x) of f is said to be

1. A clamped spline if p′(x1) and p′(xn) are specified.
2. A natural spline if p″(x1) = 0 and p″(xn) = 0.
3. A periodic spline if p(x1) = p(xn), p′(x1) = p′(xn) and p″(x1) = p″(xn).

Denote Mi := p″(xi), i = 1, . . . , n. Since p(x) is piecewise cubic and continuously
differentiable, p″(x) is piecewise linear and continuous on [a, b]. In
particular, over the interval [xi, xi+1] we have

p″(x) = Mi (x − xi+1)/(xi − xi+1) + Mi+1 (x − xi)/(xi+1 − xi).   (7.30)

Upon integrating p″(x) twice, we obtain

p(x) = [(xi+1 − x)³ Mi + (x − xi)³ Mi+1] / (6δi) + ci(xi+1 − x) + di(x − xi)   (7.31)

where δi := xi+1 − xi, and ci and di are constants of integration. By setting p(xi) =
f(xi), p(xi+1) = f(xi+1), we get

ci = f(xi)/δi − δi Mi/6   (7.32)
di = f(xi+1)/δi − δi Mi+1/6.   (7.33)

Thus on [xi, xi+1],

p(x) = [(xi+1 − x)³ Mi + (x − xi)³ Mi+1] / (6δi)
     + [(xi+1 − x) f(xi) + (x − xi) f(xi+1)] / δi
     − (δi/6)[(xi+1 − x) Mi + (x − xi) Mi+1].   (7.34)
It only remains to determine the Mi. We first use the continuity condition of p′(x)
at xi for i = 2, . . . , n − 1, that is,

lim_{x→xi⁻} p′(x) = lim_{x→xi⁺} p′(x).   (7.35)

Thus

3δ²_{i−1} Mi/(6δ_{i−1}) + (f(xi) − f(xi−1))/δ_{i−1} − δ_{i−1}(Mi − Mi−1)/6
= −3δi² Mi/(6δi) + (f(xi+1) − f(xi))/δi − δi(Mi+1 − Mi)/6,   (7.36)

or equivalently,

(δ_{i−1}/6) Mi−1 + ((δi + δ_{i−1})/3) Mi + (δi/6) Mi+1
= (f(xi+1) − f(xi))/δi − (f(xi) − f(xi−1))/δ_{i−1}.   (7.37)

Note that we have n − 2 equations in n unknowns. Suppose, for example, we
work with the clamped spline, that is,

p′(x1) = f′(x1)   (7.38)
p′(xn) = f′(xn).   (7.39)

Then we have

(δ1/3) M1 + (δ1/6) M2 = (f(x2) − f(x1))/δ1 − f′(x1)   (7.40)
(δ_{n−1}/6) Mn−1 + (δ_{n−1}/3) Mn = f′(xn) − (f(xn) − f(xn−1))/δ_{n−1}.   (7.41)

In matrix form, we obtain the tridiagonal linear system

[ δ1/3      δ1/6                                      ] [ M1 ]
[ δ1/6   (δ1+δ2)/3    δ2/6                            ] [ M2 ]
[           ...          ...           ...            ] [ .. ]   (7.42)
[         δ_{n−2}/6  (δ_{n−2}+δ_{n−1})/3   δ_{n−1}/6  ] [ .. ]
[                        δ_{n−1}/6        δ_{n−1}/3   ] [ Mn ]

whose right-hand side has first entry (f(x2) − f(x1))/δ1 − f′(x1), interior entries
(f(xi+1) − f(xi))/δi − (f(xi) − f(xi−1))/δ_{i−1} from (7.37), and last entry
f′(xn) − (f(xn) − f(xn−1))/δ_{n−1}.

We note that in (7.42) the coefficient matrix is real, symmetric, tridiagonal


and strictly diagonally dominant. Therefore, there is a unique solution for
Mi , i = 1, . . . , n.
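The construction can be summarized in the following Python sketch (an added illustration, not from the original notes); it forms the system (7.40)-(7.42) as a dense matrix for simplicity and evaluates the spline by (7.34). The test function sin on [0, π] with 6 nodes is an arbitrary choice.

import numpy as np

def clamped_spline_moments(x, f, fp_a, fp_b):
    """Solve (7.40)-(7.42) for the moments M_i = p''(x_i)."""
    x, f = np.asarray(x, float), np.asarray(f, float)
    n, d = len(x), np.diff(x)                   # d[i] = delta_{i+1}
    A, rhs = np.zeros((n, n)), np.zeros(n)
    A[0, 0], A[0, 1] = d[0] / 3, d[0] / 6
    rhs[0] = (f[1] - f[0]) / d[0] - fp_a
    for i in range(1, n - 1):
        A[i, i-1], A[i, i], A[i, i+1] = d[i-1] / 6, (d[i-1] + d[i]) / 3, d[i] / 6
        rhs[i] = (f[i+1] - f[i]) / d[i] - (f[i] - f[i-1]) / d[i-1]
    A[-1, -2], A[-1, -1] = d[-1] / 6, d[-1] / 3
    rhs[-1] = fp_b - (f[-1] - f[-2]) / d[-1]
    return np.linalg.solve(A, rhs)

def spline_eval(x, f, M, t):
    """Evaluate the cubic spline (7.34) at a point t."""
    x, f = np.asarray(x, float), np.asarray(f, float)
    i = int(np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2))
    h = x[i+1] - x[i]
    return ((x[i+1] - t)**3 * M[i] + (t - x[i])**3 * M[i+1]) / (6*h) \
        + ((x[i+1] - t) * f[i] + (t - x[i]) * f[i+1]) / h \
        - h * ((x[i+1] - t) * M[i] + (t - x[i]) * M[i+1]) / 6

xs = np.linspace(0, np.pi, 6)
M = clamped_spline_moments(xs, np.sin(xs), np.cos(0.0), np.cos(np.pi))
print(spline_eval(xs, np.sin(xs), M, 1.0), np.sin(1.0))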

7.5 Trigonometric Interpolation

For a given set of N support points (xk, fk), k = 0, . . . , N − 1, we consider linear
interpolation of the following forms:

φ(x) = a0/2 + Σ_{h=1}^{M} (ah cos hx + bh sin hx)   (7.42)

φ(x) = a0/2 + Σ_{h=1}^{M−1} (ah cos hx + bh sin hx) + (aM/2) cos Mx,   (7.43)

depending upon whether N = 2M + 1 or N = 2M.

For simplicity, we shall consider equally spaced nodes. Without loss of generality,
we shall assume

xk = 2πk/N,  k = 0, . . . , N − 1.

Observe that

e^{−ihxk} = e^{−i2πhk/N} = e^{i2π(N−h)k/N} = e^{i(N−h)xk}.

Thus, we may write

cos hxk = (e^{ihxk} + e^{i(N−h)xk})/2;
sin hxk = (e^{ihxk} − e^{i(N−h)xk})/(2i).   (7.44)

Upon substitution into (7.42) and (7.43), we obtain

φ(xk) = a0/2 + Σ_{h=1}^{M} [ah (e^{ihxk} + e^{i(N−h)xk})/2 + bh (e^{ihxk} − e^{i(N−h)xk})/(2i)]
      = a0/2 + Σ_{h=1}^{M} [((ah − ibh)/2) e^{ihxk} + ((ah + ibh)/2) e^{i(N−h)xk}]
      = β0 + β1 e^{ixk} + . . . + β_{2M} e^{i2Mxk},   (7.45)

φ(xk) = a0/2 + Σ_{h=1}^{M−1} [((ah − ibh)/2) e^{ihxk} + ((ah + ibh)/2) e^{i(N−h)xk}]
      + (aM/2)(e^{iMxk} + e^{i(N−M)xk})/2
      = β0 + β1 e^{ixk} + . . . + β_{2M−1} e^{i(2M−1)xk},   (7.46)

respectively. Thus, instead of considering the trigonometric expressions φ(x), we
are motivated to consider the phase polynomial

p(x) := β0 + β1 e^{ix} + . . . + β_{N−1} e^{i(N−1)x},   (7.47)

or equivalently, by setting ω := e^{ix}, the standard polynomial

P(ω) := β0 + β1 ω + . . . + β_{N−1} ω^{N−1}.   (7.48)

Denoting ωk := e^{ixk}, the interpolating condition becomes

P(ωk) = fk,  k = 0, . . . , N − 1.   (7.49)

Since all the ωk are distinct, by Theorem 7.1.1, we know:

Theorem 7.5.1 For any support data (xk, fk), k = 0, . . . , N − 1, with xk =
2πk/N, there exists a unique phase polynomial p(x) of the form (7.47) such
that p(xk) = fk for k = 0, . . . , N − 1. In fact,

βj = (1/N) Σ_{k=0}^{N−1} fk (ωk)^{−j} = (1/N) Σ_{k=0}^{N−1} fk (e^{ixk})^{−j}.   (7.50)

(pf): It only remains to show (7.50). Recall that ωk = e^{ixk} = e^{i2πk/N}, so that
ωk^j = ωj^k. Observe that ω_{j−h}^N = 1; that is, ω_{j−h} is a root of the polynomial
ω^N − 1 = (ω − 1) Σ_{k=0}^{N−1} ω^k. Thus either ω_{j−h} = 1, which is the case j = h, or
Σ_{k=0}^{N−1} ω_{j−h}^k = Σ_{k=0}^{N−1} ωk^j ωk^{−h} = 0. We may therefore summarize that

Σ_{k=0}^{N−1} ωk^j ωk^{−h} = { 0, if j ≠ h;  N, if j = h }.   (7.51)
Introducing ω^(h) := (1, ω1^h, . . . , ω_{N−1}^h)^T ∈ C^N, we may rewrite (7.51) in terms
of the complex inner product as

⟨ω^(j), ω^(h)⟩ = { 0, if j ≠ h;  N, if j = h }.   (7.52)

That is, the vectors ω^(0), . . . , ω^(N−1) form an orthogonal basis of C^N. Denote
f := (f0, . . . , f_{N−1})^T.

Then the interpolating condition (7.49) can be written as

[ 1     1        . . .   1              ] [ β0      ]   [ f0      ]
[ 1     ω1       . . .   ω1^{N−1}       ] [  .      ] = [  .      ]
[ .                                     ] [  .      ]   [  .      ]
[ 1     ω_{N−1}  . . .   ω_{N−1}^{N−1}  ] [ β_{N−1} ]   [ f_{N−1} ]

or simply

β0 ω^(0) + β1 ω^(1) + . . . + β_{N−1} ω^(N−1) = f.   (7.53)

By the orthogonality of the ω^(h), (7.50) follows.

Corollary 7.5.1 The trigonometric interpolation of the data set (xk, fk), k =
0, . . . , N − 1, with xk = 2πk/N is given by (7.42) or (7.43) with

ah = (2/N) Σ_{k=0}^{N−1} fk cos hxk   (7.54)
bh = (2/N) Σ_{k=0}^{N−1} fk sin hxk.   (7.55)
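As an illustration (added here, not from the original notes), the following Python sketch computes the coefficients (7.54)-(7.55) and evaluates the interpolant (7.42)/(7.43); the sample function exp(sin x) and N = 8 are arbitrary choices.

import numpy as np

def trig_coefficients(f_vals):
    """a_h, b_h of (7.54)-(7.55) from samples f_k at x_k = 2*pi*k/N."""
    f_vals = np.asarray(f_vals, dtype=float)
    N = len(f_vals)
    k, M = np.arange(N), len(f_vals) // 2
    a = np.array([2.0/N * np.sum(f_vals * np.cos(h * 2*np.pi*k/N)) for h in range(M + 1)])
    b = np.array([2.0/N * np.sum(f_vals * np.sin(h * 2*np.pi*k/N)) for h in range(M + 1)])
    return a, b

def trig_interpolant(a, b, N, x):
    """Evaluate (7.42) if N is odd, or (7.43) if N is even, at x."""
    M = N // 2
    last = M if N % 2 == 1 else M - 1
    total = a[0] / 2
    for h in range(1, last + 1):
        total += a[h] * np.cos(h * x) + b[h] * np.sin(h * x)
    if N % 2 == 0:
        total += a[M] / 2 * np.cos(M * x)
    return total

N = 8
xk = 2 * np.pi * np.arange(N) / N
fk = np.exp(np.sin(xk))
a, b = trig_coefficients(fk)
print(trig_interpolant(a, b, N, xk[3]), fk[3])   # agrees at a node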

Definition 7.5.1 Given the phase polynomial (7.47) and 0 ≤ s ≤ N, the s-segment
ps(x) is defined to be

ps(x) := β0 + β1 e^{ix} + . . . + βs e^{isx}.   (7.56)

Theorem 7.5.2 Let p(x) be the phase polynomial that interpolates a given set
of support data (xk, fk), k = 0, . . . , N − 1, with xk = 2πk/N. Then the s-segment
ps(x) of p(x) minimizes the sum

S(q) = Σ_{k=0}^{N−1} |fk − q(xk)|²   (7.57)

over all phase polynomials q(x) = γ0 + γ1 e^{ix} + . . . + γs e^{isx}.

(pf): Introducing the vectors ps := (ps(x0), . . . , ps(x_{N−1}))^T = Σ_{j=0}^{s} βj ω^(j) and
q := (q(x0), . . . , q(x_{N−1}))^T = Σ_{j=0}^{s} γj ω^(j) in C^N, we write S(q) = ⟨f − q, f − q⟩.
By Theorem 7.5.1, we know βj = (1/N)⟨f, ω^(j)⟩. Thus for j ≤ s, we have

(1/N)⟨f − ps, ω^(j)⟩ = βj − βj = 0,

and hence

⟨f − ps, ps − q⟩ = ⟨f − ps, Σ_{j=0}^{s} (βj − γj) ω^(j)⟩ = 0.

It follows that S(q) = ⟨f − q, f − q⟩ = ⟨f − ps + ps − q, f − ps + ps − q⟩
= ⟨f − ps, f − ps⟩ + ⟨ps − q, ps − q⟩ ≥ ⟨f − ps, f − ps⟩ = S(ps).

Remark. Theorem 7.5.2 states the important property that truncating the trigonometric
interpolation polynomial p(x) produces the least-squares trigonometric
approximation ps(x) of the data.

7.6 Fast Fourier Transform

Suppose that we have a series of sines and cosines which represents a given
function on [−L, L], say,

f(x) = a0/2 + Σ_{n=1}^{∞} (an cos(nπx/L) + bn sin(nπx/L)).   (7.58)

Using the facts that

∫_{−L}^{L} cos(nπx/L) dx = ∫_{−L}^{L} sin(nπx/L) dx = 0   (7.59)

∫_{−L}^{L} cos(nπx/L) cos(mπx/L) dx = { 0, if n ≠ m;  L, if n = m }   (7.60)

and

∫_{−L}^{L} sin(nπx/L) cos(mπx/L) dx = 0,   (7.61)

one can show that

an = (1/L) ∫_{−L}^{L} f(x) cos(nπx/L) dx,  n = 0, 1, . . .   (7.62)
bn = (1/L) ∫_{−L}^{L} f(x) sin(nπx/L) dx,  n = 1, 2, . . .   (7.63)

Definition 7.6.1 The series (7.58), with coefficients defined by (7.62) and
(7.63), is called the Fourier series of f(x) on [−L, L].

Given a function f(x), its Fourier series does not necessarily converge to
f(x) at every x. In fact,

Theorem 7.6.1 Suppose f(x) is piecewise continuous on [−L, L]. Then: (1) If
x0 ∈ (−L, L) and the one-sided derivatives f′(x0⁺) and f′(x0⁻) both exist (where
f′(x0^±) := lim_{h→0^±} (f(x0 + h) − f(x0))/h), then the Fourier series converges at x0
to (f(x0⁺) + f(x0⁻))/2. (2) At −L or L, if f′(−L⁺) and f′(L⁻) exist, then the Fourier
series converges to (f(−L⁺) + f(L⁻))/2.

Remark. Suppose that f(x) is defined on [0, L]. We may extend f(x) to become
an even function on [−L, L] (simply by defining f(x) := f(−x) for x ∈ [−L, 0]).
In this way, the Fourier coefficients of the extended function become

an = (2/L) ∫_{0}^{L} f(x) cos(nπx/L) dx,   (7.64)
bn ≡ 0.

Thus we are left with a pure cosine series

f(x) = a0/2 + Σ_{n=1}^{∞} an cos(nπx/L).   (7.65)

Similarly, we may extend f(x) to become an odd function on [−L, L], in which
case we obtain a pure sine series

f(x) = Σ_{n=1}^{∞} bn sin(nπx/L)   (7.66)

with

bn = (2/L) ∫_{0}^{L} f(x) sin(nπx/L) dx.   (7.67)

Example.

1. If f(x) = |x|, −π ≤ x ≤ π, then

f(x) = π/2 − (4/π)(cos x + cos 3x/3² + cos 5x/5² + . . .).

2. If g(x) = x, −π ≤ x ≤ π, then

g(x) = 2(sin x − sin 2x/2 + sin 3x/3 − sin 4x/4 + . . .).

3. If

h(x) = { x(π − x), for 0 ≤ x ≤ π;  x(π + x), for −π ≤ x ≤ 0 }

then

h(x) = (8/π)(sin x + sin 3x/3³ + sin 5x/5³ + . . .).

Remark. The relationship between trigonometric interpolation and the
Fourier series of a function f(x) can easily be established. For demonstration
purposes, we assume f(x) is defined over [0, 2π]. Define g(y) = f(y + π) = f(x)
for y ∈ [−π, π]. Then a typical term in (7.58) for g(y) is an cos ny with

an = (1/π) ∫_{−π}^{π} g(y) cos(ny) dy.   (7.68)

Suppose we partition the interval [−π, π] into N equally spaced subintervals and
define yk = −π + 2πk/N for k = 0, . . . , N − 1. Then the approximation to (7.68) is

an ≈ (2/N) Σ_{k=0}^{N−1} g(yk) cos(nyk)
   = (−1)^n (2/N) Σ_{k=0}^{N−1} f(2πk/N) cos(2πkn/N) := ân.   (7.69)
k=0

Now observe that

f(x) = g(y) = a0/2 + Σ_{n=1}^{∞} (an cos ny + bn sin ny)
     ≈ a0/2 + Σ_{n=1}^{M} (an cos ny + bn sin ny)
     ≈ â0/2 + Σ_{n=1}^{M} (ân cos n(x − π) + b̂n sin n(x − π)).

Comparing (7.69) with (7.54), we realize that the trigonometric interpolation
polynomial converges to the Fourier series as N → ∞. Thus trigonometric
interpolation can be interpreted as Fourier analysis applied to discrete data.

Remark. The basis of the Fourier analysis method for smoothing data is as
follows: If we think of given numerical data as consisting of the true values of
a function with random errors superposed, the true functions being relatively
smooth and the superposed errors quite unsmooth, then the examples above sug-
gest a way of partially separating functions from error. Since the true function
is smooth, its Fourier coefficients will decrease quickly. But the unsmoothness
of the error suggests that its Fourier coefficients may decrease very slowly, if at
all. The combined series will consist almost entirely of error, therefore, beyond
a certain place. If we simply truncate the series at the right place, then we are
discarding mostly error (although there will be error contribution in the terms
retained).

Remark. Since truncation produces a least-squares approximation (see Theorem 7.5.2),
the Fourier analysis method may be viewed as least-squares smoothing.

We now derive the Fourier series in complex form.

Lemma 7.6.1 The functions e^{ijπx/L} and e^{ikπx/L} are orthogonal in the following
sense:

∫_{−L}^{L} e^{ijπx/L} e^{−ikπx/L} dx = { 0, if k ≠ j;  2L, if k = j }.   (7.70)

Assume the Fourier series takes the form

f(x) = Σ_{n=−∞}^{∞} fn e^{inπx/L}.   (7.71)

Multiplying both sides of (7.71) by e^{−ikπx/L} and integrating brings

∫_{−L}^{L} f(x) e^{−ikπx/L} dx = Σ_{n=−∞}^{∞} fn ∫_{−L}^{L} e^{inπx/L} e^{−ikπx/L} dx.   (7.72)

By the orthogonality property, it is therefore suggested that

fk = (1/(2L)) ∫_{−L}^{L} f(x) e^{−ikπx/L} dx.   (7.73)

Remark. The complex form (7.71) can be written as

Σ_{n=−∞}^{∞} fn e^{inπx/L} = Σ_{n=−∞}^{∞} fn (cos(nπx/L) + i sin(nπx/L))
= f0 + Σ_{n=1}^{∞} [(fn + f−n) cos(nπx/L) + i(fn − f−n) sin(nπx/L)].   (7.74)

It is easy to see that f0 = a0/2, fn + f−n = an and i(fn − f−n) = bn. That is, the
series (7.71) is precisely the same as the series (7.58).

Remark. We consider the relation between trigonometric interpolation
and the Fourier series again. Let f(x) be a function defined on [0, 2π]. Then the
Fourier series of f may be written in the form

f(x) = Σ_{n=−∞}^{∞} fn e^{inx}   (7.75)

where

fn = (1/(2π)) ∫_{0}^{2π} f(x) e^{−inx} dx.   (7.76)

Consider the function g(x) of the form

g(x) = Σ_{j=−ℓ}^{ℓ} dj e^{ijx}   (7.77)

for x ∈ [0, 2π]. Suppose g(x) interpolates f(x) at x = xk = 2πk/N for k =
0, 1, . . . , N − 1. Multiplying both sides of (7.77), evaluated at xk, by e^{−inxk} and
summing over k, we obtain

Σ_{k=0}^{N−1} f(xk) e^{−inxk} = Σ_{k=0}^{N−1} Σ_{j=−ℓ}^{ℓ} dj e^{ijxk} e^{−inxk}   (7.78)
= Σ_{j=−ℓ}^{ℓ} dj (Σ_{k=0}^{N−1} e^{ijxk} e^{−inxk}).   (7.79)

We recall from (7.51) the orthogonality in the inner summation of (7.79).
Therefore,

dn = (1/N) Σ_{k=0}^{N−1} f(xk) e^{−inxk}   (7.80)

for n = 0, ±1, . . . , ±ℓ. Once more, we see that (7.80) is a trapezoidal approximation
of the integral (7.76). The match between (7.49) and (7.80) is conspicuous
except that the ranges of validity do not coincide. Consider the case where
N = 2ℓ + 1. Then obviously βj = dj for j = 0, 1, . . . , ℓ. But
β_{N+j} = (1/N) Σ_{k=0}^{N−1} f(xk) e^{−i(N+j)xk} = (1/N) Σ_{k=0}^{N−1} f(xk) e^{−ijxk} = dj
for j = −1, . . . , −ℓ.

The central idea behind the fast Fourier transform (FFT) is that when N is
the product of integers, the numbers dj (or βj) prove to be closely interdependent.
This interdependence can be exploited to substantially reduce the amount
of computation required to generate these numbers. We demonstrate the idea
as follows:

Suppose N = t1 t2 where both t1 and t2 are integers. Let

j := j1 + t1 j2
n := n2 + t2 n1

for j1, n1 = 0, 1, . . . , t1 − 1 and j2, n2 = 0, 1, . . . , t2 − 1. Note that both j and n
run over their required range 0 to N − 1. Let ω := e^{−i2π/N}. Then ω^N = 1. Thus

βj = β_{j1+t1 j2} = (1/N) Σ_{n=0}^{N−1} f(xn) e^{−i2πjn/N} = (1/N) Σ_{n=0}^{N−1} f(xn) ω^{nj}

= (1/N) Σ_{n=0}^{N−1} f(xn) ω^{j1 n2 + j1 t2 n1 + t1 j2 n2}
= (1/N) Σ_{n2=0}^{t2−1} (Σ_{n1=0}^{t1−1} f(x_{n2+t2 n1}) ω^{j1 t2 n1}) ω^{j1 n2 + t1 j2 n2}.   (7.81)

The equation (7.81) can be arranged into a two-step algorithm:

F1(j1, n2) := Σ_{n1=0}^{t1−1} f(x_{n2+t2 n1}) ω^{j1 t2 n1};   (7.82)

βj = F2(j1, j2) := (1/N) Σ_{n2=0}^{t2−1} F1(j1, n2) ω^{j1 n2 + t1 j2 n2}.   (7.83)

To compute each value of F1 there are t1 terms to be processed; to compute each
value of F2 there are t2. The total is t1 + t2. This must be done for each (j1, n2)
and (j1, j2) pair, i.e., N pairs. The final count is thus N(t1 + t2) terms processed.
The original form processed N terms for each j, a total of N² terms. The gain
in efficiency, if measured by this standard, is thus (t1 + t2)/N and depends very
much on N. If, for instance, N = 1000 = 10 × 100, then only 11% of the original
1000000 terms are needed.
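The two-step splitting (7.82)-(7.83) can be checked against the direct sum with a short Python sketch (an added illustration, not from the original notes); the data are those of the N = 6 example that follows.

import numpy as np

def beta_direct(f_vals):
    """beta_j = (1/N) sum_n f(x_n) omega^{n j}, omega = exp(-2*pi*i/N)."""
    N = len(f_vals)
    omega = np.exp(-2j * np.pi / N)
    return np.array([sum(f_vals[n] * omega**(n * j) for n in range(N)) / N
                     for j in range(N)])

def beta_two_factor(f_vals, t1, t2):
    """Two-step evaluation (7.82)-(7.83) for N = t1*t2."""
    N = t1 * t2
    omega = np.exp(-2j * np.pi / N)
    F1 = np.array([[sum(f_vals[n2 + t2*n1] * omega**(j1*t2*n1) for n1 in range(t1))
                    for n2 in range(t2)] for j1 in range(t1)])
    beta = np.zeros(N, dtype=complex)
    for j1 in range(t1):
        for j2 in range(t2):
            beta[j1 + t1*j2] = sum(F1[j1, n2] * omega**(j1*n2 + t1*j2*n2)
                                   for n2 in range(t2)) / N
    return beta

f_vals = np.array([0.0, 1.0, 1.0, 0.0, -1.0, -1.0])
print(np.allclose(beta_direct(f_vals), beta_two_factor(f_vals, 2, 3)))  # True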

Example. Consider the case N = 6 = 2 × 3, xn = 2nπ/6, and the following discrete
data:

n        0   1   2   3   4   5
f(xn)    0   1   1   0  −1  −1

Then the values of F1(j1, n2), according to (7.82), are given in the table

              n2 = 0   n2 = 1   n2 = 2
j1 = 0           0        0        0
j1 = 1           0        2        2

Values of F2(j1, j2), according to (7.83), are given by

              j2 = 0   j2 = 1   j2 = 2
j1 = 0           0        0        0
j1 = 1        2√3 i       0     −2√3 i

Note that in a program, the j2 loop is external to the j1 loop.

Suppose N = t1 t2 t3. Let

j = j1 + t1 j2 + t1 t2 j3
n = n3 + t3 n2 + t3 t2 n1.

Then among the nine power terms in ω^{nj}, three will contain the product t1 t2 t3 = N
and hence may be neglected. The remaining six power terms may be grouped into
a three-step algorithm:

F1(j1, n2, n3) := Σ_{n1=0}^{t1−1} f(xn) ω^{j1 t3 t2 n1}   (7.84)

F2(j1, j2, n3) := Σ_{n2=0}^{t2−1} F1(j1, n2, n3) ω^{(j1+t1 j2) t3 n2}   (7.85)

βj = F3(j1, j2, j3) := (1/N) Σ_{n3=0}^{t3−1} F2(j1, j2, n3) ω^{(j1+t1 j2+t1 t2 j3) n3}.   (7.86)

We note that if N = 10³, then only 3% of the original terms are needed.

7.7 Uniform Approximation

We have seen that the problem of approximating a continuous function by a
finite linear combination of given functions can be approached in various ways.
In this section, we want to use the maximal deviation of the approximation as
a measure of the quality of the approximation. That is, we want to consider
the normed linear space C[a, b] equipped with the sup-norm ‖ · ‖∞. We shall
limit our attention to approximating a continuous function by elements from the
subspace P_{n−1} of all polynomials of degree ≤ n − 1. Since the error provides a
uniform bound on the deviation throughout the entire interval, we refer to the
result as a uniform approximation.

We first present a sufficient condition for checking if a given polynomial is a


best approximation.

Theorem 7.7.1 Let g ∈ P_{n−1}, f ∈ C[a, b] and ρ := ‖f − g‖∞. Suppose there
exist n + 1 points a ≤ x1 < . . . < x_{n+1} ≤ b such that

|f(xν) − g(xν)| = ρ   (7.87)
f(xν+1) − g(xν+1) = −(f(xν) − g(xν))   (7.88)

for all ν. Then g is a best approximation of f.

(pf): Let

M := {x ∈ [a, b] : |f(x) − g(x)| = ρ}.   (7.89)

Certainly xν ∈ M for all ν = 1, . . . , n + 1. If g is not a best approximation, then
there exists a best approximation f̃ which can be written in the form f̃ = g + p
for some p ∈ P_{n−1}, p not identically zero. Observe that for all x ∈ M we
have

|e(x) − p(x)| < |e(x)|   (7.90)

where e(x) := f(x) − g(x). The inequality in (7.90) is possible only if the
sign of p(x) is the same as that of e(x); that is, we must have (f(x) − g(x))p(x) >
0 for all x ∈ M. By (7.88), it follows that the polynomial p must change sign
at least n times in [a, b]; that is, p must have at least n zeros. This contradicts
the assumption that p ∈ P_{n−1} is not identically zero.

Remark. The above theorem asserts only that g is a best approximation when-
ever there are at least n + 1 points satisfying (7.87) and (7.88). In general, there
can be more points where the maximal deviation is achieved.

Example. Suppose we want to approximate f (x) = sin 3x over the interval


[0, 2π]. It follows from the theorem that if n − 1 ≤ 4, then the polynomial g = 0
is a best approximation of f . Indeed, in this case the difference f − g alternates
between its maximal absolute value at six points, whereas the theorem only
requires n + 1 points. On the other hand, for n − 1 = 5 we have n + 1 = 7, and
g = 0 no longer satisfies conditions (7.87) and (7.88). In fact, in this case g = 0
is not a best approximation from P5 .

Remark. The only property of Pn−1 we have used to establish Theorem 7.7.1
is a weaker form of the Fundamental Theorem of Algebra, i.e., any polynomial
of degree n − 1 has at most n − 1 distinct zeros in [a, b]. This property is in fact
shared by a larger class of functions.

Definition 7.7.1 Suppose that g1 , . . . , gn ∈ C[a, b] are n linearly independent


functions such that every non-trivial element g ∈ U := span{g1 , . . . , gn } has at
most n − 1 distinct zeros in [a,b]. Then we say that U is a Haar space. The
basis {g1 , . . . , gn } of a Haar space is called a Chebyshev system.

Remark. We have already seen that {1, x, x2 , . . . , xn−1 } forms a Chebyshev


system. Two other interesting examples are

1. {1, e^x, e^{2x}, . . . , e^{(n−1)x}} over R.

2. {1, sin x, . . . , sin mx, cos x, . . . , cos mx} over [0, 2π].

We now state without proof the famous result that the condition in Theorem 7.7.1 is not
only sufficient but also necessary for a polynomial g to be a best approximation.
The following theorem is also known as the Alternation Theorem:

Theorem 7.7.2 The polynomial g ∈ Pn−1 is a best approximation of the func-


tion f ∈ [a, b] if and only if there exist points a ≤ x1 < . . . < xn+1 ≤ b such
that conditions (7.87) and (7.88) are satisfied.

Definition 7.7.2 The set of points {x1 , . . . , xn+1 } in Theorem 7.7.2 is referred
to as an alternant for f and g.

Corollary 7.7.1 For any f ∈ C[a, b], there is a unique best approximation.

Theorem 7.7.1 also provides a basis for designing a method for the compu-
tation of best approximations of continuous functions. The idea, known as the
exchange method of Remez, is as follows:

1. Initially select points such that a = x1 < . . . < x_{n+1} = b.


2. Compute the coefficients of a polynomial p(x) := a_{n−1}x^{n−1} + . . . + a0 and a
number ρ so that

(f − p)(xν) = (−1)^{ν−1} ρ   (7.91)

for all 1 ≤ ν ≤ n + 1. Note that the equations in (7.91) form a linear
system which is solvable.
The problem in step (2) is that even though the property of alternating signs
is satisfied in (7.91), it is not necessarily true that |ρ| = ‖f − p‖∞. We
thus need to select a new alternant.
3. Locate the extreme points in [a, b] of the absolute error function e(x) :=
f (x) − p(x). For the sake of simplicity, we assume that there are exactly
n + 1 extreme points, including a and b.

4. Replace {xk } by the new extreme points and repeat the sequence of steps
given above beginning with step (2).

The objective here is to have the set {xk} converge to a true alternant and
hence the polynomial converge to a best approximation. It can be proved that
the process does converge for any choice of starting values in step (1) for which
the value of ρ computed in step (2) is not zero. With additional assumptions on
the differentiability of f, it can also be shown that convergence is quadratic. Also,
the assumption that e(x) possesses exactly n + 1 extreme points in step (3) is
not essential. For more details, refer to G. Meinardus's book "Approximation of
Functions: Theory and Numerical Methods", Springer-Verlag, New York, 1967.
Chapter 8

Differentiation and
Integration

Given a function f(x), whether known as discrete data or as a continuum of
values, it is a classical problem to calculate the functionals f′(x) or ∫_a^b f(x) dx.
Except for a few simple functions, it is not always possible to obtain closed-form
solutions. Therefore, numerical techniques for approximating these quantities
become important in practice.
8.1 Numerical Differentiation
If the values of f(x) are known as continuum data, then the derivatives of f(x)
can usually be approximated by difference quotients. These formulas usually
can be derived from the Taylor series representation.
Example. Suppose f(x) has a Taylor polynomial expansion near the point x = c.
For instance, suppose
    f(c + h) = f(c) + f'(c)h + \frac{f''(c)}{2}h^2 + \frac{f'''(c)}{6}h^3 + O(h^4),
    f(c - h) = f(c) - f'(c)h + \frac{f''(c)}{2}h^2 - \frac{f'''(c)}{6}h^3 + O(h^4).
Then
    f'(c) = \frac{f(c+h) - f(c)}{h} + O(h),                              (8.1)
    f'(c) = \frac{f(c+h) - f(c-h)}{2h} + O(h^2),                         (8.2)
    f'(c) = \frac{-3f(c) + 4f(c+h) - f(c+2h)}{2h} + O(h^2),              (8.3)
    f''(c) = \frac{f(c-h) - 2f(c) + f(c+h)}{h^2} + O(h^2).               (8.4)
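As a quick check of these four formulas, the following sketch applies them to f(x) = sin x at c = 1, where the exact derivatives are known; the choice of test function and step sizes is ours.

    import numpy as np

    f, c = np.sin, 1.0
    for h in (1e-1, 1e-2, 1e-3):
        d1_fwd = (f(c + h) - f(c)) / h                            # (8.1), O(h)
        d1_cen = (f(c + h) - f(c - h)) / (2 * h)                  # (8.2), O(h^2)
        d1_3pt = (-3*f(c) + 4*f(c + h) - f(c + 2*h)) / (2 * h)    # (8.3), O(h^2)
        d2_cen = (f(c - h) - 2*f(c) + f(c + h)) / h**2            # (8.4), O(h^2)
        print(h, abs(d1_fwd - np.cos(c)), abs(d1_cen - np.cos(c)),
              abs(d1_3pt - np.cos(c)), abs(d2_cen + np.sin(c)))

Halving h should roughly halve the first error and quarter the other three, reflecting the stated orders.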
Remark. From the above finite difference formulas, it seems that the approx-
imation would become more and more accurate as h → 0. This observation
is true, however, only for exact arithmetic. In actual computation, there
is a lower bound on h below which no improvement in accuracy should be
expected. To understand this phenomenon, consider the central finite differ-
ence (8.2). Due to roundoff errors, we know f(c + h) = \hat f(c + h) + E^+ and
f(c - h) = \hat f(c - h) + E^-, where \hat f represents the floating point value of f.
According to the formula, we are actually calculating
    \hat f'(c) = \frac{\hat f(c+h) - \hat f(c-h)}{2h}.
Thus
    f'(c) - \hat f'(c) = \frac{E^+ - E^-}{2h} + O(h^2).                  (8.5)
The second term in (8.5) is stable and converges to zero as h → 0. But the
first term becomes unbounded as h → 0, since the numerator E^+ - E^- is
approximately of the size of the machine accuracy and is bounded away from zero.
Obviously, using double precision in the calculation can help to delay this from
happening too soon.
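The phenomenon is easy to observe numerically. The short loop below (our own test setup, with f(x) = e^x at c = 1) shows the error of the central difference (8.2) first decreasing like h^2 and then growing again once the roundoff term in (8.5) dominates.

    import numpy as np

    f, c, exact = np.exp, 1.0, np.exp(1.0)
    for k in range(1, 13):
        h = 10.0 ** (-k)
        approx = (f(c + h) - f(c - h)) / (2 * h)
        print(f"h = 1e-{k:02d}   error = {abs(approx - exact):.3e}")

In double precision the smallest error typically occurs near h of the order of the cube root of the machine accuracy, after which the error increases again.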

8.2 Richardson Extrapolation


The Richardson extrapolation technique is often used to improve the accuracy
from a set of already computed approximate solutions. The method works for
both numerical differentiation and numerical integration. The idea is as follows:
Let q denote the true quantity needed. Let N (q) denote the approximation
to q by a specific numerical scheme. Usually these quantities are related by the
equation:
    N(q) = q + c\,h^m + O(h^n)                                           (8.6)
where h is the stepsize used in the numerical scheme and n > m. Suppose we
have already used, respectively, two different stepsizes h = h_1 and h = h_2 (h_1 >
h_2) to obtain N(q)_1 and N(q)_2. Then
    N(q)_1 = q + c\,h_1^m + O(h_1^n),                                    (8.7)
    N(q)_2 = q + c\,h_2^m + O(h_2^n).                                    (8.8)
Let k := h_1/h_2. Then it is easy to see that
    k^m N(q)_2 - N(q)_1 = (k^m - 1)q + O(h_2^n).                         (8.9)
Thus
    M(q) := \frac{k^m N(q)_2 - N(q)_1}{k^m - 1} = q + O(h_2^n)           (8.10)
is a higher order approximation to q.
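A small sketch of formula (8.10), applied here to central differences of e^x (the test function and step sizes are our own choice; the helper name richardson is hypothetical):

    import numpy as np

    def richardson(N1, N2, k, m):
        """Combine approximations N1 (stepsize h1) and N2 (stepsize h2 = h1/k),
        assuming the error model (8.6) with leading order m, via (8.10)."""
        return (k**m * N2 - N1) / (k**m - 1)

    f, c, h = np.exp, 1.0, 0.1
    N1 = (f(c + h) - f(c - h)) / (2 * h)          # central difference, stepsize h
    N2 = (f(c + h/2) - f(c - h/2)) / h            # central difference, stepsize h/2
    print(abs(N1 - np.e), abs(N2 - np.e), abs(richardson(N1, N2, 2, 2) - np.e))

Since the central difference has m = 2 and next term of order 4, the extrapolated value M(q) is accurate to O(h^4).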

8.3 Newton-Cotes Quadrature


Definition 8.3.1 Given a function f(x) defined on [a, b], a formula of the form
    Q_n(f) := \sum_{i=1}^{n} \alpha_i f(x_i)                             (8.11)
with \alpha_i \in R and x_i \in [a, b] is called a quadrature rule for the integral
I(f) := \int_a^b f(x)\,dx. The points x_i are called the quadrature points (abscissas) and the
values \alpha_i are called the quadrature coefficients (weights). We also define the
quadrature error E_n(f) := I(f) - Q_n(f).

Definition 8.3.2 A quadrature rule is said to have degree of precision m if
E_n(x^k) = 0 for k = 0, . . . , m and E_n(x^{m+1}) ≠ 0.

Remark. If a quadrature rule has degree of precision m, then En (pk ) = 0 for


all polynomials pk (x) of degree ≤ m.
Examples. (1) (Trapezoidal Rule) Suppose we approximate f(x) over [a, b] by
the line segment joining the points (a, f(a)) and (b, f(b)). Then
    I(f) ≈ Q_2(f) = \frac{b-a}{2}[f(a) + f(b)].                          (8.12)
Since f(x) - p_1(x) = (x - a)(x - b)f[a, b, x], it follows that
    E_2(f) = \int_a^b (x-a)(x-b)f[a,b,x]\,dx.                            (8.13)
Observe that (x - a)(x - b) does not change sign in [a, b]. By the mean value
theorem, i.e., \int_a^b f(x)g(x)\,dx = f(\xi)\int_a^b g(x)\,dx for some \xi \in [a, b] if g(x) is of one
sign over [a, b], we conclude that
    E_2(f) = f[a,b,\xi]\int_a^b (x-a)(x-b)\,dx
           = \frac{f^{(2)}(\eta)}{2}\Big(-\frac{(b-a)^3}{6}\Big) = -\frac{f^{(2)}(\eta)}{12}(b-a)^3.   (8.14)
(2) (Simpson's Rule) Suppose we approximate f(x) by a quadratic polynomial
p_2(x) that interpolates f(a), f(\frac{a+b}{2}) and f(b). Then one can show that
    Q_3(f) = \frac{b-a}{6}\Big[f(a) + 4f\Big(\frac{a+b}{2}\Big) + f(b)\Big].    (8.15)
The error obviously is given by
    E_3(f) = \int_a^b f\Big[a, b, \frac{a+b}{2}, x\Big]\omega(x)\,dx            (8.16)
with \omega(x) := (x-a)(x-b)\big(x - \frac{a+b}{2}\big). The function \omega(x) changes sign as
x crosses \frac{a+b}{2}. So we have to analyze E_3(f) by a different approach. Let
\Omega(x) := \int_a^x \omega(t)\,dt. Then \Omega'(x) = \omega(x). Thus by integration by parts, we
have
    E_3(f) = f\Big[a, b, \frac{a+b}{2}, x\Big]\Omega(x)\Big|_a^b - \int_a^b f\Big[a, b, \frac{a+b}{2}, x, x\Big]\Omega(x)\,dx.
Observe that \Omega(a) = \Omega(b) = 0. Observe also that \Omega(x) > 0 for all x \in (a, b). Thus now we
may apply the mean value theorem to conclude that
    E_3(f) = -\int_a^b f\Big[a, b, \frac{a+b}{2}, x, x\Big]\Omega(x)\,dx
           = -f\Big[a, b, \frac{a+b}{2}, \xi, \xi\Big]\int_a^b \Omega(x)\,dx
           = -\frac{f^{(4)}(\eta)}{4!}\,\frac{4}{15}\Big(\frac{b-a}{2}\Big)^5
           = -\frac{f^{(4)}(\eta)}{90}\Big(\frac{b-a}{2}\Big)^5.                (8.17)

Remark. The degree of precision for Simpson’s rule is 3 rather than 2.
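This remark is easy to verify numerically. The sketch below (the function names and the interval [0, 1] are our own choices) applies the two rules (8.12) and (8.15) to the monomials x^k and compares with the exact values 1/(k+1).

    def trapezoid(f, a, b):
        return (b - a) / 2 * (f(a) + f(b))                       # rule (8.12)

    def simpson(f, a, b):
        return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))  # rule (8.15)

    # Trapezoid is exact for 1 and x but not x^2; Simpson is exact through x^3.
    for k in range(5):
        exact = 1.0 / (k + 1)
        err_t = abs(trapezoid(lambda x: x**k, 0.0, 1.0) - exact)
        err_s = abs(simpson(lambda x: x**k, 0.0, 1.0) - exact)
        print(k, err_t, err_s)

The output shows zero error for Simpson's rule up to k = 3 and a nonzero error at k = 4, consistent with degree of precision 3.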


For a general integer n > 1, define h := \frac{b-a}{n-1}, x_i := a + (i-1)h, i = 1, . . . , n.
Let p_{n-1}(x) be the Lagrangian interpolation polynomial of f(x) at the nodes x_i, i =
1, . . . , n. Then we may write
    f(x) = \sum_{i=1}^{n} f(x_i)\ell_i(x) + f[x_1, . . . , x_n, x]\prod_{i=1}^{n}(x - x_i).   (8.18)
Thus we obtain a quadrature formula
    I(f) = \sum_{i=1}^{n} \alpha_i f(x_i) + E_n(f)                                            (8.19)
with quadrature coefficients
    \alpha_i := \int_a^b \ell_i(x)\,dx                                                        (8.20)
and error
    E_n(f) = \int_a^b f[x_1, . . . , x_n, x]\prod_{i=1}^{n}(x - x_i)\,dx.                     (8.21)
In general, the function \omega_n(x) := \prod_{i=1}^{n}(x - x_i) changes sign in the interval [a, b].
But it can be proved that

Theorem 8.3.1 (1) If n is odd and f \in C^{n+1}, then
    E_n(f) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\int_a^b x\,\omega_n(x)\,dx.        (8.22)
(2) If n is even and f \in C^n, then
    E_n(f) = \frac{f^{(n)}(\xi)}{n!}\int_a^b \omega_n(x)\,dx.                 (8.23)

(pf): This is a homework problem.

Definition 8.3.3 The quadrature (8.19) obtained above is called a closed Newton-
Cotes formula.

Remark. Consider the Runge function f(x) = \frac{1}{1+x^2} defined over [-5, 5]. We
recall the fact that the interpolating polynomial using equally spaced nodes will
not yield good convergence as n → ∞. Consequently, we should not expect that
the closed Newton-Cotes formula will yield accurate results for the integral. For
instance, I(f) = \int_{-4}^{4}\frac{1}{1+x^2}\,dx = 2\tan^{-1}4 ≈ 2.6516. But the closed Newton-
Cotes formula yields

    n         3        5        7        9        11
    Q_n(f)    5.4902   2.2776   3.3288   1.9411   3.5956

For this reason, we seldom use the Newton-Cotes formula for n > 5 in practice. Instead, we limit
n ≤ 5 and use the so-called composite rules.

Definition 8.3.4 In calculating the integral of a function f(x) over an interval
[a, b], we first divide [a, b] into a number of subintervals, then apply the
Newton-Cotes formula with a low n to each subinterval, and then add the results.
Such a method is called a composite Newton-Cotes formula.

Example. Suppose the interval [a, b] is divided into an even number, say 2m,
of subintervals. Let h = \frac{b-a}{2m}, x_i = a + ih, i = 0, 1, . . . , 2m. Then the composite
Simpson's rule takes the form
    \int_a^b f(x)\,dx = \frac{h}{3}\Big[f(a) + 2\sum_{i=1}^{m-1} f(x_{2i}) + 4\sum_{i=0}^{m-1} f(x_{2i+1}) + f(b)\Big]
                        - \frac{(b-a)}{180}h^4 f^{(4)}(\xi).                  (8.24)
Remark. Suppose f(x_i) = y_i + \epsilon_i where y_i is the floating point approx-
imation to f(x_i) and \epsilon_i is the roundoff error. Upon substituting these values
into (8.24), we obtain
    \int_a^b f(x)\,dx = \frac{h}{3}\Big[y_0 + 2\sum_{i=1}^{m-1} y_{2i} + 4\sum_{i=0}^{m-1} y_{2i+1} + y_{2m}\Big]
                        + \frac{h}{3}\Big[\epsilon_0 + 2\sum_{i=1}^{m-1} \epsilon_{2i} + 4\sum_{i=0}^{m-1} \epsilon_{2i+1} + \epsilon_{2m}\Big].   (8.25)
Suppose |\epsilon_i| ≤ \epsilon. Then the error due to roundoff is bounded by \frac{h}{3}\,6m\epsilon = (b-a)\epsilon.
Thus, unlike the process of numerical differentiation, numerical integration using
a composite rule is a stable process in the sense that the error due to roundoff is
independent of the stepsize h.
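A compact implementation of (8.24) is given below; the vectorized indexing and the test integral of sin over [0, π] (exact value 2) are our own choices.

    import numpy as np

    def composite_simpson(f, a, b, m):
        """Composite Simpson's rule (8.24) with 2m equal subintervals."""
        h = (b - a) / (2 * m)
        x = a + h * np.arange(2 * m + 1)
        fx = f(x)
        return h / 3 * (fx[0] + 2 * fx[2:-1:2].sum() + 4 * fx[1::2].sum() + fx[-1])

    for m in (2, 4, 8, 16):
        print(m, abs(composite_simpson(np.sin, 0.0, np.pi, m) - 2.0))

Doubling m (halving h) reduces the error by roughly a factor of 16, as the h^4 term in (8.24) predicts.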

8.4 Gaussian Quadrature


The Newton-Cotes formula for the integral I(f) = \int_a^b f(x)\,dx is based on the
integration of the polynomial p(x) that interpolates f(x) at a set of equally
spaced nodes in [a, b]. Gaussian quadrature results from a different approach
in which both the abscissas x_i and the weights \alpha_i are to be determined so that the
quadrature
    Q_n(f) = \sum_{i=1}^{n} \alpha_i f(x_i)                              (8.26)
has a maximal degree of precision. Since there are 2n unknowns in (8.26), the
requirements
    E_n(x^k) = 0,   k = 0, 1, . . . , 2n-1                               (8.27)
supply 2n equations. Thus it is expected that the maximal degree of precision
is ≥ 2n - 1. The condition (8.27) is equivalent to
    \sum_{i=1}^{n} \alpha_i x_i^k = \frac{b^{k+1} - a^{k+1}}{k+1},   k = 0, 1, . . . , 2n-1,   (8.28)

which is a nonlinear system. It is not clear whether (8.28) always has a solution
for arbitrary a and b. Neither is it clear whether the solution of (8.28) is real-
valued or not.

Theorem 8.4.1 A quadrature formula using n distinct abscissas results from
the integration of an interpolation polynomial if and only if the quadrature has
degree of precision ≥ n - 1.

(pf): (⟹) This follows from Theorem 8.3.1.
(⟸) Suppose the quadrature \sum_{i=1}^{n}\alpha_i f(x_i) has degree of precision ≥ n - 1.
Then \sum_{i=1}^{n}\alpha_i x_i^k = \frac{b^{k+1} - a^{k+1}}{k+1} for k = 0, . . . , n-1. We may rewrite this system
of equations as
    \begin{pmatrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \\ \vdots & & \vdots \\ x_1^{n-1} & \cdots & x_n^{n-1} \end{pmatrix}
    \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}
    = \begin{pmatrix} b - a \\ \vdots \\ \frac{b^n - a^n}{n} \end{pmatrix}.     (8.29)
The coefficient matrix is the Vandermonde matrix. Thus the system (8.29) has a
unique solution for (\alpha_i). On the other hand, we can construct an interpolating
polynomial using the same nodes {x_i}. This polynomial results in a quadrature
formula \sum_{i=1}^{n}\beta_i f(x_i) which has degree of precision ≥ n - 1. By setting E_n(x^k) =
0 for k = 0, . . . , n-1, we end up with the same linear system (8.29). By
uniqueness, it must be that \alpha_i = \beta_i. ⊕
By Theorem 8.4.1, the Gaussian quadrature may be thought of as the inte-
gration of a certain polynomial that interpolates f(x) at a certain set of nodes
x_1, . . . , x_n. In this sense, we still find
    E_n(f) = \int_a^b f[x_1, . . . , x_n, x]\prod_{i=1}^{n}(x - x_i)\,dx.       (8.30)
Not only do we want E_n(x^k) = 0 for k = 0, . . . , n-1, but we further would like to require
E_n(x^k) = 0 for k = n, . . . , n+\nu, for \nu as large as possible. There are two
approaches to accomplish this goal:

Lemma 8.4.1 If f(x) = x^{n+\nu}, \nu ≥ 0, then the n-th divided difference f[x_1, . . . , x_n, x]
is a polynomial of degree at most \nu.

(pf): When n = 1, we find f[x_1, x] = \frac{f(x_1) - f(x)}{x_1 - x} = \frac{x_1^{1+\nu} - x^{1+\nu}}{x_1 - x}, which is obviously a
polynomial of degree \nu. Suppose the assertion is true for n = k. Consider f(x) =
x^{k+1+\nu}. Then f[x_2, . . . , x_{k+1}, x], by the induction hypothesis, is a polynomial of de-
gree at most \nu + 1. Observe that f[x_1, . . . , x_{k+1}, x] = \frac{f[x_1, . . . , x_{k+1}] - f[x_2, . . . , x_{k+1}, x]}{x_1 - x}.
Note that the numerator has a zero at x = x_1. Thus f[x_1, . . . , x_{k+1}, x] is a
polynomial of degree at most \nu. The assertion now follows from the induction.

From (8.30) and Lemma 8.4.1, we see that E_n(f) = 0 for f(x) = x^n, . . . , x^{n+\nu}
if and only if \int_a^b x^k\prod_{i=1}^{n}(x - x_i)\,dx = 0 holds for k = 0, 1, . . . , \nu. Thus we are seek-
ing {x_i} so that \omega_n(x) := \prod_{i=1}^{n}(x - x_i) is perpendicular to the functions 1, x, . . . , x^\nu.
Since \omega_n(x) is a polynomial of degree n, we cannot have \nu ≥ n; otherwise, the
polynomial \omega_n(x) would be perpendicular to itself. Is it then possible to have
\nu = n - 1? The answer is positive.

Lemma 8.4.2 From the basic polynomials 1, x, x^2, . . . and the inner product
\langle f, g\rangle := \int_a^b f(x)g(x)\,dx, we can generate a new basis of polynomials \{\phi_k(x)\}
such that \phi_k is of degree k, and \langle\phi_i, \phi_j\rangle = 0 whenever i ≠ j. The \phi_k's are
unique up to constant multipliers.

(pf): The assertion follows from the standard Gram-Schmidt orthogonalization


process. ⊕
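The Gram-Schmidt construction is short enough to carry out directly with polynomial arithmetic. The sketch below uses NumPy's Polynomial class; on [-1, 1] the resulting φ_k agree with the Legendre polynomials up to scaling. The helper names inner and gram_schmidt are ours.

    from numpy.polynomial import Polynomial as P

    def inner(f, g, a=-1.0, b=1.0):
        """<f, g> = integral of f*g over [a, b], computed exactly for polynomials."""
        F = (f * g).integ()
        return F(b) - F(a)

    def gram_schmidt(n, a=-1.0, b=1.0):
        """Orthogonalize 1, x, x^2, ... as in Lemma 8.4.2 (monic normalization)."""
        phis = []
        for k in range(n):
            p = P([0.0] * k + [1.0])            # the monomial x^k
            for q in phis:
                p = p - (inner(p, q, a, b) / inner(q, q, a, b)) * q
            phis.append(p)
        return phis

    for p in gram_schmidt(4):
        print(p.coef.round(6))      # 1, x, x^2 - 1/3, x^3 - (3/5)x

Each printed coefficient vector is a constant multiple of the corresponding Legendre polynomial listed below.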
From the above discussion, we conclude that we may select the quadra-
ture coefficients \{\alpha_i\} and the nodes \{x_i\} in the formula (8.26) such that the
quadrature has degree of precision 2n - 1. Such a formula interpolates
the function f(x) at the x_i which are the zeros of the n-th orthogonal polynomial,
that is, \omega_n(x) = c\,\phi_n(x). In this case,
    \alpha_i = \int_a^b \ell_i(x)\,dx = \int_a^b \prod_{j \ne i}\frac{x - x_j}{x_i - x_j}\,dx = \frac{1}{\phi_n'(x_i)}\int_a^b \frac{\phi_n(x)}{x - x_i}\,dx.
This formula is called the Gaussian quadrature formula.

Remark. Without loss of generality, we may assume [a, b] = [-1, 1]. In this
case, the orthogonal polynomials are known as the Legendre polynomials and
are given by:
    P_0(x) = 1,                                                          (8.31)
    P_1(x) = x,                                                          (8.32)
    P_2(x) = \frac{1}{2}(3x^2 - 1),                                      (8.33)
    P_k(x) = \frac{1}{2^k k!}\frac{d^k}{dx^k}(x^2 - 1)^k.                (8.34)
The resulting quadrature is known as the Gauss-Legendre quadrature formula.

Theorem 8.4.2 All the n zeros of the orthogonal polynomial \phi_n(x) are real-
valued, distinct, and contained in (a, b).

(pf): Let x_1, . . . , x_m \in (a, b) denote all the distinct, real zeros of \phi_n(x) with
odd multiplicity. Assume m < n. Consider the integral \int_a^b (x - x_1)\cdots(x - x_m)\phi_n(x)\,dx.
Note that the integrand does not change sign over [a, b] and is
not identically zero. Thus the integral is nonzero. On the other hand, observe
that (x - x_1)\cdots(x - x_m) is a polynomial of degree m < n. By the orthogonality
of \phi_n(x), the integral should be zero. This is a contradiction. Thus it must be
that m = n, and the multiplicity is 1.
We now consider another approach to the Gaussian quadrature. Recall the
Hermite interpolation formula for f(x) (Example 1 in Section 7.3):
    f(x) = \sum_{i=1}^{n} h_i(x)f(a_i) + \sum_{i=1}^{n} g_i(x)f'(a_i)
           + \frac{\omega_n^2(x)}{(2n)!}f^{(2n)}(\xi)                    (8.35)
where
    h_i(x) = [1 - 2(x - a_i)\ell_i'(a_i)]\ell_i^2(x),                    (8.36)
    g_i(x) = (x - a_i)\ell_i^2(x),                                       (8.37)
and \ell_i(x) is the Lagrangian interpolation polynomial
    \ell_i(x) = \prod_{j=1,\,j\ne i}^{n}\frac{x - a_j}{a_i - a_j} = \frac{\omega_n(x)}{(x - a_i)\omega_n'(a_i)}   (8.38)
and
    \omega_n(x) = \prod_{i=1}^{n}(x - a_i).                              (8.39)

Remark. Recall that
    \ell_i(a_k) = \delta_{ik},   i, k = 1, . . . , n,                    (8.40)
and the properties
    h_i(a_k) = g_i'(a_k) = \delta_{ik},                                  (8.41)
    g_i(a_k) = h_i'(a_k) = 0,   i, k = 1, . . . , n.                     (8.42)

Remark. Since the Hermite interpolation polynomial interpolates f and f' at the
n nodes, we know that the Hermite polynomial is exactly the same as f if f is a
polynomial of degree 2n - 1 or less.
Integrating (8.35) over the interval [a, b], we get
    \int_a^b f(x)\,dx = \sum_{i=1}^{n}\alpha_i f(a_i) + \sum_{i=1}^{n}\beta_i f'(a_i) + E_n(f)   (8.43)
where
    \alpha_i = \int_a^b h_i(x)\,dx,                                      (8.44)
    \beta_i = \int_a^b g_i(x)\,dx,                                       (8.45)
    E_n(f) = \int_a^b \frac{\omega_n^2(x)}{(2n)!}f^{(2n)}(\xi)\,dx.      (8.46)
Note that E_n(f) is automatically zero if f(x) is a polynomial of degree 2n - 1 or less.


If we can choose the abscissas so that \beta_j = 0, j = 1, . . . , n, then (8.43) will have
the form (8.26) with the desired degree of precision. That is,
    \beta_j = 0,   j = 1, . . . , n,                                     (8.47)
is a sufficient condition for the quadrature (8.26) to have degree of precision
2n - 1. The condition (8.47) is also necessary. This can be seen by letting
f(x) = g_j(x) in (8.26). Since g_j(x) is a polynomial of degree 2n - 1, E_n(g_j) = 0.
But also \beta_j = I(g_j) = Q_n(g_j) = \sum_{i=1}^{n}\alpha_i g_j(a_i) = 0 because of the properties
(8.41) and (8.42).
Using (8.37) and (8.45), we have
    \beta_j = \int_a^b (x - a_j)\ell_j^2(x)\,dx = \int_a^b \omega_n(x)\frac{\ell_j(x)}{\omega_n'(a_j)}\,dx.   (8.48)
Since \omega_n(x) is a polynomial of degree n and \ell_j(x) is a polynomial of degree
n - 1, a sufficient condition for \beta_j = 0, j = 1, . . . , n, is that \omega_n(x) be orthogonal
to all polynomials of degree n - 1 or less over [a, b]. It can be proved that this
condition is also necessary. This is done by letting f(x) = \omega_n(x)u_{n-1}(x) in
(8.26) where u_{n-1}(x) is an arbitrary polynomial of degree n - 1 or less. Since

(8.26) is exact for polynomials of degree 2n - 1 or less, we have I(\omega_n u_{n-1}) =
\sum_{i=1}^{n}\alpha_i\omega_n(a_i)u_{n-1}(a_i) = 0, that is,
    \omega_n \perp u_{n-1}.                                              (8.49)

Remark. By now we have reached the same conclusion as in the other approach
on the property of the function \omega_n(x). Assume [a, b] = [-1, 1]. Then
    \omega_n(x) = \frac{2^n (n!)^2}{(2n)!}P_n(x)                         (8.50)
where P_n(x) is the well-known Legendre polynomial defined by
    P_0(x) = 1,                                                          (8.51)
    P_1(x) = x,                                                          (8.52)
    P_{k+1}(x) = \frac{2k+1}{k+1}xP_k(x) - \frac{k}{k+1}P_{k-1}(x).      (8.53)

We have seen from Theorem 8.4.2 that the zeros of the Legendre polyno-
mial of any degree are real, so this settles the question of the existence of real
abscissas. To find the weights we use (8.36) and (8.44) to get
    \alpha_j = \int_{-1}^{1} h_j(x)\,dx = \int_{-1}^{1}[1 - 2(x - a_j)\ell_j'(a_j)]\ell_j^2(x)\,dx
            = \int_{-1}^{1}\ell_j^2(x)\,dx - 2\ell_j'(a_j)\int_{-1}^{1}(x - a_j)\ell_j^2(x)\,dx
            = \int_{-1}^{1}\ell_j^2(x)\,dx                               (8.54)
since the second integral, which by definition is \beta_j, is zero. From (8.54) it is
obvious that the weights are all positive. By considering (8.26) with f(x) =
\ell_j(x), which is a polynomial of degree n - 1, we have
    \int_{-1}^{1}\ell_j(x)\,dx = \sum_{i=1}^{n}\alpha_i\ell_j(a_i) = \alpha_j   (8.55)
since \ell_j(a_i) = \delta_{ji}. Together (8.54) and (8.55) imply that
    \alpha_j = \int_{-1}^{1}\ell_j^2(x)\,dx = \int_{-1}^{1}\ell_j(x)\,dx.       (8.56)

The following table shows the abscissas and weights for values of n of practical
interest.

    n    Abscissas x_j        Weights \alpha_j
    2    ±1/\sqrt{3}          1
    3    ±\sqrt{0.6}          5/9
         0                    8/9
    4    ±0.8611363116        0.3478548451
         ±0.3399810436        0.6521451549
    5    ±0.9061798459        0.2369268850
         ±0.5384693101        0.4786286705
         0                    0.5688888889
    6    ±0.9324695142        0.1713244924
         ±0.6612093865        0.3607615730
         ±0.2386191861        0.4679139346

To use Gauss quadrature to integrate f(x) over an arbitrary interval [a, b],
we simply map the \xi-interval [-1, 1] into the x-interval [a, b] using the linear
transformation
    x = a + \frac{b-a}{2}(\xi + 1),   dx = \frac{b-a}{2}\,d\xi.                              (8.57)
Making this substitution in (8.26) gives
    \int_a^b f(x)\,dx = \int_{-1}^{1} f\Big(a + \frac{b-a}{2}(\xi + 1)\Big)\frac{b-a}{2}\,d\xi   (8.58)
                      = \frac{b-a}{2}\int_{-1}^{1} f\Big(a + \frac{b-a}{2}(\xi + 1)\Big)d\xi     (8.59)
                      ≈ \frac{b-a}{2}\sum_{i=1}^{n}\alpha_i f(x_i)                               (8.60)
where the \alpha_j are the tabulated Gaussian weights associated with the tabulated
Gaussian abscissas a_j in [-1, 1], and x_j is obtained from a_j as follows:
    x_j = a + \frac{b-a}{2}(a_j + 1),   j = 1, . . . , n.                                    (8.61)
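The following sketch packages the transformation (8.57)-(8.61) into a small routine; it takes the nodes and weights on [-1, 1] from NumPy's leggauss rather than from the table above, and the wrapper name gauss_legendre is ours.

    import numpy as np

    def gauss_legendre(f, a, b, n):
        """n-point Gauss-Legendre quadrature on [a, b] via the map (8.57)-(8.61)."""
        xi, w = np.polynomial.legendre.leggauss(n)     # nodes/weights on [-1, 1]
        x = a + (b - a) / 2 * (xi + 1)
        return (b - a) / 2 * np.dot(w, f(x))

    # An n-point rule has degree of precision 2n - 1: with n = 3 it integrates
    # x^5 over [0, 2] (exact value 32/3) to rounding error.
    print(abs(gauss_legendre(lambda x: x**5, 0.0, 2.0, 3) - 32.0/3))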
2

8.5 Other Gaussian Quadratures


Depending on the application, sometimes it is necessary to integrate a function
f(x) with respect to a specified weight function w(x), i.e.,
    I_w(f) := \int_a^b w(x)f(x)\,dx                                      (8.62)
where w(x) ≥ 0 on [a, b]. In such a case, we consider a quadrature rule of the
form
    Q_w(f) := \sum_{i=1}^{n}\alpha_i f(x_i).                             (8.63)
Note that the weight function w(x) does not appear on the right hand side of
(8.63). Using the same idea as discussed in the preceding section, it can be
proved that a sufficient and necessary condition for Q_w(f) to have as high a
degree of precision as possible is that the nodes x_i are the zeros of an orthogonal
polynomial with respect to w(x). That is, if \omega_n(x) := \prod_{i=1}^{n}(x - x_i), then
    \int_a^b w(x)u_{n-1}(x)\omega_n(x)\,dx = 0                           (8.64)
for every polynomial u_{n-1}(x) of degree ≤ n - 1. In this case, the quadrature
coefficients are
    \alpha_i := \int_a^b w(x)\ell_i(x)\,dx                               (8.65)
and the quadrature error is
    E_w(f) = \frac{f^{(2n)}(\xi)}{(2n)!}\int_a^b w(x)\omega_n^2(x)\,dx.  (8.66)

Without repeating the arguments, we simply summarize below some of the well
known Gaussian formulas:

    Weight function w(x)          Interval [a, b]   Abscissas are zeros of                              Name of the orthogonal polynomial
    1                             [-1, 1]           P_n(x)                                              Legendre
    e^{-x}                        [0, ∞)            L_n(x)                                              Laguerre
    e^{-x^2}                      (-∞, ∞)           H_n(x)                                              Hermite
    (1-x)^\alpha (1+x)^\beta      [-1, 1]           J_n(x; \alpha, \beta)                               Jacobi
    (1-x^2)^{-1/2}                [-1, 1]           T_n(x) = cos(n cos^{-1} x)                          Chebyshev, 1st kind
    (1-x^2)^{1/2}                 [-1, 1]           S_n(x) = sin((n+1)cos^{-1} x)/sin(cos^{-1} x)       Chebyshev, 2nd kind
    1/\sqrt{x}                    [0, 1]            P_{2n}(\sqrt{x})
    \sqrt{x}                      [0, 1]            (1/\sqrt{x})\,P_{2n+1}(\sqrt{x})
    (x/(1-x))^{1/2}               [0, 1]            (1/\sqrt{x})\,T_{2n+1}(\sqrt{x})
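As a brief illustration of rules of the form (8.63), where only f appears in the sum while the weight w is absorbed into the abscissas and coefficients, the sketch below uses the Gauss-Hermite and Gauss-Laguerre rules available in NumPy; the particular test integrands are our own choices.

    import numpy as np

    # Gauss-Hermite: w(x) = exp(-x^2) on (-inf, inf).
    # Integral of exp(-x^2) x^2 dx over R equals sqrt(pi)/2.
    x, alpha = np.polynomial.hermite.hermgauss(5)
    print(np.dot(alpha, x**2), np.sqrt(np.pi) / 2)

    # Gauss-Laguerre: w(x) = exp(-x) on [0, inf).
    # Integral of exp(-x) x^3 dx equals 3! = 6.
    x, alpha = np.polynomial.laguerre.laggauss(5)
    print(np.dot(alpha, x**3))

Both 5-point rules reproduce these polynomial integrands exactly (up to rounding), since their degree of precision is 9.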

8.6 Adaptive Integration


Recall the equally-spaced composite Simpson's rule
    \int_a^b f(x)\,dx = \frac{h}{3}\Big[f(x_0) + 2\sum_{i=1}^{m-1} f(x_{2i}) + 4\sum_{i=0}^{m-1} f(x_{2i+1}) + f(x_{2m})\Big]
                        - \frac{(b-a)}{180}h^4 f^{(4)}(\xi).             (8.67)
Obviously, if f^{(4)}(x) varies greatly in magnitude, then the error term in (8.67)
might not be reasonable. In practice, we should not introduce more nodes than
are necessary, in order to save overhead. But near points where f(x)
behaves badly we should place denser nodes to maintain the accuracy. For
these reasons, we introduce an adaptive integration method which performs a
local error estimation and introduces nodes according to the behavior of the
integrand.
Let a = a_0 < a_1 < . . . < a_n = b be a partition of [a, b] where the lengths
    h_i := a_{i+1} - a_i                                                 (8.68)
generally are different but are of the form \frac{b-a}{2^r}. Such an interval is said
to be of level r. Consider the two quadrature rules over the interval [a_i, a_{i+1}]:
    R_2[a_i, a_{i+1}](f) = \frac{h_i}{12}\Big[f(a_i) + 4f\Big(a_i + \frac{h_i}{4}\Big) + 2f\Big(a_i + \frac{h_i}{2}\Big) + 4f\Big(a_i + \frac{3h_i}{4}\Big) + f(a_{i+1})\Big],   (8.69)
    R_1[a_i, a_{i+1}](f) = \frac{h_i}{6}\Big[f(a_i) + 4f\Big(a_i + \frac{h_i}{2}\Big) + f(a_{i+1})\Big].   (8.70)
Suppose that the nodes a_0, . . . , a_i have already been determined so that
    \int_a^{a_i} f(x)\,dx ≈ \sum_{j=1}^{i} R_2[a_{j-1}, a_j](f)          (8.71)
has already been accepted as an approximation. We want to consider the next
stepsize h at the level r. Observe from (8.67) that
    \int_\alpha^{\alpha+h} f(x)\,dx - R_1[\alpha, \alpha+h](f) = -\frac{\hat h^5}{90}f^{(4)}(\xi)
        = -\frac{\hat h^5}{90}f^{(4)}\Big(\alpha + \frac{h}{2}\Big) + O(h^6),    (8.72)
    \int_\alpha^{\alpha+h} f(x)\,dx - R_2[\alpha, \alpha+h](f) = -\frac{\hat h^5}{1440}f^{(4)}(\xi)
        = -\frac{\hat h^5}{1440}f^{(4)}\Big(\alpha + \frac{h}{2}\Big) + O(h^6)   (8.73)
with \hat h := \frac{h}{2}. Thus
    R_2[\alpha, \alpha+h](f) - R_1[\alpha, \alpha+h](f) ≈ -\frac{15\hat h^5}{1440}f^{(4)}\Big(\alpha + \frac{h}{2}\Big)   (8.74)
and
    \int_\alpha^{\alpha+h} f(x)\,dx - R_2[\alpha, \alpha+h](f) ≈ \frac{1}{15}(R_2 - R_1).   (8.75)
In order that \big|\int_{a_i}^{a_i+h} f(x)\,dx - R_2[a_i, a_i+h](f)\big| < \frac{\epsilon}{2^r}, we check to see if
    |R_2[a_i, a_i+h](f) - R_1[a_i, a_i+h](f)| < \frac{15\epsilon}{2^r}.  (8.76)
If (8.76) holds, we define a_{i+1} := a_i + h, add R_2[a_i, a_{i+1}](f) to the right hand
side of (8.71) to obtain an approximation of \int_a^{a_{i+1}} f(x)\,dx, and go on to the next
subinterval. If (8.76) does not hold, we then consider the intervals [a_i, a_i + \frac{h}{2}]
and [a_i + \frac{h}{2}, a_i + h] at the level r + 1. All the previously computed values should
be saved for future use. The integration is stopped when the final subinterval [a_{n-1}, b]
is accepted. (The flowchart of the adaptive integration method is omitted here.)
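A recursive Python sketch of the same idea follows; it halves the interval whenever the test corresponding to (8.76) fails and splits the tolerance between the two halves. The function names and the optional correction term from (8.75) are our own choices, not the exact bookkeeping described above.

    def adaptive_simpson(f, a, b, eps):
        """Accept the refined estimate R2 on [a, b] when |R2 - R1| < 15*eps
        (cf. (8.75)-(8.76)); otherwise split the interval and recurse."""
        def simpson(lo, hi):
            return (hi - lo) / 6 * (f(lo) + 4 * f((lo + hi) / 2) + f(hi))

        def recurse(lo, hi, tol):
            mid = (lo + hi) / 2
            R1 = simpson(lo, hi)
            R2 = simpson(lo, mid) + simpson(mid, hi)
            if abs(R2 - R1) < 15 * tol:
                return R2 + (R2 - R1) / 15      # cheap correction from (8.75)
            return recurse(lo, mid, tol / 2) + recurse(mid, hi, tol / 2)

        return recurse(a, b, eps)

    import math
    print(adaptive_simpson(math.sin, 0.0, math.pi, 1e-8))   # exact value is 2

A production version would also reuse previously computed function values and limit the recursion depth, as noted above.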
Chapter 9

Numerical Ordinary
Differential Equations -
Initial Value Problems

The problem to be considered in this chapter is to solve the initial value
problem (IVP):
    \frac{dy}{dx} = f(x, y),   y(x_0) = y_0                              (9.1)
where f : R × R^n → R^n satisfies the Lipschitz condition in y, i.e.,
    \|f(x, y_1) - f(x, y_2)\| ≤ L\|y_1 - y_2\|                           (9.2)
for every y_1, y_2 \in R^n.
Remarks. (a) The Lipschitz condition is sufficient to prove that the IVP (9.1)
has a unique solution near x0 .
(b) Any higher-order ordinary differential equation
    \frac{d^m y}{dx^m} = f\Big(x, y, \frac{dy}{dx}, . . . , \frac{d^{m-1}y}{dx^{m-1}}\Big)   (9.3)
can be reduced to a first-order system. This is done simply by letting
    y_1 := y,   y_2 := \frac{dy_1}{dx},   . . . ,   y_m := \frac{dy_{m-1}}{dx}               (9.4)
and by considering the differential equation associated with (y_1, . . . , y_m)^T.
Definition 9.0.1 Let x_0 < x_1 < . . . be a sequence of points at which the solution
y(x) of (9.1) is approximated by the values y_1, y_2, . . .. Any numerical method
that computes y_{i+1} by using information at x_i, x_{i-1}, . . . , x_{i-k+1} is called a
k-step method.


Example. The simplest 1-step method for solving (9.1) is the so-called Euler
method
    y_{n+1} = y_n + h_n f(x_n, y_n)                                      (9.5)
where x_{n+1} = x_n + h_n and h_n is the step size. Several questions are typical
topics in the numerical ODE theory: What is the global error e_n := y(x_n) - y_n?
How is it constituted? How does it propagate? How does the step size affect
the error?
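A minimal implementation of (9.5) with constant step size is shown below; the test problem y' = -y and the step sizes are our own choices.

    import numpy as np

    def euler(f, x0, y0, h, nsteps):
        """Euler's method (9.5) with a constant step size h."""
        xs, ys = [x0], [y0]
        for _ in range(nsteps):
            ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
            xs.append(xs[-1] + h)
        return np.array(xs), np.array(ys)

    # y' = -y, y(0) = 1; the global error at x = 1 shrinks like O(h).
    for h in (0.1, 0.05, 0.025):
        xs, ys = euler(lambda x, y: -y, 0.0, 1.0, h, round(1.0 / h))
        print(h, abs(ys[-1] - np.exp(-1.0)))

Halving h roughly halves the error at x = 1, which is the first-order behavior examined in the questions above.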

9.1 Linear Multi-step Methods


Definition 9.1.1 By a linear (p + 1)-step method of step size h, we mean a
numerical scheme of the form
    y_{n+1} = \sum_{i=0}^{p} a_i y_{n-i} + h\sum_{i=-1}^{p} b_i f_{n-i}  (9.6)
where x_k = x_0 + kh, f_{n-i} := f(x_{n-i}, y_{n-i}), and a_p^2 + b_p^2 ≠ 0. If b_{-1} = 0, then
the method is said to be explicit; otherwise, it is implicit.

Remark. In order to obtain yn+1 from an implicit method, usually it is neces-


sary to solve a nonlinear equation. Such a difficulty quite often is compensated
by other more desirable properties which are missing from explicit methods.
One such desirable property is stability.
Obviously, the choice of the coefficients ai and bi should not be arbitrary.
The choice is made with at least two concerns in mind:
(a) The truncation error made in each step should be as small as possible.
(b) The propagation of errors over all steps should be as slow as possible.

Definition 9.1.2 The difference operator L associated with the linear (p + 1)-
step method (9.6) is defined to be, with a_{-1} = -1,
    L[y(x), h] := \sum_{i=-1}^{p} a_i y(x + (p-i)h) + h\sum_{i=-1}^{p} b_i y'(x + (p-i)h).   (9.7)

Definition 9.1.3 When applying the scheme (9.6) to an initial value problem
(9.1), we say the local truncation error at xn+1 is L[y(xn−p ), h].

Remark. Assume y_{n-i} = y(x_{n-i}) for i = 0, . . . , p. Then
    L[y(x_{n-p}), h] = hb_{-1}[f(x_{n+1}, y(x_{n+1})) - f(x_{n+1}, y_{n+1})]
                       + a_{-1}(y(x_{n+1}) - y_{n+1}).                   (9.8)
Obviously, if the scheme is explicit, then
    |L[y(x_{n-p}), h]| = |y(x_{n+1}) - y_{n+1}|                          (9.9)
indeed represents the error introduced by the scheme at the present step when
no errors occur in all previous steps. If the scheme is implicit, then, by the
mean value theorem,
    |L[y(x_{n-p}), h]| = \Big|hb_{-1}\frac{\partial f}{\partial y}(x_{n+1}, \eta_{n+1}) + a_{-1}\Big|\,|y(x_{n+1}) - y_{n+1}|.   (9.10)
Upon expanding the right hand side of (9.7) about x by the Taylor series
and collecting the like powers of h, we have
    L[y(x), h] = c_0 y(x) + c_1 h y^{(1)}(x) + . . . + c_q h^q y^{(q)}(x) + . . .   (9.11)
where
    c_0 = \sum_{i=-1}^{p} a_i,                                                      (9.12)
    c_1 = \sum_{i=-1}^{p}(p-i)a_i + \sum_{i=-1}^{p} b_i,                            (9.13)
    c_q = \frac{1}{q!}\sum_{i=-1}^{p}(p-i)^q a_i + \frac{1}{(q-1)!}\sum_{i=-1}^{p}(p-i)^{q-1} b_i,   q ≥ 2.   (9.14)

Definition 9.1.4 A linear multi-step method is said to be of order r if c_0 =
c_1 = . . . = c_r = 0 but c_{r+1} ≠ 0 in the corresponding difference operator, i.e., if
L[y(x), h] = c_{r+1} h^{r+1} y^{(r+1)}(\eta).

Definition 9.1.5 A linear multi-step method is said to be consistent if and only
if its order r ≥ 1.

By integrating both sides of (9.1) from x_n to x_{n+1}, we obtain
    y(x_{n+1}) - y(x_n) = \int_{x_n}^{x_{n+1}} f(x, y(x))\,dx.           (9.15)
Suppose we already know (x_n, y_n), (x_{n-1}, y_{n-1}), . . . , (x_{n-p}, y_{n-p}). We
may approximate f(x, y(x)) by a p-th degree interpolation polynomial p_p(x).
In this way, we obtain an explicit scheme
    y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} p_p(x)\,dx = y_n + h\sum_{i=0}^{p}\beta_{pi} f_{n-i}   (9.16)
where the \beta_{pi} are the quadrature coefficients. Such a method is called an Adams-
Bashforth method. Similarly, if we also include (x_{n+1}, y_{n+1}) in the data of
interpolation, then we end up with an implicit scheme
    y_{n+1} = y_n + h\sum_{i=-1}^{p}\beta_{pi} f_{n-i}                   (9.17)

which is called an Adams-Moulton method.


Examples. (a) Some Adams-Bashforth methods:

    i                0       1       2       3       4
    \beta_{0i}       1
    2\beta_{1i}      3      -1
    12\beta_{2i}     23     -16      5
    24\beta_{3i}     55     -59      37     -9
    720\beta_{4i}    1901   -2774    2616   -1274    251

(b) Some Adams-Moulton methods:

    i                -1      0       1       2       3
    \beta_{0i}       1
    2\beta_{1i}      1       1
    12\beta_{2i}     5       8      -1
    24\beta_{3i}     9       19     -5       1
    720\beta_{4i}    251     646    -264     106    -19

9.2 Stability Theory of Multi-step Methods


Example. Suppose we want to apply the midpoint rule

yn+1 = yn−1 + 2hfn (9.18)

to the initial value problem
    y' = -y,   y(0) = 1.                                                 (9.19)

The exact solution is y(x) = e−x . The numerical scheme will produce a finite
difference equation
yn+1 = yn−1 − 2hyn . (9.20)
To solve (9.20) we try a solution of the form

yn = r n . (9.21)

Then r is a zero of the polynomial r2 + 2hr − 1 = 0. Therefore, the general


solution of (9.20) is given by

yn = α1 r1n + α2 r2n (9.22)



with r_i = -h ± \sqrt{h^2 + 1}. The initial condition implies \alpha_1 + \alpha_2 = 1. We need one
more value y_1 to start the scheme (9.18). Expressing \alpha_1 and \alpha_2 in terms of y_1,
we have
    \alpha_1 = \frac{r_2 - y_1}{r_2 - r_1} = \frac{\sqrt{h^2+1} + h + y_1}{2\sqrt{h^2+1}},   (9.23)
    \alpha_2 = \frac{r_1 - y_1}{r_1 - r_2} = \frac{-h + \sqrt{h^2+1} - y_1}{2\sqrt{h^2+1}}.  (9.24)
We note that |r_1| < 1 and |r_2| > 1. Therefore, unless y_1 is chosen such that
\alpha_2 = 0, y_n grows unboundedly and, hence, deviates from the exact solution
regardless of the step size. Such a phenomenon is called numerical instability.
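The instability is easy to reproduce. The sketch below (our own experiment, with h = 0.1 and the exact value e^{-h} used as the extra starting value y_1) runs the midpoint recursion (9.20); even with this accurate start, the parasitic root r_2 is excited by truncation and rounding errors and eventually dominates.

    import numpy as np

    h, nsteps = 0.1, 120
    y = [1.0, np.exp(-h)]                   # y0 and an accurate y1
    for n in range(1, nsteps):
        y.append(y[n - 1] - 2 * h * y[n])   # scheme (9.20)
    print("exact y(12)    =", np.exp(-12.0))
    print("computed y_120 =", y[-1])        # oscillates with growing magnitude

By x = 12 the computed value oscillates with magnitude of order 10, while the true solution is about 6e-6.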
Consider a general finite difference equation
    \sum_{i=-1}^{p} a_i y_{n-i} = \phi_n                                 (9.25)
where the a_i are constants independent of n, a_{-1} ≠ 0, a_p ≠ 0, and n ≥ p + 1. Given
starting values y_0, . . . , y_p, a solution of the difference equation is a sequence of
values \{y_n\}_{n=p+1}^{\infty} that satisfies the equation (9.25) identically for all n = p + 1, . . ..
The general solution of (9.25) can be written as y_n = \hat y_n + \psi_n where \hat y_n is the
general solution of the homogeneous equation
    \sum_{i=-1}^{p} a_i y_{n-i} = 0                                      (9.26)
and \psi_n is a particular solution of (9.25). To determine \hat y_n, we try \hat y_n = r^n for
some appropriate r. Then we find r must be a root of the polynomial
    P(r) := \sum_{i=-1}^{p} a_i r^{p-i} = a_{-1}r^{p+1} + a_0 r^p + . . . + a_p.   (9.27)

Let the roots of P(r) be denoted by r_0, r_1, . . . , r_p. Then
(a) Suppose all the roots of P(r) are distinct. Then r_j^n, j = 0, . . . , p, are
linearly independent in the sense that
    \sum_{j=0}^{p}\alpha_j r_j^n = 0 for all n ⟹ \alpha_j = 0 for all j.   (9.28)
Hence, the general solution \hat y_n of (9.26) is given by
    \hat y_n = \sum_{j=0}^{p}\alpha_j r_j^n                                (9.29)

where αj are arbitrary constants to be determined by the starting values.


(b) Suppose some or all of the roots of P(r) are repeated. For example, suppose
r_0 = r_1 = r_2 and suppose the remaining roots are distinct. Then it can be
proved that r_0^n, n r_0^n, n^2 r_0^n, r_3^n, . . . , r_p^n are linearly independent. In this
case, the general solution \hat y_n is given by
    \hat y_n = \alpha_0 r_0^n + \alpha_1 n r_0^n + \alpha_2 n^2 r_0^n + \sum_{j=3}^{p}\alpha_j r_j^n.   (9.30)

Remark. We note that ŷn is bounded as n → ∞ if and only if all roots of


P (r) = 0 have modulus |rj | ≤ 1 and those with modulus 1 are simple.

Definition 9.2.1 The first and the second characteristic polynomials associated
with a linear multistep method
    \sum_{i=-1}^{p} a_i y_{n-i} + h\sum_{i=-1}^{p} b_i f_{n-i} = 0,      (9.31)
are defined, respectively, by
    P_1(r) := \sum_{i=-1}^{p} a_i r^{p-i},                               (9.32)
    P_2(r) := \sum_{i=-1}^{p} b_i r^{p-i}.                               (9.33)
We say the method is zero-stable if and only if all roots of P_1(r) have modulus
≤ 1 and those with modulus 1 are simple.

Remark. In a (p+1)-step method (9.31), there are altogether 2p+3 undetermined
coefficients (a_{-1} = -1). Thus, it appears that we may choose these numbers so
that c_0 = . . . = c_{2p+2} = 0. Indeed, such a choice is always possible. Therefore,
the maximal order for an implicit method is 2p+2. However, the maximal order
method is often useless because of the well known Dahlquist barrier theorem:

Theorem 9.2.1 Any (p + 1)-step method (9.31) whose order exceeds p + 3 is unstable.
More precisely, if p is odd, then the highest order of a stable method is p + 3; if
p is even, then the highest order of a stable method is p + 2.

Remark. Simpson's rule is the zero-stable 2-step method that attains
the optimal order 4.
We now apply the multistep method (9.31) to the test problem
    y' = \lambda y                                                       (9.34)
where \lambda \in R is a fixed constant. Recall that the local truncation error T_{n+1} at
x_{n+1} is defined by
    T_{n+1} := \sum_{i=-1}^{p} a_i y(x_{n-i}) + h\sum_{i=-1}^{p} b_i \lambda y(x_{n-i}).   (9.35)

Due to floating-point arithmetic, we often will introduce round-off errors in
applying the scheme (9.31). Suppose
    R_{n+1} = \sum_{i=-1}^{p} a_i \tilde y_{n-i} + h\sum_{i=-1}^{p} b_i f(x_{n-i}, \tilde y_{n-i}).   (9.36)
Then the global errors \tilde e_n := y(x_n) - \tilde y_n satisfy the difference equation
    \sum_{i=-1}^{p} a_i \tilde e_{n-i} + \lambda h\sum_{i=-1}^{p} b_i \tilde e_{n-i} = T_{n+1} - R_{n+1}.   (9.37)
It is reasonable to assume that T_{n+1} - R_{n+1} ≈ \phi ≡ constant. According to the
theory we developed earlier, a general solution of (9.37) would be
    \tilde e_n = \sum_{j=0}^{p}\alpha_j r_j^n + \frac{\phi}{\lambda h\sum_{i=-1}^{p} b_i},   (9.38)
if we assume all roots r_0, . . . , r_p of the polynomial \sum_{i=-1}^{p}(a_i + \lambda h b_i)r^{p-i} are
distinct.

Definition 9.2.2 Let \bar h := \lambda h. The polynomial
    \pi(r, \bar h) := P_1(r) + \bar h P_2(r) = \sum_{i=-1}^{p}(a_i + \bar h b_i)r^{p-i}   (9.39)
is called the stability polynomial of the multistep method (9.31).

Definition 9.2.3 The linear multistep method (9.31) is said to be absolutely
stable for a given \bar h if all roots r_j of \pi(r, \bar h) satisfy |r_j| < 1. The interval S is
called the interval of absolute stability if the method is absolutely stable for all
\bar h \in S.

Remark. It can be proved that every consistent, zero-stable method is abso-
lutely unstable for small positive \bar h.
Example. Consider Simpson's rule. Then P_1(r) = -r^2 + 1, P_2(r) = \frac{1}{3}(r^2 +
4r + 1), and \pi(r, \bar h) = \big(\frac{\bar h}{3} - 1\big)r^2 + \frac{4}{3}\bar h r + \big(1 + \frac{\bar h}{3}\big). The roots of \pi(r, \bar h) can be
checked to be
    r_1 = 1 + \bar h + O(\bar h^2),                                      (9.40)
    r_2 = -1 + \frac{\bar h}{3} + O(\bar h^2).                           (9.41)
Hence, if \bar h > 0, then r_1 > 1, while if \bar h < 0, then r_2 < -1.
In other words, Simpson's rule is nowhere absolutely stable.

9.3 Predictor and Corrector Methods


A predictor-corrector (PC) method consists of an explicit method which is used
to make an initial guess of the solution and an implicit method which is used to
improve the accuracy of the solution. Generally, there are two ways to carry out
a PC method: one is to repeat the correction until a preassigned tolerance
\|y_{n+1}^{(s+1)} - y_{n+1}^{(s)}\| < \epsilon is achieved. But more often, the PECE method is preferred.

Definition 9.3.1 Let P stand for the predictor
    \sum_{i=-1}^{p^*} a_i^* y_{n-i} + h\sum_{i=0}^{p^*} b_i^* f_{n-i} = 0.      (9.42)
Let C stand for the corrector
    \sum_{i=-1}^{p} a_i y_{n-i} + h\sum_{i=-1}^{p} b_i f_{n-i} = 0.             (9.43)
Let m be a fixed integer. Then by a P(EC)^m E method we mean the following
scheme:
    y_{n+1}^{(0)} = \sum_{i=0}^{p^*} a_i^* y_{n-i}^{(m)} + h\sum_{i=0}^{p^*} b_i^* f_{n-i}^{(m)},   (9.44)
    for s = 0, . . . , m-1, do                                                  (9.45)
        f_{n+1}^{(s)} = f(x_{n+1}, y_{n+1}^{(s)}),                              (9.46)
        y_{n+1}^{(s+1)} = \sum_{i=0}^{p} a_i y_{n-i}^{(m)} + h\sum_{i=0}^{p} b_i f_{n-i}^{(m)} + hb_{-1}f_{n+1}^{(s)},   (9.47)
    f_{n+1}^{(m)} = f(x_{n+1}, y_{n+1}^{(m)}).                                  (9.48)

The selection of the pair of the predictor and the corrector should not be
arbitrary. Rather

Theorem 9.3.1 Let r^* and r be the orders of the predictor and the corrector,
respectively.
(1) If correction to convergence is used and if r^* ≥ r, then the principal
local truncation error of the PC method is that of the corrector alone. The
properties of the predictor are immaterial.
(2) If the P(EC)^m E method is used and if r^* = r - q, 0 ≤ q < r, then the local
truncation error of the PC method is
(a) that of the corrector alone if m ≥ q + 1; or
(b) of the same order as the corrector (but with a larger error constant) if
m = q; or
(c) of the form O(h^{r-q+m+1}) if m ≤ q - 1.

Remark. From the above theorem, we conclude that the predictor need not
be of higher order than the corrector. In practice, it is believed that the choice
r^* = r with m = 1 is the best combination.
An important advantage of the PC method is the ease of estimating local
truncation errors. Suppose r^* = r. Then under the local assumption, we know
    y(x_{n+1}) - y_{n+1}^{(0)} = c_{r+1}^* h^{r+1} y^{(r+1)} + O(h^{r+2}),      (9.49)
    y(x_{n+1}) - y_{n+1}^{(m)} = c_{r+1} h^{r+1} y^{(r+1)} + O(h^{r+2}).        (9.50)
It follows that
    y_{n+1}^{(m)} - y_{n+1}^{(0)} = (c_{r+1}^* - c_{r+1}) h^{r+1} y^{(r+1)} + O(h^{r+2}).   (9.51)
Therefore, the error y(x_{n+1}) - y_{n+1}^{(m)} is estimated by
    y(x_{n+1}) - y_{n+1}^{(m)} ≈ \frac{c_{r+1}}{c_{r+1}^* - c_{r+1}}\big(y_{n+1}^{(m)} - y_{n+1}^{(0)}\big).   (9.52)
Such an estimate of the error is called Milne's device. Note that the right hand side
of (9.52) does not involve any high order derivative calculation. Note also that the
local assumption is not realistic in practice. Thus the estimate should be used
with caution.
Example. The following is a 4-th order Adams-Bashforth-Moulton pair:
    y_{n+1} = y_n + \frac{h}{24}(55f_n - 59f_{n-1} + 37f_{n-2} - 9f_{n-3}),   T_{n+1} ≈ \frac{251}{720}h^5 y^{(5)},
    y_{n+1} = y_n + \frac{h}{24}(9f_{n+1} + 19f_n - 5f_{n-1} + f_{n-2}),      T_{n+1} ≈ -\frac{19}{720}h^5 y^{(5)}.
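A PECE sketch of this pair follows. For brevity the three extra starting values are taken from the exact solution of the test problem y' = -y (in practice they would come from a Runge-Kutta method, as noted in the next section); the function name abm4_pece is ours, and the factor -19/270 is the Milne coefficient c_{r+1}/(c*_{r+1} - c_{r+1}) for this pair.

    import numpy as np

    def abm4_pece(f, x0, y0, h, nsteps, y_start):
        """PECE mode of the 4th-order Adams-Bashforth/Adams-Moulton pair above,
        with Milne's device (9.52) as a local error estimate."""
        xs = x0 + h * np.arange(nsteps + 1)
        ys = [y0] + list(y_start)                       # y0, y1, y2, y3
        fs = [f(xs[i], ys[i]) for i in range(4)]
        for n in range(3, nsteps):
            # P and E: Adams-Bashforth predictor, then evaluate
            yp = ys[n] + h / 24 * (55*fs[n] - 59*fs[n-1] + 37*fs[n-2] - 9*fs[n-3])
            fp = f(xs[n+1], yp)
            # C and E: Adams-Moulton corrector, then evaluate
            yc = ys[n] + h / 24 * (9*fp + 19*fs[n] - 5*fs[n-1] + fs[n-2])
            est = -19.0 / 270.0 * (yc - yp)             # Milne's device (could drive step control)
            ys.append(yc)
            fs.append(f(xs[n+1], yc))
        return xs, np.array(ys)

    h = 0.1
    xs, ys = abm4_pece(lambda x, y: -y, 0.0, 1.0, h, 50,
                       [np.exp(-h), np.exp(-2*h), np.exp(-3*h)])
    print(abs(ys[-1] - np.exp(-xs[-1])))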

It can be shown, analogous to the preceding section, that the stability polyno-
mial for a P(EC)^m E method is given by
    \pi_{P(EC)^m E}(r, \bar h) = P_1(r) + \bar h P_2(r) + M_m(\bar h)\big(P_1^*(r) + \bar h P_2^*(r)\big)   (9.53)
where
    M_m(\bar h) = \frac{(\bar h b_{-1})^m(1 - \bar h b_{-1})}{1 - (\bar h b_{-1})^m}.                       (9.54)
With somewhat more complicated analysis, one can show that the PC method
is absolutely unstable for small \bar h > 0.

9.4 Runge-Kutta Methods


We have seen that a higher order of accuracy can be attained by appropriately
selecting the coefficients in a multistep method. But the disadvantage is that a
multistep method requires additional starting values. Runge-Kutta methods,
in contrast, preserve the one-step nature but sacrifice linearity.

Definition 9.4.1 An R-stage Runge-Kutta method generally is defined by
    y_{n+1} = y_n + h\phi(x_n, y_n, h)                                   (9.55)
where
    \phi(x, y, h) = \sum_{r=1}^{R} c_r k_r,                              (9.56)
    k_r = f\Big(x + ha_r,\; y + h\sum_{s=1}^{R} b_{rs}k_s\Big),   r = 1, . . . , R,   (9.57)
    a_r = \sum_{s=1}^{R} b_{rs}.                                         (9.58)
Such a method usually is represented by an array
    \begin{array}{c|c} a & B \\ \hline & c^T \end{array}                 (9.59)
where B = (b_{rs}), c^T = [c_1, . . . , c_R], a = [a_1, . . . , a_R]^T. If B is strictly lower
triangular, then the method is said to be explicit; otherwise, it is implicit.

Remark. Note that an R-stage Runge-Kutta method involves R function evalua-
tions per step. Each of the functions k_r, r = 1, . . . , R, may be interpreted as an
approximation to the derivative y'(x), and the function \phi as a weighted mean of
these approximations.
Remark. An important application of Runge-Kutta methods is to provide
additional starting values for a multistep predictor-corrector algorithm. If an
error estimator is available, it is also easier to change the step size with a
Runge-Kutta method than with a multistep method.
Remark. Recall the initial value problem
    y' = f(x, y),   y(x_n) = y_n.                                        (9.60)
We can calculate the Taylor series
    y(x_{n+1}) = y(x_n) + hy^{(1)}(x_n) + \frac{h^2}{2!}y^{(2)}(x_n) + . . .   (9.61)
Thus by defining
    \phi_T(x, y, h) := f(x, y) + \frac{h}{2!}\frac{df}{dx}(x, y) + . . .       (9.62)
we see that the Taylor algorithm of order p falls within the class (9.55).
If we choose values for the constants c_r, a_r, b_{rs} such that the expansion of the
function \phi defined by (9.56) in powers of h differs from the expansion for \phi_T(x, y, h)
defined in (9.62) only in the p-th and higher powers of h, then the method clearly
has order p. Although there is a great deal of tedious manipulation involved in
deriving Runge-Kutta methods of higher order, some well known methods have
already been developed. Quite often, for a given order, there is a 2-parameter
family of Runge-Kutta methods, all of which have the same order of accuracy.
Examples. Two fourth-order 4-stage explicit Runge-Kutta methods:

    0    |  0
    1/2  |  1/2   0
    1/2  |  0     1/2   0                                                (9.63)
    1    |  0     0     1     0
         |  1/6   2/6   2/6   1/6

    0    |  0
    1/3  |  1/3   0
    2/3  | -1/3   1     0                                                (9.64)
    1    |  1    -1     1     0
         |  1/8   3/8   3/8   1/8

Example. The unique 2-stage implicit Runge-Kutta method of order 4:

    1/2 + \sqrt{3}/6  |  1/4               1/4 + \sqrt{3}/6
    1/2 - \sqrt{3}/6  |  1/4 - \sqrt{3}/6  1/4                           (9.65)
                      |  1/2               1/2

Example. A 3-stage semi-explicit Runge-Kutta method of order 4:

    0    |  0     0     0
    1/2  |  1/4   1/4   0
    1    |  0     1     0                                                (9.66)
         |  1/6   4/6   1/6
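The first tableau (9.63) is the classical fourth-order method; a minimal implementation follows, again tested on y' = -y (our own test setup).

    import numpy as np

    def rk4_step(f, x, y, h):
        """One step of the classical 4-stage method (9.63)."""
        k1 = f(x, y)
        k2 = f(x + h/2, y + h/2 * k1)
        k3 = f(x + h/2, y + h/2 * k2)
        k4 = f(x + h, y + h * k3)
        return y + h / 6 * (k1 + 2*k2 + 2*k3 + k4)

    # y' = -y, y(0) = 1; the global error at x = 1 decreases like O(h^4).
    for h in (0.2, 0.1, 0.05):
        x, y = 0.0, 1.0
        while x < 1.0 - 1e-12:
            y = rk4_step(lambda t, u: -u, x, y, h)
            x += h
        print(h, abs(y - np.exp(-1.0)))

Halving h reduces the error by roughly a factor of 16, consistent with order 4.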

Theorem 9.4.1 (Butcher's attainable order theorem)
(a) Let P^*(R) be the highest order that can be attained by an R-stage explicit
Runge-Kutta method. Then

    R         1   2   3   4   5   6   7   8   9   ≥ 10
    P^*(R)    1   2   3   4   4   5   6   6   7   ≤ R - 2                (9.67)

(b) For any R ≥ 2, there exists an R-stage implicit Runge-Kutta method of
order 2R.
We now consider the absolute stability property of Runge-Kutta methods.
We apply a Runge-Kutta method to the test problem y' = \lambda y. Define the global
error \tilde e_n := y(x_n) - \tilde y_n, \bar h := \lambda h, k := [k_1, . . . , k_R]^T and e := [1, . . . , 1]^T. We first
observe that
    k = \lambda e y_n + \bar h B k,                                      (9.68)
or equivalently,
    k = \lambda(I - \bar h B)^{-1} e\,y_n.                               (9.69)
Thus, analogous to (9.37), we have
    \tilde e_{n+1} = \tilde e_n + \bar h c^T(I - \bar h B)^{-1} e\,\tilde e_n   (9.70)
                     + (local error due to truncation and round-off).    (9.71)
The Runge-Kutta method is said to be absolutely stable on the interval (\alpha, \beta) if
    r := 1 + \bar h c^T(I - \bar h B)^{-1} e                             (9.72)
satisfies |r| < 1 whenever \bar h \in (\alpha, \beta). Note that, in contrast to the multistep
method, the stability of a Runge-Kutta method is determined by only one root.
In fact, it can be proved that

Theorem 9.4.2 For R = 1, 2, 3, 4, all R-stage explicit Runge-Kutta methods of
order R have the same interval of absolute stability. In each of these cases,
    r = 1 + \bar h + \frac{1}{2}\bar h^2 + . . . + \frac{1}{R!}\bar h^R.   (9.73)

Examples.

                 Euler Rule             Midpoint Rule                Simpson's Rule                                        Trapezoidal Rule
                 y_{n+1} = y_n + hf_n   y_{n+1} = y_{n-1} + 2hf_n    y_{n+1} = y_{n-1} + \frac{h}{3}(f_{n-1}+4f_n+f_{n+1})   y_{n+1} = y_n + \frac{h}{2}(f_n+f_{n+1})
    a_{-1}       -1                     -1                           -1                                                    -1
    a_0          1                      0                            0                                                     1
    a_1                                 1                            1
    b_{-1}       0                      0                            1/3                                                   1/2
    b_0          1                      2                            4/3                                                   1/2
    b_1                                 0                            1/3
    order r      1                      2                            4                                                     2
    steps p+1    1                      2                            2                                                     1

Figure 9.1: Examples of Multi-step Methods.
