
P1 Engineering Computation
David Murray
david.murray@eng.ox.ac.uk
www.robots.ox.ac.uk/dwm/Courses/1EC
Hilary 2001
Intro: Accuracy and Efficiency
Preamble: Tales from Linear Algebra
The linear or linearizable nature of many systems means that engineers often need to solve a set of simultaneous linear equations Ax = b, or

$$\begin{bmatrix} A_{11} & A_{12} & \dots & A_{1n} \\ A_{21} & A_{22} & \dots & A_{2n} \\ \vdots & & & \vdots \\ A_{m1} & A_{m2} & \dots & A_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

where

A is a known m × n matrix,
x is the unknown vector of length n we wish to find,
b is a known m × 1 vector.

In other words we have m equations and n unknowns.
What happens for different sizes of m and n?

[Figure: the shapes of the m×n matrix A, the n×1 vector x and the m×1 vector b sketched for the three cases m < n (fewer equations than unknowns), m = n (same number) and m > n (more equations than unknowns).]

1. m < n: there are fewer equations than unknowns and we cannot determine all the n unknowns.

2. m > n: If all the equations are linearly independent, then m − n of the equations must be inconsistent with the rest.

3. m = n: If there are the same number of linearly independent equations as unknowns we can obtain a unique solution for x. Matrix A is square and invertible (so |A| ≠ 0).

We will stick with Case 3: m = n. In other words, A is a SQUARE matrix.
Solving systems using Gauss Jordan Elimination
The fundamental operation is the subtraction of some multiple f of one row c of the augmented matrix from another row r, where r > c. E.g. for

$$\begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 2 \\ 9 & -3 & -1 \end{bmatrix} x = \begin{bmatrix} 14 \\ 13 \\ 0 \end{bmatrix} \qquad \text{the augmented matrix AM is} \qquad \begin{bmatrix} 1 & 2 & 3 & 14 \\ -1 & 4 & 2 & 13 \\ 9 & -3 & -1 & 0 \end{bmatrix}.$$

Now perform row operations on the Augmented Matrix:

$$R_2 \leftarrow R_2 - (-1)R_1, \quad R_3 \leftarrow R_3 - (9)R_1: \qquad \begin{bmatrix} 1 & 2 & 3 & 14 \\ 0 & 6 & 5 & 27 \\ 0 & -21 & -28 & -126 \end{bmatrix}$$

then

$$R_3 \leftarrow R_3 - \left(-\tfrac{7}{2}\right)R_2: \qquad \begin{bmatrix} 1 & 2 & 3 & 14 \\ 0 & 6 & 5 & 27 \\ 0 & 0 & -21/2 & -63/2 \end{bmatrix}$$

whence x_3 = 3, x_2 = 2 and x_1 = 1.
A Computer Algorithm for Gauss Elimination
To zero the (r, c)th element, the multiple to be subtracted is f = A_rc / A_cc, where A is the current working matrix:

$$R_r \leftarrow R_r - (A_{rc}/A_{cc})\, R_c$$

Example: to clear the −21 in

$$\begin{bmatrix} 1 & 2 & 3 & 14 \\ 0 & 6 & 5 & 27 \\ 0 & -21 & -28 & -126 \end{bmatrix}$$

set (r, c) = (3, 2), then

$$R_3 \leftarrow R_3 - (-21/6)R_2: \qquad \begin{bmatrix} 1 & 2 & 3 & 14 \\ 0 & 6 & 5 & 27 \\ 0 & 0 & -21/2 & -63/2 \end{bmatrix}$$
Algorithm
This suggests the following algorithm or recipe to perform GJ elimination:
GaussJordan(A, n, b, x)
    for c = 1 to n−1 step 1 do
        for r = (c+1) to n step 1 do
            f ← A_rc / A_cc
            for k = c to n step 1 do
                A_rk ← A_rk − f·A_ck
            end for
            b_r ← b_r − f·b_c
        end for
    end for
    for r = n to 1 step −1 do
        x_r ← ( b_r − Σ_{c=r+1}^{n} A_rc·x_c ) / A_rr
    end for
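As a concrete illustration, here is a minimal Python sketch of the pseudocode above; it is not the lecture's own code, and the 2×2 system in the final line is a made-up example just to show usage.

```python
def gauss_eliminate(A, b):
    """Solve A x = b by forward elimination and back substitution (no pivoting)."""
    n = len(b)
    A = [row[:] for row in A]          # work on copies
    b = b[:]
    # Forward elimination: clear column c below the diagonal
    for c in range(n - 1):
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]      # fails, or loses accuracy, if A[c][c] is zero or small
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x

print(gauss_eliminate([[2.0, 1.0], [1.0, 3.0]], [4.0, 7.0]))   # -> [1.0, 2.0]
```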
All goes well ...
... until you try to solve
$$\begin{bmatrix} 6.0000 & 9.0100 & 6.0000 & 5.0000 \\ 2.0000 & 3.0000 & 3.0000 & 2.0000 \\ 4.0000 & 3.0000 & 0.0000 & 1.0000 \\ 0.0000 & 2.0000 & 0.0000 & 1.0000 \end{bmatrix} x = \begin{bmatrix} 6.0000 \\ 2.0000 \\ 7.0000 \\ 0.0000 \end{bmatrix}$$

on a computer (or calculator) with a small number of digits in the mantissa.

Here as an example we have used 5 digits, so that every number is rounded to appear in scientific notation, or mantissa/exponent form, 1.2345 × 10^exponent.
Example ctd ...
Initial:
    6.0000    9.0100    6.0000    5.0000 |     6.000
    2.0000    3.0000    3.0000    2.0000 |     2.000
    4.0000    3.0000    0.0000    1.0000 |     7.000
    0.0000    2.0000    0.0000    1.0000 |     0.000

Step 1a,b,c:
    6.0000    9.0100    6.0000    5.0000 |     6.000
    0.0000    0.0033    5.0000    3.6667 |     4.000
    0.0000    9.0067    4.0000    4.3333 |    11.000
    0.0000    2.0000    0.0000    1.0000 |     0.000

Step 2a,b:
    6.0000    9.0100        6.0000        5.0000 |      6.000
    0.0000    0.0033        5.0000        3.6667 |      4.000
    0.0000    0.0000    13642.0000    10004.0000 |  10906.000
    0.0000    0.0000     3030.3000     2223.2000 |   2424.200

Step 3:
    6.0000    9.0100        6.0000        5.0000 |      6.000
    0.0000    0.0033        5.0000        3.6667 |      4.000
    0.0000    0.0000    13642.0000    10004.0000 |  10906.000
    0.0000    0.0000        0.0000        1.0000 |      1.700
Example ctd ...

Our solution: x = (−1.1981, 0.8182, 0.4472, −1.7000)
Actual solution: x = (−0.9240, 0.6608, 0.1670, −1.3216)

So why is the answer so hopelessly inaccurate, even though the system is not ill-conditioned?

The problem occurs at the line f ← A_rc / A_cc. Here

Our algorithm fails if at any stage A_cc = 0.
Our algorithm greatly amplifies errors arising from finite machine accuracy when any A_cc is small.

Note that in both cases, we are concerned with the WORKING values of A_cc, so the problem cannot be predicted from the initial values without doing the elimination.
Overcoming the problem using Partial Pivoting
At every step, rows of the augmented matrix are swapped so that A_cc, the pivot element, has greater magnitude than any other A_rc with r > c.

This matrix is about to cause problems when clearing the 2nd column (c = 2). With partial pivoting you would swap rows 2 and 4 before doing the clearance:

$$\begin{bmatrix} 5 & 6 & 2 & 7 & 1 \\ 0 & 0 & 9 & 7 & 2 \\ 0 & 2 & 4 & 6 & 3 \\ 0 & 3 & 2 & 1 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 5 & 6 & 2 & 7 & 1 \\ 0 & 3 & 2 & 1 & 4 \\ 0 & 2 & 4 & 6 & 3 \\ 0 & 0 & 9 & 7 & 2 \end{bmatrix}$$

Note that there is no need to swap all the rows such that they are ordered. It is only the value of the pivot element that is important.
Algorithm to swap rows
We need to find the row r* that has the largest magnitude element. If we are clearing column c, the following snippet finds and swaps the rows:

// Partial pivot row swapper ...
r* ← c
for r = c+1 to n step 1 do
    if |A_rc| > |A_{r*c}| then
        r* ← r
    end if
end for
// Swap elements in row c (yes!!) with those in row r*
for k = 1 to n step 1 do
    temp ← A_ck
    A_ck ← A_{r*k}
    A_{r*k} ← temp
end for
temp ← b_c
b_c ← b_{r*}
b_{r*} ← temp
Algorithm for GJ with Partial Pivoting
So our Partial Pivoting Gauss Elimination algorithm becomes
GaussJordanWithPivoting(A, n, b, x)
    for c = 1 to n−1 step 1 do
        // Slot the row swapping code in here!
        // Then continue as before
        for r = (c+1) to n step 1 do
            f ← A_rc / A_cc
            for k = c to n step 1 do
                A_rk ← A_rk − f·A_ck
            end for
            b_r ← b_r − f·b_c
        end for
    end for
    for r = n to 1 step −1 do
        x_r ← ( b_r − Σ_{c=r+1}^{n} A_rc·x_c ) / A_rr
    end for
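A sketch of the same routine in Python with the partial-pivot row swap slotted in, as the pseudocode suggests. Again this is only a sketch, not the lecture's own implementation.

```python
def gauss_eliminate_pivot(A, b):
    """Solve A x = b, swapping in the largest-magnitude pivot before each clearance."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for c in range(n - 1):
        # Find the row r* (from c downwards) with the largest |A[r][c]| ...
        r_star = max(range(c, n), key=lambda r: abs(A[r][c]))
        # ... and swap it with row c, in both A and b
        A[c], A[r_star] = A[r_star], A[c]
        b[c], b[r_star] = b[r_star], b[c]
        # Then clear column c as before
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x
```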
Solution WITH Partial Pivoting
(1a,b) Augmented Matrix A|b (the pivoted matrix is the same!)
    6.0000    9.0100    6.0000    5.0000 |    6.000
    2.0000    3.0000    3.0000    2.0000 |    2.000
    4.0000    3.0000    0.0000    1.0000 |    7.000
    0.0000    2.0000    0.0000    1.0000 |    0.000

(2a) Augmented Matrix A|b after column cleared
    6.0000    9.0100    6.0000    5.0000 |    6.000
    0.0000    0.0033    5.0000    3.6667 |    4.000
    0.0000    9.0067    4.0000    4.3333 |   11.000
    0.0000    2.0000    0.0000    1.0000 |    0.000

(2b) Pivoted Matrix A|b
    6.0000    9.0100    6.0000    5.0000 |    6.000
    0.0000    9.0067    4.0000    4.3333 |   11.000
    0.0000    0.0033    5.0000    3.6667 |    4.000
    0.0000    2.0000    0.0000    1.0000 |    0.000

(3a) Augmented Matrix A|b after column cleared
    6.0000    9.0100    6.0000    5.0000 |    6.000
    0.0000    9.0067    4.0000    4.3333 |   11.000
    0.0000    0.0000    4.9985    3.6651 |    3.996
    0.0000    0.0000    0.8882    1.9623 |    2.4427

(3b) Pivoted Matrix A|b
    6.0000    9.0100    6.0000    5.0000 |    6.000
    0.0000    9.0067    4.0000    4.3333 |   11.000
    0.0000    0.0000    4.9985    3.6651 |    3.996
    0.0000    0.0000    0.8882    1.9623 |    2.4427

(4) Augmented Matrix A|b after column cleared
    6.0000    9.0100    6.0000    5.0000 |    6.000
    0.0000    9.0067    4.0000    4.3333 |   11.000
    0.0000    0.0000    4.9985    3.6651 |    3.996
    0.0000    0.0000    0.0000    1.3110 |    1.7326
Solution WITH Partial Pivoting
Solution is now x = (−0.9240, 0.6607900, 0.1696100, −1.3216).

So compare:

-0.9240   0.6608   0.1696   -1.3216    Proper solution
-1.1981   0.8182   0.4472   -1.7000    Unpivoted
-0.9240   0.6607   0.1696   -1.3216    Pivoted

MESSAGE: One cannot lazily assume that computers are accurate. Pivoting alleviates problems of finite machine precision, but not ill-conditioning.
The Importance of Being Efficient

Recall that the work required for Gauss Elimination effectively performs the LU decomposition of the matrix A, and that LU decomposition provides a way of inverting a matrix.

Indeed, solving the system using Gauss Elimination has similar computational cost to solving x = A^{-1} b when, and only when, the inverse is found by LU decomposition.

In more detail, you'll see that to zero a single element (r, c) using

$$R_r \leftarrow R_r - (f)\, R_c$$

takes n MULTs and n ADDs. This has to be done for n(n−1)/2 elements, so in total the requirement is for

$$\tfrac{1}{2} n^2 (n-1) \text{ MULTs} \; + \; \tfrac{1}{2} n^2 (n-1) \text{ ADDs}.$$

Now multiplications (and divisions) take longer than additions (and subtractions), so the time is dominated by the O(n^3) MULTs.
Experiment to check the n^3 dependence of LU decomposition

Looking at the times taken on the lower straight line in the graph, you'll see that the slope of the log-log graph is indeed 3 (count the decades), so the method is O(n^3).

It is common in text books to see matrix inversion using the method of cofactors. It is a simple method to describe, so perhaps it is a good method for computation ...

[Figure: log-log plot of the time to invert (seconds) against the number of rows or columns, with curves labelled "LU" and "cofactors".]

The upper curve is the time for solving using cofactors. This method behaves much worse, more like n^(n−1).
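If you want to repeat the LU part of this experiment yourself, a rough sketch along the following lines will do. It assumes NumPy's LU-based solver, numpy.linalg.solve, as the O(n^3) method; the absolute times will of course depend on your machine.

```python
import time
import numpy as np

for n in (100, 200, 400, 800):
    A = np.random.rand(n, n)
    b = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)              # LU factorisation + substitution
    print(n, time.perf_counter() - t0)
# Doubling n should multiply the time by roughly 2**3 = 8 once n is large.
```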
Is that so bad? No, it is STAGGERINGLY BAD!
Look at the times from the graph:

Matrix Size   LU        Cofactor
5 × 5         28 μs     550 μs              Actual (450 MHz Pentium)
10 × 10       0.2 ms    28 seconds          Actual
12 × 12       0.3 ms    1.2 hours           Actual
13 × 13       0.5 ms    17 hours            Actual
20 × 20       1.0 ms    10 billion years    estimate!

Given that the sun is due to engulf us in 4 billion years, even you have to believe that efficiency matters!
1. Numerical Differentiation & Root Finding
Numerical Differentiation

In calculus, the derivative is defined as the limit as h tends to zero:

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$$

Making the limit h small rather than zero provides a simple means of estimating the derivative on a computer.

Suppose you wanted the numerical derivative of some function; your code would contain nothing more than the following routines:

Function(x)
    v ← (whatever the function involves)
    return(v)

Deriv(x, h)
    v ← (Function(x+h) − Function(x))/h
    return(v)

The quality of the approximation depends on the size of h, as well as on rounding errors (see later).
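In Python the pair of routines might look like the sketch below, using f(x) = cos(x^2) (the function used in the worked example a few slides on) as a stand-in for "whatever the function involves".

```python
import math

def func(x):
    return math.cos(x * x)

def deriv(x, h):
    # One-sided (forward) difference: O(h) accurate
    return (func(x + h) - func(x)) / h

print(deriv(0.5, 1e-4))    # about -0.2474, cf. the exact value -2*0.5*math.sin(0.25)
```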
Its Dependence on h is easily found ...
... using the Taylor expansion

$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \dots$$

and so

$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2!} f''(x) + \dots$$

We don't need to know what f''(x) is. It suffices to know that our approximation is

$$f'(x) = \frac{f(x+h) - f(x)}{h} + O(h)$$

We can forget the h^2, h^3 etc terms, because these get small more quickly than the leading error term in h.

So we say the approximation is Order(h), meaning in practice that if we reduce h by a factor of 2, the ERROR in the approximation should reduce by about a factor of 2.
Better approximations
You can do better than O(h) by evaluating the function at more points. By using two points we get a straight-line approximation which is O(h). By using three we get a quadratic approximation which is O(h^2).

$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f''''(x) + \dots$$

$$f(x-h) = f(x) - h f'(x) + \frac{h^2}{2!} f''(x) - \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f''''(x) + \dots$$

Then

$$f(x+h) - f(x-h) = 2h f'(x) + 2\frac{h^3}{3!} f'''(x) + \dots$$

and

$$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{3!} f'''(x) + \dots$$
So the order h^2 approximation is ...

$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^2).$$

Curiously enough, the function at the third point, f(x), doesn't have to be computed, so this is no more expensive to compute than the O(h) approximation!

Function(x)
    v ← (whatever the function is)
    return(v)

Deriv(x, h)
    v ← (Function(x+h) − Function(x−h))/(2h)
    return(v)
Example
Example for f = cos(x^2) evaluated at x = 0.5. The actual derivative in the table is computed directly using −2x sin(x^2).

Note that as h is halved, the O(h) error does indeed halve, and the O(h^2) error quarters. But the error does not keep getting smaller! So what is happening after the bold values?

h         Actual     df/dx O(h)  err       df/dx O(h^2)  err
0.10000 -0.24740 -0.33016 0.08275 -0.25665 0.00925
0.05000 -0.24740 -0.28636 0.03895 -0.24972 0.00232
0.02500 -0.24740 -0.26628 0.01888 -0.24798 0.00058
0.01250 -0.24740 -0.25670 0.00929 -0.24755 0.00014
0.00625 -0.24740 -0.25202 0.00461 -0.24744 0.00004
0.00313 -0.24740 -0.24971 0.00231 -0.24742 0.00002
0.00156 -0.24740 -0.24853 0.00112 -0.24740 0.00000
0.00078 -0.24740 -0.24796 0.00055 -0.24738 0.00002
0.00039 -0.24740 -0.24765 0.00025 -0.24734 0.00006
0.00020 -0.24740 -0.24750 0.00009 -0.24734 0.00006
0.00010 -0.24740 -0.24719 0.00021 -0.24719 0.00021
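A short script along these lines (a sketch, not the code used to make the table) reproduces the behaviour: the O(h) error roughly halves and the O(h^2) error roughly quarters as h is halved, until rounding error eventually takes over.

```python
import math

f = lambda x: math.cos(x * x)
exact = -2 * 0.5 * math.sin(0.5 ** 2)      # actual derivative at x = 0.5

h = 0.1
for _ in range(8):
    fwd = (f(0.5 + h) - f(0.5)) / h                  # O(h) estimate
    cen = (f(0.5 + h) - f(0.5 - h)) / (2 * h)        # O(h^2) estimate
    print(f"{h:.5f}  {fwd:.5f}  {abs(fwd - exact):.5f}  {cen:.5f}  {abs(cen - exact):.5f}")
    h /= 2
```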
Derivative expressions: The method of undetermined coefficients

You will know already that when you equate two polynomials, you can equate the coefficients of the same power. For example if

$$a + bx + cx^2 + dx^3 = x^2 - x^3$$

then a = b = 0 and c = 1, d = −1.

You can use this to create a more orderly method of deriving expressions for derivatives.

For example, suppose you wished to determine the expression for f''(x) in terms of f(x+h), f(x) and f(x−h). The orderly method is to write

$$f''(x) \approx A f(x+h) + B f(x) + C f(x-h).$$

Now use the Taylor expansions of f(x+h) and so on. We find

$$f''(x) \approx (A+B+C) f(x) + h(A-C) f'(x) + \frac{h^2}{2!}(A+C) f''(x) + \frac{h^3}{3!}(A-C) f'''(x) + \frac{h^4}{4!}(A+C) f''''(x) + \dots$$

Now make the coefficients of f(x) and f'(x) zero, and of f''(x) unity. Thence

$$A = C = \frac{1}{h^2}, \qquad B = -\frac{2}{h^2}.$$
and so the expression becomes ...

$$f''(x) = \frac{1}{h^2}\,\bigl( f(x-h) - 2 f(x) + f(x+h) \bigr) + O(h^2).$$

Why is it O(h^2)?

The term that would make it O(h) is

$$\frac{h^3}{3!}(A-C) f'''(x)$$

and by good fortune this is zero, because (A − C) = 0.

So the O(h^2) comes from the f'''' term in

$$f''(x) \approx (A+B+C) f(x) + h(A-C) f'(x) + \frac{h^2}{2!}(A+C) f''(x) + \frac{h^3}{3!}(A-C) f'''(x) + \frac{h^4}{4!}(A+C) f''''(x) + \dots$$
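As a quick sanity check of this result, the following sketch applies the formula to f(x) = sin(x), whose second derivative, −sin(x), is known exactly; halving h should cut the error by about a factor of 4.

```python
import math

def second_deriv(f, x, h):
    # Central second difference derived above, O(h^2) accurate
    return (f(x - h) - 2 * f(x) + f(x + h)) / (h * h)

x = 1.0
for h in (0.1, 0.05, 0.025):
    err = abs(second_deriv(math.sin, x, h) - (-math.sin(x)))
    print(h, err)
```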
Finding Roots to Equations
Now turn to the problem of finding a root of a nonlinear equation, i.e. finding a value of x for which f(x) = 0, where f(x) is a continuous nonlinear function of x. (Why nonlinear?)

The bisection method assumes that initially there are known values x_1 and x_2 which straddle the desired root. Because f is continuous, f(x_1) and f(x_2) must straddle zero.

Now look at f(r), where r is the midpoint between x_1 and x_2.

If f(r) has the opposite sign to f(x_1), the root lies in the range [x_1, r] and x_2 ← r in the next iteration.

Otherwise, the root lies in the range [r, x_2], and x_1 ← r.

[Figure: the straddling interval at iteration n and the halved interval at iteration n+1.]
Bisection Algorithm
The bisection algorithm looks like
Bisection(x1, x2, itermax, tol; root)
    root = (x1 + x2)/2
    iter = 0
    while iter < itermax AND |Function(root)| > tol do
        if Function(root) < 0 then
            x1 = root
        else
            x2 = root
        end if
        root = (x1 + x2)/2
        iter = iter + 1
    end while

Function(x)
    v ← (whatever the function is)
    return(v)

Here we have assumed f(x_1) < 0 < f(x_2) initially.
If this is not the case, then just swap x_1 and x_2 around.
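A minimal Python rendering of this pseudocode might look as follows. As in the slides it assumes f(x1) < 0 < f(x2); the function in the final call is the one from the example on the next slide.

```python
import math

def bisection(func, x1, x2, itermax=50, tol=1e-10):
    root = (x1 + x2) / 2
    it = 0
    while it < itermax and abs(func(root)) > tol:
        if func(root) < 0:
            x1 = root
        else:
            x2 = root
        root = (x1 + x2) / 2
        it += 1
    return root

# f(-1) < 0 < f(-3), so pass x1 = -1, x2 = -3
print(bisection(lambda x: math.exp(2 * x) - x - 2, -1.0, -3.0))   # about -1.98097
```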
Example of Bisection
The figure below illustrates the use of the bisection method to find one of the two roots of e^{2x} − x − 2 = 0, starting with the interval [−3, −1]. The absolute error has been worked out afterwards using the root eventually determined.
Iter Root Abs Error
1 -1.5000000000e+00 4.8097395897e-01
2 -1.7500000000e+00 2.3097395897e-01
3 -1.8750000000e+00 1.0597395897e-01
4 -1.9375000000e+00 4.3473958969e-02
5 -1.9687500000e+00 1.2223958969e-02
...
23 -1.9809740782e+00 1.1920928955e-07
24 -1.9809739590e+00 0.0000000000e+00
25 -1.9809739590e+00 0.0000000000e+00
[Figure: log10 absolute error against iteration number for this bisection run ("bisection.dat").]
Bisection Efficiency: how does it converge?

Within the initial interval [x_1, x_2], the best guess for the root r is r = (x_1 + x_2)/2, so that the associated error, x − r, is at worst (x_2 − x_1)/2.

Each bisection, this worst error must get reduced by a factor of 2. Note from the graph that the actual error is unlikely to reduce uniformly!

If we let e_n = a e_{n−1} then e_n = a^n e_0, so

log(e_n) = log(e_0) + n log(a).

The slope of our graph is about log(a) = −7/22, or a = 0.48. I.e. the error magnitude halves each iteration, which is as expected.

[Figure: log10 absolute error against iteration number ("bisection.dat"), falling on a roughly straight line of slope about −7/22.]

Bisection Accuracy

The accuracy that is actually achieved depends on the numerical errors involved in computing f(x). Given a local Taylor series expansion δf ≈ f′ δx, a rounding error of δf in the evaluation of f(x) produces a corresponding error in the root of magnitude δx ≈ (f′)^{-1} δf.
Definition of order of convergence

If we define ε_n to be the maximum possible error in the approximate value for the root x after n iterations, then for a general iterative procedure we say the procedure is of order p if the relationship between the errors after n and n+1 iterations is of the form

$$\varepsilon_{n+1} \approx a\,(\varepsilon_n)^p$$

for some constants a, p.

The bisection method is clearly first order, with a = 1/2.

When using higher-order methods such as Newton-Raphson (see later), the error will converge more quickly towards zero, usually taking fewer iterations to achieve some specified accuracy.

When using a 32-bit floating point number with a 24-bit mantissa to represent x, the bisection method will achieve the best accuracy possible after just 24 bisections; because of the finite machine accuracy, after that point the new value of r will identically equal either x_1 or x_2 and so no further change will take place. (See later.)
Root finding using the Secant method

An obvious deficiency in the Bisection method is that it only asks whether points are below or above the root, but not by how much. That is, it does not use the gradient of the function.

The secant method uses a rough estimate of the gradient: the slope of the straight line between the last two estimates of the root.

[Figure: the straight line through (x_{n−1}, f(x_{n−1})) and (x_n, f(x_n)) cuts the x-axis at the new estimate x_{n+1}.]

It is easy to show using similar triangles that the next estimate of the root is

$$x_{n+1} = x_n - f(x_n)\,\frac{(x_n - x_{n-1})}{\bigl(f(x_n) - f(x_{n-1})\bigr)}.$$
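A sketch of this update in Python; the starting points below are chosen arbitrarily, so the iterates will not match the table on the next slide exactly, though the root found is the same.

```python
import math

def secant(func, x0, x1, itermax=20, tol=1e-12):
    f0, f1 = func(x0), func(x1)
    for _ in range(itermax):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # the update derived above
        x0, f0 = x1, f1
        x1, f1 = x2, func(x2)                  # only one new function evaluation
        if abs(f1) < tol:
            break
    return x1

print(secant(lambda x: math.exp(2 * x) - x - 2, -3.0, -2.5))   # about -1.98097
```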
Example: Root finding using Secant Method
Iter Root AbsErr f(root)
0 -3.000000000000 1.019026016104 1.002478752177
1 -1.991989258984 0.011015275087 0.010600704858
2 -1.981216135829 0.000242151932 0.000232939790
3 -1.980974088576 0.000000104680 0.000000100697
Final -1.980973983897 0.000000000001 0.000000000001
For general problems this method will converge faster than the bisection
method, but not as fast as the Newton-Raphson method analysed next.
However, unlike the Newton-Raphson method, it does not need the
separate computation of the gradient. Indeed, by storing the most recent
values of the function, the method requires only one function evaluation
per iteration.
So, though it requires more iterations than Newton-Raphson, each
iteration is faster. The consensus is that for functions where the actual
derivative is costly to compute, the secant method will be faster overall.
Root finding using the Newton-Raphson method

Newton-Raphson is an advance on the secant method because it makes use of the actual or estimated gradient at the current estimate of the root, but we must impose the additional pre-condition that f(x) has a continuous first derivative.

Given an approximate value for the root, x_n, the nonlinear function f(x) can be approximated in the local neighbourhood of x_n by a linear function with the same value and slope at x_n:

$$f(x) \approx f(x_n) + f'(x_n)\,(x - x_n).$$

Equating this linear approximation to zero defines the new approximate root, x_{n+1}:

$$f(x_n) + f'(x_n)\,(x_{n+1} - x_n) = 0 \;\Rightarrow\; x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

so long as f'(x_n) ≠ 0.
This can be interpreted graphically ...

... as the intercept on the x-axis of the tangent to the curve f(x) at x = x_n:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

[Figure: successive tangents to f(x) generate the estimates x_1, x_2, x_3, x_4.]

Newton-Raphson: Example

The table shows the first few iterations starting at x = −3 again for the function f(x) = e^{2x} − x − 2. Three iterations are sufficient to reach the solution.
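A sketch of the iteration in Python for the same function, with the derivative supplied analytically. The printed iterates should settle at about −1.98097 within three or four steps, in line with the slide's claim.

```python
import math

def f(x):
    return math.exp(2 * x) - x - 2

def df(x):
    return 2 * math.exp(2 * x) - 1

x = -3.0
for i in range(5):
    x = x - f(x) / df(x)       # x_{n+1} = x_n - f(x_n)/f'(x_n)
    print(i + 1, x, f(x))
```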
Newton-Raphson: Convergence
To learn about how NR converges for an arbitrary function f(x), it is convenient to make the substitution

$$\varepsilon_n \equiv x_n - x_t,$$

where x_n is the current estimate, and x_t the (unknown!) true root. Putting this into the NR iteration gives

$$x_{n+1} - x_t = x_n - \frac{f(x_n)}{f'(x_n)} - x_t \qquad \text{or} \qquad \varepsilon_{n+1} = \varepsilon_n - \frac{f(x_t + \varepsilon_n)}{f'(x_t + \varepsilon_n)}.$$

Now x_t is some fixed value so, as the figure indicates, we can think of ε_{n+1} as a function g of ε_n:

$$\varepsilon_{n+1} = g(\varepsilon_n) = \varepsilon_n - \frac{f(x_t + \varepsilon_n)}{f'(x_t + \varepsilon_n)}.$$

[Figure: sketch of g(ε_n) plotted against ε_n.]
To learn more about this function near the solution ...
... we can make a Taylor expansion about zero (i.e. a Maclaurin series expansion)

$$g(\varepsilon_n) = g(0) + \varepsilon_n \frac{dg}{d\varepsilon_n}(0) + \frac{1}{2!}\,\varepsilon_n^2\, \frac{d^2 g}{d\varepsilon_n^2}(0) + \dots$$

What is g(0)?

g(ε_n) = ε_n − f(x_t + ε_n)/f'(x_t + ε_n) implies g(0) = 0 − f(x_t + 0)/f'(x_t + 0). But f(x_t) = 0, so g(0) = 0.

What is dg/dε_n (0)?

Differentiating (it is straightforward: use the quotient rule!):

$$g(\varepsilon_n) = \varepsilon_n - \frac{f(x_t + \varepsilon_n)}{f'(x_t + \varepsilon_n)} \;\Rightarrow\; \frac{dg}{d\varepsilon_n} = \frac{f(x_t + \varepsilon_n)\, f''(x_t + \varepsilon_n)}{\bigl(f'(x_t + \varepsilon_n)\bigr)^2}.$$

Assuming that (i) f'(x_t) ≠ 0 and (ii) f(x) has a second derivative at x_t, it is clear that dg/dε_n (0) = 0.

It turns out that the higher derivatives are finite, so

$$\varepsilon_{n+1} = g(\varepsilon_n) \approx \frac{1}{2}\, g''(0)\, \varepsilon_n^2.$$

This tells us that Newton-Raphson iteration has second order convergence.
How big can the initial error be and still achieve
convergence?
We have just found that

$$\varepsilon_{n+1} \approx \frac{1}{2}\, g''(0)\, \varepsilon_n^2,$$

and so, roughly speaking, the error at each step of the iteration will decrease provided the initial error satisfies the condition

$$\frac{1}{2}\, |g''(0)\, \varepsilon_0| < 1.$$

Since

$$g''(0) = \frac{f''(x)}{f'(x)},$$

this condition can be re-written as

$$|\varepsilon_0| < 2\, \frac{|f'(x)|}{|f''(x)|}.$$
Graphical interpretation
To repeat, the condition is

$$|\varepsilon_0| < 2\, \frac{|f'(x)|}{|f''(x)|}.$$

This shows that the region within which the Newton-Raphson iteration is guaranteed to converge is small when f'(x) is small relative to f''(x). If the initial error lies outside this range, the iteration may not converge. This conclusion seems to make sense for the sketch below.

[Figure: a sketch of f(x) showing a wide range of convergence and a narrow range of convergence.]

If you need to be fussier: Newton-Raphson iteration must converge if (1/2) |g''(ε_n) ε_0| < 1 for all |ε_n| ≤ |ε_0|.
Newton-Raphson for multi-variable systems *
[* This will not be covered in the lectures, and is for interest. It is not complicated; in fact the reason for showing it at all is to suggest how easily NR can be extended to functions of two or more variables.]

Suppose we wanted to solve m simultaneous nonlinear equations in m variables. Collectively these equations can be expressed as

$$f(x) = 0,$$

where x = (x_1, x_2, ..., x_m) is an m-dimensional vector of variables and f(x) is an m-dimensional vector of non-linear functions of those variables. This notation is confusing at first, and it helps to look at the concrete example below where we have four variables x_1 ... x_4, and four functions f_1, ..., f_4 of x_1 ... x_4.

Given an approximate value for the root, x_n, the nonlinear vector function f(x) can again be approximated in the local neighbourhood of x_n by a linear function of the form

$$f(x) \approx f(x_n) + \frac{\partial f}{\partial x}\,(x - x_n).$$
The only real difference ...

... is that this time the linear approximation involves the Jacobian matrix ∂f/∂x, which is a square matrix whose (i, j) element is the partial derivative of the i-th component of f with respect to the j-th component of x:

$$\left[\frac{\partial f}{\partial x}\right]_{i,j} \equiv \frac{\partial f_i}{\partial x_j}.$$

Equating this linear approximation to zero defines the new approximate root, x_{n+1}, to be the solution of a set of simultaneous linear equations,

$$f(x_n) + \frac{\partial f}{\partial x}\,(x_{n+1} - x_n) = 0,$$

which can be solved by inverting the Jacobian:

$$x_{n+1} = x_n - \left[\frac{\partial f}{\partial x}\right]^{-1} f(x_n).$$

Compare this with

$$x_{n+1} = x_n - \bigl(f'(x_n)\bigr)^{-1} f(x_n)$$

for a single variable.
Multivariable NR: Example
Find a root of these four simultaneous equations, some of which are non-linear:

$$\begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix} = \begin{bmatrix} -x_1^2 - x_2^2 - x_3^2 + x_4 \\ x_1^2 + x_2^2 + x_3^2 + x_4^2 - 1 \\ x_1 - x_2 \\ x_2 - x_3 \end{bmatrix}.$$

I've chosen this system because it is easy to find the roots analytically. Requiring f_3 = 0 and f_4 = 0 forces x_1 = x_2 = x_3. The equations f_1 = 0 and f_2 = 0 then give x_4^2 + x_4 − 1 = 0 and 3x_1^2 = x_4. Hence x_4 > 0, and so x_4 = (−1 + √5)/2 = 0.61803399 and x_1 = x_2 = x_3 = 0.45388471.

We'll not show the code in detail, but the important bits are (1) a routine to work out the function, (2) a routine to work out the Jacobian, and (3) a routine to invert the Jacobian.

How would you work out the Jacobian? You can either compute the (partial) derivatives numerically using the techniques learned earlier in this lecture, or you can perform the partial differentiation analytically:

$$\frac{\partial f}{\partial x} = \begin{bmatrix} -2x_1 & -2x_2 & -2x_3 & 1 \\ 2x_1 & 2x_2 & 2x_3 & 2x_4 \\ 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{bmatrix}$$
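A sketch of the multivariable loop in Python using NumPy; rather than forming the inverse explicitly it solves the linear system for the Newton step. The lecture's own code is not shown (and may have used a numerically estimated Jacobian), so intermediate iterates need not match the table on the next slide exactly, but the root found is the same.

```python
import numpy as np

def f(x):
    x1, x2, x3, x4 = x
    return np.array([-x1**2 - x2**2 - x3**2 + x4,
                     x1**2 + x2**2 + x3**2 + x4**2 - 1,
                     x1 - x2,
                     x2 - x3])

def jacobian(x):
    x1, x2, x3, x4 = x
    return np.array([[-2*x1, -2*x2, -2*x3, 1.0],
                     [ 2*x1,  2*x2,  2*x3, 2*x4],
                     [ 1.0,  -1.0,   0.0,  0.0],
                     [ 0.0,   1.0,  -1.0,  0.0]])

x = np.array([0.6, 0.8, 1.0, 1.2])
for n in range(6):
    x = x - np.linalg.solve(jacobian(x), f(x))   # x_{n+1} = x_n - (df/dx)^{-1} f(x_n)
    print(n + 1, x)
# Converges to x1 = x2 = x3 = 0.45388..., x4 = 0.61803...
```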
Example of multivariable NR
Here's what happened with my code, starting at an arbitrary value of x.

Iter    x_1        x_2        x_3        x_4
0       0.600000   0.800000   1.000000   1.200000
1       0.566247   0.566247   0.566247   0.717704
2       0.466233   0.466233   0.466233   0.622113
3       0.454051   0.454051   0.454051   0.618041
4       0.453885   0.453885   0.453885   0.618034
Summary: Numerical Differentiation

Main tool is a Taylor expansion.

Use it directly for "show that" questions, but if starting from scratch use the method of undetermined coefficients:

E.g. to find f''(x) = A f(x+h) + B f(x) + C f(x−h), use Taylor expansions of f(x ± h) etc. to fill out the RHS. Then set to ZERO the coefficients of all derivatives lower than the one wanted (here f and f'), and set to UNITY the coefficient of the derivative wanted (here f'').

The order of error is the LOWEST power of h in the terms that are ignored in your result, m below:

f''(x) ≈ [Your result] + h^m (this) + h^{m+1} (that)
Summary: Root-finding

Use sketches to help you remember the methods!

Bisection: straddle and halve ... doesn't use gradient information.

Secant: average gradient info using a straight line

x_{n+1} = x_n − f(x_n) [x_n − x_{n−1}] / [f(x_n) − f(x_{n−1})]

Newton-Raphson: actual gradient information

x_{n+1} = x_n − f(x_n)/f'(x_n), which requires f'(x) ≠ 0
Summary: Order of convergence p in any iterative procedure
Write the error at each step as

ε_{n+1} = a (ε_n)^p

p is the order of convergence.

For root finding:

Bisection        p = 1      Don't use
Secant           p ≈ 1.6    Uses average gradient, but very cheap to compute
Newton-Raphson   p = 2      Better convergence, but needs numerical differentiation