
Lecture 1

Introduction
1 Course goals
Most of your engineering courses teach you theory and analytic methods using the techniques of
mathematics (calculus, linear algebra, differential equations, and so on). The purpose is to train
you to understand significant engineering problems from a fundamental theoretical perspective.
You need this foundation to be able to design effective solutions to difficult problems: it's
practically impossible to solve a problem you don't thoroughly understand.
Unfortunately, the set of engineering problems that can be solved analytically (with pencil, paper
and even a calculator) is limited to relatively simple and idealized cases. Indeed, many textbook
problems are carefully designed so that you can end up with the exact answer. However, most
applications in your engineering career will involve more complex systems for which you can
mathematically formulate the problem (if you've learned your theory) but cannot solve the
resulting equations analytically. People have long realized that in these cases numerical solutions
are possible. Yet before the development of digital computers, when calculations had to be done
by hand or slide rule, the application of numerical computing was fairly limited. As a result,
engineers in the past were forced to make many simplifying assumptions, build scale models, use
trial and error, and so on.
In the last several decades the development of computer technology has revolutionized
engineering. Almost any problem that can be formulated mathematically can be solved using
numerical techniques. Systems as complex as the supersonic flow of air through a jet engine or
the operation of an integrated circuit containing millions of transistors are now routinely
analyzed in minute detail using numerical methods implemented on computers.
This is the motivation behind EE 221 Numerical Computing for Engineers. The purpose of this
course is for you to:
1. learn fundamental numerical methods so you can understand how to formulate numerical
solutions to difficult engineering problems, and
2. develop programming skills using Scilab/Matlab so you are able to effectively implement
your numerical solutions.
The beginning of the course will be devoted to basic programming techniques, control structures,
input/output, graphics and the like. For the remainder of the course we will use those skills to
learn and implement numerical methods for important classes of problems.

2 Course topics
The main topics we will cover are listed below. Additional topics and programming project
examples will be covered as time permits.


Scilab/Matlab Basics
    Introduction
    Arrays
    Programming structures
    Input and output
    2D plots
    3D plots and animation

Numerical methods
    Root finding
    Polynomials
    Linear algebra
    Linear systems
    Nonlinear systems
    Interpolation
    Optimization
    Curve fitting
    Numerical calculus
    Random numbers
    Sparse systems

3 Course structure and grading


Your grade will be composed of the following components:

25% Homework, typically twelve assignments in total; no late homework is accepted. I
heavily weight the effort you put into your assignments. Assignments typically have a
pencil-and-paper component to be turned in as hard copy, and a programming component
to be submitted electronically via the course website.

75% Exams, 25% for each of three exams. The exams will include programming problems
to be done in the lab during the specified exam times as well as pencil-and-paper
problems.


4 Scilab and Matlab


Computers basically do lots of relatively simple calculations really fast. Numerical methods are
typically mathematical statements of algorithms that (more-or-less) solve analytically intractable
problems by doing lots of adding, subtracting, multiplying and dividing along with logic
operations to guide program flow and termination. Computing runs the gamut from hand-held
calculators to massively parallel supercomputers. Calculators are fairly straightforward, and you
will use one regularly throughout your coursework. Supercomputers are often dedicated to
specific types of problems for which highly optimized computer code, typically written in the
C/C++ or Fortran languages and compiled, is under ongoing development by teams of
researchers. In the middle lies the class of problems appropriate to a single PC.
It's a good idea to use a computing environment that best fits your problem. You want to get a
solution in a reasonable time with a minimum of effort. Simple arithmetic or evaluation of
elementary functions can readily be done with a $15 calculator. It would be a waste to boot up a
computer and open a programming environment in this case. For larger problems requiring a
computer there are many options. Spreadsheets are useful for many basic engineering situations.
They are quite visual and have graphical programming environments that are fairly intuitive. Why
write a computer program if you can solve the problem with a spreadsheet? At the high end
you can use a compiled language such as C with lots of time devoted to optimizing your code for
the utmost performance. In between is a large class of problems where you want the best of both
approaches. This is the target for computing environments such as Scilab and Matlab.
Scilab and Matlab are numerical computing environments that are extremely useful for the type
of calculations performed by engineers. These environments combine some of the best features of
a graphing calculator and a traditional computer programming environment. You will have an
opportunity to use this tool extensively in your engineering coursework and most likely in your
career.
Matlab (www.mathworks.com) is a commercial product, and expensive. Although the student
version is only about $50 (but expires when you are no longer a student), a single regular
license is more than $2,000. On the other hand, Scilab is a free, open-source tool
(www.scilab.org). They have quite a bit in common. If you learn one you can transfer that
knowledge to the other very easily. In this course we will emphasize Scilab. At the same time, I
will mostly emphasize those aspects of Scilab that are identical or very similar to Matlab so that
the conversion is essentially transparent. Where there are differences I will point those out.

4.1 History
(From http://en.wikipedia.org/wiki/Matlab)
Short for "matrix laboratory", MATLAB was invented in the late 1970s by Cleve Moler,
then chairman of the computer science department at the University of New Mexico. He
designed it to give his students access to LINPACK and EISPACK without having to
learn Fortran. It soon spread to other universities and found a strong audience within the
applied mathematics community. Jack Little, an engineer, was exposed to it during a visit
Moler made to Stanford University in 1983. Recognizing its commercial potential, he
joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded The
MathWorks in 1984 to continue its development.


(From http://en.wikipedia.org/wiki/Scilab)
Scilab was created in 1990 by researchers from INRIA and École nationale des ponts et
chaussées (ENPC). It was initially named Ψlab (Psilab). The Scilab Consortium was
formed in May 2003 to broaden contributions and promote Scilab as worldwide reference
software in academia and industry. In July 2008, in order to improve the technology
transfer, the Scilab Consortium joined the Digiteo Foundation. [...] Since July 2012,
Scilab is developed and published by Scilab Enterprises.
There are other free/open-source Matlab alternatives in addition to Scilab. One of the most
widely used is GNU Octave. In some ways Octave is arguably even more Matlab compatible than
Scilab. However, it's primarily written for Linux systems and is not as clean to install on
Windows PCs or Macs. The Python programming language has been gaining ground in the
numerical computing community, but differs significantly from Matlab. For these reasons I prefer
Scilab as the best free/open-source Matlab alternative.

5 Installing and running Scilab


Scilab can be downloaded from www.scilab.org. There are executable versions for Windows,
Mac and Linux. You can also download the complete source code. On Windows simply
download the installation program (scilab-5.5.0_x64.exe is about 128 MB). Double click and
agree to the license to install. The defaults for all options are recommended. When installation is
complete you should have a Scilab icon on your desktop that looks something like this.

When you launch Scilab, several windows open. In the middle is the console where we will do
most of our work. The other windows can be X'd out for now. We can open them later if need be.
This leaves just the console.
Type help at the prompt, or use the ? menu to open the help browser. From there you can
search or browse the documentation. This is a good way to find out about all the capabilities of
Scilab. In this class we will use only a small subset of these.


6 Interactive use
There are two basic ways to use Scilab/Matlab. As an interactive environment you can type a
command directly into the console at the prompt and Scilab will print the results. Then type
another command and so on. This is how we will start out using Scilab, as essentially a fancy
calculator. Later we'll learn how to use the editor to write structured programs.

6.1 Arithmetic
From the command line you can enter arithmetic expressions involving the addition, subtraction,
multiplication and division operators (+ - * /). For example
-->10+6/3-4*2
ans =
4.

The ans variable stores the result of the last calculation and can be used in the next calculation.
Here is the same series of calculations performed one step at a time.
-->10
ans =
10.
-->ans+6/3
ans =
12.
-->ans-4*2
ans =
4.

Operator precedence rules apply: multiplication and division are done before addition and
subtraction, and operations of equal precedence are performed from left to right. These rules can
be overridden with parentheses. For example
-->3+2/4
ans =
3.5

first divides 2 by 4 and then adds 3 to get 3.5; the + operator appears first (reading left to right)
but division has precedence over addition. However
-->(3+2)/4
ans =
1.25

Here the parentheses tell Scilab to perform the addition operation first, 3+2=5, followed by the
division 5/4=1.25. I find it good practice in all but the simplest cases to use parentheses to
make the order of operations explicit. It is perfectly fine to use redundant parentheses if this aids
the readability of an expression. For example the parentheses in the expression
-->3+(2/4)
ans =
3.5

have no effect, but you might find they make the order of operations explicit.
A very useful feature of Scilab/Matlab is that by pressing the up or down arrow keys (Ctrl-P and
Ctrl-N) on your keyboard, you can cycle through the command line history. This allows you to
easily repeat previous commands. The commands can also be edited using the arrow and
Backspace keys.


6.2 functions
Scilab/Matlab also has many built-in functions. The help menu provides complete listings. For
example, the trig functions sin, cos, tan as well as their inverses asin, acos, and
atan are available. You use parentheses to denote the argument.
-->sin(0.5)
ans =
0.4794255
-->asin(ans)
ans =
0.5

In Scilab the atan function is an overloaded function which allows you to provide different
numbers of arguments. The expression atan(z) returns the angle between -π/2 and π/2 for
which the tan function is z. It is a two-quadrant inverse tangent. The expression atan(y,x)
returns the angle between -π and π for which the tan function is y/x and the angle corresponds to
the polar angle of the rectangular-coordinate point (x,y). In Matlab the same functionality is
provided by atan2(y,x).
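For example, here is a quick illustrative check (the printed values are what Scilab should display
with the default format); atan(1,1) and atan(-1,-1) involve the same ratio y/x = 1 but lie in
opposite quadrants:
-->atan(1,1)
ans =
0.7853982
-->atan(-1,-1)
ans =
- 2.3561945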
The standard trig functions operate in radians. Thus cos(1) is the cosine of 1 radian. Since it is
often useful to work in degrees, Scilab provides the functions cosd, sind, atand and so
on where the appended d indicates degrees. So
-->atan(1)
ans =
0.7853982
-->atand(1)
ans =
45.

The power operation is performed by the ^ operator.


-->2^3
ans =
8.

Negative and fractional powers are supported.


-->3^(-0.37)
ans =
0.6659861

Square roots are evaluated with the sqrt(x) command.


-->sqrt(2)
ans =
1.4142136

The exponential function e^x is implemented as exp(x).


-->exp(3.2)
ans =
24.53253

The natural logarithm function ln x is, somewhat confusingly, implemented as log(x).


-->exp(3)
ans =
20.085537
-->log(ans)
ans =
3.

In many engineering books log x is used to refer to the base 10 logarithm. In Scilab/Matlab the
base 10 logarithm is denoted by log10(x).
-->log10(2)
ans =
0.30103
-->10^ans
ans =
2.


This distinction has caused problems for many an engineering student because we are so used to
using log(x) to denote the base-10 logarithm. You should pay particular attention to this.

6.3 variables
You can define variables and then operate on those. For example,
-->x = 2
x =
2.
-->x^2+3*x+7
ans =
17.

Typing in the variable name displays the variable value. If no such variable exists you'll get an
error message.
-->x
x =
2.
-->y
!--error 4
undefined variable : y

6.4 strings
Variables are typically numerical in nature. However, variables can be assigned text values using
quotation marks, as in the following examples.
-->a = 'this is a string'
a =
this is a string
-->b = ' and here is some more text'
b =
and here is some more text

Note that in Scilab you can also use double quotes, as in "this is a string", but not in
Matlab. Therefore if you use single quotes in Scilab your code will be more Matlab compatible.
You can concatenate two strings with the + operator in Scilab. In Matlab you can form an array or
use the strcat function.
-->a+b //a+b in Scilab, in Matlab [a,b] or strcat(a,b)
ans =
this is a string and here is some more text

Strings are particularly useful for labeling and describing numerical output.
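For example, here is a small illustrative sketch using Scilab's string() function, which converts
a number to a string so it can be concatenated with the + operator (in Matlab you would use
num2str and square brackets instead):
-->x = sqrt(2);
-->disp('the square root of 2 is ' + string(x))
the square root of 2 is 1.4142136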

6.5 constants (Scilab)


There are certain pre-defined constants built-in to Scilab. (Scilab calls these permanent
variables.) These are prefixed by the % symbol. For example,
-->%pi
%pi =
3.1415927

The % prefix protects the constant and keeps you from redefining it.
-->%pi = 3
!--error 13
redefining permanent variable

Many people find this notation messy but it has a very real advantage (see next section). If you
don't like this you can simply create a regular variable with the same value, as in
-->pi = %pi
pi =
3.1415927
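%pi is not the only permanent variable. Two more you will meet shortly are %e (Euler's number
e) and %i (the imaginary unit, discussed below); a quick illustrative sample at the console:
-->%e
ans =
2.7182818
-->%e^2
ans =
7.3890561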

6.6 constants (Matlab)


Matlab implements constants as pre-defined, but unprotected variables. These are not prefixed by
the % symbol. For example,
>> pi
ans =
3.1416

This is a weakness of Matlab, in my opinion, because you can accidentally redefine these, as in
>> pi = 3
pi =
3

If you later use the variable pi thinking its value is π you'll end up with erroneous results.

6.7 complex numbers (Scilab/Matlab with a wrinkle)


Complex numbers are created by making use of the imaginary unit i, as in math texts. For
example z = 2+3i is a complex number with real part 2 and imaginary part 3. In Scilab the
imaginary unit is the constant %i, as, for example, in
-->z = 2+3*%i
z =
2. + 3.i

Note that you need to explicitly include the multiplication operator *. You cannot just juxtapose
two variables to denote multiplication. If you don't like the % sign notation, you can define a
variable to equal %i and use that instead. For example,
-->j = %i
j =
i
-->z = 1+2*j
z =
1. + 2.i

In Matlab the imaginary unit is the predefined variable i (also j). Again this is dangerous because
you can redefine this variable accidentally. In fact i and j are very commonly used as index
variables in for loops.
>> i
ans =
0 + 1.0000i

>> i = 2
i =
2

To avoid this problem Matlab defines 1i as a protected variable equal to the imaginary unit
>> 2+3*1i
ans =
2.0000 + 3.0000i

In Matlab you can leave out the * operator as in

>> 3+2i
ans =
3.0000 + 2.0000i

which is a nice feature that Scilab does not have. Scilab/Matlab functions can operate on and
return complex numbers.
-->exp(%i*3)
ans =
- 0.9899925 + 0.1411200i

6.8 semicolon notation


When you type in an expression Scilab/Matlab echoes the answer to the console. If you terminate
the expression with a semicolon, nothing is echoed back. This is very useful if you are not
interested in intermediate results and only want to see a final answer. For example,
-->x = 2;
-->y = 3;
-->z = tan(y/x);
-->w = sqrt(z)
w =
3.7551857

6.9 Numerical format


Scilab allows you some control over the default appearance of numerical output. Later we will
see how to format displayed numbers in great detail. The format command allows you to select
either v or e formats. For example
-->format('v');
-->10*%pi
ans =
31.415927
-->format('e');
-->10*%pi
ans =
3.142D+01

The D+01 notation represents the exponent of 10 in scientific notation. D stands for double
precision (more on that later). So, the last line above indicates the value 3.142×10^1. You can also
control the number of digits that Scilab outputs using an optional second argument, as in
-->format('v',5);
-->%pi
%pi =
3.14
-->format('v',10);
-->%pi
%pi =
3.1415927

In Matlab you can choose from various predefined formats such as


>> format short
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793

See help format for more details.

7 Capturing an interactive session


Sometimes you want to save all the commands and results from an interactive Scilab/Matlab
console session. The easiest way to do this is to use the Edit menu. First click Edit => Select
all followed by Edit => Copy after which you can paste the results into Notepad or any other
text editor.
A more premeditated approach is to use the diary command. This opens a file and echoes all
console output to the file.
-->diary('part1.txt')
ans =
1.
-->str = 'everything I type should be saved'
str =
everything I type should be saved
-->str = ' in file part1.txt'
str =
in file part1.txt
-->x = 3
x =
3.


-->sqrt(3)
ans =
1.7320508
-->diary(0)

The file part1.txt then contains


-->str = 'everything I type should be saved'
str =
everything I type should be saved
-->str = ' in file part1.txt'
str =
in file part1.txt
-->x = 3
x =
3.
-->sqrt(3)
ans =
1.7320508
-->diary(0)

that is, everything in the interactive session after the diary command. The diary command also
works in Matlab. The only difference is that the file is closed with the command diary('off')
rather than diary(0).

8 Numerical limitations
If

x = (1 + 10^-20) - 1

then what is x? Obviously x = 10^-20. Let's verify this with Scilab
-->x = (1+1e-20)-1
x =
0.

Scilab says x = 0, which is wrong. Why? Is 10^-20 too small for Scilab to represent?
-->x = 1e-20
x =
1.000D-20

Clearly that's not the problem. Instead we are seeing a round-off error. Scilab, like your
calculator, is limited in the number of digits it can use to represent a number.
Suppose we represent numbers in scientific notation as

x = ±d1.d2d3d4 × 10^(±e1e2)

where d1, d2, d3, d4, e1, e2 are decimal digits (0-9). If y = +1.000×10^+00 and z = +1.000×10^-20
then y+z is exactly

y + z = +1.00000000000000000001×10^+00

However, in our format we are limited to four digits of accuracy. Therefore we have to round
off the exact value as

y + z = +1.000×10^+00

The tiny value 10^-20 "fell off the end" so to speak. If we then calculate x = (y+z) - 1 we get
x = 0.
Scilab stores numbers in IEEE 754 double-precision binary floating-point format using 64 bits
or 8 bytes of computer memory. One bit accounts for the sign, eleven bits for the exponent
and 52 bits for the fraction or significand or mantissa. Since 2^52 ≈ 5×10^15 we can say that a
double precision number has roughly 15-16 decimal digits of accuracy. If a calculation requires
more digits of accuracy than this then it will not produce a correct result.
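You can see this limit directly at the console. Scilab provides the double-precision machine
epsilon as the permanent variable %eps; the short session below is an illustrative check (adding
%eps to 1 is detectable, while adding half of it is rounded away):
-->%eps
ans =
2.220D-16
-->(1 + %eps) == 1
ans =
F
-->(1 + %eps/2) == 1
ans =
T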
Now that's a lot of accuracy, but not enough to correctly calculate (1 + 10^-20) - 1. Unfortunately
this can be a very practical problem for us when working with numerical methods. Consider the
definition of the derivative of the function f(x)
df/dx = lim(δ→0) [f(x+δ) - f(x)]/δ

We might estimate this numerically as

df/dx ≈ [f(x+δ) - f(x)]/δ

for δ very small. Ideally the approximation should get better and better as δ decreases.

Let's test this with f(x) = e^x, for which df/dx at x = 0 is e^0 = 1.

-->delta = 1e-3;
-->(exp(delta)-exp(0))/delta
ans =
1.000500166708385

-->delta = 1e-6;
-->(exp(delta)-exp(0))/delta
ans =
1.000000499962184
-->delta = 1e-9;
-->(exp(delta)-exp(0))/delta
ans =
1.000000082740371
-->delta = 1e-12;
-->(exp(delta)-exp(0))/delta
ans =
1.000088900582341
-->delta = 1e-15;
-->(exp(delta)-exp(0))/delta
ans =
1.110223024625157

Notice that as we decrease δ, initially the approximation to the derivative improves. For δ = 10^-9
we get

df/dx ≈ 1.000000082740371

which is accurate to about 8 digits. But further reduction of δ actually leads to worse accuracy.
For δ = 10^-15 we have

df/dx ≈ 1.110223024625157

which is not even accurate to 2 digits! In fact going to δ = 10^-20 results in
-->delta = 1e-20;
-->(exp(delta)-exp(0))/delta
ans =
0.

which is completely wrong! The lesson is that we need to consider numerical limitations very
carefully when we develop and implement numerical algorithms.


9 References
Numerical methods books used as fundamental references throughout these notes
1. Press, W. H., Flannery, B. P., Teukolsky, S. A. and Vetterling, W. T., Numerical Recipes in
C, Cambridge, 1988, ISBN: 0-521-35465-X.
2. Recktenwald, G., Numerical Methods with Matlab: Implementation and Application,
Prentice Hall, 2000, ISBN: 0-201-30860-6.
3. Heath, M. T., Scientific Computing: An Introductory Survey, McGraw Hill, 2002, ISBN:
0-07-239910-4.
4. Urroz, G. E., Numerical and Statistical Methods with SCILAB for Science and
Engineering Vol. 1, BookSurge Publishing, 2001, ISBN-13: 978-1588983046.
5. http://www.mathworks.com/moler/
Software sites
1. http://www.scilab.org, official Scilab website
2. http://www.mathworks.com, official Matlab website
3. https://www.gnu.org/software/octave/, official GNU Octave website
4. https://www.python.org/, official Python website
5. http://www.gnu.org/software/gsl/, GNU Scientific Library
Wikipedia articles
1. http://en.wikipedia.org/wiki/Scilab
2. http://en.wikipedia.org/wiki/Matlab
3. http://en.wikipedia.org/wiki/GNU_Octave
4. http://en.wikipedia.org/wiki/Python_(programming_language)


Lecture 2
Arrays
1 Introduction
As the name Matlab is a contraction of "matrix laboratory", you would be correct in assuming
that Scilab/Matlab have a particular emphasis on matrices, or more generally, arrays. Indeed, the
manner in which Scilab/Matlab handle arrays is one of their great strengths. Matrices, vectors and
the operations of linear algebra are of tremendous importance in engineering, so this material is
quite foundational to much of what we will do in this course. In this lesson we are going to focus
on generating and manipulating arrays. In later lessons we will use them extensively to solve
problems. Consider the following session.
-->A = [1,2;3,4]
A =
   1.   2.
   3.   4.
-->x = [1;2]
x =
   1.
   2.
-->y=A*x
y =
   5.
   11.

Here we defined a 2-by-2 matrix A and a 2-by-1 vector x. We then computed the matrix-vector
product y=A*x which is a 2-by-1 vector.
There are various ways to create an array. In general the elements of an array are entered between
the symbols [...]. A space or a comma between numbers moves you to the next column while
a carriage return ("enter" key on the keyboard) or a semi-colon moves you to the next row. For
example
-->B = [ 1 2
-->3 4]
B =
   1.   2.
   3.   4.

creates a 2-by-2 array using spaces and a carriage return. The following example shows how
either commas or spaces can be used to separate columns.
-->u = [1,2,3]
u =
   1.   2.   3.
-->v = [1 2 3]
v =
   1.   2.   3.

It's a matter of preference, but I find that using commas increases readability, especially when
entering more complicated expressions. The comma clearly delimits the column entries. Now we
demonstrate how either semi-colons, carriage returns or both can be used to separate rows.
-->x = [1;2;3]
x =
   1.
   2.
   3.
-->y = [1
-->2
-->3]
y =
   1.
   2.
   3.
-->z = [1;
-->2;
-->3]
z =
   1.
   2.
   3.

Again, my preference is for a visible delimiter.

2 Initializing an array
As shown above, arrays can be entered directly at the command line (or within a program). There
are some special arrays that are used frequently and can be created using built-in functions. An
array of all zeros can be created using the zeros(m,n)command
-->A = zeros(2,3)
A =
   0.   0.   0.
   0.   0.   0.

The command ones(m,n) creates an array of all 1's.


-->B = ones(3,2)
B =
   1.   1.
   1.   1.
   1.   1.

An array with 1's on the diagonal and 0's elsewhere is created using the eye(m,n) command
(eye as in identity matrix).
-->C = eye(3,3)
C =
   1.   0.   0.
   0.   1.   0.
   0.   0.   1.

Here this creates a 3-by-3 identity matrix. However, the matrix need not be square, as in


-->D = eye(2,3)
D =
   1.   0.   0.
   0.   1.   0.

To create a square matrix of all 0's except for specified values along the diagonal we use the
diag([...]) command
-->D = diag([1,2,3])
D =
   1.   0.   0.
   0.   2.   0.
   0.   0.   3.

The diag command also allows you to extract the diagonal elements of a matrix
-->A = [1,2;3,4]
A =
   1.   2.
   3.   4.
-->diag(A)
ans =
   1.
   4.

The function size(A) returns the dimensions of the array A, as in


-->size(A)
ans =
   2.   2.

If you only want the number of rows or columns of a matrix you can specify that
-->A = [1,2,3;4,5,6]
A =
   1.   2.   3.
   4.   5.   6.
-->size(A,'r') //or size(A,1) which is how Matlab does it
ans =
   2.
-->size(A,'c') //or size(A,2) which is how Matlab does it
ans =
   3.

Vectors are one-dimensional arrays. The size() command works on vectors, but the
length() command can be more convenient in that it returns a single number which is the
number of elements in the vector. Consider the following.
-->v = [1;2;3]
v =
   1.
   2.
   3.
-->size(v)
ans =
   3.   1.
-->length(v)
ans =
3.

An array operator that can be useful in entering array values is the transpose operator. In
Scilab/Matlab this is the single quote sign. Consider the following
-->v = [1 2 3]'
v =
1.
2.
3.

Sometimes it is convenient to produce arrays with random values. The rand(m,n) command
does this.
-->B = rand(2,3)
B =
   0.2113249   0.0002211   0.6653811
   0.7560439   0.3303271   0.6283918

The random numbers are uniformly distributed between 0 and 1. You can use commands like
zeros, ones and rand to create an array with the same dimensions as an existing array. For
example, in Scilab
-->A = eye(3,2)
A =
   1.   0.
   0.   1.
   0.   0.
-->ones(A) //Scilab specific
ans =
   1.   1.
   1.   1.
   1.   1.

The ones(A) command uses the dimensions of A to form the array of ones. Matlab is slightly
different. In Matlab the corresponding command would be
>>A = eye(3,2)
A =
   1.   0.
   0.   1.
   0.   0.
>>ones(size(A)) %Matlab specific
ans =
   1.   1.
   1.   1.
   1.   1.

In Matlab you need to use the size() function to pass the dimensions of the array A. In Scilab
you do not.
In many applications you want to create a vector of equally spaced values. For example
-->t = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
t =
   0.   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8

Instead of entering all the values directly you can use the following short cut
-->t = 0:0.1:0.8
t =
   0.   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8

This "colon" notation tells Scilab/Matlab to create a vector named t, starting at 0 and
incrementing by 0.1 up to the value 0.8. This approach allows you to define the increment (0.1 in
this case). The default increment is 1 as shown here
-->x = 1:5
x =
   1.   2.   3.   4.   5.

Sometimes you are more concerned about the total number of elements in the vector rather than a
specific increment. The linspace() command allows you to specify the start and end values
and the total number of elements. Consider the following.
-->x = linspace(0,1,6)
x =
   0.   0.2   0.4   0.6   0.8   1.

This creates a vector with values ranging from 0 to 1 and a total of 6 elements. It is sometimes
convenient to be able to "reshape" an array. In Scilab the matrix() command allows you to do
this.
-->A = [1,2,3;4,5,6]
A =
   1.   2.   3.
   4.   5.   6.
-->matrix(A,3,2) //Scilab specific
ans =
   1.   5.
   4.   3.
   2.   6.

In Matlab the reshape() command does this.


>>A = [1,2,3;4,5,6]
A =
   1.   2.   3.
   4.   5.   6.
>>reshape(A,3,2) %Matlab specific
ans =
   1.   5.
   4.   3.
   2.   6.

Finally, the elements of an array are not limited to numerical values but can consist of defined
constants, variables or functions. For example,
-->x = 0.2
x =
0.2


-->C = [cos(x),sin(x);-sin(x),cos(x)]
C =
   0.9800666    0.1986693
 - 0.1986693    0.9800666

3 Combining arrays
Two or more existing arrays can be combined into a single array by using the existing arrays as
elements in the new array. Consider the following.
-->v = [1,2,3]
v =
   1.   2.   3.
-->A = [v;v]
A =
   1.   2.   3.
   1.   2.   3.
-->B = [v,v]
B =
   1.   2.   3.   1.   2.   3.

The arrays being combined must fit together by having compatible dimensions, otherwise you
receive an error.
-->u = [4,5]
u =
   4.   5.
-->C = [u;v]
!--error 6
Inconsistent row/column dimensions.

However consider
-->[u,0;v]
ans =
   4.   5.   0.
   1.   2.   3.
-->D = [u,v]
D =
   4.   5.   1.   2.   3.

Here are two more examples.


-->A = [1,2;3,4];
-->B = [5,6;7,8];
-->[A,B]
ans =
   1.   2.   5.   6.
   3.   4.   7.   8.
-->[A;B]
ans =
   1.   2.
   3.   4.
   5.   6.
   7.   8.

These kinds of operations are very useful in many applications where large matrices are
constructed by stacking sub-matrices together. The submatrices might represent specific pieces of
a system, and the stacking operation corresponds to assembling the system.
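As a small illustrative sketch, here a 4-by-4 matrix is assembled from a 2-by-2 matrix A and a
2-by-2 block of zeros:
-->A = [1,2;3,4];
-->Z = zeros(2,2);
-->M = [A,Z;Z,A]
M =
   1.   2.   0.   0.
   3.   4.   0.   0.
   0.   0.   1.   2.
   0.   0.   3.   4.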

4 Operating on array elements


If A is a two-dimensional array, then A(m,n) refers to the element in the mth row and nth
column. Consider the following.
-->A = [1,2,3;4,5,6;7,8,9]
A =
   1.   2.   3.
   4.   5.   6.
   7.   8.   9.
-->A(3,2)
ans =
   8.
-->A(3,2) = 10
A =
   1.   2.    3.
   4.   5.    6.
   7.   10.   9.

We can both access and assign the value of A(3,2) directly. Often you want to extract a row or
column of a matrix. The "colon" notation allows you to do this. For example,
-->A(:,1)
ans =
   1.
   4.
   7.
-->v = A(2,:)
v =
   4.   5.   6.

In the first case we extract the 1st column of A. In the second case we extract the 2nd row of A
and assign it to the variable v. In general the colon tells Scilab/Matlab to run through all
values of the corresponding subscript or index. More generally you can extract a subrange of
values. Consider
-->A(1:2,2:3)
ans =
   2.   3.
   5.   6.

This extracts rows 1 through 2 and columns 2 through 3 to create a new 2-by-2 matrix from the
elements of the original 3-by-3 matrix. Elements of a vector can be deleted in the following
manner.

-->t = 0:0.2:1
t =
   0.   0.2   0.4   0.6   0.8   1.
-->t(2) = []
t =
   0.   0.4   0.6   0.8   1.
-->t(3:5) = []
t =
   0.   0.4
In the first case we delete the second element of t. In the second case we delete elements 3
through 5. This technique applies to deleting rows or columns of a matrix.
-->A
A =
   1.   2.    3.
   4.   5.    6.
   7.   10.   9.
-->A(1,:) = []
A =
   4.   5.    6.
   7.   10.   9.
-->A(:,2) = []
A =
   4.   6.
   7.   9.

5 Arithmetic operations on arrays


Arrays can be multiplied or divided by a scalar (single number)
-->1.5*[1,2,3]
ans =
   1.5   3.   4.5
-->[1,2,3]/2
ans =
   0.5   1.   1.5

An operation such as A+1 where A is a matrix makes no sense algebraically, but Scilab/Matlab
interprets this as a shorthand way of saying you want to add 1 to each element of A
-->[1,2;3,4]+1
ans =
   2.   3.
   4.   5.

Addition, subtraction and multiplication of two arrays follow the rules of linear algebra. To add or
subtract arrays they must be of the same size. If they are not you get an error.
-->A = [1,2;3,4]
A =
   1.   2.
   3.   4.


-->B = [1,2,3;4,5,6]
B =
   1.   2.   3.
   4.   5.   6.
-->A+B
!--error 8
Inconsistent addition.

Otherwise each element of the resulting array is the sum or difference of the corresponding
elements in the two arrays.
-->C = [2,1;4,3]
C =
   2.   1.
   4.   3.
-->A+C
ans =
   3.   3.
   7.   7.
-->A-C
ans =
 - 1.   1.
 - 1.   1.

To multiply two arrays as in A*B the number of columns of A must equal the number of rows of
B. If A is m-by-n then B must be n-by-p. The product A*B will be m-by-p.
-->A
A =
   1.   2.
   3.   4.
-->B
B =
   1.   2.   3.
   4.   5.   6.
-->A*B
ans =
   9.    12.   15.
   19.   26.   33.
-->B*A
!--error 10
Inconsistent multiplication.
If C = A*B the elements of C are computed as C_ij = Σ(k=1 to n) A_ik B_kj.

The inner product, or dot product of two vectors can be calculated by transposing one and
performing an array multiplication.
-->u = [1;2;3]
u =
   1.
   2.
   3.
-->v = [4;5;6]
v =
   4.
   5.
   6.
-->u'*v
ans =
   32.

Arrays cannot be divided, but we do have the concept of "inverting" a matrix to solve a linear
equation. If Ax = b and A is a square, non-singular matrix, then x = A^-1 b is the solution to
this system of linear equations. This is sort of like "dividing A into b." In Scilab/Matlab we use
the notation A\b to represent this. You can also think of the \ operator as representing the inverse
operation, so that A\ functions as A^-1. Consider the following.
-->A = [1,2;3,4]
A =
   1.   2.
   3.   4.
-->b = [5;6]
b =
   5.
   6.
-->x = A\b
x =
 - 4.
   4.5
-->A*x
ans =
   5.
   6.

If the matrix A is singular (has no inverse) you'll get an error. We'll talk more about solving
systems of linear equations later.

6 Vectorized functions
A very powerful feature of Scilab/Matlab is that functions can be vectorized. In a language
such as C, if I have an array x and I want to calculate the sin of each element of x, I need to use
a for loop, as in
for (i = 0; i < n; i++) {
    y[i] = sin(x[i]);
}

In Scilab/Matlab we simply write


y = sin(x)

This creates a new array y of the same size as x. Each element of y is the sin of the
corresponding element of x. For example

-->x = [0,0.1,0.2,0.3]
x =
   0.   0.1   0.2   0.3
-->y = sin(x)
y =
   0.   0.0998334   0.1986693   0.2955202

7 Array operators
In linear algebra, arrays (e.g., vectors and matrices) are considered entities in their own right and
there are rules for operating on them, such as matrix multiplication and the inverse. In practice,
sometimes an array is just a convenient collection of numbers. In this case you might want to
perform operations on the elements of the array independent of the rules of linear algebra. For
example, suppose u=[1 2 3] and v=[4 5 6] and you want to multiply each element of u by
the corresponding element of v to obtain w = [1·4 2·5 3·6] = [4 10 18]. This is not a standard
operation in linear algebra. To perform component-by-component operations (or "array
operations") you prefix the operator with a period. For example,
-->u = [1,2,3]
u =
   1.   2.   3.
-->v = [4,5,6]
v =
   4.   5.   6.
-->u.*v
ans =
   4.   10.   18.
-->u.^2
ans =
   1.   4.   9.

This brings up a subtle point. Consider the following.


-->A = [1,2;3,4]
A =
   1.   2.
   3.   4.
-->A^2
ans =
   7.    10.
   15.   22.
-->A.^2
ans =
   1.   4.
   9.   16.

Notice the very different results. The operation A^2 tells Scilab/Matlab to use the rules of matrix
multiplication to calculate A*A. The operation A.^2 tells Scilab/Matlab to square each element
of A.
When an array represents a collection of values, say measurements, we often want to look at
properties such as the sum or average of the values. Consider the following.

-->x = 0:0.1:1
x =
   0.   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.
-->sum(x)
ans =
   5.5
-->mean(x)
ans =
   0.5

The function sum() adds all the elements of an array while mean() calculates the average
value of the elements. If we wanted to find the mean-square value (average of the squared values)
we could use the command
-->mean(x.^2)
ans =
0.35

This first squares each element of x and computes the mean of those squared values. Here's an
important point. Consider the following
-->x = [1,2];
-->1./x
ans =
   0.2
   0.4
-->(1)./x
ans =
   1.   0.5

In the first instance Scilab interpreted the dot as a decimal point and returned

-->(1.)/x
ans =
   0.2
   0.4

which is something called the pseudo-inverse and not what we were after. We need to use the
parentheses to avoid this interpretation

-->(1)./x
ans =
   1.   0.5

8 Complex array elements


All of the preceding applies to complex array elements. For example, in Scilab (for Matlab use
1i instead of %i)
-->A = [1,%i;2,-%i]
A =
   1.    i
   2.  - i
-->A^2
ans =
   1. + 2.i     1. + i
   2. - 2.i   - 1. + 2.i

The output can be a bit difficult to read with complex arrays. A^2 produces a 2-by-2 matrix of
complex values each expressed as real and imaginary parts. It can be helpful to view these
separately. The real and imag commands allow you to do this.
-->B = A^2
B =
   1. + 2.i     1. + i
   2. - 2.i   - 1. + 2.i
-->real(B)
ans =
   1.    1.
   2.  - 1.
-->imag(B)
ans =
   2.   1.
 - 2.   2.

One subtle point is that the single-quote operator is actually the "transpose-conjugate" operator. It
takes the transpose of a matrix followed by its complex conjugate. For a real matrix this is simply
the transpose. But for a complex matrix you need to remember that there is also a conjugation
operation.
-->A
A =
   1.    i
   2.  - i
-->A'
ans =
   1.   2.
 - i    i

If you just want the transpose of a complex matrix use the .' operator.
-->A.'
ans =
   1.   2.
   i  - i

9 Structures
Arrays are convenient ways to organize data of a common type. Often you want to organize
different types of data as a single entity. Many programming languages allow you to do this using
structures. In Scilab/Matlab a structure is defined as follows.
St = struct(field1,value1,field2,value2,...);

Here's an example
-->Object = struct('name','ball joint','mass',2.7)
Object =
name: "ball joint"
mass: 2.7


This creates a structure named Object having two fields: name and mass. Fields are accessed
using the syntax struct.field. Here we change the mass
-->Object.mass = 3.2
Object =
name: "ball joint"
mass: 3.2

Fields can be strings, numbers or even arrays. For example, say we have a rigid body. The
position and orientation of this body in space are specified by six numbers: three coordinates of
its center of mass, and three rotation angles. We might define a structure such as
-->Body = struct('pos',[-2,0,3],'ang',[12,-20,28])
Body =
pos: [-2,0,3]
ang: [12,-20,28]

Elements of the arrays are accessed as follows


-->Body.ang(2) = -22
Body =
pos: [-2,0,3]
ang: [12,-22,28]

The use of structures can greatly streamline complicated programming projects by uniting various
data into a single logical entity. Arrays of structures are possible, as in
-->Person = struct('name','','age',0);
-->Member = [Person,Person,Person];
-->Member(2).name = 'Tom';
-->Member(2).age = 32;
-->Member(2)
ans =
name: "Tom"
age: 32

This defines a structure Person and then an array of three of these structures named Member.
Each element could refer to a member of a three-member team. Fields are accessed as shown.
You can even have structures serve as fields of other structures. Here's an example.
-->Book = struct('title','','author','');
-->Class = struct('name','','text',Book);
-->Class.name = "Engl 101";
-->Class.text.title = "How to Write";
-->Class.text.author = "Kay Smith";
-->Class.text
ans =
title: "How to Write"
author: "Kay Smith"

You can skip the initializing of the structure and just start assigning values to the fields. For
example.
-->clear
-->Class.name = 'Intro Psych';
-->Class.enrollment = 24;


10 Three and higher dimensional arrays


Arrays can have as many dimensions as desired. Of course arrays with more than two dimensions
are cumbersome to display on the screen or page. Consider
-->A = zeros(2,2,2)
A =
(:,:,1)
   0.   0.
   0.   0.
(:,:,2)
   0.   0.
   0.   0.

This creates a three-dimensional array. Elements are accessed in the usual manner
-->A(2,1,2) = 3
A =
(:,:,1)
   0.   0.
   0.   0.
(:,:,2)
   0.   0.
   3.   0.


Lecture 3
Programming structures
1 Introduction
In this lecture we introduce the basic programming tools that you will use to build programs. In
particular we will meet the if statement and the for and while loops.

2 Program files
2.1 Directories
The pwd command (print working directory) tells you what directory you are currently working
in.
-->pwd
ans =
C:\Users\Hudson

Use the cd (change directory) command followed by a valid directory name to move to another
directory.
-->cd Desktop
ans =
C:\Users\Hudson\Desktop

In Scilab the cd command with no argument moves you to your home directory.
-->cd
ans =
C:\Users\Hudson

Entering '..' for the directory name moves you up one directory level.
-->cd '..'
ans =
C:\Users
-->pwd
ans =
C:\Users

The mkdir and rmdir commands make/remove subdirectories.


-->mkdir newone;
-->cd newone
ans =
C:\Users\Hudson\newone
-->cd '..';
-->rmdir newone

The dir and ls commands list the contents of the current directory. If you want to list specific
files you can enter a name. You can use the '*' character as a wild card. For example

-->ls *menu
ans =
Start Menu

lists all file names that end in 'menu' (note this is not case sensitive).

2.2 Editing and running program files


From the command line, type edit to open up an editor window. Alternately there is an edit
icon in the upper-left corner of the command-line environment.
The Scilab editor is called SciNotes and is shown below. The circled arrow icons execute
whatever program you've entered into SciNotes. The first just executes the program as saved on
disk. The second first saves your code and then executes. The third saves all open programs (you
might have more than one open in SciNotes) before execution. To save without execution either
use the File menu or the disk icon.


Scilab program files have the extension *.sce. In Windows Scilab file icons look something like
this

In Windows you can right-click on the desktop, or in a directory, and you should see a menu
similar to the following. This will create an empty *.sce file named something like
New scilab-5.5.0 (64-bit) Application (.sce)

In either case double click on a Scilab file icon to open it in SciNotes. From there you can edit
and execute it.


2.3 Comments
A program is just a file containing one or more Scilab commands. In principle you could have
entered these one at a time on the command line, but when structured as a program they are much
more convenient to develop, save, modify and rerun. This brings up a key point about
programming. Generally the goal when writing a program is to create a resource that can be
reused and modified in the future, either by yourself or by others. When you are writing a
program you (hopefully) are clear about the purpose of the various commands and how they fit
together to implement an algorithm that solves the problem of interest. However, if you come
back in a month and look at the list of commands in your program it is quite common to have
forgotten all this. And, obviously, if someone else opens your program they probably will have little
to no idea of what it's supposed to do. For this reason, programming languages allow programs to
include comments designed to explain in words what the program commands are doing. A well-written
program will contain comments that clearly explain what the program as a whole and its
various components do. This allows you or another programmer to easily understand how to use
and/or modify the program.
Scilab follows the C++ syntax for comments. Any text from the double forward-slash symbol (//)
until the end of a line is treated as a comment. Matlab uses the percent sign (%). For example the
rather wordy code
rather wordy code
//tryit.sce, Scott Hudson, 2011-06-03
//This program is an example, blah blah blah
x = 2; //this is an end-of-line comment
disp(sqrt(x)); //here's another

does exactly the same thing as the bare program


x = 2;
disp(sqrt(x));

The comment text is there only for the benefit of the human trying to understand the program.

2.4 Coding standards


In the spirit of using comments for understanding, adherence to a coding standard is an effective
way to ensure that all programs have the same look and feel and correctly interact with one
another. This is the software equivalent of the International Mechanical Code, or the National
Electrical Code. Work that does not meet code is unacceptable and must be corrected. In this
course you are expected to follow a simple coding standard that will be provided with your
assignments. Failure to follow the standard will result in a loss of credit on assignments and
exams.
With this as background, we are now ready to learn about basic components with which we can
build a computer program. The details will be specific to Scilab/Matlab, but the concepts are
universal across almost all programming languages.

3 Flow control
Flow control refers to program statements that determine which parts of the program get
executed and under what conditions based on logical (true/false) conditions. The main flow
control statements we will use are: if, select, for, and while.

3.1 Logical Expressions


The most common logical expressions consist of relational statements combined with the basic
logical operators: and, or, not.
<     less than
<=    less than or equal to
>     greater than
>=    greater than or equal to
==    equal to
~=    not equal to
&     logical and
|     logical or
~     logical not

The statement 1<2 asks Scilab/Matlab to answer the question, "Is 1 less than 2?" The answer is
yes, so Scilab returns T for true.
-->1<2
ans =
T

We use the equal sign = to assign values to variables. To test the equality of two expressions we
use the double-equal symbol ==. Here we have Scilab tell us that 1 is not equal to 2 (F stands for
false).
-->1==2
ans =
F

On the other hand the statement 1 is not equal to 2 is true.


-->1~=2
ans =
T


We can combine multiple T/F statements using the and/or/not logical operators.
-->(1<2)&(3>=4)
ans =
F

This statement is false because the and operator & requires both expressions to be true. (1<2)
is true but (3>=4) is false. Therefore the entire statement is false. On the other hand
-->(1<2)|(3>=4)
ans =
T

since the or operator | only requires one of the statements to be true.


It is good programming practice to use parentheses to clarify the order in which the elements of
an expression are evaluated. Blank space can help readability also. For example
( ((x>1)&(y<0)) | (z>=0) )

An important point is that an interval test, such as a <= x <= b, has to be performed as the and of
two inequalities, such as
((a<=x)&(x<=b))
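A quick illustrative check at the console (x = 0.5 lies inside the interval [0,1], so both
inequalities, and hence the and, are true):
-->x = 0.5;
-->(0<=x)&(x<=1)
ans =
T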

3.2 if statement
The if statement is one of the most commonly encountered in programming. It allows you to do
something in response to a logical condition. In Scilab/Matlab its syntax is
if expression
    statements
end

Note that Scilab allows an optional then after expression as in


if expression then
    statements
end

Since Matlab does not use the then syntax we will avoid it in this class. This will make our
Scilab code more compatible with Matlab.
The way an if statement works is that Scilab/Matlab evaluates expression. If true, all
statements before the end keyword are executed. If false the statements are skipped and
execution continues at the statement immediately following the end keyword.
if (1<2)
    disp('that is true');
end

This code will write that is true to the command environment since 1 is less than 2. On the
other hand
if (1>2)
    disp('that is true');
end

will not write anything since the expression (1>2) is false. The statements can be of any number
and kind.
EE 221 Numerical Computing

Scott Hudson

2016-01-08

Lecture 3: Programming structures

7/14

if (x>1)
    y = x;
    x = x^2;
end

An optional "default" else statement can be included


if expression
    statements1
else
    statements2
end

If expression is true then statements1 are executed, and if it is false then


statements2 are executed. Additionally, any number of elseif statements can be included
if expression1
    statements1
elseif expression2
    statements2
elseif expression3
    statements3
...
else
    statements
end

Here expression1 is evaluated. If it's true then statements1 are executed and the
program continues after the end statement. If expression1 is false then expression2 is
evaluated. If it is true then statements2 are executed and the program continues after the
end statement. This can be repeated for as many expressions as you want. If the optional else
is present then if none of the statements are true, the else statements will execute. It's
important to note that no more than one set of statements will be executed, even if more than one
of the expressions are true. Only the statements corresponding to the first true expression are
executed, or failing that, the else statements if present.
As an example
if (x<1)
    y = 0;
elseif (x>1)
    y = 1;
else
    y = x;
end
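A small variation illustrates the "first true branch wins" behavior (an illustrative sketch;
with x = 2 both conditions below are true, but only the first branch executes):
x = 2;
if (x>1)
    disp('first branch');
elseif (x>0)
    disp('second branch');
end

This displays only first branch.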

Here is an important error to watch out for. You should generally not use arrays in logical
expressions. For example, consider the following
x = [0,1];
disp(x<0.5);

This produces the output


T F

Scilab goes through the array x and evaluates the statement (x<0.5) for each element, producing
an array of boolean T/F values. Now consider

x = [0,1];
if (x<0.5)
    disp('x<0.5');
end

This produces no output, even though 0<0.5. Scilab can't both execute the disp statement and not
execute it. It will only execute the statement (and only once at that) if (x<0.5) is true for all
elements of x. If your intention is to run the if statement for each element of x independently, then
you need to add a for loop (see below), as in
x = [0,1];
for i=1:2
    if (x(i)<0.5)
        disp('x<0.5');
    end
end

3.3 select statement


Sometimes you want to perform certain actions based on the value of a variable. You can
implement this using the if elseif end structure, but the select statement in
Scilab (switch statement in Matlab) is more convenient. The syntax is
select expression
case v1
    statements
case v2
    statements
else
    statements
end

If expression has the value v1 then the statements listed under case v1 are executed. If it has
value v2 then the v2 statements are executed. If it has none of the specified values then the
optional else statements are executed. For example,
select x
case 1
    msg = "x is 1";
case 2
    msg = "x is 2";
else
    msg = "x is something else";
end

3.4 for loop


The for loop is used to execute a set of statements multiple times as determined by an index
variable or a list of values. It has the syntax
for x=expression
    statements
end

As an example
x = zeros(1,5);
for i=1:5
    x(i) = i;
end
-->x
x =
   1.   2.   3.   4.   5.

The for loop evaluates 1:5 as the list of numbers [1,2,3,4,5]. It initializes the index
variable i to the first value (i=1) then runs the statement x(i) = i causing x(1) to be
assigned the value 1. When the end statement is reached the for loop assigns to i the second
value in the list (2) and once again executes the statements causing x(2) to be assigned the
value 2. It continues until it exhausts the list.
The expression can be a sequence defined with colon notation, such as 1:5, or 2:0.1:3, or it
can be an array. An example of the latter case would be
y = [1,5,-3];
for x=y
    x^3
end

The output is
ans =
   1.
ans =
   125.
ans =
 - 27.

3.5 while loop


The while loop allows you to continue executing a set of statements as long as some logical
statement is true. The syntax is
while expression
    statements
end

The loop evaluates expression. If it's false then execution skips to the statement following
end. If it's true then statements are executed and expression is evaluated again. If it's
still true then statements are once again executed. This continues until expression is
false. At that point execution continue with the first statement after the end keyword. As an
example, consider
x = 1;
while (x<10)
    x = 2*x;
end

This produces
-->x
x =
16.

as follows


1. (x=1)<10 so x=2*x=2
2. (x=2)<10 so x=2*x=4
3. (x=4)<10 so x=2*x=8
4. (x=8)<10 so x=2*x=16
5. (x=16) is not <10 so the loop terminates with x=16

4 Functions
Functions allow us to break large programs into conceptual blocks that can be reused and
reorganized. One form of "top down" programming is to take a large problem and break it into
manageable parts. If those parts are themselves large they can be broken up into subparts and so
on. A programming solution to some problem might then have the structure

Big problem
    Part A
        Subpart A1
        Subpart A2
    Part B
        Subpart B1
        Subpart B2
    ...

In this approach, each of the subparts might be a separate function that is called by its "parent"
part. Those parent parts might also be functions which are called by the main program. These
functions are logically separate and often exist in separate files. We can even build up libraries
of useful functions which can be used repeatedly in different programs.
There are a few important differences between the ways Scilab and Matlab treat functions.

4.1 Simple function definition (Scilab)


In Scilab you can define a relatively simple function using the deff (define function) command.
For example
deff('y=f(x)','z=x^2,y=z+1');

This defines f(x) to return the value x^2+1 by first calculating z=x^2 followed by y=z+1.
Multiple statements can be separated by commas as shown.
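Once defined this way, f can be called like any built-in function; a quick check (3^2 + 1 = 10):
-->deff('y=f(x)','z=x^2,y=z+1');
-->f(3)
ans =
10.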

4.2 General function definition (Scilab)


For more complicated functions we use the function ... endfunction construct. As an
example
function y = myabs(x)
    if (x>=0)
        y = x;
    else
        y = -x;
    end
endfunction

implements the absolute value operation. This function is now defined and can be called from the
command line
-->myabs(-3)
ans =
3.

or from within a program.


Arguments and returned variables can be arrays. It is also possible to return more than one
variable. In this case an example of the syntax is
function [m,s]=moms(x)
    m = mean(x);
    s = mean(x.^2);
endfunction

This takes a vector as input and outputs two scalars


-->x = 1:5
x =
   1.   2.   3.   4.   5.
-->[a,b] = moms(x)
b =
   11.
a =
   3.

In Scilab multiple functions can reside in a single program file. The filename does not have to be
related to any of the function names.

4.3 General function definition (Matlab)


The Matlab function syntax is the same as in Scilab with the exception that end is used in place
of endfunction. In fact end is optional. Additionally (from Matlab documentation)
The commands and functions that comprise the new function must be
put in a file whose name defines the name of the new function,
with a filename extension of '.m'.

In other words if you write a function y=myfunc(x) it must be saved in a file named myfunc.m
(an m-file). You can then call myfunc(x) from the command line or in other functions. The
Scilab approach is closer to languages such as C and Fortran.

4.4 Recursive functions


A function which calls itself is said to be recursive. It may seem bizarre to have a function call
itself, but it can be very useful. This simple example implements the factorial function.
function m = myfact(n)
if (n==1)
m = 1;
else
m = n*myfact(n-1);
end
endfunction

If n=1 then n!=1 is the returned value. For n>1 we use the fact that n! = n(n−1)!. Suppose
we call myfact(3). The function wants to return m = 3*myfact(2). But myfact(2)
needs to be evaluated first. Scilab opens a new instance of the function and passes the argument
2. This second function call wants to return m = 2*myfact(1). But myfact(1) needs to
be evaluated. So Scilab opens a third instance of the function and passes the argument 1. In this
case the condition (n==1) is true and the value m = 1 is returned to the second function call.
The second function call now has the value of m = 2*1 and returns this to the first function
call. Finally the first function call can now evaluate m = 3*2 and return this to the user.
Obviously a recursive function will work only if eventually it stops calling itself and returns a
specific value.
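
For another example of the same pattern, here is a sketch (not in the original notes) computing Fibonacci numbers, where each call spawns two recursive calls:

function m = myfib(n)
if (n<=2)
m = 1; //stopping case: fib(1) = fib(2) = 1
else
m = myfib(n-1) + myfib(n-2); //two recursive calls
end
endfunction

Calling myfib(10) returns 55. As with myfact, the function terminates because every chain of calls eventually reaches the stopping case.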

4.5 Nested functions


It is possible to have a function defined within another function. For example, in Scilab (for
Matlab simply replace endfunction with end)
function y = f(x)
function v = g(u)
v = cos(u);
endfunction
y = g(x)*sin(x);
endfunction
-->f(2)
ans =
- 0.3784012
-->g(2)
!--error 4
Undefined variable: g

Function g(x) is defined inside function f(x). When we evaluate f(x) at a value of 2, that
function sets y = g(2)*sin(2) where g(2)=cos(2), resulting in a returned value of
cos(2)sin(2) = −0.3784012
When we try to call g(u) outside of f(x), however, we get an error. Since g(u) is defined
inside of f(x) it is not visible outside of f(x). We say that the scope of the function g(u) is
limited to inside of f(x). Now consider the following
function v = g(u)
v = sin(u);
endfunction
function y = f(x)
function v = g(u)
v = cos(u);
endfunction
y = g(x)*sin(x);
endfunction
-->f(2)
ans =
- 0.3784012
-->g(2)
ans =
0.9092974

We have two functions named g(u). When g(u) is called inside f(x) it clearly returns the
value g(2)=cos(2) since we get the same output for f(2) as before. But now calling g(2)
outside of f(x) does not produce an error. Instead it returns
g(2) = sin(2) = 0.9092974
which is the definition of g(u) given outside of f(x). Generally speaking, variables and
functions are only visible inside the function in which they are defined, and inside all nested
functions. Thus variables in the main program are globally visible. If a variable or function is
defined a second time inside a function, that definition overrides the previous definition within
the scope of that function (and nested functions).

4.6 Variable scope and global variables


Now consider the behavior of the following four similar programs. In case 1, a=2 is defined in
the main program. Function f(x), being nested within the main program, can see the value of
a, so it calculates f(2)=4. In case 2, f(x) changes the value of a to a=4 and calculates
f(2)=8, but notice that outside of the function definition we still have a=2. The scope of the
change a=4 is only within the function declaration, not outside it. In case 3 we initially have
a=2, but before calling f(x) we change the value to a=4. This change is visible inside the
function, so f(2)=8. In case 4 we declare the variable a to be global and assign it the value
a=2. Inside the function we also declare a to be global and assign it the value a=4. As in case
2, this results in f(2)=8 but it also changes the value of a outside the function (globally).
a = 2; //case 1
function y = f(x)
y = a*x;
endfunction
disp('f(2)='+string(f(2))+', a='+string(a));
f(2)=4, a=2
a = 2; //case 2
function y = f(x)
a = 4;
y = a*x;
endfunction
disp('f(2)='+string(f(2))+', a='+string(a));
f(2)=8, a=2
a = 2; //case 3
function y = f(x)
y = a*x;
endfunction
a = 4;
disp('f(2)='+string(f(2))+', a='+string(a));
f(2)=8, a=4


global a //case 4
a = 2;
function y = f(x)
global a
a = 4;
y = a*x;
endfunction
disp('f(2)='+string(f(2))+', a='+string(a));
f(2)=8, a=4

4.7 Variable number of arguments


It can be very useful for functions to have a variable number of input and/or output arguments.
The following function has two input and two output arguments.
function [y,ySize] = f(x,z)
[nargout,nargin] = argn(); //Scilab needs this, Matlab doesn't
if (nargin==1)
y = x^2;
else
y = z*x^2;
end
if (nargout==2)
if (abs(y)>=10)
ySize = 'big';
else
ySize = 'small';
end
end
endfunction

However we can call it with a single input and assign its output to a single variable
u = f(3);
disp('u='+string(u));
u=9

Or supply both input variables and assign its output to two variables
[u,v] = f(3,2);
disp('u='+string(u)+' and v='+v);
u=18 and v=big

The first line of the function assigns values to nargin, the number of input arguments, and
nargout, the number of output arguments. (This assignment is needed in Scilab; in Matlab
these variables are automatically assigned.) If nargin==1 we know that only an x value was
passed to the function. Otherwise we know that both x and z values are available. Likewise, if
nargout==1 we know that only y needs to be calculated. Otherwise we also assign a string
value to ySize.


Lecture 4
Input and output
1 Introduction
The deliverable of a computer program is its output. Output may be in graphical form as in a
two-dimensional function plot, or it may be in text form as in a table of data values. In addition
we often need to provide a program with input data, either interactively from the console or from
a disk file. We will cover graphics in future lectures. Here we look at various ways to input and
output data to and from the console and disk files.

2 Basic input/output (I/O)


Scilab/Matlab automatically displays the value of a variable when you type its name at the
command line. For example
-->x = 2.3;
-->x
x =
2.3

In a program, on the other hand, the appearance of a variable name does not produce printed
output. Moreover, even with console I/O we may want more control over the output format. We
also need a way for a program to display values and ask for input, either from the user or from a
file. There are various ways to approach this. We start with the simplest.

2.1 The disp function


Most basic console I/O can be implemented with two functions: disp and input. The disp
function can be used to display the values of variables without the extra "ans = " prefix. For
example
-->[x,y]
 ans =
   2.3   4.1

-->disp([x,y])
   2.3   4.1

This works from within a program also. You can also combine it with the string() function
(num2str() in Matlab) and the string concatenate operation. This creates one long string as in
the following
-->disp(string(x)+' plus '+string(y)+' equals '+string(x+y));
2.3 plus 4.1 equals 6.4

This is generally good enough for most basic program output.


2.2 The input function


The input() function can be used to prompt the user for input. We will illustrate this with a
few examples
-->x = input('please enter the value of x :');
please enter the value of x :3.2
-->disp(x)
3.2

You can input arrays


-->y = input('enter a 1-by-3 vector : ')
enter a 1-by-3 vector : [1,2,3]
 y =
   1.   2.   3.

and strings
-->fname = input('output file : ')
output file : 'test.txt'
fname =
test.txt

Here's a little snippet of code that prompts the user for an array of data and prints the average
value.
z = input('enter a 1-by-n array of numbers : ');
disp('the average value is '+string(mean(z)));

this produces
enter a 1-by-n array of numbers : [1,2,3,4,5,6]
the average value is 3.5

2.3 The save and load functions


The save and load functions allow you to dump and recover any or all variables you have
created.
-->A = [1,2;3,4];
-->x = 1:5;
-->s = 'hello there';
-->save('test.dat');

The save command saves all variable names and values, essentially your entire Scilab session.
You can exit Scilab and in a later session use the load command to recover these saved values.
-->load('test.dat')
-->A
 A =
   1.   2.
   3.   4.

-->x
 x =
   1.   2.   3.   4.   5.

-->s
s =
hello there

This works the same in Matlab, except the file name should not have an extension (for example
'test') as Matlab appends the .mat extension automatically. You can explicitly specify the
variables you want to save/load as in
-->save('Ax.dat','A','x'); //Scilab
>> save('Ax','A','x'); %Matlab

This saves only the variables A and x. To load one or more specific variables you do the
following
-->load('Ax.dat','A'); //Scilab
>> load('Ax','A'); %Matlab

3 Formatted I/O
Scilab/Matlab implement versions of the C fprintf and sprintf functions for formatted output.

3.1 mfprintf (Scilab) & fprintf (Matlab)


The C fprintf function (file print formatted) allows you great flexibility for generating
formatted output to the console or to a file. Scilab implements this as mfprintf while Matlab
uses fprintf. We will illustrate the mfprintf function in Scilab. For Matlab, simply leave
off the initial m.
The calling sequence for mfprintf is
mfprintf(fd,format,var_1,var_2,...,var_n);

fd is a file descriptor and format is a string that specifies how you want the output formatted.
var_1 through var_n are the variables you want displayed. In Scilab the number 6 is the file
descriptor for the console. This is also stored in the protected variable %io(2).
An example is
-->fd = %io(2);
-->x = 1.23;
-->mfprintf(fd,'the value of x is %f\n',x);
the value of x is 1.230000

In Matlab the screen corresponds to fd = 1.


In the format string %f stands for a floating-point number, %e stands for an exponential-format
number (scientific notation), %d for a decimal number (integer) and %s for a string. In the output
these are replaced by the variable values. Here's another example
x = 1.23;
y = 4;
z = %pi*1e10;
str = 'testing one two three';
mfprintf(fd,'%f %d %e %s\n',x,y,z,str);
1.230000 4 3.141593e+010 testing one two three

The \n symbol denotes a "new line." If you omit this then subsequent mfprintf commands
will be appended to the same line. Consider the following.

mfprintf(fd,'the value of x is %f',x);


mfprintf(fd,' and y is %f\n',y);

produces
the value of x is 1.230000 and y is 4.000000

Finally, if you want to control the precise format of the numerical output, the syntax for a floating
point number is %m.nf where m is the total number of spaces (you need one for the decimal
point and you might need one for the sign) and n is the number of decimal places. For example
-->mfprintf(fd,'%4.2f\n',x);
1.23
-->mfprintf(fd,'%6.2f\n',x);
  1.23
-->mfprintf(fd,'%6.3f\n',x);
 1.230

For %d and %s formats you can use the syntax %md or %ms where m is the total number of
spaces to be displayed. Note that if you don't allocate enough, the full value will be printed
anyway. If you allocate "too much" then blank space will be added. In the following example we
generate a formatted table of trig values.
N = 4;
x = linspace(0,%pi/2,N);
y = sin(x);
z = cos(x);
mfprintf(fd,'\n'); //creates blank line at start
mfprintf(fd, '%6s %6s %6s\n','x','sin','cos');
for i=1:N
mfprintf(fd,'%6.3f %6.3f %6.3f\n',x(i),y(i),z(i));
end

Note the mfprintf(fd,'\n'); statement used to clear any previously "open" lines of
output. The output is
     x    sin    cos
 0.000  0.000  1.000
 0.524  0.500  0.866
 1.047  0.866  0.500
 1.571  1.000  0.000

A couple more points. Consider the following.


-->k = 5;
-->mfprintf(fd,'%3d\n',k);
  5
-->mfprintf(fd,'%-3d\n',k);
5
-->mfprintf(fd,'%03d\n',k);
005

The format %-3d causes the output to be left aligned as opposed to the default right alignment.
The format %03d causes the output to be right aligned but all remaining space to the left is filled
with zeros.


In keeping with the vectorized nature of Scilab/Matlab, the mfprintf (and fprintf)
function is also vectorized. For example
-->x = 1:3
 x =
   1.   2.   3.
-->mfprintf(fd,'%d %d %d\n',x)
1 2 3

Scilab/Matlab recognizes that x is an array. It fills in the 3 %d formats with x(1), x(2) and
x(3). Now consider this
A = [1,2;3,4];
-->mfprintf(fd,'%f %f\n',A)
1.000000 2.000000
3.000000 4.000000

mfprintf repeats itself for each row of matrix A. In addition to mfprintf there is a Scilab
function mprintf that does not require the file descriptor argument and prints directly to the
console.
-->x = 2;
-->mprintf('%f %f %f\n',x,x^2,x^3)
2.000000 4.000000 8.000000

The advantage of using mfprintf with fd = %io(2) for console output is that it is very
simple to modify your code to output to a file. You merely need to assign the fd variable using
the mopen command described below.

3.2 msprintf (Scilab) & sprintf (Matlab)


The msprintf function (sprintf in Matlab) is similar to the mfprintf function except that
instead of writing output to the console or a file, it writes it to a string that can be assigned to a
variable. Consider the following
-->k = 3;
-->name = msprintf('file%03d.txt',k)
name =
file003.txt

This has created a string with the value of k embedded. An example where this is very useful is
in creating frames for an animation where k goes from 1 to N and each file output is a single
frame.
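
For instance, a loop along the following lines (the file name pattern and N are placeholders) generates a numbered sequence of names:

N = 5;
for k = 1:N
fname = msprintf('file%03d.txt',k); //file001.txt, file002.txt, ...
disp(fname);
end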

3.3 Opening and closing files


If we want output to go to a disk file instead of the console, we need to create and/or open a file,
write to it, and close the file. Opening a file is done with the mopen command (fopen in
Matlab).
[fd,err] = mopen('test.txt','wt');

This opens the file test.txt for output in the current directory. If it doesn't exist it is created.
If it does exist it is overwritten. The 'wt' notation indicates that we are opening this file for
writing in text format. You can also write in binary format, but we won't cover that. 'at'
designates a text file opened for appending. To open a text file for reading we use 'rt'.

[fd,err] = mopen('test.txt','rt');

The resulting file descriptor fd can be used to refer to the file and err is an error flag. It is zero
or empty if the open process worked properly and non-zero otherwise. You will get an error if
you try to open a file for reading that doesn't exist, or a file for writing in a directory where you
don't have write permission. Good programming practice is to always include error checking. For
example
[fd,err] = mopen('test.txt','rt');
if (err)
error('cannot open test.txt');
end

If there is an error opening 'test.txt' you will get a message and execution will stop via
the error() function. It is very important to always close a file when you have finished reading
or writing. This is done with the mclose(fd) command (fclose(fd) in Matlab).
Here's an example
A = [1,2;3,4];
[fd,err] = mopen('A.txt','wt');
if (err)
error('cannot open file');
end
mfprintf(fd,'%f %f\n',A);
mclose(fd);

This creates the file A.txt which contains the following.


1.000000 2.000000
3.000000 4.000000

A common source of errors when trying to open a file for reading is that the file does not exist. A
way to test for this is using the isfile() command. In the following example a file named
'test.txt' exists in the current directory but a file named 'test2.txt' does not.
-->isfile('test.txt')
ans =
T
-->isfile('test2.txt')
ans =
F

We might use this as follows


name = 'test.txt';
if (isfile(name))
[fd,err] = mopen(name,'rt');
if (err)
mfprintf(%io(2),'cannot open file %s\n',name);
else
mfprintf(%io(2),'file %s is now open\n',name);
end
else
mfprintf(%io(2),'file %s does not exist\n',name);
end


This produces the output


file test.txt is now open

whereas changing the first line to


name = 'test2.txt';

gives us the message


file test2.txt does not exist

3.4 mfscanf (Scilab) and fscanf (Matlab)


The mfscanf function is a modification of the C fscanf function and allows you to perform
formatted input from a file. Suppose we use a text editor to create a text file named data.txt that
contains
1 2 3
4 5 6

We can read these numbers into a vector x as follows (note that for compactness we are not
including error checking for the mopen command).
[fd,err] = mopen('data.txt','rt');
[n,x] = mfscanf(6,fd,'%d');
mclose(fd);
disp(x');
   1.   2.   3.   4.   5.   6.

Variable n indicates the number of successful reads. It is -1 if the end of file was reached before
all desired data were read. The 6 tells mfscanf to read six times in the %d format. These are
assigned as the elements of x. Here's a variation
[fd,err] = mopen('data.txt','rt');
[n,x] = mfscanf(3,fd,'%d');
[n,y] = mfscanf(3,fd,'%d');
mclose(fd);
-->x'
 ans =
   1.   2.   3.

-->y'
 ans =
   4.   5.   6.

This reads 3 numbers and assigns them to x, then another 3 numbers are read and assigned to y.
In Matlab the ordering is slightly different
[fd,err] = fopen('data.txt','rt');
[n,x] = fscanf(fd,'%d',3);
[n,y] = fscanf(fd,'%d',3);
fclose(fd);

Here's another example in Scilab. This time the file data.txt looks like this

east 2 23.75
west 4 -94.5
south 1 8.2
north 3 -7.9


The following commands


[fd,err] = mopen('data.txt','rt');
[n,s,d,x] = mfscanf(4,fd,'%s %d %f');
mclose(fd);

fill the arrays s, d and x with the corresponding string, decimal and floating point entries
-->s'
 ans =
!east  west  south  north !

-->d'
 ans =
   2.   4.   1.   3.

-->x'
 ans =
   23.75  - 94.5   8.1999998  - 7.9000001

4 meof (Scilab) and feof (Matlab)


Suppose we have a file named 'test.txt' which has the following contents
1 2
3 4

What do we do if we know the file contains some numbers but we don't know how many? One
way to read all the available numbers in the file is to read them one at a time followed by a check
for an end-of-file condition using the meof function (feof in Matlab). This returns a non-zero
value if the last input operation reached the end of the file. Here's an example of how we can use
this in a program.
[fd,err] = mopen('test.txt','rt');
i = 1; //use i for an array index
while (~meof(fd)) //while we haven't reached the end of the file
x(i) = mfscanf(fd,'%f'); //read the next number
i = i+1; //increment the array index
end
mclose(fd);
disp(x');

This produces the output


   1.   2.   3.   4.

We open the file and then as long as we have not reached the end-of-file condition we read a
single number into an element of an array x, increment the array index i and try again. This
reads in the values 1, 2, 3 and 4 then stops when the end of the file is reached. One thing to
notice is that Scilab ignores the white space in the file (spaces, tabs and line-feed characters)
and only looks for printable characters.
There are many other input/output functions. In the Scilab Help Browser see the sections titled
Files : Input/Output functions
Input/Output functions


5 Spreadsheet support
Scilab/Matlab can read and write data in spreadsheet format. We will only consider spreadsheets
with numeric data and using comma-delimited text format (csv files). Suppose the spreadsheet
ss.csv contains a 2-by-3 array of numbers. The actual file in a text editor looks like this


1,2,3
4,5,6

In Scilab we read this file using


-->M = csvRead('ss.csv')
 M =
   1.   2.   3.
   4.   5.   6.

To write a csv file we use the command


csvWrite(M,'new.csv');

It is possible to specify a separator other than a comma and to read and write strings. See the
Spreadsheet section of the help menu for more information.
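
For example, a minimal sketch (the file names and the semicolon separator are assumptions, not from the original notes):

M = csvRead('ss2.csv',';'); //read using ';' as the separator
csvWrite(M,'new2.csv',';'); //write using ';' as the separator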


Lecture 5
2D plots
1 Introduction
The built-in graphics capabilities of Scilab/Matlab are one of its strongest features. Compiled
languages such as C and Fortran are generally text-based. Generating graphics requires either
separate programs or libraries which are often operating-system specific. Scilab/Matlab, on the
other hand, provides a complete environment in which graphical routines are an integral
component and consistent across different operating systems.
There are many graphics routines. As always the Help command is a good way to see what is
available. We are going to focus on a few of the most useful. This is one area where there is a fair
amount of difference between Scilab and Matlab. We will try to emphasize those aspects in
common, but our primary focus will be on Scilab.

2 The plot command (Scilab/Matlab)


Basic 2D plots can be generated using the plot command. The first argument is a vector of x
data and the second argument is a vector of y data. These vectors must have the same size. For
example
x = linspace(0,6,100);
y = sin(x);
plot(x,y);

To put multiple graphs on the same figure you can either execute multiple plot commands, or you
can enter multiple x,y vector pairs in a single plot command as in
x = linspace(0,6,100);
y1 = sin(x);
y2 = cos(x);
plot(x,y1,x,y2);

Scilab/Matlab will automatically assign line attributes and/or colors as well as numerical axis
labels. If you want to choose these yourself you can add a text argument after each x,y pair. For
example
x = linspace(0,6,100);
y1 = sin(x);
y2 = cos(x);
y3 = 0.5-sin(x).^2;
y4 = 0.5-cos(x).^2;
plot(x,y1,'r-',x,y2,'g-',x,y3,'b-.',x,y4,'k:');

The letters denote colors and the symbols denote line types. Note that the single quote marks are
required. Here are the basic color options

r    red
g    green
b    blue (default for first plot)
k    black
c    cyan
m    magenta
y    yellow
w    white

and the basic line types


-     solid line (default)
--    dashed line
:     dotted line
-.    dash-dot line

Line thickness can be specified using the following syntax.


plot(x,y1,'r--','linewidth',3);
plot(x,y2,'b:','linewidth',2);

This produces a red, dashed plot of thickness 3 and a blue dotted plot of thickness 2.
In place of a line, you can plot your data as discrete points using various symbols. For example
plot(x,y,'r*')

plots red asterisks at each (x,y) pair. The available symbols are
+    plus sign
o    circle
*    asterisk
.    point
x    cross
s    square
d    diamond
^    up triangle
v    down triangle
>    right triangle
<    left triangle
p    pentagram (star)

The symbol size can be specified as follows


plot(x,y,'r>','markersize',10);

A grid is often useful. To add a grid in Scilab use the xgrid command. The corresponding
command in Matlab is grid.
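
Pulling these pieces together, here is a short sketch (the legend command, not described above, exists in both Scilab and Matlab):

x = linspace(0,2*%pi,50);
plot(x,sin(x),'r-',x,cos(x),'bo'); //solid red line and blue circles
xgrid; //grid in Matlab
legend('sin(x)','cos(x)');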

3 Modifying a graphic window interactively (Scilab)


The default appearance of a plot is sufficient for many tasks. However there are situations where
we might want to, say, add axis labels and a title and enhance the appearance of a plot in other
ways. This is particularly the case when we are interested in saving a plot as an image file to be
used in documents or presentations. As an example,
x = linspace(0,2*%pi,100);
y = sin(x);
plot(x,y);


Produces the graphic window shown in Fig. 1. We can move and resize this window with the

Fig. 1

mouse. We can also use the Edit menu to select Axes properties. This opens the Axes Editor
(Fig. 2). In this window there are tabs labeled X, Title, Style, Aspect and Viewpoint.
Figures 2, 3 and 4 show the windows corresponding to the Style, X and Aspect tabs, respectively.
In the Style window (Fig. 2) we can select the font type, color and size used for the numerical
labels on the axes, in addition to other properties. These can best be learned by generating a plot
and playing around with the various settings.


Fig. 2

In the X window (Fig. 3) we can specify a text label for the x axis. This is entered into the Text
box. Note that it must be enclosed in double quotes. We can also choose the font type, color and
size of the label. The Location pull-down menu allows us to reposition the x axis. Selecting the
Grid color slider causes an x grid of the chosen color to appear. The Data bounds boxes initially
contain the minimum and maximum x values of the plotted data. Suppose we wanted our plot to
scale such that it displayed the x axis region −1 ≤ x ≤ 10 (for instance we might want to compare
it to another plot). We manually enter -1 and 10 in the Data bounds boxes to achieve this. We can
also select Linear (default) or Logarithmic axis scaling. (Log scaling can only be used if x > 0
for all data.)
The Aspect tab window is shown in Fig. 4. Checking the Isoview box causes the plot to be scaled
so that one unit has the same length along the y axis as it does along the x axis. This is very
useful if our (x,y) data represent distance measurements, say in cm. Then the resulting plot is
geometrically accurate. If the Isoview box is unchecked (default) the x and y data will be
separately scaled to fill up the graphic window.


Fig. 3

In our example the data have x values over the range 0 ≤ x ≤ 2π ≈ 6.283. By default Scilab will
display x values over a range such as 0 ≤ x ≤ 7 so that the plot begins and ends on a major tic
mark. Clicking the Tight bounds box overrides this behavior. The Margins boxes specify the
space between the plotted axes and the edges of the figure. These values are fractions of the plot
width or height, and we can change them as desired.
In the Objects Browser portion of the Axes Editor window appears the hierarchy Figure, Axes,
Compound. Expanding the Compound object (click on the + sign) shows a Polyline object.
Selecting this brings up the Polyline Editor (Fig. 5). Here we can modify the properties of the
plotted line(s). If we uncheck the Visibility box the corresponding line will become invisible. We
can also modify the line type, width and color. Alternately (as shown in the figure) we can
uncheck the Line mode box and check the Mark mode to cause the data to appear as discrete
symbols. The marker type, color and size can be selected as desired. The result is the graphic
illustrated in Fig. 7.


Fig. 4

4 Exporting graphics (Scilab)


Once we have a plot formatted to our liking we can export it as an image file which can then be
imported into a word processor or other application. In the graphics windows we use the File
menu to select Export to. Various types of files are supported. PNG image files are generally the
most effective bit-mapped format for this purpose. The number of pixels depends on the size of the
window (default is generally 610-by-460).
For higher quality we can use the SVG (scalable vector graphics) format. If an application
supports this format it will allow the figure to be rendered at the highest possible resolution.
Alternately software such as Inkscape (free and open-source, available at inkscape.org) can
render an SVG file as a PNG file of arbitrary resolution.
Another approach is to use the Copy to clipboard option to directly copy and paste a figure. As
in the direct PNG export case this produces a relatively low-resolution graphic, but if this is good
enough then it's generally the fastest method.


Fig. 5

5 Modifying a graphic window within a program (Scilab)


Using the graphic edit menus is fine for a one-off figure, but often we want a program to
directly generate a plot with some desired formatting. This is especially true if we intend to use
the program to generate plots of many different data sets. In that case it's too tedious to manually
format each plot as we generate it. Instead we need to learn how to use programming commands
to automatically modify the formatting of a figure.

5.1 scf and clf commands


A new (blank) figure can be produced using the scf (set current figure) command. This has a
single argument which is the desired graphic window number. The commands
scf(1);
scf(2);

generate separate blank graphic windows 1 and 2. Plot commands are directed to the last set
figure. Therefore
scf(1);
plot(x1,y1);
scf(2);
plot(x2,y2);

would plot (x1,y1) in Figure 1 and (x2,y2) in Figure 2. The scf command returns a
handle to a structure defining the figure properties. So


fg = scf(1);

assigns to variable fg a structure with the following properties.


Handle of type "Figure" with properties:
========================================
children: "Axes"
figure_position = [865,219]
figure_size = [626,586]
axes_size = [610,460]
auto_resize = "on"
viewport = [0,0]
figure_name = "Graphic window number %d"
figure_id = 1
info_message = ""
color_map = matrix 32x3
pixel_drawing_mode = "copy"
anti_aliasing = "off"
immediate_drawing = "on"
background = -2
visible = "on"
rotation_style = "unary"
event_handler = ""
event_handler_enable = "off"
user_data = []
resizefcn = ""
closerequestfcn = ""
resize = "on"
toolbar = "figure"
toolbar_visible = "on"
menubar = "figure"
menubar_visible = "on"
infobar_visible = "on"
dockable = "on"
layout = "none"
layout_options = "OptNoLayout"
default_axes = "on"
icon = ""
tag = ""

Properties can be modified with commands such as


fg.figure_position = [100,150];

For example, the following commands


fg = scf(1);
fg.axes_size = [400,300];
fg.figure_position = [0,0];
fg = scf(2);
fg.axes_size = [400,300];
fg.figure_position = [416,0];
fg = scf(3);
fg.axes_size = [400,300];
fg.figure_position = [0,426];
fg = scf(4);
fg.axes_size = [400,300];
fg.figure_position = [416,426];


Fig. 6

generates four graphic windows (1,2,3,4) arranged in a 2-by-2 array as shown in Fig. 6.
We can then select graphic window 2 and draw a plot in it with the following commands
scf(2);
x = linspace(0,2*%pi,50);
y = sin(x);
plot(x,y,'r-','linewidth',3);
xgrid;

We have seen that repeated plot commands add curves to a figure. If we want to start over we
need to clear the figure using the clf() command. This applies to the currently selected figure.
If we want to clear a specific figure, say graphic window 3, we can specify clf(3).


5.2 Modifying axes properties


Most of the properties we want to modify are in the axes structure. To access this we use the
gca() command. The statement
ax = gca();

assigns to variable ax a handle to a structure specifying the axes properties of the currently
selected figure. These properties are as follows.


Handle of type "Axes" with properties:


======================================
parent: Figure
children: "Compound"
visible = "on"
axes_visible = ["on","on","on"]
axes_reverse = ["off","off","off"]
grid = [-1,-1]
grid_position = "background"
grid_thickness = [1,1]
grid_style = [3,3]
x_location = "bottom"
y_location = "left"
title: "Label"
x_label: "Label"
y_label: "Label"
z_label: "Label"
auto_ticks = ["on","on","on"]
x_ticks.locations = matrix 11x1
y_ticks.locations = matrix 11x1
z_ticks.locations = []
x_ticks.labels = matrix 11x1
y_ticks.labels = matrix 11x1
z_ticks.labels = []
ticks_format = ["","",""]
ticks_st = [1,1,1;0,0,0]
box = "on"
filled = "on"
sub_ticks = [4,1]
font_style = 6
font_size = 1
font_color = -1
fractional_font = "off"
isoview = "off"
cube_scaling = "off"
view = "2d"
rotation_angles = [0,270]
log_flags = "nnn"
tight_limits = "off"
data_bounds = [0,-0.9999233;5,0.9995736]
zoom_box = []
margins = [0.125,0.125,0.125,0.125]
auto_margins = "on"
axes_bounds = [0,0,1,1]
auto_clear = "off"
auto_scale = "on"
hidden_axis_color = 4
hiddencolor = 4
line_mode = "on"
line_style = 1
thickness = 1
mark_mode = "off"
mark_style = 0
mark_size_unit = "tabulated"
mark_size = 0
mark_foreground = -1
mark_background = -2
foreground = -1
background = -2
arc_drawing_method = "lines"
clip_state = "clipgrf"
clip_box = []
user_data = []
tag =


Properties can be modified by assigning the corresponding variables new values. For example
ax.isoview = "on";

causes the x and y axes to have the same scaling. To change the font style and font size of the
axes tic labels and add a black grid we might use
ax.font_size = 4;
ax.font_style = 3;
ax.grid = [1,1];

To add a label to the x axis with a desired font and font size we can execute the following
commands
ax.x_label.text = "this is the x label";
ax.x_label.font_size = 5;
ax.x_label.font_style = 3;

Similarly for the y (and z) labels and title. To find out more about, say, font_style settings
see
help graphics_fonts

6 Learning more
In this lecture we have focused on rectangular x,y plots. These are not the only types of plots that
we may want to generate. Polar coordinate r,θ plots are common. Other examples are plots of
vector fields (such as fluid velocity) and histograms. To learn more use the help window and
follow the links
Help => Graphics => 2d_plot //Scilab
Help => graph2d %Matlab

You'll find routines such as polarplot, histplot, and many others. Once you are comfortable with
rectangular plots you should find it easy to use the other plotting routines. The help sections on
most of these have examples that you can run and code you can examine or use as a template for
your own programs.
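
As a quick taste, a minimal polar plot in Scilab might look like this (a sketch; in Matlab the corresponding routine is polar or polarplot depending on the version):

theta = linspace(0,2*%pi,200);
rho = 1 + cos(theta); //a cardioid
polarplot(theta,rho);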


Lecture 6
3D plots and animation
1 Introduction
The majority of graphing tasks we face involve two-dimensional functions of the form
y = f(x). However, not all functions have a single input and a single output. The motion of
a particle through space is described by vector position vs time

r(t) = [x(t), y(t), z(t)]
We could represent this by three 2D plots, but a more physical representation would be to trace
the particle trajectory in a single 3D plot. In this case the independent variable t does not form
one of the plot axes. Instead it is a parameter of the motion. The resulting graph is called a
parametric plot.
Some engineering problems deal with fields. A field is a physical property which can vary
throughout space. For example, the variation of ground elevation across a region of Earth's
surface can be expressed as

z = f(x, y)

Here the two coordinates x,y might correspond to longitude and latitude and z to ground
elevation, possibly obtained from surveying. Or, z might represent surface temperature or
atmospheric pressure. In those cases we might also be interested in variation through time as
well as through space.
Since a computer screen is two dimensional, plots in three (and higher) dimensions will
necessarily have to represent a single projection of the function. Different projections might
highlight certain aspects of the function and obscure others. This problem grows with the number
of dimensions and is why scientific visualization is an active field of research.
In this lecture we want to learn a few basic 3D plotting techniques. We will use the following code
Nx = 80;
Ny = 40;
x = linspace(-6,6,Nx);
y = linspace(-3,3,Ny);
z = zeros(Ny,Nx);
for i=1:Nx
for j=1:Ny
z(j,i) = cos(x(i))*sin(y(j));
end
end

to generate an array of z values which we will plot in various ways. Notice that the first index of
the z array corresponds to the y coordinate and the second index to the x coordinate. This relates
to the raster scan format traditionally used on computer monitors and the way arrays appear in
graphics cards. Both Scilab and Matlab use this convention.


Fig. 1: Output of surf(x,y,z) command

2 The surf and mesh commands


Given any 2D array of real numbers z either of the commands
surf(z);
mesh(z);

will plot the z values as the elevation field of a 3D surface. The mesh command shows this
surface as a wire mesh while the surf command show it as a solid color-coded surface. The x
and y coordinates are the integer indices of the array. Alternately we can explicitly provide x and
y values
surf(x,y,z);

The result for our data is shown in Fig. 1. Because this is a 2D projection of a 3D surface, some
parts of the surface may be obscured. To get different views use the Rotation tool from the Tools
menu. Click (right button in Scilab, left button in Matlab) and drag with your mouse to reorient
the surface. As in the 2D case, we can export a figure to a graphics file for inclusion in a
presentation or paper.

2.1 Changing figure and axes properties interactively (Scilab)


As with 2D plots we can use the Edit => Figure properties menu option to change the appearance
of our figure. Similar options are available in Scilab and Matlab but the details differ. We will
consider Scilab.
One of the most visually noticeable changes we can make is to use a different color map. The
color map specifies how different z values are mapped into different colors. This can be changed
by either directly editing the red-green-blue values (Fig. 2) or more practically by specifying a
color map in the Colormap dialog box. Scilab has several predefined color maps (see help
colormap).

Fig. 2: Changing the color map.

Two of the most useful are


rainbowcolormap(n)
jetcolormap(n)

The integer n determines the number of discrete colors. A larger number, such as 256, gives a
smooth variation of color throughout the figure. But you may want to have only eight discrete
colors, in which case use 8 as the argument.
Using the Axes Editor we can change axis labels, figure title and the numerical label font for the
x, y and also z axes. In the Aspect submenu deselecting Cube scaling will produce a more
geometrically accurate representation of the surface. By also selecting the Isoview option the
surface plot will correspond to a physical representation of the surface with equal scaling for the
x,y,z axes.
Another useful command is
colorbar(zmin,zmax);

which adds a color bar to the figure showing the relationship between color and z value. After
various formatting changes our figure appears as shown in Fig. 3.

2.2 Changing figure and axes properties within a program (Scilab)


As with 2D plots we can execute commands within a program to make formatting changes.

Fig. 3: Plot after applying format changes

Fig. 3 was generated by the following commands.


surf(x,y,z);
h = gce();
h.color_flag = 3;
fg = gcf();
fg.color_map = rainbowcolormap(256);
fg.figure_size = [900,600];
ax = gca();
ax.cube_scaling = "off";
ax.isoview = "on";
ax.font_style = 1;
ax.font_size = 3;
ax.grid = [0,0,0];
ax.x_label.text = "x axis";
ax.x_label.font_style = 3;
ax.x_label.font_size = 5;
ax.y_label.text = "y axis";
ax.y_label.font_style = 3;
ax.y_label.font_size = 5;
ax.z_label.text = "z axis";
ax.z_label.font_style = 3;
ax.z_label.font_size = 5;
ax.title.text = "My 3D Plot";
ax.title.font_style = 4;
ax.title.font_size = 6;
ax.rotation_angles = [60,-45];
ax.data_bounds = [-6,-3,-1;6,3,1];
colorbar(-1,1);

The get current entity command gce() and the color_flag subtly change the way the

surface color is interpolated. You can experiment with flag values of 0 through 4. If you use these
commands they must come immediately after the surf plotting command.

3 The contour command


Another way to represent surface elevation is by drawing labeled contours of constant elevation
on an x,y representation of the surface as is done in a topographic map.

contour(x,y,z,n);

Here n is the number of (uniformly spaced) contour levels you want drawn on the figure.
Replacing contour by contourf creates a filled contour plot. One irritation in Scilab is that
if we follow the raster scan format we used with our initial data-generating code, we have to
replace the z argument with its transpose z.' (this is not the case in Matlab). As with all
graphics, we can adjust the formatting to our liking to get something such as shown in Fig. 4.
For the plain contour command Scilab adds numerical labels to the contours by default. I find
these to be too messy to be of much use and prefer a color bar as shown in the figure. To turn off
labeling use the xset('fpf',' ') command before the plotting, as shown below.
fg = scf(0);
clf();
fg.figure_size = [800,400];
fg.color_map = jetcolormap(11);
xset('fpf',' ');
contourf(x,y,z.',9);
ax = gca();
ax.isoview = "on";
ax.auto_ticks = ["on","on","on"];
ax.font_style = 3;
ax.font_size = 4;
ax.x_label.text = "longitude";
ax.x_label.font_size = 4;
ax.y_label.text = "latitude";
ax.y_label.font_size = 4;
ax.title.text = "ground elevation";
ax.title.font_size = 6;
colorbar(min(z),max(z));


Fig. 4: Output of the contour and contourf commands with added formatting

4 Parametric plots
Trajectory plots of the form [x(t), y(t), z(t)] can be generated by the commands
param3d(x,y,z); //Scilab
plot3(x,y,z); %Matlab

As an example, below is Scilab code to generate a spiral trajectory starting at the origin and
extending up along the z direction as time increases. After some interactive orientation and
formatting we end up with the graph of Fig. 5.


Fig. 5: Parametric "trajectory" plot using the param3d(x,y,z) command

Nt = 200;
t = linspace(0,10,Nt);
x = zeros(Nt,1);
y = zeros(Nt,1);
z = zeros(Nt,1);
for i=1:Nt
r = t(i)/10;
x(i) = r*cos(2*%pi*t(i));
y(i) = r*sin(2*%pi*t(i));
z(i) = r;
end
param3d(x,y,z);

5 Animation (Scilab)
Animation is simply the process of generating a series of graphic figures one for each frame of
the animation. The frames can be displayed in real time on the screen or saved as graphics files
which can later be assembled into a video file. There are a few subtle points which arise when
generating animations. Let's illustrate by an example. We start with the following code
x = linspace(0,2*%pi,100);
y = sin(x);
nFrames = 200;
t = linspace(0,4*%pi,nFrames);

This is going to represent a vibrating string. Let's try to generate an animation as follows.

for i=1:nFrames
plot(x,y*cos(t(i)));
end

Fig. 6: First attempt at an animation

As we should have guessed, this plots one after another position of the string on the same plot
producing the result in Fig. 6. We need to erase the old curve before plotting a new one. So we
try
for i=1:nFrames
clf();
plot(x,y*cos(t(i)));
end

This produces a blinking mess in which the y axis is constantly rescaling. The rescaling we can
get rid of by explicitly stating the data bounds
for i=1:nFrames
clf();
plot(x,y*cos(t(i)));
ax = gca();
ax.data_bounds = [0,-1;2*%pi,1];
end

The problem is that the screen is still a blinking mess. What is happening is that when we tell
Scilab to clear the figure, we see it go blank. Then when we tell Scilab to draw a new figure, we
see that appear. The result is irritating on/off video flicker. What we really want is that as we
are viewing one frame we are generating a new frame behind the scenes. When we are ready
for it we want the new frame to swap out the old frame instantly. This requires two segments of
memory, or video buffers. One holds the currently visible frame. The other, the background buffer,
is where the computer is generating the next frame. When ready, the computer rapidly copies the
background buffer contents into the visible buffer. In video systems this process is called double
buffering. Triple buffering is used in high-end video (e.g., gaming systems) so that while the
buffer copying is occurring the computer can already be working on another frame. Scilab
provides two commands to implement double buffering: drawlater() and drawnow().
They are very simple to use as shown in the following code.

for i=1:nFrames
drawlater(); //turn on double buffering so that operations
clf();
//occur in the background
plot(x,y*cos(t(i)));
ax = gca();
ax.data_bounds = [0,-1;2*%pi,1];
drawnow(); //copy the background buffer to the visible buffer
end

This solves our problems. In between the drawlater() and drawnow() commands we can
modify the figure in any way we wish: adding labels, titles, changing color maps and so on.
The speed with which frames update depends on how long it takes to generate a new frame. If the
goal is to produce an independent video file then we want to save each frame to disk. One
approach is shown here.
scf(0);
for i=1:nFrames
drawlater();
//generate a new frame here
drawnow();
fname = msprintf("frames/f%03d.png",i);
while (~isfile(fname))
xs2png(0,fname);
end
end

First we create a subdirectory named frames before running the animation code. During the
animation rendering the msprintf function creates a series of file names from the frame index
i. If this png file does not already exist it is written from the current graphics frame. The result is
a sequence of png files
f001.png , f002.png , ...

in the subdirectory frames. From there video editing software can be used to produce an
animation file. A useful free and open-source program for this is Virtualdub (virtualdub.org).


Lecture 7
Root finding I
1 Introduction
For our present purposes, root finding is the process of finding a real value of x which solves
the equation f(x) = 0. Since the equation g(x) = h(x) can be rewritten as
f(x) = g(x) − h(x) = 0, this encompasses the solution of any single equation in a single
unknown. Ideally we would want to know how many roots exist and what their values are. In
some cases, such as for polynomials, there are theoretical results for the number of roots (some of
which might be complex) and we have clues about what we are looking for. However, for an
arbitrary function f(x) there is not much we can say. Our equation may have no real roots, for
example 1 + e^x = 0, or, as in the case of sin(x) = 0 with roots x = nπ, n = 0, ±1, ±2, …, there
may be an infinite number of roots. We will limit our scope to finding one root, any root. If we
fail to find a root it will not mean that the function has no roots, just that our algorithm was
unable to find one.
To have any hope of solving the problem we need to make basic assumptions about f(x) that
allow us to know if an interval contains a root, if we are close to a root, or in what direction
(left or right) along the x axis a root might lie. At a minimum we will have to assume that our
function is continuous. Intuitively, a continuous function is one that can be plotted without
lifting pen from paper, while the plot of a discontinuous function has breaks. Formally, a
function f(x) is continuous at x = c if for any ε > 0 there exists a δ > 0 such that

|x − c| < δ ⟹ |f(x) − f(c)| < ε

If f(x) is continuous at all points in some interval a ≤ x ≤ b then it is continuous over that
interval. Continuity allows us to assume that a small change in x results in a small change in
f(x). It also allows us to know that if f(a) > 0, f(b) < 0 then there is some x in the interval
(a, b) such that f(x) = 0, because a continuous curve cannot go from above the x axis to
below without crossing the x axis.
In some cases we will also assume that f(x) is differentiable, meaning the limit

f′(x) = lim(ε→0) [f(x + ε) − f(x)] / ε

exists for all x of interest. This allows us to approximate the function by the line

f(x + ε) ≈ f(x) + ε f′(x)

over at least a small range of x values.

2 Graphical solution
The easiest and most intuitive way to solve f(x) = 0 is to simply plot the function and zoom in
on the region where the graph crosses the x axis. For example, say we want to find the first root
of cos(x) = 0 for x ≥ 0. We could run the commands
x = 0:0.01:3;
y = cos(x);
plot(x,y);

xgrid; //short-hand way to add a grid in Scilab (grid in Matlab)

to get the plot shown in Fig. 1. We can see that there is a root in the interval 1.5 ≤ x ≤ 1.6. We
then use the magnifying lens tool (Fig. 2) to zoom in on the root and get the plot shown in Fig. 3.

Fig. 1: Plot of cos(x)

From this we can read off the root to three decimal places as

x = 1.571
This approach is easy and intuitive. We can readily search for multiple roots by plotting different
ranges of x. However, in many situations we need an automated way to find roots. Scilab (and
Matlab) have built in functions to do this, and we will learn how to use those tools. But, in order
to understand what those functions are doing, and what limitations they may have, we need to
study root-finding algorithms. We start with one of the most basic algorithms called bisection.

Fig. 2: Zoom tool


Fig. 3: Zooming in on the root

3 Bisection
If the product of two numbers is negative then one of the numbers must be positive and the other
must be negative. Therefore, if f(x) is continuous over an interval a ≤ x ≤ b, and if
f(a)f(b) < 0, then f(x) is positive at one end of the interval and negative at the other end.
Since f(x) is continuous we can conclude that f(r) = 0 for some r in the interior of the
interval, because you cannot draw a continuous curve from a positive value of f(x) to a
negative value of f(x), or vice versa, without passing through f(x) = 0. This is the basis of
the bisection method, illustrated in Fig. 4.

Fig. 4: Bisection


If f(a)f(b) < 0 then we can estimate the root by the interval's midpoint with an uncertainty of
half the length of the interval, that is, r = (b + a)/2 ± (b − a)/2. To reduce the uncertainty by half
we evaluate the function at the midpoint x = (b + a)/2. If f(x), f(a) have the same sign (as in
the case illustrated in Fig. 4) we set a = x. If f(x), f(b) have the same sign we set b = x. In
the (very unlikely) event that f(x) = 0 then r = x is the root. In either of the first two cases we
have bisected the interval a ≤ x ≤ b into an interval half the original size. We then simply
repeat this process until the uncertainty |b − a|/2 is smaller than desired. This method is simple
and guaranteed to work for any continuous function. The algorithm can be represented as follows

Bisection algorithm
repeat until |b − a|/2 is smaller than tol
    set x = midpoint of interval (a, b)
    if f(x) has the same sign as f(a) then set a = x
    else if f(x) has the same sign as f(b) then set b = x
    else f(x) is zero and we've found the root!

Function rootBisection in the Appendix implements the bisection algorithm.

3.1 Bracketing a root


To start the bisection algorithm we need two values x = a, b such that f(a)f(b) < 0. We say
that a and b bracket a root, meaning a root is guaranteed to lie in the interval (a, b). How do we
bracket a root in the first place? In an interactive session a plot of f(x) allows you to quickly
spot values which bracket a root. In Fig. 1 we immediately see that a = 1.4, b = 1.6 bracket a
root.
In some cases a bracketing interval might be obvious from the form of the function. As an
example, a problem that comes up in fiber optics is the solution of an equation of the form
f(x) = 1 − x² − tan(ax) = 0 for x ≥ 0. Since f(0) = 1 > 0 and f(x) → −∞ as x → π/(2a), we know
that the interval (0, π/(2a)) must bracket a root.
In general there is no sure-fire method for bracketing a root, even if a root exists. Fig. 5 is an
example of a function with roots that would be difficult to find unless we happened to start
searching near x = 0.
One approach to getting started is a grid search. Let x_min < x < x_max be an interval in which we
believe one or more roots of f(x) = 0 might exist. Let Δx be the smallest scale at which we think
the function is likely to change significantly (in Fig. 5 this might be Δx ≈ 0.1). Our search grid is
then (x_min, x_min + Δx, x_min + 2Δx, …, x_max). We search the grid by calculating f(x) at each grid
point. Any time we observe a sign change we have found a new root.
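
A minimal Scilab sketch of a grid search might look like the following (the function and the interval are placeholders):

deff('y=f(x)','y=cos(x)'); //placeholder function
xmin = 0; xmax = 10; dx = 0.1; //interval and grid spacing (assumed)
x = xmin:dx:xmax;
for i = 1:length(x)-1
if (f(x(i))*f(x(i+1))<0) //sign change: a root is bracketed
mprintf('root bracketed in (%f,%f)\n',x(i),x(i+1));
end
end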
Grid searching is a non-interactive version of the "plot and see" method and it has the same pros
and cons. On the pro side, it will work even with "hidden" roots (as in Fig. 5), and it (ideally)
allows us to identify all roots in a prescribed interval. On the con side, it requires many
function evaluations. If the function is costly to evaluate, or if the root finding operation is in a

Fig. 5: A difficult root-finding problem

loop that will be repeated many times, then a grid search is not likely to be practical.
Another way to approach root bracketing is to start at an arbitrary value x = a with some step
size h. We then move along the x axis in the direction in which |f(x)| is decreasing (that is, we
are moving towards y = 0) until we find a bracket interval (a, b). If |f(x)| starts to increase
before we find a bracket then we give up. Increasing the step size at each iteration protects us
from getting into a situation where we are inching along a function that has a far-away zero.
Function rootBracket in the Appendix implements this idea.
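
The Appendix version is not reproduced at this point, but a minimal sketch of the expanding-step idea might look like this (the name bracketSketch and the details are illustrative):

function [a,b,ok] = bracketSketch(f,a,h)
fa = f(a); b = a+h; fb = f(b);
if (abs(fb)>abs(fa)) then //wrong direction: try the other way
h = -h; b = a+h; fb = f(b);
end
ok = %F;
for k = 1:40 //cap the number of attempts
if (fa*fb<0) then
ok = %T; return; //sign change: root bracketed
end
if (abs(fb)>abs(fa)) then
return; //|f| started to increase: give up
end
h = 2*h; //increase the step size each iteration
a = b; fa = fb;
b = a+h; fb = f(b);
end
endfunction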

3.2 Convergence
The uncertainty in the root, the maximum error in our bisection estimate, is ε = |b − a|/2. This
decreases by a factor of 1/2 with each bisection. Therefore the relationship between the error at
step k and step k+1 is

ε_{k+1} = (1/2) ε_k    (1)

More generally we might have an error ε = |x − r| that decreases as

ε_{k+1} = λ ε_k^q    (2)

The exponent q is called the order of convergence. If q = 1, as it is for bisection, we say the
convergence is linear, and we call λ the rate of convergence. For q > 1 the convergence is said
to be superlinear, and specifically, if q = 2 the convergence is quadratic. For superlinear
convergence λ is called the asymptotic error constant.
From (1) and (2) we see that bisection converges linearly with a rate of convergence of 1/2. If the
initial error is ε_0 then after k iterations we have

ε_k = (1/2)^k ε_0

In order to add one decimal digit of accuracy we need to decrease the error by a factor of 1/10. To
increase accuracy by n digits we require

(1/2)^k = 1/10^n

Solving for k we find

k = n / log10(2) = 3.32 n

It takes k ≈ 10 iterations to add n = 3 digits of accuracy. All linearly converging root finding
algorithms have the characteristic that each additional digit of accuracy requires a given number
of algorithm iterations.
Now consider a quadratically convergent method with

ε_{k+1} = ε_k²

Suppose ε_0 = 0.1, so we start with one decimal place of accuracy. Then

ε_1 = 0.01, ε_2 = 0.0001, ε_3 = 0.00000001, ε_4 = 0.0000000000000001

The first iteration adds one decimal place of accuracy. The second adds two decimal places. The
third adds four and the fourth adds eight. This "runaway" increase in accuracy is what motivates
us to explore other root-finding algorithms.

4 Fixed-point iteration
The goal of root finding is to arrive at x = r where f(r) = 0. Now consider the following three
equations

0 = f(x)
0 = a f(x)
x = x + a f(x)

Multiplying the first equation through by a produces the second equation. Adding x to both sides
of the second equation produces the third. All three will have the same roots, provided a ≠ 0.
Now let's define a new function

g(x) = x + a f(x)

Then our root-finding problem can be written

x = g(x)

the solution of which is

r = g(r)

What's attractive about this is that it has the form "x equals something," which is almost an
explicit formula for x. The problem is that the something itself depends on x. So let's imagine
starting with a guess for the root x = x_0. We might expect that x_1 = g(x_0) would be an improved

guess. We could continue iterating as many times as desired

	$x_1 = g(x_0)\,,\; x_2 = g(x_1)\,,\; \ldots\,,\; x_{k+1} = g(x_k)$

Our hope is that this converges with $x_k \to r$ as $k \to \infty$ . Writing

	$x_k = r + \epsilon_k$

$\epsilon_k$ is the error in the kth root estimate. Suppose that the error is small enough that the 1st order
Taylor series

	$g(r+\epsilon) \approx g(r) + g'(r)\epsilon = r + g'(r)\epsilon$

is accurate. Then

	$r + \epsilon_{k+1} = r + g'(r)\epsilon_k$

and

	$\epsilon_{k+1} = g'(r)\epsilon_k$

It follows that

	$\epsilon_k = \left[g'(r)\right]^k \epsilon_0$	(3)

Comparing (3) to (2) we see that, provided

	$\mu = |g'(r)| < 1$	(4)

fixed-point iteration converges linearly with rate of convergence $\mu$ . The value r is said to be a
fixed point of the iteration since the iteration stays fixed at $r = g(r)$ . Condition (4) requires

	$|1 + a\,f'(r)| < 1$

Provided $f'(r) \neq 0$ we can always find a value a satisfying this condition.


Fixed-point iteration
Set $g(x) = x + a\,f(x)$ for some choice of a
From initial root estimate $x_0$
iterate $x_{n+1} = g(x_n)$ until $|x_{n+1} - x_n| < \text{tol}$
if iteration diverges, select another value of a
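A minimal Scilab sketch of this recipe (the function name and iteration cap are my own):

function r = rootFixedPoint(x0, f, a, tol)
    MAX_ITERS = 100;           // guard against a diverging iteration
    nIters = 0;
    r = x0 + a*f(x0);          // first iterate of g(x) = x + a*f(x)
    while (abs(r - x0) > tol) & (nIters < MAX_ITERS)
        x0 = r;
        r = x0 + a*f(x0);      // next iterate
        nIters = nIters + 1;
    end
endfunction

With f(x) = x^2-2, a = -0.25 and x0 = 1.4 this homes in on sqrt(2), matching the example below.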

Example: Suppose we wish to solve

	$f(x) = x^2 - 2 = 0$

By inspection the roots are $\pm\sqrt{2}$ . Let

	$g(x) = x + f(x) = x + x^2 - 2$

Since

	$g'(\sqrt{2}) = 1 + 2\sqrt{2} > 1$

we expect that fixed-point iteration will fail. Starting very close to a root with
$x = 1.4$ , iteration gives the sequence of values

	$1.36\,,\; 1.2096\,,\; 0.6727322\,,\; -0.8746993\,,\; -2.1096004$

which is clearly moving away from the root. On the other hand taking

	$g(x) = x - 0.25 f(x) = x - 0.25(x^2 - 2)$

for which

	$g'(\sqrt{2}) = 0.293$

we get the sequence of values

	$1.412975\,,\; 1.4138504\,,\; 1.4141072\,,\; 1.4141824\,,\; 1.4142044\,,\; 1.4142109$

which is converging to $\sqrt{2} = 1.4142136$ .

Interestingly, if $f'(r) \neq 0$ we can choose a so that

	$\mu = |g'(r)| = |1 + a\,f'(r)| = 0$

Solving for a

	$a = -\frac{1}{f'(r)}$	(5)

It seems we might achieve superlinear convergence for this choice of a. This is one way to
motivate Newton's method, which we cover in the next lecture.

4.1 Root bracketing vs. root polishing

In the vocabulary of numerical analysis, a method like bisection which starts with, and always
maintains, a root bracket $(a,b)$ is called a root bracketing method. A technique such as fixed-point
iteration which starts at some value $x_0$ and generates a sequence $x_1, x_2, x_3, \ldots, x_k$ is
called a root polishing method. Root bracketing has the advantages that the existence of a
bracket guarantees (for a continuous function) the existence of a root within that bracket, and the
error in taking the midpoint of the bracketing interval as the root is guaranteed to be no more
than half the width of the interval. The price we pay for these guarantees is that we have to have a
root bracketed from the start. Root polishing has the advantage that we don't need a root bracket
to start, just a single value of x. On the down side, root polishing techniques are not guaranteed to
converge to a root, even if we start off near one, and they don't give us a rigorous bound on the
error in the value of a root if we do.

5 Appendix: Scilab code



//////////////////////////////////////////////////////////////////////
// rootBisection.sci
// 2014-06-04, Scott Hudson, for pedagogic purposes
// Implements bisection method for finding a root f(x) = 0.
// Requires a and b to bracket a root, f(a)*f(b)<0.
// Returns root as r with maximum error tol.
//////////////////////////////////////////////////////////////////////
function r=rootBisection(a, b, f, tol)
    fa = f(a);
    fb = f(b);
    if (fa*fb>=0) //make sure a,b bracket a root
        error('rootBisection: fa*fb>=0');
    end
    while (abs(b-a)/2>tol) //stop when error in root < tol
        x = (a+b)/2; //midpoint of interval
        fx = f(x);
        if (sign(fx)==sign(fa)) //r is in (x,b)
            a = x; //move a to x
            fa = fx;
        elseif (sign(fx)==sign(fb)) //r is in (a,x)
            b = x; //move b to x
            fb = fx;
        else //unlikely case that fx==0, sign(fx)==0, we found the root
            a = x; //shrink interval to zero width a=b=x
            b = x;
        end
    end
    r = (a+b)/2; //midpoint of last bracket interval is root estimate
endfunction


//////////////////////////////////////////////////////////////////////
// rootBracket.sci
// 2014-06-04, Scott Hudson, for pedagogic purposes
// Given a function f(x), starting point x=x0 and a stepsize h
// search for a and b such that f(x) changes sign over [a,b] hence
// bracketing a root.
//////////////////////////////////////////////////////////////////////
function [a, b]=rootBracket(f, x0, h)
    a = x0;
    fa = f(a);
    b = a+h;
    fb = f(b);
    done = (sign(fa)~=sign(fb)); //if the signs differ we're done
    if (~done) //if we don't have a bracket
        if (abs(fb)>abs(fa)) //see if a->b is moving away from x axis
            h = -h; //if so step in the other direction
            b = a; //and we will start from a instead of b
            fb = fa;
        end
    end
    while (~done)
        a = b; //take another step
        fa = fb;
        h = 2*h; //take bigger steps each time
        b = a+h;
        fb = f(b);
        done = (sign(fa)~=sign(fb));
        if ((abs(fb)>abs(fa))&(~done)) //we're now going uphill, give up
            error("rootBracket: cannot find a bracket\n");
        end
    end
endfunction


6 Appendix: Matlab code


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootBisection.m
%% 2014-06-04, Scott Hudson, for pedagogic purposes
%% Implements bisection method for finding a root f(x) = 0.
%% Requires a and b to bracket a root, f(a)*f(b)<0.
%% Returns root as r with maximum error tol.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = rootBisection(a,b,f,tol)
    fa = f(a);
    fb = f(b);
    if (fa*fb>=0) %%make sure a,b bracket a root
        error('rootBisection: fa*fb>=0');
    end
    while (abs(b-a)/2>tol) %%stop when error in root < tol
        x = (a+b)/2; %%midpoint of interval
        fx = f(x);
        if (sign(fx)==sign(fa)) %%r is in (x,b)
            a = x; %%move a to x
            fa = fx;
        elseif (sign(fx)==sign(fb)) %%r is in (a,x)
            b = x; %%move b to x
            fb = fx;
        else %%unlikely case that fx==0, sign(fx)==0, we found the root
            a = x; %%shrink interval to zero width a=b=x
            b = x;
        end
    end
    r = (a+b)/2; %%midpoint of last bracket interval is root estimate
end


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootBracket.m
%% 2014-06-04, Scott Hudson, for pedagogic purposes
%% Given a function f(x), starting point x=x0 and a stepsize h
%% search for a and b such that f(x) changes sign over [a,b] hence
%% bracketing a root.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a,b] = rootBracket(f,x0,h)
    a = x0;
    fa = f(a);
    b = a+h;
    fb = f(b);
    done = (sign(fa)~=sign(fb)); %%if the signs differ we're done
    if (~done) %%if we don't have a bracket
        if (abs(fb)>abs(fa)) %%see if a->b is moving away from x axis
            h = -h; %%if so step in the other direction
            b = a; %%and we will start from a instead of b
            fb = fa;
        end
    end
    while (~done)
        a = b; %%take another step
        fa = fb;
        h = 2*h; %%take bigger steps each time
        b = a+h;
        fb = f(b);
        done = (sign(fa)~=sign(fb));
        if ((abs(fb)>abs(fa))&(~done)) %%we're now going uphill, give up
            error('rootBracket: cannot find a bracket');
        end
    end
end


Lecture 8
Root finding II
1 Introduction
In the previous lecture we considered the bisection root-bracketing algorithm. It requires only
that the function be continuous and that we have a root bracketed to start. Under those conditions
it is guaranteed to converge with error

	$\epsilon_k = \left(\frac{1}{2}\right)^k \epsilon_0$

at the kth iteration. Thus bisection provides linear convergence with a rate of convergence of 1/2.
Because of its general applicability and guaranteed convergence, bisection has much to
recommend it.
We also studied fixed-point iteration

	$x_{k+1} = g(x_k) = x_k + a\,f(x_k)$

where a is a constant. We found that provided we start out close enough to a root r the method
converges linearly with error

	$\epsilon_k = \left[g'(r)\right]^k \epsilon_0$	(1)

As this is a root-polishing algorithm, it does not require an initial root bracketing, which might be
considered a plus. Still, for one-dimensional functions $f(x)$ , fixed-point iteration is not an
attractive algorithm. It provides the same linear convergence as bisection without any guarantee
of finding a root, even if one exists. However, it can be useful for multidimensional problems.
What we want to investigate here is the tantalizing prospect suggested by $g'(r) = 0$ , which in
light of (1) suggests superlinear convergence of some sort.

2 Newton's method

To achieve $g'(r) = 1 + a\,f'(r) = 0$ we take

	$a = -\frac{1}{f'(r)}$

to arrive at the root-polishing iteration formula

	$x_{k+1} = x_k - \frac{f(x_k)}{f'(r)}$

The only problem is that to compute $f'(r)$ we'd need to know r, and that's what we are
searching for in the first place. But if we are close to the root, so that $f'(x_k) \approx f'(r)$ , then it
makes sense to use

	$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$

[Fig. 1: Newton's method.]

Let's derive this formula another way. If $f(x)$ is continuous and differentiable, then for small
changes in x, $f(x)$ is well approximated by a first-order Taylor series. If we expand the Taylor
series about the point $x = x_k$ and take $x_{k+1} = x_k + h$ then, assuming h is small, we can write

	$f(x_k + h) = f(x_{k+1}) \approx f(x_k) + f'(x_k)h = 0$

and solve for

	$h = -\frac{f(x_k)}{f'(x_k)}$

This gives us the formula

	$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$	(2)

which is called Newton's method. In Newton's method we model the function by the tangent
line at the current point $(x_k, f(x_k))$ . The root of the tangent line is our next estimate of the root
of $f(x)$ (Fig. 1).
Taking $x_k = r + \epsilon_k$ and assuming the error $\epsilon_k$ is small enough that the 2nd order
Taylor series

	$f(x) \approx f(r) + f'(r)\epsilon + \frac{1}{2} f''(r)\epsilon^2 = f'(r)\epsilon + \frac{1}{2} f''(r)\epsilon^2$

is accurate, it's straightforward (but a bit messy) to show that Newton's method converges
quadratically with

	$\epsilon_{k+1} = \frac{1}{2}\frac{f''(r)}{f'(r)}\,\epsilon_k^2$
Example: Let's take

	$f(x) = x^2 - 2 = 0$

Since $f'(x) = 2x$ , Newton's method gives the iteration formula

	$x_{k+1} = x_k - \frac{x_k^2 - 2}{2 x_k}$

Let's start at $x_0 = 1$ . Four iterations produce the sequence of numbers

	$1.5\,,\; 1.4166667\,,\; 1.4142157\,,\; 1.4142136$

This is very rapid convergence toward the root $\sqrt{2} = 1.4142136$ . In fact the
final value (as stored in the computer) turns out to be accurate to about 11
decimal places.
A Scilab implementation of Newton's method is given in the Appendix. We require two functions
as arguments: $f(x)$ and $f'(x)$ . Newton's method fails if $f'(x_k) = 0$ and more generally performs
poorly if $|f'(x_k)|$ is very small. It can also fail if we start far away from a root.
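As a quick (assumed) usage sketch of the Appendix rootNewton function:

deff('y=f(x)','y=x^2-2');
deff('y=fp(x)','y=2*x');          // derivative of f
r = rootNewton(1, f, fp, 1e-10);  // start at x0 = 1
mprintf("r = %.10f\n", r);        // expect about 1.4142135624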

With root-bracketing methods, such as bisection, we know that the error in our root estimate is
less than or equal to half the last bracketing interval. This gives us a clear termination condition;
we stop when the maximum possible error is less than some defined tolerance. With root-polishing
methods, such as Newton's method, we don't have any rigorous bound on the error in
our root. Deciding when to stop iterating involves some guess work. A simple criterion is

	$|x_k - x_{k-1}| \leq \text{tol}$	(3)

that is, we terminate when the change in root estimates is less than some specified tolerance. This
is a reasonable way to estimate the actual uncertainty in our root estimate, but keep in mind that
it is not a rigorous bound on the actual error, as is the case with bisection. Since Newton's
method is not guaranteed to converge (it may just bounce around forever) it is a good idea to
also terminate if the number of iterations exceeds some maximum value.
It's natural to look for methods that converge with order $q = 3, 4, \ldots$ . Householder's method
generalizes Newton's method to higher orders. The $q=3$ version is called Halley's method. It
requires calculation of $f(x_k)$ , $f'(x_k)$ and $f''(x_k)$ at every step. The higher-order methods
likewise require ever higher-order derivatives to be calculated. For practical purposes, the increased
order of convergence is more than offset by the added calculations and complexity. One application
where this is not necessarily the case is polynomials. It is very easy to calculate a polynomial's
derivatives of all orders, and higher-order methods do find application in polynomial root
finding.
If started close enough to a simple root, Newton's method generally performs very well and is
the method of choice in many applications. It does, however, require that we compute both
$f(x)$ and $f'(x)$ at each iteration. In some cases it may not be possible or practical to compute
$f'(x)$ explicitly. We would like a method that gives us the rapid convergence of Newton's
method without the need of calculating derivatives.

[Fig. 2: Secant method]

3 Secant method
In the secant method we replace the derivative appearing in Newton's method by the
approximation

	$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$

Substituting this into (2) results in

	$x_{k+1} = x_k - f(x_k)\,\frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$	(4)

Another way to view the secant method is as follows. Suppose we have evaluated the function at
two points $(x_k, f_k = f(x_k))$ and $(x_{k-1}, f_{k-1} = f(x_{k-1}))$ . Through these two points we can
draw a line, the formula for which is

	$y = f_k + \frac{f_k - f_{k-1}}{x_k - x_{k-1}}(x - x_k)$

Setting this formula equal to zero and solving for x we obtain (4). In the secant method we model
the function $f(x)$ by a line through our last two root estimates (Fig. 2). Solving for the root of
that line provides our next estimate.
When the secant method works it converges with order $q \approx 1.6$ , superlinear but not quadratic.
Comparison of Figs. 1 and 2 suggests why this is. The secant method will tend to underestimate or
overestimate (depending on the 2nd derivative of the function at r) the actual slope at the point
$(x_k, f(x_k))$ . We are effectively using an average of the slopes at $(x_k, f(x_k))$ and
$(x_{k-1}, f(x_{k-1}))$ , whereas we use the true slope at $(x_k, f(x_k))$ in Newton's method. It can be
shown [2] that the secant method converges as

	$\epsilon_{k+1} = \frac{1}{2}\frac{f''(r)}{f'(r)}\,\epsilon_k\,\epsilon_{k-1}$

or

	$\epsilon_{k+1} \approx \left|\frac{1}{2}\frac{f''(r)}{f'(r)}\right|^{0.618} \epsilon_k^{1.618}$

so the secant method is of order $q \approx 1.6$ , which is superlinear but less than quadratic. A Scilab
implementation of the secant method is given in the appendix.

[Fig. 3: Failure of the secant method]
As is so often the case with numerical methods, we are presented a trade-off. The secant method
does not require explicit calculation of derivatives while Newton's method does. But the secant
method does not converge as fast as Newton's method. As with all root-polishing methods,
deciding when to stop iterating involves some guess work. Criterion (3) is an obvious choice.
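A brief (assumed) usage sketch of the Appendix rootSecant function, seeded with two nearby
points rather than a derivative:

deff('y=f(x)','y=x^2-2');
r = rootSecant(1, 1.5, f, 1e-10); // two starting x values, no derivative needed
mprintf("r = %.10f\n", r);        // expect about 1.4142135624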
As illustrated in Fig. 3, the secant method can fail, even when starting out with a bracketed root.
There we start with points 1 and 2 on the curve. The line through those points crosses the x axis
at s. The corresponding point on the curve is point 3. Now we draw a line through points 2 and 3.
This gives the root estimate t. The corresponding point on the curve is point 4. We are actually
moving away from the root.
In the case illustrated points 1 and 2 bracket a root while points 2 and 3 do not. Clearly if we
have a root bracketed we should never accept a new root estimate that falls outside that bracket.
The false position method is a variation of the secant method in which we use the last two points
which bracket a root to represent our linear approximation. In that case we would have drawn a
line between point 3 and point 1. Of course, as in the bisection method, this would require us to
start with a bracketed root. Moreover in cases where both methods would converge the false
position method does not converge as fast as the secant method.

4 Inverse quadratic interpolation

Given two points on a curve $(x_k, f_k)$ and $(x_{k-1}, f_{k-1})$ , the secant method approximates the
function by a line $y = f(x) \approx c_1 + c_2 x$ (Fig. 2). This suggests that if we have a third point
$(x_{k-2}, f_{k-2})$ we might draw a parabola through the three points to obtain a quadratic
approximation to the function. Setting that approximation equal to zero

	$y = f(x) \approx c_1 + c_2 x + c_3 x^2 = 0$

might provide a better root approximation than the secant method. Unfortunately we would have
to solve a quadratic equation in this case. An alternative approach is inverse quadratic
interpolation where we represent x as a quadratic function of y

	$x = f^{-1}(y) = c_1 + c_2 y + c_3 y^2$

Setting $y = f(x) = 0$ we simply have $x = c_1$ (Fig. 4). It turns out there is an explicit formula for
this value

	$x = \frac{x_1 f_2 f_3}{(f_1 - f_2)(f_1 - f_3)} + \frac{x_2 f_3 f_1}{(f_2 - f_3)(f_2 - f_1)} + \frac{x_3 f_1 f_2}{(f_3 - f_1)(f_3 - f_2)}$

[Fig. 4: Inverse quadratic interpolation. Solid line is the function. Dotted line is x represented as
a quadratic function of y passing through three given points on the curve.]
(We will understand this formula when we study Lagrange interpolation.) Inverse quadratic
interpolation allows us to exploit information about both the first derivative (slope) and second
derivative (curvature) of the function.
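A minimal Scilab sketch of one such step, applying the formula above to three assumed points
(the function name is mine):

function x = invQuadStep(x1, x2, x3, f)
    f1 = f(x1); f2 = f(x2); f3 = f(x3);
    x = x1*f2*f3/((f1-f2)*(f1-f3)) + x2*f3*f1/((f2-f3)*(f2-f1)) + x3*f1*f2/((f3-f1)*(f3-f2));
endfunction

deff('y=f(x)','y=x^2-2');
x = invQuadStep(1, 1.5, 2, f); // one step gives about 1.41, already near sqrt(2)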
Like the secant method, inverse quadratic interpolation can also fail. But when it works it
converges with order $q \approx 1.8$ . Still not as rapid as Newton's method, but it does not require
evaluation of derivatives. It converges faster than the secant method ( $q \approx 1.6$ ) but at the cost of
more bookkeeping and a more complicated recursion formula.

5 Hybrid methods
In general, numerical root finding is a difficult problem. We are presented with various trade-offs,
such as that between the guaranteed convergence of the bisection method and the faster
convergence of the Newton, secant or inverse quadratic methods. Consequently people have
developed hybrid methods that seek to combine the best of two or more simpler methods. One of
the most widely used is Brent's method [1] (used in the Matlab fzero function). This method
combines the bisection, secant and inverse quadratic methods. Brent's method starts off with a
bracketed root. If we don't have a bracket then we have to search for one. With those two initial
points, Brent's method applies the secant method to get a third point. From there it tries to use
inverse quadratic interpolation for rapid convergence, but applies tests at each iteration to see if it
is actually converging superlinearly. If not, Brent's method falls back to the slow-but-sure
bisection method. At the next iteration it again tries inverse quadratic interpolation. Another
example is Powell's hybrid method (used in the Scilab fsolve function).
There is no single agreed-upon "one size fits all" algorithm for root finding. Hybrid methods seek
to use the fastest algorithm that seems to be working, with the option to fall back to slower but
surer methods as a backup. These methods are recommended for most root-finding applications.
However, there may be specific applications where a particular method will be superior. Newton's
method is hard to beat if it is possible to directly calculate both $f(x)$ and $f'(x)$ and a
reasonably accurate initial guess to a root is available. It can be coded very compactly, so it's easy
to incorporate directly into a program.

6 The fsolve command (Scilab)

As with nearly all non-trivial numerical algorithms, Scilab employs state-of-the-art methods
developed by numerical computation researchers in its library of functions. For solving
$f(x) = 0$ Scilab provides the fsolve function. The simplest use of this is
r = fsolve(x0,f);

Here x0 is an initial guess at a root of the function f(x). The returned value r is the estimated
root. For example
-->deff('y=f(x)','y=cos(x)');
-->r = fsolve(1,f)
r =
1.5707963

The Matlab equivalent is called fzero, although it has a slightly different syntax. Note,
however, that fsolve may fail to find a root but may still return a value r. For example


-->deff('y=f(x)','y=2+cos(x)');
-->r = fsolve(1,f)
r =
3.1418148

The function $2 + \cos(x)$ is never 0 yet Scilab returned a value for r. This value is actually where
the function gets closest to the x axis; it's where $|f(x)|$ is a minimum. Therefore, you should
always check the value of the function at the reported root. Running the command using the
syntax
[r,fr] = fsolve(x0,f);

returns the value of the function at r, fr=f(r). For example


-->[r,fr] = fsolve(1,f)
 fr  =
    1.
 r  =
    3.1418148

shows us that r is not actually a root since $f(r) = 1$ . On the other hand
-->deff('y=f(x)','y=0.5+cos(x)');
-->[r,fr] = fsolve(1,f)
fr =
2.220D-16
r =
2.0943951

makes it clear that r is a root in this case. By default, fsolve tries to find the root to within an
estimated tolerance of $10^{-10}$ . You can specify the tolerance explicitly as in
[r,fr] = fsolve(x0,f,tol);

Because of limitations due to round-off error it is not recommended to use a smaller tolerance
than the default. There may be situations where you don't need much accuracy and using a larger
tolerance might save a few function calls.

7 References
1. Brent, Richard P. Algorithms for Minimization Without Derivatives. Dover Publications,
2013 (originally published 1973). Kindle edition, ASIN B00CRW5ZTK.
2. http://www.math.drexel.edu/~tolya/300_secant.pdf


8 Appendix: Scilab code


8.1 Newton's method

//////////////////////////////////////////////////////////////////////
// rootNewton.sci
// 2014-06-04, Scott Hudson
// Implements Newton's method for finding a root f(x) = 0.
// Requires two functions: y=f(x) and y=fp(x) where fp(x) is
// the derivative of f(x). Search starts at x0. Root is returned as r.
//////////////////////////////////////////////////////////////////////
function r=rootNewton(x0, f, fp, tol)
    MAX_ITERS = 40; //give up after this many iterations
    nIters = 1; //1st iteration
    r = x0-f(x0)/fp(x0); //Newton's formula for next root estimate
    while (abs(r-x0)>tol) & (nIters<=MAX_ITERS)
        nIters = nIters+1; //keep track of # of iterations
        x0 = r; //current root estimate is last output of formula
        r = x0-f(x0)/fp(x0); //Newton's formula for next root estimate
    end
endfunction

8.2 Secant method



//////////////////////////////////////////////////////////////////////
// rootSecant.sci
// 2014-06-04, Scott Hudson
// Implements secant method for finding a root f(x) = 0.
// Requires two initial x values: x1 and x2. Root is returned as r
// accurate to (hopefully) about tol.
//////////////////////////////////////////////////////////////////////
function r=rootSecant(x1, x2, f, tol)
    MAX_ITERS = 40; //maximum number of iterations allowed
    nIters = 1; //1st iteration
    fx2 = f(x2);
    r = x2-fx2*(x2-x1)/(fx2-f(x1));
    while (abs(r-x2)>tol) & (nIters<=MAX_ITERS)
        nIters = nIters+1;
        x1 = x2;
        fx1 = fx2;
        x2 = r;
        fx2 = f(x2);
        r = x2-fx2*(x2-x1)/(fx2-fx1);
    end
endfunction


9 Appendix: Matlab code


9.1 Newton's method
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootNewton.m
%% 2014-06-04, Scott Hudson
%% Implements Newton's method for finding a root f(x) = 0.
%% Requires two functions: y=f(x) and y=fp(x) where fp(x) is
%% the derivative of f(x). Search starts at x0. Root is returned as r.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = rootNewton(x0,f,fp,tol)
    MAX_ITERS = 40; %%give up after this many iterations
    nIters = 1; %%1st iteration
    r = x0-f(x0)/fp(x0); %%Newton's formula for next root estimate
    while (abs(r-x0)>tol) & (nIters<=MAX_ITERS)
        nIters = nIters+1; %%keep track of # of iterations
        x0 = r; %%current root estimate is last output of formula
        r = x0-f(x0)/fp(x0); %%Newton's formula for next root estimate
    end
end

9.2 Secant method


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootSecant.m
%% 2014-06-04, Scott Hudson
%% Implements secant method for finding a root f(x) = 0.
%% Requires two initial x values: x1 and x2. Root is returned as r
%% accurate to (hopefully) about tol.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = rootSecant(x1,x2,f,tol)
    MAX_ITERS = 40; %%maximum number of iterations allowed
    nIters = 1; %%1st iteration
    fx2 = f(x2);
    r = x2-fx2*(x2-x1)/(fx2-f(x1));
    while (abs(r-x2)>tol) & (nIters<=MAX_ITERS)
        nIters = nIters+1;
        x1 = x2;
        fx1 = fx2;
        x2 = r;
        fx2 = f(x2);
        r = x2-fx2*(x2-x1)/(fx2-fx1);
    end
end


Lecture 9
Polynomials
1 Introduction
The equation

	$p(x) = c_1 + c_2 x + c_3 x^2 + c_4 x^3 = 0$	(1)

is one equation in one unknown, and the root finding methods we developed previously can be
applied to solve it. However this is a polynomial equation, and there are theoretical results that
can be used to develop specialized root-finding methods that are more powerful than
general-purpose methods.
For polynomials of order $n = 1, 2, 3, 4$ there are analytic formulas for all roots. The single root of
$c_1 + c_2 x = 0$ is $x = -c_1/c_2$ . The quadratic formula gives the two roots of

	$c_1 + c_2 x + c_3 x^2 = 0$

as

	$x = \frac{-c_2 \pm \sqrt{c_2^2 - 4 c_1 c_3}}{2 c_3}$

The formulas for order 3 and 4 polynomials are too complicated to be of practical use. Therefore
to find the roots of order 3 and higher polynomials we are forced to use numerical methods. Yet
it is still the case that we have important theoretical results to guide us.
The fundamental theorem of algebra states that a polynomial of degree n has precisely n roots
(some of which may be repeated). However, these roots may be real or complex. Most of the root
finding algorithms we have studied so far apply only to a real function of a real variable. They
can find the real roots of a polynomial (if there are any) but not complex roots.
If the polynomial has real coefficients $c_1, c_2, \ldots$ then complex roots (if any) come in
complex-conjugate pairs. Let $z = x + iy$ be a general complex number with real part x and
imaginary part y. If

	$c_1 + c_2 z + c_3 z^2 + \cdots + c_{n+1} z^n = 0$

then taking the complex conjugate of both sides we have

	$c_1^* + c_2^* z^* + c_3^* (z^2)^* + \cdots + c_{n+1}^* (z^n)^* = 0$	(2)

where $z^* = x - iy$ . Suppose the coefficients are real so that $c_k^* = c_k$ . Since $(z^k)^* = (z^*)^k$ , (2)
becomes

	$c_1 + c_2 z^* + c_3 (z^*)^2 + \cdots + c_{n+1} (z^*)^n = 0$

which tells us that if z is a root of the polynomial then so is $z^*$ . Therefore complex roots must
come in complex-conjugate pairs. From this we know that a polynomial of odd order has at least
one real root since we must always have an even number of complex roots.


Another way to see this is in terms of root bracketing. For very large (real) values of x, an nth
order polynomial is dominated by its highest power term

	$c_1 + c_2 x + c_3 x^2 + \cdots + c_{n+1} x^n \approx c_{n+1} x^n$

For n odd, $x^n$ is positive for positive x and negative for negative x. Therefore the polynomial
must change sign between $x \to -\infty$ and $x \to \infty$ . Since polynomials are continuous functions we
conclude that there must be a real root somewhere on the x axis.
For a complex root

	$p(z) = c_1 + c_2 (x + iy) + c_3 (x + iy)^2 + \cdots + c_{n+1} (x + iy)^n$

Expanding each term into real and imaginary parts, along the lines of

	$(x + iy)^2 = x^2 - y^2 + i\,2xy$
	$(x + iy)^3 = x^3 - 3xy^2 - i\,y(y^2 - 3x^2)$

we end up with an equation of the form

	$p(z) = f(x, y) + i\,g(x, y) = 0$

which is actually two real equations in two real unknowns

	$\begin{pmatrix} f(x, y) \\ g(x, y) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$

and so not directly solvable using our $f(x) = 0$ algorithms.

2 Manipulating polynomials in Scilab

When we write out a polynomial such as (1) the variable x is simply a placeholder into which
some number is to be inserted, so the polynomial is fully specified by simply listing its
coefficients

	$[c_1, c_2, \ldots, c_{n+1}]$

Here is a major difference between Scilab and Matlab. Scilab follows the convention that an
array of polynomial coefficients begins with the constant term on the left and ends with the
coefficient of $x^n$ on the right. Matlab does the bookkeeping in the opposite direction. In Scilab
the coefficients

c = [-15,23,-9,1]

correspond to the polynomial $-15 + 23x - 9x^2 + x^3$ . In Matlab the same coefficients correspond
to the polynomial $-15 x^3 + 23 x^2 - 9x + 1$ . The Matlab convention more closely corresponds to
the way we normally write polynomials (starting with the highest power). However, there are
some advantages to the Scilab convention. Regardless, I want to make it clear that we will be
following the Scilab convention, and most of the material in this section is particular to Scilab
(although there are counterparts in Matlab).
To explicitly form a polynomial in the variable x with coefficients c=[1,2,3] we use the
command


-->p = poly([1,2,3],'x','coeff')
 p  =
    1 + 2x + 3x^2

Notice that Scilab outputs an ascii typeset polynomial in the variable of interest. To form a
polynomial with roots at $x=1$ , $x=2$ we use the command

-->q = poly([1,2],'x','roots')
 q  =
    2 - 3x + x^2

Note that

	$(x-1)(x-2) = x^2 - 3x + 2 = 2 - 3x + x^2$

You can multiply and divide polynomials

-->p*q
 ans  =
    2 + x + x^2 - 7x^3 + 3x^4

-->p/q
 ans  =
    1 + 2x + 3x^2
    -------------
     2 - 3x + x^2

In the second case we obtain a rational function of x. Now consider the somewhat redundant
appearing command

-->x = poly(0,'x')
 x  =
    x

This assigns to the Scilab variable x a polynomial in the symbolic variable x having a single root
at $x=0$ , that is, it effectively turns x into a symbolic variable. Now we can enter expressions
such as

-->h = 3*x^3-4*x^2+7*x-15
 h  =
  - 15 + 7x - 4x^2 + 3x^3

-->g = (x-1)*(x-2)*(x-3)
 g  =
  - 6 + 11x - 6x^2 + x^3

To evaluate a polynomial (or rational function) at a specific number we use the horner
command

-->horner(h,3)
 ans  =
    51.


-->horner(h,1-%i)
 ans  =
  - 14. - 5.i

We can evaluate a polynomial (or rational function) at an array of numbers

-->v = [1,2,3]
 v  =
    1.   2.   3.

-->horner(h,v)
 ans  =
  - 9.   7.   51.

As always, we are only scratching the surface. See the Polynomials section of the Scilab Help
Browser for more information.

3 Factoring and deflation

Another thing we know about polynomials is that they can always (in principle) be factored

	$c_1 + c_2 z + c_3 z^2 + \cdots + c_{n+1} z^n = c_{n+1}(z - r_1)(z - r_2)\cdots(z - r_n)$

where $r_1, r_2, \ldots, r_n$ are the (in general complex) roots of the polynomial. If we find one root, $r_1$
say, we can divide out a factor of $(z - r_1)$ to get a polynomial of degree $n-1$

	$\frac{c_1 + c_2 z + c_3 z^2 + \cdots + c_{n+1} z^n}{z - r_1} = b_1 + b_2 z + \cdots + b_n z^{n-1}$

This is the process of deflation. If you form the ratio of two polynomials with a common factor,
Scilab will cancel the common factor. Consider

-->p = (x-1)*(x-2)
 p  =
    2 - 3x + x^2

-->q = (x-1)*(x-3)
 q  =
    3 - 4x + x^2

-->p/q
 ans  =
   - 2 + x
   -------
   - 3 + x

Notice that the common factor of $x-1$ has been canceled. This can be used for deflation

-->p/(x-2)
 ans  =
   - 1 + x
   -------
      1

We need to consider the effect of finite precision. Suppose we've calculated a polynomial root r
numerically. We don't expect it to be exact. Will Scilab be able to factor it out of the polynomial?
Look at the following
-->p = (x-sqrt(2))*(x-%pi)
 p  =
    4.4428829 - 4.5558062x + x^2

-->p/(x-%pi*(1+1e-6))
 ans  =
    4.4428829 - 4.5558062x + x^2
    ----------------------------
         - 3.1415958 + x

-->p/(x-%pi*(1+1e-9))
 ans  =
   - 1.4142136 + x
   ---------------
          1

The polynomial has a factor of $(x - \pi)$ . In the first ratio we are trying to cancel a factor of
$(x - \pi[1 + 10^{-6}])$ . Now $\pi[1 + 10^{-6}]$ is very close to $\pi$ , but not close enough for Scilab to
consider them the same number, so no deflation occurs. On the other hand, in the second ratio
Scilab treats $(x - \pi[1 + 10^{-9}])$ as numerically equivalent to $(x - \pi)$ and deflates the polynomial
by that factor. The lesson is that a root estimate must be very accurate for it to be successfully
factored out of a polynomial.

4 Horner's method
Let's turn to the numerical mechanics of evaluating a polynomial. To compute

	$f(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{n+1} x^n$

for some value of x, it is generally not a good idea to directly evaluate the terms as written.
Instead, consider the following factorization of a quadratic

	$c_1 + c_2 x + c_3 x^2 = c_1 + (c_2 + c_3 x)x$

a cubic

	$c_1 + c_2 x + c_3 x^2 + c_4 x^3 = c_1 + [c_2 + (c_3 + c_4 x)x]x$

and a quartic

	$c_1 + c_2 x + c_3 x^2 + c_4 x^3 + c_5 x^4 = c_1 + (c_2 + [c_3 + (c_4 + c_5 x)x]x)x$

Notice that in all cases the expressions on the right involve only two simple operations: multiply
by x or add a coefficient. There is no need to evaluate all the various powers $x, x^2, x^3, x^4, \ldots$ .
This is particularly important for low-level computational systems such as microcontrollers,
which typically do not have the hardware floating-point accelerator that a high-end cpu would.
Evaluating a polynomial using this factored form is called Horner's method. We can code it in a
few lines. See the function polyHorner in the Appendix. As we've already seen, Scilab
contains a built-in horner function.
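The nested form translates directly into a loop. A minimal sketch, using an assumed example
polynomial:

// evaluate p(x) = -15 + 23x - 9x^2 + x^3 at x = 4 by Horner's method
c = [-15, 23, -9, 1];   // constant term first, Scilab convention
x = 4;
w = c(4);               // start with the highest coefficient
for i = 3:-1:1
    w = c(i) + w*x;     // multiply by x, add the next-lower coefficient
end
disp(w);                // -3, matching horner(poly(c,'x','coeff'),4)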

Now consider polynomial deflation. Suppose we have a polynomial

	$p(x) = c_1 + c_2 x + c_3 x^2 + c_4 x^3$

and we know that r is a root. We want to factor $(x-r)$ from $p(x)$ to get a polynomial

	$q(x) = b_1 + b_2 x + b_3 x^2$

We must have

	$c_1 + c_2 x + c_3 x^2 + c_4 x^3 = (x-r)(b_1 + b_2 x + b_3 x^2)$
	$= -r b_1 - r b_2 x - r b_3 x^2 + b_1 x + b_2 x^2 + b_3 x^3$

Equating coefficients of like powers of x we have

	$c_4 = b_3$
	$c_3 = -r b_3 + b_2$
	$c_2 = -r b_2 + b_1$
	$c_1 = -r b_1$

We can rearrange this to get

	$b_3 = c_4$
	$b_2 = c_3 + r b_3$
	$b_1 = c_2 + r b_2$
	$0 = c_1 + r b_1$

Generalizing to an arbitrary order polynomial we have

	$b_n = c_{n+1}$
	$b_k = c_{k+1} + r b_{k+1} \;\text{ for }\; k = n-1, n-2, \ldots, 1$	(3)

Additionally, the equation $c_1 + r b_1 = 0$ must automatically be satisfied if r is a root. This
algorithm appears in the Appendix as polyDeflate.
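An (assumed) usage sketch, deflating the root $r=1$ out of $(x-1)(x-2)(x-3)$:

c = [-6, 11, -6, 1];     // -6 + 11x - 6x^2 + x^3 = (x-1)(x-2)(x-3)
b = polyDeflate(c, 1);   // remove the factor (x-1)
disp(b);                 // [6 -5 1], i.e. 6 - 5x + x^2 = (x-2)(x-3)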

5 Finding the roots of a polynomial

We are now in a position to write a function that solves for all n roots of an nth order polynomial.
The algorithm is simply

Polynomial root-finding algorithm
for i=1 to n
	find a root r of p(z)
	remove a factor of (z-r) from p(z)

Newton's method actually works quite well at finding the complex zeros of a polynomial, and we
can use the rootNewton function we developed previously without modification. For the
purposes of calculating the derivative of a polynomial, note that if

	$p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{n+1} x^n$
then

	$p'(x) = c_2 + 2 c_3 x + \cdots + n\,c_{n+1} x^{n-1}$
	$= b_1 + b_2 x + \cdots + b_n x^{n-1}$

and for $1 \leq k \leq n$

	$b_k = k\,c_{k+1}$

In Scilab we can calculate these coefficients of the derivative as

b = c(2:n+1).*(1:n);

Now, one subtle point. If the coefficients $c_k$ are all real, then so are the coefficients $b_k$ . It
follows that if x is real then $x - f(x)/f'(x)$ is real also. Therefore, if Newton's method starts on
the real axis, it can never leave the real axis. For that reason we need to start with a complex
value of $x_0$ .
The function polyRoots shown in the Appendix is our implementation. We use
polyHorner to evaluate the polynomials. Iteratively we use rootNewton to find a (any)
root. Then we use polyDeflate to remove that root's factor and reduce the order of the
polynomial.
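A quick (assumed) check on the same cubic as above:

c = [-6, 11, -6, 1];  // (x-1)(x-2)(x-3)
r = polyRoots(c);
disp(r);              // expect values near 1, 2 and 3 (tiny imaginary parts may remain)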
This simple code actually works pretty well. Run on several hundred randomly generated 7th
order polynomials it only failed about one percent of the time. However, those failures
demonstrate why numerical analysis is an active field of research. In any type of problem there
are almost always some hard cases which thwart a given algorithm. This motivates people to
develop more advanced methods. For polynomial root finding some of the more advanced
methods are Laguerre's method, Bairstow's method, the Durand-Kerner method, the Jenkins-Traub
algorithm and the companion-matrix method. In its built-in root finding function Matlab
uses the companion-matrix method while in Scilab you can use either the companion-matrix
method (default) or the Jenkins-Traub algorithm.

5.1 Polishing the roots


Numerical root finding and polynomial deflation will be subject to round-off and finite tolerance
errors. The more we deflate a polynomial the more error can be introduced into the coefficients.
So, especially for a large-order polynomial, we should be suspicious of the roots we find with the
polyRoots function. It is a very good idea to polish the roots using Newton's method on the
original polynomial before performing deflation. This process should be very rapid since we are
(presumably) very close to a true root. The function polyRootsPolished in the Appendix
adds root polishing to polyRoots.

6 The roots function


In both Scilab and Matlab there is a built-in roots function for finding all roots of a
polynomial. It's as simple as
r = roots(p);

where p is a polynomial and r is an array containing all roots of p. For example, in Scilab


-->p = poly([1,2,3,4,5,6],'x','coeff')
 p  =
    1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5

-->r = roots(p)
 r  =
    0.2941946 + 0.6683671i
    0.2941946 - 0.6683671i
  - 0.6703320
  - 0.3756952 + 0.5701752i
  - 0.3756952 - 0.5701752i

Once you know the roots you can, if you wish, write the polynomial in factored form. If the
polynomial has real coefficients then complex roots come in conjugate pairs. A product of the
form $(z - r)(z - r^*)$ always reduces to a quadratic with real coefficients $z^2 + bz + c$ . To avoid
factors with complex roots Scilab has the polfact command

-->polfact(p)
 ans  =
    6
    0.5332650 - 0.5883891x + x^2
    0.4662466 + 0.7513904x + x^2
    0.6703320 + x

This tells us that our polynomial can be factored (approximately) as

	$1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 = 6\,(0.533 - 0.588x + x^2)(0.466 + 0.751x + x^2)(0.670 + x)$
Some other useful polynomial/rational functions in Scilab are
derivat, factors, numer, denom, simp_mode, clean, order

As always, see the help browser for more details.



8 Appendix: Scilab code


8.1 Horner's method

//////////////////////////////////////////////////////////////////////
// polyHorner.sci
// 2014-06-04, Scott Hudson
// Horner's method for polynomial evaluation. c=[c(1),c(2),...,c(n+1)]
// are coefficients of polynomial
// p(z) = c(1)+c(2)*z+c(3)*z^2+...+c(n+1)*z^n
// z is the number (can be complex) at which to evaluate polynomial
//////////////////////////////////////////////////////////////////////
function w=polyHorner(c, z)
    n = length(c)-1;
    w = c(n)+c(n+1)*z;
    for i=n-1:-1:1
        w = c(i)+w*z;
    end
endfunction

8.2 Polynomial deflation



//////////////////////////////////////////////////////////////////////
// polyDeflate.sci
// 2014-06-04, Scott Hudson
// Given the coefficients c = [c(1),c(2),...,c(n+1)] of polynomial
// p(z) = c(1)+c(2)*z+c(3)*z^2+...+c(n+1)*z^n
// and root r, p(r)=0, remove a factor of (z-r) from p(z) resulting in
// q(z) = b(1)+b(2)*z+...+b(n)*z^(n-1) of order one less than p(z)
// Return array of coefficients b = [b(1),b(2),...,b(n)]
//////////////////////////////////////////////////////////////////////
function b=polyDeflate(c, r)
    n = length(c)-1;
    b = zeros(1,n);
    b(n) = c(n+1);
    for k=n-1:-1:1
        b(k) = c(k+1)+r*b(k+1);
    end
endfunction


8.3 Polynomial root solver



//////////////////////////////////////////////////////////////////////
// polyRoots.sci
// 2014-06-04, Scott Hudson, for pedagogic purposes only!
// Given an array of coefficients c=[c(1),c(2),...,c(n+1)]
// defining a polynomial p(z) = c(1)+c(2)*z+...+c(n+1)*z^n
// find the n roots using Newton's method (with complex arguments)
// followed by polynomial deflation. The derivative polynomial is
// b(1)+b(2)*z+b(3)*z^2+...+b(n)*z^(n-1) =
// c(2)+2*c(3)*z+3*c(4)*z^2+...+n*c(n+1)*z^(n-1)
//////////////////////////////////////////////////////////////////////
function r=polyRoots(c)
    n = length(c)-1; //order of polynomial
    b = zeros(1,n); //coefficients of polynomial derivative
    b = c(2:n+1).*(1:n); //b(k) = c(k+1)*k
    deff('y=f(z)','y=polyHorner(c,z)'); //f(x) for Newton method
    deff('y=fp(z)','y=polyHorner(b,z)'); //fp(x) for same
    r = zeros(n,1);
    z0 = 1+%i; //initial search point, should not be real
    for i=1:n-1
        r(i) = rootNewton(z0,f,fp,1e-8);
        c = polyDeflate(c,r(i));
        m = length(c)-1; //order of deflated polynomial
        b = c(2:m+1).*(1:m); //b(k) = c(k+1)*k
    end
    r(n) = -c(1)/c(2); //last root is solution of c(1)+c(2)*z=0
endfunction


8.4 Polynomial root solver with root polishing


//////////////////////////////////////////////////////////////////////
// polyRootsPolished.sci
// 2014-06-04, Scott Hudson, for pedagogic purposes
// Given an array of coefficients c=[c(1),c(2),...,c(n+1)]
// defining a polynomial p(z) = c(1)+c(2)*z+...+c(n+1)*z^n
// find the n roots using Newton's method (with complex arguments)
// followed by polynomial deflation. The derivative polynomial is
// b(1)+b(2)*z+b(3)*z^2+...+b(n)*z^(n-1) =
// c(2)+2*c(3)*z+3*c(4)*z^2+...+n*c(n+1)*z^(n-1)
// Each root is polished before deflation
//////////////////////////////////////////////////////////////////////
function r=polyRootsPolished(c)
    n = length(c)-1; //order of polynomial
    b = zeros(1,n); //coefficients of polynomial derivative
    b = c(2:n+1).*(1:n); //b(k) = c(k+1)*k
    deff('y=f(z)','y=polyHorner(c,z)'); //f(x) for Newton method
    deff('y=fp(z)','y=polyHorner(b,z)'); //fp(x) for same
    r = zeros(n,1);
    z0 = 1+%i; //initial search point, should not be real
    c0 = c; //save original coefficients for polishing
    b0 = b;
    deff('y=f0(z)','y=polyHorner(c0,z)'); //f(x) for orig. poly
    deff('y=fp0(z)','y=polyHorner(b0,z)'); //fp(x) for same
    for i=1:n-1
        r(i) = rootNewton(z0,f,fp,1e-4);
        r(i) = rootNewton(r(i),f0,fp0,1e-8); //polish root using original poly
        c = polyDeflate(c,r(i));
        m = length(c)-1; //order of deflated polynomial
        b = c(2:m+1).*(1:m); //b(k) = c(k+1)*k
    end
    r(n) = -c(1)/c(2); //last root is solution of c(1)+c(2)*z=0
    r(n) = rootNewton(r(n),f0,fp0,1e-8); //polish root using original poly
endfunction


9 Appendix: Matlab code


9.1 Horner's method
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% polyHorner.m
%% 2014-06-04, Scott Hudson
%% Horner's method for polynomial evaluation. c=[c(1),c(2),...,c(n+1)]
%% are coefficients of polynomial
%% p(z) = c(1)+c(2)*z+c(3)*z^2+...+c(n+1)*z^n
%% z is the number (can be complex) at which to evaluate polynomial
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function w = polyHorner(c,z)
    n = length(c)-1;
    w = c(n)+c(n+1)*z;
    for i=n-1:-1:1
        w = c(i)+w*z;
    end
end

9.2 Polynomial deflation


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% polyDeflate.m
%% 2014-06-04, Scott Hudson
%% Given the coefficients c = [c(1),c(2),...,c(n+1)] of polynomial
%% p(z) = c(1)+c(2)*z+c(3)*z^2+...+c(n+1)*z^n
%% and root r, p(r)=0, remove a factor of (z-r) from p(z) resulting in
%% q(z) = b(1)+b(2)*z+...+b(n)*z^(n-1) of order one less than p(z)
%% Return array of coefficients b = [b(1),b(2),...,b(n)]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b = polyDeflate(c,r)
    n = length(c)-1;
    b = zeros(1,n);
    b(n) = c(n+1);
    for k=n-1:-1:1
        b(k) = c(k+1)+r*b(k+1);
    end
end


9.3 Polynomial root solver


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% polyRoots.m
%% 2014-06-04, Scott Hudson, for pedagogic purposes
%% Given an array of coefficients c=[c(1),c(2),...,c(n+1)]
%% defining a polynomial p(z) = c(1)+c(2)*z+...+c(n+1)*z^n
%% find the n roots using Newton's method (with complex arguments)
%% followed by polynomial deflation. The derivative polynomial is
%% b(1)+b(2)*z+b(3)*z^2+...+b(n)*z^(n-1) =
%% c(2)+2*c(3)*z+3*c(4)*z^2+...+n*c(n+1)*z^(n-1)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = polyRoots(c)
    n = length(c)-1; %%order of polynomial
    b = zeros(1,n); %%coefficients of polynomial derivative
    b = c(2:n+1).*(1:n); %%b(k) = c(k+1)*k
    function y = f(z)
        y = polyHorner(c,z);
    end
    function y = fp(z)
        y = polyHorner(b,z);
    end
    r = zeros(n,1);
    z0 = 1+1i; %%initial search point, should not be real
    for i=1:n-1
        r(i) = rootNewton(z0,@f,@fp,1e-8);
        c = polyDeflate(c,r(i));
        m = length(c)-1; %%order of deflated polynomial
        b = c(2:m+1).*(1:m); %%b(k) = c(k+1)*k
    end
    r(n) = -c(1)/c(2); %%last root is solution of c(1)+c(2)*z=0
end


9.4 Polynomial root solver with root polishing


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% polyRootsPolished.m
%% 2014-06-04, Scott Hudson, for pedagogic purposes
%% Given an array of coefficients c=[c(1),c(2),...,c(n+1)]
%% defining a polynomial p(z) = c(1)+c(2)*z+...+c(n+1)*z^n
%% find the n roots using Newton's method (with complex arguments)
%% followed by polynomial deflation. The derivative polynomial is
%% b(1)+b(2)*z+b(3)*z^2+...+b(n)*z^(n-1) =
%% c(2)+2*c(3)*z+3*c(4)*z^2+...+n*c(n+1)*z^(n-1)
%% Each root is polished before deflation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = polyRootsPolished(c)
    n = length(c)-1; %%order of polynomial
    b = zeros(1,n); %%coefficients of polynomial derivative
    b = c(2:n+1).*(1:n); %%b(k) = c(k+1)*k
    function y = f(z)
        y = polyHorner(c,z);
    end
    function y = fp(z)
        y = polyHorner(b,z);
    end
    r = zeros(n,1);
    z0 = 1+1i; %%initial search point, should not be real
    c0 = c; %%save original coefficients for polishing
    b0 = b;
    function y = f0(z)
        y = polyHorner(c0,z);
    end
    function y = fp0(z)
        y = polyHorner(b0,z);
    end
    for i=1:n-1
        r(i) = rootNewton(z0,@f,@fp,1e-4);
        r(i) = rootNewton(r(i),@f0,@fp0,1e-8); %%polish root using original poly
        c = polyDeflate(c,r(i));
        m = length(c)-1; %%order of deflated polynomial
        b = c(2:m+1).*(1:m); %%b(k) = c(k+1)*k
    end
    r(n) = -c(1)/c(2); %%last root is solution of c(1)+c(2)*z=0
    r(n) = rootNewton(r(n),@f0,@fp0,1e-8); %%polish root using original poly
end


Lecture 10
Linear algebra
1 Introduction
Engineers deal with many vector quantities, such as forces, positions, velocities, heat flow, stress
and strain, gravitational and electric fields and on and on. In this lecture we want to review basic
concepts and operations on vectors. We will see how a linear system of equations can naturally
arise from physical constraints on linear combinations of vectors, and how the resulting
bookkeeping naturally leads to the idea of a matrix and matrix-vector equations.

2 Vectors
Abstractly a vector is simply a one-dimensional array of numbers. The dimension of the vector is
the number of elements in the array. When arranged vertically we call this a column vector. The
following are three-dimensional column vectors

	$\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \qquad \mathbf{v} = \begin{pmatrix} 4 \\ 3 \\ 1 \end{pmatrix}$

A horizontal arrangement is called a row vector, such as

	$\mathbf{w} = (11, 7, 2) \qquad \mathbf{r} = (x, y, z)$

In engineering applications a vector almost always represents a physical quantity that has both
magnitude and direction. The elements of the array are the components of the vector along the
corresponding coordinate axis. It's very useful to visualize a vector as an arrow in space with the
same direction and the arrow length representing the magnitude.
As an example, the two-dimensional vector

	$\mathbf{r} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$

can be graphically represented (Fig. 1) as an arrow from the origin to a point with rectangular
coordinates $(3,2)$ . We say r has component 3 in the x direction and component 2 in the y
direction. Converting the x,y coordinates into polar form

	$x = \rho \cos\theta$
	$y = \rho \sin\theta$
	$\rho = \sqrt{x^2 + y^2} = \sqrt{13} \approx 3.61$
	$\theta = \tan^{-1}\frac{y}{x} = \tan^{-1}\frac{2}{3} \approx 33.7^\circ$

We identify the length $\rho$ as the magnitude of the vector and $\theta$ as its direction (relative to
the x axis).
[Fig. 1: Length and direction of a vector. In two dimensions these are just the polar coordinate
representation of the vector components.]

Here's a potential source of confusion. There is not necessarily anything physically significant
about the arrow's endpoint, $(3,2)$ in this case. Say this vector represents a force acting on a
particle at the origin. Then the force exists at a single point, the origin. It does not exist at the
point $(3,2)$ or anywhere except the origin. A force acting on a point particle has no extension in
space. We are simply using a physical arrow to visualize the magnitude and direction of the
force. The coordinates in this case might have units of newtons. On the other hand, suppose the
vector r represents a displacement in which a particle originally at the origin is moved to the
point $x=3$ , $y=2$ . In this case there is a physical significance to the arrow's endpoint. This is just
something to keep in mind. We get so accustomed to representing physical quantities such as
forces and velocities by arrows that it's easy to forget that they do not necessarily physically
coincide with these arrows.
A vector does not have to start at the origin. Suppose a particle is at location $x=3\,\text{m}$ , $y=2\,\text{m}$
and moving with velocity $v_x = 2\,\text{m/s}$ , $v_y = 1\,\text{m/s}$ . We could illustrate this situation as shown in
Fig. 2.
In this case it's more physically meaningful to put the tail of the v vector at the particle's location.
Again, the location of the head of the v vector is not physically significant. In fact the r and v
vectors don't even have the same units (m vs. m/s). However, we could say that if the particle
traveled at velocity v for 1 second it would end up at the head of the vector v, which is location

$x=5$ , $y=1$ . In fact, assuming v remains constant, we could write the particle position as the
vector

	$\mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 1 \end{pmatrix} t = \begin{pmatrix} 3 + 2t \\ 2 + t \end{pmatrix}$

[Fig. 2: Particle at position r moving with velocity v. If the particle moved with constant velocity
for 1 second it would arrive at the head of v.]

[Fig. 3: If v is an instantaneous velocity the particle does not follow the vector v for any extended
length of time.]

This illustrates the concept of vector addition. Often, however, v will represent the instantaneous
velocity of a particle following a curved trajectory, as illustrated in Fig. 3. In this case the particle
will not follow the v vector (except for an instant) or arrive at its head.

2.1 Vector norm (length)

In two- and three-dimensional geometry we have the Pythagorean theorem which gives us the
length of a displacement with components x,y or x,y,z as

	$L = \sqrt{x^2 + y^2} \;\text{ or }\; L = \sqrt{x^2 + y^2 + z^2}$

We generalize this to an n-dimensional vector u to define the vector norm as

	$\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$	(1)

Even though the length of a ten-dimensional vector doesn't have a direct physical meaning, the
norm concept is very useful.
This formula is actually only one way to define a vector norm. The p-norm is defined as

	$\|\mathbf{u}\|_p = \left( |u_1|^p + |u_2|^p + \cdots + |u_n|^p \right)^{1/p}$
The Euclidean norm of the Pythagorean theorem would then be called the 2-norm. When
$p \to \infty$ we obtain the infinity-norm

	$\|\mathbf{u}\|_\infty = \max_k |u_k|$

which has important applications in, among other things, control systems theory. Suppose the kth
element of u represents the distance traveled during the kth segment of a trip. Then the 1-norm

	$\|\mathbf{u}\|_1 = |u_1| + |u_2| + \cdots + |u_n|$

is just the total length of the trip. When vectors are represented by boldface letters the
corresponding italic letter is often taken to represent the norm

	$w = \|\mathbf{w}\|$
In both Scilab and Matlab the function norm(x,p) calculates the p-norm of vector x. For
example
-->x = [1;2;3]
x =
1.
2.
3.
-->norm(x,1)
ans =
6.
-->norm(x,2)
ans =
3.7416574
-->norm(x,'inf')
ans =
3.
-->norm(x)
ans =
3.7416574

Note that norm(x) gives the default 2-norm or Euclidean norm. In this class we will take
norm to mean Euclidean norm, unless explicitly stated otherwise.
A vector with norm of 1 is called a unit vector. We can make any non-zero vector a unit vector
by dividing it by its norm. Commonly a "hat" or the letter a with a subscript is used to denote a
unit vector, for example

	$\mathbf{a}_u = \hat{\mathbf{u}} = \frac{\mathbf{u}}{\|\mathbf{u}\|}$

A unit vector represents a pure direction. In two and three dimensions this is literally a
direction in space (Fig. 4), but in higher dimensions it's a direction only in an abstract sense.
The concept is still very useful, however.

[Fig. 4: Unit vectors centered at the origin fall on the unit circle in 2D and on the unit sphere in 3D.]

3 Scalar/inner/dot product
In three dimensions the scalar product (also called the inner product or dot product) of two
vectors is

	$\mathbf{u}\cdot\mathbf{v} = u_1 v_1 + u_2 v_2 + u_3 v_3 = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta_{uv}$	(2)

where $\theta_{uv}$ is the angle between the vectors. The scalar product readily generalizes to
n-dimensional vectors as

	$\mathbf{u}\cdot\mathbf{v} = \sum_{i=1}^{n} u_i v_i$	(3)

We can still write

	$\mathbf{u}\cdot\mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta_{uv}$	(4)

which defines the angle between u and v as

	$\cos\theta_{uv} = \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \hat{\mathbf{u}}\cdot\hat{\mathbf{v}}$	(5)

Two vectors with a zero scalar product are said to be orthogonal. Since $\cos 90^\circ = 0$ this means
that the vectors form a right angle; they are perpendicular.
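A small Scilab sketch of (5), with two assumed vectors:

u = [1;2;2]; v = [2;0;1];
costheta = (u'*v)/(norm(u)*norm(v));   // inner product over the product of norms
theta = acos(costheta)*180/%pi;        // about 53.4 degrees for these vectors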

4 Vector/cross product
In three dimensions the vector product (also called the cross product) of two vectors is

	$\mathbf{w} = \mathbf{u}\times\mathbf{v} = \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix}$	(6)

The vector product is specific to three dimensions; it does not readily generalize to n dimensions.
It is very important in many applications. For example, torque about the origin is the cross
product of force and position. The magnitude of the cross product is

	$\|\mathbf{w}\| = \|\mathbf{u}\|\,\|\mathbf{v}\| \sin\theta_{uv}$	(7)

Since $\sin 0 = 0$ the vector product of parallel vectors is zero.
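A minimal Scilab sketch implementing (6) directly (the function name is mine):

function w = crossProd(u, v)
    w = [u(2)*v(3)-u(3)*v(2); u(3)*v(1)-u(1)*v(3); u(1)*v(2)-u(2)*v(1)];
endfunction

disp(crossProd([1;0;0], [0;1;0])');  // [0 0 1], as x cross y = z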

5 Matrix-vector product
Suppose we have two, two-dimensional vectors

	$\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$	(8)

A certain displacement might be described as "travel $x_1$ times u followed by $x_2$ times v" to end
up at y. Algebraically we have $x_1 \mathbf{u} + x_2 \mathbf{v} = \mathbf{y}$ or

	$x_1 \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + x_2 \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$	(9)

In terms of components

	$u_1 x_1 + v_1 x_2 = y_1$
	$u_2 x_1 + v_2 x_2 = y_2$	(10)

For bookkeeping purposes we will write this as

	$\begin{pmatrix} u_1 & v_1 \\ u_2 & v_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$	(11)

where the two-dimensional array is a matrix, the columns of which are the vectors u and v.
Thinking of this array as a single entity A we can specify its elements using two indices

	$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$	(12)

This is just a different way to express the two equations

	$a_{11} x_1 + a_{12} x_2 = y_1$
	$a_{21} x_1 + a_{22} x_2 = y_2$	(13)

Employing the notation

	$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\,,\quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\,,\quad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$	(14)

we can write (13) compactly as the matrix-vector equation

	$\mathbf{A}\mathbf{x} = \mathbf{y}$
If A is an m-by-n matrix

	$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$	(15)

and x is an n-by-1 matrix (a column vector)

	$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$	(16)

then the product $\mathbf{y} = \mathbf{A}\mathbf{x}$ is an m-by-1 matrix (a column vector)

	$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$	(17)

with components

	$y_i = \sum_{j=1}^{n} a_{ij} x_j$	(18)

Note that the product Ax as defined by (18) only works if the number of columns of A is equal to
the number of elements of x (in this case both are n); each column of A gets multiplied by the
corresponding element of x.

[Fig. 5: The product Ax=y can be thought of as summing scaled versions of the column vectors of A.]
We are most often (but not always) interested in the case m=n where the matrix A is square. In any case we can visualize the linear system

$$A\mathbf{x} = \mathbf{y} \qquad (19)$$

as follows (Fig. 5):

• think of each column of A as a vector
• scale the jth column by the factor $x_j$
• sum up all the scaled vectors to get y (a quick Scilab check follows)
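For example, this picture can be checked directly in Scilab (the matrix values are arbitrary):

-->A = [1 2; 3 4]; x = [5; 6];
-->A*x
 ans  =
    17.
    39.
-->x(1)*A(:,1) + x(2)*A(:,2)   // scaled columns, summed
 ans  =
    17.
    39.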

In this visualization we assume that x is known and we want to calculate y. Often we are faced with the inverse problem where y is known and we want to calculate x. We formally write

$$\mathbf{x} = A^{-1}\mathbf{y} \qquad (20)$$

where $A^{-1}$ is the inverse of matrix A. Solving this problem will be the topic of the next lecture. For now we want to motivate our study of linear systems of equations by considering two important problems which give rise to such systems.


Fig. 6 A truss bridge formed of members connected at joints.

6 Two-dimensional truss problem


In Civil Engineering a classic problem is to design a bridge structure that can withstand some
specified loads (Fig. 6). In its simplified form this leads to the two-dimensional truss problem
(Fig. 7). We assume that the structure consists of a number of linear members of negligible mass
which are connected at their ends by frictionless pins at various joints. The members are either in
compression or tension, so they either push or pull on the joints. External forces may be applied
at the joints. Two joints, that we will take to be joints 1 and 2, connect the structure to the
ground, and at these joints the ground exerts reaction forces. The truss problem is to calculate the
member compression/tension forces and the reaction forces, given the truss geometry and
external applied forces. The conditions for static equilibrium are that the net vector force at each
joint is zero (the so-called method of joints). In two dimensions this gives us two equations
$$F_x^{\text{net}} = 0 \qquad F_y^{\text{net}} = 0 \qquad (21)$$

at each joint.
To be specific let's take the system illustrated in Fig. 7. There are four joints located at $(x_k, y_k),\ k=1,2,3,4$ and five members with compression forces $u_k,\ k=1,2,3,4,5$. An external force is applied at joint 4 with components $(f_{4x}, f_{4y})$. A reaction force (due to the mounting of the system) is applied at joint 1 with components $(r_{1x}, r_{1y})$ and a reaction force with y component $r_{2y}$ is applied at joint 2. Assuming the force $(f_{4x}, f_{4y})$ is known, there are eight unknowns which form the components of an eight-dimensional vector:


Fig. 7: Geometry of the truss problem. Member $m_i$ has compression force $u_i$ and connects to other members at joints $j_k$. Applied forces f and reaction forces r also act on certain joints.

$$\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 = r_{x1} \\ u_7 = r_{y1} \\ u_8 = r_{y2} \end{pmatrix} \qquad (22)$$

This bookkeeping is an example of how higher-dimensional spaces arise. Our vector is a collection of various (arbitrarily arranged) physical parameters, as opposed to a three-dimensional force or velocity with a direct physical significance.
A given member pushes on a given joint in a direction determined by the member's endpoints.
For example, member $m_3$ pushes on joint $j_2$ at an angle $\theta_{23}$ where

$$\tan\theta_{23} = \frac{y_2 - y_3}{x_2 - x_3} \qquad (23)$$

In general we will define

$$\tan\theta_{ij} = \frac{y_i - y_j}{x_i - x_j} \qquad (24)$$

The equations of equilibrium are now as follows.


At $j_1$:

$$u_1\cos\theta_{12} + u_2\cos\theta_{13} + u_6 = 0$$
$$u_1\sin\theta_{12} + u_2\sin\theta_{13} + u_7 = 0 \qquad (25)$$

At $j_2$:

$$u_1\cos\theta_{21} + u_3\cos\theta_{23} + u_5\cos\theta_{24} = 0$$
$$u_1\sin\theta_{21} + u_3\sin\theta_{23} + u_5\sin\theta_{24} + u_8 = 0 \qquad (26)$$

At $j_3$:

$$u_2\cos\theta_{31} + u_3\cos\theta_{32} + u_4\cos\theta_{34} = 0$$
$$u_2\sin\theta_{31} + u_3\sin\theta_{32} + u_4\sin\theta_{34} = 0 \qquad (27)$$

At $j_4$:

$$u_4\cos\theta_{43} + u_5\cos\theta_{42} + f_{4x} = 0$$
$$u_4\sin\theta_{43} + u_5\sin\theta_{42} + f_{4y} = 0 \qquad (28)$$

Putting these together into an eight-by-eight linear system we have

$$A\mathbf{u} = \mathbf{b} \qquad (29)$$

with

$$A = \begin{pmatrix}
\cos\theta_{12} & \cos\theta_{13} & 0 & 0 & 0 & 1 & 0 & 0 \\
\sin\theta_{12} & \sin\theta_{13} & 0 & 0 & 0 & 0 & 1 & 0 \\
\cos\theta_{21} & 0 & \cos\theta_{23} & 0 & \cos\theta_{24} & 0 & 0 & 0 \\
\sin\theta_{21} & 0 & \sin\theta_{23} & 0 & \sin\theta_{24} & 0 & 0 & 1 \\
0 & \cos\theta_{31} & \cos\theta_{32} & \cos\theta_{34} & 0 & 0 & 0 & 0 \\
0 & \sin\theta_{31} & \sin\theta_{32} & \sin\theta_{34} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \cos\theta_{43} & \cos\theta_{42} & 0 & 0 & 0 \\
0 & 0 & 0 & \sin\theta_{43} & \sin\theta_{42} & 0 & 0 & 0
\end{pmatrix} \qquad (30)$$

and

$$\mathbf{b} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -f_{4x} \\ -f_{4y} \end{pmatrix} \qquad (31)$$

Notice that the applied forces show up in the b vector (the knowns) while the member and reaction forces form the u vector (the unknowns).
From a programming perspective, the challenge would be to generalize this process to allow the
solution of an arbitrary truss problem. This would mostly involve figuring out a systematic way
to do the bookkeeping involved in forming the A matrix and the b vector.
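One small piece of that bookkeeping can be sketched as follows. Assuming the joint coordinates are stored in vectors x and y, a helper function (the name dircos is ours, not part of the lecture's code) returns the direction cosines $\cos\theta_{ij}$ and $\sin\theta_{ij}$ of (24) needed to fill a row of A:

function [c, s]=dircos(i, j, x, y)
    // direction cosines of the angle theta_ij defined in (24)
    d = sqrt((x(i)-x(j))^2 + (y(i)-y(j))^2);  // length of member from j to i
    c = (x(i)-x(j))/d;   // cos(theta_ij)
    s = (y(i)-y(j))/d;   // sin(theta_ij)
endfunction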

Fig. 8: Two-dimensional grid for solving Laplace's equation.

7 Laplace's equation in two dimensions


A very important equation in engineering analysis is Laplace's equation

$$\nabla^2 f = 0$$

where f is a continuous function of space such as $f(x,y)$. This field might represent, say, temperature distribution over the surface of a metal plate. A solution of Laplace's equation has the property that the value of f at any point is equal to the average value of f at neighboring points. This is why it comes up so often in equilibrium problems: Nature usually wants physical quantities (temperature, pressure, electrical potential) to be as smooth, or as "averaged out," as possible.
One approach to solving Laplace's equation numerically is to specify f at a discrete grid of points and require that f at each point be equal to the average of f at neighboring points. Let's consider a specific two-dimensional case illustrated in Fig. 8. The indices i,j specify the x,y location, respectively. The condition that f at location i,j is equal to the average of f at its four nearest-neighbor points is

$$f_{i,j} = \frac{1}{4}\left( f_{i+1,j} + f_{i-1,j} + f_{i,j+1} + f_{i,j-1} \right) \qquad (32)$$

or equivalently

$$4f_{i,j} - f_{i+1,j} - f_{i-1,j} - f_{i,j+1} - f_{i,j-1} = 0 \qquad (33)$$

We assume the f values on the boundary are specified. These form the boundary conditions. Our
task is then to calculate the interior f values such that Laplace's equation is satisfied.
To use our matrix-vector formalism (29) the unknowns need to be arranged in a one-dimensional column vector. As illustrated in Fig. 8, one way to do this is to number the interior points from 1 to 16 as unknowns

$$u_k = f_{i,j} \quad\text{where}\quad k = 4(j-2) + (i-1)$$
Just running through the 16 points and considering (33) we can, by inspection, obtain a 16-by-16
linear system of the form (29) with


$$A = \begin{pmatrix}
4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 4 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & -1 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & -1 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 4 & -1 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & -1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & -1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 4 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 4
\end{pmatrix} \qquad (34)$$

and

$$\mathbf{b} = \begin{pmatrix}
f_{1,2} + f_{2,1} \\ f_{3,1} \\ f_{4,1} \\ f_{5,1} \\ f_{1,3} \\ 0 \\ 0 \\ f_{6,3} \\ f_{1,4} \\ 0 \\ 0 \\ f_{6,4} \\ f_{1,5} + f_{2,6} \\ f_{3,6} \\ f_{4,6} \\ f_{5,6} + f_{6,5}
\end{pmatrix} \qquad (35)$$

The kth row of this system is a statement of (33) for unknown $u_k$. This illustrates a few things. First, the dimension n of our linear system is determined by the number of unknowns, not by the 2 or 3 dimensions of physical space. This number can easily be very large. Suppose we wanted to have a 100-by-100 two-dimensional grid of unknown field values. This is not actually very large; after all, a 100-by-100 pixel image is essentially a thumbnail. Yet this results in n=10,000

unknowns and a matrix A that is 10,000-by-10,000 in size. In three dimensions the problem dimension would be $n = 100^3 = 1{,}000{,}000$, and our matrix would have $(10^6)^2 = 10^{12}$, or one trillion, entries! Yet these are often the size of problems we need to solve in engineering applications. Second, and fortunately for us, a glance at (34) and consideration of the way it was built using (33) reveals that the great majority of entries in A will be zeros. We say that A is a sparse matrix. So, even if it does contain a trillion elements, only a tiny fraction are non-zero. Sparse matrix techniques exploit this fact to store and manipulate such matrices using orders-of-magnitude less resources than would be needed for dense matrices, and they enable us to solve physically significant problems using available computing power. We will take a look at sparse matrix techniques in a later lecture.
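As a preview, the matrix (34) can be assembled in a few lines. The following is a minimal Scilab sketch, assuming the 4-by-4 interior grid of Fig. 8 (the variable names are ours):

n = 4;                    // interior points per side
A = zeros(n*n, n*n);
for jj = 1:n              // jj,ii index the interior grid
    for ii = 1:n
        k = n*(jj-1) + ii;   // unknown number, as in u_k = f(i,j)
        A(k,k) = 4;          // center coefficient in (33)
        if ii > 1 then A(k,k-1) = -1; end  // left neighbor
        if ii < n then A(k,k+1) = -1; end  // right neighbor
        if jj > 1 then A(k,k-n) = -1; end  // neighbor below
        if jj < n then A(k,k+n) = -1; end  // neighbor above
    end
end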


Lecture 11
Linear systems of equations I
1 Introduction
In this lecture we consider ways to solve a linear system Ax=b for x when given A and b.
Writing out the components, our system has the form

$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix} \qquad (1)$$

Each row of A corresponds to a single linear equation in the unknown x values. The ith equation is

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i$$

and the jth equation is

$$a_{j1}x_1 + a_{j2}x_2 + \cdots + a_{jn}x_n = b_j$$

We will use the following facts to transform the system (1) into a form in which the solution is trivially apparent, or at least can be easily calculated.

Fact 1: We can scale any equation by a non-zero constant ($c \neq 0$) without changing the solution:

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i \;\Rightarrow\; ca_{i1}x_1 + ca_{i2}x_2 + \cdots + ca_{in}x_n = cb_i$$

Fact 2: We can replace an equation by its sum (or difference) with another equation without changing the solution:

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i \;\Rightarrow\; (a_{i1}+a_{j1})x_1 + (a_{i2}+a_{j2})x_2 + \cdots + (a_{in}+a_{jn})x_n = b_i + b_j$$

2 Gauss-Jordan elimination
If A were the identity matrix,

$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix} \qquad (2)$$

the solution would be trivial:

$$x_k = b_k$$

Gauss-Jordan elimination is a process to convert an arbitrary system (1) into the trivial system (2). Since we want to end up with $a_{11}=1$, we use Fact 1 to multiply the first row of A and b by $1/a_{11}$:

$$a_{1j} \to a_{1j}/a_{11},\ j=1,2,\ldots,n,\qquad b_1 \to b_1/a_{11}$$

to obtain the form

$$\begin{pmatrix}
1 & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

(note $a_{12}, a_{13}$, etc. will have changed values). Now we use Fact 2 to eliminate the elements $a_{21}, a_{31}, \ldots, a_{n1}$ in the first column by subtracting $a_{i1}$ times the first row from the ith row

$$a_{ij} \to a_{ij} - a_{i1}a_{1j},\ j=1,2,\ldots,n,\qquad b_i \to b_i - a_{i1}b_1$$

for $i=2,3,\ldots,n$, resulting in

$$\begin{pmatrix}
1 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
0 & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

The first column is now in the desired form. Let's move to the second column. Since we want to end up with $a_{22}=1$, use Fact 1 to multiply the second row by $1/a_{22}$

$$a_{2j} \to a_{2j}/a_{22},\ j=2,3,\ldots,n,\qquad b_2 \to b_2/a_{22}$$

Note that we don't bother with j=1 since $a_{21}=0$. Then we use Fact 2 to eliminate all elements $a_{i2},\ i \neq 2$

$$a_{ij} \to a_{ij} - a_{i2}a_{2j},\ j=2,3,\ldots,n,\qquad b_i \to b_i - a_{i2}b_2$$

Again we don't bother with j=1 since $a_{21}=0$. We end up with

$$\begin{pmatrix}
1 & 0 & a_{13} & \cdots & a_{1n} \\
0 & 1 & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
0 & 0 & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

We now move on to the third column, and so on, continuing until we have the form shown in (2). Notice that the element $b_i$ is transformed in the same manner as the elements $a_{ij}$. This suggests that we form an augmented matrix as A plus b as an extra column, and then simply transform the a values as a whole. Let's call the augmented matrix $\tilde{A}$. Then

$$\tilde{A} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\
\vdots & & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n
\end{pmatrix} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & a_{1,n+1} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & a_{2,n+1} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & a_{3,n+1} \\
\vdots & & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & a_{n,n+1}
\end{pmatrix}$$

We simply replace $a_{ij} \to a_{ij} - a_{i2}a_{2j},\ j=2,3,\ldots,n$ by $a_{ij} \to a_{ij} - a_{i2}a_{2j},\ j=2,3,\ldots,n+1$ and the b values are automatically taken care of. Our algorithm can now be stated as

Gauss-Jordan elimination algorithm
Form the augmented matrix $\tilde{A} = (A, \mathbf{b})$
for $k = 1, 2, \ldots, n$
    for $j = n+1, n, n-1, \ldots, k$
        $a_{kj} \to a_{kj}/a_{kk}$
    for $i = 1, 2, \ldots, n$
        if $i \neq k$ (we don't want a row to eliminate itself)
            for $j = n+1, n, n-1, \ldots, k$
                $a_{ij} \to a_{ij} - a_{ik}a_{kj}$

When the loops are complete, the last column of $\tilde{A}$ will be the solution vector x.

Example 1: Consider the system Ax=b with

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},\quad \mathbf{b} = \begin{pmatrix} 4 \\ 10 \end{pmatrix}$$

We form the augmented matrix

$$\tilde{A} = \begin{pmatrix} 1 & 2 & 4 \\ 3 & 4 & 10 \end{pmatrix}$$

The first pivot, $a_{11}$, is already 1. We now subtract 3 times row 1 from row 2

$$\tilde{A} = \begin{pmatrix} 1 & 2 & 4 \\ 0 & -2 & -2 \end{pmatrix}$$

The second pivot is $a_{22} = -2$. We divide row 2 by this to get

$$\tilde{A} = \begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 1 \end{pmatrix}$$

We now subtract 2 times row 2 from row 1 to obtain

$$\tilde{A} = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}$$

The last column is the solution $\mathbf{x} = (2, 1)^T$.



The function linearGaussJordan shown in the Appendix is a Scilab implementation of this algorithm.
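For instance, applying it to the system of Example 1:

-->A = [1 2; 3 4]; b = [4; 10];
-->x = linearGaussJordan(A, b)
 x  =
    2.
    1.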
Now, suppose we solve the following problems one at a time using this method

$$A\mathbf{x}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\quad A\mathbf{x}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\quad A\mathbf{x}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix},\ \ldots,\ A\mathbf{x}_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$

Then

$$A(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n) = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}$$

or $AX = I$ where X is the matrix with column vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. It follows that X is the inverse of A. We don't have to solve n problems one at a time, however. The operations we perform with the $a_{ij}$ values will be the same regardless of the right-hand vector b. Therefore, we can simply form the augmented matrix

$$\tilde{A} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & 1 & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & 0 & 1 & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}$$
and perform Gauss-Jordan elimination on $\tilde{A}$ to effectively solve these n problems "in parallel." When complete, the last n columns of $\tilde{A}$ will be the inverse $A^{-1}$. In the Appendix, program linearGaussJordanInverse gives a Scilab implementation of this algorithm. Note the slight differences between this and linearGaussJordan.

Notice that the Gauss-Jordan elimination algorithm consists of three nested levels of for loops. Each runs (roughly speaking) over on the order of n values. It follows that the total number of operations is on the order of $n \cdot n \cdot n = n^3$. This gives a measure of the computational complexity, and therefore of the amount of cpu time, required for the algorithm. For this reason you will often see statements of the form "matrix inversion is an $O(n^3)$ operation," where the "big O" represents "order of."

3 Gaussian elimination
Gauss-Jordan elimination is a logical way to solve Ax=b or to find $A^{-1}$. However, there are faster and more robust methods. In particular we can solve Ax=b with only about 1/3 the number of operations using the so-called "LU decomposition" that we will develop in the next lecture. The price we pay is that getting the solution is more convoluted than it is for Gauss-Jordan elimination; it involves various "substitution" operations. Here we'll introduce this idea by considering the so-called "Gaussian elimination" algorithm.
In Gauss-Jordan elimination we zero-out all elements of A except those on the diagonal, and
we normalize the diagonal elements to 1. In Gaussian elimination we don't bother with the
elements above the diagonal; we only zero-out the elements below the diagonal. We also don't
bother to normalize the diagonal elements to 1. The result is a system in the form

$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix} \qquad (3)$$

The matrix is in upper-triangular form; all elements below the diagonal are zero. The algorithm
to achieve this is an obvious variation of Gauss-Jordan elimination.
Gaussian elimination algorithm
Form the augmented matrix $\tilde{A} = (A, \mathbf{b})$
for $k = 1, 2, \ldots, n-1$
    for $i = k+1, k+2, \ldots, n$
        for $j = n+1, n, n-1, \ldots, k$
            $a_{ij} \to a_{ij} - a_{kj}a_{ik}/a_{kk}$
Since it has three nested for loops, each running over on the order of n values, this is also an $O(n^3)$ process. A problem with (3) is that the solution is not obvious (as it is for (2)) except for the last row, which gives us

$$a_{nn}x_n = b_n \;\Rightarrow\; x_n = b_n/a_{nn}$$

So $x_n$ is easy to get. The next-to-last equation is

$$a_{n-1,n-1}x_{n-1} + a_{n-1,n}x_n = b_{n-1}$$

But we already know $x_n$, so we can solve for

$$x_{n-1} = \frac{1}{a_{n-1,n-1}}\left( b_{n-1} - a_{n-1,n}x_n \right)$$

The second-to-last equation is

$$a_{n-2,n-2}x_{n-2} + a_{n-2,n-1}x_{n-1} + a_{n-2,n}x_n = b_{n-2}$$

and since we've already solved for $x_{n-1}, x_n$ we can solve for the single remaining unknown

$$x_{n-2} = \frac{1}{a_{n-2,n-2}}\left( b_{n-2} - [a_{n-2,n-1}x_{n-1} + a_{n-2,n}x_n] \right)$$
Continuing to work our way up row-by-row we have the following algorithm.


Back-substitution algorithm
If A is upper-triangular, the solution to Ax=b is given by
for $i = n, n-1, n-2, \ldots, 1$

$$x_i = \frac{1}{a_{ii}}\left( b_i - \sum_{j=i+1}^{n} a_{ij}x_j \right)$$

(the summation is zero when i=n)

A Scilab implementation of Gaussian elimination followed by back-substitution appears in the Appendix as linearGaussian. The back-substitution algorithm has two nested for loops (i and j) and is therefore an $O(n^2)$ process. This tells us that, at least for large matrices (large n values), back-substitution is much faster than Gaussian elimination or Gauss-Jordan elimination. The fact that we've had to add this step, therefore, is not of much concern regarding cpu time.

4 The need for pivoting


Looking over the Gauss-Jordan and Gaussian elimination algorithms it is clear that the only way they can go wrong is if $a_{kk}=0$ for some value of k, since we would then be trying to divide by zero. Note that when we get to this point in either algorithm, in general $a_{kk}$ will not have the same value it had in the original matrix A, but will have been modified by the various subtractions performed in the algorithm. Therefore, even if none of the diagonal elements of A are zero, we can still end up with $a_{kk}=0$ at some stage in the algorithm. Furthermore, even if $a_{kk}$ is non-zero but very small, this will still cause problems by amplifying accumulated round-off error by the large factor $1/a_{kk}$.

The diagonal elements $a_{kk}$ are called pivots, and the way to avoid the problem of a small or zero pivot is through pivoting: swapping two rows or two columns of the augmented matrix. Pivoting is a critical requirement for a robust algorithm, and without it the code given in the Appendix is not generally reliable.
Let's again consider the general problem Ax=b written out in component form

$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}$$

Each row represents a single linear equation in the unknowns. In what way will the solution change if we swap, say, the first and third rows of both A and b?

$$\begin{pmatrix}
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
\vdots & & & & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_3 \\ b_2 \\ b_1 \\ \vdots \\ b_n \end{pmatrix}$$

The answer is that it won't. We've just rearranged the same n equations in n unknowns. It doesn't

matter in what order we write them; the solution will remain the same.
Fact 3: Any two rows of the augmented matrix $\tilde{A}$ can be swapped without changing the solution vector x.
What about swapping columns? Say we swap the first and third columns of A.

$$\begin{pmatrix}
a_{13} & a_{12} & a_{11} & \cdots & a_{1n} \\
a_{23} & a_{22} & a_{21} & \cdots & a_{2n} \\
a_{33} & a_{32} & a_{31} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n3} & a_{n2} & a_{n1} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}$$

Thinking of Ax as a linear combination of the columns of A, the ith column of A gets multiplied by $x_i$. Swapping the columns is equivalent to relabeling the two corresponding components of x. In other words, if we write the system as

$$\begin{pmatrix}
a_{13} & a_{12} & a_{11} & \cdots & a_{1n} \\
a_{23} & a_{22} & a_{21} & \cdots & a_{2n} \\
a_{33} & a_{32} & a_{31} & \cdots & a_{3n} \\
\vdots & & & & \vdots \\
a_{n3} & a_{n2} & a_{n1} & \cdots & a_{nn}
\end{pmatrix}\begin{pmatrix} x_3 \\ x_2 \\ x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}$$

it is just a rearrangement of our original system. Provided we remember to swap $x_3, x_1$ at the end, this system will give us the same solution as the original system.

Fact 4: Any two columns of the matrix A can be swapped without changing the solution vector x, except for a reordering of its elements.
The idea of full pivoting is that when we come to the place in our algorithm where we will be dividing by $a_{kk}$, we examine all the ways we can swap rows and columns so as to replace $a_{kk}$ by the largest-magnitude value possible. Provided we do the bookkeeping correctly, this won't change our solution, but it will make it numerically robust.

The difficulty with full pivoting is that for each loop there are a lot of possible row-plus-column swaps, and finding the best involves many absolute-value tests in each loop. Moreover, if we swap columns we need to keep track of this in order to untangle the x values at the end. However, if we limit ourselves to a single row swap, which is called partial pivoting or row pivoting, then we need only check the magnitudes of the elements in the kth column, and we don't have to keep track of the swaps. Experience shows that partial pivoting is sufficient to produce reliable algorithms. Almost always in practice, therefore, only partial pivoting is used. From here on when we talk about pivoting we mean partial pivoting: swapping rows only.

Adding partial pivoting to the linearGaussian code results in the function linearGaussianPivot given in the Appendix. This is a serviceable routine.

5 References
1. Golub, G.H. and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press,
1983, ISBN: 0-8018-3011-7.


6 Appendix: Scilab code


6.1 Gauss-Jordan elimination

//////////////////////////////////////////////////////////////////////
// linearGaussJordan.sci
// 2014-06-23, Scott Hudson, for pedagogic purposes
// Solves Ax=b for x using Gauss-Jordan elimination.
// No pivoting is performed.
//////////////////////////////////////////////////////////////////////
function x=linearGaussJordan(A, b)
n = length(b);
A = [A,b]; //form augmented matrix
for k=1:n //A(k,k) is the pivot
for j=n+1:-1:k //normalize row k so A(k,k)=1
A(k,j) = A(k,j)/A(k,k);
end
for i=1:n //eliminate a(i,k) for all i~=k
if (i~=k) //a Pivot does not eliminate itself
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k);
end
end
end //i loop
end //k loop
x = A(:,n+1); //last column of augmented matrix is now x
endfunction

6.2 Matrix inverse using Gauss-Jordan elimination



//////////////////////////////////////////////////////////////////////
// linearGaussJordanInverse.sci
// 2014-06-23, Scott Hudson, for pedagogic purposes
// Forms inverse of matrix A using Gauss-Jordan elimination.
// No pivoting is performed.
//////////////////////////////////////////////////////////////////////
function Ainv=linearGaussJordanInverse(A)
n = size(A,'r');
A = [A,eye(A)]; //form augmented matrix
for k=1:n //A(k,k) is the pivot
for j=2*n:-1:k //normalize row k so A(k,k)=1
A(k,j) = A(k,j)/A(k,k);
end
for i=1:n //eliminate a(i,k) for all i~=k
if (i~=k) //a Pivot does not eliminate itself
for j=2*n:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k);
end
end
end //i loop
end //k loop
Ainv = A(:,n+1:2*n); //last n columns of augmented matrix are now the inverse
endfunction


6.3 Gaussian elimination



//////////////////////////////////////////////////////////////////////
// linearGaussian.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes
// Solves Ax=b for x using Gaussian elimination and backsubstitution.
// No pivoting is performed.
//////////////////////////////////////////////////////////////////////
function x=linearGaussian(A, b)
n = length(b);
A = [A,b]; //form augmented matrix
//Gaussian elimination loop
for k=1:n-1 //A(k,k) is the pivot
for i=k+1:n //eliminate a(i,k) for all i>k
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k)/A(k,k);
end
end //i loop
end //k loop
//Backsubstitution loop
x = zeros(n,1);
x(n) = A(n,n+1)/A(n,n);
for i=n-1:-1:1
x(i) = A(i,n+1);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j);
end
x(i) = x(i)/A(i,i);
end
endfunction


6.4 Gaussian elimination with partial pivoting



//////////////////////////////////////////////////////////////////////
// linearGaussianPivot.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes
// Solves Ax=b for x using Gaussian elimination and backsubstitution.
// Partial pivoting is performed.
//////////////////////////////////////////////////////////////////////
function x=linearGaussianPivot(A, b)
n = length(b);
A = [A,b]; //form augmented matrix
//Gaussian elimination loop
for k=1:n-1 //A(k,k) is the pivot
//see if there is a larger pivot below this in the kth column
Amax = abs(A(k,k));
imax = k;
for i=k+1:n
if abs(A(i,k))>Amax
Amax = abs(A(i,k));
imax = i;
end
end
if (imax~=k) //we found a larger pivot, swap rows
w = A(k,:); //copy the kth row
A(k,:) = A(imax,:); //replace it with the imax row
A(imax,:) = w; //replace the imax row with the original kth row
end
//pivoting complete
for i=k+1:n //eliminate a(i,k) for all i>k
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k)/A(k,k);
end
end //i loop
end //k loop
//Backsubstitution loop
x = zeros(n,1);
x(n) = A(n,n+1)/A(n,n);
for i=n-1:-1:1
x(i) = A(i,n+1);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j);
end
x(i) = x(i)/A(i,i);
end
endfunction


7 Appendix: Matlab code


7.1 Gauss-Jordan elimination
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearGaussJordan.m
%% 2014-06-23, Scott Hudson, for pedagogic purposes
%% Solves Ax=b for x using Gauss-Jordan elimination.
%% No pivoting is performed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x = linearGaussJordan(A,b)
n = length(b);
A = [A,b]; %%form augmented matrix
for k=1:n %%A(k,k) is the pivot
for j=n+1:-1:k %%normalize row k so A(k,k)=1
A(k,j) = A(k,j)/A(k,k);
end
for i=1:n %%eliminate a(i,k) for all i~=k
if (i~=k) %%a Pivot does not eliminate itself
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k);
end
end
end %%i loop
end %%k loop
x = A(:,n+1); %%last column of augmented matrix is now x
end

7.2 Matrix inverse using Gauss-Jordan elimination


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearGaussJordanInverse.m
%% 2014-06-23, Scott Hudson, for pedagogic purposes
%% Forms inverse of matrix A using Gauss-Jordan elimination.
%% No pivoting is performed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Ainv = linearGaussJordanInverse(A)
n = size(A,1);
A = [A,eye(size(A))]; %%form augmented matrix
for k=1:n %%A(k,k) is the pivot
for j=2*n:-1:k %%normalize row k so A(k,k)=1
A(k,j) = A(k,j)/A(k,k);
end
for i=1:n %%eliminate a(i,k) for all i~=k
if (i~=k) %%a Pivot does not eliminate itself
for j=2*n:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k);
end
end
end %%i loop
end %%k loop
Ainv = A(:,n+1:2*n); %%last n columns of augmented matrix are now the inverse
end


7.3 Gaussian elimination


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearGaussian.m
%% 2014-06-25, Scott Hudson, for pedagogic purposes
%% Solves Ax=b for x using Gaussian elimination and backsubstitution.
%% No pivoting is performed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x = linearGaussian(A,b)
n = length(b);
A = [A,b]; %%form augmented matrix
%%Gaussian elimination loop
for k=1:n-1 %%A(k,k) is the pivot
for i=k+1:n %%eliminate a(i,k) for all i>k
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k)/A(k,k);
end
end %%i loop
end %%k loop
%%Backsubstitution loop
x = zeros(n,1);
x(n) = A(n,n+1)/A(n,n);
for i=n-1:-1:1
x(i) = A(i,n+1);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j);
end
x(i) = x(i)/A(i,i);
end
end


7.4 Gaussian elimination with partial pivoting


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearGaussianPivot.m
%% 2014-06-25, Scott Hudson, for pedagogic purposes
%% Solves Ax=b for x using Gaussian elimination and backsubstitution.
%% Partial pivoting is performed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x = linearGaussianPivot(A,b)
n = length(b);
A = [A,b]; %%form augmented matrix
%%Gaussian elimination loop
for k=1:n-1 %%A(k,k) is the pivot
%%see if there is a larger pivot below this in the kth column
Amax = abs(A(k,k));
imax = k;
for i=k+1:n
if abs(A(i,k))>Amax
Amax = abs(A(i,k));
imax = i;
end
end
if (imax~=k) %%we found a larger pivot, swap rows
w = A(k,:); %%copy the kth row
A(k,:) = A(imax,:); %%replace it with the imax row
A(imax,:) = w; %%replace the imax row with the original kth row
end
%%pivoting complete
for i=k+1:n %%eliminate a(i,k) for all i>k
for j=n+1:-1:k
A(i,j) = A(i,j)-A(k,j)*A(i,k)/A(k,k);
end
end %%i loop
end %%k loop
%%Backsubstitution loop
x = zeros(n,1);
x(n) = A(n,n+1)/A(n,n);
for i=n-1:-1:1
x(i) = A(i,n+1);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j);
end
x(i) = x(i)/A(i,i);
end
end


Lecture 12
Linear systems of equations II
1 Introduction
We have looked at Gauss-Jordan elimination and Gaussian elimination as ways to solve a linear
system Ax=b . We now turn to the LU decomposition, which is arguably the best way to
solve a linear system. We will then see how to use the backslash operator that is built in to
Scilab/Matlab.

2 LU decomposition
Generally speaking, a square matrix A, for example

$$A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix} \qquad (1)$$

can be decomposed or factored as

$$A = LU$$

where L and U are unit-lower-triangular and upper-triangular matrices of the form

$$L = \begin{pmatrix}
1 & 0 & 0 & 0 \\
l_{21} & 1 & 0 & 0 \\
l_{31} & l_{32} & 1 & 0 \\
l_{41} & l_{42} & l_{43} & 1
\end{pmatrix} \qquad U = \begin{pmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & u_{44}
\end{pmatrix} \qquad (2)$$

One way to see this is to carry out the multiplication.

$$LU = \begin{pmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
u_{11}l_{21} & u_{22}+u_{12}l_{21} & u_{23}+u_{13}l_{21} & u_{24}+u_{14}l_{21} \\
u_{11}l_{31} & u_{22}l_{32}+u_{12}l_{31} & u_{33}+u_{23}l_{32}+u_{13}l_{31} & u_{34}+u_{24}l_{32}+u_{14}l_{31} \\
u_{11}l_{41} & u_{22}l_{42}+u_{12}l_{41} & u_{33}l_{43}+u_{23}l_{42}+u_{13}l_{41} & u_{44}+u_{34}l_{43}+u_{24}l_{42}+u_{14}l_{41}
\end{pmatrix} \qquad (3)$$

Comparing (1) and (3) we see immediately from the first row that

$$u_{11}=a_{11},\quad u_{12}=a_{12},\quad u_{13}=a_{13},\quad u_{14}=a_{14} \qquad (4)$$

Then from the first column we have

$$l_{21}=a_{21}/u_{11},\quad l_{31}=a_{31}/u_{11},\quad l_{41}=a_{41}/u_{11} \qquad (5)$$

Now the second row gives us

$$u_{22}=a_{22}-u_{12}l_{21},\quad u_{23}=a_{23}-u_{13}l_{21},\quad u_{24}=a_{24}-u_{14}l_{21} \qquad (6)$$

Then from the second column we have

$$l_{32}=(a_{32}-u_{12}l_{31})/u_{22},\quad l_{42}=(a_{42}-u_{12}l_{41})/u_{22} \qquad (7)$$

The third row now gives us

$$u_{33}=a_{33}-(u_{23}l_{32}+u_{13}l_{31}),\quad u_{34}=a_{34}-(u_{24}l_{32}+u_{14}l_{31}) \qquad (8)$$

The third column gives us

$$l_{43}=\left(a_{43}-(u_{23}l_{42}+u_{13}l_{41})\right)/u_{33} \qquad (9)$$

and, finally, from the fourth row

$$u_{44}=a_{44}-(u_{34}l_{43}+u_{24}l_{42}+u_{14}l_{41}) \qquad (10)$$
This can be generalized to a matrix A of size n-by-n, in which case it forms Doolittle's algorithm. LU decomposition can be conveniently done "in place," meaning we don't need to actually create the L and U matrices. Instead, we can replace the elements of A by the corresponding elements of L or U as they are calculated. The result is that A is replaced with

$$A \to \begin{pmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
l_{21} & u_{22} & u_{23} & u_{24} \\
l_{31} & l_{32} & u_{33} & u_{34} \\
l_{41} & l_{42} & l_{43} & u_{44}
\end{pmatrix}$$

The diagonal values of L (all 1's) are understood. The algorithm to do this can be stated very concisely as

LU decomposition algorithm
for $k = 1, 2, \ldots, n-1$
    for $i = k+1, k+2, \ldots, n$
        $a_{ik} \to a_{ik}/a_{kk}$
        for $j = k+1, k+2, \ldots, n$
            $a_{ij} \to a_{ij} - a_{ik}a_{kj}$

A Scilab implementation of this appears in the Appendix as linearLU. Note the three nested for loops, implying that this is an $O(n^3)$ process.

2.1 Solving a linear system using LU decomposition


LU decomposition replaces the original system $A\mathbf{x}=\mathbf{b}$ with a factored, or decomposed, system $LU\mathbf{x}=\mathbf{b}$. We get the solution x by a two-step process. First we define a vector $\mathbf{y}=U\mathbf{x}$ and write the system in the form $L\mathbf{y}=\mathbf{b}$:

$$\begin{pmatrix}
1 & 0 & 0 & 0 \\
l_{21} & 1 & 0 & 0 \\
l_{31} & l_{32} & 1 & 0 \\
l_{41} & l_{42} & l_{43} & 1
\end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}$$

From the first row we see that the value of $y_1$ is simply

$$y_1 = b_1$$

The second row now gives us

$$l_{21}y_1 + y_2 = b_2 \;\Rightarrow\; y_2 = b_2 - l_{21}y_1$$

and the ith row will provide the solution

$$y_i = b_i - \sum_{k=1}^{i-1} l_{ik}y_k$$

This process used to solve for y is called forward substitution. The complete algorithm is

Forward-substitution algorithm
The solution of Ly=b where L is unit-lower-triangular is
for $i = 1, 2, \ldots, n$

$$y_i = b_i - \sum_{j=1}^{i-1} l_{ij}y_j$$

(for i=1 there are no terms in the sum)

With two nested loops this is an $O(n^2)$ process. Now that we have y, in the next step we solve $U\mathbf{x}=\mathbf{y}$

$$\begin{pmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & u_{44}
\end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}$$

This upper-triangular system can be solved by the back-substitution algorithm we developed in the previous lecture. We restate it here for completeness.

Back-substitution algorithm
The solution of Ux=y where U is upper-triangular is
for $i = n, n-1, n-2, \ldots, 1$

$$x_i = \frac{1}{u_{ii}}\left( y_i - \sum_{j=i+1}^{n} u_{ij}x_j \right)$$

(the summation is zero when i=n)

A Scilab implementation of forward- and back-substitution is given as linearLUsubstitute in the Appendix.

Back-substitution is also an $O(n^2)$ process. Therefore in solving Ax=b using LU decomposition followed by forward- and back-substitution, the bulk of the computation is in performing the LU decomposition. A detailed analysis shows that while Gauss-Jordan elimination, Gaussian elimination and LU decomposition are all $O(n^3)$ processes, Gauss-Jordan elimination takes about 3 times as many, and Gaussian elimination about 1.5 times as many, operations as LU decomposition. Hence, LU decomposition is preferable.

Moreover, in a matrix-vector equation of the form Ax=b, typically A represents the geometry, structure and/or physical properties of the system being analyzed, x is the response of the system (displacements, temperature, etc.) and b represents the "inputs" (such as forces or heat flows). It is not uncommon to want to calculate the system response for several different inputs (different b vectors). The beauty of the LU decomposition is that it only needs to be performed once, at a computational cost of $O(n^3)$. The solution x for a new input b then simply requires application of the forward-substitution and back-substitution algorithms, which are only $O(n^2)$. This is a tremendous benefit over repeatedly solving Ax=b for each b.

2.2 Partial pivoting


In the LU decomposition algorithm we divide by the pivots $a_{kk}$. Obviously this fails if one of these terms is zero, and it produces poor results if one of these terms is very small. As with Gaussian elimination, the solution is to perform partial (or row) pivoting.

In the previous lecture we pointed out that swapping rows of the augmented matrix $\tilde{A} = (A, \mathbf{b})$ does not change the solution vector x. This is because the b values are swapped along with the A values. LU decomposition, however, does not consider the b vector. Rather it factors A into a form in which we can quickly solve for x given any b vector. Therefore we need to keep track of these row swaps so that we can apply them to the vector b before performing forward- and back-substitution. The result is sometimes called "LUP decomposition." We write

$$PA = LU$$

where P is a permutation matrix. As an example, consider
-->A = [1,2,3;4,5,6;7,8,9]
 A  =
    1.    2.    3.
    4.    5.    6.
    7.    8.    9.

-->P = [0,1,0;0,0,1;1,0,0]
 P  =
    0.    1.    0.
    0.    0.    1.
    1.    0.    0.

-->P*A
 ans  =
    4.    5.    6.
    7.    8.    9.
    1.    2.    3.

P is an identity matrix in which the rows have been swapped. In this case the identity matrix rows 1,2,3 are reordered 2,3,1. The product PA then simply reorders the rows of A in the same manner. With row pivoting, we actually compute the LU decomposition of PA. Multiplying both sides of Ax=b by P we have $PA\mathbf{x} = P\mathbf{b}$, and then

$$LU\mathbf{x} = P\mathbf{b}$$

This shows that we need only apply the permutation to a b vector and then we can obtain the

correct x using forward- and back-substitution. Instead of generating the matrix P we can form a permutation vector p which lists the reordering of the rows of A, for example

$$\mathbf{p} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$$

This tells us to perform the reordering

$$\mathbf{b} \to \begin{pmatrix} b_2 \\ b_3 \\ b_1 \end{pmatrix}$$

before performing forward- and back-substitution. In Scilab/Matlab, if b is a vector and p is a permutation vector, then b(p) is the permuted version of b. A Scilab implementation of the LUP decomposition algorithm is given in the Appendix as linearLUP. An example of using these functions to solve a linear system is

-->A = [1,2,3;4,5,-6;7,-8,-9];
-->b = [1;2;3];
-->[LU,p] = linearLUP(A);
-->x = linearLUsubstitute(LU,b(p));
-->disp(x);

which produces output

    0.6078431
    0.0392157
    0.1045752

As a check

-->A*x
 ans  =
    1.
    2.
    3.

The same result is obtained with

-->LU = linearLU(A);
-->x = linearLUsubstitute(LU,b);

3 Matrix determinant
In your linear algebra course you learned about the determinant of a square matrix, which we will write as det A. The determinant is very important theoretically. Numerically it does not find much application, but on occasion you may want to compute it. We know from the properties of the determinant that if $PA = LU$ then

$$\det P \det A = \det L \det U$$

The determinant of a permutation matrix is $\pm 1$, with the sign depending on whether P represents an even or odd number of row swaps. The determinant of a triangular matrix is simply the product of its diagonal elements. Therefore $\det L = 1$ and

$$\det A = \pm\det U = \pm u_{11}u_{22}u_{33}\cdots u_{nn}$$

This is the best way to numerically calculate the determinant of a square matrix A. In Scilab/Matlab the determinant can be calculated using the command det(A).
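As a rough check using the linearLUP routine from this lecture (a sketch; the sign would still have to be fixed by counting the row swaps recorded in p):

-->[LU, p] = linearLUP(A);
-->d = prod(diag(LU))   // u11*u22*...*unn, which is det(A) up to sign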

4 The backslash operator \


In both Scilab and Matlab the most common way to solve the square linear system Ax=b is
using the backslash operator
x = A\b

You can think of A\b as "A inverse times b," or "A left-divided into b." If A is non-singular, Scilab uses LUP decomposition with partial pivoting, followed by forward and backward substitution, to solve for x. If A is singular or close to singular ("poorly conditioned"), or is not square, then a least-squares solution is computed. We will study least squares in a future lecture.

In Scilab/Matlab, built-in functions run much faster than anything we can code and run ourselves. This is because built-in functions are written in a compiled language (such as C), compiled, and then called at run time. They run with the much greater speed of compiled code, whereas anything we write must run in the Scilab/Matlab interpreter environment. It is possible for us to write compiled functions in C and have Scilab/Matlab call them, but this is an advanced topic.
Repeating our previous example

-->A = [1,2,3;4,5,-6;7,-8,-9];
-->b = [1;2;3];
-->x = A\b
 x  =
    0.6078431
    0.0392157
    0.1045752

5 Singular matrices
Even with pivoting, all of the methods we have studied may fail to solve Ax=b because there may be no solution! In fact we know that if det A=0 then A is a singular matrix, $A^{-1}$ fails to exist, and it is therefore impossible to calculate $\mathbf{x} = A^{-1}\mathbf{b}$. Consider the following example
-->A = [1,2,3;4,5,9;6,7,13]
 A  =
    1.    2.     3.
    4.    5.     9.
    6.    7.    13.

-->[A,p] = linearLUP(A)
 p  =
    3.
    1.
    2.
 A  =
    6.           7.           13.
    0.1666667    0.8333333    0.8333333
    0.6666667    0.4          0.

Notice that $a_{33} = u_{33} = 0$, even with pivoting. It follows that $\det A = 0$ and the matrix is singular. If we try to perform forward- and back-substitution we will fail when we come to the step where we are supposed to divide by $u_{33}$.

The problem with this A is that the third column is the sum of the first two columns; the three columns are linearly dependent, and the matrix is singular. There is no solution to Ax=b. If we ask Scilab to find one we are told
-->x = A\[1;2;3]
Warning :
matrix is close to singular or badly scaled. rcond = 0.0000D+00
computing least squares solution. (see lsq).
 x  =
    0.
    0.7105263
  - 0.1578947

-->A*x
 ans  =
    0.9473684
    2.1315789
    2.9210526

We're warned that the matrix is close to singular and given a least-squares solution. Testing Ax we get something close to, but not equal to, b. There is no true solution, but Scilab gives us the best that can be achieved in its place.

6 References
1. Golub, G.H. and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press,
1983, ISBN: 0-8018-3011-7.


7 Appendix: Scilab code


7.1 LU decomposition

//////////////////////////////////////////////////////////////////////
// linearLU.sci
// 2014-06-24, Scott Hudson, for pedagogic purposes
// Given an n-by-n matrix A, calculate the LU decomposition
// A = LU where U is upper triangular and L is unit lower triangular.
// A is replaced by the elements of L and U.
// No pivoting is performed.
//////////////////////////////////////////////////////////////////////
function A=linearLU(A)
n = size(A,1); //A is n-by-n
for k=1:n-1
for i=k+1:n
A(i,k) = A(i,k)/A(k,k);
for j=k+1:n
A(i,j) = A(i,j)-A(i,k)*A(k,j);
end
end
end
endfunction

7.2 Forward and backward substitution



//////////////////////////////////////////////////////////////////////
// linearLUsubstitute.sci
// 2014-06-24, Scott Hudson, for pedagogic purposes
// A has been replaced by its LU decomposition. This function applies
// forward- and back-substitution to solve Ax=b. Note that if LUP
// decomposition was performed then b(p) should be used as the b
// argument where p is the permutation vector.
//////////////////////////////////////////////////////////////////////
function x=linearLUsubstitute(A, b)
n = length(b);
y = zeros(b);
for i=1:n //forward-substitution
y(i) = b(i);
for j=1:i-1
y(i) = y(i)-A(i,j)*y(j); //a(i,j) = l(i,j) for j<i
end
end
x = zeros(b);
for i=n:-1:1 //back-substitution
x(i) = y(i);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j); //a(i,j) = u(i,j) for j>i
end
x(i) = x(i)/A(i,i);
end
endfunction


7.3 LU decomposition with partial pivoting



//////////////////////////////////////////////////////////////////////
// linearLUP.sci
// 2014-06-24, Scott Hudson, for pedagogic purposes
// Given an n-by-n matrix A, calculate the LU decomposition
// A = LU where U is upper triangular and L is unit lower triangular.
// A is overwritten by LU. Partial pivoting is performed.
// Vector p lists the rearrangement of the rows of A. Given a
// vector b, the solution to A*x=b is the solution to L*U*x=b(p).
//////////////////////////////////////////////////////////////////////
function [A, p]=linearLUP(A)
n = size(A,1); //a is n-by-n
//Replace A with its LU decomposition
p = [1:n]';
for k=1:n-1 //k indexes the pivot row
//pivoting - find largest abs() in column k
amax = abs(A(k,k));
imax = k;
for i=k+1:n
if abs(A(i,k))>amax
amax = abs(A(i,k));
imax = i;
end
end
if (imax~=k) //we found a larger pivot
w = A(k,:); //copy row k
A(k,:) = A(imax,:); //replace it with row imax
A(imax,:) = w; //replace row imax with original row k
t = p(k); //perform same swap of elements of p
p(k) = p(imax);
p(imax) = t;
end
//pivoting complete, continue with LU decomposition
for i=k+1:n
A(i,k) = A(i,k)/A(k,k);
for j=k+1:n
A(i,j) = A(i,j)-A(i,k)*A(k,j);
end
end
end
endfunction


8 Appendix: Matlab code


8.1 LU decomposition
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearLU.m
%% 2014-06-24, Scott Hudson, for pedagogic purposes
%% Given an n-by-n matrix A, calculate the LU decomposition
%% A = LU where U is upper triangular and L is unit lower triangular.
%% A is replaced by the elements of L and U.
%% No pivoting is performed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function A = linearLU(A)
n = size(A,1); %%A is n-by-n
for k=1:n-1
for i=k+1:n
A(i,k) = A(i,k)/A(k,k);
for j=k+1:n
A(i,j) = A(i,j)-A(i,k)*A(k,j);
end
end
end
end

8.2 Forward and backward substitution


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearLUsubstitute.m
%% 2014-06-24, Scott Hudson, for pedagogic purposes
%% A has been replaced by its LU decomposition. This function applies
%% forward- and back-substitution to solve Ax=b. Note that if LUP
%% decomposition was performed then b(p) should be used as the b
%% argument where p is the permutation vector.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x = linearLUsubstitute(A,b)
n = length(b);
y = zeros(size(b));
for i=1:n %%forward-substitution
y(i) = b(i);
for j=1:i-1
y(i) = y(i)-A(i,j)*y(j); %%a(i,j) = l(i,j) for j<i
end
end
x = zeros(size(b));
for i=n:-1:1 %%back-substitution
x(i) = y(i);
for j=i+1:n
x(i) = x(i)-A(i,j)*x(j); %%a(i,j) = u(i,j) for j>i
end
x(i) = x(i)/A(i,i);
end
end


8.3 LU decomposition with partial pivoting


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% linearLUP.m
%% 2014-06-24, Scott Hudson, for pedagogic purposes
%% Given an n-by-n matrix A, calculate the LU decomposition
%% A = LU where U is upper triangular and L is unit lower triangular.
%% A is overwritten by LU. Partial pivoting is performed.
%% Vector p lists the rearrangement of the rows of A. Given a
%% vector b, the solution to A*x=b is the solution to L*U*x=b(p).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [A,p] = linearLUP(A)
n = size(A,1); %%a is n-by-n
%%Replace A with its LU decomposition
p = [1:n]';
for k=1:n-1 %%k indexes the pivot row
%%pivoting - find largest abs() in column k
amax = abs(A(k,k));
imax = k;
for i=k+1:n
if abs(A(i,k))>amax
amax = abs(A(i,k));
imax = i;
end
end
if (imax~=k) %%we found a larger pivot
w = A(k,:); %%copy row k
A(k,:) = A(imax,:); %%replace it with row imax
A(imax,:) = w; %%replace row imax with original row k
t = p(k); %%perform same swap of elements of p
p(k) = p(imax);
p(imax) = t;
end
%%pivoting complete, continue with LU decomposition
for i=k+1:n
A(i,k) = A(i,k)/A(k,k);
for j=k+1:n
A(i,j) = A(i,j)-A(i,k)*A(k,j);
end
end
end
end


Lecture 13
Nonlinear systems of equations
1 Introduction
We have investigated the solution of one nonlinear equation in one unknown: $f(x)=0$. What about multiple nonlinear equations in multiple unknowns? To get started, consider one equation in two unknowns

$$f(x,y) = 0$$

To be specific, let's take

$$f(x,y) = x^2 + y^2 - 4 = 0$$

Although we can solve this for y

$$y = \pm\sqrt{4 - x^2}$$

this does not give us a unique solution $(x,y)$. Rather it defines y as a (two-valued) function of x. In general $f(x,y)=0$ defines a contour in the x,y plane (Fig. 1). With two unknowns we need two equations to define a solution. Suppose our second equation is

$$g(x,y) = y - \sin(x) = 0$$

This also defines y as a function of x

$$y = \sin(x)$$

and a contour in the x,y plane. An intersection of those contours (Fig. 1) is the solution of the system of equations

$$f(x,y) = 0$$
$$g(x,y) = 0 \qquad (1)$$

In general one equation in n unknowns defines a surface of dimension $n-1$. Two equations define a surface of dimension $n-2$, and k equations define a surface of dimension $n-k$. A point is a surface of dimension 0, and to define a point we need to have n equations in n unknowns.

A graphical approach to root finding gets progressively more difficult as the number of dimensions grows. Beyond two dimensions it is rarely an option.

2 Notation
For a general system of n equations in n unknowns it's not convenient to use different letters x,y,z for the unknowns or f,g,h for the functions. A better notation is to use indices so that the unknown variables are represented as $x_i,\ 1 \le i \le n$. Our system of 2 equations (1) would then be written as

$$f_1(x_1, x_2) = 0$$
$$f_2(x_1, x_2) = 0$$

Fig. 1: Intersections of the f(x,y)=0 and g(x,y)=0 contours define the solutions of the
system of two equations in two unknowns.

A system of n equations would have the form

$$f_1(x_1, x_2, \ldots, x_n) = 0$$
$$f_2(x_1, x_2, \ldots, x_n) = 0$$
$$\vdots$$
$$f_n(x_1, x_2, \ldots, x_n) = 0 \qquad (2)$$

Now we can treat the arrays of unknowns $x_1, x_2, \ldots, x_n$ and functions $f_1, f_2, \ldots, f_n$ as vectors to arrive at the compact notation

$$\mathbf{f}(\mathbf{x}) = \mathbf{0} \qquad (3)$$

The notation in (3) is simply a shorthand representation for the system of (2), but when written using vectors it displays the same form as the scalar root-finding problem $f(x)=0$.

3 Challenges
In general solving a system of the form (3) is a difficult problem. To quote the classic work Numerical Recipes in C [1, Section 9.6]:

"There are no good, general methods for solving systems of more than one nonlinear equation. Furthermore, it is not hard to see why (very likely) there never will be any good, general methods."

The single largest problem is that in two or more dimensions we lose the concept of bracketing a root. Going back to the notation (1), suppose the point a is $(x_a, y_a)$, the point b is $(x_b, y_b)$, and

$$f(x_a, y_a)\,f(x_b, y_b) < 0$$
$$g(x_a, y_a)\,g(x_b, y_b) < 0$$

Fig. 2: Continuous functions $f(x,y)$, $g(x,y)$ both change sign between point a and point b, but $f(x,y)=0$ and $g(x,y)=0$ are unlikely to occur at the same point.

Provided f and g are continuous along a path connecting a and b, on that path there will be a point c where $f(x_c, y_c) = 0$. Likewise there will be a point d where $g(x_d, y_d) = 0$. However we have no way of knowing if these two points will coincide, as they must at a solution of (1). In fact, based on Fig. 2 we can see that it is quite unlikely that they will. Thus there is no procedure analogous to bisection that can guarantee we find a root with any given precision.
Without bracketing and bisection methods we are left with the possibility of implementing some form of root polishing. Recall that in one dimension these methods were not guaranteed to find a solution, even if one or more exists. Typically they need to start reasonably close to a solution in order to converge.

The one-dimensional root-polishing methods we investigated were fixed-point iteration, Newton's method and the secant method. We will develop multidimensional versions of those below.

4 Fixed-point iteration
Consider the following equations

$$\mathbf{0} = \mathbf{f}(\mathbf{r})$$
$$\mathbf{0} = A\,\mathbf{f}(\mathbf{r})$$
$$\mathbf{r} = \mathbf{r} + A\,\mathbf{f}(\mathbf{r}) = \mathbf{g}(\mathbf{r})$$

where r is an n-dimensional column vector that forms a solution of our problem and f is an n-dimensional column-vector function. The first equation is simply a statement of our root-finding problem. In the second equation we have multiplied both sides by an n-by-n matrix A. In the last equation we have added r to both sides and defined the right-hand side as the function g(r).

Let's now write

$$\mathbf{x} = \mathbf{g}(\mathbf{x}) \qquad (4)$$

This has solution $\mathbf{x} = \mathbf{r}$. Following the one-dimensional algorithm we expect that the sequence of vectors $\mathbf{x}_k$ where

$$\mathbf{x}_{k+1} = \mathbf{g}(\mathbf{x}_k) \qquad (5)$$

might converge to r under the right conditions. Note that the notation $\mathbf{x}_k$ refers to the kth vector in a sequence of vectors, and not the kth component of the vector x, which would be written $x_k$. In terms of components (4) gives us n equations

$$x_i = g_i(x_1, x_2, x_3, \ldots, x_n),\quad i = 1, 2, 3, \ldots, n$$
Let's write

$$x_i = r_i + e_i$$

where $e_i = x_i - r_i$ is the error in the ith component of x. Supposing that all the $e_i$ values are small enough that first-order Taylor series are accurate, we write (5) as

$$r_i + e_i' = g_i(\mathbf{r}) + \frac{\partial g_i}{\partial x_1}e_1 + \frac{\partial g_i}{\partial x_2}e_2 + \cdots + \frac{\partial g_i}{\partial x_n}e_n \qquad (6)$$

(here $e_i'$ is the error after the iteration), or

$$e_i' = \sum_{j=1}^{n} \frac{\partial g_i}{\partial x_j}e_j \qquad (7)$$

since $r_i = g_i(\mathbf{r})$. Now, define the n-by-n matrix J to have the elements

$$J_{ij} = \frac{\partial g_i}{\partial x_j}$$

This is called the Jacobian matrix of the vector function g(x). We also define the n-by-1 column vector e to have elements $e_i$. Then (7) can be written

$$\mathbf{e}_{k+1} = J\,\mathbf{e}_k \qquad (8)$$

where, again, $\mathbf{e}_k$ is the kth vector in the sequence of error vectors, and is not to be confused with $e_k$, which is the kth element in a particular vector.

Intuitively the sequence (8) will converge provided $\|J\mathbf{e}_k\| \le \lambda\|\mathbf{e}_k\|$ for some $0 \le \lambda < 1$. That is, the norm of the error vector decreases by a fixed factor at each iteration, so that

$$\|\mathbf{e}_k\| \le \lambda^k \|\mathbf{e}_0\| \to 0 \quad\text{as}\quad k \to \infty$$

In principle it is always possible to find a matrix A so that (8) converges. However it's a difficult problem since A has $n^2$ components, and it is rarely practical to do so. Instead, we might try to manipulate our system of equations into the form (4) in various ways until we find one that converges.


Example 1: Consider the system

$$f(x,y) = x^2 + y^2 - 4 = 0$$
$$g(x,y) = y - \sin(x) = 0$$

Let's write this in the iterative form

$$x = x + x^2 + y^2 - 4$$
$$y = y + y - \sin(x)$$

From Fig. 1 we see that $x=2,\ y=1$ is near a root. Starting at this point our iteration gives us

k    x            y
1    3            1.85888
2    11.455435    4.6138744
3    159.97026    8.9794083

which is clearly not converging. However, writing

$$x = \sqrt{4 - y^2}$$
$$y = \sin(x)$$

we obtain

k    x            y
1    1.7320508    0.9870266
2    1.7394765    0.9858072
3    1.7401679    0.9856909
...  1.7402407    0.9856786

which does converge.
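A minimal Scilab sketch of the convergent form (each pass updates y with the freshly computed x, as the table assumes):

x = 2; y = 1;
for k = 1:20
    x = sqrt(4 - y^2);   // first rearranged equation
    y = sin(x);          // uses the freshly updated x
end
disp([x; y])             // prints 1.7402407, 0.9856786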

5 The Newton-Raphson method


In one dimension, the idea behind Newton's method is to approximate a function by its tangent line

$$f(x_{k+1}) \approx f(x_k) + f'(x_k)(x_{k+1} - x_k) = 0$$

and solve for the root of that line to get

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

In the n-dimensional case, if we know $f_i(\mathbf{x})$, for a small change to x we can approximate the change in the function by a first-order Taylor series

$$f_i(x_1+u_1, x_2+u_2, \ldots, x_n+u_n) \approx f_i(\mathbf{x}) + \frac{\partial f_i}{\partial x_1}u_1 + \frac{\partial f_i}{\partial x_2}u_2 + \cdots + \frac{\partial f_i}{\partial x_n}u_n$$

where the derivatives are evaluated at x and u is the small change in x. Setting this expression equal to zero we have

$$\frac{\partial f_i}{\partial x_1}u_1 + \frac{\partial f_i}{\partial x_2}u_2 + \cdots + \frac{\partial f_i}{\partial x_n}u_n = -f_i(\mathbf{x})$$

Doing this for $i = 1, 2, \ldots, n$ we get the system of equations

$$\frac{\partial f_1}{\partial x_1}u_1 + \frac{\partial f_1}{\partial x_2}u_2 + \cdots + \frac{\partial f_1}{\partial x_n}u_n = -f_1(\mathbf{x})$$
$$\frac{\partial f_2}{\partial x_1}u_1 + \frac{\partial f_2}{\partial x_2}u_2 + \cdots + \frac{\partial f_2}{\partial x_n}u_n = -f_2(\mathbf{x})$$
$$\vdots$$
$$\frac{\partial f_n}{\partial x_1}u_1 + \frac{\partial f_n}{\partial x_2}u_2 + \cdots + \frac{\partial f_n}{\partial x_n}u_n = -f_n(\mathbf{x})$$

Defining the Jacobian matrix J to have elements

$$J_{ij} = \frac{\partial f_i}{\partial x_j}$$

we then have the linear system

$$J\mathbf{u} = -\mathbf{f}$$

Solving this we update x and repeat until convergence. As in all root-polishing methods, convergence involves some guesswork. Typically we assume convergence when $\|\mathbf{u}\|$ is smaller than some tolerance. The result is the Newton-Raphson method.

Newton-Raphson method
Given $\mathbf{x}_k$ calculate
    $\mathbf{f}_k = \mathbf{f}(\mathbf{x}_k)$ and $J_k$ with elements $J_{ij} = \partial f_i/\partial x_j$ evaluated at $\mathbf{x}_k$
    solve for $\mathbf{u}_k = -J_k^{-1}\mathbf{f}_k$
    update $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{u}_k$
    repeat until $\|\mathbf{u}_k\|$ is small enough

Let's see how this works on our previous example problem.
Example 2: We wish to solve $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ where

$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} x_1^2 + x_2^2 - 4 \\ x_2 - \sin(x_1) \end{pmatrix}$$

The Jacobian matrix is

$$J = \begin{pmatrix} 2x_1 & 2x_2 \\ -\cos(x_1) & 1 \end{pmatrix}$$

The iteration

$$\mathbf{x} \to \mathbf{x} - J^{-1}\mathbf{f}$$

starting at $x=2,\ y=1$ produces

-->deff('y=f(x)','y=[x(1)^2+x(2)^2-4;x(2)-sin(x(1))]');
-->deff('y=J(x)','y=[2*x(1),2*x(2);-cos(x(1)),1]');
-->x = [2;1];
-->x = x-J(x)\f(x)
 x  =
    1.7415812
    1.0168376

-->x = x-J(x)\f(x)
 x  =
    1.7405501
    0.9856269

-->x = x-J(x)\f(x)
 x  =
    1.7402407
    0.9856787

When started near a root, the Newton-Raphson method converges quadratically. Like all root-polishing methods, there is no guarantee that it will converge, even if a root exists. A Scilab implementation of the Newton-Raphson method is given in the Appendix as rootNewtonRaphson.

6 Broyden's method
The Newton-Raphson method requires the calculation of the Jacobian matrix, n derivatives of n
functions, at each iteration. It might not be possible to calculate this analytically, or it might not
be convenient to do so. In the one-dimensional problem we avoided calculating derivatives by
approximating the derivative by

$$f'(x_k) \approx \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}$$

This led to the secant method. A similar approach in the multidimensional case leads to Broyden's method. We approximate the Jacobian J by a matrix B and otherwise follow the Newton-Raphson method. Assume we start with some $x_k$, $f_k = f(x_k)$ and some estimate $B_k$ for the Jacobian. We solve for

$$u_k = -B_k^{-1} f_k$$

update our root estimate

$$x_{k+1} = x_k + u_k$$

calculate

$$f_{k+1} = f(x_{k+1})$$

and update our Jacobian estimate using Broyden's formula

$$B_{k+1} = B_k + f_{k+1} \frac{u_k^T}{\|u_k\|^2}$$

The motivation for Broyden's formula is that we want to have

$$B_{k+1} u_k = f_{k+1} - f_k$$

This is analogous to the one-dimensional formula $f'\,\Delta x = \Delta f$, where the Jacobian estimate B takes the place of the function derivative. Since

$$\frac{u_k^T}{\|u_k\|^2} u_k = 1$$

we have

$$B_{k+1} u_k = B_k u_k + f_{k+1} = f_{k+1} - f_k$$

since $B_k u_k = -f_k$ by the definition of $u_k$.
In summary:

Broyden's method
Given $x_k$, $f_k = f(x_k)$ and $B_k$
solve for $u_k = -B_k^{-1} f_k$
update $x_{k+1} = x_k + u_k$
calculate $f_{k+1} = f(x_{k+1})$
update $B_{k+1} = B_k + f_{k+1} u_k^T / \|u_k\|^2$
repeat until $\|u_k\|$ is small enough


If we have no information to guide us in forming the initial estimate $B_0$ we can take it to be the identity matrix. Broyden's update formula will usually cause $B_k$ to quickly form a good estimate of $J_k$. A Scilab implementation appears in the Appendix as rootBroyden.
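As a usage sketch, rootBroyden (from the Appendix) might be driven on the system of Example 2 as follows; the tolerance is an arbitrary choice for this illustration:

deff('y=f(x)','y=[x(1)^2+x(2)^2-4;x(2)-sin(x(1))]');
[r,nIters] = rootBroyden([2;1],f,1e-8); // B0 defaults to the identity
disp(r');       // expect approximately 1.7402407  0.9856786
disp(nIters);   // number of iterations used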

The fsolve command (Scilab)


We have already learned how to use the fsolve command to find roots of one-dimensional
functions. Precisely the same syntax applies in the case of an n-dimensional function
r = fsolve(x0,f);
EE 221 Numerical Computing

Scott Hudson

2015-08-18

Lecture 13: Nonlinear systems of equations

9/11

The only difference is that r, x0 and f are now n-dimensional. If the Jacobian can be explicitly
calculated that can be added as an additional argument
r = fsolve(x0,f,J);

and fsolve will utilize that for faster convergence.
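For example, on the system of Example 2 (a sketch; the starting point is the same one used earlier):

deff('y=f(x)','y=[x(1)^2+x(2)^2-4;x(2)-sin(x(1))]');
deff('y=J(x)','y=[2*x(1),2*x(2);-cos(x(1)),1]');
r = fsolve([2;1],f);    // Jacobian estimated internally
r = fsolve([2;1],f,J);  // analytic Jacobian supplied for faster convergence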


EE 221 Numerical Computing

Scott Hudson

2015-08-18

Lecture 13: Nonlinear systems of equations

10/11

8 Appendix: Scilab code


8.1 Newton-Raphson method

//////////////////////////////////////////////////////////////////////
// rootNewtonRaphson.sci
// 2014-06-04, Scott Hudson, for pedagogic purposes
// Implements Newton-Raphson method for finding a root f(x) = 0
// where f and x are n-by-1 vectors.
// Requires two functions: y=f(x) and y=J(x) where J(x) is the n-by-n
// Jacobian of f(x). Search starts at x0. Root is returned as r,
// niter is number of iterations performed. Termination when change
// in x is less than tol or MAX_ITERS exceeded.
//////////////////////////////////////////////////////////////////////
function [r, nIters]=rootNewtonRaphson(x0, f, J, tol)
MAX_ITERS = 40; //give up after this many iterations
nIters = 1; //1st iteration
r = x0-J(x0)\f(x0); //Newton's formula for next root estimate
while (max(abs(r-x0))>tol) & (nIters<=MAX_ITERS)
nIters = nIters+1; //keep track of # of iterations
x0 = r; //current root estimate is last output of formula
r = x0-J(x0)\f(x0); //Newton's formula for next root estimate
end
endfunction

8.2 Broyden's method



//////////////////////////////////////////////////////////////////////
// rootBroyden.sci
// 2014-06-12, Scott Hudson, for pedagogic purposes
// Implements Broyden's method for finding a root of f(x)=0
// where f and x are n-by-1 vectors. x0 is initial guess for root
// and tol is termination tolerance for change in x.
//////////////////////////////////////////////////////////////////////
function [r, nIters]=rootBroyden(x0, f, tol)
MAX_ITERS = 40; //give up after this many iterations
xk = x0;
n = length(xk);
fk = f(xk);
Bk = eye(n,n);
uk = -Bk\fk;
nIters = 0;
while (max(abs(uk))>tol) & (nIters<=MAX_ITERS)
xk = xk+uk;
fk = f(xk);
Bk = Bk+fk*(uk')/(uk'*uk);
uk = -Bk\fk;
nIters = nIters+1;
end
r = xk+uk;
endfunction


9 Appendix: Matlab code


9.1 Newton-Raphson method
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootNewtonRaphson.m
%% 2014-06-04, Scott Hudson, for pedagogic purposes
%% Implements Newton-Raphson method for finding a root f(x) = 0
%% where f and x are n-by-1 vectors.
%% Requires two functions: y=f(x) and y=J(x) where J(x) is the n-by-n
%% Jacobian of f(x). Search starts at x0. Root is returned as r,
%% niter is number of iterations performed. Termination when change
%% in x is less than tol or MAX_ITERS exceeded.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [r,nIters] = rootNewtonRaphson(x0,f,J,tol)
MAX_ITERS = 40; %%give up after this many iterations
nIters = 1; %%1st iteration
r = x0-J(x0)\f(x0); %%Newton's formula for next root estimate
while (max(abs(r-x0))>tol) && (nIters<=MAX_ITERS)
nIters = nIters+1; %%keep track of # of iterations
x0 = r; %%current root estimate is last output of formula
r = x0-J(x0)\f(x0); %%Newton's formula for next root estimate
end
end

9.2 Broyden's method


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% rootBroyden.m
%% 2014-06-12, Scott Hudson, for pedagogic purposes
%% Implements Broyden's method for finding a root of f(x)=0
%% where f and x are n-by-1 vectors. x0 is initial guess for root
%% and tol is termination tolerance for change in x.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [r,nIters] = rootBroyden(x0,f,tol)
MAX_ITERS = 40; %%give up after this many iterations
xk = x0;
n = length(xk);
fk = f(xk);
Bk = eye(n,n);
uk = -Bk\fk;
nIters = 0;
while (max(abs(uk))>tol) && (nIters<=MAX_ITERS)
xk = xk+uk;
fk = f(xk);
Bk = Bk+fk*(uk')/(uk'*uk);
uk = -Bk\fk;
nIters = nIters+1;
end
r = xk+uk;
end


Lecture 14
Interpolation I
1 Introduction
A common problem faced in engineering is that we have some physical system or process with input x and output y. We assume there is a functional relation between input and output of the form $y = f(x)$, but we don't know what f is. For n particular inputs $x_1 < x_2 < \cdots < x_n$ we experimentally determine the corresponding outputs $y_i = f(x_i)$. From these we wish to estimate the output y for an arbitrary input x where $x_1 \le x \le x_n$. This is the problem of interpolation.
If the experiment is easy to perform then we could just directly measure $y = f(x)$, but typically this is not practical. Experimental determination of an input-output relation is often difficult, expensive and time consuming. Or, $y = f(x)$ may represent the result of a complex numerical simulation that is not practical to perform every time we have a new x value. In either case the only practical solution may be to estimate y by interpolating known values.
To illustrate various interpolation methods, we will use the example of

$$y = f(x) = 3e^{-x}\sin x$$

sampled at $x = 0, 1, 2, 3, 4, 5$. These samples are the black dots in Fig. 1.

2 Nearest neighbor or staircase interpolation


A simple interpolation scheme for estimating $f(x)$ is to find the sample value $x_i$ that is nearest to x and then assume $y = f(x) = y_i$. This is called nearest neighbor interpolation, and we can describe it as follows.

Nearest-neighbor interpolation
Find the value of i that minimizes the distance $|x - x_i|$
Set $y = y_i$
The interpolation will be constant near a sample point. As we move towards another sample
point the interpolation will discontinuously jump to a new value. This produces the staircase
appearance of the solid line in Fig. 1. Except for the special case where all y i are equal (a
constant function), nearest neighbor interpolation produces a discontinuous output.
Although this may not look like a realistic function, the fact is that without more information
than just the sample points we don't have any basis to declare one interpolation better than
another. Digital systems often produce output with piecewise-constant, discrete levels such as
this. On the other hand, if we know that the function f (x ) is the result of some continuous
physical process, then a nearest-neighbor interpolation is almost certainly a poor representation
of the true function.
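A minimal Scilab sketch of this algorithm (distinct from the course's library routines; the function name is mine):

// Nearest-neighbor interpolation sketch.
// x, y: column vectors of samples; xp: points at which to interpolate.
function yp = interpNearest(x, y, xp)
    yp = zeros(xp);
    for j = 1:length(xp)
        [d, i] = min(abs(xp(j) - x)); // index of the nearest sample point
        yp(j) = y(i);
    end
endfunction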

EE 221 Numerical Computing

Scott Hudson

2015-08-18

Lecture 14: Interpolation I

2/13

3 Linear interpolation

Fig. 1: Circles: samples of $3e^{-x}\sin x$; solid line: nearest-neighbor interpolation; dashed line: linear interpolation.

An obvious and easy way to interpolate data is to connect the dots with straight lines. This
produces a continuous interpolation (Fig. 1) but which has kinks at the sample points where the
slope is discontinuous. The algorithm is
Linear interpolation
Find k such that $x_k < x < x_{k+1}$
Set $y = y_k + \dfrac{y_{k+1} - y_k}{x_{k+1} - x_k}(x - x_k)$

If the samples are closely spaced, linear interpolation works quite well. In fact it's used by default
for the plot() routine in Scilab/Matlab. However, when sampling is sparse (as in Fig. 1), linear
interpolation is unlikely to give an accurate representation of a smooth function. This
motivates us to investigate more powerful interpolation methods.
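A minimal sketch of the linear interpolation algorithm above (it assumes x is sorted ascending and each xp value lies within [x(1), x($)]; the function name is mine):

function yp = interpLinear(x, y, xp)
    yp = zeros(xp);
    for j = 1:length(xp)
        k = max(find(x <= xp(j)));   // find k such that x(k) <= xp(j)
        k = min(k, length(x)-1);     // clamp to the last interval
        yp(j) = y(k) + (y(k+1)-y(k))/(x(k+1)-x(k))*(xp(j)-x(k));
    end
endfunction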

4 Polynomial interpolation
Through two points $(x_1,y_1),(x_2,y_2)$ we can draw a unique line, a 1st order polynomial. Through three points $(x_1,y_1),(x_2,y_2),(x_3,y_3)$ we can draw a unique parabola, a 2nd order polynomial. In general, through n points $(x_i,y_i)$, $i = 1, 2, \ldots, n$ we can draw a unique polynomial of order $(n-1)$. Although this polynomial is unique, there are different ways to represent and derive it. We start with the most obvious approach, the so-called monomial basis.

4.1 Monomial basis


A term in a single power of x, such as $x^k$, is called a monomial. Adding two or more of these together produces a polynomial. A polynomial of order $(n-1)$ with arbitrary coefficients is a linear combination of monomials:

$$y = p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_n x^{n-1}$$

We want this polynomial to pass through our samples $(x_i,y_i)$, $i = 1, 2, \ldots, n$. We therefore require

$$c_1 + c_2 x_1 + c_3 x_1^2 + \cdots + c_n x_1^{n-1} = y_1$$
$$c_1 + c_2 x_2 + c_3 x_2^2 + \cdots + c_n x_2^{n-1} = y_2$$
$$\vdots$$
$$c_1 + c_2 x_n + c_3 x_n^2 + \cdots + c_n x_n^{n-1} = y_n$$

which has the form of n equations in n unknown coefficients $c_i$. We can express this as the linear system

$$A c = y$$

where

$$A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}, \quad c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \quad (1)$$

A matrix of the form A, in which each column contains the sample values to some common power, is called a Vandermonde matrix. The coefficient vector c is easily calculated in Scilab/Matlab as

n = length(x);
A = ones(x);
for k=1:n-1
    A = [A,x.^k];
end
c = A\y;

Here we've assumed that the vector x is a column vector. If it is a row vector replace it by the transpose (x'). Likewise for the vector y.
Example 1: Suppose $x^T = [0, 1, 2]$ and $y^T = [2, 1, 3]$. The coefficients of the 2nd order polynomial that passes through these points are found from

$$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$$
Solving the system we obtain

$$c = \begin{pmatrix} 2 \\ -5/2 \\ 3/2 \end{pmatrix}$$

so the interpolating polynomial is

$$y = 2 - \frac{5}{2}x + \frac{3}{2}x^2$$

Fig. 2: 5th order polynomial interpolation. Solid dots are samples; squares are actual function values (for comparison); line is interpolation.

A Scilab program to interpolate n samples appears in Appendix 2 as interpMonomial(). Applying this to the sample data shown in Fig. 1 produces the results shown in Fig. 2. The resulting interpolation gives a good representation of the underlying function except near x=4.5.
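The coefficient solve in Example 1 is easily checked at the Scilab prompt:

x = [0;1;2]; y = [2;1;3];
A = [ones(x), x, x.^2];  // Vandermonde matrix for a 2nd order polynomial
c = A\y                  // expect c = [2; -2.5; 1.5]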
Numerically that is all there is to polynomial interpolation. However, it does require the solution
of a linear system which for a high-order polynomial must be done numerically. The monomial
approach starts having problems for very high-order polynomials due to round-off error. We'll
come back to this.
In addition to numerical considerations, there are times when we would like to be able to write the interpolating polynomial directly, without solving a linear system. This is particularly true when we use polynomial interpolation as part of some numerical method. In this case we don't know what the x and y values are because we are developing a method that can be applied to any data. Therefore we want to express the interpolating polynomial coefficients as some algebraic function of the sample points. There are two classic ways to do this: Lagrange polynomials and Newton polynomials.

4.2 Lagrange polynomials


Through n points we can pass a polynomial of order $n-1$. Lagrange polynomials are an ingenious way to write this polynomial down by inspection. Here's the idea.
Suppose we have three data points: $(x_1,1),(x_2,0),(x_3,0)$. There is a unique second-order polynomial $p(x)$ which interpolates these data. Since $p(x_2) = 0$, $x_2$ is a root of the polynomial. That means the polynomial must have a factor $(x-x_2)$. Likewise, $p(x_3) = 0$, so it also has a factor $(x-x_3)$. But the two factors $(x-x_2)(x-x_3)$ by themselves form a 2nd order polynomial $x^2 - (x_2+x_3)x + x_2 x_3$. Therefore, to within a multiplicative constant, $p(x)$ must simply be the product $(x-x_2)(x-x_3)$, that is

$$p(x) = c(x-x_2)(x-x_3)$$

The constant is fixed by the third constraint $p(x_1) = 1$ to give us

$$p(x) = \frac{(x-x_2)(x-x_3)}{(x_1-x_2)(x_1-x_3)}$$

From the form of this quadratic it is immediately clear that $p(x_1) = 1$ and $p(x_2) = p(x_3) = 0$. We'll call this

$$L_1(x) = \frac{(x-x_2)(x-x_3)}{(x_1-x_2)(x_1-x_3)}$$

and $L_1(x_1)=1$, $L_1(x_2)=0$, $L_1(x_3)=0$. Now consider the data points $(x_1,0),(x_2,1),(x_3,0)$. By the same logic the interpolating polynomial must be

$$L_2(x) = \frac{(x-x_1)(x-x_3)}{(x_2-x_1)(x_2-x_3)}$$

and $L_2(x_1)=0$, $L_2(x_2)=1$, $L_2(x_3)=0$. Finally consider the data points $(x_1,0),(x_2,0),(x_3,1)$. These are interpolated by

$$L_3(x) = \frac{(x-x_1)(x-x_2)}{(x_3-x_1)(x_3-x_2)}$$

and $L_3(x_1)=0$, $L_3(x_2)=0$, $L_3(x_3)=1$.


Now suppose we have three data points $(x_1,y_1),(x_2,y_2),(x_3,y_3)$. The claim is that the second-order interpolating polynomial is

$$p(x) = y_1 L_1(x) + y_2 L_2(x) + y_3 L_3(x)$$

Since $L_1, L_2, L_3$ are each second-order, a linear combination of them is also. We can verify that it interpolates our three data points by simply calculating $p(x_1), p(x_2), p(x_3)$. For example

$$p(x_2) = y_1 L_1(x_2) + y_2 L_2(x_2) + y_3 L_3(x_2) = y_1(0) + y_2(1) + y_3(0) = y_2$$

The Lagrange polynomial is not in the monomial form, but if we expand out the terms we will (to within round-off error) get the same result we obtain with the monomial-basis method.
Example 2: Suppose, as in Example 1, $x^T = [0, 1, 2]$ and $y^T = [2, 1, 3]$. We have

$$L_1(x) = \frac{(x-1)(x-2)}{(0-1)(0-2)}\,, \quad L_2(x) = \frac{(x-0)(x-2)}{(1-0)(1-2)}\,, \quad L_3(x) = \frac{(x-0)(x-1)}{(2-0)(2-1)}$$

so

$$y = \frac{2}{2}(x-1)(x-2) + \frac{1}{-1}x(x-2) + \frac{3}{2}x(x-1)$$

Expanding out the terms and collecting like powers we obtain

$$y = 2 - \frac{5}{2}x + \frac{3}{2}x^2$$

which is the result we obtained in Example 1.
In general, if we have n sample points $(x_i,y_i)$, $i = 1, 2, \ldots, n$ we form the n polynomials

$$L_k(x) = \prod_{i \ne k} \frac{x - x_i}{x_k - x_i} \quad (2)$$

for $k = 1, 2, \ldots, n$, where the $\Pi$ symbol signifies the product of the $n-1$ terms indicated. The interpolating polynomial is then

$$y = p(x) = \sum_{k=1}^{n} y_k L_k(x) \quad (3)$$

Lagrange polynomials are very useful analytically since they can be written down by inspection without solving any equations. They also have good numerical properties. A Scilab program to interpolate n samples using the Lagrange method appears in Appendix 2 as interpLagrange(). It's important to remember that there is a unique polynomial of order $n-1$ which interpolates a given n points. Whatever method we use to compute it must produce the same polynomial (to within our numerical limitations).
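For instance, a usage sketch of interpLagrange() (from Appendix 2) on the Example 2 data; the evaluation grid is an arbitrary choice:

x = [0;1;2]; y = [2;1;3];
xp = (0:0.1:2)';               // points at which to evaluate
yp = interpLagrange(x, y, xp); // evaluates y = sum of y(k)*L_k(xp)
plot(xp, yp, x, y, 'o');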

4.3 Newton polynomials


Newton polynomials are yet another way to obtain the $(n-1)$ order polynomial that interpolates a given n points. Suppose we have a single sample $(x_1,y_1)$. The zeroth-order polynomial (a constant function) passing through this point is

$$p_0(x) = c_1$$

where $c_1 = y_1$. Now suppose we have an additional point $(x_2,y_2)$. The polynomial that interpolates these two points will now be first order. Newton's idea was to write it in the form

$$p_1(x) = c_1 + c_2(x - x_1)$$

because this guarantees that $p_1(x_1) = c_1 = p_0(x_1) = y_1$, so our coefficient $c_1$ is unchanged. We need only calculate a single new coefficient $c_2$ from

$$p_1(x_2) = y_2 = c_1 + c_2(x_2 - x_1)$$

This gives us

$$c_2 = \frac{y_2 - c_1}{x_2 - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$

so

$$p_1(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$$
Notice that this looks somewhat like $f(x) \approx f(x_1) + f'(x_1)(x - x_1)$ where $f'(x_1)$ is approximated by change in y over change in x. That is, it has the form of a discrete Taylor series.
Now suppose we add a third point $(x_3,y_3)$. Our interpolating polynomial will now need to be quadratic, but we want to write it in a way that preserves the fit we've already obtained for the first two points. Therefore we write

$$p_2(x) = c_1 + c_2(x-x_1) + c_3(x-x_1)(x-x_2)$$

which guarantees that $p_2(x_1) = c_1 = y_1 = p_0(x_1)$ and $p_2(x_2) = c_1 + c_2(x_2-x_1) = y_2 = p_1(x_2)$. The single new coefficient $c_3$ is obtained from

$$p_2(x_3) = y_3 = c_1 + c_2(x_3-x_1) + c_3(x_3-x_1)(x_3-x_2)$$

so that

$$c_3 = \frac{y_3 - c_1 - c_2(x_3-x_1)}{(x_3-x_1)(x_3-x_2)} = \frac{\dfrac{y_3-y_1}{x_3-x_1} - c_2}{x_3-x_2} = \frac{\dfrac{y_3-y_1}{x_3-x_1} - \dfrac{y_2-y_1}{x_2-x_1}}{x_3-x_2}$$

The coefficient $c_3$ has a form that suggests difference in the derivative of y over difference in x, the form of a second derivative. Our interpolating polynomial

$$p_2(x) = y_1 + \frac{y_2-y_1}{x_2-x_1}(x-x_1) + \frac{\dfrac{y_3-y_1}{x_3-x_1} - \dfrac{y_2-y_1}{x_2-x_1}}{x_3-x_2}(x-x_1)(x-x_2)$$

is roughly analogous to a second-order Taylor series

$$f(x) \approx f(x_1) + f'(x_1)(x-x_1) + \frac{1}{2}f''(x_1)(x-x_1)^2$$

We might think of $p_2(x)$ as a second-order discrete Taylor series.


If we now add a fourth data point $(x_4,y_4)$ we will need a third-order interpolating polynomial $p_3(x)$, which we write as

$$p_3(x) = c_1 + c_2(x-x_1) + c_3(x-x_1)(x-x_2) + c_4(x-x_1)(x-x_2)(x-x_3)$$

As before, all the coefficients except the new one will be unchanged, so we need only calculate the single new coefficient $c_4$.
Suppose instead we skip calculating the polynomials $p_0(x), p_1(x), p_2(x)$ and want to calculate $p_3(x)$ directly from the four data points. The four coefficients are determined by the conditions

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & (x_2-x_1) & 0 & 0 \\ 1 & (x_3-x_1) & (x_3-x_1)(x_3-x_2) & 0 \\ 1 & (x_4-x_1) & (x_4-x_1)(x_4-x_2) & (x_4-x_1)(x_4-x_2)(x_4-x_3) \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} \quad (4)$$

The matrix is lower-triangular, so we can solve it using forward substitution. However, if we solve it using Gauss-Jordan elimination an interesting pattern emerges.


Fig. 3: Monomial (thick green line), Lagrange (medium blue line) and Newton (thin dashed red line) polynomials. At left N=11 points (hence 10th order polynomials). Agreement is good. At right N=13 points (12th order polynomials). The monomial polynomial fails to interpolate the data. The y data values were chosen randomly. The x values are 0,1,2,...,N-1.

The augmented matrix is

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 1 & (x_2-x_1) & 0 & 0 & y_2 \\ 1 & (x_3-x_1) & (x_3-x_1)(x_3-x_2) & 0 & y_3 \\ 1 & (x_4-x_1) & (x_4-x_1)(x_4-x_2) & (x_4-x_1)(x_4-x_2)(x_4-x_3) & y_4 \end{array}\right)$$

Subtracting row 1 from rows 2, 3 and 4:

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & (x_2-x_1) & 0 & 0 & y_2-y_1 \\ 0 & (x_3-x_1) & (x_3-x_1)(x_3-x_2) & 0 & y_3-y_1 \\ 0 & (x_4-x_1) & (x_4-x_1)(x_4-x_2) & (x_4-x_1)(x_4-x_2)(x_4-x_3) & y_4-y_1 \end{array}\right)$$

Normalizing rows 2, 3 and 4 by the element in the 2nd column:

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & 1 & 0 & 0 & \frac{y_2-y_1}{x_2-x_1} \\ 0 & 1 & (x_3-x_2) & 0 & \frac{y_3-y_1}{x_3-x_1} \\ 0 & 1 & (x_4-x_2) & (x_4-x_2)(x_4-x_3) & \frac{y_4-y_1}{x_4-x_1} \end{array}\right)$$

Note the cancellation of the term $(x_3-x_1)$ in row 3 and $(x_4-x_1)$ in row 4. Now subtract row 2 from rows 3 and 4.


$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & 1 & 0 & 0 & \frac{y_2-y_1}{x_2-x_1} \\ 0 & 0 & (x_3-x_2) & 0 & \frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1} \\ 0 & 0 & (x_4-x_2) & (x_4-x_2)(x_4-x_3) & \frac{y_4-y_1}{x_4-x_1} - \frac{y_2-y_1}{x_2-x_1} \end{array}\right)$$

Normalizing rows 3 and 4 by the element in the 3rd column:

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & 1 & 0 & 0 & \frac{y_2-y_1}{x_2-x_1} \\ 0 & 0 & 1 & 0 & \left(\frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_3-x_2) \\ 0 & 0 & 1 & (x_4-x_3) & \left(\frac{y_4-y_1}{x_4-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_4-x_2) \end{array}\right)$$

Note the cancellation in row 4. Subtract row 3 from row 4

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & 1 & 0 & 0 & \frac{y_2-y_1}{x_2-x_1} \\ 0 & 0 & 1 & 0 & \left(\frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_3-x_2) \\ 0 & 0 & 0 & (x_4-x_3) & \left(\frac{y_4-y_1}{x_4-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_4-x_2) - \left(\frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_3-x_2) \end{array}\right)$$

Normalize row 4 by its element in column 4


$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & y_1 \\ 0 & 1 & 0 & 0 & \frac{y_2-y_1}{x_2-x_1} \\ 0 & 0 & 1 & 0 & \left(\frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_3-x_2) \\ 0 & 0 & 0 & 1 & \left[\left(\frac{y_4-y_1}{x_4-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_4-x_2) - \left(\frac{y_3-y_1}{x_3-x_1} - \frac{y_2-y_1}{x_2-x_1}\right)\Big/(x_3-x_2)\right]\Big/(x_4-x_3) \end{array}\right)$$

The 5th column now contains the coefficients of the Newton polynomial. Following through these
steps we arrive at the very compact algorithm for calculating the Newton coefficients.
Calculating Newton coefficients
c = y(:); //c is a column vector of the y values
for i=1:n-1
for j=i+1:n
c(j) = (c(j)-c(i))/(x(j)-x(i));
end
end
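Running this snippet on the data of Examples 1 and 2 is a quick check:

x = [0;1;2]; y = [2;1;3]; n = length(x);
c = y(:);
for i=1:n-1
    for j=i+1:n
        c(j) = (c(j)-c(i))/(x(j)-x(i));
    end
end
disp(c'); // 2. -1. 1.5, i.e. p(x) = 2 - x + 1.5 x(x-1) = 2 - (5/2)x + (3/2)x^2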

This is used in the Scilab program interpNewton which appears in Appendix 2. This
interpolates n sample points using the Newton method.

4.4 Numerical considerations and polynomial wiggle


In principle, for a given set of data the monomial, Lagrange and Newton interpolating
polynomials are identical. They are merely expressed in different formats. However,
numerically the monomial polynomial method suffers from round-off error more than the other
two methods. This is illustrated in Fig. 3. In these examples we see that for higher than about a
10th order polynomial the monomial basis fails to accurately interpolate the data. The problem is
that the Vandermonde matrix in (1) tends to become nearly singular for large n. For this reason it
is not recommended for use except for low-order problems, if at all.
The Lagrange and Newton polynomials are both quite stable numerically. The form of the Newton algorithm allows it to be implemented with fewer floating-point operations, and so it tends to be faster, especially at high order. Numerically, therefore, Newton polynomials are arguably the best choice for polynomial interpolation.
Even without round-off error, high-order polynomials tend to produce unsatisfactory interpolation. The problem is polynomial wiggle. This can be seen in the right graph of Fig. 3. Near the middle x values the interpolation seems reasonable, but at the left and right extremes it oscillates between very large y values, much larger than any of the sample y values. Very slight changes in the y values of the sample points can cause very large changes in this wiggle effect. Although we are guaranteed to be able to find an (n-1)th order polynomial passing through n samples, there is no guarantee that it will be smooth. In fact for large n it will typically display these large oscillations. We consider alternative interpolation methods in the next lecture.

5 Appendix 1: Lagrange polynomials for equally spaced points


Suppose we have n sample points ( x 1 , y 1) ,( x 2 , y 2 ) , ,( x n , y n ) with uniformly spaced x values
given by x i= x1+(i1)h . Define
t=

xx 1
h

so that
x= x1+h t
Written at (t i , y i) samples, our data have the form
(0 , y1 ),(1 , y 2) , ,(n1 , y n )
Suppose we fit a polynomial y= p(t) through these data. Then the polynomial

( )

y=q (x)= p (t)= p

x x1
h

interpolates the ( x i , y i ) samples. We limit consideration to the Lagrange basis, as it is the most
useful theoretically. Here we list the p (t) polynomials of orders 0,1,2,3,4 which interpolate
n=1,2,3,4,5 sample points. These can easily be derived from formulas (2) and (3).
For n=1

$$p_0(t) = y_1$$

For n=2

$$p_1(t) = y_2 t - y_1(t-1)$$

For n=3

$$p_2(t) = \frac{1}{2}y_3 t(t-1) - y_2 t(t-2) + \frac{1}{2}y_1(t-1)(t-2)$$

For n=4

$$p_3(t) = \frac{1}{6}y_4 t(t-1)(t-2) - \frac{1}{2}y_3 t(t-1)(t-3) + \frac{1}{2}y_2 t(t-2)(t-3) - \frac{1}{6}y_1(t-1)(t-2)(t-3)$$

For n=5

$$p_4(t) = \frac{1}{24}y_5 t(t-1)(t-2)(t-3) - \frac{1}{6}y_4 t(t-1)(t-2)(t-4) + \frac{1}{4}y_3 t(t-1)(t-3)(t-4) - \frac{1}{6}y_2 t(t-2)(t-3)(t-4) + \frac{1}{24}y_1(t-1)(t-2)(t-3)(t-4)$$


6 Appendix 2: Scilab code


6.1 Monomial-basis polynomial interpolation

//////////////////////////////////////////////////////////////////////
// interpMonomial.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes only
// Given n samples x(i),y(i), in the column vectors x,y
// calculate the coefficients c(i) of the (n-1) order
// monomial interpolating polynomial and evaluate at points xp.
//////////////////////////////////////////////////////////////////////
function yp = interpMonomial(x, y, xp)
n = length(x); //x and y must be column vectors of length n
A = ones(x); //build up the Vandermonde matrix A
for k=1:n-1
A = [A,x.^k]; //each column is a power of the column vector x
end
c = A\y; //solve for coefficients
yp = ones(xp)*c(1); //evaluate polynomial at desired points
for k=2:n
yp = yp+c(k)*xp.^(k-1);
end
endfunction

6.2 Lagrange-basis polynomial interpolation



//////////////////////////////////////////////////////////////////////
// interpLagrange.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes only
// Given n samples x(i),y(i), in the column vectors x,y
// evaluate the Lagrange interpolating polynomial at points xp.
//////////////////////////////////////////////////////////////////////
function yp = interpLagrange(x, y, xp)
n = length(x);
yp = zeros(xp);
for k=1:n //form Lagrange polynomial L_k
L = 1;
for i=1:n
if (i~=k)
L = L.*(xp-x(i))/(x(k)-x(i));
end
end
yp = yp+y(k)*L;
end
endfunction


6.3 Newton-basis polynomial interpolation



//////////////////////////////////////////////////////////////////////
// interpNewton.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes
// Given n-dimensional vectors x and y, compute the coefficients
// c(1), c(2), ..., c(n) of the Newton interpolating polynomial y=p(x)
// and evaluate at points xp.
//////////////////////////////////////////////////////////////////////
function yp = interpNewton(x, y, xp)
n = length(y);
c = y(:);
for i=1:n-1
for j=i+1:n
c(j) = (c(j)-c(i))/(x(j)-x(i));
end
end
yp = ones(xp)*c(1);
u = ones(xp);
for i=2:n
u = u.*(xp-x(i-1));
yp = yp+c(i)*u;
end
endfunction

6.4 Coefficients of Newton polynomial



//////////////////////////////////////////////////////////////////////
// interpNewtonCoeffs.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes only
// Given n samples x(i),y(i), in the column vectors x,y
// calculate the coefficients c(i) of the
// (n-1) order interpolating Newton polynomial
//////////////////////////////////////////////////////////////////////
function c = interpNewtonCoeffs(x, y)
n = length(y);
c = y(:);
for j=2:n
for i=n:-1:j
c(i) = (c(i)-c(i-1))/(x(i)-x(i-j+1));
end
end
endfunction


Lecture 15
Interpolation II
1 Introduction
In the previous lecture we focused primarily on polynomial interpolation of a set of n points. A
difficulty we observed is that when n is large, our polynomial has to be of high order, namely $n-1$. Unfortunately, high-order polynomials tend to suffer from wiggle, and this limits their practical usefulness for interpolation. In this lecture we will explore how we can use polynomials of moderate order to achieve smooth interpolations while avoiding the problems associated with high-order polynomials.

2 Piecewise polynomial interpolation: Hermite splines


What we've previously called linear interpolation is more precisely piecewise-linear interpolation. We don't interpolate the entire set of points with a single line. Instead, we use different line segments over different intervals between sample points (Fig. 1). The complete interpolation is built by tying together these lines. In fact the sample points where lines join or tie together, $(x_i,y_i)$, $i = 2, 3, \ldots, n-1$, are appropriately called knots.

Fig. 1: Piecewise interpolation: linear (left) and cubic (right). The sample points at which
pieces join (or tie together) are called knots.

The entire interpolation function is described by

$$y = f(x) = a_i x + b_i \quad \text{if } x_i \le x < x_{i+1}$$

and the $a_i, b_i$ values are determined by the conditions

$$y_i = a_i x_i + b_i$$
$$y_{i+1} = a_i x_{i+1} + b_i$$

We can express a linear function in different forms, one of which might be more convenient for determining coefficients. We could write the ith segment in the form of a 1st order Taylor series expanded about the point $x_i$,

$$y = y_i + y_i'(x - x_i)$$

This trivially satisfies $y = y_i$ when $x = x_i$. For $x = x_{i+1}$ the requirement

$$y_{i+1} = y_i + y_i'(x_{i+1} - x_i)$$

provides the $y_i'$ value

$$y_i' = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}$$

Alternately, the Lagrange interpolating polynomial can be written by inspection as

$$y = y_i \frac{x - x_{i+1}}{x_i - x_{i+1}} + y_{i+1} \frac{x - x_i}{x_{i+1} - x_i}$$

We now extend this idea of piecewise interpolation to polynomials. While a piecewise linear interpolation is continuous, the derivative is clearly not continuous at the sample points. Suppose now that for each $x_i$ we know both the function value $y_i$ and the function slope $y_i'$. Let's build a piecewise polynomial interpolation that has the specified function and slope values at the knots. These polynomial pieces are known as splines. This term comes from the practice of bending strips of wood or plastic to form smooth curves, a technique often used in ship building and pre-computer-era drafting.
For each segment we have four equations to satisfy: the two endpoint function values and the two endpoint slope values. Our interpolation function must therefore have four unknown coefficients. Since a 3rd order polynomial has four coefficients we write (Fig. 1)

$$f(x) = a_i x^3 + b_i x^2 + c_i x + d_i \quad \text{if } x_i \le x < x_{i+1}$$

In each interval we have four unknowns $a_i, b_i, c_i, d_i$ satisfying four equations

$$y_i = a_i x_i^3 + b_i x_i^2 + c_i x_i + d_i$$
$$y_i' = 3a_i x_i^2 + 2b_i x_i + c_i$$
$$y_{i+1} = a_i x_{i+1}^3 + b_i x_{i+1}^2 + c_i x_{i+1} + d_i$$
$$y_{i+1}' = 3a_i x_{i+1}^2 + 2b_i x_{i+1} + c_i \quad (1)$$

We could solve these four equations in four unknowns directly. Instead we'll apply some
bookkeeping to obtain a bit cleaner approach.
To simplify analysis over an arbitrary interval $x_i \le x < x_{i+1}$ it's a good idea to form the normalized variable

$$u = \frac{x - x_i}{x_{i+1} - x_i} = \frac{x - x_i}{h_i} \quad (2)$$

As x varies over $x_i \le x \le x_{i+1}$, u varies over $0 \le u \le 1$. Note that

$$\frac{du}{dx} = \frac{1}{h_i}$$

so

$$y' = \frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = \frac{1}{h_i}\frac{dy}{du}$$

or

$$\frac{dy}{du} = h_i y'$$

We now write

$$y = a + b u + c u^2 + d u^3$$

$$\frac{dy}{du} = b + 2 c u + 3 d u^2$$

Our system of equations is now (evaluating the above equations at $u = 0, 1$)

$$y_i = a$$
$$h_i y_i' = b$$
$$y_{i+1} = a + b + c + d$$
$$h_i y_{i+1}' = b + 2c + 3d \quad (3)$$

The simplification from (1) is significant. These can easily be solved to give

$$a = y_i$$
$$b = h_i y_i'$$
$$c = -3 y_i - 2 h_i y_i' + 3 y_{i+1} - h_i y_{i+1}'$$
$$d = 2 y_i + h_i y_i' - 2 y_{i+1} + h_i y_{i+1}'$$

and the interpolating cubic is

$$y = y_i + h_i y_i' u + (-3 y_i - 2 h_i y_i' + 3 y_{i+1} - h_i y_{i+1}') u^2 + (2 y_i + h_i y_i' - 2 y_{i+1} + h_i y_{i+1}') u^3$$


It can be convenient to separate out the various y terms to obtain

$$y = y_i(1 - 3u^2 + 2u^3) + h_i y_i'(u - 2u^2 + u^3) + y_{i+1}(3u^2 - 2u^3) + h_i y_{i+1}'(-u^2 + u^3)$$

A bit of factoring puts this in a more compact form

$$y = (1-u)^2\left[y_i(1+2u) + h_i y_i' u\right] + u^2\left[y_{i+1}(3-2u) - h_i y_{i+1}'(1-u)\right]$$


The result is the so-called Hermite spline interpolation algorithm.

Hermite spline interpolation
Given $(x_1, y_1, y_1'), (x_2, y_2, y_2'), \ldots, (x_n, y_n, y_n')$, with increasing x values
For a value x
Find $x_i$ such that $x_i < x < x_{i+1}$
Set $h_i = x_{i+1} - x_i$ and $u = (x - x_i)/h_i$
Calculate $y = (1-u)^2\left[y_i(1+2u) + h_i y_i' u\right] + u^2\left[y_{i+1}(3-2u) - h_i y_{i+1}'(1-u)\right]$



Fig. 2: $y = 3e^{-x}\sin x$. Solid circles: sample points, squares: function values, dashed line: linear interpolation, solid line: Hermite-spline interpolation. Derivative values were estimated numerically.

Considering the entire interpolation algorithm as a function, we write $y = S(x)$. A Scilab function to perform Hermite spline interpolation is given in Appendix 1 as interpHermite, and an example of Hermite spline interpolation is shown in Fig. 2.
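A usage sketch of interpHermite on samples of sin(x); here the exact derivatives cos(x) are supplied for y1 rather than numerical estimates:

x = (0:5)'; y = sin(x); y1 = cos(x);  // samples and slopes
xp = (0:0.05:5)';
yp = interpHermite(x, y, y1, xp);
plot(xp, yp, x, y, 'o');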

If we had samples of the form $(x_i, y_i, y_i', y_i'')$ we could find 5th order interpolation polynomials for each interval, and so on, in principle, for any number of known derivatives at each sample point. If we have function and derivative values up to $d^m y / dx^m$, the two endpoints of each interval will provide $2(m+1)$ equations. A polynomial with this many coefficients has order $n = 2m+1$.

3 Cubic splines
If we know function and derivative values at n points, we can interpolate each interval with
Hermite splines. Often, however, we only know the function values and not the derivative values.
This provides only enough information to uniquely determine a piecewise-linear interpolation.
But the smoothness of a piecewise-cubic interpolation is highly desirable, and we would like to
find a way to keep that property even when we lack derivative information. We will refer to
piecewise cubic interpolation without specific derivative values as cubic splines.

3.1 Smoothest Hermite spline interpolation

One approach would be to treat the $y_i'$ as unknowns and find the values that optimize some desirable property of the curve. Smoothness is an intuitively appealing property to have. A smooth curve is one in which the slope does not change rapidly. A sudden change in slope produces a kink in the curve, which is about as unsmooth as you can get.
Therefore the second derivative, the rate of change of the slope, should be small. Let's write the integral of the square of the second derivative of $S(x)$ as a function of the unknown $y_i'$

values:

$$\Phi(y_1', y_2', \ldots, y_n') = \int_{x_1}^{x_n} \left[S''(x)\right]^2 dx \quad (4)$$

Using the notation

$$S_i(u) = (1-u)^2\left[y_i(1+2u) + h_i y_i' u\right] + u^2\left[y_{i+1}(3-2u) - h_i y_{i+1}'(1-u)\right]$$

for u as given in (2), we write (4) as

$$\Phi = \sum_{i=1}^{n-1} \frac{1}{h_i^3} \int_0^1 \left[\frac{d^2}{du^2} S_i(u)\right]^2 du$$
For evenly spaced samples where $h_i = h$, we show in Appendix 2 that minimizing $\Phi$ leads to the equations

$$y_2' + 2y_1' = \frac{3}{h}(y_2 - y_1)\,, \qquad y_n' + 2y_{n-1}' = \frac{3}{h}(y_n - y_{n-1}) \quad (5)$$

$$y_{i+1}' + 4y_i' + y_{i-1}' = \frac{3}{h}(y_{i+1} - y_{i-1}) \quad \text{for } i = 2, 3, \ldots, n-1 \quad (6)$$

for the $y_i'$ values. The case of nonuniform samples is similar but a bit messier because we have to keep track of different h values.

3.2 Continuity of second derivatives


Another approach is to require that not only the first but also the second derivatives of the interpolation be continuous at the knots $x_2, x_3, \ldots, x_{n-1}$. There are $n-2$ knots, since the endpoints $x_1, x_n$ are not knots (no other pieces connect there). Continuity of the second derivatives at the knots provides $n-2$ equations. For uniformly spaced samples these turn out to be (6). This tells us that the smoothest Hermite spline interpolation we derived previously also results in continuous second derivatives at all knots.
We then need two more equations to obtain a unique solution. So-called natural end conditions are obtained by setting $S''(x) = 0$ at the endpoints $x_1, x_n$; in other words, we let the interpolation go straight at both ends. This leads to equations (5). Therefore a cubic spline interpolation with natural end conditions is precisely the optimally smooth Hermite spline interpolation we derived above.
Another option is to specify the end-point slopes $y_1', y_n'$. This is called the fixed-slope end condition. If we have a good estimate of these slopes then this makes sense. Otherwise the choice is arbitrary.
Finally we can choose the so-called not-a-knot conditions, where we require the third derivative of the interpolation to be continuous at the first and last knots. At these knots, therefore, the cubic functions and their first, second and third derivatives are continuous. But cubics that agree in this manner are simply the same cubic; there is no other possibility. So what used to be the first and last knots are no longer knots, hence the name not-a-knot. For the uniformly sampled case these equations read

Fig. 3: A case where natural conditions produce a more accurate interpolation than not-a-knot conditions.

$$y_1' + 4y_2' + y_3' = \frac{3}{h}(y_3 - y_1) \quad \text{and} \quad y_{n-2}' + 4y_{n-1}' + y_n' = \frac{3}{h}(y_n - y_{n-2}) \quad (7)$$

Which end conditions should we choose? The natural conditions are attractive because of their maximally smooth feature. However, in many cases the not-a-knot conditions provide a more accurate interpolation. It depends on the underlying function $f(x)$ (see Fig. 3 and Fig. 4). Actual functions are not necessarily "as smooth as possible"! Common practice is to use the not-a-knot conditions. In practice the two end conditions produce very similar results except, possibly, in the first and last intervals.

3.3 Optimal smoothness of natural cubic spline interpolation


We've spent a lot of time working with cubic splines. We've shown that natural cubic splines are the smoothest-possible piecewise-cubic interpolation of any set of points. If smoothness is so desirable, why not try piecewise interpolation with even higher-order polynomials? It turns out that no other interpolation is smoother than a natural cubic spline. This rather remarkable result tells us that, as far as smoothness is concerned, cubic splines are the best we can do.
Suppose that $S(x)$ is the natural cubic spline interpolation of the n samples $(x_i,y_i)$. Let $f(x)$ be any twice-differentiable function that also interpolates the samples (this could even be the original function from which the samples were drawn). Then one can show that

$$\int_{x_1}^{x_n} \left[S''(x)\right]^2 dx \le \int_{x_1}^{x_n} \left[f''(x)\right]^2 dx$$

This is the sense in which we can say that no function provides a "smoother" interpolation of a set of data points than does the natural cubic spline. However, as shown in Fig. 3 and Fig. 4, smoother is not necessarily better.


Fig. 4 A case where not-a-knot conditions produce a more accurate interpolation than
natural conditions.

4 The interp1 function (Scilab/Matlab)


As for most numerical methods we study, Scilab and Matlab have built-in functions offering
state-of-the-art implementations. The following command
yp = interp1(x,y,xp,str); //str = 'nearest' or 'linear' or 'spline'

implements one-dimensional interpolation of either nearest-neighbor, linear or spline type. For spline interpolation not-a-knot end conditions are used. Here the vectors x,y are sample data, xp is the vector of x values where we want interpolations, and yp is the vector of interpolated values. This is sufficient for almost all one-dimensional interpolation needs.

Fig. 5: Ten randomly selected points (dots). Thick (green) line: ninth-order polynomial. Thin (blue) line: not-a-knot spline. Thin dashed (red) line: natural spline.

With regards to "wiggle", the advantage of splines over high-order polynomials for interpolation is illustrated in Fig. 5. Notice too that the difference between the two spline end conditions is significant only in the first and last intervals. These data points were chosen at random so there is no actual underlying $f(x)$. However, the spline interpolations certainly appear more realistic.
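A short usage sketch (the sample data are arbitrary):

x = (0:5)'; y = [0; 0.9; 0.4; 0.1; -0.1; -0.05]; // arbitrary samples
xp = (0:0.05:5)';
yn = interp1(x, y, xp, 'nearest');
yl = interp1(x, y, xp, 'linear');
ys = interp1(x, y, xp, 'spline');  // not-a-knot end conditions
plot(xp, [yn yl ys], x, y, 'o');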

5 Lanczos (convolution) interpolation


Fig. 6: Sample data represented as impulses or "poles" of height $y_i$.

The methods we have considered so far apply to an arbitrary set of samples $(x_i, y_i)$. If the x values are uniformly spaced, however, then some ideas from signal processing can be applied. We turn to that now. Let's suppose we have uniformly spaced samples with $x_i = x_1 + (i-1)h$. We can visualize our data as shown in Fig. 6.
Imagine an impulse or "pole" of height $y_i$ erected vertically with its base on the ground at location $x_i$. For our purposes, convolution is the process of replacing each impulse with a common impulse response function, centered at the impulse and scaled by the height $y_i$. The triangle function $\Lambda(x)$ is shown in Fig. 7.

Fig. 7: The triangle function $\Lambda(x)$.

Fig. 8: Convolution of impulses with triangle function. Thick gray curve is sum of all triangle functions and interpolates the data points.

If we replace each impulse by a stretched version of the triangle function

$$\Lambda(x/h) = \begin{cases} 1 - |x|/h & |x| \le h \\ 0 & |x| > h \end{cases}$$

scaled by $y_i$, then the sum of these

$$f(x) = \sum_{i=1}^{n} y_i\,\Lambda\left(\frac{x - x_i}{h}\right)$$

produces the interpolation shown in Fig. 8. We recognize this as the piecewise linear interpolation of the data with the addition of linear extrapolations at the two ends. This naturally leads us to wonder if using a different impulse response function might produce a better interpolation.
Thinking of x as time and y as the amplitude of an audio signal, there is a remarkable theorem due to Nyquist which says that provided: 1) the signal from which the audio samples were drawn contains frequency components only within a limited range, and 2) the sample separation h is properly chosen, then a convolution interpolation using the sinc function (pronounced "sink") will exactly recreate the original function $f(x)$. Mathematically

$$y = f(x) = \sum_{i=-\infty}^{\infty} y_i\,\mathrm{sinc}\left(\frac{x - x_i}{h}\right)$$

The sinc function is

$$\mathrm{sinc}(x) = \frac{\sin \pi x}{\pi x}$$

and is plotted in Fig. 9. Note that $\mathrm{sinc}(0) = 1$ and $\mathrm{sinc}(n) = 0$ for n a non-zero integer.

Fig. 9: The sinc function.
Unfortunately the sinc function extends to $x \to \pm\infty$, although the amplitude of the bumps drops off as $1/|x|$. If we are interpolating many points, we'll have to add a contribution from each point. A compromise proposed by Lanczos is to "window" the sinc function by another sinc function to produce the Lanczos kernel

$$L(x) = \begin{cases} \mathrm{sinc}(x)\,\mathrm{sinc}(x/a) & |x| \le a \\ 0 & |x| > a \end{cases}$$

where a is typically chosen to be a small integer (most often 2 or 3). This is plotted in Fig. 10.
Fig. 10: The Lanczos kernel for a = 1, 2, 3.

The Lanczos interpolation is

$$y = \sum_{j=i+1-a}^{i+a} y_j\, L\left(\frac{x - x_j}{h}\right)$$

where i is the index such that $x_i \le x < x_{i+1}$. In this expression only the 2a nearest sample points contribute to the interpolation at a given value of x. An example of Lanczos interpolation (with a=3) is shown in Fig. 11.

Fig. 11: Seven data points and Lanczos interpolation.
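A minimal sketch of this scheme for uniformly spaced samples (the helper names are mine; sample indices outside the grid are skipped, which is the same as treating those y values as zero):

// Lanczos kernel L(t) for scalar t
function y = lanczos1(t, a)
    if t == 0 then
        y = 1;                 // limit value: sinc(0) = 1
    elseif abs(t) < a then
        y = (sin(%pi*t)/(%pi*t))*(sin(%pi*t/a)/(%pi*t/a));
    else
        y = 0;
    end
endfunction

// 1D Lanczos interpolation for uniformly spaced samples x, values y
function yp = interpLanczos1D(x, y, xp, a)
    n = length(x); h = x(2)-x(1);
    yp = zeros(xp);
    for m = 1:length(xp)
        i = min(max(1+int((xp(m)-x(1))/h), 1), n-1); // interval index
        for j = max(i+1-a, 1):min(i+a, n)            // 2a nearest samples
            yp(m) = yp(m) + y(j)*lanczos1((xp(m)-x(j))/h, a);
        end
    end
endfunction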

6 Appendix 1: Scilab code


6.1 Hermite spline interpolation

//////////////////////////////////////////////////////////////////////
// interpHermite.sci
// 2014-06-25, Scott Hudson, for pedagogic purposes
// Given n samples x(i),y(i),y1(i) in the column vectors x,y,y1
// where y(i)=f(x(i)) and y1(i) is the derivative of f(x) at x(i),
// interpolate at points xp using Hermite splines.
// Note: x and xp values must be in ascending order.
//////////////////////////////////////////////////////////////////////
function yp=interpHermite(x, y, y1, xp)
n = length(x);
m = length(xp);
yp = zeros(xp);
i = 1; //start linear search at first element
for j=1:m
while (xp(j)>x(i+1)) //find i so that x(i)<=xp(j)<=x(i+1)
i = i+1;
end
h = x(i+1)-x(i);
t = (xp(j)-x(i))/h;
yp(j) = (t-1)^2*(y(i)*(2*t+1)+y1(i)*h*t) ..
+t^2*(y(i+1)*(3-2*t)+y1(i+1)*h*(t-1));
end
endfunction


7 Appendix 2: Smoothest Hermite spline interpolation


Assume we have n samples $(x_i, y_i)$, and the x values are equally spaced, $x_i = x_1 + (i-1)h$. Let

$$\Phi = \sum_{i=1}^{n-1} \frac{1}{h^3} \int_0^1 \left[\frac{d^2}{dt^2} S_i(t)\right]^2 dt$$

where

$$S_i(t) = (t-1)^2\left[y_i(2t+1) + y_i' h t\right] + t^2\left[y_{i+1}(3-2t) + y_{i+1}' h(t-1)\right]$$

One can show that (a computer algebra program helps!)

$$\int_0^1 \left[\frac{d^2}{dt^2} S_i(t)\right]^2 dt = 12(y_{i+1} - y_i)^2 - 12h(y_{i+1}' + y_i')(y_{i+1} - y_i) + 4h^2\left[(y_i')^2 + y_i' y_{i+1}' + (y_{i+1}')^2\right]$$

For a knot $1 < i < n$, $y_i'$ appears in $S_{i-1}$ and $S_i$. Calling

$$4w = \frac{1}{h}\int_0^1 \left[\frac{d^2}{dt^2} S_{i-1}(t)\right]^2 dt + \frac{1}{h}\int_0^1 \left[\frac{d^2}{dt^2} S_i(t)\right]^2 dt$$

we have

$$w = 3(y_i - y_{i-1})^2 - 3h(y_i' + y_{i-1}')(y_i - y_{i-1}) + h^2\left[(y_{i-1}')^2 + y_{i-1}' y_i' + (y_i')^2\right] + 3(y_{i+1} - y_i)^2 - 3h(y_{i+1}' + y_i')(y_{i+1} - y_i) + h^2\left[(y_i')^2 + y_i' y_{i+1}' + (y_{i+1}')^2\right]$$

Setting

$$\frac{\partial w}{\partial y_i'} = h^2(y_{i+1}' + 4y_i' + y_{i-1}') + 3h(y_{i-1} - y_{i+1}) = 0$$

we have

$$y_{i+1}' + 4y_i' + y_{i-1}' = \frac{3}{h}(y_{i+1} - y_{i-1}) \quad (8)$$

For the first interval

$$\frac{\partial}{\partial y_1'}\left\{3(y_2 - y_1)^2 - 3h(y_1' + y_2')(y_2 - y_1) + h^2\left[(y_1')^2 + y_1' y_2' + (y_2')^2\right]\right\} = 0$$

gives us

$$y_2' + 2y_1' = \frac{3}{h}(y_2 - y_1) \quad (9)$$

while for the last interval we find

$$y_n' + 2y_{n-1}' = \frac{3}{h}(y_n - y_{n-1}) \quad (10)$$


Lecture 16
Interpolation in 2D
1 Introduction
The two-dimensional (2D) interpolation problem is as follows. We are given n samples $(x_i, y_i, z_i)$ assumed to be drawn from some function $z = f(x,y)$. How can we best estimate $z = f(x,y)$ for arbitrary x and y values?
In many practical cases our samples are drawn from a uniform rectangular grid, which allows us to separate the bookkeeping for x and y values. A digital photograph, with its distinct rows and columns, is an example. We will limit consideration to this case and restate the problem as follows. Given nm samples of a 2D function

$$z_{ij} = f(x_i, y_j)\,, \quad i = 1, 2, \ldots, m\,,\ j = 1, 2, \ldots, n \quad (1)$$

where

$$x_i = x_1 + (i-1)\Delta x\,, \qquad y_j = y_1 + (j-1)\Delta y \quad (2)$$

how can we best estimate $z = f(x,y)$ for arbitrary x, y values?

2 Bookkeeping
For a uniform rectangular grid a point $(x,y)$ will fall inside of a single rectangle $x_i \le x < x_{i+1}$, $y_j \le y < y_{j+1}$ as shown in Fig. 1. We will call each of these rectangles a unit cell. Even more than in the 1D case, it is convenient to define normalized coordinates

$$u = \frac{x - x_i}{x_{i+1} - x_i}\,, \quad v = \frac{y - y_j}{y_{j+1} - y_j} \quad (3)$$

so that $x_i \le x < x_{i+1}$, $y_j \le y < y_{j+1}$ corresponds to $0 \le u < 1$, $0 \le v < 1$.


As shown in Fig. 1, the distance $x - x_1$ can be broken up into an integer number of $\Delta x$ steps plus a fractional $\Delta x$ step

Fig. 1: Left: A point (x,y) will fall within a unit cell, in the case shown cell i=3, j=2. Position in the cell is specified by (u,v). Right: The integer part of $(x-x_1)/\Delta x$ determines i while the fractional part determines u.


$$x - x_1 = (i-1)\Delta x + u\,\Delta x \quad (4)$$

Rearranging we have

$$(i-1) + u = \frac{x - x_1}{\Delta x} \quad (5)$$

where $(i-1)$ is an integer and $0 \le u < 1$ is a fractional value. In Scilab the command int(z) returns the integer part of a real number. Given an x value we can solve for i and u using

$$i = 1 + \mathrm{int}\left(\frac{x - x_1}{\Delta x}\right)\,, \quad u = \frac{x - x_1}{\Delta x} + 1 - i \quad (6)$$

Likewise, in the y direction we have

$$j = 1 + \mathrm{int}\left(\frac{y - y_1}{\Delta y}\right)\,, \quad v = \frac{y - y_1}{\Delta y} + 1 - j \quad (7)$$

Function interpUnitCell in the Appendix implements this bookkeeping. Below, we will


assume that the values i,j,u,v have been calculated for the given (x , y ) at which we are
computing the interpolation.
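A quick numerical check of (6): with $x_1 = 0$, $\Delta x = 0.5$ and $x = 1.3$, $(x - x_1)/\Delta x = 2.6$, so $i = 3$ and $u = 0.6$:

x1 = 0; Dx = 0.5; x = 1.3;
t = (x - x1)/Dx;  // 2.6
i = 1 + int(t)    // 3
u = t + 1 - i     // 0.6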

3 Nearest neighbor interpolation


This is an obvious extension of the 1D case. We find the grid point closest to $(x,y)$ and use the z value at that grid point as our interpolation. That grid point will be one of the corners of the unit cell. With our bookkeeping this reads

Nearest-neighbor interpolation
if $u \le 0.5$ then $k = i$ else $k = i+1$
if $v \le 0.5$ then $l = j$ else $l = j+1$
$z = z_{kl}$

This interpolation is piecewise constant and discontinuous. Function interpNeighbor2D in the appendix implements this algorithm.

4 Linear interpolation over triangles


A 2D linear function has the form

$$z = f(u,v) = a + bu + cv \quad (8)$$

and defines a plane in 3D space. A plane cannot pass through four arbitrary points, so a function of this form cannot represent $f(x,y)$ over a unit cell with four corners. However, we can divide a unit cell into two triangles, as shown in Fig. 2. Each of those triangles has three vertices through which we can pass a plane. For the lower triangle ($v \le u$) this requires

$$z_{i,j} = a\,, \quad z_{i+1,j} = a + b\,, \quad z_{i+1,j+1} = a + b + c \quad (9)$$

Fig. 2: Dividing a unit cell into two triangles.

corresponding to the three vertices $(u,v) = (0,0), (1,0), (1,1)$. The solution is

$$a = z_{i,j}\,, \quad b = z_{i+1,j} - z_{i,j}\,, \quad c = z_{i+1,j+1} - z_{i+1,j} \quad (10)$$

For the upper triangle ($v > u$) the equations are

$$z_{i,j} = a\,, \quad z_{i,j+1} = a + c\,, \quad z_{i+1,j+1} = a + b + c \quad (11)$$

corresponding to the three vertices $(u,v) = (0,0), (0,1), (1,1)$. The solution is

$$a = z_{i,j}\,, \quad c = z_{i,j+1} - z_{i,j}\,, \quad b = z_{i+1,j+1} - z_{i,j+1} \quad (12)$$

The result is summarized as follows.

Linear interpolation over triangles
if $v \le u$ then $z = z_{i,j} + u(z_{i+1,j} - z_{i,j}) + v(z_{i+1,j+1} - z_{i+1,j})$
else $z = z_{i,j} + v(z_{i,j+1} - z_{i,j}) + u(z_{i+1,j+1} - z_{i,j+1})$

This form of linear interpolation is a key component of the Finite Element Method (FEM) for solving partial differential equations. In that case the $z_{ij}$ sample values are treated as unknowns whose values are found by requiring the interpolation to satisfy equations describing the physics of the system. In the FEM the triangles are called finite elements. Function interpTriangle in the appendix implements this algorithm.

5 Bilinear interpolation
The linear function (8) cannot pass through the four corners of a unit cell. However a bilinear function

$$z = f(u,v) = a + bu + cv + duv$$

can, with proper choice of the coefficients a, b, c, d. The equations are

$$z_{i,j} = a$$
$$z_{i+1,j} = a + b$$
$$z_{i,j+1} = a + c$$
$$z_{i+1,j+1} = a + b + c + d \quad (13)$$

corresponding to the four corners $(u,v) = (0,0), (1,0), (0,1), (1,1)$. The solution is

$$a = z_{i,j}$$
$$b = z_{i+1,j} - z_{i,j}$$
$$c = z_{i,j+1} - z_{i,j}$$
$$d = z_{i+1,j+1} + z_{i,j} - z_{i+1,j} - z_{i,j+1} \quad (14)$$

and we have the algorithm

Bilinear interpolation
$z = z_{i,j} + (z_{i+1,j} - z_{i,j})u + (z_{i,j+1} - z_{i,j})v + (z_{i+1,j+1} + z_{i,j} - z_{i+1,j} - z_{i,j+1})uv$
A bilinear function

$$a + bx + cy + dxy \quad (15)$$

is linear in x if y is held constant, say $y = y_0$:

$$(a + cy_0) + (b + dy_0)x \quad (16)$$

and is linear in y if x is held fixed, say $x = x_0$:

$$z = (a + bx_0) + (c + dx_0)y \quad (17)$$

Indeed, one way to think of bilinear interpolation is illustrated in Fig. 3. First perform linear interpolation in u along the top and bottom sides of the cell to get the values

$$z_j = z_{i,j} + (z_{i+1,j} - z_{i,j})u$$
$$z_{j+1} = z_{i,j+1} + (z_{i+1,j+1} - z_{i,j+1})u \quad (18)$$

at the locations marked X. Now perform linear interpolation in v between those values to get

$$z = z_j + (z_{j+1} - z_j)v \quad (19)$$

Fig. 3: Bilinear interpolation can be thought of as a series of 1D linear interpolations.

Substituting (18) into (19) and doing a bit of algebra results in the bilinear interpolation formula. This idea is easily extended into 3 or more dimensions, and Scilab provides a function for performing this calculation. In 2D we execute

[xp,yp] = ndgrid(xx,yy);
zp = linear_interpn(xp,yp,x,y,z);

where x and y are the arrays of sample $x_i, y_j$ values and z is the array of sample $z_{i,j}$ values. The (two-dimensional) arrays xp,yp are the coordinates of the points where we want interpolated values and zp is the array of those values. In this example we used the ndgrid function to convert one-dimensional arrays xx and yy into two-dimensional arrays xp and yp. Function interpBilinear in the appendix implements this algorithm.
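A self-contained usage sketch (the grid and test function are chosen arbitrarily for illustration):

x = 0:0.5:2; y = 0:0.5:3;              // sample grid
[X, Y] = ndgrid(x, y);
z = sin(X).*cos(Y);                     // z(i,j) = f(x(i),y(j))
xx = 0:0.1:2; yy = 0:0.1:3;             // finer evaluation grid
[xp, yp] = ndgrid(xx, yy);
zp = linear_interpn(xp, yp, x, y, z);   // bilinear interpolation
surf(xp, yp, zp);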

6 Bicubic interpolation
In the 1D case piecewise cubic interpolation offered dramatic improvements over piecewise linear interpolation at the expense of extra calculation and bookkeeping. We might expect a similar improvement in the 2D case. In 2D we have to consider bicubic functions over rectangles in the x,y plane. Just as a bilinear expression has a cross term uv, a bicubic expression has various cross terms of the form

$$\begin{matrix} 1 & v & v^2 & v^3 \\ u & uv & uv^2 & uv^3 \\ u^2 & u^2v & u^2v^2 & u^2v^3 \\ u^3 & u^3v & u^3v^2 & u^3v^3 \end{matrix}$$

There are sixteen such terms, and each requires its own coefficient to form the bicubic interpolation

$$z = f(u,v) = \sum_{k=1}^{4}\sum_{l=1}^{4} c_{kl}\, u^{k-1} v^{l-1} \quad (20)$$

There are four corners to a rectangle (Fig. 2), so we need four equations at each corner to get a total of sixteen equations in sixteen unknowns. One way to obtain these equations is to specify the four values

$$z\,,\quad \frac{\partial z}{\partial x}\,,\quad \frac{\partial z}{\partial y}\,,\quad \frac{\partial^2 z}{\partial x\,\partial y}$$

at each sample point. Although the bookkeeping is messy, it's then straightforward to set up equations to solve for the coefficients $c_{kl}$ for each rectangle.
As in 1D, in 2D we most often do not have known derivative values at the sample points, so we must estimate these somehow. A simple approach is to approximate the derivatives using central differences of the z values (we will study central differences in the numerical derivatives lecture). The x and y partial derivatives are approximated by

$$\frac{\partial z}{\partial x} \approx \frac{z(x+\Delta x, y) - z(x-\Delta x, y)}{2\Delta x} = \frac{z_{i+1,j} - z_{i-1,j}}{2\Delta x} \quad (21)$$

and

$$\frac{\partial z}{\partial y} \approx \frac{z(x, y+\Delta y) - z(x, y-\Delta y)}{2\Delta y} = \frac{z_{i,j+1} - z_{i,j-1}}{2\Delta y} \quad (22)$$

The cross derivative is approximated by

$$\frac{\partial^2 z}{\partial x\,\partial y} \approx \frac{1}{(2\Delta x)(2\Delta y)}\left[z_{i+1,j+1} + z_{i-1,j-1} - z_{i-1,j+1} - z_{i+1,j-1}\right] \quad (23)$$

Another approach is to pass 1D splines through the z data in various ways and use those splines to estimate the derivative values. This leads to the idea of bicubic splines. In Scilab we perform bicubic spline interpolation as follows

[xp,yp] = ndgrid(xx,yy);
zp = interp2d(xp,yp,x,y,splin2d(x,y,z));

The x,y,z,xp,yp,zp arrays are the same as in linear_interpn above. The splin2d function calculates the $c_{kl}$ coefficients for each rectangle, which then become the last argument of the interp2d function.

7 Lanczos interpolation
The Lanczos interpolation method readily translates from 1D to 2D. In 2D we write

$$z = \sum_{k=i+1-a}^{i+a}\ \sum_{l=j+1-a}^{j+a} z_{k,l}\, L\left(\frac{x - x_k}{\Delta x}\right) L\left(\frac{y - y_l}{\Delta y}\right) \quad (24)$$

where as before

$$L(x) = \begin{cases} \mathrm{sinc}(x)\,\mathrm{sinc}(x/a) & |x| \le a \\ 0 & |x| > a \end{cases} \quad (25)$$

and

$$\mathrm{sinc}(x) = \frac{\sin \pi x}{\pi x} \quad (26)$$

Typically a=3 is used. In (24) we take $z_{k,l} = 0$ if either k or l falls outside the grid. Function interpLanczos2D in the appendix implements this algorithm.
The various 2D interpolation methods we have looked at are commonly used for image resizing (resampling is just a form of interpolation) in graphics manipulation programs such as Photoshop and Gimp. Fig. 4 shows a screen shot of the Interpolation menu of Gimp. Finally, Fig. 5 compares interpolation performed by the various methods we have been discussing.


Fig. 4: The interpolation menu of Gimp.

Fig. 5: Comparison of various 2D interpolation methods.

8 Appendix: Scilab code


8.1 Unit cell bookkeeping

//////////////////////////////////////////////////////////////////////
// interpUnitCell.sci
// 2014-11-07, Scott Hudson, for pedagogic purposes
// Given sample locations x(i),y(j) 1<=i<=m , 1<=j<=n
// with x and y sorted in ascending order x(1)<x(2)<x(3) etc.
// and an interpolation location xp,yp
// calculate the indices of the unit cell i,j
// and the location u,v within the unit cell
//////////////////////////////////////////////////////////////////////
function [i, j, u, v]=interpUnitCell(x, y, xp, yp);
    m = length(x);
    n = length(y);
    Dx = (x(m)-x(1))/(m-1);
    Dy = (y(n)-y(1))/(n-1);
    t = (xp-x(1))/Dx;
    i = 1+int(t);
    u = t+1-i;
    t = (yp-y(1))/Dy;
    j = 1+int(t);
    v = t+1-j;
    // if xp or yp are outside the x,y grid, move to the grid edge
    if (i>=m)
        i = m-1;
        u = 1;
    elseif (i<=0)
        i = 1;
        u = 0;
    end
    if (j>=n)
        j = n-1;
        v = 1;
    elseif (j<=0)
        j = 1;
        v = 0;
    end
endfunction


8.2 Nearest-neighbor interpolation


//////////////////////////////////////////////////////////////////////
// interpNeighbor2D.sci
// 2014-11-07, Scott Hudson, for pedagogic purposes
// Given samples (x(i),y(j),z(i,j)) 1<=i<=m , 1<=j<=n
// use nearest-neighbor interpolation to estimate
// zp(k,l) = f(xp(k),yp(l)) 1<=k<=p , 1<=l<=q
// x and y must be sorted in ascending order x(1)<x(2)<x(3) etc.
//////////////////////////////////////////////////////////////////////
function zp=interpNeighbor2D(x, y, z, xp, yp);
  m = length(x);
  n = length(y);
  p = length(xp);
  q = length(yp);
  zp = zeros(p,q);
  Dx = (x(m)-x(1))/(m-1);
  Dy = (y(n)-y(1))/(n-1);
  for k=1:p
    for l=1:q
      [i,j,u,v] = interpUnitCell(x,y,xp(k),yp(l));
      if (u<=0.5)
        kk = i;
      else
        kk = i+1;
      end
      if (v<=0.5)
        ll = j;
      else
        ll = j+1;
      end
      zp(k,l) = z(kk,ll);
    end
  end
endfunction


8.3 Linear interpolation over triangles


//////////////////////////////////////////////////////////////////////
// interpTriangle.sci
// 2014-11-07, Scott Hudson, for pedagogic purposes
// Given samples (x(i),y(j),z(i,j)) 1<=i<=m , 1<=j<=n
// use linear interpolation over triangles to estimate
// zp(k,l) = f(xp(k),yp(l)) 1<=k<=p , 1<=l<=q
// x and y must be sorted in ascending order x(1)<x(2)<x(3) etc.
//////////////////////////////////////////////////////////////////////
function zp=interpTriangle(x, y, z, xp, yp);
  m = length(x);
  n = length(y);
  p = length(xp);
  q = length(yp);
  zp = zeros(p,q);
  Dx = (x(m)-x(1))/(m-1);
  Dy = (y(n)-y(1))/(n-1);
  for k=1:p
    for l=1:q
      [i,j,u,v] = interpUnitCell(x,y,xp(k),yp(l));
      if ((i>=1)&(i<=m-1)&(j>=1)&(j<=n-1))
        if (v<=u)
          zp(k,l) = z(i,j)+u*(z(i+1,j)-z(i,j))+v*(z(i+1,j+1)-z(i+1,j));
        else
          zp(k,l) = z(i,j)+v*(z(i,j+1)-z(i,j))+u*(z(i+1,j+1)-z(i,j+1));
        end
      end
    end
  end
endfunction


8.4 Bilinear interpolation


//////////////////////////////////////////////////////////////////////
// interpBilinear.sci
// 2014-11-07, Scott Hudson, for pedagogic purposes
// Given samples (x(i),y(j),z(i,j)) 1<=i<=m , 1<=j<=n
// use bilinear interpolation to estimate
// zp(k,l) = f(xp(k),yp(l)) 1<=k<=p , 1<=l<=q
// x and y must be sorted in ascending order x(1)<x(2)<x(3) etc.
//////////////////////////////////////////////////////////////////////
function zp=interpBilinear(x, y, z, xp, yp);
  m = length(x);
  n = length(y);
  p = length(xp);
  q = length(yp);
  zp = zeros(p,q);
  Dx = (x(m)-x(1))/(m-1);
  Dy = (y(n)-y(1))/(n-1);
  for k=1:p
    for l=1:q
      [i,j,u,v] = interpUnitCell(x,y,xp(k),yp(l));
      if ((i>=1)&(i<=m-1)&(j>=1)&(j<=n-1))
        zp(k,l) = z(i,j)+u*(z(i+1,j)-z(i,j))+v*(z(i,j+1)-z(i,j))..
                  +u*v*(z(i+1,j+1)+z(i,j)-z(i+1,j)-z(i,j+1));
      end
    end
  end
endfunction


8.5 Lanczos interpolation


//////////////////////////////////////////////////////////////////////
// interpLanczos2D.sci
// 2014-11-07, Scott Hudson, for pedagogic purposes
// Given samples (x(i),y(j),z(i,j)) 1<=i<=m , 1<=j<=n
// use Lanczos3 interpolation to estimate
// zp(k,l) = f(xp(k),yp(l)) 1<=k<=p , 1<=l<=q
// x and y must be sorted in ascending order x(1)<x(2)<x(3) etc.
//////////////////////////////////////////////////////////////////////
function zp=interpLanczos2D(x, y, z, xp, yp);
  m = length(x);
  n = length(y);
  p = length(xp);
  q = length(yp);
  zp = zeros(p,q);
  Dx = (x(m)-x(1))/(m-1);
  Dy = (y(n)-y(1))/(n-1);
  a = 3;
  function w=L(z) //Lanczos kernel
    if abs(z)<1e-6
      w = 1;
    elseif (abs(z)>=a)
      w = 0;
    else
      w = sin(%pi*z)*sin(%pi*z/a)/(%pi^2*z^2/a);
    end
  endfunction
  for k=1:p
    for l=1:q
      [i,j,u,v] = interpUnitCell(x,y,xp(k),yp(l));
      for kk=i+1-a:i+a
        if ((kk>=1)&(kk<=m))
          hx = L((xp(k)-x(kk))/Dx);
          for ll=j+1-a:j+a
            if ((ll>=1)&(ll<=n))
              hy = L((yp(l)-y(ll))/Dy);
              zp(k,l) = zp(k,l)+z(kk,ll)*hx*hy;
            end
          end
        end
      end
    end
  end
endfunction


Lecture 17
Optimization in one dimension
1 Introduction
Optimization is the process of finding the "best" of a set of possible alternatives. If the
alternatives are described by a single continuous variable x, and the "goodness" of an alternative
is given by the value of a function y = f(x), then optimization is the process of finding the
value x = x0 where f(x) takes on its maximum value. In many applications f(x) will
measure the "badness" of an alternative (error, cost, etc.) and then our goal is to find where
f(x) takes on its minimum value. Since finding the minimum of g(x) is equivalent to finding
the maximum of -g(x), a simple sign change converts a minimization problem into a
maximization problem and conversely. Therefore, we can focus on minimization alone with no
loss of generality.
From calculus we know that at the extreme values of a continuous, differentiable function f(x)
the derivative f'(x) is zero. This suggests that we might simply apply our root-finding
techniques to f'(x). However, as shown in Fig. 1, f'(x) = 0 is not a sufficient condition for a
minimum. If f''(x) < 0 the point is a maximum, and if f''(x) = 0 it may be an inflection point.
To uniquely identify a minimum we must have two conditions satisfied: f'(x) = 0 and f''(x) > 0.
Therefore a useful algorithm must do more than just find a root of f'(x). Nevertheless, as we
will see, there are many commonalities between optimization and root-finding algorithms.

Fig. 1 The condition f'(x) = 0 can correspond to (left point) a maximum, (middle point)
an inflection point or (right point) a minimum. The sign of f''(x) distinguishes these cases.


Fig. 2 Both points satisfy f'(x) = 0, f''(x) > 0 but only x = 0 is a global minimum.

One difficulty with optimization is illustrated in Fig. 2. The condition f'(x) = 0, f''(x) > 0 tells
us only that x is a local minimum of f(x), not that it is the global minimum of f(x).
Unfortunately there are no good, general techniques for finding a global minimum. In calculus
the algorithm given for finding a global minimum is typically to first find all local minimum
values and then identify the least of those as the global minimum. For the same reason that it is
not numerically feasible to find all zeros of an arbitrary function, it is not feasible to find every
local minimum of an arbitrary function. Therefore we will focus on trying to find a single local
minimum.

Fig. 3 Graphical method for finding a minimum.

Fig. 4 Bracketing a minimum. If these are samples of a continuous function there must be a
minimum in [a, b].

2 Graphical solution
Similar to root finding, simply plotting y = f(x) and visually identifying the minimum is
typically the easiest and most intuitive approach to optimization. This is illustrated in Fig. 3.
However, we often need an automated way to optimize a function. We now turn to the
optimization version of the bisection method for root finding, the so-called golden search method.

3 Golden search
The slow-but-sure bisection method for root finding relies on the idea of bracketing a zero.
Recall that for a continuous function, if the signs of f(a), f(b) are different then there must be
a zero in the interval [a, b]. To bracket a minimum we need three points a < b < c (or a > b > c)
such that f(b) < f(a) and f(b) < f(c), as illustrated in Fig. 4. If f(x) is continuous over
[a, c] it cannot go down and come back up without passing through a minimum value of
f(x). There may be more than one local minimum, but there has to be at least one.
The golden search method is a way to shrink the interval [a, c] while maintaining three points
a < b < c that bracket a minimum. When |c-a| is less than some tolerance we can report our
minimum as
$$
x = b \pm \max(|b-a|, |b-c|)
$$
The algorithm for shrinking the interval is illustrated in Fig. 5. We might expect that b should be
the midpoint of the interval [a, c]. But if it were we would have to arbitrarily choose in which of
the two equal-length subintervals [a, b], [b, c] to sample f(x). The most efficient strategy is
to have |b-a| < |c-b| and then sample f(x) in the larger interval [b, c] at
$$
x = b + R(c-b)
$$
where R is some constant. We will find either f(x) < f(b) or f(x) >= f(b). Depending on
which of these occurs we relabel the a, b, c values as follows (Fig. 5)

if f(x) < f(b) set a <- b, b <- x
if f(x) >= f(b) set a <- x, c <- a

Fig. 5 left: f(x) < f(b), a <- b, b <- x; right: f(b) < f(x), a <- x, c <- a.

This gives us a new and smaller bracket. In the second case we "change direction" with a on the
right and c on the left. This process is most efficient if the ratio of the resulting large and small
intervals is always the same, that is
$$
\frac{|c-b|}{|b-a|} = \frac{|c-x|}{|x-b|} = \frac{|b-a|}{|x-b|}
$$
The value of R that gives this property is
$$
R = \frac{3-\sqrt{5}}{2} = 0.382\ldots
$$
and is related to the golden ratio of antiquity, hence the name golden search. We then have
$$
|b-a| = R|c-a|, \quad |x-b| = R|c-b|, \quad |x-b| = R|x-a|
$$
This algorithm converges linearly, with the error decreasing by a factor of (1-R) at each
iteration. Function optimGolden in the Appendix gives a Scilab implementation of this
method.
To start the golden search method we need an initial set of three bracket values a, b, c. A simple
way to obtain these is to choose a near where you guess a minimum might be. Then set b = a+h
where h is some step size appropriate to your problem. If f(b) < f(a) then we are going
downhill, which is what we want. If not, then swap a and b so that we are. Now take
$$
c = b + (b-a)\frac{1-R}{R} \approx b + 1.618(b-a)
$$
If f(c) > f(b) then a, b, c bracket a minimum. If not, then set a <- b, b <- c and continue until a
bracket is found, or we give up and conclude that there is no minimum to be found. This is
illustrated in Fig. 6. Function optimBracket in the Appendix gives a Scilab implementation
of this method.

4 Parabolic interpolation
In the secant method for root finding, given two points (x1, y1) and (x2, y2) we draw a line
through them to approximate the function y = f(x) and find the root of that line as our next
approximation to the root of f(x). Given three points (xi, yi), i = 1, 2, 3 we can draw a unique
parabola through them as an approximation of f(x). We can then take the minimum of that
parabola as our next approximation of the minimum of f(x). This is the idea behind parabolic
interpolation.

Fig. 6 Finding an initial bracket

We start with the Lagrange interpolating polynomial for our three points
$$
y = y_1 \frac{(x-x_2)(x-x_3)}{(x_1-x_2)(x_1-x_3)} + y_2 \frac{(x-x_1)(x-x_3)}{(x_2-x_1)(x_2-x_3)} + y_3 \frac{(x-x_1)(x-x_2)}{(x_3-x_1)(x_3-x_2)}
$$
Now we set dy/dx = 0. Using the fact that
$$
\frac{d}{dx}(x-x_i)(x-x_j) = (x-x_i) + (x-x_j) = 2x - (x_i+x_j)
$$
we get
$$
\frac{dy}{dx} = y_1 \frac{2x-(x_2+x_3)}{(x_1-x_2)(x_1-x_3)} + y_2 \frac{2x-(x_1+x_3)}{(x_2-x_1)(x_2-x_3)} + y_3 \frac{2x-(x_1+x_2)}{(x_3-x_1)(x_3-x_2)} = 0
$$

Fig. 7 Parabolic interpolation. Circles are samples. Square is minimum of interpolated parabola.


Fig. 8 Test function for minimization

Multiplying through by (x_1-x_2)(x_1-x_3)(x_2-x_3) to clear fractions we find
$$
y_1[2x-(x_2+x_3)](x_2-x_3) - y_2[2x-(x_1+x_3)](x_1-x_3) + y_3[2x-(x_1+x_2)](x_1-x_2) = 0
$$
(The minus sign for y_2 comes from (x_1-x_2)/(x_2-x_1) = -1.) Now we solve for x to find (using
(a+b)(a-b) = a^2 - b^2)
$$
x = \frac{1}{2}\,\frac{y_1(x_2^2-x_3^2) - y_2(x_1^2-x_3^2) + y_3(x_1^2-x_2^2)}{y_1(x_2-x_3) - y_2(x_1-x_3) + y_3(x_1-x_2)}
$$
Rearranging terms we obtain
$$
x = \frac{1}{2}\,\frac{x_1^2(y_3-y_2) + x_2^2(y_1-y_3) + x_3^2(y_2-y_1)}{x_1(y_3-y_2) + x_2(y_1-y_3) + x_3(y_2-y_1)}
$$
as the x coordinate of the minimum of the parabola.
It can be convenient to subtract x_2 from all x values, corresponding to a shift of the x axis, to
obtain
$$
x - x_2 = \frac{1}{2}\,\frac{(x_1-x_2)^2(y_3-y_2) + (x_3-x_2)^2(y_2-y_1)}{(x_1-x_2)(y_3-y_2) + (x_3-x_2)(y_2-y_1)}
$$
The term on the right is the displacement of the new estimate from the previous estimate x_2.
This should shrink to zero as the method converges. Parabolic interpolation converges
superlinearly, with error \epsilon_{k+1} \propto \epsilon_k^q where q \approx 1.32. A Scilab implementation is given in the Appendix as
optimParabolic.


Example. The function f(x) = (1-x)e^{-x^2} is plotted in Fig. 8. It has a minimum at
$$
x_0 = \frac{\sqrt{3}+1}{2} = 1.3660254\ldots
$$
Running optimBracket with x1 = 1.2, x2 = 1.4 identified the bracket
a = 1.2, b = 1.4, c = 1.6763932
after one function evaluation. Running optimGolden with those starting
values and tol = 0.0001, 17 iterations (each involving a single function
evaluation) were required to obtain
x_min = 1.3660326, f(x_min) = -0.0566378
Running optimParabolic with the same conditions, only 7 iterations were
required to obtain
x_min = 1.3660254, f(x_min) = -0.0566378
The progress of each algorithm towards the minimum is shown in Fig. 9.
On the other hand, it is easy to find an a, b, c bracket that causes
optimParabolic to diverge. One example is a = 1, b = 2, c = 3. From Fig. 8
we can see that for x > 2 the function curves downward (f''(x) < 0), so we might expect
problems when using a method which models the function as an upwardly curved parabola.
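A sketch of the Scilab session behind such a run (assuming the Appendix routines optimBracket, optimGolden and optimParabolic have been loaded; exact outputs will match only the original run):

function y=f(x)
    y = (1-x)*exp(-x^2);          // test function from the example
endfunction
[a,b,c,fa,fb,fc] = optimBracket(1.2, 1.4, f);
[xmin,fmin] = optimGolden(a, b, c, fa, fb, fc, f, 0.0001);
disp([xmin, fmin]);               // golden search result
[xmin,fmin] = optimParabolic(a, b, c, fa, fb, fc, f, 0.0001);
disp([xmin, fmin]);               // parabolic interpolation result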

Fig. 9: Progression of golden search (circles) and parabolic interpolation (squares)
toward minimization of f(x) = (1-x)e^{-x^2}. Horizontal axis increments at each function
call. Vertical axis is |x-r| on log scale. First three calls are for bracketing routine.


5 Newton's method
For root finding we saw that Newton's method gave quadratic convergence at the price of having
to explicitly calculate the derivative f'(x). We can apply Newton's method to solve f'(x) = 0.
We find the root of the first-order Taylor series
$$
f'(x_k+h) \approx f'(x_k) + f''(x_k)h = 0
$$
to be
$$
h = -\frac{f'(x_k)}{f''(x_k)}
$$
This gives us the iteration formula
$$
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
$$
To apply this we need to explicitly calculate both first and second derivatives. Essentially what
we are doing is to approximate the function by the second-order Taylor series
$$
f(x_k+h) = f(x_{k+1}) \approx f(x_k) + f'(x_k)h + \frac{1}{2}f''(x_k)h^2
$$
and setting h to correspond to the minimum of this parabola. Of course this requires that
f''(x_k) > 0, otherwise the parabola will have a maximum, not a minimum. This should be the
case, provided we start close enough to the actual minimum of f(x). We could avoid
explicitly calculating the derivatives by approximating f'(x) (and possibly f''(x) also) using
function values alone, analogous to what we did in the tangent method. The result would be a
quasi-Newton method. These often form a primary component of a state-of-the-art optimization
routine.
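To make the iteration concrete, here is a minimal sketch of Newton's method applied to the example function f(x) = (1-x)e^{-x^2}; the hand-coded derivatives and the starting point are assumptions for illustration, not part of the lecture code:

function [fp, fpp]=derivs(x)
    fp  = exp(-x^2)*(2*x^2-2*x-1);           // f'(x)
    fpp = exp(-x^2)*(-4*x^3+4*x^2+6*x-2);    // f''(x)
endfunction
x = 1.2;                     // start close to the minimum
for k = 1:6
    [fp, fpp] = derivs(x);
    x = x - fp/fpp;          // Newton update
end
disp(x);                     // approaches (sqrt(3)+1)/2 = 1.3660254...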

6 Hybrid methods
As with root finding, optimization presents us a tradeoff. Sure-fire methods (golden search) are
relatively slow, while faster methods can be unstable and fail to find a solution. For particular
functions one or the other may be preferable. For a general-purpose optimization routine, a good
strategy is to combine slow-but-sure and fast-but-unstable methods into a hybrid method. Brent's
method [1] is a good example. This algorithm first attempts to use parabolic interpolation, but
includes tests to indicate if this is converging in a desirable manner. If it isn't, the algorithm falls
back on the golden search for one or more iterations before trying parabolic interpolation again.
Other hybrid algorithms employ quasi-Newton methods in an attempt to achieve rapid
convergence when possible and slow-but-guaranteed convergence otherwise. These include the
built-in optimization routines in Scilab and Matlab.


7 The optim (Scilab) and fminunc (Matlab) functions


Scilab provides a built-in optimization routine optim. It will attempt to minimize a function of
any number of variables. By default it expects the function being optimized to return both
function and derivative values. However, numerical derivatives can be used instead. In this case
the basic calling syntax is
[fopt,xopt] = optim(list(NDcost,f),x0);

Here f(x) is the function to be minimized, x0 is an initial guess at the minimum, xopt is the
computed minimum and fopt is the minimum function value. The list(NDcost,f)
statement takes care of providing numerical derivatives.
Matlab provides the function fminunc for optimization. Its basic syntax is
[xopt,fopt] = fminunc(f,x0);

As always, there are many options that can be set, and the help browser provides complete
documentation.

8 Constrained optimization
What we have covered so far is more precisely referred to as unconstrained optimization. We
are free to test any values of x in our search for a minimum. In a constrained optimization
problem only x values that satisfy one or more given constraints are valid candidates for a
minimum. A simple example would be
$$
\min\, (x+1)^2 \quad \text{s.t.} \quad x \ge 0
$$
Here we want to find the minimum of f(x) = (x+1)^2 but subject to the constraint that x is
non-negative.

Fig. 10 Minimization without constraint (solid dot) and with constraint x >= 0 (open dot).


Fig. 11 Constrained optimization using a penalty function.

This is illustrated in Fig. 10. With no constraint our solution would simply be the
bottom of the parabola. But with the constraint the best we can do is x = 0. Implementing
constrained optimization can be tricky, depending on the complexity of the constraints. A simple
workaround that allows us to implement constrained optimization using unconstrained
optimization algorithms employs the idea of a penalty function. Instead of minimizing f(x), we
minimize f(x) + p(x) where the penalty function p(x) is large for values of x that violate the
constraints and zero otherwise. For example, adding
$$
p(x) = \begin{cases} 0 & x \ge 0 \\ -4x & x < 0 \end{cases}
$$
to f(x) = (x+1)^2 results in the function shown in Fig. 11. Now an unconstrained minimization
of f(x) + p(x) produces the same solution as the original constrained minimization problem.
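A minimal sketch of the penalty idea (the brute-force grid scan below stands in for a proper optimizer and is an assumption for illustration):

x = linspace(-2, 2, 4001);
f = (x+1).^2;                        // objective
p = -4*x .* bool2s(x < 0);           // penalty: 0 for x>=0, -4x for x<0
[fmin, k] = min(f + p);
disp(x(k));                          // approximately 0, the constrained solution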

9 References
1. Brent, Richard P. Algorithms for Minimization Without Derivatives. Dover Publications,
Kindle edition, ASIN B00CRW5ZTK, 2013 (originally published 1973).


10 Appendix Scilab code


10.1 Golden search
//////////////////////////////////////////////////////////////////////
// optimGolden.sci
// 2014-10-31, Scott Hudson, for pedagogic purposes.
// Implements golden search for minimization of y=f(x).
// (a,b,c) must bracket a minimum, i.e., f(b)<f(a) & f(b)<f(c)
// with a<b<c or a>b>c. Function terminates when minimum has been
// estimated to within +-tol
//////////////////////////////////////////////////////////////////////
function [xmin, fmin]=optimGolden(a, b, c, fa, fb, fc, f, tol)
  R = (3-sqrt(5))/2; //golden ratio
  while (abs(c-b)>tol) //abs(c-b)>abs(b-a) is upper bound on error
    x = b+R*(c-b);
    fx = f(x);
    if (fx<fb)
      a = b;
      fa = fb;
      b = x;
      fb = fx;
    else
      c = a;
      fc = fa;
      a = x;
      fa = fx;
    end
  end
  xmin = b;
  fmin = fb;
endfunction


10.2 Bracketing a minimum


//////////////////////////////////////////////////////////////////////
// optimBracket.sci
// 2014-10-31, Scott Hudson, for pedagogic purposes.
// Given initial values x1,x2 and a function y=f(x), attempts to
// follow the function downhill until a minimum has been bracketed
// f(b)<f(a) & f(b)<f(c) with a<b<c or a>b>c.
//////////////////////////////////////////////////////////////////////
function [a, b, c, fa, fb, fc]=optimBracket(x1, x2, f)
  MAX_ITERS = 20; //give up after this many attempts
  a = x1;
  b = x2;
  fa = f(a);
  fb = f(b);
  if (fa<fb) //going uphill, go other way by switching a & b
    c = a;   //save a
    fc = fa;
    a = b;   //a<-b
    fa = fb;
    b = c;   //b<-old a
    fb = fc;
  end
  R = (3-sqrt(5))/2; //golden ratio
  step = (1-R)/R;
  done = 0;
  iter = 0;
  while (~done)
    c = b+(b-a)*step;
    fc = f(c);
    if (fc>fb) //we're now going uphill, bracket found
      done = 1;
    else //still going downhill
      a = b;
      fa = fb;
      b = c;
      fb = fc;
      iter = iter+1;
    end
    if (iter>MAX_ITERS)
      error('optimBracket: MAX_ITERS reached');
    end
  end
endfunction


10.3 Parabolic interpolation


//////////////////////////////////////////////////////////////////////
// optimParabolic.sci
// 2014-10-31, Scott Hudson, for pedagogic purposes.
// Uses parabolic interpolation to estimate the minimum of y=f(x).
// Last three estimates are retained for interpolation.
//////////////////////////////////////////////////////////////////////
function [xmin, fmin]=optimParabolic(a, b, c, fa, fb, fc, f, tol)
  MAX_ITERS = 20;
  x = [a,b,c];
  y = [fa,fb,fc];
  iter = 1;
  while ((max(x)-min(x))>2*tol)
    N = (y(3)-y(2))*x(1)^2+(y(1)-y(3))*x(2)^2+(y(2)-y(1))*x(3)^2;
    D = (y(3)-y(2))*x(1) +(y(1)-y(3))*x(2) +(y(2)-y(1))*x(3);
    x(1) = x(2);
    y(1) = y(2);
    x(2) = x(3);
    y(2) = y(3);
    x(3) = N/(2*D);
    y(3) = f(x(3));
    iter = iter+1;
    if (iter>MAX_ITERS)
      error('optimParabolic: MAX_ITERS reached');
    end
  end
  xmin = x(3);
  fmin = y(3);
endfunction


Lecture 18
Optimization in n dimensions
1 Introduction
We now consider the problem of minimizing a single scalar function of n variables, f(x),
where x = [x_1, x_2, ..., x_n]^T. The 2D case can be visualized as finding the lowest point of a
surface z = f(x, y) (Fig. 1).

Fig. 1 The 2D minimization problem is equivalent to finding the lowest point on a surface.

A necessary condition for a minimum is that \partial f/\partial x_i = 0 for all 1 <= i <= n. The partial derivative
\partial f/\partial x_i is the ith component of the gradient of f, denoted \nabla f, so at a minimum we must have
$$
\nabla f = \mathbf{0} \qquad (1)
$$
In the 2D case this implies we've "bottomed out" at the lowest point of a valley. The gradient
also vanishes at a maximum, so this is a necessary but not sufficient condition for a minimum.

2 Quadratic functions
Quadratic functions of several variables come up in many applications. A quadratic function of n
variables x_i has the form
$$
f(x_1, x_2, \ldots, x_n) = c + \sum_{i=1}^{n} b_i x_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \qquad (2)
$$
Since
$$
\frac{1}{2} a_{ij} x_i x_j + \frac{1}{2} a_{ji} x_j x_i = \frac{1}{2} (a_{ij} + a_{ji}) x_i x_j \qquad (3)
$$
the coefficients a_ij, a_ji only appear as the sum a_ij + a_ji. Without loss of generality, therefore, we
can take a_ij = a_ji.

By differentiation we have
$$
f(\mathbf{0}) = c, \quad \left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{0}} = b_i, \quad \frac{\partial^2 f}{\partial x_i \partial x_j} = a_{ij} \qquad (4)
$$
which allows us to interpret the form (2) as a multivariable Taylor series of an arbitrary function.
The conditions for a minimum (or maximum) are (for k = 1, 2, ..., n)
$$
\frac{\partial f}{\partial x_k} = b_k + \frac{1}{2} \sum_{i=1}^{n} a_{ik} x_i + \frac{1}{2} \sum_{j=1}^{n} a_{kj} x_j = b_k + \sum_{j=1}^{n} a_{kj} x_j = 0 \qquad (5)
$$
In matrix notation this reads
$$
\nabla f = \mathbf{b} + \mathbf{A}\mathbf{x} = \mathbf{0} \qquad (6)
$$
where b_i are the components of b, and a_ij are the components of the symmetric matrix A. The
solution is
$$
\mathbf{x} = -\mathbf{A}^{-1}\mathbf{b} \qquad (7)
$$
The minimization of a quadratic function is equivalent to the solution of a linear system.
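For instance, a small sketch (with an assumed positive-definite A and vector b):

A = [2, 1; 1, 3];        // symmetric, positive definite
b = [-1; -2];
xmin = A\(-b);           // solves grad f = b + A*x = 0
disp(xmin);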


However, for an arbitrary function f (x) we can't make any general statements about a
minimum. This motivates us to seek a method to systematically search for the minimum of a
function of n variables.

3 Line minimization
We know how to go about minimizing a function of one variable. If we start at a point x_0 and
move only in the direction of a vector u (Fig. 2) then the f(x) values we can sample form a
function of a single variable

Fig. 2 x_0 is the starting point and u the direction in which we search for a minimum.

$$
g(t) = f(\mathbf{x}_0 + t\mathbf{u}) \qquad (8)
$$

Here the variable t is the distance we move in the direction u. We can use any of our 1D
minimization methods on this function. Of course this is not likely to find the minimum of
f(x). However, suppose we start at x_0 and move along the direction u_1 to a minimum. Call
this new point x_1 = x_0 + t_1 u_1. Then move along another direction u_2 to find a minimum at
x_2 = x_1 + t_2 u_2 and so on. This process should eventually find a local minimum (if one exists).
The process of minimizing the function f(x) along a single line defined by some vector u is
called line minimization. The algorithm is quite simple

Successive line minimization algorithm
  start with initial guess x_0 and search directions u_i
  iterate until converged
    for i = 1, 2, ..., n
      find t that minimizes g(t) = f(x_0 + t u_i)
      set x_0 <- x_0 + t u_i

An obvious set of directions is the coordinate axes
$$
\mathbf{u}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{u}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{u}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \qquad (9)
$$

In this case the algorithm simply minimizes f(x) with respect to x_1, then with respect to x_2
and so on. The algorithm will find a minimum (if one exists), but in many cases it can be very
slow to do so. An example is shown in Fig. 3 where we minimize the quadratic function
$$
f(x, y) = \left( \frac{x-y}{4} \right)^2 + (x+y-2)^2 \qquad (10)
$$
starting at x = y = 0. The minimum is at x = y = 1. From the contour plot we see that the
"valley" of this function is narrow and oriented 45 degrees to the coordinate axes. Since we are limited
to moving in only the x or y direction at any one time, the algorithm ends up taking many,
progressively smaller, zig-zag steps down the valley. The net movement is in a diagonal direction
along the valley floor. If that direction was one of our u_i directions then we might be able to
take one big step directly to the minimum. This motivates the development of direction set
methods which attempt to adapt the u_i directions to the geometry of the function being
minimized.
Consider the quadratic function (2). This can be written as
$$
f = c + \mathbf{x}^T \mathbf{b} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} \qquad (11)
$$

Fig. 3: Successive line minimization along the coordinate axes. Left: contours of quadratic function.
Right: Progressive results of line minimization.

The gradient is
$$
\nabla f = \mathbf{b} + \mathbf{A}\mathbf{x} \qquad (12)
$$
Suppose we have found the minimum of g_1(t) = f(x_0 + t u_1) at x_1. At this point the gradient
of f must be orthogonal to u_1, otherwise we could move along u_1 to lower values of f.
Therefore
$$
\mathbf{u}_1^T \mathbf{b} + \mathbf{u}_1^T \mathbf{A} \mathbf{x}_1 = 0 \qquad (13)
$$
Now we find the minimum of g_2(t) = f(x_1 + t u_2) at x_2 = x_1 + t_m u_2. At this new x value we
want both u_1 and u_2 to be orthogonal to the gradient. This ensures that the new point remains a
minimum along the u_1 direction as well as the u_2 direction. This requires
$$
\mathbf{u}_1^T \mathbf{b} + \mathbf{u}_1^T \mathbf{A} (\mathbf{x}_1 + t_m \mathbf{u}_2) = 0 \qquad (14)
$$
Because of (13) this reduces to
$$
\mathbf{u}_1^T \mathbf{A} \mathbf{u}_2 = 0 \qquad (15)
$$
Two vectors u_1, u_2 satisfying this condition are said to be conjugate. A set of n vectors
u_1, u_2, ..., u_n in which all pairs are conjugate is a conjugate set. One of the first methods
presented (1964) to generate a set of conjugate directions was Powell's method, to which we now
turn.

4 Powell's method
Powell showed that a simple addition to the successive line minimization algorithm enables it to
find conjugate directions and minimize an arbitrary quadratic function of n variables in n
iterations. After completing the line minimizations of the for loop, we form a new direction v
which is the net direction x_0 moved due to the n line minimizations. We then perform a single
line minimization along the direction v. Finally, we discard the first search direction, u_1, left
shift the other n-1 directions (u_i <- u_{i+1}) and make v the new u_n direction. It turns out any
quadratic function will be minimized by n iterations of this procedure. The algorithm is

Powell's method
  start with initial guess x_0 and search directions u_i
  iterate until converged
    save current estimate x_0^old <- x_0
    for i = 1, 2, ..., n
      find t that minimizes f(x_0 + t u_i)
      set x_0 <- x_0 + t u_i
    v <- (x_0 - x_0^old)/||x_0 - x_0^old||
    find t to minimize f(x_0 + t v)
    set x_0 <- x_0 + t v
    for i = 1, 2, ..., n-1
      u_i <- u_{i+1}
    u_n <- v

Fig. 4: Powell's method. Left: contours of quadratic function; Right: progressive results of each line
minimization. The minimum is found in n=2 iterations (6 line minimizations total).


Fig. 5: The "banana function. This is actually the negative function so that the valley appears as a
hill. Values less than -100 have been chopped off to show greater detail.

A Scilab version of Powell's method is given in the Appendix. Applying this to function (10),
starting at x = y = 0, we obtain the results shown in Fig. 4. In two iterations of three line
minimizations each (Powell's method adds one line minimization after the for loop) we arrive at
the minimum. Powell's method has figured out the necessary diagonal direction.
A more challenging test is given by the Rosenbrock function
$$
f(x, y) = (1-x)^2 + 100(y - x^2)^2 \qquad (16)
$$
shown in Fig. 5. Because of its shape it is sometimes called the banana function. The minimum

Fig. 6 Powell's method following the valley of the banana function.

is f(1,1) = 0. Unlike the function of Fig. 4, the valley of this function twists and the algorithm
must follow this changing direction. Starting at x = y = 0 Powell's method gives the results
shown in Fig. 6. We can see how the algorithm tracks the twisting valley and arrives at the
minimum after only a few iterations.
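A sketch of the corresponding Scilab session (assuming optimPowell, optimBracket and optimGolden from the Appendix are loaded; the step size and tolerance are assumed values):

function z=banana(x)
    z = (1-x(1))^2 + 100*(x(2)-x(1)^2)^2;
endfunction
x0 = [0; 0];
[xmin, fmin] = optimPowell(banana, x0, 0.1, 1e-6);
disp(xmin');             // should be close to [1 1]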

5 Newton's method
Earlier we saw that the gradient of the quadratic function
$$
f(x_1, x_2, \ldots, x_n) = c + \sum_{i=1}^{n} b_i x_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \qquad (17)
$$
vanishes at
$$
\mathbf{x} = -\mathbf{A}^{-1}\mathbf{b} \qquad (18)
$$
Newton's method approximates an arbitrary function by a quadratic Taylor series with
$$
b_i = \frac{\partial f}{\partial x_i}, \quad a_{ij} = a_{ji} = \frac{\partial^2 f}{\partial x_i \partial x_j} \qquad (19)
$$
The vector b is the gradient of f. The matrix of second derivatives A is called the Hessian of f.
We solve for the minimum of this quadratic Taylor series and take that to be our new x. We
continue until the method has converged.

Newton's method
  iterate until converged
    at x evaluate the gradient b and the Hessian A
    set x <- x - A^{-1} b
Let's apply Newton's method to the banana function
$$
f(x, y) = (1-x)^2 + 100(y - x^2)^2 \qquad (20)
$$
The gradient is
$$
\mathbf{b} = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix} = \begin{pmatrix} -2(1-x) - 400x(y-x^2) \\ 200(y-x^2) \end{pmatrix} \qquad (21)
$$
The Hessian is
$$
\mathbf{A} = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x \partial y \\ \partial^2 f/\partial y \partial x & \partial^2 f/\partial y^2 \end{pmatrix} = \begin{pmatrix} 2 + 1200x^2 - 400y & -400x \\ -400x & 200 \end{pmatrix} \qquad (22)
$$
Starting at x = y = 0, we have
$$
\mathbf{b} = \begin{pmatrix} -2 \\ 0 \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} 2 & 0 \\ 0 & 200 \end{pmatrix}, \quad \mathbf{A}^{-1} = \frac{1}{400}\begin{pmatrix} 200 & 0 \\ 0 & 2 \end{pmatrix}, \quad -\mathbf{A}^{-1}\mathbf{b} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad (23)
$$
and the new estimate is
$$
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad (24)
$$
At x = 1, y = 0 we have
$$
\mathbf{b} = \begin{pmatrix} 400 \\ -200 \end{pmatrix}, \quad \mathbf{A} = \begin{pmatrix} 1202 & -400 \\ -400 & 200 \end{pmatrix}, \quad \mathbf{A}^{-1} = \frac{1}{80400}\begin{pmatrix} 200 & 400 \\ 400 & 1202 \end{pmatrix}, \quad -\mathbf{A}^{-1}\mathbf{b} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad (25)
$$
and the new estimate is
$$
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad (26)
$$
which is the solution.
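The two steps above can be verified with a short sketch (the gradient/Hessian function is hand-coded from (21) and (22)):

function [b, A]=gradHess(x, y)
    b = [-2*(1-x)-400*x*(y-x^2); 200*(y-x^2)];     // gradient (21)
    A = [2+1200*x^2-400*y, -400*x; -400*x, 200];   // Hessian (22)
endfunction
v = [0; 0];
for k = 1:2
    [b, A] = gradHess(v(1), v(2));
    v = v - A\b;         // Newton update
end
disp(v');                // [1 1], as in the hand calculation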


Newton's method is conceptually simple and quite powerful. However, it requires us to compute
the gradient and the Hessian of the function. This may be difficult or impossible in many cases.
To overcome this challenge, quasi-Newton methods have been developed which attempt to form
an approximation of the Hessian matrix, and possibly the gradient also, as the algorithm
progresses. One such method is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm,
presented in 1970.
Another challenge is that for a complicated function, Newton's method needs to be started
sufficiently close to a minimum to be stable. As in the 1D case this motivates the development of
hybrid algorithms that attempt to use a quasi-Newton method when it works, but revert to a
slow-but-sure backup when it does not. The Scilab optim function is of this type. The calling
syntax is the same as for the 1D case
[fopt,xopt] = optim(list(NDcost,f),x0);

Here f(x) is the function to be minimized, x0 is an initial guess at the minimum, xopt is the
computed minimum and fopt is the minimum function value. By default optim assumes the
function f provides both function and gradient values. If f returns only a function value, the
list(NDcost,f) statement takes care of providing numerical derivatives. Here's this
function applied to the banana function.
x0 = [0;0];
function z = f(x)
z = (1-x(1))^2+100*(x(2)-x(1)^2)^2;
endfunction
[fopt,xopt] = optim(list(NDcost,f),x0);
disp(xopt);

This produces the output


1.0000000
1.0000000

which is the minimum.


6 Appendix Scilab code


6.1 Powell's method
//////////////////////////////////////////////////////////////////////
// optimPowell.sci
// 2014-10-31, Scott Hudson, for pedagogic purposes.
// Implements Powell's method for minimizing a function of
// n variables.
//////////////////////////////////////////////////////////////////////
function [xmin, fmin]=optimPowell(fun, x0, h, tol)
  n = length(x0); //# of variables
  searchDir = eye(n,1); //direction for current search
  searchDirs = eye(n,n); //set of n search directions
  function s=gfun(t) //local scalar function to pass to 1D
    s = fun(x0+t*searchDir) //optimization routines
  endfunction
  done = 0;
  while(~done)
    x0old = x0; //best solution so far
    for i=1:n //minimize along each of n directions
      searchDir = searchDirs(:,i);
      [a,b,c,fa,fb,fc] = optimBracket(-h,h,gfun);
      [tmin,gmin] = optimGolden(a,b,c,fa,fb,fc,gfun,tol/10);
      x0 = x0+tmin*searchDir; //minimum along this direction
    end
    for i=1:n-1 //update search directions
      searchDirs(:,i) = searchDirs(:,i+1);
    end
    v = x0-x0old; //new search direction
    searchDirs(:,n) = v/sqrt(v'*v); //add new search dir unit vector
    searchDir = searchDirs(:,n); //minimize along new direction
    [a,b,c,fa,fb,fc] = optimBracket(-h,h,gfun);
    [tmin,gmin] = optimGolden(a,b,c,fa,fb,fc,gfun,tol/10);
    x0 = x0+tmin*searchDir;
    xChange = sqrt(sum((x0-x0old).^2));
    if (xChange<tol)
      done = 1;
    end
  end //while
  xmin = x0;
  fmin = fun(xmin);
endfunction


Lecture 19
Curve fitting I
1 Introduction
Suppose we are presented with eight points of measured data (x_i, y_i). As shown in Fig. 1 on the
left, we could represent the underlying function of which these data are samples by
interpolating between the data points using one of the methods we have studied previously.

Fig. 1: Measured data with: (left) spline interpolation, (right) line fit.

However, maybe the data are samples of the response of a process that we know, in theory, is
supposed to have the form y = f(x) = ax + b where a, b are constants. Maybe we also know that
y is a very weak signal and the sensor used to measure it is noisy, that is, it adds its own
(random) signal in with the "true" y data. Given this it makes no sense to interpolate the data,
because in part we'll be interpolating noise, and we know that the real signal should have the
form y = ax + b. In a situation like this we prefer to fit a line to the data rather than perform an
interpolation (Fig. 1 at right). If done correctly this can provide a degree of immunity against the
effects of measurement errors and noise. More generally we want to develop curve fitting
techniques that allow theoretical curves, or models, with unknown parameters (such as a and b in
the line case) to be fit to n data points.

2 Fitting a constant to measured data
The simplest curve fitting problem is estimating a parameter from multiple measurements.
Suppose m is the mass of an object. We want to measure this using a scale. Unfortunately the
scales in our laboratory are not well calibrated. However, we have nine scales. We expect that if
we take measurements with all of them and average the results we should get a better estimate of
the true mass than by relying on the measurement from a single scale. Our results might look
something like shown in Fig. 2. Let the measurement of the ith scale be m_i; then the average
measurement is given by
$$
\bar{m} = \frac{1}{n} \sum_{i=1}^{n} m_i \qquad (1)
$$
where n is the number of measurements. This is what we should use for our best estimate of
the true mass. Averaging is a very basic form of curve fitting.

Fig. 2: Horizontal line is average of several measurements (dots).

3 Least-squares line fit
Going back to the situation illustrated in Fig. 1, how do we figure out the best fit line? There
doesn't seem to be a straightforward way to average the data like we did in Fig. 2. Instead, let's
suppose we have n data points (x_i, y_i). We are interested in a linear model of the form
y = ax + b, and our task is to calculate the "best" values for a and b. If all our data actually fell on
a line then the best a and b values would result in y_i - (a x_i + b) = 0 for i = 1, 2, ..., n. More
generally let's define the residual (error of the fit) for the ith data point as
$$
r_i = y_i - (a x_i + b) \qquad (2)
$$
A perfect fit would give r_i = 0 for all i. The residual can be positive or negative, but what we are
most concerned with is its magnitude. Let's define the mean squared error (MSE) as
$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} r_i^2 = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 \qquad (3)
$$


We now seek the values of a and b that minimize the MSE. These will satisfy
$$
\frac{\partial\,\mathrm{MSE}}{\partial a} = 0 \quad \text{and} \quad \frac{\partial\,\mathrm{MSE}}{\partial b} = 0 \qquad (4)
$$
The b derivative is
$$
\frac{\partial\,\mathrm{MSE}}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right) = 0 \qquad (5)
$$
Multiplying through by -1/2 and rearranging we find
$$
\frac{1}{n}\sum_{i=1}^{n} y_i - \frac{a}{n}\sum_{i=1}^{n} x_i - b = 0 \qquad (6)
$$
Now define the average x and y values as
$$
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (7)
$$
Equation (6) then reads
$$
\bar{y} - a\bar{x} - b = 0 \qquad (8)
$$
or
$$
a\bar{x} + b = \bar{y} \qquad (9)
$$
This tells us that the point (x̄, ȳ) (the centroid of the data) falls on the line.
The a derivative of the MSE is
$$
\frac{\partial\,\mathrm{MSE}}{\partial a} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right) x_i = 0 \qquad (10)
$$
Multiplying through by -1/2 and rearranging we find
$$
\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \frac{a}{n}\sum_{i=1}^{n} x_i^2 - \frac{b}{n}\sum_{i=1}^{n} x_i = 0 \qquad (11)
$$
or
$$
\overline{xy} - a\,\overline{x^2} - b\,\bar{x} = 0 \qquad (12)
$$
with the additional definitions
$$
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i, \quad \overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \qquad (13)
$$
A final rearrangement gives us
$$
a\,\overline{x^2} + b\,\bar{x} = \overline{xy} \qquad (14)
$$
We now have two equations in the two unknowns a, b
$$
a\bar{x} + b = \bar{y}, \qquad a\,\overline{x^2} + b\,\bar{x} = \overline{xy} \qquad (15)
$$


Fig. 3: Least-squares line fit to noisy data.

Solving the first equation for b
$$
b = \bar{y} - a\bar{x} \qquad (16)
$$
and substituting this into the second equation we obtain
$$
a\,\overline{x^2} + (\bar{y} - a\bar{x})\bar{x} = \overline{xy} \qquad (17)
$$
Solving this for a we have
$$
a = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \qquad (18)
$$
Equations (18) and (16) provide the best-fit values of a and b. Because we obtained these
parameters by minimizing the sum of squared residuals, this is called a least-squares line fit.
Example. The code below generates six points on the line y = 1 - x and adds
normally-distributed noise of standard deviation 0.1 to the y values. Then (18)
and (16) are used to calculate the best-fit values of a and b. The data and fit line
are plotted in Fig. 3. The true values are a = -1, b = 1. The fit values are
a = -0.91, b = 1.02.
-->x = [0:0.2:1]';
-->y = 1-x+rand(x,'normal')*0.1;
-->a = (mean(x.*y)-mean(x)*mean(y))/(mean(x.^2)-mean(x)^2)
a =
- 0.9103471
-->b = mean(y)-a*mean(x)
b =
1.0191425


4 Linear least-squares
The least-squares idea can be applied to a linear combination of any m functions
f_1(x), f_2(x), ..., f_m(x). Our model has the form
$$
y = \sum_{j=1}^{m} c_j f_j(x) \qquad (19)
$$
For example, if m = 2 and f_1(x) = 1, f_2(x) = x then our model is
$$
y = c_1 + c_2 x \qquad (20)
$$
which is just the linear case we've already dealt with. If we add f_3(x) = x^2 then the model is
$$
y = c_1 + c_2 x + c_3 x^2 \qquad (21)
$$
which is an arbitrary quadratic. Or we could have a model such as
$$
y = c_1 \cos(5x) + c_2 \sin(5x) + c_3 \cos(10x) + c_4 \sin(10x) \qquad (22)
$$
In any case we'll continue to define the residuals as the difference between the observed and the
modeled y values
$$
r_i = y_i - \sum_{j=1}^{m} c_j f_j(x_i) \qquad (23)
$$
and the mean-squared error as
$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} r_i^2 = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i - \sum_{j=1}^{m} c_j f_j(x_i) \right]^2 \qquad (24)
$$
Let's expand this as
$$
\frac{1}{n}\sum_{i=1}^{n} \left[ y_i - \sum_{j=1}^{m} c_j f_j(x_i) \right]^2 = \frac{1}{n}\sum_{i=1}^{n} \left( y_i^2 - 2 y_i \sum_{j=1}^{m} c_j f_j(x_i) + \left[ \sum_{j=1}^{m} c_j f_j(x_i) \right]^2 \right) \qquad (25)
$$
Call
$$
\frac{1}{n}\sum_{i=1}^{n} y_i^2 = \overline{y^2}
$$
and
$$
\frac{2}{n}\sum_{i=1}^{n} y_i \sum_{j=1}^{m} c_j f_j(x_i) = \sum_{j=1}^{m} b_j c_j \qquad (26)
$$
with
$$
b_j = \frac{2}{n}\sum_{i=1}^{n} y_i f_j(x_i) \qquad (27)
$$
The last term in (25) can be written



$$
\left[ \sum_{j=1}^{m} c_j f_j(x_i) \right]^2 = \sum_{j=1}^{m} c_j f_j(x_i) \sum_{k=1}^{m} c_k f_k(x_i) \qquad (28)
$$
Therefore
$$
\frac{1}{n}\sum_{i=1}^{n} \left[ \sum_{j=1}^{m} c_j f_j(x_i) \right]^2 = \frac{1}{n}\sum_{i=1}^{n} \sum_{j=1}^{m} c_j f_j(x_i) \sum_{k=1}^{m} c_k f_k(x_i) = \frac{1}{2}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{jk} c_j c_k \qquad (29)
$$
with
$$
a_{jk} = a_{kj} = \frac{2}{n}\sum_{i=1}^{n} f_j(x_i) f_k(x_i) \qquad (30)
$$
Finally we can write
$$
\mathrm{MSE} = \overline{y^2} - \sum_{i=1}^{m} b_i c_i + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} c_i c_j \qquad (31)
$$
This shows that the MSE is a quadratic function of the unknown coefficients. In the lecture
"Optimization in n dimensions" we calculated the solution to a system of this form, except that
the second term (with the b coefficients) had a plus rather than minus sign. Defining the m-by-1
column vectors b and c and the m-by-m matrix A as
$$
\mathbf{c} = [c_j], \quad \mathbf{b} = [b_j], \quad \mathbf{A} = [a_{ij}] \qquad (32)
$$
the condition for a minimum is (with the minus sign for the b coefficients)
$$
-\mathbf{b} + \mathbf{A}\mathbf{c} = \mathbf{0} \qquad (33)
$$
and
$$
\mathbf{c} = \mathbf{A}^{-1}\mathbf{b} \qquad (34)
$$
Another way to arrive at this result is to define the n-by-1 column vector
$$
\mathbf{y} = [y_i] \qquad (35)
$$
and the n-by-m matrix
$$
\mathbf{F} = [f_{ij}] \quad \text{with} \quad f_{ij} = f_j(x_i) \qquad (36)
$$
Then our model is
$$
\mathbf{y} = \mathbf{F}\mathbf{c} \qquad (37)
$$
This is n equations in m < n unknowns and in general will not have a solution. Multiplying both
sides on the left by F^T results in the system
$$
\mathbf{F}^T \mathbf{F} \mathbf{c} = \mathbf{F}^T \mathbf{y} \qquad (38)
$$
Since F^T F is m-by-m and F^T y is m-by-1, this is a system of m equations in m unknowns that, in
general, will have a unique solution
$$
\mathbf{c} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \mathbf{y} \qquad (39)
$$
The elements of F^T F are


$$
[\mathbf{F}^T\mathbf{F}]_{jk} = \sum_{i=1}^{n} f_{ij} f_{ik} = \frac{n}{2} a_{jk} \qquad (40)
$$
while the elements of F^T y are
$$
[\mathbf{F}^T\mathbf{y}]_j = \sum_{i=1}^{n} f_{ij} y_i = \frac{n}{2} b_j \qquad (41)
$$
Therefore F^T F c = F^T y, when multiplied through by 2/n, is equivalent to
$$
\mathbf{A}\mathbf{c} = \mathbf{b} \qquad (42)
$$
The linear system (38) is called the normal equation, and we have the following algorithm

Linear least squares fit
  Given n samples (x_i, y_i) and a model y = sum_{j=1}^{m} c_j f_j(x)
  Form the n-by-m matrix F with elements f_ij = f_j(x_i)
  Form the n-by-1 column vector y with elements y_i
  Solve the normal equation F^T F c = F^T y for c
  The modeled y values are ŷ = F c
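A sketch of this algorithm for the four-term model (22) (the test data below are assumed for illustration):

x = linspace(0, 1, 20)';
y = cos(5*x) - 0.5*sin(10*x);                     // assumed test signal
F = [cos(5*x), sin(5*x), cos(10*x), sin(10*x)];   // F(i,j) = f_j(x_i)
c = (F'*F)\(F'*y);                                // solve the normal equation
disp(c');                                         // approximately [1 0 0 -0.5]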
The n-by-m matrix F is not square if n > m, so we cannot solve the linear system
$$
\mathbf{y} = \mathbf{F}\mathbf{c} \qquad (43)
$$
by writing
$$
\mathbf{c} = \mathbf{F}^{-1}\mathbf{y} \qquad (44)
$$
because F does not have an inverse. However, as we've seen, we can compute
$$
\mathbf{c} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \mathbf{y} \qquad (45)
$$
and this c will come as close as possible (in a least-squares sense) to solving (43). This leads us
to define the pseudoinverse of F as the m-by-n matrix
$$
\mathbf{F}^{+} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \qquad (46)
$$
Our least-squares solution can now be written
$$
\mathbf{c} = \mathbf{F}^{+}\mathbf{y} \qquad (47)
$$

In Scilab/Matlab the pseudoinverse is computed by the command pinv(F). However, if we
simply apply the backslash operator as we would for a square system
c = F\y

Scilab/Matlab returns the least-squares solution. We do not have to explicitly form the normal
equation or the pseudoinverse.


Example. Noise was added to eleven samples of y = x^2 - x, x = 0, 0.1, 0.2, ..., 1.
A least-squares fit of the model c_1 + c_2 x + c_3 x^2 gave
c_1 = 0.044, c_2 = -1.110, c_3 = 1.039
Code is shown below and results are plotted in Fig. 4.
-->x = [0:0.1:1]';
-->y0 = x.^2-x;
-->y = y0+rand(y0,'normal')*0.03; //add noise
-->F = [ones(x),x,x.^2];
-->c = F\y
c =
0.0436524
- 1.1104735
1.0390231
-->yf = F*c

5 Goodness of fit
Once we've fit a model to data we may wonder if the fit is good or not. It would be helpful to
have a measure of goodness of fit. Doing this rigorously requires details from probability theory.
We will present the following results without derivation.
Assume our y values are of the form
$$
y_i = s_i + \epsilon_i
$$
where s_i is the signal that we are trying to model and \epsilon_i is noise. If our model were to perfectly

Fig. 4: f(x) = x^2 - x (dashed curve), samples of f(x)
with noise added (dots) and least-squares fit of model
c_1 + c_2 x + c_3 x^2 (solid line).


fit the signal, then the residuals
$$
r_i = y_i - \sum_{j=1}^{m} c_j f_j(x_i) \qquad (48)
$$
would simply be noise, r_i = \epsilon_i. We can quantify the goodness of fit by comparing the statistics of
our residuals to the (assumed known) statistics of the noise. Specifically, for large n-m, and
normally distributed noise, a good fit will result in the number
$$
\sigma = \sqrt{ \frac{1}{n-m} \sum_{i=1}^{n} r_i^2 } \qquad (49)
$$
being equal, on average, to the standard deviation of the noise, where n is the number of data and
m is the number of model coefficients. If it is significantly larger than this it indicates that the
model is not accounting for all of the signal, where a fractional change of about sqrt(2/(n-m)) is
statistically significant. For example, sqrt(2/50) = 0.2 means that a change of around 20% is
statistically significant. If the noise standard deviation is 0.1, a sigma larger than about
0.1(1.2) = 0.12 implies the signal is not being fully modeled. The following example illustrates
the use of this goodness-of-fit measure.
Example. The following code was used to generate 50 samples of the function
f(x) = x + x^2 over the interval 0 <= x <= 1 with normally distributed noise of
standard deviation 0.05 added to each sample.
n = 50;
rand('seed',2);
x = [linspace(0,1,n)]';
y = x+x.^2+rand(x,'normal')*0.05;

These data were then fit by the four models y = c_1, y = c_1 + c_2 x,
y = c_1 + c_2 x + c_3 x^2 and y = c_1 + c_2 x + c_3 x^2 + c_4 x^3. The resulting values were
sigma_0 = 0.6018, sigma_1 = 0.0864, sigma_2 = 0.0506 and sigma_3 = 0.0504. Since
sqrt(2/50) = 0.2, a change of about 20% is statistically significant. The fits improved
significantly until the last model. The data therefore support the model y = c_1 + c_2 x + c_3 x^2 but
not the cubic model. The fits are shown in Fig. 5.
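A sketch of how the sigma values above can be computed for, say, the quadratic model (continuing from the data-generation code in the example):

F = [ones(x), x, x.^2];           // model y = c1 + c2*x + c3*x^2
c = F\y;                          // least-squares fit
r = y - F*c;                      // residuals
m = 3;                            // number of model coefficients
sigma = sqrt(sum(r.^2)/(n-m));    // goodness-of-fit measure (49)
disp(sigma);                      // close to the noise level 0.05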


Fig. 5 Data set fit by polynomials. Top-left: y = c_1, sigma_0 = 0.6018. Top-right: y = c_1 + c_2 x, sigma_1 = 0.0864.
Bottom-left: y = c_1 + c_2 x + c_3 x^2, sigma_2 = 0.0506. Bottom-right: y = c_1 + c_2 x + c_3 x^2 + c_4 x^3, sigma_3 = 0.0504.


Lecture 20
Curve fitting II
1 Introduction
In the previous lecture we developed a method to solve the general linear least-squares problem.
Given n samples (x_i, y_i), the coefficients c_j of a model
$$
y = f(x) = \sum_{j=1}^{m} c_j f_j(x) \qquad (1)
$$
are found which minimize the mean-squared error
$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 \qquad (2)
$$
The MSE is a quadratic function of the c_j, and the best-fit coefficients are the solution to a system of
linear equations.
In this lecture we consider the non-linear least-squares problem. We have a model of the form
$$
y = f(x; c_1, c_2, \ldots, c_m) \qquad (3)
$$
where the c_j are general parameters of the function f, not necessarily coefficients. An example
is fitting an arbitrary sine wave to data where the model is
$$
y = f(x; c_1, c_2, c_3) = c_1 \sin(c_2 x + c_3)
$$
The mean-squared error
$$
\mathrm{MSE}(c_1, \ldots, c_m) = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i - f(x_i; c_1, \ldots, c_m) \right]^2 \qquad (4)
$$
will no longer be a quadratic function of the c_j, and the best-fit c_j will no longer be given as
the solutions of a linear system. Before we consider this general case, however, let's look at a
special situation in which a non-linear model can be linearized.

2 Linearization
In some cases it is possible to transform a nonlinear problem into a linear problem. For example,
the model
$$
y = c_1 e^{c_2 x} \qquad (5)
$$
is nonlinear in parameter c_2. However, taking the logarithm of both sides gives us
$$
\ln y = \ln c_1 + c_2 x \qquad (6)
$$
If we define ŷ = ln y and ĉ_1 = ln c_1 then our model has the linear form
$$
\hat{y} = \hat{c}_1 + c_2 x \qquad (7)
$$

Fig. 1: Dashed line: y = 3 e^{-1.2x}. Dots: ten samples with added noise. Solid line: fit of the
model y = c_1 e^{c_2 x} obtained by fitting a linear model ln y = ĉ_1 + c_2 x and then calculating
c_1 = e^{ĉ_1}.

Once we've solved for ĉ_1, c_2 we can calculate c_1 = e^{ĉ_1}.

Example. Noise was added to ten samples of y = 3 e^{-1.2x}, 0 <= x <= 2. The
following code computed the fit of the linearized model.
ylog = log(y);
a = (mean(x.*ylog)-mean(x)*mean(ylog))/(mean(x.^2)-mean(x)^2);
b = mean(ylog)-a*mean(x);
c1 = exp(b);
c2 = a;
disp([c1,c2]);
3.0157453
- 1.2939429
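The example assumes x and y already hold the samples; a possible data-generation step (the noise level 0.1 is an assumption, since the original does not state it) is:

x = linspace(0, 2, 10)';                   // ten sample locations
y = 3*exp(-1.2*x) + rand(x,'normal')*0.1;  // assumed true curve plus noise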

3 Nonlinear least squares
The general least-squares problem is to find the c_j that minimize
$$
\mathrm{MSE}(c_1, \ldots, c_m) = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i - f(x_i; c_1, \ldots, c_m) \right]^2 \qquad (8)
$$
This is simply the "optimization in n dimensions" problem that we dealt with in a previous
lecture. We can use any of those techniques, such as Powell's method, to solve this problem. It is
convenient, however, to have a front end function that forms the MSE given the data (x_i, y_i)
and the function f(x; c_1, c_2, ..., c_m) and passes that to our minimization routine of choice. The
function fitLeastSquares in the Appendix is an example of such a front end.
The following example illustrates the use of fitLeastSquares.


Example. Eleven data samples in the interval 0 <= x <= 1 of the function
y = 2 cos(6x + 0.5) were generated. Normally distributed noise with standard
deviation 0.1 was added to the y data. A fit of the model y = c_1 cos(c_2 x + c_3)
gave c_1 = 1.98, c_2 = 5.86, c_3 = 0.55. The fit is shown in Fig. 2 and was
generated with the following code.
rand('seed',2);
x = [0:0.1:1]';
y = 2*cos(6*x+0.5)+rand(x,'normal')*0.1;
c0 = [1;5;0];
function yf = fMod(x,c)
yf = c(1)*cos(c(2)*x+c(3));
endfunction
[c,fctMin] = fitLeastSquares(x,y,fMod,c0,0.01,1e-6);
disp(c);
1.9846917
5.8614475
0.5453276

4 The datafit (Scilab) and lsqcurvefit (Matlab) functions
The Scilab function datafit provides a front end to the optim function for solving
least-squares problems. We create a function to calculate the residual given a single value of x
and a single value of y. These x,y values are passed in a 2-by-1 array.

Fig. 2: Fit of the model y = c_1 cos(c_2 x + c_3) to 11 data points.


The following code solves the problem in the previous example using datafit.
rand('seed',2);
x = [0:0.1:1]';
y = 2*cos(6*x+0.5)+rand(x,'normal')*0.1;
c0 = [1;5;0];
xyArray = [x,y]';
function r = residual(c,xy)
x = xy(1);
y = xy(2);
r = y-c(1)*cos(c(2)*x+c(3));
endfunction
c0 = [1;5;0];
c = datafit(residual,xyArray,c0);
disp(c);

In Matlab the function lsqcurvefit can be used to implement a least-squares fit. The first
step is to create a file specifying the model function in terms of the parameter vector c and the x
data. In this example the file is named fMod.m
function yMod = fMod(c,x)
yMod = c(1)*cos(c(2)*x+c(3));

Then, in the main program we pass the function fMod as the first argument to lsqcurvefit,
along with the initial estimate of the parameter vector c0 and the x and y data.
x = [0:0.1:1]';
y = 2*cos(6*x+0.5)+randn(size(x))*0.1;
c0 = [1;5;0];
c = lsqcurvefit(@fMod,c0,x,y);
disp(c);


5 Appendix Scilab code


//////////////////////////////////////////////////////////////////////
// fitLeastSquares.sci
// 2014-11-11, Scott Hudson, for pedagogic purposes
// Given n data points x(i),y(i) and a function
// fct(x,c) where c is a vector of m parameters, find c values that
// minimize sum over i (y(i)-fct(x(i),c))^2 using Powell's method.
// c0 is initial guess for parameters. cStep is initial step size
// for parameter search.
//////////////////////////////////////////////////////////////////////
function [c,fctMin] = fitLeastSquares(xData,yData,fct,c0,cStep,tol)
  nData = length(xData);
  function w=fMSE(cTest)
    w = 0;
    for i=1:nData
      w = w+(yData(i)-fct(xData(i),cTest))^2;
    end
    w = w/nData;
  endfunction
  [c,fctMin] = optimPowell(fMSE,c0,cStep,tol);
endfunction


Lecture 21
Numerical differentiation
1 Introduction
We can analytically calculate the derivative of any elementary function, so there might seem to
be no motivation for calculating derivatives numerically. However we may need to estimate the
derivative of a numerical function, or we may only have a fixed set of sampled function values.
In these cases we need to estimate the derivative numerically.

2 Finite difference approximation
The definition of the derivative of a function f(x) that you will most often find in calculus
textbooks is
$$
\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \qquad (1)
$$
This immediately suggests the approximation
$$
\frac{df}{dx} \approx \frac{\Delta f}{\Delta x} = \frac{f(x+h) - f(x)}{h} \qquad (2)
$$
where the step size h is small but not zero. This is called a finite difference. Specifically it's a
"forward difference" because we compare the function value at x with its value at a point
forward of this along the x axis, x+h.
How small should h be? Because of round-off error, smaller is not always better. Let's use Scilab
to estimate
$$
\left. \frac{d}{dx} e^x \right|_{x=0} = 1 \approx \frac{e^h - e^0}{h} \qquad (3)
$$
for various h values. The absolute error in the estimate vs. h is graphed in Fig. 1. As h decreases
from 10^{-1} down to 10^{-8} the error decreases also. However, for h = 10^{-9} and smaller the error
actually increases! The culprit is round-off error in the form of the small difference of large
numbers. Double precision arithmetic provides about 16 digits of precision. If h <= 10^{-16} then
e^h = e^0 to 16 digits and the difference e^h - e^0 will be very inaccurate. When h = 10^{-8} the
difference e^h - e^0 will be accurate to about 8 digits, or about 10^{-8}, the point at which theoretical
improvement in numerical accuracy is offset by higher round-off error. We typically ignore
round-off error when estimating numerical accuracy, but round-off error needs to be kept in mind
when implementing any algorithm.
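A sketch of the experiment behind Fig. 1 (the output formatting is an assumption):

for k = 1:12
    h = 10^(-k);
    d = (exp(h)-exp(0))/h;                 // forward difference at x = 0
    mprintf("h = %8.1e  error = %8.1e\n", h, abs(d-1));
end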
Let's investigate how the numerical accuracy of our estimate varies with step size h. Assume we
want to estimate the derivative of f(x) at x = 0. Write f as a power series
$$
f(x) = f(0) + \sum_{n=1}^{\infty} \frac{1}{n!} f^{(n)}(0) x^n \qquad (4)
$$


Fig. 1: Error in forward-difference estimation of the derivative of e^x at x=0 vs. step size h. As h decreases from 10^-1 to 10^-8, numerical error decreases proportionally. As h decreases further, round-off error begins to exceed numerical error.

Then

    [f(h) − f(0)]/h = Σ_{n=1}^∞ (1/n!) f^(n)(0) h^(n−1) = f'(0) + (1/2) f''(0) h + (1/6) f^(3)(0) h^2 + ⋯    (5)

For small h, therefore,

    [f(h) − f(0)]/h ≈ f'(0) + (1/2) f''(0) h    (6)

and we say that the approximation is first-order accurate since the error (the second term on the right) varies as the first power of h. Decreasing h by a factor of 1/10 will decrease the error by 1/10. However, as we see in Fig. 1, this is only true up to the point that round-off error begins to be significant. For double precision, h ≈ 10^-8 is optimal. We can shift (6) along the x axis and rearrange to obtain the general forward-difference formula

    f'(x) = [f(x+h) − f(x)]/h + O(h)    (7)

3 Higher-order formulas
The forward-difference approximation (6) uses two samples of the function, namely f(0) and f(h). Using three samples we might be able to get a better estimate of f'(0). Suppose we have the samples f(−h), f(0), f(h). In terms of the power series representation of the function these are


    f(−h) = f(0) − f'(0)h + (1/2)f''(0)h^2 − (1/3!)f^(3)(0)h^3 + (1/4!)f^(4)(0)h^4 − (1/5!)f^(5)(0)h^5 + ⋯
    f(0)  = f(0)                                                                                            (8)
    f(h)  = f(0) + f'(0)h + (1/2)f''(0)h^2 + (1/3!)f^(3)(0)h^3 + (1/4!)f^(4)(0)h^4 + (1/5!)f^(5)(0)h^5 + ⋯

Let's use a linear combination of these values to estimate f'(0) as

    f'(0) ≈ a f(−h) + b f(0) + c f(h)    (9)

where a, b, c are unknown coefficients that we will choose to get the best possible estimate. The sum a f(−h) + b f(0) + c f(h) will include a term with a factor of f(0). We want this to vanish. This requires

    (a + b + c) f(0) = 0  ⇒  a + b + c = 0    (10)

which is one equation in three unknowns. Terms with a factor of f'(0) should combine to give the derivative value f'(0). This requires

    (−a + c) f'(0) h = f'(0)  ⇒  −a + c = 1/h    (11)

We now have two equations in three unknowns. To get a third equation we can require the next term, which contains a factor of f''(0) h^2, to vanish. This gives us the equation

    (a + c) (1/2) f''(0) h^2 = 0  ⇒  a + c = 0    (12)

Our three equations in three unknowns can be written as

    [  1  1  1 ] [a]   [  0  ]
    [ −1  0  1 ] [b] = [ 1/h ]
    [  1  0  1 ] [c]   [  0  ]

The solution is

    a = −1/(2h),  b = 0,  c = 1/(2h)

and our approximation reads

    f'(0) ≈ [f(h) − f(−h)]/(2h)

Using (8) we have

    [f(h) − f(−h)]/(2h) = f'(0) + (1/6) f^(3)(0) h^2 + ⋯
so this approximation is second-order accurate. Decreasing h by a factor of 1/10 should
decrease the numerical error by a factor of 1/100 . Rearranging and writing this for an arbitrary
value of x we have the formula

Fig. 2: Error in central-difference estimate of the derivative of e^x vs. h.

    f'(x) = [f(x+h) − f(x−h)]/(2h) + O(h^2)    (13)

This type of finite difference is called a central difference since it uses both the forward sample f(x+h) and the backward sample f(x−h). Scilab code is given in the Appendix.
The error in the central-difference approximation

    d/dx e^x |_{x=0} = 1 ≈ (e^h − e^(−h))/(2h)

is plotted in Fig. 2. Note how the error decreases more rapidly with decreasing h. This allows the approximation to reach a greater accuracy before round-off error starts to become significant. With h = 10^-5 the error is only about 10^-11.
We extend this idea by using even more function samples. If we have the five samples f(−2h), f(−h), f(0), f(h), f(2h) we can form an estimate

    f'(0) ≈ a f(−2h) + b f(−h) + c f(0) + d f(h) + e f(2h)    (14)

This has five unknowns, so we need to form five equations. In terms of the Taylor series representation of f(x) our five samples have the form

    f(−2h) = f(0) − 2f'(0)h + 2f''(0)h^2 − (8/3!)f^(3)(0)h^3 + (16/4!)f^(4)(0)h^4 − (32/5!)f^(5)(0)h^5 + ⋯
    f(−h)  = f(0) − f'(0)h + (1/2)f''(0)h^2 − (1/3!)f^(3)(0)h^3 + (1/4!)f^(4)(0)h^4 − (1/5!)f^(5)(0)h^5 + ⋯
    f(0)   = f(0)                                                                                             (15)
    f(h)   = f(0) + f'(0)h + (1/2)f''(0)h^2 + (1/3!)f^(3)(0)h^3 + (1/4!)f^(4)(0)h^4 + (1/5!)f^(5)(0)h^5 + ⋯
    f(2h)  = f(0) + 2f'(0)h + 2f''(0)h^2 + (8/3!)f^(3)(0)h^3 + (16/4!)f^(4)(0)h^4 + (32/5!)f^(5)(0)h^5 + ⋯

To get the f(0) terms in (14) to vanish requires

    (a + b + c + d + e) f(0) = 0  ⇒  a + b + c + d + e = 0    (16)

To get the f'(0) terms to produce the value f'(0) requires

    (−2a − b + d + 2e) f'(0) h = f'(0)  ⇒  −2a − b + d + 2e = 1/h    (17)

The remaining three equations are obtained by requiring the f''(0), f^(3)(0) and f^(4)(0) terms to vanish:

    (2a + b/2 + d/2 + 2e) f''(0) h^2 = 0  ⇒  4a + b + d + 4e = 0
    (1/3!)(−8a − b + d + 8e) f^(3)(0) h^3 = 0  ⇒  −8a − b + d + 8e = 0
    (1/4!)(16a + b + d + 16e) f^(4)(0) h^4 = 0  ⇒  16a + b + d + 16e = 0
Our five equations in five unknowns form the system

    [  1  1  1  1   1 ] [a]   [  0  ]
    [ −2 −1  0  1   2 ] [b]   [ 1/h ]
    [  4  1  0  1   4 ] [c] = [  0  ]    (18)
    [ −8 −1  0  1   8 ] [d]   [  0  ]
    [ 16  1  0  1  16 ] [e]   [  0  ]

which has the solution

    a = 1/(12h),  b = −8/(12h),  c = 0,  d = 8/(12h),  e = −1/(12h)    (19)

Our approximation is therefore

    f'(0) ≈ [f(−2h) − 8 f(−h) + 8 f(h) − f(2h)]/(12h)    (20)


Fig. 3: Fourth-order accurate central-difference approximation to the derivative of e^x.

Using (15) the f^(5)(0) terms are

    f^(5)(0) (h^5/(12h·5!)) (−32 + 8 + 8 − 32) = −(1/30) f^(5)(0) h^4    (21)

so

    [f(−2h) − 8 f(−h) + 8 f(h) − f(2h)]/(12h) = f'(0) − (1/30) f^(5)(0) h^4 + ⋯    (22)

We see that this is a fourth-order accurate approximation. Decreasing h by a factor of 1/10 should decrease the numerical error by a factor of 1/10,000. This is illustrated in Fig. 3 for f(x) = e^x. For h = 10^-3 the error is only about 10^-14.
For arbitrary x our fourth-order central-difference approximation is

    f'(x) = [f(x−2h) − 8 f(x−h) + 8 f(x+h) − f(x+2h)]/(12h) + O(h^4)    (23)

Scilab code is given in the Appendix.

4 The numderivative function (Scilab)


Scilab has a numerical derivative function named numderivative. To calculate the derivative of the function f(x) at x = x0 we execute
fp = numderivative(f,x0);

If f is a function of several variables, f(x), then numderivative will return the gradient of f. It is also possible to specify the step size h and the order of the approximation (1, 2 or 4).
fp = numderivative(f,x0,h,order);

The default is second order (central difference) and Scilab chooses an optimal value of h.
Some examples:
-->deff('y=f(x)','y=exp(-x)');
-->numderivative(f,1) //default central difference with optimal h
ans =
- 0.3678794 //exact value is -exp(-1)=-0.3678794...
-->(f(1.1)-f(1))/0.1 //forward difference h=0.1
ans =
- 0.3500836
-->numderivative(f,1,0.1,1) //forward difference h=0.1
ans =
- 0.3500836
-->(f(1.1)-f(0.9))/0.2 //central difference h=0.1
ans =
- 0.3684929
-->numderivative(f,1,0.1,2) //central difference h=0.1
ans =
- 0.3684929

In most cases the default suffices.

5 Second derivative

With reference to (8), if we want to approximate the second derivative f''(0) as

    f''(0) ≈ a f(−h) + b f(0) + c f(h)    (24)

we would require

    (a + b + c) f(0) = 0  ⇒  a + b + c = 0    (25)

    (−a + c) f'(0) h = 0  ⇒  −a + c = 0    (26)

and

    (a + c) (1/2) f''(0) h^2 = f''(0)  ⇒  a + c = 2/h^2    (27)

The solution to these three equations is

    a = c = 1/h^2,  b = −2/h^2    (28)

and we find

    [f(−h) − 2 f(0) + f(h)]/h^2 = f''(0) + (1/12) f^(4)(0) h^2 + ⋯    (29)

so the approximation is second-order accurate. For arbitrary x value we have


    f''(x) = [f(x+h) − 2 f(x) + f(x−h)]/h^2 + O(h^2)    (30)
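As a quick numerical check (a sketch of mine, not from the notes), the following Scilab lines apply (30) to f(x) = e^x at x = 0, where the exact second derivative is 1:

deff('y=f(x)','y=exp(x)');
h = 1e-4;
fpp = (f(h)-2*f(0)+f(-h))/h^2;  // central-difference second derivative (30)
disp(abs(fpp-1));               // error should be roughly 1e-8 (truncation plus round-off)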

6 Partial derivatives
For a function f(x, y) the partial derivative ∂f/∂x can be defined as

    ∂f/∂x = lim_{h→0} [f(x+h, y) − f(x, y)]/h

that is, we hold y fixed and compute the derivative as if f were only a function of x. A central-difference approximation is

    ∂f/∂x ≈ [f(x+h, y) − f(x−h, y)]/(2h)    (31)

Likewise

    ∂f/∂y ≈ [f(x, y+h) − f(x, y−h)]/(2h)    (32)

and

    ∂²f/∂x² ≈ [f(x+h, y) − 2 f(x, y) + f(x−h, y)]/h^2    (33)
Mixed partial derivative approximations such as ∂²f/∂y∂x can be developed in steps such as

    ∂²f/∂y∂x ≈ ( [∂f/∂x]_(y+h) − [∂f/∂x]_(y−h) ) / (2h)
             ≈ ( [f(x+h, y+h) − f(x−h, y+h)]/(2h) − [f(x+h, y−h) − f(x−h, y−h)]/(2h) ) / (2h)    (34)
             = [f(x+h, y+h) − f(x−h, y+h) − f(x+h, y−h) + f(x−h, y−h)]/(4h^2)
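A sketch of (34) in Scilab, applied to the test function f(x,y) = x^2 y^3 (my own choice, not from the notes), whose exact mixed partial 6xy^2 equals 6 at (1,1):

deff('z=f(x,y)','z=x^2*y^3');
h = 1e-4; x = 1; y = 1;
fxy = (f(x+h,y+h)-f(x-h,y+h)-f(x+h,y-h)+f(x-h,y-h))/(4*h^2);  // formula (34)
disp(fxy);  // should be close to the exact value 6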

7 Differential equations
7.1 Ordinary differential equations
An ordinary differential equation (ODE) relates a single independent variable, e.g., x, to a function f(x) and its derivatives f'(x), f''(x), …. Most physical laws are expressed in terms of differential equations, hence their great importance. Certain classes of ODEs can be solved analytically but many cannot. In either case our derivative formulas can be used to develop numerical solutions.
Suppose a physical problem is described by a differential equation of the form

    f'' + 2 f' + 17 f = 0    (35)

One can verify that

    f(x) = e^(−x) cos(4x)    (36)
solves (35) by taking derivatives and substituting into the equation. A numerical approximation
to (35) is given by (using (30) and (13))
    [f(x+h) − 2 f(x) + f(x−h)]/h^2 + 2 [f(x+h) − f(x−h)]/(2h) + 17 f(x) = 0    (37)

Solving this for f(x+h) we obtain

    f(x+h) = [(2 − 17h^2) f(x) − (1 − h) f(x−h)]/(1 + h)    (38)

Let's use this to calculate f(x) for x = 0, h, 2h, 3h, …. To get started we need the first two values

    f(x_1 = 0) = 1,  f(x_2 = h) = e^(−h) cos(4h)    (39)
Then we can apply (38) to get f(x_3 = x_2 + h), f(x_4 = x_3 + h) and so on as long as we wish. In Scilab this looks something like

h = 0.1;
x = 0:h:5;
n = length(x);
y = zeros(n,1);
y(1) = 1;                 // y(x1=0) = 1
y(2) = exp(-h)*cos(4*h);  // y(x2=h)
for i=2:n-1
    y(i+1) = ((2-17*h^2)*y(i)-(1-h)*y(i-1))/(1+h);
end

The resulting numerical solution and the exact solution are shown in Fig. 4. The agreement is
excellent.
Function odeCentDiff in the Appendix uses this idea to numerically solve a second-order equation of the form

    y'' + p(x) y' + q(x) y = r(x)

given an initial x value x_1, a step size h and the two function values y(x_1) and y(x_1 + h). Fig. 4 compares the numerical solution of (35) using odeCentDiff with the exact solution

    y = f(x) = e^(−x) cos(4x)    (40)

for a step size h = 0.1.

7.2 Partial differential equations


A partial differential equation (PDE) relates two or more independent variables, e.g., x, y, to a function f(x, y) and its partial derivatives ∂f/∂x, ∂f/∂y, ∂²f/∂x², ∂²f/∂x∂y, …. One of the most important PDEs is Laplace's equation


Fig. 4: Solid line: numerical solution of (35); circles: exact solution.


    ∂²f/∂x² + ∂²f/∂y² = 0    (41)

Numerically we can write


    ∂²f/∂x² + ∂²f/∂y² ≈ [f(x+h, y) − 2 f(x, y) + f(x−h, y)]/h^2 + [f(x, y+h) − 2 f(x, y) + f(x, y−h)]/h^2
                      = [f(x+h, y) + f(x−h, y) + f(x, y+h) + f(x, y−h) − 4 f(x, y)]/h^2    (42)

The last expression is zero when

    f(x, y) = (1/4)[f(x+h, y) + f(x−h, y) + f(x, y+h) + f(x, y−h)]    (43)

which relates the value f(x, y) to its neighboring values. Specifically, f(x, y) is equal to the average of its neighbors' values.
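This averaging property is the basis of relaxation methods for solving Laplace's equation. The sketch below (not part of the lecture's code; the grid size, boundary values and iteration count are arbitrary choices for illustration) repeatedly replaces each interior value by the average of its four neighbors:

n = 20;                      // n-by-n grid
f = zeros(n,n);
f(1,:) = 1;                  // boundary condition: top edge held at 1, others at 0
for iter = 1:500             // fixed number of relaxation sweeps, for simplicity
    g = f;
    for i = 2:n-1
        for j = 2:n-1
            g(i,j) = (f(i+1,j)+f(i-1,j)+f(i,j+1)+f(i,j-1))/4;  // formula (43)
        end
    end
    f = g;
end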


8 Appendix Scilab code


8.1 2nd order central difference

//////////////////////////////////////////////////////////////////////
// derivSecondOrder.sci
// 2014-11-15, Scott Hudson, for pedagogic purposes
// Numerical estimation of derivative of f(x) using 2nd-order
// accurate central difference and "optimum" step size.
//////////////////////////////////////////////////////////////////////
function yp=derivSecondOrder(f, x)
    h = 1e-5*(1+abs(x)); //step size scales with x, no less than 1e-5
    yp = (f(x+h)-f(x-h))/(2*h);
endfunction

8.2 4th order central difference



//////////////////////////////////////////////////////////////////////
// derivFourthOrder.sci
// 2014-11-15, Scott Hudson, for pedagogic purposes
// Numerical estimation of derivative of f(x) using 4th-order
// accurate central difference and "optimum" step size.
//////////////////////////////////////////////////////////////////////
function yp=derivFourthOrder(f, x)
    h = 1e-3*(1+abs(x)); //step size scales with x, no less than 1e-3
    yp = (f(x-2*h)-8*f(x-h)+8*f(x+h)-f(x+2*h))/(12*h);
endfunction

8.3 Differential equation solver using 2nd order central difference



//////////////////////////////////////////////////////////////////////
// odeCentDiff.sci
// 2014-11-15, Scott Hudson, for pedagogic purposes
// Uses 2nd-order accurate central difference approximation to
// derivatives to solve ode y''+p(x)y'+q(x)y=r(x)
// approximations are
// y' = (y(x+h)-y(x-h))/(2h) and y'' = (y(x+h)-2y(x)+y(x-h))/h^2
// p,q,r are functions, x1 is the initial x value, h is step size,
// n is number of points to solve for, y1=y(x1), y2=y(x1+h).
//////////////////////////////////////////////////////////////////////
function [x, y]=odeCentDiff(p, q, r, x1, h, n, y1, y2)
    x = zeros(n,1);
    y = zeros(n,1);
    x(1) = x1;
    x(2) = x(1)+h;
    y(1) = y1;
    y(2) = y2;
    h2 = h*h;
    for i=2:n-1
        hp = h*p(x(i));
        x(i+1) = x(i)+h;
        y(i+1) = (2*h2*r(x(i))+(4-2*h2*q(x(i)))*y(i)+(hp-2)*y(i-1))/(2+hp);
    end
endfunction


Lecture 22
Numerical integration
1 Introduction
The derivative of any elementary function can be calculated explicitly as an elementary function.
However, the anti-derivative of an elementary function may not be expressible as an elementary
function. Therefore situations arise where the value of a definite integral
    I = ∫_a^b f(x) dx    (1)

must be either approximated analytically or calculated numerically, even if f is an elementary function. If f is a numerical function, or if we only have samples of f, then numerical integration is our only option.
There are two cases we will consider.
Case 1. We are given n samples of the function (x_i, y_i), i = 1, 2, …, n. We cannot evaluate f(x) for any other values of x.
Case 2. We can evaluate f(x) for any value of x.
In Case 1 our options are somewhat limited, while in Case 2 we have the freedom (and the burden) of deciding how many and which values of x to evaluate f(x) at. We briefly deal with Case 1 in the next section and then devote the rest of the lecture to Case 2.

2 Integration of sampled data


Fig. 1 illustrates our problem in Case 1. We have n samples (x_i, y_i), x_1 = a < x_2 < ⋯ < x_n = b. They may be uniformly spaced, with x_(i+1) = x_i + h, or non-uniformly spaced. A reasonable approach is to estimate f(x) by interpolating these data and integrating that interpolation. In our study of interpolation techniques we learned that cubic splines provide the smoothest possible function which passes through n given points. A cubic function can be explicitly integrated. Therefore an attractive option is to interpolate our data with cubic splines and use the integral of the spline interpolation as an estimate of the integral of the underlying function f(x). Scilab

Fig. 1: Integration of sampled data by integrating an interpolated function.


provides the intsplin function to do just this. Its usage is


I = intsplin(x,y);

where x,y are the arrays of x and y samples, the integration is from x_1 to x_n, and I is the estimate of the integral. As an example, the following integral can be calculated exactly

    I = ∫_0^3 e^(−x) sin(πx) dx = (π/(π^2+1)) (1 + e^(−3)) = 0.3034152…    (2)

Eleven samples of f(x) (Fig. 2) passed to intsplin estimated I with an error of less than 1%.
deff('y=f(x)','y=exp(-x).*sin(%pi*x)');
n = 11;
x = linspace(0,3,n);
y = f(x);
I = intsplin(x,y);
disp('I = '+string(I));
I = 0.3057043

If no additional information is available, this is typically about as good as we can do with sampled data, especially if the sampling is non-uniform. However, suppose we know from the physics of the problem that f(x) has to be of the form a e^(−bx) sin(cx). Then the best approach would be to estimate a, b, c using a least-squares fit and integrate a e^(−bx) sin(cx).

Fig. 2: Eleven samples of f(x) = e^(−x) sin(πx) over 0 ≤ x ≤ 3 used with the intsplin function to estimate I = ∫_0^3 f(x) dx.


Fig. 3: Integration using the midpoint rule.

3 Midpoint (rectangle) rule


Now we turn to Case 2 where we can evaluate f(x) as desired. The advantage we have over Case 1 is that we can increase the number of sample points and use the change in the value of I to estimate the error in our calculation.
The midpoint rule can be thought of as integration of a nearest-neighbor interpolation (Fig. 3). We divide the interval a ≤ x ≤ b into n sub-intervals of width h = (b−a)/n. This width h is our step size. We sample f(x) at the midpoint of each interval. Treating the function as a constant there, the integral over one interval is f(x)h, the area of a rectangle of height f(x) and width h. Adding the contributions of all n intervals we have

    I ≈ Σ_{i=1}^n f(a + [i − 1/2] h) h,  where h = (b−a)/n    (3)

The midpoint rule is conceptually simple. It is nothing more than a Riemann sum such as is typically used in calculus textbooks to define a definite integral. It has the advantage that f(x) is not evaluated at x = a, b, so it can be applied to functions which are singular at one or both endpoints, such as

    ∫_0^1 dx/√x    (4)
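A minimal Scilab sketch of (3) (the function name intMidpoint is mine; it is not in the notes' appendix), applied to the singular integral (4), whose exact value is 2:

function I=intMidpoint(f, a, b, n)
    h = (b-a)/n;
    I = 0;
    for i = 1:n
        I = I+f(a+(i-0.5)*h);  // sample at the midpoint of each sub-interval
    end
    I = I*h;
endfunction

deff('y=f(x)','y=1/sqrt(x)');
disp(intMidpoint(f,0,1,1000));  // approaches 2, slowly, because of the singularity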

4 Trapezoid rule
The trapezoid rule approximates f(x) using linear interpolation (Fig. 4). The integral is then the sum of areas of trapezoids. If the left and right heights of a trapezoid are f(x_i) and f(x_i + h) then the trapezoid's area is

    I = (h/2)[f(x_i) + f(x_i + h)]    (5)

(the average height times the width). Adding up all these areas we have


Fig. 4: Integration using the trapezoid rule.

    I(h) = (h/2)[f(a) + f(a+h)] + (h/2)[f(a+h) + f(a+2h)] + (h/2)[f(a+2h) + f(a+3h)]
           + ⋯ + (h/2)[f(a+[n−1]h) + f(b)]    (6)

Notice that except for f(a), f(b), all the function values appear twice in the sum. Therefore

    I(h) = h [ (1/2) f(a) + f(a+h) + f(a+2h) + ⋯ + f(a+[n−1]h) + (1/2) f(b) ]    (7)

or

    I(h) = h [ (f(a) + f(b))/2 + Σ_{i=1}^{n−1} f(a+ih) ]    (8)

is the formula for integration by the trapezoid method.
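For a fixed n, (8) is only a few lines of Scilab (a sketch; the adaptive intTrapezoid in the Appendix is the implementation used in the examples below):

deff('y=f(x)','y=exp(-x).*sin(%pi*x)');
a = 0; b = 3; n = 100;
h = (b-a)/n;
x = a:h:b;                          // n+1 uniformly spaced samples
I = h*(sum(f(x))-(f(a)+f(b))/2);    // trapezoid rule (8)
disp(I);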


It is highly desirable to have some idea of the accuracy of a numerical integration. A reasonable error estimate can be obtained by comparing one integration to a second with twice as many samples (half the step size)

    ε_est = |I(h/2) − I(h)|

In order to achieve a desired tolerance we can then iteratively halve h until this error estimate is less than the tolerance.
Suppose a = 0, b = 1 and our first iteration of the trapezoid rule uses h = 1. Then we will need samples of f at x = 0, 1. As shown in Fig. 5, if for the second iteration we take h = 1/2 we will need samples at x = 0/2, 1/2, 2/2. But we will have already sampled x = 0 = 0/2, 1 = 2/2. If the third iteration has h = 1/4 we will need samples at x = 0/4, 1/4, 2/4, 3/4, 4/4, but we will have already sampled at x = 0/4 = 0/2 = 0, 2/4 = 1/2, 4/4 = 2/2 = 1. In fact at each iteration only the odd-numbered samples are new. If at the fourth iteration we take h = 1/8, we only need to sample at x = 1/8, 3/8, 5/8, 7/8. In other words, if we break the sum in (8) into even and odd samples


Fig. 5: Repeating the trapezoid rule at 1/2 the step size results in all even samples being identical to samples from the previous calculation.
    Σ_{i=1}^{n−1} f(a+ih) = Σ_{i even} f(a+ih) + Σ_{i odd} f(a+ih)    (9)

the sum of even samples has already been calculated in the previous iteration. We only need to
multiply the previous iteration value by 1/2 (since h is being halved) and add in the new (odd)
samples
    I(h) = (1/2) I(2h) + h Σ_{i=1, i odd}^{n−1} f(a+ih)    (10)

Our complete algorithm then reads

Algorithm for trapezoid-rule integration with error estimate

    n = 1, h = (b−a), I = (h/2)[f(a) + f(b)]
    repeat until converged:
        I_old = I, n ← 2n, h = (b−a)/n
        I = (1/2) I_old + h Σ_{i=1, i odd}^{n−1} f(a+ih)
        converged if |I − I_old| < tol


A Scilab implementation of this is given in the Appendix as function intTrapezoid. The trapezoid rule is second-order accurate, that is, the error varies as h^2. Halving the step size reduces the error by a factor of 1/4.


Example 1. I = ∫_0^3 e^(−x) sin(πx) dx estimated with intTrapezoid with desired error of no more than 10^-3.

deff('y=f(x)','y=exp(-x)*sin(%pi*x)');
Iex = (%pi/(%pi^2+1))*(1+exp(-3)); //exact value
I = intTrapezoid(f,0,3,1e-3);
disp([Iex,I]);
    0.3034152    0.3032642

The integration is indeed accurate to three decimal places. This required 129 function evaluations.

5 Simpson's rule
Simpson's rule integrates a quadratic interpolation of groups of three sampled function values.
Suppose we want to estimate

    I = ∫_a^b f(x) dx    (11)

using h = (b−a)/2 and the three samples y_1 = f(a), y_2 = f(a+h), y_3 = f(a+2h = b). Let x = a + th. Then a ≤ x ≤ b corresponds to 0 ≤ t ≤ 2 and dx = h dt, so that

    I = h ∫_0^2 f(a+th) dt    (12)

Representing f(a+th) by the Lagrange interpolating polynomial through the points (t=0, y_1), (t=1, y_2), (t=2, y_3) and integrating, we find

    I = h ∫_0^2 [ (1/2) y_1 (t−1)(t−2) − y_2 t(t−2) + (1/2) y_3 t(t−1) ] dt = (h/3)(y_1 + 4 y_2 + y_3)    (13)

To apply this result in general (Fig. 6) we arrange our samples x_1, x_2, x_3, x_4, … into adjacent groups of three

    (x_1, x_2, x_3), (x_3, x_4, x_5), (x_5, x_6, x_7), …    (14)

(this only works if we have an odd number of samples, which implies an even number of intervals). We then apply (13)

    I = (h/3)[(y_1 + 4y_2 + y_3) + (y_3 + 4y_4 + y_5) + (y_5 + 4y_6 + y_7) + ⋯ + (y_{n−2} + 4y_{n−1} + y_n)]    (15)

Notice that samples at group boundaries, such as y_3 and y_5, appear twice in the summation. Therefore

    I = (h/3)[y_1 + 4y_2 + 2y_3 + 4y_4 + 2y_5 + 4y_6 + 2y_7 + ⋯ + 2y_{n−2} + 4y_{n−1} + y_n]    (16)
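A sketch of (16) in Scilab for an odd number n of uniformly spaced samples (the name intSimpson is mine; it is not in the notes' appendix):

function I=intSimpson(f, a, b, n)  // n must be odd (even number of intervals)
    h = (b-a)/(n-1);
    x = a+(0:n-1)*h;
    w = ones(1,n);                 // Simpson weights 1,4,2,4,...,2,4,1
    w(2:2:n-1) = 4;
    w(3:2:n-2) = 2;
    I = (h/3)*sum(w.*f(x));
endfunction

deff('y=f(x)','y=exp(-x).*sin(%pi*x)');
disp(intSimpson(f,0,3,11));        // should reproduce the eleven-sample estimate quoted just below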

Simpson's rule is fourth-order accurate (the error varies as h^4). Simpson's rule applied to the eleven samples of Fig. 2 gave the estimate I = 0.3044273.

Fig. 6: Simpson's rule integration.


Rules based on cubic interpolation of four-point groups (Simpson's 3/8 rule), quartic interpolation of five-point groups (Boole's rule), and even higher-order interpolations exist. Together these are referred to as Newton–Cotes formulas. These formulas were of considerable interest in the past when calculations had to be done by hand.
An interesting way to view Simpson's rule is as follows. Suppose we have two samples of a
function: y_1 = f(a) and y_3 = f(b). The trapezoid rule with h = (b−a) gives

    I_1 = ((b−a)/2)[y_1 + y_3]    (17)

Suppose we add a sample between the other two: y_2 = f((a+b)/2). The trapezoid rule with h = (b−a)/2 applied to these three samples gives us

    I_2 = ((b−a)/4)[y_1 + 2y_2 + y_3]    (18)

Now

    (4 I_2 − I_1)/(4−1) = ((b−a)/3)[ y_1 + 2y_2 + y_3 − (1/2)(y_1 + y_3) ]
                        = ((b−a)/3)[ (1/2) y_1 + 2y_2 + (1/2) y_3 ]
                        = ((b−a)/6)[ y_1 + 4y_2 + y_3 ]    (19)

is Simpson's rule with h = (b−a)/2. We see that if we have one trapezoid-rule estimate I_1 using a step size 2h and a second I_2 using step size h, then Simpson's rule with step size h can be calculated as

    I = (4 I_2 − I_1)/3    (20)

Simpson's rule can be thought of as a weighted combination of trapezoid rules with different step
sizes. This idea is generalized by Romberg integration.

6 Romberg integration
Neglecting round-off error, the trapezoid rule would (in principle) produce an exact result in the
limit h → 0. Let's call the exact result I_0. For arbitrary h let's denote the trapezoid-rule estimate by I(h). Then I_0 = I(0). For small but finite h, I(h) will equal the exact result plus some error, and the error will be a function of the step size h. We can write

    I(h) = I_0 + a h^2 + b h^4 + ⋯    (21)

(It can be shown that the error is an even function of h and therefore involves only even powers.)
Romberg integration is a technique that allows us to subtract off the error terms a h^2, b h^4, …. Applying the trapezoid rule with step size h/2 we get

    I(h/2) = I_0 + a h^2/4 + b h^4/16 + ⋯    (22)

We don't know the value of the coefficient a, so we don't know the first error terms in (21) and (22). However, we do know that for any value of a

    4 (a h^2/4) = a h^2    (23)

This allows us to write

    (4 I(h/2) − I(h))/(4−1) = I_0 − b h^4/4 + ⋯    (24)

We have just removed the h^2 error term! Two second-order accurate trapezoid-rule calculations have been combined to produce a fourth-order accurate result. In fact, as we saw above, this is just Simpson's rule.
Now run the trapezoid rule with step size h/4 to get

    I(h/4) = I_0 + a h^2/16 + b h^4/256 + ⋯    (25)

Once again the h^2 error term is 1/4 the value of the previous iteration, and we can calculate

    (4 I(h/4) − I(h/2))/(4−1) = I_0 − b h^4/64 + ⋯    (26)

Now we have two results, (24) and (26), that are fourth-order accurate (both are Simpson's rule calculations). Furthermore, notice that although we don't know the value of the coefficient b, we do know that

    16 (b h^4/64) = b h^4/4    (27)

Therefore

    (4^2 (I_0 − b h^4/64) − (I_0 − b h^4/4)) / (4^2 − 1) = I_0 + ⋯    (28)

and we have eliminated both the h^2 and h^4 error terms! This result is sixth-order accurate. We can continue on in this manner to produce a result accurate to as high an order as we wish.
Here is a useful notation that will allow us to easily code Romberg integration. Define

    R(j, 1) = I((b−a)/2^(j−1))    (29)

so that R(1,1) = I(b−a), R(2,1) = I((b−a)/2), R(3,1) = I((b−a)/4) and so on. Stack these
into a one-dimensional array

    [ R(1,1) ]
    [ R(2,1) ]    (30)
    [ R(3,1) ]

Now calculate

    R(2,2) = (4 R(2,1) − R(1,1))/(4−1)    (31)

and

    R(3,2) = (4 R(3,1) − R(2,1))/(4−1)    (32)

Just as for (24) and (26), R(2,2) and R(3,2) will be fourth-order accurate results, lacking the h^2 error term. Place these in the second column of our array

    [ R(1,1)     0     ]
    [ R(2,1)  R(2,2)   ]    (33)
    [ R(3,1)  R(3,2)   ]

Now calculate

    R(3,3) = (4^2 R(3,2) − R(2,2))/(4^2 − 1)    (34)

As for (28) this will be sixth-order accurate, lacking both the h^2 and h^4 terms. Place this in the third column of our array

    [ R(1,1)     0        0     ]
    [ R(2,1)  R(2,2)      0     ]    (35)
    [ R(3,1)  R(3,2)   R(3,3)   ]

The relation between an element in the kth column and the elements in the previous column is

    R(j, k) = (4^(k−1) R(j, k−1) − R(j−1, k−1)) / (4^(k−1) − 1)    (36)

Suppose we want an eighth-order accurate result. Calculate R(4,1) = I((b−a)/8) and then use formula (36) to calculate R(4,2), R(4,3), R(4,4) to obtain

    [ R(1,1)     0        0        0     ]
    [ R(2,1)  R(2,2)      0        0     ]    (37)
    [ R(3,1)  R(3,2)   R(3,3)      0     ]
    [ R(4,1)  R(4,2)   R(4,3)   R(4,4)   ]

Our eighth-order accurate estimate is R(4,4). We can continue to add rows in this manner as many times as desired. The difference R(4,4) − R(3,3) provides an error estimate. Adding these calculations to the trapezoid-rule algorithm results in


Algorithm for Romberg integration with error estimate

    n = 1, h = (b−a), I = (h/2)[f(a) + f(b)], j = 1, R(j,1) = I
    repeat until converged:
        I_old = I, n ← 2n, h = (b−a)/n
        I = (1/2) I_old + h Σ_{i=1, i odd}^{n−1} f(a+ih)
        j ← j+1, R(j,1) = I
        for k = 2, 3, …, j
            R(j,k) = (w R(j,k−1) − R(j−1,k−1))/(w−1),  w = 4^(k−1)
        converged if |R(j,j) − R(j−1,j−1)| < tol


A Scilab implementation of this appears in the Appendix as intRomberg.
Example 2. I = ∫_0^3 e^(−x) sin(πx) dx estimated with intTrapezoid and intRomberg with desired error of no more than 10^-6.


global nCalls
function y = f(x)
    global nCalls
    nCalls = nCalls+1;
    y = exp(-x)*sin(%pi*x);
endfunction
Iex = (%pi/(%pi^2+1))*(1+exp(-3)); //exact result
nCalls = 0;
I = intTrapezoid(f,0,3,1e-6);
disp([Iex,I,nCalls]);
nCalls = 0;
I = intRomberg(f,0,3,1e-6);
disp([Iex,I,nCalls]);
    0.3034152    0.3034151    4097.
    0.3034152    0.3034152    65.

4097 function calls were required by the trapezoid rule while only 65 were needed for Romberg integration. The trapezoid rule error was 1.5×10^-7 while the Romberg integration error was 7.2×10^-11.

7 Gaussian quadrature
Quadrature is an historic term used to describe integration. So far we've assumed f(x) is uniformly sampled along the x axis. The idea behind Gaussian quadrature is to consider arbitrarily placed x samples. To see why this is a good idea consider the following function
    f(x) = a + b x^2 + c x^4    (38)

and its integral

    I = ∫_{−1}^{1} f(x) dx = 2a + (2/3) b + (2/5) c    (39)

(Note: the integral of an odd power of x over −1 ≤ x ≤ 1 vanishes, hence we don't bother to include odd powers in f(x).) Suppose we are allowed to estimate I using three samples of f(x). We could use Simpson's rule to get

    I_Simpson = (1/3)[f(−1) + 4 f(0) + f(1)] = (1/3)[(a+b+c) + 4a + (a+b+c)] = 2a + (2/3) b + (2/3) c    (40)

The a and b terms are correct but the c term is not. This is not surprising since Simpson's rule interpolates the three samples with a quadratic. This is exact for a quadratic function, but the presence of the x^4 term results in error. Now consider the following combination of three f(x) samples

    I_Gauss = (1/9)[ 5 f(−√(3/5)) + 8 f(0) + 5 f(√(3/5)) ]
            = (1/9)[ (5a + 3b + (9/5)c) + 8a + (5a + 3b + (9/5)c) ]
            = 2a + (2/3) b + (2/5) c    (41)

This result is exact, I_Gauss = I, even though it required only three samples. It turns out that if you properly choose the n sample points x_i and corresponding weights w_i you can make

    Σ_{i=1}^n w_i f(x_i) = ∫_{−1}^{1} f(x) dx

for f(x) an arbitrary polynomial of order 2n−1. The x_i turn out to be roots of certain polynomials, and the formulas for the x_i and w_i are fairly involved.
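For n = 3 on −1 ≤ x ≤ 1 the nodes are 0 and ±√(3/5) with weights 8/9 and 5/9, as used in (41). A short Scilab check (a sketch; the test polynomial is my own choice):

x = [-sqrt(3/5); 0; sqrt(3/5)];       // 3-point Gauss-Legendre nodes
w = [5/9; 8/9; 5/9];                  // corresponding weights
deff('y=f(x)','y=1+x.^2+x.^4+x.^5');  // arbitrary polynomial of order 2n-1 = 5
disp(sum(w.*f(x)));                   // exact: 2+2/3+2/5 = 3.0666667 (the odd power integrates to zero)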

8 The intg (Scilab) and quadgk (Matlab) functions


Scilab and Matlab provide state-of-the-art numerical integration functions. Both are based on a version of Gaussian quadrature called Gauss–Kronrod quadrature. In Scilab, I = ∫_a^b f(x) dx is estimated using
I = intg(a,b,f);

The default tolerance is very small. To specify the tolerance use


[I,err] = intg(a,b,f,tol);


Optional output variable err is an estimate of the absolute error. In Matlab the corresponding
function is
I = quadgk(f,a,b);

To use intg in the console you first need to define a function f(x). For example
-->deff('y=f(x)','y=exp(-x)*sin(%pi*x)');
-->I = intg(0,3,f)
I =
0.3034152
Example 3. I = ∫_0^3 e^(−x) sin(πx) dx was estimated in Scilab using

deff('y=f(x)','y=exp(-x)*sin(%pi*x)');
I = intg(0,3,f,1e-6)
I =
0.3034152

Only 21 function calls were required to produce a result with error of 0, i.e.,
the exact result and the intg result were equal to within double precision
accuracy.
The integrate function conveniently allows you to skip the deff statement, as it accepts the function and variable of integration as string arguments
-->I = integrate('exp(-x)*sin(%pi*x)','x',0,3)
I =
0.3034152

9 Improper integrals
An integral is improper if the integrand has a singularity within the integration interval. For example

    ∫_0^1 (sin x / x) dx    (42)

Here the integrand is undefined at x = 0 where it has a 0/0 form. In fact lim_{x→0} sin x / x = 1, so one could define away the problem with

    f(x) = { sin(x)/x   x ≠ 0
           { 1          x = 0    (43)

On the other hand, the integrand of

    ∫_0^1 dx/√(1−x^2) = π/2    (44)

becomes infinite as x → 1. In either case a solution would be an integration technique that avoids evaluating the function at the endpoints. The midpoint rule is a simple example of this type of so-called open integration formula. The Gauss–Kronrod quadrature method used by intg and quadgk is also an open formula and will work for functions with singularities at one or both endpoints. For example
-->integrate('sin(x)/x','x',0,1)
ans =
0.9460831
-->integrate('1/sqrt(1-x^2)','x',0,1)
ans =
1.5707963

For a singularity at x = c, a < c < b, we can break the integral into two

    ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx
Another type of improper integral is one with an infinite limit of integration, such as

    ∫_0^∞ x^3 e^(−x) dx = 6    (45)

One way to treat an integral of this type is by using a change of variable such as

    x = −ln(1−u)    (46)

For u = 0, x = 0 and as u → 1, x → ∞. The differential is

    dx = du/(1−u)    (47)

Conveniently e^(−x) = 1−u, so that e^(−x) dx = du, and the integral becomes

    ∫_0^1 (−ln(1−u))^3 du    (48)

This is also improper because as u → 1, x = −ln(1−u) → ∞, but an open integration formula can evaluate it
-->integrate('(-log(1-u))^3','u',0,1)
ans =
6.

10 References
1. http://en.wikipedia.org/wiki/Numerical_integration


11 Appendix Scilab code


11.1 Trapezoid rule

//////////////////////////////////////////////////////////////////////
// intTrapezoid.sci
// 2014-11-15, Scott Hudson, for pedagogic purposes
// Trapezoid rule estimation of integral of f(x) from a to b
// Estimated error is <= tol
//////////////////////////////////////////////////////////////////////
function I=intTrapezoid(f, a, b, tol)
    n = 1;
    h = (b-a);
    I = (h/2)*(f(a)+f(b));
    converged = 0;
    while (~converged)
        Iold = I;
        n = 2*n;
        h = (b-a)/n;
        I = 0;
        for i=1:2:n-1 //i=1,3,5,... odd values
            I = I+f(a+i*h);
        end
        I = 0.5*Iold+h*I;
        if (abs(I-Iold)<=tol)
            converged = 1;
        end
    end
endfunction


11.2 Romberg integration



//////////////////////////////////////////////////////////////////////
// intRomberg.sci
// 2014-11-15, Scott Hudson, for pedagogic purposes
// Romberg integration of f(x) from a to b
// Estimated error is <= tol
//////////////////////////////////////////////////////////////////////
function I=intRomberg(f, a, b, tol)
    n = 1;
    h = (b-a);
    I = (h/2)*(f(a)+f(b));
    j = 1;
    R(j,j) = I;
    converged = 0;
    while (~converged)
        Iold = I;
        n = 2*n;
        h = (b-a)/n;
        I = 0;
        for i=1:2:n-1 //i=1,3,5,... odd values
            I = I+f(a+i*h);
        end
        I = 0.5*Iold+h*I;
        j = j+1;
        R(j,1) = I;
        for k=2:j
            w = 4^(k-1);
            R(j,k) = (w*R(j,k-1)-R(j-1,k-1))/(w-1);
        end
        if (abs(R(j,j)-R(j-1,j-1))<=tol)
            converged = 1;
        end
    end
    I = R(j,j);
endfunction


Lecture 23
Random numbers
1 Introduction
1.1 Motivation
Scientifically an event is considered random if it is unpredictable. Classic examples are a coin
flip, a die roll and a lottery ticket drawing. For a coin flip we can associate 1 with heads and 0
with tails, and a sequence of coin flips then produces a random sequence of binary digits.
These can be taken to describe a random integer. For example, 1001011 can be interpreted as the
binary number corresponding to
    1(2^6) + 0(2^5) + 0(2^4) + 1(2^3) + 0(2^2) + 1(2^1) + 1(2^0) = 64 + 8 + 2 + 1 = 75    (1)

For a lottery drawing we can label n tickets with the numbers 0, 1, 2, …, n−1. A drawing then produces a random integer i with 0 ≤ i ≤ n−1. In these and other ways random events can be associated with corresponding random numbers.
Conversely, if we are able to generate random numbers we can use those to represent random events or phenomena, and this can be very useful in engineering analysis and design. Consider the
design of a bridge structure. We have no direct control over what or how many vehicles drive
across the bridge or what environmental conditions it is subject to. A good way to uncover
unforeseen problems with a design is to simulate it being subject to a large number of random
conditions, and this is one of the many motivations for developing random number generators.
How can we be sure our design will function properly? By the way, if you doubt this is a real
problem, read up on the Tacoma Narrows bridge which failed spectacularly due to wind forces
alone [1].

1.2 Truly random numbers


In classical physics, nothing is fundamentally random. In principle the present state of the universe combined with physical laws exactly determines the future state of the universe.
Classical physics is strictly deterministic. In principle if we knew the positions and velocities of a
pair of dice thrown in the air, their physical properties and the properties of the surface on which
they will land, we could predict what numbers they will show by solving the equations of motion.
However, in practice the behavior of the dice-plus-table system is so complicated that even
extremely small changes in the initial conditions or the properties of the dice will cause a
significant change in their final state. There are many chaotic systems that display this so-called butterfly effect. For practical purposes these are random events, even in a deterministic universe.
According to quantum mechanics there are truly random phenomena at the atomic scale. For
example, it is not possible, even in principle, to predict when a radioactive nucleus will decay.
We can only make statements about the probability of its doing so during a certain time interval.
For this reason quantum effects are arguably the ultimate way to generate truly random
numbers, and the fields of quantum random number generation and quantum cryptography are
areas of active research and development.

In the past, sequences of random numbers, generated by some random physical process, were published as books. In fact the book A Million Random Digits with 100,000 Normal Deviates, produced by the Rand Corporation in 1955, is still in print (628 pages of "can't put it down" reading!) as of 2014. It is still the case that if a sequence of truly random numbers is desired they must be obtained from some physical random process. One can buy plug-in cards that sample voltages generated by thermal noise in an electronic circuit (such as described at onerng.info), and the website random.org generates random numbers by sampling radio-frequency noise in the atmosphere.

1.3 Pseudo-random numbers


A properly functioning computer is a strictly deterministic system. Given the same data and instructions it will produce the same output every time. Therefore it is impossible for a computer to generate truly random numbers. Instead we have to be content with the generation of pseudo-random numbers. A computer program that generates a sequence of such numbers is called a pseudo-random number generator (PRNG).
A sequence of numbers is pseudo-random if it "looks like" a random sequence. More rigorously, there are various statistical tests available to quantify how "random looking" the output of a PRNG is. Until recently the so-called diehard tests [2] were a common software tool for this purpose. More recently the National Institute of Standards and Technology has released a random-number toolkit [3].
Consider the following sequence of numbers
0 1 2 3 4 5 6 7

There appears to be nothing random about this sequence; it follows an obvious pattern of
incrementing by one. Of course it's possible that a random sequence of numbers just happened to
form this pattern by chance, but intuitively that's not very likely. On the other hand the sequence
1 6 7 4 5 2 3 0

which contains the same eight digits, does look somewhat random, although it doesn't take long to notice an add-one and subtract-three pattern in the last seven digits. In fact it was generated by an algorithm which is every bit as deterministic as "increment by one," and to which we now turn.

2 Linear congruential generators


The expression

    y = x (mod m)    (2)

read "y equals x modulo m", means that y is the remainder when x is divided by m. For example

    3 = 11 (mod 4)    (3)

because

    11/4 = (8 + 3)/4 = 2 remainder 3    (4)

In Scilab/Matlab we can calculate this using the command


y = x-int(x/m)*m


which subtracts off an integer number times m from x, leaving the remainder. More directly
-->modulo(11,4) //Scilab
ans =
3.
>> mod(11,4) %Matlab
ans =
3

Another way to think of this is that x (mod m) is the least-significant digit of x expressed in base m. For example 127 (mod 10) = 7, and 13 (mod 8) = 5 because 13 = 1·8^1 + 5·8^0 = 15 in base 8. If x (mod m) = y (mod m) we say x is congruent to y modulo m and we write

    x ≡ y (mod m)    (5)

A linear congruential generator (LCG) is a simple method for generating a permutation of the integers 0 ≤ i ≤ m−1 using modular arithmetic. Starting with a seed value x_0, with 0 ≤ x_0 < m, an LCG generates a sequence x_1, x_2, …, x_m with

    x_(n+1) = (a x_n + c) (mod m)    (6)

Provided the constants a and c are properly chosen this will be a permutation of the integers 0 ≤ i ≤ m−1. If m is a power of 2, then c must be odd and a must be one more than a multiple of 4. For example, if m = 2^3 = 8, a = 5, c = 1 and x_0 = 0, then x_1, x_2, …, x_8 is the permutation
1 6 7 4 5 2 3 0

because
    (5·0+1) mod 8 = 1,  (5·1+1) mod 8 = 6,  (5·6+1) mod 8 = 31 mod 8 = 7,
    (5·7+1) mod 8 = 36 mod 8 = 4,  (5·4+1) mod 8 = 21 mod 8 = 5,
    (5·5+1) mod 8 = 26 mod 8 = 2,  (5·2+1) mod 8 = 11 mod 8 = 3,
    and (5·3+1) mod 8 = 16 mod 8 = 0
Since x_8 = x_0 = 0 the permutation will then repeat with x_9 = x_1 = 1 and so on. If we used a seed value of x_0 = 4 then we would get the sequence
5 2 3 0 1 6 7 4

which is the same sequence starting from a different digit. A LCG with given a and c values will
always produce the same sequence, and using a different seed value will simply start us off at a
different location in the sequence.
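A few Scilab lines reproduce the m = 8 example (a sketch; the full 32-bit generator is in the Appendix):

m = 8; a = 5; c = 1;
x = 0;                       // seed x0 = 0
for n = 1:8
    x = modulo(a*x+c, m);    // x(n+1) = (a*x(n)+c) mod m, equation (6)
    mprintf("%d ", x);
end
// prints: 1 6 7 4 5 2 3 0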
A repeating sequence of eight numbers is probably not of much use, but in practice we use a large value of m, quite commonly m = 2^32, corresponding to 32-bit unsigned integers, which are conveniently implemented in digital hardware. Fig. 1 shows plots of sequences generated with m = 2^10 = 1024 and parameters a = 5, c = 1 (left) and a = 9, c = 1 (right). Ten samples from the latter sequence


Fig. 1: Output of linear congruential generator (x_n vertical vs. n horizontal) with m = 1024, starting with seed value x_0 = 0. (left) a=5, c=1; (right) a=9, c=1. Repeated patterns are a strong indication that these are not truly random sequences.
532 693 94 847 456 9 82 739 508 477

don't display an immediately obvious sequential relationship. In Fig. 1 we plot both sequences in
full. In the left graph some of the samples form a very noticeable diagonal line pattern which is a
strong indication that the data are almost certainly not truly random. The right plot lacks such a
glaring red flag. However, on closer inspection one can see repeated patterns throughout the
image (indicated by polygons). It is extremely unlikely that a truly random sequence of numbers
would exhibit such a pattern, and a LCG is unacceptable for a demanding PRNG application
such as cryptography.
In engineering applications we usually want random real numbers x instead of random integers i.
We can easily generate real numbers 0 ≤ x < 1 from integers 0 ≤ i ≤ m−1 by calculating

    x = i/m

Of course there will only be a discrete set of m such real numbers, with spacing 1/m, but if m is large enough this can provide a good approximation to a real random number generator. In many programming languages (including Scilab) the command

x = rand();

generates a random real number x with 0 ≤ x < 1 using this method.


LCGs are simple to implement and for many years were the standard PRNGs in many
computer programming languages. For simple engineering design and analysis tasks they are
often good enough. A Scilab LCG implementation appears in the appendix. Note that this
makes explicit use of 32-bit unsigned integers. Arithmetic performed with this data type is
inherently modulo 232 .


Fig. 2: 100,000 samples of the multiply-with-carry PRNG.

3 More advanced pseudo-random number generators

Various methods have been developed to try and improve on the LCG method. The multiply-with-carry method, roughly speaking, generates two LCG sequences with some bit masking and bit shifting and combines these to produce a pseudo-random output. A Scilab implementation is given in the Appendix and a plot of 100,000 samples is shown in Fig. 2.
Currently one of the most commonly used improved algorithms is called the Mersenne twister [5], and both Scilab and Matlab have implementations of it. This algorithm is fairly complex. Instead of one or two seed integers, its initial state consists of an array of 625 integers.
Cryptography is arguably the field with the most demanding PRNG requirements. Not surprisingly, some of the best PRNG algorithms make use of encryption techniques, such as the Advanced Encryption Standard.

4 The rand function (Scilab/Matlab)


The function call
x = rand();

in both Scilab and Matlab generates a random real number 0 ≤ x < 1. Subsequent calls return the next number in the pseudo-random sequence. To generate an array with dimensions m×n use
x = rand(m,n);

On start up both Scilab and Matlab initialize the PRNG to a specific state. Therefore you will
always get the same sequence of numbers. In my current version of Scilab, on start up I will
always obtain the initial result
-->rand(3,1)
ans =
0.2113249

0.7560439
0.0002211

while in my current version of Matlab, on start up I will always obtain the initial result
>> rand(3,1)
ans =
0.8147
0.9058
0.1270

To start off at a different place in the sequence you can seed the PRNG as follows
-->rand('seed',i0); //Scilab
>> rand('twister',i0); %Matlab

where i_0 is an integer in the range 0 ≤ i_0 < 2^32. It can actually be useful to generate the same random number sequence on separate occasions because it allows interesting simulation results to be repeated. However, sometimes you want pseudo-random numbers that are different each time you open and run a program. One way to get a unique seed each time you run a program is to generate it from the system clock. Recommended ways to do this are
-->rand('seed',getdate('s')); //Scilab
>> rand('twister',sum(100*clock)); %Matlab

4.1 The grand function (Scilab)


The Scilab rand() function is based on a LCG while the Matlab version uses the superior
Mersenne twister algorithm. In Scilab the twister algorithm is available in the grand()
function. This is used as follows
-->grand(3,1,'def')
ans =
0.8147237
0.135477
0.9057919

to produce a 3-by-1 array. The 'def' string indicates that you want the returned values to be from the default distribution, which is uniform over 0 ≤ x < 1. To seed this PRNG use the command
-->grand('setsd',i0);

As with the rand() function you can use the system clock as an ever-changing seed
-->grand('setsd',getdate('s'));

5 The probability density function and histograms


We say that the probability density function (pdf) of a random variable x is f_x(x) if the probability that x will take on a value x_1 ≤ x ≤ x_2 is

    Prob{x_1 ≤ x ≤ x_2} = ∫_(x_1)^(x_2) f_x(x) dx

So far we have used the rand() and grand() functions to produce x values uniformly distributed over 0 ≤ x < 1. The ideal uniform pdf is

    f_x(x) = { 1   0 ≤ x < 1
             { 0   otherwise    (7)

Let's test this by generating a large number of random values and then plotting a histogram of the data. To generate a histogram we first divide the x interval of interest into a number of subintervals or bins. Let's take our bins to be 0 ≤ x < 0.05, 0.05 ≤ x < 0.10, 0.10 ≤ x < 0.15 and so on up to 0.95 ≤ x < 1.00. We then count how many x samples fall in each bin. This number divided by the total number of samples is an estimate of ∫ f_x(x) dx over the bin. Dividing by the width of the bin (0.05 in our case) we get an estimate of the average value of f_x(x) for that bin.
Generating a histogram plot in Scilab is as simple as
histplot(nbins,x);

where nbins is the number of equal-sized bins we want in the interval [x_min, x_max]. As the number of samples increases a histogram should give a progressively better estimate of the underlying pdf of the random (or pseudo-random) process. Running the commands
x = grand(1e3,1,'def');
histplot(20,x);

and
x = grand(1e6,1,'def');
histplot(20,x);

produced the results shown in Fig. 3. We see that the pdf does approach the ideal uniform

Fig. 3: Histograms of x = grand(1e3,1,'def') and x = grand(1e6,1,'def').


distribution as the number of samples gets large enough.

6 Need for random variables with non-uniform pdf


The uniform distribution (7) is only one possible pdf. We often need to generate random numbers
with some specific pdf in order to simulate a certain physical process. A few examples are:

6.1 Normal distribution


This is the classic bell curve (also called the Gaussian distribution)

    f_y(y) = (1/(σ√(2π))) e^(−(1/2)((y−μ)/σ)^2),  −∞ < y < ∞

The normal distribution is specified by two parameters: μ is the mean value (average) of y and σ is the standard deviation. The Central Limit Theorem tells us that any process that is the sum or average of a large number of independent, identically distributed processes will be normally distributed. Since so many natural phenomena have this property, the normal distribution finds wide application. In particular, noise in measurements is often assumed to be normally distributed.

6.2 Exponential distribution


The power received by a cell phone (or other wireless device) in a very cluttered environment, where the field is scattered many times, is described by the exponential distribution

    f_P(P) = (1/P_av) e^(−P/P_av),  P ≥ 0    (8)

Here P_av is the average received power. To simulate a wireless communication channel we need to be able to generate exponentially distributed random values to model fading effects.

6.3 Maxwell-Boltzmann distribution


In a gas in thermal equilibrium at temperature T, the probability that a molecule will have velocity v is given by the Maxwell-Boltzmann distribution

    f_v(v) = (4/(√π v_p)) (v/v_p)^2 e^(−(v/v_p)^2),  v ≥ 0    (9)

where

    v_p = √(2kT/m)    (10)

is the most likely velocity (the peak of the pdf). Here m is the molecular mass and k is
Boltzmann's constant. We need to generate samples from this distribution if we wish to perform
molecular dynamics simulations.

7 Generating random variables with an arbitrary pdf


Suppose x is a uniformly distributed random variable of the type we discussed above with pdf
given by (7). Now define a random variable y as

    y = g(x)    (11)

with

    a = g(0),  b = g(1)    (12)

Fig. 4: An x interval Δx maps to a y interval Δy, with f_x(x) Δx ≈ f_y(y) Δy.
This is illustrated in Fig. 4. We want to find the pdf of y over the interval a ≤ y ≤ b. An x interval of width Δx will correspond to a y interval of width Δy, and y will fall in the y interval if and only if x falls in the x interval. Therefore we can equate the probabilities

    f_x(x) Δx ≈ f_y(y) Δy    (13)
Since f_x(x) = 1 we have the differential relation

    dx = f_y(y) dy    (14)

Integrating this from 0 to x on the left and correspondingly from a to y on the right, we have

    x = ∫_0^x dx = ∫_a^y f_y(y) dy = F_y(y)

where F_y(y) is called the cumulative distribution function (cdf) of the random variable y. Inverting this relation

    x = F_y(y)    (15)

we get y = g(x).
For example, the exponential distribution (8) has cdf

    F_y(y) = (1/P_av) ∫_0^y e^(−P/P_av) dP = 1 − e^(−y/P_av)    (16)

Solving x = 1 − e^(−y/P_av) gives us

    y = −P_av ln(1−x)    (17)

Fig. 5: Histograms of exponentially distributed random values generated by y = −4 ln(1−x) and the ideal pdf.

In Fig. 5 we show histograms of y = −4 ln(1−x) where x is a uniform random variable. Given enough sample points the histogram approximates the ideal pdf very well.
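A sketch of this inverse-cdf method in Scilab, with P_av = 4 as in Fig. 5:

Pav = 4;
x = rand(100000,1);        // uniform samples on [0,1)
y = -Pav*log(1-x);         // exponentially distributed, via (17)
histplot(50,y);            // histogram should follow (1/Pav)*exp(-y/Pav)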
Unfortunately, for some pdfs it is not possible to calculate the cdf in closed form. Arguably the most important example is the normal distribution. The integral

    F_y(y) = (1/(σ√(2π))) ∫_(−∞)^y e^(−(1/2)((y−μ)/σ)^2) dy    (18)
cannot be evaluated in terms of elementary functions. However, it is possible to transform two


independent, uniformly distributed random variables x 1 , x 2 into two independent normally
distributed random variables y 1 , y 2 using the Box-Muller transform. We first calculate the polar
coordinates
r = 2 ln x1 ,=2 x 2
and then y 1 , y 2 are the rectangular coordinates
y 1=r cos , y 2=r sin
For the values x 1 , x 2 we can use two sequential values from a PRNG. Fig. 6 shows histograms
obtained by applying the Box-Muller transform to 103 (left) and 106 (right) values of the
rand() function and compares these with the ideal normal distribution. With enough samples
the agreement is excellent.
The Box-Muller transform produces y values with μ = 0, σ = 1. To change these we perform an additional transform

    z = σ y + μ    (19)

Then z has mean μ and standard deviation σ.
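A minimal Box-Muller sketch in Scilab (my own code, not the notes'):

n = 100000;
x1 = rand(n,1);
x2 = rand(n,1);            // two independent uniform sequences
r = sqrt(-2*log(x1));      // polar coordinates
theta = 2*%pi*x2;
y1 = r.*cos(theta);        // two independent normal sequences with mu=0, sigma=1
y2 = r.*sin(theta);
z = 2+0.5*y1;              // example of (19): mean mu=2, standard deviation sigma=0.5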



Fig. 6: Histograms of zero-mean, unit-variance, normally distributed random values generated by the
Box-Muller transform and ideal pdf.

In Matlab the randn() function is similar to the rand() function but generates normal random values with μ = 0, σ = 1. In Scilab the grand() function can generate arrays of normal random variables with specified μ, σ when called as follows
Y = grand(m, n, 'nor', Av, Sd);

For example
-->grand(2,3,'nor',2,1)
ans =
    1.5434479    3.5310142    1.172681
    0.7914453    1.459521     3.5340618

8 References
1. Tacoma Narrows bridge failure: https://www.youtube.com/watch?v=j-zczJXSxnw
2. http://en.wikipedia.org/wiki/Diehard_tests
3. http://csrc.nist.gov/groups/ST/toolkit/random_number.html
4. http://en.wikipedia.org/wiki/Random_number_generation
5. http://en.wikipedia.org/wiki/Mersenne_twister
6. http://en.wikipedia.org/wiki/List_of_probability_distributions


9 Appendix Scilab code


9.1 Linear congruential generator

//////////////////////////////////////////////////////////////////////
// randLCG.sci
// 2014-12-08, Scott Hudson, for pedagogic purposes
// 32-bit linear congruential generator for generating uniformly
// distributed real numbers 0<=x<1.
//////////////////////////////////////////////////////////////////////
global randLCGseed randLCGa randLCGc
randLCGseed = uint32(0);
function x=randLCG(seed)
    global randLCGseed randLCGa randLCGc
    [nargout,nargin] = argn();
    if (nargin==1) //if there is an argument, use it as the seed
        randLCGseed = uint32(seed);
    end
    randLCGseed = uint32(1664525)*randLCGseed+uint32(1013904223);
    x = double(randLCGseed)/4294967296.0;
endfunction

9.2 Multiply-with-carry method



//////////////////////////////////////////////////////////////////////
// randMWC.sci
// 2014-12-08, Scott Hudson, for pedagogic purposes
// Multiply-with-carry algorithm for pseudo-random number generation.
// Returns uniformly distributed real number 0<x<1.
// Reference: http://en.wikipedia.org/wiki/Random_number_generation
//////////////////////////////////////////////////////////////////////
global randMWCs1 randMWCs2
randMWCs1 = uint32(1);
randMWCs2 = uint32(2);
function x=randMWC(seed1,seed2)
    global randMWCs1 randMWCs2
    [nargout,nargin] = argn();
    if (nargin==2) //if there are arguments, use as seeds
        randMWCs1 = uint32(seed1); //should not be zero!
        randMWCs2 = uint32(seed2); //should not be zero!
    end
    s = uint32(2^16);
    randMWCs1 = 36969*modulo(randMWCs1,s)+randMWCs1/s;
    randMWCs2 = 18000*modulo(randMWCs2,s)+randMWCs2/s;
    x = double(randMWCs1*s+randMWCs2)/4294967296.0;
endfunction
