Our text jumps right into finite element methods in Chapter 2, but I prefer to step back a bit first
and try to provide a framework in which we can place the formulations and techniques we'll use.
All of the problems we will solve have a common thread: they are linear in a sense that we will
make precise shortly. The problems of linear algebra are the prototypes of all linear problems, and
therefore it is important that we review some of the basic concepts of this subject so that we can
see the parallels in more complex situations. An additional benefit of this review is that we will
ultimately reduce all our problems to linear algebraic ones so it is helpful to recall some basic
results from this subject.
A basic notion in linear algebra is that of a vector space. The definition of a vector space is a
formalization of the properties of three dimensional vectors with which we are all familiar. Let $F$
denote the set of real numbers $\mathbb{R}$ or complex numbers $\mathbb{C}$, and call the elements of $F$ scalars. We
say a set $V$ with elements $\vec{x}, \vec{y}, \vec{z}, \ldots$ called vectors forms a vector space over $F$ provided:
1. For each element $\vec{x} \in V$ and each scalar $\alpha \in F$ there is defined a vector $\alpha \cdot \vec{x} \in V$ called the
product of $\alpha$ and $\vec{x}$ (i.e., there is a function, say $p : F \times V \to V$, whose value $p(\alpha, \vec{x})$ at a
particular pair is denoted by $\alpha \cdot \vec{x}$; usually we omit the $\cdot$ and simply write $\alpha\vec{x}$).
2. For each pair of vectors $\vec{x}, \vec{y}$ in $V$ there is defined a vector $\vec{x} + \vec{y}$ called the sum of $\vec{x}$ and $\vec{y}$
(i.e., there is a function $s : V \times V \to V$ whose value $s(\vec{x}, \vec{y})$ at a pair $\vec{x}, \vec{y}$ of elements in $V$ is
denoted by $\vec{x} + \vec{y}$).
3. The functions $p$ and $s$ satisfy the conditions:
(a) $\vec{x} + \vec{y} = \vec{y} + \vec{x}$.
(b) $\vec{x} + (\vec{y} + \vec{z}) = (\vec{x} + \vec{y}) + \vec{z}$.
(c) There is a unique vector $\vec{0}$ such that $\vec{x} + \vec{0} = \vec{x}$ for every $\vec{x}$. We usually simply write $0$
instead of $\vec{0}$.
(d) For every vector $\vec{x}$ there is a vector $-\vec{x}$ such that $\vec{x} + (-\vec{x}) = 0$.
(e) $\alpha(\vec{x} + \vec{y}) = \alpha\vec{x} + \alpha\vec{y}$.
(f) $(\alpha + \beta)\vec{x} = \alpha\vec{x} + \beta\vec{x}$.
(g) $\alpha(\beta\vec{x}) = (\alpha\beta)\vec{x}$.
(h) $1 \cdot \vec{x} = \vec{x}$.
(i) $0 \cdot \vec{x} = 0$.
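A number of elementary properties follow immediately from these definitions, e.g., the cancellation
law $\vec{x} + \vec{y} = \vec{x} + \vec{z} \Rightarrow \vec{y} = \vec{z}$ (proof: $\vec{y} = \vec{y} + 0 = \vec{y} + \vec{x} + (-\vec{x}) = \vec{z} + \vec{x} + (-\vec{x}) = \vec{z} + 0 = \vec{z}$) and
$-\vec{x} = (-1) \cdot \vec{x}$ (proof: $0 = 0 \cdot \vec{x} = (1 - 1)\vec{x} = \vec{x} + (-1) \cdot \vec{x}$, but by (d) $0 = \vec{x} + (-\vec{x})$, so $-\vec{x} = (-1) \cdot \vec{x}$
by the cancellation law).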
All these conditions are obviously satisfied in the two most important cases of vector space we will
deal with:
1. $n$ dimensional Euclidean space $\mathbb{R}^n = \{\vec{x} : \vec{x} = (x_1, x_2, \ldots, x_n)\}$ with addition defined componentwise, $\vec{x} + \vec{y} = (x_1 + y_1, \ldots, x_n + y_n)$, and scalar multiplication defined by $\alpha\vec{x} = (\alpha x_1, \ldots, \alpha x_n)$.
For $n = 3$ this is the space of ordinary three dimensional vectors. It is useful in most cases to
think of the components of a vector $\vec{x}$ as arranged in a column, i.e.
$$\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
2. Real or complex function spaces. As a familiar example, let $[a, b]$ denote a closed interval on
the real axis, and let $C([a, b])$ denote the set of all real valued continuous functions defined on
$[a, b]$. Letting $f, g$ be elements of $C([a, b])$ and $x \in [a, b]$ we define the sum of $f$ and $g$ pointwise
by $(f + g)(x) = f(x) + g(x)$ and if $c$ is a real number we define scalar multiplication pointwise
by $(cf)(x) = c f(x)$. It is clear that these definitions give a vector space. By the same reasoning
the sets $C^k([a, b])$ of all functions defined, continuous, and having continuous derivatives up to
and including order $k$ all form vector spaces (all strictly contained in $C([a, b])$ for $k \ge 1$).
Vector spaces are the setting for our definition of a linear function. Let $V, W$ be vector spaces
and $f : V \to W$ be a function defined on $V$ and giving values in $W$, i.e., if $\vec{x} \in V$ then $f(\vec{x})$ is defined
and is a vector in $W$. $f$ is said to be a linear function if $f(\alpha\vec{x} + \beta\vec{y}) = \alpha f(\vec{x}) + \beta f(\vec{y})$ for all
vectors $\vec{x}, \vec{y} \in V$ and all scalars $\alpha, \beta$. The set $V$ is
called the domain of $f$, and the set $W$ the codomain. The set of vectors in $W$ that are images of
vectors in $V$ under $f$, i.e., $f(V) = \{\vec{y} \in W : f(\vec{x}) = \vec{y} \text{ for some } \vec{x} \in V\}$, is called the range of $f$ ($\mathrm{range}(f) = f(V)$).
For example, suppose $a \le x_1 < x_2 \le b$ are two fixed points in $[a, b]$. Define $A : C([a, b]) \to \mathbb{R}^2$ by
$A(f) = (f(x_1), f(x_2))$ and we have a function from the vector space $C([a, b])$ to the vector space $\mathbb{R}^2$.
This function is linear since $A(\alpha f + \beta g) = (\alpha f(x_1) + \beta g(x_1), \alpha f(x_2) + \beta g(x_2)) = \alpha A(f) + \beta A(g)$.
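The linearity of the point-evaluation map can be checked numerically. The following is only a minimal sketch: the helper name `evaluate_at`, the sample points, the test functions, and the scalars are all choices made here for illustration, not part of the notes.

```python
import numpy as np

def evaluate_at(f, points):
    """Point-evaluation map A(f) = (f(x1), f(x2)), returned as a vector in R^2."""
    return np.array([f(p) for p in points])

# Hypothetical sample data: two points in [a, b] = [0, 1] and two continuous functions.
points = (0.25, 0.75)
f, g = np.sin, np.exp
alpha, beta = 2.0, -3.0

# A applied to the linear combination alpha*f + beta*g ...
lhs = evaluate_at(lambda x: alpha * f(x) + beta * g(x), points)
# ... compared with the same linear combination of A(f) and A(g).
rhs = alpha * evaluate_at(f, points) + beta * evaluate_at(g, points)

print(np.allclose(lhs, rhs))
```

A numerical check like this illustrates the identity for one choice of functions and scalars; it is not a proof.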
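As another example, consider the problem of fitting a straight line to a pair of data vectors of length $n$:
we want to find $a, b$ such that $y_i = a x_i + b$, $i = 1, \ldots, n$. We can write this as $A\vec{v} = \vec{y}$, where
$$A = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad \vec{v} = \begin{pmatrix} b \\ a \end{pmatrix}, \quad \text{and} \quad \vec{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$
and we see that $A$ is a linear map from $\mathbb{R}^2$ to $\mathbb{R}^n$. If we write $A\vec{v}$ in the form
$$A\vec{v} = b \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + a \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
we see that the range of $A$ consists of the linear combinations of just two vectors in $\mathbb{R}^n$, and so we can't hope to
have exact equality of $A\vec{v}$ and $\vec{y}$ except in very special cases: for $n > 2$ the range of $A$ is smaller than $\mathbb{R}^n$.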
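To make the point about the range concrete, here is a small NumPy sketch. The data values are invented for illustration. It builds the $n \times 2$ matrix with columns $(1, \ldots, 1)^T$ and $(x_1, \ldots, x_n)^T$, confirms that its rank is 2, and shows that perturbing one data value takes $\vec{y}$ out of the range.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Columns of A are the vector of ones and the vector of abscissas.
A = np.column_stack([np.ones_like(x), x])
rank_A = np.linalg.matrix_rank(A)

# With v = (b, a) = (1, 2) the system A v = y holds exactly (a "very special case").
v = np.array([1.0, 2.0])
exact_fit = np.allclose(A @ v, y)

# Perturb one data value: y_off is no longer a combination of the two columns,
# which shows up as the augmented matrix [A | y_off] having rank 3.
y_off = y.copy()
y_off[0] += 1.0
rank_augmented = np.linalg.matrix_rank(np.column_stack([A, y_off]))

print(rank_A, exact_fit, rank_augmented)
```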
Our two prime examples of vector spaces have vastly different sizes. The size of a vector space $V$
is measured by its dimension $\dim(V)$, which is either a positive integer or $\infty$. The Euclidean space
$\mathbb{R}^n$ has dimension $n$ whereas the space of continuous functions on an interval $[a, b]$ has dimension $\infty$.
To get at a definition of the number $\dim(V)$ we need to introduce the notion of a system of basis
vectors of $V$. This requires several useful ideas.
otherwise include them with 0 coefficients. Then subtracting
we have $\sum_j (c_j - c'_j)\vec{v}_j = 0 \Rightarrow c_j = c'_j$ for all $j$ since the vectors are linearly independent.)
It can be shown that if a vector space is finite dimensional, then every basis has the same number
of elements, and we call this number $\dim(V)$. In the case of $\mathbb{R}^n$ it is clear that the vectors $\vec{e}_j$ with
$j$th component 1 and all others 0 span $\mathbb{R}^n$ since $\vec{x} = \sum_j x_j \vec{e}_j$. It is also clear that the vectors $\vec{e}_j$
are linearly independent, $\sum_j c_j \vec{e}_j = 0 \Rightarrow c_j = 0$, $j = 1, \ldots, n$, so that $\dim(\mathbb{R}^n) = n$ according to our
definitions.
It can also be shown that given any linearly independent set of vectors $S$ it is always possible to find
a linearly independent spanning set $S_1 \supseteq S$, i.e., any linearly independent set may be enlarged to form a basis. Now the
set of functions $\{x^k : k = 0, 1, 2, \ldots\}$ is linearly independent in $C([a, b])$ (if $c_0 + c_1 x + \cdots + c_n x^n = 0$ on $[a, b]$ with some $c_j \ne 0$,
we would have a nonzero polynomial of degree at most $n$ with more than $n$ roots, which is impossible). Since this set
is not finite, $\dim(C([a, b])) = \infty$. This space has many finite dimensional subspaces, e.g., the set of
all polynomials of degree $\le n$ is a subspace of dimension $n + 1$ (spanned by $\{1, x, x^2, \ldots, x^n\}$).
A major concern of linear algebra is the study of linear maps between finite dimensional linear
spaces. We've already seen an example of this in the data fitting problem, where the linear map
was a matrix. We'll now see that any such linear map can be thought of as a matrix. Let $A : V \to W$ be
a linear map between a vector space $V$ ($\dim(V) = n$, $\vec{v}_j$, $j = 1, \ldots, n$ a basis) and a vector space
$W$ ($\dim(W) = m$, $\vec{w}_k$, $k = 1, \ldots, m$ a basis). Since $\vec{v} = \sum_j x_j \vec{v}_j$, we can write the image
vector $\vec{w} = A(\vec{v}) = \sum_j x_j A(\vec{v}_j)$. But each of the $n$ vectors $A(\vec{v}_j) \in W$ can be expressed in terms of
the basis $\vec{w}_k$, i.e., we have
$$A(\vec{v}_j) = \sum_{s=1}^{m} a_{sj} \vec{w}_s,$$
where the $m \times n$ coefficients $a_{sj}$ are given numbers for a fixed pair of bases and a fixed operator $A$. Hence
$$\vec{w} = \sum_{j=1}^{n} x_j \sum_{s=1}^{m} a_{sj} \vec{w}_s = \sum_{s=1}^{m} \Big( \sum_{j=1}^{n} a_{sj} x_j \Big) \vec{w}_s.$$
Since $\vec{w} = \sum_{s=1}^{m} y_s \vec{w}_s$ with unique coefficients we have
$$y_s = \sum_{j=1}^{n} a_{sj} x_j, \quad s = 1, \ldots, m,$$
or in matrix form
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
where the $m \times n$ matrix $[a_{ij}]$ corresponds to the operator $A$. It is useful to observe that the $j$th
column of the matrix is the result of applying A to the jth basis vector. The correspondence between
a linear map and a matrix is one to many since each time we choose a system of basis vectors for V
and W we get, in general, a different matrix representing the map. On the other hand, linear maps
can now be thought of as simply matrices, objects we are all familiar with.
As an example, let's take $V = P_2$, the set of all polynomials of degree at most 2 (considered as real
valued functions on $\mathbb{R}$). This has the basis $\{e_0(x) = 1, e_1(x) = x, e_2(x) = x^2\}$ (note indices run from
0 to 2). We define on this set the operator $D : V \to W = P_1$ (with basis vectors $e_0, e_1$) by
$Dp(x) = dp(x)/dx$. We have $De_0 = 0$, $De_1 = e_0$, $De_2 = 2e_1$. Thus if $p = c_0 + c_1 x + c_2 x^2$ we have
$Dp = p' = c'_0 e_0 + c'_1 e_1$ or in matrix form
$$Dp = \begin{pmatrix} c'_0 \\ c'_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix}.$$
Note that $Dp = 0$ is satisfied not only by the zero vector (polynomial) $p \equiv 0$, but also by $p = c_0 e_0$.
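As a quick check of this matrix representation, the sketch below applies the $2 \times 3$ matrix of $D$ to the coefficient vector of a sample polynomial and compares the result with NumPy's own polynomial derivative. The polynomial $p = 5 - 3x + 4x^2$ is an arbitrary choice made here for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Matrix of D : P2 -> P1 in the monomial bases {1, x, x^2} and {1, x}.
# Column j holds the coefficients of D applied to the jth basis vector:
# D(1) = 0, D(x) = 1, D(x^2) = 2x.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

# Hypothetical sample polynomial p(x) = 5 - 3x + 4x^2, coefficients ordered c0, c1, c2.
c = np.array([5.0, -3.0, 4.0])

# Coefficients of p' obtained from the matrix-vector product.
dc = D @ c

# Reference: NumPy's derivative of a coefficient array (lowest degree first).
dc_reference = P.polyder(c)

print(dc, dc_reference)
```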
For any linear function $A$, we set $\ker(A) = \{\vec{v} \in V : A\vec{v} = 0\}$, and call this set the kernel or
null space of $A$. If $\ker(A) = \{0\}$, we say that the kernel is trivial. If $\ker(A)$ is trivial then the equation
$A\vec{v} = \vec{w}$ has at most one solution for a given $\vec{w} \in W$. (Suppose there were two, $\vec{v}_1, \vec{v}_2$; then
$A(\vec{v}_1 - \vec{v}_2) = \vec{w} - \vec{w} = 0 \Rightarrow \vec{v}_1 - \vec{v}_2 \in \ker(A) \Rightarrow \vec{v}_1 - \vec{v}_2 = 0$.) If $\ker(A)$ is nontrivial, then the solution
of $A\vec{v} = \vec{w}$ for a given $\vec{w} \in W$ is not unique: if $\vec{v}_1$ is one solution, then $\vec{v}_1 + \vec{v}_0$ is also a
solution for any $\vec{v}_0 \ne 0$ in $\ker(A)$.
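The non-uniqueness can be seen numerically with the derivative matrix from the polynomial example. This is a sketch under stated assumptions: computing a kernel basis from the singular value decomposition is one common numerical approach (not something the notes prescribe), and the tolerance `1e-12` and the sample coefficients are arbitrary choices.

```python
import numpy as np

# Matrix of the derivative operator D : P2 -> P1 in the monomial bases.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

# Rows of Vt beyond the numerical rank span the kernel of D.
_, singular_values, Vt = np.linalg.svd(D)
rank = int(np.sum(singular_values > 1e-12))
kernel_basis = Vt[rank:].T          # columns span ker(D)

# One solution of D c = w, and a second one shifted by a kernel vector.
c1 = np.array([5.0, -3.0, 4.0])
w = D @ c1
c2 = c1 + 7.0 * kernel_basis[:, 0]

print(kernel_basis.ravel())          # proportional to (1, 0, 0): the constants
print(np.allclose(D @ c2, w))        # both c1 and c2 solve D c = w
```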
Up to this point, we have not introduced two very important features of the Euclidean space $\mathbb{R}^n$,
and we consider these features now.
1. $\mathbb{R}^n$ has a notion of distance defined on it: namely, if $\vec{x} \in \mathbb{R}^n$ the length of $\vec{x}$ is given
by $\|\vec{x}\|_2 = \big(\sum_j x_j^2\big)^{1/2}$, and if $\vec{y}$ is another point the distance between $\vec{x}$ and $\vec{y}$ is defined by
$\|\vec{x} - \vec{y}\|_2$. The function $\|\cdot\|_2 : \mathbb{R}^n \to \mathbb{R}$ is called the Euclidean norm on $\mathbb{R}^n$.
Let $V$ be a vector space and assume there is a function $\|\cdot\| : V \to \mathbb{R}$. This function is called
a norm on $V$ if it satisfies the following three conditions
By the properties of the integral, it is clear that the first two requirements of an inner product are
satisfied, and for a continuous function $\int_a^b f^2(x)\,dx = 0$ implies that $f \equiv 0$, as you can easily check.
The general situation that we will deal with in determining approximate solutions to both ordinary
and partial differential equations is this: We will be seeking the solution of a linear differential
equation and will be able to pose our problem using linear space terminology. That is, the differential
equation will be viewed as a linear operator from a vector space $V$ to a vector space $W$, $A : V \to W$,
and for a given $\vec{f} \in W$ we have to find a vector $\vec{v} \in V$ such that $A\vec{v} = \vec{f}$. For example, consider the
boundary value problem
$$u'' = f, \quad 0 < x < 1, \qquad u(0) = u(1) = 0;$$
take $V = C_0^2([0, 1]) = \{u \in C^2([0, 1]) : u(0) = u(1) = 0\}$, $W = C([0, 1])$, and for $u \in V$ take
$Au(x) = u''(x)$. Then the equation and the boundary conditions reduce to finding the solution
of the operator equation $Au = f$.

Footnote 1: All norm conditions but the triangle inequality are obvious. Define the quadratic $Q(\lambda) = \langle \vec{x} - \lambda\vec{y}, \vec{x} - \lambda\vec{y} \rangle \ge 0$.
Expanding and using the properties of the inner product gives $Q = \langle \vec{x}, \vec{x} \rangle - 2\lambda \langle \vec{x}, \vec{y} \rangle + \lambda^2 \langle \vec{y}, \vec{y} \rangle$. The minimum value of
$Q$ occurs for $\lambda \langle \vec{y}, \vec{y} \rangle = \langle \vec{x}, \vec{y} \rangle$. Assuming $\vec{y} \ne 0$ this means $\langle \vec{x}, \vec{x} \rangle - \langle \vec{x}, \vec{y} \rangle^2 / \langle \vec{y}, \vec{y} \rangle = Q_{\min} \ge 0$. From this computation
we find the result $|\langle \vec{x}, \vec{y} \rangle| \le \|\vec{x}\| \|\vec{y}\|$, called the Cauchy inequality. The triangle inequality follows from this and the
computation $\|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + 2\langle \vec{x}, \vec{y} \rangle + \|\vec{y}\|^2 \le \|\vec{x}\|^2 + 2|\langle \vec{x}, \vec{y} \rangle| + \|\vec{y}\|^2$. But using the Cauchy inequality this last
sum is $\le \|\vec{x}\|^2 + 2\|\vec{x}\| \|\vec{y}\| + \|\vec{y}\|^2 = (\|\vec{x}\| + \|\vec{y}\|)^2$, and the triangle inequality is proved.
It is important to realize that a numerical approximation to a problem of solving the linear equation
$Au = f$ for a given $f$ where the domain space for $A$ has infinite dimension (like $C_0^2([0, 1])$) can
only yield a finite number of values, say $u_j$, $j = 1, \ldots, n$, which may approximate the values of the
solution $u$ at certain points in $[0, 1]$. For our purposes, it is best to view this constraint as having
to find an approximation to the solution $u$ in a finite dimensional subspace of $V$. If $V$ is an inner
product space we have the following important approximation theorem.
If $V$ is an inner product space, $\vec{v} \in V$, and $W$ a finite dimensional subspace of $V$, then the problem
of finding a $\vec{w} \in W$ such that $\|\vec{v} - \vec{w}\| = $ minimum has a unique solution, $\vec{w}^*$, i.e., $\|\vec{v} - \vec{w}^*\| = \min_{\vec{w} \in W} \|\vec{v} - \vec{w}\|$.
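I won't try to prove the existence of $\vec{w}^*$ rigorously. Instead I'll develop formulas for computing its
components (assuming a real vector space). Let $\vec{e}_j$, $j = 1, \ldots, n$ be a basis for $W$, and $\vec{w} = \sum_j x_j \vec{e}_j$
be an element of $W$. We have
$$d \equiv \|\vec{v} - \vec{w}\|^2 = \langle \vec{v} - \vec{w}, \vec{v} - \vec{w} \rangle = \|\vec{v}\|^2 - 2\langle \vec{v}, \vec{w} \rangle + \|\vec{w}\|^2$$
and introducing the components of $\vec{w}$
$$d = d(x_1, \ldots, x_n) = \|\vec{v}\|^2 - 2 \sum_{j=1}^{n} x_j \langle \vec{v}, \vec{e}_j \rangle + \sum_{i,j=1}^{n} x_i x_j \langle \vec{e}_i, \vec{e}_j \rangle.$$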
In order to minimize $d$ we set its partial derivatives with respect to $x_j$, $j = 1, \ldots, n$ equal to 0. This
gives
$$\sum_{j=1}^{n} \langle \vec{e}_i, \vec{e}_j \rangle x_j = \langle \vec{v}, \vec{e}_i \rangle, \quad i = 1, \ldots, n, \quad \text{or} \quad G\vec{x} = \vec{y}$$
with $\vec{x} = [x_1, \ldots, x_n]^T$, $\vec{y} = [\langle \vec{v}, \vec{e}_1 \rangle, \ldots, \langle \vec{v}, \vec{e}_n \rangle]^T$. These equations are called the Normal Equations.
The matrix $G = [\langle \vec{e}_i, \vec{e}_j \rangle]$ is called the Gram matrix of the basis vectors $\vec{e}_i$. It is relatively easy to show
that $G$ is nonsingular so that the normal equations always have a solution, $x_j = x_j^*$, $j = 1, \ldots, n$.
In particular, if the basis vectors are mutually perpendicular,
$$\langle \vec{e}_i, \vec{e}_j \rangle = \begin{cases} g_i > 0, & i = j \\ 0, & i \ne j \end{cases}$$
then the normal equations have the simple solution $x_i^* = \langle \vec{v}, \vec{e}_i \rangle / g_i$, $i = 1, \ldots, n$.
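Note the following important property of this solution: $\langle \vec{v} - \vec{w}^*, \vec{w} \rangle = 0$ for all $\vec{w} \in W$, i.e., the
difference vector $\vec{v} - \vec{w}^*$ is perpendicular to $W$. In fact, the normal equations can be written as
$\langle \vec{w}^*, \vec{e}_i \rangle = \langle \vec{v}, \vec{e}_i \rangle$, $i = 1, \ldots, n$, so that $\langle \vec{v} - \vec{w}^*, \vec{e}_i \rangle = 0$ for all $i$, and hence $\vec{v} - \vec{w}^* \perp W$.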
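The normal equations and the orthogonality of the residual can be illustrated in $\mathbb{R}^3$ with the Euclidean inner product. This is a minimal sketch: the function name `best_approximation`, the vector $\vec{v}$, and the (deliberately non-orthogonal) basis of the subspace are all assumptions made here for illustration.

```python
import numpy as np

def best_approximation(v, basis):
    """Solve the normal equations G x = y for the Euclidean inner product.

    Returns the coefficient vector x and the best approximation w = sum_j x_j e_j.
    """
    E = np.column_stack(basis)   # columns are the basis vectors e_j
    G = E.T @ E                  # Gram matrix, G[i, j] = <e_i, e_j>
    y = E.T @ v                  # right-hand side, y[i] = <v, e_i>
    x = np.linalg.solve(G, y)
    return x, E @ x

# Hypothetical data: W is the x1-x2 plane, described by a non-orthogonal basis.
v = np.array([1.0, 2.0, 3.0])
basis = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]

x_star, w_star = best_approximation(v, basis)

# The residual v - w_star should be perpendicular to every basis vector of W.
residual_products = [float(np.dot(v - w_star, e)) for e in basis]

print(x_star, w_star, residual_products)
```

Forming and solving the Gram system directly mirrors the derivation; for ill-conditioned bases a QR-based least-squares solver is usually preferred in practice.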
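The problem of fitting a straight line to a sequence of data pairs can be viewed as a best approximation problem. Recall that this problem could be expressed as finding $a, b$ such that
$$b \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + a \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = b\vec{e} + a\vec{x} \quad \text{approximates} \quad \vec{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
in the sense that $\|\vec{y} - (b\vec{e} + a\vec{x})\| = $ minimum. This is just the problem we are studying with $W$ the
two dimensional space spanned by $\vec{e}, \vec{x}$. In this case, the Gram matrix is 2 by 2
$$G = \begin{pmatrix} \langle \vec{e}, \vec{e} \rangle & \langle \vec{e}, \vec{x} \rangle \\ \langle \vec{e}, \vec{x} \rangle & \langle \vec{x}, \vec{x} \rangle \end{pmatrix} = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}$$
and the right hand side vector is $[\langle \vec{e}, \vec{y} \rangle, \langle \vec{x}, \vec{y} \rangle]^T = [\sum_i y_i, \sum_i x_i y_i]^T$.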
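A short sketch of this 2 by 2 system follows, with made-up data values chosen only for illustration. It assembles the Gram matrix and right hand side from the sums above, solves for $(b, a)$, and cross-checks against NumPy's general least-squares routine.

```python
import numpy as np

# Hypothetical noisy data roughly following y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
n = len(x)

# Gram matrix of the basis {e, x}, where e is the vector of ones.
G = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])

# Right hand side [<e, y>, <x, y>].
rhs = np.array([y.sum(), (x * y).sum()])

# Unknowns are ordered (b, a): intercept first, then slope.
b, a = np.linalg.solve(G, rhs)

# Cross-check with the least-squares solution of A v = y, A = [e | x].
A = np.column_stack([np.ones_like(x), x])
v_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(b, a, v_lstsq)
```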
As another example, assume we want to approximate the transcendental function $\sin(x)$ on the interval
$[0, 1]$ by a polynomial of degree at most 2. Here $V = C([0, 1])$, and $W$ is the three dimensional subspace
$P_2$. We will use the $L^2$ norm and take the basis vectors $1, x, x^2$ for $P_2$. The Gram matrix is
$$g_{ij} = \int_0^1 x^i x^j \, dx, \quad i, j = 0, 1, 2, \quad \text{or} \quad G = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}.$$
The right hand side vector is $\big[\int_0^1 x^i \sin(x)\,dx\big]$, $i = 0, 1, 2$.
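The quadratic approximation of $\sin(x)$ can be computed with a few lines of NumPy. This is a sketch under stated assumptions: the closed forms used for the right hand side integrals $\int_0^1 x^i \sin(x)\,dx$ come from integration by parts done here, not from the notes, and the 101-point grid used to measure the error is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Gram matrix for the basis {1, x, x^2} on [0, 1]: g_ij = 1 / (i + j + 1).
G = np.array([[1.0 / (i + j + 1) for j in range(3)] for i in range(3)])

# Right hand side <sin, x^i> on [0, 1], via integration by parts:
#   i = 0: 1 - cos(1)
#   i = 1: sin(1) - cos(1)
#   i = 2: 2 sin(1) + cos(1) - 2
s1, c1 = np.sin(1.0), np.cos(1.0)
rhs = np.array([1.0 - c1, s1 - c1, 2.0 * s1 + c1 - 2.0])

# Coefficients (c0, c1, c2) of the best L2 approximation c0 + c1 x + c2 x^2.
coeffs = np.linalg.solve(G, rhs)

# Largest pointwise deviation from sin(x) on a sample grid in [0, 1].
xs = np.linspace(0.0, 1.0, 101)
max_error = np.max(np.abs(P.polyval(xs, coeffs) - np.sin(xs)))

print(coeffs, max_error)
```

The Gram matrix here is the 3 by 3 Hilbert matrix. It is still well enough conditioned at this size, but the monomial basis becomes numerically troublesome for higher degrees.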