
Review of Concepts from Linear Algebra

Our text jumps right into finite element methods in Chapter 2, but I prefer to step back a bit first
and try to provide a framework in which we can place the formulations and techniques we'll use.
All of the problems we will solve have a common thread: they are linear in a sense that we will
make precise shortly. The problems of linear algebra are the prototypes of all linear problems, and
therefore it is important that we review some of the basic concepts of this subject so that we can
see the parallels in more complex situations. An additional benefit of this review is that we will
ultimately reduce all our problems to linear algebraic ones, so it is helpful to recall some basic
results from this subject.
A basic notion in linear algebra is that of a vector space. The definition of a vector space is a
formalization of the properties of three-dimensional vectors with which we are all familiar. Let $F$
denote the set of real numbers $\mathbb{R}$ or complex numbers $\mathbb{C}$, and call the elements of $F$ scalars. We
say a set $V$ with elements $\vec{x}, \vec{y}, \vec{z}, \ldots$ called vectors forms a vector space over $F$ provided:
1. For each element $\vec{x} \in V$ and each scalar $\alpha \in F$ there is defined a vector $\alpha \cdot \vec{x} \in V$ called the
product of $\alpha$ and $\vec{x}$ (i.e., there is a function, say $p : F \times V \to V$, whose value $p(\alpha, \vec{x})$ at a
particular pair is denoted by $\alpha \cdot \vec{x}$; usually we omit the $\cdot$ and simply write $\alpha\vec{x}$).
2. For each pair of vectors $\vec{x}, \vec{y}$ in $V$ there is defined a vector $\vec{x} + \vec{y}$ called the sum of $\vec{x}$ and $\vec{y}$
(i.e., there is a function $s : V \times V \to V$ whose value $s(\vec{x}, \vec{y})$ at a pair $\vec{x}, \vec{y}$ of elements in $V$ is
denoted by $\vec{x} + \vec{y}$).
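3. The functions $p$ and $s$ satisfy the conditions:
(a) $\vec{x} + \vec{y} = \vec{y} + \vec{x}$.
(b) $\vec{x} + (\vec{y} + \vec{z}) = (\vec{x} + \vec{y}) + \vec{z}$.
(c) There is a unique vector $\vec{0}$ such that $\vec{x} + \vec{0} = \vec{x}$ for every $\vec{x}$. We usually simply write $0$
instead of $\vec{0}$.
(d) For every vector $\vec{x}$ there is a vector $-\vec{x}$ such that $\vec{x} + (-\vec{x}) = 0$.
(e) $\alpha(\vec{x} + \vec{y}) = \alpha\vec{x} + \alpha\vec{y}$.
(f) $(\alpha + \beta)\vec{x} = \alpha\vec{x} + \beta\vec{x}$.
(g) $\alpha(\beta\vec{x}) = (\alpha\beta)\vec{x}$.
(h) $1 \cdot \vec{x} = \vec{x}$.
(i) $0 \cdot \vec{x} = 0$.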
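A number of elementary properties follow immediately from these definitions, e.g., the cancellation
law $\vec{x} + \vec{y} = \vec{x} + \vec{z} \Rightarrow \vec{y} = \vec{z}$ (proof: $\vec{y} = \vec{y} + 0 = \vec{y} + \vec{x} + (-\vec{x}) = \vec{z} + \vec{x} + (-\vec{x}) = \vec{z} + 0 = \vec{z}$), and
$-\vec{x} = (-1) \cdot \vec{x}$ (proof: $0 = 0 \cdot \vec{x} = (1 - 1)\vec{x} = \vec{x} + (-1) \cdot \vec{x}$, but by (d) $0 = \vec{x} + (-\vec{x})$, so $-\vec{x} = (-1) \cdot \vec{x}$
by the cancellation law).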
All these conditions are obviously satisfied in the two most important cases of vector space we will
deal with:

1. $n$-dimensional Euclidean space $\mathbb{R}^n = \{\vec{x} : \vec{x} = (x_1, x_2, \ldots, x_n)\}$ with addition defined componentwise, $\vec{x} + \vec{y} = (x_1 + y_1, \ldots, x_n + y_n)$, and scalar multiplication defined by $\alpha\vec{x} = (\alpha x_1, \ldots, \alpha x_n)$.
For $n = 3$ this is the space of ordinary three-dimensional vectors. It is useful in most cases to
think of the components of a vector $\vec{x}$ as arranged in a column, i.e.,

$$\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

2. Real or complex function spaces. As a familiar example, let $[a, b]$ denote a closed interval on
the real axis, and let $C([a, b])$ denote the set of all real valued continuous functions defined on
$[a, b]$. Letting $f, g$ be elements of $C([a, b])$ and $x \in [a, b]$, we define the sum of $f$ and $g$ pointwise
by $(f + g)(x) = f(x) + g(x)$, and if $c$ is a real number we define scalar multiplication pointwise
by $(cf)(x) = c f(x)$. It is clear that these definitions give a vector space. By the same reasoning
the sets $C^k([a, b])$ of all functions defined, continuous, and having continuous derivatives up to
and including order $k$ all form vector spaces (all strictly contained in $C([a, b])$).
Vector spaces are the setting for our definition of a linear function. Let $V, W$ be vector spaces
and $f : V \to W$ be a function defined on $V$ and giving values in $W$, i.e., if $\vec{x} \in V$ then $f(\vec{x})$ is defined
and is a vector in $W$. $f$ is said to be a linear function if $f(\alpha\vec{x} + \beta\vec{y}) = \alpha f(\vec{x}) + \beta f(\vec{y})$
for all vectors $\vec{x}, \vec{y} \in V$ and all scalars $\alpha, \beta$. The set $V$ is
called the domain of $f$, and the set $W$ the codomain. The set of vectors in $W$ that are images of
vectors in $V$ under $f$, i.e., $f(V) = \{\vec{y} \in W : f(\vec{x}) = \vec{y} \text{ for some } \vec{x} \in V\}$, is called the range of $f$ ($\operatorname{range}(f) = f(V)$).
For example, suppose $a \le x_1 < x_2 \le b$ are two fixed points in $[a, b]$. Define $A : C([a, b]) \to \mathbb{R}^2$ by
$A(f) = (f(x_1), f(x_2))$ and we have a function from the vector space $C([a, b])$ to the vector space $\mathbb{R}^2$.
This function is linear since $A(\alpha f + \beta g) = (\alpha f(x_1) + \beta g(x_1), \alpha f(x_2) + \beta g(x_2)) = \alpha A(f) + \beta A(g)$.
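The linearity of this evaluation map can be spot-checked numerically. The following is a minimal Matlab sketch, not part of the original notes; the interval $[0, 1]$, the points `x1`, `x2`, the sample functions, and the scalars are all illustrative assumptions.

```matlab
% Evaluation map A(f) = (f(x1), f(x2)) on C([a,b]); here [a,b] = [0,1].
x1 = 0.25;  x2 = 0.75;              % two fixed points (illustrative values)
A  = @(f) [f(x1); f(x2)];           % returns a vector in R^2

f = @(x) exp(x);                    % sample continuous functions
g = @(x) cos(3*x);
alpha = 2;  beta = -5;              % sample scalars

h   = @(x) alpha*f(x) + beta*g(x);  % the function alpha*f + beta*g
lhs = A(h);                         % A(alpha*f + beta*g)
rhs = alpha*A(f) + beta*A(g);       % alpha*A(f) + beta*A(g)
gap = norm(lhs - rhs);              % zero up to rounding
```

A check like this only tests one pair of functions and scalars; the argument in the text is what establishes linearity in general.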
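As another example, consider the problem of fitting a pair of data vectors of length $n$ to a straight
line: we want to find $a, b$ such that $y_i = a x_i + b$, $i = 1, \ldots, n$. We can write this as $A\vec{v} = \vec{y}$, where

$$A = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad \vec{v} = \begin{pmatrix} b \\ a \end{pmatrix}, \quad \text{and} \quad \vec{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$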
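and we see that $A$ is a linear map from $\mathbb{R}^2$ to $\mathbb{R}^n$. If we write $A\vec{v}$ in the form

$$A\vec{v} = b \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + a \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

we see that the range of $A$ consists of the linear combinations of just two vectors in $\mathbb{R}^n$, and so we can't hope to
have exact equality of $A\vec{v}$ and $\vec{y}$ except in very special cases: for $n > 2$ the range of $A$ is smaller than $\mathbb{R}^n$.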
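To make the size of the range concrete, here is a small Matlab sketch with made-up data (the vectors `x`, `y` and the coefficients in `v` are assumptions chosen only for illustration). It builds the matrix of the fitting problem, confirms that its range is two dimensional, and shows that this particular data vector does not lie in the range.

```matlab
x = [0; 1; 2; 3];                   % sample abscissas (illustrative data)
y = [1; 3; 2; 5];                   % sample ordinates, not on one line
n = numel(x);
A = [ones(n,1), x];                 % columns: vector of ones, vector x

r_A  = rank(A);                     % dimension of range(A)
r_Ay = rank([A, y]);                % larger than r_A, so y is not in range(A)

v   = [0.5; 2];                     % v = [b; a]
Av  = v(1)*ones(n,1) + v(2)*x;      % b*(ones) + a*x
gap = norm(A*v - Av);               % the two expressions agree
```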
Our two prime examples of vector spaces have vastly different sizes. The size of a vector space $V$
is measured by its dimension $\dim(V)$, which is either a positive integer or $\infty$. The Euclidean space
$\mathbb{R}^n$ has dimension $n$ whereas the space of continuous functions on an interval $[a, b]$ has dimension $\infty$.
To get at a definition of the number $\dim(V)$ we need to introduce the notion of a system of basis
vectors of $V$. This requires several useful ideas.

1. Let $S$ be a set of vectors contained in a vector space $V$. A finite linear combination of
vectors in $S$ is a sum of the form $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k$ with $\vec{v}_j \in S$, $j = 1, \ldots, k$.
2. If $U \subset V$ is a subset of a vector space $V$, then $U$ is called a subspace if every finite linear
combination of vectors in $U$ belongs to $U$ (in particular, the zero vector must belong to $U$).
If $U$ is a subspace, it is a vector space all by itself. (For example, apart from $\{0\}$ and the whole space, the subspaces of $\mathbb{R}^2$ are all
lines through the origin, and the subspaces of $\mathbb{R}^3$ are all lines and all planes passing through the
origin.)
3. Let $S$ be a set of vectors contained in a vector space $V$. The span of $S$, written $\operatorname{span}(S)$, is the
set of all finite linear combinations of vectors in $S$, i.e., $\vec{v} \in \operatorname{span}(S) \Leftrightarrow \vec{v} = c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k$,
$\vec{v}_j \in S$, $j = 1, \ldots, k$. Of course, $\operatorname{span}(S)$ is a subspace of $V$ for every $S \subset V$.
4. A set of vectors $S = \{\vec{v}_1, \ldots, \vec{v}_k\}$ in a vector space $V$ is called linearly independent if
$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = 0 \Rightarrow c_j = 0$, $j = 1, \ldots, k$, that is, a linear combination of the vectors
can vanish only when all the coefficients of the combination are 0. A set of vectors is linearly
dependent if it is not linearly independent, i.e., if there exist scalars $c_1, \ldots, c_k$, not all zero,
such that $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = 0$.
5. A set of vectors $S$ in the vector space $V$ forms a basis of $V$ if $S$ is linearly independent and
$\operatorname{span}(S) = V$. If $S$ is a basis, then every vector in $V$ may be written as a unique linear
combination of vectors in $S$. (Suppose $\vec{y} = \sum_j c_j \vec{v}_j = \sum_k c'_k \vec{v}_k$. We can assume the same
vectors appear in each finite sum; otherwise include them with 0 coefficients. Then subtracting
we have $\sum_j (c_j - c'_j)\vec{v}_j = 0 \Rightarrow c_j = c'_j$ for all $j$ since the vectors are linearly independent.)
It can be shown that if a vector space is finite dimensional, then every basis has the same number
of elements, and we call this number $\dim(V)$. In the case of $\mathbb{R}^n$ it is clear that the vectors $\vec{e}_j$ with
$j$th component 1 and all others 0 span $\mathbb{R}^n$ since $\vec{x} = \sum_j x_j \vec{e}_j$. It is also clear that the vectors $\vec{e}_j$
are linearly independent, $\sum_j c_j \vec{e}_j = 0 \Rightarrow c_j = 0$, $j = 1, \ldots, n$, so that $\dim(\mathbb{R}^n) = n$ according to our
definitions.
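For vectors in $\mathbb{R}^n$ these notions can be tested with standard matrix tools. The sketch below is an illustration under assumed data (the matrices `B` and `S` and the vector `x` are made up): the columns of a square matrix form a basis exactly when the matrix has full rank, and the unique coefficients of a vector in that basis come from solving a linear system.

```matlab
% Columns of B are candidate basis vectors of R^3 (illustrative choice).
B = [1 1 0;
     0 1 1;
     0 0 1];
is_basis = (rank(B) == size(B,1));  % independent and spanning iff rank is 3

x = [2; 3; 1];                      % a vector to expand in this basis
c = B \ x;                          % unique coefficients with B*c = x
resid = norm(B*c - x);              % zero up to rounding

% A dependent set: the third column is the sum of the first two.
S = [1 0 1;
     0 1 1;
     0 0 0];
dependent = (rank(S) < size(S,2));
```

Here `c` works out to $(0, 2, 1)$, i.e., $\vec{x}$ is twice the second column plus the third.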
It can also be shown that given any linearly independent set of vectors $S$ it is always possible to find
a spanning set $S_1 \supset S$ that is still linearly independent, i.e., any linearly independent set may be enlarged to form a basis. Now the
set of functions $\{x^k : k = 0, 1, 2, \ldots\}$ is linearly independent in $C([a, b])$ (if $c_0 + c_1 x + \cdots + c_n x^n = 0$ on $[a, b]$ with some $c_j \ne 0$,
we would have a nonzero polynomial of degree at most $n$ with more than $n$ roots, which is impossible). Since this set
is not finite, $\dim(C([a, b])) = \infty$. This space has many finite dimensional subspaces, e.g., the set of
all polynomials of degree $\le n$ is a subspace of dimension $n + 1$ (spanned by $\{1, x, x^2, \ldots, x^n\}$).
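The root-counting argument has a simple numerical counterpart. In the sketch below (an illustration, with the degree `n` and the sample points `t` chosen arbitrarily), the monomials $1, x, \ldots, x^n$ are sampled at $n + 1$ distinct points of $[0, 1]$. If a combination $c_0 + c_1 x + \cdots + c_n x^n$ vanished on the interval it would vanish at these points, giving $Vc = 0$; full rank of $V$ then forces $c = 0$.

```matlab
n = 4;                              % polynomial degree (illustrative)
t = linspace(0, 1, n+1)';           % n+1 distinct points in [a,b] = [0,1]
V = zeros(n+1, n+1);
for k = 0:n
    V(:, k+1) = t.^k;               % column k+1 samples the monomial x^k
end
r = rank(V);                        % full rank: only c = 0 gives V*c = 0
```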
A major concern of linear algebra is the study of linear maps between finite dimensional linear
spaces. We've already seen an example of this in the data fitting problem, where the linear map
was a matrix. We'll now see that any linear map can be thought of as a matrix. Let $A : V \to W$ be
a linear map between a vector space $V$ ($\dim(V) = n$, with $\vec{v}_j$, $j = 1, \ldots, n$ a basis) and a vector space
$W$ ($\dim(W) = m$, with $\vec{w}_k$, $k = 1, \ldots, m$ a basis). Since any $\vec{v} \in V$ can be written $\vec{v} = \sum_j x_j \vec{v}_j$, we can write the image
vector $\vec{w} = A(\vec{v}) = \sum_j x_j A(\vec{v}_j)$. But each of the $n$ vectors $A(\vec{v}_j) \in W$ can be expressed in terms of
the basis $\vec{w}_k$, i.e., we have

$$A(\vec{v}_j) = \sum_{s=1}^{m} a_{sj}\vec{w}_s,$$

where the $mn$ coefficients $a_{sj}$ are given numbers for a fixed pair of bases and a fixed operator $A$.

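We therefore find that

$$\vec{w} = \sum_{j=1}^{n} x_j \sum_{s=1}^{m} a_{sj}\vec{w}_s = \sum_{s=1}^{m} \Big( \sum_{j=1}^{n} a_{sj} x_j \Big) \vec{w}_s.$$

Since $\vec{w} = \sum_{s=1}^{m} y_s \vec{w}_s$ with unique coefficients we have

$$y_s = \sum_{j=1}^{n} a_{sj} x_j, \quad s = 1, \ldots, m.$$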

Using matrix multiplication we can write $\vec{w} = A(\vec{v})$ as the matrix equation

$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$

where the $m \times n$ matrix $[a_{ij}]$ corresponds to the operator $A$. It is useful to observe that the $j$th
column of the matrix is the result of applying $A$ to the $j$th basis vector. The correspondence between
a linear map and a matrix is one to many since each time we choose a system of basis vectors for $V$
and $W$ we get, in general, a different matrix representing the map. On the other hand, linear maps
can now be thought of as simply matrices, objects we are all familiar with.
As an example, let's take $V = P_2$, the set of all polynomials of degree at most 2 (considered as real
valued functions on $\mathbb{R}$). This has the basis $\{e_0(x) = 1, e_1(x) = x, e_2(x) = x^2\}$ (note indices run from
0 to 2). We define on this set the operator $D : V \to W = P_1$ (with basis vectors $e_0, e_1$) by
$Dp(x) = dp(x)/dx$. We have $De_0 = 0$, $De_1 = e_0$, $De_2 = 2e_1$. Thus if $p = c_0 + c_1 x + c_2 x^2$ we have
$Dp = p' = c'_0 e_0 + c'_1 e_1$ or in matrix form

$$\begin{pmatrix} c'_0 \\ c'_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix}.$$
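The matrix of the derivative operator can be tried out directly on coefficient vectors. This is a minimal Matlab sketch; the sample polynomial is an arbitrary choice, and the cross-check relies on Matlab's built-in `polyder`, which stores coefficients from the highest power down (hence the flips).

```matlab
% Coefficient vectors: p = c0 + c1*x + c2*x^2  <->  [c0; c1; c2].
% Column j of D holds the coordinates of D applied to the jth basis vector:
%   De0 = 0,  De1 = e0,  De2 = 2*e1.
D = [0 1 0;
     0 0 2];

c  = [5; -3; 4];                    % p(x) = 5 - 3x + 4x^2 (illustrative)
dc = D*c;                           % coordinates of p' in the basis {1, x}

% Cross-check with polyder, which orders coefficients from highest degree down.
dc_check = fliplr(polyder(flipud(c)'))';
```

For this polynomial $p'(x) = -3 + 8x$, so both `dc` and `dc_check` hold the coefficients $(-3, 8)$.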
Note that $Dp = 0$ is satisfied not only by the zero vector (polynomial) $p \equiv 0$, but also by $p = c_0 e_0$.
For any linear function $A$, we set $\ker(A) = \{\vec{v} \in V : A\vec{v} = 0\}$, and call this set the kernel or
null space of $A$. If $\ker(A) = \{0\}$, we say that the kernel is trivial. If $\ker(A)$ is trivial then the equation
$A\vec{v} = \vec{w}$ has at most one solution for a given $\vec{w} \in W$. (Suppose there were two, $\vec{v}_1, \vec{v}_2$; then
$A(\vec{v}_1 - \vec{v}_2) = \vec{w} - \vec{w} = 0 \Rightarrow \vec{v}_1 - \vec{v}_2 \in \ker(A) \Rightarrow \vec{v}_1 - \vec{v}_2 = 0$.) If $\ker(A)$ is nontrivial, then the solution
of $A\vec{v} = \vec{w}$ for a given $\vec{w} \in W$ is not unique: if $\vec{v}_1$ is one solution, then $\vec{v}_1 + \vec{v}_0$ is also a
solution for any $\vec{v}_0 \ne 0$ in $\ker(A)$.
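For the derivative matrix of the previous example the kernel can be computed with Matlab's `null`, which returns an orthonormal basis of the null space. The sketch below uses an assumed right-hand side `w` and a hand-picked particular solution `v1` to illustrate the non-uniqueness.

```matlab
D = [0 1 0;
     0 0 2];                        % derivative map P2 -> P1 in the monomial bases
N = null(D);                        % orthonormal basis of ker(D): one column,
                                    % a multiple of [1; 0; 0] (the constants)

w  = [1; 4];                        % right-hand side: the polynomial 1 + 4x
v1 = [0; 1; 2];                     % one solution: x + 2x^2 has derivative 1 + 4x
v2 = v1 + 7*N;                      % add a nonzero kernel element

res1 = norm(D*v1 - w);              % both residuals vanish,
res2 = norm(D*v2 - w);              % although v1 and v2 differ
```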
Up to this point, we have not introduced two very important features of the Euclidean space Rn ,
and we consider these features now.
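1. $\mathbb{R}^n$ has a notion of distance defined on it: namely, if $\vec{x} \in \mathbb{R}^n$ the length of $\vec{x}$ is given
by $\|\vec{x}\|_2 = \big(\sum_j x_j^2\big)^{1/2}$, and if $\vec{y}$ is another point the distance between $\vec{x}$ and $\vec{y}$ is defined by
$\|\vec{x} - \vec{y}\|_2$. The function $\|\cdot\|_2 : \mathbb{R}^n \to \mathbb{R}$ is called the Euclidean norm on $\mathbb{R}^n$.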
Let $V$ be a vector space and assume there is a function $\|\cdot\| : V \to \mathbb{R}$. This function is called
a norm on $V$ if it satisfies the following three conditions:
(a) $\|\vec{v}\| \ge 0$, and $\|\vec{v}\| = 0 \Leftrightarrow \vec{v} = 0$.
(b) $\|\vec{v} + \vec{w}\| \le \|\vec{v}\| + \|\vec{w}\|$. This inequality is known as the triangle inequality. In $\mathbb{R}^2$ with
the Euclidean norm it states that the length of any side of a triangle is no greater than the sum
of the lengths of the other two sides. If $n > 2$, it may not be obvious that the triangle
inequality is satisfied by $\|\cdot\|_2$, but we'll see in a moment that it is.
(c) $\|\alpha\vec{v}\| = |\alpha| \, \|\vec{v}\|$.
Other distance functions or norms can also be defined on $\mathbb{R}^n$. For example, it is easy to show
that $\|\vec{x}\|_\infty = \max_i |x_i|$ satisfies the three conditions just given. A vector space on which a norm
is defined is called a normed linear space.
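2. $\mathbb{R}^n$ has a scalar (or inner, or dot) product defined on it by which the idea of angles between
vectors may be defined; recall that (at least in two and three dimensions) $\langle\vec{x}, \vec{y}\rangle = \sum_{j=1}^{n} x_j y_j =
\|\vec{x}\| \|\vec{y}\| \cos(\vec{x}, \vec{y})$. In general, a scalar product on a vector space $V$ is a function $\langle\cdot,\cdot\rangle : V \times V \to
\mathbb{R}$ or $\mathbb{C}$ such that (assuming the codomain is $\mathbb{R}$)
(a) $\langle\vec{x}, \vec{y}\rangle = \langle\vec{y}, \vec{x}\rangle$
(b) $\langle\alpha\vec{x}, \vec{y}\rangle = \alpha\langle\vec{x}, \vec{y}\rangle$, and $\langle\vec{x} + \vec{z}, \vec{y}\rangle = \langle\vec{x}, \vec{y}\rangle + \langle\vec{z}, \vec{y}\rangle$
(c) $\langle\vec{x}, \vec{x}\rangle \ge 0$, and $\langle\vec{x}, \vec{x}\rangle = 0 \Leftrightarrow \vec{x} = 0$
The usual scalar product on $\mathbb{R}^n$ clearly satisfies these axioms, and we see that $\|\vec{x}\|_2^2 = \langle\vec{x}, \vec{x}\rangle$,
that is, the norm is actually expressible in terms of the inner product. A vector space with
an inner product defined on it is called an inner product space. Any inner product space
becomes a normed space with $\|\vec{x}\| \equiv \sqrt{\langle\vec{x}, \vec{x}\rangle}$ as the norm (see footnote 1).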
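The two features just described, norms and the scalar product on $\mathbb{R}^n$, are available directly in Matlab. The following sketch uses two made-up vectors to evaluate the Euclidean and max norms, the inner product, and the Cauchy and triangle inequalities for that pair.

```matlab
x = [3; -4; 0];                     % illustrative vectors in R^3
y = [1;  2; 2];

n2x  = norm(x, 2);                  % Euclidean norm: sqrt(9 + 16) = 5
ninf = norm(x, Inf);                % max norm: max |x_i| = 4
ip   = x' * y;                      % inner product: 3 - 8 + 0 = -5

% The norm induced by the inner product agrees with the Euclidean norm.
same = abs(sqrt(x'*x) - n2x) < 1e-12;

% Cauchy and triangle inequalities for this pair.
cauchy   = abs(ip) <= norm(x)*norm(y);        % 5 <= 5*3
triangle = norm(x + y) <= norm(x) + norm(y);  % sqrt(24) <= 8

% Angle between x and y from <x,y> = |x||y|cos(theta).
theta = acos(ip / (norm(x)*norm(y)));
```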
Inner product spaces are the type of space where most of our problems can be posed in a natural way. Aside from finite
dimensional spaces like $\mathbb{R}^n$ we will also have to contend with infinite dimensional spaces like $C([a, b])$.
This vector space and others like it can be made into inner product spaces in many ways, but one
useful straightforward definition of inner product is

$$\langle f, g\rangle = \int_a^b f(x) g(x) \, dx.$$

By the properties of the integral, it is clear that the first two requirements of an inner product are
satisfied, and for a continuous function $\int_a^b f^2(x) \, dx = 0$ implies that $f \equiv 0$, as you can easily check.
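This integral inner product can be evaluated numerically. The sketch below assumes Matlab's `integral` function is available and uses the interval $[0, 1]$ and the functions $f(x) = x$, $g(x) = x^2$ purely as illustrations; the helper name `ip` is introduced here and is not from the notes.

```matlab
% Inner product <f,g> = integral of f*g over [a,b], evaluated numerically.
a = 0;  b = 1;
ip = @(f, g) integral(@(x) f(x).*g(x), a, b);

f = @(x) x;                         % illustrative choices of f and g
g = @(x) x.^2;

fg      = ip(f, g);                 % exact value is 1/4
normf   = sqrt(ip(f, f));           % induced norm of f, exact value 1/sqrt(3)
sym_gap = abs(ip(f, g) - ip(g, f)); % symmetry, requirement (a)
```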
Footnote 1. All norm conditions but the triangle inequality are obvious. Define the quadratic $Q(\alpha) = \langle\vec{x} - \alpha\vec{y}, \vec{x} - \alpha\vec{y}\rangle \ge 0$.
Expanding and using the properties of the inner product gives $Q = \langle\vec{x}, \vec{x}\rangle - 2\alpha\langle\vec{x}, \vec{y}\rangle + \alpha^2\langle\vec{y}, \vec{y}\rangle$. The minimum value of
$Q$ occurs for $\alpha\langle\vec{y}, \vec{y}\rangle = \langle\vec{x}, \vec{y}\rangle$. Assuming $\vec{y} \ne 0$ this means $\langle\vec{x}, \vec{x}\rangle - \langle\vec{x}, \vec{y}\rangle^2 / \langle\vec{y}, \vec{y}\rangle = Q_{\min} \ge 0$. From this computation
we find the result $|\langle\vec{x}, \vec{y}\rangle| \le \|\vec{x}\| \|\vec{y}\|$, called the Cauchy inequality. The triangle inequality follows from this and the
computation $\|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + 2\langle\vec{x}, \vec{y}\rangle + \|\vec{y}\|^2 \le \|\vec{x}\|^2 + 2|\langle\vec{x}, \vec{y}\rangle| + \|\vec{y}\|^2$. Using the Cauchy inequality, this last
sum is $\le \|\vec{x}\|^2 + 2\|\vec{x}\| \|\vec{y}\| + \|\vec{y}\|^2 = (\|\vec{x}\| + \|\vec{y}\|)^2$, and the triangle inequality is proved.

The general situation that we will deal with in determining approximate solutions to both ordinary
and partial differential equations is this: We will be seeking the solution of a linear differential
equation and will be able to pose our problem using linear space terminology. That is, the differential
equation will be viewed as a linear operator from a vector space $V$ to a vector space $W$, $A : V \to W$,
and for a given $\vec{f} \in W$ we have to find a vector $\vec{v} \in V$ such that $A\vec{v} = \vec{f}$. For example, consider the
boundary value problem

$$u'' = f, \quad 0 < x < 1, \qquad u(0) = u(1) = 0, \quad f \in C([0, 1]) \text{ given};$$

take $V = C_0^2([0, 1]) = \{u \in C^2([0, 1]) : u(0) = u(1) = 0\}$, $W = C([0, 1])$, and for $u \in V$ take
$Au(x) = u''(x)$. Then the equation and the boundary conditions reduce to finding the solution
of the operator equation $Au = f$.
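As a quick check of this formulation, one can work a small instance by hand. The example below takes the equation exactly as written above, $u'' = f$, and assumes the illustrative right-hand side $f \equiv 1$ (a choice made here only for the example); it also shows that the kernel of $A$ on $V$ is trivial, so the solution is unique.

```latex
% Illustrative instance with f(x) = 1 on [0,1]
\begin{align*}
u(x) &= \tfrac{1}{2}\,(x^2 - x), & u''(x) &= 1 = f(x), & u(0) &= u(1) = 0,
\end{align*}
so $u \in V$ and $Au = f$.

% Kernel of A on V
If $Au = 0$ then $u'' = 0$, so $u(x) = c_0 + c_1 x$. The condition $u(0) = 0$
gives $c_0 = 0$, and then $u(1) = 0$ gives $c_1 = 0$. Hence $\ker(A) = \{0\}$
and $Au = f$ has at most one solution in $V$.
```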
It is important to realize that a numerical approximation to a problem of solving the linear equation
$Au = f$ for a given $f$ where the domain space for $A$ has infinite dimension (like $C_0^2([0, 1])$) can
only yield a finite number of values, say $u_j$, $j = 1, \ldots, n$, which may approximate the values of the
solution $u$ at certain points in $[0, 1]$. For our purposes, it is best to view this constraint as having
to find an approximation to the solution $u$ in a finite dimensional subspace of $V$. If $V$ is an inner
product space we have the following important approximation theorem.
If $V$ is an inner product space, $\vec{v} \in V$, and $W$ a finite dimensional subspace of $V$, then the problem
of finding a $\vec{w} \in W$ such that $\|\vec{v} - \vec{w}\| = $ minimum has a unique solution, $\vec{w}^*$, i.e.,
$\|\vec{v} - \vec{w}^*\| = \min_{\vec{w} \in W} \|\vec{v} - \vec{w}\|$.
~
I won't try to prove the existence of $\vec{w}^*$ rigorously. Instead I'll develop formulas for computing its
components (assuming a real vector space). Let $\vec{e}_j$, $j = 1, \ldots, n$ be a basis for $W$, and $\vec{w} = \sum_j x_j \vec{e}_j$
be an element of $W$. We have

$$d \equiv \|\vec{v} - \vec{w}\|^2 = \langle\vec{v} - \vec{w}, \vec{v} - \vec{w}\rangle = \|\vec{v}\|^2 - 2\langle\vec{v}, \vec{w}\rangle + \|\vec{w}\|^2$$

and introducing the components of $\vec{w}$

$$d = d(x_1, \ldots, x_n) = \|\vec{v}\|^2 - 2\sum_{j=1}^{n} \langle\vec{v}, \vec{e}_j\rangle x_j + \sum_{i,j=1}^{n} \langle\vec{e}_i, \vec{e}_j\rangle x_i x_j.$$

In order to minimize $d$ we set its partial derivatives with respect to $x_i$, $i = 1, \ldots, n$ equal to 0. This
gives

$$\sum_{j=1}^{n} \langle\vec{e}_i, \vec{e}_j\rangle x_j = \langle\vec{v}, \vec{e}_i\rangle, \quad i = 1, \ldots, n, \quad \text{or} \quad G\vec{x} = \vec{y}$$

with $\vec{x} = [x_1, \ldots, x_n]^T$, $\vec{y} = [\langle\vec{v}, \vec{e}_1\rangle, \ldots, \langle\vec{v}, \vec{e}_n\rangle]^T$. These equations are called the Normal Equations.
The matrix $G = [\langle\vec{e}_i, \vec{e}_j\rangle]$ is called the Gram matrix of the basis vectors $\vec{e}_i$. It is relatively easy to show
that $G$ is nonsingular so that the normal equations always have a solution, $x_j = x_j^*$, $j = 1, \ldots, n$.
In particular, if the basis vectors are mutually perpendicular,

$$\langle\vec{e}_i, \vec{e}_j\rangle = \begin{cases} g_i > 0, & i = j \\ 0, & i \ne j \end{cases}$$

then the normal equations have the simple solution, $x_i^* = \langle\vec{v}, \vec{e}_i\rangle / g_i$, $i = 1, \ldots, n$.
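Note the following important property of this solution: $\langle\vec{v} - \vec{w}^*, \vec{w}\rangle = 0$ for all $\vec{w} \in W$, i.e., the
difference vector $\vec{v} - \vec{w}^*$ is perpendicular to $W$. In fact, the normal equations can be written as
$\langle\vec{w}^*, \vec{e}_i\rangle = \langle\vec{v}, \vec{e}_i\rangle$, $i = 1, \ldots, n$, so that $\langle\vec{v} - \vec{w}^*, \vec{e}_i\rangle = 0$ for all $i$, so that $\vec{v} - \vec{w}^* \perp W$.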
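The normal equations and the perpendicularity of the difference vector can be seen in a small finite dimensional case. The sketch below is an illustration with made-up data: $V = \mathbb{R}^3$ with the usual dot product, and $W$ spanned by the two columns of an assumed matrix `E`.

```matlab
% Best approximation of v in R^3 from W = span{e1, e2} (illustrative vectors).
v  = [1; 2; 3];
E  = [1 0;
      1 1;
      0 1];                         % columns are the basis vectors e1, e2 of W

G   = E' * E;                       % Gram matrix, G(i,j) = <e_i, e_j>
rhs = E' * v;                       % right-hand side, rhs(i) = <v, e_i>
xs  = G \ rhs;                      % solution of the normal equations
ws  = E * xs;                       % best approximation w* in W

orth = E' * (v - ws);               % <v - w*, e_i>, zero up to rounding

% Another element of W, obtained by perturbing the coefficients, is farther from v.
w_other = E * (xs + [0.1; -0.2]);
closer  = norm(v - ws) < norm(v - w_other);
```

For this data the Gram matrix is `[2 1; 1 2]` and the coefficients are $x^* = (1/3, 7/3)$.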
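The problem of fitting a straight line to a sequence of data pairs can be viewed as a best approximation problem. Recall that this problem could be expressed as finding $a, b$ such that

$$b \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + a \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = b\vec{e} + a\vec{x} \quad \text{approximates} \quad \vec{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$

in the sense that $\|\vec{y} - (b\vec{e} + a\vec{x})\| = $ minimum, where $\vec{e}$ is the vector of all ones. This is just the problem we are studying with $W$ the
two dimensional space spanned by $\vec{e}, \vec{x}$. In this case, the Gram matrix is 2 by 2,

$$G = \begin{pmatrix} \langle\vec{e}, \vec{e}\rangle & \langle\vec{e}, \vec{x}\rangle \\ \langle\vec{e}, \vec{x}\rangle & \langle\vec{x}, \vec{x}\rangle \end{pmatrix} = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},$$

and the right hand side vector is $[\langle\vec{e}, \vec{y}\rangle, \langle\vec{x}, \vec{y}\rangle]^T = [\sum_i y_i, \sum_i x_i y_i]^T$.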
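The 2 by 2 normal equations for the line fit are easy to assemble and solve. The following Matlab sketch uses made-up data (the vectors `x` and `y` are assumptions) and cross-checks the result against Matlab's backslash operator applied to the overdetermined system, which solves the same least-squares problem.

```matlab
x = [0; 1; 2; 3];                   % illustrative data
y = [1; 3; 2; 5];
n = numel(x);
e = ones(n, 1);

G   = [n,      sum(x);
       sum(x), sum(x.^2)];          % Gram matrix of the basis {e, x}
rhs = [sum(y); sum(x.*y)];          % [<e,y>; <x,y>]
ba  = G \ rhs;                      % ba(1) = b (intercept), ba(2) = a (slope)

% Cross-check: backslash on the n-by-2 system solves the same problem.
ba_check = [e, x] \ y;
r = y - (ba(1)*e + ba(2)*x);        % residual, perpendicular to e and x
```

For this data both the intercept and the slope come out as 1.1.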
As another example, assume we want to approximate the transcendental function $\sin(\pi x)$ on the interval
$[0, 1]$ by a polynomial of degree at most 2. Here $V = C([0, 1])$, and $W$ is the three dimensional subspace
$P_2$. We will use the $L^2$ norm and take the basis vectors $1, x, x^2$ for $P_2$. The Gram matrix is

$$g_{ij} = \int_0^1 x^i x^j \, dx, \quad i, j = 0, 1, 2, \quad \text{or} \quad G = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}.$$

The right hand side vector is $\big[\int_0^1 x^i \sin(\pi x) \, dx\big]^T = [2/\pi, 1/\pi, (\pi^2 - 4)/\pi^3]^T$. Using Matlab we find

>> g=[1 1/2 1/3;1/2 1/3 1/4;1/3 1/4 1/5]
g =
    1.0000    0.5000    0.3333
    0.5000    0.3333    0.2500
    0.3333    0.2500    0.2000
>> rhs=[2/pi; 1/pi; (pi^2-4)/pi^3]   % column vector, as g\rhs requires
rhs =
    0.6366
    0.3183
    0.1893
>> y=g\rhs
y =
   -0.0505
    4.1225
   -4.1225
>> xx=linspace(0,1);
>> yy=sin(pi*xx);
>> zz=y(1)*ones(1,100)+y(2)*xx+y(3)*xx.^2;
>> plot(xx,yy,xx,zz)
The plot shows close agreement.

[Figure: sin(pi*x) (curve yy) and its quadratic approximation (curve zz) plotted against xx on [0, 1].]
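The closed-form right hand side used in the session above can be cross-checked numerically, and the agreement seen in the plot can be quantified. This is a minimal sketch, assuming Matlab's `integral` function is available; the names `rhs_exact` and `max_err` are introduced here for illustration.

```matlab
g   = [1 1/2 1/3; 1/2 1/3 1/4; 1/3 1/4 1/5];
rhs = zeros(3, 1);
for i = 0:2
    rhs(i+1) = integral(@(x) x.^i .* sin(pi*x), 0, 1);  % <x^i, sin(pi x)>
end
rhs_exact = [2/pi; 1/pi; (pi^2 - 4)/pi^3];
rhs_gap   = norm(rhs - rhs_exact);  % numerical and closed forms agree

y = g \ rhs;                        % coefficients of 1, x, x^2

xx = linspace(0, 1);
max_err = max(abs(sin(pi*xx) - (y(1) + y(2)*xx + y(3)*xx.^2)));
```

The quantity `max_err` measures the largest pointwise deviation on the plotting grid; note that the least-squares fit minimizes the integral norm, not this maximum.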