
Chapter 0

Motivation

In linear algebra, we will be working with, as the name suggests, linear things, but probably not
in the intuitive geometric sense you are used to. You are probably already familiar with lines and
planes, but the “linear” in “linear algebra” requires that they pass through the origin as
well. However, even this geometric picture is only a small part of the story.
Linear algebra is the basis for modern physical simulation and computation; it is analogous to
algebra in importance when trying to solve equations. Linear algebra is the extension of ordinary
algebra of linear equations (where unknowns are not raised to any powers) to higher dimensions.
Linear algebra allows you to solve systems of equations, determine if there are no solutions or
infinitely many, and most importantly, do it in a systematic way. Beyond solving equations, linear
algebra also opens up systematic methods of analysis, letting you draw general conclusions about
systems and gain intuition for very high dimensional objects.
The great achievement of applied math in the last century was producing algorithmic ways
of performing linear algebra manipulations in a reliable, robust way. We will explore these de-
velopments throughout the course by studying particular algorithms to replace methods of hand-
computation.
A basic understanding of linear algebra is required for engineering in general. Most physical
sciences require the concept of eigenvalues and eigenvectors (without necessarily dealing with linear
algebra directly), which is covered in the latter half. Linear algebra is also required for further
studies in most mathematical fields, including and certainly not limited to (systems of) differential
equations, group theory, and modern algebra and geometry. I would say that the first half of the
course covers all of the linear algebra you would ever learn in a US high school. The second half
covers everything you would cover in an introductory undergraduate course. Keep in mind that the
class will skip over most of the mathematical rigor, but there are still a few short and simple proofs
thrown in because they are interesting. You will not get a deep enough mathematical understanding
to continue further in math, but this is probably enough for the science and engineering audience.

Chapter 1

Vectors, Matrices, and Spaces

1.1 What are these things?


The basic objects that we work with are objects called “vectors”, “matrices” (a matrix, two ma-
trices), and “spaces” (actually, “vector spaces”, but let’s not dwell on this). Practically, a vector
is a list of numbers, a matrix is a two-dimensional grid of numbers, and a space is where vectors
live (in the same way numbers live on the number line). Let’s bring a more concrete form to these
concepts; we write vectors as a column of numbers surrounded by square brackets:
 
$$x = \begin{bmatrix} 3.4 \\ 6.1 \\ 1.2 \end{bmatrix}$$

When we give names to vectors, we usually use lowercase roman letters near the end of the alphabet
(x in the example above). Each item (number) in the vector is called an “element” of the vector,
and we use numerical subscripts (called indexes) to refer to them (x2 = 6.1). We write matrices as
a grid of numbers surrounded by square brackets:
 
$$A = \begin{bmatrix} 7.1 & 2.0 & 0.0 & 1.3 \\ 5.4 & 5.0 & 3.9 & 2.3 \end{bmatrix}$$

Matrices are usually given uppercase roman letter names near the beginning of the alphabet (A
in the example above), but sometimes we will use letters near the end. The elements of a matrix
require two subscripts to determine their location; the first is the row number, the second the
column number (A12 = 2.0). We denote the space that a vector lives in by the “field” of the vector
elements raised to a power equal to the length of the vector. The field of a vector element is simply
the kind of number that goes into the vector. In the examples above, we are only using real numbers
(denoted by R), but we could just as easily have used complex numbers (denoted by C). With this in
mind, the vector space to which x belongs in the above example is R3 (meaning that x is a vector
of length 3, with elements taken from the real numbers). Finally, we refer to individual numbers
themselves as “scalars”. You can think of a scalar as a vector of length 1, or a 1 × 1 matrix.
We can visualize vectors in three or fewer dimensions as arrows in space originating from the
origin, with tips located at the points whose coordinates are the elements of the vector. In this
way, a vector can be thought of as encapsulating a direction in space and a magnitude. The vector
space in which such a vector lives is the entire space; for a 2D vector in the Cartesian plane, the

entire Cartesian plane is the vector space R2 . This is an especially useful picture to keep in mind
if you are new to linear algebra.

[Figure content: the vector x drawn as an arrow from the origin to the point (2.5, 1.5) in the Euclidean plane R^2.]

Figure 1.1: Visualizing a vector in R2 . All vectors begin at the origin and end at a point with
coordinates equal to the elements of the vector.

The sizes (dimensions) of these objects are very important; a vector and its associated space are
characterized by a single integer: the length of the vector. The size of a matrix is the number
of rows and the number of columns (in that order). We refer to matrices with equal numbers of
rows and columns as square matrices, a subset of generally rectangular matrices. We will consider
vectors to be columnar and columns of matrices to be vectors, and rows of matrices to be row
vectors (transposed column vectors). The “diagonal” of a matrix consists of those entries with equal row
and column indexes. We call a matrix a diagonal matrix if all its off-diagonal entries are zero.
In fact, as we will discuss later on, matrices can also be thought of as living in vector spaces, with
dimension equal to the number of elements in them.
The space is really an infinite set of objects. Consider R3 (pronounced “are-three”): it contains
all possible vectors of length 3 whose elements are real numbers. We denote this relationship by
x ∈ R3 (pronounced “ex in are-three”). Similarly, for the above matrix, we could write A ∈ R2×4 .
Note that for matrices, we explicitly write the number of rows times the number of columns in the
exponent.
We have now set up all the basic terminology, so you can at least communicate vector and matrix
information with other people. A complete verbal description of x above would be: “x is in are-
three, x-sub-one equals 3.4, x-sub-two equals 6.1, and x-sub-three equals 1.2”.

1.2 The delusion of meaning


When you write down a vector or a matrix as a column or a grid of numbers on a piece of paper,
somebody else looking at it would see only the numbers but not know what the numbers mean.
In the context of linear algebra, a vector or matrix is an object on which certain mathematical
operations can be performed, and the result should still have a reasonable interpretation. Take
for example the following two tables:
1 2 3 4 5 6 7 8 9
9 8 5 7 2 1 3 6 4
7 6 4 3 9 8 2 1 5          Company     Product A   Product B
2 9 8 5 6 4 1 7 3          Company 1   48.29       69.28
4 5 7 1 8 3 6 9 2          Company 2   38.68       72.52
6 3 1 9 7 2 4 5 8
5 1 2 6 3 9 8 4 7
8 7 6 2 4 5 9 3 1
3 4 9 8 1 7 5 2 6
The table on the left is a solved Sudoku grid, the table on the right is a table of costs of
items for certain companies. It’s not hard to imagine that adding values to the table on the left
probably would not make much sense, while it would for the table on the right. Furthermore, we
could multiply the table on the right by a column vector of product quantities to obtain a new
vector of total costs, but multiplication by a Sudoku grid is meaningless. This is a conceptual
misunderstanding people often make regarding matrices; a 2D grid of numbers is not always a
matrix in the mathematical sense, even though the underlying data structure, representation, or
storage format is a 2D grid. Here is another example: taking the red component of a bitmap image
usually does not result in a mathematical matrix with respect to the operation of multiplication (as
we will see later), but many people refer to image manipulation as matrix manipulation. I would
argue this is a poor choice of terminology.
The point I want to drive home in this section is that the meaning and interpretation of a matrix
is ascribed by whoever wrote it down, or whoever reads it; it is not an intrinsic notion attached to
the grid of numbers (the analogous statement holds for vectors). In practice, vectors and matrices
are almost certainly used to solve some kind of real world problem, so all the numbers have some
physical meaning and units. It is important to keep this in mind when working with matrices,
because operations on these mathematical objects should keep their meanings consistent.

1.3 Basic arithmetic


1.3.1 Addition, Subtraction, Scaling

Vectors and matrices can be added to each other or subtracted from one another, provided their
dimensions are the same, thus a vector of length 4 can only be added to another vector of length 4,
etc. It is not meaningful to operate on two objects of different dimensions. Addition and subtraction
are performed elementwise, as you would guess. So, for example,
     
$$\begin{bmatrix} 4.7 & -3.7 \\ 2.6 & 0.9 \end{bmatrix} - \begin{bmatrix} 9.2 & 3.2 \\ 0.5 & -5.2 \end{bmatrix} = \begin{bmatrix} -4.5 & -6.9 \\ 2.1 & 6.1 \end{bmatrix}$$

Addition has all the properties of scalar addition: it is commutative, so x + y = y + x, and associative,
so (x + y) + z = x + (y + z). You can scale a vector or matrix by a scalar value, which just multiplies
each entry of the vector or matrix by that value.
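
As a concrete sketch in C (the function names vec_add and vec_scale below are purely illustrative, not from any library), elementwise addition and scaling of vectors stored as plain arrays look like this:

#include <stdio.h>

/* Illustrative sketch: elementwise addition and scaling of length-n vectors
   stored as plain C arrays. */
void vec_add(int n, const double *x, const double *y, double *out) {
    for (int i = 0; i < n; ++i) out[i] = x[i] + y[i];   /* out = x + y */
}

void vec_scale(int n, double alpha, double *x) {
    for (int i = 0; i < n; ++i) x[i] *= alpha;           /* x = alpha*x */
}

int main(void) {
    double x[3] = {3.4, 6.1, 1.2}, y[3] = {1.0, -1.0, 2.0}, z[3];
    vec_add(3, x, y, z);     /* z = x + y = (4.4, 5.1, 3.2) */
    vec_scale(3, 2.0, z);    /* z = 2z    = (8.8, 10.2, 6.4) */
    printf("%g %g %g\n", z[0], z[1], z[2]);
    return 0;
}

Subtraction and matrix addition work the same way, one entry at a time.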

1.3.2 Matrix multiplication
Multiplication is not as you would expect; it is not simply multiplying together corresponding
elements of two matrices. First of all, multiplication as we will define it is only between two
matrices; vectors are treated differently. Second, the order of the two matrices when multiplying
matters; for two matrices A and B, AB ≠ BA in general. Third, the “inner” dimensions of the
matrices must match; for two matrices A ∈ Rm×n and B ∈ Rp×q , we must have n = p, and
AB ∈ Rm×q . Or in words, the number of columns of the first matrix must equal the number of
rows of the second matrix. The dimension of the resulting matrix is the number of rows of the first
by the number of columns of the second. These properties make matrix multiplication extremely
confusing to beginners.
Now, let’s actually discuss how matrix multiplication works. We will refer to the relationship
C = AB, with A ∈ Rm×n , B ∈ Rn×q , and C ∈ Rm×q . The formula for computing a single element
of C is
$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$

$$\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \\ C_{31} & C_{32} \end{bmatrix}$$

Figure 1.2: Visual description of matrix multiplication: Each element of the result comes from the
row of the first matrix and the column of the second matrix corresponding to the row and column of
the result element. For that row and column, multiply corresponding elements and add the results
to obtain the value of the resulting entry.

Here is the completely worked out 2 × 2 symbolic example so you can follow along:
    
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} p & q \\ r & s \end{bmatrix} = \begin{bmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{bmatrix}$$
Multiplying a matrix by a vector simply involves treating the vector as a column matrix, which
results in another vector, possibly of a different length.
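
For instance (the entries of x here are made up for illustration), using the matrix A ∈ R^{2×4} from Section 1.1:
$$Ax = \begin{bmatrix} 7.1 & 2.0 & 0.0 & 1.3 \\ 5.4 & 5.0 & 3.9 & 2.3 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 7.1(1) + 2.0(2) + 0.0(0) + 1.3(1) \\ 5.4(1) + 5.0(2) + 3.9(0) + 2.3(1) \end{bmatrix} = \begin{bmatrix} 12.4 \\ 17.7 \end{bmatrix}$$
so a matrix in R^{2×4} takes a vector of length 4 to a vector of length 2.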

1.3.3 Identity elements and the matrix inverse


Let us now make a small digression and discuss some special vectors and matrices. There are
zero vectors and matrices which are the additive identity; adding zero to any other vector or
matrix leaves it unchanged. We say that there are multiple zero vectors because differently sized
zero vectors are distinct objects which cannot be interchanged. Similarly, there are multiplicative
identity matrices, denoted by I_n, but usually we don’t write the dimension because it is obvious
how large it should be. An identity matrix is a square matrix that is zero everywhere except for
ones along the diagonal. For example,
 
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Multiplication by the identity matrix is commutative: AI = IA = A.
With these identity elements defined, it is clear that the additive inverse is simply obtained
by negation. The multiplicative inverse, the closest thing to division, is only well defined for
square matrices. The inverse of a matrix A, when it exists, is denoted A−1 , with the property
that A−1 A = AA−1 = I. Note that not every square matrix has an inverse; the zero matrix, for
example, does not, for the same reason that you cannot divide by zero. When an inverse exists
however, it is unique.
The inverse of a product of matrices is (AB)^{-1} = B^{-1}A^{-1} (note the order changes). It is easy
to see why this must be: (AB)^{-1}AB = I, so right-multiply each side by B^{-1} to obtain
(AB)^{-1}ABB^{-1} = (AB)^{-1}A = B^{-1}. Then right-multiply by A^{-1} to get the result.
There also exists such a thing as a pseudo-inverse for non-square matrices, but we will not
encounter the concept until near the end. So far we have only defined the inverse; we do not yet know
how to compute it.
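
As a simple illustrative check of the definition (not a general recipe for computing inverses), a diagonal matrix with nonzero diagonal entries is inverted by inverting each entry:
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/4 \end{bmatrix}, \qquad A^{-1}A = AA^{-1} = I_2.$$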

1.3.4 Manipulating matrix shapes and elements


When treating a matrix as simply a grid of numbers, we can perform several operations on it.
The most common is transposition. The transpose of a matrix A is denoted AT . If A ∈ Rm×n ,
then AT ∈ Rn×m (the number of rows and columns is swapped). The entries of the transpose are
(AT )ij = Aji ; we flip all elements across the diagonal. The transpose of a product of matrices
follows the same rule as the inverse of a product; the order gets reversed.
We define the operator diag(A), operating on matrices, to produce the vector made from the
elements of the diagonal of A. Therefore, if A ∈ R^{m×n}, then diag(A) ∈ R^{min(m,n)} and diag(A)_i = A_{ii}.
With this operator, it should be obvious that diag(A) = diag(A^T). We also define the operator
diag(x), operating on vectors, to produce the square matrix of all zeros, but with the entries of x
on the diagonal. Therefore, if x ∈ R^n, then diag(x) ∈ R^{n×n}, and diag(x)_{ij} = x_i if i = j, otherwise 0.
We can play games now: diag(diag(x)) = x, but diag(diag(A)) ≠ A unless A was diagonal.
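
For example, taking the matrix A ∈ R^{2×4} from Section 1.1 and an illustrative vector x = (3, 5):
$$\mathrm{diag}(A) = \begin{bmatrix} 7.1 \\ 5.0 \end{bmatrix}, \qquad \mathrm{diag}(x) = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}, \qquad \mathrm{diag}(\mathrm{diag}(x)) = \begin{bmatrix} 3 \\ 5 \end{bmatrix} = x.$$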
One will frequently also encounter the symbols 1 and δij . The symbol 1 represents the vector
of all ones (like the zero vector, but with ones instead of zeros). We can then represent the identity
matrix as I = diag(1). The symbol δ_ij is the Kronecker delta symbol, and it takes the value 1
if i = j, and 0 otherwise. The elements of the identity matrix are I_ij = δ_ij, and we can further
redefine the elements of the diag operator on vectors as diag(x)_ij = x_i δ_ij.
Frequently when working with many equations involving matrices, it is convenient to lump them
together into a single equation. It is implied that when we write
 
$$F = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

where A, B, C, and D are matrices (or vectors), that F is a matrix whose four blocks are simply
the properly tiled elements of its four submatrices. We also assume in writing this that the sizes of
the component submatrices are compatible (A and B have the same number of rows, as do C and
D, and that A and C have the same number of columns, as do B and D). We can define matrix
multiplication recursively using this kind of divide-and-conquer approach:
    
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} P & Q \\ R & S \end{bmatrix} = \begin{bmatrix} AP + BR & AQ + BS \\ CP + DR & CQ + DS \end{bmatrix}$$
Note that the order in which we wrote all the symbols in the products matters.

1.4 The unfamiliar arithmetic
The previous section provided analogues to the usual arithmetic operations we perform on scalars,
as well as basic operations for manipulating the shapes and arrangements of 2D grids of numbers.
In this section, we introduce several additional operations which have no simple scalar analogue.

1.4.1 The scalar product of two vectors


Commonly called the “dot product” of two vectors, the scalar product takes two vectors of identical
size and produces a single scalar number. We will use these two names interchangeably, although
it is also often called an inner product. For two vectors x, y ∈ Rn , the scalar product is written
x · y and defined by:
$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$

It is clearly commutative (x · y = y · x), distributive (x · (y + z) = x · y + x · z), and linear in
each argument. The dot product loosely determines how much of one vector points in the direction
of another. If two vectors are roughly pointing in the same direction, their dot product will be
large and positive. If they roughly point in opposite directions, their dot product will be large and
negative.
In performing matrix multiplication, you must perform one dot product per entry of the result;
the dot product is between one row of the first matrix and one column of the second.
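
The definition translates directly into code; here is a minimal C sketch (the function name dot is just illustrative):

/* Minimal sketch: scalar (dot) product of two length-n vectors stored
   as plain C arrays. */
double dot(int n, const double *x, const double *y) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

For the vector x = (3.4, 6.1, 1.2) from Section 1.1 and an illustrative y = (1, 0, 1), this returns 3.4 + 0 + 1.2 = 4.6.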

1.4.2 The norm of a vector


The “norm” of a vector is a measure of its length (the length of the visualized arrow, not the
number of elements). The norm of a vector x is denoted ‖x‖ = √(x · x). You can recognize this as
the usual Euclidean distance formula from the origin to a point with coordinates x, and therefore we
always have ‖x‖ ≥ 0. A vector is normalized if ‖x‖ = 1, and we may normalize any nonzero vector
with x/‖x‖.
Two vectors are orthogonal if x · y = 0, a generalization of the notion of perpendicularity. For
example, in two dimensions, orthogonal vectors are simply perpendicular to each other.
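
For instance (made-up numbers):
$$x = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad \|x\| = \sqrt{3^2 + 4^2} = 5, \quad \frac{x}{\|x\|} = \begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \end{bmatrix} = -2 + 2 = 0,$$
so the first vector normalizes to a unit vector, and the last two vectors are orthogonal.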

1.4.3 Vector projection


You may already have been exposed to the dot product in plane geometry, where if two vectors are
separated by an angle θ, their dot product is x · y = ‖x‖ ‖y‖ cos θ. We may extend this notion to
higher dimensions by defining the angle θ between two vectors to be
$$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|}$$

At this point, we can more clearly define our notion of how much of one vector lies in the direction
of another by defining the projection operator:
 
$$\mathrm{proj}_u v = \frac{u \cdot v}{\|u\|^2}\, u = \left( v \cdot \frac{u}{\|u\|} \right) \frac{u}{\|u\|}$$

[Figure content: v decomposed into proj_u v along u and the remainder v − proj_u v, with the angle θ between u and v and the length ‖u‖ marked.]
Figure 1.3: Illustration of projection and norm in R2 .

It is the amount of the vector v in the direction of u; note that the last equality makes clear that we
dot v against the normalized version of u, obtaining the length of v along the u direction,
and then multiply that length by the unit vector in the u direction.
What if we wanted to obtain the component of v that is not in the direction of u? In other
words, we are interested in the part of v which is orthogonal to u. This is simply v − proju v;
subtract the projected component from the original vector and you get the orthogonal part.
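
A quick illustrative example (made-up numbers): take u = (1, 0) and v = (2, 3). Then
$$\mathrm{proj}_u v = \frac{u \cdot v}{\|u\|^2}\, u = \frac{2}{1} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \qquad v - \mathrm{proj}_u v = \begin{bmatrix} 0 \\ 3 \end{bmatrix},$$
and the remainder is indeed orthogonal to u.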
The idea of projection, and more generally inner products, finds use in many other fields, where
linear algebra is not practiced with grids of numbers. The vector spaces involved might be composed
of infinite families of functions, or more abstract objects (as long as they satisfy the requirements
of a mathematical “field”). In all of these cases, the inner product generalizes the dot product to
computing a kind of “overlap” between two objects; the larger their overlap, the more similar they
are, and orthogonality means that they don’t overlap. But I digress; all I mean to say is that these
concepts are much more general and you will likely encounter them elsewhere.

1.5 Numerical Algorithm 0: The BLAS


Invariably, if you want to perform matrix arithmetic computationally, you will encounter the BLAS,
short for the Basic Linear Algebra Subprograms. The BLAS is a well defined interface (a set
of function declarations) for performing operations such as vector addition, multiplication, dot
products, and many other operations. They are organized by the time complexity of the operation,
called the level of the BLAS operation. BLAS level 1 deals with operations which take time
proportional to the length of their input vectors, BLAS level 2 deals with matrix-vector products
(which take time proportional to the number of elements in the matrix), and BLAS level 3 deals
with things like matrix-matrix multiplication. A brief overview of the routines can be found by
searching for “BLAS Quick Reference”. We will provide some examples of using it, and describe
the rationale behind several interface choices.

1.5.1 Data storage format
Vectors are typically stored as contiguous arrays of values. A vector is very much like a C array,
in that it is identified by the address of its first element. A vector x ∈ R^6 would be stored in
the C equivalent of double x[6] (or float). However, often we must pick out vectors as parts of
matrices, in which case the values may not be contiguous, but are merely spaced evenly. The BLAS
associates with every vector an increment, which is the offset in memory between adjacent elements
of the vector. Usually, the increment is 1, meaning that elements are stored contiguously without
intermediate space.
Matrices are two dimensional arrays, which are stored in linear memory one column after
another. Just like in C, when given an array (a pointer to the first element) you cannot tell if the
array stores a vector or a matrix; the interpretation of the data is up to you, and you just have
to remember it. It is important that the matrix is stored in column-major format, that is, the
first column is stored as a vector, then the second column immediately follows it, etc. Therefore,
a matrix A ∈ R3×4 would be stored in the C equivalent of double A[12], and to get to the (i, j)
element (where i and j are 1-based row and column numbers), we would index (0-based offsets) as
A[(i-1)+(j-1)*3]. Note that in order to properly index into the array, we needed to know the
number of rows of the matrix A. This concept is generalized into the “leading dimension” which is
the offset in memory between adjacent elements in a row. The leading dimension is not always equal
to the number of rows; this happens when we wish to work with submatrices, where the
number of rows of the submatrix is less than the leading dimension of the underlying storage.
[Figure content: an m × n matrix A stored column by column, with a p × q submatrix B embedded inside it.]
Figure 1.4: Storage format expected by the BLAS for a matrix A ∈ Rm×n and a submatrix
B ∈ Rp×q . The leading dimension of A is simply m, while for B it is also m, and not p.

In order to obtain the i-th row of A as a vector, we would refer to the address of the first element
&A[(i-1)+0*3], and the corresponding vector increment is the leading dimension of A, which is 3.
This is the most common reason for increments not equal to 1.
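
To make the indexing concrete, here is a small C sketch; the IDX macro is purely illustrative (it is not part of the BLAS) and converts 1-based (i, j) indices into a 0-based offset into column-major storage with leading dimension ld.

#include <stdio.h>

/* Illustrative macro (not part of the BLAS): 1-based (i, j) to a 0-based
   offset into column-major storage with leading dimension ld. */
#define IDX(i, j, ld) (((i) - 1) + ((j) - 1) * (ld))

int main(void) {
    double A[12] = {0};                /* A in R^{3x4}, leading dimension 3  */
    A[IDX(2, 3, 3)] = 0.5;             /* set the (2,3) element              */
    double *row2 = &A[IDX(2, 1, 3)];   /* 2nd row: this address, increment 3 */
    printf("A(2,3) = %g, row 2 starts at %g\n", A[IDX(2, 3, 3)], row2[0]);
    return 0;
}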
For C++ programmers, a gut reaction is often to immediately wrap up vectors and matrices
into classes, with proper data hiding, accessors, and operator overloading. This is a reasonable
thing to do as long as the kinds of operations you wish to perform are limited to a small set. The
trouble is that the language of linear algebra is very rich, and the kinds of operations permissible
are varied. For example, you may overload the multiplication operator * to operate on matrix-
vector pairs. But now, what if you need to compute A^T x, or for complex numbers, A^H x? You
could create a more complicated interface to write things like transpose(A)*x, or you can imagine
making multiplication a method of the matrix class. You will find that this quickly gets out of
hand, and in order to not lose efficiency, you have to jump through many hoops leaving you with
spaghetti code. For the purposes of this course, I stick with the plain old Fortran storage format,
and you simply annotate your code with the interpretations of pointers to vectors and matrices.
It’s clunky, but at the end of the day, simple and performant. I have found after many years of
fighting against matrix libraries that this plain approach suits me better.

1.5.2 Level 1
The BLAS was conceived in the early days of Fortran, when subroutine names were limited to
around 6 characters, hence some of the names are a bit cryptic. Fortran and C lack function over-
loading, so the first letter of each subroutine designates the number type of the data. Subroutines
starting with s operate on floats (32-bit IEEE floating point numbers), while those starting with
d operate on doubles (64-bit IEEE floating point numbers). There are also corresponding complex
routines starting with c and z, respectively.
Level 1 BLAS contains routines to copy, swap, scale, and add vectors. Keep in mind that due
to the storage format of matrices, these routines work on matrices only if their leading dimension is
equal to the number of rows; otherwise you must operate separately on each column of the matrix. Also
in Level 1 are dot products, norms, and a routine to determine the index of the largest (in magnitude)
element of a vector. You should keep in mind that these indexes are 1-based, since Fortran uses 1-based
of a vector. You should keep in mind that these indexes are 1-based, since Fortran uses 1-based
indexing. The programming language of choice in this course is C, in which 0-based offsets are
most natural, so you need to take care when calling into the BLAS that you refer to the proper
indexes, or else you will generate segmentation faults.
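
For concreteness, here is a small sketch using the C interface to the BLAS (CBLAS); it assumes a cblas.h header and a BLAS library to link against, which is how most packaged reference implementations ship. You could equally call the Fortran routines daxpy and ddot directly.

#include <stdio.h>
#include <cblas.h>   /* assumed: the CBLAS header shipped with your BLAS */

int main(void) {
    double x[3] = {3.4, 6.1, 1.2};
    double y[3] = {1.0, -1.0, 2.0};
    /* Level 1: y = 2*x + y (daxpy), then the dot product x . y (ddot). */
    cblas_daxpy(3, 2.0, x, 1, y, 1);
    double d = cblas_ddot(3, x, 1, y, 1);
    printf("dot = %g\n", d);
    return 0;
}

The trailing 1 arguments are the vector increments discussed earlier; to operate on a row of a column-major matrix you would pass the leading dimension as the increment instead.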

1.5.3 Levels 2 and 3


Level 2 contains matrix-vector products, back substitution routines, and low rank updates. The
second and third letters indicate the matrix format since the BLAS contains optimized routines for
matrices with special structure. Matrices without any structure are general matrices, with the two
letter acronym GE. Triangular matrices are stored in either the upper or lower half of a rectangular
matrix, and have the acronym TR. There are others, but we will not work with anything other
than the general routines. For structured matrices, one must also indicate the storage format (e.g.
upper or lower for triangular matrices). Level 3 contains matrix-matrix products, as well as back
substitution and low rank updates.
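
As a sketch of a Level 2 call (again assuming the CBLAS interface), the general matrix-vector product dgemv computes y = αAx + βy; here it is applied to the matrix A from Section 1.1 stored column-major. The Level 3 routine dgemm for C = αAB + βC takes analogous arguments.

#include <stdio.h>
#include <cblas.h>   /* assumed: the CBLAS header shipped with your BLAS */

int main(void) {
    /* A in R^{2x4} from Section 1.1, stored column by column, leading dimension 2. */
    double A[8] = { 7.1, 5.4,   2.0, 5.0,   0.0, 3.9,   1.3, 2.3 };
    double x[4] = { 1.0, 2.0, 0.0, 1.0 };
    double y[2] = { 0.0, 0.0 };
    /* GE = general matrix; y = 1.0*A*x + 0.0*y */
    cblas_dgemv(CblasColMajor, CblasNoTrans, 2, 4, 1.0, A, 2, x, 1, 0.0, y, 1);
    printf("y = (%g, %g)\n", y[0], y[1]);   /* prints 12.4 and 17.7 */
    return 0;
}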

1.5.4 Why use the BLAS?


With the knowledge of how matrix-matrix multiplication works, you might ask why anyone would
use the BLAS when the following code works just fine:

/* Computes C = A*B for column-major A (m x n), B (n x p), C (m x p),
   with leading dimensions ldA, ldB, ldC and 0-based loop indices i, j, k. */
for(i = 0; i < m; ++i){
    for(j = 0; j < p; ++j){
        C[i+j*ldC] = 0;
        for(k = 0; k < n; ++k) C[i+j*ldC] += A[i+k*ldA] * B[k+j*ldB];
    }
}

The code is simple, short, and does not require linking against ancient Fortran libraries.
The reason to use the BLAS is that you can often achieve several orders of magnitude of
speedup. Tuned BLAS implementations can squeeze nearly the theoretical maximum achievable
performance out of a machine, something that the above simple code cannot do. If you search for the
BLAS, you will find the official site contains a reference implementation, which probably performs
comparably to the above code. However, if you use the ATLAS package, GotoBLAS, Intel’s MKL,
or AMD’s ACML, you will find a substantial improvement in performance. The MKL is not free,
and ACML is only optimized for AMD processors. ATLAS and GotoBLAS are notoriously difficult
to compile. Despite the remarkable maturity of numerical linear algebra, from a user standpoint,
it is still quite formidable. We will be satisfied using just the reference implementation since it is
reasonably easy to get working, and packaged versions are available on most operating systems.

1.6 Conclusion
Throughout this section we have been dancing around the idea that vectors and matrices belong to
vector spaces. The formal notion of a vector space places requirements on the operations between
its elements. For example, a vector space must be closed under linear combinations (which requires
our definitions of addition and scalar multiplication). A vector space can also be equipped with a norm
(in the mathematical sense), and the norm we defined above satisfies the requirements. The formal notion of a vector space merely codifies these
properties that we have so far attributed to vectors and matrices of numbers, and allows these
concepts to be applied to more general objects. If you already knew about vector spaces, then all
the properties of matrix multiplication should have been evident to you. We are only focusing on
vector spaces of real and complex numbers, so much of the definitional baggage is overkill. I just
want you to be aware that it is there.
