
Least squares and the singular value decomposition
Ivan Markovsky
University of Southampton
Outline
QR and SVD decompositions
Least squares and least norm problems
Extensions of the least squares problem
Recursive
Multiobjective
Regularized
Constrained
QR and SVD decompositions
Orthonormal set of vectors
Consider a finite set of vectors $Q := \{ q_1, \dots, q_k \} \subset \mathbb{R}^n$

$Q$ is orthogonal: $\langle q_i, q_j \rangle := q_i^\top q_j = 0$, for all $i \neq j$

$Q$ is normalized: $\|q_i\|_2^2 := \langle q_i, q_i \rangle = 1$, $i = 1, \dots, k$

$Q$ is orthonormal: $Q$ is orthogonal and normalized

With $Q := \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}$: $Q$ orthonormal $\iff Q^\top Q = I_k$

Properties:

orthonormal vectors are independent

multiplication with $Q$ preserves inner product and norm:
$$\langle Qz, Qy \rangle = z^\top Q^\top Q y = z^\top y = \langle z, y \rangle$$
Orthogonal projectors
Consider an orthonormal set $Q := \{ q_1, \dots, q_k \}$ and $L := \mathrm{span}(Q) \subseteq \mathbb{R}^n$.

$Q$ is an orthonormal basis for $L$.

With $Q := \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}$, $Q^\top Q = I_k$; however, for $k < n$, $QQ^\top \neq I_n$.

$\Pi_{\mathrm{span}(Q)} := QQ^\top$ is an orthogonal projector onto $\mathrm{span}(Q)$, i.e.,
$$\Pi_L\, x = \arg\min_y \|x - y\|_2 \quad \text{subject to} \quad y \in L$$

Properties: $\Pi^2 = \Pi$, $\Pi = \Pi^\top$ (necessary and sufficient for an orthogonal projector)

$\Pi^\perp := I - \Pi$ is also an orthogonal projector; it projects onto $\big(\mathrm{col\,span}(\Pi)\big)^\perp \subseteq \mathbb{R}^n$, the orthogonal complement of the column span of $\Pi$.
Orthonormal basis for $\mathbb{R}^n$

Consider an orthonormal set $Q := \{ q_1, \dots, q_k \} \subset \mathbb{R}^n$ of $k = n$ vectors.

Then $Q := \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}$ is called orthogonal and satisfies $Q^\top Q = I_n$.

It follows that $Q^{-1} = Q^\top$ and
$$QQ^\top = \sum_{i=1}^n q_i q_i^\top = I_n$$

Expansion in an orthonormal basis: $x = QQ^\top x$

$a := Q^\top x$ gives the coordinates of $x$ in the basis $Q$

$x = Qa$ reconstructs $x$ from the coordinates $a$

Geometrically, multiplication by $Q$ (and $Q^\top$) is a rotation.
Gram-Schmidt (G-S) procedure
Given an independent set $a_1, \dots, a_k \in \mathbb{R}^n$,
G-S produces an orthonormal set $q_1, \dots, q_k \in \mathbb{R}^n$ such that
$$\mathrm{span}(a_1, \dots, a_r) = \mathrm{span}(q_1, \dots, q_r), \quad \text{for all } r \leq k$$

G-S procedure: Let $q_1 := a_1 / \|a_1\|_2$. At the $i$th step, $i = 2, \dots, k$:

orthogonalize $a_i$ w.r.t. $q_1, \dots, q_{i-1}$:
$$v_i := \big( I - \Pi_{\mathrm{span}(q_1, \dots, q_{i-1})} \big)\, a_i \qquad \text{(the projection of } a_i \text{ on } \big(\mathrm{span}(q_1, \dots, q_{i-1})\big)^\perp\text{)}$$

normalize the result: $q_i := v_i / \|v_i\|_2$
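As an illustration (not part of the original slides), a minimal numpy sketch of the procedure above; the tolerance `tol` used to decide that $v_i = 0$ is an assumption, and columns with $v_i = 0$ are skipped, anticipating the modified procedure described below.

```python
import numpy as np

def gram_schmidt(A, tol=1e-12):
    """Orthonormalize the columns of A by the G-S procedure.

    Returns Q whose columns are an orthonormal basis of col span(A).
    Columns that are numerically dependent on the previous ones are skipped.
    """
    Q = []
    for a in A.T:                       # loop over the columns a_1, ..., a_k
        v = a.astype(float)
        for q in Q:                     # subtract the projection on span(q_1, ..., q_{i-1})
            v = v - (q @ a) * q
        if np.linalg.norm(v) > tol:     # skip (numerically) dependent columns
            Q.append(v / np.linalg.norm(v))
    return np.column_stack(Q)

A = np.random.default_rng(0).standard_normal((6, 3))
Q = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: the columns of Q are orthonormal
```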
QR decomposition
The G-S procedure gives as a byproduct scalars $r_{ji}$, $j \leq i$, $i = 1, \dots, k$, such that
$$a_i = (q_1^\top a_i)\, q_1 + \cdots + (q_{i-1}^\top a_i)\, q_{i-1} + \|v_i\|_2\, q_i = r_{1i}\, q_1 + \cdots + r_{ii}\, q_i$$

In matrix form, G-S produces the matrix decomposition
$$\underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_k \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ 0 & r_{22} & \cdots & r_{2k} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{kk} \end{bmatrix}}_{R}$$

with orthonormal $Q \in \mathbb{R}^{n \times k}$ and upper triangular $R \in \mathbb{R}^{k \times k}$.
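In practice the factorization is computed by a library routine rather than by G-S. A small numpy check (an illustration added here, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # tall matrix, full column rank with probability one
Q, R = np.linalg.qr(A)                   # reduced QR: Q is 5x3, R is 3x3

print(Q.shape, R.shape)                  # (5, 3) (3, 3)
print(np.allclose(Q.T @ Q, np.eye(3)))   # Q has orthonormal columns
print(np.allclose(R, np.triu(R)))        # R is upper triangular
print(np.allclose(A, Q @ R))             # A = Q R
```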
If $a_1, \dots, a_k$ are dependent, then $v_i := \big( I - \Pi_{\mathrm{span}(q_1, \dots, q_{i-1})} \big) a_i = 0$ for some $i$.

Conversely, if $v_i = 0$ for some $i$, then $a_i$ is linearly dependent on $a_1, \dots, a_{i-1}$.

Modified G-S procedure: when $v_i = 0$, skip to the next input vector $a_{i+1}$.

$\implies$ $R$ is in upper staircase form (the empty elements are zeros).
Full QR
$$A = \underbrace{\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}}_{\text{orthogonal}} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

$\mathrm{col\,span}(A) = \mathrm{col\,span}(Q_1)$

$\big(\mathrm{col\,span}(A)\big)^\perp = \mathrm{col\,span}(Q_2)$

Procedure for finding $Q_2$:
complete $A$ to a full-rank matrix, e.g., $A_{\rm m} := \begin{bmatrix} A & I \end{bmatrix}$, and apply G-S on $A_{\rm m}$.

Application: complete an orthonormal matrix $Q_1 \in \mathbb{R}^{n \times k}$ to an orthogonal matrix $Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \in \mathbb{R}^{n \times n}$ (by computing the full QR of $\begin{bmatrix} Q_1 & I \end{bmatrix}$).
Singular value decomposition (SVD)
The SVD is used as both a computational and an analytical tool.

Any $m \times n$ matrix $A$ of rank $r$ has a reduced SVD
$$A = \underbrace{\begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}}_{U_1} \underbrace{\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix}}_{\Sigma_1} \underbrace{\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix}^\top}_{V_1^\top}$$

where $U_1$ and $V_1$ are orthonormal

$\sigma_1 \geq \cdots \geq \sigma_r$ are called singular values

$u_1, \dots, u_r$ are called left singular vectors

$v_1, \dots, v_r$ are called right singular vectors
Full SVD: $A = U \Sigma V^\top$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \text{where } \Sigma_1 = \mathrm{diag}(\sigma_1, \dots, \sigma_r) \in \mathbb{R}^{r \times r}$$
(the zero blocks have $n - r$ columns and $m - r$ rows).

Note that the singular values of $A$ are
$$\sigma(A) := \big\{ \sigma_1, \dots, \sigma_r, \underbrace{0, \dots, 0}_{\min(n-r,\, m-r)} \big\}$$

$\sigma_{\min}(A)$: the smallest singular value of $A$

$\sigma_{\max}(A)$: the largest singular value of $A$
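A quick numpy illustration of the reduced ("thin") versus full SVD (an addition, not from the slides; `full_matrices` selects the form):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

# full SVD: U is 6x6, V is 4x4, the singular values are returned as a vector
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)        # (6, 6) (4,) (4, 4)

# "thin" SVD: U1 has min(m, n) = 4 columns; for a full-rank A this is the reduced SVD
U1, s1, V1t = np.linalg.svd(A, full_matrices=False)
Sigma1 = np.diag(s1)
print(np.allclose(A, U1 @ Sigma1 @ V1t))  # A = U1 Sigma1 V1^T
```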
Proof of existence of an SVD
The proof is constructive and uses induction. W.l.o.g. assume $m \geq n$.

End of induction: a vector $A \in \mathbb{R}^{m \times 1}$ has the (unique) SVD
$$A = U \Sigma V^\top, \quad \text{with} \quad U := A / \|A\|_2, \quad \Sigma := \|A\|_2, \quad V := 1$$

Inductive step: choose $v_i \in \mathbb{R}^n$ with $\|v_i\|_2 = 1$ achieving $\|A_i v_i\|_2 = \|A_i\|_2$ and let
$$A_i v_i =: \sigma_i u_i, \quad \text{where} \quad \sigma_i := \|A_i\|_2$$

Complete $v_i$ and $u_i$ to orthogonal matrices (QR decomposition)
$$V_i := \begin{bmatrix} v_i & \tilde V_i \end{bmatrix} \quad \text{and} \quad U_i := \begin{bmatrix} u_i & \tilde U_i \end{bmatrix}$$

We have that, for certain $w \in \mathbb{R}^{n-1}$ and $A_{i+1} \in \mathbb{R}^{(m-1) \times (n-1)}$,
$$U_i^\top A_i V_i = \begin{bmatrix} \sigma_i & w^\top \\ 0 & A_{i+1} \end{bmatrix}$$

Next we show that $w = 0$.
Proof of existence of an SVD

$$\sigma_i^2 = \|A_i\|_2^2 = \|U_i^\top A_i V_i\|_2^2 = \max_{v \neq 0} \frac{\|U_i^\top A_i V_i\, v\|_2^2}{\|v\|_2^2} \geq \frac{\Big\| U_i^\top A_i V_i \begin{bmatrix} \sigma_i \\ w \end{bmatrix} \Big\|_2^2}{\Big\| \begin{bmatrix} \sigma_i \\ w \end{bmatrix} \Big\|_2^2} = \frac{1}{\sigma_i^2 + w^\top w} \left\| \begin{bmatrix} \sigma_i^2 + w^\top w \\ A_{i+1} w \end{bmatrix} \right\|_2^2 \geq \frac{(\sigma_i^2 + w^\top w)^2}{\sigma_i^2 + w^\top w} = \sigma_i^2 + w^\top w$$

The inequality $\sigma_i^2 \geq \sigma_i^2 + w^\top w$ can be true only when $w = 0$. $\blacksquare$
Geometric fact motivating the SVD
The image of the unit ball under a linear map is a hyperellipsoid.
$$\underbrace{\begin{bmatrix} 1.00 & 1.50 \\ 0 & 1.00 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 0.89 & -0.45 \\ 0.45 & 0.89 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 2.00 & 0 \\ 0 & 0.50 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} 0.45 & 0.89 \\ -0.89 & 0.45 \end{bmatrix}}_{V^\top}$$

[Figure: the unit circle, with the right singular vectors $v_1$, $v_2$ marked, is mapped by $A$ to an ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$.]
Low-rank approximation
Given

a matrix $A \in \mathbb{R}^{m \times n}$, $m \geq n$, and

an integer $r$, $0 < r < n$,

find
$$\widehat{A}^* := \arg\min_{\widehat{A}} \|A - \widehat{A}\| \quad \text{subject to} \quad \mathrm{rank}(\widehat{A}) \leq r$$

Interpretation:
$\widehat{A}^*$ is an optimal rank-$r$ approximation of $A$ w.r.t. the norm $\|\cdot\|$, e.g.,
$$\|A\|_{\rm F}^2 := \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \qquad \text{or} \qquad \|A\|_2 := \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}$$
Solution via SVD

$$\widehat{A}^* := \arg\min_{\widehat{A}} \|A - \widehat{A}\|_{\rm F} \quad \text{subject to} \quad \mathrm{rank}(\widehat{A}) \leq r \qquad \text{(LRA)}$$

Theorem. Let $A = U \Sigma V^\top$ be the SVD of $A$ and partition
$$U =: \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \qquad \Sigma =: \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \qquad V =: \begin{bmatrix} V_1 & V_2 \end{bmatrix},$$
where $U_1$ and $V_1$ have $r$ columns and $\Sigma_1$ is $r \times r$.

A solution to (LRA) is
$$\widehat{A}^* = U_1 \Sigma_1 V_1^\top$$

It is unique if and only if $\sigma_r \neq \sigma_{r+1}$.
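Before the proof, a numerical check of the theorem (an illustrative sketch, not part of the slides): truncate the SVD after $r$ terms and compare the Frobenius error with $\sqrt{\sum_{i > r} \sigma_i^2}$, the value quoted on the numerical-rank slide further below.

```python
import numpy as np

def lowrank_approx(A, r):
    """Optimal rank-r approximation of A in the Frobenius (and 2-) norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
r = 2
Ahat = lowrank_approx(A, r)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.linalg.matrix_rank(Ahat))                 # 2
print(np.isclose(np.linalg.norm(A - Ahat, 'fro'),
                 np.sqrt(np.sum(s[r:] ** 2))))     # error = sqrt(sum of discarded sigma_i^2)
```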
Proof of the low-rank approximation theorem
Let $\widehat{A}^*$ be a solution to (LRA) and let $\widehat{A}^* = \bar U \bar\Sigma \bar V^\top$ be an SVD of $\widehat{A}^*$.
$$\|A - \widehat{A}^*\|_{\rm F} = \|\underbrace{\bar U^\top A \bar V}_{B} - \bar\Sigma\|_{\rm F} \quad \implies \quad \bar\Sigma \text{ is an optimal approximation of } B$$

Partition $B =: \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$ conformably with $\bar\Sigma =: \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}$ and observe that
$$\mathrm{rank}\Big(\begin{bmatrix} \bar\Sigma_1 & B_{12} \\ 0 & 0 \end{bmatrix}\Big) \leq r \ \text{ and } \ B_{12} \neq 0 \ \implies\ \Big\| B - \begin{bmatrix} \bar\Sigma_1 & B_{12} \\ 0 & 0 \end{bmatrix} \Big\|_{\rm F} < \Big\| B - \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \Big\|_{\rm F}$$
so that $B_{12} = 0$. Similarly $B_{21} = 0$. Observe also that
$$\mathrm{rank}\Big(\begin{bmatrix} B_{11} & 0 \\ 0 & 0 \end{bmatrix}\Big) \leq r \ \text{ and } \ B_{11} \neq \bar\Sigma_1 \ \implies\ \Big\| B - \begin{bmatrix} B_{11} & 0 \\ 0 & 0 \end{bmatrix} \Big\|_{\rm F} < \Big\| B - \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \Big\|_{\rm F}$$
so that $B_{11} = \bar\Sigma_1$. Therefore, $B = \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & B_{22} \end{bmatrix}$.
Proof of the low-rank approximation theorem
Let $B_{22} = U_{22} \Sigma_{22} V_{22}^\top$ be the SVD of $B_{22}$. Then the matrix
$$\begin{bmatrix} I & 0 \\ 0 & U_{22}^\top \end{bmatrix} B \begin{bmatrix} I & 0 \\ 0 & V_{22} \end{bmatrix} = \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & \Sigma_{22} \end{bmatrix}$$
has optimal rank-$r$ approximation $\bar\Sigma = \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}$, so that
$$\min\big(\mathrm{diag}(\bar\Sigma_1)\big) \geq \max\big(\mathrm{diag}(\Sigma_{22})\big)$$

Therefore
$$A = \bar U \begin{bmatrix} I & 0 \\ 0 & U_{22} \end{bmatrix} \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & V_{22} \end{bmatrix}^\top \bar V^\top$$
is an SVD of $A$.
Proof of the low-rank approximation theorem
SVD of $A$:
$$A = \bar U \begin{bmatrix} I & 0 \\ 0 & U_{22} \end{bmatrix} \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & V_{22} \end{bmatrix}^\top \bar V^\top$$

Then, if $\sigma_r > \sigma_{r+1}$, the rank-$r$ SVD truncation
$$\widehat{A} = \bar U \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \bar V^\top = \bar U \begin{bmatrix} I & 0 \\ 0 & U_{22} \end{bmatrix} \begin{bmatrix} \bar\Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & V_{22} \end{bmatrix}^\top \bar V^\top$$
is unique and $\widehat{A}^*$ is the unique solution of (LRA). $\blacksquare$

Note that $\widehat{A}^*$ is simultaneously optimal in any unitarily invariant norm.
Numerical rank

$$\sqrt{\textstyle\sum_{i = r+1}^{\min(m,n)} \sigma_i^2} \;=\; \min_{\widehat{A}} \|A - \widehat{A}\|_{\rm F} \quad \text{subject to} \quad \mathrm{rank}(\widehat{A}) \leq r$$
and
$$\sigma_{r+1} = \min_{\widehat{A}} \|A - \widehat{A}\|_2 \quad \text{subject to} \quad \mathrm{rank}(\widehat{A}) \leq r$$
are measures of the distance of $A$ to the manifold of rank-$r$ matrices.

In particular, $\sigma_{\min}(A)$ is the distance of $A$ to rank deficiency.

$\mathrm{rank}(A, \varepsilon) :=$ the number of singular values of $A$ larger than $\varepsilon$ is called the numerical rank of $A$.

Note that $\mathrm{rank}(A, \varepsilon)$ depends on an a priori given tolerance $\varepsilon$.
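A direct implementation of the definition (illustrative; the example matrix and the tolerances are made up):

```python
import numpy as np

def numerical_rank(A, tol):
    """Number of singular values of A that are larger than tol."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# a nearly rank-1 matrix: the second singular value is of the order 1e-9
A = np.outer([1.0, 2.0, 3.0], [1.0, 1.0]) + 1e-9 * np.eye(3, 2)
print(numerical_rank(A, 1e-6))    # 1 (numerical rank for tolerance 1e-6)
print(numerical_rank(A, 1e-12))   # 2 (the exact rank)
```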
Pseudo-inverse $A^+ := V_1 \Sigma_1^{-1} U_1^\top \in \mathbb{R}^{n \times m}$

$\mathrm{rank}(A) = n = m \implies A^+ = A^{-1}$

$\mathrm{rank}(A) = n \implies A^+ = (A^\top A)^{-1} A^\top$

$\mathrm{rank}(A) = m \implies A^+ = A^\top (A A^\top)^{-1}$

In general, $A^+ y$ is the least squares, least norm solution of $Ax = y$.

Note that the pseudo-inverse depends on the rank of $A$.
In practice the numerical rank $\mathrm{rank}(A, \varepsilon)$ is used.

The SVD, using the numerical rank and the pseudo-inverse, is the most reliable way of solving $Ax = y$.
It should be used in cases when $A$ is ill-conditioned.
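A sketch of the truncated-SVD pseudo-inverse based on the numerical rank (the tolerance is an assumption; numpy's `np.linalg.pinv` implements the same idea through its `rcond` cutoff):

```python
import numpy as np

def pinv_trunc(A, tol=1e-10):
    """Pseudo-inverse A^+ = V1 Sigma1^{-1} U1^T, keeping only singular values > tol."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol))                  # numerical rank
    return (Vt[:r, :].T / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)
x = pinv_trunc(A) @ y                          # least squares solution of Ax = y
print(np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0]))   # True
```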
Condition number $\kappa(A) := \sigma_{\max}(A) / \sigma_{\min}(A)$

Geometrically, $\kappa(A)$ is the eccentricity of the hyperellipsoid $\{ Ax \mid \|x\|_2 = 1 \}$.

[Figure: ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$.]

$\kappa(A)$ measures the sensitivity of $A^+ y$ to perturbations in $y$ and $A$.

For large $\kappa(A)$ (above a few thousand) $A$ is called ill-conditioned.
Least squares and least norm
Least squares
Consider an overdetermined system of linear equations $Ax = y$.

Problem: given $A \in \mathbb{R}^{m \times n}$, $m > n$, and $y \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$.

For most $A$ and $y$, there is no solution $x$.

Least squares approximation:
choose $x$ that minimizes the 2-norm of the residual (equation error) $e(x) := y - Ax$;
a minimizing $x$ is called a least squares approximate solution
$$\widehat{x}_{\rm ls} := \arg\min_x \| \underbrace{y - Ax}_{e(x)} \|_2$$
Geometric interpretation: project $y$ onto the image of $A$
($\widehat{y}_{\rm ls} := A \widehat{x}_{\rm ls}$ is the projection)
$$e_{\rm ls} := y - A \widehat{x}_{\rm ls} = y - \widehat{y}_{\rm ls} \in \mathbb{R}^m$$

[Figure: $y$ decomposed as $\widehat{y}_{\rm ls} \in \mathrm{col\,span}(A)$ plus the orthogonal residual $e_{\rm ls}$.]
$$A \widehat{x}_{\rm ls} = \widehat{y}_{\rm ls} \quad \iff \quad \begin{bmatrix} A & \widehat{y}_{\rm ls} \end{bmatrix} \begin{bmatrix} \widehat{x}_{\rm ls} \\ -1 \end{bmatrix} = 0 \quad \iff \quad \begin{bmatrix} a_i^\top & \widehat{y}_{{\rm ls},i} \end{bmatrix} \begin{bmatrix} \widehat{x}_{\rm ls} \\ -1 \end{bmatrix} = 0, \ \text{ for } i = 1, \dots, m$$
($a_i^\top$ is the $i$th row of $A$)

$(a_i, \widehat{y}_{{\rm ls},i})$, for all $i$, lies on the subspace perpendicular to $(\widehat{x}_{\rm ls}, -1)$

data point: $(a_i, y_i) = (a_i, \widehat{y}_{{\rm ls},i}) + (0, e_{{\rm ls},i})$

the approximation error $(0, e_{{\rm ls},i})$ is the vertical distance from $(a_i, y_i)$ to the subspace
Another geometric interpretation of the LS approximation:
[Figure: in $\mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R}$, the data point $(a_i, y_i)$ lies at vertical distance $e_{{\rm ls},i}$ from $(a_i, \widehat{y}_{{\rm ls},i})$, which lies on the subspace $\mathrm{null}\big(\begin{bmatrix} \widehat{x}_{\rm ls}^\top & -1 \end{bmatrix}\big)$; the vector $(\widehat{x}_{\rm ls}, -1)$ is normal to this subspace.]
Notes
Assuming $m \geq n = \mathrm{rank}(A)$, i.e., $A$ is full column rank,
$$\widehat{x}_{\rm ls} = (A^\top A)^{-1} A^\top y$$
is the unique least squares approximate solution.

$\widehat{x}_{\rm ls}$ is a linear function of $y$

if $A$ is square, $\widehat{x}_{\rm ls} = A^{-1} y$

$\widehat{x}_{\rm ls}$ is an exact solution if $Ax = y$ has an exact solution

$\widehat{y}_{\rm ls} := A \widehat{x}_{\rm ls} = A (A^\top A)^{-1} A^\top y$ is a least squares approximation of $y$
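A numpy cross-check of the closed-form solution against a library solver (illustrative; in practice `lstsq` or a QR-based solve is preferred over forming $A^\top A$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 4))          # m > n, full column rank with probability one
y = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)      # (A^T A)^{-1} A^T y via the normal equations
x_lstsq  = np.linalg.lstsq(A, y, rcond=None)[0]   # library least squares solver

print(np.allclose(x_normal, x_lstsq))             # True
e = y - A @ x_lstsq
print(np.allclose(A.T @ e, 0))                    # the residual is orthogonal to col span(A)
```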
Projector onto the span of A
The $m \times m$ matrix
$$\Pi_{\mathrm{col\,span}(A)} := A (A^\top A)^{-1} A^\top$$
is the orthogonal projector onto $L := \mathrm{col\,span}(A)$.

The columns of $A$ are an arbitrary basis for $L$.
If the columns of $Q$ form an orthonormal basis for $L$, then
$$\Pi_{\mathrm{col\,span}(Q)} := Q Q^\top$$


Orthogonality principle
The least squares residual vector
$$e_{\rm ls} := y - A \widehat{x}_{\rm ls} = \underbrace{\big( I_m - A (A^\top A)^{-1} A^\top \big)}_{\Pi_{(\mathrm{col\,span}(A))^\perp}} \, y$$
is orthogonal to $\mathrm{col\,span}(A)$:
$$\langle e_{\rm ls}, A x \rangle = y^\top \big( I_m - A (A^\top A)^{-1} A^\top \big) A x = 0, \quad \text{for all } x \in \mathbb{R}^n$$

[Figure: $y$, its projection $\widehat{y}_{\rm ls}$ on $\mathrm{col\,span}(A)$, and the orthogonal residual $e_{\rm ls}$.]
Least squares via QR decomposition
Let $A = QR$ be the QR decomposition of $A$.
$$(A^\top A)^{-1} A^\top = (R^\top Q^\top Q R)^{-1} R^\top Q^\top = (R^\top R)^{-1} R^\top Q^\top = R^{-1} Q^\top$$
so that
$$\widehat{x}_{\rm ls} = R^{-1} Q^\top y \qquad \text{and} \qquad \widehat{y}_{\rm ls} := A \widehat{x}_{\rm ls} = Q Q^\top y$$

Let $A =: \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}$ and consider the sequence of LS problems
$$A_i x_i = y, \qquad \text{where } A_i := \begin{bmatrix} a_1 & \cdots & a_i \end{bmatrix}, \ \text{ for } i = 1, \dots, n$$

Define $R_i$ as the leading $i \times i$ submatrix of $R$ and $Q_i := \begin{bmatrix} q_1 & \cdots & q_i \end{bmatrix}$. Then
$$\widehat{x}_{i,{\rm ls}} = R_i^{-1} Q_i^\top y$$
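The QR route in numpy (a sketch; `np.linalg.solve` is used on the triangular $R$ for self-containedness, although a dedicated triangular solver would be cheaper):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((30, 5))
y = rng.standard_normal(30)

Q, R = np.linalg.qr(A)                    # reduced QR, A = Q R
x_qr = np.linalg.solve(R, Q.T @ y)        # x_ls = R^{-1} Q^T y (R is upper triangular)
y_hat = Q @ (Q.T @ y)                     # y_ls = Q Q^T y, the projection of y

print(np.allclose(x_qr, np.linalg.lstsq(A, y, rcond=None)[0]))  # True
print(np.allclose(y_hat, A @ x_qr))                              # True
```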
Least norm solution
Consider an underdetermined system $Ax = y$, with full rank $A \in \mathbb{R}^{m \times n}$ ($m < n$).

The set of solutions is
$$\{ x \in \mathbb{R}^n \mid A x = y \} = \{ x_{\rm p} + z \mid z \in \mathrm{null}(A) \}$$
where $x_{\rm p}$ is a particular solution, i.e., $A x_{\rm p} = y$.

Least norm problem:
$$x_{\rm ln} := \arg\min_x \|x\|_2 \quad \text{subject to} \quad A x = y$$
Geometric interpretation:

$x_{\rm ln}$ is the projection of $0$ onto the solution set

orthogonality principle: $x_{\rm ln} \perp \mathrm{null}(A)$

[Figure: in $\mathbb{R}^n$, the affine solution set $\mathrm{null}(A) + x_{\rm p}$ and its point $x_{\rm ln}$ closest to the origin, at distance $\|x_{\rm ln}\|_2$.]
Derivation of the solution: Lagrange multipliers
Consider the least norm problem with $A$ full rank:
$$\min_x \|x\|_2^2 \quad \text{subject to} \quad A x = y$$

Introduce Lagrange multipliers $\lambda \in \mathbb{R}^m$:
$$L(x, \lambda) = x^\top x + \lambda^\top (A x - y)$$

The optimality conditions are
$$\nabla_x L(x, \lambda) = 2x + A^\top \lambda = 0, \qquad \nabla_\lambda L(x, \lambda) = A x - y = 0$$

From the first condition, $x = -A^\top \lambda / 2$; substituting into the second gives
$$\lambda = -2 (A A^\top)^{-1} y \quad \implies \quad x_{\rm ln} = A^\top (A A^\top)^{-1} y$$
Solution via QR decomposition
Let $A^\top = Q R$ be the QR decomposition of $A^\top$.
$$A^\top (A A^\top)^{-1} = Q R (R^\top Q^\top Q R)^{-1} = Q R (R^\top R)^{-1} = Q (R^\top)^{-1}$$
is a right inverse of $A$. Then
$$x_{\rm ln} = Q (R^\top)^{-1} y$$
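Both formulas for the least norm solution, checked numerically on a random underdetermined system (illustration only):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 8))           # underdetermined: m = 3 < n = 8
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)  # x_ln = A^T (A A^T)^{-1} y

Q, R = np.linalg.qr(A.T)                  # QR decomposition of A^T
x_qr = Q @ np.linalg.solve(R.T, y)        # x_ln = Q (R^T)^{-1} y

print(np.allclose(x_ln, x_qr))            # True
print(np.allclose(A @ x_ln, y))           # x_ln solves Ax = y
print(np.allclose(x_ln, np.linalg.pinv(A) @ y))  # and equals A^+ y
```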
Extensions
Weighted least squares
Given a positive definite matrix $W \in \mathbb{R}^{m \times m}$, define the weighted 2-norm
$$\|e\|_W^2 := e^\top W e$$

Weighted least squares approximation problem:
$$\widehat{x}_{W,{\rm ls}} := \arg\min_x \|y - Ax\|_W$$

The orthogonality principle holds by defining the inner product as $\langle e, y \rangle_W := e^\top W y$, and
$$\widehat{x}_{W,{\rm ls}} = (A^\top W A)^{-1} A^\top W y$$
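A numerical check of the weighted LS formula (illustrative; the weight matrix is a randomly generated positive definite matrix, and the Cholesky-based "whitening" reformulation is a standard equivalent route):

```python
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
M = rng.standard_normal((10, 10))
W = M @ M.T + 10 * np.eye(10)                     # a positive definite weight matrix

x_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)   # (A^T W A)^{-1} A^T W y

# equivalent: ordinary LS after "whitening" with a Cholesky factor W = L L^T
L = np.linalg.cholesky(W)
x_chol = np.linalg.lstsq(L.T @ A, L.T @ y, rcond=None)[0]

print(np.allclose(x_w, x_chol))                   # True
```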
Recursive least squares
Let $a_i^\top$ be the $i$th row of $A$:
$$A = \begin{bmatrix} a_1^\top \\ \vdots \\ a_m^\top \end{bmatrix}$$

With this notation,
$$\|y - Ax\|_2^2 = \sum_{i=1}^m (y_i - a_i^\top x)^2$$
and
$$\widehat{x}_{\rm ls} = \widehat{x}_{\rm ls}(m) := \Big( \sum_{i=1}^m a_i a_i^\top \Big)^{-1} \sum_{i=1}^m a_i y_i$$

$(a_i, y_i)$ corresponds to a measurement;
often the measurements $(a_i, y_i)$ come sequentially (e.g., in time).
Recursive computation of
$$\widehat{x}_{\rm ls}(m) = \Big( \sum_{i=1}^m a_i a_i^\top \Big)^{-1} \sum_{i=1}^m a_i y_i$$

$P(0) = 0 \in \mathbb{R}^{n \times n}$, $q(0) = 0 \in \mathbb{R}^n$.

For $m = 0, 1, \dots$:
$$P(m+1) := P(m) + a_{m+1} a_{m+1}^\top, \qquad q(m+1) := q(m) + a_{m+1} y_{m+1}.$$

If $P(m)$ is invertible, $\widehat{x}_{\rm ls}(m) = P^{-1}(m)\, q(m)$.

Notes:

in each step, the algorithm requires inversion of an $n \times n$ matrix

$P(m)$ invertible $\implies$ $P(m')$ invertible, for all $m' > m$
Rank-1 update formula
$$(P + a a^\top)^{-1} = P^{-1} - \frac{1}{1 + a^\top P^{-1} a}\, (P^{-1} a)(P^{-1} a)^\top$$

Notes:

gives an $O(n^2)$ method for computing $P^{-1}(m+1)$ from $P^{-1}(m)$

standard methods based on dense LU, QR, or SVD for computing $P^{-1}(m+1)$ require $O(n^3)$ operations
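A sketch of recursive least squares that maintains $P^{-1}(m)$ with the rank-1 update (illustration; the initialization $P^{-1}(0) = \delta I$ with large $\delta$ is an assumption, a standard regularization used because $P(0) = 0$ is not invertible):

```python
import numpy as np

def rls(rows, ys, delta=1e6):
    """Recursive least squares: process the measurements (a_i, y_i) one at a time.

    Maintains Pinv = P(m)^{-1} via the rank-1 update formula, O(n^2) work per
    measurement.  Pinv starts at delta*I, i.e., P(0) = (1/delta) I, a small
    regularization instead of the singular P(0) = 0.
    """
    n = rows.shape[1]
    Pinv = delta * np.eye(n)
    q = np.zeros(n)
    for a, y in zip(rows, ys):
        Pa = Pinv @ a
        Pinv -= np.outer(Pa, Pa) / (1.0 + a @ Pa)   # (P + a a^T)^{-1}
        q += a * y
    return Pinv @ q

rng = np.random.default_rng(8)
A = rng.standard_normal((200, 4))
y = rng.standard_normal(200)
print(np.allclose(rls(A, y), np.linalg.lstsq(A, y, rcond=None)[0], atol=1e-4))
```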
Multiobjective least squares
Least squares minimizes the cost function $J_1(x) := \|y - Ax\|_2^2$.

Consider a second cost function $J_2(x) := \|z - Bx\|_2^2$, which we want to minimize together with $J_1$.

Usually the criteria $\min_x J_1(x)$ and $\min_x J_2(x)$ are competing.

Common example: $J_2(x) := \|x\|_2^2$, i.e., minimize $J_1$ with small $x$.

Feasible objectives:
$$\{ (\alpha, \beta) \in \mathbb{R}^2 \mid \text{there is } x \in \mathbb{R}^n \text{ such that } J_1(x) = \alpha, \ J_2(x) = \beta \}$$

Optimal trade-off curve: the boundary of the feasible objectives;
the corresponding $x$ is called Pareto optimal.
Set of Pareto optimal solutions
Example:

green area: feasible

white area: infeasible

black line: marginally feasible

Pareto optimal solutions correspond to points on the line.

[Figure: feasible region in the $(J_1, J_2)$-plane with the optimal trade-off curve as its boundary.]

For any $\gamma \geq 0$,
$$\widehat{x}(\gamma) = \arg\min_x J_1(x) + \gamma J_2(x)$$
is Pareto optimal. By varying $\gamma \in [0, \infty)$, $\widehat{x}(\gamma)$ sweeps all Pareto optimal solutions.
Regularized least squares
Tychonov regularization:
$$\widehat{x}_{\rm tych}(\gamma) = \arg\min_x \|y - Ax\|_2^2 + \gamma \|x\|_2^2$$

The solution
$$\widehat{x}_{\rm tych}(\gamma) = (A^\top A + \gamma I_n)^{-1} A^\top y$$
exists for any $\gamma > 0$, independently of the size and rank of $A$.

Trade-off between

fitting accuracy $J_1(x) = \|y - Ax\|_2^2$, and

solution size $J_2(x) = \|x\|_2^2$.
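The regularized solution computed in two equivalent ways (a sketch with an arbitrary $\gamma$); the stacked form shows that Tychonov regularization is itself an ordinary LS problem with data $\begin{bmatrix} A \\ \sqrt{\gamma} I \end{bmatrix}$ and $\begin{bmatrix} y \\ 0 \end{bmatrix}$:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((15, 6))
y = rng.standard_normal(15)
gamma = 0.5

# closed form: (A^T A + gamma I)^{-1} A^T y
x_tych = np.linalg.solve(A.T @ A + gamma * np.eye(6), A.T @ y)

# equivalent stacked least squares problem  min || [A; sqrt(gamma) I] x - [y; 0] ||_2
A_aug = np.vstack([A, np.sqrt(gamma) * np.eye(6)])
y_aug = np.concatenate([y, np.zeros(6)])
x_stack = np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]

print(np.allclose(x_tych, x_stack))    # True
```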
Quadratically constrained least squares
Consider again the biobjective LS problem: $\min_x J_1(x)$ and $J_2(x)$.

Scalarization approach:
$$\widehat{x}_{\rm tych}(\gamma) = \arg\min_x J_1(x) + \gamma J_2(x)$$
where $\gamma \geq 0$ is a trade-off parameter.

Constrained optimization approach:
$$\widehat{x}_{\rm constr}(\delta) = \arg\min_x J_1(x) \quad \text{subject to} \quad J_2(x) \leq \delta$$
where $\delta$ is an upper bound on the $J_2$ objective.
Regularized least squares
Tychonov regularization corresponds to the scalarization approach for

fitting accuracy $J_1(x) = \|y - Ax\|_2^2$, and

solution size $J_2(x) = \|x\|_2^2$.

The constrained optimization approach leads in this case to
$$\widehat{x}_{\rm constr}(\delta) = \arg\min_x \|y - Ax\|_2^2 \quad \text{subject to} \quad \|x\|_2^2 \leq \delta^2,$$
i.e., least squares minimization over the ball $\mathcal{U}_\delta := \{ x \mid \|x\|_2^2 \leq \delta^2 \}$.

The solution of the latter problem involves a scalar nonlinear equation.
Secular equation
If $\|A^+ y\|_2^2 \leq \delta^2$, then $\widehat{x}_{\rm constr}(\delta) = A^+ y$.

If $\|A^+ y\|_2^2 > \delta^2$, then it can be shown that $\widehat{x}_{\rm constr}(\delta)$ lies on the boundary of $\mathcal{U}_\delta$, i.e., $\|\widehat{x}_{\rm constr}(\delta)\|_2^2 = \delta^2$.

The Lagrangian of
$$\min_x \|y - Ax\|_2^2 \quad \text{subject to} \quad \|x\|_2^2 = \delta^2$$
is $\|y - Ax\|_2^2 + \lambda (\|x\|_2^2 - \delta^2)$, where $\lambda$ is a Lagrange multiplier.

A necessary and sufficient optimality condition is
$$\widehat{x}_{\rm tych}^\top(\lambda)\, \widehat{x}_{\rm tych}(\lambda) = \delta^2, \qquad \text{where } \widehat{x}_{\rm tych}(\lambda) := (A^\top A + \lambda I)^{-1} A^\top y$$

The nonlinear equation in $\lambda$
$$y^\top A (A^\top A + \lambda I)^{-2} A^\top y = \delta^2$$
is called the secular equation. It has a unique positive solution because $\|\widehat{x}_{\rm tych}(\lambda)\|$ is monotonically decreasing on the interval $[0, \infty)$ and by assumption $\|\widehat{x}_{\rm tych}(0)\|_2^2 > \delta^2$.
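A sketch of the constrained problem solved through the secular equation by bisection on $\lambda$ (illustration; it relies on the monotonic decrease of $\|\widehat{x}_{\rm tych}(\lambda)\|$, and the tolerance and test data are arbitrary):

```python
import numpy as np

def constrained_ls(A, y, delta, tol=1e-10):
    """Solve min ||y - A x||_2 subject to ||x||_2 <= delta via the secular equation."""
    x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
    if np.linalg.norm(x_ls) <= delta:           # unconstrained solution is already feasible
        return x_ls

    def x_tych(lam):                            # x_tych(lambda) = (A^T A + lambda I)^{-1} A^T y
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_tych(hi)) > delta:   # bracket: ||x_tych(lambda)|| decreases in lambda
        hi *= 2.0
    while hi - lo > tol:                        # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_tych(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return x_tych(0.5 * (lo + hi))

rng = np.random.default_rng(10)
A = rng.standard_normal((12, 4))
y = rng.standard_normal(12)
x = constrained_ls(A, y, delta=0.1)
print(np.linalg.norm(x))                        # approximately 0.1: the constraint is active
```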
Total least squares (TLS)
The LS method minimizes the 2-norm of the equation error $e(x) := y - Ax$:
$$\min_{x, e} \|e\|_2 \quad \text{subject to} \quad A x = y - e$$
Alternatively, the equation error $e$ can be viewed as a correction on $y$.

The TLS method is motivated by the asymmetry of the LS method:
both $A$ and $y$ are given data, but only $y$ is corrected.

TLS problem:
$$\min_{x, \Delta A, \Delta y} \big\| \begin{bmatrix} \Delta A & \Delta y \end{bmatrix} \big\|_{\rm F} \quad \text{subject to} \quad (A + \Delta A)\, x = y + \Delta y$$
$\Delta A$ is the correction on $A$, $\Delta y$ the correction on $y$.

Frobenius matrix norm: $\|C\|_{\rm F} := \sqrt{\sum_{i=1}^m \sum_{j=1}^n c_{ij}^2}$, where $C \in \mathbb{R}^{m \times n}$
Geometric interpretation of the TLS criterion
In the case $n = 1$, the problem of solving approximately $A x = y$ is
$$\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} x = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \qquad x \in \mathbb{R}$$

Geometric interpretation:
fit a line $L(x)$ passing through $0$ to the points $(a_1, y_1), \dots, (a_m, y_m)$

LS minimizes the sum of squared vertical distances from $(a_i, y_i)$ to $L(x)$

TLS minimizes the sum of squared orthogonal distances from $(a_i, y_i)$ to $L(x)$
[Figure: in $\mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R}$, the data point $(a_i, y_i)$ is projected orthogonally onto the point $(\widehat{a}_i, \widehat{y}_{{\rm tls},i})$, which lies on the subspace $\mathrm{null}\big(\begin{bmatrix} \widehat{x}_{\rm tls}^\top & -1 \end{bmatrix}\big)$; the vector $(\widehat{x}_{\rm tls}, -1)$ is normal to this subspace.]
Solution of the TLS problem
Let $\begin{bmatrix} A & y \end{bmatrix} = U \Sigma V^\top$ be the SVD of the data matrix $\begin{bmatrix} A & y \end{bmatrix}$ and
$$\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{n+1}), \qquad U = \begin{bmatrix} u_1 & \cdots & u_{n+1} \end{bmatrix}, \qquad V = \begin{bmatrix} v_1 & \cdots & v_{n+1} \end{bmatrix}.$$

A TLS solution of $Ax = y$ exists if and only if $v_{n+1, n+1} \neq 0$ (the last element of $v_{n+1}$)
and is unique if and only if $\sigma_n \neq \sigma_{n+1}$.

In the case when a TLS solution exists and is unique, it is given by
$$\widehat{x}_{\rm tls} = -\frac{1}{v_{n+1, n+1}} \begin{bmatrix} v_{1, n+1} \\ \vdots \\ v_{n, n+1} \end{bmatrix}$$
and the corresponding TLS corrections are
$$\begin{bmatrix} \Delta A_{\rm tls} & \Delta y_{\rm tls} \end{bmatrix} = -\sigma_{n+1}\, u_{n+1} v_{n+1}^\top$$

(Corollary of the low-rank approximation theorem stated earlier.)
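A direct implementation of the formulas above (illustrative; it assumes the generic case in which the TLS solution exists and is unique):

```python
import numpy as np

def tls(A, y):
    """Total least squares solution of A x ~ y via the SVD of [A y]."""
    m, n = A.shape
    C = np.column_stack([A, y])                 # data matrix [A y]
    U, s, Vt = np.linalg.svd(C)
    v = Vt[-1, :]                               # right singular vector v_{n+1}
    if np.isclose(v[-1], 0.0):
        raise ValueError("TLS solution does not exist (v_{n+1,n+1} = 0)")
    x_tls = -v[:n] / v[-1]
    dC = -s[-1] * np.outer(U[:, n], Vt[n, :])   # correction [dA dy] = -sigma_{n+1} u_{n+1} v_{n+1}^T
    return x_tls, dC

rng = np.random.default_rng(11)
A = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
x, dC = tls(A, y)
dA, dy = dC[:, :3], dC[:, 3]
print(np.allclose((A + dA) @ x, y + dy))        # the corrected system is consistent
```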
References
1. S. Boyd. EE263: Introduction to Linear Dynamical Systems.
2. G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins, 1996.
3. L. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.
4. B. Vanluyten, J. C. Willems, and B. De Moor. Model reduction of systems with symmetries. In Proc. of the CDC, pages 826–831, 2005.
5. I. Markovsky and S. Van Huffel. Overview of total least squares methods. Signal Processing, 87:2283–2302, 2007.
