
ASE 396: Model-Based Detection and Estimation

Class Notes
Spring 2012

Contents

1 Overview of Estimation and Detection
  1.1 Applications of Estimation
      1.1.1 Fantastically simple: Least squares problem
      1.1.2 Fantastically complex: Estimation of GPS
  1.2 General Solution Procedure
  1.3 We Want to Know Our Unknowns
  1.4 Example: Missile Problem

2 Linear Algebra Review
  2.1 Vectors, matrices and operations
  2.2 Norms
      2.2.1 Properties of norms
      2.2.2 Schwartz inequality
  2.3 Determinant and inverse
      2.3.1 Determinant
      2.3.2 Singularity and non-singularity
      2.3.3 Matrix inversion
      2.3.4 Inversion of partitioned matrices
      2.3.5 Matrix inversion lemma
  2.4 Vector transformation and matrix factorization
      2.4.1 The Householder Transformation
      2.4.2 QR Factorization
      2.4.3 Cholesky Factorization
  2.5 Eigenvalues and vectors

3 Review of Probability Theory
  3.1 Probability: what is it?
  3.2 Axiomatic Approach to Probability

4 Review of Probability and Statistics
  4.1 Conditional Probability
  4.2 Independent Events and Independent Random Variables
      4.2.1 Further implications of independence
  4.3 Vector-Valued Random Variables
  4.4 Total Probability Theorem
  4.5 Bayes's Theorem
      4.5.1 Bayesian Jargon
  4.6 Conditional Expectation
  4.7 Gaussian Random Vectors
      4.7.1 Properties
  4.8 Joint and Conditional Gaussian Random Variables
  4.9 Expected Value of a Quadratic Form

5 Detection/Hypothesis Testing Basics
  5.1 Hypothesis Testing
  5.2 Neyman-Pearson Lemma
  5.3 Example
  5.4 Remarks

6 Optimum Detection Structures
  6.1 Detector Performance
  6.2 Composite Hypothesis Testing

7 Estimation Basics
  7.1 The Problem of Parameter Estimation
  7.2 Maximum Likelihood Estimators
  7.3 Maximum A Posteriori Estimators
  7.4 Example 1: ML Estimator for a Linear First-Order Dynamical System
  7.5 Example 2: MAP Estimator for a Linear First-Order Dynamical System
  7.6 Least-Squares Estimators
  7.7 Example 3: LS Estimator for a Linear First-Order Dynamical System
  7.8 Minimum Mean-Squared Error Estimators
  7.9 Summary

8 Linear estimation for static systems
  8.1 MAP estimator for Gaussian problems
      8.1.1 Analysis of MAP Estimate
  8.2 Batch Least Squares Estimator
      8.2.1 Properties of the Least Squares Estimator
  8.3 Square-Root-Based LS Solutions
      8.3.1 Recursive Least Squares Estimator
      8.3.2 Analysis of Recursive LS Algorithm
  8.4 Example: Maximum Likelihood Estimate
  8.5 Recursive Approach Using Square-Root LS Method
      8.5.1 Review of square-root LS method
      8.5.2 Recursive Square-Root LS

9 Nonlinear Least Squares Estimation
  9.1 Basics of nonlinear least squares estimation
  9.2 Newton-Raphson method
  9.3 Gauss-Newton Algorithm (Gill et al. 4.7.2) (with step length algorithm)
  9.4 Levenberg-Marquardt Method (LM)

10 Stochastic Linear System Models
  10.1 Continuous-time model for dynamic systems
  10.2 White noise for stochastic systems
  10.3 Prediction of mean and covariance
  10.4 Discrete-time models of stochastic systems
  10.5 Discrete-time measurement model
  10.6 Full discrete-time model

11 Kalman filter for discrete-time linear system

12 Alternative formulas for covariance and gain of Kalman filter
      12.0.1 Interpretation of the Kalman gain
      12.0.2 Generalization for Weighted Process Noise
      12.0.3 Deriving the Kalman Filter from a MAP approach
      12.0.4 Setting up the cost function
      12.0.5 Minimizing the cost function
      12.0.6 Solving for the minimum-cost estimate

13 Stability and Consistency of Kalman Filter
  13.1 Stability of KF
  13.2 Control of a System Estimated by KF
  13.3 Matrix Riccati Equation
  13.4 Steady-State KF Equations
  13.5 Steady-State Error Dynamics
  13.6 Properties of KF Innovations
  13.7 Likelihood of a Filter Model

14 Kalman Filter Consistency
  14.1 Consistency Definition
  14.2 Statistical test for KF performance
      14.2.1 Monte Carlo Simulation based test
      14.2.2 Real-Time (Multiple-Runs) Tests
      14.2.3 Real-Time (Single-Run) Tests

15 Correlated Process and Measurement Noise
  15.1 Aside: Realization Problem
  15.2 Aside: Spectral Factorization

16 Information Filter/SRIF
  16.1 Information Filter
      16.1.1 Benefits of the Information Filter
      16.1.2 Disadvantages of the Information Filter
  16.2 Square Root Information Filtering
      16.2.1 Propagation Step and Measurement Update
  16.3 Benefits of the square root information filter

17 Smoothing
  17.1 Estimate x(k) based on Z^j with j > k
  17.2 Steps

18 Nonlinear Difference Equations from ZOH Nonlinear Differential Equations
  18.1 Zero-Order-Hold (ZOH) Assumption
  18.2 Variance of the ZOH Process Noise
  18.3 Partial Derivatives of Difference Equations

19 Nonlinear Estimation for Dynamical Systems
  19.1 Standard EKF Methods for Approximation
      19.1.1 Prediction
      19.1.2 Measurement update
  19.2 EKF as an algorithm

20 Iterated Kalman Filter
  20.1 Re-interpretation of EKF as a Gauss-Newton Step
  20.2 Iterated Kalman Filter
  20.3 Forward-Backward Smoothing (BSEKF)

21 Multiple Model (MM) Filtering
  21.1 Static Case
      21.1.1 Strategy
      21.1.2 Steps
  21.2 Remarks

22 The "Bootstrap" Particle Filter
  22.1 Motivation
      22.1.1 Propagation
      22.1.2 Update
      22.1.3 Importance Sampling
  22.2 Particle Filter Algorithm
      22.2.1 How to choose q[x(k)|z^k]?
  22.3 Bootstrap Algorithm
      22.3.1 Note


1 Overview of Estimation and Detection

Scribe: Kyle Wesson

Estimation is the process of inferring the value of a quantity from indirect and inaccurate observations. Estimation can be thought of as working with a continuum of values. We will frequently seek optimal solutions over heuristic solutions. The so-called optimal solutions seek to provide the best estimates along with some quantification of their accuracy.

Detection is the process of making decisions based on our estimates. It involves hypothesis testing against a finite set of possibilities and selecting the one that best represents our estimates. Detection can be thought of as a subset of estimation.

Figure 1: A block diagram illustrating the estimation process. Control inputs, together with disturbances and modeling errors, drive the Dynamical System, which produces the system state; the Measurement System, subject to measurement errors, produces measurements; the State Estimator combines the measurements with prior information to produce the state estimate and its uncertainties.

Writing down this block-diagram model gets you about 80% of the way; we often do this naturally (e.g., playing catch).

1.1 Applications of Estimation

1.1.1 Fantastically simple: Least squares problem

We often work with equations of the form

    z = Hx + w                                                    (1)

where x is a vector of deterministic but unknown parameters, H is a matrix that linearly relates x to z, w is a vector of Gaussian noise, and z is a measurement vector.
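As a concrete illustration (not from the notes; the matrix, numbers, and variable names below are made up), the following MATLAB sketch simulates a small problem of the form (1) and recovers x with an unweighted least-squares fit:

    % Minimal sketch: simulate z = H*x + w and recover x by least squares.
    H = [1 0; 0 1; 1 1; 2 -1];     % assumed 4x2 model matrix
    x_true = [3; -2];              % "unknown" parameters, used only to simulate data
    w = 0.1 * randn(4, 1);         % Gaussian measurement noise
    z = H * x_true + w;            % measurement vector
    x_hat = (H' * H) \ (H' * z)    % least-squares estimate, close to x_true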

1.1.2 Fantastically complex: Estimation of GPS

Consider the IGS Tracking Network with 300 ground sites (shown in Fig. 2) tracking GPS satellites in 6 planes with 4 (or more) satellites (SVs) per plane.

Q: How do the sites know where they are?
A: They range off of several SVs whose positions are known.
Q: How do the SVs know where they are?
A: They range off of the sites... Oh no! Circular logic.

When all of the quantities of interest are compiled together, one has a staggeringly complex estimator:

  non-linear state dynamics and measurement equations (square root of a sum of squares)
  non-uniform gravitational potential
  disturbance forces on SVs such as irregular solar heating
  crustal motion alters ground site locations
  ionosphere and neutral atmosphere have effects
  earth orientation parameter (EOP) variations
  clock variations

Figure 2: Map showing the location of the ground-based IGS sites (from http://igscb.jpl.nasa.gov).

Goal: estimate all site locations, all SV orbital parameters, atmospheric delays, slowly varying SV disturbances, EOP, clock errors. Wow!

1.2 General Solution Procedure

1. Develop a physics-based model that relates the unknown parameters or states to the measurements; identify in the model the sources of uncertainty and quantify their levels.
2. Do an analysis to determine whether the unknown quantities are uniquely determinable from the measurements (observability).
3. If the answer to 2 is yes, apply an estimation technique to estimate the unknown quantities. Also estimate the accuracy of our estimates.

1.3 We Want to Know Our Unknowns

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."
Donald Rumsfeld, Feb 12, 2002

In addition, we can consider:

  known can't-knowns: observability analysis says not uniquely determinable
  unknown can't-knowns: didn't do the observability analysis


1.4 Example: Missile Problem

Figure 3: An illustration of the missile tracking problem (horizontal position x_1, vertical position x_2).

Physics:

    \ddot{x}_1 = 0                                                (2)
    \ddot{x}_2 = -g                                               (3)

Measurements:

    y_{1k} = x_1(t_k) + n_{1k}                                    (4)
    y_{2k} = x_2(t_k) + n_{2k}                                    (5)

Assume:

    E[n_{1k}] = E[n_{2k}] = 0                                     (6)
    E[n_{1k} n_{1j}] = \sigma_1^2 \delta_{kj}                     (7)
    E[n_{2k} n_{2j}] = \sigma_2^2 \delta_{kj}                     (8)

Unknown quantities: x_1(0), x_2(0), v_1(0), v_2(0), where v_1 = \dot{x}_1 and v_2 = \dot{x}_2.

First, solve the equations of motion to determine x_1(t) and x_2(t) (the dynamics model):

    x_1(t) = x_1(0) + v_1(0) t                                    (9)
    x_2(t) = x_2(0) + v_2(0) t - g t^2/2                          (10)

Next, determine the measurements:

    y_{1k} = x_1(0) + v_1(0) t_k + n_{1k}                         (11)
    y_{2k} = x_2(0) + v_2(0) t_k - g t_k^2/2 + n_{2k}             (12)

Q: Are x_1(0) (and the other quantities) observable from [y_{11}, y_{12}, y_{21}, ..., y_{1n}, y_{2n}]?
A: Yes! (for n >= 2)


Now develop an estimation algorithm and apply it:

    [ y_{11} ]   [ 1  0  t_1  0   ] [ x_1(0) ]   [     0       ]   [ n_{11} ]
    [ y_{21} ] = [ 0  1  0    t_1 ] [ x_2(0) ] + [ -g t_1^2/2  ] + [ n_{21} ]       (13)
    [   ...  ]   [       ...      ] [ v_1(0) ]   [     ...     ]   [  ...   ]
        z                H           [ v_2(0) ]
                                          x

Assuming some estimation strategy, we get the estimates \hat{x}_1(0), \hat{x}_2(0), \hat{v}_1(0), \hat{v}_2(0). Then we can predict the impact location, noting that x_2(t_impact) = 0:

    0 = \hat{x}_2(0) + \hat{v}_2(0) t_impact - g t_impact^2 / 2                     (14)

    t_impact = [ \hat{v}_2(0) + sqrt( \hat{v}_2(0)^2 + 2 g \hat{x}_2(0) ) ] / g     (15)

Plug t_impact into the previous equation for the horizontal position to get the site estimate \hat{x}_1(t_impact) = \hat{x}_impact. We can also estimate the accuracy of t_impact and \hat{x}_impact. Recursion: incorporate new measurements as they arise.
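A quick MATLAB sketch of the impact-time prediction (14)-(15) (illustrative numbers only, not from the notes):

    g      = 9.81;
    x2_hat = 1200;   v2_hat = 150;    % estimated initial altitude (m) and vertical velocity (m/s)
    x1_hat = 0;      v1_hat = 300;    % estimated initial horizontal state
    t_impact = (v2_hat + sqrt(v2_hat^2 + 2*g*x2_hat)) / g;   % eq. (15)
    x_impact = x1_hat + v1_hat * t_impact                    % predicted impact location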

2 Linear Algebra Review

Scribe: Chirag Patel

2.1 Vectors, matrices and operations

Vector = ordered list of quantities:

    v = [v_1  v_2  ...  v_n]^T                                    (16)

where v_j is the jth scalar element and v ∈ R^n.

Matrix:

    A = [ a_11  a_12  ...  a_1m
          a_21   .           .
           .         .       .
          a_n1   ...       a_nm ]                                 (17)

where A is an n×m matrix, A ∈ R^{n×m}. Notation: A = [a_ij], where a_ij is the (i,j)th element (row i, column j). Transpose of a product: if A = CD, then A^T = D^T C^T.

Inner product: ⟨a, b⟩ = a^T b is a scalar, if a, b ∈ R^{n×1} are vectors.                 (18)

Outer product: If b ∈ R^{n×1} and c ∈ R^{m×1}, then A = b c^T ∈ R^{n×m}, with a_ij = b_i c_j.   (19)

Q: What is the rank of the outer product?
A: 1; it is a product of two rank-one vectors.

Rank: the rank of a matrix is the number of independent rows or columns.

Trace: the trace of a square matrix A is the sum of the diagonal elements, tr(A) = Σ_{i=1}^{n} a_ii. If A ∈ R^{n×m} and B ∈ R^{m×n}, then tr(AB) (an n×n product) = tr(BA) (an m×m product).

Symmetric matrix: A = A^T for a square matrix A.

Quadratic form: If P = P^T, then we can define a quadratic form in x ∈ R^{n×1}:

    x^T P x = Σ_{j=1}^{n} Σ_{i=1}^{n} x_i P_{ij} x_j               (20)-(21)

Positive definiteness: Given P = P^T, if x^T P x > 0 for all x ≠ 0, then P is positive definite (notation: P > 0). If only x^T P x ≥ 0, then P is positive semidefinite (notation: P ≥ 0).

Q: Which is positive definite: [1 0.5; 0.5 2] or [1 5; 5 2]?
A: The first one is; the second is not.

Q: Can one tell by inspection?
A: Not in general, but P > 0 iff all eigenvalues are positive (we will learn about eigenvalues soon).

Weighted inner product: ⟨a, b⟩_P = a^T P b is a weighted inner product if P > 0.

2.2 Norms

The norm of x could be:

    l_1:      ||x||_1 = Σ_{i=1}^{n} |x_i|                          ("Manhattan norm")
    l_2:      ||x||_2 = ( Σ_{i=1}^{n} x_i^2 )^{1/2} = sqrt(x^T x)  ("Euclidean norm")
    l_∞:      ||x||_∞ = max_i |x_i|

Aside: The three norms defined above are shown in Figure ?? for the n = 2 case. The intersections on the axes are the same in each case.

Fun fact #2: The l_∞ distance is also called the chessboard distance because it is equal to the minimum number of moves a king must make to go from one square to another, if each square is of side length 1. Also, for an entertaining (and informative) read on taxicab geometry (which uses the l_1-norm instead of the l_2-norm for distance), try http://www.ams.org/samplings/feature-column/fcarc-taxi.

In general, the p-norm can be defined for p ≥ 1:

    ||x||_p = ( |x_1|^p + |x_2|^p + ... + |x_n|^p )^{1/p}          (22)

2.2.1 Properties of norms

Three properties of norms:

1. ||x|| ≥ 0, and ||x|| = 0 iff x = 0
2. ||αx|| = |α| ||x||
3. ||a + b|| ≤ ||a|| + ||b||   (the triangle inequality)

If P > 0, then ||x||_P^2 = x^T P x is also a valid norm, called the P-weighted norm. Hereafter, as a matter of notation, ||x||_2 = ||x||.

Induced matrix norm:

    ||A||_{n×m} = max_{||x||=1} ||Ax||                             (23)

2.2.2 Schwartz inequality

    |x^T y| ≤ ||x|| ||y||                                          (24)

It's like saying |cos θ| ≤ 1.

2.3 Determinant and inverse

2.3.1 Determinant

The determinant of a square matrix A is written |A| (this is not the absolute value, in case a_11 is negative):

    |A| = Σ_{i=1}^{n} a_{ij} |C_{ij}| (-1)^{i+j},   for any fixed column j = 1, ..., n     (25)

where C_{ij} is A with the ith row and jth column removed. This is recursive until one reaches a 1×1 matrix, |a| = a. Also, for a 2×2 matrix,

    | a  b |
    | c  d | = ad - bc                                             (26)

Q: Where does the determinant come from?
A: If A = [a b; c d], then A^{-1} = (1/|A|) [d -b; -c a]. The determinant tells us crucial information about a matrix.

2.3.2 Singularity and non-singularity

A ∈ R^{n×n} is singular if |A| = 0. A is rank-deficient if rank(A) < n.

If |A| ≠ 0, then A is non-singular (all columns are linearly independent).

If |A| = 0, then there exists x ≠ 0 such that Ax = 0. But if |A| ≠ 0, then Ax = 0 implies x = 0.

2.3.3 Matrix inversion

If A ∈ R^{n×n} and |A| ≠ 0, there exists A^{-1} such that A^{-1}A = AA^{-1} = I:

    A^{-1} = (1/|A|) [  |C_11|        -|C_21|   ...  (-1)^{n+1}|C_n1|
                       -|C_12|           .                 .
                          .                 .              .
                      (-1)^{1+n}|C_1n|   ...             |C_nn|   ]        (27)

If A = BC, then A^{-1} = C^{-1} B^{-1}, provided that A, B, and C are non-singular.

Solution of linear equations: If A ∈ R^{n×n}, |A| ≠ 0, and b ∈ R^{n×1}, then we can solve Ax = b by x = A^{-1} b. Note that x is a unique solution and x ∈ R^{n×1}.

2.3.4 Inversion of partitioned matrices

Let

    P = [ P_11  P_12
          P_21  P_22 ]

with P_11, P_22 square. If P_11 and Δ = P_22 - P_21 P_11^{-1} P_12 are invertible (Δ is the Schur complement), then

    P^{-1} = [ V_11  V_12
               V_21  V_22 ]                                        (28)

where

    V_11 = P_11^{-1} + P_11^{-1} P_12 Δ^{-1} P_21 P_11^{-1}
    V_12 = -P_11^{-1} P_12 Δ^{-1}
    V_21 = -Δ^{-1} P_21 P_11^{-1}
    V_22 = Δ^{-1}

Note that dim(P_ij) = dim(V_ij).

2.3.5 Matrix inversion lemma

    (A + BCD)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}              (29)

Alternatively,

1. If D = B^T, then

    (A + B C B^T)^{-1} = A^{-1} - A^{-1} B (B^T A^{-1} B + C^{-1})^{-1} B^T A^{-1}      (30)

2. If A = P^{-1}, B = H^T, D = H, C = R^{-1}, then

    (P^{-1} + H^T R^{-1} H)^{-1} = P - P H^T (H P H^T + R)^{-1} H P                     (31)

Orthonormal matrices (orthogonal matrices): If Q is n×n with |Q| ≠ 0 and Q^{-1} = Q^T, then Q is orthonormal. Thus Q^T Q = Q Q^T = I. All columns of Q are perpendicular to each other and have unit norm: q_i^T q_j = δ_ij.

Important application of an orthogonal matrix: ||Qx|| = ||x||; Q is isometric, since

    ||Qx||^2 = x^T Q^T Q x = x^T x = ||x||^2
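A quick numerical check of the form (31) of the lemma in MATLAB (a sketch with made-up matrices, not part of the notes):

    % Verify inv(inv(P) + H'*inv(R)*H) == P - P*H'*inv(H*P*H' + R)*H*P numerically.
    n = 3; m = 2;
    P = eye(n) + 0.1*ones(n);          % any symmetric positive definite P
    R = 0.5*eye(m);                    % measurement-noise covariance
    H = [1 0 0; 0 1 1];                % 2x3 measurement matrix
    lhs = inv(inv(P) + H'*inv(R)*H);
    rhs = P - P*H'*inv(H*P*H' + R)*H*P;
    max(abs(lhs(:) - rhs(:)))          % ~1e-15, i.e., round-off only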

2.4 Vector transformation and matrix factorization

2.4.1 The Householder Transformation

See Bierman pp. 59-63, Gill et al. pp. 35-36.

Let x = [x_1  x_2  ...  x_n]^T. Then there is a matrix H such that

    H x = [σ  0  ...  0]^T                                         (32)

for some scalar σ, with

    ||Hx|| = ||x|| = |σ|                                           (33)

The matrix H is called a Householder matrix; it is symmetric and orthogonal. An orthogonal transformation can thus be used to compress all the magnitude ("energy") of a vector into a single component, zeroing all other components. H has the form

    H = I - 2 v v^T / (v^T v)                                      (34)

We can solve for the v that makes Hx = [σ  0  ...  0]^T:

    v = x + sign(x_1) ||x|| [1  0  ...  0]^T                       (35)
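A minimal MATLAB sketch of (34)-(35) (illustrative vector, not from the notes):

    % Build a Householder matrix that zeroes all but the first component of x.
    x = [3; 1; -2; 4];
    n = numel(x);
    e1 = [1; zeros(n-1, 1)];
    v = x + sign(x(1)) * norm(x) * e1;
    H = eye(n) - 2 * (v * v') / (v' * v);   % symmetric and orthogonal
    H * x                                    % ~ [-norm(x); 0; 0; 0]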


2.4.2 QR Factorization

Given any A ∈ R^{m×n}, we can express A as the product of an orthogonal matrix and an upper triangular matrix:

    QR = A                                                         (36)

where Q^T Q = Q Q^T = I, Q ∈ R^{m×m}, and R ∈ R^{m×n}. R is an upper triangular matrix, and not necessarily square (all elements of R below the main diagonal are zero).

If A is square and invertible, then so is R. What's more, R^{-1} is easy to compute.

If A is m×n with m > n, then R = [R̃; 0], where R̃ is n×n and upper triangular.

    Q = H_1 H_2 ... H_p   (Householder transforms)                 (37)

with p = min(m, n).

Use MATLAB:

    [Q, R] = qr(A);

2.4.3 Cholesky Factorization

Given an n×n matrix P = P^T with P > 0, it is possible to find an upper triangular R such that R^T R = P. R is sort of a matrix square root of P.

Use MATLAB:

    R = chol(P);

2.5 Eigenvalues and vectors

Given A ∈ R^{n×n}, there exist scalars λ_i (possibly complex) and associated vectors v_i, whose elements are complex if λ_i ∈ C, such that A v_i = λ_i v_i with ||v_i|| ≠ 0. This can also be written as (A - λ_i I) v_i = 0. Thus det(A - λ_i I) = 0 gives a degree-n polynomial in λ_i (the characteristic equation) whose roots are the λ_i.

If A = A^T: the n eigenvalues of A are real and the n eigenvectors are independent and orthogonal. The converse is not necessarily true.

If A > 0: all λ_i > 0 (⇔). If A ≥ 0: all λ_i ≥ 0 (⇔).

Also, |A| = Π_{i=1}^{n} λ_i and tr(A) = Σ_{i=1}^{n} λ_i.

3 Review of Probability Theory


Scribe: Lu Xia

Q: Where does randomness (or a component of randomness) originate?

A1: Essentially random: atomic scale. Heisenberg uncertainty principle (Figure 4):

    Δx Δv ≥ ħ / m                                                  (38)

where ħ is Planck's constant and m is the mass of the particle.

Figure 4: Heisenberg uncertainty principle. (a) Position of particle: density p_1(x) about x_0. (b) Speed of particle: density p_2(v) about v_0.

"Our most precise description of nature must be in terms of probabilities." (R. Feynman)

A2: Effectively random.

Assume a particle is caught in a potential given by V(x) = x^4/4 - x^2/2 (Figure 5). For motion only along x and a particle of mass m = 1, we have

    \ddot{x} = -dV/dx = -x^3 + x                                   (39)

Add a bit of damping (damping coefficient δ):

    \ddot{x} = -δ \dot{x} - x^3 + x,      \dot{x} = v              (40)

so that

    \dot{v} = -δ v - x^3 + x.                                      (41)

Figure 5: The potential V(x).

Figure 6: Phase portrait in the (x, ẋ) plane.

Phase portrait: (Figure 6)

For large energies E(x) = (1/2) v^2 + V(x), the total density of the basin of attraction for some neighbourhood ΔE approaches 50%. This implies a requirement of ever-increasing energy precision ΔE to specify the sink that we want. This is called "transient chaos" in nonlinear dynamics (Strogatz).

3.1 Probability: what is it?

(1) Measure-of-belief definition: make arguments for outcomes based on, e.g., the phase portrait in the preceding example. (No experiments required; perhaps imagined experiments; maybe not even that.)

(2) Relative-frequency definition:

    P(A) = lim_{N → ∞} N_A / N                                     (42)

where N_A is the number of occurrences of A in N trials. This definition gives mathematicians fits:

1. In what sense does the limit exist?
2. Can't ever perform N = ∞ experiments.
3. What about non-repeatable experiments?


3.2 Axiomatic Approach to Probability

Experiment: a process with (effectively) random outcomes.
Event: a set of possible outcomes of experiments.

Figure 7: Sample space S (the set of all possible outcomes).

The event A occurs when an experiment has been done and the outcome is an element of A.

Probability of A: a number that satisfies the following axioms.

(1) P(A) ≥ 0
(2) P(S) = 1   (S = sample space, the "sure event")
(3) Let A ∩ B denote the set of outcomes in both A and B. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

Consequence: P(Ā) = 1 - P(A), where Ā = "not A", since                       (43)

    S = A ∪ Ā,    A ∩ Ā = ∅                                        (44)-(45)
    P(A ∪ Ā) = P(A) + P(Ā) = 1                                     (46)

Assign probabilities by whatever means (e.g., relative frequency, measure of belief), but they must always satisfy these axioms.

Probability Density Function (pdf)

Consider the event {x : ξ - dξ < x ≤ ξ}. The probability density function is

    p_x(ξ) ≜ lim_{dξ → 0}  P(ξ - dξ < x ≤ ξ) / dξ                  (47)-(48)

commonly written as p(x) or p_x(x). Then

    P(ξ_1 < x ≤ ξ_2) = ∫_{ξ_1}^{ξ_2} p(x) dx                       (49)

Cumulative distribution function: P_x(ξ) ≜ P(x ≤ ξ) = ∫_{-∞}^{ξ} p(x) dx.

Note: P_x(∞) = 1, the "sure event"; this constraint also implies a normalization requirement on the pdf. For example, the Gaussian pdf

    p(ξ) = (1 / (sqrt(2π) σ)) exp( -(ξ - μ)^2 / (2σ^2) )           (50)

integrates to 1 by the (deathbed) Gaussian-integral identity.

Discrete-valued random variable

If x can only take discrete values ξ_i, for i = 1, ..., n, then we can define probability masses ρ_i, i = 1, 2, ..., n:

    ρ_i = P(x = ξ_i),      Σ_{i=1}^{n} ρ_i = 1                     (51)

    p(x) = Σ_{i=1}^{n} ρ_i δ(x - ξ_i)                              (52)

where δ(·) is the Dirac delta function.

Expected value:

    E[x] ≜ ∫_{-∞}^{∞} x p(x) dx = x̄      (the mean of x)          (53)

In general,

    E[g(x)] = ∫_{-∞}^{∞} g(x) p(x) dx                              (54)

Variance of x:

    σ_x^2 = E[(x - x̄)^2] = ∫_{-∞}^{∞} (x - x̄)^2 p(x) dx = E[x^2] - x̄^2   (55)-(57)

σ_x ≥ 0 is the standard deviation of x. We say x ~ [a, b] if E[x] = a and σ_x^2 = b.

If x is Gaussian, then

    E[x] = ∫_{-∞}^{∞} x (1/(sqrt(2π) σ)) e^{-(x-μ)^2/(2σ^2)} dx = μ     (58)

and similarly σ_x^2 = σ^2.                                          (59)

Joint PDF of 2 Random Variables

    p(ξ, η) ≜ lim_{dξ→0, dη→0}  P{(ξ - dξ < x ≤ ξ) ∩ (η - dη < y ≤ η)} / (dξ dη)     (60)-(61)

Events on a joint distribution are regions in 2-space; outcomes are points. The joint cumulative distribution P_{x,y}(ξ, η) is defined analogously.

Marginal density ("integrate out" the y dimension):

    p(x) = ∫_{-∞}^{∞} p(x, y) dy                                   (62)

Mean (a similar argument holds for y; variance definitions follow a similar pattern):

    x̄ = ∫∫ x p(x, y) dx dy = ∫ x p(x) dx                           (63)-(64)

Covariance:

    cov(x, y) = E[(x - x̄)(y - ȳ)] = ∫∫ (x - x̄)(y - ȳ) p(x, y) dx dy = σ_x σ_y ρ_xy    (65)-(67)

where

    ρ_xy = cov(x, y) / (σ_x σ_y)      (correlation coefficient)    (68)

One can show |ρ_xy| ≤ 1, and ρ_xy = ±1 iff x, y are linearly dependent; if ρ_xy = 0, then x, y are uncorrelated.

4 Review of Probability and Statistics

Scribe: Nachiappan Valliappan

4.1 Conditional Probability

Are events A and B related in the sense that knowledge of the occurrence of one alters the probability of the other? If one of the events, say B, has occurred, it then shrinks the sample space from S (the sure set) to B, as illustrated in Figure 8.

Figure 8: Venn diagram illustrating conditional probability (overlapping events A and B within S).

Now, to find the probability that event A occurred (given B), we need to divide the area of the overlapping region A ∩ B in the Venn diagram by the reduced sample space B. Mathematically, this is expressed as

    Pr(A | B) = Pr(A ∩ B) / Pr(B)                                  (69)

Similarly, for random variables we have

    p(x | y) = p(x, y) / p(y)                                      (70)

4.2 Independent Events and Independent Random Variables

If knowledge of the occurrence of B does not alter the probability of A, then A and B are independent events. Mathematically,

    Pr(A) = Pr(A | B) = Pr(A ∩ B) / Pr(B) = Pr(A, B) / Pr(B)   ⇒   Pr(A, B) = Pr(A) Pr(B)    (71)

4.2.1 Further implications of independence

    Pr(A | B) = Pr(A)   ⇔   Pr(B | A) = Pr(B)                      (72)

Figure 9: Does this Venn diagram express independence between A and B?
Figure 10: Venn diagram expressing independence between events A and B.

Q: Does the Venn diagram in Figure 9 express independence? If not, how do we express independence between events in a Venn diagram?

A: No! The Venn diagram in Figure 9 does not express independence between A and B, since it leads to a contradiction. It can be explained as follows. Since events A and B are independent, Pr(A ∩ B) = Pr(A) Pr(B). From the Venn diagram we have Pr(A) ≠ 0 and Pr(B) ≠ 0. Hence, Pr(A ∩ B) ≠ 0. However, there is no overlap between events A and B in the Venn diagram, so Pr(A ∩ B) = 0. Therefore, we run into a contradiction and Figure 9 does not represent independence between A and B. A correct way to express independence between events A and B is illustrated in Figure 10.


Random variables are independent if the joint probability density function (pdf) p(x, y) can be factored into the product of its marginal densities p(x) and p(y), i.e.,

    p(x, y) = p(x) p(y)                                            (73)

Also, if x and y are independent random variables, we have

    Cov(x, y) = ∫∫ (x - x̄)(y - ȳ) p(x, y) dx dy
              = ∫ (x - x̄) p(x) dx  ∫ (y - ȳ) p(y) dy = 0

so that ρ_xy = Cov(x, y) / (σ_x σ_y) = 0   (no coupling between x and y).

4.3 Vector-Valued Random Variables

Let x and ξ denote n-valued vector random variables:

    x = [x_1  x_2  ...  x_n]^T,      ξ = [ξ_1  ξ_2  ...  ξ_n]^T    (74)-(75)

The joint pdf describing the statistics of the random vector x is defined as follows:

    p_x(ξ) = p_x(ξ_1, ξ_2, ..., ξ_n)
           ≜ lim_{dξ_1→0, ..., dξ_n→0}  Pr{(ξ_1 - dξ_1 < x_1 ≤ ξ_1) ∩ ... ∩ (ξ_n - dξ_n < x_n ≤ ξ_n)} / (dξ_1 dξ_2 ... dξ_n)    (76)

The expected value of the random vector x is denoted by E[x] (or equivalently, x̄ ∈ R^n) and is defined as

    E[x] = ∫ ... ∫ x p(x) dx_1 dx_2 ... dx_n                       (77)

The covariance matrix of x is denoted by P_xx ∈ R^{n×n} and is defined as

    Cov(x) = E[(x - x̄)(x - x̄)^T] = ∫ (x - x̄)(x - x̄)^T p(x) dx      (78)

where the element P_xx(i, j) is given by

    P_xx(i, j) = ∫ ... ∫ (x_i - x̄_i)(x_j - x̄_j) p(x_1, x_2, ..., x_n) dx_1 dx_2 ... dx_n

The covariance matrix P_xx is symmetric, i.e., P_xx = P_xx^T. Further, P_xx ≥ 0, and if P_xx > 0 then there is no linear dependence between any of the random elements of x.

4.4 Total Probability Theorem

If B_1, B_2, ..., B_n are mutually exclusive and exhaustive events, i.e., B_i ∩ B_j = ∅ for i ≠ j and B_1 ∪ B_2 ∪ ... ∪ B_n = S, then

    Pr(A) = Σ_{i=1}^{n} Pr(A, B_i) = Σ_{i=1}^{n} Pr(A | B_i) Pr(B_i)     (79)

For random variables,

    p(x) = ∫ p(x, y) dy = ∫ p(x | y) p(y) dy                       (80)

4.5 Bayes's Theorem

Consider B_1, B_2, ..., B_n, n mutually exclusive and exhaustive events, i.e., B_i ∩ B_j = ∅ for i ≠ j and B_1 ∪ B_2 ∪ ... ∪ B_n = S, and let A be any event with Pr(A) > 0. Then

    Pr(B_i | A) = Pr(B_i ∩ A) / Pr(A) = Pr(A | B_i) Pr(B_i) / Σ_{j=1}^{n} Pr(A | B_j) Pr(B_j)    (81)

Thus equation (81) allows us to reverse the conditioning in conditional probabilities. For densities,

    p(x | y) = p(y | x) p(x) / p(y)                                (82)

Because of its utility in solving estimation problems, equation (82) is considered the "workhorse for estimation".

Note that the conditional density is a valid probability density function and hence sums to 1:

    ∫ p(x | y) dx = ∫ [p(y | x) p(x) / p(y)] dx = (1/p(y)) ∫ p(x, y) dx = p(y)/p(y) = 1          (83)
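A small numerical illustration of Bayes's theorem (81) in MATLAB, with two hypothetical events B_1, B_2 (the priors and likelihoods are made up, not from the notes):

    PB  = [0.3 0.7];                      % prior probabilities Pr(B1), Pr(B2)
    PAB = [0.9 0.2];                      % likelihoods Pr(A|B1), Pr(A|B2)
    PBA = (PAB .* PB) / sum(PAB .* PB)    % posteriors Pr(B1|A), Pr(B2|A)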

4.5.1 Bayesian Jargon

p(x) is an a priori belief (often referred to as the prior).
p(x | y) is an a posteriori belief (often referred to as the posterior).

Q: What if we have no prior belief?
A: (A1) There are Bayesians and closet Bayesians! (A2) One can still apply Bayes' postulate by considering a diffuse prior, say uniform over R. If p(x) is entirely unknown, then assume p(x) = ε, a constant independent of x (diffuse prior assumption):

    p(x) = ε for |x| ≤ 1/(2ε),    0 otherwise                      (84)

It can be shown that ε divides out of the formula p(y|x)p(x)/p(y) over an interval of width ~1/ε on either side of 0.

4.6 Conditional Expectation

The conditional expectation of a random variable x given another random variable y is defined as

    E[x | y] = ∫_{-∞}^{+∞} x p(x | y) dx                           (85)

In general, E[x | y] ≠ E[x] (unless x and y are independent). Now we prove the tower property of the expectation operator, E[E[x | y]] = E[x]:

    E[E[x | y]] = ∫ E[x | y] p(y) dy
                = ∫∫ x p(x | y) p(y) dy dx
                = ∫∫ x p(x, y) dy dx        (marginalizing over y)
                = E[x] = x̄.

4.7 Gaussian Random Vectors

Given an n×n symmetric matrix P (= P^T) and μ ∈ R^{n×1}, we define the pdf of x as

    p(x) = [1 / ((2π)^{n/2} |P|^{1/2})] exp( -(1/2) (x - μ)^T P^{-1} (x - μ) )     (86)

4.7.1 Properties

1. E[x] = μ
2. E[(x - μ)(x - μ)^T] = P

4.8 Joint and Conditional Gaussian Random Variables

(Footnote: We will revisit this topic later with regard to optimal estimation.)

Given two Gaussian random vectors x and z, define a new Gaussian random vector y by stacking the vectors:

    y = [x; z],      dim(x, y, z) = (n_x, n_y, n_z)                (87)

    P_yy = Cov(y) = [ P_xx  P_xz
                      P_zx  P_zz ]

Then

    p(x | z) = p(x, z)/p(z) = p(y)/p(z)
             = [ |P_zz|^{1/2} (2π)^{n_z/2} exp( -(1/2)(y - ȳ)^T P_yy^{-1} (y - ȳ) ) ]
               / [ |P_yy|^{1/2} (2π)^{(n_x+n_z)/2} exp( -(1/2)(z - z̄)^T P_zz^{-1} (z - z̄) ) ]        (88)

After much algebra (including the block inverse formula, etc.), we can show that the pdf of the conditional random variable (x | z) is

    p(x | z) = [ |P_xx - P_xz P_zz^{-1} P_xz^T|^{-1/2} / (2π)^{n_x/2} ]
               exp( -(1/2) [x - x̄ - P_xz P_zz^{-1}(z - z̄)]^T (P_xx - P_xz P_zz^{-1} P_xz^T)^{-1} [x - x̄ - P_xz P_zz^{-1}(z - z̄)] )    (89)

Mean:

    E[x | z] = x̄ + P_xz P_zz^{-1} (z - z̄)                          (90)

This applies a correction term to x̄ based on the value of z.

Covariance:

    E[ (x - E[x | z])(x - E[x | z])^T | z ] = P_xx - P_xz P_zz^{-1} P_xz^T    (91)

Note that P_xz P_zz^{-1} P_xz^T ≥ 0, so conditioning on z reduces the covariance relative to P_xx and hence improves the estimate with information about z.
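A minimal MATLAB sketch of the conditioning formulas (90)-(91), with made-up numbers (not from the notes):

    xbar = [1; 2];    zbar = 0.5;
    Pxx  = [4 1; 1 3];
    Pxz  = [0.8; -0.4];
    Pzz  = 2;
    z    = 1.2;                                 % observed value of z
    x_cond_mean = xbar + Pxz * (z - zbar) / Pzz;   % E[x | z], eq. (90)
    P_cond      = Pxx - Pxz * Pxz' / Pzz;          % Cov[x | z] <= Pxx, eq. (91)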


4.9 Expected Value of a Quadratic Form

Given a symmetric matrix A (= A^T) and a random vector x with E[x] = 0, we have

    E[x^T A x] = E[tr(x^T A x)]       (since x^T A x = tr(x^T A x))
               = E[tr(A x x^T)]
               = tr(A E[x x^T])       (since tr(.) and E[.] are linear operators)
               = tr(A Cov(x))         (since E[x] = 0, so Cov(x) = E[x x^T])
               = tr(A P_xx)                                        (92)

5 Detection/Hypothesis Testing Basics

Scribe: Marlon Bright

5.1 Hypothesis Testing

Hypothesis Testing (HT) provides the theory behind our previously described methods for consistency checking. HT is also implicit in multiple model estimation, in signal detection (e.g., GPS signal acquisition, spoofing detection, etc.), and elsewhere. Consider a parameter (or a vector of parameters) θ. Assume that θ can take on only discrete values:

    θ ∈ {θ_0, θ_1, ..., θ_{m-1}}

Let H_i denote the hypothesis that the true value of θ is θ_i. Thus, we define the following m-ary hypotheses:

    H_0: θ = θ_0
    H_1: θ = θ_1
    ...
    H_{m-1}: θ = θ_{m-1}

In the binary hypothesis case we have only

    H_0: θ = θ_0
    H_1: θ = θ_1

In this case, H_0 is known as the null hypothesis and H_1 as the alternative hypothesis. We focus first on binary HT. Let

    P_F = P[accept H_1 | H_0 true]            (false alarm)
    P_D = P[accept H_1 | H_1 true]            (detection)
    P_M = P[accept H_0 | H_1 true] = 1 - P_D  (miss)

A decision between H_0 and H_1 is based on a set of random variables (observations) z, where p(z|θ_0) and p(z|θ_1) (the probability density functions of z given that θ_0 and θ_1 are true, respectively) are known.

5.2 Neyman-Pearson Lemma

The Neyman-Pearson Lemma (NP) tells us how to choose optimally between H_0 and H_1, where by optimal we mean: given a fixed maximum P_F, maximize P_D.

Let γ > 0 be a threshold. NP states that the optimal test is of the form:

    choose H_0 if p(z|θ_1) < γ p(z|θ_0)
    choose H_1 if p(z|θ_1) > γ p(z|θ_0)
    choose randomly on equality

NP also states that such a test exists and is unique for every 0 ≤ P_F ≤ 1.

The Neyman-Pearson Lemma goes on to give us the form of the test. It boils down to comparing the likelihood ratio to a threshold:

    Λ(z) = p(z|θ_1) / p(z|θ_0)

which is sometimes written

    Λ(z) = p(z|H_1) / p(z|H_0)

Apply the NP test as follows:

1. Assign H_0 and H_1.
2. Select P_F. By convention, we often set P_F = 0.05 or 0.01. P_F = 0.05 means that we are willing to accept a 5% chance of rejecting H_0 ("no missile"), i.e., declaring "missile incoming", when H_0 is in fact true.
3. Simplify the threshold test if possible:

    Λ(z) ≷ γ      (decide H_1 if Λ(z) > γ, H_0 if Λ(z) < γ)

4. Determine (or approximate) the distribution of Λ(z).
5. Calculate the proper value of γ based on P_F and on the distribution of Λ(z).

5.3 Example:

Assume that the received signal in a radar detection problem is z = θ + ν, with ν ~ N(0, σ^2) and

    H_0: θ = 0
    H_1: θ = θ_1,   θ_1 > 0

Then:

    p(z|θ_0) = N(z; 0, σ^2)
    p(z|θ_1) = N(z; θ_1, σ^2)

Form the log-likelihood ratio:

    L(z) = log Λ(z) = log[ p(z|θ_1) / p(z|θ_0) ] = [ -(z - θ_1)^2 + z^2 ] / (2σ^2)
         = (2 z θ_1 - θ_1^2) / (2σ^2)

At this point the NP test amounts to

    L(z) ≷ γ' = log(γ)      (decide H_1 if greater, H_0 if smaller)

We further simplify by noting that the NP test can be boiled down to

    z ≷ λ_0,    with   λ_0 = (2σ^2 γ' + θ_1^2) / (2θ_1)

Now set λ_0 to satisfy

    P_F = P[z > λ_0 | θ_0] = ∫_{λ_0}^{∞} p(z|θ_0) dz

Figure 11: Binary hypothesis pdfs p(z|θ_0) (centered at 0) and p(z|θ_1) (centered at θ_1); shaded regions about the threshold λ_0 illustrate the probabilities P_F and P_M.

MATLAB command syntax for the above is:

    lambda_0 = norminv(1 - PF, theta_0, sigma);
    PF = 1 - normcdf(lambda_0, theta_0, sigma);
    PD = 1 - normcdf(lambda_0, theta_1, sigma);
    PM = 1 - PD;

5.4 Remarks

1. The choice of threshold depends only on P_F and p(Λ(z)|θ_0).

2. P_F applies to each measurement taken. The combined P_F after repeated measurements may be large.

3. Note that the NP approach does not take into account any a priori judgement on the likelihood of H_0 and H_1.

4. Q: What rationale can we give for our choice of P_F? The NP formulation doesn't give any guidance here.
   A: Remarks 2 and 3 point us to a Bayesian framework where we consider the expected cost (AKA risk). Let

    P_j  = prior probability of H_j
    C_jk = cost of deciding H_j when H_k is true

    R = C_00 P_0 P[deciding H_0 | H_0 is true] + C_10 P_0 P[deciding H_1 | H_0 is true]
      + C_11 P_1 P[deciding H_1 | H_1 is true] + C_01 P_1 P[deciding H_0 | H_1 is true]

Or in a synonymous form:

    R = C_00 P_0 P[deciding H_0 | H_0 is true] + C_10 P_0 P_F + C_11 P_1 P_D + C_01 P_1 P_M

Goal: minimize R. This formulation leads directly to a likelihood ratio test, and it chooses the value of γ for you:

    p(z|H_1) / p(z|H_0) ≷ γ      (decide H_1 if greater, H_0 if smaller)

Q: If the Bayesian approach automatically chooses the "right" value of γ, then why take the NP approach, which seems to set γ according to some seemingly arbitrary value of P_F?

A: It's often not possible or practical to find C_jk and P_j.

6 Optimum Detection Structures

Scribe: Gezheng Wen

6.1 Detector Performance

A detector designed according to the NP formulation is guaranteed to have the maximum P_D for a given P_F. Even so, P_D might not be acceptable. It is often useful to plot P_D as a function of P_F. The curve is called the Receiver Operating Characteristic (ROC).

Figure 12: Receiver Operating Characteristic (P_D versus P_F).

When p(z|θ_0) and p(z|θ_1) are well separated, it is easy to discriminate. When p(z|θ_0) = p(z|θ_1), the ROC is a straight line going through (0, 0) and (1, 1).

6.2 Composite Hypothesis Testing

In simple hypothesis testing (HT), each hypothesis corresponds to a fixed value of the parameter: θ ∈ [θ_0, θ_1, ..., θ_{M-1}]. In composite HT, θ can take on a range of values (e.g., θ ∈ [0, 2π]). In many cases of practical interest, the final detection test again boils down to a likelihood ratio test.

Example:

    H_0:  z_1 = w_1,               z_2 = w_2
    H_1:  z_1 = A cos φ + w_1,     z_2 = A sin φ + w_2

where A > 0 is some positive constant, φ is a random variable with φ ~ U[0, 2π], w_1, w_2 ~ N(0, σ^2), and w_1, w_2, φ are independent.

Figure 13: (a) signal absent, (b) signal present.

Our parameter in this case is θ = [θ_1; θ_2], where θ_1 ∈ {0, A} and θ_2 ∈ [0, 2π]. The parameter space on which θ lives is thus decomposed into two disjoint regions (Figure 14):

    Θ_0 = {θ | θ_1 = 0}
    Θ_1 = {θ | θ_1 = A}

Figure 14: The two regions of the parameter space.

The HT boils down to:

    Λ(z) = p(z | Θ_1) / p(z | Θ_0)  ≷  γ     (decide H_1 if greater, H_0 if smaller)

We can show that

    p(z | θ) = (1/(2πσ^2)) exp( -(1/(2σ^2)) (z - μ)^T (z - μ) ),
    μ = μ(θ_1, θ_2) = [ θ_1 cos θ_2 ;  θ_1 sin θ_2 ]

    p(z | Θ_0) = (1/(2πσ^2)) exp( -(1/(2σ^2)) z^T z )

    p(z | Θ_1) = ∫_0^{2π} p(z | θ_1 = A, θ_2 = φ) p_φ(φ) dφ
               = (1/(4π^2 σ^2)) ∫_0^{2π} exp( -(1/(2σ^2)) (z - μ(A, φ))^T (z - μ(A, φ)) ) dφ

This technique of averaging out φ is generalizable. The key idea is to recover p(z | Θ_1) by averaging p(z | θ) over φ. From the likelihood ratio:

    Λ(z) = p(z | Θ_1) / p(z | Θ_0)
         = [exp(-A^2/(2σ^2)) / (2π)] ∫_0^{2π} exp( (A/σ^2)(z_1 cos φ + z_2 sin φ) ) dφ

Simplify further by going to polar coordinates:

    r = sqrt(z_1^2 + z_2^2),    β = arctan(z_2 / z_1)

so that z_1 = r cos β and z_2 = r sin β. Plug in and simplify:

    Λ(z) = [exp(-A^2/(2σ^2)) / (2π)] ∫_0^{2π} exp( (A r/σ^2) cos(φ - β) ) dφ
         = exp(-A^2/(2σ^2)) I_0( A r / σ^2 )

Note: I_0(Ar/σ^2) = (1/(2π)) ∫_0^{2π} exp( (Ar/σ^2) cos(φ - β) ) dφ, where I_0 is the zeroth-order modified Bessel function of the first kind.

The test Λ(z) ≷ γ is thus equivalent to

    I_0( A r / σ^2 )  ≷  γ' = γ exp( A^2/(2σ^2) )     (decide H_1 if greater, H_0 if smaller)

Since I_0(x) is monotonically increasing in x, the test is equivalent to

    r  ≷  γ'' = (σ^2 / A) I_0^{-1}(γ')

Thus the optimal test compares r with a threshold. Under H_0, r is distributed as a Rayleigh distribution. Under H_1, r is distributed as a Rice distribution.

Q: Why do we so often find ourselves performing a correlation as part of a detection test?
A: Here we go. General Gaussian problem:

    H_0: z ~ N(μ_0, P_0)
    H_1: z ~ N(μ_1, P_1)

Note: measurements in z may be correlated (P_j may not be diagonal), and the values of the elements of μ_j may not all be the same.

We can show in HW that the HT reduces to:

    L(z) = (1/2)(z - μ_0)^T P_0^{-1}(z - μ_0) - (1/2)(z - μ_1)^T P_1^{-1}(z - μ_1)  ≷  γ'    (decide H_1 if greater)

In the general case, finding an analytical expression for the distribution of L(z) is not easy. But some special cases are tractable.

Example: Suppose P_0 = P_1 = P; then the test reduces to

    L(z) = Δμ^T P^{-1} z  ≷  γ''

where Δμ = μ_1 - μ_0 and the constant terms have been absorbed into the threshold. Further suppose μ_0 = 0 and P = σ^2 I; then

    L(z) = μ_1^T z  ≷  γ'''

i.e., Σ_i μ_{1i} z_i is compared with a threshold: a correlation of the data against the signal template.

Figure 15: Correlator block diagram (multiply each z_i by μ_{1i}, sum, and compare with a threshold).

7 Estimation Basics

Scribe: Zaher M. Kassas

7.1 The Problem of Parameter Estimation

Given a data set Z^k ≜ {z(1), z(2), ..., z(k)}, we wish to estimate some unknown parameter x. Our estimate will be a function of the data set and possibly time, i.e.,

    x̂ = x̂(k, Z^k) = f(k, Z^k).

There are several ways of designing the function f(k, Z^k), i.e., of defining an estimator, which will be explored next.

7.2 Maximum Likelihood Estimators

If we have no a priori information about the parameter x, then we simply regard it as an unknown constant vector. Assume that the conditional pdf p(Z^k|x) is known. Define the likelihood function as

    Λ_{Z^k}(x) ≜ p(Z^k|x).

Then the maximum likelihood (ML) estimator is one that maximizes this likelihood function, namely

    x̂_ML = arg max_x Λ_{Z^k}(x)

By the first-order necessary condition (FONC) of optimality, we set the derivative of the likelihood function with respect to x to zero, namely

    ∂Λ_{Z^k}(x) / ∂x ≡ 0.

This implicitly defines x̂_ML as the solution of n_x equations in n_x unknowns.

7.3 Maximum A Posteriori Estimators

Assume that we have some prior information about the parameter x, in that it is a sample from the pdf p(x), called the prior pdf. Assume that we know the conditional pdf p(Z^k|x). Then the posterior distribution can be written by Bayes' rule as

    p(x|Z^k) = p(Z^k|x) p(x) / ∫ p(Z^k|x) p(x) dx.                 (93)

The maximum a posteriori (MAP) estimator is one that maximizes the posterior distribution, namely

    x̂_MAP = arg max_x p(x|Z^k)

It is worth noting that the denominator in (93) is constant with respect to the maximization parameter x; therefore, it is immaterial in finding x̂_MAP.

The ML and MAP estimators are the same if the prior pdf p(x) is diffuse, i.e.,

    p(x) = lim_{ε→0}  { ε,  |x| ≤ 1/(2ε);   0,  |x| > 1/(2ε) }.


7.4 Example 1: ML Estimator for a Linear First-Order Dynamical System

Consider the linear dynamical system characterized by the differential equation

    \dot{y} = a y.

The solution to this system can readily be found to be

    y(t) = y(0) e^{a t},

where y(0) is the initial condition. The objective is to estimate the parameter x = y(0) given the discrete measurements

    z(j) = y(j Δt) + w(j),      j = 1, 2, ..., k,

where Δt is the sampling interval, w(j) ~ N(0, σ^2), and E[w(i)w(j)] = σ^2 δ_ij, with δ_ij being the Kronecker delta function. The measurements can be re-written as

    z(j) = λ_j x + w(j),        j = 1, 2, ..., k,

where λ_j ≜ e^{a Δt j} and x ≜ y(0).

By independence of the measurements, the likelihood function can be written as

    Λ_{Z^k}(x) ≜ p(Z^k|x) = p(z(1)|x) p(z(2)|x) ... p(z(k)|x).

The pdf p(z(j)|x) is nothing but the pdf p(w), shifted by z(j) - λ_j x, i.e.,

    p(z(j)|x) = N(z(j); λ_j x, σ^2).

Therefore, we can write

    p(Z^k|x) = p_w(z(1) - λ_1 x) p_w(z(2) - λ_2 x) ... p_w(z(k) - λ_k x),

where p_w(w) = N(w; 0, σ^2). Therefore the likelihood function becomes

    Λ_{Z^k}(x) = [1 / ((2π)^{k/2} σ^k)] exp( -(1/(2σ^2)) Σ_{j=1}^{k} [z(j) - λ_j x]^2 )

Differentiating Λ_{Z^k}(x) as per the FONC yields

    ∂Λ_{Z^k}(x)/∂x = Λ_{Z^k}(x) . (1/σ^2) Σ_{j=1}^{k} [z(j) - λ_j x] λ_j ≡ 0

Recognizing that Λ_{Z^k}(x) ≠ 0 implies that the second term must be zero. Solving for x yields the ML estimate as

    x̂_ML = Σ_{j=1}^{k} z(j) λ_j  /  Σ_{j=1}^{k} λ_j^2
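A minimal MATLAB sketch of this ML estimate (the values of a, Δt, σ, and y(0) below are illustrative, not from the notes):

    % z(j) = lambda_j * x + w(j), closed-form ML estimate of x = y(0).
    a = -0.5; dt = 0.1; k = 50; sigma = 0.2;
    j      = (1:k)';
    lambda = exp(a * dt * j);                  % lambda_j = e^{a*dt*j}
    x_true = 2.0;                              % y(0), used only to simulate data
    z      = lambda * x_true + sigma * randn(k, 1);
    x_ml   = sum(z .* lambda) / sum(lambda.^2)   % ~ x_true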


7.5 Example 2: MAP Estimator for a Linear First-Order Dynamical System

Consider the system in Example 1 with the additional information that x is a sample from a Gaussian distribution with known mean and variance, namely x ~ N(x̄, σ_x^2). The posterior pdf in this case is given by

    p(x|Z^k) = [c_1 / ((2π)^{(k+1)/2} σ^k σ_x)] exp( -(1/(2σ^2)) Σ_{j=1}^{k} [z(j) - λ_j x]^2 - (1/(2σ_x^2)) (x - x̄)^2 ),

where c_1 = [p(Z^k)]^{-1} is a constant that is not a function of x, hence immaterial for purposes of maximization. Differentiating p(x|Z^k) as per the FONC yields

    ∂p(x|Z^k)/∂x = p(x|Z^k) . [ (1/σ^2) Σ_{j=1}^{k} [z(j) - λ_j x] λ_j - (1/σ_x^2)(x - x̄) ] ≡ 0.

Recognizing that p(x|Z^k) ≠ 0 implies that the second term must be zero. Solving for x yields the MAP estimate as

    x̂_MAP = [ (1/σ^2) Σ_{j=1}^{k} z(j) λ_j + (1/σ_x^2) x̄ ]  /  [ (1/σ^2) Σ_{j=1}^{k} λ_j^2 + (1/σ_x^2) ]

It is worth noting that as σ_x → ∞ (equivalently 1/σ_x^2 → 0), the ML and MAP estimators coincide.

Note that we can re-write the posterior pdf as

    p(x|Z^k) = [1 / sqrt(2π σ_new^2)] exp( -(1/(2σ_new^2)) (x - x̂_MAP)^2 ),

where σ_new^2 is defined as

    σ_new^2 = 1 / [ (1/σ^2) Σ_{j=1}^{k} λ_j^2 + (1/σ_x^2) ]

i.e., the posterior pdf has the form of a Gaussian pdf with mean x̂_MAP and variance σ_new^2.

7.6 Least-Squares Estimators

Least-squares (LS) estimators aim at minimizing the cost function defined by the sum of the squares of the error between the data and the model, denoted ε, namely

    x̂_LS = arg min_x C(k, Z^k) ≜ ||ε||_2^2.

Hence, LS estimators aim at finding the x that minimizes the Euclidean norm of this error vector.

7.7 Example 3: LS Estimator for a Linear First-Order Dynamical System

For the linear first-order dynamical system in Example 1, the LS estimator is defined according to

    x̂_LS = arg min_x C(k, Z^k) ≜ (1/(2σ^2)) Σ_{j=1}^{k} [z(j) - λ_j x]^2.

Differentiating C(k, Z^k) as per the FONC yields

    (1/σ^2) Σ_{j=1}^{k} [z(j) - λ_j x] λ_j ≡ 0.

Solving for x yields the LS estimator as

    x̂_LS = Σ_{j=1}^{k} z(j) λ_j  /  Σ_{j=1}^{k} λ_j^2

Note that the resulting LS estimator coincides with the ML estimator. This stems from the fact that for Gaussian random variables, ML estimation corresponds to a Euclidean distance metric.

7.8 Minimum Mean-Squared Error Estimators

Assume that p(x) and p(Z^k|x) are known, which allows us to determine p(x|Z^k). The minimum mean-squared error (MMSE) estimator aims at minimizing the cost function defined by the conditional mean of the squared estimation error, i.e.,

    x̂_MMSE = arg min_{x̂} C(x̂, Z^k) ≜ E[ (x̂ - x)^2 | Z^k ] = ∫ (x̂ - x)^2 p(x|Z^k) dx.

Taking the derivative of C(x̂, Z^k) with respect to x̂ as per the FONC and solving for x̂ yields the MMSE estimator to be the conditional mean, namely

    x̂_MMSE = E[ x | Z^k ].

Q: Under what conditions is x̂_MMSE = x̂_MAP?
A: Whenever the peak (mode) of p(x|Z^k) coincides with its mean.

7.9 Summary

LS ≡ ML for Gaussian measurement noise.
ML ≡ MAP if a diffuse prior is assumed.
MAP ≡ MMSE if the mean and the mode of p(x|Z^k) coincide.

For Gaussian random variables, if p(x) is diffuse, then ML ≡ MAP ≡ LS ≡ MMSE.

8 Linear estimation for static systems

Scribe: Joshua Yuan

8.1 MAP estimator for Gaussian problems

Given a system model z = Hx + w, suppose we wish to estimate x ∈ R^{n_x×1}. Assume that w ~ N(0, R) is an n_z×1 noise vector, and H is a known n_z×n_x matrix. We also have a priori knowledge that x ~ N(x̄, P_xx). The noise is also uncorrelated with x, so the cross-covariance E[(x - x̄)w^T] = 0.

Our approach to this problem is to develop a joint pdf for x and z, then use our understanding of conditional Gaussian distributions to determine p(x | z). Thereby, we can find x̂_MAP such that

    x̂_MAP = arg max_x p(x | z)                                     (94)

In order to find p(x | z), we first need p(x, z), so let's define some things that will help us get there. First, let's find z̄:

    z̄ = E[z] = E[Hx + w] = H E[x] + E[w] = H x̄ + 0                (95)-(98)

Next, we need the covariance matrices P_xz, P_zx, and P_zz:

    P_xz = E[(x - x̄)(z - z̄)^T]                                     (99)
         = E[(x - x̄)(Hx + w - H x̄)^T]                              (100)
         = E[(x - x̄)(x - x̄)^T H^T + (x - x̄)w^T]                    (101)
         = E[(x - x̄)(x - x̄)^T] H^T + E[(x - x̄)w^T]                 (102)-(103)
         = P_xx H^T                                                 (104)

Because of the symmetry of covariance matrices, we can also say that

    P_zx = P_xz^T = H P_xx                                          (105)

For P_zz,

    P_zz = E[(z - z̄)(z - z̄)^T]                                      (106)
         = E[(H(x - x̄) + w)(H(x - x̄) + w)^T]                        (107)
         = H P_xx H^T + R                                            (108)

Now we can define p(x, z), from which we can find p(x | z):

    p(x | z) = p(x, z) / p(z)                                        (109)
             = c(z) exp( -(1/2) [x - x̄; z - z̄]^T [P_xx  P_xz; P_zx  P_zz]^{-1} [x - x̄; z - z̄] )   (110)

Recall from the linear algebra review that

    [P_xx  P_xz; P_zx  P_zz]^{-1} = [V_xx  V_xz; V_zx  V_zz]        (111)

where

    V_xx = (P_xx - P_xz P_zz^{-1} P_xz^T)^{-1}                      (112)
    V_xz = -V_xx P_xz P_zz^{-1}                                     (113)
    V_zz = (P_zz - P_xz^T P_xx^{-1} P_xz)^{-1}                      (114)

Now we can find x̂_MAP by maximizing p(x | z), or equivalently, we can minimize over x:

    -log p(x | z) = const + C(x | z)                                (115)

where

    C(x | z) = (1/2) [x - x̄; z - z̄]^T [V_xx  V_xz; V_zx  V_zz] [x - x̄; z - z̄]   (116)

Note that C(x | z) is a scalar value. Set ∂C(x | z)/∂x = 0:

    0 = ∂C/∂x = [∂C/∂x_1; ...; ∂C/∂x_{n_x}]                         (117)-(118)
              = V_xx (x - x̄) + V_xz (z - z̄)                         (119)

Now we solve for x̂_MAP:

    x̂_MAP = x̄ - V_xx^{-1} V_xz (z - z̄)                              (120)
          = x̄ + P_xz P_zz^{-1} (z - z̄)                               (121)

Using our original formulas for the covariance matrices, we get

    x̂_MAP = x̄ + P_xx H^T (H P_xx H^T + R)^{-1} (z - H x̄)             (122)
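A minimal MATLAB sketch of the static MAP update (122), with a made-up prior and measurement model (not from the notes):

    nx = 2; nz = 3;
    xbar = [1; -1];  Pxx = [2 0.3; 0.3 1];        % prior mean and covariance
    H = [1 0; 0 1; 1 1];  R = 0.04 * eye(nz);     % measurement model and noise
    x_true = xbar + chol(Pxx, 'lower') * randn(nx, 1);   % sample from the prior
    z = H * x_true + sqrt(0.04) * randn(nz, 1);
    K = Pxx * H' / (H * Pxx * H' + R);            % "gain" matrix
    x_map  = xbar + K * (z - H * xbar);           % MAP estimate, eq. (122)
    P_cond = Pxx - K * H * Pxx;                   % posterior covariance, eq. (132)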


8.1.1 Analysis of MAP Estimate

Let's find x̃ ≜ x - x̂, E[x̃], and P_xx|z:

    x̃ = x - x̂_MAP                                                  (123)
      = (x - x̄) - P_xx H^T (H P_xx H^T + R)^{-1} (z - H x̄)          (124)

    E[x̃] = E[x - x̄] - P_xx H^T (H P_xx H^T + R)^{-1} E[z - H x̄] = 0    (125)-(128)

Because E[x̃] = 0, x̂_MAP is an unbiased estimator.

    P_xx|z = E[x̃ x̃^T]                                               (129)
           = E[(x - x̄)(x - x̄)^T] - E[(x - x̄)(z - z̄)^T] P_zz^{-1} P_xz^T
             - P_xz P_zz^{-1} E[(z - z̄)(x - x̄)^T] + P_xz P_zz^{-1} E[(z - z̄)(z - z̄)^T] P_zz^{-1} P_xz^T   (130)
           = P_xx - P_xz P_zz^{-1} P_xz^T                            (131)
           = P_xx - P_xx H^T (H P_xx H^T + R)^{-1} H P_xx            (132)

Note that P_xx|z ≤ P_xx, implying we are getting a better result. However, if the elements of R become too large (the noise is too powerful), P_xx|z ≈ P_xx. Note also that the above analysis assumes x and w are Gaussian distributed.

Q: What if x or w is not Gaussian? What can we say about the form of the estimator then?

A: See Bar-Shalom, section 3.3. Even if they are not Gaussian, if E[w] = 0 and we know x̄, P_xx, and R, then the x̂_MAP found from assuming Gaussian distributions is the optimal linear estimator. However, there may still be a better non-linear estimator. That is, among estimators of the form x̂ = C x̄ + D z,

    x̂_MAP = x̄ + P_xz P_zz^{-1} (z - z̄)                              (133)
          = (I - P_xz P_zz^{-1} H) x̄ + P_xz P_zz^{-1} z              (134)

a linear combination of x̄ and z, is the best choice for C and D.

8.2 Batch Least Squares Estimator

Given measurements z(i) = H(i)x + w(i) (with no a priori distribution on x), suppose that

    w(i) ~ N(0, R(i))                                               (135)
    E[w(i) w(j)^T] = 0,  if i ≠ j                                    (136)
    R(i) = R(i)^T > 0                                                (137)

And we define a (scalar) cost function J(k) as

    J(k) = Σ_{i=1}^{k} [z(i) - H(i)x]^T R(i)^{-1} [z(i) - H(i)x]     (138)

We want to use this cost function to de-emphasize the noisier measurements. Note that now we have a time index, and no a priori knowledge. Since

    p(Z^k | x) = C exp( -(1/2) J(k) )                                (139)
    J(k) = -2 ln p(Z^k | x) + constant                               (140)

minimizing J(k) with respect to x is equivalent to maximizing Λ(x) = p(Z^k | x) (the ML estimator).

We need a change of notation to incorporate data and parameters for each new time step:

    Z^k = [z(1); z(2); ...; z(k)] ∈ R^{n_z k × 1}                    (141)
    H^k = [H(1); H(2); ...; H(k)] ∈ R^{n_z k × n_x}                  (142)
    w^k = [w(1); w(2); ...; w(k)] ∈ R^{n_z k × 1}                    (143)
    R^k = blockdiag(R(1), R(2), ..., R(k)) ∈ R^{n_z k × n_z k}       (144)-(145)

(R^k is a block diagonal matrix.) Then we can rewrite J(k) as

    J(k) = [Z^k - H^k x]^T (R^k)^{-1} [Z^k - H^k x]                  (146)

The summation notation is now gone, but J(k) is the same as before, and still a scalar value. Now, to minimize J(k):

    0 = ∂J/∂x = [∂J/∂x_1; ∂J/∂x_2; ...; ∂J/∂x_{n_x}]                 (147)-(148)
              = -2 (H^k)^T (R^k)^{-1} (Z^k - H^k x)                  (149)

This gives us n_x equations and n_x unknowns. Now we solve for x̂(k):

    x̂(k) = [(H^k)^T (R^k)^{-1} H^k]^{-1} (H^k)^T (R^k)^{-1} Z^k      (150)
          = (H^T R^{-1} H)^{-1} H^T R^{-1} Z                          (151)

By dropping the k superscripts, we get one of the deathbed identities, the normal equations.
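A minimal MATLAB sketch of the weighted normal equations (150) (illustrative numbers, not from the notes):

    % Weighted batch LS with stacked measurements: x_hat = (H'*W*H) \ (H'*W*Z).
    H = [1 0; 1 1; 1 2; 1 3];                  % stacked H^k (4 scalar measurements, nx = 2)
    R = diag([0.01 0.04 0.04 0.09]);           % stacked (block-)diagonal R^k
    x_true = [0.5; -0.2];
    Z = H * x_true + sqrt(diag(R)) .* randn(4, 1);
    W = inv(R);                                % weighting matrix (R^k)^-1
    x_hat = (H' * W * H) \ (H' * W * Z)        % weighted least-squares estimate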

8.2.1 Properties of the Least Squares Estimator

    x̃ = x - x̂                                                       (152)
      = x - (H^T R^{-1} H)^{-1} H^T R^{-1} z                          (153)
      = x - (H^T R^{-1} H)^{-1} H^T R^{-1} (Hx + w)                   (154)
      = [I - (H^T R^{-1} H)^{-1} H^T R^{-1} H] x - (H^T R^{-1} H)^{-1} H^T R^{-1} w    (155)
      = -(H^T R^{-1} H)^{-1} H^T R^{-1} w                             (156)

Using this, we can find the expectation and variance of x̃:

    E[x̃] = -(H^T R^{-1} H)^{-1} H^T R^{-1} E[w] = 0                   (157)-(158)

As you can see here, the least squares estimator is unbiased.

    P_x̃x̃ = (H^T R^{-1} H)^{-1} H^T R^{-1} E[w w^T] R^{-1} H (H^T R^{-1} H)^{-1}    (159)
          = (H^T R^{-1} H)^{-1}                                       (160)

This follows because E[w w^T] = R cancels with a neighboring R^{-1}, and then H^T R^{-1} H cancels with one factor of (H^T R^{-1} H)^{-1}.

8.3 Square-Root-Based LS Solutions

Refer to Bierman for more detail on this topic.

Recall that for a solution to exist, H^T R^{-1} H must be invertible, which amounts to the parameter x being observable. In some cases, x is only weakly observable from the data, meaning that H^T R^{-1} H is almost singular. In these cases, it's a bad idea to directly invert H^T R^{-1} H; small numerical errors can lead to large errors in x̂.

The solution to this problem is to use square-root algorithms. They are more numerically robust, and also lead to a more elegant and intuitive interpretation of least squares.

Let z = Hx + w, w ~ N(0, R), R = R^T, R > 0. Use the Cholesky factorization R_a^T R_a = R. In MATLAB,

    Ra = chol(R);

Then let

    z̄ = (R_a^{-1})^T z = (R_a^T)^{-1} z                              (161)-(163)
    H̄ = (R_a^T)^{-1} H                                               (164)
    w̄ = (R_a^T)^{-1} w                                               (165)

Now we have a transformed measurement model

    z̄ = H̄ x + w̄                                                     (166)

with

    E[w̄] = (R_a^T)^{-1} E[w] = 0                                     (167)
    E[w̄ w̄^T] = (R_a^T)^{-1} E[w w^T] R_a^{-1}
              = (R_a^T)^{-1} R_a^T R_a R_a^{-1} = I                   (168)-(171)

Because E[w̄ w̄^T] = I, we have w̄ ~ N(0, I). Now our noise is standard normal with unit covariance, and nicer to work with.

The cost function of our problem is

    J(k) = [z̄ - H̄x]^T [z̄ - H̄x] = ||H̄x - z̄||^2                      (172)-(173)

Recall that multiplying a vector by an orthonormal matrix doesn't change its magnitude:

    ||v||^2 = v^T v,    ||Qv||^2 = v^T Q^T Q v = v^T v = ||v||^2      (174)-(177)

Q: Can we multiply H̄x - z̄ by some orthonormal matrix and cleverly simplify the cost function?
A: Yes we can; use the QR factorization Q̃ R̃ = H̄, and let T = Q̃^T to get

    J(k) = ||Q̃^T (H̄x - z̄)||^2 = ||R̃x - z̃||^2,     z̃ = Q̃^T z̄       (178)-(179)

We can break this up further, like this:

8.3 Square-Root-Based LS Solutions

8 LINEAR ESTIMATION FOR STATIC SYSTEMS

 
  2
R
o
z

J(k) =
x o

0
o x zo k + kk2
= kR

(180)
(181)

How do we minimize this? We solve rst for x, made possible be ause if rank(H) = nx (whi h
o Rnx nx and is invertible. We also have the solution
R

is needed for observability), then

d
kRo x zo k2
dx
o kR
o x zk

= 2R
1
o zo
=R

0=

LS
x

(182)
(183)
(184)

The solution was obtained without squaring anything. Unfortunately, the omponent norm of
is the irredu ible part of the ost. In other words,

Re all our expression for

Px x



J(k)

LS
x

= kk2

(185)

from before

Px x = (H T R1 H )1

(186)

(187)

(H Ra1 RaT H )1
T
1

= (H H)

 T  1

 T
 T
Q
Ro

= R
Q
0
o
0
T
1
o R
o)
= (R
=
Remember that

kk2

is orthonormal, so

(188)
(189)
(190)

1 R
T )
(R
o
o

T Q
= I.
Q

(191)
Additionally, we know that

o
R

an be inverted

without problems at this point.


The matrix
in

oT R
o
HT H = R

is alled the

information matrix.

produ es a large amount of information about

x,

A large

leading to a small

HT H

indi ates data

Px x|z
.
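A minimal MATLAB sketch of this square-root (QR-based) procedure; the names Ra, Hbar, zbar, Ro, zo are illustrative, not from the notes.

    % Minimal sketch: square-root (QR-based) least squares.
    Ra   = chol(R);                  % R = Ra'*Ra
    Hbar = (Ra') \ H;                % normalized measurement matrix
    zbar = (Ra') \ z;                % normalized measurements
    [Qbar, Rbar] = qr(Hbar);         % Hbar = Qbar*Rbar
    ztil = Qbar' * zbar;
    nx   = size(H, 2);
    Ro   = Rbar(1:nx, 1:nx);         % square, upper-triangular block
    zo   = ztil(1:nx);
    xhat = Ro \ zo;                  % back-substitution; no normal equations formed
    Pxx  = inv(Ro) * inv(Ro)';       % (Ro'*Ro)^-1, for illustration only
    Jmin = norm(ztil(nx+1:end))^2;   % irreducible cost ||eps||^2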

8.3.1 Recursive Least Squares Estimator

Suppose we have x̂(k, z^k) and P(k, z^k) = P_x̃x̃|z, and we get a new piece of data, z(k+1) = H(k+1)x + w(k+1). Ideally, we would like to find x̂(k+1, z^{k+1}) without starting from scratch at each time step.

So let us set this up using stacked vectors and matrices:

z^{k+1} = [z^k; z(k+1)]                                                      (192)
H^{k+1} = [H^k; H(k+1)]                                                      (193)
R^{k+1} = blkdiag[R^k, R(k+1)]                                               (194)
w^{k+1} = [w^k; w(k+1)]                                                      (195)

We can show that

J(k+1) = J(k) + [z(k+1) − H(k+1)x]^T R^{-1}(k+1) [z(k+1) − H(k+1)x]          (196)

The key to the recursion is to rewrite J(k) as

J(k) = [x − x̂(k, z^k)]^T P^{-1}(k, z^k) [x − x̂(k, z^k)]
       + [Z^k − H^k x̂]^T (R^k)^{-1} [Z^k − H^k x̂]                          (197)-(198)

The term after the + sign is the irreducible component that doesn't depend on x. Therefore,

J(k+1) = [x − x̂(k, z^k)]^T P^{-1}(k, z^k) [x − x̂(k, z^k)]
         + [z(k+1) − H(k+1)x]^T R^{-1}(k+1) [z(k+1) − H(k+1)x] + irreducibles     (199)-(200)

Then to minimize the cost function, we set ∂J/∂x = 0 and solve for x̂(k+1, z^{k+1}):

0 = ∂J/∂x = 2 P^{-1}(k, z^k) [x − x̂(k, z^k)]
            − 2 H^T(k+1) R^{-1}(k+1) [z(k+1) − H(k+1)x]                      (201)-(202)

Solving this gives

x̂(k+1, z^{k+1}) = [P^{-1}(k, z^k) + H^T(k+1) R^{-1}(k+1) H(k+1)]^{-1}
                  [P^{-1}(k, z^k) x̂(k, z^k) + H^T(k+1) R^{-1}(k+1) z(k+1)]  (203)-(204)

With some manipulation, we can get a common form seen in the literature,

x̂(k+1, z^{k+1}) = x̂(k, z^k) + W(k+1) [z(k+1) − H(k+1) x̂(k, z^k)]          (205)
W(k+1) = [P^{-1}(k, z^k) + H^T(k+1) R^{-1}(k+1) H(k+1)]^{-1} H^T(k+1) R^{-1}(k+1)    (206)

This form of x̂(k+1, z^{k+1}) is a feedback/prediction/correction form with a gain matrix W.
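A minimal MATLAB sketch of one recursive LS measurement update, written with the covariance form of the gain (equivalent to Eq. (206) via the matrix inversion lemma; see Section 8.3.2). Variable names are illustrative.

    % One recursive least-squares update from (xk, Pk) to (xk1, Pk1).
    % Hk1, Rk1, zk1 : new measurement model and measurement at step k+1
    S   = Hk1*Pk*Hk1' + Rk1;            % innovation covariance
    W   = Pk*Hk1' / S;                  % gain
    xk1 = xk + W*(zk1 - Hk1*xk);        % corrected estimate
    Pk1 = Pk - W*S*W';                  % updated covariance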
8.3.2 Analysis of Recursive LS Algorithm

We can analyze x̃(k+1, z^{k+1}) like this,

x̃(k+1, z^{k+1}) = x − x̂(k+1, z^{k+1})                                      (207)
                 = [I − W(k+1)H(k+1)] x̃(k, z^k) − W(k+1) w(k+1)             (208)

Let's assume that E[x̃(k, z^k) w^T(k+1)] = 0. Then,

P(k+1, z^{k+1}) = E[x̃(k+1, z^{k+1}) x̃^T(k+1, z^{k+1})]                     (209)
  = [I − W(k+1)H(k+1)] E[x̃(k, z^k) x̃^T(k, z^k)] [I − W(k+1)H(k+1)]^T
    + W(k+1) R(k+1) W^T(k+1)                                                  (210)-(211)

Alternate formulas for P(k+1, z^{k+1}) and W(k+1) can be derived using the matrix inversion lemma:

P(k+1, z^{k+1}) = [P^{-1}(k, z^k) + H^T(k+1) R^{-1}(k+1) H(k+1)]^{-1}        (212)
W(k+1) = P(k+1, z^{k+1}) H^T(k+1) R^{-1}(k+1)                                (213)
P(k+1, z^{k+1}) = [I − W(k+1)H(k+1)] P(k, z^k)                               (214)
W(k+1) = P(k, z^k) H^T(k+1) [H(k+1) P(k, z^k) H^T(k+1) + R(k+1)]^{-1}        (215)

8.4 Example: Maximum Likelihood Estimate

Recall a previous example,

z_j = λ_j x + w(j),    w(j) ~ N(0, σ²)                                       (216)

coming from the scalar system

dy/dt = a y,                                                                 (217)
z_j = y(jΔt) + w(j),                                                         (218)
λ_j = exp(a jΔt).                                                            (219)

The ML estimate is

x̂(k, z^k) = ( Σ_{j=1}^{k} λ_j z(j) ) / ( Σ_{j=1}^{k} λ_j² )                  (220)
P(k, z^k) = σ²_LS = σ² / Σ_{j=1}^{k} λ_j²                                    (221)-(222)

Now, in the new notation we came up with above,

H(k+1) = λ_{k+1}                                                             (223)
R(k+1) = σ²                                                                  (224)

Then it follows that

W(k+1) = ( λ_{k+1} σ² / Σ_{j=1}^{k} λ_j² ) / ( λ_{k+1}² σ² / Σ_{j=1}^{k} λ_j² + σ² )   (225)
       = λ_{k+1} / ( λ_{k+1}² + Σ_{j=1}^{k} λ_j² )                           (226)
       = λ_{k+1} / Σ_{j=1}^{k+1} λ_j²                                        (227)

We absorbed the λ_{k+1}² term in the denominator into the summation. Using this, we can get a new x̂(k+1, z^{k+1}):

x̂(k+1, z^{k+1}) = Σ_{j=1}^{k} λ_j z(j) / Σ_{j=1}^{k} λ_j²
                  + ( λ_{k+1} / Σ_{j=1}^{k+1} λ_j² ) [ z(k+1) − λ_{k+1} Σ_{j=1}^{k} λ_j z(j) / Σ_{j=1}^{k} λ_j² ]   (228)
                 = Σ_{j=1}^{k+1} λ_j z(j) / Σ_{j=1}^{k+1} λ_j²               (229)

8.5 Recursive Approach Using Square-Root LS Method

8.5.1 Review of square-root LS method

Let's review the square-root LS method first. Given the data equation z̄ = H̄x + w̄, w̄ ~ N(0, R̄), R̄ = R̄^T > 0:

1. Change coordinates to turn the data equation into a normalized form,

z = Hx + w,    w ~ N(0, I)                                                   (230)

by doing a Cholesky factorization of R̄.

2. Set up the cost function

J(x) = ‖Hx − z‖²                                                             (231)

Additionally, note that we have shown the x̂ that minimizes a cost function based on the normalized data equation also minimizes a cost function based on the original data equation. Our solution x̂_LS will be the value of x that minimizes J(x).

3. Transform the problem again:

J(x) = ‖Hx − z‖²                                                             (232)
     = ‖T(Hx − z)‖²                                                          (233)

This works provided T is orthogonal, so we'll choose a special T that makes solving for x̂_LS easier. Let T = Q̄^T, with Q̄R̄ = H from QR factorization, so Q̄ is orthogonal and R̄ is upper triangular. Then,

J(x) = ‖Q̄^T (Q̄R̄x − z)‖²                                                   (234)
     = ‖R̄x − Q̄^T z‖²                                                       (235)
     = ‖ [R̄_o; 0] x − [z_o; ε] ‖²                                            (236)
     = ‖R̄_o x − z_o‖² + ‖ε‖²                                                (237)

4. Minimize J(x). Assuming that R̄_o is invertible,

x̂_LS = R̄_o^{-1} z_o                                                         (238)
J(x̂_LS) = ‖ε‖²                                                              (239)

The ‖ε‖² is irreducible, the cost due to data that did not quite fit.
Note: if R̄_o is not invertible, then the problem was not quite observable, and the original H matrix had some linearly dependent columns.

5. Analysis of the solution:

P_x̃x̃|z = (R̄_o^T R̄_o)^{-1}                                                  (240)
        = R̄_o^{-1} R̄_o^{-T}                                                  (241)

We get the result by taking the inversion inside to avoid calculating a squared inverse. We can also solve for the information matrix,

H^T H = R̄_o^T R̄_o                                                           (242)

Remember the information matrix is larger in the positive-definite sense when you have lots of information. This is equivalent to the Fisher information matrix for an efficient estimator. So for this case, the information matrix is equal to the Fisher information matrix.

8.5.2 Recursive Square-Root LS

Let's consider the recursive square-root approach. We are assuming that we have already done the normalization on our data equation.

J(k+1) = ‖H^{k+1}x − z^{k+1}‖²                                               (243)

We stack incoming data like before, normalizing measurements as they come in, assuming independent measurements:

J(k+1) = ‖ [H^k; H(k+1)] x − [z^k; z(k+1)] ‖²                                (244)
       = ‖H^k x − z^k‖² + ‖H(k+1)x − z(k+1)‖²                                (245)
       = ‖ε(k)‖² + ‖R̄_o(k)x − z_o(k)‖² + ‖H(k+1)x − z(k+1)‖²                (246)
       = ‖ε(k)‖² + ‖ [R̄_o(k); H(k+1)] x − [z_o(k); z(k+1)] ‖²               (247)-(248)

Because ‖R̄_o(k)x − z_o(k)‖² and ‖H(k+1)x − z(k+1)‖² are so similar, we can stack them while we minimize for the same x.

With our stacked x coefficient matrix, we can do a QR factorization,

Q̄R̄ = [R̄_o(k); H(k+1)]                                                     (249)

And if we transform via T = Q̄^T,

J(k+1) = ‖ε(k)‖² + ‖ [R̄_o(k+1); 0] x − [z_o(k+1); ε(k+1)] ‖²                (250)
       = ‖ε(k)‖² + ‖ε(k+1)‖² + ‖R̄_o(k+1)x − z_o(k+1)‖²                      (251)

Ignoring the irreducible stuff, we can now find x̂(k+1, z^{k+1}) and the variance P(k+1, z^{k+1}):

x̂(k+1, z^{k+1}) = R̄_o^{-1}(k+1) z_o(k+1)                                    (252)
P(k+1, z^{k+1}) = [(H^{k+1})^T H^{k+1}]^{-1}                                  (253)
                = [R̄_o^T(k+1) R̄_o(k+1)]^{-1}                                 (254)
                = R̄_o^{-1}(k+1) R̄_o^{-T}(k+1)                                (255)

R̄_o and R̄_o^T are square-root information matrices.

9 Nonlinear Least Squares Estimation

Scribe: Chao Jia

9.1 Basics of nonlinear least squares estimation

Problem model:

z = h(x) + w,    w ~ N(0, R), w ∈ R^{n_z×1}, x ∈ R^{n_x×1}                   (256)

Normally n_z > n_x. h(x) is an n_z-by-1 vector function of x. We can write

h(x) = [h_1(x); h_2(x); … ; h_{n_z}(x)]                                      (257)

Problem statement: find x̂ to minimize the objective function J_NLW(x) (nonlinear and weighted),

J_NLW(x) = [z − h(x)]^T R^{-1} [z − h(x)].                                   (258)

If we can find a unique x that minimizes J_NLW(x), then this becomes x̂, our nonlinear LS estimate.

Note that in the linear case h(x) = Hx. Also note that p(z|x) = C exp[−(1/2) J_NLW(x)] (because the noise is additive). Minimizing J_NLW(x) is equivalent to maximizing the likelihood function. So x̂ is also the ML estimate. It does not matter that h(x) is nonlinear. Hence, J_NLW(x) has a rigorous statistical meaning.

Properties of J_NLW(x):

1. J_NLW(x) ≥ 0

2. J_NLW(x) = 0 ⟺ h(x) = z (assumes R > 0)

Use the Cholesky factorization R_a^T R_a = R to simplify the form of J_NLW(x): let z_a = (R_a^T)^{-1} z, h_a(x) = (R_a^T)^{-1} h(x), and w_a = (R_a^T)^{-1} w ~ N(0, I). Then we have

J_NLW(x) = (z_a − h_a(x))^T (z_a − h_a(x))                                   (259)
         = ‖z_a − h_a(x)‖²                                                   (260)

Drop the a's and consider J_NLW(x) = ‖z − h(x)‖² from now on. This will be our generic problem formulation.

Aside: The gradient operator is

∇_x = (∂/∂x)^T = [∂/∂x_1; ∂/∂x_2; … ; ∂/∂x_n]                                (261)

so that, for f(x) = [f_1(x), f_2(x), … , f_m(x)],

∇_x f^T(x) = [∂f_j/∂x_i] ∈ R^{n×m}.

The Jacobian is the transpose of this:

∂f/∂x = [∇_x f^T(x)]^T ∈ R^{m×n}                                             (262)

Define ∂h/∂x evaluated at x_nom as H(x_nom) = H. Thus,

H_ij = ∂h_i/∂x_j |_{x_nom}.                                                  (263)

The first-order necessary condition for the minimum of J_NL is

0 = (∂J/∂x)^T = [∂J/∂x_1; ∂J/∂x_2; … ; ∂J/∂x_{n_x}]

We know that

∂J/∂x = −2(z − h(x))^T ∂h/∂x = −2(z − h(x))^T H.

So we need

0 = H^T (z − h(x))                                                           (264)

Our goal is to solve this equation for x. Note that if h(x) were linear, h(x) = Hx, then from 0 = 2H^T(z − Hx̂) we would have x̂ = (H^T H)^{-1} H^T z.

9.2 Newton-Raphson method

To find the x̂ which satisfies the first-order necessary condition for a minimum of J_NL, we will first consider the Newton-Raphson (NR) method.

The NR method is originally used for finding zeros of a nonlinear function. As shown in the figure below, to find the zero of a nonlinear function f(x), we start from an initial guess x_1 and update it as x_2 = x_1 − Δx, where in the NR method Δx = f(x_1)/f′(x_1). The NR method comes from a first-order Taylor expansion:

0 = f(x*) = f(x_1 + (x* − x_1)) ≈ f(x_1) + f′(x_1)(x* − x_1) + H.O.T.,  so  x* − x_1 ≈ −f(x_1)/f′(x_1) = −Δx.   (265)

Figure 16: Newton-Raphson method

In the current context (minimizing J_NL(x)), suppose we have a guess for x̂ and call it x̂_g. Define Δx = x̂ − x̂_g, so that x̂ = x̂_g + Δx. Based on the first-order necessary condition we need

0 = H^T(x̂_g + Δx)[z − h(x̂_g + Δx)]                                         (266)

Let f(x) = H^T(x)[z − h(x)]. The vector Taylor series is

f(x̂) = f(x̂_g + Δx) = f(x̂_g) + [∂f/∂x |_{x̂_g}] Δx + O(‖Δx‖²)               (267)

so based on the NR method we need to solve

0 = f(x̂_g) + [∂f/∂x |_{x̂_g}] Δx                                            (268)

where

∂f/∂x = (∂H^T/∂x)[z − h(x)] − H^T (∂h/∂x) = (∂²h/∂x²)[z − h(x)] − H^T H ≜ −V

Here ∂²h/∂x² is beyond a matrix; it is actually a tensor with three indices, and ∂f/∂x is symmetric. Each entry of ∂f/∂x can be written as

[∂f/∂x]_ij = Σ_{l=1}^{n_z} (∂²h_l/∂x_i ∂x_j)|_{x̂_g} [z_l − h_l(x̂_g)] − (H^T H)_ij

The NR method says: solve 0 = H^T(x̂_g)[z − h(x̂_g)] − V Δx, so we have

Δx = V^{-1} H^T(x̂_g)[z − h(x̂_g)].

For the NR method we just let x̂_g ← x̂_g + Δx, then repeat this until convergence (i.e., Δx → 0). If x̂_g starts "sufficiently close" to the optimum, then x̂_g converges superlinearly to the optimum under normal conditions (h(x) satisfies several smoothness requirements).

9.3 Gauss-Newton Algorithm (Gill et al. 4.7.2) (with step length algorithm)

Problems with the NR method:

1. It is painful to compute ∂H/∂x (i.e., ∂²h/∂x²).

2. NR can diverge if x̂_g is "too far" from the solution.

If x̂_soln produces a small residual error, then ‖z − h(x̂_soln)‖ is small, and it is reasonable to neglect the second-order term in V, i.e., let V = H^T H. Therefore,

Δx = V^{-1} f(x̂_g) = (H^T H)^{-1} H^T [z − h(x̂_g)].

One can arrive at this expression via another straightforward route:

J(x̂) = ‖z − h(x̂)‖² ≈ ‖ (z − h(x̂_g)) − HΔx ‖² = ‖z̃ − HΔx‖²,  with z̃ ≜ z − h(x̂_g).   (269)

The Δx that minimizes this cost function is the same as the Δx given above.

Comparison: The NR method applies a Taylor series expansion to the first-order necessary conditions. On the other hand, the GN method applies the Taylor series to the measurement model: z = h(x̂_g) + HΔx + w.

To avoid divergence, we modify the updating equation in the GN method as x̂_g ← x̂_g + αΔx, where 0 < α ≤ 1. Choose α such that J[x̂_g + αΔx] is less than J[x̂_g]. This guarantees convergence in virtually all situations because of the condition J(x) ≥ 0.

Q: How do we know there exists α, 0 < α ≤ 1, such that J(x̂_g + αΔx) < J(x̂_g)?
A: Define J̄(α) = J(x̂_g + αΔx), and call α the step length. Note that J̄(0) = J(x̂_g) and, by the chain rule,

dJ̄/dα = (∂J/∂x)|_{x̂_g + αΔx} Δx                                            (270)

Consider

dJ̄/dα |_{α=0} = −2[z − h(x̂_g)]^T H (H^T H)^{-1} H^T [z − h(x̂_g)]          (271)
              = −2 (H^T[z − h(x̂_g)])^T (H^T H)^{-1} (H^T[z − h(x̂_g)])      (272)

This is a quadratic form. In cases where the nonlinear system is observable, H^T H > 0, which implies rank(H) = n_x (all columns are linearly independent) and dJ̄/dα|_{α=0} < 0, with equality only if z − h(x̂_g) = 0. Thus for some small values of α, J̄(α) < J(x̂_g) is guaranteed!

A practical but crude step-length approach for the GN algorithm:

1. Set α = 1.

2. J_g = J̄(α = 0), J_g,new = J̄(α = 1)

3. while J_g,new ≥ J_g
       α = α/2
       J_g,new = J̄(α)
   end

This will converge to a local minimum.
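A minimal MATLAB sketch of the Gauss-Newton iteration with step halving, assuming user-supplied function handles h(x) and Hj(x) for the (already normalized) measurement model and its Jacobian; all names and tolerances here are illustrative.

    % Minimal sketch: Gauss-Newton with step halving on J(x) = ||z - h(x)||^2.
    xg = x0;                                      % initial guess
    for iter = 1:50
        r  = z - h(xg);                           % residual
        H  = Hj(xg);                              % Jacobian dh/dx at xg
        dx = (H'*H) \ (H'*r);                     % Gauss-Newton step
        a  = 1;  Jg = r'*r;
        while (z - h(xg + a*dx))'*(z - h(xg + a*dx)) >= Jg && a > 1e-8
            a = a/2;                              % halve the step until the cost decreases
        end
        xg = xg + a*dx;
        if norm(a*dx) < 1e-10, break; end         % convergence check
    end
    xhat = xg;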

9.4 Levenberg-Marquardt Method (LM)

In each updating step of the LM method, we have x̂_g ← x̂_g + Δx_LM, where

Δx_LM = (H^T H + λI)^{-1} H^T [z − h(x̂_g)]                                  (273)

with λ ≥ 0. If λ = 0, Δx_LM is equivalent to Δx in the Gauss-Newton method with α = 1. The LM method does not use the step-size parameter α; instead it uses λ:

λ = 0   corresponds to   α = 1
λ → ∞   corresponds to   α → 0

Pseudo LM algorithm:

1. λ = 0

2. check if H^T H + λI > 0; if not, let λ be something small, say λ = ‖H‖ · 0.001

3. compute Δx_LM(λ)

4. measure the cost: J_g = J(x̂_g), J_g,new = J(x̂_g + Δx_LM(λ))

5. If J_g,new ≥ J_g, then let λ = max(2λ, ‖H‖ · 0.001) and go to step (3). Else accept the new guess.

LM achieves fast convergence near the solution if the residuals are small. If the residuals near the solution are not small, we may have to use the full NR (Newton-Raphson) method.
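A minimal MATLAB sketch of one Levenberg-Marquardt update following the pseudo-algorithm above; h and Hj are assumed function handles as in the Gauss-Newton sketch, and the λ-inflation rule simply mirrors step 5.

    % Minimal sketch: one Levenberg-Marquardt update on J(x) = ||z - h(x)||^2.
    lambda = 0;
    r  = z - h(xg);   H = Hj(xg);   Jg = r'*r;
    for tries = 1:50
        dx    = (H'*H + lambda*eye(size(H,2))) \ (H'*r);   % LM step, Eq. (273)
        Jgnew = (z - h(xg + dx))'*(z - h(xg + dx));
        if Jgnew < Jg
            xg = xg + dx;                                   % accept the new guess
            break;
        end
        lambda = max(2*lambda, 0.001*norm(H));              % inflate lambda and retry
    end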

10 Stochastic Linear System Models

Scribe: Noble Hatten

10.1 Continuous-time model for dynamic systems

Figure 17: We will be dealing with NLTV stochastic systems.

A continuous-time model for a dynamic system is given by

ẋ = A(t)x + B(t)u + D(t)ṽ(t)                                                (274)

where x (the state vector) is n_x × 1, u (the input vector) is n_u × 1, ṽ (the process noise or disturbance) is n_v × 1, and the matrices A (the system matrix), B (the input gain), and D (the noise gain) are appropriately dimensioned. The measurement model is given by

z(t) = C(t)x(t) + w̃(t)                                                      (275)

where w̃ (the measurement noise) is n_z × 1 and C (the measurement matrix) is n_z × n_x.

Note: ṽ is continuous but not differentiable, meaning that it cannot properly be put into a differential equation. However, a more rigorous derivation of the equation still leads to the same result.

Note: If ṽ(t) = w̃(t) = 0, then, given x(t_0) and u(τ) for t_0 ≤ τ ≤ t, one can predict x and z for the entire time interval. When ṽ(t) and w̃(t) are not equal to 0, it may be enough to know the pdf of x(t_0) in order to predict the conditional pdfs of all future x(t) values (conditioned on the data z(τ) for τ ≤ t_0), but sometimes this is not the case.

The solution of the above system is

x(t) = F(t, t_0)x(t_0) + ∫_{t_0}^{t} F(t, τ)[B(τ)u(τ) + D(τ)ṽ(τ)] dτ        (276)

where F is the state transition matrix, sometimes denoted Φ(t, t_0).

Note: F is defined by its properties:

∂F(t, t_0)/∂t = A(t)F(t, t_0)                                                (277)
F(t_0, t_0) = I                                                              (278)

where I is the identity matrix. Other properties of F include

F(t, τ) = F(t, σ)F(σ, τ)
F(t, τ) = [F(τ, t)]^{-1}

If A is constant, then F(t, τ) = e^{A(t−τ)}, where the matrix exponential is defined as

e^{A(t−τ)} = I + A(t−τ) + (1/2!)A²(t−τ)² + …                                 (279)

(The matrix exponential may be calculated in MATLAB using the expm() function.)

If A is time-varying, then one must numerically integrate the matrix initial value problem in order to determine F(t, τ).

10.2 White noise for stochastic systems

ṽ(t) is white noise if ṽ(t) is stochastically independent of ṽ(τ) for all t ≠ τ and E[ṽ(t)] = 0. (The noise must be independent even when t and τ are very close.) A consequence of whiteness is that

E[ṽ(t)ṽ^T(τ)] = V(t)δ(t − τ),

which for V(t) = V = const implies that S_ṽṽ(f), the power spectral density of ṽ(t), equals V, i.e. the power spectrum is flat. This implies that white noise

- is independent in time
- is zero mean
- has covariance V(t)δ(t − τ)

This also implies a process that has infinite power because, at t = τ, δ(t − τ) = ∞. However, we "just go with" the fiction of white noise because it is convenient and can be a good approximation over a frequency band (as opposed to the entire frequency spectrum).

10.3 Prediction of mean and covariance

The prediction of the mean is

E[x(t)] = F(t, t_0)E[x(t_0)] + ∫_{t_0}^{t} F(t, τ)B(τ)u(τ) dτ                (280)
x̄(t) = F(t, t_0)x̄(t_0) + ∫_{t_0}^{t} F(t, τ)B(τ)u(τ) dτ                    (281)

Additionally,

d x̄(t)/dt = A(t)x̄(t) + B(t)u(t)                                            (282)

meaning that the prediction of the mean follows the linear system. If E[ṽ(t)] = v̄ (not zero-mean), then

d x̄(t)/dt = A(t)x̄(t) + B(t)u(t) + D(t)v̄                                   (283)

which is still deterministic.

The covariance is

P_xx(t) = E[(x − x̄)(x − x̄)^T]                                              (284)

Substituting for x gives

P_xx(t) = E[ (F(t, t_0)[x(t_0) − x̄(t_0)] + ∫_{t_0}^{t} F(t, τ_1)D(τ_1)ṽ(τ_1) dτ_1)
            (F(t, t_0)[x(t_0) − x̄(t_0)] + ∫_{t_0}^{t} F(t, τ_2)D(τ_2)ṽ(τ_2) dτ_2)^T ]   (285)

Expanding gives

P_xx(t) = F(t, t_0)E[(x(t_0) − x̄(t_0))(x(t_0) − x̄(t_0))^T]F^T(t, t_0)
          + ∫_{t_0}^{t} ∫_{t_0}^{t} F(t, τ_1)D(τ_1)E[ṽ(τ_1)ṽ^T(τ_2)]D^T(τ_2)F^T(t, τ_2) dτ_1 dτ_2   (286)

where E[ṽ(τ_1)ṽ^T(τ_2)] = V(τ_1)δ(τ_1 − τ_2). Also, cross terms in the covariance go to zero because E[(x(t_0) − x̄(t_0))ṽ^T(τ)] = 0 for all τ > t_0 due to the whiteness of the noise. The sifting property of the Dirac delta allows us to collapse one integral:

P_xx(t) = F(t, t_0)P_xx(t_0)F^T(t, t_0) + ∫_{t_0}^{t} F(t, τ_1)D(τ_1)V(τ_1)D^T(τ_1)F^T(t, τ_1) dτ_1   (287)

Differentiating,

d P_xx(t)/dt = A(t)P_xx(t) + P_xx(t)A^T(t) + D(t)V(t)D^T(t)                  (288)

Note: d P_xx(t)/dt is symmetric and linear in P_xx(t).

Note: If A(t) = A = const and Re[eig(A)] < 0 for all eigenvalues of A (i.e. the system is stable), and if V(t) = V = const, D(t) = D = const, and V > 0, then P_xx(t) converges to a constant P_xx,ss, the steady-state value. Thus,

0 = A P_xx,ss + P_xx,ss A^T + D V D^T                                        (289)

is a linear matrix equation known as the continuous-time Lyapunov equation. To solve this equation in MATLAB, use the lyap() function:

Pxxss = lyap(A, D*V*D')                                                      (290)

10.4 Discrete-time models of stochastic systems

Assume a zero-order-hold control input: u(t) = u(t_k) = u_k for t_k ≤ t < t_{k+1}.

Figure 18: A zero-order-hold control input holds a constant value for t ∈ [t_k, t_{k+1}).

Then an equivalent discrete-time model of our original continuous system is

x(t_{k+1}) = F(t_{k+1}, t_k)x(t_k) + G(t_{k+1}, t_k)u(t_k) + v(t_k)          (291)

where

G(t_{k+1}, t_k) = ∫_{t_k}^{t_{k+1}} F(t_{k+1}, τ)B(τ) dτ                     (292)
v(t_k) = ∫_{t_k}^{t_{k+1}} F(t_{k+1}, τ)D(τ)ṽ(τ) dτ                         (293)

v(t_k) is the discrete-time process noise disturbance. If ṽ(t) is white noise, then E[v(t_k)] = 0 and

E[v(t_k)v^T(t_j)] = ∫_{t_j}^{t_{j+1}} ∫_{t_k}^{t_{k+1}} F(t_{k+1}, τ_1)D(τ_1)E[ṽ(τ_1)ṽ^T(τ_2)]D^T(τ_2)F^T(t_{j+1}, τ_2) dτ_1 dτ_2   (294)

where E[ṽ(τ_1)ṽ^T(τ_2)] = V(τ_1)δ(τ_1 − τ_2). Thus,

E[v(t_k)v^T(t_j)] = δ_jk ∫_{t_k}^{t_{k+1}} F(t_{k+1}, τ_1)D(τ_1)V(τ_1)D^T(τ_1)F^T(t_{k+1}, τ_1) dτ_1   (295)
                  = δ_jk Q_k                                                  (296)

(The Kronecker delta δ_jk appears because, if the windows do not overlap, then the intervals are independent unless j = k.)

We may now simplify the notation:

x(t_k) → x(k)                                                                (297)
u(t_k) → u(k)                                                                (298)
v(t_k) → v(k)                                                                (299)
F(t_{k+1}, t_k) → F(k)                                                       (300)
G(t_{k+1}, t_k) → G(k)                                                       (301)
Q(t_{k+1}, t_k) → Q(k)                                                       (302)

The dynamics model then becomes

x(k+1) = F(k)x(k) + G(k)u(k) + v(k)                                          (303)

where E[v(k)] = 0 and E[v(k)v^T(j)] = δ_kj Q_k. For a time-invariant system (A, B, D, V constant), and if t_{k+1} − t_k = Δt = const, then

F(k) = F = e^{AΔt}                                                           (304)
G(k) = G = ∫_0^{Δt} e^{Aτ} B dτ                                              (305)
Q(k) = Q = ∫_0^{Δt} e^{Aτ} D V D^T e^{A^T τ} dτ                              (306)
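The notes do not spell out how to evaluate Eqs. (304)-(306) numerically; a minimal MATLAB sketch of one common construction (the Q formula is often attributed to Van Loan) is given below, under the assumption of constant A, B, D, V and step Δt. All names are illustrative.

    % Minimal sketch: discrete-time F, G, Q for a time-invariant system, step dt.
    n  = size(A,1);  m = size(B,2);
    % F and G from one matrix exponential of the augmented [A B; 0 0] system:
    MG = expm([A B; zeros(m, n+m)] * dt);
    F  = MG(1:n, 1:n);
    G  = MG(1:n, n+1:n+m);
    % Q from the Van Loan construction:
    Qc = D*V*D';                              % continuous-time process noise intensity
    MQ = expm([-A Qc; zeros(n) A'] * dt);
    Q  = F * MQ(1:n, n+1:2*n);                % discrete-time process noise covariance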

10.5 Discrete-time measurement model

The discrete-time measurement model is

z(k) = H(k)x(k) + w(k)                                                       (307)

where w(k) is discrete-time white measurement noise. This implies

E[w(k)] = 0                                                                  (308)
E[w(k)w^T(j)] = δ_kj R(k)                                                    (309)

where R(k) = R^T(k) > 0. We think of z(k) as a sample from z(t) = C(t)x(t) + w̃(t), but it is not correct to say that z(k) = z(t_k). The problem lies in the assumption of whiteness, and therefore infinite power, for w̃(t). Because of this, E[w̃(t_k)w̃^T(t_k)] = δ(0)·R_w̃ = ∞. The correct way to obtain z(k) is to assume an anti-aliasing filter is used to low-pass filter z(t) before sampling. This can be modeled as an average-and-sample operation.

By the Nyquist sampling theorem, we must sample at twice the bandwidth of the anti-aliasing filter (f_samp = 1/Δt). Now, R(k) becomes

R(k) = (1/Δt²) ∫_{t_k − Δt}^{t_k} ∫_{t_k − Δt}^{t_k} E[w̃(τ_1)w̃^T(τ_2)] dτ_1 dτ_2 = R_w̃(t_k)/Δt   (310)

where E[w̃(τ_1)w̃^T(τ_2)] = R_w̃(τ_1)δ(τ_1 − τ_2).

10.6 Full discrete-time model

Combining the discrete-time dynamics and measurement models, we obtain the full discrete-time model:

x(k+1) = F(k)x(k) + G(k)u(k) + v(k)                                          (311)
z(k) = H(k)x(k) + w(k)                                                       (312)

with E[v(k)] = 0, E[v(k)v^T(j)] = Q(k)δ_kj, E[w(k)] = 0, E[w(k)w^T(j)] = R(k)δ_kj, and E[w(k)v^T(j)] = 0 for all k, j.

We can solve the discrete-time dynamics equation:

x(k) = [F(k−1)F(k−2)⋯F(0)]x(0) + Σ_{i=0}^{k−1} [F(k−1)F(k−2)⋯F(i+1)][G(i)u(i) + v(i)]   (313)

Note: When i = k−1, the identity matrix should be used in place of [F(k−1)F(k−2)⋯F(i+1)] in the summation.

We can also predict the statistics of x(k):

x̄(k+1) = E[x(k+1)] = F(k)x̄(k) + G(k)u(k)                                   (314)
P_xx(k+1) = E[(x(k+1) − x̄(k+1))(x(k+1) − x̄(k+1))^T] = F(k)P_xx(k)F^T(k) + Q(k)   (315)

The simplification of P_xx exploits E[(x(k) − x̄(k))v^T(k)] = 0 to cancel cross terms.

If F(k) = F = const and Q(k) = Q = const (LTI system), and if max(abs(eig(F))) < 1 (asymptotically stable discrete-time system), then

P_xx(k) → P_xx,ss                                                            (316)

where P_xx,ss is a steady-state covariance. We then have the discrete-time Lyapunov equation

P_xx,ss = F P_xx,ss F^T + Q                                                  (317)

In MATLAB:

Pxxss = dlyap(F, Q)                                                          (318)

Also, Q > 0 implies P_xx,ss > 0.

Q: Where does the requirement that max(abs(eig(F))) < 1 for stability come from?
A: F = e^{AΔt}. One can show that |λ_F| = e^{Re(λ_A)Δt}, where λ_F is the eigenvalue of F corresponding to an eigenvalue λ_A of A. If Re(λ_A) < 0, then |λ_F| < 1.

11 Kalman filter for discrete-time linear system

Scribe: Alan Bernstein

1) Dynamics and measurement model

x(k+1) = F(k)x(k) + G(k)u(k) + v(k)                                          (319)
z(k) = H(k)x(k) + w(k)                                                       (320)
E[v(k)] = 0,    E[v(k)v^T(j)] = Q_k δ_jk                                     (321)
E[w(k)] = 0,    E[w(k)w^T(j)] = R_k δ_jk                                     (322)
E[w(k)v^T(j)] = 0 for all k, j                                               (323)

Note: uncorrelated v and w is not required, but will be assumed here to simplify the analysis.

2) A priori information about the initial state

E[x(0)|Z^0] = x̂(0)                                                          (324)
E[(x(0) − x̂(0))(x(0) − x̂(0))^T |Z^0] = P(0)                                (325)
E[w(k)(x(0) − x̂(0))^T] = 0 for all k ≥ 0                                    (326)
E[v(k)(x(0) − x̂(0))^T] = 0 for all k ≥ 0                                    (327)

3) v(k), w(k), x(0) are all Gaussian random variables

Here we have chosen some specific conditions for setting up the Kalman filter. Later, we will relax some of these conditions, or investigate the implications of them being violated.

There are several generic estimation problems:

Filtering: Determine x̂(k|Z^k) = x̂(k|z_0, …, z_k). Use measurements up to time k to estimate the state at time k. This can be done in real time, and causally: it does not depend on future states.

Smoothing: Determine x̂(j|Z^k) for j < k. Use future data to find an improved estimate of a historical state. This is noncausal.

Prediction: Determine x̂(j|Z^k) for j > k. Estimation of a future state; this gives the worst estimate of the three types.

"Prediction is always hard, especially when it's about the future." (Groucho Marx)

We will concentrate on the filtering problem for now.

There are several different standard notations for the filtering problem. Bar-Shalom's notation is unambiguous, but cumbersome, so we will use a cleaner alternative.

Bar-Shalom               Humphreys    others       name
x̂(k|Z^k)   = x̂(k|k)     x̂(k)        x̂^+(k)      a posteriori state estimate
x̂(k+1|Z^k) = x̂(k+1|k)   x̄(k+1)      x̂^-(k+1)    a priori state estimate
P(k|Z^k)    = P(k|k)      P(k)         P^+(k)       a posteriori state estimate error covariance
P(k+1|Z^k)  = P(k+1|k)    P̄(k+1)      P^-(k+1)     a priori state estimate error covariance

This notation is nice because it corresponds to the prior in the static estimation equations.
Filtering steps (derivation based on MMSE)

0) Set k = 0; then x̂(k), P(k) are known.

1) State and covariance propagation: predict the state and error covariance at step k+1, conditioned on data through z(k).

State estimate propagation:

x̄(k+1) = E[x(k+1)|Z^k]                                                      (328)
        = E[F(k)x(k) + G(k)u(k) + v(k)|Z^k]                                  (329)
        = F(k)E[x(k)|Z^k] + G(k)u(k) + E[v(k)|Z^k]                           (330)
        = F(k)x̂(k) + G(k)u(k)                                               (331)

since E[v(k)|Z^k] = 0.

State covariance propagation:

P̄(k+1) = E[(x(k+1) − x̄(k+1))(x(k+1) − x̄(k+1))^T |Z^k]                     (332)
        = F(k)P(k)F^T(k) + Q(k)                                              (333)

Some additional steps for this can be found on page 204 of Bar-Shalom. The cross terms are zero due to the fact that v(k) is zero mean and white, and orthogonal to x̃(k).

The first term tends to decrease (when the absolute values of the eigenvalues of F are less than one), but because of the additive Q term, the overall covariance grows (in the positive-definite sense).

2) Measurement update: use x̄(k+1) and P̄(k+1) (a priori info) and the measurement z(k+1) to get an improved state estimate with a reduced estimation error covariance, due to our measurement update. This next step has been solved previously in class, in the review of linear algebra.

Bar-Shalom derivation: get the distribution of [x(k+1) z(k+1)]^T, conditioned on Z^k, and solve by analogy to the static case.

z̄(k+1) = E[z(k+1)|Z^k]                                                      (334)
        = E[H(k+1)x(k+1) + w(k+1)|Z^k]                                       (335)
        = H(k+1)E[x(k+1)|Z^k] + E[w(k+1)|Z^k]                                (336)
        = H(k+1)x̄(k+1)                                                      (337)

P_zz(k+1) = E[(z(k+1) − z̄(k+1))(z(k+1) − z̄(k+1))^T |Z^k]
          = E[[H(k+1)(x(k+1) − x̄(k+1)) + w(k+1)][H(k+1)(x(k+1) − x̄(k+1)) + w(k+1)]^T |Z^k]
          = H(k+1)P̄(k+1)H^T(k+1) + R(k+1)

P_xz(k+1) = E[(x(k+1) − x̄(k+1))(z(k+1) − z̄(k+1))^T |Z^k]
          = E[(x(k+1) − x̄(k+1))(x(k+1) − x̄(k+1))^T]H^T(k+1) + E[(x(k+1) − x̄(k+1))w^T(k+1)]
          = P̄(k+1)H^T(k+1)

since the last expectation is zero. Thus, we have all the moments required to specify p([x(k+1); z(k+1)] | Z^k), so

[x(k+1); z(k+1)] ~ N( [x̄(k+1); H(k+1)x̄(k+1)],  [P̄, P̄H^T; HP̄, HP̄H^T + R] )    (338)

where the (k+1) index on the elements of the covariance matrix is suppressed for brevity, and the distribution is conditioned on Z^k.

We seek:

x̂(k+1) = E[x(k+1)|Z^{k+1}]
        = E[x(k+1)|z(k+1), Z^k]
        = E[x(k+1)|z(k+1)]

Here, the conditioning on Z^k is implicit; by suppressing this conditioning, the form of this problem is made to resemble the static case. So, the problem has now been reduced to one we've already solved (in Linear Estimation for Static Systems):

x̂(k+1) = x̄(k+1) + P_xz(k+1)P_zz^{-1}(k+1)[z(k+1) − z̄(k+1)]                (339)
P(k+1) = P̄(k+1) − P_xz(k+1)P_zz^{-1}(k+1)P_xz^T(k+1)                        (340)

Substituting:

x̂ = x̄ + P̄H^T[HP̄H^T + R]^{-1}[z − Hx̄]
P = P̄ − P̄H^T[HP̄H^T + R]^{-1}HP̄

where the (k+1) index is suppressed on each term in both expressions for brevity.

Using Bar-Shalom notation:

S(k+1) = P_zz = H(k+1)P̄(k+1)H^T(k+1) + R(k+1)        (innovation covariance)
ν(k+1) = z(k+1) − H(k+1)x̄(k+1)                        (innovation)
W(k+1) = P̄(k+1)H^T(k+1)S^{-1}(k+1)                    (Kalman gain matrix)
x̂(k+1) = x̄(k+1) + W(k+1)ν(k+1)
P(k+1) = P̄(k+1) − W(k+1)S(k+1)W^T(k+1)

Summary: given x̂(0), P(0),

0) set k = 0

1) propagate state and covariance (compute x̄(k+1), P̄(k+1))

2) measurement update of state and covariance; compute:

ν(k+1) = z(k+1) − H(k+1)x̄(k+1)                        (innovation)
S(k+1) = H(k+1)P̄(k+1)H^T(k+1) + R(k+1)                (innovation covariance)
W(k+1) = P̄(k+1)H^T(k+1)S^{-1}(k+1)                    (gain)
x̂(k+1) = x̄(k+1) + W(k+1)ν(k+1)                       (a posteriori, or filtered, state estimate)
P(k+1) = P̄(k+1) − W(k+1)S(k+1)W^T(k+1)                (a posteriori state error covariance)

3) k ← k+1

4) go to (1)
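A minimal MATLAB sketch of one pass through steps 1)-2) above; all variable names are illustrative.

    % Minimal sketch: one Kalman filter propagate + update cycle.
    % xk, Pk  : a posteriori estimate and covariance at step k
    % F, G, Q : dynamics model at step k;  H, R : measurement model at step k+1
    % uk, zk1 : control at step k and measurement at step k+1
    xbar = F*xk + G*uk;                   % state propagation
    Pbar = F*Pk*F' + Q;                   % covariance propagation
    nu   = zk1 - H*xbar;                  % innovation
    S    = H*Pbar*H' + R;                 % innovation covariance
    W    = Pbar*H' / S;                   % Kalman gain
    xk1  = xbar + W*nu;                   % a posteriori state estimate
    Pk1  = Pbar - W*S*W';                 % a posteriori covariance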

12 Alternative formulas for covariance and gain of Kalman filter

Note: These are algebraically equivalent to the formulas given previously.

P(k+1) = [P̄^{-1}(k+1) + H^T(k+1)R^{-1}(k+1)H(k+1)]^{-1}                     (341)

P(k+1) = [I − W(k+1)H(k+1)]P̄(k+1)[I − W(k+1)H(k+1)]^T
         + W(k+1)R(k+1)W^T(k+1)                                              (342)

The second of these is called the Joseph form of the state covariance update. It guarantees that P(k+1) > 0, even with the limited precision of a computer. The Kalman filter assumes that P(k+1) is always positive definite, so this can help implementation.

W(k+1) = P(k+1)H^T(k+1)R^{-1}(k+1)                                           (343)

Since P(k+1) is usually calculated using W(k+1), this last form is more useful for derivation and analysis than as part of the (causal) filter.

12.0.1 Interpretation of the Kalman gain

What is the W(k+1)? Again repeat the definition of it:

W(k+1) = P̄(k+1)H^T(k+1)[H(k+1)P̄(k+1)H^T(k+1) + R(k+1)]^{-1}                (344)
x̂(k+1) = x̄(k+1) + W(k+1)ν(k+1)                                             (345)

Remarks

- As W(k+1) → 0, new measurements are not taken into account in x̂(k+1).

- If P̄(k+1) is very small then W(k+1) is also small and x̂(k+1) differs little from x̄(k+1). (P̄(k+1) being small implies that x̄(k+1) is a good estimate.)

- If P̄(k+1) is so big that H(k+1)P̄(k+1)H^T(k+1) ≫ R(k+1), then W(k+1) approaches an upper limit. In this case we trust the measurements almost entirely in the subspace in which they give information.

The subspace in which the measurements provide information:

If n_z < n_x, then the measurements provide information in an n_z-dimensional subspace that is determined by H(k+1). Specifically, this subspace is the complement of the null space of H(k+1), namely

range[H^T(k+1)] = (null[H(k+1)])^⊥

and P̄(k+1) gives the weighting of the update in this subspace. This can be visualized in two dimensions:

Figure 19: The state update occurs in the subspace in which the measurements provide information.

12.0.2 Generalization for Weighted Process Noise

We can modify the dynamics equations to have a weighting on the noise such that:

x(k+1) = F(k)x(k) + G(k)u(k) + Γ(k)v(k)                                      (346)

Here setting Γ = I recovers the previous (unweighted) form. With this generalization, the only change in the Kalman filter is in the covariance propagation, which is now given by:

P̄(k+1) = F(k)P(k)F^T(k) + Γ(k)Q(k)Γ^T(k)                                    (347)

12.0.3 Deriving the Kalman Filter from a MAP approach

The value of approaching the filtering problem from a maximum a posteriori (MAP) estimation approach:

- It can be better for nonlinear problems or whenever E[x|Z^k] is difficult to determine

- It aids interpretation of square-root filtering techniques

- It aids in statistical hypothesis testing of the Kalman filter

For discrete, stochastic, linear, time-varying (SLTV) systems, the result will be equivalent to the MMSE-based derivation:

x̂_MAP(k) = x̂_MMSE(k)

12.0.4 Setting up the cost function

Note: We will be estimating the process noise v(k) along with the state. This comes out of the equations anyway and sets us up for doing smoothing in the future.

Note: Conditioning on Z^k is implied in all of this derivation but not shown explicitly, for brevity.

p[x(k+1), v(k)|z(k+1)] = p[z(k+1)|x(k+1), v(k)] p[x(k+1), v(k)] / p[z(k+1)]   (348)

Now maximize p[x(k+1), v(k)|z(k+1)] with respect to x(k+1) and v(k). This allows us to essentially ignore p[z(k+1)] and just maximize the numerator. This is equivalent to minimizing the cost function J(x(k+1), v(k)) (where log() denotes the natural logarithm):

J(x(k+1), v(k)) = −log(p[z(k+1)|x(k+1), v(k)]) − log(p[x(k+1), v(k)])        (349)

Note that p_{x(k+1),v(k)}[x(k+1), v(k)] = C p_{x(k),v(k)}[x(k), v(k)], where C is a constant and the subscripts make clear that these are two distinct probability distributions.

Q: Why can we make the above transformation from p[x(k+1), v(k)] to p[x(k), v(k)]?
A: Because x(k+1) is a function of x(k). More generally, for any invertible 1-to-1 function Y = g(X), it can be shown that

p_Y[y] = p_X[g^{-1}(y)] / |dy/dx|

Applying this transformation to the probability leads to an additive constant in the cost function because of the log, and then this constant can be ignored in minimizing the cost function. The cost function becomes the sum of three parts:

J(x(k+1), v(k)) = (1/2)[z(k+1) − H(k+1)x(k+1)]^T R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
                  + (1/2)[x(k) − x̂(k)]^T P^{-1}(k)[x(k) − x̂(k)] + (1/2) v^T(k)Q^{-1}(k)v(k)

12.0.5 Minimizing the cost function

The cost function developed above can be written succinctly as:

J(x(k+1), v(k)) = (1/2)[z(k+1) − H(k+1)x(k+1)]^T R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
                  + (1/2)[x(k) − x̂(k)]^T P^{-1}(k)[x(k) − x̂(k)] + (1/2) v^T(k)Q^{-1}(k)v(k)   (350)

Replace x(k) with x(k) = F^{-1}(k)[x(k+1) − G(k)u(k) − Γ(k)v(k)]:

J(x(k+1), v(k)) = (1/2)[z(k+1) − H(k+1)x(k+1)]^T R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
                  + (1/2)[F^{-1}(k)[x(k+1) − G(k)u(k) − Γ(k)v(k)] − x̂(k)]^T P^{-1}(k)
                    [F^{-1}(k)[x(k+1) − G(k)u(k) − Γ(k)v(k)] − x̂(k)]
                  + (1/2) v^T(k)Q^{-1}(k)v(k)                                (351)

Simplify this some by recalling that x̄(k+1) = F(k)x̂(k) + G(k)u(k):

J(x(k+1), v(k)) = (1/2)[z(k+1) − H(k+1)x(k+1)]^T R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
                  + (1/2)[F^{-1}(k)[x(k+1) − Γ(k)v(k) − x̄(k+1)]]^T P^{-1}(k)
                    [F^{-1}(k)[x(k+1) − Γ(k)v(k) − x̄(k+1)]]
                  + (1/2) v^T(k)Q^{-1}(k)v(k)                                (352)

Now minimize J(x(k+1), v(k)) with respect to x(k+1) and v(k) by finding the respective first-order necessary conditions:

(∂J/∂v(k))^T = 0 = −Γ^T(k)F^{-T}(k)P^{-1}(k)F^{-1}(k)[x(k+1) − Γ(k)v(k) − x̄(k+1)]
                   + Q^{-1}(k)v(k)                                           (a)  (353)

(∂J/∂x(k+1))^T = 0 = −H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
                     + F^{-T}(k)P^{-1}(k)F^{-1}(k)[x(k+1) − Γ(k)v(k) − x̄(k+1)]   (b)  (354)

Now we wish to reformulate (a) and (b) above into a form such that

A [x(k+1); v(k)] = [C_1; C_2].

Then we can solve by taking the inverse of the coefficient matrix A, or by substitution (Cholesky factorization).

12.0.6 Solving for the minimum-cost estimate

In order to get the equations (a) and (b) into the desired form, first solve (a) for v(k):

v(k) = [Γ^T(k)F^{-T}(k)P^{-1}(k)F^{-1}(k)Γ(k) + Q^{-1}(k)]^{-1}
       Γ^T(k)F^{-T}(k)P^{-1}(k)F^{-1}(k)[x(k+1) − x̄(k+1)]                   (355)

Now using the matrix inversion lemma and then the definition of P̄(k+1):

v(k) = Q(k)Γ^T(k)[F(k)P(k)F^T(k) + Γ(k)Q(k)Γ^T(k)]^{-1}[x(k+1) − x̄(k+1)]    (356)
     = Q(k)Γ^T(k)P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]                               (357)

Substitute this result into (b) to get the next equation, and then collect terms and manipulate to simplify:

0 = H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
    − F^{-T}(k)P^{-1}(k)F^{-1}(k)[x(k+1) − Γ(k)Q(k)Γ^T(k)P̄^{-1}(k+1)[x(k+1) − x̄(k+1)] − x̄(k+1)]   (358)

  = H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
    − F^{-T}(k)P^{-1}(k)F^{-1}(k)[I − Γ(k)Q(k)Γ^T(k)P̄^{-1}(k+1)][x(k+1) − x̄(k+1)]   (359)

Now replacing I with P̄(k+1)P̄^{-1}(k+1) we have:

0 = H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)]
    − F^{-T}(k)P^{-1}(k)F^{-1}(k)[P̄(k+1) − Γ(k)Q(k)Γ^T(k)]P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]   (360)

Then, using the definition of P̄(k+1), it can be shown that F^{-T}(k)P^{-1}(k)F^{-1}(k)[P̄(k+1) − Γ(k)Q(k)Γ^T(k)] = I, so the equation becomes:

0 = H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x(k+1)] − P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]   (361)

Now solve for x(k+1), which we now call x̂(k+1) because it is the a posteriori state estimate (after the measurement update). Then manipulate the result to get it in the same form as in the previous Kalman filter derivation:

x̂(k+1) = [P̄^{-1}(k+1) + H^T(k+1)R^{-1}(k+1)H(k+1)]^{-1}
         [P̄^{-1}(k+1)x̄(k+1) + H^T(k+1)R^{-1}(k+1)z(k+1)]                   (362)
       = x̄(k+1) + [P̄^{-1}(k+1) + H^T(k+1)R^{-1}(k+1)H(k+1)]^{-1} H^T(k+1)R^{-1}(k+1)
         [z(k+1) − H(k+1)x̄(k+1)]                                            (363)
       = x̄(k+1) + W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                            (364)

As expected, this agrees with the previous derivation. We can also substitute this back into v(k) to get an estimate for it:

v̂(k) = Q(k)Γ^T(k)P̄^{-1}(k+1)[x̂(k+1) − x̄(k+1)]                             (365)
      = Q(k)Γ^T(k)P̄^{-1}(k+1)W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                  (366)

This is extra information which we did not get from the previous MMSE derivation.

The end result for the state estimate x̂(k+1) is the same as for the MMSE-based derivation, but now there is also an estimate for the process noise v(k). W(k+1) is defined as before.

x̂(k+1) = x̄(k+1) + W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                            (367)
v̂(k) = Q(k)Γ^T(k)P̄^{-1}(k+1)W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                  (368)

Note: Even in a perfect model, v̂(k) ≠ v(k), because v̂(k) is conditioned on z(k+1) and is not white.

13 Stability and Consistency of Kalman Filter

Scribe: Michael Szmuk

13.1 Stability of KF

Assume v(k) = w(k) = 0 for all k (zero-input stability). We want to show that the error vector decays to zero:

e(k) = x(k) − x̂(k)                                                          (369)

Pseudo proof:

e(k+1) = x(k+1) − x̂(k+1)                                                    (370)
       = F(k)x(k) + G(k)u(k) − [x̄(k+1) + W(k+1){z(k+1) − H(k+1)x̄(k+1)}]    (371)

However,

z(k+1) = H(k+1)x(k+1) = H(k+1)[F(k)x(k) + G(k)u(k)]                          (372)
x̄(k+1) = F(k)x̂(k) + G(k)u(k)                                               (373)

Substituting the above causes the u(k)'s to cancel, giving the error dynamics for v(k) = w(k) = 0 for all k:

e(k+1) = [I − W(k+1)H(k+1)]F(k)e(k)                                          (374)

It is a little tricky to prove stability because this is a time-varying system. If the system were not time-varying, it would be possible to look at the moduli of the eigenvalues. To analyze, we will use a Lyapunov-type energy method to prove stability. Define an energy-like function

V[k, e(k)] = (1/2) e^T(k)P^{-1}(k)e(k)                                       (375)

Then, V is a weighted 2-norm of e(k) because P(k) > 0. Therefore, V ≥ 0, with equality if and only if e(k) = 0. We wish to show that V always gets smaller as k increases.

V[k+1, e(k+1)] = (1/2) e^T(k+1)P^{-1}(k+1)e(k+1)                             (376)
               = (1/2) e^T(k)[P(k) + D(k)]^{-1} e(k)                         (377)

where

D(k) ≜ F^{-1}(k)[Γ Q(k)Γ^T + P̄(k+1)H^T(k+1)R^{-1}(k+1)H(k+1)P̄(k+1)]F^{-T}(k)   (378)

Note D(k) ≥ 0, which implies [P(k) + D(k)]^{-1} < P^{-1}(k). Thus

V[k+1, e(k+1)] ≤ V[k, e(k)]

under suitable conditions on Q(k), R(k), F(k), and H(k) (i.e. Q and R not too big or too small, observable, controllable with respect to the points of entry of the process noise, or "stochastic controllability and observability").

Here observable implies that all unstable or neutrally stable subspaces of the original system are observable. However, the original system need not be stable! Then, we can show that:

1. P^{-1}(k) > 0 for some bound

2. V[k+N, e(k+N)] < γ V[k, e(k)] for some γ: 0 < γ < 1 and some N.

Thus, V is decreasing, but not because P(k) is increasing.

13.2 Control of a System Estimated by KF

Recall the system:

x(k+1) = F(k)x(k) + G(k)u(k) + Γ(k)v(k)                                      (379)
z(k) = H(k)x(k) + w(k)                                                       (380)

Recall the Kalman filter equations:

ν(k+1) = z(k+1) − H(k+1)x̄(k+1)                       (innovation)            (381)
S(k+1) = H(k+1)P̄(k+1)H^T(k+1) + R(k+1)                (innovation covariance) (382)
W(k+1) = P̄(k+1)H^T(k+1)S^{-1}(k+1)                    (Kalman gain matrix)    (383)
x̂(k+1) = x̄(k+1) + W(k+1)ν(k+1)                       (a posteriori state est.) (384)
P(k+1) = P̄(k+1) − W(k+1)S(k+1)W^T(k+1)                (a posteriori state error cov.) (385)

Now, suppose we have a control law u(k) = −C(k)x(k). Then, the closed-loop dynamics are given by

x(k+1) = [F(k) − G(k)C(k)]x(k) + Γ(k)v(k)                                    (386)

If the pair (F, G) is controllable then we can design a stabilizing controller. However, it may be expensive, impractical, or impossible to measure all of the states. What if we use x̂(k) instead? Can this stabilize the system?

Q: Can we substitute x̂(k) for x(k)? Alternatively, can we measure z(k), estimate x(k), and feed back x̂(k)?
A: Then u(k) = −C(k)x̂(k). The system becomes:

x(k+1) = F(k)x(k) − G(k)C(k)x̂(k) + Γ(k)v(k)                                 (387)
x̂(k+1) = [I − W(k+1)H(k+1)][F(k)x̂(k) − G(k)C(k)x̂(k)]
          + W(k+1){H(k+1)[F(k)x(k) − G(k)C(k)x̂(k) + Γ(k)v(k)] + w(k+1)}     (388)-(389)

Change coordinates:

[x(k); x̂(k)] → [x(k); e(k)] = [x(k); x(k) − x̂(k)].

Then, the dynamics of the overall controller-estimator system are given by:

x(k+1) = [F(k) − G(k)C(k)]x(k) + G(k)C(k)e(k) + Γ(k)v(k)                     (390)
e(k+1) = [I − W(k+1)H(k+1)]F(k)e(k)
         + [I − W(k+1)H(k+1)]Γ(k)v(k) − W(k+1)w(k+1)                         (391)-(392)

Q: Is this system stable?
A: For analysis, ignore the exogenous inputs. Also, recognize that the error dynamics do not depend on the state (not true for nonlinear systems). We already showed that e(k) → 0 for any linear KF that satisfies the stochastic observability conditions, etc. Then, after a long time e(k) ≈ 0 and the state dynamics obey

x(k+1) = [F(k) − G(k)C(k)]x(k)                                               (393)

which is stable by our choice of C(k).

This is called the separation principle. That is, one can design a full-state feedback control law separately from the KF. When connected, the system will be stable. The combined system ends up with the properties of the two independent systems (in terms of poles and zeros). In practice, the poles of the observer should be to the left of (i.e. faster than) those of the controlled plant or process.

Note: If the estimator has modeling errors (e.g. F or H not perfectly known), then the system may not decouple.

13.3 Matrix Riccati Equation

Recall that

P̄(k+1) = F(k)P(k)F^T(k) + Γ(k)Q(k)Γ^T(k)                                    (394)

We substitute for P(k) to get:

P̄(k+1) = F(k){P̄(k) − P̄(k)H^T(k)[H(k)P̄(k)H^T(k) + R(k)]^{-1}H(k)P̄(k)}F^T(k)
          + Γ(k)Q(k)Γ^T(k)                                                   (395)-(396)

This gives us a dynamic model for P̄(k), which is nonlinear in P̄(k). This is called the Matrix Riccati Equation (MRE). Beware: analysis of the MRE is not easy!

Special case: steady-state solution for LTI systems, F(k) = F, G(k) = G, etc., with Q, R > 0. If the pair (F, H) is observable, and (F, Γ) is controllable, then the MRE converges to a steady-state solution P̄_ss > 0, which can be determined by solving the Algebraic Riccati Equation (ARE):

P̄_ss = F {P̄_ss − P̄_ss H^T[H P̄_ss H^T + R]^{-1} H P̄_ss} F^T + Γ Q Γ^T     (397)

for any P(0) > 0. The MATLAB function for solving the ARE is:

[Wss, P̄ss, Pss] = dlqe(F, Γ, H, Q, R)                                        (398)

13.4 Steady-State KF Equations

The steady-state Kalman filter equations are given by

x̄(k+1) = F x̂(k) + G u(k)                                                   (399)
x̂(k+1) = x̄(k+1) + W_ss [z(k+1) − H x̄(k+1)]                               (400)

13.5 Steady-State Error Dynamics

The steady-state error dynamics are given by

e(k+1) = A_ss F e(k)                                                         (401)

where

A_ss ≜ I − W_ss H                                                            (402)

We know that the dynamics of e(k) are stable from max(abs(eig(A_ss F))) < 1.

NB: The original system dynamics may not have been stable.

13.6 Properties of KF Innovations

The KF innovation vector is given by ν(k) = z(k) − H(k)x̄(k), where E[ν(k)] = 0. We can show that E[ν(k)ν^T(j)] = S(k)δ_kj (i.e. the innovation is white), where we recall that

S(k) ≜ H(k)P̄(k)H^T(k) + R(k)                                                (403)

The whiteness of ν(k) follows from the whiteness of the process noise. Recall from the MAP derivation that

v(k) = [Γ^T F^{-T} P^{-1} F^{-1} Γ + Q^{-1}]^{-1} Γ^T F^{-T} P^{-1} F^{-1} [x(k+1) − x̄(k+1)]   (404)

Using the matrix inversion lemma,

v(k) = Q(k)Γ^T(k)P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]                              (405)

where we recall that

P̄(k+1) ≜ F P F^T + Γ Q Γ^T                                                  (406)

Then,

0 = H^T R^{-1}[z(k+1) − Hx(k+1)]
    − F^{-T} P^{-1} F^{-1} {x(k+1) − Γ Q Γ^T P̄^{-1}(k+1)[x(k+1) − x̄(k+1)] − x̄(k+1)}   (407)-(409)
  = H^T R^{-1}[z(k+1) − Hx(k+1)]
    − F^{-T} P^{-1} F^{-1} [P̄(k+1) − Γ Q Γ^T] P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]            (410)-(412)
  = H^T R^{-1}[z(k+1) − Hx(k+1)] − P̄^{-1}(k+1)[x(k+1) − x̄(k+1)]                        (413)-(414)

where the last step is achieved using the definition of P̄(k+1). We can then solve for x̂(k+1) such that

x̂(k+1) = [P̄^{-1} + H^T R^{-1} H]^{-1}[P̄^{-1} x̄(k+1) + H^T R^{-1} z(k+1)]   (415)

This is just like the recursive least-squares form. With further manipulation we can get:

x̂(k+1) = x̄(k+1) + [P̄^{-1} + H^T(k+1)R^{-1}(k+1)H(k+1)]^{-1}
          H^T(k+1)R^{-1}(k+1)[z(k+1) − H(k+1)x̄(k+1)]                        (416)-(417)
        = x̄(k+1) + W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                           (418)

Substituting this into the equation for v(k) gives

v̂(k) = E[v(k)|Z^{k+1}] = Q(k)Γ^T(k)P̄^{-1}(k+1)[x̂(k+1) − x̄(k+1)]           (419)
      = Q(k)Γ^T(k)P̄^{-1}(k+1)W(k+1)[z(k+1) − H(k+1)x̄(k+1)]                  (420)

This gives us more information about the process noise.

Q: How do we know if the filter is working properly?
A: If it is, then E[ν(k)ν^T(j)] should behave as δ_kj S(k). If not, then there could be a modeling or coding error, or the system may be subject to colored noise (i.e. non-white noise).

NB: This is only a necessary condition, not a sufficient one. That is, satisfying these conditions does not imply that the filter will work properly.

13.7 Likelihood of a Filter Model

Consider the MAP form of the KF:

min J[x(k+1), v(k)] = J[x̂(k+1), v̂(k)]                                      (421)
                    = (1/2) ν^T(k+1) S^{-1}(k+1) ν(k+1)                      (422)

Use this to get the likelihood function for the correctness of the KF model:

p[Z^k | KF model] = C exp{−J[x̂(1), v̂(0)]} exp{−J[x̂(2), v̂(1)]} ⋯ exp{−J[x̂(k), v̂(k−1)]}   (423)
                  = C exp{ −(1/2) Σ_{j=1}^{k} ν^T(j) S^{-1}(j) ν(j) }        (424)

Suppose we have multiple KF models with different F, Γ, Q, R, H, x̂(0), P(0).

Q: Which model do we trust the most?
A: The one with the largest p[Z^k | KF model]. This amounts to finding the KF with the minimum

J ≜ Σ_{j=1}^{k} ν^T(j) S^{-1}(j) ν(j)                                        (425)

That is, the one with the minimum weighted least-squares error. This leads to the Multiple Model approach.

14 Kalman Filter Consistency

Scribe: Anamika Dubey

Recall the definition of consistency:

E[x̃(k)x̃^T(k)] → 0                                                          (426)

This doesn't hold in most cases, as evidenced by the fact that P(k) does not go to zero in most cases. The culprit is the process noise. Hence, for our purposes we decrease our requirements for consistency.

14.1 Consistency Definition

A state estimator (KF) is called consistent if its state estimation errors satisfy the conditions given below. This is a finite-sample consistency property, which means the analysis of the estimation errors is based on a finite number of samples.

1. Unbiased estimator: the errors have zero mean,

E[x̃(k)] = 0                                                                 (427)

2. Efficient estimator: the errors have the covariance matrix calculated by the filter,

E[x̃(k)x̃^T(k)] = P(k) = J^{-1}(k)                                           (428)

where J(k) is the Fisher information matrix.

In contrast, parameter estimator consistency is an asymptotic (infinite sample size) property. Typically the consistency criteria of the filter are as follows:

1. The state errors should have zero mean and have the covariance matrix calculated by the filter.

2. The innovations should also have the same property as mentioned in 1.

3. The innovations should be acceptable as white.

The first criterion, which is the most important one, can be tested only in simulation (Monte Carlo simulations). The last two criteria can be tested on real data (single-run / multiple-runs). In theory these properties should hold, but in practice they might not. The following are the reasons for this:

- Modeling error - can be addressed by tuning the filter

- Numerical error - can be addressed using the square root information (SRI) technique

- Programming error - solve by fixing the bugs

14.2 Statistical tests for KF performance

Consistency of a filter can be tested in two ways. First, off-line tests, using multiple runs (Monte Carlo simulations); second, real-time tests, using real-time measurements from a single run or multiple runs of the experiment (if the experiment can be repeated).

Figure 20: Truth model simulation for Monte Carlo test

14.2.1 Monte Carlo Simulation based test

Consider a truth model simulation. The truth model uses the dynamics equations to generate measurements z(k) and state vectors x(k). The measurement vectors are then used as input to the Kalman filter (under evaluation), and estimated states x̂(k) are generated.

Using the notation

x̃(k) = x(k) − x̂(k)                                                         (429)

define the normalized estimation error squared (NEES) as

ε(k) = x̃^T(k) P^{-1}(k) x̃(k)                                               (430)

If the KF is working properly, then ε(k) should be distributed as chi-square with n_x degrees of freedom.

To demonstrate this further, we decompose P^{-1}(k) = V(k)Λ(k)V^T(k), where V(k)V^T(k) = I and Λ(k) is a diagonal matrix. Let

ȳ(k) = V^T(k) x̃(k)                                                         (431)

Then,

ε(k) = ȳ^T(k) Λ(k) ȳ(k) = Σ_{i=1}^{n_x} λ_i ȳ_i²                           (432)

which is distributed as χ² with n_x degrees of freedom, i.e. mean(ε(k)) = n_x.

We can do N Monte Carlo simulations of our truth model, filter the measurements, and see if the average of the normalized estimation error squared (NEES) approaches n_x.

Let

ε̄_k = (1/N) Σ_{i=1}^{N} ε^i(k)                                              (433)

where ε^i(k) denotes the NEES from the i-th Monte Carlo simulation. Then, if the filter is working properly, ε̄_k approaches n_x as N → ∞. Note that N ε̄_k is distributed as χ² with N n_x degrees of freedom.

Figure 21: Hypothesis test for the average NEES (chi-square density, with the two α/2 tails at N r_1 and N r_2 and probability (1 − α) in between, centered near N n_x).

We do a χ² hypothesis test: under the H_0 hypothesis that the filter is consistent, ε̄_k should satisfy r_1 ≤ ε̄_k ≤ r_2 a fraction (1 − α)·100% of the time, where α is the false-alarm probability. r_1 and r_2 are chosen such that

∫_{N r_1}^{N r_2} p(ε) dε = (1 − α)                                          (434)

We usually choose α = 0.01 or α = 0.05.

In MATLAB:

r1 = chi2inv(alpha/2, N*nx)/N    and    r2 = chi2inv(1 - alpha/2, N*nx)/N

Note: If these limits are violated then something is wrong with the filter.
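A minimal MATLAB sketch of the average-NEES consistency check above, assuming the per-run estimation errors xtil(:,k,i) and the filter covariance Pk at step k are already available from N truth-model/filter runs; all names are illustrative.

    % Minimal sketch: average NEES test at one time step k over N Monte Carlo runs.
    alpha = 0.05;
    eps_i = zeros(N,1);
    for i = 1:N
        xt       = xtil(:,k,i);          % estimation error x(k) - xhat(k) for run i
        eps_i(i) = xt' * (Pk \ xt);      % NEES, Eq. (430)
    end
    eps_bar = mean(eps_i);
    r1 = chi2inv(alpha/2,   N*nx) / N;   % lower acceptance bound
    r2 = chi2inv(1-alpha/2, N*nx) / N;   % upper acceptance bound
    consistent = (eps_bar >= r1) && (eps_bar <= r2);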
14.2.2 Real-Time (Multiple-Runs) Tests

This test is done on the filter (KF) using real-time data for the dynamic model under evaluation. The test is applicable to experiments that can be repeated in the real world; hence the dynamic model under evaluation should be available for N real-time runs.

First compute

ε_ν(k) = ν^T(k) S^{-1}(k) ν(k)                                               (435)

If the KF is working properly, then this is distributed as χ² with n_z degrees of freedom. We can do N independent runs of the real experiment (if it can be repeated) and calculate

ε̄_ν(k) = (1/N) Σ_{i=1}^{N} ε_ν^i(k)                                         (436)

Note that N ε̄_ν(k) is distributed as χ² with N n_z degrees of freedom. If α is the false-alarm probability, then for the hypothesis test of filter consistency, ε̄_ν(k) should satisfy r_1 ≤ ε̄_ν(k) ≤ r_2 a fraction (1 − α)·100% of the time, where r_1 and r_2 are given, in MATLAB, by

r1 = chi2inv(alpha/2, N*nz)/N    and    r2 = chi2inv(1 - alpha/2, N*nz)/N

Note: If these limits are violated then something is wrong with the filter.

In addition to testing the size of ε_ν(k), we can test for its whiteness as well. Compute the (ensemble) autocorrelation statistic

ρ̄_lm(k, j) = Σ_{i=1}^{N} ν_l^i(k) ν_m^i(j) / sqrt( Σ_{i=1}^{N} [ν_l^i(k)]² · Σ_{i=1}^{N} [ν_m^i(j)]² )   (437)

When k = j and l = m, ρ̄_lm(k, k) = 1. If k ≠ j, then we expect ρ̄_lm(k, j) to be small compared to 1. For N large enough and k ≠ j, this statistic can be approximated as normally distributed with zero mean and 1/N as its variance.

Figure 22: Hypothesis test for the autocorrelation function (normal density with a (1 − α)·100% acceptance region).

E[ρ̄_lm(k, j)] = 0                                                           (438)
E[ρ̄_lm(k, j) ρ̄_lm^T(k, j)] = 1/N                                           (439)

Hence, −r ≤ ρ̄_lm(k, j) ≤ r a fraction (1 − α)·100% of the time, where r is given as r = norminv(1 − α/2, 0, 1/sqrt(N)).

Note: Typically, to do the whiteness test, we look at l = m and just look at k and k + 1.

14.2.3 Real-Time (Single-Run) Tests

All the above tests assume that N independent runs have been made. While they can be used on a single run, they might have a high variability. The question is whether one can achieve a low variability based on a single run, as a real-time implementation. This test for filter consistency is called the real-time consistency test.

These tests are based on replacing the ensemble averages by time averages, based on the ergodicity of the innovation sequence.

The time-averaged normalized innovation statistic, over K time steps, is given by

ε̄_ν = (1/K) Σ_{k=1}^{K} ν^T(k) S^{-1}(k) ν(k)                               (440)

K ε̄_ν is distributed as χ² with K n_z degrees of freedom. Hence, the same hypothesis test applies, with only one experiment.

Similarly, the whiteness test can be done. The whiteness test statistic for innovations which are j time steps apart can be written as a time-averaged autocorrelation,

ρ̄_lm(j) = Σ_{k=1}^{K} ν_l(k) ν_m(k+j) / sqrt( Σ_{k=1}^{K} ν_l²(k) · Σ_{k=1}^{K} ν_m²(k+j) )   (441)

This statistic is normally distributed for large K. Furthermore, it can be shown that for large K,

E[ρ̄_lm(j)] = 0                                                              (442)
E[ρ̄_lm(j) ρ̄_lm^T(j)] = 1/K                                                 (443)

Question: What if one of the above tests fails?
Answer: Consider filter tuning.

Strategy for filter tuning: Assume F(k), H(k), Γ(k), R(k) are correct. Since we know least about the process noise, vary Q(k) to pass the consistency test.

1) If ε_ν(k) is too small, it means the model fits the measurements too well. This is the case if Q(k) is too big; the filter is then making too much adjustment to x̂(k) in response to each measurement. Recall P̄(k+1) = F(k)P(k)F^T(k) + Γ(k)Q(k)Γ^T(k). If Q(k) is too big then so is P̄(k+1). Since S(k+1) = H(k+1)P̄(k+1)H^T(k+1) + R(k+1), S^{-1}(k+1) will then be too small, which will make ε_ν(k) too small. The solution is to decrease Q(k), e.g. Q_new(k) = Q_old(k)/10.

2) If ε_ν(k) is too big, then increase Q(k).

15 CORRELATED PROCESS AND MEASUREMENT NOISE

15 Correlated Pro ess and Measurement Noise


S ribe: Henri Kjellberg
Base assumptions of the Kalman Filter:
1. Pro ess noise is white
2. Measurement noise is white
3. The two noises are un orrelated

Q: What if one or more of these assumptions is violated?


A: Let's ba k up and onsider our noise sour es in the frequen y domain.
time noise. Usually by white we mean spe i ally that:

Svv (f ) = power

Consider ontinuous

spe trum of

v=Const

V.

By the Wiener-Khinchin theorem, take the inverse Fourier transform to recover the auto-correlation function:

E[\tilde{v}(t)\, \tilde{v}(t+\tau)] = R_{\tilde v \tilde v}(\tau) = \mathcal{F}^{-1}[S_{\tilde v \tilde v}(f)] = V\, \delta(\tau)     (444)

A flat power spectrum is therefore uncorrelated in time. This implies that a nonuniform power spectrum leads to auto-correlated noise:

Figure 23: An example of a low-pass process


Strategy: the power spectrum of the auto-correlated noise can be approximated as closely as desired by the output of a linear subsystem driven by white noise.

Figure 24: Power spectrum approximation by a linear shaping system (white noise in, auto-correlated noise out, driving the original system)

\dot{x}(t) = A(t)\, x(t) + B(t)\, u(t) + D(t)\, \tilde{v}(t)     (445)

z(t) = C(t)\, x(t) + \tilde{w}(t) + \tilde{n}(t)     (446)

Assume E[\tilde{v}(t)] = E[\tilde{w}(t)] = E[\tilde{n}(t)] = 0.

Figure 25: Augmented system (white noise passed through shaping filters to produce the auto-correlated noises driving the original system)

Let

\eta(t) = \begin{bmatrix} \tilde{v}(t) \\ \tilde{w}(t) + \tilde{n}(t) \end{bmatrix}

The shaping or pre-whitening filters can be implemented in state space as follows:

\dot{x}_\eta(t) = A_\eta\, x_\eta(t) + B_\eta\, \bar{n}(t)     (447)

\eta(t) = C_\eta\, x_\eta(t) + D_\eta\, \bar{n}(t)     (448)

where \bar{n}(t) is white and

\eta(t) = \begin{bmatrix} \tilde{v}(t) \\ \tilde{w}(t) + \tilde{n}(t) \end{bmatrix}     (449)

The output of the shaping system can be used to drive the original system. The augmented dynamics become:

\begin{bmatrix} \dot{x}(t) \\ \dot{x}_\eta(t) \end{bmatrix}
=
\underbrace{\begin{bmatrix} A(t) & D(t)\,[\,I \;\; 0\,]\, C_\eta \\ 0 & A_\eta \end{bmatrix}}_{\text{new A matrix}}
\begin{bmatrix} x(t) \\ x_\eta(t) \end{bmatrix}
+ \begin{bmatrix} B(t) \\ 0 \end{bmatrix} u(t)
+ \begin{bmatrix} D(t)\,[\,I \;\; 0\,]\, D_\eta \\ B_\eta \end{bmatrix} \bar{n}(t)     (450)

z(t) = \begin{bmatrix} C(t) & [\,0 \;\; I\,]\, C_\eta \end{bmatrix}
\begin{bmatrix} x(t) \\ x_\eta(t) \end{bmatrix}
+ [\,0 \;\; I\,]\, D_\eta\, \bar{n}(t)     (451)

Here [\,I \;\; 0\,]\eta(t) = \tilde{v}(t) and [\,0 \;\; I\,]\eta(t) = \tilde{w}(t) + \tilde{n}(t) select the process-noise and measurement-noise components of the shaping-filter output.

Q: How do we develop the shaping filters G_1(s) and G_2(s)?

A: Estimate the power spectrum as a rational model (e.g., S_{\tilde v \tilde v}(f) = N(f)/D(f)) and derive the transfer functions from S_{\tilde v \tilde v}(f) and S_{\tilde w \tilde w}(f) via spectral factorization. (See the standard texts on power spectrum parameterization.)

One can often build up the desired power spectrum as a combination of several building blocks -- prototypical Gauss-Markov processes. The above covers only auto-correlated noise. See Bar-Shalom 8.3 for cross-correlated measurement and process noise.
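As a concrete example of such a building block, the MATLAB sketch below simulates a first-order Gauss-Markov process (white noise through a single-pole shaping filter); the correlation parameter a and the steady-state variance sigma2 are illustrative values, not taken from the notes.

    % First-order Gauss-Markov (low-pass) shaping filter driven by white noise.
    a      = 2;        % inverse correlation time of the process
    sigma2 = 1.5;      % desired steady-state variance of the shaped noise
    dt     = 0.01;     % integration step
    t      = 0:dt:10;
    v      = zeros(size(t));
    for k = 1:numel(t)-1
        % Euler discretization of vdot = -a*v + sqrt(2*a*sigma2)*nbar,
        % where nbar is unit-intensity continuous-time white noise
        v(k+1) = (1 - a*dt)*v(k) + sqrt(2*a*sigma2*dt)*randn;
    end
    % v approximates auto-correlated noise with autocorrelation
    % sigma2*exp(-a*|tau|), i.e. power spectrum 2*a*sigma2/(a^2 + omega^2).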


15.1 Aside: Realization Problem


We can recall the Realization Problem, where one attempts to go from an input-output relationship governed by a transfer function such as

\bar{y} = G(s)\, \bar{u}     (452)

into the state space model

\dot{x}(t) = A(t)\, x(t) + B(t)\, u(t)     (453)

y(t) = C(t)\, x(t) + D(t)\, u(t)     (454)

For strictly proper transfer functions (the degree of the numerator is less than the degree of the denominator) one can create a controllable canonical form or an observable canonical form. We do this by exposing the coefficients of the numerator and denominator of the transfer function. As an example, given a transfer function

G(s) = \frac{n_1 s^2 + n_2 s + n_3}{s^3 + d_1 s^2 + d_2 s + d_3}     (455)

a state space model that is guaranteed to be controllable will take the form

\dot{x}(t) = \begin{bmatrix} -d_1 & -d_2 & -d_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} u(t)     (456)

y(t) = \begin{bmatrix} n_1 & n_2 & n_3 \end{bmatrix} x(t)     (457)

Related MATLAB functions to investigate are: tf, ss, zpk, frd, ssdata, tf2ss.
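For instance, tf2ss returns the controller canonical form directly; a small sketch (the numeric coefficients below are placeholders) that should reproduce the structure of Eqs. (456)-(457):

    % Controller canonical realization of Eq. (455) via tf2ss.
    n1 = 1; n2 = 2; n3 = 3;          % numerator coefficients (illustrative)
    d1 = 4; d2 = 5; d3 = 6;          % denominator coefficients (illustrative)
    num = [n1 n2 n3];
    den = [1 d1 d2 d3];
    [A, B, C, D] = tf2ss(num, den);  % A(1,:) = [-d1 -d2 -d3], B = [1;0;0],
                                     % C = [n1 n2 n3], D = 0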

15.2 Aside: Spectral Factorization

Spectral factorization involves taking a power spectrum such as the one shown in Bar-Shalom on p. 67:

S_{\tilde v \tilde v}(\omega) = S_0\, \frac{1}{a^2 + \omega^2}     (458)

and splitting it into two factors, one containing all the right-half-plane (RHP) roots and the other including all the left-half-plane (LHP) roots:

S_{\tilde v \tilde v}(\omega) = S_0\, \frac{1}{a + j\omega} \cdot \frac{1}{a - j\omega}     (459)

H(\omega) = \frac{1}{a + j\omega}     (460)

is identified as the causal transfer function.


16 Information Filter/SRIF
Scribe: Ken Pesyna

Recall that the a posteriori state estimate error covariance matrix P(k+1) can be defined in terms of an update formula:

P^{-1}(k+1) = \bar{P}^{-1}(k+1) + \underbrace{H^T(k+1)\, R^{-1}(k+1)\, H(k+1)}_{\text{matrix squaring operation}}

However, the matrix squaring operation is a bad idea numerically: it squares the condition number of the matrices. Note that

\bar{P}(k+1) = \bar{R}_p^T \bar{R}_p > 0

may also be ill-conditioned for the same reason. So may R(k+1).

Let's write P(k+1) in terms of another update formula:

P(k+1) = \bar{P}(k+1) - W(k+1)\, S^{-1}(k+1)\, W^T(k+1)     (461)

Here, infinite numerical precision ensures P(k+1) > 0, but roundoff errors (i.e. non-infinite numerical precision) can make P(k+1) indefinite (not positive definite) or non-symmetric. One can use Joseph's form of the P(k+1) update to ensure P(k+1) = P^T(k+1) > 0, but this doesn't help the accuracy of P(k+1). A deterioration in the accuracy of P(k+1) can lead to garbage results.

Bar-Shalom introduces the square root covariance filter, which keeps track of the square root of the covariance matrix. But this requires the ability to update a Cholesky factorization.

16.1 Information Filter


Strategy: work with P^{-1}(k). Instead of keeping track of \bar{x}(k+1), \bar{P}(k+1), \hat{x}(k+1), P(k+1), we keep track of:

\hat{y}(k) = P^{-1}(k)\, \hat{x}(k)     (462)

\bar{y}(k) = \bar{P}^{-1}(k)\, \bar{x}(k)     (463)

P^{-1}(k) = \mathcal{I}(k)     (464)

\bar{P}^{-1}(k) = \bar{\mathcal{I}}(k)     (465)

where \mathcal{I}(k) is known as the information matrix; it is equal to the inverse of the covariance matrix P(k) and is related to the Fisher information matrix I(k).

We can substitute these definitions into the Kalman Filter to get the Information Filter. After much algebra, including the matrix inversion lemma, we arrive at the following. Let

A(k) = F^{-T}(k)\, \mathcal{I}(k)\, F^{-1}(k)     (466)

Propagation step:

\bar{y}(k+1) = \left\{ I - A(k)\Gamma(k)\left[\Gamma^T(k) A(k) \Gamma(k) + Q^{-1}(k)\right]^{-1} \Gamma^T(k) \right\}\left[ F^{-T}(k)\, \hat{y}(k) + A(k)\, G(k)\, u(k) \right]     (467)

\bar{\mathcal{I}}(k+1) = A(k) - \underbrace{A(k)\Gamma(k)\left[\Gamma^T(k) A(k) \Gamma(k) + Q^{-1}(k)\right]^{-1} \Gamma^T(k) A(k)}_{\text{information decrease due to process noise}}     (468)

Process noise decreases the information during the propagation step. This is similar to a hole in a metaphorical information bucket; if Q(k) = 0, i.e. there is no process noise, then there is no information leaking out and

\bar{\mathcal{I}}(k+1) = A(k) = F^{-T}(k)\, \mathcal{I}(k)\, F^{-1}(k)     (469)

The update step:

\hat{y}(k+1) = \bar{y}(k+1) + H^T(k+1)\, R^{-1}(k+1)\, z(k+1)     (470)

\mathcal{I}(k+1) = \bar{\mathcal{I}}(k+1) + \underbrace{H^T(k+1)\, R^{-1}(k+1)\, H(k+1)}_{\text{information increase from measurements}}     (471)

Here the information is increasing due to new measurements. This is similar to an information pipe filling the metaphorical information bucket.

We can recover \hat{x}(k+1) and P(k+1) by:

\hat{x}(k+1) = \mathcal{I}^{-1}(k+1)\, \hat{y}(k+1)     (472)

P(k+1) = \mathcal{I}^{-1}(k+1)     (473)
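A minimal MATLAB sketch of one propagate-and-update cycle built directly from Eqs. (466)-(473); the function name and argument list are illustrative placeholders, and all matrices are assumed to be supplied by the caller.

    function [xhat, P, yhat_new, Ihat_new] = info_filter_step(yhat, Ihat, F, Gam, Q, G, u, H, R, z)
    % One information-filter cycle, Eqs. (466)-(473) (a sketch, not a tested implementation).
    A        = F' \ (Ihat / F);                   % A(k) = F^{-T} I(k) F^{-1}, Eq. (466)
    M        = Gam' * A * Gam + inv(Q);           % bracketed term in Eqs. (467)-(468)
    L        = eye(size(A)) - A * Gam * (M \ Gam');
    ybar     = L * (F' \ yhat + A * G * u);       % Eq. (467)
    Ibar     = L * A;                             % Eq. (468): A - A*Gam*inv(M)*Gam'*A
    yhat_new = ybar + H' * (R \ z);               % Eq. (470)
    Ihat_new = Ibar + H' * (R \ H);               % Eq. (471)
    xhat     = Ihat_new \ yhat_new;               % Eq. (472)
    P        = inv(Ihat_new);                     % Eq. (473)
    end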

16.1.1 Benefits of the Information Filter

The information filter is more efficient than the Kalman Filter if n_z > n_x, n_v and if R(k) is diagonal. Usually this is not the case, however.

For linear systems we can pick the initial state estimate arbitrarily as long as we set \bar{\mathcal{I}}(0) = 0. This represents the diffuse prior, i.e. no idea of our initial state. Setting the initial prior to be diffuse cannot be done as easily with the regular Kalman Filter: we would have to set the error covariance matrix to infinity, but limited numerical precision limits our ability to do so in real systems. We cannot compute \hat{x}(k) = \mathcal{I}^{-1}(k)\, \hat{y}(k) until \mathcal{I}(k) becomes invertible. If the system is observable, then \mathcal{I}(k) will eventually become invertible. Waiting for \mathcal{I}(k) to become invertible is like waiting for k large enough that H_k has full column rank in the batch initialization problem.

16.1.2 Disadvantages of the Information Filter

This form of the information filter still involves squaring in the H^T R^{-1} H terms.


16.2 Square Root Information Filtering


The square root information filter is one specific implementation of the information filter which involves no squaring of terms. Define:

\hat{R}_{xx}^T(k)\, \hat{R}_{xx}(k) = \mathcal{I}(k)     (474)

\bar{R}_{xx}^T(k)\, \bar{R}_{xx}(k) = \bar{\mathcal{I}}(k)     (475)

where \hat{R}_{xx}(k) and \bar{R}_{xx}(k) are the Cholesky factorizations of \mathcal{I}(k) and \bar{\mathcal{I}}(k), respectively, and

\hat{z}_x(k) = \hat{R}_{xx}(k)\, \hat{x}(k)     (476)

\bar{z}_x(k) = \bar{R}_{xx}(k)\, \bar{x}(k)     (477)

Also let

R_a^T(k)\, R_a(k) = R(k)     (478)

H_a(k) = R_a^{-T}(k)\, H(k)     (479)

z_a(k) = R_a^{-T}(k)\, z(k)     (480)

w_a(k) = R_a^{-T}(k)\, w(k)     (481)

The transformed measurement equation becomes:

z_a(k) = H_a(k)\, x(k) + w_a(k)     (482)

where

E[w_a(k)] = 0     (483)

E[w_a(k)\, w_a^T(j)] = \delta_{kj}\, I     (484)

The dynamics model remains unchanged:

x(k+1) = F(k)\, x(k) + G(k)\, u(k) + \Gamma(k)\, v(k)     (485)

E[v(k)] = 0     (486)

E[v(k)\, v^T(j)] = \delta_{kj}\, Q(k)     (487)

We now encode prior information about x(k) and v(k) into so-called (square root) information equations (AKA the data equations). Let

R_{vv}^T(k)\, R_{vv}(k) = Q^{-1}(k)     (488)

Note: in MATLAB notation, R_{vv}(k) = chol(inv(Q(k))), or equivalently (up to an orthogonal transformation) [inv(chol(Q(k)))]^T.

\bar{z}_x(k) = \bar{R}_{xx}(k)\, x(k) + \bar{w}_x(k), \quad \bar{w}_x(k) \sim (0, I)     (489)

\bar{z}_v(k) = 0 = R_{vv}(k)\, v(k) + \bar{w}_v(k), \quad \bar{w}_v(k) \sim (0, I)     (490)

E[\bar{w}_x(k)\, \bar{w}_v^T(k)] = 0     (491)

These square root information equations store, or encode, the state and process noise estimates and their covariances. We can recover our estimates from the information equations as long as \bar{R}_{xx} is invertible. If \bar{R}_{xx}(k) is not invertible, then the system is not observable from the data through time k and the a priori info at time 0.

Note that \bar{R}_{xx}(k) is upper triangular.

Let's now decode the state from the state information equation:

x(k) = \hat{R}_{xx}^{-1}(k)\left[ \hat{z}_x(k) - \hat{w}_x(k) \right]     (492)

Suppose we want our best estimate of x(k), denoted \hat{x}(k):

\hat{x}(k) = E[x(k) | k]     (493)

  = \hat{R}_{xx}^{-1}(k)\, \underbrace{E[\hat{z}_x(k) | k]}_{=\, \hat{z}_x(k)} - \hat{R}_{xx}^{-1}(k)\, \underbrace{E[\hat{w}_x(k) | k]}_{=\, 0}     (494)

  = \hat{R}_{xx}^{-1}(k)\, \hat{z}_x(k)     (495)

This result is consistent with the previous definition in Eq. 476.


Let

\tilde{x}(k) = x(k) - \hat{x}(k) = -\hat{R}_{xx}^{-1}(k)\, \hat{w}_x(k)     (496)

Then

P(k) = E[\tilde{x}(k)\, \tilde{x}^T(k) | k]     (497)

  = \hat{R}_{xx}^{-1}(k)\, \underbrace{E[\hat{w}_x(k)\, \hat{w}_x^T(k)]}_{=\, I}\, \hat{R}_{xx}^{-T}(k)     (498)

  = \hat{R}_{xx}^{-1}(k)\, \hat{R}_{xx}^{-T}(k)     (499)

  = \mathcal{I}^{-1}(k)     (500)

This result is consistent with the previous definition in Eq. 474.

Likewise,

\hat{v}(k) = E[v(k) | k] = 0     (501)

E\left[ (v(k) - \hat{v}(k))(v(k) - \hat{v}(k))^T \,|\, k \right] = Q(k)     (502)

This result is also consistent with previous definitions.


16.2.1 Propagation Step and Measurement Update


We need to figure out how to perform the propagation step and the measurement update on these square root information equations. We'll use the MAP approach, which is equivalent to minimizing the negative natural log of the a posteriori conditional probability density function. This amounts to minimizing the cost function:

J_a[x(k), v(k), x(k+1), k] = -\log(p)
  = \frac{1}{2}[x(k) - \hat{x}(k)]^T P^{-1}(k)[\ldots] + \frac{1}{2} v^T(k)\, Q^{-1}(k)\, v(k)
  + \frac{1}{2}[z(k+1) - H(k+1)\, x(k+1)]^T R^{-1}(k+1)[\ldots]     (503)-(505)

After normalization of the above form, an alternative formulation of the cost function in square root information notation is:

J_a[x(k), v(k), x(k+1), k] =
  \underbrace{\frac{1}{2}\left\| \hat{R}_{xx}(k)\, x(k) - \hat{z}_x(k) \right\|^2}_{\text{a priori } x(k)}
  + \underbrace{\frac{1}{2}\left\| \bar{z}_v(k) - R_{vv}(k)\, v(k) \right\|^2}_{\text{a priori } v(k)}
  + \underbrace{\frac{1}{2}\left\| H_a(k+1)\, x(k+1) - z_a(k+1) \right\|^2}_{\text{measurement at } k+1}     (506)

The insight here is that the prior estimate of the state and process noise can be expressed as a measurement and thus formulated into the above cost function.
Our task is to minimize J_a over x(k), v(k), and x(k+1), subject to the dynamics model, which relates x(k), v(k), and x(k+1). We will solve the dynamics model for x(k) and use this to eliminate x(k) from the cost function:

x(k) = F^{-1}(k)\left[ x(k+1) - G(k)\, u(k) - \Gamma(k)\, v(k) \right]     (507)

We simultaneously eliminate x(k) and enforce the dynamics constraint by substituting this expression for x(k) into J_a. We'll call the resulting equivalent cost function J_b:

J_b[v(k), x(k+1), k] =     (508)

  \frac{1}{2}\left\|
  \underbrace{\begin{bmatrix} R_{vv}(k) & 0 \\ -\hat{R}_{xx}(k) F^{-1}(k)\Gamma(k) & \hat{R}_{xx}(k) F^{-1}(k) \end{bmatrix}}_{\text{big block matrix}}
  \begin{bmatrix} v(k) \\ x(k+1) \end{bmatrix}
  -
  \begin{bmatrix} \bar{z}_v(k) \\ \hat{z}_x(k) + \hat{R}_{xx}(k) F^{-1}(k) G(k) u(k) \end{bmatrix}
  \right\|^2     (509)

  + \frac{1}{2}\left\| H_a(k+1)\, x(k+1) - z_a(k+1) \right\|^2     (510)

In the equation above, we used the following identity for the first term:

\|a\|^2 + \|b\|^2 = \left\| \begin{bmatrix} a \\ b \end{bmatrix} \right\|^2     (511)

Recall that \|T v\| = \|v\| for T orthonormal. Let T_a(k) = Q^T(k) from the QR factorization of the big block matrix in Eqs. (508)-(509) above; T_a is orthonormal. Multiply the inside of the first term by T_a(k) and rewrite as:

J_b[v(k), x(k+1), k] =
  \frac{1}{2}\left\|
  \begin{bmatrix} \hat{R}_{vv}(k) & \hat{R}_{vx}(k+1) \\ 0 & \bar{R}_{xx}(k+1) \end{bmatrix}
  \begin{bmatrix} v(k) \\ x(k+1) \end{bmatrix}
  -
  \begin{bmatrix} \hat{z}_v(k) \\ \bar{z}_x(k+1) \end{bmatrix}
  \right\|^2
  + \frac{1}{2}\left\| H_a(k+1)\, x(k+1) - z_a(k+1) \right\|^2     (512)

NB: This was the propagation step!


Next Step: Just as we took a set of data equations and packed them into a cost function, so can we take the cost function and unpack it back into a set of data equations. If we unpack the first term into its square root information equations, we obtain:

1. The a posteriori square root information equation for v(k) as a function of x(k+1):

\hat{z}_v(k) = \hat{R}_{vv}(k)\, v(k) + \hat{R}_{vx}(k+1)\, x(k+1) + \hat{w}_v(k), \quad \hat{w}_v(k) \sim (0, I)     (513)

This equation is a by-product of the filtering process. It is not used to determine the filtered state estimate, but it will be used in smoothing. Filtering implies causality; smoothing implies non-causality (it can use future information).

2. The a priori state square root information equation at k+1:

\bar{z}_x(k+1) = \bar{R}_{xx}(k+1)\, x(k+1) + \bar{w}_x(k+1), \quad \bar{w}_x(k+1) \sim (0, I)     (514)

Now minimize J_b with respect to v(k):

0 = \frac{\partial J_b}{\partial v(k)} = \underbrace{\hat{R}_{vv}^T(k)}_{\text{non-singular}}\left[ \hat{R}_{vv}(k)\, v(k) + \hat{R}_{vx}(k+1)\, x(k+1) - \hat{z}_v(k) \right]     (515)

This yields:

v^*(k) = \hat{R}_{vv}^{-1}(k)\left[ \hat{z}_v(k) - \hat{R}_{vx}(k+1)\, x(k+1) \right]     (516)

Eq. 516 is equivalent to the solution for v(k) in the MAP derivation of the Kalman Filter.

Substitute the solution of Eq. 516 into Eq. 512 and stack the remaining terms to get a new, yet equivalent, cost function:

J_c[x(k+1), k+1] = \frac{1}{2}\left\|
  \underbrace{\begin{bmatrix} \bar{R}_{xx}(k+1) \\ H_a(k+1) \end{bmatrix}}_{\text{Matrix } A}
  x(k+1)
  -
  \begin{bmatrix} \bar{z}_x(k+1) \\ z_a(k+1) \end{bmatrix}
  \right\|^2     (517)

If Matrix A were square (and non-singular) we could just take its inverse to compute the filter's best estimate of x(k+1), \hat{x}(k+1). However, it's not, so we want to QR-factorize this matrix in order to push all the energy from the H_a term up into the top of the matrix, making it upper triangular.


This will decouple the cost function into a component that depends on x(k+1) and one that does not. We do this by performing the QR factorization on Matrix A and applying the corresponding orthonormal transformation to the cost function, as before, to get:

J_c[x(k+1), k+1] = \frac{1}{2}\left\|
  \underbrace{\begin{bmatrix} \hat{R}_{xx}(k+1) \\ 0 \end{bmatrix}}_{\text{upper triangular}}
  x(k+1)
  -
  \begin{bmatrix} \hat{z}_x(k+1) \\ z_r(k+1) \end{bmatrix}
  \right\|^2     (518)

NB: This was the update step! The lack of bars above the terms indicates that we have gone from a priori to a posteriori.

Unstack to get:

J_c[x(k+1), k+1] = \frac{1}{2}\left\| \hat{R}_{xx}(k+1)\, x(k+1) - \hat{z}_x(k+1) \right\|^2 + \frac{1}{2}\left\| z_r(k+1) \right\|^2     (519)

Now unpack the implicit square root information equations from this cost function to get:

1. The a posteriori square root information equation for the state at k+1:

\hat{z}_x(k+1) = \hat{R}_{xx}(k+1)\, x(k+1) + \hat{w}_x(k+1), \quad \hat{w}_x(k+1) \sim (0, I)     (520)

2. The residual error equation:

z_r(k+1) = w_r(k+1), \quad w_r(k+1) \sim (0, I)     (521)


Aside:

Q: Where do the \hat{R}_{xx}, \hat{z}_x(k+1), z_r(k+1), \hat{w}_x(k+1), and w_r(k+1) terms come from?

A: They come from the orthogonal transformation applied to the cost function, i.e. Eq. 517, after performing the QR factorization and transforming the matrices. First, to make things clear, let's unpack Eq. 517 into its implicit square root information equations:

\bar{z}_x(k+1) = \bar{R}_{xx}(k+1)\, x(k+1) + \bar{w}_x(k+1)     (522)

z_a(k+1) = H_a(k+1)\, x(k+1) + w_a(k+1)     (523)

It's easy to see the implicit noise terms \bar{w}_x and w_a. Now, performing the QR factorization on Matrix A in the cost function of Eq. 517, defining T_a(k+1) = Q^T where Q is from the QR factorization, and then transforming each matrix by left-multiplying by T_a(k+1), we arrive at:

\begin{bmatrix} \hat{R}_{xx}(k+1) \\ 0 \end{bmatrix} = T_a(k+1) \begin{bmatrix} \bar{R}_{xx}(k+1) \\ H_a(k+1) \end{bmatrix}     (524)

\begin{bmatrix} \hat{z}_x(k+1) \\ z_r(k+1) \end{bmatrix} = T_a(k+1) \begin{bmatrix} \bar{z}_x(k+1) \\ z_a(k+1) \end{bmatrix}     (525)

\begin{bmatrix} \hat{w}_x(k+1) \\ w_r(k+1) \end{bmatrix} = T_a(k+1) \begin{bmatrix} \bar{w}_x(k+1) \\ w_a(k+1) \end{bmatrix}     (526)

Because T_a(k+1) is orthonormal, \hat{w}_x(k+1) and w_r(k+1) retain the same distribution as \bar{w}_x(k+1) and w_a(k+1), i.e. \hat{w}_x(k+1) \sim (0, I), w_r(k+1) \sim (0, I).

Now returning to J_c: we can minimize it by inspection.

\hat{x}(k+1) = \hat{R}_{xx}^{-1}(k+1)\, \hat{z}_x(k+1)     (527)

Thus, we have solved for \hat{x}(k+1) such that the first term of J_c becomes 0 and we are left with J_c = \frac{1}{2}\|z_r(k+1)\|^2, which is its minimum.

It can be shown that

\|z_r(k+1)\|^2 = z_r^T(k+1)\, z_r(k+1) = \nu^T(k+1)\, S^{-1}(k+1)\, \nu(k+1)     (528)

where \nu and S are the terms defined earlier within the normal Kalman filtering (non-square-root-information) context.

16.3 Benefits of the square root information filter

No matrix squaring: only QR factorization and inversion of R matrices when necessary.

Very robust numerically.

P(k) is guaranteed symmetric and positive definite because

P(k) = \hat{R}_{xx}^{-1}(k)\, \hat{R}_{xx}^{-T}(k)

Note: F(k) must be invertible. If not, our solution must be fancier.
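A minimal MATLAB sketch of the SRIF measurement update alone (Eqs. (517)-(527)), using MATLAB's qr; the function name and interface are illustrative placeholders.

    function [Rxx_hat, zx_hat, zr] = srif_meas_update(Rxx_bar, zx_bar, H, R, z)
    % SRIF measurement update: whiten the measurement, stack it under the
    % a priori data equation, and QR-factorize (a sketch, not a tested code).
    Ra = chol(R);                            % R = Ra'*Ra, Eq. (478)
    Ha = Ra' \ H;                            % Eq. (479)
    za = Ra' \ z;                            % Eq. (480)
    nx = size(Rxx_bar, 2);
    [~, RZ] = qr([Rxx_bar, zx_bar; Ha, za]); % applies T_a = Q^T, Eqs. (524)-(525)
    Rxx_hat = RZ(1:nx, 1:nx);                % a posteriori SR information matrix
    zx_hat  = RZ(1:nx, nx+1);                % a posteriori data vector
    zr      = RZ(nx+1:end, nx+1);            % residual term, Eq. (521)
    % If the state estimate is needed: xhat = Rxx_hat \ zx_hat, Eq. (527).
    end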


17 Smoothing

17.1 Estimate x(k) based on Z^j with j > k

3 Classic Types:

Fixed-point smoothing: Interested in x(k) for a fixed k, but the amount of data keeps increasing.

Fixed-lag smoothing: Estimate the current x(k), x(k+1), \ldots using data that extends a fixed number of samples past the time of interest.

Fixed-interval smoothing: Estimate x(k) for k = 1 to N using all the data z(1), \ldots, z(N).

Focus: Fixed-interval smoothing.

New Notation:

x(k|N) = x^*(k)
P(k|N) = P^*(k)
v(k|N) = v^*(k)

Preferred Implementation: Square-Root Information Smoother (SRIS)

Key Observation: The smoother equations fall out of the MAP estimation approach.

First, execute an SRIF forward (filtering) pass.


Square-Root Information Equations relating the state and process noise, with \hat{w}_v(k) \sim (0, I):

\hat{z}_v(0) = \hat{R}_{vv}(0)\, v(0) + \hat{R}_{vx}(1)\, x(1) + \hat{w}_v(0)
  \vdots
\hat{z}_v(N-1) = \hat{R}_{vv}(N-1)\, v(N-1) + \hat{R}_{vx}(N)\, x(N) + \hat{w}_v(N-1)

Square-Root Information Equations for the residuals: discard them, they are useless:

z_r(0) = w_r(0), \; \ldots, \; z_r(N) = w_r(N)

Square-Root Information Equation for the state at N:

\hat{z}_x(N) = \hat{R}_{xx}(N)\, x(N) + \hat{w}_x(N)

Invoke the Dynamics Model:

x(k+1) = F(k)\, x(k) + G(k)\, u(k) + \Gamma(k)\, v(k)

Smoothing is essentially a backwards iteration. Use the dynamics model to eliminate x(N) in favor of x(N-1).


17.2 Steps
1. Let:

z_x^*(N) = \hat{z}_x(N)
R_{xx}^*(N) = \hat{R}_{xx}(N)
w_x^*(N) = \hat{w}_x(N)

2. (If needed) Compute:

x^*(N) = R_{xx}^{*-1}(N)\, z_x^*(N)
P^*(N) = R_{xx}^{*-1}(N)\, R_{xx}^{*-T}(N)

3. Set k = N-1. The cost function associated with the Square-Root Information equations at k can be written as:

J_b[v(k), x(k+1), k] = \frac{1}{2}\left\| \hat{R}_{vv}(k)\, v(k) + \hat{R}_{vx}(k+1)\, x(k+1) - \hat{z}_v(k) \right\|^2
  + \frac{1}{2}\left\| R_{xx}^*(k+1)\, x(k+1) - z_x^*(k+1) \right\|^2

We seek to minimize this subject to the dynamics equation.

4. Use the dynamics equation to eliminate x(k+1) in favor of x(k). Substituting for x(k+1) and stacking leads to a cost J_a[v(k), x(k), k] written as a single squared norm in [v(k); x(k)], whose implied SR information equations are:

\begin{bmatrix} \hat{z}_v(k) - \hat{R}_{vx}(k+1)\, G(k)\, u(k) \\ z_x^*(k+1) - R_{xx}^*(k+1)\, G(k)\, u(k) \end{bmatrix}
=
\begin{bmatrix} \hat{R}_{vv}(k) + \hat{R}_{vx}(k+1)\Gamma(k) & \hat{R}_{vx}(k+1)\, F(k) \\ R_{xx}^*(k+1)\Gamma(k) & R_{xx}^*(k+1)\, F(k) \end{bmatrix}
\begin{bmatrix} v(k) \\ x(k) \end{bmatrix}
+
\begin{bmatrix} \hat{w}_v(k) \\ w_x^*(k+1) \end{bmatrix}

Multiply both sides by T_a(k) = Q^T(k) from the QR factorization of the block matrix. This does not change the cost, but now the SR information equations are decoupled:

\begin{bmatrix} z_v^*(k) \\ z_x^*(k) \end{bmatrix}
=
\begin{bmatrix} R_{vv}^*(k) & R_{vx}^*(k) \\ 0 & R_{xx}^*(k) \end{bmatrix}
\begin{bmatrix} v(k) \\ x(k) \end{bmatrix}
+
\begin{bmatrix} w_v^*(k) \\ w_x^*(k) \end{bmatrix}

where [w_v^*(k);\, w_x^*(k)] \sim (0, I). Thus

J_a[v(k), x(k)] = \frac{1}{2}\left\| R_{vv}^*(k)\, v(k) + R_{vx}^*(k)\, x(k) - z_v^*(k) \right\|^2 + \frac{1}{2}\left\| R_{xx}^*(k)\, x(k) - z_x^*(k) \right\|^2

5. Minimize the cost:

x^*(k) = R_{xx}^{*-1}(k)\, z_x^*(k) = E[x(k) | Z^N]

P^*(k) = R_{xx}^{*-1}(k)\, R_{xx}^{*-T}(k)

v^*(k) = R_{vv}^{*-1}(k)\left[ z_v^*(k) - R_{vx}^*(k)\, x^*(k) \right] = E[v(k) | Z^N]

P_{vv}^*(k) = R_{vv}^{*-1}(k)\left[ I + R_{vx}^*(k)\, R_{xx}^{*-1}(k)\, R_{xx}^{*-T}(k)\, R_{vx}^{*T}(k) \right] R_{vv}^{*-T}(k)

P_{vx}^*(k) = -R_{vv}^{*-1}(k)\, R_{vx}^*(k)\, R_{xx}^{*-1}(k)\, R_{xx}^{*-T}(k)

Note: J_a[v^*(k), x^*(k), k] = 0

6. If k = 0, stop. Otherwise, decrement k by 1 and go to step 4, now using the SR Information equations:

z_x^*(k+1) = R_{xx}^*(k+1)\, x(k+1) + w_x^*(k+1)

\hat{z}_v(k) = \hat{R}_{vv}(k)\, v(k) + \hat{R}_{vx}(k+1)\, x(k+1) + \hat{w}_v(k)
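A minimal MATLAB sketch of one backward SRIS step (steps 4-5 above); the interface and variable names are illustrative placeholders, and the smoothed process-noise quantities are omitted for brevity.

    function [Rxx_s, zx_s, xs, Ps] = sris_backward_step(Rvv_h, Rvx_h, zv_h, Rxx_s1, zx_s1, F, G, Gam, u)
    % One backward pass step of the square-root information smoother (sketch).
    nv = size(Rvv_h, 1);
    nx = size(Rxx_s1, 1);
    % Block system from substituting x(k+1) = F*x(k) + G*u + Gam*v(k)
    Abig = [Rvv_h + Rvx_h*Gam,  Rvx_h*F;
            Rxx_s1*Gam,         Rxx_s1*F];
    bbig = [zv_h  - Rvx_h*G*u;
            zx_s1 - Rxx_s1*G*u];
    [~, RZ] = qr([Abig, bbig]);              % orthogonal transformation T_a(k)
    Rxx_s = RZ(nv+1:nv+nx, nv+1:nv+nx);      % smoothed SR information matrix for x(k)
    zx_s  = RZ(nv+1:nv+nx, end);             % smoothed data vector for x(k)
    xs    = Rxx_s \ zx_s;                    % x*(k) = E[x(k)|Z^N]
    Ri    = inv(Rxx_s);
    Ps    = Ri * Ri';                        % P*(k)
    end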


18 Nonlinear Difference Equations from ZOH Nonlinear Differential Equations

Scribe: Daniel P. Shepard

We wish to find an approximate Kalman Filter algorithm for non-linear systems as difference equations of the form

x(k+1) = f[k, x(k), u(k), v(k)]     (529)

z(k) = h[k, x(k)] + w(k)     (530)

We need to construct this from continuous-time non-linear models of the form

\dot{x}(t) = f(t, x(t), u(t)) + D(t)\, \tilde{v}(t)     (531)

z(t) = h(t, x(t)) + \tilde{w}(t)     (532)

Recall that we already did this in Chapter 10 for linear systems, when

\dot{x}(t) = A(t)\, x(t) + B(t)\, u(t) + D(t)\, \tilde{v}(t)     (533)

z(t) = C(t)\, x(t) + \tilde{w}(t)     (534)

was transformed into

x(k+1) = F(k)\, x(k) + G(k)\, u(k) + \Gamma(k)\, v(k)     (535)

z(k) = H(k)\, x(k) + w(k)     (536)

18.1 Zero-Order-Hold (ZOH) Assumption


Under zero-order-hold (ZOH) assumptions, the sampling interval \Delta t is assumed to be small enough that over the interval t_k = k\Delta t \le t < (k+1)\Delta t = t_{k+1}

u(t) \approx u(k)     (537)
\tilde{v}(t) \approx v(k)     (538)
\tilde{w}(t) \approx w(k)     (539)

This is illustrated in Fig. 26, which shows that the value of the control is assumed constant (i.e. held over) over the sampling interval \Delta t. ZOH also requires that the measurement z(k) is a sample of z(t) after anti-alias filtering, to avoid infinite variance of w(k).

Figure 26: The control input under ZOH assumptions

Let x_k(t) be the continuous-time state in the interval t_k \le t < t_{k+1}. Define the initial-value problem

\dot{x}_k(t) = f[t, x_k(t), u(t)] + D(t)\, \tilde{v}(t)     (540)

z(t) = h[t, x(t)] + \tilde{w}(t)     (541)

x_k(t_k) = x(k)     (542)

We can solve for x_k(t) on t_k \le t < t_{k+1}. The solution depends on x(k), u(k), and v(k). Let f[k, x(k), u(k), v(k)] = x_k(t_{k+1}), where f[\cdot] is some procedure for integrating forward to t_{k+1}. In MATLAB, this integration procedure could be ode45 or any other numerical integration scheme.

18.2 Variance of the ZOH Process Noise

Q: Given

E[\tilde{v}(t)\, \tilde{v}^T(\tau)] = \delta(t - \tau)\, \tilde{Q}(t)     (543)

and

E[v(k)\, v^T(j)] = \delta_{kj}\, Q(k)     (544)

how do we relate \tilde{Q}(t) and Q(k)?

A: If \Delta t is small, then

f[k, x(k), u(k), v(k)] \approx x(k) + \Delta t\left[ f(t_k, x(k), u(k)) + D(t_k)\, v(k) \right]     (545)

This is simple Euler integration. In this case, the term of f[\cdot] corresponding to the process noise is \Delta t\, D(t_k)\, v(k), which can be alternatively expressed as

\Delta t\, D(t_k)\, v(k) \approx \int_{t_k}^{t_{k+1}} D(\tau)\, \tilde{v}(\tau)\, d\tau     (546)

Both of these forms of the process noise term of f[\cdot] have zero mean. Also,

cov\left( \Delta t\, D(t_k)\, v(k) \right) = \Delta t^2\, D(t_k)\, Q(k)\, D^T(t_k)     (547)

cov\left( \int_{t_k}^{t_{k+1}} D(\tau)\, \tilde{v}(\tau)\, d\tau \right) = \int_{t_k}^{t_{k+1}} D(\tau)\, \tilde{Q}(\tau)\, D^T(\tau)\, d\tau \approx \Delta t\, D(t_k)\, \tilde{Q}(t_k)\, D^T(t_k)     (548)

Equating these two covariances for small \Delta t yields

Q(k) = \frac{\tilde{Q}(t_k)}{\Delta t}     (549)

Note that \lim_{\Delta t \to 0} Q(k) = \infty. This result comes from the whiteness of \tilde{v}(t).

Q: What if the measurement interval \Delta t is too large for the ZOH assumption to hold?

A: One can take m intermediate steps of length \Delta t / m between each measurement. Choose m such that \Delta t / m is small enough that

Q(k) = \frac{\tilde{Q}(k\,\Delta t / m)}{\Delta t / m}

is a reasonable approximation. Thus, the model takes the form

x(k+1) = f[k, x(k), u(k), v(k)]     (550)

z(0) = h[0, x(0)] + w(0) \quad \text{(first measurement)}     (551)

z(m) = h[m, x(m)] + w(m) \quad \text{(next measurement)} \; \ldots     (552)

In other words, implement a KF by performing m propagation steps and then an update step, since new measurements only arrive every m propagation steps.
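A tiny MATLAB sketch of this scaling (all numerical values are illustrative placeholders):

    % ZOH process-noise covariance with m propagation sub-steps per measurement.
    dt     = 1.0;              % measurement interval [s]
    m      = 10;               % number of propagation sub-steps
    Qtilde = 0.01 * eye(2);    % continuous-time process noise intensity
    Qk     = Qtilde / (dt/m);  % Eq. (549) applied to the sub-step length dt/m
    % The filter performs m propagation steps of length dt/m between updates.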

18.3 Partial Derivatives of Difference Equations

To design a nonlinear estimation algorithm, we'll need to know

F(k) = \left. \frac{\partial f[\cdot]}{\partial x(k)} \right|_{k, \hat{x}(k), u(k), 0}     (553)

\Gamma(k) = \left. \frac{\partial f[\cdot]}{\partial v(k)} \right|_{k, \hat{x}(k), u(k), 0}     (554)

Q: How do we find these?

A: Recall that

\dot{x}_k(t) = f(t, x_k(t), u(t)) + D(t)\, \tilde{v}(t)     (555)

x_k(t_k) = x(k)     (556)

Taking the partial with respect to x(k) yields

\frac{\partial}{\partial x(k)}\left[ \dot{x}_k(t) \right]
= \left. \frac{\partial f(\cdot)}{\partial x_k(t)} \right|_{t, x_k(t), u(k)} \left. \frac{\partial x_k}{\partial x(k)} \right|_t
= A(t)\, \left. \frac{\partial x_k}{\partial x(k)} \right|_t     (557)

Since x_k(t_k) = x(k) and the order of differentiation can be interchanged, we have the following initial-value problem for the state transition matrix:

\frac{d}{dt}\left[ \frac{\partial x_k(t)}{\partial x(k)} \right] = A(t)\, \frac{\partial x_k(t)}{\partial x(k)}     (558)

\frac{\partial x_k(t_k)}{\partial x(k)} = I_{n_x \times n_x}     (559)

This shows that \partial x_k(t)/\partial x(k) is similar to the state-transition matrix F(t, t_k) from the discussions of continuous-time linear systems in Section 10.

Similarly, for \Gamma(k):

\frac{d}{dt}\left[ \frac{\partial x_k(t)}{\partial v(k)} \right] = A(t)\, \frac{\partial x_k(t)}{\partial v(k)} + D(t)     (560)

\frac{\partial x_k(t_k)}{\partial v(k)} = 0     (561)

However, we want to know the derivatives of the discrete f[\cdot], not the continuous f(\cdot). These derivatives are given by

\frac{\partial f[\cdot]}{\partial x(k)} = \frac{\partial x_k(t_{k+1})}{\partial x(k)}     (562)

\frac{\partial f[\cdot]}{\partial v(k)} = \frac{\partial x_k(t_{k+1})}{\partial v(k)}     (563)

This requires integration of Eqs. (558) and (560) from t_k to t_{k+1}. This can be accomplished using numerical integration schemes, such as ode45, to integrate the two matrix differential equations at the same time we're integrating the x_k(t) differential equation. To do this, break the matrices apart into column vectors as

\frac{\partial x_k(t)}{\partial x(k)} = \left[ \phi_1(t), \phi_2(t), \ldots, \phi_{n_x}(t) \right]     (564)

\frac{\partial x_k(t)}{\partial v(k)} = \left[ \gamma_1(t), \gamma_2(t), \ldots, \gamma_{n_v}(t) \right]     (565)

Then, the initial-value problems for these newly defined vectors are

\dot{\phi}_i(t) = A(t)\, \phi_i(t), \quad i = 1, 2, \ldots, n_x     (566)

\phi_i(t_k) = [0, 0, \ldots, 0, \underbrace{1}_{i\text{th row}}, 0, \ldots, 0]^T     (567)

\dot{\gamma}_i(t) = A(t)\, \gamma_i(t) + d_i(t), \quad i = 1, 2, \ldots, n_v     (568)

\gamma_i(t_k) = 0     (569)

where D(t) = [d_1(t), d_2(t), \ldots, d_{n_v}(t)]. Now a large state vector can be defined as

X_{big} = \left[ x_k^T, \phi_1^T, \phi_2^T, \ldots, \phi_{n_x}^T, \gamma_1^T, \gamma_2^T, \ldots, \gamma_{n_v}^T \right]^T     (570)

This state vector has dimension n_x(n_x + n_v + 1) \times 1. A large numerical integration routine can be written, with the appropriate previously defined initial conditions, to solve

\dot{X}_{big} = f_{big}(t, X_{big}, u(k), v(k))     (571)
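A minimal MATLAB sketch of this augmented integration using ode45; fc, A_of, and D_of are user-supplied function handles for the dynamics, its state Jacobian, and the noise-input matrix (illustrative assumptions, not names from the notes).

    function [xkp1, F, Gam] = propagate_with_jacobians(fc, A_of, D_of, xk, uk, tk, dt, nv)
    % Integrate Eq. (571): the state together with the columns of
    % d x_k(t)/d x(k) and d x_k(t)/d v(k), evaluated at v(k) = 0.
    nx = numel(xk);
    X0 = [xk; reshape(eye(nx), [], 1); zeros(nx*nv, 1)];     % Eq. (570) initial conditions
    [~, X] = ode45(@(t, X) bigDerivs(t, X, fc, A_of, D_of, uk, nx, nv), [tk, tk + dt], X0);
    Xend = X(end, :)';
    xkp1 = Xend(1:nx);                                       % x_k(t_{k+1}) = f[k, x(k), u(k), 0]
    F    = reshape(Xend(nx+1:nx+nx*nx), nx, nx);             % Eq. (562)
    Gam  = reshape(Xend(nx+nx*nx+1:end), nx, nv);            % Eq. (563)
    end

    function dX = bigDerivs(t, X, fc, A_of, D_of, uk, nx, nv)
    x   = X(1:nx);
    Phi = reshape(X(nx+1:nx+nx*nx), nx, nx);   % columns phi_i(t), Eqs. (566)-(567)
    Gam = reshape(X(nx+nx*nx+1:end), nx, nv);  % columns gamma_i(t), Eqs. (568)-(569)
    A   = A_of(t, x, uk);
    dX  = [fc(t, x, uk); reshape(A*Phi, [], 1); reshape(A*Gam + D_of(t), [], 1)];
    end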


19 Nonlinear Estimation for Dynamical Systems

Scribe: Shaina Johl

Problem statement

Dynamics model:

x(k+1) = f[k, x(k), u(k), v(k)]     (572)

E[v(k)] = 0, \quad E[v(k)\, v^T(j)] = \delta_{kj}\, Q(k)     (573)

Measurement model:

z(k) = h[k, x(k)] + w(k)     (574)

E[w(k)] = 0, \quad E[w(k)\, w^T(j)] = \delta_{kj}\, R(k)     (575)

Q: How do we optimally estimate x(k)?

A: See Bar-Shalom 10.2. Our strategy is a sub-optimal one: approximate the optimal estimates by applying the linear MMSE estimator to the non-linear problem.

Approximate

\bar{x}(k+1) = E\left[ x(k+1) \,|\, Z^k \right]     (576)

\bar{z}(k+1) = E\left[ z(k+1) \,|\, Z^k \right]     (577)

and approximate their covariances

\bar{P}(k+1), \; \bar{P}_{xz}(k+1), \; \bar{P}_{zz}(k+1)     (578)

If we can assume that the approximations are valid, then we can use our old update equations for the measurement update of the Kalman filter:

\hat{x}(k+1) = \bar{x}(k+1) + \bar{P}_{xz}(k+1)\, \bar{P}_{zz}^{-1}(k+1)\, [z(k+1) - \bar{z}(k+1)]     (579)

P(k+1) = \bar{P}(k+1) - \bar{P}_{xz}(k+1)\, \bar{P}_{zz}^{-1}(k+1)\, \bar{P}_{xz}^T(k+1)     (580)

19.1 Standard EKF Methods for Approximation

EKF Philosophy: Linearize the non-linear equations about the current best state estimate.

19.1.1 Prediction

\bar{x}(k+1) = E\left[ f[k, x(k), u(k), v(k)] \,|\, Z^k \right]     (581)
Expand in a Taylor series about x(k) = \hat{x}(k), v(k) = \bar{v}(k) = 0:

\bar{x}(k+1) = E\Big[ f[k, \hat{x}(k), u(k), 0]
  + \underbrace{\left. \tfrac{\partial f}{\partial x} \right|_{k, \hat{x}(k), u(k), 0}}_{F(k)} [x(k) - \hat{x}(k)]
  + \underbrace{\left. \tfrac{\partial f}{\partial v} \right|_{k, \hat{x}(k), u(k), 0}}_{\Gamma(k)} v(k)
  + \text{higher order terms} \,\Big|\, Z^k \Big]     (582)

Neglect the higher order terms, hoping that the linearization is valid over the likely values of x(k). This is surprisingly effective.

Now take the expectations:

\bar{x}(k+1) \approx f[k, \hat{x}(k), u(k), 0] + F(k)\, \underbrace{E[x(k) - \hat{x}(k) \,|\, Z^k]}_{\approx\, 0} + \Gamma(k)\, \underbrace{E[v(k) \,|\, Z^k]}_{=\, 0}     (583)

Note: Compute \bar{x}(k+1) via our numerical procedure for f[k, \hat{x}(k), u(k), 0].

Watch out: if we had retained the second-order derivatives \partial^2 f / \partial x^2 and \partial^2 f / \partial v^2, then P(k) and Q(k) would have affected \bar{x}(k+1). This is a property of non-linear filters in general.

\bar{P}(k+1) = E\left[ (x(k+1) - \bar{x}(k+1))(\,\cdot\,)^T \right]     (584)

x(k+1) \approx \bar{x}(k+1) + F(k)\, [x(k) - \hat{x}(k)] + \Gamma(k)\, v(k) + \underbrace{\text{HOT}}_{\text{neglect}}     (585)

Substituting and taking expectations yields:

\bar{P}(k+1) = F(k)\, P(k)\, F^T(k) + \Gamma(k)\, Q(k)\, \Gamma^T(k)     (586)

Note: This is the same as for the linear Kalman Filter. The only difference is that F(k) and \Gamma(k) are computed by numerical integration.

19.1.2 Measurement update

\bar{z}(k+1) = E\left[ h[k+1, x(k+1)] + w(k+1) \,|\, Z^k \right]

  = E\Big[ h[k+1, \bar{x}(k+1)] + \underbrace{\left. \tfrac{\partial h}{\partial x} \right|_{k+1, \bar{x}(k+1)}}_{H(k+1)} [x(k+1) - \bar{x}(k+1)] + \text{HOT} + w(k+1) \,\Big|\, Z^k \Big]

  = h[k+1, \bar{x}(k+1)] + H(k+1)\, \underbrace{E[x(k+1) - \bar{x}(k+1) \,|\, Z^k]}_{\approx\, 0} + \underbrace{E[w(k+1) \,|\, Z^k]}_{=\, 0}

  \approx h[k+1, \bar{x}(k+1)]     (587)

Note: If the HOTs are retained, then \bar{P} affects \bar{z}.

\bar{P}_{xz}(k+1) = E\left[ (x(k+1) - \bar{x}(k+1))(z(k+1) - \bar{z}(k+1))^T \,|\, Z^k \right]     (588)

Note that

z(k+1) - \bar{z}(k+1) \approx H(k+1)\, [x(k+1) - \bar{x}(k+1)] + w(k+1)     (589)

Therefore

\bar{P}_{xz}(k+1) = E\left[ (x(k+1) - \bar{x}(k+1))\left( H(k+1)[x(k+1) - \bar{x}(k+1)] + w(k+1) \right)^T \,|\, Z^k \right]     (590)

  = \bar{P}(k+1)\, H^T(k+1)     (591)

Similarly,

\bar{P}_{zz}(k+1) = H(k+1)\, \bar{P}(k+1)\, H^T(k+1) + R(k+1)     (592)

19.2 EKF as an algorithm


1. Start with \hat{x}(0), P(0).

2. Set k = 0.

3. Compute

\bar{x}(k+1) = f[k, \hat{x}(k), u(k), 0]     (593)

F(k) = \left. \frac{\partial f}{\partial x} \right|_{k, \hat{x}(k), u(k), 0}     (594)

\Gamma(k) = \left. \frac{\partial f}{\partial v} \right|_{k, \hat{x}(k), u(k), 0}     (595)

\bar{P}(k+1) = F(k)\, P(k)\, F^T(k) + \Gamma(k)\, Q(k)\, \Gamma^T(k)     (596)

4. Compute the Measurement Update

\bar{z}(k+1) = h[k+1, \bar{x}(k+1)]     (598)

H(k+1) = \left. \frac{\partial h}{\partial x} \right|_{k+1, \bar{x}(k+1)}     (599)

\nu(k+1) = z(k+1) - \bar{z}(k+1)     (600)

S(k+1) = H(k+1)\, \bar{P}(k+1)\, H^T(k+1) + R(k+1) = \bar{P}_{zz}(k+1)     (601)

W(k+1) = \bar{P}(k+1)\, H^T(k+1)\, S^{-1}(k+1)     (602)

\hat{x}(k+1) = \bar{x}(k+1) + W(k+1)\, \nu(k+1)     (603)

P(k+1) = \bar{P}(k+1) - W(k+1)\, S(k+1)\, W^T(k+1)     (604)

Note that we can use alternate formulas for the gain and the covariance update, as for the linear Kalman filter.

5. Alternate filter covariance update formulas:

P(k+1) = \left[ \bar{P}^{-1}(k+1) + H^T(k+1)\, R^{-1}(k+1)\, H(k+1) \right]^{-1}     (605)

Joseph form (guarantees that P(k+1) = P^T(k+1) > 0):

P(k+1) = [I - W(k+1) H(k+1)]\, \bar{P}(k+1)\, [I - W(k+1) H(k+1)]^T + W(k+1)\, R(k+1)\, W^T(k+1)     (606)

W(k+1) = P(k+1)\, H^T(k+1)\, R^{-1}(k+1)     (607)

6. Increment k by 1 and go to Step 3.
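A compact MATLAB sketch of one pass through steps 3-4; the function handles f, h, F_of, Gam_of, H_of (dynamics, measurement, and their Jacobians, e.g. from the numerical integration of Section 18) are illustrative assumptions.

    function [xhat, P] = ekf_step(xhat, P, u, z, k, f, h, F_of, Gam_of, H_of, Q, R)
    % One EKF propagation + measurement update (a sketch).
    % Propagation, Eqs. (593)-(596)
    xbar = f(k, xhat, u, zeros(size(Q,1), 1));
    F    = F_of(k, xhat, u);
    Gam  = Gam_of(k, xhat, u);
    Pbar = F*P*F' + Gam*Q*Gam';
    % Measurement update, Eqs. (598)-(604)
    zbar = h(k+1, xbar);
    H    = H_of(k+1, xbar);
    nu   = z - zbar;              % innovation
    S    = H*Pbar*H' + R;         % innovation covariance
    W    = Pbar*H'/S;             % Kalman gain
    xhat = xbar + W*nu;
    P    = Pbar - W*S*W';
    end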


20 Iterated Kalman Filter


Scribe: Aprajita Sant

20.1 Re-interpretation of EKF as a Gauss-Newton Step

The EKF can be thought of as a step towards the solution of the nonlinear MAP problem. Consider the MAP cost function:

J_a[k, v(k), x(k), x(k+1)] = \frac{1}{2}[x(k) - \hat{x}(k)]^T P^{-1}(k)[\ldots]
  + \frac{1}{2} v^T(k)\, Q^{-1}(k)\, v(k)
  + \frac{1}{2}\{z(k+1) - h[k+1, x(k+1)]\}^T R^{-1}(k+1)\{\ldots\}     (609)

which comes from minimizing the negative log of

p\left( x(k), v(k), x(k+1) \,|\, Z^k \right) = C \exp(-J_a)

Minimization of J_a must be subject to the nonlinear dynamics. Define a nonlinear inverted dynamics function:

x(k) = f^{-1}[k, x(k+1), u(k), v(k)]

Here, the function is defined such that:

x(k+1) = f[k, f^{-1}\{k, x(k+1), u(k), v(k)\}, u(k), v(k)]

The above can be better visualized by taking the example of our original problem:

f = F(k)\, x(k) + G(k)\, u(k) + \Gamma(k)\, v(k)     (610)

f^{-1} = F^{-1}(k)\left[ x(k+1) - G(k)\, u(k) - \Gamma(k)\, v(k) \right]     (611)

We can obtain f^{-1} by numerically integrating f backward in time. Use f^{-1} to eliminate x(k) from the MAP cost function and thereby define:

J_b[k, v(k), x(k+1)] = \frac{1}{2}\left\{ f^{-1}[k, x(k+1), u(k), v(k)] - \hat{x}(k) \right\}^T P^{-1}(k)\{\ldots\}
  + \frac{1}{2} v^T(k)\, Q^{-1}(k)\, v(k)
  + \frac{1}{2}\{z(k+1) - h[k+1, x(k+1)]\}^T R^{-1}(k+1)\{\ldots\}     (612)

This is just a weighted least squares cost function for the errors in the three equations given below:

1. 0 = f^{-1}[k, x(k+1), u(k), v(k)] - x(k), with weighting P^{-1}(k)

2. 0 = v(k), with weighting Q^{-1}(k)

3. 0 = z(k+1) - h[k+1, x(k+1)], with weighting R^{-1}(k+1)


Strategy: We use the Gauss-Newton method to linearize and solve. We start by first linearizing about the initial guess x(k+1) = \bar{x}(k+1) and v(k) = 0, where \bar{x}(k+1) is obtained from

\bar{x}(k+1) = f[k, \hat{x}(k), u(k), 0]     (613)

The next step is to solve the linearized least squares problem for x(k+1) and v(k), then re-linearize, and then iterate. We do this by linearizing each of the three equations above.

For 1) we have:

0 \approx f^{-1}[k, \bar{x}(k+1), u(k), 0]
  + \left. \frac{\partial f^{-1}}{\partial x(k+1)} \right|_{k, \bar{x}(k+1)} [x(k+1) - \bar{x}(k+1)]
  + \left. \frac{\partial f^{-1}}{\partial v(k)} \right|_{k, \bar{x}(k+1)} [v(k) - 0]
  - \hat{x}(k)     (614)

In the above expression, \hat{x}(k) = f^{-1}[k, \bar{x}(k+1), u(k), 0] from the definition of the inverse, so the \hat{x}(k) terms cancel.

It can be shown that:

\left. \frac{\partial f^{-1}}{\partial x(k+1)} \right|_{k, \bar{x}(k+1), u(k), 0} = F^{-1}(k) = \left[ \left. \frac{\partial f}{\partial x(k)} \right|_{k, \hat{x}(k), u(k), 0} \right]^{-1}

and

\left. \frac{\partial f^{-1}}{\partial v(k)} \right|_{k, \bar{x}(k+1), u(k), 0} = -F^{-1}(k)\, \Gamma(k) = -F^{-1}(k) \left. \frac{\partial f}{\partial v(k)} \right|_{k, \hat{x}(k), u(k), 0}

The linearized equation then becomes:

0 = F^{-1}(k)\, [x(k+1) - \bar{x}(k+1)] - F^{-1}(k)\, \Gamma(k)\, v(k)

Similarly, we linearize 3) as shown below:

0 = z(k+1) - h[k+1, \bar{x}(k+1)] - \left. \frac{\partial h}{\partial x(k+1)} \right|_{k+1, \bar{x}(k+1)} [x(k+1) - \bar{x}(k+1)]

We define

H(k+1) = \left. \frac{\partial h}{\partial x(k+1)} \right|_{k+1, \bar{x}(k+1)}

Also, we know

\bar{z}(k+1) = h[k+1, \bar{x}(k+1)]

The linearized equation for 3) reduces to:

0 = z(k+1) - \bar{z}(k+1) - H(k+1)\, [x(k+1) - \bar{x}(k+1)]

Summary: the three linearized equations are:

1. 0 = F^{-1}(k)\, [x(k+1) - \bar{x}(k+1)] - F^{-1}(k)\, \Gamma(k)\, v(k)


2. 0 = v(k)

3. 0 = z(k+1) - \bar{z}(k+1) - H(k+1)\, [x(k+1) - \bar{x}(k+1)]

A new cost function is obtained by substituting the linearized equations back into the cost function J_b:

J_b[k, v(k), x(k+1)] = \frac{1}{2}\left[ x(k+1) - \bar{x}(k+1) - \Gamma(k)\, v(k) \right]^T F^{-T}(k)\, P^{-1}(k)\, F^{-1}(k)\, [\ldots]
  + \frac{1}{2} v^T(k)\, Q^{-1}(k)\, v(k)
  + \frac{1}{2}\left\{ z(k+1) - \bar{z}(k+1) - H(k+1)\, [x(k+1) - \bar{x}(k+1)] \right\}^T R^{-1}(k+1)\, \{\ldots\}     (615)

The new cost function is minimized w.r.t. x(k+1) and v(k). If the linearization is good, this is very close to maximizing the a posteriori likelihood function, and it can be viewed as the justification for the EKF. There are also analogies that represent the Extended Kalman filter as a square root information filter.
20.2 Iterated Kalman Filter


The traditional EKF takes only one Gauss-Newton step. The iterated EKF takes multiple Gauss-Newton steps, re-linearizing only the measurement equation at each step. (One could also imagine re-linearizing the f^{-1} dynamics using \hat{x}(k+1) and \hat{v}(k). In fact, one can imagine re-linearizing over the last N steps, which is the idea behind the BSEKF.)

Consider

J_c[k+1, x(k+1)] = J_b[k, v_{OPT}(k), x(k+1)]

(i.e. J_b evaluated at the v(k) that minimizes it). The approximation comes from the linearized dynamics (linearized about \hat{x}(k)); the measurement equation remains non-linear.

Strategy: Start with a guess of x(k+1), solve the linearized LS problem for a new guess, then iterate. Let \hat{x}_i(k+1) be the estimate of x(k+1) after the i-th Gauss-Newton step; note that \hat{x}_0(k+1) = \bar{x}(k+1). Also, let

H_i(k+1) = \left. \frac{\partial h}{\partial x} \right|_{k+1, \hat{x}_i(k+1)}

The linearized measurement equation after the i-th step is:

0 = z(k+1) - h[k+1, \hat{x}_i(k+1)] - H_i(k+1)\left[ x(k+1) - \hat{x}_i(k+1) \right]

Relate these iterations back to the standard KF equations via the gradient:

\left( \frac{\partial J_c^i}{\partial x(k+1)} \right)^T
  = \bar{P}^{-1}(k+1)\left[ x(k+1) - \bar{x}(k+1) \right]
  - H_i^T(k+1)\, R^{-1}(k+1)\left\{ z(k+1) - h[k+1, \hat{x}_i(k+1)] - H_i(k+1)\left[ x(k+1) - \hat{x}_i(k+1) \right] \right\}     (616)


Let

P_{i+1}(k+1) = \left[ \bar{P}^{-1}(k+1) + H_i^T(k+1)\, R^{-1}(k+1)\, H_i(k+1) \right]^{-1}

Then, setting the gradient to zero and solving for x(k+1) yields:

\hat{x}_{i+1}(k+1) = \hat{x}_i(k+1)
  + P_{i+1}(k+1)\, H_i^T(k+1)\, R^{-1}(k+1)\left\{ z(k+1) - h[k+1, \hat{x}_i(k+1)] \right\}
  + P_{i+1}(k+1)\, \bar{P}^{-1}(k+1)\left[ \bar{x}(k+1) - \hat{x}_i(k+1) \right]     (617)

with P_0(k+1) = \bar{P}(k+1) and \hat{x}_0(k+1) = \bar{x}(k+1). Note that \hat{x}_1(k+1) is the traditional (non-iterated) EKF estimate.

Stop iterating when the expression below gets very small:

\left\| \hat{x}_{i+1}(k+1) - \hat{x}_i(k+1) \right\| < \epsilon

Use step size adjustment as before if worried about divergence.
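A small MATLAB sketch of this iteration (Eqs. (616)-(617)); h and H_of are assumed function handles for the measurement model and its Jacobian, and maxIter and tol are illustrative tuning parameters.

    function [xhat, P] = iterated_update(xbar, Pbar, z, h, H_of, R, maxIter, tol)
    % Iterated (Gauss-Newton) measurement update, Eq. (617) (a sketch).
    xi = xbar;                                  % xhat_0(k+1) = xbar(k+1)
    for i = 1:maxIter
        Hi   = H_of(xi);
        Pi   = inv(inv(Pbar) + Hi' * (R \ Hi)); % P_{i+1}(k+1)
        xnew = xi + Pi * (Hi' * (R \ (z - h(xi)))) ...
                  + Pi * (Pbar \ (xbar - xi));
        if norm(xnew - xi) < tol
            xi = xnew;
            break;
        end
        xi = xnew;
    end
    xhat = xi;
    P    = Pi;
    end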

20.3 Forward-Backward Smoothing (BSEKF)

This is used to deal with current and several past measurement non-linearities plus dynamics non-linearities. Solve for

x^*(k-j) \text{ and } v^*(k-j), \quad j = m, m-1, \ldots, 0

such that they minimize the expression below:

\frac{1}{2}\left[ x(k-m) - \hat{x}(k-m) \right]^T P^{-1}(k-m)\, [\ldots]
  + \frac{1}{2} \sum_{l=k-m}^{k-1} v^T(l)\, Q^{-1}(l)\, v(l)
  + \frac{1}{2} \sum_{l=k-m+1}^{k} \left\{ z(l) - h[l, x(l)] \right\}^T R^{-1}(l)\, \{\ldots\}     (618)

subject to

x(l+1) = f[l, x(l), u(l), v(l)]

Steps:

Start with an initial guess of x(k-j).

Linearize the dynamics about this guess at each time.

Solve the linear smoothing problem with a step-size adjustment algorithm.

Propagate the result forward.


21 Multiple Model (MM) Filtering


Scribe: Yousof Mortazavi

Let \theta be a vector of unknown parameters; it might affect any of F, G, \Gamma, H, Q, R, \hat{x}(0), P(0).

A Bayesian approach to multiple model filtering seeks the following:

p[x(k), \theta \,|\, Z^k] = p[x(k) \,|\, \theta, Z^k] \cdot p[\theta \,|\, Z^k]

For convenience, let \theta take on values in \{\theta_1, \theta_2, \ldots, \theta_M\}. Then:

p[x(k) \,|\, \theta_j, Z^k] is the posterior density of x(k) under the j-th model.

p[\theta = \theta_j \,|\, Z^k] \equiv \mu_j(k) is the probability that the j-th model is correct given Z^k.

21.1 Static Case

Consider the static case, where \theta(k) = \theta = \text{const} (and \sum_{j=1}^{M} \mu_j = 1).

21.1.1 Strategy

1. Determine how to propagate \mu_j(k) to \mu_j(k+1).

2. Find p[x(k) | Z^k] = \sum_{j=1}^{M} \mu_j(k)\, p[x(k) | \theta_j, Z^k] and use this to choose an optimal \hat{x}(k) per MAP or MMSE. Also, calculate P(k).

21.1.2 Steps
1. Propagate:

\mu_j(k) \equiv p[\theta_j | Z^k] = p[\theta_j | z(k), Z^{k-1}]
  = \frac{p[z(k) | \theta_j, Z^{k-1}]\; p[\theta_j | Z^{k-1}]}{p[z(k) | Z^{k-1}]}
  = \frac{p[z(k) | \theta_j, Z^{k-1}]\; \mu_j(k-1)}{\sum_{l=1}^{M} p[z(k) | \theta_l, Z^{k-1}]\; \mu_l(k-1)}

The factor p[z(k) | \theta_j, Z^{k-1}] is the likelihood function of model \theta_j at time k. In the linear Gaussian case:

p[z(k) | \theta_j, Z^{k-1}] = \mathcal{N}(z(k); H_j(k)\, \bar{x}_j(k), S_j(k)) = \mathcal{N}(\nu_j(k); 0, S_j(k)) = p(\nu_j(k))

where S_j(k) is the innovation covariance matrix for model \theta_j and \nu_j(k) = z(k) - H_j(k)\, \bar{x}_j(k) is the innovation of the j-th Kalman filter.

2. Estimate:

\hat{x}_{MAP}(k) = \arg\max_{x(k)} p[x(k) | Z^k]

\hat{x}_{MMSE}(k) = E[x(k) | Z^k] = \sum_{j=1}^{M} \hat{x}_j(k)\, \mu_j(k)

P_{MMSE}(k) = \sum_{j=1}^{M} \mu_j(k)\left\{ P_j(k) + \underbrace{[\hat{x}_j(k) - \hat{x}_{MMSE}(k)][\hat{x}_j(k) - \hat{x}_{MMSE}(k)]^T}_{\text{spread of the means}} \right\}

Fig. 27 shows the schematic for the MMSE case: a bank of M Kalman filters runs in parallel, each producing \hat{x}_j(k), the weights \mu_j(k) are propagated, and the weighted sum forms \hat{x}_{MMSE}(k).

Figure 27: Multiple model filter schematic for the MMSE case
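A small MATLAB sketch of the weight propagation above (assuming the Statistics Toolbox for mvnpdf); nu and S are cell arrays holding each model's innovation and innovation covariance at time k, and the names are illustrative.

    function mu = mm_weight_update(mu_prev, nu, S)
    % Static multiple-model weight update (a sketch).
    M = numel(mu_prev);
    L = zeros(M, 1);
    for j = 1:M
        L(j) = mvnpdf(nu{j}(:)', zeros(1, numel(nu{j})), S{j});  % N(nu_j; 0, S_j)
    end
    mu = (L .* mu_prev(:)) / sum(L .* mu_prev(:));               % normalized weights
    end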

21.2 Remarks

Choosing \{\theta_1, \theta_2, \ldots, \theta_M\} is the subject of ongoing research:

Similar models are hard to distinguish, and large noise makes models hard to distinguish.

An approximate multiple model filter for nonlinear systems can be designed -- just replace each KF with an EKF.

If the correct \theta is among the \theta_j, then the corresponding \mu_j(k) will approach 1 as k \to \infty.

Q: What can be done for time-varying \theta(k)?

A1: Ad hoc modification: impose an artificial lower bound on the \mu_j(k).

A2: Dynamic multiple model filter. Let \theta(k) \in \{\theta_1, \theta_2, \ldots, \theta_M\} and assume model switching is a Markov process with transition probabilities given by

p_{ij} = p[\theta(k) = \theta_j \,|\, \theta(k-1) = \theta_i]

Then for a general non-linear system

x(k+1) = f[k, x(k), \theta(k)]
z(k) = h[k, x(k), \theta(k)]

this is a Hidden Markov Model. There exist M^k possible model sequences at time k.

Q: How can we deal practically with such exponential growth?

A:

Method                                                              Complexity (# of filters)
(1) retain only the most probable sequences                         M
(2) combine sequences that differ only before the last N steps      M^N

Bar-Shalom introduces:

Generalized Pseudo-Bayesian estimator of first order (GPB1) (N = 1)

Generalized Pseudo-Bayesian estimator of second order (GPB2) (N = 2)

Interacting Multiple Model Estimator (IMM) (like N = 2, but only M filters are used)

For additional material, please refer to Maybeck Section 10.8 and Bar-Shalom Section 11.6.


22 The "Bootstrap" Particle Filter

22.1 Motivation

A more perfect linear estimator. All MMSE estimators, including approximate techniques such as the Extended Kalman Filter (EKF) and the Sigma Point Filter (SPF), reduce to taking the approximate conditional mean and covariance:

\hat{x}(k) = E[x(k) | z^k]

P(k) = E\left[ [x(k) - \hat{x}(k)][x(k) - \hat{x}(k)]^T \right]

Remarks:

EKF-produced approximations for \hat{x}(k) and P(k) can be poor.

SPF approximations are generally better.

But both the EKF and SPF consider only the first two moments of the posterior pdf p[x(k) | z^k].

Q: What if the posterior pdf is multi-modal?

Q: What would the estimation look like without approximation?

A: (Assume no control input for simplicity)

x(k+1) = f[k, x(k), v(k)]
z(k+1) = h[k+1, x(k+1)] + w(k+1)

22.1.1 Propagation

Prior at k+1 (also known as the Chapman-Kolmogorov equation):

p[x(k+1) | z^k] = \int p[x(k+1) | x(k)]\; p[x(k) | z^k]\; dx(k)

22.1.2 Update

p[x(k+1) | z^{k+1}] = \frac{p[z(k+1) | x(k+1)]\; p[x(k+1) | z^k]}{p[z(k+1) | z^k]}

This recursion gives us access to the entire posterior pdf. We can then choose to optimize our estimate against any criterion we wish.

Q: Why should we settle for anything less than this optimal estimate?

A: Think about the one-dimensional problem: we can approach optimality by numerical integration. But the grid size must be small and the grid must capture the tails of the distribution. Now think about the multi-dimensional problem: if we need 100 cells per dimension and we have n_x dimensions, the total number of cells is 100^{n_x}. Another problem: the grid-based method is not readily parallelizable.

The Particle Filter suffers from the same drawbacks as the grid-based method (massive memory and computation are required even for modest n_x; not readily parallelizable), but we'll study it anyway.
Key Idea: Estimate the posterior using weighted "particles":

p[x(k) | z^k] \approx \sum_{i=1}^{N_s} w_i\, \delta[x(k) - \chi_i(k)], \qquad \sum_i w_i = 1

The \chi_i are called particles or support points. To choose the support points \chi_i and the weights w_i, we approximate the process of drawing a sample from a distribution.

Problem: For an arbitrary distribution, it is computationally expensive to generate random samples.

22.1.3 Importance Sampling

Suppose we can't draw from p(x) economically, but we can draw from another distribution q(x), called the importance density, where

1. q(x) is non-zero everywhere p(x) is non-zero

2. q(x) and p(x) have similar mean and covariance

Then p(x) can be approximated as

p(x) \approx \sum_{i=1}^{N_s} w_i\, \delta[x - \chi_i]

where we draw \{\chi_i\}_{i=1}^{N_s} from q(x), and the w_i are given by

w_i = c\, \frac{p(\chi_i)}{q(\chi_i)}

where c is a normalizing constant. The approximation becomes exact as N_s \to \infty.

Open Questions:

Q: How fast does the approximation approach the truth?

Q: What should N_s be for a given problem?

22.2 Particle Filter Algorithm

Basic Idea: Like the SPF except that

1. We use more samples

2. We draw them randomly

3. We don't attempt to shoehorn transformed samples into a Gaussian distribution

Generic Steps (later we'll focus specifically on the Bootstrap variant):

1. Choose an importance density q[x(k) | z^k].

2. Draw samples \chi_i(k), i = 1, 2, \ldots, N_s, from q[x(k) | z^k].

3. Compute importance weights:

w_i(k) = \frac{p[\chi_i(k) | z^k]}{q[\chi_i(k) | z^k]} \left[ \sum_i \frac{p[\chi_i(k) | z^k]}{q[\chi_i(k) | z^k]} \right]^{-1}

4. If necessary, approximate

p[x(k) | z^k] \approx \sum_{i=1}^{N_s} w_i(k)\, \delta[x(k) - \chi_i(k)]

and compute basic estimation quantities:

\hat{x}(k) = E[x(k) | z^k] \approx \sum_i w_i(k)\, \chi_i(k)

P(k) = E\left[ [x(k) - \hat{x}(k)][x(k) - \hat{x}(k)]^T | z^k \right] \approx \sum_i w_i(k)\, [\chi_i(k) - \hat{x}(k)][\chi_i(k) - \hat{x}(k)]^T

This step is not necessary unless you actually have to provide a single estimate.

22.2.1 How to choose q[x(k)|z^k]?

Choose q to factor in a convenient way:

q[\chi_i(0), \ldots, \chi_i(k) | z^k] = q[\chi_i(k) | \chi_i(0), \ldots, \chi_i(k-1)]\; q[\chi_i(0), \ldots, \chi_i(k-1) | z^k]

Consider the true density

p[\chi_i(0), \ldots, \chi_i(k) | z^k] = p[\chi_i(0), \ldots, \chi_i(k) | z^{k-1}, z(k)]
  = \frac{p[z(k) | \chi_i(0), \ldots, \chi_i(k), z^{k-1}]\; p[\chi_i(0), \ldots, \chi_i(k) | z^{k-1}]}{p[z(k) | z^{k-1}]}

Assume: p[z(k) | \chi_i(0), \ldots, \chi_i(k), z^{k-1}] = p[z(k) | \chi_i(k)] (zeroth-order Markov assumption). Then

p[\chi_i(0), \ldots, \chi_i(k) | z^k] = p[z(k) | \chi_i(k)]\; \frac{p[\chi_i(0), \ldots, \chi_i(k) | z^{k-1}]}{p[z(k) | z^{k-1}]}

Use Bayes' rule on the second factor in the numerator to get:

p[\chi_i(0), \ldots, \chi_i(k) | z^{k-1}] = p[\chi_i(k) | \chi_i(0), \ldots, \chi_i(k-1), z^{k-1}]\; p[\chi_i(0), \ldots, \chi_i(k-1) | z^{k-1}]

Assume: p[\chi_i(k) | \chi_i(0), \ldots, \chi_i(k-1), z^{k-1}] = p[\chi_i(k) | \chi_i(k-1)] (first-order Markov assumption). Then

p[\chi_i(0), \ldots, \chi_i(k) | z^k] = \frac{p[z(k) | \chi_i(k)]\; p[\chi_i(k) | \chi_i(k-1)]\; p[\chi_i(0), \ldots, \chi_i(k-1) | z^{k-1}]}{p[z(k) | z^{k-1}]}

Now consider the weights:

w_i(k) = \frac{p[\chi_i(0), \ldots, \chi_i(k) | z^k]}{q[\chi_i(0), \ldots, \chi_i(k) | z^k]}

Then, making use of the above, we can write:

w_i(k) = c\; \frac{p[z(k) | \chi_i(k)]\; p[\chi_i(k) | \chi_i(k-1)]}{q[\chi_i(k) | \chi_i(0), \ldots, \chi_i(k-1)]} \cdot \frac{p[\chi_i(0), \ldots, \chi_i(k-1) | z^{k-1}]}{q[\chi_i(0), \ldots, \chi_i(k-1) | z^k]}

where the second fraction is the same as w_i(k-1).

How do we pick q[\chi_i(k) | \chi_i(0), \ldots, \chi_i(k-1)]? Bootstrap method: q = p[\chi_i(k) | \chi_i(k-1)]. Then the w_i(k) become:

w_i(k) = c\; p[z(k) | \chi_i(k)]\; w_i(k-1)

(This is similar to the Multiple-Model updates for the \mu_i's.)

22.3 Bootstrap Algorithm


1. Draw initial particles \chi_i(0), i \in [1, N_s], from the known initial probability density p[x(0)], and initialize the weights on each particle equally:

\chi_i(0) \sim p[x(0)], \quad i \in [1, N_s]

w_i(0) = \frac{1}{N_s}, \quad i \in [1, N_s]

2. Draw one sample of the process noise for each particle. Note this step is performed only because the bootstrap filter makes the particular choice of importance density q[x(k)] = p[x(k) | x(k-1)], which is generally made because the process noise is often assumed to be Gaussian.

v_i(k-1) \sim p[\tilde{v}(k-1)], \quad i \in [1, N_s]

3. Propagate each particle forward according to the dynamics function. This is analogous to the prediction step in the EKF or SPF, as it predicts the particle set forward in time without any measurement updates. Notice the particle weights don't change during this step.

\chi_i(k) = f[\chi_i(k-1), v_i(k-1), u(k-1)]

4. Repeat steps 2 and 3 until the next measurement update. Note that times between measurements can be subdivided into multiple predictions if necessary for accurate modeling or computational savings.

5. At the time of the measurement, recalculate the weights on the particles according to the bootstrap weight update equation. Note the use of the primed w_i'(k) to indicate that it is not yet normalized:

w_i'(k) = p[z(k) | \chi_i(k)]\; w_i(k-1)

Very small likelihoods p[z(k) | \chi_i(k)] may cause numerical underflow problems in the particle filter, i.e. a weight w_i'(k) might get set to zero because it is too small to represent in double precision. To be safe, the particles may be updated according to log-likelihoods:

\log[w_i'(k)] = \log\left[ p[z(k) | \chi_i(k)] \right] + \log[w_i(k-1)]

w_i''(k) = \exp\left[ \log(w_i'(k)) - \max_i[\log(w_i'(k))] \right]

where w_i''(k) indicates that the doubly-primed weights are also not normalized and are different from w_i'(k). In performing this log-likelihood update, the value of the largest weight \max_i[\log(w_i'(k))] has been subtracted from each weight before taking the exponent. This scales all weights prior to the exponentiation for added numerical robustness. In the particular (and typical) case of zero-mean additive Gaussian white measurement noise, the weight update is particularly simple. That is, if z(k) = h[x(k)] + w(k) with w(k) \sim \mathcal{N}(0, R), then:

\log[w_i'(k)] = -\frac{1}{2}\left[ z(k) - h(\chi_i(k)) \right]^T R^{-1}(k)\left[ z(k) - h(\chi_i(k)) \right] + \log[w_i(k-1)]

w_i''(k) = \exp\left( \log[w_i'(k)] - \max_i(\log[w_i'(k)]) \right)

Notice that in taking the log of the Gaussian likelihood, we drop the normalization constant. That constant is the same for all weights, so it gets cancelled when the weights are re-normalized later on.
6. Re-normalize the weights so they sum to unity. This is necessary in order to preserve the fact that the set of particles actually represents a discrete approximation to the posterior probability density of x(k):

w_i(k) = \frac{w_i''(k)}{\sum_{i=1}^{N_s} w_i''(k)}

7. Evaluate the effective number of particles \hat{N}_s:

\hat{N}_s = \frac{1}{\sum_{i=1}^{N_s} (w_i(k))^2}

8. Resample the particles if the effective number \hat{N}_s is too low. A decent heuristic is to resample if \hat{N}_s < N_s/2, but resampling more or less often than that may also be justified. Here is a common resampling algorithm to be used when the particle filter needs to be resampled:

(a) Choose a random number \eta uniformly on [0, 1]

(b) Find m such that \sum_{j=1}^{m-1} w_j(k) < \eta \le \sum_{j=1}^{m} w_j(k)

(c) Set \chi_i^{new}(k) = \chi_m(k) and w_i^{new}(k) = \frac{1}{N_s}

(d) Repeat steps (a) through (c) until \chi_i^{new}(k), i = [1, N_s], are all chosen

(e) Delete the old set of particles and use the new set and new weights

Note that some old particles might appear more than once in the new set, whereas others might disappear altogether.

9. Compute basic estimation statistics when desired (but don't throw out the particle set!):

\hat{x}(k) \approx \sum_{i=1}^{N_s} w_i(k)\, \chi_i(k)

P(k) \approx \sum_{i=1}^{N_s} w_i(k)\, [\chi_i(k) - \hat{x}(k)][\chi_i(k) - \hat{x}(k)]^T

10. Return to step 2 and continue.
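A compact MATLAB sketch of one measurement cycle of the above (steps 2-9); f and h are assumed function handles for the dynamics and measurement, Qchol = chol(Q), and the interface is illustrative.

    function [chi, w, xhat] = bootstrap_pf_step(chi, w, u, z, f, h, Qchol, R, Ns)
    % One bootstrap particle filter cycle (a sketch, not a tested implementation).
    nv   = size(Qchol, 1);
    logw = log(w);
    for i = 1:Ns
        vi = Qchol' * randn(nv, 1);                  % step 2: draw process noise
        chi(:, i) = f(chi(:, i), vi, u);             % step 3: propagate particle
        nu = z - h(chi(:, i));
        logw(i) = logw(i) - 0.5*(nu' * (R \ nu));    % step 5: log-likelihood update
    end
    w = exp(logw - max(logw));                       % rescale for numerical safety
    w = w / sum(w);                                  % step 6: normalize
    Neff = 1 / sum(w.^2);                            % step 7: effective particle count
    if Neff < Ns/2                                   % step 8: resample if needed
        edges  = cumsum(w);
        newchi = zeros(size(chi));
        for i = 1:Ns
            m = find(rand <= edges, 1, 'first');
            newchi(:, i) = chi(:, m);
        end
        chi = newchi;
        w   = ones(1, Ns) / Ns;
    end
    xhat = chi * w(:);                               % step 9: MMSE estimate
    end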

22.3.1 Note:

Q: What is p[z(k) | \chi_i(k)]?

A: Suppose z(k) = h[k, x(k)] + w(k), with w(k) \sim \mathcal{N}(0, R). Then

p[z(k) | \chi_i(k)] = \mathcal{N}(z(k); h[\chi_i(k)], R)
