
Advanced Signal Processing

Introduction to Estimation Theory


Danilo Mandic,
room 813, ext: 46271

Department of Electrical and Electronic Engineering


Imperial College London, UK
d.mandic@imperial.ac.uk, © D. P. Mandic

URL: www.commsp.ee.ic.ac.uk/mandic
Advanced Signal Processing, Spring 2015

Aims of this lecture


◦ To introduce the notions of: Estimator, Estimate, Estimandum
◦ To discuss the bias and variance in statistical estimation theory, and asymptotically unbiased and consistent estimators
◦ Performance metric: the Mean Square Error (MSE)
◦ The bias-variance dilemma and the MSE in this context
◦ To derive a feasible MSE estimator
◦ A class of Minimum Variance Unbiased (MVU) estimators
◦ Extension to the vector parameter case
◦ Point estimators, confidence intervals, statistical goodness of an estimator, the role of noise


Role of estimation in signal processing


(try also the function specgram in Matlab)

An enabling technology in many electronic signal processing systems


1. Radar
2. Sonar
3. Speech
4. Image analysis
5. Biomedicine
6. Communications
7. Control
8. Seismics
9. Almost everywhere ...

◦ Radar and sonar: range and azimuth
◦ Image analysis: motion estimation, segmentation
◦ Speech: features used in recognition and speaker verification
◦ Seismics: oil reservoirs
◦ Communications: equalization, symbol detection
◦ Biomedicine: various applications

Statistical estimation problem


for simplicity, consider a DC level in WGN: x[n] = A + w[n], w[n] ∼ N(0, σ²)

Problem statement: We seek to determine a set of parameters θ = [θ_1, ..., θ_p]ᵀ from a set of data points x = [x[0], ..., x[N−1]]ᵀ, such that the values of these parameters would yield the highest probability of obtaining the observed data. In other words,

    max_θ p(x; θ)        (reads: "p(x) parametrised by θ")

◦ The unknown parameters may be seen as deterministic or as random variables
◦ There are essentially two alternatives to the statistical case:
    - No a priori distribution assumed: Maximum Likelihood
    - A priori distribution known: Bayesian estimation
◦ Key problem → to estimate a group of parameters from a discrete-time signal or dataset.
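As an illustration of max_θ p(x; θ), here is a minimal sketch (not from the slides) that evaluates the Gaussian log-likelihood of the DC-level-in-WGN model over a grid of candidate values of A; the true values, noise level, and search range are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
N, A_true, sigma = 50, 1.0, 1.0                  # assumed setup
x = A_true + sigma * rng.standard_normal(N)      # x[n] = A + w[n], w ~ N(0, sigma^2)

def log_likelihood(A, x, sigma):
    """Log of p(x; A) for the DC-level-in-WGN model."""
    n = len(x)
    return -n / 2 * np.log(2 * np.pi * sigma**2) - np.sum((x - A)**2) / (2 * sigma**2)

A_grid = np.linspace(-3.0, 5.0, 1601)            # assumed search range for the unknown A
ll = np.array([log_likelihood(A, x, sigma) for A in A_grid])
A_hat = A_grid[ll.argmax()]                      # the A that maximises p(x; A)
print(f"A_hat = {A_hat:.3f}, sample mean = {x.mean():.3f}")
```

For this model, the grid maximiser agrees (up to grid resolution) with the sample mean, anticipating the sample mean results later in this lecture.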

Estimation of a scalar random variable


Given an N-point dataset x[0], x[1], ..., x[N−1] which depends on an unknown (scalar) parameter θ, define an estimator as some function, g, of the dataset, that is

    θ̂ = g(x[0], x[1], ..., x[N−1])

which may be used to estimate θ (single parameter case).
(in our DC level estimation problem, θ = A)
◦ This defines the problem of parameter estimation
◦ We also need to determine the function g(·)
◦ Theory and techniques of statistical estimation are available
◦ Estimation based on PDFs which contain unknown but deterministic parameters is termed classical estimation
◦ In Bayesian estimation, the unknown parameters are assumed to be random variables, which may be prescribed a priori to lie within some range of allowable parameters (or desired performance)

The statistical estimation problem


◦ First step: to model, mathematically, the data
◦ We employ a Probability Density Function (PDF) to describe the inherently random measurement process, that is

    p(x[0], x[1], ..., x[N−1]; θ)

which is parametrised by the unknown parameter θ
◦ Example: for N = 1, and with θ denoting the mean value, a generic form of PDF for the class of Gaussian PDFs with any value of θ is given by

    p(x[0]; θ) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[0] − θ)² }

Clearly, the observed value of x[0] impacts upon the likely value of θ.
(Figure: the Gaussian PDF p(x[0]; θ), centred at θ, plotted against x[0].)


Estimator vs. Estimate


◦ The parameter θ to be estimated is then viewed as a realisation of a random variable
◦ Data are described by the joint PDF of the data and the parameters:

    p(x, θ) = p(x | θ) · p(θ)
              (conditional PDF) (prior PDF)

◦ An estimator is a rule that assigns a value of θ̂ to each realisation of x = [x[0], ..., x[N−1]]ᵀ
◦ An estimate θ̂ of θ (the estimandum) is the value obtained for a given realisation of x = [x[0], ..., x[N−1]]ᵀ, in the form θ̂ = g(x)
◦ Example: for a noisy straight line, x[n] = A + Bn + w[n]:

    p(x; θ) = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² }

◦ Performance is critically dependent upon this PDF assumption: the estimator should be robust to slight mismatch between the measurement and the PDF assumption

Example: finding the parameters of straight line


Specification of the PDF is critical in determining a good estimator.
In practice, we choose a PDF which fits the problem constraints and any a priori information; but it must also be mathematically tractable.

Example: Assume that on average the data are increasing.
Data: a straight line embedded in random noise, w[n] ∼ N(0, σ²):

    x[n] = A + Bn + w[n],   n = 0, 1, ..., N−1

    p(x; A, B) = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² }

Unknown parameters: A, B → θ = [A B]ᵀ

(Figure: the noisy line x[n] and the ideal noiseless line with intercept A.)

Careful: what would be the effects of bias in A and B?
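To make the example concrete, here is a minimal sketch (not from the slides) of estimating θ = [A B]ᵀ by least squares, which for this Gaussian model is equivalent to maximising p(x; A, B); the true parameter values and noise level are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
N, A_true, B_true, sigma = 100, 2.0, 0.5, 1.0   # assumed values
n = np.arange(N)
x = A_true + B_true * n + sigma * rng.standard_normal(N)

# Maximising p(x; A, B) over (A, B) is equivalent to minimising
# sum_n (x[n] - A - B n)^2, i.e. ordinary least squares.
H = np.column_stack([np.ones(N), n])            # observation matrix
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)
print(f"A_hat = {theta_hat[0]:.3f}, B_hat = {theta_hat[1]:.3f}")
```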

Bias in parameter estimation


Estimation theory (scalar case): estimate the value of an unknown parameter, θ, from a set of observations of a random variable described by that parameter:

    θ̂ = g( x[0], x[1], ..., x[N−1] )


Example: given a set of observations from Gaussian distribution, estimate
the mean or variance from these observations.

◦ Recall that in linear mean square estimation, when estimating the value of a random variable y from an observation of a related random variable x, the coefficients a and b in the estimate ŷ = ax + b depend upon the mean and variance of x and y, as well as on their correlation.
◦ The difference between the expected value of the estimate and the actual value is called the bias, and will be denoted by B:

    B = θ − E{θ̂_N}

where θ̂_N denotes estimation over N data samples, x[0], ..., x[N−1]

Asymptotic unbiasedness
If the bias is zero, then the expected value of the estimate is equal to the true value, that is

    E{θ̂_N} = θ   ⇔   B = θ − E{θ̂_N} = 0

and the estimate is said to be unbiased.
If B ≠ 0, then the estimator θ̂ = g(x) is said to be biased.

Example: Consider the following sample mean estimator of the signal x[n] = A + w[n], w ∼ N(0, 1), given by

    Â = x̄ = (1/(N+2)) Σ_{n=0}^{N−1} x[n],   that is, θ̂ = Â

Is the above sample mean estimator of the true mean A biased?

More often: an estimator is biased, but the bias B → 0 as N → ∞:

    lim_{N→∞} E{θ̂_N} = θ

Such an estimator is said to be asymptotically unbiased.
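A quick numerical check (not from the slides): for the estimator above, E{Â} = NA/(N+2), so the bias equals 2A/(N+2) and vanishes as N grows. The true mean and the number of Monte Carlo runs below are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
A, trials = 1.0, 20000                            # assumed true mean and number of runs
for N in (10, 100, 1000):
    x = A + rng.standard_normal((trials, N))      # w ~ N(0, 1)
    A_hat = x.sum(axis=1) / (N + 2)               # the biased sample mean estimator
    print(f"N={N:4d}: empirical bias={A - A_hat.mean():+.4f}, "
          f"theory 2A/(N+2)={2 * A / (N + 2):+.4f}")
```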



How about the variance?


◦ It is desirable that an estimator be either unbiased or asymptotically unbiased (think about the power of the estimation error due to a DC offset)
◦ For an estimate to be meaningful, it is necessary that we use the available statistics effectively, that is,

    var{θ̂} → 0   as   N → ∞

or, in other words,

    lim_{N→∞} var{θ̂_N} = lim_{N→∞} E{ |θ̂_N − E{θ̂_N}|² } = 0

◦ If θ̂_N is unbiased, then E{θ̂_N} = θ, and from the Tchebycheff inequality, for every ε > 0,

    Pr{ |θ̂_N − θ| ≥ ε } ≤ var{θ̂_N} / ε²

◦ If var{θ̂_N} → 0 as N → ∞, then the probability that θ̂_N differs by more than ε from the true value will go to zero (showing consistency). In this case, θ̂_N is said to converge to θ with probability one.

Mean square convergence


Another form of convergence, stronger than convergence with probability one, is mean square convergence. An estimate θ̂_N is said to converge to θ in the mean-square sense if

    lim_{N→∞} E{ |θ̂_N − θ|² } = 0
                (mean square error)

◦ For an unbiased estimator this is equivalent to the previous condition that the variance of the estimate goes to zero
◦ An estimate is said to be consistent if it converges, in some sense, to the true value of the parameter
◦ We say that the estimator is consistent if it is asymptotically unbiased and has a variance that goes to zero as N → ∞


Example: Assessing the performance of the Sample Mean as an estimator
Consider the estimation of a DC level, A, in random noise, which could be modelled as

    x[n] = A + w[n]

where w[n] is some zero-mean random process.
Aim: to estimate A given {x[0], x[1], ..., x[N−1]}.
Intuitively, the sample mean is a reasonable estimator:

    Â = (1/N) Σ_{n=0}^{N−1} x[n]

Q1: How close will Â be to A?
Q2: Are there better estimators than the sample mean?

Mean and variance of the Sample Mean estimator


x[n] = A + w[n],   w[n] ∼ N(0, σ²)

An estimator is a function of the random data, and is thus a random variable itself → its performance must be judged statistically.

(1) What is the mean of Â?

    E{Â} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{x[n]} = A   → unbiased

(2) What is the variance of Â?
Assumption: the samples of w[n] are uncorrelated.

    var{Â} = var{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N²) Σ_{n=0}^{N−1} var{x[n]} = (1/N²) · σ²N = σ²/N

Notice the variance → 0 as N → ∞ → consistent (see your P&A sets)
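A short Monte Carlo check (not from the slides) of both results; the values of A, σ, N, and the number of trials are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 2.0, 64, 50000     # assumed setup
x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                        # sample mean for each independent trial

print(f"mean of A_hat: {A_hat.mean():.4f}  (theory: A = {A})")
print(f"var  of A_hat: {A_hat.var():.4f}  (theory: sigma^2/N = {sigma**2 / N:.4f})")
```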

Minimum Variance Unbiased (MVU) estimation


Aim: to establish good estimators of unknown deterministic parameters.
◦ Unbiased estimator → on the average yields the true value of the unknown parameter, independent of its particular value, i.e.

    E(θ̂) = θ,   a < θ < b

where (a, b) denotes the range of possible values of θ.
◦ Example: Unbiased estimator for a DC level in White Gaussian Noise (WGN). If we are given

    x[n] = A + w[n],   n = 0, 1, ..., N−1

where A is the unknown, but deterministic, parameter to be estimated, which lies within the interval (−∞, ∞), then the sample mean can be used as an estimator of A, namely

    Â = (1/N) Σ_{n=0}^{N−1} x[n]

Careful: the estimator is parameter dependent!


◦ An estimator may be unbiased for certain values of the unknown parameter but not all; such an estimator is not unbiased.
◦ Consider another sample mean estimator:

    Ǎ = (1/(2N)) Σ_{n=0}^{N−1} x[n]

Therefore:

    E{Ǎ} = A/2,   so E{Ǎ} = 0 when A = 0, but E{Ǎ} = A/2 ≠ A when A ≠ 0   (parameter dependent)

◦ Hence Ǎ is not an unbiased estimator.


◦ A biased estimator introduces a systematic error which should not generally be present
◦ Our goal is to avoid bias if we can, as we are interested in stochastic signal properties and bias is largely deterministic

Remedy
(also look in the Assignment dealing with PSD in your CW )

◦ Several unbiased estimates of the same quantity may be averaged together, i.e. given the L independent estimates

    { θ̂_1, θ̂_2, ..., θ̂_L }

we may choose to average them, to yield

    θ̄ = (1/L) Σ_{l=1}^{L} θ̂_l

◦ Our assumption was that the individual estimators are unbiased, with equal variance, and uncorrelated with one another.
◦ Then (NB: averaging biased estimators will not remove the bias)

    E{θ̄} = θ   and   var{θ̄} = (1/L²) Σ_{l=1}^{L} var{θ̂_l} = (1/L) var{θ̂_l}

◦ Note: as L → ∞, var{θ̄} → 0 (consistent)
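A minimal sketch (not from the slides) verifying the 1/L variance reduction when averaging independent unbiased estimates; the true value, per-estimate variance, and trial count are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, var_single, trials = 1.0, 1.0, 40000   # assumed true value and per-estimate variance
for L in (1, 4, 16, 64):
    # L independent unbiased estimates per trial, each with variance var_single
    est = theta + np.sqrt(var_single) * rng.standard_normal((trials, L))
    theta_bar = est.mean(axis=1)
    print(f"L={L:3d}: var(theta_bar)={theta_bar.var():.4f}  (theory: {var_single / L:.4f})")
```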


Effects of averaging for real world data


Problem 3.4 from your P/A sets: heart rate estimation
The heart rate, h, of a patient is automatically recorded by a computer every 100 ms. In one second, the measurements {h_1, h_2, ..., h_10} are averaged to obtain ĥ. Given that E{h_i} = αh for some constant α, and var(h_i) = 1 for all i, determine whether averaging improves the estimator if α = 1 and α = 1/2.

After averaging:

    ĥ = (1/10) Σ_{i=1}^{10} h_i,   var{ĥ} = (1/10²) Σ_{i=1}^{10} var{h_i} = 1/10

and E{ĥ} = αh. If α = 1 the averaged estimator is unbiased, and its variance drops from 1 to 1/10; if α = 1/2 it will not be unbiased unless the estimator is formed as ĥ = (1/5) Σ_{i=1}^{10} h_i.

(Figure: PDFs p(ĥ) before and after averaging, centred at h for α = 1 and at h/2 for α = 1/2; averaging narrows the PDF about its mean.)
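A small check (not from the slides) of the bias and variance of the one-second average for α = 1 and α = 1/2; the true heart rate and trial count are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
h, trials = 70.0, 100000                               # assumed true heart rate (bpm)
for alpha in (1.0, 0.5):
    hi = alpha * h + rng.standard_normal((trials, 10)) # E{h_i} = alpha*h, var(h_i) = 1
    h_bar = hi.mean(axis=1)                            # average of 10 readings
    print(f"alpha={alpha}: bias={h - h_bar.mean():+.3f}, var={h_bar.var():.3f}")
```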


Minimum variance criterion


◦ An optimality criterion is necessary to define an optimal estimator
◦ Mean Square Error (MSE):

    MSE(θ̂) = E{ (θ̂ − θ)² }

measures the average squared deviation of the estimator from the true value.
◦ This criterion leads, however, to unrealisable estimators, namely ones which are not solely a function of the data:

    MSE(θ̂) = E{ [ (θ̂ − E(θ̂)) + (E(θ̂) − θ) ]² } = var(θ̂) + [ E(θ̂) − θ ]² = var(θ̂) + B²(θ̂)

◦ MSE = VARIANCE OF THE ESTIMATOR + SQUARED BIAS
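A quick empirical confirmation (not from the slides) of MSE = variance + squared bias, using a deliberately biased, gain-scaled sample mean (the estimator studied on the next slide); the gain and signal parameters are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(6)
A, sigma, N, a, trials = 1.0, 1.0, 20, 0.8, 200000   # assumed setup
x = A + sigma * rng.standard_normal((trials, N))
A_hat = a * x.mean(axis=1)                           # biased estimator, E{A_hat} = a*A

mse = np.mean((A_hat - A)**2)
var = A_hat.var()
bias = A_hat.mean() - A
print(f"MSE={mse:.5f}, var + bias^2={var + bias**2:.5f}")   # the two should agree
```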



Example: An MSE estimator with gain factor


Consider the following estimator for a DC level in WGN:

    Ǎ = a · (1/N) Σ_{n=0}^{N−1} x[n]

Task: Find the a which results in minimum MSE.
Given

    E{Ǎ} = aA   and   var(Ǎ) = a²σ²/N

we have

    MSE(Ǎ) = a²σ²/N + (a − 1)²A²

Of course, the choice of a = 1 removes the bias; but does it minimise the MSE?

Continued: MSE estimator with gain


But, can we find a analytically? Differentiating with respect to a yields

    ∂MSE(Ǎ)/∂a = 2aσ²/N + 2(a − 1)A²

and setting the result to zero gives the optimal value

    a_opt = A² / (A² + σ²/N)

but we do not know the value of A!


◦ The optimal value depends upon A, which is the unknown parameter
◦ Comment: any criterion which depends on the value of the unknown parameter to be found is likely to yield unrealisable estimators (see the sketch below)
◦ Practically, the minimum MSE estimator needs to be abandoned, and the estimator must be constrained to be unbiased
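To see the unrealisability concretely, a short sketch (not from the slides) computing a_opt for a few values of the unknown A; the noise level and record length are assumptions made for the demo.

```python
sigma, N = 1.0, 10                            # assumed noise level and record length
for A in (0.1, 1.0, 10.0):
    a_opt = A**2 / (A**2 + sigma**2 / N)      # optimal gain from the derivation above
    print(f"A={A:5.1f}: a_opt={a_opt:.4f}")
# a_opt changes with A, so the minimum-MSE gain cannot be computed from
# the data alone: the estimator is unrealisable.
```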

A counter-example: A little bias can help


(but the estimator is difficult to control)

Q: Let {y[n]}, n = 1, ..., N, be iid Gaussian variables ∼ N(0, σ²). Consider the following estimate of σ²:

    σ̂² = (ρ/N) Σ_{n=1}^{N} y²[n]

Find the ρ which minimises the MSE of σ̂².

S: It is straightforward to show that E{σ̂²} = ρσ², and

    MSE(σ̂²) = E{ (σ̂² − σ²)² } = E{σ̂⁴} + σ⁴(1 − 2ρ)
             = (ρ²/N²) Σ_{n=1}^{N} Σ_{s=1}^{N} E{ y²[n] y²[s] } + σ⁴(1 − 2ρ)
             = (ρ²σ⁴/N²) ( N² + 2N ) + σ⁴(1 − 2ρ)
             = ρ²σ⁴ (1 + 2/N) + σ⁴(1 − 2ρ)

The minimum MSE is obtained for ρ_min = N/(N+2), and has the value

    min MSE(σ̂²) = 2σ⁴/(N+2)

Given that the minimum variance of an unbiased estimator (the CRLB, covered later) is 2σ⁴/N, this is an example of a biased estimator which obtains a lower MSE than the CRLB.
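A Monte Carlo sanity check (not from the slides) that the biased scaling ρ = N/(N+2) indeed yields a lower MSE than the unbiased choice ρ = 1; σ, N, and the number of trials are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, N, trials = 1.0, 10, 400000
y = sigma * rng.standard_normal((trials, N))
s = (y**2).sum(axis=1) / N                       # (1/N) sum of y^2 per trial

for rho in (1.0, N / (N + 2)):                   # unbiased vs MSE-optimal scaling
    mse = np.mean((rho * s - sigma**2)**2)
    print(f"rho={rho:.4f}: MSE={mse:.5f}")
print(f"theory: 2*sigma^4/N = {2 * sigma**4 / N:.5f}, "
      f"2*sigma^4/(N+2) = {2 * sigma**4 / (N + 2):.5f}")
```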

Example: Estimation of hearing threshold (hearing aids)


Objective estimation of the hearing threshold:
◦ Design AM or FM stimuli with carrier frequency up to 20 kHz
◦ Modulate this carrier with e.g. a 40 Hz envelope
◦ Present this sound to the listener
◦ Measure the electroencephalogram (EEG), which should contain a 40 Hz signal if the subject was able to detect the sound
◦ This is a good objective estimate of the hearing curve


Key points

◦ An estimator is a random variable; its performance can only be completely described statistically, or by a PDF.
◦ An estimator's performance cannot be conclusively assessed by one computer simulation; we often average M trials of independent simulations, called Monte Carlo analysis.
◦ Performance and computational complexity are often traded: the optimal estimator is replaced by a suboptimal but realisable one.

(Figure) Simulation: auditory steady state response (ASSR) to a 40 Hz amplitude-modulated stimulus. Left column (Time series): 24 s of ITE EEG, sampled at 256 Hz, split into 3 segments of length 8 s. Right column (PSD analysis): the respective periodograms (rectangular window). Bottom right (Averaged PSD analysis): the averaged periodogram; observe the reduction in variance (in red).
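A minimal sketch (not from the slides) of the same idea, Bartlett-style periodogram averaging on a synthetic 40 Hz tone in white noise; all signal parameters are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(8)
fs, seg_len, n_seg = 256, 8 * 256, 3                 # assumed: 256 Hz, three 8 s segments
t = np.arange(n_seg * seg_len) / fs
x = 0.5 * np.sin(2 * np.pi * 40 * t) + rng.standard_normal(t.size)

segs = x.reshape(n_seg, seg_len)                     # split the record into segments
P = np.abs(np.fft.rfft(segs, axis=1))**2 / seg_len   # per-segment periodograms (rect. window)
P_avg = P.mean(axis=0)                               # averaged periodogram: lower variance
f = np.fft.rfftfreq(seg_len, 1 / fs)
print(f"peak at {f[P_avg.argmax()]:.2f} Hz")         # expect a peak near 40 Hz
```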

Desired: minimum variance unbiased (MVU) estimator


Minimising the variance of an unbiased estimator concentrates the PDF of the estimation error about zero → a large estimation error is therefore less likely.

Existence of the MVU estimator:
The MVU estimator is an unbiased estimator with minimum variance for all θ, that is, θ̂_3 on the graph.
(Figure: variance of three candidate unbiased estimators as a function of θ; θ̂_3 has the lowest variance for every θ.)

Methods to find the MVU estimator


◦ The MVU estimator may not always exist
◦ A single unbiased estimator may not exist, in which case a search for the MVU is fruitless!
1. Determine the Cramer-Rao lower bound (CRLB) and find some estimator which satisfies it
2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem
3. Restrict the class of estimators to be not only unbiased, but also linear (BLUE)
4. Sequential vs. block estimators
5. Adaptive estimators

Extensions to the vector parameter case


If θ = [θ_1, θ_2, ..., θ_p]ᵀ ∈ R^{p×1} is a vector of unknown parameters, an estimator is unbiased if

    E(θ̂_i) = θ_i,   a_i < θ_i < b_i,   for i = 1, 2, ..., p

and, by defining

    E(θ̂) = [ E(θ̂_1), E(θ̂_2), ..., E(θ̂_p) ]ᵀ

an unbiased estimator has the property

    E(θ̂) = θ

within the p-dimensional space of parameters.
An MVU estimator has the additional property that var(θ̂_i), for i = 1, 2, ..., p, is minimum among all unbiased estimators.

Summary and food for thoughts


◦ We are now equipped with performance metrics for assessing the goodness of any estimator (bias, variance, MSE)
◦ Since MSE = var + bias², some biased estimators may yield a low MSE. However, we prefer the minimum variance unbiased (MVU) estimators
◦ Even a simple Sample Mean estimator is a very rich example of the advantages of statistical estimators
◦ The knowledge of the parametrised PDF p(data; parameters) is very important for designing efficient estimators
◦ We have introduced statistical point estimators; would it also be useful to know the confidence we have in our point estimate?
◦ In many disciplines it is useful to design so-called set membership estimates, where the output of an estimator belongs to a pre-defined bound (range) of values
◦ We will next address linear, best linear unbiased, maximum likelihood, least squares, sequential least squares, and adaptive estimators

Homework: Check another proof for the MSE expression


Show that:

    MSE(θ̂) = var(θ̂) + bias²(θ̂)

Note (*):   var(x) = E[x²] − (E[x])²

Idea: let x = θ̂ − θ and substitute into (*) to give

    var(θ̂ − θ) = E[(θ̂ − θ)²] − (E[θ̂ − θ])²      (**)
      term (1)      term (2)        term (3)

Let us now evaluate these terms:

(1)  var(θ̂ − θ) = var(θ̂),   since θ is a deterministic constant
(2)  E[(θ̂ − θ)²] = MSE(θ̂)
(3)  (E[θ̂ − θ])² = (E[θ̂] − θ)² = bias²(θ̂)

Substitute (1), (2), (3) into (**) to give

    var(θ̂) = MSE(θ̂) − bias²(θ̂)   ⇔   MSE(θ̂) = var(θ̂) + bias²(θ̂)


Recap: Unbiased estimators


◦ Due to the linearity properties of the E{·} operator, that is, E{a + b} = E{a} + E{b}, the sample mean estimator can simply be shown to be unbiased, i.e.

    E{Â} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = (1/N) Σ_{n=0}^{N−1} A = A

◦ In some applications, the value of A may be constrained to be positive: a component value such as an inductor, capacitor or resistor would be positive (prior knowledge)
◦ For N data points in random noise, unbiased estimators generally have symmetric PDFs centred about their true value, i.e.

    Â ∼ N(A, σ²/N)

