
Advanced Signal Processing

Introduction to Estimation Theory


Danilo Mandic,
room 813, ext: 46271

Department of Electrical and Electronic Engineering


Imperial College London, UK
d.mandic@imperial.ac.uk, © D. P. Mandic

URL: www.commsp.ee.ic.ac.uk/mandic
Advanced Signal Processing, Spring 2015

Aims of this lecture


◦ To introduce the notions of: Estimator, Estimate, Estimandum
◦ To discuss the bias and variance in statistical estimation theory, and asymptotically unbiased and consistent estimators
◦ Performance metric: the Mean Square Error (MSE)
◦ The bias-variance dilemma and the MSE in this context
◦ To derive a feasible MSE estimator
◦ A class of Minimum Variance Unbiased (MVU) estimators
◦ Extension to the vector parameter case
◦ Point estimators, confidence intervals, statistical goodness of an estimator, the role of noise


Role of estimation in signal processing


(try also the function specgram in Matlab)

An enabling technology in many electronic signal processing systems


1. Radar
2. Sonar
3. Speech
4. Image analysis
5. Biomedicine
6. Communications
7. Control
8. Seismics
9. Almost everywhere ...

◦ Radar and sonar: range and azimuth
◦ Image analysis: motion estimation, segmentation
◦ Speech: features used in recognition and speaker verification
◦ Seismics: oil reservoirs
◦ Communications: equalization, symbol detection
◦ Biomedicine: various applications

Statistical estimation problem


for simplicity, consider a DC level in WGN: x[n] = A + w[n], w[n] ∼ N(0, σ²)

Problem statement: We seek to determine a set of parameters θ = [θ_1, ..., θ_p]ᵀ from a set of data points x = [x[0], ..., x[N−1]]ᵀ, such that the values of these parameters would yield the highest probability of obtaining the observed data. In other words,

    max_θ p(x; θ)        (reads: "p(x) parametrised by θ")

◦ The unknown parameters may be seen as deterministic or as random variables
◦ There are essentially two alternatives to the statistical case:
    - No a priori distribution assumed: Maximum Likelihood
    - A priori distribution known: Bayesian estimation
◦ Key problem → to estimate a group of parameters from a discrete-time signal or dataset.
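As an illustration of max_θ p(x; θ), here is a minimal sketch (not from the slides) that evaluates the Gaussian log-likelihood of the DC-level-in-WGN model over a grid of candidate values of A; the true values, noise level, and search range are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
N, A_true, sigma = 50, 1.0, 1.0                  # assumed setup
x = A_true + sigma * rng.standard_normal(N)      # x[n] = A + w[n], w ~ N(0, sigma^2)

def log_likelihood(A, x, sigma):
    """Log of p(x; A) for the DC-level-in-WGN model."""
    n = len(x)
    return -n / 2 * np.log(2 * np.pi * sigma**2) - np.sum((x - A)**2) / (2 * sigma**2)

A_grid = np.linspace(-3.0, 5.0, 1601)            # assumed search range for the unknown A
ll = np.array([log_likelihood(A, x, sigma) for A in A_grid])
A_hat = A_grid[ll.argmax()]                      # the A that maximises p(x; A)
print(f"A_hat = {A_hat:.3f}, sample mean = {x.mean():.3f}")
```

For this model, the grid maximiser agrees (up to grid resolution) with the sample mean, anticipating the sample mean results later in this lecture.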

Estimation of a scalar random variable


Given an N-point dataset x[0], x[1], ..., x[N−1] which depends on an unknown (scalar) parameter θ, define an estimator as some function, g, of the dataset, that is

    θ̂ = g(x[0], x[1], ..., x[N−1])

which may be used to estimate θ (single parameter case).
(in our DC level estimation problem, θ = A)
◦ This defines the problem of parameter estimation
◦ We also need to determine the function g(·)
◦ Theory and techniques of statistical estimation are available
◦ Estimation based on PDFs which contain unknown but deterministic parameters is termed classical estimation
◦ In Bayesian estimation, the unknown parameters are assumed to be random variables, which may be prescribed a priori to lie within some range of allowable parameters (or desired performance)

The statistical estimation problem


◦ First step: to model, mathematically, the data
◦ We employ a Probability Density Function (PDF) to describe the inherently random measurement process, that is

    p(x[0], x[1], ..., x[N−1]; θ)

which is parametrised by the unknown parameter θ
◦ Example: for N = 1, and with θ denoting the mean value, a generic form of PDF for the class of Gaussian PDFs with any value of θ is given by

    p(x[0]; θ) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (x[0] − θ)² }

Clearly, the observed value of x[0] impacts upon the likely value of θ.
(Figure: the Gaussian PDF p(x[0]; θ), centred at θ, plotted against x[0].)


Estimator vs. Estimate


◦ The parameter θ to be estimated is then viewed as a realisation of a random variable
◦ Data are described by the joint PDF of the data and the parameters:

    p(x, θ) = p(x | θ) · p(θ)
              (conditional PDF) (prior PDF)

◦ An estimator is a rule that assigns a value of θ̂ to each realisation of x = [x[0], ..., x[N−1]]ᵀ
◦ An estimate θ̂ of θ (the estimandum) is the value obtained for a given realisation of x = [x[0], ..., x[N−1]]ᵀ, in the form θ̂ = g(x)
◦ Example: for a noisy straight line, x[n] = A + Bn + w[n]:

    p(x; θ) = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² }

◦ Performance is critically dependent upon this PDF assumption: the estimator should be robust to slight mismatch between the measurement and the PDF assumption

Example: finding the parameters of straight line


Specification of the PDF is critical in determining a good estimator.
In practice, we choose a PDF which fits the problem constraints and any a priori information; but it must also be mathematically tractable.

Example: Assume that on average the data are increasing.
Data: a straight line embedded in random noise, w[n] ∼ N(0, σ²):

    x[n] = A + Bn + w[n],   n = 0, 1, ..., N−1

    p(x; A, B) = (1/(2πσ²)^{N/2}) exp{ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² }

Unknown parameters: A, B → θ = [A B]ᵀ

(Figure: the noisy line x[n] and the ideal noiseless line with intercept A.)

Careful: what would be the effects of bias in A and B?
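To make the example concrete, here is a minimal sketch (not from the slides) of estimating θ = [A B]ᵀ by least squares, which for this Gaussian model is equivalent to maximising p(x; A, B); the true parameter values and noise level are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
N, A_true, B_true, sigma = 100, 2.0, 0.5, 1.0   # assumed values
n = np.arange(N)
x = A_true + B_true * n + sigma * rng.standard_normal(N)

# Maximising p(x; A, B) over (A, B) is equivalent to minimising
# sum_n (x[n] - A - B n)^2, i.e. ordinary least squares.
H = np.column_stack([np.ones(N), n])            # observation matrix
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)
print(f"A_hat = {theta_hat[0]:.3f}, B_hat = {theta_hat[1]:.3f}")
```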

Bias in parameter estimation


Estimation theory (scalar case): estimate the value of an unknown parameter, θ, from a set of observations of a random variable described by that parameter:

    θ̂ = g( x[0], x[1], ..., x[N−1] )


Example: given a set of observations from Gaussian distribution, estimate
the mean or variance from these observations.

◦ Recall that in linear mean square estimation, when estimating the value of a random variable y from an observation of a related random variable x, the coefficients a and b in the estimate ŷ = ax + b depend upon the mean and variance of x and y, as well as on their correlation.
◦ The difference between the expected value of the estimate and the actual value is called the bias, and will be denoted by B:

    B = θ − E{θ̂_N}

where θ̂_N denotes estimation over N data samples, x[0], ..., x[N−1]

Asymptotic unbiasedness
If the bias is zero, then the expected value of the estimate is equal to the true value, that is

    E{θ̂_N} = θ   ⇔   B = θ − E{θ̂_N} = 0

and the estimate is said to be unbiased.
If B ≠ 0, then the estimator θ̂ = g(x) is said to be biased.

Example: Consider the following sample mean estimator of the signal x[n] = A + w[n], w ∼ N(0, 1), given by

    Â = x̄ = (1/(N+2)) Σ_{n=0}^{N−1} x[n],   that is, θ̂ = Â

Is the above sample mean estimator of the true mean A biased?

More often: an estimator is biased, but the bias B → 0 as N → ∞:

    lim_{N→∞} E{θ̂_N} = θ

Such an estimator is said to be asymptotically unbiased.
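A quick numerical check (not from the slides): for the estimator above, E{Â} = NA/(N+2), so the bias equals 2A/(N+2) and vanishes as N grows. The true mean and the number of Monte Carlo runs below are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
A, trials = 1.0, 20000                            # assumed true mean and number of runs
for N in (10, 100, 1000):
    x = A + rng.standard_normal((trials, N))      # w ~ N(0, 1)
    A_hat = x.sum(axis=1) / (N + 2)               # the biased sample mean estimator
    print(f"N={N:4d}: empirical bias={A - A_hat.mean():+.4f}, "
          f"theory 2A/(N+2)={2 * A / (N + 2):+.4f}")
```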



How about the variance?


◦ It is desirable that an estimator be either unbiased or asymptotically unbiased (think about the power of the estimation error due to a DC offset)
◦ For an estimate to be meaningful, it is necessary that we use the available statistics effectively, that is,

    var{θ̂} → 0   as   N → ∞

or, in other words,

    lim_{N→∞} var{θ̂_N} = lim_{N→∞} E{ |θ̂_N − E{θ̂_N}|² } = 0

◦ If θ̂_N is unbiased, then E{θ̂_N} = θ, and from the Tchebycheff inequality, for every ε > 0,

    Pr{ |θ̂_N − θ| ≥ ε } ≤ var{θ̂_N} / ε²

◦ If var{θ̂_N} → 0 as N → ∞, then the probability that θ̂_N differs by more than ε from the true value will go to zero (showing consistency). In this case, θ̂_N is said to converge to θ with probability one.

Mean square convergence


Another form of convergence, stronger than convergence with probability one, is mean square convergence. An estimate θ̂_N is said to converge to θ in the mean-square sense if

    lim_{N→∞} E{ |θ̂_N − θ|² } = 0
                (mean square error)

◦ For an unbiased estimator this is equivalent to the previous condition that the variance of the estimate goes to zero
◦ An estimate is said to be consistent if it converges, in some sense, to the true value of the parameter
◦ We say that the estimator is consistent if it is asymptotically unbiased and has a variance that goes to zero as N → ∞


Example: Assessing the performance of the Sample Mean as an estimator
Consider the estimation of a DC level, A, in random noise, which could be modelled as

    x[n] = A + w[n]

where w[n] is some zero-mean random process.
Aim: to estimate A given {x[0], x[1], ..., x[N−1]}.
Intuitively, the sample mean is a reasonable estimator:

    Â = (1/N) Σ_{n=0}^{N−1} x[n]

Q1: How close will Â be to A?
Q2: Are there better estimators than the sample mean?

Mean and variance of the Sample Mean estimator


x[n] = A + w[n],   w[n] ∼ N(0, σ²)

An estimator is a function of the random data, and is thus a random variable itself → its performance must be judged statistically.

(1) What is the mean of Â?

    E{Â} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{x[n]} = A   → unbiased

(2) What is the variance of Â?
Assumption: the samples of w[n] are uncorrelated.

    var{Â} = var{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N²) Σ_{n=0}^{N−1} var{x[n]} = (1/N²) · σ²N = σ²/N

Notice the variance → 0 as N → ∞ → consistent (see your P&A sets)
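A short Monte Carlo check (not from the slides) of both results; the values of A, σ, N, and the number of trials are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 2.0, 64, 50000     # assumed setup
x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                        # sample mean for each independent trial

print(f"mean of A_hat: {A_hat.mean():.4f}  (theory: A = {A})")
print(f"var  of A_hat: {A_hat.var():.4f}  (theory: sigma^2/N = {sigma**2 / N:.4f})")
```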

Minimum Variance Unbiased (MVU) estimation


Aim: to establish good estimators of unknown deterministic parameters.
◦ Unbiased estimator → on the average yields the true value of the unknown parameter, independent of its particular value, i.e.

    E(θ̂) = θ,   a < θ < b

where (a, b) denotes the range of possible values of θ.
◦ Example: Unbiased estimator for a DC level in White Gaussian Noise (WGN). If we are given

    x[n] = A + w[n],   n = 0, 1, ..., N−1

where A is the unknown, but deterministic, parameter to be estimated, which lies within the interval (−∞, ∞), then the sample mean can be used as an estimator of A, namely

    Â = (1/N) Σ_{n=0}^{N−1} x[n]

Careful: the estimator is parameter dependent!


◦ An estimator may be unbiased for certain values of the unknown parameter but not all; such an estimator is not unbiased.
◦ Consider another sample mean estimator:

    Ǎ = (1/(2N)) Σ_{n=0}^{N−1} x[n]

Therefore:

    E{Ǎ} = A/2,   so E{Ǎ} = 0 when A = 0, but E{Ǎ} = A/2 ≠ A when A ≠ 0   (parameter dependent)

◦ Hence Ǎ is not an unbiased estimator.


◦ A biased estimator introduces a systematic error which should not generally be present
◦ Our goal is to avoid bias if we can, as we are interested in stochastic signal properties and bias is largely deterministic

Remedy
(also look in the Assignment dealing with PSD in your CW )

◦ Several unbiased estimates of the same quantity may be averaged together, i.e. given the L independent estimates

    { θ̂_1, θ̂_2, ..., θ̂_L }

we may choose to average them, to yield

    θ̄ = (1/L) Σ_{l=1}^{L} θ̂_l

◦ Our assumption was that the individual estimators are unbiased, with equal variance, and uncorrelated with one another.
◦ Then (NB: averaging biased estimators will not remove the bias)

    E{θ̄} = θ   and   var{θ̄} = (1/L²) Σ_{l=1}^{L} var{θ̂_l} = (1/L) var{θ̂_l}

◦ Note: as L → ∞, var{θ̄} → 0 (consistent)
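A minimal sketch (not from the slides) verifying the 1/L variance reduction when averaging independent unbiased estimates; the true value, per-estimate variance, and trial count are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, var_single, trials = 1.0, 1.0, 40000   # assumed true value and per-estimate variance
for L in (1, 4, 16, 64):
    # L independent unbiased estimates per trial, each with variance var_single
    est = theta + np.sqrt(var_single) * rng.standard_normal((trials, L))
    theta_bar = est.mean(axis=1)
    print(f"L={L:3d}: var(theta_bar)={theta_bar.var():.4f}  (theory: {var_single / L:.4f})")
```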


Effects of averaging for real world data


Problem 3.4 from your P/A sets: heart rate estimation
The heart rate, h, of a patient is automatically recorded by a computer every 100 ms. In one second, the measurements {h_1, h_2, ..., h_10} are averaged to obtain ĥ. Given that E{h_i} = αh for some constant α, and var(h_i) = 1 for all i, determine whether averaging improves the estimator if α = 1 and α = 1/2.

After averaging:

    ĥ = (1/10) Σ_{i=1}^{10} h_i,   var{ĥ} = (1/10²) Σ_{i=1}^{10} var{h_i} = 1/10

and E{ĥ} = αh. If α = 1 the averaged estimator is unbiased, and its variance drops from 1 to 1/10; if α = 1/2 it will not be unbiased unless the estimator is formed as ĥ = (1/5) Σ_{i=1}^{10} h_i.

(Figure: PDFs p(ĥ) before and after averaging, centred at h for α = 1 and at h/2 for α = 1/2; averaging narrows the PDF about its mean.)
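A small check (not from the slides) of the bias and variance of the one-second average for α = 1 and α = 1/2; the true heart rate and trial count are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
h, trials = 70.0, 100000                               # assumed true heart rate (bpm)
for alpha in (1.0, 0.5):
    hi = alpha * h + rng.standard_normal((trials, 10)) # E{h_i} = alpha*h, var(h_i) = 1
    h_bar = hi.mean(axis=1)                            # average of 10 readings
    print(f"alpha={alpha}: bias={h - h_bar.mean():+.3f}, var={h_bar.var():.3f}")
```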


Minimum variance criterion


◦ An optimality criterion is necessary to define an optimal estimator
◦ Mean Square Error (MSE):

    MSE(θ̂) = E{ (θ̂ − θ)² }

measures the average squared deviation of the estimator from the true value.
◦ This criterion leads, however, to unrealisable estimators, namely ones which are not solely a function of the data:

    MSE(θ̂) = E{ [ (θ̂ − E(θ̂)) + (E(θ̂) − θ) ]² } = var(θ̂) + [ E(θ̂) − θ ]² = var(θ̂) + B²(θ̂)

◦ MSE = VARIANCE OF THE ESTIMATOR + SQUARED BIAS
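A quick empirical confirmation (not from the slides) of MSE = variance + squared bias, using a deliberately biased, gain-scaled sample mean (the estimator studied on the next slide); the gain and signal parameters are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(6)
A, sigma, N, a, trials = 1.0, 1.0, 20, 0.8, 200000   # assumed setup
x = A + sigma * rng.standard_normal((trials, N))
A_hat = a * x.mean(axis=1)                           # biased estimator, E{A_hat} = a*A

mse = np.mean((A_hat - A)**2)
var = A_hat.var()
bias = A_hat.mean() - A
print(f"MSE={mse:.5f}, var + bias^2={var + bias**2:.5f}")   # the two should agree
```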



Example: An MSE estimator with gain factor


Consider the following estimator for a DC level in WGN:

    Ǎ = a · (1/N) Σ_{n=0}^{N−1} x[n]

Task: Find the a which results in minimum MSE.
Given

    E{Ǎ} = aA   and   var(Ǎ) = a²σ²/N

we have

    MSE(Ǎ) = a²σ²/N + (a − 1)²A²

Of course, the choice of a = 1 removes the bias; but does it minimise the MSE?

Continued: MSE estimator with gain


But, can we find a analytically? Differentiating with respect to a yields

    ∂MSE(Ǎ)/∂a = 2aσ²/N + 2(a − 1)A²

and setting the result to zero gives the optimal value

    a_opt = A² / (A² + σ²/N)

but we do not know the value of A!


◦ The optimal value depends upon A, which is the unknown parameter
◦ Comment: any criterion which depends on the value of the unknown parameter to be found is likely to yield unrealisable estimators (see the sketch below)
◦ Practically, the minimum MSE estimator needs to be abandoned, and the estimator must be constrained to be unbiased
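To see the unrealisability concretely, a short sketch (not from the slides) computing a_opt for a few values of the unknown A; the noise level and record length are assumptions made for the demo.

```python
sigma, N = 1.0, 10                            # assumed noise level and record length
for A in (0.1, 1.0, 10.0):
    a_opt = A**2 / (A**2 + sigma**2 / N)      # optimal gain from the derivation above
    print(f"A={A:5.1f}: a_opt={a_opt:.4f}")
# a_opt changes with A, so the minimum-MSE gain cannot be computed from
# the data alone: the estimator is unrealisable.
```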

A counter-example: A little bias can help


(but the estimator is difficult to control)

Q: Let {y[n]}, n = 1, ..., N, be iid Gaussian variables ∼ N(0, σ²). Consider the following estimate of σ²:

    σ̂² = (ρ/N) Σ_{n=1}^{N} y²[n]

Find the ρ which minimises the MSE of σ̂².

S: It is straightforward to show that E{σ̂²} = ρσ², and

    MSE(σ̂²) = E{ (σ̂² − σ²)² } = E{σ̂⁴} + σ⁴(1 − 2ρ)
             = (ρ²/N²) Σ_{n=1}^{N} Σ_{s=1}^{N} E{ y²[n] y²[s] } + σ⁴(1 − 2ρ)
             = (ρ²σ⁴/N²) ( N² + 2N ) + σ⁴(1 − 2ρ)
             = ρ²σ⁴ (1 + 2/N) + σ⁴(1 − 2ρ)

The minimum MSE is obtained for ρ_min = N/(N+2), and has the value

    min MSE(σ̂²) = 2σ⁴/(N+2)

Given that the minimum variance of an unbiased estimator (the CRLB, covered later) is 2σ⁴/N, this is an example of a biased estimator which obtains a lower MSE than the CRLB.
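A Monte Carlo sanity check (not from the slides) that the biased scaling ρ = N/(N+2) indeed yields a lower MSE than the unbiased choice ρ = 1; σ, N, and the number of trials are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, N, trials = 1.0, 10, 400000
y = sigma * rng.standard_normal((trials, N))
s = (y**2).sum(axis=1) / N                       # (1/N) sum of y^2 per trial

for rho in (1.0, N / (N + 2)):                   # unbiased vs MSE-optimal scaling
    mse = np.mean((rho * s - sigma**2)**2)
    print(f"rho={rho:.4f}: MSE={mse:.5f}")
print(f"theory: 2*sigma^4/N = {2 * sigma**4 / N:.5f}, "
      f"2*sigma^4/(N+2) = {2 * sigma**4 / (N + 2):.5f}")
```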

Example: Estimation of hearing threshold (hearing aids)


Objective estimation of the hearing threshold:
◦ Design AM or FM stimuli with carrier frequency up to 20 kHz
◦ Modulate this carrier with e.g. a 40 Hz envelope
◦ Present this sound to the listener
◦ Measure the electroencephalogram (EEG), which should contain a 40 Hz signal if the subject was able to detect the sound
◦ This is a good objective estimate of the hearing curve


Key points

◦ An estimator is a random variable; its performance can only be completely described statistically, or by a PDF.
◦ An estimator's performance cannot be conclusively assessed by one computer simulation; we often average M trials of independent simulations, called Monte Carlo analysis.
◦ Performance and computational complexity are often traded: the optimal estimator is replaced by a suboptimal but realisable one.

(Figure) Simulation: auditory steady state response (ASSR) to a 40 Hz amplitude-modulated stimulus. Left column (Time series): 24 s of ITE EEG, sampled at 256 Hz, split into 3 segments of length 8 s. Right column (PSD analysis): the respective periodograms (rectangular window). Bottom right (Averaged PSD analysis): the averaged periodogram; observe the reduction in variance (in red).
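A minimal sketch (not from the slides) of the same idea, Bartlett-style periodogram averaging on a synthetic 40 Hz tone in white noise; all signal parameters are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(8)
fs, seg_len, n_seg = 256, 8 * 256, 3                 # assumed: 256 Hz, three 8 s segments
t = np.arange(n_seg * seg_len) / fs
x = 0.5 * np.sin(2 * np.pi * 40 * t) + rng.standard_normal(t.size)

segs = x.reshape(n_seg, seg_len)                     # split the record into segments
P = np.abs(np.fft.rfft(segs, axis=1))**2 / seg_len   # per-segment periodograms (rect. window)
P_avg = P.mean(axis=0)                               # averaged periodogram: lower variance
f = np.fft.rfftfreq(seg_len, 1 / fs)
print(f"peak at {f[P_avg.argmax()]:.2f} Hz")         # expect a peak near 40 Hz
```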

Desired: minimum variance unbiased (MVU) estimator


Minimising the variance of an unbiased estimator concentrates the PDF of the estimation error about zero → a large estimation error is therefore less likely.

Existence of the MVU estimator:
The MVU estimator is an unbiased estimator with minimum variance for all θ, that is, θ̂_3 on the graph.
(Figure: variance of three candidate unbiased estimators as a function of θ; θ̂_3 has the lowest variance for every θ.)

Methods to find the MVU estimator


◦ The MVU estimator may not always exist
◦ A single unbiased estimator may not exist, in which case a search for the MVU is fruitless!
1. Determine the Cramer-Rao lower bound (CRLB) and find some estimator which satisfies it
2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem
3. Restrict the class of estimators to be not only unbiased, but also linear (BLUE)
4. Sequential vs. block estimators
5. Adaptive estimators

Extensions to the vector parameter case


If θ = [θ_1, θ_2, ..., θ_p]ᵀ ∈ R^{p×1} is a vector of unknown parameters, an estimator is unbiased if

    E(θ̂_i) = θ_i,   a_i < θ_i < b_i,   for i = 1, 2, ..., p

and, by defining

    E(θ̂) = [ E(θ̂_1), E(θ̂_2), ..., E(θ̂_p) ]ᵀ

an unbiased estimator has the property

    E(θ̂) = θ

within the p-dimensional space of parameters.
An MVU estimator has the additional property that var(θ̂_i), for i = 1, 2, ..., p, is minimum among all unbiased estimators.

Summary and food for thoughts


◦ We are now equipped with performance metrics for assessing the goodness of any estimator (bias, variance, MSE)
◦ Since MSE = var + bias², some biased estimators may yield a low MSE. However, we prefer the minimum variance unbiased (MVU) estimators
◦ Even a simple Sample Mean estimator is a very rich example of the advantages of statistical estimators
◦ The knowledge of the parametrised PDF p(data; parameters) is very important for designing efficient estimators
◦ We have introduced statistical point estimators; would it also be useful to know the confidence we have in our point estimate?
◦ In many disciplines it is useful to design so-called set membership estimates, where the output of an estimator belongs to a pre-defined bound (range) of values
◦ We will next address linear, best linear unbiased, maximum likelihood, least squares, sequential least squares, and adaptive estimators

Homework: Check another proof for the MSE expression


Show that:

    MSE(θ̂) = var(θ̂) + bias²(θ̂)

Note (*):   var(x) = E[x²] − (E[x])²

Idea: let x = θ̂ − θ and substitute into (*) to give

    var(θ̂ − θ) = E[(θ̂ − θ)²] − (E[θ̂ − θ])²      (**)
      term (1)      term (2)        term (3)

Let us now evaluate these terms:

(1)  var(θ̂ − θ) = var(θ̂),   since θ is a deterministic constant
(2)  E[(θ̂ − θ)²] = MSE(θ̂)
(3)  (E[θ̂ − θ])² = (E[θ̂] − θ)² = bias²(θ̂)

Substitute (1), (2), (3) into (**) to give

    var(θ̂) = MSE(θ̂) − bias²(θ̂)   ⇔   MSE(θ̂) = var(θ̂) + bias²(θ̂)


Recap: Unbiased estimators


◦ Due to the linearity properties of the E{·} operator, that is, E{a + b} = E{a} + E{b}, the sample mean estimator can simply be shown to be unbiased, i.e.

    E{Â} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = (1/N) Σ_{n=0}^{N−1} A = A

◦ In some applications, the value of A may be constrained to be positive: a component value such as an inductor, capacitor or resistor would be positive (prior knowledge)
◦ For N data points in random noise, unbiased estimators generally have symmetric PDFs centred about their true value, i.e.

    Â ∼ N(A, σ²/N)

