Grading: 17%, 30%, 40%, 10%; attendance/participation/enthusiasm in recitations/tutorials: 3%.
LECTURE 1

Readings: Sections 1.1, 1.2

Lecture outline
- Probabilistic models
  - sample space
  - probability law
- Axioms of probability
- Simple examples

Sample space Ω: the list of possible outcomes. The elements must be
mutually exclusive and collectively exhaustive.

Discrete example: two rolls of a tetrahedral die.
X = First roll, Y = Second roll; 16 equally likely outcomes
(1,1), (1,2), (1,3), (1,4), ..., (4,4).

Continuous example: Ω = {(x, y) | 0 ≤ x, y ≤ 1}.

Probability axioms:
1. Nonnegativity: P(A) ≥ 0
2. Normalization: P(Ω) = 1
3. Additivity: if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)

Consequence, for finitely many sample points:
  P({s1, s2, ..., sk}) = P(s1) + ... + P(sk)

Discrete example (continued):
- P({X = 1}) = ?
- P(X + Y is odd) = ?
- P(min(X, Y) = 2) = ?

Discrete uniform law:
  P(A) = (number of elements of A) / (total number of sample points)

Continuous example (continued):
- P(X + Y ≤ 1/2) = ?
- P((X, Y) = (0.5, 0.3)) = ?

Countably infinite sample space {1, 2, 3, ...} with P(n) = 1/2^n,
so that 1/2 + 1/4 + 1/8 + 1/16 + ... = 1.  Remember!
  P(outcome is even) = 1/2² + 1/2⁴ + 1/2⁶ + ... = 1/3
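A quick numerical sketch of the discrete example, applying the discrete uniform law by exhaustive enumeration:

# Two rolls of a fair tetrahedral die: enumerate the 16 equally likely outcomes
# and apply the discrete uniform law P(A) = |A| / |Omega|.
from fractions import Fraction

outcomes = [(x, y) for x in range(1, 5) for y in range(1, 5)]
total = len(outcomes)

p_x_is_1   = Fraction(sum(1 for x, y in outcomes if x == 1), total)
p_sum_odd  = Fraction(sum(1 for x, y in outcomes if (x + y) % 2 == 1), total)
p_min_is_2 = Fraction(sum(1 for x, y in outcomes if min(x, y) == 2), total)

print(p_x_is_1, p_sum_odd, p_min_is_2)   # 1/4, 1/2, 5/16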
LECTURE 2

Lecture outline
- Review of probability models
  - Sample space: mutually exclusive, collectively exhaustive, right granularity
- Conditional probability
- Multiplication rule
- Total probability theorem
- Bayes' rule

Review of the axioms:
1. Nonnegativity: P(A) ≥ 0
2. Normalization: P(Ω) = 1
3. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)

Conditional probability
P(A | B) = probability of A, given that B occurred:
  P(A | B) = P(A ∩ B) / P(B)   (defined only when P(B) > 0)

Die-roll example (X = first roll, Y = second roll of a tetrahedral die):
let B = {min(X, Y) = 2} and M = max(X, Y).
- P(M = 1 | B) = ?
- P(M = 2 | B) = ?

Multiplication rule
  P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)
(visualized on a tree with nodes A ∩ B ∩ C, A ∩ B^c ∩ C, A ∩ B^c ∩ C^c, ...,
and branch probabilities P(B | A), P(B^c | A), P(C | A ∩ B), ...)

Example (radar detection):
Event A: the event of interest (e.g., a plane is flying above);
Event B: the detector registers something.
  P(A) = 0.05       P(A^c) = 0.95
  P(B | A) = 0.99   P(B^c | A) = 0.01
  P(B | A^c) = 0.10 P(B^c | A^c) = 0.90

  P(A ∩ B) = ?
  P(B) = ?
  P(A | B) = ?

Bayes' rule
Prior probabilities P(Ai): initial beliefs about the disjoint events
A1, A2, A3 covering the sample space; B is the observed event.

Total probability:
  P(B) = P(A1)P(B | A1) + P(A2)P(B | A2) + P(A3)P(B | A3)

Bayes' rule:
  P(Ai | B) = P(Ai ∩ B) / P(B)
            = P(Ai)P(B | Ai) / P(B)
            = P(Ai)P(B | Ai) / Σ_j P(Aj)P(B | Aj)
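A minimal numerical sketch of the detection example above, using the stated probabilities:

# Compute P(A and B), P(B) by total probability, and the posterior P(A | B).
p_A = 0.05
p_B_given_A = 0.99
p_B_given_Ac = 0.10

p_A_and_B = p_A * p_B_given_A                # 0.0495
p_B = p_A_and_B + (1 - p_A) * p_B_given_Ac   # 0.1445
p_A_given_B = p_A_and_B / p_B                # ~0.343
print(p_A_and_B, p_B, p_A_given_B)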
LECTURE 3

Lecture outline
- Review
- Independence of two events
- Independence of a collection of events

Review:
  P(A | B) = P(A ∩ B) / P(B)
  Multiplication rule: P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)
  Bayes' rule: P(Ai | B) = P(Ai)P(B | Ai) / P(B)

Model based on conditional probabilities: three tosses of a biased coin,
P(H) = p.  Tree with leaves HHH, HHT, HTH, HTT, THH, THT, TTH, TTT and
branch probabilities p and 1 − p.
  P(THT) = ?
  P(1 head) = ?

Independence of two events
Intuitive "definition": P(B | A) = P(B), i.e., the occurrence of A
provides no information about B's occurrence.
Definition:  P(A ∩ B) = P(A) · P(B)

Example: A: first toss is H; B: second toss is H (two independent fair
coin tosses, outcomes HH, HT, TH, TT).  P(A) = P(B) = 1/2.
Conditioning may affect independence (e.g., Coin A with P(H) = 0.9 and
Coin B with P(H) = 0.1, each chosen with probability 0.5).

Independence of a collection of events
Intuitive "definition": information on some of the events tells us
nothing about probabilities related to the remaining events.
Mathematical definition: events A1, A2, ..., An are called independent if
  P(Ai ∩ Aj ∩ ... ∩ Aq) = P(Ai)P(Aj)...P(Aq)
for any distinct indices i, j, ..., q chosen from {1, ..., n}.
LECTURE 4

Lecture outline
- Principles of counting
- Many examples
  - permutations
  - k-permutations
  - combinations
  - partitions
- Binomial probabilities

Discrete uniform law -- just count:
  P(A) = (number of elements of A) / (total number of sample points) = |A| / |Ω|

Basic counting principle:
  r stages, with ni choices at stage i.
  Answer: n1 · n2 ... nr

Permutations and combinations:
  number of permutations of n objects: n!
  number of k-permutations: n! / (n − k)!
  number of combinations: (n choose k) = n! / (k!(n − k)!)

Binomial probabilities (n independent coin tosses, P(H) = p):
  P(HTTHHH) = p^4 (1 − p)²   (any particular sequence: p^(# heads) (1 − p)^(# tails))
  P(k heads) = Σ over k-head sequences of P(seq.) = (n choose k) p^k (1 − p)^(n−k)
  and Σ_{k=0}^{n} (n choose k) p^k (1 − p)^(n−k) = 1

Partitions
Example: 52 cards dealt to 4 players, 13 cards each; all partitions
equally likely.  Event B: each player gets an ace.
  Total number of outcomes:  52! / (13! 13! 13! 13!)
  Number of outcomes in B:   (4 · 3 · 2 · 1) · 48! / (12! 12! 12! 12!)
  Answer:
  P(B) = (4 · 3 · 2 · 1) · [48! / (12! 12! 12! 12!)] / [52! / (13! 13! 13! 13!)]
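A quick check of the partition answer, computed directly from the counting formula:

# Probability that each of the 4 players receives exactly one ace.
from math import factorial

total_partitions = factorial(52) // factorial(13) ** 4
favorable = factorial(4) * factorial(48) // factorial(12) ** 4
print(favorable / total_partitions)   # ~0.1055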
LECTURE 5

Lecture outline
- Random variables
- Probability mass function (PMF)
- Expectation
- Variance

Random variables
Mathematically: a function from the sample space to the real numbers.
Notation: random variable X; numerical value x.

Probability mass function (PMF):
  pX(x) = P(X = x) = P({ω ∈ Ω s.t. X(ω) = x})
  pX(x) ≥ 0,   Σ_x pX(x) = 1
Example: two independent rolls of a tetrahedral die, F = First roll,
S = Second roll; for a random variable X defined in terms of F and S,
tabulate pX(x), e.g., pX(2) = ?

Geometric PMF
X = number of independent coin tosses (P(H) = p) until the first head:
  pX(k) = P(X = k) = P(TT...TH) = (1 − p)^(k−1) p,   k = 1, 2, ...

Binomial PMF
X = number of heads in n independent tosses, P(H) = p.
Let n = 4:
  pX(2) = P(HHTT) + P(HTHT) + P(HTTH) + P(THHT) + P(THTH) + P(TTHH)
        = 6 p² (1 − p)²
        = (4 choose 2) p² (1 − p)²
In general:
  pX(k) = (n choose k) p^k (1 − p)^(n−k),   k = 0, 1, ..., n

Expectation
Definition:  E[X] = Σ_x x pX(x)
Interpretations:
- Center of gravity of the PMF
- Average in a large number of repetitions of the experiment
  (to be substantiated later in this course)

Example: X uniform on 0, 1, ..., n:  pX(x) = 1/(n + 1) for x = 0, 1, ..., n
  E[X] = 0 · 1/(n+1) + 1 · 1/(n+1) + ... + n · 1/(n+1) = n/2

Expected value of a function Y = g(X):
  "Hard" way: find pY, then E[Y] = Σ_y y pY(y)
  Expected value rule:  E[g(X)] = Σ_x g(x) pX(x)
  Caution: in general, E[g(X)] ≠ g(E[X])

Variance
  var(X) = E[(X − E[X])²] = Σ_x (x − E[X])² pX(x) = E[X²] − (E[X])²

Properties of expectation and variance (α, β constants):
  E[α] = α,   E[αX] = α E[X],   E[αX + β] = α E[X] + β
  var(X) ≥ 0,   var(αX + β) = α² var(X)
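A short sketch computing a mean and variance directly from a PMF, using the binomial(4, p) example above:

# E[X] = sum x p(x) and var(X) = E[X^2] - (E[X])^2; compare with np and np(1-p).
from math import comb

n, p = 4, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mean = sum(k * pk for k, pk in pmf.items())
var = sum(k**2 * pk for k, pk in pmf.items()) - mean**2
print(mean, n * p)            # 1.2  1.2
print(var, n * p * (1 - p))   # 0.84 0.84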
LECTURE 6

Lecture outline
- Review: PMF, expectation, variance
- Conditional PMF and expectation
- Geometric PMF and memorylessness
- Joint PMFs

Review
Random variable X: function from the sample space to the real numbers.
  E[X] = Σ_x x pX(x)
  E[g(X)] = Σ_x g(x) pX(x)
  E[αX + β] = α E[X] + β
  E[X − E[X]] = 0
  var(X) = E[(X − E[X])²] = Σ_x (x − E[X])² pX(x) = E[X²] − (E[X])²
  Standard deviation: σX = sqrt(var(X))

Random speed example
Traverse a 200-mile distance at a constant but random speed V:
  pV(v) = 1/2 at v = 1,   pV(v) = 1/2 at v = 200
  E[V] = ?
  Time T = t(V) = 200/V:   E[t(V)] = Σ_v t(v) pV(v) = ?
  E[T · V] = 200 ≠ E[T] · E[V]
  var(V) = ?,   σV = ?

Conditional PMF and expectation (given an event A):
  pX|A(x) = P(X = x | A)
  E[X | A] = Σ_x x pX|A(x)

Geometric PMF
X: number of independent coin tosses until the first head:
  pX(k) = (1 − p)^(k−1) p,   k = 1, 2, ...
  E[X] = Σ_{k=1}^{∞} k pX(k) = Σ_{k=1}^{∞} k (1 − p)^(k−1) p

Memorylessness: let A = {X > 2}.  Then
  pX−2|X>2(k) = pX(k)
(given that the first two tosses were tails, the remaining number of
tosses has the same geometric PMF; compare the plots of pX(k),
pX|X>2(k), and pX−2|X>2(k)).

Total expectation in the geometric example: A1: {X = 1}, A2: {X > 1};
  E[X] = P(X = 1) E[X | X = 1] + P(X > 1) E[X | X > 1]

Joint PMFs
  pX,Y(x, y) = P(X = x and Y = y)
  Σ_x Σ_y pX,Y(x, y) = 1
  Marginal:    pX(x) = Σ_y pX,Y(x, y)
  Conditional: pX|Y(x | y) = P(X = x | Y = y) = pX,Y(x, y) / pY(y)
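The total-expectation decomposition above gives E[X] = p · 1 + (1 − p)(1 + E[X]), hence E[X] = 1/p; a quick numerical confirmation with a truncated sum:

# Geometric mean via the memoryless/total-expectation argument.
p = 0.25
approx = sum(k * (1 - p)**(k - 1) * p for k in range(1, 2000))
print(approx, 1 / p)   # both ~4.0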
LECTURE 7

Lecture outline
- Review: joint PMFs
- Independence of random variables
- More on expectations and variances
- Binomial mean and variance
- The hat problem

Review
  pX(x) = P(X = x)
  pX|Y(x | y) = P(X = x | Y = y)
  pX(x) = Σ_y pX,Y(x, y)

Expectations
  E[X] = Σ_x x pX(x)
  E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)
  In general, E[g(X, Y)] ≠ g(E[X], E[Y])
  E[αX + β] = α E[X] + β
  E[X + Y + Z] = E[X] + E[Y] + E[Z]

Independence of random variables
  pX,Y,Z(x, y, z) = pX(x) pY(y) pZ(z)   for all x, y, z
Example (a joint PMF table with entries that are multiples of 1/20):
are X and Y independent?  What if we condition on X ≤ 2 and Y ≥ 3?

Variances
  var(aX) = a² var(X),   var(X + a) = var(X)
  Let Z = X + Y.  If X, Y are independent: var(Z) = var(X) + var(Y)
  Examples:
  - If X = Y:  var(X + Y) = 4 var(X)
  - If X = −Y:  var(X + Y) = 0
  - If X, Y independent and Z = X − 3Y:  var(Z) = var(X) + 9 var(Y)

Binomial mean and variance
X = # of successes in n independent trials, probability of success p.
  "Hard" way:  E[X] = Σ_{k=0}^{n} k (n choose k) p^k (1 − p)^(n−k)
  Indicator trick:
    Xi = 1 if success in trial i, 0 otherwise;   X = X1 + X2 + ... + Xn
    E[Xi] = p,  so E[X] = np;   var(Xi) = p(1 − p),  so var(X) = np(1 − p)

The hat problem
n people throw their hats in a box and then pick one at random.
X = number of people who get their own hat.  Find E[X].
  Xi = 1 if person i selects own hat, 0 otherwise;   X = X1 + ... + Xn
  P(Xi = 1) = 1/n,   E[Xi] = 1/n,   so E[X] = 1
  Are the Xi independent?  (No.)
  Variance:
    X² = Σ_i Xi² + Σ_{i,j: i≠j} Xi Xj
    E[Xi²] = 1/n,   E[Xi Xj] = 1/(n(n − 1)) for i ≠ j
    E[X²] = n · (1/n) + n(n − 1) · 1/(n(n − 1)) = 2
    var(X) = E[X²] − (E[X])² = 2 − 1 = 1
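A Monte Carlo sketch of the hat problem, checking that E[X] = 1 and var(X) = 1 for any n:

import random

def matches(n):
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, j in enumerate(perm) if i == j)

n, trials = 10, 100_000
samples = [matches(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, var)   # both close to 1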
LECTURE 8

Continuous random variables and PDFs
A continuous random variable X is described by a probability density
function fX(x):
  P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
  P(X ∈ B) = ∫_B fX(x) dx
  ∫_{−∞}^{∞} fX(x) dx = 1
  P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} fX(s) ds ≈ fX(x) · δ

Means and variances:
  E[X] = ∫ x fX(x) dx
  E[g(X)] = ∫ g(x) fX(x) dx
  var(X) = σX² = ∫ (x − E[X])² fX(x) dx

Continuous uniform random variable on [a, b]:
  fX(x) = 1/(b − a),   a ≤ x ≤ b
  E[X] = (a + b)/2
  σX² = ∫_a^b (x − (a + b)/2)² · 1/(b − a) dx = (b − a)²/12

Cumulative distribution function (CDF):
  FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt       (continuous case)
  FX(x) = P(X ≤ x) = Σ_{k ≤ x} pX(k)            (discrete case: a staircase,
                                                  e.g. with steps 1/6, 2/6, 3/6, ...)

Mixed distributions: part discrete, part continuous; the CDF has both
jumps and continuously increasing sections.

Gaussian (normal) PDF
  Standard normal N(0, 1):  fX(x) = (1/sqrt(2π)) e^(−x²/2);  E[X] = 0, var(X) = 1
  General normal N(μ, σ²):  fX(x) = (1/(σ sqrt(2π))) e^(−(x−μ)²/(2σ²))

  Let Y = aX + b.  Then E[Y] = aμ + b and var(Y) = a²σ².
  Fact: Y ~ N(aμ + b, a²σ²)

  If X ~ N(μ, σ²), then (X − μ)/σ ~ N(0, 1).
  Example: if X ~ N(2, 16), then
    P(X ≤ 3) = P((X − 2)/4 ≤ (3 − 2)/4) = Φ(0.25),
  read from the standard normal table below (Sec. 3.3, Normal Random Variables).
Standard normal table: Φ(z) = P(Z ≤ z) for Z ~ N(0, 1)

  z  | .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 ----+------------------------------------------------------------------
 0.0 | .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1 | .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2 | .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3 | .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4 | .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5 | .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6 | .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7 | .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8 | .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9 | .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0 | .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1 | .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2 | .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3 | .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4 | .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5 | .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6 | .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7 | .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8 | .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9 | .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0 | .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1 | .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2 | .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3 | .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4 | .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5 | .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6 | .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7 | .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8 | .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
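The table values can also be computed from the error function, Φ(z) = (1 + erf(z/sqrt(2)))/2; a quick check of the N(2, 16) example above:

from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

print(Phi(0.25))         # ~0.5987, matching the table entry at z = 0.25
print(Phi((3 - 2) / 4))  # P(X <= 3) for X ~ N(2, 16), i.e. mean 2, std 4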
LECTURE 9

Outline
- PDF review; summary of concepts
- Joint PDFs of multiple random variables
- Independence
- Examples (Buffon's needle)

Summary of concepts (discrete / continuous counterparts):
  pX(x)            fX(x)
  FX(x)            FX(x)
  E[X], var(X)     E[X], var(X)
  pX,Y(x, y)       fX,Y(x, y)
  pX|A(x)          fX|A(x)
  pX|Y(x | y)      fX|Y(x | y)

PDF review:
  P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
  P(x ≤ X ≤ x + δ) ≈ fX(x) · δ
  E[X] = ∫ x fX(x) dx,   E[g(X)] = ∫ g(x) fX(x) dx

Joint PDF fX,Y(x, y):
  P((X, Y) ∈ S) = ∫∫_S fX,Y(x, y) dx dy
  Interpretation: P(x ≤ X ≤ x + δ, y ≤ Y ≤ y + δ) ≈ fX,Y(x, y) · δ²
  Expectations: E[g(X, Y)] = ∫∫ g(x, y) fX,Y(x, y) dx dy
  Independence: fX,Y(x, y) = fX(x) fY(y) for all x, y

Buffon's needle
Parallel lines at distance d; needle of length ℓ (assume ℓ < d).
Find P(needle intersects one of the lines).
  X: distance of the needle's midpoint to the nearest line
  Θ: (acute) angle between the needle and the lines
  Model: X uniform on [0, d/2], Θ uniform on [0, π/2], independent:
    fX,Θ(x, θ) = fX(x) fΘ(θ) = (2/d)(2/π),   0 ≤ x ≤ d/2,  0 ≤ θ ≤ π/2
  The needle intersects a line if and only if X ≤ (ℓ/2) sin Θ, so
    P(X ≤ (ℓ/2) sin Θ) = ∫∫_{x ≤ (ℓ/2) sin θ} fX(x) fΘ(θ) dx dθ
      = (4/(πd)) ∫_0^{π/2} ∫_0^{(ℓ/2) sin θ} dx dθ
      = (4/(πd)) ∫_0^{π/2} (ℓ/2) sin θ dθ
      = 2ℓ/(πd)
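A Monte Carlo sketch of Buffon's needle, checking the 2ℓ/(πd) answer:

import random
from math import pi, sin

d, l, trials = 1.0, 0.5, 200_000
hits = 0
for _ in range(trials):
    x = random.uniform(0, d / 2)        # midpoint distance to nearest line
    theta = random.uniform(0, pi / 2)   # acute angle with the lines
    if x <= (l / 2) * sin(theta):
        hits += 1
print(hits / trials, 2 * l / (pi * d))  # both ~0.318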
Conditioning
Recall: P(x ≤ X ≤ x + δ) ≈ fX(x) · δ
Similarly: P(x ≤ X ≤ x + δ | Y ≈ y) ≈ fX|Y(x | y) · δ
This leads us to the definition:
  fX|Y(x | y) = fX,Y(x, y) / fY(y),   if fY(y) > 0
(Think of a slice through the joint density surface for a fixed value
of the conditioning variable.)  If X and Y are independent:
fX|Y(x | y) = fX(x).

Stick-breaking example
Break a stick of length ℓ twice: first break at X, uniform in [0, ℓ];
break again at Y, uniform in [0, X].
  fX,Y(x, y) = fX(x) fY|X(y | x) = (1/ℓ)(1/x)   on the set 0 ≤ y ≤ x ≤ ℓ
  Marginal of Y:
    fY(y) = ∫ fX,Y(x, y) dx = ∫_y^ℓ 1/(ℓx) dx = (1/ℓ) log(ℓ/y),   0 ≤ y ≤ ℓ
  E[Y | X = x] = ∫ y fY|X(y | x) dy = x/2
  E[Y] = ∫_0^ℓ y fY(y) dy = ∫_0^ℓ (y/ℓ) log(ℓ/y) dy = ℓ/4
LECTURE 10

Readings: Section 3.6; start Section 4.1

Lecture outline
- Review: the Bayes rule and its variations
- Derived distributions

Review (discrete case):
  pX|Y(x | y) = pX,Y(x, y) / pY(y)
  pX,Y(x, y) = pY(y) pX|Y(x | y) = pX(x) pY|X(y | x)
  Bayes rule:
    pX|Y(x | y) = pX(x) pY|X(y | x) / pY(y),
    pY(y) = Σ_x pX(x) pY|X(y | x)

Continuous counterpart:
  fX|Y(x | y) = fX(x) fY|X(y | x) / fY(y),
  fY(y) = ∫ fX(x) fY|X(y | x) dx

Discrete X, continuous Y:
  pX|Y(x | y) = pX(x) fY|X(y | x) / fY(y),
  fY(y) = Σ_x pX(x) fY|X(y | x)
  Example: X: a discrete signal, prior pX(x);
  Y: noisy version of X;
  fY|X(y | x): continuous noise model.

Continuous X, discrete Y:
  fX|Y(x | y) = fX(x) pY|X(y | x) / pY(y),
  pY(y) = ∫ fX(x) pY|X(y | x) dx
  Example: X: a continuous signal, prior fX(x)
  (e.g., intensity of a light beam);
  Y: discrete r.v. affected by X (e.g., photon count);
  pY|X(y | x): model of the discrete r.v.

Derived distributions: given the distribution of X, find the
distribution of Y = g(X).

Discrete case:
  pY(y) = P(g(X) = y) = Σ_{x: g(x) = y} pX(x)

Continuous case -- two-step procedure:
  1. Find the CDF: FY(y) = P(Y ≤ y)
  2. Differentiate: fY(y) = dFY/dy (y)

Example: X uniform on [0, 2].  Find the PDF of Y = X³.
  FY(y) = P(Y ≤ y) = P(X³ ≤ y) = P(X ≤ y^(1/3)) = (1/2) y^(1/3),   0 ≤ y ≤ 8
  fY(y) = dFY/dy (y) = 1/(6 y^(2/3)),   0 ≤ y ≤ 8

The linear case Y = aX + b (e.g., Y = 2X + 5: the density is shifted and
scaled, fX → f_aX → f_aX+b):
  fY(y) = (1/|a|) fX((y − b)/a)

Example: V uniform on [30, 60] (so fV(v0) = 1/30 for 30 ≤ v0 ≤ 60).
Let T(V) = 200/V.  Find fT(t).
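A simulation sketch checking the derived CDF FY(y) = y^(1/3)/2 for Y = X³, X uniform on [0, 2]:

import random

trials = 200_000
samples = [random.uniform(0, 2) ** 3 for _ in range(trials)]
for y in (0.5, 1.0, 4.0):
    empirical = sum(1 for s in samples if s <= y) / trials
    print(y, empirical, (y ** (1 / 3)) / 2)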
LECTURE 11

Lecture outline
- Derived distributions: a general formula for monotonic g
- The distribution of X + Y (convolution)
- The sum of independent normal random variables
- Covariance and correlation

A general formula
Let Y = g(X), with g strictly monotonic.  The event {x ≤ X ≤ x + δ} is
the same as {y ≤ Y ≤ y + δ |dg/dx (x)|}, where y = g(x) and
|dg/dx (x)| is the slope of g at x.  Hence
  fX(x) = fY(y) · |dg/dx (x)|,   where y = g(x),
so that fY(y) = fX(x) / |dg/dx (x)|.

The distribution of W = X + Y, with X and Y independent
Discrete case:
  pW(w) = P(X + Y = w) = Σ_x P(X = x) P(Y = w − x) = Σ_x pX(x) pY(w − x)
(for example, w = 3 collects the pairs (0,3), (1,2), (2,1), (3,0))
Continuous case:
  fW|X(w | x) = fY(w − x),   fW,X(w, x) = fX(x) fY(w − x)
  Mechanics (convolution):  fW(w) = ∫ fX(x) fY(w − x) dx

The sum of independent normal random variables
  fX,Y(x, y) = (1/(2π σx σy)) exp{ −(x − μx)²/(2σx²) − (y − μy)²/(2σy²) }
Let W = X + Y (take μx = μy = 0 for simplicity):
  fW(w) = ∫ fX(x) fY(w − x) dx
        = (1/(2π σx σy)) ∫ e^(−x²/(2σx²)) e^(−(w−x)²/(2σy²)) dx
        = (algebra) = c e^(−w²/(2(σx² + σy²)))
Conclusion: W is normal, with mean 0 (in general μx + μy) and
variance σx² + σy².

Covariance and correlation
  cov(X, Y) = E[(X − E[X])(Y − E[Y])]
  independent ⇒ cov(X, Y) = 0   (the converse is not true)
  var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} var(Xi) + Σ_{(i,j): i≠j} cov(Xi, Xj)

Correlation coefficient:
  ρ = E[ (X − E[X])(Y − E[Y]) / (σX σY) ] = cov(X, Y) / (σX σY)
  −1 ≤ ρ ≤ 1
  |ρ| = 1  ⇔  (X − E[X]) is a constant multiple of (Y − E[Y])
(Scatter plots: a cloud of points with ρ close to 1 slopes upward; with
ρ close to −1 it slopes downward.)
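A small sketch of the discrete convolution formula, for the sum of two independent fair tetrahedral dice:

from fractions import Fraction

pX = {x: Fraction(1, 4) for x in range(1, 5)}
pY = dict(pX)
pW = {}
for x, px in pX.items():
    for y, py in pY.items():
        pW[x + y] = pW.get(x + y, Fraction(0)) + px * py
print(dict(sorted(pW.items())))   # triangular PMF on 2..8, peaking at 5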
LECTURE 12

Lecture outline
- Conditional expectation
- Law of iterated expectations
- Law of total variance
- Sum of a random number of independent random variables

Conditional expectation
Given the value y of a r.v. Y:
  E[X | Y = y] = Σ_x x pX|Y(x | y)   (integral in the continuous case)
E[X | Y = y] is a number; E[X | Y] is a random variable (a function of
Y), with its own mean and variance.

Law of iterated expectations:
  E[E[X | Y]] = Σ_y E[X | Y = y] pY(y) = E[X]
In the stick example (second break X uniform on [0, Y]):
  E[X | Y] = Y/2, so E[X] = E[E[X | Y]] = E[Y/2] = ℓ/4

Section example
Two sections: y = 1 (10 students); y = 2 (20 students).  Quiz averages:
  y = 1:  (1/10) Σ_{i=1}^{10} xi = 90
  y = 2:  (1/20) Σ_{i=11}^{30} xi = 60
Overall average:
  E[X] = (1/30) Σ_{i=1}^{30} xi = (90 · 10 + 60 · 20)/30 = 70
So E[X | Y = 1] = 90, E[X | Y = 2] = 60, and
  E[X | Y] = 90 w.p. 1/3,  60 w.p. 2/3
  E[E[X | Y]] = (1/3) · 90 + (2/3) · 60 = 70 = E[X]
  var(E[X | Y]) = (1/3)(90 − 70)² + (2/3)(60 − 70)² = 600/3 = 200

Conditional variance (same example):
  var(X | Y = 1) = 10   (given)
  var(X | Y = 2) = (1/20) Σ_{i=11}^{30} (xi − 60)² = 20
  var(X | Y) = 10 w.p. 1/3,  20 w.p. 2/3
  E[var(X | Y)] = (1/3) · 10 + (2/3) · 20 = 50/3

Law of total variance:
  var(X) = E[var(X | Y)] + var(E[X | Y]) = 50/3 + 200
         = (average variability within sections)
           + (variability between sections)

Sum of a random number of independent r.v.s
N: a nonnegative-integer r.v.; X1, X2, ... i.i.d. and independent of N.
Let Y = X1 + ... + XN.
  E[Y | N = n] = E[X1 + X2 + ... + Xn | N = n]
               = E[X1 + X2 + ... + Xn]
               = E[X1] + E[X2] + ... + E[Xn]
               = n E[X]
  E[Y | N] = N E[X]
  E[Y] = E[E[Y | N]] = E[N E[X]] = E[N] E[X]
  var(Y | N = n) = n var(X),   var(Y | N) = N var(X)
  E[var(Y | N)] = E[N] var(X)
  var(E[Y | N]) = (E[X])² var(N)
  var(Y) = E[N] var(X) + (E[X])² var(N)
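A numerical check of the section example and the law of total variance:

from fractions import Fraction as F

means = {1: F(90), 2: F(60)}
vars_ = {1: F(10), 2: F(20)}
p     = {1: F(1, 3), 2: F(2, 3)}

EX = sum(p[y] * means[y] for y in p)                        # 70
var_of_mean = sum(p[y] * (means[y] - EX) ** 2 for y in p)   # 200
mean_of_var = sum(p[y] * vars_[y] for y in p)               # 50/3
print(EX, mean_of_var, var_of_mean, mean_of_var + var_of_mean)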
LECTURE 13

The Bernoulli process

Lecture outline
- Random processes: two views
- The Bernoulli process; number of arrivals in n slots
- Interarrival times; memorylessness
- Merging and splitting
- The Poisson approximation to the binomial

Definition: at each trial i,
  P(success) = P(Xi = 1) = p,   P(failure) = P(Xi = 0) = 1 − p,
with the trials independent.  Examples: arrivals of customers, ...

Random processes
First view: a sequence of random variables X1, X2, ...
  E[Xt] = p,   var(Xt) = p(1 − p)
Number of arrivals S in n time slots:
  P(S = k) = (n choose k) p^k (1 − p)^(n−k),   E[S] = np,   var(S) = np(1 − p)
Second view: what is the right sample space?
(e.g., P(Xt = 1 for all t) = 0)

Random processes we will study:
- Bernoulli process (memoryless, discrete time)
- Poisson process (memoryless, continuous time)
- Markov chains (with memory/dependence across time)

Interarrival times
T1 = number of trials until the first arrival:
  P(T1 = t) = (1 − p)^(t−1) p   (geometric)
  E[T1] = 1/p,   var(T1) = (1 − p)/p²
Memoryless property: the time to the next arrival is independent of the
past (in particular, independent of T1).
Yk = time of the kth arrival = T1 + T2 + ... + Tk:
  P(Yk = t) = (t − 1 choose k − 1) p^k (1 − p)^(t−k),  t = k, k+1, ...   (Pascal PMF)
  E[Yk] = k/p,   var(Yk) = k(1 − p)/p²

Merging of Bernoulli processes (Sec. 6.1)
In a reverse situation, we start with two independent Bernoulli processes
(with parameters p and q, respectively) and merge them into a single
process, as follows.  An arrival is recorded in the merged process if and
only if there is an arrival in at least one of the two original
processes.  This happens with probability p + q − pq [one minus the
probability (1 − p)(1 − q) of no arrival in either process].  Since
different time slots in either of the original processes are
independent, different slots in the merged process are also independent.
Thus, the merged process is Bernoulli, with success probability
p + q − pq at each time step; see Fig. 6.4.

Splitting and merging of Bernoulli (or other) arrival processes arises
in many contexts.  For example, a two-machine work center may see a
stream of arriving parts to be processed and split them by sending each
part to a randomly chosen machine.  Conversely, a machine may be faced
with arrivals of different types that can be merged into a single
arrival stream.

The Poisson Approximation to the Binomial
This approximation applies when the number n of trials is large, p is
small, and np has a moderate value.  A situation of this type arises
when one passes from discrete to continuous time, a theme to be picked
up in the next section.  For some examples, think of the number of
airplane accidents on any given day: there is a large number n of trials
(airplane flights), but each one has a very small probability p of being
involved in an accident.  Or think of counting the number of typos in a
book: there is a large number of words, but a very small probability of
misspelling any single one.

Mathematically, we can address situations of this kind by letting n grow
while simultaneously decreasing p, in a manner that keeps the product np
at a constant value λ.  In the limit, it turns out that the formula for
the binomial PMF simplifies to the Poisson PMF.  A precise statement is
provided next, together with a reminder of some of the properties of the
Poisson PMF that were derived in Chapter 2:
  pZ(k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, ...
  E[Z] = λ,   var(Z) = λ.
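A numerical sketch of the Poisson approximation (n large, p small, np = λ fixed):

from math import comb, exp, factorial

n, p = 1000, 0.003          # lambda = np = 3
lam = n * p
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 5), round(poisson, 5))   # the two columns nearly agree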
LECTURE 14

The Poisson process

Readings: start Section 6.2

Lecture outline
- Review of the Bernoulli process
- Definition of the Poisson process
- Distribution of the number of arrivals
- Distribution of the interarrival times
- Other properties of the Poisson process

Bernoulli review: discrete time, success probability p at each slot.
- Number of arrivals in n time slots: binomial PMF
- Interarrival time PMF: geometric
- Time to k arrivals: Pascal PMF
- Memorylessness

Definition of the Poisson process
P(k, τ) = probability of k arrivals in an interval of duration τ.
Assumptions:
- Time homogeneity: P(k, τ) is the same for all intervals of length τ
- Numbers of arrivals in disjoint time intervals are independent
- Small interval probabilities: for VERY small δ,
    P(k, δ) ≈ 1 − λδ,  if k = 0
             ≈ λδ,     if k = 1
             ≈ 0,      if k > 1
  λ = arrival rate (arrivals per unit time)

PMF of the number of arrivals N
Finely discretize [0, t]: approximately a Bernoulli process;
Nt (of the discrete approximation) is binomial;
taking δ → 0 (or n → ∞) gives
  P(k, τ) = (λτ)^k e^(−λτ) / k!,   k = 0, 1, ...
  E[Nt] = λt,   var(Nt) = λt,   M_Nt(s) = e^(λt(e^s − 1))

Example: you get email according to a Poisson process at a rate of
λ = 0.4 messages per hour.  You check your email every thirty minutes.
  P(no new messages) = P(0, 0.5) = e^(−0.2) ≈ 0.82
  P(one new message) = P(1, 0.5) = 0.2 e^(−0.2) ≈ 0.16

Interarrival times
Yk = time of the kth arrival.  Erlang distribution of order k:
  fYk(y) = λ^k y^(k−1) e^(−λy) / (k − 1)!,   y ≥ 0
(plots of fYk(y) for k = 1, 2, 3).
First-order interarrival time (k = 1): exponential,
  fY1(y) = λ e^(−λy),   y ≥ 0
Memoryless property: the time to the next arrival is independent of
the past.

Bernoulli/Poisson relation
Discretization: n = t/δ slots, p = λδ, np = λt.

                            POISSON            BERNOULLI
  Times of arrival          continuous         discrete
  Arrival rate              λ per unit time    p per trial
  PMF of # of arrivals      Poisson            binomial
  Interarrival time dist.   exponential        geometric
  Time to kth arrival       Erlang             Pascal

Adding / merging Poisson processes
- The sum of independent Poisson random variables is Poisson.
- The merging of independent Poisson processes (rates λ1 and λ2) is
  Poisson with rate λ1 + λ2 (e.g., two streams of light flashes combine
  into "all flashes").
LECTURE 15

Poisson process II

Lecture outline
- Review of the Poisson process
- Merging and splitting
- Poisson fishing example
- Random incidence

Review
Defining characteristics:
- Time homogeneity: P(k, τ) = probability of k arrivals in an interval
  of duration τ
- Independence of arrivals in disjoint intervals
- Small interval probabilities (small δ):
    P(k, δ) ≈ 1 − λδ,  if k = 0
             ≈ λδ,     if k = 1
             ≈ 0,      if k > 1
  P(k, τ) = (λτ)^k e^(−λτ) / k!,   E[Nτ] = var(Nτ) = λτ
Interarrival times (k = 1): exponential,
  fT1(t) = λ e^(−λt),  t ≥ 0,   E[T1] = 1/λ
Memoryless property: the time to the next arrival is independent of the past.
Time Yk to the kth arrival: Erlang(k):
  fYk(y) = λ^k y^(k−1) e^(−λy) / (k − 1)!,   y ≥ 0   (plots for k = 1, 2, ...)

Merging of independent Poisson processes (rates λ1, λ2) is Poisson
(rate λ1 + λ2); the sum of independent Poisson random variables is
Poisson.  What is the probability that the next arrival comes from the
first process?

Poisson fishing example
(Fish are caught according to a Poisson process; e.g., part (d):
E[number of fish] = ?)

Splitting of Poisson processes
Assume that email traffic through a server is a Poisson process (rate λ).
Each message is routed along the first stream (e.g., "USA") with
probability p, and along the second stream (e.g., "Foreign") with
probability 1 − p; routings of different messages are independent.
Each output stream is Poisson, with rates pλ and (1 − p)λ.

Random incidence for Poisson processes
A Poisson process that has been running forever; you show up at some
"chosen time instant."  How long is the interarrival interval that
contains it?  (The elapsed time since the last arrival and the time to
the next arrival are independent exponentials, so the expected length
is 2/λ: longer intervals are more likely to be "caught.")

Random incidence in renewal processes
Series of successive arrivals with i.i.d. interarrival times
(but not necessarily exponential).
Example: bus interarrival times are equally likely to be 5 or 10
minutes.  You arrive at a "chosen time instant": what is the expected
time to the next arrival?
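A simulation sketch of the bus example: the gap containing a randomly chosen time instant is the 10-minute one with probability 2/3, so the expected wait is (2/3)(10/2) + (1/3)(5/2) = 25/6 ≈ 4.17 minutes.

import random, bisect

gaps = [random.choice([5, 10]) for _ in range(100_000)]
starts = [0]
for g in gaps:
    starts.append(starts[-1] + g)            # bus arrival times; starts[-1] is the horizon

trials, hits10, wait = 20_000, 0, 0.0
for _ in range(trials):
    t = random.uniform(0, starts[-1])
    i = bisect.bisect_right(starts, t) - 1   # index of the gap containing t
    hits10 += (gaps[i] == 10)
    wait += starts[i + 1] - t                # time until the next bus
print(hits10 / trials, wait / trials)        # ~0.667 and ~4.17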
LECTURE 16

Markov processes I

Lecture outline
- Checkout counter example
- Markov process definition
- n-step transition probabilities
- Classification of states

Checkout counter example
State Xn: number of customers at the counter at time n
(states 0, 1, ..., 10; customers arrive and depart one at a time).

Transition probabilities:
  pij = P(Xn+1 = j | Xn = i)
(Markov property: given the current state, the past does not matter.)
Model specification: identify the possible states and the transition
probabilities pij.

n-step transition probabilities
  rij(n) = P(Xn = j | X0 = i)
Key recursion (condition on the state k at time n − 1; the branch
probabilities ri1(n−1), ..., rik(n−1), ..., rim(n−1) are followed by
p1j, ..., pkj, ..., pmj):
  rij(n) = Σ_{k=1}^{m} rik(n − 1) pkj
With a random initial state:
  P(Xn = j) = Σ_{i=1}^{m} P(X0 = i) rij(n)

Example: a two-state chain with
  p11 = 0.5, p12 = 0.5, p21 = 0.2, p22 = 0.8.
Compute r11(n), r12(n), r21(n), r22(n) for n = 0, 1, 2, ..., 100, 101.
(In other examples, e.g. chains whose transition probabilities include
values such as 0.3, 0.4, 0.6, the rij(n) may fail to converge: r22(n)
can alternate between one value for n odd and another for n even, or
the limit can depend on the initial state.)

Recurrent and transient states
i transient: P(Xn = i) → 0; i is visited a finite number of times.
Recurrent class: a collection of recurrent states that communicate with
each other and with no other state.
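A sketch of the key recursion for the two-state example; both rows of rij(n) converge to the same steady-state probabilities (2/7, 5/7):

P = [[0.5, 0.5],
     [0.2, 0.8]]
r = [[1.0, 0.0], [0.0, 1.0]]          # r_ij(0)
for n in range(1, 101):
    r = [[sum(r[i][k] * P[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(r)                              # both rows ~[0.2857, 0.7143]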
LECTURE 17

Markov processes II

Lecture outline
- Review
- Steady-state behavior
- Warmup examples; periodic states
- Steady-state probabilities and the balance equations
- Visit frequency interpretation
- Birth-death processes

Review
Key recursion:  rij(n) = Σ_k rik(n − 1) pkj
Warmup (for a given chain):
  P(X1 = 2, X2 = 6, X3 = 7 | X0 = 1) = p12 p26 p67
  P(X4 = 7 | X0 = 2) = r27(4)

Recurrent and transient states
State i is recurrent if: starting from i, and from wherever you can go,
there is a way of returning to i.  Otherwise, i is transient.
Periodic states: the states of a recurrent class may split into groups
that the chain visits in a fixed cyclic order.

Steady-state probabilities
Do the rij(n) converge to some πj (independent of the initial state i)?
Yes, if:
- the recurrent states are all in a single class, and
- that single recurrent class is not periodic.
The πj are found from the balance equations, together with normalization:
  πj = Σ_k πk pkj,   for all j,      Σ_j πj = 1

Visit frequency interpretation
(Long-run) frequency of being in j: πj
Frequency of transitions from k to j: πk pkj
Frequency of transitions into j: Σ_k πk pkj

Consider n transitions of a Markov chain with a single class which is
aperiodic, starting from a given initial state.  Let qjk(n) be the
expected number of such transitions that take the state from j to k.
Then, regardless of the initial state, we have
  lim_{n→∞} qjk(n)/n = πj pjk.
The balance equation πj = Σ_k πk pkj has an intuitive meaning.  It
expresses the fact that the expected frequency πj of visits to j is
equal to the sum of the expected frequencies πk pkj of transitions that
lead to j; see Fig. 7.13.

In fact, some stronger statements are also true, such as the following.
Whenever we carry out a probabilistic experiment and generate a
trajectory of the Markov chain over an infinite time horizon, the
observed long-term frequency with which state j is visited will be
exactly equal to πj, and the observed long-term frequency of transitions
from j to k will be exactly equal to πj pjk.  Even though the trajectory
is random, these equalities hold with essential certainty, that is, with
probability 1.

Example: the two-state chain with p11 = 0.5, p12 = 0.5, p21 = 0.2,
p22 = 0.8 -- find π1 and π2 from the balance equations.

Birth-death processes
States 0, 1, ..., m; from state i the chain moves up with probability
pi, down with probability qi, and otherwise stays put.
Local balance equations (cut between i and i + 1):
  πi pi = πi+1 qi+1,   i = 0, 1, ..., m − 1
E[Xn] = ?   (in steady-state)
LECTURE 18

Markov processes III

Lecture outline
- Review of steady-state behavior
- The phone company problem (a birth-death chain)
- Calculating absorption probabilities
- Calculating the expected time to absorption

Review
Assume a single class of recurrent states, aperiodic, plus possibly some
transient states.  Then
  lim_{n→∞} rij(n) = πj,
where πj does not depend on the initial conditions:
  lim_{n→∞} P(Xn = j | X0 = i) = πj.
π1, ..., πm can be found as the unique solution of the balance equations
  πj = Σ_k πk pkj,   j = 1, ..., m,
together with Σ_j πj = 1.

Example: the two-state chain with p11 = 0.5, p12 = 0.5, p21 = 0.2,
p22 = 0.8:
  π1 = 2/7,  π2 = 5/7
  P(X100 = 1 and X101 = 2) ≈ π1 p12

The phone company problem
Calls arrive as a Poisson process (rate λ); each call duration is
exponentially distributed (parameter μ); B lines are available.
Discretizing time into δ-slots gives a birth-death chain on the states
0, 1, ..., B (number of busy lines), with birth probability λδ and death
probability iμδ in state i.
Balance equations (cut between i − 1 and i):
  λ πi−1 = i μ πi,
so
  πi = π0 (λ/μ)^i / i!,   i = 0, 1, ..., B,
  π0 = 1 / Σ_{i=0}^{B} (λ/μ)^i / i!
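A sketch of the phone-line steady state computed from the formula above; the values λ = 30, μ = 1, B = 40 are illustrative choices, not from the notes:

from math import factorial

lam, mu, B = 30.0, 1.0, 40
rho = lam / mu
weights = [rho**i / factorial(i) for i in range(B + 1)]
Z = sum(weights)
pi = [w / Z for w in weights]
print(pi[B])   # probability that all B lines are busy (an arriving call is blocked)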
Calculating absorption probabilities
(For a chain with one or more absorbing states and transition
probabilities such as 0.5, 0.4, 0.6, 0.2, 0.8, 0.3, ...)
Let ai = P(eventually reach the target absorbing state s | start at i).
  a_s = 1
  ai = 0 for every other absorbing state i
  ai = Σ_j pij aj,   for all other i
-- a system of linear equations with a unique solution
(e.g., write out the equation for i = 4, for i = 5, ...).

Calculating the expected time to absorption
Let ti = E[number of transitions until reaching s, starting from i].
  t_s = 0
  ti = 1 + Σ_j pij tj,   for all i ≠ s
-- again a unique solution.
LECTURE 19

Limit theorems I

Lecture outline
- Chebyshev's inequality
- Convergence "in probability"
- The weak law of large numbers
- The pollster's problem

Setup: X1, ..., Xn i.i.d. with finite mean μ and variance σ².
  Mn = (X1 + ... + Xn)/n
What happens as n → ∞?  Why bother?

Chebyshev's inequality
Random variable X (with finite mean μ and variance σ²):
  σ² = ∫ (x − μ)² fX(x) dx
     ≥ ∫_{−∞}^{μ−c} (x − μ)² fX(x) dx + ∫_{μ+c}^{∞} (x − μ)² fX(x) dx
     ≥ c² P(|X − μ| ≥ c)
Hence
  P(|X − μ| ≥ c) ≤ σ²/c²,   and   P(|X − μ| ≥ kσ) ≤ 1/k²

Convergence "in probability"
Deterministic limits: a sequence an and a number a; an converges to a,
lim_{n→∞} an = a, if an eventually gets and stays (arbitrarily) close to a.
Convergence in probability: Yn converges to a in probability if, for
every ε > 0, P(|Yn − a| ≥ ε) → 0.
Example: the PMF of Yn puts probability 1 − 1/n at 0 and 1/n at n.
Does Yn converge?

Convergence of Mn (the weak law of large numbers)
  E[Mn] = μ,   var(Mn) = σ²/n
  P(|Mn − μ| ≥ ε) ≤ var(Mn)/ε² = σ²/(nε²) → 0
So Mn converges in probability to μ.

The pollster's problem
f: fraction of the population that will vote "yes";
ith person polled: Xi = 1 if yes, 0 if no.
Mn = (X1 + ... + Xn)/n = fraction of "yes" in our sample.
  P(|Mn − f| ≥ .01) ≤ σ²_X / (n (0.01)²) ≤ 1 / (4 n (0.01)²)
If n = 50,000, then P(|Mn − f| ≥ .01) ≤ .05   (a conservative bound)

Different scalings of Sn = X1 + ... + Xn (X1, ..., Xn i.i.d., finite
variance σ²):
  Sn:            variance n σ²
  Mn = Sn/n:     variance σ²/n; converges in probability to E[X]  (WLLN)
  Sn/√n:         constant variance σ²
  Standardized:  Zn = (Sn − nE[X]) / (σ√n) = (Sn − E[Sn]) / σ_Sn;
                 zero mean, unit variance
Asymptotic shape?  (Next lecture: P(Zn ≤ c) → P(Z ≤ c).)
LECTURE 20

The central limit theorem

  Zn = (Sn − E[Sn]) / σ_Sn = (Sn − nE[X]) / (σ√n),
  E[Zn] = 0,   var(Zn) = 1

The central limit theorem:  P(Zn ≤ c) → P(Z ≤ c),
where P(Z ≤ c) is the standard normal CDF, Φ(c), available from the
normal tables.

Usefulness:
- universal; only means and variances matter
- an accurate computational shortcut

[Figures: the PMF of Sn for n = 2, 4, 8, and 32, for two different
per-trial distributions; as n grows the shape approaches a normal curve.]

Normal approximation: treat Zn as if it were normal.

The pollster's problem revisited
Xi = 1 if yes, 0 if no;  Mn = (X1 + ... + Xn)/n.
Suppose we want P(|Mn − f| ≥ .01) ≤ .05.
The event |Mn − f| ≥ .01 is the same as
  |X1 + ... + Xn − nf| / (σ√n) ≥ .01 √n / σ,
and since σ ≤ 1/2 the probability is at most P(|Z| ≥ .02 √n).
Choosing .02 √n ≈ 1.96 gives n ≈ 9604 (much smaller than the
Chebyshev-based requirement of 50,000).

Apply to the binomial
Sn = X1 + ... + Xn:  Binomial(n, p);
  (Sn − np) / sqrt(np(1 − p)) ≈ standard normal
Example: n = 36, p = 0.5 (so E[Sn] = 18, σ_Sn = 3).  Find P(Sn ≤ 21).
  Exact answer:  Σ_{k=0}^{21} (36 choose k)(1/2)^36 = 0.8785
  Normal approximation with the 1/2 correction:
    P(Sn ≤ 21.5) = P(Zn ≤ (21.5 − 18)/3) ≈ Φ(1.17)
Also P(Sn = 19):
  P(18.5 ≤ Sn ≤ 19.5) = P((18.5 − 18)/3 ≤ Zn ≤ (19.5 − 18)/3)
                      = P(0.17 ≤ Zn ≤ 0.5)
                      ≈ Φ(0.5) − Φ(0.17) = 0.6915 − 0.5675 = 0.124
  Exact answer:  (36 choose 19)(1/2)^36 = 0.1251

Poisson vs. normal approximations of the Binomial(n, p):
  p fixed, n → ∞:            normal
  np fixed, n → ∞, p → 0:    Poisson
(So "Poisson = normal"?  No -- the two limits apply in different regimes.)
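A quick check of the binomial example: exact probabilities versus the normal approximation with the 1/2 correction.

from math import comb, erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

exact_le_21 = sum(comb(36, k) for k in range(22)) / 2**36
print(exact_le_21, Phi((21.5 - 18) / 3))   # 0.8785 vs ~0.878

exact_eq_19 = comb(36, 19) / 2**36
print(exact_eq_19, Phi(0.5) - Phi(1/6))    # 0.1251 vs ~0.125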
LECTURE 21

Bayesian statistical inference I

Readings: Sections 8.1-8.2

Lecture outline
- Introduction to inference: types of inference problems
- The Bayesian framework: priors and the Bayes rule (four versions)
- Least mean squares (LMS) estimation

"It is the mark of truly educated people ..." (Oscar Wilde)

Sample applications
- Polling; design of experiments / sampling methodologies
  (e.g., the Lancet study on the Iraq death toll)
- Medical/pharmaceutical trials
- Data mining (e.g., the Netflix competition; matrix completion:
  a partially observed ratings matrix, with the goal of predicting the
  unobserved entries)
- Finance (e.g., modeling an index such as the S&P 500; graph removed)
- Signal processing: tracking, detection, speaker identification, ...

Types of inference models/approaches
- Model building versus inferring unknown variables.
  E.g., assume X = aS + W (signal S, noise W):
  - Model building: know the "signal" S, observe X, infer a.
  - Estimation in the presence of noise: know a, observe X, estimate S.
- Hypothesis testing: the unknown takes one of a few possible values;
  aim at a small probability of an incorrect decision.
- Estimation: aim at a small estimation error.
- Classical statistics: θ is an unknown constant (not a r.v.);
  e.g., θ = mass of the electron; model pX(x; θ).
- Bayesian statistics: Θ is a random variable; use priors and the Bayes rule.

The Bayesian framework
Unknown Θ with prior pΘ(θ) or fΘ(θ); observation X with model
pX|Θ(x | θ) or fX|Θ(x | θ); an estimator then produces Θ̂ = g(X).
(Block diagram: Θ → [pX|Θ(x | θ)] → X → [Estimator] → Θ̂.
Running examples: Θ ∈ {0, 1} observed through X = Θ + W with noise
W ~ fW(w); and Θ with uniform prior fΘ(θ) = 1/6 on [4, 10].)

Four versions of the Bayes rule (the output of Bayesian inference is the
posterior distribution of Θ given X = x):
- discrete Θ, discrete data:
    pΘ|X(θ | x) = pΘ(θ) pX|Θ(x | θ) / pX(x),
    pX(x) = Σ_θ' pΘ(θ') pX|Θ(x | θ')
- discrete Θ, continuous data:
    pΘ|X(θ | x) = pΘ(θ) fX|Θ(x | θ) / fX(x),
    fX(x) = Σ_θ' pΘ(θ') fX|Θ(x | θ')
- continuous Θ, discrete data:
    fΘ|X(θ | x) = fΘ(θ) pX|Θ(x | θ) / pX(x),
    pX(x) = ∫ fΘ(θ') pX|Θ(x | θ') dθ'
- continuous Θ, continuous data:
    fΘ|X(θ | x) = fΘ(θ) fX|Θ(x | θ) / fX(x),
    fX(x) = ∫ fΘ(θ') fX|Θ(x | θ') dθ'

Example (estimating a trend from noisy data):
  Zt = θ0 + θ1 t + θ2 t²,   Xt = Zt + Wt,   t = 1, 2, ..., n

Least mean squares (LMS) estimation
Estimation in the absence of information: unknown r.v. Θ with a known
distribution; choose a single number c to minimize E[(Θ − c)²].
  The minimizer is c = E[Θ], and the resulting error is
  E[(Θ − E[Θ])²] = var(Θ).

Estimation with an observation: two r.v.s Θ, X; we observe that X = x.
  Conditional expectation: E[Θ | X = x] = ∫ θ fΘ|X(θ | x) dθ
  E[(Θ − c)² | X = x] is minimized by c = E[Θ | X = x].
For any estimator g(·):
  E[(Θ − E[Θ | X])² | X = x] ≤ E[(Θ − g(X))² | X = x]
  E[(Θ − E[Θ | X])² | X] ≤ E[(Θ − g(X))² | X]
  E[(Θ − E[Θ | X])²] ≤ E[(Θ − g(X))²]
So E[Θ | X] minimizes E[(Θ − g(X))²] over all estimators g(·).
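A small sketch of LMS estimation with the uniform-prior running example; the noise model W ~ uniform[−1, 1] is an assumption made here for illustration only.

import random

# Prior: Theta ~ uniform[4, 10] (density 1/6, as in the figure above).
# Assumed observation model: X = Theta + W with W ~ uniform[-1, 1], so the
# posterior is uniform on [x-1, x+1] intersected with [4, 10] and
# E[Theta | X = x] is the midpoint of that interval.
def lms_estimate(x):
    lo, hi = max(4.0, x - 1.0), min(10.0, x + 1.0)
    return (lo + hi) / 2

errs_lms, errs_naive, trials = 0.0, 0.0, 100_000
for _ in range(trials):
    theta = random.uniform(4, 10)
    x = theta + random.uniform(-1, 1)
    errs_lms += (lms_estimate(x) - theta) ** 2
    errs_naive += (x - theta) ** 2
print(errs_lms / trials, errs_naive / trials)   # LMS error is the smaller one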
LECTURE 22

Bayesian inference II: least mean squares and linear LMS estimation

Topics
- (Bayesian) least mean squares (LMS) estimation
- (Bayesian) linear LMS estimation

LMS estimation
For any x, the estimate θ̂ = E[Θ | X = x] minimizes E[(Θ − θ̂)² | X = x]
over all estimates θ̂; the estimator Θ̂ = E[Θ | X] = g(X) minimizes
E[(Θ − g(X))²] over all estimators g(·).
(Contrast with the MAP estimate, which maximizes the posterior; the two
can differ, e.g. when the posterior fΘ|X(θ | x) is asymmetric or
piecewise constant, as in the figures with density values such as 1/2
and breakpoints at −1 and +1.)

Conditional mean squared error:
  E[(Θ − E[Θ | X])² | X = x]
is the same as var(Θ | X = x): the variance of the conditional
(posterior) distribution of Θ.

The same ideas apply to predicting one r.v. from another: for two r.v.s
X and Y, having observed Y = y, E[(X − c)² | Y = y] is minimized by
c = E[X | Y = y].

Some properties of LMS estimation
  Estimator: Θ̂ = E[Θ | X];  estimation error: Θ̃ = Θ̂ − Θ
  E[Θ̃] = 0,   E[Θ̃ | X = x] = 0
  E[Θ̃ h(X)] = 0, for any function h
  cov(Θ̃, Θ̂) = 0
  Since Θ = Θ̂ − Θ̃:  var(Θ) = var(Θ̂) + var(Θ̃)

Linear LMS estimation
Consider estimators of Θ of the form Θ̂ = aX + b, and minimize
E[(Θ − aX − b)²].  The solution is
  Θ̂_L = E[Θ] + (cov(X, Θ) / var(X)) (X − E[X]),
with resulting mean squared error
  E[(Θ̂_L − Θ)²] = (1 − ρ²) σΘ².

The cleanest linear LMS example
  Xi = Θ + Wi,  with Θ, W1, ..., Wn independent,
  Θ of mean μ and variance σ0²,  Wi of mean 0 and variance σi².
Then
  Θ̂_L = (μ/σ0² + Σ_{i=1}^{n} Xi/σi²) / (Σ_{i=0}^{n} 1/σi²)
(a weighted average of μ, X1, ..., Xn), and in this example
Θ̂_L = E[Θ | X1, ..., Xn].

Big picture
Standard examples:
- Xi uniform on [0, θ], uniform prior on θ
- Xi Bernoulli(p), with a prior on p
Estimation methods: MAP; LMS (conditional expectation); linear LMS.
If everything is normal, the three coincide.
Notes:
- E[Θ | X] is the same as E[Θ | X³] (X³ carries the same information),
  but linear LMS is different:  Θ̂ = aX + b versus Θ̂ = aX³ + b.
- One can also consider estimators of the form
  Θ̂ = a1 X + a2 X² + a3 X³ + b.
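A sketch of the weighted-average linear LMS estimator above for Xi = Θ + Wi; the prior mean/variance and noise variances below are illustrative choices, not values from the notes.

import random

mu, sigma0 = 5.0, 2.0                 # prior mean and standard deviation (assumed)
sigmas = [1.0, 0.5, 2.0]              # noise standard deviations (assumed)

theta = random.gauss(mu, sigma0)
xs = [theta + random.gauss(0, s) for s in sigmas]

num = mu / sigma0**2 + sum(x / s**2 for x, s in zip(xs, sigmas))
den = 1 / sigma0**2 + sum(1 / s**2 for s in sigmas)
theta_hat = num / den                 # weighted average of mu, X1, ..., Xn
print(theta, theta_hat)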
LECTURE 23

Classical statistical inference I

Outline
- Classical statistics: setup and problem types
- Maximum likelihood (ML) estimation
- Estimating a mean; confidence intervals

Classical statistics
θ: unknown constant (NOT random); data X ~ pX(x; θ).
These are NOT conditional probabilities -- θ is not a random variable;
mathematically, there are many candidate models, one for each possible
value of θ (e.g., θ ∈ {0, 1} observed through X = θ + W, W ~ fW(w)).

Problem types:
- Hypothesis testing:  H0: θ = 1/2  versus  H1: θ = 3/4
- Composite hypotheses:  H0: θ = 1/2  versus  H1: θ ≠ 1/2
- Estimation: design an estimator Θ̂n to keep the estimation error
  Θ̂n − θ small.

Desirable properties of estimators:
- Unbiased:  E[Θ̂n] = θ
- Consistent:  Θ̂n → θ  (in probability)
- Small mean squared error:
    E[(Θ̂n − θ)²] = var(Θ̂n) + (E[Θ̂n] − θ)² = var(Θ̂n) + (bias)²

Maximum likelihood estimation
Pick the value of θ that makes the observed data most likely:
  θ̂_ML = arg max_θ pX(x; θ)
(compare with the MAP estimate, arg max_θ pX|Θ(x | θ) pΘ(θ) / pX(x)).
Exponential example: X1, ..., Xn i.i.d. exponential(θ):
  max_θ Σ_{i=1}^{n} log(θ e^(−θ xi)) = max_θ [ n log θ − θ Σ_{i=1}^{n} xi ]
  θ̂_ML = n / (x1 + ... + xn)
Consistency follows from the WLLN: (X1 + ... + Xn)/n → E[X] = 1/θ.

Estimating a mean
  Θ̂n = (X1 + ... + Xn)/n
Properties:
- E[Θ̂n] = θ   (unbiased)
- WLLN: Θ̂n → θ   (consistency)
- MSE: σ²/n

Confidence intervals
A 1 − α confidence interval is a (random) interval [Θ̂n−, Θ̂n+] s.t.
  P(Θ̂n− ≤ θ ≤ Θ̂n+) ≥ 1 − α.
By the CLT, |Θ̂n − θ| / (σ/√n) is approximately standard normal, so
  P(Θ̂n − 1.96 σ/√n ≤ θ ≤ Θ̂n + 1.96 σ/√n) ≈ 0.95.
More generally, let z be s.t. Φ(z) = 1 − α/2; then
  P(Θ̂n − z σ/√n ≤ θ ≤ Θ̂n + z σ/√n) ≈ 1 − α.

When σ is unknown, estimate it from the data:
  (1/n) Σ_{i=1}^{n} (Xi − θ)² → σ²,   or use the sample variance
  Ŝn² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − Θ̂n)² → σ²
  (unbiased: E[Ŝn²] = σ²)
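A sketch computing the ML estimate and an approximate 95% confidence interval for the mean of an exponential sample; the data below are simulated for illustration only.

import random
from math import sqrt

theta_true, n = 2.0, 500
xs = [random.expovariate(theta_true) for _ in range(n)]

theta_ml = n / sum(xs)                       # ML estimate of theta

mean = sum(xs) / n                           # estimate of E[X] = 1/theta
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
half = 1.96 * sqrt(s2 / n)
print(theta_ml, (mean - half, mean + half))  # CI for the mean, ~0.5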
LECTURE 24

Classical statistical inference II

Outline
- Review: maximum likelihood estimation; confidence intervals
- Linear regression
- Binary hypothesis testing; types of error; likelihood ratio test (LRT)

Review
Maximum likelihood estimation: have a model with unknown parameters,
X ~ pX(x; θ); pick the θ that makes the data most likely:
  max_θ pX(x; θ)
(compare with the Bayesian posterior, proportional to pX|Θ(x | θ) pΘ(θ)).
1 − α confidence interval: P(Θ̂n− ≤ θ ≤ Θ̂n+) ≥ 1 − α, e.g.
  P(Θ̂n − z σ/√n ≤ θ ≤ Θ̂n + z σ/√n) ≈ 1 − α,   where Φ(z) = 1 − α/2.

Linear regression
We wish to model the relation between two variables of interest, x and y
(e.g., years of education and income), based on a collection of data
pairs (xi, yi), i = 1, ..., n.  For example, xi could be the years of
education and yi the annual income of the ith person in the sample.
Often a two-dimensional plot of these samples indicates a systematic,
approximately linear relation between xi and yi.  Then, it is natural to
attempt to build a linear model of the form
  y ≈ θ0 + θ1 x,
where θ0 and θ1 are unknown parameters to be estimated.

In particular, given some estimates θ̂0 and θ̂1 of the resulting
parameters, the value yi corresponding to xi, as predicted by the model,
is
  ŷi = θ̂0 + θ̂1 xi.
Generally, ŷi will be different from the given value yi, and the
corresponding difference
  ỹi = yi − ŷi
is called the ith residual.  A choice of estimates that results in small
residuals is considered to provide a good fit to the data.  With this
motivation, the linear regression approach chooses the parameter
estimates θ̂0 and θ̂1 that minimize the sum of the squared residuals,
  Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} (yi − θ0 − θ1 xi)².
(Figure 9.5: a set of data pairs (xi, yi) and the linear model
y = θ̂0 + θ̂1 x obtained by minimizing the sum of the squares of the
residuals yi − θ0 − θ1 xi.)

Formulation and solution:
  min_{θ0, θ1} Σ_{i=1}^{n} (yi − θ0 − θ1 xi)²
  x̄ = (x1 + ... + xn)/n,   ȳ = (y1 + ... + yn)/n
  θ̂1 = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{n} (xi − x̄)²
  θ̂0 = ȳ − θ̂1 x̄

One interpretation (maximum likelihood):
  Yi = θ0 + θ1 xi + Wi,   Wi ~ N(0, σ²), i.i.d.
The likelihood has the form c · exp{ −Σ_{i=1}^{n} (yi − θ0 − θ1 xi)²/(2σ²) },
so maximizing it is the same as minimizing the sum of squared residuals.
Check that θ̂1 is the sample analog of cov(X, Y)/var(X), with
var(X) = E[(X − E[X])²].
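A sketch of the closed-form least-squares estimates above; the small data set is made up for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
theta1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
theta0 = ybar - theta1 * xbar
print(theta0, theta1)   # intercept and slope of the fitted line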
Extensions of linear regression
- Multiple explanatory variables:
    model:  y ≈ θ0 + θ1 x + θ2 x' + θ3 x''
    formulation:  min Σ_{i=1}^{n} (yi − θ0 − θ1 xi − θ2 xi' − θ3 xi'')²
    (caution: multicollinearity)
- Nonlinear features:
    model:  y ≈ θ0 + θ1 h(x),   e.g.  y ≈ θ0 + θ1 x²
    formulation:  min Σ_{i=1}^{n} (yi − θ0 − θ1 h(xi))²
    etc.

Binary hypothesis testing -- likelihood ratio test (LRT)
Types of errors: rejecting H0 when it is true; accepting H0 when H1 is
true.  Reject H0 if the likelihood ratio exceeds a threshold ξ:
  P(X = x; H1) / P(X = x; H0) > ξ   (discrete case)
  fX(x; H1) / fX(x; H0) > ξ         (continuous case)
LECTURE 25

Classical statistical inference III

Outline
- Likelihood ratio test examples
- Composite hypotheses
- Is my die fair?  (goodness of fit)

Likelihood ratio tests: reject H0 if
  pX(x; H1) / pX(x; H0) > ξ   or   fX(x; H1) / fX(x; H0) > ξ.
In many examples, after some algebra the LRT reduces to a threshold
test on a simple statistic, e.g.:
  reject H0 if Σ_{i=1}^{n} Xi > γ
  reject H0 if Σ_{i=1}^{n} Xi² > γ

Composite hypotheses

Is my die fair?
Hypothesis H0:  P(X = i) = pi = 1/6,   i = 1, ..., 6.
Roll the die n times; observed occurrences of i: Ni.
  Reject H0 if  T = Σ_{i=1}^{6} (Ni − n pi)² / (n pi) > γ.
Choose γ so that the probability of a false rejection, P(T > γ; H0),
equals the desired significance level (using, e.g., chi-square tables;
see http://www.itl.nist.gov/div898/handbook/).
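A sketch of the goodness-of-fit statistic T for the die-fairness test; the observed counts below are illustrative, not from the notes.

counts = [95, 108, 92, 104, 99, 102]          # observed N_i from n = 600 rolls (made up)
n = sum(counts)
expected = n / 6
T = sum((c - expected) ** 2 / expected for c in counts)
print(T)   # compare with a chi-square threshold (5 degrees of freedom)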
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.