$$\sum_{j=1}^{N} a_{ij} = 1, \quad \forall i.$$
First-Order Markov Models
$w_1$: rain/snow
$w_2$: cloudy
$w_3$: sunny

$$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$
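A minimal Python sketch (the array name `A` is my own) that stores this matrix and verifies the row-sum constraint $\sum_j a_{ij} = 1$ from the previous slide:

```python
import numpy as np

# The transition matrix above; rows and columns are ordered
# w1 (rain/snow), w2 (cloudy), w3 (sunny), and A[i, j] = a_ij.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# Each row sums to 1: sum_j a_ij = 1 for all i.
assert np.allclose(A.sum(axis=1), 1.0)
```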
First-Order Markov Model Examples
Solution:

$$P(W^8 \mid \theta) = P(w_3, w_3, w_3, w_1, w_1, w_3, w_2, w_3)$$
$$= P(w_3)\,P(w_3 \mid w_3)\,P(w_3 \mid w_3)\,P(w_1 \mid w_3)\,P(w_1 \mid w_1)\,P(w_3 \mid w_1)\,P(w_2 \mid w_3)\,P(w_3 \mid w_2)$$
$$= P(w_3)\; a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{23}$$
$$= 1 \cdot 0.8 \cdot 0.8 \cdot 0.1 \cdot 0.4 \cdot 0.3 \cdot 0.1 \cdot 0.2$$
$$= 1.536 \times 10^{-4}$$
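The same chain-rule computation in Python, reusing the matrix above (the 0-based state indices are my own convention):

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# The sequence w3, w3, w3, w1, w1, w3, w2, w3 as 0-based indices.
seq = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0  # P(w3) = 1: the example takes the first sunny day as given.
for prev, cur in zip(seq, seq[1:]):
    p *= A[prev, cur]

print(p)  # ~1.536e-04
```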
First-Order Markov Model Examples
Solution:

$$W^{d+1} = \{w(1) = w_i,\; w(2) = w_i,\; \ldots,\; w(d) = w_i,\; w(d+1) = w_j \neq w_i\}$$

$$P(W^{d+1} \mid \theta, w(1) = w_i) = (a_{ii})^{d-1}\,(1 - a_{ii})$$

$$E[d \mid w_i] = \sum_{d=1}^{\infty} d\,(a_{ii})^{d-1}\,(1 - a_{ii}) = \frac{1}{1 - a_{ii}}$$
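A quick numerical check of the closed form (the only approximation is truncating the infinite sum). With $a_{33} = 0.8$ for the sunny state, sunny spells last 5 days on average:

```python
# E[d | w_i] = 1 / (1 - a_ii), checked by summing the geometric series.
a_ii = 0.8
series = sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, 10000))
print(series, 1 / (1 - a_ii))  # both ~5.0
```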
$$\sum_{k=1}^{M} b_{jk} = 1, \quad \forall j.$$
First-Order Hidden Markov Models
For continuous observations, the emission density of state $w_j$ can be modeled as a mixture

$$b_j(\mathbf{x}) = \sum_{k=1}^{M_j} c_{jk}\, p(\mathbf{x} \mid \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk})$$

where $c_{jk} \geq 0$ and $\sum_{k=1}^{M_j} c_{jk} = 1, \; \forall j$.
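For illustration only, a sketch of a two-component Gaussian mixture emission for a single state, using SciPy; every parameter value here is made up, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical mixture for one state j:
# b_j(x) = c_1 N(x; mu_1, S_1) + c_2 N(x; mu_2, S_2).
c = np.array([0.3, 0.7])                      # mixture weights, sum to 1
mu = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
S = [np.eye(2), 0.5 * np.eye(2)]

def b_j(x):
    """Evaluate the mixture emission density at observation x."""
    return sum(ck * multivariate_normal.pdf(x, mean=m, cov=s)
               for ck, m, s in zip(c, mu, S))

print(b_j(np.array([1.0, 0.5])))
```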
- $\{a_{ij}\}$, the state transition probability distribution
- $\{b_{jk}\}$, the observation symbol probability distribution
- $\{\pi_i = P(w(1) = w_i)\}$, the initial state distribution
- $\lambda = (\{a_{ij}\}, \{b_{jk}\}, \{\pi_i\})$, the complete parameter set of the model
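These parameters can be kept together in a small container; a minimal sketch, with field names `A`, `B`, and `pi` of my own choosing (later snippets pass the arrays directly):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class HMM:
    """Complete parameter set lambda = ({a_ij}, {b_jk}, {pi_i})."""
    A: np.ndarray   # (N, N) state transition probabilities a_ij
    B: np.ndarray   # (N, M) observation symbol probabilities b_jk
    pi: np.ndarray  # (N,)  initial state distribution pi_i
```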
First-Order HMM Examples
Your friend has a list of activities that he or she does every day (such as playing sports, shopping, or studying), and the choice of what to do is determined exclusively by the weather on a given day.

Assume that Istanbul has a weather state distribution similar to the one in the previous example.

Given the model and the activity of your friend, you can make a guess about the weather in Istanbul that day.

For example, if your friend says that he or she played sports on the first day, went shopping on the second day, and studied on the third day of the week, you can answer questions such as:
Speech recognition

$$P(V^T \mid \lambda) = \sum_{\text{all } W^T} P(V^T, W^T \mid \lambda) = \sum_{\text{all } W^T} P(V^T \mid W^T, \lambda)\, P(W^T \mid \lambda).$$
HMM Evaluation Problem
$$P(V^T, W^T \mid \lambda) = \left(\prod_{t=1}^{T} P(v(t) \mid w(t))\right)\left(\prod_{t=1}^{T} P(w(t) \mid w(t-1))\right) = \prod_{t=1}^{T} P(v(t) \mid w(t))\, P(w(t) \mid w(t-1))$$

where $P(w(t) \mid w(t-1))$ for $t = 1$ is $P(w(1))$.
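This sum can be written down directly by enumerating all $N^T$ state paths. The brute-force sketch below (function name mine) is exponential in $T$, so it is useful only as a correctness check for the forward recursion that follows:

```python
from itertools import product

import numpy as np

def evaluate_brute_force(A, B, pi, obs):
    """P(V^T | lambda) as a sum of P(V^T, W^T | lambda) over all N^T paths.

    A: (N, N) transitions, B: (N, M) emissions, pi: (N,) initial
    distribution, obs: sequence of 0-based observation symbol indices.
    """
    N = A.shape[0]
    total = 0.0
    for path in product(range(N), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```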
Define $\alpha_j(t)$ as the probability that the HMM is in state $w_j$ at time $t$ having generated the first $t$ observations in $V^T$:

$$\alpha_j(t) = P(v(1), v(2), \ldots, v(t), w(t) = w_j \mid \lambda).$$

$\alpha_j(t)$, $j = 1, \ldots, N$, can be computed as

$$\alpha_j(t) = \begin{cases} \pi_j\, b_{jv(1)} & t = 1 \\ \left[\sum_{i=1}^{N} \alpha_i(t-1)\, a_{ij}\right] b_{jv(t)} & t = 2, \ldots, T. \end{cases}$$

Then, $P(V^T \mid \lambda) = \sum_{j=1}^{N} \alpha_j(T)$.
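A NumPy sketch of the forward recursion (0-based time indexing and names are mine; it should agree with the brute-force check above on small inputs):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward pass: alpha[t, j] = P(v(1..t+1), w(t+1) = w_j | lambda),
    with 0-based t, so alpha[0] corresponds to the slide's alpha_j(1)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # pi_j * b_{j v(1)}
    for t in range(1, T):
        # sum_i alpha_i(t-1) a_ij, then multiply by b_{j v(t)}
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

# Likelihood: P(V^T | lambda) = sum_j alpha_j(T)
# p = forward(A, B, pi, obs)[-1].sum()
```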
HMM Evaluation Problem
Define

$$\beta_i(t) = P(v(t+1), v(t+2), \ldots, v(T) \mid w(t) = w_i, \lambda)$$

as the probability that the HMM will generate the observations from $t+1$ to $T$ in $V^T$, given that it is in state $w_i$ at time $t$.

$\beta_i(t)$, $i = 1, \ldots, N$, can be computed as

$$\beta_i(t) = \begin{cases} 1 & t = T \\ \sum_{j=1}^{N} \beta_j(t+1)\, a_{ij}\, b_{jv(t+1)} & t = T-1, \ldots, 1. \end{cases}$$

Then, $P(V^T \mid \lambda) = \sum_{i=1}^{N} \beta_i(1)\, \pi_i\, b_{iv(1)}$.
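The matching backward pass, under the same 0-based conventions as the `forward` sketch:

```python
import numpy as np

def backward(A, B, obs):
    """Backward pass: beta[t, i] = P(v(t+2..T) | w(t+1) = w_i, lambda),
    with 0-based t, so beta[-1] is the slide's beta_i(T) = 1."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        # sum_j a_ij b_{j v(t+1)} beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Same likelihood from the backward pass:
# p = (pi * B[:, obs[0]] * backward(A, B, obs)[0]).sum()
```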
HMM Evaluation Problem
Define $\gamma_i(t)$ as the probability that the HMM is in state $w_i$ at time $t$, given the observation sequence $V^T$:

$$\gamma_i(t) = P(w(t) = w_i \mid V^T, \lambda) = \frac{\alpha_i(t)\,\beta_i(t)}{P(V^T \mid \lambda)} = \frac{\alpha_i(t)\,\beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\,\beta_j(t)}$$

where $\sum_{i=1}^{N} \gamma_i(t) = 1$.

The individually most likely state at time $t$ is then $w(t) = \arg\max_{i=1,\ldots,N} \gamma_i(t)$.
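Given the `forward` and `backward` sketches above, the posteriors follow in two lines (names mine):

```python
def state_posteriors(alpha, beta):
    """gamma[t, i] = P(w(t) = w_i | V^T, lambda); each row sums to 1."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Individually most likely state at each time step:
# best_states = state_posteriors(alpha, beta).argmax(axis=1)
```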
HMM Decoding Problem
Define $\xi_{ij}(t)$ as the probability that the HMM is in state $w_i$ at time $t-1$ and state $w_j$ at time $t$, given the observation sequence $V^T$:

$$\xi_{ij}(t) = P(w(t-1) = w_i, w(t) = w_j \mid V^T, \lambda) = \frac{\alpha_i(t-1)\, a_{ij}\, b_{jv(t)}\, \beta_j(t)}{P(V^T \mid \lambda)} = \frac{\alpha_i(t-1)\, a_{ij}\, b_{jv(t)}\, \beta_j(t)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i(t-1)\, a_{ij}\, b_{jv(t)}\, \beta_j(t)}.$$
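A sketch of $\xi_{ij}(t)$ under the same 0-based conventions; here `xi[t]` pairs times $t$ and $t+1$, i.e. the slide's $\xi_{ij}(t+1)$:

```python
import numpy as np

def pair_posteriors(A, B, obs, alpha, beta):
    """xi[t, i, j] = P(w at step t is w_i, w at step t+1 is w_j | V^T, lambda)."""
    T, N = len(obs), A.shape[0]
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # alpha_i(t) * a_ij * b_{j v(t+1)} * beta_j(t+1), normalized over (i, j)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    return xi
```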
HMM Learning Problem
$\gamma_i(t)$, defined in the decoding problem, and $\xi_{ij}(t)$, defined here, can be related as

$$\gamma_i(t-1) = \sum_{j=1}^{N} \xi_{ij}(t).$$

Then $\hat{a}_{ij}$, the estimate of the probability of a transition from $w_i$ at $t-1$ to $w_j$ at $t$, can be computed as

$$\hat{a}_{ij} = \frac{\text{expected number of transitions from } w_i \text{ to } w_j}{\text{expected total number of transitions away from } w_i} = \frac{\sum_{t=2}^{T} \xi_{ij}(t)}{\sum_{t=2}^{T} \gamma_i(t-1)}.$$
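In code, with `gamma` of shape (T, N) and `xi` of shape (T-1, N, N) from the sketches above:

```python
def reestimate_transitions(gamma, xi):
    """a_hat_ij = sum_{t>=2} xi_ij(t) / sum_{t>=2} gamma_i(t-1).

    Note gamma[:-1] equals xi.sum(axis=2), the relation stated above.
    """
    return xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
```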
HMM Learning Problem
Similarly, $\hat{b}_{jk}$, the estimate of the probability of observing the symbol $v_k$ while in state $w_j$, can be computed as

$$\hat{b}_{jk} = \frac{\text{expected number of times observing symbol } v_k \text{ in state } w_j}{\text{expected total number of times in } w_j} = \frac{\sum_{t=1}^{T} \delta_{v(t),v_k}\, \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}$$

where $\delta_{v(t),v_k}$ is the Kronecker delta, which is 1 only when $v(t) = v_k$.
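And the emission update, assuming `obs` holds 0-based symbol indices and there are `M` symbols in total:

```python
import numpy as np

def reestimate_emissions(gamma, obs, M):
    """b_hat_jk: gamma mass on steps where v(t) = v_k, over total gamma mass."""
    obs = np.asarray(obs)
    # Column k sums gamma_j(t) over exactly the steps with v(t) = v_k.
    B_hat = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
    return B_hat / gamma.sum(axis=0)[:, None]
```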
Finally, $\hat{\pi}_i$, the estimate for the initial state distribution, can be computed as $\hat{\pi}_i = \gamma_i(1)$, which is the expected number of times in state $w_i$ at time $t = 1$.
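The initial-state update is one line; iterating the forward/backward pass with these three re-estimation steps until $P(V^T \mid \lambda)$ stops improving is the Baum-Welch (EM) procedure, and each iteration is guaranteed not to decrease the likelihood:

```python
def reestimate_initial(gamma):
    """pi_hat_i = gamma_i(1): the posterior probability of starting in w_i."""
    return gamma[0].copy()
```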