CS109/Stat121/AC209/E-109
q_ij^(2) = Σ_k q_ik q_kj,

since to get from i to j in two steps, the chain must go from i to some intermediate state k, and then from k to j (these transitions are independent because of the Markov property). So the matrix Q^2 gives the 2-step transition probabilities. Similarly (by induction),
This Week
HW3 due next Thursday (Oct 17) at 11:59 pm
start now!
The chain can be visualized by thinking of a particle wandering around from state to state, randomly choosing which arrow to follow: at each stage an arrow is followed uniformly at random. Here we assume that if there are a arrows originating at state i, then each is chosen with probability 1/a, but in general each arrow could be given any probability, such that the sum of the probabilities on the arrows leaving i is 1.

Figure 1: A Markov Chain with 4 Recurrent States

The transition matrix of the chain shown above is

        ( 1/3  1/3  1/3   0  )
    Q = (  0    0   1/2  1/2 )
        (  0    1    0    0  )
        ( 1/2   0    0   1/2 )

To compute, say, the probability that the chain is in state 3 after 5 steps, starting at state 1, we would look at the (1,3) entry of Q^5 (here one would use a computer to find Q^5).
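A quick numerical check of this computation (a sketch using NumPy, with the transition matrix Q from the example above):

```python
import numpy as np

# Transition matrix of the 4-state chain in Figure 1 (each row sums to 1)
Q = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 1/2, 1/2],
    [0.0, 1.0, 0.0, 0.0],
    [1/2, 0.0, 0.0, 1/2],
])

Q5 = np.linalg.matrix_power(Q, 5)  # 5-step transition probabilities
p = Q5[0, 2]                       # P(in state 3 after 5 steps | start in state 1)
```

Note that NumPy indexing is 0-based, so the (1, 3) entry of Q^5 is `Q5[0, 2]`.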
Markov chains can also be used to sample from complicated distributions, via algorithms known as MCMC (Markov Chain Monte Carlo).
q_ij^(n), the n-step transition probability, is the (i, j) entry of Q^n.

Definition of Markov Chain

To see where the Markov model comes from, consider first an i.i.d. sequence of random variables X0, X1, ..., Xn, ..., where we think of n as time. Independence is a very strong assumption: it means that the Xj's provide no information about each other. At the other extreme, allowing general interactions between the Xj's makes it very difficult to compute even basic things. Markov chains are a happy medium between complete independence and complete dependence.

Example. Figure 1 shows an example of a Markov chain with 4 states.
The space on which a Markov process lives can be either discrete or continuous, and time can be either discrete or continuous. In Stat 110, we will focus on Markov chains X0, X1, X2, ... in discrete space and time (continuous time would be a process Xt defined for all real t ≥ 0). Most of the ideas can be extended to the other cases. Specifically, we will assume that Xn takes values in a finite set (the state space), which we usually take to be {1, 2, ..., M} (or {0, 1, ..., M} if it is more convenient). In Stat 110, we will always assume that our Markov chains are on finite state spaces.
The matrix Q is called the transition matrix of the chain, and qij is the transition
probability from i to j.
Example: Birth-Death Chain

A birth-death chain can move only one step (or stay put) at each step. For example, the chain below is a birth-death chain if the labeled transitions have positive probabilities (except for the loops from a state to itself, which are allowed to have 0 probability). We will now show that any birth-death chain is reversible. Define sj so that sj qj,j-1 = sj-1 qj-1,j for all states j with 2 ≤ j ≤ M, and choose s1 so that the sj's sum to 1. Then the chain is reversible with respect to s, since qij = qji = 0 if |i - j| ≥ 2 and by construction si qij = sj qji if |i - j| = 1. Thus, s is the stationary distribution.
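The reversibility argument can be checked numerically. A minimal sketch, with made-up birth and death probabilities for a 5-state chain:

```python
import numpy as np

# A made-up birth-death chain on states {1, ..., 5}
M = 5
up = [0.5, 0.4, 0.3, 0.2]    # q_{j, j+1}: birth probabilities (hypothetical)
down = [0.3, 0.3, 0.3, 0.3]  # q_{j+1, j}: death probabilities (hypothetical)

Q = np.zeros((M, M))
for j in range(M - 1):
    Q[j, j + 1] = up[j]
    Q[j + 1, j] = down[j]
np.fill_diagonal(Q, 1 - Q.sum(axis=1))  # self-loops absorb the leftover mass

# Build s by detailed balance: s_j q_{j,j-1} = s_{j-1} q_{j-1,j}, then normalize
s = np.ones(M)
for j in range(1, M):
    s[j] = s[j - 1] * Q[j - 1, j] / Q[j, j - 1]
s /= s.sum()

assert np.allclose(s @ Q, s)  # s is indeed stationary
```

Detailed balance for neighboring states is enough: transitions between non-neighbors have probability 0, so the balance equations hold for every pair of states.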
Application: DNA Sequence Analysis, CpG Islands

A Markov chain can be represented graphically as a collection of states, each of which corresponds to a particular residue, with arrows between the states. A Markov chain for DNA can look like this:

[Figure: a state for each of A, C, G, T, with arrows between them. Source: Durbin et al, Biological Sequence Analysis]

Here we see a state for each of the four letters A, C, G, and T in the DNA alphabet. A probability parameter is associated with each arrow in the figure, which determines the probability of a certain residue following another residue, or one state following another state. These probability parameters are called the transition probabilities, which we will write as a_st.
In C-G dinucleotides, the C often mutates to a T due to methylation. In a CpG island, the methylation is suppressed.

The transition probabilities for each model were set using the equation

a+_st = c+_st / Σ_t' c+_st' ,    (3.3)

and its analogue for a-_st, where c+_st is the number of times letter t followed letter s in the labelled regions. These are the maximum likelihood (ML) estimators for the transition probabilities, as described in Chapter 1. (In this case there were almost 60 000 nucleotides, and the ML estimators are adequate. If the number of counts of each type had been small, then a Bayesian estimation process would have been more appropriate, as discussed in Chapter 11 and below for HMMs.) The resulting tables are:
a_st = P(x_i = t | x_{i-1} = s).    (3.1)

+ model:
      A      C      G      T
A   0.180  0.274  0.426  0.120
C   0.171  0.368  0.274  0.188
G   0.161  0.339  0.375  0.125
T   0.079  0.355  0.384  0.182

- model:
      A      C      G      T
A   0.300  0.205  0.285  0.210
C   0.322  0.298  0.078  0.302
G   0.248  0.246  0.298  0.208
T   0.177  0.239  0.292  0.292
How important a page is depends not only on how many other pages link to it, but also on how important those pages are!
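This is the idea behind PageRank: page importance is the stationary distribution of a random-surfer Markov chain. A minimal power-iteration sketch on a made-up 4-page link graph (the damping factor 0.85 is the conventional choice):

```python
import numpy as np

# Made-up link graph: links[i] = pages that page i links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, d = 4, 0.85                 # number of pages, damping factor

# Column-stochastic link matrix: M[j, i] = prob. of surfing from i to j
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1 / len(outs)

r = np.full(n, 1 / n)
for _ in range(100):           # power iteration with teleportation
    r = (1 - d) / n + d * M @ r
```

Here page 2 ends up most important: it is linked to by three pages, including the relatively important page 0.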
Markov Chain Monte Carlo (MCMC)
Darwin's Finches
http://en.wikipedia.org/wiki/Darwin's_finches
                               Island
Species   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17  Total
   1      0  0  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1    14
   2      1  1  1  1  1  1  1  1  1  1  0  1  0  1  1  0  0    13
   3      1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  0  0    14
   4      0  0  1  1  1  0  0  1  0  1  0  1  1  0  1  1  1    10
   5      1  1  1  0  1  1  1  1  1  1  0  1  0  1  1  0  0    12
   6      0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0     2
   7      0  0  1  1  1  1  1  1  1  0  0  1  0  1  1  0  0    10
   8      0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0     1
   9      0  0  1  1  1  1  1  1  1  1  0  1  0  0  1  0  0    10
  10      0  0  1  1  1  1  1  1  1  1  0  1  0  1  1  0  0    11
  11      0  0  1  1  1  0  1  1  0  1  0  0  0  0  0  0  0     6
  12      0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0     2
  13      1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    17
Total     4  4 11 10 10  8  9 10  8  9  3 10  4  7  9  3  3   122
Table 1.1: Presence of 13 finch species (rows) on 17 islands (columns). A 1 in entry (i, j) indicates that species i was observed on island j. Data are from Sanderson (2000).

Given these data, we might be interested in knowing whether the pattern of 0s and 1s observed in the table is anomalous in some way. For example, does there appear to be dependence between the rows and columns? Do some pairs of species ... or competition? One way to test for such patterns is by looking at a lot of random tables with the same row and column sums as the observed table, to see how the observed table compares to the random ones. This is a common technique in statistics known as a goodness-of-fit test.

But how do we generate random tables with the same row and column sums as Table 1.1? The number of tables satisfying these constraints is impossible to ...

One move of a Markov chain that preserves row and column sums:
1. Pick two rows and two columns at random, and look at the resulting 2 x 2 submatrix.
2. If the submatrix is
       0 1        1 0
       1 0   or   0 1
   then swap between them;
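The swap move can be sketched in a few lines (a minimal version; the starting table here is made up, not the finch data):

```python
import random
import numpy as np

def swap_step(table, rng):
    """One move of the chain: pick 2 rows and 2 cols; if the 2x2 submatrix
    is [[0,1],[1,0]] or [[1,0],[0,1]], flip it to the other pattern.
    This preserves all row and column sums."""
    m, n = table.shape
    i1, i2 = rng.sample(range(m), 2)
    j1, j2 = rng.sample(range(n), 2)
    sub = table[np.ix_([i1, i2], [j1, j2])]
    if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
        table[np.ix_([i1, i2], [j1, j2])] = 1 - sub   # checkerboard swap

rng = random.Random(0)
t = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])       # made-up 0-1 table
rows, cols = t.sum(axis=1).copy(), t.sum(axis=0).copy()
for _ in range(1000):
    swap_step(t, rng)
print((t.sum(axis=1) == rows).all() and (t.sum(axis=0) == cols).all())  # True
```

Running many such moves gives (approximately) random tables with the prescribed margins, which can then be compared to the observed table.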
HMM Application: CpG Islands

s1 → s2 → s3 → s4   (hidden)
x1   x2   x3   x4   (observed)

Figure 3.2: The histogram of the length-normalised scores for all the sequences. CpG islands are shown with dark grey and non-CpG with light grey.

We expect CpG islands to stand out with positive values. However, this is somewhat unsatisfactory if we believe that in fact CpG islands have sharp boundaries and are of variable length. Why use a window size of 100? A more satisfactory approach is to build a single model for the entire sequence that incorporates both Markov chains.

Figure 3.3: An HMM for CpG islands, with states A+, C+, G+, T+ and A-, C-, G-, T-. In addition to the transitions shown, there is also a complete set of transitions within each set, as in the earlier simple Markov chains. (source: Durbin et al)

The challenging part is that we can only observe the sequence of As, Cs, Ts, Gs, not whether the state was an island (+) or non-island (-).
HMM Application: Speech Recognition

Three Fundamental Questions for HMMs

s1 → s2 → s3 → s4   (hidden)
x1   x2   x3   x4   (observed)

Methods:
1. Forward algorithm, backward algorithm (dynamic programming, recursive)
2. Viterbi algorithm (dynamic programming, recursive)
3. Baum-Welch algorithm (a form of the EM algorithm [Dempster-Laird-Rubin])

Finding the probability p(x)

Naive method: p(x) = Σ_s p(x, s) = Σ_s p(x | s) p(s)
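The naive sum runs over exponentially many state sequences s; the forward algorithm computes the same p(x) by dynamic programming, in time linear in the sequence length. A sketch on a made-up 2-state HMM:

```python
import numpy as np
from itertools import product

# Made-up 2-state HMM: initial probs, transitions, emissions
init = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # A[i, j] = p(next state j | state i)
B = np.array([[0.5, 0.5], [0.1, 0.9]])   # B[i, k] = p(emit symbol k | state i)
x = [0, 1, 1, 0]                         # observed symbol sequence

# Naive method: p(x) = sum over all 2^4 hidden state sequences
p_naive = 0.0
for s in product(range(2), repeat=len(x)):
    p = init[s[0]] * B[s[0], x[0]]
    for t in range(1, len(x)):
        p *= A[s[t - 1], s[t]] * B[s[t], x[t]]
    p_naive += p

# Forward algorithm: alpha[j] = p(x_0..x_t, state_t = j), updated recursively
alpha = init * B[:, x[0]]
for t in range(1, len(x)):
    alpha = (alpha @ A) * B[:, x[t]]
p_forward = alpha.sum()
```

Both give the same p(x), but the forward recursion costs O(T M^2) instead of O(M^T).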
[Figure: network statistics — edge, two-star, triangle]

Exponential Random Graph Model

Idea: Make edge, triangle, two-star totals be sufficient statistics in an exponential family:

p(G) ∝ exp( θ_edges (# edges) + θ_triangles (# triangles) + θ_two-stars (# two-stars) )

(Number of nodes presumed fixed.) More generally,

p(G) ∝ exp( θ'x(G) ).

To get = instead of ∝, we need the normalizing constant c(θ):

p(G) = exp( θ'x(G) ) / c(θ)

The normalizing constant c(θ) is unknown! For 20 nodes, the sum involves ~10^57 terms....
MCMC for Generating Random Networks
[Figure: two panels, True vs. Approximated]
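One way to sample such networks is a Metropolis chain that proposes toggling a single edge and accepts with probability min(1, exp(θ'Δx)), so the unknown c(θ) cancels from the ratio. A sketch with made-up θ values, using edge and triangle statistics only:

```python
import math
import random

n = 8                                   # number of nodes (fixed)
theta_edges, theta_tri = -1.0, 0.5      # hypothetical parameter values
rng = random.Random(0)
adj = [[0] * n for _ in range(n)]       # start from the empty graph

for _ in range(5000):
    i, j = rng.sample(range(n), 2)      # propose toggling edge (i, j)
    # change in sufficient statistics if (i, j) is toggled on:
    d_edges = 1
    d_tri = sum(adj[i][k] and adj[j][k] for k in range(n))  # common neighbors
    if adj[i][j]:                       # toggling off negates the change
        d_edges, d_tri = -d_edges, -d_tri
    log_ratio = theta_edges * d_edges + theta_tri * d_tri
    if rng.random() < min(1.0, math.exp(log_ratio)):        # Metropolis accept
        adj[i][j] = adj[j][i] = 1 - adj[i][j]

edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
```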
Gibbs Sampler
Explore space by updating one coordinate at a time.
Figure 11.3 Four independent sequences of the Gibbs sampler for a bivariate
normal distribution with correlation ρ = 0.8, with overdispersed
starting points indicated by solid squares. (a) First 10 iterations,
showing the component-by-component updating of the Gibbs
iterations. (b) After 500 iterations, the sequences have reached
approximate convergence. Figure (c) shows the iterates from the
second halves of the sequences.
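A sketch of that Gibbs sampler: for a bivariate normal with correlation ρ = 0.8, each full conditional is normal, x1 | x2 ∼ N(ρ x2, 1 − ρ²), and the sampler alternates the two coordinate updates:

```python
import math
import random

rng = random.Random(0)
rho = 0.8
sd = math.sqrt(1 - rho ** 2)    # conditional standard deviation
x1, x2 = 5.0, 5.0               # overdispersed starting point

samples = []
for it in range(5000):
    # component-by-component updating
    x1 = rng.gauss(rho * x2, sd)
    x2 = rng.gauss(rho * x1, sd)
    if it >= 2500:              # keep the second half of the sequence
        samples.append((x1, x2))

m1 = sum(a for a, _ in samples) / len(samples)  # should be near 0
```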
Metropolis-Hastings Algorithm

3. Flip a coin that lands Heads with probability aij, independently of the Markov chain.
4. If the coin lands Heads, accept the proposal and set Xn+1 = j. Otherwise, stay in state i; set Xn+1 = i.

In other words, the modified Markov chain uses the original transition probabilities pij to propose where to go next, then accepts the proposal with probability aij,
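Putting the steps together for a chain on a small finite state space (a sketch: the target s and the uniform proposal are made up, and aij = min(1, sj pji / (si pij)) is the standard Metropolis-Hastings acceptance probability):

```python
import random

s = [0.1, 0.2, 0.3, 0.4]               # hypothetical target distribution
P = [[0.25] * 4 for _ in range(4)]     # proposal chain: uniform over states
rng = random.Random(0)

def mh_step(i):
    j = rng.choices(range(4), weights=P[i])[0]        # propose j ~ p_ij
    a = min(1.0, s[j] * P[j][i] / (s[i] * P[i][j]))   # acceptance prob a_ij
    return j if rng.random() < a else i               # accept or stay at i

counts = [0] * 4
x = 0
for _ in range(20000):
    x = mh_step(x)
    counts[x] += 1
freqs = [c / 20000 for c in counts]    # long-run frequencies approach s
```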
Figure 11.2 Five independent sequences of a Markov chain simulation for the
bivariate unit normal distribution, with overdispersed starting points
indicated by solid squares. (a) After 50 iterations, the sequences are
still far from convergence. (b) After 1000 iterations, the sequences
are nearer to convergence. Figure (c) shows the iterates from the
second halves of the sequences. The points in Figure (c) have been
jittered so that steps in which the random walks stood still are not
hidden. The simulation is a Metropolis algorithm described in the
example on page 290.
yi = β0 + β1 xi + εi

But what about differences between schools?

yi = αj[i] + xi β1 + εi,    αj ∼ N(μ0, σ0²)

yi = αj[i] + xi β1 + εi,    αj = γ0 + γ1 zj + ηj,    ηj ∼ N(0, σ1²)
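The multilevel model can be simulated forward directly; a sketch with made-up values for γ0, γ1, β1 and the variance parameters:

```python
import random

rng = random.Random(0)
J, n_per = 8, 50                        # schools and students per school
gamma0, gamma1, beta1 = 2.0, 0.5, 1.5   # made-up coefficients
sigma_alpha, sigma_y = 1.0, 2.0         # group- and individual-level sds

z = [rng.gauss(0, 1) for _ in range(J)]                 # school-level predictor z_j
alpha = [gamma0 + gamma1 * z[j] + rng.gauss(0, sigma_alpha)
         for j in range(J)]                             # alpha_j = g0 + g1 z_j + eta_j

ys, xs, groups = [], [], []
for j in range(J):
    for _ in range(n_per):
        x = rng.gauss(0, 1)
        ys.append(alpha[j] + beta1 * x + rng.gauss(0, sigma_y))  # y_i
        xs.append(x)
        groups.append(j)
```

Simulating from the model like this is a useful sanity check before fitting it: the group-level scatter of the αj's should reflect σ0, and the within-group scatter should reflect the individual-level variance.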