
Statistical Inference about Markov Chains
Author(s): T. W. Anderson and Leo A. Goodman
Source: The Annals of Mathematical Statistics, Vol. 28, No. 1 (Mar., 1957), pp. 89-110
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2237025

STATISTICAL INFERENCE ABOUT MARKOV CHAINS

T. W. ANDERSON AND LEO A. GOODMAN^1

Columbia University and University of Chicago

Received August 29, 1955; revised October 18, 1956.
^1 This work was carried out under the sponsorship of the Social Science Research Council, The RAND Corporation, and the Statistics Branch, Office of Naval Research.

Summary. Maximum likelihood estimates and their asymptotic distribution are obtained for the transition probabilities in a Markov chain of arbitrary order when there are repeated observations of the chain. Likelihood ratio tests and \chi^2-tests of the form used in contingency tables are obtained for testing the following hypotheses: (a) that the transition probabilities of a first order chain are constant, (b) that in case the transition probabilities are constant, they are specified numbers, and (c) that the process is a uth order Markov chain against the alternative that it is rth but not uth order. In case u = 0 and r = 1, case (c) results in tests of the null hypothesis that observations at successive time points are statistically independent against the alternate hypothesis that observations are from a first order Markov chain. Tests of several other hypotheses are also considered. The statistical analysis in the case of a single observation of a long chain is also discussed. There is some discussion of the relation between likelihood ratio criteria and \chi^2-tests of the form used in contingency tables.

1. Introduction. A Markov chain is sometimes a suitable probability model for certain time series in which the observation at a given time is the category into which an individual falls. The simplest Markov chain is that in which there are a finite number of states or categories, a finite number of equidistant time points at which observations are made, the chain is of first order, and the transition probabilities are the same for each time interval. Such a chain is described by the initial state and the set of transition probabilities; namely, the conditional probability of going into each state, given the immediately preceding state. We shall consider methods of statistical inference for this model when there are many observations in each of the initial states and the same set of transition probabilities operate. For example, one may wish to estimate the transition probabilities or test hypotheses about them. We develop an asymptotic theory for these methods of inference when the number of observations increases. We shall also consider methods of inference for more general models, for example, where the transition probabilities need not be the same for each time interval.

An illustration of the use of some of the statistical methods described herein has been given in detail [2]. The data for this illustration came from a "panel study" on vote intention. Preceding the 1940 presidential election, each of a number of potential voters was asked his party or candidate preference each


month from May to October (6 interviews). At each interview each person was classified as Republican, Democrat, or "Don't Know," the latter being a residual category consisting primarily of people who had not decided on a party or candidate. One of the null hypotheses in the study was that the probability of a voter's intention at one interview depended only on his intention at the immediately preceding interview (first-order case), that such a probability was constant over time (stationarity), and that the same probabilities hold for all individuals. It was of interest to see how the data conformed to this null hypothesis, and also in what specific ways the data differed from this hypothesis.

The present paper develops and extends the theory and the methods given in [1] and [2]. It also presents some newer methods, first mentioned in [9], that are somewhat different from those given in [1] and [2], and explains how to use both the old and the new methods for dealing with more general hypotheses. Some corrections of formulas appearing in [1] and [2] are also given in the present paper. An advantage of some of the new methods presented herein is that, for many users of these methods, their motivation and their application seem to be simpler.

The problem of the estimation of the transition probabilities, and of the testing of goodness of fit and the order of the chain, has been studied by Bartlett [3] and Hoel [10] in the situation where only a single sequence of states is observed; they consider the asymptotic theory as the number of time points increases. We shall discuss this situation in Section 5 of the present paper, where a \chi^2-test of the form used in contingency tables is given for a hypothesis that is a generalization of one considered from the likelihood ratio point of view by Hoel [10].
In the present paper, we present both likelihood ratio criteria and \chi^2-tests, and it is shown how these methods are related to some ordinary contingency table procedures. A discussion of the relation between likelihood ratio tests and \chi^2-tests appears in the final section. For further discussion of Markov chains, the reader is referred to [2] or [7].

2. Estimation of the parameters of a first-order Markov chain.

2.1. The model. Let the states be i = 1, 2, ..., m. Though the state i is usually thought of as an integer running from 1 to m, no actual use is made of this ordered arrangement, so that i might be, for example, a political party, a geographical place, a pair of numbers (a, b), etc. Let the times of observation be t = 0, 1, ..., T. Let p_{ij}(t) (i, j = 1, ..., m; t = 1, ..., T) be the probability of state j at time t, given state i at time t - 1. We shall deal both with (a) stationary transition probabilities (that is, p_{ij}(t) = p_{ij} for t = 1, ..., T) and with (b) nonstationary transition probabilities (that is, where the transition probabilities need not be the same for each time interval). We assume in this section that there are n_i(0) individuals in state i at t = 0. In this section, we treat the n_i(0) as though they were nonrandom, while in Section 4 we shall discuss the case where they are random variables. An observation on a given


individual consists of the sequence of states the individual is in at t = 0, 1, ..., T, namely i(0), i(1), i(2), ..., i(T). Given the initial state i(0), there are m^T possible sequences. These represent mutually exclusive events with probabilities

(2.1)  p_{i(0)i(1)}\, p_{i(1)i(2)} \cdots p_{i(T-1)i(T)}

when the transition probabilities are stationary. (When the transition probabilities are not necessarily stationary, symbols of the form p_{i(t-1)i(t)} should be replaced by p_{i(t-1)i(t)}(t) throughout.)

Let n_{ij}(t) denote the number of individuals in state i at t - 1 and j at t. We shall show that the set of n_{ij}(t) (i, j = 1, ..., m; t = 1, ..., T), a set of m^2 T numbers, form a set of sufficient statistics for the observed sequences. Let n_{i(0)i(1)\cdots i(T)} be the number of individuals whose sequence of states is i(0), i(1), ..., i(T). Then

(2.2)  n_{gj}(t) = \sum n_{i(0)i(1)\cdots i(T)},

where the sum is over all values of the i's with i(t-1) = g and i(t) = j. The probability, in the space describing all sequences for all n individuals (for each initial state there are m^T dimensions), of a given ordered set of sequences for the n individuals is

II
(2.3)
i((.)3i(()

(2) - - - Pi(T-1)i(T)(T)]ni(O)i(1).*. =J(I [pi(o) i(l)(1j)]ni(o) i t1) -i (T)) * * *([pi(T-1)ji(T) [Pi(o)i(l) (1) I
pi()i(2)

i (T)
(T) ]ni

= (
T II t=1

pi(O),(.) (1)ni(o)i(j)(1))

. ..

i (T-1),i(T)

II H

( ?)

i ( 1).*.*. i ( T))

)i(T_ p,(,_,),(T) (T)n i(T-

TI

(t)noi(t) g,j pg,

m; s = O,

two lines are over all values of the T + 1 indices. wherethe productsin the first Thus, the set of numbersnij(t) forma set of sufficient statistics,as announced. of the nij(t) is (2.3) multipliedby an appropriate The actual distribution functionof factorials. Let ni(t - 1) = Z=1 nij(t). Then the conditional , m, given ni(t -1) distribution of nij(t), j = 1, (or given nk(s), k = 1,
*, t 1) is

(2.4)  \frac{n_i(t-1)!}{\prod_{j=1}^{m} n_{ij}(t)!} \prod_{j=1}^{m} p_{ij}(t)^{n_{ij}(t)}.

This is the same distribution as one would obtain if one had n_i(t-1) observations on a multinomial distribution with probabilities p_{ij}(t) and with resulting numbers n_{ij}(t). The distribution of the n_{ij}(t) (conditional on the n_i(0)) is

(2.5)  \prod_{t=1}^{T} \prod_{i=1}^{m} \frac{n_i(t-1)!}{\prod_j n_{ij}(t)!} \prod_j p_{ij}(t)^{n_{ij}(t)}.


For a Markov chain with stationary transition probabilities, a stronger result concerning sufficiency follows from (2.3); namely, the set n_{ij} = \sum_{t=1}^{T} n_{ij}(t) form a set of sufficient statistics. This follows from the fact that, when the transition probabilities are stationary, the probability (2.3) can be written in the form

(2.6)  \prod_{t=1}^{T} \prod_{g,j} p_{gj}^{\,n_{gj}(t)} = \prod_{i,j} p_{ij}^{\,n_{ij}}.

For not necessarily stationary transition probabilities p_{ij}(t), the n_{ij}(t) are a minimal set of sufficient statistics.

2.2. Maximum likelihood estimates. The stationary transition probabilities p_{ij} can be estimated by maximizing the probability (2.6) with respect to the p_{ij}, subject of course to the restrictions

(2.7)  p_{ij} \ge 0, \qquad \sum_{j=1}^{m} p_{ij} = 1, \qquad i = 1, 2, \ldots, m,

when the n_{ij} are the actual observations. This probability is precisely of the same form, except for a factor that does not depend on the p_{ij}, as that obtained from m independent samples, where the ith sample (i = 1, 2, ..., m) consists of n_i^* = \sum_j n_{ij} multinomial trials with probabilities p_{ij} (i, j = 1, 2, ..., m). For such samples, it is well known and easily verified that the maximum likelihood estimates for p_{ij} are
(2.8)  \hat p_{ij} = \sum_{t=1}^{T} n_{ij}(t) \Big/ \sum_{t=0}^{T-1} n_i(t) = \sum_{t=1}^{T} n_{ij}(t) \Big/ \sum_{k=1}^{m} \sum_{t=1}^{T} n_{ik}(t),

and hence this is also true for any other distribution in which the elementary probability is of the same form except for parameter-free factors and the restrictions on the p_{ij} are the same. In particular, it applies to the estimation of the parameters p_{ij} in (2.6).

When the transition probabilities are not necessarily stationary, the general approach used in the preceding paragraph can still be applied, and the maximum likelihood estimates for the p_{ij}(t) are found to be

(2.9)  \hat p_{ij}(t) = n_{ij}(t)/n_i(t-1) = n_{ij}(t) \Big/ \sum_{k=1}^{m} n_{ik}(t).

The same maximum likelihood estimates for the p_{ij}(t) are obtained when we consider the conditional distribution of n_{ij}(t) given n_i(t-1) as when the joint distribution of the n_{ij}(1), n_{ij}(2), ..., n_{ij}(T) is used. Formally these estimates are the same as one would obtain if for each i and t one had n_i(t-1) observations on a multinomial distribution with probabilities p_{ij}(t) and with resulting numbers n_{ij}(t).


The estimates can be described in the following way: Let the entries n_{ij}(t) for given t be entered in a two-way m \times m table. The estimate of p_{ij}(t) is the i, jth entry in the table divided by the sum of the entries in the ith row. In order to estimate p_{ij} for a stationary chain, add the corresponding entries in the two-way tables for t = 1, ..., T, obtaining a two-way table with entries n_{ij} = \sum_t n_{ij}(t). The estimate of p_{ij} is the i, jth entry of the table of n_{ij}'s divided by the sum of the entries in the ith row. The covariance structure of the maximum likelihood estimates presented in this section will be given further on.
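The counting-and-row-normalizing recipe just described is easy to mechanize. The following Python sketch is ours, not the paper's: states are labeled 0, ..., m - 1, and the four short sequences are invented toy data. It computes the pooled table of n_{ij} and the stationary estimates (2.8); passing a particular t instead gives the single-interval counts used by the nonstationary estimates (2.9).

```python
def transition_counts(sequences, m, t=None):
    """n_ij pooled over all time steps (t is None), or n_ij(t) for a single
    step t >= 1.  States are 0..m-1; each sequence lists the states an
    individual occupies at times 0, 1, ..., T."""
    counts = [[0] * m for _ in range(m)]
    for seq in sequences:
        steps = range(1, len(seq)) if t is None else [t]
        for s in steps:
            counts[seq[s - 1]][seq[s]] += 1
    return counts

def mle_transition_matrix(counts):
    """Row-normalize the counts, as in (2.8)/(2.9): p_hat_ij = n_ij / sum_k n_ik.
    A row with no observations is left as all zeros."""
    p_hat = []
    for row in counts:
        n_i = sum(row)
        p_hat.append([c / n_i if n_i else 0.0 for c in row])
    return p_hat

# Toy data: 4 individuals observed at t = 0, 1, 2, 3 over m = 2 states.
seqs = [[0, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 1]]
pooled = transition_counts(seqs, m=2)       # the table of n_ij
print(pooled)                               # → [[3, 3], [3, 3]]
print(mle_transition_matrix(pooled))        # → [[0.5, 0.5], [0.5, 0.5]]
```

Each row of the returned matrix is the estimated conditional distribution out of one state, mirroring the row-division described above.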

2.3. Asymptotic behavior of n_{ij}(t). To find the asymptotic behavior of the \hat p_{ij}, first consider the n_{ij}(t). We shall assume that n_k(0)/\sum_j n_j(0) \to \eta_k (\eta_k > 0, \sum_k \eta_k = 1) as \sum_j n_j(0) \to \infty. For each i(0), the set n_{i(0)i(1)\cdots i(T)} are simply multinomial variables with sample size n_{i(0)}(0) and parameters p_{i(0)i(1)}\, p_{i(1)i(2)} \cdots p_{i(T-1)i(T)}, and hence are asymptotically normally distributed as the sample size increases. The n_{ij}(t) are linear combinations of these multinomial variables, and hence are also asymptotically normally distributed.

Let P = (p_{ij}) and let p_{ij}^{(t)} be the elements of the matrix P^t. Then p_{ij}^{(t)} is the probability of state j at time t given state i at time 0. Let n_{k;ij}(t) be the number of sequences including state k at time 0, i at time t - 1 and j at time t. Then we seek the low order moments of

(2.10)  n_{ij}(t) = \sum_{k=1}^{m} n_{k;ij}(t).

The probability associated with n_{k;ij}(t) is p_{ki}^{(t-1)} p_{ij} with a sample size of n_k(0). Thus

(2.11)  E\, n_{k;ij}(t) = n_k(0)\, p_{ki}^{(t-1)} p_{ij},

(2.12)  \operatorname{Var}\{n_{k;ij}(t)\} = n_k(0)\, p_{ki}^{(t-1)} p_{ij} \left[ 1 - p_{ki}^{(t-1)} p_{ij} \right],

(2.13)  \operatorname{Cov}\{n_{k;ij}(t), n_{k;gh}(t)\} = -n_k(0)\, p_{ki}^{(t-1)} p_{ij}\, p_{kg}^{(t-1)} p_{gh}, \qquad (i, j) \ne (g, h),

since the set of n_{k;ij}(t) follows a multinomial distribution. Covariances between other variables were given in [1].

Let us now examine moments of n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}, where n_{k;i}(t-1) = \sum_j n_{k;ij}(t); they will be needed in obtaining the asymptotic theory for test procedures. The conditional distribution of n_{k;ij}(t) given n_{k;i}(t-1) is easily seen to be multinomial, with the probabilities p_{ij}. Thus,

(2.14)  E\, n_{k;ij}(t) = E\, E\{n_{k;ij}(t) \mid n_{k;i}(t-1)\} = E\, p_{ij}\, n_{k;i}(t-1),

(2.15)  E[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}] = E\, E\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}] \mid n_{k;i}(t-1)\} = 0.


The variance of this quantity is

(2.16)  E[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}]^2 = E\, E\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}]^2 \mid n_{k;i}(t-1)\} = E\, n_{k;i}(t-1)\, p_{ij}(1 - p_{ij}) = n_k(0)\, p_{ki}^{(t-1)}\, p_{ij}(1 - p_{ij}).

The covariances of pairs of such quantities are

(2.17)  E[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;ih}(t) - n_{k;i}(t-1)p_{ih}]
        = E\, E\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;ih}(t) - n_{k;i}(t-1)p_{ih}] \mid n_{k;i}(t-1)\}
        = E[-n_{k;i}(t-1)\, p_{ij}\, p_{ih}] = -n_k(0)\, p_{ki}^{(t-1)}\, p_{ij}\, p_{ih}, \qquad j \ne h,

(2.18)  E[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;gh}(t) - n_{k;g}(t-1)p_{gh}]
        = E\, E\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;gh}(t) - n_{k;g}(t-1)p_{gh}] \mid n_{k;i}(t-1), n_{k;g}(t-1)\} = 0, \qquad i \ne g,

(2.19)  E[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}][n_{k;gh}(t+r) - n_{k;g}(t+r-1)p_{gh}]
        = E\{[n_{k;ij}(t) - n_{k;i}(t-1)p_{ij}]\, E[n_{k;gh}(t+r) - n_{k;g}(t+r-1)p_{gh} \mid n_{k;g}(t+r-1), n_{k;i}(t-1), n_{k;ij}(t)]\} = 0, \qquad r > 0.

To summarize, the random variables n_{k;ij}(t) - n_{k;i}(t-1)p_{ij} for j = 1, ..., m have means 0 and variances and covariances of multinomial variables with probabilities p_{ij} and sample size n_k(0)\, p_{ki}^{(t-1)}. The variables n_{k;ij}(t) - n_{k;i}(t-1)p_{ij} and n_{k;gh}(s) - n_{k;g}(s-1)p_{gh} are uncorrelated if t \ne s or i \ne g. Since we assume the n_k(0) fixed, n_{k;ij}(t) and n_{l;gh}(t) are independent if k \ne l. Thus

(2.20)  E[n_{ij}(t) - n_i(t-1)p_{ij}] = 0,

(2.21)  E[n_{ij}(t) - n_i(t-1)p_{ij}]^2 = \sum_{k=1}^{m} n_k(0)\, p_{ki}^{(t-1)}\, p_{ij}(1 - p_{ij}),

(2.22)  E[n_{ij}(t) - n_i(t-1)p_{ij}][n_{ih}(t) - n_i(t-1)p_{ih}] = -\sum_{k=1}^{m} n_k(0)\, p_{ki}^{(t-1)}\, p_{ij}\, p_{ih}, \qquad j \ne h,

(2.23)  E[n_{ij}(t) - n_i(t-1)p_{ij}][n_{gh}(s) - n_g(s-1)p_{gh}] = 0, \qquad t \ne s \text{ or } i \ne g.


2.4. The asymptotic distribution of the estimates. It will now be shown that

(2.24)  \sqrt{n}\,(\hat p_{ij} - p_{ij}) = \frac{\sqrt{n} \sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]}{\sum_{t=1}^{T} n_i(t-1)} = \frac{\sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]/\sqrt{n}}{\frac{1}{n}\sum_{t=1}^{T} n_i(t-1)}

has a limiting normal distribution as n \to \infty, and the means, variances and covariances of the limiting distribution will be found. Because n_{k;ij}(t) is a multinomial variable, n_{k;ij}(t)/n_k(0) converges in probability to its expected value; thus, when n_k(0)/n \to \eta_k,

(2.25)  \operatorname{plim}_{n\to\infty} n_{k;i}(t-1)/n = \eta_k\, p_{ki}^{(t-1)},

(2.26)  \operatorname{plim}_{n\to\infty} \frac{1}{n} \sum_{t=1}^{T} n_i(t-1) = \sum_{k=1}^{m} \eta_k \sum_{t=1}^{T} p_{ki}^{(t-1)}.

Therefore n^{1/2}(\hat p_{ij} - p_{ij}) has the same limiting distribution as

(2.27)  \sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]/n^{1/2} \Big/ \sum_{k=1}^{m} \eta_k \sum_{t=1}^{T} p_{ki}^{(t-1)}

(see p. 254 in [6]). From the conclusions in Section 2.3, the numerator of (2.27) has mean 0 and variance

(2.28)  E\Big[ \sum_{t=1}^{T} [n_{ij}(t) - p_{ij}\, n_i(t-1)]/\sqrt{n} \Big]^2 = \sum_{t=1}^{T} \sum_{k=1}^{m} \frac{n_k(0)}{n}\, p_{ki}^{(t-1)}\, p_{ij}(1 - p_{ij}).

The covariance between two different numerators is

(2.29)  E\Big[ \sum_{t=1}^{T} \frac{n_{ij}(t) - p_{ij}\, n_i(t-1)}{\sqrt{n}} \Big]\Big[ \sum_{t=1}^{T} \frac{n_{gh}(t) - p_{gh}\, n_g(t-1)}{\sqrt{n}} \Big] = -\delta_{ig} \sum_{t=1}^{T} \sum_{k=1}^{m} \frac{n_k(0)}{n}\, p_{ki}^{(t-1)}\, p_{ij}\, p_{gh},

where \delta_{ig} = 0 if i \ne g and \delta_{ii} = 1. Let

(2.30)  \phi_i = \lim_{n\to\infty} \sum_{k=1}^{m} \sum_{t=1}^{T} \frac{n_k(0)}{n}\, p_{ki}^{(t-1)} = \sum_{k=1}^{m} \eta_k \sum_{t=1}^{T} p_{ki}^{(t-1)}.

Then the limiting variance of the numerator of (2.27) is \phi_i\, p_{ij}(1 - p_{ij}), and the limiting covariance between two different numerators is -\delta_{ig}\, \phi_i\, p_{ij}\, p_{gh}. Because the numerators of (2.27) are linear combinations of normalized multinomial variables, with fixed probabilities and increasing sample size, they have a limiting normal distribution, and the variances and covariances of this limit distribution are the limits of the respective variances and covariances (see, e.g., Theorem 2, p. 5 in [4]).

Since n^{1/2}(\hat p_{ij} - p_{ij}) has the same limit distribution as (2.27), the variables n^{1/2}(\hat p_{ij} - p_{ij}) have a limiting joint normal distribution with means 0, variances p_{ij}(1 - p_{ij})/\phi_i and covariances -\delta_{ig}\, p_{ij}\, p_{gh}/\phi_i. The variables (n\phi_i)^{1/2}(\hat p_{ij} - p_{ij}) have a limiting joint normal distribution with means 0, variances p_{ij}(1 - p_{ij}) and covariances -\delta_{ig}\, p_{ij}\, p_{gh}. Also, the set (n_i^*)^{1/2}(\hat p_{ij} - p_{ij}) has a limiting joint normal distribution with means 0, variances p_{ij}(1 - p_{ij}) and covariances -\delta_{ig}\, p_{ij}\, p_{gh}, where n_i^* = \sum_{t=0}^{T-1} n_i(t).

In other terms, the set (n\phi_i)^{1/2}(\hat p_{ij} - p_{ij}) for a given i has the same limiting distribution as the estimates of multinomial probabilities p_{ij} with sample size n\phi_i, which is the expected total number of observations in the ith state for t = 0, ..., T - 1. The variables (n\phi_i)^{1/2}(\hat p_{ij} - p_{ij}) for different values of i (i = 1, 2, ..., m) are asymptotically independent (i.e., the limiting joint distribution factors), and hence have the same limiting joint distribution as obtained from similar functions of the estimates of multinomial probabilities p_{ij} from m independent samples with sample sizes n\phi_i (i = 1, 2, ..., m). It will often be possible to reformulate hypotheses about the p_{ij} in terms of m independent samples consisting of multinomial trials.

We shall also make use of the fact that the variables \hat p_{ij}(t) = n_{ij}(t)/n_i(t-1) for a given i and t have the same asymptotic distribution as the estimates of multinomial probabilities with sample sizes E\, n_i(t-1), and the variables \hat p_{ij}(t) for two different values of i or two different values of t are asymptotically independent. This fact can be proved by methods similar to those used earlier in this section. Hence, in testing hypotheses concerning the p_{ij}(t) it will sometimes be possible to reformulate the hypotheses in terms of m \times T independent samples consisting of multinomial trials, and standard test procedures may then be applied.
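Since, by the result above, (n_i^*)^{1/2}(\hat p_{ij} - p_{ij}) behaves like a normalized multinomial estimate, approximate standard errors for the \hat p_{ij} can be read off as [\hat p_{ij}(1 - \hat p_{ij})/n_i^*]^{1/2}. A minimal Python sketch of this plug-in computation (our illustration, not the paper's; the count table is invented):

```python
import math

def transition_se(counts):
    """Plug-in large-sample standard errors for the MLEs p_hat_ij, using the
    multinomial analogy of Section 2.4: Var(p_hat_ij) is approximately
    p_ij(1 - p_ij)/n_i*, with n_i* the observed number of transitions out of
    state i (every row is assumed to have at least one observation)."""
    ses = []
    for row in counts:
        n_i = sum(row)
        ses.append([math.sqrt((c / n_i) * (1 - c / n_i) / n_i) for c in row])
    return ses

# Pooled n_ij table with 80 observed transitions out of each state.
print(transition_se([[60, 20], [30, 50]]))
```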

3. Tests of hypotheses and confidence regions.

3.1. Tests of hypotheses about specific probabilities and confidence regions. On the basis of the asymptotic distribution theory in the preceding section, we can derive certain methods of statistical inference. Here we shall assume that every p_{ij} > 0.

First we consider testing the hypothesis that certain transition probabilities


p_{ij} have specified values p_{ij}^0. We make use of the fact that under the null hypothesis the (n_i^*)^{1/2}(\hat p_{ij} - p_{ij}^0) have a limiting normal distribution with means zero, and variances and covariances depending on the p_{ij}^0 in the same way as obtains for multinomial estimates. We can use standard asymptotic theory for multinomial or normal distributions to test a hypothesis about one or more p_{ij}, or determine a confidence region for one or more p_{ij}.

As a specific example, consider testing the hypothesis that p_{ij} = p_{ij}^0, j = 1, ..., m, for a given i. Under the null hypothesis,

(3.1)  \sum_{j=1}^{m} n_i^* (\hat p_{ij} - p_{ij}^0)^2 / p_{ij}^0

has an asymptotic \chi^2-distribution with m - 1 degrees of freedom (according to the usual asymptotic theory of multinomial variables). Thus the critical region of one test of this hypothesis at significance level \alpha consists of the set \hat p_{ij} for which (3.1) is greater than the \alpha significance point of the \chi^2-distribution with m - 1 degrees of freedom. A confidence region of confidence coefficient \alpha consists of the set p_{ij}^0 for which (3.1) is less than the \alpha significance point. (The p_{ij}^0 in the denominator can be replaced by \hat p_{ij}.) Since the variables \hat p_{ij} for different i are asymptotically independent, the forms (3.1) for different i are asymptotically independent, and hence can be added to obtain other \chi^2-variables. For instance, a test for all p_{ij} (i, j = 1, 2, ..., m) can be obtained by adding (3.1) over all i, resulting in a \chi^2-variable with m(m - 1) degrees of freedom.

The use of the \chi^2-test of goodness of fit is discussed in [5]. We believe that there is as good reason for adopting the tests described in this section, which are analogous to \chi^2-tests of goodness of fit, as in the situation from which they were borrowed (see [5]).

3.2. Testing the hypothesis that the transition probabilities are constant. In the stationary Markov chain, p_{ij} is the probability that an individual in state i at time t - 1 moves to state j at t. A general alternative to this assumption is that the transition probability depends on t; let us say it is p_{ij}(t). We test the null hypothesis H: p_{ij}(t) = p_{ij} (t = 1, ..., T). Under the alternate hypothesis, the estimates of the transition probabilities for time t are

(3.2)  \hat p_{ij}(t) = n_{ij}(t)/n_i(t-1).

The likelihood function maximized under the null hypothesis is

(3.3)  \prod_{t=1}^{T} \prod_{i,j} \hat p_{ij}^{\,n_{ij}(t)} = \prod_{i,j} \hat p_{ij}^{\,n_{ij}}.

The likelihood function maximized under the alternative is

(3.4)  \prod_{t=1}^{T} \prod_{i,j} \hat p_{ij}(t)^{n_{ij}(t)}.


The ratio is the likelihood ratio criterion

(3.5)  \lambda = \prod_{t=1}^{T} \prod_{i,j} \left[ \hat p_{ij} / \hat p_{ij}(t) \right]^{n_{ij}(t)}.

A slight extension of a theorem of Cramér [6] or of Neyman [11] shows that -2 \log \lambda is distributed as \chi^2 with (T - 1)[m(m - 1)] degrees of freedom when the null hypothesis is true.

The likelihood ratio (3.5) resembles likelihood ratios obtained for standard tests of homogeneity in contingency tables (see [6], p. 445). We shall now develop further this similarity to usual procedures for contingency tables. A proof that the results obtained by this contingency table approach are asymptotically equivalent to those presented earlier in this section will be given in Section 6.

For a given i, the set \hat p_{ij}(t) has the same asymptotic distribution as the estimates of multinomial probabilities p_{ij}(t) for T independent samples. An m \times T table, which has the same formal appearance as a contingency table, can be used to represent the estimates \hat p_{ij}(t) for a given i and for j = 1, 2, ..., m and t = 1, 2, ..., T:

  t \ j |       1               2         ...        m
  ------+--------------------------------------------------------
    1   |  \hat p_{i1}(1)  \hat p_{i2}(1)  ...  \hat p_{im}(1)
    2   |  \hat p_{i1}(2)  \hat p_{i2}(2)  ...  \hat p_{im}(2)
   ...  |      ...             ...         ...       ...
    T   |  \hat p_{i1}(T)  \hat p_{i2}(T)  ...  \hat p_{im}(T)

The hypothesis of interest is that the random variables represented by the T rows have the same distribution, so that the data are homogeneous in this respect. This is equivalent to the hypothesis that there are m constants p_{i1}, p_{i2}, ..., p_{im}, with \sum_j p_{ij} = 1, such that the probability associated with the jth column is equal to p_{ij} in all T rows; that is, p_{ij}(t) = p_{ij} for t = 1, 2, ..., T.

The \chi^2-test of homogeneity seems appropriate here ([6], p. 445); that is, in order to test this hypothesis, we calculate

(3.6)  \chi_i^2 = \sum_{t,j} n_i(t-1) \left[ \hat p_{ij}(t) - \hat p_{ij} \right]^2 / \hat p_{ij}.

If the null hypothesis is true, \chi_i^2 has the usual limiting distribution with (m - 1)(T - 1) degrees of freedom.

Another test of the hypothesis of homogeneity for T independent samples from multinomial trials can be obtained by use of the likelihood ratio criterion; that is, in order to test this hypothesis for the data given in the m \times T table, calculate

(3.7)  \lambda_i = \prod_{t,j} \left[ \hat p_{ij} / \hat p_{ij}(t) \right]^{n_{ij}(t)},

which is formally similar to the likelihood ratio criterion (3.5). The asymptotic distribution of -2 \log \lambda_i is \chi^2 with (m - 1)(T - 1) degrees of freedom.
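In Python, the statistic (3.6), computed for each initial state i and summed over i (the summed form has m(m - 1)(T - 1) degrees of freedom), can be sketched as follows. This is our illustration, not the paper's: the input is a list of T per-interval m × m transition-count tables, and every row of the pooled table is assumed nonempty.

```python
def stationarity_chi2(counts_by_t):
    """Chi-square test of H0: p_ij(t) = p_ij.  counts_by_t[t-1][i][j] holds
    n_ij(t); returns the statistic summed over all i, together with its
    m(m-1)(T-1) degrees of freedom."""
    T = len(counts_by_t)
    m = len(counts_by_t[0])
    pooled = [[sum(tab[i][j] for tab in counts_by_t) for j in range(m)]
              for i in range(m)]
    p_hat = [[pooled[i][j] / sum(pooled[i]) for j in range(m)]
             for i in range(m)]                      # the estimates (2.8)
    chi2 = 0.0
    for tab in counts_by_t:
        for i in range(m):
            n_i = sum(tab[i])                        # n_i(t-1)
            for j in range(m):
                if n_i and p_hat[i][j]:
                    d = tab[i][j] / n_i - p_hat[i][j]
                    chi2 += n_i * d * d / p_hat[i][j]
    return chi2, m * (m - 1) * (T - 1)

# Two identical intervals: no evidence against stationarity.
tabs = [[[30, 10], [20, 40]], [[30, 10], [20, 40]]]
print(stationarity_chi2(tabs))   # → (0.0, 2)
```

Large values of the statistic relative to the \chi^2 significance point with the stated degrees of freedom argue against constancy of the transition probabilities.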


The preceding remarks relating to the contingency table approach dealt with a given value of i. Hence, the hypothesis can be tested separately for each value of i.

Let us now consider the joint hypothesis that p_{ij}(t) = p_{ij} for all i = 1, 2, ..., m, j = 1, 2, ..., m, t = 1, ..., T. A test of this joint null hypothesis follows directly from the fact that the random variables \hat p_{ij}(t) and \hat p_{ij} for two different values of i are asymptotically independent. Hence, under the null hypothesis, the \chi_i^2 calculated for each i = 1, 2, ..., m are asymptotically independent, and the sum

(3.8)  \chi^2 = \sum_{i=1}^{m} \chi_i^2 = \sum_{i=1}^{m} \sum_{t,j} n_i(t-1) \left[ \hat p_{ij}(t) - \hat p_{ij} \right]^2 / \hat p_{ij}

has the usual limiting distribution with m(m - 1)(T - 1) degrees of freedom. Similarly, the test criterion based on (3.5) can be written

(3.9)  \sum_{i=1}^{m} (-2 \log \lambda_i) = -2 \log \lambda.

3.3. Test of the hypothesis that the chain is of a given order. Consider first a second-order Markov chain. Given that an individual is in state i at t - 2 and in j at t - 1, let p_{ijk}(t) (i, j, k = 1, ..., m; t = 2, 3, ..., T) be the probability of being in state k at t. When the second-order chain is stationary, p_{ijk}(t) = p_{ijk} for t = 2, ..., T. A first-order stationary chain is a special second-order chain, one for which p_{ijk}(t) does not depend on i. On the other hand, as is well known, the second-order chain can be represented as a more complicated first-order chain (see, e.g., [2]). To do this, let the pair of successive states i and j define a composite state (i, j). Then the probability of the composite state (j, k) at t given the composite state (i, j) at t - 1 is p_{ijk}(t). Of course, the probability of state (h, k), h \ne j, given (i, j), is zero. The composite states are easily seen to form a chain with m^2 states and with certain transition probabilities 0. This representation is useful because some of the results for first-order Markov chains can be carried over from Section 2.

Now let n_{ijk}(t) be the number of individuals in state i at t - 2, in j at t - 1, and in k at t, and let n_{ij}(t-1) = \sum_k n_{ijk}(t). We assume in this section that the n_i(0) and n_{ij}(1) are nonrandom, extending the idea of the earlier sections, where the n_i(0) were nonrandom and the n_{ij}(1) were random variables. The set n_{ijk}(t) (i, j, k = 1, ..., m; t = 2, ..., T) is a set of sufficient statistics for the different sequences of states. The conditional distribution of n_{ijk}(t), given the n_{ij}(t-1), is

(3.10)  \frac{n_{ij}(t-1)!}{\prod_k n_{ijk}(t)!} \prod_k p_{ijk}(t)^{n_{ijk}(t)}.

(When the transition probabilities need not be the same for each time interval, the symbols p_{ijk} should, of course, be replaced by the appropriate p_{ijk}(t) throughout.) The joint distribution of n_{ijk}(t) for i, j, k = 1, ..., m and t = 2, ..., T, when the set of n_{ij}(1) is given, is the product of (3.10) over i, j and t.

For chains with stationary transition probabilities, a stronger result concerning sufficiency can be obtained as it was for first-order chains; namely, the numbers n_{ijk} = \sum_{t=2}^{T} n_{ijk}(t) form a set of sufficient statistics. The maximum likelihood estimate of p_{ijk} for stationary chains is

(3.11)  \hat p_{ijk} = \sum_{t=2}^{T} n_{ijk}(t) \Big/ \sum_{t=2}^{T} n_{ij}(t-1) = n_{ijk} \Big/ \sum_{l=1}^{m} n_{ijl}.

Now let us consider testing the null hypothesis that the chain is first-order against the alternative that it is second-order. The null hypothesis is that p_{1jk} = p_{2jk} = \cdots = p_{mjk} = p_{jk}, say, for j, k = 1, ..., m. The likelihood ratio criterion for testing this hypothesis is^2

(3.12)  \lambda = \prod_{i,j,k=1}^{m} \left( \hat p_{jk} / \hat p_{ijk} \right)^{n_{ijk}},

where

(3.13)  \hat p_{jk} = \sum_{i=1}^{m} n_{ijk} \Big/ \sum_{i=1}^{m} \sum_{l=1}^{m} n_{ijl} = \sum_{t=2}^{T} n_{jk}(t) \Big/ \sum_{t=1}^{T-1} n_j(t)

is the maximum likelihood estimate of p_{jk}. We see here that \hat p_{jk} differs somewhat from (2.8). This difference is due to the fact that in the earlier section the variables n_{ij}(1) were random, while in this section we assumed that the n_{ij}(1) were nonrandom. Under the null hypothesis, -2 \log \lambda has an asymptotic \chi^2-distribution with m^2(m - 1) - m(m - 1) = m(m - 1)^2 degrees of freedom.

We observe that the likelihood ratio (3.12) resembles likelihood ratios obtained for problems relating to contingency tables. We shall now develop further this similarity to standard procedures for contingency tables.

For a given j, the n_{ij}^{*1/2}(\hat p_{ijk} - p_{ijk}) have the same asymptotic distribution as the estimates of multinomial probabilities for m independent samples (i = 1, 2, ..., m). An m \times m table, which has the same formal appearance as a contingency table, can be used to represent the estimates \hat p_{ijk} for a given j and for i, k = 1, 2, ..., m. The null hypothesis is that p_{ijk} = p_{jk} for i = 1, 2, ..., m, and the \chi^2-test of homogeneity seems appropriate. To test this hypothesis, calculate

(3.14)  \chi_j^2 = \sum_{i,k} n_{ij}^* \left( \hat p_{ijk} - \hat p_{jk} \right)^2 / \hat p_{jk},

where

(3.15)  n_{ij}^* = \sum_{t=2}^{T} \sum_k n_{ijk}(t) = \sum_{t=2}^{T} n_{ij}(t-1) = \sum_{t=1}^{T-1} n_{ij}(t).

If the hypothesis is true, \chi_j^2 has the usual limiting distribution with (m - 1)^2 degrees of freedom.

^2 The criterion (3.12) was written incorrectly in (6.35) of [1] and (4.10) of [2].


In continued analogy with Section 3.2, another test of the hypothesis of homogeneity for m independent samples from multinomial trials can be obtained by use of the likelihood ratio criterion. We calculate

(3.16)  \lambda_j = \prod_{i,k} \left( \hat p_{jk} / \hat p_{ijk} \right)^{n_{ijk}},

which is formally similar to the likelihood ratio criterion (3.12). The asymptotic distribution of -2 \log \lambda_j is \chi^2 with (m - 1)^2 degrees of freedom.

The preceding remarks relating to the contingency table approach dealt with a given value of j. Hence, the hypothesis can be tested separately for each value of j.

Let us now consider the joint hypothesis that p_{ijk} = p_{jk} for all i, j, k = 1, 2, ..., m. A test of this joint hypothesis can be obtained by computing the sum

(3.17)  \chi^2 = \sum_{j=1}^{m} \chi_j^2 = \sum_{j,i,k} n_{ij}^* \left( \hat p_{ijk} - \hat p_{jk} \right)^2 / \hat p_{jk},

which has the usual limiting distribution with m(m - 1)^2 degrees of freedom. Similarly, the test criterion based on (3.12) can be written

(3.18)  \sum_{j=1}^{m} (-2 \log \lambda_j) = -2 \log \lambda = 2 \sum_{i,j,k} n_{ijk} \log \left[ \hat p_{ijk} / \hat p_{jk} \right] = 2 \sum_{i,j,k} n_{ijk} \left[ \log \hat p_{ijk} - \log \hat p_{jk} \right].
The preceding remarks can be directly generalized for a chain of order r. Let p_{ij\cdots kl} (i, j, ..., k, l = 1, 2, ..., m) denote the transition probability of state l at time t, given state k at time t - 1, ..., state j at time t - r + 1, and state i at time t - r (t = r, r + 1, ..., T). We shall test the null hypothesis that the process is a chain of order r - 1 (that is, p_{ij\cdots kl} = p_{j\cdots kl} for i = 1, 2, ..., m) against the alternate hypothesis that it is not an (r - 1)- but an r-order chain.

Let n_{ij\cdots kl}(t) denote the observed frequency of the states i, j, ..., k, l at the respective times t - r, t - r + 1, ..., t, and let n_{ij\cdots k}(t-1) = \sum_l n_{ij\cdots kl}(t). We assume here that the n_{ij\cdots k}(r-1) are nonrandom. The maximum likelihood estimate of p_{ij\cdots kl} is

(3.19)  \hat p_{ij\cdots kl} = n_{ij\cdots kl} / n^*_{ij\cdots k},

where n_{ij\cdots kl} = \sum_{t=r}^{T} n_{ij\cdots kl}(t) and

(3.20)  n^*_{ij\cdots k} = \sum_{t=r}^{T} n_{ij\cdots k}(t-1) = \sum_{t=r-1}^{T-1} n_{ij\cdots k}(t).

For a given set j, ..., k, the set \hat p_{ij\cdots kl} will have the same asymptotic distribution as estimates of multinomial probabilities for m independent samples (i = 1, 2, ..., m), and may be represented by an m \times m table. If the null hypothesis (p_{ij\cdots kl} = p_{j\cdots kl} for i = 1, 2, ..., m) is true, then the \chi^2-test of homogeneity seems appropriate, and

(3.21)  \chi^2_{j\cdots k} = \sum_{i,l} n^*_{ij\cdots k} \left( \hat p_{ij\cdots kl} - \hat p_{j\cdots kl} \right)^2 / \hat p_{j\cdots kl},

where

(3.22)  \hat p_{j\cdots kl} = \sum_i n_{ij\cdots kl} \Big/ \sum_i n^*_{ij\cdots k} = \sum_{t=r}^{T} n_{j\cdots kl}(t) \Big/ \sum_{t=r-1}^{T-1} n_{j\cdots k}(t),

has the usual limiting distribution with (m - 1)^2 degrees of freedom. We see here that \hat p_{j\cdots kl} differs somewhat from the maximum likelihood estimate for an (r - 1)-order chain (viz., \sum_{t=r-1}^{T} n_{j\cdots kl}(t) / \sum_{t=r-2}^{T-1} n_{j\cdots k}(t)). This difference is due to the fact that the n_{j\cdots kl}(r-1), for an (r - 1)-order chain, are assumed to be multinomial random variables with parameters p_{j\cdots kl}, while in this paragraph we have assumed that the n_{j\cdots kl}(r-1) are fixed.

Since there are m^{r-1} sets j, ..., k (j = 1, 2, ..., m; ...; k = 1, 2, ..., m), the sum \sum_{j,\cdots,k} \chi^2_{j\cdots k} will have the usual limiting distribution with m^{r-1}(m - 1)^2 degrees of freedom under the joint null hypothesis (p_{ij\cdots kl} = p_{j\cdots kl} for i = 1, 2, ..., m and all values from 1 to m of j, ..., k).

Another test of the null hypothesis can be obtained by use of the likelihood ratio criterion

(3.23)  \lambda_{j\cdots k} = \prod_{i,l} \left( \hat p_{j\cdots kl} / \hat p_{ij\cdots kl} \right)^{n_{ij\cdots kl}},

where -2 \log \lambda_{j\cdots k} is asymptotically distributed as \chi^2 with (m - 1)^2 degrees of freedom. Also,

(3.24)  \sum_{j,\cdots,k} \{-2 \log \lambda_{j\cdots k}\} = 2 \sum_{i,j,\cdots,k,l} n_{ij\cdots kl} \log \left( \hat p_{ij\cdots kl} / \hat p_{j\cdots kl} \right)

has a limiting \chi^2-distribution with m^{r-1}(m - 1)^2 degrees of freedom when the joint null hypothesis is true (see [10]).

In the special case where r = 1, the test is of the null hypothesis that observations at successive time points are statistically independent against the alternate hypothesis that observations are from a first-order chain.

The reader will note that the method used to test the null hypothesis that the process is a chain of order r - 1 against the alternate hypothesis that it is of order r can be generalized to test the null hypothesis that the process is of order u against the alternate hypothesis that it is of order r (u < r). By an approach similar to that presented earlier in this section, we can compute the \chi^2-criterion or -2 times the logarithm of the likelihood ratio and observe that these statistics are asymptotically distributed as \chi^2 with [m^r - m^u](m - 1) degrees of freedom when the null hypothesis is true.

In this section, we have assumed that the transition probabilities are the same for each time interval, that is, stationary. It is possible to test the null hypothesis that the rth order chain has stationary transition probabilities

MARKOV CHAINS

103

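As an editorial illustration (the code is not part of the original paper), the order test just described can be sketched for the simplest case u = 0, r = 1, where the criterion 2 \sum n_{ij} \log(\hat p_{ij}/\hat p_j) has (m - 1)^2 degrees of freedom. The function names are our own:

```python
import numpy as np

def order_test_counts(seq, m):
    """Tally first-order transition counts n_ij from one observed sequence."""
    counts = np.zeros((m, m))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts

def lr_order_test(counts):
    """-2 log lambda for H0: independence (order 0) vs H1: first order.

    counts[i, j] = number of observed transitions i -> j.  Returns the
    statistic 2 * sum n_ij log(p_hat_ij / p_hat_j) and its degrees of
    freedom (m - 1)^2, i.e. the u = 0, r = 1 case of [m^r - m^u](m - 1).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row = counts.sum(axis=1, keepdims=True)          # n_i.
    col = counts.sum(axis=0, keepdims=True)          # n_.j
    p_cond = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    p_marg = col / n                                  # p_hat_j under H0
    with np.errstate(divide="ignore", invalid="ignore"):
        # cells with zero count contribute nothing to the sum
        terms = np.where(counts > 0, counts * np.log(p_cond / p_marg), 0.0)
    m = counts.shape[0]
    return 2.0 * terms.sum(), (m - 1) ** 2
```

When the conditional distributions coincide with the marginal one the statistic is exactly zero; strongly diagonal counts drive it far into the tail of the \chi^2 reference distribution.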
using methods that are straightforward generalizations of the tests presented in the previous section for the special case of a first-order chain.

3.4. Test of the hypothesis that several samples are from the same Markov chain of a given order. The general approach presented in the previous sections can be used to test the null hypothesis that s (s \geq 2) samples are from the same rth order Markov chain; that is, that the s processes are identical. Let \hat p^{(h)}_{ij...kl} = n^{(h)}_{ij...kl} / n^{(h)}_{ij...k} denote the maximum likelihood estimate of the rth order transition probability p^{(h)}_{ij...kl} for the process from which sample h (h = 1, 2, ..., s) was obtained. We wish to test the null hypothesis that p^{(h)}_{ij...kl} = p_{ij...kl} for h = 1, 2, ..., s. Using the approach presented herein, it follows that

(3.25)  \chi^2_{ij...k} = \sum_{h,l} n^{(h)}_{ij...k} (\hat p^{(h)}_{ij...kl} - \hat p_{ij...kl})^2 / \hat p_{ij...kl},

where n_{ij...k} = \sum_h n^{(h)}_{ij...k} and \hat p_{ij...kl} = \sum_h n^{(h)}_{ij...kl} / \sum_h n^{(h)}_{ij...k}, has the usual limiting distribution with (s - 1)(m - 1) degrees of freedom. Also, \sum_{i,j,...,k} \chi^2_{ij...k} has a limiting \chi^2-distribution with m^r(s - 1)(m - 1) degrees of freedom. When s = 2, \chi^2_{ij...k} can be rewritten in the form

(3.26)  \chi^2_{ij...k} = \sum_l (\hat p^{(1)}_{ij...kl} - \hat p^{(2)}_{ij...kl})^2 / (\hat p_{ij...kl} \hat c_{ij...k}),

where \hat p_{ij...kl} is the estimate of p_{ij...kl} obtained by pooling the data in the two samples, and \hat c_{ij...k} = 1/n^{(1)}_{ij...k} + 1/n^{(2)}_{ij...k}. Also, \sum_{i,j,...,k} \chi^2_{ij...k} has the usual limiting distribution with m^r(m - 1) degrees of freedom in the two-sample case. Analogous results can also be obtained using the likelihood-ratio criterion.
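The two-sample form (3.26) can be sketched as follows for a first-order chain (r = 1), where the total degrees of freedom are m(m - 1). This is an editorial illustration, not code from the paper, and the function name is our own:

```python
import numpy as np

def two_sample_chain_chi2(c1, c2):
    """Chi-square homogeneity statistic for two first-order chains (r = 1).

    c1, c2: m x m transition-count matrices from the two samples.  For each
    conditioning state i, compares p_hat^(1)_ij with p_hat^(2)_ij using the
    pooled estimate, in the two-sample form of (3.26); total df = m(m - 1).
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    m = c1.shape[0]
    stat = 0.0
    for i in range(m):
        n1, n2 = c1[i].sum(), c2[i].sum()
        if n1 == 0 or n2 == 0:
            continue  # no transitions out of state i observed in one sample
        p1, p2 = c1[i] / n1, c2[i] / n2
        pooled = (c1[i] + c2[i]) / (n1 + n2)   # pooled estimate of p_i.
        c = 1.0 / n1 + 1.0 / n2                # the factor c_hat of (3.26)
        mask = pooled > 0
        stat += np.sum((p1[mask] - p2[mask]) ** 2 / (pooled[mask] * c))
    return stat, m * (m - 1)
```

Identical samples give a statistic of exactly zero, and sharply opposed transition behavior inflates it, as the \chi^2 reference distribution requires.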
3.5. A test involving two sets of states. In the case of panel studies, a person is usually asked several questions. We might classify each individual according to his opinion on two different questions. In an example in [2], one classification indicated whether the person saw the advertisement of a certain product and the other whether he bought the product in a certain time interval. Let the states be denoted (\alpha, \beta), \alpha = 1, ..., A and \beta = 1, ..., B, where \alpha denotes the first class and \beta the second. We assume that the sequence of states satisfies a first-order Markov chain with transition probabilities p_{\alpha\beta,\gamma\delta}. We ask whether the sequence of changes in one classification is independent of that in the second. For example, if a person notices an advertisement, is he more likely to buy the product? The null hypothesis of independence of changes is

(3.27)  p_{\alpha\beta,\gamma\delta} = q_{\alpha\gamma} r_{\beta\delta},  \alpha, \gamma = 1, ..., A;  \beta, \delta = 1, ..., B,

where q_{\alpha\gamma} is a transition probability for the first classification and r_{\beta\delta} is for the second. We shall find the likelihood ratio criterion for testing this null hypothesis. Let n_{\alpha\beta,\gamma\delta}(t) be the number of individuals in state (\alpha, \beta) at t - 1 and (\gamma, \delta) at t. From the previous results, the maximum likelihood estimate of p_{\alpha\beta,\gamma\delta}, when the null hypothesis is not assumed, is

(3.28)  \hat p_{\alpha\beta,\gamma\delta} = n_{\alpha\beta,\gamma\delta} / \sum_{\gamma=1}^{A} \sum_{\delta=1}^{B} n_{\alpha\beta,\gamma\delta},

where n_{\alpha\beta,\gamma\delta} = \sum_{t=1}^{T} n_{\alpha\beta,\gamma\delta}(t). When the null hypothesis is assumed, the maximum likelihood estimate of p_{\alpha\beta,\gamma\delta} is \hat q_{\alpha\gamma} \hat r_{\beta\delta}, where

(3.29)  \hat q_{\alpha\gamma} = \sum_{\beta,\delta} n_{\alpha\beta,\gamma\delta} / \sum_{\beta,\gamma,\delta} n_{\alpha\beta,\gamma\delta},

(3.30)  \hat r_{\beta\delta} = \sum_{\alpha,\gamma} n_{\alpha\beta,\gamma\delta} / \sum_{\alpha,\gamma,\delta} n_{\alpha\beta,\gamma\delta}.

The likelihood ratio criterion is

(3.31)  \lambda = \prod_{\alpha,\beta,\gamma,\delta} (\hat q_{\alpha\gamma} \hat r_{\beta\delta} / \hat p_{\alpha\beta,\gamma\delta})^{n_{\alpha\beta,\gamma\delta}}.

Under the null hypothesis, -2 \log \lambda has an asymptotic \chi^2-distribution, and the number of degrees of freedom is AB(AB - 1) - A(A - 1) - B(B - 1) = (A - 1)(B - 1)(AB + A + B).

4. A modified model. In the preceding sections, we assumed that the n_i(0) were nonrandom. An alternative is that the n_i(0) are distributed multinomially with probabilities \eta_i and sample size n. Then the distribution of the set n_{ij}(t) is (2.5) multiplied by the marginal distribution of the set n_i(0), which is

(4.1)  \frac{n!}{\prod_{i=1}^{m} n_i(0)!} \prod_{i=1}^{m} \eta_i^{n_i(0)}.

In this model, the maximum likelihood estimate of p_{ij} is again (2.8), and the maximum likelihood estimate of \eta_i is

(4.2)  \hat\eta_i = n_i(0) / n.
The means, variances, and covariances of n_{ij}(t) - n_i(t - 1) p_{ij} are found by taking the expected values of (2.20) to (2.23); the same formulas apply with n_k(0) replaced by n\eta_k. Also, n_{ij}(t) - n_i(t - 1) p_{ij} are uncorrelated with n_i(0). Since n_k(0)/n estimates \eta_k consistently, the asymptotic variances and covariances of n^{1/2}(\hat p_{ij} - p_{ij}) are as in Section 2.4. It follows from these facts that the asymptotic theory of the tests given in Section 3 holds for this modified model.

The asymptotic variances and covariances simplify somewhat if the chain starts from a stationary state; that is, if

(4.3)  \sum_{k=1}^{m} \eta_k p_{ki} = \eta_i,

for then E n_i(t)/n = \eta_i for every t. If it is known that the chain starts from a stationary state, equations (4.3) should be of some additional use in the estimation of p_{ki} when knowledge of the \eta_i, or even estimates of the \eta_i, are available. We have dealt in this paper with the more general case where it is not known whether (4.3) holds, and have used the maximum likelihood estimates for this case. The estimates obtained for the more general case are not efficient in the special case of a chain in a stationary state because relevant information is ignored. In the special case, the maximum likelihood estimates for the \eta_j and p_{ij} are obtained by maximizing \log L = \sum n_{ij} \log p_{ij} + \sum n_i(0) \log \eta_i subject to the restrictions \sum_j p_{ij} = 1, \sum_i \eta_i p_{ij} = \eta_j, \sum_i \eta_i = 1, p_{ij} \geq 0, \eta_i \geq 0. In the case of a chain in a stationary state where the \eta_i are known, the maximum likelihood estimates for the p_{ij} are obtained by maximizing \sum n_{ij} \log p_{ij} subject to the restrictions \sum_j p_{ij} = 1, \sum_i \eta_i p_{ij} = \eta_j, p_{ij} \geq 0. Lagrange multipliers can be used to obtain the equations for the maximum likelihood estimates.
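The stationarity equations (4.3) can be solved numerically for \eta given a transition matrix. The following sketch (our own construction, not from the paper) solves \eta' P = \eta' together with the normalization \sum_i \eta_i = 1 as a least-squares system:

```python
import numpy as np

def stationary_distribution(p):
    """Solve equations (4.3): sum_k eta_k p_ki = eta_i with sum(eta) = 1."""
    m = p.shape[0]
    # Stack (P' - I) eta = 0 with the normalization row 1' eta = 1; the
    # combined system is consistent, so least squares recovers eta exactly.
    a = np.vstack([p.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    eta, *_ = np.linalg.lstsq(a, b, rcond=None)
    return eta
```

For an irreducible aperiodic chain this \eta is unique, and substituting it back verifies (4.3) directly.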
5. One observation on a chain of great length. In the previous sections, asymptotic results were presented for n_i(0) \to \infty, and hence n \to \infty, while T was fixed. The case of one observed sequence of states (n = 1) has been studied by Bartlett [3] and Hoel [10], and they consider the asymptotic theory when the number of times of observation increases (T \to \infty). Bartlett has shown that the number n^*_{ij} of times that the observed sequence was in state i at time t - 1 and in state j at time t, for t = 1, ..., T, is asymptotically normally distributed in the 'positively regular' situation (see [3], p. 91). He has also shown ([3], p. 93) that the maximum likelihood estimates \hat p_{ij} = n^*_{ij}/n^*_i (n^*_i = \sum_j n^*_{ij}) have asymptotic variances and covariances given by the usual multinomial formulas appropriate to E n^*_i independent observations (i = 1, 2, ..., m) from multinomial probabilities p_{ij} (j = 1, 2, ..., m), and that the asymptotic covariances for two different values of i are 0. An argument like that of Section 2.4 shows that the variables (n^*_i)^{1/2}(\hat p_{ij} - p_{ij}) have a limiting normal distribution with means 0 and the variances and covariances given in Section 2.4. This result was proved in a different way by L. A. Gardner [8].

Thus we see that the asymptotic theory for T \to \infty and n = 1 is essentially the same as for T fixed and n_i(0) \to \infty. Hence, the same test procedures are valid, except for such tests as those on possibly nonstationary chains. For example, Hoel's likelihood ratio criterion [10] to test the null hypothesis that the order of the chain is r - 1 against the alternate hypothesis that it is r is parallel to the likelihood ratio criterion for this test given in Section 3.3. The \chi^2-test for this hypothesis, and the generalizations of the tests to the case where the null hypothesis is that the process is of order u and the alternate hypothesis is that the process is of order r (u < r), which are presented in Section 3.3, are also applicable for large T. Also, the \chi^2-test presented in Section 3.1 can be generalized to provide an alternative to Bartlett's likelihood ratio criterion [3] for testing the null hypothesis that p_{ij...kl} = p^0_{ij...kl} (specified).
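The single-long-chain setting (n = 1, T \to \infty) can be sketched as follows; the simulation and the estimator \hat p_{ij} = n^*_{ij}/n^*_i are our own editorial illustration, not code from the paper:

```python
import numpy as np

def simulate_chain(p, t_steps, start=0, rng=None):
    """Simulate one long realization of a first-order chain (the n = 1 case)."""
    rng = np.random.default_rng(rng)
    m = p.shape[0]
    states = [start]
    for _ in range(t_steps):
        states.append(rng.choice(m, p=p[states[-1]]))
    return states

def mle_from_single_chain(states, m):
    """p_hat_ij = n*_ij / n*_i from one observed sequence (Bartlett/Hoel setting)."""
    counts = np.zeros((m, m))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1            # n*_ij: i at t-1, j at t
    row = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
```

For large T the rows of the estimated matrix behave like independent multinomial estimates, as the section states.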
6. \chi^2-tests and likelihood ratio criteria. The \chi^2-tests presented in this paper are asymptotically equivalent, in a certain sense, to the corresponding likelihood ratio tests, as will be proved in this section. This fact does not seem to follow from the general theory of \chi^2-tests; the \chi^2-tests presented herein are different from those \chi^2-tests that can be obtained directly by considering the number of individuals in each of the m^T possible mutually exclusive sequences (see Section 2.1) as the multinomial variables of interest. The \chi^2-tests based on m^T categories need not consider the data as having been obtained from a Markov chain, and the alternate hypothesis may be extremely general, while the \chi^2-tests presented herein are based on a Markov chain model. For small samples, not enough data has been accumulated to decide which tests are to be preferred (see comments in [5]). The relative rate of approach to the asymptotic distributions and the relative power of the tests for small samples are not known. In this section, a method somewhat related to the relative power will be tentatively suggested for deciding which tests are to be preferred when the sample size is moderately large and there is a specific alternate hypothesis. An advantage of the \chi^2-tests, which are of the form used in contingency tables, is that, for many users of these methods, their motivation and their application seem to be simpler.

We shall now prove that the likelihood ratio and the \chi^2-tests (tests of homogeneity) presented in Section 3.2 are asymptotically equivalent in a certain sense. First, we shall show that the \chi^2-statistic has an asymptotic \chi^2-distribution under the null hypothesis. The method of proof can be used whenever the relevant \hat p's have the appropriate limiting normal distribution. In particular, this will be true for statistics of the form \chi^2_i (see (3.6)). In order to prove that statistics of the form \lambda_i (see (3.7)), which are formally similar to the likelihood ratio criterion but are not actually likelihood ratios, have the appropriate asymptotic distribution, we shall then show that -2 \log \lambda_i is asymptotically equivalent to the \chi^2_i-statistic, and therefore that it has an asymptotic \chi^2-distribution under the null hypothesis. Then we shall discuss the question of the equivalence of the tests under the alternate hypothesis. The method of proof presented here can be applied to the appropriate statistics given in the other sections herein, and also where T \to \infty as well as where n \to \infty.

Let us consider the distribution of the \chi^2-statistic (3.8) under the null hypothesis. From Section 2.4, we see that n^{1/2}(\hat p_{ij}(t) - p_{ij}) are asymptotically normally distributed with means 0 and variances p_{ij}(1 - p_{ij})/m_i(t - 1), etc., where m_i(t) = E n_i(t)/n. For different t or different i, they are asymptotically independent. Then the [n m_i(t - 1)]^{1/2} [\hat p_{ij}(t) - p_{ij}] have asymptotic variances p_{ij}(1 - p_{ij}), etc. Let p^*_{ij} = \sum_t m_i(t - 1) \hat p_{ij}(t) / \sum_t m_i(t - 1). Then, by the usual \chi^2-theory, \sum n m_i(t - 1)[\hat p_{ij}(t) - p^*_{ij}]^2 / p_{ij} has an asymptotic \chi^2-distribution under the null hypothesis. But

(6.1)  p lim (p^*_{ij} - \hat p_{ij}) = 0,

because

(6.2)  p lim (n_i(t)/n - m_i(t)) = 0.

From the convergence in probability of (\hat p_{ij} - p_{ij}) and (m_i(t) - n_i(t)/n), and the fact that n^{1/2}(\hat p_{ij}(t) - p_{ij}) has a limiting distribution, it follows that

(6.3)  p lim [ \sum n m_i(t - 1)(\hat p_{ij}(t) - p^*_{ij})^2 / p_{ij} - \sum n_i(t - 1)(\hat p_{ij}(t) - \hat p_{ij})^2 / \hat p_{ij} ] = 0.

Hence, the \chi^2-statistic has the same asymptotic distribution as \sum n m_i(t - 1)[\hat p_{ij}(t) - p^*_{ij}]^2 / p_{ij}; that is, a \chi^2-distribution. This proof also indicates that the \chi^2_i-statistics (3.6) also have a limiting \chi^2-distribution. We shall now show that -2 \log \lambda_i (see (3.7)) is asymptotically equivalent to \chi^2_i under the null hypothesis, and hence will also have a limiting \chi^2-distribution. We first note that, for |x| \leq 1/2,
(6.4)  (1 + x) \log (1 + x) = (1 + x)(x - x^2/2 + x^3/3 - x^4/4 + \cdots) = x + x^2/2 - (x^3/6)(1 - x/2 + \cdots),

and

(6.5)  |(1 + x) \log (1 + x) - x - x^2/2| \leq |x^3|

(see p. 217 in [6]). We see also that

(6.6)  -2 \log \lambda_i = -2 \sum_{j,t} n_{ij}(t) \log [\hat p_{ij} / \hat p_{ij}(t)] = 2 \sum_{j,t} n_i(t - 1) \hat p_{ij}(t) \log [\hat p_{ij}(t) / \hat p_{ij}] = 2 \sum_{j,t} n_i(t - 1) \hat p_{ij} [1 + x_{ij}(t)] \log [1 + x_{ij}(t)],

where x_{ij}(t) = [\hat p_{ij}(t) - \hat p_{ij}] / \hat p_{ij}. The difference \Delta between -2 \log \lambda_i and the \chi^2_i-statistic is

(6.7)  \Delta = -2 \log \lambda_i - \chi^2_i = 2 \sum_{j,t} n_i(t - 1) \hat p_{ij} \{ [1 + x_{ij}(t)] \log [1 + x_{ij}(t)] - [x_{ij}(t)]^2/2 \}.

Since \sum_j \hat p_{ij} x_{ij}(t) = 0,

(6.8)  \Delta = 2 \sum_{j,t} n_i(t - 1) \hat p_{ij} \{ [1 + x_{ij}(t)] \log [1 + x_{ij}(t)] - x_{ij}(t) - [x_{ij}(t)]^2/2 \}.

We shall show that \Delta converges to 0 in probability; i.e., for any \epsilon > 0, the probability of the relation |\Delta| < \epsilon, under the null hypothesis, tends to unity as n = \sum_i n_i(t) \to \infty. The probability satisfies the relation

(6.9)  Pr\{|\Delta| < \epsilon\} \geq Pr\{|\Delta| < \epsilon and |x_{ij}(t)| < 1/2\} \geq Pr\{|2 \sum_{j,t} n_i(t - 1) \hat p_{ij} [x_{ij}(t)]^3| < \epsilon and |x_{ij}(t)| < 1/2\} \geq Pr\{2n \sum_{j,t} |x_{ij}(t)|^3 < \epsilon and |x_{ij}(t)| < 1/2\}.

It is therefore necessary only to prove that n[x_{ij}(t)]^3 converges to 0 in probability. Since

(6.10)  x_{ij}(t) = [\hat p_{ij}(t) - \hat p_{ij}] / \hat p_{ij}

converges to zero in probability under the null hypothesis, and

(6.11)  n[x_{ij}(t)]^3 = [n^{1/2} x_{ij}(t)]^2 x_{ij}(t),

it follows that n[x_{ij}(t)]^3 converges to zero in probability when the null hypothesis is true. Q.E.D.

Since the \chi^2_i-statistic has a limiting \chi^2-distribution under the null hypothesis, and \Delta = -2 \log \lambda_i - \chi^2_i converges to zero in probability, -2 \log \lambda_i = \chi^2_i + \Delta has a limiting \chi^2-distribution under the null hypothesis.

The method presented herein for showing the asymptotic equivalence of -2 \log \lambda_i and \chi^2_i could also be used to show the asymptotic equivalence of statistics of the form -2 \log \lambda and \chi^2. It was proved in Section 3.2 that, under the null hypothesis, -2 \log \lambda has a limiting \chi^2-distribution with m(m - 1)(T - 1) degrees of freedom. (The proof in Section 3.2 applied to \lambda, a likelihood ratio criterion, but would not apply to the \lambda_i since they are not actually likelihood ratios.) Hence, we have another proof that the \chi^2-statistic has the same limiting distribution as the likelihood ratio criterion under the null hypothesis.

The previous remarks refer to the case where the null hypothesis is true. Now suppose the alternate hypothesis is true; that is, p_{ij}(t) \neq p_{ij}(s) for some t, s, i, j. It is easy to see that both the \chi^2-test and the likelihood ratio test are consistent under any alternate hypothesis. In other words, if the values of p_{ij}(t) for the alternate hypothesis and the significance level are kept fixed, then as n increases, the power of each test tends to 1 (see [5] and [11]). In order to examine the situation in which the power is not close to 1 in large samples, and also to make comparisons between tests, the alternate hypothesis may be moved closer to the null hypothesis as n increases. If the values of p_{ij}(t) for the alternate hypothesis are not fixed but move closer to the null hypothesis, it can be seen that the two tests are again asymptotically equivalent. This can be deduced by a slight modification of the proof of asymptotic equivalence under the null hypothesis given in this section (see also [5], p. 323). We shall now suggest another approach to the comparison of these tests when the alternate hypothesis is kept fixed.

Since the null hypothesis is rejected when an appropriate statistic (\chi^2 or -2 \log \lambda) exceeds a specified critical value, we might decide that the \chi^2-test is to be preferred to the likelihood ratio test if the statistic \chi^2 is in some sense (stochastically) larger than -2 \log \lambda under the alternate hypothesis. Since n_i(t) is a linear combination of multinomial variables, we see that n_i(t)/n converges in probability to its expected value E[n_i(t)/n] = m_i(t). Hence, \chi^2/n converges in probability to

(6.12)  \sum_{i,j,t} m_i(t - 1)[p_{ij}(t) - \bar p_{ij}]^2 / \bar p_{ij},
and (-2 \log \lambda)/n converges in probability to

(6.13)  2 \sum_{i,j,t} m_i(t - 1) p_{ij}(t) \log [p_{ij}(t) / \bar p_{ij}],

where

(6.14)  \bar p_{ij} = \sum_t m_i(t - 1) p_{ij}(t) / \sum_t m_i(t - 1) = p lim \hat p_{ij}.

The difference between (6.12) and (6.13) is approximately

(6.15)  (1/3) \sum_{i,j,t} m_i(t - 1)[p_{ij}(t) - \bar p_{ij}]^3 / \bar p_{ij}^2.

Under the alternate hypothesis, these two stochastic limits differ from 0, and computation of them suggests which test is better. If (p_{ij}(t) - \bar p_{ij})/\bar p_{ij} is small, then there will be only a small difference between the two limits. When the alternative is some composite hypothesis, as is usually the case when \chi^2-tests are applied, then these stochastic limits can be computed and compared for the simple alternatives that are included in the alternate hypothesis.

This method for comparing tests is somewhat related to Cochran's comment (see p. 323 in [5]) that either (a) the significance probability can be made to decrease as n increases, thus reducing the chance of an error of type I, or (b) the alternate hypothesis can be moved steadily closer to the null hypothesis. Method (b) was discussed in [3]. If method (a) is used, then the critical value of the statistic (\chi^2 or -2 \log \lambda) will increase as n increases. When the critical value has the form cn, where c is a constant (there may be some question as to whether this form for the critical value is really suitable), we see from the remarks in the preceding paragraph that the power of a test will tend to 1 if c is less than the stochastic limit, and it will tend to 0 if c is greater than the stochastic limit. Hence, by this approach we find that the power of the \chi^2-test can be quite different from the power of the likelihood ratio test, and some approximate computations can suggest which test is to be preferred. However, a more appealing approach is to vary the significance level so that the ratio of the significance level to the probability of some particular Type II error approaches a limit (or at least so that the sequences of significance points lie between c'n and c''n). While the usual asymptotic theory does not give enough information to handle this problem, the comparison of stochastic limits may suggest a comparison of powers.

The methods of comparison discussed herein can also be used in the study of the \chi^2 and likelihood ratio methods for ordinary contingency tables. We have seen that, in a certain sense, the \chi^2 and likelihood ratio methods are not equivalent when the alternate hypothesis is true and fixed, and we have suggested a method for determining which test is to be preferred.
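The asymptotic equivalence argued in this section can be checked numerically for the Section 3.2 stationarity test. The sketch below (an editorial illustration with our own function name, not code from the paper) computes both the \chi^2-statistic \sum n_i(t-1)(\hat p_{ij}(t) - \hat p_{ij})^2 / \hat p_{ij} and -2 \log \lambda = 2 \sum n_{ij}(t) \log(\hat p_{ij}(t)/\hat p_{ij}) from per-interval transition counts:

```python
import numpy as np

def stationarity_stats(counts_by_t):
    """Chi-square and -2 log lambda for H0: p_ij(t) constant over t.

    counts_by_t: list of T m x m matrices, counts_by_t[t][i, j] = n_ij(t).
    Returns (chi2, minus2loglam, df) with df = m(m - 1)(T - 1); under H0
    the two statistics are asymptotically equivalent (Section 6).
    """
    counts_by_t = [np.asarray(c, dtype=float) for c in counts_by_t]
    total = sum(counts_by_t)
    row_tot = total.sum(axis=1, keepdims=True)
    p_pool = total / row_tot                     # pooled estimate p_hat_ij
    m = total.shape[0]
    t_periods = len(counts_by_t)
    chi2 = lam = 0.0
    for c in counts_by_t:
        n_prev = c.sum(axis=1, keepdims=True)    # n_i(t - 1)
        p_t = c / n_prev                         # per-interval p_hat_ij(t)
        chi2 += np.sum(n_prev * (p_t - p_pool) ** 2 / p_pool)
        with np.errstate(divide="ignore", invalid="ignore"):
            lam += 2.0 * np.sum(np.where(c > 0, c * np.log(p_t / p_pool), 0.0))
    return chi2, lam, m * (m - 1) * (t_periods - 1)
```

With counts whose per-interval estimates deviate only slightly from the pooled estimate, the two statistics agree to several decimal places, as the expansion (6.4) to (6.8) predicts.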
REFERENCES

[1] T. W. Anderson, "Probability models for analyzing time changes in attitudes," RAND Research Memorandum No. 455, 1951.
[2] T. W. Anderson, "Probability models for analyzing time changes in attitudes," Mathematical Thinking in the Social Sciences, edited by Paul F. Lazarsfeld, The Free Press, Glencoe, Illinois, 1954.
[3] M. S. Bartlett, "The frequency goodness of fit test for probability chains," Proc. Cambridge Philos. Soc., Vol. 47 (1951), pp. 86-95.
[4] H. Chernoff, "Large-sample theory: parametric case," Ann. Math. Stat., Vol. 27 (1956), pp. 1-22.
[5] W. G. Cochran, "The \chi^2-test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952), pp. 315-345.
[6] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.
[7] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, John Wiley and Sons, New York, 1950.
[8] L. A. Gardner, Jr., "Some estimation and distribution problems in information theory," Master's Essay, Columbia University Library, 1954.
[9] L. A. Goodman, "On the statistical analysis of Markov chains" (abstract), Ann. Math. Stat., Vol. 26 (1955), p. 771.
[10] P. G. Hoel, "A test for Markoff chains," Biometrika, Vol. 41 (1954), pp. 430-433.
[11] J. Neyman, "Contribution to the theory of the \chi^2-test," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1949, pp. 239-274.