Professional Documents
Culture Documents
Stochastic Processes
with Applications
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Universite de Geneve
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kale and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
F ^
Stochastic Processes
with Applications
b ci
Rabi N. Bhattacharya
University of Arizona
Tucson, Arizona
Edward C. Waymire
Oregon State University
Corvallis, Oregon
pia m o
Society for Industrial and Applied Mathematics
Philadelphia
Copyright 2009 by the Society for Industrial and Applied Mathematics
This SIAM edition is an unabridged republication of the work first published by John
Wiley & Sons (SEA) Pte. Ltd., 1992.
10987654321
All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission of
the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Preface xv
ix
X CONTENTS
XIII
Preface
xv
PREFACE
xvi
m.n, or Corollary m.n, refers to the nth such assertion in section m of the same
chapter. Exercise n, or Example n, refers to the nth Exercise, or nth Example,
of the same section. Exercise m.n (Example m.n) refers to Exercise n (Example
n) of a different section m within the same chapter. When referring to a result
or an example in a different chapter, the chapter number is always mentioned
along with the label m.n to locate it within that chapter.
This book took a long time to write. We gratefully acknowledge research
support from the National Science Foundation and the Army Research Office
during this period. Special thanks are due to Wiley editors Beatrice Shube and
Kate Roach for their encouragement and assistance in seeing this effort through.
RABI N. BHATTACHARYA
EDWARD C. WAYMIRE
Bloomington, Indiana
Corvallis, Oregon
February 1990
Sample Course Outlines
COURSE I
Beginning with the Simple Random Walk, this course leads through Brownian
Motion and Diffusion. It also contains an introduction to discrete/continuous-
parameter Markov Chains and Martingales. More emphasis is placed on concepts,
principles, computations, and examples than on complete proofs and technical
details.
Chapter 1 Chapter II Chapter III
1-7 (+ Informal Review of Chapter 0, 4) 1-4 1--3
13 (Up to Proposition 13.5) 5 (By examples) 5
11 (Example 2)
13
Chapter IV Chapter V Chapter VI
1-7 (Quick survey 1 4
by examples) 2 (Give transience/recurrence
from Proposition 2.5)
3 (Informal justification of
equation (3.4) only)
5-7
10
11 (Omit proof of Theorem 11.1)
12-14
COURSE 2
The principal topics are the Functional Central Limit Theorem, Martingales,
Diffusions, and Stochastic Differential Equations. To complete proofs and for
supplementary material, the theoretical complements are an essential part of this
course.
Chapter I Chapter V Chapter VI Chapter VII
1-4 (Quick survey) 1-3 4 1--4
6-10 6-7
13 11
13 -17
COURSE 3
This is a course on Markov Chains that also contains an introduction to
Martingales. Theoretical complements may he used only sparingly.
Denoting by X the value of a stock at an nth unit of time, one may represent
its (erratic) evolution by a family of random variables {X0 , X,, ...} indexed by
the discrete-time parameter n E 7L + . The number X, of car accidents in a city
during the time interval [0, t] gives rise to a collection of random variables
>,
{ X1 : t 0} indexed by the continuous-time parameter t. The velocity X. at a
point u in a turbulent wind field provides a family of random variables
{X: u e l8 3 indexed by a multidimensional spatial parameter u. More generally
}
In the above, one may take, respectively: (i) I = Z , S = I!; (ii) I = [0, oo),
S = Z; (iii) I = l, S = X8 3 . For the most part we shall study stochastic
processes indexed by a one-dimensional set of real numbers (e.g., time). Here
the natural ordering of numbers coincides with the sense of evolution of the
process. This order is lost for stochastic processes indexed by a multidimensional
parameter; such processes are usually referred to as random fields. The state
space S will often be a set of real numbers, finite, countable, (i.e., discrete) or
uncountable. However, we also allow for the possibility of vector-valued
variables. As a matter of convenience in notation the index set is often suppressed
when the context makes it clear. In particular, we often write {X} in place of
{X: n = 0, 1, 2, ...} and {X,} in place of {X,: t >, 0}.
For a stochastic process the values of the random variables corresponding
2 RANDOM WALK AND BROWNIAN MOTION
(a)
(b)
Figure 1.1
THE SIMPLE RANDOM WALK 3
Example 1. The sample space S2 for repeated (and unending) tosses of a coin
may be represented by the sequence space consisting of sequences of the form
w = (co l , w 2 . . , w n , ...) with aw n = 1 or co,, = 0. For this choice of 0, the value
,.
each variable has the same (Bernoulli) distribution. These facts are summarized
by saying that {X 1 , X2,. . .} is a sequence of independent and identically
distributed (i.i.d.) random variables with a common Bernoulli distribution. Let
Fn denote the event that the specific outcomes E 1 , ... , e n occur on the first n
tosses respectively. Then
0 < P(G) < P(F,,) = p'"(1 p)" '" - for each n = 1, 2, .... (1.2)
Now apply a limiting argument to see that, for 0 < p < 1, P(G) = 0. Hence the
probability of every singleton event in S2 is zero.
Definition 2.1. The stochastic process {S,,: n = 0, 1, 2, ...} is called the simple
random walk. The related process S = S + x, n = 0, 1, 2, ... is called the simple
random walk starting at x.
n
n + y x pin+Yx)12q(nY+x)/2 if ly xI < ri
P(S. =Y)= 2
and y x, n have the same parity,
0 otherwise. (2.2)
TRANSIENCE AND RECURRENCE PROPERTIES OF THE SIMPLE RANDOM WALK 5
Let us first consider the manner in which a particle escapes from an interval.
Let TY denote the first time that the process starting at x reaches y, i.e.
To avoid trivialities, assume 0 <p < 1. For integers c and d with c < d, denote
In other words, 4(x) is the probability that the particle starting at x reaches d
before it reaches c. Since in one step the particle moves to x + I with probability
p, or to x 1 with probability q, one has
so that
Thus, (x) is the solution to the discrete boundary-value problem (3.4). For
p ^ q, Eq. 3.4 yields
x-1 x-1 q Y
O(x) = Z [^(y + 1 ) o(y)] = Z - [O(c + 1) O(c)]
v=c v=c P
1 - (q/P)x -'
1 (qlP)- `
1 =4(d)=4(c+ 1)
1 q/P
Then
1 q/P
q(c + 1) = 1
(glp)d c
6 RANDOM WALK AND BROWNIAN MOTION
so that
P(Tx<Tx)= 1 (q/P)xd-c for c<x<d, p q. (3.6)
1 (q/P)
Now let
P(Tx<Td)= 1(P/q)d-xd-c
or c<x<d,p q.
f (3.8)
1 (P/q)
Note that O(x) + fr(x) = 1, proving that the particle starting in the interior of
[c, d] will eventually reach the boundary (i.e., either c or d) with probability
1. Now if c < x, then (Exercise 3)
um \9/ x
c
ifp >21
= dam
1, ifp <Z,
xc
q if p>
= P (3.9)
1, ifp<Z.
By symmetry, or as above,
i
P({S:} will ever reach d) = P(Td' < oo) = 1' d_x i f p > 2 (3.10)
C q/ ,
fp<Z.
Observe that one gets from these calculations the (geometric) distribution
function for the extremes Mx = sup,, S and mx = inf S; (Exercise 7).
Note that, by the strong law of large numbers (Chapter 0),
P Sx = x+S
^pgasn--oo =1. (3.11)
n n
TRANSIENCE AND RECURRENCE PROPERTIES OF THE SIMPLE RANDOM WALK 7
Hence, if p > q, then the random walk drifts to + oo (i.e., S -* + co) with
probability 1. In particular, the process is certain to reach d > x if p > q.
Similarly, if p < q, then the random walk drifts to - co (i.e., S -+ - cc), and
starting at x > c the process is certain to reach c if p < q. In either case, no
matter what the integer y is,
x
= y -+ 0 as n k - cc,
nk nk
Definition 3.1. A state y for which Eq. 3.12 holds is called transient. If all
states are transient then the stochastic process is said to be a transient process.
x-c
P(Tx<Tx)= ,
c<x<d,p=q =Z (3.13)
d-c
Similarly,
d-x
P(Tcx <Td)=
d-c
c<,x<d,p=q =2 (3.14)
Again we have
= lim xc = 1. (3.17)
e -- ao d c
Thus, no matter where the particle may be initially, it will eventually reach any
given state y with probability 1. After having reached y for the first time, it will
move to y + 1 or to y 1. From either of these positions the particle is again
bound to reach y with probability 1, and so on. In other words (Exercise 4),
Definition 3.2. A state y for which Eq. 3.18 holds is called recurrent. If all
states are recurrent, then the stochastic process is called a recurrent process.
Consider the random variable 7 := T representing the first time the simple
random walk starting at zero reaches the level (state) y. We will calculate the
distribution of Ty by means of an analysis of the sample paths of the simple
random walk. Let FN , y = {Ty = N} denote the event that the particle reaches
state y for the first time at the Nth step. Then,
Note that "SN = y" means that there are (N + y)/2 plus l's and (N y)/2
minus 1's among X I , X2 , ... , XN (see Eq. 2.1). Therefore, we assume that
IYI <, N and N + y is even. Now there are as many paths leading from (0, 0)
to (N, y) as there are ways of choosing (N + y)/2 plus l's among X 1 , X2 , ... , XN ,
namely
N
N+y
2
FIRST PASSAGE TIMES FOR THE SIMPLE RANDOM WALK 9
where L is the number of paths from (0, 0) to (N, y) that do not touch or cross
the level y prior to time N. To calculate L, consider the complementary number
L of paths that do reach y prior to time N,
N
L'= N+y L. (4.3)
2
First consider the case of y> 0. If a path from (0, 0) to (N, y) has reached
y prior to time N, then either (a) SJY _ 1 = y + 1 (see Figure 4.1a) or
(b) SN _ 1 = y I and the path from (0, 0) to (N 1, y 1) has reached y prior
to time N 1 (see Figure 4.1b). The contribution to L from (a) is
N-1
N+y
2
We need to calculate the contribution to L from (b).
(a)
1I
Figure 4.1
10 RANDOM WALK AND BROWNIAN MOTION
Proposition 4.1. (A Reflection Principle). Let y > 0. The collection of all paths
from (0, 0) to (N 1, y 1) that touch or cross the level y prior to time N 1
is in one-to-one correspondence with the collection of all possible paths from
(0,0)to (N 1,y + 1).
It now follows from the reflection principle that the contribution to L' from
(b) is
N-1
N+y
2
Hence
N-1
L' =2 N+y (4.4)
2
N-
Figure 4.2
MULTIDIMENSIONAL RANDOM WALKS 11
N
= IYI N + y p(N+Y)12q(N_Y)1i for N >, y, y + N even, y > 0
N \ 2
(4.5)
To calculate P(TT = N) for y < 0, simply relabel H as T and T as H (i.e.,
interchange + 1, 1). Using this new code, the desired probability is given by
replacing y by y and interchanging p, q in (4.5), i.e.,
N
P(Tr = N) = ( + y q(N_Y)/2p(N+Y)/2
2
N
P ( Ty = N ) = N + y p ( N+y)/2 q (x -v)I2 = I (4.6)
p(SN = y)
N
2
However, observe that the expected time to reach y is infinite since by Stirling's
formula, k! = (2irk) 1 / 2 Ve '`( 1 + o(1)) as k -,, oo, the tail of the p.m.f. of Ty is of
-
ES = x,
Cov(S nxi xj
' , Sn')
= tn, if =j i (5.3)
0, ifi*j.
Proof. The result has already been obtained for k = 1. In general, let S. = Sno
and write
rn=P(Sn=0)
fn = P(S" = 0 for the first time after time 0 at n), n >, 1. (5.4)
Let P(s) and f(s) denote the respective probability generating functions of {r n }
and {f,.} defined by
P(s) = 1 + I E .ijrn-js'sn-j = 1 + jZ Y r
n =1j =0 =0 (M W
=0
m sm)fj sj = 1 + f(s)f(s). (5.7)
MULTIDIMENSIONAL RANDOM WALKS 13
Therefore,
(5.8)
r(s) 1 f(s)
Note that by the Monotone Convergence Theorem (Chapter 0), P(s) ,, r(1) and
f(s) / f(1) as s T 1. If f(1) < 1, then P(l) = (1 f (1))' < oo. If f(1) = 1,
then P(1) = um s , (1 f(s) = oo. Therefore, y < 1 (i.e., 0 is transient) if
and only if :=r(1) < oo.
This criterion is applied to the case k = 2 as follows. Since a return to 0 is
possible at time 2n if and only if the numbers of steps among the 2n in the
positive horizontal and vertical directions equal the respective numbers of steps
in the negative directions,
\
412n nn ^ n ^I n n 1 nn^ z . (5.10)
j=o j 4 2n
The combinatorial identity used to get the last line of (5.10) follows by
considering the number of ways of selecting samples of size n from a population
of n objects of type 1 and n objects of type 2 (Exercise 2). Apply Stirling's
formula to (5.10) to get r2 = 0(1/n) > c/n for some c > 0. Therefore,
= P(1) = + oo and so 0 is recurrent in the case k = 2.
In the case k = 3, similar considerations of "coordinate balance" give
1 2n 1
( ) n! 2
= 22n n (5.11)
j+msn 3" j!m!(n j m)!} .
Therefore, writing
n! 1
pj, m =
j!m!(n j m)! 3n
and noting that these are the probabilities for the trinomial distribution, we have
14 RANDOM WALK AND BROWNIAN MOTION
that
1 (2n) 2
= 2 z" (P;.m) (5.12)
n
( 2n)
22 n jmax Pj,m]Pj,m= 2 a" ( nnx) ma Pj.m. (5.13)
j,m j,m
r2n \ 1 2n 1
( ) n.i
(5.14)
2 2 n n 3" rn i fl, [n],
C
r 2n - 2" nn n n
3/2 for some C' > 0. (5.15)
In particular,
Er"<oo. (5.16)
"
The general case, r2n < c k n -k/ 2 for k > 3, is left as an exercise (Exercise 1).
n
The constants appearing in the estimate (5.15) are easily computed from the
monotonicity of the ratio n!/{(2nn)`I 2 n"e - "}; whose limit as n -> oo is 1
according to Stirling's formula. To see that the ratio is monotonically decreasing,
simply observe that
t
log n! = log n! flog n n log n + n log(2n) l i 2
(2nn) 112 n"e - "
J .
,-1
j log j Z log n}{n lognn}log(2n)"2
)
U2 2 J^ J
(5.17)
CANONICAL CONSTRUCTION OF STOCHASTIC PROCESSES 15
where the integral term may be checked by integration by parts. The point is
that the term defined by
provides the inner trapezoidal approximation to the area under the curve
y = log x, 1 < x <, n. Thus, in particular, a simple sketch shows
01 J logxdxT"
n! e
n = 1, 2, .... (5.19)
1 (2nn) 112 n"e - " < (2n) 1 / z ,
Equivalently,
for all Borel sets B 1 , . .. , B. in 118 1 and n >, 1. Kolmogorov's Existence Theorem
asserts that the consistency condition (6.3) is also sufficient for such a probability
measure P to exist and that there is only one such P on (, R) = (12, F)
(theoretical complement 1). This holds more generally, for example, when the
state space S is l, a countable set, or any Borel subset of tF . A proof for the
simple case of finite state processes is outlined in Exercise 3.
Since Q(R') = 1, the consistency condition (6.3) follows immediately from the
definition (6.4). Now one simply invokes the Kolmogorov Existence Theorem
to get a probability measure P on (S2, F) such that
= p(X1 EB1) . .
.p(X,,EB). (6.5)
The simple random walk can be constructed within the framework of the
canonical probability space (S2, F, P) constructed for coin tossing, although
this is a noncanonical probability space for {S}. Alternatively, a canonical
construction can be made directly for {S} (Exercise 2(i)). This, on the other
hand, provides a noncanonical probability space for the displacement
(coin-tossing) process defined by the differences X. = S. S_ 1 , n > 1.
(Non-negative Definiteness)
7 BROWNIAN MOTION
p=2+ 2 ^ o and 0= ^
Here p and a are two fixed numbers, or > 0. Then as f --> cc, the mean
displacement t f (p q)0 converges to t and the variance converges to ta e . In
the limit, then, the position X, of the particle at time t > 0 is Gaussian with
probability density function (in y) given by
(Y
_
2QZt_t)z (7.1)
P(t; x, Y) _ (2ita2t)1j2 eXp{
Ifs > 0 then X, + X, is the sum of displacements during the time interval
(t, t + s]. Therefore, by the argument above, X, +s X, is Gaussian with mean
s and variance sa e , and it is independent of {X,,: 0 < u < t}. In particular, for
every finite set of time points 0 < t l < t 2 < <t, the random variables
X,,, X^ 2 X,.. . , X XX ,,,_, are independent. A stochastic process with this
last property is said to be a process with independent increments. This is the
continuous-time analogue of random walks. From the physical description of
the process {X} as representing (a coordinate of) the path of a diffusing solute
particle, one would expect that the sample paths of the process (i.e., the
trajectories t * X(w) = w,) may be taken to be continuous. That this is indeed
the case is an important mathematical result originally due to Norbert Wiener.
For this reason, Brownian motion is also called the Wiener process. A complete
definition of Brownian motion goes as follows.
1. The sample space S2 := C[0, oo) is the set of all real-valued continuous
functions on the time interval [0, cc). This is the set of all possible
trajectories (sample paths) of the process.
2. XX (co) := co, is the value of the sample path w at time t.
3. S2 is equipped with the smallest sigmafield .y of subsets of S2
containing the class .moo of all finite-dimensional sets of the form
F = fce e ): a ; <w,. < b i , i = 1, 2, ... , k}, where a <b, are constants
;
and 0 < t l < t 2 < < t k are a finite set of time points. .F is said to be
generated by .moo.
BROWNIAN MOTION 19
For the set F above, P (F) can be calculated as follows. Definition (7.1) gives
P
the joint density of X, Xr2 - X,,, ... , X,, - Xtk _, as that of k independent
Gaussian random variables with means t 1 p., (t 2 - t1), ... , (tk - tk -1)/ 2 ,
respectively, and variances tIQ 2 , (t2 t1)a 2 , ... , (tk tk_ 1 )a 2 , respectively.
Transforming this (product) joint density, say in variables z 1 , z 2 , ... , by the
change of variables z 1 = YI, z2 = Y2 - Y1' I zk = Yk - Yk-1 and using the fact
that the Jacobian of this linear transformation is unity, one obtains
J
fbk
^ bI ...
-
{(Y1 X t A'
2 "2 exp 2
01 ak I ak (27rQ t l) 2v t,
1 t t1)1 1 ) 2
1Y2 Y1 (2 l
(21IU2(t2 t1))"2exp
2a2(t2 t1) 1
... 2
I
1/ 2
J (Yk Yk-1 (tk tk-1)t^) 2
expl( 2 dYk dYk-I
...
dY1
(27L6 (tk tk - 1)) 26
(tk tk - 1) ^
(7.2)
The joint density of X, , X, , ... , X,,, is the integrand in (7.2) and may be
1 2
Define, for each value of the scale parameter n >, 1, the stochastic process
X i n) = S[nr] (t
^ i 0), (8.2)
V "
where [nt] is the integer part of nt. Figure 8.1 plots the sample path of
{X;"^: t >, 0} up to time t = 13/n if the successive displacements take values
Z 1 =-1, Zz =+1, Z 3 =+1, Z4 =+1, Z 5 =-1, Z6 =+1, Z 7 =+1,
Zg = 1, Zq = + 1, Z10 = + 1, 211 = + 1, Z12 = 1.
_ 1
Simi
I n .i
4 4 '-4
Vn
3
^---; -4
'In
? 4.--1-4
Vn
Vn
1 1 3 4 5 6 7 i 9 10 11 12 13 t
n n n n n n It n n n n n n
Intl
EX " = 0, VarX^11 = n ~ t,
' )
Cov(Vt., + `v l )= [n] ~ S.
Figure 8.1
>,
The process {S 11 : t 0} records the discrete-time random walk
{S.: m = 0, 1, 2, ...} on a continuous time scale whose unit is n times that of
the discrete time unit, i.e., Sm is plotted at time m/n. The process
{X} = {(1/ \/)S[ f] also scales distance by measuring distances on a scale
}
whose unit is f times the unit of measurement used for the random walk.
This is a convenient normalization, since
[nt]0 z2
EX ) = 0, Var X}" ) = ta z for large n. (8.3)
n
In a time interval (t 1 , t 2 ] the overall "displacement" X X( ) is the sum of
a large number [nt z ] [nt,] n(t 2 t,) of small i.i.d. random variables
1 1
In the case {Z,} is i.i.d. Bernoulli, this means reducing the step sizes of the
random variables to t1 = 1/,.fn. In a physical application, looking at {X}
means the following.
1. The random walk is observed at times t, < t 2 <t 3 < sufficiently far
apart to allow a large number of individual displacements to occur during
each of the time intervals (t,, t z ], (t z , t 3 ... , and
],
Since the sample paths of {X} have jumps (though small for large n) and
are, therefore, discontinuous, it is technically more convenient to linearly
interpolate the random walk between one jump point and the next, using the
same spacetime scales as used for {X}. The polygonal process {X,( 1 is " }
formally defined by
In this way, just as for the limiting Brownian motion process, the paths of
{X} are continuous. Figure 8.2 plots the path of {X1 "} corresponding to the
path of {X} drawn in Figure 8.1. In a time interval m/n < t < (m + 1)/n, X;" )
is constant at level 1// S while X}" ) changes linearly from l/ f S. at time
t=m/n to
m+1
I S, +1 = S"' Z'" + ' at time
me t =
n
22 RANDOM WALK AND BROWNIAN MOTION
I [n(] Z101j+j
S^ rl + (t
n ) ^n
Vn
4
Vn
3
In
2
do
W,
[nt] + 1 (t [nt] t
= 0, VarX^ rn) _ )2
n n n
[ns]
Figure 8.2
Thus, in any given interval [0, T] the maximum difference between the two
processes {X,(n} and {X,( } does not exceed
)
To see that the difference between {X,(n } and {X;n } is negligible for large n,
) )
I ^'<(5
= I P( Z for allm= 1,2,...,[nT]+ 1)
lV n
[nT1+1
= 1 (P(IZ11 < 5\))
= 1 (1 P(IZ11 > 6.^ n))[nT 1 +l (8.5)
Assuming for simplicity that EIZ 1 1 3 < co, Chebyshev's inequality yields
P(1Z11 > 6 ^) <, EIZII 3 /6 3 n 3/2 . Use this in (8.5) to get (Exercise 9)
(nTJ+l
EIZIr
P(e (T) > (5) 1 ( i (533/2 )
when n is large. Here indicates that the difference between the two sides
THE FUNCTIONAL CENTRAL LIMIT THEOREM (FCLT) 23
goes to zero. Thus, on any closed and bounded time interval the behaviors of
{X,(" } and {X} are the same in the large-n limit.
)
Note that given any finite set of time points 0 < t 1 < t 2 < < t, the joint
distribution of (X, X;z ) , .. . , X(" ) converges to the finite-dimensional
)
drift and diffusion coefficient a 2 . To see this, note that X, X Xt"^, ... ,
X,( X() , are independent random variables that by the classical central limit
k )
therefore, of {X^" 1 }) to those of the Brownian motion process {X} (Exercise 1).
Roughly speaking, to establish the full convergence in distribution of {X!" 1 }
to Brownian motion, one further looks at a finite set of time points comprising
a fine subdivision of a bounded interval [0, T] and shows that the fluctuations
of the process {X^"^} on [0, T] between successive points of this subdivision
are sufficiently small in probability, a property called the tightness of the process.
This control over fluctuations together with the convergence of {X^" 1 } evaluated
at the time points of the subdivision ensures convergence in distribution to a
continuous process whose finite-dimensional distributions are the same as those
of Brownian motion (see theoretical complements for details). Since there is no
process other than Brownian motion with continuous sample paths that has
these limiting finite-dimensional distributions, it follows that the limit must be
Brownian motion.
A precise statement of the functional central limit theorem (FCLT) is the
following.
dimensional events, e.g., the events {max a , b X > y} and f maxa t b X < x}
pertaining to extremes of the process. More generally, if f is a continuous
function on C[0, oo) then the event { f( {X}) < x} is also a Borel subset of
C[0, oo) (Exercise 2). With events of this type in mind, a precise meaning of
convergence in distribution (or weak convergence) of the probability measures
P. to P on this infinite-dimensional space C[0, oo) is that the probability
distributions of the real-valued (one dimensional) random variables f( {X;'})
converge (in distribution as described in Chapter 0) to the distribution of f( {X1 })
for each real-valued continuous function f defined on C[0, cc). Since a number
of important infinite-dimensional events can be expressed in terms of continuous
functionals of the processes, this makes calculations of probabilities possible
by taking limits; for examples of infinite dimensional events whose probabilities
do not converge see Exercise 9.3(iv).
Because the limiting process, namely Brownian motion, is the same for all
increments {Z,} as above, the limit Theorem 8.1 is also referred to as the
Invariance Principle, i.e., invariance with respect to the distribution of the
increment process.
There are two distinct types of applications of Theorem 8.1. In the first type
it is used to calculate probabilities of infinite-dimensional events associated with
Brownian motion by studying simple random walks. In the second type it
(invariance) is used to calculate asymptotics of a large variety of partial-sum
processes by studying simple random walks and Brownian motion. Several such
examples are considered in the next two sections.
The first problem is to calculate, for a Brownian motion {X} with drift Ic = 0
and diffusion coefficient Q 2 , starting at x, the probability
P(T < Ta) = P({X' } reaches c before d) (c < x < d), (9.1)
where
_X
P(2x < ra) = P({B,} reaches c x before d (9.3)
a Q
complement 2)
cx dxl
P(i < rd) = lim P ( {i} reaches ------- before ----/)
"- x Q 6
where
c"= Lc -x ;],
6
and
d
x n if d" = d X is an integer,
d =
" dx
d x + 1 if not.
d_ x -
- n
d a
P(rx <t) = l im " = lim ----- -- . (9.5)
Therefore,
The relations (9.8) mean that a Brownian motion with zero drift is recurrent,
26 RANDOM WALK AND BROWNIAN MOTION
[nt]
EX (n) = ES1 n,1,n = a n tp
^n or (9.9)
[
nt] Var Z n [nt] (
(1 1\1 )Z) t,
Var X^ (n) =
n n 7
1- J
Q.
1 7
= um
n-m l^ dc
a Jn 1 +
U n
1 I I
; ^
FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION 27
exp
c -
d z x
exp- 2 A ^
c) Ja
e x p^ (d 2
)j
d -
exp
vz 1L
Therefore,
1 exp{2(d x)p/v z }
P(i' < zcd) = (c < x < d, p 0). (9.10)
1 exp{2(d c)p/v 2 }
P(i<<oo)=exp{- 2(x z c)p } (c < x, p > 0),
l o J)) (9.12)
P(r <oo)= 1 (c<x,p<0).
We have seen in Section 4, relation (4.7), that for a simple symmetric random
walk starting at zero, the first passage time 7 y to the state y 0 0 has the
28 RANDOM WALK AND BROWNIAN MOTION
distribution
N
P(7.=N)=IYI N+y 1
Y N N=IYI>IYI+2,IYI+4..... (10.1)
2
Now let r = T be the first time a standard Brownian motion starting at the
origin reaches z. Let {X^" } be the polygonal process corresponding to the simple
)
symmetric random walk. Considering the first time {X } reaches z, one has )
= lim P(T=,n] = N)
n-+m N=(nt]+1
N
y IYI ( N
= lim N+ (Y = [z^])
n-+ao N=tnt]+1, N 2
Nyeven
(10.2)
Now according to Stirling's formula, for large integers M, we can write
2 e -M M nr+2 (1 + S M )
M! = (21r) (10.3)
N e-NNN+#2-N
IYI N + y 2 _ N = IYI 2
(N+y)/2+I e l (N-Yu2+#
N 2 (2ir) t N e -(N+Y)12( N + Y ) (NY)/2 (N Y
2 ` 2 J
X (1 + o(1))
(2ir)1I2N312 1+ N I 1 N (1 + o( 1 ))
I 2 (N + Y)/ 2 (N Y)l2
(2ir) /2N3/2 (1 + ) 1 (1 + o( 1 )),
N
(10.4)
where o(1) denotes a quantity whose magnitude is bounded above by a quantity
en (t, z) that depends only on n, t, z and which goes to zero as n oo. Also,
FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION 29
r
log (1
y (N+ y uz y wN-vuz _ N + Y Y _ Y z IYI3
+ N) 1 - N 2 N 2N2 +(N3)
[
2 3
+N 2 y [
N+2 N +O \ INI3 /]
_ -2N+8(N,y), (10.5)
where IB(N, y)j < n -11 z c(t, z) and c(t, z) is a constant depending only on t and
z. Combining (10.4) and (10.5), we have
N
z
^N N + Y 2-N = n N3I/2 1 exp -
(
2N}(1 + o( 1 ))
2
( (10.6)
= n I N 312 exp1-2N}(1 + 0(1)),
where o(1) --, 0 as n -* oo, uniformly for N> [nt], N - [z\] even. Using this
in (10.2), one obtains
2
P(T= > t) _ v
^ f
olz
e - " Z / z dv. (10.9)
The first passage time distribution for the more general case of Brownian
motion {X1 } with zero drift and diffusion coefficient Qz > 0, starting at the
origin, is now obtained by applying (10.9) to the standard Brownian motion
{(1/Q)X}. Therefore,
P(;> t) _ 2 f o
I=Ibf
e-2/2 dv. (10.10)
Note that for large t the tail of the p.d.f. f 2 (t) is of the order of t -3 / 2 . Therefore,
although {X} will reach z in a finite time with probability 1, the expected time
is infinite (Exercise 11).
Consider now a Brownian motion {X,} with a nonzero drift and diffusion
coefficient a 2 that starts at the origin. As in Section 9, the polygonal process
{X^n) } corresponding to the simple random walk S,, n = Z 1 , + + Z,, n ,
S0 ,, = 0, with P(Ztn , n = 1) = p = 2 + /(2Q\), converges in distribution to
{W = Xja}, which is a Brownian motion with drift /u and diffusion coefficient
1. On the other hand, writing Ty , n for the first passage time of {S, n : m = 0, 1, ...}
to y, one has, by relation (4.6) of Section 4,
N
N = IY)
P(1 = N) N N +y p(N+v)/2R(N-v)/2
2
N (N-i-y)12 (lN-Y)l2
IYI N ) (
N+y 2 - 1+
N 2
Ql Qn
FYI 2 \ N/21 y /2 p y /2
=
1
NN 2 U
1_
+ y 2 N(
n \ 1 +
a2 n / \
l
; )
(10.12)
For y = [w..J] for some given nonzero w, and N = [nt] + 2r for some given
t> 0 and all positive integers r, one has
N/2 / l-y/2
^ v/z/ l
( _
I1 1 }
J ^
2 2r
exp{ _
= ex 4
t e i
}(x a^w
2
ex 1
l + 0 ())
n^ p 26 p 2v (
z)n]rin
t/1 2 /LW
-zn1 +o(1))
(=exp-2+6
i ( z l r ro
exp
t 2 + ^W exp Y-2 + e (l + 0(1)) (10.13)
2a Cr l Q J
where E does not depend on r and goes to zero as n + oc, and o(l) represents
a term that goes to zero uniformly for all r >, 1 as n --* x. The first passage
time i 2 to z for {X1 } is the same as the first passage time to w = z/a for the
process {I4'} = {X/a}. It follows from (9.12), (9.13) that if
then there is a positive probability that the process { W} will never reach w = z/a
(i.e., t Z = co). On the other hand, the sum of the probabilities in (10.12) over
N> [nt] only gives the probability that the random walk reaches [w,,/n- ] in a
finite time greater than [nt]. By the FCLT, (10.6)(10.8) and (10.12) and (10.13),
we have
x [ du
expf^Zn`-`I/2
1
(2n)i/z w=
exI I p a
v3/z
^ t1i
2a W/.1
2
W2
x exp{----
(v t) dv
2v 2az
1 W/I ) z z
= ^ aw^ exp fj 1 exp W v A,
(2n) l/Z v 3 ' 2 2v 2U2 ^
Differentiating this with respect to t (and changing the sign) the probability
density function of r Z is given by
Therefore,
Izi
.f 2 exp{ -- 1 (z t) 2 } (t > 0). (10.16)
(t) _ (2na2)1J2t3n
In particular, letting p(t; 0, y) denote the p.d.f. (7.1) of the distribution of the
position X at time t, (10.16) can be expressed as (see 4.6)
I I p(t; 0, Z).
.%2. u (t) = C (10.17)
(i) p>O,z<Oor
(ii) p<0,z>0.
In all other cases, (10.16) is a proper probability density function. By putting
p = 0 in (10.16), one gets (10.11).
Consider a simple symmetric random walk {S,} starting at zero. The problem
is to calculate the distribution of the last visit to zero by S o , S,, ... , S. For
this we first calculate the probability that the number of + l's exceeds the
number of l's until time N and with a given positive value of the excess at
time N.
P(S1>0,S2>0,...,S.+b-i> 0,Sa+b=ba)
bb
[(a+ 11 (a+b 111^21a+n(a b - a
(11.1)
b b)a+b(2)a+n.
THE ARCSINE LAW 33
M= (a +
bb i 1) (a+b-1^
since there are altogether (' + 6 1 1 ) paths from (1,1) to (a + b, b a). Now a
straightforward simplification yields
_ a+b ba
M b )a+b
Lemma 2. For the simple symmetric random walk starting at zero we have,
"
=2 Z P(S 1 >O,S 2 >0,...,S Z "_ 2 >0,S en =2r)
r=1
a"
=2,= 2
[ (n +r 1 1) ( 2n+r/](2)
= 2(2nn 1 ( 2n)(^)'"
= P(S2" = 0),
)\ 2 / 2n = n
34 RANDOM WALK AND BROWNIAN MOTION
Theorem 11.1. Let I' ( ' ) = max{ j: 0 ,< j ,< m, Si = 0}. Then
P(F
(2n) = 2k) = P(S2k = 0 )P(S2n-2k = 0 )
2 ) J2n-2k
= \ k /\ 2/ 2k (fl_k k2
(2k)!(2n 2k)! ( i'\"
= fork =0,1,2,..., n. (11.3)
(k!) 2 ((n k)!)2 2
Theorem 11.2. (The Arc Sine Law). Let {B,} be a standard Brownian motion
at zero. Let y = sup{t: 0 < t 5 1, B, = 0}. Then y has the probability density
function
P(y < x) =
fo x
.f(y) dy = sin
n
1 x-. (11.6)
(2k)!(2n 2k)!
lim P(I' (Z " 1 < 2nx) = lim [^1 2_2n
n-w n-. k=o (k!) 2 ((n k)!)z
(2x)1/2e-2(n-k)(2(n k))[2cn-k)+#12-2n
x \ ((2n)'1'e-("-k)(n k)n-k++)2
[nxxl
= lim Y-
1/2
n-+oo k=o 7i (n k)
1 Intl 1 1 _ 1
1x
= n li k= n k
k 112 n o (y( 1 y)) 1I2 dy
\ n^ 1
n//
Corollary 11.3. Let {Z,, Z 2 , ...} be a sequence of i.i.d. random variables such
that EZ, = 0, EZ i = 1. Then, defining {X^" 1 } as in (8.4) and y ( n 1 as above, one has
From the arcsine law of the time of the last visit to zero it is also possible
to get the distribution of the length of time in [0, 1] the standard Brownian
motion spends on the positive side of the origin (i.e., an occupation time
law) again.as an arcsine distribution. This fact is recorded in the following
corollary (Exercise 2).
and
Cov(B*, B*) = Cov(B,,, B, 2 ) t 2 Cov(B,,, B 1 ) t, Cov(B,,, B 1 )
+ t1t2 Cov(B 1 , B1)
= tl t2tl t1t2 + t1t2 = t1(1 t2), for t i t2. (12.3)
From this one can also write down the joint normal density of (B*, B*, ... , B)
for arbitrary 0 < t l < t 2 < < t k < 1 (Exercise 1).
The Brownian bridge arises quite naturally in the asymptotic theory of
statistics. To explain this application, let us consider a sequence of real-valued
i.i.d. random variables Y1 , Y2 ,... , having a (common) distribution function F.
The nth empirical distribution is the discrete probability distribution on the line
assigning a probability 1/n to each of the n values Y Y2 , ... , Y. The
corresponding distribution function F. is called the (nth) empirical distribution
function,
1
F(t)=#{j:1<j<n,Y; <t}, co<t<00, (12.4)
n
FF(t)
1 ^--
3 ~ I
Figure 12.1
THE BROWNIAN BRIDGE 37
is the sum of n i.i.d. Bernoulli random variables each taking the value I with
probability F(t) = P( t) and the value 0 with probability 1 F(t). Now
E(1 (y . ) ) = F(t) and, for t 1
=
F(t1)( 1 F(t2)),
Cov( 1 (Y ; s1,) , 1{Yks:2))
to, ifj = k,
(12.6)
is asymptotically (as n > oo) Gaussian with mean zero and variance
F(t)(1 F(t)). For t l < t 2 < . < t k , the multidimensional central limit
theorem applied to the i.i.d. sequence of k-dimensional random vectors
( 1 (Y)' 1(Y;st^)' ... , 1 (Y ; ,Ik t) shows that (.(Fn(t1) F(t1)), \/(F,(t2) F(ti)),
. , f (F,,(t k ) F(t k ))) is asymptotically (k-dimensional) Gaussian with zero
mean and dispersion matrix E = ((a s,)), where
In the special case of observations from the uniform distribution on [0, 1],
one has
so the sequence U 1 = F(Y1 ), U2 = F(Y2 ), . .. is i.i.d. uniform on [0, 1]. The same
is true more generally (Exercise 2). Let F be the empirical distribution function
of Y1 , ... , Y,,, and G. that of U1 ,. .. , U. Then, since the proportion of Yk 's,
1 < k < n, that do not exceed t coincides with the proportion of Uk 's, 1 < k < n,
that do not exceed F(t), we have
If a = oo (b = + oo), the index set [a, b] for the process is to exclude a (b).
Since ^(G(t) t), 0 < t <, 1, converges in distribution to the Brownian bridge,
and since t -+ F(t) is increasing on (a, b), one derives the following extension
of Proposition 12.1.
D. = sup "I F(t) F(t)I = sup ' I G(F(t)) F(t)I = sup ,/ /IG(t) tl.
a_<t5b a5t5b 0-<t<,1
(12.11)
Thus, the distribution of D. is the same (namely that obtained under the uniform
distribution) for all continuous F. This common distribution has been tabulated
for small and moderately large values of n (see theoretical complement 2). By
Proposition 12.2, for large n, the distribution is approximately the same as that
of the statistic defined by (also see theoretical complement 1)
D:= sup (B*I. (12.12)
o,t,1
STOPPING TIMES AND MARTINGALES 39
These facts are often used to test the statistical hypothesis that observations
Y,, Y2 , ... , Y are from a specified distribution with a continuous distribution
function F. If the observed value, say d, of D is so large that the probability
(approximated by (12.13) for large n) is very small for a value of Dn as large as
or larger than d to occur (under the assumption that Y,, ... , Y do come from
F), then the hypothesis is rejected.
In closing, note that by the strong law of large numbers, F (t) -+ F(t) as
F
Definition 13.1. A stopping time r for the process {X} is a random variable
taking nonnegative integer values, including possibly the value + oo, such that
If X. does not lie in B for any n, one takes T B = oo. Sometimes the minimum
in (13.3) is taken over In >, 1, X . e B} , in which case we call it the first return
time to B, denoted rl B
A less interesting but useful example of a stopping time is a constant time,
T:= m (13.4)
Again, if X. does not lie in B for any n> TB 1) , take ie^ = oo. Also note that
if i8 ) = oo for some r then r ' = oo for all r' r. It is a simple exercise to check
( )
then
ES= ES o . (13.6)
Assumptions (2), (3) ensure that ES, is well defined and finite. Assumption
(4) is of a technical nature, but cannot be dispensed with. To demonstrate this,
consider a simple symmetric random walk {S n } starting at zero (i.e., S o = 0).
Write r y for T { ^, ) , the first passage time to the state y, y 0. Then (1), (2), (3)
are satisfied. But
ES,Y=y 0. (13.13)
The reason (13.6) does not hold in this case is that assumption (4) is violated
(see Exercise 4). If, on the other hand,
where a and b are positive integers, then P(T < co) = 1. There are various ways
of proving this last assertion. A more general result, namely, Proposition 13.4,
is proved later in the section to take care of this condition. To check condition
(3) of Theorem 13.1, note that ISJ < max{a, b}, so that EISB I < max{a, b}. Also,
on the set {tr > m} one has a <Sm <b and therefore
42 RANDOM WALK AND BROWNIAN MOTION
Thus condition (4) is verified. Hence the conclusion (13.6) of Theorem 13.1
holds. This means
( a >z b ) =
Pz_ a (, 13.17)
a+b
a result that was obtained by a different method earlier (see Chapter I, Eq. 3.13).
To deal with the case EX 0, as is the case with the simple asymmetric
random walk, the following corollary to Theorem 13.1 is useful.
Then
which yields (13.18). Note that EISB I < EISLI + (Er)I j! < oo, by (2') and (3'). Also
Er= (b+a)P(T_a>rb)a
(13.21)
p R
1_
(P) a
P(ta > Tb) _(q'\)a+b'
,)a
P q( a (13.22)
P_y
b+a (
Assumption (2') for this case follows from Proposition 13.4 below, while (3'),
(4'), follow exactly as in the case of the simple symmetric random walk (see
Eq. 13.15).
In the proof of Theorem 13.1, the only property of the sequence {X} that
is made use of is the property
I
E(Xa 1 {X0 ,X 1 ,...,X.})=0 (n= 0,1,2,...). (13.23)
I
E(S +1 {S o ,...,Sn })=S (n = 0, 1,2,...), (13.24)
44 RANDOM WALK AND BROWNIAN MOTION
since
X=SS i (n=1,2,3,...),
- Xo=So, (13.26)
satisfies (13.23).
Martingales necessarily have constant expected values. Likewise, if {X} is a
martingale difference sequence, then EX = 0 for each n >, 1. Theorem 13.1,
Corollary 13.2, and Theorem 13.3 below, assert that this constancy of
expectations of a martingale continues to hold at appropriate stopping times.
In the gambling setting, the martingale property (13.24), or (13.23), is often
taken as the definition of a fair game, since whatever be the outcomes of the
first n plays, the expected net gain at the (n + 1)st play is zero. As an example
of a strategy for the gambler, suppose that it is decided not to stop until an
amount a is lost or an amount b is gained, whichever comes first. Under (13.23)
and conditions (2)(4) of Theorem 13.1, the expected gain at the end of the
game is still zero. This conclusion holds for more general stopping times, as
stated in Theorem 13.3 below. Before this result is stated, it would be useful to
extend the definition of a martingale somewhat. To motivate this new definition,
consider a sequence of i.i.d. random variables { Y: n = 1, 2, ...} such that
EY = 0, EY,, = Var()) = 1. Then {S n: n = 0, 1, 2, ...} is a martingale,
where So is an arbitrary random variable independent of { Y: n = 1, 2, ...}
satisfying ES < oo.
To see this, form the difference sequence
Xo _
Szo . (13.27)
Then, writing Yo = So ,
I
E(Xn+ 1 {X0, Xl, X2, .... Xn })
by (13.28). Thus,
implies that
In general, however, the converse is not true; namely, (13.31) does not imply
(13.30). To understand this better, consider a sequence of random variables
{Y: n = 0, 1, 2, ...}. Suppose that {X: n = 0, 1, 2, ...} is another sequence of
random variables such that, for every n, X0 , X 1 , X2,... , X can be expressed
as functions of Yo , Y1 , Y2 . . , Y. Also assume EXn < oo for all n. The condition
(13.31) implies that X, +1 is orthogonal to all square integrable functions of
X0 , X 1 , ... , X, while (13.30) implies that X + , is orthogonal to all square
integrable functions of Yo , Y1 .....} (Chapter 0, Eq. 4.20). The latter class of
functions is larger than the former class. Property (13.30) is therefore stronger
than property (13.31).
One may express (13.30) as
Note that a martingale in this sense is also a martingale in the sense of Definition
13.2, since (13.30) implies (13.31). In order to state an appropriate generalization
of Theorem 13.1 we need to extend the definition of stopping times given earlier.
Then
Let Yo = 0, ^ = Q{ Yo , Y,, ... , Y}. Then, by (13.27) and (13.28), the sequence
of random variables
and
a2b 2
Et=ES, =a 2 P(r_ a <T b )+ b 2 P(t_ a > Tb) = (13.41)
a+b + a6 ab ab.
Proof. There exists an c> 0 such that either P(X" > c) > 0 or P(X" < c)> 0.
Assume first that 6 := P(X" > c) > 0. Define
no = r a+b l
+ 1, (13.43)
i
Ls
where [(a + b) /E] is the integer part of (a + b) /E. No matter what the starting
position x E (a, b) of the random walk may be, if X" > E for all
n=1,2,...,n o , then S=x+X 1 ++X" o >x+n o e>x +a+b>,b.
Therefore,
By (13.44),
Next,
Now
The equality in (13.49) is due to the fact that the distribution of (X 1 , X2 , ... , Xno )
is the same as that of (Xn o +l, Xn o +z, . .. , X20 ). Note that Sao E ( a, b) on the
set A 1 . Hence the last probability in (13.49) is not larger than 1 5 by (13.46).
Therefore, (13.47) yields
P(T > kn o ) = E(1 42 IA,,) = E[1A, ... 1 A,,- E(lAk {S1, ... , S(k- 1)no })]
E[I A1 ...l Ak- ,( 1 So)] < (1 b o )P(A 1 n ... n Ak-i)
0o kno
_ )e'I'IP(-r = m)
k=1 m= (k- 1)no+1
STOPPING TIMES AND MARTINGALES 49
w oo
e k ^ 0I=I P(2 > (k 1)n0) Ze (1 6) k-1
k=1 k=1
log(1 6)
no^z^
= e ((1 b)e Holz)) k-1 < cc for jzj < . (13.53)
k=1 no
One may proceed in an entirely analogous manner assuming P(X,, < E) > 0.
n
Thus, Proposition 13.4 has the following extension, which is useful in studying
processes other than random walks.
Proposition 13.5. The conclusion of Proposition 13.4 holds if, instead of the
assumption that {X} is i.i.d., (13.54) holds for a pair of positive numbers s, S.
and
Proof. (a) Write .Fk := a{Z 0 , ... , Zk }, the sigmafield of events determined by
Z0 ,. . . , Z. Consider the events A o '= {IZ O I % A}, A k := {IZ; I <A for 0 < j < k,
IZkI >, A} (k = 1, ... , n). The events A k are pairwise disjoint and
n
UA k ={M 2}.
0
Therefore,
n
P(MM > A) _ Y P(A k ). (13.58)
k=0
Now I < IZk !/.1 on A k . Using this and the martingale property,
Hence,
r
E( 1 AkZ^) ? E[ 1 AkE(Zk I . )] = E[E(1 Ak Zk I k)] = E( 1 AkZk). (13.63)
STOPPING TIMES AND MARTINGALES 51
co
EMS =
n
2
f A d2 dP = 2
om n o
21(M,M d i) dP
=
21 o
2(J 1{MA} dP) dA = 2
n f ,0 2P(M > A) dA. (13.64)
0
=
2 fr i
IZIM dP = 2 E(IZI M) < 2(EZ2)'/ 2 (EMn) 112 , (13.65)
using the Schwarz Inequality. Now divide the extreme left and right sides of
(13.65) by (EM)" 2 to get (13.57). n
where;:= r{X: 0 < u < t} is the sigmafield of events that depend only on the
("past" of the) process up to time t. As in the discrete-parameter case, first
passage times to finite sets are stopping times. Write for Bore! sets B c ff8 1 ,
r B :=min{t>,0:X,eB}, (13.68)
for the first passage time to the set B. If B is a singleton, B = {y}, T, is written
simply as r y , the first passage time to the state y.
52 RANDOM WALK AND BROWNIAN MOTION
For a Brownian motion {X} with drift , the process {Z, := XX t} is easily
seen to be a martingale with respect to {A}. For this {X,} another example is
{Z, :_ (X1 t) 2 ta 2 }, where a 2 is the diffusion coefficient of {X1 }.
The following is the continuous-parameter analogue of Theorem 13.6(b).
and
EM, <,4EZ,. (13.71)
Proof. For each n let 0 = t,, n <t 2 ,, < < = t be such that the sets
I:= {t j ,: I j < n} are increasing (i.e., I n c I + ,) and U 1. is dense in [0, t].
Write M1 , n := max{,Zj: I < j < n}. By (13.56),
P(Mr,, )) EZ`
> 2
Letting n j oo, one obtains (13.70) as the sets F := {M A,} increase to a set
that contains {M, > Al as n j oo.
Next, from (13.57),
CHAPTER APPLICATION 53
Then
It is simple to check that T " is a stopping time with respect to the sequence of
( )
sigmafields {.k2 -n: k = 0, 1, ...}, as is r " A r for every positive integer r. Since
( )
EZT ., A , = EZ o . (13.74)
Now ' , T for all n and T " J, r as n j oc. Therefore, t ( " ) A r j r A r. By the
( )
One may apply (13.72) exactly as in the case of the simple symmetric random
walk starting at zero (see (13.16) and (13.17)) to get, in the case it = 0,
for arbitrary a > 0, b > 0. Similarly, applying (13.72) to {Z, :_ (XX - tp) 2 - ta l l,
as in the case of the simple asymmetric random walk (see (13.20)-(13.22), and
use (9.11)),
Cl - exp^_2a2 11(b + a)
(b + a)P(r_ Q > TO - a = 6 }) - a
Et _ __ (13.76)
a6 ^ 2ba '
(1 - exp{ - (+
6Z )N 1)
(
sediment deposition, erosion, etc.) constrain the life and capacity of a reservoir.
However, a particular design parameter analyzed extensively by hydrologists,
based on an idealization in which water usage and natural loss would occur at
an annual rate estimated by YN units per year, is the (dimensionless) statistic
defined by
RN MNmN
DN DN (14.1)
where
MN:=max{SnYN:n=0, 1,...,N}
_ (14.2)
mN :=min{SnYN :n=0, 1,...,N},
DN :=r (Y YN ) Z ] IIZ ,
YN = - -SN .
' (14.3)
First consider that, by the central limit theorem, SN = Nd + O(N 1 " 2 ) in the
sense that (SN Nd)/..N is, for large N, distributed approximately like a
Gaussian random variable with mean zero and variance 0 2 . If one defines
MN =max{Snd:0 n<N},
r N =min{Snd:0^n<,N}, (14.5)
RN = MN - rN,
then by the functional central limit theorem (FCLT)
Q^N
N
-
mN l (M, m), R
Q^N
NvJ /
^^
as N oo, (14.6)
CHAPTER APPLICATION 55
M := max{B,: 0 . t . l } ,
m:=min{B,:0 < t 1}, (14.7)
R':=M-m,
R N RN
(14.9)
V '. DN \/'. Q
where "-S" indicates "asymptotic equality" in the sense that the ratio of the
two sides goes to 1 as N -- oo. This implies that the asymptotic distributions
of the two sides of (14.9) are the same. Next notice that
MN _ ( (S-nd)-n(, -d))
max
v,IN o-<n,<N aIN
and
mN - [Nt] ( SN d \l
= min ( SIN`S [Nt]d min (B, - tB l ) := m,
and
^
C MN
a \/
mN
' aN
V
-^^(M,m). (14.11)
56 RANDOM WALK AND BROWNIAN MOTION
Therefore,
RN RN
(14.12)
DN . 7
/ a..JN
where R is a strictly positive random variable. Once again then, R N /DN , the
so-called rescaled adjusted range statistic, is of the order of O(N 1 J 2 ).
The basic problem raised by Hurst is to identify circumstances under which
one may obtain an exponent H > 2. The next major theoretical result
following Feller was again somewhat negative, though quite insightful.
Specifically, P. A. P. Moran considered the case of i.i.d. random variables
Y1 , Y2 , having "fat tails" in their distribution. In this case the re-scaling by
...
Y. = X + f (n). (14.13)
= S* + Z f(j), So = 0, (14.14)
i= 1
where
S,*= X1 ++X. (14.15)
N _
D^:=1 L. (}'n Y, )2
N L=1
1 N1
N 2
/' N
= N n^l (X XN) 2 + N nZl ( f (n) JN) 2 + N nl (f(n) fN)(Xn XN)
1 N 2 N
=D2 + I (f(n) fN)Z + ( f (n) fv)(X XN). (14.17)
N= 1 N=1
Also write
_ _ n _
m N = min {SS nYN } = min Sn nXN + Y- (f (j) fN )} , (14.18)
O-<n-<N O-<n-<N j=1
IN)' /IN(0) 0,
^N(n) : jY-
=1 (1(f)
(14.19)
AN max IN(n) min PN(n).
O-<n<-N 05n-<N
Observe that
and
OSn-<N O-<n-<N
(14.21)
From (14.20) one gets R N < A N + RN, and from (14.21), R N > A N R. In
58 RANDOM WALK AND BROWNIAN MOTION
other words,
IR N - A N S < R. (14.22)
The second term on the right clearly tends to zero as N increases. Also, by
Schwarz inequality,
1 N 1 N 1/2 N 1/2
- (f(n)-a)(Xn-d) -< 1 Z (f(n)- a)2
)(X^
^ - d)2
N n=1 N n=1 ( =J
Theorem 14.1. If f (n) converges to a finite limit, then for every H > 2,
In particular, the Hurst effect with exponent H > Z holds if and only if, for
some positive number c',
lm ^
N = c'. (14.28)
N-' N "
Example 1. Take
First let < 0. Then f (n) a, and Theorem 14.1 applies. Recall that
AN = max N (n) min P N (n), (14.30)
0-<n5N O-<n-<N
where
n
N
) 14.32)
N(n) pN(n 1) = c n !J (
is positive for n < (N ' Ij 1j6)116, and negative or zero otherwise. This shows
-
1
1 N 1/
n o = j jQ (14.33)
where [x] denotes the integer part of x. The minimum value of p(fl) is zero,
60 RANDOM WALK AND BROWNIAN MOTION
ON = /N(no) = c Y ^k 1 E j) (14.34) .
k=1 Ni=i
ng N c1 Ni+fl,
cno > -1,
1 -^ "
\ 1+
A N cn o (n^ ' log n o N -1 log N) c log N, = 1, (14.37)
X
C Y_ j# = c 2 , <-1.
j=1
CASE 1: 2 < < 0. In this case Theorem 14.1 applies with H() = 1 + > 2.
Note that, by Lemma 1, DN a with probability 1. Therefore,
CASE 2: < Z. Use inequality (14.23), and note from (14.37) that
O N = o(N l l 2 ). Dividing both sides of (14.23) by DN N 112 one gets, in probability
asN --goo,
RN R" R"
if < 2 .
DN N 1 I 2 DN N 1 I 2 QN1 2
1 (14.39)
CHAPTER APPLICATION 61
CASE 3: = 0. In this case the Y are i.i.d. Therefore, as proved at the outset,
the Hurst exponent is 2.
CASE 4: > 0. In this case Lemma 1 does not apply, but a simple computation
yields
that
_nl S nYN
ZN( N) forn= l,2,...,N,
V ' DN
and linearly interpolated between n/N and (n + 1)/N. Then {Z N (s)} converges in
distribution to {BS + 2c,(1 ../s)}, where B'} is the Brownian bridge. In
this case the asymptotic distribution of R N /( DN) is the nondegenerate
distribution of
max B min B,
oss,i osssi
(Exercise 1).
The graph of H() versus in Figure 14.1 summarizes the results of the
preceding cases 1 through 5.
62 RANDOM WALK AND BROWNIAN MOTION
Figure 14.1
In other words, for large N the plot of log R N /DN against log N should be
approximately linear with slope H = 1 + , if < < 0.
Under the i.i.d. model one would expect to find a fluctuation between the
maximum and the minimum of partial sums, centered around the sample mean,
over a period N to be of the order of N 1 / 2 . One may then try to check the
appropriateness of the model, i.e., the presumed i.i.d. nature of the observations,
by taking successive (disjoint) blocks of Y. values each of size N, calculating
the difference between the maximum and minimum of partial sums in each
block, and seeing whether this difference is of the order of N" 2 . In this regard
it is of interest that many other geophysical data sets indicative of climatic
patterns have been reported to exhibit the Hurst effect.
EXERCISES
Each integer lattice site of Z' is independently colored red or green with
probabilities p and q = 1 - p, respectively. Let E m be the event that the number
of green sites equals the number of red sites in the block of sites of side lengths
2m (sites per side) with a corner at the origin. Calculate P(E m i.o.) for d > 3. [Hint:
Use the Borel-Cantelli Lemma, Chapter 0, Lemma 6.1.]
5. A die is repeatedly tossed and the number of spots is recorded at each stage. Fix
j, 1 _< j < 6, and let p,, be the probability that j occurs among the first n tosses.
Calculate p. and the probability that j eventually occurs.
6. (A Fair Coin Simulation) Suppose that you are given a coin for which the
probability of a head is p where 0 < p < 1. At each unit of time toss the coin twice
and at the nth such double toss record:
XX = 1 if a head followed by a tail occurs,
X, = -1 if a tail followed by a head occurs,
XX = 0 if the outcomes of the double toss coincide.
Let,r=min{n_> 1:X= 1 or -1}.
(i) Verify that P(t < oo) = 1.
(ii) Calculate the distribution of Y = X.
(iii) Calculate Er.
7. Show that the two probability distributions for unending independent tosses of a
coin, corresponding to distinct probabilities Pi 0 p Z for a head in a single toss,
assign respective total probabilities to mutually disjoint subsets of the coin-tossing
sample space S2. [Hint: Consider the density of l's in the various possible sequences
in S2 and use the SLLN (Chapter 0, Theorem 6.1). Such distributions are said to
be mutually singular.]
8. Suppose that M particles can be in each of N possible states s,, S21'.. , s N . Construct
a probability space and calculate the distribution of (X ... , XN ), where Xi is the
number of particles in state s ; , for each of the following schemes (i)-(iii).
(i) (Maxwell-Boltzmann) The particles are distinguishable, say labeled m i , . .. , m M
,
and are randomly assigned states in such a way that all possible
distinct assignments are equally likely to occur. (Imagine putting balls
(particles) into boxes (states).)
(ii) (Bose-Einstein) The particles are not distinguishable but are randomly
assigned states in such a way that all possible values of the numbers of particles
in the various states are equally likely to occur.
(iii) (Fermi-Dirac) The particles are not distinguishable but are randomly
assigned states in such a way that there can be at most one particle in any
one of the states and all possible values of the numbers of particles in various
states under the exclusion principle are equally likely to occur.
(iv) For each of the above distributions calculate the asymptotic distribution of
X as M and N -. oo such that MIN .- where p > 0 is the asymptotic
;
1/3 " - t . In particular, the probability (under f) that a randomly selected point
belongs to I", i.e., the length of 1", is P(I") = 2" '/3" '. The sets
- -
first time at the nth stage. Then F 0 is well defined and has a continuous
extension to a function F on all of [0,1] with F(l) = I and F(0) = 0.]
u u 0 u II u
n=5
Figure Ex.I.2
test is needed. If the test of the pool is positive then at least one individual has the
disease and each of the m persons must be retested individually, resulting in this
event in m + 1 tests. Let X 1 , X 2 , ... , XN be an i.i.d. sequence of 0 or 1 valued
random variables with p = P(X = 1) for n = 1, 2, .... Let the event X. = l} be
used to indicate that the nth individual is infected; then the parameter p measures
the incidence of the disease in the population. Let S. = X, + X 2 + + X. denote
the number of infected individuals among the first n individuals tested, S o = 0. Let
Tk denote the number of tests required for the kth group of m individuals tested,
k = 1,2,...,[N/m]. Thus, form? 2, Tk = m + I if5,kSm(k_ ^0and Tk = I if
Smk 'Sm(k-,) = 0. The total number of tests (cost) for N individuals tested in groups
of size m each is
[N!m]
C N = Tk+(Nm[N/m]), m? 2.
k=1
Find m such that, for given N (large) and p, the expected number of tests per person
is minimal. [Hint: Consider the limit as N oo and show that the optimal m, if
one exists, is the integer value of m that minimizes the function
ECN I
c(m)=lim =1+--(1p) m , m? 2, c(1) = 1.
N -' N in
Analyze the extreme values of the function g(x) = (I/x) (1 p)x for x > 0
(see D. W. Turner,. F. E. Tidmore, D. M. Young (1988), SIAM Review, 30,
pp. 119-122).]
4. Let {S;} be the simple symmetric random walk starting at x and let
66 RANDOM WALK AND BROWNIAN MOTION
u(n, y) = P(S = y). Verify that u(n, y) satisfies the following initial value problem
sink{y (x c)}
P(Td < Tx) = exp{y p (d x)} where y,, = ln((q/p)" 2 )
Binh{y,,(d c)}'
3. Justify the use of limits in (3.9) using the continuity properties (Chapter 0, (1.1),
(1.2)) of a probability measure.
4. Verify that P(S.x j4 y for all n >, N) = 0 for each N = 1, 2, ... to establish (3.18).
5. (i) If p < q and x < d, then give the symmetry argument to calculate, using (3.9),
the probability that the simple random walk starting at x will eventually reach
d in (3.10).
(ii) Verify that (3.10) may be expressed as
R. represents the number of distinct states visited by the random walk in time 0 to n.
(i) Show that E(R/n) -' ^p qi as n -* oo. [Hint: Write
_ (l ifSk;S;forallj=0,1,...,k-1,
Ik Sl0 otherwise.
Then
For the case P(X 1 = 0) = 0, take logarithms and apply Jensen's Inequality (Chapter
0, (2.7)) and the SLLN (Chapter 0, Section 0.6) to show log T. -- 00 a.s. Note the
strict inequality in Jensen's Inequality by nondegeneracy.]
15. Let {S"} be a simple random walk starting at 0. Show the following.
(i) If p = Z, then P(S" = 0) diverges.
(ii) If p Z, then =o P(S" = 0) = (1 4pq) -112 = Ip q1'. [Hint: Apply the
Taylor series generalization of the Binomial theorem to z"P(Sn = 0) noting
that
C 2n (pq)" =
n )
(_1)(4pq)n (
/
\ n
2 I ]
(iii) Give a proof of the transience of 0 using (ii) for p ^ Z. [Hint: Use the
BorelCantelli Lemma (Chapter 0).]
16. Define the backward difference operator by
(iv) Verify that if two harmonic functions agree on a, then they must coincide on
all of [c, d].
(v) Give an alternate proof of the fact that a symmetric simple random walk starting
at x in (c, d) must eventually reach the boundary based on the above
maximum/minimum principle for harmonic functions. [Hint: Verify that the sum
of two harmonic functions is harmonic and use the above ideas to determine
the minimum of the escape probability from [c, d] starting at x, c -< x -< d.]
17. Consider the simple random walk with p -< q starting at 0. Let N denote the number ;
of visits to j > 0 that occur prior to the first return to 0. Give an argument that
EN; = (p/q)'. [Hint: The number of excursions to j before returning to 0 has a
geometric distribution. Condition on the first displacement.]
Let T" denote the time to reach the boundary for a simple random walk {S}
starting at x in (c, d). Let p = 2p - 1, a 2 = 2p.
(i) Verify that ET" < oo. [Hint: Take x = 0. Choose N such that P(ISI > d - c).
Argue that P(T > rN) -< ('-,)', r = 1, 2, ... using the fact that the r sums over
(jN - N, jN], j = 1, ... , r, are i.i.d. and distributed as SN.]
(ii) Show that m(x) = ETx solves the boundary value problem
aZ
2 -V m+pVm= -1,
m(c) = m(d) = 0, Vm(x):= m(x) m(x 1).
(iii) Find an analytic expression for the solution to the nonhomogeneous boundary
value problem for the case p = 0. [Hint: m(x) = -x 2 is a particular solution
and 1, x solve the homogeneous problem.]
(iv) Repeat (iii) for the case p 96 0. [Hint: m(x) = Iq - p 'x is a particular solution
-
(i) Y- N
N
( I N +y 2 -N =1
1 forally#0.
NIIyI
N+yeven 2
Y
N,IyI
J IyIIN
N
N
+ 1
1
p (N+y)lZ q (N y)/2 = / p\ v
f I
fort'>0
for y < 0.
N+ y even 2 \q )
70 RANDOM WALK AND BROWNIAN MOTION
3. (A Reflection Property) For the simple symmetric random walk {S} starting at 0
show that, for y > 0,
MN =max{S:n=0, 1,2,...,N},
m N =min{S:n=0, 1, 2,...,N}.
N
_ P(Tn =n)P(SN _>-ba)
n=1
8. What percentage of the particles at y at time N are there for the first time in a dilute
system of many noninteracting (i.e., independent) particles each undergoing a simple
random walk starting at the origin?
*9. Suppose that the points of the state space S = 1 are painted blue with probability
EXERCISES 71
Nn(P) _ 1B(Sk)
k=0
(i) Show that EN(p) = (n + 1)p. [Hint: EI B (Sk ) = E{E[I B (Sk ) I Sk ]}.]
(ii) Verify that
for p = Z
lim Var{ Nn(p) l - cap
l n ) P(I - P)
for p z.
1p - 9^
10. Apply Stirling's Formula, (k! _ (27rk)'/ Z k k e -k (1 + 0(l)) as k -. 00), to show for the
simple symmetric random walks starting at 0 that
(i) P(T N)
'
as N -^ oo.
(2rz)1'^z N-3/2
I. (i) Complete the proof of P61ya'.s Theorem fork > 3. (See Exercise 5 below.)
(ii) Give an alternative proof of transience for k > 3 by an application of the
Borel-Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the
lemma be directly applied to prove recurrence for k = 1, 2?
2. Show that
kO \k/ Z =()
/ .
[Hint: Consider the number of ways in which n balls can be selected from a box
of n black and n white balls.]
3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability
of a return to (0, 0) at time 2n is the same as that for two independent walkers,
one along the horizontal and the other along the vertical, to be at (0, 0) at
time 2n. Also verify this by a geometric argument based on two independent
walkers with step size 1/ f and viewed along the axes rotated by 450
72 RANDOM WALK AND BROWNIAN MOTION
(ii) Show that relations (5.5) hold for a general random walk on the integer lattice
in any dimension. Use these to compute, for the simple symmetric random
walk in dimension two, the probabilities fj that the random walk returns to
the origin at time j for the first time for j = 1, ... , 8. Similarly compute fj in
dimension three for I < j < 4.
4. (i) Show that the method of Exercise 3(i) above does not hold in k = 3 dimensions.
(ii) Show that the motion of three independent simple symmetric random walkers
starting at (0, 0, 0) in 71 3 is transient.
5. Show that the trinomial coefficient
n n!
( j, k, n j k j!k!(njk)!
n ^ n
( j, k,njk)_ < (J,K,nJK)
2k-1 '"
P(S+1,...,S,+mE{So,...,S}`5
2k )
S0=(0,0,...,0).
Show that the configuration in which all switches are off is recurrent in the cases
k = 1, 2. The general case will follow from the methods and theory of Chapter II
when k < oo. The problem when k = cc has an interesting history: see F. Spitzer
(1976), Principles of Random Walk, Springer Verlag, New York, and references
therein.
*12. Use Exercise 11 above and the examples of random walks on Z to arrive at a
general formulation of the notion of a random walk on a group. Describe a random
walk on the unit circle in the complex plane as an illustration of your ideas.
13. Let {X"} denote a recurrent random walk on the 1-dimensional integer lattice.
Show that
[Hint: Translate the problem by x and consider that starting from 0, the number
of visits to 0 before hitting x is bounded below by the number of visits to 0
before leaving the (open) interval centered at 0 of length lxi. Use monotonicity to
pass to the limit.]
for an arbitrarily prescribed sequence E i ..... E" of l's and 0's. [Hint:
p(w, n) = J^ ^ Iw" 7"I/ 2 " metrizes the product topology on S2. Consider the
open balls of radii of the form r = 2 -^' centered at sequences which are 0 from
some n onward and use separability.]
74 RANDOM WALK AND BROWNIAN MOTION
(ii) Let {P"} be a consistent family of probability measures, with P" defined on
(Ii",."), and such that P. is concentrated on Q, = {0,1 }". Define a set function
for events of the form F = F(E 1 , ... , e") in (i), by
where B c {0, 1 }", which agrees with this formula for Fe .f", n >_ 1.
(iii) Show that F := Un 1 .f" is a field of subsets of Q but not a sigmafield.
(iv) Show that P is a countably additive measure on F. [Hint: f) is compact and
the cylinder sets are both open and closed for the product topology on D.]
(v) Show that P has a unique extension to a probability measure on F. [Hint:
Invoke the Caratheodory Extension Theorem (Chapter 0, Section 1) under (iii),
(iv)]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter
stochastic process.
4. Let (S2, F, P) and (S, 2) represent measurable spaces. A function X defined on S2
and taking values in S is called measurable if X - `(B) E F for all Be 2', where
X -' (B) = {X E B} _ {w E D: X(o) e B}. This is the meaning of an S-valued random
variable. The distribution of X is the induced probability measure Q on ,P defined by
Let (S2, .F, P) be the canonical model for nonterminating repeated tosses of a coin
and X(a) = w", WE n. Show that {X" X.1, . ..}, m an arbitrary positive integer,
is a measurable function on (S2, F, P) taking values in (S2, F) with the distribution
P; i.e., {X,", Xm+ ,, ... , } is a noncanonical model for an infinite sequence of coin
tossings.
5. Suppose that Di" and D'2 are covariance matrices.
(i) Verify that aD" + D 21 , a, _> 0, is a covariance matrix.
(ii) Let {D ( " ) = ((a;;'))} be a sequence of covariance matrices (k x k) such that
lim a v;j ) = a i; exists. Show that D = ((a u )) is a covariance matrix.
*6. Let (t) = f e"'p(dx) be the Fourier transform of a positive finite measure p.
(Chapter 0, (8.46)).
(i) Show that ((t, t.)) is a nonnegative definite matrix for any t, < < t k .
(ii) Show that (t) = e - I` 1 if p is the Cauchy distribution.
*7. (Plya Criterion for Characteristic Functions) Suppose that 4 is a real-valued
nonnegative function on ( cc, oo) with 4(t) = 4(t) and 0(0) = 1. Show that if 0
is continuous and convex on [0, cc), then 0 is the Fourier transform (characteristic
function) of a probability distribution (in particular, for any t, < t 2 < < t k , k >_ 1,
((4(t ; ti ))) is nonnegative definite by Exercise 6), via the following steps.
(i) Check that 1 Its, Its _< 1, t E 68',
Y(t) = 0,
Itl > 1
EXERCISES 75
Y( . ) (X, - mnt)
-
t
4. Let {X} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and
zero drift.
(ii) Show that the process has the following scaling property. For each A > 0 the
process {Y} defined by Y, = A - ' t Z Xx, is distributed exactly as the process Al.
(ii) How does (i) extend to k-dimensional Brownian motion?
5. Let {X} be a stochastic process which has stationary and independent increments.
(i) Show that the distribution of the increments must be infinitely divisible; i.e., for
each integer n, the distribution of X, - X, (s < t) can be expressed as an n-fold
convolution of a probability measure p".
(ii) Suppose that the increment 24 - X s has the Cauchy distribution with p.d.f.
(t - s)/n[(t - s) 2 + x 2 ] for s < t, x e I8'. Show that the Cauchy process so
described is invariant under the rescaling { Y} where Y = A -' Xi, for A > 0; i.e.,
{Y,} has the same distribution as {X,}. (This process can be constructed by
methods of theoretical complements 1, 2 to Section IV.1.)
6. Let {X} be a Brownian motion starting at 0 with zero drift and diffusion coefficient
a 2 > 0. Define Y,= JXJ,t ?0.
(i) Calculate EY, Var Y,.
(ii) Is { Y} a process with independent increments?
7. Let R, = X, where {Xj is a Brownian motion starting at 0 with zero drift and
diffusion coefficient a 2 > 0. Calculate the distribution of R.
8. Let {B,} be a standard Brownian motion starting at 0. Define
11. Let {X} be any mean zero Gaussian process. Let t, < t 2 < < t".
(i) Show that the characteristic function of (X r ,... .. X,^) is of the form e -44 ) for
some quadratic form Q(E,) = <A4, i>.
(ii) Establish the pair-correlation decomposition formula for block correlations:
0 if n is odd
E{X,,X, 2 ...X,} =
* E{X,X, }. E{X,X, k } if n is even,
where Y* denotes the sum taken over all possible decompositions into all possible
disjoint pairs {t ; , t 3 }, ... , {t,,,, t,} obtained from {t,.. ... t"}. [Hint: Use induction
on derivatives of the (multivariate) characteristic function at (0, 0, ... , 0) by
first observing that c?e -1 Q 14) /c7 i = a,e t'2 and c?x ; /a5 t = a ;j , where
x i = Y J a ;j ^ t and A = ((a i; )).]
2n n
Figure Ex.I.8
78 RANDOM WALK AND BROWNIAN MOTION
*4 Give an example to demonstrate that it is not the case that the FCLT gives
convergence of probabilities of all infinite-dimensional events in C[0, x). [Hint:
The polygonal process has finite total variation over 0 <_ t z 1 with probability 1.
Compare with Exercise 7.8.]
5. Verify that the probability density function p(t; x, y) of the position at time t of the
Brownian motion starting at x with drift p and diffusion coefficient a 2 solves the
so-called Fokker-Planck equation (for fixed x) given by
ap _ 1 2 02 p ap
at - Zo OY Z - p aY
(i) Check that for fixed y, p also satisfies the adjoint equation
ap =, 22
a l p a
2
c7 + p
at ax "^ ax
ac a2C ac
za g 2 - p, c(O , Y) = co(y).
8t = Y Y
6. (Collective Risk in Actuary Science) Suppose that an insurance company has an
initial reserve (total assets) of X0 > 0 units. Policy holders are charged a (gross)
risk premium rate a per unit time and claims are made at an average rate A. The
average claim amount is it with variance a 2 . Discuss modeling the risk reserve
process {X,} as a Brownian motion starting at x with drift coefficient of the form
a - p l and diffusion coefficient 2a 2 , on some scale.
7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession
of random impacts or loads in the form of positive random variables L,, L 2 , ..
(e.g., traffic). It is assumed that the (measure of) material strength T k after the kth
impact is proportional to the strength Tk _, at the preceding stage through the
applied load L k , k = 1, 2, ... , i.e., Tk = L,,T,k _,. Assume an initial strength To - 1
as normalization, and that E(log L 1 ) 2 < co. Describe conditions under which it is
appropriate to consider the geometric Brownian motion defined by {exp(pt + a 2 B,)},
where {Bj is standard Brownian motion, as a model for the strength process.
8. Let X 1 , X2 ,,.. be i.i.d. random variables with EX = 0, Var X. = a 2 > 0. Let
S. = X, + . + X,,, n >, 1, So = 0. Express the limiting distribution of each of the
random variables defined below in terms of the distribution of the appropriate
random variable associated with Brownian motion having drift 0 and diffusion
coefficient a 2 > 0.
(i) Fix 0>0, Y. = n -012 max{ISj : 1 _< k < n}.
(ii) Yn = n - 'I'S..
(iii) Y. = n 312 >I Sk . [Hint: Consider the integral of t -- S(n , l , 0 5 t - 1.]
9. (i) Write R n (x) = 1(1 + xfn)" - esj. Show that
EXERCISES 79
+ I x en+
R(x) 1 1 1 _ r = 1_ ^x r t e^xl
n n Jj r! (n + 1)!
(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and
Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]
1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient.
(ii) Extend (i) to the k-dimensional Brownian motion with drift.
2. Let X, = X 0 + vt, t >, 0, where v is a nonrandom constant-rate parameter and X 0
is a random variable.
(i) Calculate the conditional distribution of X,, given XS = x, for s < t.
(ii) Show that all states are transient if v 0.
(iii) Calculate the distribution of X, if the initial state is normally distributed with
mean and variance a 2 .
3. Let {X,} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and
zero drift.
(i) Define { }} by Y, = tX,,, for t > 0 and Y0 = 0. Show that { Y} is distributed as
Brownian motion starting at 0. [Hint: Use the law of large numbers to prove
sample path continuity at t = 0.]
(ii) Show that {X,} has infinitely many zeros in every neighborhood of t = 0 with
probability 1.
(iii) Show that the probability that t - X, has a right-hand derivative at t = 0 is zero.
(iv) Use (iii) to provide another example to Exercise 8.4.
4. Show that the distribution of min,, 0 X is exponential if {X, } is Brownian motion
starting at 0 with drift p > 0. Likewise, calculate the distribution of max,,, X, when
p<0.
*5. Let {Sn } denote the simple symmetric random walk starting at 0, and let
m= min S k , M= max Sk , n= 1, 2, ... .
OSk<n 05k-'n
Let {B,} denote a standard Brownian motion and let m = min,,,,, B,,
M = max, < ,,, B,. Then, by the FCLT, n" t j 2 (m n , Mn , Sn ) converges in distribution
to (m, M, B 1 ); for rigorous justification use theoretical complements 1.8, 1.9 noting
that the functional w + (min,,,,, co,, max,,,,, w,, w,) is a continuous map of the
metric space C[0, 1] into R 3 . For notational convenience, let
for integers u, v, y such that u _< 0 _< v, u < v and u _< y _< v. Also let
1(a, b) = P(a <Z < b), where Z has the standard normal distribution. The following
use of the reflection principle is taken from an exercise in P. Billingsley (1968),
80 RANDOM WALK AND BROWNIAN MOTION
Convergence of Probability Measures, Wiley, New York, p. 86. These results for
Brownian motion are also obtained by other methods in Chapter V.
(i) P(u, v, Y) = P,(Y) n(v, Y) it(u, Y) + n(v, u, y) + n(u, v, Y) t(v, u, v, Y)
n(u, v, u, y) + , where for any fixed sequence of nonnegative integers
Y1, Y2, , Yk, y, k, n, zc(y 1 , Y2, ... , Yk, y) denotes the probability that an n-step
random walk meets y, (at least once), then meets Y 2 , then meets y 3 , ... ,
then meets (-1) k- 'y k , and ends at y.
( 11 ) R(Y 1, Y2, ... , Yk, Y) = pn( 2 Y 1 + 2Y 2 + ... + 2y k - 1 + (-1)k + 1 y) if (-1)k + 'Y > Yk,
n(Y1, Y2, ... , Yk, Y) = Pn( 2 Y1 + 2Y 2 + ... + 2y k _ (_ 1)k+ l y) if (1)'y -< Yk
[Hint: Use Exercise 4.4(ii), the reflection principle, and induction on k. Reflect
through (-1)k y k _, the part of the path to the right of the first passage through
that point following successive passages through y1, Y2, ... , ( -1 ) k-1 Yk-2.]
Y P(2vy 2 +2k(vu)<S<2vy,+2k(vu)).
k=w
(vii) P(u <m -< M < v) _ (-1)kD(u + 2k(v u), v + 2k(v u)).
k=m
* 1. (i) Show that (10.2) holds at each point z _< 0 (> 0) of continuity of the distribution
function for min o , X. (max o <,,, X,). [Hint: These latter functionals are
continuous.]
(ii) Use (i) and (10.9) to assert (10.2) for all ^.
2. Calculate the probability that a Brownian motion with drift it and diffusion coefficient
a 2 > 0 starting at x will reach y : x in time t or less.
3. Suppose that solute particles are undergoing Brownian motion in the horizontal
direction in a semi-infinite tube whose left end acts as an absorbing boundary in
the sense that when a particle reaches the left end it is taken out of the flow. Assume
that initially a proportion i(x) dx of the particles are present in the element of
volume between x and x + dx from the left end, so that f b(x) dx = 1. For a given
drift it away from the left end and diffusion coefficient a 2 > 0, calculate the fraction
of particles eventually absorbed. What if p = 0?
4. Two independent Brownian motions with drift p ; and diffusion coefficient a?, i = 1, 2,
are found at time t = 0 at positions x i , i = 1, 2, with x, <x 2 .
(i) Calculate the probability that the two particles will never meet.
(ii) Calculate the probability that the particles will meet before time s > 0.
5. (i) Calculate the distribution of the maximum value of the Brownian motion
starting at 0 with drift p and diffusion coefficient a 2 over the time period [0, t].
*(ii) For the case p = 0 give a geometric "reflection" argument that P(max o , s <, Xs >_ y)
= 2P(X, _> y). Use (i) to verify this.
6. Calculate the distribution of the minimum value of a Brownian motion starting at
0 with drift I and diffusion coefficient a 2 over the time period [0, t].
7. Let {B 1 } be standard Brownian motion starting at 0 and let a, b > 0.
(i) Calculate the probability that at < B, < bt for all sufficiently large t.
(ii) Calculate the probability that {B,} last touches the line v = at instead of
y = bt. [Hint: Consider the process {Z,} defined by Zo = 0, Z, = tB, I , for t > 0.
and Exercise 9.3(i).]
8. Let {(B,(' ) , B 2 )} be a two-dimensional standard Brownian motion starting at (0, 0)
(see Section 7). Let r y = inf{t _> 0: B; 2 = y}, y > 0. Calculate the distribution of
Bty ) . [Hint: {B} and {B 2 } are independent one-dimensional Brownian motions.
Condition on r. Evaluate the integral by substituting u = (x 2 + y 2 )/t.]
9. Let {Br } be a standard Brownian motion starting at 0. Describe the geometric
structure of sample paths for each of the following stochastic processes and calculate
EY,.
fY = B 1 if max o<s ,, Bs < a
(i) (Absorption)
1 Y = a if max o , s <, B s >_ a,
where a > 0 is a constant.
82 RANDOM WALK AND BROWNIAN MOTION
(*ii) (Reflection)
J Y=B,
Y,=2aB,
ifB,<a
if B,>a,
a e
.f8(t) = (2 )"2 22 ' (t > 0).
t 3/2
ir
(ii) Verify that the distribution of; is a stable law with exponent 0 = 2 (index z)
in the sense that if Ti , T2 , ... , T. are i.i.d. and distributed as T. then
n -8 (T1 + + T.) is distributed as T. (see Eq. 10.2).
(iii) (Scaling property) ; is distributed as z 2 z1.
11. Let T. be the first passage time to z for a standard Brownian motion starting at 0
with zero drift.
(i) Verify that Ez= is not finite.
(ii) Show that Ee = e i 2 ' ^"', A > 0. [Hint: Tedious integration will
work.]
(iii) Use Laplace transforms to check that (1/n)z (,J J converges in distribution to
t z as n > oo.
12. Let {B,} be standard Brownian motion starting at 0. Let s < t. Show that the
probability that {B,} has at least one zero in (s, t) is given by (2/it) cos - '(s/t)`/ 2 .
[Hint: Let
Likewise for x < 0, p(x) = P(r _ x -< t s). So the desired probability can be obtained
by calculating
2
EP(IB,I) =
f
o,
P(x)(-
ns
e zs ^ 2 dx.
EXERCISES 83
Throughout this set of exercises {S"} denotes the simple symmetric random
walk starting at 0.
1. Show the following for r 0.
2n Irl
(i) P(S, ^0,S 2 ^0,...,S 2n - t # 0 ,S2 = 2r) = 2
-2n
fl + rJ n
= 2r) _ (2k)( 2n 2k) Iri 2 -zn
(ii) P(r' 2 n ) = 2k, S Zn
k/J nk+r /Jnk
... , S;, = S". This transformation corresponds to a rotation through 180 degrees.
Use (11.2).]
(iii) Show that
P(i a") =
2k) = P(t (2 "' = 2k + 1) = 1 (2k)2-2k(2(n - k))2-2i"-k)
2\k/ n-k
for k = 1, ... , n in the first case and k = 0, ... , n - 1 in the second. [Hint: A
path of length 2n with a maximum at 2k can be considered in two sections.
Apply (i) and (ii) to each section.]
")
1
n C
(iv) lim P t /I1 -sin - ^,
it
2
0<t<1.
*4. Let F, UU be as defined in Exercise 2. Define V. = #{k < V" ) : Sk _ , >, 0, Sk _> 0} =
UT "). Show that P(VZ " = 2r ( S 2 . = 0) = 1/(n + 1), r = 0, 1, ... , n. [Hint: Use
induction and Exercise 3(i) to show that P(V2 . = 2r, S 2 " = 0) does not depend on
r,0_<r_<n.]
1. Show that the finite-dimensional distributions of the Brownian bridge are Gaussian.
2. Suppose that F is an arbitrary distribution function (not necessarily continuous).
Define an inverse to F as F - '(y) = inf{x: F(x) > y}. Show that if Y is uniform on
[0, 1] then X = F '(Y) has distribution function F.
-
3. Let {B r } be standard Brownian motion starting at 0 and let B* = B, - tB,, 0 < t <_ 1.
(i) Show that {B*} is independent of B,.
(ii) (The Inverse Simulation) Give a construction of standard Brownian motion
from the Brownian bridge. [Hint: Use (i).]
*4. Let {B,} be a standard Brownian motion starting at 0 and let {B*} be the Brownian
bridge.
(i) Show that for time points 0 < t, <t 2 < < t k _ 1,
- exp{-2[v+k(v-u)]2}.
[Hint: Express as a limit of the ratio of probabilities as in (i) and use Exercise
9.5(v). Also, 4(x, x + e) = e/(2n) " 2 exp( - x 2 /2) + o(1) as e -+ 0.]
EXERCISES 85
(iii) Prove
*6. (Brownian Meander) The Brownian meander {B+ } is defined as the limiting
distribution of the standard Brownian motion {B,} starting at 0, conditional on
{m = min,,, I B, > e} as E > 0 (see theoretical complement 4 for existence). Let
m + = min a ^ B,+ , M + = max o ,,, , B+ . Prove the following:
L'-(2kz+y)2'2]
(i) P(M -< x, B -< y) = ^ [e-l2kx)2/2 - 0 < y x.
k=-m
[Hint: Express as a limit of ratios of probabilities and use Exercise 9.5(v). Also
P(m > e)=(2/tt) 2 +o(1);see Exercise 10.5(ii)notingmin(A)= max(A)
and symmetry. Justify interchange of limits with the Dominated Convergence
Theorem (Chapter 0).]
(ii) P(M + < x) = I + 2 Yk I ( 1) k exp{ (kx) 2 /2}. [Hint: Consider (i) with
y = x.]
(iii) EM + = (2tt)'t 2 log 2 = 1.7374.... [Hint: Compute f P(M + > x) dx from
(ii).]
(iv) (Rayleigh Distribution) P(B, < x) = I e 22 , x > 0. [Hint: Consider (i) in
the limit as x oo.]
*7. (Brownian Excursion) The Brownian excursion {B* + } is defined by the limiting
distribution of {B} conditioned on {m* > s} as d0 (see theoretical complement
4 for existence). Let M* = max 0 , 1 B* + . Prove the following:
M
(i) P(M* + -< x) = 1 + 2 1 [l (2kx) 2 ] exp{(2kx) 2 /2}, x > 0.
k-1
and note that for k >- A the integrand is nonnegative on [A, oc ). So Lebesgue's
monotone convergence can be applied to interchange integral with sum over
k > 1/(20) to get zero for this. Thus, EM* + is the limit as 0 - 0 of a finite
sum over k <i of an integral that can be evaluated (by parts). Note that this
gives a Riemann sum limit for 2J exp(- Zx 2 ) dx = (it/2)' 1 2 .]
(iv/2)'t 2 , if r = 1
(iii)
r
(2(2)li2)-.4( n )lt2 r 1 (r), if r = 2, 3 , ... ,
Ir
Ii
\ 2 /
where C(r) _ ^k , k ' is the Riemann Zeta function (r >, 2). [Hint: The case
r = 1 is given in (ii) above.] For the case r >- 2, we have
TI k m 1 )^i^T^m^, t<1(m)andt>t(1),
check that
m m
(v) C= U U {{F(t(k/m))+-*F(i(k/m))} u {F(z(k/m) - )+-* F(z(k/m) - )}
m=1k=1
EXERCISES 87
where {Sk 2 nj *: k = 0, 1, 2, ... , 2n} is the simple symmetric random walk bridge
(starting at 0 and tied down at k = 2n) as defined in Exercise 5. [Hint: Arrange
X, Y1 ..... Y in increasing order as X(1) < X (2) < < X 2
define the kth displacement of {Sk 2 n ) *} by
(ii) Find the analytic expression for the probability in (i). [Hint: Consider the event
that the simple random walk with absorbing boundaries at r returns to 0 at
time 2n. First condition on the initial displacement.]
(iii) Calculate the large-sample-theory (i.e., asymptotic as n - x) limit distribution
of fn D., n . See Exercise 4(iii).
(iv) Show
C2nl
r'\ n-r/f
Pl sup (F.(x)- Gn(x))<= 1 ---- r= 1,...,n.
n () 2n
(11J
[Hint: Only one absorbing barrier occurs in the random walk approach.]
4. For the simple symmetric random walk, starting at x, show that E{S
yP(r y = r) for r _< m.
, ,1(, ,)} _
5. Prove that EZn is independent of n (i.e., constant) for a martingale {Zn ). Show also
that E(Zn I {Z o , ... , Zk )) = Zk for any n> k.
6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1.
7. Let {Sn } be a simple symmetric random walk with p a (2, 1).
(i) Prove that {(q/p)s ': n = 0, 1, 2, ...} is a martingale.
(ii) Let c <x <d be integers, So = x, and T = z, A t d := min(t,, r d ). Apply Theorem
13.3 to the martingale in (i) and t to compute P({S n } reaches c before d).
8. Write out a proof of Proposition 13.5 along the lines of that of Proposition 13.4.
9. Under the hypothesis that the pth absolute moments are finite for some p _> 1, derive
the Maximal Inequality P(MM >, A) < EIZn I/ti" in the context of Theorem 13.6.
10. (Submartingales) Let {Z n : n = 0, 1, 2, ...} be a finite or infinite sequence of
integrable random variables satisfying E(Zn+ I {Z 0 , ...Zn }) > Zn for all n. Such a
sequence {Zn } is called a submartingale.
(i) Prove that, for any n > k, E(Zn ( {Z0.....4)) >- Z.
(ii) Let Mn = max{Z o , ... , Zn }. Prove the maximal inequality P(MM _> A) _< EZ/A 2
for A > 0. [Hint: E(ZkIAk(Zn Zk)) = E(Z,IAkE(Zn Zk I {Zo, ... , Zk})) i 0
for n>k, where A k :={Z 0 <A,...,Zk _ I <A,Zk ->.i}.]
(iii) Extend the result of Exercise 9 to nonnegative submartingales.
11. Let {Zn } be a martingale. If EIZ,,I < oo then prove that IZ,,I is a submartingale,
p >_ 1. [Hint: Use Jensen's or Hlder's Inequality, Chapter 0, (2.7), (2.12).]
12. (An Exponential Martingale) Let {X,: j _> 0} be a sequence of independent random
variables having finite moment-generating functions 4,(^):= E exp{^XX } for some
96 0. Define Sn := X, + + XX , Zn = exp{^Sn }/fl7 = , q().
(i) Prove that {Zn } is a martingale.
(ii) Write M = max{S...... Sn }. If > 0, prove that
n
P(MM - A) -< exp{ ZA}
11 O ; (Z) ( >0).
13. Let {Xn : n >_ 1) be i.i.d. Gaussian with mean zero and variance a 2 > 0. Let
Sn = X, + + Xn , MM = max{S 1 , ... , Sn }. Prove the following for A > 0.
(i) P(MM _> 2) < exp{ 2 2 /(2a 2 n)). [Hint: Use Exercise 12(ii) and an appropriate
choice of .]
(ii) P(max {ISST: 1 < j < n} >_ Aaln) _< 2 exp{ A 2 /2}.
14. Let r ' r 2 be stopping times. Show the following assertions (i) (v) hold.
(i) z l v r 2 '= max(r l , tr z ) is a stopping time.
EXERCISES 89
17. Let {S} be the simple symmetric random walk starting at 0. Let r = inf{n ? 0:
S=2n}.
(i) Calculate Er from the distribution of t.
(ii) Use the martingale stopping theorem to calculate Et.
(*iii) How does this generalize to the cases r = inf{n ? 0: S = b n}, where b is
a positive integer? [Hint: Check that n + S. is even for n = 0, 1, 2, ....]
18. (i) Show that if X is a random variable such that g(z) = Ee^ x is finite in a
neighborhood of z = 0, then EX < oo for all k = 1, 2.....
(ii) For a Brownian motion {X,} with drift p and diffusion coefficient a 2 , prove that
exp{AX, Atp A 2 a 2 t/2} (t _> 0) is a martingale.
19. Consider an arbitrary Brownian motion with drift and diffusion coefficient a 2 > 0.
(i) Let m(x) = ET", where Ts is the time to reach the boundary {c, d} starting at
x e [c, d]. Show that m(x) solves the boundary-value problem
d e m dm
m(c)=m(d)=0.
zag d2 + dz 1,
(ii) Let r(x) = Px (r d < ;) for x e [c, d]. Verify that r(x) solves the boundary value
problem
Z 2 x z
dx
+ dr = 0,
dx
r(c) = 0, r(d) = 1.
THEORETICAL COMPLEMENTS
Proof. To see this one uses the general measure-theoretic fact that the probability
of any event A belonging to the sigmafield F = 6 {X,, X2, ... , Xn , ...} generated (
the field of events ,`fo = Un 1 ar {X i ..... X} in the sense that A. e a{X ... , X} for
each n and P(AAA) -+ 0 as n -+ oo, where A denotes the symmetric difference
AAA = (A n A n v (A` n A). Applying this approximation to a tail event A, one
obtains that since Ac u{X +1, X +2, ...} for each n, A is independent of each event
A. Thus, 0 = lim... P(AAA) = 2P(A)P(AC) = 2P(A)(1 P(A)). The only solutions
to the equation x(1 x) = 0 are 0 and 1. n
2. Let S. = X l + + X, n 1. Events that depend on the tail of the sums are trivial
(i.e., have probability I or 0) whenever the summands X 1 , X 2 , ... are i.i.d. This is a
THEORETICAL COMPLEMENTS 91
consequence of the following more general zeroone law for events that symmetrically
depend on the terms X 1 , X 2 , .. . of an i.i.d. sequence of random variables (or vectors).
Let ' denote the sigmafield of subsets of I8x' _ {(x,, x 2 , ...): x ; E W} generated by
events depending on finitely many coordinates.
Theorem T.1.2. (HewittSavage ZeroOne Law). Let X,, X 2 ,. . . bean i.i.d. sequence
of random variables. If an event A = {(X,, X 2 , ...) e B}, where Be , is invariant
under finite permutations (Xi ,, X.....) of terms of the sequence (X,, X 2 , ...), that
is, A = {(X, X......) e B} for any finite permutation (i,, i 2 , ...) of (1, 2, ...), then
P(A) = I or 0.
Proof. To prove the HewittSavage 0-1 law, proceed as in the Kolmogorov 0-1 law
by selecting finite-dimensional approximants to A of the form A n = {(X,, ... , Xn ) e B},
B. e 4n, such that P(AAA,) - 0 as n * oo. For each fixed n, let (i,, i Z , ...) be the
B,
permutation (2n, 2n 1, ... , 1, 2n + I, ...) and define A. = {(X; , ... , X; ,) e B}.
Then A and A n are independent with P(A, n A n ) = P(A,)P(A,) = (P(A n )) 2 - (P(A)) z
as n ^ co. On the other hand, P(AA,) = P(ALA) * 0, so that P(A,O n ) 0 and,
in particular, therefore P(A n r n ) * P(A) as n 4 co. Thus x = P(A) satisfies x = x 2 .
n
Proof. To prove this, first observe that P(S, = 0 i.o.) is 1 or 0 by the HewittSavage
zeroone law (theoretical complement 1.2). If X , P(S n = 0) < oc, then P(S, = 0
i.o.) = 0 by the BorelCantelli Lemma. If Y_ , P(S, = 0) is divergent (i.e., the
expected number of visits to 0 is infinite), then we can show that P(Sn = 0 i.o.) = 1
as follows. Using independence and the property that the shifted sequence
Xk , Xk ,,, ... has the same distribution as X,, X 2 , ... , one has
=Z
P(Sn =0)P(Sm S, 540,m>n)
Thus,
1 1 "
P(S " = 0) = Ee"s^ dt = (p"(t) dt
lac _ 2n _"
1 N dt
9(x) = 2n " 1 - xtp(t)
f n dt f ( l
" Re 1 dt =
f 1 - x(p (t) dt a 1 - xq,(t)
dt
J n -- x0(t) - .L \1 - xtP(t)/ J -" I 1 - x^P(t)Iz j I1 - x^V(t)I Z
> a
J- a (1 -
1 -x(p 1 (t)
x(V1(t))2 + x2(t)
1-x
dt v
f b
-6 (1
1 -x
- x + xslti) 2 + x2e2t2
dt
dt> a 1-x dt
2(1 - x) 2 + 3x 2 r 2 t 2 3[(1 -x) 2 + e 2 t 2 ]
2 - 1^ ES 1 n
=tan /J-^ asx-' 1.
3e 1 - x 3E
l(f)' L
'I "ki
f(x1 .,..., x !k)pI1....f Pik(dxi^ . . .dx^k)s
where
The Borel sigmafield .4 of C[0, 1] for the metric p is the smallest sigmafield of subsets
of C[0, 1] that contains all finite-dimensional events of the form
min(i,j) / k \2
^yijXiXj=>Xi Xj ^ (Lr trt) = ttrtr-1) xi J1 iO.
i,j i,j r=1 r i=r
However, the problem with this construction of a probability space (i2, F, P) for
{X,} is that events in F can only depend on specifications of values at countably
many time points. Thus, the subset C[0, 1] of S2 is not measurable; i.e., C[O, 1] 0 F.
This dilemma is resolved in the theoretical complement to Section I.13 by showing
that there is a modification of the process {X} that yields a process {B,} with sample
paths in C[O, 1] and having the same finite-dimensional distributions as {X}; i.e.,
{B,} is the desired Brownian motion process. The basic idea for this modification is
to show that almost all paths q + Xq , q e D, where D is a countable dense set of time
points, are uniformly continuous. With this, one can then define {B,} by the continuous
extension of these paths given by
B, _ XQ ift=qeD
lim Xq if t D.
q1t
Probability and Measure, 2nd ed., Wiley, New York, P. 558). In theory, this is enough
sample path regularity to make manageable most such measurability issues connected
with processes at uncountably many time points. In practice though, one seeks to
explicitly construct models with sufficient sample path regularity that such
considerations are often avoidable. The latter is the approach of this text.
1. Let {X;" }, n = 1, 2, ... and {X,} be stochastic processes whose sample paths
)
(S, 9). Assume {X;" } and {X} are defined on a probability space (S2, .F , Q). Then,
)
Convergence in distribution of {X} to {X} has been defined in the text to mean that
the sequence of real-valued random variables Y:= f({X;" }) converges in distribution )
To see the equivalence, first observe that for any continuous f: S R', the
functions cos(rf) and sin(rf) are, for each r E R', continuous and bounded functions
on S. Therefore, assuming that condition (T.8.2) gives the convergence of the
characteristic functions of the Y. to that of Y for each continuous f on S. In particular,
the Y" must converge in distribution to Y. To go the other way, suppose that f: S ff8'
is continuous and bounded. Assume without loss of generality that 0 <f _< 1. Then,
for each N _> 1,
t
liminf f dP + N , (T.8.5)
s
by (T.8.4) applied to P, and the fact that lim Prob(} > x) = Prob(Y > x) for all
points x of continuity of the d.f. of Y implies liminf Prob( Y > y) >- Prob(Y > y) for
ally. Letting N -' gives
Thus, in general,
limsup
J s
f dP < I f dP -< liminf I f dP
i s ', s
(T.8.6)
which implies
With the above equivalence in mind we make the following general definition.
Definition. A sequence {P} of probability measures on (S, .4) converges weakly (or in
distribution) to a probability measure P on (S, 9) provided that lim f s f dP = f s f dP
for all bounded and continuous functions f: S -+ U8'.
then {P.} has a subsequence weakly convergent to a probability measure Q on (S, t7).
Moreover, if S is complete and separable then the condition (T.8.8) is also necessary.
The condition (ii) refers to the equicontinuity of the functions in A in the sense that
given any r > 0 there is a common S > 0 such that for all functions w e A we have
Iw, cw,I < e if It sI < b. Conditions (i) and (ii) together imply that A is uniformly
bounded in the sense that there is a number B for which
This is because for N sufficiently large we have sup WEA v w (1/N) < I and, therefore,
for each 0<t<1
N
Iw1(< Iw01 + wi,1N wr-,u Nl _< sup Iwol + N sup v^,( = B.
i =1 weA w.A \N
4. Combining the Prohorov theorem (T.8.2) with the Arzela-Ascoli theorem (T.8.3)
gives the following criterion for tightness of probability measures { P} on S = C[0, 1].
Theorem T.8.4. Let {P} be a sequence of probability measures on C[0, 1]. Then
{P} is tight if and only if the following two conditions hold.
98 RANDOM WALK AND BROWNIAN MOTION
(ii) For each e> 0, ry > 0, there is a 0 <(5 < 1 such that
P"({weC[0,1]:v",(b)>_a})_<ry, n>_1.
Proof. If {P"} is tight, then given ry > 0 there is a compact K such that P(K) > 1 ry
for all n. By the ArzelaAscoli theorem, if B> supKIWol then
P"({w e C[0,1]: v W (S) >_ ej) < P(KC) < ry for all n _> 1.
The converse goes as follows. Given ry > 0, first select B using (i) such that
P"({w: Iw o l < B}) _> 1 Zry, for n >, 1. Select S, using (ii) such that P({w: v w ((5,) < 1/r})
1 for n >, 1. Now take K to be the closure of
{w: Iw o l < B} n n t
co: v w (8,) <
1r
1
Then P"(K) > 1 ry for n > 1, and K is compact by the ArzelaAscoli theorem.
Theorem T.8.5. Let {X: 0 < t < 1) and {XX :0 _< t <, 1} be stochastic processes on
(S2, .f, P) which have a.s. continuous sample paths and suppose that the
finite-dimensional distributions of {X} converge to those of {X}. Then {X;"}
converges weakly to {X,} if and only if for each F > 0
Corollary. For the last limit to hold it is sufficient that there be positive numbers
a, , M such that
EIX;" X" M
) -< MIt
it sl' +0 for all s, t, n.
To prove the corollary, let D be the set of all dyadic rationals in [0, 1], i.e., numbers
in [0, 1] of the form j/2" for integers j and m. By sample path continuity, the oscillation
THEORETICAL COMPLEMENTS 99
j2 - k<i2 - ^<(j+l)2 - k
r
i2 - m=j2 -k + 1] 2 - m' wherek<m 1 <m 2 < <m,^m,
(n) lI ^`
J)j
X - k X.j2-k+al/+) Xj2 - k+al -
N=1
Therefore,
Let e > 0 and take = 2 -k+ so small (i.e., k so large) that ^=k+1 1/m 2 < E/2. Then
h=0 m
x m 2
m 2a2m M2 m(l+p) = M m
mk+1 mk+12
of all orders, this approach can also be used to give an alternative rigorous construction
of the Wiener measure based on Prohorov's theorem as the limiting distribution of
random walks. (Compare theoretical complement 13.1 for another construction). Let
Z 1 , Z 2 ,... be i.i.d. random variables on a probability space (S2, y , P) having mean
zero, variance one, and finite fourth moment m, = EZ. Define So = 0,
S"=Z 1 + +Z",n>,1,and
Xtn] = n- IIZ
Stnr] + n - ' ]Z (nt [ nt])Zt,, i +l, 0 <- I -< 1.
We will show that there are positive numbers a, and M such that
By our corollary this will prove tightness of the distributions of the process
n = 1, 2, .... This together with the finite-dimensional CLT proves the FCLT under
the assumption of finite fourth moments. One needs to calculate the probabilities of
fluctuations described in the ArzelaAscoli theorem more carefully to get the proof
under finite second moments alone.
To establish (T.5.1), take a = 4. First consider the case s = (j/n) < (k/n) = t are
at the grid points. Then
k k k k
=n 2 E{Zi,ZZ,,Z;,}
ij+1 i2=j+1 i3=j+1 is=j+1
1.
Thus, in this case,
Next, consider the more general case 0 < s, t -< 1, but for which It si -> 1/n. Then,
for s < t,
4
E{X;' X;" ] } 4 = n Z E
t [ntl
j=(ns]+ 1
ZJ + (nt [nt])Z[",1+1 ([ns] ns)Z]"sl+l
[n^] \a
n - 2 3 E( ' Z^ ) + (nt [nt]) 4 EZ1", 1+ 1
=[ns]+I
In the above, we used the fact that (a + b + c) 4 < 3 4 (a 4 + b a + c a ) to get the first
inequality. The analysis of the first (gridpoint) case was then used to get the second
inequality. Finally, if it sI < 1/n, then either
(a) k_<s<t<k+
l forsome0_<k_<n 1,or
n n
k k+ 1 k+2
(b)- s< and\t< forsome0_<k_<n-1.
n n n n
E{X^") _ XS"^}a = n Z E Zk + nl t
k k
\ Zk+i n(s -^Zk+I
la
+i
\ ( n n )
( ( k+l ^ k+11 la
=t1
n -Z E n \ t Zk+zn s JZk+i Jj
n n
<24n2(t
nt
k a+(k+
am
1 a
s) m a
+
= 2 a n 2 m a 1(t k 1-^ a + I k n--s )a }
a )J)
+ I + k + 1 sl = 2 n2m
tk
2an2ma 4 4 (t s)a
n n J
The FCLT (Theorem 8.1) is stated in the text for convergence in S = C[0, 00),
when S has the topology of uniform convergence on compacts. One may take the met-
ric to be p(w, co') _ Zk 1 2 -k dk /(1 + dk ), where d k = max{Iw(t) cu'(t)J: 0 t k}.
Since the above arguments apply to [0, k] in place of [0, 1], the assertion of Theorem
8.1 follows (under the moment condition m a < oo).
102 RANDOM WALK AND BROWNIAN MOTION
6. (Measure Determining Classes) Let (S, p) be a metric space, .a(S) its Borel sigmafield.
A class l c a(S) is measure-determining if, for any two finite measures u, v,
p(C) = v(C) VC e' implies p = v. An example is the class -9 of all closed sets. To
see this, consider the lambda class .sad of all sets A for which p(A) = v(A). If this class
contains -9 then by the Pi-Lambda Theorem (Chapter 0, Theorem 4.1)
a a(s) = .a(S). Similarly, the class (0 of all open sets is measure-determining. A
class 9 of real-valued bounded Borel measurable functions on S is measure-determining
if ff dp = f f dv Vg e I implies p = v. The class C b (S) of real-valued bounded
continuous functions on S is measure-determining. To prove this, it is enough to
show that for each Fe -9 there exists a sequence {f} c Cb (S) such that f" j I F as
n I oo. For this, let h a (r) = 1 nr for 0 _< r < 1/n, h a (r) = 0 for r > 1/n. Then take
f(x) = h"(p(x, F)).
f ({Xs }) := max{X5 :0 ,< s ,< t} __ M, we get (10.10), for P(T. > t) = P(M< < z), if z > 0.
The case z < 0 is similar.
Joint distributions of several functionals may be similarly obtained by looking at
linear combinations of the functionals. Here is the precise statement.
Theorem T.9.1. If f: C[0, oo) -- l8'` is continuous, say f = (fl , ... , f,,) where
f : C[0, oo) - III', then the random vectors X. = f({X;" 1 }), n >, 1, converge in
distribution to X = f({X,}).
Proof. This can be proved using Alexandrov's Theorem (T.8.1(ii)), since for any
closed set F, f '(F) c f - `(F) = Df v f - `(F), where the overbar denotes the closure
-
of the set. n
THEORETICAL COMPLEMENTS 103
A proof of Proposition 12.1 for the special case of infinite-dimensional events that
depend on the empirical process through the functional (w - supo,,,, Ico,l) used to
define the Kolmogorov-Smirnov statistic (12.11) is given below. This proof is based
on a trick of M. D. Donsker (1952), "Justification and Extension of Doob's Heuristic
Approach to the Kolmogorov-Smirnov Theorems," Annals Math. Statist., 23,
pp. 277-281, which allows one to apply the FCLT as given in Section 8 (and proved
in theoretical complements to Section 1.8 under the assumption of finite fourth
moments).
The key to Donsker's proof is the simple observation that the distribution of the
order statistic (Y i) , ... , Y) of n i.i.d. random variables Y, Y2 , ... from the uniform
distribution on [0, 1] can also be obtained as the distribution of the ratios
S, S Z S
( S., S+ I S +I
Intuitively, if the T are regarded as the successive times between occurrence of some
phenomena, then S, is the time to the (n + 1)st occurrence and, in units of S + 1 ,
the occurrence times should be randomly distributed because of lack of memory and
iridependence properties. A version of this simple fact is given in Chapter IV
(Proposition 5.6) for the Poisson process. The calculations are essentially the same,
so this is left as an exercise here.
The precise result that we will prove here is as follows. The symbol = below
denotes equality in distribution.
Proposition T.12.1. Let Y1 , Y2 be i.i.d. uniform on [0, 1] and let, for each n _> 1,
, ...
d
='nmax
d 'I
I Sk
k5n S^11
k
-- =
n
n Skk kSn+1
max I --
Sn+1 kn n
n
where
and, by the SLLN, n/(S n+1 ) -^ I a.s. as n - ao. The result follows from the FCLT,
(8.6), and the definition of Brownian bridge. n
where FE := {w e C[0,1]: dist(o, F) _< a} fordist(w, A):= inf{p(w, y): ye Al, A c C[0, 1].
But, starting with finite-dimensional sets and then using the monotone class argument
(Chapter 0), one may check that the events {{B*) e F } and {0 _< B, _< e} are E
independent. Therefore, P({B,} e F 0 _< B 1 _< e) _< P({B*} e FE ) for any s > 0. Since
F is closed, the events {{B*} e F } decrease to {{B*} e F) as e - 0, and tightness
E
follows from the continuity of the probability measure P and Alexandrov's Theorem
T.8.1 (ii).
4. A check of tightness is also required for the Brownian meander and Brownian
excursion as described in Exercises 12.6 and 12.7, respectively. For this, consult
R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to
Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129. The
THEORETICAL COMPLEMENTS 105
distribution of the extremal functionals outlined in the exercises can also be found
in R. T. Durrett and D. L. Iglehart (1977), "Functionals of Brownian Meander and
Brownian Excursion," Ann. Probab., 5, pp. 130-135; K. L. Chung (1976), "Excursions
in Brownian Motion," Ark. Mat., pp. 155-177; D. P. Kennedy (1976), "Maximum
Brownian Excursion," J. App!. Probability, 13, 371-376. Durrett, Iglehart and Miller
(1977) also show that the * and + commute in the sense that the Brownian excursion
can be obtained either by a meander of the Brownian bridge (as done in Exercise
12.7) or as a bridge of the meander (i.e., conditioning the meander in the sense of
theoretical complement 3 above). Brownian meander and Brownian excursion have
been defined in a variety of other ways in work originating in the late 1940's with
Paul Levy; see P. Levy (1965), Processus Stochastiques et Mouvement Brownien,
Gauthier-Villars, Paris. The theory was extended and terminology introduced in
K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths,
Springer Verlag, New York. The general theory is introduced in D. Williams (1979),
Diffusions, Markov Processes, and Martingales, Vol. 1, Wiley, New York. A much
fuller theory is then given in L. C. G. Rogers and D. Williams (1987), Diffusions,
Markov Processes, Martingales, Vol. II, Wiley, New York. Approaches from the point
of view of Markov processes (see theoretical complement 11.2, Chapter V) having
nonstationary transition law are possible. Another very useful approach is from the
point of view of FCLTs for random walks conditioned on a late return to zero; see
W. D. Kaigh (1976), "An Invariance Principle for Random Walk Conditioned by a
Late Return to Zero," Ann. Probab., 4(1), pp. 115 -121, and references therein. A
connection with extreme values of branching processes is described in theoretical
complement 11.2, Chapter V.
j P max
n=1
sup Xq - XXI2 "i >
(0_<k1.2- qeJ,,,,, D
n1 < oo . (T.13.1)
By the Borel-Cantelli lemma we will get from this that with probability 1, for all n
sufficiently large,
1
max sup IXq - Xkf2 .,I -< -. (T.13.2)
OEk<n2" geJ,,.kn D n
_>
In particular, it will follow that with probability 1, for every t > 0, q - Xq is uniformly
continuous on D n [0, t]. Thus, almost all sample paths of {Xq : q e D} have a unique
extension to continuous functions {B,: t 0}. That is, letting C = {wu e Q: for each
B,(co) = Xq (W), if t = q e D,
(T.13.3)
lim Xq (co), if t 0 D,
qt
where the limit is over dyadic rational q decreasing to t. By construction, {B,: t _> 0}
has continuous paths with probability 1. Moreover, for 0 < t, < < t, with prob-
ability one, (B,,, .. . , B1 ) = lim" . (Xqc ., ... , Xq ,) for dyadic rational q;" , ... , qp"
) )
decreasing to t,.. ... t k . Also, the random vector (Xgti .... .. Xq ^ ,) has the multivariate
normal distribution with mean vector 0 and variancecovariance matrix
= min(t ; , t i ), I < i, j < k as a limiting distribution. lt follows from these
two facts that this must be the distribution of (B 1 , B, k ). Thus, {B,} is a standard
Brownian motion process.
To verify the condition (T.13.1) for the BorelCantelli lemma, just note that by
the maximal inequality (see Exercises 4.3, 13.11),
P( max IX,+,a2--
(s2"
a) 2P(IX,+a X11 % a)
2
4 E(X1+, X,) 4
a
66'
a
(T.13.4)
since the increments of {X,} are independent and Gaussian with mean 0. Now since
the events {max 142 .IX, +1a2 -, X, a} increase with m, we have, letting m * oo,
Thus,
/ 1
Pl max sup Xq Xk12 .4 > - Pf sup (Xq Xk,2"I >
O^kan2"qeJ".knD n k=0 \qeJ",k,D n
n2"
6^ri2 ^
( n)2
)4 =
6
5
(T.13.6)
which is summable.
1 MARKOV DEPENDENCE
Definition 1.1. A stochastic process {X0 , X 1 ..... X, ...} has the Markov
property if, for each n and m, the conditional distribution of X + 1 .. , Xn +m
,
is a function only of X.
Proof. For simplicity, take the state space S to be countable. The necessity of
the condition is obvious. For sufficiency, observe that
109
110 DISCRETE-PARAMETER MARKOV CHAINS
The last equality follows from the hypothesis of the proposition. Thus the
conditional distribution of the future as a function of the past and present states
i 0 , i 1 , ... , i n depends only on the present state i n . This is, therefore, the
conditional distribution given X n = i n (Exercise 1). n
An i.i.d. sequence and a random walk are merely two examples of Markov
chains. To define a general Markov chain, it is convenient to introduce a matrix
p to describe the probabilities of transition between successive states in the
evolution of the process.
matrix p = ((p i3 )), where i and] vary over a finite or denumerable set S, satisfying
(i) p i; >, 0 for all i and j,
The set S is called the state space and its elements are states.
Think of a particle that moves from point to point in the state space according
to the following scheme. At time n = 0 the particle is set in motion either by
starting it at a fixed state i o , called the initial state, or by randomly locating it
in the state space according to a probability distribution it on S, called the
initial distribution. In the former case, it is the distribution concentrated at the
state i 0 , i.e., n ; = 1 if j = i o , ire = 0 if j : i 0 . In the latter case, the probability
is 7r ; that at time zero the particle will be found in state i, where 0 < n i < 1 and
y i tc i = 1. Given that the, particle is in state i 0 at time n = 0, a random trial is
performed, assigning probability p ;0 to the respective states j' E S. If the
outcome of the trial is the state i l , then the particle moves to state i 1 at time
n = 1. A second trial is performed with probabilities p i , j - of states j' E S. If the
outcome of the second trial is i 2 , then the particle moves to state i 2 at time
n = 2, and so on.
A typical sample point of this experiment is a sequence of states, say
(i o , i 1 , i z , ... , i n , ...), representing a sample path. The set of all such sample
paths is the sample space S2. The position Xn at time iI is a random variable
whose value is given by X n = i n if the sample path is (i 0 , i l , ... , i n , ...). The
precise specification of the probability P,, on Q for the above experiment is
given by
.. p
Pn( XO = l 0, X1 = ii, , Xn = in) = M io Pioi1 Pi l i 2 (2.1)
nj ) _ P kcS
ik Pkj (2.4)
The elements of the matrix p" are defined recursively by p" = p" 'p so that the -
It is easily checked by induction on n that the expression for p;n is given directly )
Now let us check the Markov property of this probability model. Using (2.1)
and summing over unrestricted coordinates, the joint distribution of
Xo , Xn ,, X" 2 , . .. , Xnk , with 0 = n o < n l < n2 < < nk, is given by
Pa(X0 = i, Xn l =j, Xn 2
12' ... , Xn k =ik)
_ Z E . . . I (
Y /n2 - 1 J2 )
7r l Pii, Pi, . . . . pin 1 - lj l)(Pi l in i + I Pin, + l in, + Z . . I7 . .
1 2 k
where >, is the sum over the rth block of indices ii + n ,_ ', .+ .), . 1,,
, in ,
(r = 1, 2, . .. , k). The sum Y_ k , keeping indices in all other blocks fixed, yields
the factor p^kk Ok -') using (2.6) for the last group of terms. Next sum successively
over the (k 1)st, ... , second, and first blocks of factors to get
, Xn k =1k) = 7riP) p
^nJ 2 nl)..
P (X0 = i, Xn1 =j1, X,2 =J2' . plkk Ijk 1). (2.8)
Xnk 1) .
PE(Xni j1, X2 121 , Jk) = ( ^I Pin1))Pjll.i2 "
. .pjkk lik 1). (2.9)
( ics /
Although by Proposition 1.1 the case m = 1 would have been sufficient to prove
SOME EXAMPLES 113
the Markov property, (2.10) justifies the terminology that p' :_ ((p;; >)) is the
m-step transition probability matrix. Note that p' is a stochastic matrix for all
m>'1.
The calculation of the distribution of X, follows from (2.10). We have,
where n' is the transpose of the column vector n, and (n'pt) j is the jth element
of the row vector n'pm.
3 SOME EXAMPLES
The transition probabilities for some familiar Markov chains are given in the
examples of this section. Although they are excluded from the general
development of this chapter, examples of a non-Markov process and a Markov
process having a nonhomogeneous transition law are both supplied under
Example 8 below.
This means that if the process is now in state i it must be in state h(i) at the
next instant. In this case, if the initial state X o is known then one knows the
entire future. Thus, if X 0 = i, then
X l = h(i),
X z = h(h(i)) := h (z) (i), ... ,
" - 1)
X" = h(h ( (i)) = h ( " (i), ... .
)
Hence p;;^ = 1 if j = h ( " (i) and p;j" = 0 if j ^ h ( " (i). Pseudorandom number
) ) )
p si =p (ieS,jeS). (3.2)
114 DISCRETE-PARAMETER MARKOV CHAINS
Pi; =P ifj =i +1
=q ifj =i -1
=0 ifjjiI> 1. (3.3)
p ;j =p ifj =i +1 andc<i<d
=q ifj =i I andc<i<d
Pc,c +i = I, Pa,e -i = 1. (3.4)
In this case, if at any point of time the particle finds itself in state c, then at
the next instant of time it moves with probability I to c + 1. Similarly, if it is
at d at any point of time it will move to d I at the next instant. Otherwise
(i.e., in the interior of [c, d]), its motion is like that of a simple random walk.
For c < i < d, p i; is as defined by (3.3) or (3.4). In this case, once the particle
reaches c (or d) it stays there forever.
One may think of this Markov chain as a partial-sum process as follows. Let
X0 have a distribution n. Let Z 1 , Z 2 , ... be a sequence of i.i.d. random variables
with common distribution Q and independent of X o . Then,
is a Markov chain with the transition probability (3.6). Also note that Example
3 is a special case of Example 6, with Q( l) = q, Q(l) = p, and Q(i) = 0 for
i: +1.
The last row says that "zero" is an absorbing state, i.e., if at any point of time
X = 0, then X. = 0 for all m > n, and extinction occurs.
[r+(s-1)c][r+(s-2)c]. r[b+(T-1)c].b
(3.9)
[r+b+(n 1)c][r+b+(n-2)c]...[r+b]
where
s= Y_ s k , r =ns. (3.10)
k=1
[ r + (n 1) c] r
P(X I = 1, ... X = 1) (3.11)
[r+b+(n-1)c].[r+b]
[b + (n 1)c].b
P(XI=0,...,Xn =0)= (3.12)
[r+b+(n 1)c][r+b]
In particular,
P (X1 = E1, .. . , X. = n)
P (X 1 = c1,. ..,Xt = En t )
[r+s_Ic]
ifE= 1
r+b+(n 1)c
(3.13)
[b+r-IC]
ife=0.
r+b+ (ii 1)c
It follows that {X} is non-Markov unless c = 0 (in which case {X} is i.i.d.).
Note, however, that {X} does have a distinctive symmetry property reflected
in (3.9). Namely, the joint distribution is a function of s = Yk =1 e k only, and
is therefore invariant under permutations of e l . i,,. Such a stochastic process
is called exchangeable (or symmetrically dependent). The Plya urn model was
originally introduced to illustrate a notion of "contagious disease or "accident
proneness' for actuarial mathematics. Although {X} is non-Markov for c ^ 0,
it is interesting to note that the partial-sum process {S}, representing the
evolution of accumulated numbers of red balls sampled, does have the Markov
property. From (3.13) one can also get that
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 117
r + CS n _ , if s = l +s_,
+(n-1)c
P(sn= sIS, =s1, ,Sn_, =s,)= r+b
b +(n S- 1 ) c
lfs=s _ i .
r + b + (n 1)c
(3.14)
Observe that the transition law (3.14) depends explicitly on the time point n.
In other words, the partial-sum process {S} is a Markov process with a
nonhomogeneous transition law. A related continuous-time version of this
Markov process, again usually called the P1ya process is described in Exercise
1 of Chapter IV, Section 4.1. An alternative model for contagion is also given
in Example 1 of Chapter IV, Section 4, and that one has a homogeneous
transition law.
One of the most useful general properties of a Markov chain is that the Markov
property holds even when the "past" is given up to certain types of random
times. Indeed, we have tacitly used it in proving that the simple symmetric
random walk reaches every state infinitely often with probability 1 (see
Eq. 3.18 of Chapter 1). These special random times are called stopping times
or (less appropriately) Markov times.
If co is such that Y(w) y whatever be n (i.e., if the process never reaches y),
then take r y,(w) = oo. Observe that
Hence zr Y is a stopping time. The rth return times r;' ofy are defined recursively by
)
Once again, the infimum over an empty set is to be taken as oo. Now whether
or not the process has reached (or hit) the state y at least r times by the time
m depends entirely on the values of Y1 , ... , Y.. Indeed, {rI m} is precisely
the event that at least r of the variables Y1 , ... , Y,n equal y. Hence ry" is a
stopping time. On the other hand, if n y denotes the last time the process reaches
the state y, then ?J is not a stopping time; for whether or not i < m cannot
in general be determined without observing the entire process {Y n }.
Let S be a countable state space and p a transition probability matrix on S,
and let P,, denote the distribution of the Markov process with transition
probability p and initial distribution n. It will be useful to identify the events
that depend on the process up to time n. For this, let S2 denote the set of all
sequences w = (i 0 , i 1 , i 2 , ...) of states, and let Y(w) be the nth coordinate of w
(if w = (i o , i,, ... , i, ...), then Yn (cw) = in ). Let .fin denote the class of all events
that depend only on Yo , YI , ... , Yn . Then the n form an increasing sequence
of sigmafields of finite-dimensional events. The Markov property says that given
the "past" Yo , Yi , ... , Y. up to time m, or given .gym , the conditional distribution
of the "after-m"stochastic process Y, = {(Ym ) } := {Ym+n . n = 0, 1, ...} is P.
In other words, if the process is re-indexed after time m with m + n being
regarded as time n, then this stochastic process is conditionally distributed as
a Markov chain having transition probability p and initial state Y..
Suppose now that r is the stopping time. "Given the past up to time t" means
given the values oft and Yo , Y1 , ... , YY . By the "after-r"process we now mean
the stochastic process
Theorem 4.1. Every Markov chain { Yn : n = 0, 1, 2, ...} has the strong Markov
property; that is, for every stopping time i, the conditional distribution of the
after-r process Yt = { Yt+n : n = 0, 1, 2, ...}, given the past up to time i is P..
on the set {i < co}.
Proof. Choose and fix a nonnegative integer m and a positive integer k along
with k time points 0 _< m, < m 2 < < m k , and states i o , i l , ... 'im'
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 119
Now all the steps in (4.7) remain valid if one replaces Ty l) by i;,' -1) and T;,2) by
zy'r and assumes that < oo almost surely. Hence, by induction,
P (') < oo) = 1 for all positive integers r. This is equivalent to asserting
The unrestricted simple random walk {S} is an example in which any state
i E S can be reached from every state j in a finite number of steps with positive
probability. If p denotes its transition probability matrix, then p 2 is the transition
probability matrix of { Y} 2_ {SZ : n = 0, 1, 2, ...}. However, for the Markov
chain { Y}, transitions in a finite number of steps are possible from odd to odd
integers and from even to even, but not otherwise. For {S} one says that there
is one class of "essential states and for { Y} that there are two classes of
essential states.
A different situation occurs when the random walk has two absorbing
boundaries on S = {c, c + 1, ... , d 1, d}. The states c, d can be reached (with
positive probability) from c + 1, ... , d 1. However, c + 1, ... , d 1 cannot
A CLASSIFICATION OF STATES OF A MARKOV CHAIN 121
>,
behavior of the process. If a chain has several essential classes, the process
restricted to each class can be analyzed separately.
Definition S.I. Write i --' f and read it as either ` j is accessible from i" or "the
Y_
process can go from i to j" if p;l > 0 for some n ) 1.
Since
Pi;)= Pi (5.1)
1I.i2....,ln - 1 E.S
i - j if and only if there exists one chain (i, i,, i 2 , ... , i n _ 1 , j) such that
Pul' pi,i2 , . , p,,,_ 1 j are strictly positive.
Proposition 5.1
(a) For every i there exists (at least one) j such that i ' f. --
Y-
Proof. (a) For each i, jEs pit = 1. Hence there exists at least one j for which
p i; >0; for this] one has i-- j.
(b) i -* j, j - k means that there exist m? 1, n I such that p;f > 0, >, )
pi
l
m+1)
= li- Pu
i
(m) = pif Pi (m) + pilp it ) > 0. (5.3)
Ics )3E.%
i.e., there exists m' >, 1 such that p1') > 0. Then, by (b), i - k. Since i is essential,
one must have k -> i. Together with i --+ j this implies (again by (b)) k - j.
Thus, if any state k is accessible from j, then j is accessible from that state k,
proving that j is essential.
(e) If 60 is empty (which is possible, as for example in the case p 1 = 1,
and j - k. Hence i -4 k (by (b)). Also, k - j and j -> i imply k --> i (again by
(b)). Hence i H k. This shows that "-+" is transitive (on 9 as well as on S).
From the proof of (e) the relation "^-" is seen to be symmetric and transitive
on all of S (and not merely 9). However, it is not generally true that i i (or,
i - i) for all i e S. In other words, reflexivity may break down on S.
Definition 5.3. A transition probability matrix p having one essential class and
no inessential states is called irreducible.
Now fix attention on S. Distinct subsets of essential states can be identified
according to the following considerations. Let i e.1. Consider the set
6'(i) = { j e 6": i --> j}. Then, by (d), i +-] for all j E 6(i). Indeed, if], k e 6"(i), then
j H k (for j --> i, i - k imply j -+ k; similarly, k - j). Thus, all members of 6'(i)
communicate with each other. Let r e 6, r 6"(i). Then r is not accessible from
a state in 6'(i) (for, if j e e'(i) and j -+ r, then i --> j, j -a r will imply i -> r so
that r e 6"(i), a contradiction). Define S(r) = { j E 6',r -+ j}. Then, as before, all
states in c0(r) communicate with each other. Also, no state in 6"(r) is accessible
from any state in 6'(i) (for if ! e "(r), and j e 6'(i) and j -+ 1, then i - 1; but r F + 1,
so that i - 1, 1 --> r implying i - r, a contradiction). In this manner, one
decomposes 6" into a number of disjoint classes, each class being a maximal set
of communicating states. No member of one class is accessible from any member
of a different class. Also note that if k e 6"(i), then 6"(i) = 6'(k). For if j e 6'(i),
then j , i, i -* k imply j -- k; and since j is essential one has k --> j. Hence
j e 6"(k). The classes into which of decomposes are called equivalence classes.
In the case of the unrestricted simple random walk {S}, we have
6" S = {0, + 1, 2,. . .}' and all states in 6" communicate with each other;
only one equivalence class. While for {X} = {S 2n }, = S consists of two disjoint
equivalence classes, the odd integers and the even integers.
Our last item of bookkeeping concerns the role of possible cyclic motions
within an essential class. In the unrestricted simple random walk example, note
that p i ,=0 for all i=0,1,2,...,butp;2 ) =2pq>0. In fact p;7 1 =0for
all odd n, and p;" > 0 for all even n. In this case, we say that the period of i
)
Proposition 5.2
(a) If i H j then i and j possess the same period. In particular "period" is
constant on each equivalence class.
(b) Let i e 9' have a period d = d ; . For each j e 6'(i) there exists a unique
integer r 1 , 0 < rj d - 1, such that p;j ) > 0 implies n = rj (mod d) (i.e.,
either n = rj or n = sd + rj for some integer s >, I).
for all positive integers a, m, b. Choose a and b such that p;, > 0 and pj(b > 0.
) )
0 1 0 0
0 0 1 0
0 2 1 0 '2
1 0 0 0
4.1
T
I-->2^3
2
Thus p;i' = 0, pi 1 > 0 , P11 > 0, etc., and pi"i = 0 for all odd n. The states
communicate with each other and their common period is 2, although
min{n: p;"1 > 0} = 4. Note that min{n > 1: p; ) > 0} is a multiple of d, since d,
divides all n for which p;" ) > 0. Thus, d. <, min{n >, 1: p> 0}.
Proposition 5.3. Let i E e have period d> 1. Let Cr be the set of j e .9(i) such
that rj = r, where rr is the remainder term as defined in Proposition 5.2(b). Then
(a) Co , C,, ... , Cd _, are disjoint, U r = C, =
(b) If je C then pik >0 implies k e C, + ,, where we take r + 1 = 0 if
r = d 1.
Here is what Proposition 5.3 means. Suppose i is an essential state and has
a period d > 1. In one step (i.e., one time unit) the process can go from i E Ca
only to some state in C, (i.e., p 1 > 0 only if j e C 1 ). From states in C,, in one
step the process can go only to states in C 2 . This means that in two steps the
process can go from i only to states in C 2 (i.e., p; > 0 only if je C 2 ), and so
on. In d steps the process can go from i only to states in Cd + , = CO3 completing
one cycle (of d steps). Again in d + 1 steps the process can go from i only to
states in C 1 , and so on. In general, in sd + r steps the process can go from i
only to states in Cr. Schematically, one has the picture in Figure 5.1 for the
case d = 4 and a fixed state i e Co of period 4.
Example S.S. In the case of the unrestricted simple random walk, the period
is 2 and all states are essential and communicate with each other. Fix i = 0.
Then C o = {0, 2, 4, ...}, C l = (1, 3, 5, ...}. If we take i to be any
even integer, then C O3 C l are as above. If, however, we start with i odd, then
C o = { 1, 3, 5, ...}, C, = {0, 2, 4, ...}.
C,
iEG,--.jEC j ^kEC,./EC,^mEC^
Figure 5.1
126 DISCRETE-PARAMETER MARKOV CHAINS
Proposition 6.1. Suppose S is finite and p,J > 0 for all i, j. Then there exists a
unique probability distribution it = {m 1 : j e S} such that
and
Proof. Let M;" ) , m;" ) denote the maximum and the minimum, respectively, of
the elements {p: i e S} of the jth column of p'. Since p ;J >, 6 and
p ;J = 1 Y- k0 J P ik <- 1 (N 1)b for all i, one has
J JE)' JE)
so that
and
I (Pij P) _ y- Pij
jE jEJ jEJ
Therefore,
(n + 1)(n + 1) (n) (n) (n)
nj Pi'j = Pik Pkj Pi'k Pj
k = (Pik Pi'k)Pj
k
k k k
/ (n) (n)
\Pik Pi'k)Pkj + (Pik Pi'k)Pkj
kcJ kEJ'
kJ kJ'
min)). (6.6)
_ (Mj mj )` ") ") y (Pik Pik)) (1 NS)(M
`keJ
Letting i, i' be such that p (n+l) _ Mj(n+l) p1j = m(n+l) one gets from (6.6),
Min+l) m 5v+l) < (1 N6)(Mj(" min)). ) (6.7)
Now
M in+ 1) = max (n+ 1) = max I )) < max (Y- p ik M(n)1 = Mcn) ,
P \ P'k
Pkj j J
i i k i k
m
r 1) =min p^^ +i) = min Y_ P ik Pkj) min )
(^ Pikm;" ^ = m;" ) ,
i i \ k J i k
bounded above by 1, (6.7) now implies that both sequences have the same limit,
say n j . Also, 6<mj(' <m<nj <Mj for all n, so that n j >6 for all jand
) ()
one gets it j = E k Trk Pkj, proving (6.1). Since E j p;^ = 1, taking limits, as n -- co,
)
IP(jv+r") 1Cjl = - v)
Pik ) (Pkj x j ) ) 1< L, Pik )(1 N(5 ' ) " = (1 Nb ' ) ,
" in i 1,
k k
(6.13)
to obtain
where [x] is the integer part of x. From here one obtains the following corollary
to Proposition 6.1.
Also,
Pi
Pn(Xm = lo, Xm+1 = i l , ... , Xm+n = in) _ (TC ' P m )i0Pt0t Pi 1 i2
1 .. .
r 11.
Now let
In a manner analogous to the proof of Proposition 6.1, one can also obtain
the following result (Exercise 9).
and
p( )(x, y) n(y)I '< [1 6(d c)]' '0 for all x, ye (c, d) (6.24)
where
Here p "(x, y) is the n -step transition probability density function of X" given
[
Xo = x.
Proof. Define A + = {A > 0: Ax >, tix for some nonnegative nonzero vector x};
here inequalities are to be interpreted componentwise. Observe that the set A +
is nonempty and bounded above by IIAII :_ JN , J" , a ij . Let A o be the least
upper bound of A. There is a sequence {2 n : n >, l} in A + with limit A o as
n -+ oc. Let {x n : n >, l} be corresponding nonnegative vectors, normalized so
that hIxIl xi = 1, n >, 1, for which Ax n >, .? n x n . Then, since Ilxnll = 1 ,
n = 1, 2, ... , {x n } must have a convergent subsequence, with limit denoted x 0 ,
say. Therefore Ax 0 >, 2 0 x 0 and hence A o e A + . In fact, it follows from the least
1 _j a x
upper bound property of 2 c that Ax o = 2 0 x 0 . For otherwise there must be a
component with strict inequality, say l j j 2 0 x 1 = > 0, where
x 0 = (x 1 ..... x N )', and Ij , a kj x j 2 o x k >, 0, k = 2, ... , N. But then taking
y = (x 1 + ((5/2), x 2 , ... , x N )' we get Ay > pl o y with strict inequality in each
component. This contradicts the maximality of 2. To prove that if A is any
other eigenvalue then Al J <, A., let z be an eigenvector corresponding to A and
define Izl = (Iz1l, , Iz n l). Then Az = Az implies AIzl %JAIIzl. Therefore, by
definition of 2, we have 1 2 1 <, 20. To prove part (ii) of the Theorem we can
apply Proposition 6.1 to the transition probability matrix
C ^ P`, 10xi
(2
)X j
(2) N
Puj k1 Pik Pkj
N a ik x k a kj xj a
k = 1 20x1 AOxk ^O X
Xi
and inductively
^n> ai] xj
Pij ^f n >
A xi
Y
j=1
a,j x j = 2 o x 1 , i = 1, .. , N, x i > 0,
we have
N-1
(Bx)i = a,jxj = 2oxi a,NXN < 2OX;, 1= 1,. ..,N 1.
j=1
Corollary 6.6. Let A = ((a ;j )) be a matrix of strictly positive elements and let
2 0 be the positive eigenvalue of maximum magnitude (spectral radius). Then
A 0 is a simple eigenvalue.
In general, the transition law p may admit several essential classes, periodicities,
or inessential states. In this section, we consider the asymptotics of p" for such
cases.
First suppose that S is finite and is, under p, a single class of periodic essential
states of period d> 1. Then the matrix p , regarded as a (one-step) transition
probability matrix of the process viewed every d time steps, admits d
(equivalence) classes C o , C l , ... , C_ 1 , each of which is aperiodic. Applying
(6.16) to Cr and p d (instead of S and p) one gets, writing N, for the number of
elements in Cr ,
^pi; ^ ic j 1 '< (1 Nr S r
)In/vr for i, j a Cr, (7.1)
where n j = lim n . p;^l, v, is the smallest positive integer such that p!yd > > 0
STEADY-STATE DISTRIBUTIONS 133
for all i, j e C and b r = min{ pij, d ) : i, j e C,}. Let 6 = min{b,: r = 0, 1, ... , d l}.
Then one has, writing L = min{ N,: 0 < r < d 1 }, v = max{v,: 0 < r < d 1 },
If i E C, and j c C S with s = r + m (mod d), then one gets, using the facts that
pkjd) = 0 if k is not in Cs , and that Y-kEC. pik ) = 1 ,
d+m) 71j )I
7I j l = I I Pik)(Pkjd) (I L5)'" for i e Cr, E Cs .
^ Pij j
keC s
(7.3)
Of course,
d+m') = 0 if i e C j E Cs , and m' s r (mod d). (7.4)
Note that {nj : j e Cr } is the unique invariant initial distribution on C r for the
restriction of p d to C,.
Now, in view of (7.4),
nd nI
= pit'd+m)
if je C j E Cs , and m = s r (mod d). (7.5)
t=1 no
If r = s then m = 0 and the index of the second sum in (7.5) ranges from 1 to
n. By (7.3) one then has
I nd
lim - Y pii = 7r J.
) (i, j ES). (7.6)
nac n t=1
n
n
lim - P _
) (i, j e S). (7.7)
nm nt=1 d
k 1 n
keS d
>i
, lim
kes n-oo
-
n [=I
P ik ) Pkj
n
7C
= lim - Y Pi;+ I) = (j e S). (7.8)
n-xnI d
na
fj = 1 Y_ Y ^k Pkj =
=1 kES keS
nk ^ 1
n=1
Pk(
i
(7.10)
and zero probabilities to states not in 4 Then it is easy to check that ^6 = I a ; t" )
Let {X} be a Markov chain with countable state space S and transition
probability law p = ((p ij )). As in the case of random walks, the frequency of
returns to a state is an important feature of the evolution of the process.
PJ (X =j i.o.) = 1, (8.1)
and transient if
t^ =0,
} '}=min{n>0:X=j},
(8.3)
r=min{n> T:X=j} (r= 1,2,...),
Pj(T! ') < 00) = Pj (ii' - ' < oo and Xt,,- + = i for some n i 1)
}
Therefore, by iteration,
Pj(T ! ' ) < cc) = Pj(i{ l } < cc)P' - ' = Pji P - ` (r = 2, 3, ...). (8.6)
Now
= ^Pii=1
1 i
= Jim P i (T;r ^ < oo) f (8.8)
r .. 0 if pii < 1.
Further, write N(j) for the number of visits to the state j by the Markov chain
{X}, and denote its expected value by
cc
E1N(j) _ Pj(N(j) > r) _ Pr+t < )
op) = Pji (8.10)
r=0 r=0 r=0
so that, if i 0 j,
0 if i +* j, i.e., if p ii = 0,
G(i, j) = Pii/(l - p) if i --, j and p 1, (8.11)
00 if i -+ f and pii = 1.
Proposition 8.1
(a) Every state is either recurrent or transient. A state j is recurrent iff p ii = 1
iff G(j, j) = oc, and transient if pii < 1 if G(j, j) < oo.
(b) If j is recurrent, j --* i, then i is recurrent, and pi; = p ii = 1. Thus,
recurrence (or transience) is a class property. In particular, if all states
communicate with each other, then either they are all recurrent, or they
are all transient.
(c) Let j be recurrent, and S(j) _ {i e S: j -- i) be the class of states which
communicate with j. Let n be a probability distribution on S(j). Then
Proof. Part (a) follows from (8.8), (8.11). For part (b), suppose j is recurrent
and j - i (i j). Let A r denote the event that the Markov chain visits i between
the rth and (r + 1)st visits to state). Then under Pi , A, (r >, 0) are independent
events and have the same probability 6, say. Now 0 > 0. For if 0 = 0, then
PJ (X = i for some n >, 1) = Pj ((J r>0 A,) = 0, contradicting j - i. It now
follows from the second half of the Borel-Cantelli Lemma (Chapter 0, Lemma
6.1) that PJ (A. i.o.) = 1. This implies G(j, i) = oo. Interchanging i and j in (8.11)
one then obtains p,, = 1. Hence i is recurrent. Also, pi; >_ Pj (A r i.o.) = 1. By
the same argument, p ii = 1.
Hence
n
Note that G(i, i) = 1/(1 p a ), i.e., replace p, by 1 in (8.10).
For the simple random walk with p> Z, one has (see (3.9), (3.10) of
Chapter I),
j
)'-
Proposition (8.1) shows that the difference between recurrence and transience
is quite dramatic. If) is recurrent, then P P (N(j) = cc) = 1. If] is transient, not
only is it true that P3 (N(j) < oo) = 1, but also E1 (N(j)) < oo. Note also that
every inessential state is transient (Exercise 7).
events have positive probability. If ax c 11, then I[ax + u]I = laxl. Since
0 is absorbing, all states x ; 0 must, therefore, be inessential, and thus transient.
To obtain convergence, simply note that the state space of the process started
at x is finite since, with probability 1, the process remains in the interval
[ Ixl, Ixl]. The finiteness is critical for this argument, as one can readily see
by considering for contrast the case of the simple asymmetric (p > 2) random
walk on the nonnegative integers with absorbing boundary at 0.
Consider now the k-dimensional problem X n+ 1 = [AX n + U n+ 1 ], n = 0, 1, 2,
, where A is a k x k real matrix and {U n } is an i.i.d. sequence of random
vectors uniformly distributed over the k-dimensional cube, [0, 1) k and [ ] is
defined componentwise, i.e., [x] = ([x 1 ], ... , [x k ]), x e 11 k It is convenient to
use the norm II II defined by Ilx lie := max{Ix,l, , x,j}. The ball
B,(0):= {x: Ilxllo < r} (of radius r centered at 0) for this norm is the square of
side length 2r centered at 0. Assume II Ax II o < II x II o, for all x 0 (i.e.,
IIAII := sup, , 0 I1AxiI 0 /Ilx11 0 < 1) as our stability condition. Once again we wish
to show that X n 0 as n > oo with probability 1. As in the one-dimensional
case, 0 is an (absorbing) essential state that is accessible from every other state
x because there is a subset N(x) of [0, 1) k having positive volume such that for
each u c N(x), one has II[Ax + u]110 < IIAxllo < Ilxll o . So each state x # 0 is
inessential and thus transient. The result follows since the process starting at
x e 71k does not escape B 11x1 ^ 0 (0), since II[Ax + u]110 < I1Axllo + I < 11x1, + 1
and, since II[Ax + u]Il o and Ilx110 are both integers, (I[Ax + u](lo < 11x10
Linear models X n+ , = AX n + E n+ ,, with {E n } a sequence of i.i.d. random
vectors, are systematically analyzed in Section 13.
Zj r I I I
r
lim Z Z= EZ 1 , (9.3)
r-+x r 1
provided that EIZ 1 I < oc. In what follows we will make the stronger assumption
provided
that
T 2,
NN =max{r>0:tjr'<n}. (9.5)
n 1 J( m)
m=0 r=1 n+I
S Al)
T
f(i) f(Xn)
x,=r T T
f(j) f(j) f(j) f(j) f(j)
---- T-------I Y-------------T--------------z---
f(k)
k T T
f(xn1)
7...
U t I I
ZIA Zi Z, ... ZN
"-I Z.N,
Sn
Figure 9.1
1 T 11)
The last sum on the right side of (9.6) has at most r;'"' ) r( N ^ ) summands,
this number being the time between the last visit to j by time n and the next
visit to j. Although this sum depends on n, under the condition (9.4) we still
have that (Exercise 1)
1T '" a l l 1 T'N-' 1 1
Therefore,
S _ 1 N^ N 1 N^
2] Z,+R n = " E Z,+R n , (9.9)
n n, 1n
Nn ,= 1
where R n --* 0 as n --> oo with probability 1 under (9.4). Also, for each sample
path outside a set of probability 0, N N oo as n oo and therefore by (9.3)
(taking limit over a subsequence)
1 N^
lim Y Z r = EZ i (9.10)
n_^
if (9.4) holds. Now, replacing f by the constant function f - 1 in (9.10), we have
THE LAW OF LARGE NUMBERS 141
_' = E(tj 2) T i
-
lim L_ (9.11)
nx Nn
T (N,J
nT (N " ) < T(N"+ 1)
J J J
Note that the right side, E(r 2) t'), is the average recurrence time of
j(= E j tj(' ) ), and the left side is the reciprocal of the asymptotic proportion of
time spent at].
Combining (9.9)(9.11) gives the following result. Note that positive recurrence
is a class property (see Theorem 9.4 and Exercise 4).
(b) If S comprises a single class of essential states (irreducible) that are all
positive recurrent, then (9.15) holds with probability I regardless of the
initial distribution.
1 n
lim - P;;) _ (9.16)
n- n m=i Et
142 DISCRETE-PARAMETER MARKOV CHAINS
f(j)
I for all k ^ j. (9.17)
f(k)=0
Then Zr - 1 for all r = 0, 1, 2, ... since there is only one visit to state j in
(T cr) ,
i(r+1)] Hence, taking expectation on both sides of (9.15) under the initial
state i, one gets (9.16) after interchanging the order of the expectation and the
limit. This is permissible by Lebesgue's Dominated Convergence Theorem since
(n + I)' YOf(Xm)I < 1.
constitute an invariant distribution if the states are all positive recurrent and
communicate with each other. Let us first show that Y- t it = I in this case. For
this, introduce the random variables
i.e., T i ' is the amount of time the Markov chain spends in the state i between
( )
the rth and (r + 1)th passages to j. The sequence {T: r = 1, 2, ...} is i.i.d. by
the strong Markov property. Write
Then,
03(i) _ EJ T; 1) = Ej ( T}1)) = E j ( t f2 _ r ) = -.
(9.21)
ics 1 l l 7f
r = 1, 2, ...}, and by (9.12) and (9.18), the limit on the left side also equals
N
V T(r)
nJ IN, T1'
lim -h---- = lim " N = n 1 Bj (i). (9.23)
n- oc n- x n \r1 n
Hence,
ni = n j Oj (i). (9.24)
(9.25)
71 f
By Scheffe's Theorem (Theorem 3.7 of Chapter 0) and (9.16),
In
- 71i +0 as n ^ oo . (9.26)
iES nm =1
Hence,
1 1
Pam) Pij < TCi - Y- pj, " ') --> 0 as n
iES iES n m= 1 iES n m= 1
(9.27)
Therefore,
in
= hm - Y E p^.m)p . `
n-^ao n m= 1 iES
1 n (m+l) _
= lim Pi; it; (9.28)
n-'x n m=1
it = z niPij (m = 1, 2, ...).
; )
(9.29)
iES
144 DISCRETE-PARAMETER MARKOV CHAINS
1
n; = Y_ ni
ieS n m=1
i Pi; (9.30)
If j is a null recurrent state and i..j, then for the Markov chain having initial
state i the sequence {Z,: r = 1, 2, ...} defined by (9.2) with f - 1 is still an i.i.d.
sequence of random variables, but the common mean is infinity. It follows
(Exercise 3) that, with Pi -probability 1,
j (N")
Since n we have
n+l
lim = cc,
n- w Nn
and, therefore,
lim N = 0
" with P- probability 1. (9.32)
". te n+l
Since 0 < N" /(n + 1) < 1 for all n, Lebesgue's Dominated Convergence
Theorem applied to (9.32) yields
N
lim E " ; = 0. (9.33)
n -^ n+1
But
lim p) = 0 (9.35)
n-.o Il + 1 m=1
00
p;; ) <ao.
m=0
In particular, therefore,
if j is a transient state.
The main results of this section may be summarized as follows.
Theorem 9.4. Assume that all states communicate with each other. Then one
has the following results.
(a) Either all states are recurrent, or all states are transient.
(b) If all states are recurrent, then they are either all positive recurrent, or
all null recurrent.
(c) There exists an invariant distribution if and only if all states are positive
recurrent. Moreover, in the positive recurrent case, the invariant
distribution it is unique and is given by
(d) In case the states are positive recurrent, no matter what the initial
distribution p, if E,, I f (X 1 )I < cc, then
n
him !
- Z f(Xm) = Z nif(i) = Enf(X,) (9.38)
n c m=1 ieS
with P -probability 1.
in the latter case). Thus (9.24) holds. Now O (i) > 0; for otherwise T = 0 with
probability 1 for all r > 0, implying p , = 0, which is ruled out by the assumption.
;
The relation (9.24) implies it > 0, since it > 0 and O (i) > 0. Therefore, all states
are positive recurrent.
For part (c), it has been proved above that there exists a unique invariant
probability distribution it given by (9.35) if all states are positive recurrent.
Conversely, suppose it is an invariant probability distribution. We need to show
that all states are positive recurrent. This is done by elimination of other
possibilities. If the states are transient, then (9.36) holds; using this in (9.29)
(or (9.30)) one would get it = 0 for all j, which is a contradiction. Similarly,
null recurrence implies, by (9.30) and (9.35), that ii = 0 for all j. Therefore, the
states are all positive recurrent. Part (d) will follow from Theorem 9.2 if:
(i) The hypothesis (9.14) holds whenever
and
=)
T!
Since ET;' = O (i) = n i /rr (see Eq. 9.24), taking expectations in (9.41) yields
Ta
The last equality follows upon interchanging the orders of summation and
expectation, which by Fubini's theorem is always permissible if the summands
are nonnegative. Therefore (9.14) follows from (9.39). Now as in (9.42),
T cn
where this time the interchange of the orders of summation and expectation is
justified again using Fubini's theorem by finiteness of the double "integral."
n
THE LAW OF LARGE NUMBERS 147
If the assumption that "all states communicate with each other" in Theorem
9.4 is dropped, then S can be decomposed into a set . t of inessential states and
(disjoint) classes S l , S 2 , ... , St of essential states. The transition probability
matrix p may be restricted to each one of the classes S..... , S, and the
conclusions of Theorem 9.2 will hold individually for each class. If more than
one of these classes is positive recurrent, then more than one invariant
distribution exist, and they are supported on disjoint sets. Since any convex
combination of invariant distributions is again invariant, an infinity of invariant
distributions exist in this case. The following result takes care of the set , of
inessential states in this connection (also see Exercise 4).
Proposition 9.5
(a) If j is inessential then it is transient.
(b) Every invariant distribution assigns zero probability to inessential,
transient, and null recurrent states.
Proof. (a) If j is inessential then there exist i E S and m > I such that
P;M > 0
) and p;; = 0
) for all n >, 1. (9.44)
Hence
Corollary 9.6. If S is finite, then there exists at least one positive recurrent
state, and therefore at least one invariant distribution n. This invariant
distribution is unique if and only if all positive recurrent states communicate.
Proof. Suppose it possible that all states are either transient or null recurrent.
Then
Since (n + 1) ' I, _ , p;T < I for all], and there are only finitely many states
)
1 1 "
lim Pi; ^) = lim Pi;
jes n n+ 1 m=1 nix jes n+ 1 m=o
1
= lim (m)
%^ij
n -. x n + 1 m=0 jcs
n
=tim 1 i 1 =lim n--1 =1. (9.47)
(n + 1 m=o
- n-x,n+1
But the first term in (9.47) is zero by (9.46). We have reached a contradiction.
Thus, there exists at least one positive recurrent state. The rest follows from
Theorem 9.2 and the remark following its proof.
The same method as used in Section 9 to obtain the law of large numbers may
be used to derive a central limit theorem for S. = Ym =o f(Xm ), where f is a
real-valued function on the state space S. Write
_ n _
Sn = Y- .J(X.) = Z (f(Xm) - U),
m=0 m=0
(10.3)
Zr = ^ J(X) (r = 0, 1, 2, ...).
m =,(i +1
Then by (9.40),
Thus {Z,: r = 1, 2, ...} is an i.i.d. sequence with mean zero and finite variance
or = EJ . (10.5)
Now apply the classical central limit theorem to this sequence. As r - cc,
(1/^) jk =1 Zk converges in distribution to the Gaussian law with mean zero
and variance U 2 . Now express Sn as in (9.6) with f replaced by f, S" by S,, and
THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS 149
Zr by Zr, to see that the limiting distribution of (1/^)S n is the same as that
of (Exercise 1)
1 N,, Na )112
(
1 N"
(10.6)
7 r i
n JNn r=1
We shall need an extension of the central limit theorem that applies to sums
of random numbers of i.i.d. random variables. We can get such a result as an
extension of Corollary 7.2 in Chapter 0 as follows.
Proposition 10.1. Let {XX : j >, 1} be i.i.d., EX^ = 0, 0 < a 2 := EX J2 < oc. Let
{v n : n >, l} be a sequence of nonnegative integer-valued random variables with
v
lim " = a in probability (10.7)
n-,, n
+
P( max
(mI (na1I <L 3 Ina17
ISm (n]1 % c([na]) 1J2 )
(10.8)
The first term on the right goes to zero as n > x, by (10.7). The second term
is estimated by Kolmogorov's Maximal Inequality (Chapter 1, Corollary 13.7),
as being no more than
S" Sfna1 + 0
in probability. (10.10)
1r2
1[na])
Since S1na1 /([na]) l " 2 converges in distribution to N(0, 1), it follows from (10.10)
that so does S"/([na])'/ z . The desired convergence now follows from (10.7).
n
By Proposition 10.1, N'2 1 Zr is asymptotically Gaussian with mean
zero and variance o z . Since N/n converges to (Er') ', it follows that the
expression in (10.6) is asymptotically Gaussian with mean zero and variance
150 DISCRETE-PARAMETER MARKOV CHAINS
Wn(t) = Sin(]
n+l
(10.11)
I
l' (t) = W(t) + (nt [nt])X 1+ 1 (t '> 0),
n+l
(Exercise 2). In fact, convergence of the full distribution may also be obtained
by consideration of the above renewal argument. The precise form of the
functional central limit theorem (FCLT) for Markov chains goes as follows (see
theoretical complement 1).
Then,
/
E,J 2 (Xm)
n + 1 m=0
ABSORPTION PROBABILITIES 151
2
m-m
= EJ 2(X0) + Y Y Erz[f
1 (Xm , )(P J )(XYr')]
n + 1 m=1 m=0
rn- I
= EJ 2 (X0 ) + ^ Erz % (X0)(P m '.j)(X0)]
n + 1 m=1 m'=0
2 n m-1
n + 1 m=1 m'=0
=Erzf 2 (X0)+ 2 f, 1 Y- k
Y-Pi ( k=mm ' )
n+ 1 m = 1 k= 1I I n
(10.14)
Now assume that the limit
m
y= lim I <f,P kfi rz (10.15)
mimeo k=1
Note that
and
m m
Y <j P k j >rz = COV{f(X0),J (Xk)l.
k=1 k=1
The condition (10.15) that y exists and is finite is the condition that the
correlation decays to zero at a sufficiently rapid rate for time points k units
apart as k , oo.
11 ABSORPTION PROBABILITIES
where Y* denotes summation over all m-tuples (i,, i 2 , ... , i) of elements from
S { j}. Now let p denote the matrix obtained by deleting the jth row and jth
column from p,
and, therefore,
Observe that the above idea can be applied to the calculation of the first
passage time to any state j e S or, for that matter, to any nonempty set A of
states such that i 0 A. Moreover, it follows from Theorem 6.4 and its corollaries
that the rate of absorption is, therefore, furnished by the spectral radius of p .
This will be amply demonstrated in examples of this section.
where p is the matrix obtained by deleting the rows and columns of p
corresponding to the states in A.
In general, the matrix p is not a proper transition probability matrix since the
row sums may be strictly less than 1 upon the removal of certain columns from
p. However, if each of the rows in p corresponding to states j e A is replaced
by ej having 1 in the jth place and 0 elsewhere, then the resulting matrix ,
say, is a transition probability matrix and
ABSORPTION PROBABILITIES 153
The reason (11.8) holds is that up to the first passage time t o the distribution
of Markov chains having transition probability matrices p and (starting at i)
are the same. In particular,
In the case that there is more than one state in A, an important problem is
to determine the distribution of X,,, starting from i A. Of course, if
P; ('r A < co) < 1 then X, is a defective random variable under Pi, being defined
on the set {r A < co} of P; -probability less than 1.
Write
Denoting by a j the vector (a(i): i E S), viewed as a column vector, one may
express (11.10) as
(11.13)
_ 1 ifi=j
(^ E A).
a ' (j) 0 if i E A, but i A j
1 ifi=j
(11.15)
a ' (i) f o if i e A, i^ j.
ifieA`,keA,and
for all je A`, since a(k) = 0 for k E A\{ j} and a(j) = 1. Hence a ; is the unique
solution of (11.13).
Conversely, if P; (T A < cc) < 1 for some in A`, then the function
h = (h(i): i e S) defined by
(2N)
P(-Y+1 = k X o , X,, ... , X) 8n(1 B)zN k -
(11.23)
Notice that {X} is an aperiodic Markov chain. The "boundary states {0}
and {2N} form closed classes of essential states. The set of states
{ 1, 2, ... , 2N I } constitute an inessential class. The model has a special
conservation property of the form of the following martingale property,
for n = 0, 1, 2..... However, since S is finite we know that in the long run
{X} is certain to be absorbed in state 0 or 2N, i.e., the population is certain
to eventually come to a unanimous opinion, be it pro or con. It is of interest to
calculate the absorption probabilities as well as the rate of absorption. Here,
with A = {0, 2N}, one has p = p.
Let a(i) denote the probability of ultimate absorption at j = 0 or at j = 2N
starting from state i e S. Then,
156 DISCRETE-PARAMETER MARKOV CHAINS
a 2N()
i = 2N , i =0, 1, 2,...,2N, (11.28)
= 2N i
ao(i) 2N ' i = 0, 1, 2, ... , 2N. (11.29)
2N
E' p j vj =2v i ,
; i =0,1,...,2N. (11.30)
j=o
The rth factorial moment of the binomial distribution (p ij : 0 < j < 2N) is
(2N rl
Z j(j 1)... (j r + l)pij =( 1r(2N)...
j=o 2N)
(2N r + 1)
j =r j r
(jJ_r i 2N -j
x 2N) 1 2N
jj (11.31)
for r = 1, 2, ... , 2N. Equation (11.31) contains a transformation between
"factorial powers" and "ordinary powers" that deserves to be examined for
connections with (11.30). The "factorial powers" (j) r := j(j 1)... (j r + 1)
are simply polynomials in the "ordinary powers" and can be expressed as
r
j(1 1)(j r + 1) _ skjk. (11.32)
k =1
Likewise, "ordinary powers" jr can be expressed as polynomials in the "factorial
ABSORPTION PROBABILITIES 157
powers" as
r
J r = Sk(J)k, (11.33)
k=0
with the convention (j) 0 = 1. Note that S r' = 1 for all r > 0. The coefficients
Sr), {Sk} are commonly referred to as Stirling coefficients of the first and second
kinds, respectively.
Now every vector v = (v 0 , ... , v 2N )' may be represented as the successive
values of a unique (factorial) polynomial of degree 2N evaluated at 0, 1, . .. , 2N
(Exercise 7), i.e.,
2N
a r (j)r forj=0,1,...,2N. (11.34)
r=0
pij vj = L-
j =0 r=0
Y- pij(J)r r=0
j=0
ar Y- a r
(2N)
r jr = Y- ar r t J
2N n=0
r=0 (2N)
S( \
2N /2N
(2N)r (11.35)
= Y- ar (2N )r S)(1)n.
n=0 r=n
_ _ (2 N )2N (2N)
^`'2N (2N)2N + a2N I,
\ 1 1.37 )
a2n
and a r = & 2N) , r = 2N 1,... 0, are solved recursively from (11.36). Next
takea2N 1) 0 , a2N-1 1) = 1 and solve for
(2N)2N_ 1
(2N)2N-1'
etc.
Then,
pV = VD, (11.39)
or
p = VDV -1 , (11.40)
so that
Since the left side of (11.42) must go to zero as m oo, the coefficients of
and A; - 1 must be zero. Thus,
2N 2N-1
Pi(T(0.2N) > m) _ Y, Y, Vikv k ^^k
k=O j=1
_ ^2 vi2v 2i ) + kY 1
Lt , E1 1 , Y1 1
Vik Uk,/ \^ 2/ m J
.f * `(j), ifi>'1,j>'0,
p ij = 1 ifi=0,j=0, (11.44)
0 ifi=0,j 0.
Since a power series can be differentiated term by term within its radius of
convergence, one has
i.e., if
= Y_
If the mean y of the number of particles generated by a single particle is finite,
i =1
ifU) < oo,
(11.49)
Since &'(z) > 0 for 0 < z < 1, 0 is strictly increasing. Also, since 4) "(z) (which
exists and is finite for 0 < z < l) satisfies
dz
zz O(z) _ j(j 1)f(j)z' 2 >0 for 0 < z < 1, (11.50)
i =2
the function 0 is strictly convex on [0, 1]. In other words, the line segment
joining any two points on the curve y = 4)(z) lies strictly above the curve (except
at the two points joined). Because 0(0) = f(0) >0 and 4)(l) _ f(j) = 1,
the graph of looks like that of Figure 11.1 (curve a or b).
The maximum of '(z) is y, which is attained at z = 1. Hence, in the case
> 1, the graph of y = 4)(z) must lie below that of y = z near z = 1 and, because
4)(0) = f(0) > 0, must cross the line y = z at a point z o , 0 < z o < 1. Since the
slope of the curve y = 4)(z) continuously increases as z increases in (0, 1), z o is
the unique solution of the equation z = 4)(z) that is smaller than 1.
160 DISCRETE-PARAMETER MARKOV CHAINS
4(0) = f(0)
U Zo I z
Figure 11.1
In case u <, 1, y = cb(z) must lie strictly above the line y = z, except at z = 1.
For if it meets the line y = z at a point z o < 1, then it must go under the line
in the immediate vicinity to the right of z o , since its slope falls below that of
the line (i.e., unity). In order to reach the height 4(l) = 1 (also reached by the
line at the same value z = 1) its slope then must exceed 1 somewhere in (z o , 1];
this is impossible since 0'(z) < 0'(1) = p < I for all z in [0,1]. Thus, the only
solution of the equation z = 4(z) is z = 1.
Now observe
thus if p <, 1, then p = 1 and extinction is certain. On the other hand, suppose
p> 1. Then p is either z o or 1. We shall now show that p = z o (< 1). For this,
consider the quantities
That is, q is the probability that the sequence of generations originating from
a single particle is extinct at time n. As n increases, q. j p; for clearly,
{X=0}c{X,,,=0} for all m>,n,so that q<,qifn<m.Also
co
Since q 1 = f(0) = 4(0) < 4(z 0 ) = z o (recall that b(z) is strictly increasing in z
for 0 < z < 1), one has using (11.53) with n = 1, q 2 = o(q 1 ) < O(z o ) = z o , and
so on. Hence, q <z for all n. Therefore, p = lim n .. q < z o . This proves
p = zo.
If f(0) + f(1) = 1 and 0 <f(0) < 1, then q5"(z) = 0 for all z, and the graph
of 4(z) is the line segment joining (0, f(0)) and (1, 1). Hence, p = I in this case.
Let us now compute the average size of the nth generation. One has
_^ ^k P ^n1c11^
1j Pjk P1j
^i(k^111
Pjk l `
O ro
_ p E(XI I Xo =j)=
) pi jE(XI 1X0 = 1)
j=1 j=1
ao a.
_ Pi;Ju = P 1Pij = E(X I X o = 1). (11.54)
j= 1 j=1
It follows that
Thus, in the case < 1, the expected size of the population at time n decreases
to zero exponentially fast as n -+ co. If = 1, then the expected size at time n
does not depend on n (i.e., it is the same as the initial size). If > 1, then the
expected size of the population increases to infinity exponentially fast.
162 DISCRETE-PARAMETER MARKOV CHAINS
regards the total potential energy as a sum of energies at individual sites plus
the sum of interaction energies between pairs of sites plus the sum of energies
between triples, etc., and the probability distribution is specified for various
types of such interactions. For example, if U(w) = > fEA q1(6 n ) for
ONE-DIMENSIONAL NEAREST-NEIGHBOR GIBBS STATE 163
Y_
C, except for the independent case it is not quite obvious how to do this starting
with energy considerations. First consider then the independent case. For
single-site energies q 1 (s, n), se S. at the respective sites ne Z, the probabilities
of cylinder events can be specified according to the formula
That is, the state at n depends on the given states at sites in D \ {n} only through
the neighboring values at n 1, n + 1. One would like to know that there is
a probability distribution having these conditional probabilities. For the
one-dimensional case at hand, we have the following basic result.
Theorem 12.1. Let S be an arbitrary finite set, S2 = S^, and let be the
sigmafield of subsets of ) generated by finite-dimensional cylinder sets. Let
{Xn } be the coordinate projections on S2. Suppose that P is a probability measure
on S with the following properties.
(i) P(C) > 0 for every finite-dimensional cylinder set C e S.
(ii) For arbitrary n c- Z let 3(n) = {n 1, n + 1 } denote the boundary of {n}
in 7L. If f is any S-valued function defined on a finite subset D,, of 7/ that
contains {n} u (n) and if a e S is arbitrary, then the conditional
probability
P(X=a Xm = f(m),meD\{n})
Proof. Let
gb,C(a)=P(X=1 X.- =b,X+ =c). (12.7)
...
P(XX = a o , X+ = a l , ... , X,,+n = ab) = ir(ao)p(ao, a, ) p(ab_ 1, ab). (12.8)
So, in particular, the condition (i) is satisfied; also see (2.9), (2.6), Prop. 6.1.
For m and n > I it is a straightforward computation to verify
Therefore, the condition (ii) holds for P. since condition (iii) also holds because
P is the distribution of a stationary Markov chain. Next suppose that P is a
probability distribution satisfying (i), (ii), and (iii). We must show that P is the
distribution of a stationary Markov chain. Fix an arbitrary element of S, denoted
as 0, say. Let the local structure of P be as defined in (12.7). Observe that for
each b, c E S. g( ) is a probability measure on S. Outlined in Exercise 1 are
the steps required to show that
q(b, a)q(a, c)
g(a) =q^z^(b ^) (12.10)
where
0 , (b)
q(b, c) = g (12.11)
g 9 (0)
and
q("+ '>
(b, c) = I q(b, a)q(a, c). (12.12)
aS
) (b, c) (c)
9(b, c) = q^" (12.15)
2 "u(b)
Let Q be the probability distribution of this Markov chain. It is enough to
166 DISCRETE-PARAMETER MARKOV CHAINS
show that P and Q agree on the cylinder events. Using (12.13) we have for any
n 1,
(n)(a,, b)
(n) (a , ao)9(ao, a,) ... q(ar-,, ar)q
_ P(X-n = a, Xn+r = b) q q(r+2n)(a, b)
aES bES
random variables defined on some probability space (f), .F, P). Given an initial
random variable X 0 independent of {E}, define recursively the sequence of
random variables {X": n > 0} as follows:
and le n s < c with probability I for some constant c. Then it follows from (13.5)
that
This implies that, with probability 1, Ib "E" +l l < c(Ibj6)" for all but finitely many
n. Since IbI b < 1, the series on the right side of (13.7) is convergent and is the
limit of Y".
It is simple to check that (13.8) holds if I bi < 1 and (Exercise 3)
The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the
existence of a unique invariant probability it and for the convergence of X. in
distribution to it.
As in (13.2), (13.3), {X"} is a Markov process with state space 68'" and transition
probability
Assume that
where Ixl denotes the Euclidean length of x in LRm. For a positive integer n > n o
write n = jn o + j', where 0 < j' < n o . Then using the fact IIB1B2II IIB1II IIB2II
for arbitrary m x m matrices B,, B 2 (Exercise 2), one gets
1
" ^ 1 P(i 1 I > cb") < oo for some 6 < II Bn0Il11n0 . (13.15)
It also follows, as in Example 1 (see Exercise 1), that no matter what the initial
distribution (i.e., the distribution of X 0 ) is, X n converges in distribution to the
distribution it of Y. Therefore, it is the unique invariant distribution for p(x, dy).
For purposes of application it is useful to know that the assumption (13.12)
holds if the maximum modulus of eigenvalues of B, also known as the spectral
radius r(B) of B, is less than 1. This fact is implied by the following result from
linear algebra.
Proof. Let A,, ... , A m be the eigenvalues of B. This means det(B Al) =
(^ 1 A)(A2 A) * (Am A), where det is shorthand for determinant and I is
the identity matrix. Let A m have the maximum modulus among the A ; , i.e.,
J > Iiml then B Al is invertible, since det(B AI) 0. Indeed,
I'ml = r(B). If Al
by the definition of the inverse, each element of the inverse of B Al is a
polynomial in A (of degree m 1 or m 2) divided by det(B AI). Therefore,
one may write
where B,(0 < j < m 1) are m x m matrices that do not involve A. Writing
z = 1/A, one may express (13.18) as
170 DISCRETE-PARAMETER MARKOV CHAINS
m-1
(B AI)-1 = (_)_m(1 /^1,)-1.. .(1 )!mR)-12m-1 j=0
m-1
= (-1)mz(l .l,z) -1 ...(I .mZ) 1 I z m 1-, Bj
j=0
Z Y a,, Z n) -1 Zm-1-jBj
(IZI < I2ml-1 ). (13.19)
^ n=0 /-0
To see this, first note that the series on the right is convergent in norm for
Izl < 1 /1111, and then check that term-by-term multiplication of the series I z k B k
by I zB yields the identity I after all cancellations. In particular, writing b;j'^
for the (i, j) element of B k , the series
X
zY_z k b; (13.21)
k=0
converges absolutely for Izl < 1/ B. Since (13.21) is the same as the (i, j) element
of the series (13.19), at least for Izl < I/IIBII, their coefficients coincide (Exercise
4) and, therefore, the series in (13.21) is absolutely convergent for Izl < IAm) -'
(as (13.19) is).
This implies that, for each e > 0,
For if (13.22) is violated, one may choose Izl sufficiently close to (but less than)
1/I ; m l such that Iz^ k ' ) b'>l -- co for a subsequence {k'}, contradicting the
requirement that the terms of the convergent series (13.21) must go to zero for
IZI < 1/IAmI
Now IIB k ll < m' 12 max{Ib>I: 1 < i, j < m} (Exercise 2). Since m l/21 ^ 1 as
k --* co, (13.22) implies (13.17). n
Two well-known time series models will now be treated as special cases of
Example 2. These are the pth order autoregressive (or AR(p)) model, and the
autoregressive moving-average model ARMA(p, q).
p-1
U+1 + 1
Un+p iln+p (n >, 0). (13.23)
i =0
0 1 0 0 0 0
0 0 1 0 .. 0 0
B:= . (13.27)
0 0 0 0 ... 0 1
). 1 0 0 0 0
0 ) 1 0 0 0
BA.(= ... .
0 0 0 0 ^ 1
Expanding det(B Al) by its last row, and using the fact that the determinant
of a matrix in triangular form (i.e., with all zero off-diagonal elements on one
side of the diagonal) is the product of its diagonal elements (Exercise 5), one gets
1
det(B AI) = ( - 1 p+1 ) (0 + 1A + ... + p ,; ,p -
-
A"). (13.28)
0+1A+ ...
+p-I )p-I Ap = 0 . (13.29)
Finally, in view of (13.17), the following proposition holds (see (13.15) and
Exercise 3).
Proposition 13.1. Suppose that the roots of the polynomial equation (13.29)
are all strictly inside the unit circle in the complex plane, and that the common
distribution G of {tl"} satisfies
where JA m t is the maximum modulus of the roots of (13.29). Then (i) there exists
a unique invariant distribution 71 for the Markov process {X"}, and (ii) no
matter what the initial distribution, X n converges in distribution to it.
Once again it is simple to check that (13.30) holds if G has a finite absolute
moment of some order r > 0 (Exercise 3).
An immediate consequence of Proposition 13.1 is that the time series
{ Un : n >, 0} converges in distribution to a steady state n U given, for all Bore!
sets C R', by
To see this, simply note that U. is the first coordinate of X, so that X,, converges
to it in distribution implies U. converges to it in distribution.
p I q
Un+p := Z i Un +i + 1 Sj^1 n+pj + rin+p (n 0), (13.32)
i =0 j=1
where p, q are positive integers, i (0 < i < p 1) and bj (1 < j < q) are real
constants, {ri n : n >, p q} is an i.i.d. sequence of real-valued random variables,
and U. (0 < i <, p 1) are arbitrary initial random variables independent of
{rj"}. Consider the sequence {X"}, {s"} of (p + q)-dimensional vectors
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS 173
X,,+1 = HXn + En+ 1 (n -> 0 ), (13.34)
b ll . ... b 1 0 . ... 0 0
h nl h ... b 2 d1
d9-1
0 00 1 0 .. 00
H :=
0 00 0 1 00
0 0 ... 0 0 01
0 0 . 0 0 ... 00
Therefore, the eigenvalues of H are q zeros and the roots of (13.29). Thus, one
has the following proposition.
no matter what the distribution of (U0 , U ... , U p _,) is, provided the
hypothesis of Proposition 13.2 is satisfied.
In the case that E n is Gaussian, it is simple to check that under the hypothesis
(13.12) in Example 2 the random vector V in (13.16) is Gaussian. Therefore, it
is Gaussian, so that the stationary vector-valued process {X n } with initial
distribution it is Gaussian (Exercise 6). In particular, if q n are Gaussian in
Example 2(a), and the roots of the polynomial equation (13.29) lie inside the
unit circle in the complex plane, then the stationary process {Un }, obtained
174 DISCRETE-PARAMETER MARKOV CHAINS
lt will be shown now that the following condition guarantees the existence
of a unique invariant probability it as well as stability, i.e., convergence of X(x)
in distribution to it for every initial state x. Assume
6, := P(X o (x) '< z 0 Vx) >0 and 82 := P(X o (x) z 0 Vx) >0
(14.4)
for some zo E J and some integer n o .
Define
A,:= sup I P(X,(x) '< z) P(X,(y) '< z). (14.5)
x,y,zcJ
For this fix x, y E J and first take z < z o . On the set F2 :_ {X 0 (x) >, z 0 Vx} the
176 DISCRETE-PARAMETER MARKOV CHAINS
events {X 0 (x) <, z}, {X 0 (y) < z} are both empty. Hence, by the second
condition in (14.4),
IP(Xno (x) < z) P(Xn o (y) < z)I = IE(1{x^o(x)sz) 1{x,o(Y),z))I P(FZ) = 1 6 2 ,
(14.7)
since the difference between the two indicator functions in (14.7) is zero on F2 .
Similarly, if z > z o then on the set F, :_ {X 0 (x) <, z 0 Vx} the two indicator
functions both equal 1, so that their difference vanishes and one gets
IP(X^ o (x) < z) P(Xn o (y) < z)] < P(F) = 1 6 1 . (14.8)
For,
= I E ( 1 {ai . +un o ... ain o +IXjn o (x) z) 1 (au+IW o ...ai no + Xjn o (Y)Sz) ) I
= I E(l {xjno(x) 2 ( ( Itj+1mo ... n 0+1) - '( aO.zJ) 1 {Xjno(Y)e(a(j+ llno ... a^no+1) - '( ao.zJ) )1 '
(14.13)
Let
F3 1 = {a(j + 1) no ...aj no+ lx < zOVx}, F4:= {a(j+1) no ...aj np +1X s Zpdx}.
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS 177
Take z < z o first. Then the inverse image of (x,:] in (14.13) is empty on
F4 , so that the difference between the two indicator functions vanishes on F 4 .
On the complement of F,, the inverse image of ( ao, z] under the continuous
increasing map a (j+1)n0 ' *anno + , is an interval (oo, Z'] n J, where Z' is a
random variable. Therefore, (14.13) leads to
As F and Z' are determined by a+,)n o , ... , a;n.+, and the latter are
independent of Xj(x), X 0 (y) one gets, by taking conditional expectation given
1aU+1)no, ... , ajno+1
Y0(x) = x, }(x):=12. (n
(n 1). (14.16)
and the cp (n >, 1) are independent. Thus the Markov process {X(x): n >, 0}
on the state space (0, cc) may be represented as
X.(X) = a( ...p(IX,
where, writing
one has
i.e., lim x10 g,(x) > I for all r. As lim x ^ g(x) = 0 + (1 /3) lim x . f(x) = 0 < 1,
it follows from the strictly increasing and strict concavity properties of g r that
each g r has a unique fixed point a r (see Figure 14.1)
Note that by property (iii) of fr , a, < a z < < a N . If y >, a,, then
9r(Y) % g(a 1 ) % g 1 (a 1 ) = a,, so that X,,(x) >, a, for all n > 0 if x >, a,. Similarly,
if y < a N then gr(Y) 9 r(aN ) ' 9N(aN) = a N , so that X,,(x) < a N for all n >, 0 if
x < a N . As a consequence, if the initial state x is in [a,, a N ], then the process
n >, 0} remains in [a,, a N ] forever. In this case, one may take
J = [a,, a N ] to be the effective state space. Also, if x > a, then the nth iterate
of g i , namely gin ) (x), decreases as n increases. For if x a,, then g,(x) < x,
8121 (x) = g, (g, (x)) < g, (x), etc. The limit of this decreasing sequence is a fixed
point of g l (Exercise 3) and, therefore, must be a,. Similarly, if x < a N then
g(x) increases, as n increases, to a N . In particular,
no)
91 (aN) < 9 N< 0) (a,). (14.22)
a,
u a,
Figure 14.1
180 DISCRETE-PARAMETER MARKOV CHAINS
P(X" O (x) <z O Vxc[a i ,a N ])>P(oc"=g l for I <,n <n o )=pi 0 >0,
P(X"o(x) >, z 0 Vx e [a1, aN]) % P(a = gN for 1 < n < n o ) = pn > 0.
Hence, the condition (1.4.4) of Example I holds, and there exists a unique
invariant probability it, if the state space is taken to be [a,, a N ].
Next fix the initial state x in (0, a 1 ). Then g(x) increases, as n increases.
The limit must be a fixed point and, therefore, a 1 . Since g(a 1 ) > a, for
r = 2, ... , N, there exists s > 0 such that g(y) > a l (2 r < N) if
ye [a 1 e, a 1 ]. Now find n E such that g(x) >, a i e. If T, inf{n > 1:
X(x) >_ a,}, then it follows from the above that
because t, > n E + k implies that the last k among the first n e + k function a"
are g 1 . Since p; goes to zero as k + oo, it follows from this that i l is a.s. finite.
Also XTI (x) < a N as g(y) < gr(aN) gN(aN) = aN (t < r < N) for y <, a 1 , so that
in a single step it is not possible to go from a state less than a 1 to a state larger
than a N . By the strong Markov property, and the result in the preceding
paragraph on the existence of a unique invariant distribution and stability on
[a l , a N ], it follows that XL , + ,(x) converges in distribution to it, as m o0
(Exercise 5). From this, one may show that pl"'(x, dy) converges weakly to it(dy)
for all x, as n + oo, so that it is the unique invariant probability on (0, oo)
(Exercise 5).
In the same manner it may be checked that X(x) converges in distribution
to it if x > a N . Thus, no matter what the initial state x is, X(x) converges in
distribution to n. Therefore, on the state space (0, cc) there exists a unique
invariant distribution it (assigning probability 1 to [a,, a N ]), and stability holds.
In analogy with the case of Markov chains, one may call the set of states
{x; 0 < x <a 1 or x > a N } inessential.
The study of the existence of unique invariant probabilities and stability is
relatively simpler for those cases in which the transition probabilities p(x, dy)
have a density p(x, y), say, with respect to some reference measure (dy) on the
state space. In the case of Markov chains this measure may be taken to be the
counting measure, assigning mass I to each singleton in the state space. For a
class of simple examples with an uncountable state space, let S = Il' and f a
bounded measurable function on I8', a < f (x) < b. Let {E"} be an i.i.d. sequence
of real-valued random variables whose common distribution has a strictly
positive continuous density cp with respect to Lebesgue measure on 1. Consider
the Markov process
with X0 arbitrary (independent of {E"}). Then the transition probability p(x, dy)
Note that
where
Then (see theoretical complement 6.1) it follows that this Markov process has
a unique invariant probability with a density n(y) and that the distribution of
X. converges to it(y) dy, whatever the initial state.
The following example illustrates the dramatic difference between the cases
when a density exists and when it does not.
x+l if-2x0,
(14.26)
Lx-1 if 0<x^2.
First let E be Bernoulli, P(E = I) = '' = P(f _ I). Then, with X. - x e (0, 2],
X 1 (x) =
I x 2
x
with probability ' ,
with probability i,
In other words, X 1 (x) and X 2 (x) are independent and have the same two-point
distribution It s . It follows that { X(x): n >, I } is i.i.d. with common distribution
m. In particular, n x is an invariant initial distribution. If x e [-2,0], then
{X(x): n > l} is i.i.d. with common distribution nx+2, assigning probabilities
Z and Z to {x + 2} and {x}. Thus, there is an uncountable family
of invariant initial distributions {n r : 0 < x < 1) v {n x+ ,: I _< x 0}.
On the other hand, suppose s is uniform on [ 1, 1], i.e., has the density z
on [ 1, 1] and zero outside. Check that (Exercise 6) {X 2 (x): n > l} is an i.i.d.
sequence whose common distribution does not depend on x and has a density
7 r(Y) = 2 4 IYI
2<y<2. (14.27)
C CC
iffX,=x>c+ c +-- -+..+ -
EI E1E2 E1E2...sn+1
{Xn >cforalln}={x>c+ +
_ - ++ 1 foralln}
l E1 6162 E1gZ...En
)
` D
=jxiC+cF
1 llj^
( 1 0D x
,<--1
l n=1 6 I E Z ...E n ) In=1 E182...En C
}
In other words, Ix
p(x)=P1 x 1 (14.31)
(n=j E 1 g 2 C,, C
so that log E 1 c 2 E --p oo a.s., or e,e 2 E n --, 0 a.s. This implies that the
infinite series in (14.31) diverges a.s., that is,
p(x) = 0 for all x, if E log E, < 0. (14.32)
Now by Jensen's Inequality (Chapter 0, Section 2), E log E, < log EE,, with
strict inequality unless E, is degenerate. Therefore, if EE, < 1, or EE, = 1 and
r, is nondegenerate, then E log e, < 0. If E, is degenerate and EE 1 = 1, then
P(E 1 = 1) = 1, and the infinite series in (14.31) diverges. Therefore, (14.32)
implies
p(x) = 0 for all x, if EE 1 < 1. (14.33)
It is not true, however, that E log E, > 0 implies p(x) = 1 for large x. To see
this and for some different criteria, define
This is possible, as 11 (1 + 1/r 2 ) < exp{j] l/r 2 } < oo. If m < I then
P(E, < I + 1/r 2 ) > 0 for all r >, 1. Hence,
m
<1 ifx<c
m 1
P(x) _ (m > 1). (14.37)
m
=1 ifx>c
m
1 < 1 1
n =1 E1...E,, n =1 m m 1
with probability 1 (if m > 1). Therefore, (14.31) implies the second relation in
n(S) 1 1 n(S) 1 1 a
r -1 (m +b )...(
i m+8 r ) > = 1 mr
-2. (14.39)
Then
r.l m
Y
5P1 1 >> 1 a1 =P
\ r =1 ElE2...Er r =1 m
r )
1 > 1 SI.
(r11r m )
Y
If (5 >0 is small enough, the last probability is smaller than P(I 1/(E, E r ) >
x/c 1), provided x/c 1 < 1/(m 1), i.e., if x < cm/(m 1). Thus for such
x one has 1 p(x) > 0, proving the first relation in (14.37).
It = limsup ,. (15.2)
t-.x
^ --, ( 15.3 )
log M log M
in the sense that the compression coefficient of a code is never smaller than
this, although there are codes whose coefficient is arbitrarily close to it.
The parameter H = p, log p ;j is referred to as the entropy of the
;
that, given the transition law of the Markov chain, the optimal compression
coefficient may easily be computed from (15.3) once the invariant initial
distribution is determined.
For a word a of length t let p,(a) = P((Xo , ... , X, _ ,) = a). Then,
as t --+ oo with probability 1 (Exercise 4); i.e., for almost all sample realizations,
for large t the probability of the sequence X0 , X,, ... , X,_, is approximately
exp{ tH}. The result (15.5) is quite remarkable. It has a natural generalization
that applies to a large class of stationary processes so long as a law of large
numbers applies (Exercise 4).
An important consequence of (15.5) that will be used below is obtained by
considering for each t the M` words of length t arranged as x ( , x ( ,., ... in order
of decreasing probability. For any positive number a < 1, let
log N,(c)
lim - - = H. (15.7)
C ^ log
Pt(Xo, ... , X' -' )
t
H < v )>
with probability at least I . Let R, denote the set consisting of all words x
of length t such that e - ` ( " + }' ) < p 1 (a) < e - ` ( " - ' ) . Fix t larger than T. Let
Sr = {a ) , a (2 ... , ; N , (e }. The sum of the probabilities of the M,(E), say, words
a of length t in R, that are counted among the N,(e) words in S, equals
Yr s, u R, p,(a) > r 5 by definition of N,(e). Therefore,
Also, none of the elements of S, has probability less than exp{ t(H + y)}, since
the set of all a with p,(a) > exp{ t(H + y)} contains R, and has total probability
larger than I > E. Therefore,
log N,(e)
<H+y.
t
On the other hand, by (15.8),
Again taking logarithms and now combining this with (15.9), we get
log N,(E)
t
Then
e`H'M
#J < M + M 2 + + MIIH'IIogM] < M(1H'jiogM) { l + 1/M + _ ,
M-1
(15.12)
since the number of code-words of length k is M k . Now observe that
tH'
t,=Ec(X1,...,X^)%lo MP{(X...,X,)eJi}
g
Therefore,
Now observe that for any positive number e < 1, for the probability
P({(X,, ... , Xr ) e J}) to exceed E requires that N,(E) be smaller than #J,. In
view of (15.12) this means that
Now by Proposition 15.1 for any given t this can hold for at most finitely many
values of t. In other words, we must have that the probability
P({ (X,, ... , X,) e Jr }) tends to 0 in the limit as t grows without bound. Therefore,
(15.14) becomes
H' H-26
(15.17)
log M log M
EXERCISES 189
for all sufficiently large t. That is, the number of (relatively) high-probability
words of length t, the sum of whose probabilities exceeds I e, is no greater
than the number M` (H +y ) / I og At of words of length t(H + y)/log M. Therefore,
there are enough distinct sequences of length t(H + y)/log M to code the
N (1 e) words "most likely to occur." For the lower-probability words, the
sum of whose probabilities does not exceed 1 (I e) = t, just code each one
as itself. To ensure uniqueness for decoding, one may put one of the previously
unused sequences of length t(H + y)/log M in front of each of the self-coded
terms. The length c(X0 , X,.... , X,_,) of code-words for such a code is then
either t(H + y)/log M or t + t(H + y)/log M, the latter occurring with
probability at most e. Therefore,
t(H + y) [ _
t(H + y)l t(H + 8)
Ec(X0 , X 1 .... , X1 _ t ) ^ ^
l + t + (15.19)
g lo g M F lo g M
EXERCISES
5. (i) Let {Xn } be a sequence of random variables with denumerable state space S.
Call {Xn } rth order Markov-dependent if
EXERCISES 191
3. (Random Walks on a Group) Let G be a finite group with group operation denoted
by. That is, G is a nonempty set and is a well-defined binary operation for G
such that (i) if x, y a G then x p+ y e G; (ii) if x, y, z e G then x (D(v z) = (v J y) z;
(iii) there is an e E G such that x (D e = e p+ x = x for all x e G; (iv) for each XE G
there is an element in G, denoted x, such that x $(x) = (_r) +Q x = e. If (1 is
commutative, i.e., x Q+ y = y O + x for all x, y E G, then G is called abelian. Let
X 1 , X 2 , ... be i.i.d. random variables taking values in G and having the common
probability distribution Q(g) = P(X = g), g e G.
(i) Show that the random walk on G defined by S = X 0 Q+ X i Q+ O+ X, ri _> 0, is
a Markov chain and calculate its transition probability matrix. Note that it is
not necessary for G to be abelian for {S} to be Markov.
(ii) (Top-In Card Shuffles) Construct a model for card shuffling as a Markov chain
on a (nonabelian) permutation group on N symbols in which the top card of
the deck is inserted at a randomly selected location in the deck at each shuffle.
(iii) Calculate the transition probability matrix for N = 3. [Hint: Shuffles are of the
form (c1, c2, c 3) s (c2. c1, c 3) or (c2, c3, C1) only.] Also see Exercise 4.5.
An individual with a highly contagious disease enters a population. During each
subsequent period, either the carrier will infect a new person or be discovered and
removed by public health officials. A carrier is discovered and removed with
probability q = I p at each unit of time. An unremoved infected individual is sure
to infect someone in each time unit. The time evolution of the number of infected
individuals in the population is assumed to be a Markov chain {X: n = 0, 1, 2, ...}.
What are its transition probabilities?
5. The price of a certain commodity varies over the values 1, 2, 3, 4, 5 units depending
on supply and demand. The price X at time n determines the demand D. at time n
through the relation D. = N X, where N is a constant larger than 5. The supply
C. at time n is given by CC = N 3 + E. where {F} is an i.i.d. sequence of equally
likely 1-valued Bernoulli random variables. Price changes are made according to
the following policy:
(i) Fix X0 = i o . Show that {X} is a Markov chain with state space S = { 1, 2, 3, 4, 5}.
(ii) Compute the transition probability matrix of {X}.
(iii) Calculate the two-step transition probabilities.
6. A reservoir has finite capacity of h units, where h is a positive integer. The daily
inputs are i.i.d. integer-valued random variables {J: n = 1, 2, ...} with the common
p.m.f. {g j = P(J = j), j = 0, 1, 2, ...}. One unit of water is released through the dam
at the end of each day provided that the reservoir is not empty or does not exceed
its capacity. If it is empty, there is no release. If it exceeds capacity, then the excess
water is released. Let X. denote the amount of water left in the reservoir on the nth
day after release of water. Compute the transition matrix for {X}.
7. Suppose that at each unit of time each particle located in a fixed region of space has
probability p, independently of the other particles present, of leaving the region. Also,
192 DISCRETE-PARAMETER MARKOV CHAINS
at each unit of time a random number of new particles having Poisson distribution
with parameter ) enter the region independently of the number of particles already
present at time n. Let X. denote the number of particles in the region at time n.
Calculate the transition matrix of the Markov chain {X"}.
8. We are given two boxes A and B containing a total of N labeled balls. A ball is
selected at random (all selections being equally likely) at time n from among the N
balls and then a box is selected at random. Box A is selected with probability p and
B with probability q = I p independently of the ball selected. The selected ball is
moved to the selected box, unless the ball is already in it. Consider the Markov
evolution of the number X. of balls in box A. Calculate its transition matrix.
9. Each cell of a certain organism contains N particles, some of which are of type A
and the others type B. The cell is said to be in state j if it contains exactly j particles
of type A. Daughter cells are formed by cell division as follows: Each particle replicates
itself and a daughter cell inherits N particles chosen at random from the 2j particles
of type A and the 2N 2j particles of type B present in the parental cell. Calculate
the transition matrix of this Markov chain.
3. Let p = ((p i; )) denote the transition matrix for the unrestricted general random walk
of Example 6.
(i) Calculate p;t interms of the increment distribution Q.
(ii) Show that p = Q*"(j i), where the n-fold convolution is defined recursively by
4. Verify each of the following for the Plya urn model in Example 8.
(i) P(X" = 1) = r/(r + b) for each n = 1, 2, 3, ... .
(ii) P(X1 = e1,...,Xn =En)= P(Xi+n=x ,...,X. +h= ),foranyh=0,1,2,...
(*iii) {X is a martingale (see Definition 13.2, Chapter I).
X}
5. Describe the motion represented by a Markov chain having transition matrix of the
following forms:
10
(i) = 0 1 ],
01
=[1 0J'
(ii)
I
5 S
(iii) p =
ss
EXERCISES 193
(iv) Use the probabilistic description to write down p" without algebraically
performing the matrix multiplications. Generalize these to m-state Markov
chains.
(Length of a Queue) Suppose that items arrive at a shop for repair on a daily basis
but that it takes one day to repair each item. New arrivals are put on a waiting list
for repair. Let A. denote the number of arrivals during the nth day. Let X. be the
length of the waiting list at the end of the nth day. Assume that A,, A,, .. . is an i.i.d.
nonnegative integer-valued sequence of random variables with a(x) = P(A = x),
x = 0, 1, 2, .... Assume that A + , is independent of X o X (n _> 0). Calculate
, ... ,
_<
7. (Pseudo Random Number Generator) The linear congruential method of generating
integer values in the range 0 to N I is to calculate h(x) = (ax + c) mod(N) for
some choice of integer coefficients 0 a, c < N and an initial seed value of x.
More generally, polynomials with integer coefficients can be used in place of ax + c.
Note that these methods cycle after N iterations.
(i) Show that the iterations may be represented by a Markov chain on a circle.
(ii) Calculate the transition probabilities in the case N = 5, a = 1, c = 2.
(iii) Calculate the transition probabilities in the case h(x) _ (x 2 + 2) mod(5).
[See D. Knuth (1981), The Art of Computer Programming, Vol. lt, 2nd ed.,
Addison-Wesley, Menlo Park, for extensive treatments.]
(A Renewal Process) A system requires a certain device for its operation that is
subject to failure. Inspections for failure are made at regular points in time, so that
an item that fails during the nth period of time between n 1 and n is replaced at
time n by a device of the same type having an independent service life. Let p denote
the probability that a device will fail during the nth period of its use. Let X. be the
age (in number of periods) of the item in use at time n. A new item is started at time
n = 0, and X. = 0 if an item has just been replaced at time n. Calculate the transition
matrix of the Markov chain {X}.
A balanced six-sided die is rolled repeatedly. Let Z denote the smallest number of
rolls for the occurrence of all six possible faces. Let Z, = 1, Z ; = smallest number of
tosses to obtain the jth new face after j 1 distinct faces have occurred. Then
Z=Z 1 ++Z 6 .
(i) Give a direct proof that Z,, ... , Z 6 are independent random variables.
(ii) Give a proof of (i) using the strong Markov property. [Hint: Define stopping
times t, denoting the first time after r j _, that X. is not among X,, .... X T
N )(
Ik
(iv) P(T > m) = k ^^ ( I)k+l(
where (i, i 1, ... , I> is the permutation in which the ith value moves to i 1,
i I to i 2, ... 2 to 1, and 1 to i. Let S o be the identity permutation and let
S = X,. . . X, where the group operation is being expressed multiplicatively. Let T
denote the first time the original bottom card arrives at the top and is inserted back
into the deck (cf. Exercise 2.3). Then
(i) T is a stopping time.
(ii) T has the additional property that P(T = k, Sk = g) does not depend on g e G.
[Hint: Show by induction on N that at time T I the (N 1)! arrangements
of the cards beneath the top card are equally likely; see Exercise 2.3(iii).]
(iii) Property (ii) is equivalent to P(Sk = g I T = k) = 1/1G 5 I; i.e., the deck is mixed
at time T. This property is referred to as the strong uniform time property by
D. Aldous and P. Diaconis (1986), "Shuffling Cards and Stopping Times, Amer.
Math. Monthly, 93, pp. 333-348, who introduced this example and approach.
EXERCISES 195
_< _<
(iv) Show that
P(ScA)=P(SEA,T_<n)+P(SeA,T>n)
0 _< _<
r
A
IGNI
I B + rP(T > n),
= I I P(T n) + P(S e A T > n)P(T > n) = Al
IGNJ
(ii) Show that {1'} is a Markov process and calculate its transition probabilities.
(iii) Extend (ii) to the case when the distribution function of Xk is continuous.
j 0 0 0 j 0 0
0 0 0; 0 0 2
I 1 1 1 1 1
6 6 6 6 6 6
0 1 0 0 0 1 0
5 0 0 0 3 00
0 0 0 6 0 0 6`
0 * 0 0 0 3 0
4. Suppose that S comprises a single essential class of aperiodic states. Show that there
is an integer v such that p$ 1 > 0 for all i, j e S by filling in the details of the following
steps.
(i) For a fixed (i, )), let B i; _ {v > 1: p;J) > 0}. Then for each state j, B is closed
under addition.
(ii) (Basic Number Theory Lemma) If B is a set of positive integers having greatest
common divisor 1 and if B is closed under addition, then there is an integer b
such that ne B for all n >, b. [Hints:
(a) Let G be the smallest additive subgroup of Z that contains B. Then argue
that G = Z since if d is the smallest positive integer in G it will follow that
if n E B, then, since n = qd + r, 0 <, r < d, one obtains r = n qd E G and
hence r = 0, i.e., d divides each n e B and thus d = 1.
(b) If leB,theneachn=l+1++Ie B. If10B,thenby(a), 1 =a
for a, E B. Check b = ( a + ) Z + 1 suffices; for if n > (a + ) 2 , then, writing
(iii) For each (i, j) there is an integer b;; such that v b implies v e B.. [Hint:
;;
Obtain b from (ii) applied to (i) and then choose k such that p;; 9 > 0. Check
that b = k + b suffices.]
;; ;;
(iv) Check that v = max {b : i, j e S} suffices for the statement of the exercise.
;;
5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states.
Decompose the essential states into their respective equivalence classes.
6. Let p be the transition matrix on S = {0, 1, 2, 3} defined below.
2 0 ' 0 '2
20 Z 0
2 0 1 0 '2
2 0 2 0
Show that S is a single class of essential states of period 2 and calculate p" for all n.
7. Use the strong Markov property to prove that if j is inessential then P,,(X,, =j for
infinitely many n) = 0.
8. Show by induction on N that all states communicate in the Top-In Card Shuffling
example of Exercises 2.3(ii) and 4.5.
EXERCISES 197
solution z = (z, z"); recall that det(B) = det(B') for any N x N matrix B.]
(ii) Show that A = 1 must be a simple eigenvalue of A (i.e., geometric multiplicity
1). [Hint: Suppose z is any (left) eigenvector corresponding to A = 1. By the
results of this section there must be an invariant distribution (positive
eigenvector) n. For t sufficiently large z + to is also positive (and normalizable).]
*8. Let A = ((a ;j )) be a N x N matrix with positive entries. Show that the spectral radius
is also given by min{.l > 0: Ax < Ax for some positive x}. [Hint: A and its transpose
A' have the same eigenvalues (why?) and therefore the same spectral radius. A' is
adjoint to A with respect to the usual (dot) inner product in the sense
(Ax, y) = (x, A'y) for all x, y, where (u, v) = Z" 1 u 1 v.. Apply the maximal property
to the spectral radius of A'.]
9. Let p(x, y) be a continuous function on [c, d] x [c, d] with c < d. Assume that
J
p(x, y) > 0 and p(x, y) dy = 1. Let S2 denote the space of all sequences
w = (x 0 x 1 ,...) of numbers x ; e [c, d]. Let .yo denote the class of all
,
finite-dimensional sets A of the form A = {co = (x 0 x,, . ..) e S2: a, <x < b ; , ,
i = 0,1, ... , n}, where c < a ; < b, < d for each i. Define P (A) for such a set A by P
Px(A) = "... P(x,Y1)p(Y1,Y2) ... p(Y "-1,Y ")dY" ... dy1 forxe[ao,bo].
J fa bl ,
us that PX has a unique extension to a probability measure defined for all events in
the smallest sigmafield .y of subsets of Q that contains FO . For any nonnegative
integrable function y with integral 1, define
Let X. denote the nth coordinate projection mapping on 52. Then {X"} is said to
be a Markov chain on the state space S = [c, d] with transition density p(x, y) and
initial density y under P. Under P. the process is said to have initial state x.
(i) Prove the Markov property for {X"}; i.e., the conditional distribution of X"+ 1
given X0 ,... , X" is p(X", y) dy.
(ii) Compute the distribution of X. under P.
(iii) Show that under PY the conditional distribution of X. given X o = x o is
p 1 " ) ( x o , y) dy, where
by breaking the integral into two terms involving y such that p(x, y) > p(z, y)
and those y such that p(x, y) -< p(z, y).
(v) Show that there is a continuous strictly positive function n(y) such that
max {Ip " (x, y) m(y)I: c -< x, y -< d} < [1 b(d c)]" 'p
( )
S"=x+X,++X" mod1,
the process is called time-reversible if it p ij = 7r j pj; for all i, j e S [it is often said
to be time-reversible (with respect to p) as well]. Show that if S is finite and p
is doubly stochastic, then the (discrete) uniform distribution makes the process
time-reversible if and only if p is symmetric.
(*ii) Suppose that {X^} is a Markov chain with invariant distribution it and started
in x. Then {X} is a stationary process and therefore has an extension backward
in time ton = 0, 1, 2..... [Use Kolmogorov's Extension Theorem.] Define
the time-reversed process by Y = X_,. Show that the reversed process {}} is
a Markov chain with 1-step transition probabilities q ;j = nj pji /n,.
(iii) Show that under the time-reversibility condition (i), the processes in (ii), { Y}
and {X}, have the same distribution; i.e., in equilibrium a movie of the evolution
looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space S with an invariant
initial distribution it is time-reversible if and only if (Kolmogorov Condition):
Pr, Pi,i 2 Piki = PukP ikik- 1 * P i,i for all i, i 1 , .... i k E S, k >_ 1.
(v) If there is a j e S such that p ij > 0 for all i 0 j in (iv), then for time-reversibility
it is both necessary and sufficient that p ij Pik Pki = Pik Pkj Pji for all i, j, k.
5. (Random Walk on a Tree) A tree graph on n vertices v 1 , v 2 , ... , v, is a connected
graph that contains no cycles. [That is, there is given a collection of unordered pairs
of distinct vertices (called edges) with the following property: Any two distinct vertices
u, v e S are uniquely connected in the sense that there is a unique sequence
e l , e 2 .... , e of edges e ; _ {v k; , v,,} such that u e e, v e e,,, e i n e i ,, ^ 0,
i = 1, ... , n 1.] The degree v i of the vertex v ; represents the number
of vertices adjacent to v i , where u, v E S are called adjacent if there is an edge {u, v}.
By a tree random walk on a given tree graph we mean a Markov chain on the state
space S = {v1, v2.... , v r } that at each time step n changes its state v ; to one of its v i
randomly selected adjacent states, with equal probabilities and independently of its
states prior to time n.
(i) Explain that such a Markov chain must have a unique invariant distribution.
(ii) Calculate the invariant distribution in terms of the vertex degrees v i , i = 1, ... , r.
(iii) Show that the invariant distribution makes the tree random walk time-reversible.
6. Let {X} be a Markov chain on S and define Y. = (X, X n = 0, 1, 2, ... .
(i) Show that {Y} is a Markov chain on S' = {(i, j) E S x S: p i , > 0).
(ii) Show that if {X} is irreducible and aperiodic then so is {Y}.
(iii) Show that if {X} has invariant distribution it = (rz ; ) then {}} has invariant
distribution (n ; p ; ,).
7. Let {X} be an irreducible Markov chain on a finite state space S. Define a graph G
having states of S as vertices with edges joining i and j if and only if either p, j > 0
or pj; > 0.
(i) Show that G is connected; i.e., for any two sites i and j there is a path of edges
from i to j.
(ii) Show that if {X} has an invariant distribution it then for any A c S,
(i.e., the net probability flux across a cut of S into complementary subsets A, S\A
is in balance).
(iii) Show that if G contains no cycles (i.e., is a tree graph in the sense of Exercise
5). then the process is time-reversible started with n.
P(Z>r)= P(Z=n).
.=o r=on=r+t
as n oc. [Hint: Represent N(j) as a sum of indicator variables and use (8.11).]
4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent.
5. Classify the states for {R} = {IS,,}, where S. is the simple symmetric random walk
starting at 0 (see Exercise 1.8).
6. Show that inessential states are transient.
7. (A Birth or Collapse Model) Let
1 i
-- '
Pi.^+I =i+1 Pi.o i =0, 1,2,....
=i+ 1'
I i
A.o = Pi.;+I = i + 1' >_ I ' Po.i 1.
i+ l ' i
(i) Prove (see (5.5) of Chapter I) that r = J]ni=1 f,,r_ m (n _> 1).
(ii) Sum (i) over n to give an alternative proof of (8.11).
(iii) Use (i) to indicate how one may compute the distribution of the first visit to
state j (after time zero), starting in state i, in terms of p;PP (n _> 1).
202 DISCRETE-PARAMETER MARKOV CHAINS
11. (i) Show that if II II and II 110 are any two norms on t, then there are positive
constants c,, c 2 such that
(ii) Show that the stability condition given in Example 1 implies that X. . 0 in
every norm.
n
lim = E(T; 2) t) =
Jim ' = o0
n w Nn
distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite
state space.]
EXERCISES 203
k x
iff pj < b .
k=1j=1
8. Calculate the invariant distribution for the Renewal Model of Exercise 3.8, in the
case that p"=pn -1 (1 p),n=0, 1,2,. . . where 0 <p < 1.
9. (One-Dimensional Nearest-Neighbor Ising Model) The one-dimensional nearest-
neighbor Ising model of magnetism consists of a random distribution of + I-valued
random variables (spins) at the sites of the integers n = 0, 1, 2, .... The
parameters of the model are the inverse temperature /i = - 1 - > 0 where T is
kT
temperature and k is a universal constant called Boltzmann s constant, an external
field parameter H, and an interaction parameter (coupling constant) J. The spin
variables X", n = 0, + 1, 2, +3, ... , are distributed according to a stochastic
process on { 1,1 } indexed by Z with the Markov property and having stationary
transition law given by
exp{Jai + Hn}
p(X" + ' =nI X" =Q)= 2cosh(H+Ja)
(iv) Determine when the process (in equilibrium) is reversible for the invariant
distribution; modify Exercise 7.4 accordingly.
10. Show that if {X} is an irreducible positive-recurrent Markov chain then the
condition (iv) of Exercise 7.4 is necessary and sufficient for time-reversibility of the
stationary process started with distribution it.
*11. An invariant measure for a transition matrix ((p ;j )) is a sequence of nonnegative
numbers (m ; ) such that m ; p ;J = m j for all j ES. An invariant measure may or
may not be normalizable to a probability distribution on S.
(i) Let p ; _ ;+ , = p ; and p ; , o = I p ; for i = 0, 1, 2, .... Show that there is a unique
invariant measure (up to multiples) if and only if tim,,. fjk=, Pk = 0; i.e., if
and only if the chain is recurrent, since the product is the probability of no
return to the origin.
(ii) Show that invariant measures exist for the unrestricted simple random walk
but are not unique in the transient case, and is unique (up to multiples) in
the (null) recurrent case.
(iii) Let Poo = Poi = i and p i , ; _, = p i , ; = 2 - ' -Z , and p ;.;+ , = I 2 i = 1, 2, 3,
.... Show that the probability of not returning to 0 is positive (i.e., transience),
but that there is a unique invariant measure.
12. Let { Y} be any sequence of random variables having finite second moments and
let y, = Cov(Y, Y,), = EY, o = Var(Y) = y, and p
(i) Verify that 1 <_ p _< 1 for all n and m. [Hint: Use the Schwarz Inequality.]
(ii) Show that if p . , _< f(In ml), where f is a nonnegative function such that
n_ 2 Yk=f(k)Y n =1 Qk -*0 as n -* oo, then the WLLN holds for {}}.
(iii) Verify that if = p o ^_^ = p(ln ml) > 0, then it is sufficient that
p(k) -*0 as n -+ oc for the WLLN.
(iv) Show that in the case of nonpositive correlations it is sufficient that
n -1 Ik= 1 ak -+0 as n , oo for the WLLN.
13. Let p be the transition probability matrix for the asymmetric random walk on
S = {0, 1, 2, ...} with 0 absorbing and p i , ;+ , = p > Z for i _> 1. Explain why for
fixed i > 0,
1 ^
j e S,
n,-,
does not converge to the invariant distribution S o ({j}) (as n -- co). How can this
be modified to get convergence?
14. (Iterated Averaging)
(i) Let a,, a 2 , a 3 be three numbers. Define a 4 = (a, + a 2 + a 3 )/3, a 5 =
(a 2 + a 3 + a 4 )/3,.... Show that lim,, a = (a, + 2a 2 + 3a 3 )/6.
(ii) Let p be an irreducible positive recurrent transition law and let a,, a 2 , ... be
any bounded sequence of numbers. Show that
lim Pi! aj _
) ajnt,
where (it) is the invariant distribution of p. Show that the result of (i) is a
special case.
EXERCISES 205
1. (i) Let Y1 , Y,, ... be i.i.d. with EY < co. Show that max(Y 1 , ... , Y")/,/n 0 a.s.
as n * . [Hint: Show that P(Y.2 > ne i.o.) = 0 for every e > 0.]
(ii) Verify that n '/ZS" has the same limiting distribution as (10.6).
-
2. Let {W"(t): t >_ 0} be the path process defined in (10.7). Let t I < t 2 < < t k , k >_ 1,
be an arbitrary finite set of time points. Show that (W"(t 1 ), ... , W"(t k )) converges in
distribution as n ^ x to the multivariate Gaussian distribution with mean zero and
variancecovariance matrix ((D min{t i , t i })), where D is defined by (10.12).
3. Suppose that {X"} is Markov chain with state space S = { 1, 2, . .. , r} having unique
invariant distribution (it ; ). Let
Show that
`(N"(1) Na(r)
n 7C 1 ...
n n
4. For the one-dimensional nearest-neighbor Ising model of Exercise 9.9 calculate the
following:
(i) The pair correlations p,,,, = Cov(X", X,").
(ii) The large-scale variance (magnetic susceptibility) parameter Var(X 0 ).
(iii) Describe the distribution of the fluctuations in the (bulk limit) magnetization
(cf. Exercise 9.9(i)).
5. Let {X"} be a Markov chain on S and define Y = (X", Xri+ 1 ), n = 0, 1, 2..... Let
p = ((p, j )) be the transition matrix for {X"}.
(i) Show that { Y"} is a Markov chain on the state space defined by
S'_ {(i,j)eS x S:p ij >0}.
(ii) Show that if {XX } is irreducible and aperiodic then so is { }"}.
(iii) Suppose that {X"} has invariant distribution it = (n i ). Calculate the invariant
distribution of { },}.
(iv) Let (i, j) e S' and let T be the number of one-step transitions from i to j by
X0 , X"... , X" started with the invariant distribution of (iii). Calculate
lim". x (T"/n) and describe the fluctuations about the limit for large n.
6. (Large-Sample Consistency in Statistical Parameter Estimation) Let XX = I or 0
according to whether the nth day at a specified location is wet (rain) or dry. Assume
{X"} is a two-state Markov chain with parameters = P(X" +1 = II X" = 0) and
S=P(X 1 =0^X"=1),n=0,1,2,...,0<f<1,0<S<I.Suppose that {X}
is in equilibrium with the invariant initial distribution it = (n 1 , n o ). Define statistics
based on the sample X 0 , X 1 , ... , X" to estimate /3, it , respectively, by t" = S/(n + 1)
206 DISCRETE-PARAMETER MARKOV CHAINS
7. Use the result of Exercise 1.5 to describe an extension of the SLLN and the CLT to
certain rth-order dependent Markov chains.
1. Let {X"} be a two-state Markov chain on S = {O, 1 } and let T o be the first time
{X"} reaches 0. Calculate P l (z o = n), n _> 1, in terms of the parameters p, o and Poi
2. Let {X"} be a three-state Markov chain on S = {0, 1, 2} where 0, 1, 2 are arranged
counterclockwise on a circle, and at each time a transition occurs one unit clockwise
with probability p or one unit counterclockwise with probability 1 p. Let t o denote
the time of the first return to 0. Calculate P(-r o > n), n > 1.
3. Let T o denote the first time starting in state 2 that the Markov chain in Exercise
5.6 reaches state 0. Calculate P 2 (r o > n).
4. Verify that the Markov chains starting at i having transition probabilities p and p,
and viewed up to time T A have the same distribution by calculating the probabilities
of the event {X0 = i, X, = i ...... Xm = m , T A =m} under each of p and p.
5. Write out a detailed explanation of (11.22).
6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results
on the long-term behavior of transition probabilities.
7. (Collocation) Show that there is a unique polynomial p(x) of degree k that takes
prescribed (given) values v o , v 1 ..... v k at any prescribed (given) distinct points
x 0 , x,, ... , x k , respectively; such a polynomial is called a collocation polynomial.
[Hint: Write down a linear system with the coefficients a o , a,, ... , a k of p(x) as the
unknowns. To show the system is nonsingular, view the determinant as a polynomial
and identify all of its zeros.]
*8. (Absorption Rates and the Spectral Radius) Let p be a transition probability matrix
for a finite-state Markov chain and let r ; be the time of the first visit to j. Use (11.4)
and the results of the PerronFrobenius Theorem 6.4 and its corollary to show that
exponential rates of convergence (as obtained in (11.43)) can be anticipated more
generally.
ifi>0,j=0,1,2,...,i-1
ifi<0,j=0,-1,-2,..., i +I
1 ifi=0,j=0
0 ifi=0,j960.
(ii) Show that the mean time to absorption starting at i > 0 is given by =, (i /k).
10. Let {X} be the simple branching process on S = {0, 1, 2, ...} with offspring
distribution { fj }, f j a jfj 1.
(i) Show that all nonzero states in S are transient and that lim^ P 1 (X = k) = 0,
k=1,2,....
(ii) Describe the unique invariant probability distribution for {X}.
II. (i) Suppose that in a certain society each parent has exactly two children, and
both males and females are equally likely to occur. Show that passage of the
family surname to descendants of males eventually stops.
(ii) Calculate the extinction probability for the male lineage as in (i) if each parent
has exactly three children.
(iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939),
"Theorie Analytique des Associations Biologiques II, Actualites Scientifiques
et Industrielles, (N.780), Hermann et Cie, Paris, used data for white males in
the United States in 1920 to estimate the probability function f for the
number of male children of a white male. He estimated f(0) = 0.4825,
f(j)= (0.2126)(0.5893)' ' (j = 1,2,...).
-
12. Let f be the offspring distribution function for a simple branching process having
finite second moment. Let p = > k kf(k), v 2 = E k (k p) 2 f(k). Show that, given
Xo = 1,
1 )/(u 1) if it # I
Var X. =
g if p = 1.
13. Each of the following distributions below depends on a single parameter. Construct
graphs of the nonextinction probability and the expected sizes of the successive
generations as a function of the parameter.
p ifj= 2
(i) f(j)= q ifj=0
0 otherwise;
(iii) f(j)=^ i j
-
9,(a) _
_ q(b, a)q(a, c)
a, b, cc S. [Hint: Y_ g(a) = 1, b, c e S.]
9,(0) q(b, O)q(O, c)' a
(ii) Use (12.11), (12.12) to show that this condition can be expressed as
and, therefore,
[a, , a, b] _ g()
[a, /3', a, b] g,,,(')
[a, , a, b] g (a)
[a, , a', b] g.b(a')
Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii).
2. (i) Verify (12.13) for the case n = 1, r = 2. [Hints: Without loss of generality, take
h = 0, and note,
[x, , y b]
,
[a, u, v, b]
for all x, and x * p(x, dy) is weakly continuous (i.e., f s f (y) p(x, dy) is a continuous
function of x for every bounded continuous function f on S), then n is the unique
invariant probability for p(x, dy), i.e., j 13 p(x, B)rc(dx) = n(B) for all Be .V V. [Hint:
Let f be bounded and continuous. Then
1lr(dz). ]
J
(iii) Extend (i) and (ii) to arbitrary metric space S, and note that it suffices to require
convergence of n - ' Jm- If (Y)pl'")(x, dy) to If (y)ir(y) dy for all bounded
continuous f on S.
2. (i) Let B 1 , B 2 be m x m matrices (with real or complex coefficients). Define IIBII as
in (13.13), with the supremum over unit vectors in Il'" or C'. Show that
(iii) If B is an m x m matrix and IIBII is defined to be the supremum over unit vectors
in C'", show that IIB"II >, r"(B). Use this together with (13.17) to prove that
limllB"II" exists and equals r(B). [Hint: Let A. be an eigenvalue such that
12,1 = r(B). Then there exists x e C', (1x11 = 1, such that Bx = Ax.]
_<
3. Suppose E I is a random vector with values in 1.
(i) Prove that if b > 1 and c > 0, then
log c
P(Ie 1 I > cb") EEIZI 1, where Z = logI ,I
" =1 log (5
_< _<
n =1 n=1
IIS"B"II _<
(ii) Show that if (13.15) holds then (13.16) converges. [Hint:
(iii) Show that (13.15) holds, if it holds for some S < 1 /r(B). [Hint: Use the Lemma.]
r n o }. ]
4. Suppose Y a"z" and 1] b "z" are absolutely convergent and are equal for Izl < r, where
r is some positive number. Show that a n = bn for all n. [Hint: Within its radius of
EXERCISES 211
(i) Prove that p " (x, dz) converges weakly to n(dz). [Hint: p '` (x, J) . I as k --i c0,
( ) ( )
(k+r) .) (k)
f f(y)p (x dy) = f ($f(z)p^ (y, dz))p (x dy).]
(ii) Assume the hypothesis above for all x e J (with J not depending on x). Prove
that it is the unique invariant probability.
6. In Example 2, ifs,, are i.i.d. uniform on [ 1,1], prove that {X 2n (x): n > 1} is i.i.d.
with common p.d.f. given by (14.27) if XE [-2, 2].
7. In Example 2, modify f as follows. Let 0 < <. Define fo (x): _f(x) for
2 _< x < 6, and 6 _< x _< 2, and linearly interpolate between (,), so that f,
is continuous.
(i) Show that, for x e [6, 1] (or, x c [-1, b]) {X"(x): n >, 1} is i.i.d. with common
distribution it (or, nx+2).
(ii) For x e (1, 2] (or, [-2, 1)) {X"(x): n >_ 2} is i.i.d. with common distribution
ix (or, ix +2).
(iii) For x e (-8, 6) {X"(x): n >_ 1} is i.i.d. with common distribution it_ X+ ,.
8. In Example 3, assume P(e 1 = 0) > 0 and prove that P(e,, = 0 for some n >_ 0) = 1.
9. In Example 3, suppose E log s > 0. ;
(i) Prove that E, , {1/(E 1 ...e" )} converges a.s. to a (finite) nonnegative random
variable Z.
(ii) Let d, := inf{z > 0: P(Z < z) > 0}, d 2 == sup{z >- 0: P(Z >- z) > 0}. Show that
=0 if x < c(d, + l ),
p(x) e (0, 1) if c(d, + l) < x < c(d 2 + 1),
=1 if x > c(d 2 + 1).
_ 1
(a) d, = > m- " if m > 1, = oo if m 5 1 , and
n =, mI
EXERCISES 213
Intuitively, condition (d) says that the total uncertainty in the joint occurrence of
two independent events is the cumulative uncertainty for each of the events. Verify
that h must be of the form h(p) = c log 2 p where c = h(21) > h(1) = 0 is a positive
(f)
constant. Standardizing, one may take
h(p) = log2 p.
2. Let f= be a probability distribution on S = { 1, 2, . .. , M}. Define the entropy
in f by the "average uncertainty," i.e.,
Y_
(i) Show that H(f) is maximized by the uniform distribution on S.
(ii) If g = (g ; ) is another probability distribution on S then
H(f) Ji 1og 2 gi
i
<_
3. Suppose that X is a random variable taking values a 1 ..... a M with respective
probabilities p(a, ), ... , p(a M ). Consider an arbitrary binary coding of the respective
symbols a,, I < i z M, by a string cb(a ; ) = ( rI', ... , sl,'t) of 0's and l's, such that no
string (eilt, ... , F 11 ) can be obtained from a shorter code (c,, ... , e 1 n ; n j , by
adding more terms; such codes will be called admissible. The number n ; of bits is
called the length of the code-word 4)(a 1 ).
(i) Show that an admissible binary code 0 having respective lengths n ; exists if and
only if
M
I.
i =1
(ii) (Noiseless Coding Theorem) For any admissible binary code 0 of a,..... a M
having respective lengths n 1 ..... M' the average length of code-words cannot
=l
n.p(a,) % Z
be made smaller than the entropy of the distribution p of a,, ... , a,, i.e.,
i =1
p(a) log2 p(ai)
H(p) Y
[Hint: Use Exercise 2() with f = p(a ; ), g ; = 2 - " to show
i =1
n1p(ai) + log 2 (
k =1
2 - "").
214 DISCRETE-PARAMETER MARKOV CHAINS
M
H(p) n1 P(ai) 5 H(p) + 1.
log t p(a 1 )
(iv) Verify that there is not a more efficient (admissible) encoding (i.e. minimal
average number of bits) of the symbols a,, a 2 , a 3 for the distribution p(al) = z,
P(a2) = p(a 3 ) = , than the code 0(a 1 ) = ( 0), 0(a 2 ) = ( 1, 0), (a 3 ) = ( 1, 1)
4. (i) Show that Y1 , Y2 ,in (15.4) satisfies the law of large numbers.
...
(ii) Show that for (15.5) to hold it is sufficient that Yl , YZ satisfy the law of large
, ...
numbers.
THEORETICAL COMPLEMENTS
Px(A)
=L ...
9,
P(x,Y1)P(Y1,Y2)...P(yn_1,Y(dyn)...u(dy1) (T.6.1)
Let X. denote the nth coordinate projection mapping on 12. Then {X} is said to be
a Markov chain on the state space S with transition density p(x, y) with respect to p
and initial distribution, y under P. The results of Exercise 6.9 can be extended to this
THEORETICAL COMPLEMENTS 215
setting as follows. Suppose that there is a positive integer r and a p-integrable function
p on S such that fs p(x)p(dx) > 0 and p ' (x, y) > p(y) for all x, y in S. Then there is
( )
M(B) := uES ^
JB
Then
2 , s
pick+ n.>
(x, y)p(dy) p k+ '(z, y)p(dy)1 ( 1 e)[Mk.(B) mk.(B)]
(ii) Ifil
e' - x ify<x
P(x, Y) _
,r,(1_y)m
P(L m) =
fo
P(L m I Y = Y)P(Y E dY) =
0
( m 1)! J P(Y E dY),
and therefore,
ri
y
P(L >, m):= lim P(L >_ m) = J ^ 1
0 )1 2(1 y) dy.
--W o (m 1)!
THEORETICAL COMPLEMENTS 217
i
lim p(Jd) = pijd(E;i^ii)-
n-^m
To obtain these from the general renewal theory described below, take as the
delay Zo the time to reach j for the first time starting at i. The durations of the
subsequent replacements Z 1 , Z 2 , ... represent the lengths of times between returns
to I.
F(x + a) F(a)
G(x) _
1 F(a)
Let
S. = Zo + Z, + + Zn , n ^> 0, (T.9.1)
and let
We will use the notation S.0, N sometimes to identify cases when Z o = 0 a.s. Then
Sn is the time of the (n + 1)st renewal and N, counts the number of renewals up to
218 DISCRETE-PARAMETER MARKOV CHAINS
and including time t. In the case that Z o = 0 a.s., the stochastic (counting) process
{N} - {N} is called the (ordinary) renewal process. Otherwise {N,} is called a delayed
renewal process.
For simplicity, first restrict attention to the case of the ordinary renewal process.
Let u = EZ l < oo. Then 1/ is called the renewal rate. The interpretation as an
average rate of renewals is reasonable since
Sx -i \ t \
(T.9.3)
N, N, N,
>_
N, -. 1
a.s. as t - oc. (T.9.4)
t p
Since N, is a stopping time for {Sn } ( for fixed t 0), it follows from Wald's Equation
(Chapter I, Corollary 13.2) that
ESN , = pEN,. (T.9.5)
EN,1
-
as t --* Cc. (T.9.6)
t
To deduce this from the above, simply observe that pEN, = ESN , > t and therefore
EN,1
/
liminf -
- . t p
On the other hand, assuming first that Z. < C a.s. for each n 1, where C is a >,
positive constant, gives pEN, < t + C and therefore for this case, limsup,.,(EN,/t)
I/p. More generally, since truncations of the Z. at the level C would at most decrease
the S, and therefore at most increase N, and EN this last inequality applied to the
truncated process yields
EN,
<- 1
fc
limsup as C -4 co. (T.9.7)
t CP(Z 1 >, C) + xF(dx) p
The above limits (T.9.4) and (T.9.6) also hold in the case that p = co, under the
convention that 1 /0o is 0, by the SLLN and (T.9.7). Moreover, these asymptotics
can now be applied to the delayed renewal process to get precisely the same conclusions
for any given (initial) distribution G of delay.
With the special choice of G = F. defined by
x
F(x) =1 I P(Z, > u)du, x 0, (T.9.8)
u j0
THEORETICAL COMPLEMENTS 219
the corresponding delayed renewal process N, called the equilibrium renewal process,
has the property that
where
NI(t,t+h]=N+hN,
N. (T.9.10)
To prove this, define the renewal function m(t) = EN t >- 0, for {N1 }. Then for the
general (delayed) process we have
Observe that g(t) = t/p, t -> 0, solves the renewal equation (T.9.11) with G = Fx,;
i.e.,
t
=
u o
1`
(1 F(u)) du +
o
t
F(u) du = F(t) + --
N o 0
F(ds) du JJ
= F^(t) + 1
u J J r duo
' s f EP
F(ds) = F (t) + t s F(ds).
x
.'
(T.9.13)
To finish the proof of (T.9.9), observe that g(t) = m`(t):= EN;, t >- 0, uniquely
solves (T.9.11), with G = F., among functions that are bounded on finite intervals.
For if r(t) is another such function, then by iterating we have
= F(t) + f { F^(t u) +
.'
(
J t
o
u r(t u s)F(ds)jF(du)
(T.9.14)
Thus,
r(t) _ P(Nj 3 n) = m W (t)
n =i
since
P(S, -< t) = P(N >_ n) -+ 0 as n -- oo since P(N >, n) = EN < cc.
n =i
Let d be a positive real number and let L d = {0, d, 2d, 3d, ...}. The common
distribution F of the durations Z 1 , Z 2 , ... is said to be a lattice distribution if there
is a number d > 0 such that P(Z 1 e L a ) = 1. The largest such d is called the period of F.
where
Nc") _ (T.9.17)
lsk=nd)
k=0
Note that assuming that the limit exists for each h > 0, the value of the limit in
(i), likewise (ii), of Blackwell's theorem can easily be identified from the elementary
renewal theorem (T.9.6) by noting that cp(h):= lim^.. m EN(t, t + h] must then be
linear in h, p(0) = 0, and
EN" 1
_ (im = -.
n -.. n
THEORETICAL COMPLEMENTS 221
Proof. To make the coupling idea precise for the case of Blackwell's theorem with
< oc, let {Z8 : n >- 1} and {Z: n >- l} denote two independent sequences of renewal
lifetime random variables with common distribution F, and let Z o and Z o be
independent delays for the two sequences having distributions G and G = F es ,
respectively. The tilde () will be used in reference to quantities associated with the
latter (equilibrium) process. Let a > 0 and define,
Suppose we have established that (e-recurrence) P(v(e) < oo) = I (i.e., the coupling will
occur). Since the event {v(e) = n, v(e) = n} is determined by 4, Z 1 , ... , Z and
Z;,, the sequence of lifetimes {Z' +k : k >- l} may be replaced by the
sequence {Z, +k : k >- 1} without changing the distributions of {S}, {N}, etc. Then,
after such a modification for < h/2, observe with the aid of a simple figure that
(T.9.20)
Therefore,
N(t+e,t+hs]N(t,t+h]1 {s , ( , > , t
N(t + s, t + h E] 1 ^ s ,, E
N(t, t + h]I (s , (,) ,, I
N(t, t + h]1 Is , II + N(t, t + h]l {s )(= N(t, t + h])
Taking expected values and noting the first, fifth, and seventh lines, we have the
following coupling inequality,
Using (T.9.9),
h+2e
EN(t+s, t+h e]= h - 2s and EN(t E, t+h+e]=
p
Therefore,
E(N(t, t + h]1 (s ,^ >,^) < E o Nh P(SVO > t) = m(h)P(S V(C) > t), (T.9.24)
where E o denotes expected value for the process N h under zero-delay. More precisely,
because (t, t + h] c (t, SN , + h] and there are no renewals in (t, SN ,), we have
N(t, t + h] < inf{k 3 0: ZN, +k > h). In particular, noting (T.9.2), this upper bound
by an ordinary (zero-delay) renewal process with renewal distribution F, is
independent of the event A, and furnishes the desired estimate (T.9.24).
Now from (T.9.23) and (T.9.24) we have the estimate
which is enough, since e> 0 is arbitrary, provided that the initial e-recurrence
assumption, P(v(e) < oo) = 1, can be established. So, the bulk of the proof rests on
showing that the coupling will eventually occur. The probability P(v(c) < cc) can be
analyzed separately for each of the two cases (i) and (ii) of Theorem
T.9.1.
First take the lattice case (ii) with lattice spacing (period) d. Note that for e < d,
For case (i), observe by the HewittSavage zeroone law (theoretical complement
1.2 of Chapter I) applied to the i.i.d. sequence (Z 1 , Z 1 ), (Z 2 , Z 2 ), (Z 3 , Z 3 ), ... , that
where R"=
min{ S,fS":5;,S">0,n_>0}=SS,^S "
Now, the distribution of {SR, +; t} ; does not depend on t (Exercise 7.5, Chapter
THEORETICAL COMPLEMENTS 223
->
IV). This, independence of {Z' j } and {S}, and the fact that {Sk + Sk : n 0} does
not depend on k, make {R,, +k } also have distribution independent of k. Therefore,
the probability P(R, < e for some n > k), does not depend on k, and thus
implies P(R < r i.o.) = P(R < e for some n >, 0) -< P(v(e) < oo). Now,
The proof that P(R <e for some n z) > 0 (and therefore is 1) in (T.9.29)
follows from a final technical lemma given below on "points of increase" of distribution
functions of sums of i.i.d. nonlattice positive random variables; a point x is called a
point of increase of a distribution function F if F(b) F(a) > 0 whenever a < x < b.
Lemma. Let F be a nonlattice distribution function on (0, co). The set E of points
of increase of the functions F, F* z , F* 3 , . . is "asymptotically dense at co" in the
sense that for any t > 0 and x sufficiently large, E n (x, x + e) 96 0, i.e., the interval
(x, x + e) meets E for x sufficiently large. 0
Proof. Let a, b E E, 0 <a < b, such that b a < e. Let 1 = (na, nb]. For
a < n(b a), the interval 1 properly contains (na, (n + 1)a), and therefore each
x > a 2 /(b a) belongs to some I., n >, 1. Since E is easily checked to be closed under
addition, the n + 1 points na + k(b a), k = 0,1, ... , n, belong to E and partition
I. into n subintervals of length b a < r. Thus each x > a 2 /(b a) is at a distance
(b a)/2 </2 of E. If for some r > 0, b a -> e for all a, b e E then F must be
a lattice distribution. To see this say, without loss of generality, E -< b a < 2a for
somea,beE.ThenEnl c {na +k(ba):k= 0,1,...,n}.Since(n+l)aeEnl
for a < n(b a), E n I must consist of multiples of (b a). Thus, if c e E then
c + k(b a) e / n E for n sufficiently large. Thus c is a multiple of (b a). n
Coupling approaches to the renewal theorem on which the preceding is based can
be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of
Blackwell's Renewal Theorem," Stoch. Proc. App!., 26, pp. 87-97; K. Athreya, D.
McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math.
Monthly, 851, pp. 809-814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's
Renewal Theorem," Ann. Probab., 5, pp. 482-485.
3. (Birkhoff's Ergodic Theorem) Suppose {X: n -> 0} is a stochastic process on
(S2, .F, P) with values in (S, ,V). The process {X} is (strictly) stationary if for every
pair of integers m >- 0, r >- 1, the distribution of (X 0 , X 1 , ... , X,,) is the same as that
224 DISCRETE-PARAMETER MARKOV CHAINS
>,
of (X X l+ ... , X. + ,). An equivalent definition is: {X} is stationary if the distribu-
tion , say, of X :_ (X0 , X 1 , X2 , ...) is the same as that of T'X :_ (X,, X l + X2 + ...)
for all r 0. Recall that the distribution of (X X, + ...) is the probability measure
and
Definition T.9.1. The process {X: n > 0} and the shift transformation T are said
to be ergodic if S is trivial.
0) be a stationary
sequence on the state space S (having sigmafield .9'). Let f(X) be a real-valued
1-measurable function such that Elf(X)( < oc. Then
(a) n - ' Y_;=
f(TX) converges a.s. and in L' to an invariant random variable g(X),
and
(b) g(X) = Ef(X) a.s. if S is trivial.
(T.9.30)
Proof. Note that f(X)+M(foT)=M^ +1 (f) on the set {MM+ ,(f)>0}. Since
M, + , (f) >, M(f) and {M(f) > 0} c {M + , (f) > 0}, it follows that f (X)
M(f) M(f o T) on {M(f) > 0}. Also, M (f) -> 0, M^(f o T) -> 0. Therefore,
M
f(X)dP> - (M^(f)M^(fT))dP
J(
M(f)>OV G {M(f)>OInG
J J
= M(f) dP M(f o T) dP
O
J G
J
M^(f) dP M(f o T) dP
G
= 0,
where the last equality follows from the invariance of G and the stationarity of
Thus, (T.9.31) holds with {M(f) > 0} in place of {M(f) > 0}. Now let n j cc.
1 n- 1
f(A(fc)n G
f(X) dP -> cP({A(f ) > c} n G) VG E 5. (T.9.32)
226 DISCRETE-PARAMETER MARKOV CHAINS
f (M(f-^0)nG
f(X) dP -> cP({M(f - c) > 0} n G).
But {M(f - c) > 0} = {A(f - c) > 0} = {A(f) > c}, and {M(f - c) > 0} _
{A(f)>c}. n
I - ' I^-'
7(X):= i-- > f(T'X), f(X):= lim - Y f(T'X),
p_Qn.=o n,n.=o (T.9.33)
fG"d(f)
f(X)dP _> dP(GC , d (f )),
i.e.,
Now if c > d, then (T.9.34) and (T.9.35) cannot both be true unless P(G,, d (f )) = 0.
Thus, if c > d, then P(GC , d (f )) = 0. Apply this to all pairs of rationals c > d to get
P(f (X) > f(X)) = 0. In other words, (1/n) y;= f (T'X) converges a.s. to
g(X) := f(X).
To complete the proof of part (a), it is enough to assume f >, 0, since
n - ' j -1 f + (T'X) -- 7(X) a.s. and n - ' 2]o - 'f(TX) - r(X) a.s., where
f + = max{ f, 0}, - f - = min{ f, 0}. Assume then f > 0. First, by Fatou's Lemma
and stationarity of {X},
-'
1
E7(X) < lim E - f(T'(X)) = Ef(X) < oo.
n-ao n r=o
nonnegative and integrable, given s > 0 there exists a constant N E such that
S e + N,Ef(X)/.l. (T.9.36)
It follows that the left side of (T.9.36) goes to zero as .l --p oo, uniformly for all n.
Part (b) is an immediate consequence of part (a).
Notice that part (a) of Theorem T.9.2 also implies that g(X) = E(f (X) J).
Theorem T.9.2 is generally stated for any transformation T on a probability space
(S2, ^, p) satisfying p(T 'G) = p(G) for all G e ^. Such a transformation is called
-
X= + +Z
Q^
Since Z i , . are i.i.d. with finite second moment, the FCLT of Chapter I provides
that {X;"} converges in distribution to standard Brownian motion. The corresponding
result for {W"(t)} follows by an application of the Maximal Inequality to show
There are specifications of local structure that are defined in a natural manner but
for which there are no Gibbs states having the given structure when, for example,
A = Z, but S is not finite. As an example, one can take q to be the transition matrix
of a (general) random walk on S = Z such that q = q11-1 > 0 for all i, j. In this case
;;
of zero it-measure. Now the invariance of n' implies f (1/n) p(')(x; A)zr'(dx) = rz'(A)
for all n. Therefore, n'(A) = rz(A). Thus it' = it, completing the proof.
As a very special case, the following strong law of large numbers (SLLN) for
Markov processes on general state spaces is obtained: If p(x; dy) admits a unique
invariant probability rr, and {X": n -> 0} is a Markov process with transition probability
230 DISCRETE-PARAMETER MARKOV CHAINS
p and initial distribution n, then (1/n) j ' f (X,) converges to f f (x)n(dx) a.s. provided
-
f
that If (x)I i(dx) < co. This also implies, by conditioning on X 0 , that this almost
sure convergence holds under all initial states x outside a set of zero it-measure.
5. (Ergodic Decomposition of a Compact State Space) Suppose S is a compact metric
space and S = .l(S) its Borel sigmafield. Let p(x; dy) be a transition probability on
(S, s(S)) having the Feller property: x -* p(x; dy) is weakly continuous on S into
p(S)the set of all probability measures on (S,.R(S)). Let T* denote the map on
9(S) into a(S) defined by: (T*)(B) = $ p(x; B)p(dx) (Be f(S)). Then T* is weakly
continuous. For if probability measures ^ converge weakly to p then, for every
real-valued bounded continuous f on S, J f d(T*p") = f (f f (y)p(x; dy))p ^ (dx)
f
converges to ($ f(y)p(x; dy))p(dx) =If d(T*p), since x -a f f(y)p(x;dy) is continuous
by the Feller property of p.
Let us show that under the above hypothesis there exists at least one invariant
_>
probability for p. Fix p e P1(S). Consider the sequence of probability measures
1^ '
-
where
J f d^ . - J f d(T *u )I =4 f
nJ
f dy - f f d(T*" p)) -< (sup{f(x)I: x e S })(2/n') -+ 0,
as n' - oo. Therefore, {p} and {T*p".} converge to the same limit. In other words,
it = T*n, or it is invariant. This also shows that on a compact metric space, and with
p having the Feller property, if there exists a unique invariant probability it then
T*p 1_ (1/n) T*'p converges weakly to n, no matter what (the initial
distribution) p is.
Next, consider the set .f = .alp of all invariant probabilities for p. This is a convex
and (weakly) compact subset of P1(S). Convexity is obvious. Weak compactness follows
from the facts (i) q(S) is weakly compact (by Prohorov's Theorem), and (ii) T* is
continuous for the weak topology on 9(S). For, if u ^ e .elf and ^ converges weakly
to p, then ^ = T*" converges weakly to T*p. Therefore, T*p = p. Also, P1(S) is a
metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric
Spaces, Academic Press, New York, p. 43). It now follows from the Krein-Milman
Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York,
p. 207) that di is the closed convex hull of its extreme points. Now if {X ^ } is not
ergodic under an invariant initial distribution n, then, by the construction given in
theoretical complement 4 above, there exists B e P(S) such that 0 < n(B) < I and
it = it(B)it B + n(B`)i B .,, with n B and iB ., mutually singular invariant probabilities. In
other words, the set K, say, of extreme points of d# comprises those it such that {X"}
with initial distribution it is ergodic. Every it e .i is a (weak) limit of convex
combinations of the form .1;'p;" ( n -+ cc), where 0 < A;^ < 1, .l;' = 1, ;' e K.
)
THEORETICAL COMPLEMENTS 231
Each of the simple random walk examples described in Section 1.3 has the
special property that it does not skip states in its evolution. In this vein, we
shall study time-homogeneous Markov chains called birthdeath chains whose
transition law takes the form
; ifj =i +1
S ; ifj =i -1
(1.1)
a i ifj =i
0 otherwise,
for I i<,2r-1,
233
234 BIRTHDEATH MARKOV CHAINS
r
(w+4i)(2r i)
P1,i+ 1 =
(w + r) 2
CC.-V-^:)
(1.3)
2r
Pol = P2r,2r - 1 = w + r
Just as the simple random walk is the discrete analogue of Brownian motion,
the birthdeath chains are the discrete analogues of the diffusions studied in
Chapter V.
Most of this chapter may be read independently of Chapter II.
CASE I. Let {X} bean unrestricted birthdeath chain on S = {O, 1, 2, ...} = 7L.
The transition probabilities are
with
or equivalently,
Rewrite (2.4) as
xbx-1
. :. 61
0(d) 0(y) = d - 1 S +1 (0(c + 1) 0(c)). (2.8)
x=yxl'x-1 ... Nc+l
d-1 Sxax-1..'Sc+1
Let p y, denote the probability that starting at y the process eventually reaches
c after time 0, i.e.,
if dx x _1_+l = 0
p = lim ^i(y)
y ^
= 1
di x x=c+l !'xl'x I " ' f'c+ l
R IS'
2 ax = cc
i=l for ally > c iff Y a
x=1 12 '
0
x x+ = 0
p yd = 1 for all y < d iff F
x=oo axax+l"'SO
0
<1 for all y < d iff Y xx+l' - < oo (2.15)
x=m Sxsx+1" 60
A state y e S satisfying (2.18) is called a transient state; since (2.18) holds for
all y e S, the birthdeath chain is transient. Just as in the case of a simple
asymmetric random walk, the strong Markov property may be applied to see
that with probability 1 each state occurs at most finitely often in a transient
birthdeath Markov chain.
CASE II. The next case is that of two reflecting boundaries. For this take
S = {0, 1, 2, .. , N } , P00 = 1 0 , POI = 0' PN.N-1 = 6 N' PN.N = 1 6N, and
Pi.j+ 1 = fli, Pi.r-1 = b1, pi,; = 1 . d ; for I <, i < N -- 1. If one takes c = 0,
d = N in (2.3), then fr (y) gives the probability that the process starting at y
reaches 0 before reaching N. The probability 4(y), for the process to reach N
before 0 starting at y, may be obtained in the same fashion by changing the
boundary conditions (2.5) to c(0) = 0, (N) = I to get that q (y) = I Ji(y).
Alternatively, check that b(y) - I ^i(y) satisfies the equation (2.6) (with 0
replacing 0) and the boundary conditions (P(0) = 0, 4(N) = 1, and then argue
that such a solution is necessarily unique (Exercise 4). All states are recurrent,
by Corollary 9.6 (see Exercise 5 for an alternative proof).
CASE III. For the case of one absorbing boundary, say at 0, take
S = j0, 1, 2, ...1, Poo = 1 , Pi.i+1 = #i, Pi.^-1 = b;, Pi.; = 1 ; S ; for i > 0;
; , 6 i > 0 for i > 0, fl + 1 < 1. For c, d e S, the probability Ji(y) is given by
(2.10) and the probability p, which is also interpreted as the probability of
eventual absorption starting at y> 0, is given by
...
d-1 ax ax l - bl
Y ..- 1 . . Ij1
p Ya =hm d-1
dtv 1 + Sxbx-1_..51
= 1 iff 2 a lb
/j J^ a = oo (for y > 0). (2.19)
x=1 I2'''Yx
so that
Poo = 1. (2.24)
On the other hand, if the series in (2.19) converges then p^, o < 1 for all y > 0.
In particular, from (2.23), we see Poo < 1. Convergence of the series in (2.19)
also gives p r,, < 1 for all c < y by (2.12). Now apply (2.16) to get
whenever the series in (2.19) converges. That is, the birthdeath chain is transient.
The various remaining cases, for example, two absorbing, or one absorbing
and one reflecting boundary, are left to the Exercises.
k>,1,
Pn(xo = io, ... , Xm Im) = Pn (Xk = 1p, ... > Xm+k = I m ). (3.2)
no(1 o) + ir rbi = ir o
(3.3)
7i ifli i +(l i ai) + mi+Ibi+1 = ni (j = 1,2,...,N 1),
or
n i n (l<j<N),
(3.5)
...iI) 1
N ot
7Cp
= 1 +
Y_(
i =1 CS1(52...(SJ
7ro(I o)+n151=ito,
(3.6)
71 i -Ii -^+it(I i S i )+7r i +la i +^=j (j>11).
01 .J- 1
.
< co. (3.8)
i =1 6162..81
/
n o =l +
1 .. .
1 1
. (3.9)
...6 1
16 2
)
i =1
240 BIRTHDEATH MARKOV CHAINS
ol ... j-1
7t, 6162...gj Ro (J% 1 ),
(3.11)
aj+16j+2 07r0
(j1< 1).
+r * * F'-1
Y_ 6
bj+1 . j+2 E . 01...j-1
< 00, (3.12)
j<-1 Pifli+1 .-1 .j>-1 6 1 2 ...(a j
in which case
aj+lbj+2
...S 0 + ol ... i 1
7< o= 1+ (3.13)
j_' 1 Yjl'j+l
-
/^/^
...
I j>-1 6162...(ai
Notice that the convergence of the series in (3.12) implies the divergence of
the series in (2.14), (2.15). In other words, the existence of an equilibrium
distribution for the chain implies its recurrence. The same remark applies to
the birth-death chain with one or two reflecting boundaries.
2r 2w
_ o -1
j_ 1 2r(w + r) i (w + r i)(2r i) (j X W + j
...
r
^j S 1 b j "o j(2r+j) j i(wr +i) 2w+2r
(w+r)
(3.14)
The assertions concerning positive recurrence contained in Theorem 3.1
below rely on the material in Section 2.9 and may be omitted on first reading.
Recall from Theorem 9.2(c) of Chapter I that in the case that all states
communicate with each other, existence of an invariant distribution is equivalent
to positive recurrence of all states.
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 241
Theorem 3.1
c` 66
12
[J1 1 Nx
diverges or converges. All states are positive recurrent if and only if the
series (3.8) converges. In the case that (3.8) converges, the unique
invariant distribution is given by (3.7), (3.9).
(c) For an unrestricted birth-death chain on S = {0, 1, 2, ...} all states
are transient if and only if at least one of the series in (2.14) and (2.15)
is convergent. All states are positive recurrent if and only if (3.12) holds;
if (3.12) holds, then the unique invariant distribution is given by (3.11),
(3.13).
We will apply the spectral theorem to calculate p", for n = 1, 2, ... , in the case
that p is the transition law for a birth-death chain.
First consider the case of a birth-death chain on S = {0, 1, ... , N} with
reflecting boundaries at 0 and N. Then the invariant distribution it is given by
(3.5) as
1
7z 1 = n o , 71 t = '^ '-1 71 0 (2 < j < N). (4.1)
b l51...51
In the applied sciences the symmetry property (4.3) is often referred to as detailed
balance or time reversibility. Introduce the following inner product ( . )" in the
vector space R"+'
242
(x, Y)n =
i
Y_
N
=o
xiYi 7ry x = (x 0 , x i , ... , xN)',
i xI
(iO
1JZ . ( 4.5)
Y,
i=O UO PjiYiJxj7rj
Y_
Therefore, by the spectral theorem, p has N + 1 real eigenvalues a o , a l , ... , a N
(not necessarily distinct) and corresponding eigenvectors (0o, 4' , dN, which
are of unit length and mutually orthogonal with respect to ( , ). Therefore,
the linear transformation x --> px has the spectral representation
N
P= akEk ,
k=0
(4.7)
N
x= E ak(4>k, x)aek ,
k=O
Letting x = e j denote the vector with 1 in the jth coordinate and zeros elsewhere,
one gets
1 I
7Z j =N (1 <j<,N-1), tr o =zZ N =2
N . (4.10)
0 2 - 2at + 1 = 0, (4.13)
The equation (4.11) is linear in x, i.e., if x and y are both solutions of (4.11)
then so is ax + by for arbitrary numbers a and b. Therefore, every linear
combination
satisfies (4.11). We now apply the boundary conditions (4.12) to fix A(a), B(a),
up to a constant multiplier. Since every scalar multiple of a solution of (4.11)
and (4.12) is also a solution, let us fix x o = 1. Note that x o = 0 implies x 3 = 0
for all j. Letting j = 0 in (4.15), one has
A(a)(0 1 0 2 ) + 0 2 = a, (4.17)
or,
Now write 0 1 = e`o, 0 2 = e - ' O, where 0 is the unique angle in [0, it] such that
cos 4) = a. Note that cosine is strictly decreasing in [0, n] and assumes its entire
range of values [-1, 1] on [0, 7c]. Note also that this is consistent with the
requirement sin 4) = 1 a 2 0. Then (4.19) becomes
i.e.,
Now,
k1 = J n j cosz
IIx(Iz
N kn l 1 N -i (k7rj) 1
2N+Nlyl cosz
N + 2N
cos2(krrj) = 1 il I + cos(2kirj/N)
= 1 N1
N j _ o N N j=0 2
c1 ifk=0orN
(4.25)
t J- ifk=1,2,...,N-1.
Now use (4.9), (4.23), and (4.26) to get, for 0 < i, j < N,
N
Pij
(n) E k
n4 ki kj 7 lj
k=0
N-1
k7rj
= rr j + 2rz j y- cos ( k^ cos kni cos( + (-1)rc j . (4.27)
k=1\ Nj N N
1 2 } _ 1 n+j i _
p j 1 + COS"( )COS COS !
N N k=1
^l N (N I )
) .. ) N ( N
For 0<i<N,
1 IN 1 ( ^ ( ) 1
Po = + COS"t
\ COS J
/ + 1 " I
2N N k = t N N 2N
For 0<i<N,
1 1 N -' / k\ ki i
1
P,N + Z COs cos (--- I COS + ( 1)n+N (4.28)
2N N k = 1 N N N 2N
Note that when n and j i have the same parity, say n = 2m and j i is
even, then
r)
C p;;"' 1 I=4
L cos( I I cos( cos( I[] + o(1)] (4.29)
246 BIRTHDEATH MARKOV CHAINS
for all i > 1. Note that p;! is the same as in the case of a random walk with
)
two reflecting boundaries 0 and N, provided N > n + i, since the random walk
cannot reach N in n steps (or fewer) starting from i if N > n + i. Hence for all
i, j, n, p is obtained by taking the limit in (4.28) as N . oc, i.e.,
p=2
f
2 "
o
, cos"(nO) cos(iirO) cos( jn6) dO
are i balls in box I, then there are 2d i balls in box II. Thus there is no overall
heat loss or gain. Let X. denote the number of balls in box I after the nth trial.
Then {X: n = 0, 1, ...} is a Markov chain with state space S = {0, 1, 2, ... , 2d}
and transition probabilities
P ij = 0, otherwise.
This is a birth-death chain with two reflecting boundaries at 0 and 2d. The
transition probabilities are such that the mean change in temperature, in box
I, say, at each step is propostional to the negative of the existing temperature
gradient, or temperature difference, between the two bodies. We will first see
that the model yields Newton's law of cooling at the level of the evolution of
the averages. Assume that initially there are i balls in box I. Let Y = X d,
the excess of the number of balls in box I over d. Writing e = E j (Y), the
expected value of Y given X 0 = i, one has
e=E,(Xd)=E,[X- d+(XX-,)]
d)+E;(X
/ 2d x_, X _ i 1
=E,(X_1 X 1)=e-1+Er
2d 2d )
e -i 1\
= e- 1 + Ei d = e-, d = 1 ^ e 1. -
Suppose in the physical model the frequency of transitions is r per second. Then
in time t there are n = tT transitions. Write v = log[(l (1/d))]T. Then
e = (i d)e - `, (5.3)
_ ( 2 d) 2 _ a y
j=0, 1,...,2d. (5.4)
m ; =1+ m 1 +5 1 m i _ 1 (1<i<N-1),
(5.5)
m0=0, mN = 1 + mN_1.
u 0 =0, u l = 1, (5.6)
In other words, in this new scale the probability of reaching the relabeled
boundary u = 0, before U N , starting from u x (inside), is proportional to the
distance from the boundary U N . This scale is called the natural scale. The
difference equations (5.5) when written in this scale assume a simple form, as
will presently be shown. First let us determine u x from (5.6) and (5.7) and the
difference equation
a 1 S Z .. _ S,a2... x
a a a (ui uo)= a (1<x<N-1), (5.10)
or
x . (5 i
61o2"
ux+1 = 1 +i^ i2 ...i (1 x<N 1). (5.11)
Now write
m(u) - m x . (5.12)
1 m(u) m(u- 1 ) 1
_
_ (1 x N 1).
UN UN-1 Ux Ux-1 i =x
6162...(Si
(5.14)
Relations (5.10) and (5.14) lead to
Y_
l'N-1 +
m \U x ) m(U x -1) = Yx/'x+l 1 F'x
(1 x < N 1). Ni -1i
...6 (Sx...Sii
Sxax+l N-1 i =x l
(5.15)
The factor ; / i is introduced in the last summands to take care of the summand
corresponding to i = x (this summand is actually 1/S x ). Sum (5.15) over
x = 1, 2, ... , y to finally get, using m(u 0 ) = 0,
m(UO _ xfx+1
...%jN ...
-1 + x i 1t
Ll (1 < y < N 1).
x=1 Sx 6 x+I * * ' SN-1 x=1 i =x 6X... 6 i
(5.16)
In particular, for the Ehrenfest model one gets
= 2d 22d(1 (5.17)
+Q)).
Next let us calculate
m =_ Ei T (0<i<d),
i 4 (5.18)
where
Writing m(u i ) = Ph i , one obtains the same equations as (5.13) for 1 < i < d 1,
and boundary conditions
...
m(ux+l) - m(ux) tn(ul) - tn(uo) x o1 _-I
ux+I U x U1 UQ i =1 6 1 62...^i
o1
...'
_ -1- Y
_ 6 6 ...6.
1 1 2
(5.21)
where 0 = 1. Therefore,
x
m(u x+, ) - m(u) _- ' z
x - Y r+ I x x+ I
(5 22)
x ,z
x i 1 , x6x+I '
m l+
d-1
X= I
d-1
x!
(2d-1)(2d-x)
x!
+ x
+ d-I
=
d-I x ((x + 1)x (i + 2)(i + 1)
i; (2d_i)...(2d_x)(x+1) )
2d x x x -i
<1 +
x=1 (2d- 1)(2d-x) x = 1 2d-x 1= 2d-x
d-1 x! d-I
2d
,1+Y- -+I
x= I (2d - 1)...(2d - x) x= I 2(d - x)
1+
dIl xl + d(log d + 1). (5.24)
x= I (2d - 1) (2d - x)
For d = 10 000 balls and rate of transition one ball per second, it follows that
It takes only about a day on the average for the system to reach equilibrium from
a state farthest from equilibrium, but takes an average time inconceivably large,
even compared to cosmological scales, for the system to go back to that state
from equilibrium.
252 BIRTHDEATH MARKOV CHAINS
For d = 10 000 one gets, using Stirling's approximation for the second estimate,
EXERCISES
EXERCISES 253
2. Suppose that balls labeled 1, ... , N are initially distributed between two boxes labeled
I and II. The state of the system represents the number of balls in box I. Determine
the one-step transition probabilities for each of the following rules of motion in the
state space.
(i) At each time step a ball is randomly (uniformly) selected from the numbers
1, 2, ... , N. Independently of the ball selected, box I or II is selected with
respective probabilities p, and P2 = t p,. The ball selected is placed in the
box selected.
(ii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability p, or from those in II with probability P2 = 1 p,. A
box is then selected with respective probabilities in proportion to current box
sizes. The ball selected is placed in the box selected.
(iii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability proportional to the current size of I or from those in II
with the complementary probability. A box is also selected with probabilities in
proportion to current box size. The ball selected is placed in the box selected.
_< _<
3. Prove (2.4), (2.16), and (2.23) by conditioning on X, and using the Markov property.
>_ _<
4. Suppose that cp(i)(c i d) satisfy the equations (2.4) and the boundary conditions
q(c) = 0, cp(d) = 1. Prove that such a cp is unique.
5. Consider a birthdeath chain on S = {0, 1, ... , N } with both boundaries reflecting.
(i) Prove that P(T mN) (I S N 5 N _ I ...6, )m if i > j, and < (1 o/i t .. N _, )m
if i < f. Here T = inf {n 1: Xn =j}.
(ii) Use (i) to prove that p ; , = P; (Tt < x) = I for all i, j.
6. Consider a birthdeath chain on S = {0, 1, ...} with 0 reflecting. Argue as in Exercise
5 to show that p 1 = I for all y.
7. Consider a birthdeath chain on S = {0, I, ... , N} with 0, N absorbing. Calculate
9. If 0 is absorbing, and N
Derive the necessary and sufficient condition for recurrence.
reflecting, for a birthdeath chain on S = {0, 1, ... , N},
then show that 0 is recurrent and all other states are transient.
10. Let p be the transition probability matrix of a birthdeath chain on S = {0, 1, 2, ...}
with
.= j= 0.1,2,....
2 (j +21 ) Si 2 (j+ 1),
254 BIRTH-DEATH MARKOV CHAINS
N i
N p l , ifj =i +1,
Ni (P 1N i)
+ N p 2 , ifj=i,i=0,1 ' ..., N,
Pij=
3. Calculate the transition probabilities p" for n >, 1 by the spectral method in the case
of Exercise 1.2(i) and p, = p 2 = Z according to the following steps.
(i) Consider the eigenvalue problem for the transpose p'. Write out difference
equations for p'x = ax.
(ii) Replace the system of equations in (i) by the infinite system
1 1 N i 1 i +2
- x0+ ---X I = ax0, 2N Xi + - xi+) + x^+2 = ax;+
2 2N 2 2N
N(2a 1 z)
q (Z) =
' q(Z), q(0) = xo.
1 ZZ
[Hint: Multiply both sides of the second equation in (ii) by z' and sum over
i >-0.]
(iv) Show that (iii) has the unique solution
N I' -a)(I + )Na
(P(Z) = X0( 1 z) Z
(v) Show that for aj = j/N, j = 0, 1, ... , N, cp(z) is a polynomial of degree N and
therefore, by (ii) and (iii), a j = j/N, j = 0, 1, ... , N, are the eigenvalues of p' and,
therefore, of p.
(vi) Show that the eigenvector x (' ) = (x ) , ... , xN ) )' corresponding to aj = j/N is
given with xo' ) = 1, by xk ) = coefficient of z' in (1 z)" - '(1 + z) .
(vii) Write B for the matrix with columns x^ 0) , .. , x (N) . Then,
where
no n
(B') ' B diag ...
(IX' 0) lirz ' IIX IN) IIa2 )
4. (Relaxation and Correlation Length) Let p be the transition matrix for a finite state
stationary birthdeath chain {X n } on S = {0, 1, ... , N} with reflecting boundaries at
0 and N. Show that
Use the inequality ab _< (a' + b 2 )/2 to show (Corr, ( f (X.), g(X ))I < e -z ".]
o
5. (i) (Simple Random Walk with Periodic Boundary) States 0, 1, 2, ... , N 1 are
arranged clockwise in a circle. A transition occurs either one unit clockwise or
one unit counterclockwise with respective probabilities p and q = 1 p. Show
that
1 N-1
N r_o1
where 0 = e(znptN is an Nth root of unity (all Nth roots of unity being
1,0,0 ,...,0 ).
2 N-1
(*ii) (General Random Walk with Periodic Boundary) Suppose that for the
arrangement in (i), a transition k units clockwise (equivalently, N k units
counterclockwise) occurs with probability p k , k = 0,1, ... , N 1. Show that
NI NI
(n) = 1 0rU'k) I Br5 "
Pjk P,
N r=o s=o
THEORETICAL COMPLEMENTS
aT, x
f(T,x), t > 0,
at (T.5.1)
To x=x,
such that f = (f,, ... , f") . I^" R" uniquely determines the solution at all timest >0
for each initial state x by (T.5.1).
cos(yt) my sin(yt) k
A(t)= 1 t_>0, where y= > 0.
--sin(yt) cos(yt)
my
Notice that areas (2-dimensional phase-space volume) are preserved under T, since
det A(t) = 1. The motion is obviously periodic in this case.
OH
dq;_aH dp;_
i =1,...,k, (T.5.2)
dt ap ' ; dt aq; '
where H - ll(q,, . .. , qk, p,, ... , Pk) is the Hamiltonian function representing the
total energy (kinetic energy plus potential energy) of the system. Example I is of this
form with k = 1, H(q, p) = p 2 /2m + kg 2 . Writing n = 2k, x, = q,, ... , X k = qk>
Xk+ 1 = Pi, , X2k = Pk, this is also of the form (T.5.1) with
p / OH OH OH aH
f(x) _ (fl (x), ..... 2k(x)) _ , ... , -- (T.5.3)
GXk+ 1 aX2, ax, OXk
Observe that for H sufficiently smooth, the flow in phase space is generally
incompressible. That is,
258 BIRTHDEATH MARKOV CHAINS
Liouville first noticed the important fact that incompressibility gives the volume
preserving property of the flow in phase space.
Lionville Theorem T.5.1. Suppose that f(x) in (T.5.1) is such that div f(x) = 0 for
all x. Then for each bounded (measurable) set D c R', IT DI = IDI for all t > 0, where
I I denotes n-dimensional volume (Lebesgue measure).
Proof. By the uniqueness condition stated at the outset we have T, +h = T,Th for all
t, h > 0. So, by the change of variable formula,
-T,,x
ITs+hDI = f d e t( l dx.
11,D \ ax )
aThx
I+ af h+O(h2) as h0.
ax = ax
But, expanding the determinant and collecting terms, one sees for any matrix M that
det( Ox ) = 1 + O(h 2 ) as h p 0.
ax
It follows that for each t >_ 0
THEORETICAL COMPLEMENTS 259
Proof: Consider A, T", T - 2 "A, .... Then there are distinct times i, j such that
IT -. "A n T'01 ^ 0; for otherwise
It follows that
IO n T - "li - 'IAI ^ 0.
Continuous-Parameter Markov
Chains
<,
In other words, for any sequence of time points 0 t o < t, < ... , the discrete
parameter process Yo := X, 0 , Y, := Xr ..... is a Markov chain as described in
I
Chapter II. The conditional probabilities p, J (s, t) = P(X1 = j Xs = i), 0 < s < t,
are collectively referred to as the transition probability law for the process. In
the case p ; j (s, t) is a function of t s, the transition law is called
time-homogeneous, and we write p, 1 (s, t) = p, j (t s).
Simple examples of continuous-parameter Markov chains are the
continuous-time random walks, or processes with independent increments on
countable state space. Some others are described in the examples below.
Example 1. (The Poisson Process). The Poisson process with intensity function
p is a process with state space S = {0, 1, 2, ...} having independent increments
distributed as
261
262 CONTINUOUS-PARAMETER MARKOV CHAINS
(f:P(u)du )^ (
P.1(s,t)=P(Xr=j1 X, = i)
P(X'
= P = j, Xs = i)
(X5=I)
(J i )!
l expj p(u) du)
\
J
for j i
(1.3)
0 ifj<i.
(j i)! .
Pi;(s,t)= (1.4)
0, 3> ,
Therefore,
p,J(s,t)=E{P(X,XS=j i N,N )} S
2(t S)]k
*k ^
_ Y if) (j k^ e
( 1 .7)
=
k0
Here p(t o ) takes the place of p, and t o is treated as the unit of time. Events that
depend on the process at time points that are not multiples of t o are excluded.
Likewise, specifying transition matrices p(t o ), p(t, ), ... , p(t) for an arbitrary
finite set of time points t o , t,, ... , t, will not be enough.
On the other hand, if one specifies all transition matrices p(t) of a
time-homogeneous Markov chain for values of t in a time interval 0 < t <, t o
for some t o > 0, then, regardless of how small t o > 0 may be, all other transition
probabilities may be constructed from these. To understand this basic fact, first
264 CONTINUOUS-PARAMETER MARKOV CHAINS
assume transition matrices p(t) to be given for all t > 0, together with an initial
distribution n. Then for any finite set of time points 0 < t l < t 2 < < t,,, the
joint distribution of X 0 , X,,, ... , X given by
and
Pi(XI +s = k) =Y JEs
P;(XX =j, Xt +s = k). (2.4)
pik(t + s) _Ep
,jEs
i; (t)p Jk (s) (i, k E S; s > 0, t > 0), (2.5)
Therefore, the transition matrices p(t) cannot be chosen arbitrarily. They must
be so chosen as to satisfy the ChapmanKolmogorov equations.
It turns out that (2.5) is the only restriction required for consistency in the
sense of prescribing finite-dimensional distributions as in Section 6, Chapter I.
To see this, take an arbitrary initial distribution it and time points
0 < t t < t 2 < t 3 . For arbitrary states i 0 , i l , i 2 , i 3 , one has from (2.2) that
as well as
But consistency requires that (2.8) be obtained from (2.7) by summing over i 2 .
This sum is
showing that the right sides of (2.8) and (2.9) are indeed equal. Thus, if (2.5)
holds, then (2.2) defines joint distributions consistently, i.e., the joint distribution
at any finite set of points as specified by (2.2) equals the probability obtained
by summing successive probabilities of a joint distribution (like (2.2)) involving
a larger set of time points, over states belonging to the additional time points.
Suppose now that p(t) is given for 0 < t < t o , for some t o > 0, and the
transition probability matrices satisfy (2.6). Since any t > t o may be expressed
uniquely as t = rt o + s, where r is a positive integer and 0 < s < t o , by (2.6)
we have
Thus, it is enough to specify p(t) on any interval 0 < t < t o , however small
t o > 0 may be. In fact, we will see that under certain further conditions p(t) is
determined by its values for infinitesimal times; i.e., in the limit as t o 0.
From now on we shall assume that
where I is the identity matrix, with 1's along the diagonal and 0's elsewhere.
266 CONTINUOUS-PARAMETER MARKOV CHAINS
p(o)=I. (2.12)
Then (2.11) expresses the fact that p(t), 0 < t < oc, is (componentwise)
continuous at t = 0 as a function of t. It may actually be shown that owing to
the rich additional structure reflected in (2.6), continuity implies that p(t) is in
fact differentiable in t, i.e., p(t) = d(p ;; (t))/dt exist for all pairs (i, j) of states,
and alit > 0. At t = 0, of course, "derivative" refers to the right-hand derivative.
In particular, the parameters q ;j given by
are well defined. Instead of proving differentiability from continuity for transition
probabilities, which is nontrivial, we shall assume from now on that p ;; (t) has a
finite derivativefor all (i, j) as part of the required structure. Also, we shall write
Q = ((qi;)), (2.14)
Suppose for the time being that S is finite. Since the derivative of a finite
sum equals the sum of the derivatives, it follows by differentiating both sides
of (2.5) with respect to t and setting t = 0 that
Y 9i; = 0. (2.21)
jES
Note that
in view of the fact p iJ (t) 0 = p ij (0) for i 0 j, and p ii (t) < 1 = p i; (0).
In the general case of a countable state space S, the term-by-term
differentiation used to derive Kolmogorov's equations may not always be
justified. Conditions are given in the next two sections for the validity of these
equations for transition probabilities on denumerable state spaces. However,
regardless of whether or not the differential equations are valid for given
transition probabilities p(t), we shall refer to the equations in general as
Kolmogorov's backward and forward equations, respectively.
co ^ktk
p 1 (t) = I f*k(j i) 1 e -at
t j0,
k=o k
= s ;; e + f(j i)a.te ^' + o (t), as t j 0. (2.23)
Therefore,
Y_
p(t) _ > gikPkj(t), i, j e S, t 0, (3.1)
k
In the case that S is finite it is known from the theory of ordinary differential
equations that, subject to the initial condition p(0) = I, the unique solution to
(3.1) is given by
p(t) = e, t ? 0, (3.4)
Example 1. Consider the case S = {0, 1} for a general two-state Markov chain
with rates
Therefore,
r(+b)
p(t)= e`Q=I R I (e- 1)Q
_ 1 S + e-(B+a)' e c+air
+b S Se - +Se-
It is also simple, however, to solve the (forward) equations directly in this case
(Exercise 3).
In the case that S is countably infinite, results analogous to those for the
finite case can be obtained under the following fairly restrictive condition,
z t t"
p1j(t) = ; + tq ;j + t2 q (2) + ... + t n q (n) + ... (3.11)
so that the series on the right in (3.11) converges to a function r;j (t), say,
absolutely for all t. By term-by-term differentiation of this series for r(t), which
is an analytic function of t, one verifies that r. (t) satisfies the Kolmogorov
backward equation and the correct initial condition. Uniqueness under (3.9)
follows by the same estimates typically used in the finite case (Exercise 2).
To verify the Chapman-Kolmogorov equations (2.6), note that
2 m
p (t +s)=e ( z+s'Q=I+(t+s)Q+^t2^s^ Q 2 + ... .^^^
m
t Qm +...l
=
L I+tQ+t?Q2+...
2!
_}.
m! J
2 n
x I+sQ+sQz+... +S_Qn+..
2! n!
= e`QesQ = p(t)p(s). (3.13)
with initial conditions H 1 (0) = E IES b i; = 1 (i e S). Since HH (t):= 1 (for all t > 0,
all i e S) clearly satisfy these equations, one has H,(t) = 1 for all t by uniqueness
of such solutions. Thus, the solutions (3.11) have been shown to satisfy all
conditions for being transition probabilities except for nonnegativity (Exercise
5). Nonnegativity will also follow as a consequence of a more general method
of construction of solutions given in the next section. When it applies, the
exponential form (3.4) (equivalently, (3.11)) is especially suitable for calculations
of transition probabilities by spectral methods as will be seen in Section 9.
Example 2. (Poisson Process). The Poisson process with parameter 2 > 0 was
introduced in Example 1.1. Alternatively, the process may be regarded as a
Markov process on the state space S = {0, 1, 2, ...} with prescribed infinitesimal
transition rates of the form
nn+j n
(-1) i^l, if0<jinn
(n) _ (3.16)
0, otherwise.
/ t2
!)ijlt) = (S ij + tqi; + Cji^ + .. .
t ;i+k oo tji+k i + k
Likewise,
For the general problem of constructing transition probabilities ((p ;; (t))) having
a prescribed set of infinitesimal transition rates given by Q = ((q1)), where
the method of successive approximations will be used in this section. The main
result provides a solution to the backward equations
Theorem 4.1. Given any Q satisfying (4.1) there exists a smallest nonnegative
solution p(t) of the backward equations (4.2) satisfying (4.3). This solution
satisfies
In case equality holds in (4.4) for all i e S and t > 0, there does not exist any
other nonnegative solution of (4.2) that satisfies (4.3) and (4.4).
or
d li"
ds (ePik(s)) = A;e PIk(S) + e r ' s ga;P;k(s) = y elisqPJk(s)
JE$ j :i
or
Reversing the steps shows that (4.2) together with (4.3) follow from (4.5). Thus
(4.2) and (4.3) are equivalent to the system of integral equations (4.5). To solve
the system (4.5) start with the first approximation
Pik = Sike A'` (i, k e S, t ) 0) (4.6)
Since q ;j 0 for i j, it is clear that p;k ^(t) > p;k ^(t). It then follows from (4.7)
by induction that p;k + '(t) p(t) for all n , 0. Thus, p ;k (t) = lim n -. p(t)
exists. Taking limits on both sides of (4.7) yields
Pik(t) = 6ike-Ail + e-a'(t-S)gijPjk(s) ds. (4.8)
j9 6 i JO
Hence, p ik (t) satisfy (4.5). Also, P k (t) - p; (t) >, 0. Further, >kes Pik (t) 1 for
all t > 0 and all i. Assuming, as induction hypothesis, that >kES Pik '^(t) 1
for all t >, 0 and all i, it follows from (4.7) that
e -x;' + ds
joi o
Hence, Ekcs p(t) < 1 for all n, all t >, 0, and the same must be true for
kES
I Pikt) = lim X pik (t)
nj ro kES
)
We now show that p(t) is the smallest nonnegative solution of (4.5). Suppose
p(t) is any other nonnegative solution. Then obviously p ;k (t) ^ 6 ik e = p(t)
for all i, k, t. Assuming, as induction hypothesis, p ik (t) > p;k 1 (t) for all i, k e S.
t > 0, it follows from the fact that p ;k (t) satisfies (4.5) that
SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATION 273
r
> S ex;t + e x;a s^ cn-1) sds
pik (t) > ik qjk
ij p ( ) /t ) .
= p ikt
joi Jo
Hence, p ik (t) p(t) for all n 0 and, therefore, p ikt) 15 k (t) for all i, k e S
and all t >, 0. The last assertion of the theorem is almost obvious. For if equality
holds in (4.4) for p(t), for all i and all t >, 0, and p(t) is another transition
probability matrix, then, by the above p ik (t) ^ p ;k (t) for all i, k, and t >, 0. If
strict inequality holds for some t = t o and i = i o then summing over k one gets
Note that we have not proved that p(t) satisfies the Chapman-Kolmogorov
equation (2.6). This may be proved by using Laplace transforms (Exercise 6).
It is also the case that the forward equations (2.18) (or (2.19)) always hold for
the minimal solution p(t) (Exercise 5).
In the case that (3.9) holds, i.e., the bounded rates condition, there is only
one solution satisfying the backward equations and the initial conditions and,
therefore, p;k(t) is given by exponential representation on the right side of (3.11).
Of course, the solution may be unique even otherwise. We will come back to
this question and the probabilistic implications of nonuniqueness in the next
section.
Finally, the upshot of all this is that the Markov process is under certain
circumstances specified by an initial distribution it and a matrix Q, satisfying
(4.1). In any case, the minimal solution always exists, although the total mass
may be less than 1.
Clearly,
poo(t) =
(4.10)
p01 (t) _ (v + A)poI(t) + vpoo(t) _ (v + A)poI(t) + ve "`
274 CONTINUOUS-PARAMETER MARKOV CHAINS
This equation can be solved with the aid of an integrating factor as follows.
Let g(t) = e ( v + z tp ol (t). Then (4.10) may be expressed as
dg(t) = vez,
dt '
or
Therefore,
Next
= e-(v+2)t [1'(%'
+ ^1) J` (ezz"
ezu) du]
^ o
= e-cv+2t
z) [''("_+ A) e 2zr 1 e zt 1
,1 221 2
v(v + 2) [ e - vt
2e -(
v+z ) t
+ e - w+zz)t]
22 2
v(v + 2) e_vt[1
e - zt]z
22 2
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 275
yields
Po,"+ 1
(t) = ecv+("+1)z)r
J 0
e(v+("+1)z)u(v
+ n.1 )po"(u) du
x/ f
2 l
1 (x 1)"+1 e ,.^ 1 ( e z^ 1)"+1
= L n+1 ] 1 n+l
Hence,
Let Q = ((q ij )) be transition rates satisfying (4.1) and such that the
corresponding Kolmogorov backward equation admits a unique (transition
probability semigroup) solution p(t) = ((p ;j (t))). Given an initial distribution it
on S there is a Markov process {X1 } with transition probabilities p(t), t 0,
and initial distribution it having right-continuous sample paths. Indeed, the
process {X1 } may be constructed as coordinate projections on the space S2 of
right-continuous step functions on [0, oo) with values in S (theoretical
complement 5.3).
Our purpose in the present section is to analyze the probabilistic nature of
the process {X,}. First we consider the distribution of the time spent in the
initial state.
Proposition 5.1. Let the Markov chain {X,: 0 < t < cc} have the initial state
i and let To = inf{t > 0: X, 0- i }. Then To has an exponential distribution with
parameter q . In the case q = 0, the degeneracy of the exponential
;; ;;
Proof. Choose and fix t > 0. For each integer n > 1 define the finite-dimensional
event
276 CONTINUOUS-PARAMETER MARKOV CHAINS
A={X(m/2 n )t =iform=0,l,...,2n}.
A = linl A:= fl A
n-00 n=1
To see why the last equality holds, first note that{ To > t} = {X = i for all u
in [0, t]} c A. On the other hand, since the sample paths are step functions, if
a sample path is not in {To > t} then there occurs a jump to state j, different
from i, at some time t o (0 < t o < t). The case t o = t may be excluded, since it
is not in A. Because each sample path is a right-continuous step function, there
is a time point t 1 > t o such that X. =j for t o < u < t 1 . Since there is some u
of the form u = (m/2n)t < t in every nondegenerate interval, it follows that
X = j for some u of the form u = (m/2n)t < t; this implies that this sample path
is not in A. and, hence, not in A. Therefore, {To > t} A. Now note by (2.2)
r Z
P1 (To > t) = Pi (A) = lim F(A) = 11n p(2
nt x n 1 o0 L
t 1 2
]
= l im 1 + 2 qii
+ o^2 ) = e`q , (5.1)
nt o0
The following random times are basic to the description of the evolution of
continuous time Markov chains.
Thus, To is the holding time in the initial state, T1 is the holding time in the
state to which the process jumps first time, and so on. Generally, T. is the
holding time in the state to which the process jumps at its nth transition.
As usual, P; denotes the distribution of the process {X1 } under Xo = i. As
might be guessed, given the past up to and including time To , the process evolves
from time To onwards as the original Markov process would with initial state
X.0 . More generally, given the sample path of the process up to time
= To + T1 + + Tn _ 1 , the conditional distribution of {Xz , + ,: t > 0} is Ps , , ,
depends only on the (present) state X, and on nothing else in the past.
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 277
Although this seems intuitively clear from the Markov property, the time To is
not a constant, as in the Markov property, but a random variable.
The italicized statement above is a case of an extension of the Markov
property known as the strong Markov property. To state this property we
introduce a class of random times called stopping times or Markov times. A
stopping time r is a random time, i.e., a random variable with values in [0, 00],
with the property that for every fixed time s, the occurrence or nonoccurrence
of the event {i <, s} can be determined by a knowledge of {X: 0 < u <, s}. For
example, if one cannot decide whether or not the event IT < 10} has happened
by observing only {X,,: 0 < u <, 10}, then r is not a stopping time. The random
variables i l = To , r 2 = To + T1 ..... t are stopping times, but T1 , To + TZ are
not (Exercise 1).
Proposition 5.2. On the set {co e Si: r(w) < 00}, the conditional distribution of
{X^ + ,: t >, 0} given the past up to time r is P, if r is a stopping time.
Stated another way, Proposition 5.2 says that on IT < co}, given the past of
the process {X,} up to time r, the future is (conditionally) distributed as the
Markov chain starting at X, and having the same transition probabilities as
those of {X1 }. This is the strong Markov property. The proof of Proposition 5.3
in the case that r is discrete is similar to that already given in Section 4, Chapter
II. The proof in the general case follows by approximating t by discrete stopping
times. For a detailed proof see Theorem 11.1 in Chapter V.
The strong Markov property will now be used to obtain a vivid probabilistic
description of the Markov chain {X,}. For this, let us write
k..=0
^r = qjif q1j : 0,
= q " for i J (5.3)
P1 (s<T0 s+A,X5+n=J)
P; (X =i for 0<u<s, Xs+o=j)
=P; (X= i for 0<u <s)P;(XS+o =jI X =i for 0<u<s)
= P;(To > s)p1j(A) = ev sp 1 (A). (5.5)
,
278 CONTINUOUS-PARAMETER MARKOV CHAINS
Dividing the first and last expressions of (5.5) by i and letting 0. 0 one gets
the joint density-mass function of To and XTp at s and j, respectively (Exercise 2),
qij
iTo.xTO (s , j) = e v1s pij(0 ) = e v " s gij = 21e-xs , (5.6)
where 2 i = q ;i .
Now use Propositions 5.1 and 5.3 and the strong Markov property to check
the following computation.
= Pio(TI ' S1, Xti = l2, ... , Tn ' Sng Xt.^ = in+l I TO = S, XTo = i 1)^io
s=0
x e liosk ;oi , ds
-
so
1 (T <1 S1 i XT o = i2, ... , TnI < sn , Xs. = in+1)2ioe^ioskioi, ds
= s=0
so \
ds)kio`,
_Aioexios RieitsdS)ki,i2
:=o ( fs"=0
X Pie(TO'S2, XTo= 13,...,Tn-2'Sn,Xtn_2 =in+J
Note that
n
fl
j=0
= Pi (Y1 = il, Y2 = i2, ... , Yn+1 = in+1) ,
Theorem 5.4
(a) Let {X1 } be a Markov chain having the infinitesimal generator Q and
initial state i o . Then {Yn := XT ,: n = 0, 1, 2, ...} is a discrete parameter
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 279
X, =Yo , 0<t<To ,
(5.8)
X, =Yk+1 for To +Tl ++Tk <t<To +Ti ++Tk+l .
p<<(t) = 8e-t + Y_
Ai e - xisk p (t s) ds. (5.9)
i0i o
p 1 (t) = P; (X, = 1) =
f' Z
j#t ,
Pi(XT o = I -x
j, X, = l Tp = s)A e :s ds
_Yf ioi o, l
ie lisk ;i p i,( t s) ds. (5.10)
(
At) k e xttk
e
.fz(so, s / , ... , SO = k! (5.15)
if 0 <so< s 1 < < sk and sk _ 1 1< t < S,
0 otherwise.
Integrating this over S k we get the conditional density of To , To + T1 , ...
T0 +TI + ... +Tk - 1 given {Y=k}as
0 otherwise.
Thus, as asserted, f3 is the same as g. n
Now,
For] = i - 1, we have
pt.j-i(h) = (i)fo[ f Jh
a:lxn) + o (h)
Go
9((r) _ E p 1 (t)r, i = 0, 1, 2, .... (5.21)
j=o
Since each of the i initial particles, independently of the others, has produced
a random number of progeny at time t with distribution (p lj (t): j = 0, 1, ...),
the distribution of the total progeny of the i initial particles at time t is the
i-fold convolution (p(t): j = 0, 1, 2, ...). Therefore,
since
l
pij(t + s)r' = Y, Y, P ik(t)pkj(s)r
j=0 j=0 k
_ P^k(t)(gs"(r)) k = 9i`'(gs"(r)).
k
Fix i > 0 and consider the discrete parameter stochastic process {X.} defined
by
n=0,1,2,.... (5.24)
In particular, observe that this root cannot depend on t > 0. One may therefore
expect to be able to compute p from the generating function of the infinitesimal
transition rates. Define
h(r) = 0.
Proof. Observe that the Kolmogorov backward equation for the (minimal)
branching evolution transforms as follows.
Thus,
ag, 1 ) (r)
=h n (9 1 (r)), (5.27)
at
00
g0 (r) _ pij( 0 )r j = r. (5.28)
j=0
In particular, since g, l) (p) = p for all t > 0, it follows that p must satisfy
ag,(' (p) = 0.
)
it follows that 0 (r) can have at most one zero in the open interval 0 < r < 1.
)
fr'i
d h" (r) _ J+ Y 2jrj
)
dr
Imagine a binary tree graph emanating from a single root vertex (see Figure
5.1) such that starting from a single root particle (vertex), a particle lives for
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 285
Figure 5.1
an exponentially distributed duration with mean A -1 and then either splits into
two particles or terminates with equal probabilities and independently of the
holding times. Each new particle is again independently subjected to the same
rules of evolution as its predecessors. Then L represents the time to extinction
of this process, and M, is the total number of deaths by time t. According to
Proposition 5.7, L is finite with probability 1. In particular, M L, represents the
number of degree-one vertices in the tree (excluding the root). In the context
of random flow networks, such vertices represent sources, and the extreme value
statistic L is the main channel length. We will consider the problem of computing
the (conditional) mean of L given ML = n. The process {(X1 , M,)} is a Markov
process on Z + x Z + having rates A = ((a, j )), where
C i (u C 2 ) + C 2 (C 1 u)eI'^ (c,-C2)1
g(t, u, v) = u C (5.33)
( z)) (( u Ci)e=z(c,=czi,_ _,
2
Proof. It follows from Kolmogorov's forward equation that for i o = (1, 0),
a E so {ux^vM^} ;
= E ;o
i iz
u v a(xM,),(il.iz)
at th'i2)
= Z2u2 E; o {u xt v Mt } + ZAv 3u Au E; o {u x `v M `}
au au
(5.34)
Therefore, one has
g(0, u, v) = U. (5.36)
with C, - C,(v) and C 2 - C2 (v) as defined in the statement of the lemma, (5.35)
and (5.36) take the form
1 2 1 2-zn+1
P(ML=n)=(2(-1)n-1=2 n=1,2.....
n-1 n
(5.38)
Thus, by the Stirling formula one has,
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 287
and
we have for 0 < v < 1, using (5.33), (5.37), (5.40) and (5.41),
n1
J
h n vn = r tr -1 {(1 (1 v) 112 ) g(t, 0, v)} dt
0
= rr C C z C
zC1e
1A(c,-c
tr-1 C _ 1
2)r
dt
-C2)u
o f 1 C 1a(c,
C2z 1 e ^
_r-1J C C
r t d t. 1
( 2
Cl)e#M(cl-CZ)`^
Ix(c1-C2)1
(5.43)
C ie
C 22
2
i
"
= irr (C C l r+1 y (C
\nnr S r les ds
r l n=1
) 2 f"
= 2 ^r (1 v)_ 1)/2 (Clinn-r.
(5.44)
n=1 C2J
In the case r = 1,
_ 2(1 v) 1/2
h(v) _ 2 log 1
C1(v)
2a. log (5.45)
A C2(v) 1 + (1 V)112
288 CONTINUOUS-PARAMETER MARKOV CHAINS
To invert h(v) in the case r = 1, consider that vh'(v) is the generating function
of {nh"}. Thus, differentiating one gets that
(5.46)
and, therefore, after expanding (1 -- v) - ' and (1 v) - `/ z in a Taylor series
about v = 0 and equating coefficients in the left and rightmost expansions in
(5.46), it follows that
(_ 2
1+ ' )(_1)"
nP(ML =n)E(LIML =n)=2 t . ( 5.47)
In particular, using (5.39) and Stirling's formula, it follows that
T0 +T1 +T2 +=
n =o . ( 6.1)
The time (is called the explosion time.
What we have shown in general is that every Markov chain with infinitesimal
generator Q and initial distribution is is given, up to the explosion time, by the
process {X1 } described in Theorem 5.4. If ( = oo with probability 1, then the
minimal process is the only one with infinitesimal generator Q and we have
uniqueness. In the case P;(( = co) < 1, i.e., if explosion does take place with
positive probability, then there are various ways of continuing the minimal
process for t ( so as to preserve the Markov property and the backward
equations (2.16) for the continued process {X1 }. One such way is to fix an
arbitrary probability distribution 4 on S and let X, = j with probability i/i i
(j e S). For t > ( the continued process then evolves as a new (minimal) process
THE MINIMAL PROCESS AND EXPLOSION 289
Definition 6.1. A Markov process {X} for which P; (C = oo) = I for all je S
is called conservative or nonexplosive.
ll ( A t) /' / ^t
e ^J*v1I)_ i e-i,` (^) *n (j _ i)
jes n=O JES n n=O n jES J
e-x^(^.t)= 1.
(6 . 2 )
n=O n!
Example 1. (Pure Birth Process). A pure birth process has state space
S = {0, 1, 2, ...}, and its generator Q has elements q ;i = A i , q ; , ;+ I = ^.,, q,; = 0
for j > i + I or j < i, i e S. Note that this has the same degenerate (i.e.,
completely deterministic) embedded spatial transition matrix as the Poisson
process. However, the parameters Al i of the exponential holding times are not
assumed to be equal (spatially constant) here.
Fix an initial state i. Then the embedded chain { }: n = 0, 1, ...} has only
one possible realization, namely, Y = n + i (n > 0). Therefore, the holding times
To , T1 ,... , are (unconditionally) independent and exponentially distributed
with parameters 2, A i+ ,, ... , Consider the explosion time
First assume
1
Y, =00. (6.3)
n=0 ^n
= ri m +i = H 1
mj=O'm +i + S =O 1 + S/Am +i
1
_ s = oo, (6.4)
m=0 1m+i + S
since log(1 + x) >, x/(1 + x) for x 0 (Exercise 1). Hence log 0(s) = oc, i.e.,
(k(s) = 0. This is possible only if P i (C = oc) = 1 since e > 0 whenever ( < oo.
Thus if (6.3) holds, then explosion is impossible.
Next suppose that
Z 1 <0. (6.5)
n=0 ^n
Then
1 m1
E(= = <0o.
m=0 1 m +i n =i in
This implies P;(( = oo) = 0, i.e., Pi (C < oo) = 1. In other words, if (6.5) holds,
then explosion is certain.
In the case of explosion, the minimal process {X,: 0 < t <} can be continued
to a conservative Markov process on the time interval 0 <, t < oc in a variety
of ways. For illustration consider the continuation {X 1 } of an explosive pure
birth process {X } obtained by an instantaneous jump to state 0 at time t = C,
after which it evolves as a minimal process starting at 0 until the time of the
second explosion, and so on. Let Pij (t) denote the transition probabilities for
{X}. Likewise, Pi denotes the distribution of {X} given Xo = I. Consider, for
THE MINIMAL PROCESS AND EXPLOSION 291
More generally, the event {X, = 0, N, = n} occurs if and only if the nth explosion
occurs at some time prior to t and the minimal process remains at 0 from this
time until t. Let fo denote the probability density function of the first explosion
time ( for the process {X,} starting at 0. Since the times between successive
explosions are i.i.d. with p.d.f. fo , the p.d.f. of the nth explosion time is given
by the n-fold convolution ft". Therefore,
ds.
Po(X, = 0, N, = n) =
J 0 poo(t s)f
t *n (s) (6.8)
Poo(t)= e "` 1 +
n=1 0
Je f 0 (s) ds] . (6.9)
In particular, (6.10) shows that the forward equation does not hold for (t).
Since the time of the first explosion starting at 0 is the holding time at 0
plus the remaining time for explosion starting at state 1, we have by the strong
Markov property,
fo=e^*.fi (6.11)
where f1 is the p.d.f. of the first explosion time starting in state 1 and e Ao is the
exponential p.d.f. with parameter A0 . Therefore,
Since,
(6.13)
exo(t) = Aoe-z' = 2opo0(t),
o*<n
L f n (t) = A0 L Poo * fl * f Z)(t)
n =1 n=1
e oo r
_ 2 0 .fi(s)poo(t s) ds + A 0. fi*.fn(s)poo(t s) ds
o n =i o
= A0P10(t) (6.14)
Equation (6.15) is the (0, 0) term in the backward equation for (t). As remarked
earlier (Exercise 2) the backward equations hold for all terms
Another method of constructing a conservative process from an explosive
minimal process is to adjoin a new state A to S and define X, = A for t > l;.
Then A is an absorbing state and the transition probabilities are
p ;1 (t) for i, je S
7 SOME EXAMPLES
Example 1. (The Compound Poisson Process). The state space for {Xr is }
Since the A, do not depend on i, it follows from Theorem 5.1 that the embedded
chain {1',}, corresponding to the transition matrix K and some initial
distribution, is independent of the i.i.d. exponentially distributed sequence of
holding times { To , T,, .. .}. Also, { Yn } is a random walk that may be represented
as Yo =Xo ,Y1 =X0 +Z 1 ,Y2 =X0 +Z 1 +Z 2 ,...,Yn =X o +Z 1 +Z 2 +.
+ Z .... , where {Z 1 , Z 2 ..... Zn , ...} is a sequence of i.i.d. random variables
with common probability function f, independent of X0 . It follows from this
description that the transition probabilities are given as follows.
Let A = 2(1 f (0)). Then,
x PA. =j)
(A t)n
_ f *n (j i), (7.2)
n = o n.
The rates given in (7.1) are the same as those that are obtained in (2.24).
Example 2. (The Yule Linear Growth Process). This process is sometimes used
as a model for the growth of bacteria, the fission of neutrons, and other random
replication processes. It is a special case of Example 2 in Section 5 with
fo = 0,f2 = 1. However, to keep this example self-contained, consider a
situation in which there are initially i members of a population, each splitting
into two identical particles after an exponentially distributed amount of time
having parameter A and independently of the others. Let X X denote the size of
the population at time t. Then {Xj is a Markov process on S = { 1, 2, ...}
(Exercise 1), whose transition rates can be calculated explicitly as follows.
-xn)1 = e -stn = I i1h + o(h).
p,1(h) = ( e (7.3)
Similarly,
(i)[Jh
p1 1(h) =
Acisz
e ds] [e ] i 1
= i2h + o(h) (7.4)
o
294 CONTINUOUS-PARAMETER MARKOV CHAINS
and forj>i+2,
--1 = o(h).
0 <, p 1 (h) < (1 e (7.5)
1 c1 1 1
n 1 An n1 nA 2n =1 n
it follows from Example 6.1 that the process is conservative. We will use the
backward equations to compute p k (t) for all i, k. We have
;
Substituting this into the last equation of (7.7) (with k = i + 1), one gets
= i2ee At = iAe21
or
t
et)
epi,r +l(t) =
fo iAe ds + e` Ap1+l( 0 ) = 1(1
-
Therefore,
t. t
p; +2(t) = e ` ` f i^.(i + 1)e '(1 e) ds
- - -
By induction (a basis for which is laid out in (7.9)-(7.10)), one may prove
(Exercise 2)
e-ahe-iph + O(h 2 )
if 0 < i < N,
p 1 (h)
= l e -ahe-N(1h + O(h 2 ) if i > N,
e - a h ahe - `a h + O(h 2 ) if 0 <' i '< N,
p1,+3(h) =
ehe - N a h + 0(h 2 ) if I> N,
^e - ah i(1 e -ah) e -u-'inh + O(h 2 ) if I '< i '< N,
1(h)
= e ah
N(1 e - u h )e -(N - ' ) ah + O(h 2 ) if i> N,
p(h) = O(h 2 ) if ji jj > 1.
296 CONTINUOUS-PARAMETER MARKOV CHAINS
Therefore,
1 e etr u^ d u = 1 - e - '
(7.13)
t o `
)J^ 1 _ t Qtk
\ J
= kJ e
at (k^k \%/\
1 !'t R`1
Q j
e' 1 C ot J (at)k t k-
(
j! t k=J (k J)! t
_ j _ ( t) - e )lat)
e
! \
t
t I
1e ` J
a ea([I - ( 1 R
More generally, since the chance of two or more occurrences in a small interval
of length h is 0(h 2 ), one has
h
= e -Bh-ih ae-c'-v)s ds + O(h2) = e-u+i)hah + O(h 2 )
0
(7.17)
= ah a(i + 1)h 2 + O(h 2 ) = ah + O(h2),
- h )i(1 e - 9h)e - c" - ')h
pj,j_1(h) = (e
Therefore,
For completeness one needs to argue that {X,} is a Markov process. For
this note that X 4 is
is a function of
(i) Xs and the additional service times up to time s + t required by the X.,
customers present at time s, and
(ii) the numbers and times of arrivals of those customers arriving during
(s, s + t] and their service times.
But the latter (i.e., (ii)) are stochastically independent of {X: 0 < u < s} since
Ys+, Ys is independent of Y. for 0 <, u < s, and the service times of all customers
are independent of each other as well as of the process {Y,,: u > 0}. Also, the
conditional distribution of the additional service times of the Xs customers who
are being served at time s, given {X, 0 < u < s} (or, for that matter, given
{ } u , 0 < u < s}, as well as the service times of all these arrivals in [0, s] already
spent in [0, s]) is still that of XS i.i.d. exponential random variables each having
parameter ("lack of memory" property). Hence P(Xs+ , = k ( X.: 0 <, u < s) _
P(Xs+, = k I X.J.
au a2u au
=D+c-
at ax2 ax
SOME EXAMPLES 299
a 2
t
2a-=0,
v2+
(7.19)
with the initial conditions of the form u(x, 0) = (p(x) and (0u/at)J, =0 = 0 for a
suitable initial profile cp(x). The telegrapher's equation arises in connection with
the spatiotemporal evolution of both the voltage and the charge density in
one-dimensional electrical transmission lines. In this framework the coefficients
, a, and v 2 are all nonnegative real numbers; in terms of the physical parameters
/v 2 = LC and 2a/v 2 = RC, where R is the electrical resistance, L is the
inductance, and C is the capacitance (see Exercise 9). However, these quantities
play no particular role in the probabilistic derivation.
After dividing through by > 0 and adjusting the other parameters
accordingly, we may take = 1. Partition the state space as a one-dimensional
lattice of sites spaced A > 0 units apart and partition time into units of
0, r, 2r, 3r, ... , with A = vr. A particle is started at x = 0 and in the first unit
of time r moves with speed v > 0, with some probability ic + in the positive
direction to A(= vT), or moves with probability zr - = 1 ir + in the negative
direction at speed v to A. Subsequently, at each successive time step the
particle will continue to travel at the rate v but at each time step, independently
of the previous choice(s), may choose to reverse its previous direction with
probability p = ar. Notice that, given the initial velocity, the time step until
the particle makes a reversal in direction is geometrically distributed with mean
1/p = 1/ar and variance q/p 2 = (1 ar)/(ar) 2 . The reversal mechanism provides
a kind of stochastic oscillation in the motion.
The position process {S} is represented in terms of successive displacements
by
Let us now consider the average profile after n time steps and viewed from
the position of the particles initially moving to the right, starting from the initial
300 CONTINUOUS-PARAMETER MARKOV CHAINS
=E(gq(m 0- S, -i +A)^X, =
(mO+S-1 + 0 )I Xt = 0 )( 1 - p)
= u - ((n - 1)i, (m + 1)0)t + u + ((n - 1)t, (m + 1)4)(1 - ar).
(7.25)
Conditioning likewise in (7.23) gives
(7.26)
u (nr, x)
- - u ((n
- - 1)i, x) u ((n- - 1)i, x - vT) - u ((n
- - 1)i, x)
r z
- au ((n - 1)i, x -
- vT) + au + ((n - 1)r, x - v c),
-
(7.28)
where x = mA, A = vi. In the limit as i -> 0, A = vt -- 0, n, m -+ oo such that
x = mA, t = nT, these equations go over to
au = v a au + au , - (7.29)
-
a u = v ax - au - + au + . (7.30)
SOME EXAMPLES 301
u + +u - u+-u-
u=- and w= (7.31)
By combining (7.29) and (7.30) we get the so-called transmission line equations
for u and w,
au aw
(7.32)
^t = v -- ,
aw = V au - 2aw.
(7.33)
at ax
0
Y,=x+ J t
Vds=x+vs E0
(l)' ds.
, (7.35)
Although {Y} is not a Markov process, the joint evolution {(Y V)} of position
and velocity is a Markov process, as is the velocity process { V} alone. This is
seen as follows.
o
\ J
=P(a YSb,V=w Yo=y,Vo=^=Yw=v. (7.36)
One expects from the earlier considerations that the solution to the
telegrapher's equation with the prescribed initial conditions is given by
u(t, x) = Ecp(x + Y)
To verify that (7.37) indeed solves the problem for a reasonably large class
of initial profiles, let f(x, w) = cp(x), for w = v, v. Define
Let g(t, x) = g(t, x, v), and g (t, x) = g(t, x, v). Since {(Y, V)} is a Markov
-
a
9 = Ag(t, x, w) = w ag + a[g(t, x, w) g(t, x, w)]. (7.39)
In order to check that the infinitesimal generator A is of this form, let x , f(x, w)
be, for w = v and w = v, a bounded twice-differentiable function with bounded
derivatives. Then, as s ,^ 0,
I f ( x , w)
+E((-x)- - ------ Yo=x, vo=wl+o(s)
\\ Gx
J (e
S -,
+ o(i)) dr + o(s)
0
i?f ( x, w)
= f(x, w) + as(f (x, - w) - f(x, w)) + w s + o(s). (7.40)
3x
Proposition 8.1
(a) For every i there exists at least one j such that i - j.
(b) Ifi-->j,j- .k then i-k.
(c) If i is essential then i H I.
(d) If i is essential and i - j then "j is essential" and i .-+ j.
(e) On 41 the relation "+-." is an equivalence relation, i.e., reflexive,
symmetric, and transitive.
continuous parameter chains are not periodic. More precisely, one has the
following proposition.
Proposition 8.2
(a) For each state i, p, (t) > 0 for all t > 0.
(b) For each pair i, j of distinct states, either p 1 (t) = 0 for all t > 0 or
p ij (t)>0 for all t>0.
C P14
(
n
n )
t ... P^ n
t Pi^( n
= Puu( n
P to = 0
)
t ^ < P(to),
\ n)
for all positive integers n. Taking the limit as n --' oo, one gets by continuity
that lim p ;; (t o /n) = 0, which is a contradiction to continuity at t = 0 and
the requirement p(0) = 1.
(b) Let i, j be two distinct states. Suppose t o is a positive number such that
p, (t o ) = 0. Since p i,(t o ) > p ;; (t o s)p ; ,(s) for all s, 0 < s < t o , and since
p ;i (t o s) > 0 it follows that p. s (s) = 0 for all s <, t o . Then q i; = p(0) = 0, and
= 0. It follows from (5.4) that P; (X To = j) = 0. This implies that k ; ; = 0 for
every j' such that k uj ' > 0. For, if k ;j . > 0 and k j . j > 0, then no matter how small
t is (t > 0), one has, denoting by {} n : n = 0, 1, 2, ...} the embedded discrete
parameter Markov chain,
This contradicts "p 1 (s) = 0 for all s < t o . Thus, k; = 0. In this manner one
may prove that k;j = 0 for all n. This implies p 1 (t) = 0 for all t.
) n
A state i is recurrent if
A state i is transient if
Thus, if j 0 i, then p ;j is the probability that the particle starting at i will ever
visit j. Note that p ;; is the probability that the particle will ever return to i after
having visited another state.
Theorem 8.3
(a) A state i is recurrent or transient for {X} according as i is recurrent or
transient for the embedded discrete parameter chain { }"} having
transition matrix K.
(b) i is recurrent if and only if p i; = 1.
(c) Recurrence (or transience) is a class property.
Proof. (a) Suppose i is recurrent. This clearly implies that P1 (Y" = i for infinitely
many integers n) = 1. Conversely, suppose i is a recurrent state of the
discrete-parameter embedded chain { }"}. Let
for r = 2, 3, ... , denote the times of successive returns to state I. Under Pi the
random variables r; 11 , r; 2 > . , T;'^ ( ' -1) are i.i.d. and positive. Hence,
the nth partial sum t;" ) converges to oo with probability 1 as n --^ cc (Exercise 1).
Therefore (8.2) holds.
If i is transient for {X,}, then i cannot be recurrent for { }"} since this would
imply i is recurrent for {X}. This implies that i is transient for { }"}, since, for
a discrete-parameter chain a state is either recurrent or transient.
(b) Since transitions occur only at times To , To + T,, To + T, + T2 , ... , it
follows that
In other words, the p ;; defined by (8.51 for {X) are the same as those defined
for the embedded discrete-parameter chain. Hence, (b) follows from (a) and
Proposition 8.1 of Chapter II.
(c) i is recurrent and i i--> j for {XX } implies "i is recurrent and i -J for { Y},"
which implies that j is recurrent for { Y}, which, in turn, implies thatj is recurrent
for {X}. n
.:=E< oo . (8.8)
Y ,,fr' +1)
t
= (i)
i
T (r) r' = 1
lim = lira = E(r; Z) - r; l) ) = E ; T; 1) . (8.9)
r-+ao r r
(E;J
'
0
TS 1 )
f (XS ) ds < oo , (8.10)
then applying the strong law of large numbers to the i.i.d. sequence
I,
Zr = f(XS)ds (r = 1,2,...), (8.11)
t 0
('
t 0
^
t r l=1
Zr +
1
t
`
.f (Xs) ds. (8.13)
The first and third terms on the right-hand side of (8.13) go to zero as t - 00
with probability 1 (Exercise 2). By (8.9), (v, /t) -+ (E 1) ) -1 with probability 1
as t -+ oo. Therefore we have the following theorem.
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS 307
1
lim ds=(E;T;'1) `E;
f(X,)ds, (8.14)
. t JO
1
lim p,.(s) ds = (Et''. (8.15)
t_.0t 0
7r, = (E; 25 1 ^) -1 , j E S.
(c) If (8.10) holds, then the limit in (8.14) also equals ER f (Xo ) = Y j n j f (j).
To state the analog of Theorem 10.2 of Chapter II, write f(j) = f(j) E n f (Xo )
and
T
1 CTI
-
WT (t) = f (Xs ) ds (t >, 0). (8.16)
"T fo
Theorem 8.6. Assume, in addition to the hypothesis of Theorem 8.4, that
t^ 1z
Q 2 .= E ; f (XS ) ds) < oo . (8.17)
0
Then, as T -+ oo, the stochastic process {WT (t): t >, 0} converges in distribution
to Brownian motion with drift zero and diffusion coefficient
'Q Z .
b 2 = (E j rj' ) ) - (8.18)
communicating class, the Markov chain {X,} is positive recurrent if and only
if it admits an invariant probability distribution. The "only if" part of the last
statement follows from Corollary 8.5(b). For the sufficiency see Exercise 12.
In order to find an invariant probability n, one may try to differentiate the
right side of the equation n'p(t) = n' to get
n'Q=0. (8.19)
l#1N2* Yx
= co. (8.20)
no20 + it1A1S1 = 0,
(8.21)
ni Ai + ni+it5i+1Ai+1 = 0 for j >, 1.
__ -o ir _ (AO^ 0i2' f
o or j2.
J 2. ( 8.22 )
1 b1 1 o ' ' l S 1S 2' S
..
no 1 i -i 1 (8.24)
= ( 3162.... ai
=1 ( 20 ) o'2
in (8.22) to get the unique invariant initial distribution. Thus, the necessary and
sufficient conditions for positive recurrence (or for the existence of a steady state)
are (8.20) and (8.23).
The N-server queue described in Example 7.3 is a continuous-parameter
birth-death chain on S = {0, 1, 2, ...} with infinitesimal rate parameters given
by (7.12). In particular, the process has a reflecting boundary at zero so that
the criterion for recurrence and positive recurrence apply. The series (8.20)
diverges or converges according to the divergence or convergence of
N!
= - l y I ( a I y (8.25)
Thus, the process is recurrent if and only if > a/N. Similarly, the series (8.23)
converges or diverges according as the following series converges or diverges,
4 i2 ... v i = i ay -i
6162 y ! y N+ 1 a + N
Y=N A y Y=N N.N
NN
k))(a+N)-I. (8.26)
Thus, the process is positive recurrent if and only if> a/N. It is null recurrent
if and only (f = a/N. In the case > a/N, the invariant initial distribution is
given by
1 _1 a' ( 1_-_
= it o (for 2 j N),
.
^ 1 (a + ) ito j! (a +j) it
_ a 'N N 1
(8.27)
7r ' N N! (a + N) 710 (j > N),
-I
1+ N 1 + 1_ a 'NN
0
;^j! II)i
(a 1
(+J) ;_ (a + N) Nj N(
where
v=X ^+^^+z
.
..
y
p xc = Px ({X,} ever reaches c) = lim 1 for x > c.
a.^
d-1
6c+iSc+i
... 8v
1+1
... y )
v=c +i c+ic+z
(8.29)
Thus, for x > 0, p x , = I and, more generally, p x, = 1 for x > c, if and only if
v= i
6 1 6 2 ... by
iz ...
v
= cc. (8.30)
But if p xo = 1 for all x > 0 then Px (( = cc) = 1 for all x > 0, since the holding
time at 0 is infinite. So there is no explosion if (8.30) holds. If the series in
(8.30) is convergent, then p so < 1 for x> 0, and p x, < I for x > c. However,
this does not necessarily mean explosion. For example, let x = p, 6 x = 1 p
for x>0 with 2<p<1, also take A x =2>0 for all x>0,g 00 =0. Then by
Proposition 6.1, the process is conservative and, therefore, nonexplosive.
N
W;(z, t) = E;(z ") _ Y_ z'pri(t) (8.32)
j=o
so that, differentiating both sides with respect to t, using (2.19), and regarding
X --^ zx as a function of X for fixed z,
U4,1
NrB 9 i (z, t) + rz (z, t). (8.33)
az
Collecting similar terms in the last expression, one gets
`P^
a at t) _ (1 z)(rB Z + rA a ) ^az'
t) NrB (1 z)^p;(z, t). (8.34)
Introduce a real parameter 0 and in the (z, t) plane define the (family of)
parametric curve(s) 0 + (z(0), t(0)), by
d z(9) dt(B) = 1.
--(1 z)(rB Z + rA), (8.36)
dO dO
Now note that along such a curve the above directional derivative is none other
than d[cp (z(0), t(6))]/d8, so that one has
;
d^P^(z( 8 ), t( 8 )) = Nr
B (1 z)gqr(z( 0 ), t(0)), (8.37)
dO
rA + rB (1 + cee) i
t( 8 ) = 0, z(0) _ r" + . (8.39)
rB rB
Different values of c give rise to different curves of this parametric family. Each
point (z, t) in the planar region {(z, t): 1 < z < 1, t > 0} lies on one and only
one curve of this family. Thus, if we fix (z, t), then this point corresponds to c
obtained by solving for c and 0 with t(0) = t, z(0) = z in (8.39). That is,
0 = t'
c=e_(rA+rH)l rA arB/\z+ra/
1
J . (8.40)
dz =
((1z)d9=^ l log(z+ A
r ), (8.41)
J rB z + rA re \ rB
so that
Specifically,
N(l
d' = zi(0) ( z(0) + rA^ N = r " + rB
z(0) ^
)-
+ c) -N
rB rB
_1_rA+rA+ rB
(1 +c)-J( rA +rB)-N(l+c)-N(8.44)
rB rB rB
NN
t) =[ (1 + c(t, z))' rA B B r) (1 + c(t, Z))Z + rB/
rB rBJ ^ rA
(8.45)
From (8.45) and (8.40), by appropriate differentiation, one may calculate E.XX
and Var X. Moreover, as t oo, c(t, z) -+ 0. Hence,
;
)N(Z
r
lye qi(z, t) = (8.46)
ra rB
(_ + )N,
which shows that, no matter what the initial state i may be, as t - cc, the
limiting distribution of XX is binomial with parameters N and p = rB /(r,, + r ).
X00 XOm
X0 = , ..., X.
xm0 Xmm
.X11 .. Xlm
B = ( XOXI ... xm ) = X (9.1)
Then
...
QB = (a0X0a1x1 mXm) ,
Q B = (px0a1X1 ... pt
m X. )
Hence
0o t o ao t o
tQ ' n n n
e B= Q) B = (OXOGL'1X1...oc"X.)
n=o n. n=o n.
= (et^wxoetatx1...eta' X,) = B diag(et^o, . .. , et"'). (9.2)
Therefore
e tao
etas
0
e`Q=B 0 B-1. (9.3)
with 0 <, < 1 for 1 < i < N 1. The invariant distribution n, which exists
and is unique, satisfies n'p(t) = n'. Differentiation at t = 0 yields
n'Q=0, (9.4)
or, in detail,
n o A o + rz 1 A 1 5 1 = 0,
nN-1YN-1AN-1 nNAN = O.
Ao
zt 1 = ial 1c o ,
7Tj _ Ao 12"';-1
n o for 2<,j<N-1,
ki 12" S;
7C o =[i +
o + Ao Nl
...j _ 1 1.
(9.7)
A1S1 j=2 Aj 1... 5J
Y_
N-1
(Qx, Y)n = ( - 2 0X0 + 2 0X1)Y0it0 + Y (bjAjX;-1 2;X; + ;A;Xj+l)Yjlrj
;= 1
+ ( ANXN -1 A NXN)YNltN
N
1 OX0Y0it0 ANXNYNiN + 6j Aj x j lY; "j
j=1
N-1 N-1
1jxjyjij +
N-1
_ A :XoY0it0 ANXNYNiN I Ajx j Y; 7r ;
j=1
N N
+ + (9.10)
j= 1 j=1
Equivalently,
Therefore,
N
pij(t) = y- xi k e tak x jk 7C j . ( 9.15)
k=0
kn
ak =cos N 1 (k=0, 1,2,...,N), (9.17)
Hence, by (9.15),
N
Y e(cos(krz/N)-1)tx.kxjki,
pi(t) _
k=0
10 ABSORPTION PROBABILITIES
Proposition 10.1. Let p(t) and $(t) denote the transition probabilities for the
distributions PI and P; generated by Q and Q, respectively. Then, for i 0 A,
we have
the backward equations for p ;; (t), with i, je S\A, take the form
p '(t) = Q P (t), (10.6)
where p and Q are obtained from p and Q by deleting all rows and columns
corresponding to states in the set A.
At any particular time t, the site is in one of three possible states a, b, or c, say,
depending on whether the site is (a) occupied by an acceptable molecule,
(b) occupied by an unacceptable molecule, or (c) unoccupied. The evolution of
states at the given site is, according to the above assumptions, a continuous-time
Markov chain {X} with state space S = {a, b, c} starting at c and having
infinitesimal rates given by
a b c
a 0 0 0
Q = b 0 . (10.7)
c 2 (1 a)A A
and
b c
. (10.10)
Q c[(1 a)A A]
In particular,
1
x = x 1
[r, +i ri+
2
(10.14)
ABSORPTION PROBABILITIES 321
detB=det 1 = (r 2 r 1 ) 0. (10.15)
[ r, + r 2 +
/22{er2t e rlt}
_ I {(r2 + )e r (r1 + )er2(} 2{e
(r2 r1) (rl + )(r2 + ){e " e r2l } {(r2 + (rl + )e r'I }J
(10.16)
Therefore, from (10.16) and (10.8) we obtain
Pc (z a > t) = ^
r I ^r 2 (1 + r^er'` r i (1 + 2 ) r 2 `]. (10.17)
Assume that the offspring distribution {f} has finite second moment with mean
< 1. Then h'(1) <0 and h"(1) < oo. In particular it follows from Proposition
5.7 that p = 1. We will exploit the special Markov branching structure to
compute the tail probability P,(t {0} > t) for large values of t under these
assumptions. Let
We have,
(rg'' ds )
J h(s) = t. (10.24)
Y
Now,
where (1 - ) = h"(1) < cc. Thus, under the assumptions for (10.21),
1 _ 1 _ 1
h(s) K,(s 1) + z^(S)(s 1) 2 (s)(s 1)
K 1 (S-1) 1+
2K 1
^(s)(s l )
= 1 1 2K1 = 1 + qO(s), (10.25)
K(S 1) 1 + ^(s)(s 1) K,(s 1)
2K 1
where
c(s)
((s)= 2K,
^( s)(s - 1 )
I+
2K 1
11
(s)
h(S) K I (s 1)
ABSORPTION PROBABILITIES 323
Then H'(x) = 1/h(x) > 0 for 0 < x < I and (10.24) may be expressed as
H(x) =
_J t cp(s) ds + 1 log(1 x)
x K1
1
_ 1 1 x^p(s) ds (1 x) + log(1 x)
x 1
log(1 2
x) = K 1 H(x) K (1 x) + o(1 x). (10.30)
2K 1
Therefore, for x 1 , -
Now, by (10.22), (10.28), and (10.31) we have for t oo, y,:=H - 1 (t + H(0)),
where
K1 = 2(1 )>0, c=e'Ieto>>0.
Z ds
k(z, co) := tim k(z, t) = 1 exp K l (10.34)
roo o h(s)
Y_
cm( 6 ) = I 2am( a m-I + a m+1 ). (11.1)
Time
Space
m n
... (+) (+) (-) ... ( ao(o))
(-) (-)
Figure 11.1
6 n (t) = 1 if the (negative) flow reaches (n, t) from some initial point (m, 0)
= +1 otherwise. (11.3)
More precisely, the flow is defined to reach (n, t) from (m, 0) if there are times
0= t o <r 1 <<z k+1 =t and sites m= n 0 ,n 1 ,...,n k =n such that an
arrow occurs from (n ; _ 1 , T i ) to (n;, t ; ) and there are no b's along the half-open
vertical intervals (excluding lower end points) joining (n t i ) to (n i ),
i = 1, ... , k. In this case (n, t) and (m, 0) are said to be connected by a
continuous-time nearest-neighbor percolation.
Y_
The transition from a m (t) = Ito a m (t) = +1 occurs if and only if t =
is an event time of the process {Nmn (t)} for some n such that a n (t) _ +1.
Therefore the flipping rate from 1 to + I is
Y_
The transition from a m (t ) = + 1 to Q m (t) = 1 occurs if and only if t = T is
an event time of the process {Nmn (t)} for some n such that a n (t ) _ 1.
Therefore the flipping rate from + 1 to 1 is
that the value Q,(t + s) must coincide with Qm ,(t) for some m i e Z", where m ;
depends only on a(u), u >, t, but not on a(u), u < t. In the case of Figure 11.1
the history of v(t + s) given a(u), u < t, shows that the voter at n at time t + s
is copying the voter at m at time t (see theoretical complement 1).
The two configurations a + and a - representing total consensus, a = + 1 o
and a. _ -1 for all n, are absorbing. In particular this makes each probability
distribution of the form p . + ( 1 - p)S a -, for 0 < p < 1, an invariant
Q
Then,
`_
T,f(a) .f (a) = Ea{.f (d'(t)) .%(a)}
_Y_ m
{f() f(a)}a n (a)l +O(t), (11.5)
Af(a) = lim
t - a + t
CHAPTER APPLICATION 329
E { f (a(t
Q )) f (a)}
= lim
r-0 1
tp e (t, D) j^ l
=
Ea
kn Q( ) (t) kfl 6 k(t) Cm(a(t))
t
_ 2d Z E,, j
meD l fl a (t)] L 1
t keD
k am(t) I Pmn an( t )
n J }
_ 2dIDI cp e (t, D) + 2d 1 p mn q (t, (D\{m})A{n}), (11.9)
meD n
acp (t, D)
2dIDjcp (t, D) + 2d [' / t, / D m 0 J n
a t mED n
Now, for the simple symmetric random walk on 7L , one can show that the only
bounded solutions to (11.11) are constants (theoretical complement 3).
Therefore, the distribution of a m under the invariant equilibrium distribution
is independent of m in this case. From here out we will restrict our consideration
to the long-time behavior of various translation invariant initial distributions,
with say,
In particular, we will consider what happens in the long run when voters initially
are, independently, + 1 or 1 with probabilities p, 1 p, respectively.
The equation (11.9) or (11.10) suggests consideration of the following
finite-particle system. A particle is initially placed at each site of the finite set
Do = D. Each of these particles then undergoes a continuous-time random walk
according to p = ((p m )) with exponential holding times with parameter 2d,
independently of the others until two particles meet. If two of the particles meet,
then they are mutually annihilated. Let D, denote the collection of sites occupied
by particles at time t > 0. The evolution of the stochastic process {D,} takes
place in the (denumerable) state space S = q(d) consisting of all finite subsets
of Z'. The empty set is absorbing for the process. If D, : 0 then in time t to
t + At a change to (D,\{m })i {n}, for some me D, and ne Z', occurs with
probability 2dp mn At + o(At) as At --* 0+. Other types of changes in the
configuration have probability o(At) as At --^ 0+.
Now consider that for fixed a e S we have by (11.9) that
0 9.(t, D) = p *
^VQ(t, )(D), (11.13)
at
together with
(p.(0, D) = [1 a m , (11.14)
mED
The representation (11.16) is known as the duality equation and is the basis for
the proof of the following major result.
Theorem 11.1. Let p = ((p m )) be the transition probability matrix of the simple
symmetric random walk on 7L associated with the simple symmetric voter
model on 7L . For an initial probability distribution satisfying (11.12) we have
the following:
(a) If d < 2, then p, converges in distribution to p Q+ + (1 - p)S,, _ as t -* oo.
(b) If d >, 3, then for each p e (0, 1) there is a distinct translation-invariant
equilibrium distribution v ) on (S, .F ), which is not a mixture of 6 Q+
and S a _, such that E,,,p,v m = 2p - 1. Moreover, if the initial distribution
It is that of independent 1-valued Bernoulli random variables with
probability parameter determined by (11.12), then , converges in
distribution to v 0) as t -+ co.
_J [ ESml fl T k ] 1t(da)
where p mk (t) is the transition law for the continuous-time random walk. Also
for distinct sites n and m in 7L" we have
Therefore, it is enough to show that cp(t, {n, m}) - 1 as t -> oo to prove (a).
Now since the difference of the two independent simple symmetric random
walks starting at m and n evolves as a (symmetrized) continuous-parameter
simple symmetric random walk, it follows from the recurrence when d < 2, that
0 a.s. as t --> oo; i.e., the particles will eventually meet. Using the duality
relation and the Lebesgue Dominated Convergence Theorem we have, therefore,
To prove (b), on the other hand, take for It the Bernoulli product distribution
332 CONTINUOUS-PARAMETER MARKOV CHAINS
yl,(t, D) = E 1 , ED H 6 m} = ED Et fl amt = ED fl EN am
l meD, { meD1 J meDt
where 1D1 1 denotes the cardinality of D1 . Now since JD I is a.s. nonincreasing, the
limit, denoted FDJ, exists a.s. and is positive with nonzero probability by
transience. The limit distribution v P is defined through its block correlation
( )
f ( = lim
s-. f
s s
11)p(t; a, dq)g,(da), (11.22)
The property that, for each bounded continuous function f on S. the function
a * E f (a(t)) = f s f (rl) p(t; a, dq) is also a bounded continuous function, is
Q
called the Feller property. As illustrated in the above proof (Eq. 11.22), this
property is essential for confirming one's intuition about invariance properties
of long-time limits under weak convergence. The other important role played
EXERCISES 333
by topology in the above was in the use of compactness to, in fact, get weak
convergence from finite-dimensional calculations.
EXERCISES
4. Prove that the compound Poisson process (Example 2) has independent increments.
5. Consider a compound Poisson process {X,} with state space U8' and an arbitrary
jump distribution p(dx) on (f8'.
(ii) Show that {X} is a Markov process and compute its transition probability
p(t; x, B)'= P(X, E B I X0 = x).
(ii) Compute the characteristic function of X,.
6. (Doubly Stochastic Poisson or Cox Process) Suppose that the parameter A (mean
rate) of a homogeneous Poisson process {X} is random with distribution
p(dA) = f(A) d). on (0, ro). In other words, conditionally given A = ) o , {X} is a
Poisson process with parameter A 0 .
(i) Show that {X} is not a process with independent increments.
(ii) Show that {X,} is (generally) not a Markov process.
(iii) Compute the distribution of X, for arbitrary but fixed t > 0.
(iv) Compute Cov(X ,, X,).
.
(i) Use the foward equation to show m(t) _ ( S)m (t), m ; (0) = i.
;
+ + (1 e ^e_ 1)] if 8
sr(t)=
i(i + 2t), if = b.
*5. Consider a compound Poisson process {X,} on III' with an arbitrary jump distribution
p(dx).
(i) For a given bounded continuous function f on II' compute
at u(t, x) = Au(t, x) + J
.i J u(t, x + y)(dy) = A (u(t, x + y) u(t, x))(dy).
EXERCISES 335
, lo at
where Q is the integral operator (Qf)(x) _ ) f (f (x + y) f (x)) p(d y).
(iii) Write the (backward) equation (i) above for u(t, x) in terms of Q.
6. Let { Y} be a nonhomogeneous Markov chain with transition probabilities
p (s, t) = P(Y = j ( Y = i), continuous for 0 _< s _< t, with p ;i (s, s) = b, i , i, j e S, and
S
such that
(ii) For finite S show that the backward and forward equations, respectively, take
the forms below:
(forward)
a
P,k(S, t)
dt _ P,;(s, t)9;k(t)
i
10. (Continuous-Parameter Plya Process) Fix r> 0 and consider a box containing
r = r(r) red balls and b = b(T) black balls. Every r units of time a ball is randomly
selected, its color is noted, and together with c = c(z) balls of the same color it is
placed in the box. Let S T denote the number of red balls sampled by the time nr,
n = 0, 1, 2, ... , S o = 0. As in Eq. 3.14 of Chapter II, {S,} is a discrete-parameter
nonhomogeneous Markov chain with one-step transition probabilities
and
r + ci
p,.;+ (nt,(n+1)a)=P(S^, +nz =i +1 S,, =i)= i= 0,1,...
r + b + nc '
Let p = p(r) = r/(r + b), y = y(r) = c/(r + b), t = nz. Then the probability of a
transition from i to i + 1 in time t to + t is given by
p+yi
(t,t+T)=
1+yt
i
Note that p = r/(r + b) is the probability of selecting a red ball at the nth trial and
np = (p/z)t is the expected number of red balls sampled by time t = nr, i.e., p/i is
the mean sampling rate of red balls. Suppose that p/T = p(r)/r -+ 1 and
= Y(i)/t -. y o > 0 as t - 0.
(i) For fixed r > 0, use a combinatorial argument to show that the distribution of
S,, is given by
(
j (b+r)(b+r+c) (b+r+nc-c)
where
+ iy o
qi,i+,(t) = l = q(t), q 1 (t) = 0, j : i, i + 1 .
l + ty o
z
r;j (t)= S, j +q ;; t +q;j 1 t +
2!
for small t > 0, and use the fact that the product of matrices with all entries
nonnegative has itself all entries nonnegative together with the property
e to = (e(>Q)'.]
2. Write out the third iterate (of Eq. 4.7) p; for the case S = {l, 2, 3}, q,, i = A,
q1,2 = A q 22 = qz.3 = , q3.3 = y, q 32 = y, q; = 0 otherwise for i, j e S.
3. (A Pure Death Process) Let S = {0, 1, 2, ...} and let q_ = 6 > 0, q ;; _ S, i > 1,
= 0, otherwise.
(i) Calculate p (t), t 0 using successive approximations.
;;
f(v) =
f" '
e f(t) dt.
-
i(v) ^()
F(v, ) _ v ^ .
V/1
(iii) Show that the backward and forward equations (see also Exercise 5) transform,
respectively, as
p(v) = A + Q(v)
induction, using (4.7), that p(v) > ( " ) (p)A for n = 0, 1, 2, ... , and therefore
(v) Use [F] to prove the reverse inequality and hence (ii). [Hint: Check
r(v) = p(p) + (u - v)p(p)p(v) solves [F] and use minimality.]
7. Compute p ;k (t) for all i, k in Example 1.
C t"
I )/40 Ak 2 )
'
is asymptotically (as n -* co) Gaussian with mean 0 and variance 1. [Hint: Use
Liapounov's central limit theorem, Chapter 0.]
6. Messages arrive at a telegraph office according to a Poisson process with mean rate
of occurrence of 4 messages per hour.
(i) What is the probability that no message will have arrived during the morning
hours (8 to 12)?
(ii) What is the distribution of the time at which the second afternoon message arrives?
340 CONTINUOUS-PARAMETER MARKOV CHAINS
7. Let {X,} and {Y} be independent Poisson processes with parameters 2 and p
respectively. Show that the probability of n occurrences of {Y,} within the time
interval from the first to the (r + 1)st occurrence of {X,} is given by
C n r-1 1)(p
A + pJ \.A + p) n =
'
0, 1,2 , ... .
8. Let {X} be a Poisson process with parameter A > 0. Let To denote the time of the
first arrival.
(i) Let N = X ZTo X T ,, and calculate Cov(N, To ).
(ii) Calculate the (conditional) expected value of the time To + + T,_, of the
rth arrival given X, = n > r.
>_
9. Suppose that two colonies, I and 2, start as single units and independently undergo
growth by pure birth processes with rates 2 , respectively. Calculate the expected
size of colony 1 at the time when the first offspring is produced in colony 2.
10. Consider a puredeath process with rates q ;; = q 1 _ 1 = 2>0, i 1, q ; = 0
otherwise.
(i) Calculate p(t).
(ii) If initially there are n, particles of type 1 and n 2 particles of type 2 that
independently undergo pure death at rates S,, 5, respectively, calculate the
expected number of type 1 particles at the time of extinction of the type 2 particles.
11. Let {N,} be a homogeneous Poisson process with parameter 2>0. Define
X, _ (-1) ^ `, t 0. Show that {X,} is a Markov process and compute its transition
probabilities.
12. Consider all rooted binary tree graphs having n sources (i.e., degree-one vertices
excluding the root). Call any edge incident to a source vertex an external edge and
call the others internal (see Fig. Ex.IV.5). [By a rooted tree is meant a tree graph
in which a vertex is singled out as the root. Other graph-theory terminology is
described in Exercise 7.5 of Chapter II.]
n=4
-source
Internal
Root
Figure Ex. IV.5
_>
EXERCISES 341
(i) Show that a rooted binary tree with n sources has n external edges and n t
internal edges, n 1. In particular, the total number of edges is 2n 1, which
is also the total number of vertices excluding the root.
(ii) Show that the following code establishes a one-to-one correspondence between
the collection of all rooted binary tree graphs and the collection of all simple
polygonal paths from (0, 0) to (2n 1, 1) in steps of + I that do not touch
or cross the line y = 1 prior to x = 2n 1. Starting with the edge incident
to the root, traverse the tree along the leftmost path until reaching the leftmost
source. Then follow back until reaching a junction leading to the next leftmost
source, and so on, recording, on the first (and only on the first) traverse of an
edge, + if it is internal and if it is external. The path 2n I long of (+ )s
and ()s furnishes the coding of the tree in the form of a "random walk
excursion."
(iii) Use (ii) and the reflection principle (Section 4, Chapter I) to calculate the
distribution of ML in Example 3.
(iv) Let g(x) = Ex M I denote the probability-generating function of ML . Use the
recursive structure of the tree to establish the quadratic equation
g(x) = Zx + ? g 2 (x).
(v) Use (iv) to give another derivation of the distribution of ML .
13. Show that the holding time in state j for Example 2 is exponentially distributed
with parameter i t = jA (I? 1).
>_
3. Show that {p(t): t > 0} are transition probabilities on S v {A} satisfying the
backward and forward equations.
4. (i) Consider an initial mass of size x o that grows (deterministically) to a size x, at
time t at a rate that is dependent upon size; say, x; = f (x,), t 0. Give an example
of a growth-rate function f (x), x 0, such that the mass will grow without bound
within a finite time > 0, i.e., lim,_, x, = co. [Hint: Consider a case in which
each "element" of mass grows at a rate proportional to the total mass, so that
the total mass itself grows at a rate proportional to the mass squared.]
(ii) Let X, denote the number of reactions that have occurred on or before time t in
a chain reaction process. Suppose that {X} is a pure birth process starting at 0.
Show that the expected number of reactions that occur in a finite time interval
[0, t] need not be finite.
(A Bus Stop Problem) A passenger regularly travels from home to office either by
bus or by walking. The travel time by walking is a constant t w , whereas the travel
time by bus from the stop to work is a random variable, independent of the bus
arrival time, with a continuous distribution having mean t b < t w . Buses arrive
randomly at the home stop according to a Poisson process with intensity parameter
2. The passenger uses the following strategy to decide whether to walk or ride. If the
bus arrives within c time units, then ride the bus; if the wait reaches c, then walk.
Determine (optimality) conditions on c that minimize the average travel time in terms
342 CONTINUOUS-PARAMETER MARKOV CHAINS
of incoming calls. If a call arrives at a time when the line is busy, the call is simply lost.
(i) Give the corresponding prescription of Kolmogorov's backward and forward
equations for the state of the line.
(ii) Solve the forward equation.
(ii) The failure rate (also called hazard rate or force of mortality) for the objects
being renewed is defined by h(t) = f(t)/[l - F(t)], where f is the p.d.f. (assumed
to exist) of F. For continuously differentiable functions g having bounded
derivatives show, for a _> 0,
(iii) Show that (a) = Z -1 exp{ - J$ h(u) du}, a >_ 0, where Z is the normalization
constant, solves f o u(a)Qg(a) da = 0. In particular, show that p is the density
of an invariant probability for {A,}.
(iv) Show (i) holds also for the residual lifetime { SN , - t}, where N, = inf {n >_ 0:
S> t}. For ET, < oc, show that ff (t) = (1 - F(t))/ET t >, 0, is the density
of an invariant probability. [The above has an interesting generalization by
EXERCISES 343
N,
It `_ li0.1)(e tn), t > 0, Vo = 0,
n =o
where {N, = max {n: i n < t}} is the Poisson counting process. Show that {}} is a
Poisson process with intensity parameter (1 - p)2.
7. (i) (Vibrating String) For Example 5, check that in the (deterministic) case
p = ar = 0, u(x, t) = 2 {cp(x + vt) + cp(x - vt)}. In particular, that u(x, t) solves
=v2-
[Z x2 , u(x, 0) = cP(x), nt
au t) =0 at t = 0.
diffusion equation
u au
at = D 3x2'
x+Ax
R4x Lex
when there is a change in the flow of current aw/at in an inductor then there is a
corresponding voltage drop of L Ax Ow/Ot. According to Kirchhoff's second law, the
sum of potential drops (as measured by a voltmeter) around a closed loop in an
electric network is zero. Thus, for Ax small, one has
The nature of a capacitor is such that the ratio of the charge stored to the voltage
drop is the capacitance C Ax. Thus, the capacitor current, being the time rate of
change of charge, is given by C Ax au/at. According to Kirchhoff's first law, the sum
of the currents flowing toward any point in an electrical network is zero (i.e., charge
is conserved). Thus, for Ax small, one has
au
w(x+ Ax, t)w(x,t) +CAx=0.
at
(i) Use the above to complete the derivation of the transmission line equations and
the telegrapher's equation (for twice-continuously differentiable functions) and
give the corresponding electrical interpretation of the parameters a, , and v 2 .
(ii) In the case when leakage is present there is an additional parameter (loss factor)
G, called the conductance per unit length, such that the leakage current is
proportional to the voltage G Ax u(x, t). In this case the term G Ax u(x, t) is
added to the left-hand side in Kirchhoff's first law and one sees that the voltage
and current satisfy transmission line equations of exactly the same general forms.
Show that precisely the same equation is satisfied by both voltage and current.
10. In Example 4, suppose that the service time distribution is arbitrary, with distribution
function F. Determine the distribution of W,.
2. Prove that the first and third terms on the right side of Eq. 8.13 go to zero (almost
surely) as t -+ oo, provided (8.10) holds.
EXERCISES 345
3. When is a compound Poisson process (i) a pure birth process, (ii) a birthdeath
process? Write down q1, k ;j in these cases.
4. Let {X,} be a birthdeath chain, and let c <x <d be three states.
(i) Compute Px ({X,} reaches c before d) in terms of the infinitesimal rates (i.e.,
bi).
(ii) Calculate p PX ({X,} ever reaches c) in the case {X} is nonexplosive. Briefly
comment on what may happen in the case that explosion may occur with positive
Ps - probability.
5. Given a birthdeath chain {X}:
(i) Derive a difference equation for m(x):= E X (a) where i is the first time {X }
reaches c or d (c 5 x d).
(ii) For the special case qzs = A., , = hx = i (c <x < d), compute m(x).
(*iii) Compute m(x) for general birthdeath chains.
6. Let p(t) = ((p, (t))) be transition probabilities for a Markov chain on a finite state
space S. Adapt Proposition 6.1 of Chapter II to show that lim, m p. (t) exists for
all i, Je S, and the convergence is exponentially fast. [Hint: Fix t > 0 and consider
the discrete-parameter one-step transition probability matrix q = ((p ;,(t))). Then
q" = ((p ; ,(nt)), n = 0, 1, 2, .... Use the semigroup property to show that lim"^ q;jl'
(exists) does not depend on t > 0.]
7. Suppose that {X,} is irreducible and positive-recurrent. Let {}"} be the embedded
discrete-parameter Markov chain with transition matrix K. Let a be the invariant
distribution for {X,} such that n' Q = 0. Calculate the invariant distribution for { Y"}.
(IP( ^)
) N' rc o ,
i
where p = N (< 1)
(ii) Show that the average length of time a customer must wait for service is
(NP)"
It o .
N(1 p) 2 N!
age _ z _ ag^'>
r)+b(1r)] ^r , Q=Q1, S=2S.
at =[(r
EXERCISES 347
3. (Relaxation and Maximal Correlation) Let {X,} bean irreducible finite-state Markov
chain on S with transition probabilities p(t) = ((p ;j (t))), t >, 0, and infinitesimal rates
Q = ((q i3 )). Let it be the unique invariant distribution and assume the distribution
to be time-reversible as defined in Exercise 2(ii) above. Define the maximal correlation
by p(t) = sup f , 9 Corr( f (XX ), g(X o )), where the supremum is over all real-valued
functions f, g on S and
Show that
p(t)=e2, t>_0
p(t) = e Qr _ k ^
_ Qktk
k!
the eigenvalues of p(t) are of the form e`^' where A. are eigenvalues of Q. Take
f = g = cp, to get <p(t)f, g>, = e` and, for the general case (centered and scaled),
expand f and g in terms of {q,}, i.e., f = J (f, cp) n cp,,, g = E (g, to show
ICorr n ( f (X,), g(X0 ))l _< e. Use the simple inequality ab _< (a 2 + b 2 )/2.]
4. Let {X,} be an irreducible positive recurrent finite-state Markov chain with initial
distribution p. Let lt j (t) = PI (X, = j), je S, and let it denote the invariant initial
distribution. Also let Q = ((q, j )) be the matrix of infinitesimal rates for {X}. Show that
d (t)
(i) ' = {(t)q 1 p (t)q 1 }, t >' 0, j e S,
dt tes
z(0)=p, j e S.
Yjj = _
n1g1j njgj,
Consider an electrical network in which the states of S are the nodes, and y jk is
the resistance in a wire connectingj and k. Suppose also that each nodej carries
a capacitance rrj . Then these equations are Kirchhoffs equations for the spread
348 CONTINUOUS-PARAMETER MARKOV CHAINS
of an initial "electrical charge" ; (0) = p,, j ES, with time (see also Exercise 7.9).
The potential energy stored in the capacitors at the nodes at time t, when the
initial distribution is (0) = t, is given by
(ul(t))^
U(t) = U(t(t)) = - Y z
2 ieS n%
One expects that as time progresses energy will be dissipated as heat in the wires.
(iii) Show that if p 96 it, then U is strictly decreasing as a function of time.
(iv) Calculate U(p(t)) for the two-state (flip-flop) Example 3.1 in the cases t(0) = 6 ( ,^,
i =0,1.
5. As in Exercise 4 above, let {X1 } be an irreducible positive recurrent finite-state Markov
chain with initial distribution it. Let ; (t) = PI (XZ = j), je S, and let it denote the
invariant initial distribution. Let h be a strictly concave function on [0, x) and define
the h-entropy of the distribution at time t by
1. Calculate the distribution G(t; 2, A ; , ... , A i ) appearing in Eq. 10.4. [Hint: Apply
the partial fractions decomposition to the characteristic function of the sum.]
2. Calculate the distribution of the time until absorption at j = 0 for the pure death
process on S = {0, 1, 2, ...} starting at i, with infinitesimal rates q 1 = ib if j = i 1,
i >, 1, q i1 = i8, q 11 = 0 otherwise. What is the average time to absorption?
3. A zero-seeking particle in state i > 0 waits for an exponentially distributed time with
parameter a. and then jumps to a position uniformly distributed over
0, 1, 2, ... , i 1. If in state i < 0, it holds for an exponential time with parameter
2 and then jumps to a position uniformly distributed over i + 1, i + 2, ... , 1, 0.
Once in state 0 it stays there. Calculate the average time to reach zero, starting in
state i.
*4. (i) Under the conditions stated in Example 2, show that conditioned on non-
extinction at time t, X, has a limit distribution as t -- oc with p.g.f. given by
Eq. 10.34. [Hint: Check that Zj o P,(X, =j I r (o) > t)r may be expressed as
{g,(' ) (r) g,(' ( 0 )}/( 1 g"()). Apply (10.28) to get
1 H"'(t + H(0))
(ii) Determine the precise form of the distribution of the limit in the binary case
.r2=p<9=.%,p+9= 1 .
THEORETICAL COMPLEMENTS
lb,(dx) = n J (I e 2
n[1 cPn( 2 5)] = n I J e 2
I J u'
),u,(dx)
J
= n [I cos(2^x)]p(dx) = 2n J [I
a
cos 2 (^x)](dx)
So, the sequence n[ 1 cp(2' )] k is bounded on [r, r]. This means that each cp(s ),
and therefore, cp(^) must be nonzero on [Zr, Zr]. In particular, there could be no
largest such r and this proves p has no real zeros.
If {X,} is a stochastic process with state space S c R' having stationary independent
increments, then for any s < t, the distribution of X, X, is clearly infinitely divisible
since for each integer n _> 1,
where B x :_ {y x: ye B} is a translate of B.
2. (LevyKhinchine Representation) The Brownian motion process and the compound
Poisson process are simple examples of processes having stationary independent
increments. While the Brownian motion is the only one of these that is a.s. continuous,
both are continuous in probability (called stochastic continuity). That is, for any t o >' 0,
X, X 0 in probability as t + t o since (i) for the Brownian motion, a.s. convergence
implies convergence in probability and (ii) for the compound Poisson process,
P(IX, X,I > e) < I e -z^` = 0Qt sI) for e > 0. Although the sample paths of
the Poisson and compound Poisson processes have jump discontinuities, stochastic
continuity means that there are no fixed discontinuities.
Stochastic processes {X} and {Y} defined on the same probability space and
having the same index set are said to be stochastically equivalent if P(X, = Y) = 1
for each t. Stochastically equivalent processes must have the same finite-dimensional
distributions. As an application of the fundamental theorem on the sample path
regularity of stochastically continuous submartingales given in theoretical complement
5.2, it will follow that a stochastically continuous process {X,} with independent
increments is equivalent to a stochastic process {Y} having the property that almost all
of its sample paths are right-continuous and have left-hand limits at each t, i.e., have
at most jump discontinuities of the first kind. Moreover, the process {Y,} is unique in
the sense that if { Y;} is any other such process, then P(Y, = Y, for every t) = 1. Without
loss of generality we may assume the given process {X} with stationary independent
increments to have jump discontinuities of the first kind. Such a process is called a
(homogeneous) Levy process. The prefix homogeneous refers only to stationarity of
the increments.
Proof. Let s < t. By sample path continuity, for any e> 0 there is a number
6 = S(c)> 0 such that
Let {E"} be a sequence of positive numbers decreasing to zero and partition (s, t] into
subintervals s = t / < t;" < ... < tk", = t of lengths less than b" = b(e,,). Then
) )
k k
X,X9 = E (XX ,1X1 )__ E X;" I
where X'> . .. , Xk^ are independent (triangular array). Observe that the truncated
)
random variables
THEORETICAL COMPLEMENTS 351
k
P(X, XS =R)>> I e".
Within this same context, the other extreme is represented by the following.
Theorem T.1.2. Let {XX } be a homogeneous Levy process almost all of whose sample
paths are step functions with unit jumps. Then {X,} is a Poisson process. 0
The proof of Theorem T.1.2 will be based on the following basic coupling lemma.
Lemma. (Coupling Bound). Let X and Y be arbitrary random variables. Then for
any (Borel) set B,
Proof. First consider the case P(X e B) > P(Y e B). Then
0-<P(XeB)--P(XeB,YeB)=P(XeB,Y0B)-<P(X ^ Y).
The coupling lemma can be used to get the following Poisson approximations
which, while important in their own right, will be used in the proof of Theorem 1.1.2.
where
Q based on the value of a random variable U uniformly distributed over (0, 1] (i.e.,
352 CONTINUOUS-PARAMETER MARKOV CHAINS
X = Q(U), (T.1.1)
where Q(x) e {O, 1, 2, ...} is uniquely defined for each 0 < x < I by the condition
that (an empty sum being zero)
Qt:) -1 Q(x)
q ; . (T.1.2)
j=0 j=0
Now let U 1 , U2 ,..., UU be i.i.d. uniform over (0, 1] and let G, - G F; = F,, be
the Bernoulli and Poisson distributions with parameter p ; , respectively, i = 1, ... , n.
That is,
G,({1})=p = 1 G ({0}),
; ; F; ({k})=p` e P', - k=0,1,2,....
k!
Then, by the coupling lemma, letting p _ ^i= 1 p ; and letting indicate the
corresponding "simulation" function as defined above, we have
since the distribution of the sum of independent Poisson random variables with
parameters p,,... , p is Poisson with parameter E" =1 p ; . Now from (T.1.3) we get
P(G;(U;) ^ F;(U,)) =
(e-v1 (I p;)) + (1 e-a'(1 + p;)) 5 p? (T.1.5)
.- {G (U ) = 0} - 1 -- {G (U ) = I}
; ; ; ;
0 1 pi e-n, (1 + p;)e-P' 1
I ' {(U,) = 0) I - {F (U;) = 1) -;I f-- { F (U,) > 1) -- ;
Figure TC.IV.1
THEORETICAL COMPLEMENTS 353
IF(J) FA(J)I 1
For this, first suppose p" = _, p ; > land let Z Z Z be independent Poisson random
variables with parameters a and Y"_ , p i A. Again use the coupling bound lemma
to bound the last term in (T.1.6) by
Proof of Theorem T.1.2. It is enough to show that for each t > 0, X, has a Poisson
distribution with EX, = At for some A > 0. Partition (0, t] into n intervals of the form
(t i _,, t i ], i = 1, ... , n having equal lengths A = t/n. Let A> = {X,, X1 l} and
SS _ Y"-_, I"? (the number of time intervals with at least one jump occurrence). Let
D denote the shortest distance between jumps in the path Xs , 0 < s -< t. Then
P(X, 0 S") -< P(0 < D < t/n) + 0 as n --, oo, since {X, 0 Sn } implies that there is at
least one interval containing two or more jumps.
Now, by the Poisson approximation lemma (ii), taking J = {m}, we have
i = 1, ... , n, and A > 0 is arbitrary. Thus, using the triangle inequality and the
coupling bound lemma,
P(XX = m )-- e xS
- IP(X, =m )P(Sn=m )I + P(Sn=m ) --- e -zl
m! m!
P(X, 0 Sn) + Inpn AI + npn
The proof will be completed by determining A such that np" Al 0, at least along
354 CONTINUOUS-PARAMETER MARKOV CHAINS
a subsequence of {np"}. For then we also have np = np"P(A) 0 as n > oo. But,
since P(X, = 0) = P(n"=, A;'') = (1 p")", it follows that P(X, = 0) > 0; for
otherwise P(A;" ) ) = 1 for each i and therefore X, > n for all n, which is not possible
under the assumed sample path regularity. Therefore np, _< n log(1 p") =
log P(X, = 0) for all n. Since {np"} is a bounded sequence of positive numbers,
there is at least one limit point A > 0. This provides the desired A.
The remarkable fact which we wish to record here (with a sketch of the proof and
examples) is that every (homogeneous) Levy process may be represented as sum of a
(possibly degenerate) Brownian motion and a limit of independent superpositions of
compound Poisson processes with varying jump sizes. Observe that if the sample paths
of {X,} are step function of a fixed jump size y 0 0, then {y 1 X1 } is a Poisson process
by Theorem 1.1.2. The idea is that by removing (subtracting) the jumps of various
sizes from {XX } one arrives at an independent homogeneous Levy process with
continuous sample paths. According to Theorem T.1.1, therefore, this is a Brownian
motion process. More precisely, let {X,} be a homogeneous Levy process and let
S = [0, T) x !J' for fixed T> 0. Let a(S) be the Borel sigmafield of S, and let
R + (S) = {A e g(S): p(A, [0, T) x {0}) > 0}, where p(A, B) = inf{Jx yl: x e A, ye B}.
The spacetime jump set of {X,} is defined by
For fixed we S2, v is a measure on the sigmafield a(S), and for each A e P4(S), v(A)
is a random variable (i.e., w --+ v(cw, A) is measurable). Moreover, for fixed A a P4(S),
the process v(A n [0, t] x U8'), t _> 0, is a Poisson process. To prove this, take A an
open subset of S and show that the process is a Levy process with unit jump step
function sample paths and apply Theorem 1.1.2.
Define (A) = Ev(A) for A e .4 + (S). Then visa measure on P4(S); countable
additivity can be proved from countable additivity of v by Lebesgue's Monotone
convergence theorem. Now observe that for each fixed B E .(OR' \ {0} ), the set function
(C) = v(C x B) is translation invariant on .4([0, T)). Therefore, v(C x B) is a
multiple (dependent on B) of the Lebesgue measure ICJ of C, i.e., v(C x B) = K(B)ICI,
where K(B) is the multiplying factor. Notice now that since p is a measure, K must also
be a measure on .P(R' \ {0} ). Define
Since there are at most a finite number of jumps in [0, t] of size 1/n or larger,
{uv^> Ii "i yv,(dy) is well-defined. However, in general, its limit may not exist pathwise.
By appropriate centering, a limit may be shown to exist in L 2 . Finally, one may
prove that this process built up from jumps is independent of the Brownian motion
component. In this way one obtains the following.
Theorem T.1.3. Let {X,} be a homogeneous Levy process. Then there is a standard
Brownian motion {B,} independent of {v,(B): Be .4(IV \ {0}), 0 < t < T} such that
X, = mt + /CT B 1 +
ff
'^
Yv1(dY)
1 +t y z K(dY) }
in distribution, where me R', Q 2 -> 0, and for fixed Be .(U8' \{0}), v 1 (B) is a Poisson
process in t.
Example 2
r = lim y z K(dy)
e10 IYI>s 1 + y
exists. Then,
X, = (m + r)t + 6 2 B, +
5 yv,(dy).
In this case
l
Ee' 4x, =exp{it(m+r)^t Z +t J -. [e'^Y - 1]K(dy)}.J J
Example 3. Ifm+r= 0, a 2 = 0, then
X, = lim
e10 f(IYI>e)
yv,(dy) and Ee = exp{lim t J
(((sly
(e'4Y 1)K(dy)
(IYI>el )))
One may then check the positive-definiteness using the bilinearity of the inner product;
see, for example, M. Ossiander and E. Waymire (1989), "On Certain Positive Definite
Kernels," Proc. Amer. Math. Soc., 107, 487-492.
The fractional Brownian motion {Zr } with exponent may now be most simply
defined as a mean-zero Gaussian process starting at 0 such that Z S and Z, have
covariance given by p (s, t). This class of processes was studied by Benoit Mandelbrot
and others in connection with the Hurst effect problem of Section 1.14; see
B. Mandelbrot (1982), The Fractal Geometry of Nature, Freeman, San Francisco (and
references therein) and J. Feder (1988), Fractals, Plenum Press, New York.
Additional reference: M. Taqqu (1986), "A Bibliographical Guide to Self-Similar
Processes and Long Range Dependence," in Dependence in Statistics and Probability
(E. Eberlein, M. Taqqu, eds.), Birkhuser, Boston.
THEORETICAL COMPLEMENTS 357
with the convention that the supremum over an empty set is 0. Thus UN is also the
number of uperossings of (0, b a) by {X} in time N.
Since X X N fork > [N/2], one may write (setting t 0 = 1)
IN121 [N^/'2` 1
XN X1 = Y- (XT2k-1 X T2k-2 ) + L^ (XT2k Xr2k - 1 ) (T.5.2)
k=1 k=1
To relate (T.5.2) to the number UN , let v denote the largest k such that ry k _< N, i.e.,
v is the last time _< N for an uperossing or a downcrossing. Notice that UN = [v/2].
Ifvis even, then XT2k XT2k-, >b a if 2 k<v , and =X N X N =0if2k>v.Now
suppose v is odd. Then XT2k XT2k _ 1 >_ b a if 2k I < v, and = 0 if 2k I > v, and
= XT2k 0 _> 0 if 2k I = v. Hence in every case
N/21
(Xr2k X T2k - 1 ) >- [v/2](b a) = UN(b a). (T.5.3)
k= 1
As a consequence,
[N/21
XN X1 >- Y- (XT2k - 1 XT2k - 2) + (b a)UN . (T.5.4)
k=1
Observe that (T.5.4) is true for an arbitrary sequence of random variables (or real
numbers) {Z}. Assume now that {Z} is a {.f}-submartingale. Then {X} is a
{.F}-submartingale by convexity. One may show, using the method of proof of
_> .
Theorem 13.3 of Chapter I, that {X Tk : k >_ l} is a submartingale. Hence EX
increasing in k, so that
E[ [N121
Y-
k=1
(XT2k-I XT2k-2 ) ] 0
Proof. Let U(a, b) denote the total number of uperossings of (a, b) by {Zn }. Then
UN (a, b) j U(a, b) as N I oo. Therefore, by the Monotone Convergence Theorem
(Chapter 0)
EIZNI + lal
EU(a, b) = lim EUN (a, b) _< sup < co . (T.5.6)
N1 N b a
Since this holds for every pair a, b = a + 1/m with a rational and m a positive integer,
and the set of all such pairs is countable, one must have liminf Z. = limsup Z. almost
surely. If lim Z. is not finite with probability 1, IZ,I * oo with positive probability
so that E limlZI = co. Then, by Fatou's Lemma (Chapter 0), lim EIZI = 00, which
contradicts the hypothesis of the theorem. The integrability of IZJ follows from
Fatou's Lemma. n
Proof. For a nonnegative martingale {Zn }, IZn l = Z. and therefore, sup EIZI =
sup EZ = EZ, < oo. By Fatou's Lemma (Chapter 0), EZ ^ = E(lim Z) S lim EZ, =
EZ,. n
2. (Martingales and Sample Path Regularity) The control over fluctuations available
through Doob's Inequality (Section 1.13) and the Uperossing Inequality (theoretical
complement 1) for (sub/super) martingales make it possible to find versions of such
processes with very regular sample path behavior. In this connection the martingale
property is of fundamental importance for the construction of models of stochastic
processes having manageable sample path properties (cf. theoretical complements 1.1
and 5.3). The basic result is the following. Recall the meaning of continuity in
probability from theoretical complement 1.2.
THEORETICAL COMPLEMENTS 359
Proof: Fix T> 0 and let Q T denote the set of rational numbers in [0, T]. Write
Q T = U ^ , R", where each R. is a finite subset of [0, T] and T a R, c R 2 c . By
Doob's Maximal Inequality (Section 1.13) we have
IS rl
P( max (X,I > AlE n= 1,2 ... .
tE R, j
Therefore,
In particular, the paths of {X,: t a Q T } are bounded with probability 1. Let (c, d) be
any interval in R and let U (T) (c, d) denote the number of uperossings of (c, d) by the
process {X,: t e Q T }. Then U (T) (c, d) is the limit of the number U ( " ) (c, d) of uperossings
of (c, d) by { X,: t E R"} as n -* co. By the uperossing inequality (theoretical complement
1), one has
EIXTI + j cj
EU 1 " ) (c, d)
dc
Since, the Ul" ) (c, d) are nondecreasing with n, it follows that U (T) (c, d) is a.s. finite.
Taking unions over all (c, d), with c, d rational, it follows that with probability one
{X,: t a Q 7_} has only finitely many uperossings of any interval. In particular, therefore,
left- and right-hand limits must exist at each t < T a.s. To construct a right-continuous
version of {X,}, define X, = lim5.., ;.SEQT X, fort < T. That {X,} is in fact stochastically
equivalent to {X1 } now follows from continuity in probability; i.e., X, = lim s -,. X, = X,
since a.s. limits and limits in probability must a.s. coincide. Since T is arbitrary, the
proof is complete. n
3. (Markov Processes and Sample Path Regularity) The martingale regularity theorem
of theoretical complement 2 can be applied to the problem of constructing Markov
processes with regular sample path properties.
Theorem T.5.4. Let {X,: t _> 0} be a Markov process with a compact metric state
space S and transition probability function
such that
(i) (Uniform Stochastic Continuity) For each e 0, p(t; x, BE(x)) = o(1) as t -* 0 +
uniformly for x e S, where B E (x) is the ball centered at x of radius e > 0.
(ii) (Feller Property) For each (bounded) continuous function f, the function
x -+ f s f(y)p(t; x, dy) = E x f (Xr ) is continuous. Then there is a version {X,} of
{X,} a.s. having right continuous sample paths with left-hand limits at each t.
0
Proof. It is enough to prove that {X,} a.s. has left- and right-hand limits at each t,
for then {X,} can be modified as X, = lim s .,+ XS which by stochastic continuity will
provide a version of {X,}. It is not hard to check stochastic continuity from (i) and
(ii). Let f E C(S) be an arbitrary continuous function on S. Consider the semigroup
{ T} acting on C(S) by T, f (x) = E x f (X,), x e S, t _> 0. For A > 0, write
R x f (x) =
f' .
e 'Tf (x) ds,
- X E S;
R x (A > 0) is called the resolvent (Laplace transform) of the semigroup {T}. A basic
property of the resolvent is that AR, behaves like the identity operator on C(S) for
A large in the sense that
f o
^ e - 2IIf Tf II ds = f e - `Ilf Tx-lTII dt.
0
(T.5.8)
For each z > 0, by the properties (i) and (ii), Ill - Tx- lTII - 0 as A - oo, and the
integrand is bounded by 2e - TII f II since II 7;f II _< Ill II for any t >_ 0. By Lebesgue's
Dominated Convergence Theorem (Chapter 0) it follows that III f - 2R x f II - 0 as
f - f uniformly on S as d --> oo. With this property of the resolvent
A -+ oo; i.e., ARJ
in mind, consider the process { Y} defined by
1', = e -,
t >, 0, (T.5.9)
"Rj(X,),
m
= e -at f m e -xis+nt - zr e
T,+hf(X$) ds = e.
-,
"T.f(X,) ds
J 0 n
Thus, the same is true for {)e^'Y} _ {AR x f(X,)}. Since J.R A f -+ f uniformly as ;. -+ x,
the process { f(X,)} must, therefore, a.s. have left and right-hand limits at each t. The
same will be true for any f E C(S) since one can write f = f ' - f - with f + and
f - continuous nonnegative functions on S. So we have that for each f e C(S), { f (X,)}
is a process whose left- and right-hand limits exist at each t (with probability 1). As
remarked at the outset, it will be enough to argue that this means that the process
{X,} will a.s. have left- and right-hand limits. This is where compactness of S enters
the argument. Since S is a compact metric space, it has a countable dense subset
{x}. The functions f: S -- 68' defined by f(x) = p(x, x), XE S, are continuous for
the metric p, and separate points of S in the sense that if x y, then for some n,
f(X) ^ f,(y). In view of the above, for each n, { f,(X,)} is a process whose left- and
right-hand limits exist at each t with probability 1. Thus, the countable union of
events of probability 0 having probability 0, it follows that, with probability 1, for
all n the left- and right-hand limits exist at each t for { f(X,)}. But this means that
with probability one the left- and right-hand limits exist at each t for {X} since the
fn 's separate points; i.e., if either limit, say left, fails to exist at some t', then, by
compactness of S, the sample path t -+ X, must have at least two distinct limit points
as t --> t' , contradicting the corresponding property for all the processes { f(X,)},
-
n=1,2,.... n
In the case that S is locally compact, one may adjoin a point at infinity, denoted
A( S), to S. The topology of the one point compactification on S = S v {A} defines
a neighborhood system for A by complements of compact subsets of S. Let . !(S) be
the sigmafield generated by {, {A}}. The transition probability function p(t; x, B),
(t >, 0, XE S. Be s(S)) is extended to p"(t; x, B) (t >, 0, XE S. B e 4(S)) by making A
an absorbing state; i.e., p(t; A, B) = I if and only if Ac B, Be 4, otherwise
p(t; A, B) = 0. If the conditions of Theorem T.5.4 are fulfilled for p, then one obtains
a regular process {X,} with state space S. Defining
r o =inf{t>0;X,=A},
the basic Theorem T.5.4 provides a process {X,: t < r o } with state space S whose
sample paths are right continuous with left-hand limits at each t < r e . A detailed
treatment of this case can be found in K. L. Chung (1982), Lectures from Markov
Processes to Brownian Motion, Springer- Verlag, New York.
While the one-point compactification is natural on analytic grounds, it is not
always probabilistically natural, since in general a given process may escape the state
space in a variety of manners. For example, in the case of birth-death processes with
state space S = Z one may want to consider escapes through the positive integers as
distinct from escapes through the negative integers. These matters are beyond the
present scope.
4. (Tauberian Theorem) According to (5.44), in the case r >- 2 the generating function
h(v) can be expressed in the form
where p >- 0 and K(x) is a slowly varying function as x -+ oo; that is, K(tx)/K(t) --* 1
as t - oo for each x > 0. Examples of slowly varying functions at infinity are constants,
various powers of Ilog xl, and the coefficient appearing in (5.44) as a function of
x = (1 - v) '. In the case r = 1, the generating function vh'(v) for {nh} is also of
-
362 CONTINUOUS-PARAMETER MARKOV CHAINS
this form asymptotically with p = 1. Likewise, in the case of (5.37) one can differentiate
to get that the generating function of {(n + I)P(ML = n + 1)} is asymptotically of the
form Z(1 v) '' 2 as v - 1.
-
Let d(v) be the generating function for a sequence {a} of nonnegative real numbers
and suppose
(v) =I a k v k , ( T.5.13)
k=0
converges for 0 < v < 1. The Tauberian theorem provides the asymptotic growth of
the sums as
n
Y a k (1/pl'(p))nP as n -+ co, (T.5.14)
k=0
I
E{n - 'I 2 L ML = n} (2/)'p* + (r) as n - oo, (T.5.17)
where y* + (r) is the rth moment of the maximum of the Brownian excursion process
given in Exercise 12.8(iii) of Chapter I. Moreover, in view of the last equation in the
hint following that exercise, one can check that the moments *+(r) uniquely
determine the distribution function by checking that the moment-generating function
with these coefficients has an infinite radius of convergence. From this one obtains
convergence in distribution to that of the maximum of a (appropriately scaled)
Brownian excursion. This result, which was motivated by considerations of the main
channel length as an extreme value of a river network, can be found in V. K. Gupta,
O. Mesa and E. Waymire (1990), "Tree Dependent Extreme Values: The Exponential
Case, J. Appl. Probability, in press. A more comprehensive treatment of this problem
is given by R. Durrett, H. Kesten and E. Waymire (1989), "Random Heights of
Weighted Trees," MSI Report, Cornell University, Ithaca. Problems of this type also
occur in the analysis of tree search algorithms in computer science (see P. Flajolet
and A. M. Odlyzko (1982), "The Average Height of Binary Trees and Other Simple
Trees," J. Comput. System Sci., 25, pp. 171-213).
THEORETICAL COMPLEMENTS 363
1. A (noncanonical) probability space for the voter model is furnished by the graphical
percolation construction. This approach was created by T. E. Harris (1978), "Additive
Set-Valued Markov Processes and Percolation Methods," Ann. Probab., 6, pp.
355-378. To construct Q, first, for each m e Z', n belonging to the boundary
set 0(m) of nearest neighbors to m, let f2 m , n denote the collection of right-continuous
nondecreasing unit jump stepfunctions cn m.n (t), t >, 0, such that Wm(t) --* co as t -+ x.
By the term "step function," it is implied that there are at most finitely many jumps
in any bounded interval (nonexplosive). Let Pm ,, be the (canonical) Poisson
probability distribution on (S 2 m.n m.n) with intensity 1, where S m.n is the sigmafield
generated by events of the form {w E Q m : co(t) s k}, t > 0, k = 0, 1,2,.... Define
(1 ',Y', P') as the product probability space S2' = 11 ' = H ' 11 m.n,
where the products fj are over m E 1d , n e 3(m). To get 52, remove from Q' any
= ((Wm,n (t): t % 0): m E Z', n E 3(m)) such that two or more w m , n (t) have a jump
at the same time t. Then f(= S2 n #') and P are obtained by the corresponding
restriction to S2 (and measure-theoretical completion). The percolation flow structure
can now be defined on (S2, , P) as in the text, but sample pointwise for each
w = ((wm.n(t): t 0): m E 1 d , ne 3(n)) e 0. For a given initial configuration 1I E S let
D = D(q) :_ {m E 7L : 'im = -1 }. A sample path of the process started at '1. denoted
a D( " ) (t, w) _ (a^" ) (t, co): n E 71"), t > 0, co e S2, is given in terms of the percolation flow
on w by
The Markov property follows from the following basic property of the construction:
D^Q UI9I^S.CU %i
QD(^) (t + s, w) = (t, Us w), (T.11.2)
A metric for S, which gives it the so-called product space topology, is defined by
= --kJ.21ni_._... (T.11.5)
nEzd
where uni _ I(n l , ... , n d )I := max(1n 1 1, ... , In d l). Note that this metric is possible largely
because of the denumerability of 7L . In any case, this makes the fact that, (i) ' is
364 CONTINUOUS-PARAMETER MARKOV CHAINS
the Bore! sigmafield and (ii) S is compact, rather straightforward exercises for this
metric (topology). Compactness is in fact true for arbitrary products of compact
spaces under the product topology, but this is a much deeper result (called Tychonoffs
Theorem).
To prove the Feller property, it is sufficient to consider continuity of the mappings
of the form a ' PQ (v(t) = 1, ne F) at tt e S, for, fixed t >0 and finite sets F c V.
Inclusionexclusion principles can then be used to get the continuity of a -+ P0 (B)
for all finite-dimensional sets B e F. The rest will follow from compactness (tightness).
For simplicity, first consider the map a * PP (a (t) _ 1), for F = {n}. Open
neighborhoods of t) e S are provided by sets of the form,
Now, in view of the simple percolation structure for the voter model, regardless of
the initial configuration il, one has
where m(n, t) is some (random) site, which does not depend on the initial configuration,
obtained by following backwards through the percolation diagram down and against
the direction of arrows. Thus, if {r k } is a sequence of configurations in fl that converges
to il in the metric p as k oo, then o (t) v,,,,, t) (t) a.s. as k + oo. Thus, by
Lebesgue's Dominated Convergence Theorem (Chapter 0), one has
I /
4 (n) = q (m), (T.11.9)
2d ^ Ym) E
where 0(m) denotes the set of nearest neighbors of m, is said to be harmonic (with
respect to the discrete Laplacian on 7Ld ). Note that (T.11.9) may be expressed as
Proof. Suppose that {(Zk, ZZ)} is a coupling of two copies of {Z k }; i.e., a Markov
chain on S x S such that the marginal processes {Zk} and {Z'} are each Markov
>,
THEORETICAL COMPLEMENTS 365
chains on S with the same transition law as {Z}. Then, if r = inf{k 1: Zk = Zk} < o0
-
a.s. one may define Z; = Zk for all k > r without disrupting the property that the
process {(Z;, Zk)} is a coupling of {Z k }. With this as the case, by the boundedness
of cp and (T.11.10), one obtains
where B = sup.J(p(x)I. Letting k + oc it would then follow that q(n) = p(m). The
success of this approach rests on the construction of a coupling {(Z, Zk)} with r < 00
a.s.; such a coupling is called a successful coupling.
The independent coupling is the simplest to try. To see how it works, take d = 1
and let {Z} and {Z} be independent simple symmetric random walks on Z starting
at n, m, n m even. Then {(Zk, Zk)} is a successful coupling since {Zk Zk} is easily
_>
checked to be a recurrent random walk using the results of Chapters II and IV or
theoretical complement to Section 3 of Chapter 1. This would also work for d = 2,
but it fails for d 3 owing to transience. In any case, here is another coupling that
is easily checked to be successful for any d. Let {(Z, Zk)} be the Markov chain on
7L x Z d starting at (n, m) and having the stationary one-step transition probabilities
furnished by the following rules of motion. At each time, first select a (common)
coordinate axis at random (each having probability 1 /d). From (n, m) such that the
coordinates of n, m differ along the selected axis, independently select directions
(each having probability i) along the selected axis for displacements of the components
n, m. If, on the other hand, the coordinates of n, m agree along the selected axis, then
randomly select a common direction along the axis for displacement of both
components n, m. Then the process {(Zk, Zk)} with this transition law is a coupling.
That it is successful for all (n, m), whose coordinates all have the same parity, follows
from the recurrence of the simple symmetric random walk on 1 1 , since it guarantees
with probability 1 that each of the d coordinates will eventually line up. Thus, one
obtains cp(n) = cp(m) for all n, m whose coordinates are each of the same parity. This
is enough by (T.11.9) and the maximum principle for harmonic functions described
below.
defined above, that is harmonic on D , i.e.,
p(m) _m
2d neB(ml
(P(n), m e D,
In particular, it follows from the maximum principle that if cp also takes an extreme
value on D , then it must be constant on D . This can be used to complete the proof
of Theorem T. 11.1 by suitably constructing a domain D with any of the 2 d coordinate
parities on D and D desired.
Proof. To prove the uniqueness principle, simply note that q - 42 is harmonic with
zero boundary values and hence, by the Maximum Principle, zero extreme values.
n
The coupling described in Theorem T.11.1 can be found in T. Liggett (1985),
Interacting Particle Systems, Springer-Verlag, New York, pp. 67-69, together with
another coupling to prove the boundedness principle (Choquet-Deny theorem) for
the more general case of an irreducible random walk on V.
4. Additional references: The voter model was first considered in the papers by P. Clifford
and A. Sudbury (1973), "A Model for Spatial Conflict," Biometrika, 60, pp. 581-588,
and independently by R. Holley and T. M. Liggett (1975), "Ergodic Theorems
for Weakly Interacting Infinite Systems and the Voter Model," Ann. Probab., 3,
pp. 643-663. The approach in section 11 essentially follows F. Spitzer (1981), "Infinite
Systems with Locally Interacting Components," Ann. Probab., 9, 349-364. The
so-called biased voter (tumor-growth) model originated in T. Williams and R. Bjerknes
(1972), "Stochastic Model for Abnormal Clone Spread Through Epithelial Basal
Layer," Nature, 236, pp. 19-21.
Much of the modern interest in the mathematical theory of infinite systems of
interacting components from the point of view of continuous-time Markov evolutions
was inspired by the fundamental papers of Frank Spitzer (1970), "Interaction of
Markov Processes," Advances in Math., 5, pp. 246-290, and by R. L. Dobrushin
(1971), "Markov Processes With a Large Number of Locally Interacting
Components," Problems Inform. Transmission, 7, pp. 149-164, 235-241. Since then
several books and monographs have been written on the subject, the most
comprehensive being that of T. Liggett (1985), loc. cit. Other modern books and
monographs on the subject are those of F. Spitzer (1971), Random Fields and
Interacting Particle Systems, MAA, Summer Seminar Notes; D. Griffeath (1979),
Additive and Cancellative Interacting Particle Systems, Lecture Notes in Math.,
No. 724, Springer-Verlag, New York; R. Kindermann and J. L. Snell (1980), Markov
Random Fields and Their Applications, Contemporary Mathematics Series, Vol. 1,
AMS, Providence, RI; R. Durrett (1988), Lecture Notes on Particle Systems and
Percolation, Wadsworth, Brooks/Cole, San Francisco.
CHAPTER V
Note that (1.2) holds for Brownian motions (Exercise 1). A more general
formulation of the existence of infinitesimal mean and variance parameters,
which does not require the existence of finite moments, is the following. For
every c > 0 assume that
P(IXs,XI>E1Xs=x)=o(t),
hold as t j 0.
It is a simple exercise to show that (1.2) implies (1.2)' (Exercise 2). However,
there are many Markov processes with continuous sample paths for which (1.2)'
hold, but not (1.2).
Definition 1.1. A Markov process {X} on the state space S = (a, b) is said to
be a diffusion with drift coefficient (x) and diffusion coefficient v 2 (x) > 0, if
(i) it has continuous sample paths, and
(ii) relations (1.2)' hold for all x.
If the transition probability distribution p(t; x, dy) has a density p(t; x, y),
then, for (Borel) subsets B of S,
It is known that a strictly positive and continuous density exists under the
Condition (1.1) below, in the case S = ( co, oo) (see theoretical complement
1). Since any open interval (a, b) can be transformed into ( oo, oo) by a strictly
increasing smooth map, Condition (1.1) may be applied to S = (a, b) after
transformation (see Section 3).
Below, a(x) denotes either the positive square root of a 2 (x) for all x or the
negative square root for all x.
Condition (1.1). The functions (x), a(x) are continuously differentiable, with
bounded derivatives on S = ( oo, oo). Also, a exists and is continuous, and
INTRODUCTION AND DEFINITION 369
a 2 (x) > 0 for all x. If S = (a, b), then assume the above conditions for the
relabeled process under some smooth and strictly increasing transformation
onto ( oo, co ).
Although the results presented in this chapter are true under (1.2)', in order
to make the calculations less technical we will assume (1.2). It turns out that
Condition (1.1) guarantees (1.2). From the Markov property the joint density
of X,,, X, 2 , ... , X, for 0 < t l < t 2 < . < t is given by the product
p(tl;x,Yi)p(t2 t1;Yi,Y2)"'p(tn tn-1;Y,-1,Yn) Therefore, for an initial
distribution n,
P,,(XX,EB,,...,X,EB.)
fs e,
..
JB,,
p(tl;x,Y)
...
p(t t^-1;Yn-1,Yn)dy ...dy e n(dx). (1.4)
As usual P,, denotes the distribution of the process {X} for the initial distribution
rt. In the case it = S.. we write PX in place of P6 . Likewise, E E X are the
corresponding expectations.
Therefore, a reasonable model for the velocity process is a diffusion with drift
(/m)V and diffusion coefficient a 2 > 0, called the Ornstein-Uhlenbeck
process.
370 BROWNIAN MOTION AND DIFFUSIONS
furnishes the solution. In other words, given V = x, L' Gaussian with mean
xe - '" ` and variance ZQ 2 (1 e_ 2 '``) -1 m. From this one may check (1.5)
directly.
Z, =X,+
f o
f(u)du, t30. (1.7)
o
E{Z5+ ,Z3 1Z ,=z}=E{Xs+,XXIX=z
(
f(u)du}+
f
s
"+ t f (u) du
=^z J 0
f(u)du)t+f(s)t+o(t). (1.8)
Also,
(f.' s //I
= a 2 (z J S
0
f(u) du)t + o(t). (1.9)
Similarly,
Then T, is a bounded linear operator on B(S) when the latter is given the sup
norm defined by
il f II := sup{II(Y)I : Y e S} . (2.2)
Indeed T, is a contraction, i.e., IITrf II < IIfit for all f E B(S). For,
The family of transition operators {T,: t > 0} has the semigroup property,
where the right side denotes the composition of two maps. This relation follows
from
The relation (2.4) also implies that the transition operators commute,
for all f e B(S) such that the right side converges to some function uniformly
in x. The class of all such f comprises the domain ^A c?f A.
(Tj)(x) - f(x) _
um {u(x)f'(x) + z6 2 (x)f "(x)}
( l o t
Ilfll E(1
-562
2 1 x)
( + liro (fix,xl>el X o x). (2.12)
(lo t
The Firn on the right is zero by the last relation in (1.2)'. As >0 is arbitrary
it now follows that
(T^.1)(x)-
Firn .l (x) = (x)1 '(x) + ia 2 (x)f "(x). (2.13)
rlo t
the fact that given b > 0 there may not exist an e independent of x such that
I f() f (x)I <5 for all x, y satisfying Ix yI < e. The third source is removed
by requiring that f" be uniformly continuous. Assume p(x)f'(x), r 2 (x)f "(x)
are bounded to take care of the second. The first source is intrinsic. One example
where the errors in (1.2)' are o(t) uniformly in x, is a Brownian motion (Exercise
4). In the case the Brownian motion has a zero drift, the second source is
removed if f" is bounded. Thus, for a Brownian motion with zero drift, every
bounded f having a uniformly continuous and bounded f" is in 9A . Indeed,
it may be shown that 27A comprises precisely this class of functions. Similarly,
if the Brownian motion has a nonzero drift, then f e .9A if f, f ', f " are all
bounded and f" is uniformly continuous (Exercise 4). In general one may ensure
uniformity of o(t) in (1.2)' by restricting to a compact subset of S. Thus, all
twice continuously differentiable f, vanishing outside a closed and bounded
subinterval of S, belong to 9A . We have sketched a proof of the following result
(see theoretical complements 1, 3 and Exercise 3.12 of Chapter VII).
Proposition 2.1. Let {X} be a diffusion on S = (a, b). Then, all twice
continuously differentiable f, vanishing outside a closed bounded subinterval
of S, belong to -9A , and for such f
In what follows, the symbol A will stand, in the case of a diffusion {X}, for
the second-order differential operator (2.14), and it will be applied to all
twice-differentiable f, whether or not such an f is in _9A
Turning to the derivation of the backward equation, note that the arguments
leading to (2.13) hold for all bounded f that are twice continuously
differentiable. If the function x -+ (T,f )(x) is bounded and twice continuously
differentiable, then applying (2.13) to this function one gets
This is Kolmogorov's backward equation for the function (t, x) (T, f)(x)
(t > 0, x e S). It will be shown below that if f E.9,, then T, f e Q. The following
proposition establishes this fact for general Markov processes.
Proposition 2.2. Let {T,} be the family of transition operators for a Markov
process on a metric space S. If f E 22 A then the backward equation (2.15) holds.
TS(T,f) T^f s is .i Af
T(A.f) = T,( T
which shows that T, f e _9A and, therefore, by the definition of A, (2.15) holds.
n
The relation (2.16) also shows that
A(T,f) = T 1 (Af), (2.17)
i.e., T, and A commute on .9A This fact is made use of in the proof of the
following important result.
Z,'= f(X,) J 0
r (Af)(X) ds (t > 0 ), (2.18)
where XS is the after-s or shifted process (X) = XS+ (u > 0). By the Markov
property,
E
(A.f)(XS ) du 1 5s J =
[
Ex
(A.f)(X^.)
du]X=x,.
=
LJ
. P T (Af)(x) dul
o X-x
s (2.20)
376 BROWNIAN MOTION AND DIFFUSIONS
By the backward equation (2.15), which holds by Proposition 2.2, and the
fundamental theorem of calculus,
o au f,
o
A(T,.f)(XS)du. (2.23)
Corollary 2.4. If Condition (1.1) holds for a diffusion {X,}, then for every twice
continuously differentiable f vanishing outside a compact subset of S, the
process {Z} in (2.18) is a {.F,}-martingale.
d
A=2o.2 aZ2+p .
dx dx
Proposition 2.5. Let {X,} be a diffusion on S = (a, b), whose coefficients satisfy
Condition (1.1). Let [c, d] c (a, b). Then
f
^ d
exp J - 2(y
(' ' )
d> dz
Px ({X,} reaches c before d) = ` o ? y --
( ) ) , c < x <, d.
J^ a
c f
d exp^ f Z
2u( y) dy1 dz
2 (y) ))
(2.24)
Proof. Define a twice continuously differentiable function f that equals the
right side of (2.24) for c < x < d, and vanishes outside [c s, d + c] c (a, b)
for some s > 0. This is always possible as f" is continuous at c, d (Exercise 3).
By Corollary 2.4,
Z,:= f(X,)
f o
(A.f)(X,) ds (t 3 0 ), (2.25)
Let x E (c, d). By the optional stopping result Proposition 13.9 of Chapter I
E x Zt = E x Z o . (2.27)
But Z o = f(X0) - f(x) under PX , so that E.,Z 0 is the right side of (2.24). Now
check that Af(x) = 0 for c <x < d, so that (Af)(XS ) = 0 for 0 < s <t if
x e (c, d). Therefore,
on {XT = c},
Z= f(X) = {f(c) = 1 (2.28)
f(d) = 0 on {X, = d},
(2.5) leads to
SB
=J I p(t;zydy)ps;xz) dz
+ t;x,y)dy
ss \
J
= J(J p(
a s
t; z, y)p(s; x, z) dz) dy,
//
(2.29)
lim
(Tsf)(x) i(x) . ( 2.31)
(Af)(x) =
slo s
But the right side equals p(t; x, y)/t, and the left side is Ap(t; x, y). Therefore,
g(x)p(t; x, y)
(T*g)(y):= fa
b
dx. (2.33)
(2.34) and interchange the orders of integration and differentiation to get, using
(2.17),
Now, assuming that f, h are both twice continuously differentiable and that f
vanishes outside a finite interval, integration by parts yields
<h, Af> =f s
h(Y)[p(Y).f'(Y) + 1 2 (Y).f"(Y)] dY
2
(.u(Y)h(Y)) + dy2 (2a2(Y)h(Y)) f(Y) dy = <A * h,.f >, (2.36)
Js L Y
where A* is the formal adjoint of A defined by
i
d
(A*h)( y ) _ (,u(Y)h(Y)) + d
yZ (ia 2 (Y)h(Y)). (2.37)
dy
a
*g,.i = <A*T *9, f >. (2.38)
at T
Since (2.38) holds for sufficiently many functions f, all infinitely differentiable
functions vanishing outside some closed, bounded interval contained in S for
instance, we get (Exercise 1),
That is,
I 2
g(x) dx ay ((Y)P(t; x, Y)) 2 aye ( 62 (Y)P(t; x, Y)) 9(x) dx.
Js at Js L
(2.40)
Since (2.40) holds for sufficiently many functions g, all twice continuously
differentiable functions vanishing outside a closed bounded interval contained
in S for instance, we get Kolmogorov's forward equation for the transition
380 BROWNIAN MOTION AND DIFFUSIONS
ap(t; x, y) 2
y (h(y)p(t; x, y)) + ay2 (icr 2 (y)p(t; x, y)), (t > 0). (2.41)
t
c(t, x) =
fs
c o (z)p(t; z, x) dz, (2.42)
ac(t, x)
A*c(t, x) ax J(t, x), (2.43)
with J given by
c(t, x)
At Ax. (2.45)
t
On the other hand, if v(t, x) denotes the velocity of the solute at x at time t,
moving as a continuum, then a fluid column of length approximately v(t, x) At
flows into the region at x during [t, t + At]. Hence the amount of solute that
flows into the region at x during [t, t + At] is approximately v(t, x)c(t, x) At,
while the amount passing out at x + Ax during [t, t + At] is approximately
v(t, x + Ax)c(t, x + Ax) At. Therefore, the increase in the amount of solute in
[x, x + Ax] during [t, t + At] is approximately
c(
x)c(t, x)). (2.47)
t x) x (v(t,
d 2 d
+(x)x (3.1)
A=162(x)dx2
Lemma. If {X} is a diffusion satisfying (1.2)' for all c> 0, then for every r> 2
one has
Proof. Let r> 2, E' > 0 be given. Fix 0 > 0. We will show that the left side of
(3.2) is less than 6t for all sufficiently small t. For this, write
382 BROWNIAN MOTION AND DIFFUSIONS
(5= (0/(2 r 2 (x))) 1 " ( r -2) . Then the left side of (3.2) is no more than
E(IXS+1 Xsl'lnx.+txsl,a) ( XS = x)
r
+ E(IXs+r X 1 (Ixs+t xsls^xs.txsl>S) Xs = x)
S' 2 (1 X., X521(1 xsi,ts) 1 XX = x) + E 'r P(IX., XJ > (5 XS = x)
The last inequality uses the fact that (1.2)' holds for all e > 0. Now the term
o(t) is smaller than Ot/2 for all sufficiently small t. n
Proposition 3.1. Let {X} be a diffusion on S = (c, d) having drift and diffusion
coefficients (x), Q 2 (x), respectively. If 0 is a three times continuously
differentiable function on (c, d) onto (a, b) such that 0' is either strictly positive
or strictly negative, then {Z, :_ (XX )} is a diffusion on (a, b) whose drift i()
and diffusion coefficient Q 2 (.) are given by
2!
(3.5)
where lies between Xs and XS+ ,. Fix s > 0, z e (a, b). There exist positive
constants (5 1 (),5 2 (c) such that 0 - ' maps the interval [z e, z + s] onto
[4 -1 (z) (5 1 (e), - '(z) + 6 2 (s)] Write x = 4 - '(z), and let S m = min{8 1 (s), 6 2 (8)},
SM = max{81(c), 62(8)}. Then
1 ({Zs+t-zlbE) = 1(x-61(E)sX,+t,x+bz(E))1
(3.6)
magnitude by
which is of the order o(t) as t j 0, by the last relation in (1.2)'. Also, by the first
relation in (1.2)',
Therefore,
,,,
E(IXs+r Xs1 3 1o (^)Ilnzs.,z,I,E) ( ZS = z)
cE(IXX+r Xsl 31 {Ix,,,-x,I_<oM} I Xs = x) = o(t). (3.10)
cE(IX3+, Xsl'lnz,+-z,IsE)1 Xs = x)
where r> 2. Hence by the Lemma above R, = o(t) and we get, using (3.9) in
(3.12),
Finally,
Z Z Z Z d
z2 + ( + 2 f z z . d (3.16)
In other words, the mean rate of growth as well as the mean rate of fluctuation
in growth at z is proportional to the size z. Note that one has
Z, = e x'. (3.17)
Z, = Z o e` ', Zo = e x o. (3.18)
TRANSFORMATION OF THE GENERATOR 385
generator
d2 - f \yz log z - 2
= ? Q Z Z Z dz2 z d
z /%i dz , ( 0<z<oo ). (3.20)
Example 3. S = (- co, oo), S = (0, 1), 4(x) = ex/(ex + 1). For this case, first
transform by x - ex, then by y - y/(1 + y). Thus, one needs to apply the
transformation y(y): y - y/(1 + y) to the operator with coefficients (3.15). Since
the inverse of y is y '(z) = z/(1 - z), one may use (3.15) to obtain the
transformed operator (Exercise 1)
dz
A = Zz 2 (1 - Z
z) 2 7 2 log
( 1 -z dz'
z z(1 - z) z
+ z(1 - z) log-----) +-- a 2 log ---
1-z 2 1-z
- z 2 (1 - z)Q2 (los
_--)]
r z d 2
= Zz(1 - z) z(1 - z)a 2 log
L ---
1 - z j dzZ
=
r
L d
zz(1 - z) z(1 - z)a 2 z2 + ( 2 + (1- 2z)Q 2 ) d , 0< z <1, (3.22)
d
= Zz(1 - z) z(1 - z)a 2 dz -
dz 2 1
2y log z - - (1 - 2z) a2
- z dz
0<z<1. (3.23)
dz
A _iQZ_
z dx2
386 BROWNIAN MOTION AND DIFFUSIONS
then
z
= Z6 2 (9z 2 (log z)4"3) dz 2 z{6(log z)" + 9(log z)413} . (3.24)
z + Za dz
(where {X,} is a Brownian motion) does not have a finite first moment (Exercise 2).
A2
Az . (4.3)
Uo
Note that under Condition (1.1) and boundedness of p(x), v z (x), the quantities
, 5, and I ;) a; ) are nonnegative for sufficiently small A. We shall
let e be the actual time in between two successive transitions. Note that, given
that the process is at x = iA, the mean displacement in a single step in time e is
A /3i() - A bi() = p(iA)e = fi(x)e. (4.4)
Hence, the instantaneous rate of mean displacement per unit time, when the
process is at x, is u(x). Also, the mean squared displacement in a single step is
A-^
) )
lim E{.f(X( ) I X = ^]Aj = Ez.f(X,).
[ (4.7)
By (2.15), in the case of a smooth initial function f, the function u(t, x):= E x f (XX )
solves the initial value problem
au _ ^
Z 2(x) ai Z + p(x) au , lim u(t, x) _ f (x). (4.8)
ac ax ax ,1 o
where n = [ t/s], i = [x/A], and p;; are the n -step transition probabilities. It is
)
or,
1 Q2 (iA) (")
+ 2 A2 (P( +i,; 2pi(;
(") + PI
(") -i, ;) ( 4.11 )
By dividing both sides of (4.11) by A, one then arrives at the difference equation
where
J Y1 exp
f rA
( 2 (Y)/a 2 (Y)) dY j
rip c
1L/(X) = hm -
f I
d
J c
exp{ J ( (Y)/a (Y)) dy} dz
c
2 2
)))
which confirms the computation (2.24) given in Section 2. This leads to necessary
and sufficient conditions for transience and recurrence of diffusions (Exercise
3). This is analogous to the derivation of the corresponding probabilities for
Brownian motion given in Section 9 of Chapter I.
Alternative derivations of (4.17) are given in Section 9 (see Eq. 9.23 and
Exercise 9.2), in addition to Section 2 (Eq. 2.24).
ap(t at a
, Y) = 2a2 a Z P(; 2 , Y)
( t > 0, co < x < oc, oo < y < co ). (5.1)
Y
= x, y) dy. (5.2)
P(t; x, )
J e 14y p(t;
(5.3)
t 2 Q2'
or,
2
)
0, (5.4)
t (e
and we obtain
But the right side is the characteristic function of the normal distribution with
TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES 391
P z ZP
(5.9)
+ y - (YP(t; X, Y))-
at 2 y 2
(t; Z) =
J e'4YP(t; x, y) dy. (5.10)
G Y/ (t; ) = J -^ e aY (t; x, y) dy
i^e'4vp(t; x, y) dy = -i,
^ I (t; Z) =
GY / _ m
e1 ap(t ; x, Y) dY = - iZ
aY
J e`' apaydy
_ (_ iZ) 2 ep(t; x, Y) dY =
J
C Y ^y ) ^(t; ) = J
ey aP(t;
, Y) =
i
y a J p e ^ y ! dY
1 (ap
1 d p(t; )
( -1 P) _ - P - a (5.11)
ay^ i
Here we have assumed that y(ap/ay), a l p/ay e are integrable and go to zero as
jyj - cc. Taking Fourier transforms on both sides of (5.9) one has
Therefore,
p(t;
3 2
^ 2 ^ Z p. (5.12)
t
d
t p(t; a(t)) = --- a 2 (t)p(t; a(t)) (5.13)
or
= c( x , d) e x p ^ a
4 dz (e 2 Yt _ l)}.
Y
For arbitrary t and one may choose d = ^e - Y`, so that a(t) = ^, and get
Note that the above derivation does not require that y be a positive parameter.
However, observe that if y = /m > 0 then letting t --+ oo in (5.16) we obtain
DIFFUSIONS WITH REFLECTING BOUNDARIES 393
the characteristic function of the Gaussian distribution with mean 0 and variance
Q 2 /2y = mc 2 /2 (the MaxwellBoltzmann velocity distribution). It follows that
is the p.d.f. of the invariant initial distribution of the process. With it as the
initial distribution, { V} is a stationary Gaussian process. The stationarity may
be viewed as an "equilibrium" status in which energy exchanges between the
particle and the fluid by thermal agitation and viscous dissipation have reached
a balance (on the average). Observe that the average kinetic energy is given by
E, ( mV,) = (m 2 /4)Q 2
CrZAE
P^^ = 1 (i + b) = 1 (i )
394 BROWNIAN MOTION AND DIFFUSIONS
and
,
Y)
= u(x) aP(
)x Y + ZU2(x) a 2 P(t; x, Y)
z t>0, x>0, Y>0.
aP(t;
(6.3)
The backward boundary condition for the diffusion is obtained in the limit as
A J, 0 from the corresponding equations for the chain for i = 0, j >, 0,
(J 0),
or,
P; i)E P;
+
_ 2Q0) (Pi; P;) + 2D) (P (' P;) (j > 0). (6.4)
In the notation of (4.12) this leads to,
(6.5)
Fix y >, 0, t > 0. For n = [ t/E], j = [y/0], the left side of (6.5) is approximately
0 ( aP(t; x, Y)
at ) .= O'
while the right side is approximately
The equations (6.3) and (6.6), together with an initial condition p(0; x,.) = 6 x
determine a transition probability density of a Markov process on [0, co) having
continuous sample paths and satisfying the infinitesimal conditions (1.2) on the
interior (0, oc). This Markov process is called the diffusion on [0, co) with
reflecting boundary at 0 and drift and diffusion coefficients (x), a 2 (x). The
equations (6.3) and (6.6) are called the Kolmogorov backward equation and
backward boundary condition, respectively, for this diffusion. This particular
boundary condition (6.6) is also known as a Neumann boundary condition. The
precise nature of the approximation of the reflecting diffusion by the
corresponding reflecting birthdeath chain is the same as described in
Theorem 4.1.
(0) = 0. (6.7)
Theorem 6.1. Let {X} denote a diffusion on FR' with the extended coefficients
,u(.), a 2 (.) defined above. Then {1X1 1} is a Markov process on the state space
S = [0, cc), whose transition probability density q(t; x, y) is given by
aq(t; x, y) 12 a 2 q
q
ax (t > 0; x > 0, y % 0 ), (6.10)
+ p(x)
396 BROWNIAN MOTION AND DIFFUSIONS
aq(t; x, y) I
=0 (t>0;y>,0). (6.11)
ax x=0
Proof. First note that the two Markov processes {X} and {X} on R' have
the same drift and diffusion coefficients (use Proposition 3.1, or see Exercise
1). Therefore, they have the same transition probability density function p, so
that the conditional density p(t; x, y) of XX at y given X0 = x is the same as the
conditional density of X, at y given X 0 = x; but the latter is the conditional
density of X, at y, given X0 = x. Hence,
0 x' =X s
=
EX[J 0
g(z)q(t; IXXI, z) dz I {IXI: 0 < u s}]
g(z)q(t; I X31, z) dz, (6.13)
=J 0
where
and making use of the backward equation for p, one arrives at (6.10). Since
the right side of (6.14) is an even differentiable function of x, (6.11) follows.
n
Definition 6.1. A Markov process on S = [a, oo) that has continuous sample
paths and whose transition probability density satisfies equations (6.10), (6.11),
with 0 replaced by a, is called a reflecting diffusion on [a, oo) having drift and
diffusion coefficients u(x), a 2 (x). The point a is then called a reflecting boundary
point.
Note that in this definition (0) is not required to be zero. It is possible to give
a description of such a general diffusion with a reflecting boundary, similar to
that given in Theorem 6.1 for the case (0) = 0 (see theoretical complement 1).
-viz[ yl z
yt)
q(t; x, y) _ ? (1 e -2 exp a
z (1 e -2 y t )
_ y( y_+_xe) 2 1
+ exp a >
z (1 e-2yi)
(t>0;x,y>,0). (6.16)
l p(t; x, Y) 2 a
(ia (Y)P(t; x, Y)) - ((Y)P(t; x, Y)),
Y Y
(t>0;x>,a,y>a). (6.17)
398 BROWNIAN MOTION AND DIFFUSIONS
1
=J to-I
p(t; x, y) dy, (6.18)
(' ap(t; x, y) r a a
0
=J tQ 00 ^ at
dY =
J f^(Y)P + a (ia 2 (Y)P) dY
t.'.) ay \ ay
Theorem 6.2. Let {X} be a diffusion on l' with periodic coefficients ( ), 2(.)
as above. Define
q(t; z, z') _ Z p(t; z, z' + md) (t > 0; z, z' e [0, d)) (6.21)
m=m
1
E[f (ZS +,) {X: 0 < u < s}] = E[g(X5+,) {X: 0 < u < s}] 1
[II
=g(y)p(t; x, y) dY]
m +l}d
z x s
x, Y) dY(6 . 2 3)
m=m LLL L 9(Y)P(t;
d x =x,,
But the periodicity of z() and a'() implies that the Markov processes Al
and {X, md} have the same drift and diffusion coefficients (Proposition 3.1
or Exercise 2) and, therefore, the same transition probability densities. This
means
p(t;x+md,y+md)=p(t;x,y) (t>O;x,ye[0,d);m=0,+1,2,...).
(6.24)
Using (6.24) in (6.23), and using the fact that g(md + z') = f(z') for z E [0, d),
one gets
0 m = ao
Xs = m o d + Z. Then
= J d
0
f (z')q(t; Zs , z') dz'. (6.26)
Since 0(mod d) = 0 = d(mod d), the state space of {Z} may be regarded as
400 BROWNIAN MOTION AND DIFFUSIONS
the compact set [0, d] with 0 and d identified. Actually, the state space is best
thought of as a circle of circumference d, with z as the arc length measured
counterclockwise along the circle starting from a fixed point on the circle. It
may also be identified with the unit circle by zE-.exp{i2irz/d}.
Definition 6.5. Let {XX } be a diffusion on O' with periodic drift and diffusion
coefficients with period d. Then the Markov process {Z} := {X,(mod d)}, is
called a diffusion on [0, d) with periodic boundary or a diffusion on the circle.
q(t; z, z')
_ m-
=m
1
(2ja2t)-i2 expl (2mn + z' z (
2a2t 6.27)
t)2
q(2)(t; z, z') _Y
,,,_ _
( (2m+z'z) 2 1
(2nQ 2 t) 1/2 expt
2a2t Jj
for 1 < z', z ,< 1.
(6.28)
In particular,
[0, 1] then
f
i
forx,ye[0,l]. (6.31)
Note that
_ aq(t; x, )
Z
1- a9t>Y)
ax2
y
, (xE(0 , 1 )>YE[0 , 1 ]). (6.33)
at
anti- x_ v1
=0, (yE[0,1]). (6.34)
UA 1x=0
402 BROWNIAN MOTION AND DIFFUSIONS
Since q (2) (t; x, y) + q (2) (t; x, y) is symmetric about x = 1 (see (6.21)) one has
aq(t; x, y) = 0
(y E [0 , 1 ]). (6.35)
3x x=1
Theorem 6.4. Let )' a 2 (.) be extended as above, and let {XX } be a diffusion
on S = (oo, oo) having these coefficients. Define {Z'} __ {X,(mod 2)}, and
{ZZ } _= {IZ,( 1 " 1I}. Then {Z^} is a diffusion with coefficients p(.) and Q 2 (.) on
[0, 1], and reflecting boundary points 0, 1.
Diffusions with absorbing boundaries are rather simple to describe. Upon arrival
at a boundary point (state) the process is to remain in that state for all times
thereafter. In particular this entails jumps in the transition probability
distribution at absorbing states.
for this extension. Let {X} denote a diffusion on ER' having these coefficients,
and starting at x e [a, oo). Define a new stochastic process {X = } by
X, if t < Ta
(7.1)
^
a=X^. ift>,T a ,
On the other hand, introducing the shifted process {(X) := X5+ ,: t > 0} and
using the Markov property of {X,} one gets, on the set {T a > s},
= 1 {ta , $} P((XS ), e B, and 'r a (XS ) > t I {Xu : 0 <- u -< s})
= 1 {^a, )P(X, e B, and Ta > t I {Xo = Y})I ^ =xs ,
For the second equality in (7.4), we have used the fact that if T a > s then
T a = s + T a (X, ), being the first passage time to a for X. Also, (T a > s}
is determined by {X: 0 < u < s}, so that 1{a>s} may be taken outside the
conditional probability. Combining (7.3) and (7.4) one gets, for B c (a, oo),
Definition 7.1. The Markov process {XJ is called a diffusion on [a, cc) having
drift and diffusion coefficients u( ), a 2 ( ), and an absorbing boundary at a.
The transition probability p(t; x, dy) of {Xj is given by
P(t; x, dy) is given by a density p (t; x, y), say, on (a, co) with p (t; x, y) < p(t; x, y)
(for x > a, y > a) (Exercise 1). One may then rewrite (7.7) as
ap (t; x, y) = I 2 0Zp ap0 (t>0;x>a,y>a), (7.10)
t Za (x) axz + (x) Ox
lim p (t; x, y) = 0. (7.11)
xlo
DIFFUSIONS WITH ABSORBING BOUNDARIES 405
noting that as x j a the probability on the right side goes to zero. If one assumes
that p (t; x, y) has a limit as x j a, then this limit must be zero.
By the same method as used in the derivation of (6.17) it will follow (Exercise
3) that p satisfies the forward equation
aP (t;x,Y) = aZ
t
(i 6Z (Y)P ) a ((Y)P ), (t > 0; x > a,Y > a), (7.13)
ay2 aY
tx,X,
ift>T,(7.15)
ift ^i,
Virtually the same proof as given for Theorem 7.1 applies to show that {X t }
is a Markov process on [a, b].
Definition 7.2. The process {X,} in (7.15) is called a diffusion on [a, b] with
coefficients i(.), a'() and with two absorbing boundaries a, b.
Once again, the transition probability p(t; x, dy) of {X} is given by a density
406 BROWNIAN MOTION AND DIFFUSIONS
p(t; x, B) = J p (t; x, y) dy (t > 0; a <x < b, B c (a, b)). (7.17)
a
J(a,b)
P (t; x, y) dy = PX (r > t), (t > 0; x e (a, b)), (7.18)
In order to calculate these probabilities, let A denote the event that the
diffusion {X,}, starting at x, reaches a before reaching b, and let DD be the event
that by time t the process does not reach either a or b but eventually reaches
a before b. Then DD c A and
where (/i(x) is the probability that starting at x the diffusion {X,} reaches a
before b. Conditioning on XX , one gets for t > 0, a <x < b,
PP(DD) =
f(a,b)
PX(D, 1 X, = y)P (t; x, y) dy =
f (a.b)
i(y)P(t; x, y) dy. (7.23)
Proposition 7.2. Let (x) and a 2 (x) satisfy Condition (1.1) on (oo, oo). Let
S = [a, b], for some a < b. Then the probability p(t; x, {a}) is given by
P(t; x, {a}) = /i(x) J0 6 ^i(Y)p (t; x, y) dy (t > 0, a <x < b), (7.25)
where '(y) = PY (X, = a), Py being the distribution of the unrestricted process
{X,} starting at y.
rJi(x) _
f.
'
exp
Z
2(y) dy 1J dz
dy } dz
< Ja ^Z(Y) ) ) (a < x <, b). (7.26)
f exp
O ."
f(Y)
Finally,
7.3 Mixed Two-Point Boundary Case (One Absorbing Boundary Point and
One Reflecting)
Let {X} be a diffusion on [a, oo) with a reflecting boundary a, having drift and
diffusion coefficients t(), a 2 (.). For b > a, define
X, if t < zb,
i (7.28)
{
b ift>T,.
If {X,} starts at x e [a, b], then {1,} is a Markov process on [a, b], starting at
x, which is called a diffusion on [a, b] having coefficients p(), a 2 ( ), and a
reflecting boundary point a and an absorbing boundary point b.
408 BROWNIAN MOTION AND DIFFUSIONS
2(z)
1(x 0 , x) := x dz (8.2)
xo a 2 (Z)
Notice that n(x) is proportional to the limiting measure obtained from Eq. 4.1
of Chapter III, in the diffusion limit of birthdeath chains with the parameters
given by (4.2) of the present chapter.
Consider the space LZ (S, it) of real-valued functions on S that are square
integrable with respect to the density n. Let f, g be twice continuously
differentiable functions that satisfy the backward Neumann or Dirichiet
boundary condition(s) imposed at the boundary point(s), and that vanish
outside a finite interval. Then, upon integration by parts, one obtains the
following property for A (Exercise 1)
where
<f h> =
fS
f(x)h(x)n(x) dx. (8.5)
In view of the symmetry of A reflected in (8.3), one may expect that there is a
spectral representation of the transition probability analogous to that for
discrete state spaces (see Section 4 of Chapter III and Section 9 of Chapter IV).
That this is indeed the case will be illustrated in forthcoming examples of this
section (also see theoretical complement 2).
Consider the case in which S is a closed and bounded interval. The idea
behind this method is that if i is an eigenfunction of A (including boundary
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 409
J = Y_ <fi'Yn>nWn. (8.8)
n=0
Then u satisfies the backward equation and the boundary condition(s) together
with the initial condition
which means
e""'^(x)^n(Y))it(Y) dy.
fs f(Y)p(t; x, dy) = fs f(Y)(Z
=O
(8.13)
In such cases, therefore, p(t; x, dy) has a density p(t; x, y) and it is given by
p(t; x, y) _Ze "'^n(x)^ n(Y)n(Y)
a
(8.14)
n=o
a t
p =0 (t > 0; ye S). (8.16)
x x=o,a
J 0
l z dx=d, J cost(Jdx =2,
0
(8.21)
b rn = jI
\ (m ? 1); bo d
(8.22)
To see whether (8.19) and (8.20) provide all the eigenvalues and eigenfunctions,
one needs to check the completeness of {di m : m = 0, 1, 2, ...} in L2 ([0, d], n).
This may be done by using Fourier series (Exercise 10). Thus,
a"t
p(t; x, y) = L e II/m(x)frm(y)
m=0
1 2 z z z
1 anmmirx m7ry
exp _ +
t cos cos ( X )
d
dm, 2dz d
(t >0,0<x,y <d). (8.23)
Notice that
1
lim p(t; x, y) = d , ( 8.24)
Example 2. (Brownian Motion with One Reflecting Boundary). Let S = [0, oo),
(x) - 0, a 2 (x) - a z > 0. We seek the transition density p that satisfies
ap 2 a lp
fort>0, x >0, y:0,
at= 2
a xZ
(8.25)
ap
=0 fort>0, y>,O.
ax I =o
X
1 ( oznz(m/d)2 1 ^
p(t; x, y) = lim - exp 1 t)t cos cos
d r^d m 2=-m d d
\ m^y^
( to z i z u zl
= exp{ 2 cos(zrxu) cos(nyu) du
-X 1. )
Zzz
= 1 exp - ta [cos(n(x + y)u) + cos(n(x - y)u)] du
2.
_ 2^ I (e
,J-^
+ e)
ex{p_
l
t62 z 1 dZ
I
( _ nu)
(x + Z )Z } + ex
= (2ntor)
lz ^1z (exp {( _ 2tv ) p 2tQ Z )Z) /
(x
(t>0,0<x,y<oo). (8.26)
The last equality in (8.26) follows from Fourier inversion and the fact that
exp{ ta 2 ^ 2 /2} is the characteristic function of the normal distribution with
mean zero and variance to e .
z
(t>0,0<x<d;0<y<d), (8.27)
a^ 2a +ax,
yx In=
I1 m (x) = b m exp -- z sin d( m = 1, 2, ...) (8.29)
2
b m = (m=1,2,...). (8.32)
d
_ 2 exp( (Y0-z
x)
} exp t
t ia-i j expj m22daZt^
m=1
l 2d 2
x sin( m
dx ) sin^ m
dy^ (t > 0,0 < x, y < d). (8.33)
It remains to calculate
p(t; x, {0}) = PX (XX = 0), p(t; x, {d}) = PX (X, = d), 0 < x < d. (8.34)
Note that
= 1 2 exp
1 Q2c m^I a
m exp{
l
m2
d^l
\(8.35)
2t si
n( ^
( mix
where
d
am = exp " sin y)dy. (8.36)
f'
Thus, it is enough to determine p(t; x, {0}); the function p(t; x, {d}) is then
determined from (8.35). From (7.25) we get
where i/i(x) = Px ({X,} reaches 0 before d). From the calculations of (9.6) in
Chapter I we have
x if=0
(8.38)
^ (x) = exp{2(d x)/a2}
if 0.
1 exp{2d/6 2 }
J 2 exp l 2
nz x) _
IY ^z
p(t; x, y) = 2t exp 2 sin(irxu) sin(ityu) du
}n2o2tu2
2
_ exp 1 J exp{ - 2 } sin(Zx) sin(ZY) dZ
-
t( a 2 x) 2 Q Z o 1. ))
2 2t)
2_ exp p(y i J exp{
l ort z ).
x) / 0
2t ) 1
(
= exp { (y x)
l Q z 2vz (2na 2 t) 1 / 2
(x y) z (xexp+
y)
z
x exp (8.39)
2o 2 t 2vzt
Integration of (8.39) with respect to y over (0, oo) yields p(t; x, {0}`), which
is the probability that, starting from x, the first passage time to zero is greater
than t.
Examples 3 and 4 yield the distributions of the maximum and the minimum
and the joint distribution of the maximum, minimum and the state at time
t of a diffusion with constant coefficients over the time period [0, t]
(Exercises 4-6).
Assume, as always, that Condition (1.1) holds. Criteria for transience and
recurrence of diffusions may be derived from the computation of i/i(x) given in
Section 4 (see Eq. (4.17)). A different method of computation based on solving
a differential equation governing ifr is given in this section; recall Chapter III,
Eqs. 2.4, 2.5 in the case of the birthdeath chain for the analogous discrete
problem. The present method is similar to that of Proposition 2.5, but does
not make use of martingales. It does, however, use the fact derived in Exercise
3.5, namely,
Proof. Let i denote the first passage to the set {c, d} with two elements. The
event
{i > h} - {c < min{X: 0 < u < h} < max{X: O <, u < h} <d}
Here X,; is the shifted process {(X,; ),} := {X,, ,: t >, 0} and tr' = inf{t: (X,) = c}
(= T h on {r> h}). Now extend the function i/i(x) over x <c and x > d
smoothly so as to vanish outside a compact set in S. Denote this extended
function also by ay. Then (9.3) may be expressed as
=
J
I (y)p(h; x, y) dy + o(h), as h j 0. (9.4)
s
The remainder term in (9.4) is o(h) by the third condition in (1.2)'. Actually,
one needs the stronger property that PX(maxos5shlX5 xj > E) = o(h) (see
Exercise 3.5). For the same reason,
Hence
But the limit on the left is (Ai/i)(x) = Za 2 (x)^/i"(x) + p(x)/i'(x) (see Section 2).
n
with the usual convention that 1(z, x) _ I(x, z). Fix an arbitrary x 0 E S, and
define the scale function
d2 d
A = 2 6 2 (x) dx2 + (x) x (9.11)
d d
= (9.12)
A dm(x) ds(x)
(df (x)
(Af)(x) =d (9.13)
dm(x) ds(x)
d
(9.15)
dm(x) ds(x)
which yields
s(d)
cl
--) = 1- c2 = c i s(d) = (C.
) (9.17)
s(d)s(c) s(d)s(c)
The relation (9.18) justifies the scale function nomenclature for the function s.
When distance is measured in this scale, i.e., when s(y) s(x) is regarded as
the distance between x and y (x < y), then the probability of reaching c before
d starting at x is proportional to the distance s(d) s(x) between x and d. This
is a property of Brownian motion, for which s(x; 0) = x (also see Eq. 9.6,
Chapter 1). In particular, starting from the middle of an interval in this scale,
418 BROWNIAN MOTION AND DIFFUSIONS
the probability of reaching the left end point before the right end point is Z,
the same as that of reaching the right end point first.
It follows from (9.18) (Exercise 2) that
e_'' dz
. (9.20)
Jc
(b)If
exp{-1(x o , z)} dz
s(b) s(x)= fX' (y < x). (9.25)
pxY s(b) s(y)
f b exp{ 1(x o , z)} dz
J
'
Proposition 9.4
(a) Suppose S = [a, b) and a is reflecting. Then the diffusion is recurrent if
and only if s(b) = oo. If, on the other hand, s(b) < oo then one has p 1
for y > x and
b exp{-1(x 0 , z)} dz
s(b)s(x) =
Pxv = y<x. (9.26)
s(b) s(y)
exp{ I(x o ,z)} dz
fb y
s(x) s(a)
if0<x<'y,
s(y) s(a)
Px y = I ifs(b)=oo
and 0,<y<x, (9.27)
s(b)s(x)
if s(b) < oc and 0 ,< y ,< x.
s(b) s(y)
s(b) s(x)
a y x<b,
s(b) s(y)
Px y = (9.29)
s(x) s(a)
a<x<y^b.
s(y) s(a)
with the usual convention that t y (uw) = oo for those sample points w for which
X,(oo) does not equal y for any t (i.e., if the set on the right side of (10.1) is
empty). If c < d are two states, then write
where A denotes the minimum, i.e., S A t = min(s, t). Then i is the first time
the process is at c or d, provided X. e [c, d].
Let us now calculate the mean escape time M(x) of a diffusion from an
interval (c, d),
M(x):= E x T, (10.3)
Note that M(x) = 0 if x (c, d). Here [c, d] c S. Now denoting by Xti the
shifted process {(Xh ),:= Xk+( : t > 0}, one has
since i(X,;) = z({(X,; ),}) is precisely the additional time needed by the process
{X1 }, after time h, to escape from (c, d). Because 1{,>k} is determined by (i.e.,
measurable with respect to) {X: 0 < u < h}, it follows by the Markov property
that
1
lim - {(Th M)(x) M(x)} _ 1, (10.7)
Noh
i.e., (AM)(x) = 1 for x e (c, d). Therefore the following result is proved.
Proposition 10.1. The mean escape time M(x) from (c, d) satisfies
d dM(x)
1, (10.9)
dm(x) ds(x)
dM(x) =
m(x) + c 1 , M(x) = c,s(x) f m(y) ds(y) + c 2 . (10.10)
ds(x) cx
^ d m(y) ds(y)
cz = c,s(c). (10.11)
c ' s(d) s(c) '
z
S(C; X) d
M(x) = m(c; y)s'(c; y) dy m(c; y)s'(c; y) dy
s(c; d) ff
422 BROWNIAN MOTION AND DIFFUSIONS
e-rt^.:^ dz
_ jd d y
^2(z) eic'.Z) dz ] e_'.
dY
J e-,cc,=) dz < <
^ x y 2
^ LJG 6 2 e''
dz e `c'.y dY (c < x < d). (10.12)
-
)
Equation (10.9) justifies the nomenclature speed measure given to m. For suppose
the diffusion is in natural scale, i.e., that it has been relabeled so that s(x) = x.
Then (10.9) may be expressed as
only if
2 e`cX.z) dz - X
m(a):_ m(x o ; a):=
f
. a. a2(Z) a a'
2
W
e-(z,x) dz < oc .
(10.18)
Then (Exercise 4)
Finiteness (with probability 1) of the passage time t y under every initial state
is the meaning of recurrence of a diffusion. A stronger property is the finiteness
of their expectations.
and
For intervals S having boundary points one may prove the following result.
The modifications that give this result are left to exercises.
Proposition 10.3
(a) Suppose S = [ a, b) with a a reflecting boundary. Then the diffusion is
positive recurrent if and only if s(b) = cc and m(b) < oo.
(b) Suppose S = [a, b] with a and b reflecting boundaries. Then the diffusion
is positive recurrent.
Take Q to be the set of all possible. paths, i.e., the set of all continuous functions
aw on the time interval [0, co) into the interval S (state space). In this case X,
424 BROWNIAN MOTION AND DIFFUSIONS
is the value of the trajectory (function) at time t, X1 (w) = w,. The sigmafield F
is the smallest sigmafield that includes all finite-dimensional events of the form
{X^. e B, for I < i ,< n}, where 0 .< t 1 <t 2 < <t, and B 1 , .... B are
subintervals of the state space S. The probabilities of such finite-dimensional
events are specified by the transition probability p(t; x, dy) of the diffusion and
an initial distribution it (see Eq. 1.4). This determines P on all of .F .
Let .. denote the sigmafield of events determined by finite-dimensional events
of the form {X1 , e B L for I <, i < n} with t, <, t for all i. Thus, f, = a{Xs : 0 < s < t}
is the class of events determined by the sample path or trajectory of the diffusion
up to time t.
This says that, for a stopping time i, whether or not to stop by time t is
determined by the sample path or trajectory up to time t. Check that constant
times T - t o are stopping times. The first passage times t y , ; A T d are important
examples of stopping times (Exercise 1). By the "past up to time t" we mean
the pre-T sigmafield of events .FT defined by
{T<oo}eWit . (11.3)
The proof of the equivalence of (11.1) and (11.2) is a little long and is omitted
(see theoretical complement 1). For our purposes in this section, we take (11.2)
as the definition of .yt .
It follows from (11.1) or (11.2) that if i is the constant time r - t o , then
.; =^
As intuition would suggest, if t, and r 2 are two stopping times such that
r 2 then
F ci2 . (11.4)
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 425
For, if A e 34 ,, then
Since A n {z, < t} E .F, and {z 2 > t} _ {T Z < t}` E .y the set on the right side
of (11.5) is in .F,. Hence A e.yr2 .
Another important property of stopping times is that, if i 1 and r 2 are stopping
times, then r, A i 2 - min{r,, i 2 } is also a stopping time. This is intuitively
clear and simple to prove (Exercise 2). It follows from (11.4) that
c 5b Ft
A cz C 9 n ^`tz
Finite-dimensional events in J1t are those that depend only on finitely many
coordinates of the stopped process, i.e., events of the form
The sigmafield . is precisely this class of sets G. The Markov property holds if
the "past" and "future" are defined relative to stopping times, and not merely
constant times. For this the past relative to a stopping time T is taken to be
the pre-T sigmafield information contained in the stopped process {X,,, T }. The
future relative to to r is the after-T process Xt = {(XT )1 } obtained by viewing
{X,} from time t = r onwards, for r < oo. That is,
Since the after-T process is defined only on the set {r < oo}, events based on
it are subsets of {r < co}. The after -i sigmafield on IT < cc} comprises sets of
the form
If (11.12) holds, then for every bounded .Ft measurable function Z we have
Conversely, if the equality between the first and last terms in (11.13) holds for
all bounded measurable functions Z of the stopped process, then (11.12) holds.
Indeed, to verify (11.12) it is enough to check (11.13) for finite-dimensional
functions Z of the form (11.7), where f is bounded and continuous on S'
(Exercise 3). Now
since XSJA t ,, ... , X,, A are determined by {Xs : 0 < s <, s ; }, i.e., are measurable
with respect to ^sJ ; also,
E(.f(XS ; A , .
.. , X,, t ,) 1 {==s,}[Ey h(Xt,, ... , Xt:})],,=x, j )
Therefore,
This completes the proof in the case r has countably many values
o s 1 < S 2 < .
The case of more general r may be dealt with by approximating it by stopping
times assuming countably many values. Specifically, for each positive integer n
define
1
L if j=0,1,2,...
Z n = (11.17)
0o if r = oo .
Since
^ Tn = jn 1
< 2n t< 2
j J_ f t \ 2n J'f T J 2n }
RE^
{;t}= U fT n =^n r for allt >,0.
j:j/2"-<t )))
Therefore, T n is a stopping time for each n and i n (w) j i(w) as n oo for each
w e Q. Also, .F, c 3 (11.4). Let f, h be bounded continuous functions on
S', S', respectively. Define
Then
PP(Y) =
J 9(Yi )p(ti;Y,Yi)dYi, (11.19)
Fix x E (c, d) and It > 0 such that [x h, x + h] c (c, d). Writer = T x _ h A 'rx+n,
i.e., r is the first time {X} reaches x h or x + h. Then Px (i < oo) = 1, since
x+nexp^_(y_X)Zl dy
Px(i>t)<PP(xh<X,<x+h)=
( 21
1
f
(27CV2t)l^Z x_h
2o2t J}
_ 1
(2 )I Z
^ ^
f _ '
exp{
,41,7
z2
dz -*0 as t -+ cc. (11.23)
Now,
Therefore,
Therefore,
dx
OX)=dc (11.29)
which is a special case of (2.24), or (4.17) or (9.18); see also Eq. 9.6 of Chapter I.
Now, by (11.23) and (11.29),
where T b (X) is the first hitting time of b by the after-T a process X. Since this
last hitting time depends only on the after-T a process, it is independent of T a
which is measurable with respect to .F. Hence T b Ta - Tb(X o) and T a are
independent, and the distribution of T b T a under P0 is the distribution of T b
under Pa . This last distribution is the same as the distribution of T b _ a under
P0 . For, if {X1 } is a Brownian motion starting at a then {X, a} is a Brownian
motion starting at zero; and if T b is the first passage time of {X,} to b then it
is also the first passage time of {XX a} to b a.
If 0 < a, <a 2 < ... <a are arbitrary, then the above arguments applied
430 BROWNIAN MOTION AND DIFFUSIONS
to
Figure 11.1
Proposition 11.2. Under P0 , the stochastic process {Ta : 0 < a < o} is a process
with independent increments. Moreover, the increments are homogeneous, i.e.,
the distribution of T b t o is the same as that of T d ; if d c = b a; both
distributions being the same as the distribution of r b _ a under P0 .
where t 0 (Y) is the first passage time to a of { }}. Note that ra (Y) = r a . By
Proposition 11.3, the extreme right side of (11.34) equals Po (X1 < 2a x, z a <- t).
Hence,
P0 (X, > a) = P0 (X, < a, M, >, a) = P0 (M, >, a) P0 (M, >, a, X, > a)
= PO (M, > a) PO (X, > a) (11.36)
Hence,
2 1c0 YZI(2a2t) dy
PO(ta < t) = P0 (M, % a) _ e-
(2ita2t)I 2 a
2
(2)1/2 e - = 212 dz. (11.38)
t 3/2 e a2 /(2ort)
-
0 < t < co. (11.39)
fa(t) = u(27r)1/2
9,(a) = 2 e
- a2 /(2,,2f) = 2Q1
( 27ra 2 t) 1J2 \a
aj tJI , cc <a < oo , (11.40)
F =
- )/(a lt) ( 2irl)
e =/2 dz, co
(11.41)
<ya, a0.
Differentiating successively with respect to y and a, one gets the joint p.d.f. of
432 BROWNIAN MOTION AND DIFFUSIONS
(X1 , MM ) as
h (y a) = 2(2a y) j 2a
r a 3 t 312 y
\ t7 \/ /
2 2ay 2
exp' (2ay) co< ,<a,
0,<a<oo .
g Q' t 3 / 2 p 2Q2t y
(11.42)
is i.i.d., where
r_X
1
t
lim f(X.,) ds =
t o
f fs(x)n(dx), (12.4)
1 d2
2 (a 2 (x)rc(x)) x ((x)n(x)) = 0 for x e S, (12.5)
d
or simply of
ir(x) = m (x)(12.7)
m(b) m(a)
Proof. Positive recurrence implies E5 (rl 2r r12._2) = E Y ri 2 < oc. Hence, by the
classical strong law of large numbers,
r
(flr' 12(r -1))
- r' 1
X12r
* Ey112 (12.8)
r r
lim 1
rc, fl2r
f 0
f(XS ) ds = lim
t ao t 0
f f( X ) ds, (12.10)
EI J 0
n2 I f(X,)I ds < oo . (12.11)
for all f satisfying (12.11). In the special case f = 1 8 with B a Borel subset of
S, (12.12) becomes
r
1
lim 1 8 (X3 ) ds = ir(B) (12.13)
t- r tfo
where
n(B)'= 1
Ep^i2
fon 2
EY 1 B (XS ) ds. (12.14)
Thus, (12.13) says that the limiting proportion of time the process spends in a
set B equals the expected amount of time it spends in B during a single cycle
relative to (i.e., divided by) the mean length of the cycle. It is simple to check
from (12.14) that n is a probability measure on S (Exercise 2). Also, if
f = a1 , where a l , a 2 , ... , a are real numbers and B 1 , B 2i ... , B are
pairwise disjoint (Borel) subsets of S, then
Y a ^rl: 1
1 E5 f(Xs) ds =E 1 8 r (X.) ds
E5,12 0 2 i= 1 0
_ a n (B ) =J f(x)ic(dx).
i ;
The equality
n= ('
J s
may now be extended to all f satisfying (12.11) (Exercise 2). Combining (12.12)
and (12.15), one has
lim 1
r- ,O t 0
r
f
f(X5 ) ds = sf(x)n(dx) (12.16)
INVARIANT DISTRIBUTIONS AND THE STRONG LAW OF LARGE NUMBERS 435
1 J (T
(' t
lim S f )(y) ds = f (x)n(dx) (12.19)
r - .0 t o s
for all y E S. But the left side also equals, for any given h> 0,
1 t+h 1 t 1 t ('
= J (T f)(x)n(dx) (12.20)
h
The last equality follows from (12.19) applied to the function Th f. It follows
from (12.19) and (12.20) that, at least for all bounded (measurable) f, one has
JI (T f )(x)n(dx) = f f(x)n(dx)
s
h
s
(h > 0). (12.21)
Specializing this to f = 1 B one has (T h f)(x) = p(h; x, B), so that (12.21) yields
=f J s
1 B (x)rc(dx) = J x(dx) = T(B) = P,,(Xo e B),
s
(12.22)
proving that it is an invariant probability. The proof of parts (a), (b) of Theorem
12.2 is now complete, excepting for uniqueness. Uniqueness may be proved in
the same manner as in Section 6 of Chapter II (Exercise 3).
In order to prove (c), first note that n(x) given by (12.7) is a probability
density function (p.d.f.) which satisfies (12.6) and, therefore, (12.5). To prove
its invariance one needs to check (see (12.14)), for n given by (12.7),
1
YIl
E
z
E E x .f(X:) ds = .f(z)n(z) dz =
s i (b) m(
s
)d (
-
) (12.23)
0 Jz a
f
.f(X,) ds = Y b .f(u) dm(u) ds(z).
Ex o
JJ
(12.24)
EJ n2 f(X)ds=
0
E !
' f(Xs) ds + E x J 0
tY .f (XS) ds
r y /(' 6 \ 6
= I ( J f (u) dm(u) I ds(z) = s(x; y) f(u) dm(u). (12.25)
J X \ n / a
In particular, taking f - 1,
ap(t
f p(t; x, y) 7 r(x) dx =
s s
,
t
y)
f
m(x) dx = (Ax p(t; x, y))rti(x) dx
s
= Jp(t; x, y)(A*rc)(x) dx = 0,
s
(t > 0). (12.27)
Since fs p(t; x, y)rc(x) dx is the p.d.f. of X, when the initial p.d.f. is ir(x), it follows
from (12.27) that the distribution of X, does not change for t > 0; but X, Xo
a.s. as t 10, so that X, converges in distribution to ir(x) dx. Therefore, X, has
p.d.f. n(x) for every t > 0.
In view of (12.28) one may concentrate on the behavior of just one excess
demand, say z l (x). Price adjustments in a market usually depend on excess
demands. Since x 2 = 1 x 1 , one may consider the price X 1 (t) of the first
commodity as a diffusion on (0, 1) with a drift function determined by the excess
demand z l at this price. For example, take the generator of such a diffusion to
be of the form
= 12
62 O
x ,O
dz + x d (0 <x<1), (12.29)
dx2 dx
so that the mean rate of change in price is proportional to the excess demand.
The drift function z 1 (x) satisfies
This says that if the price of commodity I is very low (high) its demand is very
high (low), so the price will tend to go up (down) fast at the mean rate z,. Also,
assume that there is a constant r > 0 such that
Qz (x) , Q. (12.31)
and a diffusion coefficient 2 (y) that represents the (time) rate of local fluctuation
(variance) when the process is at y. If one assumes
J
1
o
'
(f(X) E n f) ds (13.2)
In order to use Theorem 13.1, one must be able to express the variance
parameter in terms of the drift and diffusion coefficients. A derivation of such
an expression will be given now.
Assume E,j = 0 (i.e., replace f by f E rz f ). Under the initial distribution
n the variance of (13.2) is given by
Var>< Jf(X) ds = E(
o t o
J f(X.,) ds) = 1 E n J
t J
o 0
f(X)f(X5 ) ds' ds
,
r r r s
_
o
? En
o J f(X)f(X)ds' ds = ?
o 0
En(f(XX).f(X5))
_?JJ
t o
S
oo
E><[f(X5)E(f(X,) 1 X .)] ds' ds
5
2 r s
J
('
Now
E,c(.f(Xs.)(T5 - 5.f)(X))=
f s
.f(x)(TS-s-f)(x)n(x)dx. (13.4)
1 r 2 r s
J
Var n o f (XS) ds
=
fs ( fo I (Ts
2 r
t
s f)(x) ds' ds)f (x)n(x) dx
exists in L2 (S, it) (see remark following proof of Theorem T.13.4, page 515).
Then
1.
lim 1
t .' ot JS (T f)(x) duds = h(x),
o
(13.7)
(13.10)
A = Zo2(x) - + (x) d ,
together with boundary conditions if S has a boundary. For this, note that, by
(13.6), for > 0,
= lim
S I s
+E (Tf)(x) dv = lim
J e
S (T f)(x) dv
= lim
s'^^ o
J S (Tj)(x) dv
J0
(T^f)(x) dv = h(x) J (T^f)(x) dv.
0
(13.11)
Hence,
(TEh)(x) h(x) 1 E
Ah(x) = lim = um-- (T f)(x) dv = (T o f)(x) = f (x).
ej0 E sf.0 E 0
(13.12)
These calculations provide the following result.
2<f, h> := 2
f
s
f(x)h(x)n(x) dx (13.13)
It may be shown that if there exists a function h in LZ (S, it) satisfying (13.9)
for a given f in LZ (S, it) with Ej = 0, then such an h is unique up to the
addition of a constant (see theoretical complement 4).
The condition E n f = 0 ensures that in this case the expression (13.13) does
not change if a constant is added to h. Also note that, in order that (13.9) may
admit a solution h, one must have Ej f = 0. This is seen by integrating both
sides of (13.9) with respect to it(x) dx and reducing the first integral by
integration by parts. That is,
Ah(x)it(x) dx =
JS f s
h(x)(A*n(x)) dx
= fhs
(x)[Z(u'(x)n(x))" (p(x)n(x))'] dx = 0, (13.14)
using (12.5).
A k- dimensional standard Brownian motion with initial state x = (x ( ' ) , x (2) , ... , x)
is the process {B, = (B, .. . , B"): t 0} where {B, }, 1 < i < k, are k
independent one-dimensional Brownian motions starting at x ' (1 < i < k). It
(
(Y ` x(`))
I
( )
k I ex { _
(2nt) 1 / 2 p 2t ?^
where
As in the one-dimensional case, the Markov property also follows from the fact
that {B: t > 0} is a (vector-valued) process with independent increments. It is
straightforward to check that p satisfies the backward equation
ap_1k ^ a 2 p
(14.2)
442 BROWNIAN MOTION AND DIFFUSIONS
ap - 1 k02p
(14.3)
t 2 a=1 (a y (I)) 2
A Brownian motion {X, = (X,(' ) , . .. , X,(k) )} with drift vector p = ( (1) , ... , (k) )
and diffusion matrix D = ((d ;; )), is defined by
1
p(t; x, y) -
((21rt)'/ 2 ) k (det D)" 2
( 1 k k
x ci) - t (i))
x expS -- Y I d''(y(`) - x ( ` ) - t()(yc;) -
` 2 t;=ii=1
(x,yeRk;t>,0). (14.5)
Here ((d")) = D -i* . This transition probability density satisfies the backward
and forward equations (Exercise 3)
a 1 k k a2 k a
U)
,
rat = 2 ; ^ j=1 d" ax ` axu + ; Y axe;)
( ) )
(14.6)
IC
a t__ 1d
l3 a 2 P - u ;;) aP
t 2;=1; =i
ay u)a x c;) ;=1 a y (j)
Kolmogorov equations
aP
t
1 Y_
aP __ 1
k
2, =1 i=1
k
a
I d,i(x) ax ^ ax ct^ + Y (x) ax
a 2 (dj;(Y)P)
1
3(f^(`^(Y)P)
^,^ ,
(14.7)
=1 a y <<^
t 2i=11=1 yu
4 l ( O(y)) does not go to infinity faster than lyl, and Id (y)I does not go to
;;
is the joint density of (X,,, X, ... , Xj where 0 < t 1 < t 2 < < t,. The
sample paths may also be taken to be continuous (vector-valued functions oft).
An alternative method of constructing the diffusion is to solve Itb's stochastic
differential equations:
where {B,: t >, 0} is a standard Brownian motion. The equations (14.8) (there
are k equations corresponding to the k components of the vector X,) may be
roughly interpreted as follows: Given {X: 0 < u <, t }, the conditional distribution
of the increment dX, = X, +d , X, is approximately Gaussian with mean (X,) dt
and variancecovariance matrix D(X) dt. We will study the Ito calculus in detail
in Chapter VII.
The operator
k k
12
A:= d,i(x) a2 +
x^`^ xci^
Y j(x)axti)
a (14.9)
^.i =1 r=1
is called the (infinitesimal) generator of the diffusion on R k with drift p() and
diffusion matrix D(.).
444 BROWNIAN MOTION AND DIFFUSIONS
Example 1. (The Bessel Process or the Radial Brownian Motion). Let {B,} be
a k-dimensional standard Brownian motion, k> 1. Let
k
O(x) = Ixl = 5" (x) 21 ^ 2 R,:= q(B,) = IB,I
( =J
Let us show that {R} is a Markov process on the state space g= [0, co).
Let f be an arbitrary real-valued, bounded measurable function on S. Write
g(x) = (f o ip)(x) = f (IxI). Using the Markov property of {B,} one has
E[.f (R,+s) I {R,,: 0 u s}] = E[E(f (R 1+S ) I {B,,: 0 s u s}) I {R: 0 u s}]
= E[E(g(B, +S ) I {B: 0 u s}) I {R: 0 u s}]
= E[(T,g)(B5) I {R: 0 u <, s}], (14.10)
(T,g)(x) = g((y))p(t; x, y) dy
fR k
f
2
= i(IYI)(2^ct) k^2 eXp _ ly 2a x l dy. (14.11)
Rk
To reduce the last integral, change to polar coordinates. That is, first integrate
over the surface of the sphere {lyl = r} with respect to the normalized surface
area measure da, and then integrate out r after multiplying the integrand by
INTRODUCTION TO MULTIDIMENSIONAL BROWNIAN MOTION AND DIFFUSIONS 445
x12+IYI?1
ex { IY xl Z ^ da = ex {_I
ffl yl =r} p 2t p 2t J
x
flI exp lx
ZI =1}
lIYI
t i=1
wu ) zc` ) }da i (z) (14.12)
)))
where w = x /Ixl, z = Y/IYI Note that E;` =1 w ( zW is the cosine of the angle
between the unit vectors w and z, and its distribution under da l does not depend
on the particular unit vector w. In particular, one may replace w by (1, 0, ... , 0).
Hence,
(14.13)
where (Exercise 9)
1
e "(1 uz)(k zu2 du -
h(v):=
f
ilzl =11
e Z `
da, _
J
' 1 (14.14)
1
(1 uz)(k-2)I2 du
Thus,
w k e - I=I 212 i f( r ) e -.=I2(h( j
(T1g)(x) = (2mt) - ki 2 r x ll r k-I d r
= G(t, Ixl),
J o ` t )
(14.15)
say. Using (14.15) in (14.10) one gets the desired Markov property of {R 1 },
E[.i(R, +s) I {R: 0 < u 5 s }] = E[ i(t, IX5I) I {R u : 0 < u < s }] ='(t, Rs).
(14.16)
Further, it follows from (14.15) that the transition probability density function
of {R 1 } is
z
9(t; r, r') = (21rt)-ki2wk
exp _ r ^rt
lhl,k ^. 2 r zl (14.17)
Thus, the transformed Markov process has as its infinitesimal generator the
Bessel operator
Ar. 1 dz + k 1 d
(14.19)
2 dr 2 2r dr
It will be shown later that the state space of {R,} may be restricted to (0, cc),
so that (k 1)/2r in (14.19) is defined.
It may be noted that not only is {R,} Markovian, its distribution under P,,
is Q 4,( . ) ; where P. is the distribution of the Brownian motion {X,} when X 0 = x,
and Q, that of {R} when R o = r. This is easily checked by looking at the joint
distribution, under Px , of (R,., ... , R) by successive conditioning with respect
to {X: 0 '< U
Let us make use of the above reduction to calculate the probability (for
given 0<c<d<cc)
Px ({X 1 } reaches aB(a:c) before aB(a:d)), (14.20)
for c<Ixal<d.
e - ' , 'I dr
0(Ix al) _ ^" dH^ (14.23)
f c
e -'..) dr
^'
dy= log(C) k- ( 14.24)
I(c,r) Jar Q Z (y) dy= ^^rk y I
so that
logdloglx al
ifk =2,
logd lagt (
c k 2 (C ) k
-2`14.25)
^G(Ix al) _
Ix alJ d
c k-2 if k > 2,
(k -)
for c<Ixal<d.
Further, letting dl oo in (14.25) (and (14.21)), one arrives at
If k 2
Px ( {X 1 } ever reaches B(a:c)) = 1 k_2 (c < Ix al).
ifk>2,
xca
(14.26)
the countability of the state space, the random walk reaches every state with
positive probability. This is not true for multidimensional Brownian motions.
Indeed, by letting c 10 in (14.25) one gets for 0< jx aj < d,
Proposition 14.1
(a) If A transforms functions of q(x) into functions of cp(x) then {cp(X,)} is
a Markov process.
(b) If P. denotes the distribution of {X} when X 0 = x and Q that of {cp(X,)}
when gp(X o ) = u, then the P. distribution of {cp(X,)} is Q 4, .
This statement needs some qualification, since not all bounded measurable
functions are in the domain (of definition) of A (see theoretical complement 1).
if t < i o ,
= ( 15.1)
` X ift>r
where t a, G is the first passage time to G, also called the first hitting time of G,
defined by
The Markov property of {JZ,} follows from calculations that are almost the
same as carried out for Theorem 7.1 (Exercise 1). Its transition probability is
given by
Now {Xj has a transition probability density, say, p(t; x, y), and
B c G, x e G,
p(t; x, B) = J (
B
p (t; x, y) dy,
15.5)
PX(zaG<t,X uG c-B) Bc2G, xeG,
,
p(t; u, v) = exp{
u^(V u) p It }(2 ma^t)-viz
2 z )))
1
x exp
(v u) z
exp _
(v u) 2 11(15.6)
2Q, t 2v; t
u) ds (15.9)
PP(Tac '< t) =
fo, fq,u,(s;
where
z
iiz exp _ (u s
.%1,1(s; u) = u(2^ols3)
)
(15.10)
2s
Q;
for u > 0.
For k> 1, the (k 1)-dimensional Brownian motion {(X 2 ,. . . , X; k) )} is
independent of {X1 1) } and, therefore, of the first passage time
Therefore, for every interval I, c (0, cc) and every Borel subset C of l " ' it
follows that
Hence, for x, y e H k ,
k
p (t; x, Y) = p(t; x(') y
( )) fl p;(t; x , (15.14)
i=z
V/(x, y') = ()
f6 2 ' ,,(s; x ' ) fl p (s, x ti) yti)) ds (y' E OG; x e H k ). (15.16)
fo "o i=z
452 BROWNIAN MOTION AND DIFFUSIONS
J x(')(2i exp{
i(x, Y) =
0
a2)-kizs-(k+z)iz
2Q2s L
^(x(r))2 + I (y(i) xci))2]} ds
z J
Jx (1) 2kt2 I
t (krz)- 1e-^ dt
s t = [( x (1))2 + Y_ (y x j ) 1'2a s.
( ) 2
2
L z J
It is interesting to note that, for k = 2, (15.17) is the Cauchy distribution.
The explicit calculations carried out for the example above are not possible
for general diffusions, or for more general domains G. One may, however, derive
the linear second-order equations whose solutions are p (t; x, dy). The function
p (t; x, y) satisfies Kolmogorov's backward equation (see theoretical complement 2)
x, y) _ k d.. x a2p(t;
x, y) + '` (^) x aP (t; x, Y) __ A o
F^ O ax( , ) P ,
at 2 j "O ax (O ax(,)
(t>0,x,yEG), (15.18)
lim
p (t; x, y) = 0 (t > 0, ye G). (15.19)
x-a 3G.x G
= ^i(x; B)
B)p (t; x, y) dy (t > 0, x c G), (15.20)
where
Since p (t; x, y) is determined by (15.18) and (15.19) (along with the initial
condition lim, 10 p (t; x, y) dy = S x for x E G), it remains to determine fir. Now,
for fixed x, the hitting distribution O(x; dy) on G is determined by the collection
of functions
p 1 (x) '=
fac
f(y)^(x; dy) = Ex(f(X^) 1 (^<^}) (x E G),
(15.22)
for arbitrary bounded continuous f on 13 G. Let us show that
Al f (x) =0 (xEG),
(15.23)
lim tG f (x) = f(a) (a c 1G).
x a
Drop the subscript f and assume for simplicity that O(x) can be extended
to a twice-differentiable function i on Q that belongs to the domain of A, the
infinitesimal generator of the unrestricted diffusion {X 1 } (theoretical complement
4). Then, for x E G, assuming PP (r ac < t) = o(t) as t j0, (see Exercise 14.10),
proceed as in the proof of Proposition 9.1 to get, writing r for ' OG
Hence,
T^i(x) >y(x) = 0
A(x) = At (x) = lim (x e G), (15.25)
1 1 0t
establishing the first relation in (15.23). The second relation in (15.23), namely,
continuity at the boundary, may also be established under broad assumptions
(theoretical complement 2).
If G is bounded and aG smooth then the so-called Dirichlet problem (15.23),
for a given continuous f specified on G, can be shown to have a unique solution
by the so-called maximum principle (see theoretical complement 3). A proof of
this principle is sketched in Exercises 10-12 for the case A = A is the Laplacian
I' 02
A = (15.26)
ca> 2 '
G = {Ixj = a} for some a > 0. The case k = 1 is dealt with in Example 8.3, in
detail. Let us consider k> 1. The hitting distribution i/i(x; dy), i.e., the
Ps -distribution of X L , c is called the harmonic measure on G. Its density fi(x, y)
with respect to the surface area measure s(dy) on G (or the arc length measure
in the case k = 2) is the Poisson kernel,
a 2 Ix1 2
i(x; Y) =(IYI = a), (15.27)
cokalx yl k
where w k is the surface area of the unit sphere (Exercise 7). The solution to
(15.23) is
a2 _ 2 (Y)
s(dy). (15.28)
Of(x) o
IX1 t^y^ =o} IX
SOG i/i(x; y)s(dy) = 1 for each x e G, and that i(x, y) dy converges weakly to
0 (dy) as x 4 y o E G are left to Exercise 11.
O(x):= PX ( {X,} reaches OB(0; c) before aB(0; d)), (0 < c < lxl < d),
(15.29)
is the solution to
A(i(x) = 0 for 0 < c < ixi < d,
lim (x)=
x y ,^x^<a
10 1 if lyi = c,
if lYl = d.
(15.30)
To see this take G in (15.22) and (15.23) to be the annulus G = {c < lxl <d},
and f to be 1 on {ixi = c }, and 0 on {lxl = d}. Recall that the analog (9.2) to
(15.30) was solved in Section 9 to compute i,/i for the derivation of criteria for
transience and recurrence of one-dimensional diffusions. In the case of
multidimensional Brownian motion, (15.30) reduces to a two-point boundary
value problem, such as (9.2), in a single variable r = lxi. This equation was
solved in Section 14 to derive criteria for transience and recurrence of multi-
dimensional Brownian motion. Unfortunately, for general multidimensional
diffusions the Dirichlet problem (15.30) cannot be solved explicitly. It is possible,
however, to use a multidimensional analog of Corollary 2.4 to derive lower
and upper bounds for the probability i. Notice that the arguments sketched
in Section 2 remain valid for multidimensional diffusions and, hence, Corollary
2.4 is valid with S = Il k . Such a generalization is also derived in Chapter VII,
Corollary 3.2, by the method of stochastic differential equations.
In order to derive lower and upper bounds for ', some notation is needed.
Let F be a given real-valued twice continuously differentiable function on (0, oo).
Consider the radial function
x(')
x(^)f(x) = IXI F'(ixl),
z (x(`))2 1 - (x ( ' ) ) 2 \
(ax(i)2 f(x) _ ^ xI2 F (IXI) + IXI (IXI), (15.32)
IXI3 F
3(2) x(i)xU) x(i)x(1)
Also write,
d(x):= Y d..(x)x''
ixt2
' x
( ) )
C(x):=2 x p (x),
I
(15.33)
B(x) + C(x)
(r) := max d(x) 1,
Ixl =r
O B(x) + C(x)
r min
d(x)
Ixl_r 1,
f d
exp{ - 1(u)} du J exp{ I(u)} du
Proof. Let
J r exp{ -1(u)} du
F(r):= ('`d .f(x):= F(Ixi) (c < Ix! < d). (15.37)
J
exp{ 1(u)} du
Then F'(r) >0 and F"(r) + F'(r)(r)/r = 0, for r > c. Hence by (15.35) and the
definition of , writing r = jxl,
A x A
f <, max i(Y) c F'(r)(r)
F"(r) + -- = 0, (15.38)
d(x) I,1= d(y) r
so that
Z, :=f(X,) J 0
Af(X S )ds (t '>0) (15.40)
with
E,,.f(X^) =f(x) + Ex J Af(X.,) ds < f(x) (c < Ixl < d). (15.43)
0
On the other hand, f (X t ) is 1 if {X} reaches B(0; d) before OB(0; c), and is 0
otherwise. Therefore, (15.43) becomes
Subtracting both sides from 1, the first inequality in (15.36) is obtained. For
the second inequality in (15.36), replace by in the definition of f in (15.37)
to get A f (x) >, 0, and E x f (X r ) >, f (x). n
Corollary 15.2. If the conditions (1)-(4) of Section 14 hold, then for all x,
Ixt > C,
for all x.
Proof. (a) Fix x 0 , y (x o y) and c > 0. Let d > max{1x 0 1, lyl + s}. Choose c,
0 < c < d, such that {Ixl = c} is disjoint from B(y; e). Define the stopping times
Now the function x - PX (IX,I = d for some t > 0), (x) < d, is the solution to
the Dirichlet problem (15.23) with G = B(0; d), and boundary function f - 1
on {Ixl = d}. But i(x) - 1 is a solution to this Dirichlet problem. By the
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 459
This means
A,,:= { {X, } does not reach B(y; s) during (ten, r 21]} (n > 1). (15.55)
Then
where
P 0 X,} does not reach B(y; E)) < P, 0 (A I n . n A) < (1 8 0 ) for all n,
({
(15.61)
which implies (15.48), with x = x 0 .
(b) The proof is analogous to the proof of the transience of Brownian motion
for k >, 3, as sketched in Exercise 14.5 and Exercise 18. n
First consider the case k = 1. This is dealt with in detail in Subsection 6.2,
Example 1. In this case a reflecting Brownian motion {X,} with zero drift and
diffusion coefficient a 2 > 0 is given by
Y = All (16.1)
given by
g1(t;x,y)= p1(t;x,y)+p1(t;x, y) (x,y% 0 ), (16.2)
_(y x Z
p, (t; x, y) = (2na2t)_ 1/2 exp 2
( X, y e W). (16.3)
2Q t )
a t = i 62 A.q(t; x, y),
k
(t > 0; XE
a2
Hk , ye Hk), (16.5)
where
=j ax
x = ( x (1 1 x 121 x ).
(16.8)
Now extend the coefficients /1(1)(.), d ;; () on all of 11" by setting, on (Hk )`,
aP(t; x, y) ap(t; X, Y)
at = at
1 k k
It follows from (16.7), (16.9), (16.12), and (16.13) that {X} and {X t } have
the same drift and diffusion coefficients and, therefore, the same transition
probability density, i.e.,
The backward equation for p leads, by (16.10), to the backward equation for q,
k 2 k
1 d(x) ax(^ ax (;) + ^ ()(x) (x E Hk ; t > 0). (16.15)
a^ =
The second equation in (16.10), together with the assumption that p(t; x, y) is
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS 463
aq(t;x, y)
=0 (t > 0; xEaHk ,yEHk ). (16.16)
Definition 16.1. A Markov process on the state space Hk that has continuous
sample paths and whose transition density q satisfies (16.15), (16.16) is called
a reflecting diffusion on Hk with drift coefficients ( ` 1 (x) and diffusion coefficients
d(x).
' (x)=0
( ) forxeHk , (16.17)
is really not needed for the validity of Proposition 16.1. The condition (16.17)
ensures that the extended drift coefficients in (16.9) are continuous. If it does
not hold, extend the coefficients as in (16.9) and modify (')() on OHk by setting
it zero there. Although the extended 'I() is no longer continuous, it is still
(
possible to define a diffusion having these coefficients and Proposition 16.1 goes
through (theoretical complement 1).
The second requirement in (16.7), i.e., the condition
is of a different nature. The next proposition shows that the failure of (16.18)
means that the direction of reflection is no longer normal to the boundary.
First consider a one-dimensional reflecting Brownian motion on [0, co) with
mean drift it 'I and diffusion coefficient 1. It may be constructed by the method
described in the proof of Proposition 16.1, after setting the drift p"I on
( oo, 0) and modifying the drift at 0 to have the value zero as described above
(also see Section 6). Let q l denote the transition density of this reflecting
Brownian motion. Let p j denote the transition probability density of a
one-dimensional Brownian motion (on R) with drift and diffusion
coefficient 1,2 < j < k, i.e.,
- " 2 exp (y x t ^2
p;(t; x,y):_(2nt) ) . (16.19)
2t
Then
k
pj ( c; x ye>),
q(t; x, y) := qi(t; x I ye>) [] (x, y E Hk), (16.20)
I= 2
backward equation
k
aq(t; x, y) 1 a 2 9 ^;, aq
(t>0; xEHk ,yEHk ), (16.21)
at = 2 ; _ , (ax<<>)2 +
aq(t;x,y)
= forxeaHk , (t>O;yeHk ). (16.22)
ax" ^
( 3f(x)
(grad = , ... , of (x) (16.24)
ax(1) ax(k)
Also write D" 2 for the positive definite matrix such that D 1 / 2 D 1 / 2 = D, and
D 1/2 for its inverse.
where 0 is an orthogonal transformation such that 0' maps D" 2 y/ID" 2 yJ into
e:= (1, 0, ... 0), and {Y} is a reflecting Brownian motion on Hk having drift
vector v:= (D 112 O) - 't and diffusion matrix I.
Proof. Let {Y,} denote a reflecting Brownian motion on the half-space Hk and
0 an orthogonal matrix as described. Then {Z, }, defined by (16.27), is a Markov
process on the state space H (Exercise 3). Let q, r denote the transition
probability densities of {Y, }, {Z, }, respectively. Then, writing T:= D 112 0,
k
aq(t; x, y)
=2exq+ ;^1 v ( ` a q 16.29
)
( )
at ax(`)
and that, for every real-valued twice-differentiable function f on W' the following
standard rules on the differentiation of composite functions apply (Exercise 4):
This is used with f as the function x q(t; x, T -1 w), for fixed t and T - 'w, to
arrive at (16.25) using (16.28) and (16.29).
The boundary condition aq/ax^`> - (egrad q) = 0 on OHk becomes, using
the first relation in (16.30),
ar(t; z, w) 1 k 32r k ( i) ar
(z) az^i) (16.33)
at 2 ,, ^ d " as<<^ a z ( +
Write G:= {p(x) > 0}. Let D(x):= ((d ; (x))), and u ' (x) (1 < i < k) satisfy the
( )
Definition 16.2. Let {X,} be a Markov process on the state space G in (16.34)
that has continuous sample paths and a transition probability density q
satisfying the Kolmogorov backward equation
k 2 k
ag(t; x, y) = Aq := 1 Z d,j ( x ) a g + 1 u^i^(x) qa ,
at 2 i . j = 1 ax(`) ax j> ( i = 1 ax(f)
and diffusion coefficients d(.) (1 < i,j < k). The vector D(x)(grad cp)(x) at
x e aG (or any nonzero multiple of it), is said to be conormal to the boundary
at x. The reflection of {X,} is said to be in the direction of the conormal to the
boundary.
Example 2. The reflecting standard Brownian motion {X,} on the unit ball
B(0:1) = {x e R': Ixl < 1} has a transition probability density p satisfying the
backward equation
ap(t; x, Y) = i
!A p(t; x, Y) (t > 0; IXI < 1 , IY1 , 1), (16.38)
at
y- 0) ap(t; x, Y) =
0 (t > 0; Ixl = 1, IYI < 1). (16.39)
ax ' ( )
It is useful to note that the radial motion {R,:= IX,I} is a Markov process
on [0, 1]. This follows from the fact that the Laplacian A. transforms radial
functions into radial functions (Proposition 14.1) and the boundary condition
is radial in nature. Indeed, if f is a twice continuously differentiable function
on [0, 1] and g(x) = f(Ixl), then g is a twice continuously differentiable radial
function on 9(0:1) and the function
au(t, x)
ZAXu(t, x), (t > 0; Ixl < 1),
at
k
au(t,
x(i' x) = 0, (Ixl = 1), (16.41)
av(t, r) 1 az v k 1 av
= ----- + ------- (t>0;0 r<1),
at tare 2r ar'
av(t, r) -
- 0, (t > 0; r = 1), (16.42)
ar
It is not difficult to check that the function v(t, Ixl) satisfies (16.41) (Exercise
5). By uniqueness of the solution to (16.41) it follows that u(t, x) _ v(t, Ixl).
468 BROWNIAN MOTION AND DIFFUSIONS
1d2 k-1 d
A ,:=--+ ,
2 dr z 2r dr
with boundary condition d/dr = 0 at r = 1. Recall (see Eq. 14.29, Section 14)
that 0 is an inaccessible boundary for {R,}. Hence, a reflecting boundary
condition attached to the accessible boundary r = 1 suffices to specify the
Markov process {R,}.
Let c(t; y) denote the solute concentration at a point y at time t. The velocity
of the liquid is along the capillary length (i.e., in the yU ) direction) and is given,
as the solution of a linearized NavierStokes equation governing a steady
nonturbulent flow, by
F(y') = u0 ( i --i
a
^z
. (17.2)
is the 3 x 3 identity matrix. When this solute particle reaches the boundary clG,
it is reflected. In other words, the location {X, = (X; ", X;^^, X^ 3 ^) = (X^' ) , X,)}
of the solute particle is a reflecting diffusion on G = G u G having drift
(x) = (F(x'), 0, 0) and diffusion matrix D.I. Therefore, its transition probability
density p(t; x, y) satisfies Kolmogorov's backward equation,
x )
x
p +x ( ' ) b i
x
p =n(x)gradp =0 (xcG,t>0;ye6). (17.4)
Here, n(x) = (0, x') is the normal (of length a) at the boundary point
x = (x", x').
The FokkerPlanck or forward equation may now be shown, by arguments
entirely analogous to those given in Sections 2 and 6, to be
the disc Ba := {ly'I <, a} with diffusion matrix D 0 I', where I' is the 2 x 2
identity matrix.
2. X,(' = Xo' + f F(X) ds + Do B where {B 1 } is a one-dimensional
)
By the Markov property of {XI}, the integral on the right does not involve xi ".
Now the disc Ba = {ly'I <, a} is compact and p'(t; x', y') is positive and
continuous in x', y' e BQ for each t > 0. It follows, as in Proposition 6.3 of
Chapter II, that for some positive constants c 1 , c 2 , one has
where y(y') is the unique invariant probability for p'. Let us check that y(y') is
the uniform density,
(17.15)
(t; Y,) :=
J ^ c(t; y) dy ' ' c'=
ao
( )
(fa co(dx))'7za2
exponentially fast as t > oo. Of much greater interest, however, is the asymptotic
behavior of the concentration in the y ' -direction, i.e., of ( )
e(t; y
)) := dy'I.
7 (1 .16)
y) dy' = p(t; x, y)
fj c(t;
Iy'l a) J _co(dx)I
G IJy'^ a)
The study of the asymptotics of c(t; y ") is further simplified by the fact that
(
the radial process {R,:= IX,'1} is Markovian. This is a consequence of the facts
that (1) the backward operator ZD.A.. of {X;} transforms all smooth radial
functions on the disc B o into radial functions and (2) the boundary condition
(17.12) is radial, asserting that the derivative in the radial direction at the
boundary vanishes (see Proposition 14.1). Hence {R,'} is a diffusion on [0, a]
whose transition probability density is
where s,(dy') is the arc length measure on the circle {Iy'I = r' }. The infinitesimal
generator of {R;} is given by the backward operator
_ (d 2 1 dl
0 < r < a (17.18)
AR A dr 2 + r drJ'
= 0.
dl
dr r = a
(17.19)
One may arrive at (17.18) and (17.19) from (17.17), (17.13), and (17.14), as in
Example 2 of Section 16 (see Eq. 16.42 for k = 2). It follows from (17.10),
(17.11), and (17.17) that {R,'} has the unique invariant density
2r
ir(r) = z , 0 <, r <, a, (17.20)
a
and that
2r
max q(t; r, r') r 27cac 1 e - ` 2t (t > 0). (17.2)
o,r.rsa a
472 BROWNIAN MOTION AND DIFFUSIONS
48 0
Finally, since {B} and {X,'} in (17.8) are independent it follows that, as n 00,
in distribution. Here
a2U2
o , (17.30)
D 48D 0+ D
CHAPTER APPLICATION 473
is the large scale or effective dispersion along the capillary axis. Equations (17.29)
and (17.30) represent Taylor's main result as completed by R. Aris (theoretical
complement 1). The effective dispersion D is larger than the molecular diffusion
coefficient Do and grows quadratically with velocity.
Actually, the functional central limit theorem holds, i.e., { Y} converges in
distribution to a Brownian motion with zero drift and variance parameter D.
Also, for each t > 0, the convergence in (17.29) is much stronger than in
distribution. Indeed, since the distribution of Y,(" is the convolution of those
of Z,(" ) and Brownian motion B, (see (17.8(ii))), Y,(" has a density that)
converges to the density of N(0, Dt) as n --> oo. Hence, by Scheffe's theorem
(Section 3 of Chapter 0), the density of Y," converges to that of N(0, Dt) in
)
the L' -norm. Since the distribution of X,(' ) has the density c(t;)/Co , where
C o := $,3 c o (dx) is the total amount of solute present, the density of Y^' is given by
I E
-l^( E -z t; E -i z + zU0 E -2 t)
(2nCt)ii2 exp{ - 2
dz-+0 as v J, 0.
(17.32)
From a practical point of view, (17.33) says that at time t' much larger than
the relaxation time over which the error in (17.10) becomes negligible, the
concentration along the capillary axis becomes Gaussian. The center of mass
of the solute moves with a velocity ZU 0 along the capillary axis, with a dispersion
D per unit of time. The relaxation time in (17.10) is 1/c 2 , and c 2 is estimated
by the first (i.e., closest to zero) nonzero eigenvalue of ZD o i X . on the disc Ba
with the no-flux, or Neumann, boundary condition.
It remains to give a proof of the representation (17.8). First, let us show that
{X, = (X; 1 ", X;)} as defined by (17.8) is a time-homogeneous Markov process.
Fix s, t positive, and let the initial state be x o = (x'I, xo'). Write
= E[E(g(X, +S ) {B:0 u,< t}, {X: 0,< u<, t}) ^ {X: 0,< u,< t}]. (17.35)
Now,
X;+s). (17.36)
g(X1+5) = 9(X}' , +
foS F((X; + )u)du+ (B1+sB1),
E(h(s, (X, + F
f
o ((X;+)))du, X
) +s {B: 0 < u < t}, {X: 0 < u < t}
(17.37)
where
To evaluate (17.37) use the facts that (1) X,(' ) is determined by the conditioning
variables, (2) {X;} is independent of {B,}, so that the conditional distribution
of X given {B,,: 0 < u < t}, {X: 0 < u < t}, is the same as that given
{X: 0 < u < t}. Then (17.37) becomes
expectation given X 0 = x,
where g', g" are the first and second derivatives of g. It follows that
a
( at
T,9(x) I
/t_o
= F(x')g'(x") + ZDo g"(x" i ^). (17.43)
But the right side is Ag, where A is as in (17.3). Therefore, the infinitesimal
generator of {X,} agrees with that of {Z I } for functions depending only on the
first coordinate. In particular, the backward equation for T, g becomes
EXERCISES
2. Show that if (1.2) holds then so does (1.2)' for every e > 0.
3. Suppose that {X,} is a Markov process on S = (a, b), and (p is a continuous strictly
monotone function on (a, b) onto (c, d).
(i) Prove that {(XX )} is a Markov process.
(ii) Compute the transition probability density for the process {X,:=exp{B,}} in
Example 1.
4. Consider the Ornstein-Uhlenbeck velocity process { Y,} starting at V o = v. Let
y = /m (Example 2).
(i) Show that
az
Cov(V, I;+) = (1 - e - ' 7 )e - Y5 , s > 0.
Y
(ii) Calculate the limit of the transition probability density p(t; x, y) in (1.6), as
t--. oo,if(a)y<0,or(b)y>0.
(iii) Define the Ornstein-Uhlenbeck position process by X, = x + f o V. ds, t -> 0.
(a) Show that {X} is a Gaussian process.
(b) Show EX, = x + vy '(1 - e Y`).
(c) Show
z z z
4e-rr-1e-zyr
VarX,=a t-3v +v
Y z 2 Y' 2Y z Y Y
(d) Show
z
COV(Xs , Xs+t - Xs) = (1 - e)(1 - e) 2 .
Y
2. Show that the derivations of (2.13) and of the backward equation (2.15) do not
require the assumption of the existence of a density of the transition probability.
EXERCISES 477
3. Let f be a twice continuously differentiable function on [c, d]. Show that given &> 0
there exists a twice-differentiable g on l' such that g = f on [c, d] and g vanishes
outside [c - e, d + J.
4. (i) Show that (1.2) in Section 1 holds uniformly for all x in the following cases:
(a) U(x) =0,a 2 (x) =a 2 .
(b) u(x) = , a 2 (x) = a 2
(ii) Show that, for the Ornstein-Uhlenbeck process, PP (IX, - xI > r) = o(t) does not
hold uniformly in x.
5. Let p(t; x, y) be the transition probability density of a diffusion on V8' and A a real
number. Write q(t; x, y) exp{ - At} p(t; x, y).
(i) Show that q satisfies the Chapman-Kolmogorov equation (2.2), and the
backward and forward equations
at = (x) 2q,
+ Za2(x) ax
ax -
aq
at
- ay a (u(y)q) + ya (za (y)q) - Aq.
z
2 2
where {X} is a diffusion on W. Show that the operators T, have the semigroup
property, and that u(t, x):= (T, f)(x) solves the initial-value problem in (ii) with
A=
(v) Note that (T, f)(x) is finite (indeed, T, f is bounded if f is bounded), if A(x) >_ 0.
One may also express T, f as
where conditionally given the sample path {X s : s >_ 0}, the killing time ^ (X , E has
the (nonhomogeneous) exponential distribution
The killing times ^ fx . ) may take the value + co with positive probability. If {X,}
is defined on a probability space of trajectories (52,,.F 1 , P,), then show that by
enlarging the space appropriately one may define both {X,} and on a
common probability space (52,.x, P). [Hint: First construct the product space
(Q 2 , .Fz , Pz ) for an independent family of random variables { } indexed by the
set S2, of all trajectories co,. That is, 0 2 = X w En , Icu,, where lo = [0, oo]
for all w l E Q. Then take the product space (S2 = X S 2 Z, = 1 OO Wiz,
P = P 1 X P2).]
(vi) (FeynmanKac Formula) If A(x) is bounded below and continuous, and not
necessarily nonnegative, show that T, f given in (iv) is well defined, defines a
semigroup, and solves the initial-value problem (ii) with A = A(x).
(Adjoint Diffusions) Let p(t; x, y) denote the transition probability density of a
diffusion on F1' with coefficients satisfying Condition (1.1). In addition, assume that
z
(o2(x))d (x)=0, (xeR').
2dxz
(i) Show that P(t; x, y):= p(t; y, x) is a transition probability density of a diffusion
on F1' and compute its drift and diffusion coefficients. What is the forward equation
for p?
(ii) (Time Reversibility) Let {X,} and {Y} be diffusions with transition probability
densities p and p" as above, with X o =_ x, Yo = y. Prove that for arbitrary time
points 0 < t, < < t,, < t the conditional p.d.f. of (X,,, X,,, ... , X,) given
X, = y, evaluated at (x,, x 2 ..... x), is the same as the conditional p.d.f. of
(Y,, Y,, .. , Y) given Y, = x, evaluated at (x,,, x_,, .. , x,). Thus, the
sample path of one process over any finite time period cannot be distinguished
probabilistically from that of the other with the direction of time reversed.
7. (Sources and Sinks) Consider the FokkerPlanck equation with a source (or sink)
term h(t, y),
c(t, a2
(u(Y)c(t, Y)) + i (ia z (Y)c(t, y)). + h(t, y),
at Y
where h(t, y) is a bounded continuous function of t and y, and f Ih(t, x)Ip(t; x, y) dx < oo.
One may interpret h(t, y) (in case h _> 0) as the rate at which new solute particles are
created at y at time t, providing an additional source for the change in concentration
with time other than the flux. The contribution of the initial concentration c o to the
concentration at y at time t is
c, (t, Y)'=
J
co(x)p(t; x, Y) dx.
EXERCISES 479
The contribution due to the h(s, x) Ax As new particles created in the region
[x, x + ix] during the time [s, s + As] is h(s, x)p(t s; x, y) As Ax (0 _< s < t).
Integrating, the overall contribution from this to the concentration at y at time t is
c 2 (t, y) = J
f. gal
'
h(s, x) p(t s; x, y) dx ds.
lim
ajo PU
J h(t s, x)p(s; x, y) ds = h(t, x).
Show that this last condition is satisfied, for example, under the hypothesis of
Exercise 6. *Can you make use of Exercise 5 to verify it more generally?
(ii) Apply (i) to the case (x), a z (x), h(t, x) are constants.
(iii) Solve the corresponding initial-value problem for the backward equation.
(Y
1(x z):= ^ 2, u ) d Y
_
x 62 (y)
with the usual convention !(z, x) 1(x, z) for z < x. Fix an arbitrary x o e S, and
define the scale function
J
s(x)= exp{ 1(x 0 , z)} dz,
x
[Hint: Let f be twice continuously differentiable, such that (a) f (y) = 2 for Ix yj > 1,
(b) 1 <, f (y) <- 2 for Iy xl > e, (c) f (y) = Iy xl'/E 3 for ly xl e. Then
Px ^ max tf
ost-<h
(X,) J t
0
Af(X,) ds}
2
if h is chosen so small that max{IAf (y)I: ye 11'} -< 1/2h. By the maximal inequality
(see Chapter I, Theorem 13.6), the last probability is less than
I3
EXf 2 (Xh) 4 PX(IXh xl > e) + E x IX'^ x I(lXh_XI F) = o(h),
az
Cov,(V, , V +,) = 2- e - 'S, s > 0.
Y
(v) Calculate the average kinetic energy E(mV )of an Ornstein-Uhlenbeck particle
under the invariant initial distribution.
(vi) According to Bochner's Theorem (Chapter 0, Section 8), one can check that
p(s) = Cov,,(V V + ,) in (iv) is the Fourier transform of a measure It (spectral
distribution). Calculate p.
6. Briefly indicate how the Ornstein-Uhlenbeck process may be viewed as a limit of
the Ehrenfest model (see Section 5 of Chapter I11) as the number of balls d goes to
infinity.
7. Let
s (x) J
E
= pex ^-I 2ti(1)dy^dz
. xo a 2 (Y)
be the so-called scale function on S = (a, b), where x o is some point in S. If {X} is
a diffusion on S with coefficients p( ), a 2 ( ), then show that the diffusion { Y, := s(X,)}
on S = (s(a), s(b)) has the property
ap(t x, y)
; ^2
= -^ ^((Y)P(t;
yyz x, y)) + (ia2(Y)P(t; X, y)),
at
(t >0, -cc < x, y < oo), along the lines of the derivation of the backward equation
(4.14) given in this section. [Hint: p " +t = pap ] .
3N
T = T(v1, v2, ... , U3N) _ 'zmv; = zm IIvIIi, for v = (v1.... , V3N) E R 3N ,
j=1
where IlvlI2:=^; =Ni vj. Define equilibrium for a gas in a closed system of energy E
482 BROWNIAN MOTION AND DIFFUSIONS
to mean that the velocities are (purely random) uniformly distributed over the energy
surface S given by the surface of the 3N-dimensional ball of radius R = (2E/m)" 2 , i.e.,
f
S = ( VI, v2,
3N
, V3N) Y_vJ _
i =1 m
2E
(ii) Calculate the limiting distribution as N + oo and E + cc such that the average
energy per particle EIN (density) stabilizes to > 0.
(iii) How do the above calculations change when the 1 2 -norm 1 2 defined above is
replaced by the 1p norm defined by jIvIIP'= v, where p >_ 1 is fixed?
(This interesting generalization was brought to our attention by Professor
Svetlozar T. Rachev.)
u ^= x
( ) Z
(t>0,oo<x<oo),
=za axe+x
4. Using the construction of reflecting diffusions on [0, oo), [0, 1], construct reflecting
diffusions on [a, oo), ( co, a], [a, b] for arbitrary a < b. What are the analogs of
(6.7), (6.8) and (6.36)- (6.38) in these cases? Express the transition probability
densities for these reflecting diffusions in terms of appropriate unrestricted diffusions.
5. Suppose that cp is a measurable map on an interval S onto an interval S'. Suppose
that {X} is a Markov process having a homogeneous transition probability p. If
T,(f o ^p)(x) := E[ f ((p(X,)) I X 0 = x] is a function v 1 (t, p(x)) of qp(x), for every
real-valued bounded measurable function f on S. then prove that { Y, := cp(X,)} is a
homogeneous Markov process on S'.
6. Give a derivation of the forward equation (6.17).
7. Prove that the transition probability density q(t; x, y) of {Z} in Theorem 6.4 satisfies
(aq/ax)x -1 = 0.
8. (i) Assume that p(t; x, y) >0 for all t > 0, and x, y e ( oo, oo) for every diffusion
whose drift and diffusion coefficients satisfy Condition (1.1). Prove that the
Markov process {Z,} on the circle in Theorem 6.2 has a unique invariant
probability m(y) dy, and that the transition probability density q(t; x, y) of {Z,}
converges to m(y) in the sense that
(a) (T^f)(x) =
f( a.m)
f(Y)P (t; x, Y) dY (x > a),
(T.f )(x);= f( a, )
f(Y)P(t; x, y) dY,
(i) Prove that p is the density component of the transition probability of the diffusion
on [0, d] absorbed at the boundary {0, d}. [Hint: Check (2.30). (7.10), and (7.11)
on [0, d], as well as the initial condition: (T f)(x) -+ f (x) as t 10 for every
continuous f on [0, d] with f(0) = f(d) = 0.]
(ii) For the special case j(.) = 0, a 2 (.) = a 2 compute p.
EXERCISES 485
6. (Duhamel's Principle)
(i) (Nonhomogeneous Equation) Solve the initial-boundary-value problem
a u tazu
u(t, x) = u(x) za2(x) x), (t > 0, a <x < b);
at -x + c3x2 + h(t,
where h is a bounded continuous function on [0, cc) x [a, b], and f is a bounded
continuous function on [a, b] vanishing at a, b. [Hint: Let p be as in (7.17).
Define
u0(t, x)'=
f. J
' [a,b]
h(s, y)p (t s; x, y) dy, ui(t; x) =
ft a,b]
f(y)p (t; x, v) dy.
Check that u satisfies the nonhomogeneous equation and the given boundary
conditions, but has zero initial value. Then check that u, satisfies the
homogeneous differential equation (i.e., with h = 0), the given boundary and
initial conditions.]
(ii) (Nonzero Constant Boundary Values) Solve the initial-boundary-value problem
z
u(t, x) = u(t, x), (t > 0; a <x < b);
at axz
lim u(t, x) = a, lim u(t, x) = ;
:la xTb
au, = _a u 1 _ a u0
1Q2 _
(3t z x z at
with zero boundary conditions and initial condition lim, u, (t, x) = ]
(0, T] x (a, b), continuous on [0, T] x [a, b], and satisfy (au/at) Au <0 on
_<
(b) at a point (t, a) or (t, b) for some t e (0, T]. [Hint: Suppose not. Let the
_<
(0, T] x (a, b), where A = o 2 (x) d 2 /dx 2 + p(x) d/dx, with a 2 (x) > 0. Show that
the maximum value of u is attained either (a) initially, i.e., at a point (0, x o ), or
_<
points of a finite interval S.
2. Prove that
(i) if a is an eigenvalue of A (with Dirichlet or Neumann boundary conditions at
the end points of S), then a 0;
(ii) if a,, a 2 are two distinct eigenvalues and ei l 0 2 corresponding eigenfunctions,
,
then Ali, and i' 2 are orthogonal, i.e., <i/i,, i/i 2 > = 0;
(iii) if 0,, 0 2 are linearly independent, then 4,, 2 (,, 02> /II1Il 2 )O, are
orthogonal; if 0 1 , 0 2 , ... , 0, are linearly independent, and ,, ... , O k _, are
orthogonal to each other, then 0 ... , k_ 1, k ( Z,-i ;l 4,k) /11c6; II 2 )4 . ,
are orthogonal.
3. Assume that the nonzero eigenvalues of A are bounded away from zero (see
theoretical complement 2).
(i) In the case of two reflecting boundaries prove that
where m(y) is the normalization of n(y) in (8.1). Show that this convergence is
exponentially fast, uniformly in x, y.
(ii) Prove that m(y) dy is the unique invariant probability for p.
(iii) Prove (ii) without the assumption concerning eigenvalues.
4. By a change of scale in Example 3 compute the transition probability of a Brownian
motion on [a, b] with both boundaries absorbing.
5. Use Example 4, and a change of scale, to find
(i) the distribution of the minimum m, of a Brownian motion {X,} over the time
interval [0, t];
(ii) the distribution of the maximum M, of {X s } over [0, t];
(iii) the joint distribution of (X m,).
EXERCISES 487
6. Use Example 3, and change of scale, to compute the joint distribution of (X m, M,),
using the notation of Exercise 5 above.
7. For a Brownian motion with drift p and diffusion coefficient a 2 > 0, compute the
p.d.f. of T A r b when X0 = x and a <x < b.
8. Establish the identities
(na
2t) z ' Z
m
L
[
ex jl 3
= _ x 2azt
p ( - (2md +y -x) Z l
+ ex
p(
Sl
(2md +Y +x)' ll
2621 )7J
1 2
- + - ex z - cos cos
= m=1 p^- - -- --> m x /
\i \-
exp{-(2md + y Y^ - exp^-(2md + y + x)
(2na 2 t) - '/ 2
L 2a 2 1 ) 2a2t -^1
x,
2 a 2 rr 2 m 2 1 m rzy
_-- Z exp - sin ( rnnx
- si n - ) (0 x, y d).
d m = 1 2d2 d d
^
(z):=
m - m _
exp{ -nm z} _1
2 exp{ -nmz /z} _ --O`
1 (il
f \z / (z > 0).
^z m s, ^Z
[Hint: Compare (8.23) with (6.31), and (8.33) with Exercise 7.5(ii).]
9. (Hermite Polynomials and the Ornstein-Uhlenbeck Process) Consider the generator
A = ' d 2 /dx 2 - x d/dx on the state space W.
(i) Prove that A is symmetric on an appropriate dense subspace of L2 (!8', e - x 2 dx).
(ii) Check that A has eigenfunctions H"(x)'= ( -1)" exp {x 2 }(d "/dx") exp{ -x 2 }, the
so-called Hermite polynomials, with corresponding eigenvalues n = 0, 1, 2, ... .
(iii) Give some justification for the expansion
x t
Z c"e
p(t; x, y) = e 2 "'H"(x)H"(y), c "'=
10. According to the theory of Fourier series, the functions cos nx (n = 0, 1, 2, ...),
sin nx (n = 1, 2, ...) form a complete orthogonal system in L2 [-n, 7r]. Use this to
prove the following:
(i) The functions cos nx (n = 0, 1, 2, ...) form a complete orthogonal sequence in
L2 [0, it]. [Hint: Let f E L2 [0, n]. Make an even extension of f to [ -n, n], and
show that this f may be expanded in L2 [-n, it] in terms ofcos nx (n = 0, 1, ...).]
(ii) The functions sin x (n = 1, 2, ...) form a complete orthogonal sequence in
L2 [0, it]. [Hint: Extend f to [-it, n] bysetting,f( -x) = - f(x)forx e [ -n, 0).]
3. Suppose S = [a, b) and a is reflecting. Show that the diffusion is recurrent if and only
if s(b) = cc. If, on the other hand, s(b) < oo then one has p s,. = 1 for y> x and
s(b) s(x) _
J
x
n exp{ I(x o , z)} dz
Px r = Sib) s(Y) ----, y < x.
s
exp{ I(x o , z)} dz
[Hint: For y > x, assume the fact that the transition probability density p(t; x, y)
is strictly positive and continuous in x, y for each t> 0, to show that
b := min{P= (X o y): a -< z -< y} > 0 for t o > 0.]
4. Suppose S = [a, b) and a is absorbing. Then show that no state, other than a, is
recurrent and that
s(x) s(a)
ifa<x^y,
s(y) s(a)
Px y = I ifs(b)=oo,anda<-y<x,
s(b) s(x)
ifs(b)< cc, and a<y-<x.
s(b) s(y)
5. Suppose S = [a, b] with both boundaries reflecting. Then show that p x,, = 1 for all
x, y E S. and the diffusion is recurrent.
6. Apply Corollary 9.3 and Exercise 3 to decide which of the following diffusions are
recurrent and which are transient:
(i) S=(ac,00),(x)=p960,a z (x)=a 2 >0;
(ii) S=( cc, cc),p(x)=0,a 2 (x)=a z >0;
(iii) S = (cc, cc), (x) = x, a 2 (x) = a 2 >0 (consider separately, >0, < 0);
(iv) S = [0, cc), p(x) - p, a 2 (x) = a 2 > 0, 0 reflecting (consider separately the cases
P< 0 ,P= 0 ,u>0);
(v) S=(0,1),u(x)--* oo as x10,p(x)-+ oo asxji,a 2 (x)>a 2 >0 for all x.
7. Suppose S = [a, b] with a absorbing and b reflecting. Show that all states but a
are transient, p xy = 1 for y < x, and
s(x) s(a)
Px v Y > x.
= s(y) s(a),
8. Suppose S = [a, b] with both boundaries absorbing. Show that all states, other than
a and b, are transient and
s(b)s(x)
ayx<b,
s(b) s(y)
Pz y =
s(x) s(a)
a<xyb.
s(y)s(a)
EXERCISES 489
9. Let [c, d] c S. Solve the two-point boundary-value problem: A f (x) = 0 for c <x < d,
lim a 1 , f (x) = a, lim X1 d f (x) = , where a, are given numbers. Show that the solution
equals aP (r < < r d ) + Px (r a < t a ).
P
1. (i) Assuming that the transition probability density p(t; x, y) is positive and
continuous in x, y for each t > 0, show that M(x), defined by (10.3), is bounded
on [c, d]. [Hint: Use Proposition 13.5 of Chapter I.]
(ii) Let T be as in (10.2), x e (c, d). Assume that
for every e > 0 (this is proved in Exercise 3.5). Show that E (1,,, h } M(Xh )) = o(h)
X
f
>_
OT, n za
8. Show that, under natural scale (i.e., s(x) = x), m'1 (x) m'2 (x) for all x implies
M I (x) -> M 2 (x) for all x.
490 BROWNIAN MOTION AND DIFFUSIONS
for all Z of the form (11.7), m being arbitrary and f bounded continuous. Prove
that (11.12) holds. [Hint: Use (a) the fact that bounded continuous functions
on S"' are dense in O(S', y) where u is any probability measure, (b) Dynkin's
Pi-Lambda Theorem (Section 4 of Chapter 0).]
(ii) Prove that if (11.12) holds for arbitrary finite sets t, < t2 < ... < t; and bounded
continuous h, then the conditional distribution of X, given .F, is Px , on the
set {r < co}.
4. Prove that q(y) in (11.18) is continuous for every bounded continuous h on S', if
x p(t; x, dy) is weakly continuous (see Section 5 of Chapter 0, for the definition
of weak convergence). This weak continuity, also called the (weak) Feller property,
is proved in Section 3 of Chapter VII.
5. Let {XX } be a Brownian motion with drift p and diffusion coefficient a 2 > 0, starting
at x.
(i) Prove that the conditional distribution of {X; 0 _< u <_ t}, given X, does not
depend on p. [Hint: Look at finite-dimensional conditional distributions.]
(ii) Use (i) and (11.42) to compute the joint p.d.f. of (X M,), where
MM = max{X,,: 0 < u < t}.
(iii) Use (ii) to compute the distributions of (a) M, and (b) r a (a > x).
(iv) Check that the conditional distribution in (i), given X, = y, is the same as the
distribution of the process
{_o_ (yx))+x:Ou<'t1 1
where { }.: u >, 0} is a Brownian motion with zero drift and diffusion coefficient
a 2 , starting at zero.
6. Assume the result of Example 8.3, for the case p = 0. Show how you can use Exercise
5(i) to derive the joint distribution of (Xe , m, M,) where {X,), M, are as in Exercise
5(i), (ii), and m, = min{X: 0 _< u _< t}.
7. Let {X} be a Brownian motion with drift zero and diffusion coefficient az > 0,
starting at x. Let t =; A r d , where c <x < d. Show that the after-r process X, is
not independent of the pre-t sigmafield.
8. Let {X} be a Brownian motion with drift p > 0 and diffusion coefficient a 2 > 0,
starting at zero.
(i) Prove that {z Q : a - 0} is a process with independent and homogeneous
increments.
(ii) Find the distribution of r q .
(iii) What can you say concerning (i), (ii), if p < 0?
EXERCISES 491
9. Show that the proof of the strong Markov property (Theorem 11.1) essentially
applies (indeed, more simply) to the countable state space case (see Section 5 of
Chapter IV).
10. (i) Use Exercise 5(ii) to derive the transition probability density (component) of
a Brownian motion on ( oo, 0] with absorption at 0.
(ii) Derive the corresponding transition density on [0, co) with absorption at 0.
(iii) Use Exercise 6 to derive the transition probability density (component) of a
Brownian motion on [0, 1] with absorption at both ends.
11. (Method of Images) Consider the drift and diffusion coefficient g(.), c 2 () defined
on ( oo, 0]. Assume p(0) = 0. Extend p(.)o 2 (.) to R' as in (6.8) by setting
( x) = g(x), Q 2 ( x) = a 2 (x) for x < 0. Let {X,} be a diffusion on R' with these
coefficients, starting at x < 0, and let z o denote the first passage time to 0. Let { Y,}
be the process defined by
_ X, for t < T o
Y X, for >, t o .
(i) Show that {Y,} has the same distribution as
(ii) Show that Px (M, -> 0) = 2P,,(X, > 0), where M, := max{Xs : 0 -< s -< t }. [Hint:
Show that (11.34) (11.36) hold with 0 replaced by x and a replaced by 0.]
(iii) Show that Px (XX -< y, t o > t) = PX (X, >, y) for ally < 0. [Hint: Use the analog
of (11.35).]
(iv) Deduce from (iii) that
7. Prove that the diffusion in Example 2 is positive recurrent under the assumptions
(12.35).
1. Give a proof of Theorem 13.1 similar to the proof of Proposition .10.1 in Chapter II.
(Also, see the proof of Proposition 12.1.)
2. Consider a diffusion {X1 } on S = (0, 1], with "1" a reflecting boundary and coefficients
(x) = (k 1)/x (k > 1), v 2 (x) - a 2 . Apply Theorem 13.1 to compute the asymptotic
distribution of
where f (x) = 1 x 2 . What is E n f ? [Hint: Remember the boundary condition for A.]
ap
at =2 i =,
I Z " Z (U?(y Ii ')p) e ^ ay^,^ (f^(Y' )p)
(ay )
(iii) Let p(x) = yx, D(x) = a 2 I, with y > 0, a 2 > 0. Write down its transition
probability density p(t; x, y), and check that it satisfies the Kolmogorov
equations. Show p(t; x, y) converges to a Gaussian density as t cc, and deduce
that the diffusion has a unique invariant distribution.
2. Write down the analogs of (1.2), (1.2)' for a multidimensional diffusion.
3. (i) Show that a k-dimensional Brownian motion with drift p = (p". .. , p" k ") and
diffusion matrix D = ((d ;; )) is a process with independent increments, and that
its transition probability density p(t; x, y) satisfies Kolmogorov's equations
(14.6).
EXERCISES 493
By the strong Markov property and (14.26), P(r 2 " < oo) = (d/d,) (k-2 >", and
P(T2 1 < oo I TZ" < (Xo) = 1.
(c) By (b), P(T Z " = oo for some n) = 1.]
(ii) From (i) conclude that
P(IB,I-+coast-+ cc) =1.
7. Let {X,} be a k - dimensional diffusion with periodic drift and diffusion coefficients,
the period being d > 0 in each coordinate. Write Z;' := X;'(mod d), Z, )
(Z t (I) , Z't1k) )
(i) Use Proposition 14.1 to show that {Z,} is a Markov process on the torus
T = [0, d)" which may be regarded as a Cartesian product of k circles.
(ii) Assuming that the transition probability density p(t; x, y) of {X,} is continuous
[I - bdk]ttt -1 0
where [t] is the integer part of t, b'= min{q(1;z, z'): z, z' e T} and
>_
0= max{q(l;y,z')q(l;z,z'):y,z,z'c-T}.
sufficient condition (in terms of the coefficients of {X,}) for {R,:= IX,I} to be a
Markov process. Compute the generator of {R,} if this condition is met.
(ii) Assume that the sufficient condition in (i) is satisfied. Find necessary and sufficient
conditions for {X} to be (a) recurrent, (b) transient.
9. Derive (14.14).
10. (i) Sketch an argument, similar to that in Section 2, to show that Corollary 2.4
holds for diffusions on 11" (under conditions (1)(4) following Definition 14.1).
(ii) Use (i) to prove
for a k-dimensional diffusion (k > 1) {X,} such that the radial process {R,:= IX,I} is
Markovian.
5. Consider a standard k-dimensional Brownian motion (k > 1) on G = {x E R k : 0
for 1 <, i 5 k} with absorption at the boundary.
(i) Compute p (t; x, y) for t> 0; x, ye G).
(ii) Compute the distribution of r ac , if the process starts at x e G.
*(iii) Compute the hitting (or, harmonic) measure i/i(x, dy) on G, for x e G. (1'(x, dy)
is the Pr -distribution of X t G ).
*(iv) Compute the transition probability P(t; x, B) for x e G, B G.
6. Do the analog of Exercise 5 for G = {x e I: 0 _< x ' _< 1 for 1 < i _< k}.
(
7. Prove (15.27).
A = Z A ( A := ^ z
i =1 (ax )
EXERCISES 495
u(x) = u(y) + Y- (xW`W - y (0 ) au (Y) + 2 Z (xu) - y 'n)( x u> - yn) xO xu> (Y)
+ O(p 3 ),
(ii) f
IG
O(x; y)s(dy) = I for x E 3G, and
(iii) i(i(x, y)s(dy) converges weakly to b .(dy) as x -^ y o E 3G.
y
12. (Maximum Principle) Let u be harmonic in a connected open set G c 1& k . Show
that u cannot attain its infimum or supremum in G, unless u is constant in G. [Hint:
Use the mean value property.]
13. (Dirichlet Problem) Let G be a bounded, connected, open subset of R. Given a
continuous function q' on OG, show, using Exercises 10 and 12, that
(i) u(x):= E, e q (B) is a solution of the Dirichlet problem
and
(ii) if u is continuous at the boundary, i.e., u(y) = lim u(x) as x(e G) - y e 0G, then
this solution is unique in the class of solutions which are continuous on G.
496 BROWNIAN MOTION AND DIFFUSIONS
14. (Poisson's Equation) Let G be as in Exercise 13. Give at least an informal proof,
akin to that of (15.25), that
Tr'G
I x ' Ip''(x + z) -< S for Ixt >, M, where z e l8 k , 5>0, M >0 are given.
(
Apply Theorem 15.3 (and Exercise 19) to decide whether the diffusion is recurrent
or transient.
21. For a diffusion having a positive and continuous (in x, y) transition density p(t; x, y)
(t > 0), prove that the P,-distribution of T OB(O . d) has a finite m.g.f. in a neighborhood
of zero, if xl < d.
3. Prove that {Z,} defined by (16.27) is a Markov process on H = {x e 11": yx > 0}.
4. Check (16.30).
5. Check that the function v(t, jxj) given by the solution of (16.42) (with r = jxl) satisfies
(16.41).
THEORETICAL COMPLEMENTS
in Chapter VII, Corollary 3.2, that this martingale property holds for every twice
continuously differentiable f such that f, f', f" are bounded. Many important
properties of diffusions may be deduced from this martingale property (see Sections
3, 4 of Chapter VII).
2. (Progressive Measurability) The assumption of right continuity of t X,(w) for
every w e Q in Theorem 2.3 ensures that {X,} is progressively measurable with respect
to {.y, }; that is, for each t > 0 the map (s, w) + X(w) on [0, t] x S2 into S is measurable
with respect to the product sigmafield _4([0, t]) SF, (on [0, t] x S2) and the Borel
sigmafield .a(S) on S. Before turning to the proof, note that as a consequence of the
progressive measurability of {X} the integral f p Af (Xj ds is well defined, is
.y,-measurable and has a finite expectation, by Fubini's Theorem.
498 BROWNIAN MOTION AND DIFFUSIONS
Lemma 1. Let (S2, .y) be a measurable space. Suppose S is a metric space and
s --+ X(s, w) is right continuous on [0, oo) into S, for every aw e Q. Let {.y,} be an
increasing family of sub-sigmafields of .f such that X, is .F, measurable for every
t _> 0. Then {X,} is progressively measurable with respect to {. ~r }.
Proof. The first part is obvious. For part (b), fix t > 0. On the set 52, := {r < t}, XT
is the composition of the maps (i) a - (tr(w), w) (on f2, into [0, t] x S2,) and (ii)
(s, w) --> X (w) (on [0, t] x Q r into S).
5
{west,: (r(w),o)e[0, r] x A} = r} n Ac
Thus the map (i) is measurable with respect to In (on the domain St,) and
,1([0, t] (on the range [0, t] x S2 r ).
Next, the map (s, w) -+ XS (w) is measurable with respect to .q([0, t]) O .yr (on
[0, t] x f2) and .a(S) (on S), in view of the progressive measurability of {X 5 }. Since
[0, t] x Q, e ([0, t]) F,, and the restriction of a measurable function to a
measurable set is measurable with respect to the trace sigmafield, the map (ii) is
e([0, t]) SFJ,,- measurable. Hence w -. XT(m[ (W) is . I n ,-measurable on R.
Therefore, for every B e a(S),
d d
Mx 1, c<x<d, M(c)=M(d) =0.
(
dm(x) ds(x) ( )
For the class of diffusions considered in this book (see (9.9), (9.10), (9.13)),
Jx
s(x)= exp{I(x o , z)} dz, m(x) = f z^ z) exp{I(x o , z)} dz,
xp ,
X0
T,ff= J ( T,f)ds=
, ds
'
o
ATJds= J ' T.Afds.
0
E
In particular, the function t . T, f on (0, oo) solves the initial-value problem: for
f e 1,A , solve
The solution t --+ T, f is also unique. To see this, let u(t) be a solution. Fix t> 0 and
consider the function s --+ T,_ S u(s) (0 _< s 5 t). Now,
Tu(s) = lim
{(T,
h h u(s + h) T 1 _ s u(s + h)) + (T 1 _ g u(s + h) T,_.u(s))}
ds 1
t-s
du s
( )
Hence T,_ s u(s) is independent of s so that setting in turn s = 0, t, we get T,u(0) = u(t),
that is, T, f = u(t).
When is an operator A defined on a linear subspace Q of B an infinitesimal
generator of a contraction semigroup {T,}? The answer is contained in the important
Hille Yosida Theorem: Suppose 1 x is dense in B. Then A is the infinitesimal generator
of a contraction semigroup on B if and only if, for each . > 0, A A is one-to-one (on
9A ) and onto B with II(A A) - ' II < 1/2. A simple proof of this theorem for the case
B = C[a, b], the set of all continuous functions on (a, b) with finite limits at a and
b, and for closed linear subspaces of C[a, b], may be found in P. Mandl (1968),
THEORETICAL COMPLEMENTS 501
2 > 0; that is, . - A)g = f. Suppose that the resolvent operator (;. - A) ' is positive;
-
if f >_ 0 then (1 - A) 'f >_ 0. Then, by the uniqueness theorem for Laplace transforms,
T, f > 0 if f >_ 0. In this case f -^ T, f(x) is a positive bounded linear functional on
C 0 (a, b). Therefore, by the Riesz Representation Theorem (see H. L. Royden (1968),
Real Analysis, 2nd ed., Macmillan, New York, pp. 310-11), there exists a finite
measure p(t; x, dy) on S = (a, b) such that T, f(x) = $ f(y)p(t; x, dy) for every
f e C 0 (a, b). In view of the contraction property, p(t; x, S) < 1.
In order to verify the hypothesis of the Hille-Yosida Theorem in the case a, b are
inaccessible, construct for each A. > 0 two positive solutions u u 2 of (A - A)u = 0,
u, increasing and u 2 decreasing. Then define the symmetric Green's function
G,A (x, y) = W 'u,(x)u z (y) for a <x _< y < b, extended to x > y by symmetry. Here
-
of (A - A) ' follows from that of G A (x, y). Also, one may directly check that G.1 - 1/2.
-
This implies p(t; x, (a, b)) = 1. For details of this, and for the proof that a Markov
process on S = (a, b) with this transition probability may be constructed on the space
C([0, oo ): S) of continuous trajectories, see Mandl (1968), loc. cit., pp. 14-17,21-38.
4. For a diffusion on S = (a, b), consider the semigroup {T} in (2.1). Under Condition
(1.1), it is easy to check on integration by parts that the infinitesimal generator A is
self-adjoint on LZ (S, n(y) dy), where ir(y) dy is the speed measure whose distribution
function is the speed function (see Eq. 8.1). That is,
Af(Y)g(Y)n(Y) dy = ff(y)Ag(y)7r(y)dy
J
for all twice continuously differentiable f, g vanishing outside compacts. It follows
that T, is self-adjoint and, therefore, the transition probability density
q(t; x, y):= p(t; x, y) /n(y) with respect to n(y) dy is symmetric in x and y. That is,
q(t; x, y) = q(t; y, x). Since p satisfies the backward equation cap/c?t = A ' p, it follows
that so does q: cq/t = A x q. By symmetry of q it now follows that 3q /t = A y q. Here
Aq, A y q denote the application of A to x -. q(t; x, y) and y -^ q(t: x, y) respectively.
The equation tq/at = A y q easily reduces to cep/lit = A *p, as given in (2.41). For
details see Ito and McKean (1965), loc. cit., pp. 149-158.
For more general treatments of the relations between Markov processes and
semigroup theory, see E. B. Dynkin (1965), Markov Processes, Vol. 1, Springer- Verlag,
New York, and S. N. Ethier and T. G. Kurtz (1986), Markov Processes:
Characterization and Convergence, Wiley, New York.
502 BROWNIAN MOTION AND DIFFUSIONS
Proof. Taking the conditional expectation first with respect to a{Xu : u <, s} and then
with respect to a{cp(X): u < s),
I I
P(<a(X, +1) e B' c {q (XX): u s}) = E(p(t; X (p 1 (B')) a{co(X4): u s}).
THEORETICAL COMPLEMENTS 503
It has been pointed out to us by J. K. Ghosh that the idea underlying the
proposition also occurs in the context of sequential analysis in statistics. See W. J. Hall,
R. A. Wijsman, and J. K. Ghosh (1965), "The Relationship Between Sufficiency and
Invariance with Applications in Sequential Analysis, Ann. Math. Statist., 36,
pp. 575-614. Also see theoretical complement 16.3.
A maximal invariant of the reflection group G = {e, e} (ex := x, ex := x) is
(p(x) = Jx ( . Thus, {IX,l} is Markov if {X} and { X,} have the same transition
probability, i.e., if p(t; x, B) = p(t; x, B). The group G of transformations on 68'
generated by reflections around 0 and 1 (i.e., by g o x = x, g l x = g 1 (1 + x 1) _
1 x + I = 2 x) is infinite, and
1. It is shown in Example 14.1, that the so-called radial Brownian motion {IB,^} is a
diffusion on S = [0, co), such that "0 is inaccessible (from the interior of S). On the
other hand, if the process starts at "0 ", it instantaneously enters the interior (0, cc)
and stays there forever. Thus, although the diffusion may be restricted to the state
space (0, oo), "0" may also be included in the state space. The only other way of
including "0" in the state space is to make "0" an absorbing boundary, if the process
is to have the Markov property and continuous sample paths. The nonabsorbing
boundary point "0" in this case is called an entrance boundary. In general, an
inaccessible lower boundary a is an entrance boundary if
v= J a
xp
s(x)dm(x)> cc.
point cannot enter the interior; that is, no boundary condition other than absorption
is consistent with the requirement that the process be Markovian and have continuous
sample paths. Then the boundary is said to be an exit boundary.
In general, an accessible lower boundary a is exit if v a = oc, and an accessible
upper boundary h is exit if u b = co. For details see Ito and McKean, loc. cit., p. 108.
An example of a lower exit boundary is S = [0, oc ), (x) = ax, a 2 (x) = x with > 0,
which occurs as a model of growth of an isolated population (see S. Karlin and H.
M. Taylor (1981), A Second Course in Stochastic Processes, Academic Press,
New York, p. 239).
If one allows jump discontinuities at a boundary point, then the diffusion on
reaching the boundary (or, starting from it) may stay at this point for a random
holding time before jumping to another state. The holding time distribution is
necessarily exponential (see Chapter IV, Proposition 5.1), with a parameter 6 > 0,
while the jump distribution y(dy) is arbitrary. The successive holding times at this
boundary are i.i.d. and are independent of the successive jump positions, the latter
being also i.i.d. See Mandl (1968), loc. cit., pp. 39, 47, 66-69.
(Semigroups under Boundary Conditions) Let S = [a, b], p(. ) and a(. ) differentiable,
and a 2 (x) > 0 for all x e S. If Dirichlet or Neumann boundary conditions are
prescribed at a, b, then the Hille-Yosida theorem may be used to construct a
contraction semigroup {T,} generated by A on B:= C[a, b], or C 0 (a, b). As indicated
in theoretical complement 2.3 above, there exists a transition probability, with total
mass _< 1, such that Tj(x) = f f (y)p(t; x, dy) for all f c B. To give an indication how
the hypothesis of the Hille-Yosida Theorem is verified, consider Dirichlet boundary
conditions. That is, let B = C 0 (a, b), -9A the set of all Je B such that
Af := (d/dm(x))(d/ds(x))f(x) is in B. Fix A > 0, and let u u 2 be the solutions of
(it A)u = 0 with u i (a) = 0, ui(a) = 1, u 2 (b) = 0, ui(b) = 1. By the existence and
uniqueness theorem for linear ordinary differential equations, these solutions exist
and are unique. Define the Green's function G A (x, y) = W - `u 1 (x)u 2 (y) for x <, y, and
extend it symmetrically to x > y, with W as the Wronskian (see theoretical
complement 2.3). It is not difficult to check that for every f e B the function
G A f (x):= f G A (x, y)f (y) dm(y) is the unique element in B satisfying (2 A)G A f = f.
Since G A is positive and (G x 1)(x) may be shown directly to be less than I for all x,
the verification is complete, and we have a transition probability p(t; x, dy) with
p(t; x, S) < 1.
In the case of Neumann, or reflecting, boundary conditions at a, b, take B = C[a, b]
and 9,^ = the set of all f in B such that Af e B and f'(a) = 0 = f'(b). Then (.? A)
may be expressed as (A A) 'f = G z f + I I (f)u, + ! 2 (f )u 2 , where u I , u 2 , GA are as
-
above, and 1 1 (f),1 2 (f) are bounded linear functionals of f determined so that
g== (2 A) 'f satisfies the boundary conditions g'(a) = 0 = g'(b). The hypothesis of
the Hille-Yosida Theorem may be directly verified now. Since constant functions
belong to ,^ and (A A)(1/2) = 1, it follows that
Mandl (1968), loc. cit., Chapter II. The direct probabilistic constructions given in
the text (see Sections 6 and 7) do not make use of this theory.
2. (Eigenfunction Expansions) For a justification of the eigenfunction expansion
described in Section 8, consider first {T,} on C0 (a, b) under Dirichlet boundary
conditions. Extend T, to L2 ([a, b], a(y) dy = dm(y)). Now n(y) dy is an invariant
measure. For
dt
f T1f(Y)n(Y) dy = J ATf(Y)h(Y) dy = J Tf(Y)A *7 z(Y) dy = 0,
so that
(.? A) ' is the integral operator G., and it follows from a well-known theorem of
-
Riesz (see F. Riesz (1955), Functional Analysis, F. Ungar Publishing Co., New York,
Chapter VI) that (A A) ' is a compact self-adjoint operator whose eigenvalues
-
p(2) are positive and converge to zero as n ^ oo, and that the corresponding
normalized eigenfunctions 0. comprise a complete orthonormal sequence in L2 . As
a consequence, the eigenvalues of A are ;:= = A '(2) < 0, with the
corresponding eigenfunctions i/i,,.
Under Neumann boundary conditions, it follows from the representation
(A A) 'f = G x f + 1,(f )u 1 + 1 2 (f )u 2 that (A A) ' is again a compact self-adjoint
- -
Then in B. Belkin (1972), "An Invariance Principle for Conditioned Random Walk
Attracted to a Stable Law," Z. Wahrscheinlichkeitstheorie und Verw. Geb., 21,
pp. 45-64, it is shown that the meander process defined by (T.11.1) is a
continuous-parameter Markov process starting at 0 with transition probabilities given
by the following densities
(
D 1 ^(0, y)
p + (s, x; t, Y) = {(P1-s(Y x) cp,-s(Y + x)} 0 < s < t 1, x, y > 0,
ca l _,(0, x)'
where
Likewise, for the case of the excursion process defined by (T.11.2), one obtains a
continuous-parameter Markov process with nonhomogeneous transition law with
the following density (see Ito and McKean (1965), loc. cit., p. 76):
0<s<t<1, x,y>0.
The equivalence of these processes with those of the same name defined in
theoretical complements to Section I.12 is the subject of the paper by R. T. Durrett,
D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to Brownian Meander
and Brownian Excursion," Ann. Probab., 5, pp. 117-129.
can show that r < oo a.s., then the process {Z} defined by
Z,:=
(X,
Y, _>
for t <r,
for t 'r,
(T.12.1)
has, by the strong Markov property for the two-dimensional Markov process
{(X, ) ,) }, the same distribution as {X, }. In this case,
In order to prove r < oo a.s., it is enough to prove that the process {(X Y,)}
reaches every rectangle [a, b] x [c, d] (a < b, c < d) almost surely. For the process
must then go across the diagonal {(u, v): u = v} a.s. Let cp(u, v) denote the probability
that a pair of such independent diffusions starting at (u, v) ever reach the rectangle
[a, b] x [c, d]. Note that it x it is an invariant initial distribution for the
two-dimensional diffusion, and let {(U, V)} denote such a diffusion having the initial
distribution it x it. Let a denote the probability that the latter ever reaches
[a, b] x [c, d]. Also write D := {(U V) e [a, b] x [c, d] for some t > n }, and
F.:= {(U V,) e [a, b] x [c, d] for some t _< n }. Then a = P(D) + P(F n D.c). But, by
stationarity, P(D) = a. Therefore, P(F, n D ) = 0. By the Markov property,
P(F n Dc) = E'[IF.,( 1 ro(U,,, V'))] = E(P(F I (U,, V'))(l qp(U,,, l' )).
Now we may use the positivity of the transition probability density to show easily
that P(F, U = u, V. = v) > 0 for almost all pairs (u, v) (with respect to Lebesgue
measure on S x S). Therefore, I (p(u, v) = 0 for almost every pair (u, v). By using
the continuity of x p(t; x, y) it is now shown that (u, v) * c,(u, v) is continuous,
so that cp(u, v) = I for all (u, v). This proves that r < oo a.s.
E(X k,n
22 I p--
'^kI,n)>
2 =
Sk.n
k
2
62.,
i =1
kn
(b) p k , n () -+ exp{ X 2 /2} in probability.
Indeed, if J([ <p k , n (^)) - 'I -< b(^), then (a), (b) imply
E exp{iZSf , kn } E exp{i_Sn , k }
E ex
P{ i S n.kn } ex p{ Z/ }I P{ 2/2
2 ==ex
} exp{ Z2/2} 1 1 4'k.,,()
Sexp{Z2/2
Z 1 I .0.
}Elexp{
1 /2} f (Pk.n(Z)
(T.13.3)
Now part (a) follows by taking successive conditional expectations given .yk-1,n
(k = kn , kn 1, ... ,1), if (JJ ())-' is integrable. Note that the martingale
difference property is not needed for this. It turns out, however, that in general
(f cannot be bounded away from zero. Our first task is then to replace Xk,n
by new martingale differences Yk , n for which this integrability does hold, and whose
sum has the same asymptotic distribution as S., kn . To construct 1, first use
THEORETICAL COMPLEMENTS 509
Similarly, there exists, for each e > 0, a nonrandom sequence O(E) j 0 such that
A k (s):= {Qk, < L k. < O (E) , se.,, < 2 }, (1 _< k _< Q. (T.13.6)
k
P(Yk.n = Xk.n for 1 -< k < kn) >- Pn Ak,n(E)
k=1
z \l (^ \
^ kn() ^ 1 2 QkZn i E exP(i Yk.n } ( 1+ t(; Yk,n + (1 2 ) Yk.n
L 1 k-1
\\\ )1
=E
L 2 Yk
f
ol
(l u)(exp{iu^Yk } 1) du I ^k_ 1
"J^
JI
<E
II3
2 6k ' n + SzZ E(Yk.n
/ c-
ll^Yt.^^>e} ^-ln)
3
II akZ , + 2 E(Xk.n 1 Urk."I>E1 j Ak- l.n) (T.13.9)
j^
11 ^k.n(S) ^ - Qk n^ _ Y_k
x
f 1
SZ
k f//
11
"2 2 J
2 ak,n
1
and
J 2 ak? }
(T.13.11)
8 8 4
Therefore,
( ^2 Sa
1 j (Pk.n (^) exp1 2 Sn.k 1 1 3 E +X 2 0(E) + 4 8 n . (T.13.12)
By choosing E sufficiently small, one has for all sufficiently large n (depending on E),
Ifl gqk.n(^)I bounded away from zero (uniformly in n). Therefore, (a) holds for {Yk , n },
for all sufficiently small E (and all sufficiently large n, depending on E). By using
relations as in (T.13.3) and the inequalities (T.13.12, 13) and the fact that s k. - 1
in probability, we get
2
lira E exp{i^S k } exp --
n^m 2
( ^2
exP 2 ii') (T.13.14)
( 1
Finally,
_ z2 l
urn E exp {i S, k ,} exp
S }
n- W 2 )
THEORETICAL COMPLEMENTS 511
The extreme right side of (T.13.15) goes to zero as e j 0, while the extreme left does
not depend on e. n
Condition (ii) of the theorem is called the conditional Lindeberg condition. For
additional information on martingale central limit theorems, see P. G. Hall and
C. C. Heyde (1980), Martingale Limit Theory and Its Applications, Academic Press,
New York.
One may deduce from this theorem a result of P. Billingsley (1961), "The
Lindeberg-Levy Theorem for Martingales," Proc. Amer. Math. Soc., 12, pp. 788-792,
and I. A. Ibragimov (1963), "A Central Limit Theorem for a Class of Dependent
Random Variables," Theor. Probability Appl., 8, pp. 83-89, stated below.
such that {X: n _> 0} has the same distribution as {X: n >_ 0} has. This can be
accomplished by setting for each m, the finite-dimensional distribution of any m
consecutive terms of the sequence {X: n e Z) to be the same as the distribution of
(X0 ..... X, _ , ). Since this specification satisfies Kolmogorov's consistency
requirements, the construction is complete on the canonical space 08 1 . To simplify
notation we shall write X, instead of X, for this process.
If a 2 = 0, then X = 0 almost surely, and the desired conclusion holds trivially.
Assume a 2 > 0.
First, let us show that {X} so extended is a {3} -martingale difference sequence,
where ^ := a {X; : co <f _< n }. We need to show that, for each fixed n > 0,
E(Xn +l I `;n) = 0, i.e.,
`
E(1 X ,) = 0
A + for all A e .fF. (T.13.16)
This is true, e.g., for all f in case S is finite. In the case (T.13.17) holds, h satisfies
i.e., f' belongs to the range of I - T regarded as an operator on LZ (S,., iv). Now it
is simple to check that
Also, by (T.13.18),
n n-1
Zn = (h(Xm) - (Th)(Xm-1)) _ (h(X,) - (Th)(Xm)) + h(X) - h(X0)
m=1
nI
_ r' f'(Xm ) + h(X) - h(Xo ). (T.13.23)
m=0
Since
and
1 " 11
1
m
1=o f'(Xm) = Z
In
(h(X") h(X0)),
f
(T.13.24)
it follows that the left side converges in distribution to N(0, a 2 ) as n * Co. We have
arrived at a result of M. 1. Gordin and B. A. Lifsic (1978), "The Central Limit
Theorem for Stationary Ergodic Markov Processes," Dokl. Akad. Nauk SSSR, 19,
pp. 392-393.
Theorem T.13.3. (CLT for Discrete-Parameter Markov Processes). Assume p(x, dy)
admits an invariant probability n and, under the initial distribution n, {X"} is ergodic.
Assume also that f' := f f is in the range of I T. Then
Z" = g(X) J 0
Ag(XX) ds = g(X)
J o
f(X) ds (n = 1, 2,...)
a z = EIg(X,) g(X 0 ) J 0
1 Ag(XS) ds
j
2 . ( T.13.26)
514 BROWNIAN MOTION AND DIFFUSIONS
But
Therefore,
Also, for each positive integer n, {Zk/n - Z(k-1)/n 1 < k < n} are stationary martingale
differences, so that
n
Q 2 = E(ZI 4) 2 = E E(Zk/ n Z(k 1)/n) 2 = nE(Z11n 4)2
k=1
f
/n 2
f
/n
- 2nE (g(XI/n) - g(X0)) Ag(XX) ds . (T.13.27)
o,
Now,
= n 2 Eg 2 (Xo) - 2 J g(x)TI/ng(x)it(dx)]
[
(^(' 1/n
f J
= - 2n g(x)T,Ag(x) ds ir(dx) --o - 2
0, /n
(T.13.28)
g(x)Ag(x)rr(dx),
( E
lJn
o
1 2
g(XS) ds I-< E(!
/ n
1
fo
l/"
(g(Xs))2ds) _ - d
n o
E(g)s
2 (Xo) J(' I/n
= 1
n
E(Ag)
2 2 (X0). (T.13.29)
Applying (T.13.28, 29), the product term on the extreme right side of (T.13.27) is
seen to go to zero. Therefore, a 2 = - 2<g, f > + o(1) as n -+ oo, which implies
a 2 = - 2<g, f >. This proves the result for the sequence t = n. But if
then
as n p since
n
G0' If(X)Ids)
.
(Tag g)(x) = h
n
TS(.% .%)( x) ds ' (f
J 0
Ta(f f)(x) ds = I f
os
J
y f( )(p(s; x, dy) n(dy)) ds. (T.13.26)
Therefore, one simple sufficient condition that f f belongs to the range of for
every f e LZ(S, n) is
To see this consider the subspace B of C,(S) comprising all functions g o gyp, g in
the closure of the linear span of 2(f) of W . Then B is a Banach space, and A is the
generator of a semigroup on B by the Hille -Yosida Theorem and property (i). This
implies in particular that T, maps lB into itself, so that T,(g o 4,) is of the form h, o cp
for some bounded continuous h, on S'. Since 5 is a determining class, it follows that
T,(g o cp) is of the form h, o cp for every bounded measurable g on S'. Therefore,
Proposition 6.3 applies, showing that {q (X,)} is a Markov process on S'.
Choose y sufficiently large that the expression within square brackets is strictly positive
for all x e G. Then Au(x) > 0 for x e G, for all t > 0. Applying the conclusion of the
preceding paragraph conclude that the maximum of u t is attained on l G, say at x^.
Since G is compact, choosing a convergent subsequence of x L as E j 0 through a
sequence, it follows that the maximum of u is attained on EG.
Finally, if Au(x) = 0 for all x e G, then applying the above result to both u and
u one gets: u attains its maximum and minimum on c?G. Note that for the above
proof it is enough to require that '(x), d ;j (x), I _< i, j < k, are continuous on G,
(
and d 4 (x) > 0 on G for some i. Under the additional hypothesis that ((d ;; (x))) is
positive definite on G, one can prove the strong maximum principle: Suppose G is a
bounded and connected open set. 1f Au(x) = 0 for all x e G, and u is continuous on G,
then u cannot attain its maximum or minimum in G , unless u is constant.
The probabilistic argument for the strong maximum principle is illuminating.
Under the conditions (1)(4) of Section 14 it may be shown that the support of the
probability measure P. is the set of all continuous functions on [0, ou) into R', starting
at x (see theoretical complements to Section Vll.3). Therefore, for any ball B x, the
Pr -distribution i(x; dy) of X i 8 has support i)B. Now suppose u is continuous on G
and Au(x) = 0 for all x e G. Suppose u(x 0 ) = maxju(x): x e G} for some x o e G. Then
for every closed ball B e = ;x: Ix x 0 ) _< e} c G, the strong Markov property yields:
u(x o ) = E..u(X,,,,,,), in view of the representation (15.22). (Also see Exercise 3.14 of
Chapter VII ). That is,
Example 1. (Radial Diffusions). Let {X} be a diffusion on ll ' with drift s(.), and
diffusion matrix ((d1 (. ))). If I d;j (x)xl' x'j and 2 E x ( ' ) (i) (x) + E d 1 (x) are radial
) )
functions (i.e., functions of Ix), then {IX,I} is a Markov process. This follows from
the fact that if f is a smooth radial function then Af is also radial (see (15.35), and
theoretical complement 14.1). The underlying group G in this case is the group of
all orthogonal matrices.
Example 2. (Diffusions on the Torus). Let {X 1 } be a diffusion on Ft" such that 1 ` ) (x)
and d ij (x) are invariant under the group G generated by translations by a set of k
linearly independent vectors ^,,, 4 2 , ... , 4k . In other words,
for all x and all k-tuples of integers (m 1 , M21. .. , m k ). Now express x (uniquely) as
S,(x) r , and let 8,(x) = 5 r (x)(mod 1) e [0, 1), 1 -< r -< k. Then the maximal
invariant under G is rp(x):= ($ 1 (x), ... , &(x)), and {cp(X,)} is a Markov process on
the torus [0,1) k .
1 FINITE-HORIZON OPTIMIZATION
Consider a finite set S, called the state space, a finite set A, called the action
space, and, for each pair (x, a) e S x A, a probability function p(y; x, a) on S:
The function p(y; x, a) denotes the probability that the state in the next period
will be y, given that the present state is x and an action a has been taken.
A policy (or, a feasible policy) is a sequence of functions (fo , f,, .. .) defined
as follows. fo is a function on S into A. If the state in period k = 0 is x 0 then
an action f0 (x 0 ) = a o is taken. Given the state x 0 and the action a o = fo(xo),
the state in period k = I is x, with probability p(x,; x 0 ,f0 (x 0 Now fl is a
)).
k = 2 (namely x0, ao = .fo(xo), xi, ai = fi(xo, ao, x1), x2, a 2 = .f2(x0, ao, xi, ai x2)),
,
519
520 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
distribution of X l is given by
P(X2 = I
x2 X0 = x0 , X1 = x1) = P(x2; x1, al), al = f1(xo,f0(x0) , x1).
(1.3)
In general, given X. = x0, Xl = x1, Xk -1 = X k _ 1' the conditional distribution
of Xk is given by
P(Xk = Xk I XO = X0, Xl = X1, ... , Xk- I = xk- 1) = P(xk; Xk- 1, ak- 1), (1.4)
where the actions ak are defined recursively by
ak = fk(xo , a0, x1, al, ... , Xk-1r ak-1 , xk) (k = 1, 2, ... N),
(1.5)
a0 = f0(xo)
The joint distribution of X0 , X 1 , ... , Xk is then given by
(k = 1,2,...,N) (1.6)
with the states a 0 , a 1 , ... defined by (1.5). Expectations with respect to n,N
will be denoted by o En .
The kth period reward function is a real-valued function g k on S x A such
that if the state at time k is X k and the action at time k is a k , then a reward
g k (x k , a k ) accrues. The total expected return for a policy f is given by
N
Jao Eaak=0
Z gk(Xk, ak), (1.7)
Let F denote the set of all policies. The object is to find an optimal policy,
i.e., a policy f* = ( f , ... , f N) such that
where
Let us show that a Markovian optimal policy exists. For each yE S pick an
element f N(y) (perhaps not unique) of A such that
Write
say, obtained successively (i.e., by backward recursion) starting from (1.11) and
(1.12).
f^ = (fo'f . ..
(1.15)
f(0) = f*
First note that the joint distribution of Xo , X 1 , ... , Xk is the same under f
and under f (k) (i.e., n, k = it). In particular,
k-1 k-1
ERO 9k'(Xk,, a k .) = E o ' Z g k (Xk ., a k .). (1.17)
k'=0 k'=0
Now,
Together with (1.17) with k = N, this proves (1.16) for k = N. To prove (1.16)
for k = N 1, note that
Eno(9N_,( XN -1 aN -1)
, + 9N(XN, aN)) <, Erzo 11 (9N-1( XN -1 aN -1) , + 9N(XN, aN))
(1.22)
Together with (1.17) for k = N 1, (1.22) yields (1.16) for k = N 1.
To complete the proof we use backward induction. Assume that
N
r
E no .
9k'(Xk ak') I XO = XO, , Xk +l = Xk+I hk+11/ ll23)
(1.23)
k'=k+1
I
N
En`" .) IX
(X.
9k' k>a k k+l = x k+l h k+l(x k+l )
(1.24)
k'=k+1
Y_
for all X k+ 1 e S. Then
N
= gk(xk, ak) + Ea o Eao 9k'(Xk', a k ,) I X 0 ,.. . , Xk +1
k'=k+1
XO = X 0 , X i = x 1 , ... , Xk = xk1
The first inequality in (1.25) follows from (1.23), while the last inequality follows
from (1.14). Hence (1.23) holds for k. Also,
En o [gk(Xk, ak) +
hk(xk) = gk(xk, f (xk))
r1kl
+
Y_
1] hk+ I (z)p(z; xk, f (xk))
=ES
k'=k +I
gk'(Xk', ak') Xk = Xk I (1.26)
524 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
E 110 1nol
E E gk'(Xk'^ ak') ( Xk^ Xk+l Xk
[ k'=k+l
N
=t o gk'(Xk,, a k ) Xk . (1.27)
k'=k+1
k-1
= E><o(^ 9k'(Xk', Q k )) + Erzp(E 9k (Xk s Qk)
r N
k-1
(Ikl r (Ikl
N
(Ikl (Ikl
= En o gk'(Xk., Q k .) = J np . (1.28)
k'=0
Exactly the same proof works for the more general result below (Exercise 3).
E rzo
N
k=0
g(X , ak, Xk + 1) = Erzo
= Eno Y_
N
k=0
N
k=0
gk(Xk, a k ).
(1.30)
N^ E ao
i En o k ^ gk(Xk, ak)J 2 =gk(Xk, a k )) _c=J o _c. (2.3)
i E^^ ( = O
lim f "(x) = f(x) for all k = 0, 1, 2, ... , and for all x e S, (2.4)
n + ao
for some sequence of functions f,*. Since S and A are finite, (2.4) implies that
f,*, b^(x) = f k (x) for all sufficiently large n. From this it follows that given any
integer N there exists n(N) such that
IN
Eao^ gk(Xkr ak)) = EIto gk(Xk, ak)) for n > n(N). (2.5)
\k=0 \k=0
In particular, for fixed E > 0, (2.5) holds for N = N E . In view of this and (2.2),
one has for all n > n(N).
s N` 1 E Ea ,
JEo k^ gk(Xk, ak) ) 2 = E no ^ gk(Xk, ak)) 2 i J n o E. (2.6)
(k N'
Now apply (2.3) to this to get J`, > Jn 0 S E for all n > n(N), and then let
n cc to obtain
JRO ? J,` 0 E.
THE INFINITE-HORIZON PROBLEM 527
Theorem 2.1. Under the hypothesis (2.1), there exists an optimal Markovian
policy.
We now consider a special case of Theorem 2.1, the case of discounted dynamic
programming. Assume that
JI = Ex bkgo(Xk, ak)
k=0
Therefore,
f in) = 1 01 -11
(n = 1, 2, ...), f *(0) = f* (2.13)
where
f *(n-1):_
(fo,JO, ... IJo,
flI f2l ...),
with fo appearing in the first n positions. By (2.2) it follows that the stationary
policy f* = ( f * ,f *, f *' 'f, ...) is optimal. We shall denote it by f*. This
proves the first part of Theorem 2.2 below. The dynamic programming equation
(2.14) below may be derived by letting k T oo in (1.14) with g k (x, a) = S k (x, a).
However, an alternative derivation of Theorem 2.2 is given below.
Let us describe first the so-called dynamic programming principle. Suppose
there exists a stationary optimal policy f* = ( f *, f *, ...) in the case of
discounted rewards (2.8) with the discount factor S satisfying (2.9). Let J, := Jz'
denote the optimal reward, starting at state x. Fix x e S and a e A. Consider a
Markovian policy f' = ( fo , f *, f *, . ..) for which f0 (x) = a, while the optimal
policy is used from the next period onward. Since the conditional distribution
of {Xk : k > 1}, given X 1 under f' is the same as that under f*,
k=1 X, /]
= go(x, a) + b 1] J2p(z; x, a).
Z E.S
The optimal reward Jx must equal the maximum of Jz over all choices of
f0 (x) = a E A. Thus, one arrives at the dynamic programming equation (2.14).
Theorem 2.2. Let g o be bounded and 0 < 6 < 1. Then for the discounted
rewards (2.8) there exists a stationary optimal policy f* = ( f *, f *, ...). This
policy satisfies the dynamic programming equation, or the Bellman equation, for
each x e S, namely
Proof. Denote by B(S) the set of all real-valued functions on S. For each
THE INFINITE-HORIZON PROBLEM 529
(2.15)
(TJ)(x) = max { 9 o (x, a) + 6 1 J(z)p(z; x, a) } (x ES, J e B(S)).
aEA (( ZES ))
(TJ)(x) (TJ')(x) = (Tf J)(x) (TJ')(x) <, (Tf J)(x) T1 J'(x). (2.19)
sup [(TJ)(x) (TJ')(x)] < sup [(TJJ)(x) (TfJ')(x)] < 611 J J'. (2.20)
xeS XES
This inequality being true for all J, J' e B(S), one gets (interchanging the roles
of J and J'),
IITJ TJ'jI <6IIJJ'II. (2.21)
Now let f be an arbitrary policy. In view of (2.22) one has, for all k = 0, 1, 2....
(and all x),
Summing (2.24) over k = 0, 1, 2,... and canceling common terms from both
sides it follows that
W
J * (x) i h k f g0(Xk, ak) = J. (2.26)
k=0
For the map T defined in (2.15), (T N O)(x) is the N-stage optimal reward
function for the discounted case, as may be seen from (1.11)-(1.14) and Theorem
1.2. Hence J*(x) = lim N -.(T N 0)(x) is the optimal reward function in the
infinite-horizon case. The contraction property of T shows that T"'J(x)
converges to the optimal reward function, no matter what J is.
The proof of Theorem 2.2 extends word for word to the following more
general case.
Example. Let S = 7L, the set of all integers, A = [a, 1 a] with 0 < a < and
Take gk (x, a) = b k ^p(x) (0 < b < 1), where co is bounded and even (i.e.,
cp(x) = gyp(x)), and cp(jxl) decreases as lxi increases. Then the hypothesis of
Theorem 2.3 is satisfied. Under a stationary policy f = (f, f, . ..) the stochastic
f(0) ,
YO' = P(Xk +1 = 1 I Xk = 0) _ O = P(Xk +l = l I Xk = O) =
2
Since the maximum of cp is at zero, and q(ix ) decreases as lxi increases, one
expects the maximum (discounted) reward to be attained by a policy
f* = (f* f*,.. .) which assigns the highest possible probability to move in the
,
1 a ifx<0,
Also, (iii) is trivially true for all k if x = 0. For x > 0 one has, by (iii),
/ a a /
Eogo(Xk+l) = ( 1 a)EO(p(Xk) + 2 E-Igq(Xk) + 2 E1(P( Xk)
_ (1 a)Eo^P(Xk) + aEI(P(Xk)
Jx := I, 6k E.(V(Xk) (2.35)
k=0
satisfies the dynamic programming equation (2.14). That is, one needs to show
Now, by (2.30(i)),
Jx_I'>Jx+I forx>,l,
so that, assigning the largest possible weight to the larger quantity J._ I ,
Hence, Jx satisfies (2.36) for x >, 1. In exactly the same way one shows, using
(2.30(ii)), that J. satisfies (2.36) for x < 1. Finally, since J-, = J 1 by (2.30(iii)),
OPTIMAL CONTROL OF DIFFUSIONS 533
_, + --J,, (2.39)
max (1 a)Jo +2J_, +-J 1 ]= (1 a)Jo +2J
acA [
2
assigning the largest possible weight to J0 for the last equality. The proof of
(2.36), and therefore of the optimality of f *, is now complete.
In general, even for the infinite-horizon discounted case, it is difficult to
compute explicitly optimal policies and optimal rewards. The approximations
T"0, and the corresponding N -period optimal policy given by backward
recursion (1.11)(1.14), are therefore useful.
Let G be an open set c l, and consider the problem of minimizing the expected
penalty E X f (X.. (; ) on reaching the boundary 0G, starting from a point x outside
of G= G v 1G. Here {X,} is a diffusion on Fk" \ G, with absorption at 1G. The
function f is a continuous bounded function on 1G. If f = I on 1G, the problem
is to minimize the probability of ever reaching 13G. As usual TG denotes the
first time to reach the boundary 1G.
The drift vector (x, c) and diffusion matrix D(x, c) of the diffusion are
specified. The control variable c, which may depend on the present state x,
takes values in some compact set C c 18m The function c(.) (on W'\G into C)
so chosen is called a control, or a feedback control. The class of allowable, or
feasible, controls may vary. For example, it may be the class of all measurable
functions, the class of all continuous functions, the class of all continuously
differentiable functions, etc.
Let us give a heuristic derivation of the dynamic programming equation, by
means of the so-called dynamic programming principle. Write Ex for the
expectation when the diffusion has coefficients i(., c), D(., c) with c constant,
and xis the initial state. If c(. ) is a control (function), this expectation is denoted
EX^ , the diffusion having drift and diffusion coefficients (y, c(y)), D(y, c(y)).
-)
Write
Je(x) '= Ex i(X r ^), J` (x) = EX ' f (Xr.G), J(x)t= inf J(x), (3.1)
() ( )
c(-
where the infimum is over the class of all feasible controls. Suppose there exists
an optimal control c*(.). Consider the following small perturbation c l () from
the optimal policy. c l (.) takes the constant value c for an initial period [0, t],
after which it switches to the optimal control c *(. ). This control is of course
time-dependent and not feasible under our description above. But one may
actually allow time-dependent controls in the present development (theoretical
534 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Note that if after time t the state is X, e Il k \ G, then the conditional expectation
of f (Xt G ) is J(X,), as the optimal control c*(.) is now in effect. Also, the
probability that t -> 'r OG is o(t) for x e Il k \ G (see Exercise 1 for an indication
of this). As J" ' (x) > J(x), one gets
( )
But the left side is TIJ(x) J(x), where {T',} is the semigroup of transition
operators generated by
2
a
x^'^axu^ + Y pl'>(x, c) a (3.4)
A` : 2 d " (x ' c)
with absorbing boundary condition on 8G. Hence, dividing both sides of (3.3)
by t and letting t f, 0, one gets
A,J(x) >, 0. (3.5)
By minimizing the left side of (3.3) with respect to c, one expects an equality
in (3.3) and, therefore, in (3.5); i.e., the dynamic programming equation is
We will illustrate the use of this rigorously in Example 1 below. Before this
is undertaken, a simple proposition in the case k = 1, f - 1, directly shows how
to minimize the expected penalty when the diffusion coefficient is fixed. Let
p ) () (i = 1, 2) be two drift functions, Q 2 (.) > 0 a diffusion coefficient specified
on (0, oo), satisfying appropriate smoothness conditions. Let i/i(z) denote the
probability that a diffusion with coefficients p ( O( ), 6 2 (.) reaches 0 before
reaching d, starting at z. Here d > 0 is a fixed number.
It is intuitively clear that, for the same diffusion coefficient a 2(.), larger the
drift tc(.) better the chance of staying away from zero. Here is a proof.
Write
`I( z ) pti^(z),
^(z):= y i (z) =t' (z) + srl(z),
and let F(r; z) denote the probability of reaching 0 before d starting at z, for a
diffusion with drift coefficient E ( ) and diffusion coefficient o2(.). It is
straightforward to check (Exercise 2) that
where
a
y(z):=
J
f exp
o
^ ti ^
2
(Y) rl(Y)
2 dy
6 (Y) 0
22
6 (Y)
dy du
I
exp
0
2 pta^(
6
Y)
z dy du.
(Y}
z(3.9)
Since rl(z) >, 0 for all z, it follows that
Zo =z >0,
(3.12)
Z, =exp {W +n }(Z,ah), (t = nh; n=0,1,...)
Therefore, one may, in the limit as h j 0, model Z, by a diffusion with drift t(.)
and diffusion coefficient Q 2 (.) given by
The state space of this diffusion should be taken to be [0, cc), since negative
capital is not allowed. One takes "0" as an absorbing state, which when reached
indicates the agent's economic ruin.
In the discrete-time model, suppose that the agent is free to choose (m, v)
for the next period depending on the capital in the current period. This choice
is restricted to a set C x (0, cc). In the diffusion approximation, this
amounts to a choice of drift and diffusion coefficients,
such that
(m(z), v(z)) E C for every z e (0, co). (3.16)
The object is to choose m(z), v(z) in such a way as to minimize the probability
of ruin. The following assumptions on C are needed:
First fix a d > 0. Suppose that the agent wants to quit while ahead, i.e., when
a capital d is reached. The goal is to maximize the probability of reaching d
before 0. This is equivalent to minimizing the probability O(z) of reaching 0
before d, starting at z.
It follows from Proposition 3.1 that an optimal choice of (m(z), v(z)) should
be of the form
The problem of optimization has now been reduced to a choice of v(z). The
dynamic programming equation for this choice is contained in the following
proposition. To state it define, for each constant v E [v * , v*], the infinitesimal
generator A by
For a given measurable function v(.) on (0, cc) into [v * , v *] define the
infinitesimal generator A v( . ) by
Fix d > 0. Let /i,, ( . ) (z) denote the probability that a diffusion having generator
A, ( . ) , starting at z e [0, d], reaches 0 before d. For simplicity consider v(.) to
be differentiable, although the arguments are valid for all measurable v(.) (see
theoretical complement 2). Write
Proof. In the case v * = v*, Proposition 3.1 already provides the optimum
choice. We therefore assume v * < v *. Let li(z) be a twice continuously
differentiable function satisfying (3.22), and v() a (measurable) function such
that the minimum in (3.22) is attained at v = v(z). Then,
It follows from (3.23), since its solution is unique (Exercise 7.7(iii), (iv) of Chapter
V), that i(z) is the probability of reaching zero before d, starting at z, for the
diffusion generated by A, ( . ) . In particular, i'(z) < 0. One may actually explicitly
calculate i'(z) from (9.19) in Chapter V. Therefore, to determine v(z) we find
the critical point(s) of the expression within the square brackets in (3.25) by
setting its derivative equal to zero. This leads to
(i(v(z)) + Zv(z))z a = (i
(v) + z)z, (3.26)
v(z)
or,
.f(v(z)) v(z)f'(v) = a
z
As the derivative of the left side is v f "(v) > 0, the left side increases from
oo to + co as v increases from v * to v*. Therefore, for each z > 0 there exists
a unique solution of (3.27), say v = v(z) e (v * , v*). Since the right side of (3.27)
decreases as z increases, it follows that the function v() is strictly decreasing.
It remains to show that cui is minimal. For this, consider an arbitrary
(measurable) choice v(.). It follows from (3.22) that
Since (A V( . ) ct' V( . ) )(z) = 0 and >/i and ^i v( .^ satisfy the same boundary conditions,
letting u(z):= ii(z) 0,,. (z), one gets
(A, ( . ) u)(z) > 0 for all z e (0, d), lim u(z) = 0, lim u(z) = 0. (3.29)
zj0 zld
does not depend on d. Therefore, with this choice one also minimizes the
probability p Zo of ever reaching 0, starting from z. For this, simply let d j oo in
the inequality
(.(.)(z) <' ^1l ((z). (3.32)
(v v0)2 1 / 2
C ={(m,v):vo^v'<vo+,m-mo+a^l (3.33)
where the state space of the diffusion {X 1 } is an open set G c W. The diffusion
has drift (y, c(y)) and diffusion matrix D(y, c(y)); r(y, c) is the reward rate
when in state y and an action c belonging to a compact set C c 11" is taken,
and > 0 is the discount rate.
The given functions ph i) (y, c) (1 < i < k) are appropriately smooth on G x C,
as are d(y, c) (1 < i, j < k), and D(y, c) is positive definite. Once again, the set
of feasible controls c(.) may vary. For example, it may be the class of all
continuously differentiable functions on G into C. The function r(y, c) is
continuous and bounded on G x C.
If, under some feasible control c(. ), the diffusion {X,} can reach G with
positive probability, then in (3.34) the upper limit of integration should be
replaced by rO G Let
J(x):= sup J`O(x). (3.35)
a.)
In order to derive the dynamic programming equation informally, let c *(.) be
an optimal control. Consider a control c 1 (.) which, starting at x, takes the
action c initially over the period [0, t] and from then on uses the optimal control
c *(. ). Then, as in the derivation of (3.2), (3.3),
-s
c(XS)) ds
= E`
fo, e sr(X,, c) ds + EX^'Ie `
-
f 'o
e ri r(X^ +s, c(Xr +s)) ds'
= E` o e - sr(X s , c) ds + e - Q`ExJ(X,)
J
= t(r(x, c)) + o(t) + e - `T,J(x) < J(x),
540 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
or,
One expects equality if the right side of (3.37) (or (3.36)) is maximized with
respect to c, leading to the dynamic programming equation
Example 2. The state space is G = (0, oo). Let c denote the consumption rate
of an economic agent, i.e., the "fraction" of stock consumed per unit time. The
utility, or reward rate, is
c) = (cx)1
r(x, (3.39)
y
Let us for the moment assume that the maximum in (3.41) is attained in the
interior so that, differentiating the expression within braces with respect to c
one gets
cY-'xY = xJ'(x),
or,
Since the function within braces in (3.41) is strictly concave it follows that (3.42)
is the unique maximum, provided it lies in (0, 1). Substituting (3.42) in (3.41)
one gets
262x2)(.x) J(x) = 0. (3.43)
+ SxJ'(x) I
\ Y/
Try
Try the "trial solution
J(x) = dxv. (3.44)
-1) d = 0,
a 2 dy(v I) + dy (1 I I (dv)rrw (3.45)
vl
i.e.,
1 1 y 1-v
d = -- - (3.46)
v + .a 2 v(l y) ay
72
ii^ v^ = + Y( 1 =r SY . )
(3.47)
c * := c * (x) = (yd) 1
I y
Thus c*(x) is independent of x. Of course one needs to assume here that this
constant is positive, since a zero value leads to the minimum expected discounted
reward zero. Hence,
+ a 2 y(1 y) by > 0. (3.48)
For feasibility, one must also require that c* < 1; i.e., if the expression on the
extreme right in (3.47) is greater than 1, then the maximum in (3.41) is
attained always at c* = 1 (Exercise 5). Therefore,
where d' is the extreme right side of (3.47). We have arrived at the following.
Proposition 3.3. Assume 6 > land (3.48). Then the control c*() = min{ 1, d'}
is optimal in the class of all continuously differentiable controls.
Proof. It has been shown above that c*(. ) is the unique solution to the dynamic
programming equation (3.38) with J = J` * . Let c() be any continuously
differentiable control. Then it is simple to check (see Exercise 6) that
the inequality
or,
Under the optimal policy c* the growth of the capital (or stock) X, is that
of a diffusion on (0, co) with drift and diffusion coefficients
This problem is solved by backward recursion in much the same way as the
finite-horizon dynamic programming problem was solved in Section 1.
We first give a somewhat heuristic derivation of rm. If S = . . . ., X}
(n >, 0), and X0 , X 1 ,. . . , Xm _ 1 have been observed, then by stopping at time
m 1 the loss incurred would be Ym _ 1 . On the other hand, if one decided to
continue sampling then the loss would be Ym . But Ym is not known yet, since
Xm has not been observed at the time the decision is made to stop or not to
stop sampling. The (conditional) expected value of Ym , given X0, ... , Xm_1,
must then be compared to Ym _ 1 . In other words, rm is given, on the set
{im >,m l}, by
0 if Yo ` E(V1 I .Fo),
(4.5)
m >1 ifYo>E(Vil^o) on{t0} =52.
Theorem 4.1. (Optional Stopping Theorem for Submartingales). Let {S; : j > 0}
be an {.}-submartingale, r a stopping time such that (i) P(r < cc) = 1, (ii)
E^SZ C < oc, and (iii) lim n . E(S,l {t>m) ) = 0. Then
Proof. The proof is almost the same as that of Theorem 13.1 of Chapter I if
we write XX := SS SJ _ 1 (j >, 1), X0 = Sp, and note that E(XX I .9j _ 1 ) 0 and
accordingly change (13.8) of Chapter Ito
This gives the first relation in (4.9). The second relation in (4.9) follows by the
martingale property of { Vtm J } (and Theorem 4.1). Note YL ^, = V^.,. n
Y Y
In the minimization of EYt over .9 , need not be .-measurable. In such
cases one may replace by E( Y .) = U, say, and note that, for every t C-
m m
J=0
E[ 1 1t=r)UJ) = EUt .
Hence the minimization of EYE reduces to the minimization of EUr over .%,,,.
Also, instead of minimization one could as easily maximize EYT over .9,.
Simply replace min by max in (4.7), and replace ">," by "<," in (4.9). This is
the case in the following example, in which the indexing of the random variables
and sigmafields starts at j = I (instead of j = 0).
Example 1. (The Secretary Problem or the Search for the Best). Let m> 1
labels carrying m distinct numbers be in a box. A person takes out one label
at a time at random, observes the number on it and sets it aside before the
next draw. This continues until he stops after the rth draw. The objective is to
maximize the probability that this rth number is the maximum M in the box.
Thus, if Wj = 1 or 0 according as the jth number X; drawn is the maximum or
not, one would like to maximize EW over the class of all stopping times
r(<, m) relative to the sigmafields .Fj := a{X I , ... , X3 }, I <, j '< m.
Define the .F -measurable random variables
;
Y` =E(WiI.Fi)=P(XX=MI ). (4.14)
j
Y = m l(x;=Mj1' (4.15)
Hence the maximum of EWt over 9-m is also the maximum of EYt over m . In
order to use Theorem 4.2 with min replaced by max in (4.7) and ">" replaced
by "<," in (4.9), we need to calculate Vj (1 < j < m). Now Vm = Ym and (see
Eq. 4.15)
J 1 m-1 1
Vm -i =max {Ym-1 E(Ym I ^m-1) }
, = 1 {X Mm + m 1 {X--i<M.,,- }-i)
Then,
m-1
E( , I ^m 2) = P(Xm-1 = Mm-1 (gym-z) + 1 P(Xm - 1 < Mm-1 m-z)
m m
m 1 1 lm-2 m-2 1 1
_ + _ I +
m m-1 mm-1 m m-2 m-1
(
(4.16)
To evaluate
note that if (m _2)_1 + (m 1)' < 1 then (see Eqs. 4.14 and 4.16)
m-2 m-2( 1 1
V m z = m 1txm-2M.-z} + m 1 ltxm-2<n^m :} (4.17)
m m-2
m-3 1 1 1
E( V
m Iz m 3)=
m m-3+m-2+m-1
1 11
J (4.18 )
m j j+l m I
Then,
V; = max{ Y, E(V;+ 1 F; )} = max { m 1(x; =M; ) ' m ai}
I
= 1 {x;=M;} + m aj1jx;<M;},
and, since P(X; = M; 35- 1 ) = 1/j, it follows that
j-1 11 1 J-1
E(Vj I. j -^)= +-+...+ = a;-^
m j-1 j m-1 m
The induction is complete, i.e., (4.18) holds for all j >j', where
Also,
V;. = max{, E(1'. 1 ;.)} = L a*, (4.22)
m
j*
V; -^ = max {Y;.- 1 , E(Vi. I-)} = a .. ;
m
548 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
For, if j <j" then Yt < E(V;+ 1 Iy ) = ( j*/m)a;.. On the other hand, if j >j'
;
then aj < 1 so that (i) Y > E(Vj+1 ;) = ( j/m)a j on {XX = MM } and (ii)
I
0= Y < E(V 1 .yj ) on {X, < M}.
Simply stated, the optimal stopping rule is to draw j* observations and then
Simply
continue sampling until an observation larger than all the preceding shows up
(and if this does not happen, stop after the last observation has been drawn).
The maximal probability of stopping at the maximum value is then
11 1
E(V1 ) Vi -
-+ j*+ 1 +...+ (4.25)
m m-1
1 1 1
a*=+ ++ z1, (4.26)
' j* j* + 1 m 1
where the difference between the two sides of the relation "z" goes to zero.
This follows since j* must go to infinity (as the series (1/j) diverges and j*
is defined by (4.20)) and a;. > 1, a;. + l < 1. Now,
1 1
l m
iI J/m ,*/m
a,. = m*
1
. dx = log(j*/m). (4.27)
j*
logt .: 1, - e ', (4.28)
m m
where the ratio of the two sides of "" goes to one, as m --+ oo. Thus,
if a = 0,
(5.1)
K +ca ifa>0,
where K >, 0 is the reorder cost and c > 0 is the cost per unit. There is a holding
cost h > 0 per unit of stock left unsold, and a depletion cost d > 0 per unit of
unmet demand. Thus the expected total cost is
where
In this model x may be negative, but a > 0. The objective is to find, for each
x, the value a = f *(x) that minimizes (5.2).
Assume
Then,
G(x) if a = 0,
a) + cx =
I(x, (5.7)
+a) JG(x +K ifa>0.
Thus, the optimum a = f *(x) equals y*(x) x, where y = y*(x) minimizes the
if y = x,
G(Y; x) {G(x) (5.8)
G(y) + K if y> x,
over y > x.
Now the function y - L(y) is convex, since the functions y -+ max{0, y - w},
max{0, w - y} are convex for each w. Therefore, G(y) is convex. Clearly, G(y)
goes to 0o as y -> oo. Also, for y < 0, G(y) > cy + djyl. Hence G(y) -+ oo as
y -. - oo, as d > c > 0. Thus, G(y) has a minimum at y = S. say. Let s < S be
such that
If K = 0, one may take s = S. If K > 0 then such ans < S exists, since G(y) --> o0
as y -- - oo and G is continuous. Observe also that G decreases on (- cc, S]
and increases on (S, cc), by convexity. Therefore, for XE (-cc, s],
G(x) > G(s) = K + G(S), and the minimum of G in (5.8) is attained at
y = y*(x) = S. On the other hand, for x E (s, S],
Therefore, y*(x) = x for XE (s, S]. Finally, if x e (S, oo) then G(x) < G(y) for
all y > x, since G is increasing on (S, oc). Hence y*(x) = x for x E (S, oo).
Since f *(x) = y*(x) - x, we have proved the following.
Proposition 5.1. Assume (5.4). There exist two numbers s <, S such that the
optimal policy is to order
Instead of fixed costs per unit for holding and depletion, one may assume
that L(y) is given by
Here H and D are convex and increasing on [0, co), H(0) = 0 = D(0). The
CHAPTER APPLICATION 551
L(y) < oo for all y, lim (cx + L(x)) > K + cS + L(S), (5.13)
X - - -
where is the discount factor, 0 < 6 < oo. The objective is to minimize the total
(N + 1)- period expected discounted cost
N
JX = EX ak I (Xk, ak), (5.17)
k=0
fN( )= Sx ifX<S,
X (5.18)
0 ifx>S,
and f (0 < k < N 1) are given recursively by (1.14). Let us determine f N _i.
For this one minimizes
where 8 N,- 1 is given by (5.16) and h N,(x) is the minimum of g,(x, a) = S"I(x, a),
i.e., (see Eq. 5.11)
hN(x) = N I(x).
Write
GN -1(Y) = cy + L(y) + 6E1(y W). (5.22)
GN -i(x) if a = 0,
I N - i (x, a) + cx = f (5.23)
+a)+K G,-,(x ifa>0.
over y >, x. The corresponding points f _ 1 (x), yN _ 1 (x), where these minima
are achieved, are related by
The main difficulty in extending the one-period argument here is that the
function GN - 1 is not in general convex, since I is not in general convex. Indeed
(see (5.11)), to the left of the point s it is linear with a slope c, while the
right-hand derivative at s is G'(s) c < c since G'(s) <0 if K > 0. One can
easily check now that I is not convex, if K > 0.
It may be shown, however, that GN _ I is K- convex in the following sense.
a Y2
K + 9(Y3) 9(Y2) + Y (9(Y2) 9(Y1)). (5.26)
Y2 Yi
CHAPTER APPLICATION 553
(i) G N _ 1 is K-convex on V^ 1 ,
(ii) GN -1(Y) -* cc as IYI --> oo.
minimizes GN _ 1 (y; x) over the interval [x, cc). For this let SN _ 1 be a point
where G N _ , attains its minimum value, and let S N _ 1 be the smallest number
SN _ 1 such that
SN -1 Y2 / /
K + GN- I (SN- 1) > G_1(y2) + -- ( GN- I (Y2) GN- 1 (YI )). (5.29)
Y2 YI
Also, GN _ I (y 2 ) > K + G N _ 1 (SN _ 1 ), since Y 2 <5N- , and (5.28) holds for the
smallest possible 5 N _ 1 Using this in (5.29), one gets
Y2 Y1
or,
i.e., GN_l(x) <- GN-I(SN-1) = GN-1(SN-1) + K < GN_1(y) + K for all y. Since
(5.30) clearly holds for x = s N _ 1 , it remains to prove it in the case S N _ 1 <x < y.
By K- convexity one has
/ x /
/ /
K + GN -1(Y) i GN -1(x) + ( GN-1( x) GN-,(SN-1))
X SN-1
K+G(S)cx ifx<s,
1 ( x) __ (5.32)
ifx>s. ( )
We want to show
K + 1(x 3 ) > 1(x 2 ) + X3 X2 (1(x 2 ) 1(x 1 )) for all x 1 <x 2 <x 3 . ( 5.33)
x2 x1
If G(x 2 ) >, G(s), then the right side of (5.35) is no more than
G(x2) + X3 X2
(G(x2) G(s)),
x2 s
By simple algebra, (5.36) is equivalent to G(s) >, G(x 2 ), which has been assumed
to be the case.
X2
K + G(x3) cx3 > K + G(S) cx2 + x3 ( cx2 + cx l )
x2 x1
= K + G(S) cx 3 ,
(5.37)
The equation for the determination of f_2 is then (see Eq. 1.14)
Proposition 5.2. Assume (5.4). Then there exists an optimal Markovian policy
f* _ (f , ... , f *) of the (S, s) type for the (N + 1)-period dynamic inventory
model, i.e., there exist Sk < S k (k = 0, 1, . .. , N) such that during period k it is
optimal to order Sk x if the stock x is < s k , and to order nothing if x > Sk.
Also s k <Sk for all kifK>0, and sk =Sk ifK=0.
X
Jx := Ex b k l (Xk, ak), (5.40)
k=0
over the class of all (measurable) policies f = (fo , f1 , ...). Write this infimum as
To show that Jx is finite, consider the policy f := (0, 0, ...), i.e., the stationary
policy that orders zero in every period. It is straightforward to check that
(h + d)EW (h + d)Ixl
JX + <co. (5.44)
(1 S) 2 1 S
The dynamic programming equation, therefore, holds for J, x . This also implies
that the convex function
_ Sx ifx'<S,
f *(x) 0 if x > S.
Here S is the point where the function (5.48) attains its minimum.
EXERCISES
the excess supply max{0, x + a W} at the rate of $1 per unit, and (c) a penalty of
$1 per unit for excess demand max{0, W x a}.
2. (Taxicab Operation) The area of operation of a cab driver comprises three towns,
Ti, T2 , T3 (states). To get a fare, the driver may follow one of three courses (actions):
pull over and wait for a radio call (a 1 ), go to the nearest taxi stand and wait in line
(a 2 ), or go on cruising until hailed by a passenger (a 3 ). The action a, is not available
if the driver is in town T,, as there is no radio service in this town. There is a known
probability p(T; ; Ti , a,) that the next trip will be to town T; , if the cab is in town T
and follows the course of action a,. The corresponding reward is g(T, a k , T).
(i) Write down the backward recursion relations for maximizing the total expected
reward over a finite horizon.
(ii) Find the optimal policy for the case N = 2, if the transition probabilities and
rewards are as follows:
T1 T2 T3 TI T2 T3
Ti a2 2 0 2 2 0 4
1 1 1
a3 4 2 4 4 2 4
i 1 1
T2 al 4 2 4 2 2 4
1 1 1
a2 4 4 2 2 4 2
i 1 1
a3 4 2 4 2 2 2
t
T3 a, 41 21 4 4 2
I i 1
az 2 4 4 2 2 2
I I 1
a3 4 4 2 2 4 2
3. Prove Theorem 1.3 along the lines of the proof of Theorem 1.2.
(vii) Let 6, := 8 . Show that f k "(x) - f (x) for every k >_ 0 and every x e S, as
n -* oo.
2. Let the elements of a finite set S be labeled 1, 2, ... , m.
(i) Show that the set B(S) of all real-valued functions on S may be identified with 1m,
and the "sup norm" on B(S) corresponds to the norm Ixl := max {Ix' I: I _< i <_ m}
)
on p'".
(ii) Let T be a (strict) contraction on S, i.e., IITf Tg <6111 gll for some
6 e [0, 1). Show that this may be viewed as a contraction, also denoted T, on
68",i.e.,ITxTyI Ix yl.
(iii) Show that T"O converges to a point, say x *, in S. [Hint: {TO: n >_ l} is a
Cauchy sequence in l8'".]
(iv) Show that Tx* = x *.
(v) Show that T"y x* whatever be ye p'". [Hint: If 7'x* = x *, Ty* = y *, then
x* = y *]
3. Write out the dynamic programming equation for the optimal infinite-horizon
discounted reward version of part (i) of the taxicab operation problem described in
Example 1.2.
4. (i) Prove Theorem 2.3.
(ii) Under the hypothesis of Theorem 2.3 show that (T"O)(x) is the N -stage optimal
reward, where T is defined by (2.15).
_<
1. Suppose {X} is a diffusion on Oa k \G with absorption at OG, starting at x G. Give
an argument analogous to that given in Exercise 3.5 of Chapter V to prove that
P:(taG t) = o(t), as t f, 0.
2. Check (3.8).
3. Check (3.13).
4. Show that the probability that the diffusion with coefficients (3.40), with c replaced
by c(x) (continuously differentiable) ever reaches 0, starting at x > 0, is zero.
5. Suppose that the expression on the extreme right in (3.47) is greater than or equal
to 1, then the maximum in (3.41) is attained at c* = 1. [Hint: c = 0 gives a minimum,
and the function is strictly concave on [0, 1].]
6. Suppose {X,} is a diffusion on (a, b) with infinitesimal generator A = 26 2 (x) d 2 /dx 2 +
(x) d /dx. Let f be in the domain of A, i.e., (T, f r)
, /t -* Af as t j 0. Prove that the
function g(x):= E x f e s(T s f )(x) ds satisfies the resolvent equation: (/3 A)g(x) _
-
fo
f(x). [Hint: (T,g)(x) = e`9(x) efl' e -vs(Tsf)(x) ds.]
THEORETICAL COMPLEMENTS
_< _<
1. (Extensions to More General State Spaces: Finite-Horizon Case) Let S be a complete
separable metric space, A a compact metric space, g,,, (0 k N) continuous
560 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
(i) h N (x) 2= SUP QEA g s,(x, a) is attained (i.e., h N (x) = g N (x, aN(x)) for some a N (x) e A)
and is upper semicontinuous on S.
(ii) Given that hk +l is upper semicontinuous on S:
f
(a) The function (x, a) . h k+ 1(z)p(dz; x, a) is upper semicontinuous on S x A.
(b) h k (x) = suq, EA {g k (x, a) + $ h k+ ,( z)p(dz; x, a)} is upper semicontinuous on S
(and the supremum is attained).
(c) There is a Borel-measurable function f k on S into A such that
h&(x) = gk(x,fk(x)) + f hk+ 1(z)p(dz;x,f k(x)).
See, Maitra, loc. cit., for this generalization.
Therefore, Proposition 1.1 and Theorem 1.2 go over under (A).
Jx := EX I
0
e - `r(Y, a,) dt
where > 0 is the discount rate, Y is the state at time t, and a, is the action at time
t. A policy f is (semi-)Markov if, for all k >- 1, f, depends only on the last coordinate
among its arguments, i.e., if fk is a function on S into A. Under such a policy, the
stochastic process { },: t >- 0} is semi-Markov. In other words, although the embedded
process {Xk : k >- 0} is (nonhomogeneous) Markov having transition probability
p(dz; x, fk _,(x)) at the kth step, and the holding times To , T,, ... are conditionally
independent given {X k : k >- 0}, the latter (conditional) distributions y(du; x, fk (x)) are
not in general exponential. Hence, { Y: t >- 0} is not Markov in general. A policy
f =_ f is said to be stationary if it is (semi-)Markov and if fk = f for all k, where
( )
An Introduction to Stochastic
Differential Equations
where /1(.), a(.) are given functions on ll ', and {B,} is a standard
one-dimensional Brownian motion. In other words, conditionally given
{X : 0 < s < t}, in a small time interval (t, t + dt] the displacement
5
X,=x+m(Xx )ds
0
+5 a(X)dB.
0
(1.2)
563
564 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
function o(x) - a, the second integral has the obvious meaning as a(B, B o ),
so that (1.2) becomes
It turns out that (1.3) may be solved, more or less by Picard's well-known
method of iteration for ordinary differential equations, if (x) is Lipschitzian
(Exercise 2). This definition of the (stochastic) integral with respect to the
Brownian increments dB, easily extends to the case of an integrand that is a
step function. In order that {X} may have the Markov property, it is necessary
to restrict attention to step functions that do not anticipate the futute. This
motivates the following development.
Let (S2, F, P) be a probability space on which a standard one-dimensional
Brownian motion {B,: t > 0} is defined. Suppose { : t >, 0} is an increasing
family of sub-sigmafields of F such that
For example, one may take .y, = a {B.,: 0 < s < t }. This is the smallest .01, one
may take in view of (i). Often it is important to take larger A. As an example,
let SZ = a[{B s : 0 < s < t }, {Z z : A E A}], where {Z 2 : A E A} is a family of random
variables independent of {B,: t >, 0 }. For technical convenience, also assume
that .F F are P-complete, i.e., if N e .y, and P(N) = 0, then all subsets of N
are in Wit : this can easily be ensured (theoretical complement 1).
Next fix two time points 0 <, a < < oc. A real-valued stochastic process
{ f (t): a < t < } is said to be a nonanticipative step functional on [a, ] if there
exists a finite set of time points t o = a < t, < < t, = and random variables
f (0 < i < m) such that
(i) f is .F,,- measurable (0 < i < m),
(1.5)
(ii) f (t) =f for t, < t < t + , (O <, i < m 1), f () = f,.
;
Definition 1.1. The stochastic integral, or the Ito integral, of the nonanticipative
step functional f = { f(t): a < t < } in (1.5) is the stochastic process
f(to)(B, B1 0 ) . f(a)(B, Ba
I f(s) dB
`
s :=
i
) for t E [a, t,],
(a<t<). (1.6)
Observe that the Riemann-type sum (1.6) is obtained by taking the value of
the integrand at the left endpoint of a time interval (t,_,, ti]. As a consequence,
THE STOCHASTIC INTEGRAL 565
for each t E [a, ] the Ito integral ca f(s) dB., is .9i measurable, i.e., it is
nonanticipative. Some other important properties of this integral are contained
in the proposition below.
Proposition 1.1
(a) If f is a nonanticipative step functional on [a, ], then t ^ f f(s) dB, is
continuous for every w n S2; it is also additive, i.e.,
f^"
'f(u)dB+
s
j f(u)dB= J " f(u)dB., (a<'s<t<'). (1.7)
a
J a
(f(s) + g(s)) dB5 = J f(s) dB,, +
a a
g(s) dB,,. (1.8)
E
J R
a
f 2 (t) dB, = J
a
(Ef 2 (t)) dt < oo. (1.9)
E(J
rI
a
f(u)dB )= f a
, f (u) dB. (a'<s<t<). (1.10)
Also,
2(s)dsl.Fa) (1.11)
E (f f(s)dB,)2I J' E (Ja f
and
c 2 r
f
Ef (s) dB 5 = 0,
' a
E
f (s) dB s = E f
a
2 (s) ds, (a < t < )
(1.12)
+ (1.14)
...
+f(ti- 1 )(B,, Bt,-,)+.f(t,)(B, B,,).
Observe that, for s' < t', B,. B5 . is independent of.. (property (1.4(ii)), so that
Therefore,
=0 (s (1.17)
E
( J s
t f(u)dBI.F,)
/
<t).
From this and (1.7), the martingale property (1.10) follows. In order to prove
(1.11), first let t e [a, t 1 ]. Then
E (f a
f(s) dBs J z I ya = E(.f Z (a)(B, Ba) 2 I ^`a) = .% 2 (a)E((B ,
/ J Ba) 2 I Via)
= 0. (1.20)
THE STOCHASTIC INTEGRAL 567
= E (t tj_ t_ + (t t /' tj
(1.21)
=E(
J t f 2 (s)dsI F.
The next task is to extend the definition of the stochastic integral to a larger
class of functionals, by approximating these by step functionals.
R
E f 2 (t)dt < co. (1.22)
a
t3
E (f(t)f(t)) 2 dt-^0 asn --goo. (1.23)
a
Proof. Extend f(t) to ( oo, oo) by setting it zero outside [a, ]. For > 0
write g(t) := f (t s). Let i be a symmetric, continuous, nonnegative (non-
random) function that vanishes outside (-1, 1) and satisfies f 1_ 1 i1i(x) dx = I.
Write Ii e (x) %= 1i(x/s)/E. Then /0, vanishes outside (s, E) and satisries
568 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
9E ( ) = 7E(S) YE( ) = e f(
' )Y ( ),
1
IgE(t) f(t)I Z dt = 2>z 19 0) f(^)I z d^
1
1
= 2^ I.f(^)I2Ie1E(^) lj 2 )d^. (1.25)
E J d f J I.f ( )In ^
2 d dP = 2n Jn J f 2 (t) dt dP
J
E I gE(t) f (t)I 2 dt --> 0 as s 0. (1.27)
e
E (g(s) f (s)) 2 ds --> 0. (1.28)
For this, first assume that g is also bounded, jgj < M. Define
h(t)=gI a+ k (a )I
n /
ifa+k<t<a
n
+kt
n
(a),(0<k<nI),
h() = g().
1.1(c). Therefore, by the Maximal Inequality (see Chapter!, Eq. 13.56) and the
second relation in (1.12), for all c > 0,
where
Choose an increasing sequence {n k } of positive integers such that the right side
of (1.30) is less than 1/k 2 for r = 1/k 2 , if n, m > n k . Denote by A the set
A :_ {cw e S2: M k k (w) <, 1/k 2 for all sufficiently large k} . (1.32)
In other words, on the set A the sequence { f,, f, (s) dB,: a < t < } is Cauchy
k
570 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Ja
in the supremum distance (1.31). Therefore, f", (s) dB, (a < t < ) converges
uniformly to a continuous function, outside a set of probability zero.
Definition 1.3. For f e ], the uniform limit of f f" (s) dB (a < t < ) k 5
A`. If f e M[a, co ), then such a continuous version exists for f f (s) dB s on the
infinite interval [a, cc).
It should be noted that the Ito integral of f is well defined up to a null set.
For if {f}, {g"} are two sequences of nonanticipative step functionals both
satisfying (1.23), then
If { f" }, {g," } are subsequences of {f}, {g"} such that f f" (s) dB s and
k k k
a
J g(s) dB3 both converge uniformly to some processes {Y, }, {Z, }, respectively,
outside a set of zero probability, then
Ef
a
(1 Z) 2 ds tim E
k- a
(fn k (S) 9m k (S))
2 ds = 0,
continuous, one must have Ys = Z., for a < s < , outside a P -null set.
Note also that $f(s) dB, is .` ,-measurable for a < t <, , f e -#[a, ]. The
following generalization of Proposition 1.1 is almost immediate.
Example. Let f (s) = B . Then f e 4' [a, cc). Fix t > 0. Define f(t) = B, and
S
J
t
B5 dB 5 = (B-B)t. (1.35)
0
z^ -i 1 z^ -i
( z i
(B z B0)
z
Br+l,n(Br+l,n Br,n) _ tBr+l,n Br,n) + -1
r=0 2 r=0
belongs to .A'[0, cc). In the next section, it is shown that there exists a unique
continuous nonanticipative solution of (1.2) if u() and a(.) are Lipschitzian
and, in particular, {6(X )} e .H[0, cc).
5
In the last section, a precise meaning was given to the stochastic differential
equation (1.1) in terms of its integral version (1.2). The present section is devoted
to the solution of (1.1) (or (1.2)).
I(x) i (Y)I <' Mix YI, 1Q(x) Q(Y)I '< Mix YI , Vx, y. (2.1)
The first order of business is to show that equation (1.1) is valid as a stochastic
integral equation in the sense of Ito, i.e.,
Theorem 2.2. Suppose p(), a(.) satisfy (2.1). Then, for each 3-measurable
square integrable random variable X, there exists a unique (except on a set of
zero probability) continuous nonanticipative functional {Xt : t >, a} belonging
to .i [a, oo) that satisfies (2.2).
572 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Define, recursively,
f
XI(" +i) ; = Xa + ,u(X(")) ds +
f
a
'
a
t
For example,
+
J ([Xa + p(XX)(s
a) + u(Xa)(B.,
B )]) dB,,
2 a t <, T.
r2
t
(X ^"+ 1 ) _ X = ") ) 2
If.t
((Xs"))_u(Xs" i))) ds +I(Q(Xs" ) )o(XSn -1) )) dB
a
2
2( /2(XS" -1) ds
(k(X")) ))
j a
\JJ a
Write
Taking expectations of the maximum in (2.6), over 0 < t < T, and using (2.1)
we get
T 2
D r" + 1) 2EM2
T I X(")
s X s( " - ' ds
)
Now,
J
J
T
EI a
T IXs"t XS" 'It ds) < (T a)E
J a
(Xsn) Xsn 1))2 ds
T
(T a) DS"t ds. (2.9)
a
Also, by the Maximal Inequality (Chapter I, Theorem 13.6) and Theorem 1.3
(see the second relation in (1.12)),
4M2
E E(Xs"t Xs" - ' 1 )zds 4M z J T DS" ds.
a
(2.10)
where c z is a finite positive number (Exercise 1). It follows from (2.11), (2.12),
and induction that
By Chebyshev's Inequality,
2
zn c 2(T
PI max ^Xtn+'1 Xt"Ij > 2 - ") < (2.14)
T n.
Since the right side is summable in n, it follows from the Borel-Cantelli Lemma
that
Let N denote the set within parentheses in (2.15). Outside N, the series
J a
T E(X;" ) Xt ) 2 dt <
m
tim
- 00 f ^,
T E(X;" ) X;'" ) ) 2 dt. (2.16)
m o0
(DT))"2 \ (DT))1 12 > 0, (2.1 7 )
rn+1 / rn+1
f' (X,)
ds J o(XS) dB5
1
a
f
T
M J IXS" Xs 1 ds + max
a a,tsT
)
a'
(Q(XS" ) u(XX)) dB s . (2.19)
)
The first term on the right side goes to zero, as proved above. For the second
term, use the Maximal Inequality (Chapter I, Theorem 13.6) to get
PI max I
\\\a5t5T f. '
(a(Xs" ) ) a(X2)) dB, > 1 J < k 2 E
k /
J T (v(Xs" )
a
) ^(X)) 2 ds
Y (D) ) (2.20)
k2
\ r-
= n+1
Since the last expression in (2.20) is summable in n (Exercise 2), it follows from
the BorelCanteIli Lemma that max{jsa (o(XS" ) ) a(X5 )) dB s j: a < t < } < 1/k
DIFFUSIONS AS SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS 575
T
(PT < cl q ds. (2.21)
c 2 (T a) C1 "
PP T ^ --'.0 asn *oo. (2.22)
n!
The next result identifies the stochastic process {X,: t >, 0} solving (2.2), in
the case a = 0, as a diffusion with drift u(. ), diffusion coefficient a 2 (), and
initial distribution as the distribution of the given random variable X 0 .
Theorem 2.2. Assume (2.1). For each x E El let {X: t > 0} denote the unique
continuous solution in .l[0, oo) of It's integral equation
X f = x +
f o,
p(Xs) ds + i 1 u(Xs) dB (t > 0).
,10
5 (2.23)
Then {X'} is a diffusion on R' with drift j(.) and diffusion coefficient a 2 (. ),
starting at x.
X; = Xs +
S
t
p(X) du + f 5
' u(Xx) dB, (t '> s). (2.24)
X, = z + J S
u(X) du + J a(X)dB,
S
(t >' s). (2.25)
576 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Let us write the solution to (2.25) (in . #[s, co)) as 6(s, t; z, Bs), where
Bs:= {B B s : s < u < t }. It may be seen from the successive approximation
scheme (2.3)(2.5) that 6(s, t; z, B.,) is measurable in (z, B.,) (theoretical
complement 1). As {X: u > s} is a continuous stochastic process in #[s, co)
and is, by (2.24), a solution to (2.25) with z = Xs, it follows from the uniqueness
of this solution that
+n +h
X,+h = z +
l' ,h
(X) du +
s+n
J
a(X) dB, (t >, s) (2.27)
has the same distribution as that of (2.25). This fact is verified by noting that
the successive approximations (2.3)(2.5) yield the same functions in the two
cases except that, in the case of (2.27), B., is replaced by Bs+,^,. But B, and B +
have
have the same distribution, so that q(s + h, t + h; z, dy) = q(s, t; z, dy). This
proves homogeneity.
To prove that {X} is a diffusion in the sense of (1.2) or (1.2)' of Chapter
V, assume for the sake of simplicity that () and o 2 (. ) are bounded (see Exercise
7 for the general case). Then
E(Xx x) 2 = E( Jo
(X;)ds
r
)
Z
+E(fo
t a(Xs)dB5)
/
(X5)
+ 2EL(
J o ds)( o J a(X) dB.,) ]
= 0 (t 2 ) +
J 0
Ea 2 (X) ds + 0(t) = 0(t) as t j 0. (2.29)
DIFFUSIONS AS SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS 577
leading to
Now, as in (2.28),
J Ea
0
2(XS) ds =
0
a 2 (x) dx +
= ta 2 (x) + 0(t)
J0
E(a 2 (X;) a 2 (x)) ds
as t j 0. (2.31)
to be a nonanticipative step functional on [a, ] if (1.5) holds for some finite set
of time points t o = a < t, < <t,, = and some random vectors f ;
(2.32)
578 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
It follows from this definition that the stochastic integral (2.32) is the sum
of k one-dimensional stochastic integrals,
k
f(s)dB =
J a
s
i=1
f m`m(s) dB(` . ) (2.34)
Parts (a), (b) of Proposition 1.1 extend immediately. In order to extend part
(c) assume, as in (1.9),
J E^f(u)1 2 du - 1
t1
J
a
E(f ^`^(u)) 2 du < cc. (2.35)
The square integrability and the martingale property of the stochastic integral
follow from those of each of its k summands in (2.34). Similarly, one has
e f(s)dB s I = 0. (2.36)
E(J a 111
EL ( of n(u) dB(')Xf " fv1(u) dB)
a ^
= 0 for i J. (2.38)
To see this, note that, for s < s' < u < u',
= E[E(... .mau) I . ] = E[.f (n(s)(Bs`) Bsn)f u)( u )E(B/) B;) I .F) I .Fa] = 0.
(2.39)
= 0.
Definition 2.2.. If f = {f(t) = (f ' 1 (t), ... , f(m))) a < t < } is a right-continuous
stochastic process with values in (for some m) such that f(t) is F,-measurable
for all t e [a, ], then f is called a nonanticipative functional on [a, ] with values
in fl8m. If, in addition,
J
E R If(t)j 2 dt < oo, (2.40)
then f is said to belong to .WW[a, ]. If f e W[a, ] for all > a, then f belongs
to ./H[a, o)).
For f e di[, ] with values in R'` one may apply the results in Section Ito
each of the k components f;, f (u) dBu (1 <, i < k) to extend Proposition 1.2
to k-dimension and to define the stochastic integral f f(u) dB. for a
k-dimensional f e ^[a ]. The following extension of Theorem 1.3 is now
immediate.
I t(x) t(y) I '< Mx YI, Ia(x) a(Y)II '< Mix YI, (x, ye
(2.42)
580 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
X;'' = X +
a
p (X,) ds +
a
a ; (X 5 )dB s , ( 1 < i < k). (2.44)
Here ,(x) is the k- dimensional vector, (6 i1 (x), ... , v ;k (x)). The proofs of the
following theorems are entirely analogous to those of Theorems 2.1 and 2.2
(Exercises 4 and 5). Basically, one only needs to write
Theorem 2.4. Suppose p( ), a(.) satisfy (2.42). Then, for each -measurable
square integrable random vector X 8 , there exists a unique (up to a P -null set)
continuous element {X,: t > a} of #[a, oo) such that (2.43) holds.
Theorem 2.5. Suppose ( ), a(.) satisfy (2.42), and let {X'; t > 0) denote the
unique (up to a P-null set) continuous nonanticipative functional in .t[0, co)
satisfying It's integral equation
X,
f' ,
s) ds +
0
J r 6(X 8 ) dB5 , ( t >- 0). (2.45)
Then {X'} is a diffusion on R' with drift t() and diffusion matrix a()6'( ),
a'(.) being the transpose i(.).
It may be noted that in Theorems 2.4 and 2.5 it is not assumed that a(x) is
positive definite. The positive definiteness guarantees the existence of a density
for the transition probability and its smoothness (see theoretical complement
5.1 of Chapter V), but is not needed for the Markov property.
X'( o) - X0,
X; 2) = X0 + J y{X (I sy) +
r 0 QB S } ds + aB ,
0
t i ,z \ r
= I ty+ } )XpyQ B,ds+aB
2!J 0
Cl S2y2 s
X1 3) = Xp + p y (I Sy + 2,--- ^ Xp y6 B u du + QB S ^ dS + 6B,
J(' B ds+a'B,
fo
2 2 3 31 I s Jz
= t ^ ---- IX 0 +, 2 6
I ty+ 2
2 2 33
/ f(' o,
r
0
B dudsy6
0
s
i t 3! ^Xp+,, a
ty+ -2
t22 t3 3
o
r
( f ds)Bduy f Bs ds+aB,
. ' o,
= 1 ty+ -- ,
2!
- - ---'-)X0+,r 2 (J
3
J( '
o
(tu)Bdu
0
Bsds+aB,.
(2.46)
Assume, as an induction hypothesis,
I
fr(tS )m
-
(ty)Mn-1
a," ) = Y -- )X 0 + Y- (Y)r"6 Bs ds + 6B,. (2.47)
( M "= 0 mI m1 0 (n1 I)!
Now use (2.4) to check that (2.47) holds for n + 1, replacing n. Therefore, (2.47)
holds for all n. But as n co, the right side converges (for every w e S2) to
X, = e-"x r
t7 e-''u-s)B5 ds + aB r . (2.49)
0
3 IT'S LEMMA
This is easily checked by recognizing that the expression ZN say, within the ,,, ,
absolute signs is, for each n, a martingale (1 < N < 2 "), so that the Maximal
Inequality (Chapter I, Eq. 13.56) may be used to prove (3.1) (Exercise 1). Notice
that the quadratic variation of {B } over an interval equals the length of the
5
where f, g E l'[0, T], and Y(0) is .-measurable and square integrable. One
may express (3.2) in the differential form
In other words,
(P(Y(t)) = (P(Y(0)) +
E {(P'(Y(s))f(s) + 1 "(Y(s))g 2 (s)} ds +
0
-
J 9 '(Y(s))g(s) dB 5
(3.5)
.
The extra term Zip "(Y(t))g 2 (t) dt appearing in (3.4) arises because
Since the term g 2 (t) dt cannot be neglected in computing the differential dcp(Y(t)),
one must expand cp(Y(t + dt)) around Y(t) in a Taylor expansion including the
second derivative of cp.
The same argument as above applied to a function p(t, y) on [0, T] x (f8 1 ,
such that cp o := cp/t, q,', cp" are continuous and bounded, leads to
dcp(t, Y(t)) = {(p o (t, Y(t)) + cp'(t, Y(t))f(t) + jqp'(t, Y(t))g 2 (t)} dt
a0(P(t, Y) __ a0t, Y) 7r(P(t, y)'= a Y) (1 < r < m). (3.8)
at
f
(3.9)
y (r) (t) = y()() + I fr( s) ds + J s dB s
grO r m).
(1 ^^
o o
Here fr,. 'fm are real-valued and g 1 , ... , g m vector-valued (with values in
ff8'k ) nonanticipative functionals belonging to . &[0, T]. Also Y ( ' ) (0) are
.o -measurable square integrable random variables. One may express (3.9) in
the differential form
1
+ arar. (t, Y(t))(gr(t) . r (t)) dt
2 ! ISr.r'-<m
m
+ I a r (p(t, Y(t))gr (t).dB,. (3.11)
r
584 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
dcp(t, Y(t)) = (p(t + dt, Y(t + dt)) (p(t, Y(t)) = ^p(t + dt, Y(t + dt))
r=1
+ 12 Orar'gP(t,Y(t))dy(r)(t)dY(r')(t).
! 1 r,r' S m
(3.12)
_
f=1
ga 1 (t)9:` 1 (t)(dB,( ) 2 + E 9( t) (t)9(' ) (t)dB(` ) dBf;> .
i#j
(3.13)
Now as seen above (see Eq. 3.1) the first sum in (3.13) equals
N-1
= Y 9:` ) (m 2 "t)grj) (m 2-" t)(B(m+l)Y ^t - Bm 2 ^t)(Bm +1)2
-
t ) - ^t - Bm2 ^t),
m=0
I <N<2", (3.15)
Thus,
Using (3.14), (3.17) in (3.12), It's lemma (3.11) is obtained. A more elaborate
argument is given in theoretical complement 3.1. For ease of reference, here is
a statement of It's Lemma.
E fOT ( 0 r 4P(s, Y(s)) 2 Ig r (s)I 2 ds < oo, (I < r <, m), (3.18)
i.e., r cpgr e .A"[0, T] (1 <, r < m). Then (3.11) holds, i.e.,
r= 1
+ 12 1
. 1 5
arar'(p(u, Y(u))gr(u)gr.(u)
r.r' <- m I
du
m t
+ Z 0 r cp(u, Y(u))g r (u) dB (0 <, s < t < T).
r=1 s
(3.19)
where a r (x) is the rth row vector of a(x), and A is the differential operator
586 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
r
(A(P)(u, x)'= drr'(X)arar'(v(u, x) + Z (X)ar(v(u, x), )
(
2 1-<r,r'Sk r=1
(3.21)
((drr'(X)))'^ O(X)O'(X).
B(x) + C(x)d(x)
2 (Aq, )(x) = (d(x))F (Ixl) + Ixl F (Ixl) (Ixt > 0). (3.24)
Proposition 3.3. Let t(.), 6() be Lipschitzian, and (x) nonsingular for all x.
IT'S LEMMA 587
J^ J 1 exp{1(u)} du < oo
a(u)
(3.25)
Then
where
Proof. First note that if (3.25) holds for some c> 0 then it holds for all c > 0
(Exercise 2). Let c = ro > 0. Define
F(r) :=
f"0
exp{ 1(u)}
u
J acv)exp{I(v)} dv J du, (r >, ro ) (3.28)
and
Note that F'(r) > 0 and F"(r) + ((r)/r)F'(r) _ 1/a(r) for r > r 0 . Hence, by
(3.24),
Fix x such that Ixi > r0 . By Corollary 3.2(b), and optional stopping (Proposition
13.9 of Chapter 1; also see Exercises 3, 4),
?J N := inf{t >, O:IX; I = ro or N}, (r0 < IxI < N). (3.32)
2EF(IXn I) 2F(Ixl) = E
N
J nN 2(Acp)(X9) ds E(11,). (3.33)
0
Now the first relation in (3.25) implies that T (O:, o ) < oo a.s. (see Corollary 15.2
of Chapter V. or Exercise 5). Therefore, rJ N --+ t,, B(a:ro) a.s. as N -* co, so that
588 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
(3.33) yields
Proposition 3.4. Let f = {(fi (t), ... , fk (t)): 0 < t < T} be a nonanticipative
functional on [0, T] satisfying
E
ff .
f(t)dB,4 < 9k 3 T Jy ^
'T{Ef
1=1
4(t)} dt.
0
(3.36)
f
E( oT f(t)dB,f 4 =k 4 E) 1 j f Tf(t)dBi^>)4
/ \\\ki =io
4
E f0T
g 4 (t) dt < oo. (3.39)
0<t<T. (3.40)
Taking expectations,
z 4^1/2
f
(' )
o g(u) dB {Eg4(t)}^^2
dtv = 6E I( 0 g(u) dB ) g2(t)} 6 E(
= 6J 2 {Eg 4 (t)}'/ 2 . (3.42)
or,
This proves (3.38) for bounded nonanticipative step functionals. For the case
of an arbitrary nonanticipative functional g satisfying (3.39), observe that such
a g belongs to ./l[0, T]. Following the proof of Proposition 1.2, there exists a
sequence of bounded step functionals g that converge almost surely to g and
for which
T
f 'T
9(s) dB, --
0
g(s) dB 5 a.s. (i = 1, 2),
E f
OT (gn(s) g z (S)) Z ds > 0.
By Fatou's Lemma,
n-
E J('o
T \ 4
g(t) dB r !
/
-< lim E
n-'m (f oT
g(t) dB,^
4
9T lim E
Jo
T
g, (t) dt
T
= 9TE fg 0
4 (t) dt. n
Corollary 3.5. Let {X} be a diffusion with drift coefficient p() and diffusion
coefficient a 2 (.), both Lipschitzian and bounded. Then, for every E > 0,
0
Y(t) != Xl X = x y+
J r
((Xs) (X5 )) ds + J r
0
(a(XS) a(XS )) dBs,
(3.44)
and cp(z) = Iz1 2 , to get
+ I rr(Xs) ar(X5)i2} ds
r=1
+ J
0
2(X Xs) (a(Xs) (XS )) dB s . (3.45)
CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 591
Note that, since X, X; E l[0, oo) and p() and o(.) are Lipschitzian, the
expectation of the stochastic integral is zero, so that
EIX, Xfl 2 = Ix y1 2 +
J'
o f
2E(Xs Xs) . ((Xs) (Xs ))
As a consequence, the diffusion has the Feller property: If y --^ x, then p(t; y, dz)
converges weakly to p(t; x, dz) for every t >, 0. To deduce this property, note
that (3.48) implies that, for every bounded Lipschitzian function h on R' with
Ih(z) h(z')I < cjz z'J,
Now apply Theorem 5.1 of Chapter 0. To state the Feller property another
way, write T, for the transition operator
the coefficients is that they are Lipschitzian. As we shall see in this section, the
stochastic integral representation (2.45) and It's Lemma are effective tools in
analyzing the asymptotic behavior of these diffusions. Notice, on the other
hand, that the method of Section 15 of Chapter V (also see Section 3 of the
present chapter) does not work for analyzing transience, recurrence, etc., as
quantities such as (r) may not be finite.
Singular diffusions arise in many different contexts. Suppose that the velocity
V, of a particle satisfies the stochastic differential equation
where
( 0 0
6(x) - 0( (x)^
CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 593
where a&(x) is nonsingular. The following result may be thought to deal with
this kind of phenomenon, although 6(x) need not be singular. For its statement,
use the notation
Assume that, for all x, the eigenvalues of the symmetric matrix 2(J(x) + J'(x))
are all less than or equal to A 1 < 0, where k2 < 22 1 . Then there exists a
unique invariant distribution ir(dz) for the diffusion, and p(t; x, dz) converges
weakly to n(dy) as t , cc, for every x e W.
Proof. The first step is to show that, for some x, the family { p(t; x, dy): t >, 0}
is tight, i.e., given E > 0 there exists C E < oo such that
It will then follow (see Chapter 0, Theorem 5.1) that there exist t ^ cc and a
probability measure it, perhaps depending on x, such that
weakly
p(t.; x, dy) i rc(dy) as n --+ oo. (4.7)
where {X'} is the solution to (4.3) with X 0 = x. Clearly, (4.6) follows from (4.7)
by means of Chebyshev's Inequality,
In order to prove (4.8), apply It6's Lemma (see Eq. 3.20) to the function
^(y) = Iy1 2 to get
5
t (' t
IXxI 2 = IXI2 + (Acp)(XS)ds + . ,(p(X7)a,(Xs)dB5. (4.9)
0 r= 1 0
594 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
where tr D denotes the trace of D, i.e., the sum of the diagonal elements of D.
Now, by a one-term Taylor expansion,
i
(r) (Y) r (0) =
( ) y grad j(By) dB,
0
(Y'J(Oy)Y) dO + y(0).
y (Y) = Y - ((Y) (0 )) + Y'( 0 ) =
f o
1
Now,
Also,
Substituting (4.13) and (4.15) into (4.10), and using the fact that Il < 6 1Y1 2 + 1/5
for all 6 > 0, there exist 6, > 0, 6 2 > 0, such that 2), 1 kt 6 1 > 0 and
In arriving at (4.17), use the facts (i) IAq (y)I < cjy1 2 + c' for some c, c' positive,
so that f o EIA(p(Xs)I ds < oo, and (ii) the integrand of the stochastic integral in
(4.9) is in .t[0, cc). Now (4.17) implies that t --> EIX, I Z is absolutely continuous
with a density satisfying, by (4.16),
a ` 0 (t)) - 6 2 e b `,
dt (e (4.19)
0 (t) -< {0(0) + ( 2 /8')(e a ` 1)}e - a` <, Ix1 2 e - a' 1 + 6 2 /S'.
For, if (4.7) holds for some x, and (4.20) holds for this x and all y then for
every Lipschitzian and bounded f, writing t for t,,,
0
Z,:= X, X; =x y+ J t f(s) ds + J ry(s) dB s ,
0
(4.22)
where
Letting Y, = Z, and p(z) = Iz1 2 in It6's Lemma (see Eqs. 3.44-3.46), obtain
But, by (4.23) and the fact that z'J(w)z < A l lz1 2 for all w, z (see Eq. 4.12),
Z S fs = J 1
0
Z;J(X., + OZ5 )Zs dO A I IZs12. (4.25)
Also,
(T,J)(Y) - ..i :=
-
f f (z)it(dz). (4.29)
i.e., if X 0 has distribution it, then Ef(X,) = Ef(X 0 ) for all t > 0. In other
words, it is an invariant (initial) distribution. To prove uniqueness, let n' be
any invariant probability. Then for all bounded continuous f,
J (T f)(z)n'(dz) = f f(z)ir'(dz)
Vn. (4.33)
f- f f(z)ir(dz) = ff(z)7r'(dz),
(4.34)
implying it' = n. U
t
XX:= e``x + e ( `"s )c v dB
Q s (4.36)
L(t) = E eta-s)co dB se
(r-s)'6 dB s = e('-s)c66^e('-s)' ds
0 0 0
=
5
0
eae du. (4.37)
Suppose that the real parts of the eigenvalues A.....2, , say, of C are
negative. This does not necessarily imply that the eigenvalues of C + C are all
negative (Exercise 3). Therefore, Theorem 4.1 does not quite apply. But, writing
598 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Ile` c I < Ile t " c Ii Ile" -( ""II < IIB 1 t'II max{ Ilesc ii: 0 < s < 1} -+ 0 (4.38)
E := J ec aae c du.
0
" (4.39)
EXERCISES
J B,, i l.n B, ,I * 00
i=0
(iv) Prove that, outside a set of zero probability, the Brownian paths are of unbounded
variation over every finite interval [a, b], a < b. [Hint: Use (iii) for every interval
[a, b] with rational end points.]
(v) Use (iv) to prove that, outside a set of zero probability, the Brownian paths are
nowhere differentiable on (0, oo). [Hint: If f is differentiable at x, then there
exists an interval [x h, x + h] such that If() f (z)I < ( 2 I f '(x)l + i) iy zI
Vy, z e [x h, x + h], so that f is of variation less than (21f'(x)l + 1)2h on
[x h,x+h).]
2. Consider the stochastic integral equation (1.3) with p() Lipschitzian, ((x) le(y)i
Mix yt. Solve this equation by the method of successive approximations, with the
nth approximation X;" given by )
[Hint: For each t > 0 write b"(t) := max {IXS" XS" "1.0 -< s < t}. Fix T> 0. Then
) -
S"(T)-<M
T
0
S"_ i (s)ds-<M 2 J f
(' Tr
0 0
-1
S "_ 2 (s)dsdt"_,
M"
T
o fo
t-,
J(' tz
Hence, E"
c5(T) converges to a finite limit, which implies the uniform convergence
of {X:0 5 s T} to a finite and continuous limit {XS : 0 s T}.]
3. Suppose f", Ja [a, ] and E f Q (f"(s) f (s)) 2 ds - 0 as n + co.
(i) Prove that
r r l
U"'= max f(s)dB f(s)dB. : 0
Ja ))J
in probability.
(ii) If { Y(t): a t } is a stochastic process with continuous sample paths and
f Q f(s) dB s --* Y(t) in probability for all t e [a, ], then prove that
P(Y(t)=f(s)dB,,`dte[a,]=1. )
J to
f(s) dB a = f(t1)B1 f(t 0 )B B,sf'(s) ds.
ro
10
600 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
E(X, - X0 ) 2 -< 4M 2 (t + 1)
fo,
E(XS - X0 ) 2 ds + 4t 2 Ep 2 (X0 ) + 4 tEa 2 (Xo
(a) D, <, dl
f o D
s ds + d 2 , where d, = 4M 2 (T + 1), d 2 = 4T(TEp 2 (X0 ) + Ea 2 (X0
4. Write out a proof of Theorem 2.4 by extending step by step the proof of Theorem 2.1.
5. Write out a proof of Theorem 2.5 along the lines of that of Theorem 2.2.
6. Consider the Gaussian diffusion {X} given by (2.49).
(i) Show that EX, = e - `Yx.
(ii) Compute Cov(Xl , X+,,).
(iii) For the case y > 0, a 96 0, prove that there exists a unique invariant probability
distribution, and specify this distribution.
7. (i) Prove (2.28), (2.30), and (2.31) under the assumption (2.1).
(ii) Write out a corresponding proof for the multidimensional case.
(m = 0, 1, ... , 2" - 1) is i.i.d. with mean zero and variance 2(4 ")t 2 . Use (13.56)
-
of Chapter I to estimate
1
P max ZN ,"I > --
EXERCISES 601
f c
^ exp{ 1(u)} du = ca for some c> 0.
(iv) Use an argument similar to that outlined in (i) (iii) to prove that
f t
t . ( g(u)
o dB g(t)
3
belongs to .'H[0, T]
602 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
[Hint: Use (1.6), with g in place of f, to get E {(f g(u) dB) 6 g 2 (t)} S ct' for an
appropriate constant c.]
7. Let {X} be a diffusion on 68 1 having Lipschitzian coefficients ( ), v( ), with
o 2 (x) > 0 for all x.
(i) Use It's Lemma to compute (a) i/i(x):= P( {X^ } reaches c before d), c < x < d;
(b) p P({X; } ever reaches c), x > c; (c) p xd t= P( {Xf } ever reaches d), x < d.
[Hint: Consider p(y):= f' exp{ 1(c, r)} dr for c -< y -< d, where 1(c, r):_
f C ( 2 u(z)/a 2 (z)) dz.]
(ii) Compute (a) ET, A T d , where T,:= inf {t _> 0: Xx = r }, and c < x < d; (b) Et c
(x > c); (c) ET d (x <d). [Hint: Let cp be the solution of Acp(y) _ 1 for c <y < d,
cp(c) = p(d) = 0; use Ito's Lemma.]
T 2m
E/ g(t)dB,) < (2m 1)mTm-I f Eg2m(t)dt.
o 0T
[Hint: For bounded nonanticipative step functionals g, use Ito's Lemma to get
t \ 2m t (' s 2,-2
Jt 1= E(g(s) dB s ) m(2m 1) E (J g(u) dB g 2 (s) ds,
)
o / o o
so that
(' dJ, = m(2m I )E (J
r
g(u) dB,)
2m -2
g 2 (t)
dt o
f
2m (2m -2)/2m
9. (L-Maximal Inequalities)
(i) Let {Z.: n = 0,1, ...} be an {}-martingale, such that EIZ n I < oo for all n
and for some p > 1. Prove that
\I m n
[Hint: Note that {IZ n j} is a submartingale, and see Exercise 13.10 of Chapter I.]
EXERCISES 603
(ii) Assume the hypothesis of (i) for some p > 1. Prove that
AP(M" A) _ j IZnI 1M , dP ,
Ja
by (13.55) of Chapter I. Therefore,
J('
o
AP - 'P(M" >- A) d i (by Fubini)
p
J
(' m
o
A p-2 ^ IZnI1,.^,,, dPl dA = P IZ4I^
\ /
('
J \ o
M^
Al-2 d), dP
_ (PI (P 1 ))
IZnl M^ -' dP
Ja
10. (i) In addition to the hypothesis of Theorem 2.1 assume that EX'" < co, where
m is a positive integer. Prove that for 0 _< t a < 1,
/ / / (2m)1 u2m
IIX[ Xall2m -< c1( m, M) (t IX)Il^lXa)112m + (t a)1J211Q(Xa)112m\m
2m j J
= c l (m, M)tp(t 0),
say. Here II.112. is the Le m-norm for random variables, and c l (m, M) depends
only on m and M. [Hint: Write S;":= sup{E(XS"' s t}. Use
the triangle inequality for II 2m' and Exercise 8(i), to show that
t
(b) afn) < 22m-1M2m {(t
IX)2m
-1 + (2m 1)m (t a) m -1 } 6(gn-1j ds
l
a
for n>2.
Observe that
II Xt X12,,, ((S(n))1/2m
n=1 /
(ii) Deduce from (i) that sup,.,,, EX < oo for all t > a.
(iii) Extend (i), (ii) to multidimension.
11. (i) Use the result in 10(i) to prove, for every e > 0, that P(IX,, xl >, s) = o(t), as
t j 0, (a) uniformly for all x in a bounded interval, if p(.), a(.) are Lipschitzian,
and (b) uniformly for all x in 08' if p(. )' a() are bounded as well as Lipschitzian.
(ii) Extend (i) to multidimension.
(iii) Let t(.), a(.) be Lipschitzian. Prove that
uniformly for every bounded set of x's. Show that the convergence is uniform
for all x e IF k if t(. ), a() are bounded as well as Lipschitzian. [Hint:
by (3.21). [Hint: Suppose f(x) = 0 for Ixl > N. For each E > 0 and x E
where (h:) = sup{th(y) h(z)I: y, z E R", ly ... zl < a}. As r j0. O(Af :s) --p 0,
and for each e > 0,
uniformly for all x in Ne r_ {y: II _< N + F} (see Exercise I I (iii)). For x E NE,
where z':= inf{t >_ 0: IX, I = N}. But P(r" <, t) < sup{P(r'' <_ t): lyl = N + F},
by the strong Markov property applied to the stopping time t:= inf{t >_ 0:
IX, I = N + f}. Now
13. (Initial-Value Problem) Let t(). a() be Lipschitzian. Adopt the notation of
Exercise 12.
(i) If f E -9, then T, f E for all t _> 0, and T, f = T,f. [Hint:
h(Th(Tf) T1f)=
Ti(Tti h f ) ^T,Af
in "sup norm," since T,g 11911 for all bounded measurable g.]
(ii) (Existence of a Solution) Let f satisfy the hypothesis of Exercise 12(iv). Show
that the function u(t, x):= T, f (x) satisfies Kolmogorov's backward equation
au(t, x) = Au(t, x
where f is continuous and polynomially bounded. If Av(t, x) and 1grad v(t, x)j =
^(a/axW . .. , a/ax"k")v(t, x)I are polynomially bounded uniformly for every
compact set of time points in (0, eo), then v(t, x) = u(t, x):= Ef (X^ ). [Hint: For
each t > 0, use It's Lemma to the function w(s, x) = v(t s, x) to get
r
Ev(e, X^_ E ) v(t, x) = E {v o (r, X) + Av(r, X r )} dr = 0,
E
v(x)=f(x) (x e bG),
where r'= inf{t >, 0: X; e aG}. [Hint: v can be taken to be twice continuously
differentiable on H k with compact support. Apply Itd's Lemma to {v(X,)}, and then
use the Optional Stopping Theorem (Proposition 13.9, Chapter I). Also see Exercise
3.]
au(t, x)
= Au(t, x) + V(x)u(t, x) (t > 0, x E H"), u(0, x) = f(x),
at
bounded in x uniformly for every compact set oft in (0, oo), then
u(t,x)= E(f(X,)exp {
f .'
V(XS)ds})
J
(t>,0,xcR'`).
where y = ( y 1 , y 2 ).]
THEORETICAL COMPLEMENTS
>_
subsigmafield of .. The P-completion ' of
_>
1. (P- Completion of Sigmafields) Let (0, .F , P) be a probability space,
is the sigmafield {G u N: G e
N c M e such that P(M) = 0}. The proof of Lemma I of theoretical complement
V.1.1 shows that if .3, is P-complete for all t 0 then the stochastic process
{f(t): t 0} is progressively measurable provided (1) f(t) is ,y,-measurable for all t,
and (2) t + f(t) is almost surely right-continuous. Several times in Section 1 we have
constructed processes { f(t)} that have continuous sample paths almost surely, and
a
which have the property that f(t) is .F- measurable. These are nonanticipative in view
of the fact that, is P-complete.
_< _<
2. (Extension of the Ito Integral to the Class .[a, /f]) Notice that the It integral (1.6)
is well defined for all nonanticipative step functionals, and not merely those that are
in #[a, ]. It turns out that the stochastic integral of a right-continuous
nonanticipative functional { f(t): a t fl} exists as an almost surely continuous
P
f 2 (s) ds < oo a.s. (T.1.1)
Ja
If m > n then on A. we have fm (t, co) = f"(t, w) for a <, t _< . It follows that
J a
" fm (s) dB, =
a
J f (s) dB, n for a _< t _< , for almost all w E A".
To see this, consider g, h E H[a, ] such that w) = h(t, w) for a _< t _< , g(t,
we A e F. If g, h are step functionals, then it follows from the definition (1.6) of the
stochastic integral that
In the general case, let g ", h" be nonanticipative step functionals such that
g"(s) dB, -- j g(s) dB,, j h"(s) dB, > h(s) dB, for a _< t _< , a.s.
j a ,l a ,J a a
Without loss of generality one may take, for each n, the same partition
"}
{t ; (n): 0 < i < N of [a, ] for both g h Modify h if necessary, so that ", ". ",
h"(tf", co) = g"(t; ", w) for we {h(s) = g(s) for a _< s _< t," } (0 _< i <_ N"). Then )
c
for a _< t _< , if w E A,
h"(s)
J dB, = a
g"(s) dB,
so that, in the limit,
Thus, we get
J r fm(s)
a
dB. = Jf a
t "(s) dB, a _< t < , almost everywhere on A".
THEORETICAL COMPLEMENTS 609
Therefore,
From this one may show that f Q h a (s) dB, converges in probability to S f(s) dB,. For
this we first derive the inequality
P^ J
('
f (s) dB, > e^ P(
a
j a
f2(5)dS -> c +
c.
z (T.1.3)
r,
Since g e . #{, ], and f(t) = g(t) for -< t /3 on the set { Ja f 2 (s) ds <c}, it follows
that
P
(I Ja.i(s) dB E) P( J f 2 (s) ds > c) +
P(I JQ
g(s) dB, > r
2
2(s) ds c^ + E\Ja g(s) dB )/ F :
P \J f
J
PI f 2 (s) ds c) + (E
c
E g2(s)
ds) J /2
lim
(J :h
a (S) dB, J f(s) dB, >
which is the desired result. Properties (a), (b) of Proposition 1.1 hold for f E ^P[a, ].
610 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
(R' x [s, t] x f2,.(l 1 ) Q.1([s, t]) Qx 9,,,) for every t > s. Also, if (z, u, u0) --* X (w)
is measurable with respect to this product sigmafield, then so is the right side of (2.4).
This is easily checked by expressing the time integral as a limit of Riemann sums,
and the stochastic integral as a limit of stochastic integrals of nonanticipative (with
respect to { 9 : u -> s }) step functionals. It then follows that the almost sure limit X,
of X; " (as n - oo) is measurable as a function of (z, w) (with respect to the sigmafield
)
f(H 1 ) px (g R' x i)). Write the solution of (2.25) as cp(s, t; z). Then
X; = (p(s, t; XS ), by (2.24) and the uniqueness of the solution of (2.25) with z = X.
Since X. is S measurable, and is independent of 15 ,, it follows that for every bounded
Borel measurable function g on R', E(g(X;) I ) _ [Ey(rp(s, t; z))] z=Xt (see Section
4 of Chapter 0, Theorem on Independence and Conditional Expectation). This proves
the Markov property of {X, }. The multidimensional case is entirely analogous.
2. (Nonhomogeneous Diffusions) Suppose that we are given functions p '(t, x), 6(t, x)(
on [0, oo) x H k into H'. Write (t, x) for the vector whose ith component is
^' (t, x), a(t, x) for the k x k matrix whose (i, j) element is Q(t, x). Given s > 0, we
)
where Z is `ks measurable, EIZI 2 < oc. It is not difficult to extend the proofs of
Theorems 2,1, 2.2 (also see Theorems 2.4, 2.5) to prove the following result.
X,=Z+ J S
u)du+
5
J f a(X,u)dB" (t s). (T.2.3)
(b) Let {X;": t ? s} denote the solution of (T.2.3) with Z = x E H". Then it is a
(possibly nonhomogeneous) Markov process with (initial) state x at time s and
transition probability
THEORETICAL COMPLEMENTS 611
S":=max{t+1t;"):=0-<i<-N"-1}--^0 as n oc.
Write
Nn - 1
(t, Y(t)) P(s, Y(s)) =[(t1, Y(t )) (P(ti"', Y(tt)fl
i =o
Nn - 1
Nn - i
"1
(tl t l ){oq(t , (ti+l))+R';} (T.3.2)
i
where
By a Taylor expansion, the second sum on the right in (T.3.1) may be expressed as
Nn - 1 m
Y(tlnl))
(Ylrl(ti+l) Y (ry
(tlnl)) 0, q(tin)
i =0 r=1
I Nn -I
+ > ( ) Y l.l (tl" ) ))(Y`"'(t'+) y r.l (t^"'))
2 ! t^r,r'Sm i =o
X l a r ar' q ( t n) Y(t'n))) + Rrr).n.i}, (T.3.4)
where R, 0 uniformly in i (for each w), because (u, y) ---r ,. 3, cp(u, y) and
t -a Y(t) are continuous. Using (3.9) the first sum in (T.3.4) is expressed as
m Nn-1 J {'
nl
Y- Y- l( 1 i ") ti )f,(s) + g,(s)' (Bt( , Bilnl)}urq(tin) Y(t))
r=1 i =0 l
' Jt 1 `_ .=t ^J sl
f.(u)a.^P(u, J
1'(u)) du + 1 a,(P(u, Y(u))g.(u)dB"],
in probability, (T.3.5)
N-1
f 4"
(0 (p(t(
"1 Y(ti"^)) 3 p(u, Y(u)))g,(s)dB = J h"(u)dB,,,
(' I
1=0 " s
say, where $ Jh,.,(u)j du -+ 0 a.s. It remains to find the limiting value of the second
sum in (T.3.4) (excluding the remainder). For this, write
N-1
t l + l) Y '
+1) Y (. ) (tl n) ))a.a. w(ti"' y(t;n '))
.
(Y'( (t"))(Y'(t
1=o
N-1
i0
By (3.1), (3.15) and (3.16), (T.3.6) converges a.s. (for a suitable sequence of partitions)
to
N -1 k k
um 9(s)(B^f;, B.)}{ g (s)(B) B ) a.: ^P(t} 1
" , Y(ti''))
j =1i=1
Jim
= 9:1(s)9r1(s) (Bri'. B^i^ ) Z a.a; W (ti " 1 Y(ti" 1 ))
j=1 1=0
k N-1
= L^ 9r11(S)gr^)(s) llm Z nl
(ti +l ti )a,a,. (ti ") , Y(ti "1 ))
j=1 i=0
It may be noted that the proof goes through for f g, e [0, T], I -< r -< k, cp(t, y)
twice continuously differentiable in y and once in t.
2. (Explosion) Assume that ( ), 6() are locally Lipschitzian, that is, (2.42) holds for
^xl < n, jyj < n, with M = Mn , where Mn may go to infinity as n co. One may still
construct a diffusion on R k satisfying (2.43) up to an explosion time C. For this, let
{X': t >- 0} be the solution of (2.43) with globally Lipschitzian coefficients i,,(), c()
satisfying
We may, for example, let "(y) = p(ny/Iyl) and v"(y) = a(ny/Iyj) for lyl > n, so that
(2.42) holds for all x, y with M = M. Let us show that, if (xl -< n,
X' m (w) = X;. ,,(aw) for 0 -< t < C n (co) (m >- n) (T.3.9)
THEORETICAL COMPLEMENTS 613
To prove (T.3.9), first note that if m >- n then (y) = (y) and a,,,(y) = a(y) for
II -< n, so that
m (X,,,,) = (X,.,,) and (X;,,,) = a(Xf ) for 0 -< t < S(w). (T.3.1 I
It follows from (T.3.1 1) (see the argument in theoretical complement I.2) that
f p.(X,.,,) ds +
Jo Jo
a,X) dB =
Jo
(X) ds +
JO
(X;. ,,) dB for 0 -< t < S
(T.3.12)
almost surely. Therefore, a.s.,
This implies
exists a.s., and has a.s. continuous sample paths on [0, S(cw)). If P(C < co) > 0, then
we say that explosion occurs. In this case one may continue {X'} fort -> S(w) by setting
where "co" is a new state, usually taken to be the point at infinity in the one-point
compactification of 11" (see H. L. Royden (1963), Real Analysis, 2nd ed., Macmillan,
New York, p. 168). Using the Markov property of {X} for each n, it is not difficult
to show that {X': t >- 0} defined by (T.13.15) and (T.13.16) is a Markov process on
08'` u {"cc"}, called the minimal dillusion generated by
0 2 r; u^
A = 1
2 d ,( )
x 3x^^i fix' + (X) (izri'
614 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Theorem T.3.1. (Feller's Test for Explosion). Let k = 1 and I(z) := f o (2(z')/a 2 (z')) dz'.
Then
Proof. (a) Assuming that the integral on the right in (a) diverges we will construct
a nonnegative, increasing and twice continuously differentiable function (p on [0, cc)
such that (i) Agq(y) < p(y) for all y and (ii) 4p(y) -+ oo as y -^ oo. Granting the
existence of such a function cp for the moment (and extending it to a twice continuously
differentiable function on (-0o,0]), apply It's Lemma to the function
cp(t, y) := exp{ t }(p(y) to get, for all x > 0,
n TO A I
Eq (t A T o n t, X '^ Tp ,,,) q (x) = E (exp{ s })[cp(Xs) AQp(Xs )] ds
0
0.
Thus, P(T + < t, T o = oo) = 0, for all t >- 0, so that P(t + ^ < oo, -r o = oo) = 0. But on
the set {tt o < co}, r + > i o by definition of r +m . Therefore, P(r +W t, r o < oo) = 0.
This implies P(r + . < oo) = 0. It remains to construct cp. Let p 0 (z) = I (z ? 0) and
define, recursively,
cp(z)
( [ex
2 p{-!(Y)}
f ex6{((zl
z
)}
To prove (T.3.20), note that it holds for n = 0 and that (z) -< cp_,(z)^p,(z) (n -> 1).
Now use induction on n. It follows from (T.3.20) that the series q (z):= = o y^(z)
converges uniformly on compacts. Also, it is simple to check that A(P(z) = cP_,(z)
for all n >_ 1, so that A may be applied term by term to yield AQQ(z) = (P(z) (z -> 0).
By assumption, p 1 (z) --' -) as z -+ rc,. Therefore, cp(z) -* x as z - o. Next assume
that the series on the right in (a) converges. Again, applying It's Lemma to
rp(t, y):=exp{-t}cp(y), we get
X
q(x) = EcP(r A T o A t , TnA ToA
Letting t -* oc we get
9(x) -< p(n){P(r < To, T o < cc) + P(i < r x , T o = cc)} < w(n)P(i < cc),
so that
REMARK. It should be noted that P(T + , < oo) > 0 means +co is accessible. For
diffusions on S = (a, b) with a and/or b finite, the criterion of accessibility mentioned
in theoretical complement V.2.3 may be derived in exactly the same manner.
In multidimension (k > 1) the following criteria may be derived. We use the
notation (3.23) for the following statement.
f exp{/(u)}
(a) P(S < oo) = 0 if J x e i^^( du) dr = cc,
` J x(u)
a(u)
du dr < co. [I]
Proof. (a) Assume that the right side of (a) diverges. Fix r o e [0, xl). Define the
following functions on (r o , cc),
Then the radial functions q(y)on (r0 _< IyI < oo) satisfy: A(y) _< ^p_ I(IYI) Now,
as in the proof of Theorem T.3.1, apply It's Lemma to the function
or
w(IXI)
e-'w(R)P(tR ^ t ro A t) (IX), P(ZR t, a A t) e'
cp(R)
Letting R - co, we get P(^ _< t, o A t) = 0 for all t _> 0. It follows that P(S < co) = 0.
(b) This follows by replacing upper bars by lower bars in the definition (T.3.22)
and noting that for the resulting functions, Acp(IYI) % (V - i(IYI) The rest of the proof
follows the proof of the second half of part (a) of Theorem T.3. 1. n
Define
3 exp{ J
`
6-
1 `
'(X II)R(XS )dB 3 . - 2 la -t (Xs:)P(X -, )I Z ds'
J5
(0`s<t<oo). (T.3.24)
By It's Lemma,
or,
One may show, using Exercises 3.8, 3.10, that the integrand in (T.3.25) is in 1I[0, ox).
Hence {Z'} is a nonnegative {3}-martingale and EZ' = 1. Define the set function
Q on U,,o ^F, by
Note that if A e ,y, then A e .,, for all t' > t, and
o.x
E(9(X')Z+,1 gis) = Z '(t, X., ), (T.3.27)
where
o .r).
i(t, Y):= E(g(XY )Z (T.3.28)
.x
EQ(hg(Xs+^)) = E(hZ i(t, XS)) = EQ(h+i(t, Xs)), (T.3.29)
which proves the desired Markov property. Finally, for all twice continuously
differentiable g with compart support, It6's Lemma yields, with f = a`tt,
o
d(g(X, )Zo,.) = (Aog)(X' )Z dt + Z.' (grad 9)(X: ). (X; ) dBt
+ g(X,)Z,o '"f(X, )dB, + Z"(grad g)(X(X, ) dt
= (Ag)(X, )Z" dt + Z"(grad g)(X,) + g(X, )f (X, )) dB (T.3.30)
1 82g( Y) ^;^ ag
Aog(Y) = 2 1] oy^^i ayu^' Ag(Y)'= Aog(Y) + (Y) _ .) . (T.3.31)
An immediate corollary is the following result. Let .4, denote the sigmafield on
S2':= C([0, oo): P) generated by the coordinate projections co' - a, 0 <_ s _< t. Let
9 = a(U,, o ,4,). Denote by P" the distribution (on (0', ga')) of a diffusion with drift
)
;(.) and diffusion matrix ((d; j ( ))) = a()a( )', starting at x (i = 1, 2).
Corollary T.3.4. Suppose ;() (i = 1, 2), a(. ), a ' () are Lipschitzian. Then, for
every t > 0, P;;2 ^ is absolutely continuous with respect to P on (Q', V,).
(i = 1, 2) denote the distribution (on (C([0, oo): R'), .4)) of a diffusion with drift
;() and diffusion matrix a( . )a'( ). Then P ; is absolutely continuous with
respect to Po'; on .4.
assumption that these may be locally Lipschitzian and that the diffusions, with
and without the drift p, be nonexplosive. This may be established by
approximating , a by , a as in (T.3.8), and then letting n -. oo.
R. H. Cameron and W. T. Martin (1944), "Transformation of Wiener Integrals
Under Translations, Ann. of Math., 45, pp. 386-396, proved the above results
for the case a() = I. These were generalized by I. V. Girsanov (1960), "On
Transforming a Class of Stochastic Processes by Absolutely Continuous
Substitution of Measures," Theor. Probability Appl., 5, pp. 314-330.
THEORETICAL COMPLEMENTS 619
4. (The Support Theorem and the Maximum Principle) We want to establish the
following result. The notation is the same as in the statement of Corollary T.3.4.
Proof. It is enough to show that all sufficiently smooth functions in C. are in the
support of P. Let t * u(t) = (u,(t), ... , u k (t)) be a twice continuously differentiable
function in C. Let > 0, T> 0 be given. Define a continuously differentiable (in t)
function c(t) _ (c 1 (t), ... , c k (t)) that vanishes for t > 2T and such that c.(t) = (d/dt)u ; (t),
1 _< i <_ k, t T. Consider the diffusion {X'} that is the solution of
Xx = x + J
0
c(s) ds + J a(X
0
') dB s (t -> 0). (T.3.36)
In turn, (T.3.37) follows if we can find a Lipschitzian vector field b() such that the
solution {Y,} of
r r
Y, = x + c(s) ds +
f.'
b(Y,) ds +
0
6(Y s ) dB, (t -> 0) (T.3.38)
satisfies
But (T.3.39) will follow if one may find b() such that
Lemma. Let a() be nonsingular and Lipschitzian. Then, for every e > 0,
where the supremum is over the class of all Lipschitzian vector fields b(.).
define
Za)ZW
d(t, z):= d ;i (z + c(s) ds) Z
IZI . f ,
B(t, z) _ Y d ii (z +
t. t
('
J c(s) ds)
o
C(t, z) = 2 + c(s) ds
o
B(t, z) + C(t, z)
3=' sup 1, a(r):= sup d(t, z).
f30.Izl =r d(t, z) t >-o.Izl =r
Similarly (r), a(r) are defined by replacing "sup" by "inf" in the last two expressions.
Finally, let
I(r):= f r(u) du !(r):_ I r (u) du
o U J a U
Define
F(r) =
f, f, 8( "a 1v )
(
exp{
J u(,
v
dll } dv) du.
))) v
(T.3.42)
Using (T.3.41) it is simple to check that F(r) is finite for 0 < r s. Also, for r > 0,
J (U)
dv' } dv < 0,
Er 1 exp^
r
F'(r) =
a(v) ^ v' J)
(1.3.43 )
F"(r) _ -r) Y
r)
Therefore, by Itb's Lemma applied to cp(Z,) (and using the fact that cp = 0 on B(O:e)),
0 4?(0) 3 ZEiaBto:E),
so that
ETaB(o:,) -> 2q(0) = 2F(0), 0 -< IzI -< a. (T.3.44)
Choose b(z) _ Mz (M > 0). Then it is simple to check that as M r oo, F(0) oo.
n
THEORETICAL COMPLEMENTS 621
Corollary T.3.7. (Maximum Principle for Elliptic Equations). Suppose It(.) and a( )
are locally Lipschitzian and r(.) nonsingular. Let A be as in (T.3.31). Let G be a
bounded, connected, open set, and u() a continuous function on G, twice continuously
differentiable in G, such that Au(x) = 0 for all x e G. Then u(.) cannot attain its
maximum or minimum in G, unless u is constant on G. 0
Proof. If possible, let x o e G be such that u(x) _< u(x o ) for all x e G. Let 6 > 0 be
such that B(x 0 :2 5) c G. Let cp be a bounded, twice continuously differentiable function
on H" with bounded first and second derivatives, satisfying q0(x) = u(x) on B(x o :b).
Let {X,} be a diffusion with coefficients ( ), a( ), starting at x 0 . Apply It6's Lemma
to get
Eg7(X8,0 ai) =
u(y)rt(dy)
f =
eB(x o :j)
u(x o ). (T.3.45)
But u(y) _< u(x o ) for all ye aB(x 0 :6). If u(y o ) < u(x o ) for some y o E 3B(x 0 :S) then, by
the continuity of u, there is a neighborhood of y o on 3B(x 0 :S) in which u(y) < u(x o ).
But Theorem T.3.6 implies that iv assigns positive probability to this neighborhood,
which would imply that the left side of (T.3.45) is smaller than its right side. Thus
u(y) = u(x o ) on 3B(x 0 :8). By letting 6 vary, it follows that u is constant on every
open ball B contained in G centered at x o . Let D = {y e G: u(y) = u(x o )}. For each
y e D there exists, by the same argument as above, an open ball centered at y on
which u equals u(x o ). Thus D is open. But D is also closed (in G), since u is continuous.
By connectedness of G, D = G. n
The random variables q ; are a.s. finite. By the strong Markov property {Xn z ,: i > 1}
is a Markov chain on the state space 0B(O:r 1 ) = {)yl = r l }, having a (one-step)
transition probability
where .yn , is the pre-q sigmafield, and H (y, dz) is the distribution of X (0 ., t the
;
random point where {X;} hits OB(0:r; ) at the time of its first passage to aB(O:r; ).
Now for all z,, z 2 e OB(O:r, ), H2 (z,, dy), H2 (z 2 ,dy) are absolutely continuous with
respect to each other and there exists a positive constant c (independent of z,, z 2 )
such that
dn1(Y, dz)
(Y, Yi e B(O:r l )). (T.3.49)
dn1(Y1, dz) >c
This implies (see theoretical complements II.6) that there exists a unique invariant
probability l for the Markov chain {X11 2 , : i > l} and the n-step transition
probability n(y, B) converges exponentially fast to 1 (B) as n -^ oo, uniformly for
all y e 3B(O: r,) and all BE .(0B(0: r, )).
Now let {X 1 } be the above diffusion with initial distribution ,, so that
{X n2 : i > 1} is a stationary process, as is the process
nz
Z1 (f)= fJ(XS)ds (i 1),
n2i- 1
Proof. Let A e := n,,,, o {Z; (f ): i >, n} the tail sigmafield, and Bev{Z i (f ):
I _< i _< n} c ,Fn2 , (the pre-n Z + , sigmafield of {X, }). For every positive integer m,
1
A belongs to the after- ,j 2( ,,, )+i process X,1z and, therefore, may be expressed
m
as {Xnzi m e A n +m a.s., for some Borel set A +m of C([0, oo): 11"). By the strong
}
Markov property,
+
P(A j na^.m^. ) = E(In,m(X .I) I ^ ?2I.'+m,.,)
_ (Px (A n+m )) x = xa21,,.^1.I ^n+m( X nzi^)s
say. Hence,
But
But the left side of (T.3.51) does not depend on m. Therefore, P(B n A) = P(B)P(A).
I-fence a{Z (f ): I
;< i _< n} and the tail sigmafield % are independent for every n _> 1.
This implies that .i is independent of a{Z (f ): _>
i ;1 }. In particular, is independent
of itself, so that P(A) = P(A n A) = P(A)P(A). Thus, P(A) = 0 or 1. n
Using the above lemma and the Ergodic Theorem (theoretical complements II.9),
instead of the classical strong law of large numbers, we may show, as in Section 9
of Chapter 1I, or Section 12 of Chapter V, that
1(
J as.
. 1
lim -- .i(X ) ds --' --- ---- E f(X s ) ds = .i(X)m(dX) > (T.3.52)
r -. >
S
say, where in is the probability measure on (H', k ) with m(B) equal to the average
amount of time the process {X } spends in the set B during a cycle [72+ 92n+3].
S
It follows from (T.3.52) that in is the unique invariant probability for the diffusion.
The criterion (3.25) (Proposition 3.3) for positive recurrence is due to R. Z.
Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and
Stabilization of the Solution to the Cauchy Problem for Parabolic Equations," Theor.
Probability App!., 5, pp. 196-214. A proof and some extensions may be found in R.
N. Bhattacharya (1978), "Criteria for Recurrence and Existence of Invariant Measures
for Multidimensional Diffusions," Ann. Probab., 6, pp. 541-553; Correction Note
(1980), Ann. Probab., 8. The proof in the last article does not require Harnack's
inequality.
1 PROBABILITY SPACES
Underlying the mathematical description of random variables and events is the notion
of a probability space ((2, F, P). The sample space 1 is a nonempty set that represents
the collection of all possible outcomes of an experiment. The elements of 12 are called
sample points. The sigmafield F is a collection of subsets of Q that includes the empty
set 0 (the "impossible event') as well as the set n (the "sure event") and is closed under
the set operations of complements and finite or denumerable unions and intersections.
The elements of F are called measurable events, or simply events. The probability measure
P is an assignment of probabilities to events (sets) in F that is subject to the
conditions that (i) 0 < P(F) _< 1, for each Fe F, (ii) P(0) = 0, P(12) = 1, and
(iii) "(Ui F; ) = Z ; P(F; ) for any finite or denumerable sequence of mutually exclusive
(pairwise disjoint) events F; , i = 1, 2, ... , belonging to F. The closure properties of F
ensure that the usual applications of set operations in representing events do not lead
to nonmeasurable events for which no (consistent) assignment of probability is possible.
The required countable additivity property (iii) gives probabilities a sufficiently rich
structure for doing calculations and approximations involving limits. Two immediate
consequences of (iii) are the following so-called continuity properties: if A, c A 2 c
is a nondecreasing sequence of events in .y then, thinking of Ute , A n as the "limiting
event" for such sequences,
P^ U lim P(An).
An)=
n-I ^
625
626 A PROBABILITY AND MEASURE THEORY OVERVIEW
events A, A 2 that
While (1.1) holds for all countably additive set functions u (in place of P) on
F, finite or not, (1.2) does not in general hold if p(A n is not finite for at least some n
>_
)
(onwards).
If S2 is a finite or denumerable set, then probabilities are defined for all subsets F of
(I once they are specified for singletons, so F is the collection of all subsets of S2. Thus,
if f is a probability mass function (p.m.f.) for singletons, i.e., f (w) 0 for all con S2 and
y. f(co) = 1, then one may define P(F) = y.EF f(cw). The function P so defined on the
class of all subsets of Q is countably additive, i.e., P satisfies (iii). So (Q, F, P) is easily
seen to be a probability space. In this case the probability measure P is determined by
the probabilities of singletons {co}.
In the case f) is not finite or denumerable, e.g., when Q is the real line or the space
the probability P(I) = ff(x) dx, by a Riemann integral. This set function P may be
_<
of all infinite sequences of 0's and l's, then the above formulation is no longer possible
in general. Instead, for example in the case S2 = I8', one is often given a piecewise
continuous probability density function (p.d.f.) f, i.e., f is nonnegative, integrable, and
f f (x) dx = 1. For an interval I = (a, b) or (b, cc), cc < a <b oo, one then assigns
extended to the class' comprising all finite unions F = U I of pairwise disjoint intervals
; ;
I by setting P(F) = Z t P(I). The class' is afield, i.e., 0 and S2 belong to' and it is
;
closed under complements and finite intersection (and therefore finite unions). But, since
W is not a sigmafield, usual sequentially applied operations on events may lead to events
outside of for which probabilities have not been defined. But a theorem from measure
theory, the Caratheodory Extension Theorem, asserts that there is a unique countably
additive extension of P from a field to the smallest sigmafield that contains W. In the
case of f above, this sigmafield is called the Borel sigmafield .fi t on 11' and its sets are
called Borel sets of R'. In general, such an extension of P to the power set sigmafield,
that is the collection of all subsets of II', is not possible. The same considerations apply
to all measures (i.e., countably additive nonnegative set functions p defined on a sigmafield
>_
with p(Q) = 0), whether the measure of 0 is 1 or not. The measure p = m, which is
defined first for each interval I as the length of the interval, and then extended uniquely
to ,4', is called the Lebesgue measure on R. Similarly, one defines the Lebesgue measure
on R'k (k 2) whose Borel sigmafield . is the smallest sigmafield that contains all
k -dimensional rectangles I = I, x I Z x x 1,, with I a one-dimensional rectangle
;
(interval) of the previous type. The Lebesgue measure of a rectangle is the product of
the lengths of its sides, i.e., its volume. Lebesgue measure on 11" has the property that
the space can be decomposed into a countable union of measurable sets of finite Lebesgue
measure; such measures are said to be sigma finite. All measures referred to in this book
are sigma-finite.
The extension of a measure p from a field W, as provided by the Caratheodory
Extension Theorem stated above, is unique and may be expressed by the formula
where the summation is over a finite collection C,, C 2 , ... of sets in' whose union
contains F and the infimum is over all such collections.
RANDOM VARIABLES AND INTEGRATION 627
Often one writes Px for Q. We sometimes write Q(dx) = f(x) dx to signify that the
distribution of X has p.d.f. f with respect to Lebesgue measure.
Suppose that (S2, ,y , p) is an arbitrary measure space and g is a real-valued function
on S2 that is measurable with respect to .^. For the simplest case, suppose that g takes
on only finitely many distinct nonzero values y t , Y 2 ..... y k on the respective sets
A,, A 21 ... , A k ; such a function g is called a simple function. In the case is a probability
measure, simple functions are discrete random variables. If (A ; ) is finite for each i _> 1
then the Lebesgue integral of g is defined as the sum of the values of g weighted by the
p-measures of the sets on which it takes its values. That is,
EX = J xQ(dx).
u^
More generally, if^p is a measurable function on R' and q (X) integrable, then
To verify the change-of-variable formula (2.3) one begins with simple functions (p and
proceeds by limits as described after the definition (2.2).
Estimates and bounds on expected values and other integrals are often obtained by
applications of the basic inequalities for integrals summarized below. Let (S2, ^ , p) be
an arbitrary measure space and let f and g be integrable functions defined on S2. A
property is said to hold almost everywhere (a.e.) or almost surely (a.s.) if the set of
points where it fails has p-measure zero. Then, it is simple to check from the definition
of the integral that we have the following.
Jn
f d
< in g dl^ (2.4)
where m(x) = 4,'(x). The function cy is strictly convex if the inequality in (2.5) is strict
except when x = y. For twice continuously differentiable functions this means a positive
second derivative. Observe that if rp is convex on I, and X and 4(X) have finite expec-
RANDOM VARIABLES AND INTEGRATION 629
This inequality is strict if (p is strictly convex and X is not degenerate. More generally,
convexity means that for any x 0 ye I, (p(tx + (1 t)y) _< up(x) + (1 t)q (y),
0 < t < 1; likewise strict convexity means strict inequality here. One may show that (2.5)
still holds if m(x) is taken to be the right-hand derivative of tp at x. The general inequality
(2.6) may then be stated as follows.
provided that the indicated expected values exist. Moreover, if q is strictly convex, then
equality holds if and only if the distribution of X is concentrated at a point (degenerate).
From Jensen's inequality we see that if X is a random variable with finite pth absolute
moment then for 0 < r < p, writing IXI = (IXI')'/", we get EIXI >_ (EIXI')l'. That is,
taking the pth root,
EIXI")' ' (0 < r < p). (2.8)
(EIXI')' 1 -<' (
In particular, all moments of lower order than p also exist. The inequality (2.8) is known
as Liapounov's Inequality.
The next inequality is the Hlder Inequality, which can also be viewed as a convexity
type inequality as follows. Let p > 1, q > I and let f and g be functions such that If I
and Igl are both p-integrable. If p and q are conjugate in the sense I/p + 1/q = 1, then
using convexity of the function ex we obtain
Replacing f and g by f/IlfII,, and g /Ilgll q , respectively, we have upon integration that
f Ifgj dy
5
1
J Ifl d +--_+=
IJ1919du t 1
1. (2.11)
II.fIIpII9II q P (IifII P )' q (11911 9 )4 P q
Proposition 2.3. (Hlder Inequality). If I f" and Ig1 4 are integrable, where 1 <p < oo,
1/p + 1/q = 1, then fg is integrable and
if If+gl p d}
l/p
<If IIPdul
.f
'ip
+{ IgI "d }
J
'ip
. ( 2.13)
_<
To prove (2.13) first notice that since,
f If+glpdu f
_< (IfI +IgI)pdp. (2.15)
J (I.f I + IgI) p d =f J
I.f I(If I + Igl) p- ' du + IgI(If I + II) p- ' du
/q
Elle
(IfI + IgI) p d < IIflip [(IfI + II) p-I]9d)1 + II8II [(Ifl + II)p- I]4dp
J \J (J
_
_ (Ilf Ilp + II IIp) (III + IgI) p
n
Dividing by (f (If I + Igl)p )' Iq and again using conjugacy 1 1/q = 1/p gives the desired
inequality by taking (1 /p)th roots in (2.15), i.e., Ilf + gllp < IlfII + IlgII .
Finally, if X is a random variable on a probability space (f2, .., P) then one has
Chebyshev's Inequality
>, >_
E
converge in measure (or in probability in case p is a probability) to f if, for every e > 0,
If g is a probability measure, and (3.2) holds, one says that (f} converges almost sure/i'
(a.s.) to f.
A sequence { f,} is said to be Cauchy in measure if, for every E > 0,
Given such a sequence, one may find an increasing sequence of positive integers
{n k : k >- 1} such that
The sets B,:= U= {If fnk , l > 2 -k } form a decreasing sequence converging to
.
B:= {If fnk I > 2 for infinitely many integers k}. Now p(B,) -< 2"+'. Therefore,
-
u(B) = 0. Since B = {1 fn , f,,, s 2 for all sufficiently large k}, on B` the sequence
{ f,,,: k >- l} converges to a function f. Therefore, { fnk } converges to f a.e. Further, for
any r>0 and all m,
u({IfnfI>E})Vu({Ifnfnml>2})+11({Ifn .fl>2}). m
(3.5)
The first term on the right goes to zero as n and n m go to infinity. Also,
Bm if
1 1 Jn.^ J I > 21 C
2m+1 < 2
As p(B m ) -+ 0 as m -+ oc, so does the second term on the right in (3.5). Therefore,
{ fn : n -> l} converges to fin measure. Conversely, if {f,,} converges to f in measure then,
for every E > 0,
f > E}) < ({l f .fl > 2}) + p({IJJ f > 2}) --> 0
as n, m -- x. That is, {f,,} is Cauchy in measure. We have proved parts (a), (b) of the
following result.
(a) If { f} converges in measure to f then If.) is Cauchy in measure and there exists
a subsequence { f,k : k >- l} which converges a.e. to f
(b) If {f} is Cauchy in measure then there exists f such that {f} converges to f in
measure.
(c) If is finite and { f,} converges to f a.e., then {f} converges to f in measure.
Note that the sets A, say, within parentheses on the right side are decreasing to
A := {I f, f I > s for infinitely many m }. As remarked in Section 1, following (1.2), the
convergence of (A m ) to u(A) holds because (A m ) is finite for all m.
The first important result on interchanging the order of limit and integration is the
following.
Proof. If {f) is a sequence of simple functions then (3.7) is simply the definition of
$ f dp. In the general case, for each n let { f k : k >- l} be an increasing sequence of non-
negative simple functions converging a.e. to f, (ask --* x). Then g k := max{ fn k : 1 -< n -< k}
(k >, 1) is an increasing sequence of simple functions and, for k >- n,
Jk (3.8)
(3.10)
where g = lim k g k By taking limits in (3.10) as n - oo, one gets f < g , f, that is, g = f.
Now, by the definition of the integral, on taking limits in (3.9) as k -^ oo one gets
Proof. Write g:= inf{ fk : k -> n}, g:= liminf f,. Then 0 g 1g. Therefore, by the
Monotone Convergence Theorem, f g dp --+ ( g d. But g f so that $ g dp $ ff dp
for all n. Therefore,
In particular, $ f d - $ f dp.
We now turn to the L"-spaces. Consider the set of all measurable functions f on a
measure space (S2, .F, u) such that I I fl dp < co, where p is a given number in [1, oo).
For each such f, consider the class of all g such that f = g a.e. Since the relation "f = g
a.e." is an equivalence relation (i.e., it is reflexive, symmetric, and transitive), the set of
all such f splits into disjoint equivalence classes. The set of all these equivalence classes
is denoted L"(S), F, p) or L(S2, p) or simply L if the underlying measure space is obvious.
It is customary and less cumbersome to write f e L, rather than {equivalence class of
f} e U. For Je L, the L-norm II f lI is defined by (2.10). By Minkowskii's inequality
(2.13), If is a linear space. That is, f, g e L implies cf + dg e If for all reals c, d. Also,
under the L'-norm, If is a normed linear space. This means (i) III f ii,, 0 for all f e L ,
with equality if and only if f = 0 (a.e.); (ii) Ilcf II D = I cl II f II P for every real c, every f e L;
634 A PROBABILITY AND MEASURE THEORY OVERVIEW
and (iii) the triangle inequality II f + gll, < II f II P + g11, holds for all f, g e L (by (2.13)).
The space L" is also complete, i.e., if fn e L' (n > 1) is a Cauchy sequence (i.e.,
II.% .fmll P -* 0 as n, m -. oo), then there exists f in L such that II fn f II P - 0. For
this last important fact, note first that Ilfn fmllP - 0 easily implies that { fn } is Cauchy
in measure and therefore, by Proposition 3.1, converges to some f in measure. As a
consequence, {h f"} converges to I f I in measure. It then follows by Fatou's Lemma
applied to {I ff} that f I f I dp < oo, i.e., f e U. Applying Fatou's Lemma to the sequence
{If fml in >_
p: n} one gets
But S Ifn fmi' d = (11L fmllp)". Therefore, the right side goes to zero as n -^ oo,
proving that II f f II,, - 0. A complete normed linear space is called a Banach space.
Thus, the LP-spaces are Banach spaces (1 < p < oo). If is a probability measure (more
generally, a finite measure), then p < p' implies L" c La', in view of Liapounov's inequality
(2.8). This is not true if (Sl) = cc.
When does a sequence {f} in L converge to some Je LP in L" -norm? Clearly, {f,,}
must converge to f in measure. A useful sufficient condition is provided by Lebesgue's
Dominated Convergence Theorem if one assumes that the dominating function g is in
R. For then IL f I -* 0 in measure, and IL f IP <, 2"(/c,"+ I f(" ) (see Eq. 2.14)
2 + 'Igl" = h, say, so that the conditions of the theorem apply to {If f I}.
If p is a probability measure, then one may obtain a necessary and sufficient condition
for L-convergence. For this purpose, define a sequence of random variables {X n } on a
probability space (S2, F, P) to be uniformly integrable if
n
J
sup IXnI dP -. 0 as A - co. (3.15)
f11X.-Xj>_.Z)
IXnXIdP<
f
flX.j_>Al2)
IXXXIdP+
fflX,j<Al2jX,-Xj>_A)
IXn XIdP
^ f I XnI dP +
jX^J>_Al21 f I
jX.J^_.1/2)
X I dP
fI
+ X,XIdP.
jjX.J ^Aj2.jX.-Xj1_A)
(3.17)
Now, given e > 0, choose 2(E) > 0 such that the first and second terms of the last sum
are less than e for A = 2(e) (use hypothesis (ii)). With this value of A, the third term goes
to zero, as n -. eo, by Lebesgue's Dominated Convergence Theorem. Hence
limsup
- 'm f IX.-XI<,( t )^
IX XI dP = 0.
J {Ix^l%zt
J ilx.,l x)
IXI dP - ix X^ dP + JXj dP
f IXX X^dP +
f fjXj>-k2)
JXidP + f
jjXj<ZJZ.IX,,-XI zz,
JX^dP.
(3.18)
The first term of the last sum goes to zero, as n - co, by hypothesis. For each A > 0
the third term goes to zero by the Dominated Convergence Theorem, as n + oo. The
same convergence theorem implies that the second term goes to zero as A . cc. Therefore,
given e > 0 there exists 2(E), n(e), such that
Since a finite sequence {X: 1 < n < n(e)} of integrable random variables is always
uniformly integrable, it follows that {XX } is uniformly integrable. n
Proof. Apply the above result to the sequence {IXX XI}. For sufficiency, note, as in
(3.16), that (i), (ii) imply Xe L, and then argue as in (3.17) that the uniform integrability
of {jXI} implies that of {IX XI"}. The proof of necessity is analogous to (3.18) and
(3.19), using (2.14). n
Theorem 3.7. (Scheffe's Theorem). Let (S), .^ , p) be a measure space. Let f,(n >- 1),f
be p.d.f.'s with respect to p, i.e., f,,, f are nonnegative, and Sf dp = 1 for all n, If dp = 1.
If f converges a.e. to f, then f . fin L'.
636 A PROBABILITY AND MEASURE THEORY OVERVIEW
Proof: Recall that for every real-valued function g on SI, one has
One has
so that
Among the L"- spaces the space Lz - Lz (O, .F , p) has a particularly rich structure. It
is a Hilbert space. That is, it is a Banach space with an inner product < , > defined by
The inner product is bilinear, i.e., linear in each argument, and the L z -norm is given by
If II = <.f,f>' rz (3.22)
and the Schwarz Inequality may be expressed as
If (S,, .,, , ), (S2,'2 , P2) are two measure spaces, then the product space (S, .9', p)
is a measure space where (i) S is the Cartesian product S 1 x S 2 ; (ii) .50 = .9' g .'z
is the smallest sigmafeld containing the class R of all measurable rectangles,
x B 2 : B, e .So,, B z e Soz ); and (iii) p is the product measure P, x z on .9'
determined by the requirement
under finite intersections. Then the class' of all finite disjoint unions of sets in .4 is a
field. By finite additivity and (4.1), p extends to' as a countably additive set function.
Finally, Caratheodory's Extension Theorem extends p uniquely to a measure on
9 = e('), the smallest sigmafield containing '.
For each Ba 5, every x-section B(x ,., := {y: (x, y) E B} is in $o,. This is clearly true
for measurable rectangles F = B, x B 2 . The class .01 of all sets in So for which the
assertion is true is a 2-class (or, a lambda class). That is, (i) S e 2, (ii) A, Be 2, and
A c B imply B\A E 2, and (iii) A(n l) e 2, A IA imply Ac Y. We state without
proof the following useful result from which the measurable-sections property asserted
above for all Be .9' follows.
Theorem 4.1. (Dynkin's Pi-Lambda Theorem).' Suppose a class .4 is closed under finite
intersections, a class d is a lambda class, and R c .4. Then a(s) c .W, where a(s) is
the smallest sigmafield containing -4.
In view of the measurable-section property, for each B e 9 one may define the
functions x " p 2 (B(X ,.,), y -a p,(B). These are measurable functions on (S,,.',,)
and (SZ, 52, P2), respectively, and one has
J /d
= J \Js=
.f(x,Y)P2(dY) IPi(dx) =
Js ( f, ^ f
'
(x,Y)p,(dx) F^2(dy). (4.3)
J
By the usual approximation of measurable functions by simple functions one arrives
at the following theorem.
(i) x - fsz f(x, y)p 2 (dy) is measurable and integrable on (S,, So,, p,).
(ii) y - f s , f(x, y)p,(dx) is measurable and integrable on (S2, 9'Z, P2).
(iii) One has the equalities (4.3).
(b) If f is nonnegative measurable on (S, .9', p) then (4.3) holds, whether the integrals
are finite or not.
or
J fd (4.6)
Next, the conditional probability P(A B), of an event A given another event B, is
defined in classical probability as
n B)
P(A I B)._ P(A (4.10)
P(B)
provided P(B) > 0. To introduce Kolmogorov's extension of this classical notion, let
(S2, ,F, P) be a probability space and {B} a countable partition of S2 by sets B. in F.
Let -9 denote the sigmafield generated by this partition, Q = a{B}. That is, -9 comprises
all countable disjoint unions of sets in {B}. Given an event A e .F, one defines P(A
the conditional probability of A given Q, by the s-measurable random variable
P(A n B.)
P(A 19)(o)._ for w e B,,, (4.11)
P(B)
if P(B) > 0, and an arbitrary constant c,,, say, if P(B) = 0. Check that
If X is a random variable, E!XJ < oo, then one defines E(X (.9), the conditional
expectation of X given -9, as the 9i-measurable random variable
where c are arbitrarily chosen constants (e.g., c = 0 for all n). From (4.13) one easily
verifies the equality
D
E(X 9) dP = J D
XdP for all D e ^. (4.14)
Note that P(A I -9) = E(1 A I D), so that (4.12) is a special case of (4.14).
2 P. Billingsley (1986), loc. cit., p. 443.
640 A PROBABILITY AND MEASURE THEORY OVERVIEW
One may express (4.14) by saying that E(X -9) is a 9- measurable random variable
whose integral over each D E -9 equals the integral of X over D. By taking D = B. in
(4.14), on the other hand, one derives (4.13). Thus, the italicized statement above may
be taken to be the definition of E(X -9). This is Kolmogorov's definition of E(X -9), I
which, however, holds for any subsigmafield -9 of F, whether generated by a countable
partition or not. To see that a 9-measurable function E(X -9) satisfying (4.14) exists
and is unique, no matter what sigmafield .9 c F is given, consider the set function v
defined on 9 by
Then v(D) = 0 if P(D) = 0. Consider the restriction of P to .9. Then one has v P on
9. By the Radon-Nikodym Theorem, there exists a unique (up to a P-null set)
9-measurable function, denoted E(X 9), such that (4.14) holds.
1
The simple interpretation of E(X -9) in (4.13) as the average of X on each member
of the partition generating .9 is lost in this abstract definition for more general sigmafields
9. But the italicized definition of E(X -9) above still retains the intuitive idea that given
the information embodied in -9, the reassessed (or conditional) probability of A, or
expectation of X, must depend only on this information, i.e., must be 9-measurable,
and must give the correct probabilities, or expectations, when integrated over events in -9.
Here is a list of some of the commonly used properties of conditional expectations.
in V.
All the properties (a) -(f) are fairly straightforward consequences of the definition of
conditional expectations, and are therefore not proved here. The interplay between
conditional expectations and independence is described by the following result.
(c) Let X, Y, Z be measurable maps on (0, .Y-, P) into (S,, S/). (S Z , .9 ), and (S 3 , ./ ),
respectively. Let co be a real-valued measurable function on (S,, .9, ). Assume
cp(X) is integrable, and a{X} and a{ Y} both independent of a{Z}. Then, writing
Q{Y, Z} for the smallest sigmafield c .} with respect to which both Y and Z are
measurable, one has
Proof. (a) EX is a constant and, therefore, -measurable. Also, relation (4.14) holds
with EX in place of E(X I -9) by independence.
(b) Let Px , P,. denote the distributions of X and Y, respectively. Then the product
probability Px x P. on (S, x S 2 , .VZ ) is the (joint) distribution of (X, Y). Let
D E cr{ Y}. This means there exists BE ./ such that D = {w eft Y(co) E B}
By the change-of-variables formula and Fubini's Theorem,
J <P(X, Y) dP = J la(Y)(fl(X, Y) dP = f
n 11 x sz
la(Y)(P(x, Y)^'x(da)Pr(dY)
=
L IB(Y)(Js (P(x, v)Px(dx) I Pr(dY) = J Sz 1 s(Y)(E( (X, Y))P1(dv)
=J n
Ja
le(Y(w))(E(P(X, y)) y =Y(,) dP(w) = 1 D(W)(Ew(X, y)) y =Y () dP(w)
= JI (E<V(X,Y))y=rdP.
D
1 e,(Y) 1 ,,(Z)p(X)dP
fD rP(X)dP= J 1 D,ID rP(X)dP= J
S2
z
o
= E(tp(X)I a{Y})dP.
Din Di
Thus, the desired relation (of the type) (4.14) holds for sets D = D, n D 2 e at Y, Z}. The
class :/,1 of all such sets is closed under finite intersection. Also, the class d of all sets
D e at Y, Z} for which f D q(X) dP = f D E(cp(X) ( at Y}) dP holds is a lambda class.
Therefore, by Dynkin's Pi-Lambda Theorem, a(4) c d. But = at Y, Z}. n
The last term on the right side vanishes, on first taking conditional expectation given
(see Basic Property (e)). Hence, for all Ye L2 (12, .9, P),
Note that X has the orthogonal decomposition: X = E(X (9) + (X E(X .9)).
Finally, the classical notion of conditional probability as a reassessed probability
measure may be recovered under fairly general conditions. The technical difficulty lies
in the fact that for every given pairwise disjoint sequence of events {A n } one may assert
the equality P(U A .9)(cu) _ E P(A I .9)(m) for all w outside a P-null set (Basic
Properties (b), (f)). Since in general there are uncountably many such sequences, there
may not exist any choice of versions of P(A ^) for all A e F such that for each w,
outside a set of zero probability, A -+ P(A 1 .9)(cw) is a probability measure on F. When
such a choice is possible, the corresponding family {P(A .9): A e f} is called a regular
conditional probability. The problem becomes somewhat simpler if one does not ask for
a conditional probability measure on F, but on a smaller sigmafield. For example, one
may consider the sigmafield I = {Y - '(B): B e.9'} where Y is a measurable function on
((1, F, P) into (S, .9'). A function (w, B) -+ Q w(B -9) on S2 x .9' into [0, 1] is said to be a
conditional distribution of Ygiven -9 if (i) for each Be .50 , Q W (B 1 -9) = P( {Ye B) .9)(cw) _-
E(I B (Y) .9)(w), for all w outside a P-null set, and (ii) for each to e S2, B 4 Q^,(B I -9) is
a probability measure on (S, .9'). Note that (i), (ii) say that there is a regular conditional
probability on 9 .
If there exists a conditional distribution Q,,(B -9) of Y given .9, then it is simple to
check that
for every measurable cp on (S, 9) such that co(Y) is integrable. Conditional distributions
do exist if S is a (Borel subset of a) complete separable metric space and 5' its Borel
sigmafeld.
In this book we often write E(Z {X x : Ac A}) in place of E(Z 6{X,: A c A}) for
simplicity.
Hence, writing 0, = 416''N and noting that 0 = Q^ s, on {ixi -< N + 1} and that on
{IxI > N + 1} one has f r(x)I < c, we have
lim I^ dP,,
n-. I$ pi
0 dPI <- lim I
n-m i
JI 0 x dP I
,t
dP
E 8
<C-+C-=E.
2c 2c
Since e> 0 is arbitrary, J, 0 dPn . } a , dP, and the proof of the italicized statement
above is complete. Let us now show that it is enough to verify (5.1) for every infinitely
644 A PROBABILITY AND MEASURE THEORY OVERVIEW
(a)
(b)
Figure 5.1
differentiable function that vanishes outside a finite interval. For each e > 0 define the
function
where d(e) is so chosen as to make f p,(x) dx = 1. One may check that p,(x) is infinitely
differentiable in x. Now let 0 be a continuous function that vanishes outside a finite interval.
Then is uniformly continuous and, therefore, b(s) = sup{1c(x) (y)j: x y^ -< e) -+ 0
as e j 0. Define
and note that, since 4(x) is an average over values of 4 within the interval (x e, x + s),
fr/(x) (x)I <- S(e) for all E. Hence,
J
t8 n u o8
+I L O'dP- bdP
2(r) + I J
u
O` dP. J ulO` dP 26(E) as n--. cc .
lim F(x 0 ) <- lim J c+(X)dPn (X) = J (x)dP(x) <- F(xo + ?I(e)) < F(x 0 ) + F,
n-+m n-+m
(5.5)
lim F(x 0 ) lim J (x) dPP(x) = J ^E dP(x) >- F(xo - q(F)) > F(x o ) - e.
n-+m n-^w
Since r > 0 is arbitrary, lim n ^ FF (x o ) <_ F(x o ) _< lim n ^ F(x o ), showing that
F(x 0 ) -+ F(x o ) as n - co. One may show that the converse is also true. That is, if
FF (x) --* F(x) at all points of continuity of a distribution function (d.f.) F, then
{Pn : n = 1, 2, ...} converges weakly to P. For this take a continuous function f that
vanishes outside [a, b] where a, h are points of continuity of F. One may divide [a, b]
into a finite number of small subintervals whose end points are all points of continuity
of F, and approximate f by a step function constant over each subinterval. The integral
of this step function with respect to P. converges to that with respect to P. All the above
results easily extend to multidimensions and we have the following theorem.
Theorem 5.1. Let P 1 , P2 .... , P be probability measures on ff8". The following are
equivalent statements.
(a) {Pn : n = 1, 2, ...} converge weakly to P.
(b) Eq. 5.1 holds for all infinitely differentiable functions vanishing outside a bounded
set.
(c) F(x) -* F(x) as n -* oo, for every point of continuity x of F.
F(x), it need not be the case that F(x) is the distribution function of a probability
_>
x of
measure. While there will be a positive measure p such that F(x) = p( oo, x], x e R',
it can happen that u(R') < 1; for example, if P. = b , i.e., P({n }) = 1,F(x) = 0, x <n
and F(x) = 1 for x n, then F = 0. Situations such as this are described by the
>_
expression, "probability mass has escaped to infinity. A property that prevents this
from happening for a sequence {P} is the so-called tightness criterion. Namely, a family
{PP } of probability measures on R' (with d.f.'s {F }), is said to be tight if for each e > 0
there is an interval (a, b] in R' such that P((a, b]) = F(b) FF (a) > I E for each n I.
It is clear that if a sequence of probability measures on R' (or R") converges weakly
to a probability measure P, then {P} is tight. Conversely, one has the following result.
Theorem 5.2. Suppose a sequence of probability measures {P} on R' is tight. Then it
has a subsequence converging weakly to a probability measure P.
Proof. Let F be the d.f. of P.. Let {r 1 , r 2 , be an enumeration of the rationals. Since
...}
converges. Continue in this manner. Consider the "diagonal sequence 1 1 , 2 2 , kk ... , , ... .
lim Ff .(x) < lim F,,(x) < lim F.,(y") = G(y") _< F(x").
It remains to prove that F is the d.f. of a probability measure. By tightness of {P} (and,
therefore, of {P,,.}), given e > 0, there exist x F , yE such that F,,(x,) < F(y,) > I s.
Let xE, y be points of continuity of F such that xE <x and y> yE . Then
Theorem 6.1. (Strong Law of Large Numbers). Let {X} be a sequence of pairwise
independent and identically distributed random variables defined on a probability space
CLASSICAL LAWS OF LARGE NUMBERS 647
X' + _+ Xn = EX,.
lim (6.1)
n wn
The proof we present here is due to Etemadi. 3 It is based on Part I of the following
Borel-Cantelli Lemmas. Part 2 is also important and it is cited here for completeness.
However, it is not used in Etemadi's proof of the SLLN.
P(A, i.o.):= P n U A k ) = 0.
( ' =I k=n
m m \ x m
P n U A k t = lim P^ U A k ) -< lim Z P(A k ) = 0.
nk=n n-m k=n n^z k=n
But
w m
P(Ak) = lim U (1 P(Ak))
k =n
Without loss of generality we may assume for the proof of the SLLN that the random
variables X. are nonnegative, since otherwise we can write X. = X X,, where
X = max(X,, 0) and X, = min(X,, 0) are both nonnegative random variables, and
then the result in the nonnegative case yields that
S
n
= n1 k=, ark _ n1_k=,
n
z Xk
' N. Etemadi (1983), `On the Laws of Large Numbers for Nonnegative Random Variables," J.
Multivariate Analysis, 13, pp. 187-193.
648 A PROBABILITY AND MEASURE THEORY OVERVIEW
P ^I T^^ ET
`^ >
Var(T) 1 T^
VarYk ^
1 T^
Z EYk
T
z z=^ = z
E T
z
E T k= 1 E rn k=1
1 t^ 1
Zz E{X 1 ltx t ^^} = 2 TE{X i lix, s^^(}. (6.2)
c T k=1 e T
Therefore,
P \I
T ET
Tn
r^I > e^ Z E {Xi l(x, s^ "1} = 2
n=1 E Tn
+2,
En=1 Tn
.3)
1(x^^(I. (6
Let x > 0 and let N = minn >- 1: T. -> x }. Then a" >- x and, since y < 2[y] for any y >- 1,
Z
n=l
1 l(:st^l= Z 1 -<2
Tn in>-x Tn n3N
"= 2 a " =k
a!
- k,
x
1 k
forX,>0.
n=1 in Xl
So
^n ET= I>e)<k Et
" 2
<co. (6.4)
nE1P\IT
By the BorelCantelli Lemma (Part 1), taking a union over positive rational values of
e, with probability 1, (TL^ ET ^ )/T n ^ 0 as n -4 oo. Therefore,
T EX, (6.5)
Tn
since
n_ao Tn n^ro
Since
S. T. -. 0
as n -+ oo. (6.6)
n
Therefore, the previous results about {T} give for {Sn } that
S
`"-SEX, as n-* (6.7)
rn
Zn Sty Sk Zn +l SrI
(6.8)
Tn+1 in k to Tn+l
1 Sk Sk
- EX, -< liminf - -< limsup - -< aEX, . (6.9)
a k k k k
Take the intersection of all such events for rational a > 1 to get lim k - d, Sk /k = EX, with
probability 1. This is the Strong Law of Large Numbers (SLLN). n
The above proof of the SLLN is really quite remarkable, as the following observations
show. First, pairwise independence is only used to make sure that the positive and negative
parts of X, and their truncations, remain pairwise uncorrelated for the calculation of
the variance of Tk as the sum of the variances. Observe that if the random variables are
all of the same sign, say nonnegative, and bounded, then it suffices to require that they
merely be uncorrelated for the same proof to go through. However, this means that, if
the random variables are bounded, then it suffices that they be uncorrelated to get the
SLLN; for one may simply add a sufficiently large constant to make them all positive.
Thus, we have the following.
X, +.+Xn
*0 asn -*oo.
n
In view of the great importance of the central limit theorem (CLT) we shall give a general
but self-contained version due to Lindeberg. This version is applicable to non-identically
distributed summands.
Theorem 7.1. (Lindeberg's CLT). For each n, let X,,,,. .. , Xk^ n be independent random
650 A PROBABILITY AND MEASURE THEORY OVERVIEW
variables satisfying
( k"
EXj.n = 0 , al.n' (EX^ n)1/Z CO, an = 1 , (7.1)
j=1
m
Um,n := Y_ X,
j=1
,, + I Zj,n
k"
j=m+1
(I < _<
m kn 1),
(7.4)
j=1 j=1
Let f be a real-valued function on I8' such that ff', f ", f " are bounded. Recall the
following version of the Taylor expansion, which is easy to check by integration by parts,
h2
f(x + h) = f(x)+hf'(x) + h f"(x)+h 2 (1 6){f"(x + Oh) f"(x) }dO
2! 0
.n
Ef(Um,n) = E.f(I'm.n) + ^2 Ef"(ym.n) + E(R m , n ). (7.8)
Also Um_ l.n = Vm,n + Z., n , and Vm.n and Zm.n are independent. Therefore, exactly as
CLASSICAL CENTRAL LIMIT THEOREMS 651
where
(V. + Zm ) .f"(Vm,)} dB.
R = Zm
J 1 (1 O){ f (7.10)
Hence,
. ]
We have used the notation 11911 := sup{Ig(x)I: x e W}. By (7.1), (7.2), and (7.12),
As > 0 is arbitrary,
k
tim Z EIR m.I = 0. (7.13)
m=1
Also,
,
EIRl s E[zm. J(1 9)Ilf"'1 Zm,nl de] _ ?II f"'II^EIzm.1 3 = ?Ilf" IIWam.nEIZ1i 3
a m.n = E(xm.n 1 11x., ,d>al) + E(Xm.n 1 lIX,,, ,I s) -< E ( x m.n 1 llx.^..,1 >)) + 5 2 ,
Therefore, by (7.2),
k
I EIR; ,j < c max a m n -^ 0 as n cc. (7.16)
m=1
/ k /
J
k
Ef Y- x) f(y)(2n)-lie exp{Zy 2 } dy -+ 0 as n -+ oo.
j= 1
It has been shown by Feller 4 that in the presence of the uniform asymptotic negligibility
condition (7.15), the Lindeberg condition is also necessary for the CLT to hold.
Corollary 7.2. (The Classical CLT). Let {Xj : j -> 1} be i.i.d. EXj = p, 0 < a 2 :=
Var Xj < oo. Then Y 7 =1 (Xj )/(i/) converges in distribution to N(0, 1).
Corollary 7.3. (Liapounov's CLT). For each n let X1,n, X2,n, . .. , Xn k be k n independent
random variables such that
k k
I EXj,n = u, I Var Xi ,, = Q Z > 0,
j= 1j=l
k
(Liapounov condition) lim Z EIXj , n EXj , n 1 2+ ' = 0 (7.18)
n-. j=l
for some 6 > 0. Then Xi,, converges in distribution to the Gaussian law with mean
p and variance a Z .
k.,
EXj , n =0, Z EX; n = 1.
j=1
It then remains to show that the hypothesis of the corollary implies the Lindeberg
condition (7.2). This is true since, for every E > 0,
k k IX. Iz+a
i ,.n
Ea 0, (7.19)
(
E X;.^ltlx;..,l>tl) Z E
1 J=1
as n - oo, by (7.18). n
Observe that the most crucial property of the normal distribution used in the proof
of Theorem 7.1 is that the sum of independent normal random variables is normal. In
other words, the normal distribution is infinitely divisible. In fact, the normal distribution
N(0, 1) may be realized as the distribution of the sum of independent normal random
variables having zero means and variances Q? for any arbitrarily specified set of
nonnegative numbers or? adding up to 1. Another well-known infinitely divisible
distribution is the Poisson distribution.
The following multidimensional version of Corollary 7.2 may be proved along the
lines of the proof of Theorem 7.1.
Consider a real- or complex-valued periodic function on the real line. By changing the
scale, if necessary, one may take the period to be 2n. Is it possible to represent f as a
superposition of the periodic functions ("waves") cos nx, sin nx of frequency n
(n = 0, 1, 2, ...)? The Weierstrass approximation theorem (Theorem 8.1) says that every
continuous periodic function f of period 2n is the limit (in the sense of uniform convergence
of functions) of a sequence of trigonometric polynomials, i.e., functions of the form
T T
Y c n e" s = c0 + (an cos nx + b n sin nx).
n=-T n=1
The theory of Fourier series says, among other things, that with the weaker notion of
L2 -convergence the approximation holds for a wider class of functions, namely for all
square integrable functions f on [it, n]; here square integrability means that I f I z is
measurable and that J"_,, f (x)1 2 dx < oc. This class of functions is denoted by L2 [ rr, n].
The successive coefficients c n for this approximation are the so-called Fourier coefficients:
1 "
c n = -- f(x)edx (n=0,1,2,...). (8.1)
n
654 A PROBABILITY AND MEASURE THEORY OVERVIEW
1
e1nxe-"""dx =0 forn0m,
2it _n
=1 for n = m, (8.2)
so that the Fourier series off written formally, without regard to convergence for the
time being, as
X
Y cneinx
(8.3)
Theorem 8.1. Let f be a continuous periodic function of period 2n. Then, given > 0,
there exists a trigonometric polynomial Y"= - N do exp{inx} such that
N
sup f (x) do exp{inx} <5.
XE[, n=N
lexp^i(N + 1)x} l 2
(N + 1)k N (x) _ exp{i( j k)x} = I exp{ijx}I z =
o,j.ksN i=o exp{ix} 1
The first equality in (8.5) follows from th^fact that there are N + 1 (ni pairs (j, k)
such thatj k = n. It follows from (8.5) that k N is a positive continuous periodic function
with period 2n. Also, k N is a p.d.f. on [ n, iv], as follows from (8.4) on integration. For
every e> 0 it follows from (8.5) that k N (x) goes to zero uniformly on [ n, e] u [E, n]
so that
kN(x) dx -. 0 as N -^ co . (8.6)
[ - a. - e]v [c.n]
In other words, k N (x) dx converges weakly to 5 0 (dx), the point mass at 0, as N -* oo.
Consider now the approximation fN of f defined by
fN(x)'=
f rz
-
.f(y)kN(x y)dy = y 1 1
. n= _N \\\
N
Inl+llJc exp{inx}, (8.7)
FOURIER SERIES AND THE FOURIER TRANSFORM 655
where c" is the nth Fourier coefficient of f. By changing variables and using the periodicity
of f and k N , one may express fN as
.fN(x) =
f n f(x y)kN(y) dy.
rz
Therefore, writing M = sup{I f (x)I: Xe l}, and S E = sup{If (y) f(y')I: I y y'I <c},
one has
$ R [-R.-E]V [E.f[]
(8.8)
It now follows from (8.6) that f fN converges to zero uniformly as N -+ oo. Now
write do = (1 Inj/(N + 1))c n . n
The next task is to establish the convergence of the Fourier series (8.3) to f in L2 .
For this note that for every square integrable f and all positive integers N,
1 rz
( f(x)_ C" e'nx i e dx = c, C m = 0 (m = 0, 1.... .N). (8.9)
27[ - n-
N
Therefore, if one defines the norm (or "length") of a function g in L2 [iv, rz] by
rz i2
I lg(x)j' dx
Ilg11 = ( = JIgll2, (8.10)
rz
N
c"einxZ I = 1 f rz (f(x) ^ c"enx )( f( x ) [^ C "e -inx 1 dx
0 - I f(x)
_N 21 _ rz \ -N -N J
=
2 1
^
/'
rz
J -n IJ (x)12 dx L
N^
[
(
-N \ 27C f rz
-n
N
einx f(x) dx + 2^
J
rz
-a
e"' f (x) dx cn ,,
J 7
This shows that II f (x) > N_ N c" exp{inx} 11 2 decreases as N increases and that
N 2
f (8.12)
lim f(x) Y- cne`"x = 11 112 Y- Ic n I 2 .
N- . -N
To prove that the right side of (8.12) vanishes, first assume that f is continuous and
f(it) = f(n). Given e> 0 there exists, by Theorem 8.1, a trigonometric polynomial
ZNOy o d n e i "x such that
No
max f(x) Y- de"' <e.
< E.
x -No
656 A PROBABILITY AND MEASURE THEORY OVERVIEW
This implies
1 n No 2
f(x) de"" dx < F 2 . (8.13)
2r j,, -No
rz No 2 fNo No 2
1 /'( x ) d rze "x d x =1 f(x) So crzeinx + (cn d n )e
nx
l dx
7C .J No 27t _ No
n - No
1 n No 2
= f(x) cneinx^ dx
27r ., _No
rz 2
1 No
+ (c d rz )e inx dx. (8.14)
27r ,, -No
No c n e
inx
l2 dx < e 2 , lim f(x) cetnx 2 \ e 2 . (8.15)
27[ f rz I f (x)
-rz -No Nam -N
N
lim f(x) crze inx = 0,
(8.16)
N-- -N
Y_ (8.17)
and, by (8.12),
II.f ll 2 = Ic1 2 .
This completes the proof of convergence for continuous periodic f. Now it may be
shown that given a square integrable f and e > 0, there exists a continuous periodic g
such that II f gll < E/2. Also, letting Z d. exp{inx}, X c exp{inx} be the Fourier series
of ,g, f, respectively, there exists N t such that
N,
g(x) d exp{inx} II < -
-N,
N x
c12
llm f(x) CnPinx =0, Ilf 11 2 = (8.19)
N-^ x^ - N -x
Theorem 8.2
(a) For every f in L2 [n, n], the Fourier series of f converges to f in L2 -norm, and
the identity li f II = IcnI2)"2 holds for its Fourier coefficients c n .
(b) If (i) f is differentiable, (ii) f(n) = f(n), and (iii) f' is square integrable, then
the Fourier series of f also converges uniformly to f on [n, n].
Proof. To prove part (b), let f be as specified. Let Y_ c n exp{inx} be the Fourier series
of f, and Y c;,' ) exp{inx} that of f'. Then
(8.20)
Since f' is square integrable,
1 1 \h/2/
Icnl = Icol + n^o Inl Inc,I <_ Icol + z Incnl2 < oo. (8.22)
n#on/ \n#O
Since the continuous functions Y N_ N c n exp{inx} converge uniformly (as N > co) to
c n exp{inx}, the latter must be a continuous function, say h. Uniform convergence
to h also implies convergence in norm to h. Since Y"' N c n exp{inx} also converges in
norm to f, f(x) = h(x) for all x. For if the two continuous functions f and h are not
identically equal, then
J f(x)g(x)l 2 dx>0. n
For a finite measure (or a finite signed measure) p on the circle [n, n) (identifying
1
c = exp{ -inx}p(dx) (n = 0, 1, ...). (8.23)
2n [n)
If p has a density f, then (8.23) is the same as the nth Fourier coefficient of f given by (8.1).
Proposition 8.3. A finite measure on the circle is determined by its Fourier coefficients.
rz n
g N (x):= kN(x - y)p(dy) = ^ (1 - l ^c exp{inx}, (8.24)
_ rz N N+ 1
with c defined by (8.23). For every continuous periodic function h (i.e., for every
continuous function on the circle),
f h(x)gN(x) dx
t = 5 ( ft - x,.)
h(x)k,(x - y) dx)P(dy). (8.25)
N
lim
- f
,-x,n)
h(x)g N (x) dx = J
[-rz.rz)
h(y)p(dy). (8.26)
This means that p is determined by {g N : N > 1} The latter in turn are determined by {c),
n
We
We are now ready to answer an important question: When is a given sequence
{c: n = 0, + 1, ...}, the sequence of Fourier coefficients of a finite measure on the circle?
A sequence of complex numbers {c: n = 0, + 1, 2,.. .) is said to be positive definite
if for any finite sequence of complex numbers {zj : 1 < j < N), one has
Y- c j - k zj z k i 0. (8.27)
1 j.k5N
1 E zjik
1j,kN
Ci_kZJZk =
21t 1-<j.kSN J [-a.n)
exp{i( j k)x}(dx)
('
1
J N
_ Irz ^E zj exp{ iix})(Z z k exp{ikx})(dx)
N
=1J
(' N Z
exp{ ijk} j (dx) >- 0.
j (8.28)
z
Also,
J
c0= p(dx) = 1.
Again, as there are N + 1 Inj pairs (j, k) such that j k = n (N _< n _< N), (8.29)
becomes
'
0 -< 9 N (X) _ 1 j exp{inx}c. (8.30)
N +1 /
In particular,
1
gN(x)dx = c o = 1. (8.31)
27[ t-n,n)
Hence (l/21t)g N is a p.d.f. on [it, ir]. By Theorem 5.2, there exists a subsequence {g N -}
such that (1/27t)g N .(x) dx converges weakly to a probability measure u(dx) as N' -+ oo.
Also, integration yields
i2rcf -,,.
[ n[
exp{ inx}g N (x) dx = ( l Inj ^c
N + 1
(n = 0, + 1, ... , N). (8.32)
For each fixed n, take N = N' in (8.32) and let N' - oo. Then
n
C = lim^1 j --c = exp{inx}p(dx) (n = 0, 1,...). (8.33)
N '- N' + 1 ;
In other words, c is the nth Fourier coefficient of p. If u({rz}) > 0, one may change p
to p' where p'({it}) = p({R}) + p({rr}), and p' = p on (it, n) to get a probability
measure p' on the circle whose Fourier coefficients are c. Note that (8.33) holds with
p replaced by p', because exp{inn} = exp{inrc} = 1. n
660 A PROBABILITY AND MEASURE THEORY OVERVIEW
Proof. Since the measure p = 0 has Fourier coefficients c = 0 for all n, and the latter
trivially comprise a positive definite sequence, it is enough to prove the correspondence
between nonzero positive definite sequences and nonzero finite measures. It follows from
Theorem 8.4, by normalization, that this correspondence is 1-1 between positive definite
sequences {c} with c o = c > 0, and measures on the circle having total mass c. n
exp{ i^ d} - exp{i^ c }
= i
(8.35)
so that f() -. 0 as ICI -+ oo. This convergence to zero as - oo is clearly valid for
arbitrary step functions, i.e., finite linear combinations of indicator functions of finite
intervals. Now let f be an arbitrary integrable function. Given e > 0 there exists a step
function f, such that
.^^( ) = - ).
if. (8.38)
The boundary terms in deriving (8.38) vanish, for if f' is integrable (as well as f) then
f (x) -. 0 as x - oo. More generally, if f is r-times continuously differentiable and
f e' , 0 < j < r, are all integrable, then one may repeat the relation (8.38) to get
)
In particular, (8.39) implies that if f, f', f" are integrable then f is integrable.
It is instructive to consider the Fourier transform as a limiting version of a Fourier
series. Consider for this purpose that f is differentiable and vanishes outside a finite
interval, and that f' is square integrable. Then, for all sufficiently large integers N, the
function
vanishes outside (n, n). Let Cn Ne x , c e S be the Fourier series of y ti. and its
derivative g N , respectively. Then
f
Cn.N gN(x)e-;nx dx =
n 2R
f (Nx)e
'
dx = 2
1 - ^r. f(y)e
inl/N
dy
1 n
= f (8.41)
2Nrz N
f=
ou
g(x)dx + A ( rz f Ig (x)I 2 dx <
n
..
1/2
x j 2 -R /
f(z) = yN
( -
N
-
n Y>.
= n1
_.. f
x, 2Nn
Letting N - co in (8.42), if f E C(If', dx), one gets the Fourier inversion formula,
n e ;n 1N.
N
(8.42)
` ('
.f(y)ey dy = - J ^^ ds.
_^ .i(^)e_
;
f(z) = -- (8.43)
One may show that this formula holds for all f such that both f and f are integrable.
Next, any f that vanishes outside a finite interval and is square integrable is automatically
integrable, and for such an f one has, for all sufficiently large N,
1 I
IgN(x)I2 dx If(y)I 2 dy,
1- =2I f
so that
Theorem 8.6
(a) If f and f are both integrable, then the Fourier inversion formula (8.43) holds.
(b) If f is integrable as well as square integrable, then the Plancherel identity (8.45)
holds.
= f m
If (x)l dx
J Ig(y)I dy, (8.48)
(f * g) (Z) =
^ f (x y)g(y) dy) dx
Jm e'4X \J -
=
JJ e'Z(x-rie''yf(x y)g(y) dy dx
e.42e14y
f(z)g(y) dy dz = i( ( ), (8.49)
-m -t
FOURIER SERIES AND THE FOURIER TRANSFORM 663
a result of importance in probability and analysis. By iteration, one defines the n-fold
convolution f, * *f,, of n integrable functions f,, ... , f and it follows from (8.49)
that (f, * * f) = f, f. f.. Note also that if f, g are real-valued integrable functions
and one defines the measures u, v by (dx) = f (x) dx, v(dx) = g(x) dx, and * v by
(f * g)(x) dx, then
(8.50)
=
J p(B y)g(y) dy =
J T (B y) dv(y),
for every interval (or, more generally, for every Bore! set) B. Here B y is the translate
of B by y, obtained by subtracting from each point in B the number y. Also,
( * v) = (f * g) = fo = . In general (i.e., whether or not p and/or v have densities),
the last expression in (8.50) defines the convolution p * v of finite signed measures p and
v. The Fourier transform of this finite signed measure is still given by (p * v) = v".
Recall that if X,, X 2 are independent random variables on some probability space
((2, d, P) and have distributions Q1, Q 2 , respectively, then the distribution of X, + X Z
is Q, * Q 2 whose characteristic function (i.e., Fourier transform) may also be computed
from
This argument extends to finite signed measures, and is an alternative way of thinking
about (or deriving) the result (p * v) = v".
Theorem 8.4 and Corollary 8.5 may be extended to a correspondence between finite
(probability) measures on H' and positive definite functions on R' defined as follows.
A complex-valued function cp on R' is said to be positive definite if for every positive
integer N and finite sequences {^ , ... , N } c H' and {z,, Z21... , z} c C (the set
of complex numbers), one has
The proof of necessity is entirely analogous to (8.28). The proof of sufficiency may
be viewed as a limiting version of that for Fourier series, as the period increases to
infinity. We omit this proof.
The above results and notions may be extended to higher dimensions H k . This
extension is straightforward. The Fourier series of a square integrable function f on
[m, i) x [n, n) x x [n, n) = [rz, n) k is defined by E c,, exp{iv x} where
the summation is over all integral vectors (or multi-indices) v = (v', V' 21 , .. , v (k '), each
vf being an integer. Also v x = v'''xthe usual euclidean inner product on U& k
between two vectors v = (v"'. .. , v' ), and x = (x', x (2 ', .. , x" k '). The Fourier
coefficients are given by
I n "
... 'v
C^ k f(x)e . dx. (8.53)
( 2n ) ,,
664 A PROBABILITY AND MEASURE THEORY OVERVIEW
The extensions of Theorems 8.1-8.4 are fairly obvious. Similarly, the Fourier transform
of an integrable function (with respect to Lebesgue measure on Il) f is defined by
f(z) =
...
.f(4)e d4, (8.55)
(2 I f
which holds when f(x) and f() are integrable. The Plancherel identity (8.45) becomes
J .. J
If(4)I Z d4 = ( 2 n) k J ... If(y)1 2 dy,
J (8.56)
which holds whenever f is integrable and square integrable. Theorem 8.6 now extends
in an obvious manner. The definitions of the Fourier transform and convolution of finite
signed measures on 08'` are as in (8.46) and (8.50) with integrals over ( 00, cc) being
replaced by integrals over R". The proof of the property (it s * 12) = l P2 is unchanged.
Our final result says that the correspondence P -+ P. on the set of probability measures
onto the set of characteristic functions, is continuous.
Theorem 8.8. (Cramer-Levy Continuity Theorem). Let P(n >, 1), P be probability
measures on (R', ,1k)
(a) If P. converges weakly to P, then P() converges to P(4) for every e Il k .
(b) If P,, () - P() for every , then P. converges weakly to P.
Proof. (a) Since P(, P() are the integrals of the bounded continuous function
exp {i4 x} with respect to P and P, it follows from the definition of weak convergence
that PP (^) -> P(^).
(b) Let f be an infinitely differentiable function on l having compact support.
Define f(x) = f(x). Then f is also infinitely differentiable and has compact support.
Now,
665
666 AUTHOR INDEX
667
668 SUBJECT INDEX
Kolmogorov's existence theorem. 16, 73, 92, Maximum principle. 365, 454. 485, 495, 516,
93 621
Kolmogorov's extension theorem, see Maxwell-Boltzmann distribution, 63.393,
Kolmogorov's existence theorem 481
Kolmogorov's forward equation, see Forward Measure, 626
equation Measure determining class, 101
Kolmogorov's inequality, 51 Measure preserving transformation, 227
Kolmogorov-Smirnov statistic, 38, 103 Method of images, 394, 484, 491. See also
Kolmogorov zero-one law, 90 Reflecting boundary, diffusion
Kronecker's delta, 265 Monotone convergence theorem, 632
Minimal process:
Lp convergence criterion, 634, 635 diffusion. 613
LP maximal inequality, 602, 603 Markov chain, 288
Lp space, 633 Minkowski inequality, 630
Lagrange's method of characteristics, 312 Mutually singular, 63
Laplacian, 454
Law of diminishing returns, 178 Natural scale:
Law of large numbers: birth-death chain, 249
classical, 646 diffusion, 416, 479
diffusion, 432 Neumann boundary condition, 404,408
Markov chain, 145, 307 Negative definite, 607
Law of proportionate effect, 78, 480 Newton's law in Hamiltonian form, 257
Learning model, see Artificial intelligence Newton's law of cooling, 247
Lebesgue dominated convergence theorem, Noiseless coding theorem. 213
see Dominated convergence theorem Nonanticipative functional. 564, 567. 577. 579
Lebesgue integral, 627 Nonhomogeneous:
Lebesgue measure, 626 diffusion, 370, 610
Levy-Khinchine representation, 350, 354 Markov chain, 263, 335, 336
Levy process. 350 Markov process, 506
Levy stable process. 355 Nonnegative definite. 17, 74, 658
Liapounov's inequality, 629 Normal reflection. 460
Linear space, normed, 633 Null recurrent:
Liouville theorem, 258 diffusion. 423
Lipschitzian, local, global, 612 Markov chain, 144, 306
673
674 ERRATA
234, 2nd line of (1.2), should read: i(2r -i) (w -'+z)(w +r-i)
p. 236, (2.16): Last term should be (1 - 6, - ,Qy )p w (i.e., missing factor pvv ).
p. 239, (3.6): Right side of 2nd equation in (3.6) should be irk in place of j.
p. 240, 6 lines from bottom: Replace first denominator (2 - r + j) in (3.14) by (w - r + j).
p.256, Exc. 4, should read: al = e -x1 is the largest nontrivial (i.e., 1) eigenvalue of p.
c'i is the eigenvector corresponding to al.
p. 279, line 4 from top: Insert at the end: on the set {Yk = ik : k > 0 }.
p. 280, 2nd line of Proposition 5.6: Replace {Yt} by {Xt}.
p. 285, line 4 from bottom: The reference should be to Example 8.3 (not Example 8.2).
p. 290, line 7 from top: A,,,, +1 should be A, +i.
p. 293, 3rd line of (7.2): Capital T in center of inequality show be lowercase t.
p. 302, line 6 from top: {Vt : 0 < u < t} should be {Vu : 0 < u < t }.
p. 303, line 5 from top: o(T) should be o(1).
p. 307, (8.15): The right side should be divided by A2.
p. 307, 3rd line after (8.15): The right side of the display should be divided by A.
p. 309, 12 lines from top: The second and third sums in display (8.26) are E N -y
pyrv ,
_ N ^0" (y
N) N ,
p. 309, (8.27): Delete (^ +p) I (a+jR) (a +N,3) wherever it appears in display (8.27).
p. 357, (T.5.3) and 2nd line from bottom: Replace the upper limit of the sum by [N/2] + 1
(not [N/2]).
p. 363, line 6 from bottom: Replace (S,.T) by (E, C9).
p. 363, line 5 from bottom: Replace F by 9.
p. 364, line 8 from top: Replace F by 9.
ERRATA 675
p. 364, line 14 from bottom, should read: to y in the metric p as k - oo, then a ( "' k) (t) _
am (nk) ( 0 ) + a(0) ( 0 ) _ aD(7n)(t) a.s. as k - oo.
m(n,t) m(n,t)
p. 364, line 12 from bottom, display should read: 1-2P.n , (o (t) = -1) = EQD""` ) (t) -f
Ea n (t) = 1 - 2P.,(o n (t) = -1).
p. 373, (2.12): tfH should be H f I.
p. 374, line 2 from top: l 1(y) - f (x) should be f" (y) - f" (x)1.
p. 375, line 3 from bottom: x = X. should be x = X s .
p. 380, (2:44): Delete square brackets on the right-hand side, and insert one parenthesis
after -a/ax.
p. 400, (6.27): Insert a minus sign in the exponential.
p. 414, (8.38), ist line: Replace by d
d x.
p. 439, 2nd line of (13.3): Replace f o fo b y fo fo in first term and insert ds'ds at the end
of the line.
p. 476, Exc. 4(v): a 2 = yao should be a 2 = ry 2 a .
p. 477, Exc. 5(i): Reference to (2.4) in place of (2.2).
p. 485, line 9: Insert ds in the double integral.
p. 487, Exc. 9(ii): The eigenvalues are 0, -1, -2, ... .
p. 505, line 18 from bottom: "nonnegative" should be "nonpositive".
p. 518, line 5 from top: The publisher is Wiley Eastern Division (not Eka Press).
p. 564, line 20: In the definition of P-complete, replace N E .Ft by N E F.
p. 565, (1.9): dB t should be dt.
p. 566, (1.18): Should be .P not. P,9 in (1.18).
p.569,line2:Insert (-a) toread a+n(-a)<t<a+n1 (-a).
p. 570, line 16: Insert ( - c) after < on right side of the display.
p. 571, (1.35): Delete first minus sign.
p. 571: Theorem 2.2 at the bottom of the page should be named Theorem 2.1.
p. 576, (2.29): Delete last = o(t).
p. 577, first line of (2.31): dx should be ds.
p. 578, (2.34): Insert f t after 1
p. 590, line 9 from top: P(IXt - xJ > e).
p. 597, (4.37): Insert one prime on the right side: aa'.
p. 598, (4.39): Insert one prime on the right side: aa'.
p. 602, Exc. 8(i): In the last inequality of the Hint, transpose E and the first bracket {.
p. 607, line 4: Insert factor eye in cp(s, y) = u(t - s, yl)eh/ 2 .
p. 617, (T.3.30): Insert a'(Xt) after Z'' in the dB t term on the right side of last line
of (T.3.30) and close parenthesis before dBt. Last line of (T.3.30) should thus read:
A9(Xi )Z'"dt + Z'^ (a (Xi) grad g(Xe) + g(Xe ).f (X t )) dBt.
676 ERRATA
p. 618, lines 14, 15 from the bottom, should read: Then P and Q are absolutely continuous
with respect to each other on (Il, .Ft), with dQ/dP = Z o ""
p. 619, (T.3.36): c(s) should be replaced by (X 8 )
p. 620, line 4 from top: Insert (r): Should read (r) :=.
p. 626, line 17 from top: I = [a, b).
p. 640, Theorem 4.4(f): Delete the last "a.s. and".
p. 648, lines 11, 12 from bottom: k = 2a /(a 1) in line 11, and a should be a in line 12.
p. 651, 4 lines from bottom, should read: where c = 2I I f " I C E Zl I3
Zr in (8.4) on the right; missing factor 21r on the left in (8.5).
p. 654: Missing factor ^
p. 668, line 15 from top: Martingale pages are 507-515 (not 503-511).
p. 670, line 5 from top, should read: Kolmogorov forward equation, 78, 266, 335, 379, 380,
390, 405, 442, 477, 478, 481, 492. See also Forward equation.
p. 670, line 15 from bottom: OrnsteinUhlenbeck pages are 369, 391, 476, 480, 581, 598
(not 580).