
Stochastic Processes
with Applications
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève

Classics in Applied Mathematics


C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and
Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained
Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One,
Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for
Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems
in Differential- Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear
Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
*First time in print.
Classics in Applied Mathematics (continued)

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Stochastic Processes
with Applications

Rabi N. Bhattacharya
University of Arizona
Tucson, Arizona

Edward C. Waymire
Oregon State University
Corvallis, Oregon

Society for Industrial and Applied Mathematics
Philadelphia
Copyright 2009 by the Society for Industrial and Applied Mathematics

This SIAM edition is an unabridged republication of the work first published by John
Wiley & Sons (SEA) Pte. Ltd., 1992.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission of
the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Library of Congress Cataloging-in-Publication Data

Bhattacharya, R. N. (Rabindra Nath), 1937-


Stochastic processes with applications / Rabi N. Bhattacharya, Edward C. Waymire.
p. cm. -- (Classics in applied mathematics ; 61)
Originally published: New York : Wiley, 1990.
Includes index.
ISBN 978-0-898716-89-4
1. Stochastic processes. I. Waymire, Edward C. II. Title.
QA274.B49 2009
519.2'3--dc22
2009022943

siam is a registered trademark.


To Gouri and Linda,
with love
Contents

Preface to the Classics Edition xiii

Preface xv

Sample Course Outline xvii

I Random Walk and Brownian Motion 1


1. What is a Stochastic Process?, 1
2. The Simple Random Walk, 3
3. Transience and Recurrence Properties of the Simple Random Walk, 5
4. First Passage Times for the Simple Random Walk, 8
5. Multidimensional Random Walks, 11
6. Canonical Construction of Stochastic Processes, 15
7. Brownian Motion, 17
8. The Functional Central Limit Theorem (FCLT), 20
9. Recurrence Probabilities for Brownian Motion, 24
10. First Passage Time Distributions for Brownian Motion, 27
11. The Arcsine Law, 32
12. The Brownian Bridge, 35
13. Stopping Times and Martingales, 39
14. Chapter Application: Fluctuations of Random Walks with Slow Trends
and the Hurst Phenomenon, 53
Exercises, 62
Theoretical Complements, 90

II Discrete-Parameter Markov Chains 109


1. Markov Dependence, 109
2. Transition Probabilities and the Probability Space, 110


3. Some Examples, 113


4. Stopping Times and the Strong Markov Property, 117
5. A Classification of States of a Markov Chain, 120
6. Convergence to Steady State for Irreducible and Aperiodic Markov
Processes on Finite Spaces, 126
7. Steady-State Distributions for General Finite-State Markov
Processes, 132
8. Markov Chains: Transience and Recurrence Properties, 135
9. The Law of Large Numbers and Invariant Distributions for Markov
Chains, 138
10. The Central Limit Theorem for Markov Chains, 148
11. Absorption Probabilities, 151
12. One-Dimensional Nearest-Neighbor Gibbs States, 162
13. A Markovian Approach to Linear Time Series Models, 166
14. Markov Processes Generated by Iterations of I.I.D. Maps, 174
15. Chapter Application: Data Compression and Entropy, 184
Exercises, 189
Theoretical Complements, 214

III Birth–Death Markov Chains 233
1. Introduction to Birth–Death Chains, 233
2. Transience and Recurrence Properties, 234
3. Invariant Distributions for Birth–Death Chains, 238
4. Calculations of Transition Probabilities by Spectral Methods, 241
5. Chapter Application: The Ehrenfest Model of Heat Exchange, 246
Exercises, 252
Theoretical Complements, 256

IV Continuous-Parameter Markov Chains 261


1. Introduction to Continuous-Time Markov Chains, 261
2. Kolmogorov's Backward and Forward Equations, 263
3. Solutions to Kolmogorov's Equations in Exponential Form, 267
4. Solutions to Kolmogorov's Equations by Successive Approximation, 271
5. Sample Path Analysis and the Strong Markov Property, 275
6. The Minimal Process and Explosion, 288
7. Some Examples, 292
8. Asymptotic Behavior of Continuous-Time Markov Chains, 303
9. Calculation of Transition Probabilities by Spectral Methods, 314
10. Absorption Probabilities, 318

11. Chapter Application: An Interacting System: The Simple Symmetric


Voter Model, 324
Exercises, 333
Theoretical Complements, 349

V Brownian Motion and Diffusions 367


1. Introduction and Definition, 367
2. Kolmogorov's Backward and Forward Equations, Martingales, 371
3. Transformation of the Generator under Relabeling of the State Space, 381
4. Diffusions as Limits of Birth–Death Chains, 386
5. Transition Probabilities from the Kolmogorov Equations: Examples, 389
6. Diffusions with Reflecting Boundaries, 393
7. Diffusions with Absorbing Boundaries, 402
8. Calculation of Transition Probabilities by Spectral Methods, 408
9. Transience and Recurrence of Diffusions, 414
10. Null and Positive Recurrence of Diffusions, 420
11. Stopping Times and the Strong Markov Property, 423
12. Invariant Distributions and the Strong Law of Large Numbers, 432
13. The Central Limit Theorem for Diffusions, 438
14. Introduction to Multidimensional Brownian Motion and Diffusions, 441
15. Multidimensional Diffusions under Absorbing Boundary Conditions and
Criteria for Transience and Recurrence, 448
16. Reflecting Boundary Conditions for Multidimensional Diffusions, 460
17. Chapter Application: G. I. Taylor's Theory of Solute Transport in a
Capillary, 468
Exercises, 475
Theoretical Complements, 497

VI Dynamic Programming and Stochastic Optimization 519


1. Finite-Horizon Optimization, 519
2. The Infinite-Horizon Problem, 525
3. Optimal Control of Diffusions, 533
4. Optimal Stopping and the Secretary Problem, 542
5. Chapter Application: Optimality of (S, s) Policies in Inventory
Problems, 549
Exercises, 557
Theoretical Complements, 559


VII An Introduction to Stochastic Differential Equations 563


1. The Stochastic Integral, 563
2. Construction of Diffusions as Solutions of Stochastic Differential
Equations, 571
3. Itô's Lemma, 582
4. Chapter Application: Asymptotics of Singular Diffusions, 591
Exercises, 598
Theoretical Complements, 607

0 A Probability and Measure Theory Overview 625


1. Probability Spaces, 625
2. Random Variables and Integration, 627
3. Limits and Integration, 631
4. Product Measures and Independence, Radon–Nikodym Theorem and
Conditional Probability, 636
5. Convergence in Distribution in Finite Dimensions, 643
6. Classical Laws of Large Numbers, 646
7. Classical Central Limit Theorems, 649
8. Fourier Series and the Fourier Transform, 653

Author Index 665


Subject Index 667
Errata 673
Preface to the Classics Edition

The publication of Stochastic Processes with Applications (SPWA) in the SIAM


Classics in Applied Mathematics series is a matter of great pleasure for us, and we
are deeply appreciative of the efforts and good will that went into it. The book has
been out of print for nearly ten years. During this period we received a number of
requests from instructors for permission to make copies of the book to be used as
a text on stochastic processes for graduate students. We also received many kind
laudatory words, along with inquiries about the possibility of bringing out a second
edition, from mathematicians, statisticians, physicists, chemists, geoscientists, and
others from the U.S. and abroad. We hope that the inclusion of a detailed errata is a
helpful addition to the original.
SPWA was a work of love for its authors. As stated in the original preface,
the book was intended for use (1) as a graduate-level text for students in diverse
disciplines with a reasonable background in probability and analysis, and (2) as a
reference on stochastic processes for applied mathematicians, scientists, engineers,
economists, and others whose work involves the application of probability. It was our
desire to communicate our sense of excitement for the subject of stochastic processes
to a broad community of students and researchers. Although we have often empha-
sized substance over form, the presentation is systematic and rigorous. A few proofs
are relegated to Theoretical Complements, and appropriate references for proofs are
provided for some additional advanced technical material. The book covers a sub-
stantial part of what we considered to be the core of the subject, especially from the
point of view of applications. Nearly two decades have passed since the publication
of SPWA, but the importance of the subject has only grown. We are very happy to
see that the book's rather unique style of exposition has a place in the broader applied
mathematics literature.
We would like to take this opportunity to express our gratitude to all those col-
leagues who over the years have provided us with encouragement and generous
words on this book. Special thanks are due to SIAM editors Bill Faris and Sara
Murphy for shepherding SPWA back to print.
RABI N. BHATTACHARYA
EDWARD C. WAYMIRE

Preface

This is a text on stochastic processes for graduate students in science and


engineering, including mathematics and statistics. It has become somewhat
commonplace to find growing numbers of students from outside of mathematics
enrolled along with mathematics students in our graduate courses on stochastic
processes. In this book we seek to address such a mixed audience. For this
purpose, in the main body of the text the theory is developed at a relatively
simple technical level with some emphasis on computation and examples.
Sometimes to make a mathematical argument complete, certain of the more
technical explanations are relegated to the end of the chapter under the label
theoretical complements. This approach also allows some flexibility in
instruction. A few sample course outlines have been provided to illustrate the
possibilities for designing various types of courses based on this book. The
theoretical complements also contain some supplementary results and references
to the literature.
Measure theory is used sparingly and with explanation. The instructor may
exercise control over its emphasis and use depending on the background of the
majority of the students in the class. Chapter 0 at the end of the book may be
used as a short course in measure theoretical probability for self study. In any
case we suggest that students unfamiliar with measure theory read over the first
few sections of the chapter early on in the course and look up standard results
there from time to time, as they are referred to in the text.
Chapter applications, appearing at the end of the chapters, are largely drawn
from physics, computer science, economics, and engineering. There are many
additional examples and applications illustrating the theory; they appear in the
text and among the exercises.
Some of the more advanced or difficult exercises are marked by asterisks.
Many appear with hints. Some exercises are provided to complete an argument
or statement in the text. Occasionally certain well-known results are only a few
steps away from the theory developed in the text. Such results are often cited
in the exercises, along with an outline of steps, which can be used to complete
their derivation.
Rules of cross-reference in the book are as follows. Theorem m.n, Proposition


m.n, or Corollary m.n, refers to the nth such assertion in section m of the same
chapter. Exercise n, or Example n, refers to the nth Exercise, or nth Example,
of the same section. Exercise m.n (Example m.n) refers to Exercise n (Example
n) of a different section m within the same chapter. When referring to a result
or an example in a different chapter, the chapter number is always mentioned
along with the label m.n to locate it within that chapter.
This book took a long time to write. We gratefully acknowledge research
support from the National Science Foundation and the Army Research Office
during this period. Special thanks are due to Wiley editors Beatrice Shube and
Kate Roach for their encouragement and assistance in seeing this effort through.

RABI N. BHATTACHARYA
EDWARD C. WAYMIRE

Bloomington, Indiana
Corvallis, Oregon
February 1990
Sample Course Outlines
COURSE 1
Beginning with the Simple Random Walk, this course leads through Brownian
Motion and Diffusion. It also contains an introduction to discrete/continuous-
parameter Markov Chains and Martingales. More emphasis is placed on concepts,
principles, computations, and examples than on complete proofs and technical
details.
Chapter I: 1-7 (+ informal review of Chapter 0, 4); 13 (up to Proposition 13.5)
Chapter II: 1-4; 5 (by examples); 11 (Example 2); 13
Chapter III: 1-3; 5
Chapter IV: 1-7 (quick survey by examples)
Chapter V: 1; 2 (give transience/recurrence from Proposition 2.5); 3 (informal justification of equation (3.4) only); 5-7; 10; 11 (omit proof of Theorem 11.1); 12-14
Chapter VI: 4
12-14
COURSE 2
The principal topics are the Functional Central Limit Theorem, Martingales,
Diffusions, and Stochastic Differential Equations. To complete proofs and for
supplementary material, the theoretical complements are an essential part of this
course.
Chapter I: 1-4 (quick survey); 6-10; 13
Chapter V: 1-3; 6-7; 11; 13-17
Chapter VI: 4
Chapter VII: 1-4
COURSE 3
This is a course on Markov Chains that also contains an introduction to
Martingales. Theoretical complements may be used only sparingly.

Chapter I: 1-6; 13
Chapter II: 1-9; 11; 12 or 15; 13-14
Chapter III: 1; 5
Chapter IV: 1-11
Chapter VI: 1-2; 4-5
CHAPTER I

Random Walk and Brownian Motion

1 WHAT IS A STOCHASTIC PROCESS?

Denoting by X_n the value of a stock at the nth unit of time, one may represent its (erratic) evolution by a family of random variables {X_0, X_1, ...} indexed by the discrete-time parameter n ∈ ℤ_+. The number X_t of car accidents in a city during the time interval [0, t] gives rise to a collection of random variables {X_t : t ≥ 0} indexed by the continuous-time parameter t. The velocity X_u at a point u in a turbulent wind field provides a family of random variables {X_u : u ∈ ℝ^3} indexed by a multidimensional spatial parameter u. More generally, we make the following definition.

Definition 1.1. Given an index set I, a stochastic process indexed by I is a collection of random variables {X_λ : λ ∈ I} on a probability space (Ω, ℱ, P) taking values in a set S. The set S is called the state space of the process.

In the above, one may take, respectively: (i) I = ℤ_+, S = ℝ^1; (ii) I = [0, ∞), S = ℤ_+; (iii) I = ℝ^3, S = ℝ^3. For the most part we shall study stochastic processes indexed by a one-dimensional set of real numbers (e.g., time). Here the natural ordering of numbers coincides with the sense of evolution of the process. This order is lost for stochastic processes indexed by a multidimensional parameter; such processes are usually referred to as random fields. The state space S will often be a set of real numbers, finite, countable (i.e., discrete), or uncountable. However, we also allow for the possibility of vector-valued variables. As a matter of convenience in notation, the index set is often suppressed when the context makes it clear. In particular, we often write {X_n} in place of {X_n : n = 0, 1, 2, ...} and {X_t} in place of {X_t : t ≥ 0}.
For a stochastic process the values of the random variables corresponding to the occurrence of a sample point ω ∈ Ω constitute a sample realization of the process. For example, a sample realization of the coin-tossing process corresponding to the occurrence of ω ∈ Ω is of the form (X_0(ω), X_1(ω), ..., X_n(ω), ...). In this case X_n(ω) = 1 or 0 depending on whether the outcome of the nth toss is a head or a tail. In the general case of a discrete-time stochastic process with state space S and index set I = ℤ_+ = {0, 1, 2, ...}, the sample realizations of the process are of the form (X_0(ω), X_1(ω), ..., X_n(ω), ...), X_n(ω) ∈ S. In the case of a continuous-parameter stochastic process with state space S and index set I = ℝ_+ = [0, ∞), the sample realizations are functions t → X_t(ω) ∈ S, ω ∈ Ω. Sample realizations of a stochastic process are also referred to as sample paths (see Figures 1.1a, b).

In the so-called canonical choice for Ω the sample points of Ω represent sample paths. In this way Ω is some set of functions ω defined on I taking values in S, and the value X_t(ω) of the process at time t corresponding to the outcome ω ∈ Ω is simply the coordinate projection X_t(ω) = ω_t. Canonical representations of sample points as sample paths will be used often in the text.

Stochastic models are often specified by prescribing the probabilities of events that depend only on the values of the process at finitely many time points. Such events are called finite-dimensional events. In such instances the probability measure P is only specified on a subclass 𝒞 of the events contained in a sigmafield ℱ. Probabilities of more complex events, for example events that depend on the process at infinitely many time points (infinite-dimensional events), are frequently calculated in terms of the probabilities of finite-dimensional events by passage to a limit.

[Figure 1.1: sample paths (a) and (b) of a stochastic process.]
The ideas contained in this section will be illustrated in the example below and in the exercises.

Example 1. The sample space Ω for repeated (and unending) tosses of a coin may be represented by the sequence space consisting of sequences of the form ω = (ω_1, ω_2, ..., ω_n, ...) with ω_n = 1 or ω_n = 0. For this choice of Ω, the value of X_n corresponding to the occurrence of the sample point ω ∈ Ω is simply the nth coordinate projection of ω; i.e., X_n(ω) = ω_n. Suppose that the probability of the occurrence of a head in a single toss is p. Since for any number n of tosses the results of the first n - 1 tosses have no effect on the odds of the nth toss, the random variables X_1, ..., X_n are, for each n ≥ 1, independent. Moreover, each variable has the same (Bernoulli) distribution. These facts are summarized by saying that {X_1, X_2, ...} is a sequence of independent and identically distributed (i.i.d.) random variables with a common Bernoulli distribution. Let F_n denote the event that the specific outcomes ε_1, ..., ε_n occur on the first n tosses respectively. Then

F_n = {X_1 = ε_1, ..., X_n = ε_n} = {ω ∈ Ω : ω_1 = ε_1, ..., ω_n = ε_n}

is a finite-dimensional event. By independence,

P(F_n) = p^{r_n} (1 - p)^{n - r_n},   (1.1)

where r_n is the number of 1's among ε_1, ..., ε_n. Now consider the singleton event G corresponding to the occurrence of a specific sequence of outcomes ε_1, ε_2, ..., ε_n, .... Then

G = {X_1 = ε_1, ..., X_n = ε_n, ...} = {(ε_1, ε_2, ..., ε_n, ...)}

consists of the single outcome ω = (ε_1, ε_2, ..., ε_n, ...) in Ω. G is an infinite-dimensional event whose probability is easily determined as follows. Since G ⊂ F_n for each n ≥ 1, it follows that

0 ≤ P(G) ≤ P(F_n) = p^{r_n} (1 - p)^{n - r_n}   for each n = 1, 2, ....   (1.2)

Now apply a limiting argument to see that, for 0 < p < 1, P(G) = 0. Hence the probability of every singleton event in Ω is zero.

2 THE SIMPLE RANDOM WALK

Think of a particle moving randomly among the integers according to the


following rules. At time n = 0 the particle is at the origin. At time n = 1 it
moves either one unit forward to +1 or one unit backward to -1, with respective probabilities p and q = 1 - p. In the case p = 1/2, this may be accomplished by tossing a balanced coin and making the particle move forward or backward corresponding to the occurrence of a "head" or a "tail", respectively. Similar experiments can be devised for any fractional value of p. We may think of the experiment, in any case, as that of repeatedly tossing a coin that falls "head" with probability p and shows "tail" with probability 1 - p. At time n the particle moves from its present position S_{n-1} by a unit distance forward or backward depending on the outcome of the nth toss. Suppose that X_n denotes the displacement of the particle at the nth step from its position S_{n-1} at time n - 1. According to these considerations the displacement (or increment) process {X_n} associated with {S_n} is an i.i.d. sequence with P(X_n = +1) = p, P(X_n = -1) = q = 1 - p for each n ≥ 1. The position process {S_n} is then given by

S_n := X_1 + ⋯ + X_n,   S_0 = 0.   (2.1)

Definition 2.1. The stochastic process {S_n : n = 0, 1, 2, ...} is called the simple random walk. The related process S_n^x = S_n + x, n = 0, 1, 2, ..., is called the simple random walk starting at x.

The simple random walk is often used by physicists as an approximate model of the fluctuations in the position of a relatively large solute molecule immersed in a pure fluid. According to Einstein's diffusion theory, the solute molecule gets kicked around by the smaller molecules of the fluid whenever it gets within the range of molecular interaction with fluid molecules. Displacements in any one direction (say, the vertical direction) due to successive collisions are small and taken to be independent. We shall return to this physical model in Section 7. One may also think of X_n as a gambler's gain in the nth game of a series of independent and stochastically identical games; a negative gain means a loss. Then S_0^x = x is the gambler's initial capital, and S_n^x is the capital, positive or negative, at time n.
The first problem is to calculate the distribution of S_n^x. To calculate the probability of {S_n^x = y}, count the number u of +1's in a path from x to y in n steps. Since n - u is then the number of -1's, one must have u - (n - u) = y - x, or u = (n + y - x)/2. For this, n and y - x must be both even or both odd, and |y - x| ≤ n. Hence

P(S_n^x = y) = \binom{n}{(n+y-x)/2} p^{(n+y-x)/2} q^{(n-y+x)/2}   if |y - x| ≤ n and y - x, n have the same parity,
            = 0   otherwise.   (2.2)
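Formula (2.2) is easy to check numerically. The following Python sketch (ours, not the book's; the names and parameter values are illustrative) compares (2.2) with a Monte Carlo estimate:

    import math
    import random

    def pmf_simple_walk(n, x, y, p):
        # P(S_n^x = y) from Eq. (2.2)
        d = y - x
        if abs(d) > n or (n + d) % 2 != 0:
            return 0.0
        u = (n + d) // 2                     # number of +1 steps
        return math.comb(n, u) * p**u * (1.0 - p)**(n - u)

    n, x, y, p, trials = 10, 0, 2, 0.6, 200_000
    hits = 0
    for _ in range(trials):
        s = x + sum(1 if random.random() < p else -1 for _ in range(n))
        hits += (s == y)
    print(pmf_simple_walk(n, x, y, p), hits / trials)   # the two values should be close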
3 TRANSIENCE AND RECURRENCE PROPERTIES OF THE SIMPLE RANDOM WALK

Let us first consider the manner in which a particle escapes from an interval. Let T_y^x denote the first time that the process starting at x reaches y, i.e.,

T_y^x := min{n ≥ 0 : S_n^x = y}.   (3.1)

To avoid trivialities, assume 0 < p < 1. For integers c and d with c < d, denote

φ(x) := P(T_d^x < T_c^x).   (3.2)

In other words, φ(x) is the probability that the particle starting at x reaches d before it reaches c. Since in one step the particle moves to x + 1 with probability p, or to x - 1 with probability q, one has

φ(x) = pφ(x + 1) + qφ(x - 1),   (3.3)

so that

φ(x + 1) - φ(x) = (q/p)[φ(x) - φ(x - 1)],   c + 1 ≤ x ≤ d - 1,
φ(c) = 0,   φ(d) = 1.   (3.4)

Thus, φ(x) is the solution to the discrete boundary-value problem (3.4). For p ≠ q, Eq. 3.4 yields

φ(x) = Σ_{y=c}^{x-1} [φ(y + 1) - φ(y)] = Σ_{y=c}^{x-1} (q/p)^{y-c} [φ(c + 1) - φ(c)]
     = φ(c + 1) (1 - (q/p)^{x-c}) / (1 - (q/p)).   (3.5)

To determine φ(c + 1) take x = d in Eq. 3.5 to get

1 = φ(d) = φ(c + 1) (1 - (q/p)^{d-c}) / (1 - (q/p)).

Then

φ(c + 1) = (1 - (q/p)) / (1 - (q/p)^{d-c}),

so that

P(T_d^x < T_c^x) = (1 - (q/p)^{x-c}) / (1 - (q/p)^{d-c})   for c ≤ x ≤ d, p ≠ q.   (3.6)

Now let

ψ(x) := P(T_c^x < T_d^x).   (3.7)

By symmetry (or the same method as above),

P(T_c^x < T_d^x) = (1 - (p/q)^{d-x}) / (1 - (p/q)^{d-c})   for c ≤ x ≤ d, p ≠ q.   (3.8)
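As a numerical illustration of (3.6) (a sketch of ours, not part of the text), one can run the walk until it exits [c, d] and compare the empirical frequency of exit at d with the closed form:

    import random

    def prob_hit_d_first(c, d, x, p, trials=100_000):
        # Monte Carlo estimate of phi(x) = P(T_d^x < T_c^x)
        hit_d = 0
        for _ in range(trials):
            s = x
            while c < s < d:
                s += 1 if random.random() < p else -1
            hit_d += (s == d)
        return hit_d / trials

    c, d, x, p = 0, 10, 3, 0.55
    q = 1.0 - p
    phi = (1 - (q / p)**(x - c)) / (1 - (q / p)**(d - c))   # Eq. (3.6), valid for p != q
    print(phi, prob_hit_d_first(c, d, x, p))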

Note that φ(x) + ψ(x) = 1, proving that the particle starting in the interior of [c, d] will eventually reach the boundary (i.e., either c or d) with probability 1. Now if c < x, then (Exercise 3)

P({S_n^x} will ever reach c) = P(T_c^x < ∞) = lim_{d→∞} ψ(x)
= lim_{d→∞} (1 - (p/q)^{d-x}) / (1 - (p/q)^{d-c})
= (q/p)^{x-c}   if p > 1/2,
= 1   if p < 1/2.   (3.9)

By symmetry, or as above,

P({S_n^x} will ever reach d) = P(T_d^x < ∞) = 1   if p > 1/2,
                                            = (p/q)^{d-x}   if p < 1/2.   (3.10)

Observe that one gets from these calculations the (geometric) distribution function for the extremes M^x = sup_n S_n^x and m^x = inf_n S_n^x (Exercise 7).

Note that, by the strong law of large numbers (Chapter 0),

P( S_n^x / n = (x + S_n)/n → p - q as n → ∞ ) = 1.   (3.11)

Hence, if p > q, then the random walk drifts to +∞ (i.e., S_n^x → +∞) with probability 1. In particular, the process is certain to reach d > x if p > q. Similarly, if p < q, then the random walk drifts to -∞ (i.e., S_n^x → -∞), and starting at x > c the process is certain to reach c if p < q. In either case, no matter what the integer y is,

P(S_n^x = y i.o.) = 0,   if p ≠ q,   (3.12)

where i.o. is shorthand for "infinitely often." For if S_{n_k}^x = y for integers n_1 < n_2 < ⋯ through a sequence going to infinity, then

S_{n_k}^x / n_k = y / n_k → 0 as n_k → ∞,

the probability of which is zero by Eq. 3.11.

Definition 3.1. A state y for which Eq. 3.12 holds is called transient. If all
states are transient then the stochastic process is said to be a transient process.

In the case p = q = 1/2, according to the boundary-value problem (3.4), the graph of φ(x) is along the line of constant slope between the points (c, 0) and (d, 1). Thus,

P(T_d^x < T_c^x) = (x - c)/(d - c),   c ≤ x ≤ d, p = q = 1/2.   (3.13)

Similarly,

P(T_c^x < T_d^x) = (d - x)/(d - c),   c ≤ x ≤ d, p = q = 1/2.   (3.14)

Again we have

φ(x) + ψ(x) = 1.   (3.15)

Moreover, in this case, given any initial position x > c,

P({S_n^x} will eventually reach c) = P(T_c^x < ∞)
= lim_{d→∞} P({S_n^x} will reach c before it reaches d)
= lim_{d→∞} (d - x)/(d - c) = 1.   (3.16)
Similarly, whatever the initial position x < d,

P({S_n^x} will eventually reach d) = P(T_d^x < ∞) = lim_{c→-∞} (x - c)/(d - c) = 1.   (3.17)

Thus, no matter where the particle may be initially, it will eventually reach any given state y with probability 1. After having reached y for the first time, it will move to y + 1 or to y - 1. From either of these positions the particle is again bound to reach y with probability 1, and so on. In other words (Exercise 4),

P(S_n^x = y i.o.) = 1,   if p = q = 1/2.   (3.18)

This argument is discussed again in Example 4.1 of Chapter II.

Definition 3.2. A state y for which Eq. 3.18 holds is called recurrent. If all
states are recurrent, then the stochastic process is called a recurrent process.

Let η^x denote the time of the first return to x,

η^x := inf{n ≥ 1 : S_n^x = x}.   (3.19)

Then, conditioning on the first step, it will follow (Exercise 6) that

P(η^x < ∞) = 2 min(p, q).   (3.20)
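Formula (3.20) can also be checked approximately by simulation (a sketch of ours): since {η^x < ∞} is an infinite-dimensional event, one truncates at a horizon N, noting that P(η^x ≤ N) increases to 2 min(p, q) as N grows.

    import random

    def return_by_horizon(p, horizon=1000, trials=10_000):
        # estimate P(eta <= horizon), a lower approximation to P(eta < infinity)
        returned = 0
        for _ in range(trials):
            s = 0
            for _ in range(horizon):
                s += 1 if random.random() < p else -1
                if s == 0:
                    returned += 1
                    break
        return returned / trials

    p = 0.6
    print(2 * min(p, 1 - p), return_by_horizon(p))   # 0.8 vs. the truncated estimate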

4 FIRST PASSAGE TIMES FOR THE SIMPLE RANDOM WALK


Consider the random variable T_y := T_y^0 representing the first time the simple random walk starting at zero reaches the level (state) y. We will calculate the distribution of T_y by means of an analysis of the sample paths of the simple random walk. Let F_{N,y} = {T_y = N} denote the event that the particle reaches state y for the first time at the Nth step. Then,

F_{N,y} = {S_n ≠ y for n = 0, 1, ..., N - 1, S_N = y}.   (4.1)

Note that "S_N = y" means that there are (N + y)/2 plus 1's and (N - y)/2 minus 1's among X_1, X_2, ..., X_N (see Eq. 2.1). Therefore, we assume that |y| ≤ N and N + y is even. Now there are as many paths leading from (0, 0) to (N, y) as there are ways of choosing (N + y)/2 plus 1's among X_1, X_2, ..., X_N, namely

\binom{N}{(N+y)/2}.
Each of these choices has the same probability of occurrence, specifically p^{(N+y)/2} q^{(N-y)/2}. Thus,

P(F_{N,y}) = L p^{(N+y)/2} q^{(N-y)/2},   (4.2)

where L is the number of paths from (0, 0) to (N, y) that do not touch or cross the level y prior to time N. To calculate L, consider the complementary number L' of paths that do reach y prior to time N,

L' = \binom{N}{(N+y)/2} - L.   (4.3)

First consider the case of y > 0. If a path from (0, 0) to (N, y) has reached y prior to time N, then either (a) S_{N-1} = y + 1 (see Figure 4.1a) or (b) S_{N-1} = y - 1 and the path from (0, 0) to (N - 1, y - 1) has reached y prior to time N - 1 (see Figure 4.1b). The contribution to L' from (a) is

\binom{N-1}{(N+y)/2}.

We need to calculate the contribution to L' from (b).

[Figure 4.1]


Proposition 4.1. (A Reflection Principle). Let y > 0. The collection of all paths from (0, 0) to (N - 1, y - 1) that touch or cross the level y prior to time N - 1 is in one-to-one correspondence with the collection of all possible paths from (0, 0) to (N - 1, y + 1).

Proof. Given a path γ from (0, 0) to (N - 1, y + 1), there is a first time τ at which the path reaches level y. Let γ' denote the path which agrees with γ up to time τ but is thereafter the mirror reflection of γ about the level y (see Figure 4.2). Then γ' is a path from (0, 0) to (N - 1, y - 1) that touches or crosses the level y prior to time N - 1. Conversely, a path from (0, 0) to (N - 1, y - 1) that touches or crosses the level y prior to time N - 1 may be reflected to get a path from (0, 0) to (N - 1, y + 1). This reflection transformation establishes the one-to-one correspondence. ∎

It now follows from the reflection principle that the contribution to L' from (b) is

\binom{N-1}{(N+y)/2}.

Hence

L' = 2 \binom{N-1}{(N+y)/2}.   (4.4)

Therefore, by (4.3), (4.2),

P(T_y = N) = P(F_{N,y}) = [ \binom{N}{(N+y)/2} - 2 \binom{N-1}{(N+y)/2} ] p^{(N+y)/2} q^{(N-y)/2}

[Figure 4.2]
= (|y|/N) \binom{N}{(N+y)/2} p^{(N+y)/2} q^{(N-y)/2}   for N ≥ y, y + N even, y > 0.   (4.5)

To calculate P(T_y = N) for y < 0, simply relabel H as T and T as H (i.e., interchange +1, -1). Using this new code, the desired probability is given by replacing y by -y and interchanging p, q in (4.5), i.e.,

P(T_y = N) = (|y|/N) \binom{N}{(N-y)/2} q^{(N-y)/2} p^{(N+y)/2}.

Thus, for all integers y ≠ 0, one has

P(T_y = N) = (|y|/N) \binom{N}{(N+y)/2} p^{(N+y)/2} q^{(N-y)/2} = (|y|/N) P(S_N = y)   (4.6)

for N = |y|, |y| + 2, |y| + 4, .... In particular, if p = q = 1/2, then (4.6) yields

P(T_y = N) = (|y|/N) \binom{N}{(N+y)/2} 2^{-N}   for N = |y|, |y| + 2, |y| + 4, ....   (4.7)

However, observe that the expected time to reach y is infinite since, by Stirling's formula k! = (2πk)^{1/2} k^k e^{-k} (1 + o(1)) as k → ∞, the tail of the p.m.f. of T_y is of the order of N^{-3/2} as N → ∞ (Exercise 10).
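A quick numerical check of (4.6) (a sketch of ours, with illustrative parameters): compare (|y|/N) P(S_N = y) with the empirical frequency of first passage at exactly step N.

    import math
    import random

    def first_passage_pmf(y, N, p):
        # P(T_y = N) = (|y|/N) C(N, (N+y)/2) p^{(N+y)/2} q^{(N-y)/2}, Eq. (4.6)
        if N < abs(y) or (N + y) % 2 != 0:
            return 0.0
        u = (N + y) // 2
        return abs(y) / N * math.comb(N, u) * p**u * (1 - p)**(N - u)

    y, N, p, trials = 3, 9, 0.5, 200_000
    count = 0
    for _ in range(trials):
        s = 0
        for step in range(1, N + 1):
            s += 1 if random.random() < p else -1
            if s == y:
                count += (step == N)   # first passage occurred at exactly step N
                break
    print(first_passage_pmf(y, N, p), count / trials)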

5 MULTIDIMENSIONAL RANDOM WALKS

The k-dimensional unrestricted simple symmetric random walk describes the motion of a particle moving randomly on the integer lattice ℤ^k according to the following rules. Starting at a site x = (x_1, ..., x_k) with integer coordinates, the particle moves to a neighboring site in one of the 2k coordinate directions randomly selected with probability 1/2k, and so on, independently of previous displacements. The displacement at the nth step is a random variable X_n whose possible values are vectors of the form ±e_i, i = 1, ..., k, where the jth component of e_i is 1 for j = i and 0 otherwise. X_1, X_2, ... are i.i.d. with

P(X_n = e_i) = P(X_n = -e_i) = 1/2k   for i = 1, ..., k.   (5.1)


The corresponding position process is defined by

S_0^x = x,   S_n^x = x + X_1 + ⋯ + X_n,   n ≥ 1.   (5.2)

The case k = 1 is that already treated in the preceding sections with p = q = 1/2. In particular, for k = 1 we know that the simple symmetric random walk is recurrent.

Consider the coordinates of X_n = (X_n^{(1)}, ..., X_n^{(k)}). Although X_n^{(i)} and X_n^{(j)} are not independent, notice that they are uncorrelated for i ≠ j. Likewise, it follows that the coordinates of the position vector S_n^x = (S_n^{x,1}, ..., S_n^{x,k}) are uncorrelated. In particular,

E S_n^x = x,   Cov(S_n^{x,i}, S_n^{x,j}) = n if i = j,   = 0 if i ≠ j.   (5.3)

Therefore the covariance matrix of S_n^x is nI, where I is the k × k identity matrix.

The problem of describing the recurrence properties of the simple symmetric random walk in k dimensions is solved by the following theorem of Pólya.

Theorem 5.1. (Pólya). {S_n^x} is recurrent for k = 1, 2 and transient for k ≥ 3.

Proof. The result has already been obtained for k = 1. In general, let S_n = S_n^0 and write

r_n = P(S_n = 0),
f_n = P(S_n = 0 for the first time after time 0 at n),   n ≥ 1.   (5.4)

Then we get the convolution equation

r_n = Σ_{j=0}^n f_j r_{n-j}   for n = 1, 2, ...,   r_0 = 1, f_0 = 0.   (5.5)

Let r̂(s) and f̂(s) denote the respective probability generating functions of {r_n} and {f_n} defined by

r̂(s) = Σ_{n=0}^∞ r_n s^n,   f̂(s) = Σ_{n=0}^∞ f_n s^n   (0 ≤ s < 1).   (5.6)

The convolution equation (5.5) transforms as

r̂(s) = 1 + Σ_{n=1}^∞ Σ_{j=0}^n f_j r_{n-j} s^j s^{n-j} = 1 + (Σ_{m=0}^∞ r_m s^m)(Σ_{j=0}^∞ f_j s^j) = 1 + r̂(s) f̂(s).   (5.7)

Therefore,

r̂(s) = 1 / (1 - f̂(s)).   (5.8)

The probability of eventual return to the origin is given by

γ := Σ_{n=1}^∞ f_n = f̂(1).   (5.9)

Note that by the Monotone Convergence Theorem (Chapter 0), r̂(s) ↑ r̂(1) and f̂(s) ↑ f̂(1) as s ↑ 1. If f̂(1) < 1, then r̂(1) = (1 - f̂(1))^{-1} < ∞. If f̂(1) = 1, then r̂(1) = lim_{s↑1} (1 - f̂(s))^{-1} = ∞. Therefore, γ < 1 (i.e., 0 is transient) if and only if ρ := r̂(1) < ∞.
This criterion is applied to the case k = 2 as follows. Since a return to 0 is possible at time 2n if and only if the numbers of steps among the 2n in the positive horizontal and vertical directions equal the respective numbers of steps in the negative directions,

r_{2n} = 4^{-2n} Σ_{j=0}^n (2n)! / (j! j! (n-j)! (n-j)!) = (1/4^{2n}) \binom{2n}{n} Σ_{j=0}^n \binom{n}{j} \binom{n}{n-j} = (1/4^{2n}) \binom{2n}{n}^2.   (5.10)

The combinatorial identity used to get the last line of (5.10) follows by considering the number of ways of selecting samples of size n from a population of n objects of type 1 and n objects of type 2 (Exercise 2). Apply Stirling's formula to (5.10) to get r_{2n} = O(1/n) with r_{2n} ≥ c/n for some c > 0. Therefore, ρ = r̂(1) = +∞ and so 0 is recurrent in the case k = 2.
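Relations (5.8) and (5.10) also permit a numerical look at the planar case (a sketch of ours): truncating r̂(1) ≈ Σ_{n ≤ M} r_{2n} and using γ ≈ 1 - 1/r̂(1) shows the return probability creeping toward 1, though only logarithmically fast, since r_{2n} behaves like 1/(πn).

    def gamma_estimate_2d(M):
        # gamma ≈ 1 - 1/(sum of r_{2n}, n <= M), using the recursion
        # r_{2n} = r_{2(n-1)} * ((2n-1)/(2n))^2 implied by r_{2n} = 4^{-2n} C(2n,n)^2
        r, rhat = 1.0, 1.0        # r_0 = 1
        for n in range(1, M + 1):
            r *= ((2 * n - 1) / (2 * n))**2
            rhat += r
        return 1.0 - 1.0 / rhat

    for M in (10, 1000, 100_000):
        print(M, gamma_estimate_2d(M))   # increases slowly toward gamma = 1 (recurrence)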
In the case k = 3, similar considerations of "coordinate balance" give

r_{2n} = 6^{-2n} Σ_{(j,m): j+m ≤ n} (2n)! / (j! j! m! m! (n-j-m)! (n-j-m)!)
       = (1/2^{2n}) \binom{2n}{n} Σ_{j+m ≤ n} { n! / (j! m! (n-j-m)!) · (1/3^n) }^2.   (5.11)

Therefore, writing

p_{j,m} = n! / (j! m! (n-j-m)!) · (1/3^n),

and noting that these are the probabilities for the trinomial distribution, we have that

r_{2n} = (1/2^{2n}) \binom{2n}{n} Σ_{j,m} (p_{j,m})^2   (5.12)

is nearly an average of p_{j,m}'s (with respect to the p_{j,m} distribution). In any case,

r_{2n} ≤ (1/2^{2n}) \binom{2n}{n} [max_{j,m} p_{j,m}] Σ_{j,m} p_{j,m} = (1/2^{2n}) \binom{2n}{n} max_{j,m} p_{j,m}.   (5.13)

The maximum value of p_{j,m} is attained at j and m nearest to n/3 (Exercise 5). Therefore, writing [x] for the integer part of x,

r_{2n} ≤ (1/2^{2n}) \binom{2n}{n} · n! / ([n/3]! [n/3]! (n - 2[n/3])!) · (1/3^n).   (5.14)

Apply Stirling's formula to get (see 5.19 below)

r_{2n} ≤ C / n^{3/2}   for some C > 0.   (5.15)

In particular,

Σ_n r_n < ∞.   (5.16)

The general case, r_{2n} ≤ c_k n^{-k/2} for k ≥ 3, is left as an exercise (Exercise 1).

The constants appearing in the estimate (5.15) are easily computed from the monotonicity of the ratio n! / {(2πn)^{1/2} n^n e^{-n}}, whose limit as n → ∞ is 1 according to Stirling's formula. To see that the ratio is monotonically decreasing, simply observe that

log [ n! / ((2πn)^{1/2} n^n e^{-n}) ] = log n! - (1/2) log n - {n log n - n} - log(2π)^{1/2}
= Σ_{j=2}^n { (log(j-1) + log j)/2 } - ∫_1^n log x dx + 1 - log(2π)^{1/2},   (5.17)

where the integral term may be checked by integration by parts. The point is that the term defined by

T_n := Σ_{j=2}^n (log(j-1) + log j)/2   (5.18)

provides the inner trapezoidal approximation to the area under the curve y = log x, 1 ≤ x ≤ n. Thus, in particular, a simple sketch shows that

0 ≤ ∫_1^n log x dx - T_n

is monotonically increasing in n. So, in addition to the asymptotic value of the ratio, one also has

1 < n! / ((2πn)^{1/2} n^n e^{-n}) ≤ e / (2π)^{1/2},   n = 1, 2, ....   (5.19)

6 CANONICAL CONSTRUCTION OF STOCHASTIC PROCESSES

Often a stochastic process is defined on a given probability space as a sequence of functions of other already constructed random variables. For example, the simple random walk {S_n = X_1 + ⋯ + X_n}, S_0 = 0, is defined in terms of the coin-tossing process {X_n} in Section 2. At other times, a probability space is constructed specifically to define the stochastic process. For example, the probability space for the coin-tossing process was constructed starting from the specifications of the probabilities of finite sequences of heads and tails. This latter method, called the canonical construction, is elaborated upon in this section.

Consider the case that the state space is ℝ^1 (or a subset of it) and the parameter is discrete (n = 1, 2, ...). Take Ω to be the space of all sample paths; i.e., Ω := (ℝ^1)^∞ := ℝ^∞ is the space of all sequences ω = (ω_1, ω_2, ...) of real numbers. The appropriate sigmafield ℱ := ℬ^∞ is then the smallest sigmafield containing all finite-dimensional sets of the form {ω ∈ Ω : ω_1 ∈ B_1, ..., ω_k ∈ B_k}, where B_1, ..., B_k are Borel subsets of ℝ^1. The coordinate functions X_n are defined by X_n(ω) = ω_n.

As in the case of coin tossing, the underlying physical process sometimes suggests a specification of probabilities of finite-dimensional events defined by the values of the process at time points 1, 2, ..., n for each n ≥ 1. That is, for each n ≥ 1 a probability measure P_n is prescribed on (ℝ^n, ℬ^n). The problem is that we require a probability measure P on (Ω, ℱ) such that P_n is the distribution of (X_1, ..., X_n). That is, for all Borel sets B_1, ..., B_n,

P(ω ∈ Ω : ω_1 ∈ B_1, ..., ω_n ∈ B_n) = P_n(B_1 × ⋯ × B_n).   (6.1)

Equivalently,

P(X_1 ∈ B_1, ..., X_n ∈ B_n) = P_n(B_1 × ⋯ × B_n).   (6.2)

Since the events {X_1 ∈ B_1, ..., X_n ∈ B_n, X_{n+1} ∈ ℝ^1} and {X_1 ∈ B_1, ..., X_n ∈ B_n} are identical subsets of Ω, for there to be a well-defined probability measure P prescribed by (6.1) or (6.2) it is necessary that

P_{n+1}(B_1 × ⋯ × B_n × ℝ^1) = P_n(B_1 × ⋯ × B_n)   (6.3)

for all Borel sets B_1, ..., B_n in ℝ^1 and n ≥ 1. Kolmogorov's Existence Theorem asserts that the consistency condition (6.3) is also sufficient for such a probability measure P to exist and that there is only one such P on (ℝ^∞, ℬ^∞) = (Ω, ℱ) (theoretical complement 1). This holds more generally, for example, when the state space S is ℝ^k, a countable set, or any Borel subset of ℝ^k. A proof for the simple case of finite state processes is outlined in Exercise 3.

Example 1. Consider the problem of canonically constructing a sequence X_1, X_2, ... of i.i.d. random variables having the common (marginal) distribution Q on (ℝ^1, ℬ^1). Take Ω = ℝ^∞, ℱ = ℬ^∞, and X_n the nth coordinate projection X_n(ω) = ω_n, ω ∈ Ω. Define, for each n ≥ 1 and all Borel sets B_1, ..., B_n,

P_n(B_1 × ⋯ × B_n) = Q(B_1) ⋯ Q(B_n).   (6.4)

Since Q(ℝ^1) = 1, the consistency condition (6.3) follows immediately from the definition (6.4). Now one simply invokes the Kolmogorov Existence Theorem to get a probability measure P on (Ω, ℱ) such that

P(X_1 ∈ B_1, ..., X_n ∈ B_n) = Q(B_1) ⋯ Q(B_n) = P(X_1 ∈ B_1) ⋯ P(X_n ∈ B_n).   (6.5)

The simple random walk can be constructed within the framework of the canonical probability space (Ω, ℱ, P) constructed for coin tossing, although this is a noncanonical probability space for {S_n}. Alternatively, a canonical construction can be made directly for {S_n} (Exercise 2(i)). This, on the other hand, provides a noncanonical probability space for the displacement (coin-tossing) process defined by the differences X_n = S_n - S_{n-1}, n ≥ 1.

Example 2. The problem is to construct a Gaussian stochastic process having prescribed means and covariances. Suppose that we are given a sequence of real numbers μ_1, μ_2, ..., and an array σ_{ij}, i, j = 1, 2, ..., of real numbers satisfying

(Symmetry)   σ_{ij} = σ_{ji} for all i, j,   (6.6)

(Nonnegative Definiteness)   Σ_{i,j=1}^n σ_{ij} x_i x_j ≥ 0 for all n-tuples (x_1, ..., x_n) in ℝ^n.   (6.7)

Property (6.7) is the condition that D_n = ((σ_{ij}))_{1 ≤ i,j ≤ n} be a nonnegative definite matrix for each n. Again take Ω = ℝ^∞, ℱ = ℬ^∞, and X_1, X_2, ... the respective coordinate projections. For each n ≥ 1, let P_n be the n-dimensional Gaussian distribution on (ℝ^n, ℬ^n) having mean vector (μ_1, ..., μ_n) and covariance matrix D_n. Since a linear transformation of a Gaussian random vector is also Gaussian, the consistency condition (6.3) can be checked by applying the coordinate projection mapping (x_1, ..., x_{n+1}) → (x_1, ..., x_n) from ℝ^{n+1} to ℝ^n (Exercise 1).
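As a concrete illustration (a NumPy sketch of ours; the choice σ_{ij} = min(t_i, t_j), the covariance of Brownian motion at times t_i, is our assumed example), a Gaussian vector with mean (μ_1, ..., μ_n) and covariance D_n can be sampled through a Cholesky factorization D_n = AA', again using the fact that a linear transformation of a Gaussian vector is Gaussian:

    import numpy as np

    t = np.array([0.5, 1.0, 2.0, 3.0])      # finitely many index points
    mu = np.zeros(len(t))                   # prescribed means
    D = np.minimum.outer(t, t)              # sigma_ij = min(t_i, t_j): symmetric, nonneg. definite
    A = np.linalg.cholesky(D)               # D = A A'
    rng = np.random.default_rng(0)
    samples = mu + rng.standard_normal((100_000, len(t))) @ A.T
    print(np.cov(samples, rowvar=False))    # empirical covariance, close to D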

Example 3. Let S be a countable set and let p = ((p_{ij})) be a matrix of nonnegative real numbers such that for each fixed i, (p_{ij} : j ∈ S) is a probability distribution (sums to 1 over j in S). Let π = (π_i) be a probability distribution on S. By the Kolmogorov Existence Theorem there is a probability distribution P_π on the infinite sequence space Ω = S × S × ⋯ such that

P_π(X_0 = j_0, ..., X_n = j_n) = π_{j_0} p_{j_0 j_1} ⋯ p_{j_{n-1} j_n},

where X_n denotes the nth projection map (Exercise 2(ii)). In this case the process {X_n}, having distribution P_π, is called a Markov chain. These processes are the subject of Chapter II.
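The finite-dimensional formula for P_π translates directly into a simulation recipe (a minimal sketch of ours; the two-state chain below is our own illustration): draw X_0 from π, then repeatedly draw the next state from the current row of p.

    import random

    def simulate_chain(pi, p, n_steps):
        # draw X_0 ~ pi, then X_{k+1} ~ p[X_k], matching P_pi(X_0 = j_0, ..., X_n = j_n)
        states = range(len(pi))
        x = random.choices(states, weights=pi)[0]
        path = [x]
        for _ in range(n_steps):
            x = random.choices(states, weights=p[x])[0]
            path.append(x)
        return path

    pi = [0.5, 0.5]
    p = [[0.9, 0.1],
         [0.2, 0.8]]       # each row sums to 1
    print(simulate_chain(pi, p, 20))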

7 BROWNIAN MOTION

Perhaps the simplest way to introduce the continuous-parameter stochastic process known as Brownian motion is to view it as the limiting form of an unrestricted random walk. To physically motivate the discussion, suppose a solute particle immersed in a liquid suffers, on the average, f collisions per second with the molecules of the surrounding liquid. Assume that a collision causes a small random displacement of the solute particle that is independent of its present position. Such an assumption can be justified in the case that the solute particle is much heavier than a molecule of the surrounding liquid. For simplicity, consider displacements in one particular direction, say the vertical direction, and assume that each displacement is either +Δ or -Δ with probabilities p and q = 1 - p, respectively. The particle then performs a one-dimensional random walk with step size Δ. Assume for the present that the vessel is very large so that the random walk initiated far away from the boundary may be considered to be unrestricted. Suppose at time zero the particle is at the position x relative to some origin. At time t > 0 it has suffered approximately n = tf independent displacements, say Z_1, Z_2, ..., Z_n. Since f is extremely large (of the order of 10^21), if t is of the order of 10^{-10} second then n is very large. The position of the particle at time t, being x plus the sum of n independent Bernoulli random variables, is, by the central limit theorem, approximately Gaussian with mean x + tf(p - q)Δ and variance 4tf pqΔ^2. To make the limiting argument firm, let
p = 1/2 + μ/(2σ√f)   and   Δ = σ/√f.

Here μ and σ are two fixed numbers, σ > 0. Then as f → ∞, the mean displacement tf(p - q)Δ converges to tμ and the variance converges to tσ^2. In the limit, then, the position X_t of the particle at time t > 0 is Gaussian with probability density function (in y) given by

p(t; x, y) = (2πσ^2 t)^{-1/2} exp{ -(y - x - μt)^2 / (2σ^2 t) }.   (7.1)

If s > 0 then X_{t+s} - X_t is the sum of displacements during the time interval (t, t + s]. Therefore, by the argument above, X_{t+s} - X_t is Gaussian with mean sμ and variance sσ^2, and it is independent of {X_u : 0 ≤ u ≤ t}. In particular, for every finite set of time points 0 < t_1 < t_2 < ⋯ < t_k, the random variables X_{t_1}, X_{t_2} - X_{t_1}, ..., X_{t_k} - X_{t_{k-1}} are independent. A stochastic process with this last property is said to be a process with independent increments. This is the continuous-time analogue of random walks. From the physical description of the process {X_t} as representing (a coordinate of) the path of a diffusing solute particle, one would expect that the sample paths of the process (i.e., the trajectories t → X_t(ω) = ω_t) may be taken to be continuous. That this is indeed the case is an important mathematical result originally due to Norbert Wiener. For this reason, Brownian motion is also called the Wiener process. A complete definition of Brownian motion goes as follows.

Definition 7.1. A Brownian motion with drift μ and diffusion coefficient σ^2 is a stochastic process {X_t : t ≥ 0} having continuous sample paths and independent Gaussian increments with mean and variance of an increment X_{t+s} - X_t being sμ and sσ^2, respectively. If X_0 = x, then this Brownian motion is said to start at x. A Brownian motion with zero drift and diffusion coefficient of 1 is called the standard Brownian motion.
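Definition 7.1 yields an immediate simulation recipe (a sketch of ours, not the book's): over a time grid of spacing h, the increments of {X_t} are i.i.d. Gaussian with mean μh and variance σ^2 h.

    import numpy as np

    def brownian_path(x, mu, sigma2, T=1.0, n=1000, seed=0):
        # X_0 = x plus cumulative sums of independent N(mu*h, sigma2*h) increments
        rng = np.random.default_rng(seed)
        h = T / n
        increments = rng.normal(mu * h, np.sqrt(sigma2 * h), size=n)
        return np.concatenate(([x], x + np.cumsum(increments)))

    path = brownian_path(x=0.0, mu=0.5, sigma2=2.0)
    print(path[:5])   # values of the path on the grid 0, h, 2h, ...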

Families of random variables {X_t} constituting Brownian motions arise in many different contexts on diverse probability spaces. The canonical model for Brownian motion is given as follows.

1. The sample space Ω := C[0, ∞) is the set of all real-valued continuous functions on the time interval [0, ∞). This is the set of all possible trajectories (sample paths) of the process.
2. X_t(ω) := ω_t is the value of the sample path ω at time t.
3. Ω is equipped with the smallest sigmafield ℱ of subsets of Ω containing the class ℱ_0 of all finite-dimensional sets of the form F = {ω ∈ Ω : a_i < ω_{t_i} < b_i, i = 1, 2, ..., k}, where a_i < b_i are constants and 0 < t_1 < t_2 < ⋯ < t_k are a finite set of time points. ℱ is said to be generated by ℱ_0.

4. The existence and uniqueness of a probability measure P_x on ℱ, called the Wiener measure starting at x, as specified by Definition 7.1, is determined by the probability assignments of the form of (7.2) below.

For the set F above, P_x(F) can be calculated as follows. Definition 7.1 gives the joint density of X_{t_1}, X_{t_2} - X_{t_1}, ..., X_{t_k} - X_{t_{k-1}} as that of k independent Gaussian random variables with means t_1 μ, (t_2 - t_1)μ, ..., (t_k - t_{k-1})μ, respectively, and variances t_1 σ^2, (t_2 - t_1)σ^2, ..., (t_k - t_{k-1})σ^2, respectively. Transforming this (product) joint density, say in variables z_1, z_2, ..., z_k, by the change of variables z_1 = y_1, z_2 = y_2 - y_1, ..., z_k = y_k - y_{k-1}, and using the fact that the Jacobian of this linear transformation is unity, one obtains

P_x(a_i < X_{t_i} < b_i for i = 1, 2, ..., k)
= ∫_{a_1}^{b_1} ⋯ ∫_{a_k}^{b_k} (2πσ^2 t_1)^{-1/2} exp{ -(y_1 - x - t_1 μ)^2 / (2σ^2 t_1) }
  × (2πσ^2 (t_2 - t_1))^{-1/2} exp{ -(y_2 - y_1 - (t_2 - t_1)μ)^2 / (2σ^2 (t_2 - t_1)) } ⋯
  × (2πσ^2 (t_k - t_{k-1}))^{-1/2} exp{ -(y_k - y_{k-1} - (t_k - t_{k-1})μ)^2 / (2σ^2 (t_k - t_{k-1})) } dy_k dy_{k-1} ⋯ dy_1.   (7.2)

The joint density of X_{t_1}, X_{t_2}, ..., X_{t_k} is the integrand in (7.2) and may be expressed, using (7.1), as

p(t_1; x, y_1) p(t_2 - t_1; y_1, y_2) ⋯ p(t_k - t_{k-1}; y_{k-1}, y_k).   (7.3)

The probabilities of a number of infinite-dimensional events will be calculated in Sections 9-13, and in Chapter IV. Some further discussion of mathematical issues in this connection is presented in Section 8 also. The details of a construction of the Brownian motion and its Wiener measure distribution are given in the theoretical complements of Section 13.

If {X_t^{(j)}}, j = 1, 2, ..., k, are k independent standard Brownian motions, then the vector-valued process {X_t} = {(X_t^{(1)}, X_t^{(2)}, ..., X_t^{(k)})} is called a standard k-dimensional Brownian motion. If {X_t} is a standard k-dimensional Brownian motion, μ = (μ^{(1)}, ..., μ^{(k)}) a vector in ℝ^k, and A a k × k nonsingular matrix, then the vector-valued process {Y_t = AX_t + tμ} has independent increments, the increment Y_{t+s} - Y_t = A(X_{t+s} - X_t) + sμ being Gaussian with mean vector sμ and covariance (or dispersion) matrix sD, where D = AA' and A' denotes the transpose of A. Such a process {Y_t} is called a k-dimensional Brownian motion with drift vector μ and diffusion matrix, or dispersion matrix, D.
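Continuing the sampling sketch above (parameters ours), increments of {Y_t = AX_t + tμ} over a grid of spacing h are i.i.d. Gaussian with mean hμ and covariance hD, D = AA':

    import numpy as np

    def brownian_path_kd(mu, D, T=1.0, n=1000, seed=0):
        # k-dimensional Brownian motion with drift vector mu and dispersion matrix D
        rng = np.random.default_rng(seed)
        mu = np.asarray(mu, dtype=float)
        k, h = len(mu), T / n
        A = np.linalg.cholesky(np.asarray(D, dtype=float))   # one matrix with A A' = D
        steps = h * mu + np.sqrt(h) * rng.standard_normal((n, k)) @ A.T
        return np.vstack([np.zeros(k), np.cumsum(steps, axis=0)])

    Y = brownian_path_kd(mu=[0.0, 1.0], D=[[1.0, 0.5], [0.5, 2.0]])
    print(Y[-1])   # position at time T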

8 THE FUNCTIONAL CENTRAL LIMIT THEOREM (FCLT)


The argument in Section 7 indicating Brownian motion (with zero drift parameter for simplicity) as the limit of a random walk can be made on the basis of the classical central limit theorem, which applies to every i.i.d. sequence of increments {Z_m} having finite mean and variance. While we can only obtain convergence of the finite-dimensional distributions by such considerations, much more is true. Namely, probabilities of certain infinite-dimensional events will also converge. The convergence of the full distributions of random walks to the full distribution of the Brownian motion process is informally explained in this section. A more detailed and precise discussion is given in the theoretical complements of Sections 8 and 13.

To state this limit theorem somewhat more precisely, consider a sequence of i.i.d. random variables {Z_m} and assume for the present that EZ_1 = 0 and Var Z_1 = σ^2 > 0. Define the random walk

S_0 = 0,   S_m = Z_1 + ⋯ + Z_m   (m = 1, 2, ...).   (8.1)

Define, for each value of the scale parameter n ≥ 1, the stochastic process

X_t^{(n)} = S_{[nt]} / √n   (t ≥ 0),   (8.2)

where [nt] is the integer part of nt. Figure 8.1 plots the sample path of {X_t^{(n)} : t ≥ 0} up to time t = 13/n if the successive displacements take values Z_1 = -1, Z_2 = +1, Z_3 = +1, Z_4 = +1, Z_5 = -1, Z_6 = +1, Z_7 = +1, Z_8 = -1, Z_9 = +1, Z_{10} = +1, Z_{11} = +1, Z_{12} = -1.

[Figure 8.1: the step-function path of {X_t^{(n)}}; EX_t^{(n)} = 0, Var X_t^{(n)} = [nt]/n ≈ t, Cov(X_s^{(n)}, X_t^{(n)}) = [ns]/n ≈ s for s ≤ t.]

The process {X_t^{(n)} : t ≥ 0} records the discrete-time random walk {S_m : m = 0, 1, 2, ...} on a continuous time scale whose unit is n times that of the discrete time unit, i.e., S_m is plotted at time m/n. The process {X_t^{(n)}} = {(1/√n) S_{[nt]}} also scales distance by measuring distances on a scale whose unit is √n times the unit of measurement used for the random walk. This is a convenient normalization, since

EX_t^{(n)} = 0,   Var X_t^{(n)} = ([nt]/n) σ^2 ≈ tσ^2   for large n.   (8.3)

In a time interval (t_1, t_2] the overall "displacement" X_{t_2}^{(n)} - X_{t_1}^{(n)} is the sum of a large number [nt_2] - [nt_1] ≈ n(t_2 - t_1) of small i.i.d. random variables Z_m/√n. In the case {Z_m} is i.i.d. Bernoulli, this means reducing the step sizes of the random variables to Δ = 1/√n. In a physical application, looking at {X_t^{(n)}} means the following.

1. The random walk is observed at times t_1 < t_2 < t_3 < ⋯ sufficiently far apart to allow a large number of individual displacements to occur during each of the time intervals (t_1, t_2], (t_2, t_3], ..., and
2. Measurements of distance are made on a "macroscopic" scale whose unit of measurement is much larger than the average magnitude of the individual displacements. The normalizing large parameter n scales time and n^{1/2} scales space coordinates.

Since the sample paths of {X_t^{(n)}} have jumps (though small for large n) and are, therefore, discontinuous, it is technically more convenient to linearly interpolate the random walk between one jump point and the next, using the same space-time scales as used for {X_t^{(n)}}. The polygonal process {X̃_t^{(n)}} is formally defined by

X̃_t^{(n)} = S_{[nt]} / √n + (nt - [nt]) Z_{[nt]+1} / √n,   t ≥ 0.   (8.4)

In this way, just as for the limiting Brownian motion process, the paths of {X̃_t^{(n)}} are continuous. Figure 8.2 plots the path of {X̃_t^{(n)}} corresponding to the path of {X_t^{(n)}} drawn in Figure 8.1. In a time interval m/n ≤ t ≤ (m + 1)/n, X_t^{(n)} is constant at level (1/√n) S_m while X̃_t^{(n)} changes linearly from (1/√n) S_m at time t = m/n to (1/√n) S_{m+1} = (S_m + Z_{m+1})/√n at time t = (m + 1)/n.

[Figure 8.2: the polygonal path X̃_t^{(n)} = S_{[nt]}/√n + (nt - [nt]) Z_{[nt]+1}/√n; EX̃_t^{(n)} = 0, Var X̃_t^{(n)} ≈ t for large n.]

Thus, in any given interval [0, T] the maximum difference between the two processes {X_t^{(n)}} and {X̃_t^{(n)}} does not exceed

ε_n(T) := max{ |Z_1|/√n, |Z_2|/√n, ..., |Z_{[nT]+1}|/√n }.

To see that the difference between {X_t^{(n)}} and {X̃_t^{(n)}} is negligible for large n, consider the following estimate. For each δ > 0,

P(ε_n(T) > δ) = 1 - P(ε_n(T) ≤ δ)
= 1 - P( |Z_m|/√n ≤ δ for all m = 1, 2, ..., [nT] + 1 )
= 1 - (P(|Z_1| ≤ δ√n))^{[nT]+1}
= 1 - (1 - P(|Z_1| > δ√n))^{[nT]+1}.   (8.5)

Assuming for simplicity that E|Z_1|^3 < ∞, Chebyshev's inequality yields P(|Z_1| > δ√n) ≤ E|Z_1|^3 / (δ^3 n^{3/2}). Use this in (8.5) to get (Exercise 9)

P(ε_n(T) > δ) ≤ 1 - (1 - E|Z_1|^3 / (δ^3 n^{3/2}))^{[nT]+1} ≈ 1 - exp{ -E|Z_1|^3 T / (δ^3 n^{1/2}) } ≈ 0   (8.6)

when n is large. Here ≈ indicates that the difference between the two sides
THE FUNCTIONAL CENTRAL LIMIT THEOREM (FCLT) 23

goes to zero. Thus, on any closed and bounded time interval the behaviors of
{X,(" } and {X} are the same in the large-n limit.
)

Note that given any finite set of time points 0 < t 1 < t 2 < < t, the joint
distribution of (X, X;z ) , .. . , X(" ) converges to the finite-dimensional
)

distribution of (X , X, , ... , X ), where {X,} is a Brownian motion with zero


X1 2 tk

drift and diffusion coefficient a 2 . To see this, note that X, X Xt"^, ... ,
X,( X() , are independent random variables that by the classical central limit
k )

theorem (Chapter 0) converge in distribution to Gaussian random variables


with zero means and variances t1a 2 , (t2 t, )a 2 , ... , (tk tk_ t )Q 2 . That is to
say, the joint distribution of (X, X ,( " ) X X (") X ("^ )converges to
,

that of (X,,, X X, ... , X X, _,). By a linear transformation, one gets


k

the desired convergence of finite-dimensional distributions of {X(" } (and, )

therefore, of {X^" 1 }) to those of the Brownian motion process {X} (Exercise 1).
Roughly speaking, to establish the full convergence in distribution of {X!" 1 }
to Brownian motion, one further looks at a finite set of time points comprising
a fine subdivision of a bounded interval [0, T] and shows that the fluctuations
of the process {X^"^} on [0, T] between successive points of this subdivision
are sufficiently small in probability, a property called the tightness of the process.
This control over fluctuations together with the convergence of {X^" 1 } evaluated
at the time points of the subdivision ensures convergence in distribution to a
continuous process whose finite-dimensional distributions are the same as those
of Brownian motion (see theoretical complements for details). Since there is no
process other than Brownian motion with continuous sample paths that has
these limiting finite-dimensional distributions, it follows that the limit must be
Brownian motion.
A precise statement of the functional central limit theorem (FCLT) is the
following.

Theorem 8.1. (The Functional Central Limit Theorem). Suppose {Z,,,:


m = 1, 2, ...} is an i.i.d. sequence with EZ, = 0 and variance a 2 > 0. Then as
n * cc the stochastic processes {X: t > 0} (or {Xr": t >, 0}) converge in
distribution to a Brownian motion starting at the origin with zero drift and
diffusion coefficient a 2 .

An important way in which to view the convergence asserted in the FCLT


is as follows. First, the sample paths of the polygonal process {X^" 1 } belong to
the Space S = C[O, oo) of all continuous real-valued function on [0, oo), as do
those of the limiting Brownian motion {X}. This space C[O, oo) is a metric
space with a natural notion of convergence of sequences {d" ) }, say, being that
"{m ( " ) } converges to w in C[O, co) as n tends to infinity if and only if
{co ( " ) (t): a <, t < b} converges uniformly to {w(t): a < t < b} for all closed and
bounded intervals [a, b]." Second, the distributions of the processes {X} and
{X,} are probability measures P" and P on a certain class F of events of C[0, cc),
called Borel subsets, which is generated by and therefore includes all of the
finite-dimensional events. .F includes as well various important infinite-
24 RANDOM WALK AND BROWNIAN MOTION

dimensional events, e.g., the events {max a , b X > y} and f maxa t b X < x}
pertaining to extremes of the process. More generally, if f is a continuous
function on C[0, oo) then the event { f( {X}) < x} is also a Borel subset of
C[0, oo) (Exercise 2). With events of this type in mind, a precise meaning of
convergence in distribution (or weak convergence) of the probability measures
P. to P on this infinite-dimensional space C[0, oo) is that the probability
distributions of the real-valued (one dimensional) random variables f( {X;'})
converge (in distribution as described in Chapter 0) to the distribution of f( {X1 })
for each real-valued continuous function f defined on C[0, cc). Since a number
of important infinite-dimensional events can be expressed in terms of continuous
functionals of the processes, this makes calculations of probabilities possible
by taking limits; for examples of infinite dimensional events whose probabilities
do not converge see Exercise 9.3(iv).
Because the limiting process, namely Brownian motion, is the same for all
increments {Z,} as above, the limit Theorem 8.1 is also referred to as the
Invariance Principle, i.e., invariance with respect to the distribution of the
increment process.
There are two distinct types of applications of Theorem 8.1. In the first type
it is used to calculate probabilities of infinite-dimensional events associated with
Brownian motion by studying simple random walks. In the second type it
(invariance) is used to calculate asymptotics of a large variety of partial-sum
processes by studying simple random walks and Brownian motion. Several such
examples are considered in the next two sections.

9 RECURRENCE PROBABILITIES FOR BROWNIAN MOTION

The first problem is to calculate, for a Brownian motion {X} with drift Ic = 0
and diffusion coefficient Q 2 , starting at x, the probability

P(T < Ta) = P({X' } reaches c before d) (c < x < d), (9.1)

where

T;:= inf{t >, 0: Xx = y} . (9.2)

Since {B, = (X; x)/v) is a standard Brownian motion starting at zero,

_X
P(2x < ra) = P({B,} reaches c x before d (9.3)
a Q

Now consider the i.i.d. Bernoulli sequence {Z m : m = 1, 2, ...} with P(Z m = 1) =


P(Zm = 1) = 2, and the associated random walk So = 0, Sm = Z l + + Z,
(m >, 1). By the FCLT (Theorem 8.1), the polygonal process {X} associated
with this random walk converges in distribution to {B,}. Hence (theoretical

RECURRENCE PROBABILITIES FOR BROWNIAN MOTION 25

complement 2)

cx dxl
P(i < rd) = lim P ( {i} reaches ------- before ----/)
"- x Q 6

= lim P({S,} reaches c" before d"), (9.4)


"-+00

where

c"= Lc -x ;],
6

and

d
x n if d" = d X is an integer,
d =
" dx
d x + 1 if not.

By relation (3.14) of Section 3, one has

d_ x -
- n
d a
P(rx <t) = l im " = lim ----- -- . (9.5)

Therefore,

P(r, <ra) = dc (c<x<d,=0). (9.6)

Similarly, using relation (3.13) of Section 3 instead of (3.14), one gets

P(ta <T') = x_c (c <x<d,p =0). (9.7)

Letting d --* + oo in (9.6) and c + co in (9.7), one has

P(rc < oo) = P({X, } ever reaches c) = I (c < x, p = 0),


(9.8)
P(r< < oo) = P({X; } ever reaches d) = 1 (x < d, p = 0).

The relations (9.8) mean that a Brownian motion with zero drift is recurrent,

26 RANDOM WALK AND BROWNIAN MOTION

just as a simple symmetric random walk was shown to be in Section 3.


The next problem is to calculate the corresponding probabilities when the
drift is a nonzero quantity p. Consider, for each large n, the Bernoulli sequence
p
P(Z,n.n=+1)=Pn=1+
2 26 ^ ,

{Z, n : m = 1, 2, ...} with


1
P(Zm,n= 1)=9 n =---
22 ------.

Write Sm , n =Z l , n + +Z, n for m>,1,So. ,,=0.Then,


,

[nt]
EX (n) = ES1 n,1,n = a n tp
^n or (9.9)

[
nt] Var Z n [nt] (
(1 1\1 )Z) t,
Var X^ (n) =
n n 7

and a slight modification of the FCLT, with no significant difference in proof,


implies that {X;n ) } and, therefore, {X;n} converges in distribution to a Brownian
motion with drift /Q and diffusion coefficient of I that starts at the origin. Let
{X'} be a Brownian motion with drift It and diffusion coefficient a 2 starting at
x. Then {W = (X, x)/Q} is a Brownian motion with drift p/a and diffusion
coefficient of! that starts at the origin. Hence, by using relation (3.8) of Section 3,

P(i, <x) = P({X, } reaches c before d)


d x^
= Pl{W} reaches c x before
Q a
= lim ({S m , n : m = 0, 1, 2, ...} reaches c n before d n )
n- ro
d=x
1 (ihn /^]n) a f
= lim / d-x - c=x
n a n
1 (pn/qn) a
d=x
1 + a n

1- J
Q.

1 7
= um
n-m l^ dc
a Jn 1 +
U n
1 I I
; ^
FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION 27

exp
c -

d z x

exp- 2 A ^

c) Ja
e x p^ (d 2
)j
d -

exp
vz 1L

Therefore,

1 exp{2(d x)p/v z }
P(i' < zcd) = (c < x < d, p 0). (9.10)
1 exp{2(d c)p/v 2 }

If relation (3.6) of Section 3 is used instead of (3.8), then

1 exp{ 2(x c) /a2}


P(Td < T') = (c <x < d, y 0). (9.11)
1 exp{-2(d c)/a}

Letting d T oo in (9.10), one gets


P(i<<oo)=exp{- 2(x z c)p } (c < x, p > 0),
l o J)) (9.12)
P(r <oo)= 1 (c<x,p<0).

Thus, in this case the extremal random variable min,,, X is exponentially


distributed (Exercise 4). Letting c j oc in (9.11) one obtains

P(trd <oo)=1 (x<d,p>0),


(9.13)
P(-rd < oo) = exp{2(d x)/a 2 } (x < d, p < 0).

In particular it follows that max,, o X is exponentially distribute (Exercise 4).


Relations (9.12), (9.13) imply that a Brownian motion with a nonzero drift is
transient. This can also be deduced by an appeal to (a continuous time version
of) the strong law of large numbers, just as in (3.11), (3.12) of Section 3
(Exercise 1).

10 FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN


MOTION

We have seen in Section 4, relation (4.7), that for a simple symmetric random
walk starting at zero, the first passage time 7 y to the state y 0 0 has the

28 RANDOM WALK AND BROWNIAN MOTION

distribution

N
P(7.=N)=IYI N+y 1
Y N N=IYI>IYI+2,IYI+4..... (10.1)
2

Now let r = T be the first time a standard Brownian motion starting at the
origin reaches z. Let {X^" } be the polygonal process corresponding to the simple
)

symmetric random walk. Considering the first time {X } reaches z, one has )

by the FLCT (Theorem 8.1) and Eq. 10.1 (Exercise 1),

P(a = > t) = lim P(T Z f ] > [nt])


n- X

= lim P(T=,n] = N)
n-+m N=(nt]+1

N
y IYI ( N
= lim N+ (Y = [z^])
n-+ao N=tnt]+1, N 2
Nyeven

(10.2)
Now according to Stirling's formula, for large integers M, we can write
2 e -M M nr+2 (1 + S M )
M! = (21r) (10.3)

where 8 M 0 as M oo. Since y = [z\], N> [nt], and 2(N y) >


{[nt] I[z/]I}/2, both N and N + y tend to infinity as n oo. Therefore,
for N + y even,

N e-NNN+#2-N
IYI N + y 2 _ N = IYI 2
(N+y)/2+I e l (N-Yu2+#
N 2 (2ir) t N e -(N+Y)12( N + Y ) (NY)/2 (N Y
2 ` 2 J
X (1 + o(1))

(2ir)1I2N312 1+ N I 1 N (1 + o( 1 ))

I 2 (N + Y)/ 2 (N Y)l2
(2ir) /2N3/2 (1 + ) 1 (1 + o( 1 )),
N
(10.4)
where o(1) denotes a quantity whose magnitude is bounded above by a quantity
en (t, z) that depends only on n, t, z and which goes to zero as n oo. Also,

FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION 29

r
log (1
y (N+ y uz y wN-vuz _ N + Y Y _ Y z IYI3
+ N) 1 - N 2 N 2N2 +(N3)
[

2 3
+N 2 y [
N+2 N +O \ INI3 /]

_ -2N+8(N,y), (10.5)

where IB(N, y)j < n -11 z c(t, z) and c(t, z) is a constant depending only on t and
z. Combining (10.4) and (10.5), we have

N
z
^N N + Y 2-N = n N3I/2 1 exp -
(

2N}(1 + o( 1 ))
2
( (10.6)
= n I N 312 exp1-2N}(1 + 0(1)),

where o(1) --, 0 as n -* oo, uniformly for N> [nt], N - [z\] even. Using this
in (10.2), one obtains

P(r= > t) = lim rz I


( z
N 31 2 expj -2N}. (10.7)
n^ao N> n:J,
N[z.n) even
l
Now either [nt] + 1 or [nt] + 2 is a possible value of N. In the first case the
sum in (10.7) is over values N = [nt] - I + 2r (r = 1, 2, ...), and in the second
case over values [nt] + 2r (r = 1, 2, ...). Since the differences in corresponding
values of N/n are 2/n, one may take N = [nt] + 2r for the purpose of calculating
(10.7). Thus,

P(t 2 > t) = lim ] + 2 ex ( nz2


fl X) n ([nt] + 2r) 31^ p 1 2([nt] + }
2r)

_j2- Izl lim ^ 1


= 2
n -.. r1 2 (t + 2 r /n) 312 2
n z
1 (t + 2r/n) 1
exp^-
z
_ r
I zl 2 f
l
- z2^
u3/ ^ exp - 2u du. (10.8)

Now, by the change of variables v = Izi/ f , we get

2
P(T= > t) _ v
^ f
olz
e - " Z / z dv. (10.9)

30 RANDOM WALK AND BROWNIAN MOTION

The first passage time distribution for the more general case of Brownian
motion {X1 } with zero drift and diffusion coefficient Qz > 0, starting at the
origin, is now obtained by applying (10.9) to the standard Brownian motion
{(1/Q)X}. Therefore,

P(;> t) _ 2 f o
I=Ibf
e-2/2 dv. (10.10)

The probability density function f 2(t) of; is obtained from (10.10) as


Q

f 2(t) = Izl e-ZZna=t (t > 0). (10.11)


(2nci 2 ) 1/2 t 3/2

Note that for large t the tail of the p.d.f. f 2 (t) is of the order of t -3 / 2 . Therefore,
although {X} will reach z in a finite time with probability 1, the expected time
is infinite (Exercise 11).
Consider now a Brownian motion {X,} with a nonzero drift and diffusion
coefficient a 2 that starts at the origin. As in Section 9, the polygonal process
{X^n) } corresponding to the simple random walk S,, n = Z 1 , + + Z,, n ,
S0 ,, = 0, with P(Ztn , n = 1) = p = 2 + /(2Q\), converges in distribution to
{W = Xja}, which is a Brownian motion with drift /u and diffusion coefficient
1. On the other hand, writing Ty , n for the first passage time of {S, n : m = 0, 1, ...}
to y, one has, by relation (4.6) of Section 4,

N
N = IY)
P(1 = N) N N +y p(N+v)/2R(N-v)/2
2
N (N-i-y)12 (lN-Y)l2
IYI N ) (

N+y 2 - 1+
N 2
Ql Qn

FYI 2 \ N/21 y /2 p y /2
=
1
NN 2 U
1_
+ y 2 N(
n \ 1 +
a2 n / \
l
; )

(10.12)
For y = [w..J] for some given nonzero w, and N = [nt] + 2r for some given
t> 0 and all positive integers r, one has
N/2 / l-y/2
^ v/z/ l
( _
I1 1 }
J ^

/ \ z ) nt/2 +r J Wf/z w,,//z


a2n (1 + 0(1))
\1 + a nI \l Q^)
FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION 31

2 2r
exp{ _
= ex 4
t e i
}(x a^w
2
ex 1
l + 0 ())
n^ p 26 p 2v (

z)n]rin
t/1 2 /LW
-zn1 +o(1))
(=exp-2+6

i ( z l r ro
exp
t 2 + ^W exp Y-2 + e (l + 0(1)) (10.13)
2a Cr l Q J
where E does not depend on r and goes to zero as n + oc, and o(l) represents
a term that goes to zero uniformly for all r >, 1 as n --* x. The first passage
time i 2 to z for {X1 } is the same as the first passage time to w = z/a for the
process {I4'} = {X/a}. It follows from (9.12), (9.13) that if

(i) p. <0 and z > O or


(ii) p > 0 and z < 0,

then there is a positive probability that the process { W} will never reach w = z/a
(i.e., t Z = co). On the other hand, the sum of the probabilities in (10.12) over
N> [nt] only gives the probability that the random walk reaches [w,,/n- ] in a
finite time greater than [nt]. By the FCLT, (10.6)(10.8) and (10.12) and (10.13),
we have

P(t < r < co) = 2 IwI exp z


tu2 + w lim 1
n 2a Q ^^ r _ 1 n(t + 2r/n) 312
wz
x exp + e )'^
2(t + 2r/n)

(wI exp 2a2 + 2 V312 exp 2v


_ ^ t2 jJ1 1 f w2 ^

x [ du
expf^Zn`-`I/2

1
(2n)i/z w=
exI I p a
v3/z
^ t1i
2a W/.1

2
W2
x exp{----
(v t) dv
2v 2az

1 W/I ) z z
= ^ aw^ exp fj 1 exp W v A,
(2n) l/Z v 3 ' 2 2v 2U2 ^

for w = z/a. (10.14)


32 RANDOM WALK AND BROWNIAN MOTION

Therefore, for t > 0,


z z
Pt
( = 1 i 1ex z -- v dv. ( 10.15 )
) (27r) / Z a o2 v3/2 p 2a 2 o 2QZ

Differentiating this with respect to t (and changing the sign) the probability
density function of r Z is given by

.fa=. (t) = Izl exp ^ z z2 Z t}


(2 nc 2 ) l / 2 t 3 J 2 QZ 2Q Z t 2o 2

Therefore,

Izi
.f 2 exp{ -- 1 (z t) 2 } (t > 0). (10.16)
(t) _ (2na2)1J2t3n

In particular, letting p(t; 0, y) denote the p.d.f. (7.1) of the distribution of the
position X at time t, (10.16) can be expressed as (see 4.6)

I I p(t; 0, Z).
.%2. u (t) = C (10.17)

As mentioned before, the integral of J 2 , (t) is less than 1 if either


2

(i) p>O,z<Oor
(ii) p<0,z>0.
In all other cases, (10.16) is a proper probability density function. By putting
p = 0 in (10.16), one gets (10.11).

11 THE ARCSINE LAW

Consider a simple symmetric random walk {S,} starting at zero. The problem
is to calculate the distribution of the last visit to zero by S o , S,, ... , S. For
this we first calculate the probability that the number of + l's exceeds the
number of l's until time N and with a given positive value of the excess at
time N.

Lemma 1. Let a, b be two integers, 0 < a < b. Then

P(S1>0,S2>0,...,S.+b-i> 0,Sa+b=ba)
bb
[(a+ 11 (a+b 111^21a+n(a b - a
(11.1)
b b)a+b(2)a+n.
THE ARCSINE LAW 33

Proof. Each of the (t b)


b ) paths from (0, 0) to (a + b, b a) has probability
(2) +b We seek the number M of those for which S 1 = 1, S2 > 0, S 3 > 0, ... ,
Sa+b_1 > 0, Sa+b = b a. Now the paths from (1,1) to (a + b, b a) that cross
or touch zero (the horizontal axis) are in oneone correspondence with those
that go from (1, 1) to (a + b, b a). This correspondence is set up by reflecting
each path of the last type about zero (i.e., about the horizontal time axis) up
to the first time after time zero that zero is reached, and leaving the path from
then on unchanged. The reflected path leads from (1, 1) to (a + b, b a) and
crosses or touches zero. Conversely, any path leading from (1,1) to
(a + b, b a) that crosses or touches zero, when reflected in the same manner,
yields a path from (1, 1) to (a + b, b a). But the number of all paths from
(1, 1) to (a + b, b a) is simply ("1') since it requires b plus l's and a 1
minus l's among a + b I steps to go from (1, 1) to (a + b, b a). Hence

M= (a +
bb i 1) (a+b-1^

since there are altogether (' + 6 1 1 ) paths from (1,1) to (a + b, b a). Now a
straightforward simplification yields

_ a+b ba
M b )a+b

Lemma 2. For the simple symmetric random walk starting at zero we have,

f'(S154 0 ,S2^` 0 ,...,Sz" 0)=P(Sz"=0)_\nn2n. (11.2)

Proof. By symmetry, the leftmost side of (11.2) equals

2P(S I >0,S 2 >0,...,S 2n >0)

"
=2 Z P(S 1 >O,S 2 >0,...,S Z "_ 2 >0,S en =2r)
r=1

a"
=2,= 2
[ (n +r 1 1) ( 2n+r/](2)

= 2(2nn 1 ( 2n)(^)'"
= P(S2" = 0),
)\ 2 / 2n = n

where we have adopted the convention that ( 2 2n 1 ) = 0 in writing the middle


equality.


34 RANDOM WALK AND BROWNIAN MOTION

Theorem 11.1. Let I' ( ' ) = max{ j: 0 ,< j ,< m, Si = 0}. Then
P(F
(2n) = 2k) = P(S2k = 0 )P(S2n-2k = 0 )
2 ) J2n-2k
= \ k /\ 2/ 2k (fl_k k2
(2k)!(2n 2k)! ( i'\"
= fork =0,1,2,..., n. (11.3)
(k!) 2 ((n k)!)2 2

Proof. By considering the conditional probability given {S 2k = 0} one can easily


justify that
P(r(2n) = 2k) = P(S2k
= 0, S2k +1 5 0, Sek + 2 0, ... , Sen * 0)
= P(S2k = 0)P(S I 0, S2 0, ... , S2n-2k 0 )
= P(S2k = 0 )P(S2n-2k = 0)
Theorem 11.1 has the following symmetry relation as a corollary.

P(r' (2 n ) = 2k) = P(17 (2 n) = 2n 2k) for all k = 0,1, ... , n. (11.4)

Theorem 11.2. (The Arc Sine Law). Let {B,} be a standard Brownian motion
at zero. Let y = sup{t: 0 < t 5 1, B, = 0}. Then y has the probability density
function

i(x) 1 0<x<l. (11.5)


_ IZ(x(1 x))112 ,

P(y < x) =
fo x
.f(y) dy = sin
n
1 x-. (11.6)

Proof. Let {So = 0, S 1 , S 2 , ...} be a simple symmetric random walk. Define


{X ' ) } as in (8.4). By the FCLT (Theorem 8.1) one has
P(y '< x) = um P(y (") < x) (0 < x < 1),
n ao
where

y (" ) = sup{t: 0 '< t '< 1, X = 0} = 1 sup{m: 0 < m <, n, S. = O} = 1 r( " ) .


n n

In particular, taking the limit over even integers, it follows that

P(y x) = limP 1 r 2n ) x I = um P(r(2 nj < 2nx),


n- 0,( 2n jjj n ao
THE BROWNIAN BRIDGE 35

where I' (2 n 1 is defined in Theorem 11.1. By Theorem 11.1 and Stirling's


approximation

(2k)!(2n 2k)!
lim P(I' (Z " 1 < 2nx) = lim [^1 2_2n
n-w n-. k=o (k!) 2 ((n k)!)z

1-1 (27r) 112 e - 2k( 2k )zk+ j


= lim Y-
112 -k k+})2
n-.00 k=o ((2n) e k

(2x)1/2e-2(n-k)(2(n k))[2cn-k)+#12-2n

x \ ((2n)'1'e-("-k)(n k)n-k++)2

[nxxl
= lim Y-
1/2
n-+oo k=o 7i (n k)

1 Intl 1 1 _ 1
1x
= n li k= n k
k 112 n o (y( 1 y)) 1I2 dy
\ n^ 1
n//

The following (invariance) corollary follows by applying the FCLT.

Corollary 11.3. Let {Z,, Z 2 , ...} be a sequence of i.i.d. random variables such
that EZ, = 0, EZ i = 1. Then, defining {X^" 1 } as in (8.4) and y ( n 1 as above, one has

lim P(y ( n ) x) = 2 sin - ' ^. (11.7)


n-^M n

From the arcsine law of the time of the last visit to zero it is also possible
to get the distribution of the length of time in [0, 1] the standard Brownian
motion spends on the positive side of the origin (i.e., an occupation time
law) again.as an arcsine distribution. This fact is recorded in the following
corollary (Exercise 2).

Corollary 11.4. Let U = I {t < 1: Bt e IT + }I, where I I denotes Lebesgue


measure and {B 1 } is standard Brownian motion starting at 0. Then,

P(U < x) = 2 sin -1 /i (11.8)


n

12 THE BROWNIAN BRIDGE

Let {B,} be a standard Brownian motion starting at zero. Since B, tB l vanishes


for t = 0 and t = 1, the stochastic process {B*} defined by
36 RANDOM WALK AND BROWNIAN MOTION

B*:=B t tB,, O'<t'< 1, (12.1)

is called the Brownian bridge or the tied-down Brownian motion.


Since {B,} is a Gaussian process with independent increments, it is simple
to check that {B*} is a Gaussian process; i.e., its finite dimensional distributions
are Gaussian. Also,
EB* = 0, (12.2)

and
Cov(B*, B*) = Cov(B,,, B, 2 ) t 2 Cov(B,,, B 1 ) t, Cov(B,,, B 1 )
+ t1t2 Cov(B 1 , B1)
= tl t2tl t1t2 + t1t2 = t1(1 t2), for t i t2. (12.3)

From this one can also write down the joint normal density of (B*, B*, ... , B)
for arbitrary 0 < t l < t 2 < < t k < 1 (Exercise 1).
The Brownian bridge arises quite naturally in the asymptotic theory of
statistics. To explain this application, let us consider a sequence of real-valued
i.i.d. random variables Y1 , Y2 ,... , having a (common) distribution function F.
The nth empirical distribution is the discrete probability distribution on the line
assigning a probability 1/n to each of the n values Y Y2 , ... , Y. The
corresponding distribution function F. is called the (nth) empirical distribution
function,
1
F(t)=#{j:1<j<n,Y; <t}, co<t<00, (12.4)
n

where # A denotes the cardinality of the set A.


Suppose Y Y(2) 1< ... 1< Y is the ordering of the first n observations.
Figure 12.1 illustrates F 5 . Note that {F(t): t >, 0} is for each n a stochastic

FF(t)

1 ^--

3 ~ I

O Yti)=Y2 Y(2)=Ys Y{3)=YI Y(4)= Y3 Y(c =Y, t

Figure 12.1

THE BROWNIAN BRIDGE 37

process. Now for each t the random variable


n

nF(t) _ Y l <() (} (12.5)


j=1

is the sum of n i.i.d. Bernoulli random variables each taking the value I with
probability F(t) = P( t) and the value 0 with probability 1 F(t). Now
E(1 (y . ) ) = F(t) and, for t 1

=
F(t1)( 1 F(t2)),
Cov( 1 (Y ; s1,) , 1{Yks:2))
to, ifj = k,
(12.6)

since in the case j = k,

Cov(l{Yj_<ti)' I (YkSt2)) = E(l(YjEt1} 1 (Y^Se2}) E( 1 (YJ_<tt))E( 1 {yJ ,2})

= F(t1) F(t1)F(t2) = F(t1)( 1 F(ti))

It follows from the central limit theorem that

n 1/2 ( 1{Yt) nF(t)) = fn(Fn(t) F(t))


- 1

is asymptotically (as n > oo) Gaussian with mean zero and variance
F(t)(1 F(t)). For t l < t 2 < . < t k , the multidimensional central limit
theorem applied to the i.i.d. sequence of k-dimensional random vectors
( 1 (Y)' 1(Y;st^)' ... , 1 (Y ; ,Ik t) shows that (.(Fn(t1) F(t1)), \/(F,(t2) F(ti)),
. , f (F,,(t k ) F(t k ))) is asymptotically (k-dimensional) Gaussian with zero
mean and dispersion matrix E = ((a s,)), where

= Cov(I(Y s,,), 1 {Y ; st ; )) = F(t,)(1 F(t ; )),


;
for t. < t j . (12.7)

In the special case of observations from the uniform distribution on [0, 1],
one has

F(t)=t, O'<t'<1, (12.8)

so that the Finite-dimensional distributions of the stochastic process


^(Fn (t) F(t)) converge to those of the Brownian bridge as n --p cc. As in
the case of the functional central limit theorem (FCLT), probabilities of many
infinite-dimensional events of interest also converge to those of the Brownian
bridge (theoretical complements 1 and 2). The precise statement of such a result
is as follows.
38 RANDOM WALK AND BROWNIAN MOTION

Proposition 12.1. If Y1 , Y2 , ... is an i.i.d. sequence having the uniform


distribution on [0, 1], then the normalized empirical process { f (F(t) t):
0 < t < 1} converges in distribution to the Brownian bridge as n -+ co.

Let Y1 , Y2 ,... be an i.i.d. sequence having a (common) distribution function


F that is continuous on the real number line. Note that in the case that F is
strictly increasing on an interval (a, b) with F(a) = 0, F(b) = 1, one has for
0<t<1,

P(F(Yk) < t) = P(Yk < F -1 (t)) = F(F -1 (t)) = t, (12.9)

so the sequence U 1 = F(Y1 ), U2 = F(Y2 ), . .. is i.i.d. uniform on [0, 1]. The same
is true more generally (Exercise 2). Let F be the empirical distribution function
of Y1 , ... , Y,,, and G. that of U1 ,. .. , U. Then, since the proportion of Yk 's,
1 < k < n, that do not exceed t coincides with the proportion of Uk 's, 1 < k < n,
that do not exceed F(t), we have

f [F(t) F(t)] = /[G(F(t)) F(t)], a < t < b. (12.10)

If a = oo (b = + oo), the index set [a, b] for the process is to exclude a (b).
Since ^(G(t) t), 0 < t <, 1, converges in distribution to the Brownian bridge,
and since t -+ F(t) is increasing on (a, b), one derives the following extension
of Proposition 12.1.

Proposition 12.2. Let Y1 , Y2 , ... be a sequence of i.i.d. real-valued random


variables with continuous distribution function F on (a, b) where F(a) = 0,
F(b) = 1. Then the normalized empirical process \/(F,,(t) F(t)), a < t b,
converges in distribution to the Gaussian process {Z} :_ {BF() : a <, t < b}, as
n -->co.

It also follows from (12.10) that the Kolmogorov-Smirnov statistic defined


by D := sup{ JIF(t) F(t)I: a < t s b} satisfies

D. = sup "I F(t) F(t)I = sup ' I G(F(t)) F(t)I = sup ,/ /IG(t) tl.
a_<t5b a5t5b 0-<t<,1
(12.11)
Thus, the distribution of D. is the same (namely that obtained under the uniform
distribution) for all continuous F. This common distribution has been tabulated
for small and moderately large values of n (see theoretical complement 2). By
Proposition 12.2, for large n, the distribution is approximately the same as that
of the statistic defined by (also see theoretical complement 1)
D:= sup (B*I. (12.12)
o,t,1
STOPPING TIMES AND MARTINGALES 39

A calculation of the distribution of D yields (theoretical complement 3 and


Exercise 4*(iii))

P(D >d) =2 (- 1)k-


ie- 2k2d2, d>0. (12.13)
k=1

These facts are often used to test the statistical hypothesis that observations
Y,, Y2 , ... , Y are from a specified distribution with a continuous distribution
function F. If the observed value, say d, of D is so large that the probability
(approximated by (12.13) for large n) is very small for a value of Dn as large as
or larger than d to occur (under the assumption that Y,, ... , Y do come from
F), then the hypothesis is rejected.
In closing, note that by the strong law of large numbers, F (t) -+ F(t) as
F

n --> cia, with probability 1. From the FCLT, it follows that


sup IF(t) F(t)I -- 0 in probability as n - oo. (12.14)
- oo < t
<

In fact, it is possible to show that the uniform convergence in (12.14) is also


almost sure (Exercise 8). This stronger result is known as the Glivenko-Cantelli
lemma.

13 STOPPING TIMES AND MARTINGALES

An extremely useful concept in probability is that of a stopping time, sometimes


also called a Markov time. Consider a sequence of random variables
{X: n = 0, 1, ...}, defined on some probability space (0, F, P). Stopping times
with respect to {X} are defined as follows. Denote by . the sigmafield
Q{X .... , X} comprising all events that depend only on {X0 , X 1 ..... X}.

Definition 13.1. A stopping time r for the process {X} is a random variable
taking nonnegative integer values, including possibly the value + oo, such that

{t<n }E. ( n=0,1,...). (13.1)

Observe that (13.1) is equivalent to the condition

{t=n}e5 (n=0,1,...), (13.2)

since . are increasing sigmafields (i.e., . c 3y ,,) and r is integer-valued.


Informally, (13.1) says that, using r, the decision to stop or not to stop by
time n depends only on the observations X0 , X 1 . . , X. ,.

An important example of a stopping time is the first passage time r B to a


(Bore]) set B c R',
40 RANDOM WALK AND BROWNIAN MOTION

t := min{n > 0: X e B}. (13.3)

If X. does not lie in B for any n, one takes T B = oo. Sometimes the minimum
in (13.3) is taken over In >, 1, X . e B} , in which case we call it the first return
time to B, denoted rl B
A less interesting but useful example of a stopping time is a constant time,

T:= m (13.4)

where m is a fixed positive integer.


One may define, for every positive integer r, the rth passage time t B
recursively, by

i2:= min{n > rB -1) : X. E B} (r = 2, ...)


(13.5)
rB = T B .

Again, if X. does not lie in B for any n> TB 1) , take ie^ = oo. Also note that
if i8 ) = oo for some r then r ' = oo for all r' r. It is a simple exercise to check
( )

that each TB' ) is a stopping time (Exercise 1).


The usefulness of the concept of stopping times will now be illustrated by a
result that in gambling language says that in a fair game the gambler has no
winning strategy. To be precise, consider a sequence {X: n = 0, 1, ...} of
independent random variables, S = X0 + X l + + X. Obviously, if
EX = 0, n > 1, then ES = S o for each n. We will now consider an extension
of this property when n is replaced by certain stopping times. Since {X 0 ,. . . , X}
and {So , S 1 , ... , S} determine each other, stopping times for {S} are the same
as those for {X}.

Theorem 13.1. Let r be a stopping time for the process {S}. If


1. EX=0forn>,1,EIX0 I-EIS0 I<oo,
2. P(i< cc) =1,
3. EISJ < oo, and
4. E(S m1{T>) --> 0 as m 00,

then

ES= ES o . (13.6)

Proof. First assume T < m for some integer m. Then

ES, = E(Sol{L=ol) + E(Sll(T=i^) + ... + E(S m l {s=m) )


= E(Xo 1 (T, )) + E(X, 1 (,,)) + ... + E(XJ 1 (T j}) + ... + E(Xm1{rim})
= EXo + E(X 1 1 (t , l) ) + + E(XJl{t_>j}) + . + E(Xml{T,m}). (13.7)
STOPPING TIMES AND MARTINGALES 41

Now {i > j} = {x <j}` = {r <j l }` depends only on Xo , X,, ... , Xj -,.


Therefore,

E(Xjl(t , i)) = E[ 1 ( >J)E(XJ {X 0 .... , X; -i })]


= E[l >_ E(XX )]=0 (j=1,2,...,m), (13.8)

so that (13.7) reduces to

ESL = EXo = ES,. (13.9)

To prove the general result, define

r m := r A m = min{r, m}, (13.10)

and check that r m is a stopping time (Exercise 2). By (13.9),

ES=ES, (m=1,2,...). (13.11)

Since r = z m on the set Jr <, m}, one has

IESI ESJ = IE(SS S^m)1 = I E((S5 Sm) 1 (s>m))I

IE(sl{Y>m))I + IE(Sml{t>m))I. (13.12)


The first term on the right side of the inequality in (13.12) goes to zero as
m > oo, by assumptions (2), (3) (Exercise 3), while the second term goes to
zero by assumption (4). 0.

Assumptions (2), (3) ensure that ES, is well defined and finite. Assumption
(4) is of a technical nature, but cannot be dispensed with. To demonstrate this,
consider a simple symmetric random walk {S n } starting at zero (i.e., S o = 0).
Write r y for T { ^, ) , the first passage time to the state y, y 0. Then (1), (2), (3)
are satisfied. But

ES,Y=y 0. (13.13)

The reason (13.6) does not hold in this case is that assumption (4) is violated
(see Exercise 4). If, on the other hand,

r = min{n: S = a or b}, (13.14)

where a and b are positive integers, then P(T < co) = 1. There are various ways
of proving this last assertion. A more general result, namely, Proposition 13.4,
is proved later in the section to take care of this condition. To check condition
(3) of Theorem 13.1, note that ISJ < max{a, b}, so that EISB I < max{a, b}. Also,
on the set {tr > m} one has a <Sm <b and therefore
42 RANDOM WALK AND BROWNIAN MOTION

I E ( Sm l {i >m })I < max {a, b }EI 1 (t>m )I


= max {a, b}P{ r > m} ,. 0 as m cc. (13.15)

Thus condition (4) is verified. Hence the conclusion (13.6) of Theorem 13.1
holds. This means

0 = ES, = aP(t _ q < Tb ) + bP(T_a > T i,)

= a(1 P(r_ a > T b )) + bP(t_ o > t b ), (13.16)

which may be solved for P(T_ a > Tb ) to yield

( a >z b ) =
Pz_ a (, 13.17)
a+b

a result that was obtained by a different method earlier (see Chapter I, Eq. 3.13).
To deal with the case EX 0, as is the case with the simple asymmetric
random walk, the following corollary to Theorem 13.1 is useful.

Corollary 13.2. (Wald's Equation). Let { Y.: n = 1, 2, ...} be a sequence of i.i.d.


random variables with EY = . Let S = Yl + + Y., and let i be a stopping
time for the process {S: n = 1, 2, ...} such that
2'. ET < cc,
3'. ES LI < oo, and
4'. lim,,, IE(Sml {t>m })I = 0.

Then

ES,' = (E-r). (13.18)

Proof. To prove this, simply set X0 = 0, X = Y p (n >, 1), and


S. = X i + + XX = S n, and apply Theorem 13.1 to get

0 = ES = E(ST i) = EST E(r)u,


Q (13.19)

which yields (13.18). Note that EISB I < EISLI + (Er)I j! < oo, by (2') and (3'). Also

IE(Sm 1 {i >m })I < IE(Sm 1 (r>m})I + EIZi21 {t >m}I

= IE(S;,,1 T >m ) I + I1EItI { T >m I - 0.


( } }
Z

For an application of Corollary 13.2, consider the case Y = + 1 or 1 with


probabilities p and q = I p, respectively, Z < p < 1. Let r = min {n 1:
STOPPING TIMES AND MARTINGALES 43

S', = a or b}, where a, b are positive. Then (13.19) yields

aP(t - a < TO + bP(T > TO _ (Er)(p q). (13.20)

From this one gets

Er= (b+a)P(T_a>rb)a
(13.21)
p R

Making use of the evaluation

1_
(P) a
P(ta > Tb) _(q'\)a+b'

(Chapter I, Eq. 3.6) one has


,)a

P q( a (13.22)
P_y

b+a (

Assumption (2') for this case follows from Proposition 13.4 below, while (3'),
(4'), follow exactly as in the case of the simple symmetric random walk (see
Eq. 13.15).
In the proof of Theorem 13.1, the only property of the sequence {X} that
is made use of is the property

I
E(Xa 1 {X0 ,X 1 ,...,X.})=0 (n= 0,1,2,...). (13.23)

Theorem 13.1 therefore remains valid if assumption (1) is replaced by (13.23),


and (2)-(4) hold. No other assumption (such as independence) concerning the
random variables is needed. This motivates the following definition.

Definition 13.2. A sequence {X: n = 0, 1, 2, ...} satisfying (13.23) is called a


sequence of martingale differences, and the sequence of their partial sums, {S},
is called a martingale.

Note that (13.23) is equivalent to

I
E(S +1 {S o ,...,Sn })=S (n = 0, 1,2,...), (13.24)

44 RANDOM WALK AND BROWNIAN MOTION

since

E(S+ j I {So,S1,...,S}) = E(S + i I {Xo , X l , ... , X})


= E(S + X+ 1 I {Xo , X i ,... , X})
= S + E(X + , I {X0 , X 1 .... , X}) = S. (13.25)

Conversely, if (13.24) holds for a sequence of random variables {S:


n = 0, 1, 2, ...}, then the sequence {X: n = 0, 1, 2, ...} defined by

X=SS i (n=1,2,3,...),
- Xo=So, (13.26)

satisfies (13.23).
Martingales necessarily have constant expected values. Likewise, if {X} is a
martingale difference sequence, then EX = 0 for each n >, 1. Theorem 13.1,
Corollary 13.2, and Theorem 13.3 below, assert that this constancy of
expectations of a martingale continues to hold at appropriate stopping times.
In the gambling setting, the martingale property (13.24), or (13.23), is often
taken as the definition of a fair game, since whatever be the outcomes of the
first n plays, the expected net gain at the (n + 1)st play is zero. As an example
of a strategy for the gambler, suppose that it is decided not to stop until an
amount a is lost or an amount b is gained, whichever comes first. Under (13.23)
and conditions (2)(4) of Theorem 13.1, the expected gain at the end of the
game is still zero. This conclusion holds for more general stopping times, as
stated in Theorem 13.3 below. Before this result is stated, it would be useful to
extend the definition of a martingale somewhat. To motivate this new definition,
consider a sequence of i.i.d. random variables { Y: n = 1, 2, ...} such that
EY = 0, EY,, = Var()) = 1. Then {S n: n = 0, 1, 2, ...} is a martingale,
where So is an arbitrary random variable independent of { Y: n = 1, 2, ...}
satisfying ES < oo.
To see this, form the difference sequence

X+ 1' =Sn + 1 ( n + 1) (S n) = Yn+a +2SY +i 1 (n = 0, 1,2,...),

Xo _
Szo . (13.27)

Then, writing Yo = So ,

E(X +aI {Yo ,Y,,Y2 ,...,} })

= E(Yn +1 I {Yo , Yl , ... , Y}) + 2 SE(Y,. { Yo, Yl , ... , Y}) 1


= E(Yn +l ) + 2SE(Yn+1 ) 1 = 1 + 0 1 = 0. (13.28)

Since Xo , X 1 , X2 , ... , X. are determined by (i.e., are Borel measurable


functions of) Yo , Y1 , Y2 , ... , Y, it follows that

STOPPING TIMES AND MARTINGALES 45

I
E(Xn+ 1 {X0, Xl, X2, .... Xn })

= E[E(X i l {Yo , Yi , ... , Y})I {Xo , X i , X 2 .. .. , X}] = 0, (13.29)

by (13.28). Thus,

E(X + 1 I { Yo , Yl , ... , Y}) = 0 (n = 1, 2, ...) (13.30)

implies that

E(X +i 1 {X0 ,X 1 ,...,X})=0 (n = 1,2,...). (13.31)

In general, however, the converse is not true; namely, (13.31) does not imply
(13.30). To understand this better, consider a sequence of random variables
{Y: n = 0, 1, 2, ...}. Suppose that {X: n = 0, 1, 2, ...} is another sequence of
random variables such that, for every n, X0 , X 1 , X2,... , X can be expressed
as functions of Yo , Y1 , Y2 . . , Y. Also assume EXn < oo for all n. The condition
(13.31) implies that X, +1 is orthogonal to all square integrable functions of
X0 , X 1 , ... , X, while (13.30) implies that X + , is orthogonal to all square
integrable functions of Yo , Y1 .....} (Chapter 0, Eq. 4.20). The latter class of
functions is larger than the former class. Property (13.30) is therefore stronger
than property (13.31).
One may express (13.30) as

E(X +1 1S )=0 (n=0,1,2,...), (13.32)

where . := 6{ Yo , Y1 .....}} is the sigmafield of all events determined by


{Yo , Y1 , ... , Y}. Note that X is .f-measurable, i.e., a Borel-measurable
function of Yo , ... , Y. Also, {.} is increasing, .^ c + This motivates the
.

following more general definition of a martingale.

Definition 13.3. Let {XX } be a sequence of random variables, and {. } an


increasing sequence of sigmafields such that, for every n, Xo , X 1 , X 2 , ... , X.
are 3-measurable. If EJXX j < oo and (13.32) holds for all n, then
{X: n = 0, 1, 2, ...} is said to be a sequence of {F }-martingale differences. The
sequence of partial sums {Z = X0 + . + X: n = 0, 1, 2, ...} is then said to
be a {.F}-martingale.

Note that a martingale in this sense is also a martingale in the sense of Definition
13.2, since (13.30) implies (13.31). In order to state an appropriate generalization
of Theorem 13.1 we need to extend the definition of stopping times given earlier.

Definition 13.4. Let {.: n = 0, 1, 2, ...} be an increasing sequence of


sub-sigmafields of the basic sigmafield F . A random variable r with nonnegative
integral values, including possibly the value + co, is a {.^}-stopping time if
(13.1) (or, (13.2)) holds for all n.
46 RANDOM WALK AND BROWNIAN MOTION

Theorem 13.3. Let {X}, {Z}, {} be as in Definition 13.3. Let i be a


{.}-stopping time such that
1. P(r<oo)=1,
2. EIZJ < oo, and
3. limm E(Zml{t>m)) = 0.

Then

EZZ = EZo . (13.33)

The proof of Theorem 13.3 is essentially identical to that of Theorem 13.1


(Exercise 6), except that (13.6) becomes

EZL = EZm = EZa (13.34)


which may not be zero.
For an application of Theorem 13.3 consider an i.i.d. Bernoulli sequence
{Y:n= 1,2,...}:

P(Y = 1) = P() = 1) = i. (13.35)

Let Yo = 0, ^ = Q{ Yo , Y,, ... , Y}. Then, by (13.27) and (13.28), the sequence
of random variables

Z:= S2 n (n = 0, 1, 2, ...) (13.36)

is a {.F}-martingale sequence. Here S = Yo + + Y. Let r be the first time


the random walk {S: n = 0, 1, 2, ...} reaches a orb, where a, b are two given
positive integers. Then r is a { F}-stopping time. By Proposition 13.4 below,
one has P(r < oo) = 1 and E ' < oo for all k. Moreover,

EIZT I = EIS z rl < ESt + Et <, max{a 2 , b 2 } + Er < oo, (13.37)

and

EIZm1{t>m}I < E((max {a 2 , b 2 } + m) 1 {t>m))

= (max{a 2 , b 2 } + m)P(i > m)

= max{a 2 , b 2 }P(r > m) + mP(r > m). (13.38)

Now P(i > m) -- 0 as m -+ co and, by Chevyshev's inequality,

mP(T > m) <, m Er e 0. (13.39)


m
STOPPING TIMES AND MARTINGALES 47

Conditions (1)(3) of Theorem 13.3 are therefore verified. Hence,

EZ, =EZ 1 = ES;-1= 1 -1=0, (13.40)

i.e., using (13.17),

a2b 2
Et=ES, =a 2 P(r_ a <T b )+ b 2 P(t_ a > Tb) = (13.41)
a+b + a6 ab ab.

By changing the starting position Yo = So = 0 of the random walk to


Yo = So = x one then gets

ET= (xc)(dx), c<x<d, (13.42)

where z = min{n: S.' = c or d}.


The next result has been made use of in deriving (13.17), (13.22) and (13.42).

Proposition 13.4. Let {X": n = 1, 2, ...} be an i.i.d. sequence of random


variables such that P(X" = 0) < 1. Let T be the first escape time of the (general)
random walk {S.x = x + X l + + X": n = 1, 2, ...}, S = x, from the interval
(a, b), where a and b are positive numbers. Then P(i < oo) = 1, and the
moment generating function 4(z) = Eezt oft is finite in a neighborhood of z = 0.

Proof. There exists an c> 0 such that either P(X" > c) > 0 or P(X" < c)> 0.
Assume first that 6 := P(X" > c) > 0. Define

no = r a+b l
+ 1, (13.43)
i
Ls

where [(a + b) /E] is the integer part of (a + b) /E. No matter what the starting
position x E (a, b) of the random walk may be, if X" > E for all
n=1,2,...,n o , then S=x+X 1 ++X" o >x+n o e>x +a+b>,b.
Therefore,

P(T<,n o )>P(S > b) >,P(X">rforn= 1,2,...,n o )>,6=6 0 , ( 13.44)

say. Now consider the events

Ak ={ a<S <b for (k 1)n o <n`kn o } ( k= 1,2,3,...),


(13.45)
A 1 ={a<Sx < b for 1 <n-<n o }.

By (13.44),

P(A1) = P(i > n o ) < I S o . (13.46)


48 RANDOM WALK AND BROWNIAN MOTION

Next,

P(r > 2n o ) = E(I A ,1 A2 ) = E[1A1E(lA2 I {Si, ... , S.X})]. (13.47)

Now

E(I A2 I {S;,...,So})=P(a <S <bform= 1,...,n o I{Si,...,S


+m 0 })

=P(a<S o +Xna+l +... +Xno+m <b,

for m = 1, ... , n o l {Si, ... , S p}). (13.48)

Since X 0+1 + + Xn n +m is independent of Si, ... , So , the last conditional


probability may be evaluated as

P(a<z+Xnn+t + + Xno +m <bform= 1 ,...,no)Z=s o


=P(a<z+X l ++Xm <bform=1,...,n o ) = = s;o . ( 13.49)

The equality in (13.49) is due to the fact that the distribution of (X 1 , X2 , ... , Xno )
is the same as that of (Xn o +l, Xn o +z, . .. , X20 ). Note that Sao E ( a, b) on the
set A 1 . Hence the last probability in (13.49) is not larger than 1 5 by (13.46).
Therefore, (13.47) yields

P(A 1 n A 2 ) = P(z > 2n o ) < E[l A1 (1 So)] = ( 1 8 0)P(A1) <- (1 8 o ) 2 .


(13.50)
By recursion it follows that

P(T > kn o ) = E(1 42 IA,,) = E[1A, ... 1 A,,- E(lAk {S1, ... , S(k- 1)no })]
E[I A1 ...l Ak- ,( 1 So)] < (1 b o )P(A 1 n ... n Ak-i)

= (1 8 0 )P(r > ( k 1)n o ) <, (1 5 o )(1 o)k 1


=(1-8 o ) k (k=1,2,...). (13.51)

Since {r = co} c {z > kn o } for every k, one has

P(r = co) < (1 8 o ) k for all k, (13.52)

and consequently, P(T = oo) = 0. Finally,

Ee:z = em:P(T = m) < emIZIP(r = m)


m =1 m =1

0o kno
_ )e'I'IP(-r = m)
k=1 m= (k- 1)no+1

STOPPING TIMES AND MARTINGALES 49

w oo
e k ^ 0I=I P(2 > (k 1)n0) Ze (1 6) k-1
k=1 k=1

log(1 6)
no^z^
= e ((1 b)e Holz)) k-1 < cc for jzj < . (13.53)
k=1 no

One may proceed in an entirely analogous manner assuming P(X,, < E) > 0.
n

An immediate corollary to Proposition 13.4 is that Er k < cc for all


k = 1, 2, ... (Exercise 18(i)).
It is easy to check that (Exercise 8), instead of the independence of {X n }, it
is enough to assume that for some e > 0 there is a S > 0 such that

P(XX+ 1 > e l {X 1 , ... , Xn }) >, 8 > 0 for all n,


or (13.54)
P(Xn+1 < EI {X 1 ,...,Xn })>,S>0 foralln.

Thus, Proposition 13.4 has the following extension, which is useful in studying
processes other than random walks.

Proposition 13.5. The conclusion of Proposition 13.4 holds if, instead of the
assumption that {X} is i.i.d., (13.54) holds for a pair of positive numbers s, S.

The next result in this section is Doob's Maximal Inequality. A maximal


inequality relates the growth of the maximum of partial sums to the growth of
the partial sums, and typically shows that the former does not grow much faster
than the latter.

Theorem 13.6. (Doob's Maximal Inequality). Let {Z o , ... , Zn } be a martingale,


MM max{1Z0 1, ... , JZJ).
(a) For all A > 0,

P(MM , A) E(IZfhI(M.>A ) < EIZnj/A.


)
(13.55)

(b) If EZ < co then, for all A > 0,

P(Mn i A) E(ZR 1 (M>_ A)) < EZn/ 22 , (13.56)

and

EMn < 4EZ,^, . (13.57)


50 RANDOM WALK AND BROWNIAN MOTION

Proof. (a) Write .Fk := a{Z 0 , ... , Zk }, the sigmafield of events determined by
Z0 ,. . . , Z. Consider the events A o '= {IZ O I % A}, A k := {IZ; I <A for 0 < j < k,
IZkI >, A} (k = 1, ... , n). The events A k are pairwise disjoint and
n
UA k ={M 2}.
0

Therefore,
n
P(MM > A) _ Y P(A k ). (13.58)
k=0

Now I < IZk !/.1 on A k . Using this and the martingale property,

P(Ak) = E(l Ak ) E(lAkIZkl) _ E[ 1 A,,IE(Zn I `yk)I] < E[ 1 A k E(IZfI I "'k)]


(13.59)
Since ' A , is .F-measurable, 1 A , E(IZn I I ) = E(I Ak jZn I I .^k ); and as the
expectation of the conditional expectation of a random variable Z equals EZ
(see Chapter 0, Theorem 4.4),

P(Ak) ' A E[E(IA k IZnl _ E( 1 jZnl). (13.60)

Summing over k one arrives at (13.55).


(b) The inequality (13.56) follows in the same manner as above, starting with

P(Ak) = E(lA k ) Z E( 1 AkZk), (13.61)

and then noting that

E(Zn Ilk) = E(Zk (Zn Zk) 2 2 Zk(4 Zk) I . 9k)


= E(Zk + (Z Zk) 2 13k) 2ZkE(Zn Zk I Jk)

= E(Zk Ilk) + E((Zn Zk) Z I.k) i E(Zk I lk). (13.62)

Hence,
r
E( 1 AkZ^) ? E[ 1 AkE(Zk I . )] = E[E(1 Ak Zk I k)] = E( 1 AkZk). (13.63)

Now use (13.63) in (13.61) and sum over k to get (13.56).


In order to prove (13.57), first express M as 2 j'" A dA and interchange the
order of taking expectation and integrating with respect to Lebesgue measure


STOPPING TIMES AND MARTINGALES 51

(Fubini's Theorem, Chapter 0, Theorem 4.2), to get

co
EMS =
n
2
f A d2 dP = 2
om n o
21(M,M d i) dP

=
21 o
2(J 1{MA} dP) dA = 2
n f ,0 2P(M > A) dA. (13.64)
0

Now use the first inequality in (13.55) to derive

EM2 < 2 E( 1 (M,A)IZnl) dA = 2 J IZI(J M dA) dP


o
f00 n o /

=
2 fr i
IZIM dP = 2 E(IZI M) < 2(EZ2)'/ 2 (EMn) 112 , (13.65)

using the Schwarz Inequality. Now divide the extreme left and right sides of
(13.65) by (EM)" 2 to get (13.57). n

A well-known inequality of Kolmogorov follows as a simple consequence of


(13.56).

Corollary 13.7. Let X0 , X 1 ,. . . , X be independent random variables, EXX = 0


for l < j < n, EX j2 < oo for 0 <, j < n. Write S,k := Xo + + Xk , M:= max{ISkI:
0<k<n}. Then for every .?>0,

P(M>1A)<22 SdP J IM>_A)


z ES. (13.66)

The main results in this section may all be extended to continuous-parameter


martingales. For this purpose, consider a stochastic process {X1 : t > 0} having
right continuous sample paths t + X. A stopping time r for {X} is a random
variable with values in [0, oo], satisfying the property

{T<,t}E.^, forallt>,0, (13.67)

where;:= r{X: 0 < u < t} is the sigmafield of events that depend only on the
("past" of the) process up to time t. As in the discrete-parameter case, first
passage times to finite sets are stopping times. Write for Bore! sets B c ff8 1 ,

r B :=min{t>,0:X,eB}, (13.68)

for the first passage time to the set B. If B is a singleton, B = {y}, T, is written
simply as r y , the first passage time to the state y.
52 RANDOM WALK AND BROWNIAN MOTION

A stochastic process {Z,} is said to be a {.F}-martingale if

E(Z,1 5;) = Zs (s < t). (13.69)

For a Brownian motion {X} with drift , the process {Z, := XX t} is easily
seen to be a martingale with respect to {A}. For this {X,} another example is
{Z, :_ (X1 t) 2 ta 2 }, where a 2 is the diffusion coefficient of {X1 }.
The following is the continuous-parameter analogue of Theorem 13.6(b).

Proposition 13.8. Let {Z1 : t >, 0} be a right continuous square integrable


{.^^}-martingale, and let M,:= sup{jZj: 0 < s <, t}. Then

P(M, > ).) < EZ, /2 2 (). > 0) (13.70)

and
EM, <,4EZ,. (13.71)

Proof. For each n let 0 = t,, n <t 2 ,, < < = t be such that the sets
I:= {t j ,: I j < n} are increasing (i.e., I n c I + ,) and U 1. is dense in [0, t].
Write M1 , n := max{,Zj: I < j < n}. By (13.56),

P(Mr,, )) EZ`
> 2

Letting n j oo, one obtains (13.70) as the sets F := {M A,} increase to a set
that contains {M, > Al as n j oo.
Next, from (13.57),

EM2 < 4EZ, .

Now use the Monotone Convergence Theorem (Chapter 0, Theorem 3.2), to


get (13.71). n

The next result is an analogue of Theorem 13.3.

Proposition 13.9. (Optional Stopping). Let {Z,: t > 0} be a right continuous


square integrable {. }-martingale, and r a {3}-stopping time such that
(i) P(r < oo) = 1,
(ii) EIZ^I < oo, and
(iii) EZ,,,. * EZt as r cc.

CHAPTER APPLICATION 53

Then

EZi = EZo . (13.72)

Proof. Define, for each positive integer n, the random variables

k2-" on {(k - 1)2 " < t < k2 "}


- - (k = 0, 1, 2, ...)
t": rrr 13.73
1 00 on {i =oo}. ( )

It is simple to check that T " is a stopping time with respect to the sequence of
( )

sigmafields {.k2 -n: k = 0, 1, ...}, as is r " A r for every positive integer r. Since
( )

A r < r, it follows from Theorem 13.3 that

EZT ., A , = EZ o . (13.74)

Now ' , T for all n and T " J, r as n j oc. Therefore, t ( " ) A r j r A r. By the
( )

right continuity of t -> Z 1 , Z'-, r -> Z, A r .


Also, IZI,,,, r I < Mr := sup{lZt I: 0 < t < r}, and EMr <, (EM; )" 2 < oo by
(13.71). Applying Lebesgue's Dominated Convergence Theorem (Chapter 0,
Theorem 3.4) to (13.74), one gets EZt r = EZo . The proof is now complete by
assumption (iii). n

One may apply (13.72) exactly as in the case of the simple symmetric random
walk starting at zero (see (13.16) and (13.17)) to get, in the case it = 0,

P('c_ a > T 6 ):= P({X,} reaches b before - -


a) =a-+b
--- (13.75)

for arbitrary a > 0, b > 0. Similarly, applying (13.72) to {Z, :_ (XX - tp) 2 - ta l l,
as in the case of the simple asymmetric random walk (see (13.20)-(13.22), and
use (9.11)),

Cl - exp^_2a2 11(b + a)
(b + a)P(r_ Q > TO - a = 6 }) - a
Et _ __ (13.76)
a6 ^ 2ba '
(1 - exp{ - (+
6Z )N 1)
(

where {X} is a Brownian motion with a nonzero drift p, starting at zero.

14 CHAPTER APPLICATION: FLUCTUATIONS OF RANDOM


WALKS WITH SLOW TRENDS AND THE HURST PHENOMENON

Let { }": n = 1, 2, ...} be a sequence of random variables representing the annual


flows into a reservoir over a span of N years. Let S" = Y, + + Y", n >, 1,
54 RANDOM WALK AND BROWNIAN MOTION

So = 0. Also let YN = N 'SN . A variety of complicated natural processes (e.g.,


-

sediment deposition, erosion, etc.) constrain the life and capacity of a reservoir.
However, a particular design parameter analyzed extensively by hydrologists,
based on an idealization in which water usage and natural loss would occur at
an annual rate estimated by YN units per year, is the (dimensionless) statistic
defined by

RN MNmN
DN DN (14.1)

where

MN:=max{SnYN:n=0, 1,...,N}
_ (14.2)
mN :=min{SnYN :n=0, 1,...,N},

DN :=r (Y YN ) Z ] IIZ ,
YN = - -SN .
' (14.3)

The hydrologist Harold Edwin Hurst stimulated a tremendous amount of


interest in the possible behaviors of R N /DN for large values of N. On the basis
of data that he analyzed for regions of the Nile River, Hurst published a finding
that his plots of log(R N /DN ) versus log N are linear with slope H : 0.75. William
Feller was soon to show this to be an anomaly relative to the standard statistical
framework of i.i.d. flows Yl , ... , Y having finite second moment. The precise
form of Feller's analysis is as follows.
Let { Y} be an i.i.d. sequence with

EY=d, VarY=a 2 >0. (14.4)

First consider that, by the central limit theorem, SN = Nd + O(N 1 " 2 ) in the
sense that (SN Nd)/..N is, for large N, distributed approximately like a
Gaussian random variable with mean zero and variance 0 2 . If one defines

MN =max{Snd:0 n<N},
r N =min{Snd:0^n<,N}, (14.5)

RN = MN - rN,
then by the functional central limit theorem (FCLT)

Q^N
N
-
mN l (M, m), R
Q^N
NvJ /
^^
as N oo, (14.6)

where denotes convergence in distribution of the sequence on the left to the


CHAPTER APPLICATION 55

distribution of the random variable(s) on the right. Here

M := max{B,: 0 . t . l } ,
m:=min{B,:0 < t 1}, (14.7)
R':=M-m,

with {B 1 } a standard Brownian motion starting at zero. It follows that the


magnitude of R N is 0(N 112 ). Under these circumstances, therefore, one would
expect to find a fluctuation between the maximum and minimum of partial
sums, centered around the mean, over a period N to be of the order N 2 . To
see that this still remains for R N /(./ DN ) in place of R N /(fN o), first note
that, by the strong law of large numbers applied to Y and Y separately, we
have with probability 1 that YN -- d, and
N
DN= Yn-YN->EYI-d2= .2a
sN -.00. (14.8)
N n= 1

Therefore, with probability 1, as N -* oo,

R N RN
(14.9)
V '. DN \/'. Q

where "-S" indicates "asymptotic equality" in the sense that the ratio of the
two sides goes to 1 as N -- oo. This implies that the asymptotic distributions
of the two sides of (14.9) are the same. Next notice that

MN _ ( (S-nd)-n(, -d))
max
v,IN o-<n,<N aIN

( SiN - [Nt]d - [Nt] ( SN - Nd \1


= max n`)
o s t ,1 a\/7

max (B t - tB i ):= M (14.10)


ostsi

and

mN - [Nt] ( SN d \l
= min ( SIN`S [Nt]d min (B, - tB l ) := m,

and

^
C MN

a \/
mN
' aN
V
-^^(M,m). (14.11)
56 RANDOM WALK AND BROWNIAN MOTION

Therefore,

RN RN
(14.12)
DN . 7
/ a..JN

where R is a strictly positive random variable. Once again then, R N /DN , the
so-called rescaled adjusted range statistic, is of the order of O(N 1 J 2 ).
The basic problem raised by Hurst is to identify circumstances under which
one may obtain an exponent H > 2. The next major theoretical result
following Feller was again somewhat negative, though quite insightful.
Specifically, P. A. P. Moran considered the case of i.i.d. random variables
Y1 , Y2 , having "fat tails" in their distribution. In this case the re-scaling by
...

DN serves to compensate for the increased fluctuation in R N to the extent that


cancellations occur resulting again in H = Z.
The first positive result was obtained by Mandelbrot and VanNess, who
obtained H > ? under a stationary but strongly dependent model having
moments of all orders, in fact Gaussian (theoretical complements 1, and
theoretical complements 1.3, Chapter IV). For the present section we will
consider the case of independent but nonstationary flows having finite second
moments. In particular, it will now be shown that under an appropriately slow
trend superimposed on a sequence of i.i.d. random variables the Hurst effect
appears. We will say that the Hurst exponent is H if R N /(DN N") converges in
distribution to a nonzero real-valued random variable as N tends to infinity.
In particular, this includes the case of convergence in probability to a positive
constant.
Let {X} be an i.i.d. sequence with EX = d and Var X. = a 2 as above, and
let f (n) be an arbitrary real-valued function on the set of positive integers. We
assume that the observations Y are of the form

Y. = X + f (n). (14.13)

The partial sums of the observations are

S=Y1 +...+Y=X 1 +...+X+f(1)+... +f(n)

= S* + Z f(j), So = 0, (14.14)
i= 1
where
S,*= X1 ++X. (14.15)

Introduce the notation DN for the standard deviation of the X-values


{X: 1 < n < N},
N
DN Z = (X XN ) 2 . 1N _, ( 14.16)

CHAPTER APPLICATION 57
N
Then, writing IN = f (n)/N,
n=1

N _
D^:=1 L. (}'n Y, )2
N L=1
1 N1
N 2
/' N
= N n^l (X XN) 2 + N nZl ( f (n) JN) 2 + N nl (f(n) fN)(Xn XN)

1 N 2 N
=D2 + I (f(n) fN)Z + ( f (n) fv)(X XN). (14.17)
N= 1 N=1

Also write

MN = max {S nYN } = max S* nXN + Y- (f(j) fN )},


O-<n<N 0-<n5N j=1 )))

_ _ n _
m N = min {SS nYN } = min Sn nXN + Y- (f (j) fN )} , (14.18)
O-<n-<N O-<n-<N j=1

RN = MN M N , RN = max {Sn nXN } min {S nXN}.


O-<n-<N 05n-<N

For convenience, write

IN)' /IN(0) 0,
^N(n) : jY-
=1 (1(f)
(14.19)
AN max IN(n) min PN(n).
O-<n<-N 05n-<N

Observe that

MN max 1N(fl) + max (S* nXN ),


OSn-<N O^n-<N
(14.20)
mN min I.IN(n) + min (Sn nXN),
O-<n-<N On-<N

and

MN >, max N (n) + min (S . nXN ),

OSn-<N O-<n-<N
(14.21)

M N - min P N (n) + max {S, nXN).


05n5N

From (14.20) one gets R N < A N + RN, and from (14.21), R N > A N R. In

58 RANDOM WALK AND BROWNIAN MOTION

other words,

IR N - A N S < R. (14.22)

Note that in the same manner,

IR N - RNS < A N . (14.23)

It remains to estimate DN and A N .

Lemma 1. If f(n) converges to a finite limit, then DN converges to Q 2 with


probability 1.

Proof. In view of (14.17), it suffices to prove


N
I (f(n) fN) 2 + ? (f(n) fN)(X,, XN) -+ 0 as N -+ oo. (14.24)
N= 1 N

Let be the limit of f(n). Then

1 Z (f(n) - JN) 2 = 1 Y (f(n) - a) 2 - (f - a) 2 . (14.25)


N=1 N=1

Now if a sequence g(n) converges to a limit 0, then so do its Caesaro means


N -1 I N g(n). Applying this to the sequences (f(n) - a) 2 and f(n), observe that
(14.25) goes to zero as N -* oo. Next

1 1, (f(n) fN)(X,, XN) = 1 Y, (f(n) a)(X,, d) (IN a)(XN d).


N , N,1
(14.26)

The second term on the right clearly tends to zero as N increases. Also, by
Schwarz inequality,
1 N 1 N 1/2 N 1/2
- (f(n)-a)(Xn-d) -< 1 Z (f(n)- a)2
)(X^
^ - d)2
N n=1 N n=1 ( =J

By the strong law of large numbers, N (XX - d)2 - E ( X1 - d)2 = a 2


and the Caesaro means N - ' >.rt=1 (f(n) - a) 2 go to zero as N -> oo, since
(f(n)a) 2 ,.0asn*oo. n

From (14.12), (14.22), and Lemma 1 we get the following result.


CHAPTER APPLICATION 59

Theorem 14.1. If f (n) converges to a finite limit, then for every H > 2,

RN AN I p 0 in probability as N . oo. (14.27)


DN N H DN NH

In particular, the Hurst effect with exponent H > Z holds if and only if, for
some positive number c',

lm ^
N = c'. (14.28)
N-' N "

Example 1. Take

f(n)=a+c(n+m) (n = 1,2,...), (14.29)

where a, c, m, are parameters, with c 0 0, m >, 0. The presence of in indicates


the starting point of the trend, namely, in units of time before the time n = 0.
Since the asymptotics are not affected by the particular value of m, we assume
henceforth that m = 0 without essential loss of generality. For simplicity, also
take c > 0. The case c < 0 can be treated in the same way.

First let < 0. Then f (n) a, and Theorem 14.1 applies. Recall that
AN = max N (n) min P N (n), (14.30)
0-<n5N O-<n-<N

where
n

1 1N (n) = Y- (f(j) IN) for 1 < n ,< N,


i =1 (14.31)
N( 0 ) = 0.

Notice that, with m = 0 and c > 0,

N
) 14.32)
N(n) pN(n 1) = c n !J (

is positive for n < (N ' Ij 1j6)116, and negative or zero otherwise. This shows
-

that the maximum of p h (n) is attained at n = n 0 given by

1
1 N 1/
n o = j jQ (14.33)

where [x] denotes the integer part of x. The minimum value of p(fl) is zero,
60 RANDOM WALK AND BROWNIAN MOTION

attained at n = 0 and n = N. Thus,

ON = /N(no) = c Y ^k 1 E j) (14.34) .

k=1 Ni=i

By a comparison with a Riemann sum approximation to f o x dx, one obtains

(l +)N for > 1


1j=N jl#1 N-'logN for= 1(14.35)
Ni =1 ; YN J N N-' j for < 1.
j=1

By (14.33) and (14.35),


(1 + ) 'I N
- for > 1
N/log N for = 1
no
C l
1IB
^ j1 N 1- for
for < 1.
(14.36)

From (14.34)(14.36) it follows that

ng N c1 Ni+fl,
cno > -1,
1 -^ "
\ 1+
A N cn o (n^ ' log n o N -1 log N) c log N, = 1, (14.37)
X
C Y_ j# = c 2 , <-1.
j=1

Here c l , c 2 are positive constants depending only on . Now consider the


following cases.

CASE 1: 2 < < 0. In this case Theorem 14.1 applies with H() = 1 + > 2.
Note that, by Lemma 1, DN a with probability 1. Therefore,

N' - c' >0


RN in probability as N cc, if 2 < < 0. (14.38)
DN +

CASE 2: < Z. Use inequality (14.23), and note from (14.37) that
O N = o(N l l 2 ). Dividing both sides of (14.23) by DN N 112 one gets, in probability
asN --goo,

RN R" R"
if < 2 .
DN N 1 I 2 DN N 1 I 2 QN1 2
1 (14.39)
CHAPTER APPLICATION 61

But RN/vN'" Z converges in distribution to R by (14.11). Therefore, the Hurst


exponent is H() = Z.

CASE 3: = 0. In this case the Y are i.i.d. Therefore, as proved at the outset,
the Hurst exponent is 2.

CASE 4: > 0. In this case Lemma 1 does not apply, but a simple computation
yields

DN c 3 N with probability 1 as N + oo, if >0. (14.40)

Here c 3 is an appropriate positive number. Combining (14.40), (14.37), and


(14.22) one gets

RN 4 in probability as N oo , if > 0, (14.41)


NDN c

where c 4 is a positive constant. Therefore, H() = 1.

CASE 5: /i = . A slightly more delicate argument than used above shows

that

R 1 j2 max (B 1 tB, 2ct(1 t)),


ost,i
(14.42)
DNN

where {B,} is a standard Brownian motion starting at zero. Thus, H(


Z) = Z
In this case one considers the process {Z N (s)} defined by

_nl S nYN
ZN( N) forn= l,2,...,N,
V ' DN

and linearly interpolated between n/N and (n + 1)/N. Then {Z N (s)} converges in
distribution to {BS + 2c,(1 ../s)}, where B'} is the Brownian bridge. In
this case the asymptotic distribution of R N /( DN) is the nondegenerate
distribution of

max B min B,
oss,i osssi

(Exercise 1).
The graph of H() versus in Figure 14.1 summarizes the results of the
preceding cases 1 through 5.

62 RANDOM WALK AND BROWNIAN MOTION

Figure 14.1

For purposes of data analysis, note that (14.38) implies

log RN [loge'+(l +) log N] 0 if 2 < <0. (14.43)

In other words, for large N the plot of log R N /DN against log N should be
approximately linear with slope H = 1 + , if < < 0.
Under the i.i.d. model one would expect to find a fluctuation between the
maximum and the minimum of partial sums, centered around the sample mean,
over a period N to be of the order of N 1 / 2 . One may then try to check the
appropriateness of the model, i.e., the presumed i.i.d. nature of the observations,
by taking successive (disjoint) blocks of Y. values each of size N, calculating
the difference between the maximum and minimum of partial sums in each
block, and seeing whether this difference is of the order of N" 2 . In this regard
it is of interest that many other geophysical data sets indicative of climatic
patterns have been reported to exhibit the Hurst effect.

EXERCISES

Exercises for Section I.1


1. The following events refer to the coin-tossing model. Determine which of these
events is finite-dimensional and calculate the probability of each event.
(i) 1 appears for the first time on the 10th toss.
(ii) 1 appears for the last time on the 10th toss.
(iii) 1 appears on every other toss.
(iv) The proportion of 1's stabilizes to a given value r as the number of tosses is
increased without bound.
(v) Infinitely many l's occur.
2. For the coin-tossing experiment (Example 1) determine the probability that the
first head is followed immediately by a tail.
3. Let L. denote the length of the run of 1's starting at the nth toss in the coin-tossing
model. Calculate the distribution of L.

EXERCISES 63

Each integer lattice site of Z' is independently colored red or green with
probabilities p and q = 1 - p, respectively. Let E m be the event that the number
of green sites equals the number of red sites in the block of sites of side lengths
2m (sites per side) with a corner at the origin. Calculate P(E m i.o.) for d > 3. [Hint:
Use the Borel-Cantelli Lemma, Chapter 0, Lemma 6.1.]
5. A die is repeatedly tossed and the number of spots is recorded at each stage. Fix
j, 1 _< j < 6, and let p,, be the probability that j occurs among the first n tosses.
Calculate p. and the probability that j eventually occurs.
6. (A Fair Coin Simulation) Suppose that you are given a coin for which the
probability of a head is p where 0 < p < 1. At each unit of time toss the coin twice
and at the nth such double toss record:
XX = 1 if a head followed by a tail occurs,
X, = -1 if a tail followed by a head occurs,
XX = 0 if the outcomes of the double toss coincide.
Let,r=min{n_> 1:X= 1 or -1}.
(i) Verify that P(t < oo) = 1.
(ii) Calculate the distribution of Y = X.
(iii) Calculate Er.
7. Show that the two probability distributions for unending independent tosses of a
coin, corresponding to distinct probabilities Pi 0 p Z for a head in a single toss,
assign respective total probabilities to mutually disjoint subsets of the coin-tossing
sample space S2. [Hint: Consider the density of l's in the various possible sequences
in S2 and use the SLLN (Chapter 0, Theorem 6.1). Such distributions are said to
be mutually singular.]
8. Suppose that M particles can be in each of N possible states s,, S21'.. , s N . Construct
a probability space and calculate the distribution of (X ... , XN ), where Xi is the
number of particles in state s ; , for each of the following schemes (i)-(iii).
(i) (Maxwell-Boltzmann) The particles are distinguishable, say labeled m i , . .. , m M
,

and are randomly assigned states in such a way that all possible
distinct assignments are equally likely to occur. (Imagine putting balls
(particles) into boxes (states).)
(ii) (Bose-Einstein) The particles are not distinguishable but are randomly
assigned states in such a way that all possible values of the numbers of particles
in the various states are equally likely to occur.
(iii) (Fermi-Dirac) The particles are not distinguishable but are randomly
assigned states in such a way that there can be at most one particle in any
one of the states and all possible values of the numbers of particles in various
states under the exclusion principle are equally likely to occur.
(iv) For each of the above distributions calculate the asymptotic distribution of
X as M and N -. oo such that MIN .- where p > 0 is the asymptotic
;

density of occupied states.


9. Suppose that the sample paths of {X,} solve the following problem, dX/dt = -X,
X0 = 1, with probability 1, where is a random parameter having a normal
distribution.
(i) Calculate EX,.
(ii) Calculate the solution x(t) to the problem with replaced by E.
(iii) How do EX, and x(t) compare for short times? What about in the long run?

64 RANDOM WALK AND BROWNIAN MOTION

(iv) Calculate the distribution of the process{X,}.


*10. (i) Show that the sample space S2 = {w = (ah, w 2 ,.. .): w, = I or 0} for repeated
tosses of a coin is uncountable.
(ii) Show that under binary expansion of numbers in the unit interval, the event
{w E S2: CO e l , w Z = 8 Z , .. , co" = e"}, e i = 0 or 1, is represented by an
interval in [0, 1] of length 1/2.
*11. Suppose that S2 = [0,1] and 9 1 is the Borel sigmafield of [0, 1]. Suppose that P is
defined by the uniform p.d.f. f(x) = 1, x E [0, 1]. Remove the middle one-third
segment from [0,1] and let J = [ 0,1/3] and J 12 = [ 2/3,1] be the remaining left
and right segments, respectively. Let 1 2 = J l , u J 12 . Next remove the middle
one-third from each of these and let J ill = [0, 1/9] and J 112 = [ 2/9, 3/9] be the
remaining left and right segments of J l ,, and let J121 = [ 6/9, 7/9] and
J 122 = [ 8/9, 1] be the remaining left and right intervals of J 12 . Let
1 3 = J 111 L) J112 .. J121 v J122. Repeat this process for each of these segments, and
so on. At the nth stage, I n is the union of 2" ' disjoint intervals each of length
-

1/3 " - t . In particular, the probability (under f) that a randomly selected point
belongs to I", i.e., the length of 1", is P(I") = 2" '/3" '. The sets
- -

I l 1 2 In ... form a decreasing sequence. The Cantor set is


the limiting set defined by C = n I".
(i) Show that C is in one-to-one correspondence with the interval (0, 1].
(ii) Verify that C is a Borel set and calculate P(C).
(iii) Construct a continuous c.d.f that does not have a p.d.f. [Hint: Define the
Cantor ternary function F on [0,1] as follows. Let F0 (x) = (2k 1)/2" for x
in the kth interval from the left among those 2" ' subintervals deleted for the
-

first time at the nth stage. Then F 0 is well defined and has a continuous
extension to a function F on all of [0,1] with F(l) = I and F(0) = 0.]

Exercises for Section I.2


1. Let {X": n 1} be an i.i.d. sequence with EX.' < eo, and define the general random
walk starting at x by S = x, S; = x + X, + + X". Calculate each of the following
numerical characteristics of the distribution.
(i) ES.'
(ii) Var S.'
(iii) COV(Sn, Sm)
2. (A Mechanical Model) Consider the scattering device depicted in Figure Ex.I.2,
consisting of regularly spaced scatterers in a triangular array. Balls are successively
dropped from the top, scattered by the pegs and finally caught in the bins along
the bottom row.
(i) Calculate the probabilities that a ball will land in each of the respective bins if
the tree has n levels.
(ii) What is the expected number of balls that will fall in each of the bins if N balls
are dropped in succession?
*3. (Dorfman 's Blood Testing Scheme) Prior to World War lithe army made individual
tests of blood samples for venereal disease. So N recruits entering a processing
station required N tests. R. Dorfman suggested pooling the blood samples of m
individuals and then testing the pooled sample. If the test is negative then only one

EXERCISES 65

u u 0 u II u
n=5
Figure Ex.I.2

test is needed. If the test of the pool is positive then at least one individual has the
disease and each of the m persons must be retested individually, resulting in this
event in m + 1 tests. Let X 1 , X 2 , ... , XN be an i.i.d. sequence of 0 or 1 valued
random variables with p = P(X = 1) for n = 1, 2, .... Let the event X. = l} be
used to indicate that the nth individual is infected; then the parameter p measures
the incidence of the disease in the population. Let S. = X, + X 2 + + X. denote
the number of infected individuals among the first n individuals tested, S o = 0. Let
Tk denote the number of tests required for the kth group of m individuals tested,
k = 1,2,...,[N/m]. Thus, form? 2, Tk = m + I if5,kSm(k_ ^0and Tk = I if
Smk 'Sm(k-,) = 0. The total number of tests (cost) for N individuals tested in groups
of size m each is

[N!m]
C N = Tk+(Nm[N/m]), m? 2.
k=1

Find m such that, for given N (large) and p, the expected number of tests per person
is minimal. [Hint: Consider the limit as N oo and show that the optimal m, if
one exists, is the integer value of m that minimizes the function

ECN I
c(m)=lim =1+--(1p) m , m? 2, c(1) = 1.
N -' N in

Analyze the extreme values of the function g(x) = (I/x) (1 p)x for x > 0
(see D. W. Turner,. F. E. Tidmore, D. M. Young (1988), SIAM Review, 30,
pp. 119-122).]
4. Let {S;} be the simple symmetric random walk starting at x and let
66 RANDOM WALK AND BROWNIAN MOTION

u(n, y) = P(S = y). Verify that u(n, y) satisfies the following initial value problem

a^u = 2oyu, u(0, y) = bx,Y>

where is Kronecker's delta and

u(n, y) = u(n + 1, y) u(n, y),


V y u(n, y) = u(n, y + Z) u(n, y 2)
5. Suppose that {S,.,'"} and {S 2 } are independent (general) random walks whose
increments (step sizes) have the common distribution {p x : x E Z}. Verify that
{T = S S 2) } is a random walk with step size distribution {p x : x e Z} given by
a symmetrization of {p,: x e 7L}, i.e. px = XEE 7Z. Express {p s } in terms of
{px : xE Z}.

Exercises for Section I.3


1. Write out a complete derivation of (3.3) by conditioning on X,.
2. (i) Write out the symmetry argument for (3.8) using (3.6). [Hint: Look at the
random walk {S.x: n >, 0}.]
(ii) Verify that (3.6) can be expressed as

sink{y (x c)}
P(Td < Tx) = exp{y p (d x)} where y,, = ln((q/p)" 2 )
Binh{y,,(d c)}'
3. Justify the use of limits in (3.9) using the continuity properties (Chapter 0, (1.1),
(1.2)) of a probability measure.
4. Verify that P(S.x j4 y for all n >, N) = 0 for each N = 1, 2, ... to establish (3.18).
5. (i) If p < q and x < d, then give the symmetry argument to calculate, using (3.9),
the probability that the simple random walk starting at x will eventually reach
d in (3.10).
(ii) Verify that (3.10) may be expressed as

P(Ta < oo) = e - ap (d- "^ for some a >- 0.


6. Let rl, denote the "time of the first return to x" for the simple random walk {S.}
starting at x. Show that

P(rl, < < co) = P({S} eventually returns to x)


= 2 min(p, q).

i < cc) + q P(T' -' < cc).]


[Hint: P(nx < cc) = pP(T

7. Let M - M := sup, S,,, m = m := inf,,, S. be the (possibly infinite) extreme
statistics for the simple random walk {S} starting at 0.
(i) Calculate the distribution function and probability mass function for M in the
case p < q. [Hint: Use (3.10) and consider P(M > t).]
(ii) Do the calculations corresponding to (i) for m when p > q, and for M, and m'.

EXERCISES 67

8. Let {X"} be an arbitrary i.i.d. sequence of integer-valued displacement random


variables for a general random walk on Z denoted by S" = X t + + X", n >_ 1,
and So = 0. Show that the random walk is transient if EX, exists and is noqzero.
Say that an "equalization" occurs at time n in coin tossing if the number of heads
acquired by time n coincides with the number of tails. Let E. denote the event that
an equalization occurs at time n.
(i) Show that if p j4 Z then P(E 2 ) -r c,r"n -1 / 2 , for some 0< r < 1, as n - oo.
[Hint: Use Stirling's formula: n! (2rtn)` 12 n"e - " as n -. cc; see W. Feller (1968),
An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed:,
pp. 52-54, Wiley, New York.]
(ii) Show that if p = Z then P(E 2 ) c 2 n -1 / 2 as n -+ oo.
(iii) Calculate P(E i.o.) for p 96 2 by application of the Borel-Cantelli Lemma
(Chapter 0, Lemma 6.1) and (i). Discuss why this approach fails for p = 2.
(Compare Exercise 1.4.)
(iv) Calculate P(E n i.o.) for arbitrary values of p by application of the results of this
section.
10. (A Gambler's Ruin) A gambler plays a game in which it is possible to win a unit
amount with probability p or lose a unit amount with probability q at each play.
What is the probability that a gambler with an initial capital x playing against an
infinitely rich adversary will eventually go broke (i) if p = q = 2, (ii) if p > z?
11. Suppose that two particles are initially located at integer points x and y, respectively.
At each unit of time a particle is selected at random, both being equally likely to
be selected, and is displaced one unit to the right or left with probabilities p and q
respectively. Calculate the probability that the two particles will eventually meet.
[Hint: Consider the evolution of the difference between the positions of the two
particles.]
12. Let {Xn } be a discrete-time stochastic process with state spaces S = Z such that
state y is transient. Let Z N denote the proportion of time spent in state y among the
first N time units. Show that lim N -. T N = 0 with probability 1.
13. (Range of Random Walk) Let {S"} be a simple random walk starting at 0 and
define the Range R. in time n by

R"= #{S 0 =0,S i ,...,S"}.

R. represents the number of distinct states visited by the random walk in time 0 to n.
(i) Show that E(R/n) -' ^p qi as n -* oo. [Hint: Write

R" _ /k where I. = 1, I k = I k (S,, . , Sk ) is defined by


k=0

_ (l ifSk;S;forallj=0,1,...,k-1,
Ik Sl0 otherwise.
Then

Elk = P(Sk - Sk _ 1 ^ 0, Sk - Sk _ 2 ^ 0, ... , Sk 96 0)


=P(S; r0,j= 1,2,...,k) (justify)
68 RANDOM WALK AND BROWNIAN MOTION

= 1 P(time of the "first return to 0" _< k)


k-1
= 1 Y {P(T =J)p + P(To 1 =1)q} . Ip ql ask-+ co. ]
j=1
(ii) Verify that R"/n + 0 in probability as n * oo for the symmetric case p = q = Z.
[Hint: Use (i) and Chebyshev's (first moment) Inequality. (Chapter 0, (2.16).]
For almost sure convergence, see F. Spitzer (1976), Principles of Random Walk,
Springer-Verlag, New York.
14. (LLD. Products) Let X,, X2 ,... be an i.i.d. sequence of nonnegative random
variables having a nondegenerate distribution with EX, = 1. Define T. = fl. , Xk,
n > 1. Show that with probability unity, T. 0 as n --> oo. [Hint: If P(X, = 0) > 0
then

P(T" = 0 for all n sufficiently large)

= P(Xk = 0 for some k) = lim (1 [1 P(X, = 0)]") = 1.


n-.m

For the case P(X 1 = 0) = 0, take logarithms and apply Jensen's Inequality (Chapter
0, (2.7)) and the SLLN (Chapter 0, Section 0.6) to show log T. -- 00 a.s. Note the
strict inequality in Jensen's Inequality by nondegeneracy.]
15. Let {S"} be a simple random walk starting at 0. Show the following.
(i) If p = Z, then P(S" = 0) diverges.
(ii) If p Z, then =o P(S" = 0) = (1 4pq) -112 = Ip q1'. [Hint: Apply the
Taylor series generalization of the Binomial theorem to z"P(Sn = 0) noting
that

C 2n (pq)" =
n )
(_1)(4pq)n (
/

\ n
2 I ]

(iii) Give a proof of the transience of 0 using (ii) for p ^ Z. [Hint: Use the
BorelCantelli Lemma (Chapter 0).]
16. Define the backward difference operator by

VO(x) = O(x) 0(x 1).

(i) Show that the boundary-value problem (3.4) can be expressed as

ZagOZ^+V =0on(c,d) and 4(c)=0, 4i(d)=1,

where u = p q and a 2 = 2p.


(ii) Verify that if p = q( = 0), then 0 satisfies the following averaging property on
the interior of the interval.

4(x) = [4(x 1) + 4(x + 1)]/2 for c < x < d.

Such a function is called a harmonic function.


(iii) Show that a harmonic function on [c, d] must take its maximum and its
minimum on the boundary 0 = {c, d}.

EXERCISES 69

(iv) Verify that if two harmonic functions agree on a, then they must coincide on
all of [c, d].
(v) Give an alternate proof of the fact that a symmetric simple random walk starting
at x in (c, d) must eventually reach the boundary based on the above
maximum/minimum principle for harmonic functions. [Hint: Verify that the sum
of two harmonic functions is harmonic and use the above ideas to determine
the minimum of the escape probability from [c, d] starting at x, c -< x -< d.]
17. Consider the simple random walk with p -< q starting at 0. Let N denote the number ;

of visits to j > 0 that occur prior to the first return to 0. Give an argument that
EN; = (p/q)'. [Hint: The number of excursions to j before returning to 0 has a
geometric distribution. Condition on the first displacement.]

Exercises for Section I.4

Let T" denote the time to reach the boundary for a simple random walk {S}
starting at x in (c, d). Let p = 2p - 1, a 2 = 2p.
(i) Verify that ET" < oo. [Hint: Take x = 0. Choose N such that P(ISI > d - c).

Argue that P(T > rN) -< ('-,)', r = 1, 2, ... using the fact that the r sums over
(jN - N, jN], j = 1, ... , r, are i.i.d. and distributed as SN.]
(ii) Show that m(x) = ETx solves the boundary value problem

aZ
2 -V m+pVm= -1,
m(c) = m(d) = 0, Vm(x):= m(x) m(x 1).

(iii) Find an analytic expression for the solution to the nonhomogeneous boundary
value problem for the case p = 0. [Hint: m(x) = -x 2 is a particular solution
and 1, x solve the homogeneous problem.]
(iv) Repeat (iii) for the case p 96 0. [Hint: m(x) = Iq - p 'x is a particular solution
-

and 1, (q/p)" solve the homogeneous problem.]


(v) Describe ETx in the case c = 0 and d -^ oc.
2. Establish the following identities as a consequences of the results of this section.

(i) Y- N
N
( I N +y 2 -N =1
1 forally#0.
NIIyI
N+yeven 2

(ii) For p > q,

Y
N,IyI
J IyIIN
N
N
+ 1
1
p (N+y)lZ q (N y)/2 = / p\ v
f I
fort'>0

for y < 0.
N+ y even 2 \q )

(iii) Give the corresponding results for p < q.



70 RANDOM WALK AND BROWNIAN MOTION

3. (A Reflection Property) For the simple symmetric random walk {S} starting at 0
show that, for y > 0,

P(max S. > y = 2P(S N -> Y) P(SN = Y)


n5N

Let S. = X l + + X, So = 0, and suppose that X 1 , X2 , ... are independent


random variables such that S. S S S 2 , ... have symmetric probability
distributions.
(i) Show that P(max, N S. >- y) = 2P(SN y) P(S N = y) for all y> 0.
(ii) Show that P(max, N S x, SN = y) = P(SN = y) if y x and P(max N S > x,
SN =y)= P(SN =2xy)ify-<x.

5. (A Gambler's Ruin) A gambler wins or loses 1 unit with probabilities p and


q = 1 p, respectively, at each play of a game. The gambler has an initial capital
of x units and the adversary has an initial capital of d> x units. The game is played
repeatedly until one of the players is broke.
(i) Calculate the probability that the gambler will eventually go broke.
(ii) What is the expected duration of the game?
6. (Bertrand's Classical Ballot Problem) Candidates A and B have probabilities p and
1 p (0 <p < 1) of winning any particular vote. If A scores m votes and B scores
n votes, m > n, then what is the probability that A will maintain a lead throughout
the process of sequentially counting all m + n votes cast? [Hint: P (out of m + n
votes cast, A scores m votes, B scores n votes, and A maintains a lead
throughout) = P(T n = m + n).]
-

7. Let {S} be the simple random walk starting at 0 and let

MN =max{S:n=0, 1,2,...,N},
m N =min{S:n=0, 1, 2,...,N}.

(i) Calculate the distribution of MN .


(ii) Calculate the distribution of m N .
(iii) Calculate the joint distribution of MN and S N . [Hint: Let a > 0, b be integers,
a -> b. Then
N
P(MN >,a,S N ^b)=P(Ta -<N,SN ->b)= P(Tn =n,SN ^b)
n=1
N
_ Z P(Ta =n,SN S,>ba)

N
_ P(Tn =n)P(SN _>-ba)
n=1

8. What percentage of the particles at y at time N are there for the first time in a dilute
system of many noninteracting (i.e., independent) particles each undergoing a simple
random walk starting at the origin?
*9. Suppose that the points of the state space S = 1 are painted blue with probability

EXERCISES 71

p or green with probability 1 - p, 0 -< p < 1, independently of each other and of a


simple random walk {Sn } starting at 0. Let B denote the random set of states (integer
sites) colored blue and let N(p) denote the amount of time (occupation time) that
the random walk spends in the set B prior to time n, i.e.,

Nn(P) _ 1B(Sk)
k=0

(i) Show that EN(p) = (n + 1)p. [Hint: EI B (Sk ) = E{E[I B (Sk ) I Sk ]}.]
(ii) Verify that

for p = Z
lim Var{ Nn(p) l - cap
l n ) P(I - P)
for p z.
1p - 9^

[Hint: Use Exercise 13.15.]


(iii) For p ^ Z, use (ii), to show Nn (p)/n -+ p in probability as n -+ Go. [Hint: Use
Chebyshev's Inequality.]
(iv) For p = 1, show NN (p)/n -^ p in probability. [Hint: Show Var(N n (p)/n) -* 0 as
n. co.]

10. Apply Stirling's Formula, (k! _ (27rk)'/ Z k k e -k (1 + 0(l)) as k -. 00), to show for the
simple symmetric random walks starting at 0 that

(i) P(T N)
'
as N -^ oo.
(2rz)1'^z N-3/2

(ii) ETy = co.

Exercises for Section 1.5

I. (i) Complete the proof of P61ya'.s Theorem fork > 3. (See Exercise 5 below.)
(ii) Give an alternative proof of transience for k > 3 by an application of the
Borel-Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the
lemma be directly applied to prove recurrence for k = 1, 2?
2. Show that

kO \k/ Z =()
/ .

[Hint: Consider the number of ways in which n balls can be selected from a box
of n black and n white balls.]
3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability
of a return to (0, 0) at time 2n is the same as that for two independent walkers,
one along the horizontal and the other along the vertical, to be at (0, 0) at
time 2n. Also verify this by a geometric argument based on two independent
walkers with step size 1/ f and viewed along the axes rotated by 450


72 RANDOM WALK AND BROWNIAN MOTION

(ii) Show that relations (5.5) hold for a general random walk on the integer lattice
in any dimension. Use these to compute, for the simple symmetric random
walk in dimension two, the probabilities fj that the random walk returns to
the origin at time j for the first time for j = 1, ... , 8. Similarly compute fj in
dimension three for I < j < 4.
4. (i) Show that the method of Exercise 3(i) above does not hold in k = 3 dimensions.
(ii) Show that the motion of three independent simple symmetric random walkers
starting at (0, 0, 0) in 71 3 is transient.
5. Show that the trinomial coefficient

n n!
( j, k, n j k j!k!(njk)!

is largest for j, k, n j k, closest to n/3. [Hint: Suppose a maximum is attained


for j = J, k = K. Consider the inequalities of the form

n ^ n
( j, k,njk)_ < (J,K,nJK)

when j, k and/or n j k differ from J, K, n J K, respectively, by 1. Use


this to show In J 2KI < 1, in K 2JI < 1.]


6. Give a probabilistic interpretation to the relation (see Eq. 5.9) = 1/(1 y). [Hint:
Argue that the number of returns to 0 is geometrically distributed with parameter y.]
7. Show that a multidimensional random walk is transient when the (one-step) mean
displacement is nonzero.
8. Calculate the probability that the simple symmetric k-dimensional random walk
will return i.o. to a previously occupied site. [Hint: The conditional probability,
given S o , ... , S, that S + 1 ^ {S o , ... , S} is at most (2k 1)/2k. Check that

2k-1 '"
P(S+1,...,S,+mE{So,...,S}`5
2k )

for each m >, 1.]


9. (i) Estimate (numerically) the expected number ( 1) of returns to the origin.
[Hint: Estimate a bound for C in (5.15) and bound (5.16) with a Riemann
integral.]
(ii) Give a numerical estimate of the probability y that the simple random walk
in k = 3 dimensions will return to the origin. [Hint: Use (i).]
*10. Calculate the probability that a simple symmetric random walk ink = 3 dimensions
will eventually hit a given line {(ra, rb, rc): r e Z}, where (a, b, c) & (0, 0, 0) is a
lattice point.
*11. (A Finite Switching Network) Let F = (x 1 , X21... , x k } be a Finite set of k sites
that can be either "on" (1) or "off" (0). At each instant of time a site is randomly
selected and switched from its current state e to 1 e. Let S = {0,1 } F =
{(e l , ... , e k ): e ; = 0 or 1}. Let X1 , X2 , ... be i.i.d. S-valued random variables with

EXERCISES 73

P(X" = e ; ) = p ; , i = i,... , k, where e ; e S is defined by e.(j) = b ; , j , and p, _> 0,


I p i = 1. Define a random walk on S, regarded as a group under coordinatewise
addition mod 2, by

S0=(0,0,...,0).

Show that the configuration in which all switches are off is recurrent in the cases
k = 1, 2. The general case will follow from the methods and theory of Chapter II
when k < oo. The problem when k = cc has an interesting history: see F. Spitzer
(1976), Principles of Random Walk, Springer Verlag, New York, and references
therein.
*12. Use Exercise 11 above and the examples of random walks on Z to arrive at a
general formulation of the notion of a random walk on a group. Describe a random
walk on the unit circle in the complex plane as an illustration of your ideas.
13. Let {X"} denote a recurrent random walk on the 1-dimensional integer lattice.
Show that

E,(number of visits to x before hitting 0) = + cc, x ^ 0.

[Hint: Translate the problem by x and consider that starting from 0, the number
of visits to 0 before hitting x is bounded below by the number of visits to 0
before leaving the (open) interval centered at 0 of length lxi. Use monotonicity to
pass to the limit.]

Exercises for Section I.6

1. Let X be an (n + 1)-dimensional Gaussian random vector and A an arbitrary linear


transformation from 11" + ' to I. Show that AX is a Gaussian random vector in 11'.
2. (i) Determine the consistent specification of finite-dimensional distributions for the
canonical construction of the simple random walk.
(ii) Verify Kolmogorov consistency for Example 3.
*3. (A Kolmogorov Extension Theorem: Special Case) This exercise shows the role of
topology in proving what otherwise seems a (purely) measure-theoretical assertion
in the simplest case of Kolmogorov's theorem. Let fl = {0, 1 } be the product space
consisting of sequences of the form w = (cw w 2 ,...) with w i e {0, 1}. Give S2 the
product topology for the discrete topology on {0, 1}. By Tychonoff's Theorem from
topology, this makes S compact. Let X" be the nth coordinate projection mapping
on S2. Let .y be the Borel sigmafield for ).
(i) Show that F coincides with the sigmafield for S2 generated by events of the form

F(E 1 ,.. ,E")={we c2:w 1 =E i for i= 1,2,.. ,n},

for an arbitrarily prescribed sequence E i ..... E" of l's and 0's. [Hint:
p(w, n) = J^ ^ Iw" 7"I/ 2 " metrizes the product topology on S2. Consider the
open balls of radii of the form r = 2 -^' centered at sequences which are 0 from
some n onward and use separability.]
74 RANDOM WALK AND BROWNIAN MOTION

(ii) Let {P"} be a consistent family of probability measures, with P" defined on
(Ii",."), and such that P. is concentrated on Q, = {0,1 }". Define a set function
for events of the form F = F(E 1 , ... , e") in (i), by

P(F)=P"({w ; =c,i=1,2,.. ,n}).

Show that there is a unique probability measure P on the sigmafield .^" of


cylinder sets of the form

C={a eft(w,.. .w n )EB},

where B c {0, 1 }", which agrees with this formula for Fe .f", n >_ 1.
(iii) Show that F := Un 1 .f" is a field of subsets of Q but not a sigmafield.
(iv) Show that P is a countably additive measure on F. [Hint: f) is compact and
the cylinder sets are both open and closed for the product topology on D.]
(v) Show that P has a unique extension to a probability measure on F. [Hint:
Invoke the Caratheodory Extension Theorem (Chapter 0, Section 1) under (iii),
(iv)]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter
stochastic process.
4. Let (S2, F, P) and (S, 2) represent measurable spaces. A function X defined on S2
and taking values in S is called measurable if X - `(B) E F for all Be 2', where
X -' (B) = {X E B} _ {w E D: X(o) e B}. This is the meaning of an S-valued random
variable. The distribution of X is the induced probability measure Q on ,P defined by

Q(B)=P(X - '(B))=P(XEB), BEY.

Let (S2, .F, P) be the canonical model for nonterminating repeated tosses of a coin
and X(a) = w", WE n. Show that {X" X.1, . ..}, m an arbitrary positive integer,
is a measurable function on (S2, F, P) taking values in (S2, F) with the distribution
P; i.e., {X,", Xm+ ,, ... , } is a noncanonical model for an infinite sequence of coin
tossings.
5. Suppose that Di" and D'2 are covariance matrices.
(i) Verify that aD" + D 21 , a, _> 0, is a covariance matrix.
(ii) Let {D ( " ) = ((a;;'))} be a sequence of covariance matrices (k x k) such that
lim a v;j ) = a i; exists. Show that D = ((a u )) is a covariance matrix.
*6. Let (t) = f e"'p(dx) be the Fourier transform of a positive finite measure p.
(Chapter 0, (8.46)).
(i) Show that ((t, t.)) is a nonnegative definite matrix for any t, < < t k .
(ii) Show that (t) = e - I` 1 if p is the Cauchy distribution.
*7. (Plya Criterion for Characteristic Functions) Suppose that 4 is a real-valued
nonnegative function on ( cc, oo) with 4(t) = 4(t) and 0(0) = 1. Show that if 0
is continuous and convex on [0, cc), then 0 is the Fourier transform (characteristic
function) of a probability distribution (in particular, for any t, < t 2 < < t k , k >_ 1,
((4(t ; ti ))) is nonnegative definite by Exercise 6), via the following steps.
(i) Check that 1 Its, Its _< 1, t E 68',
Y(t) = 0,
Itl > 1

EXERCISES 75

is the characteristic function of the (probability) measure with p.d.f.


y(x) = (1 - cos x)/itx 2 , - oo <x < oc. Use the Fourier inversion formula
[(8.43), Chapter 0].
(ii) Check that the characteristic function of a convex combination (mixture) of
probability distributions is the corresponding convex combination of
characteristic functions.
(iii) Let 0 < a, < <a be real numbers, p ; >- 0, Y-"_, p ; = 1. Show that
D(t) = p, y(t/a 1 ) + + p, (t/a^) is a characteristic function. Draw a graph
of 0(t) and check that the slope of the segment between a k and ak+I
is -[(p k /a k ) + + (p/a)], k = 1, 2, ... , n - 1. Interpret the numbers
Pi , p, + p 2 , ... , p, + + p = I along the vertical axis with reference to the
polygonal graph of (t).
(iv) Show that a function 0(t) satisfying Plya's criterion can be approximated by
a function of the form (iii) to arbitrary accuracy. [Hint: Approximate 4(t) by
a polygonal path consisting of n segments of decreasing slopes.]
(v) Show that the (pointwise) limit of characteristic functions is a characteristic
function.
*8. Show that the following define covariance functions for a Gaussian process.

(i) Q.j =e i,J = 0, 1, 2, .. .

(ii) Q = min(i, j) = (iii + I il - Ii -JI)/2


;;

[Hint: Use Exercises 6 and 7.]

Exercises for Section I.7


1. Verify that (B,,, ... , B,,,,) has an m-dimensional Gaussian distribution and calculate
the mean and the variance-covariance matrix using the fact that the Brownian
motion has independent Gaussian increments.
2. Let {X,} be a process with stationary and independent increments starting at 0 with
EXs < cc for s> 0. Assume EX EX, are continuous functions of t.
(i) Show that EX, = mt for some constant m.
(ii) Show that Var X, = at for some constant a 2 > 0.
(iii) Calculate the limiting distribution of

Y( . ) (X, - mnt)
-
t

asn-> cc, for t>0,fixed.


3. (i) (Diffusion Limit Scalings) Let X, be a random variable with mean
x + t f (p - q)0 and variance t f A 2 pq. Give a direct calculation of p and q = I - p
in terms off and A using only the requirements that the mean and variance of
X, - x should stabilize to some limiting values proportional to t as f -. oc and
A -+0.
(ii) Verify convergence to the Gaussian distribution for the distribution of (Q//)S1 , j
as n -* co, where {S} is the simple random walk with p = (p/2 f) + Z, by
application of Liapunov's CLT (Chapter 0, Corollary 7.3).

76 RANDOM WALK AND BROWNIAN MOTION

4. Let {X} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and
zero drift.
(ii) Show that the process has the following scaling property. For each A > 0 the
process {Y} defined by Y, = A - ' t Z Xx, is distributed exactly as the process Al.
(ii) How does (i) extend to k-dimensional Brownian motion?
5. Let {X} be a stochastic process which has stationary and independent increments.
(i) Show that the distribution of the increments must be infinitely divisible; i.e., for
each integer n, the distribution of X, - X, (s < t) can be expressed as an n-fold
convolution of a probability measure p".
(ii) Suppose that the increment 24 - X s has the Cauchy distribution with p.d.f.
(t - s)/n[(t - s) 2 + x 2 ] for s < t, x e I8'. Show that the Cauchy process so
described is invariant under the rescaling { Y} where Y = A -' Xi, for A > 0; i.e.,
{Y,} has the same distribution as {X,}. (This process can be constructed by
methods of theoretical complements 1, 2 to Section IV.1.)
6. Let {X} be a Brownian motion starting at 0 with zero drift and diffusion coefficient
a 2 > 0. Define Y,= JXJ,t ?0.
(i) Calculate EY, Var Y,.
(ii) Is { Y} a process with independent increments?
7. Let R, = X, where {Xj is a Brownian motion starting at 0 with zero drift and
diffusion coefficient a 2 > 0. Calculate the distribution of R.
8. Let {B,} be a standard Brownian motion starting at 0. Define

V" _ - Bo- nie "I

(i) Verify that EV" = 2"' 2 EIB 1 I.


(ii) Show that Var V. = VarIB 1 j.
(iii) Show that with probability one, {B,} is not of bounded variation on 0 -< -<
t 1.
[Hint: Show that I' + l', >, >-
V", n 1, and, using Chebyshev's Inequality,
P(V">M)->lasn- ooforanyM>0.]
9. The quadratic variation over [0, t) of a function f on [0, oo) is defined by
V(f)= lim. .v"(t,f),where
2^
v"(t,f) _ [f(kt/ 2") - f((k - 1 )t/ 2 ")] z
k=1

provided the limit exists.

(i) Show if f is continuous and of bounded variation then V (f) = 0.


(ii) Show that Ev"(t, {XX - a 2 t as n - cc for Brownian motion {X} with
})

diffusion coefficient a 2 and drift p.


(iii) Verify that v"(t, {X,}) - v et in probability as n - co.
(*iv) Show that the limit in (iii) holds almost surely. [Hint: Use Borel-Cantelli
Lemma and Chebyshev Inequality.]
10. Let {X} be the k-dimensional Brownian motion with drift lt and diffusion coefficient
matrix D. Calculate the mean of X, and the variance-covariance matrix of X.

EXERCISES 77

11. Let {X} be any mean zero Gaussian process. Let t, < t 2 < < t".
(i) Show that the characteristic function of (X r ,... .. X,^) is of the form e -44 ) for
some quadratic form Q(E,) = <A4, i>.
(ii) Establish the pair-correlation decomposition formula for block correlations:

0 if n is odd
E{X,,X, 2 ...X,} =
* E{X,X, }. E{X,X, k } if n is even,

where Y* denotes the sum taken over all possible decompositions into all possible
disjoint pairs {t ; , t 3 }, ... , {t,,,, t,} obtained from {t,.. ... t"}. [Hint: Use induction
on derivatives of the (multivariate) characteristic function at (0, 0, ... , 0) by
first observing that c?e -1 Q 14) /c7 i = a,e t'2 and c?x ; /a5 t = a ;j , where
x i = Y J a ;j ^ t and A = ((a i; )).]

Exercises for Section I.8

1. Construct a matrix representation of the linear transformation on U8'` of the


increments (x,,, x, 2 x . ,x X, k ,) to (x,,, x, z , .
- , x tk ).

*2. (i) Show that the functions f: CEO, oo) E defined by

f(w) = max w(t), g(co) = min w(t)


a<t,n

are continuous for the topology of uniform convergence on bounded intervals.


(ii) Show that the set { f({X;"}) _< x} is a Borel subset of CEO, oo) if f is continuous
on C[0, oo).
(iii) Let f: C[0, oo) --* R'` be continuous. Explain how it follows from the definition
of weak convergence on C[0, oo) given after the statement of the FCLT (pages
23-24) that the random vectors f({X;"}) must converge in distribution to f({X })
too.
(iv) Show that convergence in distribution on CEO, cc) implies convergence of the
finite-dimensional distributions. Exercise 3 below shows that the converse is
not true in general.
*3. Suppose that for each n = 1, 2, ... , {x"(t), 0 _< t _< 1}, is the deterministic process
whose sample path is the continuous function whose graph is given by Figure Ex.I.8.
(i) Show that the finite-dimensional distributions converges to those of the a.s.
identically zero process {z(t)}, i.e., z(t) - 0, 0 S t _< 1.
(ii) Check that max,,,,, x(t) does not converge to max o ,,,, z(t) in distribution.

2n n
Figure Ex.I.8
78 RANDOM WALK AND BROWNIAN MOTION

*4 Give an example to demonstrate that it is not the case that the FCLT gives
convergence of probabilities of all infinite-dimensional events in C[0, x). [Hint:
The polygonal process has finite total variation over 0 <_ t z 1 with probability 1.
Compare with Exercise 7.8.]
5. Verify that the probability density function p(t; x, y) of the position at time t of the
Brownian motion starting at x with drift p and diffusion coefficient a 2 solves the
so-called Fokker-Planck equation (for fixed x) given by

ap _ 1 2 02 p ap
at - Zo OY Z - p aY
(i) Check that for fixed y, p also satisfies the adjoint equation

ap =, 22

a l p a
2
c7 + p
at ax "^ ax

(ii) Show that p is a symmetric function of x and y if and only if p = 0.


(iii) Let c(t, y) = $ c o (x)p(t, x, y) dx, where c o is a positive bounded initial
concentration smoothly distributed over a finite interval. Verify that c(t, y)
solves the Fokker-Planck equation with initial condition c o , i.e.,

ac a2C ac
za g 2 - p, c(O , Y) = co(y).
8t = Y Y
6. (Collective Risk in Actuary Science) Suppose that an insurance company has an
initial reserve (total assets) of X0 > 0 units. Policy holders are charged a (gross)
risk premium rate a per unit time and claims are made at an average rate A. The
average claim amount is it with variance a 2 . Discuss modeling the risk reserve
process {X,} as a Brownian motion starting at x with drift coefficient of the form
a - p l and diffusion coefficient 2a 2 , on some scale.
7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession
of random impacts or loads in the form of positive random variables L,, L 2 , ..
(e.g., traffic). It is assumed that the (measure of) material strength T k after the kth
impact is proportional to the strength Tk _, at the preceding stage through the
applied load L k , k = 1, 2, ... , i.e., Tk = L,,T,k _,. Assume an initial strength To - 1
as normalization, and that E(log L 1 ) 2 < co. Describe conditions under which it is
appropriate to consider the geometric Brownian motion defined by {exp(pt + a 2 B,)},
where {Bj is standard Brownian motion, as a model for the strength process.
8. Let X 1 , X2 ,,.. be i.i.d. random variables with EX = 0, Var X. = a 2 > 0. Let
S. = X, + . + X,,, n >, 1, So = 0. Express the limiting distribution of each of the
random variables defined below in terms of the distribution of the appropriate
random variable associated with Brownian motion having drift 0 and diffusion
coefficient a 2 > 0.
(i) Fix 0>0, Y. = n -012 max{ISj : 1 _< k < n}.
(ii) Yn = n - 'I'S..
(iii) Y. = n 312 >I Sk . [Hint: Consider the integral of t -- S(n , l , 0 5 t - 1.]
9. (i) Write R n (x) = 1(1 + xfn)" - esj. Show that
EXERCISES 79

+ I x en+
R(x) 1 1 1 _ r = 1_ ^x r t e^xl
n n Jj r! (n + 1)!

sup R(x) *0 as n ' oo (for every c > 0).


JxJ Sc

(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and
Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]

Exercises for Section I.9

1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient.
(ii) Extend (i) to the k-dimensional Brownian motion with drift.
2. Let X, = X 0 + vt, t >, 0, where v is a nonrandom constant-rate parameter and X 0
is a random variable.
(i) Calculate the conditional distribution of X,, given XS = x, for s < t.
(ii) Show that all states are transient if v 0.
(iii) Calculate the distribution of X, if the initial state is normally distributed with
mean and variance a 2 .
3. Let {X,} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and
zero drift.
(i) Define { }} by Y, = tX,,, for t > 0 and Y0 = 0. Show that { Y} is distributed as
Brownian motion starting at 0. [Hint: Use the law of large numbers to prove
sample path continuity at t = 0.]
(ii) Show that {X,} has infinitely many zeros in every neighborhood of t = 0 with
probability 1.
(iii) Show that the probability that t - X, has a right-hand derivative at t = 0 is zero.
(iv) Use (iii) to provide another example to Exercise 8.4.
4. Show that the distribution of min,, 0 X is exponential if {X, } is Brownian motion
starting at 0 with drift p > 0. Likewise, calculate the distribution of max,,, X, when
p<0.
*5. Let {Sn } denote the simple symmetric random walk starting at 0, and let
m= min S k , M= max Sk , n= 1, 2, ... .
OSk<n 05k-'n

Let {B,} denote a standard Brownian motion and let m = min,,,,, B,,
M = max, < ,,, B,. Then, by the FCLT, n" t j 2 (m n , Mn , Sn ) converges in distribution
to (m, M, B 1 ); for rigorous justification use theoretical complements 1.8, 1.9 noting
that the functional w + (min,,,,, co,, max,,,,, w,, w,) is a continuous map of the
metric space C[0, 1] into R 3 . For notational convenience, let

Pn(J)=P(Sn=1), Pn (u,v,y)=P(u<in _<MM <v,S.=y),

for integers u, v, y such that u _< 0 _< v, u < v and u _< y _< v. Also let
1(a, b) = P(a <Z < b), where Z has the standard normal distribution. The following
use of the reflection principle is taken from an exercise in P. Billingsley (1968),

80 RANDOM WALK AND BROWNIAN MOTION

Convergence of Probability Measures, Wiley, New York, p. 86. These results for
Brownian motion are also obtained by other methods in Chapter V.
(i) P(u, v, Y) = P,(Y) n(v, Y) it(u, Y) + n(v, u, y) + n(u, v, Y) t(v, u, v, Y)
n(u, v, u, y) + , where for any fixed sequence of nonnegative integers
Y1, Y2, , Yk, y, k, n, zc(y 1 , Y2, ... , Yk, y) denotes the probability that an n-step
random walk meets y, (at least once), then meets Y 2 , then meets y 3 , ... ,
then meets (-1) k- 'y k , and ends at y.

( 11 ) R(Y 1, Y2, ... , Yk, Y) = pn( 2 Y 1 + 2Y 2 + ... + 2y k - 1 + (-1)k + 1 y) if (-1)k + 'Y > Yk,
n(Y1, Y2, ... , Yk, Y) = Pn( 2 Y1 + 2Y 2 + ... + 2y k _ (_ 1)k+ l y) if (1)'y -< Yk
[Hint: Use Exercise 4.4(ii), the reflection principle, and induction on k. Reflect
through (-1)k y k _, the part of the path to the right of the first passage through
that point following successive passages through y1, Y2, ... , ( -1 ) k-1 Yk-2.]

(iii) p(u, v, y) _ p(y + 2k(v u)) p(2v y + 2k(v u)).

(iv) For integers, u -< 0 -< v, u <- y 1 < Y 2

P(u <m<v,y l <S,,<Y 2 )

= j P(y, + 2k(v u) < S. < Y 2 + 2k(v u)))

Y P(2vy 2 +2k(vu)<S<2vy,+2k(vu)).
k=w

[Hint: Sum over y in (iii).]


(v) For real numbers u < 0 -< v, u < y 1 < Y 2

P(u<m-<M<v,y 1 <B, <Y 2 )

_ ^(y, + 2k(v u), Y 2 + 2k(v u))


k=m

(D(2v Y 2 + 2k(v u), 2v y, + 2k(v u)).


k=ac

[Hint: Respectively substitute the integers [u / ], [u/], [y,^],


[ --- Y2\/] into (iv) ([ ] denoting the greatest integer function). Use Scheffe's
Theorem (Chapter 0) to justify the interchange of limit with summation over k.]

(vi) P(M < v, y 1 < B, < ky2) = 1 (YI,Y2) '( 2 v y 2 , 2v YI)

[Hint: Take u = n I in (iv) and then pass to the limit.]

(vii) P(u <m -< M < v) _ (-1)kD(u + 2k(v u), v + 2k(v u)).
k=m

[Hint: Take y, = U, Y 2 = v in (v).]


EXERCISES 81

(viii) P(sup^B,j < u) _ ( l ) k (D((2k 1)v, (2k + 1)v).


k=a

[Hint: Take u = v in (vii).]

Exercises for Section I.10

* 1. (i) Show that (10.2) holds at each point z _< 0 (> 0) of continuity of the distribution
function for min o , X. (max o <,,, X,). [Hint: These latter functionals are
continuous.]
(ii) Use (i) and (10.9) to assert (10.2) for all ^.
2. Calculate the probability that a Brownian motion with drift it and diffusion coefficient
a 2 > 0 starting at x will reach y : x in time t or less.
3. Suppose that solute particles are undergoing Brownian motion in the horizontal
direction in a semi-infinite tube whose left end acts as an absorbing boundary in
the sense that when a particle reaches the left end it is taken out of the flow. Assume
that initially a proportion i(x) dx of the particles are present in the element of
volume between x and x + dx from the left end, so that f b(x) dx = 1. For a given
drift it away from the left end and diffusion coefficient a 2 > 0, calculate the fraction
of particles eventually absorbed. What if p = 0?
4. Two independent Brownian motions with drift p ; and diffusion coefficient a?, i = 1, 2,
are found at time t = 0 at positions x i , i = 1, 2, with x, <x 2 .
(i) Calculate the probability that the two particles will never meet.
(ii) Calculate the probability that the particles will meet before time s > 0.
5. (i) Calculate the distribution of the maximum value of the Brownian motion
starting at 0 with drift p and diffusion coefficient a 2 over the time period [0, t].
*(ii) For the case p = 0 give a geometric "reflection" argument that P(max o , s <, Xs >_ y)
= 2P(X, _> y). Use (i) to verify this.
6. Calculate the distribution of the minimum value of a Brownian motion starting at
0 with drift I and diffusion coefficient a 2 over the time period [0, t].
7. Let {B 1 } be standard Brownian motion starting at 0 and let a, b > 0.
(i) Calculate the probability that at < B, < bt for all sufficiently large t.
(ii) Calculate the probability that {B,} last touches the line v = at instead of
y = bt. [Hint: Consider the process {Z,} defined by Zo = 0, Z, = tB, I , for t > 0.
and Exercise 9.3(i).]
8. Let {(B,(' ) , B 2 )} be a two-dimensional standard Brownian motion starting at (0, 0)
(see Section 7). Let r y = inf{t _> 0: B; 2 = y}, y > 0. Calculate the distribution of
Bty ) . [Hint: {B} and {B 2 } are independent one-dimensional Brownian motions.
Condition on r. Evaluate the integral by substituting u = (x 2 + y 2 )/t.]
9. Let {Br } be a standard Brownian motion starting at 0. Describe the geometric
structure of sample paths for each of the following stochastic processes and calculate
EY,.
fY = B 1 if max o<s ,, Bs < a
(i) (Absorption)
1 Y = a if max o , s <, B s >_ a,
where a > 0 is a constant.
82 RANDOM WALK AND BROWNIAN MOTION

(*ii) (Reflection)
J Y=B,
Y,=2aB,
ifB,<a
if B,>a,

where a > 0 is a constant.

(*iii) (Periodic) Y, = B, [B, ],

where [x] denotes the greatest integer less than or equal to x.


10. Let, for a > 0,

a e
.f8(t) = (2 )"2 22 ' (t > 0).
t 3/2
ir

(i) Verify the convolution property

fa * ff (t) = L + (t) for any a, > 0.

(ii) Verify that the distribution of; is a stable law with exponent 0 = 2 (index z)
in the sense that if Ti , T2 , ... , T. are i.i.d. and distributed as T. then
n -8 (T1 + + T.) is distributed as T. (see Eq. 10.2).
(iii) (Scaling property) ; is distributed as z 2 z1.
11. Let T. be the first passage time to z for a standard Brownian motion starting at 0
with zero drift.
(i) Verify that Ez= is not finite.
(ii) Show that Ee = e i 2 ' ^"', A > 0. [Hint: Tedious integration will
work.]
(iii) Use Laplace transforms to check that (1/n)z (,J J converges in distribution to
t z as n > oo.
12. Let {B,} be standard Brownian motion starting at 0. Let s < t. Show that the
probability that {B,} has at least one zero in (s, t) is given by (2/it) cos - '(s/t)`/ 2 .
[Hint: Let

p(x) = P({B,} has at least one zero in (s, t) ^ B, = x).

Then for x > 0,

p(x)=P(min B,-<OIB,=x)=P( max BOIB,=


5,.sr s,<r,<1

=P(max BxIB s =0)=P(t x -<ts).

Likewise for x < 0, p(x) = P(r _ x -< t s). So the desired probability can be obtained
by calculating

2
EP(IB,I) =
f
o,
P(x)(-
ns
e zs ^ 2 dx.
EXERCISES 83

Exercises for Section 1.11

Throughout this set of exercises {S"} denotes the simple symmetric random
walk starting at 0.
1. Show the following for r 0.

2n Irl
(i) P(S, ^0,S 2 ^0,...,S 2n - t # 0 ,S2 = 2r) = 2
-2n
fl + rJ n
= 2r) _ (2k)( 2n 2k) Iri 2 -zn
(ii) P(r' 2 n ) = 2k, S Zn
k/J nk+r /Jnk

(*iii) Calculate the joint distribution of (y, B 1 ).


2. Let U. = #{k < n: Sk ( or Sk > 0} denote the amount of time spent (by the
polygonal path) on the positive side of the state space during time 0 to n.
(i) Show that

P(UZn = 2k) = P(r,( zn ) = 2k)


rz(k(n k))'

Check k = 0, k = n first. Use mathematical induction on n and consider


the conditional distribution of U 2n given the time of the first return to 0. Derive
the last equality of (11.3) for U 2 using the induction hypothesis and (11.3).]
(ii) Prove Corollary 11.4 by calculating the distribution of the proportion of time
spent above 0 in the limit as n - oo for the random walk.
(iii) How does one reconcile the facts that U2 "/2n++Z as n --* co and P(S" > 0) -+
as n -+ ^o? [Hint: Consider the average length of time between returns to zero.]
*3. Let r (" 1 = min{k -< n: Sk = max o , ; , n S; } denote the location of the first absolute
maximum in time 0 to n. Then

{T (n) =k} _ {Sk> 0, Sk> S(,..., S,> Sk_1,Sk>-Sk+1,. ,Sk>- S n }

fork>- t,{T " =0}={Sk <-0,1 -<k-<n}.


( (

(i) Show that

p(r(zn^ = 0) = P( z (z" >> = 0) _ ( 2 n _2n


\n J2

[Hint: Use (10.1) and induction.]


(ii) Show that

P(T(z") = 2n) = P( ,r (2n+1) = 2n) _ 1(2n\2


2- n

[Hint: Consider the dual paths to {S k : k = 0, 1, ... , n} obtained by reversing the


order of the displacements, S, = S" S"_,, Sz = (S n Sn _ 1 ) + (S"_, Sn_2),

84 RANDOM WALK AND BROWNIAN MOTION

... , S;, = S". This transformation corresponds to a rotation through 180 degrees.
Use (11.2).]
(iii) Show that

P(i a") =
2k) = P(t (2 "' = 2k + 1) = 1 (2k)2-2k(2(n - k))2-2i"-k)
2\k/ n-k
for k = 1, ... , n in the first case and k = 0, ... , n - 1 in the second. [Hint: A
path of length 2n with a maximum at 2k can be considered in two sections.
Apply (i) and (ii) to each section.]

")
1
n C
(iv) lim P t /I1 -sin - ^,
it
2
0<t<1.

*4. Let F, UU be as defined in Exercise 2. Define V. = #{k < V" ) : Sk _ , >, 0, Sk _> 0} =
UT "). Show that P(VZ " = 2r ( S 2 . = 0) = 1/(n + 1), r = 0, 1, ... , n. [Hint: Use
induction and Exercise 3(i) to show that P(V2 . = 2r, S 2 " = 0) does not depend on
r,0_<r_<n.]

Exercises for Section I.12

1. Show that the finite-dimensional distributions of the Brownian bridge are Gaussian.
2. Suppose that F is an arbitrary distribution function (not necessarily continuous).
Define an inverse to F as F - '(y) = inf{x: F(x) > y}. Show that if Y is uniform on
[0, 1] then X = F '(Y) has distribution function F.
-

3. Let {B r } be standard Brownian motion starting at 0 and let B* = B, - tB,, 0 < t <_ 1.
(i) Show that {B*} is independent of B,.
(ii) (The Inverse Simulation) Give a construction of standard Brownian motion
from the Brownian bridge. [Hint: Use (i).]
*4. Let {B,} be a standard Brownian motion starting at 0 and let {B*} be the Brownian
bridge.
(i) Show that for time points 0 < t, <t 2 < < t k _ 1,

1imP(B,,<x 1 ,i=1,2,...,k!-s<B, <s)=P(B,*.<x i ,i=1,...,k).


Eo

Likewise, for conditioning on B, e DE = [0, e) or DE = [-s, 0) the limit is


unchanged; for the existence of {B*} as the limit distribution (tightness) as
e -* 0, see theoretical complement 3.
(ii) Show that for m* = info ,,,, B*, M* = sup 0 ,,, B*, u <0< v,

P(u < m* _< M* _< v) _ exp{-2k2(v - u) 2 }


k=-m

- exp{-2[v+k(v-u)]2}.

[Hint: Express as a limit of the ratio of probabilities as in (i) and use Exercise
9.5(v). Also, 4(x, x + e) = e/(2n) " 2 exp( - x 2 /2) + o(1) as e -+ 0.]

EXERCISES 85

(iii) Prove

sup IB*l<yI=1+2 J (-1)ke - zk 2 y 2 , y>0.


P^ OSt6l / k=1

[Hint: Take u = v in (ii).]


(iv) P(M* < v) = 1 e 22 , v > 0. [Hint: Use Exercise 9.5(vi) for the ratio of
probabilities described in (i).]
5. (Random Walk Bridge) Let {S.} denote the simple symmetric random walk starting
at 0.
(i) Calculate P(S, = y I S 2 = 0), 0 -< m < 2n.
(*ii) Let U 2 . = # {k 2n: Sk _, 0, Sk > 0}. Calculate P(U2 , = r S Z = 0). [Hint:
See Exercise 11.4*.]

*6. (Brownian Meander) The Brownian meander {B+ } is defined as the limiting
distribution of the standard Brownian motion {B,} starting at 0, conditional on
{m = min,,, I B, > e} as E > 0 (see theoretical complement 4 for existence). Let
m + = min a ^ B,+ , M + = max o ,,, , B+ . Prove the following:

L'-(2kz+y)2'2]
(i) P(M -< x, B -< y) = ^ [e-l2kx)2/2 - 0 < y x.
k=-m

[Hint: Express as a limit of ratios of probabilities and use Exercise 9.5(v). Also
P(m > e)=(2/tt) 2 +o(1);see Exercise 10.5(ii)notingmin(A)= max(A)
and symmetry. Justify interchange of limits with the Dominated Convergence
Theorem (Chapter 0).]
(ii) P(M + < x) = I + 2 Yk I ( 1) k exp{ (kx) 2 /2}. [Hint: Consider (i) with
y = x.]
(iii) EM + = (2tt)'t 2 log 2 = 1.7374.... [Hint: Compute f P(M + > x) dx from
(ii).]
(iv) (Rayleigh Distribution) P(B, < x) = I e 22 , x > 0. [Hint: Consider (i) in
the limit as x oo.]
*7. (Brownian Excursion) The Brownian excursion {B* + } is defined by the limiting
distribution of {B} conditioned on {m* > s} as d0 (see theoretical complement
4 for existence). Let M* = max 0 , 1 B* + . Prove the following:

M
(i) P(M* + -< x) = 1 + 2 1 [l (2kx) 2 ] exp{(2kx) 2 /2}, x > 0.
k-1

[Hint: Write P(M* + < x) as the limit of ratios as 0. Multiply numerator


and denominator by s -Z , apply Exercise 4, and use ]'Hospital's rule twice to
evaluate the limit term by term. Check that the Dominated Convergence
Theorem (Chapter 0) can be used to justify limit interchange.]
(ii) EM* + = (n/2)" 22 . [Hint: Note that interchange of (limits) integral with sum in
(i) to compute EM* + = 10 P(M* + > x) dx leads to an absurdity, since the
values of the termwise integrals are zero. Express EM* + as
Q, q)

2 lim Y [(2kx) 2 l]exp{ ' (2kx) 2 } dx


e - 0 k=1
_

86 RANDOM WALK AND BROWNIAN MOTION

and note that for k >- A the integrand is nonnegative on [A, oc ). So Lebesgue's
monotone convergence can be applied to interchange integral with sum over
k > 1/(20) to get zero for this. Thus, EM* + is the limit as 0 - 0 of a finite
sum over k <i of an integral that can be evaluated (by parts). Note that this
gives a Riemann sum limit for 2J exp(- Zx 2 ) dx = (it/2)' 1 2 .]

(iii) Writing μ^{*+}(r) := E(M^{*+})^r,

μ^{*+}(r) = (π/2)^{1/2},                      if r = 1,
         = 2^{-r/2} r(r-1) Γ(r/2) ζ(r),    if r = 2, 3, ...,

where ζ(r) = Σ_{k≥1} k^{-r} is the Riemann zeta function (r ≥ 2). [Hint: The case
r = 1 is given in (ii) above. For the case r ≥ 2, we have

μ^{*+}(r) = r ∫_0^∞ x^{r-1}(1 - F(x)) dx = 2r Σ_{k=1}^{∞} k^{-r} ∫_0^∞ t^{r-1}{(2t)² - 1} e^{-(2t)²/2} dt,

where the interchange of limits is justified for r ≥ 2 since

Σ_{k=1}^{∞} k^{-r} ∫_0^∞ t^{r-1} |(2t)² - 1| e^{-(2t)²/2} dt < ∞, r = 2, 3, ....

In particular, letting Z denote a standard normal random variable,

μ^{*+}(r) = 2^{1-r} r ζ(r) (π/2)^{1/2} [ E|Z|^{r+1} - E|Z|^{r-1} ].

Consider the two cases whether r is odd or even.]
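The closed-form moments in (iii) are easy to test numerically. The Python sketch below (my own check, with arbitrary truncation levels for the series and the integration grid) compares 2^{-r/2} r(r-1)Γ(r/2)ζ(r) against direct integration of r x^{r-1} P(M^{*+} > x), using the series in (i) and the fact that P(M^{*+} > x) is essentially 1 near 0:

import math
import numpy as np

k = np.arange(1, 2001)[:, None]                  # series index
x = np.linspace(0.05, 12.0, 4000)                # integration grid
surv = -2.0 * ((1 - (2*k*x)**2) * np.exp(-2.0 * (k*x)**2)).sum(axis=0)

dx = x[1] - x[0]
for r in (2, 3):
    zeta = sum(j ** -float(r) for j in range(1, 100000))
    closed = 2.0**(-r/2) * r * (r - 1) * math.gamma(r/2) * zeta
    numeric = x[0]**r + (r * x**(r-1) * surv).sum() * dx   # P > x ~ 1 on [0, 0.05]
    print(r, closed, numeric)

For r = 2 both columns should be close to ζ(2) = π²/6.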


8. Prove the Glivenko-Cantelli Lemma by justifying the following steps.
(i) For each t, the event {F_n(t) → F(t)} has probability 1.
(ii) For each t, the event {F_n(t-) → F(t-)} has probability 1.
(iii) Let τ(y) = inf{t: F(t) ≥ y}, 0 < y < 1. Then F(τ(y)-) ≤ y ≤ F(τ(y)).
(iv) Let

D_{m,n} = max_{1≤k≤m} { |F_n(τ(k/m)) - F(τ(k/m))|, |F_n(τ(k/m)-) - F(τ(k/m)-)| }.

Then, by considering the cases

τ((k-1)/m) ≤ t ≤ τ(k/m), t < τ(1/m), and t ≥ τ(1),

check that

sup_t |F_n(t) - F(t)| ≤ D_{m,n} + 1/m.

(v) C = ∪_{m=1}^{∞} ∪_{k=1}^{m} [ {F_n(τ(k/m)) ↛ F(τ(k/m))} ∪ {F_n(τ(k/m)-) ↛ F(τ(k/m)-)} ]
has probability zero, and for ω ∉ C and each m,

D_{m,n}(ω) → 0 as n → ∞.

(vi) sup_t |F_n(t, ω) - F(t)| → 0 as n → ∞ for ω ∉ C.
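A small numerical illustration (mine, not the book's) of the Glivenko-Cantelli conclusion: the Kolmogorov distance sup_t |F_n(t) - F(t)| shrinks as n grows; since F_n jumps at the sample points, the supremum can be computed there exactly. The sketch uses standard normal samples:

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
F = lambda t: 0.5 * (1 + erf(t / sqrt(2)))     # standard normal d.f.

for n in (100, 1000, 10000):
    xs = np.sort(rng.normal(size=n))
    Fvals = np.array([F(t) for t in xs])
    upper = np.abs(np.arange(1, n + 1) / n - Fvals).max()   # F_n(x_i) vs F
    lower = np.abs(np.arange(0, n) / n - Fvals).max()       # F_n(x_i-) vs F
    print(n, max(upper, lower))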

9. (The Gnedenko-Koroljuk Formula) Let (X_1, ..., X_n) and (Y_1, ..., Y_n) be two
independent i.i.d. random samples with continuous distribution functions F and G,
respectively. To test the null hypothesis that both samples are from the same
population (i.e., F = G), let F_n and G_n be the respective empirical distribution functions
and consider the statistic D_{n,n} = sup_x |F_n(x) - G_n(x)|. Under the null hypothesis,
X_1, ..., X_n, Y_1, ..., Y_n are 2n i.i.d. random variables with the common distribution
F. Verify that under the null hypothesis:
(i) the distribution of D_{n,n} does not depend on F and can be explicitly calculated
according to the formula

P( D_{n,n} ≤ r/n ) = P( max_{0≤k≤2n} |S_k^{(2n)*}| ≤ r ),

where {S_k^{(2n)*}: k = 0, 1, 2, ..., 2n} is the simple symmetric random walk bridge
(starting at 0 and tied down at k = 2n) as defined in Exercise 5. [Hint: Arrange
X_1, ..., X_n, Y_1, ..., Y_n in increasing order as X_(1) < X_(2) < ⋯ < X_(2n) and
define the kth displacement of {S_k^{(2n)*}} by

S_k^{(2n)*} - S_{k-1}^{(2n)*} = +1 if X_(k) ∈ {X_1, ..., X_n}

and

S_k^{(2n)*} - S_{k-1}^{(2n)*} = -1 if X_(k) ∈ {Y_1, ..., Y_n}.]

(ii) Find the analytic expression for the probability in (i). [Hint: Consider the event
that the simple random walk with absorbing boundaries at ±r returns to 0 at
time 2n. First condition on the initial displacement.]
(iii) Calculate the large-sample-theory (i.e., asymptotic as n → ∞) limit distribution
of √n D_{n,n}. See Exercise 4(iii).
(iv) Show

P( sup_x (F_n(x) - G_n(x)) < r/n ) = 1 - C(2n, n-r)/C(2n, n), r = 1, ..., n,

where C(a, b) denotes the binomial coefficient.
[Hint: Only one absorbing barrier occurs in the random walk approach.]
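The one-sided formula in (iv) can be sanity-checked by Monte Carlo; the sketch below (my own, with arbitrary n, r, and trial count) realizes the random-walk bridge of the hint by sorting the pooled sample and walking +1 on an X, -1 on a Y:

import numpy as np
from math import comb

rng = np.random.default_rng(3)
n, r, trials = 10, 3, 100000
hits = 0
for _ in range(trials):
    pooled = rng.random(2 * n)
    labels = np.zeros(2 * n, dtype=int); labels[:n] = 1   # 1 marks an X
    order = np.argsort(pooled)
    walk = np.cumsum(np.where(labels[order] == 1, 1, -1)) # the bridge displacements
    if walk.max() < r:          # sup(F_n - G_n) < r/n  <=>  max walk < r
        hits += 1
print(hits / trials, 1 - comb(2*n, n - r) / comb(2*n, n))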

Exercises for Section I.13

1. Prove that the τ_r defined by (13.5) (r = 1, 2, ...) are stopping times.
2. If τ is a stopping time and m > 0, prove that τ ∧ m := min{τ, m} is a stopping time.
3. Prove E(S_m 1_{{τ > m}}) → 0 as m → ∞ under assumptions (2) and (3) of Theorem 13.1.


4. For the simple symmetric random walk, starting at x, show that
E{S_m 1_{{τ_y = r}}} = y P(τ_y = r) for r ≤ m.
5. Prove that EZ_n is independent of n (i.e., constant) for a martingale {Z_n}. Show also
that E(Z_n | {Z_0, ..., Z_k}) = Z_k for any n > k.
6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1.
7. Let {S_n} be a simple random walk with p ∈ (1/2, 1).
(i) Prove that {(q/p)^{S_n}: n = 0, 1, 2, ...} is a martingale.
(ii) Let c < x < d be integers, S_0 = x, and τ = τ_c ∧ τ_d := min(τ_c, τ_d). Apply Theorem
13.3 to the martingale in (i) and τ to compute P({S_n} reaches c before d).
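The ruin probability that optional stopping yields in 7(ii), namely
P_x(reach c before d) = [(q/p)^x - (q/p)^d] / [(q/p)^c - (q/p)^d], is easy to confirm by simulation. A minimal sketch (my own, with arbitrary parameter choices):

import random

p, c, x, d, trials = 0.6, 0, 3, 10, 100000
q = 1 - p
random.seed(4)
ruined = 0
for _ in range(trials):
    s = x
    while c < s < d:                     # walk until a boundary is hit
        s += 1 if random.random() < p else -1
    ruined += (s == c)
theory = ((q/p)**x - (q/p)**d) / ((q/p)**c - (q/p)**d)
print(ruined / trials, theory)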
8. Write out a proof of Proposition 13.5 along the lines of that of Proposition 13.4.
9. Under the hypothesis that the pth absolute moments are finite for some p ≥ 1, derive
the Maximal Inequality P(M_n ≥ λ) ≤ E|Z_n|^p/λ^p in the context of Theorem 13.6.
10. (Submartingales) Let {Z_n: n = 0, 1, 2, ...} be a finite or infinite sequence of
integrable random variables satisfying E(Z_{n+1} | {Z_0, ..., Z_n}) ≥ Z_n for all n. Such a
sequence {Z_n} is called a submartingale.
(i) Prove that, for any n > k, E(Z_n | {Z_0, ..., Z_k}) ≥ Z_k.
(ii) Let M_n = max{Z_0, ..., Z_n}. Prove the maximal inequality P(M_n ≥ λ) ≤ EZ_n^+/λ
for λ > 0. [Hint: E(1_{A_k}(Z_n - Z_k)) = E(1_{A_k} E(Z_n - Z_k | {Z_0, ..., Z_k})) ≥ 0
for n > k, where A_k := {Z_0 < λ, ..., Z_{k-1} < λ, Z_k ≥ λ}.]
(iii) Extend the result of Exercise 9 to nonnegative submartingales.
11. Let {Z_n} be a martingale. If E|Z_n|^p < ∞ then prove that {|Z_n|^p} is a submartingale,
p ≥ 1. [Hint: Use Jensen's or Hölder's Inequality, Chapter 0, (2.7), (2.12).]
12. (An Exponential Martingale) Let {X_j: j ≥ 1} be a sequence of independent random
variables having finite moment-generating functions φ_j(ξ) := E exp{ξX_j} for some
ξ ≠ 0. Define S_n := X_1 + ⋯ + X_n, Z_n := exp{ξS_n}/∏_{j=1}^{n} φ_j(ξ).
(i) Prove that {Z_n} is a martingale.
(ii) Write M_n = max{S_1, ..., S_n}. If ξ > 0, prove that

P(M_n ≥ λ) ≤ exp{-ξλ} ∏_{j=1}^{n} φ_j(ξ) (λ > 0).

(iii) Write m_n = min{S_1, ..., S_n}. If ξ < 0, prove

P(m_n ≤ -λ) ≤ exp{ξλ} ∏_{j=1}^{n} φ_j(ξ) (λ > 0).

13. Let {X_n: n ≥ 1} be i.i.d. Gaussian with mean zero and variance σ² > 0. Let
S_n = X_1 + ⋯ + X_n, M_n = max{S_1, ..., S_n}. Prove the following for λ > 0.
(i) P(M_n ≥ λ) ≤ exp{-λ²/(2σ²n)}. [Hint: Use Exercise 12(ii) and an appropriate
choice of ξ.]
(ii) P(max{|S_j|: 1 ≤ j ≤ n} ≥ λσ√n) ≤ 2 exp{-λ²/2}.
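A quick empirical look (my addition, with arbitrary parameters) at the Gaussian maximal bound of 13(i); the simulated probability should sit below the bound:

import numpy as np

rng = np.random.default_rng(9)
n, sigma, trials = 100, 1.0, 50000
S = rng.normal(0, sigma, (trials, n)).cumsum(axis=1)   # Gaussian random walks
M = S.max(axis=1)
for lam in (10.0, 20.0, 30.0):
    print(lam, (M >= lam).mean(), np.exp(-lam**2 / (2 * sigma**2 * n)))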

14. Let τ_1, τ_2 be stopping times. Show that the following assertions (i)-(v) hold.
(i) τ_1 ∨ τ_2 := max(τ_1, τ_2) is a stopping time.
(ii) τ_1 ∧ τ_2 := min(τ_1, τ_2) is a stopping time.
(iii) τ_1 + τ_2 is a stopping time.
(iv) aτ_1, where a is a positive integer, is a stopping time.
(v) If τ_1 ≤ τ_2 a.s. then it need not be the case that τ_2 - τ_1 is a stopping time.
(vi) If τ is an even integer-valued stopping time, must τ/2 be a stopping time?
15. (A Doob-Meyer Decomposition)
(i) Let {Y_n} be an arbitrary submartingale (see Exercise 10) with respect to
sigmafields F_0 ⊂ F_1 ⊂ F_2 ⊂ ⋯. Show that there is a unique sequence {V_n} such
that:
(a) 0 = V_0 ≤ V_1 ≤ V_2 ≤ ⋯ a.s.
(b) V_n is F_{n-1}-measurable.
(c) {Z_n} := {Y_n - V_n} is a martingale with respect to {F_n}. [Hint: Define

V_n = V_{n-1} + E(Y_n | F_{n-1}) - Y_{n-1}, n ≥ 1.]

(ii) Calculate the {V_n}, {Z_n} decomposition for Y_n = S_n², where {S_n} is the simple
symmetric random walk starting at 0. [Note: A sequence {V_n} satisfying (b) is
called a predictable sequence with respect to {F_n}.]
16. Let {S_n} be the simple random walk starting at 0. Let {G_n} be a predictable sequence
of nonnegative random variables with respect to F_n = σ{X_1, ..., X_n} = σ{S_0, ..., S_n},
where X_n = S_n - S_{n-1} (n = 1, 2, ...); i.e., each G_n is F_{n-1}-measurable. Assume each
G_n to have finite first moment. Such a sequence {G_n} will be called a strategy. Define

W_n = W_0 + Σ_{k=1}^{n} G_k(S_k - S_{k-1}), n ≥ 1,

where W_0 is an integrable nonnegative random variable independent of {S_n}
(representing initial capital). Show that regardless of the strategy {G_n} we have the
following.
(i) If p = 1/2 then {W_n} is a martingale.
(ii) If p > 1/2 then {W_n} is a submartingale.
(iii) If p < 1/2 then {W_n} is a supermartingale (i.e., E|W_n| < ∞, E(W_{n+1} | F_n) ≤ W_n,
n = 1, 2, ...).
(iv) Calculate EW_n, n ≥ 1, in the case of the so-called double-or-nothing strategy
defined by G_n = 2^{n-1} 1_{{S_{n-1} = n-1}}, n ≥ 1.
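An illustrative simulation (my own sketch, with arbitrary horizon and trial count) of 16(iv): for p = 1/2 every predictable strategy keeps {W_n} a martingale, so EW_n = EW_0 even under double-or-nothing betting — the heavy right tail exactly offsets the frequent unit losses:

import random

random.seed(5)
trials, horizon, w0 = 200000, 10, 1.0
total = 0.0
for _ in range(trials):
    w, streak = w0, True                 # streak <=> S_{n-1} = n-1 so far
    for n in range(1, horizon + 1):
        g = 2 ** (n - 1) if streak else 0.0   # bet only while unbeaten
        step = 1 if random.random() < 0.5 else -1
        w += g * step
        streak = streak and (step == 1)
    total += w
print(total / trials, w0)                # sample mean of W_n vs E W_0 = 1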

17. Let {S_n} be the simple symmetric random walk starting at 0. Let τ = inf{n ≥ 0:
S_n = 2 - n}.
(i) Calculate Eτ from the distribution of τ.
(ii) Use the martingale stopping theorem to calculate Eτ.
(*iii) How does this generalize to the cases τ = inf{n ≥ 0: S_n = b - n}, where b is
a positive integer? [Hint: Check that n + S_n is even for n = 0, 1, 2, ....]
18. (i) Show that if X is a random variable such that g(z) = Ee^{zX} is finite in a
neighborhood of z = 0, then E|X|^k < ∞ for all k = 1, 2, ....
(ii) For a Brownian motion {X_t} with drift μ and diffusion coefficient σ², prove that
exp{λX_t - λμt - λ²σ²t/2} (t ≥ 0) is a martingale.
19. Consider an arbitrary Brownian motion with drift μ and diffusion coefficient σ² > 0.
(i) Let m(x) = E_x T, where T is the time to reach the boundary {c, d} starting at
x ∈ [c, d]. Show that m(x) solves the boundary-value problem

(σ²/2) d²m/dx² + μ dm/dx = -1, m(c) = m(d) = 0.

(ii) Let r(x) = P_x(τ_d < τ_c) for x ∈ [c, d]. Verify that r(x) solves the boundary-value
problem

(σ²/2) d²r/dx² + μ dr/dx = 0, r(c) = 0, r(d) = 1.

Exercises for Section I.14

1. In the case β = 1/2, consider the process {Z_N(s)} defined by

Z_N(n/N) = (S_n - (n/N)S_N)/(√N D_N), n = 1, 2, ..., N,

and linearly interpolate between n/N and (n + 1)/N.

(i) Show that {Z_N} converges in distribution to {B'_s + 2c s^{1/2}(1 - s^{1/2})}, where
{B'_s} = {B_s - sB_1: 0 ≤ s ≤ 1} is the Brownian Bridge.
(ii) Show that R_N/(√N D_N) converges in distribution to max_{0≤s≤1} B̃_s - min_{0≤s≤1} B̃_s,
where {B̃_s} denotes the limit process in (i).
(iii) Show that the asymptotic distribution in (ii) is nondegenerate.

THEORETICAL COMPLEMENTS

Theoretical Complements to Section I.1

1. Events that can be specified by the values of X_n, X_{n+1}, ... for each value of n (i.e.,
events that depend only on the long-run values of the sequence) are called tail events.

Theorem T.1.1. (Kolmogorov Zero-One Law). A tail event for a sequence of
independent random variables has probability either zero or one.

Proof. To see this one uses the general measure-theoretic fact that the probability
of any event A belonging to the sigmafield F = σ{X_1, X_2, ..., X_n, ...} generated
by X_1, X_2, ..., X_n, ... can be approximated by events A_1, ..., A_n, ... belonging to
the field of events F_0 = ∪_{n≥1} σ{X_1, ..., X_n} in the sense that A_n ∈ σ{X_1, ..., X_n} for
each n and P(A Δ A_n) → 0 as n → ∞, where Δ denotes the symmetric difference
A Δ A_n = (A ∩ A_n^c) ∪ (A^c ∩ A_n). Applying this approximation to a tail event A, one
obtains that since A ∈ σ{X_{n+1}, X_{n+2}, ...} for each n, A is independent of each event
A_n. Thus, 0 = lim_{n→∞} P(A Δ A_n) = 2P(A)P(A^c) = 2P(A)(1 - P(A)). The only solutions
to the equation x(1 - x) = 0 are 0 and 1. ∎
2. Let S_n = X_1 + ⋯ + X_n, n ≥ 1. Events that depend on the tail of the sums are trivial
(i.e., have probability 1 or 0) whenever the summands X_1, X_2, ... are i.i.d. This is a
consequence of the following more general zero-one law for events that symmetrically
depend on the terms X_1, X_2, ... of an i.i.d. sequence of random variables (or vectors).
Let B^∞ denote the sigmafield of subsets of R^∞ = {(x_1, x_2, ...): x_i ∈ R¹} generated by
events depending on finitely many coordinates.

Theorem T.1.2. (Hewitt-Savage Zero-One Law). Let X_1, X_2, ... be an i.i.d. sequence
of random variables. If an event A = {(X_1, X_2, ...) ∈ B}, where B ∈ B^∞, is invariant
under finite permutations (X_{i_1}, X_{i_2}, ...) of terms of the sequence (X_1, X_2, ...), that
is, A = {(X_{i_1}, X_{i_2}, ...) ∈ B} for any finite permutation (i_1, i_2, ...) of (1, 2, ...), then
P(A) = 1 or 0.

As noted above, the symmetric dependence with respect to {X_n} applies, for
example, to tail events for the sums {S_n}.

Proof. To prove the Hewitt-Savage 0-1 law, proceed as in the Kolmogorov 0-1 law
by selecting finite-dimensional approximants to A of the form A_n = {(X_1, ..., X_n) ∈ B_n},
B_n ∈ B^n, such that P(A Δ A_n) → 0 as n → ∞. For each fixed n, let (i_1, i_2, ...) be the
permutation (2n, 2n-1, ..., 1, 2n+1, ...) and define Ã_n = {(X_{i_1}, ..., X_{i_n}) ∈ B_n}.
Then Ã_n and A_n are independent with P(Ã_n ∩ A_n) = P(Ã_n)P(A_n) = (P(A_n))² → (P(A))²
as n → ∞. On the other hand, P(A Δ Ã_n) = P(A Δ A_n) → 0, so that P(A Δ (A_n ∩ Ã_n)) → 0 and,
in particular, therefore P(A_n ∩ Ã_n) → P(A) as n → ∞. Thus x = P(A) satisfies x = x². ∎

Theoretical Complements to Section I.3

1. Theorem T.3.1. If {X_n} is an i.i.d. sequence of integer-valued random variables for
a general random walk S_n = X_1 + ⋯ + X_n, n ≥ 1, S_0 = 0, on Z with μ = EX_1 = 0,
then {S_n} is recurrent.

Proof. To prove this, first observe that P(S_n = 0 i.o.) is 1 or 0 by the Hewitt-Savage
zero-one law (theoretical complement 1.2). If Σ_n P(S_n = 0) < ∞, then P(S_n = 0
i.o.) = 0 by the Borel-Cantelli Lemma. If Σ_n P(S_n = 0) is divergent (i.e., the
expected number of visits to 0 is infinite), then we can show that P(S_n = 0 i.o.) = 1
as follows. Using independence and the property that the shifted sequence
X_{k+1}, X_{k+2}, ... has the same distribution as X_1, X_2, ..., one has

1 ≥ P(S_n = 0 finitely often) ≥ Σ_n P(S_n = 0, S_m ≠ 0, m > n)
 = Σ_n P(S_n = 0) P(S_m - S_n ≠ 0, m > n)
 = Σ_n P(S_n = 0) P(S_m ≠ 0, m ≥ 1).

Thus, if Σ_n P(S_n = 0) diverges, then P(S_m ≠ 0, m ≥ 1) = 0 or equivalently P(S_m = 0
for some m ≥ 1) = 1. This may now be extended by induction to get that at least r
visits to 0 is certain for each r = 1, 2, .... One may also use the strong Markov
property of Chapter II, Section 4, with the time of the rth visit to 0 as the stopping time.
From here the proof rests on showing Σ_n P(S_n = 0) is divergent when μ = 0.
Consider the generating function of the sequence P(S_n = 0), n = 0, 1, 2, ..., namely,

g(x) := Σ_{n=0}^{∞} P(S_n = 0) x^n, |x| < 1.

The problem is to investigate the divergence of g(x) as x → 1⁻. Note that P(S_n = 0)
is the 0th-term Fourier coefficient of the characteristic function (Fourier series)

Ee^{itS_n} = Σ_k P(S_n = k) e^{itk}.

Thus,

P(S_n = 0) = (1/2π) ∫_{-π}^{π} Ee^{itS_n} dt = (1/2π) ∫_{-π}^{π} φ^n(t) dt,

where φ(t) = Ee^{itX_1}. It follows that for |x| < 1,

g(x) = (1/2π) ∫_{-π}^{π} dt/(1 - xφ(t)).

Thus, recurrence or transience depends on the divergence or convergence, respectively,
of this integral. Now, with μ = 0, we have φ(t) = 1 + o(|t|) as t → 0. Thus, for any
ε > 0 there is a δ > 0 such that |1 - φ_1(t)| ≤ ε|t|, |φ_2(t)| ≤ ε|t|, for |t| ≤ δ, where
φ(t) = φ_1(t) + iφ_2(t) has real and imaginary parts φ_1, φ_2. Now, for 0 < x < 1, noting
that g(x) is real valued,

∫_{-π}^{π} dt/(1 - xφ(t)) = ∫_{-π}^{π} Re( 1/(1 - xφ(t)) ) dt = ∫_{-π}^{π} (1 - xφ_1(t))/|1 - xφ(t)|² dt
 ≥ ∫_{-δ}^{δ} (1 - xφ_1(t))/[(1 - xφ_1(t))² + x²φ_2²(t)] dt
 ≥ ∫_{-δ}^{δ} (1 - x)/[(1 - x + xε|t|)² + x²ε²t²] dt
 ≥ ∫_{-δ}^{δ} (1 - x)/[2(1 - x)² + 3x²ε²t²] dt ≥ ∫_{-δ}^{δ} (1 - x)/(3[(1 - x)² + ε²t²]) dt
 = (2/3ε) tan^{-1}( εδ/(1 - x) ) → π/(3ε) as x → 1⁻.

Since ε is arbitrary, this completes the argument. ∎

The above argument is a special case of the so-called Chung-Fuchs recurrence
criterion developed by K. L. Chung and W. H. J. Fuchs (1951), "On the Distribution
of Values of Sums of Random Variables," Mem. Amer. Math. Soc., No. 6.
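The divergence/convergence dichotomy behind this criterion can be seen numerically. The sketch below (an illustration of mine, not from the text) uses the exact binomial return probabilities P(S_{2n} = 0) = C(2n, n) p^n q^n for ±1 steps, computed in log form to avoid overflow; the partial sums grow without bound (like a constant times √N) only in the driftless case p = 1/2:

from math import lgamma, exp, log

def log_return_prob(n, p):
    # log P(S_{2n} = 0) for a walk with steps +1 (prob p), -1 (prob 1-p)
    return lgamma(2*n + 1) - 2*lgamma(n + 1) + n*log(p) + n*log(1 - p)

for p in (0.5, 0.55):
    total = sum(exp(log_return_prob(n, p)) for n in range(1, 20001))
    print(p, total)    # diverging partial sum for p = 0.5, convergent otherwise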

Theoretical Complements to Section I.6

1. The Kolmogorov Extension Theorem holds for more general spaces S than the case
S = R¹ presented. In general it requires that S be homeomorphic to a complete and
separable metric space. So, for example, it applies whenever S is R^k or a rectangle,
or when S is a finite or countable set. Assuming some background in analysis, a
simple proof of Kolmogorov's theorem (due to Edward Nelson (1959), "Regular
Probability Measures on Function Space," Annals of Math., 69, pp. 630-643) in the
case of compact S can be made as follows. Define a linear functional l on the subspace
of C(S^∞) consisting of continuous functions on S^∞ that depend on finitely many
coordinates, by

l(f) := ∫ f̃(x_{i_1}, ..., x_{i_k}) μ_{i_1,...,i_k}(dx_{i_1} ⋯ dx_{i_k}),

where

f((x_i)_{i∈I}) = f̃(x_{i_1}, ..., x_{i_k}).

By consistency, l is a well-defined linear functional on a subspace of C(S^∞) that, by
the Stone-Weierstrass Theorem of functional analysis, is dense in C(S^∞); note that S^∞
is compact for the product topology by Tychonoff's Theorem from topology (see H.
L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 174, 166). In
particular, l has a natural extension to C(S^∞) that is linear and continuous. Now
apply the Riesz Representation Theorem to get a (probability) measure μ on (S^∞, F)
such that for any f ∈ C(S^∞), l(f) = ∫_{S^∞} f dμ (see Royden, loc. cit., p. 310). To make
the proof in the noncompact but separable and complete case, one can use a
fundamental (homeomorphic) embedding of S into R^∞ (see P. Billingsley (1968),
Convergence of Probability Measures, Wiley, New York, p. 219); this is a special case
of Urysohn's Theorem in topology (see H. L. Royden, loc. cit., p. 149). Then, by
making a two-point compactification of R¹, S can be further embedded in the Hilbert
cube [0, 1]^∞, where the measure μ can be obtained as above. Consistency allows one
to restrict μ back to S^∞. ∎

Theoretical Complements to Section I.7

1. (Brownian Motion and the Inadequacy of Kolmogorov's Extension Theorem) Let
S = C[0, 1] denote the space of continuous real-valued functions on [0, 1] equipped
with the uniform metric

ρ(x, y) = max_{0≤t≤1} |x(t) - y(t)|, x, y ∈ C[0, 1].

The Borel sigmafield B of C[0, 1] for the metric ρ is the smallest sigmafield of subsets
of C[0, 1] that contains all finite-dimensional events of the form

{x ∈ C[0, 1]: a_i ≤ x(t_i) ≤ b_i, i = 1, ..., k},
k ≥ 1, 0 ≤ t_1 < ⋯ < t_k ≤ 1, a_1, ..., a_k, b_1, ..., b_k ∈ R¹.

The problem of constructing standard Brownian motion (starting at 0) on the time
interval 0 ≤ t ≤ 1 corresponds to constructing a process {X_t: 0 ≤ t ≤ 1} having
Gaussian finite-dimensional distributions with mean zero and variance-covariance
matrix γ_ij = min(t_i, t_j) for the time points 0 ≤ t_1 < ⋯ < t_k ≤ 1, and having a.s.
continuous sample paths. One can easily construct a process on the product space
Ω = R^{[0,1]} (or R^{[0,∞)}) equipped with the sigmafield F generated by finite-dimensional
events of the form {x ∈ R^{[0,1]}: a_i < x(t_i) ≤ b_i, i = 1, ..., k} having the prescribed
finite-dimensional distributions; in particular, consistency in the Kolmogorov
extension theorem can be checked by noting that for any time points
t_1 < t_2 < ⋯ < t_k, the matrix γ_ij = min(t_i, t_j), 1 ≤ i, j ≤ k, is symmetric and
nonnegative definite, since for any real numbers x_1, ..., x_k, t_0 = 0,

Σ_{i,j} γ_ij x_i x_j = Σ_{i,j} x_i x_j Σ_{r=1}^{min(i,j)} (t_r - t_{r-1}) = Σ_{r=1}^{k} (t_r - t_{r-1}) ( Σ_{i=r}^{k} x_i )² ≥ 0.

However, the problem with this construction of a probability space (Ω, F, P) for
{X_t} is that events in F can only depend on specifications of values at countably
many time points. Thus, the subset C[0, 1] of Ω is not measurable; i.e., C[0, 1] ∉ F.
This dilemma is resolved in the theoretical complement to Section I.13 by showing
that there is a modification of the process {X_t} that yields a process {B_t} with sample
paths in C[0, 1] and having the same finite-dimensional distributions as {X_t}; i.e.,
{B_t} is the desired Brownian motion process. The basic idea for this modification is
to show that almost all paths q → X_q, q ∈ D, where D is a countable dense set of time
points, are uniformly continuous. With this, one can then define {B_t} by the continuous
extension of these paths given by

B_t = X_q, if t = q ∈ D,
   = lim_{q↓t} X_q, if t ∉ D.

It is then a simple matter to check that the finite-dimensional distributions of {X_t}
carry over to {B_t} under this extension. The distribution of {B_t} defines the Wiener
measure on C[0, 1]. So, from the point of view of Kolmogorov's (canonical)
construction, the main problem remains that of showing a.s. uniform continuity on
a countable dense set. A solution is given in theoretical complement 13.1.
2. In general, if {X_t: t ∈ I} is a stochastic process with an uncountable index set, for
example I = [0, 1], then the measurability of events that depend on sample paths at
uncountably many points t ∈ I can be at issue. This issue goes beyond the
finite-dimensional distributions, and is typically solved by exploiting properties of
the sample paths of the model. For example, if I is discrete, say I = {0, 1, 2, ...},
then the event {sup_{t∈I} X_t > x} is the countable union ∪_{t∈I} {X_t > x} of measurable
sets, which is, therefore, measurable. But, on the other hand, if I is uncountable, say
I = [0, 1], then {sup_{t∈I} X_t > x} is an uncountable union of measurable sets. This of
course does not make {sup_{t∈I} X_t > x} measurable. If, for example, it is known that
{X_t} has continuous sample paths, however, then, letting T denote the rationals in
[0, 1], we have {sup_{t∈I} X_t > x} = {sup_{t∈T} X_t > x} = ∪_{t∈T} {X_t > x}. Thus {sup_{t∈I} X_t > x}
is seen to be measurable under these circumstances. While it is not always possible
to construct a stochastic process {X_t: t ≥ 0} having continuous sample paths for a
given consistent specification of finite-dimensional distributions, it is often possible
to construct a model (Ω, F, P), {X_t: t ≥ 0}, with the following property, called
separability. There is a countable dense subset T of I = [0, ∞) and a set D ∈ F with
P(D) = 0 such that for each ω ∈ Ω - D and each t ∈ I there is a sequence t_1, t_2, ... in
T (which may depend on ω) such that t_n → t and X_{t_n}(ω) → X_t(ω) (P. Billingsley (1986),
Probability and Measure, 2nd ed., Wiley, New York, p. 558). In theory, this is enough
sample path regularity to make manageable most such measurability issues connected
with processes at uncountably many time points. In practice though, one seeks to
explicitly construct models with sufficient sample path regularity that such
considerations are often avoidable. The latter is the approach of this text.

Theoretical Complements to Section I.8

1. Let {X_t^{(n)}}, n = 1, 2, ..., and {X_t} be stochastic processes whose sample paths
belong to a metric space S with metric ρ; for example, S = C[0, 1] with
ρ(ω, η) = sup_{0≤t≤1} |ω_t - η_t|, ω, η ∈ S. Let B denote the Borel sigmafield for S. The
distributions P of {X_t} and P_n of {X_t^{(n)}}, respectively, are probability measures on
(S, B). Assume {X_t^{(n)}} and {X_t} are defined on a probability space (Ω, F, Q). Then,

(i) P(B) = Q({ω ∈ Ω: {X_t(ω)} ∈ B}),
(ii) P_n(B) = Q({ω ∈ Ω: {X_t^{(n)}(ω)} ∈ B}), n ≥ 1.    (T.8.1)

Convergence in distribution of {X^{(n)}} to {X} has been defined in the text to mean that
the sequence of real-valued random variables Y_n := f({X_t^{(n)}}) converges in distribution
to Y := f({X_t}) for each continuous (for the metric ρ) function f: S → R¹. However,
an equivalent condition is that for each bounded and continuous real-valued function
f: S → R¹ one has

lim_{n→∞} Ef({X_t^{(n)}}) = Ef({X_t}).    (T.8.2)

To see the equivalence, first observe that for any continuous f: S → R¹, the
functions cos(rf) and sin(rf) are, for each r ∈ R¹, continuous and bounded functions
on S. Therefore, the condition (T.8.2) gives the convergence of the
characteristic functions of the Y_n to that of Y for each continuous f on S. In particular,
the Y_n must converge in distribution to Y. To go the other way, suppose that f: S → R¹
is continuous and bounded. Assume without loss of generality that 0 ≤ f ≤ 1. Then,
for each N ≥ 1,

Σ_{k=1}^{N} ((k-1)/N) P( (k-1)/N ≤ f < k/N ) ≤ ∫_S f dP ≤ Σ_{k=1}^{N} (k/N) P( (k-1)/N ≤ f < k/N ).    (T.8.3)

Equivalently, by rearranging terms,

(1/N) Σ_{k=1}^{N} P(f > k/N) ≤ ∫_S f dP ≤ (1/N) + (1/N) Σ_{k=1}^{N} P(f > k/N).    (T.8.4)

Likewise, these inequalities hold with P_n in place of P. Therefore, by (T.8.4) applied
to P,

∫_S f dP ≤ (1/N) + (1/N) Σ_{k=1}^{N} P(f > k/N)
 ≤ (1/N) + (1/N) Σ_{k=1}^{N} liminf_n P_n(f > k/N)
 ≤ liminf_n ∫_S f dP_n + 1/N,    (T.8.5)

by (T.8.4) applied to P_n, and the fact that lim_n Prob(Y_n > x) = Prob(Y > x) for all
points x of continuity of the d.f. of Y implies liminf_n Prob(Y_n > y) ≥ Prob(Y > y) for
all y. Letting N → ∞ gives

∫_S f dP ≤ liminf_n ∫_S f dP_n.

The same argument applied to -f gives that

∫_S f dP ≥ limsup_n ∫_S f dP_n.

Thus, in general,

limsup_n ∫_S f dP_n ≤ ∫_S f dP ≤ liminf_n ∫_S f dP_n,    (T.8.6)

which implies

limsup_n ∫_S f dP_n = liminf_n ∫_S f dP_n = ∫_S f dP.    (T.8.7)

This is the desired condition (T.8.2). ∎

With the above equivalence in mind we make the following general definition.

Definition. A sequence {P_n} of probability measures on (S, B) converges weakly (or in
distribution) to a probability measure P on (S, B) provided that lim_n ∫_S f dP_n = ∫_S f dP
for all bounded and continuous functions f: S → R¹.

Weak convergence is sometimes denoted by P_n ⇒ P as n → ∞. Other equivalent
notions of convergence in distribution are as follows. The proofs are along the lines
of the arguments above. (P. Billingsley (1968), Convergence of Probability Measures,
Theorem 9.1, pp. 11-14.)

Theorem T.8.1. (Alexandrov). P_n ⇒ P as n → ∞ if and only if

(i) lim_{n→∞} P_n(A) = P(A) for all A ∈ B such that P(∂A) = 0, where ∂A denotes the
boundary of A for the metric ρ.
(ii) limsup_n P_n(F) ≤ P(F) for all closed sets F ⊂ S.
(iii) liminf_n P_n(G) ≥ P(G) for all open sets G ⊂ S.

2. The convergence of a sequence of probability measures is frequently established by
an application of the following theorem due to Prohorov.

Theorem T.8.2. (Prohorov). Let {P_n} be a sequence of probability measures on the
metric space S with Borel sigmafield B. If for each ε > 0 there is a compact set K_ε ⊂ S
such that

P_n(K_ε) > 1 - ε for all n = 1, 2, ...,    (T.8.8)

then {P_n} has a subsequence weakly convergent to a probability measure Q on (S, B).
Moreover, if S is complete and separable then the condition (T.8.8) is also necessary.

The condition (T.8.8) is referred to as tightness of the sequence of probability measures
{P_n}. A proof of sufficiency of the tightness condition in the special case S = R¹ is
given in Chapter 0, Theorem 5.2. For the general result, consult Billingsley, loc. cit.,
pp. 37-40. A version of (T.8.8) for processes with continuous paths is given
below in Theorem T.8.4.
3. In the case of probability measures {P_n} on S = C[0, 1], if the finite-dimensional
distributions of P_n converge to those of P and if the sequence {P_n} is tight, then it
will follow from Prohorov's theorem that {P_n} converges weakly to P. To check
tightness it is useful to have the following characterization of (relatively) compact
subsets of C[0, 1] from real analysis (A. N. Kolmogorov and S. V. Fomin (1975),
Introductory Real Analysis, Dover, New York, p. 102).

Theorem T.8.3. (Arzelà-Ascoli). A subset A of functions in C[0, 1] has compact
closure if and only if

(i) sup_{ω∈A} |ω_0| < ∞,
(ii) lim_{δ→0} sup_{ω∈A} v_ω(δ) = 0,

where v_ω(δ) is the oscillation of ω ∈ C[0, 1] defined by v_ω(δ) = sup_{|t-s|≤δ} |ω_t - ω_s|.

The condition (ii) refers to the equicontinuity of the functions in A in the sense that
given any ε > 0 there is a common δ > 0 such that for all functions ω ∈ A we have
|ω_t - ω_s| < ε if |t - s| < δ. Conditions (i) and (ii) together imply that A is uniformly
bounded in the sense that there is a number B for which

‖ω‖ := sup_{0≤t≤1} |ω_t| ≤ B for all ω ∈ A.

This is because for N sufficiently large we have sup_{ω∈A} v_ω(1/N) < 1 and, therefore,
for each 0 ≤ t ≤ 1,

|ω_t| ≤ |ω_0| + Σ_{i=1}^{N} |ω_{it/N} - ω_{(i-1)t/N}| ≤ sup_{ω∈A} |ω_0| + N sup_{ω∈A} v_ω(1/N) =: B.

4. Combining the Prohorov theorem (T.8.2) with the Arzelà-Ascoli theorem (T.8.3)
gives the following criterion for tightness of probability measures {P_n} on S = C[0, 1].

Theorem T.8.4. Let {P_n} be a sequence of probability measures on C[0, 1]. Then
{P_n} is tight if and only if the following two conditions hold.

(i) For each η > 0 there is a number B such that

P_n({ω ∈ C[0, 1]: |ω_0| > B}) ≤ η, n = 1, 2, ....

(ii) For each ε > 0, η > 0, there is a 0 < δ < 1 such that

P_n({ω ∈ C[0, 1]: v_ω(δ) ≥ ε}) ≤ η, n ≥ 1.

Proof. If {P_n} is tight, then given η > 0 there is a compact K such that P_n(K) > 1 - η
for all n. By the Arzelà-Ascoli theorem, if B > sup_{ω∈K} |ω_0| then

P_n({ω ∈ C[0, 1]: |ω_0| ≥ B}) ≤ P_n(K^c) ≤ 1 - (1 - η) = η.

Also given ε > 0 select δ > 0 such that sup_{ω∈K} v_ω(δ) < ε. Then

P_n({ω ∈ C[0, 1]: v_ω(δ) ≥ ε}) ≤ P_n(K^c) ≤ η for all n ≥ 1.

The converse goes as follows. Given η > 0, first select B using (i) such that
P_n({ω: |ω_0| ≤ B}) ≥ 1 - η/2 for n ≥ 1. Select δ_r using (ii) such that P_n({ω: v_ω(δ_r) < 1/r})
≥ 1 - η/2^{r+1} for n ≥ 1. Now take K to be the closure of

{ω: |ω_0| ≤ B} ∩ ∩_{r=1}^{∞} {ω: v_ω(δ_r) < 1/r}.

Then P_n(K) > 1 - η for n ≥ 1, and K is compact by the Arzelà-Ascoli theorem. ∎

The above theorem taken with Prohorov's theorem is a cornerstone of weak
convergence theory in C[0, 1]. If one has proved convergence of the finite-dimensional
distributions of {X_t^{(n)}} to {X_t}, then the distributions of {X_0^{(n)}} must be tight as
probability measures on R¹, so that condition (i) is implied. In view of this and the
Prohorov theorem we have the following necessary and sufficient condition for weak
convergence in C[0, 1] based on convergence of the finite-dimensional distributions
and tightness.

Theorem T.8.5. Let {X_t^{(n)}: 0 ≤ t ≤ 1} and {X_t: 0 ≤ t ≤ 1} be stochastic processes on
(Ω, F, P) which have a.s. continuous sample paths and suppose that the
finite-dimensional distributions of {X^{(n)}} converge to those of {X}. Then {X^{(n)}}
converges weakly to {X_t} if and only if for each ε > 0

lim_{δ→0} sup_n P( sup_{|s-t|<δ} |X_t^{(n)} - X_s^{(n)}| > ε ) = 0.

Corollary. For the last limit to hold it is sufficient that there be positive numbers
α, β, M such that

E|X_t^{(n)} - X_s^{(n)}|^α ≤ M|t - s|^{1+β} for all s, t, n.

To prove the corollary, let D be the set of all dyadic rationals in [0, 1], i.e., numbers
in [0, 1] of the form j/2^m for integers j and m. By sample path continuity, the oscillation
of the process over |t - s| ≤ δ is given by

v_n(D, δ) = sup_{|t-s|≤δ, s,t∈D} |X_t^{(n)} - X_s^{(n)}| a.s.

Take δ = 2^{-k+1}. Then, taking suprema over positive integers i, j, m,

v_n(D, δ) ≤ 2 sup_{j2^{-k} ≤ i2^{-m} < (j+1)2^{-k}} |X_{i2^{-m}}^{(n)} - X_{j2^{-k}}^{(n)}|.

Now, for j2^{-k} ≤ i2^{-m} < (j+1)2^{-k}, writing

i2^{-m} = j2^{-k} + Σ_{ν=1}^{r} 2^{-m_ν}, where k < m_1 < m_2 < ⋯ < m_r ≤ m,

we have, writing a(p) = Σ_{ν=1}^{p} 2^{-m_ν},

X_{i2^{-m}}^{(n)} - X_{j2^{-k}}^{(n)} = Σ_{p=1}^{r} ( X_{j2^{-k}+a(p)}^{(n)} - X_{j2^{-k}+a(p-1)}^{(n)} ).

Therefore,

v_n(D, δ) ≤ 2 Σ_{m=k+1}^{∞} sup_{0≤h≤2^m - 1} |X_{(h+1)2^{-m}}^{(n)} - X_{h2^{-m}}^{(n)}|.

Let ε > 0 and take δ = 2^{-k+1} so small (i.e., k so large) that Σ_{m=k+1}^{∞} 1/m² < ε/2. Then

P(v_n(D, δ) > ε) ≤ Σ_{m=k+1}^{∞} P( sup_{0≤h≤2^m - 1} |X_{(h+1)2^{-m}}^{(n)} - X_{h2^{-m}}^{(n)}| > 1/m² )
 ≤ Σ_{m=k+1}^{∞} Σ_{h=0}^{2^m - 1} P( |X_{(h+1)2^{-m}}^{(n)} - X_{h2^{-m}}^{(n)}| > 1/m² ).

Now apply Chebyshev's Inequality to get

P(v_n(D, δ) > ε) ≤ Σ_{m=k+1}^{∞} Σ_{h=0}^{2^m - 1} m^{2α} E|X_{(h+1)2^{-m}}^{(n)} - X_{h2^{-m}}^{(n)}|^α
 ≤ Σ_{m=k+1}^{∞} m^{2α} 2^m M 2^{-m(1+β)} = M Σ_{m=k+1}^{∞} m^{2α} 2^{-mβ}.

This bound does not depend on n and goes to zero as δ → 0 (i.e.,
k = log₂ δ^{-1} + 1 → ∞) since it is the tail of a convergent series. This proves the
corollary. ∎

5. (FCLT and a Brownian Motion Construction) As an application of the above
corollary one can obtain a proof of Donsker's Invariance Principle for i.i.d. summands
having finite fourth moments. Moreover, since the simple random walk has moments
of all orders, this approach can also be used to give an alternative rigorous construction
of the Wiener measure based on Prohorov's theorem as the limiting distribution of
random walks. (Compare theoretical complement 13.1 for another construction.) Let
Z_1, Z_2, ... be i.i.d. random variables on a probability space (Ω, F, P) having mean
zero, variance one, and finite fourth moment m_4 = EZ_1^4. Define S_0 = 0,
S_n = Z_1 + ⋯ + Z_n, n ≥ 1, and

X_t^{(n)} = n^{-1/2} S_{[nt]} + n^{-1/2} (nt - [nt]) Z_{[nt]+1}, 0 ≤ t ≤ 1.

We will show that there are positive numbers α, β, and M such that

E|X_t^{(n)} - X_s^{(n)}|^α ≤ M|t - s|^{1+β} for 0 ≤ s, t ≤ 1, n = 1, 2, ....    (T.5.1)

By our corollary this will prove tightness of the distributions of the processes {X_t^{(n)}},
n = 1, 2, .... This together with the finite-dimensional CLT proves the FCLT under
the assumption of finite fourth moments. One needs to calculate the probabilities of
fluctuations described in the Arzelà-Ascoli theorem more carefully to get the proof
under finite second moments alone.
To establish (T.5.1), take α = 4. First consider the case where s = j/n < k/n = t are
at the grid points. Then

E{X_t^{(n)} - X_s^{(n)}}⁴ = n^{-2} E{Z_{j+1} + ⋯ + Z_k}⁴
 = n^{-2} Σ_{i_1=j+1}^{k} Σ_{i_2=j+1}^{k} Σ_{i_3=j+1}^{k} Σ_{i_4=j+1}^{k} E{Z_{i_1} Z_{i_2} Z_{i_3} Z_{i_4}}
 = n^{-2} { (k-j) EZ_1^4 + 3(k-j)(k-j-1)(EZ_1^2)² }.

Thus, in this case,

E{X_t^{(n)} - X_s^{(n)}}⁴ = n^{-2}{(k-j)m_4 + 3(k-j)(k-j-1)}
 ≤ n^{-2}{(k-j)m_4 + 3(k-j)²} ≤ (m_4 + 3)((k-j)/n)²
 = (m_4 + 3)|t - s|² = c_1(t - s)², where c_1 = m_4 + 3.

Next, consider the more general case 0 ≤ s, t ≤ 1, but for which |t - s| ≥ 1/n. Then,
for s < t,

E{X_t^{(n)} - X_s^{(n)}}⁴ = n^{-2} E( Σ_{j=[ns]+1}^{[nt]} Z_j + (nt - [nt]) Z_{[nt]+1} - (ns - [ns]) Z_{[ns]+1} )⁴
 ≤ n^{-2} 3⁴ { E( Σ_{j=[ns]+1}^{[nt]} Z_j )⁴ + (nt - [nt])⁴ EZ_{[nt]+1}^4 + (ns - [ns])⁴ EZ_{[ns]+1}^4 }
 ≤ n^{-2} 3⁴ { c_1([nt] - [ns])² + (nt - [nt])² m_4 + (ns - [ns])² m_4 }
 ≤ n^{-2} 3⁴ c_1 { ([nt] - [ns])² + (nt - [nt])² + (ns - [ns])² }
 ≤ n^{-2} 3⁴ c_1 { ([nt] - [ns]) + (nt - [nt]) + (ns - [ns]) }²
 = n^{-2} 3⁴ c_1 { nt - ns + 2(ns - [ns]) }²
 ≤ n^{-2} 3⁴ c_1 { nt - ns + 2(nt - ns) }²
 = n^{-2} 3⁶ c_1 (nt - ns)² = 3⁶ c_1 (t - s)².

In the above, we used the fact that (a + b + c)⁴ ≤ 3⁴(a⁴ + b⁴ + c⁴) to get the first
inequality. The analysis of the first (grid-point) case was then used to get the second
inequality. Finally, if |t - s| < 1/n, then either

(a) k/n ≤ s < t ≤ (k+1)/n for some 0 ≤ k ≤ n-1, or
(b) k/n ≤ s ≤ (k+1)/n ≤ t ≤ (k+2)/n for some 0 ≤ k ≤ n-2.

In (a), S_{[nt]} = S_{[ns]}, so that, since |nt - ns| ≤ 1,

E{X_t^{(n)} - X_s^{(n)}}⁴ = n^{-2} E{n(t - s) Z_{k+1}}⁴ = m_4 n²(t - s)⁴
 = m_4 (nt - ns)²(t - s)² ≤ m_4 (t - s)².

In (b), S_{[nt]} = S_{[ns]} + Z_{k+1}, so that

E{X_t^{(n)} - X_s^{(n)}}⁴ = n^{-2} E( n(t - (k+1)/n) Z_{k+2} + n((k+1)/n - s) Z_{k+1} )⁴
 ≤ 2⁴ n² m_4 { (t - (k+1)/n)⁴ + ((k+1)/n - s)⁴ }
 ≤ 2⁴ n² m_4 (t - s)⁴ = 2⁴ m_4 (nt - ns)²(t - s)² ≤ 2⁴ m_4 (t - s)².

Take β = 1, M = max{2⁴ m_4, 3⁶(m_4 + 3)} = 3⁶(m_4 + 3) for α = 4. ∎

The FCLT (Theorem 8.1) is stated in the text for convergence in S = C[0, ∞),
where S has the topology of uniform convergence on compacts. One may take the metric
to be ρ(ω, ω') = Σ_{k=1}^{∞} 2^{-k} d_k/(1 + d_k), where d_k = max{|ω(t) - ω'(t)|: 0 ≤ t ≤ k}.
Since the above arguments apply to [0, k] in place of [0, 1], the assertion of Theorem
8.1 follows (under the moment condition m_4 < ∞).

6. (Measure-Determining Classes) Let (S, ρ) be a metric space, B(S) its Borel sigmafield.
A class C ⊂ B(S) is measure-determining if, for any two finite measures μ, ν,
μ(C) = ν(C) ∀C ∈ C implies μ = ν. An example is the class D of all closed sets. To
see this, consider the lambda class A of all sets A for which μ(A) = ν(A). If this class
contains D then by the Pi-Lambda Theorem (Chapter 0, Theorem 4.1)
A ⊃ σ(D) = B(S). Similarly, the class O of all open sets is measure-determining. A
class G of real-valued bounded Borel-measurable functions on S is measure-determining
if ∫ g dμ = ∫ g dν ∀g ∈ G implies μ = ν. The class C_b(S) of real-valued bounded
continuous functions on S is measure-determining. To prove this, it is enough to
show that for each F ∈ D there exists a sequence {f_n} ⊂ C_b(S) such that f_n ↓ 1_F as
n ↑ ∞. For this, let h_n(r) = 1 - nr for 0 ≤ r ≤ 1/n, h_n(r) = 0 for r > 1/n. Then take
f_n(x) = h_n(ρ(x, F)).

Theoretical Complements to Section I.9

1. If f: C[0, ∞) → R¹ is continuous, then the FCLT provides that the real-valued
random variables X_n = f({X_t^{(n)}}), n ≥ 1, converge in distribution to X = f({X_t}).
Thus, if one checks by direct computation that the limit F(a) = lim_n P(X_n ≤ a), or
F(a-) = lim_n P(X_n < a), exists for all a, then F is the d.f. of X. Applying this to
f({X_s}) := max{X_s: 0 ≤ s ≤ t} = M_t, we get (10.10), for P(τ_z > t) = P(M_t < z), if z > 0.
The case z < 0 is similar.
Joint distributions of several functionals may be similarly obtained by looking at
linear combinations of the functionals. Here is the precise statement.

Theorem T.9.1. If f: C[0, ∞) → R^k is continuous, say f = (f_1, ..., f_k) where
f_i: C[0, ∞) → R¹, then the random vectors X_n = f({X_t^{(n)}}), n ≥ 1, converge in
distribution to X = f({X_t}).

Proof. For any r_1, ..., r_k ∈ R¹, Σ_{j=1}^{k} r_j f_j: C[0, ∞) → R¹ is continuous so that
Σ_{j=1}^{k} r_j f_j({X^{(n)}}) converges in distribution to Σ_{j=1}^{k} r_j f_j({X}) by the FCLT. Therefore,
its characteristic function converges to Ee^{i⟨r, f({X})⟩} as n → ∞ for each r ∈ R^k. This
means f({X^{(n)}}) converges in distribution to f({X_t}) as asserted. ∎

2. (Mann-Wald) It is sufficient that f: C[0, ∞) → R¹ be only a.s. continuous with
respect to the limiting distribution for the FCLT to apply, i.e., for the convergence
of f({X_t^{(n)}}) in distribution to f({X_t}). That is,

Theorem T.9.2. If {X_t^{(n)}} converges in distribution to {X_t} and if P({X_t} ∈ D_f) = 0,
where

D_f = {x ∈ C[0, ∞): f is discontinuous at x},

then f({X^{(n)}}) converges in distribution to f({X_t}).

Proof. This can be proved using Alexandrov's Theorem (T.8.1(ii)), since for any
closed set F, f^{-1}(F) ⊂ closure(f^{-1}(F)) ⊂ D_f ∪ f^{-1}(F), where the closure is taken
in C[0, ∞). ∎

In applications it is the requirement that the limit distribution assign probability
zero to the event D_f that is sometimes nontrivial to check. As an example, consider
(9.5) and (9.6). Let F = {τ_c < τ_d}. Recall that Ω = C[0, ∞) is given the topology
of uniform convergence on bounded subintervals of [0, ∞). Consider an element ω ∈ Ω
that reaches a number less than c before reaching d. It is simple to check that ω
belongs to the interior of F. On the other hand, if ω belongs to the closure of F, then
either (i) ω ∈ F, or (ii) ω reaches neither c nor d. Thus, if ω ∈ ∂F, then either (ii)
occurs, the probability of which is zero for a Brownian motion since |X_t| → ∞ in
probability, as t → ∞, or (iii) in the interval [τ_c, τ_d), ω never goes below c. By the
strong Markov property (see Chapter V, Theorem 11.1), the latter event (iii) has the
same probability as that of a Brownian motion, starting at c, never reaching below
c before reaching d. In turn, the last probability is the same as that of a Brownian
motion, starting at 0, never reaching below 0 before reaching d - c. This probability
is zero, since τ_{-z} converges to zero in probability as z ↓ 0 (see Eq. 10.10). By
combining such arguments with those given in theoretical complement 1 above, one
may also arrive at (10.15) for Brownian motions with nonzero drifts.

Theoretical Complements to Section I.12

1. A proof of Proposition 12.1 for the special case of infinite-dimensional events that
depend on the empirical process through the functional (ω → sup_{0≤t≤1} |ω_t|) used to
define the Kolmogorov-Smirnov statistic (12.11) is given below. This proof is based
on a trick of M. D. Donsker (1952), "Justification and Extension of Doob's Heuristic
Approach to the Kolmogorov-Smirnov Theorems," Annals Math. Statist., 23,
pp. 277-281, which allows one to apply the FCLT as given in Section 8 (and proved
in the theoretical complements to Section I.8 under the assumption of finite fourth
moments).
The key to Donsker's proof is the simple observation that the distribution of the
order statistic (Y_(1), ..., Y_(n)) of n i.i.d. random variables Y_1, Y_2, ..., Y_n from the uniform
distribution on [0, 1] can also be obtained as the distribution of the ratios

( S_1/S_{n+1}, S_2/S_{n+1}, ..., S_n/S_{n+1} ),

where S_k = T_1 + ⋯ + T_k, k ≥ 1, and T_1, T_2, ... is an i.i.d. sequence of (mean 1)
exponentially distributed random variables.
Intuitively, if the T_i are regarded as the successive times between occurrences of some
phenomena, then S_{n+1} is the time to the (n + 1)st occurrence and, in units of S_{n+1},
the occurrence times should be randomly distributed because of the lack of memory and
independence properties. A version of this simple fact is given in Chapter IV
(Proposition 5.6) for the Poisson process. The calculations are essentially the same,
so this is left as an exercise here.
The precise result that we will prove here is as follows. The symbol ≝ below
denotes equality in distribution.

Proposition T.12.1. Let Y_1, Y_2, ... be i.i.d. uniform on [0, 1] and let, for each n ≥ 1,
{F_n(t)} be the corresponding empirical process based on Y_1, ..., Y_n. Then
D_n = √n sup_{0≤t≤1} |F_n(t) - t| converges in distribution to sup_{0≤t≤1} |B*_t| as n → ∞.



Proof. In the notation introduced above, we have

D_n = √n sup_{0≤t≤1} |F_n(t) - t| = √n max_{k≤n} |Y_(k) - k/n|
 ≝ √n max_{k≤n} | S_k/S_{n+1} - k/n |
 = (n/S_{n+1}) √n max_{k≤n} | S_k/n - (k/n)(S_{n+1}/n) |
 = (n/S_{n+1}) [ sup_{0≤t≤1} |X_t^{(n)} - t X_1^{(n)}| + O(n^{-1/2}) ],    (T.12.1)

where

X_t^{(n)} = (S_{[nt]} - [nt])/√n,

and, by the SLLN, n/S_{n+1} → 1 a.s. as n → ∞. The result follows from the FCLT,
(8.6), and the definition of the Brownian bridge. ∎
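A Monte Carlo sketch (mine, with arbitrary n and trial count) comparing the finite-sample law of √n sup|F_n(t) - t| for uniforms against the Kolmogorov limit P(sup|B*| ≤ y) = 1 + 2 Σ_{k≥1} (-1)^k e^{-2k²y²} from Exercise 4(iii):

import numpy as np

rng = np.random.default_rng(7)
n, trials = 500, 10000
U = np.sort(rng.random((trials, n)), axis=1)
grid = np.arange(1, n + 1) / n
# sup_t |F_n(t) - t| is attained at the order statistics
D = np.maximum(np.abs(grid - U), np.abs(grid - 1/n - U)).max(axis=1)
stat = np.sqrt(n) * D

def K(y, terms=100):
    k = np.arange(1, terms + 1)
    return 1 + 2 * ((-1)**k * np.exp(-2 * k**2 * y**2)).sum()

for y in (0.8, 1.0, 1.36):
    print(y, (stat <= y).mean(), K(y))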

This proposition is sufficient for the particular application to the large-sample
distribution given in the text. A precise definition of weak convergence of the scaled
empirical distribution function as well as a proof of Proposition 12.1 can be found
in P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York.
2. Tabulations of the distribution of the Kolmogorov-Smirnov statistic can be found
in L. H. Miller (1956), "Table of Percentage Points of Kolmogorov Statistics," J.
Amer. Statist. Assoc., 51, pp. 111-121.
3. Let (X | A) denote the conditional distribution of a random variable or stochastic
process X given an event A. As suggested by Exercise 12.4*, from the convergence
of the finite-dimensional distributions it is possible to obtain the Brownian bridge
as the limiting distribution of ({B_t} | 0 ≤ B_1 ≤ ε) as ε → 0, where {B_t} denotes standard
Brownian motion on 0 ≤ t ≤ 1. To make this observation firm, one needs to check
tightness of the family {({B_t} | 0 ≤ B_1 ≤ ε): ε > 0}. The following argument is used
by P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York, p. 84.
Let F be any closed subset of C[0, 1]. Then, since sup_{0≤t≤1} |B*_t - B_t| = |B_1|,

P({B_t} ∈ F | 0 ≤ B_1 ≤ ε) ≤ P({B*_t} ∈ F^ε | 0 ≤ B_1 ≤ ε),

where F^ε := {ω ∈ C[0, 1]: dist(ω, F) ≤ ε} for dist(ω, A) := inf{ρ(ω, γ): γ ∈ A}, A ⊂ C[0, 1].
But, starting with finite-dimensional sets and then using the monotone class argument
(Chapter 0), one may check that the events {{B*_t} ∈ F^ε} and {0 ≤ B_1 ≤ ε} are
independent. Therefore, P({B_t} ∈ F | 0 ≤ B_1 ≤ ε) ≤ P({B*_t} ∈ F^ε) for any ε > 0. Since
F is closed, the events {{B*_t} ∈ F^ε} decrease to {{B*_t} ∈ F} as ε → 0, and tightness
follows from the continuity of the probability measure P and Alexandrov's Theorem
T.8.1(ii).
4. A check of tightness is also required for the Brownian meander and Brownian
excursion as described in Exercises 12.6 and 12.7, respectively. For this, consult
R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to
Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129. The
distribution of the extremal functionals outlined in the exercises can also be found
in R. T. Durrett and D. L. Iglehart (1977), "Functionals of Brownian Meander and
Brownian Excursion," Ann. Probab., 5, pp. 130-135; K. L. Chung (1976), "Excursions
in Brownian Motion," Ark. Mat., pp. 155-177; D. P. Kennedy (1976), "Maximum
Brownian Excursion," J. Appl. Probability, 13, pp. 371-376. Durrett, Iglehart, and Miller
(1977) also show that the * and + commute in the sense that the Brownian excursion
can be obtained either as a meander of the Brownian bridge (as done in Exercise
12.7) or as a bridge of the meander (i.e., conditioning the meander in the sense of
theoretical complement 3 above). Brownian meander and Brownian excursion have
been defined in a variety of other ways in work originating in the late 1940s with
Paul Lévy; see P. Lévy (1965), Processus Stochastiques et Mouvement Brownien,
Gauthier-Villars, Paris. The theory was extended and terminology introduced in
K. Itô and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths,
Springer-Verlag, New York. The general theory is introduced in D. Williams (1979),
Diffusions, Markov Processes, and Martingales, Vol. I, Wiley, New York. A much
fuller theory is then given in L. C. G. Rogers and D. Williams (1987), Diffusions,
Markov Processes, and Martingales, Vol. II, Wiley, New York. Approaches from the point
of view of Markov processes (see theoretical complement 11.2, Chapter V) having
nonstationary transition law are possible. Another very useful approach is from the
point of view of FCLTs for random walks conditioned on a late return to zero; see
W. D. Kaigh (1976), "An Invariance Principle for Random Walk Conditioned by a
Late Return to Zero," Ann. Probab., 4(1), pp. 115-121, and references therein. A
connection with extreme values of branching processes is described in theoretical
complement 11.2, Chapter V.

Theoretical Complements to Section I.13

1. (Construction of Wiener Measure) Let {X_t: t ≥ 0} be the process having the
finite-dimensional distributions of the standard Brownian motion starting at 0 as
constructed by the Kolmogorov Extension Theorem on the product probability space
(Ω, F, P). As discussed in theoretical complement 7.1, events pertaining to the
behavior of the process at uncountably many time points (e.g., path continuity) cannot
be represented in this framework. However, we will now see how to modify the model
in such a way as to get around this difficulty.
Let D be the set of nonnegative dyadic rational numbers and let
J_{n,k} = [k/2^n, (k+1)/2^n]. We will use the maximal inequality to check that

Σ_{n=1}^{∞} P( max_{0≤k≤n2^n} sup_{q∈J_{n,k}∩D} |X_q - X_{k/2^n}| > 1/n ) < ∞.    (T.13.1)

By the Borel-Cantelli lemma we will get from this that with probability 1, for all n
sufficiently large,

max_{0≤k≤n2^n} sup_{q∈J_{n,k}∩D} |X_q - X_{k/2^n}| ≤ 1/n.    (T.13.2)

In particular, it will follow that with probability 1, for every t > 0, q → X_q is uniformly
continuous on D ∩ [0, t]. Thus, almost all sample paths of {X_q: q ∈ D} have a unique
extension to continuous functions {B_t: t ≥ 0}. That is, letting C = {ω ∈ Ω: for each
t > 0, q → X_q(ω) is uniformly continuous on D ∩ [0, t]}, define for ω ∈ C,

B_t(ω) = X_q(ω), if t = q ∈ D,
      = lim_{q↓t} X_q(ω), if t ∉ D,    (T.13.3)

where the limit is over dyadic rationals q decreasing to t. By construction, {B_t: t ≥ 0}
has continuous paths with probability 1. Moreover, for 0 < t_1 < ⋯ < t_k, with
probability one, (B_{t_1}, ..., B_{t_k}) = lim_{n→∞} (X_{q_1^{(n)}}, ..., X_{q_k^{(n)}}) for dyadic rationals
q_1^{(n)}, ..., q_k^{(n)} decreasing to t_1, ..., t_k. Also, the random vector (X_{q_1^{(n)}}, ..., X_{q_k^{(n)}})
has the multivariate normal distribution with mean vector 0 and, in the limit,
variance-covariance matrix γ_ij = min(t_i, t_j), 1 ≤ i, j ≤ k. It follows from these
two facts that this must be the distribution of (B_{t_1}, ..., B_{t_k}). Thus, {B_t} is a standard
Brownian motion process.
To verify the condition (T.13.1) for the Borel-Cantelli lemma, just note that by
the maximal inequality (see Exercises 4.3, 13.11),

P( max_{i≤2^m} |X_{t+iδ2^{-m}} - X_t| ≥ a ) ≤ 2P(|X_{t+δ} - X_t| ≥ a)
 ≤ (2/a⁴) E(X_{t+δ} - X_t)⁴ = 6δ²/a⁴,    (T.13.4)

since the increments of {X_t} are independent and Gaussian with mean 0. Now since
the events {max_{i≤2^m} |X_{t+iδ2^{-m}} - X_t| ≥ a} increase with m, we have, letting m → ∞,

P( sup_{0≤q≤δ, q∈D} |X_{t+q} - X_t| ≥ a ) ≤ 6δ²/a⁴.    (T.13.5)

Thus,

P( max_{0≤k≤n2^n} sup_{q∈J_{n,k}∩D} |X_q - X_{k/2^n}| > 1/n ) ≤ Σ_{k=0}^{n2^n} P( sup_{q∈J_{n,k}∩D} |X_q - X_{k/2^n}| > 1/n )
 ≤ (n2^n + 1) · 6(2^{-n})² n⁴ ≤ 12 n⁵ 2^{-n},    (T.13.6)

which is summable.
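The dyadic refinement underlying this construction can be carried out on a computer. The sketch below (my own illustration, with an arbitrary refinement depth and seed) starts from B_0 = 0, B_1 ~ N(0, 1) and repeatedly fills in midpoints by Brownian-bridge interpolation — given B_s and B_t, the midpoint is normal with mean (B_s + B_t)/2 and variance (t - s)/4 — so successive refinements converge to a continuous Brownian path on the dyadic points:

import numpy as np

rng = np.random.default_rng(6)
levels = 12
B = np.array([0.0, rng.normal()])            # values at t = 0 and t = 1
for n in range(levels):
    spacing = 1.0 / 2**(n + 1)               # spacing after this refinement
    mids = 0.5 * (B[:-1] + B[1:]) + rng.normal(0, np.sqrt(spacing / 2), B.size - 1)
    out = np.empty(2 * B.size - 1)
    out[0::2], out[1::2] = B, mids           # interleave old points and midpoints
    B = out
print(B.size, "dyadic points; increment std ~", np.diff(B).std(),
      "expected", np.sqrt(1 / 2**levels))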

Theoretical Complements to Section I.14

1. The treatment given in this section follows that in R. N. Bhattacharya, V. K. Gupta,
and E. Waymire (1983), "The Hurst Effect Under Trends," J. Appl. Probability, 20,
pp. 649-662. The research on the Hurst effect is rather extensive. For the other related
results mentioned in the text consult the following references:

W. Feller (1951), "The Asymptotic Distribution of the Range of Sums of
Independent Random Variables," Ann. Math. Statist., 22, pp. 427-432.
P. A. P. Moran (1964), "On the Range of Cumulative Sums," Ann. Inst. Statist.
Math., 16, pp. 109-112.
B. B. Mandelbrot and J. W. Van Ness (1968), "Fractional Brownian Motions,
Fractional Noises and Applications," SIAM Rev., 10, pp. 422-437. Processes
of the type considered by Mandelbrot and Van Ness are briefly described in
theoretical complement 1.3 to Chapter IV of this book.
CHAPTER II

Discrete-Parameter Markov Chains

1 MARKOV DEPENDENCE

Consider a discrete-parameter stochastic process {X_n}. Think of X_0, X_1, ..., X_{n-1}
as "the past," X_n as "the present," and X_{n+1}, X_{n+2}, ... as "the future" of the
process relative to time n. The law of evolution of a stochastic process is often
thought of in terms of the conditional distribution of the future given the present
and past states of the process. In the case of a sequence of independent random
variables or of a simple random walk, for example, this conditional distribution
does not depend on the past. This important property is expressed by
Definition 1.1.

Definition 1.1. A stochastic process {X_0, X_1, ..., X_n, ...} has the Markov
property if, for each n and m, the conditional distribution of X_{n+1}, ..., X_{n+m}
given X_0, X_1, ..., X_n is the same as its conditional distribution given X_n alone.

A process having the Markov property is called a Markov process. If, in addition,
the state space of the process is countable, then a Markov process is called a
Markov chain.

In view of the next proposition, it is actually enough to take m = 1 in the above
definition.

Proposition 1.1. A stochastic process X_0, X_1, X_2, ... has the Markov property
if and only if for each n the conditional distribution of X_{n+1} given X_0, X_1, ..., X_n
is a function only of X_n.

Proof. For simplicity, take the state space S to be countable. The necessity of
the condition is obvious. For sufficiency, observe that

P(X_{n+1} = j_1, ..., X_{n+m} = j_m | X_0 = i_0, ..., X_n = i_n)
 = P(X_{n+1} = j_1 | X_0 = i_0, ..., X_n = i_n)
  × P(X_{n+2} = j_2 | X_0 = i_0, ..., X_n = i_n, X_{n+1} = j_1) ⋯
  × P(X_{n+m} = j_m | X_0 = i_0, ..., X_{n+m-1} = j_{m-1})
 = P(X_{n+1} = j_1 | X_n = i_n) P(X_{n+2} = j_2 | X_{n+1} = j_1) ⋯
  × P(X_{n+m} = j_m | X_{n+m-1} = j_{m-1}).    (1.1)

The last equality follows from the hypothesis of the proposition. Thus the
conditional distribution of the future as a function of the past and present states
i_0, i_1, ..., i_n depends only on the present state i_n. This is, therefore, the
conditional distribution given X_n = i_n (Exercise 1). ∎

A Markov chain {X_0, X_1, ...} is said to have a homogeneous or stationary
transition law if the distribution of X_{n+1}, ..., X_{n+m} given X_n = y depends on
the state at time n, namely y, but not on the time n. Otherwise, the transition
law is called nonhomogeneous. An i.i.d. sequence {X_n} and its associated random
walk possess time-homogeneous transition laws, while an independent
nonidentically distributed sequence {X_n} and its associated random walk have
nonhomogeneous transitions. Unless otherwise specified, by a Markov process
(chain) we shall mean a Markov process (chain) with a homogeneous transition
law.
The Markov property as defined above refers to a special type of statistical
dependence among families of random variables indexed by a linearly ordered
parameter set. In the case of a continuous-parameter process, we have the
following analogous definition.

Definition 1.2. A continuous-parameter stochastic process {X_t} has the
Markov property if for each s < t, the conditional distribution of X_t given
{X_u: u ≤ s} is the same as the conditional distribution of X_t given X_s. Such a
process is called a continuous-parameter Markov process. If, in addition, the
state space is countable, then the process is called a continuous-parameter
Markov chain.

2 TRANSITION PROBABILITIES AND THE PROBABILITY SPACE

An i.i.d. sequence and a random walk are merely two examples of Markov
chains. To define a general Markov chain, it is convenient to introduce a matrix
p to describe the probabilities of transition between successive states in the
evolution of the process.

Definition 2.1. A transition probability matrix or a stochastic matrix is a square
matrix p = ((p_ij)), where i and j vary over a finite or denumerable set S, satisfying

(i) p_ij ≥ 0 for all i and j,
(ii) Σ_{j∈S} p_ij = 1 for all i.

The set S is called the state space and its elements are states.

Think of a particle that moves from point to point in the state space according
to the following scheme. At time n = 0 the particle is set in motion either by
starting it at a fixed state i_0, called the initial state, or by randomly locating it
in the state space according to a probability distribution π on S, called the
initial distribution. In the former case, π is the distribution concentrated at the
state i_0, i.e., π_j = 1 if j = i_0, π_j = 0 if j ≠ i_0. In the latter case, the probability
is π_i that at time zero the particle will be found in state i, where 0 ≤ π_i ≤ 1 and
Σ_i π_i = 1. Given that the particle is in state i_0 at time n = 0, a random trial is
performed, assigning probability p_{i_0 j'} to the respective states j' ∈ S. If the
outcome of the trial is the state i_1, then the particle moves to state i_1 at time
n = 1. A second trial is performed with probabilities p_{i_1 j'} of states j' ∈ S. If the
outcome of the second trial is i_2, then the particle moves to state i_2 at time
n = 2, and so on.
A typical sample point of this experiment is a sequence of states, say
(i_0, i_1, i_2, ..., i_n, ...), representing a sample path. The set of all such sample
paths is the sample space Ω. The position X_n at time n is a random variable
whose value is given by X_n = i_n if the sample path is (i_0, i_1, ..., i_n, ...). The
precise specification of the probability P_π on Ω for the above experiment is
given by

P_π(X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = π_{i_0} p_{i_0 i_1} p_{i_1 i_2} ⋯ p_{i_{n-1} i_n}.    (2.1)

More generally, for finite-dimensional events of the form

A = {(X_0, X_1, ..., X_n) ∈ B},    (2.2)

where B is an arbitrary set of (n + 1)-tuples of elements of S, the probability
of A is specified by

P_π(A) = Σ_{(i_0, i_1, ..., i_n)∈B} π_{i_0} p_{i_0 i_1} ⋯ p_{i_{n-1} i_n}.    (2.3)

By Kolmogorov's existence theorem, P_π extends uniquely as a probability
measure on the smallest sigmafield F containing the class of all events of the
form (2.2); see Example 6.3 of Chapter I. This probability space (Ω, F, P_π)
with X_n(ω) = ω_n, ω ∈ Ω, is a canonical model for the Markov chain with transition
probabilities ((p_ij)) and initial distribution π (the Markov property is established
below at (2.7)-(2.10)). In the case of a Markov chain starting in state i, that
is, π_i = 1, we write P_i in place of P_π.

To specify various joint distributions and conditional distributions associated
with this Markov chain, it is convenient to use the notation of matrix
multiplication. By definition the (i, j) element of the matrix p² is given by

p_ij^{(2)} = Σ_{k∈S} p_ik p_kj.    (2.4)

The elements of the matrix p^n are defined recursively by p^n = p^{n-1} p, so that the
(i, j) element of p^n is given by

p_ij^{(n)} = Σ_{k∈S} p_ik^{(n-1)} p_kj = Σ_{k∈S} p_ik p_kj^{(n-1)}, n = 2, 3, ....    (2.5)

It is easily checked by induction on n that the expression for p_ij^{(n)} is given directly
in terms of the elements of p according to

p_ij^{(n)} = Σ_{i_1,...,i_{n-1}∈S} p_{i i_1} p_{i_1 i_2} ⋯ p_{i_{n-2} i_{n-1}} p_{i_{n-1} j}.    (2.6)

Now let us check the Markov property of this probability model. Using (2.1)
and summing over unrestricted coordinates, the joint distribution of
X_0, X_{n_1}, X_{n_2}, ..., X_{n_k}, with 0 = n_0 < n_1 < n_2 < ⋯ < n_k, is given by

P_π(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, ..., X_{n_k} = j_k)
 = Σ_1 Σ_2 ⋯ Σ_k π_i (p_{i i_1} p_{i_1 i_2} ⋯ p_{i_{n_1-1} j_1})(p_{j_1 i_{n_1+1}} p_{i_{n_1+1} i_{n_1+2}} ⋯ p_{i_{n_2-1} j_2}) ⋯
  × (p_{j_{k-1} i_{n_{k-1}+1}} p_{i_{n_{k-1}+1} i_{n_{k-1}+2}} ⋯ p_{i_{n_k-1} j_k}),    (2.7)

where Σ_r is the sum over the rth block of indices i_{n_{r-1}+1}, ..., i_{n_r - 1}
(r = 1, 2, ..., k). The sum Σ_k, keeping indices in all other blocks fixed, yields
the factor p_{j_{k-1} j_k}^{(n_k - n_{k-1})}, using (2.6) for the last group of terms. Next sum successively
over the (k-1)st, ..., second, and first blocks of factors to get

P_π(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, ..., X_{n_k} = j_k) = π_i p_{i j_1}^{(n_1)} p_{j_1 j_2}^{(n_2 - n_1)} ⋯ p_{j_{k-1} j_k}^{(n_k - n_{k-1})}.    (2.8)

Now sum over i ∈ S to get

P_π(X_{n_1} = j_1, X_{n_2} = j_2, ..., X_{n_k} = j_k) = ( Σ_{i∈S} π_i p_{i j_1}^{(n_1)} ) p_{j_1 j_2}^{(n_2 - n_1)} ⋯ p_{j_{k-1} j_k}^{(n_k - n_{k-1})}.    (2.9)

Using (2.8) and the elementary definition of conditional probabilities, it now
follows that the conditional distribution of X_{n+m} given X_0, X_1, ..., X_n is given by

P_π(X_{n+m} = j | X_0 = i_0, X_1 = i_1, ..., X_{n-1} = i_{n-1}, X_n = i)
 = p_ij^{(m)} = P_π(X_{n+m} = j | X_n = i) = P_i(X_m = j), m ≥ 1, j ∈ S.    (2.10)

Although by Proposition 1.1 the case m = 1 would have been sufficient to prove
the Markov property, (2.10) justifies the terminology that p^m := ((p_ij^{(m)})) is the
m-step transition probability matrix. Note that p^m is a stochastic matrix for all
m ≥ 1.
The calculation of the distribution of X_m follows from (2.10). We have

P_π(X_m = j) = Σ_i P_π(X_m = j, X_0 = i) = Σ_i P_π(X_0 = i) P_i(X_m = j)
 = Σ_i π_i p_ij^{(m)} = (π' p^m)_j,    (2.11)

where π' is the transpose of the column vector π, and (π' p^m)_j is the jth element
of the row vector π' p^m.
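As a small numerical illustration of (2.11) (not part of the text; the 3-state chain below is an arbitrary example of mine), the law of X_m is obtained by a single vector-matrix product with the mth matrix power:

import numpy as np

p = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])              # a stochastic matrix: rows sum to 1
pi = np.array([1.0, 0.0, 0.0])               # initial distribution: start in state 0

dist = pi @ np.linalg.matrix_power(p, 10)    # pi' p^10, the law of X_10
print(dist, dist.sum())                      # a probability vector (sums to 1)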

3 SOME EXAMPLES

The transition probabilities for some familiar Markov chains are given in the
examples of this section. Although they are excluded from the general
development of this chapter, examples of a non-Markov process and a Markov
process having a nonhomogeneous transition law are both supplied under
Example 8 below.

Example 1. (Completely Deterministic Motion). Let the only elements of p,
which may be either a finite or denumerable square matrix, be 0's and 1's. That
is, for each state i there exists a state h(i) such that

p_{i h(i)} = 1, p_ij = 0 for j ≠ h(i) (i ∈ S).    (3.1)

This means that if the process is now in state i it must be in state h(i) at the
next instant. In this case, if the initial state X_0 is known then one knows the
entire future. Thus, if X_0 = i, then

X_1 = h(i), X_2 = h(h(i)) := h^{(2)}(i), ..., X_n = h(h^{(n-1)}(i)) = h^{(n)}(i), ....

Hence p_ij^{(n)} = 1 if j = h^{(n)}(i) and p_ij^{(n)} = 0 if j ≠ h^{(n)}(i). Pseudorandom number
generators are of this type (see Exercise 7).

Example 2. (Completely Random Motion or Independence). Let all the rows


of p be identical, i.e., suppose p i , is the same for all i. Write for the common row,

p si =p (ieS,jeS). (3.2)
114 DISCRETE-PARAMETER MARKOV CHAINS

In this case, X0 , X 1 , X2 . . forms a sequence of independent random variables.


,.

The distribution of X0 is it while X 1 ,. . . , X, ... have a common distribution


given by the probability vector (p j )jcs . If we let it = (p; ), ES , then Xo , X 1 , .. .
form a sequence of independent and identically distributed (i.i.d.) random
variables. The coin-tossing example is of this kind. There, S = {0,1 } (or {H, T } )
and po=2'Pji
=za1
Example 3. (Unrestricted Simple Random Walk). Here, S = {0, + 1, 2, ...}
and p = ((p ;j )) is given by

Pi; =P ifj =i +1
=q ifj =i -1
=0 ifjjiI> 1. (3.3)

where 0 < p < l and q = 1 p.

Example 4. (Simple Random Walk with Two Reflecting Boundaries). Here


S = {c, c + 1, ... , d}, where c and d are integers, c < d. Let

p ;j =p ifj =i +1 andc<i<d
=q ifj =i I andc<i<d
Pc,c +i = I, Pa,e -i = 1. (3.4)

In this case, if at any point of time the particle finds itself in state c, then at
the next instant of time it moves with probability I to c + 1. Similarly, if it is
at d at any point of time it will move to d I at the next instant. Otherwise
(i.e., in the interior of [c, d]), its motion is like that of a simple random walk.

Example S. (Simple Random Walk with Two Absorbing Boundaries). Here


S = {c, c + 1, ... , d}, where c and d are integers, c < d. Let

PCC = 1, Pad = 1. (3.5)

For c < i < d, p i; is as defined by (3.3) or (3.4). In this case, once the particle
reaches c (or d) it stays there forever.

Example 6. (Unrestricted General Random Walk). Take S = {0, + 1, 2, ...}.


Let Q be an arbitrary probability distribution on S, i.e.,
(i) Q(i) > 0 for all i E S,
(ii) liS Q(i) = I.
Define the transition matrix p by
SOME EXAMPLES 115

= Q(j i), i, j E S. (3.6)

One may think of this Markov chain as a partial-sum process as follows. Let
X0 have a distribution n. Let Z 1 , Z 2 , ... be a sequence of i.i.d. random variables
with common distribution Q and independent of X o . Then,

X^=X0 +Z 1 +--+Z,,, forn>,l, (3.7)

is a Markov chain with the transition probability (3.6). Also note that Example
3 is a special case of Example 6, with Q( l) = q, Q(l) = p, and Q(i) = 0 for
i: +1.

Example 7. (BienaymeGaltonWatson Simple Branching Processes). Particles


such as neutrons or organisms such as bacteria can generate new particles or
organisms of the same type. The number of particles generated by a single
particle is a random variable with a probability function f; that is, f (j) is the
probability that a single particle generates j particles, j = 0, 1, 2, .... Suppose
that at time n = 0 there are Xo = i particles present. Let Z 1 , Z 2 , ... , Z i denote
the numbers of particles generated by the first, second, ... , ith particle,
respectively, in the initial set. Then each of Z 1 , ... , Z ; has the probability
function f and it is assumed that Z 1 , ... , Z; are independent random variables.
The size of the first generation is X 1 = Z 1 + + Z., the total number generated
by the initial set. The X, particles in the first generation will in turn generate
a total of X 2 particles comprising the second generation in the same manner; that
is, the X, particles generate new particles independently of each other and the
number generated by each has probability function f. This goes on so long as
offspring occur. Let X. denote the size of the nth generation. Then, using the
convolution notation for distributions of sums of independent random variables,

pti=P(Z1+ ... +Z.=1)_f *i (j), i> 1, j^0,


(3.8)
Poo= 1 , p oi =0 ifj00.

The last row says that "zero" is an absorbing state, i.e., if at any point of time
X = 0, then X. = 0 for all m > n, and extinction occurs.

Example 8. (Plya Urn Scheme and a Non-Markovian Example). A box


contains r red balls and b black balls. A ball is randomly selected from the box
and its color is noted. The ball selected together with c 0 balls of the same
color are then placed back into the box. This process is repeated in successive
trials numbered n = 1, 2,-. . . . Indicate the event that a red ball occurs at the
nth trial by XX = 1 and that a black ball occurs at the nth trial by X = 0. A
straightforward induction calculation gives for 0 < Yk= 1 e k < n,

116 DISCRETE-PARAMETER MARKOV CHAINS

P(X1 = E1, .Xn = En)

[r+(s-1)c][r+(s-2)c]. r[b+(T-1)c].b
(3.9)
[r+b+(n 1)c][r+b+(n-2)c]...[r+b]

where


s= Y_ s k , r =ns. (3.10)
k=1

In the cases=n(i.e.,c 1 =...=E=1),

[ r + (n 1) c] r
P(X I = 1, ... X = 1) (3.11)
[r+b+(n-1)c].[r+b]

and if s = 0 (i.e., e 1 = = E = 0) then

[b + (n 1)c].b
P(XI=0,...,Xn =0)= (3.12)
[r+b+(n 1)c][r+b]

In particular,

P(Xn = EIXt = ei,...,Xn-1 =E -1)

P (X1 = E1, .. . , X. = n)

P (X 1 = c1,. ..,Xt = En t )

_ [r + (s 1)c]..r[b + (r 1)c]b [r + b + (n 2)c]..[r + b]


[r + (s n _ 1 1)c]. r[b + (t n - 1 1)c] b [r + b + (n 1)c]. .[r + b]

[r+s_Ic]
ifE= 1
r+b+(n 1)c
(3.13)
[b+r-IC]
ife=0.
r+b+ (ii 1)c

It follows that {X} is non-Markov unless c = 0 (in which case {X} is i.i.d.).
Note, however, that {X} does have a distinctive symmetry property reflected
in (3.9). Namely, the joint distribution is a function of s = Yk =1 e k only, and
is therefore invariant under permutations of e l . i,,. Such a stochastic process
is called exchangeable (or symmetrically dependent). The Plya urn model was
originally introduced to illustrate a notion of "contagious disease or "accident
proneness' for actuarial mathematics. Although {X} is non-Markov for c ^ 0,
it is interesting to note that the partial-sum process {S}, representing the
evolution of accumulated numbers of red balls sampled, does have the Markov
property. From (3.13) one can also get that
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 117

r + CS n _ , if s = l +s_,
+(n-1)c
P(sn= sIS, =s1, ,Sn_, =s,)= r+b
b +(n S- 1 ) c
lfs=s _ i .
r + b + (n 1)c
(3.14)
Observe that the transition law (3.14) depends explicitly on the time point n.
In other words, the partial-sum process {S} is a Markov process with a
nonhomogeneous transition law. A related continuous-time version of this
Markov process, again usually called the P1ya process is described in Exercise
1 of Chapter IV, Section 4.1. An alternative model for contagion is also given
in Example 1 of Chapter IV, Section 4, and that one has a homogeneous
transition law.

4 STOPPING TIMES AND THE STRONG MARKOV PROPERTY

One of the most useful general properties of a Markov chain is that the Markov
property holds even when the "past" is given up to certain types of random
times. Indeed, we have tacitly used it in proving that the simple symmetric
random walk reaches every state infinitely often with probability 1 (see
Eq. 3.18 of Chapter 1). These special random times are called stopping times
or (less appropriately) Markov times.

Definition 4.1. Let {},,: n = 0, 1, 2, ...} be a stochastic process having a


countable state space and defined on some probability space (S2, 3, P). A
random variable r defined on this space is said to be a stopping time if
(i) It assumes only nonnegative integer values (including, possibly, +co),
and
(ii) For every nonnegative integer m the event {w: r(w) < m} is determined
by Yo , Y1 ,..., Y..

Intuitively, if r is a stopping time, then whether or not to stop by time m


can be decided by observing the stochastic process up to time m. For an example,
consider the first time T y the process { Y} reaches the state y, defined by

t(co) = inf{n > 0: Y(co) = y} . (4.1)

If co is such that Y(w) y whatever be n (i.e., if the process never reaches y),
then take r y,(w) = oo. Observe that

{w: r(w) < m} = U {co: Y(w) = y} . (4.2)


n =O
118 DISCRETE-PARAMETER MARKOV CHAINS

Hence zr Y is a stopping time. The rth return times r;' ofy are defined recursively by
)

r" (w) = inf{n


) 1: Y(w) = y},
for r = 2, 3, .... (4.3)
r(w) = inf{n > iy '(w): Y (w) = y},

Once again, the infimum over an empty set is to be taken as oo. Now whether
or not the process has reached (or hit) the state y at least r times by the time
m depends entirely on the values of Y1 , ... , Y.. Indeed, {rI m} is precisely
the event that at least r of the variables Y1 , ... , Y,n equal y. Hence ry" is a
stopping time. On the other hand, if n y denotes the last time the process reaches
the state y, then ?J is not a stopping time; for whether or not i < m cannot
in general be determined without observing the entire process {Y n }.
Let S be a countable state space and p a transition probability matrix on S,
and let P,, denote the distribution of the Markov process with transition
probability p and initial distribution n. It will be useful to identify the events
that depend on the process up to time n. For this, let S2 denote the set of all
sequences w = (i 0 , i 1 , i 2 , ...) of states, and let Y(w) be the nth coordinate of w
(if w = (i o , i,, ... , i, ...), then Yn (cw) = in ). Let .fin denote the class of all events
that depend only on Yo , YI , ... , Yn . Then the n form an increasing sequence
of sigmafields of finite-dimensional events. The Markov property says that given
the "past" Yo , Yi , ... , Y. up to time m, or given .gym , the conditional distribution
of the "after-m"stochastic process Y, = {(Ym ) } := {Ym+n . n = 0, 1, ...} is P.
In other words, if the process is re-indexed after time m with m + n being
regarded as time n, then this stochastic process is conditionally distributed as
a Markov chain having transition probability p and initial state Y..

A WORD ON NOTATION. Many of the conditional distributions do not depend


on the initial distribution. So the subscripts on P,^, Pi , etc., are suppressed as a
matter of convenience in some calculations.

Suppose now that r is the stopping time. "Given the past up to time t" means
given the values oft and Yo , Y1 , ... , YY . By the "after-r"process we now mean
the stochastic process

Yi = {Yt+n :n=0, 1,2,...},

which is well defined only on the set IT < co}.

Theorem 4.1. Every Markov chain { Yn : n = 0, 1, 2, ...} has the strong Markov
property; that is, for every stopping time i, the conditional distribution of the
after-r process Yt = { Yt+n : n = 0, 1, 2, ...}, given the past up to time i is P..
on the set {i < co}.

Proof. Choose and fix a nonnegative integer m and a positive integer k along
with k time points 0 _< m, < m 2 < < m k , and states i o , i l , ... 'im'
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 119

J1,J2, ,Jk Then,

P(Yt +m l J1 Yt +m 2 =J2, = Jk I T = m, Yo = i o , ... , Ym = Im)


... , Yt +m k

= P(Ym+mi = J1 , Ym+mz = J2, .. , Ym+mk Jk I T = m, Yo = io... , Ym = Im)


(4.4)
Now if the event IT = m} (which is determined by the values of Y,,... , Ym )
is not consistent with the event "Yo = i o , ... , Ym = im " then {T = m,
Yo = i o , , Ym = i m } = 0 and the conditioning event is impossible. In that
case the conditional probability may be defined arbitrarily (or left undefined).
However, if IT = m} is consistent with (i.e., implied by) "Yo = i o , , Ym =
then{ T= m,Yo =i o .....Ym = i m }= {Yo =i o ,,Ym =i m },and the right side
of (4.4) becomes

P(Ym+mi 11> Ym+mz =i2' .. , Ym+m k =Jk IY Q = 1 p , ... , Ym = im) (4.5)

But by the Markov property, (4.5) equals

Pi m (Ym i = J1, Ymz =J2, , Ymk = Jk)

= Py,(Ym i = 11, Ym z = J1, Ym z = J2, ... , Ym k =Jk), (4.6)

on the set {T = m}. n


We have considered just "future" events depending on only finitely many
(namely k) time points. The general case (applying to infinitely many time
points) may be obtained by passage to a limit. Note that the equality of (4.4)
and (4.5) holds (in case {r = m} {Y0 = i o , ... , Ym = i m }) for all stochastic
processes and all events whose (conditional) probabilities may be sought. The
equality between (4.5) and (4.6) is a consequence of the Markov property. Since
the latter property holds for all future events, so does the corresponding result
for stopping times T.
Events determined by the past up to time T comprise a sigmafield . , called
the pre -T sigmafleld. The strong Markov property is often expressed as: the
conditional distribution of Y7 given . is PY (on {T < co}). Note that .
is the smallest sigmafield containing all events of the form IT = m,
,Ym =im }.

Example 1. In this example let us reconsider the validity of relation (3.18) of


Chapter I in light of the strong Markov property. Let d and y be two integers.
For the simple symmetric random walk {S: n = 0, 1, 2, ...} starting at x,
is an almost surely finite stopping time, for the probability p x ,, that the random
walk ever reaches y is I (see Chapter 1, Eqs. 3.16-3.17). Denoting by E, the
expectation with respect to P, we have

120 DISCRETE-PARAMETER MARKOV CHAINS

P.(TY < oo) = P (S Y + = y for some n > 1)


X

= EX[P(Sv+ = y for some n , 1 (4 , So , ... ,SY)]


Y = y)
= E X [PY (S = y for some n >, 1)] (since S t,,,
(Strong Markov Property)
= P(S = y for some n > 1)
=Py (S, =y 1,S=y forsome n>, 1)
+PP (S 1 = y + 1, S=y for some n>, 1)
=PY (S 1 = y-1)Py (S=y forsomen> 1IS =y-1) 1

+PP (S, =y 1)Py (S=yforsomen>, 1 IS 1 =y+ 1)


=zPy (S, + ,=y for some m>,01S, = y 1)
+ZPy (S l+ ,=y for some m >,QIS 1 =y+1) (m=n-1)
=ZPy _ 1 ( S.=y for some m>,0) +ZPY+ ,( S,=y for some m,0)
(Markov property)
2P r 1,r +2Py +1, 2 +z= 1. (4.7)

Now all the steps in (4.7) remain valid if one replaces Ty l) by i;,' -1) and T;,2) by
zy'r and assumes that < oo almost surely. Hence, by induction,
P (') < oo) = 1 for all positive integers r. This is equivalent to asserting

1 = P.,(T ( ' ) < oo for all positive integers r)


= PX (S = y for infinitely many n). (4.8)

The importance of the strong Markov property will be amply demonstrated


in Sections 9-11.

5 A CLASSIFICATION OF STATES OF A MARKOV CHAIN

The unrestricted simple random walk {S} is an example in which any state
i E S can be reached from every state j in a finite number of steps with positive
probability. If p denotes its transition probability matrix, then p 2 is the transition
probability matrix of { Y} 2_ {SZ : n = 0, 1, 2, ...}. However, for the Markov
chain { Y}, transitions in a finite number of steps are possible from odd to odd
integers and from even to even, but not otherwise. For {S} one says that there
is one class of "essential states and for { Y} that there are two classes of
essential states.
A different situation occurs when the random walk has two absorbing
boundaries on S = {c, c + 1, ... , d 1, d}. The states c, d can be reached (with
positive probability) from c + 1, ... , d 1. However, c + 1, ... , d 1 cannot
A CLASSIFICATION OF STATES OF A MARKOV CHAIN 121

be reached from c or d. In this case c + 1, ... , d - I are called "inessential"


states while {c }, {d} form two classes of essential states.
The term "inessential" refers to states that will not play a role in the long-run

>,
behavior of the process. If a chain has several essential classes, the process
restricted to each class can be analyzed separately.

Definition S.I. Write i --' f and read it as either ` j is accessible from i" or "the

Y_
process can go from i to j" if p;l > 0 for some n ) 1.

Since

Pi;)= Pi (5.1)
1I.i2....,ln - 1 E.S

i - j if and only if there exists one chain (i, i,, i 2 , ... , i n _ 1 , j) such that
Pul' pi,i2 , . , p,,,_ 1 j are strictly positive.

Definition 5.2. Write i H j and read "i and j communicate" if i - j and j - i.


Say "i is essential" if i - j implies j -+ i ( i.e., if any state j is accessible from i,
then i is accessible from that state). We shall let & denote the set of all essential
states. States that are not essential are called inessential.

Proposition 5.1

(a) For every i there exists (at least one) j such that i ' f. --

(b) i -*j,j- k imply i- k.


(c) "i essential" implies i <-+ i.
(d) i essential, i --'f imply "f is essential" and i-'].
(e) On (' the relation "H" is an equivalence relation (i.e., reflexive,
symmetric, and transitive).

Y-
Proof. (a) For each i, jEs pit = 1. Hence there exists at least one j for which
p i; >0; for this] one has i-- j.
(b) i -* j, j - k means that there exist m? 1, n I such that p;f > 0, >, )

pik > 0. Hence,


)

(m+n)_ (m) (n)


Pik Pu Pik
iEs

(m) (n) (m) (n) (m) (n)


= Vif P.ik + L. Pa P)k % Pij Pik > 0. (5.2)
+96i

Hence, i - k. Note that the first equality is a consequence of the relation


p m+n = pm pn
(c) Suppose i is essential. By (a) there exists j such that p ;j > 0. Since i is
>,
essential, this implies] --' i, i.e., there exists m I such that pj7' 1 > 0. But then
122 DISCRETE-PARAMETER MARKOV CHAINS

pi
l
m+1)
= li- Pu
i
(m) = pif Pi (m) + pilp it ) > 0. (5.3)
Ics )3E.%

Hence i - i and, therefore, i 4--* i.


(d) Suppose i is essential, i -*j. Then there exist m > 1, n >, 1 such that
p, J > 0 and p > 0. Hence i j. Now suppose k is any state such that j -> k,
) )

i.e., there exists m' >, 1 such that p1') > 0. Then, by (b), i - k. Since i is essential,
one must have k -> i. Together with i --+ j this implies (again by (b)) k - j.
Thus, if any state k is accessible from j, then j is accessible from that state k,
proving that j is essential.
(e) If 60 is empty (which is possible, as for example in the case p 1 = 1,

i = 0, 1, 2, ...), then, there is nothing to prove. Suppose ' is nonempty. Then:


(i) On f the relation "+-" is reflexive by (c). (ii) If i is essential and i.-.j, then
(by (d)) j is essential and, of course, i -^ j and j H i are equivalent properties.
Thus "->" is symmetric (on ' as well as on S). (iii) If i H j and j *-+ k, then i j -

and j - k. Hence i -4 k (by (b)). Also, k - j and j -> i imply k --> i (again by
(b)). Hence i H k. This shows that "-+" is transitive (on 9 as well as on S).

From the proof of (e) the relation "^-" is seen to be symmetric and transitive
on all of S (and not merely 9). However, it is not generally true that i i (or,
i - i) for all i e S. In other words, reflexivity may break down on S.

Example 1. (Simple (Unrestricted) Random Walk). S = {0, 1, 2, ...}.


Assume, as usual, 0 <P < 1. Then i -+ j for all states i e S, j e S. Hence ' = S.

Example 2. (Simple Random Walk with Two Absorbing Boundaries). Here


S = {c, c + 1, ... , d}, 9 = {c, d}. Note that c is not accessible from d, nor is d
accessible from c.

Example 3. (Simple Random Walk with Two Reflecting Boundaries). Here


S={c,c+ 1,....,d}, and i--*j for all i eSandjeS. Hence f'=S.

Example 4. Let S = {1, 2, 3, 4, 5} and let


1 1 1
5 5 5 5
0 3 0 3 0
p= O 0 4 0 (5.4)
0 3 0 3 0
1 0 1 0 1
[ ii
3 3 3

Note that 9 = {2, 4}.


In Examples I and 3 above, there is one essential class and there are no
inessential states.
A CLASSIFICATION OF STATES OF A MARKOV CHAIN 123

Definition 5.3. A transition probability matrix p having one essential class and
no inessential states is called irreducible.
Now fix attention on S. Distinct subsets of essential states can be identified
according to the following considerations. Let i e.1. Consider the set
6'(i) = { j e 6": i --> j}. Then, by (d), i +-] for all j E 6(i). Indeed, if], k e 6"(i), then
j H k (for j --> i, i - k imply j -+ k; similarly, k - j). Thus, all members of 6'(i)
communicate with each other. Let r e 6, r 6"(i). Then r is not accessible from
a state in 6'(i) (for, if j e e'(i) and j -+ r, then i --> j, j -a r will imply i -> r so
that r e 6"(i), a contradiction). Define S(r) = { j E 6',r -+ j}. Then, as before, all
states in c0(r) communicate with each other. Also, no state in 6"(r) is accessible
from any state in 6'(i) (for if ! e "(r), and j e 6'(i) and j -+ 1, then i - 1; but r F + 1,
so that i - 1, 1 --> r implying i - r, a contradiction). In this manner, one
decomposes 6" into a number of disjoint classes, each class being a maximal set
of communicating states. No member of one class is accessible from any member
of a different class. Also note that if k e 6"(i), then 6"(i) = 6'(k). For if j e 6'(i),
then j , i, i -* k imply j -- k; and since j is essential one has k --> j. Hence
j e 6"(k). The classes into which of decomposes are called equivalence classes.
In the case of the unrestricted simple random walk {S}, we have
6" S = {0, + 1, 2,. . .}' and all states in 6" communicate with each other;
only one equivalence class. While for {X} = {S 2n }, = S consists of two disjoint
equivalence classes, the odd integers and the even integers.
Our last item of bookkeeping concerns the role of possible cyclic motions
within an essential class. In the unrestricted simple random walk example, note
that p i ,=0 for all i=0,1,2,...,butp;2 ) =2pq>0. In fact p;7 1 =0for
all odd n, and p;" > 0 for all even n. In this case, we say that the period of i
)

is 2. More generally, if i -- i, then the period of i is the greatest common divisor


of the integers in the set A = In >, 1: p}. If d = d, is the period of i, then
p;" ) = 0 whenever n is not a multiple of d and d is the largest integer with this
property.

Proposition 5.2
(a) If i H j then i and j possess the same period. In particular "period" is
constant on each equivalence class.
(b) Let i e 9' have a period d = d ; . For each j e 6'(i) there exists a unique
integer r 1 , 0 < rj d - 1, such that p;j ) > 0 implies n = rj (mod d) (i.e.,
either n = rj or n = sd + rj for some integer s >, I).

Proof. (a) Clearly,


(a+m+b) (a) (m) (b)
P >P);Pi; P;, (5.5)

for all positive integers a, m, b. Choose a and b such that p;, > 0 and pj(b > 0.
) )

If pj7 1 > 0, then ps_m ) - pj^" ) p^^ ) > 0, and


(a+2m+b) > (a ) ^^m) ^b) > 0.
P^l P ( ) P (b.. p..
(a+m+b) > ) > 0
Pu (
) P ( p. p. ( 5.6 )
124 DISCRETE-PARAMETER MARKOV CHAINS

Therefore, d (the period of i) divides a + m + b and a + 2m + b, so that it


divides the difference m = (a + 2m + b) (a + m + b). Hence, the period of i
does not exceed the period of j. By the same argument (since i 4J is the same
as j -+ i), the period of j does not exceed the period of i. Hence the period of i
equals the period of j.
(b) Choose a such that !> > 0. If ;'") > 0, ;") > 0, then "' 3p 1) p)> 0
and p;' ? pp;) > 0. Hence d, the period of i, divides m + a, n + a and,
therefore, m n = m + a (n + a). Since this is true for all m, n such that
p 1) > 0, p;j> > 0, it means that the difference between any two integers in the
set A = {n: p;jn> > 0} is divisible by d. This implies that there exists a unique
integer r,, 0 < rj < d 1, such that n = rj (mod d) for all n e A (i.e., n = sd + r,
for some integer s >, 0 where s depends on n). n

It is generally not true that the period of an essential state i is


min{n >, 1: p;; > 0}. To see this consider the chain with state space { 1, 2, 3, 4}
)

and transition matrix

0 1 0 0
0 0 1 0
0 2 1 0 '2
1 0 0 0

Schematically, only the following one-step transitions are possible.

4.1
T
I-->2^3
2

Thus p;i' = 0, pi 1 > 0 , P11 > 0, etc., and pi"i = 0 for all odd n. The states
communicate with each other and their common period is 2, although
min{n: p;"1 > 0} = 4. Note that min{n > 1: p; ) > 0} is a multiple of d, since d,
divides all n for which p;" ) > 0. Thus, d. <, min{n >, 1: p> 0}.

Proposition 5.3. Let i E e have period d> 1. Let Cr be the set of j e .9(i) such
that rj = r, where rr is the remainder term as defined in Proposition 5.2(b). Then
(a) Co , C,, ... , Cd _, are disjoint, U r = C, =
(b) If je C then pik >0 implies k e C, + ,, where we take r + 1 = 0 if
r = d 1.

Proof. (a) Follows from Proposition 5.2(b).


(b) Suppose j e C, and p;j > 0. Then n = sd + r for some s >, 0. Hence, if
)

p;k > 0 then


A CLASSIFICATION OF STATES OF A MARKOV CHAIN 125

Pik +>> Pi; )Pjk >0, (5.7)

which implies k e Cr+l (since n + I = sd + r + I = r + 1 (mod d)), by


Proposition 5.2(b). n

Here is what Proposition 5.3 means. Suppose i is an essential state and has
a period d > 1. In one step (i.e., one time unit) the process can go from i E Ca
only to some state in C, (i.e., p 1 > 0 only if j e C 1 ). From states in C,, in one
step the process can go only to states in C 2 . This means that in two steps the
process can go from i only to states in C 2 (i.e., p; > 0 only if je C 2 ), and so
on. In d steps the process can go from i only to states in Cd + , = CO3 completing
one cycle (of d steps). Again in d + 1 steps the process can go from i only to
states in C 1 , and so on. In general, in sd + r steps the process can go from i
only to states in Cr. Schematically, one has the picture in Figure 5.1 for the
case d = 4 and a fixed state i e Co of period 4.

Example S.S. In the case of the unrestricted simple random walk, the period
is 2 and all states are essential and communicate with each other. Fix i = 0.
Then C o = {0, 2, 4, ...}, C l = (1, 3, 5, ...}. If we take i to be any
even integer, then C O3 C l are as above. If, however, we start with i odd, then
C o = { 1, 3, 5, ...}, C, = {0, 2, 4, ...}.

C,

iEG,--.jEC j ^kEC,./EC,^mEC^

Figure 5.1
126 DISCRETE-PARAMETER MARKOV CHAINS

6 CONVERGENCE TO STEADY STATE FOR IRREDUCIBLE AND


APERIODIC MARKOV PROCESSES ON FINITE SPACES

As will be demonstrated in this section, if the state space is finite, a complete


analysis of the limiting behavior of p", as n oo, may be carried out by
elementary methods that also provide sharp rates of convergence to the so-called
steady-state or invariant distributions. Although later, in Section 9, the
asymptotic behavior of general Markov chains is analyzed in detail, including
the law of large numbers and the central limit theorem for Markov chains that
admit unique steady-state distributions, the methods of the present section are
also suited for applications to certain more general (nonfinite) state spaces (e.g.,
closed and bounded intervals). These latter extensions are outlined in the
exercises.
First we consider what happens to the n-step transition law if all states are
accessible from each other in one time step.

Proposition 6.1. Suppose S is finite and p,J > 0 for all i, j. Then there exists a
unique probability distribution it = {m 1 : j e S} such that

yrc ; p ;J =n J for alljeS (6.1)


i

and

^p;j^itJ1 (1Nb)" for all i,jeS, n>,1, (6.2)

where 6 = min{p ;J : i, je S} and N is the number of elements in S. Also, it J >, 6


for alljeS and 6<1/N.

Proof. Let M;" ) , m;" ) denote the maximum and the minimum, respectively, of
the elements {p: i e S} of the jth column of p'. Since p ;J >, 6 and
p ;J = 1 Y- k0 J P ik <- 1 (N 1)b for all i, one has

M}' ) mj 1) <l(N-1)S-6=IN5. (6.3)

Fix two states i, i' arbitrarily. Let J = { j e S: p ;J > p ; .J }, J' _ { j e S: p ;J ^ p ; J }.


Then,

0 = 1 1 = E (p Pi J) = Y_ (P Pij) + Y_ (Pij PrJ),


,

J JE)' JE)

so that

I (Pij PrJ) = Y_ (p Pi,j), (6.4)


je)' JE

CONVERGENCE TO STEADS` STATE 127

and

I (Pij P) _ y- Pij
jE jEJ jEJ

= 1 y Pij y pi'j 1 (#J')6 (#J)6 = I N. (6.5)


jE ' jE

Therefore,
(n + 1)(n + 1) (n) (n) (n)
nj Pi'j = Pik Pkj Pi'k Pj
k = (Pik Pi'k)Pj
k
k k k

/ (n) (n)
\Pik Pi'k)Pkj + (Pik Pi'k)Pkj
kcJ kEJ'

(Pik Pi'k)Mj ") + Y- (Pik Pi'k)mj ")

kJ kJ'

min)). (6.6)
_ (Mj mj )` ") ") y (Pik Pik)) (1 NS)(M
`keJ

Letting i, i' be such that p (n+l) _ Mj(n+l) p1j = m(n+l) one gets from (6.6),
Min+l) m 5v+l) < (1 N6)(Mj(" min)). ) (6.7)

Iteration now yields, using (6.3) as well as (6.7),

M ) m (1 N(5)" for n >, 1. (6.8)

Now
M in+ 1) = max (n+ 1) = max I )) < max (Y- p ik M(n)1 = Mcn) ,
P \ P'k
Pkj j J
i i k i k

m
r 1) =min p^^ +i) = min Y_ P ik Pkj) min )
(^ Pikm;" ^ = m;" ) ,
i i \ k J i k

i.e., Mj(" is nonincreasing and m


) nondecreasing in n. Since M;" , mj" are ) )

bounded above by 1, (6.7) now implies that both sequences have the same limit,
say n j . Also, 6<mj(' <m<nj <Mj for all n, so that n j >6 for all jand
) ()

\p i1)_ 7El\ M(n) m^n)


m ,")M(n)

which, together with (6.8), implies the desired inequality (6.2).


Finally, taking limits on both sides of the identity

Pij +l) _ > P ik Pkj )


(6.9)
k
128 DISCRETE-PARAMETER MARKOV CHAINS

one gets it j = E k Trk Pkj, proving (6.1). Since E j p;^ = 1, taking limits, as n -- co,
)

it follows that Y j nj = 1. To prove uniqueness of the probability distribution


it satisfying (6.1), let it = f nj : j e S} be a probability distribution satisfying
ic'p = it'. Then by iteration it follows that

frj = n;P;j = (it'p)j = (n'pp)j = (n'p 2 )j = ... _ (n'p")j = Y n;P. (6.10)

Taking limits as n * oo, one gets nj = Y ; n ; ij = n j . Thus, n = n. n

For an irreducible aperiodic Markov chain on a finite state space S there is


a positive integer v such that

6':=minp;j ) >0. (6.11)


^.j

Applying Proposition 6.1 with p replaced by p" one gets a probability


it = {rc j : j e S} such that

max nj l < (1 Na')", n = 1, 2 .... (6.12)


;.j

Now use the relations

IP(jv+r") 1Cjl = - v)
Pik ) (Pkj x j ) ) 1< L, Pik )(1 N(5 ' ) " = (1 Nb ' ) ,
" in i 1,
k k
(6.13)
to obtain

p8j) it! < (1 N8')[ "iv) n = 1, 2, ... , (6.14)

where [x] is the integer part of x. From here one obtains the following corollary
to Proposition 6.1.

Corollary 6.2. Let p be an irreducible and aperiodic transition law on a state


space S having N states. Then there is a unique probability distribution it on
S such that

Y_n;p;j =nj foralljeS. (6.15)


;

Also,

Ip!) ijl < (1 N6')r"J" ] for all i, je S, (6.16)

for some 6' > 0 and some positive integer v.



CONVERGENCE TO STEADY STATE 129

The property (6.15) is important enough to justify a definition.

Definition 6.1. A probability measure it satisfying (6.15) is said to be an


invariant or steady-state distribution for p.

Suppose that it is an invariant distribution for p and let {X n } be the Markov


chain with transition law p starting with initial distribution n. It follows from
the Markov property, using (2.9), that

Pi
Pn(Xm = lo, Xm+1 = i l , ... , Xm+n = in) _ (TC ' P m )i0Pt0t Pi 1 i2
1 .. .
r 11.

_ lt lo PioiI Pi,(z. . . D)n Iin

= P (X0 = i0, X1 = i1,


P . Xn = in),
(6.17)
for any given positive integer n and arbitrary states i 0 , i 1 , ... , i n e S. In other
words, the distribution of the process is invariant under time translation; i.e.,
{Xn } is a stationary Markov process according to the following definition.

Definition 6.2. A stochastic process { Yn } is called a stationary stochastic


process if for all n, m

P(YO = i0, Y1 = i1, ... , Y. = in) = P(Ym = lo, Y1 +m = i1 , . Yn+m = in).


(6.18)
Proposition 6.1 or its corollary 6.2 establish the existence and uniqueness
of an invariant initial distribution that makes the process stationary. Moreover,
the asymptotic convergence rate (relaxation time) may also be expressed in the
form

IPi;) ir;I < ce ' - (6.19)

for some c, A > 0.


Suppose that {Xn } is a stochastic process with state space J = [c, d] and
having the Markov property. Suppose also that the conditional distribution of
Xn+ 1 given X. = x has a density p(x, y) that is jointly continuous in (x, y) and
does not depend on n >, 1. Given X0 = x 0 , the (conditional on X 0 ) joint density
of X 1 , ... , Xn is given by

J(Xi.....X..IXo=xo)(xl, ... , xn) = P(x0, X1)P(x1, X2) ...


P(Xn 1, xn). (6.20)

If X0 has a density u(x), then the joint density of X 0 , . .. , Xn is

Jxo,....X,)(xo, .. . , xn) = ( x0)P(xo, x1) ...


P(xn-1, xn). (6.21)
130 DISCRETE-PARAMETER MARKOV CHAINS

Now let

6 = min p(x, y). (6.22)


x.y c [c,dj

In a manner analogous to the proof of Proposition 6.1, one can also obtain
the following result (Exercise 9).

Proposition 6.3. If 6 = minx, Y E[C,dlp(x, y) > 0 then there is a continuous


probability density function n(x) such that

f d n(x)p(x, y) dx = n(y) for all y e (c, d) (6.23)

and

p( )(x, y) n(y)I '< [1 6(d c)]' '0 for all x, ye (c, d) (6.24)

where

0 = max { p(x, y) p(z, y)} . (6.25)


x,y, z e [c,dl

Here p "(x, y) is the n -step transition probability density function of X" given
[

Xo = x.

Markov processes on general state spaces are discussed further in theoretical


complements to this chapter.
Observe that if A = (( a s)) is an N x N matrix with strictly positive entries,
then one may readily deduce from Proposition 6.1 that if furthermore
yN , a, = 1. for each i, then the spectral radius of A, i.e., the magnitude of the
largest eigenvalue of A, must be 1. Moreover, 2 = I must be a simple eigenvalue
(i.e., multiplicity 1) of A. To see this let z be a (left) eigenvector of A
corresponding to A = 1. Then for t sufficiently large, z + to is also a positive
eigenvector (and normalizable), where it is the invariant distribution (normalized
positive eigenvector for A = 1) of A. Thus, uniqueness makes z a scalar multiple
of n. The following theorem provides an extension of these results to the case
of arbitrary positive matrices A = ((a u )), i.e., a 1 > 0 for 1 < i, j < N. Use is not
made of this result until Sections 11 and 12, so that it may be skipped on first
reading. At this stage it may be regarded as an application of probability to
analysis.

Theorem 6.4. [PerronFrobenius]. Let A = ((a ij )) be a positive N x N matrix.


(a) There is a unique eigenvalue A o of A that has largest magnitude.
Moreover, A is positive and has a corresponding positive eigenvector.

CONVERGENCE TO STEADY STATE 131

(b) Let x be any nonnegative nonzero vector. Then

v = tim A nAnx (6.26)


new

exists and is an eigenvector of A corresponding to A,, unique up to a


scalar multiple determined by x, but otherwise independent of x.

Proof. Define A + = {A > 0: Ax >, tix for some nonnegative nonzero vector x};
here inequalities are to be interpreted componentwise. Observe that the set A +
is nonempty and bounded above by IIAII :_ JN , J" , a ij . Let A o be the least
upper bound of A. There is a sequence {2 n : n >, l} in A + with limit A o as
n -+ oc. Let {x n : n >, l} be corresponding nonnegative vectors, normalized so
that hIxIl xi = 1, n >, 1, for which Ax n >, .? n x n . Then, since Ilxnll = 1 ,
n = 1, 2, ... , {x n } must have a convergent subsequence, with limit denoted x 0 ,
say. Therefore Ax 0 >, 2 0 x 0 and hence A o e A + . In fact, it follows from the least

1 _j a x
upper bound property of 2 c that Ax o = 2 0 x 0 . For otherwise there must be a
component with strict inequality, say l j j 2 0 x 1 = > 0, where
x 0 = (x 1 ..... x N )', and Ij , a kj x j 2 o x k >, 0, k = 2, ... , N. But then taking
y = (x 1 + ((5/2), x 2 , ... , x N )' we get Ay > pl o y with strict inequality in each
component. This contradicts the maximality of 2. To prove that if A is any
other eigenvalue then Al J <, A., let z be an eigenvector corresponding to A and
define Izl = (Iz1l, , Iz n l). Then Az = Az implies AIzl %JAIIzl. Therefore, by
definition of 2, we have 1 2 1 <, 20. To prove part (ii) of the Theorem we can
apply Proposition 6.1 to the transition probability matrix

C ^ P`, 10xi

where x o = (x l , ... , x r,) is a positive eigenvector corresponding to A c,. In


particular, noting that

(2
)X j
(2) N
Puj k1 Pik Pkj
N a ik x k a kj xj a
k = 1 20x1 AOxk ^O X
Xi

and inductively
^n> ai] xj
Pij ^f n >
A xi

the result follows. n


Corollary 6.5. Let A = ((a ij )) be an N x N matrix with positive entries and
let B be the (N 1) x (N 1) matrix obtained from A by striking out an ith
row and jth column. Then the spectral radius of B is strictly smaller than that
of A.
132 DISCRETE-PARAMETER MARKOV CHAINS

Proof. Let Z oo be the largest positive eigenvalue of B and without loss of


generality take B = ((a 1 : i, j = 1, ... , N 1)). Since, for some positive x,
N

Y
j=1
a,j x j = 2 o x 1 , i = 1, .. , N, x i > 0,

we have
N-1
(Bx)i = a,jxj = 2oxi a,NXN < 2OX;, 1= 1,. ..,N 1.
j=1

Thus, by the property (6.26) applied to Z oo we must have .loo <2 g . n

Corollary 6.6. Let A = ((a ;j )) be a matrix of strictly positive elements and let
2 0 be the positive eigenvalue of maximum magnitude (spectral radius). Then
A 0 is a simple eigenvalue.

Proof. Consider the characteristic polynomial p(A) = det(A Al). Differen-


tiation with respect to A gives p'(2) = yk=, det(A k ) I), where A k is the
matrix obtained from A by striking out the kth row and column. By Corollary
6.5 we have det(A k 1 o I) 0, since each A k has smaller spectral radius than
). o . Moreover, since each polynomial det(A k AI) has the same leading term,
they all have the same sign at A = A. Thus p'(2 o ) 0 0. n

Another formula for the spectral radius is given in Section 14.

7 STEADY-STATE DISTRIBUTIONS FOR GENERAL


FINITE-STATE MARKOV PROCESSES

In general, the transition law p may admit several essential classes, periodicities,
or inessential states. In this section, we consider the asymptotics of p" for such
cases.
First suppose that S is finite and is, under p, a single class of periodic essential
states of period d> 1. Then the matrix p , regarded as a (one-step) transition
probability matrix of the process viewed every d time steps, admits d
(equivalence) classes C o , C l , ... , C_ 1 , each of which is aperiodic. Applying
(6.16) to Cr and p d (instead of S and p) one gets, writing N, for the number of
elements in Cr ,


^pi; ^ ic j 1 '< (1 Nr S r
)In/vr for i, j a Cr, (7.1)

where n j = lim n . p;^l, v, is the smallest positive integer such that p!yd > > 0
STEADY-STATE DISTRIBUTIONS 133

for all i, j e C and b r = min{ pij, d ) : i, j e C,}. Let 6 = min{b,: r = 0, 1, ... , d l}.
Then one has, writing L = min{ N,: 0 < r < d 1 }, v = max{v,: 0 < r < d 1 },

piid)j1 <(1 Lc5)I'I for i, je C, (r=0,1,. ..,d 1). (7.2)


)

If i E C, and j c C S with s = r + m (mod d), then one gets, using the facts that
pkjd) = 0 if k is not in Cs , and that Y-kEC. pik ) = 1 ,

d+m) 71j )I
7I j l = I I Pik)(Pkjd) (I L5)'" for i e Cr, E Cs .
^ Pij j
keC s
(7.3)
Of course,
d+m') = 0 if i e C j E Cs , and m' s r (mod d). (7.4)

Note that {nj : j e Cr } is the unique invariant initial distribution on C r for the
restriction of p d to C,.
Now, in view of (7.4),

nd nI
= pit'd+m)

if je C j E Cs , and m = s r (mod d). (7.5)
t=1 no

If r = s then m = 0 and the index of the second sum in (7.5) ranges from 1 to
n. By (7.3) one then has
I nd
lim - Y pii = 7r J.
) (i, j ES). (7.6)
nac n t=1

This implies, on multiplying (7.6) by 1/d,

n
n
lim - P _
) (i, j e S). (7.7)
nm nt=1 d

Now observe that it = {n j /d: j e S} is the unique invariant initial distribution


of the periodic chain defined by p since

k 1 n

keS d
>i
, lim
kes n-oo
-
n [=I
P ik ) Pkj

n
7C
= lim - Y Pi;+ I) = (j e S). (7.8)
n-xnI d

Moreover, if it = {i c j : j e S} is another invariant distribution, then


-
134 DISCRETE-PARAMETER MARKOV CHAINS

Frj = nkPkj it] (t = 1, 2, ...), (7.9)


kES keS

so that, on averaging over t = 1, ... , n 1, n,

na
fj = 1 Y_ Y ^k Pkj =
=1 kES keS
nk ^ 1
n=1
Pk(
i
(7.10)

The right side converges to Y- k n k (rc j /d) = n j /d, as n * oc by (7.7). Hence


n = /d.
;

If the finite state space S consists of b equivalence classes of essential states


(b > 1), say 61,, &'z , ... .9b , then the results of the preceding paragraphs may
be applied to each , for the restriction of p to ef, separately. Let it denote
the unique invariant initial distribution for the restriction of p to f;. Write n ` ( )

for the probability distribution on S that assigns probability 71j(` to states j e 9i )

and zero probabilities to states not in 4 Then it is easy to check that ^6 = I a ; t" )

is an invariant initial distribution for p, for every b-tuple of nonnegative numbers


... , a b satisfying = 1. The probabilistic interpretation of this is
as follows. Suppose by a random mechanism the equivalence class .6 ; is chosen
with probability a i (1 < i < b). If tf, is the class chosen in this manner, then
start the process with initial distribution n ' on S. The resulting Markov chain
( )

{X} is a stationary process.


It remains to consider the case in which S has some inessential states. In the
next section it will be shown that if S is finite, then, no matter how the Markov
chain starts, after a finite (possibly random) time the process will move only
among essential states, so that its asymptotic behavior depends entirely on the
restriction of p to the class of essential states. The main results of this section
may be summarized as follows.

Theorem 7.1. Suppose S is finite.


(i) If p is a transition probability matrix such that there is only one essential
class ' of (communicating) states and these states are aperiodic, then
there exists a unique invariant distribution it such that y j , S n ; = 1 and
(6.16) holds for all i, j e if.
(ii) If p is such that there is only one essential class of states o' and it is
periodic with period d, then again there exists a unique invariant
distribution it such that ;E , n j = 1 and (7.3), (7.4), (7.7) hold for i, j e
and i; =
(iii) If there are b equivalence classes of essential states, then (i), (ii), as the
case may be, apply to each class separately, and every convex
combination of the invariant distributions of these classes (regarded as
state spaces in their own right) is an invariant distribution for p.
MARKOV CHAINS: TRANSIENCE AND RECURRENCE PROPERTIES 135

8 MARKOV CHAINS: TRANSIENCE AND RECURRENCE


PROPERTIES

Let {X} be a Markov chain with countable state space S and transition
probability law p = ((p ij )). As in the case of random walks, the frequency of
returns to a state is an important feature of the evolution of the process.

Definition 8.1. A state j is said to be recurrent if

PJ (X =j i.o.) = 1, (8.1)

and transient if

PJ(X =j i.o.) = 0. (8.2)

Introduce the successive return times to the state j as

t^ =0,
} '}=min{n>0:X=j},
(8.3)
r=min{n> T:X=j} (r= 1,2,...),

with the convention that Tj(' is infinite if there is no n >


} for which X =j.
Write

pj1=Pj(X=i for some n? 1)=Pj (r;<oo). (8.4)

Using the strong Markov property (Theorem 4.1) we get

Pj(T! ') < 00) = Pj (ii' - ' < oo and Xt,,- + = i for some n i 1)
}

= Ej( 1 {ty- < } PXt , r - u (X = i for some n > 1))


= E j (1,- < R,(X = i for some n? 1))
= Ej( 1 {t^.- <x)Pij) = Pj(T,'-n < 00 )P. (8.5)

Therefore, by iteration,

Pj(T ! ' ) < cc) = Pj(i{ l } < cc)P' - ' = Pji P - ` (r = 2, 3, ...). (8.6)

In particular, with i =j,


)
Pj(T(' <cc)=Pjj (r=1,2,3,...). (8.7)

Now

Pj (X =j for infinitely many n) = Pj (rj(' < cc for all r) }


136 DISCRETE-PARAMETER MARKOV CHAINS

= ^Pii=1
1 i
= Jim P i (T;r ^ < oo) f (8.8)
r .. 0 if pii < 1.

Further, write N(j) for the number of visits to the state j by the Markov chain
{X}, and denote its expected value by

G(i,j) = E.N(j). (8.9)

Now by (8.6) and summation by parts (Exercise 1), if i j then

cc
E1N(j) _ Pj(N(j) > r) _ Pr+t < )
op) = Pji (8.10)
r=0 r=0 r=0

so that, if i 0 j,

0 if i +* j, i.e., if p ii = 0,
G(i, j) = Pii/(l - p) if i --, j and p 1, (8.11)
00 if i -+ f and pii = 1.

This calculation provides two useful characterizations of recurrence; one is in


terms of the long-run expected number of returns and the other in terms of the
probability of eventual return.

Proposition 8.1
(a) Every state is either recurrent or transient. A state j is recurrent iff p ii = 1
iff G(j, j) = oc, and transient if pii < 1 if G(j, j) < oo.
(b) If j is recurrent, j --* i, then i is recurrent, and pi; = p ii = 1. Thus,
recurrence (or transience) is a class property. In particular, if all states
communicate with each other, then either they are all recurrent, or they
are all transient.
(c) Let j be recurrent, and S(j) _ {i e S: j -- i) be the class of states which
communicate with j. Let n be a probability distribution on S(j). Then

P,r (X visits every state in S(j) i.o.) = 1. (8.12)

Proof. Part (a) follows from (8.8), (8.11). For part (b), suppose j is recurrent
and j - i (i j). Let A r denote the event that the Markov chain visits i between
the rth and (r + 1)st visits to state). Then under Pi , A, (r >, 0) are independent
events and have the same probability 6, say. Now 0 > 0. For if 0 = 0, then
PJ (X = i for some n >, 1) = Pj ((J r>0 A,) = 0, contradicting j - i. It now
follows from the second half of the Borel-Cantelli Lemma (Chapter 0, Lemma
6.1) that PJ (A. i.o.) = 1. This implies G(j, i) = oo. Interchanging i and j in (8.11)
one then obtains p,, = 1. Hence i is recurrent. Also, pi; >_ Pj (A r i.o.) = 1. By
the same argument, p ii = 1.

MARKOV CHAINS: TRANSIENCE AND RECURRENCE PROPERTIES 137

To prove part (c) use part (b) to get

P* (X visits i i.o.) = 1] n k P,k (X visits i i.o.) = I n,^ = 1. (8.13)


k c S(j) k E S(j)

Hence

Pn( n {XX visits i i.o.}) = 1. (8.14)


iES(j)

n
Note that G(i, i) = 1/(1 p a ), i.e., replace p, by 1 in (8.10).
For the simple random walk with p> Z, one has (see (3.9), (3.10) of
Chapter I),

j
)'-

(1 2p) for i > j,


G(i, j) _ (qp (8.5)
1/(1-2p) for i<j.

Proposition (8.1) shows that the difference between recurrence and transience
is quite dramatic. If) is recurrent, then P P (N(j) = cc) = 1. If] is transient, not
only is it true that P3 (N(j) < oo) = 1, but also E1 (N(j)) < oo. Note also that
every inessential state is transient (Exercise 7).

Example 1. (Random Rounding). A real number x >, 0 is truncated to its


(greatest) integer part by the function [x] := max{n E 7L: n < x}. On the other
hand, [x + 0.5] is the value of x rounded-off to the nearest whole number.
While both operations are quite common in digital filters, electrical engineers
have also found it useful to use random switchings between rounding and
truncation in the hardware design of certain recursive digital multipliers. In
random rounding, one applies the function [x + u], where u is randomly selected
from [0, 1), to digitize. The objective in such designs is to remove spurious fixed
points and limit cycles (theoretical complement 8.1).
For the underlying mathematical ideas, first consider the simple one-
dimensional digital recursion (with deterministic round-off), x + i = [ax + 0.5],
n = 0, 1, 2, ... , where dal < 1 is a fixed parameter. Note that x o = 0 is always
a fixed point; however, there can be others in 71 (depending on a). For example,
if a = Z, then x o = 1 is another. Let U 1 , U 2 ,... be an i.i.d. sequence of random
variables uniformly distributed on [0, 1) and consider X, = [aX + U, + ,],
n >, 0 on the state space 7L. Again X0 = 0 is a fixed point (absorbing state).
However, for this case, as will now be shown, X. --> 0 as n oo with probability
1. To see this, first note that 0 is an essential state and is accessible from any
other state x. That 0 is accessible in the case 0 < a < I goes as follows. For
x > 0, [ax + u] _ [ax] < ax <x if u E [0, 1 (ax [ax] )). If x < 0, ax 0 7L,
^[ax + u]I = [ax] + II < alx) if ue(1 + ([ax] ax), 1). In either case, these
138 DISCRETE-PARAMETER MARKOV CHAINS

events have positive probability. If ax c 11, then I[ax + u]I = laxl. Since
0 is absorbing, all states x ; 0 must, therefore, be inessential, and thus transient.
To obtain convergence, simply note that the state space of the process started
at x is finite since, with probability 1, the process remains in the interval
[ Ixl, Ixl]. The finiteness is critical for this argument, as one can readily see
by considering for contrast the case of the simple asymmetric (p > 2) random
walk on the nonnegative integers with absorbing boundary at 0.
Consider now the k-dimensional problem X n+ 1 = [AX n + U n+ 1 ], n = 0, 1, 2,
, where A is a k x k real matrix and {U n } is an i.i.d. sequence of random
vectors uniformly distributed over the k-dimensional cube, [0, 1) k and [ ] is
defined componentwise, i.e., [x] = ([x 1 ], ... , [x k ]), x e 11 k It is convenient to
use the norm II II defined by Ilx lie := max{Ix,l, , x,j}. The ball
B,(0):= {x: Ilxllo < r} (of radius r centered at 0) for this norm is the square of
side length 2r centered at 0. Assume II Ax II o < II x II o, for all x 0 (i.e.,
IIAII := sup, , 0 I1AxiI 0 /Ilx11 0 < 1) as our stability condition. Once again we wish
to show that X n 0 as n > oo with probability 1. As in the one-dimensional
case, 0 is an (absorbing) essential state that is accessible from every other state
x because there is a subset N(x) of [0, 1) k having positive volume such that for
each u c N(x), one has II[Ax + u]110 < IIAxllo < Ilxll o . So each state x # 0 is
inessential and thus transient. The result follows since the process starting at
x e 71k does not escape B 11x1 ^ 0 (0), since II[Ax + u]110 < I1Axllo + I < 11x1, + 1
and, since II[Ax + u]Il o and Ilx110 are both integers, (I[Ax + u](lo < 11x10
Linear models X n+ , = AX n + E n+ ,, with {E n } a sequence of i.i.d. random
vectors, are systematically analyzed in Section 13.

9 THE LAW OF LARGE NUMBERS AND INVARIANT


DISTRIBUTIONS FOR MARKOV CHAINS

The existence of an invariant distribution is intimately connected with the


limiting frequency of returns to recurrent states. For a process in steady state
one expects the equilibrium probability of a state j to coincide with the fraction
of time spent on the average by the process in state j. To this effect a major
goal of this section is to obtain the invariant distribution as a consequence of
a (strong) law of large numbers.
Assume from now on, unless otherwise specified, that S comprises a single
class of (communicating) recurrent states under pfor the Markov chain {X}.
Let f be a real-valued function on S, and define the cumulative sums
Let

Sn = f(Xm) (n = 1, 2, ...). (9.1)


m=0

For example, if f (i) = 1 for i = j and f (i) = 0 for i j, then S/(n + 1) is


THE LAW OF LARGE NUMBERS 139

the average number of visits to j in time 0 to n. As in Section 8, let t.r denote


the time of the rth visit to state j. Write the contribution to the sum S, from
the rth block of'time (tr, zr + "] as,

Zj r I I I

Z r = I f^(X,,,) (r=0,1,2,...). (9.2)


in =+ I

By the strong Markov property, the conditional distribution of the process


{Xt l ,, Xt ..,, + ,, .. , X r ir +n , ...}, given the past up to time r, is PP , which is the
distribution of {X 0 , X,, ... , Xn , ...} under X0 =1. Hence, the conditional
distribution of Zr given the process up to time Tcr' is that of
Z o = [(X 1 ) + + f (X T ), given X 0 =1. This conditional distribution does
not change with the values of X 0 , X 1 .... , X(= j), rfr'. Hence, Zr is
independent of all events that are determined by the process up to time rfr . In )

particular, Zr is independent of Z,, ... , Z r _ . Thus, we have the following result.

Proposition 9.1. The sequence of random variables {Z 1 , Z 2 , ...} is i.i.d., no


matter what the initial distribution of {X,,: n = 0, 1, 2, ...} may be.

This decomposition will be referred to as the renewal decomposition. The strong


law of large numbers now provides that, with probability 1,

r
lim Z Z= EZ 1 , (9.3)
r-+x r 1

provided that EIZ 1 I < oc. In what follows we will make the stronger assumption
provided
that

T 2,

E I f(Xm)I < oo . (9.4)


m= tJ! I ,

Write Nn for the number of visits to state j by time n. Now,

NN =max{r>0:tjr'<n}. (9.5)

Then (see Figure 9.1),


r N tih * 1^

n 1 J( m)
m=0 r=1 n+I

For each sample path there are a finite number, t 1, of summands in


the first sum on the right side, except for a set of sample paths having probability
zero. Therefore,
140 DISCRETE-PARAMETER MARKOV CHAINS

S Al)
T
f(i) f(Xn)
x,=r T T
f(j) f(j) f(j) f(j) f(j)
---- T-------I Y-------------T--------------z---
f(k)
k T T

CO tI... i (') C (') + 1 10)... T oi ii +l ... T in L N. +I...


0 1 .. n n+ I ... T iN.+u...

f(xn1)
7...
U t I I
ZIA Zi Z, ... ZN
"-I Z.N,

Sn

Figure 9.1

1 T 11)

lim ^ f(Xm ) = 0, with probability 1. (9.7)


n-.m n m=0

The last sum on the right side of (9.6) has at most r;'"' ) r( N ^ ) summands,
this number being the time between the last visit to j by time n and the next
visit to j. Although this sum depends on n, under the condition (9.4) we still
have that (Exercise 1)
1T '" a l l 1 T'N-' 1 1

Y_ .f(X.)I' I If(Xm)^ +0 a.s. as n co. (9.8)


n m=n n m-r"+1

Therefore,

S _ 1 N^ N 1 N^
2] Z,+R n = " E Z,+R n , (9.9)
n n, 1n
Nn ,= 1

where R n --* 0 as n --> oo with probability 1 under (9.4). Also, for each sample
path outside a set of probability 0, N N oo as n oo and therefore by (9.3)
(taking limit over a subsequence)

1 N^
lim Y Z r = EZ i (9.10)
n_^
if (9.4) holds. Now, replacing f by the constant function f - 1 in (9.10), we have

THE LAW OF LARGE NUMBERS 141

_' = E(tj 2) T i
-

lim L_ (9.11)
nx Nn

assuming that the right side is finite. Since

T (N,J
nT (N " ) < T(N"+ 1)
J J J

which is negligible compared to N n as n oo, one gets (Exercise 2)

tim n = E( T ^. 2 ) r'') (9.12)


, Nn

Note that the right side, E(r 2) t'), is the average recurrence time of
j(= E j tj(' ) ), and the left side is the reciprocal of the asymptotic proportion of
time spent at].

Definition 9.1. A state j is positive recurrent if

E T;' < oo.


; ) (9.13)

Combining (9.9)(9.11) gives the following result. Note that positive recurrence
is a class property (see Theorem 9.4 and Exercise 4).

Theorem 9.2. Suppose j is a positive recurrent state under p and that f is a


real-valued function on S such that

E {If(X1)I + ... + If(X)I} < oo.


;
(9.14)

Then the following are true.


(a) With F3 -probability 1,
n
lim -- Z f(Xm) = E;(.f (X,) + ... + .i(X , ))/E^T; 1) . (9.15)
n-aao n m=O

(b) If S comprises a single class of essential states (irreducible) that are all
positive recurrent, then (9.15) holds with probability I regardless of the
initial distribution.

Corollary 9.3. If S consists of a single positive recurrent class under p, then


for all i, j E S,

1 n
lim - P;;) _ (9.16)
n- n m=i Et
142 DISCRETE-PARAMETER MARKOV CHAINS

A WORD ON PROOFS. The calculations of limits in the following proofs require


the use of certain standard results from analysis such as Lebesgue's Dominated
Convergence, Fatou's Lemma, and Fubini's theorem. These results are given
in Chapter 0; however, the reader unfamiliar with these theorems may proceed
formally through the calculations to gain acquaintance with the statements.

Proof Take f to be the indicator function of the state j,

f(j)
I for all k ^ j. (9.17)
f(k)=0

Then Zr - 1 for all r = 0, 1, 2, ... since there is only one visit to state j in
(T cr) ,
i(r+1)] Hence, taking expectation on both sides of (9.15) under the initial
state i, one gets (9.16) after interchanging the order of the expectation and the
limit. This is permissible by Lebesgue's Dominated Convergence Theorem since
(n + I)' YOf(Xm)I < 1.

It will now be shown that the quantities defined by

i; =(Eji; n ) - ' (jES) (9.18)

constitute an invariant distribution if the states are all positive recurrent and
communicate with each other. Let us first show that Y- t it = I in this case. For
this, introduce the random variables

Ti(" ) = # {m e (r (.' ) , i (.' +i)]: Xm = i} (r = 0, 1, 2, ...), (9.19)

i.e., T i ' is the amount of time the Markov chain spends in the state i between
( )

the rth and (r + 1)th passages to j. The sequence {T: r = 1, 2, ...} is i.i.d. by
the strong Markov property. Write

O (i) = E J T;' . ) (9.20)

Then,

03(i) _ EJ T; 1) = Ej ( T}1)) = E j ( t f2 _ r ) = -.
(9.21)
ics 1 l l 7f

Also, taking f to be the indicator function of {i} and replacing j by i in Theorem


9.2, one obtains

#{m n:X m = = Ir!


lim h w probability
it 1. (9.22)
n-+ao n

On the other hand, by the strong law of large numbers applied to



THE LAW OF LARGE NUMBERS 143

r = 1, 2, ...}, and by (9.12) and (9.18), the limit on the left side also equals
N
V T(r)
nJ IN, T1'
lim -h---- = lim " N = n 1 Bj (i). (9.23)
n- oc n- x n \r1 n

Hence,

ni = n j Oj (i). (9.24)

Adding over i, one has using (9.21),

(9.25)
71 f
By Scheffe's Theorem (Theorem 3.7 of Chapter 0) and (9.16),

In
- 71i +0 as n ^ oo . (9.26)
iES nm =1

Hence,

1 1
Pam) Pij < TCi - Y- pj, " ') --> 0 as n
iES iES n m= 1 iES n m= 1

(9.27)
Therefore,

ni Pi; = lim E - Y Pjl Pi; ")

iES n-^ iESn m=1

in
= hm - Y E p^.m)p . `

n-^ao n m= 1 iES

1 n (m+l) _
= lim Pi; it; (9.28)
n-'x n m=1

In other words, it is invariant.


Finally, let us show that for a positive recurrent chain the probability
distribution it above is the only invariant distribution. For this let it be another
invariant distribution. Then

it = z niPij (m = 1, 2, ...).
; )
(9.29)
iES
144 DISCRETE-PARAMETER MARKOV CHAINS

Averaging over m = 1, 2, ... , n, one gets

1
n; = Y_ ni
ieS n m=1
i Pi; (9.30)

The right side converges to Y_ i n i n; = nj by Lebesgue's Dominated Convergence


Theorem. Hence n; = n; .
We have proved above that if all states of S are positive recurrent and
communicate with each other, then there exists a unique invariant distribution
given by (9.18). The next problem is to determine what happens if S comprises
a single class of recurrent states that are not positive recurrent.

Definition 9.2. A recurrent state j is said to be null recurrent if

Ej t,' = cc. (9.31)

If j is a null recurrent state and i ↔ j, then for the Markov chain having initial state i the sequence {Z_r: r = 1, 2, ...} defined by (9.2) with f ≡ 1 is still an i.i.d. sequence of random variables, but the common mean is infinite. It follows (Exercise 3) that, with P_i-probability 1,

lim_{n→∞} τ_j^{(N_n)}/N_n = ∞.

Since τ_j^{(N_n)} ≤ n, we have

lim_{n→∞} (n + 1)/N_n = ∞,

and, therefore,

lim_{n→∞} N_n/(n + 1) = 0  with P_i-probability 1.  (9.32)

Since 0 ≤ N_n/(n + 1) ≤ 1 for all n, Lebesgue's Dominated Convergence Theorem applied to (9.32) yields

lim_{n→∞} E_i N_n/(n + 1) = 0.  (9.33)

But

E_i N_n = E_i(Σ_{m=1}^n 1_{{X_m = j}}) = Σ_{m=1}^n p_{ij}^{(m)},  (9.34)


and (9.33), (9.34) lead to

lim_{n→∞} (1/(n + 1)) Σ_{m=1}^n p_{ij}^{(m)} = 0  (9.35)

if j is null recurrent and j → i (or, equivalently, j ↔ i). In other words, (9.16) holds if S comprises either a single class of positive recurrent states, or a single class of null recurrent states.

In the case that j is transient, (8.11) implies that G(i, j) < ∞, i.e.,

Σ_{m=0}^∞ p_{ij}^{(m)} < ∞.

In particular, therefore,

lim_{n→∞} p_{ij}^{(n)} = 0  (i ∈ S),  (9.36)

if j is a transient state.
The main results of this section may be summarized as follows.

Theorem 9.4. Assume that all states communicate with each other. Then one has the following results.
(a) Either all states are recurrent, or all states are transient.
(b) If all states are recurrent, then they are either all positive recurrent, or all null recurrent.
(c) There exists an invariant distribution if and only if all states are positive recurrent. Moreover, in the positive recurrent case, the invariant distribution π is unique and is given by

π_j = (E_j τ_j^{(1)})^{-1}  (j ∈ S).  (9.37)

(d) In case the states are positive recurrent, no matter what the initial distribution μ, if E_π|f(X_0)| < ∞, then

lim_{n→∞} (1/n) Σ_{m=1}^n f(X_m) = Σ_{i∈S} π_i f(i) = E_π f(X_0)  (9.38)

with P_μ-probability 1.

Proof. Part (a) follows from Proposition 8.1(a).

To prove part (b), assume that j is positive recurrent and let i ≠ j. Since T_i^{(r)} ≤ τ_j^{(r+1)} − τ_j^{(r)}, one has E_j T_i^{(1)} < ∞. Hence (9.23) holds. Also, (9.22) holds with π_i > 0 if i is positive recurrent, and π_i = 0 if i is null recurrent (use (9.32) in the latter case). Thus (9.24) holds. Now θ_j(i) > 0; for otherwise T_i^{(r)} = 0 with probability 1 for all r ≥ 0, implying ρ_{ji} = 0, which is ruled out by the assumption that all states communicate. The relation (9.24) then implies π_i > 0, since π_j > 0 and θ_j(i) > 0. Therefore, all states are positive recurrent.
For part (c), it has been proved above that there exists a unique invariant probability distribution π given by (9.18) if all states are positive recurrent. Conversely, suppose π̃ is an invariant probability distribution. We need to show that all states are positive recurrent. This is done by elimination of the other possibilities. If the states are transient, then (9.36) holds; using this in (9.29) (or (9.30)) one would get π̃_j = 0 for all j, which is a contradiction. Similarly, null recurrence implies, by (9.30) and (9.35), that π̃_j = 0 for all j. Therefore, the states are all positive recurrent.

Part (d) will follow from Theorem 9.2 if:
(i) The hypothesis (9.14) holds whenever

Σ_{i∈S} π_i |f(i)| < ∞,  (9.39)

and
(ii) E_j Z_0 = E_j(f(X_1) + ··· + f(X_{τ_j^{(1)}})) = (1/π_j) Σ_{i∈S} π_i f(i).  (9.40)

To verify (i) and (ii), first note that, by the definition of T_i^{(1)},

Σ_{m=τ_j^{(1)}+1}^{τ_j^{(2)}} |f(X_m)| = Σ_{i∈S} |f(i)| T_i^{(1)}.  (9.41)

Since E_j T_i^{(1)} = θ_j(i) = π_i/π_j (see Eq. 9.24), taking expectations in (9.41) yields

E_j(Σ_{m=τ_j^{(1)}+1}^{τ_j^{(2)}} |f(X_m)|) = E_j(Σ_{i∈S} |f(i)| T_i^{(1)}) = Σ_{i∈S} |f(i)| π_i/π_j.  (9.42)

The last equality follows upon interchanging the orders of summation and expectation, which by Fubini's Theorem is always permissible if the summands are nonnegative. Therefore (9.14) follows from (9.39). Now, as in (9.42),

E_j Z_0 = E_j Z_1 = E_j(Σ_{m=τ_j^{(1)}+1}^{τ_j^{(2)}} f(X_m)) = E_j(Σ_{i∈S} f(i) T_i^{(1)}) = Σ_{i∈S} f(i) E_j T_i^{(1)} = (1/π_j) Σ_{i∈S} f(i) π_i,  (9.43)

where this time the interchange of the orders of summation and expectation is justified, again using Fubini's Theorem, by the finiteness of the double "integral". ∎

If the assumption that "all states communicate with each other" in Theorem 9.4 is dropped, then S can be decomposed into a set of inessential states and (disjoint) classes S_1, S_2, ..., S_k of essential states. The transition probability matrix p may be restricted to each one of the classes S_1, ..., S_k, and the conclusions of Theorem 9.4 will hold individually for each class. If more than one of these classes is positive recurrent, then more than one invariant distribution exists, and they are supported on disjoint sets. Since any convex combination of invariant distributions is again invariant, an infinity of invariant distributions exists in this case. The following result takes care of the set of inessential states in this connection (also see Exercise 4).

Proposition 9.5
(a) If j is inessential, then it is transient.
(b) Every invariant distribution assigns zero probability to inessential, transient, and null recurrent states.

Proof. (a) If j is inessential, then there exist i ∈ S and m ≥ 1 such that

p_{ji}^{(m)} > 0  and  p_{ij}^{(n)} = 0 for all n ≥ 1.  (9.44)

Hence

P_j(N(j) < ∞) ≥ P_j(X_m = i, X_n ≠ j for n > m) = p_{ji}^{(m)} P_i(X_n ≠ j for n > 0) = p_{ji}^{(m)} · 1 > 0.  (9.45)

By Proposition 8.1, j is transient, since (9.45) says j is not recurrent.

(b) Next use (9.36), (9.35), and (9.30), and argue as in the proof of part (c) of Theorem 9.4 to conclude that π_j = 0 if j is either transient or null recurrent. ∎

Corollary 9.6. If S is finite, then there exists at least one positive recurrent state, and therefore at least one invariant distribution π. This invariant distribution is unique if and only if all positive recurrent states communicate.

Proof. Suppose, if possible, that all states are either transient or null recurrent. Then

lim_{n→∞} (1/(n + 1)) Σ_{m=0}^n p_{ij}^{(m)} = 0  for all i, j ∈ S.  (9.46)

Since (n + 1)^{-1} Σ_{m=0}^n p_{ij}^{(m)} ≤ 1 for all j, and there are only finitely many states j, by Lebesgue's Dominated Convergence Theorem,

lim_{n→∞} Σ_{j∈S} (1/(n + 1)) Σ_{m=0}^n p_{ij}^{(m)} = Σ_{j∈S} lim_{n→∞} (1/(n + 1)) Σ_{m=0}^n p_{ij}^{(m)},

while, for every n,

Σ_{j∈S} (1/(n + 1)) Σ_{m=0}^n p_{ij}^{(m)} = (1/(n + 1)) Σ_{m=0}^n Σ_{j∈S} p_{ij}^{(m)} = (n + 1)/(n + 1) = 1.  (9.47)

But the first sum above is zero term by term, by (9.46). We have reached a contradiction. Thus, there exists at least one positive recurrent state. The rest follows from Theorem 9.4 and the remark following its proof. ∎
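
The ergodic limit (9.38) can likewise be observed numerically. The sketch below (Python; the chain and the function f are hypothetical choices for illustration) compares the time average of f(X_m) along one long trajectory with Σ_i π_i f(i):

    import numpy as np

    p = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.4, 0.5]])
    f = np.array([0.0, 1.0, 4.0])     # f(i) = i**2, say

    # Invariant distribution via the left eigenvector for eigenvalue 1.
    w, v = np.linalg.eig(p.T)
    pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()

    rng = np.random.default_rng(1)
    n, state, total = 200_000, 0, 0.0
    for _ in range(n):
        state = rng.choice(3, p=p[state])
        total += f[state]

    print("time average of f :", total / n)
    print("sum_i pi_i f(i)   :", pi @ f)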

10 THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS

The same method as used in Section 9 to obtain the law of large numbers may be used to derive a central limit theorem for S_n = Σ_{m=0}^n f(X_m), where f is a real-valued function on the state space S. Write

μ = E_π f(X_0) = Σ_{i∈S} π_i f(i),  (10.1)

and assume that, for Z_r = Σ_{m=τ_j^{(r)}+1}^{τ_j^{(r+1)}} f(X_m), r = 0, 1, 2, ...,

E_j(Z_0 − E_j Z_0)² < ∞.  (10.2)

Now replace f by f̄ = f − μ, and write

S̄_n = Σ_{m=0}^n f̄(X_m) = Σ_{m=0}^n (f(X_m) − μ),
Z̄_r = Σ_{m=τ_j^{(r)}+1}^{τ_j^{(r+1)}} f̄(X_m)  (r = 0, 1, 2, ...).  (10.3)

Then by (9.40),

E_j Z̄_r = (E_j τ_j^{(1)}) E_π f̄(X_0) = 0  (r = 0, 1, 2, ...).  (10.4)

Thus {Z̄_r: r = 1, 2, ...} is an i.i.d. sequence with mean zero and finite variance

σ² = E_j Z̄_1².  (10.5)

Now apply the classical central limit theorem to this sequence: as r → ∞, (1/√r) Σ_{k=1}^r Z̄_k converges in distribution to the Gaussian law with mean zero and variance σ². Now express S̄_n as in (9.6), with f replaced by f̄, S_n by S̄_n, and Z_r by Z̄_r, to see that the limiting distribution of (1/√n) S̄_n is the same as that of (Exercise 1)

(N_n/n)^{1/2} (1/√N_n) Σ_{r=1}^{N_n} Z̄_r.  (10.6)

We shall need an extension of the central limit theorem that applies to sums of random numbers of i.i.d. random variables. We can get such a result as an extension of Corollary 7.2 in Chapter 0, as follows.

Proposition 10.1. Let {X_j: j ≥ 1} be i.i.d., EX_j = 0, 0 < σ² := EX_j² < ∞. Let {ν_n: n ≥ 1} be a sequence of nonnegative integer-valued random variables with

lim_{n→∞} ν_n/n = a  in probability  (10.7)

for some constant a > 0. Then (1/√ν_n) Σ_{j=1}^{ν_n} X_j converges in distribution to N(0, σ²).

Proof. Without loss of generality, let a = 1. Write S_n := X_1 + ··· + X_n. Choose ε > 0 arbitrarily. Then,

P(|S_{ν_n} − S_{[na]}| > ε([na])^{1/2}) ≤ P(|ν_n − [na]| > ε³[na]) + P(max_{|m−[na]| ≤ ε³[na]} |S_m − S_{[na]}| ≥ ε([na])^{1/2}).  (10.8)

The first term on the right goes to zero as n → ∞, by (10.7). The second term is estimated by Kolmogorov's Maximal Inequality (Chapter I, Corollary 13.7) as being no more than

(ε([na])^{1/2})^{-2}(σ²ε³[na]) = σ²ε.  (10.9)

This shows that

(S_{ν_n} − S_{[na]})/([na])^{1/2} → 0  in probability.  (10.10)

Since S_{[na]}/([na])^{1/2} converges in distribution to N(0, σ²), it follows from (10.10) that so does S_{ν_n}/([na])^{1/2}. The desired convergence now follows from (10.7). ∎
By Proposition 10.1, N_n^{-1/2} Σ_{r=1}^{N_n} Z̄_r is asymptotically Gaussian with mean zero and variance σ². Since N_n/n converges to (E_j τ_j^{(1)})^{-1}, it follows that the expression in (10.6) is asymptotically Gaussian with mean zero and variance (E_j τ_j^{(1)})^{-1} σ². This is then the asymptotic distribution of n^{-1/2} S̄_n. Moreover, defining, as in Chapter I,

W_n(t) = S̄_{[nt]}/(n + 1)^{1/2},
W̃_n(t) = W_n(t) + (nt − [nt]) f̄(X_{[nt]+1})/(n + 1)^{1/2}  (t ≥ 0),  (10.11)

all the finite-dimensional distributions of {W_n(t)}, as well as of {W̃_n(t)}, converge to those of Brownian motion with zero drift and diffusion coefficient

D = (E_j τ_j^{(1)})^{-1} σ²  (10.12)

(Exercise 2). In fact, convergence of the full distribution may also be obtained by consideration of the above renewal argument. The precise form of the functional central limit theorem (FCLT) for Markov chains goes as follows (see theoretical complement 1).

Theorem 10.2. (FCLT). If S is a positive recurrent class of states and if (10.2) holds, then, as n → ∞, W_n(1) = (n + 1)^{-1/2} S̄_n converges in distribution to a Gaussian law with mean zero and variance D given by (10.12) and (10.5). Moreover, the stochastic process {W_n(t)} (or {W̃_n(t)}) converges in distribution to Brownian motion with zero drift and diffusion coefficient D.

Rather than using (10.12), a sometimes more convenient way to compute D is the following. Write E_π, Var_π, Cov_π for expectation, variance, and covariance, respectively, when the initial distribution is π (the unique invariant distribution). The following computation is straightforward. First write, for any two functions h, g that are square summable with respect to π,

⟨h, g⟩_π = Σ_{i∈S} h(i)g(i)π_i = E_π[h(X_0)g(X_0)].  (10.13)

Then,

Var_π((n + 1)^{-1/2} S̄_n) = E_π[(Σ_{m=0}^n f̄(X_m))²]/(n + 1)
  = [E_π Σ_{m=0}^n f̄²(X_m) + 2 Σ_{m=1}^n Σ_{m'=0}^{m−1} E_π(f̄(X_{m'}) f̄(X_m))]/(n + 1)
  = (1/(n + 1)) Σ_{m=0}^n E_π f̄²(X_m) + (2/(n + 1)) Σ_{m=1}^n Σ_{m'=0}^{m−1} E_π[f̄(X_{m'}) E_π(f̄(X_m) | X_{m'})]
  = E_π f̄²(X_0) + (2/(n + 1)) Σ_{m=1}^n Σ_{m'=0}^{m−1} E_π[f̄(X_{m'})(p^{m−m'} f̄)(X_{m'})]
  = E_π f̄²(X_0) + (2/(n + 1)) Σ_{m=1}^n Σ_{m'=0}^{m−1} ⟨f̄, p^{m−m'} f̄⟩_π
  = E_π f̄²(X_0) + (2/(n + 1)) Σ_{m=1}^n Σ_{k=1}^m ⟨f̄, p^k f̄⟩_π  (k = m − m').  (10.14)
Now assume that the limit

γ := lim_{m→∞} Σ_{k=1}^m ⟨f̄, p^k f̄⟩_π  (10.15)

exists and is finite. Then it follows from (10.14) that

D = lim_{n→∞} Var_π((n + 1)^{-1/2} S̄_n) = E_π f̄²(X_0) + 2γ = Σ_{j∈S} π_j f̄²(j) + 2γ.  (10.16)

Note that

⟨f̄, p^k f̄⟩_π = Cov_π{f(X_0), f(X_k)},  (10.17)

and

Σ_{k=1}^m ⟨f̄, p^k f̄⟩_π = Σ_{k=1}^m Cov_π{f(X_0), f(X_k)}.

The condition (10.15), that γ exists and is finite, is the condition that the correlation between values of f at time points k units apart decays to zero at a sufficiently rapid rate as k → ∞.
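
For a finite state space the covariance series (10.15) can be summed directly. The following Python sketch (hypothetical 3-state chain and f) computes γ by accumulating ⟨f̄, p^k f̄⟩_π and then D from (10.16); since this p has a spectral gap, the series converges geometrically and truncation is harmless:

    import numpy as np

    p = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.4, 0.5]])
    f = np.array([1.0, -2.0, 3.0])

    w, v = np.linalg.eig(p.T)
    pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()
    fbar = f - pi @ f                        # centered f

    gamma, pk = 0.0, np.eye(3)
    for k in range(1, 500):                  # truncated series for gamma
        pk = pk @ p                          # p^k
        gamma += pi @ (fbar * (pk @ fbar))   # <fbar, p^k fbar>_pi

    D = pi @ fbar**2 + 2.0 * gamma           # (10.16)
    print("D =", D)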

11 ABSORPTION PROBABILITIES

Suppose that p is a transition probability matrix for a Markov chain {X_n} starting in state i ∈ S. Suppose that j ∈ S is an absorbing state of p, i.e., p_{jj} = 1, p_{jk} = 0 for k ≠ j. Let τ_j denote the time required to reach j,

τ_j = inf{n: X_n = j}.  (11.1)

To calculate the distribution of τ_j, consider that

P_i(τ_j > m) = Σ* p_{ii_1} p_{i_1 i_2} ··· p_{i_{m−1} i_m},  (11.2)

where Σ* denotes summation over all m-tuples (i_1, i_2, ..., i_m) of elements from S \ {j}. Now let p̃ denote the matrix obtained by deleting the jth row and jth column from p,

p̃ = ((p_{ik}: i, k ∈ S \ {j})).  (11.3)

Then, by the definition of matrix multiplication, the calculation (11.2) may be expressed as

P_i(τ_j > m) = Σ_k p̃_{ik}^{(m)},  (11.4)

and, therefore,

P_i(τ_j = m) = Σ_k p̃_{ik}^{(m−1)} − Σ_k p̃_{ik}^{(m)}.  (11.5)

Observe that the above idea can be applied to the calculation of the first passage time to any state j ∈ S or, for that matter, to any nonempty set A of states such that i ∉ A. Moreover, it follows from Theorem 6.4 and its corollaries that the rate of absorption is, therefore, furnished by the spectral radius of p̃. This will be amply demonstrated in the examples of this section.

Proposition 11.1. Let p be a transition probability matrix for a Markov chain {X_n} starting in state i. Let A be a nonempty subset of S, i ∉ A. Let

τ_A = inf{n ≥ 0: X_n ∈ A}.  (11.6)

Then,

P_i(τ_A > m) = Σ_k p̃_{ik}^{(m)},  m = 1, 2, ...,  (11.7)

where p̃ is the matrix obtained by deleting the rows and columns of p corresponding to the states in A.

In general, the matrix p̃ is not a proper transition probability matrix, since the row sums may be strictly less than 1 upon the removal of certain columns from p. However, if each of the rows in p corresponding to states j ∈ A is replaced by e_j, having 1 in the jth place and 0 elsewhere, then the resulting matrix, p̂ say, is a transition probability matrix, and

P_i(τ_A ≤ m) = Σ_{k∈A} p̂_{ik}^{(m)}.  (11.8)

The reason (11.8) holds is that up to the first passage time τ_A the distributions of Markov chains having transition probability matrices p and p̂ (starting at i) are the same. In particular,

p̃_{ik}^{(m)} = p̂_{ik}^{(m)}  for i, k ∉ A.  (11.9)
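
The relations (11.4), (11.5), and (11.7) translate directly into matrix computations. A minimal Python sketch (the 4-state chain is hypothetical, with absorbing set A = {3}) tabulates P_i(τ_A = m) and P_i(τ_A > m) from powers of p̃:

    import numpy as np

    p = np.array([[0.2, 0.5, 0.2, 0.1],
                  [0.3, 0.3, 0.3, 0.1],
                  [0.1, 0.2, 0.3, 0.4],
                  [0.0, 0.0, 0.0, 1.0]])
    keep = [0, 1, 2]                        # S \ A
    pt = p[np.ix_(keep, keep)]              # p-tilde of (11.3)

    i = 0
    prev = np.eye(3)                        # p-tilde^0
    for m in range(1, 8):
        cur = prev @ pt                     # p-tilde^m
        tail = cur[i].sum()                 # P_i(tau_A > m), by (11.7)
        mass = prev[i].sum() - tail         # P_i(tau_A = m), by (11.5)
        print(m, round(mass, 6), round(tail, 6))
        prev = cur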

In the case that there is more than one state in A, an important problem is to determine the distribution of X_{τ_A}, starting from i ∉ A. Of course, if P_i(τ_A < ∞) < 1, then X_{τ_A} is a defective random variable under P_i, being defined on the set {τ_A < ∞} of P_i-probability less than 1.

Write

a_j(i) := P_i({τ_A < ∞, X_{τ_A} = j})  (j ∈ A, i ∈ S).  (11.10)

By the Markov property (conditioning on X_1 in (11.10)),

a_j(i) = Σ_k p_{ik} a_j(k)  (j ∈ A, i ∈ S).  (11.11)

Denoting by a_j the vector (a_j(i): i ∈ S), viewed as a column vector, one may express (11.11) as

a_j = p a_j  (j ∈ A).  (11.12)

Alternatively, (11.11) or (11.12) may be replaced by

a_j(i) = Σ_k p_{ik} a_j(k)  for i ∉ A,
a_j(i) = 1 if i = j,  a_j(i) = 0 if i ∈ A but i ≠ j  (j ∈ A).  (11.13)

A function (or vector) a = (a(i): i ∈ S) is said to be p-harmonic on B (⊂ S) if

a(i) = (pa)(i)  for i ∈ B.  (11.14)

Hence a_j is p-harmonic on A^c and has the (boundary) values on A,

a_j(i) = 1 if i = j,  a_j(i) = 0 if i ∈ A, i ≠ j.  (11.15)

We have thus proved part (a) of the following proposition.



Proposition 11.2. Let p be a transition probability matrix and A a nonempty subset of S.
(a) The distribution of X_{τ_A}, as defined by (11.10), satisfies (11.13).
(b) This is the unique bounded solution if and only if

P_i(τ_A < ∞) = 1  for all i ∈ S.  (11.16)

Proof. (b) Let i ∈ A^c. Then

Σ_{k∈A^c} p̃_{ik}^{(n)} = P_i(τ_A > n) ↓ P_i(τ_A = ∞)  as n ↑ ∞.  (11.17)

Hence, if (11.16) holds, then

lim_{n→∞} p̃_{ik}^{(n)} = 0  for all i, k ∈ A^c.  (11.18)

On the other hand,

p̂_{ik}^{(n)} = P_i(τ_A ≤ n, X_{τ_A} = k) ↑ P_i(τ_A < ∞, X_{τ_A} = k) = a_k(i)  (11.19)

if i ∈ A^c, k ∈ A, and

p̂_{ik}^{(n)} = δ_{ik}  for all n, if i ∈ A, k ∈ S.

Now let a be another solution of (11.13), besides a_j. Then a satisfies (11.12), which on iteration yields a = p̂^n a. Taking the limit as n ↑ ∞, and using (11.18), (11.19), one gets

a(i) = lim_{n→∞} Σ_k p̂_{ik}^{(n)} a(k) = Σ_{k∈A} a_k(i) a(k) = a_j(i)  (11.20)

for all i ∈ A^c, since a(k) = 0 for k ∈ A \ {j} and a(j) = 1. Hence a_j is the unique solution of (11.13).

Conversely, if P_i(τ_A < ∞) < 1 for some i in A^c, then the function h = (h(i): i ∈ S) defined by

h(i) := 1 − P_i(τ_A < ∞) = P_i(τ_A = ∞)  (i ∈ S)  (11.21)

may be shown to be p-harmonic on A^c with (boundary) value zero on A. The harmonic property is a consequence of the Markov property (Exercise 5),

h(i) = P_i(τ_A = ∞) = Σ_k p_{ik} P_k(τ_A = ∞) = Σ_k p_{ik} h(k)  (i ∈ A^c).  (11.22)

Since P_j(τ_A = 0) = 1 for j ∈ A, h(j) = 0 for j ∈ A. It follows that both a_j and a_j + h satisfy (11.13). Since h ≢ 0, the solution of (11.13) is not unique. ∎
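
When (11.16) holds, the boundary-value problem (11.13) is just a finite linear system. The sketch below (Python) solves it for the simple random walk on {0, 1, ..., 5} with absorbing set A = {0, 5} (the classical gambler's ruin, used here as a hypothetical test case); for the fair walk the solution is a_5(i) = i/5:

    import numpy as np

    N, q = 5, 0.5
    p = np.zeros((N + 1, N + 1))
    p[0, 0] = p[N, N] = 1.0
    for i in range(1, N):
        p[i, i - 1], p[i, i + 1] = 1 - q, q

    interior = list(range(1, N))
    j = N                                    # target state in A = {0, N}

    # (11.13) on the interior: (I - p_cc) a = p_{., j}.
    pcc = p[np.ix_(interior, interior)]
    a = np.linalg.solve(np.eye(N - 1) - pcc, p[interior, j])
    print(dict(zip(interior, a)))            # expect a(i) = i/N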

Example 1. (A Random Replication Model). The simple Wright-Fisher model originated in genetics as a model for the evolution of gene frequencies. Here the mathematical model will be described in a somewhat different physical context. Consider a collection of 2N individuals, each one of which is either in favor of or against some issue. Let X_n denote the number of individuals in favor of the issue at time n = 0, 1, 2, .... In the evolution, each one of the individuals will randomly re-decide his or her position under the influence of the current overall opinion as follows. Let θ_n = X_n/2N denote the proportion in favor of the issue at time n. Then, given X_0, X_1, ..., X_n, each of the 2N individuals, independently of the choices of the others, elects to be in favor with probability θ_n, or against the issue with probability 1 − θ_n. That is,

P(X_{n+1} = k | X_0, X_1, ..., X_n) = C(2N, k) θ_n^k (1 − θ_n)^{2N−k}  (11.23)

for k = 0, 1, ..., 2N, where C(2N, k) denotes the binomial coefficient. So {X_n} is a Markov chain with state space S = {0, 1, ..., 2N} and one-step transition matrix p = ((p_{ij})), where

p_{ij} = C(2N, j)(i/2N)^j (1 − i/2N)^{2N−j},  i, j = 0, 1, ..., 2N.  (11.24)

Notice that {X_n} is an aperiodic Markov chain. The "boundary" states {0} and {2N} form closed classes of essential states. The states {1, 2, ..., 2N − 1} constitute an inessential class. The model has a special conservation property in the form of the following martingale property,

E(X_{n+1} | X_0, X_1, ..., X_n) = E(X_{n+1} | X_n) = (2N)θ_n = X_n,  (11.25)

for n = 0, 1, 2, .... In particular, therefore,

EX_{n+1} = E{E(X_{n+1} | X_0, X_1, ..., X_n)} = EX_n,  (11.26)

for n = 0, 1, 2, .... However, since S is finite we know that in the long run {X_n} is certain to be absorbed in state 0 or 2N, i.e., the population is certain to eventually come to a unanimous opinion, be it pro or con. It is of interest to calculate the absorption probabilities as well as the rate of absorption. Here, with A = {0, 2N}, one has p̂ = p.

Let a_j(i) denote the probability of ultimate absorption at j = 0 or at j = 2N, starting from state i ∈ S. Then,
a_{2N}(i) = Σ_k p_{ik} a_{2N}(k)  for 0 < i < 2N,
a_{2N}(2N) = 1,  a_{2N}(0) = 0,  (11.27)

and a_0(i) = 1 − a_{2N}(i). In view of (11.26) and (11.19) we have

i = E_i X_0 = E_i X_n = Σ_{k=0}^{2N} k p_{ik}^{(n)} → 0·a_0(i) + 2N·a_{2N}(i) = 2N a_{2N}(i)  as n → ∞,

for i = 1, ..., 2N − 1. Therefore,

a_{2N}(i) = i/2N,  i = 0, 1, 2, ..., 2N,  (11.28)
a_0(i) = (2N − i)/2N,  i = 0, 1, 2, ..., 2N.  (11.29)

Check also that (11.29) satisfies (11.27).
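
A quick Monte Carlo check of (11.28) is easy to carry out; in the Python sketch below, 2N = 10 and the run counts are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(2)
    N2 = 10                                   # 2N individuals

    def absorbed_at_2N(i, runs=10_000):
        hits = 0
        for _ in range(runs):
            x = i
            while 0 < x < N2:
                x = rng.binomial(N2, x / N2)  # one step of (11.23)
            hits += (x == N2)
        return hits / runs

    for i in (1, 3, 5, 8):
        print(i, absorbed_at_2N(i), i / N2)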


In order to estimate the rate at which fixation of opinion occurs, we shall calculate the eigenvalues of p̃ (= p here). Let v = (v_0, ..., v_{2N})' and consider the eigenvalue problem

Σ_{j=0}^{2N} p_{ij} v_j = λ v_i,  i = 0, 1, ..., 2N.  (11.30)

The rth factorial moment of the binomial distribution (p_{ij}: 0 ≤ j ≤ 2N) is

Σ_{j=0}^{2N} j(j − 1)···(j − r + 1) p_{ij} = (i/2N)^r (2N)(2N − 1)···(2N − r + 1),  (11.31)

for r = 1, 2, ..., 2N. Equation (11.31) contains a transformation between "factorial powers" and "ordinary powers" that deserves to be examined for connections with (11.30). The "factorial powers" (j)_r := j(j − 1)···(j − r + 1) are simply polynomials in the "ordinary powers" and can be expressed as

j(j − 1)···(j − r + 1) = Σ_{k=1}^r s_k^{(r)} j^k.  (11.32)
Likewise, "ordinary powers" jr can be expressed as polynomials in the "factorial
ABSORPTION PROBABILITIES 157

powers" as
r

J r = Sk(J)k, (11.33)
k=0

with the convention (j) 0 = 1. Note that S r' = 1 for all r > 0. The coefficients
Sr), {Sk} are commonly referred to as Stirling coefficients of the first and second
kinds, respectively.
Now every vector v = (v_0, ..., v_{2N})' may be represented as the successive values of a unique (factorial) polynomial of degree 2N evaluated at 0, 1, ..., 2N (Exercise 7), i.e.,

v_j = Σ_{r=0}^{2N} a_r (j)_r  for j = 0, 1, ..., 2N.  (11.34)

According to (11.24), (11.31), and (11.33), writing (2N)_r := (2N)(2N − 1)···(2N − r + 1),

Σ_{j=0}^{2N} p_{ij} v_j = Σ_{r=0}^{2N} a_r Σ_{j=0}^{2N} p_{ij} (j)_r = Σ_{r=0}^{2N} a_r ((2N)_r/(2N)^r) i^r
  = Σ_{n=0}^{2N} (Σ_{r=n}^{2N} a_r ((2N)_r/(2N)^r) S_n^{(r)}) (i)_n.  (11.35)

It is now clear that (11.30) holds if and only if

λ a_n = Σ_{r=n}^{2N} a_r ((2N)_r/(2N)^r) S_n^{(r)},  n = 0, 1, ..., 2N.  (11.36)

In particular, taking n = 2N and noting S_{2N}^{(2N)} = 1, we see that (11.36) holds with

λ = λ_{2N} := (2N)_{2N}/(2N)^{2N},  a_{2N} = 1,  (11.37)

and a_r = a_r^{(2N)}, r = 2N − 1, ..., 0, solved recursively from (11.36). Next take a_{2N}^{(2N−1)} = 0, a_{2N−1}^{(2N−1)} = 1 and solve for

λ = λ_{2N−1} = (2N)_{2N−1}/(2N)^{2N−1},

etc.

Then,

λ_r = (2N)_r/(2N)^r = (1 − 1/2N)(1 − 2/2N)···(1 − (r − 1)/2N),  0 ≤ r ≤ 2N.  (11.38)

Notice that λ_0 = λ_1 = 1. The next largest eigenvalue is λ_2 = 1 − (1/2N). Let V = ((v_{ij})) be the matrix whose columns are the eigenvectors obtained above. Writing D = diag(λ_0, λ_1, ..., λ_{2N}), this means

pV = VD,  (11.39)

or

p = VDV^{-1},  (11.40)

so that

p^m = VD^m V^{-1}.  (11.41)

Therefore, writing V^{-1} = ((v_{ij}^{-1})), we have from (11.8)

P_i(τ_{{0,2N}} > m) = Σ_{j=1}^{2N−1} p_{ij}^{(m)} = Σ_{j=1}^{2N−1} Σ_{k=0}^{2N} λ_k^m v_{ik} v_{kj}^{-1}.  (11.42)

Since the left side of (11.42) must go to zero as m → ∞, the coefficients of λ_0^m ≡ 1 and λ_1^m ≡ 1 must be zero. Thus,

P_i(τ_{{0,2N}} > m) = Σ_{k=2}^{2N} Σ_{j=1}^{2N−1} v_{ik} v_{kj}^{-1} λ_k^m
  = λ_2^m (Σ_{j=1}^{2N−1} v_{i2} v_{2j}^{-1} + Σ_{k=3}^{2N} Σ_{j=1}^{2N−1} v_{ik} v_{kj}^{-1} (λ_k/λ_2)^m)
  ≈ (const.) λ_2^m  for large m.  (11.43)
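
The eigenvalue formula (11.38) is also easy to confirm numerically. The following Python sketch builds the Wright-Fisher matrix for 2N = 10 and compares its spectrum with the products ∏_{k=0}^{r−1}(1 − k/2N); the tolerance is an arbitrary numerical allowance:

    import numpy as np
    from math import comb

    N2 = 10
    p = np.array([[comb(N2, j) * (i / N2) ** j * (1 - i / N2) ** (N2 - j)
                   for j in range(N2 + 1)] for i in range(N2 + 1)])

    lam_numeric = np.sort(np.real(np.linalg.eigvals(p)))[::-1]
    lam_formula = sorted((np.prod([(N2 - k) / N2 for k in range(r)])
                          for r in range(N2 + 1)), reverse=True)
    print(np.allclose(lam_numeric, lam_formula, atol=1e-6))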

Example 2. (Bienaymé-Galton-Watson Simple Branching Process). A simple branching process was introduced as Example 3.7. The state X_n of the process at time n represents the total number of offspring in the nth generation of a population that evolves by i.i.d. random replications of parent individuals. If the offspring distribution is denoted by f, then, as explained in Section 3, the one-step transition probabilities are given by

p_{ij} = f^{*i}(j) if i ≥ 1, j ≥ 0;  p_{ij} = 1 if i = 0, j = 0;  p_{ij} = 0 if i = 0, j ≠ 0.  (11.44)

According to the values of p_{ij} at the boundary, the state i = 0 is absorbing (permanent extinction). Write ρ_{i0} for the probability that extinction eventually occurs given X_0 = i. Also write ρ = ρ_{10}. Then ρ_{i0} = ρ^i, since each of the i sequences of generations arising from the i initial particles has the same chance
ρ of extinction, and the i sequences, evolving independently, must all be extinct in order that there be eventual extinction, given X_0 = i.

If f(0) = 0, then ρ = ρ_{10} = 0 and extinction is impossible. If f(0) = 1, then ρ_{i0} = ρ^i = 1 and extinction is certain (no matter what X_0 is). To avoid these and other trivialities, we assume (unless otherwise specified)

0 < f(0) < 1,  f(0) + f(1) < 1.  (11.45)

Introduce the probability generating function of f:

φ(z) = Σ_{j=0}^∞ f(j) z^j = f(0) + Σ_{j=1}^∞ f(j) z^j  (|z| ≤ 1).  (11.46)

Since a power series can be differentiated term by term within its radius of convergence, one has

φ'(z) = (d/dz) φ(z) = Σ_{j=1}^∞ j f(j) z^{j−1}  (|z| < 1).  (11.47)

If the mean μ of the number of particles generated by a single particle is finite, i.e., if

μ = Σ_{j=1}^∞ j f(j) < ∞,  (11.48)

then (11.47) holds even for the left-hand derivative at z = 1, i.e.,

φ'(1) = μ.  (11.49)

Since φ'(z) > 0 for 0 < z < 1, φ is strictly increasing. Also, since φ''(z) (which exists and is finite for 0 < z < 1) satisfies

(d²/dz²) φ(z) = Σ_{j=2}^∞ j(j − 1) f(j) z^{j−2} > 0  for 0 < z < 1,  (11.50)

the function φ is strictly convex on [0, 1]. In other words, the line segment joining any two points on the curve y = φ(z) lies strictly above the curve (except at the two points joined). Because φ(0) = f(0) > 0 and φ(1) = Σ_j f(j) = 1, the graph of φ looks like that of Figure 11.1 (curve a or b).

The maximum of φ'(z) is μ, which is attained at z = 1. Hence, in the case μ > 1, the graph of y = φ(z) must lie below that of y = z near z = 1 and, because φ(0) = f(0) > 0, must cross the line y = z at a point z_0, 0 < z_0 < 1. Since the slope of the curve y = φ(z) continuously increases as z increases in (0, 1), z_0 is the unique solution of the equation z = φ(z) that is smaller than 1.

[Figure 11.1: graph of y = φ(z) on [0, 1], with intercept φ(0) = f(0); in the case μ > 1 the curve crosses the line y = z at a point z_0 < 1.]

In case μ ≤ 1, y = φ(z) must lie strictly above the line y = z, except at z = 1. For if it meets the line y = z at a point z_0 < 1, then it must go under the line in the immediate vicinity to the right of z_0, since its slope falls below that of the line (i.e., unity). In order to reach the height φ(1) = 1 (also reached by the line at the same value z = 1), its slope then must exceed 1 somewhere in (z_0, 1]; this is impossible since φ'(z) < φ'(1) = μ ≤ 1 for all z in (0, 1). Thus, the only solution of the equation z = φ(z) is z = 1.

Now observe

ρ = ρ_{10} = Σ_{j=0}^∞ P(X_1 = j | X_0 = 1) ρ_{j0} = Σ_{j=0}^∞ f(j) ρ^j = φ(ρ);  (11.51)

thus if μ ≤ 1, then ρ = 1 and extinction is certain. On the other hand, suppose μ > 1. Then ρ is either z_0 or 1. We shall now show that ρ = z_0 (< 1). For this, consider the quantities

q_n := P(X_n = 0 | X_0 = 1)  (n = 1, 2, ...).  (11.52)

That is, q_n is the probability that the sequence of generations originating from a single particle is extinct at time n. As n increases, q_n ↑ ρ; for clearly {X_n = 0} ⊂ {X_m = 0} for all m ≥ n, so that q_n ≤ q_m if n < m. Also,

{lim_{n→∞} X_n = 0} = ∪_n {X_n = 0} = {extinction occurs}.

Now, by independence of the generations originating from different particles,


P(X_n = 0 | X_0 = j) = q_n^j  (j = 0, 1, 2, ...),

q_{n+1} = P(X_{n+1} = 0 | X_0 = 1) = P(X_1 = 0 | X_0 = 1) + Σ_{j=1}^∞ P(X_1 = j, X_{n+1} = 0 | X_0 = 1)
  = f(0) + Σ_{j=1}^∞ P(X_1 = j | X_0 = 1) P(X_{n+1} = 0 | X_0 = 1, X_1 = j)
  = f(0) + Σ_{j=1}^∞ f(j) q_n^j = φ(q_n)  (n = 1, 2, ...).  (11.53)

Since q_1 = f(0) = φ(0) < φ(z_0) = z_0 (recall that φ(z) is strictly increasing in z for 0 ≤ z ≤ 1), one has, using (11.53) with n = 1, q_2 = φ(q_1) < φ(z_0) = z_0, and so on. Hence, q_n < z_0 for all n. Therefore, ρ = lim_{n→∞} q_n ≤ z_0. This proves ρ = z_0.

If f(0) + f(1) = 1 and 0 < f(0) < 1, then φ''(z) = 0 for all z, and the graph of φ(z) is the line segment joining (0, f(0)) and (1, 1). Hence, ρ = 1 in this case.
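
Numerically, the extinction probability is conveniently computed by iterating (11.53) from q_1 = f(0); the iterates increase to the smallest fixed point of φ. A short Python sketch, with a hypothetical offspring distribution f(0) = 0.25, f(1) = 0.25, f(2) = 0.5 (so μ = 1.25 > 1 and z_0 = 0.5):

    f = [0.25, 0.25, 0.5]          # hypothetical offspring law, mu = 1.25

    def phi(z):                    # generating function (11.46)
        return sum(fj * z**j for j, fj in enumerate(f))

    q = 0.0
    for _ in range(200):           # q_{n+1} = phi(q_n), cf. (11.53)
        q = phi(q)
    print("rho ~", q)              # smallest root of z = phi(z); here 0.5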
Let us now compute the average size of the nth generation. One has

E(X_{n+1} | X_0 = 1) = Σ_{k=1}^∞ k p_{1k}^{(n+1)} = Σ_{k=1}^∞ k (Σ_{j=1}^∞ p_{1j} p_{jk}^{(n)})
  = Σ_{j=1}^∞ p_{1j} (Σ_{k=1}^∞ k p_{jk}^{(n)}) = Σ_{j=1}^∞ p_{1j} E(X_n | X_0 = j)
  = Σ_{j=1}^∞ p_{1j} j E(X_n | X_0 = 1) = μ E(X_n | X_0 = 1).  (11.54)

Continuing in this manner, one obtains

E(X_{n+1} | X_0 = 1) = μ E(X_n | X_0 = 1) = μ² E(X_{n−1} | X_0 = 1) = ··· = μ^n E(X_1 | X_0 = 1) = μ^{n+1}.  (11.55)

It follows that

E(X_n | X_0 = 1) = μ^n.  (11.56)

Thus, in the case μ < 1, the expected size of the population at time n decreases to zero exponentially fast as n → ∞. If μ = 1, then the expected size at time n does not depend on n (i.e., it is the same as the initial size). If μ > 1, then the expected size of the population increases to infinity exponentially fast.

12 ONE-DIMENSIONAL NEAREST-NEIGHBOR GIBBS STATE

The notion of a Gibbs distribution has its origins in statistical physics as a probability distribution with respect to which bulk thermodynamic properties of materials in equilibrium can be expressed as expected values (called phase averages). The thrust of Gibbs' idea is that a theoretically convenient way in which to view materials at the microscopic scale is as a large system composed of randomly distributed but interacting components, such as the positions and momenta of molecules comprising the material.

To arrive at an appropriate probability distribution for computing large-scale averages, Gibbs argued that the probability of a given configuration of component values in equilibrium should be proportional to a negative exponential function of the total energy of the configuration. In this way the lowest-energy configurations (ground states) are the most likely (modes) to occur. Moreover, the additivity of the combined total energy of two noninteracting systems is reflected in the multiplication rule for (exponential) probabilities of independent values under such a specification. It is a tribute to the genius of Gibbs that this approach leads to the correct thermodynamics at the bulk material scale. Perhaps because of the great success these ideas have enjoyed in physics, Gibbs' probability distributions have also been introduced to represent systems having a large number of interacting components in a wide variety of other contexts, ranging from both genetic and automata codes to economic and sociological systems.

For purposes of orientation one may think of random values {X_n: n ∈ Λ} (states) from a finite set S distributed over a finite set Λ of sites. The set of possible configurations is then represented by the cartesian product Ω := {ω = (ω_n: n ∈ Λ): n → ω_n ∈ S, n ∈ Λ, is a function on Λ}. For Λ and S finite, Ω is also a finite set, and a (free-boundary) Gibbs distribution on Ω can be described by a probability mass function (p.m.f.) of the form

P(X_n = ω_n, n ∈ Λ) := Z^{-1} exp{−βU(ω)},  ω = (ω_n) ∈ Ω,  (12.1)

where U(ω) is a real-valued function on Ω, referred to as the total potential energy of the configuration ω ∈ Ω, β is a positive parameter called the inverse temperature, and

Z := Σ_{ω∈Ω} exp{−βU(ω)}  (12.2)

is the normalization constant, referred to as the partition function.

Observe that if P is any probability distribution on a finite set Ω that assigns strictly positive probability p(ω) to each ω ∈ Ω, then P can trivially be expressed in the form (12.1) with U(ω) = −β^{-1} log p(ω). In physics, however, one regards the total potential energy as a sum of energies at individual sites, plus the sum of interaction energies between pairs of sites, plus the sum of energies between triples, etc., and the probability distribution is specified for various types of such interactions. For example, if U(ω) = Σ_{n∈Λ} φ_1(ω_n) for

ω = (ω_n: n ∈ Λ) is a sum of single-site energies, higher-order interactions being zero, then the distribution (12.1) is that of independent components; similarly for the case of infinite temperature (β = 0).

It is both natural and quite important to consider the problem of extending the above formulation to the so-called infinite-volume setting, in which Λ is a countably infinite set of sites. We will consider this problem here for one-dimensional systems on Λ = ℤ consisting of interacting components taking values in a finite set S and for which the nonzero interaction energies contributing to U are at most between pairs of nearest-neighbor (i.e., adjacent) integer sites.

Let Ω = S^ℤ for a finite set S, and let ℱ be the sigmafield generated by finite-dimensional cylinder sets of the form

C = {ω = (ω_k) ∈ Ω: ω_k = s_k for k ∈ Λ},  (12.3)

where Λ is a finite subset of ℤ and s_k ∈ S is specified for each k ∈ Λ. Although it would be enough to consistently prescribe the probabilities of such events C, except for the independent case it is not quite obvious how to do this starting with energy considerations. First consider, then, the independent case. For single-site energies φ_1(s, n), s ∈ S, at the respective sites n ∈ ℤ, the probabilities of cylinder events can be specified according to the formula

P(C) = Z_Λ^{-1} exp{−β Σ_{k∈Λ} φ_1(s_k, k)},  (12.4)

where Z_Λ appropriately normalizes P for each finite set of sites Λ. In the homogeneous (or translation-invariant) case we have φ_1(s, n) = φ_1(s), s ∈ S, depending on the site n only through the component value s at n.

Now suppose that we have, in addition to single-site energies φ_1(s), pairwise nearest-neighbor interactions represented by φ_2(s, n; t, m) for |n − m| = 1, s, t ∈ S. For translation invariance of interactions, take φ_2(s, n; t, m) = φ_2(s, t) if |n − m| = 1 and 0 otherwise. Now, because values inside of Λ should be dependent on values across the boundary of Λ, it is less straightforward to consistently write down expressions for P(C) than in (12.4). In fact, in this connection it is somewhat more natural to consider a (local) specification of conditional probabilities of finite-dimensional events of the form C, given information about a configuration outside of Λ. Take Λ = {n} and consider a specification of the conditional probability of C = {X_n = s}, given an event of the form {X_k = s_k for k ∈ D \ {n}}, where D is a finite set of sites that includes n and the two neighboring sites n − 1 and n + 1, as follows:

P(X_n = s | X_k = s_k, k ∈ D \ {n}) = Z_{n,D}^{-1} exp{−β[φ_1(s) + φ_2(s, s_{n−1}) + φ_2(s, s_{n+1})]},  (12.5)

where

Z_{n,D} = Σ_{s∈S} exp{−β[φ_1(s) + φ_2(s, s_{n−1}) + φ_2(s, s_{n+1})]}.  (12.6)

That is, the state at n depends on the given states at sites in D \ {n} only through the neighboring values at n − 1, n + 1. One would like to know that there is a probability distribution having these conditional probabilities. For the one-dimensional case at hand, we have the following basic result.

Theorem 12.1. Let S be an arbitrary finite set, Ω = S^ℤ, and let ℱ be the sigmafield of subsets of Ω generated by finite-dimensional cylinder sets. Let {X_n} be the coordinate projections on Ω. Suppose that P is a probability measure on (Ω, ℱ) with the following properties.
(i) P(C) > 0 for every finite-dimensional cylinder set C ∈ ℱ.
(ii) For arbitrary n ∈ ℤ let ∂(n) = {n − 1, n + 1} denote the boundary of {n} in ℤ. If f is any S-valued function defined on a finite subset D_n of ℤ that contains {n} ∪ ∂(n), and if a ∈ S is arbitrary, then the conditional probability

P(X_n = a | X_m = f(m), m ∈ D_n \ {n})

depends only on the values of f on ∂(n).
(iii) The value of the conditional probability in (ii) is invariant under translation by any amount θ ∈ ℤ.
Then P is the distribution of a (unique) stationary (i.e., translation-invariant) Markov chain having strictly positive transition probabilities and, conversely, every stationary Markov chain with strictly positive transition probabilities satisfies (i), (ii), and (iii).

Proof. Let

g_{b,c}(a) = P(X_n = a | X_{n−1} = b, X_{n+1} = c).  (12.7)

The family of conditional probabilities {g_{b,c}(a)} is referred to as the local structure of the probability distribution. We will prove the converse statement first, and along the way we will calculate the local structure of P in terms of the transition probabilities. So assume that P is a stationary Markov chain with strictly positive transition matrix ((p(a, b): a, b ∈ S)) and marginal distribution (π(a): a ∈ S). Consider cylinder set probabilities given by

P(X_m = a_0, X_{m+1} = a_1, ..., X_{m+k} = a_k) = π(a_0) p(a_0, a_1) ··· p(a_{k−1}, a_k).  (12.8)

So, in particular, the condition (i) is satisfied; also see (2.9), (2.6), Proposition 6.1. For m and n ≥ 1 it is a straightforward computation to verify

P(X_0 = a | X_{−m} = b_{−m}, ..., X_{−1} = b_{−1}, X_1 = b_1, ..., X_n = b_n)
  = P(X_0 = a | X_{−1} = b_{−1}, X_1 = b_1)
  = p(b_{−1}, a) p(a, b_1)/p^{(2)}(b_{−1}, b_1) = g_{b_{−1},b_1}(a).  (12.9)

Therefore, the condition (ii) holds for P, and condition (iii) also holds because P is the distribution of a stationary Markov chain.

Next suppose that P is a probability distribution satisfying (i), (ii), and (iii). We must show that P is the distribution of a stationary Markov chain. Fix an arbitrary element of S, denoted 0, say. Let the local structure of P be as defined in (12.7). Observe that for each b, c ∈ S, g_{b,c}(·) is a probability measure on S. Outlined in Exercise 1 are the steps required to show that

g_{b,c}(a) = q(b, a) q(a, c)/q^{(2)}(b, c),  (12.10)

where

q(b, c) = g_{0,c}(b)/g_{0,c}(0)  (12.11)

and

q^{(n+1)}(b, c) = Σ_{a∈S} q^{(n)}(b, a) q(a, c).  (12.12)

An induction argument, as outlined in Exercise 2, can be used to check that

P(X_{k+h} = a_k, 1 ≤ k ≤ r | X_{h+1−n} = a, X_{h+r+n} = b)
  = q^{(n)}(a, a_1) q(a_1, a_2) ··· q(a_{r−1}, a_r) q^{(n)}(a_r, b)/q^{(r+2n−1)}(a, b),  (12.13)

for h ∈ ℤ, a_i, a, b ∈ S, n, r ≥ 1. We would like to conclude that P coincides with the (uniquely determined) stationary Markov chain having transition probabilities ((q̂(b, c))), defined by normalizing ((q(b, c))) to a transition probability matrix according to the special formula

q̂(b, c) = q(b, c)μ(c)/Σ_{a∈S} q(b, a)μ(a) = q(b, c)μ(c)/(λμ(b)),  (12.14)

where μ = ((μ(a))) is a positive eigenvector of q corresponding to the largest (in magnitude) eigenvalue λ = λ_max of q. Note that q̂ is a strictly positive transition probability matrix with a unique invariant distribution π, say, and such that (Exercise 3)

q̂^{(n)}(b, c) = q^{(n)}(b, c)μ(c)/(λ^n μ(b)).  (12.15)

Let Q be the probability distribution of this Markov chain. It is enough to

show that P and Q agree on the cylinder events. Using (12.13), we have for any n ≥ 1,

P(X_0 = a_0, ..., X_r = a_r)
  = Σ_{a∈S} Σ_{b∈S} P(X_{−n} = a, X_{n+r} = b) P(X_0 = a_0, ..., X_r = a_r | X_{−n} = a, X_{n+r} = b)
  = Σ_{a∈S} Σ_{b∈S} P(X_{−n} = a, X_{n+r} = b) q^{(n)}(a, a_0) q(a_0, a_1) ··· q(a_{r−1}, a_r) q^{(n)}(a_r, b)/q^{(r+2n)}(a, b)
  = Σ_{a∈S} Σ_{b∈S} P(X_{−n} = a, X_{n+r} = b) q̂^{(n)}(a, a_0) q̂(a_0, a_1) ··· q̂(a_{r−1}, a_r) q̂^{(n)}(a_r, b)/q̂^{(r+2n)}(a, b).  (12.16)

Now let n → ∞ to get, by the fundamental convergence result of Proposition 6.1 (Exercise 4),

P(X_0 = a_0, ..., X_r = a_r) = π(a_0) q̂(a_0, a_1) ··· q̂(a_{r−1}, a_r) = Q(X_0 = a_0, ..., X_r = a_r).  (12.17)  ∎

Probability distributions on S^ℤ that satisfy (i)-(iii) of Theorem 12.1 are also referred to as one-dimensional Markov random fields (MRF). This definition has a natural extension to probability distributions on S^{ℤ^d}, called d-dimensional Markov random fields. For that matter, the important element of the definition of an MRF on a countable set Λ is that Λ have a graph structure to accommodate the notion of nearest-neighbor (or adjacent) sites. The result here shows that one-dimensional Markov random fields can be locally specified and are in fact stationary Markov chains. While existence of a probability distribution satisfying (i)-(iii) can be proved for any dimension d (for finite S), the probability distribution need not be unique. This interesting phenomenon is known as a phase transition. Existence may fail in the case that S is countably infinite too (see theoretical complement 1).
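
The normalization step (12.14) used in the proof of Theorem 12.1 is a small linear-algebra computation. The Python sketch below starts from a hypothetical strictly positive kernel q, extracts the Perron eigenvalue λ and eigenvector μ, and checks that q̂ is a proper transition matrix:

    import numpy as np

    q = np.array([[1.0, 0.5, 0.2],     # hypothetical positive kernel
                  [0.4, 1.0, 0.3],
                  [0.2, 0.6, 1.0]])

    w, v = np.linalg.eig(q)
    k = np.argmax(np.abs(w))
    lam = np.real(w[k])                # Perron eigenvalue
    mu = np.real(v[:, k])
    mu *= np.sign(mu[0])               # Perron eigenvector taken positive

    qhat = q * mu[None, :] / (lam * mu[:, None])   # (12.14)
    print(qhat.sum(axis=1))            # each row sums to 1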

13 A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS

The canonical construction of Markov chains on the space of trajectories has been explained in Chapter I, Section 6, using Kolmogorov's Existence Theorem. In the present section and the next, another widely used general method of construction of Markov processes on arbitrary state spaces is illustrated. Markovian models in this form arise naturally in many fields, and they are often easier to analyze in this noncanonical representation.

Example 1. (The Linear Autoregressive Model of Order One, or the AR(1) Model). Let b be a real number and {ε_n: n ≥ 1} an i.i.d. sequence of real-valued

random variables defined on some probability space (Ω, ℱ, P). Given an initial random variable X_0 independent of {ε_n}, define recursively the sequence of random variables {X_n: n ≥ 0} as follows:

X_0,  X_1 := bX_0 + ε_1,  X_{n+1} := bX_n + ε_{n+1}  (n ≥ 0).  (13.1)

As X_0, X_1, ..., X_n are determined by {X_0, ε_1, ..., ε_n}, and ε_{n+1} is independent of the latter, one has, for all Borel sets C,

P(X_{n+1} ∈ C | {X_0, X_1, ..., X_n}) = [P(bx + ε_{n+1} ∈ C)]_{x=X_n} = [P(ε_{n+1} ∈ C − bx)]_{x=X_n} = Q(C − bX_n),  (13.2)

where Q is the common distribution of the random variables ε_n. Thus {X_n: n ≥ 0} is a Markov process on the state space S = ℝ¹, having the transition probability (of going from x to C in one step)

p(x, C) := Q(C − bx),  (13.3)

and initial distribution given by the distribution of X_0. The analysis of this Markov process is, however, facilitated more by its representation (13.1) than by an analytical study of the asymptotics of n-step transition probabilities. Note that successive iteration in (13.1) yields

X_1 = bX_0 + ε_1,  X_2 = bX_1 + ε_2 = b²X_0 + bε_1 + ε_2, ...,
X_n = b^n X_0 + b^{n−1}ε_1 + b^{n−2}ε_2 + ··· + bε_{n−1} + ε_n  (n ≥ 1).  (13.4)

The distribution of X_n is, therefore, the same as that of

Y_n := b^n X_0 + ε_1 + bε_2 + b²ε_3 + ··· + b^{n−1}ε_n  (n ≥ 1).  (13.5)

Assume now that

|b| < 1  (13.6)

and |ε_n| ≤ c with probability 1 for some constant c. Then it follows from (13.5) that

Y_n → Σ_{n=0}^∞ b^n ε_{n+1}  a.s.,  (13.7)

regardless of X_0. Let π denote the distribution of the random variable on the right side of (13.7). Then Y_n converges in distribution to π as n → ∞ (Exercise 1). Because the distribution of X_n is the same as that of Y_n, it follows that X_n converges in distribution to π. Therefore, π is the unique invariant distribution for the Markov process {X_n}, i.e., for p(x, dy) (Exercise 1).

The assumption that the random variables ε_n are bounded can be relaxed. Indeed, it suffices to assume

Σ_{n=1}^∞ P(|ε_1| > cδ^n) < ∞  for some δ < 1/|b| and some c > 0.  (13.8)

For (13.8) is equivalent to assuming Σ_n P(|ε_{n+1}| > cδ^n) < ∞, so that, by the Borel-Cantelli Lemma (see Chapter 0, Section 6),

P(|ε_{n+1}| ≤ cδ^n for all but finitely many n) = 1.

This implies that, with probability 1, |b^n ε_{n+1}| ≤ c(|b|δ)^n for all but finitely many n. Since |b|δ < 1, the series on the right side of (13.7) is convergent and is the limit of Y_n.

It is simple to check that (13.8) holds if |b| < 1 and (Exercise 3)

E|ε_1|^r < ∞  for some r > 0.  (13.9)

The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the existence of a unique invariant probability π and for the convergence of X_n in distribution to π.
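
The loss of memory behind this stability is transparent in simulation: two copies of (13.1) driven by the same noise contract toward one another at the geometric rate |b|^n. A minimal Python sketch (all parameters are hypothetical illustrative choices):

    import numpy as np

    rng = np.random.default_rng(3)
    b = 0.7
    x, y = 100.0, -100.0              # two very different initial states
    for n in range(1, 61):
        eps = rng.normal()            # common eps_n for both chains
        x, y = b * x + eps, b * y + eps
        if n % 10 == 0:
            print(n, abs(x - y))      # gap = |b|**n * 200 -> 0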

Next, Example 1 is extended to a multidimensional state space.

Example 2. (General Linear Time Series Model). Let {ε_n: n ≥ 1} be a sequence of i.i.d. random vectors with values in ℝ^m and common distribution Q, and let B be an m × m matrix with real entries b_{ij}. Suppose X_0 is an m-dimensional random vector independent of {ε_n}. Define recursively the sequence of random vectors

X_0,  X_{n+1} := BX_n + ε_{n+1}  (n = 0, 1, 2, ...).  (13.10)

As in (13.2), (13.3), {X_n} is a Markov process with state space ℝ^m and transition probability

p(x, C) := Q(C − Bx)  (for all Borel sets C ⊂ ℝ^m).  (13.11)

Assume that

‖B^{n_0}‖ < 1  for some positive integer n_0.  (13.12)

Recall that the norm of a matrix H is defined by

‖H‖ := sup_{|x|=1} |Hx|,  (13.13)

where |x| denotes the Euclidean length of x in ℝ^m. For a positive integer n > n_0, write n = jn_0 + j', where 0 ≤ j' < n_0. Then, using the fact ‖B_1 B_2‖ ≤ ‖B_1‖ ‖B_2‖ for arbitrary m × m matrices B_1, B_2 (Exercise 2), one gets

‖B^n‖ = ‖B^{jn_0} B^{j'}‖ ≤ ‖B^{n_0}‖^j ‖B^{j'}‖ ≤ c‖B^{n_0}‖^j,  c := max{‖B^r‖: 0 ≤ r < n_0}.  (13.14)

From (13.12) and (13.14) it follows, as in Example 1, that the series Σ B^n ε_{n+1} converges a.s. in Euclidean norm if (Exercise 3), for some c > 0,

Σ_{n=1}^∞ P(|ε_1| > cδ^n) < ∞  for some δ < ‖B^{n_0}‖^{-1/n_0}.  (13.15)

Write, in this case,

Y := Σ_{n=0}^∞ B^n ε_{n+1}.  (13.16)

It also follows, as in Example 1 (see Exercise 1), that no matter what the initial distribution (i.e., the distribution of X_0) is, X_n converges in distribution to the distribution π of Y. Therefore, π is the unique invariant distribution for p(x, dy).

For purposes of application it is useful to know that the assumption (13.12) holds if the maximum modulus of the eigenvalues of B, also known as the spectral radius r(B) of B, is less than 1. This fact is implied by the following result from linear algebra.

Lemma. Let B be an m × m matrix. Then the spectral radius r(B) satisfies

r(B) ≥ lim sup_{n→∞} ‖B^n‖^{1/n}.  (13.17)

Proof. Let λ_1, ..., λ_m be the eigenvalues of B. This means det(B − λI) = (λ_1 − λ)(λ_2 − λ) ··· (λ_m − λ), where det is shorthand for determinant and I is the identity matrix. Let λ_m have the maximum modulus among the λ_i, i.e., |λ_m| = r(B). If |λ| > |λ_m|, then B − λI is invertible, since det(B − λI) ≠ 0. Indeed, by the definition of the inverse, each element of the inverse of B − λI is a polynomial in λ (of degree m − 1 or m − 2) divided by det(B − λI). Therefore, one may write

(B − λI)^{-1} = (λ_1 − λ)^{-1} ··· (λ_m − λ)^{-1}(B_0 + λB_1 + ··· + λ^{m−1}B_{m−1})  (|λ| > |λ_m|),  (13.18)

where B_j (0 ≤ j ≤ m − 1) are m × m matrices that do not involve λ. Writing z = 1/λ, one may express (13.18) as
(B − λI)^{-1} = (−1)^m z (1 − λ_1 z)^{-1} ··· (1 − λ_m z)^{-1} Σ_{j=0}^{m−1} z^{m−1−j} B_j
  = (−1)^m z (Σ_{n=0}^∞ a_n z^n) Σ_{j=0}^{m−1} z^{m−1−j} B_j  (|z| < |λ_m|^{-1}),  (13.19)

where Σ a_n z^n is the power-series expansion of (1 − λ_1 z)^{-1} ··· (1 − λ_m z)^{-1}. On the other hand,

(B − λI)^{-1} = −z(I − zB)^{-1} = −z Σ_{k=0}^∞ z^k B^k  (|z| < 1/‖B‖).  (13.20)

To see this, first note that the series on the right is convergent in norm for |z| < 1/‖B‖, and then check that term-by-term multiplication of the series Σ z^k B^k by I − zB yields the identity I after all cancellations. In particular, writing b_{ij}^{(k)} for the (i, j) element of B^k, the series

−z Σ_{k=0}^∞ z^k b_{ij}^{(k)}  (13.21)

converges absolutely for |z| < 1/‖B‖. Since (13.21) is the same as the (i, j) element of the series (13.19), at least for |z| < 1/‖B‖, their coefficients coincide (Exercise 4) and, therefore, the series in (13.21) is absolutely convergent for |z| < |λ_m|^{-1} (as (13.19) is).

This implies that, for each ε > 0,

|b_{ij}^{(k)}| ≤ (|λ_m| + ε)^k  for all sufficiently large k.  (13.22)

For if (13.22) is violated, one may choose |z| sufficiently close to (but less than) 1/|λ_m| such that |z^{k'} b_{ij}^{(k')}| → ∞ for a subsequence {k'}, contradicting the requirement that the terms of the convergent series (13.21) must go to zero for |z| < 1/|λ_m|.

Now ‖B^k‖ ≤ m·max{|b_{ij}^{(k)}|: 1 ≤ i, j ≤ m} (Exercise 2). Since m^{1/k} → 1 as k → ∞, (13.22) implies (13.17). ∎

Two well-known time series models will now be treated as special cases of Example 2. These are the pth-order autoregressive (or AR(p)) model and the autoregressive moving-average model ARMA(p, q).

Example 2(a). (AR(p) Model). Let p ≥ 1 be an integer, and β_0, β_1, ..., β_{p−1} real constants. Given a sequence of i.i.d. real-valued random variables {η_n: n ≥ p}, and p other random variables U_0, U_1, ..., U_{p−1} independent of {η_n}, define recursively

U_{n+p} := Σ_{i=0}^{p−1} β_i U_{n+i} + η_{n+p}  (n ≥ 0).  (13.23)

The sequence {U_n} is not in general a Markov process, but the sequence of p-dimensional random vectors

X_n := (U_n, U_{n+1}, ..., U_{n+p−1})'  (n ≥ 0)  (13.24)

is Markovian. Here the prime (') denotes transposition, so X_n is to be regarded as a column vector in matrix operations. To prove the Markov property, consider the sequence of p-dimensional i.i.d. random vectors

ε_n := (0, 0, ..., 0, η_{n+p−1})'  (n ≥ 1),  (13.25)

and note that

X_{n+1} = BX_n + ε_{n+1},  (13.26)

where B is the p × p matrix

B := [ 0    1    0    0   ···  0      0
       0    0    1    0   ···  0      0
       ···
       0    0    0    0   ···  0      1
       β_0  β_1  β_2  β_3 ···  β_{p−2} β_{p−1} ].  (13.27)

Hence, arguing as in (13.2), (13.3), or (13.11), {X_n} is a Markov process on the state space ℝ^p. Write

B − λI = [ −λ   1    0   ···  0      0
            0   −λ   1   ···  0      0
           ···
            0    0    0  ···  −λ     1
           β_0  β_1  β_2 ···  β_{p−2} β_{p−1} − λ ].

Expanding det(B − λI) by its last row, and using the fact that the determinant of a matrix in triangular form (i.e., with all zero off-diagonal elements on one side of the diagonal) is the product of its diagonal elements (Exercise 5), one gets

det(B − λI) = (−1)^{p+1}(β_0 + β_1 λ + ··· + β_{p−1} λ^{p−1} − λ^p).  (13.28)

Therefore, the eigenvalues of B are the roots of the equation


β_0 + β_1 λ + ··· + β_{p−1} λ^{p−1} − λ^p = 0.  (13.29)

Finally, in view of (13.17), the following proposition holds (see (13.15) and Exercise 3).

Proposition 13.1. Suppose that the roots of the polynomial equation (13.29) all lie strictly inside the unit circle in the complex plane, and that the common distribution G of {η_n} satisfies

Σ_{n=1}^∞ G({x ∈ ℝ¹: |x| > cδ^n}) < ∞  for some δ < 1/|λ_m|,  (13.30)

where |λ_m| is the maximum modulus of the roots of (13.29). Then (i) there exists a unique invariant distribution π for the Markov process {X_n}, and (ii) no matter what the initial distribution, X_n converges in distribution to π.

Once again it is simple to check that (13.30) holds if G has a finite absolute moment of some order r > 0 (Exercise 3).

An immediate consequence of Proposition 13.1 is that the time series {U_n: n ≥ 0} converges in distribution to a steady state π_U given, for all Borel sets C ⊂ ℝ¹, by

π_U(C) := π({x ∈ ℝ^p: x^{(1)} ∈ C}).  (13.31)

To see this, simply note that U_n is the first coordinate of X_n, so that the convergence of X_n to π in distribution implies the convergence of U_n to π_U in distribution.
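
The reduction of AR(p) to the vector model (13.26) is easy to exercise numerically. The Python sketch below treats a hypothetical AR(2), U_{n+2} = β_0 U_n + β_1 U_{n+1} + η_{n+2}, checks the stability condition of Proposition 13.1 through the spectral radius of the companion matrix B, and simulates the vector chain:

    import numpy as np

    b0, b1 = 0.2, 0.5                       # hypothetical AR(2) coefficients
    B = np.array([[0.0, 1.0],
                  [b0,  b1]])               # companion matrix (13.27), p = 2

    roots = np.roots([-1.0, b1, b0])        # roots of b0 + b1*l - l**2 = 0
    print("spectral radius:", max(abs(np.linalg.eigvals(B))))
    print("root moduli    :", sorted(abs(roots)))

    rng = np.random.default_rng(4)
    x = np.array([10.0, -10.0])             # arbitrary initial (U_0, U_1)
    for _ in range(5000):
        x = B @ x + np.array([0.0, rng.normal()])   # (13.26)
    print("sample X_n:", x)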

Example 2(b). (ARMA(p, q) Model). The autoregressive moving-average model of order (p, q), in short ARMA(p, q), is defined by

U_{n+p} := Σ_{i=0}^{p−1} β_i U_{n+i} + Σ_{j=1}^q δ_j η_{n+p−j} + η_{n+p}  (n ≥ 0),  (13.32)

where p, q are positive integers, β_i (0 ≤ i ≤ p − 1) and δ_j (1 ≤ j ≤ q) are real constants, {η_n: n ≥ p − q} is an i.i.d. sequence of real-valued random variables, and U_i (0 ≤ i ≤ p − 1) are arbitrary initial random variables independent of {η_n}. Consider the sequences {X_n}, {ε_n} of (p + q)-dimensional vectors

X_n := (U_n, ..., U_{n+p−1}, η_{n+p−q}, ..., η_{n+p−1})'  (n ≥ 0),
ε_n := (0, ..., 0, η_{n+p−1}, 0, ..., 0, η_{n+p−1})',  (13.33)

where η_{n+p−1} occurs as the pth and (p + q)th elements of ε_n. Then


X,,+1 = HXn + En+ 1 (n -> 0 ), (13.34)

where H is the (p + q) x (p + q) matrix

b ll . ... b 1 0 . ... 0 0

h nl h ... b 2 d1
d9-1
0 00 1 0 .. 00
H :=
0 00 0 1 00

0 0 ... 0 0 01
0 0 . 0 0 ... 00

the first p rows and p columns of H being the matrix B in (13.27).


Note that U., ... , U P _ 1 , rl p _ q , ... , q p _, determine X 0 so that X o is ,

independent of rl p and, therefore, of E,. It follows by induction that X. and E n+ ,

are independent. Hence {X n } is a Markov process on the state space I8 + Q.


In order to apply the Lemma above, expand det(H ;.I) in terms of the
elements of its pth row to get (Exercise 5)

det(H Al) = det(B Al)(_2) 9 . ( 13.35)

Therefore, the eigenvalues of H are q zeros and the roots of (13.29). Thus, one
has the following proposition.

Proposition 13.2. Under the hypothesis of Proposition 13.1, the ARMA(p, q)


process {X n } has a unique invariant distribution n, and X. converges in
distribution to it no matter what the initial distribution is.

As a corollary, the time series {U_n} converges in distribution to π_U given, for all Borel sets C ⊂ ℝ¹, by

π_U(C) := π({x ∈ ℝ^{p+q}: x^{(1)} ∈ C}),  (13.36)

no matter what the distribution of (U_0, U_1, ..., U_{p−1}) is, provided the hypothesis of Proposition 13.2 is satisfied.

In the case that ε_n is Gaussian, it is simple to check that under the hypothesis (13.12) in Example 2 the random vector Y in (13.16) is Gaussian. Therefore, π is Gaussian, so that the stationary vector-valued process {X_n} with initial distribution π is Gaussian (Exercise 6). In particular, if the η_n are Gaussian in Example 2(a), and the roots of the polynomial equation (13.29) lie inside the unit circle in the complex plane, then the stationary process {U_n}, obtained when (U_0, U_1, ..., U_{p−1}) has distribution π_U in Example 2(a), is Gaussian. A similar assertion holds for Example 2(b).

14 MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS

The method of construction of Markov processes illustrated for linear time series models in Section 13 extends to more general Markov processes. The present section is devoted to the construction and analysis of some nonlinear models. Before turning to these models, note that one may regard the process {X_n} in Example 13.1 (see (13.1)) as generated by successive iterations of an i.i.d. sequence of random maps α_1, α_2, ..., α_n, ... defined by

x → α_n x := bx + ε_n  (n ≥ 1),

{ε_n: n ≥ 1} being a sequence of i.i.d. real-valued random variables. Each α_n is a random (affine linear) map of the state space ℝ¹ into itself. The Markov sequence {X_n} is defined by

X_n = α_n α_{n−1} ··· α_1 X_0  (n ≥ 1),  (14.1)

where the initial X_0 is a real-valued random variable independent of the sequence of random maps {α_n: n ≥ 1}. A similar interpretation holds for the other examples of Section 13. Indeed it may be shown, under a very mild condition on the state space, that every Markov process in discrete time may be represented as in (14.1) (see theoretical complement 1). Thus the method of the last section and the present one is truly a general device for constructing and analyzing Markov processes on general state spaces.

Example 1. (Iterations of I.I.D. Increasing Maps). Let the state space be an interval J, finite or infinite. On some probability space (Ω, ℱ, P) is given a sequence of i.i.d. continuous and increasing random maps {α_n: n ≥ 1} on J into itself. This means, first of all, that for each ω ∈ Ω, α_n(ω) is a continuous and increasing (i.e., nondecreasing) function on J into J. Second, there exists a set Γ of continuous increasing functions on J into J such that P(α_n ∈ Γ) = 1 for all n; Γ has a sigmafield ℬ(Γ) generated by sets of the form {γ ∈ Γ: a < γx < b}, where a < b and x are arbitrary elements of J, and γx denotes the value of γ at x. The maps α_n on Ω into Γ are measurable, i.e., F := {ω ∈ Ω: α_n(ω) ∈ D} ∈ ℱ for every D ∈ ℬ(Γ). Also, P(F) is the same for all n. Finally, {α_n: n ≥ 1} are independent, i.e., {α_n ∈ D_n} is an independent sequence of events for every given sequence {D_n} in ℬ(Γ).

For any finite set of functions γ_1, γ_2, ..., γ_k in Γ, one defines the composition γ_1 γ_2 ··· γ_k in the usual manner. For example, γ_1 γ_2 x = γ_1(γ_2 x), the value of γ_1 at the point γ_2 x.

For each x ∈ J define the sequence of random variables

X_0(x) := x,  X_n(x) := α_n X_{n−1}(x) = α_n α_{n−1} ··· α_1 x  (n ≥ 1).  (14.2)

In view of the independence of {α_n: n ≥ 1}, {X_n: n ≥ 0} is a Markov process (Exercise 1) on J, starting at x and having the (one-step) transition probability

p(y, C) := P(α_1 y ∈ C) = μ({γ ∈ Γ: γy ∈ C})  (C a Borel subset of J),  (14.3)

where μ is the common distribution of the α_n.

It will be shown now that the following condition guarantees the existence of a unique invariant probability π, as well as stability, i.e., convergence of X_n(x) in distribution to π for every initial state x. Assume

δ_1 := P(X_{n_0}(x) ≤ z_0 ∀x) > 0  and  δ_2 := P(X_{n_0}(x) ≥ z_0 ∀x) > 0  (14.4)

for some z_0 ∈ J and some integer n_0.

Define

Δ_n := sup_{x,y,z∈J} |P(X_n(x) ≤ z) − P(X_n(y) ≤ z)|.  (14.5)

For the existence of a unique invariant probability π and for stability it is enough to show that Δ_n → 0 as n → ∞. For this implies that P(X_n(x) ≤ z) converges, uniformly in z ∈ J, to a distribution function (of a probability measure on J). To see this last fact, observe that X_{n+m}(x) = α_{n+m} ··· α_1 x has the same distribution as α_n ··· α_1 α_{n+m} ··· α_{n+1} x, so that

|P(X_{n+m}(x) ≤ z) − P(X_n(x) ≤ z)| = |P(X_n(α_{n+m} ··· α_{n+1} x) ≤ z) − P(X_n(x) ≤ z)| ≤ Δ_n,

by first comparing the conditional probabilities given α_{n+m}, ..., α_{n+1}. Thus, if Δ_n → 0, then the sequence of distribution functions {P(X_n(x) ≤ z)} is a Cauchy sequence with respect to uniform convergence for z ∈ J. It is simple to check that the limiting function of this sequence is the distribution function of a probability measure π on J (Exercise 2). Further, Δ_n → 0 implies that this limit π does not depend on the initial state x, showing that π is the unique invariant probability (Exercise 13.1).

In order to prove Δ_n → 0, the first step is to establish, under the assumption (14.4), the inequality

Δ_{n_0} ≤ δ := max{1 − δ_1, 1 − δ_2}.  (14.6)

For this, fix x, y ∈ J and first take z < z_0. On the set F_2 := {X_{n_0}(x) ≥ z_0 ∀x} the

events {X_{n_0}(x) ≤ z}, {X_{n_0}(y) ≤ z} are both empty. Hence, by the second condition in (14.4),

|P(X_{n_0}(x) ≤ z) − P(X_{n_0}(y) ≤ z)| = |E(1_{{X_{n_0}(x)≤z}} − 1_{{X_{n_0}(y)≤z}})| ≤ P(F_2^c) = 1 − δ_2,  (14.7)

since the difference between the two indicator functions in (14.7) is zero on F_2. Similarly, if z > z_0, then on the set F_1 := {X_{n_0}(x) ≤ z_0 ∀x} the two indicator functions both equal 1, so that their difference vanishes and one gets

|P(X_{n_0}(x) ≤ z) − P(X_{n_0}(y) ≤ z)| ≤ P(F_1^c) = 1 − δ_1.  (14.8)

Combining (14.7) and (14.8), one gets

|P(X_{n_0}(x) ≤ z) − P(X_{n_0}(y) ≤ z)| ≤ δ  (14.9)

for all z ≠ z_0. But the function on the left is right-continuous in z. Therefore, letting z ↓ z_0, (14.9) holds also for z = z_0. In other words, (14.6) holds.

Next note that Δ_n is monotonically decreasing,

Δ_{n+1} ≤ Δ_n.  (14.10)

For,

|P(X_{n+1}(x) ≤ z) − P(X_{n+1}(y) ≤ z)| = |P(α_{n+1} ··· α_2 α_1 x ≤ z) − P(α_{n+1} ··· α_2 α_1 y ≤ z)| ≤ Δ_n,

by comparing the conditional probabilities given α_1.

The final step in proving Δ_n → 0 is to show

Δ_n ≤ δ^{[n/n_0]},  (14.11)

where [r] is the integer part of r. In view of (14.10), it is enough to prove

Δ_{jn_0} ≤ δ^j  (j = 1, 2, ...).  (14.12)

Suppose that this is true for some j ≥ 1. Then,

|P(X_{(j+1)n_0}(x) ≤ z) − P(X_{(j+1)n_0}(y) ≤ z)|
  = |E(1_{{α_{(j+1)n_0} ··· α_{jn_0+1} X_{jn_0}(x) ≤ z}} − 1_{{α_{(j+1)n_0} ··· α_{jn_0+1} X_{jn_0}(y) ≤ z}})|
  = |E(1_{{X_{jn_0}(x) ∈ (α_{(j+1)n_0} ··· α_{jn_0+1})^{-1}(−∞,z]}} − 1_{{X_{jn_0}(y) ∈ (α_{(j+1)n_0} ··· α_{jn_0+1})^{-1}(−∞,z]}})|.  (14.13)

Let

F_3 := {α_{(j+1)n_0} ··· α_{jn_0+1} x ≤ z_0 ∀x},  F_4 := {α_{(j+1)n_0} ··· α_{jn_0+1} x ≥ z_0 ∀x}.

Take $z < z_0$ first. Then the inverse image of $(-\infty, z]$ in (14.13) is empty on $F_4$, so that the difference between the two indicator functions vanishes on $F_4$. On the complement of $F_4$, the inverse image of $(-\infty, z]$ under the continuous increasing map $\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1}$ is an interval $(-\infty, Z'] \cap J$, where $Z'$ is a random variable. Therefore, (14.13) leads to

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| = |E\, 1_{F_4^c}(1_{\{X_{jn_0}(x) \le Z'\}} - 1_{\{X_{jn_0}(y) \le Z'\}})|. \tag{14.14}$$

As $F_4$ and $Z'$ are determined by $\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$, and the latter are independent of $X_{jn_0}(x)$, $X_{jn_0}(y)$, one gets, by taking conditional expectation given $\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$,

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| \le E\, 1_{F_4^c}\, \Delta_{jn_0} = (1 - \delta_2)\Delta_{jn_0} \le \delta\, \Delta_{jn_0} \le \delta^{j+1}. \tag{14.15}$$

Similarly, if $z > z_0$, the inverse image in (14.13) is $J$ on $F_3$. Therefore, the difference between the two indicator functions in (14.13) vanishes on $F_3$, and one has (14.14), (14.15) with $F_4$, $\delta_2$ replaced by $F_3$ and $\delta_1$. As (14.12) holds for $j = 1$ (see (14.6)), the induction is complete and (14.12) holds for all $j \ge 1$. Since $\delta < 1$, it follows that under the hypothesis (14.4), $\Delta_n \to 0$ exponentially fast as $n \to \infty$ and, therefore, there exists a unique invariant probability.
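The exponential contraction can be observed numerically. The sketch below is illustrative and not from the text: it takes $J = [0, 1]$ and the i.i.d. increasing maps $\alpha(x) = (x + U)/2$, $U$ uniform on $[0, 1]$, a family for which the splitting condition (14.4) holds with $n_0 = 2$, $z_0 = \tfrac{1}{2}$, and estimates $\Delta_n$ by Monte Carlo. Since the maps are monotone, $P(X_n(x) \le z)$ is monotone in $x$, so the supremum in (14.5) is attained at the endpoints of $J$.

```python
import numpy as np

rng = np.random.default_rng(0)

def iterate(x, n, rng):
    # X_n(x) = alpha_n ... alpha_1 x with alpha(x) = (x + U)/2, U ~ Uniform[0,1]
    for _ in range(n):
        x = 0.5 * (x + rng.uniform())
    return x

# Estimate Delta_n = sup_z |P(X_n(0) <= z) - P(X_n(1) <= z)|; by monotonicity
# of the maps, the extreme starting points 0 and 1 are the worst case.
n, reps = 12, 20000
lo = np.sort([iterate(0.0, n, rng) for _ in range(reps)])
hi = np.sort([iterate(1.0, n, rng) for _ in range(reps)])
grid = np.linspace(0, 1, 201)
F_lo = np.searchsorted(lo, grid) / reps
F_hi = np.searchsorted(hi, grid) / reps
print("estimated Delta_n:", np.abs(F_lo - F_hi).max())
```

With $n = 12$ the estimate is already at the level of the sampling error, consistent with the geometric bound $\Delta_n \le \delta^{[n/n_0]}$.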
If $J$ is a closed bounded interval $[a, b]$, then the condition (14.4) is essentially necessary for stability. To see this, define

$$Y_0(x) := x, \qquad Y_n(x) := \alpha_1 \alpha_2 \cdots \alpha_n x \quad (n \ge 1). \tag{14.16}$$

Then $Y_n(x)$ and $X_n(x)$ have the same distribution. Also,

$$Y_1(a) \ge a, \quad Y_2(a) = Y_1(\alpha_2 a) \ge Y_1(a), \ldots, \quad Y_{n+1}(a) = Y_n(\alpha_{n+1} a) \ge Y_n(a),$$

i.e., the sequence of random variables $\{Y_n(a): n \ge 0\}$ is increasing. Similarly, $\{Y_n(b): n \ge 0\}$ is decreasing. Let the limits of these two sequences be $\underline{Y}$, $\overline{Y}$, respectively. As $Y_n(a) \le Y_n(b)$ for all $n$, $\underline{Y} \le \overline{Y}$. If $P(\underline{Y} < \overline{Y}) > 0$, then $\underline{Y}$ and $\overline{Y}$ cannot have the same distribution. In other words, $Y_n(a)$ (and, therefore, $X_n(a)$) and $Y_n(b)$ (and, therefore, $X_n(b)$) converge in distribution to different limits $\pi_1$, $\pi_2$, say. On the other hand, if $\underline{Y} = \overline{Y}$ a.s., then these limiting distributions are the same. Also, $Y_n(a) \le Y_n(x) \le Y_n(b)$ for all $x$, so that $Y_n(x)$ converges in distribution to the same limit $\pi$, whatever $x$. Therefore, $\pi$ is the unique invariant probability. Assume that $\pi$ does not assign all its mass to a single point; that is, rule out the case that with probability 1 all the $\alpha_n$ have a common fixed point. Then, writing $Y := \underline{Y} = \overline{Y}$, there exist $m < M$ such that $P(Y < m) > 0$ and $P(Y > M) > 0$. Hence there exists $n_0$ such that $P(Y_{n_0}(b) < m) > 0$ and $P(Y_{n_0}(a) > M) > 0$. Now any $z_0 \in [m, M]$ satisfies (14.4).

As an application of Example 1, consider the following example from economics.

Example 1(a). (A Descriptive Model of Capital Accumulation). Consider an economy that has a single producible good. The economy starts with an initial stock $X_0 = x > 0$ of this good, which is used to produce an output $Y_1$ in period 1. The output $Y_1$ is not a deterministic function of the input $x$. In view of the randomness of the state of nature, $Y_1$ takes one of the values $f_r(x)$ with probability $p_r > 0$ $(1 \le r \le N)$. Here the $f_r$ are production functions having the following properties:

(i) $f_r$ is twice continuously differentiable, $f_r'(x) > 0$ and $f_r''(x) < 0$ for all $x > 0$.
(ii) $\lim_{x \downarrow 0} f_r(x) = 0$, $\lim_{x \downarrow 0} f_r'(x) > 1$, $\lim_{x \to \infty} f_r'(x) = 0$.
(iii) If $r > r'$, then $f_r(x) > f_{r'}(x)$ for all $x > 0$.

The strict concavity of $f_r$ in (i) reflects a law of diminishing returns, while (iii) assumes an ordering of the technologies or production functions from the least productive $f_1$ to the most productive $f_N$.

A fraction $\beta$ $(0 < \beta < 1)$ of the output $Y_1$ is consumed, while the rest $(1 - \beta)Y_1$ is invested for the production in the next period. The total stock $X_1$ at hand for investment in period 1 is $\theta X_0 + (1 - \beta)Y_1$. Here $0 \le \theta < 1$, and $1 - \theta$ is the rate of depreciation of capital used in production. This process continues indefinitely, each time with an independent choice of the production function ($f_r$ with probability $p_r$, $1 \le r \le N$). Thus, the capital $X_{n+1}$ at hand in period $n + 1$ satisfies

$$X_{n+1} = \theta X_n + (1 - \beta)\varphi_{n+1}(X_n) \quad (n \ge 0), \tag{14.17}$$

where $\varphi_{n+1}$ is the random production function in period $n + 1$,

$$P(\varphi_n = f_r) = p_r \quad (1 \le r \le N),$$

and the $\varphi_n$ $(n \ge 1)$ are independent. Thus the Markov process $\{X_n(x): n \ge 0\}$ on the state space $(0, \infty)$ may be represented as

$$X_n(x) = \alpha_n \cdots \alpha_1 x,$$

where, writing

$$g_r(x) := \theta x + (1 - \beta)f_r(x), \quad 1 \le r \le N, \tag{14.18}$$

one has

$$P(\alpha_n = g_r) = p_r \quad (1 \le r \le N). \tag{14.19}$$



Suppose, in addition to the assumptions already made, that

$$\theta + (1 - \beta)\lim_{x \downarrow 0} f_r'(x) > 1 \quad (1 \le r \le N), \tag{14.20}$$

i.e., $\lim_{x \downarrow 0} g_r'(x) > 1$ for all $r$. As $\lim_{x \to \infty} g_r'(x) = \theta + (1 - \beta)\lim_{x \to \infty} f_r'(x) = \theta < 1$, it follows from the strictly increasing and strict concavity properties of $g_r$ that each $g_r$ has a unique fixed point $a_r$ (see Figure 14.1):

$$g_r(a_r) = a_r \quad (1 \le r \le N). \tag{14.21}$$

Note that by property (iii) of $f_r$, $a_1 < a_2 < \cdots < a_N$. If $y \ge a_1$, then $g_r(y) \ge g_r(a_1) \ge g_1(a_1) = a_1$, so that $X_n(x) \ge a_1$ for all $n \ge 0$ if $x \ge a_1$. Similarly, if $y \le a_N$ then $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$, so that $X_n(x) \le a_N$ for all $n \ge 0$ if $x \le a_N$. As a consequence, if the initial state $x$ is in $[a_1, a_N]$, then the process $\{X_n(x): n \ge 0\}$ remains in $[a_1, a_N]$ forever. In this case, one may take $J = [a_1, a_N]$ to be the effective state space. Also, if $x > a_1$ then the $n$th iterate of $g_1$, namely $g_1^{(n)}(x)$, decreases as $n$ increases. For if $x > a_1$, then $g_1(x) < x$, $g_1^{(2)}(x) = g_1(g_1(x)) < g_1(x)$, etc. The limit of this decreasing sequence is a fixed point of $g_1$ (Exercise 3) and, therefore, must be $a_1$. Similarly, if $x < a_N$ then $g_N^{(n)}(x)$ increases, as $n$ increases, to $a_N$. In particular,

$$\lim_{n \to \infty} g_1^{(n)}(a_N) = a_1, \qquad \lim_{n \to \infty} g_N^{(n)}(a_1) = a_N.$$

Thus, there exists an integer $n_0$ such that

$$g_1^{(n_0)}(a_N) < g_N^{(n_0)}(a_1). \tag{14.22}$$

[Figure 14.1: the increasing, strictly concave graph of $g_r$ crossing the line $y = x$ at its unique fixed point $a_r$.]


This means that if $z_0 \in [g_1^{(n_0)}(a_N), g_N^{(n_0)}(a_1)]$, then

$$P(X_{n_0}(x) \le z_0 \ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_1 \text{ for } 1 \le n \le n_0) = p_1^{n_0} > 0,$$
$$P(X_{n_0}(x) \ge z_0 \ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_N \text{ for } 1 \le n \le n_0) = p_N^{n_0} > 0.$$

Hence, the condition (14.4) of Example 1 holds, and there exists a unique invariant probability $\pi$, if the state space is taken to be $[a_1, a_N]$.

Next fix the initial state $x$ in $(0, a_1)$. Then $g_1^{(n)}(x)$ increases as $n$ increases. The limit must be a fixed point of $g_1$ and, therefore, $a_1$. Since $g_r(a_1) > a_1$ for $r = 2, \ldots, N$, there exists $\varepsilon > 0$ such that $g_r(y) > a_1$ $(2 \le r \le N)$ if $y \in [a_1 - \varepsilon, a_1]$. Now find $n_\varepsilon$ such that $g_1^{(n_\varepsilon)}(x) \ge a_1 - \varepsilon$. If $\tau_1 := \inf\{n \ge 1: X_n(x) \ge a_1\}$, then it follows from the above that

$$P(\tau_1 > n_\varepsilon + k) \le p_1^k \quad (k \ge 1),$$

because $\tau_1 > n_\varepsilon + k$ implies that the last $k$ among the first $n_\varepsilon + k$ functions $\alpha_n$ are all $g_1$. Since $p_1^k$ goes to zero as $k \to \infty$, it follows from this that $\tau_1$ is a.s. finite. Also $X_{\tau_1}(x) \le a_N$, as $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$ $(1 \le r \le N)$ for $y \le a_1$, so that in a single step it is not possible to go from a state less than $a_1$ to a state larger than $a_N$. By the strong Markov property, and the result in the preceding paragraph on the existence of a unique invariant distribution and stability on $[a_1, a_N]$, it follows that $X_{\tau_1 + m}(x)$ converges in distribution to $\pi$ as $m \to \infty$ (Exercise 5). From this, one may show that $p^{(n)}(x, dy)$ converges weakly to $\pi(dy)$ for all $x$, as $n \to \infty$, so that $\pi$ is the unique invariant probability on $(0, \infty)$ (Exercise 5).

In the same manner it may be checked that $X_n(x)$ converges in distribution to $\pi$ if $x > a_N$. Thus, no matter what the initial state $x$ is, $X_n(x)$ converges in distribution to $\pi$. Therefore, on the state space $(0, \infty)$ there exists a unique invariant distribution $\pi$ (assigning probability 1 to $[a_1, a_N]$), and stability holds. In analogy with the case of Markov chains, one may call the set of states $\{x: 0 < x < a_1 \text{ or } x > a_N\}$ inessential.
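A minimal simulation may help fix ideas. The production functions below are hypothetical Cobb-Douglas choices $f_r(x) = c_r\sqrt{x}$ (they satisfy (i)-(iii), and (14.20) holds since $f_r'(0+) = \infty$), and all parameter values are illustrative only; for this choice the fixed points $a_r$ of $g_r$ are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ingredients (not from the text): f_r(x) = c_r * sqrt(x).
c = np.array([1.0, 1.5, 2.0])   # c_1 < c_2 < c_3 orders the technologies as in (iii)
p = np.array([0.3, 0.4, 0.3])   # p_r = P(phi_n = f_r)
theta, beta = 0.5, 0.4          # capital carry-over and consumption fraction

def g(r, x):
    # g_r(x) = theta*x + (1-beta)*f_r(x), the i.i.d. maps of (14.18)
    return theta * x + (1 - beta) * c[r] * np.sqrt(x)

# fixed point a_r solves theta*a + (1-beta)*c_r*sqrt(a) = a
a = ((1 - beta) * c / (1 - theta)) ** 2
print("fixed points a_r:", a)

# iterate (14.17) from an initial stock below a_1; the path enters [a_1, a_N]
x = 0.01
for n in range(200):
    x = g(rng.choice(len(c), p=p), x)
print("X_200 in [a_1, a_N]?", a[0] <= x <= a[-1])
```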
The study of the existence of unique invariant probabilities and stability is relatively simpler for those cases in which the transition probabilities $p(x, dy)$ have a density $p(x, y)$, say, with respect to some reference measure $\mu(dy)$ on the state space. In the case of Markov chains this measure may be taken to be the counting measure, assigning mass 1 to each singleton in the state space. For a class of simple examples with an uncountable state space, let $S = \mathbb{R}^1$ and $f$ a bounded measurable function on $\mathbb{R}^1$, $a \le f(x) \le b$. Let $\{\varepsilon_n\}$ be an i.i.d. sequence of real-valued random variables whose common distribution has a strictly positive continuous density $\varphi$ with respect to Lebesgue measure on $\mathbb{R}^1$. Consider the Markov process

$$X_{n+1} := f(X_n) + \varepsilon_{n+1} \quad (n \ge 0), \tag{14.23}$$

with $X_0$ arbitrary (independent of $\{\varepsilon_n\}$). Then the transition probability $p(x, dy)$ has the density

$$p(x, y) := \varphi(y - f(x)). \tag{14.24}$$

Note that

$$\varphi(y - f(x)) \ge \psi(y) \quad \text{for all } x \in \mathbb{R}, \tag{14.25}$$

where

$$\psi(y) := \min\{\varphi(y - z): a \le z \le b\} > 0.$$

Then (see theoretical complement 6.1) it follows that this Markov process has a unique invariant probability with a density $\pi(y)$ and that the distribution of $X_n$ converges to $\pi(y)\,dy$, whatever the initial state.
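As an illustration of (14.23)-(14.25), the sketch below takes $f(x) = \sin x$ (so $a = -1$, $b = 1$) and standard normal $\varepsilon_n$, choices made here purely for concreteness, and checks that the marginal distribution of $X_n$ after many steps is insensitive to the initial state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choices (not from the text): f(x) = sin(x) is bounded, and the
# Gaussian noise density is strictly positive and continuous, as required.
def simulate(x0, n_steps, reps, rng):
    x = np.full(reps, x0, dtype=float)
    for _ in range(n_steps):
        x = np.sin(x) + rng.standard_normal(reps)   # the recursion (14.23)
    return x

# Marginal distributions after 50 steps from two far-apart initial states:
xa = simulate(-10.0, 50, 50000, rng)
xb = simulate(+10.0, 50, 50000, rng)
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(xa, qs).round(2))   # the two sets of quantiles agree,
print(np.quantile(xb, qs).round(2))   # reflecting the unique invariant density
```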
The following example illustrates the dramatic difference between the cases when a density exists and when it does not.

Example 2. Let $S = [-2, 2]$ and consider the Markov process $X_{n+1} = f(X_n) + \varepsilon_{n+1}$ $(n \ge 0)$, $X_0$ independent of $\{\varepsilon_n\}$, where $\{\varepsilon_n\}$ is an i.i.d. sequence with values in $[-1, 1]$, and

$$f(x) = \begin{cases} x + 1 & \text{if } -2 \le x \le 0, \\ x - 1 & \text{if } 0 < x \le 2. \end{cases} \tag{14.26}$$

First let $\varepsilon_n$ be Bernoulli, $P(\varepsilon_n = 1) = \tfrac{1}{2} = P(\varepsilon_n = -1)$. Then, with $X_0 = x \in (0, 2]$,

$$X_1(x) = \begin{cases} x - 2 & \text{with probability } \tfrac{1}{2}, \\ x & \text{with probability } \tfrac{1}{2}, \end{cases}$$

and $X_1(x - 2)$ has the same distribution as $X_1(x)$. It follows that

$$P(X_2(x) = x - 2 \mid X_1(x) = x) = \tfrac{1}{2} = P(X_2(x) = x - 2 \mid X_1(x) = x - 2),$$
$$P(X_2(x) = x \mid X_1(x) = x) = \tfrac{1}{2} = P(X_2(x) = x \mid X_1(x) = x - 2).$$

In other words, $X_1(x)$ and $X_2(x)$ are independent and have the same two-point distribution $\pi_x$. It follows that $\{X_n(x): n \ge 1\}$ is i.i.d. with common distribution $\pi_x$. In particular, $\pi_x$ is an invariant initial distribution. If $x \in [-2, 0]$, then $\{X_n(x): n \ge 1\}$ is i.i.d. with common distribution $\pi_{x+2}$, assigning probabilities $\tfrac{1}{2}$ and $\tfrac{1}{2}$ to $\{x + 2\}$ and $\{x\}$. Thus, there is an uncountable family of invariant initial distributions $\{\pi_x: 0 < x \le 1\} \cup \{\pi_{x+2}: -1 \le x \le 0\}$.

On the other hand, suppose $\varepsilon_n$ is uniform on $[-1, 1]$, i.e., has the density $\tfrac{1}{2}$ on $[-1, 1]$ and zero outside. Check that (Exercise 6) $\{X_{2n}(x): n \ge 1\}$ is an i.i.d. sequence whose common distribution does not depend on $x$ and has a density

$$\pi(y) = \frac{2 - |y|}{4}, \quad -2 \le y \le 2. \tag{14.27}$$

Thus, $\pi(y)\,dy$ is the unique invariant probability, and stability holds.
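The contrast in Example 2 is easy to reproduce numerically. In the sketch below (the initial states are illustrative), Bernoulli noise leaves the chain forever on the two-point set $\{x - 2, x\}$, while uniform noise produces the triangular density (14.27) from any starting point.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    # the two-branch map (14.26) on S = [-2, 2]
    return np.where(x <= 0, x + 1.0, x - 1.0)

def run(x0, n_steps, reps, noise):
    x = np.full(reps, x0)
    for _ in range(n_steps):
        x = f(x) + noise(reps)
    return x

# Bernoulli noise: the limit law depends on the initial state x
bern = lambda m: rng.choice([-1.0, 1.0], size=m)
print(np.unique(np.round(run(0.3, 20, 1000, bern), 6)))   # supported on {-1.7, 0.3}
print(np.unique(np.round(run(1.2, 20, 1000, bern), 6)))   # supported on {-0.8, 1.2}

# Uniform noise: the triangular density (14.27), whatever the start
unif = lambda m: rng.uniform(-1.0, 1.0, size=m)
sample = run(0.3, 20, 100000, unif)
hist, edges = np.histogram(sample, bins=8, range=(-2, 2), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist, 3))                    # empirical density
print(np.round((2 - np.abs(mid)) / 4, 3))   # pi(y) = (2-|y|)/4 at bin centers
```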

The final example deals with absorption probabilities.

Example 3. (Survival Probability of an Economic Agent). Suppose that in a one-good economy the agent starts with an initial stock $x$. A fixed amount $c > 0$ is consumed ($x > c$) and the remainder $x - c$ is invested for production in the next period. The stock produced in period 1 is $X_1 = \varepsilon_1(X_0 - c) = \varepsilon_1(x - c)$, where $\varepsilon_1$ is a nonnegative random variable. Again, after consumption, $X_1 - c$ is invested in production of a stock of $\varepsilon_2(X_1 - c)$, provided $X_1 > c$. If $X_1 \le c$, the agent is ruined. In general,

$$X_0 = x, \qquad X_{n+1} = \varepsilon_{n+1}(X_n - c) \quad (n \ge 0), \tag{14.28}$$

where $\{\varepsilon_n: n \ge 1\}$ is an i.i.d. sequence of nonnegative random variables. The state space may be taken to be $[0, \infty)$ with absorption at 0. The probability of survival of the economic agent, starting with an initial stock $x > c$, is

$$p(x) := P(X_n > c \text{ for all } n \ge 0 \mid X_0 = x). \tag{14.29}$$

If $\delta = P(\varepsilon_1 = 0) > 0$, then it is simple to check that $P(\varepsilon_n = 0 \text{ for some } n \ge 0) = 1$ (Exercise 8), so that $p(x) = 0$ for all $x$. Therefore, assume

$$P(\varepsilon_1 > 0) = 1. \tag{14.30}$$

From (14.28) one gets, by successive iteration,

$$X_{n+1} > c \iff X_n > c + \frac{c}{\varepsilon_{n+1}} \iff X_{n-1} > c + \frac{c}{\varepsilon_n} + \frac{c}{\varepsilon_n \varepsilon_{n+1}} \iff \cdots \iff X_0 = x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1 \varepsilon_2} + \cdots + \frac{c}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_{n+1}}.$$

Hence, on the set $\{\varepsilon_n > 0 \text{ for all } n\}$,

$$\{X_n > c \text{ for all } n\} = \left\{x > c + \sum_{n=1}^{N} \frac{c}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \text{ for all } N\right\} = \left\{\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \le \frac{x}{c} - 1\right\}.$$

In other words,

$$p(x) = P\left(\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \le \frac{x}{c} - 1\right). \tag{14.31}$$

This formula will be used to determine conditions on the common distribution of $\varepsilon_n$ under which (1) $p(x) = 0$, (2) $p(x) = 1$, or (3) $p(x) < 1$ $(x > c)$. Suppose first that $E \log \varepsilon_1$ exists and $E \log \varepsilon_1 < 0$. Then, by the Strong Law of Large Numbers,

$$\frac{1}{n} \sum_{r=1}^{n} \log \varepsilon_r \to E \log \varepsilon_1 < 0 \quad \text{a.s.},$$

so that $\log(\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n) \to -\infty$ a.s., or $\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n \to 0$ a.s. This implies that the infinite series in (14.31) diverges a.s., that is,

$$p(x) = 0 \text{ for all } x, \text{ if } E \log \varepsilon_1 < 0. \tag{14.32}$$

Now by Jensen's Inequality (Chapter 0, Section 2), $E \log \varepsilon_1 \le \log E\varepsilon_1$, with strict inequality unless $\varepsilon_1$ is degenerate. Therefore, if $E\varepsilon_1 < 1$, or $E\varepsilon_1 = 1$ and $\varepsilon_1$ is nondegenerate, then $E \log \varepsilon_1 < 0$. If $\varepsilon_1$ is degenerate and $E\varepsilon_1 = 1$, then $P(\varepsilon_1 = 1) = 1$, and the infinite series in (14.31) diverges. Therefore, (14.32) implies

$$p(x) = 0 \text{ for all } x, \text{ if } E\varepsilon_1 \le 1. \tag{14.33}$$

It is not true, however, that $E \log \varepsilon_1 > 0$ implies $p(x) = 1$ for large $x$. To see this, and for some different criteria, define

$$m := \inf\{z > 0: P(\varepsilon_1 \le z) > 0\}. \tag{14.34}$$

Let us show that

$$p(x) < 1 \text{ for all } x, \text{ if } m \le 1. \tag{14.35}$$

For this, fix $A > 0$, however large. Find $n_0$ such that

$$n_0 > A \prod_{r=1}^{\infty} \left(1 + \frac{1}{r^2}\right). \tag{14.36}$$

This is possible, as $\prod_{r=1}^{\infty} (1 + 1/r^2) \le \exp\{\sum_{r=1}^{\infty} 1/r^2\} < \infty$. If $m \le 1$ then $P(\varepsilon_r \le 1 + 1/r^2) > 0$ for all $r \ge 1$. Hence,

$$0 < P(\varepsilon_r \le 1 + 1/r^2 \text{ for } 1 \le r \le n_0) \le P\left(\sum_{r=1}^{n_0} \frac{1}{\varepsilon_1 \cdots \varepsilon_r} \ge \sum_{r=1}^{n_0} \prod_{i=1}^{r} \frac{1}{1 + 1/i^2}\right) \le P\left(\sum_{r=1}^{\infty} \frac{1}{\varepsilon_1 \cdots \varepsilon_r} > A\right),$$

since, by (14.36), $\sum_{r=1}^{n_0} \prod_{i=1}^{r} (1 + 1/i^2)^{-1} \ge n_0 \prod_{r=1}^{\infty} (1 + 1/r^2)^{-1} > A$. Because $A$ is arbitrary, (14.31) is less than 1 for all $x$, proving (14.35).


One may also show that, if $m > 1$, then

$$p(x) \begin{cases} < 1 & \text{if } x < \dfrac{cm}{m-1}, \\[1ex] = 1 & \text{if } x \ge \dfrac{cm}{m-1} \end{cases} \qquad (m > 1). \tag{14.37}$$

To prove this, observe that

$$\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \cdots \varepsilon_n} \le \sum_{n=1}^{\infty} \frac{1}{m^n} = \frac{1}{m-1}$$

with probability 1 (if $m > 1$). Therefore, (14.31) implies the second relation in (14.37). In order to prove the first relation in (14.37), let $x < cm/(m-1) - c\delta$ for some $\delta > 0$. Then $x/c - 1 < 1/(m-1) - \delta$. Choose $n(\delta)$ such that

$$\sum_{n=n(\delta)}^{\infty} \frac{1}{m^n} < \frac{\delta}{2}, \tag{14.38}$$

and then choose $\delta_r > 0$ $(1 \le r \le n(\delta) - 1)$ such that

$$\sum_{r=1}^{n(\delta)-1} \frac{1}{(m + \delta_1)\cdots(m + \delta_r)} > \sum_{r=1}^{n(\delta)-1} \frac{1}{m^r} - \frac{\delta}{2}. \tag{14.39}$$

Then

$$0 < P(\varepsilon_r \le m + \delta_r \text{ for } 1 \le r \le n(\delta) - 1) \le P\left(\sum_{r=1}^{n(\delta)-1} \frac{1}{\varepsilon_1 \cdots \varepsilon_r} \ge \sum_{r=1}^{n(\delta)-1} \frac{1}{m^r} - \frac{\delta}{2}\right) \le P\left(\sum_{r=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_r} > \frac{1}{m-1} - \delta\right).$$

If $\delta > 0$ is small enough, the last probability is smaller than $P(\sum_{r=1}^{\infty} 1/(\varepsilon_1 \cdots \varepsilon_r) > x/c - 1)$, provided $x/c - 1 < 1/(m-1)$, i.e., if $x < cm/(m-1)$. Thus for such $x$ one has $1 - p(x) > 0$, proving the first relation in (14.37).
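Formula (14.31) also lends itself to Monte Carlo evaluation. The sketch below assumes, purely for illustration, $\varepsilon_n$ uniform on $[1.5, 2.5]$, so that $m = 1.5$ and (14.37) predicts $p(x) = 1$ exactly when $x \ge cm/(m-1) = 3c$; the random series is truncated after enough terms that the geometric tail is negligible.

```python
import numpy as np

rng = np.random.default_rng(4)

def survival_prob(x, c=1.0, reps=20000, n_terms=100):
    # eps_1, ..., eps_N i.i.d. Uniform[1.5, 2.5] (an assumption for
    # illustration); terms of the series decay at least like 1.5^-n, so
    # the truncation error after 100 terms is negligible
    eps = rng.uniform(1.5, 2.5, size=(reps, n_terms))
    series = (1.0 / np.cumprod(eps, axis=1)).sum(axis=1)
    return (series <= x / c - 1.0).mean()   # Monte Carlo estimate of (14.31)

for x in [1.5, 2.0, 2.5, 3.0, 3.5]:
    print(x, survival_prob(x))   # jumps to 1 at the threshold x = 3c
```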

15 CHAPTER APPLICATION: DATA COMPRESSION AND ENTROPY

The mathematical theory of information storage and retrieval rests largely on foundations established by Claude Shannon. In the present section we will consider one aspect of the general theory. We will suppose that "text" is constructed from symbols in a finite alphabet $S = \{a_1, a_2, \ldots, a_M\}$. The term


"text" may be interpreted rather broadly and need not be restricted to the text
of ordinary human language; other usages occur in genetic sequences of DNA,
computer data storage, music, etc. However, applications to linguistics have
played a central role in the historical development as will be discussed below.
An encoding algorithm is a transformation in which finite sequences of text are
replaced by sequences of code symbols in such a way that it must be possible
to uniquely reconstruct (decode) the original text from the encoded text. For
simplicity we shall consider compression codes in which the same alphabet is
available for code symbols. Any sequence of symbols is referred to as a word
and the number of symbols as its length. A word of length t will be encoded
into a code-word of length s by the encoding algorithm. To compress the text
one would use a short code for frequently occurring words and reserve the
longer code-words for the more rarely occurring sequences. It is in this
connection that the statistical structure of text (i.e., word frequencies) will play
an important role. We consider a case in which the symbols occur in sequence
according to a Markov chain having a stationary transition law
$((p_{ij})): i, j = 1, 2, \ldots, M$. Shannon has suggested the following scheme for
generating a Markov approximation to English text. Open a book and select
a letter at random, say T. Next skip a few lines and read until a T is encountered,
and select the letter that follows this T (we observed an H in our trial). Next
skip a few more lines, and read until an H is encountered and take the next
letter (we observed an A), and so on to generate a sample of text that should
be more closely Markovian than text composed according to the usual rules
of English grammar. Moreover, one expects this to resemble more closely the
structure of the English language than independent samples of randomly selected
single letters. Accordingly, one may also consider higher-order Markov
approximations to the structure of English text, for example, by selecting letters
according to the two preceding letters. Likewise, as was also done by Shannon
and others, one may generate text by using "linguistic words" as the basic
alphabetic symbols (see theoretical complement 1 for references). It is of
significant historical note that, in spite of their modern-day widespread utility in
diverse physical sciences, Markov himself developed his ideas on dependence
with linguistic applications in mind.
Let $X_0, X_1, \ldots$ denote the Markov chain with state space $S$ having the stationary transition law $((p_{ij}))$ and a unique invariant initial distribution $\pi = (\pi_i)$. Then $\{X_n\}$ is a stationary process and the word $a = (a_{i_1}, a_{i_2}, \ldots, a_{i_t})$ of length $t$ has probability $\pi_{i_1} p_{i_1 i_2} \cdots p_{i_{t-1} i_t}$ of occurring. Suppose that under the coding transformation the word $a = (a_{i_1}, \ldots, a_{i_t})$ is encoded to a word of length $s = c(a_{i_1}, \ldots, a_{i_t})$. Let

$$\mu_t = E[c(X_1, \ldots, X_t)]/t. \tag{15.1}$$

The quantity $\mu_t$ is referred to as the average compression for words of length $t$. The optimal extent to which a given (statistical) type of text can be compressed by a code is measured by the so-called compression coefficient defined by

$$\mu = \limsup_{t \to \infty} \mu_t. \tag{15.2}$$

The problem for our consideration here is to calculate the compression coefficient in terms of the parameters of the given Markov structure of the text and to construct an optimum compression code. We will show that the optimal compression coefficient is given by

$$\mu = \frac{H}{\log M} := \frac{\sum_i \pi_i H_i}{\log M} = \frac{-\sum_i \pi_i \sum_j p_{ij} \log p_{ij}}{\log M}, \tag{15.3}$$

in the sense that the compression coefficient of a code is never smaller than this, although there are codes whose coefficient is arbitrarily close to it.

The parameter $H_i = -\sum_j p_{ij} \log p_{ij}$ is referred to as the entropy of the transition distribution from state $i$ and is a measure of the information obtained when the Markov chain moves one step ahead out of state $i$ (Exercises 1 and 2). The quantity $H = \sum_i \pi_i H_i$ is called the entropy of the Markov chain. Observe that, given the transition law of the Markov chain, the optimal compression coefficient may easily be computed from (15.3) once the invariant initial distribution is determined.
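For a concrete computation of (15.3), the sketch below uses a hypothetical two-state transition matrix (not from the text), finds $\pi$ as the left eigenvector for eigenvalue 1, and evaluates $H$ and the compression coefficient $H/\log M$ (the ratio is independent of the base of the logarithm).

```python
import numpy as np

# Entropy of a Markov chain per (15.3): H = sum_i pi_i H_i with
# H_i = -sum_j p_ij log p_ij. The two-state matrix is illustrative only.
p = np.array([[0.9, 0.1],
              [0.4, 0.6]])
M = p.shape[0]

# invariant distribution: left eigenvector of p for eigenvalue 1
w, v = np.linalg.eig(p.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

H_i = -(p * np.log(p)).sum(axis=1)   # entropy out of each state
H = (pi * H_i).sum()                 # entropy of the chain
print("pi =", pi.round(4), " H =", H.round(4),
      " compression coefficient =", (H / np.log(M)).round(4))
```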
For a word $a$ of length $t$ let $p_t(a) = P((X_0, \ldots, X_{t-1}) = a)$. Then,

$$-\log p_t((X_0, \ldots, X_{t-1})) = -\log \pi_{X_0} - \sum_{i=1}^{t-1} \log p_{X_{i-1}, X_i} = Y_0 + \sum_{i=1}^{t-1} Y_i, \tag{15.4}$$

where $Y_1, Y_2, \ldots$ is a stationary sequence of bounded random variables. By the law of large numbers applied to the stationary sequence $Y_1, Y_2, \ldots$, we have by Theorems 9.2 and 9.3 (see Exercise 10.5),

$$\frac{-\log p_t((X_0, \ldots, X_{t-1}))}{t} \to EY_1 = H \tag{15.5}$$

as $t \to \infty$ with probability 1 (Exercise 4); i.e., for almost all sample realizations, for large $t$ the probability of the sequence $X_0, X_1, \ldots, X_{t-1}$ is approximately $\exp\{-tH\}$. The result (15.5) is quite remarkable. It has a natural generalization that applies to a large class of stationary processes so long as a law of large numbers applies (Exercise 4).

An important consequence of (15.5) that will be used below is obtained by considering for each $t$ the $M^t$ words of length $t$ arranged as $a_{(1)}, a_{(2)}, \ldots$ in order of decreasing probability. For any positive number $\varepsilon < 1$, let

$$N_t(\varepsilon) = \min\left\{N: \sum_{n=1}^{N} p_t(a_{(n)}) \ge \varepsilon\right\}. \tag{15.6}$$


Proposition 15.1. For any $0 < \varepsilon < 1$,

$$\lim_{t \to \infty} \frac{\log N_t(\varepsilon)}{t} = H. \tag{15.7}$$

Proof. Since almost sure convergence implies convergence in probability, it follows from (15.5) that for arbitrarily small positive numbers $\gamma$ and $\delta$, $\delta < \min\{\varepsilon, 1 - \varepsilon\}$, for all sufficiently large $t$ we have

$$P\left(\left|\frac{-\log p_t(X_0, \ldots, X_{t-1})}{t} - H\right| < \gamma\right) > 1 - \delta.$$

In particular, for all sufficiently large $t$, say $t \ge T$,

$$\exp\{-t(H + \gamma)\} < p_t(X_0, \ldots, X_{t-1}) < \exp\{-t(H - \gamma)\}$$

with probability at least $1 - \delta$. Let $R_t$ denote the set consisting of all words $a$ of length $t$ such that $e^{-t(H+\gamma)} < p_t(a) < e^{-t(H-\gamma)}$. Fix $t$ larger than $T$. Let $S_t = \{a_{(1)}, a_{(2)}, \ldots, a_{(N_t(\varepsilon))}\}$. The sum of the probabilities of the $M_t(\varepsilon)$, say, words $a$ of length $t$ in $R_t$ that are counted among the $N_t(\varepsilon)$ words in $S_t$ equals $\sum_{a \in S_t \cap R_t} p_t(a) > \varepsilon - \delta$, by definition of $N_t(\varepsilon)$. Therefore,

$$M_t(\varepsilon) e^{-t(H-\gamma)} \ge \sum_{a \in S_t \cap R_t} p_t(a) > \varepsilon - \delta. \tag{15.8}$$

Also, none of the elements of $S_t$ has probability less than $\exp\{-t(H + \gamma)\}$, since the set of all $a$ with $p_t(a) > \exp\{-t(H + \gamma)\}$ contains $R_t$ and has total probability larger than $1 - \delta > \varepsilon$. Therefore,

$$N_t(\varepsilon) e^{-t(H+\gamma)} \le 1. \tag{15.9}$$

Taking logarithms, we have

$$\frac{\log N_t(\varepsilon)}{t} \le H + \gamma.$$

On the other hand, by (15.8),

$$N_t(\varepsilon) e^{-t(H-\gamma)} \ge M_t(\varepsilon) e^{-t(H-\gamma)} > \varepsilon - \delta. \tag{15.10}$$

Again taking logarithms and now combining this with (15.9), we get

$$H - \gamma + \frac{\log(\varepsilon - \delta)}{t} < \frac{\log N_t(\varepsilon)}{t} \le H + \gamma.$$

Since $\gamma$ and $\delta$ are arbitrarily small the proof is complete. $\blacksquare$
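The convergence (15.5) behind this proposition can be observed directly. The sketch below simulates a long path of the same illustrative two-state chain used above and accumulates log-probabilities; $-\log p_t/t$ comes out close to $H$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Empirical check of (15.5) for an illustrative two-state chain.
p = np.array([[0.9, 0.1],
              [0.4, 0.6]])
w, v = np.linalg.eig(p.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()
H = -(pi[:, None] * p * np.log(p)).sum()

t = 100000
x = rng.choice(2, p=pi)          # start in the invariant distribution
log_prob = np.log(pi[x])
for _ in range(t - 1):
    y = rng.choice(2, p=p[x])    # one step of the chain
    log_prob += np.log(p[x, y])
    x = y
print("-log p_t / t =", round(-log_prob / t, 4), " H =", round(H, 4))
```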



Returning to the problem of calculating the compression coefficient, first let us show that $\mu \ge H/\log M$. Let $\delta > 0$ and let $H' = H - 2\delta < H$. For an arbitrary given code, let

$$J_t = \{a: a \text{ is a word of length } t \text{ and } c(a) < tH'/\log M\}. \tag{15.11}$$

Then

$$\#J_t \le M + M^2 + \cdots + M^{[tH'/\log M]} \le M^{tH'/\log M}(1 + 1/M + \cdots) = \frac{M}{M-1}\, e^{tH'}, \tag{15.12}$$

since the number of code-words of length $k$ is $M^k$. Now observe that

$$t\mu_t = Ec(X_1, \ldots, X_t) \ge \frac{tH'}{\log M}\, P\{(X_1, \ldots, X_t) \notin J_t\} = \frac{tH'}{\log M}\, [1 - P\{(X_1, \ldots, X_t) \in J_t\}]. \tag{15.13}$$

Therefore,

$$\mu = \limsup_{t \to \infty} \mu_t \ge \frac{H'}{\log M} \limsup_{t \to \infty}\, [1 - P\{(X_1, \ldots, X_t) \in J_t\}]. \tag{15.14}$$

Now observe that for any positive number $\varepsilon < 1$, for the probability $P(\{(X_1, \ldots, X_t) \in J_t\})$ to exceed $\varepsilon$ requires that $N_t(\varepsilon)$ be smaller than $\#J_t$. In view of (15.12) this means that

$$N_t(\varepsilon) \le \frac{M}{M-1} \exp\{t(H - 2\delta)\}, \tag{15.15}$$

or

$$\frac{\log N_t(\varepsilon)}{t} \le o(1) + H - 2\delta. \tag{15.16}$$

Now by Proposition 15.1, for any given $\varepsilon$ this can hold for at most finitely many values of $t$. In other words, we must have that the probability $P(\{(X_1, \ldots, X_t) \in J_t\})$ tends to 0 in the limit as $t$ grows without bound. Therefore, (15.14) becomes

$$\mu \ge \frac{H'}{\log M} = \frac{H - 2\delta}{\log M}, \tag{15.17}$$

and since $\delta > 0$ is arbitrary we get $\mu \ge H/\log M$ as desired.



To prove the reverse inequality, and therefore (15.3), again let $\delta$ be an arbitrarily small positive number. We shall construct a code whose compression coefficient $\mu$ does not exceed $(H + \delta)/\log M$. For arbitrary positive numbers $\gamma$ and $\varepsilon$, we have

$$N_t(1 - \varepsilon) \le e^{t(H+\gamma)} = M^{t(H+\gamma)/\log M} \tag{15.18}$$

for all sufficiently large $t$. That is, the number of (relatively) high-probability words of length $t$, the sum of whose probabilities exceeds $1 - \varepsilon$, is no greater than the number $M^{t(H+\gamma)/\log M}$ of words of length $t(H + \gamma)/\log M$. Therefore, there are enough distinct sequences of length $t(H + \gamma)/\log M$ to code the $N_t(1 - \varepsilon)$ words "most likely to occur." For the lower-probability words, the sum of whose probabilities does not exceed $1 - (1 - \varepsilon) = \varepsilon$, just code each one as itself. To ensure uniqueness for decoding, one may put one of the previously unused sequences of length $t(H + \gamma)/\log M$ in front of each of the self-coded terms. The length $c(X_0, X_1, \ldots, X_{t-1})$ of code-words for such a code is then either $t(H + \gamma)/\log M$ or $t + t(H + \gamma)/\log M$, the latter occurring with probability at most $\varepsilon$. Therefore,

$$Ec(X_0, X_1, \ldots, X_{t-1}) \le \frac{t(H + \gamma)}{\log M} + \varepsilon\left[t + \frac{t(H + \gamma)}{\log M}\right] \le \frac{t(H + \delta)}{\log M}, \tag{15.19}$$

where $\delta = \varepsilon H + \varepsilon \log M + \varepsilon\gamma + \gamma$. The desired inequality now follows.

EXERCISES

Exercises for Section II.1

1. Verify that the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is the conditional distribution of $X_{n+1}$ given $X_n$ if and only if the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is a (measurable) function of $X_n$ alone. [Hint: Use properties of conditional expectations, Section 0.4.]
2. Show that the simple random walk has the Markov property.
3. Show that every discrete-parameter stochastic process with independent increments
is a Markov process.
4. Let $A$, $B$, $C$ be events with $C$, $B \cap C$ having positive probabilities. Verify that the following are equivalent versions of the conditional independence of $A$ and $B$ given $C$: $P(A \cap B \mid C) = P(A \mid C)P(B \mid C)$ if and only if $P(A \mid B \cap C) = P(A \mid C)$.

5. (i) Let $\{X_n\}$ be a sequence of random variables with denumerable state space $S$. Call $\{X_n\}$ $r$th order Markov-dependent if

$$P(X_{n+1} = j \mid X_0 = i_0, \ldots, X_n = i_n) = P(X_{n+1} = j \mid X_{n-r+1} = i_{n-r+1}, \ldots, X_n = i_n) \quad \text{for } i_0, \ldots, i_n, j \in S,\ n \ge r.$$

Show that $Y_n = (X_n, X_{n+1}, \ldots, X_{n+r-1})$, $n = 0, 1, 2, \ldots$ is a (first-order) Markov chain under these circumstances.
(ii) Let $V_n = X_{n+1} - X_n$, $n = 0, 1, 2, \ldots$. Show that if $\{X_n\}$ is a Markov chain, then so is $\{(X_n, V_n)\}$. [Hint: Consider first $\{(X_n, X_{n+1})\}$ and then apply a one-to-one transformation.]
6. Show that a necessary and sufficient condition on the correlations of a discrete-parameter stationary Gaussian stochastic process $\{X_n\}$ to have the Markov property is $\mathrm{Cov}(X_n, X_{n+m}) = \sigma^2 \rho^m$ for some $\sigma^2 > 0$, $|\rho| \le 1$, $m, n = 0, 1, 2, \ldots$. [A stochastic process $\{X_n: n \ge 0\}$ is said to be stationary if, for all $n, m$, the distribution of $(X_0, \ldots, X_n)$ is the same as that of $(X_m, X_{m+1}, \ldots, X_{m+n})$.]

7. Let $\{Y_n\}$ be an i.i.d. sequence of $\pm 1$-valued Bernoulli random variables with parameter $0 < p < 1$. Define a new stochastic process by $X_n = (Y_n + Y_{n-1})/2$, for $n = 1, 2, \ldots$. Show that $\{X_n\}$ does not have the Markov property.
8. Let $\{S_n\}$ denote the simple symmetric random walk starting at the origin and let $R_n = |S_n|$. Show that $\{R_n\}$ is a Markov chain.
9. (Random Walk in Random Scenery) Let $\{Y_n: n \in \mathbb{Z}\}$ be a symmetric i.i.d. sequence of $\pm 1$-valued random variables indexed by the set of integers $\mathbb{Z}$. Let $\{S_n\}$ be the simple symmetric random walk on the state space $\mathbb{Z}$ starting at $S_0 = 0$. The random walk $\{S_n\}$ is assumed to be independent of the random scenery $\{Y_n\}$. Define a new process $\{X_n\}$ by noting down the scenery at each integer site upon arrival in the course of the walk. That is, $X_n = Y_{S_n}$, $n = 0, 1, 2, \ldots$.
(i) Calculate $EX_n$. [Hint: $X_n = \sum_{m=-\infty}^{\infty} Y_m 1_{\{S_n = m\}}$.]
(*ii) Show that $\{X_n\}$ is stationary. [See Exercise 6 for a definition of stationarity.]
(iii) Is $\{X_n\}$ Markovian?
(iv) Show that $\mathrm{Cov}(X_n, X_{n+m}) \sim (2/\pi)^{1/2} m^{-1/2}$ for large even $m$, and is zero for odd $m$. (For an analysis of the "long-range dependence" in this example, see H. Kesten and F. Spitzer (1979), "A Limit Theorem Related to a New Class of Self-Similar Processes," Z. Wahr. Verw. Geb., 50, 5-25.)
10. Let $\{Z_n: n = 0, 1, 2, \ldots\}$ be i.i.d. $\pm 1$-valued with $P(Z_n = 1) = p \ne \tfrac{1}{2}$. Define $X_n = Z_n Z_{n+1}$, $n = 0, 1, 2, \ldots$. Show that for $k \le n - 1$, $P(X_{n+1} = j \mid X_k = i) = P(X_{n+1} = j)$, i.e., $X_{n+1}$ and $X_k$ are independent for each $k = 0, \ldots, n - 1$, $n \ge 1$. Is $\{X_n\}$ a Markov chain?

Exercises for Section II.2

1. (i) Show that the transition matrix for a sequence of independent integer-valued random variables is characterized by the property that its rows are identical; i.e., $p_{ij} = p_j$ for all $i, j \in S$.
(ii) Under what further condition is the Markov chain an i.i.d. sequence?

2. (i) Let $\{Y_n\}$ be a Markov chain with a one-step transition matrix $p$. Suppose that the process $\{Y_n\}$ is viewed only at every $m$th time step ($m$ fixed) and let $X_n = Y_{nm}$, for $n = 0, 1, 2, \ldots$. Show that $\{X_n\}$ is a Markov chain with one-step transition law given by $p^m$.
(ii) Suppose $\{X_n\}$ is a Markov chain with transition probability matrix $p$. Let $n_1 < n_2 < \cdots < n_k$. Prove that

$$P(X_{n_k} = j \mid X_{n_1} = i_1, \ldots, X_{n_{k-1}} = i_{k-1}) = p^{(n_k - n_{k-1})}_{i_{k-1}\, j}.$$

3. (Random Walks on a Group) Let $G$ be a finite group with group operation denoted by $\oplus$. That is, $G$ is a nonempty set and $\oplus$ is a well-defined binary operation for $G$ such that (i) if $x, y \in G$ then $x \oplus y \in G$; (ii) if $x, y, z \in G$ then $x \oplus (y \oplus z) = (x \oplus y) \oplus z$; (iii) there is an $e \in G$ such that $x \oplus e = e \oplus x = x$ for all $x \in G$; (iv) for each $x \in G$ there is an element in $G$, denoted $\ominus x$, such that $x \oplus (\ominus x) = (\ominus x) \oplus x = e$. If $\oplus$ is commutative, i.e., $x \oplus y = y \oplus x$ for all $x, y \in G$, then $G$ is called abelian. Let $X_1, X_2, \ldots$ be i.i.d. random variables taking values in $G$ and having the common probability distribution $Q(g) = P(X_1 = g)$, $g \in G$.
(i) Show that the random walk on $G$ defined by $S_n = X_0 \oplus X_1 \oplus \cdots \oplus X_n$, $n \ge 0$, is a Markov chain and calculate its transition probability matrix. Note that it is not necessary for $G$ to be abelian for $\{S_n\}$ to be Markov.
(ii) (Top-In Card Shuffles) Construct a model for card shuffling as a Markov chain on a (nonabelian) permutation group on $N$ symbols in which the top card of the deck is inserted at a randomly selected location in the deck at each shuffle.
(iii) Calculate the transition probability matrix for $N = 3$. [Hint: Shuffles are of the form $(c_1, c_2, c_3) \to (c_2, c_1, c_3)$ or $(c_2, c_3, c_1)$ only.] Also see Exercise 4.5.

4. An individual with a highly contagious disease enters a population. During each subsequent period, either the carrier will infect a new person or be discovered and removed by public health officials. A carrier is discovered and removed with probability $q = 1 - p$ at each unit of time. An unremoved infected individual is sure to infect someone in each time unit. The time evolution of the number of infected individuals in the population is assumed to be a Markov chain $\{X_n: n = 0, 1, 2, \ldots\}$. What are its transition probabilities?

5. The price of a certain commodity varies over the values 1, 2, 3, 4, 5 units depending on supply and demand. The price $X_n$ at time $n$ determines the demand $D_n$ at time $n$ through the relation $D_n = N - X_n$, where $N$ is a constant larger than 5. The supply $C_n$ at time $n$ is given by $C_n = N - 3 + \varepsilon_n$, where $\{\varepsilon_n\}$ is an i.i.d. sequence of equally likely $\pm 1$-valued Bernoulli random variables. Price changes are made according to the following policy:

$$X_{n+1} - X_n = +1 \ \text{if } D_n - C_n > 0, \qquad X_{n+1} - X_n = -1 \ \text{if } D_n - C_n < 0, \qquad X_{n+1} - X_n = 0 \ \text{if } D_n - C_n = 0.$$

(i) Fix $X_0 = i_0$. Show that $\{X_n\}$ is a Markov chain with state space $S = \{1, 2, 3, 4, 5\}$.
(ii) Compute the transition probability matrix of $\{X_n\}$.
(iii) Calculate the two-step transition probabilities.
6. A reservoir has finite capacity of $h$ units, where $h$ is a positive integer. The daily inputs are i.i.d. integer-valued random variables $\{J_n: n = 1, 2, \ldots\}$ with the common p.m.f. $\{g_j = P(J_1 = j),\ j = 0, 1, 2, \ldots\}$. One unit of water is released through the dam at the end of each day provided that the reservoir is not empty or does not exceed its capacity. If it is empty, there is no release. If it exceeds capacity, then the excess water is released. Let $X_n$ denote the amount of water left in the reservoir on the $n$th day after release of water. Compute the transition matrix for $\{X_n\}$.

7. Suppose that at each unit of time each particle located in a fixed region of space has probability $p$, independently of the other particles present, of leaving the region. Also, at each unit of time a random number of new particles having Poisson distribution with parameter $\lambda$ enter the region independently of the number of particles already present at time $n$. Let $X_n$ denote the number of particles in the region at time $n$. Calculate the transition matrix of the Markov chain $\{X_n\}$.

8. We are given two boxes A and B containing a total of $N$ labeled balls. A ball is selected at random (all selections being equally likely) at time $n$ from among the $N$ balls and then a box is selected at random. Box A is selected with probability $p$ and B with probability $q = 1 - p$, independently of the ball selected. The selected ball is moved to the selected box, unless the ball is already in it. Consider the Markov evolution of the number $X_n$ of balls in box A. Calculate its transition matrix.

9. Each cell of a certain organism contains $N$ particles, some of which are of type A and the others type B. The cell is said to be in state $j$ if it contains exactly $j$ particles of type A. Daughter cells are formed by cell division as follows: Each particle replicates itself and a daughter cell inherits $N$ particles chosen at random from the $2j$ particles of type A and the $2N - 2j$ particles of type B present in the parental cell. Calculate the transition matrix of this Markov chain.

Exercises for Section II.3

1. Let $p$ be the transition matrix for the completely random motion of Example 2. Show that $p^n = p$ for all $n$.

2. Calculate $p^{(n)}_{ii}$ for the unrestricted simple random walk.

3. Let $p = ((p_{ij}))$ denote the transition matrix for the unrestricted general random walk of Example 6.
(i) Calculate $p^{(n)}_{ij}$ in terms of the increment distribution $Q$.
(ii) Show that $p^{(n)}_{ij} = Q^{*n}(j - i)$, where the $n$-fold convolution is defined recursively by

$$Q^{*n}(j) = \sum_k Q^{*(n-1)}(k)\, Q(j - k), \qquad Q^{*1} = Q.$$

4. Verify each of the following for the Pólya urn model in Example 8.
(i) $P(X_n = 1) = r/(r + b)$ for each $n = 1, 2, 3, \ldots$.
(ii) $P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = P(X_{1+h} = \varepsilon_1, \ldots, X_{n+h} = \varepsilon_n)$, for any $h = 0, 1, 2, \ldots$.
(*iii) $\{X_n\}$ is a martingale (see Definition 13.2, Chapter I).

5. Describe the motion represented by a Markov chain having a transition matrix of the following forms:

(i) $p = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, (ii) $p = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, (iii) $p = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}$.

(iv) Use the probabilistic description to write down $p^n$ without algebraically performing the matrix multiplications. Generalize these to $m$-state Markov chains.
6. (Length of a Queue) Suppose that items arrive at a shop for repair on a daily basis but that it takes one day to repair each item. New arrivals are put on a waiting list for repair. Let $A_n$ denote the number of arrivals during the $n$th day. Let $X_n$ be the length of the waiting list at the end of the $n$th day. Assume that $A_1, A_2, \ldots$ is an i.i.d. nonnegative integer-valued sequence of random variables with $a(x) = P(A_1 = x)$, $x = 0, 1, 2, \ldots$. Assume that $A_{n+1}$ is independent of $X_0, \ldots, X_n$ $(n \ge 0)$. Calculate the transition probabilities for $\{X_n\}$.

7. (Pseudo-Random Number Generator) The linear congruential method of generating integer values in the range 0 to $N - 1$ is to calculate $h(x) = (ax + c) \bmod N$ for some choice of integer coefficients $0 \le a, c < N$ and an initial seed value of $x$. More generally, polynomials with integer coefficients can be used in place of $ax + c$. Note that these methods cycle after at most $N$ iterations.
(i) Show that the iterations may be represented by a Markov chain on a circle.
(ii) Calculate the transition probabilities in the case $N = 5$, $a = 1$, $c = 2$.
(iii) Calculate the transition probabilities in the case $h(x) = (x^2 + 2) \bmod 5$.
[See D. Knuth (1981), The Art of Computer Programming, Vol. II, 2nd ed., Addison-Wesley, Menlo Park, for extensive treatments.]
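For a quick look at the deterministic chains in this exercise, the following sketch follows the orbit $x \to h(x)$ for the two cases named in (ii) and (iii); each row of the corresponding transition matrix has a single entry 1 at $h(x)$.

```python
def orbit(h, seed):
    # follow the deterministic chain x -> h(x) until a state repeats
    x, seen = seed, []
    while x not in seen:
        seen.append(x)
        x = h(x)
    return seen

print(orbit(lambda x: (1 * x + 2) % 5, 0))   # case (ii): a = 1, c = 2, N = 5
print(orbit(lambda x: (x * x + 2) % 5, 0))   # case (iii): h(x) = (x^2 + 2) mod 5
```

The linear case produces a single cycle through all five states, while the quadratic case enters a shorter cycle, illustrating that polynomial maps need not be permutations of the state space.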
8. (A Renewal Process) A system requires a certain device for its operation that is subject to failure. Inspections for failure are made at regular points in time, so that an item that fails during the $n$th period of time, between $n - 1$ and $n$, is replaced at time $n$ by a device of the same type having an independent service life. Let $p_n$ denote the probability that a device will fail during the $n$th period of its use. Let $X_n$ be the age (in number of periods) of the item in use at time $n$. A new item is started at time $n = 0$, and $X_n = 0$ if an item has just been replaced at time $n$. Calculate the transition matrix of the Markov chain $\{X_n\}$.

Exercises for Section II.4

1. A balanced six-sided die is rolled repeatedly. Let $Z$ denote the smallest number of rolls for the occurrence of all six possible faces. Let $Z_1 = 1$, $Z_j =$ smallest number of tosses to obtain the $j$th new face after $j - 1$ distinct faces have occurred. Then $Z = Z_1 + \cdots + Z_6$.
(i) Give a direct proof that $Z_1, \ldots, Z_6$ are independent random variables.
(ii) Give a proof of (i) using the strong Markov property. [Hint: Define stopping times $\tau_j$ denoting the first time after $\tau_{j-1}$ that $X_n$ is not among $X_1, \ldots, X_{\tau_{j-1}}$, where $X_1, X_2, \ldots$ are the respective outcomes on the successive tosses.]
(iii) Calculate the distributions of $Z_2, \ldots, Z_6$.
(iv) Calculate $EZ$ and $\mathrm{Var}\, Z$.

2. Let $\{S_n\}$ denote the two-dimensional simple symmetric random walk on the integer lattice starting at the origin. Define $\tau_r = \inf\{n: \|S_n\| = r\}$, $r = 1, 2, \ldots$, where $\|(a, b)\| = |a| + |b|$. Describe the distribution of the process $\{S_{\tau_r \wedge n}: n = 0, 1, 2, \ldots\}$ in the two cases $r = 1$ and $r = 2$.

3. (Coupon Collector's Problem) A box contains $N$ balls labeled $0, 1, 2, \ldots, N - 1$. Let $T \equiv T_N$ be the number of selections (with replacement) required until each ball is sampled at least once. Let $T_j$ be the number of selections required to sample $j$ distinct balls. Show that
(i) $T = (T_N - T_{N-1}) + (T_{N-1} - T_{N-2}) + \cdots + (T_2 - T_1) + T_1$, where $T_1 = 1$, and $T_2 - T_1, \ldots, T_{j+1} - T_j, \ldots, T_N - T_{N-1}$ are independent, geometrically distributed with parameters $(N - j)/N$, respectively.
(ii) Let $\tau_j$ be the number of selections to get ball $j$. Then $\tau_j$ is geometrically distributed.
(iii) $P(T > m) \le N(1 - 1/N)^m$. [Hint: $P(T > m) \le \sum_j P(\tau_j > m)$.]
(iv) $P(T > m) = \sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \left(1 - \frac{k}{N}\right)^m$. [Hint: Use inclusion-exclusion on $\{T > m\} = \bigcup_j \{\tau_j > m\}$.]
(v) Let $X_1, X_2, \ldots$ be the respective numbers on the balls selected. Is $T$ a stopping time for $\{X_n\}$?
4. Let $\{X_n\}$ and $\{Y_n\}$ be independent Markov chains with common transition probability matrix $p$ and starting in states $i$ and $j$ respectively.
(i) Show that $\{(X_n, Y_n)\}$ is a Markov chain on the state space $S \times S$.
(ii) Calculate the transition law of $\{(X_n, Y_n)\}$.
(iii) Let $T = \inf\{n: X_n = Y_n\}$. Show that $T$ is a stopping time for the process $\{(X_n, Y_n)\}$.
(iv) Let $\{Z_n\}$ be the process obtained by watching $\{X_n\}$ up until time $T$ and then switching to $\{Y_n\}$ after time $T$; i.e., $Z_n = X_n$, $n \le T$, and $Z_n = Y_n$ for $n > T$. Show that $\{Z_n\}$ is a Markov chain and calculate its transition law.
5. (Top-In Card Shuffling) Suppose that a deck of $N$ cards is shuffled by repeatedly taking the top card and inserting it into the deck at a random location. Let $G_N$ be the (nonabelian) group of permutations on $N$ symbols and let $X_1, X_2, \ldots$ be i.i.d. $G_N$-valued random variables with

$$P(X_k = \langle i, i-1, \ldots, 1 \rangle) = 1/N \quad \text{for } i = 1, 2, \ldots, N,$$

where $\langle i, i-1, \ldots, 1 \rangle$ is the permutation in which the $i$th value moves to $i - 1$, $i - 1$ to $i - 2$, $\ldots$, 2 to 1, and 1 to $i$. Let $S_0$ be the identity permutation and let $S_n = X_n \cdots X_1$, where the group operation is being expressed multiplicatively. Let $T$ denote the first time the original bottom card arrives at the top and is inserted back into the deck (cf. Exercise 2.3). Then
(i) $T$ is a stopping time.
(ii) $T$ has the additional property that $P(T = k, S_k = g)$ does not depend on $g \in G_N$. [Hint: Show by induction on $N$ that at time $T - 1$ the $(N - 1)!$ arrangements of the cards beneath the top card are equally likely; see Exercise 2.3(iii).]
(iii) Property (ii) is equivalent to $P(S_k = g \mid T = k) = 1/|G_N|$; i.e., the deck is mixed at time $T$. This property is referred to as the strong uniform time property by D. Aldous and P. Diaconis (1986), "Shuffling Cards and Stopping Times," Amer. Math. Monthly, 93, pp. 333-348, who introduced this example and approach.
(iv) Show that

$$\max_A \left|P(S_n \in A) - \frac{\#A}{|G_N|}\right| \le P(T > n) \le N e^{-n/N}.$$

[Hint: Write

$$P(S_n \in A) = P(S_n \in A, T \le n) + P(S_n \in A, T > n) = \frac{\#A}{|G_N|}\, P(T \le n) + P(S_n \in A \mid T > n)P(T > n) = \frac{\#A}{|G_N|} + rP(T > n),$$

where the factor $r$ satisfies $|r| \le 1$. For the rightmost upper bound, compare Exercise 4.3.]


6. Suppose that for a Markov chain $\{X_n\}$, $\rho_{yy} := P_y(X_n = y \text{ for some } n \ge 1) = 1$. Prove that $P_y(X_n = y \text{ for infinitely many } n) = 1$.

7. (Record Times) Let $X_1, X_2, \ldots$ be an i.i.d. sequence of nonnegative random variables having a continuous distribution (so that the probability of a tie is zero). Define $R_1 = 1$, $R_k = \inf\{n \ge R_{k-1} + 1: X_n \ge \max(X_1, \ldots, X_{n-1})\}$, for $k = 2, 3, \ldots$.
(i) Show that $\{R_k\}$ is a Markov chain and calculate its transition probabilities. [Hint: All $i_k!$ rankings of $(X_1, X_2, \ldots, X_{i_k})$ are equally likely. Consider the event $\{R_1 = 1, R_2 = i_2, \ldots, R_k = i_k\}$ and count the number of rankings of $(X_1, X_2, \ldots, X_{i_k})$ that correspond to its occurrence.]
(ii) Let $T_n = R_{n+1} - R_n$. Is $\{T_n\}$ a Markov chain? [Hint: Compute $P(T_3 = 1 \mid T_2 = 1, T_1 = 1)$ and $P(T_3 = 1 \mid T_2 = 1)$.]

8. (Record Values) Let $X_1, X_2, \ldots$ be an i.i.d. sequence of nonnegative random variables having a discrete distribution function. Define the record times $R_1 = 1, R_2, R_3, \ldots$ as in Exercise 7. Define the record values by $V_k = X_{R_k}$, $k = 1, 2, \ldots$.
(i) Show that each $R_k$ is a stopping time for $\{X_n\}$.
(ii) Show that $\{V_k\}$ is a Markov process and calculate its transition probabilities.
(iii) Extend (ii) to the case when the distribution function of $X_k$ is continuous.

Exercises for Section II.5

1. Construct a finite-state Markov chain such that
(i) There is only one inessential state.
(ii) The set of essential states decomposes into two equivalence classes with periods $d = 1$ and $d = 3$.

2. (i) Give an example of a transition matrix for which all states are inessential.
(ii) Show that if $S$ is finite then there is at least one essential state.

3. Classify all states for $p$ given below into essential and inessential subsets. Decompose the set of all essential states into equivalence classes of communicating states.

j 0 0 0 j 0 0
0 0 0; 0 0 2
I 1 1 1 1 1
6 6 6 6 6 6

0 1 0 0 0 1 0
5 0 0 0 3 00
0 0 0 6 0 0 6`

0 * 0 0 0 3 0

4. Suppose that $S$ comprises a single essential class of aperiodic states. Show that there is an integer $v$ such that $p^{(v)}_{ij} > 0$ for all $i, j \in S$ by filling in the details of the following steps.
(i) For a fixed $(i, j)$, let $B_{ij} = \{v \ge 1: p^{(v)}_{ij} > 0\}$. Then for each state $j$, $B_{jj}$ is closed under addition.
(ii) (Basic Number Theory Lemma) If $B$ is a set of positive integers having greatest common divisor 1 and if $B$ is closed under addition, then there is an integer $b$ such that $n \in B$ for all $n \ge b$. [Hints:
(a) Let $G$ be the smallest additive subgroup of $\mathbb{Z}$ that contains $B$. Then argue that $G = \mathbb{Z}$: if $d$ is the smallest positive integer in $G$, it will follow that if $n \in B$ then, since $n = qd + r$, $0 \le r < d$, one obtains $r = n - qd \in G$ and hence $r = 0$; i.e., $d$ divides each $n \in B$ and thus $d = 1$.
(b) If $1 \in B$, then each $n = 1 + 1 + \cdots + 1 \in B$. If $1 \notin B$, then by (a), $1 = \alpha - \beta$ for some $\alpha, \beta \in B$. Check that $b = (\alpha + \beta)^2 + 1$ suffices; for if $n > (\alpha + \beta)^2$ then, writing $n = q(\alpha + \beta) + r$, $0 \le r < \alpha + \beta$, one has $n = q(\alpha + \beta) + r(\alpha - \beta) = (q + r)\alpha + (q - r)\beta$, and in particular $n \in B$ since $q + r > 0$ and $q - r > 0$ by virtue of $n \ge (r + 1)(\alpha + \beta)$.]
(iii) For each $(i, j)$ there is an integer $b_{ij}$ such that $v \ge b_{ij}$ implies $v \in B_{ij}$. [Hint: Obtain $b_{jj}$ from (ii) applied to $B_{jj}$, and then choose $k$ such that $p^{(k)}_{ij} > 0$. Check that $b_{ij} = k + b_{jj}$ suffices.]
(iv) Check that $v = \max\{b_{ij}: i, j \in S\}$ suffices for the statement of the exercise.

5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states. Decompose the essential states into their respective equivalence classes.
6. Let $p$ be the transition matrix on $S = \{0, 1, 2, 3\}$ defined below.

$$p = \begin{pmatrix} 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$

Show that $S$ is a single class of essential states of period 2 and calculate $p^n$ for all $n$.
7. Use the strong Markov property to prove that if $j$ is inessential then $P_j(X_n = j \text{ for infinitely many } n) = 0$.

8. Show by induction on $N$ that all states communicate in the Top-In Card Shuffling example of Exercises 2.3(ii) and 4.5.


Exercises for Section II.6

1. (i) Check that "deterministic motion going one step to the right" on $S = \{0, 1, 2, \ldots\}$ provides a simple example of a homogeneous Markov chain for which there is no invariant distribution.
(ii) Check that the "static evolution" corresponding to the identity matrix provides a simple example of a 2-state Markov chain with more than one invariant distribution (see Exercise 3.5(i)).
(iii) Check that "deterministic cyclic motion" of oscillation between states provides a simple example of a 2-state Markov chain having a unique invariant distribution $\pi$ that is not the limiting distribution $\lim_{n \to \infty} P(X_n = 1)$ for any other initial distribution $\mu \ne \pi$ (see Exercise 3.5(ii)).
(iv) Show by direct calculation for the case of a 2-state Markov chain having strictly positive transition probabilities that there is a unique invariant distribution that is the limiting distribution for any initial distribution. Calculate the precise exponential rate of convergence.

2. Calculate the invariant distribution for Exercise 2.5. Calculate the so-called equilibrium price of the commodity.

3. Calculate the invariant distribution for Exercise 2.8.

4. Calculate the invariant distribution for Exercise 2.3(ii) (see Exercise 5.8).

5. Suppose that $n$ states $a_1, a_2, \ldots, a_n$ are arranged counterclockwise in a circle. A particle jumps one unit in the clockwise direction with probability $p$, $0 \le p \le 1$, or one unit in the counterclockwise direction with probability $q = 1 - p$. Calculate the invariant distribution.
6. (i) (Coupling Bound) If $X$ and $Y$ are arbitrary real-valued random variables and $J$ an arbitrary interval, then show that $|P(X \in J) - P(Y \in J)| \le P(X \ne Y)$.
(ii) (Doeblin's Coupling Method) Let $p$ be an aperiodic irreducible finite-state transition matrix with invariant distribution $\pi$. Let $\{X_n\}$ and $\{Y_n\}$ be independent Markov chains with transition law $p$ and having respective initial distributions $\pi$ and $\delta_i$. Let $T$ denote the first time $n$ that $X_n = Y_n$.
(a) Show $P(T < \infty) = 1$. [Hint: Let $v = \min\{n: p^{(n)}_{ij} > 0 \text{ for all } i, j \in S\}$. Argue that

$$P(T > kv) \le P(X_{kv} \ne Y_{kv} \mid X_{(k-1)v} \ne Y_{(k-1)v}, \ldots, X_v \ne Y_v) \cdots P(X_{2v} \ne Y_{2v} \mid X_v \ne Y_v)\, P(X_v \ne Y_v) \le (1 - N\delta^2)^k,$$

where $\delta = \min_{i,j} p^{(v)}_{ij} > 0$, $N = |S|$.]
(b) Show that $\{(X_n, Y_n)\}$ is a Markov chain on $S \times S$ and $T$ is a stopping time for this Markov chain.
(c) Define $\{Z_n\}$ to be the process obtained by observing $\{Y_n\}$ until it meets $\{X_n\}$ and then watching $\{X_n\}$ from then on. Show that $\{Z_n\}$ is a Markov chain with transition law $p$ and initial distribution $\delta_i$.
(d) Show $|P(Z_n = j) - P(X_n = j)| \le P(T \ge n)$.
(e) Show $|p^{(n)}_{ij} - \pi_j| \le P(T \ge n) \le (1 - N\delta^2)^{[n/v]-1}$.
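The coupling in this exercise can be simulated directly. The sketch below uses a hypothetical 3-state matrix with all entries positive (so $v = 1$ in the hint), estimates $P(T \ge n)$, and compares it with the left side of the bound in (e).

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative 3-state matrix with strictly positive entries (not from the text).
p = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
w, v = np.linalg.eig(p.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()

def coupling_time(i, rng):
    # X starts in pi, Y starts at delta_i; run independently until they meet
    x, y, t = rng.choice(3, p=pi), i, 0
    while x != y:
        x, y, t = rng.choice(3, p=p[x]), rng.choice(3, p=p[y]), t + 1
    return t

times = np.array([coupling_time(0, rng) for _ in range(20000)])
n = 5
print("P(T >= %d) approx" % n, (times >= n).mean())
print("max_j |p^(n)_{0j} - pi_j| =",
      np.abs(np.linalg.matrix_power(p, n)[0] - pi).max())
```

In line with (e), the second printed quantity is dominated by the first.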

7. Let $A = ((a_{ij}))$ be an $N \times N$ matrix. Suppose that $A$ is a transition probability matrix with strictly positive entries $a_{ij}$.
(i) Show that the spectral radius, i.e., the magnitude of the largest eigenvalue of $A$, must be 1. [Hint: Check first that $\lambda$ is an eigenvalue of $A$ (in the sense $Ax = \lambda x$ has a nonzero solution $x = (x_1, \ldots, x_N)'$) if and only if $zA = \lambda z$ has a nonzero solution $z = (z_1, \ldots, z_N)$; recall that $\det(B) = \det(B')$ for any $N \times N$ matrix $B$.]
(ii) Show that $\lambda = 1$ must be a simple eigenvalue of $A$ (i.e., geometric multiplicity 1). [Hint: Suppose $z$ is any (left) eigenvector corresponding to $\lambda = 1$. By the results of this section there must be an invariant distribution (positive eigenvector) $\pi$. For $t$ sufficiently large $z + t\pi$ is also positive (and normalizable).]

*8. Let $A = ((a_{ij}))$ be an $N \times N$ matrix with positive entries. Show that the spectral radius is also given by $\min\{\lambda > 0: Ax \le \lambda x \text{ for some positive } x\}$. [Hint: $A$ and its transpose $A'$ have the same eigenvalues (why?) and therefore the same spectral radius. $A'$ is adjoint to $A$ with respect to the usual (dot) inner product in the sense $(Ax, y) = (x, A'y)$ for all $x, y$, where $(u, v) = \sum_{i=1}^{N} u_i v_i$. Apply the maximal property to the spectral radius of $A'$.]
9. Let $p(x, y)$ be a continuous function on $[c, d] \times [c, d]$ with $c < d$. Assume that $p(x, y) > 0$ and $\int_c^d p(x, y)\,dy = 1$. Let $\Omega$ denote the space of all sequences $\omega = (x_0, x_1, \ldots)$ of numbers $x_i \in [c, d]$. Let $\mathscr{F}_0$ denote the class of all finite-dimensional sets $A$ of the form $A = \{\omega = (x_0, x_1, \ldots) \in \Omega: a_i < x_i \le b_i,\ i = 0, 1, \ldots, n\}$, where $c \le a_i < b_i \le d$ for each $i$. Define $P_x(A)$ for such a set $A$ by

$$P_x(A) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} p(x, y_1)p(y_1, y_2) \cdots p(y_{n-1}, y_n)\,dy_n \cdots dy_1 \quad \text{for } x \in (a_0, b_0].$$

Define $P_x(A) = 0$ if $x \le a_0$ or $x > b_0$. The Kolmogorov Extension Theorem assures us that $P_x$ has a unique extension to a probability measure defined for all events in the smallest sigmafield $\mathscr{F}$ of subsets of $\Omega$ that contains $\mathscr{F}_0$. For any nonnegative integrable function $\gamma$ with integral 1, define

$$P_\gamma(A) = \int_c^d P_x(A)\gamma(x)\,dx, \qquad A \in \mathscr{F}.$$

Let $X_n$ denote the $n$th coordinate projection mapping on $\Omega$. Then $\{X_n\}$ is said to be a Markov chain on the state space $S = [c, d]$ with transition density $p(x, y)$ and initial density $\gamma$ under $P_\gamma$. Under $P_x$ the process is said to have initial state $x$.
(i) Prove the Markov property for $\{X_n\}$; i.e., the conditional distribution of $X_{n+1}$ given $X_0, \ldots, X_n$ is $p(X_n, y)\,dy$.
(ii) Compute the distribution of $X_n$ under $P_\gamma$.
(iii) Show that under $P_\gamma$ the conditional distribution of $X_n$ given $X_0 = x_0$ is $p^{(n)}(x_0, y)\,dy$, where

$$p^{(n)}(x, y) = \int p^{(n-1)}(x, z)p(z, y)\,dz \quad \text{and} \quad p^{(1)} = p.$$

(iv) Show that if $\delta = \inf\{p(x, y): x, y \in [c, d]\} > 0$, then

$$\int |p(x, y) - p(z, y)|\,dy \le 2[1 - \delta(d - c)],$$

by breaking the integral into two terms involving those $y$ such that $p(x, y) > p(z, y)$ and those $y$ such that $p(x, y) \le p(z, y)$.
(v) Show that there is a continuous strictly positive function $\pi(y)$ such that

$$\max\{|p^{(n)}(x, y) - \pi(y)|: c \le x, y \le d\} \le [1 - \delta(d - c)]^{n-1}\rho,$$

where $\rho = \max\{|p(x, y) - p(z, y)|: c \le x, y, z \le d\} < \infty$.
(vi) Prove that $\pi$ is an invariant distribution and moreover that under the present conditions it is unique.
(vii) Show that $P_\gamma(X_n \in (a, b) \text{ i.o.}) = 1$, for any $c < a < b < d$. [Hint: Show that $P_\gamma(X_n \notin (a, b) \text{ for } m < n \le M) \le [1 - \delta(b - a)]^{M-m}$. Calculate $P_\gamma(X_n \notin (a, b) \text{ for all } n > m)$.]


10. (Random Walk on the Circle) Let $\{X_n\}$ be an i.i.d. sequence of random variables taking values in $[0, 1]$ and having a continuous p.d.f. $f(x)$. Let $\{S_n\}$ be the process on $[0, 1)$ defined by

$$S_n = x + X_1 + \cdots + X_n \pmod 1,$$

where $x \in [0, 1)$.
(i) Show that $\{S_n\}$ is a Markov chain and calculate its transition density.
(ii) Describe the time-asymptotic behavior of $p^{(n)}(x, y)$.
(iii) Describe the invariant distribution.

11. (The Umbrella Problem) A person who owns $r$ umbrellas distributes them between home and office according to the following routine. If it is raining upon departure from either place, an event that has probability $p$, say, then an umbrella is carried to the other location if available at the location of departure. If it is not raining, then an umbrella is not carried. Let $X_n$ denote the number of available umbrellas at whatever place the person happens to be departing from on the $n$th trip.
(i) Determine the transition probability matrix and the invariant distribution (equilibrium).
(ii) Let $0 < a < 1$. How many umbrellas should the person own so that the probability of getting wet under the equilibrium distribution is at most $a$ against a climate ($p$)? What number works against all possible climates for the probability $a$?
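A numerical sketch for part (i) of the umbrella problem, under the reading that state $k$ counts the umbrellas at the place of departure: from state 0 the walker travels umbrella-less and finds all $r$ at the other end, while from $k \ge 1$ one umbrella moves with probability $p$. The invariant distribution is obtained by solving $\pi p = \pi$ numerically; the equilibrium probability of getting wet is then $p\,\pi_0$.

```python
import numpy as np

def umbrella_pi(r, p):
    # build the (r+1)x(r+1) transition matrix of the umbrella chain
    q = 1.0 - p
    P = np.zeros((r + 1, r + 1))
    P[0, r] = 1.0                  # no umbrella available: other place has all r
    for k in range(1, r + 1):
        P[k, r - k] += q           # no rain: nothing carried over
        P[k, r - k + 1] += p       # rain: carry one umbrella along
    # solve pi P = pi together with the normalization sum(pi) = 1
    A = np.vstack([P.T - np.eye(r + 1), np.ones(r + 1)])
    b = np.zeros(r + 2); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

r, p = 4, 0.3
pi = umbrella_pi(r, p)
print("pi =", pi.round(4))
print("P(get wet) =", round(p * pi[0], 4))   # wet iff raining with 0 umbrellas at hand
```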

Exercises for Section II.7

1. Calculate the invariant distributions for Exercise 2.6.

2. Calculate the invariant distribution for Exercise 5.6.

3. A transition matrix is called doubly stochastic if its transpose $p'$ is also a transition matrix; i.e., if the elements of each column add to 1.
(i) Show that the vector consisting entirely of 1's is invariant under $p$ and can be normalized to a probability distribution if $S$ is finite.
(ii) Under what additional conditions is this distribution the unique invariant distribution?

4. (i) Suppose that $\pi = (\pi_i)$ is an invariant distribution for $p$. The distribution $P_\pi$ of the process is called time-reversible if $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i, j \in S$ [$\pi$ is often said to be time-reversible (with respect to $p$) as well]. Show that if $S$ is finite and $p$ is doubly stochastic, then the (discrete) uniform distribution makes the process time-reversible if and only if $p$ is symmetric.
(*ii) Suppose that $\{X_n\}$ is a Markov chain with invariant distribution $\pi$ and started with initial distribution $\pi$. Then $\{X_n\}$ is a stationary process and therefore has an extension backward in time to $n = 0, -1, -2, \ldots$. [Use Kolmogorov's Extension Theorem.] Define the time-reversed process by $Y_n = X_{-n}$. Show that the reversed process $\{Y_n\}$ is a Markov chain with 1-step transition probabilities $q_{ij} = \pi_j p_{ji}/\pi_i$.
(iii) Show that under the time-reversibility condition (i), the processes in (ii), $\{Y_n\}$ and $\{X_n\}$, have the same distribution; i.e., in equilibrium a movie of the evolution looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space $S$ with an invariant initial distribution $\pi$ is time-reversible if and only if (Kolmogorov Condition):

$$p_{i i_1} p_{i_1 i_2} \cdots p_{i_k i} = p_{i i_k} p_{i_k i_{k-1}} \cdots p_{i_1 i} \quad \text{for all } i, i_1, \ldots, i_k \in S,\ k \ge 1.$$

(v) If there is a $j \in S$ such that $p_{ij} > 0$ for all $i \ne j$ in (iv), then for time-reversibility it is both necessary and sufficient that $p_{ij} p_{jk} p_{ki} = p_{ik} p_{kj} p_{ji}$ for all $i, j, k$.
5. (Random Walk on a Tree) A tree graph on $r$ vertices $v_1, v_2, \ldots, v_r$ is a connected graph that contains no cycles. [That is, there is given a collection of unordered pairs of distinct vertices (called edges) with the following property: Any two distinct vertices $u, v \in S$ are uniquely connected in the sense that there is a unique sequence $e_1, e_2, \ldots, e_n$ of edges $e_i = \{v_{k_i}, v_{l_i}\}$ such that $u \in e_1$, $v \in e_n$, $e_i \cap e_{i+1} \ne \emptyset$, $i = 1, \ldots, n - 1$.] The degree $\nu_i$ of the vertex $v_i$ represents the number of vertices adjacent to $v_i$, where $u, v \in S$ are called adjacent if there is an edge $\{u, v\}$. By a tree random walk on a given tree graph we mean a Markov chain on the state space $S = \{v_1, v_2, \ldots, v_r\}$ that at each time step $n$ changes its state $v_i$ to one of its $\nu_i$ randomly selected adjacent states, with equal probabilities and independently of its states prior to time $n$.
(i) Explain why such a Markov chain must have a unique invariant distribution.
(ii) Calculate the invariant distribution in terms of the vertex degrees $\nu_i$, $i = 1, \ldots, r$.
(iii) Show that the invariant distribution makes the tree random walk time-reversible.
6. Let $\{X_n\}$ be a Markov chain on $S$ and define $Y_n = (X_n, X_{n+1})$, $n = 0, 1, 2, \ldots$.
(i) Show that $\{Y_n\}$ is a Markov chain on $S' = \{(i, j) \in S \times S: p_{ij} > 0\}$.
(ii) Show that if $\{X_n\}$ is irreducible and aperiodic then so is $\{Y_n\}$.
(iii) Show that if $\{X_n\}$ has invariant distribution $\pi = (\pi_i)$ then $\{Y_n\}$ has invariant distribution $(\pi_i p_{ij})$.

7. Let $\{X_n\}$ be an irreducible Markov chain on a finite state space $S$. Define a graph $G$ having the states of $S$ as vertices, with edges joining $i$ and $j$ if and only if either $p_{ij} > 0$ or $p_{ji} > 0$.
(i) Show that $G$ is connected; i.e., for any two sites $i$ and $j$ there is a path of edges from $i$ to $j$.
(ii) Show that if $\{X_n\}$ has an invariant distribution $\pi$ then for any $A \subset S$,

$$\sum_{i \in A} \sum_{j \in S \setminus A} \pi_i p_{ij} = \sum_{i \in A} \sum_{j \in S \setminus A} \pi_j p_{ji}$$

(i.e., the net probability flux across a cut of $S$ into complementary subsets $A$, $S \setminus A$ is in balance).
(iii) Show that if $G$ contains no cycles (i.e., is a tree graph in the sense of Exercise 5), then the process is time-reversible when started with $\pi$.

Exercises for Section II.8

1. Verify (8.10) using summation by parts as indicated. [Hint: Let $Z$ be nonnegative integer-valued. Then

$$\sum_{r=0}^{\infty} P(Z > r) = \sum_{r=0}^{\infty} \sum_{n=r+1}^{\infty} P(Z = n).$$

Now interchange the order of summation.]

2. Classify the states in Examples 3.1-3.7 as transient or recurrent.

3. Show that if $j$ is transient and $i \ne j$ then $\sum_{n=0}^{\infty} p^{(n)}_{ij} < \infty$ and, in particular, $p^{(n)}_{ij} \to 0$ as $n \to \infty$. [Hint: Represent $N(j)$ as a sum of indicator variables and use (8.11).]

4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent.

5. Classify the states for $\{R_n\} = \{|S_n|\}$, where $S_n$ is the simple symmetric random walk starting at 0 (see Exercise 1.8).

6. Show that inessential states are transient.

7. (A Birth or Collapse Model) Let

$$p_{i,i+1} = \frac{1}{i+1}, \qquad p_{i,0} = \frac{i}{i+1}, \qquad i = 0, 1, 2, \ldots.$$

Determine whether $p$ is transient or recurrent.

8. Solve Exercise 7 when

$$p_{i,0} = \frac{1}{i+1}, \qquad p_{i,i+1} = \frac{i}{i+1}, \quad i \ge 1, \qquad p_{0,1} = 1.$$

9. Let $p_{i,i+1} = p$, $p_{i,0} = q$, $i = 0, 1, 2, \ldots$. Classify the states of $S = \{0, 1, 2, \ldots\}$ as transient or recurrent $(0 < p < 1,\ q = 1 - p)$.
10. Fix $i, j \in S$. Write

$$r_n := P_i(X_n = j) = p^{(n)}_{ij} \quad (n \ge 1), \qquad r_0 := 1,$$
$$f_n := P_i(X_m \ne j \text{ for } m < n,\ X_n = j) \quad (n \ge 1).$$

(i) Prove (see (5.5) of Chapter I) that $r_n = \sum_{m=1}^{n} f_m r_{n-m}$ $(n \ge 1)$.
(ii) Sum (i) over $n$ to give an alternative proof of (8.11).
(iii) Use (i) to indicate how one may compute the distribution of the first visit to state $j$ (after time zero), starting in state $i$, in terms of $p^{(n)}_{ij}$ $(n \ge 1)$.

11. (i) Show that if $\|\cdot\|$ and $\|\cdot\|_0$ are any two norms on $\mathbb{R}^k$, then there are positive constants $c_1, c_2$ such that

$$c_1\|x\| \le \|x\|_0 \le c_2\|x\| \quad \text{for all } x \in \mathbb{R}^k.$$

A norm on $\mathbb{R}^k$ is a nonnegative function $x \mapsto \|x\|$ on $\mathbb{R}^k$ with the properties that
(a) $\|cx\| = |c|\,\|x\|$ for $c \in \mathbb{R}$, $x \in \mathbb{R}^k$;
(b) $\|x\| = 0$, $x \in \mathbb{R}^k$, iff $x = 0$;
(c) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^k$.
[Hint: Use compactness of the unit ball in the case of the Euclidean norm, $\|x\| = (x_1^2 + \cdots + x_k^2)^{1/2}$, $x = (x_1, \ldots, x_k)$.]
(ii) Show that the stability condition given in Example 1 implies that $X_n \to 0$ in every norm.
Exercises for Section II.9

1. Let $Y_1, Y_2, \ldots$ be i.i.d. random variables.
(i) Show that if $E|Y_1| < \infty$ then $\{\max(Y_1, \ldots, Y_n)\}/n \to 0$ a.s. as $n \to \infty$.
(ii) Verify (9.8) under the assumption (9.4). [Hint: Show that $P(|Y_n| > n\varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$.]

2. Verify for (9.12) that

$$\lim_{n \to \infty} \frac{\tau_j^{(n)}}{n} = E_j(\tau_j^{(2)} - \tau_j^{(1)}) \quad \text{a.s.},$$

provided $E_j(\tau_j^{(1)}) < \infty$.

3. Prove that

$$\lim_{n \to \infty} \frac{n}{N_n(j)} = \infty \quad P_i\text{-a.s.}$$

for a null-recurrent state $j$ such that $i \leftrightarrow j$.
4. Show that positive and null recurrence are class properties.
5. Let $\{X_n\}$ be an irreducible aperiodic positive recurrent Markov chain having transition matrix $p$ on a denumerable state space $S$. Define a transition matrix $q$ on $S \times S$ by

$$q_{(i,j),(k,l)} = p_{ik}\, p_{jl}, \qquad (i, j), (k, l) \in S \times S.$$

Let $\{Z_n\}$ be a Markov chain on $S \times S$ with transition law $q$. Define $T_D = \inf\{n: Z_n \in D\}$, where $D = \{(i, i): i \in S\} \subset S \times S$. Show that if $P_{(i,j)}(T_D < \infty) = 1$ for all $i, j$ then for all $i, j \in S$, $\lim_{n \to \infty} p^{(n)}_{ij} = \pi_j$, where $(\pi_j)$ is the unique invariant distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite state space.]

EXERCISES 203

6. (General BirthCollapse) Let p be a transition probability matrix on


S={0,1,2,...} of the form e ;. , + ,=p i ,p,. o =I p j , i=0,1,2,...,O<p i <1,
i >_ 1, Po = 1. Show:
(i) All states are recurrent

k x

iff lim F1 p j = 0 iff I (1 p j ) = oo .

(ii) If all states are recurrent, then positive recurrence holds

iff pj < b .
k=1j=1

(iii) Calculate the invariant distribution in the case p j = 1/(j + 2).


7. Let {S"} be the simple symmetric random walk starting at the origin. Define
f: Z -^ l8' by f (n) = I if n _> 0 and f (n) = 0 if n < 0. Describe the behavior of
[ (So) + ... + f(Sn- 1 )]/n as n -i oo.

8. Calculate the invariant distribution for the Renewal Model of Exercise 3.8, in the
case that p"=pn -1 (1 p),n=0, 1,2,. . . where 0 <p < 1.
9. (One-Dimensional Nearest-Neighbor Ising Model) The one-dimensional nearest-
neighbor Ising model of magnetism consists of a random distribution of + I-valued
random variables (spins) at the sites of the integers n = 0, 1, 2, .... The
parameters of the model are the inverse temperature /i = - 1 - > 0 where T is
kT
temperature and k is a universal constant called Boltzmann s constant, an external
field parameter H, and an interaction parameter (coupling constant) J. The spin
variables X", n = 0, + 1, 2, +3, ... , are distributed according to a stochastic
process on { 1,1 } indexed by Z with the Markov property and having stationary
transition law given by

exp{Jai + Hn}
p(X" + ' =nI X" =Q)= 2cosh(H+Ja)

for a, >a e { + 1, 1}, n = 0, + 1, 2.... ; by the Markov property is meant that


the conditional distribution of X" + , given {Xk : k _< n} does not depend on
{Xk ,k_<n-1}.
(i) Calculate the unique invariant distribution n for p.
(ii) Calculate the large-scale magnetization (i.e., ability to pick up nails), defined by

MN =[X_ N + ... +XN ]/(2N+1)

in the so-called bulk (thermodynamic) limit as N + oo.


(iii) Calculate and plot the graph (i.e., magnetic isotherm) of EX0 as a function of
H for fixed temperature. Show that in the limit as H 0 + or H 0 - the
bulk magnetization EX, tends to 0; i.e., there is no (zero) residual magnetization
remaining when H is turned off at any temperature.
204 DISCRETE-PARAMETER MARKOV CHAINS

(iv) Determine when the process (in equilibrium) is reversible for the invariant
distribution; modify Exercise 7.4 accordingly.
10. Show that if {X} is an irreducible positive-recurrent Markov chain then the
condition (iv) of Exercise 7.4 is necessary and sufficient for time-reversibility of the
stationary process started with distribution it.
*11. An invariant measure for a transition matrix ((p ;j )) is a sequence of nonnegative
numbers (m ; ) such that m ; p ;J = m j for all j ES. An invariant measure may or
may not be normalizable to a probability distribution on S.
(i) Let p ; _ ;+ , = p ; and p ; , o = I p ; for i = 0, 1, 2, .... Show that there is a unique
invariant measure (up to multiples) if and only if tim,,. fjk=, Pk = 0; i.e., if
and only if the chain is recurrent, since the product is the probability of no
return to the origin.
(ii) Show that invariant measures exist for the unrestricted simple random walk
but are not unique in the transient case, and is unique (up to multiples) in
the (null) recurrent case.
(iii) Let Poo = Poi = i and p i , ; _, = p i , ; = 2 - ' -Z , and p ;.;+ , = I 2 i = 1, 2, 3,
.... Show that the probability of not returning to 0 is positive (i.e., transience),
but that there is a unique invariant measure.
12. Let { Y} be any sequence of random variables having finite second moments and
let y, = Cov(Y, Y,), = EY, o = Var(Y) = y, and p
(i) Verify that 1 <_ p _< 1 for all n and m. [Hint: Use the Schwarz Inequality.]
(ii) Show that if p . , _< f(In ml), where f is a nonnegative function such that
n_ 2 Yk=f(k)Y n =1 Qk -*0 as n -* oo, then the WLLN holds for {}}.
(iii) Verify that if = p o ^_^ = p(ln ml) > 0, then it is sufficient that
p(k) -*0 as n -+ oc for the WLLN.
(iv) Show that in the case of nonpositive correlations it is sufficient that
n -1 Ik= 1 ak -+0 as n , oo for the WLLN.
13. Let p be the transition probability matrix for the asymmetric random walk on
S = {0, 1, 2, ...} with 0 absorbing and p i , ;+ , = p > Z for i _> 1. Explain why for
fixed i > 0,

1 ^
j e S,
n,-,

does not converge to the invariant distribution S o ({j}) (as n -- co). How can this
be modified to get convergence?
14. (Iterated Averaging)
(i) Let a,, a 2 , a 3 be three numbers. Define a 4 = (a, + a 2 + a 3 )/3, a 5 =
(a 2 + a 3 + a 4 )/3,.... Show that lim,, a = (a, + 2a 2 + 3a 3 )/6.
(ii) Let p be an irreducible positive recurrent transition law and let a,, a 2 , ... be
any bounded sequence of numbers. Show that

lim Pi! aj _
) ajnt,

where (it) is the invariant distribution of p. Show that the result of (i) is a
special case.
EXERCISES 205

Exercises for Section 11.10

1. (i) Let Y1 , Y,, ... be i.i.d. with EY < co. Show that max(Y 1 , ... , Y")/,/n 0 a.s.
as n * . [Hint: Show that P(Y.2 > ne i.o.) = 0 for every e > 0.]
(ii) Verify that n '/ZS" has the same limiting distribution as (10.6).
-

2. Let {W"(t): t >_ 0} be the path process defined in (10.7). Let t I < t 2 < < t k , k >_ 1,
be an arbitrary finite set of time points. Show that (W"(t 1 ), ... , W"(t k )) converges in
distribution as n ^ x to the multivariate Gaussian distribution with mean zero and
variancecovariance matrix ((D min{t i , t i })), where D is defined by (10.12).
3. Suppose that {X"} is Markov chain with state space S = { 1, 2, . .. , r} having unique
invariant distribution (it ; ). Let

N(i)= #{k_<n:X k =i }, ieS.

Show that

`(N"(1) Na(r)
n 7C 1 ...
n n

is asymptotically Gaussian with mean 0 and variancecovariance matrix F = ((y, J ))


where

= b it n ; n t + (p) njmi) + (pj n t 7Z I ), for 1 -< i, j -< r.


k=1 k=1

4. For the one-dimensional nearest-neighbor Ising model of Exercise 9.9 calculate the
following:
(i) The pair correlations p,,,, = Cov(X", X,").
(ii) The large-scale variance (magnetic susceptibility) parameter Var(X 0 ).
(iii) Describe the distribution of the fluctuations in the (bulk limit) magnetization
(cf. Exercise 9.9(i)).

5. Let {X"} be a Markov chain on S and define Y = (X", Xri+ 1 ), n = 0, 1, 2..... Let
p = ((p, j )) be the transition matrix for {X"}.
(i) Show that { Y"} is a Markov chain on the state space defined by
S'_ {(i,j)eS x S:p ij >0}.
(ii) Show that if {XX } is irreducible and aperiodic then so is { }"}.
(iii) Suppose that {X"} has invariant distribution it = (n i ). Calculate the invariant
distribution of { },}.
(iv) Let (i, j) e S' and let T be the number of one-step transitions from i to j by
X0 , X"... , X" started with the invariant distribution of (iii). Calculate
lim". x (T"/n) and describe the fluctuations about the limit for large n.
6. (Large-Sample Consistency in Statistical Parameter Estimation) Let XX = I or 0
according to whether the nth day at a specified location is wet (rain) or dry. Assume
{X"} is a two-state Markov chain with parameters = P(X" +1 = II X" = 0) and
S=P(X 1 =0^X"=1),n=0,1,2,...,0<f<1,0<S<I.Suppose that {X}
is in equilibrium with the invariant initial distribution it = (n 1 , n o ). Define statistics
based on the sample X 0 , X 1 , ... , X" to estimate /3, it , respectively, by t" = S/(n + 1)
206 DISCRETE-PARAMETER MARKOV CHAINS

and ^" = T"/n, where S. = X0 + + X. is the number of wet days and


)

_ >. k=O l ((Xk.Xk l)=(0.I)) is the number of dry-to-wet transitions. Calculate


n(" and lim". ^"
) )

7. Use the result of Exercise 1.5 to describe an extension of the SLLN and the CLT to
certain rth-order dependent Markov chains.

Exercises for Section II.11

1. Let {X"} be a two-state Markov chain on S = {O, 1 } and let T o be the first time
{X"} reaches 0. Calculate P l (z o = n), n _> 1, in terms of the parameters p, o and Poi
2. Let {X"} be a three-state Markov chain on S = {0, 1, 2} where 0, 1, 2 are arranged
counterclockwise on a circle, and at each time a transition occurs one unit clockwise
with probability p or one unit counterclockwise with probability 1 p. Let t o denote
the time of the first return to 0. Calculate P(-r o > n), n > 1.
3. Let T o denote the first time starting in state 2 that the Markov chain in Exercise
5.6 reaches state 0. Calculate P 2 (r o > n).

4. Verify that the Markov chains starting at i having transition probabilities p and p,
and viewed up to time T A have the same distribution by calculating the probabilities
of the event {X0 = i, X, = i ...... Xm = m , T A =m} under each of p and p.
5. Write out a detailed explanation of (11.22).
6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results
on the long-term behavior of transition probabilities.
7. (Collocation) Show that there is a unique polynomial p(x) of degree k that takes
prescribed (given) values v o , v 1 ..... v k at any prescribed (given) distinct points
x 0 , x,, ... , x k , respectively; such a polynomial is called a collocation polynomial.
[Hint: Write down a linear system with the coefficients a o , a,, ... , a k of p(x) as the
unknowns. To show the system is nonsingular, view the determinant as a polynomial
and identify all of its zeros.]
*8. (Absorption Rates and the Spectral Radius) Let p be a transition probability matrix
for a finite-state Markov chain and let r ; be the time of the first visit to j. Use (11.4)
and the results of the PerronFrobenius Theorem 6.4 and its corollary to show that
exponential rates of convergence (as obtained in (11.43)) can be anticipated more
generally.

9. Let p be the transition probability matrix on S = {0, 1, 2, ...} defined by

ifi>0,j=0,1,2,...,i-1

ifi<0,j=0,-1,-2,..., i +I

1 ifi=0,j=0
0 ifi=0,j960.

(i) Calculate the absorption rate.


EXERCISES 207

(ii) Show that the mean time to absorption starting at i > 0 is given by =, (i /k).
10. Let {X} be the simple branching process on S = {0, 1, 2, ...} with offspring
distribution { fj }, f j a jfj 1.
(i) Show that all nonzero states in S are transient and that lim^ P 1 (X = k) = 0,
k=1,2,....
(ii) Describe the unique invariant probability distribution for {X}.
II. (i) Suppose that in a certain society each parent has exactly two children, and
both males and females are equally likely to occur. Show that passage of the
family surname to descendants of males eventually stops.
(ii) Calculate the extinction probability for the male lineage as in (i) if each parent
has exactly three children.
(iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939),
"Theorie Analytique des Associations Biologiques II, Actualites Scientifiques
et Industrielles, (N.780), Hermann et Cie, Paris, used data for white males in
the United States in 1920 to estimate the probability function f for the
number of male children of a white male. He estimated f(0) = 0.4825,
f(j)= (0.2126)(0.5893)' ' (j = 1,2,...).
-

(a) Calculate the mean number of male offspring.


(b) Calculate the probability of survival of the family surname if there is only
one male with the given name.
(c) What is the survival probability forj males with the given name under this
model.
(iv) (Maximal Branching) Consider the following modification of the simple
branching process in which if there are k individuals in the nth generation, and
if X,, X 2 , ... , X,, are independent random variables representing their
respective numbers of offspring, then the (n + 1)st generation will contain
Zn + , = max{X . ,, X z , ... , X.k } individuals. In terms of the survival of family
names one may assume that only the son providing the largest number of
grandsons is entitled to inherit the family title in this model (due to J. Lamperti
(1970), "Maximal Branching Processes and Long-Range Percolation, J. Appt.
Probability, 7, pp. 89-98).
(a) Calculate the transition law for the successive size of the generations when
the offspring distribution function is F(x), x = 0, 1, 2, ... .
(b) Consider the case F(x) = 1 (1/x), x = 1, 2, 3..... Show that

lim P(Z + , -< kx I Z = k) = e x ',


- - x>0.

12. Let f be the offspring distribution function for a simple branching process having
finite second moment. Let p = > k kf(k), v 2 = E k (k p) 2 f(k). Show that, given
Xo = 1,

1 )/(u 1) if it # I
Var X. =
g if p = 1.

13. Each of the following distributions below depends on a single parameter. Construct
graphs of the nonextinction probability and the expected sizes of the successive
generations as a function of the parameter.

208 DISCRETE-PARAMETER MARKOV CHAINS

p ifj= 2
(i) f(j)= q ifj=0
0 otherwise;

(ii) f(j)=qp', j=0,1,2,...;

(iii) f(j)=^ i j
-

14. (Electron Multiplier) A weak current of electrons may be amplified by a device


consisting of a series of plates. Each electron, as it strikes a plate, gives rise to a
random number of electrons, which go on to strike the next plate to produce more
electrons, and so on. Use the simple branching process with a Poisson offspring
distribution for the numbers of electrons produced at successive plates.
(i) Calculate the mean and variance in the amplification of a single electron at the
nth plate (see Exercise 12).
(ii) Calculate the survival probability of a single electron in an infinite succession
of plates if y = 1.01.
15. (A Generalized Gambler's Ruin) Gamblers 1 and 2 have respective initial capitals
i > 0 and c i > 0 (whole numbers) of dollars. The gamblers engage in a series of
fair bets (in whole numbers of dollars) that stops when and only when one of the
gamblers goes broke. Let X. denote gambler l's capital at the nth play (n >_ 1),
Xo = I. Gambler 1 is allowed to select a different game (to bet) at each play subject
to the condition that it be fair in the sense that E(X X^_,) = X,_,, n = 1, 2, ... ,
and that the amounts wagered be covered by the current capitals of the respective
gamblers. Assume that gambler l's selection of a game for the (n + 1)st play (n >_ 0)
depends only on the current capital X. so that {XX } is a Markov chain with stationary
transition probabilities. Also assume that P(X + I = i XX = i) < 1, 0 < i < c,
although it may be possible to break even in a play. Calculate the probability that
gambler 1 will eventually go broke. How does this compare to classical gamblers
ruin (win or lose $1 bets placed on outcomes of fair coin tosses)? [Hint: Check that
a c (i) = i/c, 0 < i < c, a 0 (i) _ (c i)/c, 0 < i _< c (uniquely) solves (11.13) using the
equation E(X X,) = X_ 1 .]

Exercises for Section II.12


1. (i) Show that for (12.10) it is sufficient to check that

9,(a) _
_ q(b, a)q(a, c)
a, b, cc S. [Hint: Y_ g(a) = 1, b, c e S.]
9,(0) q(b, O)q(O, c)' a

(ii) Use (12.11), (12.12) to show that this condition can be expressed as

9(a) 9e.o(b) 9e.e( 0 ) 9e.^(a)


a, b, cc S.
9n,( 8 ) g9(0) 9e,e(b) 9e.,,(0)

EXERCISES 209

(iii) Consider four adjacent sites on Z in states a, , a, b, respectively. For notational


convenience, let [a, , a, b] = P(X0 = a, X, = , X 2 = a, X 3 = b). Use the
condition (ii) of Theorem 12.1 to show

[a, , a, b] = P(X0 = a, X 2 = a, X3 = b)g".a()

and, therefore,

[a, , a, b] _ g()
[a, /3', a, b] g,,,(')

(iv) Along the lines of (iii) show that also

[a, , a, b] g (a)
[a, , a', b] g.b(a')

Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii).
2. (i) Verify (12.13) for the case n = 1, r = 2. [Hints: Without loss of generality, take
h = 0, and note,

[x, , y b]
,

P(X 1 = ,X 2 = yIXo = a,X3 =h)= --

[a, u, v, b]

and using Exercise 1(iii)(iv) and (12.10),

[a, u, v, b] _ g(u)g(v) _ 9(a, u)q(u, v)q(v, b)


[a, u', v', b] ga."(u')gr,b(v') q(a, u')q(u', )9(v', b)

Sum over u, v E S and then take u' = , v' = y.]


(ii) Complete the proof of (12.13) for n = 1, r 2 by induction and then n >- 2.
3. Verify (12.15).
4. Justify the limiting result in (12.17) as a consequence of Proposition 6.1. [Hint: Use
Scheffe's Theorem, Chapter 0.]
*5. Take S = {0, 1}, g,,,(1) = , g, 0 (1) = v. Find the transition matrix q and the
invariant initial distribution for the Markov random field viewed as a Markov chain.

Exercises for Section 11.13


(i) Let {Y": n -> 0} be a sequence of random vectors with values in Il k which converge
a.s. to a random vector Y. Show that the distribution Q. of Y. converges weakly
to the distribution Q of Y.
(ii) Let p(x, dy) be a transition probability on the state space S = R k (i.e., (a) for each
x e S. p(x, dy) is a probability measure on (R", B') and (b) for each B E
x . p(x, B) is a Borel-measurable function on R k ). The n-step transition
probability p " (x, dy) is defined recursively by
( )
210 DISCRETE-PARAMETER MARKOV CHAINS

p cl) pcn+1)(x, p(y, B)p' n) (x, dy).


(x, dy) = p(x, dy), B)
= Js
Show that, if p " (x, dy) converges weakly to the same probability measure it(dy)
) )

for all x, and x * p(x, dy) is weakly continuous (i.e., f s f (y) p(x, dy) is a continuous
function of x for every bounded continuous function f on S), then n is the unique
invariant probability for p(x, dy), i.e., j 13 p(x, B)rc(dx) = n(B) for all Be .V V. [Hint:
Let f be bounded and continuous. Then

J dY) -> f .%(Y)7r(dy),

1lr(dz). ]
J
(iii) Extend (i) and (ii) to arbitrary metric space S, and note that it suffices to require
convergence of n - ' Jm- If (Y)pl'")(x, dy) to If (y)ir(y) dy for all bounded
continuous f on S.
2. (i) Let B 1 , B 2 be m x m matrices (with real or complex coefficients). Define IIBII as
in (13.13), with the supremum over unit vectors in Il'" or C'. Show that

IIBIB2II < IIB,II IIB211-

(ii) Prove that if B is an m x m matrix then

IIBII < m' 12 max {Ib 1 l: 1 < i,j <, m}.

(iii) If B is an m x m matrix and IIBII is defined to be the supremum over unit vectors
in C'", show that IIB"II >, r"(B). Use this together with (13.17) to prove that
limllB"II" exists and equals r(B). [Hint: Let A. be an eigenvalue such that
12,1 = r(B). Then there exists x e C', (1x11 = 1, such that Bx = Ax.]

_<
3. Suppose E I is a random vector with values in 1.
(i) Prove that if b > 1 and c > 0, then

log c
P(Ie 1 I > cb") EEIZI 1, where Z = logI ,I
" =1 log (5

[Hint: P(IZI > n) _ nP(n < IZI < n + 1). ]

_< _<
n =1 n=1

IIS"B"II _<
(ii) Show that if (13.15) holds then (13.16) converges. [Hint:

dfl(" B""II 1 "f") where d = max{8'IIBII': 0

(iii) Show that (13.15) holds, if it holds for some S < 1 /r(B). [Hint: Use the Lemma.]
r n o }. ]

4. Suppose Y a"z" and 1] b "z" are absolutely convergent and are equal for Izl < r, where
r is some positive number. Show that a n = bn for all n. [Hint: Within its radius of
EXERCISES 211

convergence a power series is infinitely differentiable and may be repeatedly


differentiated term by term.]
5. (i) Prove that the determinant of an m x m matrix in triangular form equals the
product of its diagonal elements.
(ii) Check (13.28) and (13.35).
6. (i) Prove that under (13.15), Y in (13.16) is Gaussian if c" are Gaussian. Calculate
the mean vector and the dispersion matrix of Y in terms of those of E".
(ii) Apply (i) to Examples 2(a) and 2(b).
7. (i) In Example I show that Jhi < 1 is necessary for the existence of a unique invariant
probability.
(ii) Show by example that ibI < I is not sufficient for the existence of a unique invariant
probability. [Hint: Find a distribution Q of the noise E" with an appropriately
heavy tail.]
8. In Example 1, assume EE < oo, and write a = EE", X" + , = a + bX" + 0" + where
= a" a (n -> 1). The least squares estimates of a, b are N , b N , which minimize
Yn- (XX+ , a bX") z with respect to a, b.
(i) Show that a N = Y 6 N X, b N = I ' X" + , (X" (XX X ) 2 , where
X= N 1 X", Y= N 1 I i X. [Hint: Reparametrize to write a + bX" _
a, + b(X" X).]
(ii) In the case Ibi < 1, prove that a N + a and b N - b a.s. as N oo.
9. (i) In Example 2 let m = 2, b _ 4, b 12 = 5, b 2 , = 10, b 22 = 3. Assume e, has
a finite absolute second moment. Does there exists a unique invariant probability?
(ii) For the AR(2) or ARMA(2, q) models find a sufficient condition in terms of o
and (i, for the existence of a unique invariant probability, assuming
that q" has a finite rth moment for some r > 0.

Exercises for Section II.14


1. Prove that the process {X"(x): n >- 0} defined by (14.2) is a Markov process having
the transition probability (14.3). Show that this remains true if the initial state x is
replaced by a random variable X. independent of {a"}.
2. Let F"(z):= P(X, -< z), n >- 0, be a sequence of distribution functions of random
variables X. taking values in an interval J. Prove that if F + ,(z) F(z) converges
to zero uniformly for z e J, as n and m go to co, then F(z) converges uniformly (for
all z e J) to the distribution function F(z) of a probability measure on J.
3. Let g be a continuous function on a metric space S (with metric p) into itself. If,
for some x e S, the iterates g ( "^(x) - g(gt" (x)), n >- 1, converge to a point x* E S
as n * cc, then show that x* is a fixed point of g, i.e., g(x*) = x*.
4. Extend the strong Markov property (Theorem 4.1) to Markov processes {X"} on
an interval J (or, on
5. Let r be an a.s. finite stopping time for the Markov process {X"(x): n > 0} defined
by (14.2). Assume that X,(x) belongs to an interval J with probability 1 and p ( " ) (y, dz)
converges weakly to a probability measure n(dz) on J for all ye J. Assume also
that p(y, J) = I for all y e J.
212 DISCRETE-PARAMETER MARKOV CHAINS

(i) Prove that p " (x, dz) converges weakly to n(dz). [Hint: p '` (x, J) . I as k --i c0,
( ) ( )

(k+r) .) (k)
f f(y)p (x dy) = f ($f(z)p^ (y, dz))p (x dy).]
(ii) Assume the hypothesis above for all x e J (with J not depending on x). Prove
that it is the unique invariant probability.
6. In Example 2, ifs,, are i.i.d. uniform on [ 1,1], prove that {X 2n (x): n > 1} is i.i.d.
with common p.d.f. given by (14.27) if XE [-2, 2].
7. In Example 2, modify f as follows. Let 0 < <. Define fo (x): _f(x) for

2 _< x < 6, and 6 _< x _< 2, and linearly interpolate between (,), so that f,
is continuous.
(i) Show that, for x e [6, 1] (or, x c [-1, b]) {X"(x): n >, 1} is i.i.d. with common
distribution it (or, nx+2).
(ii) For x e (1, 2] (or, [-2, 1)) {X"(x): n >_ 2} is i.i.d. with common distribution
ix (or, ix +2).
(iii) For x e (-8, 6) {X"(x): n >_ 1} is i.i.d. with common distribution it_ X+ ,.
8. In Example 3, assume P(e 1 = 0) > 0 and prove that P(e,, = 0 for some n >_ 0) = 1.
9. In Example 3, suppose E log s > 0. ;

(i) Prove that E, , {1/(E 1 ...e" )} converges a.s. to a (finite) nonnegative random
variable Z.
(ii) Let d, := inf{z > 0: P(Z < z) > 0}, d 2 == sup{z >- 0: P(Z >- z) > 0}. Show that

=0 if x < c(d, + l ),
p(x) e (0, 1) if c(d, + l) < x < c(d 2 + 1),
=1 if x > c(d 2 + 1).

10. In Example 3, define M := sup{z >_ 0: P(e, > z) > 0}.


(i) Suppose I <M < oo. Then show that p(x) = 0 if x < cM/(M 1).
(ii) If Al _< 1, then show that p(x) = 0 for all x.
(iii) Let m be as in (14.34) and M as above. Let d,, d 2 be as defined in Exercise
9(ii). Show that

_ 1
(a) d, = > m- " if m > 1, = oo if m 5 1 , and
n =, mI

(b)d 2 = M " - (=_ifM>1=coifM1).


mI

Exercises for Section I1.15


1. Let 0 < p < I and suppose h(p) represents a measure of uncertainty regarding the
occurrence of an event having probability p. Assume
(a) h(i)=0.
(b) h(p) is strictly decreasing for 0 < p _< 1.
(c) h is continuous on (0, 1].
(d)h(pr) = h(p) + h(r), 0 <p < 1,0< r < I.


EXERCISES 213

Intuitively, condition (d) says that the total uncertainty in the joint occurrence of
two independent events is the cumulative uncertainty for each of the events. Verify
that h must be of the form h(p) = c log 2 p where c = h(21) > h(1) = 0 is a positive

(f)
constant. Standardizing, one may take

h(p) = log2 p.
2. Let f= be a probability distribution on S = { 1, 2, . .. , M}. Define the entropy
in f by the "average uncertainty," i.e.,

H(f) =Zf loge fi (0 log 0:= 0).


a=1

Y_
(i) Show that H(f) is maximized by the uniform distribution on S.
(ii) If g = (g ; ) is another probability distribution on S then

H(f) Ji 1og 2 gi
i

with equality if and only if f = g for all i E S.


;

<_
3. Suppose that X is a random variable taking values a 1 ..... a M with respective
probabilities p(a, ), ... , p(a M ). Consider an arbitrary binary coding of the respective
symbols a,, I < i z M, by a string cb(a ; ) = ( rI', ... , sl,'t) of 0's and l's, such that no
string (eilt, ... , F 11 ) can be obtained from a shorter code (c,, ... , e 1 n ; n j , by
adding more terms; such codes will be called admissible. The number n ; of bits is
called the length of the code-word 4)(a 1 ).
(i) Show that an admissible binary code 0 having respective lengths n ; exists if and
only if
M
I.
i =1

[Hint: Let n 1 .... , n M be positive integers and let

Pk = #{ i:n i =k}, k = 1,2,....

Then it is necessary and sufficient for an admissible code that p, S 2,


P2522-2P1.....Pk52kpl2k 1 Pk -1 2 ,k>, 1 .]
...

(ii) (Noiseless Coding Theorem) For any admissible binary code 0 of a,..... a M
having respective lengths n 1 ..... M' the average length of code-words cannot

=l
n.p(a,) % Z
be made smaller than the entropy of the distribution p of a,, ... , a,, i.e.,

i =1
p(a) log2 p(ai)

H(p) Y
[Hint: Use Exercise 2() with f = p(a ; ), g ; = 2 - " to show

i =1
n1p(ai) + log 2 (

Then apply Exercise 3(i) to get the result.]


M

k =1
2 - "").

214 DISCRETE-PARAMETER MARKOV CHAINS

(iii) Show that it is always possible to construct an admissible binary code of


a l , ... , a, such that

M
H(p) n1 P(ai) 5 H(p) + 1.

[Hint: Select n, such that

log t p(a 1 )

and apply 3(i).]


_< <,
n; log 2 p(a) + 1 for 1 _< _<
i M,

(iv) Verify that there is not a more efficient (admissible) encoding (i.e. minimal
average number of bits) of the symbols a,, a 2 , a 3 for the distribution p(al) = z,
P(a2) = p(a 3 ) = , than the code 0(a 1 ) = ( 0), 0(a 2 ) = ( 1, 0), (a 3 ) = ( 1, 1)
4. (i) Show that Y1 , Y2 ,in (15.4) satisfies the law of large numbers.
...

(ii) Show that for (15.5) to hold it is sufficient that Yl , YZ satisfy the law of large
, ...

numbers.

THEORETICAL COMPLEMENTS

Theoretical Complements to Section II.6


1. Let S be an arbitrary state space equipped with a sigmafield .5 of events. Let y(dx)
be a sigmafinite measure on (S, 9) and let p(x, y) be a transition probability density
with respect to p; i.e., p is a nonnegative measurable function on (S x S, .9' x ,9') such
that for each fixed x, p(x, y) is a p-integrable function of y with total mass 1. Let S2
denote the space of all sequences w = (x 0 x,, ...) of states x i e S. Let .F, denote the
,

class of all finite-dimensional sets A of the form

A={w=(x 0 ,x l ,...)eftx 1 eB 1 ,i = 0,1,.. ,n},

where B ; e 9 for each i. Define PP (A) for such a set A, x e B 0 , by

Px(A)
=L ...
9,
P(x,Y1)P(Y1,Y2)...P(yn_1,Y(dyn)...u(dy1) (T.6.1)

for x e S. Define P(A) = 0 if x 0 B 0 . The Kolmogorov Extension Theorem assures


us that Px has a unique extension to a probability measure defined for all events in
the smallest sigmafield .F of subsets of S2 that contain S. For any probability measure
y on (S, .9"), define

P1(A) = J PX(A)y(dx), A e F. (T.6.2)

Let X. denote the nth coordinate projection mapping on 12. Then {X} is said to be
a Markov chain on the state space S with transition density p(x, y) with respect to p
and initial distribution, y under P. The results of Exercise 6.9 can be extended to this

THEORETICAL COMPLEMENTS 215

setting as follows. Suppose that there is a positive integer r and a p-integrable function
p on S such that fs p(x)p(dx) > 0 and p ' (x, y) > p(y) for all x, y in S. Then there is
( )

a unique invariant distribution it such that

sup p ( " (x, y)p(dy) it(B) (1


B

where a = f s p(x) u(dx), n' = [n/r], and the sup is over B e .Y .

Proof. To see this define

sup p " (u, y)p(dy)


( )

M(B) := uES ^
JB

m"(B):= inf if p " (u, y)p(dy),


( ) (T.6.3)
s "

sup {M(B) m"(B)}.


B

Then

= sup Ip ( " ) " 1.


(x, y) pc (z, y)I u(dy)
)

2 , s

pick+ n.>
(x, y)p(dy) p k+ '(z, y)p(dy)1 ( 1 e)[Mk.(B) mk.(B)]
(ii) Ifil

(iii) The probability measure it given by

n(B) = lim J p " (x, y)p(dy)


( ) (T.6.4)

is well defined and

su ( " ) (x, y)p(dy) it(B) (1 a)"'. (T.6.5)


B p Ip
n

Also, the following facts are simple to check.


(iv) it is absolutely continuous with respect to p and therefore by the
RadonNykodym Theorem (Chapter 0) has a density rz(y) with respect to p.
(v) For Be .9' with 7r(B) > 0, one has Py,(X" e B i.o.) = 1.
2. (Doeblin Condition) A well-known condition ensuring the existence of an invariant
distribution is the following. Suppose there is a finite measure cp and an integer r >_ 1,
and > 0, such that p ' (x, B) < I r whenever (p(B) _< e. Under this condition there
( )

is a decomposition of the state space as S = U;"_ j S i such that


(i) Each Si is closed in the sense that p(x, S ; ) = I for x e S. (1 i m).
(ii) p restricted to Si has a unique invariant distribution rti.
216 DISCRETE-PARAMETER MARKOV CHAINS

(iii) - j p ' (x, B) -+ zr (B)


( ) ; as n -. cc for x e Si .
n r =^

Moreover, the convergence is exponentially fast and uniform in x and B.


It is easy to check that the condition of theoretical complement 1 above implies the
Doeblin condition with Qp(dx) = p(x)p(dx). The more general connections between
the Doeblin condition with the gi(dx) = p(x) p(dx) condition in theoretical complement
1 can be found in detail in J. L. Doob (1953), Stochastic Processes, Wiley, New York,
pp. 190.
3. (Lengths of Increasing Runs) Here is an example where the more general theory
applies.
Let X,, X2 , ... be i.i.d. uniform on [0, 1]. Successive increasing runs among the
values of the sequence X 1 , X2 ,. . . are defined by placing a marker at 0 and then between
X; and X +1 whenever X; exceeds X +1 , e.g., X0.20, 0.24, 0.6010.5010.30, 0.7010.20... .
Let Y. denote the initial (smallest) value in the nth run, and let L. denote the length
of the nth run, n = 1, 2.... .
(i) { Y,} is a Markov chain on the state space S = [0, 1] with transition density

e' - x ify<x
P(x, Y) _

(ii) Applying theoretical complement 1, { Y} has a unique invariant distribution n.


Moreover, it has a density given by n(y) = 2(1 y), 0 < y < 1.
(iii) The limit in distribution of the length L. as n --+ oo may also be calculated from
that of { }}, since

,r,(1_y)m
P(L m) =
fo
P(L m I Y = Y)P(Y E dY) =
0
( m 1)! J P(Y E dY),

and therefore,
ri
y
P(L >, m):= lim P(L >_ m) = J ^ 1
0 )1 2(1 y) dy.
--W o (m 1)!

Note from this that

EL. _ P(L. _> m) = 2.


m=1

Theoretical Complement to Section 11.8


1. We learned about the random rounding problem from Andreas Weisshaar (1987),
"Statistisches Runden in rekursiven, digitalen Systemen 1 und 2, Diplomarbeit erstellt
am Institut fr Netzwerk- und Systemtheorie, Universitt Stuttgart. The stability
condition of Example 8.1 is new, however computer simulations by Weisshaar suggest
that the result is more generally true if the spectral radius is less than 1. However,
this is not known rigorously. For additional background on the applications of this


THEORETICAL COMPLEMENTS 217

technique to the design of digital filters, see R. B. Kieburtz, V. B. Lawrance, and K.


V. Mina (1977), "Control of Limit Cycles in Recursive Digital Filters by Randomized
Quantization," IEEE Trans. Circuits and Systems, CAS-24(6), pp. 291-299, and
references therein.

Theoretical Complements to Section I1.9


1. The lattice case of Blackwell's Renewal Theorem (theoretical complement 2 below)
may be used to replace the large-scale Caesaro type convergence obtained for the
transition probabilities in (9.16) by the stronger elementwise convergence described
as follows.
(i) If j e S is recurrent with period 1, then for any i e S,

lim pi; ) = pj;(E;i; t) ) - `


nom

where p 0 is the probability P; (i^' ) < cc) (see Eq. 8.4).


(ii) If j e S is recurrent with period d> 1, then by regarding p as a one-step transition
law,

i
lim p(Jd) = pijd(E;i^ii)-
n-^m

To obtain these from the general renewal theory described below, take as the
delay Zo the time to reach j for the first time starting at i. The durations of the
subsequent replacements Z 1 , Z 2 , ... represent the lengths of times between returns
to I.

2. (Renewal Theorems) Let Z,, 4,... be independent nonnegative random variables


such that Z,, 4,.. . are i.i.d. with common (nondegenerate) distribution function F,
and Z o has distribution function G. In the customary framework of renewal theory,
components subject to failure (e.g. lightbulbs) are instantly replaced upon failure,
and Z,, Z 2 , ... represent the random durations of the successive replacements. The
delay random variable Zo represents the length of time remaining in the life of the
initial component with respect to some specified time origin. For example, if the
initial component has age a relative to the placement of the time origin, then one
may take

F(x + a) F(a)
G(x) _
1 F(a)
Let

S. = Zo + Z, + + Zn , n ^> 0, (T.9.1)

and let

N,=inf{n>-O:Sn>t}, t>-0. (T.9.2)

We will use the notation S.0, N sometimes to identify cases when Z o = 0 a.s. Then
Sn is the time of the (n + 1)st renewal and N, counts the number of renewals up to

218 DISCRETE-PARAMETER MARKOV CHAINS

and including time t. In the case that Z o = 0 a.s., the stochastic (counting) process
{N} - {N} is called the (ordinary) renewal process. Otherwise {N,} is called a delayed
renewal process.
For simplicity, first restrict attention to the case of the ordinary renewal process.
Let u = EZ l < oo. Then 1/ is called the renewal rate. The interpretation as an
average rate of renewals is reasonable since

Sx -i \ t \
(T.9.3)
N, N, N,

and N, -- oc as t --* oo, so that by the strong law of large numbers

>_
N, -. 1
a.s. as t - oc. (T.9.4)
t p

Since N, is a stopping time for {Sn } ( for fixed t 0), it follows from Wald's Equation
(Chapter I, Corollary 13.2) that
ESN , = pEN,. (T.9.5)

In fact, the so-called Elementary Renewal Theorem asserts that

EN,1
-
as t --* Cc. (T.9.6)
t

To deduce this from the above, simply observe that pEN, = ESN , > t and therefore

EN,1
/
liminf -

- . t p

On the other hand, assuming first that Z. < C a.s. for each n 1, where C is a >,
positive constant, gives pEN, < t + C and therefore for this case, limsup,.,(EN,/t)
I/p. More generally, since truncations of the Z. at the level C would at most decrease
the S, and therefore at most increase N, and EN this last inequality applied to the
truncated process yields

EN,
<- 1
fc
limsup as C -4 co. (T.9.7)
t CP(Z 1 >, C) + xF(dx) p

The above limits (T.9.4) and (T.9.6) also hold in the case that p = co, under the
convention that 1 /0o is 0, by the SLLN and (T.9.7). Moreover, these asymptotics
can now be applied to the delayed renewal process to get precisely the same conclusions
for any given (initial) distribution G of delay.
With the special choice of G = F. defined by

x
F(x) =1 I P(Z, > u)du, x 0, (T.9.8)
u j0

THEORETICAL COMPLEMENTS 219

the corresponding delayed renewal process N, called the equilibrium renewal process,
has the property that

EN(t, t + h] = h/p, for any h, t l 0, (T.9.9)

where

NI(t,t+h]=N+hN,
N. (T.9.10)

To prove this, define the renewal function m(t) = EN t >- 0, for {N1 }. Then for the
general (delayed) process we have

m(t) = EN, = P(N, n) = G(t) + P(N, n+l)

= G(t) + P(N,_ -> n)F(du) = G(t) + J m(t u)F(du)


f0
' 0

= G(t) + m * F(t), (T.9.11)

where * denotes the convolution defined by

m s F(t) = J m(t s)F(ds). (T.9.12)


u

Observe that g(t) = t/p, t -> 0, solves the renewal equation (T.9.11) with G = Fx,;
i.e.,

t
=
u o
1`
(1 F(u)) du +
o
t
F(u) du = F(t) + --
N o 0
F(ds) du JJ
= F^(t) + 1
u J J r duo
' s f EP
F(ds) = F (t) + t s F(ds).
x
.'
(T.9.13)

To finish the proof of (T.9.9), observe that g(t) = m`(t):= EN;, t >- 0, uniquely
solves (T.9.11), with G = F., among functions that are bounded on finite intervals.
For if r(t) is another such function, then by iterating we have

r(t) = F(t) +^^ r(t u)F(du)


0

= F(t) + f { F^(t u) +
.'

(
J t
o
u r(t u s)F(ds)jF(du)

= P(Nj' >- 1) + P(N' -> 2)


+ f r(t v)P(S e dv)

= P(Nf' ^ 1) + P(N, 3 2) + + P(Nf' ? n) + J r(t v)P(S e dv).

(T.9.14)

220 DISCRETE-PARAMETER MARKOV CHAINS

Thus,
r(t) _ P(Nj 3 n) = m W (t)
n =i

since

r Ir(t - v)IP(S E dv) _< sup jr(s)jP(S <_ t) -. 0


.' as n oo;
J o ssr
i.e.,


P(S, -< t) = P(N >_ n) -+ 0 as n -- oo since P(N >, n) = EN < cc.
n =i

Let d be a positive real number and let L d = {0, d, 2d, 3d, ...}. The common
distribution F of the durations Z 1 , Z 2 , ... is said to be a lattice distribution if there
is a number d > 0 such that P(Z 1 e L a ) = 1. The largest such d is called the period of F.

Theorem T.9.1. (Blackwell's Renewal Theorem)


(i) If F is not lattice, then for any h > 0,

EN(t,t+h]= P(t<Sk_<t+h)-^h ast -+oo, (T.9.15)


k=1

where N(t, t + h] := N, +k - NN is the number of renewals in time t to t + h.


(ii) If F is lattice with period d, then

EN " = ( ) P(Sk = nd) - - as n - oo, (T.9.16)


k=^ u

where

Nc") _ (T.9.17)
lsk=nd)
k=0

is the number of renewals at the time nd. In particular, if P(Z, = 0) = 0, then


EN " is simply the probability of a renewal at time nd.
( )

Note that assuming that the limit exists for each h > 0, the value of the limit in
(i), likewise (ii), of Blackwell's theorem can easily be identified from the elementary
renewal theorem (T.9.6) by noting that cp(h):= lim^.. m EN(t, t + h] must then be
linear in h, p(0) = 0, and

q(1)= lim {EN" +, - EN"}

[{EN, -ENo }+{EN 2 -EN,}++{ENn -EN I }]


= 1im

EN" 1
_ (im = -.
n -.. n

THEORETICAL COMPLEMENTS 221

Recently, coupling constructions have been successfully used to prove Blackwell's


renewal theorem. The basic idea, analogous to that outlined in Exercise 6.6 for
another convergence problem, is to watch both the given delayed renewal and an
equilibrium renewal with the same renewal distribution F until their renewal times
begin (approximately) to agree. After sufficient agreement is established, then one
expects the statistics of the given delayed renewal process to resemble more and more
those of the equilibrium process from that time onwards. The details for this argument
follow. A somewhat stronger equilibrium property than (T.9.9) will be invoked, but
not verified until Chapter IV.

Proof. To make the coupling idea precise for the case of Blackwell's theorem with
< oc, let {Z8 : n >- 1} and {Z: n >- l} denote two independent sequences of renewal
lifetime random variables with common distribution F, and let Z o and Z o be
independent delays for the two sequences having distributions G and G = F es ,
respectively. The tilde () will be used in reference to quantities associated with the
latter (equilibrium) process. Let a > 0 and define,

v(r)=inf{n>0:IS for some h}, (T.9.18)


v"(s)=inf{n>,0:IS,jj e. for some n}. (T.9.19)

Suppose we have established that (e-recurrence) P(v(e) < oo) = I (i.e., the coupling will
occur). Since the event {v(e) = n, v(e) = n} is determined by 4, Z 1 , ... , Z and
Z;,, the sequence of lifetimes {Z' +k : k >- l} may be replaced by the
sequence {Z, +k : k >- 1} without changing the distributions of {S}, {N}, etc. Then,
after such a modification for < h/2, observe with the aid of a simple figure that

N(t+r,t+he]1 1S,wt) -<l (t,t+h]l{s ^SI<N(tE,I+h+E]1{s,iEi^^

(T.9.20)
Therefore,

N(t+e,t+hs]N(t,t+h]1 {s , ( , > , t

=N(t+e,t+h e]l^s +(N(t+E,t+hE]N(t,t+h])l (s >g

N(t + s, t + h E] 1 ^ s ,, E
N(t, t + h]I (s , (,) ,, I
N(t, t + h]1 Is , II + N(t, t + h]l {s )(= N(t, t + h])

N(t E, t + h + E]1 (s , } + N(t, t + h]l {s ,,,,,,,


N(te,t+h+e]+N(t,t+h]l 5 )>,}. (T.9.21)

Taking expected values and noting the first, fifth, and seventh lines, we have the
following coupling inequality,

EN(t+r,t+he]E(N(t,t +h]l is ,. ( ,,,)


EN(t, t + h] < EN(t c, t + h + e] + E(N(t, t + h]1 (5 , > ,). (T.9.22)
222 DISCRETE - PARAMETER MARKOV CHAINS

Using (T.9.9),

h+2e
EN(t+s, t+h e]= h - 2s and EN(t E, t+h+e]=
p

Therefore,

EN(t, t + h] hl < ?E + E(N(t, t + h]l {s ,}). (T.9.23)


u
Since A = I Is ,()>$} is independent of {ZN, +k: k >, 1}, we have

E(N(t, t + h]1 (s ,^ >,^) < E o Nh P(SVO > t) = m(h)P(S V(C) > t), (T.9.24)

where E o denotes expected value for the process N h under zero-delay. More precisely,
because (t, t + h] c (t, SN , + h] and there are no renewals in (t, SN ,), we have
N(t, t + h] < inf{k 3 0: ZN, +k > h). In particular, noting (T.9.2), this upper bound
by an ordinary (zero-delay) renewal process with renewal distribution F, is
independent of the event A, and furnishes the desired estimate (T.9.24).
Now from (T.9.23) and (T.9.24) we have the estimate

EN(t, t + h] hl < m(h)P(SV(C) > t) + ?E , (T.9.25)

which is enough, since e> 0 is arbitrary, provided that the initial e-recurrence
assumption, P(v(e) < oo) = 1, can be established. So, the bulk of the proof rests on
showing that the coupling will eventually occur. The probability P(v(c) < cc) can be
analyzed separately for each of the two cases (i) and (ii) of Theorem
T.9.1.
First take the lattice case (ii) with lattice spacing (period) d. Note that for e < d,

v(e) = v(0) = inf{n > 0: S" = 0 for some n}. (T.9.26)

Also, by recurrence of the mean-zero random walk on the integers (theoretical


complement 3.1 of Chapter I), we have P(v(0) < co) = 1. Moreover, S, (0) is a.s. a
finite integral multiple of d. Taking t = nd and h = d with e = 0, we have

EN ( " ) = EN(nd, nd + h] -+ h = d as n - cc. (T.9.27)


P p

For case (i), observe by the HewittSavage zeroone law (theoretical complement
1.2 of Chapter I) applied to the i.i.d. sequence (Z 1 , Z 1 ), (Z 2 , Z 2 ), (Z 3 , Z 3 ), ... , that

P(R " < c i.o. I Z o = z) = 0 or 1,

where R"=
min{ S,fS":5;,S">0,n_>0}=SS,^S "

Now, the distribution of {SR, +; t} ; does not depend on t (Exercise 7.5, Chapter
THEORETICAL COMPLEMENTS 223

->
IV). This, independence of {Z' j } and {S}, and the fact that {Sk + Sk : n 0} does
not depend on k, make {R,, +k } also have distribution independent of k. Therefore,
the probability P(R, < e for some n > k), does not depend on k, and thus

{R < e i.o.} = n {R. < e for some n -> k} (T.9.28)


I,=0

implies P(R < r i.o.) = P(R < e for some n >, 0) -< P(v(e) < oo). Now,

J P(R < s i.o. = z)P(2 0 e dz)


= P(R < e i.o.) = P(R < r for some n)

= J P(R. <e for some n 12 = z)P(2 e dz). (T.9.29)


0 0

The proof that P(R <e for some n z) > 0 (and therefore is 1) in (T.9.29)
follows from a final technical lemma given below on "points of increase" of distribution
functions of sums of i.i.d. nonlattice positive random variables; a point x is called a
point of increase of a distribution function F if F(b) F(a) > 0 whenever a < x < b.

Lemma. Let F be a nonlattice distribution function on (0, co). The set E of points
of increase of the functions F, F* z , F* 3 , . . is "asymptotically dense at co" in the
sense that for any t > 0 and x sufficiently large, E n (x, x + e) 96 0, i.e., the interval
(x, x + e) meets E for x sufficiently large. 0

The following proof follows that in W. Feller (1971), An Introduction to Probability


Theory and Its Applications, 2nd ed., Wiley, New York, p. 147.

Proof. Let a, b E E, 0 <a < b, such that b a < e. Let 1 = (na, nb]. For
a < n(b a), the interval 1 properly contains (na, (n + 1)a), and therefore each
x > a 2 /(b a) belongs to some I., n >, 1. Since E is easily checked to be closed under
addition, the n + 1 points na + k(b a), k = 0,1, ... , n, belong to E and partition
I. into n subintervals of length b a < r. Thus each x > a 2 /(b a) is at a distance
(b a)/2 </2 of E. If for some r > 0, b a -> e for all a, b e E then F must be
a lattice distribution. To see this say, without loss of generality, E -< b a < 2a for
somea,beE.ThenEnl c {na +k(ba):k= 0,1,...,n}.Since(n+l)aeEnl
for a < n(b a), E n I must consist of multiples of (b a). Thus, if c e E then
c + k(b a) e / n E for n sufficiently large. Thus c is a multiple of (b a). n

Coupling approaches to the renewal theorem on which the preceding is based can
be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of
Blackwell's Renewal Theorem," Stoch. Proc. App!., 26, pp. 87-97; K. Athreya, D.
McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math.
Monthly, 851, pp. 809-814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's
Renewal Theorem," Ann. Probab., 5, pp. 482-485.
3. (Birkhoff's Ergodic Theorem) Suppose {X: n -> 0} is a stochastic process on
(S2, .F, P) with values in (S, ,V). The process {X} is (strictly) stationary if for every
pair of integers m >- 0, r >- 1, the distribution of (X 0 , X 1 , ... , X,,) is the same as that

224 DISCRETE-PARAMETER MARKOV CHAINS

>,
of (X X l+ ... , X. + ,). An equivalent definition is: {X} is stationary if the distribu-
tion , say, of X :_ (X0 , X 1 , X2 , ...) is the same as that of T'X :_ (X,, X l + X2 + ...)
for all r 0. Recall that the distribution of (X X, + ...) is the probability measure

_< _< _< _>


induced on (St, .9 x) by the map co -+ (X,(w), X 1 + ,( w), X 2+r (w)....). Here St is
the space of all sequences x = (x 0 , x 1 , x 2 , ...) with x i E S for all i, and .5' is the
smallest sigmafield containing the class of all sets of the form C = {x e St: x i E B, for
0 < i n} where n 0 and B E . ' (0 i n) are arbitrary. The shift transformation
;

T is defined by Tx:= (x i , x 2 , ...) on St, so that T'x = (x XI+ x2+ ...).


Denote by I the sigmafield generated by {X: n 0 }. That is, ! is the class of all
sets of the form G = X -1 C = {co E Q: X(co) E C}, C E .9'x For a set G of this form,
write T '6= {w e Q: TX(w) E C) = {(X,, X 2 ,. . .) E C) _ {X E T 'Cl. Such a set
G is said to be invariant if P(G AT 'G) = 0, where A denotes the symmetric difference.
-

By iteration it follows that if G = {X e C} is invariant then P(G AT 'G) = 0 for all


-

r > 0, where T 'G = {(X X l + X2,,. . .) e C}. Let f be a real-valued measurable


-

function on (5 50) Then cp(w):= f(X(w)) is W- measurable and, conversely, all


t,

1-measurable functions are of this form. Such a function cp = f(X) is invariant if


f(X) = f(TX) a.s. Note that G = {X E C) is invariant if and only if 1 G = l c (X) is
invariant. Again, by iteration, if f(X) is invariant then f(X) = f (T'X) a.s. for all r ? 1.
Given any p -measurable real-valued function cp = f(X), the functions (extended
real-valued)

f(X): = lim n -1 (f(X)+f(TX)+-+f(T'X))


n -m

and

lim n '( f(X)+ ... + f(Tn -1 X))


f(X):= nyt -

are invariant, and the set {7(X) = f(X)} is invariant.


The class 5 of all invariant sets (in 5
) is easily seen to be a sigmafield, which is
called the invariant sigmafield. The invariant sigmafield .1 is said to be trivial if
P(G) = 0 or 1 for every G E J.

Definition T.9.1. The process {X: n > 0} and the shift transformation T are said
to be ergodic if S is trivial.

numbers (Chapter 0, Theorem 6.1).

Theorem T.9.2. (Birkhoff's Ergodic Theorem). Let {X: n _>


The next result is an important generalization of the classical strong law of large

0) be a stationary
sequence on the state space S (having sigmafield .9'). Let f(X) be a real-valued
1-measurable function such that Elf(X)( < oc. Then
(a) n - ' Y_;=
f(TX) converges a.s. and in L' to an invariant random variable g(X),
and
(b) g(X) = Ef(X) a.s. if S is trivial.

We first need an inequality whose derivation below follows A. M. Garcia (1965),


"A Simple Proof of E. Hopf's Maximal Ergodic Theorem, J. Math. Mech., 14,
pp. 381-382. Write
THEORETICAL COMPLEMENTS 225

M(f):= max{0, f(X),f(X)+ f(TX),...,f(X)+ + f(T" - 'X)},


M(f-T)= max{0, f(TX),f(TX)+f(T 2 X),...,f(TX)++f(T"X)},

M(f):= lim M(f)= sup M(f).

(T.9.30)

Proposition T.9.3. (Maxima! Ergodic Theorem). Under the hypothesis of Theorem


T.9.2,

f(X)dP>-0 VGEJ. (T.9.31)


fuW 1 01IG

Proof. Note that f(X)+M(foT)=M^ +1 (f) on the set {MM+ ,(f)>0}. Since
M, + , (f) >, M(f) and {M(f) > 0} c {M + , (f) > 0}, it follows that f (X)
M(f) M(f o T) on {M(f) > 0}. Also, M (f) -> 0, M^(f o T) -> 0. Therefore,
M

f(X)dP> - (M^(f)M^(fT))dP
J(
M(f)>OV G {M(f)>OInG

J J
= M(f) dP M(f o T) dP
O

J G
J
M^(f) dP M(f o T) dP
G

= 0,

where the last equality follows from the invariance of G and the stationarity of
Thus, (T.9.31) holds with {M(f) > 0} in place of {M(f) > 0}. Now let n j cc.

Now consider the quantities

1 n- 1

A,(f):= max{.f(X), -1(.f(X) + f(TX)), ... , f(T`X)


n,_ o

A(f):= tim A(f)= sup A(f).


new n,(

The following is a consequence of Proposition T.9.3.

Corollary T.9.4. (Ergodic Maximal Inequality). Under the hypothesis of Theorem


T.9.1 one has, for every cc If8',

f(A(fc)n G
f(X) dP -> cP({A(f ) > c} n G) VG E 5. (T.9.32)
226 DISCRETE-PARAMETER MARKOV CHAINS

Proof. Apply Proposition T.9.3 to the function f - c to get

f (M(f-^0)nG
f(X) dP -> cP({M(f - c) > 0} n G).

But {M(f - c) > 0} = {A(f - c) > 0} = {A(f) > c}, and {M(f - c) > 0} _
{A(f)>c}. n

We are now ready to prove Theorem T.9.2, using (T.9.31).

Proof of Theorem T.9.2. Write

I - ' I^-'
7(X):= i-- > f(T'X), f(X):= lim - Y f(T'X),
p_Qn.=o n,n.=o (T.9.33)

Gc,,(f )'= {f(X) > c, f(X) <d} (c, dc R' )

;Since GG , d ( f) e 5 and Gc , d (f) c {A(f) > c}, (T.9.32) leads to

f(X) dP -> cP(GC , d ( f )). (T.9.34)


Ld(f)

Now take -f in place of f and note that (- f) _ - f, ( f ) _


- - f
G_ d , _ C (- f) = GC d (f) to get from (T.9.34) the inequality


fG"d(f)
f(X)dP _> dP(GC , d (f )),

i.e.,

f(X)dP < dP(Gc , d (f )). (T.9.35)

Now if c > d, then (T.9.34) and (T.9.35) cannot both be true unless P(G,, d (f )) = 0.
Thus, if c > d, then P(GC , d (f )) = 0. Apply this to all pairs of rationals c > d to get
P(f (X) > f(X)) = 0. In other words, (1/n) y;= f (T'X) converges a.s. to
g(X) := f(X).
To complete the proof of part (a), it is enough to assume f >, 0, since
n - ' j -1 f + (T'X) -- 7(X) a.s. and n - ' 2]o - 'f(TX) - r(X) a.s., where
f + = max{ f, 0}, - f - = min{ f, 0}. Assume then f > 0. First, by Fatou's Lemma
and stationarity of {X},

-'
1
E7(X) < lim E - f(T'(X)) = Ef(X) < oo.
n-ao n r=o

To prove the L'-convergence, it is enough to prove the uniform integrability of the


sequence {(1/n)S(f):n_>1}, where S(f):=Z 'f(T'X). Now since f(X) is
-

nonnegative and integrable, given s > 0 there exists a constant N E such that

THEORETICAL COMPLEMENTS 227

II f (X) fE(X)II l <r where fE (X) := min{ f (X ), N.J. Then

J 1 S"(f) dP 5 J 1 S"(f fE) dP + S"(.ff) dP


t.S^(f>>a1 nn
Ins.,ti)>,1 n

+ N"P({n S"(f) > })

S e + N,Ef(X)/.l. (T.9.36)

It follows that the left side of (T.9.36) goes to zero as .l --p oo, uniformly for all n.
Part (b) is an immediate consequence of part (a).

Notice that part (a) of Theorem T.9.2 also implies that g(X) = E(f (X) J).
Theorem T.9.2 is generally stated for any transformation T on a probability space
(S2, ^, p) satisfying p(T 'G) = p(G) for all G e ^. Such a transformation is called
-

measure-preserving. If in this case we take X to be the identity map: X(cu) = w, then


parts (a) and (b) hold without any essential change in the proof.

Theoretical Complements to Section 11.10

1. To prove the FCLT for Markov-dependent summands as asserted in Theorem 10.2,


first consider

X= + +Z
Q^

Since Z i , . are i.i.d. with finite second moment, the FCLT of Chapter I provides
that {X;"} converges in distribution to standard Brownian motion. The corresponding
result for {W"(t)} follows by an application of the Maximal Inequality to show

sup I X;" W " E . iJ (t)I - 0 in probability as n -+ cc, (T.10.1)


osfsI

where t is the first return time to j.

Theoretical Complements to Section 11.12

There are specifications of local structure that are defined in a natural manner but
for which there are no Gibbs states having the given structure when, for example,
A = Z, but S is not finite. As an example, one can take q to be the transition matrix
of a (general) random walk on S = Z such that q = q11-1 > 0 for all i, j. In this case
;;

no probability distribution on S^ exists having the local structure furnished by (12.10).


For proofs, refer to the papers of F. Spitzer (1974), "Phase Transition in One-
Dimensional Nearest-Neighbor Systems," J. Functional Analysis, 20, pp. 240-254;
H. Kesten (1975), "Existence and Uniqueness of Countable One-Dimensional Markov
Random Fields," Ann. Probab., 4, pp. 557-569. The treatment here follows F. Spitzer
(1971), "Random Fields and Interacting Particle Systems," MAA Lecture Notes,
Washington, D.C.
228 DISCRETE-PARAMETER MARKOV CHAINS

Theoretical Complements to Section II.14


(Markov Processes and Iterations of I.I.D. Maps) Let p(x; dy) denote a transition
probability on a state space (S, 91); that is, (1) for each x e S. p(x; dy) is a probability
measure on (S, 9), and (2) for each B e.9', x --+ p(x; B) is .9'-measurable. We will
assume that S is a Borel subset of a complete separable metric space, and .9' its Borel
sigmafield M(S). It may be shown that S may be "relabeled" as a Borel subset C of
[0, 1], with M(C) as the relabeling of .^(S). (See H. L. Royden (1968), Real Analysis,
2nd ed., Macmillan, New York, pp. 326-327). Therefore, without any essential loss
of generality, we take S to be a Borel subset of [0, 1]. For each x e S, let F(.)
denote the distribution function of p(x; dy): FX (y)1= p(x; S o (oo, y]). Define
Fx ' (t) := inf {y e 11': FF (y) > t}. Let U be a random variable defined on some
probability space (f2, , , P), whose distribution is uniform on (0, 1). Then it is
simple to check that P(Fx'(U) _< y) > P(F x (y) > U) = P(U < FX (y)) = Fx (y), and
P(F'(U) _< y) _< P(FF (y) 3 U) = P(U S FX(y)) = Fx(y). Therefore, P(FX'(U) _< y) _
F,(y), that is, the distribution of Fx'(U) is p(x; dy). Now let U,, U2 ,. . . be a sequence
of i.i.d. random variables on (S2, , P), each having the uniform distribution on (0, 1).
Let X0 be a random variable with values in S. independent of {U}. Define
X + 1 = f(X, U,, ,) (n > 0), where f(x, u) := Fx '(u). It then follows from the above
that {X: n >_ 0} is a Markov process having transition probability p(x; dy), and initial
distribution that of X0 .
Of course, this type of representation of a Markov process having a given transition
probability and a given initial distribution is not unique.
For additional information, see R. M. Blumenthal and H. K. Corson (1972), "On
Continuous Collections of Measures, Proc. 6th Berkeley Symposium on Math. Stat.
and Prob., Vol. 2, pp. 33-40.
2. Example I is essentially due to L. E. Dubins and D. A. Freedman (1966), "Invariant
Probabilities for Certain Markov Processes, Ann. Math. Statist., 37, pp. 837-847.
The assumption of continuity of the maps is not needed, as shown in J. A. Yahav
(1975), "On a Fixed Point Theorem and Its Stochastic Equivalent," J. App!.
Probability, 12, pp. 605-611. An extension to multidimensional state space with an
application to time series models may be found in R. N. Bhattacharya and O. Lee
(1988), "Asymptotics of a Class of Markov Processes Which Are Not in General
Irreducible," Ann. Probab., 16, pp. 1333-1347. Example l(a) may be found in
L. J. Mirman (1980), "One Sector Economic Growth and Uncertainty: A Survey,"
Stochastic Programming (M. A. H. Dempster, ed.), Academic Press, New York.
It is shown in theoretical complement 3 below that the existence of a unique
invariant probability implies ergodicity of a stationary Markov process. The SLLN
then follows from Birkhoff's Ergodic Theorem (see Theorem T.9.2). Central limit
theorems for normalized partial sums may be derived for appropriate functions on
the state space, by Theorem T.13.3 in the theoretical complements of Chapter V.
Also see, Bhattacharya and Lee, loc. cit.
Example 3 is due to M. Majumdar and R. Radner, unpublished manuscript.
K. S. Chan and H. Tong (1985), "On the Use of Deterministic Lyapunov Function
for the Ergodicity of Stochastic Difference Equations," Advances in App!. Probability,
17, pp. 666-678, consider iterations of i.i.d. piecewise linear maps.
3. (Irreducible Markov Processes) A transition probability p(x; dy) on the state space
(S,5) is said to be co-irreducible with respect to a sigmafinite nonzero measure q if,
for each x e S and Be .9' with q(B) > 0, there exists an integer n = n(x, B) such that
THEORETICAL COMPLEMENTS 229

p "^(x, B) > 0. There is an extensive literature on the asymptotics of cp-irreducible


(

Markov processes. We mention in particular, N. Jain and B. Jamison (1967),


"Contributions to Doeblin's Theorem of Markov Processes," Z. Wahrscheinlichkeits-
theorie und Venn Gebiete, 8, pp. 19-40; S. Orey (1971), Limit Theorems for Markov
Chain Transition Probabilities. Van Nostrand, New York; R. L. Tweedie (1975),
"Sufficient Conditions for Ergodicity and Recurrence of Markov Chains on a General
State Space," Stochastic Process App!., 3, pp. 385-403. Irreducible Markov chains
on countable state spaces S are the simplest examples of cp-irreducible processes; here
cp is the counting measure, ep(B)'= number of points in B. Some other examples are
given in theoretical complements to Section 11.6.
There is no general theory that applies if p(x; dy) is not cp-irreducible, for any
sigmafinite q. The method of iterated maps provides one approach, when the Markov
process arises naturally in this manner. A simple example of a nonirreducible p is
given by Example 2. Another example, in which p admits a unique invariant
probability, is provided by the simple linear model: X" = ZX,, + E" + ,, where F" are
i.i.d., P(e" = i) = P(F" = i) = z.

4. (Ergodicity, SLLN, and the Uniqueness of Invariant Probabilities) Suppose


{XX : it -> 0} is a stationary Markov process on a state space IS, 9'), having a transition
probability p(x; dy) and an invariant initial distribution it. We will prove the following
result: The process {X"} is ergodic if and only if there does not exist an invariant
probability n' that is absolutely continuous with respect to it and different from it.
The crucial step in the proof is to show that every (shift) invariant bounded
measurable function h(X) is a.s. equal to a random variable g(Xo ) where g is a
measurable function on (S, .9'). Here X= (X0 , X 1 , X 2 , . ..), and we let T denote the
shift transformation and 5 the (shift) invariant sigmafield (see theoretical complement
9.3). Now if h(X) is invariant, h(X) = h(T"X) a.s. for all n > 1. Then, by the Markov
property, E(h(X) I cr(Xo , ... , XX }) = E(h(T"X) I a{X 0 , ... , X,,}) = E(h(T"X) 1 6{X"}) =
g(X"), where g(x) = E(h(X0 , X . ..) I X0 = x). By the Martingale Convergence
Theorem (see theoretical complement 5.1 to Chapter IV, Theorem T.5.2), applied to
the martingale g(X) = E(h(X) a{Xo , ... , X"}), g(X) converges a.s., and in L', to
E(h(X) I a{Xo , X ...}) = h(X). But g(X) h(X) = g(X) h(T"X) has the same
distribution as g(X0 ) h(X) for all n > 1. Therefore, g(Xo ) h(X) = 0 a.s., since the
limit of g(X) h(X) is zero a.s. In particular, if G e .S then there exists B E . ' such
that {X 0 e B} = G a.s. This implies it(B) = P(X0 e B) = P(G). If {X"} is not ergodic,
then there exists G ei such that 0 < P(G) < I and, therefore, 0 < n(B) < 1 for a
corresponding set BE .y as above. But the probability measure r1 B defined by:
n e (A) = tr(A n B)/n(B), A e.9", is invariant. To see this observe that $ p(x; A)i B (dx) =
f B p(x; A)rz(dx)/ic(B) = P(X0 e B, X, e A)/ir(B) = P(X, E B, X, E A)/it(B) (since {X o e B}
is invariant) = P(X0 E A n B)/n(B) (by stationarity) = n(A r B)/tc(B) = tt 8 (A). Since
i,(B) = 1 > rz(B), and tt B is absolutely continuous with respect to n, one half of the
italicized statement is proved.
To prove the other half, suppose {X"} is ergodic and n' is also invariant and
absolutely continuous with respect to n. Fix A e Y. By Birkhoff's Ergodic Theorem,
and conditioning on X 0 , (I/n) Z;= p ' (x; A) converges to it(A) for all x outside a set
( )

of zero it-measure. Now the invariance of n' implies f (1/n) p(')(x; A)zr'(dx) = rz'(A)
for all n. Therefore, n'(A) = rz(A). Thus it' = it, completing the proof.
As a very special case, the following strong law of large numbers (SLLN) for
Markov processes on general state spaces is obtained: If p(x; dy) admits a unique
invariant probability rr, and {X": n -> 0} is a Markov process with transition probability
230 DISCRETE-PARAMETER MARKOV CHAINS

p and initial distribution n, then (1/n) j ' f (X,) converges to f f (x)n(dx) a.s. provided
-

f
that If (x)I i(dx) < co. This also implies, by conditioning on X 0 , that this almost
sure convergence holds under all initial states x outside a set of zero it-measure.
5. (Ergodic Decomposition of a Compact State Space) Suppose S is a compact metric
space and S = .l(S) its Borel sigmafield. Let p(x; dy) be a transition probability on
(S, s(S)) having the Feller property: x -* p(x; dy) is weakly continuous on S into
p(S)the set of all probability measures on (S,.R(S)). Let T* denote the map on
9(S) into a(S) defined by: (T*)(B) = $ p(x; B)p(dx) (Be f(S)). Then T* is weakly
continuous. For if probability measures ^ converge weakly to p then, for every
real-valued bounded continuous f on S, J f d(T*p") = f (f f (y)p(x; dy))p ^ (dx)
f
converges to ($ f(y)p(x; dy))p(dx) =If d(T*p), since x -a f f(y)p(x;dy) is continuous
by the Feller property of p.
Let us show that under the above hypothesis there exists at least one invariant

_>
probability for p. Fix p e P1(S). Consider the sequence of probability measures

1^ '
-

^'= T *',u (n 1),


n r=o

where

T *I = T*p, and T*('"y = T*(T (r 1).


T *0 p = u, *rp)

Since S is compact, by Prohorov's Theorem (see theoretical complement 8.2 of Chapter


I), there exists a subsequence {p.} of {p"} such that p". converges weakly to a
probability measure n, say. Then T *p , converges weakly to T *n. On the other hand,

J f d^ . - J f d(T *u )I =4 f
nJ
f dy - f f d(T*" p)) -< (sup{f(x)I: x e S })(2/n') -+ 0,

as n' - oo. Therefore, {p} and {T*p".} converge to the same limit. In other words,
it = T*n, or it is invariant. This also shows that on a compact metric space, and with
p having the Feller property, if there exists a unique invariant probability it then
T*p 1_ (1/n) T*'p converges weakly to n, no matter what (the initial
distribution) p is.
Next, consider the set .f = .alp of all invariant probabilities for p. This is a convex
and (weakly) compact subset of P1(S). Convexity is obvious. Weak compactness follows
from the facts (i) q(S) is weakly compact (by Prohorov's Theorem), and (ii) T* is
continuous for the weak topology on 9(S). For, if u ^ e .elf and ^ converges weakly
to p, then ^ = T*" converges weakly to T*p. Therefore, T*p = p. Also, P1(S) is a
metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric
Spaces, Academic Press, New York, p. 43). It now follows from the Krein-Milman
Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York,
p. 207) that di is the closed convex hull of its extreme points. Now if {X ^ } is not
ergodic under an invariant initial distribution n, then, by the construction given in
theoretical complement 4 above, there exists B e P(S) such that 0 < n(B) < I and
it = it(B)it B + n(B`)i B .,, with n B and iB ., mutually singular invariant probabilities. In
other words, the set K, say, of extreme points of d# comprises those it such that {X"}
with initial distribution it is ergodic. Every it e .i is a (weak) limit of convex
combinations of the form .1;'p;" ( n -+ cc), where 0 < A;^ < 1, .l;' = 1, ;' e K.
)
THEORETICAL COMPLEMENTS 231

Therefore, the limit it may be expressed uniquely as it = f K pm(dp), where m is a


probability measure on (K, :. (K)). This means, for every real-valued bounded
continuous f, 1, f drt = Sic (f s f dp)m(d).

Theoretical Complements to Section II.15


1. For some of Claude Shannon's applications of information theory to language
structure, see C. E. Shannon (1951), "Prediction and Entropy of Printed English,"
Bell System Tech. J., 30(1), pp. 50-64. The basic ideas originated in C. E. Shannon
(1948), "A Mathematical Theory of Communication," Bell System Tech. J., 27,
pp. 379-423, 623-656. There are a number of excellent textbooks and references
devoted to this and other problems of information theory. A few standard references
are: C. E. Shannon and W. Weaver (1949), The Mathematical Theory of
Communications, University of Illinois Press, Urbana; and N. Abramson (1963),
Information Theory and Coding. McGraw-Hill, New York.
CHAPTER III

BirthDeath Markov Chains

1 INTRODUCTION TO BIRTHDEATH CHAINS

Each of the simple random walk examples described in Section 1.3 has the
special property that it does not skip states in its evolution. In this vein, we
shall study time-homogeneous Markov chains called birthdeath chains whose
transition law takes the form

; ifj =i +1
S ; ifj =i -1
(1.1)
a i ifj =i
0 otherwise,

where a + , + b = 1. In particular, the displacement probabilities may depend


; ;

on the state in which the process is located.

Example 1. (The BernoulliLaplace Model). A simple model to describe the


mixing of two incompressible liquids in possibly different proportions can be
obtained by the following considerations. Consider two containers labeled
box I and box II, respectively, each having N balls. Among the total of 2N
balls, there are 2r red and 2w white balls, I < r < w. At each instant of time,
a ball is randomly selected from each of the boxes, and moved to the other
box. The state at each instant is the number of red balls in box I.
In this example, the state space is S = {O, 1, ... , 2r} and the evolution is a
Markov chain on S with transition probabilities given by

for I i<,2r-1,
233
234 BIRTHDEATH MARKOV CHAINS
r
(w+4i)(2r i)
P1,i+ 1 =
(w + r) 2
CC.-V-^:)

_ i(2r mal) + .(2r^r(w + r i)


P" (1.2)
(2 + r) 2 (w + r) 2
i(w r + i)
Pi,i-1 =
(w + r) 2
and
w r
Poo =P2r,2r= w+r

(1.3)
2r
Pol = P2r,2r - 1 = w + r

Just as the simple random walk is the discrete analogue of Brownian motion,
the birthdeath chains are the discrete analogues of the diffusions studied in
Chapter V.
Most of this chapter may be read independently of Chapter II.

2 TRANSIENCE AND RECURRENCE PROPERTIES

The long-run behavior of a birthdeath chain depends on the nature of its


(local) transition probabilities pi,i+1 = ., p;,i-1 = S i at interior states i as well
as on its transitions at boundaries, when present. In this section a case-by-case
computation of recurrence properties will be made according to the presence
and types of boundaries.

CASE I. Let {X} bean unrestricted birthdeath chain on S = {O, 1, 2, ...} = 7L.
The transition probabilities are

1 = i, Pi,e-1 = ai, Pi,i = 1 i Si (2.1)

with

0<i<1, 0<,<1, .+bl. (2.2)

Let c, d e S, c < d, and write

i(i) = P({X} reaches c before d I Xo = i) = Pi (T < Td ) (c < i < d), (2.3)

where Tr denotes the first time the chain reaches r. Now,

^(i) = ( 1 i 1 )i(i) + rli(i + 1) + S1'(i 1),


TRANSIENCE AND RECURRENCE PROPERTIES 235

or equivalently,

,(i/i(i + 1) iji(i)) = b.(iJi(i) t'(i 1)) (c + I 5 i <, d 1). (2.4)

The boundary conditions for Ii are

ci(c)= 1, li(d)=0. (2.5)

Rewrite (2.4) as

i/i(i + 1) i(i) = (ii(i) 1)), (2.6)


i

for (c + I < i < d 1). Iteration now yields

/x bx-l ... b^+l


(x+ 1) ^i(x) = S (^(c + 1) ^i(c)) (2.7)
#X #X 1 #c + l

for c + l < x < d 1. Summing (2.7) over x = y, y + 1, ... , d 1, one gets

xbx-1
. :. 61
0(d) 0(y) = d - 1 S +1 (0(c + 1) 0(c)). (2.8)
x=yxl'x-1 ... Nc+l

Let y = c + 1 and use (2.5) to get

d-1 Sxax-1..'Sc+1

NxNx 1 ' ' ' /'c+ 1


0(c + 1) = X=+ 1 ------- -- . (2.9)
Sx6..Sc+ 1
1 +
x=c+1 Nxx 1' .. #C+1

Using this in (2.8) (and using fr(d) = 0, i/i(c) = 1) one gets

d-1 5x5x-1 "'Sc +l


= xx-
x=y l ^+ l
(y)-- (c + l <, y < d 1). (2.10)
d-1 S
xSx-1c + 1
1+ E /^(^ /^
x=c+1 xx-1' ' .Yc+l

Let p y, denote the probability that starting at y the process eventually reaches
c after time 0, i.e.,

py, = PP (X = c for some n? 1). (2.11)

Then (Exercise 1),


236 BIRTHDEATH MARKOV CHAINS

if dx x _1_+l = 0
p = lim ^i(y)
y ^
= 1
di x x=c+l !'xl'x I " ' f'c+ l

axSx-1...6C < x (c < y). (2.12)


< I if
x=c+1#X#Xl' . 'I'c+l

Since, for c + 1 < 0,

Sxax-1'_c+1 = 0 ac+1 c+2" 'ax

x=c+1 /'xYx-1 ... c+l x=[c+1 lac+11'c+2 ... 1'x

+ ac+ISc+2 ...bo x M2...gx ( 2.13)


Nc+lc+2 *'Ox 1 F12"'x

and a similar equality holds for c + 1 > 0, (2.12) may be stated as

R IS'
2 ax = cc
i=l for ally > c iff Y a
x=1 12 '

< I for all y> c if Y x < 00. (2.14)


x=1/'1N2"'Nz

By relabeling the states i as i (i = 0, + 1, 2, ...), one gets (Exercise 2)

0
x x+ = 0
p yd = 1 for all y < d iff F
x=oo axax+l"'SO

0
<1 for all y < d iff Y xx+l' - < oo (2.15)
x=m Sxsx+1" 60

By the Markov property, conditioning on X I (Exercise 3),

pYY 6 ypY 1.Y + p


-
+ 1'y + (1 Sy /3 y ). (2.16)

If both sums in (2.14) and (2.15) diverge, then p Y _ l , y = 1, p y+l , y = 1, so that


(2.16) implies

pyy = 1, for all y. (2.17)

In other words, all states are recurrent.


If one of the sums (2.14) or (2.15) is convergent, say (2.14), then by (2.16)
we get

pyy < 1, y e S. (2.18)


TRANSIENCE AND RECURRENCE PROPERTIES 237

A state y e S satisfying (2.18) is called a transient state; since (2.18) holds for
all y e S, the birthdeath chain is transient. Just as in the case of a simple
asymmetric random walk, the strong Markov property may be applied to see
that with probability 1 each state occurs at most finitely often in a transient
birthdeath Markov chain.

CASE II. The next case is that of two reflecting boundaries. For this take
S = {0, 1, 2, .. , N } , P00 = 1 0 , POI = 0' PN.N-1 = 6 N' PN.N = 1 6N, and
Pi.j+ 1 = fli, Pi.r-1 = b1, pi,; = 1 . d ; for I <, i < N -- 1. If one takes c = 0,
d = N in (2.3), then fr (y) gives the probability that the process starting at y
reaches 0 before reaching N. The probability 4(y), for the process to reach N
before 0 starting at y, may be obtained in the same fashion by changing the
boundary conditions (2.5) to c(0) = 0, (N) = I to get that q (y) = I Ji(y).
Alternatively, check that b(y) - I ^i(y) satisfies the equation (2.6) (with 0
replacing 0) and the boundary conditions (P(0) = 0, 4(N) = 1, and then argue
that such a solution is necessarily unique (Exercise 4). All states are recurrent,
by Corollary 9.6 (see Exercise 5 for an alternative proof).

CASE III. For the case of one absorbing boundary, say at 0, take
S = j0, 1, 2, ...1, Poo = 1 , Pi.i+1 = #i, Pi.^-1 = b;, Pi.; = 1 ; S ; for i > 0;
; , 6 i > 0 for i > 0, fl + 1 < 1. For c, d e S, the probability Ji(y) is given by
(2.10) and the probability p, which is also interpreted as the probability of
eventual absorption starting at y> 0, is given by

...
d-1 ax ax l - bl
Y ..- 1 . . Ij1
p Ya =hm d-1

dtv 1 + Sxbx-1_..51

x=1 x-1 ... Y1

= 1 iff 2 a lb
/j J^ a = oo (for y > 0). (2.19)
x=1 I2'''Yx

Whether or not the last series diverges,

...6, > 0 , for all y> 0 (2.20)


and
P Yd 1 6 y 6 y _ 1 * (6 1 < l , ford > y > 0,
(2.21)
Pod=O foralld>0.
,

By (2.16) it follows that

pYY < 1 (y > 0). (2.22)

Thus, all nonzero states y are transient.


238 BIRTH-DEATH MARKOV CHAINS

CASE IV. As a final illustration of transiencerecurrence conditions, take the


case of one reflecting boundary at 0 with S = {0, 1, 2, 3, ...} and Poo = I o,
Poi =o, pi.r+ =' = p ;.; = 1 ; b ; for i > 0; 1 >O for all i,
b ; > 0 for i > 1, i + 6 i < 1. Let us now see that all states are recurrent if and
only if the infinite series (2.19) diverges, i.e., if and only if p yo = 1.
First assume that the infinite series in (2.19) diverges, i.e., p yo = I for all
y > 0. Then condition on X, to get

Poo = ( 1 o) + opio, (2.23)

so that

Poo = 1. (2.24)

Next look at (see Eq. 2.16)

Pu = 6 1Poi + 1P21 + (1 6 1 l). (2.25)

Since P 20 = 1 and the process does not skip states, P 22 = 1. Also, p ol = 1


(Exercise 6). Thus, p ll = I and, proceeding by induction,

p= 1, for each y > 0. (2.26)

On the other hand, if the series in (2.19) converges then p^, o < 1 for all y > 0.
In particular, from (2.23), we see Poo < 1. Convergence of the series in (2.19)
also gives p r,, < 1 for all c < y by (2.12). Now apply (2.16) to get

p<1, for ally (2.27)

whenever the series in (2.19) converges. That is, the birthdeath chain is transient.

The various remaining cases, for example, two absorbing, or one absorbing
and one reflecting boundary, are left to the Exercises.

3 INVARIANT DISTRIBUTIONS FOR BIRTHDEATH CHAINS

Suppose that there is a probability distribution it on S such that

n'p = n'. (3.1)


Then n'p" = it' for each time n = 1, 2, .... That is, it is invariant under the
transition law p. Note that if {X"} is started with an invariant initial
distribution it then X" has distribution it at each successive time point n.
Moreover, {X"} is stationary in the sense that the P,,-distribution of
(X0 , X,, ... , X.) is for each m > I invariant under all time shifts, i.e., for all
INVARIANT DISTRIBUTIONS FOR BIRTH DEATH CHAINS 239

k>,1,

Pn(xo = io, ... , Xm Im) = Pn (Xk = 1p, ... > Xm+k = I m ). (3.2)

For a birth-death process on S = {0, 1, 2, ... , N} with two reflecting


boundaries, the invariant distribution it is easily obtained by solving n'p' = n',
i.e.,

no(1 o) + ir rbi = ir o
(3.3)
7i ifli i +(l i ai) + mi+Ibi+1 = ni (j = 1,2,...,N 1),

or

7[i-i-^ ni(i + S i ) + ni+^Sj+1 = 0. (3.4)

The solution, subject to ni 0 for all j and J] i rc i = 1, is given by

n i n (l<j<N),
(3.5)
...iI) 1
N ot
7Cp
= 1 +
Y_(

i =1 CS1(52...(SJ

For a birth-death process on S = {0, 1, 2, ...} with 0 as a reflecting


boundary, the system of equations n'p = n' are

7ro(I o)+n151=ito,
(3.6)
71 i -Ii -^+it(I i S i )+7r i +la i +^=j (j>11).

The solution in terms of n o is

7C = o i ...i 1 n o (j% 1). (3.7)


a 1 a 2 ...Si
In order that this may be a probability distribution one must have

01 .J- 1
.
< co. (3.8)
i =1 6162..81

In this case one must take

/
n o =l +
1 .. .
1 1
. (3.9)
...6 1
16 2
)
i =1
240 BIRTHDEATH MARKOV CHAINS

For an unrestricted birth-death process on S = {0, 1, 2,. . .} the


equations n' p' = n' are

7 rj-1Ni-1 + 1Cj(I /'j Uj) + 7C j+1 b j+1 = 7Cj (j = 0, + 1, 2,...) (3.10)

which are solved (in terms of n o ) by

ol ... j-1
7t, 6162...gj Ro (J% 1 ),
(3.11)
aj+16j+2 07r0
(j1< 1).
+r * * F'-1

This is a probability distribution if and only if


This

Y_ 6
bj+1 . j+2 E . 01...j-1
< 00, (3.12)
j<-1 Pifli+1 .-1 .j>-1 6 1 2 ...(a j

in which case

aj+lbj+2
...S 0 + ol ... i 1
7< o= 1+ (3.13)
j_' 1 Yjl'j+l
-
/^/^
...
I j>-1 6162...(ai
Notice that the convergence of the series in (3.12) implies the divergence of
the series in (2.14), (2.15). In other words, the existence of an equilibrium
distribution for the chain implies its recurrence. The same remark applies to
the birth-death chain with one or two reflecting boundaries.

Example 1. (Equilibrium for the Bernoulli-Laplace Model). For the Bernoulli-


Laplace model described in Section 1, the invariant distribution it =
(n : i = 0, 1, ... , 2r) is the hypergeometric distribution calculated from (3.5) as
;

2r 2w
_ o -1
j_ 1 2r(w + r) i (w + r i)(2r i) (j X W + j
...
r
^j S 1 b j "o j(2r+j) j i(wr +i) 2w+2r
(w+r)
(3.14)
The assertions concerning positive recurrence contained in Theorem 3.1
below rely on the material in Section 2.9 and may be omitted on first reading.
Recall from Theorem 9.2(c) of Chapter I that in the case that all states
communicate with each other, existence of an invariant distribution is equivalent
to positive recurrence of all states.
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 241

Theorem 3.1

(a) For a birth-death chain on S = {O, 1, ... , N} with both boundaries


reflecting, the states are all positive recurrent and the unique invariant
distribution is given by (3.5).
(b) For a birth-death chain on S = {0, 1, 2, ...} with 0 a reflecting boundary,
all states are recurrent or transient according as the series

c` 66
12

[J1 1 Nx

diverges or converges. All states are positive recurrent if and only if the
series (3.8) converges. In the case that (3.8) converges, the unique
invariant distribution is given by (3.7), (3.9).
(c) For an unrestricted birth-death chain on S = {0, 1, 2, ...} all states
are transient if and only if at least one of the series in (2.14) and (2.15)
is convergent. All states are positive recurrent if and only if (3.12) holds;
if (3.12) holds, then the unique invariant distribution is given by (3.11),
(3.13).

4 CALCULATION OF TRANSITION PROBABILITIES BY


SPECTRAL METHODS

We will apply the spectral theorem to calculate p", for n = 1, 2, ... , in the case
that p is the transition law for a birth-death chain.
First consider the case of a birth-death chain on S = {0, 1, ... , N} with
reflecting boundaries at 0 and N. Then the invariant distribution it is given by
(3.5) as

1
7z 1 = n o , 71 t = '^ '-1 71 0 (2 < j < N). (4.1)
b l51...51

It is straightforward to check that

i m.i-1 =m1b1=ni-1i-1 = 7 Ei-1Pi-1.;, i= 1,2.....N, (4.2)

from which it follows that

7T ; p i; = n j p j for all i, j. (4.3)

In the applied sciences the symmetry property (4.3) is often referred to as detailed
balance or time reversibility. Introduce the following inner product ( . )" in the
vector space R"+'
242

(x, Y)n =
i
Y_
N

=o
xiYi 7ry x = (x 0 , x i , ... , xN)',

so that the "length" IlxiI,, of a vector x is given by


BIRTHDEATH MARKOV CHAINS

Y = (Yo, Yi, ... , YNY, (4.4)

i xI
(iO
1JZ . ( 4.5)

With respect to this inner product the linear transformation x px is symmetric


since by (4.3) we have
N N N N
(px, y),, = Pijxj Yiii = PjixjYiitj
i =0 j =o i=O j=O

Y,
i=O UO PjiYiJxj7rj

= i=O Y- PijYj)xi 71 i = ( PY, x). = (x, PY)n.


Z (=O (4.6)

Y_
Therefore, by the spectral theorem, p has N + 1 real eigenvalues a o , a l , ... , a N
(not necessarily distinct) and corresponding eigenvectors (0o, 4' , dN, which
are of unit length and mutually orthogonal with respect to ( , ). Therefore,
the linear transformation x --> px has the spectral representation
N
P= akEk ,
k=0
(4.7)
N
x= E ak(4>k, x)aek ,
k=O

where Ek denotes orthogonal projection with respect to ( , ) onto the


one-dimensional subspace spanned by 4: E k x = ( 4 k , x),1 4 k . It follows that
N
p= Z akEk ,
k=O
N (4.8)
Pn x = Y- ak(4)k, x)nek
k=0

Letting x = e j denote the vector with 1 in the jth coordinate and zeros elsewhere,
one gets

p;7) = ith element of p"ej


N N
_ ak( ok, ej),Oki = akOki4kjnj. (4.9)
k=0 k=0
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 243

Without loss of generality, one may take a o = I and 0 o = 1 throughout. We


now consider two special birth-death chains as examples.

Example 1. (Simple Symmetric Random Walk with Two Reflecting Boundaries)

S={0,1,.. ,N}, p;,;+a=Pj,^-1 =i, for1_<i<N-1,


Pol = 1 =PN ,N-1

In this case the invariant initial distribution is given by

1 I
7Z j =N (1 <j<,N-1), tr o =zZ N =2
N . (4.10)

If a is an eigenvalue of p, then a corresponding eigenvector x = (x o , x 1 , ... , x N )'


satisfies the equation

z(x j _ 1 +x j+l )=ax j (1<,j<N-1), (4.11)

along with "boundary conditions"

x l = aX0, XN_ 1 = axN. (4.12)

As a trial solution of (4.11) consider x j = 0 1 for some nonzero 0. Since all


vectors of C N+ 1 may be expressed as unique linear combinations of functions
j - exp{ (2rn/N + 1) ji} = 0', with i = (-1) 1 / 2 , one expects to arrive at the right
combination in this manner. Then (4.11) yields '-z (B' -1 + B' +1 ) = cth , i.e.,

0 2 - 2at + 1 = 0, (4.13)

whose two roots are

B 1 =a+i,,/T-c , 0 2 =a-ijl-a 2 . (4.14)

The equation (4.11) is linear in x, i.e., if x and y are both solutions of (4.11)
then so is ax + by for arbitrary numbers a and b. Therefore, every linear
combination

xj=A(cc)0 +B(a)0z (0<j<N) (4.15)

satisfies (4.11). We now apply the boundary conditions (4.12) to fix A(a), B(a),
up to a constant multiplier. Since every scalar multiple of a solution of (4.11)
and (4.12) is also a solution, let us fix x o = 1. Note that x o = 0 implies x 3 = 0
for all j. Letting j = 0 in (4.15), one has

A(a) + B(a) = 1, B(a) = I - A(a). (4.16)


244 BIRTH-DEATH MARKOV CHAINS

The first boundary condition, x, = ax, = a, then becomes

A(a)(0 1 0 2 ) + 0 2 = a, (4.17)

or,

2A(a)(l a 2 ) 1' 2i = ( 1 a 2 )'' 2 i,

i.e., at least for a 1,

A(a) = i, B(a) ='-z . (4.18)

The second boundary condition x N - 1 = ax N may then be expressed as

( 1 + 0) = 2 (0; + 0z). (4.19)

Now write 0 1 = e`o, 0 2 = e - ' O, where 0 is the unique angle in [0, it] such that
cos 4) = a. Note that cosine is strictly decreasing in [0, n] and assumes its entire
range of values [-1, 1] on [0, 7c]. Note also that this is consistent with the
requirement sin 4) = 1 a 2 0. Then (4.19) becomes

cos(N 1)4) = a cos No = cos 0 cos N4), (4.20)

i.e.,

sin No sin 0 = 0, (4.21)

whose only solutions in [0, it] are

4)=cos (k= 0,1,2,...,N). (4.22)

Thus, there are N + 1 distinct (and, therefore, simple) eigenvalues

a k = cos (k = 0, 1, 2, ... , N), (4.23)

and corresponding eigenvectors x '` ( k = 0, 1, ... , N):


( )

x5 = z(0i+9z)= cos k 0,1,...,N). (4.24)


(j=
N

Now,

CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL, METHODS 245

k1 = J n j cosz
IIx(Iz
N kn l 1 N -i (k7rj) 1
2N+Nlyl cosz
N + 2N
cos2(krrj) = 1 il I + cos(2kirj/N)
= 1 N1
N j _ o N N j=0 2
c1 ifk=0orN
(4.25)
t J- ifk=1,2,...,N-1.

Thus, the normalized eigenvectors are

=( 1 , 1 , ,l)', ON=( 1 , -1 ,+1, -- 1,..


^ (4.26)
(k1
4)k/2-x 0kj=^/Lcos (1 <k<N-1).

Now use (4.9), (4.23), and (4.26) to get, for 0 < i, j < N,

N
Pij
(n) E k
n4 ki kj 7 lj
k=0
N-1
k7rj
= rr j + 2rz j y- cos ( k^ cos kni cos( + (-1)rc j . (4.27)
k=1\ Nj N N

Thus, for 1 <j,<N-1,0,<i,<N,

1 2 } _ 1 n+j i _
p j 1 + COS"( )COS COS !

N N k=1
^l N (N I )
) .. ) N ( N

For 0<i<N,

1 IN 1 ( ^ ( ) 1
Po = + COS"t
\ COS J
/ + 1 " I
2N N k = t N N 2N

For 0<i<N,

1 1 N -' / k\ ki i
1
P,N + Z COs cos (--- I COS + ( 1)n+N (4.28)
2N N k = 1 N N N 2N

Note that when n and j i have the same parity, say n = 2m and j i is
even, then
r)

C p;;"' 1 I=4
L cos( I I cos( cos( I[] + o(1)] (4.29)
246 BIRTHDEATH MARKOV CHAINS

as m - co. This establishes the precise rate of exponential convergence to the


steady state. One may express this as

u m (p;]'" ) ^^e z n' A _ cosh cos ^, (4.30)

where A = log a, (see (4.23)).

Example 2. (Simple Symmetric Random Walk with One Reflecting Boundary)

S = 0, 1, 2, ... Poi = 1 and i_


Pi,k -1 = 2 = Pij+ i

for all i > 1. Note that p;! is the same as in the case of a random walk with
)

two reflecting boundaries 0 and N, provided N > n + i, since the random walk
cannot reach N in n steps (or fewer) starting from i if N > n + i. Hence for all
i, j, n, p is obtained by taking the limit in (4.28) as N . oc, i.e.,

p=2
f
2 "
o
, cos"(nO) cos(iirO) cos( jn6) dO

_ I cos"(0) cos(iO) cos( JO) dO (j 1> 1, i >1 0), (4.31)


l o
R
p p = 1c
) os(0) cos(iO) dO (i i 0).
ir o

An alternative calculation of (4.31) can be made by first noting that the


condition (4.3) is valid for the sequence of weights {rr ; } given in (4.1) with
it o = 1. This provides an inner product (sequence) space on which p is bounded
self-adjoint linear transformation. The spectral theory extends to such settings
as well.

An example in which the birthdeath parameters are state-space dependent


(i.e., nonconstant) is given in the chapter application.

5 CHAPTER APPLICATION: THE EHRENFEST MODEL OF


HEAT EXCHANGE
The Ehrenfest model illustrates the process of heat exchange between two bodies
that are in contact and insulated from the outside. The temperatures are assumed
to change in steps of one unit and are represented by the numbers of balls in
two boxes. The two boxes are marked I and II and there are 2d balls labeled
1, 2, ... , 2d. Initially some of these balls are in box I and the remainder in box
II. At each step a ball is chosen at random (i.e., with equal probabilities among
ball numbers 1, 2, ... , 2d) and moved from its box to the other box. If there
CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE 247

are i balls in box I, then there are 2d i balls in box II. Thus there is no overall
heat loss or gain. Let X. denote the number of balls in box I after the nth trial.
Then {X: n = 0, 1, ...} is a Markov chain with state space S = {0, 1, 2, ... , 2d}
and transition probabilities

Pu.,- = 2d ' Pi,c+1 =1 d- , for i = 1,2,... , 2d 1,


(5.1)
Poi- 1 , P2d,24-1= 1 ,

P ij = 0, otherwise.

This is a birth-death chain with two reflecting boundaries at 0 and 2d. The
transition probabilities are such that the mean change in temperature, in box
I, say, at each step is propostional to the negative of the existing temperature
gradient, or temperature difference, between the two bodies. We will first see
that the model yields Newton's law of cooling at the level of the evolution of
the averages. Assume that initially there are i balls in box I. Let Y = X d,
the excess of the number of balls in box I over d. Writing e = E j (Y), the
expected value of Y given X 0 = i, one has

e=E,(Xd)=E,[X- d+(XX-,)]

d)+E;(X
/ 2d x_, X _ i 1
=E,(X_1 X 1)=e-1+Er
2d 2d )

e -i 1\
= e- 1 + Ei d = e-, d = 1 ^ e 1. -

Note that in evaluating E i (X X _ 1 ) we first calculated the conditional


expectation of X X-, given Xn _ 1 and then took the expectation of
this conditional mean. Now, by successive applications of the relation
e = (1

e=(1 !) e o = ( 1 !) E i (X0 d) =(i d)(1 ! .



)
(5.2)

Suppose in the physical model the frequency of transitions is r per second. Then
in time t there are n = tT transitions. Write v = log[(l (1/d))]T. Then

e = (i d)e - `, (5.3)

which is Newton's law of cooling.


The equilibrium distribution for the Ehrenfest model is easily seen, using (3.5),
to be
248 BIRTH-DEATH MARKOV CHAINS

_ ( 2 d) 2 _ a y
j=0, 1,...,2d. (5.4)

That is, it = (ij : j e S) is binomial with parameters 2d, z. Note that d =


is the (constant) mean temperature under equilibrium in (5.3).
The physicists P. and T. Ehrenfest in 1907, and later Smoluchowski in 1916,
used this model in order to explain an apparent paradox that at the turn of
the century threatened to wreck Boltzmann's kinetic theory of matter. In the
kinetic theory, heat exchange is a random process, while in thermodynamics it
is an orderly irreversible progression toward equilibrium. In the present context,
thermodynamic equilibrium would be achieved when the temperatures of the
two bodies became equal, or at least approximately or macroscopically equal.
But if one uses a kinetic model such as the one described above, from the state
i = d of thermodynamical equilibrium the system will eventually pass to a state
of extreme disequilibrium (e.g., i = 0) owing to recurrence. This would contradict
irreversibility of thermodynamics. However, one of the main objectives of kinetic
theory was to explain thermodynamics, a largely phenomenological
macroscopic-scale theory, starting from the molecular theory of matter.
Historically it was Poincare who first showed that statistical-mechanical
systems have the recurrence property (theoretical complement 2). A scientiest
named Zermelo then forcefully argued that recurrence contradicted
irreversibility.
Although Boltzmann rightly maintained that the time required by the random
process to pass from the equilibrium state to a state of macroscopic
nonequilibrium would be so large as to be of no physical significance, his
reasoning did not convince other physicists. The Ehrenfests and Smoluchowski
finally resolved the dispute by demonstrating how large the passage time may
be from i = d to i = 0 in the present model.
Let us now present in detail a method of calculating the mean first passage
time m = E 1 To , where To = inf {n > 0: X = 0} . Since the method is applicable
;

to general birthdeath chains, consider a state space S = {0, 1, 2, ... , N} and


a reflecting chain with parameters ; , S ; = 1 , such that 0 < , < 1 for
1 <i<,N l and o =S N = 1. Then,

m ; =1+ m 1 +5 1 m i _ 1 (1<i<N-1),
(5.5)
m0=0, mN = 1 + mN_1.

Relabel the states by i u, so that u ; is increasing with i,

u 0 =0, u l = 1, (5.6)

and, for all x e S,

4'(x) __ Px ({ X} reaches 0 before N) = u " u " (5.7)


UN - UO
CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE 249

In other words, in this new scale the probability of reaching the relabeled
boundary u = 0, before U N , starting from u x (inside), is proportional to the
distance from the boundary U N . This scale is called the natural scale. The
difference equations (5.5) when written in this scale assume a simple form, as
will presently be shown. First let us determine u x from (5.6) and (5.7) and the
difference equation

'(x)= x '(x+1)+ 5 x (i(x-1), 0<x< N,


(5.8)
tf(x) = 1, i(N) = 0,

which may also be expressed as

^i(x+l)^i(x)= 6X[^i(x)^i(x-1)], 0<x<N,


(5.9)
/i(0) = 1, /i(N)=0.

Equations (5.6), (5.7), and (5.9) yield

ux+1 ux= --- (ux ux -i)


x

a 1 S Z .. _ S,a2... x
a a a (ui uo)= a (1<x<N-1), (5.10)

or

x . (5 i
61o2"
ux+1 = 1 +i^ i2 ...i (1 x<N 1). (5.11)

Now write

m(u) - m x . (5.12)

Then (5.5) becomes

[m(ui +i) m(ur)]r [m(ui) m(ui- )]5 = 1 (1 <, i < N 1),


(5.13)
m(u 0 )=m(0)=0, m(u N )m(u N - 1 )= 1.

One may rewrite this, using (5.10), as

m(u1 + 1 ) m(ui) m(ui) m(u1 1 ) _ o1 1


- .
(1 < i < N 1)
ui +l Ui ui Ui -1 6162.. 4j
250 BIRTH-DEATH MARKOV CHAINS

or, summing over i = x, x + 1, ... , N 1 and using the last boundary


condition in (5.13), one has

1 m(u) m(u- 1 ) 1
_
_ (1 x N 1).
UN UN-1 Ux Ux-1 i =x
6162...(Si
(5.14)
Relations (5.10) and (5.14) lead to

Y_
l'N-1 +
m \U x ) m(U x -1) = Yx/'x+l 1 F'x
(1 x < N 1). Ni -1i
...6 (Sx...Sii
Sxax+l N-1 i =x l
(5.15)
The factor ; / i is introduced in the last summands to take care of the summand
corresponding to i = x (this summand is actually 1/S x ). Sum (5.15) over
x = 1, 2, ... , y to finally get, using m(u 0 ) = 0,

m(UO _ xfx+1
...%jN ...
-1 + x i 1t
Ll (1 < y < N 1).
x=1 Sx 6 x+I * * ' SN-1 x=1 i =x 6X... 6 i
(5.16)
In particular, for the Ehrenfest model one gets

(2dx).2.1 + U-1 (2dx)(2d i)


m =m(ud)=
x= 1x(x+1)...(2d-1) x=1 i =x x(x+1)...i(2d i)

= (2d x)!(x 1)! d 2d1 (2d x)!(x 1)!

x=1 (2d 1)! x=1 i =x (2d i)!i!

= 2d 22d(1 (5.17)
+Q)).
Next let us calculate

m =_ Ei T (0<i<d),
i 4 (5.18)
where

Td= inf{n >0:Xd}. (5.19)

Writing m(u i ) = Ph i , one obtains the same equations as (5.13) for 1 < i < d 1,
and boundary conditions

in(u o ) = 1 + n(u 1 ), m(u) = 0. (5.20)

As before, summing the equations over i = 1, 2, ... , x,


CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE 251

...
m(ux+l) - m(ux) tn(ul) - tn(uo) x o1 _-I
ux+I U x U1 UQ i =1 6 1 62...^i

o1
...'
_ -1- Y
_ 6 6 ...6.
1 1 2
(5.21)

where 0 = 1. Therefore,

x
m(u x+, ) - m(u) _- ' z
x - Y r+ I x x+ I
(5 22)
x ,z
x i 1 , x6x+I '

which, on summing over x = 1, 3, ... , d - 1 and using (5.20), leads to

.g x - d - i x 5;+ , ...g xbx+l


-m(u 0 ) _ -1 - d-a 6152 . (5.23)
x=1 PlP2 . .Px x=1 i =1 Pf ... xbx+l

For the present example this gives

m l+
d-1
X= I

d-1
x!

(2d-1)(2d-x)

x!
+ x

+ d-I
=
d-I x ((x + 1)x (i + 2)(i + 1)

i; (2d_i)...(2d_x)(x+1) )

2d x x x -i
<1 +
x=1 (2d- 1)(2d-x) x = 1 2d-x 1= 2d-x
d-1 x! d-I
2d
,1+Y- -+I
x= I (2d - 1)...(2d - x) x= I 2(d - x)

1+
dIl xl + d(log d + 1). (5.24)
x= I (2d - 1) (2d - x)

Since the sum in the last expression goes to zero as d - oo,

m o <d+dlogd+0(1), asd -* oo. (5.25)

For d = 10 000 balls and rate of transition one ball per second, it follows that

m o < 102 215 seconds < 29 hours,


(5.26)
a n d = 10 10 6000 years.

It takes only about a day on the average for the system to reach equilibrium from
a state farthest from equilibrium, but takes an average time inconceivably large,
even compared to cosmological scales, for the system to go back to that state
from equilibrium.
252 BIRTHDEATH MARKOV CHAINS

The original calculations by the Ehrenfests concerned the mean recurrence


times. Using Theorem 9.2(c) of Chapter II it is possible to get these results
quite simply as follows. Let t;' = min{n ? 1: X = i }. Then, the mean recurrence
time of the state i is

1 = i!(2d i)! 22d


Et' (5.27)
it 2d!

For d = 10 000 one gets, using Stirling's approximation for the second estimate,

E o zoi 220000 Edt'^ ^_ l00,Jr. . (5.28)

Thus, within time scales over which applications of thermodynamics make


sense, one would not observe a passage from equilibrium to a (macroscopic)
nonequilibrium state. Although Boltzmann did not live to see it, this vindication
of his theory ended a rather spirited debate on its validity and contributed in
no small measure to its eventual acceptance by physicists.
The spectral representation for p can be obtained by precisely the same steps
as those outlined in Exercise 4.3. The 2d eigenvalues that one obtains are given
by a j =j/d, j = 1,2,..., d.

EXERCISES

Exercises for Section III.1


(Artificial Intelligence) The amplitudes of pure noise in a signal- detection device
has p.d.f. fo (x), while the amplitude is distributed as f l (x) when a signal is present
with the noise. A detection procedure is designed as follows. Select a threshold value
0 0 = r8 for .integer r. If the first amplitude observed, X 1 , is larger than the initial
threshold value 0, then decide that the signal is present. Otherwise decide that the
signal is absent. Suppose that a signal is being sent with probability p = Z and that
upon making a decision the observer learns whether or not the decision was correct.
The observer keeps the same threshold value if the decision is correct, but if the
decision was incorrect the threshold is increased or decreased by an amount S
depending on the type of error committed. The rule governing the learning process
of the observer is then

0.11 = n + U{1(O,.)(X. 0)S]

where S. is 1 or 0 depending on whether a signal is sent or not. The signal transmission


processes {S} and {X} are i.i.d.
(i) Show that the threshold adjustment process {0} is a birthdeath Markov chain
and identify the state space.
(ii) Calculate the transition probabilities.

EXERCISES 253

2. Suppose that balls labeled 1, ... , N are initially distributed between two boxes labeled
I and II. The state of the system represents the number of balls in box I. Determine
the one-step transition probabilities for each of the following rules of motion in the
state space.
(i) At each time step a ball is randomly (uniformly) selected from the numbers
1, 2, ... , N. Independently of the ball selected, box I or II is selected with
respective probabilities p, and P2 = t p,. The ball selected is placed in the
box selected.
(ii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability p, or from those in II with probability P2 = 1 p,. A
box is then selected with respective probabilities in proportion to current box
sizes. The ball selected is placed in the box selected.
(iii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability proportional to the current size of I or from those in II
with the complementary probability. A box is also selected with probabilities in
proportion to current box size. The ball selected is placed in the box selected.

Exercises for Section 111.2


>_ > >_
1. Let A d be the set {w: X 0 (co) = y, {Xn (w): n 0} reaches c before d}, where y > c.
Show that A d j A = {cw: X0 (cu) = y, {X (cu): n O} ever reaches c}.
X

2. Prove (2.15) by using (2.14) and looking at {Xn : n 0}.

_< _<
3. Prove (2.4), (2.16), and (2.23) by conditioning on X, and using the Markov property.

>_ _<
4. Suppose that cp(i)(c i d) satisfy the equations (2.4) and the boundary conditions
q(c) = 0, cp(d) = 1. Prove that such a cp is unique.
5. Consider a birthdeath chain on S = {0, 1, ... , N } with both boundaries reflecting.
(i) Prove that P(T mN) (I S N 5 N _ I ...6, )m if i > j, and < (1 o/i t .. N _, )m
if i < f. Here T = inf {n 1: Xn =j}.
(ii) Use (i) to prove that p ; , = P; (Tt < x) = I for all i, j.
6. Consider a birthdeath chain on S = {0, 1, ...} with 0 reflecting. Argue as in Exercise
5 to show that p 1 = I for all y.
7. Consider a birthdeath chain on S = {0, I, ... , N} with 0, N absorbing. Calculate

lim n ' p;T', for all i, j.


nix m=1

8. Let 0 be a reflecting boundary for a birthdeath chain on S = {... , 3, 2, 1, 0}.

9. If 0 is absorbing, and N
Derive the necessary and sufficient condition for recurrence.
reflecting, for a birthdeath chain on S = {0, 1, ... , N},
then show that 0 is recurrent and all other states are transient.
10. Let p be the transition probability matrix of a birthdeath chain on S = {0, 1, 2, ...}
with

.= j= 0.1,2,....
2 (j +21 ) Si 2 (j+ 1),
254 BIRTH-DEATH MARKOV CHAINS

(i) Are the states transient or recurrent?


(ii) Compute the probability of reaching c before d, c < d, starting from state i,
c_<i_<d.
11. Suppose p is the transition matrix of a birthdeath chain on S = {0, 1, 2, ...} such
that o = 1, , < S ; for j = 1, 2, .... Show that all states must be recurrent.

Exercises for Section III.3


1. Let {X} be the asymmetric simple random walk on S = {0, 1, 2, ...} with j = p < 2,
1, 2,... and (partial) reflection at 0 with
(i) Calculate the invariant initial distribution it.
(ii) Calculate EX as a function of p < Z.
2. (A BirthDeath Queue) During each unit of time either 0 or I customer arrives for
service and joins a single line. The probability of one customer arriving is 2, and no
customer arrives with probability 1 2. Also during each unit of time, independently
of new arrivals, a single service is completed with probability p or continues into the
next period with probability 1 p. Let X. be the total number of customers (waiting
in line or being serviced) at the nth unit of time.
(i) Show that {X} is a birthdeath chain on S = {0, 1, 2, ...}.
(ii) Discuss transience, recurrence, positive-recurrence.
(iii) Calculate the invariant initial distribution when A < p.
(iv) Calculate EX when A < p, where it is the invariant initial distribution.
3. Calculate the invariant distribution for Exercise 1.2(i) where

N i
N p l , ifj =i +1,

Ni (P 1N i)
+ N p 2 , ifj=i,i=0,1 ' ..., N,
Pij=

Ni P 2' ifj =i -1,


0, otherwise.

Discuss the situation for Exercise 2(ii) and (iii).


4. (Time Reversal) Let {X} be a (stationary) irreducible birthdeath chain with in-
variant initial distribution n. Show that P(X = j I X + i = i) = Pn(X+ 1 = j I X. = i).

Exercises for Section III.4


1. Calculate p for the birthdeath queue of Exercise 3.2.
2. Let T be a self-adjoint linear transformation on a finite-dimensional inner product
space V. Show
(i) All eigenvalues of T must be real.
(ii) Eigenvectors of T associated with distinct eigenvalues are orthogonal.

EXERCISES 255

3. Calculate the transition probabilities p" for n >, 1 by the spectral method in the case
of Exercise 1.2(i) and p, = p 2 = Z according to the following steps.
(i) Consider the eigenvalue problem for the transpose p'. Write out difference
equations for p'x = ax.
(ii) Replace the system of equations in (i) by the infinite system

1 1 N i 1 i +2
- x0+ ---X I = ax0, 2N Xi + - xi+) + x^+2 = ax;+
2 2N 2 2N

i = 0, 1, 2, .... Show that if for some a there is a nonzero solution


x = (x 0 , x,, x 2' ...)' of this infinite system with X N+l = 0, then a must be an
eigenvalue of p' with corresponding eigenvector (x o , x,, ... , x N )'. [Hint: Show
that X N+ , = 0 implies x i = 0 for all i -> N + 1.]
(iii) Introduce the generating function q(z) = Z^ x ; z` for the infinite system in (ii).
Note that for the desired solutions satisfying X N+ , = 0, q(z) will be a polynomial
of degree _< N. Show that

N(2a 1 z)
q (Z) =
' q(Z), q(0) = xo.
1 ZZ

[Hint: Multiply both sides of the second equation in (ii) by z' and sum over
i >-0.]
(iv) Show that (iii) has the unique solution
N I' -a)(I + )Na
(P(Z) = X0( 1 z) Z

(v) Show that for aj = j/N, j = 0, 1, ... , N, cp(z) is a polynomial of degree N and
therefore, by (ii) and (iii), a j = j/N, j = 0, 1, ... , N, are the eigenvalues of p' and,
therefore, of p.
(vi) Show that the eigenvector x (' ) = (x ) , ... , xN ) )' corresponding to aj = j/N is
given with xo' ) = 1, by xk ) = coefficient of z' in (1 z)" - '(1 + z) .
(vii) Write B for the matrix with columns x^ 0) , .. , x (N) . Then,

(B') - ' diag(a, . , a N)B


,

where

no n
(B') ' B diag ...
(IX' 0) lirz ' IIX IN) IIa2 )

and the (invariant) distribution it is binomial with p = 2, N; see Exercise 1.2(i).


[Hint: Use the definitions of eigenvectors and matrix multiplication to write
(p')"B = B diag(cc .... , aN). Multiply both sides by B' and take the transpose
to get the spectral representation of p". Note that the columns of (B') - ' are the
eigenvectors of p since p(B') - ' = (B') - ` diag(a o , ... , a N ). To compute (B') - '
use orthogonality of the eigenvectors with respect to ( , )" to first write
B' diag(a , ... ,. N )B = diag(llx (0) jjn, , IIx (N) 11n). The formula for (B') - '
follows.]
256 BIRTHDEATH MARKOV CHAINS

4. (Relaxation and Correlation Length) Let p be the transition matrix for a finite state
stationary birthdeath chain {X n } on S = {0, 1, ... , N} with reflecting boundaries at
0 and N. Show that

sup {Corr,,( f(Xn ), g(X 0 ))} = e z i.


1.9
where

Corrn(.i(X.), 9(X0)) = En{[f(Xn) Ef(XX)][g(X0) Eg(Xo)]}


(Varnf(Xn)) 2 (Var,^ g(X0)) 1J2

A l is the largest nontrivial (i.e., ^ 1) eigenvalue of p, and the supremum is over


real-valued functions f and g on S. The parameter r = 1/x. 1 is called the relaxation
time or correlation length in applications. [Hint: Use the self-adjointness of p (i.e.,
time-reversibility) with respect to ( , ),, to obtain an orthonormal basis {cp"} of
eigenvectors of p. Check equality in the case f = g = cp l is the eigenvector
corresponding to A 1 . Restrict attention to f and g such that E n f = E n g = 0 and
ii IIn = 119 11,, = 1, and expand as

f =Y_(f (M. P.,


( g=Y_(g ,qin)np .
n n

Use the inequality ab _< (a' + b 2 )/2 to show (Corr, ( f (X.), g(X ))I < e -z ".]
o

5. (i) (Simple Random Walk with Periodic Boundary) States 0, 1, 2, ... , N 1 are
arranged clockwise in a circle. A transition occurs either one unit clockwise or
one unit counterclockwise with respective probabilities p and q = 1 p. Show
that

1 N-1
N r_o1

where 0 = e(znptN is an Nth root of unity (all Nth roots of unity being
1,0,0 ,...,0 ).
2 N-1

(*ii) (General Random Walk with Periodic Boundary) Suppose that for the
arrangement in (i), a transition k units clockwise (equivalently, N k units
counterclockwise) occurs with probability p k , k = 0,1, ... , N 1. Show that

NI NI
(n) = 1 0rU'k) I Br5 "
Pjk P,
N r=o s=o

where 0 = e (2 "' is an Nth root of unity.

THEORETICAL COMPLEMENTS

Theoretical Complements to Section III.5


1. Conservative dynamical systems consisting of one or many degrees of freedom
(components, particles) are often represented by one-parameter families of

THEORETICAL COMPLEMENTS 257

transformations {T,} acting on points x = (x,, ... , x") of [I8" (positionmomentum


phase space). The transformations are obtained from the differential equations
(Newton's Law) that govern the evolution started at arbitrary states x e (18"; i.e., Tx
is the state at time t when initially the state is T o x = x, x e ti". The mathematical
theory described below applies to phenomena where the physics provide a law of
evolution of the form

aT, x
f(T,x), t > 0,
at (T.5.1)

To x=x,

such that f = (f,, ... , f") . I^" R" uniquely determines the solution at all timest >0
for each initial state x by (T.5.1).

Example 1. Consider a mass m on a spring initially displaced to a location x,


(relative to rest position at x, = 0) and with initial momentum x 2 . Then Hooke's
law provides the force (acting along the gradient of the potential curve U(x 1 ) = Zkx, ),
according to which f(x) - (f,(x), f2 (x)) _ ((1/m)x 2 , kx,), where k > 0 is the spring
constant. In particular, it follows that T r x = xA(t), where

cos(yt) my sin(yt) k
A(t)= 1 t_>0, where y= > 0.
--sin(yt) cos(yt)
my

Notice that areas (2-dimensional phase-space volume) are preserved under T, since
det A(t) = 1. The motion is obviously periodic in this case.

Example 2. A standard model in statistical mechanics is that of a system having k


(generalized) position coordinates q l , ... , q, and k corresponding (generalized)
momentum coordinates p l , ... , P k . The law of evolution is usually cast in Hamiltonian
form:

OH
dq;_aH dp;_
i =1,...,k, (T.5.2)
dt ap ' ; dt aq; '

where H - ll(q,, . .. , qk, p,, ... , Pk) is the Hamiltonian function representing the
total energy (kinetic energy plus potential energy) of the system. Example I is of this
form with k = 1, H(q, p) = p 2 /2m + kg 2 . Writing n = 2k, x, = q,, ... , X k = qk>
Xk+ 1 = Pi, , X2k = Pk, this is also of the form (T.5.1) with

p / OH OH OH aH
f(x) _ (fl (x), ..... 2k(x)) _ , ... , -- (T.5.3)
GXk+ 1 aX2, ax, OXk

Observe that for H sufficiently smooth, the flow in phase space is generally
incompressible. That is,
258 BIRTHDEATH MARKOV CHAINS

div f(x) - trace^af 1


of r a / a Fl 1 + a (_ OH )]
ax;/1 i =1 ax; i 1 LOx1\aXk+t/ axk +i I ax1
= 0 for all x. (T.5.4)

Liouville first noticed the important fact that incompressibility gives the volume
preserving property of the flow in phase space.

Lionville Theorem T.5.1. Suppose that f(x) in (T.5.1) is such that div f(x) = 0 for
all x. Then for each bounded (measurable) set D c R', IT DI = IDI for all t > 0, where
I I denotes n-dimensional volume (Lebesgue measure).

Proof. By the uniqueness condition stated at the outset we have T, +h = T,Th for all
t, h > 0. So, by the change of variable formula,

-T,,x
ITs+hDI = f d e t( l dx.
11,D \ ax )

To calculate the Jacobian, first note from (T.5.1) that

aThx
I+ af h+O(h2) as h0.
ax = ax
But, expanding the determinant and collecting terms, one sees for any matrix M that

det(I + hM) = 1 + h trace(M) + 0(h 2 ) as h 0.

Thus, since trace(af/ax) = div f(x) = 0,

det( Ox ) = 1 + O(h 2 ) as h p 0.
ax
It follows that for each t >_ 0

IT,+h DI = IT,DI + O(h 2 ) as h + 0,


or
IT,DI = 0 and ITODI = IDI,
dt

i.e., t * ITDI is constant with constant value L. n


2. Liouville's theorem becomes especially interesting when considered along side the
following theorem of Poincare.

Poincare's Recurrence Theorem T.5.2. Let T be any volume preserving continuous


one-to-one mapping of a bounded (measurable) region D c 1' onto itself. Then for
each neighborhood A of any point x in D and every n, however large there is a subset
B of A having positive volume such that for all ye B T'y E A for some
r _> n.


THEORETICAL COMPLEMENTS 259

Proof: Consider A, T", T - 2 "A, .... Then there are distinct times i, j such that
IT -. "A n T'01 ^ 0; for otherwise

Ipl >- T - i "a,1 =I I -i "AI = Z JAI = +oo.


i =o i =a i =o

It follows that

IO n T - "li - 'IAI ^ 0.

Take B=AnT - "'i - 'IA,r= nil ii. n


3. S. Chandrasekhar (1943), "Stochastic Problems in Physics and Astronomy", Reviews
in Modern Physics, 15, 1-89, contains a discussion of Boltzmann and Zermello's
classical analysis together with other applications of Markov chains to physics. More
complete references on alternative derivations as well as the computation of the mean
recurrence time of a state can be found in M. Kac (1947), "Random Walk and the
Theory of Brownian Motion", American Mathematical Monthly, 54, 369-391; also
see E. Waymire (1982), "Mixing and Cooling from a Probabilistic Point of View",
SIAM Review, 24, 73-75.
CHAPTER IV

Continuous-Parameter Markov
Chains

I INTRODUCTION TO CONTINUOUS-TIME MARKOV CHAINS

Suppose that {X,: t O} is a continuous-parameter stochastic process with a


finite or denumerably infinite state space S. Just as in the discrete-parameter
case, the Markov property here also refers to the property that the conditional
distribution of the future, given past and present states of the process, does not
depend on the past. In terms of finite-dimensional events, the Markov property
requires that for arbitrary time points 0 < s o < s, < ... < s <S <1< t, < . .
<t and states i0 .. , i k , i, j, j,, ... , j in S

P(X,=j,X l = jt,...,X,.=jfl Xso =i o ,...,X=i k ,Xs =i)


=P(X,=J X, i =11 ...,X1=J.IXs =i). (1.1)
, ,

<,
In other words, for any sequence of time points 0 t o < t, < ... , the discrete
parameter process Yo := X, 0 , Y, := Xr ..... is a Markov chain as described in
I
Chapter II. The conditional probabilities p, J (s, t) = P(X1 = j Xs = i), 0 < s < t,
are collectively referred to as the transition probability law for the process. In
the case p ; j (s, t) is a function of t s, the transition law is called
time-homogeneous, and we write p, 1 (s, t) = p, j (t s).
Simple examples of continuous-parameter Markov chains are the
continuous-time random walks, or processes with independent increments on
countable state space. Some others are described in the examples below.

Example 1. (The Poisson Process). The Poisson process with intensity function
p is a process with state space S = {0, 1, 2, ...} having independent increments
distributed as
261
262 CONTINUOUS-PARAMETER MARKOV CHAINS

(f:P(u)du )^ (

P(X,XX =j)= exp p(u)duI, (1.2)

for j = 0, 1, 2, ... , s < t, where p(u), u > 0, is a continuous nonnegative


function. Just as with the simple random walk in Chapter II, the Markov
property for the Poisson process is a consequence of the independent increments
property (Exercise 2). Moreover,

P.1(s,t)=P(Xr=j1 X, = i)
P(X'
= P = j, Xs = i)
(X5=I)

P(Xt Xs=j i)P(X5 =i)



= P(XS=I)

(J i )!
l expj p(u) du)
\
J
for j i
(1.3)
0 ifj<i.

In the case that p(u) _ A (constant) for each u >, 0,

[2(t s))' ' e 2(i s)


-

(j i)! .
Pi;(s,t)= (1.4)
0, 3> ,

is a function p ;; (t s) of s and t through t s; i.e., the transition law is


time-homogeneous. In this case the process is referred to as the Poisson process
with parameter 2.

Example 2. (The Compound Poisson Process). Let {N} be a Poisson process


with parameter A > 0 starting at 0, and let Y,, Y2 ,... be i.i.d. integer-valued
random variables, independent of the process {N}, having a common
probability mass function f. The process {X,} is defined by
Nt
x= E Y, (1.5)
n=o

where Yo is independent of the process {NN } and of Y1 , Y2 .... . The stochastic


process {XX } is called a Compound Poisson Process. The process has independent
increments and is therefore Markovian (Exercise 4). As a consequence of the
independence of increments,
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS 263

p i,(s,t)=P(X,=jI X5 =i)= P(X,XX =ji XX =i)


=P(X,X =j i). S (1.6)

Therefore,

p,J(s,t)=E{P(X,XS=j i N,N )} S

2(t S)]k
*k ^

_ Y if) (j k^ e
( 1 .7)
=
k0

where f * k is the k-fold convolution of f with f * 0 (0) = 1. In particular, {X,}


has a time-homogeneous transition law. The continuous-time simple random
walk is defined as the special case with f( + 1) = P(Yn = + 1) = p,
f(l)=P(Y,,= l)=q,O<p,q< , l,p+q= 1.

Another popular continuous-parameter Markov chain with nonhomogeneous


transition law, the Plya process is provided in Exercise 6 of the next section
as a limiting approximation to the (nonhomogeneous) discrete-parameter
partial sum process in Example 3.8, Chapter II. However, unless stated
otherwise, we shall generally restrict our attention to the study of Markov
chains with a time-homogeneous transition law.

2 KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS

We continue to denote the finite or denumerable state space by S. To construct


a Markov process in discrete time (i.e., to find all possible joint distributions)
it was enough to specify a one-step transition matrix p along with an initial
distribution n. In the continuous-parameter case, on the other hand, the
specification of a single transition matrix p(t o ) = ((p i; (t 0 ))), where p 11 (t o ) gives
the probability that the process will be in state j at time t o if it is initially at
state i, together with an initial distribution n, is not adequate. For a single time
point t o , p(t o ) together with it will merely specify joint distributions of
X0 , X, o , X 2 , 0 , . .. , XX , o , ... ; for example,

Pn (X0 = i0, X, 0 = i1, ... , Xn,0 = in) = iti0pioi,(t0)pi1iz(t0)...pi^,-1in(t0). (2.1)

Here p(t o ) takes the place of p, and t o is treated as the unit of time. Events that
depend on the process at time points that are not multiples of t o are excluded.
Likewise, specifying transition matrices p(t o ), p(t, ), ... , p(t) for an arbitrary
finite set of time points t o , t,, ... , t, will not be enough.
On the other hand, if one specifies all transition matrices p(t) of a
time-homogeneous Markov chain for values of t in a time interval 0 < t <, t o
for some t o > 0, then, regardless of how small t o > 0 may be, all other transition
probabilities may be constructed from these. To understand this basic fact, first
264 CONTINUOUS-PARAMETER MARKOV CHAINS

assume transition matrices p(t) to be given for all t > 0, together with an initial
distribution n. Then for any finite set of time points 0 < t l < t 2 < < t,,, the
joint distribution of X 0 , X,,, ... , X given by

Pn( XO = i0+ X1 i = i1 , Xr2 = i2. , Xt -i = in - 1 , X, = in)


= 1tioPi0i,(t1)Pi1i2(t2
_ t1) ... Pi,,-( t,, t,, I). (2.2)

Specializing to n i = 1, t = t, t 2 = t + s, it follows that

Pi(X1 =j, X1 k) = Pi;(t)P;k(s),

and

Pi(XI +s = k) = p(t + s). (2.3)

But {Xt +s = k} is the countable union U JEs {X, =1' Xt +s = k} of pairwise


disjoint events. Therefore,

Pi(XI +s = k) =Y JEs
P;(XX =j, Xt +s = k). (2.4)

The relations (2.3) and (2.4) provide the ChapmanKolmogorov equations,


namely,

pik(t + s) _Ep
,jEs
i; (t)p Jk (s) (i, k E S; s > 0, t > 0), (2.5)

which may also be expressed in matrix notation by the following so-called


semigroup property

p(t + s) = p(t)p(s) (s > 0, t > 0). (2.6)

Therefore, the transition matrices p(t) cannot be chosen arbitrarily. They must
be so chosen as to satisfy the ChapmanKolmogorov equations.
It turns out that (2.5) is the only restriction required for consistency in the sense of prescribing finite-dimensional distributions as in Section 6, Chapter I. To see this, take an arbitrary initial distribution $\pi$ and time points $0 < t_1 < t_2 < t_3$. For arbitrary states $i_0, i_1, i_2, i_3$, one has from (2.2) that

$$P_\pi(X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, X_{t_3} = i_3) = \pi_{i_0}\, p_{i_0 i_1}(t_1)\, p_{i_1 i_2}(t_2 - t_1)\, p_{i_2 i_3}(t_3 - t_2), \qquad (2.7)$$

as well as

$$P_\pi(X_0 = i_0, X_{t_1} = i_1, X_{t_3} = i_3) = \pi_{i_0}\, p_{i_0 i_1}(t_1)\, p_{i_1 i_3}(t_3 - t_1). \qquad (2.8)$$

But consistency requires that (2.8) be obtained from (2.7) by summing over $i_2$. This sum is

$$\sum_{i_2} \pi_{i_0}\, p_{i_0 i_1}(t_1)\, p_{i_1 i_2}(t_2 - t_1)\, p_{i_2 i_3}(t_3 - t_2) = \pi_{i_0}\, p_{i_0 i_1}(t_1) \sum_{i_2} p_{i_1 i_2}(t_2 - t_1)\, p_{i_2 i_3}(t_3 - t_2). \qquad (2.9)$$

By the Chapman-Kolmogorov equations (2.5), with $t = t_2 - t_1$, $s = t_3 - t_2$, one has

$$\sum_{i_2} p_{i_1 i_2}(t_2 - t_1)\, p_{i_2 i_3}(t_3 - t_2) = p_{i_1 i_3}(t_3 - t_1),$$

showing that the right sides of (2.8) and (2.9) are indeed equal. Thus, if (2.5)
holds, then (2.2) defines joint distributions consistently, i.e., the joint distribution
at any finite set of points as specified by (2.2) equals the probability obtained
by summing successive probabilities of a joint distribution (like (2.2)) involving
a larger set of time points, over states belonging to the additional time points.
Suppose now that $\mathbf{p}(t)$ is given for $0 < t \le t_0$, for some $t_0 > 0$, and the transition probability matrices satisfy (2.6). Since any $t > t_0$ may be expressed uniquely as $t = rt_0 + s$, where r is a positive integer and $0 < s \le t_0$, by (2.6) we have

$$\mathbf{p}(t) = \mathbf{p}(rt_0 + s) = \mathbf{p}(t_0)\mathbf{p}((r-1)t_0 + s) = \mathbf{p}^2(t_0)\mathbf{p}((r-2)t_0 + s) = \cdots = \mathbf{p}^r(t_0)\mathbf{p}(s).$$

Thus, it is enough to specify $\mathbf{p}(t)$ on any interval $0 < t \le t_0$, however small $t_0 > 0$ may be. In fact, we will see that under certain further conditions $\mathbf{p}(t)$ is determined by its values for infinitesimal times; i.e., in the limit as $t_0 \downarrow 0$.
From now on we shall assume that

$$\lim_{t \downarrow 0} p_{ij}(t) = \delta_{ij}, \qquad (2.10)$$

where Kronecker's delta is given by $\delta_{ij} = 1$ if $i = j$, $\delta_{ij} = 0$ if $i \ne j$. This condition is very reasonable in most circumstances. Namely, it requires that with probability 1, the process spends a positive (but variable) amount of time in the initial state i before moving to a different state j. The relations (2.10) are also expressed as

$$\lim_{t \downarrow 0} \mathbf{p}(t) = \mathbf{I}, \qquad (2.11)$$

where $\mathbf{I}$ is the identity matrix, with 1's along the diagonal and 0's elsewhere.
We shall also write

$$\mathbf{p}(0) = \mathbf{I}. \qquad (2.12)$$

Then (2.11) expresses the fact that $\mathbf{p}(t)$, $0 \le t < \infty$, is (componentwise) continuous at $t = 0$ as a function of t. It may actually be shown that owing to the rich additional structure reflected in (2.6), continuity implies that $\mathbf{p}(t)$ is in fact differentiable in t, i.e., $p_{ij}'(t) = dp_{ij}(t)/dt$ exists for all pairs (i, j) of states, and all $t \ge 0$. At $t = 0$, of course, "derivative" refers to the right-hand derivative. In particular, the parameters $q_{ij}$ given by

$$q_{ij} = \lim_{t \downarrow 0} \frac{p_{ij}(t) - p_{ij}(0)}{t} = \lim_{t \downarrow 0} \frac{p_{ij}(t) - \delta_{ij}}{t}, \qquad (2.13)$$

are well defined. Instead of proving differentiability from continuity for transition probabilities, which is nontrivial, we shall assume from now on that $p_{ij}(t)$ has a finite derivative for all (i, j) as part of the required structure. Also, we shall write

$$\mathbf{Q} = ((q_{ij})), \qquad (2.14)$$

for $q_{ij}$ defined in (2.13). The quantities $q_{ij}$ are referred to as the infinitesimal transition rates and $\mathbf{Q}$ the (formal) infinitesimal generator. Note that (2.13) may be expressed equivalently as

$$p_{ij}(\Delta t) = \delta_{ij} + q_{ij}\,\Delta t + o(\Delta t) \qquad \text{as } \Delta t \downarrow 0. \qquad (2.15)$$

Suppose for the time being that S is finite. Since the derivative of a finite sum equals the sum of the derivatives, it follows by differentiating both sides of (2.5) with respect to t and setting $t = 0$ that

$$p_{ik}'(s) = \sum_{j \in S} p_{ij}'(0)\, p_{jk}(s) = \sum_{j \in S} q_{ij}\, p_{jk}(s), \qquad i, k \in S, \qquad (2.16)$$

or, in matrix notation after relabeling s as t in (2.16) for notational convenience,

$$\mathbf{p}'(t) = \mathbf{Q}\mathbf{p}(t) \qquad (t \ge 0). \qquad (2.17)$$

The system of equations (2.16) or (2.17) is called Kolmogorov's backward equations.

One may also differentiate both sides of (2.5) with respect to s and then set $s = 0$ to get Kolmogorov's forward equations for a finite state space S,

$$p_{ik}'(t) = \sum_{j \in S} p_{ij}(t)\, q_{jk}, \qquad i, k \in S, \qquad (2.18)$$

or, in matrix notation,

$$\mathbf{p}'(t) = \mathbf{p}(t)\mathbf{Q}. \qquad (2.19)$$

Since the $p_{ij}(t)$ are transition probabilities,

$$\sum_{j \in S} p_{ij}(t) = 1 \qquad \text{for all } i \in S. \qquad (2.20)$$

Differentiating (2.20) term by term and setting $t = 0$,

$$\sum_{j \in S} q_{ij} = 0. \qquad (2.21)$$

Note that

$$q_{ij} := p_{ij}'(0) \ge 0 \quad \text{for } i \ne j, \qquad q_{ii} := p_{ii}'(0) \le 0, \qquad (2.22)$$

in view of the fact that $p_{ij}(t) \ge 0 = p_{ij}(0)$ for $i \ne j$, and $p_{ii}(t) \le 1 = p_{ii}(0)$.

In the general case of a countable state space S, the term-by-term differentiation used to derive Kolmogorov's equations may not always be justified. Conditions are given in the next two sections for the validity of these equations for transition probabilities on denumerable state spaces. However, regardless of whether or not the differential equations are valid for given transition probabilities $\mathbf{p}(t)$, we shall refer to the equations in general as Kolmogorov's backward and forward equations, respectively.

Example 1. (Compound Poisson). From (1.7),

$$p_{ij}(t) = \sum_{k=0}^{\infty} f^{*k}(j - i)\, \frac{\lambda^k t^k}{k!}\, e^{-\lambda t} = \delta_{ij} e^{-\lambda t} + f(j - i)\,\lambda t\, e^{-\lambda t} + o(t), \qquad \text{as } t \downarrow 0. \qquad (2.23)$$

Therefore,

$$q_{ij} = p_{ij}'(0) = \lambda f(j - i) \quad (i \ne j), \qquad \text{and} \qquad q_{ii} = p_{ii}'(0) = -\lambda(1 - f(0)). \qquad (2.24)$$
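As a quick numerical sketch of (1.7) and (2.24), the following Python fragment computes the increment distribution of a compound Poisson process by truncating the series and recovers the rates by finite differences. The rate $\lambda$ and jump law f below are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

# A numerical sketch of (1.7)/(2.23)-(2.24); lam and the jump law f are
# arbitrary illustrative (hypothetical) choices.
lam = 2.0
f = {-1: 0.3, 0: 0.1, 1: 0.6}

def p_increment(d, t, terms=60):
    """P(X_{s+t} - X_s = d) = sum_k f^{*k}(d) e^{-lam t} (lam t)^k / k!"""
    total, conv = 0.0, {0: 1.0}          # conv holds f^{*k}; f^{*0} = delta_0
    weight = np.exp(-lam * t)            # e^{-lam t} (lam t)^k / k!, k = 0
    for k in range(terms):
        total += conv.get(d, 0.0) * weight
        nxt = {}                         # build f^{*(k+1)} by convolution
        for x, px in conv.items():
            for y, py in f.items():
                nxt[x + y] = nxt.get(x + y, 0.0) + px * py
        conv = nxt
        weight *= lam * t / (k + 1)
    return total

h = 1e-4   # finite-difference check of the rates (2.24)
print(p_increment(1, h) / h, lam * f[1])                 # q_{i,i+1}
print((p_increment(0, h) - 1.0) / h, -lam * (1 - f[0]))  # q_{ii}
```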

3 SOLUTIONS TO KOLMOGOROV'S EQUATIONS IN EXPONENTIAL FORM

We saw in Section 2 that if $\mathbf{p}(t)$ is a transition probability law on a finite state space S with $\mathbf{Q} = \mathbf{p}'(0)$, then $\mathbf{p}$ satisfies Kolmogorov's backward equation

$$p_{ij}'(t) = \sum_k q_{ik}\, p_{kj}(t), \qquad i, j \in S,\ t \ge 0, \qquad (3.1)$$

and Kolmogorov's forward equation

$$p_{ij}'(t) = \sum_k p_{ik}(t)\, q_{kj}, \qquad i, j \in S,\ t \ge 0, \qquad (3.2)$$

where $\mathbf{Q} = ((q_{ij}))$ satisfies the conditions in (2.21) and (2.22).


The important problem is, however, to construct transition probabilities $\mathbf{p}(t)$ having prescribed infinitesimal transition rates $\mathbf{Q} = ((q_{ij}))$ satisfying

$$q_{ij} \ge 0 \quad \text{for } i \ne j, \qquad q_{ii} \le 0, \qquad -q_{ii} = \sum_{j \ne i} q_{ij}. \qquad (3.3)$$

In the case that S is finite it is known from the theory of ordinary differential equations that, subject to the initial condition $\mathbf{p}(0) = \mathbf{I}$, the unique solution to (3.1) is given by

$$\mathbf{p}(t) = e^{t\mathbf{Q}}, \qquad t \ge 0, \qquad (3.4)$$

where the matrix $e^{t\mathbf{Q}}$ is defined by

$$e^{t\mathbf{Q}} = \sum_{n=0}^{\infty} \frac{(t\mathbf{Q})^n}{n!} = \mathbf{I} + \sum_{n=1}^{\infty} \frac{t^n}{n!}\,\mathbf{Q}^n. \qquad (3.5)$$

Example 1. Consider the case S = {0, 1} for a general two-state Markov chain with rates

$$q_{00} = -\beta, \quad q_{01} = \beta, \quad q_{11} = -\delta, \quad q_{10} = \delta. \qquad (3.6)$$

Then, observing that $\mathbf{Q}^2 = -(\beta + \delta)\mathbf{Q}$ and iterating, we have

$$\mathbf{Q}^n = (-1)^{n-1}(\beta + \delta)^{n-1}\mathbf{Q}, \qquad \text{for } n = 1, 2, \ldots. \qquad (3.7)$$

Therefore,

$$\mathbf{p}(t) = e^{t\mathbf{Q}} = \mathbf{I} - \frac{e^{-(\beta+\delta)t} - 1}{\beta + \delta}\,\mathbf{Q} = \frac{1}{\beta + \delta}\begin{pmatrix} \delta + \beta e^{-(\beta+\delta)t} & \beta - \beta e^{-(\beta+\delta)t} \\ \delta - \delta e^{-(\beta+\delta)t} & \beta + \delta e^{-(\beta+\delta)t} \end{pmatrix}. \qquad (3.8)$$

It is also simple, however, to solve the (forward) equations directly in this case (Exercise 3).
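The closed form (3.8) can be checked against a general-purpose matrix exponential, which computes $e^{t\mathbf{Q}}$ as in (3.4). A minimal sketch, assuming illustrative rates $\beta = 1.5$ and $\delta = 0.7$:

```python
import numpy as np
from scipy.linalg import expm

# Check the closed form (3.8) against expm; beta, delta, t are
# arbitrary illustrative values.
beta, delta, t = 1.5, 0.7, 0.8
Q = np.array([[-beta, beta],
              [delta, -delta]])

numeric = expm(t * Q)                       # p(t) = e^{tQ}, eq. (3.4)
e = np.exp(-(beta + delta) * t)
closed = np.array([[delta + beta * e, beta - beta * e],
                   [delta - delta * e, beta + delta * e]]) / (beta + delta)
print(np.allclose(numeric, closed))         # True
print(numeric.sum(axis=1))                  # each row sums to 1
```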


In the case that S is countably infinite, results analogous to those for the finite case can be obtained under the following fairly restrictive condition:

$$\Lambda := \sup_{i,j} |q_{ij}| < \infty. \qquad (3.9)$$

In view of (3.3) the condition (3.9) is equivalent to

$$\sup_i |q_{ii}| < \infty. \qquad (3.10)$$

The solution $\mathbf{p}(t)$ is still given by (3.4), i.e., for each $j \in S$,

$$p_{ij}(t) = \delta_{ij} + t q_{ij} + \frac{t^2}{2!} q_{ij}^{(2)} + \cdots + \frac{t^n}{n!} q_{ij}^{(n)} + \cdots \qquad (3.11)$$

for all $i \in S$, where $q_{ij}^{(n)}$ denotes the (i, j) element of $\mathbf{Q}^n$. For, by (3.9) and induction (Exercise 1), we have

$$|q_{ij}^{(n)}| \le \Lambda(2\Lambda)^{n-1}, \qquad \text{for } i, j \in S,\ n \ge 1, \qquad (3.12)$$

so that the series on the right in (3.11) converges to a function $r_{ij}(t)$, say, absolutely for all t. By term-by-term differentiation of this series for $r_{ij}(t)$, which is an analytic function of t, one verifies that $r_{ij}(t)$ satisfies the Kolmogorov backward equation and the correct initial condition. Uniqueness under (3.9) follows by the same estimates typically used in the finite case (Exercise 2).
To verify the Chapman-Kolmogorov equations (2.6), note that

$$\mathbf{p}(t + s) = e^{(t+s)\mathbf{Q}} = \mathbf{I} + (t+s)\mathbf{Q} + \frac{(t+s)^2}{2!}\mathbf{Q}^2 + \cdots = \left(\mathbf{I} + t\mathbf{Q} + \frac{t^2}{2!}\mathbf{Q}^2 + \cdots + \frac{t^m}{m!}\mathbf{Q}^m + \cdots\right)\left(\mathbf{I} + s\mathbf{Q} + \frac{s^2}{2!}\mathbf{Q}^2 + \cdots + \frac{s^n}{n!}\mathbf{Q}^n + \cdots\right) = e^{t\mathbf{Q}} e^{s\mathbf{Q}} = \mathbf{p}(t)\mathbf{p}(s). \qquad (3.13)$$

The third equality is a consequence of the identity

$$\sum_{m+n=r} \frac{t^m \mathbf{Q}^m}{m!}\, \frac{s^n \mathbf{Q}^n}{n!} = \left(\sum_{m+n=r} \frac{t^m s^n}{m!\, n!}\right)\mathbf{Q}^r = \frac{(t+s)^r}{r!}\,\mathbf{Q}^r. \qquad (3.14)$$

Also, the functions $H_i(t) := \sum_{j \in S} p_{ij}(t)$ satisfy

$$H_i'(t) = \sum_{j \in S} p_{ij}'(t) = \sum_{j \in S} \sum_{k \in S} q_{ik}\, p_{kj}(t) = \sum_{k \in S} q_{ik} H_k(t) \qquad (t \ge 0),$$

with initial conditions $H_i(0) = \sum_{j \in S} \delta_{ij} = 1$ ($i \in S$). Since $H_i(t) := 1$ (for all $t \ge 0$, all $i \in S$) clearly satisfies these equations, one has $H_i(t) = 1$ for all t by uniqueness of such solutions. Thus, the solutions (3.11) have been shown to satisfy all conditions for being transition probabilities except for nonnegativity (Exercise 5). Nonnegativity will also follow as a consequence of a more general method of construction of solutions given in the next section. When it applies, the exponential form (3.4) (equivalently, (3.11)) is especially suitable for calculation of transition probabilities by spectral methods, as will be seen in Section 9.

Example 2. (Poisson Process). The Poisson process with parameter $\lambda > 0$ was introduced in Example 1.1. Alternatively, the process may be regarded as a Markov process on the state space S = {0, 1, 2, ...} with prescribed infinitesimal transition rates of the form

$$q_{ij} = \lambda \ \text{ for } j = i + 1, \qquad q_{ii} = -\lambda, \qquad q_{ij} = 0 \ \text{ otherwise} \qquad (i = 0, 1, 2, \ldots). \qquad (3.15)$$

By induction it will follow that

$$q_{ij}^{(n)} = \begin{cases} (-1)^{n+i-j}\dbinom{n}{j-i}\lambda^n, & \text{if } 0 \le j - i \le n, \\ 0, & \text{otherwise}. \end{cases} \qquad (3.16)$$

Therefore, the exponential formula (3.4) [(3.11)] gives for $j \ge i$, $t > 0$,

$$p_{ij}(t) = \delta_{ij} + t q_{ij} + \frac{t^2}{2!} q_{ij}^{(2)} + \cdots = \sum_{k=0}^{\infty} \frac{t^{j-i+k}}{(j-i+k)!}\, q_{ij}^{(j-i+k)} = \sum_{k=0}^{\infty} \frac{t^{j-i+k}}{(j-i+k)!}\binom{j-i+k}{j-i}(-1)^k \lambda^{j-i+k} = \frac{(\lambda t)^{j-i}}{(j-i)!}\sum_{k=0}^{\infty} \frac{(-\lambda t)^k}{k!} = \frac{(\lambda t)^{j-i}}{(j-i)!}\, e^{-\lambda t}. \qquad (3.17)$$

Likewise,

$$p_{ij}(t) = 0 \qquad \text{for } j < i. \qquad (3.18)$$

Thus, the transition probabilities coincide with those given in Example 1.


4 SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATIONS

For the general problem of constructing transition probabilities $((p_{ij}(t)))$ having a prescribed set of infinitesimal transition rates given by $\mathbf{Q} = ((q_{ij}))$, where

$$-q_{ii} = \sum_{j \ne i} q_{ij}, \qquad q_{ij} \ge 0 \ \text{ for } i \ne j, \qquad (4.1)$$

the method of successive approximations will be used in this section. The main result provides a solution to the backward equations

$$p_{ij}'(t) = \sum_k q_{ik}\, p_{kj}(t), \qquad i, j \in S, \qquad (4.2)$$

under the initial condition

$$p_{ij}(0) = \delta_{ij}, \qquad i, j \in S. \qquad (4.3)$$

The precise statement of this result is the following.

Theorem 4.1. Given any Q satisfying (4.1) there exists a smallest nonnegative solution $\overline{\mathbf{p}}(t)$ of the backward equations (4.2) satisfying (4.3). This solution satisfies

$$\sum_{j \in S} \overline{p}_{ij}(t) \le 1 \qquad \text{for all } i \in S, \text{ all } t \ge 0. \qquad (4.4)$$

In case equality holds in (4.4) for all $i \in S$ and $t \ge 0$, there does not exist any other nonnegative solution of (4.2) that satisfies (4.3) and (4.4).

Proof. Write $\lambda_i = -q_{ii}$ ($i \in S$). Multiplying both sides of the backward equations (4.2) by the "integrating factor" $e^{\lambda_i s}$ one obtains

$$e^{\lambda_i s}\, p_{ik}'(s) = e^{\lambda_i s} \sum_{j \in S} q_{ij}\, p_{jk}(s),$$

or

$$\frac{d}{ds}\left(e^{\lambda_i s}\, p_{ik}(s)\right) = \lambda_i e^{\lambda_i s}\, p_{ik}(s) + e^{\lambda_i s} \sum_{j \in S} q_{ij}\, p_{jk}(s) = \sum_{j \ne i} e^{\lambda_i s} q_{ij}\, p_{jk}(s).$$

On integration between 0 and t one has, remembering that $p_{ik}(0) = \delta_{ik}$,

$$e^{\lambda_i t}\, p_{ik}(t) = \delta_{ik} + \sum_{j \ne i} \int_0^t e^{\lambda_i s} q_{ij}\, p_{jk}(s)\, ds,$$

or

$$p_{ik}(t) = \delta_{ik} e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i (t-s)} q_{ij}\, p_{jk}(s)\, ds \qquad (t \ge 0;\ i, k \in S). \qquad (4.5)$$

Reversing the steps shows that (4.2) together with (4.3) follow from (4.5). Thus (4.2) and (4.3) are equivalent to the system of integral equations (4.5). To solve the system (4.5) start with the first approximation

$$p_{ik}^{(0)}(t) = \delta_{ik} e^{-\lambda_i t} \qquad (i, k \in S,\ t \ge 0) \qquad (4.6)$$

and compute successive approximations, recursively, by

$$p_{ik}^{(n)}(t) = \delta_{ik} e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i (t-s)} q_{ij}\, p_{jk}^{(n-1)}(s)\, ds \qquad (n \ge 1). \qquad (4.7)$$

Since $q_{ij} \ge 0$ for $i \ne j$, it is clear that $p_{ik}^{(1)}(t) \ge p_{ik}^{(0)}(t)$. It then follows from (4.7) by induction that $p_{ik}^{(n+1)}(t) \ge p_{ik}^{(n)}(t)$ for all $n \ge 0$. Thus, $\overline{p}_{ik}(t) = \lim_{n \to \infty} p_{ik}^{(n)}(t)$ exists. Taking limits on both sides of (4.7) yields

$$\overline{p}_{ik}(t) = \delta_{ik} e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i (t-s)} q_{ij}\, \overline{p}_{jk}(s)\, ds. \qquad (4.8)$$

Hence, the $\overline{p}_{ik}(t)$ satisfy (4.5). Also, $\overline{p}_{ik}(t) \ge p_{ik}^{(0)}(t) \ge 0$. Further, $\sum_{k \in S} p_{ik}^{(0)}(t) \le 1$ for all $t \ge 0$ and all i. Assuming, as induction hypothesis, that $\sum_{k \in S} p_{ik}^{(n-1)}(t) \le 1$ for all $t \ge 0$ and all i, it follows from (4.7) that

$$\sum_{k \in S} p_{ik}^{(n)}(t) = e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i(t-s)} q_{ij}\left(\sum_{k \in S} p_{jk}^{(n-1)}(s)\right) ds \le e^{-\lambda_i t} + \lambda_i \int_0^t e^{-\lambda_i(t-s)}\, ds = e^{-\lambda_i t} + 1 - e^{-\lambda_i t} = 1.$$

Hence, $\sum_{k \in S} p_{ik}^{(n)}(t) \le 1$ for all n, all $t \ge 0$, and the same must be true for

$$\sum_{k \in S} \overline{p}_{ik}(t) = \lim_{n \to \infty} \sum_{k \in S} p_{ik}^{(n)}(t).$$
We now show that $\overline{\mathbf{p}}(t)$ is the smallest nonnegative solution of (4.5). Suppose $\mathbf{p}(t)$ is any other nonnegative solution. Then obviously $p_{ik}(t) \ge \delta_{ik} e^{-\lambda_i t} = p_{ik}^{(0)}(t)$ for all i, k, t. Assuming, as induction hypothesis, $p_{ik}(t) \ge p_{ik}^{(n-1)}(t)$ for all $i, k \in S$, $t \ge 0$, it follows from the fact that $p_{ik}(t)$ satisfies (4.5) that

$$p_{ik}(t) \ge \delta_{ik} e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i(t-s)} q_{ij}\, p_{jk}^{(n-1)}(s)\, ds = p_{ik}^{(n)}(t).$$

Hence, $p_{ik}(t) \ge p_{ik}^{(n)}(t)$ for all $n \ge 0$ and, therefore, $p_{ik}(t) \ge \overline{p}_{ik}(t)$ for all $i, k \in S$ and all $t \ge 0$. The last assertion of the theorem is almost obvious. For if equality holds in (4.4) for $\overline{\mathbf{p}}(t)$, for all i and all $t \ge 0$, and $\mathbf{p}(t)$ is another transition probability matrix, then, by the above, $p_{ik}(t) \ge \overline{p}_{ik}(t)$ for all i, k, and $t \ge 0$. If strict inequality holds for some $t = t_0$ and $i = i_0$, then summing over k one gets

$$\sum_{k \in S} p_{i_0 k}(t_0) > \sum_{k \in S} \overline{p}_{i_0 k}(t_0) = 1,$$

contradicting the hypothesis that $\mathbf{p}(t_0)$ is a transition probability matrix. ∎

Note that we have not proved that $\overline{\mathbf{p}}(t)$ satisfies the Chapman-Kolmogorov equation (2.6). This may be proved by using Laplace transforms (Exercise 6). It is also the case that the forward equations (2.18) (or (2.19)) always hold for the minimal solution $\overline{\mathbf{p}}(t)$ (Exercise 5).

In the case that (3.9) holds, i.e., the bounded rates condition, there is only one solution satisfying the backward equations and the initial conditions and, therefore, $\overline{p}_{ik}(t)$ is given by the exponential representation on the right side of (3.11). Of course, the solution may be unique even otherwise. We will come back to this question and the probabilistic implications of nonuniqueness in the next section.

Finally, the upshot of all this is that the Markov process is under certain circumstances specified by an initial distribution $\pi$ and a matrix Q satisfying (4.1). In any case, the minimal solution always exists, although the total mass may be less than 1.
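The iteration (4.6)-(4.7) is easy to carry out numerically on a finite state space. The following sketch discretizes the integral in (4.7) by the trapezoidal rule; the 3 x 3 generator Q is an arbitrary illustrative choice satisfying (4.1), and for this finite conservative Q the limit agrees with $e^{t\mathbf{Q}}$.

```python
import numpy as np
from scipy.linalg import expm

# A sketch of the successive approximations (4.6)-(4.7) on a finite state
# space, with the integral in (4.7) discretized by the trapezoidal rule.
# The generator Q is an arbitrary illustrative choice satisfying (4.1).
Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])
lam = -np.diag(Q)                      # lambda_i = -q_ii
ts = np.linspace(0.0, 1.0, 201)       # time grid on [0, 1]
dt = ts[1] - ts[0]
S = Q.shape[0]

p = np.zeros((S, S, ts.size))         # p[i, k, m] ~ p_ik^{(n)}(ts[m])
for i in range(S):
    p[i, i, :] = np.exp(-lam[i] * ts)  # first approximation (4.6)

for _ in range(30):                    # iterate (4.7); increases monotonically
    new = np.zeros_like(p)
    for i in range(S):
        new[i, i, :] = np.exp(-lam[i] * ts)
        for j in range(S):
            if j == i:
                continue
            for m, t in enumerate(ts):
                kern = np.exp(-lam[i] * (t - ts[:m + 1]))
                new[i, :, m] += Q[i, j] * np.trapz(
                    kern[None, :] * p[j, :, :m + 1], dx=dt, axis=1)
    p = new

# for this conservative finite Q the minimal solution is e^{tQ}
print(np.max(np.abs(p[:, :, -1] - expm(Q * ts[-1]))))   # small residual
```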

Example 1. Another simple model of "contagion" or "accident proneness" for actuarial mathematics, this one having a homogeneous transition law, is obtained as follows. Suppose that the probability of an accident in t to $t + \Delta t$ is $(\nu + r\lambda)\,\Delta t + o(\Delta t)$, given that r accidents have occurred previous to time t, for $\nu, \lambda > 0$. We have a pure birth process on S = {0, 1, 2, ...} having infinitesimal parameters $q_{ii} = -(\nu + i\lambda)$, $q_{i,i+1} = \nu + i\lambda$, $q_{ik} = 0$ if $k < i$ or if $k > i + 1$. The forward equations yield

$$p_{0n}'(t) = -(\nu + n\lambda)\, p_{0n}(t) + (\nu + (n-1)\lambda)\, p_{0,n-1}(t) \quad (n \ge 1), \qquad p_{00}'(t) = -\nu\, p_{00}(t). \qquad (4.9)$$

Clearly,

$$p_{00}(t) = e^{-\nu t}, \qquad (4.10)$$
$$p_{01}'(t) = -(\nu + \lambda)\, p_{01}(t) + \nu\, p_{00}(t) = -(\nu + \lambda)\, p_{01}(t) + \nu e^{-\nu t}.$$

This equation can be solved with the aid of an integrating factor as follows. Let $g(t) = e^{(\nu+\lambda)t}\, p_{01}(t)$. Then (4.10) may be expressed as

$$\frac{dg(t)}{dt} = \nu e^{\lambda t},$$

or

$$g(t) = g(0) + \int_0^t \nu e^{\lambda u}\, du = \int_0^t \nu e^{\lambda u}\, du = \frac{\nu}{\lambda}\left(e^{\lambda t} - 1\right).$$

Therefore,

$$p_{01}(t) = e^{-(\nu+\lambda)t} g(t) = \frac{\nu}{\lambda}\left(e^{-\nu t} - e^{-(\nu+\lambda)t}\right) = \frac{\nu}{\lambda}\, e^{-\nu t}\left(1 - e^{-\lambda t}\right). \qquad (4.11)$$

Next,

$$p_{02}'(t) = -(\nu + 2\lambda)\, p_{02}(t) + (\nu + \lambda)\, p_{01}(t) = -(\nu + 2\lambda)\, p_{02}(t) + (\nu + \lambda)\frac{\nu}{\lambda}\, e^{-\nu t}\left(1 - e^{-\lambda t}\right). \qquad (4.12)$$

Therefore, as in (4.10) and (4.11),

$$p_{02}(t) = e^{-(\nu+2\lambda)t}\left[\int_0^t e^{(\nu+2\lambda)u}\,\frac{\nu(\nu+\lambda)}{\lambda}\, e^{-\nu u}\left(1 - e^{-\lambda u}\right) du\right] = e^{-(\nu+2\lambda)t}\,\frac{\nu(\nu+\lambda)}{\lambda}\int_0^t \left(e^{2\lambda u} - e^{\lambda u}\right) du$$
$$= e^{-(\nu+2\lambda)t}\,\frac{\nu(\nu+\lambda)}{\lambda}\left[\frac{e^{2\lambda t} - 1}{2\lambda} - \frac{e^{\lambda t} - 1}{\lambda}\right] = \frac{\nu(\nu+\lambda)}{2\lambda^2}\left[e^{-\nu t} - 2e^{-(\nu+\lambda)t} + e^{-(\nu+2\lambda)t}\right] = \frac{\nu(\nu+\lambda)}{2\lambda^2}\, e^{-\nu t}\left[1 - e^{-\lambda t}\right]^2.$$

Assume, as induction hypothesis, that

$$p_{0n}(t) = \frac{\nu(\nu+\lambda)\cdots(\nu+(n-1)\lambda)}{n!\,\lambda^n}\, e^{-\nu t}\left[1 - e^{-\lambda t}\right]^n.$$

Then,

$$p_{0,n+1}'(t) = -(\nu + (n+1)\lambda)\, p_{0,n+1}(t) + (\nu + n\lambda)\, p_{0n}(t),$$

yields

$$p_{0,n+1}(t) = e^{-(\nu+(n+1)\lambda)t}\int_0^t e^{(\nu+(n+1)\lambda)u}(\nu + n\lambda)\, p_{0n}(u)\, du = \frac{\nu(\nu+\lambda)\cdots(\nu+n\lambda)}{n!\,\lambda^n}\, e^{-(\nu+(n+1)\lambda)t}\int_0^t e^{(n+1)\lambda u}\left[1 - e^{-\lambda u}\right]^n du. \qquad (4.13)$$

Now, setting $x = e^{\lambda u}$,

$$\int_0^t e^{(n+1)\lambda u}\left[1 - e^{-\lambda u}\right]^n du = \frac{1}{\lambda}\int_1^{e^{\lambda t}} x^n\left(1 - \frac{1}{x}\right)^n dx = \frac{1}{\lambda}\int_1^{e^{\lambda t}} (x - 1)^n\, dx = \frac{1}{\lambda}\left[\frac{(x-1)^{n+1}}{n+1}\right]_1^{e^{\lambda t}} = \frac{\left(e^{\lambda t} - 1\right)^{n+1}}{\lambda(n+1)}.$$

Hence,

$$p_{0,n+1}(t) = \frac{\nu(\nu+\lambda)\cdots(\nu+n\lambda)}{(n+1)!\,\lambda^{n+1}}\, e^{-\nu t}\left(1 - e^{-\lambda t}\right)^{n+1}. \qquad (4.14)$$
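The closed form (4.14) may be checked by integrating the forward equations (4.9) directly on a truncated state space. In the sketch below, $\nu$, $\lambda$, t and the truncation level N are arbitrary illustrative values.

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

# A numerical check of (4.14): integrate the forward equations (4.9) on a
# truncated state space {0, ..., N-1}; nu, lam, t, N are illustrative.
nu, lam, t, N = 0.8, 1.3, 1.0, 60

def forward(_, p):
    dp = np.empty_like(p)
    dp[0] = -nu * p[0]
    for n in range(1, N):
        dp[n] = -(nu + n * lam) * p[n] + (nu + (n - 1) * lam) * p[n - 1]
    return dp

p0 = np.zeros(N)
p0[0] = 1.0
numeric = solve_ivp(forward, (0.0, t), p0, rtol=1e-10, atol=1e-12).y[:, -1]

def closed(n):   # the closed form of p_{0n}(t), cf. (4.14)
    c = np.prod([nu + m * lam for m in range(n)]) if n else 1.0
    return c / (math.factorial(n) * lam**n) * \
        np.exp(-nu * t) * (1.0 - np.exp(-lam * t))**n

print(max(abs(numeric[n] - closed(n)) for n in range(10)))   # tiny
```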

5 SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY

Let $\mathbf{Q} = ((q_{ij}))$ be transition rates satisfying (4.1) and such that the corresponding Kolmogorov backward equation admits a unique (transition probability semigroup) solution $\mathbf{p}(t) = ((p_{ij}(t)))$. Given an initial distribution $\pi$ on S there is a Markov process $\{X_t\}$ with transition probabilities $\mathbf{p}(t)$, $t \ge 0$, and initial distribution $\pi$ having right-continuous sample paths. Indeed, the process $\{X_t\}$ may be constructed as coordinate projections on the space $\Omega$ of right-continuous step functions on $[0, \infty)$ with values in S (theoretical complement 5.3).

Our purpose in the present section is to analyze the probabilistic nature of the process $\{X_t\}$. First we consider the distribution of the time spent in the initial state.

Proposition 5.1. Let the Markov chain $\{X_t: 0 \le t < \infty\}$ have the initial state i and let $T_0 = \inf\{t \ge 0: X_t \ne i\}$. Then $T_0$ has an exponential distribution with parameter $-q_{ii}$. In the case $q_{ii} = 0$, the degeneracy of the exponential distribution can be interpreted to mean that i is an absorbing state, i.e., $P_i(T_0 = \infty) = 1$.

Proof. Choose and fix $t > 0$. For each integer $n \ge 1$ define the finite-dimensional event

$$A_n = \{X_{(m/2^n)t} = i \ \text{for} \ m = 0, 1, \ldots, 2^n\}.$$

The events $A_n$ are decreasing as n increases and

$$A = \lim_{n \to \infty} A_n := \bigcap_{n=1}^{\infty} A_n = \{X_u = i \ \text{for all u in } [0, t] \ \text{which is a binary rational multiple of } t\} = \{T_0 > t\}.$$

To see why the last equality holds, first note that $\{T_0 > t\} = \{X_u = i \text{ for all } u \text{ in } [0, t]\} \subset A$. On the other hand, since the sample paths are step functions, if a sample path is not in $\{T_0 > t\}$ then there occurs a jump to a state j, different from i, at some time $t_0$ ($0 < t_0 \le t$). The case $t_0 = t$ may be excluded, since such a path is not in A. Because each sample path is a right-continuous step function, there is a time point $t_1 > t_0$ such that $X_u = j$ for $t_0 \le u < t_1$. Since there is some u of the form $u = (m/2^n)t \le t$ in every nondegenerate interval, it follows that $X_u = j$ for some u of the form $u = (m/2^n)t \le t$; this implies that this sample path is not in $A_n$ and, hence, not in A. Therefore, $\{T_0 > t\} \supset A$. Now note by (2.2),

$$P_i(T_0 > t) = P_i(A) = \lim_{n \to \infty} P_i(A_n) = \lim_{n \to \infty}\left[p_{ii}\left(\frac{t}{2^n}\right)\right]^{2^n} = \lim_{n \to \infty}\left[1 + \frac{t}{2^n}\, q_{ii} + o\left(\frac{t}{2^n}\right)\right]^{2^n} = e^{t q_{ii}}, \qquad (5.1)$$

proving that $T_0$ is exponentially distributed with parameter $-q_{ii}$. If $q_{ii} = 0$, the above calculation shows that $P_i(T_0 > t) = e^0 = 1$ for all $t > 0$. Hence $P_i(T_0 = \infty) = 1$. ∎

The following random times are basic to the description of the evolution of continuous-time Markov chains:

$$\tau_0 = 0, \quad \tau_1 = \inf\{t > 0: X_t \ne X_0\}, \quad \tau_n = \inf\{t > \tau_{n-1}: X_t \ne X_{\tau_{n-1}}\},$$
$$T_0 = \tau_1, \quad T_n = \tau_{n+1} - \tau_n \quad \text{for } n \ge 1. \qquad (5.2)$$

Thus, $T_0$ is the holding time in the initial state, $T_1$ is the holding time in the state to which the process jumps the first time, and so on. Generally, $T_n$ is the holding time in the state to which the process jumps at its nth transition.

As usual, $P_i$ denotes the distribution of the process $\{X_t\}$ under $X_0 = i$. As might be guessed, given the past up to and including time $T_0$, the process evolves from time $T_0$ onwards as the original Markov process would with initial state $X_{T_0}$. More generally, given the sample path of the process up to time $\tau_n = T_0 + T_1 + \cdots + T_{n-1}$, the conditional distribution of $\{X_{\tau_n + t}: t \ge 0\}$ is $P_{X_{\tau_n}}$; it depends only on the (present) state $X_{\tau_n}$ and on nothing else in the past.

Although this seems intuitively clear from the Markov property, the time $T_0$ is not a constant, as in the Markov property, but a random variable. The italicized statement above is a case of an extension of the Markov property known as the strong Markov property. To state this property we introduce a class of random times called stopping times or Markov times. A stopping time $\tau$ is a random time, i.e., a random variable with values in $[0, \infty]$, with the property that for every fixed time s, the occurrence or nonoccurrence of the event $\{\tau \le s\}$ can be determined by a knowledge of $\{X_u: 0 \le u \le s\}$. For example, if one cannot decide whether or not the event $\{\tau \le 10\}$ has happened by observing only $\{X_u: 0 \le u \le 10\}$, then $\tau$ is not a stopping time. The random variables $\tau_1 = T_0$, $\tau_2 = T_0 + T_1, \ldots, \tau_n$ are stopping times, but $T_1$, $T_0 + T_2$ are not (Exercise 1).

Proposition 5.2. On the set $\{\omega \in \Omega: \tau(\omega) < \infty\}$, the conditional distribution of $\{X_{\tau + t}: t \ge 0\}$ given the past up to time $\tau$ is $P_{X_\tau}$, if $\tau$ is a stopping time.

Stated another way, Proposition 5.2 says that on $\{\tau < \infty\}$, given the past of the process $\{X_t\}$ up to time $\tau$, the future is (conditionally) distributed as the Markov chain starting at $X_\tau$ and having the same transition probabilities as those of $\{X_t\}$. This is the strong Markov property. The proof of Proposition 5.2 in the case that $\tau$ is discrete is similar to that already given in Section 4, Chapter II. The proof in the general case follows by approximating $\tau$ by discrete stopping times. For a detailed proof see Theorem 11.1 in Chapter V.
The strong Markov property will now be used to obtain a vivid probabilistic description of the Markov chain $\{X_t\}$. For this, let us write

$$k_{ii} = 0, \quad k_{ij} = \frac{q_{ij}}{-q_{ii}} \ \text{for } i \ne j, \quad \text{if } q_{ii} \ne 0; \qquad k_{ii} = 1 \ \text{and} \ k_{ij} = 0 \ \text{for } i \ne j, \quad \text{if } q_{ii} = 0. \qquad (5.3)$$

Proposition 5.3. If the initial state is i, and $q_{ii} \ne 0$, then $T_0$ and $X_{T_0}$ are independent and the distribution of $X_{T_0}$ is given by

$$P_i(X_{T_0} = j) = k_{ij}, \qquad j \in S. \qquad (5.4)$$

Proof. For $s, \Delta > 0$, and $j \ne i$ observe that

$$P_i(s < T_0 \le s + \Delta,\ X_{s+\Delta} = j) = P_i(X_u = i \ \text{for} \ 0 \le u \le s,\ X_{s+\Delta} = j) = P_i(X_u = i \ \text{for} \ 0 \le u \le s)\, P_i(X_{s+\Delta} = j \mid X_u = i \ \text{for} \ 0 \le u \le s) = P_i(T_0 > s)\, p_{ij}(\Delta) = e^{q_{ii} s}\, p_{ij}(\Delta). \qquad (5.5)$$

Dividing the first and last expressions of (5.5) by $\Delta$ and letting $\Delta \downarrow 0$, one gets the joint density-mass function of $T_0$ and $X_{T_0}$ at s and j, respectively (Exercise 2),

$$f_{T_0, X_{T_0}}(s, j) = e^{q_{ii} s}\, p_{ij}'(0) = e^{q_{ii} s}\, q_{ij} = \lambda_i e^{-\lambda_i s}\, k_{ij}, \qquad (5.6)$$

where $\lambda_i = -q_{ii}$. ∎

Now use Propositions 5.1 and 5.3 and the strong Markov property to check the following computation:

$$P_{i_0}(T_0 \le s_0, X_{\tau_1} = i_1, T_1 \le s_1, X_{\tau_2} = i_2, \ldots, T_n \le s_n, X_{\tau_{n+1}} = i_{n+1})$$
$$= \int_{s=0}^{s_0} P_{i_0}(T_1 \le s_1, X_{\tau_2} = i_2, \ldots, T_n \le s_n, X_{\tau_{n+1}} = i_{n+1} \mid T_0 = s, X_{\tau_1} = i_1)\,\lambda_{i_0} e^{-\lambda_{i_0} s} k_{i_0 i_1}\, ds$$
$$= \int_{s=0}^{s_0} P_{i_1}(T_0 \le s_1, X_{\tau_1} = i_2, \ldots, T_{n-1} \le s_n, X_{\tau_n} = i_{n+1})\,\lambda_{i_0} e^{-\lambda_{i_0} s} k_{i_0 i_1}\, ds$$
$$= \left(\int_{s=0}^{s_0} \lambda_{i_0} e^{-\lambda_{i_0} s}\, ds\right) k_{i_0 i_1}\left(\int_{s=0}^{s_1} \lambda_{i_1} e^{-\lambda_{i_1} s}\, ds\right) k_{i_1 i_2} \times P_{i_2}(T_0 \le s_2, X_{\tau_1} = i_3, \ldots, T_{n-2} \le s_n, X_{\tau_{n-1}} = i_{n+1})$$
$$= \cdots = \prod_{j=0}^{n}\left(\int_{s=0}^{s_j} \lambda_{i_j} e^{-\lambda_{i_j} s}\, ds\right)\prod_{j=0}^{n} k_{i_j i_{j+1}}. \qquad (5.7)$$

Note that

$$\prod_{j=0}^{n} k_{i_j i_{j+1}} = P_{i_0}(Y_1 = i_1, Y_2 = i_2, \ldots, Y_{n+1} = i_{n+1}),$$

where $\{Y_m: m = 0, 1, \ldots\}$ is a discrete-parameter Markov chain with initial state $i_0$ and transition matrix $\mathbf{K} = ((k_{ij}))$, while

$$\prod_{j=0}^{n} \int_{s=0}^{s_j} \lambda_{i_j} e^{-\lambda_{i_j} s}\, ds$$

is the joint distribution function of n + 1 independent exponential random variables with parameters $\lambda_{i_0}, \lambda_{i_1}, \ldots, \lambda_{i_n}$. We have, therefore, proved the following.

Theorem 5.4
(a) Let $\{X_t\}$ be a Markov chain having the infinitesimal generator Q and initial state $i_0$. Then $\{Y_n := X_{\tau_n}: n = 0, 1, 2, \ldots\}$ is a discrete-parameter Markov chain having the transition probability matrix $\mathbf{K} = ((k_{ij}))$ given by (5.3). Also, conditionally given $\{Y_n\}$, the holding times $T_0, T_1, T_2, \ldots$ are independent exponentially distributed random variables with parameters $\lambda_{Y_0}, \lambda_{Y_1}, \lambda_{Y_2}, \ldots$.
(b) Conversely, suppose $\{Y_n\}$ is a discrete-parameter Markov chain having transition probability matrix K and initial state $i_0$, and for each sequence of states $i_0, i_1, i_2, \ldots$, let $T_0, T_1, T_2, \ldots$ be a sequence of independent exponentially distributed random variables with parameters $\lambda_{i_0}, \lambda_{i_1}, \lambda_{i_2}, \ldots$, respectively. Assume also that the sequences $\{T_0, T_1, T_2, \ldots\}$, corresponding to all distinct sequences of states, comprise an independent family. Define

$$X_t = Y_0, \quad 0 \le t < T_0,$$
$$X_t = Y_{k+1} \quad \text{for } T_0 + T_1 + \cdots + T_k \le t < T_0 + T_1 + \cdots + T_{k+1}. \qquad (5.8)$$

Then the process $\{X_t: t \ge 0\}$ so constructed is Markovian with initial state $i_0$ and infinitesimal generator Q.

Part (b) of Theorem 5.4 provides a noncanonical construction of a Markov chain with the transition rates given by $\mathbf{Q} = ((q_{ij}))$. By randomizing over the initial state, one easily extends the theorem to an arbitrary initial distribution $\pi$.
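Part (b) also suggests an immediate simulation recipe (in the spirit of what is now commonly called the Gillespie algorithm): draw the embedded chain from K and attach independent exponential holding times. A minimal sketch follows, with an arbitrary illustrative three-state generator Q; the empirical law of $X_t$ is compared with $e^{t\mathbf{Q}}$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# A sketch of the construction in Theorem 5.4(b): jump according to the
# embedded chain K, hold for exponential times with parameters lambda_i.
# The three-state generator Q is an arbitrary illustrative choice.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -1.0, 0.0],
              [0.3, 0.7, -1.0]])
lam = -np.diag(Q)
K = Q / lam[:, None]
np.fill_diagonal(K, 0.0)               # k_ij = q_ij / lambda_i, k_ii = 0

def state_at(i0, t_max):
    t, state = 0.0, i0
    while True:
        t += rng.exponential(1.0 / lam[state])     # holding time T_n
        if t > t_max:
            return state
        state = rng.choice(len(lam), p=K[state])   # next state Y_{n+1}

t_max, n_paths = 0.7, 100_000
counts = np.bincount([state_at(0, t_max) for _ in range(n_paths)],
                     minlength=3)
print(counts / n_paths)                # empirical distribution of X_t
print(expm(t_max * Q)[0])              # row i = 0 of p(t); close agreement
```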
The content of the system of integral equations (4.5) is now easy to understand. Let us write (4.5) as

$$p_{il}(t) = \delta_{il} e^{-\lambda_i t} + \sum_{j \ne i}\int_0^t \lambda_i e^{-\lambda_i s} k_{ij}\, p_{jl}(t - s)\, ds. \qquad (5.9)$$

If $j \ne i$, then $k_{ij}\, p_{jl}(t - s)$ is the conditional probability, given $T_0 = s$, that $X_{T_0} = j$ and in an additional time $t - s$ the process will be in state l. Adding over all $j \ne i$, and then multiplying by the density of $T_0$ and integrating, one gets for the case $l \ne i$,

$$p_{il}(t) = P_i(X_t = l) = \int_0^t \sum_{j \ne i} P_i(X_{T_0} = j, X_t = l \mid T_0 = s)\,\lambda_i e^{-\lambda_i s}\, ds = \sum_{j \ne i}\int_0^t \lambda_i e^{-\lambda_i s} k_{ij}\, p_{jl}(t - s)\, ds. \qquad (5.10)$$

If $l = i$, an extra term, namely $e^{-\lambda_i t}$, must be added as the probability that no jump occurs in the time interval (0, t].

Example 1. (Homogeneous Poisson Process). As an immediate consequence of Theorem 5.4 we get the following corollary.

Corollary 5.5. For a homogeneous Poisson process with parameter $\lambda$ the successive holding times are i.i.d. exponential with parameter $\lambda$.

An interesting and useful property of the holding times of a Poisson process concerns the conditional distribution of the successive arrival times in the period from 0 to t given the number of arrivals in [0, t].

Proposition 5.6. Let $T_0, T_0 + T_1, \ldots, T_0 + T_1 + \cdots + T_{k-1}, \ldots$ denote successive arrival times of the Poisson process $\{X_t\}$ with parameter $\lambda$. Then the conditional distribution of $(T_0, T_0 + T_1, \ldots, T_0 + T_1 + \cdots + T_{k-1})$ given $\{X_t - X_0 = k\}$ is the same as that of k increasingly ordered independent random variables each having the uniform distribution on (0, t].

Proof. Let $U_0, U_1, \ldots, U_{k-1}$ be k i.i.d. random variables each uniformly distributed on (0, t]. Let $U_{(0)}$ be the smallest of $U_0, U_1, \ldots, U_{k-1}$, $U_{(1)}$ the next smallest, etc., so that with probability one $U_{(0)} < U_{(1)} < \cdots < U_{(k-1)}$. Since each realization of $(U_{(0)}, U_{(1)}, \ldots, U_{(k-1)})$ corresponds to k! permutations of $(U_0, U_1, \ldots, U_{k-1})$, each with the same density $1/t^k$, the joint density of $(U_{(0)}, U_{(1)}, \ldots, U_{(k-1)})$ is given by

$$g(s_0, \ldots, s_{k-1}) = \begin{cases} \dfrac{k!}{t^k}, & \text{if } 0 < s_0 < s_1 < \cdots < s_{k-1} \le t, \\ 0, & \text{otherwise}. \end{cases} \qquad (5.11)$$
Now, by Corollary 5.5, $T_0, T_1, \ldots, T_k$ are i.i.d. with joint density given by

$$f(x_0, x_1, \ldots, x_k) = \begin{cases} \lambda^{k+1}\exp\left\{-\lambda\sum_{i=0}^{k} x_i\right\}, & \text{if } x_i > 0 \text{ for all } i, \\ 0, & \text{otherwise}. \end{cases} \qquad (5.12)$$

Since the Jacobian of the transformation

$$(x_0, x_1, \ldots, x_k) \to (x_0, x_0 + x_1, \ldots, x_0 + x_1 + \cdots + x_k)$$

is 1, the joint density of $T_0, T_0 + T_1, \ldots, T_0 + T_1 + \cdots + T_k$ is given by

$$f_1(t_0, t_1, \ldots, t_k) = \lambda^{k+1} e^{-\lambda t_k} \qquad \text{for } 0 < t_0 < t_1 < t_2 < \cdots < t_k. \qquad (5.13)$$

The density of the conditional distribution of $T_0, T_0 + T_1, \ldots, T_0 + T_1 + \cdots + T_k$ given $\{T_0 + T_1 + \cdots + T_{k-1} \le t < T_0 + T_1 + \cdots + T_k\}$ is

$$f_2(s_0, s_1, \ldots, s_k) = \begin{cases} \dfrac{\lambda^{k+1} e^{-\lambda s_k}}{P(T_0 + T_1 + \cdots + T_{k-1} \le t < T_0 + T_1 + \cdots + T_k)}, & \text{if } 0 < s_0 < s_1 < \cdots < s_k \text{ and } s_{k-1} \le t < s_k, \\ 0, & \text{otherwise}. \end{cases} \qquad (5.14)$$

But the denominator in (5.14) equals $P(Y_t = k)$, where $Y_t = X_t - X_0$. Therefore,

$$f_2(s_0, s_1, \ldots, s_k) = \begin{cases} \dfrac{\lambda^{k+1} e^{-\lambda s_k}}{\dfrac{(\lambda t)^k}{k!}\, e^{-\lambda t}} = \dfrac{\lambda\, k!\, e^{-\lambda s_k}}{t^k e^{-\lambda t}}, & \text{if } 0 < s_0 < s_1 < \cdots < s_k \text{ and } s_{k-1} \le t < s_k, \\ 0, & \text{otherwise}. \end{cases} \qquad (5.15)$$

Integrating this over $s_k$ we get the conditional density of $T_0, T_0 + T_1, \ldots, T_0 + T_1 + \cdots + T_{k-1}$ given $\{Y_t = k\}$ as

$$f_3(s_0, s_1, \ldots, s_{k-1}) = \begin{cases} \displaystyle\int_t^{\infty}\frac{\lambda\, k!\, e^{-\lambda s_k}}{t^k e^{-\lambda t}}\, ds_k = \frac{k!}{t^k}, & \text{if } 0 < s_0 < s_1 < \cdots < s_{k-1} \le t, \\ 0, & \text{otherwise}. \end{cases} \qquad (5.16)$$

Thus, as asserted, $f_3$ is the same as g. ∎
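Proposition 5.6 is easy to visualize by simulation: Poisson arrival times on (0, t], conditioned on exactly k arrivals, should reproduce the order statistics of k uniforms. In the sketch below, $\lambda$, t, k and the number of trials are arbitrary illustrative values, and the conditioning is implemented by rejection.

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical illustration of Proposition 5.6; lam, t, k, trials are
# arbitrary illustrative values.
lam, t, k, trials = 2.0, 2.0, 4, 20_000

conditioned = []
while len(conditioned) < trials:       # conditioning by rejection
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=3 * k))
    kept = arrivals[arrivals <= t]
    if kept.size == k:                 # exactly k arrivals in (0, t]
        conditioned.append(kept)
conditioned = np.array(conditioned)

uniforms = np.sort(rng.uniform(0.0, t, size=(trials, k)), axis=1)

print(conditioned.mean(axis=0))        # mean order statistics
print(uniforms.mean(axis=0))           # should agree, ~ t*j/(k+1)
```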

Example 2. (Continuous-Parameter Markov Branching Process). This is the continuous-parameter version of the Bienaymé-Galton-Watson branching process discussed in Chapter II. It is another basic random replication scheme that is used to describe processes ranging from the growth of populations to the branching of river networks. Consider a population of $X_0 = i_0 \ge 1$ particles at time 0. Each particle, independently of the others, waits an exponentially distributed length of time with parameter $\lambda > 0$ and then splits into a random number k of particles with probability $f_k$ (independently of the waiting time), where $f_k \ge 0$, $f_0 > 0$, $0 < f_0 + f_1 < 1$, $\sum_{k=0}^{\infty} f_k = 1$. The progeny continue to split according to this same process. Let $X_t$ denote the number of particles present at time t. An equivalent description of the process $\{X_t\}$ is as follows. Since the minimum of $i_0$ i.i.d. exponential random variables with parameter $\lambda > 0$ is exponential with parameter $i_0\lambda$, the holding time $T_0$ in state $i_0$ is exponential with parameter $i_0\lambda$. At time $T_0$ the initial population of $X_0 = i_0$ particles becomes a population of $X_{T_0} = j \ge i_0 - 1$ particles with probability $f_{j - i_0 + 1}$. If $i_0 = 0$, then $X_t = 0$ for all $t \ge 0$. In view of the lack of memory property of the i.i.d. exponentially distributed splitting times of the particles and their progeny, the holding time $T_1$ to the next transition given $\{X_{T_0} = j, T_0\}$ ($j \ge 1$) is exponential with parameter $\lambda_j = j\lambda$ (Exercise 13). Thus, $\{X_t\}$ may be described as a continuous-parameter Markov chain with state space S = {0, 1, 2, ...} with absorbing state at 0 and having infinitesimal transition rates that can be calculated as follows. First note that the probability of k of i particles splitting within time $h > 0$ is

$$\binom{i}{k}(e^{-\lambda h})^{i-k}(1 - e^{-\lambda h})^k = O(h^k) = o(h) \qquad \text{for } i \ge k \ge 2.$$

Now,

$$p_{ii}(h) = e^{-i\lambda h} + \binom{i}{1} f_1\left[\int_0^h \lambda e^{-\lambda s} e^{-\lambda(h-s)}\, ds\right](e^{-\lambda h})^{i-1} + O(h^2) = 1 - i\lambda(1 - f_1)h + o(h) \qquad \text{as } h \to 0. \qquad (5.17)$$

Likewise, for $j \ge i + 1$,

$$p_{ij}(h) = \binom{i}{1} f_{j-i+1}\left[\int_0^h \lambda e^{-\lambda s} e^{-(j-i+1)\lambda(h-s)}\, ds\right](e^{-\lambda h})^{i-1} + o(h) = i f_{j-i+1}\lambda h + o(h) \qquad \text{as } h \to 0. \qquad (5.18)$$

For $j = i - 1$, we have

$$p_{i,i-1}(h) = \binom{i}{1} f_0\left[\int_0^h \lambda e^{-\lambda s}\, ds\right](e^{-\lambda h})^{i-1} + o(h) = i\lambda f_0 h + o(h) \qquad \text{as } h \to 0. \qquad (5.19)$$

Therefore, the transition rates are given by

$$q_{ij} = \begin{cases} i\lambda f_{j-i+1}, & j \ge i - 1,\ i \ge 1,\ j \ne i, \\ -i\lambda(1 - f_1), & j = i \ge 1, \\ 0, & i = 0,\ j \ge 0, \ \text{or } j < i - 1,\ i \ge 1. \end{cases} \qquad (5.20)$$
The branching evolution is characteristically multiplicative and as such clearly does not possess (additive) independent increments. The branching mechanism is distinctively displayed in terms of the probability-generating functions of the (minimal) transition probabilities by the following considerations. Let

$$g_t^{(i)}(r) := \sum_{j=0}^{\infty} p_{ij}(t)\, r^j, \qquad i = 0, 1, 2, \ldots. \qquad (5.21)$$

Since each of the i initial particles, independently of the others, has produced a random number of progeny at time t with distribution $(p_{1j}(t): j = 0, 1, \ldots)$, the distribution of the total progeny of the i initial particles at time t is the i-fold convolution $(p_{ij}(t): j = 0, 1, 2, \ldots)$. Therefore,

$$g_t^{(i)}(r) = \left(g_t^{(1)}(r)\right)^i, \qquad i = 0, 1, 2, \ldots. \qquad (5.22)$$

The Chapman-Kolmogorov equations for the branching evolution take the (transformed) form

$$g_{t+s}^{(1)}(r) = g_t^{(1)}\left(g_s^{(1)}(r)\right), \qquad (5.23)$$

since

$$\sum_{j=0}^{\infty} p_{ij}(t+s)\, r^j = \sum_{j=0}^{\infty}\sum_k p_{ik}(t)\, p_{kj}(s)\, r^j = \sum_k p_{ik}(t)\left(g_s^{(1)}(r)\right)^k = g_t^{(i)}\left(g_s^{(1)}(r)\right).$$

Fix $\tau > 0$ and consider the discrete-parameter stochastic process $\{\hat{X}_n\}$ defined by

$$\hat{X}_n := X_{n\tau}, \qquad n = 0, 1, 2, \ldots. \qquad (5.24)$$

The process $\{\hat{X}_n\}$ is clearly a discrete-parameter Markov chain. In fact, from (5.24) one can see that $\{\hat{X}_n\}$ has stationary transition probabilities $\hat{p}_{ij} = p_{ij}(\tau)$. This makes $\{\hat{X}_n\}$ a discrete-parameter Bienaymé-Galton-Watson branching process with offspring distribution $\hat{f}_j = \hat{p}_{1j} = p_{1j}(\tau)$.
Observe that the event that $\{X_t\}$ eventually becomes extinct is equivalent to the event that $\{\hat{X}_n\}$ eventually becomes extinct. It follows from the result obtained in Example 11.2 of Chapter II for the discrete-parameter case that the probability p of eventual extinction of an initial single particle is the smallest nonnegative (fixed-point) root r of the equation

$$g_t^{(1)}(r) = r. \qquad (5.25)$$

In particular, observe that this root cannot depend on $t > 0$. One may therefore expect to be able to compute p from the generating function of the infinitesimal transition rates. Define

$$h^{(1)}(r) := \sum_{j=0}^{\infty} q_{1j}\, r^j = \lambda\left(\sum_{j=0}^{\infty} f_j r^j - r\right). \qquad (5.26)$$

Proposition 5.7. The extinction probability p for the continuous-parameter branching process is the smallest nonnegative root of the equation

$$h^{(1)}(r) = 0.$$

Moreover, if $\sum_j j f_j \le 1$ then $p = 1$.

Proof. Observe that the Kolmogorov backward equation for the (minimal) branching evolution transforms as follows:

$$\frac{\partial g_t^{(i)}(r)}{\partial t} = \sum_j \frac{\partial p_{ij}(t)}{\partial t}\, r^j = \sum_j \sum_k q_{ik}\, p_{kj}(t)\, r^j = \sum_k q_{ik}\, g_t^{(k)}(r) = \sum_k q_{ik}\left(g_t^{(1)}(r)\right)^k.$$

Thus,

$$\frac{\partial g_t^{(1)}(r)}{\partial t} = h^{(1)}\left(g_t^{(1)}(r)\right), \qquad (5.27)$$

$$g_0^{(1)}(r) = \sum_{j=0}^{\infty} p_{1j}(0)\, r^j = r. \qquad (5.28)$$

In particular, since $g_t^{(1)}(p) = p$ for all $t \ge 0$, it follows that p must satisfy

$$h^{(1)}(p) = \frac{\partial g_t^{(1)}(p)}{\partial t} = 0. \qquad (5.29)$$

Since $h^{(1)}(0) = \lambda f_0 > 0$, $h^{(1)}(1) = \sum_{j=0}^{\infty} q_{1j} = 0$, and

$$\frac{d^2 h^{(1)}(r)}{dr^2} = \lambda\sum_{j=2}^{\infty} j(j-1) f_j r^{j-2} \ge 0, \qquad 0 < r < 1,$$

it follows that $h^{(1)}(r)$ can have at most one zero in the open interval $0 < r < 1$. If $\sum_{j=0}^{\infty} j f_j \le 1$ then

$$\frac{d h^{(1)}(r)}{dr} = \lambda\left(\sum_{j=1}^{\infty} j f_j r^{j-1} - 1\right)$$

is nonpositive for all $0 < r < 1$. Thus $p = 1$ in this case. ∎

As a corollary it follows that, since $\lambda$ appears as a simple positive factor in (5.26), the extinction probability is the smallest nonnegative fixed point of the probability-generating function of the offspring distribution $\{f_j\}$. In particular, therefore, extinction is completely determined by the embedded discrete-parameter Markov chain $\{Y_n\}$. Also, if $\sum_j j f_j > 1$, then $p < 1$ (see Example 11.2 of Chapter II).
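Proposition 5.7 thus reduces the extinction probability to a one-dimensional root-finding problem. The sketch below assumes an arbitrary illustrative supercritical offspring law f; for subcritical or critical means the proposition gives $p = 1$ directly.

```python
from scipy.optimize import brentq

# Extinction probability via Proposition 5.7: smallest nonnegative root
# of h(r) = lam*(sum_j f_j r^j - r) = 0, i.e., smallest fixed point of
# the offspring pgf. The offspring law f is an illustrative example.
f = {0: 0.25, 2: 0.75}                  # mean offspring = 1.5 > 1

def pgf(r):
    return sum(prob * r**j for j, prob in f.items())

mean = sum(j * prob for j, prob in f.items())
if mean <= 1.0:
    p_ext = 1.0                          # Proposition 5.7: p = 1
else:
    p_ext = brentq(lambda r: pgf(r) - r, 0.0, 1.0 - 1e-12)
print(p_ext)                             # 1/3 for this choice of f
```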

Example 3. (Continuous-Parameter Binary Branching or Birth-Death). The Bienaymé-Galton-Watson binary branching Markov process $\{X_t\}$ is a simple birth-death process on {0, 1, 2, ...} with $X_0 = 1$, absorption at 0, having linear birth-death rates $q_{i,i+1} = q_{i,i-1} = i\lambda/2$, $i \ge 1$, and having right-continuous sample paths a.s. Define

$$L := \inf\{t \ge 0: X_t = 0\}, \qquad (5.30)$$

$$M_t := \#\{0 < s \le t: X_s - X_{s-} = -1\}. \qquad (5.31)$$

Imagine a binary tree graph emanating from a single root vertex (see Figure 5.1) such that starting from a single root particle (vertex), a particle lives for an exponentially distributed duration with mean $\lambda^{-1}$ and then either splits into two particles or terminates, with equal probabilities and independently of the holding times. Each new particle is again independently subjected to the same rules of evolution as its predecessors. Then L represents the time to extinction of this process, and $M_t$ is the total number of deaths by time t. According to Proposition 5.7, L is finite with probability 1. In particular, $M_L$ represents the number of degree-one vertices in the tree (excluding the root). In the context of random flow networks, such vertices represent sources, and the extreme value statistic L is the main channel length. We will consider the problem of computing the (conditional) mean of L given $M_L = n$. The process $\{(X_t, M_t)\}$ is a Markov process on $\mathbb{Z}_+ \times \mathbb{Z}_+$ having rates $\mathbf{A} = ((a_{ij}))$, where

$$a_{ij} = \begin{cases} \tfrac{1}{2} i_1 \lambda, & j = i + (1, 0) = (i_1 + 1, i_2), \\ \tfrac{1}{2} i_1 \lambda, & j = i + (-1, 1) = (i_1 - 1, i_2 + 1), \\ -i_1 \lambda, & j = i, \\ 0, & \text{otherwise}, \end{cases} \qquad i = (i_1, i_2). \qquad (5.32)$$

Justification of the use of the forward equations below is deferred to Section 8 (Example 8.2, Exercise 8.14).

Lemma. Let $g(t, u, v) = E_{(1,0)}\{u^{X_t} v^{M_t}\}$, $0 \le u, v \le 1$, $t \ge 0$. Then,

$$g(t, u, v) = \frac{C_1(u - C_2) + C_2(C_1 - u)\, e^{\frac{1}{2}\lambda(C_1 - C_2)t}}{(u - C_2) + (C_1 - u)\, e^{\frac{1}{2}\lambda(C_1 - C_2)t}}, \qquad (5.33)$$

where $C_1 = C_1(v) = 1 - (1 - v)^{1/2}$, $C_2 = C_2(v) = 1 + (1 - v)^{1/2}$.

Proof. It follows from Kolmogorov's forward equation that for $i_0 = (1, 0)$,

$$\frac{\partial}{\partial t} E_{i_0}\{u^{X_t} v^{M_t}\} = E_{i_0}\left\{\sum_{(j_1, j_2)} u^{j_1} v^{j_2}\, a_{(X_t, M_t), (j_1, j_2)}\right\} = E_{i_0}\left\{\tfrac{1}{2}\lambda X_t u^{X_t + 1} v^{M_t} + \tfrac{1}{2}\lambda X_t u^{X_t - 1} v^{M_t + 1} - \lambda X_t u^{X_t} v^{M_t}\right\}$$
$$= \tfrac{1}{2}\lambda u^2\,\frac{\partial}{\partial u} E_{i_0}\{u^{X_t} v^{M_t}\} + \tfrac{1}{2}\lambda v\,\frac{\partial}{\partial u} E_{i_0}\{u^{X_t} v^{M_t}\} - \lambda u\,\frac{\partial}{\partial u} E_{i_0}\{u^{X_t} v^{M_t}\}. \qquad (5.34)$$

Therefore, one has

$$\frac{\partial g(t, u, v)}{\partial t} = \frac{\lambda}{2}\left(u^2 + v - 2u\right)\frac{\partial}{\partial u}\, g(t, u, v), \qquad (5.35)$$

with the initial condition

$$g(0, u, v) = u. \qquad (5.36)$$

Letting $\gamma(t)$ be the characteristic curve defined by

$$\gamma'(t) = -\frac{\lambda}{2}\left(\gamma^2(t) - 2\gamma(t) + v\right) = -\frac{\lambda}{2}\left(\gamma(t) - C_1\right)\left(\gamma(t) - C_2\right),$$

with $C_1 = C_1(v)$ and $C_2 = C_2(v)$ as defined in the statement of the lemma, (5.35) and (5.36) take the form

$$\frac{d}{dt}\, g(t, \gamma(t), v) = 0, \qquad g(0, \gamma(0), v) = \gamma(0).$$

These are easily integrated to get (5.33). ∎

From the lemma it follows that

$$\sum_{n=1}^{\infty} v^n P(M_L = n) = \lim_{t \to \infty} g(t, 1, v) = 1 - (1 - v)^{1/2} = -\sum_{n=1}^{\infty}\binom{1/2}{n}(-1)^n v^n, \qquad 0 \le v \le 1. \qquad (5.37)$$

In particular, the classic formula of Cayley for the distribution of $M_L$ follows as (see also Exercise 11)

$$P(M_L = n) = \binom{1/2}{n}(-1)^{n-1} = \frac{1}{n}\binom{2n-2}{n-1} 2^{-2n+1}, \qquad n = 1, 2, \ldots. \qquad (5.38)$$
Thus, by the Stirling formula one has

$$P(M_L = n) \sim \frac{1}{2\sqrt{\pi}}\, n^{-3/2} \qquad \text{as } n \to \infty. \qquad (5.39)$$

The lemma provides a way to compute the conditional moments as follows:

$$E(L^r \mid M_L = n) = r\int_0^{\infty} t^{r-1}\, P(L > t \mid M_L = n)\, dt, \qquad (5.40)$$

and

$$P(L > t \mid M_L = n) = \frac{P(L > t, M_L = n)}{P(M_L = n)} = \frac{P(M_L = n) - P(L \le t, M_L = n)}{P(M_L = n)} = \frac{P(M_L = n) - P(X_t = 0, M_t = n)}{P(M_L = n)}. \qquad (5.41)$$
Now, letting

$$h_n^{(r)} = P(M_L = n)\, E(L^r \mid M_L = n), \qquad n \ge 1, \qquad (5.42)$$

we have for $0 \le v < 1$, using (5.33), (5.37), (5.40) and (5.41),

$$\sum_{n=1}^{\infty} h_n^{(r)} v^n = r\int_0^{\infty} t^{r-1}\left\{\left(1 - (1 - v)^{1/2}\right) - g(t, 0, v)\right\} dt = r\int_0^{\infty} t^{r-1}\,\frac{C_1(C_2 - C_1)\, e^{-\frac{1}{2}\lambda(C_2 - C_1)t}}{C_2 - C_1 e^{-\frac{1}{2}\lambda(C_2 - C_1)t}}\, dt. \qquad (5.43)$$

Expanding the integrand as a geometric series, one obtains

$$\sum_{n=1}^{\infty} h_n^{(r)} v^n = r\,\frac{C_1(C_2 - C_1)}{C_2}\sum_{n=0}^{\infty}\left(\frac{C_1}{C_2}\right)^n\int_0^{\infty} t^{r-1} e^{-(n+1)\frac{\lambda}{2}(C_2 - C_1)t}\, dt = \frac{2\, r!}{\lambda^r}\,(1 - v)^{(1-r)/2}\sum_{n=1}^{\infty}\left(\frac{C_1}{C_2}\right)^n n^{-r}. \qquad (5.44)$$

In the case r = 1, writing $h(v) = \sum_{n=1}^{\infty} h_n^{(1)} v^n$,

$$h(v) = -\frac{2}{\lambda}\log\left(1 - \frac{C_1(v)}{C_2(v)}\right) = -\frac{2}{\lambda}\log\frac{2(1 - v)^{1/2}}{1 + (1 - v)^{1/2}}. \qquad (5.45)$$

To invert h(v) in the case r = 1, consider that $vh'(v)$ is the generating function of $\{n h_n^{(1)}\}$. Thus, differentiating one gets that

$$vh'(v) = \lambda^{-1}(1 - v)^{-1}\left[1 - (1 - v)^{1/2}\right] = \lambda^{-1}\left\{(1 - v)^{-1} - (1 - v)^{-1/2}\right\}, \qquad (5.46)$$

and, therefore, after expanding $(1 - v)^{-1}$ and $(1 - v)^{-1/2}$ in a Taylor series about $v = 0$ and equating coefficients of $v^n$ in the left and rightmost expansions in (5.46), it follows that

$$n\, P(M_L = n)\, E(L \mid M_L = n) = \lambda^{-1}\left\{1 - \binom{-\frac{1}{2}}{n}(-1)^n\right\}. \qquad (5.47)$$

In particular, using (5.39) and Stirling's formula, it follows that

$$E(n^{-1/2}\lambda L \mid M_L = n) \to 2\sqrt{\pi}, \qquad \text{as } n \to \infty. \qquad (5.48)$$

The asymptotic behavior of the higher moments $E\{(n^{-1/2}\lambda L)^r \mid M_L = n\}$, $r \ge 2$, may also be determined from (5.44) (theoretical complement 4). Moreover, it may be shown that as $n \to \infty$, the conditional distribution of $n^{-1/2}\lambda L$ given $M_L = n$ has a limiting distribution that coincides with the distribution of the maximum of a (suitably scaled) Brownian excursion process as described in Exercise 12.7, Chapter I (see theoretical complement 4).
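Since $M_L$ depends only on the embedded jump chain, which moves up or down with probability 1/2 each, Cayley's formula (5.38) can be checked without simulating the holding times at all. A Monte Carlo sketch follows; the step cap is a simulation artifact (critical excursions have heavy-tailed lengths) and the rare runs exceeding it are discarded, with negligible effect on the small-n frequencies.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)

# Monte Carlo sketch of Example 3: record the number of deaths M_L at
# extinction of the embedded critical +/-1 chain, and compare small-n
# frequencies with Cayley's formula (5.38). trials, cap are illustrative.
trials, cap = 50_000, 100_000

def deaths_at_extinction():
    x, m = 1, 0
    for _ in range(cap):              # cap guards against very long
        if x == 0:                    # critical excursions
            return m
        if rng.random() < 0.5:
            x += 1                    # a split: one particle becomes two
        else:
            x -= 1                    # a death
            m += 1
    return None                       # runs over the cap are dropped

M = np.array([m for m in (deaths_at_extinction() for _ in range(trials))
              if m is not None])
for n in range(1, 5):
    print(np.mean(M == n), comb(2*n - 2, n - 1) / n * 2.0**(-2*n + 1))
```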

6 THE MINIMAL PROCESS AND EXPLOSION

The Markov process with infinitesimal generator Q as constructed in Theorem 5.4 is called the minimal process with infinitesimal generator Q because it gives the process only up to the time

$$\zeta = T_0 + T_1 + T_2 + \cdots = \sum_{n=0}^{\infty} T_n. \qquad (6.1)$$

The time $\zeta$ is called the explosion time.

What we have shown in general is that every Markov chain with infinitesimal generator Q and initial distribution $\pi$ is given, up to the explosion time, by the process $\{X_t\}$ described in Theorem 5.4. If $\zeta = \infty$ with probability 1, then the minimal process is the only one with infinitesimal generator Q and we have uniqueness. In the case $P_i(\zeta = \infty) < 1$, i.e., if explosion does take place with positive probability, then there are various ways of continuing the minimal process for $t \ge \zeta$ so as to preserve the Markov property and the backward equations (2.16) for the continued process $\{X_t\}$. One such way is to fix an arbitrary probability distribution $\psi$ on S and let $X_\zeta = j$ with probability $\psi_j$ ($j \in S$). For $t > \zeta$ the continued process then evolves as a new (minimal) process

with initial state j and infinitesimal generator Q until a second explosion occurs, at which time $\{X_t\}$ is assigned a state according to $\psi$ independently of the assignment at the time of the first explosion $\zeta$, and so on. The transition probabilities $\tilde{p}_{ik}(t)$, say, of $\{X_t\}$ clearly satisfy (5.9), the integral version of the backward equations (2.16), since the derivation (5.10) is based on conditioning with respect to $T_0$, which is surely less than $\zeta$, and on the Markov property. Unfortunately, the forward equations (2.19) do not hold for $\tilde{\mathbf{p}}$, since the conditioning in the corresponding integral equations is with respect to the final jump before time t and one or more explosions may have already occurred by time t.
We have already shown in Section 3 that if $\{q_{ii}: i \in S\}$ is bounded, the backward equations have a unique solution. This solution $p_{ij}(t)$ gives rise to a unique Markov process. Therefore we have the following elementary criterion for nonexplosion.

Proposition 6.1. If $\sup_{i \in S} \lambda_i$ ($= \sup_i |q_{ii}| = \sup_{i,j} |q_{ij}|$) is finite, then $P_i(\zeta = \infty) = 1$ for all $i \in S$, and the minimal process having infinitesimal generator Q and an arbitrary initial distribution is the only Markov process with infinitesimal generator Q.

Definition 6.1. A Markov process $\{X_t\}$ for which $P_j(\zeta = \infty) = 1$ for all $j \in S$ is called conservative or nonexplosive.

Compound Poisson processes are conservative, as may be checked from

$$\sum_{j \in S} p_{ij}(t) = \sum_{j \in S}\sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^n}{n!}\, f^{*n}(j - i) = \sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^n}{n!}\sum_{j \in S} f^{*n}(j - i) = \sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^n}{n!} = 1. \qquad (6.2)$$

Example 1. (Pure Birth Process). A pure birth process has state space S = {0, 1, 2, ...}, and its generator Q has elements $q_{ii} = -\lambda_i$, $q_{i,i+1} = \lambda_i$, $q_{ij} = 0$ for $j > i + 1$ or $j < i$, $i \in S$. Note that this has the same degenerate (i.e., completely deterministic) embedded spatial transition matrix as the Poisson process. However, the parameters $\lambda_i$ of the exponential holding times are not assumed to be equal (spatially constant) here.

Fix an initial state i. Then the embedded chain $\{Y_n: n = 0, 1, \ldots\}$ has only one possible realization, namely $Y_n = n + i$ ($n \ge 0$). Therefore, the holding times $T_0, T_1, \ldots$ are (unconditionally) independent and exponentially distributed with parameters $\lambda_i, \lambda_{i+1}, \ldots$. Consider the explosion time

$$\zeta = T_0 + T_1 + \cdots = \sum_{m=0}^{\infty} T_m.$$

First assume

$$\sum_{n=0}^{\infty}\frac{1}{\lambda_n} = \infty. \qquad (6.3)$$

For $s > 0$, consider

$$\phi(s) = E_i e^{-s\zeta} = E_i e^{-s\sum_m T_m} = \prod_{m=0}^{\infty} E_i e^{-sT_m} = \prod_{m=0}^{\infty}\frac{\lambda_{m+i}}{\lambda_{m+i} + s} = \prod_{m=0}^{\infty}\frac{1}{1 + s/\lambda_{m+i}}.$$

Choose and fix $s > 0$. Then,

$$-\log\phi(s) = \sum_{m=0}^{\infty}\log\left(1 + \frac{s}{\lambda_{m+i}}\right) \ge \sum_{m=0}^{\infty}\frac{s/\lambda_{m+i}}{1 + s/\lambda_{m+i}} = \sum_{m=0}^{\infty}\frac{s}{\lambda_{m+i} + s} = \infty, \qquad (6.4)$$

since $\log(1 + x) \ge x/(1 + x)$ for $x \ge 0$ (Exercise 1). Hence $-\log\phi(s) = \infty$, i.e., $\phi(s) = 0$. This is possible only if $P_i(\zeta = \infty) = 1$, since $e^{-s\zeta} > 0$ whenever $\zeta < \infty$. Thus if (6.3) holds, then explosion is impossible.
Next suppose that

$$\sum_{n=0}^{\infty}\frac{1}{\lambda_n} < \infty. \qquad (6.5)$$

Then

$$E_i\zeta = \sum_{m=0}^{\infty}\frac{1}{\lambda_{m+i}} = \sum_{n=i}^{\infty}\frac{1}{\lambda_n} < \infty.$$

This implies $P_i(\zeta = \infty) = 0$, i.e., $P_i(\zeta < \infty) = 1$. In other words, if (6.5) holds, then explosion is certain.
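The dichotomy (6.3)/(6.5) is easy to probe by simulation. For the illustrative choice $\lambda_n = (n+1)^2$, condition (6.5) holds and the explosion time starting at 0 has mean $\sum_n 1/(n+1)^2 = \pi^2/6$; the sketch below estimates this mean from independent sums of exponential holding times.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustration of (6.5): for lambda_n = (n + 1)^2 the mean explosion
# time from state 0 is sum_n 1/(n+1)^2 = pi^2/6 < infinity.
def explosion_time(n_terms=10_000):    # truncation error is ~1e-4
    n = np.arange(n_terms)
    return rng.exponential(1.0 / (n + 1.0)**2).sum()

times = np.array([explosion_time() for _ in range(2_000)])
print(times.mean(), np.pi**2 / 6)      # empirical vs exact mean
```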
In the case of explosion, the minimal process $\{X_t: 0 \le t < \zeta\}$ can be continued to a conservative Markov process on the time interval $0 \le t < \infty$ in a variety of ways. For illustration consider the continuation $\{\tilde{X}_t\}$ of an explosive pure birth process $\{X_t\}$ obtained by an instantaneous jump to state 0 at time $t = \zeta$, after which it evolves as a minimal process starting at 0 until the time of the second explosion, and so on. Let $\tilde{p}_{ij}(t)$ denote the transition probabilities for $\{\tilde{X}_t\}$. Likewise, $\tilde{P}_i$ denotes the distribution of $\{\tilde{X}_t\}$ given $\tilde{X}_0 = i$. Consider, for simplicity, $\tilde{p}_{00}(t)$. One has

$$\tilde{p}_{00}(t) = \tilde{P}_0(\tilde{X}_t = 0) = \sum_{n=0}^{\infty}\tilde{P}_0(\tilde{X}_t = 0, N_t = n), \qquad (6.6)$$

where $N_t$ is the number of explosions by time t. Since the event $\{\tilde{X}_t = 0, N_t = 0\}$ is the event that the minimal process holds at zero until time t,

$$\tilde{P}_0(\tilde{X}_t = 0, N_t = 0) = p_{00}(t) = e^{-\lambda_0 t}. \qquad (6.7)$$

More generally, the event $\{\tilde{X}_t = 0, N_t = n\}$ occurs if and only if the nth explosion occurs at some time prior to t and the minimal process remains at 0 from this time until t. Let $f_0$ denote the probability density function of the first explosion time $\zeta$ for the process $\{\tilde{X}_t\}$ starting at 0. Since the times between successive explosions are i.i.d. with p.d.f. $f_0$, the p.d.f. of the nth explosion time is given by the n-fold convolution $f_0^{*n}$. Therefore,

$$\tilde{P}_0(\tilde{X}_t = 0, N_t = n) = \int_0^t p_{00}(t - s)\, f_0^{*n}(s)\, ds. \qquad (6.8)$$

Using (6.7) and (6.8) in (6.6) we get

$$\tilde{p}_{00}(t) = e^{-\lambda_0 t}\left[1 + \sum_{n=1}^{\infty}\int_0^t e^{\lambda_0 s} f_0^{*n}(s)\, ds\right]. \qquad (6.9)$$

Differentiating (6.9) gives

$$\tilde{p}_{00}'(t) = -\lambda_0\,\tilde{p}_{00}(t) + \sum_{n=1}^{\infty} f_0^{*n}(t). \qquad (6.10)$$

In particular, (6.10) shows that the forward equation does not hold for $\tilde{p}_{00}(t)$.

Since the time of the first explosion starting at 0 is the holding time at 0 plus the remaining time for explosion starting at state 1, we have by the strong Markov property,

$$f_0 = e_{\lambda_0} * f_1, \qquad (6.11)$$

where $f_1$ is the p.d.f. of the first explosion time starting in state 1 and $e_{\lambda_0}$ is the exponential p.d.f. with parameter $\lambda_0$. Therefore,

$$\sum_{n=1}^{\infty} f_0^{*n}(t) = \sum_{n=1}^{\infty}\left(e_{\lambda_0} * f_1 * f_0^{*(n-1)}\right)(t). \qquad (6.12)$$

Since

$$e_{\lambda_0}(t) = \lambda_0 e^{-\lambda_0 t} = \lambda_0\, p_{00}(t), \qquad (6.13)$$

we can write (6.12) as

$$\sum_{n=1}^{\infty} f_0^{*n}(t) = \lambda_0\sum_{n=1}^{\infty}\left(p_{00} * f_1 * f_0^{*(n-1)}\right)(t) = \lambda_0\sum_{n=1}^{\infty}\int_0^t \left(f_1 * f_0^{*(n-1)}\right)(s)\, p_{00}(t - s)\, ds = \lambda_0\,\tilde{p}_{10}(t). \qquad (6.14)$$

Now (6.10) takes the form

$$\tilde{p}_{00}'(t) = -\lambda_0\,\tilde{p}_{00}(t) + \lambda_0\,\tilde{p}_{10}(t). \qquad (6.15)$$

Equation (6.15) is the (0, 0) term in the backward equation for $\tilde{\mathbf{p}}(t)$. As remarked earlier (Exercise 2) the backward equations hold for all terms $\tilde{p}_{ij}(t)$.

Another method of constructing a conservative process from an explosive minimal process is to adjoin a new state $\Delta$ to S and define $X_t = \Delta$ for $t \ge \zeta$. Then $\Delta$ is an absorbing state and the transition probabilities are

$$\tilde{p}_{ij}(t) = \begin{cases} p_{ij}(t) & \text{for } i, j \in S, \\ 1 - \displaystyle\sum_{k \in S} p_{ik}(t) & \text{for } i \in S,\ j = \Delta, \\ 1 & \text{for } i = j = \Delta, \\ 0 & \text{for } i = \Delta,\ j \in S. \end{cases} \qquad (6.16)$$

It is simple to check that (6.16) defines a transition probability (Exercise 3) and satisfies the backward as well as the forward equations for $i, j \in S$ (Exercise 3).

7 SOME EXAMPLES

Example 1. (The Compound Poisson Process). The state space for $\{X_t\}$ is $S = \{0, \pm 1, \pm 2, \ldots\}$. Let f be a given probability mass function on S with $f(0) < 1$, and let $\lambda > 0$. Define the Q-matrix by

$$\lambda_i = -q_{ii} = \lambda(1 - f(0)), \qquad k_{ij} = \frac{\lambda f(j - i)}{\lambda(1 - f(0))} = \frac{f(j - i)}{1 - f(0)} \quad \text{if } j \ne i. \qquad (7.1)$$

Since the $\lambda_i$ do not depend on i, it follows from Theorem 5.4 that the embedded chain $\{Y_n\}$, corresponding to the transition matrix K and some initial distribution, is independent of the i.i.d. exponentially distributed sequence of holding times $\{T_0, T_1, \ldots\}$. Also, $\{Y_n\}$ is a random walk that may be represented as $Y_0 = X_0$, $Y_1 = X_0 + Z_1$, $Y_2 = X_0 + Z_1 + Z_2, \ldots, Y_n = X_0 + Z_1 + Z_2 + \cdots + Z_n, \ldots$, where $\{Z_1, Z_2, \ldots\}$ is a sequence of i.i.d. random variables, independent of $X_0$, with common probability function $\hat{f}(z) = f(z)/(1 - f(0))$, $z \ne 0$. It follows from this description that the transition probabilities are given as follows. Let $\hat{\lambda} = \lambda(1 - f(0))$. Then,

$$p_{ij}(t) = \sum_{n=0}^{\infty} P_i(n \text{ transitions occur in } (0, t], \text{ and } X_t = j) = \delta_{ij} e^{-\hat{\lambda} t} + \sum_{n=1}^{\infty} P_i(T_0 + T_1 + \cdots + T_{n-1} \le t < T_0 + T_1 + \cdots + T_n)\, P_i(Y_n = j)$$
$$= \delta_{ij} e^{-\hat{\lambda} t} + \sum_{n=1}^{\infty} e^{-\hat{\lambda} t}\frac{(\hat{\lambda} t)^n}{n!}\,\hat{f}^{*n}(j - i) = \sum_{n=0}^{\infty} e^{-\hat{\lambda} t}\frac{(\hat{\lambda} t)^n}{n!}\,\hat{f}^{*n}(j - i), \qquad (7.2)$$

with the convention that $\hat{f}^{*0}(k) = \delta_{0k}$, in agreement with (1.7).

The rates given in (7.1) are the same as those that are obtained in (2.24).

Example 2. (The Yule Linear Growth Process). This process is sometimes used as a model for the growth of bacteria, the fission of neutrons, and other random replication processes. It is a special case of Example 2 in Section 5 with $f_0 = 0$, $f_2 = 1$. However, to keep this example self-contained, consider a situation in which there are initially i members of a population, each splitting into two identical particles after an exponentially distributed amount of time having parameter $\lambda$ and independently of the others. Let $X_t$ denote the size of the population at time t. Then $\{X_t\}$ is a Markov process on S = {1, 2, ...} (Exercise 1), whose transition rates can be calculated explicitly as follows:

$$p_{ii}(h) = \left(e^{-\lambda h}\right)^i = e^{-i\lambda h} = 1 - i\lambda h + o(h). \qquad (7.3)$$

Similarly,

$$p_{i,i+1}(h) = \binom{i}{1}\left[\int_0^h \lambda e^{-\lambda s} e^{-2\lambda(h-s)}\, ds\right]\left(e^{-\lambda h}\right)^{i-1} = i\lambda h + o(h), \qquad (7.4)$$

and for $j \ge i + 2$,

$$0 \le p_{ij}(h) = o(h), \qquad (7.5)$$

since at least two splits must then occur within time h (compare the estimate preceding (5.17)). In particular, therefore, the infinitesimal rates are given by

$$q_{ii} = -i\lambda, \qquad q_{i,i+1} = i\lambda, \qquad q_{ij} = 0 \ \text{ if } j > i + 1 \text{ or if } j < i. \qquad (7.6)$$

Note that this makes $\{X_t\}$ a pure birth process. Since

$$\sum_{n=1}^{\infty}\frac{1}{\lambda_n} = \sum_{n=1}^{\infty}\frac{1}{n\lambda} = \frac{1}{\lambda}\sum_{n=1}^{\infty}\frac{1}{n} = \infty,$$

it follows from Example 6.1 that the process is conservative. We will use the
backward equations to compute $p_{ik}(t)$ for all i, k. We have

$$p_{ik}(t) = 0 \ \text{for } k < i \ (\text{indeed, } p_{ik}'(t) = 0),$$
$$p_{ii}'(t) = -i\lambda\, p_{ii}(t), \qquad (7.7)$$
$$p_{ik}'(t) = -i\lambda\, p_{ik}(t) + i\lambda\, p_{i+1,k}(t) \qquad (k \ge i + 1).$$

Clearly,

$$p_{ii}(t) = e^{-i\lambda t} \qquad (i = 1, 2, \ldots).$$

Substituting this into the last equation of (7.7) (with $k = i + 1$), one gets

$$p_{i,i+1}'(t) = -i\lambda\, p_{i,i+1}(t) + i\lambda\, e^{-(i+1)\lambda t}. \qquad (7.8)$$

Multiplying both sides by $e^{i\lambda t}$ one gets

$$\frac{d}{dt}\left(e^{i\lambda t}\, p_{i,i+1}(t)\right) = i\lambda e^{i\lambda t}\, p_{i,i+1}(t) + e^{i\lambda t}\, p_{i,i+1}'(t) = i\lambda e^{i\lambda t} e^{-(i+1)\lambda t} = i\lambda e^{-\lambda t},$$

or

$$e^{i\lambda t}\, p_{i,i+1}(t) = \int_0^t i\lambda e^{-\lambda s}\, ds + p_{i,i+1}(0) = i\left(1 - e^{-\lambda t}\right).$$

Therefore,

$$p_{i,i+1}(t) = i e^{-i\lambda t}\left(1 - e^{-\lambda t}\right). \qquad (7.9)$$

Substituting this into (7.7) with $k = i + 2$, one gets

$$p_{i,i+2}'(t) = -i\lambda\, p_{i,i+2}(t) + i\lambda\left[(i+1)e^{-(i+1)\lambda t}\left(1 - e^{-\lambda t}\right)\right],$$

which as above, on multiplying both sides by $e^{i\lambda t}$, yields

$$p_{i,i+2}(t) = e^{-i\lambda t}\int_0^t i\lambda(i+1)\, e^{-\lambda s}\left(1 - e^{-\lambda s}\right) ds = i(i+1)\, e^{-i\lambda t}\left[(1 - e^{-\lambda t}) + \tfrac{1}{2}(e^{-2\lambda t} - 1)\right] = \frac{i(i+1)}{2}\, e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^2. \qquad (7.10)$$

By induction (a basis for which is laid out in (7.9)-(7.10)), one may prove (Exercise 2)

$$p_{ik}(t) = \frac{i(i+1)\cdots(k-1)}{(k-i)!}\, e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{k-i} \qquad \text{for } k > i. \qquad (7.11)$$
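Formula (7.11) can be verified against the matrix exponential of a truncated Yule generator (7.6); in the sketch below $\lambda$, i, t are arbitrary illustrative values, and the truncation level N is taken large enough that the neglected mass is negligible.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

# Numerical check of (7.11) using a truncated Yule generator (7.6);
# lam, i, t, N are arbitrary illustrative values.
lam, i, t, N = 0.9, 2, 1.0, 200

states = np.arange(1, N + 1)
Q = np.diag(-lam * states.astype(float))
Q[np.arange(N - 1), np.arange(1, N)] = lam * states[:-1]  # q_{k,k+1} = k*lam

p_row = expm(t * Q)[i - 1]            # p_{ik}(t) for k = 1, ..., N

def closed(k):                        # eq. (7.11) for k >= i
    c = np.prod(np.arange(i, k, dtype=float)) if k > i else 1.0
    return c / factorial(k - i) * np.exp(-i * lam * t) * \
        (1.0 - np.exp(-lam * t))**(k - i)

print(max(abs(p_row[k - 1] - closed(k)) for k in range(i, 30)))  # tiny
```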

Example 3. (N-Server Queues). Think of arrivals of customers as a Poisson process with parameter $\alpha$. There are N servers. Let $X_t$ denote the number of customers being served or waiting at time t. A customer arriving at time t is immediately served if $X_t$ is less than N. On the other hand, if $X_t$ exceeds or equals N, then the customer must stand in a queue and await a turn. Assume that the service time for each customer, counted from the moment of arrival at a free counter until the moment of departure from the counter, is exponentially distributed with parameter $\beta$, and assume that the service times for the different customers are independent. The process $\{X_t\}$ is Markovian (Exercise 4). The infinitesimal conditions may be found as follows. The state space is S = {0, 1, 2, ...} and the infinitesimal probabilities are given by

$$p_{ii}(h) = \begin{cases} e^{-\alpha h} e^{-i\beta h} + O(h^2) & \text{if } 0 \le i < N, \\ e^{-\alpha h} e^{-N\beta h} + O(h^2) & \text{if } i \ge N, \end{cases}$$

$$p_{i,i+1}(h) = \begin{cases} \alpha h\, e^{-\alpha h} e^{-i\beta h} + O(h^2) & \text{if } 0 \le i < N, \\ \alpha h\, e^{-\alpha h} e^{-N\beta h} + O(h^2) & \text{if } i \ge N, \end{cases}$$

$$p_{i,i-1}(h) = \begin{cases} e^{-\alpha h}\, i\left(1 - e^{-\beta h}\right) e^{-(i-1)\beta h} + O(h^2) & \text{if } 1 \le i \le N, \\ e^{-\alpha h}\, N\left(1 - e^{-\beta h}\right) e^{-(N-1)\beta h} + O(h^2) & \text{if } i > N, \end{cases}$$

$$p_{ij}(h) = O(h^2) \qquad \text{if } |j - i| > 1.$$

Therefore,

$$q_{ii} = -(\alpha + i\beta) \ \text{ if } 0 \le i \le N, \qquad q_{ii} = -(\alpha + N\beta) \ \text{ if } i > N;$$
$$q_{i,i+1} = \alpha \ \text{ if } i \ge 0; \qquad (7.12)$$
$$q_{i,i-1} = i\beta \ \text{ if } 1 \le i \le N, \qquad q_{i,i-1} = N\beta \ \text{ if } i > N.$$

It is difficult to calculate explicitly the transition probabilities by either of the methods of the two preceding examples. In the next example, we consider the case of infinitely many servers and zero waiting time for mathematical convenience. The N-server queue is a special case of the continuous-parameter birth-death Markov process to be analyzed in Example 1 of Section 8.

Example 4. (Queues with Infinitely Many Servers). In the physical model, $X_t$ denotes the number of customers being served at time t. Suppose that the number $Y_t$ of arrivals of customers is a Poisson process with parameter $\alpha$. Each customer is upon arrival immediately attended to by a server, the service time being an exponential random variable with parameter $\beta$. The service times of different customers are independent. Suppose that initially there are $X_0 = i$ customers who are being served. Because of the "lack of memory" property of the exponential distribution, the remaining service times for these i customers are still independent exponential random variables with parameter $\beta$. Of the i customers served at time zero, let $Z_t$ be the number of those who remain at the counters (being served) at time t. Then $Z_t$ is a binomial random variable with parameters i and $e^{-\beta t}$. During the time interval (0, t], however, there will be $Y_t$ new arrivals. Suppose $W_t$ of them remain at the counters at time t. Then $X_t = Z_t + W_t$. Of course, $Z_t$ and $W_t$ are independent random variables. To determine the distribution of $X_t$ given $X_0 = i$, we therefore need to compute the distribution of $W_t$ given $Y_t = k$. Given that there are k arrivals in a Poisson process in (0, t], the k arrival times, say $\tau_1, \tau_1 + \tau_2, \ldots, \tau_1 + \tau_2 + \cdots + \tau_k$, may be thought of as k independent observations $U_1, U_2, \ldots, U_k$ from the uniform distribution on (0, t] arranged in increasing order of magnitude (see Proposition 5.6). Since the probability that an individual who arrives at time u will not leave the counter by time t is $e^{-\beta(t-u)}$, the probability that a random arrival in (0, t] will not leave by time t is

$$\frac{1}{t}\int_0^t e^{-\beta(t-u)}\, du = \frac{1 - e^{-\beta t}}{\beta t}. \qquad (7.13)$$
Therefore, the conditional distribution of $W_t$ given $Y_t = k$ is binomial with parameters k and $(1 - e^{-\beta t})/\beta t$. Unconditionally, then,

$$P(W_t = j) = \sum_{k=j}^{\infty} P(Y_t = k)\, P(W_t = j \mid Y_t = k) = \sum_{k=j}^{\infty} e^{-\alpha t}\frac{(\alpha t)^k}{k!}\binom{k}{j}\left(\frac{1 - e^{-\beta t}}{\beta t}\right)^j\left(1 - \frac{1 - e^{-\beta t}}{\beta t}\right)^{k-j}$$
$$= \frac{e^{-\alpha t}}{j!}\left(\frac{1 - e^{-\beta t}}{\beta t}\right)^j (\alpha t)^j \sum_{k=j}^{\infty}\frac{(\alpha t)^{k-j}}{(k-j)!}\left(1 - \frac{1 - e^{-\beta t}}{\beta t}\right)^{k-j} = \frac{e^{-\alpha t}}{j!}\left[\frac{\alpha}{\beta}\left(1 - e^{-\beta t}\right)\right]^j e^{\alpha t\left[1 - (1 - e^{-\beta t})/\beta t\right]}$$
$$= e^{-(\alpha/\beta)(1 - e^{-\beta t})}\,\frac{\left[(\alpha/\beta)(1 - e^{-\beta t})\right]^j}{j!} \qquad (j \ge 0). \qquad (7.14)$$

In other words, $W_t$ is Poisson with parameter $(\alpha/\beta)(1 - e^{-\beta t})$. Since $Z_t$ is binomial, it follows from (7.14) that

$$p_{ik}(t) = P(X_t = k \mid X_0 = i) = P(Z_t + W_t = k \mid X_0 = i) = \sum_{r=0}^{\min(i,k)}\binom{i}{r} e^{-r\beta t}\left(1 - e^{-\beta t}\right)^{i-r} e^{-(\alpha/\beta)(1 - e^{-\beta t})}\,\frac{\left[(\alpha/\beta)(1 - e^{-\beta t})\right]^{k-r}}{(k-r)!}$$
$$= e^{-(\alpha/\beta)(1 - e^{-\beta t})}\left(1 - e^{-\beta t}\right)^i\left[\frac{\alpha}{\beta}\left(1 - e^{-\beta t}\right)\right]^k \sum_{r=0}^{\min(i,k)}\frac{i!\,(\beta/\alpha)^r}{r!\,(i-r)!\,(k-r)!}\left[e^{\beta t} + e^{-\beta t} - 2\right]^{-r}, \qquad k = 0, 1, \ldots. \qquad (7.15)$$
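The decomposition $X_t = Z_t + W_t$ also yields a direct simulation of the infinite-server queue: arrivals are placed uniformly on (0, t] (Proposition 5.6) and survive independently. The sketch below checks that $W_t$ is Poisson with mean $(\alpha/\beta)(1 - e^{-\beta t})$; $\alpha$, $\beta$ and t are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulation sketch of the infinite-server queue: W_t should be Poisson
# with mean (alpha/beta)(1 - e^{-beta t}). Parameters are illustrative.
alpha, beta, t, trials = 3.0, 1.5, 2.0, 100_000

w = np.zeros(trials, dtype=int)
for n in range(trials):
    k = rng.poisson(alpha * t)                  # arrivals in (0, t]
    u = rng.uniform(0.0, t, size=k)             # arrival times (Prop. 5.6)
    w[n] = np.sum(rng.exponential(1.0 / beta, size=k) > t - u)

mean_target = alpha / beta * (1.0 - np.exp(-beta * t))
print(w.mean(), mean_target)                    # close agreement
print(w.var(), mean_target)                     # Poisson: variance = mean
```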

From (7.15) one can directly check the infinitesimal conditions $q_{ii} = -(\alpha + i\beta)$ for $i \ge 0$, $q_{i,i+1} = \alpha$ for $i \ge 0$, and $q_{i,i-1} = i\beta$ for $i \ge 1$. However, it is simpler to use the probabilistic description to arrive at these. For example, $p_{ii}(h)$ is the sum of the probabilities of the events that there are no arrivals in (0, h] and none of the $X_0 = i$ customers leaves during (0, h], or one customer arrives in (0, h] and one of i + 1 customers leaves during (0, h], etc. Therefore,

$$p_{ii}(h) = e^{-\alpha h}\left(e^{-\beta h}\right)^i + O(h^2). \qquad (7.16)$$

More generally, since the chance of two or more occurrences in a small interval of length h is $O(h^2)$, one has

$$p_{ii}(h) = e^{-(\alpha + i\beta)h} + O(h^2) = 1 - (\alpha + i\beta)h + O(h^2),$$
$$p_{i,i+1}(h) = \left(\int_0^h \alpha e^{-\alpha s}\, e^{-\beta(h-s)}\, ds\right) e^{-i\beta h} + O(h^2) = \alpha h + O(h^2), \qquad (7.17)$$
$$p_{i,i-1}(h) = \left(e^{-\beta h}\right)^{i-1} i\left(1 - e^{-\beta h}\right) e^{-\alpha h} + O(h^2) = i\beta h + O(h^2).$$

Therefore,

$$q_{ii} = \lim_{h \downarrow 0}\frac{p_{ii}(h) - 1}{h} = -(\alpha + i\beta), \qquad q_{i,i+1} = \lim_{h \downarrow 0}\frac{p_{i,i+1}(h)}{h} = \alpha, \qquad q_{i,i-1} = \lim_{h \downarrow 0}\frac{p_{i,i-1}(h)}{h} = i\beta. \qquad (7.18)$$

For completeness one needs to argue that $\{X_t\}$ is a Markov process. For this note that $X_{s+t}$ is a function of
(i) $X_s$ and the additional service times up to time s + t required by the $X_s$ customers present at time s, and
(ii) the numbers and times of arrivals of those customers arriving during (s, s + t] and their service times.
But the latter (i.e., (ii)) are stochastically independent of $\{X_u: 0 \le u \le s\}$ since $Y_{s+u} - Y_s$ is independent of $Y_u$ for $0 \le u \le s$, and the service times of all customers are independent of each other as well as of the process $\{Y_u: u \ge 0\}$. Also, the conditional distribution of the additional service times of the $X_s$ customers who are being served at time s, given $\{X_u: 0 \le u \le s\}$ (or, for that matter, given $\{Y_u: 0 \le u \le s\}$, as well as the service times of all these arrivals in [0, s] already spent in [0, s]) is still that of $X_s$ i.i.d. exponential random variables each having parameter $\beta$ ("lack of memory" property). Hence $P(X_{s+t} = k \mid X_u: 0 \le u \le s) = P(X_{s+t} = k \mid X_s)$.

Example 5. (Monte Carlo Approaches to the Telegrapher's Equation). Monte Carlo methods broadly refer to numerical approximations based on averages of simulations of suitably designed random processes. Modern-day computing speed and precision make the random-number generator an important tool for numerical calculations. The results and methods of Chapter I indicate how discrete time and space Monte Carlo numerical solutions to initial- and boundary-value problems associated with the heat equation (or diffusion equation)

$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2} + c\,\frac{\partial u}{\partial x}$$

might be obtained from density profiles computed from simulated random walks. A connection between general linear parabolic (diffusion) partial differential equations and discrete space and/or time Markov birth-death chains is described in Section 4 of Chapter V.
In the present example we will consider a probabilistic approach to a special hyperbolic (wave) equation known as the telegrapher's equation. Namely,

$$\mu\,\frac{\partial^2 u}{\partial t^2} + 2a\,\frac{\partial u}{\partial t} - v^2\,\frac{\partial^2 u}{\partial x^2} = 0, \qquad (7.19)$$

with the initial conditions of the form $u(x, 0) = \varphi(x)$ and $(\partial u/\partial t)|_{t=0} = 0$ for a suitable initial profile $\varphi(x)$. The telegrapher's equation arises in connection with the spatiotemporal evolution of both the voltage and the charge density in one-dimensional electrical transmission lines. In this framework the coefficients $\mu$, a, and $v^2$ are all nonnegative real numbers; in terms of the physical parameters, $\mu/v^2 = LC$ and $2a/v^2 = RC$, where R is the electrical resistance, L is the inductance, and C is the capacitance (see Exercise 9). However, these quantities play no particular role in the probabilistic derivation.
After dividing through by ε > 0 and adjusting the other parameters
accordingly, we may take ε = 1. Partition the state space as a one-dimensional
lattice of sites spaced Δ > 0 units apart and partition time into units
0, τ, 2τ, 3τ, ..., with Δ = vτ. A particle is started at x = 0 and in the first unit
of time τ moves with speed v > 0, with some probability π⁺ in the positive
direction to Δ (= vτ), or moves with probability π⁻ = 1 − π⁺ in the negative
direction at speed v to −Δ. Subsequently, at each successive time step the
particle will continue to travel at the speed v but at each time step, independently
of the previous choice(s), may choose to reverse its previous direction with
probability p = ατ. Notice that, given the initial velocity, the number of time steps until
the particle makes a reversal in direction is geometrically distributed with mean
1/p = 1/(ατ) and variance q/p² = (1 − ατ)/(ατ)². The reversal mechanism provides
a kind of stochastic oscillation in the motion.
The position process {S_n} is represented in terms of successive displacements
by

    S_0 = 0,  S_n = X_1 + X_2 + ··· + X_n,  n ≥ 1,    (7.20)

where {X_n} is a two-state Markov chain with initial distribution
P(X_1 = Δ) = π⁺ = 1 − P(X_1 = −Δ) and homogeneous one-step transition law
determined by

    P(X_{n+1} = Δ | X_n = Δ) = P(X_{n+1} = −Δ | X_n = −Δ) = 1 − p,  n ≥ 1.    (7.21)

Let us now consider the average profile after n time steps, as viewed from
the position of the particles initially moving to the right, starting from the initial
position mΔ. Let

    u⁺(nτ, mΔ) = E(φ(mΔ + S_n) | X_1 = Δ),    (7.22)

and let

    u⁻(nτ, mΔ) = E(φ(mΔ + S_n) | X_1 = −Δ).    (7.23)

Note that the conditional distribution of {S_n} given X_1 = −Δ coincides with
the conditional distribution of {−S_n} given X_1 = Δ. Therefore,

    u⁻(nτ, mΔ) = E(φ(mΔ − S_n) | X_1 = Δ).    (7.24)

Conditioning on X_2 in (7.22) we can write

    u⁺(nτ, mΔ) = E(E(φ(mΔ + S_n) | X_2) | X_1 = Δ)
               = E(φ(mΔ + S_{n−1} + Δ) | X_1 = −Δ)ατ + E(φ(mΔ + S_{n−1} + Δ) | X_1 = Δ)(1 − ατ)
               = u⁻((n − 1)τ, (m + 1)Δ)ατ + u⁺((n − 1)τ, (m + 1)Δ)(1 − ατ).    (7.25)

Conditioning likewise in (7.23) gives

    u⁻(nτ, mΔ) = u⁺((n − 1)τ, (m − 1)Δ)ατ + u⁻((n − 1)τ, (m − 1)Δ)(1 − ατ).    (7.26)

Note that, taking π⁺ = π⁻ = ½, u(nτ, mΔ) := Eφ(mΔ + S_n) = ½(u⁺ + u⁻). We
now have the first-order equations for u⁺ and u⁻ given by

    [u⁺(nτ, x) − u⁺((n − 1)τ, x)]/τ = [u⁺((n − 1)τ, x + vτ) − u⁺((n − 1)τ, x)]/τ
                                      − αu⁺((n − 1)τ, x + vτ) + αu⁻((n − 1)τ, x + vτ),    (7.27)

    [u⁻(nτ, x) − u⁻((n − 1)τ, x)]/τ = [u⁻((n − 1)τ, x − vτ) − u⁻((n − 1)τ, x)]/τ
                                      − αu⁻((n − 1)τ, x − vτ) + αu⁺((n − 1)τ, x − vτ),    (7.28)

where x = mΔ, Δ = vτ. In the limit as τ → 0, Δ = vτ → 0, n, m → ∞ such that
x = mΔ, t = nτ, these equations go over to

    ∂u⁺/∂t = v ∂u⁺/∂x − αu⁺ + αu⁻,    (7.29)

    ∂u⁻/∂t = −v ∂u⁻/∂x − αu⁻ + αu⁺.    (7.30)

Let, for the solutions u⁺, u⁻ to these equations,

    u = (u⁺ + u⁻)/2  and  w = (u⁺ − u⁻)/2.    (7.31)

By combining (7.29) and (7.30) we get the so-called transmission line equations
for u and w,

    ∂u/∂t = v ∂w/∂x,    (7.32)

    ∂w/∂t = v ∂u/∂x − 2αw.    (7.33)

Now w can be eliminated by differentiating (7.32) with respect to t and (7.33)
with respect to x and combining the equations. This results in the telegrapher's
equation for u. The initial conditions can also be checked by passage to the limit.
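Before passing to the limit, the scheme (7.25)-(7.26) can itself be run as a numerical method. The following minimal sketch is ours, not the text's; a periodic grid is used purely to sidestep lattice boundary conditions, and all parameter values are illustrative.

```python
import numpy as np

v, a = 1.0, 0.5                      # speed and reversal intensity (illustrative)
tau = 1e-3                           # time step; Delta = v*tau
p = a * tau                          # reversal probability per step, p = alpha*tau
M = 2000                             # number of lattice sites
x = (np.arange(M) - M // 2) * v * tau
up = np.exp(-x**2 / 0.02)            # u^+(0, x) = phi(x)
um = up.copy()                       # u^-(0, x) = phi(x)

for _ in range(int(1.0 / tau)):      # evolve to t = 1
    up_r = np.roll(up, -1)           # u^+ evaluated at x + v*tau
    um_r = np.roll(um, -1)
    up_l = np.roll(up, 1)            # values at x - v*tau
    um_l = np.roll(um, 1)
    # the update rules (7.25) and (7.26)
    up, um = p * um_r + (1 - p) * up_r, p * up_l + (1 - p) * um_l

u = 0.5 * (up + um)                  # approximates the solution u of (7.19)
print(u.max(), u[M // 2])
```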
The natural Monte Carlo simulation for solving the telegrapher's equation
suggested by the above analysis is to start a large number of noninteracting
particles in motion according to the above scheme, say half of them starting
to the right and the other half going to the left initially. One then approximates
u(t, x), t = nτ, x = mΔ, by the arithmetic mean N⁻¹ Σ_{k=1}^{N} φ(x + S_n^{(k)}), where
N is the number of particles. However, there is a practical issue that makes the
approach unfeasible. Namely, for a small time-space grid, the probability p = ατ
of a reversal will also be very small; the smaller the grid size, the larger the
mean time to reversal and its variance. This means that an extremely large
number of particle evolutions will have to be simulated in order to keep the
fluctuations in the average down. Mark Kac first suggested that for this problem
it is possible, and more practical, to use continuous-time simulations. The idea
is to consider directly the time to velocity reversal. In the above scheme this
time is τN₁, where N₁, the number of time steps until reversal, is geometrically
distributed with parameter p = ατ. In the limit as τ → 0 this time therefore
converges in distribution to the exponential distribution with parameter α. So
it at least seems reasonable to consider the continuous-parameter position
process {Y_t} defined by the motion of a particle that, starting from the origin,
travels at speed v for an exponentially distributed length of time, reverses its
direction, and then travels again for an (independent) exponentially distributed
length of time at the speed v before again reversing its direction, and so on.
The Poisson process of reversal times can be accurately simulated.
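Kac's observation is directly implementable. The sketch below is ours (the Gaussian profile and all parameter values are illustrative): it samples the integral v∫₀ᵗ(−1)^{N_s} ds by drawing Exp(α) reversal times, and then averages the two ±v starts as in Eq. 7.37 below.

```python
import numpy as np

rng = np.random.default_rng(0)

def kac_displacement(t, v, a, rng):
    """Sample v * integral_0^t (-1)^{N_s} ds for a rate-a Poisson process {N_s}."""
    z, s, sign = 0.0, 0.0, 1.0
    while True:
        tau = rng.exponential(1.0 / a)    # time until the next reversal
        if s + tau >= t:
            return z + sign * v * (t - s)
        z += sign * v * tau
        s += tau
        sign = -sign

phi = lambda x: np.exp(-x**2)             # illustrative initial profile
v, a, t, x, n = 1.0, 2.0, 1.5, 0.3, 50_000

z = np.array([kac_displacement(t, v, a, rng) for _ in range(n)])
u_hat = 0.5 * (phi(x + z).mean() + phi(x - z).mean())
print("u(t, x) estimate:", u_hat)
```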
To make the above ideas firm, let {N_t} be the continuous-parameter Poisson
process with parameter α. Let ε be a ±1-valued random variable independent
of {N_t} with P(ε = 1) = ½. The velocity process is the two-state Markov process
{V_t} defined by

    V_t = εv(−1)^{N_t},  t ≥ 0.    (7.34)

The position process, starting from x, is therefore given by

    Y_t = x + ∫₀ᵗ V_s ds = x + vε ∫₀ᵗ (−1)^{N_s} ds.    (7.35)

Although {Y_t} is not a Markov process, the joint evolution {(Y_t, V_t)} of position
and velocity is a Markov process, as is the velocity process {V_t} alone. This is
seen as follows.

    P(a ≤ Y_{t+s} ≤ b, V_{t+s} = w | {Y_u: 0 ≤ u ≤ t}, {V_u: 0 ≤ u ≤ t})
        = P(a ≤ Y_t + ∫₀ˢ V_{t+τ} dτ ≤ b, V_{t+s} = w | {Y_0 = x}, {V_u: 0 ≤ u ≤ t})
        = P(a ≤ Y_s ≤ b, V_s = w | Y_0 = y, V_0 = v̂)|_{y=Y_t, v̂=V_t}.    (7.36)

One expects from the earlier considerations that the solution to the
telegrapher's equation with the prescribed initial conditions is given by

    u(t, x) = Eφ(x + Y_t)
            = ½[Eφ(x + v∫₀ᵗ(−1)^{N_s} ds) + Eφ(x − v∫₀ᵗ(−1)^{N_s} ds)].    (7.37)

To verify that (7.37) indeed solves the problem for a reasonably large class
of initial profiles, let f(x, w) = φ(x), for w = v, −v. Define

    g(t, x, w) = E{f(Y_t, V_t) | V_0 = w, Y_0 = x} = T_t f(x, w).    (7.38)

Let g⁺(t, x) = g(t, x, v) and g⁻(t, x) = g(t, x, −v). Since {(Y_t, V_t)} is a Markov
process, g satisfies the backward equation

    ∂g/∂t = Ag(t, x, w) = w ∂g/∂x + α[g(t, x, −w) − g(t, x, w)].    (7.39)

In order to check that the infinitesimal generator A is of this form, let x → f(x, w)
be, for w = v and w = −v, a bounded twice-differentiable function with bounded
derivatives. Then, as s ↓ 0,

    g(s, x, w) = E(f(Y_s, V_s) | Y_0 = x, V_0 = w)
               = E[f(x, V_s) + (Y_s − x) ∂f(x, V_s)/∂x | Y_0 = x, V_0 = w] + o(s)
               = (f(x, w)e^{−αs} + αse^{−αs}f(x, −w) + o(s))
                 + E((Y_s − x) ∂f(x, w)/∂x | Y_0 = x, V_0 = w) + o(s)
               = f(x, w) + αs(f(x, −w) − f(x, w))
                 + w (∂f(x, w)/∂x) ∫₀ˢ (e^{−2ατ} + o(1)) dτ + o(s)
               = f(x, w) + αs(f(x, −w) − f(x, w)) + w (∂f(x, w)/∂x) s + o(s).    (7.40)

Taking w = v and w = −v in (7.39), we get the two first-order equations (7.29)
and (7.30) satisfied by g⁺ and g⁻, from which the corresponding transmission
line equations and then the telegrapher's equation follow in the
same way as before.

8 ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS

Let {X_t} be a Markov chain on a countable state space S having a transition
probability matrix p(t), t ≥ 0.
As in the case of discrete time, write i → j if p_{ij}(t) > 0 for some t > 0. States
i and j communicate, denoted i ↔ j, if i → j and j → i. Say "i is essential" if i → j
implies j → i for all j; otherwise say "i is inessential." The following analog of
Proposition 5.1 of Chapter II is proved in exactly the same manner, replacing
p_{ij}^{(n)}, p_{ij}^{(n+m)}, etc., by p_{ij}(t), p_{ij}(t + s), etc. Let ℰ denote the set of all essential states.

Proposition 8.1
(a) For every i there exists at least one j such that i → j.
(b) If i → j and j → k, then i → k.
(c) If i is essential, then i ↔ i.
(d) If i is essential and i → j, then "j is essential" and i ↔ j.
(e) On ℰ the relation "↔" is an equivalence relation, i.e., reflexive,
symmetric, and transitive.

By (e) of Proposition 8.1, ℰ decomposes into disjoint equivalence classes of
states such that members of a class communicate with each other. If i and j
belong to different classes, then i ↛ j and j ↛ i.
A significant departure from the discrete-parameter case is that states in
continuous-parameter chains are not periodic. More precisely, one has the
following proposition.

Proposition 8.2
(a) For each state i, p_{ii}(t) > 0 for all t > 0.
(b) For each pair i, j of distinct states, either p_{ij}(t) = 0 for all t > 0 or
p_{ij}(t) > 0 for all t > 0.

Proof. (a) If there is a positive t_0 > 0 such that p_{ii}(t_0) = 0, then since

    p_{ii}(t_0) = p_{ii}(n · t_0/n) ≥ p_{ii}(t_0/n) p_{ii}(t_0/n) ··· p_{ii}(t_0/n) = (p_{ii}(t_0/n))ⁿ,

one gets p_{ii}(t_0/n) = 0 for all positive integers n. Taking the limit as n → ∞, one gets by continuity
that lim p_{ii}(t_0/n) = 0, which is a contradiction to continuity at t = 0 and
the requirement p_{ii}(0) = 1.
(b) Let i, j be two distinct states. Suppose t_0 is a positive number such that
p_{ij}(t_0) = 0. Since p_{ij}(t_0) ≥ p_{ii}(t_0 − s)p_{ij}(s) for all s, 0 < s ≤ t_0, and since
p_{ii}(t_0 − s) > 0, it follows that p_{ij}(s) = 0 for all s ≤ t_0. Then q_{ij} = p′_{ij}(0) = 0, and
k_{ij} = 0. It follows from (5.4) that P_i(X_{T_0} = j) = 0. This implies that k_{j′j} = 0 for
every j′ such that k_{ij′} > 0. For, if k_{ij′} > 0 and k_{j′j} > 0, then no matter how small
t is (t > 0), one has, denoting by {Y_n: n = 0, 1, 2, ...} the embedded discrete-parameter Markov chain,

    p_{ij}(t) ≥ P_i(X_{T_0} = j′, X_{T_0+T_1} = j, T_0 + T_1 ≤ t < T_0 + T_1 + T_2)
             = P_i(Y_1 = j′, Y_2 = j, T_0 + T_1 ≤ t < T_0 + T_1 + T_2)
             = k_{ij′}k_{j′j} ∭_{{t_i + t_{j′} ≤ t < t_i + t_{j′} + t_j}} λ_i e^{−λ_i t_i} λ_{j′} e^{−λ_{j′} t_{j′}} λ_j e^{−λ_j t_j} dt_i dt_{j′} dt_j > 0
               if λ_j > 0.    (8.1)

If λ_j = 0 then, given Y_2 = j, T_2 = ∞ with probability 1. Thus, the triple integral
may be replaced by

    ∬_{{t_i + t_{j′} ≤ t}} λ_i e^{−λ_i t_i} λ_{j′} e^{−λ_{j′} t_{j′}} dt_i dt_{j′} > 0.

This contradicts "p_{ij}(s) = 0 for all s ≤ t_0." Thus k_{ij}^{(2)} = 0. In this manner one
may prove that k_{ij}^{(n)} = 0 for all n. This implies p_{ij}(t) = 0 for all t. ∎


A state i is recurrent if

    P_i(sup{t ≥ 0: X_t = i} = ∞) = 1.    (8.2)

A state i is transient if

    P_i(sup{t ≥ 0: X_t = i} < ∞) = 1.    (8.3)

Define the time of first return to i by

    η_i = inf{t ≥ T_0: X_t = i}.    (8.4)

Also denote by ρ_{ij} the probability

    ρ_{ij} = P_i(η_j < ∞).    (8.5)

Thus, if j ≠ i, then ρ_{ij} is the probability that the particle starting at i will ever
visit j. Note that ρ_{ii} is the probability that the particle will ever return to i after
having visited another state.

Theorem 8.3
(a) A state i is recurrent or transient for {X_t} according as i is recurrent or
transient for the embedded discrete-parameter chain {Y_n} having
transition matrix K.
(b) i is recurrent if and only if ρ_{ii} = 1.
(c) Recurrence (or transience) is a class property.

Proof. (a) Suppose i is recurrent. This clearly implies that P_i(Y_n = i for infinitely
many integers n) = 1. Conversely, suppose i is a recurrent state of the
discrete-parameter embedded chain {Y_n}. Let

    η_i^{(1)} = η_i = inf{t ≥ T_0: X_t = i},  η_i^{(r)} = inf{t > η_i^{(r−1)}: X_t = i},    (8.6)

for r = 2, 3, ..., denote the times of successive returns to state i. Under P_i the
random variables η_i^{(1)}, η_i^{(2)} − η_i^{(1)}, ..., η_i^{(r)} − η_i^{(r−1)}, ... are i.i.d. and positive. Hence,
the nth partial sum η_i^{(n)} converges to ∞ with probability 1 as n → ∞ (Exercise 1).
Therefore (8.2) holds.
If i is transient for {X_t}, then i cannot be recurrent for {Y_n}, since this would
imply i is recurrent for {X_t}. This implies that i is transient for {Y_n}, since, for
a discrete-parameter chain, a state is either recurrent or transient.
(b) Since transitions occur only at times T_0, T_0 + T_1, T_0 + T_1 + T_2, ..., it
follows that

    ρ_{ij} = P_i(Y_n = j for some n ≥ 1).    (8.7)

In other words, the ρ_{ij} defined by (8.5) for {X_t} are the same as those defined
for the embedded discrete-parameter chain. Hence, (b) follows from (a) and
Proposition 8.1 of Chapter II.
(c) i recurrent and i ↔ j for {X_t} implies "i is recurrent and i ↔ j for {Y_n},"
which implies that j is recurrent for {Y_n}, which, in turn, implies that j is recurrent
for {X_t}. ∎

A state i is positive recurrent if

    μ_i := E_i η_i < ∞.    (8.8)

Note that if (8.8) holds then ρ_{ii} = 1. Therefore, positive recurrence implies
recurrence.
If a recurrent state i is such that E_i η_i = ∞, it is called null recurrent.

Suppose that S is a single essential class of positive recurrent states under
p(t). By the strong law of large numbers one has, with probability 1,

    lim_{r→∞} η_i^{(r)}/r = lim_{r→∞} (1/r) Σ_{r′=1}^{r} (η_i^{(r′)} − η_i^{(r′−1)}) = E_i(η_i^{(2)} − η_i^{(1)}) = E_i η_i^{(1)},    (8.9)

with the convention η_i^{(0)} := 0. Similarly, if f is a real-valued function on S such that

    E_i ∫₀^{η_i^{(1)}} |f(X_s)| ds < ∞,    (8.10)

then, applying the strong law of large numbers to the i.i.d. sequence

    Z_r = ∫_{η_i^{(r)}}^{η_i^{(r+1)}} f(X_s) ds  (r = 1, 2, ...),    (8.11)

it follows that, with probability 1,

    lim_{r→∞} (1/r) Σ_{r′=1}^{r} Z_{r′} = E_i ∫₀^{η_i^{(1)}} f(X_s) ds.    (8.12)

Writing ν_t = number of visits to state i during (0, t], one has

    (1/t) ∫₀ᵗ f(X_s) ds = (1/t) ∫₀^{η_i^{(1)}} f(X_s) ds + (1/t) Σ_{r′=1}^{ν_t − 1} Z_{r′} + (1/t) ∫_{η_i^{(ν_t)}}^{t} f(X_s) ds.    (8.13)

The first and third terms on the right-hand side of (8.13) go to zero as t → ∞
with probability 1 (Exercise 2). By (8.9), (ν_t/t) → (E_i η_i^{(1)})^{−1} with probability 1
as t → ∞. Therefore we have the following theorem.

Theorem 8.4. Suppose S comprises a single essential class of positive recurrent
states. If f satisfies (8.10) for some i, then, with probability 1,

    lim_{t→∞} (1/t) ∫₀ᵗ f(X_s) ds = (E_i η_i^{(1)})^{−1} E_i ∫₀^{η_i^{(1)}} f(X_s) ds,    (8.14)

regardless of the initial distribution.

The following analog of Theorem 9.2 of Chapter II may be proved in exactly
the same manner using Theorem 8.4.

Corollary 8.5. Suppose S consists of a single positive recurrent class under
p(t). Then:
(a) For all i, j ∈ S,

    lim_{t→∞} (1/t) ∫₀ᵗ p_{ij}(s) ds = (λ_j E_j η_j^{(1)})^{−1}.    (8.15)

(b) The transition probability p(t) admits a unique invariant initial
distribution π given by

    π_j = (λ_j E_j η_j^{(1)})^{−1},  j ∈ S.

(c) If (8.10) holds, then the limit in (8.14) also equals E_π f(X_0) = Σ_j π_j f(j).

To state the analog of Theorem 10.2 of Chapter II, write f̄(j) = f(j) − E_π f(X_0)
and

    W_T(t) = (1/√T) ∫₀^{Tt} f̄(X_s) ds  (t ≥ 0).    (8.16)

Theorem 8.6. Assume, in addition to the hypothesis of Theorem 8.4, that

    σ² := E_i (∫₀^{η_i^{(1)}} f̄(X_s) ds)² < ∞.    (8.17)

Then, as T → ∞, the stochastic process {W_T(t): t ≥ 0} converges in distribution
to Brownian motion with drift zero and diffusion coefficient

    δ² = (E_i η_i^{(1)})^{−1} σ².    (8.18)

A complete analog of Theorem 9.4 of Chapter II follows from Theorems 8.3,
8.4, and Corollary 8.5, if one observes that, in the case S comprises a single
communicating class, the Markov chain {X_t} is positive recurrent if and only
if it admits an invariant probability distribution. The "only if" part of the last
statement follows from Corollary 8.5(b). For the sufficiency see Exercise 12.
In order to find an invariant probability π, one may try to differentiate the
right side of the equation π′p(t) = π′ to get

    π′Q = 0.    (8.19)

A sufficient condition for the validity of term-by-term differentiation is given
in Exercise 11.

Example 1. (Birth-Death Process with One Reflecting Boundary)

    S = {0, 1, 2, ...},  q_{01} = λ_0,  q_{00} = −λ_0;
    q_{i,i−1} = δ_i λ_i,  q_{i,i+1} = β_i λ_i,  q_{ii} = −λ_i  (i ≥ 1).

Here β_i, δ_i > 0 and β_i + δ_i = 1. By Theorem 8.3 and Section 2 of Chapter III,
we know that this Markov chain is recurrent if and only if

    Σ_{x=1}^{∞} (δ_1δ_2···δ_x)/(β_1β_2···β_x) = ∞.    (8.20)

Also, the equations (8.19) become

    −π_0λ_0 + π_1λ_1δ_1 = 0,
    π_{j−1}β_{j−1}λ_{j−1} − π_jλ_j + π_{j+1}δ_{j+1}λ_{j+1} = 0  for j ≥ 1,    (8.21)

with the convention β_0 = 1. Solving these successively in terms of π_0, one gets

    π_1 = (λ_0/(λ_1δ_1))π_0,  π_j = (λ_0/λ_j) (β_1β_2···β_{j−1})/(δ_1δ_2···δ_j) π_0  for j ≥ 2.    (8.22)

For the chain to be positive recurrent requires Σ π_i < ∞, i.e.,

    Σ_{i=1}^{∞} (λ_0/λ_i) (β_1β_2···β_{i−1})/(δ_1δ_2···δ_i) < ∞.    (8.23)

If this holds, take

    π_0 = [1 + Σ_{i=1}^{∞} (λ_0/λ_i) (β_1β_2···β_{i−1})/(δ_1δ_2···δ_i)]^{−1}    (8.24)

in (8.22) to get the unique invariant initial distribution. Thus, the necessary and
sufficient conditions for positive recurrence (or for the existence of a steady state)
are (8.20) and (8.23).
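For computation, the sums in (8.23)-(8.24) can be truncated at a level M beyond which the remaining terms are negligible. A minimal sketch of ours (the constant rates are illustrative, not from the text):

```python
import numpy as np

def invariant_distribution(lam, beta, M):
    """pi_0,...,pi_M from (8.22) and (8.24); the state space is truncated at M,
    which is adequate when the neglected tail of the series (8.23) is negligible.
    lam[j] = lambda_j (holding rates), beta[j] = upward probability at j >= 1."""
    pi = np.zeros(M + 1)
    pi[0] = 1.0
    prod = 1.0            # running value of beta_1...beta_{j-1}/(delta_1...delta_j)
    for j in range(1, M + 1):
        prod *= (beta[j - 1] if j > 1 else 1.0) / (1.0 - beta[j])
        pi[j] = (lam[0] / lam[j]) * prod
    return pi / pi.sum()

M = 400
lam = np.ones(M + 1)                 # lambda_j = 1
beta = np.full(M + 1, 0.4)           # beta_j = 0.4 < 1/2, so (8.20) and (8.23) hold
pi = invariant_distribution(lam, beta, M)
print(pi[:5], pi.sum())
```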
The N-server queue described in Example 7.3 is a continuous-parameter
birth-death chain on S = {0, 1, 2, ...} with infinitesimal rate parameters given
by (7.12): with arrival rate α and unit service rate per busy server,
λ_y = α + min(y, N), β_y = α/λ_y, δ_y = min(y, N)/λ_y. In particular, the process
has a reflecting boundary at zero, so the criteria for recurrence and positive
recurrence apply. The series (8.20) diverges or converges according to the
divergence or convergence of

    Σ_{y=N}^{∞} (δ_1δ_2···δ_y)/(β_1β_2···β_y) = Σ_{y=N}^{∞} [(1)(2)···(N − 1)] N^{y−N+1}/α^y
                                             = (N!/N^N) Σ_{y=N}^{∞} (N/α)^y.    (8.25)

Thus, the process is recurrent if and only if 1 ≥ α/N, i.e., α ≤ N. Similarly, the
series (8.23) converges or diverges according as the following series converges
or diverges:

    Σ_{y=N}^{∞} (α/λ_y) (β_1β_2···β_{y−1})/(δ_1δ_2···δ_y) = Σ_{y=N}^{∞} α^y/(N! N^{y−N})
                                                          = (N^N/N!) Σ_{y=N}^{∞} (α/N)^y.    (8.26)

Thus, the process is positive recurrent if and only if 1 > α/N. It is null recurrent
if and only if α = N. In the case α < N, the invariant initial distribution is
given by

    π_j = (α^j/j!) π_0  (1 ≤ j ≤ N),
    π_j = α^j/(N! N^{j−N}) π_0  (j > N),    (8.27)
    π_0 = [Σ_{j=0}^{N} α^j/j! + (α^N/N!) Σ_{j=N+1}^{∞} (α/N)^{j−N}]^{−1}.
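These formulas are easy to tabulate. The following sketch is ours (the function name and tolerance are not from the text); it assumes α < N, the positive-recurrent case, with unit service rate per busy server as above.

```python
import math

def mmn_pi(alpha, N, tol=1e-12):
    """Stationary distribution (8.27) of the N-server queue, arrival rate alpha,
    unit service rate per busy server; requires alpha < N."""
    assert alpha < N
    head = sum(alpha**j / math.factorial(j) for j in range(N))
    tail = (alpha**N / math.factorial(N)) / (1.0 - alpha / N)   # geometric tail sum
    pi0 = 1.0 / (head + tail)
    pis, j = [], 0
    while True:
        pj = pi0 * (alpha**j / math.factorial(j) if j <= N
                    else alpha**j / (math.factorial(N) * N**(j - N)))
        pis.append(pj)
        if j > N and pj < tol:
            return pis
        j += 1

pi = mmn_pi(alpha=3.0, N=4)
print(sum(pi), pi[:6])
```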

Example 2. (Birth-Death Process with One Absorbing Boundary). Here

    S = {0, 1, 2, ...},  q_{ii} = −λ_i, λ_i > 0 for i ≥ 1,  q_{00} = 0,
    q_{i,i+1} = β_i λ_i  and  q_{i,i−1} = δ_i λ_i  for i ≥ 1,

where

    0 < β_i < 1  and  β_i + δ_i = 1  for i ≥ 1.

Thus, k_{00} = 1, k_{ii} = 0 if i ≥ 1, k_{i,i+1} = β_i, k_{i,i−1} = δ_i for i ≥ 1. The embedded
discrete-parameter chain is a birth-death chain with absorption at 0. Therefore,
"0" is an absorbing state for {X_t}, and it is the only absorbing state. Given two
states c and d with c < d, let ψ(x) denote the probability that {X_t} reaches c
before it reaches d, starting from x. Note that this probability is the same for
{X_t} as it is for the embedded chain {Y_n}. Therefore, by Eq. 2.10 of Chapter III,

    ψ(x) = [Σ_{y=x}^{d−1} (δ_{c+1}δ_{c+2}···δ_y)/(β_{c+1}β_{c+2}···β_y)]
           / [1 + Σ_{y=c+1}^{d−1} (δ_{c+1}δ_{c+2}···δ_y)/(β_{c+1}β_{c+2}···β_y)]  (c + 1 ≤ x ≤ d − 1).    (8.28)

Letting d → ∞, one gets

    ρ_{xc} = P_x({X_t} ever reaches c)
           = lim_{d→∞} [Σ_{y=x}^{d−1} (δ_{c+1}···δ_y)/(β_{c+1}···β_y)]
                      / [1 + Σ_{y=c+1}^{d−1} (δ_{c+1}···δ_y)/(β_{c+1}···β_y)]  for x > c.    (8.29)

Thus, for x > 0, ρ_{x0} = 1 and, more generally, ρ_{xc} = 1 for x > c, if and only if

    Σ_{y=1}^{∞} (δ_1δ_2···δ_y)/(β_1β_2···β_y) = ∞.    (8.30)

But if ρ_{x0} = 1 for all x > 0 then P_x(ζ = ∞) = 1 for all x > 0 (ζ denoting the
explosion time), since the holding time at 0 is infinite. So there is no explosion if
(8.30) holds. If the series in (8.30) is convergent, then ρ_{x0} < 1 for x > 0, and
ρ_{xc} < 1 for x > c. However, this does not necessarily mean explosion. For example,
let β_x = p, δ_x = 1 − p for x > 0 with ½ < p < 1; also take λ_x = λ > 0 for all
x > 0, q_{00} = 0. Then, by Proposition 6.1, the process is conservative and, therefore,
nonexplosive.

Example 3. (Chemical Reaction Kinetics). The study of the time evolution of
molecular concentrations that undergo chemical reactions is known as chemical
reaction kinetics. The interest in reaction kinetics is not limited to chemistry.
Ecologists and biologists are interested in the development of populations of
various organic species that are involved in "reactions" through predation,
births, deaths, immigration, emigration, etc.
The simplest view of chemical reactions is as an evolution that takes place
through a succession of stages called elementary reactions. For example,
2A + B → C is the notation used by chemists to denote an elementary reaction
in which two molecules of A react with a molecule of B to produce a molecule
of C. The notation A → B + C represents the dissociation of a molecule of A
into molecules of types B and C. The reaction A → B represents a transformation

of a molecule of A into a molecule of B (for example, as a result of radioactive
emissions). The term molecularity is used to refer to the number of reactant
molecules involved in a given elementary reaction. So, A → B is called
unimolecular, while A + B → C is bimolecular. Double arrows are used to
denote reversible reactions. For example, A ⇌ B signifies that either the reaction
A → B or the reverse reaction B → A may occur at an elementary reaction stage.
As an illustration, consider a chemical solution that undergoes elementary
reactions of the form C + A ⇌ C + B, where C denotes a catalyst whose
concentration is held constant. For example, if the concentration of C is large
relative to the concentrations of A and B then, although C does react with A
and B, the relative variation in its concentration is so small that it may be
viewed as constant. In describing the evolution of such a process, it is convenient
to disregard the concentrations of substances that are constant throughout the
evolution. So we are left with the reversible unimolecular reaction A ⇌ B. Let
X_t denote the number of molecules (concentration) of A at time t in a volume
of substance held constant. The total number N of molecules of A and B is constant
throughout the reaction process. The state space of the process {X_t} is
S = {0, 1, 2, ..., N}.
The holding time in state i represents the time required for an elementary
reaction (either A → B or B → A) to occur when there are i molecules of species
A (and N − i of species B). The transition (jump) from state i to state i − 1
corresponds to the reaction A → B by one of the A molecules with the catalyst,
and the transition from i to i + 1 corresponds to the reverse reaction. Let us
assume that {X_t} is a time-homogeneous Markov process such that each molecule
of A has probability r_A Δt + o(Δt) of reacting with the catalyst in time Δt to
produce a molecule of B (as Δt ↓ 0). Suppose also that the reverse reaction,
per molecule of B, has probability r_B Δt + o(Δt) as Δt ↓ 0. In addition, suppose that the
probability of occurrence of more than one elementary reaction in time Δt has
order o(Δt). This makes {X_t} a continuous-parameter (Ehrenfest-type)
birth-death chain on S = {0, 1, 2, ..., N} with two reflecting boundaries at
i = 0 and i = N and having the infinitesimal transition rates q_{ij} given by

    q_{ij} = (N − i)r_B              for j = i + 1, 0 ≤ i ≤ N − 1,
    q_{ij} = i r_A                   for j = i − 1, 1 ≤ i ≤ N,
    q_{ij} = −[i r_A + (N − i)r_B]   for j = i, 0 ≤ i ≤ N,    (8.31)
    q_{ij} = 0                       otherwise.

Observe that unimolecularity is reflected in the birth-death property
p_{ij}(Δt) = o(Δt) as Δt ↓ 0 for |j − i| > 1. In the case of higher-order molecularities,
for example 2A + C → B, the concentration of A would skip states. Also notice
that the conservation of mass is reflected in the boundary conditions.
It follows from an application of Corollary 8.5 that there is a unique invariant
initial distribution π that is also the limiting distribution regardless of the initial
state. In fact, one may obtain π algebraically from the equation π′Q = 0.

However, it is difficult to solve the Kolmogorov equations (2.17) and
(2.19) explicitly for the distribution at finite t > 0. By considering the evolution of
the generating function of the distribution under (2.17) or (2.19), one may
obtain moments and other useful information at times t < ∞ when the
coefficients are, as in the present case and in the branching case of Example 3
in Section 5, (affine) linear in the state variable j.
The generating function of X_t, given X_0 = i, is

    φ_i(z, t) = E_i(z^{X_t}) = Σ_{j=0}^{N} z^j p_{ij}(t),    (8.32)

so that, differentiating both sides with respect to t, using (2.19), and regarding
X → z^X as a function of X for fixed z,

    ∂φ_i(z, t)/∂t = E_i(Σ_k q_{X_t,k} z^k)
      = E_i( 1_{{1 ≤ X_t ≤ N−1}}[r_A X_t z^{X_t−1} + (N − X_t)r_B z^{X_t+1} − (r_A X_t + (N − X_t)r_B) z^{X_t}]
            + 1_{{X_t = 0}}[N r_B z − N r_B] + 1_{{X_t = N}}[r_A X_t z^{X_t−1} − N r_A z^N] )
      = r_A E_i(X_t z^{X_t−1}) + N r_B E_i(z^{X_t+1}) − r_B E_i(X_t z^{X_t+1}) − r_A E_i(X_t z^{X_t})
        − N r_B E_i(z^{X_t}) + r_B E_i(X_t z^{X_t})
      = r_A ∂φ_i(z, t)/∂z + N r_B z φ_i(z, t) − r_B z² ∂φ_i(z, t)/∂z − r_A z ∂φ_i(z, t)/∂z
        − N r_B φ_i(z, t) + r_B z ∂φ_i(z, t)/∂z.    (8.33)

Collecting similar terms in the last expression, one gets

    ∂φ_i(z, t)/∂t = (1 − z)(r_B z + r_A) ∂φ_i(z, t)/∂z − N r_B (1 − z) φ_i(z, t).    (8.34)

This is a first-order partial differential equation subject to the "initial condition"

    φ_i(z, 0) = E_i z^{X_0} = z^i.    (8.35)

A standard method of solving this equation is by Lagrange's method of
characteristics. This method simply recognizes that (8.34) in essence says that
the directional derivative

    [∂/∂t − (1 − z)(r_B z + r_A) ∂/∂z] φ_i(z, t) = −N r_B (1 − z) φ_i(z, t).

Introduce a real parameter θ and in the (z, t) plane define the (family of)
parametric curve(s) θ → (z(θ), t(θ)) by

    dz(θ)/dθ = −(1 − z)(r_B z + r_A),  dt(θ)/dθ = 1.    (8.36)

Now note that along such a curve the above directional derivative is none other
than d[φ_i(z(θ), t(θ))]/dθ, so that one has

    dφ_i(z(θ), t(θ))/dθ = −N r_B (1 − z(θ)) φ_i(z(θ), t(θ)),    (8.37)

whose general solution is

    φ_i(z(θ), t(θ)) = exp{−N r_B ∫ (1 − z(θ)) dθ + d},    (8.38)

where d is a constant of integration. Now the solution of (8.36) is

    t(θ) = θ,  z(θ) = −r_A/r_B + (r_A + r_B)/(r_B(1 + c e^{(r_A+r_B)θ})).    (8.39)

Different values of c give rise to different curves of this parametric family. Each
point (z, t) in the planar region {(z, t): −1 ≤ z ≤ 1, t > 0} lies on one and only
one curve of this family. Thus, if we fix (z, t), then this point corresponds to the c
obtained by solving for c and θ with t(θ) = t, z(θ) = z in (8.39). That is,

    θ = t,  c = e^{−(r_A+r_B)t} [(r_A + r_B)/(r_B z + r_A) − 1].    (8.40)

Next note that, by (8.36),

    ∫ (1 − z) dθ = −∫ dz/(r_B z + r_A) = −(1/r_B) log(z + r_A/r_B),    (8.41)

so that

    φ_i(z(θ), t(θ)) = d′ (z + r_A/r_B)^N,    (8.42)

where d′ is to be obtained from the initial condition

    φ_i(z(0), t(0)) = φ_i(z(0), 0) = z^i(0).    (8.43)

Specifically,

    d′ = z^i(0) (z(0) + r_A/r_B)^{−N}
       = [−r_A/r_B + ((r_A + r_B)/r_B)(1 + c)^{−1}]^i ((r_A + r_B)/r_B)^{−N} (1 + c)^N,    (8.44)

c being given by (8.40). Thus, writing c(t, z) for the c in (8.40),

    φ_i(z, t) = [−r_A/r_B + ((r_A + r_B)/r_B)(1 + c(t, z))^{−1}]^i
                × ((r_A + r_B)/r_B)^{−N} (1 + c(t, z))^N (z + r_A/r_B)^N.    (8.45)

From (8.45) and (8.40), by appropriate differentiation, one may calculate E_i X_t
and Var_i X_t. Moreover, as t → ∞, c(t, z) → 0. Hence,

    lim_{t→∞} φ_i(z, t) = ((r_A + r_B z)/(r_A + r_B))^N,    (8.46)

which shows that, no matter what the initial state i may be, as t → ∞, the
limiting distribution of X_t is binomial with parameters N and p = r_B/(r_A + r_B).
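As a numerical check (ours, not part of the text), one can exponentiate the generator (8.31) for a small N and compare a row of p(t) = e^{tQ} at a large t with the binomial limit of (8.46); the values of N, r_A, r_B, t, and the initial state are illustrative.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import binom

N, rA, rB = 10, 1.5, 0.5
Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = (N - i) * rB        # B -> A by one of the N - i B molecules
    if i > 0:
        Q[i, i - 1] = i * rA              # A -> B by one of the i A molecules
    Q[i, i] = -(i * rA + (N - i) * rB)    # cf. (8.31)

p_t = expm(20.0 * Q)                      # transition probabilities at t = 20
limit = binom.pmf(np.arange(N + 1), N, rB / (rA + rB))
print(np.abs(p_t[3] - limit).max())       # tiny, whatever the initial state (here 3)
```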

9 CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS

We are interested here in the computation of e^{tQ}, where Q is an
(m + 1) × (m + 1) matrix with eigenvalues α_0, α_1, ..., α_m, possibly with
repetitions, and corresponding eigenvectors

    x_0 = (x_{00}, ..., x_{m0})′, ..., x_m = (x_{0m}, ..., x_{mm})′,

which are linearly independent. Let

    B = (x_0 x_1 ··· x_m) = ((x_{jk}))_{0 ≤ j,k ≤ m}.    (9.1)

Then

    QB = (α_0x_0 α_1x_1 ··· α_mx_m),
    Q²B = Q(α_0x_0 α_1x_1 ··· α_mx_m) = (α_0²x_0 α_1²x_1 ··· α_m²x_m), ...,
    QⁿB = (α_0ⁿx_0 α_1ⁿx_1 ··· α_mⁿx_m).

Hence

    e^{tQ}B = (Σ_{n=0}^{∞} (tⁿ/n!) Qⁿ) B = (e^{tα_0}x_0 e^{tα_1}x_1 ··· e^{tα_m}x_m)
            = B diag(e^{tα_0}, ..., e^{tα_m}).    (9.2)

Therefore

    e^{tQ} = B diag(e^{tα_0}, e^{tα_1}, ..., e^{tα_m}) B^{−1}.    (9.3)
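In matrix-computation terms, (9.3) is a one-line recipe. A minimal sketch of ours (it assumes Q is diagonalizable, i.e., has a full set of linearly independent eigenvectors, as in the reversible example below), checked against the familiar two-state chain:

```python
import numpy as np

def transition_probabilities(Q, t):
    """p(t) = e^{tQ} via (9.3); assumes Q has linearly independent eigenvectors."""
    alphas, B = np.linalg.eig(Q)          # columns of B are the eigenvectors
    return (B @ np.diag(np.exp(t * alphas)) @ np.linalg.inv(B)).real

# check against the familiar two-state chain
lam, mu, t = 2.0, 3.0, 0.7
Q = np.array([[-lam, lam], [mu, -mu]])
p = transition_probabilities(Q, t)
print(p[0, 1], lam / (lam + mu) * (1 - np.exp(-(lam + mu) * t)))
```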

Example 1. (Birth-Death Process with Reflecting Boundaries). Let

    S = {0, 1, 2, ..., N},  q_{00} = −λ_0,  q_{01} = λ_0,
    q_{NN} = −λ_N,  q_{N,N−1} = λ_N,  q_{i,i+1} = β_iλ_i,  q_{i,i−1} = (1 − β_i)λ_i = δ_iλ_i,

with 0 < β_i < 1 for 1 ≤ i ≤ N − 1. The invariant distribution π, which exists
and is unique, satisfies π′p(t) = π′. Differentiation at t = 0 yields

    π′Q = 0,    (9.4)

or, in detail,

    −π_0λ_0 + π_1λ_1δ_1 = 0,
    π_{j−1}β_{j−1}λ_{j−1} − π_jλ_j + π_{j+1}δ_{j+1}λ_{j+1} = 0,  for 1 ≤ j ≤ N − 1,    (9.5)
    π_{N−1}β_{N−1}λ_{N−1} − π_Nλ_N = 0.

The solution is unique and given by

    π_1 = (λ_0/(λ_1δ_1))π_0,
    π_2 = (1/(δ_2λ_2))(π_1λ_1 − π_0λ_0) = (λ_0β_1/(λ_2δ_1δ_2))π_0,
    π_3 = (1/(δ_3λ_3))(π_2λ_2 − π_1β_1λ_1) = (λ_0β_1β_2/(λ_3δ_1δ_2δ_3))π_0, ...,
    π_j = (λ_0β_1β_2···β_{j−1}/(λ_jδ_1δ_2···δ_j))π_0  for 2 ≤ j ≤ N − 1,
    π_N = (β_{N−1}λ_{N−1}/λ_N)π_{N−1} = (λ_0β_1β_2···β_{N−1}/(λ_Nδ_1δ_2···δ_{N−1}))π_0.    (9.6)

Since Σ π_j = 1, one gets

    π_0 = [1 + λ_0/(λ_1δ_1) + Σ_{j=2}^{N} (λ_0β_1···β_{j−1})/(λ_jδ_1···δ_j)]^{−1}.    (9.7)

In (9.5) we have used the convention δ_N = 1. Now observe that Q is a symmetric
linear transformation with respect to the inner product defined by

    ⟨x, y⟩_π = Σ_{i=0}^{N} x_i y_i π_i.    (9.8)

That is,

    ⟨Qx, y⟩_π = ⟨x, Qy⟩_π.    (9.9)

To show this, note that δ_jλ_jπ_j = β_{j−1}λ_{j−1}π_{j−1}, so that

    ⟨Qx, y⟩_π = (−λ_0x_0 + λ_0x_1)y_0π_0 + Σ_{j=1}^{N−1} (δ_jλ_jx_{j−1} − λ_jx_j + β_jλ_jx_{j+1})y_jπ_j
                + (λ_Nx_{N−1} − λ_Nx_N)y_Nπ_N
              = −λ_0x_0y_0π_0 − λ_Nx_Ny_Nπ_N − Σ_{j=1}^{N−1} λ_jx_jy_jπ_j
                + Σ_{j=1}^{N} δ_jλ_jx_{j−1}y_jπ_j + Σ_{j=0}^{N−1} β_jλ_jx_{j+1}y_jπ_j    (9.10)

(with the conventions β_0 = 1, δ_N = 1), which, by the identity
δ_jλ_jπ_j = β_{j−1}λ_{j−1}π_{j−1}, is symmetric in x and y. Hence Q has N + 1 real eigenvalues
α_0, ..., α_N, possibly with repetitions, and corresponding mutually orthogonal
eigenvectors x_0, x_1, ..., x_N of unit length. By orthonormality of the eigenvectors
with respect to ⟨·, ·⟩_π,

    ⟨x_i, x_k⟩_π = Σ_{j=0}^{N} x_{ji}x_{jk}π_j = δ_{ik}.    (9.11)

Equivalently,

    B′ diag(π_0, ..., π_N) B = I.    (9.12)

Therefore,

    B^{−1} = B′ diag(π_0, ..., π_N).    (9.13)

Using (9.13) in (9.3) we get

    p(t) = e^{tQ} = B diag(e^{tα_0}, ..., e^{tα_N}) B′ diag(π_0, ..., π_N).    (9.14)

From (9.14) we have

    p_{ij}(t) = Σ_{k=0}^{N} x_{ik} e^{tα_k} x_{jk} π_j.    (9.15)

As a special case consider the continuous-parameter simple symmetric random
walk given by

    λ_0 = λ_1 = ··· = λ_N = 1  and  β_i = ½ = δ_i  for 1 ≤ i ≤ N − 1.

If α is an eigenvalue of Q, then a corresponding eigenvector y satisfies the
equations

    ½y_{j−1} + ½y_{j+1} − y_j = αy_j  (1 ≤ j ≤ N − 1),
    −y_0 + y_1 = αy_0,  y_{N−1} − y_N = αy_N.    (9.16)

One may check that the eigenvalues α_0, ..., α_N are given by

    α_k = cos(kπ/N) − 1  (k = 0, 1, 2, ..., N),    (9.17)

and the corresponding (normalized) eigenvectors x_0, ..., x_N are

    x_0 = (1, 1, ..., 1)′,  x_N = (1, −1, 1, ...)′,
    x_{jk} = √2 cos(kπj/N),  1 ≤ k ≤ N − 1,  0 ≤ j ≤ N.    (9.18)

A systematic method for calculating these eigenvalues and eigenvectors is
given in Example 4.1 of Chapter III. The eigenvectors here are the same as
there, but the eigenvalues differ by 1 because the matrices differ by an identity
matrix. Also, from (9.6),

    π_0 = π_N = 1/(2N),  π_j = 1/N  for 1 ≤ j ≤ N − 1.

Hence, by (9.15),

    p_{ij}(t) = Σ_{k=0}^{N} e^{(cos(kπ/N)−1)t} x_{ik}x_{jk}π_j
             = π_j [1 + 2 Σ_{k=1}^{N−1} e^{(cos(kπ/N)−1)t} cos(kπi/N) cos(kπj/N) + e^{−2t}θ(i, j)],    (9.19)

where θ(i, j) = x_{iN}x_{jN} = +1 or −1 according as |j − i| is even or odd. Note
that p_{ij}(t) converges to π_j exponentially fast as t → ∞.
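As a check on (9.17)-(9.19), ours and not from the text (N and t are arbitrary), one can compare the spectral formula with a direct matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

N, t = 6, 0.8
Q = np.zeros((N + 1, N + 1))              # generator of the walk in this example
Q[0, 0], Q[0, 1] = -1.0, 1.0
Q[N, N], Q[N, N - 1] = -1.0, 1.0
for i in range(1, N):
    Q[i, i - 1] = Q[i, i + 1] = 0.5
    Q[i, i] = -1.0

pi = np.full(N + 1, 1.0 / N)
pi[0] = pi[N] = 1.0 / (2 * N)

def p_spectral(i, j, t):
    """p_ij(t) from (9.19)."""
    k = np.arange(1, N)
    s = (np.exp((np.cos(k * np.pi / N) - 1.0) * t)
         * 2.0 * np.cos(k * np.pi * i / N) * np.cos(k * np.pi * j / N)).sum()
    theta = 1.0 if (j - i) % 2 == 0 else -1.0
    return pi[j] * (1.0 + s + np.exp(-2.0 * t) * theta)

print(abs(expm(t * Q)[2, 3] - p_spectral(2, 3, t)))   # agreement to rounding error
```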

10 ABSORPTION PROBABILITIES

Suppose that P_i is the distribution of a continuous-parameter Markov chain
{X_t} starting at i ∈ S and having homogeneous transition probabilities
p(t) = ((p_{ij}(t))) with corresponding infinitesimal rates furnished by Q = p′(0).
For a nonempty set A of states, let Q̂ denote the matrix obtained from Q by
replacing all rows corresponding to states in A by zeros. Let P̂_i denote the
distribution of the process {X_t} so transformed.
For an initial state i ∉ A, all states belonging to A are absorbing states under
P̂_i. Moreover, in view of Theorem 5.4, the distributions under P_i and P̂_i of the
process {X_t: 0 ≤ t ≤ τ_A}, where

    τ_A = inf{t ≥ 0: X_t ∈ A}    (10.1)

is the first passage time to A, must coincide. Therefore, absorption probabilities
starting from i ∉ A can be calculated directly in terms of the transition
probabilities under the transformed distribution P̂_i, as follows.

Proposition 10.1. Let p(t) and p̂(t) denote the transition probabilities for the
distributions P_i and P̂_i generated by Q and Q̂, respectively. Then, for i ∉ A,

    P_i(τ_A > t) = Σ_{j∉A} p̂_{ij}(t).    (10.2)

Proof. Let {Y_n} be the discrete-parameter Markov chain starting in state i under
P_i, obtained in accordance with Theorem 5.4, and let T_0, T_1, T_2, ... denote the
corresponding exponentially distributed holding times. Writing

    ν_A = inf{n ≥ 0: Y_n ∈ A},    (10.3)

we have

    P_i(t < τ_A < ∞) = Σ_{m=1}^{∞} Σ* P_i(Y_0 = i, Y_1 = i_1, ..., Y_{m−1} = i_{m−1}, Y_m = j, ν_A = m,
                                        T_0 + ··· + T_{m−1} > t)
                     = Σ_{m=1}^{∞} Σ* k_{ii_1} k_{i_1i_2} ··· k_{i_{m−1}j} G(t; λ_i, λ_{i_1}, ..., λ_{i_{m−1}}),    (10.4)

where Σ* denotes summation over all m-tuples (i_1, ..., i_{m−1}, j) of states
i_1, i_2, ..., i_{m−1} ∈ S\A, j ∈ A, and G(t; λ_i, λ_{i_1}, ..., λ_{i_{m−1}}) is the probability that
a sum of m independent exponentially distributed random variables with parameters
λ_i, λ_{i_1}, ..., λ_{i_{m−1}} exceeds the value t; recall that λ_i = q_i, the total jump rate at i.
The corresponding probability under P̂_i is obtained on replacing k_{ii_1}, k_{i_1i_2}, ..., k_{i_{m−1}j} by
k̂_{ii_1}, k̂_{i_1i_2}, ..., k̂_{i_{m−1}j} and G by Ĝ.
Now the transition probabilities for the discrete-parameter Markov chains
corresponding to Q and Q̂ are the same for all sequences of states in S\A.
Therefore, for such sequences G(t; λ_i, ..., λ_{i_{m−1}}) = Ĝ(t; λ_i, ..., λ_{i_{m−1}}), and

    P_i(τ_A > t) = P̂_i(τ_A > t) = Σ_{j∉A} p̂_{ij}(t).    ∎

Observe that, since

    p̂_{lk}(t) = 0  for all l ∈ A, k ∈ S\A,    (10.5)

the backward equations for p̂_{ij}(t), with i, j ∈ S\A, take the form

    p̃′(t) = Q̃ p̃(t),    (10.6)

where p̃ and Q̃ are obtained from p̂ and Q̂ by deleting all rows and columns
corresponding to states in the set A.
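Proposition 10.1 together with (10.6) gives a practical recipe: delete the rows and columns of A and exponentiate what is left. A minimal sketch of ours (the four-state generator is illustrative, not from the text):

```python
import numpy as np
from scipy.linalg import expm

def survival_probability(Q, A, i, t):
    """P_i(tau_A > t) via (10.2) and (10.6): restrict Q to the states outside A
    and sum row i of the matrix exponential of the restricted generator."""
    keep = [s for s in range(Q.shape[0]) if s not in A]
    Q_tilde = Q[np.ix_(keep, keep)]
    return expm(t * Q_tilde)[keep.index(i)].sum()

Q = np.array([[-2.0, 1.0, 0.5, 0.5],      # a four-state chain, rates illustrative
              [1.0, -3.0, 1.0, 1.0],
              [0.5, 0.5, -2.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])      # state 3 absorbing
print(survival_probability(Q, {3}, 0, 1.0))
```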

Example 1. (Biomolecular Fixation). The surface of a bacterium is assumed to
consist of molecular sites that are susceptible to invasion by foreign molecules.
Molecules of a particular composition, termed acceptable, will be permanently
affixed to a site upon arrival, to the exclusion of further attachments by other
molecules. Molecules that do not possess the acceptable composition remain at
the site for some positive length of time but are eventually rejected. The problem
here is to analyze the rate at which fixation occurs.
Suppose that foreign molecules arrive at a fixed site according to the
occurrences of a Poisson process with parameter λ > 0. A proportion α > 0 of
the molecules that arrive at the site are acceptable. Other "unacceptable"
molecules remain at the site for an exponentially distributed length of time with
parameter μ > 0. The calculation of the distribution of the length of time until
fixation occurs is an absorption probability problem.
At any particular time t, the site is in one of three possible states a, b, or c, say,
depending on whether the site is (a) occupied by an acceptable molecule,
(b) occupied by an unacceptable molecule, or (c) unoccupied. The evolution of
states at the given site is, according to the above assumptions, a continuous-time
Markov chain {X_t} with state space S = {a, b, c}, starting at c and having
infinitesimal rates given by

             a        b        c
    a   [    0        0        0  ]
Q = b   [    0       −μ        μ  ]    (10.7)
    c   [   αλ    (1 − α)λ    −λ  ]

According to Proposition 10.1, the distribution of the first passage time τ_a to
fixation (a) is given by

    P_c(τ_a > t) = p̃_{cb}(t) + p̃_{cc}(t),    (10.8)

where, by (10.6), p̃(t) is determined by

    p̃′(t) = Q̃ p̃(t),  p̃(0) = I,    (10.9)

and

              b         c
Q̃ = b   [   −μ         μ   ]    (10.10)
    c   [ (1 − α)λ    −λ   ]

Q̃ has distinct eigenvalues r_1, r_2 given by the zeros of the characteristic
polynomial

    det(Q̃ − rI) = r² + (λ + μ)r + αλμ.    (10.11)

In particular,

    r_1 = −½(λ + μ) + ½(λ² + 2(1 − 2α)λμ + μ²)^{1/2},    (10.12)
    r_2 = −½(λ + μ) − ½(λ² + 2(1 − 2α)λμ + μ²)^{1/2}.    (10.13)

Two corresponding linearly independent eigenvectors are

    x_1 = (1, (r_1 + μ)/μ)′,  x_2 = (1, (r_2 + μ)/μ)′.    (10.14)

Linear independence is easily checked, since

    det B = det [ 1, 1 ; (r_1 + μ)/μ, (r_2 + μ)/μ ] = (r_2 − r_1)/μ ≠ 0.    (10.15)

The solution to the system (10.9) is given, as in Section 9, by

    p̃(t) = e^{Q̃t} = B diag(e^{r_1t}, e^{r_2t}) B^{−1}
         = (1/(r_2 − r_1)) [ (r_2 + μ)e^{r_1t} − (r_1 + μ)e^{r_2t},    μ{e^{r_2t} − e^{r_1t}} ;
                             (r_1 + μ)(r_2 + μ){e^{r_1t} − e^{r_2t}}/μ,    (r_2 + μ)e^{r_2t} − (r_1 + μ)e^{r_1t} ].
                                                                            (10.16)

Therefore, from (10.16) and (10.8) we obtain

    P_c(τ_a > t) = (1/(r_1 − r_2)) [r_1(1 + r_2/μ)e^{r_2t} − r_2(1 + r_1/μ)e^{r_1t}].    (10.17)

The mean time to molecular fixation is given by

    E τ_a = ∫₀^{∞} P_c(τ_a > t) dt = −(r_1 + r_2)/(r_1r_2) − 1/μ = (μ + λ(1 − α))/(αλμ).    (10.18)
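For a numerical illustration (ours, not part of the text; the rate values are arbitrary), one can evaluate (10.12)-(10.13) and (10.17) and confirm (10.18) by integrating the survival function:

```python
import numpy as np

lam, mu, a = 2.0, 1.0, 0.25     # arrival rate, rejection rate, acceptable fraction

disc = np.sqrt(lam**2 + 2.0 * (1.0 - 2.0 * a) * lam * mu + mu**2)
r1 = 0.5 * (-(lam + mu) + disc)           # (10.12)
r2 = 0.5 * (-(lam + mu) - disc)           # (10.13)

def survival(t):                          # (10.17)
    return (r1 * (1.0 + r2 / mu) * np.exp(r2 * t)
            - r2 * (1.0 + r1 / mu) * np.exp(r1 * t)) / (r1 - r2)

mean_formula = (mu + lam * (1.0 - a)) / (a * lam * mu)    # (10.18); here 5.0
ts = np.linspace(0.0, 200.0, 400_001)
mean_numeric = survival(ts).sum() * (ts[1] - ts[0])       # Riemann sum of the tail
print(mean_formula, mean_numeric)
```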

Example 2. (Continuous-Parameter Markov Branching). Let {X_t} be the
continuous-parameter Markov branching process defined in Example 5.2. The
generator Q = ((q_{ij})) is given by

    q_{ij} = iλf_{j−i+1}  for j ≥ i − 1, j ≠ i, i ≥ 1,
    q_{ii} = −iλ(1 − f_1)  for i ≥ 1,    (10.19)
    q_{ij} = 0  for i = 0, j ≥ 0, or for i ≥ 1, j < i − 1.

The probability ρ of eventual extinction for an initial single particle was
calculated in Section 5. Let

    h(s) = λ(f̂(s) − s),  f̂(s) = Σ_j f_j s^j.    (10.20)

Assume that the offspring distribution {f_j} has finite second moment with mean
μ < 1. Then h′(1) < 0 and h″(1) < ∞. In particular, it follows from Proposition
5.7 that ρ = 1. We will exploit the special Markov branching structure to
compute the tail probability P_1(τ_{{0}} > t) for large values of t under these
assumptions. Let

    κ_1 = h′(1) = λ(μ − 1) < 0,  κ_2 = h″(1) = λ Σ_{j=1}^{∞} j(j − 1)f_j < ∞.    (10.21)


We have

    P_1(τ_{{0}} > t) = 1 − p_{1,0}(t) = 1 − g_t^{(1)}(0),    (10.22)

where g_t^{(1)}(r) = Σ_j p_{1,j}(t)r^j. As in the proof of Proposition 5.7 at (5.27), the
Kolmogorov backward equation for branching transforms according to

    ∂g_t^{(1)}(r)/∂t = h(g_t^{(1)}(r)),    (10.23)

with g_0^{(1)}(r) = r. Thus, for each t > 0, 0 ≤ r < 1,

    ∫_r^{g_t^{(1)}(r)} ds/h(s) = t.    (10.24)

Now,

    h(1) = 0  and  h(s) = h′(1)(s − 1) + ½ψ(s)(s − 1)²  for s ≤ 1,

where ψ(1−) = h″(1) < ∞. Thus, under the assumptions for (10.21),

    1/h(s) = 1/[κ_1(s − 1) + ½ψ(s)(s − 1)²]
           = (1/[κ_1(s − 1)]) · 1/[1 + ψ(s)(s − 1)/(2κ_1)]
           = 1/[κ_1(s − 1)] + φ(s),    (10.25)

where

    φ(s) = −[ψ(s)/(2κ_1²)] / [1 + ψ(s)(s − 1)/(2κ_1)].

In particular, note that

    1/h(s) − 1/[κ_1(s − 1)] = φ(s)

is bounded for 0 ≤ s < 1. Define, for 0 ≤ x < 1,

    H(x) = ∫_1^x [1/h(s) − 1/(κ_1(s − 1))] ds + (1/κ_1) log(1 − x).    (10.26)

Then H′(x) = 1/h(x) > 0 for 0 ≤ x < 1, and (10.24) may be expressed as

    H(g_t^{(1)}(r)) − H(r) = t,  0 ≤ r < 1, t ≥ 0.    (10.27)

Since H is strictly increasing on [0, 1), this can be solved to get

    g_t^{(1)}(r) = H^{−1}(t + H(r)),  0 ≤ r < 1, t ≥ 0.    (10.28)

Now, from (10.26), we have, for x → 1⁻,

    H(x) = ∫_1^x φ(s) ds + (1/κ_1) log(1 − x)
         = −φ(1−)(1 − x) + o(1 − x) + (1/κ_1) log(1 − x)
         = (κ_2/(2κ_1²))(1 − x) + (1/κ_1) log(1 − x) + o(1 − x).    (10.29)

Equivalently, solving this for log(1 − x), we have

    log(1 − x) = κ_1H(x) − (κ_2/(2κ_1))(1 − x) + o(1 − x).    (10.30)

Therefore, for x → 1⁻,

    1 − x = e^{κ_1H(x)} e^{−(κ_2/(2κ_1))(1−x)} (1 + o(1 − x))
          = e^{κ_1H(x)} (1 − (κ_2/(2κ_1))(1 − x) + o(1 − x)).    (10.31)

Now, by (10.22), (10.28), and (10.31), we have, for t → ∞, writing y_t := H^{−1}(t + H(0)),

    P_1(τ_{{0}} > t) = 1 − g_t^{(1)}(0) = 1 − H^{−1}(t + H(0)) = 1 − y_t
                     = e^{κ_1(t + H(0))} (1 − (κ_2/(2κ_1))(1 − y_t) + o(1 − y_t)),    (10.32)

where y_t = H^{−1}(t + H(0)) → 1 as t → ∞ is used in applying (10.31). Thus,

    P_1(τ_{{0}} > t) = e^{κ_1t} e^{κ_1H(0)} (1 − (κ_2/(2κ_1))(1 − g_t^{(1)}(0)) + o(1 − g_t^{(1)}(0)))
                     ~ c e^{−λ(1−μ)t}  as t → ∞,    (10.33)

where

    −κ_1 = λ(1 − μ) > 0,  c = e^{κ_1H(0)} > 0.

By consideration of the p.g.f. k(z, t) of the conditional distribution of X_t given
{X_t > 0, X_0 = 1}, one may obtain the existence of a nondegenerate limit
distribution in the limit as t → ∞ having p.g.f. of the form (Exercise 4*)

    k(z, ∞) := lim_{t→∞} k(z, t) = 1 − exp{κ_1 ∫₀^z ds/h(s)}.    (10.34)
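The tail rate in (10.33) can be checked numerically. The sketch below is ours, not the text's (the binary offspring law is illustrative): it integrates the backward equation (10.23) at r = 0 with a simple Euler step and watches the slope of log(1 − g_t^{(1)}(0)).

```python
import numpy as np

lam = 1.0
f = np.array([0.55, 0.0, 0.45])      # offspring law f_0, f_1, f_2; mean 0.9 < 1
fhat = lambda s: sum(fj * s**j for j, fj in enumerate(f))
h = lambda s: lam * (fhat(s) - s)
kappa1 = lam * (f @ np.arange(len(f)) - 1.0)   # h'(1) = lam*(mu - 1) < 0

dt, T = 1e-4, 30.0                   # Euler integration of dg/dt = h(g), g(0) = 0
g, ts, tails = 0.0, [], []
for n in range(int(T / dt)):
    g += dt * h(g)
    if n % 10_000 == 0:
        ts.append((n + 1) * dt)
        tails.append(1.0 - g)        # = P_1(tau_{0} > t), cf. (10.22)

slopes = np.diff(np.log(tails)) / np.diff(ts)
print("kappa1 =", kappa1, "  late-time slope of log tail =", slopes[-1])
```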

11 CHAPTER APPLICATION: AN INTERACTING SYSTEM
THE SIMPLE SYMMETRIC VOTER MODEL

Although their interpretations vary, the voter model was independently
introduced by P. Clifford and A. Sudbury and by R. Holley and T. Liggett. In
either case, one considers a distribution of ±1's at the points of the d-dimensional
integer lattice ℤ^d at time t = 0. In the demographic interpretation one imagines
the sites of ℤ^d as the locations of a species of one of two types, +1 or −1.
In the course of time, the type of species occupying a particular site can change
owing to invasion by the opposition. The invasion is by occupants of neighboring
sites and occurs at a rate proportional to the number of neighboring sites
maintained by the opposing species. In the sociopolitical version, the values
±1 represent opposing opinions (pro or con) held by occupants of locations
indexed by ℤ^d. As time evolves, a voter may change opinion on the issue. The
rate at which a voter's position on the issue changes is proportional to the
number of neighbors who hold the opposing opinion. This model is also related
to a tumor-growth model introduced by T. Williams and R. Bjerknes which,
in fact, is now often referred to as the biased voter model. If one assumes the
mechanism for cell division to be the same for abnormal as for normal cells in
the Williams-Bjerknes model (i.e., no "carcinogenic advantage" in their model),
then one obtains the voter model discussed here (see theoretical complement
4 for references). While mathematical methods and theories for this and the
more general models of this type are relatively recent, quite a bit can be learned
about the voter model by applying some of the basic results of this chapter.

A sample configuration of ±1-values is represented by points σ = (σ_n: n ∈ ℤ^d)
in the (uncountable) product space S = {−1, 1}^{ℤ^d}. The evolution of
configurations for the voter model will be defined by a Markov process in S.
The desired transition rates are such that in time t to t + Δt, the configuration
may change from σ, by a flip at some site m, to the configuration σ^{(m)}, where
σ_n^{(m)} = −σ_n if n = m and σ_n^{(m)} = σ_n otherwise. This occurs with probability
c_m(σ)Δt + o(Δt), where c_m(σ) is the number of neighbors n of m such that
σ_n ≠ σ_m; two sites that differ by one unit in one (and only one) of the coordinate
directions are defined to be neighbors. More complicated changes involving a
flip at two or more sites are to occur with probability o(Δt) as Δt → 0.
It is instructive to look closely at the flip rates for the one-dimensional case.
For a configuration σ ∈ S a flip at site m occurs at unit rate, c_m(σ) = 1, if the
neighboring sites m − 1 and m + 1 have opposite values (opinions); i.e.,
σ_{m−1}σ_{m+1} = −1. If, on the other hand, the values at m − 1 and m + 1 agree
mutually, but disagree with the value at m, then the flip rate (probability) at
m is proportionately increased to c_m(σ) = 2. If both neighboring values agree
mutually as well as with the value at m, then the flip mechanism is turned
off, c_m(σ) = 0. Thus there is a local tendency toward consensus within the system.
Observe that c_m(σ) may be expressed as a local averaging via

    c_m(σ) = 1 − ½σ_m(σ_{m−1} + σ_{m+1}).    (11.1)

More generally, in d dimensions the rates c_m(σ) may be expressed likewise as

    c_m(σ) = d(1 − σ_m Σ_{n∈ℤ^d} p_{mn}σ_n) = 2d Σ_{n: σ_n ≠ σ_m} p_{mn},    (11.2)

where p = ((p_{mn})) is the transition probability matrix on ℤ^d of the simple
symmetric random walk on ℤ^d, i.e.,

    p_{mn} = 1/(2d)  if m and n are neighbors,
    p_{mn} = 0  otherwise.

This helps explain the terminology simple symmetric voter model.
The first issue one must contend with is the existence of a Markov evolution
{σ(t): t ≥ 0} on the uncountable state space S having the prescribed infinitesimal
transition rates. In general this itself can be a mathematically nontrivial matter
when it comes to describing infinite interacting systems. However, for the special
flip rates desired for the voter model, a relatively simple graphical construction
of the process is possible, called the percolation construction because of the
"fluid flow" interpretation described below.
326 CONTINUOUS-PARAMETER MARKOV CHAINS

[Figure 11.1: Space-time diagram for the percolation construction; vertical time
axes at the sites of ℤ^d (two sites m and n marked), with the initial opinions
(+) and (−) of σ(0) shown along the horizontal space axis.]

Consider a space-time diagram of ℤ^d × [0, ∞) in which a "vertical time
axis" is located at each site m ∈ ℤ^d, and imagine the points of ℤ^d as being laid
out along a "horizontal axis"; see Figure 11.1. The basic idea is this. Imagine
each vertical time axis as a wire that transports negative charge (opinion)
upward from the (initially) negatively charged sites m such that σ_m(0) = −1.
For a certain random length of time (wire), the voter opinion at m will coincide
with that of σ_m(0), but then this influence will be blocked, and the voter will
receive the opinion of a randomly selected neighbor. Place a δ at the blockage
time, and draw an arrow from the wire at the randomly selected site to the
wire at m at this location of the blockage. δ stands for "death of influence from
(directly) below." If the neighbor is conducting negative flow from some point
below to this time, then it will be transferred across the arrow and up. So, while
the source of influence at m might have changed, the opinion has not. However,
if the selected neighbor is not conducting negative flow by this time, then a
portion of the wire at m above this time will be given positive charge (opinion).
Likewise, at certain times positive (plus) portions of the wire at a site can
become negatively charged by the occurrence of an arrow from a randomly
selected neighbor conducting negative flow across the arrow. This "space-time"
flow of influence being qualitatively correct, it is now a matter of selecting the
distribution of occurrence times of δ's and arrows with the Markov property
of the evolution in mind. The specification of the average density of occurrences
of δ's and arrows will then furnish the rate-parameter values.
Let {N_m(t): t ≥ 0}, m ∈ ℤ^d, be independent Poisson processes with intensity
2d. The construction of a Poisson process having right-continuous unit-jump
sample functions is equivalent to the construction of a sequence {T_m(k): k ≥ 1}
of i.i.d. exponential inter-arrival times (see Section 5). The construction of
countably many independent versions is made possible by Kolmogorov's
existence theorem (theoretical complement 6.1 of Chapter I). Let {U_m(k): k ≥ 1},
m ∈ ℤ^d, be independent i.i.d. sequences, independent of the processes

{N_m(t)}, m ∈ ℤ^d, where P(U_m(k) = n) = p_{mn}. At the kth occurrence time
τ_m(k) := T_m(1) + ··· + T_m(k) of the Poisson process at m ∈ ℤ^d, a neighboring
site of m, represented by U_m(k), is randomly selected. One may also consider
independent Poisson processes {N_{mn}(t): t ≥ 0}, m, n ∈ ℤ^d, with respective
intensity parameters 2dp_{mn}, representing those (Poisson) times at m when the
neighboring site n is picked. Notice that 2dp_{mn} is either 1 or 0 according to
whether m and n are neighbors or not.
At the kth event time τ = τ_m(k) of the process {N_m(t)}, let n = U_m(k) be the
corresponding neighbor selected. Place an arrow from (n, τ) to (m, τ), and attach
the symbol δ to the point (m, τ) at the arrowhead. For a given initial
configuration σ(0) = (σ_m(0)), define a flow to be initiated at the sites m such
that σ_m(0) = −1. The flow passes vertically until blocked by the occurrence of
a δ above, and also passes horizontally across arrows in the direction of the
arrows; it is stopped from moving upward at the occurrence of a δ. The
configuration at time t is defined by

    σ_n(t) = −1 if the (negative) flow reaches (n, t) from some initial point (m, 0),
           = +1 otherwise.    (11.3)

More precisely, the flow is defined to reach (n, t) from (m, 0) if there are times
0 = τ_0 < τ_1 < ··· < τ_{k+1} = t and sites m = n_0, n_1, ..., n_k = n such that an
arrow occurs from (n_{i−1}, τ_i) to (n_i, τ_i) and there are no δ's along the half-open
vertical intervals (excluding lower end points) joining (n_i, τ_i) to (n_i, τ_{i+1}),
i = 1, ..., k. In this case (n, t) and (m, 0) are said to be connected by a
continuous-time nearest-neighbor percolation.

The transition from σ_m(t−) = −1 to σ_m(t) = +1 occurs if and only if t = τ_m(k)
is an event time of the process {N_{mn}(t)} for some n such that σ_n(t−) = +1.
Therefore the flipping rate from −1 to +1 is

    Σ_n 2dp_{mn}(1 + σ_n)/2 = d(1 − σ_m Σ_n p_{mn}σ_n) = c_m(σ),  for σ_m = −1.

The transition from σ_m(t−) = +1 to σ_m(t) = −1 occurs if and only if t = τ_m(k) is
an event time of the process {N_{mn}(t)} for some n such that σ_n(t−) = −1.
Therefore the flipping rate from +1 to −1 is

    Σ_n 2dp_{mn}(1 − σ_n)/2 = d(1 − σ_m Σ_n p_{mn}σ_n) = c_m(σ),  for σ_m = +1.

This (noncanonical) construction provides the existence of a Markov process
with the desired infinitesimal rates. To check the Markov property, it is enough
to consider the distribution of σ_n(t + s) for some finitely many sites n ∈ ℤ^d, and
fixed s, t ≥ 0, conditionally given the (entire) configurations σ(u), u ≤ t. Working
backwards through the space-time diagram from σ_{n_1}(t + s), ..., σ_{n_k}(t + s), note

that the value σ_{n_i}(t + s) must coincide with σ_{m_i}(t) for some m_i ∈ ℤ^d, where m_i
depends only on σ(u), u ≥ t, but not on σ(u), u < t. In the case of Figure 11.1
the history of σ_n(t + s) given σ(u), u ≤ t, shows that the voter at n at time t + s
is copying the voter at m at time t (see theoretical complement 1).
The two configurations σ⁺ and σ⁻ representing total consensus, σ_n⁺ = +1
and σ_n⁻ = −1 for all n, are absorbing. In particular this makes each probability
distribution of the form pδ_{σ⁺} + (1 − p)δ_{σ⁻}, for 0 ≤ p ≤ 1, an invariant
(equilibrium) distribution for the system. To understand conditions under which
it is also possible to have other equilibrium distributions, of persistent
disagreement, requires some further analysis.
Probability distributions on S are defined for events belonging to the sigma-field
ℱ generated by events in S that depend on the values of configurations at
finitely many sites. The state space S has a natural metric space structure for
which ℱ coincides with its Borel sigma-field (theoretical complement 2).
Moreover, S is a compact metric space with this metric (theoretical complement
2). Consequently, any collection of probability measures on (S, ℱ) is necessarily
tight (theoretical complement 8.2 of Chapter I). This is quite significant since
it implies that convergence in distribution (weak convergence) coincides with
convergence of finite-dimensional distributions in this setting. Special advantage
is taken of this fact in the use of a method of finite-dimensional (multivariate)
moments to study the long-run behavior of the distribution of the system.
Let f be a real-valued function on S that depends on a fixed set of finitely
many coordinates of configurations σ ∈ S. Then f must be bounded. Define
linear operators T_t (t ≥ 0), for such functions f, by

    T_t f(σ) = E_σ f(σ(t)),  σ ∈ S.    (11.4)

Then,

    T_t f(σ) − f(σ) = E_σ{f(σ(t)) − f(σ)} = Σ_m {f(σ^{(m)}) − f(σ)}c_m(σ)t + o(t),    (11.5)

uniformly as t → 0⁺. Accordingly, for functions f on S depending on finitely
many coordinates we have, for t > 0, σ ∈ S,

    ∂T_t f(σ)/∂t = T_t(Af)(σ) = E_σ{Af(σ(t))},    (11.6)

where, for such functions f,

    Af(σ) = lim_{t→0⁺} [T_t f(σ) − f(σ)]/t = Σ_m {f(σ^{(m)}) − f(σ)}c_m(σ).    (11.7)

Let μ be an arbitrary initial probability distribution on (S, ℱ) and let μ_t
denote the corresponding distribution of σ(t), with μ_0 = μ. The (spatial) block
correlations of the distribution μ_t over the finite set of sites D = {n_1, ..., n_r},
say, are the multivariate moments of σ_{n_1}(t), ..., σ_{n_r}(t) defined by

    φ_μ(t, D) = E_μ{σ_{n_1}(t) × ··· × σ_{n_r}(t)}.    (11.8)

Using standard inclusion-exclusion calculations, one can verify that a probability
distribution on (S, ℱ) is uniquely determined by its block correlations. Let
φ_σ(t, D) = φ_{δ_σ}(t, D). Applying equation (11.6) to the function f(σ) := σ_{n_1} ×
··· × σ_{n_r}, σ ∈ S, we get an equation describing the evolution of the block
correlations of the distributions μ_t as follows:

    ∂φ_σ(t, D)/∂t = E_σ{ Σ_{m∈D} [∏_{k∈D} σ_k^{(m)}(t) − ∏_{k∈D} σ_k(t)] c_m(σ(t)) }
                  = −2 E_σ{ Σ_{m∈D} [∏_{k∈D} σ_k(t)] c_m(σ(t)) }
                  = −2d E_σ{ Σ_{m∈D} [∏_{k∈D} σ_k(t)][1 − σ_m(t) Σ_n p_{mn}σ_n(t)] }
                  = −2d|D| φ_σ(t, D) + 2d Σ_{m∈D} Σ_n p_{mn} φ_σ(t, (D\{m})Δ{n}),    (11.9)

since σ_m²(t) = 1, so that [∏_{k∈D} σ_k(t)]σ_m(t)σ_n(t) = ∏_{k∈(D\{m})Δ{n}} σ_k(t). Here Δ
is the symmetric difference, AΔB := (A ∩ B^c) ∪ (A^c ∩ B), A, B ⊂ ℤ^d, and |D|
is the cardinality of D. Taking expected values, we get from (11.9) that

    ∂φ_μ(t, D)/∂t = −2d|D| φ_μ(t, D) + 2d Σ_{m∈D} Σ_n p_{mn} φ_μ(t, (D\{m})Δ{n})
                  = Σ_{m∈D} Σ_n {φ_μ(t, (D\{m})Δ{n}) − φ_μ(t, D)} 2dp_{mn}.    (11.10)

As a warm-up to the equations (11.10), take D = {m} and suppose that μ is
an invariant (equilibrium) distribution. Then E_μ σ_m = φ_μ(0, {m}) is a harmonic
function for the random walk; i.e., the left-hand side of (11.10) is zero, so that
the equations show that φ_μ has the averaging property (harmonicity)

    φ_μ(0, {m}) = Σ_n φ_μ(0, {n}) p_{mn}.    (11.11)

Now, for the simple symmetric random walk on ℤ^d, one can show that the only
bounded solutions to (11.11) are constants (theoretical complement 3).
Therefore, the distribution of σ_m under the invariant equilibrium distribution
is independent of m in this case. From here on we will restrict our consideration
to the long-time behavior of various translation-invariant initial distributions,
with, say,

    E_μ σ_m = 2ρ − 1,  0 < ρ < 1,  m ∈ ℤ^d.    (11.12)

In particular, we will consider what happens in the long run when voters initially
are, independently, +1 or −1 with probabilities ρ, 1 − ρ, respectively.
The equation (11.9) or (11.10) suggests consideration of the following
finite-particle system. A particle is initially placed at each site of the finite set
D_0 = D. Each of these particles then undergoes a continuous-time random walk
according to p = ((p_{mn})), with exponential holding times with parameter 2d,
independently of the others, until two particles meet. If two of the particles meet,
then they are mutually annihilated. Let D_t denote the collection of sites occupied
by particles at time t > 0. The evolution of the stochastic process {D_t} takes
place in the (denumerable) state space 𝒟^{(d)} consisting of all finite subsets
of ℤ^d. The empty set is absorbing for the process. If D_t ≠ ∅, then in time t to
t + Δt a change to (D_t\{m})Δ{n}, for some m ∈ D_t and n ∈ ℤ^d, occurs with
probability 2dp_{mn}Δt + o(Δt) as Δt → 0⁺. Other types of changes in the
configuration have probability o(Δt) as Δt → 0⁺.
Now consider that for fixed σ ∈ S we have, by (11.9),

    ∂φ_σ(t, D)/∂t = A*φ_σ(t, ·)(D),    (11.13)

together with

    φ_σ(0, D) = ∏_{m∈D} σ_m,    (11.14)

where, for D ∈ 𝒟^{(d)},

    A*f(D) = Σ_{m∈D} Σ_n {f((D\{m})Δ{n}) − f(D)} 2dp_{mn},    (11.15)

for bounded real-valued functions f on 𝒟^{(d)}. Since A* is the infinitesimal
generator for T_t* f(D) = E_D f(D_t), t ≥ 0, D ∈ 𝒟^{(d)}, the annihilating random walk,
the solution to (11.13), (11.14) is given by

    φ_σ(t, D) = E_D(φ_σ(0, D_t)) = E_D{∏_{m∈D_t} σ_m}.    (11.16)

The representation (11.16) is known as the duality equation and is the basis for
the proof of the following major result.
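The duality (11.16) (and the computation in Eq. 11.20 below) reduces voter-model correlations to a finite annihilating random walk, which is simple to simulate. A minimal sketch of ours (the number of replications, the sites, and ρ are illustrative); in d = 1 the two dual walkers meet almost surely, so the pair correlation climbs toward 1, in line with part (a) of the theorem that follows.

```python
import numpy as np

rng = np.random.default_rng(7)

def block_correlation(D0, rho, t, d=1, reps=5_000):
    """phi_mu(t, D0) = E_D0 (2*rho - 1)^{|D_t|} via duality: simulate annihilating
    rate-2d random walks started from the finite set D0 up to time t."""
    vals = np.empty(reps)
    for r in range(reps):
        parts = {tuple(p) for p in D0}            # occupied sites
        s = 0.0
        while parts:
            s += rng.exponential(1.0 / (2 * d * len(parts)))
            if s >= t:
                break
            m = list(parts)[rng.integers(len(parts))]     # uniformly chosen particle
            coord, step = rng.integers(d), (1 if rng.random() < 0.5 else -1)
            n = list(m); n[coord] += step; n = tuple(n)
            parts.remove(m)
            if n in parts:
                parts.remove(n)                   # the two particles annihilate
            else:
                parts.add(n)
        vals[r] = (2.0 * rho - 1.0) ** len(parts)
    return vals.mean()

# d = 1: the dual walkers meet a.s., so the pair correlation increases toward 1
for t in (1.0, 10.0, 100.0):
    print(t, block_correlation([(0,), (3,)], rho=0.7, t=t))
```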

Theorem 11.1. Let p = ((p_{mn})) be the transition probability matrix of the simple
symmetric random walk on ℤ^d associated with the simple symmetric voter
model on ℤ^d. For an initial probability distribution μ satisfying (11.12) we have
the following:
(a) If d ≤ 2, then μ_t converges in distribution to ρδ_{σ⁺} + (1 − ρ)δ_{σ⁻} as t → ∞.
(b) If d ≥ 3, then for each ρ ∈ (0, 1) there is a distinct translation-invariant
equilibrium distribution ν^{(ρ)} on (S, ℱ), which is not a mixture of δ_{σ⁺}
and δ_{σ⁻}, such that E_{ν^{(ρ)}} σ_m = 2ρ − 1. Moreover, if the initial distribution
μ is that of independent ±1-valued Bernoulli random variables with
probability parameter determined by (11.12), then μ_t converges in
distribution to ν^{(ρ)} as t → ∞.
Proof. To prove (a), first consider that

    E_μ σ_m(t) = φ_μ(t, {m}) = ∫_S φ_σ(t, {m}) μ(dσ)
               = ∫_S [E_{{m}} ∏_{k∈D_t} σ_k] μ(dσ)
               = Σ_k p_{mk}(t) E_μ σ_k = 2ρ − 1  for all t ≥ 0,    (11.17)

where p_{mk}(t) is the transition law for the continuous-time random walk. Also,
for distinct sites n and m in ℤ^d we have

    φ_μ(t, {n, m}) = P_μ(σ_m(t) = σ_n(t)) − P_μ(σ_m(t) ≠ σ_n(t))
                   = 1 − 2P_μ(σ_m(t) ≠ σ_n(t)).    (11.18)

Therefore, it is enough to show that φ_μ(t, {n, m}) → 1 as t → ∞ to prove (a).
Now since the difference of the two independent simple symmetric random
walks starting at m and n evolves as a (symmetrized) continuous-parameter
simple symmetric random walk, it follows from recurrence when d ≤ 2 that the
difference reaches 0 a.s. as t → ∞; i.e., the particles will eventually meet. Using
the duality relation and the Lebesgue Dominated Convergence Theorem we have,
therefore,

    lim_{t→∞} φ_μ(t, {n, m}) = lim_{t→∞} ∫_S E_{{n,m}}{∏_{k∈D_t} σ_k} μ(dσ) = 1.    (11.19)
To prove (b), on the other hand, take for μ the Bernoulli product distribution
of independent values subject to (11.12) and consider the duality relation

    φ_μ(t, D) = E_μ E_D{∏_{m∈D_t} σ_m} = E_D E_μ{∏_{m∈D_t} σ_m} = E_D ∏_{m∈D_t} E_μ σ_m
              = E_D (2ρ − 1)^{|D_t|},    (11.20)

where |D_t| denotes the cardinality of D_t. Now since |D_t| is a.s. nonincreasing, the
limit, denoted |D_∞|, exists a.s. and is positive with nonzero probability, by
transience. The limit distribution ν^{(ρ)} is defined through its block correlations
accordingly. That is,

    ∫_S {∏_{m∈D} σ_m} ν^{(ρ)}(dσ) = E_D (2ρ − 1)^{|D_∞|}.    (11.21)
Certainly it follows that ν^{(ρ)} is translation-invariant and, being a time-asymptotic
distribution of the evolution, one expects it to be invariant under the (further)
evolution. More precisely, writing ν_t^{(ρ)} for the distribution of σ(t) when σ(0)
has distribution ν^{(ρ)}, for any (bounded) continuous real-valued function
f on S we have, by the Chapman-Kolmogorov equations (semigroup property),
that

    ∫_S f(η) ν_t^{(ρ)}(dη) = ∫_S ∫_S f(η) p(t; σ, dη) ν^{(ρ)}(dσ)
                          = lim_{s→∞} ∫_S ∫_S f(η) p(t; σ, dη) μ_s(dσ),    (11.22)

since for fixed (continuous) f and t > 0, the (bounded) function
σ → ∫_S f(η)p(t; σ, dη) is continuous (theoretical complement) and, as noted
earlier, by compactness of S, μ_s converges to ν^{(ρ)} in distribution (i.e., weak
convergence) as s → ∞. Now, from (11.22) and the Chapman-Kolmogorov
equations again, one has

    ∫_S f(η) ν_t^{(ρ)}(dη) = lim_{s→∞} ∫_S f(η) μ_{t+s}(dη) = ∫_S f(η) ν^{(ρ)}(dη).    (11.23)

Thus, since the integrals of continuous functions on S with respect to the
distributions ν_t^{(ρ)}(dη) and ν^{(ρ)}(dη) coincide, the two measures must be the same
(see theoretical complement 8.6 of Chapter I). ∎

The property that, for each bounded continuous function $f$ on $S$, the function $\sigma \mapsto E_\sigma f(\sigma(t)) = \int_S f(\eta)\, p(t; \sigma, d\eta)$ is also a bounded continuous function, is called the Feller property. As illustrated in the above proof (Eq. 11.22), this property is essential for confirming one's intuition about invariance properties of long-time limits under weak convergence. The other important role played by topology in the above was in the use of compactness to, in fact, get weak convergence from finite-dimensional calculations.
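The meeting-time mechanism behind part (a) is easy to visualize by simulation. The following Python sketch (an illustration of ours, not part of the text; the unit jump rates, the sites $m = 0$, $n = 2$, and the parameter values are assumptions) estimates $\varphi_\mu(t, \{m, n\}) = E(2p - 1)^{|D_t|}$ for the $d = 1$ annihilating dual and exhibits its approach to 1.

```python
import random

random.seed(1)

def meet_prob(m, n, t, trials=20000):
    """Estimate P(the two dual walks have met by time t): two independent
    rate-1 simple symmetric walks annihilate on collision, so |D_t| is 0 or 2."""
    met = 0
    for _ in range(trials):
        x, y, clock = m, n, 0.0
        while True:
            clock += random.expovariate(2.0)   # first of two rate-1 clocks
            if clock > t:
                break
            if random.random() < 0.5:
                x += random.choice((-1, 1))
            else:
                y += random.choice((-1, 1))
            if x == y:                         # annihilation: D_t becomes empty
                met += 1
                break
    return met / trials

p, t = 0.7, 30.0                               # Bernoulli(+1) parameter, horizon
q_meet = meet_prob(0, 2, t)
phi = q_meet + (1 - q_meet) * (2 * p - 1) ** 2  # E(2p-1)^{|D_t|}
print(phi)   # approaches 1 as t grows, illustrating consensus for d <= 2
```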

EXERCISES

Exercises for Section IV.1


1. Show that in order to establish the Markov property it is enough to check Eq. 1.1
for only one "future" time point t, for arbitrary t > s.
2. Show that a continuous-parameter process with independent increments has the Markov property. Show also that such a process has a homogeneous transition law if and only if, for every $h > 0$, the distribution of $X_{t+h} - X_t$ is the same for all $t \ge 0$.
3. Show that, given a Poisson process $\{X_t\}$ as in Example 1 with $\int_0^\infty p(u)\,du = \infty$, there exists an increasing transformation $\varphi: [0, \infty) \to [0, \infty)$ such that the process $\{X_{\varphi(t)}\}$ is homogeneous with parameter $\lambda = 1$.

4. Prove that the compound Poisson process (Example 2) has independent increments.
5. Consider a compound Poisson process $\{X_t\}$ with state space $\mathbb{R}^1$ and an arbitrary jump distribution $\mu(dx)$ on $\mathbb{R}^1$.
(i) Show that $\{X_t\}$ is a Markov process and compute its transition probability $p(t; x, B) := P(X_t \in B \mid X_0 = x)$.
(ii) Compute the characteristic function of $X_t$.
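For Exercise 5(ii) the characteristic function has the closed form $Ee^{i\xi X_t} = \exp\{\lambda t(\hat\mu(\xi) - 1)\}$, where $\hat\mu$ is the characteristic function of the jump distribution. The following sketch (our own illustration; the standard normal jump distribution and the parameter values are assumptions) checks this by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, xi = 2.0, 1.5, 0.7          # illustrative rate, horizon, frequency

# Simulate X_t = sum of N_t i.i.d. N(0,1) jumps, N_t ~ Poisson(lam * t).
n_jumps = rng.poisson(lam * t, size=20000)
samples = np.array([rng.normal(size=n).sum() for n in n_jumps])

empirical = np.exp(1j * xi * samples).mean()
theory = np.exp(lam * t * (np.exp(-xi**2 / 2) - 1))   # jump c.f. is e^{-xi^2/2}
print(empirical, theory)            # the two should agree closely
```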
6. (Doubly Stochastic Poisson or Cox Process) Suppose that the parameter $\Lambda$ (mean rate) of a homogeneous Poisson process $\{X_t\}$ is random with distribution $\mu(d\lambda) = f(\lambda)\,d\lambda$ on $(0, \infty)$. In other words, conditionally given $\Lambda = \lambda_0$, $\{X_t\}$ is a Poisson process with parameter $\lambda_0$.
(i) Show that $\{X_t\}$ is not a process with independent increments.
(ii) Show that $\{X_t\}$ is (generally) not a Markov process.
(iii) Compute the distribution of $X_t$ for arbitrary but fixed $t > 0$.
(iv) Compute $\operatorname{Cov}(X_s, X_t)$.

7. Generalize Exercise 6 to compound Poisson processes.


8. The lifetimes of elements of a certain type are independent and exponentially distributed with parameter $\alpha > 0$. At time $t = 0$ there are $X_0 = n$ living elements present. Let $X_t$ denote the number alive at time $t$. Show that $\{X_t\}$ is a Markov process and calculate its transition probabilities.
9. Let $\{X_t: t \ge 0\}$ be a process starting at $x$ with (stationary) independent increments, and $EX_t^2 < \infty$. Assuming $EX_t$ and $EX_t^2$ are continuous, prove the following.
(i) $EX_t = mt + x$, $\operatorname{Var} X_t = \sigma^2 t$ for some constants $m$ and $\sigma^2$.
(ii) $(X_t - mt)/\sqrt{t}$ converges in distribution to the Gaussian distribution with mean 0 and variance $\sigma^2$, as $t \to \infty$.
10. (i) Show that the sum of a finite number of independent real-valued stochastic processes, each having independent increments, is also a process with independent increments.
(ii) Let $\{N_t^{(i)}\}$ $(i = 1, 2, \ldots, k)$ be independent Poisson processes with mean parameter $\lambda_i$ $(i = 1, 2, \ldots, k)$. If $c_1, c_2, \ldots, c_k$ are arbitrary distinct positive constants, show that $\{\sum_i c_i N_t^{(i)}\}$ is a compound Poisson process and compute its jump distribution and the (Poisson) mean rate of occurrences.
(*iii) Prove that every compound Poisson process is a limit in distribution of superpositions of independent Poisson processes as described in (ii). [Hint: Compute the characteristic function of respective increments.]

Exercises for Section IV.2


1. Check the Kolmogorov consistency condition for $P_\mu$ defined by Eq. 2.2, assuming the Chapman–Kolmogorov condition (2.5).
2. There are $n$ identical components in a system that operate independently. When a component fails, it undergoes repair, and after repair is placed back into the system. Assume that for a component the operating times between successive failures are i.i.d. exponential with mean $1/\lambda$, and that these are independent of the successive repair times, which are i.i.d. exponential with mean $1/\mu$. The state of the system is the number of components in operation. Determine the infinitesimal generator of this Markov process.
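As a concrete illustration of Exercise 2 (ours, not the text's; the failure rate $\lambda$, repair rate $\mu$, and $n = 3$ are assumed values), the generator can be assembled mechanically: from state $i$ the chain moves to $i - 1$ at rate $i\lambda$ and to $i + 1$ at rate $(n - i)\mu$.

```python
import numpy as np

def repair_generator(n, lam, mu):
    """Infinitesimal generator for the n-component repair model: state i is
    the number of components in operation; a working component fails at rate
    lam, a failed one is repaired at rate mu."""
    Q = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        if i > 0:
            Q[i, i - 1] = i * lam          # one of i working units fails
        if i < n:
            Q[i, i + 1] = (n - i) * mu     # one of n - i failed units returns
        Q[i, i] = -Q[i].sum()              # rows of a generator sum to zero
    return Q

print(repair_generator(3, 1.0, 2.0))
```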
3. For the process $\{X_t\}$ in Exercise 1.8, give the corresponding infinitesimal generator and Kolmogorov's backward and forward equations.
4. Let $\{X_t\}$ be a birth–death process on $S = \{0, 1, 2, \ldots\}$ with $q_{i,i+1} = i\beta$, $q_{i,i-1} = i\delta$, $i \ge 0$, where $\beta, \delta > 0$, $q_{ij} = 0$ if $|j - i| > 1$, $q_{01} = 0$. Let

$$m_i(t) = E_i X_t, \qquad s_i(t) = E_i X_t^2.$$

(i) Use the forward equation to show $m_i'(t) = (\beta - \delta)m_i(t)$, $m_i(0) = i$.
(ii) Show $m_i(t) = i e^{(\beta - \delta)t}$.
(iii) Show $s_i'(t) = 2(\beta - \delta)s_i(t) + (\beta + \delta)m_i(t)$.
(iv) Show that

$$s_i(t) = \begin{cases} i e^{2(\beta-\delta)t}\Big[i + \dfrac{\beta+\delta}{\beta-\delta}\big(1 - e^{-(\beta-\delta)t}\big)\Big], & \text{if } \beta \ne \delta, \\[2mm] i(i + 2\beta t), & \text{if } \beta = \delta. \end{cases}$$

(v) Calculate $\operatorname{Var}_i X_t$.
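The moment formulas of Exercise 4 can be spot-checked by simulating the embedded jump chain with exponential holding times (a Gillespie-type scheme). The sketch below is our own illustration; the parameter values are assumptions.

```python
import math
import random

def sample_bd(i, beta, delta, t):
    """One Gillespie path of the linear birth-death chain with rates
    q_{i,i+1} = i*beta, q_{i,i-1} = i*delta, observed at time t."""
    x, clock = i, 0.0
    while x > 0:
        clock += random.expovariate(x * (beta + delta))
        if clock > t:
            break
        x += 1 if random.random() < beta / (beta + delta) else -1
    return x

random.seed(0)
i0, beta, delta, t = 5, 1.0, 0.8, 2.0
mean = sum(sample_bd(i0, beta, delta, t) for _ in range(20000)) / 20000
print(mean, i0 * math.exp((beta - delta) * t))   # should nearly agree
```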

*5. Consider a compound Poisson process $\{X_t\}$ on $\mathbb{R}^1$ with an arbitrary jump distribution $\mu(dx)$.
(i) For a given bounded continuous function $f$ on $\mathbb{R}^1$ compute

$$u(t, x) := E(f(X_t) \mid X_0 = x),$$

and show that

$$\frac{\partial}{\partial t} u(t, x) = -\lambda u(t, x) + \lambda \int u(t, x + y)\,\mu(dy) = \lambda \int \big(u(t, x + y) - u(t, x)\big)\,\mu(dy).$$

(ii) Show that the limit of the last expression is

$$\lim_{t \downarrow 0} \frac{\partial}{\partial t} u(t, x) = (Qf)(x),$$

where $Q$ is the integral operator $(Qf)(x) = \lambda \int (f(x + y) - f(x))\,\mu(dy)$.
(iii) Write the (backward) equation in (i) above for $u(t, x)$ in terms of $Q$.
6. Let $\{Y_t\}$ be a nonhomogeneous Markov chain with transition probabilities $p_{ij}(s, t) = P(Y_t = j \mid Y_s = i)$, continuous for $0 \le s \le t$, with $p_{ij}(s, s) = \delta_{ij}$, $i, j \in S$, and such that

$$\lim_{t \downarrow s} \frac{p_{ij}(s, t) - \delta_{ij}}{t - s} = q_{ij}(s)$$

exists and is finite for each $s$.
(i) Show that the Chapman–Kolmogorov equations take the form

$$p_{ik}(s, t) = \sum_j p_{ij}(s, r)\,p_{jk}(r, t), \qquad s \le r \le t.$$

(ii) For finite $S$ show that the backward and forward equations, respectively, take the forms below:

$$\text{(backward)} \quad \frac{\partial p_{ik}(s, t)}{\partial s} = -\sum_j q_{ij}(s)\,p_{jk}(s, t),$$

$$\text{(forward)} \quad \frac{\partial p_{ik}(s, t)}{\partial t} = \sum_j p_{ij}(s, t)\,q_{jk}(t).$$

7. Consider a collection of particles that act independently in giving rise to succeeding generations of particles. Suppose that each particle, from the time it appears, waits a random length of time having an exponential distribution with parameter $\lambda$ and then either splits into two particles with probability $p$ or disappears with probability $q = 1 - p$. Find the generator of this Markov process on the state space $S = \{0, 1, 2, 3, \ldots\}$, the state being the number of particles present.
8. In Exercise 7 above, suppose that new particles immigrate into the system (independently of particles present) at random times that form a Poisson process with parameter $\gamma$, and then give rise to succeeding generations as described in Exercise 7. Compute the generator of this Markov process.
*9. Let $\{X_t\}$ be a Markov process with state space an arbitrary measurable space $(S, \mathscr{S})$. Let $B(S)$ be the space of (Borel-measurable) bounded real-valued functions on $S$ with the uniform norm, $\|f\| = \sup_{x \in S} |f(x)|$, $f \in B(S)$. Then $(B(S), \|\cdot\|)$ is a Banach space. Define $T_t f(x) = E_x f(X_t)$, $t \ge 0$, $x \in S$, for $f \in B(S)$. Also, for $f \in B(S)$ such that $\lim_{t \downarrow 0}\{(T_t f - f)/t\}$ exists in $(B(S), \|\cdot\|)$, say that $f$ belongs to the domain of $Q$ and define $Qf = \lim_{t \downarrow 0}\{(T_t f - f)/t\}$. Show that if the domain of $Q$ is all of $B(S)$, then $Q$ must be a bounded linear operator on $(B(S), \|\cdot\|)$; i.e., $Q$ is continuous on $(B(S), \|\cdot\|)$. [Hint: $T_t f(x) = f(x) + \int_0^t T_s Qf(x)\,ds$, $t \ge 0$, $x \in S$, $f \in B(S)$. Apply the closed graph theorem from functional analysis.]

10. (Continuous-Parameter Pólya Process) Fix $\tau > 0$ and consider a box containing $r = r(\tau)$ red balls and $b = b(\tau)$ black balls. Every $\tau$ units of time a ball is randomly selected, its color is noted, and together with $c = c(\tau)$ balls of the same color it is placed in the box. Let $S_{n\tau}$ denote the number of red balls sampled by the time $n\tau$, $n = 0, 1, 2, \ldots$, $S_0 = 0$. As in Eq. 3.14 of Chapter II, $\{S_{n\tau}\}$ is a discrete-parameter nonhomogeneous Markov chain with one-step transition probabilities

$$p_{i,i}(n\tau, (n+1)\tau) = 1 - p_{i,i+1}(n\tau, (n+1)\tau),$$

and

$$p_{i,i+1}(n\tau, (n+1)\tau) = P(S_{(n+1)\tau} = i + 1 \mid S_{n\tau} = i) = \frac{r + ci}{r + b + nc}, \qquad i = 0, 1, \ldots.$$

Let $p = p(\tau) = r/(r + b)$, $\gamma = \gamma(\tau) = c/(r + b)$, $t = n\tau$. Then the probability of a transition from $i$ to $i + 1$ in time $t$ to $t + \tau$ is given by

$$p_{i,i+1}(t, t + \tau) = \frac{p + \gamma i}{1 + n\gamma}.$$

Note that $p = r/(r + b)$ is the probability of selecting a red ball at the $n$th trial and $np = (p/\tau)t$ is the expected number of red balls sampled by time $t = n\tau$; i.e., $p/\tau$ is the mean sampling rate of red balls. Suppose that $p/\tau = p(\tau)/\tau \to 1$ and $\gamma/\tau = \gamma(\tau)/\tau \to \gamma_0 > 0$ as $\tau \to 0$.
(i) For fixed $\tau > 0$, use a combinatorial argument to show that the distribution of $S_{n\tau}$ is given by

$$P(S_{n\tau} = j) = \binom{n}{j}\,\frac{b(b+c)\cdots(b + (n-j)c - c)\; r(r+c)\cdots(r + jc - c)}{(b+r)(b+r+c)\cdots(b+r+nc-c)}.$$

(ii) Show that in the limit as $\tau \to 0$, the distribution in (i) converges to

$$f_j(t) = \frac{((j-1)\gamma_0 + 1)\cdots(2\gamma_0 + 1)(\gamma_0 + 1)}{j!}\, t^j (1 + \gamma_0 t)^{-j - 1/\gamma_0}, \qquad j = 0, 1, 2, \ldots,$$

which is a negative binomial p.m.f.
(iii) Show that

$$\sum_{j=0}^\infty j f_j(t) = t \quad\text{and}\quad \sum_{j=0}^\infty (j - t)^2 f_j(t) = \gamma_0 t^2 + t.$$

(iv) The continuous-parameter Pólya process is often defined as a nonhomogeneous (pure birth) Markov chain $\{Y_t\}$ starting at 0 on $S = \{0, 1, 2, \ldots\}$ with transition probabilities denoted by $P(Y_t = j \mid Y_s = i) = p_{ij}(s, t)$, $j, i = 0, 1, 2, \ldots$, $0 \le s \le t$, and satisfying

$$p_{ij}(t, t + \Delta t) = \delta_{ij} + q_{ij}(t)\,\Delta t + o(\Delta t) \qquad \text{as } \Delta t \to 0,$$

where

$$q_{i,i+1}(t) = \frac{1 + i\gamma_0}{1 + t\gamma_0} = -q_{ii}(t), \qquad q_{ij}(t) = 0, \quad j \ne i, i + 1.$$

Check that $P(S_{(n+1)\tau} = j \mid S_{n\tau} = i) = p_{ij}(t, t + \tau) + o(\tau)$ as $\tau \to 0$, $n \to \infty$, $t = n\tau$.

Exercises for Section IV.3


1. Prove the inequality (3.12).
2. Prove that the solution p(t) = exp {tQ} is the unique bounded solution of
Kolmogorov's backward (as well as forward) equations, under the hypothesis (3.9).
3. Solve the forward equations directly in Example 1.
4. Solve the Kolmogorov equations for a Markov process with three states and infinitesimal generator

$$Q = \begin{pmatrix} -\lambda & \lambda & 0 \\ 0 & -\mu & \mu \\ 0 & 0 & 0 \end{pmatrix},$$

where $\lambda$, $\mu$ are positive.
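Since $p(t) = e^{tQ}$ under bounded rates, Exercise 4 can be checked numerically with a matrix exponential. The sketch below is illustrative and assumes the generator as reconstructed above, with hypothetical values for $\lambda$ and $\mu$.

```python
import numpy as np
from scipy.linalg import expm

lam, mu = 1.0, 0.5                      # illustrative positive rates
Q = np.array([[-lam, lam, 0.0],
              [0.0, -mu,  mu],
              [0.0, 0.0, 0.0]])

for t in (0.5, 1.0, 2.0):
    p = expm(t * Q)                     # p(t) solves p' = Qp = pQ, p(0) = I
    print(t, p[0])                      # distribution at time t from state 1
# Row sums are 1, and one can check the closed form
# p_{13}(t) = 1 - (mu*e^{-lam t} - lam*e^{-mu t})/(mu - lam).
```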


5. Let $Q = ((q_{ij}))$ be such that $\sup_i |q_{ii}| < \infty$. Show that all elements of $e^{Qt}$ are nonnegative for all $t \ge 0$ if $q_{ij} \ge 0$ for $i \ne j$. [Hint: Consider

$$r_{ij}(t) = \delta_{ij} + q_{ij}t + q_{ij}^{(2)}\frac{t^2}{2!} + \cdots$$

for small $t > 0$, and use the fact that the product of matrices with all entries nonnegative has itself all entries nonnegative, together with the property $e^{tQ} = (e^{(t/n)Q})^n$.]

6. Each organism of a system, independently of the others, lives for an exponentially distributed time with parameter $\lambda$ and is then replaced by two replicas that independently undergo the same replication process. Let $X_t$ denote the total number of organisms at time $t$. Show that the bounded rates condition is not satisfied for the generator $Q$ of $\{X_t\}$.
7. Calculate the transition probabilities for Exercise 2.2 in the case $n = 2$.
8. In radioactive transformations an unstable atom randomly disintegrates to an unstable atom of a new chemical and physical structure. Successive atomic states in the chain are labeled $0, 1, 2, \ldots, N$. Atoms in state $i$ are transformed to state $i + 1$, $0 \le i < N$, at the rate $q_{i,i+1} = \lambda_i$, where $\lambda_N = 0$. Calculate the distribution of the state at time $t$ of an atom initially in state 0.

Exercises for Section IV.4


1. Calculate the successive approximations $p^{(n)}(t)$ in the case of the (homogeneous) Poisson process with parameter $\lambda > 0$.
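The successive approximation scheme of Eq. 4.7 is easy to carry out numerically. The following sketch (ours; the truncation of the state space and the grid quadrature are assumptions made for illustration) iterates $p^{(n+1)}(t) = I + \int_0^t Q\,p^{(n)}(s)\,ds$ for the Poisson generator and recovers the partial sums of $e^{-\lambda t}(\lambda t)^k/k!$.

```python
import numpy as np

lam, n_states, t = 1.0, 8, 1.0
Q = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    Q[i, i], Q[i, i + 1] = -lam, lam    # Poisson generator (truncated)

grid = np.linspace(0.0, t, 201)
dt = grid[1] - grid[0]
I = np.eye(n_states)
p = np.array([I for _ in grid])         # p^{(0)}(s) = I on the grid
for _ in range(10):                     # Picard iterations
    integrand = np.matmul(Q, p)
    cums = np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dt, axis=0)
    p = I + np.concatenate(([np.zeros((n_states, n_states))], cums))
print(p[-1][0, :4])                     # approx e^{-lam t} (lam t)^k / k!
```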

2. Write out the third iterate (of Eq. 4.7) $p_{ij}^{(3)}$ for the case $S = \{1, 2, 3\}$, $q_{11} = -\lambda$, $q_{12} = \lambda$, $q_{22} = -\mu$, $q_{23} = \mu$, $q_{33} = -\gamma$, $q_{32} = \gamma$, $q_{ij} = 0$ otherwise for $i, j \in S$.
3. (A Pure Death Process) Let $S = \{0, 1, 2, \ldots\}$ and let $q_{i,i-1} = \delta > 0$, $q_{ii} = -\delta$, $i \ge 1$, $q_{ij} = 0$ otherwise.
(i) Calculate $p_{ij}(t)$, $t \ge 0$, using successive approximations.
(ii) Calculate $E_i X_t$ and $\operatorname{Var}_i X_t$.


4. Calculate the approximations $p^{(2)}(t)$ for Exercises 2.7 and 2.8.
5. Show that the minimal solution $p(t)$ also satisfies the forward equation. [Hint: Consider the integral version.]
6. Show that the minimal solution $p(t)$, $t \ge 0$, is a semigroup (i.e., that it satisfies the Chapman–Kolmogorov equations) by the following procedure.
(i) Let $f$ be a continuous bounded function on $[0, \infty)$ with Laplace transform

$$\hat f(\nu) = \int_0^\infty e^{-\nu t} f(t)\,dt.$$

Define $F(s, t) = f(s + t)$, $s, t \ge 0$, and let

$$\hat F(\nu, \mu) = \int_0^\infty\!\!\int_0^\infty e^{-\nu s - \mu t} F(s, t)\,ds\,dt \qquad (\mu > 0, \ \nu > 0)$$

be the bivariate Laplace transform of $F$. Show that $\hat F$ satisfies the resolvent equation:

$$\hat F(\nu, \mu) = \frac{\hat f(\nu) - \hat f(\mu)}{\mu - \nu}, \qquad \nu \ne \mu.$$

[Hint: Write $u = s + t$, $v = s - t$, $0 \le u < \infty$, $-u < v < u$, in the integral defining $\hat F$.]
(ii) Let $\hat p(\nu) = ((\hat p_{ij}(\nu)))$ denote the matrix of transformed entries of the minimal solution. Show that $p(s + t) = p(s)p(t)$, $s, t \ge 0$, if and only if

$$\frac{\hat p(\nu) - \hat p(\mu)}{\mu - \nu} = \hat p(\mu)\hat p(\nu), \qquad \nu \ne \mu.$$

(iii) Show that the backward and forward equations (see also Exercise 5) transform, respectively, as

$$[B]:\quad \nu\hat p(\nu) = I + Q\hat p(\nu), \qquad [F]:\quad \nu\hat p(\nu) = I + \hat p(\nu)Q.$$

(iv) Use the backward equation to show

$$\mu\hat p(\nu) = A + Q\hat p(\nu)$$

for $A = I + (\mu - \nu)\hat p(\nu)$. Use nonnegativity of $\hat p(\nu)$ and $A$ for $\mu > \nu$ to show by induction, using (4.7), that $\hat p(\nu) \ge \hat p^{(n)}(\mu)A$ for $n = 0, 1, 2, \ldots$, and therefore

$$\hat p(\nu) \ge \hat p(\mu)A = \hat p(\mu) + (\mu - \nu)\hat p(\mu)\hat p(\nu), \qquad \mu > \nu.$$

(v) Use [F] to prove the reverse inequality and hence (ii). [Hint: Check that $\hat r(\nu) = \hat p(\mu) + (\mu - \nu)\hat p(\mu)\hat p(\nu)$ solves [F] and use minimality.]
7. Compute $p_{ik}(t)$ for all $i, k$ in Example 1.

Exercises for Section IV.5


1. Show that the holding time $T_n$ $(n \ge 1)$ is not a stopping time.
2. Let $A(\Delta)$, $B(\Delta)$, $B(0)$ denote the events $\{s < T_0 \le s + \Delta\}$, $\{X_{s+\Delta} = j\}$, $\{X_{T_0} = j\}$, respectively.
(i) Prove that $P_i(A(\Delta) \cap B(\Delta) \cap B^c(0)) = o(\Delta)$ and $P_i(A(\Delta) \cap B^c(\Delta) \cap B(0)) = o(\Delta)$ as $\Delta \downarrow 0$ (for $i \ne j$). [Hint: Apply the strong Markov property with respect to $T_0$ in order to show $P_i(s < T_0 < T_0 + T_1 \le s + \Delta) = o(\Delta)$.]
(ii) Use (i) to derive (5.6) from (5.5).
3. Let $U_1, U_2, \ldots, U_n$ be i.i.d. uniform on $[0, t]$, and order them as $U_{(1)} < U_{(2)} < \cdots < U_{(n)}$. Define $D_j = U_{(j)} - U_{(j-1)}$ $(j = 1, 2, \ldots, n + 1)$ with $U_{(0)} = 0$, $U_{(n+1)} = t$.
(i) Show that the joint distribution of $D_1, \ldots, D_{n+1}$ is the same as the conditional distribution of $n + 1$ i.i.d. exponential random variables $Y_1, Y_2, \ldots, Y_{n+1}$, given $Y_1 + \cdots + Y_{n+1} = t$.
(ii) Show that $D_k/t$ has p.d.f. given by $n(1 - x)^{n-1}$, $0 < x < 1$, $k = 1, \ldots, n + 1$.
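The spacing distribution in Exercise 3(ii) is easily confirmed empirically. The sketch below (our illustration; the parameter values are assumptions) checks $P(D_k/t > x) = (1 - x)^n$ simultaneously for every $k$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, t, trials = 5, 2.0, 200_000

u = np.sort(rng.uniform(0.0, t, size=(trials, n)), axis=1)
gaps = np.diff(np.concatenate([np.zeros((trials, 1)), u,
                               np.full((trials, 1), t)], axis=1), axis=1)

# Each normalized gap D_k / t has p.d.f. n(1 - x)^{n-1} on (0, 1), i.e.
# P(D_k / t > x) = (1 - x)^n; spot-check at x = 0.3 for every k.
x = 0.3
print((gaps / t > x).mean(axis=0), (1 - x) ** n)
```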
4. Derive an extension of Proposition 5.6 for a nonhomogeneous Poisson process with
intensity function p(t) (see Example 1.1).
5. Suppose that a pure birth process on $S = \{0, 1, 2, \ldots\}$ has infinitesimal parameters $q_{kk} = -\lambda_k$, $q_{k,k+1} = \lambda_k$.
(i) If $\sum \lambda_k^{-1} = \infty$ and $\sum \lambda_k^{-2} < \infty$, then show that the variance of $\tau_n = T_0 + T_1 + \cdots + T_{n-1}$ (the time to reach $n$, starting from 0) goes to a finite limit as $n \to \infty$. Use Kolmogorov's zero–one law (see theoretical complements 1.1, Chapter I) to show that as $n \to \infty$, $\tau_n - \sum_{k=0}^{n-1}\lambda_k^{-1}$ converges (for all sample paths, outside a set of probability zero) to a finite random variable $\eta$.
(ii) If $\sum \lambda_k^{-1} = \infty = \sum \lambda_k^{-2}$, but $\sum \lambda_k^{-3} < \infty$, show that

$$\frac{\tau_n - \sum_{k=0}^{n-1}\lambda_k^{-1}}{\big(\sum_{k=0}^{n-1}\lambda_k^{-2}\big)^{1/2}}$$

is asymptotically (as $n \to \infty$) Gaussian with mean 0 and variance 1. [Hint: Use Liapounov's central limit theorem, Chapter 0.]
6. Messages arrive at a telegraph office according to a Poisson process with mean rate
of occurrence of 4 messages per hour.
(i) What is the probability that no message will have arrived during the morning
hours (8 to 12)?
(ii) What is the distribution of the time at which the second afternoon message arrives?

7. Let $\{X_t\}$ and $\{Y_t\}$ be independent Poisson processes with parameters $\lambda$ and $\mu$, respectively. Show that the probability of $n$ occurrences of $\{Y_t\}$ within the time interval from the first to the $(r + 1)$st occurrence of $\{X_t\}$ is given by

$$\binom{n + r - 1}{n}\left(\frac{\lambda}{\lambda + \mu}\right)^r \left(\frac{\mu}{\lambda + \mu}\right)^n, \qquad n = 0, 1, 2, \ldots.$$

8. Let $\{X_t\}$ be a Poisson process with parameter $\lambda > 0$. Let $T_0$ denote the time of the first arrival.
(i) Let $N = X_{2T_0} - X_{T_0}$, and calculate $\operatorname{Cov}(N, T_0)$.
(ii) Calculate the (conditional) expected value of the time $T_0 + \cdots + T_{r-1}$ of the $r$th arrival given $X_t = n \ge r$.
9. Suppose that two colonies, 1 and 2, start as single units and independently undergo growth by pure birth processes with rates $\lambda$ and $\mu$, respectively. Calculate the expected size of colony 1 at the time when the first offspring is produced in colony 2.
10. Consider a pure-death process with rates $q_{i,i-1} = -q_{ii} = i\delta$, $\delta > 0$, $i \ge 1$, $q_{ij} = 0$ otherwise.
(i) Calculate $p(t)$.
(ii) If initially there are $n_1$ particles of type 1 and $n_2$ particles of type 2 that independently undergo pure death at rates $\delta_1$, $\delta_2$, respectively, calculate the expected number of type 1 particles at the time of extinction of the type 2 particles.
11. Let $\{N_t\}$ be a homogeneous Poisson process with parameter $\lambda > 0$. Define $X_t = (-1)^{N_t}$, $t \ge 0$. Show that $\{X_t\}$ is a Markov process and compute its transition probabilities.
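For Exercise 11, $P(X_t = x \mid X_0 = x) = P(N_t \text{ even}) = (1 + e^{-2\lambda t})/2$; the sketch below (ours, with assumed parameter values) verifies this by simulation.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, t, trials = 1.3, 0.8, 200_000

n_t = rng.poisson(lam * t, size=trials)
x_t = (-1.0) ** n_t                     # X_t starting from X_0 = +1

# P(X_t = 1 | X_0 = 1) = P(N_t even) = (1 + e^{-2 lam t}) / 2
print((x_t == 1).mean(), (1 + np.exp(-2 * lam * t)) / 2)
```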
12. Consider all rooted binary tree graphs having n sources (i.e., degree-one vertices
excluding the root). Call any edge incident to a source vertex an external edge and
call the others internal (see Fig. Ex.IV.5). [By a rooted tree is meant a tree graph
in which a vertex is singled out as the root. Other graph-theory terminology is
described in Exercise 7.5 of Chapter II.]

Figure Ex. IV.5. A rooted binary tree with $n = 4$ sources, with the root, an internal edge, and a source indicated.

(i) Show that a rooted binary tree with $n$ sources has $n$ external edges and $n - 1$ internal edges, $n \ge 1$. In particular, the total number of edges is $2n - 1$, which is also the total number of vertices excluding the root.
(ii) Show that the following code establishes a one-to-one correspondence between the collection of all rooted binary tree graphs and the collection of all simple polygonal paths from $(0, 0)$ to $(2n - 1, -1)$ in steps of $\pm 1$ that do not touch or cross the line $y = -1$ prior to $x = 2n - 1$. Starting with the edge incident to the root, traverse the tree along the leftmost path until reaching the leftmost source. Then follow back until reaching a junction leading to the next leftmost source, and so on, recording, on the first (and only on the first) traverse of an edge, $+1$ if it is internal and $-1$ if it is external. The path $2n - 1$ long of $(+1)$s and $(-1)$s furnishes the coding of the tree in the form of a "random walk excursion."
(iii) Use (ii) and the reflection principle (Section 4, Chapter I) to calculate the distribution of $M_L$ in Example 3.
(iv) Let $g(x) = E x^{M_L}$ denote the probability-generating function of $M_L$. Use the recursive structure of the tree to establish the quadratic equation $g(x) = \tfrac{1}{2}x + \tfrac{1}{2}g^2(x)$.
(v) Use (iv) to give another derivation of the distribution of $M_L$.

13. Show that the holding time in state $j$ for Example 2 is exponentially distributed with parameter $\lambda_j = j\lambda$ $(j \ge 1)$.

Exercises for Section IV.6


1. Show that $(1 + x)\log(1 + x) \ge x$ for all $x \ge 0$.
2. Show that (i) the backward equation holds for $\tilde p_{ij}(t)$, and (ii) $\{\tilde p(t): t \ge 0\}$ are transition probabilities.
3. Show that $\{\bar p(t): t \ge 0\}$ are transition probabilities on $S \cup \{\Delta\}$ satisfying the backward and forward equations.
4. (i) Consider an initial mass of size $x_0$ that grows (deterministically) to a size $x_t$ at time $t$ at a rate that is dependent upon size; say, $x_t' = f(x_t)$, $t \ge 0$. Give an example of a growth-rate function $f(x)$, $x \ge 0$, such that the mass will grow without bound within a finite time $t_\infty > 0$, i.e., $\lim_{t \uparrow t_\infty} x_t = \infty$. [Hint: Consider a case in which each "element" of mass grows at a rate proportional to the total mass, so that the total mass itself grows at a rate proportional to the mass squared.]
(ii) Let $X_t$ denote the number of reactions that have occurred on or before time $t$ in a chain reaction process. Suppose that $\{X_t\}$ is a pure birth process starting at 0. Show that the expected number of reactions that occur in a finite time interval $[0, t]$ need not be finite.

5. (A Bus Stop Problem) A passenger regularly travels from home to office either by bus or by walking. The travel time by walking is a constant $t_w$, whereas the travel time by bus from the stop to work is a random variable, independent of the bus arrival time, with a continuous distribution having mean $t_b < t_w$. Buses arrive randomly at the home stop according to a Poisson process with intensity parameter $\lambda$. The passenger uses the following strategy to decide whether to walk or ride: if the bus arrives within $c$ time units, then ride the bus; if the wait reaches $c$, then walk. Determine (optimality) conditions on $c$ that minimize the average travel time in terms of $t_b$, $t_w$, $\lambda$. [Note: The solutions $c = 0$ (always walk), $c = \infty$ (always ride) are permitted.]
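For the bus stop problem, conditioning on the first bus arrival gives the mean travel time $T(c) = (\lambda^{-1} + t_b)(1 - e^{-\lambda c}) + t_w e^{-\lambda c}$, whose derivative $T'(c) = e^{-\lambda c}(1 + \lambda t_b - \lambda t_w)$ has constant sign; hence only $c = 0$ or $c = \infty$ can be optimal. The Monte Carlo sketch below (our illustration; all parameter values are assumptions) displays this monotonicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_travel_time(c, lam=0.5, t_w=30.0, t_b=10.0, trials=200_000):
    """Monte Carlo average travel time for cutoff c: ride if the (Poisson)
    bus arrives within c time units, otherwise walk after waiting c."""
    wait = rng.exponential(1.0 / lam, size=trials)   # time to next bus
    return np.where(wait <= c, wait + t_b, c + t_w).mean()

for c in (0.0, 2.0, 5.0, 20.0, 1e9):
    print(c, avg_travel_time(c))
# With these numbers t_w - t_b = 20 > 1/lam = 2, so T(c) decreases in c:
# the best strategy is c = infinity (always ride).
```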
6. Consider a single telephone line that is either free (state 0) or busy (state 1). Suppose incoming calls form a Poisson process with a mean rate (intensity) $\lambda$ per minute. The successive durations of calls are i.i.d. exponential random variables with parameter $\mu$ (the mean duration of a call being $\mu^{-1}$ minutes), independent of the Poisson process of incoming calls. If a call arrives at a time when the line is busy, the call is simply lost.
(i) Give the corresponding prescription of Kolmogorov's backward and forward equations for the state of the line.
(ii) Solve the forward equation.
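For the single line of Exercise 6 the forward equation is the two-state (flip-flop) system, with solution $p_{01}(t) = \frac{\lambda}{\lambda + \mu}(1 - e^{-(\lambda + \mu)t})$. The sketch below (ours; the rate values are assumptions) compares this closed form with the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

lam, mu = 2.0, 3.0                     # illustrative call and hang-up rates
Q = np.array([[-lam, lam],
              [mu, -mu]])              # states: 0 = free, 1 = busy

for t in (0.1, 0.5, 2.0):
    p = expm(t * Q)
    p01 = lam / (lam + mu) * (1 - np.exp(-(lam + mu) * t))  # closed form
    print(t, p[0, 1], p01)
```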

Exercises for Section IV.7


1. Show that the process $\{X_t\}$ described noncanonically in Example 2 is a Markov process.
2. Prove Eq. 7.11.
3. Let $\{X_t\}$ be the Yule linear growth process of Example 2. Show that
(i) $E_i X_t = i e^{\lambda t}$.
(ii) $\operatorname{Var}_i X_t = i e^{\lambda t}(e^{\lambda t} - 1) = 2i e^{3\lambda t/2}\sinh(\lambda t/2)$.
4. Show that the N-server queue process $\{X_t\}$ is a Markov process.
*5. (Renewal Age Process) Let $T_1, T_2, \ldots$ be an i.i.d. sequence of positive random variables with a continuous (lifetime) distribution function $F$. Let $S_0 = 0$, $S_n = T_1 + T_2 + \cdots + T_n$, $n \ge 1$, and for $A_0 = 0$, define for $t > 0$ the age at time $t$ by $A_t := t - \max\{S_k: S_k \le t, k \ge 0\}$. Then $\{a + A_t\}$ is the process starting at $a \ge 0$.
(i) Show that $\{A_t\}$ has the Markov property: i.e., for arbitrary time points $0 \le s_0 < s_1 < \cdots < s_k < s < t_1 < \cdots < t_n$, the conditional distribution of $A_{t_1}, \ldots, A_{t_n}$ given $A_{s_0}, \ldots, A_{s_k}, A_s$ does not depend on $A_{s_0}, \ldots, A_{s_k}$.
(ii) The failure rate (also called hazard rate or force of mortality) for the objects being renewed is defined by $h(t) = f(t)/[1 - F(t)]$, where $f$ is the p.d.f. (assumed to exist) of $F$. For continuously differentiable functions $g$ having bounded derivatives show, for $a \ge 0$,

$$Qg(a) := \lim_{t \downarrow 0} E_a \frac{g(A_t) - g(a)}{t} = h(a)\{g(0) - g(a)\} + \frac{dg}{da}(a), \qquad a \ge 0.$$

(iii) Show that $\pi(a) = Z^{-1}\exp\{-\int_0^a h(u)\,du\}$, $a \ge 0$, where $Z$ is the normalization constant, solves $\int_0^\infty \pi(a)Qg(a)\,da = 0$. In particular, show that $\pi$ is the density of an invariant probability for $\{A_t\}$.
(iv) Show (i) holds also for the residual lifetime $\{S_{N_t} - t\}$, where $N_t = \inf\{n \ge 0: S_n > t\}$. For $ET_1 < \infty$, show that $\tilde\pi(t) = (1 - F(t))/ET_1$, $t \ge 0$, is the density of an invariant probability. [The above has an interesting generalization by F. Spitzer (1986), "A Multidimensional Renewal Theorem," in Probability, Statistical Mechanics, and Number Theory, Advances in Mathematics Supplementary Studies, Vol. 9 (G. C. Rota, ed.), pp. 147–155.]
6. (Thinned Poisson) Let $\{\tau_n\}$ be the sequence of occurrence times for a Poisson process $\{N_t\}$ with parameter $\lambda$. Let $\{\varepsilon_n\}$ be an i.i.d. sequence, independent of the Poisson process, of Bernoulli 0–1-valued random variables such that $P(\varepsilon_n = 0) = p$, $0 < p < 1$. Define the thinned process by

$$Y_t := \sum_{n=1}^{N_t} \varepsilon_n, \qquad t \ge 0, \quad Y_0 = 0,$$

where $\{N_t = \max\{n: \tau_n \le t\}\}$ is the Poisson counting process. Show that $\{Y_t\}$ is a Poisson process with intensity parameter $(1 - p)\lambda$.
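Thinning in Exercise 6 is easily demonstrated by simulation: the mean and variance of the thinned count both match $(1 - p)\lambda t$. The sketch below is our own illustration with assumed parameter values.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p, t, trials = 2.0, 0.4, 5.0, 100_000   # P(eps = 0) = p; keep w.p. 1 - p

counts = np.empty(trials, dtype=int)
for k in range(trials):
    n = rng.poisson(lam * t)                 # points of the driving process
    counts[k] = (rng.random(n) >= p).sum()   # keep each point w.p. 1 - p

# Y_t should be Poisson with mean (1 - p) * lam * t
print(counts.mean(), counts.var(), (1 - p) * lam * t)
```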
7. (i) (Vibrating String) For Example 5, check that in the (deterministic) case $\alpha = \beta = 0$, $u(x, t) = \frac{1}{2}\{\varphi(x + vt) + \varphi(x - vt)\}$. In particular, that $u(x, t)$ solves

$$\frac{\partial^2 u}{\partial t^2} = v^2\frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = \varphi(x), \qquad \frac{\partial u(x, t)}{\partial t} = 0 \text{ at } t = 0.$$

(ii) Check that there is a randomization of time represented by a nonnegative nondecreasing stochastic process $\{T_t\}$ such that

$$u(x, t) = \tfrac{1}{2}\{E\varphi(x + vT_t) + E\varphi(x - vT_t)\}.$$

[Hint: Simply consider (7.37).]


8. (Diffusion Limit) Check that in the limit $\alpha \to \infty$, $v \to \infty$, $2\alpha/v^2 \to D^{-1} > 0$, the diffusion equation

$$\frac{\partial u}{\partial t} = D\frac{\partial^2 u}{\partial x^2}$$

is consistent with the telegrapher's equation.


9. The flow of electricity through a coaxial cable is typically described by the telegrapher's equation (7.19) or (7.32), (7.33), where $u(x, t)$ represents the (instantaneous) voltage and $w(x, t)$ the current at a distance $x$ from the sending end of the cable. The parameters $\alpha$, $\beta$, and $v^2$ can be interpreted in terms of the electrical properties of the cable as outlined in this exercise. If one ignores leakage (conductance due to inadequate insulation), then the circuit diagram for the segment of cable from $x$ to $x + \Delta x$ may be depicted as in Figure Ex.IV.7 below.

Figure Ex. IV.7. Circuit diagram for the cable segment from $x$ to $x + \Delta x$: a series resistance $R\,\Delta x$ and inductance $L\,\Delta x$ between $u(x, t)$ and $u(x + \Delta x, t)$, with a shunt capacitance $C\,\Delta x$.

Here the parameters $R$, $L$, and $C$ are the resistance, inductance, and capacitance per unit length. These parameters are defined in accordance with certain physical principles. For example, Ohm's law says that the ratio of the voltage drop across a resistor to the current through the resistor is a constant (called the resistance), given here by $R\,\Delta x$. Thus, the voltage drop across the resistor is $R\,\Delta x\,w(x, t)$. Likewise,

when there is a change in the flow of current $\partial w/\partial t$ in an inductor there is a corresponding voltage drop of $L\,\Delta x\,\partial w/\partial t$. According to Kirchhoff's second law, the sum of potential drops (as measured by a voltmeter) around a closed loop in an electric network is zero. Thus, for $\Delta x$ small, one has

$$u(x + \Delta x, t) - u(x, t) + R\,\Delta x\,w(x, t) + L\,\Delta x\,\frac{\partial w}{\partial t} = 0.$$

The nature of a capacitor is such that the ratio of the charge stored to the voltage drop is the capacitance $C\,\Delta x$. Thus, the capacitor current, being the time rate of change of charge, is given by $C\,\Delta x\,\partial u/\partial t$. According to Kirchhoff's first law, the sum of the currents flowing toward any point in an electrical network is zero (i.e., charge is conserved). Thus, for $\Delta x$ small, one has

$$w(x + \Delta x, t) - w(x, t) + C\,\Delta x\,\frac{\partial u}{\partial t} = 0.$$

(i) Use the above to complete the derivation of the transmission line equations and the telegrapher's equation (for twice-continuously differentiable functions) and give the corresponding electrical interpretation of the parameters $\alpha$, $\beta$, and $v^2$.
(ii) In the case when leakage is present there is an additional parameter (loss factor) $G$, called the conductance per unit length, such that the leakage current is proportional to the voltage: $G\,\Delta x\,u(x, t)$. In this case the term $G\,\Delta x\,u(x, t)$ is added to the left-hand side in Kirchhoff's first law, and one sees that the voltage and current satisfy transmission line equations of exactly the same general forms. Show that precisely the same equation is satisfied by both voltage and current.
10. In Example 4, suppose that the service time distribution is arbitrary, with distribution function $F$. Determine the distribution of $W_t$.

Exercises for Section IV.8


1. Let $Z_i$ $(i \ge 1)$ be a sequence of i.i.d. nonnegative random variables, $P(Z_1 > 0) > 0$. Prove that $\sum_{i=1}^n Z_i \to \infty$ almost surely.

2. Prove that the first and third terms on the right side of Eq. 8.13 go to zero (almost surely) as $t \to \infty$, provided (8.10) holds.

3. When is a compound Poisson process (i) a pure birth process, (ii) a birth–death process? Write down $q_{ij}$, $k_{ij}$ in these cases.
4. Let $\{X_t\}$ be a birth–death chain, and let $c < x < d$ be three states.
(i) Compute $P_x(\{X_t\}$ reaches $c$ before $d)$ in terms of the infinitesimal rates (i.e., $\beta_i$, $\delta_i$).
(ii) Calculate $\rho := P_x(\{X_t\}$ ever reaches $c)$ in the case $\{X_t\}$ is nonexplosive. Briefly comment on what may happen in the case that explosion may occur with positive $P_x$-probability.
5. Given a birth–death chain $\{X_t\}$:
(i) Derive a difference equation for $m(x) := E_x(\tau)$, where $\tau$ is the first time $\{X_t\}$ reaches $c$ or $d$ $(c \le x \le d)$.
(ii) For the special case $\beta_x = \delta_x = \lambda$ $(c < x < d)$, compute $m(x)$.
(*iii) Compute $m(x)$ for general birth–death chains.
6. Let $p(t) = ((p_{ij}(t)))$ be transition probabilities for a Markov chain on a finite state space $S$. Adapt Proposition 6.1 of Chapter II to show that $\lim_{t \to \infty} p_{ij}(t)$ exists for all $i, j \in S$, and the convergence is exponentially fast. [Hint: Fix $t > 0$ and consider the discrete-parameter one-step transition probability matrix $q = ((p_{ij}(t)))$. Then $q^n = ((p_{ij}(nt)))$, $n = 0, 1, 2, \ldots$. Use the semigroup property to show that $\lim_n q_{ij}^{(n)}$ (exists and) does not depend on $t > 0$.]
7. Suppose that $\{X_t\}$ is irreducible and positive recurrent. Let $\{Y_n\}$ be the embedded discrete-parameter Markov chain with transition matrix $K$. Let $\pi$ be the invariant distribution for $\{X_t\}$ such that $\pi'Q = 0$. Calculate the invariant distribution for $\{Y_n\}$.

8. Let $\{X_t\}$ be a positive-recurrent birth–death process on $S = \{0, 1, 2, \ldots\}$ with birth–death rates $\beta_i$ and $\delta_i$, respectively. Show that

$$\pi_i \beta_i = \pi_{i+1}\delta_{i+1}, \qquad i \ge 0,$$

where $\pi$ is the invariant initial distribution.

9. Consider the N-server queue under the invariant initial distribution (8.27).
(i) Show that the expected number of customers waiting to be served (excluding those being served) is given by

$$\frac{(N\rho)^N \rho}{N!\,(1 - \rho)^2}\,\pi_0, \qquad \text{where } \rho = \frac{\lambda}{N\mu}\ (< 1)$$

is referred to as the traffic intensity parameter; i.e., the mean number of arrivals within the average service time $\mu^{-1}$, per server.
(ii) Show that the average length of time a customer must wait for service is

$$\frac{(N\rho)^N}{N\mu(1 - \rho)^2\,N!}\,\pi_0.$$

(iii) A particular hospital ward receives patients according to a Poisson process at an average rate of $\lambda = 2$ per day. The average length of stay is 5 days. Assuming the length of stay to be exponentially distributed, how many beds are necessary in order to achieve an equilibrium distribution for the total number of patients admitted and waiting admission? What will be the average waiting time for admission?
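The queue formulas of Exercise 9 can be evaluated directly; the sketch below (ours) computes $\pi_0$, the expected queue length, and the mean wait, and applies them to the hospital ward of part (iii), for which $\lambda/\mu = 10$ so at least $N = 11$ beds are required.

```python
import math

def mmn_stats(lam, mu, N):
    """Equilibrium quantities for the N-server queue, assuming
    rho = lam / (N mu) < 1: pi_0, expected queue length, mean wait."""
    rho = lam / (N * mu)
    a = lam / mu                      # offered load, equals N*rho
    pi0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(N))
                 + a**N / (math.factorial(N) * (1 - rho)))
    Lq = pi0 * a**N * rho / (math.factorial(N) * (1 - rho) ** 2)
    Wq = Lq / lam                     # Little's formula
    return pi0, Lq, Wq

# Hospital ward of part (iii): lam = 2 per day, mean stay 5 days => mu = 0.2.
print(mmn_stats(2.0, 0.2, 11))
```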
10. Let $\{X_t\}$ be the continuous-parameter Markov (binary) branching process with offspring distribution $f(0) = \bar\delta$, $f(2) = \bar\beta$, $\bar\beta + \bar\delta = 1$, and parameter $\lambda$.
(i) Show that $\{X_t\}$ is a birth–death process with linear rates.
(ii) Let $g_t^{(i)}(r) := \sum_j P_i(X_t = j)r^j$ denote the p.g.f. of the $P_i$-distribution of $X_t$. Show that the forward equations transform as

$$\frac{\partial g^{(i)}}{\partial t} = [\beta(r^2 - r) + \delta(1 - r)]\frac{\partial g^{(i)}}{\partial r}, \qquad \beta = \lambda\bar\beta, \quad \delta = \lambda\bar\delta.$$

(iii) Solve the equation for $g_t^{(1)}$.


11. Suppose that $\{X_t\}$ has transition probabilities $p_{ij}(t)$ and infinitesimal transition rates given by $Q = ((q_{ij}))$. Suppose that $\pi$ is a probability distribution satisfying $\pi'Q = 0$. If $\sum_i \lambda_i \pi_i < \infty$, where $\lambda_i = -q_{ii}$, then show that $\pi$ is an invariant initial distribution. [Hint: Differentiation of $\sum_i \pi_i p_{ij}(t)$ term by term is allowed.]
12. If $S$ comprises a single communicating class and $\pi$ is an invariant probability, then all states are positive recurrent. [Hint: Consider the discrete-time chain $X_n$, $n \ge 0$. Use Theorem 9.4 of Chapter II. Note that $\eta_i$ defined by (8.4) is smaller than the second passage time to state $i$ for the discrete-parameter chain.]
13. Show that positive recurrence is a class property. [Hint: Use arguments analogous to those used in the discrete-time case.]
14. Verify the forward equations for Example 5.3. [Hint: Use Exercise 4.5 and Example 8.2.]

Exercises for Section IV.9


1. Calculate the spectral representation of the transition probabilities $p_{ij}(t)$ for the continuous-parameter simple symmetric random walk on $S = \{0, 1, 2, \ldots\}$ with one reflecting boundary at 0. [Hint: Use (9.19).]
2. (Time Reversibility) Let $\{X_t\}$ be an irreducible positive recurrent Markov chain with invariant initial distribution $\pi$.
(i) Show that $\{X_t\}$ is a stationary process; i.e., for any $h > 0$, $0 \le t_1 < t_2 < \cdots < t_k$, $(X_{t_1}, \ldots, X_{t_k})$ and $(X_{t_1+h}, \ldots, X_{t_k+h})$ have the same distribution.
(ii) Show that $\{X_t\}$ is time-reversible, in the sense that for any $0 \le t_1 < t_2 < \cdots < t_k \le T$, $(X_{t_1}, \ldots, X_{t_k})$ and $(X_{T-t_k}, \ldots, X_{T-t_1})$ have the same distribution, if and only if

$$\pi_i p_{ij}(t) = \pi_j p_{ji}(t) \qquad \text{for all } i, j \in S, \; t \ge 0.$$

This last property is sometimes referred to as time-reversibility of $\pi$.
(iii) Show that $\{X_t\}$ is time-reversible if and only if $\pi_i q_{ij} = \pi_j q_{ji}$ for all $i, j \in S$.
(iv) Show that $\{X_t\}$ is time-reversible if and only if the discrete-parameter embedded chain is time-reversible; see Exercise 7.4 of Chapter II.
(v) Show that the chemical reaction kinetics example is time-reversible under the invariant initial distribution.


3. (Relaxation and Maximal Correlation) Let $\{X_t\}$ be an irreducible finite-state Markov chain on $S$ with transition probabilities $p(t) = ((p_{ij}(t)))$, $t \ge 0$, and infinitesimal rates $Q = ((q_{ij}))$. Let $\pi$ be the unique invariant distribution and assume the distribution to be time-reversible as defined in Exercise 2(ii) above. Define the maximal correlation by $\rho(t) = \sup_{f,g} \operatorname{Corr}_\pi(f(X_t), g(X_0))$, where the supremum is over all real-valued functions $f$, $g$ on $S$ and

$$\operatorname{Corr}_\pi(f(X_t), g(X_0)) = \frac{E_\pi\{[f(X_t) - E_\pi f(X_t)][g(X_0) - E_\pi g(X_0)]\}}{\{\operatorname{Var}_\pi f(X_t)\}^{1/2}\{\operatorname{Var}_\pi g(X_0)\}^{1/2}}.$$

Show that

$$\rho(t) = e^{\lambda_1 t}, \qquad t \ge 0,$$

where $\lambda_1 < 0$ is the largest nontrivial eigenvalue of $Q$. The parameter $\gamma = -1/\lambda_1$ is called the relaxation time or correlation length parameter in this context. [Hint: For $f$, $g$ such that $E_\pi f = E_\pi g = 0$, $\|f\|_\pi = \|g\|_\pi = 1$, $\operatorname{Corr}_\pi(f(X_t), g(X_0)) = \langle p(t)f, g\rangle_\pi$. So there is an orthonormal basis $\{\varphi_n\}$ of eigenvectors of $p(t)$. Since

$$p(t) = e^{Qt} = \sum_k \frac{Q^k t^k}{k!},$$

the eigenvalues of $p(t)$ are of the form $e^{\lambda_n t}$ where $\lambda_n$ are eigenvalues of $Q$. Take $f = g = \varphi_1$ to get $\langle p(t)f, g\rangle_\pi = e^{\lambda_1 t}$ and, for the general case (centered and scaled), expand $f$ and $g$ in terms of $\{\varphi_n\}$, i.e., $f = \sum_n \langle f, \varphi_n\rangle_\pi \varphi_n$, $g = \sum_n \langle g, \varphi_n\rangle_\pi \varphi_n$, to show $|\operatorname{Corr}_\pi(f(X_t), g(X_0))| \le e^{\lambda_1 t}$. Use the simple inequality $ab \le (a^2 + b^2)/2$.]
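The relaxation time can be read off from the spectrum of $Q$. The sketch below (our illustration on the two-state flip-flop chain, with assumed rates) computes the largest nontrivial eigenvalue $\lambda_1$ and $\gamma = -1/\lambda_1$.

```python
import numpy as np

# A small reversible chain (illustrative rates): flip-flop on {0, 1}.
lam, mu = 2.0, 3.0
Q = np.array([[-lam, lam],
              [mu, -mu]])

eig = np.sort(np.linalg.eigvals(Q).real)[::-1]
lam1 = eig[1]                       # largest nontrivial eigenvalue (< 0)
print(lam1, -1.0 / lam1)            # relaxation time gamma = -1/lam1
# For two states every centered function is proportional to the same
# eigenfunction, so rho(t) = e^{lam1 t} = e^{-(lam + mu) t} exactly.
```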
4. Let $\{X_t\}$ be an irreducible positive recurrent finite-state Markov chain with initial distribution $\mu$. Let $\mu_j(t) = P_\mu(X_t = j)$, $j \in S$, and let $\pi$ denote the invariant initial distribution. Also let $Q = ((q_{ij}))$ be the matrix of infinitesimal rates for $\{X_t\}$. Show that

(i) $\dfrac{d\mu_j(t)}{dt} = \sum_{i \in S}\{\mu_i(t)q_{ij} - \mu_j(t)q_{ji}\}$, $t \ge 0$, $j \in S$, with $\mu_j(0) = \mu_j$, $j \in S$.

(ii) Suppose that $\pi$ is a time-reversible invariant distribution and define

$$\gamma_{ij} = \frac{1}{\pi_i q_{ij}} = \frac{1}{\pi_j q_{ji}}$$

(possibly infinite). Show that

$$\frac{d\mu_j(t)}{dt} = \sum_{k \in S}\frac{1}{\gamma_{jk}}\left(\frac{\mu_k(t)}{\pi_k} - \frac{\mu_j(t)}{\pi_j}\right), \qquad j \in S.$$

Consider an electrical network in which the states of $S$ are the nodes, and $\gamma_{jk}$ is the resistance in a wire connecting $j$ and $k$. Suppose also that each node $j$ carries a capacitance $\pi_j$. Then these equations are Kirchhoff's equations for the spread of an initial "electrical charge" $\mu_j(0) = \mu_j$, $j \in S$, with time (see also Exercise 7.9). The potential energy stored in the capacitors at the nodes at time $t$, when the initial distribution is $\mu(0) = \mu$, is given by

$$U(t) = U(\mu(t)) = \frac{1}{2}\sum_{j \in S}\frac{(\mu_j(t))^2}{\pi_j}.$$

One expects that as time progresses energy will be dissipated as heat in the wires.
(iii) Show that if $\mu \ne \pi$, then $U$ is strictly decreasing as a function of time.
(iv) Calculate $U(\mu(t))$ for the two-state (flip-flop) Example 3.1 in the cases $\mu(0) = \delta_{\{i\}}$, $i = 0, 1$.
5. As in Exercise 4 above, let $\{X_t\}$ be an irreducible positive recurrent finite-state Markov chain with initial distribution $\mu$. Let $\mu_j(t) = P_\mu(X_t = j)$, $j \in S$, and let $\pi$ denote the invariant initial distribution. Let $h$ be a strictly concave function on $[0, \infty)$ and define the h-entropy of the distribution at time $t$ by

$$H(t) = H(\mu(t)) = \sum_{j \in S}\pi_j\, h\!\left(\frac{\mu_j(t)}{\pi_j}\right), \qquad t \ge 0.$$

(i) Show for $\mu \ne \pi$, $H(t)$ is strictly increasing as a function of $t$ and

$$\lim_{t \to \infty} H(\mu(t)) = H(\pi).$$

(ii) Statistical entropy is defined by taking $h(x) = -x\log x$ in (i). Calculate $H(\mu(t))$ for the two-state (flip-flop) Example 3.1 when $\mu(0) = \delta_{\{i\}}$, $i = 0, 1$.

Exercises for Section IV.10

1. Calculate the distribution $G(t; \lambda_1, \lambda_2, \ldots, \lambda_i)$ appearing in Eq. 10.4. [Hint: Apply the partial fractions decomposition to the characteristic function of the sum.]
2. Calculate the distribution of the time until absorption at $j = 0$ for the pure death process on $S = \{0, 1, 2, \ldots\}$ starting at $i$, with infinitesimal rates $q_{j,j-1} = j\delta$ for $j \ge 1$, $q_{jj} = -j\delta$, $q_{jk} = 0$ otherwise. What is the average time to absorption?
3. A zero-seeking particle in state $i > 0$ waits for an exponentially distributed time with parameter $\lambda_i$ and then jumps to a position uniformly distributed over $0, 1, 2, \ldots, i - 1$. If in state $i < 0$, it holds for an exponential time with parameter $\lambda_i$ and then jumps to a position uniformly distributed over $i + 1, i + 2, \ldots, -1, 0$. Once in state 0 it stays there. Calculate the average time to reach zero, starting in state $i$.

*4. (i) Under the conditions stated in Example 2, show that conditioned on nonextinction at time $t$, $X_t$ has a limit distribution as $t \to \infty$ with p.g.f. given by Eq. 10.34. [Hint: Check that $\sum_{j=0}^\infty P_1(X_t = j \mid \tau^{(0)} > t)r^j$ may be expressed as $\{g_t^{(1)}(r) - g_t^{(1)}(0)\}/(1 - g_t^{(1)}(0))$. Apply (10.28) to get

$$1 - \frac{1 - H^{-1}(t + H(r))}{1 - H^{-1}(t + H(0))},$$

which by (10.31) is, asymptotically as $t \to \infty$, given by $1 - \exp\{x_1(H(r) - H(0))\}(1 + o(1))$. Finally, apply (10.26).]


(ii) Determine the precise form of the distribution of the limit in the binary case $f(2) = p < q = f(0)$, $p + q = 1$.

THEORETICAL COMPLEMENTS

Theoretical Complements to Section IV.1


1. (Independent Increments and Infinite Divisibility) A probability measure $Q$ on $(\mathbb{R}^1, \mathscr{B}^1)$ is called infinitely divisible if for each $n \ge 1$, $Q$ can be factored as an $n$-fold convolution of a probability distribution $Q_n$ on $(\mathbb{R}^1, \mathscr{B}^1)$, i.e., $Q = Q_n * \cdots * Q_n$ ($n$-fold), $n \ge 1$. Familiar examples are the Normal, Gamma, Poisson, and Cauchy distributions, as well as the first passage time distribution of the Brownian motion (Chapter I).

A basic property of infinitely divisible distributions is that the characteristic function $\hat Q(\xi) = \int e^{i\xi x}Q(dx)$, $\xi \in \mathbb{R}^1$, is never zero. To see this, observe that by considering $\varphi(\xi) = |\hat Q(\xi)|^2$ if necessary, one may assume to be given a real nonnegative characteristic function which, for any $n \ge 1$, factors into an $n$-fold product of real nonnegative characteristic functions $\varphi_n$ (of some probability distributions $\mu_n$, say). Since $\varphi(0) = 1$ and $\varphi$ is continuous, there is an interval $[-r, r]$ on which $\varphi(\xi)$ is strictly positive. Now, taking the unique version of $\log\varphi(\xi)$ which makes $\log\varphi(\xi)$ continuous and $\log\varphi(0) = 0$, $\log\varphi(\xi) = n\log\varphi_n(\xi)$ and, expanding the log,

$$\log\varphi(\xi) = n\log\{1 - [1 - \varphi_n(\xi)]\} = -n[1 - \varphi_n(\xi)] - n[1 - \varphi_n(\xi)]^2/2 - \cdots.$$

In particular, one sees that $n[1 - \varphi_n(\xi)]$ is a bounded sequence for $\xi \in [-r, r]$. But, also

$$n[1 - \varphi_n(2\xi)] = n\int (1 - \cos(2\xi x))\,\mu_n(dx) = 2n\int [1 - \cos^2(\xi x)]\,\mu_n(dx) \le 4n\int [1 - \cos(\xi x)]\,\mu_n(dx) = 4n[1 - \varphi_n(\xi)].$$

So, the sequence $n[1 - \varphi_n(2^k\xi)]$ is bounded on $[-r, r]$ for each $k$. This means that each $\varphi_n(\xi)$, and therefore $\varphi(\xi)$, must be nonzero on $[-2r, 2r]$. In particular, there could be no largest such $r$ and this proves $\varphi$ has no real zeros.

If $\{X_t\}$ is a stochastic process with state space $S \subset \mathbb{R}^1$ having stationary independent increments, then for any $s < t$, the distribution of $X_t - X_s$ is clearly infinitely divisible since for each integer $n \ge 1$,

$$X_t - X_s = \sum_{k=1}^n (X_{t_k} - X_{t_{k-1}}), \qquad t_k = s + (t - s)\frac{k}{n}, \quad k = 0, 1, \ldots, n.$$

Conversely, given an infinitely divisible distribution $Q$ there is a family of probability measures $Q_t$, $t > 0$, such that $Q_1 = Q$ and $Q_t * Q_s = Q_{t+s}$, obtained, for example, through their respective characteristic functions by taking $\hat Q_t(\xi) = (\hat Q(\xi))^t := \exp\{t\log\hat Q(\xi)\}$, since $\hat Q(\xi) \ne 0$ for $\xi \in \mathbb{R}^1$. Thus one obtains a consistent specification of the finite-dimensional distributions of a stochastic process $\{X_t\}$ having stationary independent increments starting at 0 such that $X_t - X_s$ has distribution $Q_{t-s}$, $0 \le s < t$.

The process $\{X_t\}$ with stationary independent increments is a Markov process on $\mathbb{R}^1$ starting at 0 having stationary transition probabilities given by

$$p(t; x, B) = P(X_{t+s} \in B \mid X_s = x) = Q_t(B - x), \qquad t > 0, \; B \in \mathscr{B}^1,$$

where $B - x := \{y - x: y \in B\}$ is a translate of $B$.
2. (Lévy–Khinchine Representation) The Brownian motion process and the compound Poisson process are simple examples of processes having stationary independent increments. While the Brownian motion is the only one of these that is a.s. continuous, both are continuous in probability (called stochastic continuity). That is, for any $t_0 \ge 0$, $X_t \to X_{t_0}$ in probability as $t \to t_0$, since (i) for the Brownian motion, a.s. convergence implies convergence in probability, and (ii) for the compound Poisson process, $P(|X_t - X_s| > \varepsilon) \le 1 - e^{-\lambda|t-s|} = O(|t - s|)$ for $\varepsilon > 0$. Although the sample paths of the Poisson and compound Poisson processes have jump discontinuities, stochastic continuity means that there are no fixed discontinuities.

Stochastic processes $\{X_t\}$ and $\{Y_t\}$ defined on the same probability space and having the same index set are said to be stochastically equivalent if $P(X_t = Y_t) = 1$ for each $t$. Stochastically equivalent processes must have the same finite-dimensional distributions. As an application of the fundamental theorem on the sample path regularity of stochastically continuous submartingales given in theoretical complement 5.2, it will follow that a stochastically continuous process $\{X_t\}$ with independent increments is equivalent to a stochastic process $\{Y_t\}$ having the property that almost all of its sample paths are right-continuous and have left-hand limits at each $t$, i.e., have at most jump discontinuities of the first kind. Moreover, the process $\{Y_t\}$ is unique in the sense that if $\{Y_t'\}$ is any other such process, then $P(Y_t = Y_t'$ for every $t) = 1$. Without loss of generality we may assume the given process $\{X_t\}$ with stationary independent increments to have jump discontinuities of the first kind. Such a process is called a (homogeneous) Lévy process. The prefix homogeneous refers only to stationarity of the increments.

Theorem T.1.1. If $\{X_t\}$ is a (nondegenerate) homogeneous Lévy process having a.s. continuous sample paths, then $\{X_t\}$ must be Brownian motion.

Proof. Let $s < t$. By sample path continuity, for any $\varepsilon > 0$ there is a number $\delta = \delta(\varepsilon) > 0$ such that

$$P(|X_u - X_v| < \varepsilon \text{ whenever } |u - v| < \delta, \; s \le u, v \le t) > 1 - \varepsilon.$$

Let $\{\varepsilon_n\}$ be a sequence of positive numbers decreasing to zero and partition $(s, t]$ into subintervals $s = t_0^{(n)} < t_1^{(n)} < \cdots < t_{k_n}^{(n)} = t$ of lengths less than $\delta_n = \delta(\varepsilon_n)$. Then

$$X_t - X_s = \sum_{k=1}^{k_n}\big(X_{t_k^{(n)}} - X_{t_{k-1}^{(n)}}\big) = \sum_{k=1}^{k_n} X_k^{(n)},$$

where $X_1^{(n)}, \ldots, X_{k_n}^{(n)}$ are independent (triangular array). Observe that the truncated random variables

$$\tilde X_k^{(n)} := X_k^{(n)}\,\mathbf{1}_{\{|X_k^{(n)}| \le \varepsilon_n\}}$$

are also independent and $\sum_k \tilde X_k^{(n)} \to X_t - X_s$ in probability as $n \to \infty$ since

$$P\Big(X_t - X_s = \sum_{k=1}^{k_n}\tilde X_k^{(n)}\Big) \ge 1 - \varepsilon_n.$$

The result now follows by an application of the Lindeberg CLT (Chapter 0, Theorem 7.1). ∎

Within this same context, the other extreme is represented by the following.

Theorem T.1.2. Let $\{X_t\}$ be a homogeneous Lévy process almost all of whose sample paths are step functions with unit jumps. Then $\{X_t\}$ is a Poisson process.

The proof of Theorem T.1.2 will be based on the following basic coupling lemma.

Lemma. (Coupling Bound). Let $X$ and $Y$ be arbitrary random variables. Then for any (Borel) set $B$,

$$|P(X \in B) - P(Y \in B)| \le P(X \ne Y).$$

Proof. First consider the case $P(X \in B) \ge P(Y \in B)$. Then

$$0 \le P(X \in B) - P(Y \in B) \le P(X \in B) - P(X \in B, Y \in B) = P(X \in B, Y \notin B) \le P(X \ne Y).$$

The argument applies symmetrically to the other case. ∎

The coupling lemma can be used to get the following Poisson approximations
which, while important in their own right, will be used in the proof of Theorem 1.1.2.

Lemma. (Poisson Approximations). Let $Y_1, \ldots, Y_n$ be independent Bernoulli 0–1-valued random variables with $p_i = P(Y_i = 1)$, $i = 1, \ldots, n$. Then for any $\lambda > 0$ and $J \subset \{0, 1, 2, \ldots\}$,

(i) $\Big|P\Big(\sum_{i=1}^n Y_i \in J\Big) - F_{\bar p}(J)\Big| \le \sum_{i=1}^n p_i^2$, where $\bar p = \sum_{i=1}^n p_i$;

(ii) $\Big|P\Big(\sum_{i=1}^n Y_i \in J\Big) - F_\lambda(J)\Big| \le \Big|\sum_{i=1}^n p_i - \lambda\Big| + \Big(\max_i p_i\Big)\sum_{i=1}^n p_i$,

where

$$F_\lambda(J) = \sum_{m \in J} e^{-\lambda}\frac{\lambda^m}{m!}.$$

Proof. Let $Q$ be an arbitrary (discrete) probability distribution on $\{0, 1, 2, \ldots\}$ with p.m.f. $q_i = Q(\{i\})$, $i = 0, 1, 2, \ldots$. A simulation random variable having distribution $Q$ based on the value of a random variable $U$ uniformly distributed over $(0, 1]$ (i.e., a function of $U$) can be defined by

$$X = \tilde Q(U), \qquad (T.1.1)$$

where $\tilde Q(x) \in \{0, 1, 2, \ldots\}$ is uniquely defined for each $0 < x \le 1$ by the condition that (an empty sum being zero)

$$\sum_{j=0}^{\tilde Q(x)-1} q_j < x \le \sum_{j=0}^{\tilde Q(x)} q_j. \qquad (T.1.2)$$

Clearly, $X = \tilde Q(U)$ so constructed has distribution $Q$ since

$$P(\tilde Q(U) = i) = P\Big(\sum_{j < i} q_j < U \le \sum_{j \le i} q_j\Big) = q_i.$$

Now let $U_1, U_2, \ldots, U_n$ be i.i.d. uniform over $(0, 1]$ and let $G_i$, $F_i = F_{p_i}$ be the Bernoulli and Poisson distributions with parameter $p_i$, respectively, $i = 1, \ldots, n$. That is,

$$G_i(\{1\}) = p_i = 1 - G_i(\{0\}), \qquad F_i(\{k\}) = \frac{p_i^k e^{-p_i}}{k!}, \quad k = 0, 1, 2, \ldots.$$

Then, by the coupling lemma, letting $\bar p = \sum_{i=1}^n p_i$ and letting $\sim$ indicate the corresponding "simulation" function as defined above, we have

$$\Big|P\Big(\sum_i Y_i \in J\Big) - F_{\bar p}(J)\Big| \le P\Big(\sum_i \tilde G_i(U_i) \ne \sum_i \tilde F_i(U_i)\Big) \qquad (T.1.3)$$

since the distribution of the sum of independent Poisson random variables with parameters $p_1, \ldots, p_n$ is Poisson with parameter $\sum_{i=1}^n p_i$. Now from (T.1.3) we get

$$\Big|P\Big(\sum_i Y_i \in J\Big) - F_{\bar p}(J)\Big| \le P\Big(\bigcup_i \{\tilde G_i(U_i) \ne \tilde F_i(U_i)\}\Big) \le \sum_i P(\tilde G_i(U_i) \ne \tilde F_i(U_i)). \qquad (T.1.4)$$

Now, for fixed $i$, observe from Figure TC.IV.1 that

$$P(\tilde G_i(U_i) \ne \tilde F_i(U_i)) = \big(e^{-p_i} - (1 - p_i)\big) + \big(1 - e^{-p_i}(1 + p_i)\big) \le p_i^2 \qquad (T.1.5)$$

Figure TC.IV.1. The unit interval partitioned by the two simulation functions: $\{\tilde G_i(U_i) = 0\} = (0, 1 - p_i]$, $\{\tilde G_i(U_i) = 1\} = (1 - p_i, 1]$; $\{\tilde F_i(U_i) = 0\} = (0, e^{-p_i}]$, $\{\tilde F_i(U_i) = 1\} = (e^{-p_i}, (1 + p_i)e^{-p_i}]$, $\{\tilde F_i(U_i) > 1\} = ((1 + p_i)e^{-p_i}, 1]$.


since $1 - e^{-p_i} \le p_i$. Thus, (T.1.4) and (T.1.5) prove (i).

To derive the estimate (ii), let $\lambda > 0$ be arbitrary and use the triangle inequality to write

$$\Big|P\Big(\sum_{i=1}^n Y_i \in J\Big) - F_\lambda(J)\Big| \le \Big|P\Big(\sum_{i=1}^n Y_i \in J\Big) - F_{\bar p}(J)\Big| + |F_{\bar p}(J) - F_\lambda(J)| \le \sum_{i=1}^n p_i^2 + |F_{\bar p}(J) - F_\lambda(J)| \le \Big(\max_{1 \le i \le n} p_i\Big)\sum_{i=1}^n p_i + |F_{\bar p}(J) - F_\lambda(J)|. \qquad (T.1.6)$$

To complete the proof, we simply need to show

$$|F_{\bar p}(J) - F_\lambda(J)| \le \Big|\sum_{i=1}^n p_i - \lambda\Big|.$$

For this, first suppose $\bar p = \sum_i p_i > \lambda$ and let $Z_1$, $Z_2$ be independent Poisson random variables with parameters $\lambda$ and $\sum_i p_i - \lambda$. Again use the coupling bound lemma to bound the last term in (T.1.6) by

$$P(Z_1 \ne Z_1 + Z_2) = P(Z_2 \ne 0) = 1 - \exp\Big\{-\Big(\sum_i p_i - \lambda\Big)\Big\} \le \sum_i p_i - \lambda. \qquad (T.1.7)$$

The symmetrical argument works for $\sum_i p_i < \lambda$ also. ∎
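The Poisson approximation lemma is concrete enough to check numerically: the sketch below (ours; the list of $p_i$ values is an assumption) computes the exact distance $\sup_J |P(\sum_i Y_i \in J) - F_{\bar p}(J)|$ (half the $\ell^1$ distance) by convolution and compares it with the bound $\sum_i p_i^2$ of part (i).

```python
import math

def poisson_pmf(lam, m):
    return math.exp(-lam) * lam**m / math.factorial(m)

def bernoulli_sum_pmf(ps):
    """Exact p.m.f. of a sum of independent Bernoulli(p_i) by convolution."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            new[m] += (1 - p) * q
            new[m + 1] += p * q
        pmf = new
    return pmf

ps = [0.05, 0.1, 0.02, 0.08, 0.04]
lam = sum(ps)
pmf = bernoulli_sum_pmf(ps)
head = sum(poisson_pmf(lam, m) for m in range(len(pmf)))
tv = 0.5 * (sum(abs(pmf[m] - poisson_pmf(lam, m)) for m in range(len(pmf)))
            + (1 - head))               # Poisson mass beyond the sum's range
print(tv, sum(p * p for p in ps))       # distance <= sum of p_i^2
```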

Proof of Theorem T.1.2. It is enough to show that for each $t > 0$, $X_t$ has a Poisson distribution with $EX_t = \lambda t$ for some $\lambda > 0$. Partition $(0, t]$ into $n$ intervals of the form $(t_{i-1}, t_i]$, $i = 1, \ldots, n$, having equal lengths $\Delta = t/n$. Let $A_i^{(n)} = \{X_{t_i} - X_{t_{i-1}} \ge 1\}$ and $S_n = \sum_{i=1}^n \mathbf{1}_{A_i^{(n)}}$ (the number of time intervals with at least one jump occurrence). Let $D$ denote the shortest distance between jumps in the path $X_s$, $0 \le s \le t$. Then $P(X_t \ne S_n) \le P(0 < D \le t/n) \to 0$ as $n \to \infty$, since $\{X_t \ne S_n\}$ implies that there is at least one interval containing two or more jumps.

Now, by the Poisson approximation lemma (ii), taking $J = \{m\}$, we have

$$\Big|P(S_n = m) - e^{-\lambda}\frac{\lambda^m}{m!}\Big| \le |np_n - \lambda| + \frac{(np_n)^2}{n}, \qquad \text{where } p_n = p_i^{(n)} = P(A_i^{(n)}),$$

$i = 1, \ldots, n$, and $\lambda > 0$ is arbitrary. Thus, using the triangle inequality and the coupling bound lemma,

$$\Big|P(X_t = m) - e^{-\lambda}\frac{\lambda^m}{m!}\Big| \le |P(X_t = m) - P(S_n = m)| + \Big|P(S_n = m) - e^{-\lambda}\frac{\lambda^m}{m!}\Big| \le P(X_t \ne S_n) + |np_n - \lambda| + np_n^2 = o(1) + |np_n - \lambda| + np_n^2. \qquad (T.1.8)$$

The proof will be completed by determining $\lambda$ such that $|np_n - \lambda| \to 0$, at least along a subsequence of $\{np_n\}$. For then we also have $np_n^2 = np_n P(A_1^{(n)}) \to 0$ as $n \to \infty$. But, since $P(X_t = 0) = P(\bigcap_{i=1}^n (A_i^{(n)})^c) = (1 - p_n)^n$, it follows that $P(X_t = 0) > 0$; for otherwise $P(A_i^{(n)}) = 1$ for each $i$ and therefore $X_t \ge n$ for all $n$, which is not possible under the assumed sample path regularity. Therefore $np_n \le -n\log(1 - p_n) = -\log P(X_t = 0)$ for all $n$. Since $\{np_n\}$ is a bounded sequence of positive numbers, there is at least one limit point $\lambda > 0$. This provides the desired $\lambda$. ∎

The remarkable fact which we wish to record here (with a sketch of the proof and examples) is that every (homogeneous) Lévy process may be represented as a sum of a (possibly degenerate) Brownian motion and a limit of independent superpositions of compound Poisson processes with varying jump sizes. Observe that if the sample paths of $\{X_t\}$ are step functions of a fixed jump size $y \ne 0$, then $\{y^{-1}X_t\}$ is a Poisson process by Theorem T.1.2. The idea is that by removing (subtracting) the jumps of various sizes from $\{X_t\}$ one arrives at an independent homogeneous Lévy process with continuous sample paths. According to Theorem T.1.1, therefore, this is a Brownian motion process. More precisely, let $\{X_t\}$ be a homogeneous Lévy process and let $S = [0, T) \times \mathbb{R}^1$ for fixed $T > 0$. Let $\mathscr{B}(S)$ be the Borel sigmafield of $S$, and let $\mathscr{B}_+(S) = \{A \in \mathscr{B}(S): \rho(A, [0, T) \times \{0\}) > 0\}$, where $\rho(A, B) = \inf\{|x - y|: x \in A, y \in B\}$. The space–time jump set of $\{X_t\}$ is defined by

$$J = J(\omega) = \{(t, y) \in S: y = X_t(\omega) - X_{t-}(\omega) \ne 0, \; 0 < t < T\},$$

for $\omega \in \Omega$. Define a random counting measure $\nu$ on $\mathscr{B}_+(S)$ by

$$\nu(A) = \nu(\omega, A) = \#(A \cap J(\omega)), \qquad \omega \in \Omega, \; A \in \mathscr{B}_+(S).$$

For fixed $\omega \in \Omega$, $\nu$ is a measure on the sigmafield $\mathscr{B}_+(S)$, and for each $A \in \mathscr{B}_+(S)$, $\nu(A)$ is a random variable (i.e., $\omega \mapsto \nu(\omega, A)$ is measurable). Moreover, for fixed $A \in \mathscr{B}_+(S)$, the process $\nu(A \cap [0, t] \times \mathbb{R}^1)$, $t \ge 0$, is a Poisson process. To prove this, take $A$ an open subset of $S$ and show that the process is a Lévy process with unit jump step function sample paths and apply Theorem T.1.2.

Define $\bar\nu(A) = E\nu(A)$ for $A \in \mathscr{B}_+(S)$. Then $\bar\nu$ is a measure on $\mathscr{B}_+(S)$; countable additivity can be proved from countable additivity of $\nu$ by Lebesgue's Monotone Convergence Theorem. Now observe that for each fixed $B \in \mathscr{B}(\mathbb{R}^1\setminus\{0\})$, the set function $C \mapsto \bar\nu(C \times B)$ is translation invariant on $\mathscr{B}([0, T))$. Therefore, $\bar\nu(C \times B)$ is a multiple (dependent on $B$) of the Lebesgue measure $|C|$ of $C$, i.e., $\bar\nu(C \times B) = K(B)|C|$, where $K(B)$ is the multiplying factor. Notice now that since $\bar\nu$ is a measure, $K$ must also be a measure on $\mathscr{B}(\mathbb{R}^1\setminus\{0\})$. Define

$$\nu_t(B) = \nu([0, t] \times B), \qquad B \in \mathscr{B}(\mathbb{R}^1\setminus\{0\}).$$

Since there are at most a finite number of jumps in $[0, t]$ of size $1/n$ or larger, $\int_{\{|y| \ge 1/n\}} y\,\nu_t(dy)$ is well-defined. However, in general, its limit may not exist pathwise. By appropriate centering, a limit may be shown to exist in $L^2$. Finally, one may prove that this process built up from jumps is independent of the Brownian motion component. In this way one obtains the following.

Theorem T.1.3. Let $\{X_t\}$ be a homogeneous Lévy process. Then there is a standard Brownian motion $\{B_t\}$ independent of $\{\nu_t(B): B \in \mathscr{B}(\mathbb{R}^1\setminus\{0\}), \; 0 \le t < T\}$ such that

$$X_t = mt + \sigma B_t + \lim_{\varepsilon \downarrow 0}\left\{\int_{\{|y| > \varepsilon\}} y\,\nu_t(dy) - t\int_{\{|y| > \varepsilon\}}\frac{y}{1 + y^2}\,K(dy)\right\}$$

in distribution, where $m \in \mathbb{R}^1$, $\sigma^2 \ge 0$, and for fixed $B \in \mathscr{B}(\mathbb{R}^1\setminus\{0\})$, $\nu_t(B)$ is a Poisson process in $t$.

Corollary. A probability distribution $Q$ on $\mathbb{R}^1$ is infinitely divisible if and only if the characteristic function $\hat Q(\xi) = \int e^{i\xi x}Q(dx)$, $\xi \in \mathbb{R}^1$, can be represented as

$$\hat Q(\xi) = \exp\left\{im\xi - \frac{\sigma^2\xi^2}{2} + \int_{\mathbb{R}^1\setminus\{0\}}\left[e^{i\xi y} - 1 - \frac{i\xi y}{1 + y^2}\right]K(dy)\right\}$$

for some $m \in \mathbb{R}^1$, $\sigma^2 \ge 0$, where $K$ is a nonnegative measure on $\mathbb{R}^1\setminus\{0\}$.

Example 1. $K = 0$. Then $\{X_t\}$ is Brownian motion with drift $m$ and diffusion coefficient $\sigma^2$.

Example 2. Suppose that

$$r = -\lim_{\varepsilon \downarrow 0}\int_{\{|y| > \varepsilon\}}\frac{y}{1 + y^2}\,K(dy)$$

exists. Then

$$X_t = (m + r)t + \sigma B_t + \int_{\mathbb{R}^1\setminus\{0\}} y\,\nu_t(dy).$$

In this case

$$Ee^{i\xi X_t} = \exp\left\{i\xi t(m + r) - \frac{\sigma^2 t\xi^2}{2} + t\int_{\mathbb{R}^1\setminus\{0\}}[e^{i\xi y} - 1]\,K(dy)\right\}.$$

Example 3. If $m + r = 0$, $\sigma^2 = 0$, then

$$X_t = \lim_{\varepsilon \downarrow 0}\int_{\{|y| > \varepsilon\}} y\,\nu_t(dy) \quad\text{and}\quad Ee^{i\xi X_t} = \exp\left\{\lim_{\varepsilon \downarrow 0} t\int_{\{|y| > \varepsilon\}}(e^{i\xi y} - 1)\,K(dy)\right\}.$$

Example 4. If $\lambda = K(\mathbb{R}^1) < \infty$, $m + r = 0$, $\sigma^2 = 0$, then

$$X_t = \int_{\mathbb{R}^1\setminus\{0\}} y\,\nu_t(dy), \qquad Ee^{i\xi X_t} = \exp\left\{\lambda t\int (e^{i\xi y} - 1)\,\lambda^{-1}K(dy)\right\},$$

i.e., $\{X_t\}$ is a compound Poisson process.

An important special class of (nondegenerate) homogeneous Lévy processes $\{X_t\}$ have the following basic scaling property: For each $\lambda > 0$, there is a positive constant $c_\lambda > 0$ such that $\{c_\lambda X_{\lambda t}\}$ and $\{X_t\}$ have the same distribution. These are the Lévy stable processes. In view of the Lévy–Khinchine formula we have the following theorem.

Theorem T.1.4. If $\{X_t\}$ is a homogeneous Lévy stable process, then

$$Ee^{i\xi X_t} = e^{imt\xi} \quad\text{or}\quad e^{-\alpha t\xi^2} \quad\text{or}\quad \exp\{-(\alpha t + i\beta t\,\operatorname{sgn}\xi)|\xi|^\theta\}$$

for some $\alpha > 0$, $\beta \in \mathbb{R}^1$, $0 < \theta < 2$.

Observe that in the first case $X_t = mt$, $t \ge 0$, is a.s. constant and $c_\lambda = \lambda^{-1}$. In the second case $\{X_t\}$ is Brownian motion with zero drift and $c_\lambda = \lambda^{-1/2}$. In the third case we have $c_\lambda = \lambda^{-1/\theta}$, and examples are the Cauchy process ($\theta = 1$) and the first-passage-time process for Brownian motion ($\theta = \tfrac{1}{2}$).


Thorough treatments of processes with independent increments may be found in
K. Ito (1984), Lectures on Stochastic Processes, Tata Institute of Fundamental
Research, distributed by Springer-Verlag, New York, and in I. I. Gikhman and
A. V. Skorokhod (1969), Introduction to the Theory of Random Processes,
W. B. Saunders, Philadelphia. The coupling arguments follow T. C. Brown (1984),
"Poisson Approximations and the Definition of the Poisson Process," Amer. Math.
Monthly, 91, pp. 116-123.
3. (Fractional Brownian Motion and Self-Similarity) The scaling property (or self-similarity) of Lévy processes was extended to more general processes (including diffusions) in a classic paper by John Lamperti (1962), "Semistable Stochastic Processes," Trans. Amer. Math. Soc., 104, pp. 62–78. Lamperti used the term semistable to describe the class of processes $\{X_t\}$ with the property that for each $\lambda > 0$ there is a positive scale coefficient $c_\lambda$ and a location parameter $b_\lambda$ such that $\{c_\lambda X_{\lambda t} + b_\lambda\}$ and $\{X_t\}$ have the same distribution. The defining property is often referred to as simple scaling or self-similarity. Stochastic processes with this property are of substantial current interest in many branches of mathematics and the physical sciences (see references below as well as references therein). One family of such processes of some notoriety is the class of fractional Brownian motions, defined as follows. Let $0 < \beta < 1$ be a fixed parameter and define, for each $s, t \ge 0$,

$$\rho_\beta(s, t) = \{|s|^{2\beta} + |t|^{2\beta} - |s - t|^{2\beta}\}/2. \qquad (T.1.9)$$

Then $\rho_\beta$ is a positive-definite symmetric function. To see this, one need only check that there is a family of square-integrable functions $\gamma_\beta(t, \cdot)$ on the real-number line such that for each $s$ and $t$, $\rho_\beta(s, t)$ may be represented as an $L^2$ inner product $\langle\gamma_\beta(s, \cdot), \gamma_\beta(t, \cdot)\rangle$, namely,

$$\gamma_\beta(t, x) = |t - x|^{\beta - 1/2}\operatorname{sign}(t - x) + |x|^{\beta - 1/2}\operatorname{sign}(x). \qquad (T.1.10)$$

One may then check the positive-definiteness using the bilinearity of the inner product; see, for example, M. Ossiander and E. Waymire (1989), "On Certain Positive Definite Kernels," Proc. Amer. Math. Soc., 107, pp. 487–492.

The fractional Brownian motion $\{Z_t\}$ with exponent $\beta$ may now be most simply defined as a mean-zero Gaussian process starting at 0 such that $Z_s$ and $Z_t$ have covariance given by $\rho_\beta(s, t)$. This class of processes was studied by Benoit Mandelbrot and others in connection with the Hurst effect problem of Section I.14; see B. Mandelbrot (1982), The Fractal Geometry of Nature, Freeman, San Francisco (and references therein), and J. Feder (1988), Fractals, Plenum Press, New York. Additional reference: M. Taqqu (1986), "A Bibliographical Guide to Self-Similar Processes and Long Range Dependence," in Dependence in Statistics and Probability (E. Eberlein, M. Taqqu, eds.), Birkhäuser, Boston.


Theoretical Complements to Section IV.5


1. (The Upcrossing Inequality and (Sub)Martingale Convergence) A fundamental property of monotone sequences of real numbers is that boundedness of the sequence implies the existence of a limit. An analogous property of submartingales also holds, as will be developed below.

Consider a sequence $\{Z_n\}$ of real-valued random variables and sigmafields $\mathscr{F}_1 \subset \mathscr{F}_2 \subset \cdots$, such that $Z_n$ is $\mathscr{F}_n$-measurable. Let $a < b$ be an arbitrary pair of real numbers. An upcrossing of the interval $(a, b)$ by $\{Z_n\}$ is a passage to a value equal to or exceeding $b$ from an earlier value equal to or below $a$. It is convenient to look at the corresponding process $\{X_n := (Z_n - a)^+\}$, where $(Z_n - a)^+ = \max\{Z_n - a, 0\}$. The upcrossings of $(a, b)$ by $\{Z_n\}$ are the upcrossings of $(0, b - a)$ by $\{X_n\}$. The successive upcrossing times $\eta_{2k}$ $(k = 1, 2, \ldots)$ of $\{X_n\}$ are defined by $\eta_1 := \inf\{n \ge 1: X_n = 0\}$, $\eta_2 := \inf\{n \ge \eta_1: X_n \ge b - a\}$, $\eta_{2k+1} := \inf\{n \ge \eta_{2k}: X_n = 0\}$, $\eta_{2k+2} := \inf\{n \ge \eta_{2k+1}: X_n \ge b - a\}$. Then the $\eta_k$ are $\{\mathscr{F}_n\}$-stopping times. Fix a positive integer $N$ and define $\tau_k := \eta_k \wedge N = \min\{\eta_k, N\}$, $k = 1, 2, \ldots$. Then the $\tau_k$ are stopping times. Also, $\tau_k = N$ for $k > [N/2]$ (the integer part of $N/2$) so that $X_{\tau_k} = X_N$ for $k > [N/2]$.

Let $U_N \equiv U_N(a, b)$ denote the number of upcrossings of $(a, b)$ of $\{Z_n\}$ by time $N$. That is,

$$U_N(a, b) := \sup\{k \ge 1: \eta_{2k} \le N\}, \qquad (T.5.1)$$

with the convention that the supremum over an empty set is 0. Thus $U_N$ is also the number of upcrossings of $(0, b - a)$ by $\{X_n\}$ in time $N$.
Since $X_{\tau_k} = X_N$ for $k > [N/2]$, one may write (setting $\tau_0 = 1$)

$$X_N - X_1 = \sum_{k=1}^{[N/2]}\big(X_{\tau_{2k-1}} - X_{\tau_{2k-2}}\big) + \sum_{k=1}^{[N/2]}\big(X_{\tau_{2k}} - X_{\tau_{2k-1}}\big). \qquad (T.5.2)$$

To relate (T.5.2) to the number $U_N$, let $v$ denote the largest $k$ such that $\eta_k \le N$, i.e., $v$ is the last time $\le N$ for an upcrossing or a downcrossing. Notice that $U_N = [v/2]$. If $v$ is even, then $X_{\tau_{2k}} - X_{\tau_{2k-1}} \ge b - a$ if $2k \le v$, and $= X_N - X_N = 0$ if $2k > v$. Now suppose $v$ is odd. Then $X_{\tau_{2k}} - X_{\tau_{2k-1}} \ge b - a$ if $2k - 1 < v$, and $= 0$ if $2k - 1 > v$, and $= X_{\tau_{2k}} - 0 \ge 0$ if $2k - 1 = v$. Hence in every case

$$\sum_{k=1}^{[N/2]}\big(X_{\tau_{2k}} - X_{\tau_{2k-1}}\big) \ge [v/2](b - a) = U_N(b - a). \qquad (T.5.3)$$

As a consequence,

$$X_N - X_1 \ge \sum_{k=1}^{[N/2]}\big(X_{\tau_{2k-1}} - X_{\tau_{2k-2}}\big) + (b - a)U_N. \qquad (T.5.4)$$

Observe that (T.5.4) is true for an arbitrary sequence of random variables (or real numbers) $\{Z_n\}$. Assume now that $\{Z_n\}$ is a $\{\mathscr{F}_n\}$-submartingale. Then $\{X_n\}$ is a $\{\mathscr{F}_n\}$-submartingale by convexity. One may show, using the method of proof of Theorem 13.3 of Chapter I, that $\{X_{\tau_k}: k \ge 1\}$ is a submartingale. Hence $EX_{\tau_k}$ is increasing in $k$, so that

$$E\left[\sum_{k=1}^{[N/2]}\big(X_{\tau_{2k-1}} - X_{\tau_{2k-2}}\big)\right] \ge 0.$$

Applying this to (T.5.4) one obtains the following.
Applying this to (T.5.2) one obtains the following.



Proposition T.5.1. (Upcrossing Inequality). Let {Z_n} be an {ℱ_n}-submartingale. For
each pair a < b, the expected number of upcrossings of (a, b) by Z₁, ..., Z_N satisfies
the inequality

EU_N(a, b) ≤ (E(Z_N − a)⁺ − E(Z₁ − a)⁺)/(b − a) ≤ (E|Z_N| + |a|)/(b − a).  (T.5.5)

As an important consequence of this result we get the following theorem.

Theorem T.5.2. (Submartingale Convergence Theorem). Let {Z_n} be a submartingale
such that E|Z_n| is bounded in n. Then {Z_n} converges a.s. to an integrable random
variable Z_∞.

Proof. Let U(a, b) denote the total number of upcrossings of (a, b) by {Z_n}. Then
U_N(a, b) ↑ U(a, b) as N ↑ ∞. Therefore, by the Monotone Convergence Theorem
(Chapter 0),

EU(a, b) = lim_{N↑∞} EU_N(a, b) ≤ sup_N (E|Z_N| + |a|)/(b − a) < ∞.  (T.5.6)

In particular, U(a, b) < ∞ almost surely, so that

P(liminf Z_n < a < b < limsup Z_n) = 0.  (T.5.7)

Since this holds for every pair a, b = a + 1/m with a rational and m a positive integer,
and the set of all such pairs is countable, one must have liminf Z_n = limsup Z_n almost
surely. If lim Z_n is not finite with probability 1, then |Z_n| → ∞ with positive probability,
so that E liminf|Z_n| = ∞. Then, by Fatou's Lemma (Chapter 0), liminf E|Z_n| = ∞, which
contradicts the hypothesis of the theorem. The integrability of |Z_∞| follows from
Fatou's Lemma. ■

Corollary. A nonnegative martingale {Z_n} converges almost surely to a finite limit
Z_∞. Also, EZ_∞ ≤ EZ₁.

Proof. For a nonnegative martingale {Z_n}, |Z_n| = Z_n and therefore sup_n E|Z_n| =
sup_n EZ_n = EZ₁ < ∞. By Fatou's Lemma (Chapter 0), EZ_∞ = E(lim Z_n) ≤ lim EZ_n =
EZ₁. ■

Convergence properties of supermartingales {X_n} are obtained from the
submartingale results applied to {−X_n}.
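
The upcrossing count U_N(a, b) is easy to compute for a concrete path. The following
Python sketch (an illustration of ours, not part of the text) counts upcrossings by
tracking the alternating passage times η_k; the drifted random walk used as input is
an arbitrary choice of submartingale.

```python
import numpy as np

def upcrossings(z, a, b):
    """Count completed upcrossings of (a, b) by z_1, ..., z_N, i.e., the
    number U_N(a, b) of passages from a value <= a to a later value >= b,
    equivalently upcrossings of (0, b - a) by X_n = (Z_n - a)^+."""
    x = np.maximum(np.asarray(z, dtype=float) - a, 0.0)
    count, below = 0, False
    for xn in x:
        if not below and xn == 0.0:      # X_n = 0, i.e., Z_n <= a (times eta_1, eta_3, ...)
            below = True
        elif below and xn >= b - a:      # X_n >= b - a, i.e., Z_n >= b (times eta_2, eta_4, ...)
            count += 1
            below = False
    return count

# illustrative submartingale: a random walk with upward drift
rng = np.random.default_rng(0)
z = np.cumsum(rng.choice([-1, 1], size=10_000, p=[0.45, 0.55]))
print(upcrossings(z, a=0, b=5))
```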

2. (Martingales and Sample Path Regularity) The control over fluctuations available
through Doob's Inequality (Section 1.13) and the Upcrossing Inequality (theoretical
complement 1) for (sub/super)martingales makes it possible to find versions of such
processes with very regular sample path behavior. In this connection the martingale
property is of fundamental importance for the construction of models of stochastic
processes having manageable sample path properties (cf. theoretical complements 1.1
and 5.3). The basic result is the following. Recall the meaning of continuity in
probability from theoretical complement 1.2.



Theorem T.5.3. Let {X_t} be a submartingale or supermartingale with respect to an
increasing family of σ-fields {ℱ_t}. Suppose that {X_t} is continuous in probability
at each t ≥ 0. Then there is a process {X̃_t} such that:
(i) (Stochastic Equivalence) {X̃_t} is equivalent to {X_t} in the sense that
P(X̃_t = X_t) = 1 for each t ≥ 0.
(ii) (Sample Path Regularity) With probability 1 the sample paths of {X̃_t} are
bounded on compact intervals a ≤ t ≤ b (a, b ≥ 0), and are right-continuous and
have left-hand limits at each t ≥ 0.

Proof. Fix T > 0 and let Q_T denote the set of rational numbers in [0, T]. Write
Q_T = ∪_{n=1}^∞ R_n, where each R_n is a finite subset of [0, T] and T ∈ R₁ ⊂ R₂ ⊂ ⋯. By
Doob's Maximal Inequality (Section 1.13) we have

P(max_{t∈R_n} |X_t| > λ) ≤ E|X_T|/λ, n = 1, 2, ....

Therefore,

P(sup_{t∈Q_T} |X_t| > λ) = lim_{n→∞} P(max_{t∈R_n} |X_t| > λ) ≤ E|X_T|/λ.

In particular, the paths of {X_t: t ∈ Q_T} are bounded with probability 1. Let (c, d) be
any interval in ℝ and let U^(T)(c, d) denote the number of upcrossings of (c, d) by the
process {X_t: t ∈ Q_T}. Then U^(T)(c, d) is the limit of the number U^(n)(c, d) of upcrossings
of (c, d) by {X_t: t ∈ R_n} as n → ∞. By the upcrossing inequality (theoretical complement
1), one has

EU^(n)(c, d) ≤ (E|X_T| + |c|)/(d − c).

Since the U^(n)(c, d) are nondecreasing with n, it follows that U^(T)(c, d) is a.s. finite.
Taking unions over all (c, d), with c, d rational, it follows that with probability one
{X_t: t ∈ Q_T} has only finitely many upcrossings of any interval. In particular, therefore,
left- and right-hand limits must exist at each t < T a.s. To construct a right-continuous
version of {X_t}, define X̃_t = lim_{s↓t, s∈Q_T} X_s for t < T. That {X̃_t} is in fact stochastically
equivalent to {X_t} now follows from continuity in probability; i.e., X̃_t = lim_{s↓t} X_s = X_t,
since a.s. limits and limits in probability must a.s. coincide. Since T is arbitrary, the
proof is complete. ■

3. (Markov Processes and Sample Path Regularity) The martingale regularity theorem
of theoretical complement 2 can be applied to the problem of constructing Markov
processes with regular sample path properties.

Theorem T.5.4. Let {X_t: t ≥ 0} be a Markov process with a compact metric state
space S and transition probability function

p(t; x, B) = P(X_{t+s} ∈ B | X_s = x), x ∈ S, s, t ≥ 0, B ∈ ℬ(S),

such that
(i) (Uniform Stochastic Continuity) For each ε > 0, 1 − p(t; x, B_ε(x)) = o(1) as t → 0⁺,
uniformly for x ∈ S, where B_ε(x) is the ball centered at x of radius ε > 0.
(ii) (Feller Property) For each (bounded) continuous function f, the function
x ↦ ∫_S f(y)p(t; x, dy) = E_x f(X_t) is continuous.
Then there is a version {X̃_t} of {X_t} a.s. having right-continuous sample paths
with left-hand limits at each t.

Proof. It is enough to prove that {X_t} a.s. has left- and right-hand limits at each t,
for then {X_t} can be modified as X̃_t = lim_{s↓t} X_s, which by stochastic continuity will
provide a version of {X_t}. It is not hard to check stochastic continuity from (i) and
(ii). Let f ∈ C(S) be an arbitrary continuous function on S. Consider the semigroup
{T_t} acting on C(S) by T_t f(x) = E_x f(X_t), x ∈ S, t ≥ 0. For λ > 0, write

R_λ f(x) = ∫₀^∞ e^{−λs} T_s f(x) ds, x ∈ S;

R_λ (λ > 0) is called the resolvent (Laplace transform) of the semigroup {T_t}. A basic
property of the resolvent is that λR_λ behaves like the identity operator on C(S) for
λ large, in the sense that

‖f − λR_λ f‖ := sup_{x∈S} |f(x) − λR_λ f(x)| = sup_{x∈S} |∫₀^∞ λe^{−λs}(f(x) − T_s f(x)) ds|
≤ ∫₀^∞ λe^{−λs} ‖f − T_s f‖ ds = ∫₀^∞ e^{−τ} ‖f − T_{λ^{−1}τ} f‖ dτ.  (T.5.8)

For each τ > 0, by the properties (i) and (ii), ‖f − T_{λ^{−1}τ} f‖ → 0 as λ → ∞, and the
integrand is bounded by 2e^{−τ}‖f‖ since ‖T_t f‖ ≤ ‖f‖ for any t ≥ 0. By Lebesgue's
Dominated Convergence Theorem (Chapter 0) it follows that ‖f − λR_λ f‖ → 0 as
λ → ∞; i.e., λR_λ f → f uniformly on S as λ → ∞. With this property of the resolvent
in mind, consider the process {Y_t} defined by

Y_t = e^{−λt} R_λ f(X_t), t ≥ 0,  (T.5.9)

where f ∈ C(S) is a fixed but arbitrary nonnegative function on S. Then {Y_t} is a
supermartingale with respect to ℱ_t = σ{X_s: s ≤ t}, t ≥ 0, since E|Y_t| < ∞, Y_t is
ℱ_t-measurable, T_t f(x) ≥ 0 (x ∈ S, t ≥ 0), and

E{Y_{t+h} | ℱ_t} = e^{−λ(t+h)} E{R_λ f(X_{t+h}) | ℱ_t} = e^{−λ(t+h)} T_h R_λ f(X_t)
= e^{−λ(t+h)} ∫₀^∞ e^{−λs} T_{s+h} f(X_t) ds = e^{−λt} ∫_h^∞ e^{−λu} T_u f(X_t) du
≤ e^{−λt} ∫₀^∞ e^{−λu} T_u f(X_t) du = Y_t.  (T.5.10)
Applying the martingale regularity result of Theorem T.5.3, we obtain a version {Ỹ_t}
of {Y_t} whose sample paths are a.s. right-continuous with left-hand limits at each t.
Thus, the same is true for {λe^{λt}Y_t} = {λR_λ f(X_t)}. Since λR_λ f → f uniformly as λ → ∞,
the process {f(X_t)} must, therefore, a.s. have left- and right-hand limits at each t. The
same will be true for any f ∈ C(S), since one can write f = f⁺ − f⁻ with f⁺ and
f⁻ continuous nonnegative functions on S. So we have that for each f ∈ C(S), {f(X_t)}
is a process whose left- and right-hand limits exist at each t (with probability 1). As
remarked at the outset, it will be enough to argue that this means that the process
{X_t} will a.s. have left- and right-hand limits. This is where compactness of S enters
the argument. Since S is a compact metric space, it has a countable dense subset
{x_n}. The functions f_n: S → ℝ¹ defined by f_n(x) = ρ(x, x_n), x ∈ S, are continuous for
the metric ρ, and separate points of S in the sense that if x ≠ y, then for some n,
f_n(x) ≠ f_n(y). In view of the above, for each n, {f_n(X_t)} is a process whose left- and
right-hand limits exist at each t with probability 1. Thus, a countable union of
events of probability 0 having probability 0, it follows that, with probability 1, for
all n the left- and right-hand limits exist at each t for {f_n(X_t)}. But this means that
with probability one the left- and right-hand limits exist at each t for {X_t}, since the
f_n's separate points; i.e., if either limit, say the left, fails to exist at some t′, then, by
compactness of S, the sample path t ↦ X_t must have at least two distinct limit points
as t → t′⁻, contradicting the corresponding property for all the processes {f_n(X_t)},
n = 1, 2, .... ■
In the case that S is locally compact, one may adjoin a point at infinity, denoted
Δ (∉ S), to S. The topology of the one-point compactification on S̄ = S ∪ {Δ} defines
a neighborhood system for Δ by complements of compact subsets of S. Let ℬ(S̄) be
the σ-field generated by ℬ(S) ∪ {{Δ}}. The transition probability function p(t; x, B)
(t ≥ 0, x ∈ S, B ∈ ℬ(S)) is extended to p̄(t; x, B) (t ≥ 0, x ∈ S̄, B ∈ ℬ(S̄)) by making Δ
an absorbing state; i.e., p̄(t; Δ, B) = 1 if Δ ∈ B, and p̄(t; Δ, B) = 0 otherwise. If the
conditions of Theorem T.5.4 are fulfilled for p̄, then one obtains
a regular process {X̄_t} with state space S̄. Defining

τ_Δ = inf{t ≥ 0: X̄_t = Δ},

the basic Theorem T.5.4 provides a process {X̄_t: t < τ_Δ} with state space S whose
sample paths are right-continuous with left-hand limits at each t < τ_Δ. A detailed
treatment of this case can be found in K. L. Chung (1982), Lectures from Markov
Processes to Brownian Motion, Springer-Verlag, New York.
While the one-point compactification is natural on analytic grounds, it is not
always probabilistically natural, since in general a given process may escape the state
space in a variety of ways. For example, in the case of birth-death processes with
state space S = ℤ one may want to consider escapes through the positive integers as
distinct from escapes through the negative integers. These matters are beyond the
present scope.
4. (Tauberian Theorem) According to (5.44), in the case r ≥ 2 the generating function
h(v) can be expressed in the form

h(v) ~ (1 − v)^{−p} K((1 − v)^{−1}) as v → 1⁻,  (T.5.12)

where p ≥ 0 and K(x) is a slowly varying function as x → ∞; that is, K(tx)/K(t) → 1
as t → ∞ for each x > 0. Examples of slowly varying functions at infinity are constants,
various powers of |log x|, and the coefficient appearing in (5.44) as a function of
x = (1 − v)^{−1}. In the case r = 1, the generating function vh′(v) for {nh_n} is also of
this form asymptotically, with p = 1. Likewise, in the case of (5.37) one can differentiate
to get that the generating function of {(n + 1)P(M_L = n + 1)} is asymptotically of the
form ½(1 − v)^{−1/2} as v → 1⁻.

Let â(v) be the generating function of a sequence {a_n} of nonnegative real numbers,
and suppose that

â(v) = Σ_{k=0}^∞ a_k v^k  (T.5.13)

converges for 0 ≤ v < 1. The Tauberian theorem provides the asymptotic growth of
the sums,

Σ_{k=0}^n a_k ~ (1/Γ(p + 1)) n^p K(n) as n → ∞,  (T.5.14)

from that of â(v) of the form

â(v) ~ (1 − v)^{−p} K((1 − v)^{−1}) as v → 1⁻,  (T.5.15)

as in (T.5.12). Under additional regularity (e.g., monotonicity) of the terms {a_n} it is
often possible to deduce the asymptotic behavior of the individual terms (i.e., the
differenced sums) from this as

a_n ~ (1/Γ(p)) n^{p−1} K(n) as n → ∞.  (T.5.16)

An especially simple proof can be found in W. Feller (1971), An Introduction to
Probability Theory and Its Applications, Vol. II, Wiley, New York, pp. 442-447.
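
As a numerical illustration (ours, not from the text), take a_k = √k, for which
â(v) ~ Γ(3/2)(1 − v)^{−3/2} as v → 1⁻; thus p = 3/2 and K is the constant Γ(3/2) in
(T.5.15), and the conclusions (T.5.14) and (T.5.16) can be checked directly:

```python
from math import gamma, sqrt

p, K = 1.5, gamma(1.5)        # a-hat(v) ~ K (1 - v)^{-p}, as in (T.5.15)
n = 200_000
partial_sum = sum(sqrt(k) for k in range(n + 1))

print(partial_sum / (K * n**p / gamma(p + 1)))   # (T.5.14): ratio -> 1
print(sqrt(n) / (K * n**(p - 1) / gamma(p)))     # (T.5.16): ratio -> 1
```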
Using the Tauberian theorem one can compute the asymptotic form of the rth
moments (r ≥ 1) of λn^{−1/2}L given M_L = n, as n → ∞. In fact, one obtains that for r ≥ 1,

E{n^{−r/2}L^r | M_L = n} ~ (2/λ)^r μ*₊(r) as n → ∞,  (T.5.17)

where μ*₊(r) is the rth moment of the maximum of the Brownian excursion process
given in Exercise 12.8(iii) of Chapter I. Moreover, in view of the last equation in the
hint following that exercise, one can check that the moments μ*₊(r) uniquely
determine the distribution function, by checking that the moment-generating function
with these coefficients has an infinite radius of convergence. From this one obtains
convergence in distribution to that of the maximum of an (appropriately scaled)
Brownian excursion. This result, which was motivated by considerations of the main
channel length as an extreme value of a river network, can be found in V. K. Gupta,
O. Mesa and E. Waymire (1990), "Tree Dependent Extreme Values: The Exponential
Case," J. Appl. Probability, in press. A more comprehensive treatment of this problem
is given by R. Durrett, H. Kesten and E. Waymire (1989), "Random Heights of
Weighted Trees," MSI Report, Cornell University, Ithaca. Problems of this type also
occur in the analysis of tree-search algorithms in computer science (see P. Flajolet
and A. M. Odlyzko (1982), "The Average Height of Binary Trees and Other Simple
Trees," J. Comput. System Sci., 25, pp. 171-213).

Theoretical Complements to Section IV.11

1. A (noncanonical) probability space for the voter model is furnished by the graphical
percolation construction. This approach was created by T. E. Harris (1978), "Additive
Set-Valued Markov Processes and Percolation Methods," Ann. Probab., 6, pp.
355-378. To construct Ω, first, for each m ∈ ℤᵈ and each n belonging to the boundary
set ∂(m) of nearest neighbors of m, let Ω_{m,n} denote the collection of right-continuous
nondecreasing unit-jump step functions ω_{m,n}(t), t ≥ 0, such that ω_{m,n}(t) → ∞ as t → ∞.
By the term "step function," it is implied that there are at most finitely many jumps
in any bounded interval (nonexplosive). Let P_{m,n} be the (canonical) Poisson
probability distribution on (Ω_{m,n}, ℱ_{m,n}) with intensity 1, where ℱ_{m,n} is the σ-field
generated by events of the form {ω ∈ Ω_{m,n}: ω(t) ≤ k}, t ≥ 0, k = 0, 1, 2, .... Define
(Ω′, ℱ′, P′) as the product probability space Ω′ = ∏ Ω_{m,n}, ℱ′ = ∏ ℱ_{m,n}, P′ = ∏ P_{m,n},
where the products ∏ are over m ∈ ℤᵈ, n ∈ ∂(m). To get Ω, remove from Ω′ any
ω = ((ω_{m,n}(t): t ≥ 0): m ∈ ℤᵈ, n ∈ ∂(m)) such that two or more of the ω_{m,n}(·) have a jump
at the same time t. Then ℱ (the trace of ℱ′ on Ω) and P are obtained by the corresponding
restriction to Ω (and measure-theoretic completion). The percolation flow structure
can now be defined on (Ω, ℱ, P) as in the text, but sample-pointwise for each
ω = ((ω_{m,n}(t): t ≥ 0): m ∈ ℤᵈ, n ∈ ∂(m)) ∈ Ω. For a given initial configuration η ∈ S let
D = D(η) := {m ∈ ℤᵈ: η_m = −1}. A sample path of the process started at η, denoted
σ^{D(η)}(t, ω) = (σ_n^{D(η)}(t, ω): n ∈ ℤᵈ), t ≥ 0, ω ∈ Ω, is given in terms of the percolation flow
on ω by

σ_n^{D(η)}(t, ω) = −1 if the flow on ω reaches (n, t) from some (m, 0), m ∈ D,
              = +1 otherwise.  (T.11.1)
The Markov property follows from the following basic property of the construction:

σ^{D(η)}(t + s, ω) = σ^{D(σ^{D(η)}(s,ω))}(t, U_s ω),  (T.11.2)

where, for ω ∈ Ω as above, U_s: Ω → Ω is given by

U_s(ω) = ((ω_{m,n}(t + s) − ω_{m,n}(s): t ≥ 0): m ∈ ℤᵈ, n ∈ ∂(m)).  (T.11.3)

The Markov property follows from (T.11.2) as a consequence of the invariance of P
under the map U_s: Ω → Ω for each s; this is easily checked for the Poisson distribution
with constant intensity and, because of independence, it is enough.
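
For readers who wish to experiment, the following Python sketch (ours; a finite torus
replaces ℤᵈ, so it only illustrates the mechanism and is not the construction itself)
runs the voter model by attaching a rate-1 Poisson clock to each ordered nearest-neighbor
pair (m, n) and letting n adopt the opinion at m when that clock rings.

```python
import numpy as np

rng = np.random.default_rng(1)
L, T = 50, 20.0                        # torus size and time horizon (illustrative)
sigma = rng.choice([-1, 1], size=L)    # initial configuration eta
t = 0.0
while True:
    t += rng.exponential(1.0 / (2 * L))   # next ring among 2L independent rate-1 clocks
    if t > T:
        break
    m = int(rng.integers(L))              # the ringing clock is a uniform pair (m, n)
    n = (m + rng.choice([-1, 1])) % L
    sigma[n] = sigma[m]                   # site n adopts the opinion at m
print(sigma)
```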
2. The distribution of the process {σ(t)} started at η ∈ S = {−1, 1}^{ℤᵈ} is the induced
probability measure P_η on (S, ℱ) defined by

P_η(B) = P({σ^{D(η)}(t)} ∈ B), B ∈ ℱ.  (T.11.4)

A metric for S, which gives it the so-called product-space topology, is defined by

ρ(σ, η) = Σ_{n∈ℤᵈ} 2^{−|n|} |σ_n − η_n|,  (T.11.5)

where |n| = |(n₁, ..., n_d)| := max(|n₁|, ..., |n_d|). Note that this metric is possible largely
because of the denumerability of ℤᵈ. In any case, this makes the facts that (i) ℱ is
the Borel σ-field and (ii) S is compact, rather straightforward exercises for this
metric (topology). Compactness is in fact true for arbitrary products of compact
spaces under the product topology, but this is a much deeper result (called Tychonoff's
Theorem).
To prove the Feller property, it is sufficient to consider continuity of mappings
of the form σ ↦ P_σ(σ_n(t) = 1, n ∈ F) at η ∈ S, for fixed t > 0 and finite sets F ⊂ ℤᵈ.
Inclusion-exclusion principles can then be used to get the continuity of σ ↦ P_σ(B)
for all finite-dimensional sets B ∈ ℱ. The rest will follow from compactness (tightness).
For simplicity, first consider the map σ ↦ P_σ(σ_n(t) = −1), for F = {n}. Open
neighborhoods of η ∈ S are provided by sets of the form

N_Λ(η) = {σ ∈ S: σ_m = η_m for all m ∈ Λ},  (T.11.6)

where Λ is a finite subset of ℤᵈ, say

Λ = {m = (m₁, ..., m_d) ∈ ℤᵈ: |m_i| ≤ r}, r > 0.  (T.11.7)

Now, in view of the simple percolation structure for the voter model, regardless of
the initial configuration η, one has

σ_n^{(η)}(t) = η_{m(n,t)} a.s.,  (T.11.8)

where m(n, t) is some (random) site, which does not depend on the initial configuration,
obtained by following backward through the percolation diagram, down and against
the direction of the arrows. Thus, if {η_k} is a sequence of configurations in S that converges
to η in the metric ρ as k → ∞, then σ_n^{(η_k)}(t) → σ_n^{(η)}(t) a.s. as k → ∞. Thus, by
Lebesgue's Dominated Convergence Theorem (Chapter 0), one has

1 − 2P_{η_k}(σ_n(t) = −1) = E σ_n^{(η_k)}(t) → E σ_n^{(η)}(t) = 1 − 2P_η(σ_n(t) = −1).

Thus, P_{η_k}(σ_n(t) = −1) → P_η(σ_n(t) = −1) as k → ∞.


3. A real-valued function φ(m), m ∈ ℤᵈ, such that for each m ∈ ℤᵈ,

(1/2d) Σ_{n∈∂(m)} φ(n) = φ(m),  (T.11.9)

where ∂(m) denotes the set of nearest neighbors of m, is said to be harmonic (with
respect to the discrete Laplacian on ℤᵈ). Note that (T.11.9) may be expressed as

φ(Z₀) = E_m φ(Z₁) = E_m φ(Z_k), k ≥ 1,  (T.11.10)

where {Z_k} is the simple symmetric random walk starting at Z₀ = m.

Theorem T.11.1. (Boundedness Principle). Let φ be a real-valued bounded harmonic
function on ℤᵈ. Then φ is a constant function.

Proof. Suppose that {(Z_k, Z′_k)} is a coupling of two copies of {Z_k}; i.e., a Markov
chain on S × S such that the marginal processes {Z_k} and {Z′_k} are each Markov
chains on S with the same transition law as {Z_k}. Then, if τ := inf{k ≥ 1: Z_k = Z′_k} < ∞
a.s., one may define Z′_k = Z_k for all k ≥ τ without disrupting the property that the
process {(Z_k, Z′_k)} is a coupling of {Z_k}. With this as the case, by the boundedness
of φ and (T.11.10), one obtains

|φ(n) − φ(m)| = |E_n φ(Z_k) − E_m φ(Z_k)| = |E_{(n,m)}{φ(Z_k) − φ(Z′_k)}|
≤ E_{(n,m)}|φ(Z_k) − φ(Z′_k)| ≤ 2B P_{(n,m)}(Z_k ≠ Z′_k) = 2B P_{(n,m)}(τ > k),  (T.11.11)

where B = sup_x |φ(x)|. Letting k → ∞ it would then follow that φ(n) = φ(m). The
success of this approach rests on the construction of a coupling {(Z_k, Z′_k)} with τ < ∞
a.s.; such a coupling is called a successful coupling.
The independent coupling is the simplest to try. To see how it works, take d = 1
and let {Z_k} and {Z′_k} be independent simple symmetric random walks on ℤ starting
at n, m, with n − m even. Then {(Z_k, Z′_k)} is a successful coupling, since {Z_k − Z′_k} is easily
checked to be a recurrent random walk using the results of Chapters II and IV, or
the theoretical complement to Section 3 of Chapter I. This would also work for d = 2,
but it fails for d ≥ 3 owing to transience. In any case, here is another coupling that
is easily checked to be successful for any d. Let {(Z_k, Z′_k)} be the Markov chain on
ℤᵈ × ℤᵈ starting at (n, m) and having the stationary one-step transition probabilities
furnished by the following rules of motion. At each time, first select a (common)
coordinate axis at random (each having probability 1/d). From (n, m) such that the
coordinates of n, m differ along the selected axis, independently select directions
(each having probability ½) along the selected axis for displacements of the components
n, m. If, on the other hand, the coordinates of n, m agree along the selected axis, then
randomly select a common direction along the axis for displacement of both
components. Then the process {(Z_k, Z′_k)} with this transition law is a coupling.
That it is successful for all (n, m) whose coordinates all have the same parity follows
from the recurrence of the simple symmetric random walk on ℤ¹, since it guarantees
with probability 1 that each of the d coordinates will eventually line up. Thus, one
obtains φ(n) = φ(m) for all n, m whose coordinates are each of the same parity. This
is enough by (T.11.9) and the maximum principle for harmonic functions described
below.
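
The coordinatewise coupling just described is easily simulated. The sketch below
(ours, not from the text) estimates the coupling time τ for d = 3 and starting points
whose coordinates have equal parity; all parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def coupling_time(n, m, max_steps=1_000_000):
    z, zp = np.array(n, dtype=int), np.array(m, dtype=int)
    d = len(z)
    for k in range(1, max_steps + 1):
        axis = int(rng.integers(d))           # common axis, probability 1/d each
        if z[axis] != zp[axis]:
            z[axis] += rng.choice([-1, 1])    # independent displacements
            zp[axis] += rng.choice([-1, 1])
        else:
            step = rng.choice([-1, 1])        # common displacement
            z[axis] += step
            zp[axis] += step
        if np.array_equal(z, zp):
            return k
    return max_steps                          # not coupled within the budget

# starting points with coordinates of equal parity, d = 3
times = [coupling_time((0, 0, 0), (4, 2, 6)) for _ in range(200)]
print(sum(times) / len(times))                # empirically finite coupling times
```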

The following two results provide fundamental properties of harmonic functions
given on a suitable domain D̄. Their proofs are obtained by iterating the averaging
property out to the boundary, just as in the one-dimensional case of Exercise 3.16
of Chapter I. First some preliminaries about the domain. We require that D̄ = D ∪ ∂D,
where D, ∂D are finite, disjoint subsets of ℤᵈ such that

(i) ∂(n) ⊂ D̄ for each n ∈ D, where ∂(n) denotes the set of nearest neighbors of n;
(ii) ∂(m) ∩ D ≠ ∅ for each m ∈ ∂D;
(iii) for each n, m ∈ D there is a path of successive nearest neighbors n₁, n₂, ..., n_k
in D such that n₁ ∈ ∂(n), n_k ∈ ∂(m).

Theorem T.11.2. (Maximum Principle). A real-valued function φ on a domain D̄, as
defined above, that is harmonic on D, i.e.,

φ(m) = (1/2d) Σ_{n∈∂(m)} φ(n), m ∈ D,

takes its maximum and minimum values on ∂D.

In particular, it follows from the maximum principle that if φ also takes an extreme
value on D, then it must be constant on D̄. This can be used to complete the proof
of Theorem T.11.1 by suitably constructing a domain D̄ realizing any of the 2ᵈ coordinate
parities on D and ∂D desired.

Theorem T.11.3. (Uniqueness Principle). If φ₁ and φ₂ are harmonic functions on D̄,
as defined in Theorem T.11.2, and if φ₁ = φ₂ on ∂D, then φ₁ = φ₂ on D̄.

Proof. To prove the uniqueness principle, simply note that φ₁ − φ₂ is harmonic with
zero boundary values and hence, by the Maximum Principle, zero extreme values. ■

The coupling described in Theorem T.11.1 can be found in T. Liggett (1985),
Interacting Particle Systems, Springer-Verlag, New York, pp. 67-69, together with
another coupling to prove the boundedness principle (Choquet-Deny theorem) for
the more general case of an irreducible random walk on ℤᵈ.
4. Additional references: The voter model was first considered in the papers by P. Clifford
and A. Sudbury (1973), "A Model for Spatial Conflict," Biometrika, 60, pp. 581-588,
and independently by R. Holley and T. M. Liggett (1975), "Ergodic Theorems
for Weakly Interacting Infinite Systems and the Voter Model," Ann. Probab., 3,
pp. 643-663. The approach in Section 11 essentially follows F. Spitzer (1981), "Infinite
Systems with Locally Interacting Components," Ann. Probab., 9, pp. 349-364. The
so-called biased voter (tumor-growth) model originated in T. Williams and R. Bjerknes
(1972), "Stochastic Model for Abnormal Clone Spread Through Epithelial Basal
Layer," Nature, 236, pp. 19-21.
Much of the modern interest in the mathematical theory of infinite systems of
interacting components from the point of view of continuous-time Markov evolutions
was inspired by the fundamental papers of Frank Spitzer (1970), "Interaction of
Markov Processes," Advances in Math., 5, pp. 246-290, and by R. L. Dobrushin
(1971), "Markov Processes With a Large Number of Locally Interacting
Components," Problems Inform. Transmission, 7, pp. 149-164, 235-241. Since then
several books and monographs have been written on the subject, the most
comprehensive being that of T. Liggett (1985), loc. cit. Other modern books and
monographs on the subject are those of F. Spitzer (1971), Random Fields and
Interacting Particle Systems, MAA, Summer Seminar Notes; D. Griffeath (1979),
Additive and Cancellative Interacting Particle Systems, Lecture Notes in Math.,
No. 724, Springer-Verlag, New York; R. Kindermann and J. L. Snell (1980), Markov
Random Fields and Their Applications, Contemporary Mathematics Series, Vol. 1,
AMS, Providence, RI; R. Durrett (1988), Lecture Notes on Particle Systems and
Percolation, Wadsworth, Brooks/Cole, San Francisco.
CHAPTER V

Brownian Motion and Diffusions

1 INTRODUCTION AND DEFINITION

One-dimensional unrestricted diffusions are Markov processes in continuous time
with state space S = (a, b), −∞ ≤ a < b ≤ ∞, having continuous sample paths.
The simplest example is the Brownian motion {Y_t} with drift coefficient μ and
diffusion coefficient σ² > 0 introduced in Chapter I as a limit of simple random
walks. In this case, S = (−∞, ∞). The transition probability distribution
p(t; x, dy) of Y_{s+t} given Y_s = x has a density given by

p(t; x, y) = (2πσ²t)^{−1/2} exp{−(y − x − μt)²/(2σ²t)}.  (1.1)

Because {Y_t} has independent increments, the Markov property follows directly.
Notice that (1.1) does not depend on s; i.e., {Y_t} is a time-homogeneous Markov
process. As before, by a Markov process we mean one having a
time-homogeneous transition law unless stated otherwise.
One may imagine a Markov process {X_t} that has continuous sample paths
but that is not a process with independent increments. Suppose that, given
X_s = x, for (infinitesimally) small times t, the displacement X_{s+t} − X_s = X_{s+t} − x
has mean and variance approximately tμ(x) and tσ²(x), respectively. Here μ(x)
and σ²(x) are functions of the state x, and not constants as in the case of
{Y_t}. The distinction between {Y_t} and {X_t} is analogous to that between a
simple random walk and a birth-death chain. More precisely, suppose

E(X_{s+t} − X_s | X_s = x) = tμ(x) + o(t),
E((X_{s+t} − X_s)² | X_s = x) = tσ²(x) + o(t),  (1.2)
E(|X_{s+t} − X_s|³ | X_s = x) = o(t),

hold, as t ↓ 0, for every x ∈ S.

Note that (1.2) holds for Brownian motions (Exercise 1). A more general
formulation of the existence of infinitesimal mean and variance parameters,
which does not require the existence of finite moments, is the following. For
every ε > 0 assume that

E((X_{s+t} − X_s) 1_{{|X_{s+t}−X_s|≤ε}} | X_s = x) = tμ(x) + o(t),
E((X_{s+t} − X_s)² 1_{{|X_{s+t}−X_s|≤ε}} | X_s = x) = tσ²(x) + o(t),  (1.2)′
P(|X_{s+t} − X_s| > ε | X_s = x) = o(t),

hold as t ↓ 0.
It is a simple exercise to show that (1.2) implies (1.2)′ (Exercise 2). However,
there are many Markov processes with continuous sample paths for which (1.2)′
holds, but not (1.2).

Example 1. (One-to-One Functions of Brownian Motion). Let {B_t} be a
standard one-dimensional Brownian motion and φ a twice continuously
differentiable, strictly increasing function from (−∞, ∞) onto (a, b). Then
{X_t} := {φ(B_t)} is a time-homogeneous Markov process with continuous sample
paths. Take φ(x) = e^{x³}. Now check that E|X_t| = ∞ for all t > 0, so that (1.2)
does not hold. On the other hand, (1.2)′ does hold (as explained more generally
in Section 3).

Definition 1.1. A Markov process {X_t} on the state space S = (a, b) is said to
be a diffusion with drift coefficient μ(x) and diffusion coefficient σ²(x) > 0, if
(i) it has continuous sample paths, and
(ii) the relations (1.2)′ hold for all x.

If the transition probability distribution p(t; x, dy) has a density p(t; x, y),
then, for (Borel) subsets B of S,

p(t; x, B) = ∫_B p(t; x, y) dy.  (1.3)

It is known that a strictly positive and continuous density exists under
Condition (1.1) below, in the case S = (−∞, ∞) (see theoretical complement
1). Since any open interval (a, b) can be transformed into (−∞, ∞) by a strictly
increasing smooth map, Condition (1.1) may be applied to S = (a, b) after
transformation (see Section 3).
Below, σ(x) denotes either the positive square root of σ²(x) for all x, or the
negative square root for all x.

Condition (1.1). The functions μ(x), σ(x) are continuously differentiable, with
bounded derivatives on S = (−∞, ∞). Also, σ″ exists and is continuous, and
σ²(x) > 0 for all x. If S = (a, b), then assume the above conditions for the
relabeled process under some smooth and strictly increasing transformation
onto (−∞, ∞).

Although the results presented in this chapter are true under (1.2)′, in order
to make the calculations less technical we will assume (1.2). It turns out that
Condition (1.1) guarantees (1.2). From the Markov property the joint density
of X_{t₁}, X_{t₂}, ..., X_{t_n} for 0 < t₁ < t₂ < ⋯ < t_n is given by the product
p(t₁; x, y₁)p(t₂ − t₁; y₁, y₂) ⋯ p(t_n − t_{n−1}; y_{n−1}, y_n). Therefore, for an initial
distribution π,

P_π(X_{t₁} ∈ B₁, ..., X_{t_n} ∈ B_n)
= ∫_S ∫_{B₁} ⋯ ∫_{B_n} p(t₁; x, y₁) ⋯ p(t_n − t_{n−1}; y_{n−1}, y_n) dy_n ⋯ dy₁ π(dx).  (1.4)

As usual P_π denotes the distribution of the process {X_t} for the initial distribution
π. In the case π = δ_x we write P_x in place of P_{δ_x}. Likewise, E_π, E_x are the
corresponding expectations.

Example 2. (Ornstein-Uhlenbeck Process). Let V_t denote the random velocity
of a large solute molecule immersed in a liquid at rest. For simplicity, let V_t
denote the vertical component of the velocity. The solute molecule is subject
to two forces: gravity and friction. It turns out that the gravitational force is
negligible compared to the frictional force exerted on the solute molecule by
the liquid. The frictional force is directed oppositely to the direction of motion
and is proportional to the velocity in magnitude. In the absence of statistical
fluctuations, one would therefore have m(dV_t/dt) = −βV_t, where m is the mass
of the solute molecule and β is the constant of proportionality known as the
coefficient of friction. However, this frictional force −βV_t may be thought of
as the mean of a large number of random molecular collisions. Assuming that
the central limit theorem applies to the superposition of displacements due to
a large number of such collisions, the change in momentum m dV_t over a time
interval (t, t + dt) is approximately Gaussian, provided dt is such that the mean
number of collisions during (t, t + dt) is large. The mean and variance of this
Gaussian distribution are both proportional to the number of collisions, i.e.,
to dt. Therefore, one may take the (local) mean of dV_t to be −(β/m)V_t dt and
the (local) variance to be σ² dt:

E(V_{t+dt} − V_t | V_t = v) = −(β/m)v dt + o(dt),
E((V_{t+dt} − V_t)² | V_t = v) = σ² dt + o(dt).  (1.5)

Therefore, a reasonable model for the velocity process is a diffusion with drift
−(β/m)v and diffusion coefficient σ² > 0, called the Ornstein-Uhlenbeck
process.

An important problem is to determine the transition probabilities for
diffusions having given drift and diffusion coefficients. Various approaches to
this problem will be developed in this chapter, but in the meantime for the
Ornstein-Uhlenbeck process we can check that the transition function given by

p(t; x, y) = [πσ²β^{−1}m(1 − e^{−2βt/m})]^{−1/2} exp{−(y − xe^{−βt/m})²/(σ²β^{−1}m(1 − e^{−2βt/m}))}
(t > 0, −∞ < x, y < ∞),  (1.6)

furnishes the solution. In other words, given V₀ = x, V_t is Gaussian with mean
xe^{−βt/m} and variance ½σ²β^{−1}m(1 − e^{−2βt/m}). From this one may check (1.5)
directly.
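
As a quick check (ours, not part of the text), one may compare an Euler
discretization of the Ornstein-Uhlenbeck dynamics with the Gaussian mean and
variance read off from (1.6); the values of β/m, σ, and the step size below are
illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_over_m, sigma = 0.7, 1.3        # illustrative parameters beta/m and sigma
x0, t, dt = 2.0, 1.5, 1e-3
n_paths = 20_000

v = np.full(n_paths, x0)
for _ in range(int(t / dt)):         # Euler steps: dV = -(beta/m) V dt + sigma dB
    v += -beta_over_m * v * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

mean_exact = x0 * np.exp(-beta_over_m * t)
var_exact = 0.5 * sigma**2 * (1 - np.exp(-2 * beta_over_m * t)) / beta_over_m
print(v.mean(), mean_exact)          # simulated vs. exact mean from (1.6)
print(v.var(), var_exact)            # simulated vs. exact variance from (1.6)
```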

Example 3. (A Nonhomogeneous Diffusion). Suppose {X_t} is a diffusion with
drift μ(x) and diffusion coefficient σ²(x). Let f be a continuous function on
[0, ∞). Define

Z_t = X_t + ∫₀^t f(u) du, t ≥ 0.  (1.7)

Then, since f is a deterministic function, the process {Z_t} inherits the Markov
property from {X_t}. Moreover, {Z_t} has continuous sample paths. Now,

E{Z_{s+t} − Z_s | Z_s = z} = E{X_{s+t} − X_s | X_s = z − ∫₀^s f(u) du} + ∫_s^{s+t} f(u) du
= μ(z − ∫₀^s f(u) du)t + f(s)t + o(t).  (1.8)

Also,

E{(Z_{s+t} − Z_s)² | Z_s = z} = E{(X_{s+t} − X_s)² | X_s = z − ∫₀^s f(u) du} + o(t)
= σ²(z − ∫₀^s f(u) du)t + o(t).  (1.9)

Similarly,

E{|Z_{s+t} − Z_s|³ | Z_s = z} = o(t).  (1.10)

In this case {Z_t} is referred to as a diffusion with (nonhomogeneous) drift and
diffusion coefficients, respectively, given by

μ(s, z) = μ(z − ∫₀^s f(u) du) + f(s),
σ²(s, z) = σ²(z − ∫₀^s f(u) du).  (1.11)

Note that if {X_t} is a Brownian motion with drift μ and diffusion coefficient
σ², and if f(t) := v is constant, then {Z_t} is a Brownian motion with drift μ + v
and diffusion coefficient σ².
A rigorous construction of unrestricted diffusions by the method of stochastic
differential equations is given in Chapter VII. An alternative method, similar
to that of Chapter IV, Sections 3 and 4, is to solve Kolmogorov's backward
or forward equations for the transition probability density. The latter is then
used to construct probability measures P_π, P_x on the space of all continuous
functions on [0, ∞). The Kolmogorov equations, derived in the next section,
may be solved by the methods of partial differential equations (PDE) (see
theoretical complement 1). Although the general PDE solution is not derived
in this book, the method is illustrated by two examples in Section 5. A third
method of construction of diffusions is based on approximation by birth-death
chains. This method is outlined in Section 4.
Diffusions with boundary conditions are constructed from unrestricted ones
in Sections 6 and 7 by a probabilistic method. The PDE method, based on
eigenfunction expansions, is described and illustrated in Section 8.
The aim of the present chapter is to provide a systematic development of
some of the most important aspects of diffusions. Computational expressions
and examples are emphasized. The development proceeds in a manner analogous
to that pursued in previous chapters, and, as before, there is a focus on large-time
behavior.

2 KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES

In Section 2 of Chapter IV, Kolmogorov's equations were derived for
continuous-parameter Markov chains. The backward equations for diffusions
are derived in Subsection 2.1 below, together with an important connection
between Markov processes and martingales. The forward, or Fokker-Planck,
equation is obtained in Subsection 2.2. Although the latter plays a less important
mathematical role in the study of diffusions, it arises more naturally than the
backward equation in physical applications.

2.1 The Backward Equation and Martingales


Suppose {X_t} is a Markov process on a metric space S, having right-continuous
sample paths and a transition probability p(t; x, dy). Continuous-parameter
Markov chains on countable state spaces and diffusions on S = (a, b) are
examples of such processes. On the set B(S) of all real-valued, bounded, Borel-
measurable functions f on S define the transition operator

(T_t f)(x) := E(f(X_t) | X₀ = x) = ∫ f(y)p(t; x, dy), (t > 0).  (2.1)

Then T_t is a bounded linear operator on B(S) when the latter is given the sup
norm defined by

‖f‖ := sup{|f(y)|: y ∈ S}.  (2.2)

Indeed T_t is a contraction, i.e., ‖T_t f‖ ≤ ‖f‖ for all f ∈ B(S). For,

|(T_t f)(x)| ≤ ∫ |f(y)| p(t; x, dy) ≤ ‖f‖.  (2.3)

The family of transition operators {T_t: t > 0} has the semigroup property,

T_{s+t} = T_s T_t,  (2.4)

where the right side denotes the composition of the two maps. This relation follows
from

(T_{s+t} f)(x) = E(f(X_{s+t}) | X₀ = x) = E[E(f(X_{s+t}) | X_s) | X₀ = x]
= E[(T_t f)(X_s) | X₀ = x] = T_s(T_t f)(x).  (2.5)

The relation (2.4) also implies that the transition operators commute,

T_s T_t = T_t T_s.  (2.6)

As discussed in Section 2 of Chapter IV for the case of continuous-parameter
Markov chains, the semigroup relation (2.4) implies that the behavior of T_t
near t = 0 completely determines the semigroup. Observe also that if
f ∈ C_b(S), the set of all real bounded continuous functions on S, then
(T_t f)(x) = E(f(X_t) | X₀ = x) → f(x) as t ↓ 0, by the right continuity of the
sample paths. It then turns out, as in Chapter IV, that the derivative (operator)
of the function t ↦ T_t f at t = 0 determines {T_t: t > 0} (see theoretical
complement 3).

Definition 2.1. The infinitesimal generator A of {T_t: t > 0}, or of the Markov
process {X_t}, is the linear operator A defined by

(Af)(x) = lim_{s↓0} ((T_s f)(x) − f(x))/s,  (2.7)

for all f ∈ B(S) such that the right side converges to some function uniformly
in x. The class of all such f comprises the domain 𝒟_A of A.

In order to determine A explicitly in the case of a diffusion {X_t}, let f be a
bounded, twice continuously differentiable function. Fix x ∈ S, and δ > 0,
however small. Find ε > 0 such that |f″(x) − f″(y)| < δ for all y such that
|y − x| ≤ ε. Write

(T_t f)(x) = E(f(X_t)1_{{|X_t−x|≤ε}} | X₀ = x) + E(f(X_t)1_{{|X_t−x|>ε}} | X₀ = x).  (2.8)

Now by a Taylor expansion of f(X_t) around x,

E(f(X_t)1_{{|X_t−x|≤ε}} | X₀ = x) = E[{f(x) + (X_t − x)f′(x) + ½(X_t − x)²f″(x)
+ ½(X_t − x)²(f″(ξ_t) − f″(x))}1_{{|X_t−x|≤ε}} | X₀ = x],  (2.9)

where ξ_t lies between x and X_t. By the first two relations in (1.2)′, the expectations
of the first three terms on the right add up to

f(x) + tμ(x)f′(x) + ½tσ²(x)f″(x) + o(t), (t ↓ 0).  (2.10)

The expectation of the remainder term in (2.9) is less than

½δE((X_t − x)²1_{{|X_t−x|≤ε}} | X₀ = x) = ½δσ²(x)t + o(t), (t ↓ 0).  (2.11)

The relations (2.8)-(2.11) lead to

lim_{t↓0} |((T_t f)(x) − f(x))/t − {μ(x)f′(x) + ½σ²(x)f″(x)}|
≤ ½δσ²(x) + ‖f‖ lim_{t↓0} (1/t)E(1_{{|X_t−x|>ε}} | X₀ = x).  (2.12)

The limit on the right is zero by the last relation in (1.2)′. As δ > 0 is arbitrary,
it now follows that

lim_{t↓0} ((T_t f)(x) − f(x))/t = μ(x)f′(x) + ½σ²(x)f″(x).  (2.13)

Does the above computation prove that a bounded, twice continuously
differentiable f belongs to 𝒟_A? Not quite, for the limit (2.13) has not been
shown to be uniform in x. There are three sources of nonuniformity. One is
that the o(t) terms in (1.2)′ need not be uniform in x. The second is that the o(t)
term in (2.10) may not be uniform in x, even if those in (1.2)′ are; for μ(x)f′(x),
σ²(x)f″(x) may not be bounded. The third source of nonuniformity arises from
the fact that, given δ > 0, there may not exist an ε independent of x such that
|f″(y) − f″(x)| < δ for all x, y satisfying |x − y| ≤ ε. The third source is removed
by requiring that f″ be uniformly continuous. Assume μ(x)f′(x), σ²(x)f″(x)
are bounded to take care of the second. The first source is intrinsic. One example
where the errors in (1.2)′ are o(t) uniformly in x is a Brownian motion (Exercise
4). In case the Brownian motion has a zero drift, the second source is
removed if f″ is bounded. Thus, for a Brownian motion with zero drift, every
bounded f having a uniformly continuous and bounded f″ is in 𝒟_A. Indeed,
it may be shown that 𝒟_A comprises precisely this class of functions. Similarly,
if the Brownian motion has a nonzero drift, then f ∈ 𝒟_A if f, f′, f″ are all
bounded and f″ is uniformly continuous (Exercise 4). In general one may ensure
uniformity of o(t) in (1.2)′ by restricting to a compact subset of S. Thus, all
twice continuously differentiable f, vanishing outside a closed and bounded
subinterval of S, belong to 𝒟_A. We have sketched a proof of the following result
(see theoretical complements 1, 3 and Exercise 3.12 of Chapter VII).

Proposition 2.1. Let {X_t} be a diffusion on S = (a, b). Then all twice
continuously differentiable f, vanishing outside a closed bounded subinterval
of S, belong to 𝒟_A, and for such f,

(Af)(x) = μ(x)f′(x) + ½σ²(x)f″(x).  (2.14)

In what follows, the symbol A will stand, in the case of a diffusion {X_t}, for
the second-order differential operator (2.14), and it will be applied to all
twice-differentiable f, whether or not such an f is in 𝒟_A.
Turning to the derivation of the backward equation, note that the arguments
leading to (2.13) hold for all bounded f that are twice continuously
differentiable. If the function x ↦ (T_t f)(x) is bounded and twice continuously
differentiable, then applying (2.13) to this function one gets

(∂/∂t)(T_t f)(x) = A(T_t f)(x), (t > 0, x ∈ S).  (2.15)

This is Kolmogorov's backward equation for the function (t, x) ↦ (T_t f)(x)
(t > 0, x ∈ S). It will be shown below that if f ∈ 𝒟_A, then T_t f ∈ 𝒟_A. The following
proposition establishes this fact for general Markov processes.

Proposition 2.2. Let {T_t} be the family of transition operators for a Markov
process on a metric space S. If f ∈ 𝒟_A then the backward equation (2.15) holds.

Proof. If f ∈ 𝒟_A then, using (2.6) and (2.3),

‖(T_s(T_t f) − T_t f)/s − T_t(Af)‖ = ‖T_t((T_s f − f)/s − Af)‖
≤ ‖(T_s f − f)/s − Af‖ → 0 as s ↓ 0,  (2.16)

which shows that T_t f ∈ 𝒟_A and, therefore, by the definition of A, (2.15) holds.
■

The relation (2.16) also shows that

A(T_t f) = T_t(Af),  (2.17)

i.e., T_t and A commute on 𝒟_A. This fact is made use of in the proof of the
following important result.
following important result.

Theorem 2.3. Let {X_t} be a right-continuous Markov process on a metric space
S, and f ∈ 𝒟_A. Assume that s ↦ (Af)(X_s) is right-continuous for all sample
paths. Then the process {Z_t}, where

Z_t := f(X_t) − ∫₀^t (Af)(X_s) ds (t ≥ 0),  (2.18)

is an {ℱ_t}-martingale, with ℱ_t := σ{X_u: 0 ≤ u ≤ t}.

Proof. The assumption of right continuity of s ↦ (Af)(X_s) ensures the
integrability and ℱ_t-measurability of the integral in (2.18) (theoretical
complement 2). Now

E(Z_{s+t} | ℱ_s) = E(f(X_{s+t}) | ℱ_s) − ∫₀^s (Af)(X_u) du − E(∫_s^{s+t} (Af)(X_u) du | ℱ_s)
= (T_t f)(X_s) − ∫₀^s (Af)(X_u) du − E(∫₀^t (Af)(X⁺_u) du | ℱ_s),  (2.19)

where {X⁺_u} is the after-s, or shifted, process X⁺_u = X_{s+u} (u ≥ 0). By the Markov
property,

E(∫₀^t (Af)(X⁺_u) du | ℱ_s) = [E_x ∫₀^t (Af)(X_u) du]_{x=X_s} = [∫₀^t (E_x(Af)(X_u)) du]_{x=X_s}
= [∫₀^t T_u(Af)(x) du]_{x=X_s},  (2.20)

interchanging the order of integration with respect to Lebesgue measure and
the taking of expectation for the second equality (Fubini's Theorem, Section 4 of
Chapter 0). Now replace the last integrand T_u(Af) by A(T_u f) (see (2.17)) to get

E(∫₀^t (Af)(X⁺_u) du | ℱ_s) = ∫₀^t A(T_u f)(X_s) du.  (2.21)

Substituting (2.21) in (2.19) one obtains

E(Z_{s+t} | ℱ_s) = (T_t f)(X_s) − ∫₀^s (Af)(X_u) du − ∫₀^t A(T_u f)(X_s) du.  (2.22)

By the backward equation (2.15), which holds by Proposition 2.2, and the
fundamental theorem of calculus,

(T_t f)(X_s) = f(X_s) + ∫₀^t (∂/∂u)(T_u f)(X_s) du = f(X_s) + ∫₀^t A(T_u f)(X_s) du.  (2.23)

Use this in (2.22) to derive the desired result,

E(Z_{s+t} | ℱ_s) = f(X_s) − ∫₀^s (Af)(X_u) du = Z_s. ■

Combining Theorem 2.3 and Proposition 2.1, the following corollary is
immediate.

Corollary 2.4. If Condition (1.1) holds for a diffusion {X_t}, then for every twice
continuously differentiable f vanishing outside a compact subset of S, the
process {Z_t} in (2.18) is an {ℱ_t}-martingale.

It will be shown in Section 3 of Chapter VII by direct probabilistic arguments
that {Z_t} is an {ℱ_t}-martingale for a much wider class of twice-differentiable
functions f than prescribed by the corollary. As a simple example, take {X_t}
to be a Brownian motion with drift μ and diffusion coefficient σ². Then

A = ½σ² d²/dx² + μ d/dx.

If one takes f(x) = x in (2.18), then Z_t = X_t − tμ (t ≥ 0), which is a martingale,
even though f is unbounded. In the case μ = 0, one may take f(x) = x² to get
Z_t = X_t² − tσ² (t ≥ 0), which is also a martingale. Still, one can do a lot under
the assumption of Corollary 2.4. One application is the following.

Proposition 2.5. Let {X_t} be a diffusion on S = (a, b) whose coefficients satisfy
Condition (1.1). Let [c, d] ⊂ (a, b). Then

P_x({X_t} reaches c before d)
= [∫_x^d exp{−∫_c^z (2μ(y)/σ²(y)) dy} dz] / [∫_c^d exp{−∫_c^z (2μ(y)/σ²(y)) dy} dz],
c ≤ x ≤ d.  (2.24)

Proof. Define a twice continuously differentiable function f that equals the
right side of (2.24) for c ≤ x ≤ d, and vanishes outside [c − ε, d + ε] ⊂ (a, b)
for some ε > 0. This is always possible as f″ is continuous at c, d (Exercise 3).
By Corollary 2.4,

Z_t := f(X_t) − ∫₀^t (Af)(X_s) ds (t ≥ 0),  (2.25)

is an {ℱ_t}-martingale. Define the {ℱ_t}-stopping time (see Chapter I, (13.68))

τ = inf{t ≥ 0: X_t = c or d}.  (2.26)

Let x ∈ (c, d). By the optional stopping result, Proposition 13.9 of Chapter I,

E_x Z_{τ∧t} = E_x Z₀.  (2.27)

But Z₀ = f(X₀) = f(x) under P_x, so that E_x Z₀ is the right side of (2.24). Now
check that Af(x) = 0 for c < x < d, so that (Af)(X_s) = 0 for 0 ≤ s ≤ τ ∧ t if
x ∈ (c, d). Hence Z_{τ∧t} = f(X_{τ∧t}), and letting t ↑ ∞ in (2.27),

Z_τ = f(X_τ) = f(c) = 1 on {X_τ = c}, = f(d) = 0 on {X_τ = d},  (2.28)

so that E_x Z_τ is simply the left side of (2.24). ■
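
Formula (2.24) is convenient for numerical work. The following Python sketch
(ours, not from the text) evaluates (2.24) for the Ornstein-Uhlenbeck coefficients
μ(y) = −γy, σ²(y) = σ² (for which the inner integral has a closed form) and compares
it with a Monte Carlo estimate from Euler-discretized paths; all parameters are
illustrative.

```python
import numpy as np
from scipy.integrate import quad

gamma_, sigma2 = 1.0, 1.0
mu = lambda y: -gamma_ * y

def hit_prob(x, c, d):
    # exp{-int_c^z 2 mu(y)/sigma^2 dy} = exp{(gamma/sigma^2)(z^2 - c^2)} here
    e = lambda z: np.exp((gamma_ / sigma2) * (z**2 - c**2))
    return quad(e, x, d)[0] / quad(e, c, d)[0]

c, d, x = -1.0, 2.0, 0.5
print(hit_prob(x, c, d))                     # formula (2.24)

# Monte Carlo check with Euler-discretized paths
rng = np.random.default_rng(4)
dt, n = 1e-3, 10_000
y = np.full(n, x)
alive = np.ones(n, dtype=bool)
hit_c = np.zeros(n, dtype=bool)
while alive.any():
    y[alive] += mu(y[alive]) * dt + np.sqrt(sigma2 * dt) * rng.standard_normal(alive.sum())
    at_c, at_d = y <= c, y >= d
    hit_c |= alive & at_c
    alive &= ~(at_c | at_d)
print(hit_c.mean())                          # close to the formula value
```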

As in Section 9 of Chapter I and Section 2 of Chapter III, one may derive


criteria for transience and recurrence based on Corollary 2.4. This will be
pursued later in Sections 4, 9, and 14. The martingale method will be more
fully explored in Chapter VII.

2.2 The Fokker-Planck Equation

Consider a diffusion {X_t} on S = (a, b) whose coefficients satisfy Condition (1.1).
Let p(t; x, y) be the transition probability density of {X_t}. Letting f = 1_B in
(2.5) leads to

∫_B p(s + t; x, y) dy = ∫_S (∫_B p(t; z, y) dy) p(s; x, z) dz
= ∫_B (∫_S p(t; z, y)p(s; x, z) dz) dy,  (2.29)

for all Borel sets B ⊂ S. This implies the so-called Chapman-Kolmogorov
equation

p(s + t; x, y) = ∫_S p(t; z, y)p(s; x, z) dz = (T_s f_t)(x),  (2.30)

where f_t(z) := p(t; z, y). A somewhat informal derivation of the backward
equation for p may be based on (2.13) and (2.30) as follows. Suppose one may
apply the computation (2.13) to the function f_t in (2.30). Then

(Af_t)(x) = lim_{s↓0} ((T_s f_t)(x) − f_t(x))/s.  (2.31)

But the right side equals ∂p(t; x, y)/∂t, and the left side is Ap(t; x, y). Therefore,

∂p(t; x, y)/∂t = μ(x) ∂p(t; x, y)/∂x + ½σ²(x) ∂²p(t; x, y)/∂x²,  (2.32)

which is the desired backward equation for p.
In applications to the physical sciences the equation of greater interest is
Kolmogorov's forward equation, or the Fokker-Planck equation, governing the
probability density function of X_t when X₀ has an arbitrary initial distribution
π. Suppose for simplicity that π has a density g. Then the density of X_t is given by

(T_t*g)(y) := ∫_a^b g(x)p(t; x, y) dx.  (2.33)

The operator T_t* transforms a probability density g into another probability
density. More generally, it transforms any integrable g into an integrable
function T_t*g. It is adjoint (transpose) to T_t in the sense that

⟨T_t*g, f⟩ := ∫_S (T_t*g)(y)f(y) dy = ∫_S g(x)(T_t f)(x) dx = ⟨g, T_t f⟩.  (2.34)

Here ⟨u, v⟩ = ∫ u(x)v(x) dx. If f is twice continuously differentiable and vanishes
outside a compact subset of S, then one may differentiate with respect to t in
(2.34) and interchange the orders of integration and differentiation to get, using
(2.17),

⟨(∂/∂t)T_t*g, f⟩ = ⟨g, (∂/∂t)T_t f⟩ = ⟨g, T_t Af⟩
= ⟨T_t*g, Af⟩ = ∫_S (T_t*g)(y)(Af)(y) dy.  (2.35)

Now, assuming that f, h are both twice continuously differentiable and that f
vanishes outside a finite interval, integration by parts yields

⟨h, Af⟩ = ∫_S h(y)[μ(y)f′(y) + ½σ²(y)f″(y)] dy
= ∫_S [−(d/dy)(μ(y)h(y)) + (d²/dy²)(½σ²(y)h(y))] f(y) dy = ⟨A*h, f⟩,  (2.36)

where A* is the formal adjoint of A, defined by

(A*h)(y) = −(d/dy)(μ(y)h(y)) + (d²/dy²)(½σ²(y)h(y)).  (2.37)

Applying (2.36) in (2.35), with T_t*g in place of h, one gets

⟨(∂/∂t)T_t*g, f⟩ = ⟨A*T_t*g, f⟩.  (2.38)

Since (2.38) holds for sufficiently many functions f (all infinitely differentiable
functions vanishing outside some closed, bounded interval contained in S, for
instance), we get (Exercise 1)

(∂/∂t)(T_t*g)(y) = A*(T_t*g)(y).  (2.39)

That is,

∫_S (∂p(t; x, y)/∂t)g(x) dx
= ∫_S [−(∂/∂y)(μ(y)p(t; x, y)) + ½(∂²/∂y²)(σ²(y)p(t; x, y))]g(x) dx.  (2.40)

Since (2.40) holds for sufficiently many functions g (all twice continuously
differentiable functions vanishing outside a closed bounded interval contained
in S, for instance), we get Kolmogorov's forward equation for the transition
probability density p (Exercise 1),

∂p(t; x, y)/∂t = −(∂/∂y)(μ(y)p(t; x, y)) + ½(∂²/∂y²)(σ²(y)p(t; x, y)), (t > 0).  (2.41)

For a physical interpretation of the forward equation (2.41), consider a dilute
concentration of solute molecules diffusing along one direction in a possibly
nonhomogeneous fluid. An individual molecule's position, say in the x-direction,
is a Markov process with drift μ(x) and diffusion coefficient σ²(x). Given an
initial concentration c₀(x), the concentration c(t, x) at x at time t is given by

c(t, x) = ∫_S c₀(z)p(t; z, x) dz,  (2.42)

where p(t; z, x) is the transition probability density of the position process of
an individual solute molecule. Therefore, c(t, x) satisfies Kolmogorov's forward
equation

∂c(t, x)/∂t = A*c(t, x) = −(∂/∂x)J(t, x),  (2.43)

with J given by

J(t, x) = −(∂/∂x)(½σ²(x)c(t, x)) + μ(x)c(t, x).  (2.44)

The Kolmogorov forward equation is also referred to as the Fokker-Planck
equation in this context. Now the increase in the amount of solute in a small
region [x, x + Δx] during a small time interval [t, t + Δt] is approximately

(∂c(t, x)/∂t) Δt Δx.  (2.45)

On the other hand, if v(t, x) denotes the velocity of the solute at x at time t,
moving as a continuum, then a fluid column of length approximately v(t, x) Δt
flows into the region at x during [t, t + Δt]. Hence the amount of solute that
flows into the region at x during [t, t + Δt] is approximately v(t, x)c(t, x) Δt,
while the amount passing out at x + Δx during [t, t + Δt] is approximately
v(t, x + Δx)c(t, x + Δx) Δt. Therefore, the increase in the amount of solute in
[x, x + Δx] during [t, t + Δt] is approximately

[v(t, x)c(t, x) − v(t, x + Δx)c(t, x + Δx)] Δt.  (2.46)

Equating (2.45) and (2.46) and dividing by Δt Δx, one gets

∂c(t, x)/∂t = −(∂/∂x)(v(t, x)c(t, x)).  (2.47)

Equation (2.47) is generally referred to as the equation of continuity or the
equation of mass conservation. The quantity v(t, x)c(t, x) is called the flux of
the solute, which is seen to be the rate per unit time at which the solute passes
out at x (at time t) in the positive x-direction. In the present case, therefore,
the flux is given by (2.44).
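
A numerical illustration (ours, not from the text): the forward equation may be
integrated in the flux form (2.43)-(2.44) by finite differences, which makes the
mass-conservation interpretation (2.47) visible. The coefficients μ(x) = −x,
σ²(x) = 1 and the grid parameters below are illustrative.

```python
import numpy as np

n_grid = 401
x = np.linspace(-5.0, 5.0, n_grid)
dx, dt = x[1] - x[0], 1e-4
mu = -x                              # drift mu(x) = -x
half_sigma2 = 0.5 * np.ones(n_grid)  # (1/2) sigma^2(x) = 1/2

c = np.exp(-((x - 2.0) ** 2) / 0.1)  # initial concentration c_0
c /= c.sum() * dx

for _ in range(20_000):              # integrate to t = 2
    J = mu * c - np.gradient(half_sigma2 * c, dx)   # flux (2.44)
    c = c - dt * np.gradient(J, dx)                 # continuity equation (2.47)
    c[0] = c[-1] = 0.0                              # truncation at the ends

print(c.sum() * dx)          # total mass stays close to 1
print((x * c).sum() * dx)    # mean relaxes toward 0 for this drift
```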

3 TRANSFORMATION OF THE GENERATOR UNDER RELABELING OF THE STATE SPACE

In many applications the state space of the diffusion of interest is a finite or a
semi-infinite open interval. For example, in an economic model the quantity
of interest, say price, demand, supply, etc., is typically nonnegative; in a genetic
model the gene frequency is a proportion in the interval (0, 1). It is often the
case that the end points or boundaries of the state space in such models cannot
be reached from the interior. In other words, owing to some built-in mechanism
the boundaries are inaccessible. A simple way to understand these processes is
to think of them as strictly monotone functions of diffusions on S = ℝ¹. Let

A = ½σ²(x) d²/dx² + μ(x) d/dx  (3.1)

be the generator of a diffusion on S = (−∞, ∞). That is to say, we consider
a diffusion with coefficients μ(x), σ²(x). Let ψ be a continuous one-to-one map
of S onto S̄ = (a, b), where −∞ ≤ a < b ≤ ∞. It is simple to check that if {X_t}
is a diffusion on S, then {Z_t} := {ψ(X_t)} is a Markov process on S̄ having continuous
sample paths (Exercise 1.3). If ψ is smooth, then {Z_t} is a diffusion whose drift
and diffusion coefficients are given by Proposition 3.1 below.
First we need a lemma.

Lemma. If {X_t} is a diffusion satisfying (1.2)′ for all ε > 0, then for every r > 2
one has

E(|X_{s+t} − X_s|^r 1_{{|X_{s+t}−X_s|≤ε′}} | X_s = x) = o(t) as t ↓ 0,  (3.2)

for all ε′ > 0.

Proof. Let r > 2, ε′ > 0 be given. Fix θ > 0. We will show that the left side of
(3.2) is less than θt for all sufficiently small t. For this, write
δ = (θ/(2σ²(x)))^{1/(r−2)}. Then the left side of (3.2) is no more than

E(|X_{s+t} − X_s|^r 1_{{|X_{s+t}−X_s|≤δ}} | X_s = x) + E(|X_{s+t} − X_s|^r 1_{{δ<|X_{s+t}−X_s|≤ε′}} | X_s = x)
≤ δ^{r−2} E((X_{s+t} − X_s)² 1_{{|X_{s+t}−X_s|≤δ}} | X_s = x) + (ε′)^r P(|X_{s+t} − X_s| > δ | X_s = x)
= (θ/(2σ²(x)))(σ²(x)t + o(t)) + (ε′)^r o(t) = ½θt + o(t).  (3.3)

The last inequality uses the fact that (1.2)′ holds for all ε > 0. Now the term
o(t) is smaller than θt/2 for all sufficiently small t. ■

Proposition 3.1. Let {X_t} be a diffusion on S = (c, d) having drift and diffusion
coefficients μ(x), σ²(x), respectively. If φ is a three times continuously
differentiable function on (c, d) onto (a, b) such that φ′ is either strictly positive
or strictly negative, then {Z_t := φ(X_t)} is a diffusion on (a, b) whose drift μ̄(·)
and diffusion coefficient σ̄²(·) are given by

μ̄(z) = φ′(φ^{−1}(z))μ(φ^{−1}(z)) + ½φ″(φ^{−1}(z))σ²(φ^{−1}(z)),
σ̄²(z) = (φ′(φ^{−1}(z)))²σ²(φ^{−1}(z)), z ∈ (a, b).  (3.4)

Proof. By a Taylor expansion,

Z_{s+t} − Z_s = φ(X_{s+t}) − φ(X_s)
= (X_{s+t} − X_s)φ′(X_s) + (1/2!)(X_{s+t} − X_s)²φ″(X_s) + (1/3!)(X_{s+t} − X_s)³φ‴(ξ),  (3.5)

where ξ lies between X_s and X_{s+t}. Fix ε > 0, z ∈ (a, b). There exist positive
constants δ₁(ε), δ₂(ε) such that φ^{−1} maps the interval [z − ε, z + ε] onto
[φ^{−1}(z) − δ₁(ε), φ^{−1}(z) + δ₂(ε)]. Write x = φ^{−1}(z), and let δ_m = min{δ₁(ε), δ₂(ε)},
δ_M = max{δ₁(ε), δ₂(ε)}. Then

1_{{|Z_{s+t}−z|≤ε}} = 1_{{x−δ₁(ε)≤X_{s+t}≤x+δ₂(ε)}},
1_{{|X_{s+t}−x|≤δ_m}} ≤ 1_{{|Z_{s+t}−z|≤ε}} ≤ 1_{{|X_{s+t}−x|≤δ_M}}.  (3.6)

Therefore,

E((X_{s+t} − X_s)1_{{|Z_{s+t}−z|≤ε}} | Z_s = z) = E((X_{s+t} − x)1_{{|Z_{s+t}−z|≤ε}} | X_s = x)
= E((X_{s+t} − x)1_{{|X_{s+t}−x|≤δ_m}} | X_s = x)
+ E((X_{s+t} − x)1_{{|X_{s+t}−x|>δ_m, |Z_{s+t}−z|≤ε}} | X_s = x).  (3.7)

In view of the last inequality in (3.6), the last expectation is bounded in
magnitude by

δ_M P(|X_{s+t} − x| > δ_m | X_s = x),

which is of the order o(t) as t ↓ 0, by the last relation in (1.2)′. Also, by the first
relation in (1.2)′,

E((X_{s+t} − X_s)1_{{|X_{s+t}−x|≤δ_m}} | X_s = x) = μ(x)t + o(t).

Therefore,

E((X_{s+t} − X_s)1_{{|Z_{s+t}−z|≤ε}} | Z_s = z) = μ(x)t + o(t).  (3.8)

In the same manner, one has

E((X_{s+t} − X_s)²1_{{|Z_{s+t}−z|≤ε}} | Z_s = z) = E((X_{s+t} − X_s)²1_{{|X_{s+t}−x|≤δ_m}} | X_s = x)
+ O(δ_M² P(|X_{s+t} − x| > δ_m | X_s = x)) = σ²(x)t + o(t).  (3.9)

Also, by the Lemma above,

E(|X_{s+t} − X_s|³ |φ‴(ξ)| 1_{{|Z_{s+t}−z|≤ε}} | Z_s = z)
≤ cE(|X_{s+t} − X_s|³ 1_{{|X_{s+t}−x|≤δ_M}} | X_s = x) = o(t).  (3.10)

Using (3.8)-(3.10) and (3.5), one gets

E((Z_{s+t} − Z_s)1_{{|Z_{s+t}−z|≤ε}} | Z_s = z) = φ′(x)μ(x)t + ½σ²(x)φ″(x)t + o(t),  (3.11)

so that the drift coefficient of {Z_t} is

φ′(φ^{−1}(z))μ(φ^{−1}(z)) + ½σ²(φ^{−1}(z))φ″(φ^{−1}(z)).

In order to compute the diffusion coefficient of {Z_t}, square both sides of
(3.5) to get

E((Z_{s+t} − Z_s)²1_{{|Z_{s+t}−z|≤ε}} | Z_s = z)
= (φ′(x))²E((X_{s+t} − X_s)²1_{{|Z_{s+t}−z|≤ε}} | X_s = x) + R_t,  (3.12)

where each summand in R_t is bounded by a term of the form

cE(|X_{s+t} − X_s|^r 1_{{|Z_{s+t}−z|≤ε}} | X_s = x) ≤ cE(|X_{s+t} − X_s|^r 1_{{|X_{s+t}−x|≤δ_M}} | X_s = x),

where r > 2. Hence by the Lemma above, R_t = o(t), and we get, using (3.9) in
(3.12),

E((Z_{s+t} − Z_s)²1_{{|Z_{s+t}−z|≤ε}} | Z_s = z) = (φ′(x))²σ²(x)t + o(t).  (3.13)

Finally,

P(|Z_{s+t} − Z_s| > ε | Z_s = z) ≤ P(|X_{s+t} − X_s| > δ_m | X_s = x) = o(t),  (3.14)

by the last condition in (1.2)′. ■
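
The formulas (3.4) can also be verified symbolically. The following sketch (ours,
not from the text) applies (3.4) with φ(x) = eˣ and recovers the geometric Brownian
motion coefficients (3.15) of Example 1 below.

```python
import sympy as sp

x, z = sp.symbols('x z', positive=True)
mu, sig2 = sp.Function('mu'), sp.Function('sigma2')
phi = sp.exp(x)

# (3.4) with phi(x) = exp(x); then substitute x = phi^{-1}(z) = log z
mu_bar = phi.diff(x) * mu(x) + sp.Rational(1, 2) * phi.diff(x, 2) * sig2(x)
sig2_bar = phi.diff(x)**2 * sig2(x)

print(sp.simplify(mu_bar.subs(x, sp.log(z))))    # z*mu(log(z)) + z*sigma2(log(z))/2
print(sp.simplify(sig2_bar.subs(x, sp.log(z))))  # z**2*sigma2(log(z))
```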

Example 1. (Geometric Brownian Motion). Let S = (−∞, ∞), S̄ = (0, ∞),
φ(x) = eˣ. If A is given by (3.1), then the coefficients of Ā are

μ̄(z) = zμ(log z) + ½zσ²(log z),
σ̄²(z) = z²σ²(log z), (0 < z < ∞).  (3.15)

In particular, a Brownian motion with drift μ and diffusion coefficient σ²
becomes, under the transformation x ↦ eˣ, a diffusion with state space
S̄ = (0, ∞) and generator

Ā = ½σ²z² d²/dz² + (μ + ½σ²)z d/dz.  (3.16)

In other words, the mean rate of growth as well as the mean rate of fluctuation
in growth at z is proportional to the size z. Note that one has

Z_t = e^{X_t}.  (3.17)

But {X_t} may be represented as X_t = X₀ + tμ + σB_t, t ≥ 0, where {B_t} is a
standard one-dimensional Brownian motion starting at zero and independent
of X₀. Then (3.17) becomes

Z_t = Z₀ e^{tμ+σB_t}, Z₀ = e^{X₀}.  (3.18)

The process {Z_t} is sometimes referred to as the geometric Brownian motion.
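
A short simulation (ours, not from the text) of (3.18): since E e^{σB_t} = e^{σ²t/2},
one has EZ_t = Z₀e^{(μ+σ²/2)t}, in agreement with the drift coefficient (μ + ½σ²)z
in (3.16); the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, z0, t = 0.1, 0.3, 1.0, 2.0      # illustrative parameters
B = np.sqrt(t) * rng.standard_normal(200_000)
Z = z0 * np.exp(t * mu + sigma * B)        # samples of Z_t via (3.18)
print(Z.mean(), z0 * np.exp((mu + 0.5 * sigma**2) * t))
```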

Example 2. (Geometric Ornstein-Uhlenbeck Process). The Ornstein-Uhlenbeck
process with generator (see Example 1.2)

A = ½σ² d²/dx² − γx d/dx, (−∞ < x < ∞),  (3.19)

is transformed by the transformation x ↦ eˣ into a diffusion on (0, ∞) with
generator

Ā = ½σ²z² d²/dz² − (γz log z − ½σ²z) d/dz, (0 < z < ∞).  (3.20)

Example 3. S = (−∞, ∞), S̄ = (0, 1), φ(x) = eˣ/(eˣ + 1). For this case, first
transform by x ↦ eˣ, then by y ↦ y/(1 + y). Thus, one needs to apply the
transformation γ(y): y ↦ y/(1 + y) to the operator with coefficients (3.15). Since
the inverse of γ is γ^{−1}(z) = z/(1 − z), one may use (3.15) to obtain the
transformed operator (Exercise 1)

Ā = ½z²(1 − z)²σ²(log(z/(1 − z))) d²/dz²
+ [z(1 − z)μ(log(z/(1 − z))) + ½z(1 − z)(1 − 2z)σ²(log(z/(1 − z)))] d/dz
= ½z(1 − z)[z(1 − z)σ²(log(z/(1 − z))) d²/dz²
+ {2μ(log(z/(1 − z))) + (1 − 2z)σ²(log(z/(1 − z)))} d/dz], 0 < z < 1.  (3.21)

In particular, a Brownian motion is transformed into a diffusion on (0, 1) with
generator

Ā = ½z(1 − z)[z(1 − z)σ² d²/dz² + (2μ + (1 − 2z)σ²) d/dz], 0 < z < 1,  (3.22)

and an Ornstein-Uhlenbeck process is transformed into a diffusion on (0, 1)
with generator

Ā = ½z(1 − z)[z(1 − z)σ² d²/dz² − {2γ log(z/(1 − z)) − (1 − 2z)σ²} d/dz],
0 < z < 1.  (3.23)

Example 4. Let S = (−∞, ∞), S̄ = (0, ∞), φ(x) = e^{x³}. If

A = ½σ² d²/dx²,

then

Ā = ½σ²(9z²(log z)^{4/3}) d²/dz² + ½σ²z{6(log z)^{1/3} + 9(log z)^{4/3}} d/dz.  (3.24)

In this example the transformed process

Z_t = e^{X_t³}, t ≥ 0  (3.25)

(where {X_t} is a Brownian motion) does not have a finite first moment (Exercise 2).

4 DIFFUSIONS AS LIMITS OF BIRTH-DEATH CHAINS

We now show how one may arrive at diffusions as limits of discrete-parameter
birth-death chains with decreasing step sizes and increasing transition frequencies.
Because the underlying Markov chains do not in general have independent increments,
the limiting diffusions will not have this property either. Indeed, the Brownian
motions are the only Markov processes with continuous sample paths that have
independent increments (see Theorem T.1.1 of Chapter IV).
Suppose we are given two real-valued functions μ(x), σ²(x) on ℝ¹ = (−∞, ∞)
that satisfy Condition (1.1). Throughout this section, also assume that
μ(x), σ²(x) are bounded, and write

σ₀² = sup_x σ²(x).  (4.1)

Consider a discrete-parameter birth-death chain on S_Δ = {0, ±Δ, ±2Δ, ...}
with step size Δ > 0, having transition probabilities p_{ij}^{(Δ)} of going from iΔ to jΔ
in one step given by

p_{i,i+1}^{(Δ)} = β_i^{(Δ)} := σ²(iΔ)ε/(2Δ²) + μ(iΔ)ε/(2Δ),
p_{i,i−1}^{(Δ)} = δ_i^{(Δ)} := σ²(iΔ)ε/(2Δ²) − μ(iΔ)ε/(2Δ),  (4.2)
p_{ii}^{(Δ)} = 1 − β_i^{(Δ)} − δ_i^{(Δ)} = 1 − σ²(iΔ)ε/Δ²,

with the parameter ε given by

ε = Δ²/σ₀².  (4.3)

Note that under Condition (1.1) and the boundedness of μ(x), σ²(x), the quantities
β_i^{(Δ)}, δ_i^{(Δ)}, and 1 − β_i^{(Δ)} − δ_i^{(Δ)} are nonnegative for sufficiently small Δ. We shall
let ε be the actual time between two successive transitions.

that the process is at x = iA, the mean displacement in a single step in time e is
A /3i() - A bi() = p(iA)e = fi(x)e. (4.4)

Hence, the instantaneous rate of mean displacement per unit time, when the
process is at x, is u(x). Also, the mean squared displacement in a single step is

A 2 + ( - A) 2 8; ) = r 2 (iA)c = Q Z (x)E. (4.5)

Therefore, a 2 (x) is the instantaneous rate of mean squared displacement per


unit time.
Conversely, in order that (4.4), (4.5) may hold one must have the birth-death
parameters (4.2) (Exercise 1). The choice (4.3) of c guarantees the nonnegativity
of the transition probabilities p ;; , p ;.; _,, pi,;+l.
Just as the simple random walk approximates Brownian motion under proper
scaling of states and time, the above birth-death chain approximates the
diffusion with coefficients (x) and u'(x). Indeed, the approximation of
Brownian motion by simple random walk described in Section 8 of Chapter I
follows as a special case of the following result, which we state without proof
(see theoretical complement 1). Below [r] is the integer part of r.

Theorem 4.1. Let { Y: n = 0, 1, 2, ...} be a discrete-parameter birth-death


chain on S = {0, A, 2&. with one-step transition probabilities (4.2) and
with Yo) = [x o /0]A where x 0 is a fixed number. Define the stochastic process

{X} := { Yte 1 } (t >, 0). (4.6)

Then, as A j 0, {X} converges in distribution to a diffusion {X} with drift


(x) and diffusion coefficient 6 2 (x), starting at x 0 .

As a consequence of Theorem 4.1 it follows that for any bounded continuous


function f we have

A-^
) )
lim E{.f(X( ) I X = ^]Aj = Ez.f(X,).
[ (4.7)

By (2.15), in the case of a smooth initial function f, the function u(t, x):= E x f (XX )
solves the initial value problem

au _ ^
Z 2(x) ai Z + p(x) au , lim u(t, x) _ f (x). (4.8)
ac ax ax ,1 o

The expectation on the left side of (4.7) may be expressed as


u (A) (n, i)'=
>f(jA)P (4.9)

388 BROWNIAN MOTION AND DIFFUSIONS

where n = [ t/s], i = [x/A], and p;; are the n -step transition probabilities. It is
)

illuminating to check that the function (n, i) u (n, i) satisfies a difference ( )

equation that is a discretized version of the differential equation (4.8). To see


this, note that
(
P;
+ i)
= PiiPci ) +P).i +i pi+i.;+pa,i - iPi"- i,; )

(n) _ ce) c")


= ( 1 )
(e) (") (o) (")
at )P); + t Pt+i,i + bI PI -i,;
(n)
= P); + (A) (")
i (Pt +i.;
(n) (D) (n)
P.;) bt (Pt;

Pi-
(n)
i,;), (4.10)

or,

P( P ( __ (iA) (") (") + (P,;(") p)(")


.+
20 {(P i,; Pi;) -i,;)}

1 Q2 (iA) (")
+ 2 A2 (P( +i,; 2pi(;
(") + PI
(") -i, ;) ( 4.11 )

Summing over j one gets a corresponding equation for u () ( n, i).


As Ab 0 the state space S = {0, A, 2A,. . .} approximates l = ( oo, oo),
provided we think of the state jA as representing an interval of width A around
jA. Accordingly, think of spreading the probability p;j ) over this interval. Thus,
one introduces the approximate density y -- p () ( t; x, y) at time t = ne for states
x = iA, y =jA, by

p(e)(ns; iA, jA) := h-. (4.12)

By dividing both sides of (4.11) by A, one then arrives at the difference equation

(P (e) ((n + 1)s; iA, jA) p ( e)(ne; iA,jA)) /E


= (iA)(P (n) (ns; (i + 1)A, jA) p ( ) (ne; (i 1)A, jA))/2A
e

+ ia 2 (iA)(P (e) (ns; (i + 1)A, jA) 2p ( ) ( nc; iA,jA)


A

+ p ( e) (ne; (i 1)A, jA))/AZ. (4.13)

This is a difference-equation version of the partial differential equation

y) op(; x, y) + Za2(x) a 2 P(t; x, y)


= k(x) i
a t
ap(t;, ax
for t > 0, oo < x, y < oo, (4.14)

at grid points (t, x, y) = (ne, iA, jA).


Thus, the transition probability density p(t; x, y) of the diffusion {X} may
TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES 389

be approximately computed by computing p. The latter computation only


involves raising a matrix to the nth power, a fairly standard computational
task. In addition, Theorem 4.1 may be used to derive the probability i/i(x) that
{X,} reaches c before d, starting at x E (c, d), by taking the limit, as A J 0, of
the corresponding probability for the approximating birthdeath chain {Y;,e }. )

In other words (see Section 2 of Chapter III, relation (2.10)),

j-1 Sce)S(e) ^e>


r r -1 5 1+1
r=to
(s) 1
0(x) = leim ^e^ g^e>
(4.15)
+1
1 L=iie^r +1 r Pi+1

where

i' = x], i= i = [ ]. (4.16)


[ [ ]

The limit in (4.15) is (Exercise 2)

J Y1 exp
f rA
( 2 (Y)/a 2 (Y)) dY j
rip c
1L/(X) = hm -

eio I + ^Y1 exp


r =i +t ( J . c
re ( 2 li(Y)/a 2 (Y)) dY

f I
d

exp 2 (Y)/Q 2 (Y)) d y} dz


^ _JZ (
Z (4.17) )

J c
exp{ J ( (Y)/a (Y)) dy} dz
c
2 2
)))

which confirms the computation (2.24) given in Section 2. This leads to necessary
and sufficient conditions for transience and recurrence of diffusions (Exercise
3). This is analogous to the derivation of the corresponding probabilities for
Brownian motion given in Section 9 of Chapter I.
Alternative derivations of (4.17) are given in Section 9 (see Eq. 9.23 and
Exercise 9.2), in addition to Section 2 (Eq. 2.24).

5 TRANSITION PROBABILITIES FROM THE KOLMOGOROV


EQUATIONS: EXAMPLES

Under Condition (1.1) the Kolmogorov equations uniquely determine the


transition probabilities. However, solutions are generally not obtainable in
closed form, although numerical methods based on the scheme described in
390 BROWNIAN MOTION AND DIFFUSIONS

Section 4 sometimes provide practical approximations. Two examples for which


the solutions can be obtained explicitly from the Kolmogorov equations alone
are given here.

Example 1. (Brownian Motion). Brownian motion is a diffusion with constant


drift and diffusion coefficients. First assume that the drift is zero. Let the diffusion
coefficient be a 2 . Then the forward, or Fokker-Planck, equation for p(t; x, y) is

ap(t at a
, Y) = 2a2 a Z P(; 2 , Y)
( t > 0, co < x < oc, oo < y < co ). (5.1)
Y

Let the Fourier transform of p(t; x, y) as a function of y be denoted by

= x, y) dy. (5.2)
P(t; x, )
J e 14y p(t;

Then (5.1) becomes

(5.3)
t 2 Q2'

or,
2

)
0, (5.4)
t (e

whose general solution is

p(t; x, ) = c(x, l;)exp{?a 2 2 t}. (5.5)

Now p is the characteristic function of the distribution of X, given X 0 = x, and


as t 10 this distribution converges to the distribution of X0 that is degenerate
at x. Therefore,

c(x, Z) = lim p(t; x, Z) = E(e` X ) = e`x, (5.6)


tjo

and we obtain

^(t; x, l;) = exp {il x a 2 2 t}. (5.7)

But the right side is the characteristic function of the normal distribution with
TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES 391

mean x and variance tv 2 . Therefore,


s
(y _
P(t; x,Y) _ (2na2t)"2 exp{- (t > 0, -cc < x, y < oo). (5.8)
X)

The transition probability density of a Brownian motion with nonzero drift


may be obtained in the same manner as above (Exercise 1).

Example 2. (The Ornstein-Uhlenbeck Process). S = (- oo, cc), u(x) = - yx,


a 2 (x) := v 2 > 0. Here y is a (positive) constant. Fix an initial state x. As a
function of t and y, p(t; x, y) satisfies the forward equation

P z ZP
(5.9)
+ y - (YP(t; X, Y))-
at 2 y 2

Let p be the Fourier transform of p as a function of y,

(t; Z) =
J e'4YP(t; x, y) dy. (5.10)

Then, upon integration by parts,

G Y/ (t; ) = J -^ e aY (t; x, y) dy

i^e'4vp(t; x, y) dy = -i,

^ I (t; Z) =
GY / _ m
e1 ap(t ; x, Y) dY = - iZ
aY
J e`' apaydy
_ (_ iZ) 2 ep(t; x, Y) dY =
J
C Y ^y ) ^(t; ) = J
ey aP(t;
, Y) =
i
y a J p e ^ y ! dY

1 (ap
1 d p(t; )
( -1 P) _ - P - a (5.11)
ay^ i

Here we have assumed that y(ap/ay), a l p/ay e are integrable and go to zero as
jyj - cc. Taking Fourier transforms on both sides of (5.9) one has

a(t; ) =-2 ^ 2 P+yP+y( - P - a)=


392 BROWNIAN MOTION AND DIFFUSIONS

Therefore,

p(t;
3 2
^ 2 ^ Z p. (5.12)
t

Thus, we have reduced the second-order partial differential equation (5.9) to


the first-order equation (5.12). The left side of (5.12) is the directional derivative
of p along the vector (1, y^) in the (t, )-plane. Let a(t) = d exp{yt}. Then (5.12)
yields

d
t p(t; a(t)) = --- a 2 (t)p(t; a(t)) (5.13)

or

p(t; a(t)) = c(x, d) exp - -


l
-f o, a
2 (s) ds }
))

= c( x , d) e x p ^ a
4 dz (e 2 Yt _ l)}.
Y

For arbitrary t and one may choose d = ^e - Y`, so that a(t) = ^, and get

(t; ) = c(x, ^e - '")exp{ 42 X 2 (1 e -2 ji)} (5.14)


l 4y

t j 0, one then has


,im (t; Z) = c(x, Z). (5.15)
:10

On the other hand, as t 10, p(t; c) converges to the Fourier transform of 5,


which is exp{i^x}. Thus, c(x, ) = exp{i^x} for every real , so that

(t; ) = exp{ ixe - y` 4 (1 e -Z Y`) Z }, (5.16)


Y
which is the Fourier transform of a Gaussian density with mean xe - y` and
variance (o 2 /2y)(1 e -2 yt ). Therefore,

p(t; x, y) = (2 )^ii (2y (1 e -Z Y`)) I/2 exp{ (5.17)


a 2 (1 e e 2y1)2}

Note that the above derivation does not require that y be a positive parameter.
However, observe that if y = /m > 0 then letting t --+ oo in (5.16) we obtain
DIFFUSIONS WITH REFLECTING BOUNDARIES 393

the characteristic function of the Gaussian distribution with mean 0 and variance
Q 2 /2y = mc 2 /2 (the MaxwellBoltzmann velocity distribution). It follows that

it(v) (__Zy-1)1/2 exp{ yv z/6 z }, cc < v < oo,

is the p.d.f. of the invariant initial distribution of the process. With it as the
initial distribution, { V} is a stationary Gaussian process. The stationarity may
be viewed as an "equilibrium" status in which energy exchanges between the
particle and the fluid by thermal agitation and viscous dissipation have reached
a balance (on the average). Observe that the average kinetic energy is given by
E, ( mV,) = (m 2 /4)Q 2

6 DIFFUSIONS WITH REFLECTING BOUNDARIES

So far we have considered unrestricted diffusions on S = ( oo, oo), or on open


intervals. End points of the state space of a diffusion that cannot be reached
from the interior are said to be inaccessible. In this section and the next we
look at diffusions restricted to subintervals of S with one or more end points
accessible from the interior. In order to continue the process after it reaches a
boundary point, one must specify some boundary behavior, or boundary
condition, consistent with the requirement that the process be Markovian. One
such boundary condition, known as the reflecting boundary condition or the
Neumann boundary condition, is discussed in this section. Subsection 6.2 may
be read independently of Subsection 6.1.

6.1 Reflecting Diffusions as Limits of BirthDeath Chains


As an aid to intuition let us first see, in the spirit of Section 4, how a reflecting
diffusion on S = [0, oo) may be viewed as a limit of reflecting birthdeath
chains. Let it(x) and a 2 (x) satisfy Condition (1.1) on [0, oo) and let (x) be
bounded. For sufficiently small A > 0 one may consider a discrete-parameter
birthdeath chain on S = {0, A, 2A,. . .} with one-step transition probabilities

= ,e) = u'(i0)e p(iA)a


^
2AZ + 2A
a 2 (iA) e u(iA)E
= Stns 2A2 2A (i 1), (6.1)

CrZAE
P^^ = 1 (i + b) = 1 (i )

394 BROWNIAN MOTION AND DIFFUSIONS

and

Poi = o (Al a2(0)E ii(0)E (0)


202 + 2e Poo = 1
62(0)E
o 1 2ez (
2e 0 )E
. (6.2)

Exactly as in Section 4, the backward difference equations (4.11) or (4.13)


are obtained for i >, 1, j > 0. These are the discretized difference equations for

,
Y)
= u(x) aP(
)x Y + ZU2(x) a 2 P(t; x, Y)
z t>0, x>0, Y>0.
aP(t;
(6.3)
The backward boundary condition for the diffusion is obtained in the limit as
A J, 0 from the corresponding equations for the chain for i = 0, j >, 0,

Po.i+ 11 Poi Pi] + Poo Po


_
( a'(0)f: ( 0 )s )(") a 2 0 )E - 11 0 )E l( n
2A 2 + 2A p '' + 1\ 1 2A z 2A )Poi)

( (

(J 0),

or,
P; i)E P;
+
_ 2Q0) (Pi; P;) + 2D) (P (' P;) (j > 0). (6.4)
In the notation of (4.12) this leads to,

p((n + 1)E; 0, jA) p () (m; 0, j0) A = 2e)


( P () (nw; A, JA) p () (w; 0, jA))

+ (0) (p () (nc; A, j0) p () (


ns; 0, JA))A.

(6.5)
Fix y >, 0, t > 0. For n = [ t/E], j = [y/0], the left side of (6.5) is approximately

0 ( aP(t; x, Y)
at ) .= O'
while the right side is approximately

1az(0) P(t; x, Y) 01p(0) aP(t; x, Y)


z ax x_o z ax
).=0

Letting A j 0, one has

ap(t; x, Y) = 0, (t > 0, y > 0). (6.6)


ox x=0
DIFFUSIONS WITH REFLECTING BOUNDARIES 395

The equations (6.3) and (6.6), together with an initial condition p(0; x,.) = 6 x
determine a transition probability density of a Markov process on [0, co) having
continuous sample paths and satisfying the infinitesimal conditions (1.2) on the
interior (0, oc). This Markov process is called the diffusion on [0, co) with
reflecting boundary at 0 and drift and diffusion coefficients (x), a 2 (x). The
equations (6.3) and (6.6) are called the Kolmogorov backward equation and
backward boundary condition, respectively, for this diffusion. This particular
boundary condition (6.6) is also known as a Neumann boundary condition. The
precise nature of the approximation of the reflecting diffusion by the
corresponding reflecting birthdeath chain is the same as described in
Theorem 4.1.

6.2 Sample Path Construction of Reflecting Diffusions


Independently of the above heuristic considerations, we shall give in the rest of
this section complete probabilistic descriptions of diffusions reflecting at one
or two boundary points, with a treatment of the periodic boundary along the
way. The general method described here is sometimes called the method of
images.

ONE-POINT BOUNDARY CASE. (Diffusion on S = [0, c0) with "0" as a Reflecting


Boundary). First we consider a special case for which the probabilistic
description of "reflection" is simple. Let (x), v 2 (x) be defined on S = [0, cc)
and satisfy Condition (1.1) on S. Assume also that

(0) = 0. (6.7)

Now extend the coefficients p(.), a 2 (.) on R' by setting

( x) = u(x), Q 2 ( x) = Q 2 (x), (x > 0). (6.8)

Although /1(.), a'(.) so obtained may no longer be twice-differentiable on R',


they are Lipschitzian, and this suffices for the construction of a diffusion with
these coefficients (Chapter VII).

Theorem 6.1. Let {X} denote a diffusion on FR' with the extended coefficients
,u(.), a 2 (.) defined above. Then {1X1 1} is a Markov process on the state space
S = [0, cc), whose transition probability density q(t; x, y) is given by

q(t; x, y) = p(t; x, y) + p(t; x, y) (x, ye [0, cc)), (6.9)

where p(t; x, y) is the transition probability density of {X}. Further, q satisfies


the backward equation

aq(t; x, y) 12 a 2 q
q
ax (t > 0; x > 0, y % 0 ), (6.10)
+ p(x)
396 BROWNIAN MOTION AND DIFFUSIONS

and the backward boundary condition

aq(t; x, y) I
=0 (t>0;y>,0). (6.11)
ax x=0

Proof. First note that the two Markov processes {X} and {X} on R' have
the same drift and diffusion coefficients (use Proposition 3.1, or see Exercise
1). Therefore, they have the same transition probability density function p, so
that the conditional density p(t; x, y) of XX at y given X0 = x is the same as the
conditional density of X, at y given X 0 = x; but the latter is the conditional
density of X, at y, given X0 = x. Hence,

p(t; x, y) = p(t; x, y). (6.12)

In order to show that { Y := I X^I } is a Markov process on S = [0, cc), consider


an arbitrary real-valued bounded (Borel measurable or continuous) function g
on [0, oo), and write h(x) = g(Ixl). Then, as usual, writing Px for the distribution
of {X1 } starting at x and Ex for the corresponding expectations, one has for
Ex(g()s+,) I {Y 0 < u < s}),

Ex(g(IX5+,I) I {IXuI: 0 s u s})


= Ex[E(g(IXs+,I) I {X: 0 u s}) I {IX I: 0 u 5 s}] X

= E x [E(h(XS+ ,) {X: 0 u s}) {IXI: 0 u s}]

Ex[(J m h(y)p(t; x', y) dy)


m x'=xs
I {IXXI: 0 ,< u ,< s}]

= Ex g(z)(p(t; x', z) + p(t; x', z)) dz {IXI: 0 < u <, s} ]

0 x' =X s

= Ex [\J 00 g(z)(p(t; x', z) + p(t; x', z)) dz J - 8 {IXXI: 0 u < s}1


o / xx

=
EX[J 0
g(z)q(t; IXXI, z) dz I {IXI: 0 < u s}]
g(z)q(t; I X31, z) dz, (6.13)
=J 0
where

q(t; x, y) = p(t; x, y) + p(t; x, y). (6.14)

This proves {IX,I} is a time-homogeneous Markov process with transition


probability density q. By differentiating both sides of (6.9) with respect to t,
DIFFUSIONS WITH REFLECTING BOUNDARIES 397

and making use of the backward equation for p, one arrives at (6.10). Since
the right side of (6.14) is an even differentiable function of x, (6.11) follows.
n

Definition 6.1. A Markov process on S = [a, oo) that has continuous sample
paths and whose transition probability density satisfies equations (6.10), (6.11),
with 0 replaced by a, is called a reflecting diffusion on [a, oo) having drift and
diffusion coefficients u(x), a 2 (x). The point a is then called a reflecting boundary
point.

Note that in this definition (0) is not required to be zero. It is possible to give
a description of such a general diffusion with a reflecting boundary, similar to
that given in Theorem 6.1 for the case (0) = 0 (see theoretical complement 1).

Example 1. Consider a reflecting diffusion on S = [0, oo) with u(-):= 0,


v 2 (x):= a 2 > 0. This is called the reflecting Brownian motion on [0, x). Its
transition probability density is, by (6.9) or (6.14),
_ x 2 2
q(t; x, Y) _ (2n02t) - ^i z exp (y x) + exp (y
2Q t )
(t>0;x,y>,0). (6.15)

Example 2. Consider the Ornstein-Uhlenbeck process (Example 5.2) {X 1 } on


W with (x) _ yx, a 2 () := a 2 > 0. By Theorem 6.1, {IX t I} is the reflecting
diffusion on S = [0, co ), with transition probability density (see (5.17) and (6.9))

-viz[ yl z
yt)
q(t; x, y) _ ? (1 e -2 exp a
z (1 e -2 y t )

_ y( y_+_xe) 2 1
+ exp a >
z (1 e-2yi)

(t>0;x,y>,0). (6.16)

By arguments similar to those given in Section 2, we shall now derive the


forward equation for a reflecting diffusion on [a, cc). By considering twice
continuously differentiable functions f, g vanishing outside a finite interval, and
g vanishing in a neighborhood of a as well, one derives the forward equation
exactly as in (2.33)-(2.41) (Exercise 6),

l p(t; x, Y) 2 a
(ia (Y)P(t; x, Y)) - ((Y)P(t; x, Y)),
Y Y
(t>0;x>,a,y>a). (6.17)
398 BROWNIAN MOTION AND DIFFUSIONS

To derive the forward boundary condition, differentiate both sides of the


following equation with respect to t,

1
=J to-I
p(t; x, y) dy, (6.18)

to obtain, using (6.17),

(' ap(t; x, y) r a a
0
=J tQ 00 ^ at
dY =
J f^(Y)P + a (ia 2 (Y)P) dY
t.'.) ay \ ay

= (a)p(t; x, a) \ Y (zo 2 (Y)P(t; x, Y))


)y = a

Hence the forward boundary condition is

Oy (io 2 (Y)P(t; x, Y)) y =a (a)p(t; x, a) = 0. (6.19)

PERIODIC BOUNDARY CASE. (Diffusions on a Circle: Periodic Boundary


Conditions). Let ( ), a'(.) be defined on R' and satisfy Condition (1.1). Assume
that both are periodic functions of period d.

Theorem 6.2. Let {X} be a diffusion on l' with periodic coefficients ( ), 2(.)
as above. Define

{Z,} := {XX (mod d)} . (6.20)

Then {Z} is a Markov process on [0, d) whose transition probability density


function q(t; z, z') is given by

q(t; z, z') _ Z p(t; z, z' + md) (t > 0; z, z' e [0, d)) (6.21)
m=m

where p(t; x, y) is the transition probability density function of {X}.

Proof. Let f be a real-valued bounded continuous function on [0, d] with


Proof
f(0) = f (d ). Let g be the periodic extension of f on Q^' defined by
g(x + md) = f (x) for x e [0, d] and m = 0, 1, +2, .... We need to prove

E[f(Z3+)1 {Z: 0 < u < s}] =


[ J df(zr)q(t; z, z')
dz'] . (6.22)
0 2 -Z
DIFFUSIONS WITH REFLECTING BOUNDARIES 399

Now, since f(Z) = g(X,), and {X} is Markovian,

1
E[f (ZS +,) {X: 0 < u < s}] = E[g(X5+,) {X: 0 < u < s}] 1

[II
=g(y)p(t; x, y) dY]

m +l}d
z x s

x, Y) dY(6 . 2 3)
m=m LLL L 9(Y)P(t;
d x =x,,

But the periodicity of z() and a'() implies that the Markov processes Al
and {X, md} have the same drift and diffusion coefficients (Proposition 3.1
or Exercise 2) and, therefore, the same transition probability densities. This
means

p(t;x+md,y+md)=p(t;x,y) (t>O;x,ye[0,d);m=0,+1,2,...).
(6.24)
Using (6.24) in (6.23), and using the fact that g(md + z') = f(z') for z E [0, d),
one gets

E[ f (ZS+ ,) I {X: 0 u < s}] =


m =
J
ao 0
d g(md + z') p(t; Xs , and + z')

= f(z') p(t; X , and + z') dz'.(6.25)


5

0 m = ao

Since Zs = X (mod d), there exists a unique integer m o = m 0 (X ) such that


5 3

Xs = m o d + Z. Then

p(t; X , and + z') = p(t; m o d + Z , and + z') = p(t; Z


S 5 S, ( m m 0 )d + z'),

by (6.24). Hence (6.25) reduces to (taking m' = m m 0 )

E[f(Z ,) I {X: 0 - u < s}] = f od


. f (z') p(t; Z , m'd + z') dz
S

= J d
0
f (z')q(t; Zs , z') dz'. (6.26)

The desired relation (6.22) is obtained by taking conditional expectations of


extreme left and right sides of (6.26) with respect to {Z: 0 < u < s}, noting that
{Z: 0 < u < s} is determined by {X: 0 < u < s} (i.e., a{Z: 0 < u < s} c
Q{X:O'<u<s}). n

Since 0(mod d) = 0 = d(mod d), the state space of {Z} may be regarded as
400 BROWNIAN MOTION AND DIFFUSIONS

the compact set [0, d] with 0 and d identified. Actually, the state space is best
thought of as a circle of circumference d, with z as the arc length measured
counterclockwise along the circle starting from a fixed point on the circle. It
may also be identified with the unit circle by zE-.exp{i2irz/d}.

Definition 6.5. Let {XX } be a diffusion on O' with periodic drift and diffusion
coefficients with period d. Then the Markov process {Z} := {X,(mod d)}, is
called a diffusion on [0, d) with periodic boundary or a diffusion on the circle.

Example 3. Let {Xj be a diffusion on 0V with drift and diffusion coefficients


and a 2 > 0. Then {Z, := X1 (mod 2it)} is the circular Brownian motion, so named
since Z, can be identified with e` Z '. The transition probability density of this
diffusion is

q(t; z, z')
_ m-
=m
1
(2ja2t)-i2 expl (2mn + z' z (
2a2t 6.27)
t)2

Underlying the proofs of Theorems 6.1 and 6.2 is an important principle


that may be stated as follows (Exercise 5 and theoretical complement 2).

Proposition 6.3. (A General Principle). Suppose {X,} is a Markov process on


S having transition operators T, t > 0. Let q be an arbitrary (measurable)
function on S. Then {q(XX is a Markov process on S':= q(S) if T ( f o 4,) is a
)} '

function of (p, for every bounded (measurable) f on S'.

TWO-POINT BOUNDARY CASE. (Diffusions on S = [0, 1] with "0" and "1"


Reflecting). To illustrate the ideas for the general case, let us first see how to
obtain a probabilistic construction of a Brownian motion on [0, 1] with zero
drift, and both boundary points 0,1 reflecting. This leads to another application
of the above general principle 6.3 in the following example.

Example 4. Let {X} be an unrestricted Brownian motion with zero drift,


starting at x e [0, 1]. Define Z,(' ) := X,(mod 2). By Theorem 6.2, {Z} is a
diffusion on [0, 2] (with "0" and "2" identified). Hence {Z, 2 ':= Z,( " 1} is a
diffusion on [-1, 1] whose transition probability density is (see (6.27))

q(2)(t; z, z') _Y
,,,_ _
( (2m+z'z) 2 1
(2nQ 2 t) 1/2 expt
2a2t Jj
for 1 < z', z ,< 1.
(6.28)
In particular,

q(2)(t; z, z') = q (2) (t; z, z'). (6.29)

It now follows, exactly as in the proof of Theorem 6.1, that {Z r } = { cp(Zt( z) )} :_


{jZt( 2) I} is a Markov process on [0, 1]. For, if f is a continuous function on

DIFFUSIONS WITH REFLECTING BOUNDARIES 401

[0, 1] then

E[f(Z,) I Z' = z] = E[f(IZ; Z ^I) Z0 = z]

= I f (I z'I)q(t; z, z') dz'


1

= f 0 f(Iz'I)q ( ^ 1 (t;z,z') dz' + J


0
f(Iz'I)q(2(t;z z') dz'

= f(Iz'I)(q` 2 (t; z, z') + q (2 (t; z, z')) dz'


f0

f
i

= f(lz'I)(q (2> (t; z, z') + q (2 (t;z, z')) dz' (by (6. 29 ))


0

=f(Iz'I)q(t; Izl, z') dz', say, (6.30)


fo
which is a function of q(z) = IzI. Hence, {Z,} is a Markov process on [0, 1]
whose transition probability density is

q(t; x, y) = q (2 (t; x, y) + q(2 (t; x, y)

+ y x)Zl + exp{ (2m + y + x)?^J


_ (2na2t)
m= co li2[exp
{ (2m
2QZt j l 2a2t

forx,ye[0,l]. (6.31)

Note that

q(t; x, y) aq( 2 )( t; x, Y) a9 (2) (t; x, y)


at at + at

x, 2m + y) + p(t; x, 2m + y)), (6.32)


m= ao at (p(t;

where p(t; x, y) is the transition probability density of a Brownian motion with


zero drift and diffusion coefficient 6 2 . Hence

_ aq(t; x, )
Z
1- a9t>Y)
ax2
y
, (xE(0 , 1 )>YE[0 , 1 ]). (6.33)
at

Also, the first equality in (6.31) shows that

anti- x_ v1
=0, (yE[0,1]). (6.34)
UA 1x=0
402 BROWNIAN MOTION AND DIFFUSIONS

Since q (2) (t; x, y) + q (2) (t; x, y) is symmetric about x = 1 (see (6.21)) one has

aq(t; x, y) = 0
(y E [0 , 1 ]). (6.35)
3x x=1

Thus, q satisfies Kolmogorov's backward equation (6.33), and backward


boundary conditions (6.34), (6.35).

In precisely the same manner as in Example 4, we may arrive at a more


general result. In order to state it, consider ( ), Q 2 () satisfying Condition (1.1)
on [0, 1]. Assume
(0) = 0 = (1). (6.36)

Extend p(.), a 2 (.) to ( oo, oo) as follows. First set

p(x) = (x), u 2 (x) = r 2 (x) for x e [0, 1] (6.37)

and then set

p(x+2m)=(x), a 2 (x+2m)=v 2 (x) forxe[-1,1],


m=0,1,2.....(6.38)

Theorem 6.4. Let )' a 2 (.) be extended as above, and let {XX } be a diffusion
on S = (oo, oo) having these coefficients. Define {Z'} __ {X,(mod 2)}, and
{ZZ } _= {IZ,( 1 " 1I}. Then {Z^} is a diffusion with coefficients p(.) and Q 2 (.) on
[0, 1], and reflecting boundary points 0, 1.

The condition (6.36) guarantees continuity of the extended coefficients.


However, if the given t(.) on [0, 1] does not satisfy this condition, one may
modify y(x) at x = 0 and x = 1 as in (6.36). Although this makes z(.)
discontinuous, Theorem 6.4 may be shown to go through (see theoretical
complements 1 and 2).

7 DIFFUSIONS WITH ABSORBING BOUNDARIES

Diffusions with absorbing boundaries are rather simple to describe. Upon arrival
at a boundary point (state) the process is to remain in that state for all times
thereafter. In particular this entails jumps in the transition probability
distribution at absorbing states.

7.1 One-Point Boundary Case (Diffusions on S = [a, oo) with Absorption at a)


Let p( ), a'(.) be defined on S and satisfy Condition (1.1). Extend ( ), a'( )
on all of W in some (arbitrary) manner such that Condition (1.1) holds on II'
DIFFUSIONS WITH ABSORBING BOUNDARIES 403

for this extension. Let {X} denote a diffusion on ER' having these coefficients,
and starting at x e [a, oo). Define a new stochastic process {X = } by

X, if t < Ta
(7.1)
^
a=X^. ift>,T a ,

where 'c a is the first passage time to a defined by

ra := inf{t >, 0: X` = a}. (7.2)

Note that 'c a - i a ({X,}) is a function of the process {X}.

Theorem 7.1. The process {X,} is a time-homogeneous Markov process.

Proof. Let B be a Borel subset of (a, oo). Then

P(XS+ ,EB, and T a 's {X:0-<u<s})=0. (7.3)

On the other hand, introducing the shifted process {(X) := X5+ ,: t > 0} and
using the Markov property of {X,} one gets, on the set {T a > s},

P(Xs+ ,EB,t a >s1 {X:0u<s})

= P(Xs+1 e B, T a > s, and T a > S + t {X: 0 1 u <1 s})

= 1 {ta , $} P((XS ), e B, and 'r a (XS ) > t I {Xu : 0 <- u -< s})
= 1 {^a, )P(X, e B, and Ta > t I {Xo = Y})I ^ =xs ,

= 1 {ra>S}P(X, e B {X0 = Y})I=x,. (7.4)

For the second equality in (7.4), we have used the fact that if T a > s then
T a = s + T a (X, ), being the first passage time to a for X. Also, (T a > s}
is determined by {X: 0 < u < s}, so that 1{a>s} may be taken outside the
conditional probability. Combining (7.3) and (7.4) one gets, for B c (a, oo),

P(Xs+ ,eBI {XX :0<u<s})=P(X,eBI{X 0 =Y})I v=xs 1 (=a>s)


= P(X, e B {X o = y})I v=2= l ira>s ^
= P(XI e B I {Xo = Y})I v=g, (7.5)

The second equality holds, since r a > s implies Xs = Xs (and, always, X o = Xo ).


The last equality holds, since T. < s implies X3 = a, so that for B contained in
(a, co), P(X e B I {Xo = y})I y=x , = 0. Now the sigmafield Q{X: 0 < u < s} is
contained in v{X: 0 < u < s}, since X is determined by {X: 0 < v < u}. Hence
P(Xs+ e B {X: 0 < u < s}) may be obtained by taking the conditional
expectation of the left side of (7.5), given {X:0 < u < s}. But the last expression
404 BROWNIAN MOTION AND DIFFUSIONS

in (7.5) is already a function of Xs . Hence

P(X,,, e B I {X,,: 0< u 5 s}) = P(', E B I Xo = y)I v =g.. (7.6)

For B = {a}, (7.6) may be checked by taking B = (a, oo), and by


complementation. This establishes the Markov property of {X,}.

Definition 7.1. The Markov process {XJ is called a diffusion on [a, cc) having
drift and diffusion coefficients u( ), a 2 ( ), and an absorbing boundary at a.
The transition probability p(t; x, dy) of {Xj is given by

p(t; x, B) := P(X^ e B I {X = x})


(P(XX e B, and t Q > t I {X = x}) if B c (a, oo)
(7 7)
S(P(TQ '<tI{X =x}) ifB={a}.

In particular, the transition probability distribution will have a jump (positive


probability) at the boundary point {a} if a is accessible.
Let p(t; x, y) denote the transition probability density of {X}. Since, for
Bc(a, cc) and x>a,

p(t;x,B)=P(X1 EB,and to >t1 {X =x})


P(XX e B I {X = x}) = p(t; x, B), (B c (a, oo)), (7.8)


P(t; x, dy) is given by a density p (t; x, y), say, on (a, co) with p (t; x, y) < p(t; x, y)
(for x > a, y > a) (Exercise 1). One may then rewrite (7.7) as

IffB p(t;x,y)dy ifBc(a, cc) and x>a,


(7.9)
P(t; x B) = P(TQ t)
ifB={a},

where P. denotes the distribution of {X} starting at x.


Thus for the analytical determination of p(t; x, dy) one needs to find the
density p (t; x, y) and the function (t, x) -^ Px (ra < t) for x > a. It is shown in
Section 15 that p (t; x, y) satisfies the same backward equation as does p(t; x, y)
(also see Exercise 2).


ap (t; x, y) = I 2 0Zp ap0 (t>0;x>a,y>a), (7.10)
t Za (x) axz + (x) Ox

and the Dirichlet boundary condition


lim p (t; x, y) = 0. (7.11)
xlo
DIFFUSIONS WITH ABSORBING BOUNDARIES 405

Indeed, (7.11) may be derived from the relation (see (7.9))

P (t; x, Y) dY = Px (T a > t), (x > a, t > 0), (7.12)

noting that as x j a the probability on the right side goes to zero. If one assumes
that p (t; x, y) has a limit as x j a, then this limit must be zero.
By the same method as used in the derivation of (6.17) it will follow (Exercise
3) that p satisfies the forward equation

aP (t;x,Y) = aZ
t
(i 6Z (Y)P ) a ((Y)P ), (t > 0; x > a,Y > a), (7.13)
ay2 aY

and the forward boundary condition

lim p (t; x, y) = 0, (t > 0; x > a). (7.14)


y1a

For a Brownian motion on [0, co) with an absorbing boundary zero, p is


calculated in Section 8 analytically. A purely probabilistic derivation of p is
sketched in Exercises 11.5, 11.6, and 11.11.

7.2 Two-Point Boundary Case (Diffusions on S = [a, b] with Two Absorbing


Boundary Points a, b)
Let U( ), i 2 () be defined on [a, b]. Extend these to J' in any manner such
that Condition (1.1) is satisfied. Let {X,} be a diffusion on R' having these
extended coefficients, starting at a point in [a, b]. Define the stopped process
{X,} by,

tx,X,
ift>T,(7.15)
ift ^i,

where T is the first passage time to the boundary,

r:= inf{t>,0:X,=aor X,=b}. (7.16)

Virtually the same proof as given for Theorem 7.1 applies to show that {X t }
is a Markov process on [a, b].

Definition 7.2. The process {X,} in (7.15) is called a diffusion on [a, b] with
coefficients i(.), a'() and with two absorbing boundaries a, b.

Once again, the transition probability p(t; x, dy) of {X} is given by a density
406 BROWNIAN MOTION AND DIFFUSIONS

p (t; x, y) when restricted to the interior (a, b),


p(t; x, B) = J p (t; x, y) dy (t > 0; a <x < b, B c (a, b)). (7.17)
a

This density has total mass less than 1,

J(a,b)

P (t; x, y) dy = PX (r > t), (t > 0; x e (a, b)), (7.18)

where Ps is the distribution of {X}. Also, by the same argument as in the


one-point boundary case,

lim p (t; x, y) = 0, (7.19)


Y1a

lim p (t; x, y) = 0; (t > 0; x e (a, b)). (7.20)


YIb

Unlike the one-point boundary case, however, p does not completely


determine p(t; x, dy) in the present case. For this, one also needs to calculate
the probabilities

P(t; x, {a}) = Px (r < t, Xi = a), P(t; x, {b}) := Px (i ,< t, XL = b). (7.21)

In order to calculate these probabilities, let A denote the event that the
diffusion {X,}, starting at x, reaches a before reaching b, and let DD be the event
that by time t the process does not reach either a or b but eventually reaches
a before b. Then DD c A and

P(t; x, {a}) = Px(A\D,) = P(A) Px(DD) = 1i(x) P(D), (7.22)

where (/i(x) is the probability that starting at x the diffusion {X,} reaches a
before b. Conditioning on XX , one gets for t > 0, a <x < b,

PP(DD) =
f(a,b)

PX(D, 1 X, = y)P (t; x, y) dy =
f (a.b)
i(y)P(t; x, y) dy. (7.23)

Substituting (7.22) in (7.23) yields

p(t; x, {a}) = ^i(x) J (a,b)



^i(y)p (t; x, y) dy, (t > 0, a < x < b). (7.24)

Therefore we have the following result.


DIFFUSIONS WITH ABSORBING BOUNDARIES 407

Proposition 7.2. Let (x) and a 2 (x) satisfy Condition (1.1) on (oo, oo). Let
S = [a, b], for some a < b. Then the probability p(t; x, {a}) is given by

P(t; x, {a}) = /i(x) J0 6 ^i(Y)p (t; x, y) dy (t > 0, a <x < b), (7.25)

where '(y) = PY (X, = a), Py being the distribution of the unrestricted process
{X,} starting at y.

The function li(x) in Proposition 7.2 is given by (2.24) as well as (4.17). It


is also calculated in Section 8, where it is shown that tJi(x) can be obtained as
the solution to a boundary-value problem that is the continuous (or, differential
equation) analog of the discrete (or difference equation) boundary-value
problem (Chapter III, Eqs. 2.4-2.5) for the birthdeath process. It is given by

rJi(x) _
f.
'
exp
Z

2(y) dy 1J dz
dy } dz
< Ja ^Z(Y) ) ) (a < x <, b). (7.26)

f exp
O ."
f(Y)

Finally,

p(t; x, {b}) = Px (r < t) p(t; x, {a}) = 1 f p(t; x, y) dy p(t; x, {a}).


(a,b)
(7.27)
For the case of a Brownian motion on [a, b] with both boundary points
a, b absorbing, calculations of p(t; x, y), /i(x) based on eigenfunction
expansions and (7.26) are given in Section 8. A purely probabilistic calculation
is sketched in Exercises 11.5, 11.6, and 11.10.

7.3 Mixed Two-Point Boundary Case (One Absorbing Boundary Point and
One Reflecting)
Let {X} be a diffusion on [a, oo) with a reflecting boundary a, having drift and
diffusion coefficients t(), a 2 (.). For b > a, define

X, if t < zb,
i (7.28)
{
b ift>T,.

If {X,} starts at x e [a, b], then {1,} is a Markov process on [a, b], starting at
x, which is called a diffusion on [a, b] having coefficients p(), a 2 ( ), and a
reflecting boundary point a and an absorbing boundary point b.

408 BROWNIAN MOTION AND DIFFUSIONS

8 CALCULATION OF TRANSITION PROBABILITIES BY


SPECTRAL METHODS

Consider an arbitrary diffusion on an interval S with drift (x) and diffusion


coefficient a 2 (x). If an end point is accessible then choose either a Neumann

(reflecting) or a Dirichlet (absorbing) boundary condition. Let S denote the
interior of S. Define the function

n(x) = 2a exp{I(x , x)} , x e S, (8.1)


a 2 (x)

where a is an arbitrary positive constant, x 0 is an arbitrarily chosen state, and

2(z)
1(x 0 , x) := x dz (8.2)
xo a 2 (Z)

Notice that n(x) is proportional to the limiting measure obtained from Eq. 4.1
of Chapter III, in the diffusion limit of birthdeath chains with the parameters
given by (4.2) of the present chapter.
Consider the space LZ (S, it) of real-valued functions on S that are square
integrable with respect to the density n. Let f, g be twice continuously
differentiable functions that satisfy the backward Neumann or Dirichiet
boundary condition(s) imposed at the boundary point(s), and that vanish
outside a finite interval. Then, upon integration by parts, one obtains the
following property for A (Exercise 1)

<Af, g>, = < f, Agin, (8.3)

where

Af(x) = za 2 (x)f"(x) + p(x)f'(x), x E S , (8.4)

and the inner product < , >,, is defined by

<f h> =
fS
f(x)h(x)n(x) dx. (8.5)

In view of the symmetry of A reflected in (8.3), one may expect that there is a
spectral representation of the transition probability analogous to that for
discrete state spaces (see Section 4 of Chapter III and Section 9 of Chapter IV).
That this is indeed the case will be illustrated in forthcoming examples of this
section (also see theoretical complement 2).
Consider the case in which S is a closed and bounded interval. The idea
behind this method is that if i is an eigenfunction of A (including boundary
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 409

conditions) corresponding to an eigenvalue a, i.e.,

A>y = a>G (8.6)

then u(t, x) = e /i(x) solves the backward equation

^u = eai/i(x) = eAi(x) = Au(t, x) (8.7)


^t

and u satisfies the boundary conditions. Likewise, if u(t, x) is a superposition


(linear combination) of such functions then the same will be true. Suppose
that the set of eigenvalues (counting multiplicities) is countable, say
1 0 , a,, a 2 , . .. with corresponding eigenfunctions Mi o , 0 l , . .. of unit length, i.e.,
II0.II = <., .> 112 = 1. It is simple to check that eigenfunctions corresponding
to distinct eigenvalues are orthogonal with respect to the inner product (8.5)
(Exercise 2). Also if there are more than one linearly independent eigenfunctions
for a single eigenvalue, then these can be orthogonalized by the GramSchmidt
procedure (Exercise 2). So , ... can be taken to be orthonormal. If the
set of finite linear combinations of the eigenfunctions is dense in LZ (S, it), then
the eigenfunctions are said to be complete. In this case each f E LZ (S, n) has a
Fourier expansion of the form

J = Y_ <fi'Yn>nWn. (8.8)
n=0

Consider the superposition defined by

u(t, x) = E e"^`<f, ' >, i (x). (8.9)


n=0

Then u satisfies the backward equation and the boundary condition(s) together
with the initial condition

u(0, x) = f(x). (8.10)

However, the function

T^f(x) = EXf(X,) = I .f(y)p(t; x, dy) (8.11)


Js
also satisfies the same backward equation, boundary condition and initial
condition. So if there is uniqueness for a sufficiently large class of initial functions
f (see theoretical complement 2) then we get

T,f(x) = u(t, x), (8.12)


410 BROWNIAN MOTION AND DIFFUSIONS

which means

e""'^(x)^n(Y))it(Y) dy.
fs f(Y)p(t; x, dy) = fs f(Y)(Z
=O
(8.13)

In such cases, therefore, p(t; x, dy) has a density p(t; x, y) and it is given by


p(t; x, y) _Ze "'^n(x)^ n(Y)n(Y)
a
(8.14)
n=o

In the present section this method of computation of transition probabilities


is illustrated in the case that (x) and o 2 (x) are constants.
In Examples I and 2 below, y = 0 and o 2 (x) - z > 0, so that n(x) is a
constant that may be taken to be 1.
Explicit computations of eigenvalues and eigenfunctions are possible only in
very special cases. There are, however, effective numerical procedures for their
approximate computation.

Example 1. (Brownian Motion on [0, d] with Two Reflecting Boundaries). In


this case, S = [0, d], and Kolmogorov's backward equation for the transition
probability density function p(t; x, y) is
z
(t > 0, 0 < x < d; y e S), (8.15)
t zoz ax

with the backward boundary condition

a t
p =0 (t > 0; ye S). (8.16)
x x=o,a

We seek eigenvalues a of A and corresponding eigenfunctions 0. That is,


consider solutions of
?oz^(x) = ai(x), (8.17)

i'(0) = '(d) = 0. (8.18)

Check that the functions

^i m (x) = b m cos mit ^x8.19


l ( m=0,1,2,...), ( 8.19 )

satisfy (8.17), (8.18), with a given by


m z ir 2 a z
, 2d2 (m=0,1,2,...). (8.20)
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 411

Now the function (8.1) is here it(x) = 1. Since

J 0
l z dx=d, J cost(Jdx =2,
0
(8.21)

the normalizing constant b m is given by

b rn = jI
\ (m ? 1); bo d
(8.22)

To see whether (8.19) and (8.20) provide all the eigenvalues and eigenfunctions,
one needs to check the completeness of {di m : m = 0, 1, 2, ...} in L2 ([0, d], n).
This may be done by using Fourier series (Exercise 10). Thus,

a"t
p(t; x, y) = L e II/m(x)frm(y)
m=0

1 2 z z z
1 anmmirx m7ry
exp _ +
t cos cos ( X )
d
dm, 2dz d
(t >0,0<x,y <d). (8.23)

Notice that

1
lim p(t; x, y) = d , ( 8.24)

the convergence being exponentially fast, uniformly with respect to x, y. Thus,


as t * oo, the distribution of the process at time t converges to the uniform
distribution on [0, d]. In particular, this limiting distribution provides the
invariant initial distribution for the process.

Example 2. (Brownian Motion with One Reflecting Boundary). Let S = [0, oo),
(x) - 0, a 2 (x) - a z > 0. We seek the transition density p that satisfies

ap 2 a lp
fort>0, x >0, y:0,
at= 2
a xZ
(8.25)
ap
=0 fort>0, y>,O.
ax I =o
X

This diffusion is the limiting form of that in Example 1 as d j oo. Letting d j oo


412 BROWNIAN MOTION AND DIFFUSIONS

in (8.23) one has

1 ( oznz(m/d)2 1 ^
p(t; x, y) = lim - exp 1 t)t cos cos
d r^d m 2=-m d d
\ m^y^
( to z i z u zl
= exp{ 2 cos(zrxu) cos(nyu) du
-X 1. )
Zzz
= 1 exp - ta [cos(n(x + y)u) + cos(n(x - y)u)] du
2.

_ 2^ I (e
,J-^
+ e)
ex{p_
l
t62 z 1 dZ
I
( _ nu)

(x + Z )Z } + ex
= (2ntor)
lz ^1z (exp {( _ 2tv ) p 2tQ Z )Z) /
(x
(t>0,0<x,y<oo). (8.26)

The last equality in (8.26) follows from Fourier inversion and the fact that
exp{ ta 2 ^ 2 /2} is the characteristic function of the normal distribution with
mean zero and variance to e .

Example 3. (Diffusion on S = [0, d] with Constant Coefficients and Two


Absorbing Boundaries). Here S = [0, d], and the transition probability has a
density p(t; x, y) on 0 < x, y < d that satisfies the backward equation

z
(t>0,0<x<d;0<y<d), (8.27)
a^ 2a +ax,

the backward boundary conditions

Jim p(t; x, y) = 0, lim p(t; x, y) = 0, (8.28)


x10 xjd

for t > 0, 0 <y < d. Check that the functions

yx In=
I1 m (x) = b m exp -- z sin d( m = 1, 2, ...) (8.29)

are eigenfunctions corresponding to eigenvalues


m z n 2^2 2
(m = 1, 2, ...). (8.30)
= a. 2d2 2a2 ,
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 413

Since the function in (8.1) is in this case given by

n(x) = exp { 2px )) } (O ^ x ^ d), ( 8.31 )


JJ

the normalizing constants are

2
b m = (m=1,2,...). (8.32)
d

The eigenfunction expansion of p is therefore


W
p(t; x, y) = Z eam'IJm(x)Im(Y)ir(Y)
m=I

_ 2 exp( (Y0-z
x)
} exp t
t ia-i j expj m22daZt^
m=1
l 2d 2

x sin( m
dx ) sin^ m
dy^ (t > 0,0 < x, y < d). (8.33)

It remains to calculate

p(t; x, {0}) = PX (XX = 0), p(t; x, {d}) = PX (X, = d), 0 < x < d. (8.34)

Note that

p(t; x, {0}) + p(t; x, {d}) = 1 J p(t; x, y) dy


(0d)

= 1 2 exp
1 Q2c m^I a
m exp{
l
m2

d^l
\(8.35)
2t si
n( ^
( mix

where
d
am = exp " sin y)dy. (8.36)
f'

Thus, it is enough to determine p(t; x, {0}); the function p(t; x, {d}) is then
determined from (8.35). From (7.25) we get

p(t; x, {0}) = +1(x) 0(y)p(t; x, y) dy (8.37)


J (0 d)

where i/i(x) = Px ({X,} reaches 0 before d). From the calculations of (9.6) in

414 BROWNIAN MOTION AND DIFFUSIONS

Chapter I we have

x if=0
(8.38)
^ (x) = exp{2(d x)/a2}
if 0.
1 exp{2d/6 2 }

Substituting (8.33), (8.38) in (8.37) yields p(t; x, {0}).

Example 4. (Diffusion on S = [0, oo) with Constant Coefficients , o 2 > 0;


Zero an Absorbing Boundary). The transition density function p(t; x, y) for
x, y e (0, oo) is obtained as a limit of (8.33) as d T oo. That is,

J 2 exp l 2
nz x) _
IY ^z
p(t; x, y) = 2t exp 2 sin(irxu) sin(ityu) du
}n2o2tu2
2
_ exp 1 J exp{ - 2 } sin(Zx) sin(ZY) dZ
-

t( a 2 x) 2 Q Z o 1. ))

2 2t)
2_ exp p(y i J exp{
l ort z ).
x) / 0

x [cos(^(x y)) cos(^(x + y))] d^


z
(Y x) _ /it )
= xp 1
2ir
e
Qz 2QZ
e-i^cx-ri _ e -14cx+Y)) e -a 2 ,4 2 J2 dZ

2t ) 1
(
= exp { (y x)
l Q z 2vz (2na 2 t) 1 / 2
(x y) z (xexp+
y)
z
x exp (8.39)
2o 2 t 2vzt

Integration of (8.39) with respect to y over (0, oo) yields p(t; x, {0}`), which
is the probability that, starting from x, the first passage time to zero is greater
than t.
Examples 3 and 4 yield the distributions of the maximum and the minimum
and the joint distribution of the maximum, minimum and the state at time
t of a diffusion with constant coefficients over the time period [0, t]
(Exercises 4-6).

9 TRANSIENCE AND RECURRENCE OF DIFFUSIONS

Let {X': t 0} be a diffusion on an interval S, with drift and diffusion


coefficients (x) and a 2 (x), starting at x. Let [c, d] c S, c < d, and let
TRANSIENCE AND RECURRENCE OF DIFFUSIONS 415

O(x) = P({XI } reaches c before d), c < x < d. (9.1)

Assume, as always, that Condition (1.1) holds. Criteria for transience and
recurrence of diffusions may be derived from the computation of i/i(x) given in
Section 4 (see Eq. (4.17)). A different method of computation based on solving
a differential equation governing ifr is given in this section; recall Chapter III,
Eqs. 2.4, 2.5 in the case of the birthdeath chain for the analogous discrete
problem. The present method is similar to that of Proposition 2.5, but does
not make use of martingales. It does, however, use the fact derived in Exercise
3.5, namely,

Px ( max IX, xI > s = o(h) as h j 0 for every s > 0.


os<<h /

Assume that this last fact holds.

Proposition 9.1. i/i is the solution to the two-point boundary-value problem:

za 2 (x)r/i"(x) + (x)/i'(x) = 0 for c <x < d,


(9.2)
i(c) = 1, '(d)=0.

Proof. Let i denote the first passage to the set {c, d} with two elements. The
event
{i > h} - {c < min{X: 0 < u < h} < max{X: O <, u < h} <d}

is determined by {X,,: 0 <, u < h}. Therefore, by the Markov property,

Px (r>h,X,=c)=E x [P('r>h,X,=c1 {X:0<,u s h})]


=Ex [P(r>h,(X,; ) T =c1{X.:0<u s h})]
= Ex[ 1 {T>h)P((Xti )T' = c {X: 0 u h})]
= Ex(l{r>h}/(Xh)) (9.3)

Here X,; is the shifted process {(X,; ),} := {X,, ,: t >, 0} and tr' = inf{t: (X,) = c}
(= T h on {r> h}). Now extend the function i/i(x) over x <c and x > d
smoothly so as to vanish outside a compact set in S. Denote this extended
function also by ay. Then (9.3) may be expressed as

P.('r > h, X^ = c) = Ex0 (Xh) EX(l(T,n)(J(Xh))

=
J
I (y)p(h; x, y) dy + o(h), as h j 0. (9.4)
s

The remainder term in (9.4) is o(h) by the third condition in (1.2)'. Actually,

416 BROWNIAN MOTION AND DIFFUSIONS

one needs the stronger property that PX(maxos5shlX5 xj > E) = o(h) (see
Exercise 3.5). For the same reason,

Px (r<h,X,=c)<Px (t<h)=o(h), ashj0. (9.5)

By (9.4) and (9.5)

i/i(x) = Px (r > h, XL = c) + Px (r < h, XL = c)


= (T h, /I)(x) + o(h), as h j 0. (9.6)

Hence

lim (Th/)(x) /i(x) = 0. (9.7)


hlo h

But the limit on the left is (Ai/i)(x) = Za 2 (x)^/i"(x) + p(x)/i'(x) (see Section 2).
n

In order to solve the two-point boundary-value problem (9.2), write

I(x,z):= (x,zES), (9.8)


J 2,u(y) dy
x a 2 (y)
-

with the usual convention that 1(z, x) _ I(x, z). Fix an arbitrary x 0 E S, and
define the scale function

s(x) = s(x o ; x).= J x


xo
exp{-1(x 0 , z)} dz, (x e S), (9.9)

and the speed function

m(x) - m(x o ; x):= J x


xa
2 exp{I(x 0 , z)} dz,
a2 ( z )
(x E S). (9.10)

In terms of these two functions, the differential operator

d2 d
A = 2 6 2 (x) dx2 + (x) x (9.11)

takes the simple form

d d
= (9.12)
A dm(x) ds(x)

TRANSIENCE AND RECURRENCE OF DIFFUSIONS 417

In other words, for every twice-differentiable function f on S (Exercise 1)

(df (x)
(Af)(x) =d (9.13)
dm(x) ds(x)

where, by the chain rule,

df(x) df(x) dx _ 1 df(x) 1 )


(9.14
ds(x) dx ds(x) s'(x) f (x) ' dm(x) m'(x)f (x).

The differential equation in (9.2) may now be expressed as

d
(9.15)
dm(x) ds(x)

which yields

do(x) = c l , O(x) = c l s(x) + c 2 . (9.16)


ds(x)

The boundary conditions in (9.2) now yield,

s(d)
cl
--) = 1- c2 = c i s(d) = (C.
) (9.17)
s(d)s(c) s(d)s(c)

Use these constants in (9.16) to get

O(x) = s(d) s(x) = s(d;x,) s(x;x0)


(9.18)
s(d)s(c) s(d;x,)s(c;x,)

Taking x o = c and noting that s(c; c) = 0, one has

J d exp{ I(c, z)} dz


'(x) = fd (9.19)
exp{-1(c, z)} dz
c

The relation (9.18) justifies the scale function nomenclature for the function s.
When distance is measured in this scale, i.e., when s(y) s(x) is regarded as
the distance between x and y (x < y), then the probability of reaching c before
d starting at x is proportional to the distance s(d) s(x) between x and d. This
is a property of Brownian motion, for which s(x; 0) = x (also see Eq. 9.6,
Chapter 1). In particular, starting from the middle of an interval in this scale,
418 BROWNIAN MOTION AND DIFFUSIONS

the probability of reaching the left end point before the right end point is Z,
the same as that of reaching the right end point first.
It follows from (9.18) (Exercise 2) that

cp{x)'= Px {{X} reaches d before c


)
_
s(x)s(c) _
s(d) s(c)
f x
d
e-i<<,:^ dz

e_'' dz
. (9.20)

Jc

Of course, (9.20) may be proved directly in the same way as (9.18).


Next write

p x , = Px ({X} ever reaches y}), (x, y e S). (9.21)

Then the following results hold.

Proposition 9.2. Let S = (a, b). Fix x 0 e S arbitrarily.


(a) If

s(a) = s(x o ; a) = cc,, (9.22)

then p, = 1 for all y > x. If s(x o ; a) is finite, then

x exp{ 1(x 0 , z)} dz


s(x) s(a) a
Px Y = (y > x). (9.23)
s(Y) s(a) =
f . y
exp{ 1(x 0 , z)} dz

(b)If

s(x o ; b) = oc, (9.24)

then p, = 1 for all y < x. If s(x o ; b) is finite, then

exp{-1(x o , z)} dz
s(b) s(x)= fX' (y < x). (9.25)
pxY s(b) s(y)
f b exp{ 1(x o , z)} dz
J
'

Proof. (a) If y> x then p x ^, is obtained from (9.20) by letting d = y, and c J. a.


(b) If y < x, then use (9.18) with c = y, and let d lb. n

TRANSIENCE AND RECURRENCE OF DIFFUSIONS 419

Definition 9.1. A state y is recurrent if p xy = 1 for all x E S such that p vx > 0,


and is transient otherwise. If all states in S are recurrent, then the diffusion
is said to be recurrent.

The following corollary is a useful consequence of Proposition 9.2.

Corollary 9.3. A diffusion on S = ( a, b) with coefficients (x), r 2 (x) is recurrent


if and only if s(a) = cc and s(b) = oo.

If S has one or two boundary points, then p xv is given by the following


proposition. The modifications of the above to get these conditions are left to
exercises.

Proposition 9.4
(a) Suppose S = [a, b) and a is reflecting. Then the diffusion is recurrent if
and only if s(b) = oo. If, on the other hand, s(b) < oo then one has p 1
for y > x and

b exp{-1(x 0 , z)} dz
s(b)s(x) =
Pxv = y<x. (9.26)
s(b) s(y)
exp{ I(x o ,z)} dz
fb y

(b) Suppose S = [ a, b) and a is absorbing. Then the only recurrent state is


a and

s(x) s(a)
if0<x<'y,
s(y) s(a)

Px y = I ifs(b)=oo
and 0,<y<x, (9.27)
s(b)s(x)
if s(b) < oc and 0 ,< y ,< x.
s(b) s(y)

(c) Suppose S = [ a, b] with both boundaries reflecting. Then p xy = I for all


x, y E S. and the diffusion is recurrent.
(d) Suppose S = [ a, b] with a absorbing and b reflecting. Then the only
recurrent state is a, p xy = 1 for y < x, and

s(x) s(a) y > x.


Px v , (9.28)
= s(y) s(a)

(e) Suppose S = [ a, b] with both boundaries absorbing. Then all states,


420 BROWNIAN MOTION AND DIFFUSIONS

other than a and b, are transient and one has

s(b) s(x)
a y x<b,
s(b) s(y)
Px y = (9.29)
s(x) s(a)
a<x<y^b.
s(y) s(a)

10 NULL AND POSITIVE RECURRENCE OF DIFFUSIONS

As in the case of Markov chains, the existence of an invariant initial distribution


for a diffusion is equivalent to its positive recurrence. The present section is
devoted to the derivation of a criterion for positive recurrence.
For a diffusion {X} let t y denote the first passage time to a state y defined by

r y = inf{t >, 0: X, = y}, (10.1)

with the usual convention that t y (uw) = oo for those sample points w for which
X,(oo) does not equal y for any t (i.e., if the set on the right side of (10.1) is
empty). If c < d are two states, then write

z:=inf {t^0:X(c,d)} =r.A r d , (10.2)

where A denotes the minimum, i.e., S A t = min(s, t). Then i is the first time
the process is at c or d, provided X. e [c, d].
Let us now calculate the mean escape time M(x) of a diffusion from an
interval (c, d),

M(x):= E x T, (10.3)

Note that M(x) = 0 if x (c, d). Here [c, d] c S. Now denoting by Xti the
shifted process {(Xh ),:= Xk+( : t > 0}, one has

T = h + r(X,;) on the set {T > h}, (10.4)

since i(X,;) = z({(X,; ),}) is precisely the additional time needed by the process
{X1 }, after time h, to escape from (c, d). Because 1{,>k} is determined by (i.e.,
measurable with respect to) {X: 0 < u < h}, it follows by the Markov property
that

E x ( 1 {s>k)T(Xj )) = E x [1{ T >k)E(i(Xn ) {X: 0 s u h})] = E x( 1 {z>k}(Eyt) y =x ti )

= E X(l{r>k)M(Xk )) = EX(M(Xh)) Ex( 1 {t,k)M(Xh)). (10.5)


NULL AND POSITIVE RECURRENCE OF DIFFUSIONS 421

By (10.4) and (10.5), for x e (c, d),

M(x) = E x t = hPP (i > h) + EX(M(Xh)) EX(I{isn)M(Xh)) + Ez(r l{ hl)


= h + EX(M(Xh)) hP(r < h) EX(l(T<h)M(Xh)) + EX(rlltsh))
= h + E X (M(Xh )) + o(h), (h 10). (10.6)

The last relation in (10.6) clearly follows if E X (I (Z h} M(Xh )) = o(h), as h J 0; a


proof of which is sketched in Exercise 1. Now (10.6) leads to

1
lim - {(Th M)(x) M(x)} _ 1, (10.7)
Noh

i.e., (AM)(x) = 1 for x e (c, d). Therefore the following result is proved.

Proposition 10.1. The mean escape time M(x) from (c, d) satisfies

Zv z (x)M"(x) + u(x)M'(x) = 1, c < x < d,


(10.8)
M(c) = M(d) = 0.

In order to solve the two-point boundary-value problem (10.8), express the


differential equation as (see Section 9)

d dM(x)
1, (10.9)
dm(x) ds(x)

whose general solution is given by

dM(x) =
m(x) + c 1 , M(x) = c,s(x) f m(y) ds(y) + c 2 . (10.10)
ds(x) cx

From the boundary conditions in (10.8) it follows that

^ d m(y) ds(y)
cz = c,s(c). (10.11)
c ' s(d) s(c) '

Substituting this in (10.10), one gets

z
S(C; X) d
M(x) = m(c; y)s'(c; y) dy m(c; y)s'(c; y) dy
s(c; d) ff

422 BROWNIAN MOTION AND DIFFUSIONS

e-rt^.:^ dz
_ jd d y
^2(z) eic'.Z) dz ] e_'.
dY
J e-,cc,=) dz < <

^ x y 2

^ LJG 6 2 e''
dz e `c'.y dY (c < x < d). (10.12)
-
)

Equation (10.9) justifies the nomenclature speed measure given to m. For suppose
the diffusion is in natural scale, i.e., that it has been relabeled so that s(x) = x.
Then (10.9) may be expressed as

M"(x) = m'(x). (10.13)

If m l , m 2 are two speed measures and M l , M2 the corresponding expected times


to escape from (c, d), then m' (x) > m'' (x) for all x implies M 1 (x) >, M(x) for
all x (Exercise 8). Thus, the larger m' is, the slower is the speed of escape. For
a Brownian motion with zero drift and diffusion coefficient a 2 , the speed measure
is m(x) = (2/a l )x (with x o = 0) and m'(x) = 2/a 2 .
Suppose the state space S has an inaccessible right end point b. Write tc A T d
for min{z,, t d }, the first passage time to {c, d}, so that

M(x) = E x (; A i d ), c < x < d. (10.14)

By letting d T b in (10.14), one obtains Ex ; provided

1im r d = oo with probability 1, (10.15)


dlb

which is what inaccessibility of b means. It may be shown that (10.15) holds,


in particular, if s(b) = oo (Exercise 2). In this case, E x ; is finite if and only if
(Exercise 3)

m(b) := m(x o ; b) b=2 e`cx M dz < cc. (10.16)

It now follows from the first equality in (10.12) that (Exercise 3)

Ex ; = JC (m(b) m(y)) ds(y), x > c. (10.17)

Proceeding in the same manner, it follows (Exercise 4) that if S has an


inaccessible lower end point a and s(a) = oo, then E x id < oo (x < d) if and
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 423

only if

2 e`cX.z) dz - X
m(a):_ m(x o ; a):=

f
. a. a2(Z) a a'
2
W
e-(z,x) dz < oc .
(10.18)
Then (Exercise 4)

E rd = (m(y) m(a)) ds(y), x < d. (10.19)


id
X

Finiteness (with probability 1) of the passage time t y under every initial state
is the meaning of recurrence of a diffusion. A stronger property is the finiteness
of their expectations.

Definition 10.1. A diffusion on S is positive recurrent if Er y < co for all x, y c S.


A recurrent diffusion that is not positive recurrent is null recurrent.

The positive recurrence of a diffusion implies its recurrence, although the


converse is not true.
The following result has been proved above.

Proposition 10.2. Suppose S = ( a, b). Then the diffusion is positive recurrent,


if and only if

s(a) = cc, m(a)> co, (10.20)

and

s(b) = +oo, rn(b) < oo. (10.21)

For intervals S having boundary points one may prove the following result.
The modifications that give this result are left to exercises.

Proposition 10.3
(a) Suppose S = [ a, b) with a a reflecting boundary. Then the diffusion is
positive recurrent if and only if s(b) = cc and m(b) < oo.
(b) Suppose S = [a, b] with a and b reflecting boundaries. Then the diffusion
is positive recurrent.

11 STOPPING TIMES AND THE STRONG MARKOV PROPERTY

Take Ω to be the set of all possible paths, i.e., the set of all continuous functions ω on the time interval [0, ∞) into the interval S (state space). In this case X_t is the value of the trajectory (function) at time t, X_t(ω) = ω_t. The sigmafield ℱ is the smallest sigmafield that includes all finite-dimensional events of the form {X_{t_i} ∈ B_i for 1 ≤ i ≤ n}, where 0 ≤ t₁ < t₂ < ⋯ < t_n and B₁, …, B_n are subintervals of the state space S. The probabilities of such finite-dimensional events are specified by the transition probability p(t; x, dy) of the diffusion and an initial distribution π (see Eq. 1.4). This determines P on all of ℱ.
Let ℱ_t denote the sigmafield of events determined by finite-dimensional events of the form {X_{t_i} ∈ B_i for 1 ≤ i ≤ n} with t_i ≤ t for all i. Thus, ℱ_t = σ{X_s: 0 ≤ s ≤ t} is the class of events determined by the sample path or trajectory of the diffusion up to time t.

Definition 11.1. A random variable τ on the probability space (Ω, ℱ) with values in [0, ∞] is a stopping time or a Markov time if, for every t ≥ 0,

{τ ≤ t} ∈ ℱ_t.

This says that, for a stopping time τ, whether or not to stop by time t is determined by the sample path or trajectory up to time t. Check that constant times τ ≡ t₀ are stopping times. The first passage times τ_y, τ_c ∧ τ_d are important examples of stopping times (Exercise 1). By the "past up to time τ" we mean the pre-τ sigmafield of events ℱ_τ defined by

ℱ_τ := σ{X_{τ∧t}: t ≥ 0}. (11.1)

The stochastic process {X_t^τ} := {X_{τ∧t}} is referred to as the process stopped at τ. Events in ℱ_τ depend only on the process stopped at τ. The stopped process contains no further information about the process {X_t} beyond the time τ.
An alternative description of the pre-τ sigmafield, which is often useful in checking whether a particular event belongs to it, is as follows:

ℱ_τ = {F ∈ ℱ: F ∩ {τ ≤ t} ∈ ℱ_t for all t ≥ 0}. (11.2)

For example, using this it is simple to check that

{τ < ∞} ∈ ℱ_τ. (11.3)

The proof of the equivalence of (11.1) and (11.2) is a little long and is omitted (see theoretical complement 1). For our purposes in this section, we take (11.2) as the definition of ℱ_τ.
It follows from (11.1) or (11.2) that if τ is the constant time τ ≡ t₀, then ℱ_τ = ℱ_{t₀}.
As intuition would suggest, if τ₁ and τ₂ are two stopping times such that τ₁ ≤ τ₂, then

ℱ_{τ₁} ⊂ ℱ_{τ₂}. (11.4)

For, if A ∈ ℱ_{τ₁}, then

A ∩ {τ₂ ≤ t} = (A ∩ {τ₁ ≤ t}) ∩ ({τ₁ ≤ t} ∩ {τ₂ > t})ᶜ. (11.5)

Since A ∩ {τ₁ ≤ t} ∈ ℱ_t and {τ₂ > t} = {τ₂ ≤ t}ᶜ ∈ ℱ_t, the set on the right side of (11.5) is in ℱ_t. Hence A ∈ ℱ_{τ₂}.
Another important property of stopping times is that, if τ₁ and τ₂ are stopping times, then τ₁ ∧ τ₂ := min{τ₁, τ₂} is also a stopping time. This is intuitively clear and simple to prove (Exercise 2). It follows from (11.4) that ℱ_{τ₁∧τ₂} ⊂ ℱ_{τ₁} ∩ ℱ_{τ₂}.
Finite-dimensional events in ℱ_τ are those that depend only on finitely many coordinates of the stopped process, i.e., events of the form

G = {(X_{t₁∧τ}, …, X_{t_m∧τ}) ∈ B}, (11.6)

where B is a Borel subset of S^m := S × S × ⋯ × S (m-fold). Similarly, a finite-dimensional ℱ_τ-measurable function is of the form

f(X_{t₁∧τ}, …, X_{t_m∧τ}), (11.7)

where f is a Borel-measurable function on S^m. For example, one may take f to be a continuous function on S^m.
More generally, one may consider an arbitrary measurable set F of paths and let G be the event that the stopped process lies in F,

G = [{X_{τ∧t}} ∈ F] (F ∈ ℱ). (11.8)

The sigmafield ℱ_τ is precisely this class of sets G. The Markov property holds if the "past" and "future" are defined relative to stopping times, and not merely constant times. For this the past relative to a stopping time τ is taken to be the pre-τ sigmafield, the information contained in the stopped process {X_{τ∧t}}. The future relative to τ is the after-τ process X_τ⁺ = {(X_τ⁺)_t} obtained by viewing {X_t} from time t = τ onwards, for τ < ∞. That is,

(X_τ⁺)_t(ω) = X_{τ(ω)+t}(ω), t ≥ 0, on {ω: τ(ω) < ∞}. (11.9)

Since the after-τ process is defined only on the set {τ < ∞}, events based on it are subsets of {τ < ∞}. The after-τ sigmafield on {τ < ∞} comprises sets of the form

H = {X_τ⁺ ∈ F} ∩ {τ < ∞} (F ∈ ℱ). (11.10)

Finite-dimensional events and functions measurable with respect to this sigmafield of subsets of {τ < ∞} may be defined as in (11.6) and (11.7) with X_{t_i∧τ} replaced by X_{τ+t_i}.

Theorem 11.1. (Strong Markov Property). Let τ be a stopping time. On {τ < ∞}, the conditional distribution of X_τ⁺, given the past up to the time τ, is the same as the distribution of the diffusion {X_t} starting at X_τ. In other words, the conditional distribution is P_{X_τ} on {τ < ∞}.

Proof. First assume that τ has countably many values ordered as 0 ≤ s₁ < s₂ < ⋯. Consider a finite-dimensional function of the after-τ process of the form

h(X_{τ+t₁}, X_{τ+t₂}, …, X_{τ+t_n}) on {τ < ∞}, (11.11)

where h is a bounded continuous real-valued function on Sⁿ and 0 ≤ t₁ < t₂ < ⋯ < t_n. It is enough to prove

E[h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞} | ℱ_τ] = (E_y h(X_{t₁}, …, X_{t_n}))_{y=X_τ} 1_{τ<∞}. (11.12)

If (11.12) holds, then for every bounded ℱ_τ-measurable function Z we have

E(Z h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞}) = E(E[Z h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞} | ℱ_τ])
= E(Z E[h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞} | ℱ_τ])
= E(Z [E_y h(X_{t₁}, …, X_{t_n})]_{y=X_τ} 1_{τ<∞}). (11.13)

Conversely, if the equality between the first and last terms in (11.13) holds for all bounded measurable functions Z of the stopped process, then (11.12) holds. Indeed, to verify (11.12) it is enough to check (11.13) for finite-dimensional functions Z of the form (11.7), where f is bounded and continuous on S^m (Exercise 3). Now

E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ=s_j})
= E(f(X_{s_j∧t₁′}, …, X_{s_j∧t_m′}) h(X_{s_j+t₁}, …, X_{s_j+t_n}) 1_{τ=s_j})
= E(E[f(X_{s_j∧t₁′}, …, X_{s_j∧t_m′}) h(X_{s_j+t₁}, …, X_{s_j+t_n}) 1_{τ=s_j} | ℱ_{s_j}])
= E(f(X_{s_j∧t₁′}, …, X_{s_j∧t_m′}) 1_{τ=s_j} E[h(X_{s_j+t₁}, …, X_{s_j+t_n}) | ℱ_{s_j}]), (11.14)

since X_{s_j∧t₁′}, …, X_{s_j∧t_m′} are determined by {X_s: 0 ≤ s ≤ s_j}, i.e., are measurable with respect to ℱ_{s_j}; also,

{τ = s_j} = {τ ≤ s_j} ∩ {τ ≤ s_{j−1}}ᶜ ∈ ℱ_{s_j}.

By the Markov property, the last expression in (11.14) equals

E(f(X_{s_j∧t₁′}, …, X_{s_j∧t_m′}) 1_{τ=s_j} [E_y h(X_{t₁}, …, X_{t_n})]_{y=X_{s_j}})
= E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) 1_{τ=s_j} [E_y h(X_{t₁}, …, X_{t_n})]_{y=X_τ}).

Therefore,

E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ=s_j})
= E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) 1_{τ=s_j} [E_y h(X_{t₁}, …, X_{t_n})]_{y=X_τ}). (11.15)

Summing (11.15) over j one gets

E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞})
= E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) [E_y h(X_{t₁}, …, X_{t_n})]_{y=X_τ} 1_{τ<∞}). (11.16)

This completes the proof in the case τ has countably many values 0 ≤ s₁ < s₂ < ⋯.
The case of more general τ may be dealt with by approximating it by stopping times assuming countably many values. Specifically, for each positive integer n define

τ_n = (j + 1)/2ⁿ if j/2ⁿ ≤ τ < (j + 1)/2ⁿ (j = 0, 1, 2, …), τ_n = ∞ if τ = ∞. (11.17)

Since

{τ_n = (j + 1)/2ⁿ} = {j/2ⁿ ≤ τ < (j + 1)/2ⁿ} = {τ < (j + 1)/2ⁿ} ∩ {τ < j/2ⁿ}ᶜ,

{τ_n ≤ t} = ∪_{j: (j+1)/2ⁿ ≤ t} {τ_n = (j + 1)/2ⁿ} ∈ ℱ_t for all t ≥ 0.

Therefore, τ_n is a stopping time for each n, and τ_n(ω) ↓ τ(ω) as n → ∞ for each ω ∈ Ω. Also, ℱ_τ ⊂ ℱ_{τ_n} (see (11.4)). Let f, h be bounded continuous functions on S^m, Sⁿ, respectively. Define

φ(y) = E_y h(X_{t₁}, …, X_{t_n}). (11.18)

Then

φ(y) = ∫ g(y₁)p(t₁; y, y₁) dy₁, (11.19)

where g(y₁) = E[h(X_{t₁}, …, X_{t_n}) | X_{t₁} = y₁], so that g is bounded. Since y → p(t₁; y, y₁) is continuous, φ is continuous (see Exercise 4 and theoretical complement 1). Applying (11.16) to τ = τ_n one has

E(f(X_{τ_n∧t₁′}, …, X_{τ_n∧t_m′}) h(X_{τ_n+t₁}, …, X_{τ_n+t_n}) 1_{τ_n<∞})
= E(f(X_{τ_n∧t₁′}, …, X_{τ_n∧t_m′}) φ(X_{τ_n}) 1_{τ_n<∞}). (11.20)



Since f, h, φ are continuous, {X_t} has continuous sample paths, and τ_n ↓ τ as n → ∞, Lebesgue's dominated convergence theorem may be used on both sides of (11.20) to get

E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) h(X_{τ+t₁}, …, X_{τ+t_n}) 1_{τ<∞})
= E(f(X_{τ∧t₁′}, …, X_{τ∧t_m′}) φ(X_τ) 1_{τ<∞}). (11.21)

This establishes (11.13), and therefore (11.12). ∎

Examples 1, 2, 3 below illustrate the importance of Theorem 11.1 in typical computations. In all these examples {X_t} is a one-dimensional Brownian motion with zero drift.

Example 1. (Probability of Reaching One Boundary Point Before Another). Let {X_t} be a one-dimensional Brownian motion with zero drift and diffusion coefficient σ² > 0. Let c < d be two numbers and define

ψ(x) = P_x({X_t} reaches c before d) (c ≤ x ≤ d). (11.22)

Fix x ∈ (c, d) and h > 0 such that [x − h, x + h] ⊂ (c, d). Write τ = τ_{x−h} ∧ τ_{x+h}, i.e., τ is the first time {X_t} reaches x − h or x + h. Then P_x(τ < ∞) = 1, since

P_x(τ > t) ≤ P_x(x − h < X_t < x + h) = (2πσ²t)^{−1/2} ∫_{x−h}^{x+h} exp{−(y − x)²/(2σ²t)} dy
= (2π)^{−1/2} ∫_{−h/(σ√t)}^{h/(σ√t)} exp{−z²/2} dz → 0 as t → ∞. (11.23)

Now,

ψ(x) = P_x({X_t} reaches c before d) = P_x({(X_τ⁺)_t} reaches c before d)
= E_x(P({(X_τ⁺)_t} reaches c before d | {X_{τ∧t}: t ≥ 0})). (11.24)

The strong Markov property now gives

ψ(x) = E_x(ψ(X_τ)), (11.25)

so that by symmetry of the normal distribution,

ψ(x) = ψ(x − h)P_x(X_τ = x − h) + ψ(x + h)P_x(X_τ = x + h) = ψ(x − h)·½ + ψ(x + h)·½. (11.26)

Therefore,

(ψ(x + h) − ψ(x)) − (ψ(x) − ψ(x − h)) = 0. (11.27)

Dividing the second-order difference equation (11.27) by h² and letting h ↓ 0, we get

ψ″ = 0, ψ(c) = 1, ψ(d) = 0. (11.28)

Therefore,

ψ(x) = (d − x)/(d − c), (11.29)

which is a special case of (2.24), or (4.17), or (9.18); see also Eq. 9.6 of Chapter I. Now, by (11.23) and (11.29),

P_x({X_t} reaches d before c) = 1 − ψ(x) = (x − c)/(d − c) (11.30)

for c ≤ x ≤ d. It follows, on letting d ↑ ∞ in (11.29) and c ↓ −∞ in (11.30), that

P_x(τ_y < ∞) = 1 for all x, y. (11.31)
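The linear formula (11.29) is easy to check by simulation. The following rough sketch (Euler time-stepping, illustrative parameters, not part of the text) estimates P_x(reach c before d) for a driftless Brownian motion:

```python
# Monte Carlo estimate of psi(x) in (11.29): P_x(reach c before d) = (d-x)/(d-c).
import numpy as np

rng = np.random.default_rng(1)
sigma, c, d, x0 = 1.0, -1.0, 2.0, 0.5
dt, n_paths = 1e-3, 20_000

x = np.full(n_paths, x0)
hit_c = np.zeros(n_paths, dtype=bool)
alive = np.ones(n_paths, dtype=bool)
while alive.any():
    x[alive] += sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
    hit_c |= alive & (x <= c)
    alive &= (x > c) & (x < d)

print("Monte Carlo:", hit_c.mean(), "  (d - x)/(d - c):", (d - x0) / (d - c))
```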

Example 2. (Independent Increments of the First Passage Time Process). Let {X_t} be as in Example 1, with X₀ = 0 with probability 1. Fix 0 < a < b < ∞. By Theorem 11.1 the conditional distribution of the after-τ_a process X_{τ_a}⁺ given the past up to time τ_a is P_{X_{τ_a}} = P_a, since X_{τ_a} = a on {τ_a < ∞}, and P₀(τ_a < ∞) = 1 by (11.31). Thus {X_{τ_a∧t}} and {X_{τ_a+t}} are independent stochastic processes, and the distribution of X_{τ_a}⁺ under P₀ is P_a. That is, X_{τ_a}⁺ := {(X_{τ_a}⁺)_t} has the same distribution as that of a Brownian motion starting at a.
Also, starting from 0, the state b can be reached only after a has been reached. Hence, with P₀-probability 1,

τ_b = τ_a + inf{t ≥ 0: (X_{τ_a}⁺)_t = b} = τ_a + τ_b(X_{τ_a}⁺), (11.32)

where τ_b(X_{τ_a}⁺) is the first hitting time of b by the after-τ_a process X_{τ_a}⁺. Since this last hitting time depends only on the after-τ_a process, it is independent of τ_a, which is measurable with respect to ℱ_{τ_a}. Hence τ_b − τ_a = τ_b(X_{τ_a}⁺) and τ_a are independent, and the distribution of τ_b − τ_a under P₀ is the distribution of τ_b under P_a. This last distribution is the same as the distribution of τ_{b−a} under P₀. For, if {X_t} is a Brownian motion starting at a, then {X_t − a} is a Brownian motion starting at zero; and if τ_b is the first passage time of {X_t} to b, then it is also the first passage time of {X_t − a} to b − a.
If 0 < a₁ < a₂ < ⋯ < a_n are arbitrary, then the above argument applied to τ_{a_{n−1}} shows that τ_{a_n} − τ_{a_{n−1}} is independent of {τ_{a_i}: 1 ≤ i ≤ n − 1} and its distribution (under P₀) is the same as the distribution of τ_{a_n − a_{n−1}} under P₀. We have arrived at the following proposition.

[Figure 11.1: a Brownian path and its reflection about the level a after time τ_a.]

Proposition 11.2. Under P₀, the stochastic process {τ_a: 0 ≤ a < ∞} is a process with independent increments. Moreover, the increments are homogeneous, i.e., the distribution of τ_b − τ_a is the same as that of τ_d − τ_c if d − c = b − a, both distributions being the same as the distribution of τ_{b−a} under P₀.
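Proposition 11.2 can be illustrated numerically. The sketch below (illustrative only; discretization and the finite time horizon introduce a small bias) simulates Brownian paths, records τ₁ and τ₂, and checks that the increment τ₂ − τ₁ is roughly uncorrelated with τ₁ and has roughly the same median as τ₁, whose exact value 1/Φ⁻¹(0.75)² ≈ 2.198 follows from (11.38):

```python
# Illustrative check of independent, homogeneous increments of {tau_a}.
# Paths not reaching level 2 by the horizon are dropped (small truncation bias).
import numpy as np

rng = np.random.default_rng(2)
dt, n_steps, n_paths = 2e-3, 50_000, 4_000     # horizon T = 100
x = np.zeros(n_paths)
tau1 = np.full(n_paths, np.nan)
tau2 = np.full(n_paths, np.nan)
for i in range(1, n_steps + 1):
    x += np.sqrt(dt) * rng.standard_normal(n_paths)
    t = i * dt
    tau1 = np.where(np.isnan(tau1) & (x >= 1.0), t, tau1)
    tau2 = np.where(np.isnan(tau2) & (x >= 2.0), t, tau2)

ok = ~np.isnan(tau2)
inc = tau2[ok] - tau1[ok]
print("corr(tau_1, tau_2 - tau_1):", np.corrcoef(tau1[ok], inc)[0, 1])
print("medians:", np.median(tau1[ok]), np.median(inc), " (exact ~ 2.198)")
```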

Example 3. (Distribution of the First Passage Time). Let {X_t} be a one-dimensional Brownian motion with zero drift and diffusion coefficient σ² > 0, X₀ = 0 with probability 1. Fix a > 0. Let {Y_t} be the same as {X_t} up to time τ_a, and then the reflection of {X_t} about the level a after time τ_a (see Figure 11.1 above). That is,

Y_t = X_t for t ≤ τ_a, Y_t = 2X_{τ_a} − X_t = 2a − X_t for t ≥ τ_a. (11.33)

Thus, Y_{τ_a}⁺ := {Y_{τ_a+t}} = a + (a − X_{τ_a}⁺). Now the process a − X_{τ_a}⁺ is independent of the past up to time τ_a, and has the same distribution as X_{τ_a}⁺ − a, namely, that of a Brownian motion starting at zero. Therefore, Y_{τ_a}⁺ = a + (a − X_{τ_a}⁺) is independent of the past up to time τ_a and has the same distribution as that of a + (X_{τ_a}⁺ − a) = X_{τ_a}⁺. We have arrived at the following result.

Proposition 11.3. (The Reflection Principle). The distribution of {Y_t} is the same as that of {X_t}.

Now for x ≥ a one has (see Figure 11.1 above)

P₀(X_t ≥ x) = P₀(X_t ≥ x, τ_a ≤ t) = P₀(Y_t ≤ 2a − x, τ_a(Y) ≤ t), (11.34)

where τ_a(Y) is the first passage time to a of {Y_t}. Note that τ_a(Y) = τ_a. By Proposition 11.3, the extreme right side of (11.34) equals P₀(X_t ≤ 2a − x, τ_a ≤ t). Hence,

P₀(X_t ≥ x) = P₀(X_t ≤ 2a − x, τ_a ≤ t) = P₀(X_t ≤ 2a − x, M_t ≥ a), (11.35)

where M_t = max{X_s: 0 ≤ s ≤ t}. Setting x = a in (11.35) one gets

P₀(X_t ≥ a) = P₀(X_t ≤ a, M_t ≥ a) = P₀(M_t ≥ a) − P₀(M_t ≥ a, X_t > a)
= P₀(M_t ≥ a) − P₀(X_t > a), (11.36)

since {X_t > a} ⊂ {M_t ≥ a}. Thus,

P₀(M_t ≥ a) = 2P₀(X_t ≥ a). (11.37)

Hence,

P₀(τ_a ≤ t) = P₀(M_t ≥ a) = (2/(2πσ²t)^{1/2}) ∫_a^∞ e^{−y²/(2σ²t)} dy = (2/(2π)^{1/2}) ∫_{a/(σ√t)}^∞ e^{−z²/2} dz. (11.38)

Thus, the probability density function (p.d.f.) of τ_a is given by

f_a(t) = (a/(σ(2π)^{1/2})) t^{−3/2} e^{−a²/(2σ²t)}, 0 < t < ∞. (11.39)

The p.d.f. of M_t is obtained by differentiating the first integral in (11.38) with respect to a, namely,

g_t(a) = (2/(2πσ²t)^{1/2}) e^{−a²/(2σ²t)} = (2/(σ√t)) φ(a/(σ√t)), 0 ≤ a < ∞, (11.40)

where φ is the standard normal density, φ(z) = (2π)^{−1/2} exp{−z²/2}.


Changing variables x → y = 2a − x in (11.35) yields the joint distribution of (X_t, M_t),

P₀(X_t ≤ y, M_t ≥ a) = P₀(X_t ≥ 2a − y) = ∫_{(2a−y)/(σ√t)}^∞ (2π)^{−1/2} e^{−z²/2} dz, −∞ < y ≤ a, a ≥ 0. (11.41)

Differentiating successively with respect to y and a, one gets the joint p.d.f. of (X_t, M_t) as

h_t(y, a) = (2(2a − y)/(σ³t^{3/2}(2π)^{1/2})) exp{−(2a − y)²/(2σ²t)}, −∞ < y ≤ a, 0 ≤ a < ∞. (11.42)
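The identity (11.37) is a convenient target for a quick numerical check. The sketch below uses illustrative parameters (not part of the text); the discrete-time maximum slightly underestimates M_t, so a small negative bias is expected:

```python
# Illustrative check of (11.37): P_0(M_t >= a) = 2 P_0(X_t >= a).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
sigma, t, a = 1.0, 1.0, 0.8
n_steps, n_paths = 2_000, 40_000
step = sigma * np.sqrt(t / n_steps)

x = np.zeros(n_paths)
M = np.zeros(n_paths)
for _ in range(n_steps):
    x += step * rng.standard_normal(n_paths)
    np.maximum(M, x, out=M)   # running maximum over the discretized path

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
print("P(M_t >= a) simulated:", (M >= a).mean())
print("2 P(X_t >= a) exact  :", 2.0 * (1.0 - Phi(a / (sigma * np.sqrt(t)))))
```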

12 INVARIANT DISTRIBUTIONS AND THE STRONG LAW OF LARGE NUMBERS

In this section it is shown that the existence of an invariant probability is


equivalent to positive recurrence. In addition, a computation of the invariant
density is provided.
First, an important consequence of the strong Markov property (Theorem
11.1) is the following result.

Proposition 12.1. If the diffusion is recurrent, x, y ∈ S (x < y), and f is a real-valued (measurable) function on S, then the sequence of random variables

Z_r = ∫_{η_{2r}}^{η_{2r+2}} f(X_s) ds (r = 1, 2, …) (12.1)

is i.i.d., where

η₁ = inf{t ≥ 0: X_t = x}, η_{2r} = inf{t ≥ η_{2r−1}: X_t = y},
η_{2r+1} = inf{t ≥ η_{2r}: X_t = x} (r = 1, 2, …). (12.2)

Proof. By the strong Markov property, the conditional distribution of Z_r given the past up to time η_{2r} is the same as the distribution of ∫_0^{η₂} f(X_s) ds with initial state y. This last distribution does not change with the sample point ω, and therefore Z_r is independent of the past up to time η_{2r}. In particular, Z_r is independent of Z₁, Z₂, …, Z_{r−1}. ∎

The main result may now be derived.

Theorem 12.2. Suppose that the diffusion is positive recurrent on S = (a, b).
(a) Then there exists a unique invariant distribution π(dx).
(b) For every real-valued f such that

∫_S |f(x)| π(dx) < ∞, (12.3)



the strong law of large numbers holds, i.e., with probability 1,

lim_{t→∞} (1/t) ∫_0^t f(X_s) ds = ∫_S f(x) π(dx), (12.4)

no matter what the initial distribution may be.


(c) If the end points a, b, of S are inaccessible or reflecting, then the invariant
probability has a density ic(x), which is the unique normalized integrable
solution of A*7c(x) = 0, i.e.,

1 d2
2 (a 2 (x)rc(x)) x ((x)n(x)) = 0 for x e S, (12.5)
d

or simply of

dx (U2(X)n(x)) (x)ac(x) = 0. (12.6)


2

Indeed, the invariant measure is the normalized speed measure,

ir(x) = m (x)(12.7)
m(b) m(a)

Proof. Positive recurrence implies E_y(η_{2r} − η_{2r−2}) = E_y η₂ < ∞. Hence, by the classical strong law of large numbers,

η_{2r}/r = (1/r) Σ_{r′=1}^r (η_{2r′} − η_{2(r′−1)}) → E_y η₂ (12.8)

with P_y-probability 1, as r → ∞. Let f be a bounded real-valued (measurable) function on S. Then applying the strong law to the sequence Z_r (r = 0, 1, 2, …; η₀ = 0) in (12.1) one gets

lim_{r→∞} (1/r) Σ_{r′=1}^r Z_{r′−1} = E_y Z₀ = E_y ∫_0^{η₂} f(X_s) ds. (12.9)

As in Section 9 of Chapter II (or Section 8 of Chapter IV), one has (Exercise 1)

lim_{r→∞} (1/η_{2r}) ∫_0^{η_{2r}} f(X_s) ds = lim_{t→∞} (1/t) ∫_0^t f(X_s) ds, (12.10)

for every f such that

E_y ∫_0^{η₂} |f(X_s)| ds < ∞. (12.11)

Combining (12.8)-(12.10), one gets

lim_{t→∞} (1/t) ∫_0^t f(X_s) ds = (1/E_y η₂) E_y ∫_0^{η₂} f(X_s) ds, (12.12)

for all f satisfying (12.11). In the special case f = 1_B with B a Borel subset of S, (12.12) becomes

lim_{t→∞} (1/t) ∫_0^t 1_B(X_s) ds = π(B), (12.13)

where

π(B) := (1/E_y η₂) E_y ∫_0^{η₂} 1_B(X_s) ds. (12.14)

Thus, (12.13) says that the limiting proportion of time the process spends in a
set B equals the expected amount of time it spends in B during a single cycle
relative to (i.e., divided by) the mean length of the cycle. It is simple to check from (12.14) that π is a probability measure on S (Exercise 2). Also, if f = Σᵢ a_i 1_{B_i}, where a₁, a₂, …, a_n are real numbers and B₁, B₂, …, B_n are pairwise disjoint (Borel) subsets of S, then

(1/E_y η₂) E_y ∫_0^{η₂} f(X_s) ds = Σᵢ a_i (1/E_y η₂) E_y ∫_0^{η₂} 1_{B_i}(X_s) ds = Σᵢ a_i π(B_i) = ∫_S f(x) π(dx).

The equality

(1/E_y η₂) E_y ∫_0^{η₂} f(X_s) ds = ∫_S f(x) π(dx) (12.15)

may now be extended to all f satisfying (12.11) (Exercise 2). Combining (12.12) and (12.15), one has

lim_{t→∞} (1/t) ∫_0^t f(X_s) ds = ∫_S f(x) π(dx) (12.16)

with P_y-probability 1, for all f satisfying (12.11). Taking expectations in (12.16) for bounded f one gets, by arguments used in Section 9 of Chapter II,

lim_{t→∞} (1/t) ∫_0^t E_y f(X_s) ds = ∫_S f(x) π(dx) (12.17)

for all y. Writing

(T_s f)(y) = E_y f(X_s) = ∫_S f(z) p(s; y, z) dz, (12.18)

one may express (12.17) as

lim_{t→∞} (1/t) ∫_0^t (T_s f)(y) ds = ∫_S f(x) π(dx) (12.19)

for all y ∈ S. But the left side also equals, for any given h > 0,
lim_{t→∞} (1/t) ∫_h^{t+h} (T_s f)(y) ds = lim_{t→∞} (1/t) ∫_0^t (T_{s+h} f)(y) ds
= lim_{t→∞} (1/t) ∫_0^t (T_s(T_h f))(y) ds = ∫_S (T_h f)(x) π(dx). (12.20)

The last equality follows from (12.19) applied to the function T_h f. It follows from (12.19) and (12.20) that, at least for all bounded (measurable) f, one has

∫_S (T_h f)(x) π(dx) = ∫_S f(x) π(dx) (h > 0). (12.21)

Specializing this to f = 1_B one has (T_h f)(x) = p(h; x, B), so that (12.21) yields

P_π(X_h ∈ B) = E_π P(X_h ∈ B | X₀) = ∫_S p(h; x, B) π(dx) = ∫_S 1_B(x) π(dx) = π(B) = P_π(X₀ ∈ B), (12.22)

proving that π is an invariant probability. The proof of parts (a), (b) of Theorem 12.2 is now complete, except for uniqueness. Uniqueness may be proved in the same manner as in Section 6 of Chapter II (Exercise 3).
In order to prove (c), first note that π(x) given by (12.7) is a probability density function (p.d.f.) which satisfies (12.6) and, therefore, (12.5). To prove its invariance one needs to check (see (12.14)), for π given by (12.7),

(1/E_y η₂) E_y ∫_0^{η₂} f(X_s) ds = ∫_S f(z) π(z) dz = (1/(m(b) − m(a))) ∫_S f(z) dm(z), (12.23)

where f is an arbitrary bounded measurable function on S, and a, b are the end points of S. But, as in the proofs of (10.17) and (10.18) (Exercise 4),

E_x ∫_0^{τ_y} f(X_s) ds = ∫_x^y (∫_a^z f(u) dm(u)) ds(z),
E_y ∫_0^{τ_x} f(X_s) ds = ∫_x^y (∫_z^b f(u) dm(u)) ds(z) (x < y). (12.24)

Hence, by the strong Markov property,

E_y ∫_0^{η₂} f(X_s) ds = E_y ∫_0^{τ_x} f(X_s) ds + E_x ∫_0^{τ_y} f(X_s) ds
= ∫_x^y (∫_a^b f(u) dm(u)) ds(z) = s(x; y) ∫_a^b f(u) dm(u). (12.25)

In particular, taking f ≡ 1,

E_y η₂ = s(x; y)(m(b) − m(a)). (12.26)

Dividing (12.25) by (12.26), one arrives at (12.23). ∎

The following alternative argument is instructive, and may be justified under appropriate assumptions on the transition probability density p(t; x, y) (Exercise 5). By the backward equation (2.15) and integration by parts one has, for the function π(x) given by (12.7),

(∂/∂t) ∫_S p(t; x, y)π(x) dx = ∫_S (∂p(t; x, y)/∂t) π(x) dx = ∫_S (A_x p(t; x, y))π(x) dx
= ∫_S p(t; x, y)(A*π)(x) dx = 0 (t > 0). (12.27)

Since ∫_S p(t; x, y)π(x) dx is the p.d.f. of X_t when the initial p.d.f. is π(x), it follows from (12.27) that the distribution of X_t does not change for t > 0; but X_t → X₀ a.s. as t ↓ 0, so that X_t converges in distribution to π(x) dx. Therefore, X_t has p.d.f. π(x) for every t > 0.
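For the Ornstein-Uhlenbeck diffusion with μ(x) = −γx and constant σ², the normalized speed density (12.7) is the Gaussian density with mean 0 and variance σ²/(2γ), so by (12.4) the time averages of X_s and X_s² should converge to 0 and σ²/(2γ). The following sketch (an illustration with assumed parameters, not from the text) checks this with an Euler scheme:

```python
# Illustrative Euler-scheme check of the SLLN (12.4) and of (12.7) for the
# OU diffusion: pi is N(0, sigma^2/(2*gamma)).
import numpy as np

rng = np.random.default_rng(4)
gamma, sigma = 1.0, 1.0
dt, n_steps = 1e-2, 500_000

x = 0.0
sum1 = sum2 = 0.0
for _ in range(n_steps):
    x += -gamma * x * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    sum1 += x * dt
    sum2 += x * x * dt
T = n_steps * dt
print("time averages:", sum1 / T, sum2 / T,
      "  (pi-mean 0, pi-variance", sigma**2 / (2 * gamma), ")")
```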

Example 1. (Price Adjustment in a Two-Commodity Model). The excess demand function z(x) = (z₁(x), z₂(x)) of two commodities is a function of prices

x = (x₁, x₂) ∈ Δ := {(x₁, x₂): x₁ > 0, x₂ > 0, x₁ + x₂ = 1}

such that Walras' law holds, namely,

x₁z₁(x) + x₂z₂(x) = 0, x ∈ Δ. (12.28)

In view of (12.28) one may concentrate on the behavior of just one excess demand, say z₁(x). Price adjustments in a market usually depend on excess demands. Since x₂ = 1 − x₁, one may consider the price X₁(t) of the first commodity as a diffusion on (0, 1) with a drift function determined by the excess demand z₁ at this price. For example, take the generator of such a diffusion to be of the form

A = ½σ²(x) d²/dx² + z₁(x) d/dx (0 < x < 1), (12.29)

so that the mean rate of change in price is proportional to the excess demand. The drift function z₁(x) satisfies

lim_{x↓0} z₁(x) = ∞, lim_{x↑1} z₁(x) = −∞. (12.30)

This says that if the price of commodity 1 is very low (high), its demand is very high (low), so the price will tend to go up (down) fast at the mean rate z₁. Also, assume that there is a constant σ₀ > 0 such that

σ²(x) ≥ σ₀², 0 < x < 1. (12.31)

Then the boundaries 0 and 1 become inaccessible. Indeed, it is easy to check in this case that s(0) = −∞, s(1) = ∞, and that the diffusion on S = (0, 1) is positive recurrent and, therefore, 0 and 1 cannot be reached from S (Exercise 6). There is, therefore, a unique steady-state or equilibrium distribution π(dx) = π(x) dx, and, no matter where the system starts, the distribution of X₁(t) will approach this steady-state distribution as t → ∞ (see theoretical complement 1).

Example 2. (Stochastic Changes in the Size of an Industry). By the "size" of


a competitive industry is meant its productive capacity. The profit level f(y)
for a given size y is the rate of additional return (profit) per unit of increase in
the industry size from its present size y. The law of diminishing return requires
that f'(y) <0 for all y. There is a "normal" profit level p. When the current
level f falls below p, firms tend to drop out or reduce their individual capacities.
If the current profit level f is higher than p, then the productive capacity of


the industry is increased owing either to the appearance of new firms or to


expansions in existing firms. In a deterministic dynamic model one might consider the industry size y_t to be governed by an equation of the form

dy_t/dt = g(f(y_t) − p), (12.32)

where g is assumed to be sign preserving. That is,

g(u) > 0 if u > 0, g(u) < 0 if u < 0. (12.33)

The deterministic equilibrium is the value y* such that f(y*) = p. We assume f(0) > p > f(∞). In a stochastic dynamic model one may take the industry size Y_t at time t to be a diffusion on S = (0, ∞) with drift

μ(y) = g(f(y) − p), 0 < y < ∞, (12.34)

and a diffusion coefficient σ²(y) that represents the (time) rate of local fluctuation (variance) when the process is at y. If one assumes

σ²(y) ≥ σ₀ > 0, lim_{y→∞} g(f(y) − p) < 0, lim_{y↓0} g(f(y) − p) > 0, (12.35)

then the process {Y_t} is positive recurrent (and, in particular, 0 and ∞ are inaccessible) (Exercise 7). The strict sign-preserving property (12.33) is not needed for this. The stochastic steady-state distribution is approached as time increases, no matter how the process starts. This distribution is sometimes referred to as the stochastic equilibrium.

13 THE CENTRAL LIMIT THEOREM FOR DIFFUSIONS

Consider a positive recurrent diffusion on an interval S. The following theorem may be proved in much the same way as described in Section 10 of Chapter II (Exercise 1). Let y ∈ S, and define η₂ as in (12.2).

Theorem 13.1. Let f be a real-valued (measurable) function on the state space S of a positive recurrent diffusion, satisfying

∫_S f²(x)π(x) dx < ∞, δ² := E_y(∫_0^{η₂} (f(X_s) − E_π f) ds)² < ∞, (13.1)

where π(dx) = π(x) dx is the invariant probability. Then as t → ∞,

(1/√t) ∫_0^t (f(X_s) − E_π f) ds (13.2)

converges in distribution to a Gaussian law with mean zero and variance δ²/(E_y η₂), whatever the initial distribution.

In order to use Theorem 13.1, one must be able to express the variance
parameter in terms of the drift and diffusion coefficients. A derivation of such
an expression will be given now.
Assume E_π f = 0 (i.e., replace f by f − E_π f). Under the initial distribution π the variance of (13.2) is given by

Var_π((1/√t) ∫_0^t f(X_s) ds) = (1/t) E_π(∫_0^t f(X_s) ds)² = (1/t) E_π ∫_0^t ∫_0^t f(X_{s′})f(X_s) ds′ ds
= (2/t) E_π ∫_0^t ∫_0^s f(X_{s′})f(X_s) ds′ ds = (2/t) ∫_0^t ∫_0^s E_π(f(X_{s′})f(X_s)) ds′ ds
= (2/t) ∫_0^t ∫_0^s E_π[f(X_{s′})E(f(X_s) | X_{s′})] ds′ ds
= (2/t) ∫_0^t ∫_0^s E_π(f(X_{s′})(T_{s−s′} f)(X_{s′})) ds′ ds. (13.3)

Now

E_π(f(X_{s′})(T_{s−s′} f)(X_{s′})) = ∫_S f(x)(T_{s−s′} f)(x)π(x) dx. (13.4)

Therefore, (13.3) may be expressed as

Var_π((1/√t) ∫_0^t f(X_s) ds) = (2/t) ∫_S (∫_0^t ∫_0^s (T_{s−s′} f)(x) ds′ ds) f(x)π(x) dx
= (2/t) ∫_S [∫_0^t (∫_0^s (T_u f)(x) du) ds] f(x)π(x) dx. (13.5)

Assume that the limit

h(x) = lim_{s→∞} ∫_0^s (T_u f)(x) du (13.6)

exists in L²(S, π) (see remark following proof of Theorem T.13.4, page 515).


Then

lim_{t→∞} (1/t) ∫_0^t (∫_0^s (T_u f)(x) du) ds = h(x), (13.7)

and one has, using (13.5) and (13.7),

lim_{t→∞} Var_π((1/√t) ∫_0^t f(X_s) ds) = 2 ∫_S h(x)f(x)π(x) dx. (13.8)

Let us show that the function h satisfies the equation

Ah(x) = −f(x), (13.9)

where A is the infinitesimal generator of the diffusion, i.e.,

A = ½σ²(x) d²/dx² + μ(x) d/dx, (13.10)

together with boundary conditions if S has a boundary. For this, note that, by (13.6), for ε > 0,
T_ε h(x) = lim_{s→∞} ∫_0^s (T_ε T_u f)(x) du = lim_{s→∞} ∫_0^s (T_{ε+u} f)(x) du
= lim_{s→∞} ∫_ε^{s+ε} (T_v f)(x) dv = lim_{s→∞} ∫_ε^s (T_v f)(x) dv
= lim_{s→∞} ∫_0^s (T_v f)(x) dv − ∫_0^ε (T_v f)(x) dv = h(x) − ∫_0^ε (T_v f)(x) dv. (13.11)

Hence,

Ah(x) = lim_{ε↓0} ((T_ε h)(x) − h(x))/ε = −lim_{ε↓0} (1/ε) ∫_0^ε (T_v f)(x) dv = −(T₀ f)(x) = −f(x). (13.12)

These calculations provide the following result.

Proposition 13.2. The variance of the limiting Gaussian distribution in Theorem 13.1 is given by

2⟨f, h⟩ := 2 ∫_S f(x)h(x)π(x) dx, (13.13)

where h is a solution to (13.9) in L²(S, π).



It may be shown that if there exists a function h in L²(S, π) satisfying (13.9) for a given f in L²(S, π) with E_π f = 0, then such an h is unique up to the addition of a constant (see theoretical complement 4). The condition E_π f = 0 ensures that in this case the expression (13.13) does not change if a constant is added to h. Also note that, in order that (13.9) may admit a solution h, one must have E_π f = 0. This is seen by integrating both sides of (13.9) with respect to π(x) dx and reducing the first integral by integration by parts. That is,

∫_S Ah(x)π(x) dx = ∫_S h(x)(A*π(x)) dx = ∫_S h(x)[½(σ²(x)π(x))″ − (μ(x)π(x))′] dx = 0, (13.14)

using (12.5).
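As a concrete illustration of Proposition 13.2 (a sketch under assumed coefficients, not from the text): for the Ornstein-Uhlenbeck diffusion with drift −γx and constant σ², and f(x) = x (so E_π f = 0), equation (13.9), ½σ²h″ − γxh′ = −x, is solved by h(x) = x/γ, and (13.13) gives the asymptotic variance 2⟨f, h⟩ = 2Var_π(X)/γ = σ²/γ². The simulation below compares this with the empirical variance of t^{−1/2}∫₀ᵗ X_s ds:

```python
# Illustrative check of Proposition 13.2 for the OU diffusion with f(x) = x:
# h(x) = x/gamma solves (13.9), so the limit variance is sigma^2/gamma^2.
import numpy as np

rng = np.random.default_rng(5)
gamma, sigma = 1.0, 1.0
dt, t_final, n_paths = 1e-2, 200.0, 2_000
n_steps = int(t_final / dt)

x = np.zeros(n_paths)
integral = np.zeros(n_paths)
for _ in range(n_steps):
    integral += x * dt
    x += -gamma * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

print("Var(t^{-1/2} int_0^t X_s ds):", integral.var() / t_final)
print("2<f, h> = sigma^2/gamma^2   :", sigma**2 / gamma**2)
```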

14 INTRODUCTION TO MULTIDIMENSIONAL BROWNIAN MOTION AND DIFFUSIONS

A k-dimensional standard Brownian motion with initial state x = (x^{(1)}, x^{(2)}, …, x^{(k)}) is the process {B_t = (B_t^{(1)}, …, B_t^{(k)}): t ≥ 0}, where {B_t^{(i)}}, 1 ≤ i ≤ k, are k independent one-dimensional Brownian motions starting at x^{(i)} (1 ≤ i ≤ k). It is easily seen to be Markovian, and the conditional density of B_{t+s} given {B_u: 0 ≤ u ≤ s} is a Gaussian (k-dimensional) density with mean vector B_s and variance-covariance matrix tI, where I is the (k × k) identity matrix. The transition density function is

p(t; x, y) = (2πt)^{−k/2} exp{−(1/2t) Σ_{i=1}^k (y^{(i)} − x^{(i)})²} = Π_{i=1}^k (2πt)^{−1/2} exp{−(y^{(i)} − x^{(i)})²/2t},

where

x = (x^{(1)}, …, x^{(k)}), y = (y^{(1)}, …, y^{(k)}). (14.1)

As in the one-dimensional case, the Markov property also follows from the fact that {B_t: t ≥ 0} is a (vector-valued) process with independent increments. It is straightforward to check that p satisfies the backward equation

∂p/∂t = ½ Σ_{i=1}^k ∂²p/(∂x^{(i)})², (14.2)

as well as the forward equation

∂p/∂t = ½ Σ_{i=1}^k ∂²p/(∂y^{(i)})². (14.3)

A Brownian motion {X_t = (X_t^{(1)}, …, X_t^{(k)})} with drift vector μ = (μ^{(1)}, …, μ^{(k)}) and diffusion matrix D = ((d_{ij})) is defined by

X_t = x₀ + tμ + σB_t, (14.4)

where X₀ = x₀ = (x₀^{(1)}, …, x₀^{(k)}) is the initial state, σ is a k × k matrix satisfying σσ′ = D, and B_t = (B_t^{(1)}, …, B_t^{(k)}) is a standard Brownian motion with initial state "zero" (vector). For each t > 0, σB_t is a nonsingular linear transformation of a Gaussian vector B_t whose mean (vector) is zero and variance-covariance matrix is tI. Therefore, σB_t is a Gaussian random vector with mean vector zero and variance-covariance matrix σ(tI)σ′ = tσσ′ = tD = ((td_{ij})). Therefore, X_t is Gaussian with mean x₀ + tμ and variance-covariance matrix tD. Since X_{t+s} − X_t = sμ + σ(B_{t+s} − B_t), {X_t} is a process with independent increments (and is, therefore, Markovian) having the transition density function

p(t; x, y) = ((2πt)^{1/2})^{−k}(det D)^{−1/2} exp{−(1/2t) Σ_{i=1}^k Σ_{j=1}^k d^{ij}(y^{(i)} − x^{(i)} − tμ^{(i)})(y^{(j)} − x^{(j)} − tμ^{(j)})} (x, y ∈ ℝᵏ; t > 0). (14.5)

Here ((d^{ij})) = D⁻¹. This transition probability density satisfies the backward and forward equations (Exercise 3)

∂p/∂t = ½ Σ_{i,j=1}^k d_{ij} ∂²p/∂x^{(i)}∂x^{(j)} + Σ_{j=1}^k μ^{(j)} ∂p/∂x^{(j)},
∂p/∂t = ½ Σ_{i,j=1}^k d_{ij} ∂²p/∂y^{(i)}∂y^{(j)} − Σ_{j=1}^k μ^{(j)} ∂p/∂y^{(j)}. (14.6)
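Sampling from (14.4)-(14.5) reduces to one matrix factorization: any σ with σσ′ = D will do, e.g., the Cholesky factor. A minimal sketch with illustrative parameters (not part of the text):

```python
# Sampling X_t = x0 + t*mu + sigma*B_t, with sigma the Cholesky factor of D.
# The sample mean and covariance should match x0 + t*mu and t*D.
import numpy as np

rng = np.random.default_rng(6)
t = 2.0
x0 = np.array([0.0, 1.0])
mu = np.array([0.5, -0.25])
D = np.array([[1.0, 0.3], [0.3, 0.5]])
sig = np.linalg.cholesky(D)                          # sigma sigma' = D

B = np.sqrt(t) * rng.standard_normal((100_000, 2))   # B_t ~ N(0, t I)
X = x0 + t * mu + B @ sig.T

print("mean:", X.mean(axis=0), " expected:", x0 + t * mu)
print("cov :\n", np.cov(X.T), "\n expected:\n", t * D)
```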

More generally, we have the following.

Definition 14.1. A k-dimensional diffusion with (nonconstant) drift vector μ(x) = (μ^{(1)}(x), …, μ^{(k)}(x)) and (nonconstant) diffusion matrix D(x) = ((d_{ij}(x))) is a Markov process whose transition density function p(t; x, y) satisfies the Kolmogorov equations

∂p/∂t = ½ Σ_{i,j=1}^k d_{ij}(x) ∂²p/∂x^{(i)}∂x^{(j)} + Σ_{i=1}^k μ^{(i)}(x) ∂p/∂x^{(i)},
∂p/∂t = ½ Σ_{i,j=1}^k ∂²(d_{ij}(y)p)/∂y^{(i)}∂y^{(j)} − Σ_{i=1}^k ∂(μ^{(i)}(y)p)/∂y^{(i)}. (14.7)

Such a p may be shown to exist and be unique under the assumptions:

1. ((d_{ij}(y))) is, for each y, a positive definite matrix.
2. d_{ij}(y) is, for each pair (i, j), twice continuously differentiable.
3. μ^{(i)}(y) is, for each i, continuously differentiable.
4. |μ^{(i)}(y)| does not go to infinity faster than |y|, and |d_{ij}(y)| does not go to infinity faster than |y|².

An alternative definition of a multidimensional diffusion may be given by requiring that the appropriate analog of (1.2)′ holds (Exercise 2). The state space may be taken to be a suitably "regular" open subset of ℝᵏ, e.g., a rectangle, a ball, ℝᵏ \ {0}, etc.
Once p is given, the joint distribution of the stochastic process at any finite set of time points is assigned in the usual manner; e.g., given X₀ = x₀,

p(t₁; x₀, x₁)p(t₂ − t₁; x₁, x₂) ⋯ p(t_n − t_{n−1}; x_{n−1}, x_n)

is the joint density of (X_{t₁}, X_{t₂}, …, X_{t_n}), where 0 < t₁ < t₂ < ⋯ < t_n. The sample paths may also be taken to be continuous (vector-valued) functions of t.
An alternative method of constructing the diffusion is to solve Itô's stochastic differential equations:

dX_t = μ(X_t) dt + σ(X_t) dB_t, X₀ = x, (14.8)

where {B_t: t ≥ 0} is a standard Brownian motion. The equations (14.8) (there are k equations corresponding to the k components of the vector X_t) may be roughly interpreted as follows: given {X_u: 0 ≤ u ≤ t}, the conditional distribution of the increment dX_t = X_{t+dt} − X_t is approximately Gaussian with mean μ(X_t) dt and variance-covariance matrix D(X_t) dt. We will study the Itô calculus in detail in Chapter VII.
The operator

A := ½ Σ_{i,j=1}^k d_{ij}(x) ∂²/∂x^{(i)}∂x^{(j)} + Σ_{i=1}^k μ^{(i)}(x) ∂/∂x^{(i)} (14.9)

is called the (infinitesimal) generator of the diffusion on ℝᵏ with drift μ(·) and diffusion matrix D(·).

It is sometimes possible to reduce the state space by transformation from multidimensional to one-dimensional while still preserving the Markov property. Such reductions often facilitate computation of certain probabilities. Of course, if φ is a one-to-one measurable transformation then φ(x) may be thought of only as a relabeling of x, and the Markov property is preserved, since knowing φ(X_u), 0 ≤ u ≤ s, is equivalent to knowing X_u, 0 ≤ u ≤ s. In particular, if φ is a one-to-one twice continuously differentiable map of ℝᵏ onto an open subset S of ℝᵏ, then the Markov process {Y_t := φ(X_t)} is also a diffusion whose drift and diffusion coefficients may be expressed in terms of those of {X_t} in a manner analogous to that in Proposition 3.2 (Exercise 2). The Itô calculus of Chapter VII provides another method for calculating the coefficients of the transformed diffusion (Exercise 3).
We are here interested, however, in transformations that are not one-to-one and, in fact, reduce the dimension of the state space.

Example 1. (The Bessel Process or the Radial Brownian Motion). Let {B_t} be a k-dimensional standard Brownian motion, k > 1. Let

φ(x) = |x| = (Σ_{i=1}^k (x^{(i)})²)^{1/2}, R_t := φ(B_t) = |B_t|.

Let us show that {R_t} is a Markov process on the state space S̄ = [0, ∞). Let f be an arbitrary real-valued, bounded measurable function on S̄. Write g(x) = (f ∘ φ)(x) = f(|x|). Using the Markov property of {B_t} one has

E[f(R_{t+s}) | {R_u: 0 ≤ u ≤ s}] = E[E(f(R_{t+s}) | {B_u: 0 ≤ u ≤ s}) | {R_u: 0 ≤ u ≤ s}]
= E[E(g(B_{t+s}) | {B_u: 0 ≤ u ≤ s}) | {R_u: 0 ≤ u ≤ s}]
= E[(T_t g)(B_s) | {R_u: 0 ≤ u ≤ s}], (14.10)

where {T_t} is the semigroup of transition operators associated with {B_t}, i.e.,

(T_t g)(x) = ∫_{ℝᵏ} g(y)p(t; x, y) dy = ∫_{ℝᵏ} f(|y|)(2πt)^{−k/2} exp{−|y − x|²/2t} dy. (14.11)

To reduce the last integral, change to polar coordinates. That is, first integrate over the surface of the sphere {|y| = r} with respect to the normalized surface area measure dσ_r, and then integrate out r after multiplying the integrand by the surface area ω_k r^{k−1} of {|y| = r}. Since

|y − x|² = |y|² + |x|² − 2 Σ_{i=1}^k x^{(i)}y^{(i)},

∫_{{|y|=r}} exp{−|y − x|²/2t} dσ_r = exp{−(|x|² + r²)/2t} ∫_{{|z|=1}} exp{(r|x|/t) Σ_{i=1}^k w^{(i)}z^{(i)}} dσ₁(z), (14.12)

where w = x/|x|, z = y/|y|. Note that Σ_{i=1}^k w^{(i)}z^{(i)} is the cosine of the angle between the unit vectors w and z, and its distribution under dσ₁ does not depend on the particular unit vector w. In particular, one may replace w by (1, 0, …, 0). Hence,

∫_{{|y|=r}} exp{−|y − x|²/2t} dσ_r = exp{−(|x|² + r²)/2t} ∫_{{|z|=1}} exp{(r|x|/t)z^{(1)}} dσ₁ = exp{−(|x|² + r²)/2t} h(r|x|/t), (14.13)

where (Exercise 9)

h(v) := ∫_{{|z|=1}} e^{vz^{(1)}} dσ₁ = ∫_{−1}^1 e^{vu}(1 − u²)^{(k−3)/2} du / ∫_{−1}^1 (1 − u²)^{(k−3)/2} du. (14.14)
Thus,

(T_t g)(x) = (2πt)^{−k/2} ω_k e^{−|x|²/2t} ∫_0^∞ f(r) e^{−r²/2t} h(r|x|/t) r^{k−1} dr = ψ(t, |x|), (14.15)

say. Using (14.15) in (14.10) one gets the desired Markov property of {R_t},

E[f(R_{t+s}) | {R_u: 0 ≤ u ≤ s}] = E[ψ(t, |B_s|) | {R_u: 0 ≤ u ≤ s}] = ψ(t, R_s). (14.16)

Further, it follows from (14.15) that the transition probability density function of {R_t} is

q(t; r, r′) = (2πt)^{−k/2} ω_k exp{−(r² + r′²)/2t} h(rr′/t) r′^{k−1}. (14.17)

Write the semigroup of transition operators for {R_t} as {T̄_t}. Then

(T̄_t f)(r) = (T_t g)(x)|_{|x|=r} = ψ(t, r),

∂(T̄_t f)(r)/∂t = ∂ψ(t, r)/∂t = ∂(T_t g)(x)/∂t = ½ Σ_{i=1}^k ∂²(T_t g)(x)/(∂x^{(i)})²
= ½ Σ_{i=1}^k [∂²ψ(t, |x|)/(∂x^{(i)})²]_{|x|=r} = ½ Σ_{i=1}^k [∂/∂x^{(i)} ((∂ψ(t, r)/∂r)(x^{(i)}/|x|))]_{|x|=r}
= ½ Σ_{i=1}^k [(∂²ψ(t, r)/∂r²)((x^{(i)})²/r²) + (∂ψ(t, r)/∂r)(1/r − (x^{(i)})²/r³)]_{|x|=r}
= ½ ∂²ψ(t, r)/∂r² + ((k − 1)/2r) ∂ψ(t, r)/∂r
= ½ ∂²(T̄_t f)(r)/∂r² + ((k − 1)/2r) ∂(T̄_t f)(r)/∂r. (14.18)

Thus, the transformed Markov process has as its infinitesimal generator the Bessel operator

Ā = ½ d²/dr² + ((k − 1)/2r) d/dr. (14.19)

It will be shown later that the state space of {R_t} may be restricted to (0, ∞), so that (k − 1)/2r in (14.19) is defined.
It may be noted that not only is {R_t} Markovian, its distribution under P_x is Q_{φ(x)}, where P_x is the distribution of the Brownian motion {X_t} when X₀ = x, and Q_r that of {R_t} when R₀ = r. This is easily checked by looking at the joint distribution, under P_x, of (R_{t₁}, …, R_{t_n}) by successive conditioning with respect to {X_u: 0 ≤ u ≤ t_i}, i = n − 1, n − 2, …, 1 (0 < t₁ < t₂ < ⋯ < t_n).

Let us make use of the above reduction to calculate the probability (for given 0 < c < d < ∞)

P_x({X_t} reaches ∂B(a:c) before ∂B(a:d)), (14.20)

for c < |x − a| < d. Here B(a:r) = {z ∈ ℝᵏ: |z − a| < r} is the k-dimensional ball of radius r and center a, and ∂B(a:r) = {|z − a| = r} is its boundary (surface). By translation, (14.20) is the same as

P_{x−a}({X_t} reaches ∂B(0:c) before ∂B(0:d)), (14.21)

for c < |x − a| < d.


In turn, (14.21) equals

P_{x−a}({R_t} reaches c before d) = Q_{|x−a|}({Y_t} reaches c before d), (14.22)

where {Y_t} is the canonical one-dimensional diffusion (i.e., the coordinate process) on S = [0, ∞), having the same distribution as {R_t}. From (2.24) (or (4.17), or (9.19)), the last probability is given by

ψ(|x − a|) = ∫_{|x−a|}^d e^{−I(c,r)} dr / ∫_c^d e^{−I(c,r)} dr, (14.23)

where, with μ(y) = (k − 1)/2y, σ²(y) = 1 (see (14.19)),

I(c, r) = ∫_c^r (2μ(y)/σ²(y)) dy = ∫_c^r ((k − 1)/y) dy = log(r/c)^{k−1}, (14.24)

so that

ψ(|x − a|) = (log d − log|x − a|)/(log d − log c) if k = 2,
ψ(|x − a|) = (|x − a|^{−(k−2)} − d^{−(k−2)})/(c^{−(k−2)} − d^{−(k−2)}) if k > 2, (14.25)

for c < |x − a| < d.
Further, letting d ↑ ∞ in (14.25) (and (14.21)), one arrives at

P_x({X_t} ever reaches B̄(a:c)) = 1 if k = 2, = (c/|x − a|)^{k−2} if k > 2 (c < |x − a|). (14.26)

In other words, a two-dimensional Brownian motion is recurrent, while higher-dimensional Brownian motions are transient (Exercises 5 and 6). For k = 2, given any ball, {X_t} will reach it (with probability 1) no matter where it starts. For k > 2, there is a positive probability that {X_t} will never reach a ball if it starts from a point outside. Recall that an analogous result is true for multidimensional simple symmetric random walks (see Section 5 of Chapter I).
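Formula (14.25) is easy to probe by simulation. The rough sketch below (illustrative parameters and Euler steps, not part of the text) estimates, for k = 3, the probability of reaching {|y| = c} before {|y| = d} from |x| = 2 and compares it with (|x|⁻¹ − d⁻¹)/(c⁻¹ − d⁻¹):

```python
# Illustrative Monte Carlo for (14.25) with k = 3.
import numpy as np

rng = np.random.default_rng(7)
c, d, r0, dt, n_paths = 1.0, 8.0, 2.0, 1e-3, 5_000

x = np.zeros((n_paths, 3)); x[:, 0] = r0
hit_inner = np.zeros(n_paths, dtype=bool)
alive = np.ones(n_paths, dtype=bool)
while alive.any():
    x[alive] += np.sqrt(dt) * rng.standard_normal((alive.sum(), 3))
    r = np.linalg.norm(x, axis=1)
    hit_inner |= alive & (r <= c)
    alive &= (r > c) & (r < d)

print("Monte Carlo:", hit_inner.mean(),
      "  formula:", (1/r0 - 1/d) / (1/c - 1/d))
```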
In one respect, however, multidimensional random walks on integer lattices differ from multidimensional Brownian motions. In the former case, in view of the countability of the state space, the random walk reaches every state with positive probability. This is not true for multidimensional Brownian motions. Indeed, by letting c ↓ 0 in (14.25) one gets, for 0 < |x − a| < d,

P_x({X_t} reaches a before ∂B(a:d)) = 0. (14.27)

Letting d ↑ ∞ in (14.27) it follows that

P_x({X_t} ever reaches a) = 0 (0 < |x − a|). (14.28)

In particular, taking a = 0, one gets

Q_r({Y_t} ever reaches 0) = 0 (0 < r < ∞). (14.29)

In view of (14.29), zero is said to be an inaccessible boundary for the Bessel process. The state space of the Bessel process may, therefore, be restricted to S = (0, ∞).

Observe that the Markov property of {φ(X_t) := |X_t| = R_t} in Example 1 follows from the fact that T_t transforms (bounded measurable) functions of φ(x) into functions of φ(x). This is a general property, as may be seen from the steps (14.10), (14.11), (14.15), and (14.16). In most cases, however, it is difficult, if not impossible, to calculate T_t g, since the transition probability cannot be computed explicitly. A simple check for the process {φ(X_t)} to be Markov is furnished by the following.

Proposition 14.1
(a) If A transforms functions of φ(x) into functions of φ(x), then {φ(X_t)} is a Markov process.
(b) If P_x denotes the distribution of {X_t} when X₀ = x and Q_u that of {φ(X_t)} when φ(X₀) = u, then the P_x-distribution of {φ(X_t)} is Q_{φ(x)}.

This statement needs some qualification, since not all bounded measurable functions are in the domain (of definition) of A (see theoretical complement 1).

15 MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS AND CRITERIA FOR TRANSIENCE AND RECURRENCE

Consider a diffusion {X_t} on ℝᵏ with drift coefficients μ^{(i)}(·) and diffusion coefficients d_{ij}(·) as described in Section 14. Let G be a proper open subset of ℝᵏ with a "smooth" boundary ∂G (see theoretical complement 1), e.g., G = Hᵏ := {x ∈ ℝᵏ: x^{(1)} > 0}, G = B(0:a) := {x ∈ ℝᵏ: |x| < a}. Note that ∂Hᵏ = {x ∈ ℝᵏ: x^{(1)} = 0}, ∂B(0:a) = {x ∈ ℝᵏ: |x| = a}. One may "restrict" the diffusion

to Ḡ := G ∪ ∂G in various ways, by prescribing what the diffusion must do on reaching the boundary ∂G starting from the interior of G, consistent with the Markov property and the continuity of sample paths. The two most common prescriptions are (i) absorption and (ii) reflection. This section is devoted to absorption.
In the case of absorption the new process {X̄_t} is defined in terms of the original unrestricted process {X_t} (which is assumed to start in S = G ∪ ∂G) by

X̄_t = X_t if t < τ_{∂G}, X̄_t = X_{τ_{∂G}} if t ≥ τ_{∂G}, (15.1)

where τ_{∂G} is the first passage time to ∂G, also called the first hitting time of ∂G, defined by

τ_{∂G} = inf{t ≥ 0: X_t ∈ ∂G}. (15.2)

As always, τ_{∂G} = ∞ if {X_t} never hits ∂G.

Definition 15.1. The process {X̄_t} defined by (15.1) is called a diffusion on Ḡ = G ∪ ∂G with absorption on ∂G.

The Markov property of {X̄_t} follows from calculations that are almost the same as those carried out for Theorem 7.1 (Exercise 1). Its transition probability is given by

p̄(t; x, B) = P(X̄_t ∈ B | X̄₀ = x)
= P(X_t ∈ B, τ_{∂G} > t | X₀ = x) if B ⊂ G,
= P(τ_{∂G} ≤ t, X_{τ_{∂G}} ∈ B | X₀ = x) if B ⊂ ∂G. (15.3)

Now {X_t} has a transition probability density, say p(t; x, y), and

P(X_t ∈ B, τ_{∂G} > t | X₀ = x) ≤ P(X_t ∈ B | X₀ = x) = ∫_B p(t; x, y) dy = 0, (15.4)

if B ⊂ G has Lebesgue measure zero. Hence, the measure B → p̄(t; x, B) on the Borel sigmafield of G has a density, say p⁰(t; x, y) (which vanishes outside G), so that (15.3) may be expressed as

p̄(t; x, B) = ∫_B p⁰(t; x, y) dy, B ⊂ G, x ∈ G,
p̄(t; x, B) = P_x(τ_{∂G} ≤ t, X_{τ_{∂G}} ∈ B), B ⊂ ∂G, x ∈ G, (15.5)


where P_x denotes the distribution of the unrestricted process starting at x. If x ∈ ∂G, then of course p̄(t; x, B) = 1_B(x) for all t > 0.

Example 1. (Brownian Motion on a Half-Space with Absorption). Let μ^{(i)}(·) ≡ μ_i, d_{ij}(·) ≡ σ_i²δ_{ij}, where the μ_i and σ_i > 0 are constants. Let {X_t = (X_t^{(1)}, …, X_t^{(k)})} be an unrestricted Brownian motion with these coefficients, starting at some x ∈ H̄ᵏ.
First consider the case k = 1. For this case the state space is S = [0, ∞), whose boundary is {0}. Example 8.4 provides the density p⁰ = p₁⁰, say, given by

p₁⁰(t; u, v) = exp{μ₁(v − u)/σ₁² − μ₁²t/2σ₁²}(2πσ₁²t)^{−1/2} × [exp{−(v − u)²/2σ₁²t} − exp{−(v + u)²/2σ₁²t}] (15.6)

for t > 0, u > 0, v > 0.


As a special case for l = 0 one has

P(t; u, v) = P1(t; u, v) P,(t; u, v) (15.7)

for t > 0, u > 0, v > 0, where p l is the transition probability density of an


unrestricted one-dimensional Brownian motion with drift zero and diffusion
coefficient ai.
Also, for the case k = 1 one has ∂G = {0}, so that X_{τ_{∂G}} = 0 and

P(τ_{∂G} ≤ t and X_{τ_{∂G}} = 0) = P(τ_{∂G} ≤ t), (15.8)

which is the distribution of the first passage time to zero of a one-dimensional Brownian motion starting at u. By relation (10.15) in Section 10 of Chapter I, we have

P_u(τ_{∂G} ≤ t) = ∫_0^t f_{σ₁,μ₁}(s; u) ds, (15.9)

where

f_{σ₁,μ₁}(s; u) = u(2πσ₁²s³)^{−1/2} exp{−(u + μ₁s)²/2σ₁²s} (15.10)

for u > 0.
for u > 0.
For k > 1, the (k − 1)-dimensional Brownian motion {(X_t^{(2)}, …, X_t^{(k)})} is independent of {X_t^{(1)}} and, therefore, of the first passage time

τ_{∂G} = inf{t ≥ 0: X_t^{(1)} = 0}. (15.11)


Therefore, for every interval I₁ ⊂ (0, ∞) and every Borel subset C of ℝ^{k−1} it follows that

p̄(t; x, I₁ × C) = P_x({X_t^{(1)} ∈ I₁, (X_t^{(2)}, …, X_t^{(k)}) ∈ C} ∩ {τ_{∂G} > t})
= P({X_t^{(1)} ∈ I₁} ∩ {τ_{∂G} > t} | X₀^{(1)} = x^{(1)}) × P((X_t^{(2)}, …, X_t^{(k)}) ∈ C | X₀^{(j)} = x^{(j)} for 2 ≤ j ≤ k)
= ∫_{I₁} p₁⁰(t; x^{(1)}, v^{(1)}) dv^{(1)} ∫_C Π_{j=2}^k p_j(t; x^{(j)}, v^{(j)}) dv^{(2)} ⋯ dv^{(k)}, x^{(1)} > 0, (15.12)

where p₁⁰ is given by (15.6) and p_j is the transition probability density of an unrestricted one-dimensional Brownian motion with drift μ_j and diffusion coefficient σ_j²; namely,

p_j(t; u, v) = (2πσ_j²t)^{−1/2} exp{−(v − u − μ_j t)²/2σ_j²t} (u, v ∈ ℝ¹). (15.13)

Hence, for x, y ∈ Hᵏ,

p⁰(t; x, y) = p₁⁰(t; x^{(1)}, y^{(1)}) Π_{j=2}^k p_j(t; x^{(j)}, y^{(j)}). (15.14)

Consider a Borel set B ⊂ ∂G. Then B = {0} × C for some Borel subset C of ℝ^{k−1} and, for x ∈ G,

p̄(t; x, B) = P_x({τ_{∂G} ≤ t} ∩ {X_{τ_{∂G}} ∈ {0} × C})
= P_x({(X_{τ_{∂G}}^{(2)}, …, X_{τ_{∂G}}^{(k)}) ∈ C} ∩ {τ_{∂G} ≤ t})
= ∫_0^t P_x((X_s^{(2)}, …, X_s^{(k)}) ∈ C | τ_{∂G} = s) f_{σ₁,μ₁}(s; x^{(1)}) ds
= ∫_0^t (∫_C Π_{j=2}^k p_j(s; x^{(j)}, v^{(j)}) dv^{(2)} ⋯ dv^{(k)}) f_{σ₁,μ₁}(s; x^{(1)}) ds. (15.15)

The last equality holds owing to the independence of τ_{∂G} and the (k − 1)-dimensional Brownian motion {(X_t^{(2)}, …, X_t^{(k)})}. Letting t → ∞ in (15.15) one arrives at the P_x-distribution ψ(x, dy) of X_{τ_{∂G}}. Let y′ = (0, y^{(2)}, …, y^{(k)}) ∈ ∂G. It follows from (15.15) that the distribution ψ(x, dy) has a density ψ(x, y′) with respect to Lebesgue measure on ∂G = {0} × ℝ^{k−1}, given by

ψ(x, y′) = ∫_0^∞ f_{σ₁,μ₁}(s; x^{(1)}) Π_{j=2}^k p_j(s; x^{(j)}, y^{(j)}) ds (y′ ∈ ∂G; x ∈ Hᵏ). (15.16)

For the special case μ_i = 0, σ_i² = σ² for 1 ≤ i ≤ k, we have

ψ(x, y′) = ∫_0^∞ x^{(1)}(2πσ²)^{−k/2} s^{−(k+2)/2} exp{−(1/2σ²s)[(x^{(1)})² + Σ_{i=2}^k (y^{(i)} − x^{(i)})²]} ds
= (Γ(k/2)/π^{k/2}) x^{(1)}/|x − y′|ᵏ. (15.17)

The second equality in (15.17) is the result of the change of variables

s → t = [(x^{(1)})² + Σ_{i=2}^k (y^{(i)} − x^{(i)})²]/(2σ²s).

It is interesting to note that, for k = 2, (15.17) is the Cauchy distribution.
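The Cauchy law in (15.17) can be seen in a simulation. The sketch below (illustrative only, not part of the text) runs a driftless planar Brownian motion from (1, 0) until absorption on {x^{(1)} = 0}; since τ_{∂G} has a heavy tail, a finite time cap is imposed, which truncates the few very late hits slightly. The quartiles of the hitting ordinate should be near ±1, the quartiles of the standard Cauchy distribution:

```python
# Illustrative simulation of (15.17), k = 2: the hit ordinate is Cauchy.
import numpy as np

rng = np.random.default_rng(8)
dt, n_paths, max_steps = 1e-3, 5_000, 200_000
x1 = np.ones(n_paths)
x2 = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
for _ in range(max_steps):
    if not alive.any():
        break
    n = alive.sum()
    x1[alive] += np.sqrt(dt) * rng.standard_normal(n)
    x2[alive] += np.sqrt(dt) * rng.standard_normal(n)
    alive &= x1 > 0.0

hits = x2[~alive]                    # ordinates of absorbed paths
print("empirical quartiles:", np.percentile(hits, [25, 50, 75]),
      "  (Cauchy quartiles: -1, 0, 1)")
```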

The explicit calculations carried out for the example above are not possible for general diffusions, or for more general domains G. One may, however, derive the linear second-order equations whose solutions are p⁰(t; x, y) and ψ(x; dy). The function p⁰(t; x, y) satisfies Kolmogorov's backward equation (see theoretical complement 2)

∂p⁰(t; x, y)/∂t = ½ Σ_{i,j=1}^k d_{ij}(x) ∂²p⁰(t; x, y)/∂x^{(i)}∂x^{(j)} + Σ_{i=1}^k μ^{(i)}(x) ∂p⁰(t; x, y)/∂x^{(i)} = A_x p⁰ (t > 0; x, y ∈ G), (15.18)

and the backward boundary condition

lim_{x→∂G, x∈G} p⁰(t; x, y) = 0 (t > 0, y ∈ G). (15.19)
Relations (15.18)(15.19) may be checked directly for Example 1.


Finally, for B c G using the same argument as used in Section 7, (see
(7.22)(7.24)) one gets, writing i for TOG,

P(t; x, B) = PX (X, e B) = P,^(T t, X t e B)


=PX (i<00,X,eB)Px (i>t,X,aB)
= PX (r < oo and X = e B)
E X [PX ({T>t}n{X t aB} {X,,:0<u<t})]
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 453

=Px (t< cc and X t cB)


Ex[Ex( 1 (t>) 1 (xtX,^EB} 1 {X:0 -< u < t })]

= Px (r < oo, X^ E B) E x [1 ( ^ > , } Py (i < oc, X , E B) y = x ,]

= ^i(x; B)
B)p (t; x, y) dy (t > 0, x c G), (15.20)

where

i4'(x, B):= Px (r < oc, X e B), (B c 1G). (15.21)


Since p⁰(t; x, y) is determined by (15.18) and (15.19) (along with the initial condition p⁰(t; x, y) dy → δ_x as t ↓ 0, for x ∈ G), it remains to determine ψ. Now, for fixed x, the hitting distribution ψ(x; dy) on ∂G is determined by the collection of functions

ψ_f(x) := ∫_{∂G} f(y)ψ(x; dy) = E_x(f(X_τ)1_{τ<∞}) (x ∈ G), (15.22)

for arbitrary bounded continuous f on ∂G. Let us show that

Aψ_f(x) = 0 (x ∈ G),
lim_{x→a} ψ_f(x) = f(a) (a ∈ ∂G). (15.23)

Drop the subscript f and assume for simplicity that ψ(x) can be extended to a twice-differentiable function ψ on ℝᵏ that belongs to the domain of A, the infinitesimal generator of the unrestricted diffusion {X_t} (theoretical complement 4). Then, for x ∈ G, assuming P_x(τ_{∂G} ≤ t) = o(t) as t ↓ 0 (see Exercise 14.10), proceed as in the proof of Proposition 9.1 to get, writing τ for τ_{∂G} and X_t⁺ for the after-t process,

T_tψ(x) = E_x(ψ(X_t)) = E_x(1_{τ>t}ψ(X_t)) + E_x(1_{τ≤t}ψ(X_t))
= E_x(1_{τ>t}ψ(X_t)) + o(t) = E_x(1_{τ>t}E_{X_t}(f(X_τ)1_{τ<∞})) + o(t)
= E_x(1_{τ>t}E(f(X_{τ(X_t⁺)})1_{τ(X_t⁺)<∞} | {X_u: 0 ≤ u ≤ t})) + o(t) (by the Markov property)
= E_x(1_{τ>t}f(X_τ)1_{τ<∞}) + o(t) [since X_τ = X_{τ(X_t⁺)} and {τ < ∞} = {τ(X_t⁺) < ∞} on {τ > t}]
= E_x(f(X_τ)1_{τ<∞}) + o(t) [since P_x(τ ≤ t) = o(t)]
= ψ(x) + o(t) (x ∈ G, t > 0). (15.24)

Hence,

Aψ(x) = lim_{t↓0} (T_tψ(x) − ψ(x))/t = 0 (x ∈ G), (15.25)

establishing the first relation in (15.23). The second relation in (15.23), namely continuity at the boundary, may also be established under broad assumptions (theoretical complement 2).
If G is bounded and ∂G smooth, then the so-called Dirichlet problem (15.23), for a given continuous f specified on ∂G, can be shown to have a unique solution by the so-called maximum principle (see theoretical complement 3). A proof of this principle is sketched in Exercises 10-12 for the case A = Δ, the Laplacian

Δ = Σ_{i=1}^k ∂²/(∂x^{(i)})². (15.26)

The Maximum Principle. Let u be a continuous function on Ḡ = G ∪ ∂G, where G is a bounded, connected, open set, such that Δu = 0 on G. Then u attains its maximum and minimum on ∂G, and not on G, unless it is a constant.

To prove uniqueness of the solution of (15.23), suppose ψ₁, ψ₂ are two solutions. Let u = ψ₁ − ψ₂, whose value on ∂G is zero. Now apply the maximum principle to u.
We briefly illustrate these ideas by an example.

Example 2. (Brownian Motion on a Ball with Absorption at the Boundary). Take μ^{(i)}(·) ≡ 0, d_{ij}(·) ≡ σ²δ_{ij} for 1 ≤ i, j ≤ k. Let G = {x ∈ ℝᵏ: |x| < a}, ∂G = {|x| = a} for some a > 0. The case k = 1 is dealt with in Example 8.3 in detail. Let us consider k > 1. The hitting distribution ψ(x; dy), i.e., the P_x-distribution of X_{τ_{∂G}}, is called the harmonic measure on ∂G. Its density ψ(x, y) with respect to the surface area measure s(dy) on ∂G (or the arc length measure in the case k = 2) is the Poisson kernel,

ψ(x; y) = (a² − |x|²)/(ω_k a|x − y|ᵏ) (|y| = a), (15.27)

where ω_k is the surface area of the unit sphere (Exercise 7). The solution to (15.23) is

ψ_f(x) = ((a² − |x|²)/(ω_k a)) ∫_{{|y|=a}} (f(y)/|x − y|ᵏ) s(dy). (15.28)

It is left as an exercise (Exercise 11) to check that x → ψ(x, y) satisfies Δψ = 0 in G for each y ∈ ∂G, where Δ is the Laplacian (15.26). One may then differentiate under the integral sign to prove Δψ_f = 0 in G. The checks that ∫_{∂G} ψ(x; y)s(dy) = 1 for each x ∈ G, and that ψ(x, y)s(dy) converges weakly to δ_{y₀}(dy) as x → y₀ ∈ ∂G, are left to Exercise 11.

Let us now turn to transience and recurrence of unrestricted multidimensional diffusions {X_t}.
According to the probabilistic representation (15.22) of the solution to the Dirichlet problem (15.23), the probability

ψ(x) := P_x({X_t} reaches ∂B(0:c) before ∂B(0:d)) (0 < c < |x| < d) (15.29)

is the solution to

Aψ(x) = 0 for c < |x| < d,
lim_{x→y, c<|x|<d} ψ(x) = 1 if |y| = c, = 0 if |y| = d. (15.30)

To see this, take G in (15.22) and (15.23) to be the annulus G = {c < |x| < d}, and f to be 1 on {|x| = c} and 0 on {|x| = d}. Recall that the analog (9.2) of (15.30) was solved in Section 9 to compute ψ for the derivation of criteria for transience and recurrence of one-dimensional diffusions. In the case of multidimensional Brownian motion, (15.30) reduces to a two-point boundary-value problem, such as (9.2), in the single variable r = |x|. This equation was solved in Section 14 to derive criteria for transience and recurrence of multidimensional Brownian motion. Unfortunately, for general multidimensional diffusions the Dirichlet problem (15.30) cannot be solved explicitly. It is possible, however, to use a multidimensional analog of Corollary 2.4 to derive lower and upper bounds for the probability ψ. Notice that the arguments sketched in Section 2 remain valid for multidimensional diffusions and, hence, Corollary 2.4 is valid with S = ℝᵏ. Such a generalization is also derived in Chapter VII, Corollary 3.2, by the method of stochastic differential equations.
In order to derive lower and upper bounds for ψ, some notation is needed. Let F be a given real-valued twice continuously differentiable function on (0, ∞). Consider the radial function

f(x) := F(|x|) (|x| > 0). (15.31)

Straightforward differentiations yield (also see (14.18))

∂f(x)/∂x^{(i)} = (x^{(i)}/|x|)F′(|x|),
∂²f(x)/(∂x^{(i)})² = ((x^{(i)})²/|x|²)F″(|x|) + (1/|x| − (x^{(i)})²/|x|³)F′(|x|),
∂²f(x)/∂x^{(i)}∂x^{(j)} = (x^{(i)}x^{(j)}/|x|²)F″(|x|) − (x^{(i)}x^{(j)}/|x|³)F′(|x|) (i ≠ j). (15.32)


Also write

d(x) := Σ_{i,j} d_{ij}(x)x^{(i)}x^{(j)}/|x|²,
B(x) := trace of D(x) = Σ_i d_{ii}(x),
C(x) := 2x·μ(x),
β̄(r) := max_{|x|=r} [(B(x) + C(x))/d(x)] − 1, β̲(r) := min_{|x|=r} [(B(x) + C(x))/d(x)] − 1,
ᾱ(r) := max_{|x|=r} d(x), α̲(r) := min_{|x|=r} d(x). (15.33)

Finally define, for some c > 0,

Ī(r) := ∫_c^r (β̄(u)/u) du, I̲(r) := ∫_c^r (β̲(u)/u) du. (15.34)

Using (15.33) it is simple to check that

2Af = d(x)F″(|x|) + ((B(x) − d(x) + C(x))/|x|)F′(|x|). (15.35)

Proposition 15.1. Under conditions (1)-(4) in Section 14, the function ψ in (15.29) satisfies

∫_{|x|}^d exp{−Ī(u)} du / ∫_c^d exp{−Ī(u)} du ≤ ψ(x) ≤ ∫_{|x|}^d exp{−I̲(u)} du / ∫_c^d exp{−I̲(u)} du (c ≤ |x| ≤ d). (15.36)

Proof. Let

F(r) := ∫_c^r exp{−Ī(u)} du / ∫_c^d exp{−Ī(u)} du, f(x) := F(|x|) (c ≤ |x| ≤ d). (15.37)

Then F′(r) > 0 and F″(r) + F′(r)β̄(r)/r = 0 for r > c. Hence, by (15.35) and the definition of β̄, writing r = |x|,

2Af(x)/d(x) = F″(r) + ((B(x) + C(x))/d(x) − 1)(F′(r)/r) ≤ F″(r) + β̄(r)F′(r)/r = 0, (15.38)

so that

Af(x) ≤ 0 for all c ≤ |x| ≤ d. (15.39)

Now extend f to a twice continuously differentiable function on all of ℝᵏ that vanishes outside a compact set (Exercise 15). Denote this extension also by f. By the extension of Corollary 2.4 to S = ℝᵏ mentioned above,

Z_t := f(X_t) − ∫_0^t Af(X_s) ds (t ≥ 0) (15.40)

is a martingale. By optional stopping (see Proposition 13.9 of Chapter I and Exercise 21),

E_x Z_τ = E_x Z₀ = f(x), (15.41)

with

τ = τ_{∂B(0:c)} ∧ τ_{∂B(0:d)}. (15.42)

In other words, using (15.39),

E_x f(X_τ) = f(x) + E_x ∫_0^τ Af(X_s) ds ≤ f(x) (c ≤ |x| ≤ d). (15.43)

On the other hand, f(X_τ) is 1 if {X_t} reaches ∂B(0:d) before ∂B(0:c), and is 0 otherwise. Therefore, (15.43) becomes

P_x({X_t} reaches ∂B(0:d) before ∂B(0:c)) ≤ f(x). (15.44)

Subtracting both sides from 1, the first inequality in (15.36) is obtained. For the second inequality in (15.36), replace β̄ by β̲ in the definition of F in (15.37) to get Af(x) ≥ 0, and E_x f(X_τ) ≥ f(x). ∎

The following corollary is immediate on letting d ↑ ∞ in (15.36). Write

ρ_c(x) := P_x({X_t} eventually reaches B̄(0:c)). (15.45)

Corollary 15.2. If the conditions (1)-(4) of Section 14 hold, then for all x, |x| > c,

ρ_c(x) = 1 if ∫_c^∞ exp{−Ī(u)} du = ∞, (15.46)
ρ_c(x) < 1 if ∫_c^∞ exp{−I̲(u)} du < ∞. (15.47)

Note that the convergence or divergence of the integrals in (15.46), (15.47) does not depend on the value of c (Exercise 16).
Theorem 15.3 below says that the divergence of the integral in (15.46) implies recurrence, while the convergence of the integral in (15.47) implies transience. First we must define transience and recurrence for multidimensional diffusions more precisely. Since "point recurrence" cannot be expected to hold in multidimensions, as is illustrated for the case of Brownian motion in (14.28), the definition of recurrence on ℝᵏ has to be modified (in the same way as for Brownian motion).

Definition 15.2. A diffusion on ℝᵏ (k > 1) is said to be recurrent if for every pair x ≠ y,

P_x(|X_t − y| ≤ ε for some t ≥ 0) = 1 (15.48)

for all ε > 0. A diffusion is transient if

P_x(|X_t| → ∞ as t → ∞) = 1 (15.49)

for all x.

Theorem 15.3. Assume that the conditions (1)-(4) in Section 14 hold.
(a) If ∫_c^∞ e^{−Ī(u)} du = ∞ for some c > 0, then the diffusion is recurrent.
(b) If ∫_c^∞ e^{−I̲(u)} du < ∞ for some c > 0, then the diffusion is transient.

Proof. (a) Fix x₀, y (x₀ ≠ y) and ε > 0. Let d > max{|x₀|, |y| + ε}. Choose c, 0 < c < d, such that {|x| = c} is disjoint from B̄(y; ε). Define the stopping times

τ₁ := inf{t ≥ 0: |X_t| = d}, τ₂ := inf{t ≥ τ₁: |X_t| = c},
τ_{2n+1} := inf{t ≥ τ_{2n}: |X_t| = d}, τ_{2n+2} := inf{t ≥ τ_{2n+1}: |X_t| = c}. (15.50)

Now the function x → P_x(|X_t| = d for some t ≥ 0), |x| ≤ d, is the solution to the Dirichlet problem (15.23) with G = B(0; d) and boundary function f ≡ 1 on {|x| = d}. But ψ(x) ≡ 1 is a solution to this Dirichlet problem. By the

uniqueness of the solution,

P_x(|X_t| = d for some t ≥ 0) = 1 for |x| ≤ d. (15.51)

This means

P_x(τ₁ < ∞) = 1 (|x| ≤ d). (15.52)

As τ₂ is the first hitting time of {|x| = c} by the after-τ₁ process X_{τ₁}⁺, it follows from (15.52), the strong Markov property, and (15.46) that

P_x(τ₂ < ∞) = 1 (|x| ≤ d). (15.53)

It now follows by induction that

P_x(τ_n < ∞) = 1 for |x| ≤ d, n ≥ 1. (15.54)

Now define the events

A_n := {{X_t} does not reach B(y; ε) during (τ_{2n}, τ_{2n+1}]} (n ≥ 1). (15.55)

Then, by the strong Markov property,

P_{x₀}(A_n | ℱ_{τ_{2n}}) = ψ(X_{τ_{2n}}), so that P_{x₀}(A_n) = E_{x₀}(ψ(X_{τ_{2n}})), (15.56)

where

ψ(x) := P_x({X_t} reaches {|z| = d} before B̄(y; ε)). (15.57)

But ψ(x) is the solution to the Dirichlet problem

Aψ(x) = 0 for x ∈ G := B(0; d) \ B̄(y; ε),
lim_{x→z} ψ(x) = 1 if |z| = d, = 0 if z ∈ ∂B(y; ε). (15.58)

By the maximum principle, 0 < ψ(x) < 1 for x ∈ G; in particular,

1 − δ₀ := max_{|x|=c} ψ(x) < 1. (15.59)

Therefore, since |X_{τ_{2n}}| = c, it follows from (15.56) that

P_{x₀}(A_n | ℱ_{τ_{2n}}) ≤ 1 − δ₀ (n ≥ 1). (15.60)


460 BROWNIAN MOTION AND DIFFUSIONS

Hence, (Exercise 17)

P 0 X,} does not reach B(y; E)) < P, 0 (A I n . n A) < (1 8 0 ) for all n,
({

(15.61)
which implies (15.48), with x = x 0 .
(b) The proof is analogous to the proof of the transience of Brownian motion
for k >, 3, as sketched in Exercise 14.5 and Exercise 18. n

16 REFLECTING BOUNDARY CONDITIONS FOR


MULTIDIMENSIONAL DIFFUSIONS

The probabilistic construction of reflecting diffusions is not quite as simple as


that of absorbing diffusions. Let us begin with an example.

Example 1. (Reflecting Brownian Motion on a Half-Space). Take S =


Hk uHk = R,where

Hk = {X E Rk ; X (1) >O}, 3H k = {X E P k ; x (1) = 0}.

First consider the case k = 1. This is dealt with in detail in Subsection 6.2,
Example 1. In this case a reflecting Brownian motion {X,} with zero drift and
diffusion coefficient a 2 > 0 is given by

Y = All (16.1)

where {X} is a one-dimensional (unrestricted) Brownian motion with zero drift


and diffusion coefficient a 2 > 0. The transition probability density of {X1 is }

given by
g1(t;x,y)= p1(t;x,y)+p1(t;x, y) (x,y% 0 ), (16.2)

where p l (t; x, y) is the transition probability density of {B,},

_(y x Z
p, (t; x, y) = (2na2t)_ 1/2 exp 2
( X, y e W). (16.3)
2Q t )

Fork > 1, consider k independent Brownian motions {Xli) }, I < j k, each


having drift zero and diffusion coefficient a 2 . Then {Y t ;= (IX'(' ) I, X e ) , ... , X; k) )}
is a Brownian motion in Hk := {x e Il k ; x ( ' ) > 0} = Hk u 3Hk with normal
reflection at the boundary {x e ff; x (1) = 0} = Hk . Its transition probability
density function q is given by
k
q(t; x, y) = qi(t; x('), y ( ' ) ) ri pi(t; x(i), y e) ),
j =2 (16.4)
(X = (x(1), ... , x (k) ), y = (.y(1) ... , y (k) ) E Hk).
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS 461

This transition density satisfies Kolmogorov's backward equation

a t = i 62 A.q(t; x, y),

k
(t > 0; XE

a2
Hk , ye Hk), (16.5)

where
=j ax

and the backward boundary condition

a q(t; x, y) =0, (t >0;xEaHk,yEHk). (16.6)


--

_ The following extension of Theorem 6.1 provides a class of diffusions on


Hk with normal reflection at the boundary.
Suppose that drift coefficients p (x), I < i < k, and diffusion coefficients
d(x) are prescribed on '1k the assumptions (1)(4) in Section 14.
Assume also that

p 1l) (x) =0, d 1 (x) =0 (2<j<k;xeHk). (16.7)

For x = (xU'>, .. , x (k) ) write

x = ( x (1 1 x 121 x ).
(16.8)

Now extend the coefficients /1(1)(.), d ;; () on all of 11" by setting, on (Hk )`,

JU (^)( x ) = _p(l)(X), U(j)(x) = /1(i)(X)


(2 <j < k),
d(x) = d, ^(x), d,;x = d 1 (z) ; (2 < j < k), (16.9)

d.;(x) = d 1 () (2 < i, j < k).

Proposition 16.1. Let {X,} be a diffusion on Rk with coefficients ^l^( ) d. (. )


defined on 11 and satisfying (16.7), (16.9). Then the process

{Y := (IX ' j, X(k))} ,( )

is a Markov process on H k , whose transition probability density function q is


given by

q(t; x, y) = p(t; x, y) + p(t; x, y)


= p(t; x, y) + p(t; z, y), (x, y e Rk ), (16.10)

where p is the transition probability density of {X, }.


462 BROWNIAN MOTION AND DIFFUSIONS

Proof. Bya change of variables, the Markov process {X,: ( .. , X;k) )}


has the transition probability density p given by

p(t; x, y) = p(t; z, y). (16.11)

Therefore p satisfies the backward equation

aP(t; x, y) ap(t; X, Y)
at = at
1 k k

_ I dj;(X)(a1ajp(t; X, y) + ( ` ) (X)(a^p)(t; X, y), (16.12)


2 i.i=1 !=1

where a ; p denotes differentiation of p with respect to the ith backward


coordinate. Now

(a1 p)(t; X, Y) = (a1 p)(t, x, y),

(a;p)(t; X, Y) = (a; P)(t; x, y), (2 < j 5 k),

(aip)(t; X> Y) _ (a1P)(t; x, y), (16.13)

(ataip)(t; X, Y) _ ( 2 ,a;P)(t; x, y), (2 j <, k),


(ei a;p)(t; X, Y) _ (ei a;p)(t; x, y), (2 5 i, j < k).

It follows from (16.7), (16.9), (16.12), and (16.13) that {X} and {X t } have
the same drift and diffusion coefficients and, therefore, the same transition
probability density, i.e.,

p(t; x, y) = p(t; x, y). (16.14)

It now follows, as in the proof of Theorem 6.1, that


(1) J , X(2) X(k))}
{Yt':=
l - (IX t , , t

is a Markov process whose transition probability density is given by


p(t; x, y) + p(t; x, y). The second equality in (16.10) is obtained using (16.14).
n

The backward equation for p leads, by (16.10), to the backward equation for q,
k 2 k
1 d(x) ax(^ ax (;) + ^ ()(x) (x E Hk ; t > 0). (16.15)
a^ =

The second equation in (16.10), together with the assumption that p(t; x, y) is
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS 463

differentiable in x, implies the backward boundary condition

aq(t;x, y)
=0 (t > 0; xEaHk ,yEHk ). (16.16)

Definition 16.1. A Markov process on the state space Hk that has continuous
sample paths and whose transition density q satisfies (16.15), (16.16) is called
a reflecting diffusion on Hk with drift coefficients ( ` 1 (x) and diffusion coefficients
d(x).

It may be shown that the first requirement in (16.7), namely,

' (x)=0
( ) forxeHk , (16.17)

is really not needed for the validity of Proposition 16.1. The condition (16.17)
ensures that the extended drift coefficients in (16.9) are continuous. If it does
not hold, extend the coefficients as in (16.9) and modify (')() on OHk by setting
it zero there. Although the extended 'I() is no longer continuous, it is still
(

possible to define a diffusion having these coefficients and Proposition 16.1 goes
through (theoretical complement 1).
The second requirement in (16.7), i.e., the condition

d 1 (x)=0 for x e aHk , 2 < j < k, (16.18)

is of a different nature. The next proposition shows that the failure of (16.18)
means that the direction of reflection is no longer normal to the boundary.
First consider a one-dimensional reflecting Brownian motion on [0, co) with
mean drift it 'I and diffusion coefficient 1. It may be constructed by the method
described in the proof of Proposition 16.1, after setting the drift p"I on
( oo, 0) and modifying the drift at 0 to have the value zero as described above
(also see Section 6). Let q l denote the transition density of this reflecting
Brownian motion. Let p j denote the transition probability density of a
one-dimensional Brownian motion (on R) with drift and diffusion
coefficient 1,2 < j < k, i.e.,

- " 2 exp (y x t ^2
p;(t; x,y):_(2nt) ) . (16.19)
2t

Then
k
pj ( c; x ye>),
q(t; x, y) := qi(t; x I ye>) [] (x, y E Hk), (16.20)
I= 2

is the transition probability density of a Markov process on Hk satisfying the


464 BROWNIAN MOTION AND DIFFUSIONS

backward equation
k
aq(t; x, y) 1 a 2 9 ^;, aq
(t>0; xEHk ,yEHk ), (16.21)
at = 2 ; _ , (ax<<>)2 +

and the backward boundary condition

aq(t;x,y)
= forxeaHk , (t>O;yeHk ). (16.22)
ax" ^

This Markov process will be referred to as a reflecting Brownian motion on Hk


having drift := (('), ... , (k) ) and diffusion matrix I (or, diffusion coefficients
b ;, (1 i,j<k)).
To construct a reflecting Brownian motion on an arbitrary half-space and
having (constant, but) arbitrary drift and diffusion coefficients, first denote by
H the half-space
H=Hy:={xeRk;yx>0}, (16.23)

where y = (y .... y,) is a unit vector in Q, and y x := I y ; x 1 is the Euclidean


inner product. The interior of H is H:= {y x > 0} and its boundary is
aH := {y x = 0}. Let t = ( (1) , ... , (k) ) be an arbitrary vector in R" and
D := ((d 1 )) an arbitrary k x k positive definite matrix. We write, for every
real-valued differentiable function f (on l or H), grad f for the gradient of f, i.e.,

( 3f(x)
(grad = , ... , of (x) (16.24)
ax(1) ax(k)

Also write D" 2 for the positive definite matrix such that D 1 / 2 D 1 / 2 = D, and
D 1/2 for its inverse.

Proposition 16.2. On the half-space H = fly there exists a Markov process


{Z,} with continuous sample paths whose transition probability density r
satisfies the Kolmogorov backward equation

ar(t; z, w) 1 k 02r k ^^^ ar


d,+
at = 2 ;, 1 b`
az(;) + 1 =1

(t > 0; zeH,wef), (16.25)


and the backward boundary condition

(Dy)(grad r)(t; z, w) = 0 (t > 0; z e OH, w E H). (16.26)


Further, {Z,} has the representation

Z, := D' 12 0Y, (16.27)


REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS 465

where 0 is an orthogonal transformation such that 0' maps D" 2 y/ID" 2 yJ into
e:= (1, 0, ... 0), and {Y} is a reflecting Brownian motion on Hk having drift
vector v:= (D 112 O) - 't and diffusion matrix I.

Proof. Let {Y,} denote a reflecting Brownian motion on the half-space Hk and
0 an orthogonal matrix as described. Then {Z, }, defined by (16.27), is a Markov
process on the state space H (Exercise 3). Let q, r denote the transition
probability densities of {Y, }, {Z, }, respectively. Then, writing T:= D 112 0,

r(t; z, w) = (det D 112 ) 'q(t; T - 'z T - 'w). (16.28)

Since whenever {B,} is a standard Brownian motion, {TB,} is a Brownian


motion with dispersion matrix TT' = D, one may easily guess that r satisfies
the backward equation (16.25). To verify (16.25) by direct computation, use
the fact that,

k
aq(t; x, y)
=2exq+ ;^1 v ( ` a q 16.29
)
( )
at ax(`)

and that, for every real-valued twice-differentiable function f on W' the following
standard rules on the differentiation of composite functions apply (Exercise 4):

T' grad(f o T - ') = (grad f) o T - ',


(16.30)
TT'(((aja;f) T - I )) = (((ai a;)(f T -' )))

This is used with f as the function x q(t; x, T -1 w), for fixed t and T - 'w, to
arrive at (16.25) using (16.28) and (16.29).
The boundary condition aq/ax^`> - (egrad q) = 0 on OHk becomes, using
the first relation in (16.30),

e(T' grad r)(t; z, w) = 0 (z E aH - {yz = 0 }). (16.31)

Recall that O' D 1 / 2 y = IDYI e, and T = D 112 0, to express (16.31) as

(D). (grad r)(t; z, w) = 0 if z e OH. (16.32)


n

Since a diffusion on H k with spatially varying drift coefficients and constant


diffusion matrix I may be constructed by the method of Proposition 16.1, the
preceding result may be proved with {Y,} taken as such a diffusion. This leads
to a Markov process (diffusion) on H whose transition probability density r
466 BROWNIAN MOTION AND DIFFUSIONS

satisfies the backward equation

ar(t; z, w) 1 k 32r k ( i) ar
(z) az^i) (16.33)
at 2 ,, ^ d " as<<^ a z ( +

and the backward boundary condition (16.26).


To extend the preceding construction to a spatially dependent dispersion
matrix D(x) that does not satisfy (16.18), involves some difficulty, which can
in general be resolved only locally; constructions on local pieces must be put
together to come up with the desired diffusion on H (theoretical complement 2).
In order to define reflecting diffusions on more general domains, we consider
domains of the form
G:= {x e Ul k ; cp(x) >, 0}, (16.34)

where cp is a real-valued continuously differentiable function on R such that


grad q is bounded away from zero on the boundary, i.e., for some c > 0,

(gradq(x)i>c>0 forxe aG:_{x:Q(x)=0}. (16.35)

Write G:= {p(x) > 0}. Let D(x):= ((d ; (x))), and u ' (x) (1 < i < k) satisfy the
( )

assumptions (1)(4) of Section 14, on G.

Definition 16.2. Let {X,} be a Markov process on the state space G in (16.34)
that has continuous sample paths and a transition probability density q
satisfying the Kolmogorov backward equation
k 2 k
ag(t; x, y) = Aq := 1 Z d,j ( x ) a g + 1 u^i^(x) qa ,
at 2 i . j = 1 ax(`) ax j> ( i = 1 ax(f)

(t>0;xeG,ye G), (16.36)

and the backward boundary condition

D(x)(grad (p)(x)grad q = 0 (t > 0; x e aG, y e G). (16.37)

Then {X,} is called a reflecting diffusion on G having drift coefficients p O( ) (

and diffusion coefficients d(.) (1 < i,j < k). The vector D(x)(grad cp)(x) at
x e aG (or any nonzero multiple of it), is said to be conormal to the boundary
at x. The reflection of {X,} is said to be in the direction of the conormal to the
boundary.

Note that, according to standard terminology, (grad p)(x) is normal to the


boundary of G at x a aG.
So far we have considered domains (16.37) with q(x) = x^l ) . Now let
p(x) = 1 1x1 2 , so that G =_ {x e IRk: lxi < 1} is the closed unit ball.
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS 467

Example 2. The reflecting standard Brownian motion {X,} on the unit ball
B(0:1) = {x e R': Ixl < 1} has a transition probability density p satisfying the
backward equation

ap(t; x, Y) = i
!A p(t; x, Y) (t > 0; IXI < 1 , IY1 , 1), (16.38)
at

and the boundary condition

y- 0) ap(t; x, Y) =
0 (t > 0; Ixl = 1, IYI < 1). (16.39)
ax ' ( )

It is useful to note that the radial motion {R,:= IX,I} is a Markov process
on [0, 1]. This follows from the fact that the Laplacian A. transforms radial
functions into radial functions (Proposition 14.1) and the boundary condition
is radial in nature. Indeed, if f is a twice continuously differentiable function
on [0, 1] and g(x) = f(Ixl), then g is a twice continuously differentiable radial
function on 9(0:1) and the function

7 g(x)= Exg(X,) = Exf(IXI), (16.40)

is the solution to the initial-value problem

au(t, x)
ZAXu(t, x), (t > 0; Ixl < 1),
at
k
au(t,
x(i' x) = 0, (Ixl = 1), (16.41)

lim u(t, x) = f(Ixl), (Ixl < 1 ).


to

Let v(t, r) be the unique solution to

av(t, r) 1 az v k 1 av
= ----- + ------- (t>0;0 r<1),
at tare 2r ar'
av(t, r) -
- 0, (t > 0; r = 1), (16.42)
ar

limv(t,r)= f(r), (0<r<1).


tlo

It is not difficult to check that the function v(t, Ixl) satisfies (16.41) (Exercise
5). By uniqueness of the solution to (16.41) it follows that u(t, x) _ v(t, Ixl).
468 BROWNIAN MOTION AND DIFFUSIONS

Thus, the transition operators T, transform radial functions to radial functions.


Hence, by Proposition 14.1, {R,} is a Markov process on [0, 1] whose
infinitesimal generator is

1d2 k-1 d
A ,:=--+ ,
2 dr z 2r dr

with boundary condition d/dr = 0 at r = 1. Recall (see Eq. 14.29, Section 14)
that 0 is an inaccessible boundary for {R,}. Hence, a reflecting boundary
condition attached to the accessible boundary r = 1 suffices to specify the
Markov process {R,}.

17 CHAPTER APPLICATION: G. I. TAYLOR'S THEORY OF


SOLUTE TRANSPORT IN A CAPILLARY

In a classic study (see theoretical complement 1), G. I. Taylor showed that


when a solute in dilute concentration is injected into a liquid flowing with a
steady slow velocity through an infinite straight capillary of uniform circular
cross section, the concentration along the capillary becomes Gaussian as time
increases. Such results have diverse applications. Taylor himself used his theory,
along with some meticulous experiments, to determine molecular diffusion
coefficients for various substances. Another application is to determine the rate
at which a chemical injected into the bloodstream propagates.
Let a denote the radius of the circular cross-section of the capillary whose
interior G and boundary G are given by

G = {y = (y (l) , y'): oc < y c^, < z, ly'l <a},


(17.1)
G = {y: ly'I = a}, (y' = (Y (2) , y (3) )).

Let c(t; y) denote the solute concentration at a point y at time t. The velocity
of the liquid is along the capillary length (i.e., in the yU ) direction) and is given,
as the solution of a linearized NavierStokes equation governing a steady
nonturbulent flow, by

F(y') = u0 ( i --i
a
^z
. (17.2)

The parameter U0 is the maximum velocity, attained at the center of the


capillary.
As in Einstein's theory of Brownian motion, a solute particle injected at a
point x = (x', x (2) , x (3) ) E G is locally like a three-dimensional Brownian
motion with drift (F(x'), 0, 0) and diffusion matrix D O I, where Do is the molecular
diffusion coefficient of the solute (relative to the liquid in the capillary), and I
CHAPTER APPLICATION 469

is the 3 x 3 identity matrix. When this solute particle reaches the boundary clG,
it is reflected. In other words, the location {X, = (X; ", X;^^, X^ 3 ^) = (X^' ) , X,)}
of the solute particle is a reflecting diffusion on G = G u G having drift
(x) = (F(x'), 0, 0) and diffusion matrix D.I. Therefore, its transition probability
density p(t; x, y) satisfies Kolmogorov's backward equation,

^ p = Ap'= zDoAXp + F(x')


t
ap
G'x(1)
(x = (x" , x'), x' = (x^2> x^ 3 ^)), (17.3)
)

where A is the Laplacian Z 2 /(x) 2 . The backward boundary condition is


;

that of normal reflection (see (16.37) with cp(x) = a 2 Ix'1 2 ),

x )
x
p +x ( ' ) b i
x
p =n(x)gradp =0 (xcG,t>0;ye6). (17.4)

Here, n(x) = (0, x') is the normal (of length a) at the boundary point
x = (x", x').
The FokkerPlanck or forward equation may now be shown, by arguments
entirely analogous to those given in Sections 2 and 6, to be

= 2D0A rp --- (F(y')p) (ye G, t > 0). (17.5)

The forward boundary condition is the no-flux condition (see Section 2)

y (2) 2) +y=n(y)grad p =0 (ycG,t>0). (17.6)

By using the divergence theorem, which is a multidimensional analog of


integration by parts, it is simple to check, in the manner of Sections 2 and 6,
that (17.5) and (17.6) are indeed the forward (or, adjoint) conditions. Given
an initial solute concentration distribution c o (dx), the solute concentration
c(t; y) at y at time t is then given by

y) = p(t; x, y)co(dx) (17.7)


c(t;
J G

and it satisfies the FokkerPlanck equations (17.5) and (17.6).


Since (a) the diffusion matrix is D O I, (b) the drift velocity is along the x' -axis
and depends only on x^ 2 ^, x^ 3 ^, and (c) the boundary condition only involves
x (2) , x (3) , it should be at least intuitively clear that {X,} has the following
representation:

1. {X := (X 2 ,X 3 )} is a two-dimensional, reflecting Brownian motion on


470 BROWNIAN MOTION AND DIFFUSIONS

the disc Ba := {ly'I <, a} with diffusion matrix D 0 I', where I' is the 2 x 2
identity matrix.
2. X,(' = Xo' + f F(X) ds + Do B where {B 1 } is a one-dimensional
)

standard Brownian motion independent of {X;} and Xo l ". (17.8)

A proof of this representation is given toward the end of this section.


Now the transition probability density p' of the process {X,'} is related top by

p'(t; x', Y') = p(t; x, y) dyt i) . (17.9)

By the Markov property of {XI}, the integral on the right does not involve xi ".
Now the disc Ba = {ly'I <, a} is compact and p'(t; x', y') is positive and
continuous in x', y' e BQ for each t > 0. It follows, as in Proposition 6.3 of
Chapter II, that for some positive constants c 1 , c 2 , one has

max Ip'(t; x', y') y(Y')I < c l e - ` 2 ` (t > 0), (17.10)


X , ,y'EBa

where y(y') is the unique invariant probability for p'. Let us check that y(y') is
the uniform density,

n g (IY'I < a).


y(Y') = a (17.11)

It is enough to show that

p'(t; x', y')y(x') dx' = 0. (17.12)


atfjjy'j'_^a)

But p' satisfies the backward equation

(Ix'! < a), (17.13)


atp = iDOAX p'
along with the boundary condition

x t2) a P + xts) a p = 0 (Ix') = a). (17.14)


ax ax
Interchanging the order of differentiation and integration on the left side of
(17.12), and using (17.13) and (17.14) and the divergence theorem, (17.12) is
established. In other words, the adjoint operator, which in this symmetric case
is the same as the infinitesimal generator, annihilates constants.
CHAPTER APPLICATION 471

As a consequence of (17.10), the concentration c(t; y) gets uniformized, or


becomes constant, in the y'-plane. That is,

(17.15)
(t; Y,) :=
J ^ c(t; y) dy ' ' c'=
ao
( )

(fa co(dx))'7za2
exponentially fast as t > oo. Of much greater interest, however, is the asymptotic
behavior of the concentration in the y ' -direction, i.e., of ( )

e(t; y
)) := dy'I.
7 (1 .16)
y) dy' = p(t; x, y)
fj c(t;
Iy'l a) J _co(dx)I
G IJy'^ a)

The study of the asymptotics of c(t; y ") is further simplified by the fact that
(

the radial process {R,:= IX,'1} is Markovian. This is a consequence of the facts
that (1) the backward operator ZD.A.. of {X;} transforms all smooth radial
functions on the disc B o into radial functions and (2) the boundary condition
(17.12) is radial, asserting that the derivative in the radial direction at the
boundary vanishes (see Proposition 14.1). Hence {R,'} is a diffusion on [0, a]
whose transition probability density is

q(t; r, r') := p'(t; X', Y')sr(dY) (Ix'l = r), (17.17)


. iI9'I_r)

where s,(dy') is the arc length measure on the circle {Iy'I = r' }. The infinitesimal
generator of {R;} is given by the backward operator

_ (d 2 1 dl
0 < r < a (17.18)
AR A dr 2 + r drJ'

and the backward reflecting boundary condition

= 0.
dl
dr r = a
(17.19)

One may arrive at (17.18) and (17.19) from (17.17), (17.13), and (17.14), as in
Example 2 of Section 16 (see Eq. 16.42 for k = 2). It follows from (17.10),
(17.11), and (17.17) that {R,'} has the unique invariant density

2r
ir(r) = z , 0 <, r <, a, (17.20)
a
and that

2r
max q(t; r, r') r 27cac 1 e - ` 2t (t > 0). (17.2)
o,r.rsa a
472 BROWNIAN MOTION AND DIFFUSIONS

For convenience, write

f(r) = 1 r2 (O < r < a). (17.22)


Then F(y') = Uo f (ly'l)
Now by the central limit theorem (Theorem 13.1, Exercise 13.2. Also see
theoretical complement 13.3) it follows that, as n oo,

Z= n (f(R;) En.f) ds ' N( 0 , 2 t), (17.23)


'12 fo,",

where the convergence in (17.23) is in distribution, and


f0a
Ej = f(r)ir(r) dr =
(1
o
J r (r2 dr = 1 .
?
aZ/ \a 2 2
(17.24)

The variance parameter u 2 of the limiting Gaussian is given by (Proposition


13.2, Exercise 13.2)
Q Z =2
f 0
'a (f (r) E n f)h(r)n(r) dr (17.25)

where h is a function in LZ ([O, a], n) satisfying

ARh(r) _ (f (r) E,rf) _ ^2 1 . (17.26)

Such an h is unique up to the addition of a constant, and is given by


a
2 + c 3 (17.27)
h(r) 8D az
0 2r

where c 3 is an arbitrary constant, which may be taken to be zero in carrying


out the integration in (17.25). One then gets
2

48 0

Finally, since {B} and {X,'} in (17.8) are independent it follows that, as n 00,

Y:= n - ' 1' Z (X ZUo nt) --* N(0, Dt) (17.29)

in distribution. Here
a2U2
o , (17.30)
D 48D 0+ D
CHAPTER APPLICATION 473

is the large scale or effective dispersion along the capillary axis. Equations (17.29)
and (17.30) represent Taylor's main result as completed by R. Aris (theoretical
complement 1). The effective dispersion D is larger than the molecular diffusion
coefficient Do and grows quadratically with velocity.
Actually, the functional central limit theorem holds, i.e., { Y} converges in
distribution to a Brownian motion with zero drift and variance parameter D.
Also, for each t > 0, the convergence in (17.29) is much stronger than in
distribution. Indeed, since the distribution of Y,(" is the convolution of those
of Z,(" ) and Brownian motion B, (see (17.8(ii))), Y,(" has a density that)

converges to the density of N(0, Dt) as n --> oo. Hence, by Scheffe's theorem
(Section 3 of Chapter 0), the density of Y," converges to that of N(0, Dt) in
)

the L' -norm. Since the distribution of X,(' ) has the density c(t;)/Co , where
C o := $,3 c o (dx) is the total amount of solute present, the density of Y^' is given by

z --> n" 2 c(nt; n'/ Z Z + 2Uo nt)/C0 . (17.31)

One therefore has, writing E =

I E
-l^( E -z t; E -i z + zU0 E -2 t)
(2nCt)ii2 exp{ - 2
dz-+0 as v J, 0.
(17.32)

Another way of expressing (17.32) is by changing variables z' _ E - 'z, t' = E -z t.


Then (17.32) becomes

^c(t ; z + zUt) exp^ dz' -+0 ast'I oo. (17.33)


(2^Dt')^^z ll 2Dt'}
---

From a practical point of view, (17.33) says that at time t' much larger than
the relaxation time over which the error in (17.10) becomes negligible, the
concentration along the capillary axis becomes Gaussian. The center of mass
of the solute moves with a velocity ZU 0 along the capillary axis, with a dispersion
D per unit of time. The relaxation time in (17.10) is 1/c 2 , and c 2 is estimated
by the first (i.e., closest to zero) nonzero eigenvalue of ZD o i X . on the disc Ba
with the no-flux, or Neumann, boundary condition.
It remains to give a proof of the representation (17.8). First, let us show that
{X, = (X; 1 ", X;)} as defined by (17.8) is a time-homogeneous Markov process.
Fix s, t positive, and let the initial state be x o = (x'I, xo'). Write

X ;+)S = x + 1 5 F((X; + )) du + Do (B,+s B,), (17.34)


0

where X; + is the after -t process {(X )":= X, + : u >, 0 }. Let g be a bounded


measurable function on G. Since {B,,: 0 <, u < t }, {X;,: 0 < u ` t} determine
474 BROWNIAN MOTION AND DIFFUSIONS

{X.: 0 < u < tl,

E[g(X1+s) I {X v : 0 < u , t}]

= E[E(g(X, +S ) {B:0 u,< t}, {X: 0,< u<, t}) ^ {X: 0,< u,< t}]. (17.35)

Now,

X;+s). (17.36)
g(X1+5) = 9(X}' , +
foS F((X; + )u)du+ (B1+sB1),

Since B,+s B t is independent of the conditioning variables and of X;'> and


X; +, the inner conditional expectation on the right in (17.35) may be expressed as

E(h(s, (X, + F
f
o ((X;+)))du, X
) +s {B: 0 < u < t}, {X: 0 < u < t}
(17.37)
where

h(s, v, z') = Eg(v + Ji (B, B,), z'). (17.38)

To evaluate (17.37) use the facts that (1) X,(' ) is determined by the conditioning
variables, (2) {X;} is independent of {B,}, so that the conditional distribution
of X given {B,,: 0 < u < t}, {X: 0 < u < t}, is the same as that given
{X: 0 < u < t}. Then (17.37) becomes

(E(h(s, y + Jo F((X)u) du, X;+S) I {X: 0 < u < t})).. (17.39)

But {X;} is a time-homogeneous Markov process. Therefore, (17.39) becomes

(h1(s, y, X;)) Y =xil , = hl(s, Xi l) , X;), (17.40)

where h l is some function on [0, ao) x G. The expression (17.40) is already


determined by {X 0 : 0 < u < t}. Therefore, the outer conditional expectation in
(17.35) is the same as (17.40), completing the proof that {X 1 } is a
time-homogeneous Markov process.
Observe next that by Proposition 14.1, if {Z, :_ (Z,'', Z,')} is a Markov process
on G whose transition probability density satisfies the Kolmogorov equations
(17.3) and (17.4), then {Z,'} is a Markov process on the disc B o whose transition
probability density satisfies the Kolmogorov equations (17.13) and (17.14).
Since the latter are also satisfied by {X;}, (17.8(i)) is established. It is enough
then to identify the conditional probability density of X;'^, given X o = x, with
that of Z; 1) , given Z o = x. Let g be a smooth bounded function on R', and let
g denote the function on G defined by g(x) = g(xU )). Then, denoting by EX

EXERCISES 475

expectation given X 0 = x,

(Tt9)(X)'= E.O(Xr) = E,,g(X). (17.41)

By (17.8), as t j 0 one has by a Taylor expansion,

(Tr9)(x) = g(xci)) + g'(xU))Ex(J F(X) ds + Do B


0 `)
r 2
+ Zg"(x ( ' ) )E. F(X) ds + Do B, + o(t)
0
= g(x1) + g '( x U))F( x ') + zg"(x" ) )Dot + o(t) (17.42)

where g', g" are the first and second derivatives of g. It follows that

a
( at
T,9(x) I
/t_o
= F(x')g'(x") + ZDo g"(x" i ^). (17.43)

But the right side is Ag, where A is as in (17.3). Therefore, the infinitesimal
generator of {X,} agrees with that of {Z I } for functions depending only on the
first coordinate. In particular, the backward equation for T, g becomes

a T,g(x) = AT,g(x), (17.44)


at
so that, by the uniqueness of the solution to the initial-value problem,
E.g(X) _ (T,g)(x) = E(g(Z{' ) ) I Z o = x). Thus, the conditional distribution
of Xf ", given X 0 = x, is the same as that of Z,1 1) , given Z o = x. This completes
the proof of (17.8).
It may be noted that, although kinetic theoretic arguments given earlier
provide a justification for the validity of the FokkerPlanck equations (17.5)
and (17.6) (with c(t; y) replacing p(t; x, y)), they are not needed for the analysis
of the asymptotics of c(t; y) carried out above. We have simply given a
probabilistic proof, using the central limit theorem for Markov processes, of
an important analytical result.

EXERCISES

Exercises for Section V.1

Check (1.2) for


(i) a Brownian motion with drift p and diffusion coefficient a Z > 0, and
(ii) the OrnsteinUhlenbeck process.
476 BROWNIAN MOTION AND DIFFUSIONS

2. Show that if (1.2) holds then so does (1.2)' for every e > 0.
3. Suppose that {X,} is a Markov process on S = (a, b), and (p is a continuous strictly
monotone function on (a, b) onto (c, d).
(i) Prove that {(XX )} is a Markov process.
(ii) Compute the transition probability density for the process {X,:=exp{B,}} in
Example 1.
4. Consider the Ornstein-Uhlenbeck velocity process { Y,} starting at V o = v. Let
y = /m (Example 2).
(i) Show that
az
Cov(V, I;+) = (1 - e - ' 7 )e - Y5 , s > 0.
Y

(ii) Calculate the limit of the transition probability density p(t; x, y) in (1.6), as
t--. oo,if(a)y<0,or(b)y>0.
(iii) Define the Ornstein-Uhlenbeck position process by X, = x + f o V. ds, t -> 0.
(a) Show that {X} is a Gaussian process.
(b) Show EX, = x + vy '(1 - e Y`).
(c) Show

z z z
4e-rr-1e-zyr
VarX,=a t-3v +v
Y z 2 Y' 2Y z Y Y
(d) Show
z
COV(Xs , Xs+t - Xs) = (1 - e)(1 - e) 2 .
Y

(iv) Explain why {X,} is not a Markov process.


(v) Write y = m 'and let a z = ya for some a > 0. Use (iii) to show that as
y -+ co the process {XX } converges to a Brownian motion with zero drift and
diffusion coefficient a.
5. In Example 3, take {X,} to be an Ornstein-Uhlenbeck process and f(t) = ct (c ^ 0).
Compute the transition probability density of {Z, = X, + ct}.
6. Let v e R', v 0 0, be an arbitrary (nonrandom) constant. Define a deterministic
motion by XX = X0 + vt, t > 0, i.e., randomness only in the initial distribution of X 0 .
Show that {X,} is a Markov process having continuous sample paths and satisfying
(1.2) but with o z = 0. Does the transition probability distribution have a density?

Exercises for Section V.2


1. If fa f(x)h l (x) dx = J f(x)h z (x) dx for all twice continuously differentiable functions
f vanishing outside some closed, bounded subinterval of (a, b), then prove that
h l (x) = h 2 (x) outside a set of Lebesgue measure zero.

2. Show that the derivations of (2.13) and of the backward equation (2.15) do not
require the assumption of the existence of a density of the transition probability.

EXERCISES 477

3. Let f be a twice continuously differentiable function on [c, d]. Show that given &> 0
there exists a twice-differentiable g on l' such that g = f on [c, d] and g vanishes
outside [c - e, d + J.
4. (i) Show that (1.2) in Section 1 holds uniformly for all x in the following cases:
(a) U(x) =0,a 2 (x) =a 2 .
(b) u(x) = , a 2 (x) = a 2
(ii) Show that, for the Ornstein-Uhlenbeck process, PP (IX, - xI > r) = o(t) does not
hold uniformly in x.
5. Let p(t; x, y) be the transition probability density of a diffusion on V8' and A a real
number. Write q(t; x, y) exp{ - At} p(t; x, y).
(i) Show that q satisfies the Chapman-Kolmogorov equation (2.2), and the
backward and forward equations

at = (x) 2q,
+ Za2(x) ax
ax -

aq
at
- ay a (u(y)q) + ya (za (y)q) - Aq.
z

2 2

(ii) (Initial-Value Problem) Let f be a bounded continuous function and define


T, f := f f(y)q(t; x, y) dy. Show that the function u(t, x):= (T, f)(x) solves the
initial-value problem
au au z
(xEf8',t>0),
=p(x)t-
at x +za2(x)axz-Au
lim u(t, x) = f(x).
rlo
(iii) (Killing at a Constant Rate) Let {X,} be a diffusion with transition probability
density p, and an exponentially distributed random variable independent of
{X,} and having parameter A. Then q may be interpreted as the (defective)
transition probability density of the process {X,} killed at time . More
specifically, show that the function u in (ii) may be represented as

E(f(X,) 1 (Z>l) 1 Xo = x).


(iv) (Killing) Let ).(x) be a continuous, nonnegative function on R'. Define the
operators T, acting on bounded continuous functions f, by

(T,f)(x)'= E(.i(X,) expS


l
- J o A(X,) dsl
t I Xo = x)

where {X} is a diffusion on W. Show that the operators T, have the semigroup
property, and that u(t, x):= (T, f)(x) solves the initial-value problem in (ii) with
A=
(v) Note that (T, f)(x) is finite (indeed, T, f is bounded if f is bounded), if A(x) >_ 0.
One may also express T, f as

(T1./ )(x) = E(.f(X,) 1 (4,X >> n),


478 BROWNIAN MOTION AND DIFFUSIONS

where conditionally given the sample path {X s : s >_ 0}, the killing time ^ (X , E has
the (nonhomogeneous) exponential distribution

P(i;^ x , } > t {Xs : s > 0}) = exp


t 2(XS ) ds } .
)

The killing times ^ fx . ) may take the value + co with positive probability. If {X,}
is defined on a probability space of trajectories (52,,.F 1 , P,), then show that by
enlarging the space appropriately one may define both {X,} and on a
common probability space (52,.x, P). [Hint: First construct the product space
(Q 2 , .Fz , Pz ) for an independent family of random variables { } indexed by the
set S2, of all trajectories co,. That is, 0 2 = X w En , Icu,, where lo = [0, oo]
for all w l E Q. Then take the product space (S2 = X S 2 Z, = 1 OO Wiz,
P = P 1 X P2).]
(vi) (FeynmanKac Formula) If A(x) is bounded below and continuous, and not
necessarily nonnegative, show that T, f given in (iv) is well defined, defines a
semigroup, and solves the initial-value problem (ii) with A = A(x).
(Adjoint Diffusions) Let p(t; x, y) denote the transition probability density of a
diffusion on F1' with coefficients satisfying Condition (1.1). In addition, assume that
z
(o2(x))d (x)=0, (xeR').
2dxz

(i) Show that P(t; x, y):= p(t; y, x) is a transition probability density of a diffusion
on F1' and compute its drift and diffusion coefficients. What is the forward equation
for p?
(ii) (Time Reversibility) Let {X,} and {Y} be diffusions with transition probability
densities p and p" as above, with X o =_ x, Yo = y. Prove that for arbitrary time
points 0 < t, < < t,, < t the conditional p.d.f. of (X,,, X,,, ... , X,) given
X, = y, evaluated at (x,, x 2 ..... x), is the same as the conditional p.d.f. of
(Y,, Y,, .. , Y) given Y, = x, evaluated at (x,,, x_,, .. , x,). Thus, the
sample path of one process over any finite time period cannot be distinguished
probabilistically from that of the other with the direction of time reversed.
7. (Sources and Sinks) Consider the FokkerPlanck equation with a source (or sink)
term h(t, y),

c(t, a2
(u(Y)c(t, Y)) + i (ia z (Y)c(t, y)). + h(t, y),

at Y

where h(t, y) is a bounded continuous function of t and y, and f Ih(t, x)Ip(t; x, y) dx < oo.
One may interpret h(t, y) (in case h _> 0) as the rate at which new solute particles are
created at y at time t, providing an additional source for the change in concentration
with time other than the flux. The contribution of the initial concentration c o to the
concentration at y at time t is

c, (t, Y)'=
J
co(x)p(t; x, Y) dx.

EXERCISES 479

The contribution due to the h(s, x) Ax As new particles created in the region
[x, x + ix] during the time [s, s + As] is h(s, x)p(t s; x, y) As Ax (0 _< s < t).
Integrating, the overall contribution from this to the concentration at y at time t is

c 2 (t, y) = J
f. gal
'
h(s, x) p(t s; x, y) dx ds.

(i) (Duhamel's Principle) Show that c, satisfies the homogeneous FokkerPlanck


equation with h = 0 and initial concentration c o , while c 2 satisfies the
nonhomogeneous FokkerPlanck equation above with initial concentration zero.
Hence c := c, + c 2 solves the nonhomogeneous FokkerPlanck equation above
with initial concentration c o . Assume here that c o (x) is integrable, and

lim
ajo PU
J h(t s, x)p(s; x, y) ds = h(t, x).

Show that this last condition is satisfied, for example, under the hypothesis of
Exercise 6. *Can you make use of Exercise 5 to verify it more generally?
(ii) Apply (i) to the case (x), a z (x), h(t, x) are constants.
(iii) Solve the corresponding initial-value problem for the backward equation.

Exercises for Section V.3


1. Derive (3.21) using (3.15) and Proposition 3.1.
2. Show that the process {Z,} of Example 4 does not have a finite first moment.
3. By a change of variables directly derive the backward equation satisfied by the
transition probability density p of {Z,} := {4p(X,)}, using the corresponding equation
for p. From this read off the drift and diffusion coefficients of {Z, }.
4. (Natural Scale for Diffusions) For given drift and diffusion coefficients
p(.), a'(.) >0 on S = (a, 6), define

(Y
1(x z):= ^ 2, u ) d Y

_
x 62 (y)

with the usual convention !(z, x) 1(x, z) for z < x. Fix an arbitrary x o e S, and
define the scale function

J
s(x)= exp{ 1(x 0 , z)} dz,
x

again following the convention fx = J for x < x o . If {X,} is a diffusion on S


with coefficients p() a 2 (.), find the drift and diffusion coefficients of {s(X,)} (see
Section 9 for a justification of the nomenclature).
5. Use Corollary 2.4 to prove that the following stronger version of the last relation
480 BROWNIAN MOTION AND DIFFUSIONS

in (1.2)' holds: for each x e 11' and each c> 0,

P.( max IX,-xl>E^=o(h) ashj0.


o<t<n

[Hint: Let f be twice continuously differentiable, such that (a) f (y) = 2 for Ix yj > 1,
(b) 1 <, f (y) <- 2 for Iy xl > e, (c) f (y) = Iy xl'/E 3 for ly xl e. Then

P.( max JX,xl>s)^P, max f(X,)>-1^

Px ^ max tf
ost-<h
(X,) J t
0
Af(X,) ds}
2

if h is chosen so small that max{IAf (y)I: ye 11'} -< 1/2h. By the maximal inequality
(see Chapter I, Theorem 13.6), the last probability is less than

Now show that


EXI f(Xh)
f .'
Af (XS ) ds) <- 2E x f 2 (Xh ) + O(h 2 ).

I3
EXf 2 (Xh) 4 PX(IXh xl > e) + E x IX'^ x I(lXh_XI F) = o(h),

by (1.2)' and the lemma in Section 3.


6. Show that the geometric Brownian motion {X,} is a process with independent ratios
(X,,/X, a ), ... , (X,.jX,), for 0 = t o < t l < < t,,, k >, 2. In particular, at any
time scale, t k kA, k = 0, 1, 2, ... , the values X,,,, X,, ... satisfy the law of
proportionate effect: X,,, = L k (A)Xk , k = 0, 1, 2, ... , where L 0 (0), L 1 (i), ... are
i.i.d. (load factors) positive random variables.
7. Let {B(t)} be a standard Brownian motion starting at 0. Let V, = e - 'aB(e 21 ), t > 0.
Show that { Y,} is an Ornstein-Uhlenbeck process. Note continuity of sample paths
from {B(t)}.

Exercises for Section V.4


1. Show that (4.4), (4.5) are solved by (4.2).
2. Derive (4.17).
3. (i) Using (4.17) derive the probability p , that, starting at x, {X,} ever reaches c.
X

(ii) Find a necessary and sufficient condition for {X 1 } to be recurrent.


4. Apply the criterion in Exercise 3(ii) to the Ornstein-Uhlenbeck process. What if
y/m> 0 ?
5. (i) Using the results of Section 3 of Chapter III, give an informal derivation of a
necessary and sufficient condition for the existence of a unique invariant
probability distribution for {X,} (i.e., of p(t; x, y) dy).

EXERCISES 481

(ii) Compute this invariant density.


(iii) Show that the Ornstein-Uhlenbeck process { V} has a unique invariant
distribution and compute this distribution (called the Maxwell-Boltzmann:
velocity distribution).
(iv) Show that under the invariant (Maxwell-Boltzmann) distribution,

az
Cov,(V, , V +,) = 2- e - 'S, s > 0.
Y

(v) Calculate the average kinetic energy E(mV )of an Ornstein-Uhlenbeck particle
under the invariant initial distribution.
(vi) According to Bochner's Theorem (Chapter 0, Section 8), one can check that
p(s) = Cov,,(V V + ,) in (iv) is the Fourier transform of a measure It (spectral
distribution). Calculate p.
6. Briefly indicate how the Ornstein-Uhlenbeck process may be viewed as a limit of
the Ehrenfest model (see Section 5 of Chapter I11) as the number of balls d goes to
infinity.
7. Let

s (x) J
E
= pex ^-I 2ti(1)dy^dz
. xo a 2 (Y)

be the so-called scale function on S = (a, b), where x o is some point in S. If {X} is
a diffusion on S with coefficients p( ), a 2 ( ), then show that the diffusion { Y, := s(X,)}
on S = (s(a), s(b)) has the property

P({Y,} reaches c before dl Yo = y) =a y c , y_<d.

8. Derive the forward equation

ap(t x, y)
; ^2
= -^ ^((Y)P(t;
yyz x, y)) + (ia2(Y)P(t; X, y)),
at

(t >0, -cc < x, y < oo), along the lines of the derivation of the backward equation
(4.14) given in this section. [Hint: p " +t = pap ] .

9. (Maxwell-Boltzmann Velocities) A monatomic gas consists of a large number N


(- 10 23 ) molecules of identical masses m each. Label the velocity of the ith molecule
by v, _ (v 31 _ 2 , v 3; 1 , v 3; ), for i = 1, 2, ... , N. To say that the gas is an ideal gas
means that molecular interactions are ignored. In particular, the only energy
represented in an ideal monatomic gas is the translational kinetic energy of individual
molecular motions. The total energy is therefore

3N
T = T(v1, v2, ... , U3N) _ 'zmv; = zm IIvIIi, for v = (v1.... , V3N) E R 3N ,
j=1

where IlvlI2:=^; =Ni vj. Define equilibrium for a gas in a closed system of energy E
482 BROWNIAN MOTION AND DIFFUSIONS

to mean that the velocities are (purely random) uniformly distributed over the energy
surface S given by the surface of the 3N-dimensional ball of radius R = (2E/m)" 2 , i.e.,

f
S = ( VI, v2,
3N
, V3N) Y_vJ _
i =1 m
2E

One may also define temperature in proportion to E.


(i) Show that the distribution of the jth component of velocity is given by

P(a <v < b) _ b [l (x/R)2](3N- 3)/2 dx/^ R [1 (x/R)2](3N- 3)12 dx.


Ja R

(ii) Calculate the limiting distribution as N + oo and E + cc such that the average
energy per particle EIN (density) stabilizes to > 0.
(iii) How do the above calculations change when the 1 2 -norm 1 2 defined above is
replaced by the 1p norm defined by jIvIIP'= v, where p >_ 1 is fixed?
(This interesting generalization was brought to our attention by Professor
Svetlozar T. Rachev.)

Exercises for Section V.5


1. Extend the argument given in Example 1 to compute the transition probability density
of a Brownian motion with drift p and diffusion coefficient a 2 > 0.
2. Use the method of Example 2 to compute the transition probability density of a
diffusion with drift u(x) = yx g and diffusion coefficient a 2 > 0.
3. (i) Use Kolmogorov's backward equation and the Fourier transform to solve the
initial-value problem

u ^= x
( ) Z
(t>0,oo<x<oo),
=za axe+x

lim u(t, x) = f (x),


rlo

where f is a bounded and continuous function on (oo, oo).


(ii) Use (i) to compute the transition probability density of a Brownian motion.
4. In Exercise 3 take the initial-value problem to be the one corresponding to the
OrnsteinUhlenbeck process.

Exercises for Section V.6


1. Use (1.2)' to show that the diffusions { Y, := X,} and {X} have the same drift and
diffusion coefficients, if (6.7), (6.8) hold.
2. Use (1.2)' to show that the diffusions {Y:= X, md} (m = 0, 1, ...) have the same
drift and diffusion coefficients, provided u() and v 2 () are periodic with period d.
3. Express the transition probability density of the Markov process {Z 1 } in Theorem
6.4 in terms of that of {X,}.
EXERCISES 483

4. Using the construction of reflecting diffusions on [0, oo), [0, 1], construct reflecting
diffusions on [a, oo), ( co, a], [a, b] for arbitrary a < b. What are the analogs of
(6.7), (6.8) and (6.36)- (6.38) in these cases? Express the transition probability
densities for these reflecting diffusions in terms of appropriate unrestricted diffusions.
5. Suppose that cp is a measurable map on an interval S onto an interval S'. Suppose
that {X} is a Markov process having a homogeneous transition probability p. If
T,(f o ^p)(x) := E[ f ((p(X,)) I X 0 = x] is a function v 1 (t, p(x)) of qp(x), for every
real-valued bounded measurable function f on S. then prove that { Y, := cp(X,)} is a
homogeneous Markov process on S'.
6. Give a derivation of the forward equation (6.17).
7. Prove that the transition probability density q(t; x, y) of {Z} in Theorem 6.4 satisfies
(aq/ax)x -1 = 0.
8. (i) Assume that p(t; x, y) >0 for all t > 0, and x, y e ( oo, oo) for every diffusion
whose drift and diffusion coefficients satisfy Condition (1.1). Prove that the
Markov process {Z,} on the circle in Theorem 6.2 has a unique invariant
probability m(y) dy, and that the transition probability density q(t; x, y) of {Z,}
converges to m(y) in the sense that

jq(t;x,y) m(y)ldy - + 0 ast-+oo,


J s

the convergence being exponentially fast and uniform in x.


(ii) Compute m(y). [Hint: Differentiate $ T, f (x)m(x) dx = Sf(x)m(x) dx with respect
J),
to t, to get (A f)(x)m(x) dx = 0 for all twice-differentiable f on S with compact
support, so that A *m = 0.]
9. Derive the forward boundary conditions for a reflecting diffusion on [0, 1].
10. Suppose (a) {X} and { X,} have the same transition probability density, and
(b) { 1 + X} and { l X,} have the same transition probability density. Show that
the drift and diffusion coefficients of {X} must be periodic with period 2.

Exercises for Section V.7

1. Use the Radon-Nikodym Theorem (Section 4 of Chapter 0) to prove the existence



of a nonnegative function y -+ p (t; x, y) such that f, p (t; x, y) dy = p(t; x, B) for

every Borel subset B of (a, oo). Show that p (t; x, y) _< p(t; x, y) where p is the
transition probability density of the unrestricted process {X, }.
2. Let T, (t > 0) be the semigroup of transition operators for the Markov process {Xj
in Theorem 7.1, i.e.,

(T,f)(x) = Exf(Xr) = f(y)P(t; x, dy)


fla,

for all bounded continuous f on [a, oo).


484 BROWNIAN MOTION AND DIFFUSIONS

(i) If f(a) = 0, show that

(a) (T^f)(x) =
f( a.m)

f(Y)P (t; x, Y) dY (x > a),

(b) (T, f)(x) -+ 0 as x j a. [Hint: Use (7.11).]


(ii) Use (i) to show that T (t > 0), defined by

(T.f )(x);= f( a, )
f(Y)P(t; x, y) dY,

is a semigroup of operators on the class C of all bounded continuous functions


on [a, oo) vanishing at a. Thus T is the restriction of T, to C (and TC c C).
(iii) Let A denote the infinitesimal generator of T, i.e., the restriction of the
infinitesimal generator of T, to C. Give at least a rough argument to show that
(7.10) holds (along with (7.11)).
3. Derive (7.13) and (7.14).
4. (Method of Images) Consider the drift and diffusion coefficients p(.), a 2 () defined
on [0, oo). Assume u(0) = 0. Extend p(.), a 2 () to 01' as in (6.8) by setting
,u(x) = (x), a'(x) = a 2 (x) for x > 0. Let p denote the (defective) transition
probability density on [0, oo) (extended by continuity to 0) defined by (7.9). Let p
denote the transition probability density for a diffusion on R' having the extended
coefficients above. Show that

P (t; x, y) = p(t; x, y) p(t; x, y)

= p(t; x, y) p(t; x, y), (t > 0; x, y >- 0).

[Hint: Show that p satisfies the Chapman-Kolmogorov equation (2.30) on


S = [0, oo), the backward equation (7.10), the boundary condition (7.11), and the
initial condition p (t; x, y) dy -* 5(dy) as t j 0 (i.e., (T f)(x) - f (x) as t j 0 for every
bounded continuous function f on [0, oo) vanishing at 0). Then appeal to the
uniqueness of such a solution (see theoretical complements 2.3, 8.1). An alternative
probabilistic derivation is given in Exercise 11.11.]
5. (Method of Images) Let u(.), a 2 (.) be drift and diffusion coefficients defined on
[0, d] with p(0) = (d) = 0, as in (6.36). Extend p(.), a 2 (.) to R' as in (6.37) and
(6.38) (with d in place of 1). Let p denote the transition probability density of a
diffusion on W having these extended coefficients. Define

q(t; x, y):= I p(t; x, y + 2md), d < x, y < d,


m=m

P (t; x, Y)'= q(t; x, Y) q(t; x, y), 0 5 x, y < d.

(i) Prove that p is the density component of the transition probability of the diffusion
on [0, d] absorbed at the boundary {0, d}. [Hint: Check (2.30). (7.10), and (7.11)
on [0, d], as well as the initial condition: (T f)(x) -+ f (x) as t 10 for every
continuous f on [0, d] with f(0) = f(d) = 0.]
(ii) For the special case j(.) = 0, a 2 (.) = a 2 compute p.
EXERCISES 485

6. (Duhamel's Principle)
(i) (Nonhomogeneous Equation) Solve the initial-boundary-value problem

a u tazu
u(t, x) = u(x) za2(x) x), (t > 0, a <x < b);
at -x + c3x2 + h(t,

lim u(t, x) = 0, lim u(t, x) = 0, (t > 0, a -< x -< b);


xja zTb

tim u(t,x)= f(x), (a -<x-<b),


(40

where h is a bounded continuous function on [0, cc) x [a, b], and f is a bounded
continuous function on [a, b] vanishing at a, b. [Hint: Let p be as in (7.17).
Define

u0(t, x)'=
f. J
' [a,b]
h(s, y)p (t s; x, y) dy, ui(t; x) =
ft a,b]
f(y)p (t; x, v) dy.

Check that u satisfies the nonhomogeneous equation and the given boundary
conditions, but has zero initial value. Then check that u, satisfies the
homogeneous differential equation (i.e., with h = 0), the given boundary and
initial conditions.]
(ii) (Nonzero Constant Boundary Values) Solve the initial-boundary-value problem

z
u(t, x) = u(t, x), (t > 0; a <x < b);
at axz
lim u(t, x) = a, lim u(t, x) = ;
:la xTb

lim u(t, x) = f(x),


t; o

where f is a bounded continuous function on [a, b] with 1(a) = a, f (b) = .



[Hint: Let u (x) satisfy the differential equation and the boundary conditions.
Find u,(t, x) which satisfies the differential equation, zero boundary conditions,
and the initial condition: lim a u, (t, x) = f (x) u (x). Then consider
u (x) + u,(t, x).]
(iii) (Time Dependent Boundary Values) In (ii) take a = a(t), = (t) where a(t), (t)
are continuous differentiable functions on [0, x). [Hint: Let u ( t, x):=
a(t) + (((t) a(t))/(b a))(x a). Find u,(t, x) solving the nonhomogeneous
equation:

au, = _a u 1 _ a u0
1Q2 _

(3t z x z at
with zero boundary conditions and initial condition lim, u, (t, x) = ]

.f(x) uo(0, x)]


7. (Maximum Principle)
(i) Let u(t, x) be twice-differentiable with respect to x and once with respect to t on

486 BROWNIAN MOTION AND DIFFUSIONS

(0, T] x (a, b), continuous on [0, T] x [a, b], and satisfy (au/at) Au <0 on

_<
(b) at a point (t, a) or (t, b) for some t e (0, T]. [Hint: Suppose not. Let the
_<
(0, T] x (a, b), where A = o 2 (x) d 2 /dx 2 + p(x) d/dx, with a 2 (x) > 0. Show that
the maximum value of u is attained either (a) initially, i.e., at a point (0, x o ), or

maximum value be attained at a point (t o , x 0 ) with 0 < t o T, and x o e (a, b).


Then (Au)(t o , x o ) 0, so that (u/t)(t o , x o ) < 0. But this means u(t, x o ) > u(t o , x o )
for some t < t o ].
(ii) Extend (i) to the case au/at Au < 0. [Hint: Let 0 be a continuous function on
[a, b] satisfying AB = 1. For each r > 0, consider u,(t, u) = u(t, x) + eO(x).]
(iii) Let u(x) be twice-differentiable on (a, b), continuous on [a, b] and satisfy
(Au)(x) > 0 for a <x < b. Show that the maximum value of u is attained at
x = a or x = b. [Hint: First assume Au > 0; if the maximum value is attained
at x o e (a, b), then Au(x o ) < 0.]
(iv) Prove that the solutions to the initial-boundary-value problems in Exercise 6
are unique.

Exercises for Section V.8


1. Prove that (8.3) holds under Dirichlet or Neumann boundary conditions at the end

_<
points of a finite interval S.
2. Prove that
(i) if a is an eigenvalue of A (with Dirichlet or Neumann boundary conditions at
the end points of S), then a 0;
(ii) if a,, a 2 are two distinct eigenvalues and ei l 0 2 corresponding eigenfunctions,
,

then Ali, and i' 2 are orthogonal, i.e., <i/i,, i/i 2 > = 0;
(iii) if 0,, 0 2 are linearly independent, then 4,, 2 (,, 02> /II1Il 2 )O, are
orthogonal; if 0 1 , 0 2 , ... , 0, are linearly independent, and ,, ... , O k _, are
orthogonal to each other, then 0 ... , k_ 1, k ( Z,-i ;l 4,k) /11c6; II 2 )4 . ,

are orthogonal.
3. Assume that the nonzero eigenvalues of A are bounded away from zero (see
theoretical complement 2).
(i) In the case of two reflecting boundaries prove that

,im p(t; x, Y) = m(Y) (x, Y e [ 0 , d])

where m(y) is the normalization of n(y) in (8.1). Show that this convergence is
exponentially fast, uniformly in x, y.
(ii) Prove that m(y) dy is the unique invariant probability for p.
(iii) Prove (ii) without the assumption concerning eigenvalues.
4. By a change of scale in Example 3 compute the transition probability of a Brownian
motion on [a, b] with both boundaries absorbing.
5. Use Example 4, and a change of scale, to find
(i) the distribution of the minimum m, of a Brownian motion {X,} over the time
interval [0, t];
(ii) the distribution of the maximum M, of {X s } over [0, t];
(iii) the joint distribution of (X m,).

EXERCISES 487

6. Use Example 3, and change of scale, to compute the joint distribution of (X m, M,),
using the notation of Exercise 5 above.
7. For a Brownian motion with drift p and diffusion coefficient a 2 > 0, compute the
p.d.f. of T A r b when X0 = x and a <x < b.
8. Establish the identities

(na
2t) z ' Z
m
L
[
ex jl 3
= _ x 2azt
p ( - (2md +y -x) Z l
+ ex
p(
Sl
(2md +Y +x)' ll
2621 )7J
1 2
- + - ex z - cos cos
= m=1 p^- - -- --> m x /
\i \-
exp{-(2md + y Y^ - exp^-(2md + y + x)
(2na 2 t) - '/ 2
L 2a 2 1 ) 2a2t -^1
x,
2 a 2 rr 2 m 2 1 m rzy
_-- Z exp - sin ( rnnx
- si n - ) (0 x, y d).
d m = 1 2d2 d d

Use these to derive Jacobi's identity for the theta function, .

^
(z):=
m - m _
exp{ -nm z} _1
2 exp{ -nmz /z} _ --O`
1 (il
f \z / (z > 0).
^z m s, ^Z

[Hint: Compare (8.23) with (6.31), and (8.33) with Exercise 7.5(ii).]
9. (Hermite Polynomials and the Ornstein-Uhlenbeck Process) Consider the generator
A = ' d 2 /dx 2 - x d/dx on the state space W.
(i) Prove that A is symmetric on an appropriate dense subspace of L2 (!8', e - x 2 dx).
(ii) Check that A has eigenfunctions H"(x)'= ( -1)" exp {x 2 }(d "/dx") exp{ -x 2 }, the
so-called Hermite polynomials, with corresponding eigenvalues n = 0, 1, 2, ... .
(iii) Give some justification for the expansion
x t
Z c"e
p(t; x, y) = e 2 "'H"(x)H"(y), c "'=

10. According to the theory of Fourier series, the functions cos nx (n = 0, 1, 2, ...),
sin nx (n = 1, 2, ...) form a complete orthogonal system in L2 [-n, 7r]. Use this to
prove the following:
(i) The functions cos nx (n = 0, 1, 2, ...) form a complete orthogonal sequence in
L2 [0, it]. [Hint: Let f E L2 [0, n]. Make an even extension of f to [ -n, n], and
show that this f may be expanded in L2 [-n, it] in terms ofcos nx (n = 0, 1, ...).]
(ii) The functions sin x (n = 1, 2, ...) form a complete orthogonal sequence in
L2 [0, it]. [Hint: Extend f to [-it, n] bysetting,f( -x) = - f(x)forx e [ -n, 0).]

Exercises for Section V.9


1. Check (9.13).
2. Derive (9.20) by solving the appropriate two-point boundary-value problem.
488 BROWNIAN MOTION AND DIFFUSIONS

3. Suppose S = [a, b) and a is reflecting. Show that the diffusion is recurrent if and only
if s(b) = cc. If, on the other hand, s(b) < oo then one has p s,. = 1 for y> x and

s(b) s(x) _
J
x
n exp{ I(x o , z)} dz
Px r = Sib) s(Y) ----, y < x.
s
exp{ I(x o , z)} dz

[Hint: For y > x, assume the fact that the transition probability density p(t; x, y)
is strictly positive and continuous in x, y for each t> 0, to show that
b := min{P= (X o y): a -< z -< y} > 0 for t o > 0.]
4. Suppose S = [a, b) and a is absorbing. Then show that no state, other than a, is
recurrent and that

s(x) s(a)
ifa<x^y,
s(y) s(a)
Px y = I ifs(b)=oo,anda<-y<x,
s(b) s(x)
ifs(b)< cc, and a<y-<x.
s(b) s(y)

5. Suppose S = [a, b] with both boundaries reflecting. Then show that p x,, = 1 for all
x, y E S. and the diffusion is recurrent.

6. Apply Corollary 9.3 and Exercise 3 to decide which of the following diffusions are
recurrent and which are transient:
(i) S=(ac,00),(x)=p960,a z (x)=a 2 >0;
(ii) S=( cc, cc),p(x)=0,a 2 (x)=a z >0;
(iii) S = (cc, cc), (x) = x, a 2 (x) = a 2 >0 (consider separately, >0, < 0);
(iv) S = [0, cc), p(x) - p, a 2 (x) = a 2 > 0, 0 reflecting (consider separately the cases
P< 0 ,P= 0 ,u>0);
(v) S=(0,1),u(x)--* oo as x10,p(x)-+ oo asxji,a 2 (x)>a 2 >0 for all x.
7. Suppose S = [a, b] with a absorbing and b reflecting. Show that all states but a
are transient, p xy = 1 for y < x, and

s(x) s(a)
Px v Y > x.
= s(y) s(a),

8. Suppose S = [a, b] with both boundaries absorbing. Show that all states, other than
a and b, are transient and

s(b)s(x)
ayx<b,
s(b) s(y)
Pz y =
s(x) s(a)
a<xyb.
s(y)s(a)

EXERCISES 489

9. Let [c, d] c S. Solve the two-point boundary-value problem: A f (x) = 0 for c <x < d,
lim a 1 , f (x) = a, lim X1 d f (x) = , where a, are given numbers. Show that the solution
equals aP (r < < r d ) + Px (r a < t a ).
P

Exercises for Section V.10

1. (i) Assuming that the transition probability density p(t; x, y) is positive and
continuous in x, y for each t > 0, show that M(x), defined by (10.3), is bounded
on [c, d]. [Hint: Use Proposition 13.5 of Chapter I.]
(ii) Let T be as in (10.2), x e (c, d). Assume that

Px (max (XX xj>a =o(h) ash10,


0 <t-1h /

for every e > 0 (this is proved in Exercise 3.5). Show that E (1,,, h } M(Xh )) = o(h)
X

as h l 0. Check the assumption in the case of constant coefficients.


2. Show that (10.15) holds if s(b) = oo.
3. Assume s(b) = oc.
(i) Show that (10.16) is necessary and sufficient for finiteness of E x r, (c < x).
(ii) Prove (10.17), under the assumption (10.16).
4. Suppose s(a) = cc.
(i) Prove that, for x < d, E,t d < cc if and only if (10.18) holds.
(ii) Derive (10.19).
5. (i) Suppose S = [a, b) with a a reflecting boundary. Show that the diffusion is positive
recurrent if and only if s(b) = oo and m(b) < co. [Hint: Assume that the transition
probability density p(t; x, y) is positive and continuous in x, y for each t > 0, and
proceed as in Exercise 9.3, or use Proposition 13.5 of Chapter I.] Show that
(10.19) holds in case of positive recurrence.
(ii) Suppose S = [a, b] with a and b reflecting boundaries. Show that the diffusion
is positive recurrent. Show that (10.17) and (10.19) hold.
6. Apply Proposition 10.2 and Exercise 5 above (as well as Exercise 9.6) to classify the
diffusions in Exercise 9.6 into transient, null recurrent and positive recurrent ones.
7. Let [c, d] ⊂ S. Solve the two-point boundary-value problem Af(x) = −g(x) for
c < x < d, lim_{x↓c} f(x) = α, lim_{x↑d} f(x) = β, where g is a given bounded measurable
(or, continuous) function on [c, d] and α, β are given constants. Show that the solution
represents

f(x) = E_x ∫_0^{τ_c ∧ τ_d} g(X_s) ds + αP_x(τ_c < τ_d) + βP_x(τ_d < τ_c).

8. Show that, under natural scale (i.e., s(x) = x), m₁′(x) ≤ m₂′(x) for all x implies
M₁(x) ≤ M₂(x) for all x.

Exercises for Section V.11


1. Prove that τ_y and τ_c ∧ τ_d are stopping times, by showing that {τ_y ≤ t} and
{τ_c ∧ τ_d ≤ t} belong to F_t for each t ≥ 0.
2. Prove that τ₁ ∧ τ₂ is a stopping time, if τ₁ and τ₂ are.
3. (i) Suppose that

E(Z h(X_{τ+t₁}, ..., X_{τ+t_m}) 1_{{τ<∞}}) = E(Z [E_{X_τ} h(X_{t₁}, ..., X_{t_m})] 1_{{τ<∞}})

for all Z of the form (11.7), m being arbitrary and h bounded continuous. Prove
that (11.12) holds. [Hint: Use (a) the fact that bounded continuous functions
on S^m are dense in L¹(S^m, μ), where μ is any probability measure, and (b) Dynkin's
Pi-Lambda Theorem (Section 4 of Chapter 0).]
(ii) Prove that if (11.12) holds for arbitrary finite sets t₁ < t₂ < ··· < t_m and bounded
continuous h, then the conditional distribution of X_{τ+·} given F_τ is P_{X_τ}, on the
set {τ < ∞}.

4. Prove that φ(y) in (11.18) is continuous for every bounded continuous h on S^m, if
x ↦ p(t; x, dy) is weakly continuous (see Section 5 of Chapter 0 for the definition
of weak convergence). This weak continuity, also called the (weak) Feller property,
is proved in Section 3 of Chapter VII.
5. Let {X_t} be a Brownian motion with drift μ and diffusion coefficient σ² > 0, starting
at x.
(i) Prove that the conditional distribution of {X_u: 0 ≤ u ≤ t}, given X_t, does not
depend on μ. [Hint: Look at finite-dimensional conditional distributions.]
(ii) Use (i) and (11.42) to compute the joint p.d.f. of (X_t, M_t), where
M_t := max{X_u: 0 ≤ u ≤ t}. (A driftless special case is recorded after this exercise.)
(iii) Use (ii) to compute the distributions of (a) M_t and (b) τ_a (a > x).
(iv) Check that the conditional distribution in (i), given X_t = y, is the same as the
distribution of the process

{B_u − (u/t)(B_t − (y − x)) + x: 0 ≤ u ≤ t},

where {B_u: u ≥ 0} is a Brownian motion with zero drift and diffusion coefficient
σ², starting at zero.
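As a point of reference for (ii), in the driftless standard case (μ = 0, σ² = 1, x = 0) the
reflection principle gives P₀(M_t ≥ a, X_t ≤ y) = P₀(X_t ≥ 2a − y) for y ≤ a, a > 0, so that
the joint p.d.f. of (X_t, M_t) is

f(y, a) = (2(2a − y)/√(2πt³)) exp{−(2a − y)²/2t}, y ≤ a, a > 0;

the general case then follows from (i) and (11.42).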
6. Assume the result of Example 8.3, for the case μ = 0. Show how you can use Exercise
5(i) to derive the joint distribution of (X_t, m_t, M_t), where {X_t}, M_t are as in Exercise
5(i), (ii), and m_t := min{X_u: 0 ≤ u ≤ t}.
7. Let {X_t} be a Brownian motion with drift zero and diffusion coefficient σ² > 0,
starting at x. Let τ := τ_c ∧ τ_d, where c < x < d. Show that the after-τ process X_τ⁺ is
not independent of the pre-τ sigmafield F_τ.
8. Let {X_t} be a Brownian motion with drift μ > 0 and diffusion coefficient σ² > 0,
starting at zero.
(i) Prove that {τ_a: a ≥ 0} is a process with independent and homogeneous
increments.
(ii) Find the distribution of τ_a.
(iii) What can you say concerning (i), (ii), if μ < 0?


9. Show that the proof of the strong Markov property (Theorem 11.1) essentially
applies (indeed, more simply) to the countable state space case (see Section 5 of
Chapter IV).
10. (i) Use Exercise 5(ii) to derive the transition probability density (component) of
a Brownian motion on (−∞, 0] with absorption at 0.
(ii) Derive the corresponding transition density on [0, ∞) with absorption at 0.
(iii) Use Exercise 6 to derive the transition probability density (component) of a
Brownian motion on [0, 1] with absorption at both ends.
11. (Method of Images) Consider drift and diffusion coefficients μ(·), σ²(·) defined
on (−∞, 0]. Assume μ(0) = 0. Extend μ(·), σ²(·) to R¹ as in (6.8) by setting
μ(−x) = −μ(x), σ²(−x) = σ²(x) for x < 0. Let {X_t} be a diffusion on R¹ with these
coefficients, starting at x < 0, and let τ₀ denote the first passage time to 0. Let {Y_t}
be the process defined by

Y_t = X_t for t < τ₀, Y_t = −X_t for t ≥ τ₀.

(i) Show that {Y_t} has the same distribution as {X_t}.
(ii) Show that P_x(M_t ≥ 0) = 2P_x(X_t > 0), where M_t := max{X_s: 0 ≤ s ≤ t}. [Hint:
Show that (11.34)–(11.36) hold with 0 replaced by x and a replaced by 0.]
(iii) Show that P_x(X_t ≤ y, τ₀ ≤ t) = P_x(X_t ≥ −y) for all y ≤ 0. [Hint: Use the analog
of (11.35).]
(iv) Deduce from (iii) that

P_x(X_t ≤ y, τ₀ > t) = P_x(X_t ≤ y) − P_x(X_t ≥ −y) for all x < 0, y ≤ 0.

(A special case of (iv) is worked out following this exercise.)
(v) Derive the analog of (iv) for a diffusion on [0, ∞).
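For a concrete check of (iv), specialize to a standard Brownian motion (μ ≡ 0, σ² ≡ 1)
starting at x < 0. Differentiating (iv) in y yields the image-source density of the process
absorbed at 0,

P_x(X_t ∈ dy, τ₀ > t) = {φ_t(y − x) − φ_t(y + x)} dy, y < 0, where φ_t(z) := (2πt)^{−1/2} exp{−z²/2t};

this is the density asked for in Exercise 10(i).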

Exercises for Section V.12


1. Use the strong Markov property to prove (12.10) assuming that the diffusion is
recurrent and that (12.11) holds.
2. Assume that the diffusion is positive recurrent.
(i) Prove that (12.14) defines a probability measure π(dx), i.e., prove countable
additivity. [Hint: Use the Monotone Convergence Theorem.]
(ii) Prove (12.15) for all f satisfying (12.11).
3. Prove the uniqueness of the invariant probability in Theorem 12.2.
4. (i) Prove (12.24).
(ii) Use the strong Markov property to prove (12.25).
5. Under what assumptions on p (and ∂p/∂t) can you justify the equalities in (12.27)?
6. (i) Prove that the diffusion in Example 1 is positive recurrent.
(ii) Let −∞ < a < b < ∞, and suppose that a diffusion on (a, b) is recurrent. Prove
that it cannot reach either a or b with positive probability. [Hint: The time to
reach {a, b} is larger than the time defined in (12.2), for every r.]

7. Prove that the diffusion in Example 2 is positive recurrent under the assumptions
(12.35).

8. Find the invariant distributions of the following diffusions (a sketch for case (i) follows
this list):
(i) S = (−∞, ∞), μ(x) = γx (γ < 0), σ²(x) = σ²;
(ii) S = (0, 1], μ(x) = (k − 1)/x (k > 1), σ²(x) = σ², "1" a reflecting boundary;
(iii) S = [0, ∞), μ(x) = −γx (γ > 0), σ²(x) = σ², "0" a reflecting boundary.
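As a sketch of the expected answer in case (i), recall that in the positive recurrent case the
invariant density is proportional to the speed density (2/σ²(x)) exp{I(x₀, x)}. Taking x₀ = 0
this gives

π(x) = C exp{γx²/σ²}, C^{−1} = ∫_{−∞}^∞ exp{γz²/σ²} dz,

a Gaussian density with mean 0 and variance σ²/(2|γ|) (recall γ < 0); cases (ii) and (iii)
may be treated in the same way.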

Exercises for Section V.13

1. Give a proof of Theorem 13.1 similar to the proof of Proposition 10.1 in Chapter II.
(Also, see the proof of Proposition 12.1.)
2. Consider a diffusion {X_t} on S = (0, 1], with "1" a reflecting boundary and coefficients
μ(x) = (k − 1)/x (k > 1), σ²(x) ≡ σ². Apply Theorem 13.1 to compute the asymptotic
distribution of

t^{−1/2} ∫_0^t (f(X_s) − E_π f) ds as t ↑ ∞,

where f(x) = 1 − x². What is E_π f? [Hint: Remember the boundary condition for A.]
(A direct computation of E_π f is sketched below.)
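For the computation of E_π f, note that here the invariant density is proportional to the
speed density exp{I(x₀, x)} ∝ x^a on (0, 1], with a := 2(k − 1)/σ², so that π(x) = (a + 1)x^a
and

E_π f = ∫_0^1 (1 − x²)(a + 1)x^a dx = 1 − (a + 1)/(a + 3) = 2/(a + 3).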

Exercises for Section V.14


1. Let {X_t^{(i)}} (i = 1, 2, ..., k) be k independent diffusions on R¹ with drift coefficients
μ^{(i)}(·) and diffusion coefficients σ_i²(·).
(i) Show that {X_t := (X_t^{(1)}, ..., X_t^{(k)})} is a Markov process on R^k, and express its
transition probability density in terms of those of {X_t^{(i)}}. [Hint: Look at
P({X_{s+t} ∈ R} ∩ F), where R is a k-dimensional rectangle (a₁, b₁) × ··· × (a_k, b_k),
and F = F₁ ∩ F₂ ∩ ··· ∩ F_k with F_i determined by {X_u^{(i)}: 0 ≤ u ≤ s} (1 ≤ i ≤ k).]
(ii) Show that the transition probability density p(t; x, y) of {X_t} satisfies
Kolmogorov's backward and forward equations

∂p/∂t = (1/2) Σ_{i=1}^k σ_i²(x^{(i)}) ∂²p/∂(x^{(i)})² + Σ_{i=1}^k μ^{(i)}(x^{(i)}) ∂p/∂x^{(i)},

∂p/∂t = (1/2) Σ_{i=1}^k ∂²(σ_i²(y^{(i)}) p)/∂(y^{(i)})² − Σ_{i=1}^k ∂(μ^{(i)}(y^{(i)}) p)/∂y^{(i)}.

(iii) Let μ(x) = −γx, D(x) = σ²I, with γ > 0, σ² > 0. Write down its transition
probability density p(t; x, y), and check that it satisfies the Kolmogorov
equations. Show that p(t; x, y) converges to a Gaussian density as t → ∞, and deduce
that the diffusion has a unique invariant distribution. (The explicit form of p is
recorded below.)
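For (iii), each coordinate is an Ornstein-Uhlenbeck process, and one candidate form to
verify is

p(t; x, y) = Π_{i=1}^k (2πv_t)^{−1/2} exp{−(y^{(i)} − x^{(i)} e^{−γt})²/2v_t}, v_t := σ²(1 − e^{−2γt})/(2γ),

which converges, as t → ∞, to the Gaussian density with mean zero and covariance matrix
(σ²/2γ)I, the unique invariant distribution.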
2. Write down the analogs of (1.2), (1.2)' for a multidimensional diffusion.
3. (i) Show that a k-dimensional Brownian motion with drift μ = (μ^{(1)}, ..., μ^{(k)}) and
diffusion matrix D = ((d_{ij})) is a process with independent increments, and that
its transition probability density p(t; x, y) satisfies Kolmogorov's equations
(14.6).
(ii) More generally, let φ be a one-to-one twice continuously differentiable map of
R^k onto an open subset of R^k. If {X_t} is a diffusion on R^k with generator A given
by (14.9), compute the generator of {Y_t := φ(X_t)}.
4. Check that the proof of Theorem 11.1 (strong Markov property) remains unchanged
if the state space is taken to be R^k or a subset of R^k.
5. Let {B_t} be a k-dimensional standard Brownian motion, k ≥ 3, starting at some
x ∈ R^k.
(i) Let d > |x|. Prove that P(|B_t| > d for all sufficiently large t) = 1. [Hint: Fix d₁ > d.
(a) P({B_t} ever reaches {z: |z| = d₁}) = 1, by the recurrence of one-dimensional
Brownian motion.
(b) Write

τ₁ := inf{t > 0: |B_t| = d₁}, τ₂ := inf{t > τ₁: |B_t| = d},

and, recursively, τ_{2n+1} := inf{t > τ_{2n}: |B_t| = d₁}, τ_{2n} := inf{t > τ_{2n−1}: |B_t| = d}.

By the strong Markov property and (14.26), P(τ_{2n} < ∞) = (d/d₁)^{(k−2)n}, and
P(τ_{2n+1} < ∞ | τ_{2n} < ∞) = 1.
(c) By (b), P(τ_{2n} = ∞ for some n) = 1.]
(ii) From (i) conclude that

P(|B_t| → ∞ as t → ∞) = 1.

(The source of the factor (d/d₁)^{k−2} in step (b) is recalled below.)
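The factor (d/d₁)^{(k−2)n} comes from the scale computation for the radial process: for
k ≥ 3 a scale function of {|B_t|} is s(r) = −r^{−(k−2)}, so that, starting from |z| = d₁ and
letting the outer barrier d′ ↑ ∞,

P(τ_d < τ_{d′}) = (s(d′) − s(d₁))/(s(d′) − s(d)) → (d/d₁)^{k−2},

and the n successive down-crossings multiply, by the strong Markov property.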

6. Let {B_t} be a two-dimensional standard Brownian motion. Let B = {z: |z − z₀| < ε}
with z₀ and ε > 0 arbitrary. Prove that

P(sup{t ≥ 0: B_t ∈ B} = ∞) = 1.

7. Let {X_t} be a k-dimensional diffusion with periodic drift and diffusion coefficients,
the period being d > 0 in each coordinate. Write Z_t^{(i)} := X_t^{(i)} (mod d),
Z_t := (Z_t^{(1)}, ..., Z_t^{(k)}).
(i) Use Proposition 14.1 to show that {Z_t} is a Markov process on the torus
T = [0, d)^k, which may be regarded as a Cartesian product of k circles.
(ii) Assuming that the transition probability density p(t; x, y) of {X_t} is continuous
and positive for all t > 0, x, y, prove that the transition probability q(t; z, z′) dz′
of {Z_t} admits a unique invariant probability π(z′) dz′ and that

max_{z,z′∈T} |q(t; z, z′) − π(z′)| ≤ [1 − δd^k]^{[t]−1} θ,

where [t] is the integer part of t, δ := min{q(1; z, z′): z, z′ ∈ T}, and
θ := max{|q(1; y, z′) − q(1; z, z′)|: y, z, z′ ∈ T}. [Hint: Recall Exercise 6.9 of Chapter II.]


8. Let {X_t} be a k-dimensional diffusion, k ≥ 2.
(i) Use Proposition 14.1 and computations such as (14.18) to find a necessary and
sufficient condition (in terms of the coefficients of {X_t}) for {R_t := |X_t|} to be a
Markov process. Compute the generator of {R_t} if this condition is met.
(ii) Assume that the sufficient condition in (i) is satisfied. Find necessary and sufficient
conditions for {X_t} to be (a) recurrent, (b) transient.
9. Derive (14.14).
10. (i) Sketch an argument, similar to that in Section 2, to show that Corollary 2.4
holds for diffusions on R^k (under conditions (1)–(4) following Definition 14.1).
(ii) Use (i) to prove

P_x(max_{0≤t≤h} |X_t − x| > ε) = o(h) as h ↓ 0.

[Hint: See Exercise 3.5.]

Exercises for Section V.15


1. Prove that the process {X,} defined by (15.1) is Markovian and has the transition
probability (15.3). [Hint: Mimic the proof of Theorem 7.1.]
2. Specialize (15.23) to the case k = 1 to compute P_x({X_t} reaches c before d), where
{X_t} is a one-dimensional diffusion and c ≤ x ≤ d. Check that this agrees with (9.19).
3. Apply (15.23) to compute P_x({B_t} reaches ∂B(0:c) before ∂B(0:d)), where {B_t} is a
k-dimensional standard Brownian motion, k > 1, ∂B(0:r) := {y ∈ R^k: |y| = r}, and
c ≤ |x| ≤ d. Check that this computation agrees with (14.23). (The resulting formula
is recorded below.)
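For reference, the formula to be recovered in Exercise 3 is

P_x(reach ∂B(0:c) before ∂B(0:d)) = (|x|^{2−k} − d^{2−k})/(c^{2−k} − d^{2−k}) for k ≥ 3,

and (ln d − ln |x|)/(ln d − ln c) for k = 2.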
4. Use Exercise 14.8(i) and (15.23) to compute

P_x({X_t} reaches ∂B(0:c) before ∂B(0:d)), c ≤ |x| ≤ d,

for a k-dimensional diffusion (k > 1) {X_t} such that the radial process {R_t := |X_t|} is
Markovian.
5. Consider a standard k-dimensional Brownian motion (k > 1) on G = {x ∈ R^k: x^{(i)} > 0
for 1 ≤ i ≤ k} with absorption at the boundary.
(i) Compute p(t; x, y) for t > 0, x, y ∈ G.
(ii) Compute the distribution of τ_{∂G}, if the process starts at x ∈ G.
*(iii) Compute the hitting (or, harmonic) measure ψ(x, dy) on ∂G, for x ∈ G. (ψ(x, dy)
is the P_x-distribution of X_{τ_{∂G}}.)
*(iv) Compute the transition probability p(t; x, B) for x ∈ G, B ⊂ G.
6. Do the analog of Exercise 5 for G = {x ∈ R^k: 0 ≤ x^{(i)} ≤ 1 for 1 ≤ i ≤ k}.

7. Prove (15.27).

8. For G in Example 1 and Exercises 5 and 6, show that P_x(τ_{∂G} ≤ t) = o(t) as t ↓ 0 if
x ∈ G.

9. Show that the function ψ_f in (15.28) satisfies (15.23) with

A = (1/2)Δ, Δ := Σ_{i=1}^k ∂²/∂(x^{(i)})².


10. Let G be an open subset of R^k. A Borel measurable function u on G, which is
bounded on compacts, is said to have the mean value property in G if
u(x) = ∫_{S^{k−1}} u(x + rθ) dθ, for all x and r > 0 such that the closure of the ball B(x:r)
with center x and radius r is contained in G. Here S^{k−1} is the unit sphere {|y| = 1}
and dθ denotes integration with respect to the uniform (i.e., rotation invariant)
distribution on S^{k−1}.
(i) Let {B_t} be a standard Brownian motion on R^k (k > 1) and φ a bounded Borel
measurable function on ∂G. Suppose P_x(τ_{∂G} < ∞) = 1 for all x ∈ G, where P_x
denotes the distribution of {B_t} starting at x. Show that u(x) := E_x φ(B_{τ_{∂G}}) has
the mean value property in G. [Hint: u(x) = E_x u(B_{τ_{∂B(x:r)}}) if {y: |y − x| ≤ r} ⊂ G,
by the strong Markov property. Next note that the P₀-distribution of {B_t} is
the same as that of {OB_t} for every orthogonal transformation O.]
(ii) Show that if u has the mean value property in G, then u is infinitely differentiable.
[Hint: Fix x ∈ G. Let B(x:2r) ⊂ G, and let ψ be an infinitely differentiable
radially symmetric p.d.f. vanishing outside B(0:r). Let ũ := u on B(x:2r) and
zero outside. Show that u = ũ * ψ on B(x:r).]
(iii) Show that if u has the mean value property in G, then it is harmonic in G, i.e.,
Δu = 0 in G. [Hint:

u(x) = u(y) + Σ_i (x^{(i)} − y^{(i)}) ∂u(y)/∂x^{(i)}
+ (1/2) Σ_{i,j} (x^{(i)} − y^{(i)})(x^{(j)} − y^{(j)}) ∂²u(y)/∂x^{(i)}∂x^{(j)} + O(ρ³),

where ρ = |x − y|. Integrate with respect to the uniform distribution on
{x: |x − y| = ρ}. Show that the integral of the first sum is zero, that of the
second sum is (ρ²/2k)Δu(y), and that of the remainder is O(ρ³), so that
(ρ²/2k)Δu(y) + O(ρ³) is zero for small ρ.]
(iv) Show that a harmonic function in G has the mean value property. [Hint: Use
the divergence theorem.]
11. Check that
(i) the function ψ_f in (15.28) is harmonic,
(ii) ∫_{∂G} ψ(x; y) s(dy) = 1 for x ∈ G, and
(iii) ψ(x, y) s(dy) converges weakly to δ_{y₀}(dy) as x → y₀ ∈ ∂G.

12. (Maximum Principle) Let u be harmonic in a connected open set G ⊂ R^k. Show
that u cannot attain its infimum or supremum in G, unless u is constant in G. [Hint:
Use the mean value property.]
13. (Dirichlet Problem) Let G be a bounded, connected, open subset of R^k. Given a
continuous function φ on ∂G, show, using Exercises 10 and 12, that
(i) u(x) := E_x φ(B_{τ_{∂G}}) is a solution of the Dirichlet problem

Δu(x) = 0 for x ∈ G, u(x) = φ(x) for x ∈ ∂G,

and
(ii) if u is continuous at the boundary, i.e., φ(y) = lim u(x) as x(∈ G) → y ∈ ∂G, then
this solution is unique in the class of solutions that are continuous on the closure of G.


14. (Poisson's Equation) Let G be as in Exercise 13. Give at least an informal proof,
akin to that of (15.25), that

u(x) := E_x ∫_0^{τ_{∂G}} g(X_s) ds + E_x φ(X_{τ_{∂G}})

satisfies Poisson's equation

(1/2)Δu(x) = −g(x) for x ∈ G, u(x) = φ(x) for x ∈ ∂G.

Here g, φ are given continuous functions on the closure of G and on ∂G, respectively.
Assuming u is continuous at the boundary, show that this solution of Poisson's equation
is unique in the class of all continuous solutions on the closure of G.
15. Extend the function F in (15.37) to a twice continuously differentiable function on
[0, ∞) vanishing outside [c₁, d₁], where 0 < c₁ < c < d < d₁. Show that the
corresponding extension of f (in (15.37)) is twice continuously differentiable on R^k
and vanishes outside a compact set.
16. Prove that
(i) ∫_c^∞ exp{−I(u)} du diverges for some c > 0 if and only if it diverges for all c > 0;
(ii) ∫_c^∞ exp{−I(u)} du converges for some c > 0 if and only if it converges for all c > 0.
17. Prove (15.61) using (15.60).
18. Write out the details of the proof of Theorem 15.3(b). [Hint: Follow the steps of
Exercise 14.5(i), replacing step (a) by (15.52) with d replaced by d₁.]
19. Consider the translation x ↦ x + z (for a given z), and the diffusion {Y_t := X_t + z}.
(i) Write down the drift and diffusion coefficients of {Y_t}.
(ii) Show that {X_t} is recurrent (or transient) if and only if {Y_t} is recurrent
(transient).
(iii) Write down (15.46) and (15.47) for {Y_t}, in terms of the coefficients μ^{(i)}(·), d_{ij}(·).
20. Let {X_t} be a diffusion on R^k. Assume that
(a) ((d_{ij}(x))) ≡ σ²I, where σ² > 0 and I is the k × k identity matrix, and
(b) Σ_{i=1}^k |x^{(i)} μ^{(i)}(x + z)| ≤ δ for |x| ≥ M, where z ∈ R^k, δ > 0, M > 0 are given.
Apply Theorem 15.3 (and Exercise 19) to decide whether the diffusion is recurrent
or transient.
21. For a diffusion having a positive and continuous (in x, y) transition density p(t; x, y)
(t > 0), prove that the P_x-distribution of τ_{∂B(0:d)} has a finite m.g.f. in a neighborhood
of zero, if |x| < d.

Exercises for Section V.16


1. Compute the transition probability density of a standard Brownian motion on
G = {x ∈ R^k: x^{(i)} ≥ 0 for 1 ≤ i ≤ k} with (normal) reflection at the boundary (k > 1).
(A candidate formula by the method of images is sketched below.)
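One natural route is the method of images: reflecting each coordinate independently
suggests the candidate density

q(t; x, y) = Π_{i=1}^k {φ_t(y^{(i)} − x^{(i)}) + φ_t(y^{(i)} + x^{(i)})}, x, y ∈ G, φ_t(z) := (2πt)^{−1/2} exp{−z²/2t},

for which one checks the normal (Neumann) boundary condition ∂q/∂y^{(i)} = 0 at y^{(i)} = 0.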

2. (i) Compute the transition probability density q of a standard Brownian motion on
G = {x ∈ R^k: 0 ≤ x^{(i)} ≤ 1 for 1 ≤ i ≤ k} with (normal) reflection at the boundary.
(ii) Prove that

max_{x,y∈G} |q(t; x, y) − 1| ≤ c₁ e^{−c₂ t} (t > 0),

for some positive constants c₁, c₂.

3. Prove that {Z_t} defined by (16.27) is a Markov process on H = {x ∈ R^k: γ·x ≥ 0}.
4. Check (16.30).
5. Check that the function v(t, |x|) given by the solution of (16.42) (with r = |x|) satisfies
(16.41).

THEORETICAL COMPLEMENTS

Theoretical Complements to Section V.1


1. A construction of diffusions on S = R¹ by the method of stochastic differential
equations is given in Chapter VII (Theorem 2.2), under the assumption that μ(·), σ(·)
are Lipschitzian. If it is also assumed that σ(x) is nonzero for all x, then it follows
from K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample
Paths, Springer-Verlag, New York, pp. 149-158, that the transition probability
p(t; x, dy) has a positive density p(t; x, y) for t > 0, which is once continuously
differentiable in t and twice in x. Most of the main results of Chapter V have been
proved without assuming existence of a smooth transition probability density. Ito
and McKean, loc. cit., contains a comprehensive account of one-dimensional
diffusions.

Theoretical Complements to Section V.2


1. For the Markov semigroup {T_t} defined by (2.1), in the case p(t; x, dy) is the transition
probability of a diffusion {X_t}, it may be shown that every twice continuously
differentiable function f vanishing outside a compact subset of the state space belongs
to 𝒟_A. A proof of this is sketched in Exercise 3.12 of Chapter VII. As a consequence,
by Theorem 2.3, f(X_t) − ∫_0^t Af(X_s) ds is a martingale. More generally, it is proved
in Chapter VII, Corollary 3.2, that this martingale property holds for every twice
continuously differentiable f such that f, f′, f″ are bounded. Many important
properties of diffusions may be deduced from this martingale property (see Sections
3, 4 of Chapter VII).
2. (Progressive Measurability) The assumption of right continuity of t ↦ X_t(ω) for
every ω ∈ Ω in Theorem 2.3 ensures that {X_t} is progressively measurable with respect
to {F_t}; that is, for each t > 0 the map (s, ω) ↦ X_s(ω) on [0, t] × Ω into S is measurable
with respect to the product sigmafield B([0, t]) ⊗ F_t (on [0, t] × Ω) and the Borel
sigmafield B(S) on S. Before turning to the proof, note that as a consequence of the
progressive measurability of {X_t} the integral ∫_0^t Af(X_s) ds is well defined, is
F_t-measurable, and has a finite expectation, by Fubini's Theorem.

Lemma 1. Let (Ω, F) be a measurable space. Suppose S is a metric space and
s ↦ X(s, ω) is right continuous on [0, ∞) into S, for every ω ∈ Ω. Let {F_t} be an
increasing family of sub-sigmafields of F such that X_t is F_t-measurable for every
t ≥ 0. Then {X_t} is progressively measurable with respect to {F_t}.

Proof. Fix t > 0. Let f be a real-valued bounded continuous function on S. Consider
the function (s, ω) ↦ f(X_s(ω)) on [0, t] × Ω. Define

g_n(s, ω) := f(X_{2^{−n}t}(ω)) 1_{[0, 2^{−n}t]}(s) + Σ_{i=2}^{2^n} f(X_{i2^{−n}t}(ω)) 1_{((i−1)2^{−n}t, i2^{−n}t]}(s). (T.2.1)

As each summand in (T.2.1) is the product of an F_t-measurable function of ω and a
B([0, t])-measurable function of s, g_n is B([0, t]) ⊗ F_t-measurable. Also, for each
(s, ω) ∈ [0, t] × Ω, g_n(s, ω) → f(X_s(ω)), by right continuity. Therefore, (s, ω) ↦ f(X_s(ω))
is B([0, t]) ⊗ F_t-measurable. Since the indicator function of a closed set F ⊂ S may be
expressed as a pointwise limit of continuous functions, it follows that (s, ω) ↦ 1_F(X_s(ω)) is
B([0, t]) ⊗ F_t-measurable. Finally, C := {B ∈ B(S): (s, ω) ↦ 1_B(X_s(ω)) is B([0, t]) ⊗ F_t-
measurable on [0, t] × Ω} equals B(S), by Dynkin's Pi-Lambda Theorem. ∎

Another significant consequence of progressive measurability is the following result,


which makes the statement of the strong Markov property of continuous-parameter
Markov processes complete (see Chapter IV, Proposition 5.2; Chapter V, Theorem
11.1; and Exercise 11.9).

Lemma 2. Under the hypothesis of Lemma 1, (a) every {F_t}-stopping time τ is
F_τ-measurable, where F_τ := {A ∈ F: A ∩ {τ ≤ t} ∈ F_t for all t ≥ 0}, and (b) X_τ 1_{{τ<∞}} is
F_τ-measurable.

Proof. The first part is obvious. For part (b), fix t > 0. On the set Ω_t := {τ ≤ t}, X_τ
is the composition of the maps (i) ω ↦ (τ(ω), ω) (on Ω_t into [0, t] × Ω_t) and (ii)
(s, ω) ↦ X_s(ω) (on [0, t] × Ω_t into S).
Let F_t ∩ Ω_t := {A ∩ Ω_t: A ∈ F_t} = {A ∈ F_t: A ⊂ Ω_t} be the trace of the sigmafield F_t
on Ω_t. If r ∈ [0, t] and A ∈ F_t ∩ Ω_t, one has

{ω ∈ Ω_t: (τ(ω), ω) ∈ [0, r] × A} = {τ ≤ r} ∩ A ∈ F_t ∩ Ω_t.

Thus the map (i) is measurable with respect to F_t ∩ Ω_t (on the domain Ω_t) and
B([0, t]) ⊗ (F_t ∩ Ω_t) (on the range [0, t] × Ω_t).
Next, the map (s, ω) ↦ X_s(ω) is measurable with respect to B([0, t]) ⊗ F_t (on
[0, t] × Ω) and B(S) (on S), in view of the progressive measurability of {X_s}. Since
[0, t] × Ω_t ∈ B([0, t]) ⊗ F_t, and the restriction of a measurable function to a
measurable set is measurable with respect to the trace sigmafield, the map (ii) is
B([0, t]) ⊗ (F_t ∩ Ω_t)-measurable. Hence ω ↦ X_{τ(ω)}(ω) is F_t ∩ Ω_t-measurable on Ω_t.
Therefore, for every B ∈ B(S),

{ω ∈ Ω: X_{τ(ω)}(ω) ∈ B} ∩ Ω_t = {ω ∈ Ω_t: X_{τ(ω)}(ω) ∈ B} ∈ F_t. ∎

3. (Semigroup Theory and Feller's Construction of One-Dimensional Diffusions) In a
series of articles in the 1950s, beginning with "The Parabolic Differential Equation
and the Associated Semigroups of Transformations," Ann. Math., 55 (1952), pp.
468-519, W. Feller constructed all nonsingular Markov processes on an interval S
having continuous sample paths. Nonsingularity here means that, starting from any
point in the interior of S, the process can move to the right, as well as to the left,
with positive probability. If S = (a, b), and a, b cannot be reached from the interior,
then such a process is completely characterized by a strictly increasing continuous
scale function s(x) and a strictly increasing right-continuous speed function m(x). The
role of s(x) is the determination of the probability ψ(x) of reaching d before c,
starting from x, where a < c < x < d < b (for all c, x, d), by the relation
ψ(x) = (s(x) − s(c))/(s(d) − s(c)). Then the speed function determines the expected
time M(x) of reaching c or d, starting from x, by the relation

(d/dm(x))(d/ds(x)) M(x) = −1, c < x < d, M(c) = M(d) = 0.

The infinitesimal generator of this process is A = (d/dm(x))(d/ds(x)). It may be noted
that, given the functions ψ, M, the scale function s(·) is determined up to an additive
constant, and the speed function m(·) is determined up to the addition of an affine
linear function of s(·). One may, therefore, fix x₀ ∈ (a, b) and let s(x₀) = 0, m(x₀) = 0.
A necessary and sufficient condition that the boundary point a cannot be reached,
that is, for the inaccessibility of a, is

∫_a^{x₀} m(x) ds(x) = −∞, (T.2.2)

and that for the inaccessibility of b is

∫_{x₀}^b m(x) ds(x) = ∞. (T.2.3)

For the class of diffusions considered in this book (see (9.9), (9.10), (9.13)),

s(x) = ∫_{x₀}^x exp{−I(x₀, z)} dz, m(x) = ∫_{x₀}^x (2/σ²(z)) exp{I(x₀, z)} dz,

I(x₀, z) = ∫_{x₀}^z (2μ(y)/σ²(y)) dy.
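For instance, for constant coefficients μ(x) ≡ μ ≠ 0, σ²(x) ≡ σ², these formulas give
I(x₀, z) = 2μ(z − x₀)/σ² and

s(x) = (σ²/2μ)(1 − e^{−2μ(x−x₀)/σ²}), m(x) = (1/μ)(e^{2μ(x−x₀)/σ²} − 1),

and one may verify (T.2.2), (T.2.3) directly: on S = (−∞, ∞) both boundaries are
inaccessible for every μ.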

If any boundary point is accessible, a boundary condition has to be specified at that
point, in order to indicate what the Markov process does on reaching this point.
This is discussed in theoretical complement 7.1. A very readable account of this
characterization of nonsingular Markov processes on S having continuous sample
paths is given in D. Freedman (1971), Brownian Motion and Diffusion, Holden-Day,
San Francisco, pp. 102-138. See Theorem T.3.1 of Chapter VII for (T.2.2), (T.2.3).
The second, and more important, part of Feller's theory is the construction of
Markov processes on S, given a pair of scale and speed functions s and m. This is
carried out by the method of semigroups. The rest of this subsection is devoted to a
description of this method.

In general semigroup theory one considers a family of bounded linear operators
{T_t: t > 0} on a Banach space B with norm ||·||, satisfying T_{t+s} = T_t T_s. It will be
assumed that ||T_t f − f|| → 0 as t ↓ 0, for every f ∈ B. Since the set B₀ of all f ∈ B
for which this convergence holds is itself a Banach space, i.e., a closed linear subset
of B, this assumption simply means that we restrict T_t to B₀.
It follows from the semigroup property that {T_t: t > 0} is determined by
{T_t: 0 < t ≤ ε} for every ε > 0. Indeed, the semigroup is determined by its infinitesimal
generator A := (d/dt)T_t at t = 0. To be precise, let 𝒟 = 𝒟_A denote the set of all f ∈ B such
that (T_t f − f)/t converges in norm to some g ∈ B, as t ↓ 0. For f ∈ 𝒟_A, define Af to
be lim (T_t f − f)/t as t ↓ 0. We will also assume that ||T_t|| ≤ 1, i.e., {T_t} is a contraction
semigroup. This is not a serious restriction since, given an arbitrary semigroup {T_t},
the semigroup exp{−ct}T_t (t > 0) defines a contraction semigroup if c ≥ log ||T₁||.
For f ∈ 𝒟_A,

||(T_{t+s} f − T_t f)/s − T_t Af|| ≤ ||(T_s f − f)/s − Af|| → 0

as s ↓ 0. Therefore, T_t f ∈ 𝒟_A for all t > 0 if f ∈ 𝒟_A, and in this case AT_t f = T_t Af.
That is, T_t and A commute on 𝒟_A. By an argument entirely analogous to that used
for proving the fundamental theorem of calculus,

T_t f − f = ∫_0^t (d/ds)(T_s f) ds = ∫_0^t AT_s f ds = ∫_0^t T_s Af ds.

In particular, the function t ↦ T_t f on (0, ∞) solves the initial-value problem: for
f ∈ 𝒟_A, solve

(d/dt) u(t) = Au(t) (t > 0), lim_{t↓0} u(t) = f.

The solution t ↦ T_t f is also unique. To see this, let u(t) be a solution. Fix t > 0 and
consider the function s ↦ T_{t−s} u(s) (0 ≤ s ≤ t). Now,

(d/ds) T_{t−s} u(s) = lim_{h↓0} (1/h){(T_{t−s−h} u(s+h) − T_{t−s} u(s+h)) + (T_{t−s} u(s+h) − T_{t−s} u(s))}
= lim_{h↓0} {−(1/h) ∫_{t−s−h}^{t−s} T_{s′} Au(s+h) ds′} + T_{t−s} (du(s)/ds)
= −T_{t−s} Au(s) + T_{t−s} Au(s) = 0.

Hence T_{t−s} u(s) is independent of s, so that setting in turn s = 0, t, we get T_t u(0) = u(t),
that is, T_t f = u(t).
When is an operator A, defined on a linear subspace 𝒟_A of B, the infinitesimal
generator of a contraction semigroup {T_t}? The answer is contained in the important
Hille-Yosida Theorem: Suppose 𝒟_A is dense in B. Then A is the infinitesimal generator
of a contraction semigroup on B if and only if, for each λ > 0, λ − A is one-to-one (on
𝒟_A) and onto B with ||(λ − A)^{−1}|| ≤ 1/λ. A simple proof of this theorem for the case
B = C[a, b], the set of all continuous functions on (a, b) with finite limits at a and
b, and for closed linear subspaces of C[a, b], may be found in P. Mandl (1968),
Analytical Treatment of One-Dimensional Markov Processes, Springer-Verlag,
New York, pp. 2-5. It may be noted that if A is a bounded operator on B, then
T_t = exp{tA} (see Section 3 of Chapter IV). But the differential operator
A = (d/dm(x))(d/ds(x)) is unbounded on C[a, b].
Consider the case B = C₀(a, b), the set of all continuous functions on (a, b) having
limit zero at a and b. C₀(a, b) is given the supnorm. Let A = (d/dm(x))(d/ds(x)), with
𝒟_A comprising the set of all f ∈ B = C₀(a, b) such that Af ∈ B. Suppose A, 𝒟_A satisfy
the hypothesis of the Hille-Yosida Theorem, and {T_t} is the corresponding contraction
semigroup. It is simple to check that (λ − A)^{−1} f equals g := ∫_0^∞ e^{−λt} T_t f dt, for each
λ > 0; that is, (λ − A)g = f. Suppose that the resolvent operator (λ − A)^{−1} is positive:
if f ≥ 0 then (λ − A)^{−1} f ≥ 0. Then, by the uniqueness theorem for Laplace transforms,
T_t f ≥ 0 if f ≥ 0. In this case f ↦ T_t f(x) is a positive bounded linear functional on
C₀(a, b). Therefore, by the Riesz Representation Theorem (see H. L. Royden (1968),
Real Analysis, 2nd ed., Macmillan, New York, pp. 310-11), there exists a finite
measure p(t; x, dy) on S = (a, b) such that T_t f(x) = ∫ f(y) p(t; x, dy) for every
f ∈ C₀(a, b). In view of the contraction property, p(t; x, S) ≤ 1.
In order to verify the hypothesis of the Hille-Yosida Theorem in the case a, b are
inaccessible, construct for each λ > 0 two positive solutions u₁, u₂ of (λ − A)u = 0,
u₁ increasing and u₂ decreasing. Then define the symmetric Green's function
G_λ(x, y) = W^{−1} u₁(x) u₂(y) for a < x ≤ y < b, extended to x > y by symmetry. Here
the Wronskian W, given by W := u₂(x) du₁(x)/ds(x) − u₁(x) du₂(x)/ds(x), is independent
of x. For f ∈ C₀(a, b) the function g(x) = G_λ f(x) := ∫ G_λ(x, y) f(y) dm(y) is the unique
solution in C₀(a, b) of (λ − A)g = f. In other words, G_λ = (λ − A)^{−1}. The positivity
of (λ − A)^{−1} follows from that of G_λ(x, y). Also, one may directly check that G_λ 1 = 1/λ.
This implies p(t; x, (a, b)) = 1. For details of this, and for the proof that a Markov
process on S = (a, b) with this transition probability may be constructed on the space
C([0, ∞): S) of continuous trajectories, see Mandl (1968), loc. cit., pp. 14-17, 21-38.

4. For a diffusion on S = (a, b), consider the semigroup {T_t} in (2.1). Under Condition
(1.1), it is easy to check on integration by parts that the infinitesimal generator A is
self-adjoint on L²(S, π(y) dy), where π(y) dy is the speed measure whose distribution
function is the speed function (see Eq. 8.1). That is,

∫ Af(y) g(y) π(y) dy = ∫ f(y) Ag(y) π(y) dy

for all twice continuously differentiable f, g vanishing outside compacts. It follows
that T_t is self-adjoint and, therefore, the transition probability density
q(t; x, y) := p(t; x, y)/π(y) with respect to π(y) dy is symmetric in x and y. That is,
q(t; x, y) = q(t; y, x). Since p satisfies the backward equation ∂p/∂t = A_x p, it follows
that so does q: ∂q/∂t = A_x q. By the symmetry of q it now follows that ∂q/∂t = A_y q. Here
A_x q, A_y q denote the application of A to x ↦ q(t; x, y) and y ↦ q(t; x, y), respectively.
The equation ∂q/∂t = A_y q easily reduces to ∂p/∂t = A*p, as given in (2.41). For
details see Ito and McKean (1965), loc. cit., pp. 149-158.
For more general treatments of the relations between Markov processes and
semigroup theory, see E. B. Dynkin (1965), Markov Processes, Vol. 1, Springer-Verlag,
New York, and S. N. Ethier and T. G. Kurtz (1986), Markov Processes:
Characterization and Convergence, Wiley, New York.


Theoretical Complements to Section V.4


1. Theorem 4.1 is a special case of a more general result on the approximation of
diffusions by discrete-parameter Markov chains, as may be found in D. W. Stroock
and S. R. S. Varadhan (1979), Multidimensional Diffusion Processes, Springer-Verlag, New
York, Theorem 11.2.3.
From the point of view of numerical analysis, (4.13) is the discretized, or
difference-equation, version of the backward equation (4.14), and is usually solved
by matrix methods.

Theoretical Complements to Section V.5


1. The general PDE (partial differential equations) method for solving the Kolmogorov
equations, or second-order parabolic equations, may be found in A. Friedman (1964),
Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs. In
particular, the existence of a smooth positive fundamental solution, or transition
density, is proved there.

Theoretical Complements to Section V.6


1. Suppose two Borel-measurable functions μ(·), σ²(·) > 0 are given on S = (a, b) such
that (i) μ(·) is bounded on compact subsets of S, (ii) σ²(·) is bounded away from
zero and infinity on compact subsets of S, and (iii) a and b are inaccessible (see
theoretical complement 2.1 above). Then Feller's construction provides a Markov
process on S having continuous sample paths, with a scale function s(·) and speed
function m(·) expressed as functions of μ(·), σ²(·) as described in theoretical
complement 2.1. Hence continuity of μ(·) is not needed for this construction.
2. The general principle Proposition 6.3 is very useful in proving that certain functions
{φ(X_t)} of Markov processes {X_t} are also Markov. But how does one find such
functions? It turns out that in all the cases considered in this book the function φ is
a maximal invariant of a group of transformations G under which the transition
probability is invariant. Here invariance of the transition probability means
p(t; gx, g(B)) = p(t; x, B) for all t > 0, x ∈ S, g ∈ G, B a Borel subset of the metric space
S. A function φ is said to be a maximal invariant if (i) φ(gx) = φ(x) for all g ∈ G,
x ∈ S, and (ii) every measurable invariant function is a (measurable) function of φ.
For each x ∈ S the orbit of x (under G) is the set O(x) := {gx: g ∈ G}. Each invariant
function is constant on orbits. Let S′ be a metric space and φ a measurable function
on S onto S′ such that (i)′ φ is constant on orbits, and (ii)′ φ(x) ≠ φ(y) if O(x) ≠ O(y).
In other words, φ(x) is a relabeling of O(x) and S′ may be thought of as (a relabeling
of) the space of orbits. Then it is simple to see that φ is a maximal invariant.

Proposition. If φ is a maximal invariant, then {φ(X_t)} is Markov.

Proof. Taking the conditional expectation first with respect to σ{X_u: u ≤ s} and then
with respect to σ{φ(X_u): u ≤ s},

P(φ(X_{s+t}) ∈ B′ | σ{φ(X_u): u ≤ s}) = E(p(t; X_s, φ^{−1}(B′)) | σ{φ(X_u): u ≤ s}).

But, by property (i) above,

p(t; x, φ^{−1}(B′)) = p(t; g^{−1}x, g^{−1}(φ^{−1}(B′))) = p(t; g^{−1}x, (φ ∘ g)^{−1}(B′))
= p(t; g^{−1}x, φ^{−1}(B′)),

since φ ∘ g = φ by the invariance of φ. In other words, the function x ↦ p(t; x, φ^{−1}(B′))
is invariant. By (ii) it is therefore a function q(t; φ(x), B′), say, of φ(x). Thus

E(p(t; X_s, φ^{−1}(B′)) | σ{φ(X_u): u ≤ s}) = E(q(t; φ(X_s), B′) | σ{φ(X_u): u ≤ s})
= q(t; φ(X_s), B′). ∎

It has been pointed out to us by J. K. Ghosh that the idea underlying the
proposition also occurs in the context of sequential analysis in statistics. See W. J. Hall,
R. A. Wijsman, and J. K. Ghosh (1965), "The Relationship Between Sufficiency and
Invariance with Applications in Sequential Analysis," Ann. Math. Statist., 36,
pp. 575-614. Also see theoretical complement 16.3.
A maximal invariant of the reflection group G = {e, −e} (ex := x, −ex := −x) is
φ(x) = |x|. Thus, {|X_t|} is Markov if {X_t} and {−X_t} have the same transition
probability, i.e., if p(t; x, B) = p(t; −x, −B). The group G of transformations on R¹
generated by reflections about 0 and 1 (i.e., by g₀x = −x, g₁x = 1 − (x − 1) = 2 − x)
is infinite, and

O(x) = {x + 2m: m = 0, ±1, ±2, ...} ∪ {−x + 2m: m = 0, ±1, ±2, ...}.

Thus, if p(t; x, B) = p(t; −x, −B) and p(t; x, B) = p(t; x + 2, B + 2), then {Z_t} in
Theorem 6.4 is Markov, since a maximal invariant is φ(x) = |x(mod 2) − 1|.

Theoretical Complements to Section V.7

1. It is shown in Example 14.1 that the so-called radial Brownian motion {|B_t|} is a
diffusion on S = [0, ∞), such that "0" is inaccessible (from the interior of S). On the
other hand, if the process starts at "0", it instantaneously enters the interior (0, ∞)
and stays there forever. Thus, although the diffusion may be restricted to the state
space (0, ∞), "0" may also be included in the state space. The only other way of
including "0" in the state space is to make "0" an absorbing boundary, if the process
is to have the Markov property and continuous sample paths. The nonabsorbing
boundary point "0" in this case is called an entrance boundary. In general, an
inaccessible lower boundary a is an entrance boundary if

v_a := ∫_a^{x₀} s(x) dm(x) > −∞.

Similarly, an upper inaccessible boundary b is entrance if

v_b := ∫_{x₀}^b s(x) dm(x) < ∞.

See Ito and McKean (1965), loc. cit., p. 108.
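As an illustration, for the radial Brownian motion in dimension k ≥ 3 one has, near the
lower boundary a = 0, s(x) = c₁ − c₂ x^{−(k−2)} and dm(x) = c₃ x^{k−1} dx (for suitable
positive constants c_i, with x₀ fixed in (0, ∞)), so that s(x) dm(x) is of the order x dx near
0 and

v_a = ∫_0^{x₀} s(x) dm(x) > −∞,

confirming that "0" is an entrance boundary.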


Suppose a boundary point is accessible, but the process starting at this boundary
point cannot enter the interior; that is, no boundary condition other than absorption
is consistent with the requirement that the process be Markovian and have continuous
sample paths. Then the boundary is said to be an exit boundary.
In general, an accessible lower boundary a is exit if v_a = −∞, and an accessible
upper boundary b is exit if v_b = ∞. For details see Ito and McKean, loc. cit., p. 108.
An example of a lower exit boundary is S = [0, ∞), μ(x) = αx, σ²(x) = x, with α > 0,
which occurs as a model of the growth of an isolated population (see S. Karlin and H.
M. Taylor (1981), A Second Course in Stochastic Processes, Academic Press,
New York, p. 239).
If one allows jump discontinuities at a boundary point, then the diffusion on
reaching the boundary (or, starting from it) may stay at this point for a random
holding time before jumping to another state. The holding time distribution is
necessarily exponential (see Chapter IV, Proposition 5.1), with a parameter δ > 0,
while the jump distribution γ(dy) is arbitrary. The successive holding times at this
boundary are i.i.d. and are independent of the successive jump positions, the latter
being also i.i.d. See Mandl (1968), loc. cit., pp. 39, 47, 66-69.

Theoretical Complements to Section V.8

1. (Semigroups under Boundary Conditions) Let S = [a, b], μ(·) and σ(·) differentiable,
and σ²(x) > 0 for all x ∈ S. If Dirichlet or Neumann boundary conditions are
prescribed at a, b, then the Hille-Yosida Theorem may be used to construct a
contraction semigroup {T_t} generated by A on B := C[a, b], or C₀(a, b). As indicated
in theoretical complement 2.3 above, there exists a transition probability, with total
mass ≤ 1, such that T_t f(x) = ∫ f(y) p(t; x, dy) for all f ∈ B. To give an indication how
the hypothesis of the Hille-Yosida Theorem is verified, consider Dirichlet boundary
conditions. That is, let B = C₀(a, b), 𝒟_A the set of all f ∈ B such that
Af := (d/dm(x))(d/ds(x))f(x) is in B. Fix λ > 0, and let u₁, u₂ be the solutions of
(λ − A)u = 0 with u₁(a) = 0, u₁′(a) = 1, u₂(b) = 0, u₂′(b) = −1. By the existence and
uniqueness theorem for linear ordinary differential equations, these solutions exist
and are unique. Define the Green's function G_λ(x, y) = W^{−1} u₁(x) u₂(y) for x ≤ y, and
extend it symmetrically to x > y, with W as the Wronskian (see theoretical
complement 2.3). It is not difficult to check that for every f ∈ B the function
G_λ f(x) := ∫ G_λ(x, y) f(y) dm(y) is the unique element in B satisfying (λ − A)G_λ f = f.
Since G_λ is positive and (G_λ 1)(x) may be shown directly to be no larger than 1/λ for all x,
the verification is complete, and we have a transition probability p(t; x, dy) with
p(t; x, S) ≤ 1.
In the case of Neumann, or reflecting, boundary conditions at a, b, take B = C[a, b]
and 𝒟_A = the set of all f in B such that Af ∈ B and f′(a) = 0 = f′(b). Then (λ − A)^{−1}
may be expressed as (λ − A)^{−1} f = G_λ f + l₁(f)u₁ + l₂(f)u₂, where u₁, u₂, G_λ are as
above, and l₁(f), l₂(f) are bounded linear functionals of f determined so that
g := (λ − A)^{−1} f satisfies the boundary conditions g′(a) = 0 = g′(b). The hypothesis of
the Hille-Yosida Theorem may be directly verified now. Since constant functions
belong to 𝒟_A and (λ − A)(1/λ) = 1, it follows that

(λ − A)^{−1} 1 = 1/λ, and p(t; x, [a, b]) = 1.

We have given a brief outline above of Feller's construction of diffusions under
Dirichlet and Neumann boundary conditions. Complete details may be found in
Mandl (1968), loc. cit., Chapter II. The direct probabilistic constructions given in
the text (see Sections 6 and 7) do not make use of this theory.
2. (Eigenfunction Expansions) For a justification of the eigenfunction expansion
described in Section 8, consider first {T_t} on C₀(a, b) under Dirichlet boundary
conditions. Extend T_t to L²([a, b], π(y) dy = dm(y)). Now π(y) dy is an invariant
measure. For

(d/dt) ∫ T_t f(y) π(y) dy = ∫ A T_t f(y) π(y) dy = ∫ T_t f(y) A*π(y) dy = 0,

so that

∫ T_t f(y) π(y) dy = ∫ f(y) π(y) dy for f ∈ C₀(a, b).

From this it is easily shown that {T_t} is a contraction semigroup on L², whose
infinitesimal generator is the extension of A to the set of all f ∈ C₀(a, b) such that
Af ∈ L²([a, b], dm(y)). We denote this extension also by A and note that A is
self-adjoint. Fix λ > 0. Then (λ − A)^{−1} is compact and self-adjoint on L². Indeed,
(λ − A)^{−1} is the integral operator G_λ, and it follows from a well-known theorem of
Riesz (see F. Riesz (1955), Functional Analysis, F. Ungar Publishing Co., New York,
Chapter VI) that (λ − A)^{−1} is a compact self-adjoint operator whose eigenvalues
β_n(λ) are positive and converge to zero as n → ∞, and that the corresponding
normalized eigenfunctions ψ_n comprise a complete orthonormal sequence in L². As
a consequence, the eigenvalues of A are α_n := λ − 1/β_n(λ) < 0, with the
corresponding eigenfunctions ψ_n.
Under Neumann boundary conditions, it follows from the representation
(λ − A)^{−1} f = G_λ f + l₁(f)u₁ + l₂(f)u₂ that (λ − A)^{−1} is again a compact self-adjoint
operator on L². The eigenvalues α_n of A are nonpositive, with 0 as one of them since
A1 = 0. Again the normalized eigenfunctions comprise a complete orthonormal set.
(An explicit instance is recorded below.)
The uniqueness of the solution to the initial-value problem, under any of these
boundary conditions, follows from the uniqueness result proved in theoretical
complement 2.3 above. Another proof may be based on the maximum principle for
parabolic equations (see Exercise 7.7).
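As a concrete instance of the Neumann case, take A = d²/dx² on [0, 1] (i.e., σ² = 2, so
that s(x) = x, m(x) = x and dm(y) = dy). The normalized eigenfunctions and eigenvalues
are

ψ₀ ≡ 1, ψ_n(x) = √2 cos nπx, α_n = −n²π² (n = 1, 2, ...),

and the expansion of Section 8 gives the reflecting transition density
p(t; x, y) = 1 + 2 Σ_{n≥1} e^{−n²π²t} cos(nπx) cos(nπy).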

Theoretical Complements to Section V.11


1. A proof that the pre-τ sigmafield as defined by (11.2) is the same as the one generated
by the stopped process {X_{τ∧t}: t ≥ 0} is given in Stroock and Varadhan (1979), loc.
cit., p. 33.
The continuity of the function φ(y) in (11.18) depends only on the fact that
x ↦ p(t; x, dy) is weakly continuous. This fact, referred to as the Feller property, is
proved in Section 3 of Chapter VII for diffusions on R¹. More generally, as described
in theoretical complements to Sections 2, 7, 8, Feller's construction implies that T_t f
is bounded and continuous if f is.
2. The Brownian meander and Brownian excursion processes considered in Exercises
12.6, 12.7 of Chapter I and theoretical complements to Section 12 of Chapter I and
Section 5 of Chapter IV may also be defined as continuous-parameter Markov
processes having continuous sample paths but nonhomogeneous transition
probabilities. In fact, let {B_t} denote a standard Brownian motion, and let
α := sup{t ≤ 1: B_t = 0}, ρ := inf{t ≥ 1: B_t = 0}. Consider the stochastic processes
defined by

(meander): B_t⁺ := |B_{α+t(1−α)}| (1 − α)^{−1/2}, 0 ≤ t ≤ 1, (T.11.1)

(excursion): B_t*⁺ := |B_{α+t(ρ−α)}| (ρ − α)^{−1/2}, 0 ≤ t ≤ 1. (T.11.2)

Then in B. Belkin (1972), "An Invariance Principle for Conditioned Random Walk
Attracted to a Stable Law," Z. Wahrscheinlichkeitstheorie und Verw. Geb., 21,
pp. 45-64, it is shown that the meander process defined by (T.11.1) is a
continuous-parameter Markov process starting at 0 with transition probabilities given
by the following densities:

p⁺(0, 0; t, y) = 2t^{−3/2} y exp{−y²/2t} Φ_{1−t}(0, y), 0 < t ≤ 1, y > 0,

p⁺(s, x; t, y) = {φ_{t−s}(y − x) − φ_{t−s}(y + x)} Φ_{1−t}(0, y)/Φ_{1−s}(0, x), 0 < s < t ≤ 1, x, y > 0,

where

φ_u(z) = (2πu)^{−1/2} exp{−z²/2u}, and Φ_u(a, b) = ∫_a^b φ_u(z) dz.

Likewise, for the case of the excursion process defined by (T.11.2), one obtains a
continuous-parameter Markov process with nonhomogeneous transition law with
the following density (see Ito and McKean (1965), loc. cit., p. 76):

p*⁺(0, 0; t, y) = 2y²(2πt³(1 − t)³)^{−1/2} exp{−y²/2t(1 − t)}, 0 < t < 1, y > 0,

p*⁺(s, x; t, y) = {φ_{t−s}(y − x) − φ_{t−s}(y + x)} [(1 − s)^{3/2} y exp{−y²/2(1 − t)}] / [(1 − t)^{3/2} x exp{−x²/2(1 − s)}],

0 < s < t < 1, x, y > 0.

The equivalence of these processes with those of the same name defined in
theoretical complements to Section I.12 is the subject of the paper by R. T. Durrett,
D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to Brownian Meander
and Brownian Excursion," Ann. Probab., 5, pp. 117-129.

Theoretical Complements to Section V.12


1. Consider a positive recurrent diffusion on an interval S, having a transition probability
p(t; x, dy) and an invariant initial distribution π. We will use a coupling argument to
show that ||p(t; x, dy) − π(dy)|| := sup{|p(t; x, B) − π(B)|: B ∈ B(S)} → 0 as t → ∞.
For this, let {X_t}, {Y_t} be two independent diffusions having transition probability p
and with X₀ = x, and Y₀ having distribution π. Let τ := inf{t ≥ 0: X_t = Y_t}. If one
can show that τ < ∞ a.s., then the process {Z_t} defined by

Z_t := X_t for t < τ, Z_t := Y_t for t ≥ τ, (T.12.1)

has, by the strong Markov property for the two-dimensional Markov process
{(X_t, Y_t)}, the same distribution as {X_t}. In this case,

|p(t; x, B) − π(B)| = |P(X_t ∈ B) − P(Y_t ∈ B)| = |P(Z_t ∈ B) − P(Y_t ∈ B)|
≤ P(Z_t ≠ Y_t) ≤ P(τ > t) → 0 as t → ∞.

In order to prove τ < ∞ a.s., it is enough to prove that the process {(X_t, Y_t)}
reaches every rectangle [a, b] × [c, d] (a < b, c < d) almost surely. For the process
must then go across the diagonal {(u, v): u = v} a.s. Let φ(u, v) denote the probability
that a pair of such independent diffusions starting at (u, v) ever reaches the rectangle
[a, b] × [c, d]. Note that π × π is an invariant initial distribution for the
two-dimensional diffusion, and let {(U_t, V_t)} denote such a diffusion having the initial
distribution π × π. Let α denote the probability that the latter ever reaches
[a, b] × [c, d]. Also write D_n := {(U_t, V_t) ∈ [a, b] × [c, d] for some t > n}, and
F_n := {(U_t, V_t) ∈ [a, b] × [c, d] for some t ≤ n}. Then α = P(D_n) + P(F_n ∩ D_n^c). But, by
stationarity, P(D_n) = α. Therefore, P(F_n ∩ D_n^c) = 0. By the Markov property,

P(F_n ∩ D_n^c) = E[1_{F_n}(1 − φ(U_n, V_n))] = E(P(F_n | (U_n, V_n))(1 − φ(U_n, V_n))).

Now we may use the positivity of the transition probability density to show easily
that P(F_n | U_n = u, V_n = v) > 0 for almost all pairs (u, v) (with respect to Lebesgue
measure on S × S). Therefore, 1 − φ(u, v) = 0 for almost every pair (u, v). By using
the continuity of x ↦ p(t; x, y) it is now shown that (u, v) ↦ φ(u, v) is continuous,
so that φ(u, v) = 1 for all (u, v). This proves that τ < ∞ a.s.

Theoretical Complements to Section V.13


1. (Martingale Central Limit Theorem) Although a proof of the central limit theorem
(Theorem 13.1) for positive recurrent one-dimensional diffusions may be patterned
after the corresponding proof for Markov chains without any essential change, the
proof does not work for Markov processes on general state spaces and
multidimensional diffusions. The reason for this failure is that point recurrence may
not hold. That is, even for Markov processes having a unique invariant distribution,
there may not be any point in the state space to which the process returns (infinitely
often) with probability 1. A more general approach that applies to all ergodic Markov
processes is via martingales. First, let us prove a general martingale central limit
theorem. Central limit theorems for Markov processes will then be derived from it.
Let {X_{k,n}: 1 ≤ k ≤ k_n} be, for each n ≥ 1, a square integrable martingale difference
sequence, with respect to an increasing family of sigmafields {F_{k,n}: 0 ≤ k ≤ k_n}. Write

σ²_{k,n} := E(X²_{k,n} | F_{k−1,n}), s²_{k,n} := Σ_{j=1}^k σ²_{j,n},

L_{k,n}(ε) := Σ_{j=1}^k E(X²_{j,n} 1_{{|X_{j,n}|>ε}} | F_{j−1,n}), (T.13.1)

M_n := max{σ²_{k,n}: 1 ≤ k ≤ k_n}, S_{n,k} := Σ_{j=1}^k X_{j,n}.

The following result is due to B. M. Brown (1971), "Martingale Central Limit
Theorems," Ann. Math. Statist., 42, pp. 59-66.

Theorem T.13.1. (Martingale CLT). Assume that, as n → ∞, (i) s²_{k_n,n} → 1 in
probability, and (ii) L_{k_n,n}(ε) → 0 in probability, for every ε > 0. Then S_{n,k_n} converges
in distribution to N(0, 1).

Proof. Consider the conditional characteristic functions

φ_{k,n}(ξ) := E(exp{iξX_{k,n}} | F_{k−1,n}), (ξ ∈ R¹). (T.13.2)

It would be enough to show that

(a) E(exp{iξS_{n,k_n}} (Π_{k=1}^{k_n} φ_{k,n}(ξ))^{−1}) = 1,

provided |Π_{k=1}^{k_n} φ_{k,n}(ξ)|^{−1} ≤ δ(ξ), a constant, and

(b) Π_{k=1}^{k_n} φ_{k,n}(ξ) → exp{−ξ²/2} in probability.

Indeed, if |(Π_k φ_{k,n}(ξ))^{−1}| ≤ δ(ξ), then (a), (b) imply

|E exp{iξS_{n,k_n}} − exp{−ξ²/2}|
= |E[exp{iξS_{n,k_n}} (Π_k φ_{k,n}(ξ))^{−1} (Π_k φ_{k,n}(ξ) − exp{−ξ²/2})]|
≤ δ(ξ) E|Π_k φ_{k,n}(ξ) − exp{−ξ²/2}| → 0, (T.13.3)

by bounded convergence (note |Π_k φ_{k,n}(ξ)| ≤ 1).
Now part (a) follows by taking successive conditional expectations given F_{k−1,n}
(k = k_n, k_n − 1, ..., 1), if (Π_k φ_{k,n}(ξ))^{−1} is integrable. Note that the martingale
difference property is not needed for this. It turns out, however, that in general
(Π_k φ_{k,n}(ξ))^{−1} cannot be bounded away from zero. Our first task is then to replace X_{k,n}
by new martingale differences Y_{k,n} for which this integrability does hold, and whose
sum has the same asymptotic distribution as S_{n,k_n}. To construct Y_{k,n}, first use
assumption (ii) to check that M_n → 0 in probability. Therefore, there exists a
nonrandom sequence δ_n ↓ 0 such that

P(M_n ≥ δ_n) → 0 as n → ∞. (T.13.4)

Similarly, there exists, for each ε > 0, a nonrandom sequence θ_n(ε) ↓ 0 such that

P(L_{k_n,n}(ε) ≥ θ_n(ε)) → 0 as n → ∞. (T.13.5)

Consider the events

A_{k,n}(ε) := {σ²_{k,n} < δ_n, L_{k,n}(ε) < θ_n(ε), s²_{k,n} < 2}, (1 ≤ k ≤ k_n). (T.13.6)

Then A_{k,n}(ε) is F_{k−1,n}-measurable. Therefore, Y_{k,n} defined by

Y_{k,n} := X_{k,n} 1_{A_{k,n}(ε)} (T.13.7)

has zero conditional expectation, given F_{k−1,n}. Although Y_{k,n} depends on ε, we will
suppress this dependence for notational convenience. Note also that

P(Y_{k,n} = X_{k,n} for 1 ≤ k ≤ k_n) ≥ P(∩_{k=1}^{k_n} A_{k,n}(ε))
= P(M_n < δ_n, L_{k_n,n}(ε) < θ_n(ε), s²_{k_n,n} < 2) → 1. (T.13.8)

We will use the notation (T.13.1-2) with a prime (′) to denote the corresponding
quantities for {Y_{k,n}}. For example, using the fact E(Y_{k,n} | F_{k−1,n}) = 0 and a Taylor
expansion,

φ′_{k,n}(ξ) − (1 − (ξ²/2)σ′²_{k,n}) = E[exp{iξY_{k,n}} − (1 + iξY_{k,n} − (ξ²/2)Y²_{k,n}) | F_{k−1,n}]
= E[−ξ²Y²_{k,n} ∫_0^1 (1 − u)(exp{iuξY_{k,n}} − 1) du | F_{k−1,n}],

so that

|φ′_{k,n}(ξ) − (1 − (ξ²/2)σ′²_{k,n})| ≤ (|ξ|³ε/2)σ′²_{k,n} + 2ξ² E(X²_{k,n} 1_{{|X_{k,n}|>ε}} | F_{k−1,n}). (T.13.9)

Fix ξ ∈ R¹. Since M′_n ≤ δ_n, one has |1 − (ξ²/2)σ′²_{k,n}| ≤ 1 (1 ≤ k ≤ k_n) for all large n.
Therefore, using (T.13.6), (T.13.9),

|Π_k φ′_{k,n}(ξ) − Π_k (1 − (ξ²/2)σ′²_{k,n})| ≤ Σ_k |φ′_{k,n}(ξ) − (1 − (ξ²/2)σ′²_{k,n})|
≤ |ξ|³ε + 2ξ²θ_n(ε), (T.13.10)

and

|Π_k (1 − (ξ²/2)σ′²_{k,n}) − exp{−(ξ²/2)s′²_{k_n,n}}| ≤ Σ_k |exp{−(ξ²/2)σ′²_{k,n}} − (1 − (ξ²/2)σ′²_{k,n})|
≤ (ξ⁴/8) Σ_k σ′⁴_{k,n} ≤ (ξ⁴/4)δ_n. (T.13.11)

Therefore,

|Π_k φ′_{k,n}(ξ) − exp{−(ξ²/2)s′²_{k_n,n}}| ≤ |ξ|³ε + 2ξ²θ_n(ε) + (ξ⁴/4)δ_n. (T.13.12)

Moreover, (T.13.12) implies

|Π_k φ′_{k,n}(ξ)| ≥ exp{−(ξ²/2)s′²_{k_n,n}} − (|ξ|³ε + 2ξ²θ_n(ε) + (ξ⁴/4)δ_n)
≥ exp{−ξ²} − (|ξ|³ε + 2ξ²θ_n(ε) + (ξ⁴/4)δ_n). (T.13.13)

By choosing ε sufficiently small, one has, for all sufficiently large n (depending on ε),
|Π_k φ′_{k,n}(ξ)| bounded away from zero (uniformly in n). Therefore, (a) holds for {Y_{k,n}},
for all sufficiently small ε (and all sufficiently large n, depending on ε). By using
relations as in (T.13.3) and the inequalities (T.13.12), (T.13.13) and the fact that
s′²_{k_n,n} → 1 in probability, we get

lim sup_{n→∞} |E exp{iξS′_{n,k_n}} − exp{−ξ²/2}| ≤ 2 exp{ξ²}|ξ|³ε, say, for all sufficiently small ε. (T.13.14)

Finally,

lim sup_{n→∞} |E exp{iξS_{n,k_n}} − exp{−ξ²/2}|
≤ lim sup_{n→∞} |E exp{iξS_{n,k_n}} − E exp{iξS′_{n,k_n}}| + lim sup_{n→∞} |E exp{iξS′_{n,k_n}} − exp{−ξ²/2}|
= 0 + lim sup_{n→∞} |E exp{iξS′_{n,k_n}} − exp{−ξ²/2}| ≤ 2 exp{ξ²}|ξ|³ε, (T.13.15)

using (T.13.8) for the first term. The extreme right side of (T.13.15) goes to zero as ε ↓ 0,
while the extreme left does not depend on ε. ∎

Condition (ii) of the theorem is called the conditional Lindeberg condition. For
additional information on martingale central limit theorems, see P. Hall and
C. C. Heyde (1980), Martingale Limit Theory and Its Application, Academic Press,
New York.
One may deduce from this theorem a result of P. Billingsley (1961), "The
Lindeberg-Levy Theorem for Martingales," Proc. Amer. Math. Soc., 12, pp. 788-792,
and I. A. Ibragimov (1963), "A Central Limit Theorem for a Class of Dependent
Random Variables," Theor. Probability Appl., 8, pp. 83-89, stated below.

Theorem T.13.2. Let {X_n: n ≥ 0} be a stationary ergodic sequence of square
integrable martingale differences. Then Z_n/√n := (X₁ + ··· + X_n)/√n converges in
distribution to N(0, σ²), where σ² = EX₁².

Proof. It is convenient to construct a doubly stationary sequence

..., X_{−n}, ..., X₀, X₁, ..., X_n, ...,

defined for all integer indices, whose restriction to n ≥ 0 has the same distribution as the
original sequence. This can be accomplished by setting, for each m, the finite-dimensional
distribution of any m consecutive terms of the extended sequence to be the same as the
distribution of (X₀, ..., X_{m−1}). Since this specification satisfies Kolmogorov's consistency
requirements, the construction is complete on the canonical space R^Z. To simplify
notation we shall write X_n for the extended process as well.
If σ² = 0, then X_n = 0 almost surely, and the desired conclusion holds trivially.
Assume σ² > 0.
First, let us show that {X_n} so extended is a {F_n}-martingale difference sequence,
where F_n := σ{X_j: −∞ < j ≤ n}. We need to show that, for each fixed n,
E(X_{n+1} | F_n) = 0, i.e.,

E(1_A X_{n+1}) = 0 for all A ∈ F_n. (T.13.16)

Suppose A is a finite-dimensional event in F_n, say A ∈ σ{X_{n−j}: 0 ≤ j ≤ m}. Note that
the (joint) distribution of (X_{n−m}, ..., X_n, X_{n+1}) is the same as that of any m + 2
consecutive terms of the sequence {X_n: n ≥ 0}, e.g., (X₀, X₁, ..., X_{m+1}). Therefore,
(T.13.16) becomes, for such a choice of A, E(1_{A′} X_{m+1}) = 0 for all A′ ∈ σ{X₀, ..., X_m},
which is clearly true by the martingale difference property of {X_n: n ≥ 0}. Now consider
the class C of sets A in F_n such that E(1_A X_{n+1}) = 0. It is simple to check that C is
a sigmafield, and since C ⊃ σ{X_{n−j}: 0 ≤ j ≤ m} for all m ≥ 0, one has C = F_n.
Next define σ²_n := E(X²_n | F_{n−1}) (n ≥ 1). Then {σ²_n: n ≥ 1} is a stationary ergodic
sequence, and Eσ²_n = σ² > 0. In particular, s²_n/n := (Σ_{m=1}^n σ²_m)/n → σ² a.s., and
s_n → ∞ almost surely. Write X_{k,n} := X_k/√n, k_n = n. Then σ²_{k,n} = σ²_k/n, and
s²_{n,n} = s²_n/n → σ² a.s., so that condition (i) of Theorem T.13.1 is checked, assuming
σ² = 1 without loss of generality. It remains to check the conditional Lindeberg
condition (ii). But L_{n,n}(ε) is nonnegative, and EL_{n,n}(ε) = E(X₁² 1_{{|X₁|>ε√n}}) → 0.
Therefore, L_{n,n}(ε) → 0 in probability. ∎
512 BROWNIAN MOTION AND DIFFUSIONS

2. We now apply Theorem T.13.2 to Markov processes. Let {X_n: n ≥ 0} be a Markov
process on a state space S (with sigmafield 𝒮), having a transition probability p(x; dy).
Assume that p(x; dy) admits an invariant probability π and that, under this initial
invariant distribution, the stationary process {X_n: n ≥ 0} is ergodic.
Assume X₀ has distribution π. Consider a real-valued function f on S such that
Ef²(X₀) < ∞. Write f′(x) = f(x) − f̄, where f̄ = ∫ f dπ. Write T for the transition
operator, (Tg)(x) := ∫ g(y) p(x; dy). Then (T^m f′)(x) = (T^m f)(x) − f̄ for all m ≥ 0. Also,
Ef′(X₀) = 0. Suppose the series h_n(x) := Σ_{m=0}^n (T^m f′)(x) converges in L²(S, π) to h, i.e.,

∫ (h_n − h)² dπ → 0 as n → ∞. (T.13.17)

This is true, e.g., for all f in case S is finite. In the case (T.13.17) holds, h satisfies

h(x) − (Th)(x) = f′(x), or (I − T)h = f′, (T.13.18)

i.e., f′ belongs to the range of I − T regarded as an operator on L²(S, π). Now it
is simple to check that

h(X_n) − (Th)(X_{n−1}) (n ≥ 1) (T.13.19)

is a martingale difference sequence. It is also stationary and ergodic. Write

Z_n := Σ_{m=1}^n (h(X_m) − (Th)(X_{m−1})). (T.13.20)

Then, by Theorem T.13.2, Z_n/√n converges in distribution to N(0, σ²), where

σ² = E(h(X₁) − (Th)(X₀))² = Eh²(X₁) + E(Th)²(X₀) − 2E[(Th)(X₀)h(X₁)]. (T.13.21)
Since E[h(X₁) | σ{X₀}] = (Th)(X₀), we have E[(Th)(X₀)h(X₁)] = E(Th)²(X₀), so that
(T.13.21) reduces to

σ² = Eh²(X₁) − E(Th)²(X₀) = ∫ h² dπ − ∫ (Th)² dπ. (T.13.22)

Also, by (T.13.18),

Z_n = Σ_{m=1}^n (h(X_m) − (Th)(X_{m−1})) = Σ_{m=0}^{n−1} (h(X_m) − (Th)(X_m)) + h(X_n) − h(X₀)
= Σ_{m=0}^{n−1} f′(X_m) + h(X_n) − h(X₀). (T.13.23)

Since

E[(h(X_n) − h(X₀))/√n]² ≤ (2/n)(Eh²(X_n) + Eh²(X₀)) = (4/n) ∫ h² dπ → 0,

and

(1/√n) Σ_{m=0}^{n−1} f′(X_m) = Z_n/√n + (1/√n)(h(X₀) − h(X_n)), (T.13.24)

it follows that the left side converges in distribution to N(0, σ²) as n → ∞. We have
arrived at a result of M. I. Gordin and B. A. Lifsic (1978), "The Central Limit
Theorem for Stationary Ergodic Markov Processes," Dokl. Akad. Nauk SSSR, 19,
pp. 392-393.

Theorem T.13.3. (CLT for Discrete-Parameter Markov Processes). Assume p(x; dy)
admits an invariant probability π and, under the initial distribution π, {X_n} is ergodic.
Assume also that f′ := f − f̄ is in the range of I − T. Then

(1/√n) Σ_{m=0}^{n−1} (f(X_m) − f̄) → N(0, σ²) as n → ∞, (T.13.25)

where the convergence is in distribution, and σ² is given by (T.13.22).

Some applications of this theorem in the context of processes such as considered


in Sections 13, 14 of Chapter II may be found in R. N. Bhattacharya and O. Lee
(1988), "Asymptotics of a Class of Markov Processes Which Are Not In General
Irreducible," Ann. Probab., 16, pp. 1333-1347, and "Ergodicity and Central Limit
Theorems for a Class of Markov Processes," J. Multivariate Analysis, 27, pp. 80-90.
3. For continuous-parameter Markov processes such as diffusions, the following theorem
applies (see R. N. Bhattacharya (1982), "On the Functional Central Limit Theorem
and the Law of the Iterated Logarithm for Markov Processes," Z. Wahrscheinlichkeits-
theorie und Verw. Geb., 60, pp. 185-201).

Theorem T.13.4. (CLT for Continuous-Parameter Markov Processes). Let {X} be a


stationary ergodic Markov process on a metric space S, having right-continuous
sample paths. Let n denote the (stationary) distribution of X and A the infinitesimal
generator of the process on LZ (S,it). If f belongs to the range of A, then
t -1 / 2 fo
f (X,) ds converges in distribution to N(0, a 2 ), with a Z = 2<f, g>

2 f f(x)g(x)ir(dx), where g e -9A and Ag = f.

Proof. It follows from Theorem 2.3 that

Z" = g(X) J 0
Ag(XX) ds = g(X)

J o
f(X) ds (n = 1, 2,...)

is a square integrable martingale. By hypothesis, Z" Z" _ 1 (n 1), is a stationary >,


ergodic sequence of martingale differences. Therefore, Theorem T.13.2 applies and
converges in distribution to N(0, a'), where a 2 = E(Z 1 Z0 2 i.e., ) ,

a z = EIg(X,) g(X 0 ) J 0
1 Ag(XS) ds
j
2 . ( T.13.26)
514 BROWNIAN MOTION AND DIFFUSIONS

But

E(n- ` 12 g(Xn)) 2 = n - 'Eg 2 (X,) -p 0 as n- oc.

Therefore,

+n-1/2 fo" f (XS)ds -^ N(0, a 2 ).

Also, for each positive integer n, {Zk/n - Z(k-1)/n 1 < k < n} are stationary martingale
differences, so that

n
Q 2 = E(ZI 4) 2 = E E(Zk/ n Z(k 1)/n) 2 = nE(Z11n 4)2
k=1

f
/n 2

= nE g(XI1n) - g(Xo) - Ag(X,) ds


o,
I/n z
= nE(g(X I/n ) - g(Xo )) 2 + nE^ g(Xs) ds
0

f
/n
- 2nE (g(XI/n) - g(X0)) Ag(XX) ds . (T.13.27)
o,

Now,

nE(g(X11n) - g(Xo)) 2 = n[Eg 2 (XI/n) + Eg 2 (Xo ) - 2 Eg(X11n)g(Xo)]

= n 2 Eg 2 (Xo) - 2 J g(x)TI/ng(x)it(dx)]
[

(^(' 1/n

= n 2Eg 2 (Xo ) - 2 J g(x) g(x) + TSg(x) ds n(dx)


[ 0

f J
= - 2n g(x)T,Ag(x) ds ir(dx) --o - 2
0, /n
(T.13.28)
g(x)Ag(x)rr(dx),

as n - oo, since T,g -. g in L2 (S, n) as s j 0. Also,

( E
lJn

o
1 2
g(XS) ds I-< E(!
/ n
1
fo
l/"

(g(Xs))2ds) _ - d
n o
E(g)s
2 (Xo) J(' I/n

= 1
n
E(Ag)
2 2 (X0). (T.13.29)

Applying (T.13.28, 29), the product term on the extreme right side of (T.13.27) is
seen to go to zero. Therefore, a 2 = - 2<g, f > + o(1) as n -+ oo, which implies
a 2 = - 2<g, f >. This proves the result for the sequence t = n. But if

Mn: =max{n - 1/2 k I f(X,)I ds: 1 5 k-< n},


ll J k l )

THEORETICAL COMPLEMENTS 515

then

P(M > E) Y P(n -1j2 If(XS)I ds > E 1= nP(f 1 If(X5)I ds > E f) , 0


k=1 fk
- t / 0 /

as n p since

n
G0' If(X)Ids)
.

In order to check that f f belongs to the range of A it is enough to show that


$0 Ts (f f)(x) ds converges in LZ (S, it) as t oo. For if the convergence is to a
function g in L2 , then

(Tag g)(x) = h
n
TS(.% .%)( x) ds ' (f

in L2 as h 10. In other words, g = f 7 Now,

J 0
Ta(f f)(x) ds = I f

os
J
y f( )(p(s; x, dy) n(dy)) ds. (T.13.26)

Therefore, one simple sufficient condition that f f belongs to the range of for
every f e LZ(S, n) is

sup{Ip(s; x, B) n(B)I: Be a(S), x E S} 0

r -dimensional Brownian motion on the disc {1x1 2 a 2 ), k> 1. _<


exponentially fast as s -- oo. This is, for example, the case with the reflecting

An alternative renewal approach may be found in R. N. Bhattacharya and


S. Ramasubramanian (1982), "Recurrence and Ergodicity of Diffusions," J.
Multivariate Analysis, 12, pp. 95-122.
4. It is shown in Bhattacharya (1982), loc. cit., that ergodicity of the Markov process
{X,} under the stationary initial distribution n is equivalent to the null space of A
being the space of constants.

Theoretical Complements to Section V.14


1. To state Proposition 14.1 more precisely, suppose {X,} is a Markov process on a
metric space S having an infinitesimal generator A on a domain -, and transition
operators T, on the Banach space Cb (S) of all real-valued bounded continuous
functions f on S, with the sup norm II.f I). Let cp be a continuous function on S onto
a metric space S'. Let be a class of real-valued, bounded and continuous functions g
on S' having the following two properties:
g then g o ( A' and A(g o q) = h o cp for some bounded continuous hon S';
(i) If g e
(ii) 9 is a determining class for probability measures on (S', M(S')), that is, if , v are
two probability measures such that $ g dp = J g dv for all g e then p = v.
Then {tp(X,)} is a Markov process on S'.
516 BROWNIAN MOTION AND DIFFUSIONS

To see this consider the subspace B of C,(S) comprising all functions g o gyp, g in
the closure of the linear span of 2(f) of W . Then B is a Banach space, and A is the
generator of a semigroup on B by the Hille -Yosida Theorem and property (i). This
implies in particular that T, maps lB into itself, so that T,(g o 4,) is of the form h, o cp
for some bounded continuous h, on S'. Since 5 is a determining class, it follows that
T,(g o cp) is of the form h, o cp for every bounded measurable g on S'. Therefore,
Proposition 6.3 applies, showing that {q (X,)} is a Markov process on S'.

Theoretical Complements to Section V.15


1. The Markov property of {X,} as defined by (15.1) does not require any assumption
on (G. But for the continuity at the boundary of x - T, g(x) := E. g(X) for bounded
continuous g on G, or of the solution /i f of the Dirichlet problem (15.23) for continuous
bounded f on (G, some "smoothness" of aG is needed. The minimal such requirement
is that every point b of G be regular, that is, for every t > 0, Px (t d0 > t) -+ 0 as
x - b, x e G. Here T ai is as in (15.2). A simple sufficient condition for the regularity
of b is that it be Poincare point, that is, there exists a truncated cone contained in
68'`\G with vertex at b. For the case of a standard Brownian motion on R', simple
proofs of these facts may be found in E. B. Dynkin, and A. A. Yushkevich (1969),
Markov Processes: Theorems and Problems, Plenum Press, New York, pp. 51-62.
Analytical proofs for general diffusions may be found in D. Gilbarg and
N. S. Trudinger (1977), Elliptic Partial Differential Equations of Second Order,
Springer- Verlag, New York, p. 196, for the Dirichlet problem, and A. Friedman
(1964), Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood
Cliffs.
2. Let f be a bounded continuous function that vanishes outside G. Define
Tf(x):= E(f(X ( )1 wc ) ( X = x). If b is a regular boundary point then as x -+ b
(x e G), T f (x) -+ 0. In other words, TC c C , where Co is the class of all bounded
continuous functions on G vanishing at the boundary. If f is a twice continuously
differentiable function in C o with compact support then one may show that f e ^,^o,
the domain of the infinitesimal generator A of the semigroup {T}, and that
(Af )(x) _ (A f)(x) for all x e G. For this one needs the estimate P,(T eG <- t) - 0
as t --+ 0, uniformly for all x in the support of f (see Exercise 3.11 of Chapter VII).
Also see (15.24), (15.25), as well as Exercise 3.12 of Chapter VII.
As indicated in Section 7 following (7.12), if aG is assumed regular and p (t; x, y)
has a limit as x approaches G, then this limit must be zero. For the existence of a
smooth p continuous on G, and vanishing on 3G, see A. Friedman, loc. cit., p. 82.
3. (Maximum Principle for Elliptic Equations) Suppose first that Au(x) > 0 for all x e G,
where G is a bounded open set. Let us show that the maximum of u cannot be
attained in the interior. For if it did at x 0 e G, then ((3u/x''') x0 = 0, 1 < i _< k, and
B:= (( 2 u/(3x (') dx (j) )),ro is negative semidefinite. This implies that the eigenvalues of
CB are all real nonpositive, where C:= ((d(x )). For if ) is an eigenvalue of CB,
then CBz = Az for some nonzero z e C k and, using the inner product < , > on C,
0 _< <CBz, Bz> = <Az, Bz> _ A<z, Bz>. Therefore, A _< 0. It follows now that
d ;; (x )(z u/x (' (3 )x 0 = trace of CB is <, 0. Therefore, Au(x 0 ) < 0, contradicting
Au(x 0 ) > 0.
Now consider u such that Au(x) _> 0 for all x e G. For > 0 consider the function
u,(x):= u(x) + e exp{yx^' )}. Then Au(x) = Au(x) + E[Zd l ,(x)y z + "(x)y] exp{yx"^}.
THEORETICAL COMPLEMENTS 517

Choose y sufficiently large that the expression within square brackets is strictly positive
for all x e G. Then Au(x) > 0 for x e G, for all t > 0. Applying the conclusion of the
preceding paragraph conclude that the maximum of u t is attained on l G, say at x^.
Since G is compact, choosing a convergent subsequence of x L as E j 0 through a
sequence, it follows that the maximum of u is attained on EG.
Finally, if Au(x) = 0 for all x e G, then applying the above result to both u and
u one gets: u attains its maximum and minimum on c?G. Note that for the above
proof it is enough to require that '(x), d ;j (x), I _< i, j < k, are continuous on G,
(

and d 4 (x) > 0 on G for some i. Under the additional hypothesis that ((d ;; (x))) is
positive definite on G, one can prove the strong maximum principle: Suppose G is a
bounded and connected open set. 1f Au(x) = 0 for all x e G, and u is continuous on G,
then u cannot attain its maximum or minimum in G , unless u is constant.
The probabilistic argument for the strong maximum principle is illuminating.
Under the conditions (1)(4) of Section 14 it may be shown that the support of the
probability measure P. is the set of all continuous functions on [0, ou) into R', starting
at x (see theoretical complements to Section Vll.3). Therefore, for any ball B x, the
Pr -distribution i(x; dy) of X i 8 has support i)B. Now suppose u is continuous on G
and Au(x) = 0 for all x e G. Suppose u(x 0 ) = maxju(x): x e G} for some x o e G. Then
for every closed ball B e = ;x: Ix x 0 ) _< e} c G, the strong Markov property yields:
u(x o ) = E..u(X,,,,,,), in view of the representation (15.22). (Also see Exercise 3.14 of
Chapter VII ). That is,

u(xo) = u(y)^e(xo; dy), (T.15,1)


J
where 0,(x o ; dy) is the PX -distribution of X. Since u is continuous on iB and the
probability measure 0,(x 0 ; dy) has the full support (,B,, the right side is strictly smaller
than the maximum value of u on B E , namely u(x o ), unless u(y) = u(x 0 ) for all y e i B^.
By letting E vary, this constancy extends to the maximal closed ball with center x 0
contained in G. By connectedness of G, the proof is completed.
4. Let G be a bounded open set such that there exists a twice continuously differentiable
real-valued function cp on R'` such that G = {x: ep(x) < e}, (G = ix: q(x) = c;, for
some c e 18. Assume also that grad cpi is bounded away from zero on iG. If u is a
twice continuously differentiable function in G such that u, i u/<,x 4 ' ) , i 2 u/i)xW i^x^' 4
(I _< i, j < k) have continuous extensions to G, then there exists a twice continuously
differentiable function q on 9 with compact support such that q = u on G (see
Gilbarg and Trudinger, loc. cit., p. 130).

Theoretical Complements to Section V.16


1. D. W. Stroock and S. R. S. Varadhan (1979), Multidimensional Diffusions,
Springer- Verlag, New York, Chapter 10, have constructed diffusions on R'` under the
assumptions (i) ((d(x))) is continuous and positive definite, (ii) jl ( '(x) (I s i _< k) are
Borel-measurable and bounded on compacts, and (iii) an appropriate condition for
nonexplosion holds. These conditions apply under our assumptions in Section 16.
2. The present method of constructing reflecting diffusions in one and multidimensions
may be found in M. Friedlin (1985), Functional Integration and Partial Differential
518 BROWNIAN MOTION AND DIFFUSIONS

Equations, Princeton University Press, Princeton, pp. 86, 87, and in R. N.


Bhattacharya and E. C. Waymire (1989), "An Extension of the Classical Method of
Images for the Construction of Reflecting Diffusions," Proceedings of the R. C. Bose
Symposium on Probability, Statistics and Design of Experiments (ed. R. R. Bahadur,
J. K. Ghosh, K. Sen), Eka Press, Calcutta.
3. The transformation g(x) = x (see (16.9)) generates a group of transformations
G = {g o , g} with g o as the identity. A maximal invariant under G is
q (x) = (Ix" > I, x (2) , .. , x (k) ). By the proposition in theoretical complement 6.2, {(X,)}
is Markov if p(t; z, y) = p(t; x, y). The transition density of the unrestricted diffusion
{X i } in Proposition 16.1 has this invariance property. As further applications of the
proposition, consider the following examples.

Example 1. (Radial Diffusions). Let {X} be a diffusion on ll ' with drift s(.), and
diffusion matrix ((d1 (. ))). If I d;j (x)xl' x'j and 2 E x ( ' ) (i) (x) + E d 1 (x) are radial
) )

functions (i.e., functions of Ix), then {IX,I} is a Markov process. This follows from
the fact that if f is a smooth radial function then Af is also radial (see (15.35), and
theoretical complement 14.1). The underlying group G in this case is the group of
all orthogonal matrices.

Example 2. (Diffusions on the Torus). Let {X 1 } be a diffusion on Ft" such that 1 ` ) (x)
and d ij (x) are invariant under the group G generated by translations by a set of k
linearly independent vectors ^,,, 4 2 , ... , 4k . In other words,

(i)(X + m r r^ _ p (x), dij(x + I mr4r = dij(x) (1 < i..% <, k),


1 1 )

for all x and all k-tuples of integers (m 1 , M21. .. , m k ). Now express x (uniquely) as
S,(x) r , and let 8,(x) = 5 r (x)(mod 1) e [0, 1), 1 -< r -< k. Then the maximal
invariant under G is rp(x):= ($ 1 (x), ... , &(x)), and {cp(X,)} is a Markov process on
the torus [0,1) k .

Theoretical Complements to Section V.17


1. Taylor's article appeared in G. I. Taylor (1953), "Dispersion of Soluble Matter in a
Solvent Flowing Through a Tube," Proc. Roy. Soc. Ser. A, 219, pp. 186-203. It was
shown there that asymptotically for large U0 , D a 2 U/48D0 . The explicit
computation (17.30) was given by R. Aris (1956), "On the Dispersion of a Solute in
a Fluid Flowing Through a Tube," Proc. Roy. Soc. Ser. A, 235, pp. 67-77, using the
method of moments and eigenfunction expansions.
The treatment of the TaylorAris theory given in this section follows R. N.
Bhattacharya and V. K. Gupta (1984), "On the TaylorAris Theory of Solute
Transport in a Capillary," SIAM J. App!. Math., 44, 33-39.
CHAPTER VI

Dynamic Programming and


Stochastic Optimization

1 FINITE-HORIZON OPTIMIZATION

Consider a finite set S, called the state space, a finite set A, called the action
space, and, for each pair (x, a) e S x A, a probability function p(y; x, a) on S:

p(y; x, a) 0, p(y; x, a) = 1. (1.1)


yes

The function p(y; x, a) denotes the probability that the state in the next period
will be y, given that the present state is x and an action a has been taken.
A policy (or, a feasible policy) is a sequence of functions (fo , f,, .. .) defined
as follows. fo is a function on S into A. If the state in period k = 0 is x 0 then
an action f0 (x 0 ) = a o is taken. Given the state x 0 and the action a o = fo(xo),
the state in period k = I is x, with probability p(x,; x 0 ,f0 (x 0 Now fl is a
)).

function on S x A x S into A. Given the triple x o , fo (x o ), x,, an action


.f1(x0,f0(x0), x 1 ) = a l is taken. Given x 0 , a o and the state x i and action a l , the
probability that the state in period k = 2 is x 2 is p(x 2 ; x,, a,). Similarly, f2 is
a function on S x A x S x A x S into A; given x o , a o , x i , a l , x 2 an action
a 2 = f2 (x o , a o , x 1 , a 1 , x 2 is taken. Given all the states and actions up to period
)

k = 2 (namely x0, ao = .fo(xo), xi, ai = fi(xo, ao, x1), x2, a 2 = .f2(x0, ao, xi, ai x2)),
,

the probability that a state x 3 occurs during period k = 3 is p(x 3 ; x 2 , a 2 ), and


so on. In general, a policy is a sequence of functions { fk : k = 0, 1, 2, ...} such
that fk is a function on (S x A)' x S into A (k = 1, 2, ...), with fo a function
on S into A. The sequence may be finite { fk : k = 0, 1, ... , N }, called the
finite-horizon case, or it, may be infinite, the infinite-horizon case.
Consider first the case of a finite horizon. Given a policy f = ( fo, f1, ..
N + 1 random variables X0 , X,,. , XN are defined, Xk being the state at time
..

519
520 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

k. The initial state X0 may be either fixed or randomly chosen according to a


probability in o = {n 0 (x): x e S}. Given Xo = x 0 the conditional probability ,

distribution of X l is given by

P(X1 = xl X0 = x0) = P(x1; x0,.fo(xo)). (1.2)

Given X0 = x 0 , X l = x 1 , the conditional probability distribution of X2 is given


by

P(X2 = I
x2 X0 = x0 , X1 = x1) = P(x2; x1, al), al = f1(xo,f0(x0) , x1).
(1.3)
In general, given X. = x0, Xl = x1, Xk -1 = X k _ 1' the conditional distribution
of Xk is given by

P(Xk = Xk I XO = X0, Xl = X1, ... , Xk- I = xk- 1) = P(xk; Xk- 1, ak- 1), (1.4)
where the actions ak are defined recursively by

ak = fk(xo , a0, x1, al, ... , Xk-1r ak-1 , xk) (k = 1, 2, ... N),
(1.5)
a0 = f0(xo)
The joint distribution of X0 , X 1 , ... , Xk is then given by

10.k(XO, X1, ... , Xk):= P(X0 = X0, X1 = xl, ... , Xk = Xk)

_ iro( x0) Pt x1>x0 , ao)P(X2=x1 , a,)


. ..
P(Xk;Xk-1>ak-1) ,

(k = 1,2,...,N) (1.6)

with the states a 0 , a 1 , ... defined by (1.5). Expectations with respect to n,N
will be denoted by o En .
The kth period reward function is a real-valued function g k on S x A such
that if the state at time k is X k and the action at time k is a k , then a reward
g k (x k , a k ) accrues. The total expected return for a policy f is given by

N
Jao Eaak=0
Z gk(Xk, ak), (1.7)

where a k denotes the (random) action at time k determined by the states


X0 , X 1
, ...X,,_ 1 according to (1.5). In case n o = S write Jx for J
X 0.

Let F denote the set of all policies. The object is to find an optimal policy,
i.e., a policy f* = ( f , ... , f N) such that

J 110 J` o for all f e F, (1.8)

for all initial distributions no.


FINITE-HORIZON OPTIMIZATION 521

In view of the fact that the probability of transition to a state x k+ , in period


k + I depends only on the state X k and action a k in period k, it is plausible that
the search for an optimal policy may be confined to the class of policies
f = ( fo fl , . .. 'f) such that the action a k depends only on the state x k in period
,

k. In other words, fk is a function on S into A (k = 0, 1, ... , N). Such policies


are called Markovian. The reason for this nomenclature is the following
proposition.

Proposition 1.1. If f = ( f,,fl , . .. , fN ) is a Markovian policy, then X0 , X,,


... , XN is a Markov chain (which is, in general, time-inhomogeneous).

Proof. By (1.6) one has

P(XO = x0, X, = x,, ... , XN = xN)


= lr0(x0)P(x,; xo,fO(xO))P(x2; x1,Jl(x,)) .. . P(xN; xN I , fN-1(xN -1))
= 71 o(xo)P1(xl; xo)P2(x2; x1) ...
PN(xN; xN -1), (1.9)

where

Pk (xk; xk 1) = P (xk; xk l , fk 1(xk 1)) ( 1.10)

Hence X0 , X 1 , ... , XN forms a finite Markov chain with successive one-step


transition probabilities P,, P2, ... , PN. n

Let us show that a Markovian optimal policy exists. For each yE S pick an
element f N(y) (perhaps not unique) of A such that

max 9 (Y, a) = 9N(Y,fN(Y)).


, (1.11)
aEA

Write

hN(Y) = 9N(Y9fN(Y)) (1.12)

Next, for each y E S find an element f N - 1 (y) of A such that

max [9N -1(Y> a) + I hN(z)P(z; Y, a)


aEA 2E$ J
= 9N -1(Y,f -1(Y)) + X hN(z)P(z;Y,.IN -1(Y)) = hN -1(Y), (1.13)
ZES
522 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

say. In general, let f k be a function (on S into A) such that

max 9k(Y, a) + I hk+ l(z)P(z; Y, a) = 9k(Y, f


aeA L ZES J (Y)) + Y hk+ 1 (z)p(z;
ZES
Y, fk (Y))

=hk(Y), (k= N 1,N-2,...,0)


(1.14)

say, obtained successively (i.e., by backward recursion) starting from (1.11) and
(1.12).

Theorem 1.2. The Markovian policy f* (f 0* f i . .. , f N) is optimal.


Proof. Fix an initial distribution n o . Let f = (fo , f1 , . .. , fN ) be any given policy.
Define the policies
k
.. . >fN) (k = 1,... ,N),
{' {'

f^ = (fo'f . ..
(1.15)
f(0) = f*

We will show that


Jn o < J f '' (k = 0, 1, 2, ... , N). (1.16)

First note that the joint distribution of Xo , X 1 , ... , Xk is the same under f
and under f (k) (i.e., n, k = it). In particular,

k-1 k-1
ERO 9k'(Xk,, a k .) = E o ' Z g k (Xk ., a k .). (1.17)
k'=0 k'=0

Now,

Erto{9N(XN , aN) I XO = X0, X1 = X 1 , . .. , XN = XN]

= 9N(XN, aN) ' 9N(XNr f *I \\= hN(XN)

= Eao ' 19N(XNr aN) I X O = x0, ... , XN = XN]. (1.18)

Since n.N = n N, averaging over x o , x 1 , . .. , X N , one gets

En09N(XN aN) <, Erto'9N(XN, aN).


, (1.19)

Together with (1.17) with k = N, this proves (1.16) for k = N. To prove (1.16)
for k = N 1, note that

Es,C9N-1(XN-1, aN-1) + 9N(XN, aN) I Xo = xo, . .. , XN-1 = XN-1]

= 9N-1(XN-1 , aN-1)+ Y- 9N(z,aN(Z))P(Z;XN-1,aN-1) (1.20)


zes

FINITE-HORIZON OPTIMIZATION 523

where a N _ 1 , a N (z) are defined by (1.5) with x N replaced by z. By (1.12) and


(1.13), the right side of (1.20) does not exceed

hN-1(xN -1) = En0 'I [ gN-1(XN - 1'aN -1) +gN(XN,aN)

XO = x0 > - ' , XN 1 = xN - 1] ( 1.21)

Since n.N 1 = n N 1, it follows by averaging over x 0 , x 1 , ... , X_ 1 that

Eno(9N_,( XN -1 aN -1)
, + 9N(XN, aN)) <, Erzo 11 (9N-1( XN -1 aN -1) , + 9N(XN, aN))
(1.22)
Together with (1.17) for k = N 1, (1.22) yields (1.16) for k = N 1.
To complete the proof we use backward induction. Assume that

N
r
E no .
9k'(Xk ak') I XO = XO, , Xk +l = Xk+I hk+11/ ll23)
(1.23)
k'=k+1

for all x 0 ,x 1 ,...,x k + , eS and

I
N
En`" .) IX
(X.
9k' k>a k k+l = x k+l h k+l(x k+l )
(1.24)
k'=k+1

Y_
for all X k+ 1 e S. Then

Eno E gk,(Xk., a k ') X0 = x 0 , ... , Xk = xk


k'=k

N
= gk(xk, ak) + Ea o Eao 9k'(Xk', a k ,) I X 0 ,.. . , Xk +1
k'=k+1

XO = X 0 , X i = x 1 , ... , Xk = xk1

gk(xk , ak) + Eno[hk+ 1(Xk+ 1) I XO = X0, . .. , Xk = Xk]


= 9k(xk , ak) + hk+1(z)p(z; Xk, ak) < hk(xk) (1.25)
=ES

The first inequality in (1.25) follows from (1.23), while the last inequality follows
from (1.14). Hence (1.23) holds for k. Also,

En o [gk(Xk, ak) +
hk(xk) = gk(xk, f (xk))

r1kl
+

Y_
1] hk+ I (z)p(z; xk, f (xk))
=ES

= Eno ' [gk(Xk , ak) + hk+ 1(Xk +1) I Xk = Xk]


N

k'=k +I
gk'(Xk', ak') Xk = Xk I (1.26)
524 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

where the last equality follows from the facts that:


(i) conditional distributions of (Xk+ 1, , XN ) given Xk+ 1 are the same
under n N and n N ", and
(ii) one has
N
E E
7[p 1 Rp /
gk'lXk'^ ak') I Xk+l Xk
k'=k+1


E 110 1nol
E E gk'(Xk'^ ak') ( Xk^ Xk+l Xk
[ k'=k+l

N
=t o gk'(Xk,, a k ) Xk . (1.27)
k'=k+1

Hence, (1.24) holds with k + 1 replaced by k, and the induction argument is


complete, so that (1.23), (1.24) hold for all k = 0,1, ... , N 1. Thus, one has
N

%no = E ao y- gk'( Xk' , ak)


k'=0
k-1 N l
E><o gk'(Xk,, ak') + E 0 gk'(Xk', ak') I
k'=0 k'=k

k-1
= E><o(^ 9k'(Xk', Q k )) + Erzp(E 9k (Xk s Qk)
r N

k'=0 / L k'=k Xk])


k-1
E tt0 Y
k'=0
gk'(Xk'^ ak') + EROhk(Xk)

k-1
(Ikl r (Ikl

= E no : gk'(Xk', ak') + Eao hk(Xk)


k'=0

N
(Ikl (Ikl
= En o gk'(Xk., Q k .) = J np . (1.28)
k'=0

Letting k = 0, the desired result is obtained. n

Exactly the same proof works for the more general result below (Exercise 3).

Theorem 1.3. Assume S is countable, A compact metric, and the reward


functions a p g k (x, a) are bounded and continuous for each x a S. Then
f* =(f ,f ...f*)isoptimal.

There are several useful extensions of the problem of optimization over a


finite horizon considered in this section.
First, the set of all possible actions when in state x may depend on x. Denote
THE INFINITE-HORIZON PROBLEM 525

this set by A r (x e S). In this case one simply maximizes over a e A y in


(1.11)(1.14) (Exercises 1, 2).
Secondly, instead of maximizing (rewards), one could minimize (costs). In
all the results above, max is then replaced by min, and inequality signs get
reversed (Exercise I ).
One may often derive existence of optimal policies when S is a continuum,
g k are unbounded, and A is noncompact, if certain special structures are
guaranteed. For example, if S, A are intervals (finite or infinite) and the functions

a ' gk(x, a), gk(x, a) + hk+ 1(z)p(dz; x, a) (0 < k < N 1)


J
have unique maxima in A, then all the results carry over. Here (i) p(dz; x, a)
is, for each pair (x, a), a probability measure on (the Borel sigmafield of) S, (ii)
hk +1(z) is integrable with respect to p(dz; x, a), and its integral is a continuous
function of (x, a). See Section 5 for an interesting example.
Finally, the rewards g k (0 <, k < N) may depend on the state and action in
period k, and the state in period k + 1. This may be reduced to the standard
case considered in this section. Suppose gk(x, a, x') is the reward in period k if
the state and action in period k are x, a, and the state in period k + I is x'.
Then define

gk(x, a) := I gk(x, a, z)p(z; x, a). (1.29)

Then for each policy f,

E rzo
N

k=0
g(X , ak, Xk + 1) = Erzo

= Eno Y_
N

k=0
N

k=0
gk(Xk, a k ).

Thus, the maximization of the first expression in (1.30) is equivalent to that of


the last (see Exercise 2).
I
E(gk(xk, ak, Xk+ 1) X0, , Xk)

(1.30)

2 THE INFINITE-HORIZON PROBLEM

As in Section 1, consider finite state and action spaces S, A, unless otherwise


specified. Assume that

sup Ig k (x, a)I) < oo . (2.1)


k=0 xOS.aEA
526 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

An important example in which (2.1) is satisfied is the case of discounted


rewards g k (x, a) = 8 k g o (x, a) for some discount factor S, 0 < b < 1. Those who
are interested primarily in this special case may go directly to the dynamic
programming principle (appearing after (2.13)) and Theorem 2.2.
In view of (2.1), given any E > 0 there exists an integer N E such that

I gk(xk, ak) j, gk(Xk,


k k ak) < 2
( ) 2.2

for all sequences {x k } in S and {a k } in A. Let (f o*`, f ; `, ... , f) be an


optimal Markovian policy over the finite horizon {0, 1, 2, ... , NE }, and
f*` = (f", f iE, ...) with f = f NE for all k> N. Then f*` is a Markovian
policy that is E-optimal over the infinite horizon, i.e., given any policy f one has

'ino = Eno L gk(Xk, ak)


r = E aoL ( 1 gk(Xk , ak)) + E 0 ( Z gk(Xk, ak))
k=0 k=0 ( k=N,+ 1

N^ E ao
i En o k ^ gk(Xk, ak)J 2 =gk(Xk, a k )) _c=J o _c. (2.3)
i E^^ ( = O

By Cantor's diagonal procedure, one may find a sequence S j.0 and a


sequence of functions { f,*a"} from S into A (k = 0, 1, 2, ...) such that
(Exercise 1)

lim f "(x) = f(x) for all k = 0, 1, 2, ... , and for all x e S, (2.4)
n + ao

for some sequence of functions f,*. Since S and A are finite, (2.4) implies that
f,*, b^(x) = f k (x) for all sufficiently large n. From this it follows that given any
integer N there exists n(N) such that
IN

Eao^ gk(Xkr ak)) = EIto gk(Xk, ak)) for n > n(N). (2.5)
\k=0 \k=0

In particular, for fixed E > 0, (2.5) holds for N = N E . In view of this and (2.2),
one has for all n > n(N).

s N` 1 E Ea ,
JEo k^ gk(Xk, ak) ) 2 = E no ^ gk(Xk, ak)) 2 i J n o E. (2.6)
(k N'

Now apply (2.3) to this to get J`, > Jn 0 S E for all n > n(N), and then let
n cc to obtain

JRO ? J,` 0 E.


THE INFINITE-HORIZON PROBLEM 527

Since this is true for all e, it follows that

Jrz ? J,` o for all f e F. (2.7)

In other words, f* is an optimal Markovian policy.


We have thus proved the following result.

Theorem 2.1. Under the hypothesis (2.1), there exists an optimal Markovian
policy.

We now consider a special case of Theorem 2.1, the case of discounted dynamic
programming. Assume that

gk(x, a) = b k go(x, a) (k = 0, 1, 2, ...), (2.8)

where g o is a given real-valued function on S x A, and S is a constant satisfying

0 < 5 <l. (2.9)

A Markovian policy f = (f0 , fl , ...) is said to be stationary if fk = fo for all


k. It will be shown now that for the reward (2.8) there exists a stationary optimal
policy.
Let f* _ (f o*, f i , ...) be an optimal Markovian policy in this case (which
exists by Theorem 2.1). Write Ez for E 0 and J,, for J 0 in the case n({x }) = 1.
Then,

JI = Ex bkgo(Xk, ak)
k=0

= 90(x, i(x)) + SEzE ^ (5kg0(xk +1, ak+l) 1


k=0 X'J)

= 9o(x, i(x)) + 8 J^'P(z; x,.f(x)), (2.10)

where f" = (f ,*, f z, ...). Since f* is optimal, one has


1
Jf'" < J ` for all z e S. (2.11)

Therefore,

J go(x,f(x)) + a JZ^P(z; x, i(x)) = Jx` (2.12)


zes

where f* (1) = (f *, f *, fi , f *....) But f* is optimal. Therefore, one must have


528 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

equality in (2.12). Similarly, one shows successively that

f in) = 1 01 -11
(n = 1, 2, ...), f *(0) = f* (2.13)

where

f *(n-1):_
(fo,JO, ... IJo,
flI f2l ...),

with fo appearing in the first n positions. By (2.2) it follows that the stationary
policy f* = ( f * ,f *, f *' 'f, ...) is optimal. We shall denote it by f*. This
proves the first part of Theorem 2.2 below. The dynamic programming equation
(2.14) below may be derived by letting k T oo in (1.14) with g k (x, a) = S k (x, a).
However, an alternative derivation of Theorem 2.2 is given below.
Let us describe first the so-called dynamic programming principle. Suppose
there exists a stationary optimal policy f* = ( f *, f *, ...) in the case of
discounted rewards (2.8) with the discount factor S satisfying (2.9). Let J, := Jz'
denote the optimal reward, starting at state x. Fix x e S and a e A. Consider a
Markovian policy f' = ( fo , f *, f *, . ..) for which f0 (x) = a, while the optimal
policy is used from the next period onward. Since the conditional distribution
of {Xk : k > 1}, given X 1 under f' is the same as that under f*,

J' go(x, a) + EX E (j b k go(Xk,.f *(xk)) = go(x, a) + SEX'JxI


L k

k=1 X, /]
= go(x, a) + b 1] J2p(z; x, a).
Z E.S

The optimal reward Jx must equal the maximum of Jz over all choices of
f0 (x) = a E A. Thus, one arrives at the dynamic programming equation (2.14).

Theorem 2.2. Let g o be bounded and 0 < 6 < 1. Then for the discounted
rewards (2.8) there exists a stationary optimal policy f* = ( f *, f *, ...). This
policy satisfies the dynamic programming equation, or the Bellman equation, for
each x e S, namely

JX + = max { 9 o (x, a) + b J f* p(z; x, a)} . ( 2.14)


aES ( zcs

Conversely, if a stationary Markovian policy f* satisfies (2.14) for all x e S,


then it is optimal.

Proof. Denote by B(S) the set of all real-valued functions on S. For each

THE INFINITE-HORIZON PROBLEM 529

function f on S into A, define the maps T, Tf : B(S) --- B(S) by

(TfJ)(x) = 90(x,.f(x)) + 6 1 J(z)p(z; x,f(x)),


ZES

(2.15)
(TJ)(x) = max { 9 o (x, a) + 6 1 J(z)p(z; x, a) } (x ES, J e B(S)).
aEA (( ZES ))

For all J, J' E B(S) one has

II T} J Tf J'll = max ITT J(x) TJ J'(x)I = 6 max


XES XES
IY
ZES
(J(z) J'(z))p(z; x,.f(x))

1<SIIJ J'll, (2.16)

where II I denotes "sup norm", IIgII := max {lg(x)I: x E S }. Turning to T, fix


Je B(S). Let the maximum in (2.15) be attained at a point f (x) (x es). Then,

(TJ)(x) = (T1 J)(x). (2.17)

Also, note that for all J' E B(S) one has

(TJ')(x) > (Tf J')(x) (x e S). (2.18)

Combining (2.17) and (2.18), one gets, for all J',

(TJ)(x) (TJ')(x) = (Tf J)(x) (TJ')(x) <, (Tf J)(x) T1 J'(x). (2.19)

By (2.16) and (2.19),

sup [(TJ)(x) (TJ')(x)] < sup [(TJJ)(x) (TfJ')(x)] < 611 J J'. (2.20)
xeS XES

This inequality being true for all J, J' e B(S), one gets (interchanging the roles
of J and J'),
IITJ TJ'jI <6IIJJ'II. (2.21)

Thus, T is a (uniformly strict) contraction on the space B(S). Therefore, it has


a unique fixed point J *; i.e., TJ* = J* (Exercise 2),

J*(x) = max { 9 o (x, a) + b 1 J *(z)p(z; x, a) } . (2.22)


aE`i l ZES )
Let a = f *(x) maximize the right side of (2.22) (x e S). Then,

J * (x) = 9o(x,f * (x)) + b Z J * (z)p(z; x,.f * (x)) = Tf .J *(x). (2.23)


ZES
530 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

Now let f be an arbitrary policy. In view of (2.22) one has, for all k = 0, 1, 2....
(and all x),

J*(Xk) i g0(Xk, ak) + SE ' (J * (Xk+1) 1 {Xk, ak}). (2.24)

Multiplying both sides by 8 k and taking expectations one gets

S k E f J * (Xk) i BkExgo(Xk, ak) + 6k+IExJ*(Xk+l). (2.25)

Summing (2.24) over k = 0, 1, 2,... and canceling common terms from both
sides it follows that
W
J * (x) i h k f g0(Xk, ak) = J. (2.26)
k=0

Now note that one has equality in (2.24)-(2.26) if f = f = (f *, f *, f *, ...)


(see (2.23)). Hence f* is optimal. This proves that there exists a stationary
optimal policy f* satisfying (2.14). Conversely, if (2.14) holds (i.e., (2.22) and
(2.23) hold) then (2.24)-(2.26) (with equalities in case f = f*) show that f* is
optimal. n

For the map T defined in (2.15), (T N O)(x) is the N-stage optimal reward
function for the discounted case, as may be seen from (1.11)-(1.14) and Theorem
1.2. Hence J*(x) = lim N -.(T N 0)(x) is the optimal reward function in the
infinite-horizon case. The contraction property of T shows that T"'J(x)
converges to the optimal reward function, no matter what J is.
The proof of Theorem 2.2 extends word for word to the following more
general case.

Theorem 2.3. Let S be countable, A a compact metric, g o bounded, and


a -+ g o (x, a) continuous for each x. Then the conclusions of Theorem 2.2 hold.

Further extensions are indicated in the theoretical complements.

Example. Let S = 7L, the set of all integers, A = [a, 1 a] with 0 < a < and

p(x + 1;x,a) = a, p(x 1; x,a) = 1 a for x 0 0,


a a (2.27)
p(0;0,a)= 1 a, p( 1 ; 0 ,a)=2, p(-1;0,a)= 2 .

Take gk (x, a) = b k ^p(x) (0 < b < 1), where co is bounded and even (i.e.,
cp(x) = gyp(x)), and cp(jxl) decreases as lxi increases. Then the hypothesis of
Theorem 2.3 is satisfied. Under a stationary policy f = (f, f, . ..) the stochastic

THE INFINITE-HORIZON PROBLEM 531

process {Xk } is a birth-death chain on 7 with transition probabilities

x ' = P(Xk+1 = X + 1 I Xk = X) = f(x) ,

5 x := P(Xk+ 1 =x 1 I Xk = X) = 1 f(x) (X 0),

f(0) ,
YO' = P(Xk +1 = 1 I Xk = 0) _ O = P(Xk +l = l I Xk = O) =
2

1 o S O = P(Xk +l = O I Xk = O) = I f(0). (2.28)

Since the maximum of cp is at zero, and q(ix ) decreases as lxi increases, one
expects the maximum (discounted) reward to be attained by a policy
f* = (f* f*,.. .) which assigns the highest possible probability to move in the
,

direction of zero. In other words, an optimal policy is perhaps given by

1 a ifx<0,

f *(x) = a if x > 0, (2.29)


a ifx =0.

Let us check that f* is indeed optimal. To simplify notation, write Ex for


E. Assume, as an induction hypothesis, that the following order relations hold.

(i) E x (p(Xk ) > Ex+ 1 p(Xk) for all x > 0,


(ii) E x 9(Xk ) E x+ ,cp(Xk ) for all x 1, (2.30)
(iii) E x cp(Xk ) = E_ x cp(XX ) for all x.

Clearly, (2.30) holds for k = 0 in view of the assumptions on q. Suppose it holds


for some k. Then for x > I one has, by (i),

Exgp(Xk +l) = (1 a)Ex-1(P(Xk) + aEx+1p(Xk) (1 a)Ex(P(Xk) + aEx+2(P(Xk)

= Ex+14(Xk +l), (x > 1). (2.31)

Similarly, using (ii),

Exq (Xk +1) < Ex+1gq(Xk),


, (x'< 1). (2.32)

Also, (iii) is trivially true for all k if x = 0. For x > 0 one has, by (iii),

EX^P(Xk +l) = (1 a)Ex-1gq(Xk) + aEx+1(P(Xk)

= (1 a)E- x +1gq(Xk) + aE_x-1(Xk) = E -x(P(Xk +l). (2.33)


532 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

Finally, using (2.31)-(2.33),

/ a a /
Eogo(Xk+l) = ( 1 a)EO(p(Xk) + 2 E-Igq(Xk) + 2 E1(P( Xk)

_ (1 a)Eo^P(Xk) + aEI(P(Xk)

(1 a)Eo^P(Xk) + aE2(P(Xk) = EIp(Xk+l). (2.34)

By (2.31) and (2.34), the relation (2.30(i)) holds with k replaced by k + 1. By


(2.32), (2.30(ii)) holds for k + 1; in view of (2.33), (2.30(iii)) holds for k + 1.
Hence the relations (2.30) hold for all k >, 0. To complete the proof that f* is
optimal, let us now show that the expected discounted reward J x under f' given
by
00

Jx := I, 6k E.(V(Xk) (2.35)
k=0

satisfies the dynamic programming equation (2.14). That is, one needs to show

ix =cp(x)+6max[(1 a)JJ _ I +aJx+t ] (x00),


aEA
(2.36)
J0 =rp(0)+6max (1 a)Jo +2J_ I +aJ, ] .
[

Now, by (2.30(i)),

Jx_I'>Jx+I forx>,l,

so that, assigning the largest possible weight to the larger quantity J._ I ,

max [(1 a)Jx _ 1 + aJx+l] = (1 a)Jx _ 1 + aJ+l, (x >, 1). (2.37)


aeA

But Jx satisfies the equation


ao
Jx = q (x) + b bk-'Exp(Xk)
k=1

(P(x) + (5 ak-1 [(1 a)E x-l^(Xk-1) + aEx+l9(Xk-1)]


k=I

= q(x) + S[(1 a)Js _ I + aJx+1], (x ? 1). (2.38)

Hence, Jx satisfies (2.36) for x >, 1. In exactly the same way one shows, using
(2.30(ii)), that J. satisfies (2.36) for x < 1. Finally, since J-, = J 1 by (2.30(iii)),

OPTIMAL CONTROL OF DIFFUSIONS 533

and .J0 ?J 1 (= J_ 1),

_, + --J,, (2.39)
max (1 a)Jo +2J_, +-J 1 ]= (1 a)Jo +2J
acA [
2

assigning the largest possible weight to J0 for the last equality. The proof of
(2.36), and therefore of the optimality of f *, is now complete.
In general, even for the infinite-horizon discounted case, it is difficult to
compute explicitly optimal policies and optimal rewards. The approximations
T"0, and the corresponding N -period optimal policy given by backward
recursion (1.11)(1.14), are therefore useful.

3 OPTIMAL CONTROL OF DIFFUSIONS

Let G be an open set c l, and consider the problem of minimizing the expected
penalty E X f (X.. (; ) on reaching the boundary 0G, starting from a point x outside
of G= G v 1G. Here {X,} is a diffusion on Fk" \ G, with absorption at 1G. The
function f is a continuous bounded function on 1G. If f = I on 1G, the problem
is to minimize the probability of ever reaching 13G. As usual TG denotes the
first time to reach the boundary 1G.
The drift vector (x, c) and diffusion matrix D(x, c) of the diffusion are
specified. The control variable c, which may depend on the present state x,
takes values in some compact set C c 18m The function c(.) (on W'\G into C)
so chosen is called a control, or a feedback control. The class of allowable, or
feasible, controls may vary. For example, it may be the class of all measurable
functions, the class of all continuous functions, the class of all continuously
differentiable functions, etc.
Let us give a heuristic derivation of the dynamic programming equation, by
means of the so-called dynamic programming principle. Write Ex for the
expectation when the diffusion has coefficients i(., c), D(., c) with c constant,
and xis the initial state. If c(. ) is a control (function), this expectation is denoted
EX^ , the diffusion having drift and diffusion coefficients (y, c(y)), D(y, c(y)).
-)

Write

Je(x) '= Ex i(X r ^), J` (x) = EX ' f (Xr.G), J(x)t= inf J(x), (3.1)
() ( )

c(-

where the infimum is over the class of all feasible controls. Suppose there exists
an optimal control c*(.). Consider the following small perturbation c l () from
the optimal policy. c l (.) takes the constant value c for an initial period [0, t],
after which it switches to the optimal control c *(. ). This control is of course
time-dependent and not feasible under our description above. But one may
actually allow time-dependent controls in the present development (theoretical
534 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

complement 1). One has, by the Markov property,

J" ' (x) = EXJ(X,) + o(t), ( ) as t .j 0. (3.2)

Note that if after time t the state is X, e Il k \ G, then the conditional expectation
of f (Xt G ) is J(X,), as the optimal control c*(.) is now in effect. Also, the
probability that t -> 'r OG is o(t) for x e Il k \ G (see Exercise 1 for an indication
of this). As J" ' (x) > J(x), one gets
( )

EJ(X 1 ) J(x) > o(t), as t j 0. (3.3)

But the left side is TIJ(x) J(x), where {T',} is the semigroup of transition
operators generated by
2
a
x^'^axu^ + Y pl'>(x, c) a (3.4)
A` : 2 d " (x ' c)

with absorbing boundary condition on 8G. Hence, dividing both sides of (3.3)
by t and letting t f, 0, one gets
A,J(x) >, 0. (3.5)

By minimizing the left side of (3.3) with respect to c, one expects an equality
in (3.3) and, therefore, in (3.5); i.e., the dynamic programming equation is

inf A,J(x) = 0 for XE R'\G, J(x) = f(x) for x E 8G. (3.6)


cec

We will illustrate the use of this rigorously in Example 1 below. Before this
is undertaken, a simple proposition in the case k = 1, f - 1, directly shows how
to minimize the expected penalty when the diffusion coefficient is fixed. Let
p ) () (i = 1, 2) be two drift functions, Q 2 (.) > 0 a diffusion coefficient specified
on (0, oo), satisfying appropriate smoothness conditions. Let i/i(z) denote the
probability that a diffusion with coefficients p ( O( ), 6 2 (.) reaches 0 before
reaching d, starting at z. Here d > 0 is a fixed number.
It is intuitively clear that, for the same diffusion coefficient a 2(.), larger the
drift tc(.) better the chance of staying away from zero. Here is a proof.

Proposition 3.1. Consider two diffusions on [0, oo ), with absorption at 0,


having a common diffusion coefficient a'() but different drifts p() (i = 1, 2)
satisfying p ( '>(z)'< j 2 ^(z) for every z > 0. Then /i 2 (z) 1(z) for all z > 0.

Proof. By relation (9.19) in Chapter V,


= a ( fu 2ct_ (y) 2 11()(y)
^i^`^(z) xp{ (
e du exp dy du. (3.7)
fZ o a 2 (y) } 40, fo" a2(y)
OPTIMAL CONTROL OF DIFFUSIONS 535

Write

`I( z ) pti^(z),
^(z):= y i (z) =t' (z) + srl(z),

and let F(r; z) denote the probability of reaching 0 before d starting at z, for a
diffusion with drift coefficient E ( ) and diffusion coefficient o2(.). It is
straightforward to check (Exercise 2) that

F(a; z) = /it i ^(z)(1 ey(z)) + O(s 2 ) as e 0, (3.8)

where
a
y(z):=
J
f exp
o
^ ti ^
2
(Y) rl(Y)
2 dy
6 (Y) 0
22
6 (Y)
dy du
I
exp
0
2 pta^(
6
Y)
z dy du.
(Y}
z(3.9)
Since rl(z) >, 0 for all z, it follows that

y(z) >0 (3.10)

unless t' (.) =


) t 2) ( ). Therefore, barring the case of identical drifts, one has

de F(e; z) _ ^itn(z)Y(z) < 0. (3.11)


e=0

Hence, F(c; z) is strictly decreasing in e in a neighborhood of a = 0, say on


[0, e o ). Since one can continue beyond s o , by replacing t l) () by j(.), the
supremum of all such e o is oo. In particular, one can take e = 1. n

Example 1. (Survival Under Uncertainty). Consider first the following


discrete-time model, in which h denotes the length of time between successive
periods. Let Z, denote an agent's capital at time t. His capital at time t + h is
given by

Zo =z >0,
(3.12)
Z, =exp {W +n }(Z,ah), (t = nh; n=0,1,...)

where a is a positive constant and { W h : n = 1, 2, ...} is a sequence of i.i.d.


Gaussian random variables with mean mh and variance vh. The constant a
denotes a fixed consumption rate per unit of time, so that Z, ah is the amount
available for investment in the next period. The random exponential term
represents the uncertain rate of return. For the Markov process {Z n h }, it is easy
536 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

to check (Exercise 3) that

E(Z, +h ( Z,) = exp{mh + Zvh}(Z, ah),


E(Zt+ ^, I Z,) = exp{2mh + 2vh}(Z1 ah)',
E(Z, + Z,IZ,)=[(m+2v)Z,a]h+o(h), (3.13)

E((Zj+h Z,) 2 ( Z,) = vZI h + o(h),


E(IZ,+n Z t 1 3 Zr )=o(h), as h j0.

Therefore, one may, in the limit as h j 0, model Z, by a diffusion with drift t(.)
and diffusion coefficient Q 2 (.) given by

p(z) = (m + -v)z a, a2(z) = vz 2 . (3.14)

The state space of this diffusion should be taken to be [0, cc), since negative
capital is not allowed. One takes "0" as an absorbing state, which when reached
indicates the agent's economic ruin.
In the discrete-time model, suppose that the agent is free to choose (m, v)
for the next period depending on the capital in the current period. This choice
is restricted to a set C x (0, cc). In the diffusion approximation, this
amounts to a choice of drift and diffusion coefficients,

(z) = (m(z) + iv(z))z a, o2(z) = v(z)z 2 , (3.15)

such that
(m(z), v(z)) E C for every z e (0, co). (3.16)

The object is to choose m(z), v(z) in such a way as to minimize the probability
of ruin. The following assumptions on C are needed:

(i) 0 < v * := inf{v: (m, v) C} < v* := sup{v: (m, v) E C} < oo,


(ii) f (v) _= sup{m: (m, v) e C} < co, and (f (v), v) E C for each v e [v * , v*] ,
(iii) v -+ f(v) is twice continuously differentiable and concave on (v * , v*),
(iv) lim l ^,, f'(v) = cc, Iim t v. f'(v) = cc. (3.17)

First fix a d > 0. Suppose that the agent wants to quit while ahead, i.e., when
a capital d is reached. The goal is to maximize the probability of reaching d
before 0. This is equivalent to minimizing the probability O(z) of reaching 0
before d, starting at z.
It follows from Proposition 3.1 that an optimal choice of (m(z), v(z)) should
be of the form

(m(z) = f(v(z)), v(z))- (3.18)


OPTIMAL CONTROL OF DIFFUSIONS 537

The problem of optimization has now been reduced to a choice of v(z). The
dynamic programming equation for this choice is contained in the following
proposition. To state it define, for each constant v E [v * , v*], the infinitesimal
generator A by

(A,g)(z):= zvz 2 g"(z) + {(f(v) + zv)z a}g'(z). (3.19)

For a given measurable function v(.) on (0, cc) into [v * , v *] define the
infinitesimal generator A v( . ) by

(A V( . ) g)(z)-= zv(z)z 2 g"(z) + {(f(v(z)) + 1 v(z))z a}g'(z). (3.20)

Fix d > 0. Let /i,, ( . ) (z) denote the probability that a diffusion having generator
A, ( . ) , starting at z e [0, d], reaches 0 before d. For simplicity consider v(.) to
be differentiable, although the arguments are valid for all measurable v(.) (see
theoretical complement 2). Write

^ (z):= inf i , ( . ) (z). (3.21)

Proposition 3.2. Assume the hypothesis (3.17) on C. Then the dynamic


programming equation

min A,,O(z) =0 for0<z<d; lim^i(z) =1, lim^i(z) =0,


VE I V,R . V * 1 Z10 _ td
(3.22)
has a solution that is the minimal probability of ruin . For each z E (0, d)
there exists a unique 6(z) E [v * , v*] such that the minimum is attained at v = v(z).
The function v is strictly decreasing on (0, cc) and is optimal.

Proof. In the case v * = v*, Proposition 3.1 already provides the optimum
choice. We therefore assume v * < v *. Let li(z) be a twice continuously
differentiable function satisfying (3.22), and v() a (measurable) function such
that the minimum in (3.22) is attained at v = v(z). Then,

A f (z) = 0 for 0 < z < d; lim s(z) = 1, lim ii(z) = 0. (3.23)


Z10 zta

Then (see (3.20)),

z2 r (z) _ 17(z) {(f(v(z)) + iv(z))z a}'(z). (3.24)

Using this in (3.22), one gets

min [ v {(f(v(z)) + zv(z))z a}/v(z) + (f(v) + zv)z a]^i'(z) = 0. (3.25)


VE IV..V ;j

538 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

It follows from (3.23), since its solution is unique (Exercise 7.7(iii), (iv) of Chapter
V), that i(z) is the probability of reaching zero before d, starting at z, for the
diffusion generated by A, ( . ) . In particular, i'(z) < 0. One may actually explicitly
calculate i'(z) from (9.19) in Chapter V. Therefore, to determine v(z) we find
the critical point(s) of the expression within the square brackets in (3.25) by
setting its derivative equal to zero. This leads to

(i(v(z)) + Zv(z))z a = (i
(v) + z)z, (3.26)
v(z)

or,

.f(v(z)) v(z)f'(v) = a
z

Since this must hold for v = v(z), the equation to solve is

f(v) vf'(v) = a. (3.27)


z

As the derivative of the left side is v f "(v) > 0, the left side increases from
oo to + co as v increases from v * to v*. Therefore, for each z > 0 there exists
a unique solution of (3.27), say v = v(z) e (v * , v*). Since the right side of (3.27)
decreases as z increases, it follows that the function v() is strictly decreasing.
It remains to show that cui is minimal. For this, consider an arbitrary
(measurable) choice v(.). It follows from (3.22) that

(A V( . ) I4')(z) >, 0 for all z e (0, d). (3.28)

Since (A V( . ) ct' V( . ) )(z) = 0 and >/i and ^i v( .^ satisfy the same boundary conditions,
letting u(z):= ii(z) 0,,. (z), one gets

(A, ( . ) u)(z) > 0 for all z e (0, d), lim u(z) = 0, lim u(z) = 0. (3.29)
zj0 zld

By the maximum principle (see Exercise 7.7(iii) of Chapter V) it follows that


u(z) <, 0 for all z, i.e.,

^i(z) < 0 . (z)


'( )
for all z E [0, d]. (3.30)
n

It is important to note that the optimal choice

(Y)'= (f( '(Y)) + z'(y))y a, i(y), (3.31)


OPTIMAL CONTROL OF DIFFUSIONS 539

does not depend on d. Therefore, with this choice one also minimizes the
probability p Zo of ever reaching 0, starting from z. For this, simply let d j oo in
the inequality
(.(.)(z) <' ^1l ((z). (3.32)

An example of a C satisfying (3.17) is the following set bounded by the lines


v = v o , and a semi-elliptical cap (bounding m away from + co),

(v v0)2 1 / 2
C ={(m,v):vo^v'<vo+,m-mo+a^l (3.33)

Here a, ,v o are positive constants, v o > , and m o is an arbitrary constant.


Next consider the problem of maximizing the expected discounted reward

J(x) ;= E c( ' ) e s r(XS, c(XS)) ds, (3.34)


0

where the state space of the diffusion {X 1 } is an open set G c W. The diffusion
has drift (y, c(y)) and diffusion matrix D(y, c(y)); r(y, c) is the reward rate
when in state y and an action c belonging to a compact set C c 11" is taken,
and > 0 is the discount rate.
The given functions ph i) (y, c) (1 < i < k) are appropriately smooth on G x C,
as are d(y, c) (1 < i, j < k), and D(y, c) is positive definite. Once again, the set
of feasible controls c(.) may vary. For example, it may be the class of all
continuously differentiable functions on G into C. The function r(y, c) is
continuous and bounded on G x C.
If, under some feasible control c(. ), the diffusion {X,} can reach G with
positive probability, then in (3.34) the upper limit of integration should be
replaced by rO G Let
J(x):= sup J`O(x). (3.35)
a.)
In order to derive the dynamic programming equation informally, let c *(.) be
an optimal control. Consider a control c 1 (.) which, starting at x, takes the
action c initially over the period [0, t] and from then on uses the optimal control
c *(. ). Then, as in the derivation of (3.2), (3.3),

J(x) = E`X o e -a sr(X S , c) ds + EXr )


5 ^"
e -Qs r(Xs,

-s
c(XS)) ds

= E`
fo, e sr(X,, c) ds + EX^'Ie `
-

f 'o
e ri r(X^ +s, c(Xr +s)) ds'

= E` o e - sr(X s , c) ds + e - Q`ExJ(X,)
J
= t(r(x, c)) + o(t) + e - `T,J(x) < J(x),
540 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

or,

(1 en') T,J(x) J(x)


J(x) > r(x, c) + e-' + o(1). (3.36)
t t

Letting t l 0 in (3.36) one arrives at

J(x) > r(x, c) + A c J(x). (3.37)

One expects equality if the right side of (3.37) (or (3.36)) is maximized with
respect to c, leading to the dynamic programming equation

J(x) = sup {r(x, c) + A,J(x)). (3.38)


Cec

In the example below, k = 1.

Example 2. The state space is G = (0, oo). Let c denote the consumption rate
of an economic agent, i.e., the "fraction" of stock consumed per unit time. The
utility, or reward rate, is
c) = (cx)1
r(x, (3.39)
y

where y is a constant, 0 < y < 1. The stock X, corresponding to a constant


control c is a diffusion with drift and diffusion coefficients

(x, c) = Sx cx, o 2 (x, c) = v 2 x 2 , ( 3.40)

where 6 > 1 is a given constant representing growth rate of capital, while c is


the depletion rate due to consumption. For any given feasible control c() one
replaces c by c(x) in (3.40), giving rise to a corresponding diffusion {X1 }. It is
simple to check that such a diffusion never reaches the boundary (Exercise 4).
The dynamic programming equation (3.38) becomes

J(x) = max x' + zQ 2 x 2 J"(x) + (Sx cx)J'(x) ? . (3.41)


CE[0,1] y

Let us for the moment assume that the maximum in (3.41) is attained in the
interior so that, differentiating the expression within braces with respect to c
one gets
cY-'xY = xJ'(x),
or,

c*(x) = 1 (J'(x))' icy


x
-1)
. ( 3.42)
OPTIMAL CONTROL OF DIFFUSIONS 541

Since the function within braces in (3.41) is strictly concave it follows that (3.42)
is the unique maximum, provided it lies in (0, 1). Substituting (3.42) in (3.41)
one gets
262x2)(.x) J(x) = 0. (3.43)
+ SxJ'(x) I
\ Y/

Try
Try the "trial solution
J(x) = dxv. (3.44)

Then (3.43) becomes, as x 1 is a factor in (3.43),

-1) d = 0,
a 2 dy(v I) + dy (1 I I (dv)rrw (3.45)
vl
i.e.,
1 1 y 1-v
d = -- - (3.46)
v + .a 2 v(l y) ay

Hence, from (3.42), (3.44), and (3.46),

72
ii^ v^ = + Y( 1 =r SY . )
(3.47)
c * := c * (x) = (yd) 1
I y

Thus c*(x) is independent of x. Of course one needs to assume here that this
constant is positive, since a zero value leads to the minimum expected discounted
reward zero. Hence,
+ a 2 y(1 y) by > 0. (3.48)

For feasibility, one must also require that c* < 1; i.e., if the expression on the
extreme right in (3.47) is greater than 1, then the maximum in (3.41) is
attained always at c* = 1 (Exercise 5). Therefore,

c* = max {1, d' }, (3.49)

where d' is the extreme right side of (3.47). We have arrived at the following.

Proposition 3.3. Assume 6 > land (3.48). Then the control c*() = min{ 1, d'}
is optimal in the class of all continuously differentiable controls.

Proof. It has been shown above that c*(. ) is the unique solution to the dynamic
programming equation (3.38) with J = J` * . Let c() be any continuously
differentiable control. Then it is simple to check (see Exercise 6) that

J` ( ) (x) = r(x, c(x)) + A c( - ) jc '(x).


(. (3.50)

542 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

Using (3.38) with J = J one then has for the function

h(x) := J(x) J` ' (x),


( ) (3.51)

the inequality

h(x) = r(x, c*) + A,.J(x) {r(x, c(x)) + A C( . ) J` ' (x)} ( )

r(x, c(x)) + A C( . ) J(x) {r(x, c(x)) + A C( . ) J` ' (x)} = A C( . ) h(x), (3.52)


( )

or,

g(x) := ( A C( . ) )h(x) >, 0. (3.53)

Since the nonnegative function

hi(x):= EX )(J e g(XS)ds), (3.54)


0 s

satisfies the equation (Exercise 6)

( A.^)h1(x) = g(x), (3.55)

one has, assuming uniqueness of the solution to (3.55) (see theoretical


complement 3), h(x) = h l (x). Therefore, h(x) >, 0. n

Under the optimal policy c* the growth of the capital (or stock) X, is that
of a diffusion on (0, co) with drift and diffusion coefficients

(x):= ( c*)x, u'(x):= u 2 x 2 . (3.56)

By a change of variables (see Example 3.1 of Chapter V), the process


{ Y := log X} is a Brownian motion on O' with drift 6 c* ZQ Z and diffusion
coefficient a'. Thus, {X,} is a geometric Brownian motion.

4 OPTIMAL STOPPING AND THE SECRETARY PROBLEM

On a probability space (S2, .`, P) an increasing sequence {.: n >, 0} of sub-


sigmafields of are given. For example, one may have 3 = a{Xo , X1 , ... , X},
where {X} is a sequence of random variables. Also given are real-valued
integrable random variables {Y,,: n > 0}, Y being .-measurable. The objective
is to find a {}-stopping time r,* that minimizes EYT in the class , of all
{.y}-stopping times T < m. One may think of Y as the loss incurred by stopping
at time n, and m as the maximum number of observations allowed.

OPTIMAL STOPPING AND THE SECRETARY PROBLEM 543

This problem is solved by backward recursion in much the same way as the
finite-horizon dynamic programming problem was solved in Section 1.
We first give a somewhat heuristic derivation of rm. If S = . . . ., X}
(n >, 0), and X0 , X 1 ,. . . , Xm _ 1 have been observed, then by stopping at time
m 1 the loss incurred would be Ym _ 1 . On the other hand, if one decided to
continue sampling then the loss would be Ym . But Ym is not known yet, since
Xm has not been observed at the time the decision is made to stop or not to
stop sampling. The (conditional) expected value of Ym , given X0, ... , Xm_1,
must then be compared to Ym _ 1 . In other words, rm is given, on the set
{im >,m l}, by

2* _Im 1 if Ym_1 < E(Ym I Fm-1 ),


(4.1 )
m m if Ym -1 > E(Ym I `gym - 1)

As a consequence of such a stopping rule, one's expected loss, given


{X0 ,...,Xm I },is

Vm _, :=min{Ym _ 1 , E(Ym I `gym -1)} on {t, >, m 1 }. (4.2)

Similarly, suppose one has already observed Xo , X 1 , ... , Xm _ 2 (so that


r m 2). Then one should continue sampling only if Ym _ 2 is greater than
the conditional expectation (given {X0 , . .. , Xm _ 2 }) of the loss that would result
from continued sampling. That is,

T m 2 if Ym -2 < E(Vm -1 I .m-2),


m (4.3)
>m I if Ym -2 > E( Vm -t I'Fm -2) on {Tm i m 2 }.

The conditional expected loss, given {Xo , ... , Xm _ 2 }, is then

Vm - 2 :=min{Ym _ 2 , E(Vm -1 I'm -2)} on {t, i m 2 }. (4.4)

Proceeding backward in this manner one finally arrives at

0 if Yo ` E(V1 I .Fo),
(4.5)
m >1 ifYo>E(Vil^o) on{t0} =52.

The conditional expectation of the loss, given .moo , is then

Vo := min{ Yo , E(V1 )} . (4.6)

More precisely, V, are defined by backward recursion,

Vm :=Ym , Vj:=min{Y^,E(Vj 1 I.3j )} (j = m-1,m-2,...,0), (4.7)


544 DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION

and the stopping time Tm is defined by

r:= min{ j: 0 < j < m, }, = Vj }. (4.8)

Although the optimality of τ*_m is intuitively clear, a formal proof is worthwhile.
First we need the following extension of the Optional Stopping Theorem
(Chapter I, Theorem 13.3). A sequence {S_j: j ≥ 0} is an {F_j}-submartingale if
E(S_j | F_{j−1}) ≥ S_{j−1} a.s. for all j ≥ 1.

Theorem 4.1. (Optional Stopping Theorem for Submartingales). Let {S_j: j ≥ 0}
be an {F_j}-submartingale, τ a stopping time such that (i) P(τ < ∞) = 1, (ii)
E|S_τ| < ∞, and (iii) lim_{n→∞} E(S_n 1_{τ>n}) = 0. Then

ES_τ ≥ ES_0,

with equality in the case {S_j} is an {F_j}-martingale.

Proof. The proof is almost the same as that of Theorem 13.1 of Chapter I if
we write X_j := S_j − S_{j−1} (j ≥ 1), X_0 = S_0, and note that E(X_j | F_{j−1}) ≥ 0 and
accordingly change (13.8) of Chapter I to

E(X_j 1_{τ≥j}) = E[1_{τ≥j} E(X_j | F_{j−1})] ≥ 0,

and replace the equalities in (13.9) and (13.11) by the corresponding inequalities.
∎
The main theorem of this section may now be proved. Theorem 4.1 is needed
for the proof of part (c), and we use it only for τ ≤ m.

Theorem 4.2. (a) The sequence {V_j: 0 ≤ j ≤ m} is an {F_j}-submartingale.
(b) The sequence {V_{τ*_m ∧ j}: 0 ≤ j ≤ m} is an {F_j}-martingale. (c) One has

E(Y_τ) ≥ E(V_0)  ∀τ ∈ 𝒯_m;   E(Y_{τ*_m}) = E(V_0).   (4.9)

That is, τ*_m is optimal in the class 𝒯_m.

Proof. (a) By (4.7), V_j ≤ E(V_{j+1} | F_j).

(b) Let Z be an arbitrary F_j-measurable bounded real-valued random
variable. We need to prove (see Chapter 0, relation (4.14))

E(Z V_{τ*_m ∧ j}) = E(Z V_{τ*_m ∧ (j+1)})   (0 ≤ j ≤ m − 1).   (4.10)

For this, write

E(Z V_{τ*_m ∧ j}) = E(Z V_{τ*_m ∧ j} 1_{τ*_m ≤ j}) + E(Z V_j 1_{τ*_m > j}).   (4.11)

But, on {τ*_m > j}, V_j = E(V_{j+1} | F_j). Also, {τ*_m > j} ∈ F_j. Therefore,

E(Z V_j 1_{τ*_m > j}) = E(Z 1_{τ*_m > j} E(V_{j+1} | F_j))
= E(Z V_{j+1} 1_{τ*_m > j}) = E(Z V_{τ*_m ∧ (j+1)} 1_{τ*_m > j}).   (4.12)

Using (4.12) in (4.11) one gets (4.10), since τ*_m ∧ j = τ*_m ∧ (j + 1) on {τ*_m ≤ j}.

(c) Let τ ∈ 𝒯_m. Since Y_j ≥ V_j for all j (see (4.7)), one has Y_τ ≥ V_τ. By (4.7),
Theorem 4.1, and the submartingale property of {V_j} it now follows that

E(Y_τ) ≥ E(V_τ) ≥ E(V_0).   (4.13)

This gives the first relation in (4.9). The second relation in (4.9) follows from the
martingale property of {V_{τ*_m ∧ j}} (and Theorem 4.1). Note Y_{τ*_m} = V_{τ*_m}. ∎

In the minimization of EY_τ over 𝒯_m, Y_n need not be F_n-measurable. In such
cases one may replace Y_n by E(Y_n | F_n) = U_n, say, and note that, for every τ ∈ 𝒯_m,

E(Y_τ) = Σ_{j=0}^m E(Y_j 1_{τ=j}) = Σ_{j=0}^m E[E(Y_j 1_{τ=j} | F_j)]
= Σ_{j=0}^m E[1_{τ=j} U_j] = EU_τ.

Hence the minimization of EY_τ reduces to the minimization of EU_τ over 𝒯_m.
Also, instead of minimization one could as easily maximize EY_τ over 𝒯_m.
Simply replace min by max in (4.7), and replace "≥" by "≤" in (4.9). This is
the case in the following example, in which the indexing of the random variables
and sigmafields starts at j = 1 (instead of j = 0).

Example 1. (The Secretary Problem, or the Search for the Best). Let m ≥ 1
labels carrying m distinct numbers be in a box. A person takes out one label
at a time at random, observes the number on it, and sets it aside before the
next draw. This continues until he stops after the τth draw. The objective is to
maximize the probability that this τth number is the maximum M in the box.
Thus, if W_j = 1 or 0 according as the jth number X_j drawn is the maximum or
not, one would like to maximize EW_τ over the class 𝒯_m of all stopping times
τ (≤ m) relative to the sigmafields F_j := σ{X_1, ..., X_j}, 1 ≤ j ≤ m.

Define the F_j-measurable random variables

Y_j := E(W_j | F_j) = P(X_j = M | F_j).   (4.14)

If X_j is the maximum, say M_j, among X_1, ..., X_j, then the conditional probability
(given X_1, ..., X_j) that it is the maximum in the whole box is j/m; if X_j < M_j
then of course this conditional probability is zero. Therefore,

Y_j = (j/m) 1_{X_j = M_j}.   (4.15)


Also, for every τ ∈ 𝒯_m, as explained above,

EY_τ = Σ_{j=1}^m E(Y_j 1_{τ=j}) = Σ_{j=1}^m E[E(W_j | F_j) 1_{τ=j}]
= Σ_{j=1}^m E[E(1_{τ=j} W_j | F_j)] = Σ_{j=1}^m E(1_{τ=j} W_j) = EW_τ.

Hence the maximum of EW_τ over 𝒯_m is also the maximum of EY_τ over 𝒯_m. In
order to use Theorem 4.2 with min replaced by max in (4.7) and "≥" replaced
by "≤" in (4.9), we need to calculate V_j (1 ≤ j ≤ m). Now V_m = Y_m and (see
Eq. 4.15)

E(Y_m | F_{m−1}) = P(X_m = M_m | F_{m−1}) = 1/m   (note that M_m = M).

Since (m − 1)/m > 1/m, it then follows that

V_{m−1} = max{Y_{m−1}, E(Y_m | F_{m−1})}
= ((m−1)/m) 1_{X_{m−1} = M_{m−1}} + (1/m) 1_{X_{m−1} < M_{m−1}}.

Then,

E(V_{m−1} | F_{m−2}) = ((m−1)/m) P(X_{m−1} = M_{m−1} | F_{m−2}) + (1/m) P(X_{m−1} < M_{m−1} | F_{m−2})
= ((m−1)/m)(1/(m−1)) + (1/m)((m−2)/(m−1))
= ((m−2)/m)(1/(m−2) + 1/(m−1)).   (4.16)

To evaluate

V_{m−2} = max{Y_{m−2}, E(V_{m−1} | F_{m−2})},

note that if 1/(m−2) + 1/(m−1) ≤ 1 then (see Eqs. 4.14 and 4.16)

V_{m−2} = ((m−2)/m) 1_{X_{m−2} = M_{m−2}}
+ ((m−2)/m)(1/(m−2) + 1/(m−1)) 1_{X_{m−2} < M_{m−2}},   (4.17)

so that a calculation akin to (4.16) yields

E(V_{m−2} | F_{m−3}) = ((m−3)/m)(1/(m−3) + 1/(m−2) + 1/(m−1)).


Assume, as a backward induction hypothesis, that

E(V_{j+1} | F_j) = (j/m)(1/j + 1/(j+1) + ··· + 1/(m−1))   (4.18)

for some j such that

a_j := 1/j + 1/(j+1) + ··· + 1/(m−1) ≤ 1.   (4.19)

Then,

V_j = max{Y_j, E(V_{j+1} | F_j)} = max{(j/m) 1_{X_j = M_j}, (j/m) a_j}
= (j/m) 1_{X_j = M_j} + (j/m) a_j 1_{X_j < M_j},

and, since P(X_j = M_j | F_{j−1}) = 1/j, it follows that

E(V_j | F_{j−1}) = ((j−1)/m)(1/(j−1) + 1/j + ··· + 1/(m−1)) = ((j−1)/m) a_{j−1}.

The induction is complete, i.e., (4.18) holds for all j ≥ j*, where

j* := max{j: 1 ≤ j ≤ m, a_j > 1}.   (4.20)

In other words, one gets

E(V_{j+1} | F_j) = (j/m) a_j   for j* ≤ j < m.   (4.21)

Also,

V_{j*} = max{Y_{j*}, E(V_{j*+1} | F_{j*})} = (j*/m) a_{j*},   (4.22)

since a_{j*} > 1. In particular, V_{j*} is nonrandom, which implies

E(V_{j*} | F_{j*−1}) = (j*/m) a_{j*},

which in turn leads to

V_{j*−1} = max{Y_{j*−1}, E(V_{j*} | F_{j*−1})} = (j*/m) a_{j*}.

Continuing in this manner one gets

V_j = (j*/m) a_{j*}   for 1 ≤ j ≤ j*.   (4.23)

The optimal stopping rule is then given by (see Eq. 4.8)

τ*_m = min{j: j ≥ j* + 1, X_j = M_j}   if X_j = M_j for some j ≥ j* + 1,
τ*_m = m                               if X_j ≠ M_j for all j ≥ j* + 1.   (4.24)

For, if j ≤ j* then Y_j < E(V_{j+1} | F_j) = (j*/m) a_{j*}. On the other hand, if j > j*
then a_j ≤ 1, so that (i) Y_j ≥ E(V_{j+1} | F_j) = (j/m) a_j on {X_j = M_j}, and (ii)
0 = Y_j ≤ E(V_{j+1} | F_j) on {X_j < M_j}.

Simply stated, the optimal stopping rule is to draw j* observations and then
continue sampling until an observation larger than all the preceding ones shows up
(and if this does not happen, stop after the last observation has been drawn).
The maximal probability of stopping at the maximum value is then

E(V_1) = V_1 = (j*/m)(1/j* + 1/(j*+1) + ··· + 1/(m−1)).   (4.25)

Finally, note that, as m → ∞,

a_{j*} = 1/j* + 1/(j*+1) + ··· + 1/(m−1) ≈ 1,   (4.26)

where the difference between the two sides of the relation "≈" goes to zero.
This follows since j* must go to infinity (as the series Σ(1/j) diverges and j*
is defined by (4.20)) and a_{j*} > 1, a_{j*+1} ≤ 1. Now,

a_{j*} = (1/m) Σ_{j=j*}^{m−1} (j/m)^{−1} ≈ ∫_{j*/m}^{1} x^{−1} dx = −log(j*/m).   (4.27)

Combining (4.26) and (4.27) one gets

−log(j*/m) ≈ 1,   j*/m ≈ e^{−1},   (4.28)

where the ratio of the two sides of "≈" goes to one, as m → ∞. Thus,

lim_{m→∞} j*/m = e^{−1},   lim_{m→∞} E(V_1) = e^{−1}.   (4.29)
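The threshold j* of (4.20) and the success probability (4.25) are simple to
tabulate exactly; the short Python sketch below is a direct transcription of
(4.19), (4.20), and (4.25), and illustrates the convergence (4.29) to e^{−1} ≈ 0.3679.

def secretary(m):
    # a_j = 1/j + 1/(j+1) + ... + 1/(m-1), Eq. (4.19)
    a = lambda j: sum(1.0 / i for i in range(j, m))
    # j* = max{j: 1 <= j <= m, a_j > 1}, Eq. (4.20); for very small m no a_j
    # exceeds 1, in which case the rule reduces to j* = 1
    candidates = [j for j in range(1, m + 1) if a(j) > 1]
    j_star = max(candidates) if candidates else 1
    return j_star, (j_star / m) * a(j_star)    # success probability, Eq. (4.25)

for m in (10, 100, 1000):
    j_star, p = secretary(m)
    print(m, j_star, round(j_star / m, 4), round(p, 4))
# Both j*/m and the success probability approach 1/e = 0.3678...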


5 CHAPTER APPLICATION: OPTIMALITY OF (S, s) POLICIES IN
INVENTORY PROBLEMS

Suppose a company has an amount x of a commodity in stock at the beginning
of a period. The problem is to determine the amount a of additional stock that
should be ordered to meet a random demand W at the end of this period. The
cost of ordering a units is

φ(a) := 0          if a = 0,
     := K + ca     if a > 0,   (5.1)

where K ≥ 0 is the reorder cost and c > 0 is the cost per unit. There is a holding
cost h > 0 per unit of stock left unsold, and a depletion cost d > 0 per unit of
unmet demand. Thus the expected total cost is

I(x, a) := φ(a) + L(x + a),   (5.2)

where

L(y) := E(h max{0, y − W} + d max{0, W − y}).   (5.3)

In this model x may be negative, but a ≥ 0. The objective is to find, for each
x, the value a = f*(x) that minimizes (5.2).
Assume

EW < ∞,   d > c > 0,   h > 0.   (5.4)

Consider the function G on ℝ¹,

G(y) := cy + L(y).   (5.5)

Then,

I(x, a) = G(x) − cx          if a = 0,
        = G(x + a) + K − cx  if a > 0.   (5.6)

Minimizing I(x, a) over a ≥ 0 is equivalent to minimizing I(x, a) + cx over
a ≥ 0, or over x + a ≥ x. But

I(x, a) + cx = G(x)          if a = 0,
             = G(x + a) + K  if a > 0.   (5.7)

Thus, the optimum a = f*(x) equals y*(x) − x, where y = y*(x) minimizes the


function (on [x, ∞))

G(y; x) := G(x)        if y = x,
        := G(y) + K    if y > x,   (5.8)

over y ≥ x.
Now the function y → L(y) is convex, since the functions y → max{0, y − w},
y → max{0, w − y} are convex for each w. Therefore, G(y) is convex. Clearly, G(y)
goes to ∞ as y → ∞. Also, for y < 0, G(y) ≥ cy + d|y|. Hence G(y) → ∞ as
y → −∞, as d > c > 0. Thus, G(y) has a minimum at y = S, say. Let s ≤ S be
such that

G(s) = K + G(S).   (5.9)

If K = 0, one may take s = S. If K > 0 then such an s < S exists, since G(y) → ∞
as y → −∞ and G is continuous. Observe also that G decreases on (−∞, S]
and increases on (S, ∞), by convexity. Therefore, for x ∈ (−∞, s],
G(x) ≥ G(s) = K + G(S), and the minimum of G(·; x) in (5.8) is attained at
y = y*(x) = S. On the other hand, for x ∈ (s, S],

G(x) ≤ G(s) = K + G(S) ≤ K + G(y)   for all y > x.

Therefore, y*(x) = x for x ∈ (s, S]. Finally, if x ∈ (S, ∞) then G(x) < G(y) for
all y > x, since G is increasing on (S, ∞). Hence y*(x) = x for x ∈ (S, ∞).
Since f*(x) = y*(x) − x, we have proved the following.

Proposition 5.1. Assume (5.4). There exist two numbers s ≤ S such that the
optimal policy is to order

f*(x) = S − x   if x ≤ s,
      = 0       if x > s.   (5.10)

The minimum cost function is

I(x) := I(x, f*(x)) = φ(f*(x)) + L(x + f*(x))
     = K + G(S) − cx   if x ≤ s,
     = G(x) − cx       if x > s.   (5.11)
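For a specific demand distribution the two numbers s and S can be located
numerically: minimize G to obtain S, then take the smallest y with
G(y) ≤ K + G(S) to obtain s, as in (5.9). A Python sketch follows; the Poisson
demand and the values of K, c, h, d are illustrative assumptions only.

import numpy as np
from scipy.stats import poisson

K, c, h, d, lam = 4.0, 1.0, 1.0, 3.0, 5.0       # illustrative parameters
w = np.arange(0, 60)                            # truncated demand support
pw = poisson.pmf(w, lam)

def L(y):
    # L(y) = E[h max(0, y - W) + d max(0, W - y)], Eq. (5.3)
    return np.sum(pw * (h * np.maximum(0.0, y - w) + d * np.maximum(0.0, w - y)))

ys = np.arange(-20.0, 40.0, 0.05)
G = np.array([c * y + L(y) for y in ys])        # G(y) = cy + L(y), Eq. (5.5)
S = ys[G.argmin()]
s = ys[np.nonzero(G <= K + G.min())[0][0]]      # smallest y with G(y) <= K + G(S)
print(S, s)          # order up to S whenever the stock x is <= s, Eq. (5.10)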

Instead of fixed costs per unit for holding and depletion, one may assume
that L(y) is given by

L(y) := E(H(max{0, y − W}) + D(max{0, W − y})).   (5.12)

Here H and D are convex and increasing on [0, ∞), H(0) = 0 = D(0). The
assumption (5.4) may now be relaxed to

L(y) < ∞ for all y,   lim_{x→−∞} (cx + L(x)) > K + cS + L(S),   (5.13)

where S is a point at which G attains its minimum.


Next consider an (N + 1)-period dynamic inventory problem. If the initial
stock is x = X_0 (state) and a = a_0 units (action) are ordered, then the expected
total cost in period 0 is

g_0(x, a) := E(φ(a) + h max{0, x + a − W_1} + d max{0, W_1 − x − a}) = I(x, a).   (5.14)

We denote by W_1, W_2, ..., W_N i.i.d. random demands arising at the end of
periods 0, 1, ..., N − 1. Assumption (5.4) is still in force, with W a generic
random variable having the same distribution as the W_k. The state X_1 in period
1 is given by X_1 = x + a − W_1 = X_0 + a_0 − W_1, and in general

X_k = X_{k−1} + a_{k−1} − W_k,   (5.15)

where X_{k−1} is the state in period k − 1, and a_{k−1} is the action taken in that
period. Thus, the transition probability law p(dz; x, a) is the distribution of
x + a − W. The conditional expectation of the cost for period k, given X_k = x,
a_k = a, is

g_k(x, a) := δ^k I(x, a)   (k = 0, 1, ..., N),   (5.16)

where δ is the discount factor, 0 < δ < ∞. The objective is to minimize the total
(N + 1)-period expected discounted cost

J_x^f = E_x Σ_{k=0}^N δ^k I(X_k, a_k),   (5.17)

over all policies f = (f_0, f_1, ..., f_N).

By Theorem 1.2 (changing max to min), an optimal policy
f* = (f*_0, f*_1, ..., f*_N) exists, which is Markovian and is given by backward
recursion (see Eqs. 1.11–1.14). In other words, f*_N is given by (5.10),

f*_N(x) = S − x   if x ≤ s,
        = 0       if x > s,   (5.18)

and f*_k (0 ≤ k ≤ N − 1) are given recursively by (1.14). Let us determine f*_{N−1}.
For this one minimizes

g_{N−1}(x, a) + E h_N(x + a − W),   (5.19)


where g_{N−1} is given by (5.16) and h_N(x) is the minimum of g_N(x, a) = δ^N I(x, a),
i.e. (see Eq. 5.11),

h_N(x) = δ^N I(x).

Thus, (5.19) becomes

δ^{N−1} [I(x, a) + δ E I(x + a − W)],   (5.20)

and its minimization is equivalent to that of

I_{N−1}(x, a) := I(x, a) + δ E I(x + a − W).   (5.21)

Write

G_{N−1}(y) := cy + L(y) + δ E I(y − W).   (5.22)

Then (see the analogous relations (5.5)–(5.8)),

I_{N−1}(x, a) + cx = G_{N−1}(x)          if a = 0,
                   = G_{N−1}(x + a) + K  if a > 0.   (5.23)

Hence, the minimization of I_{N−1}(x, a) over a ≥ 0 is equivalent to the
minimization of the function (on [x, ∞)),

G_{N−1}(y; x) := G_{N−1}(x)       if y = x,
             := G_{N−1}(y) + K    if y > x,   (5.24)

over y ≥ x. The corresponding points f_{N−1}(x), y_{N−1}(x), where these minima
are achieved, are related by

f_{N−1}(x) = y_{N−1}(x) − x.   (5.25)

The main difficulty in extending the one-period argument here is that the
function G_{N−1} is not in general convex, since I is not in general convex. Indeed
(see (5.11)), to the left of the point s the function I is linear with slope −c, while the
right-hand derivative at s is G′(s) − c < −c, since G′(s) < 0 if K > 0. One can
easily check now that I is not convex, if K > 0.
It may be shown, however, that G_{N−1} is K-convex in the following sense.

Definition 5.1. Let K ≥ 0. A function g on an interval J is said to be K-convex
if, for all y_1 < y_2 < y_3 in J,

K + g(y_3) ≥ g(y_2) + ((y_3 − y_2)/(y_2 − y_1))(g(y_2) − g(y_1)).   (5.26)

Thus 0-convexity is the same as convexity, and a convex function is K-convex
for all K ≥ 0.
We will show a little later that

(i) G_{N−1} is K-convex on ℝ¹,
(ii) G_{N−1}(y) → ∞ as |y| → ∞.

Assuming (i), (ii), we now prove the existence of two numbers s_{N−1} ≤ S_{N−1}
such that

y_{N−1}(x) = S_{N−1}   if x ≤ s_{N−1},
          = x        if x > s_{N−1},   (5.27)

minimizes G_{N−1}(y; x) over the interval [x, ∞). For this let S_{N−1} be a point
where G_{N−1} attains its minimum value, and let s_{N−1} be the smallest number
≤ S_{N−1} such that

G_{N−1}(s_{N−1}) = K + G_{N−1}(S_{N−1}).   (5.28)

Such a number s_{N−1} exists, since G_{N−1} is continuous and G_{N−1}(x) → ∞ as
x → −∞.
Now G_{N−1} is decreasing on (−∞, s_{N−1}]. To see this, let y_1 < y_2 < s_{N−1} and
apply (5.26) with y_3 = S_{N−1}:

K + G_{N−1}(S_{N−1}) ≥ G_{N−1}(y_2) + ((S_{N−1} − y_2)/(y_2 − y_1))(G_{N−1}(y_2) − G_{N−1}(y_1)).   (5.29)

Also, G_{N−1}(y_2) > K + G_{N−1}(S_{N−1}), since y_2 < s_{N−1} and (5.28) holds for the
smallest possible s_{N−1}. Using this in (5.29), one gets

K + G_{N−1}(S_{N−1}) > K + G_{N−1}(S_{N−1}) + ((S_{N−1} − y_2)/(y_2 − y_1))(G_{N−1}(y_2) − G_{N−1}(y_1)),

so that G_{N−1}(y_2) < G_{N−1}(y_1). Therefore, if x ≤ s_{N−1}, then G_{N−1}(x) ≥ G_{N−1}(s_{N−1}) =
G_{N−1}(S_{N−1}) + K and, for all y ≥ x, G_{N−1}(y; x) = G_{N−1}(y) + K ≥ G_{N−1}(S_{N−1}) + K.
Hence, the minimum of G_{N−1}(·; x) on [x, ∞) is attained at y = S_{N−1}, proving
the first half of (5.27). In order to prove the second half of (5.27) it is enough
to show that

G_{N−1}(x) ≤ G_{N−1}(y) + K = G_{N−1}(y; x)   if s_{N−1} < x < y.   (5.30)

If s_{N−1} < x ≤ S_{N−1} then by (5.28) and K-convexity,

G_{N−1}(s_{N−1}) = K + G_{N−1}(S_{N−1})
≥ G_{N−1}(x) + ((S_{N−1} − x)/(x − s_{N−1}))(G_{N−1}(x) − G_{N−1}(s_{N−1})),


or,

((S_{N−1} − s_{N−1})/(x − s_{N−1})) G_{N−1}(s_{N−1}) ≥ ((S_{N−1} − s_{N−1})/(x − s_{N−1})) G_{N−1}(x),

i.e., G_{N−1}(x) ≤ G_{N−1}(s_{N−1}) = G_{N−1}(S_{N−1}) + K ≤ G_{N−1}(y) + K for all y. Since
(5.30) clearly holds for x = s_{N−1}, it remains to prove it in the case S_{N−1} < x < y.
By K-convexity one has

K + G_{N−1}(y) ≥ G_{N−1}(x) + ((y − x)/(x − S_{N−1}))(G_{N−1}(x) − G_{N−1}(S_{N−1})) ≥ G_{N−1}(x),

the last inequality because G_{N−1}(S_{N−1}) is the minimum value of G_{N−1}.
Thus (5.30) is proved, establishing the second half of (5.27).

Finally, let us show that G_{N−1} is K-convex. Recall (see Eqs. 5.22, 5.5) that

G_{N−1}(y) = G(y) + δ E I(y − W).   (5.31)

Since the function G is convex (i.e., 0-convex), it is enough to prove that I is
K-convex. For it is easy to check from the definition, on taking expectations,
that the K-convexity of I implies that of y → E I(y − W). It is also easy to
show that if J_1 is K_1-convex and J_2 is K_2-convex for some K_1 ≥ 0, K_2 ≥ 0, then
δ_1 J_1 + δ_2 J_2 is (δ_1 K_1 + δ_2 K_2)-convex for all δ_1 ≥ 0, δ_2 ≥ 0. For positive δ it
now follows from (5.31) that G_{N−1} is K-convex if I is K-convex. Recall the
definition of I (see Eq. 5.11),

I(x) = K + G(S) − cx   if x ≤ s,
     = G(x) − cx       if x > s.   (5.32)

We want to show

K + I(x_3) ≥ I(x_2) + ((x_3 − x_2)/(x_2 − x_1))(I(x_2) − I(x_1))   for all x_1 < x_2 < x_3.   (5.33)

If x_3 ≤ s, then linearity of I on (−∞, s] implies convexity and, therefore,
K-convexity on (−∞, s]. Similarly, if x_1 > s, (5.33) is trivially true since
G(x) − cx is convex, and, therefore, K-convex. Consider then x_1 ≤ s < x_3.
Distinguish two cases.

CASE I. x_1 ≤ s < x_2. In this case (5.33) is equivalent to

K + G(x_3) − cx_3 ≥ G(x_2) − cx_2 + ((x_3 − x_2)/(x_2 − x_1))(G(x_2) − cx_2 − K − G(S) + cx_1).   (5.34)

Canceling the terms −cx_i (i = 1, 2, 3) from both sides and recalling that K + G(S) = G(s),


(5.34) becomes equivalent to

K + G(x_3) ≥ G(x_2) + ((x_3 − x_2)/(x_2 − x_1))(G(x_2) − G(s)).   (5.35)

If G(x_2) ≥ G(s), then the right side of (5.35) is no more than

G(x_2) + ((x_3 − x_2)/(x_2 − s))(G(x_2) − G(s)),

which is no more than G(x_3) by convexity of G. Hence (5.35) holds. Suppose
G(x_2) < G(s). The left side of (5.35) is no less than K + G(S) = G(s), and (5.35)
will be proved if it holds with the left side replaced by G(s), i.e., if

G(s) ≥ G(x_2) + ((x_3 − x_2)/(x_2 − x_1))(G(x_2) − G(s)).   (5.36)

By simple algebra, (5.36) is equivalent to G(s) ≥ G(x_2), which has been assumed
to be the case.

CASE II. x_1 < x_2 ≤ s < x_3. In this case (5.33) is equivalent to

K + G(x_3) − cx_3 ≥ K + G(S) − cx_2 + ((x_3 − x_2)/(x_2 − x_1))(−cx_2 + cx_1)
= K + G(S) − cx_3,

which is obviously true, since G(x_3) ≥ G(S).


Thus, I is K-convex, so that G N _, is K-convex, and the proof of (5.27) is
complete.
The minimum value of (5.19), or (5.20), is

(5.37)

where (see Eqs. 5.23, 5.25, 5.27)

K +GN _,(SN _,)cx ifx s N _ 1 ,


Ix -,(x) = min I N -,(x, a) _ (5.38)
a>- 0 G,^_,(x)cx ifx>sN_1.

The equation for the determination of f_2 is then (see Eq. 1.14)

min [6 N-2 1(x, a) + 6 N- 'EI N _ ,(x + a W)]


a ?o

= 6 N-2 min [I(x, a) + 6EI N _ I (x + a W)]. (5.39)


n3o

This problem is mathematically equivalent to the one just considered, since
(5.32) and (5.38) are of the same structure, except for the fact that G is convex
while G_{N−1} is only K-convex. But only the K-convexity of G was used in the proof of
K-convexity of I(x). Proceeding iteratively, we arrive at the following.

Proposition 5.2. Assume (5.4). Then there exists an optimal Markovian policy
f* = (f*_0, ..., f*_N) of the (S, s) type for the (N + 1)-period dynamic inventory
model, i.e., there exist s_k ≤ S_k (k = 0, 1, ..., N) such that during period k it is
optimal to order S_k − x if the stock x is ≤ s_k, and to order nothing if x > s_k.
Also s_k < S_k for all k if K > 0, and s_k = S_k if K = 0.
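The recursion (5.21)–(5.25) lends itself to a grid computation of the thresholds
(s_k, S_k) of Proposition 5.2. A sketch, continuing the illustrative Poisson example
above (the state grid, the truncation of W, and the parameter values are
assumptions of the sketch, not of the text):

import numpy as np
from scipy.stats import poisson

K, c, h, d, lam, delta, N = 4.0, 1.0, 1.0, 3.0, 5.0, 0.9, 4
w = np.arange(0, 60); pw = poisson.pmf(w, lam)
xs = np.arange(-60.0, 101.0)                     # state grid
Lx = np.array([np.sum(pw * (h * np.maximum(0, x - w) + d * np.maximum(0, w - x)))
               for x in xs])

I_next = np.zeros_like(xs)                       # continuation cost after period N
for k in range(N, -1, -1):
    # E I_{k+1}(y - W); values off the grid are clamped at the boundary by interp
    EI = np.array([np.sum(pw * np.interp(x - w, xs, I_next)) for x in xs])
    Gk = c * xs + Lx + delta * EI                # Eq. (5.22)
    S_k = xs[Gk.argmin()]
    s_k = xs[np.nonzero(Gk <= K + Gk.min())[0][0]]
    I_next = np.where(xs <= s_k, K + Gk.min() - c * xs, Gk - c * xs)   # Eq. (5.38)
    print(k, s_k, S_k)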

Finally, consider the infinite-horizon problem of minimizing, for a given δ,
0 < δ < 1,

J_x^f := E_x Σ_{k=0}^∞ δ^k I(X_k, a_k),   (5.40)

over the class of all (measurable) policies f = (f_0, f_1, ...). Write the infimum as

J_x := inf_f J_x^f.   (5.41)

To show that J_x is finite, consider the policy f := (0, 0, ...), i.e., the stationary
policy that orders zero in every period. It is straightforward to check that

E_x I(X_k, a_k) ≤ (h + d)E_x(|X_{k−1}| + W_k) = (h + d)(E_x|X_{k−1}| + EW),   (5.42)

where {W_k} is an i.i.d. sequence, W_k having the same distribution as the W considered
before. Also, for each k, W_k is independent of X_0, ..., X_{k−1}. Now,

E_x|X_k| = E_x|X_{k−1} − W_k| ≤ EW + E_x|X_{k−1}|,

and iterating one gets

E_x|X_k| ≤ kEW + |x|.   (5.43)

Substituting (5.43) in (5.42), (5.40) one gets

J_x^f ≤ (h + d)EW/(1 − δ)² + (h + d)|x|/(1 − δ) < ∞.   (5.44)

Hence J_x < ∞. The dynamic programming equation (2.14) becomes

J_x = min_{a≥0} {I(x, a) + δ E(J_{x+a−W})}.   (5.45)


We will assume K = 0. Now the (N + 1)-period optimal costs (T^N 0)(x) are
increasing with N. Since these costs are convex, the limit J_x is also easily seen
to be convex and, in particular, continuous. It follows that the convergence of

I(x, a) + δ E(T^N 0)(x + a − W)   (5.46)

to the expression within braces in (5.45) with J = J_x is uniform on compact
subsets of x and a, at least if W is bounded,

P(0 ≤ W ≤ M) = 1   for some M < ∞.   (5.47)

The dynamic programming equation, therefore, holds for J_x. This also implies
that the convex function

G̃(y) := G(y) + δ E J_{y−W} = cy + L(y) + δ E J_{y−W}   (5.48)

goes to infinity as |y| → ∞. Thus, by the same argument as above, an optimal
policy exists with S = s. We have the following.

Proposition 5.3. Assume (5.4) and (5.47). Let K = 0, 0 < δ < 1.

(a) Then the dynamic programming equation (5.45) holds for the minimum
cost J_x in the infinite-horizon discounted dynamic programming problem.
(b) There exists a stationary optimal policy f* = (f*, f*, ...) such that
a = f*(x) minimizes the right-hand side of (5.45), and f* is given by

f*(x) = S − x   if x ≤ S,
      = 0       if x > S.

Here S is the point where the function (5.48) attains its minimum.
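Since (T^N 0)(x) increases to J_x, the minimum cost and the order-up-to level S of
Proposition 5.3 can be approximated by value iteration on a grid. A sketch (same
illustrative Poisson parameters as before; the truncations are assumptions of the
sketch):

import numpy as np
from scipy.stats import poisson

c, h, d, lam, delta = 1.0, 1.0, 3.0, 5.0, 0.9    # K = 0
w = np.arange(0, 60); pw = poisson.pmf(w, lam)
xs = np.arange(-60.0, 101.0)
Lx = np.array([np.sum(pw * (h * np.maximum(0, x - w) + d * np.maximum(0, w - x)))
               for x in xs])

J = np.zeros_like(xs)                            # J_0 := 0; (T^N 0)(x) increases to J_x
for _ in range(300):
    EJ = np.array([np.sum(pw * np.interp(x - w, xs, J)) for x in xs])
    Gy = c * xs + Lx + delta * EJ                # the function (5.48) with the current J
    # since K = 0, I(x, a) + cx = G~(x + a); minimize over y = x + a >= x
    J = np.minimum.accumulate(Gy[::-1])[::-1] - c * xs   # suffix minimum of Gy
print(xs[Gy.argmin()])                           # the level S (= s) of Proposition 5.3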

EXERCISES

Exercises for Section VI.1


1. Consider the following inventory control problem. Let the possible amounts of a
commodity that a business can stock be 0, 1, 2 (states). In each period k = 0, 1, 2
(N = 2) the stock is replenished by ordering 0, 1, or 2 units (actions), but not more
than 2 units can be stored. There is a random demand that arises at the end of each
period, demands over different periods being independent of each other and identically
distributed. The possible values of the demand are 0, 1, 2, which occur with probabilities
0.2, 0.5, and 0.3, respectively. Thus the transition probability p(z; x, a) is the
distribution of max{0, x + a − W} ∧ 2, where W is the random demand. Find an
optimal policy to minimize the total expected cost if in each period the cost is the
sum of (a) the cost of ordering a units at the rate of $1 per unit, (b) the cost of storing
the excess supply max{0, x + a − W} at the rate of $1 per unit, and (c) a penalty of
$1 per unit for excess demand max{0, W − x − a}.
2. (Taxicab Operation) The area of operation of a cab driver comprises three towns,
T1, T2, T3 (states). To get a fare, the driver may follow one of three courses (actions):
pull over and wait for a radio call (a1), go to the nearest taxi stand and wait in line
(a2), or go on cruising until hailed by a passenger (a3). The action a1 is not available
if the driver is in town T1, as there is no radio service in this town. There is a known
probability p(Tj; Ti, ak) that the next trip will be to town Tj, if the cab is in town Ti
and follows the course of action ak. The corresponding reward is g(Ti, ak, Tj).
(i) Write down the backward recursion relations for maximizing the total expected
reward over a finite horizon.
(ii) Find the optimal policy for the case N = 2, if the transition probabilities and
rewards are as follows:

State   Action   Probability of transition to state   Reward if trip is to state
                   T1     T2     T3                     T1    T2    T3

T1       a2       1/2     0     1/2                     2     0     4
         a3       1/4    1/2    1/4                     4     2     4
T2       a1       1/4    1/2    1/4                     2     2     4
         a2       1/4    1/4    1/2                     2     4     2
         a3       1/4    1/2    1/4                     2     2     2
T3       a1       1/4    1/2    1/4                     4     4     2
         a2       1/2    1/4    1/4                     2     2     2
         a3       1/4    1/4    1/2                     2     4     2

3. Prove Theorem 1.3 along the lines of the proof of Theorem 1.2.

Exercises for Section VI.2


1. Suppose S = {x_1, x_2, ...} is countable and A is a compact metric space. For each
ε ∈ (0, δ) let {f_k^ε: k = 0, 1, ...} be a sequence of functions on S into A.
(i) Show that there exists a sequence ε_{n,1} ↓ 0 such that f_0^{ε_{n,1}}(x_1) converges to some
point f_0(x_1), say, in A. [Hint: A is a compact metric space.]
(ii) Show that there exists a subsequence {ε_{n,2}: n = 1, 2, ...} of {ε_{n,1}: n = 1, 2, ...}
such that f_0^{ε_{n,2}}(x_2) converges to some point f_0(x_2) in A.
(iii) Having constructed {ε_{n,j}: n = 1, 2, ...} in this manner, find a subsequence
{ε_{n,j+1}: n = 1, 2, ...} of {ε_{n,j}: n = 1, 2, ...} such that f_0^{ε_{n,j+1}}(x_{j+1}) converges to
some point f_0(x_{j+1}), say.
(iv) Define δ_{n,0} := ε_{n,n}, and show that f_0^{δ_{n,0}}(x_j) → f_0(x_j) for all j = 1, 2, ..., as n → ∞.
(v) Use (i)–(iv) to find a subsequence {δ_{n,1}: n ≥ 1} of {δ_{n,0}} such that f_1^{δ_{n,1}}(x)
converges to f_1(x), say, for every x ∈ S, as n → ∞.
(vi) Proceeding in this manner find a subsequence {δ_{n,k+1}: n ≥ 1} ⊂ {δ_{n,k}: n ≥ 1}
such that f_{k+1}^{δ_{n,k+1}}(x) → f_{k+1}(x), say, for every x ∈ S.
(vii) Let δ_n := δ_{n,n}. Show that f_k^{δ_n}(x) → f_k(x) for every k ≥ 0 and every x ∈ S, as
n → ∞.
2. Let the elements of a finite set S be labeled 1, 2, ..., m.
(i) Show that the set B(S) of all real-valued functions on S may be identified with ℝ^m,
and the "sup norm" on B(S) corresponds to the norm |x| := max{|x^{(i)}|: 1 ≤ i ≤ m}
on ℝ^m.
(ii) Let T be a (strict) contraction on B(S), i.e., ||Tf − Tg|| ≤ δ||f − g|| for some
δ ∈ [0, 1). Show that T may be viewed as a contraction, also denoted T, on
ℝ^m, i.e., |Tx − Ty| ≤ δ|x − y|.
(iii) Show that T^n 0 converges to a point, say x*, in ℝ^m. [Hint: {T^n 0: n ≥ 1} is a
Cauchy sequence in ℝ^m.]
(iv) Show that Tx* = x*.
(v) Show that T^n y → x* whatever be y ∈ ℝ^m. [Hint: If Tx* = x*, Ty* = y*, then
x* = y*.]
3. Write out the dynamic programming equation for the optimal infinite-horizon
discounted reward version of part (i) of the taxicab operation problem described in
Example 1.2.
4. (i) Prove Theorem 2.3.
(ii) Under the hypothesis of Theorem 2.3 show that (T^N 0)(x) is the N-stage optimal
reward, where T is defined by (2.15).

Exercises for Section VI.3

1. Suppose {X_t} is a diffusion on ℝ^k \ G with absorption at ∂G, starting at x ∉ Ḡ. Give
an argument analogous to that given in Exercise 3.5 of Chapter V to prove that
P_x(τ_{∂G} ≤ t) = o(t), as t ↓ 0.
2. Check (3.8).
3. Check (3.13).
4. Show that the probability that the diffusion with coefficients (3.40), with c replaced
by c(x) (continuously differentiable), ever reaches 0, starting at x > 0, is zero.
5. Suppose that the expression on the extreme right in (3.47) is greater than or equal
to 1; show that then the maximum in (3.41) is attained at c* = 1. [Hint: c = 0 gives a minimum,
and the function is strictly concave on [0, 1].]
6. Suppose {X_t} is a diffusion on (a, b) with infinitesimal generator A = ½σ²(x) d²/dx² +
μ(x) d/dx. Let f be in the domain of A, i.e., (T_t f − f)/t → Af as t ↓ 0. Prove that the
function g(x) := ∫_0^∞ e^{−βs}(T_s f)(x) ds satisfies the resolvent equation: (β − A)g(x) =
f(x). [Hint: (T_t g)(x) = e^{βt} g(x) − e^{βt} ∫_0^t e^{−βs}(T_s f)(x) ds.]

THEORETICAL COMPLEMENTS

Theoretical Complements to Section VI.1

1. (Extensions to More General State Spaces: Finite-Horizon Case) Let S be a complete
separable metric space, A a compact metric space, g_k (0 ≤ k ≤ N) continuous
real-valued functions on S × A, and p(dz; x, a) weakly continuous on S × A into 𝒫(S),
where 𝒫(S) is the set of all probability measures on the Borel sigmafield of S. Then
the maxima in (1.11), (1.13), and (1.14) are attained, and the maximal functions h_k
are continuous on S. Since there may not be a unique point in A where (1.14) is
maximized, one needs to use a measurable selection theorem to obtain measurable
functions f*_N, f*_{N−1}, ..., f*_0 on S into A in order to define a Markovian policy
f* = (f*_0, f*_1, ..., f*_N). A proof of the existence of measurable selections may be found
in A. Maitra (1968), "Discounted Dynamic Programming on Compact Metric
Spaces," Sankhyā, Ser. A, 30, pp. 211–216. The proof of optimality remains
unchanged.
The above assumptions can be further relaxed. Assume (A):

1. S is a nonempty Borel subset of a complete separable metric space.
2. A is a nonempty compact metric space.
3. g_k (k = 0, 1, ..., N) are bounded upper semicontinuous real-valued functions on
S × A.
4. p(dz; x, a) is weakly continuous on S × A into 𝒫(S).

Under (A) the following hold:

(i) h_N(x) := sup_{a∈A} g_N(x, a) is attained (i.e., h_N(x) = g_N(x, a_N(x)) for some a_N(x) ∈ A)
and is upper semicontinuous on S.
(ii) Given that h_{k+1} is upper semicontinuous on S:
(a) The function (x, a) → ∫ h_{k+1}(z) p(dz; x, a) is upper semicontinuous on S × A.
(b) h_k(x) = sup_{a∈A} {g_k(x, a) + ∫ h_{k+1}(z) p(dz; x, a)} is upper semicontinuous on S
(and the supremum is attained).
(c) There is a Borel-measurable function f_k on S into A such that
h_k(x) = g_k(x, f_k(x)) + ∫ h_{k+1}(z) p(dz; x, f_k(x)).

See Maitra, loc. cit., for this generalization.
Therefore, Proposition 1.1 and Theorem 1.2 go over under (A).

Theoretical Complements to Section VI.2


1. (Infinite-Horizon Discounted Problem) Consider the assumption (A′): conditions
(A) above, with (3) replaced by (3)′: g_0 is bounded and upper semicontinuous on S × A.
Theorem 2.2 holds under (A′). The proof goes over word for word if one takes
B(S) to be the set of all real-valued bounded Borel-measurable functions on S.
2. (Semi-Markov Models) All the results of Sections 1 and 2 have extensions to
semi-Markov models, which include, as special cases, the discrete-time models of
Sections 1 and 2, as well as continuous-time jump Markov-type models. In order to
describe this extension let (1) the state space S be a Borel subset of a complete
separable metric space; (2) the action space A be compact metric; (3) a reward rate
r(x, a) be given, which is bounded and upper semicontinuous on S × A. (4) In addition,
one is given the holding-time distribution γ(du; x, a) in state x when an action a is
taken, and the probability distribution of transition p(dz; x, a) to a new state z from
the present state x when an action a is taken; (x, a) → γ(du; x, a) and
(x, a) → p(dz; x, a) are assumed to be weakly continuous.
A policy f is a sequence of functions f = (f_0, f_1, f_2, ...), where f_0 is a measurable
function on S into A, f_1 is a measurable function on S × A × S into A, ..., f_k is a
measurable function on (S × A)^k × S into A. Given such a policy and an initial state
x, an action f_0(x) is taken. Then the process remains in state x for a random time
T_0 having distribution γ(du; x, f_0(x)). At the end of this time, the state changes to X_1
with distribution p(dz; x, f_0(x)). Then an action a_1 = f_1(x, f_0(x), X_1) is taken. The
process remains in state X_1 for a random time T_1 having distribution γ(du; X_1, a_1),
conditionally given X_1. At the end of this period the state changes to X_2 having
distribution p(dz; X_1, a_1), conditionally given X_1. This goes on indefinitely. The
expected discounted reward under the policy is

J_x^f := E_x ∫_0^∞ e^{−βt} r(Y_t, a_t) dt,

where β > 0 is the discount rate, Y_t is the state at time t, and a_t is the action at time
t. A policy f is (semi-)Markov if, for all k ≥ 1, f_k depends only on the last coordinate
among its arguments, i.e., if f_k is a function on S into A. Under such a policy, the
stochastic process {Y_t: t ≥ 0} is semi-Markov. In other words, although the embedded
process {X_k: k ≥ 0} is (nonhomogeneous) Markov having transition probability
p(dz; x, f_{k−1}(x)) at the kth step, and the holding times T_0, T_1, ... are conditionally
independent given {X_k: k ≥ 0}, the latter (conditional) distributions γ(du; x, f_k(x)) are
not in general exponential. Hence, {Y_t: t ≥ 0} is not Markov in general. A policy
f = (f)^∞ is said to be stationary if it is (semi-)Markov and if f_k = f for all k, where
f is a fixed Borel-measurable function on S into A. Under the mild additional assumption
that (5) δ_β(x, a) := ∫ exp{−βu} γ(du; x, a) is bounded away from 1, it may be shown
that the optimal discounted reward J_x is the unique upper semicontinuous solution to
the dynamic programming equation

J_x = max_{a∈A} {r(x, a) T_β(x, a) + δ_β(x, a) ∫ J_z p(dz; x, a)}.

Here T_β(x, a) := (1 − δ_β(x, a))/β. In addition, there exists a Borel-measurable function
f* on S into A such that a = f*(x) maximizes the right side above. For each such f*
the stationary policy f* = (f*)^∞ is optimal. For details and references to the literature, see
R. N. Bhattacharya and M. Majumdar (1989), "Controlled Semi-Markov Models:
The Discounted Case," J. Statist. Plan. Inference, 21, pp. 365–381. The following
lists only a few of the significant earlier works on the subject: R. Howard (1960),
Dynamic Programming and Markov Processes, MIT Press, Cambridge, Mass.;
D. Blackwell (1965), "Discounted Dynamic Programming," Ann. Math. Statist., 36,
pp. 226–235; A. Maitra (1968), loc. cit.; S. M. Ross (1970), "Average Cost
Semi-Markov Decision Processes," J. Appl. Probability, 7, pp. 656–659.
The dynamic programming equations of this chapter (for example, (2.14))
originated in Bellman's pioneering work. See R. Bellman (1957), Dynamic
Programming, Princeton University Press, Princeton.

Theoretical Complements to Section VI.3


1. The policies that are shown in this section to be optimal in the class of all stationary
feasible policies are actually optimal in a much larger class of policies, namely, the
class of all nonanticipative feasible policies, which may depend on the entire past.
For details, see W. H. Fleming and R. W. Rishel (1975), Deterministic and Stochastic
Optimal Control, Springer-Verlag, New York, Chapters V and VI. Apart from some
technical measurability problems, the ideas involved in this extension of the class of
feasible policies are similar to those already considered in Sections 1 and 2.
2. Feller's construction of one-dimensional diffusions (see theoretical complement 2.3
to Chapter V) allows discontinuous drift and diffusion coefficients. Hence, in Example
1 one may allow all measurable functions v with values in [v_*, v*]. This example is
due to M. Majumdar and R. Radner (1990), "Linear Models of Economic Survival
under Production Uncertainty," Working Paper 427, Dept. Econ., Cornell University.
Example 2 is a one-dimensional specialization of a result of R. C. Merton (1971),
"Optimal Consumption and Portfolio Rules in a Continuous-Time Model," J.
Economic Theory, 3, pp. 373-413.
3. The uniqueness of the solution to (3.55) is a little difficult to check, since r(x, c) is
unbounded as a function of x. One way to circumvent this problem is to fix T> 0
and consider the problem of maximizing the discounted reward over the finite horizon
[0, T]. This is carried out in Fleming and Rishel, loc. cit., pp. 160, 161. One may
then let T ↑ ∞.

Theoretical Complement to Section VI.4


1. A readable account of the general theory of optimal stopping rules in discrete time
may be found in Y. S. Chow, H. Robbins, and D. Siegmund (1971), Great Expectations:
The Theory of Optimal Stopping, Houghton Mifflin, Boston.

Theoretical Complement to Section VI.5


1. The chapter application is based on H. Scarf (1960), "The Optimality of (S, s) Policies
in the Dynamic Inventory Problem," Mathematical Methods in the Social Sciences,
Stanford University Press, Stanford, pp. 196–202.
CHAPTER VII

An Introduction to Stochastic
Differential Equations

1 THE STOCHASTIC INTEGRAL

A diffusion {X_t} on ℝ¹ may be thought of as a Markov process that is locally
like a Brownian motion. That is, in some sense the following relation holds,

dX_t = μ(X_t) dt + σ(X_t) dB_t,   (1.1)

where μ(·), σ(·) are given functions on ℝ¹, and {B_t} is a standard
one-dimensional Brownian motion. In other words, conditionally given
{X_s: 0 ≤ s ≤ t}, in a small time interval (t, t + dt] the displacement
dX_t := X_{t+dt} − X_t is approximately the Gaussian random variable μ(X_t) dt +
σ(X_t)(B_{t+dt} − B_t), having mean μ(X_t) dt and variance σ²(X_t) dt. Observe,
however, that (1.1) cannot be regarded as an ordinary differential equation such
as dX_t/dt = μ(X_t) + σ(X_t) dB_t/dt. For, outside a set of probability 0, a Brownian
path is nowhere differentiable (Exercise 1; or see Exercises 7.8, 9.3 of Chapter I).
The precise sense in which (1.1) is true, and may be solved to construct a
diffusion with coefficients μ(·), σ²(·), is described in this section and the next.
The present section is devoted to the definition of the integral version of (1.1):

X_t = x + ∫_0^t μ(X_s) ds + ∫_0^t σ(X_s) dB_s.   (1.2)

The first integral is an ordinary Riemann integral, which is defined if μ(·) is
continuous and s → X_s is continuous. But the second integral cannot be defined
as a Riemann–Stieltjes integral, as the Brownian paths are of unbounded
variation on [0, t] for every t > 0 (Exercise 1). On the other hand, for a constant
function σ(x) ≡ σ, the second integral has the obvious meaning σ(B_t − B_0),
so that (1.2) becomes

X_t = x + ∫_0^t μ(X_s) ds + σ(B_t − B_0).   (1.3)

It turns out that (1.3) may be solved, more or less by Picard's well-known
method of iteration for ordinary differential equations, if μ(x) is Lipschitzian
(Exercise 2). This definition of the (stochastic) integral with respect to the
Brownian increments dB_t easily extends to the case of an integrand that is a
step function. In order that {X_t} may have the Markov property, it is necessary
to restrict attention to step functions that do not anticipate the future. This
motivates the following development.
Let (Ω, F, P) be a probability space on which a standard one-dimensional
Brownian motion {B_t: t ≥ 0} is defined. Suppose {F_t: t ≥ 0} is an increasing
family of sub-sigmafields of F such that

(i) B_s is F_s-measurable (s ≥ 0),
(ii) {B_t − B_s: t ≥ s} is independent of (events in) F_s (s ≥ 0).   (1.4)

For example, one may take F_t = σ{B_s: 0 ≤ s ≤ t}. This is the smallest F_t one
may take in view of (i). Often it is important to take larger F_t. As an example,
let F_t = σ[{B_s: 0 ≤ s ≤ t}, {Z_λ: λ ∈ Λ}], where {Z_λ: λ ∈ Λ} is a family of random
variables independent of {B_t: t ≥ 0}. For technical convenience, also assume
that the F_t are P-complete, i.e., if N ∈ F_t and P(N) = 0, then all subsets of N
are in F_t; this can easily be ensured (theoretical complement 1).
Next fix two time points 0 ≤ α < β < ∞. A real-valued stochastic process
{f(t): α ≤ t ≤ β} is said to be a nonanticipative step functional on [α, β] if there
exists a finite set of time points t_0 = α < t_1 < ··· < t_m = β and random variables
f_i (0 ≤ i ≤ m) such that

(i) f_i is F_{t_i}-measurable (0 ≤ i ≤ m),
(ii) f(t) = f_i for t_i ≤ t < t_{i+1} (0 ≤ i ≤ m − 1), f(β) = f_m.   (1.5)

Definition 1.1. The stochastic integral, or the Itô integral, of the nonanticipative
step functional f = {f(t): α ≤ t ≤ β} in (1.5) is the stochastic process

∫_α^t f(s) dB_s := f(t_0)(B_t − B_{t_0}) = f(α)(B_t − B_α)   for t ∈ [α, t_1],

∫_α^t f(s) dB_s := Σ_{j=1}^i f(t_{j−1})(B_{t_j} − B_{t_{j−1}}) + f(t_i)(B_t − B_{t_i})   for t ∈ (t_i, t_{i+1}],

(α ≤ t ≤ β).   (1.6)

Observe that the Riemann-type sum (1.6) is obtained by taking the value of
the integrand at the left endpoint of a time interval (t_{i−1}, t_i]. As a consequence,
for each t ∈ [α, β] the Itô integral ∫_α^t f(s) dB_s is F_t-measurable, i.e., it is
nonanticipative. Some other important properties of this integral are contained
in the proposition below.

Proposition 1.1
(a) If f is a nonanticipative step functional on [α, β], then t → ∫_α^t f(s) dB_s is
continuous for every ω ∈ Ω; it is also additive, i.e.,

∫_α^s f(u) dB_u + ∫_s^t f(u) dB_u = ∫_α^t f(u) dB_u,   (α ≤ s < t ≤ β).   (1.7)

(b) If f, g are nonanticipative step functionals on [α, β], then

∫_α^t (f(s) + g(s)) dB_s = ∫_α^t f(s) dB_s + ∫_α^t g(s) dB_s.   (1.8)

(c) Suppose f is a nonanticipative step functional on [α, β] such that

E ∫_α^β f²(t) dt = ∫_α^β (Ef²(t)) dt < ∞.   (1.9)

Then {∫_α^t f(s) dB_s: α ≤ t ≤ β} is a square integrable {F_t}-martingale, i.e.,

E(∫_α^t f(u) dB_u | F_s) = ∫_α^s f(u) dB_u   (α ≤ s < t ≤ β).   (1.10)

Also,

E((∫_α^t f(s) dB_s)² | F_α) = E(∫_α^t f²(s) ds | F_α),   (1.11)

and

E ∫_α^t f(s) dB_s = 0,   E(∫_α^t f(s) dB_s)² = E ∫_α^t f²(s) ds,   (α ≤ t ≤ β).   (1.12)
Proof. (a), (b) follow from Definition 1.1.

(c) Let f be given by (1.5). As ∫_α^s f(u) dB_u is F_s-measurable, it follows from
(1.7) that

E(∫_α^t f(u) dB_u | F_s) = ∫_α^s f(u) dB_u + E(∫_s^t f(u) dB_u | F_s).   (1.13)

Now, by (1.6), if t_{i−1} ≤ s < t_i and t_j ≤ t < t_{j+1}, then

∫_s^t f(u) dB_u = f(s)(B_{t_i} − B_s) + f(t_i)(B_{t_{i+1}} − B_{t_i})
+ ··· + f(t_{j−1})(B_{t_j} − B_{t_{j−1}}) + f(t_j)(B_t − B_{t_j}).   (1.14)

Observe that, for s′ < t′, B_{t′} − B_{s′} is independent of F_{s′} (property (1.4)(ii)), so that

E(B_{t′} − B_{s′} | F_{s′}) = 0   (s′ < t′).   (1.15)

Applying this to (1.14),

E(f(s)(B_{t_i} − B_s) | F_s) = f(s)E(B_{t_i} − B_s | F_s) = 0,
E(f(t_i)(B_{t_{i+1}} − B_{t_i}) | F_s) = E[E(f(t_i)(B_{t_{i+1}} − B_{t_i}) | F_{t_i}) | F_s]
= E[f(t_i)E(B_{t_{i+1}} − B_{t_i} | F_{t_i}) | F_s] = 0, ...,
E(f(t_j)(B_t − B_{t_j}) | F_s) = E[f(t_j)E(B_t − B_{t_j} | F_{t_j}) | F_s] = 0.   (1.16)

Therefore,

E(∫_s^t f(u) dB_u | F_s) = 0   (s < t).   (1.17)

From this and (1.7), the martingale property (1.10) follows. In order to prove
(1.11), first let t ∈ [α, t_1]. Then

E((∫_α^t f(s) dB_s)² | F_α) = E(f²(α)(B_t − B_α)² | F_α) = f²(α)E((B_t − B_α)² | F_α)
= f²(α)(t − α) = E(∫_α^t f²(s) ds | F_α),   (1.18)

by independence of F_α and B_t − B_α. If t ∈ (t_i, t_{i+1}], then, by (1.6),

E((∫_α^t f(s) dB_s)² | F_α)
= E[{Σ_{j=1}^i f(t_{j−1})(B_{t_j} − B_{t_{j−1}}) + f(t_i)(B_t − B_{t_i})}² | F_α].   (1.19)

Now the contribution of the product terms in (1.19) is zero. For, if j < k, then

E[f(t_{j−1})(B_{t_j} − B_{t_{j−1}}) f(t_{k−1})(B_{t_k} − B_{t_{k−1}}) | F_α]
= E[E(··· | F_{t_{k−1}}) | F_α]
= E[f(t_{j−1})(B_{t_j} − B_{t_{j−1}}) f(t_{k−1}) E(B_{t_k} − B_{t_{k−1}} | F_{t_{k−1}}) | F_α] = 0.   (1.20)

Therefore, (1.19) reduces to

E[(∫_α^t f(s) dB_s)² | F_α] = Σ_{j=1}^i E(f²(t_{j−1})(B_{t_j} − B_{t_{j−1}})² | F_α)
+ E(f²(t_i)(B_t − B_{t_i})² | F_α)
= Σ_{j=1}^i E[f²(t_{j−1}) E((B_{t_j} − B_{t_{j−1}})² | F_{t_{j−1}}) | F_α]
+ E[f²(t_i) E((B_t − B_{t_i})² | F_{t_i}) | F_α]
= Σ_{j=1}^i E(f²(t_{j−1})(t_j − t_{j−1}) | F_α) + E(f²(t_i)(t − t_i) | F_α)
= E(∫_α^t f²(s) ds | F_α).   (1.21)

The relations (1.12) are immediate consequences of (1.17) and (1.21). ∎

The next task is to extend the definition of the stochastic integral to a larger
class of functionals, by approximating these by step functionals.

Definition 1.2. A right-continuous stochastic process f = {f(t): α ≤ t ≤ β},
such that f(t) is F_t-measurable, is said to be a nonanticipative functional on
[α, β]. Such an f is said to belong to 𝓜[α, β] if

E ∫_α^β f²(t) dt < ∞.   (1.22)

If f ∈ 𝓜[α, β] for all β > α, then f is said to belong to 𝓜[α, ∞).

Proposition 1.2. Let f ∈ 𝓜[α, β]. Then there exists a sequence {f_n} of
nonanticipative step functionals belonging to 𝓜[α, β] such that

E ∫_α^β (f_n(t) − f(t))² dt → 0   as n → ∞.   (1.23)

Proof. Extend f(t) to (−∞, ∞) by setting it zero outside [α, β]. For ε > 0
write g_ε(t) := f(t − ε). Let ψ be a symmetric, continuous, nonnegative (non-
random) function that vanishes outside (−1, 1) and satisfies ∫_{−1}^1 ψ(x) dx = 1.
Write ψ_ε(x) := ψ(x/ε)/ε. Then ψ_ε vanishes outside (−ε, ε) and satisfies
∫_{−ε}^ε ψ_ε(x) dx = 1. Now define g̃_ε := g_ε * ψ_ε, i.e.,

g̃_ε(t) = ∫ g_ε(t − x)ψ_ε(x) dx = ∫ f(t − ε − x)ψ_ε(x) dx
= ∫_{t−2ε}^t f(y)ψ_ε(y − t + ε) dy   (y = t − ε − x).   (1.24)

Note that, for each ω ∈ Ω, the Fourier transform of g̃_ε is

ĝ̃_ε(ξ) = ĝ_ε(ξ)ψ̂_ε(ξ) = e^{−iεξ} f̂(ξ)ψ̂_ε(ξ),

so that, by the Plancherel identity (Chapter 0, Eq. 8.45),

∫ |g̃_ε(t) − f(t)|² dt = (1/2π) ∫ |ĝ̃_ε(ξ) − f̂(ξ)|² dξ
= (1/2π) ∫ |f̂(ξ)|² |e^{−iεξ}ψ̂_ε(ξ) − 1|² dξ.   (1.25)

The last integrand is bounded by 4|f̂(ξ)|². Since

E ∫ |f̂(ξ)|² dξ = 2π E ∫ f²(t) dt = 2π E ∫_{[α,β]} f²(t) dt < ∞,   (1.26)

it follows, by Lebesgue's Dominated Convergence Theorem, that

E ∫ |g̃_ε(t) − f(t)|² dt → 0   as ε ↓ 0.   (1.27)

This proves that there exists a sequence {g_n} of continuous nonanticipative
functionals in 𝓜[α, β] such that

E ∫_α^β (g_n(s) − f(s))² ds → 0.   (1.28)

It is now enough to prove that if g is a continuous element of 𝓜[α, β], then
there exists a sequence {h_n} of nonanticipative step functionals in 𝓜[α, β] such
that

E ∫_α^β (h_n(s) − g(s))² ds → 0.   (1.29)

For this, first assume that g is also bounded, |g| ≤ M. Define

h_n(t) = g(α + (k/n)(β − α))   if α + (k/n)(β − α) ≤ t < α + ((k+1)/n)(β − α)   (0 ≤ k ≤ n − 1),
h_n(β) = g(β).

Use Lebesgue's Dominated Convergence Theorem to prove (1.29). If g is
unbounded, then apply (1.29) with g replaced by its truncation g_M, which agrees
with g on {t ∈ [α, β]: |g(t)| ≤ M} and equals −M on the set {g(t) < −M} and
M on {g(t) > M}. Note that g_M = (g ∧ M) ∨ (−M) is a continuous and
bounded element of 𝓜[α, β], and apply Lebesgue's Dominated Convergence
Theorem to get

E ∫_α^β (g_M(s) − g(s))² ds → 0   as M → ∞. ∎

We are now ready to define the stochastic integral of an arbitrary element
f in 𝓜[α, β]. Given such an f, let {f_n} be a sequence of step functionals satisfying
(1.23). For each pair of positive integers n, m, the stochastic integral
∫_α^t (f_n − f_m)(s) dB_s (α ≤ t ≤ β) is a square integrable martingale, by Proposition
1.1(c). Therefore, by the Maximal Inequality (see Chapter I, Eq. 13.56) and the
second relation in (1.12), for all ε > 0,

P(M_{n,m} > ε) ≤ E ∫_α^β (f_n(s) − f_m(s))² ds / ε²,   (1.30)

where

M_{n,m} := max{|∫_α^t (f_n(s) − f_m(s)) dB_s|: α ≤ t ≤ β}.   (1.31)

Choose an increasing sequence {n_k} of positive integers such that the right side
of (1.30) is less than 1/k² for ε = 1/k², if n, m ≥ n_k. Denote by A the set

A := {ω ∈ Ω: M_{n_k, n_{k+1}}(ω) ≤ 1/k² for all sufficiently large k}.   (1.32)

By the Borel–Cantelli Lemma (Section 6 of Chapter 0), P(A) = 1. Now on A
the series Σ_k M_{n_k, n_{k+1}} < ∞. Thus, given any δ > 0 there exists a positive integer
k(δ) (depending on ω ∈ A) such that

M_{n_j, n_l} ≤ Σ_{k ≥ k(δ)} M_{n_k, n_{k+1}} ≤ δ   ∀ j, l ≥ k(δ).

In other words, on the set A the sequence {∫_α^t f_{n_k}(s) dB_s: α ≤ t ≤ β} is Cauchy
in the supremum distance (1.31). Therefore, ∫_α^t f_{n_k}(s) dB_s (α ≤ t ≤ β) converges
uniformly to a continuous function, outside a set of probability zero.

Definition 1.3. For f ∈ 𝓜[α, β], the uniform limit of ∫_α^t f_{n_k}(s) dB_s (α ≤ t ≤ β)
on A as constructed above is called the stochastic integral, or the Itô integral,
of f, and is denoted by ∫_α^t f(s) dB_s (α ≤ t ≤ β). It will be assumed that
t → ∫_α^t f(s) dB_s is continuous for all ω ∈ Ω, with an arbitrary specification on
A^c. If f ∈ 𝓜[α, ∞), then such a continuous version exists for ∫_α^t f(s) dB_s on the
infinite interval [α, ∞).

It should be noted that the Itô integral of f is well defined up to a null set.
For if {f_n}, {g_n} are two sequences of nonanticipative step functionals both
satisfying (1.23), then

E ∫_α^β (f_n(s) − g_n(s))² ds → 0.   (1.33)

If {f_{n_k}}, {g_{m_k}} are subsequences of {f_n}, {g_n} such that ∫_α^t f_{n_k}(s) dB_s and
∫_α^t g_{m_k}(s) dB_s both converge uniformly to some processes {Y_t}, {Z_t}, respectively,
outside a set of zero probability, then

E ∫_α^β (Y_s − Z_s)² ds ≤ lim_{k→∞} E ∫_α^β (f_{n_k}(s) − g_{m_k}(s))² ds = 0,

by (1.33) and Fatou's Lemma (Section 3 of Chapter 0). As {Y_s}, {Z_s} are a.s.
continuous, one must have Y_s = Z_s for α ≤ s ≤ β, outside a P-null set.
Note also that ∫_α^t f(s) dB_s is F_t-measurable for α ≤ t ≤ β, f ∈ 𝓜[α, β]. The
following generalization of Proposition 1.1 is almost immediate.

Theorem 1.3. Properties (a)–(c) of Proposition 1.1 hold if f (and g) ∈ 𝓜[α, β].

Proof. This follows from the corresponding properties of the approximating
step functionals {f_{n_k}}, on taking limits. ∎

Example. Let f(s) = B_s. Then f ∈ 𝓜[0, ∞). Fix t > 0. Define f_n(t) = B_t and

f_n(s) := B_{r2^{−n}t}   if r2^{−n}t ≤ s < (r + 1)2^{−n}t   (0 ≤ r ≤ 2^n − 1).

Then, writing B_{r,n} = B_{r2^{−n}t},

∫_0^t f_n(s) dB_s = Σ_{r=0}^{2^n − 1} B_{r,n}(B_{r+1,n} − B_{r,n})
= ½(B_t² − B_0²) − ½ Σ_{r=0}^{2^n − 1} (B_{r+1,n} − B_{r,n})²
→ ½B_t² − ½B_0² − ½t   a.s. (as n → ∞).   (1.34)

The last convergence follows from an application of the Borel–Cantelli Lemma
(Exercise 4). Hence,

∫_0^t B_s dB_s = ½(B_t² − B_0²) − ½t.   (1.35)

Notice that a formal application of ordinary calculus would yield
∫_0^t B_s dB_s = ½(B_t² − B_0²). Also, if one replaced the values of f_n at the left end-
points of the subintervals by those at the right endpoints, then one would get,
in place of the first sum in (1.34),

Σ_{r=0}^{2^n − 1} B_{r+1,n}(B_{r+1,n} − B_{r,n}) = ½ Σ_{r=0}^{2^n − 1} (B_{r+1,n} − B_{r,n})² + ½(B_t² − B_0²),

which converges a.s. to ½t + ½(B_t² − B_0²).
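The contrast between the two Riemann-type sums is easy to see in simulation.
In the Python sketch below (grid size and seed arbitrary), the left-endpoint sums
converge to (B_t² − t)/2 and the right-endpoint sums to (B_t² + t)/2, in agreement
with (1.34) and (1.35) (here B_0 = 0):

import numpy as np

rng = np.random.default_rng(1)
t, n = 1.0, 2**16
dB = rng.normal(0.0, np.sqrt(t / n), size=n)    # Brownian increments
B = np.concatenate(([0.0], np.cumsum(dB)))      # B_0 = 0

left = np.sum(B[:-1] * dB)                      # Ito (left-endpoint) sum
right = np.sum(B[1:] * dB)                      # right-endpoint sum
print(left, (B[-1]**2 - t) / 2)                 # close for large n
print(right, (B[-1]**2 + t) / 2)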


Observe finally that the stochastic integral in (1.2) is well defined if {σ(X_s)}
belongs to 𝓜[0, ∞). In the next section, it is shown that there exists a unique
continuous nonanticipative solution of (1.2) if μ(·) and σ(·) are Lipschitzian
and, in particular, {σ(X_s)} ∈ 𝓜[0, ∞).

2 CONSTRUCTION OF DIFFUSIONS AS SOLUTIONS OF
STOCHASTIC DIFFERENTIAL EQUATIONS

In the last section, a precise meaning was given to the stochastic differential
equation (1.1) in terms of its integral version (1.2). The present section is devoted
to the solution of (1.1) (or (1.2)).

2.1 Construction of One-Dimensional Diffusions

Let μ(x) and σ(x) be two real-valued functions on ℝ¹ that are Lipschitzian,
i.e., there exists a constant M > 0 such that

|μ(x) − μ(y)| ≤ M|x − y|,   |σ(x) − σ(y)| ≤ M|x − y|,   ∀x, y.   (2.1)

The first order of business is to show that equation (1.1) is valid as a stochastic
integral equation in the sense of Itô, i.e.,

X_t = X_α + ∫_α^t μ(X_s) ds + ∫_α^t σ(X_s) dB_s,   (t ≥ α).   (2.2)

Theorem 2.1. Suppose μ(·), σ(·) satisfy (2.1). Then, for each F_α-measurable
square integrable random variable X_α, there exists a unique (except on a set of
zero probability) continuous nonanticipative functional {X_t: t ≥ α} belonging
to 𝓜[α, ∞) that satisfies (2.2).

Proof. We prove "existence" by the method of iterations. Fix T > α. Let

X_t^{(0)} := X_α,   α ≤ t ≤ T.   (2.3)

Define, recursively,

X_t^{(n+1)} := X_α + ∫_α^t μ(X_s^{(n)}) ds + ∫_α^t σ(X_s^{(n)}) dB_s,   α ≤ t ≤ T.   (2.4)

For example,

X_t^{(1)} = X_α + μ(X_α)(t − α) + σ(X_α)(B_t − B_α),

X_t^{(2)} = X_α + ∫_α^t μ([X_α + μ(X_α)(s − α) + σ(X_α)(B_s − B_α)]) ds
+ ∫_α^t σ([X_α + μ(X_α)(s − α) + σ(X_α)(B_s − B_α)]) dB_s,   α ≤ t ≤ T.   (2.5)

Note that, for each n, {X_t^{(n)}: α ≤ t ≤ T} is a continuous nonanticipative
functional on [α, T]. Also,

(X_t^{(n+1)} − X_t^{(n)})² = (∫_α^t (μ(X_s^{(n)}) − μ(X_s^{(n−1)})) ds + ∫_α^t (σ(X_s^{(n)}) − σ(X_s^{(n−1)})) dB_s)²
≤ 2(∫_α^t (μ(X_s^{(n)}) − μ(X_s^{(n−1)})) ds)²
+ 2(∫_α^t (σ(X_s^{(n)}) − σ(X_s^{(n−1)})) dB_s)²,   (n ≥ 1).   (2.6)

Write

D_t^{(n)} := E(max_{α≤s≤t} (X_s^{(n)} − X_s^{(n−1)})²).   (2.7)

Taking expectations of the maximum in (2.6), over α ≤ t ≤ T, and using (2.1)
we get

D_T^{(n+1)} ≤ 2M² E(∫_α^T |X_s^{(n)} − X_s^{(n−1)}| ds)²
+ 2E max_{α≤t≤T} (∫_α^t (σ(X_s^{(n)}) − σ(X_s^{(n−1)})) dB_s)².   (2.8)

Now,

E(∫_α^T |X_s^{(n)} − X_s^{(n−1)}| ds)² ≤ (T − α)E ∫_α^T (X_s^{(n)} − X_s^{(n−1)})² ds
≤ (T − α) ∫_α^T D_s^{(n)} ds.   (2.9)

Also, by the Maximal Inequality (Chapter I, Theorem 13.6) and Theorem 1.3
(see the second relation in (1.12)),

E max_{α≤t≤T} (∫_α^t (σ(X_s^{(n)}) − σ(X_s^{(n−1)})) dB_s)²
≤ 4E(∫_α^T (σ(X_s^{(n)}) − σ(X_s^{(n−1)})) dB_s)² = 4 ∫_α^T E(σ(X_s^{(n)}) − σ(X_s^{(n−1)}))² ds
≤ 4M² ∫_α^T E(X_s^{(n)} − X_s^{(n−1)})² ds ≤ 4M² ∫_α^T D_s^{(n)} ds.   (2.10)

Using (2.9) and (2.10) in (2.8), obtain

D_T^{(n+1)} ≤ (2M²(T − α) + 8M²) ∫_α^T D_s^{(n)} ds = c_1 ∫_α^T D_s^{(n)} ds   (n ≥ 1),   (2.11)

say. An analogous, but simpler, calculation gives

D_T^{(1)} ≤ 2(T − α)² Eμ²(X_α) + 8(T − α)Eσ²(X_α) = c_2,   (2.12)

where c_2 is a finite positive number (Exercise 1). It follows from (2.11), (2.12),
and induction that

D_T^{(n+1)} ≤ c_2 (c_1(T − α))^n / n!   (n ≥ 0).   (2.13)

By Chebyshev's Inequality,

P(max_{α≤t≤T} |X_t^{(n+1)} − X_t^{(n)}| > 2^{−n}) ≤ 2^{2n} c_2 (c_1(T − α))^n / n!.   (2.14)

Since the right side is summable in n, it follows from the Borel–Cantelli Lemma
that

P(max_{α≤t≤T} |X_t^{(n+1)} − X_t^{(n)}| > 2^{−n} for infinitely many n) = 0.   (2.15)


Let N denote the set within parentheses in (2.15). Outside N, the series

X_t^{(0)} + Σ_{n=0}^∞ (X_t^{(n+1)} − X_t^{(n)}) = lim_{n→∞} X_t^{(n)}

converges absolutely, uniformly for α ≤ t ≤ T, to a nonanticipative functional
{X_t: α ≤ t ≤ T}, say. Since each X_t^{(n)} is continuous and the convergence is
uniform, the limit is continuous. Also, by Fatou's Lemma,

∫_α^T E(X_t^{(n)} − X_t)² dt ≤ lim_{m→∞} ∫_α^T E(X_t^{(n)} − X_t^{(m)})² dt.   (2.16)

By the triangle inequality for L²-norms, and (2.13), if m > n then

(∫_α^T E(X_t^{(n)} − X_t^{(m)})² dt)^{1/2} ≤ Σ_{r=n+1}^m (∫_α^T E(X_t^{(r−1)} − X_t^{(r)})² dt)^{1/2}
≤ Σ_{r=n+1}^m ((T − α)D_T^{(r)})^{1/2} ≤ Σ_{r=n+1}^∞ ((T − α)D_T^{(r)})^{1/2} → 0,   (2.17)

as n → ∞ (Exercise 2). Therefore, {X_t: α ≤ t ≤ T} is in 𝓜[α, T], and

∫_α^T E(X_t^{(n)} − X_t)² dt → 0,   as n → ∞.   (2.18)

To prove that X_t satisfies (2.2), note that

max_{α≤t≤T} |∫_α^t μ(X_s^{(n)}) ds + ∫_α^t σ(X_s^{(n)}) dB_s − ∫_α^t μ(X_s) ds − ∫_α^t σ(X_s) dB_s|
≤ M ∫_α^T |X_s^{(n)} − X_s| ds + max_{α≤t≤T} |∫_α^t (σ(X_s^{(n)}) − σ(X_s)) dB_s|.   (2.19)

The first term on the right side goes to zero, as proved above. For the second
term, use the Maximal Inequality (Chapter I, Theorem 13.6) to get

P(max_{α≤t≤T} |∫_α^t (σ(X_s^{(n)}) − σ(X_s)) dB_s| > 1/k)
≤ k² E ∫_α^T (σ(X_s^{(n)}) − σ(X_s))² ds ≤ k² M² (Σ_{r=n+1}^∞ ((T − α)D_T^{(r)})^{1/2})².   (2.20)

Since the last expression in (2.20) is summable in n (Exercise 2), it follows from
the Borel–Cantelli Lemma that max{|∫_α^t (σ(X_s^{(n)}) − σ(X_s)) dB_s|: α ≤ t ≤ T} ≤ 1/k

for all sufficiently large n, outside a set N_k of zero probability. Let
N_0 := ∪{N_k: 1 ≤ k < ∞}. Then N_0 has zero probability and, outside N_0, the
second term on the right side of (2.19) goes to zero. Thus, the right side of (2.4)
converges uniformly on [α, T] to the right side of (2.2) as n → ∞, outside a set
of zero probability. Since X_t^{(n+1)} converges to X_t uniformly on [α, T] outside a
set of zero probability, (2.2) holds on [α, T] outside a set of zero probability.
It remains to prove the uniqueness of the solution to (2.2). Let {Y_t: α ≤ t ≤ T}
be another solution. Then, write φ_t := E(max{|X_s − Y_s|: α ≤ s ≤ t}²) and use
(2.7)–(2.11) with φ_t in place of D_t^{(n+1)}, X_t in place of X_t^{(n)} and Y_t in place of X_t^{(n−1)},
to get

φ_T ≤ c_1 ∫_α^T φ_s ds.   (2.21)

Since t → φ_t is nondecreasing, iteration of (2.21) leads, just as in (2.12), (2.13), to

φ_T ≤ c_2 (c_1(T − α))^n / n! → 0   as n → ∞.   (2.22)

Hence, φ_T = 0, i.e., X_t = Y_t on [α, T] outside a set of zero probability. ∎
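The successive approximations (2.3)–(2.4) can be imitated numerically by
computing the two integrals along a fixed discretized Brownian path. The Python
sketch below does this for the Lipschitzian pair μ(x) = −x, σ(x) ≡ 1 (an
illustrative choice, not from the text); each pass recomputes the whole path
X^{(n+1)} from X^{(n)}:

import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 4096
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)       # increments of one Brownian path

mu = lambda x: -x                               # Lipschitz drift (illustrative)
sigma = lambda x: np.ones_like(x)               # constant diffusion coefficient

x0 = 1.0
X = np.full(n + 1, x0)                          # X^{(0)}_t = X_0, Eq. (2.3)
for _ in range(30):                             # Picard iterates, Eq. (2.4)
    drift = np.concatenate(([0.0], np.cumsum(mu(X[:-1]) * dt)))
    noise = np.concatenate(([0.0], np.cumsum(sigma(X[:-1]) * dB)))
    X = x0 + drift + noise
print(X[-1])                                    # approximation to X_T on this path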

The next result identifies the stochastic process {X_t: t ≥ 0} solving (2.2), in
the case α = 0, as a diffusion with drift μ(·), diffusion coefficient σ²(·), and
initial distribution the distribution of the given random variable X_0.

Theorem 2.2. Assume (2.1). For each x ∈ ℝ¹ let {X_t^x: t ≥ 0} denote the unique
continuous solution in 𝓜[0, ∞) of Itô's integral equation

X_t^x = x + ∫_0^t μ(X_s^x) ds + ∫_0^t σ(X_s^x) dB_s   (t ≥ 0).   (2.23)

Then {X_t^x} is a diffusion on ℝ¹ with drift μ(·) and diffusion coefficient σ²(·),
starting at x.

Proof. By the additivity of the Riemann and stochastic integrals,

X_t^x = X_s^x + ∫_s^t μ(X_u^x) du + ∫_s^t σ(X_u^x) dB_u   (t ≥ s).   (2.24)

Consider also the equation, for z ∈ ℝ¹,

X_t = z + ∫_s^t μ(X_u) du + ∫_s^t σ(X_u) dB_u   (t ≥ s).   (2.25)

Let us write the solution to (2.25) (in 𝓜[s, ∞)) as θ(s, t; z, B_s^+), where
B_s^+ := {B_u − B_s: s ≤ u ≤ t}. It may be seen from the successive approximation
scheme (2.3)–(2.5) that θ(s, t; z, B_s^+) is measurable in (z, B_s^+) (theoretical
complement 1). As {X_u^x: u ≥ s} is a continuous stochastic process in 𝓜[s, ∞)
and is, by (2.24), a solution to (2.25) with z = X_s^x, it follows from the uniqueness
of this solution that

X_t^x = θ(s, t; X_s^x, B_s^+),   (t ≥ s).   (2.26)

Since X_s^x is F_s-measurable and F_s and B_s^+ are independent, (2.26) implies (see
Section 4 of Chapter 0, part (b) of the theorem on Independence and Conditional
Expectation) that the conditional distribution of X_t^x given F_s is the distribution of
θ(s, t; z, B_s^+), say q(s, t; z, dy), evaluated at z = X_s^x. Since σ{X_u^x: 0 ≤ u ≤ s} ⊂ F_s,
the Markov property is proved. To prove homogeneity of this Markov process,
notice that for every h > 0, the solution θ(s + h, t + h; z, B_{s+h}^+) of

X_{t+h} = z + ∫_{s+h}^{t+h} μ(X_u) du + ∫_{s+h}^{t+h} σ(X_u) dB_u   (t ≥ s)   (2.27)

has the same distribution as that of (2.25). This fact is verified by noting that
the successive approximations (2.3)–(2.5) yield the same functions in the two
cases except that, in the case of (2.27), B_s^+ is replaced by B_{s+h}^+. But B_s^+ and B_{s+h}^+
have the same distribution, so that q(s + h, t + h; z, dy) = q(s, t; z, dy). This
proves homogeneity.
To prove that {X_t^x} is a diffusion in the sense of (1.2) or (1.2)′ of Chapter
V, assume for the sake of simplicity that μ(·) and σ²(·) are bounded (see Exercise
7 for the general case). Then

E(X_t^x − x) = E ∫_0^t μ(X_s^x) ds + E ∫_0^t σ(X_s^x) dB_s = E ∫_0^t μ(X_s^x) ds
= tμ(x) + ∫_0^t E(μ(X_s^x) − μ(x)) ds = tμ(x) + o(t),   (2.28)

since E|μ(X_s^x) − μ(x)| → 0 as s ↓ 0. Next,

E(X_t^x − x)² = E(∫_0^t μ(X_s^x) ds)² + E(∫_0^t σ(X_s^x) dB_s)²
+ 2E[(∫_0^t μ(X_s^x) ds)(∫_0^t σ(X_s^x) dB_s)]
= O(t²) + ∫_0^t Eσ²(X_s^x) ds + o(t)   as t ↓ 0.   (2.29)

For, by the Schwarz Inequality,

E[(∫_0^t μ(X_s^x) ds)(∫_0^t σ(X_s^x) dB_s)]
≤ [E(∫_0^t μ(X_s^x) ds)²]^{1/2} [E(∫_0^t σ(X_s^x) dB_s)²]^{1/2}
= O(t)[∫_0^t Eσ²(X_s^x) ds]^{1/2} = O(t)O(t^{1/2}) = o(t)   as t ↓ 0,

leading to

E(X_t^x − x)² = ∫_0^t Eσ²(X_s^x) ds + o(t)   as t ↓ 0.   (2.30)

Now, as in (2.28),

∫_0^t Eσ²(X_s^x) ds = ∫_0^t σ²(x) ds + ∫_0^t E(σ²(X_s^x) − σ²(x)) ds
= tσ²(x) + o(t)   as t ↓ 0.   (2.31)

The last condition (1.2)′ of Chapter V is checked in Section 3, Corollary 3.5.
∎
2.2 Construction of Multidimensional Diffusions
In order to construct multidimensional diffusions by the method of Itô it is
necessary to define stochastic integrals for vector-valued integrands with respect
to the increments of a multidimensional Brownian motion. This turns out to
be rather straightforward.
Let {B_t = (B_t^{(1)}, ..., B_t^{(k)})} be a standard k-dimensional Brownian motion.
Assume (1.4) for {B_t}. Define a vector-valued stochastic process

f = {f(t) = (f^{(1)}(t), ..., f^{(k)}(t)): α ≤ t ≤ β}

to be a nonanticipative step functional on [α, β] if (1.5) holds for some finite set
of time points t_0 = α < t_1 < ··· < t_m = β and some random vectors f_i
(0 ≤ i ≤ m) with values in ℝ^k.

Definition 2.1. The stochastic integral, or the Itô integral, of a nonanticipative
step functional f on [α, β] is defined by

∫_α^t f(s)·dB_s := f(t_0)·(B_t − B_{t_0})   for t ∈ [α, t_1],

∫_α^t f(s)·dB_s := Σ_{j=1}^i f(t_{j−1})·(B_{t_j} − B_{t_{j−1}}) + f(t_i)·(B_t − B_{t_i})   for t ∈ (t_i, t_{i+1}].

(2.32)

Here · (dot) denotes the euclidean inner product,

f(s)·(B_t − B_s) = Σ_{i=1}^k f^{(i)}(s)(B_t^{(i)} − B_s^{(i)}).   (2.33)

It follows from this definition that the stochastic integral (2.32) is the sum
of k one-dimensional stochastic integrals,

∫_α^t f(s)·dB_s = Σ_{i=1}^k ∫_α^t f^{(i)}(s) dB_s^{(i)}.   (2.34)

Parts (a), (b) of Proposition 1.1 extend immediately. In order to extend part
(c) assume, as in (1.9),

∫_α^β E|f(u)|² du = Σ_{i=1}^k ∫_α^β E(f^{(i)}(u))² du < ∞.   (2.35)

The square integrability and the martingale property of the stochastic integral
follow from those of each of its k summands in (2.34). Similarly, one has

E(∫_α^β f(s)·dB_s) = 0.   (2.36)

It remains to prove the analog of (1.11), namely,

E((∫_α^β f(u)·dB_u)² | F_α) = E(∫_α^β |f(u)|² du | F_α).   (2.37)

In view of (1.11) applied to ∫_α^β f^{(i)}(u) dB_u^{(i)} (1 ≤ i ≤ k), it is enough to prove that
the product terms vanish, i.e.,

E[(∫_α^β f^{(i)}(u) dB_u^{(i)})(∫_α^β f^{(j)}(u) dB_u^{(j)}) | F_α] = 0   for i ≠ j.   (2.38)

To see this, note that, for s < s′ ≤ u < u′,

E[f^{(i)}(s)(B_{s′}^{(i)} − B_s^{(i)}) f^{(j)}(u)(B_{u′}^{(j)} − B_u^{(j)}) | F_α]
= E[E(··· | F_u) | F_α]
= E[f^{(i)}(s)(B_{s′}^{(i)} − B_s^{(i)}) f^{(j)}(u) E(B_{u′}^{(j)} − B_u^{(j)} | F_u) | F_α] = 0.   (2.39)

Also, by the independence of F_s and {B_{s′} − B_s: s′ ≥ s}, and by the independence
of {B_t^{(i)}} and {B_t^{(j)}} for i ≠ j,

E[f^{(i)}(s) f^{(j)}(s)(B_{s′}^{(i)} − B_s^{(i)})(B_{s′}^{(j)} − B_s^{(j)}) | F_α]
= E[E(··· | F_s) | F_α]
= E[f^{(i)}(s) f^{(j)}(s){E((B_{s′}^{(i)} − B_s^{(i)})(B_{s′}^{(j)} − B_s^{(j)}) | F_s)} | F_α]
= 0.

Thus, Proposition 1.1 is fully extended to k dimensions. For the extension to
more general nonanticipative functionals, consider the following analog of
Definition 1.2.

Definition 2.2.. If f = {f(t) = (f ' 1 (t), ... , f(m))) a < t < } is a right-continuous
stochastic process with values in (for some m) such that f(t) is F,-measurable
for all t e [a, ], then f is called a nonanticipative functional on [a, ] with values
in fl8m. If, in addition,

J
E R If(t)j 2 dt < oo, (2.40)

then f is said to belong to .WW[a, ]. If f e W[a, ] for all > a, then f belongs
to ./H[a, o)).

For f e di[, ] with values in R'` one may apply the results in Section Ito
each of the k components f;, f (u) dBu (1 <, i < k) to extend Proposition 1.2
to k-dimension and to define the stochastic integral f f(u) dB. for a
k-dimensional f e ^[a ]. The following extension of Theorem 1.3 is now
immediate.

Theorem 2.3. Suppose f e . &{x, ] with values in l8 k . Then

(a) t J r f(u) dB is continuous and additive; also

(b)tJ f(u) dB: a < t <, } is a square integrable {, }-martingale and


F

f(u) f If(u)I 2 du . f . (2.41)


E \\^a dBu ) 1 3a / = E
The final task is to construct multidimensional diffusions. For this let
(x), a i; (x) (1 < i, j < k) be real-valued Lipschitzian functions on W. Then,
writing, a(x) for the k x k matrix ((u jj (x))),

I t(x) t(y) I '< Mx YI, Ia(x) a(Y)II '< Mix YI, (x, ye
(2.42)
580 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

for some positive constant M. Here II II denotes the matrix norm,


IIDII := sup{IDzI: z e I8k}.
Consider the vector stochastic integral equation,

X, = X a + Ja ' (Xs) ds + Ja a(X5)


t dB5, (t s), (2.43)

which is shorthand for the system of k equations

X;'' = X +
a
p (X,) ds +
a
a ; (X 5 )dB s , ( 1 < i < k). (2.44)

Here ,(x) is the k- dimensional vector, (6 i1 (x), ... , v ;k (x)). The proofs of the
following theorems are entirely analogous to those of Theorems 2.1 and 2.2
(Exercises 4 and 5). Basically, one only needs to write

IX E+ ,( t) X(t)l 2 instead of (Xa+ 1 (t) X(t)) 2 ,


r 2 /(' r \2

J a (jt(X(s)) (X _ 1 (s))) ds in place of


(J a ((X(s)) (X _ 1 (s))) ds
ds)I

Ila(Xn(s)) a(Xn- i(s))II 2 for (6(Xn (ss)) Q(X _ (s)) 2 , etc.

Theorem 2.4. Suppose p( ), a(.) satisfy (2.42). Then, for each -measurable
square integrable random vector X 8 , there exists a unique (up to a P -null set)
continuous element {X,: t > a} of #[a, oo) such that (2.43) holds.

Theorem 2.5. Suppose ( ), a(.) satisfy (2.42), and let {X'; t > 0) denote the
unique (up to a P-null set) continuous nonanticipative functional in .t[0, co)
satisfying It's integral equation

X,
f' ,
s) ds +
0
J r 6(X 8 ) dB5 , ( t >- 0). (2.45)

Then {X'} is a diffusion on R' with drift t() and diffusion matrix a()6'( ),
a'(.) being the transpose i(.).

It may be noted that in Theorems 2.4 and 2.5 it is not assumed that a(x) is
positive definite. The positive definiteness guarantees the existence of a density
for the transition probability and its smoothness (see theoretical complement
5.1 of Chapter V), but is not needed for the Markov property.

Example. Let k = 1, (x) = yx, 6(x) = a. Then the successive approximations


DIFFUSIONS AS SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS 581

X,(" (see Eqs. 2.3-2.5) are given by (assuming B 0 = 0)


)

X'( o) - X0,

X; ') = Xp + J yX ds + 6B, = X (I ty) + uB ,


Jo
o 0 r

X; 2) = X0 + J y{X (I sy) +
r 0 QB S } ds + aB ,
0

t i ,z \ r
= I ty+ } )XpyQ B,ds+aB
2!J 0
Cl S2y2 s
X1 3) = Xp + p y (I Sy + 2,--- ^ Xp y6 B u du + QB S ^ dS + 6B,

J(' B ds+a'B,
fo
2 2 3 31 I s Jz
= t ^ ---- IX 0 +, 2 6
I ty+ 2

2 2 33
/ f(' o,
r
0
B dudsy6
0
s

i t 3! ^Xp+,, a
ty+ -2

t22 t3 3
o
r
( f ds)Bduy f Bs ds+aB,
. ' o,

= 1 ty+ -- ,

2!
- - ---'-)X0+,r 2 (J
3
J( '

o
(tu)Bdu
0
Bsds+aB,.
(2.46)
Assume, as an induction hypothesis,

I
fr(tS )m
-
(ty)Mn-1
a," ) = Y -- )X 0 + Y- (Y)r"6 Bs ds + 6B,. (2.47)
( M "= 0 mI m1 0 (n1 I)!

Now use (2.4) to check that (2.47) holds for n + 1, replacing n. Therefore, (2.47)
holds for all n. But as n co, the right side converges (for every w e S2) to

e -r yX0 ya f r e-vlr-S^B5 ds + QB,.


(2.48)
Jo
Hence, Xr equals (2.48). In particular, with Xp = x,

X, = e-"x r
t7 e-''u-s)B5 ds + aB r . (2.49)
0

As a special case, for y > 0, a 0- 0, (2.49) gives a representation of an


OrnsteinUhlenbeck process as a functional of a Brownian motion.
582 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

3 IT'S LEMMA

Brownian paths s - B, have finite quadratic variation on every finite time


interval [0, t] in the sense that
x -1
max Ji ( B(m+l)2-nt - Bm2-nt) 2 - N2 - "t -+0 a.s. as n - cc. (3.1)
16N52" m =0

This is easily checked by recognizing that the expression ZN say, within the ,,, ,

absolute signs is, for each n, a martingale (1 < N < 2 "), so that the Maximal
Inequality (Chapter I, Eq. 13.56) may be used to prove (3.1) (Exercise 1). Notice
that the quadratic variation of {B } over an interval equals the length of the
5

interval. A consequence of (3.1) is a curious and extremely important "chain


rule" for the stochastic calculus, called It's Lemma. The present section is
devoted to a derivation of this chain rule and some applications.
For an intuitive understanding of this chain rule, consider a nonanticipative
functional of the form

Y(t) = Y(0) + J " f(s) ds + fo g(s) dB


0
5 , ( 3.2)

where f, g E l'[0, T], and Y(0) is .-measurable and square integrable. One
may express (3.2) in the differential form

dY(t) = f(t)dt + g(t)dB,. (3.3)

Suppose that cp is a real-valued twice continuously differentiable function on


ll , say with bounded derivatives gyp', cp". Then It's Lemma says

dq (Y(t)) = q'( Y(t)) dY(t) + zip " (Y(t))g 2 (t) dt


_ {qp'(Y(t)) f(t) + iqp"(Y(t))g 2 (t))} dt + gp'(Y(t))g(t) dB,. (3.4)
.

In other words,

(P(Y(t)) = (P(Y(0)) +
E {(P'(Y(s))f(s) + 1 "(Y(s))g 2 (s)} ds +
0
-
J 9 '(Y(s))g(s) dB 5
(3.5)
.

Observe that a formal application of ordinary calculus would give

dcp(Y(t)) = q'(Y(t)) d Y(t) = cp'(Y(t))f(t) dt + tp'(Y(t))g(t) dB,. (3.6)

The extra term Zip "(Y(t))g 2 (t) dt appearing in (3.4) arises because

(dY(t)) 2 = g 2 (t)(dB,) 2 + o(dt) = g 2 (t) dt + o(dt).


ITO'S LEMMA 583

Since the term g 2 (t) dt cannot be neglected in computing the differential dcp(Y(t)),
one must expand cp(Y(t + dt)) around Y(t) in a Taylor expansion including the
second derivative of cp.
The same argument as above applied to a function p(t, y) on [0, T] x (f8 1 ,
such that cp o := cp/t, q,', cp" are continuous and bounded, leads to

dcp(t, Y(t)) = {(p o (t, Y(t)) + cp'(t, Y(t))f(t) + jqp'(t, Y(t))g 2 (t)} dt

+ cp'(t, Y(t))g(t) dB,. (3.7)

To state an extension to multidimensions, let cp(t, y) be a function on


[0, T] x Ihm that is once continuously differentiable in t and twice in y. Write


a0(P(t, Y) __ a0t, Y) 7r(P(t, y)'= a Y) (1 < r < m). (3.8)
at

Let {B,} be a k-dimensional standard Brownian motion satisfying conditions


(1.4(i), (ii)) (with B s replaced by B S ). Suppose Y(t) is a vector of m processes of
the form

Y(t) = (Y (I) (t), . . . , Y


Im' \t)),

f
(3.9)
y (r) (t) = y()() + I fr( s) ds + J s dB s
grO r m).
(1 ^^
o o

Here fr,. 'fm are real-valued and g 1 , ... , g m vector-valued (with values in
ff8'k ) nonanticipative functionals belonging to . &[0, T]. Also Y ( ' ) (0) are
.o -measurable square integrable random variables. One may express (3.9) in
the differential form

dY r (t) = fr (t) dt + g,(t).dB,


( ) (1 <, r v m). (3.10)

It's lemma says,

dQ(t, Y(t)) = U o(p(t , Y(t)) + ar(p(t, Y(t))f(t)


l r=1

1
+ arar. (t, Y(t))(gr(t) . r (t)) dt
2 ! ISr.r'-<m
m
+ I a r (p(t, Y(t))gr (t).dB,. (3.11)
r
584 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

In order to arrive at this, write

dcp(t, Y(t)) = (p(t + dt, Y(t + dt)) (p(t, Y(t)) = ^p(t + dt, Y(t + dt))

(p(t, Y(t + dt)) + qp(t, Y(t + dt)) qo(t, Y(t))

= 0 (p(t, Y(t)) dt + a r (p(t, Y(t)) dY r (t) ( )

r=1

+ 12 Orar'gP(t,Y(t))dy(r)(t)dY(r')(t).
! 1 r,r' S m
(3.12)

In Newtonian calculus, of course, the contribution of the last sum to the


differential would be zero. But there is one term in the product dY r (t) dY r' (t) ( ) ( )

which is of the order dt and, therefore, must be retained in computing the


stochastic differential. This term is (see Eq. 3.10)

(gr(t)dB1)(gr(t)dB) _ (^ 9("(t) dB)Xj


\t - 1 Y
= 1 9(t)

_
f=1
ga 1 (t)9:` 1 (t)(dB,( ) 2 + E 9( t) (t)9(' ) (t)dB(` ) dBf;> .
i#j
(3.13)
Now as seen above (see Eq. 3.1) the first sum in (3.13) equals

19r` (t)9:` (t)


) ) dt = gr(t)gr (t) dr. (3.14)
( )

To show that the contribution of the second term in (3.13) to


cp(t, Y(t)) cp(0, Y(0)) over any interval [0, t] is zero note that, for i j,

N-1
= Y 9:` ) (m 2 "t)grj) (m 2-" t)(B(m+l)Y ^t - Bm 2 ^t)(Bm +1)2
-
t ) - ^t - Bm2 ^t),
m=0

I <N<2", (3.15)

is a martingale. This is true as the conditional expectation of each summand,


given all the preceding, is zero. Therefore, as in (3.1),

max JZ N , 0 a.s. as n -+ oo . (3.16)


1 <N<,2"

Thus,

g;(t)g;P(t) dB 1 dB 1 = 0 (i ^ j). (3.17)


ti

IT'S LEMMA 585

Using (3.14), (3.17) in (3.12), It's lemma (3.11) is obtained. A more elaborate
argument is given in theoretical complement 3.1. For ease of reference, here is
a statement of It's Lemma.

Theorem 3.1. (It's Lemma). Assume f1 .... , f g 1 , ... , g,,, belong to


.l[0, T], y ( r ) (0) (1 < r < m) .moo -measurable and square integrable. Let cp(t, y)
be a real-valued function on [0, T] x 1m which is once continuously
differentiable in t and twice in y. Assume that

E fOT ( 0 r 4P(s, Y(s)) 2 Ig r (s)I 2 ds < oo, (I < r <, m), (3.18)

i.e., r cpgr e .A"[0, T] (1 <, r < m). Then (3.11) holds, i.e.,

(P(t, Y(t)) w(s, Y(s)) = J IOo(P(u Y(u)) + Y_ Or^(U, Y(U))fr (U)


,

r= 1

+ 12 1
. 1 5
arar'(p(u, Y(u))gr(u)gr.(u)
r.r' <- m I
du

m t
+ Z 0 r cp(u, Y(u))g r (u) dB (0 <, s < t < T).
r=1 s
(3.19)

Applying It's Lemma to the diffusion Y(t) = X, (t >, 0) constructed in


Section 2, the following corollary is obtained immediately.

Corollary 3.2. Let {X t } be a diffusion given by the solution to the stochastic


differential equation (2.43), with a = 0 and tt( ), 6() Lipschitzian. Assume
cp(t, y) satisfies the hypothesis of Theorem 3.1 with m = k, and (3.18) replaced by

E fO T (arcP)2(u, Xu)I6r (Xu)I2du < oo, (1 < r < k).

(a) Then one has the relation

9(t, X,) = cp(s, X S ) + 5{O o (u, X) + (Atp)(u, X)} du


s
k t
+ rcp(u, X)6 r (X)dB. (s < t < T), (3.20)
r=1 s

where a r (x) is the rth row vector of a(x), and A is the differential operator
586 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

(14.9) of Chapter V with D(x) = a(x)a'(x),

r
(A(P)(u, x)'= drr'(X)arar'(v(u, x) + Z (X)ar(v(u, x), )
(

2 1-<r,r'Sk r=1
(3.21)
((drr'(X)))'^ O(X)O'(X).

(b) In particular, if 0 0 (p, O rr cp are bounded then

Z, := w(t, XJ ^ {a o (p(u, X.) + (A(p)(u, X.)} du (0 , t , T),


J 0
(3.22)
is a {.^,}-martingale.

Note that part (b) is an immediate consequence of part (a), since Z, Zs


equals the stochastic integral in (3.20), whose conditional expectation, given
.gis , is zero.
Corollary 3.2 generalizes Corollary 2.4 of Chapter V to a wider class of
functions and to multidimension. Therefore, one may provide derivations of
Propositions 2.5 and 15.1 of Chapter V, as well as criteria for transience and
recurrence based on them, using It's Lemma instead of Corollary 2.4 of Chapter
V. The next result similarly provides a criterion for positive recurrence. Recall
the notation (see Eqs. 15.33 and 15.34 of Chapter V)

D(x) _ ((d ;; (x))):= 6(x)a'(x), d(x):= Y d jj (x) x t') x ci)/jxj 2 ,


i.i

B(x):= > d(x), C(x): =2 x ( ` )JUY ) (x),

B(x) + C(x) B(x) + C(x)


(r) := max 1, (r) min 1, (3.23 )
Ixl =r d(x) 1xl =r d(x)

& (r):= max d(x), a(r)r= Ixl_r


min d(x),
Ixl =r

I(r) := f r (u) du, 1(r) := (u) du,


J ^ u fu
where c> 0 is a given constant. Also note that (see Eq. 15.35 of Chapter V)
for every F that is twice differentiable on (0, oo), Acp for cp(x):= F(IxI) is given by

B(x) + C(x)d(x)
2 (Aq, )(x) = (d(x))F (Ixl) + Ixl F (Ixl) (Ixt > 0). (3.24)

Proposition 3.3. Let t(.), 6() be Lipschitzian, and (x) nonsingular for all x.


IT'S LEMMA 587

Suppose that, for some c > 0,

J^ J 1 exp{1(u)} du < oo
a(u)
(3.25)

Then

E(t B(O:ro) ( X o = x) < cc (Ixj > ro > 0), (3.26)

where

t B(O : r) := inf{t 0: 1X^l = r} . (3.27)

Proof. First note that if (3.25) holds for some c> 0 then it holds for all c > 0
(Exercise 2). Let c = ro > 0. Define

F(r) :=
f"0
exp{ 1(u)}
u
J acv)exp{I(v)} dv J du, (r >, ro ) (3.28)

and

q(x) = F(Ixl), Ixi ro . (3.29)

Note that F'(r) > 0 and F"(r) + ((r)/r)F'(r) _ 1/a(r) for r > r 0 . Hence, by
(3.24),

2(A4)(x) < d(x)/a((x)) < 1 forIxI >, r 0 . (3.30)

Fix x such that Ixi > r0 . By Corollary 3.2(b), and optional stopping (Proposition
13.9 of Chapter 1; also see Exercises 3, 4),

EZgN = EZ o = q(x) = F(Ixl) (3.31)

where Z, is as in (3.22) with cp(t, y) = q(y) and cp o (t, y) = 0, and r1 N is the


{.}-stopping time

?J N := inf{t >, O:IX; I = ro or N}, (r0 < IxI < N). (3.32)

Using (3.30) in (3.31) the following inequality is obtained,

2EF(IXn I) 2F(Ixl) = E
N
J nN 2(Acp)(X9) ds E(11,). (3.33)
0

Now the first relation in (3.25) implies that T (O:, o ) < oo a.s. (see Corollary 15.2
of Chapter V. or Exercise 5). Therefore, rJ N --+ t,, B(a:ro) a.s. as N -* co, so that
588 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

(3.33) yields

E('tae(O:ro) 1 Xo = x) < 2 F(Ixl). (3.34)


n

As in the one-dimensional case (see Section 12 of Chapter V), it may be


shown that (3.26) implies the existence of a unique invariant distribution of the
diffusion (theoretical complement 5).
As a second application of It's Lemma, let us obtain an estimate of the
fourth moment of a stochastic integral.

Proposition 3.4. Let f = {(fi (t), ... , fk (t)): 0 < t < T} be a nonanticipative
functional on [0, T] satisfying

T f 4(t) dt < oo (i = 1, ... , k). (3.35)


EJ 0
Then

E
ff .
f(t)dB,4 < 9k 3 T Jy ^
'T{Ef
1=1
4(t)} dt.
0
(3.36)

Proof. Since f(t) dB, = Z _, f (t) dB}` ) ,

f
E( oT f(t)dB,f 4 =k 4 E) 1 j f Tf(t)dBi^>)4
/ \\\ki =io
4

k 4 Ek .f (t) dB/4 = k 3 Y E \J 1 (t) dB,(')


(f T
0 T f
(3.37)
Hence, it is enough to prove
T 4 T
E( g(t) dB) {Eg 4 (t)} dt (3.38)
0 0

for a real-valued nonanticipative functional g satisfying

E f0T
g 4 (t) dt < oo. (3.39)

First suppose g is a bounded nonanticipative step functional. Then by an explicit


computation (Exercise 6) the nonanticipative functional s -a (f g(u) dB.) 3 g(s)
belongs to AY[0, T]. One may then apply It's Lemma (see Eq. 3.5, or Eq.
IT'S LEMMA 589

3.20 with k = 1) to q (y) = y 4 and Y(t) = f o g(s) dB to get 5

fo g(u) dB I Z g 2 (s) ds + 4 fo 9(u) dBu 3 g(s) dBs>


( fo g(s) dBs / 4 6 \Jo (10 )

0<t<T. (3.40)

Taking expectations,

=6 f g(u) dB 1 z g 2 (s)} ds. (3.41)


J
J, E \ o g(s) dB, 4 o Ej \Jo

This shows, in particular, that J, is absolutely continuous with a density


satisfying

z 4^1/2

f
(' )

o g(u) dB {Eg4(t)}^^2
dtv = 6E I( 0 g(u) dB ) g2(t)} 6 E(
= 6J 2 {Eg 4 (t)}'/ 2 . (3.42)

Therefore, unless J, = 0 (which implies J = 0 for 0 < s < t), S

dJ,its Z J1 viz 5 3{Eg 4 (t)}'/


= 2,
dt dt

or,

JI'' < 3 J t {Eg 4 (s)} l / 2 ds, J, '< 9t J t (Eg 4 (s)) ds.


0 0

This proves (3.38) for bounded nonanticipative step functionals. For the case
of an arbitrary nonanticipative functional g satisfying (3.39), observe that such
a g belongs to ./l[0, T]. Following the proof of Proposition 1.2, there exists a
sequence of bounded step functionals g that converge almost surely to g and
for which
T

f 'T
9(s) dB, --
0
g(s) dB 5 a.s. (i = 1, 2),

E f
OT (gn(s) g z (S)) Z ds > 0.

590 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

By Fatou's Lemma,

n-
E J('o
T \ 4
g(t) dB r !
/
-< lim E
n-'m (f oT
g(t) dB,^
4
9T lim E
Jo
T
g, (t) dt

T
= 9TE fg 0
4 (t) dt. n

As a corollary, the property (1.2) of {X, } in Section 1 of Chapter V follows.

Corollary 3.5. Let {X} be a diffusion with drift coefficient p() and diffusion
coefficient a 2 (.), both Lipschitzian and bounded. Then, for every E > 0,

P(X" xl > E) = 0(t) as t j 0. (3.43)

Proof. By Chebyshev's Inequality,

P(IXI xl) < E(Xx x) 4 .


E

But if u(.) and a(.) are both bounded by c, then


r a
E(Xl x) 4 = EI (Xs) ds + o (Xs) dB s
/fo , Jo
4 1
4 + E (Joi(Xs) dBs)
23{E(Jo(Xs) ds)

8 {(ct) 4 + 9t 2 C 4 } = 0(t 2 ) = 0(t) as t j 0. n

For another application of It's Lemma (see Eq. 3.19), let

0
Y(t) != Xl X = x y+
J r

((Xs) (X5 )) ds + J r
0
(a(XS) a(XS )) dBs,
(3.44)
and cp(z) = Iz1 2 , to get

Ixe xs I Z = IX y1 2 + {2(x5 xs) . ((X5 ) (Xs ))


J r
0

+ I rr(Xs) ar(X5)i2} ds
r=1

+ J
0
2(X Xs) (a(Xs) (XS )) dB s . (3.45)
CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 591

Note that, since X, X; E l[0, oo) and p() and o(.) are Lipschitzian, the
expectation of the stochastic integral is zero, so that

EIX, Xfl 2 = Ix y1 2 +
J'
o f
2E(Xs Xs) . ((Xs) (Xs ))

+ 1 Et(X) - 6r(XS )I 2 ( ds. (3.46)


r=1

In particular, the left side is an absolutely continuous function of t with a density

dt EIX Xfl 2 2MEIXt Xf l 2 + kM 2 EIX, Xfl 2 . (3.47)

Integrate (3.47) to obtain

EIX' X, l 2 < Ix y12e(2M+kM2),. (3.48)

As a consequence, the diffusion has the Feller property: If y --^ x, then p(t; y, dz)
converges weakly to p(t; x, dz) for every t >, 0. To deduce this property, note
that (3.48) implies that, for every bounded Lipschitzian function h on R' with
Ih(z) h(z')I < cjz z'J,

IEh(XT) Eh(X^ )I = IE(h(XT) h(X^ ))I cEjX X; c(EIX; X; X 2 )'/ 2


clx yIe(M+kM2/2)t y 0 as y x. (3.49)

Now apply Theorem 5.1 of Chapter 0. To state the Feller property another
way, write T, for the transition operator

(T,f)(x) = Ef(X,) = f f (y)p(t; x, dy). (3.50)

The Feller property says that if f is bounded and continuous, so is T 1 J'.


A number of other applications of It's Lemma are sketched in the Exercises.

4 CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR


DIFFUSIONS

A significant advantage of the theory of stochastic differential equations over


other methods of construction of diffusions lies in its ability to construct and
analyze with relative ease those diffusions whose diffusion matrices
D(x):= u(x)a'(x) are singular. Such diffusions are known as singular, or
degenerate, diffusions. Observe that in Section 2 the only assumption made on
592 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

the coefficients is that they are Lipschitzian. As we shall see in this section, the
stochastic integral representation (2.45) and It's Lemma are effective tools in
analyzing the asymptotic behavior of these diffusions. Notice, on the other
hand, that the method of Section 15 of Chapter V (also see Section 3 of the
present chapter) does not work for analyzing transience, recurrence, etc., as
quantities such as (r) may not be finite.
Singular diffusions arise in many different contexts. Suppose that the velocity
V, of a particle satisfies the stochastic differential equation

dV, = 0(V,) dt + a 0 ' (V1) dB', (4.1)

where {B} is a standard three-dimensional Brownian motion. The position


Y, of the particle satisfies

dY, = V, dt. (4.2)

The process {X, := (V 1 , Y,)} is then a six-dimensional singular diffusion governed


by the stochastic differential equation

dX, = (X,) dt + a(X,) dB,, (4.3)

where

FL(x) = (o(v), v)', 6(x) 0 _I _


c'^
v) 0 '

writing x = (v, y) and 0 for a 3 x 3 null matrix. Here {B,} is a six-dimensional


standard Brownian motion whose first three coordinates comprise {B,(" }. Often,
as in the case of the OrnsteinUhlenbeck process (see Example 1.2 in Chapter
V, and Exercise 14.1(iii) of Chapter V), the velocity process has a unique
invariant distribution. The position process, being an integral of velocity, is
usually asymptotically Gaussian by an application of the central limit theorem.
Thus, the asymptotic properties of X, = (V Y,) may in this case be deduced
from those of the nonsingular diffusion {V, }. On the other hand, there are many
problems arising in applications in which the analysis is not as straightforward.
One may think of a deterministic process U, = (Ut", U^ 21 ) governed by a system
of ordinary differential equations dU, = (U 1 ) dt, having a unique fixed point
u* such that U, --+ u* as t -- co, no matter what the initial value U 0 may be.
Suppose now that a noise is superimposed that affects only the second
component Ue 2) directly. The perturbed system X, = (X; 11 , XI 21 ) may then be
governed by an equation of the form (4.3) with

( 0 0
6(x) - 0( (x)^
CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 593

where a&(x) is nonsingular. The following result may be thought to deal with
this kind of phenomenon, although 6(x) need not be singular. For its statement,
use the notation

(x), J(x)_ ((J1i(x))). (4.4)

Theorem 4.1. Let i(.), 6() be Lipschitzian on ll ,

Il(x) a(y)II '< )olx yl for all x, ye UBk. (4.5)

Assume that, for all x, the eigenvalues of the symmetric matrix 2(J(x) + J'(x))
are all less than or equal to A 1 < 0, where k2 < 22 1 . Then there exists a
unique invariant distribution ir(dz) for the diffusion, and p(t; x, dz) converges
weakly to n(dy) as t , cc, for every x e W.

Proof. The first step is to show that, for some x, the family { p(t; x, dy): t >, 0}
is tight, i.e., given E > 0 there exists C E < oo such that

p(t; x, {y: IYI > c E }) <, e (t >, 0). (4.6)

It will then follow (see Chapter 0, Theorem 5.1) that there exist t ^ cc and a
probability measure it, perhaps depending on x, such that

weakly
p(t.; x, dy) i rc(dy) as n --+ oo. (4.7)

We will actually prove that

sup EIXfI Z < oo, (4.8)


t>'0

where {X'} is the solution to (4.3) with X 0 = x. Clearly, (4.6) follows from (4.7)
by means of Chebyshev's Inequality,

p(t; x, {lyl > c}) = P(I X I > c) i


c
EIX^ I 2 .

In order to prove (4.8), apply It6's Lemma (see Eq. 3.20) to the function
^(y) = Iy1 2 to get

5
t (' t
IXxI 2 = IXI2 + (Acp)(XS)ds + . ,(p(X7)a,(Xs)dB5. (4.9)
0 r= 1 0
594 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Now check that


k

(Agq)(Y) = 2y it(Y) + X I(y)I 2 = 2 Y'(Y) + tr(a(Y)a' (Y)), (4.10)

where tr D denotes the trace of D, i.e., the sum of the diagonal elements of D.
Now, by a one-term Taylor expansion,
i
(r) (Y) r (0) =
( ) y grad j(By) dB,
0

(y) (0 ) = J 1 y'J(Oy)dO, (4.11)


0

(Y'J(Oy)Y) dO + y(0).
y (Y) = Y - ((Y) (0 )) + Y'( 0 ) =
f o
1

Now,

Y'J(OY)Y = Y'J'(OY)Y = zY'(J(OY) + J'(OY))Y 5 A,jy 2 . (4.12)

Using this in (4.11),

2 Y'(Y) < 2 1 y 2 + 2 (I t( 0 )I)IYI. (4.13)

Also,

tr(a(Y) 6 '(Y)) = tr{((Y) a( 0 ))(a(Y) 6 (0 ))' + a(0 )(u(Y) (0))'


+ (a(y) 6(0))6'(0) + a(0) (0)} . (4.14)

Since every element of a matrix is bounded by its norm, the trace of a k x k


matrix is no more than k times its norm. Therefore, (4.14) leads to

tr((Y)a'(Y)) < kAjy( 2 + 2kIG(0)IA0IYl + kIa(0)I 2 . (4.15)

Substituting (4.13) and (4.15) into (4.10), and using the fact that Il < 6 1Y1 2 + 1/5
for all 6 > 0, there exist 6, > 0, 6 2 > 0, such that 2), 1 kt 6 1 > 0 and

(AqP)(Y) (2A k^0 b^)IYI 2 + b 2 (4.16)

Now take expectations in (4.9) to get

EIX^ I 2 = IXI Z + EE(Agp(XS)) ds. (4.17)


CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 595

In arriving at (4.17), use the facts (i) IAq (y)I < cjy1 2 + c' for some c, c' positive,
so that f o EIA(p(Xs)I ds < oo, and (ii) the integrand of the stochastic integral in
(4.9) is in .t[0, cc). Now (4.17) implies that t --> EIX, I Z is absolutely continuous
with a density satisfying, by (4.16),

EIXx 1 2 = EAW(X') kA 2 6 1)EjX x 1 2 + 6 2 . (4.18)


dt
In other words, writing 6' .= 22 1 k2 6 1 > 0, 8(t) = EIX9 2 ,

a ` 0 (t)) - 6 2 e b `,
dt (e (4.19)
0 (t) -< {0(0) + ( 2 /8')(e a ` 1)}e - a` <, Ix1 2 e - a' 1 + 6 2 /S'.

This proves (4.8) and therefore (4.6) for all x.


The next task is to prove that the limit in (4.7) does not depend on x. For
this it is enough to show

lim EIX, XT I 2 = 0 `dx, ye IF8'`. (4.20)

For, if (4.7) holds for some x, and (4.20) holds for this x and all y then for
every Lipschitzian and bounded f, writing t for t,,,

Ef(XT) ff (z) n(dz) < IEf(Xfl Ef(Xx )I + Ef(X,) f f(z)7r(dz)

,<cEIX; X;I +Ef(X,) f f(z)n(dz)

c(EIXt X11 2 ) 1 / 2 + ^Ef(X;) Jf(z)(dz) ^ . 0 ,

as t o oo. It follows (Chapter 0, Theorem 5.1) that if (4.7) holds, then

p(tn; y, dz) ' n(dz) Vy C R k . (4.21)

To prove (4.20), proceed as in the first step, replacing X; by

0
Z,:= X, X; =x y+ J t f(s) ds + J ry(s) dB s ,
0
(4.22)
where

f(s) = (Xs) (Xs) = J 1 (XS X5)'J(XS + 0(Xs Xs )) d 0 ,


o (4.23)
Y(s)'= aND a(XS )
596 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Letting Y, = Z, and p(z) = Iz1 2 in It6's Lemma (see Eqs. 3.44-3.46), obtain

EIZ t I 2 = Ix - y1 2 + E J {2Z . f(s) + tr y(s)y'(s)} ds.


5 (4.24)

But, by (4.23) and the fact that z'J(w)z < A l lz1 2 for all w, z (see Eq. 4.12),

Z S fs = J 1
0
Z;J(X., + OZ5 )Zs dO A I IZs12. (4.25)

Also,

IIY(s)Y'(s)II II7(s)II IIY'(s)II A 2 '7 2 ,


so that

tr y(s)y'(s) S k 2 0IZsj 2 . (4.26)

Using (4.25) and (4.26) in (4.24), obtain

EIZ` I 2 = E {2Z,f(t) + tr y(t)y'(t)}


dt

(2A 1 k 2 )EIZ,I 2 , (EIZOI 2 = Ix y1 2 ). (4.28)

Hence (4.20) holds and, therefore, the limit in (4.21) is independent of y e Il k


(i.e., the limit in (4.7) is independent of x).
The final step is to show that (4.21) implies that n is the unique invariant
probability. Let T, denote the transition operator for the diffusion (see Eq.
3.50). Then for every bounded continuous function f, (4.21) implies

(T,J)(Y) - ..i :=
-
f f (z)it(dz). (4.29)

By Lebesgue's Dominated Convergence Theorem, applied to the sequence


{T,. f } and the measure p(t; x, dy),

T,(T,.,f)(x) = J (TtJ)(Y)P(t; x, dy) TJ = .f. (4.30)

But T, f is a bounded continuous function, by the Feller property (see end of


Section 3). Therefore, applying (4.29) to T, f,

T,,(T,.f)(x) -' J (T^f)(z)n(dz). (4.31)


CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS 597

As T,(T,,, f) = T,^(T, f) = T, +t ^ f, the limits in (4.30) and (4.31) coincide,

J (T, f )(z)zr(dz) = J f (z)mc(dz), (4.32)

i.e., if X 0 has distribution it, then Ef(X,) = Ef(X 0 ) for all t > 0. In other
words, it is an invariant (initial) distribution. To prove uniqueness, let n' be
any invariant probability. Then for all bounded continuous f,

J (T f)(z)n'(dz) = f f(z)ir'(dz)
Vn. (4.33)

But, by (4.29), the left side converges to 7. Therefore,

f- f f(z)ir(dz) = ff(z)7r'(dz),
(4.34)

implying it' = n. U

Consider next linear stochastic differential equations of the form

dX, = CX, dt + a dB, (4.35)

where C and a are constant k x k matrices. This is a continuous-time analog


of the difference equation (13.10) of Chapter II for linear time series models.
It is simple to check that (Exercise 1)

t
XX:= e``x + e ( `"s )c v dB
Q s (4.36)

is a solution to (4.35) with initial state x. By uniqueness, it is the solution with


X o = x. Observe that, being a limit of linear combinations of Gaussians, X, is
Gaussian with mean vector exp{tC}x and dispersion matrix (Exercise 2)

L(t) = E eta-s)co dB se
(r-s)'6 dB s = e('-s)c66^e('-s)' ds
0 0 0

=
5
0
eae du. (4.37)

Suppose that the real parts of the eigenvalues A.....2, , say, of C are

negative. This does not necessarily imply that the eigenvalues of C + C are all
negative (Exercise 3). Therefore, Theorem 4.1 does not quite apply. But, writing
598 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

[t] for the integer part of t and B = exp {C },

Ile` c I < Ile t " c Ii Ile" -( ""II < IIB 1 t'II max{ Ilesc ii: 0 < s < 1} -+ 0 (4.38)

exponentially fast, as t -+ oo, by the Lemma in Section 13 of Chapter II. For


the eigenvalues of B are exp{A ; } (1 < i < k), all smaller than I in modulus. It
follows from the exponential convergence (4.38) that exp{tC }x - 0 as t - oo,
and the integral in (4.37) converges to

E := J ec aae c du.
0
" (4.39)

The following result has, therefore, been proved.

Proposition 4.2. The diffusion governed by (4.35) has a unique invariant


distribution if all the eigenvalues of C have negative real parts. The invariant
distribution is Gaussian with zero mean vector and dispersion matrix E given
by (4.39).

Under the hypothesis of Proposition 4.2, the diffusion governed by (4.35)


may be thought of as the multivariate Ornstein-Uhlenbeck process.

EXERCISES

Exercises for Section VII.1


1. Let s = t 0 ,,, < t l , n < < t k ., n = t be, for each n, a partition of the interval [s, t],
such that S n := max {t ;+ ,, n - t in : 0 i 5 k n - 1} -+0 as n - oo.
(i) Prove that
n
s :_
n (B,3 a n Btn) 2
i=0

converges to t - s in probability as n --> oc. [Hint: E(sn - (t - s)) 2 -+ 0.]


(ii) Prove that

J B,, i l.n B, ,I * 00
i=0

in probability as n --> oo. [Hint: y n -> s ',/maxJB,,,,, n B, i.n ^.]


(iii) Let it denote an arbitrary subdivision s = t o < t l < <t,,, = t of [s, t]. Prove
that the supremum of y(ir) B,,, B, 3 1 over all partitions n is 00 for all
Brownian paths, outside a set of zero probability. [Hint: Choose a subsequence
in (ii) such that y n --* co a.s.]

EXERCISES 599

(iv) Prove that, outside a set of zero probability, the Brownian paths are of unbounded
variation over every finite interval [a, b], a < b. [Hint: Use (iii) for every interval
[a, b] with rational end points.]
(v) Use (iv) to prove that, outside a set of zero probability, the Brownian paths are
nowhere differentiable on (0, oo). [Hint: If f is differentiable at x, then there
exists an interval [x h, x + h] such that If() f (z)I < ( 2 I f '(x)l + i) iy zI
Vy, z e [x h, x + h], so that f is of variation less than (21f'(x)l + 1)2h on
[x h,x+h).]
2. Consider the stochastic integral equation (1.3) with p() Lipschitzian, ((x) le(y)i
Mix yt. Solve this equation by the method of successive approximations, with the
nth approximation X;" given by )

X}">=x+ J (y(" '>) ds+a(B, B o )


- (n>1), X; 1 =x.
0

[Hint: For each t > 0 write b"(t) := max {IXS" XS" "1.0 -< s < t}. Fix T> 0. Then
) -

S"(T)-<M
T

0
S"_ i (s)ds-<M 2 J f
(' Tr

0 0
-1

S "_ 2 (s)dsdt"_,

M"
T

o fo

t-,

J(' tz

... 6,(s)dsdt 2 ..dt"_, < M "B I (T)T" '(n 1)!


o
-

Hence, E"
c5(T) converges to a finite limit, which implies the uniform convergence
of {X:0 5 s T} to a finite and continuous limit {XS : 0 s T}.]
3. Suppose f", Ja [a, ] and E f Q (f"(s) f (s)) 2 ds - 0 as n + co.
(i) Prove that

r r l
U"'= max f(s)dB f(s)dB. : 0
Ja ))J

in probability.
(ii) If { Y(t): a t } is a stochastic process with continuous sample paths and
f Q f(s) dB s --* Y(t) in probability for all t e [a, ], then prove that

P(Y(t)=f(s)dB,,`dte[a,]=1. )

4. Prove the convergence in (1.34).


5. Let f (t) (t o < t < t,) be a nonrandom continuously differentiable function. Prove that

J to
f(s) dB a = f(t1)B1 f(t 0 )B B,sf'(s) ds.
ro
10
600 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Exercises for Section VII.2


I. If h is a Lipschitzian function on Il' and Xis a square integrable random variable,
then prove that Eh 2 (X) < oo. [Hint:look at h(X) - h(0).]
2. Let DT's be as in the proof of Theorem 2.1. Prove that the sum ^^ " +1 (D) 2
occurring in (2.17) goes to zero as n - oo. [Hint: Use Stirling's approximation (10.3)
in Chapter I.]
3. Assume the hypothesis of Theorem 2.1, with a = 0.
(ii) Prove that

E(X, - X0 ) 2 -< 4M 2 (t + 1)
fo,
E(XS - X0 ) 2 ds + 4t 2 Ep 2 (X0 ) + 4 tEa 2 (Xo

(ii) Write D, E(max{(XS - X0 ) 2 :0 s t }), 0 t , T. Deduce from (i) that

(a) D, <, dl
f o D
s ds + d 2 , where d, = 4M 2 (T + 1), d 2 = 4T(TEp 2 (X0 ) + Ea 2 (X0

(b) DT < d 2 e ' T .

4. Write out a proof of Theorem 2.4 by extending step by step the proof of Theorem 2.1.
5. Write out a proof of Theorem 2.5 along the lines of that of Theorem 2.2.
6. Consider the Gaussian diffusion {X} given by (2.49).
(i) Show that EX, = e - `Yx.
(ii) Compute Cov(Xl , X+,,).
(iii) For the case y > 0, a 96 0, prove that there exists a unique invariant probability
distribution, and specify this distribution.
7. (i) Prove (2.28), (2.30), and (2.31) under the assumption (2.1).
(ii) Write out a corresponding proof for the multidimensional case.

Exercises for Section VII.3


(i) Prove (3.1). [Hint: For each n, the finite sequence (B , +,^r v - B rii2 - ) 2 - 2 "t
(
,
-

(m = 0, 1, ... , 2" - 1) is i.i.d. with mean zero and variance 2(4 ")t 2 . Use (13.56)
-

of Chapter I to estimate
1
P max ZN ,"I > --

and then apply the Borel-Cantelli Lemma.]


(ii) Prove (3.16).
2. Let t( ), a() be Lipschitzian and a(x) nonsingular for all x.
(i) Show that if
exp{ -I(u)} du = oo for some c > 0,
J
then the same holds for all c > 0.

EXERCISES 601

(ii) Show that if

^^ '(u) exp{1(u)} du < co for some c > 0,

then the same holds for all c > 0.


3. Let p( ), a() be Lipschitzian and d, 1 (x)= i1(x)l 2 > 0 for all x. Prove that Et < oo,
where r := inf{t ? 0: XI = d} and lxi < d. [Hint: For a sufficiently large c, the
function cp(x):= exp{cx" ) } in {Ixi d} (extended suitably) satisfies A(x) < 0 in
{lxs <_ d}. Let 6:= max{Acp(x): x d}. Apply It6's Lemma to cP and optional
stopping (Proposition 13.9 of Chapter 1) with stopping time t A n to get
E(r A n) _< (2/6) max{1gp(x)I: jxl < d}. Now let n j oo.]
4. Let F be the twice continuously differentiable function on [r o , N], 0 < r o < N < oo,
defined by (3.28). Find a twice continuously differentiable extension of F on [0, oo)
that vanishes outside [r o e, N + t:], where E > 0 is chosen so that r o s > 0. Show
that cp(x):= F(Ixl) is twice continuously differentiable on l, vanishing outside a
compact set. Apply It6's Lemma to cp to derive (3.31).
5. Let

F(r):= J r exp{ 1(u))} du, c -< r -< d,

where /(u) is defined by (3.23).


(i) Define cp(x):= F(IxI) for c < jxJ _< d and obtain a twice continuously
differentiable extension of cp vanishing outside a compact.
(ii) For the function tp in (i) check that Agp(x) ` 0 for c 5 lxi < d, and use It6's
Lemma to derive a lower bound for P(r( e(o:r) < =ae(o:d)), where raa(v:.)'=
,

inf{t >_ 0:IX;i = r}.


(iii) Use (ii) to prove that

P(iaBio:r o ) < 00) = I for lxi > r o ,


provided

f c
^ exp{ 1(u)} du = ca for some c> 0.

(iv) Use an argument similar to that outlined in (i) (iii) to prove that

oo) < 1 if exp{-1(u)} du < oo.


P(zae(o:. o ) <
J m

6. Suppose that g() is a real-valued, bounded, nonanticipative step functional on


[0, T]. Prove that

f t
t . ( g(u)
o dB g(t)
3
belongs to .'H[0, T]
602 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

[Hint: Use (1.6), with g in place of f, to get E {(f g(u) dB) 6 g 2 (t)} S ct' for an
appropriate constant c.]
7. Let {X} be a diffusion on 68 1 having Lipschitzian coefficients ( ), v( ), with
o 2 (x) > 0 for all x.
(i) Use It's Lemma to compute (a) i/i(x):= P( {X^ } reaches c before d), c < x < d;
(b) p P({X; } ever reaches c), x > c; (c) p xd t= P( {Xf } ever reaches d), x < d.
[Hint: Consider p(y):= f' exp{ 1(c, r)} dr for c -< y -< d, where 1(c, r):_
f C ( 2 u(z)/a 2 (z)) dz.]
(ii) Compute (a) ET, A T d , where T,:= inf {t _> 0: Xx = r }, and c < x < d; (b) Et c
(x > c); (c) ET d (x <d). [Hint: Let cp be the solution of Acp(y) _ 1 for c <y < d,
cp(c) = p(d) = 0; use Ito's Lemma.]

8. (i) Let m be a positive integer, and g a nonanticipative functional on [0, T] such


that E f o g 2 m(t) dt < oo. Prove that

T 2m
E/ g(t)dB,) < (2m 1)mTm-I f Eg2m(t)dt.
o 0T

[Hint: For bounded nonanticipative step functionals g, use Ito's Lemma to get

t \ 2m t (' s 2,-2
Jt 1= E(g(s) dB s ) m(2m 1) E (J g(u) dB g 2 (s) ds,
)

o / o o

so that
(' dJ, = m(2m I )E (J
r
g(u) dB,)
2m -2
g 2 (t)
dt o

f
2m (2m -2)/2m

m(2m 1)E g(u) dB.) (Eg2m(t))I /m


1( o

(by Hlder's Inequality).]


(ii) Extend (i) to multidimension, i.e., for nonanticipative functionals f =
{(f,(t), ... , fk (t)): 0 < t < T} satisfying E J 01 f 2m(t) dt < oo (1 _< i _< k), prove
that
f0T
E(1 T f(t)dB,
Jo /
< k 2 m - I(2m 1)mTm-I YE J f 2 m(t) dt.

9. (L-Maximal Inequalities)
(i) Let {Z.: n = 0,1, ...} be an {}-martingale, such that EIZ n I < oo for all n
and for some p > 1. Prove that

Pt max IZ m I> ^/ < i -F EjZn^ p VA > 0.


,

\I m n

[Hint: Note that {IZ n j} is a submartingale, and see Exercise 13.10 of Chapter I.]
EXERCISES 603

(ii) Assume the hypothesis of (i) for some p > 1. Prove that

max IZnI I ` (P/(P 1))PEIZ"I.


E^ 0<m_n

[Hint: Write M"'= max{IZ,I: 0 _< m _< n}. Then

AP(M" A) _ j IZnI 1M , dP ,
Ja
by (13.55) of Chapter I. Therefore,

J('

EMn = E(p '' dA = p


0
M m

o
AP - 'P(M" >- A) d i (by Fubini)

p
J
(' m

o
A p-2 ^ IZnI1,.^,,, dPl dA = P IZ4I^
\ /
('

J \ o
M^
Al-2 d), dP

_ (PI (P 1 ))
IZnl M^ -' dP
Ja

(p/(p 1))(EMP) ( p - ' )1 P(EIZI ')" (by Hlder's Inequality) .]

(iii) Let {Z,: t > 0} be a continuous-parameter {.F,}-martingale, having a.s.


continuous sample paths. If EIZ,IP < oo for all t ? 0, and for some p > 1, then
show that

max IZ,I >,) <- A. - EIZTI VA> 0,


P( o<t,T

and, if p> 1, show that

max IZ,Ij < (p/(p 1))PEIZ T Ip.


Ej 0<t<T

10. (i) In addition to the hypothesis of Theorem 2.1 assume that EX'" < co, where
m is a positive integer. Prove that for 0 _< t a < 1,

/ / / (2m)1 u2m
IIX[ Xall2m -< c1( m, M) (t IX)Il^lXa)112m + (t a)1J211Q(Xa)112m\m
2m j J
= c l (m, M)tp(t 0),

say. Here II.112. is the Le m-norm for random variables, and c l (m, M) depends
only on m and M. [Hint: Write S;":= sup{E(XS"' s t}. Use
the triangle inequality for II 2m' and Exercise 8(i), to show that

(a) I(Xt ' Xa Ii2m <- q (t a), (((1))1/2m < (t a)e


604 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

t
(b) afn) < 22m-1M2m {(t
IX)2m
-1 + (2m 1)m (t a) m -1 } 6(gn-1j ds
l
a

for n>2.
Observe that

II Xt X12,,, ((S(n))1/2m

n=1 /

(ii) Deduce from (i) that sup,.,,, EX < oo for all t > a.
(iii) Extend (i), (ii) to multidimension.
11. (i) Use the result in 10(i) to prove, for every e > 0, that P(IX,, xl >, s) = o(t), as
t j 0, (a) uniformly for all x in a bounded interval, if p(.), a(.) are Lipschitzian,
and (b) uniformly for all x in 08' if p(. )' a() are bounded as well as Lipschitzian.
(ii) Extend (i) to multidimension.
(iii) Let t(.), a(.) be Lipschitzian. Prove that

P( max IXs xl >_ e f = o(t) as t 10,


o5s5r I

uniformly for every bounded set of x's. Show that the convergence is uniform
for all x e IF k if t(. ), a() are bounded as well as Lipschitzian. [Hint:

PI max IX'r xl s > ^ l(X")I


S d s % + P max a ^ -
a X adB
oss,t e/ < P \J o t E/
2 \ Os: Ifos 2() I
='1 + 1 2,
say. To estimate I l note that Ip(XS)I < 1(z)1 + MIX' xl, and use Chebyshev's
Inequality for the fourth moment and Exercise 10(iii). To estimate / 2 use Exercise
9(iii) with p = 4, Proposition 3.4 (or Exercise 8(ii) with m = 2) and Exercise
10(iii).]
12. (The Semigroup {T, }) Let p(-), a() be Lipschitzian.
(i) Show that IITtf f II -+ 0 as t j 0, for every real-valued Lipschitzian f on 08 k
vanishing outside a bounded set. Here II II denotes the "sup norm." [Hint: Use
Exercise 10.]
(ii) Let f be a real-valued twice continuously differentiable function on 08'`.
(a) If Af and gradf are polynomially bounded (i.e., IAf(x)I < c l (1 + Ixlm'),
(grad f (x)l c 2 (1 + Ixlm'), show that T, f (x) -+ f(x) as t j 0, uniformly on
every bounded set of x's.
(b) If Af is bounded and grad f polynomially bounded, show that
IITtf f II -+ 0. [Hint: Use It's Lemma.]
(iii) (Infinitesimal Generator) Let -9 denote the set of all real-valued bounded
continuous functions f on R k such that Ilt - '(T,f f) gII -^ 0 as t 10, for
some bounded continuous g (i.e., g = ((d/dt)T r f), = o , where the convergence of
the difference quotient to the derivative is uniform on R"). Write g = Af for
such f, and call A the infinitesimal generator of {T, }, and -9 the domain of A.
Show that every twice continuously differentiable f, which vanishes outside
some bounded set, belongs to -9 and that for such f, Af = Af where A is given
EXERCISES 605

by (3.21). [Hint: Suppose f(x) = 0 for Ixl > N. For each E > 0 and x E

ITJ(x) f(x) tAf(x)i -< tO(Af:c) + 2tIIAf IIPI max IXs xl


\o<s<<

where (h:) = sup{th(y) h(z)I: y, z E R", ly ... zl < a}. As r j0. O(Af :s) --p 0,
and for each e > 0,

max IXS xi _> e 0 as t j 0


P( osssr /

uniformly for all x in Ne r_ {y: II _< N + F} (see Exercise I I (iii)). For x E NE,

IT,f(x) f(x) tA.f(x)I = E fo Af(X,)ds tIIAfIIP(

where z':= inf{t >_ 0: IX, I = N}. But P(r" <, t) < sup{P(r'' <_ t): lyl = N + F},
by the strong Markov property applied to the stopping time t:= inf{t >_ 0:
IX, I = N + f}. Now

t) max IXsyl>,e^=o(t) astj0,


o,s,l

uniformly for all y satisfying lyl = N + e.]


(iv) Let f be a real-valued, bounded, twice continuously differentiable function on
If8'k such that gradf is polynomially bounded, Af is uniformly continuous, and
Af(x) + 0 as Ixl oo . Show that f E -9. [Hint: Extend the argument in (iii).]
(v) (Local Property of A) Suppose f, g E _ are such that f(y) = g(y) in a
neighborhood B(x:e) XI <a} of x. Then Af(x) = Ag(x). [Hint:
T, f(x) T,g(x) = o(t) as t j 0, by Exercise 11(iii).]

13. (Initial-Value Problem) Let t(). a() be Lipschitzian. Adopt the notation of
Exercise 12.
(i) If f E -9, then T, f E for all t _> 0, and T, f = T,f. [Hint:

h(Th(Tf) T1f)=
Ti(Tti h f ) ^T,Af

in "sup norm," since T,g 11911 for all bounded measurable g.]
(ii) (Existence of a Solution) Let f satisfy the hypothesis of Exercise 12(iv). Show
that the function u(t, x):= T, f (x) satisfies Kolmogorov's backward equation

au(t, x) = Au(t, x

and the initial condition

u(t, x) f(x) as t j 0, uniformly on (I^'`.


606 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

If, in addition, u(t, x) is twice continuously differentiable in x, for > 0, then


Au(t, x) = Au(t, x). [Hint: The first part is a restatement of (i). For the second
part use Exercise 12(iii), (v).]
(iii) (Uniqueness) Suppose v(t, x) is continuous on {t -> 0, x e H"), once continuously
differentiable in t > 0 and twice continuously differentiable in x e H'", and
satisfies
av(t,x)
=Av(t,x) (t >0,xEH"),
at
lim v(t, x) = f(x) (x e R'),
rio

where f is continuous and polynomially bounded. If Av(t, x) and 1grad v(t, x)j =
^(a/axW . .. , a/ax"k")v(t, x)I are polynomially bounded uniformly for every
compact set of time points in (0, eo), then v(t, x) = u(t, x):= Ef (X^ ). [Hint: For
each t > 0, use It's Lemma to the function w(s, x) = v(t s, x) to get

r
Ev(e, X^_ E ) v(t, x) = E {v o (r, X) + Av(r, X r )} dr = 0,
E

where v 0 (t, y)'= (av/at)(t, y). Let e 10.]


14. (Dirichlet Problem) Let G be a bounded open subset of H". Assume that A( ), (.)
are Lipschitzian and that, for some i, d(x)1= (x)I 2 > 0 for x e G. Suppose v is a
twice continuously differentiable function on G, continuous on G, satisfying

Av(x) _ g(x) (x e G),

v(x)=f(x) (x e bG),

where f and g are (given) continuous functions on aG and G, respectively. Assume


that v can be extended to a twice continuously differentiable function on H'". Then

v(x) = E f (X7) + E T g(X,) ds (x e G),


.1 0

where r'= inf{t >, 0: X; e aG}. [Hint: v can be taken to be twice continuously
differentiable on H k with compact support. Apply Itd's Lemma to {v(X,)}, and then
use the Optional Stopping Theorem (Proposition 13.9, Chapter I). Also see Exercise
3.]

15. (FeynmanKac Formula) Let p(.), 6() be Lipschitzian. Suppose u(t, x) is a


continuous function on [0, oo) x H", once continuously differentiable in t for t > 0,
and twice continuously differentiable in x on H", satisfying

au(t, x)
= Au(t, x) + V(x)u(t, x) (t > 0, x E H"), u(0, x) = f(x),
at

where f is a polynomially bounded continuous function on H k , and V isa continuous


function on H k that is bounded above. If grad u(t, x) and Au(t, x) are polynomially

THEORETICAL COMPLEMENTS 607

bounded in x uniformly for every compact set oft in (0, oo), then

u(t,x)= E(f(X,)exp {
f .'
V(XS)ds})
J
(t>,0,xcR'`).

[Hint: For each t > 0, apply It's Lemma to

1'(s) = (Y1(s), Y2(s)) =


(
X,., J V(X) ds')
0
and w(s, y) = u(t s, Y1),

where y = ( y 1 , y 2 ).]

Exercises for Section VII.4


1. (i) Use the method of successive approximation to show that (4.36) is the solution
to (4.35).
(ii) Use It's Lemma to check that (4.36) solves (4.35).
2. Check that (4.37) is the dispersion matrix of the solution {X'} of the linear stochastic
differential equation (4.35).
3. (i) Let C be a k x k matrix such that C + C' is negative definite (i.e., the eigenvalues
of C + C' are all negative). Prove that the real parts of all eigenvalues of C are
negative.
(ii) Give an example of a 2 x 2 matrix C whose eigenvalues have negative real parts,
but C + C' is not negative definite.
4. In (4.35) let C = cI where c > 0 and I is the k x k identity matrix. Compute the
dispersion matrices E(t), E explicitly in this case, and show that they are singular if
and only if a is singular.

THEORETICAL COMPLEMENTS

Theoretical Complements to Section VII.1

>_
subsigmafield of .. The P-completion ' of

_>
1. (P- Completion of Sigmafields) Let (0, .F , P) be a probability space,
is the sigmafield {G u N: G e
N c M e such that P(M) = 0}. The proof of Lemma I of theoretical complement
V.1.1 shows that if .3, is P-complete for all t 0 then the stochastic process
{f(t): t 0} is progressively measurable provided (1) f(t) is ,y,-measurable for all t,
and (2) t + f(t) is almost surely right-continuous. Several times in Section 1 we have
constructed processes { f(t)} that have continuous sample paths almost surely, and
a

which have the property that f(t) is .F- measurable. These are nonanticipative in view
of the fact that, is P-complete.

_< _<
2. (Extension of the Ito Integral to the Class .[a, /f]) Notice that the It integral (1.6)
is well defined for all nonanticipative step functionals, and not merely those that are
in #[a, ]. It turns out that the stochastic integral of a right-continuous
nonanticipative functional { f(t): a t fl} exists as an almost surely continuous

608 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

limit of those of an appropriate sequence of step functionals, provided

P
f 2 (s) ds < oo a.s. (T.1.1)
Ja

The class of all nonanticipative right-continuous functionals { f (t): a < t < }


satisfying (T.1.1) is denoted by 2[a, ]. Given an f E .^[a, ] define, for each positive
integer n,
A(t; n) {
:=:
l
cw J a
f 2 (s)ds<n f,,(t)=.i(t)lAU ")
} I
;

Then f" E #[a, ]. Write


)

A":= w: j f 2 (t, w) dt <n - A(; n).


a JJJ

If m > n then on A. we have fm (t, co) = f"(t, w) for a <, t _< . It follows that

J a
" fm (s) dB, =
a
J f (s) dB, n for a _< t _< , for almost all w E A".
To see this, consider g, h E H[a, ] such that w) = h(t, w) for a _< t _< , g(t,
we A e F. If g, h are step functionals, then it follows from the definition (1.6) of the
stochastic integral that

_< t < , for all A.


J r
g(s) dB = (s)
r
a
^ h dB
s
a
s fora wE

In the general case, let g ", h" be nonanticipative step functionals such that

g"(s) dB, -- j g(s) dB,, j h"(s) dB, > h(s) dB, for a _< t _< , a.s.
j a ,l a ,J a a

Without loss of generality one may take, for each n, the same partition
"}
{t ; (n): 0 < i < N of [a, ] for both g h Modify h if necessary, so that ", ". ",
h"(tf", co) = g"(t; ", w) for we {h(s) = g(s) for a _< s _< t," } (0 _< i <_ N"). Then )

c
for a _< t _< , if w E A,
h"(s)
J dB, = a
g"(s) dB,
so that, in the limit,

a _< t < , almost everywhere on A.


J a
h(s) dB5 =
a
g(s) dB,,

Thus, we get

J r fm(s)
a
dB. = Jf a
t "(s) dB, a _< t < , almost everywhere on A".

THEORETICAL COMPLEMENTS 609

Therefore,

lim fm(s) dB, = f,(s) dB, a < t -< , almost everywhere on A.


m-t f^ ' Js
A stochastic process that equals 1'. fn (s) dB, a.s. on A. (n >- 1) will be called the Ito
Ja
integral off and denoted by f (s) dB,. Since P(A) . I in view of (T.1.1), on the
complement of U a A the process may be defined arbitrarily.
The proof of Proposition 1.2 may be adapted for f e Y[a, (f] to show that (see
(1.25) and the definition of h. after (1.29)) there exists a sequence of nonanticipative
step functionals h such that

E (h(s) f(s)) 2 ds * 0 in probability.

From this one may show that f Q h a (s) dB, converges in probability to S f(s) dB,. For
this we first derive the inequality

P^ J
('
f (s) dB, > e^ P(
a
j a

f2(5)dS -> c +
c.
z (T.1.3)
r,

for all c > 0, e > 0. To see this define

g(t) = f(t) 1 A(t; )-

Since g e . #{, ], and f(t) = g(t) for -< t /3 on the set { Ja f 2 (s) ds <c}, it follows
that

P
(I Ja.i(s) dB E) P( J f 2 (s) ds > c) +
P(I JQ
g(s) dB, > r

2
2(s) ds c^ + E\Ja g(s) dB )/ F :

P \J f

J
PI f 2 (s) ds c) + (E
c
E g2(s)
ds) J /2

P f 2(s) ds > c + (T.1.4)


EZ
a

Applying (T.1.3) to h f one gets, taking c = e 3 ,

lim
(J :h
a (S) dB, J f(s) dB, >

which is the desired result. Properties (a), (b) of Proposition 1.1 hold for f E ^P[a, ].
610 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Theoretical Complements to Section VII.2


1. (The Markov Property of {X1 }) Consider the equation (2.25). Let 5s , denote the
P -completion of the sigmafield a {B" Bs : s < u -< t }. The successive approximations
(2.3) (2.5), with a = s, X,, = z, may be shown to be measurable with respect to
(U8 1 ) Qx X5 . To see this, note that (z, u, w) --+ X." (w) is measurable on
)

(R' x [s, t] x f2,.(l 1 ) Q.1([s, t]) Qx 9,,,) for every t > s. Also, if (z, u, u0) --* X (w)
is measurable with respect to this product sigmafield, then so is the right side of (2.4).
This is easily checked by expressing the time integral as a limit of Riemann sums,
and the stochastic integral as a limit of stochastic integrals of nonanticipative (with
respect to { 9 : u -> s }) step functionals. It then follows that the almost sure limit X,
of X; " (as n - oo) is measurable as a function of (z, w) (with respect to the sigmafield
)

f(H 1 ) px (g R' x i)). Write the solution of (2.25) as cp(s, t; z). Then
X; = (p(s, t; XS ), by (2.24) and the uniqueness of the solution of (2.25) with z = X.
Since X. is S measurable, and is independent of 15 ,, it follows that for every bounded
Borel measurable function g on R', E(g(X;) I ) _ [Ey(rp(s, t; z))] z=Xt (see Section
4 of Chapter 0, Theorem on Independence and Conditional Expectation). This proves
the Markov property of {X, }. The multidimensional case is entirely analogous.

2. (Nonhomogeneous Diffusions) Suppose that we are given functions p '(t, x), 6(t, x)(

on [0, oo) x H k into H'. Write (t, x) for the vector whose ith component is
^' (t, x), a(t, x) for the k x k matrix whose (i, j) element is Q(t, x). Given s > 0, we
)

would like to solve the stochastic differential equation

dX, = (X t) dt + a(X t) dB, (t >, s), XS = Z, (T.2.1)

where Z is `ks measurable, EIZI 2 < oc. It is not difficult to extend the proofs of
Theorems 2,1, 2.2 (also see Theorems 2.4, 2.5) to prove the following result.

Theorem T.2.1. Assume

1(t, x) (s, Y)I 5 M(Jx yl + It sI),


(T.2.2)
Ila(t, x) a(s, Y)II M(Ix yI + It sl),

for some constant M.


(a) Then for every . -measurable k -dimensional square integrable Z there exists a
unique nonanticipative continuous solution {X,: t ^ s} in .&[s, oo) of

X,=Z+ J S
u)du+
5
J f a(X,u)dB" (t s). (T.2.3)

(b) Let {X;": t ? s} denote the solution of (T.2.3) with Z = x E H". Then it is a
(possibly nonhomogeneous) Markov process with (initial) state x at time s and
transition probability

p(s', t; z, B) = P(X,'z E B) (s -< s' t). (T.2.4)



THEORETICAL COMPLEMENTS 611

Theoretical Complements for Section VII.3


1. (Itb's Lemma) To prove Theorem 3.1, assume first that f g, are nonanticipative
step functionals. The general case follows by approximating f,, g, by step functionals
and taking limits in probability.
Fix s < t (0 -< s < t -< T). In view of the additivity of the (Riemann and stochastic)
integrals, it is enough to consider the case f,(s') = f,(s), g,(s') = g,(s) for s -< s' t.
Let t ) = s < t(" ) < < tN,), = t be a sequence of partitions such that

S":=max{t+1t;"):=0-<i<-N"-1}--^0 as n oc.
Write
Nn - 1
(t, Y(t)) P(s, Y(s)) =[(t1, Y(t )) (P(ti"', Y(tt)fl
i =o
Nn - 1

+ [^(t;"), Y(t'? l )) q(ti" ) Y(t1" ) ))] . (Ti 1)


i =o

The first sum on the right may be expressed as

Nn - i
"1
(tl t l ){oq(t , (ti+l))+R';} (T.3.2)
i
where

IRn .'I 5 max{Iaoq(u, Y(t i+ )) aoq(u', Y(t i+ l ))I: t u u' ti+, } -a 0

uniformly in i (for each w), as n , because c o op is uniformly continuous on


[s, t] x { Y(u, w): s -< u -< t}. Hence, (T.3.2) converges, for all w, to

Jo'= d0 (u, Y(u)) du. (T.3.3)


J l

By a Taylor expansion, the second sum on the right in (T.3.1) may be expressed as

Nn - 1 m
Y(tlnl))
(Ylrl(ti+l) Y (ry
(tlnl)) 0, q(tin)
i =0 r=1

I Nn -I
+ > ( ) Y l.l (tl" ) ))(Y`"'(t'+) y r.l (t^"'))
2 ! t^r,r'Sm i =o
X l a r ar' q ( t n) Y(t'n))) + Rrr).n.i}, (T.3.4)

where R, 0 uniformly in i (for each w), because (u, y) ---r ,. 3, cp(u, y) and
t -a Y(t) are continuous. Using (3.9) the first sum in (T.3.4) is expressed as

m Nn-1 J {'
nl
Y- Y- l( 1 i ") ti )f,(s) + g,(s)' (Bt( , Bilnl)}urq(tin) Y(t))
r=1 i =0 l

' Jt 1 `_ .=t ^J sl
f.(u)a.^P(u, J
1'(u)) du + 1 a,(P(u, Y(u))g.(u)dB"],

in probability, (T.3.5)

612 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

since ,cp is continuous and (see Exercise 1.3)

N-1
f 4"
(0 (p(t(
"1 Y(ti"^)) 3 p(u, Y(u)))g,(s)dB = J h"(u)dB,,,
(' I

1=0 " s

say, where $ Jh,.,(u)j du -+ 0 a.s. It remains to find the limiting value of the second
sum in (T.3.4) (excluding the remainder). For this, write

N-1
t l + l) Y '
+1) Y (. ) (tl n) ))a.a. w(ti"' y(t;n '))
.
(Y'( (t"))(Y'(t
1=o
N-1

_ { (t + l t )f (s) + g.(s) (B,j" , B 11 0}


,

i0

X l(t( +l t1n)), (s) + g; (s) (B 1 1,. Btlnl)},r(p(ti' Y(t'"1)). (T.3.6)

By (3.1), (3.15) and (3.16), (T.3.6) converges a.s. (for a suitable sequence of partitions)
to

N -1 k k
um 9(s)(B^f;, B.)}{ g (s)(B) B ) a.: ^P(t} 1
" , Y(ti''))
j =1i=1
Jim
= 9:1(s)9r1(s) (Bri'. B^i^ ) Z a.a; W (ti " 1 Y(ti" 1 ))
j=1 1=0

k N-1
= L^ 9r11(S)gr^)(s) llm Z nl
(ti +l ti )a,a,. (ti ") , Y(ti "1 ))
j=1 i=0

= J12'= Y f,' g,'j) (U)gr(j '(U)O,ar'(P(U,Y(u))du.


(T.3.7)
j =1

The proof is completed by adding J0 , J 11 , and J 12 . n

It may be noted that the proof goes through for f g, e [0, T], I -< r -< k, cp(t, y)
twice continuously differentiable in y and once in t.
2. (Explosion) Assume that ( ), 6() are locally Lipschitzian, that is, (2.42) holds for
^xl < n, jyj < n, with M = Mn , where Mn may go to infinity as n co. One may still
construct a diffusion on R k satisfying (2.43) up to an explosion time C. For this, let
{X': t >- 0} be the solution of (2.43) with globally Lipschitzian coefficients i,,(), c()
satisfying

n(Y) = (Y) and a(y) = 6 (Y) for jyj -< n. (T.3.8)

We may, for example, let "(y) = p(ny/Iyl) and v"(y) = a(ny/Iyj) for lyl > n, so that
(2.42) holds for all x, y with M = M. Let us show that, if (xl -< n,

X' m (w) = X;. ,,(aw) for 0 -< t < C n (co) (m >- n) (T.3.9)

THEORETICAL COMPLEMENTS 613

outside a set of zero probability, where

inf{t > 0: IX, I = n } . (T.3.10)

To prove (T.3.9), first note that if m >- n then (y) = (y) and a,,,(y) = a(y) for
II -< n, so that

m (X,,,,) = (X,.,,) and (X;,,,) = a(Xf ) for 0 -< t < S(w). (T.3.1 I

It follows from (T.3.1 1) (see the argument in theoretical complement I.2) that

f p.(X,.,,) ds +
Jo Jo
a,X) dB =
Jo
(X) ds +
JO
(X;. ,,) dB for 0 -< t < S

(T.3.12)
almost surely. Therefore, a.s.,

Xi m X^ n = f' o [Im(X.;m) P^,(Xs,)] ds +


Jo
[ 6 m(X.;.m) am(X,.^)] dBu

for 0 -< t < S. (T.3.13)

This implies

cP(t):= E(IXi XiJ 2 I > ,) 2tMm J (s) ds + 2M' w(s) ds


0

= 2Mm(t + 1) q(s) ds. (T.3.14)


Jo
By iteration (see (2.21) and (2.22)), q(t) = 0. In other words, on {^ > t}, X; , =
a.s., for each t > 0. Since t --, X; ,,,, X,.,, are continuous, it follows that, outside a set
of probability 0, X,., = Xs,, for 0 -< s < S, m -> n. In particular, C. j a.s. to (the
explosion time) 1;, say, as n j co, and

X; (w) = lim X (w), t < (w), (T.13.15)


nom

exists a.s., and has a.s. continuous sample paths on [0, S(cw)). If P(C < co) > 0, then
we say that explosion occurs. In this case one may continue {X'} fort -> S(w) by setting

X; (co) _ "co, t (co), (T.13.16)

where "co" is a new state, usually taken to be the point at infinity in the one-point
compactification of 11" (see H. L. Royden (1963), Real Analysis, 2nd ed., Macmillan,
New York, p. 168). Using the Markov property of {X} for each n, it is not difficult
to show that {X': t >- 0} defined by (T.13.15) and (T.13.16) is a Markov process on
08'` u {"cc"}, called the minimal dillusion generated by

0 2 r; u^
A = 1
2 d ,( )
x 3x^^i fix' + (X) (izri'
614 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

If k = 1 one often takes, instead of (T.13.16),

00 for t >-C(w)on(lim X, =oo^,


Xt _ \ (T.3.17)
+ oo for t >- ^(w) on (um X; = + oo f .
T -' J

Write C = r_. on {lim,^. X, = co}, C = r + on {lim,^. X, = +oo}. Write


T,:= inf{t >, 0: X; = z }.

Theorem T.3.1. (Feller's Test for Explosion). Let k = 1 and I(z) := f o (2(z')/a 2 (z')) dz'.
Then

(a) P(r + ^ < oo) = 0 if J 0


exp{- 1(y) } ^ y (exp {1(z) }/v 2 (z)) dz oo,
L 0
dy =
I
(b) P(r-. < oo) = 0i
f J o exp' 1(y)}
I fy
o (exp{I(z)}/a 2 (z)) dz] dy = oo.

Proof. (a) Assuming that the integral on the right in (a) diverges we will construct
a nonnegative, increasing and twice continuously differentiable function (p on [0, cc)
such that (i) Agq(y) < p(y) for all y and (ii) 4p(y) -+ oo as y -^ oo. Granting the
existence of such a function cp for the moment (and extending it to a twice continuously
differentiable function on (-0o,0]), apply It's Lemma to the function
cp(t, y) := exp{ t }(p(y) to get, for all x > 0,
n TO A I
Eq (t A T o n t, X '^ Tp ,,,) q (x) = E (exp{ s })[cp(Xs) AQp(Xs )] ds
0

0.

This means q(x) >, Ecp(;r n A T o A t, X A To j, so that

Q(x) >, e `^p(n)P(r < r o A t).


-

Letting n -+ o0 one has, for all t >, 0,

P(r + , < r o A t) < e`cp(x) )im cp - '(n) = 0. (T.3.18)


n-m

Thus, P(T + < t, T o = oo) = 0, for all t >- 0, so that P(t + ^ < oo, -r o = oo) = 0. But on
the set {tt o < co}, r + > i o by definition of r +m . Therefore, P(r +W t, r o < oo) = 0.
This implies P(r + . < oo) = 0. It remains to construct cp. Let p 0 (z) = I (z ? 0) and
define, recursively,

exp{ 1 (z')}con-i(z ) dz,ldy


q(z) = 2 ^exp{- 1(y) (z ^ 0; n >- 1).
f. }Ifoy 62 (z') J
(T.3.19)

THEORETICAL COMPLEMENTS 615

Then (p, >- 0, q(z) j as z j, and

cp(z)
( [ex
2 p{-!(Y)}
f ex6{((zl
z
)}

dz' d3 , (n -> 0). (T.3.20)


n' 0

To prove (T.3.20), note that it holds for n = 0 and that (z) -< cp_,(z)^p,(z) (n -> 1).
Now use induction on n. It follows from (T.3.20) that the series q (z):= = o y^(z)
converges uniformly on compacts. Also, it is simple to check that A(P(z) = cP_,(z)
for all n >_ 1, so that A may be applied term by term to yield AQQ(z) = (P(z) (z -> 0).
By assumption, p 1 (z) --' -) as z -+ rc,. Therefore, cp(z) -* x as z - o. Next assume
that the series on the right in (a) converges. Again, applying It's Lemma to
rp(t, y):=exp{-t}cp(y), we get

X
q(x) = EcP(r A T o A t , TnA ToA

cp(n)P(i < t o A t) + q (0)P(r o -< t A t) + e - `cp(n)P(t < z A x)


= cp(n){P(t -< T o A t) + e `P(i A T o > t)}.
- (T.3.21)

Letting t -* oc we get

9(x) -< p(n){P(r < To, T o < cc) + P(i < r x , T o = cc)} < w(n)P(i < cc),

so that

P(z, < cc) -> 9 (x)/q (oo) > 0.

The proof of (b) is essentially analogous. n

REMARK. It should be noted that P(T + , < oo) > 0 means +co is accessible. For
diffusions on S = (a, b) with a and/or b finite, the criterion of accessibility mentioned
in theoretical complement V.2.3 may be derived in exactly the same manner.
In multidimension (k > 1) the following criteria may be derived. We use the
notation (3.23) for the following statement.

Theorem T.3.2. (Has'tninskii's Test for Explosion).

f exp{/(u)}
(a) P(S < oo) = 0 if J x e i^^( du) dr = cc,
` J x(u)

(b) P(? < oo) > 0 if J e-t(r)(I


,
exp{I(u)}

a(u)
du dr < co. [I]

Proof. (a) Assume that the right side of (a) diverges. Fix r o e [0, xl). Define the
following functions on (r o , cc),

^Ao(v) = 1, ^P(v) = J e ^t.^ I I exp{ (u)}r0=i(u) du) dr.


x(u)
(T.3.22)

616 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Then the radial functions q(y)on (r0 _< IyI < oo) satisfy: A(y) _< ^p_ I(IYI) Now,
as in the proof of Theorem T.3.1, apply It's Lemma to the function

<P(t, y)'= exp{ - t }gPQYI), where cp(IYI) _


n=0

extended to all of R'` as a twice continuously differentiable function. Then, writing


t R := inf {t >_ 0: IXXI = R },

E(p(t, o /\ tR A t, X,' TRn,) <- 9(Ixl),

or

w(IXI)
e-'w(R)P(tR ^ t ro A t) (IX), P(ZR t, a A t) e'
cp(R)

Letting R - co, we get P(^ _< t, o A t) = 0 for all t _> 0. It follows that P(S < co) = 0.
(b) This follows by replacing upper bars by lower bars in the definition (T.3.22)
and noting that for the resulting functions, Acp(IYI) % (V - i(IYI) The rest of the proof
follows the proof of the second half of part (a) of Theorem T.3. 1. n

Theorem T.3.1 is due to W. Feller (1952), "The Parabolic Differential Equations


and the Associated Semigroups of Transformations, Ann. of Math., 55, pp. 468-519.
Theorem T.3.2 is due to R. Z. Has'minskii (1960), "Ergodic Properties of Recurrent
Diffusion Processes and Stabilization of the Solution of the Cauchy Problem for
Parabolic Equations, Theor. Probability Appl., 5, pp. 196-214.
3. (Cameron-Martin-Girsanov Theorem) Let ), a(.) be Lipschitzian, a() nonsingular
and a - ' () Lipschitzian. Let {X'} be the solution of

X; = x + j a(X)dB (t -> 0). (T.3.23)


Jo

Define

3 exp{ J
`
6-
1 `
'(X II)R(XS )dB 3 . - 2 la -t (Xs:)P(X -, )I Z ds'
J5
(0`s<t<oo). (T.3.24)

By It's Lemma,

dZ^.: = Z,..6 - '(X')p(X^ )dB

or,

Z"= I + J Z0 a '(X S)g(Xs)dB s .


(T.3.25)
0

THEORETICAL COMPLEMENTS 617

One may show, using Exercises 3.8, 3.10, that the integrand in (T.3.25) is in 1I[0, ox).
Hence {Z'} is a nonnegative {3}-martingale and EZ' = 1. Define the set function
Q on U,,o ^F, by

Q(A)= E(1 A Z") =


f
A
Z" dP, A e .f,. (T.3.26)

Note that if A e ,y, then A e .,, for all t' > t, and

E(I A Z) = E [ 1AE(Z o.x I


^)] = E(l A Z.x)

Hence Q is a well-defined nonnegative and countably additive set function on the


field U,, o F,. It therefore has a unique extension to the smallest sigmafield 9
containing U,, o .y,. Also, Q(S1) = E(Z') = 1. Thus, Q is a probability measure on
(f2, F). On (Q .y,), Q is absolutely continuous with respect to P. with a
RadonNikodym derivative Zr'.
We will show that, under Q, {X} is a diffusion with drift () and diffusion matrix
a( )'( ), starting at x. Denote by E Q and E, expectations under Q and P, respectively.
Let g be an arbitrary real-valued bounded Borel measurable function on Il k . Then

o.x
E(9(X')Z+,1 gis) = Z '(t, X., ), (T.3.27)

where
o .r).
i(t, Y):= E(g(XY )Z (T.3.28)

Therefore, for every real bounded F,-measurable h,

.x
EQ(hg(Xs+^)) = E(hZ i(t, XS)) = EQ(h+i(t, Xs)), (T.3.29)

which proves the desired Markov property. Finally, for all twice continuously
differentiable g with compart support, It6's Lemma yields, with f = a`tt,

o
d(g(X, )Zo,.) = (Aog)(X' )Z dt + Z.' (grad 9)(X: ). (X; ) dBt
+ g(X,)Z,o '"f(X, )dB, + Z"(grad g)(X(X, ) dt
= (Ag)(X, )Z" dt + Z"(grad g)(X,) + g(X, )f (X, )) dB (T.3.30)

where, writing d(y)t= (a(y)a'(y)),

1 82g( Y) ^;^ ag
Aog(Y) = 2 1] oy^^i ayu^' Ag(Y)'= Aog(Y) + (Y) _ .) . (T.3.31)

Taking expectations in (T.3.30) we get

E Q g(X;) g(x) = f E Q Ag(Xs) ds. (T.3.32)


Jo

Dividing by t and letting t j 0, it follows that the infinitesimal generator of the


Q-distribution of {X;) is A. We have thus proved the following result.
Gib AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Theorem T.3.3. (Cameron-Martin-Girsanov Theorem). Suppose i(), a(.), a ' ( )


are Lipschitzian, {X'} the nonanticipative solution of (T.3.23).
(a) Then for every t ? 0, the probability measure Q defined by (T.3.26) is absolutely
continuous with respect to P on (fl, . ), with the Radon-Nikodym derivative Z o "
(b) Under Q, {X} is a diffusion with drift it() and diffusion matrix a(.)'().

An immediate corollary is the following result. Let .4, denote the sigmafield on
S2':= C([0, oo): P) generated by the coordinate projections co' - a, 0 <_ s _< t. Let
9 = a(U,, o ,4,). Denote by P" the distribution (on (0', ga')) of a diffusion with drift
)

;(.) and diffusion matrix ((d; j ( ))) = a()a( )', starting at x (i = 1, 2).

Corollary T.3.4. Suppose ;() (i = 1, 2), a(. ), a ' () are Lipschitzian. Then, for
every t > 0, P;;2 ^ is absolutely continuous with respect to P on (Q', V,).

The preceding results extend easily to nonhomogeneous diffusions on 1. Recall


the notation of Theorem T.2.1, and define

'(s, Xo. =)(s, Xo. :),d% - 1 r^ ^a-'(s Xo.X)(s, Xo,x)^2 ds}.


exp{Jt0 a 2 o J jjj
(T.3.33)
Theorem T.3.5
(a) Suppose (t, y) -. (t, y), a(t, y), a - '(t, y) are Lipschitzian. Let {X'} denote the
nonanticipative solution of

X o " = x + a(s, X o ') dB s ,


fo, (T.3.34)

and Q a probability measure on (S2, a(U,, o S)) defined by

Q(A) = E(1 g Z"), A e.. (T.3.35)

Then Q is absolutely continuous with respect to P on (S2, y,), with a


Radon-Nikodym derivative Zo"
(b) Suppose (t, (y)) -+ (t, y) (i = 1, 2), a(t, y), a" 1 (t, y) are Lipschitzian. Let P! x
;

(i = 1, 2) denote the distribution (on (C([0, oo): R'), .4)) of a diffusion with drift
;() and diffusion matrix a( . )a'( ). Then P ; is absolutely continuous with
respect to Po'; on .4.

The assumption that , a, a ' be Lipschitzian may be relaxed to the


-

assumption that these may be locally Lipschitzian and that the diffusions, with
and without the drift p, be nonexplosive. This may be established by
approximating , a by , a as in (T.3.8), and then letting n -. oo.
R. H. Cameron and W. T. Martin (1944), "Transformation of Wiener Integrals
Under Translations, Ann. of Math., 45, pp. 386-396, proved the above results
for the case a() = I. These were generalized by I. V. Girsanov (1960), "On
Transforming a Class of Stochastic Processes by Absolutely Continuous
Substitution of Measures," Theor. Probability Appl., 5, pp. 314-330.


THEORETICAL COMPLEMENTS 619

4. (The Support Theorem and the Maximum Principle) We want to establish the
following result. The notation is the same as in the statement of Corollary T.3.4.

Theorem T.3.6. (The Support Theorem). Let p.), a( ), a 1 ( ) be Lipschitzian. Then


the support of the distribution P. (of a diffusion with drift It(.) and diffusion matrix
( )a'( )) is C {w' e C([0, cc): 01"): w'(0) = x}.

Proof. It is enough to show that all sufficiently smooth functions in C. are in the
support of P. Let t * u(t) = (u,(t), ... , u k (t)) be a twice continuously differentiable
function in C. Let > 0, T> 0 be given. Define a continuously differentiable (in t)
function c(t) _ (c 1 (t), ... , c k (t)) that vanishes for t > 2T and such that c.(t) = (d/dt)u ; (t),
1 _< i <_ k, t T. Consider the diffusion {X'} that is the solution of

Xx = x + J
0
c(s) ds + J a(X
0
') dB s (t -> 0). (T.3.36)

In view of Theorem T.3.5, it is enough to show

P^IX x Jc(s) ds < e for all t e [0, T]) > 0. (T.3.37)

In turn, (T.3.37) follows if we can find a Lipschitzian vector field b() such that the
solution {Y,} of
r r
Y, = x + c(s) ds +
f.'
b(Y,) ds +
0
6(Y s ) dB, (t -> 0) (T.3.38)

satisfies

P(IYr x J c(s) ds < e for all


0
t E [0, T]) > 0. (T.3.39)

But (T.3.39) will follow if one may find b() such that

E(r OB(x: ,) ) > T, where T B( , ) := inf { t > 0: I Y, x f ' c(s) dsI-> e } .


. )

The following lemma shows that such a b() exists. n

Lemma. Let a() be nonsingular and Lipschitzian. Then, for every e > 0,

sup Er fl( , E) = co, (T.3.40)

where the supremum is over the class of all Lipschitzian vector fields b(.).

Proof. Without loss of generality, take x = 0. (This amounts to a translation of the


coefficients by x.) Write Z, = Y, f c(s) ds. For any given pair t() - b(. ), y(.)

620 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

define
Za)ZW


d(t, z):= d ;i (z + c(s) ds) Z
IZI . f ,
B(t, z) _ Y d ii (z +

t. t
('

J c(s) ds)
o

C(t, z) = 2 + c(s) ds
o

B(t, z) + C(t, z)
3=' sup 1, a(r):= sup d(t, z).
f30.Izl =r d(t, z) t >-o.Izl =r

Similarly (r), a(r) are defined by replacing "sup" by "inf" in the last two expressions.
Finally, let
I(r):= f r(u) du !(r):_ I r (u) du
o U J a U

where a is an arbitrary positive constant. Note that {Z,} is a (nonhomogeneous)


diffusion on 11" whose drift coefficient is b (t , z) = b(z + ,(o c(s) ds), and whose diffusion
matrix is D(t, z) := a(t, z)'(t, z) where a(t, z) = a(z + f o c(s) ds). The infinitesimal
generator of this diffusion is denoted A.
Now check that
lim(r)>k -1. (T.3.41)
10

Define

F(r) =
f, f, 8( "a 1v )
(
exp{
J u(,
v
dll } dv) du.
))) v
(T.3.42)

Using (T.3.41) it is simple to check that F(r) is finite for 0 < r s. Also, for r > 0,

J (U)
dv' } dv < 0,
Er 1 exp^
r
F'(r) =

a(v) ^ v' J)
(1.3.43 )
F"(r) _ -r) Y
r)

Define q(z) = F(IzI). By (3.24) and (T.3.43) it follows that

2A t rp(z) >- 1 for 0< IzI < s.

Therefore, by Itb's Lemma applied to cp(Z,) (and using the fact that cp = 0 on B(O:e)),

0 4?(0) 3 ZEiaBto:E),

so that
ETaB(o:,) -> 2q(0) = 2F(0), 0 -< IzI -< a. (T.3.44)

Choose b(z) _ Mz (M > 0). Then it is simple to check that as M r oo, F(0) oo.
n

THEORETICAL COMPLEMENTS 621

Corollary T.3.7. (Maximum Principle for Elliptic Equations). Suppose It(.) and a( )
are locally Lipschitzian and r(.) nonsingular. Let A be as in (T.3.31). Let G be a
bounded, connected, open set, and u() a continuous function on G, twice continuously
differentiable in G, such that Au(x) = 0 for all x e G. Then u(.) cannot attain its
maximum or minimum in G, unless u is constant on G. 0

Proof. If possible, let x o e G be such that u(x) _< u(x o ) for all x e G. Let 6 > 0 be
such that B(x 0 :2 5) c G. Let cp be a bounded, twice continuously differentiable function
on H" with bounded first and second derivatives, satisfying q0(x) = u(x) on B(x o :b).
Let {X,} be a diffusion with coefficients ( ), a( ), starting at x 0 . Apply It6's Lemma
to get

Eg7(X8,0 ai) =

or, denoting by it the distribution of X,, b .,

u(y)rt(dy)
f =
eB(x o :j)
u(x o ). (T.3.45)

But u(y) _< u(x o ) for all ye aB(x 0 :6). If u(y o ) < u(x o ) for some y o E 3B(x 0 :S) then, by
the continuity of u, there is a neighborhood of y o on 3B(x 0 :S) in which u(y) < u(x o ).
But Theorem T.3.6 implies that iv assigns positive probability to this neighborhood,
which would imply that the left side of (T.3.45) is smaller than its right side. Thus
u(y) = u(x o ) on 3B(x 0 :8). By letting 6 vary, it follows that u is constant on every
open ball B contained in G centered at x o . Let D = {y e G: u(y) = u(x o )}. For each
y e D there exists, by the same argument as above, an open ball centered at y on
which u equals u(x o ). Thus D is open. But D is also closed (in G), since u is continuous.
By connectedness of G, D = G. n

D. W. Stroock and S. R. S. Varadhan (1970), "On the Support of Diffusion


Processes with Applications to the Strong Maximum Principle," Proc. Sixth Berkeley
Symp. on Math. Statist. and Probability, Vol. III, pp. 333-360, contains much more
than the results described above.

5. (Positive Recurrence and the Existence of a Unique Invariant Probability) Consider


a positive recurrent diffusion {X} on Rk . Fix two positive numbers r, < r z . Define

ry e :=inf{t 0: X') = r 1 }, riz;:=inf{t z; _ ' : IX,) = r z },


(T.3.46)
x
7zr +i =inf{ t->ri zi : ^X,^ =r 1 } (i 1).

The random variables q ; are a.s. finite. By the strong Markov property {Xn z ,: i > 1}
is a Markov chain on the state space 0B(O:r 1 ) = {)yl = r l }, having a (one-step)
transition probability

ni(y, B):= P(X"7:,., E B H,(z, B)H2(y, dz),


{iii =.zi
nz:-. )xt=r =
J (T.3.47)

()yj = r 1 , B E .1(B(O: r, )),


622 AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS

where .yn , is the pre-q sigmafield, and H (y, dz) is the distribution of X (0 ., t the
;

random point where {X;} hits OB(0:r; ) at the time of its first passage to aB(O:r; ).
Now for all z,, z 2 e OB(O:r, ), H2 (z,, dy), H2 (z 2 ,dy) are absolutely continuous with
respect to each other and there exists a positive constant c (independent of z,, z 2 )
such that

dH2(z1, dy) >_


c (jz; = r l for i = 1, 2). (T.3.48)
dH2(z2, dy)
This is essentially Harnack's inequality (see D. Gilbarg and N. S. Trudinger (1977),
Elliptic Partial Differential Equations of Second Order, Springer- Verlag, New York,
p. 189). It follows from (T.3.47) and (T.3.48) that

dn1(Y, dz)
(Y, Yi e B(O:r l )). (T.3.49)
dn1(Y1, dz) >c
This implies (see theoretical complements II.6) that there exists a unique invariant
probability l for the Markov chain {X11 2 , : i > l} and the n-step transition
probability n(y, B) converges exponentially fast to 1 (B) as n -^ oo, uniformly for
all y e 3B(O: r,) and all BE .(0B(0: r, )).
Now let {X 1 } be the above diffusion with initial distribution ,, so that
{X n2 : i > 1} is a stationary process, as is the process

nz
Z1 (f)= fJ(XS)ds (i 1),
n2i- 1

where f is a real-valued bounded Borel-measurable function on 118'`.

Lemma. The tail sigmafield of {Z ; (f ): i > 1} is trivial.

Proof. Let A e := n,,,, o {Z; (f ): i >, n} the tail sigmafield, and Bev{Z i (f ):
I _< i _< n} c ,Fn2 , (the pre-n Z + , sigmafield of {X, }). For every positive integer m,
1

A belongs to the after- ,j 2( ,,, )+i process X,1z and, therefore, may be expressed
m

as {Xnzi m e A n +m a.s., for some Borel set A +m of C([0, oo): 11"). By the strong
}

Markov property,
+
P(A j na^.m^. ) = E(In,m(X .I) I ^ ?2I.'+m,.,)
_ (Px (A n+m )) x = xa21,,.^1.I ^n+m( X nzi^)s

say. Hence,

P(A I . n2^* I ) = E( P+m (Xmt


( .m1 +^ ) ^ ^nTh,,) = (
J +zmx,
dz)/
x = X,2,, i

But n,mt(x, B) -+ 1 (B) as m -+ oo, uniformly for all x and B. Therefore

P(A I m* ,) - cp +m (z),(dz) --. 0 a.s. as m - cc. (T.3.50)


J
THEORETICAL COMPLEMENTS 623

But

^^n+mlZ)t^l(dZ) = E(n+m(X,1,_ = P(A).

Therefore, (T.3.50) implies, for every Be

P(B nA)P(B)P(A)^0 asm --oo. (T.3.51)

But the left side of (T.3.51) does not depend on m. Therefore, P(B n A) = P(B)P(A).
I-fence a{Z (f ): I
;< i _< n} and the tail sigmafield % are independent for every n _> 1.
This implies that .i is independent of a{Z (f ): _>
i ;1 }. In particular, is independent
of itself, so that P(A) = P(A n A) = P(A)P(A). Thus, P(A) = 0 or 1. n

Using the above lemma and the Ergodic Theorem (theoretical complements II.9),
instead of the classical strong law of large numbers, we may show, as in Section 9
of Chapter 1I, or Section 12 of Chapter V, that

1(
J as.
. 1
lim -- .i(X ) ds --' --- ---- E f(X s ) ds = .i(X)m(dX) > (T.3.52)
r -. >
S

t 0 E(3 tie) i,, 1

say, where in is the probability measure on (H', k ) with m(B) equal to the average
amount of time the process {X } spends in the set B during a cycle [72+ 92n+3].
S

It follows from (T.3.52) that in is the unique invariant probability for the diffusion.
The criterion (3.25) (Proposition 3.3) for positive recurrence is due to R. Z.
Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and
Stabilization of the Solution to the Cauchy Problem for Parabolic Equations," Theor.
Probability App!., 5, pp. 196-214. A proof and some extensions may be found in R.
N. Bhattacharya (1978), "Criteria for Recurrence and Existence of Invariant Measures
for Multidimensional Diffusions," Ann. Probab., 6, pp. 541-553; Correction Note
(1980), Ann. Probab., 8. The proof in the last article does not require Harnack's
inequality.

Theoretical Complements to Section VII.4


1. Theorem 4.1 and Proposition 4.2 are special cases of more general results contained
in G. Basak (1989), "Stability and Functional Central Limit Theorems for Degenerate
Diffusions,)) Ph.D. Dissertation, Indiana University.
Two comprehensive accounts of the theory of stochastic differential equations are
N. Ikeda and S. Watanabe (1981), Stochastic Differentia! Equations and Diffusion
Processes, North-Holland, Amsterdam, and I. Karatzas and S. E. Shreve (1988),
Brownian Motion and Stochastic Calculus, Springer- Verlag, New York.
CHAPTER 0

A Probability and Measure Theory


Overview

1 PROBABILITY SPACES

Underlying the mathematical description of random variables and events is the notion
of a probability space ((2, F, P). The sample space 1 is a nonempty set that represents
the collection of all possible outcomes of an experiment. The elements of 12 are called
sample points. The sigmafield F is a collection of subsets of Q that includes the empty
set 0 (the "impossible event') as well as the set n (the "sure event") and is closed under
the set operations of complements and finite or denumerable unions and intersections.
The elements of F are called measurable events, or simply events. The probability measure
P is an assignment of probabilities to events (sets) in F that is subject to the
conditions that (i) 0 < P(F) _< 1, for each Fe F, (ii) P(0) = 0, P(12) = 1, and
(iii) "(Ui F; ) = Z ; P(F; ) for any finite or denumerable sequence of mutually exclusive
(pairwise disjoint) events F; , i = 1, 2, ... , belonging to F. The closure properties of F
ensure that the usual applications of set operations in representing events do not lead
to nonmeasurable events for which no (consistent) assignment of probability is possible.
The required countable additivity property (iii) gives probabilities a sufficiently rich
structure for doing calculations and approximations involving limits. Two immediate
consequences of (iii) are the following so-called continuity properties: if A, c A 2 c
is a nondecreasing sequence of events in .y then, thinking of Ute , A n as the "limiting
event" for such sequences,

P^ U lim P(An).
An)=
n-I ^

To prove this, disjointify {A n } by B n = A. A_ 1 , n >_ 1, A 0 = 0, and apply (iii) to


U = , B. = U^ , A n . By considering complements, one gets for decreasing measurable

625
626 A PROBABILITY AND MEASURE THEORY OVERVIEW

events A, A 2 that

Pj (l A tim P(A,). (1.2)


\n, nt = n

While (1.1) holds for all countably additive set functions u (in place of P) on
F, finite or not, (1.2) does not in general hold if p(A n is not finite for at least some n

>_
)

(onwards).
If S2 is a finite or denumerable set, then probabilities are defined for all subsets F of
(I once they are specified for singletons, so F is the collection of all subsets of S2. Thus,
if f is a probability mass function (p.m.f.) for singletons, i.e., f (w) 0 for all con S2 and
y. f(co) = 1, then one may define P(F) = y.EF f(cw). The function P so defined on the
class of all subsets of Q is countably additive, i.e., P satisfies (iii). So (Q, F, P) is easily
seen to be a probability space. In this case the probability measure P is determined by
the probabilities of singletons {co}.
In the case f) is not finite or denumerable, e.g., when Q is the real line or the space

the probability P(I) = ff(x) dx, by a Riemann integral. This set function P may be
_<
of all infinite sequences of 0's and l's, then the above formulation is no longer possible
in general. Instead, for example in the case S2 = I8', one is often given a piecewise
continuous probability density function (p.d.f.) f, i.e., f is nonnegative, integrable, and
f f (x) dx = 1. For an interval I = (a, b) or (b, cc), cc < a <b oo, one then assigns
extended to the class' comprising all finite unions F = U I of pairwise disjoint intervals
; ;

I by setting P(F) = Z t P(I). The class' is afield, i.e., 0 and S2 belong to' and it is
;

closed under complements and finite intersection (and therefore finite unions). But, since
W is not a sigmafield, usual sequentially applied operations on events may lead to events
outside of for which probabilities have not been defined. But a theorem from measure
theory, the Caratheodory Extension Theorem, asserts that there is a unique countably
additive extension of P from a field to the smallest sigmafield that contains W. In the
case of f above, this sigmafield is called the Borel sigmafield .fi t on 11' and its sets are
called Borel sets of R'. In general, such an extension of P to the power set sigmafield,
that is the collection of all subsets of II', is not possible. The same considerations apply
to all measures (i.e., countably additive nonnegative set functions p defined on a sigmafield

>_
with p(Q) = 0), whether the measure of 0 is 1 or not. The measure p = m, which is
defined first for each interval I as the length of the interval, and then extended uniquely
to ,4', is called the Lebesgue measure on R. Similarly, one defines the Lebesgue measure
on R'k (k 2) whose Borel sigmafield . is the smallest sigmafield that contains all
k -dimensional rectangles I = I, x I Z x x 1,, with I a one-dimensional rectangle
;

(interval) of the previous type. The Lebesgue measure of a rectangle is the product of
the lengths of its sides, i.e., its volume. Lebesgue measure on 11" has the property that
the space can be decomposed into a countable union of measurable sets of finite Lebesgue
measure; such measures are said to be sigma finite. All measures referred to in this book
are sigma-finite.
The extension of a measure p from a field W, as provided by the Caratheodory
Extension Theorem stated above, is unique and may be expressed by the formula

p(F) = inf E p(C,) (F e F), (1.3)

where the summation is over a finite collection C,, C 2 , ... of sets in' whose union
contains F and the infimum is over all such collections.
RANDOM VARIABLES AND INTEGRATION 627

As suggested by the construction of measures on it outlined above, starting from


their specifications on a class of rectangles, if two measures , and U 2 on a sigmafield
agree on a subclass .9t c ,F closed under finite intersections and ) E .sd, then they
agree on the smallest sigmafield, denoted a(d), that contains .4. The sigmafield a(d)
is called the sigmafield generated by d. On a metric space S the sigmafield M = a(S)
generated by the class of all open sets is called the Borel sigmafield.
In case a probability measure P is given on M k , by specifying a p.d.f. f on I, for
example, one defines the (cumulative) distribution function (c.d.f.) F associated with P as

F(x)=P({(Y1,.. .,Yk)ER k :y;<xifor I <i <,k}), x=(x 1 ,...,xjEI8 k . (1.4)

It is simple to show, using the continuity (additivity) properties of P that F is


right-continuous and coordinatewise nondecreasing. By the usual inclusionexclusion
procedure one may compute probabilities of rectangles (a,, b,] x x (a h , bk ] in terms
of the distribution function.

2 RANDOM VARIABLES AND INTEGRATION

A real-valued function X defined on Q is a random variable provided that events of the


form {X E I} := {co E S: X(cu) E I} are in F for all intervals l; i.e., X is a measurable
function on S with respect to .f . This definition makes it possible to assign probabilities
to events {X E I}. A similar definition can be made for k-dimensional vector-valued
random variables by taking I to be an arbitrary open rectangle in ll. From this definition
it follows that if X is a random variable then the event {X E B} E.y for every Borel set
B. The distribution of X is the probability measure Q on (lt, -4 k ) given by

Q(B) = P(X E B), B E .A k . (2.1)

Often one writes Px for Q. We sometimes write Q(dx) = f(x) dx to signify that the
distribution of X has p.d.f. f with respect to Lebesgue measure.
Suppose that (S2, ,y , p) is an arbitrary measure space and g is a real-valued function
on S2 that is measurable with respect to .^. For the simplest case, suppose that g takes
on only finitely many distinct nonzero values y t , Y 2 ..... y k on the respective sets
A,, A 21 ... , A k ; such a function g is called a simple function. In the case is a probability
measure, simple functions are discrete random variables. If (A ; ) is finite for each i _> 1
then the Lebesgue integral of g is defined as the sum of the values of g weighted by the
p-measures of the sets on which it takes its values. That is,

g dp'= Z y, (Ar), (2.2)


Jn

where A ; = {w E Q: g(w) = y;}, i = 1, 2, ... , k. If g is a bounded nonnegative measurable


function on S2, then it is possible to approximate g by a sequence of simple functions
n = 1, 2, ... , where g" has values i/2" on A(n) = {w: i /2" _< g(w) < (i + 1)/2"),
respectively, for i = 1, 2, ... , k" = N 2" with N so large that g(w) < N for all w. If each
A 1 (n) has finite measure, then fn g" dp is a nondecreasing sequence of numbers whose
limit therefore exists but may be infinite. In the case that the limit exists and is finite,
g is said to be integrable and its Lebesgue integral is the limiting value. In the case that
628 A PROBABILITY AND MEASURE THEORY OVERVIEW

g is an unbounded nonnegative measurable function, first approximate g by


g(w) = min{g(w), N 1 }, N = 1, 2, .... Then each g N is a bounded nonnegative
measurable function. If each g r, is integrable, then the sequence f n g,, dp is a
nondecreasing sequence of numbers. If the limit of this sequence is finite, then g is
integrable and its integral is the limiting value. Finally, if g is an arbitrary measurable
function then write g = g g as a difference of nonnegative measurable functions,
where g + (w) = g(w) on A + _ {weft g(w) > O} and g + (w) = 0 on A - = {w e n: g(w) <O},
and g(w) = 0 on A + and g(w) = g(w) on A. If each of g + and g - is integrable,
then g is integrable and its integral is the difference of the two integrals. A basic simple
property of the integral is linearity. If f, g are integrable and c, d are reals, then cf + dg
is integrable and In (cf + dg) dp = c Jn f dp + d $n g d.
In the case y = P is a probability measure, the integral of g is called the expected
value of g. The notation for the expected value (integral) takes various forms, e.g.,
EX = f n X dP = J0 X(cw)P(dw). Sometimes the domain of integration is omitted, and
one simply writes f X dP for f n X dP. In addition, one may apply a change of variable
co + X(w) from S2 to 11' to obtain the integral with respect to the distribution Q of X
on R' given by

EX = J xQ(dx).
u^

More generally, if^p is a measurable function on R' and q (X) integrable, then

Erp(X) = J cp(X) dP = J rp(x)Q(dx). (2.3)


n ^a^

To verify the change-of-variable formula (2.3) one begins with simple functions (p and
proceeds by limits as described after the definition (2.2).
Estimates and bounds on expected values and other integrals are often obtained by
applications of the basic inequalities for integrals summarized below. Let (S2, ^ , p) be
an arbitrary measure space and let f and g be integrable functions defined on S2. A
property is said to hold almost everywhere (a.e.) or almost surely (a.s.) if the set of
points where it fails has p-measure zero. Then, it is simple to check from the definition
of the integral that we have the following.

Proposition 2.1. (Domination Inequality). If f _< g p-a.e. then

Jn
f d
< in g dl^ (2.4)

A real-valued differentiable function (p defined on an interval I whose graph is


supported below by its tangent lines is called convex function, e.g., qi(x) = ex, x e 18',
(p(x) = x, x > 0 (p > 1). In other words, a differentiable cp is convex on I if

4,(y) -> 4,(x) + m(x)(y x) (x, y e 1), (2.5)

where m(x) = 4,'(x). The function cy is strictly convex if the inequality in (2.5) is strict
except when x = y. For twice continuously differentiable functions this means a positive
second derivative. Observe that if rp is convex on I, and X and 4(X) have finite expec-
RANDOM VARIABLES AND INTEGRATION 629

tations, then letting y = X, x = EX in (2.5) we have q(X) ? m(EX)(X EX) + (p(EX).


Therefore, using the above domination inequality (2.6) and linearity of the integral, we
have
Ecp(X) >- m(EX)E(X EX) + Ecp(EX) = cp(EX ). (2.6)

This inequality is strict if (p is strictly convex and X is not degenerate. More generally,
convexity means that for any x 0 ye I, (p(tx + (1 t)y) _< up(x) + (1 t)q (y),
0 < t < 1; likewise strict convexity means strict inequality here. One may show that (2.5)
still holds if m(x) is taken to be the right-hand derivative of tp at x. The general inequality
(2.6) may then be stated as follows.

Proposition 2.2. (Jensen's Inequality). If cp is a convex function on the interval I and


if X is a random variable taking its values in 1, then

(p(EX) _< Etp(X) (2.7)

provided that the indicated expected values exist. Moreover, if q is strictly convex, then
equality holds if and only if the distribution of X is concentrated at a point (degenerate).

From Jensen's inequality we see that if X is a random variable with finite pth absolute
moment then for 0 < r < p, writing IXI = (IXI')'/", we get EIXI >_ (EIXI')l'. That is,
taking the pth root,
EIXI")' ' (0 < r < p). (2.8)
(EIXI')' 1 -<' (

In particular, all moments of lower order than p also exist. The inequality (2.8) is known
as Liapounov's Inequality.
The next inequality is the Hlder Inequality, which can also be viewed as a convexity
type inequality as follows. Let p > 1, q > I and let f and g be functions such that If I
and Igl are both p-integrable. If p and q are conjugate in the sense I/p + 1/q = 1, then
using convexity of the function ex we obtain

IfI - I9I = e nrvuoglllP+tt/vuogl919 < 1 e l-91f1" + I e l- 919IQ = I IfI' + 1 191 9 .


P q P q
Thus,
1 1
IfIIgI - If1 1 +- I9I 9 . ( 2.9)
P q
Now define the L -norm by

IlflI '=( Ifl d1) . ( 2.10)

Replacing f and g by f/IlfII,, and g /Ilgll q , respectively, we have upon integration that

f Ifgj dy
5
1
J Ifl d +--_+=
IJ1919du t 1
1. (2.11)
II.fIIpII9II q P (IifII P )' q (11911 9 )4 P q

Thus, we have proved the following.


630 A PROBABILITY AND MEASURE THEORY OVERVIEW

Proposition 2.3. (Hlder Inequality). If I f" and Ig1 4 are integrable, where 1 <p < oo,
1/p + 1/q = 1, then fg is integrable and

J fIgIq dyJ (2.12)


J
The case p = 2 (and therefore q = 2) is called the CauchySchwarz Inequality, or the
Schwarz Inequality. The first inequality in (2.12) is a consequence of the Domination
Inequality, since fg _< I fgl and fg <, IfgI.
The next convexity-type inequality furnishes a triangle inequality for the If -norm
(P31).

Proposition 2.4. (Minkowski Inequality). If I f Ip and Igle are -integrable where


1 < p < co, then

if If+gl p d}
l/p
<If IIPdul
.f
'ip
+{ IgI "d }
J
'ip
. ( 2.13)

_<
To prove (2.13) first notice that since,

If + gl p ( IJ I + IgI) p < (2 max(If I, Igl)) p _<


the integrability of If + glp follows from that of If Ip and Igle. Let q = p/(p 1) be the
2p (I f I p + IgI"), (2.14)

conjugate to p, i.e., 1/q + 1/p = 1, barring the trivial case p = 1. Then,

f If+glpdu f
_< (IfI +IgI)pdp. (2.15)

The trick is to write

J (I.f I + IgI) p d =f J
I.f I(If I + Igl) p- ' du + IgI(If I + II) p- ' du

Now we can apply Hlder's inequality to get, since qp q = p,

/q
Elle
(IfI + IgI) p d < IIflip [(IfI + II) p-I]9d)1 + II8II [(Ifl + II)p- I]4dp
J \J (J

_
_ (Ilf Ilp + II IIp) (III + IgI) p
n
Dividing by (f (If I + Igl)p )' Iq and again using conjugacy 1 1/q = 1/p gives the desired
inequality by taking (1 /p)th roots in (2.15), i.e., Ilf + gllp < IlfII + IlgII .
Finally, if X is a random variable on a probability space (f2, .., P) then one has
Chebyshev's Inequality

P(IXI 3 e) p EIXIP (s > 0, p> 0), (2.16)

>, >_
E

which follows from EIXI" E(lIIxI,E}IXI P ) e"P(IXI > e).


LIMITS AND INTEGRATION 631

3 LIMITS AND INTEGRATION

A sequence of measurable functions { f : n >- I} on a measure space (S2, Y, p) is said to


f

converge in measure (or in probability in case p is a probability) to f if, for every e > 0,

p({Ifn fi>E})-+0 asn -+oo. (3.1)

A sequence { fn } is said to converge almost everywhere (abbreviated a.e.) to f, if

u({fn i })=0. (3.2)

If g is a probability measure, and (3.2) holds, one says that (f} converges almost sure/i'
(a.s.) to f.
A sequence { f,} is said to be Cauchy in measure if, for every E > 0,

Alf.LI >E })-0 asn, in--^ co. (3.3)

Given such a sequence, one may find an increasing sequence of positive integers
{n k : k >- 1} such that

u({I/n k .in k . (> ('z) k }) < (z) k (k = 1, 2, ...). (3.4)

The sets B,:= U= {If fnk , l > 2 -k } form a decreasing sequence converging to
.

B:= {If fnk I > 2 for infinitely many integers k}. Now p(B,) -< 2"+'. Therefore,
-

u(B) = 0. Since B = {1 fn , f,,, s 2 for all sufficiently large k}, on B` the sequence
{ f,,,: k >- l} converges to a function f. Therefore, { fnk } converges to f a.e. Further, for
any r>0 and all m,

u({IfnfI>E})Vu({Ifnfnml>2})+11({Ifn .fl>2}). m
(3.5)

The first term on the right goes to zero as n and n m go to infinity. Also,

Bm if
1 1 Jn.^ J I > 21 C
2m+1 < 2

As p(B m ) -+ 0 as m -+ oc, so does the second term on the right in (3.5). Therefore,
{ fn : n -> l} converges to fin measure. Conversely, if {f,,} converges to f in measure then,
for every E > 0,

f > E}) < ({l f .fl > 2}) + p({IJJ f > 2}) --> 0

as n, m -- x. That is, {f,,} is Cauchy in measure. We have proved parts (a), (b) of the
following result.

Proposition 3.1. Let (1 , .y, p) be a measure space on which is defined a sequence { !}


of measurable functions.
632 A PROBABILITY AND MEASURE THEORY OVERVIEW

(a) If { f} converges in measure to f then If.) is Cauchy in measure and there exists
a subsequence { f,k : k >- l} which converges a.e. to f
(b) If {f} is Cauchy in measure then there exists f such that {f} converges to f in
measure.
(c) If is finite and { f,} converges to f a.e., then {f} converges to f in measure.

Part (c) follows from the relations

u( {IfmfI>E})<(U {ifmfI>E } -+ 0 asn -goo. (3.6)


m=n

Note that the sets A, say, within parentheses on the right side are decreasing to
A := {I f, f I > s for infinitely many m }. As remarked in Section 1, following (1.2), the
convergence of (A m ) to u(A) holds because (A m ) is finite for all m.
The first important result on interchanging the order of limit and integration is the
following.

Theorem 3.2. (The Monotone Convergence Theorem). If { f} is an increasing sequence


of nonnegative measurable functions on a measure space (S2, .y , p), then

I^m J f.du= ffdl^, (3.7)

where f = lim f,.

Proof. If {f) is a sequence of simple functions then (3.7) is simply the definition of
$ f dp. In the general case, for each n let { f k : k >- l} be an increasing sequence of non-
negative simple functions converging a.e. to f, (ask --* x). Then g k := max{ fn k : 1 -< n -< k}
(k >, 1) is an increasing sequence of simple functions and, for k >- n,

Jk (3.8)

as f, k -< f, < f, for I -< n < k. By the Domination Inequality

f f,, dp , J g k dp <- ffk d. (3.9)

By taking limits in (3.8) as k -. oo one gets

(3.10)

where g = lim k g k By taking limits in (3.10) as n - oo, one gets f < g , f, that is, g = f.
Now, by the definition of the integral, on taking limits in (3.9) as k -^ oo one gets

f fdu< J gd= J fdSlim J k


fkd. (3.11)

Taking limits, as n -. co, in (3.11) one gets (3.7). n


LIMITS AND INTEGRATION 633

A useful consequence of the Monotone Convergence Theorem is the following


theorem.

Theorem 3.3. (Fatou's Lemma). If { f,} is a sequence of nonnegative integrable functions,


then

J (liminf f,) dit -< liminf J f d. (3.12)

Proof. Write g:= inf{ fk : k -> n}, g:= liminf f,. Then 0 g 1g. Therefore, by the
Monotone Convergence Theorem, f g dp --+ ( g d. But g f so that $ g dp $ ff dp
for all n. Therefore,

J g d = lim J g d = liminf J g dp -< liminf J f dp. n

The final result on the interchange of limits and integration is as follows.

Theorem 3.4. (Lebesgue's Dominated Convergence Theorem). Let { be


be a sequence of
measurable functions on a measure space (S2, F, p) such that
(i) f -^ f a.e. or in measure, and
(ii) ILl -< g for all n, where $ g dp < oo. Then

f If. - f I dp ^ 0 asn--oc. (3.13)

In particular, $ f d - $ f dp.

Proof. Assume f, -+ f a.e. By Fatou's Lemma applied to the sequences {g + f,}, {g - f}


one gets $ f d < liminf $ f du and $ f d -> limsup $ f d. Therefore, lim $ f dp = $ f d.
To derive (3.13), apply the above result to the sequence {If. - f I}, noting that
If,,fI <2g.
Now assume { f,} converges to fin measure. By Proposition 3.1, for every subsequence
{ f,.} of {f,} there exists a further subsequence { f,,,: k >- 1) such that { f, k } converges
a.e. to f. By the above, $ f k dp - $ f dp as k -* oo. Since the limit is the same for all
subsequences, the proof is complete. n

We now turn to the L"-spaces. Consider the set of all measurable functions f on a
measure space (S2, .F, u) such that I I fl dp < co, where p is a given number in [1, oo).
For each such f, consider the class of all g such that f = g a.e. Since the relation "f = g
a.e." is an equivalence relation (i.e., it is reflexive, symmetric, and transitive), the set of
all such f splits into disjoint equivalence classes. The set of all these equivalence classes
is denoted L"(S), F, p) or L(S2, p) or simply L if the underlying measure space is obvious.
It is customary and less cumbersome to write f e L, rather than {equivalence class of
f} e U. For Je L, the L-norm II f lI is defined by (2.10). By Minkowskii's inequality
(2.13), If is a linear space. That is, f, g e L implies cf + dg e If for all reals c, d. Also,
under the L'-norm, If is a normed linear space. This means (i) III f ii,, 0 for all f e L ,
with equality if and only if f = 0 (a.e.); (ii) Ilcf II D = I cl II f II P for every real c, every f e L;
634 A PROBABILITY AND MEASURE THEORY OVERVIEW

and (iii) the triangle inequality II f + gll, < II f II P + g11, holds for all f, g e L (by (2.13)).
The space L" is also complete, i.e., if fn e L' (n > 1) is a Cauchy sequence (i.e.,
II.% .fmll P -* 0 as n, m -. oo), then there exists f in L such that II fn f II P - 0. For
this last important fact, note first that Ilfn fmllP - 0 easily implies that { fn } is Cauchy
in measure and therefore, by Proposition 3.1, converges to some f in measure. As a
consequence, {h f"} converges to I f I in measure. It then follows by Fatou's Lemma
applied to {I ff} that f I f I dp < oo, i.e., f e U. Applying Fatou's Lemma to the sequence
{If fml in >_
p: n} one gets

I.in f I dp <_ liminf


J m J I.fn .lml dp. (3.14)

But S Ifn fmi' d = (11L fmllp)". Therefore, the right side goes to zero as n -^ oo,
proving that II f f II,, - 0. A complete normed linear space is called a Banach space.
Thus, the LP-spaces are Banach spaces (1 < p < oo). If is a probability measure (more
generally, a finite measure), then p < p' implies L" c La', in view of Liapounov's inequality
(2.8). This is not true if (Sl) = cc.
When does a sequence {f} in L converge to some Je LP in L" -norm? Clearly, {f,,}
must converge to f in measure. A useful sufficient condition is provided by Lebesgue's
Dominated Convergence Theorem if one assumes that the dominating function g is in
R. For then IL f I -* 0 in measure, and IL f IP <, 2"(/c,"+ I f(" ) (see Eq. 2.14)
2 + 'Igl" = h, say, so that the conditions of the theorem apply to {If f I}.
If p is a probability measure, then one may obtain a necessary and sufficient condition
for L-convergence. For this purpose, define a sequence of random variables {X n } on a
probability space (S2, F, P) to be uniformly integrable if

n
J
sup IXnI dP -. 0 as A - co. (3.15)

One then has the following.


Theorem 3.5. (L'- Convergence Criterion). Let (S2, F , P) be a probability space, and
{Xn } a sequence of integrable random variables. Then X. converges in L' to some
random variable X if and only if (i) XX -, X in probability, and (ii) {Xn } is uniformly
integrable.
Proof. (Sufficiency). Under the hypotheses (i), (ii), given e > 0 one has for all n,

f IX.I dP <_ J IXnldP+A_<E +1(e) (3.16)

if A(e) is sufficiently large. In particular, { f IX,,I dP} is a bounded sequence. Then, by


Fatou's Lemma, f IXI dP < co. Now

f11X.-Xj>_.Z)
IXnXIdP<
f
flX.j_>Al2)
IXXXIdP+
fflX,j<Al2jX,-Xj>_A)
IXn XIdP

^ f I XnI dP +
jX^J>_Al21 f I
jX.J^_.1/2)
X I dP

fI
+ X,XIdP.
jjX.J ^Aj2.jX.-Xj1_A)
(3.17)

LIMITS AND INTEGRATION 635

Now, given e > 0, choose 2(E) > 0 such that the first and second terms of the last sum
are less than e for A = 2(e) (use hypothesis (ii)). With this value of A, the third term goes
to zero, as n -. eo, by Lebesgue's Dominated Convergence Theorem. Hence

limsup J JX X 1 dP -< 2e.


n- W (Ix-XPA( )1

But, again by the Dominated Convergence Theorem,

limsup
- 'm f IX.-XI<,( t )^
IX XI dP = 0.

(Necessity). If X. + X in L', clearly X. X in probability. Also,

J {Ix^l%zt
J ilx.,l x)
IXI dP - ix X^ dP + JXj dP

f IXX X^dP +
f fjXj>-k2)
JXidP + f
jjXj<ZJZ.IX,,-XI zz,
JX^dP.
(3.18)

The first term of the last sum goes to zero, as n - co, by hypothesis. For each A > 0
the third term goes to zero by the Dominated Convergence Theorem, as n + oo. The
same convergence theorem implies that the second term goes to zero as A . cc. Therefore,
given e > 0 there exists 2(E), n(e), such that

sup f JXj dP -< e if 1 2(a). (3.19)


"'>"(E)lxI%x1

Since a finite sequence {X: 1 < n < n(e)} of integrable random variables is always
uniformly integrable, it follows that {XX } is uniformly integrable. n

A corollary is the following version. As always, p >, 1.

Theorem 3.6. (L-Convergence Criterion). Let (S2,.y, P) be a probability space, and


XX e if (n 1). Then X. X in L if and only if (i) XX X in probability, and (ii)
{IXi} is uniformly integrable.

Proof. Apply the above result to the sequence {IXX XI}. For sufficiency, note, as in
(3.16), that (i), (ii) imply Xe L, and then argue as in (3.17) that the uniform integrability
of {jXI} implies that of {IX XI"}. The proof of necessity is analogous to (3.18) and
(3.19), using (2.14). n

It is simple to check by a Chebyshev-type inequality (see Eq. 2.16) that if {EIXV }


is a bounded sequence for some p' > p, then {IXI"} is uniformly integrable.
There is one important case where convergence a.e. implies C-convergence. This is
as follows.

Theorem 3.7. (Scheffe's Theorem). Let (S), .^ , p) be a measure space. Let f,(n >- 1),f
be p.d.f.'s with respect to p, i.e., f,,, f are nonnegative, and Sf dp = 1 for all n, If dp = 1.
If f converges a.e. to f, then f . fin L'.
636 A PROBABILITY AND MEASURE THEORY OVERVIEW

Proof: Recall that for every real-valued function g on SI, one has

g = g + g - , II = g + + g - , where g + = max {g, 0 }, g - = min {g, 0 }.

One has

f (f fn) dl^ = 0 = $(f fn) d f (f fn) du,


}

so that

J (f f) dP = J (f fn)+ d, J If f^l d = 2 f (.1 .%)+ d. (3.20)


_<
Now 0 _< (f f,) + f, and (f f,) + -+ 0 a.e. as n * oo. Therefore, by (3.20) and
Lebesgue's Dominated Convergence Theorem,

f If_f.1 d = 2 f (f .ff) + du 0 asn>oo. n

Among the L"- spaces the space Lz - Lz (O, .F , p) has a particularly rich structure. It
is a Hilbert space. That is, it is a Banach space with an inner product < , > defined by

<f,g>= f fgdu (f,geL ).


2 (3.21)

The inner product is bilinear, i.e., linear in each argument, and the L z -norm is given by

If II = <.f,f>' rz (3.22)
and the Schwarz Inequality may be expressed as

I< 1, 9 >I 11f 112119112. (3.23)

4 PRODUCT MEASURES AND INDEPENDENCE,


RADONNIKODYM THEOREM AND CONDITIONAL
PROBABILITY

If (S,, .,, , ), (S2,'2 , P2) are two measure spaces, then the product space (S, .9', p)
is a measure space where (i) S is the Cartesian product S 1 x S 2 ; (ii) .50 = .9' g .'z
is the smallest sigmafeld containing the class R of all measurable rectangles,
x B 2 : B, e .So,, B z e Soz ); and (iii) p is the product measure P, x z on .9'
determined by the requirement

p(B, x B.) = p1(B1)pz(B2) (B 1 E5",, B 2 e.9 ). (4.1)

As the intersection of two measurable rectangles is a measurable rectangle, 9 is closed


PRODUCT MEASURES AND INDEPENDENCE 637

under finite intersections. Then the class' of all finite disjoint unions of sets in .4 is a
field. By finite additivity and (4.1), p extends to' as a countably additive set function.
Finally, Caratheodory's Extension Theorem extends p uniquely to a measure on
9 = e('), the smallest sigmafield containing '.
For each Ba 5, every x-section B(x ,., := {y: (x, y) E B} is in $o,. This is clearly true
for measurable rectangles F = B, x B 2 . The class .01 of all sets in So for which the
assertion is true is a 2-class (or, a lambda class). That is, (i) S e 2, (ii) A, Be 2, and
A c B imply B\A E 2, and (iii) A(n l) e 2, A IA imply Ac Y. We state without
proof the following useful result from which the measurable-sections property asserted
above for all Be .9' follows.

Theorem 4.1. (Dynkin's Pi-Lambda Theorem).' Suppose a class .4 is closed under finite
intersections, a class d is a lambda class, and R c .4. Then a(s) c .W, where a(s) is
the smallest sigmafield containing -4.

In view of the measurable-section property, for each B e 9 one may define the
functions x " p 2 (B(X ,.,), y -a p,(B). These are measurable functions on (S,,.',,)
and (SZ, 52, P2), respectively, and one has

(B) = (B E 2). (4.2)


J A2(B(X .,)p, (dx) = f1 2
p,(Bc..vi)P2(dy)

This last assertion holds for B = B, x B 2 e , by (4.2) and the relations


2(B1,)) = P 2 (B 2 )l B ,(x), (B () ) = p,(B,)1 8 ,(y). The proof is now completed for all
B e .9 by the Pi-Lambda Theorem. It follows that if f(x, y) is a simple function on
(S, .9', p), then for each x e S, the function y . f(x, y) on (S 2 . Sot , 2 ) is measurable,
and for each y e S 2 the function x * f(x, y) on (S,,50,,,) is measurable. Further for
all such f

J /d
= J \Js=
.f(x,Y)P2(dY) IPi(dx) =
Js ( f, ^ f
'
(x,Y)p,(dx) F^2(dy). (4.3)
J
By the usual approximation of measurable functions by simple functions one arrives
at the following theorem.

Theorem 4.2. (Fubini's Theorem). (a) If f is integrable on the product space


(Si x S2, ` OO`2 9, x P2) = (S, 5, p), then
,

(i) x - fsz f(x, y)p 2 (dy) is measurable and integrable on (S,, So,, p,).
(ii) y - f s , f(x, y)p,(dx) is measurable and integrable on (S2, 9'Z, P2).
(iii) One has the equalities (4.3).
(b) If f is nonnegative measurable on (S, .9', p) then (4.3) holds, whether the integrals
are finite or not.

The concept of a product space extends to an arbitrary but finite number of


components (S ; ,., ;) (1 < i _< k). In this case S = S, x S 2 x .. x Sk , . = 9, Ox .9 OO
. (9 is the smallest sigmafield containing the class . of all measurable rectangles
' P. Billingsley (1986), Probability and Measure, 2nd ed., Wiley, New York. p. 36.
638 A PROBABILITY AND MEASURE THEORY OVERVIEW

B = B 1 x B 2 x x B k (B ; e 1 -< i -< k). The sigmafield .9' is called the product


sigmafield, while p = u, x .. x Pk is the product measure determined by

(B 1 x Bz x .. x Bk) = p1(B1)2(B2) ...


uk(Bk) (4.4)

for elements in A. Fubini's theorem extends in a straightforward manner, integrating


f first with respect to one coordinate keeping the k 1 others fixed, then integrating
the resulting function of the other variables with respect to a second coordinate keeping
the remaining k 2 fixed, and so on until the function of a single variable is integrated
with respect to the last and remaining coordinate. The order in which the variables are
integrated is immaterial, the final result equals J s f d.
Product probabilities arise as joint distributions of independent random variables.
Let (S2, F, P) be a probability space on which are defined measurable functions X;
(1 -< i < k), with X, taking values in S; , which is endowed with a sigmafield So (1 < i < k).
The measurable functions (or random variables) X,, X2 .... , X are said to be
independent if

P(X, EB 1i X2 EB 2 ,...,Xk EBk )= P(X1 eB1) ...


P(XkEBk)
for all B I eYl ,...,Bk e4k . (4.5)

In other words the (joint) distribution of (X 1 ,. . . , Xk ) is a product measure. If f is an


integrable function on (S; , .97, p i ) (1 < i -< k), then the function

f: (xt, x2.... , Xk) -- .r1(x 1)f2(x2) . ..


fk(xk)

is integrable on (S, .9', p), and one has

or
J fd (4.6)

E(fl f(X1)) = fl (Efi(X1)). (4.7)

A sequence {X} of random variables is said to be independent if every finite


subcollection is. Two sigmaftelds FI , .y2 are independent if P(F1 n F2 ) = P(F 1 )P(F2 ) for
all Fl E .fi , F2 e .F2 . Two families of random variables {X x : A E A 1 ) and { Yx : A E A 2 } are
independent of each other if o{X5 : A E A t ) and r{YY : A E A Z } are independent. Here
6 {Xd: A E Al is the smallest sigmafield with respect to which all the X x are measurable.

Events A 1 , A 2 , ... , A k are independent if the corresponding indicator functions


1 4 1 A 2 , ... ,1,,, are independent. This is equivalent to requiring P(B l n B 2 n n Bk ) _
P(B 1 )P(B 2 ) P(Bk ) for all choices B l , ... , B k with B ; = A ; or A.
Before turning to Kolmogorov's definition of conditional probabilities, it is necessary
to state an important result from measure theory. To motivate it, let (S2, F, p) be a
measure space and let f be an integrable function on it. Then the set function v defined by

v(F) := fF f d (F e .^), (4.8)


PRODUCT MEASURES AND INDEPENDENCE 639

is a countable additive set function, or a signed measure, on .F with the property

v(F) = 0 if (F) = 0. (4.9)

A signed measure v on F is said to be absolutely continuous with respect to , denoted


v ,u, if (4.9) holds. The theorem below says that, conversely, v p implies the existence
of an f such that (4.8) holds, if p, v are sigmajinite. A countably additive set function v
is sigmafinite if there exist B. (n > 1) such that (i) U,,, 1 B = S2, (ii) Jv(B)I < oo for all n.
All measures in this book are assumed to be sigmaflnite.

Theorem 4.3. (RadonNikodym Theorem). 2 Let (S2, F, p) be a measure space. If a


finite signed measure v on F is absolutely continuous with respect to p then there exists
an a.e. unique integrable function f, called the RadonNikodym derivative of v with
respect to , such that v has the representation (4.8).

Next, the conditional probability P(A B), of an event A given another event B, is
defined in classical probability as
n B)
P(A I B)._ P(A (4.10)
P(B)

provided P(B) > 0. To introduce Kolmogorov's extension of this classical notion, let
(S2, ,F, P) be a probability space and {B} a countable partition of S2 by sets B. in F.
Let -9 denote the sigmafield generated by this partition, Q = a{B}. That is, -9 comprises
all countable disjoint unions of sets in {B}. Given an event A e .F, one defines P(A
the conditional probability of A given Q, by the s-measurable random variable

P(A n B.)
P(A 19)(o)._ for w e B,,, (4.11)
P(B)

if P(B) > 0, and an arbitrary constant c,,, say, if P(B) = 0. Check that

P(A I-9) dP = P(A n D)


J D
for all Dc -9. (4.12)

If X is a random variable, E!XJ < oo, then one defines E(X (.9), the conditional
expectation of X given -9, as the 9i-measurable random variable

E(X I f)(w):= I X dP for we B,,, if P(B) > 0,


P(B,,) JB,
=c^ forcoeB,,,ifP(B)=0, (4.13)

where c are arbitrarily chosen constants (e.g., c = 0 for all n). From (4.13) one easily
verifies the equality

D
E(X 9) dP = J D
XdP for all D e ^. (4.14)

Note that P(A I -9) = E(1 A I D), so that (4.12) is a special case of (4.14).
2 P. Billingsley (1986), loc. cit., p. 443.
640 A PROBABILITY AND MEASURE THEORY OVERVIEW

One may express (4.14) by saying that E(X -9) is a 9- measurable random variable
whose integral over each D E -9 equals the integral of X over D. By taking D = B. in
(4.14), on the other hand, one derives (4.13). Thus, the italicized statement above may
be taken to be the definition of E(X -9). This is Kolmogorov's definition of E(X -9), I
which, however, holds for any subsigmafield -9 of F, whether generated by a countable
partition or not. To see that a 9-measurable function E(X -9) satisfying (4.14) exists
and is unique, no matter what sigmafield .9 c F is given, consider the set function v
defined on 9 by

v(D)_ X dP (D E 9). (4.15)


D

Then v(D) = 0 if P(D) = 0. Consider the restriction of P to .9. Then one has v P on
9. By the Radon-Nikodym Theorem, there exists a unique (up to a P-null set)
9-measurable function, denoted E(X 9), such that (4.14) holds.
1
The simple interpretation of E(X -9) in (4.13) as the average of X on each member
of the partition generating .9 is lost in this abstract definition for more general sigmafields
9. But the italicized definition of E(X -9) above still retains the intuitive idea that given
the information embodied in -9, the reassessed (or conditional) probability of A, or
expectation of X, must depend only on this information, i.e., must be 9-measurable,
and must give the correct probabilities, or expectations, when integrated over events in -9.
Here is a list of some of the commonly used properties of conditional expectations.

Theorem 4.4. (Basic Properties of Conditional Expectations). Let (S2, F, P) be a


probability space, -9 a subsigmafield of F.
(a) If X is 9-measurable and integrable then E(X -9) = X.
(b) (Linearity) If X, Y are integrable and c, d constants, then E(cX + dYI -9)
cE(X -9) + dE(YI -9).
(c) (Order) If X, Y are integrable and X <_ Y a.s., then E(X -9) _< E(YI 9) a.s.
(d) If Y and X Y are integrable, and X is 9-measurable then E(X Y -9) = XE(Y 9).
(e) (Successive Smoothing) If 9 is a subsigmafield ofF, 5 c -9, and X is integrable,
then E(X ) = E[E(X 1 -9) ] = E[E(X 19)1 -9].
(f) (Convergence) Let {X} be a sequence of random variables such that, for all n,
Z where Z is integrable. If X - X a.s., then E(X, I -9) - E(X19) a.s. and
X

in V.
All the properties (a) -(f) are fairly straightforward consequences of the definition of
conditional expectations, and are therefore not proved here. The interplay between
conditional expectations and independence is described by the following result.

Theorem 4.5. (Independence and Conditional Expectation). Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{D}$ a subsigmafield of $\mathcal{F}$. Then the following hold.
(a) If $X$ is integrable, and $\sigma\{X\}$ and $\mathcal{D}$ are independent, then $E(X \mid \mathcal{D}) = EX$.
(b) Suppose $X$ and $Y$ are measurable maps on $(\Omega, \mathcal{F})$ into $(S_1, \mathcal{S}_1)$ and $(S_2, \mathcal{S}_2)$, respectively. Let $\varphi$ be a real-valued measurable function on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$, with $\varphi(X, Y)$ integrable. Assume $X$ and $Y$ are independent, i.e., $\sigma\{X\}$ and $\sigma\{Y\}$ are independent. Then

$E(\varphi(X, Y) \mid \sigma\{Y\}) = [E\varphi(X, y)]_{y=Y}.$  (4.16)

(c) Let $X, Y, Z$ be measurable maps on $(\Omega, \mathcal{F}, P)$ into $(S_1, \mathcal{S}_1)$, $(S_2, \mathcal{S}_2)$, and $(S_3, \mathcal{S}_3)$, respectively. Let $\varphi$ be a real-valued measurable function on $(S_1, \mathcal{S}_1)$. Assume $\varphi(X)$ is integrable, and $\sigma\{X, Y\}$ independent of $\sigma\{Z\}$. Then, writing $\sigma\{Y, Z\}$ for the smallest sigmafield $\subset \mathcal{F}$ with respect to which both $Y$ and $Z$ are measurable, one has

$E(\varphi(X) \mid \sigma\{Y, Z\}) = E(\varphi(X) \mid \sigma\{Y\}).$  (4.17)

Proof. (a) $EX$ is a constant and, therefore, $\mathcal{D}$-measurable. Also, relation (4.14) holds with $EX$ in place of $E(X \mid \mathcal{D})$, by independence.
(b) Let $P_X$, $P_Y$ denote the distributions of $X$ and $Y$, respectively. Then the product probability $P_X \times P_Y$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$ is the (joint) distribution of $(X, Y)$. Let $D \in \sigma\{Y\}$. This means there exists $B \in \mathcal{S}_2$ such that $D = \{\omega \in \Omega: Y(\omega) \in B\}$. By the change-of-variables formula and Fubini's Theorem,

$\int_D \varphi(X, Y)\,dP = \int_\Omega \mathbf{1}_B(Y)\varphi(X, Y)\,dP = \int_{S_1\times S_2} \mathbf{1}_B(y)\varphi(x, y)\,P_X(dx)P_Y(dy)$
$= \int_{S_2}\mathbf{1}_B(y)\Big(\int_{S_1}\varphi(x, y)\,P_X(dx)\Big)P_Y(dy) = \int_{S_2}\mathbf{1}_B(y)(E\varphi(X, y))\,P_Y(dy)$
$= \int_\Omega \mathbf{1}_B(Y(\omega))(E\varphi(X, y))_{y=Y(\omega)}\,dP(\omega) = \int_\Omega \mathbf{1}_D(\omega)(E\varphi(X, y))_{y=Y(\omega)}\,dP(\omega)$
$= \int_D (E\varphi(X, y))_{y=Y}\,dP.$

Also, $[E\varphi(X, y)]_{y=Y}$ is $\sigma\{Y\}$-measurable.


(c) Let $D_1 \in \sigma\{Y\}$, $D_2 \in \sigma\{Z\}$. Then there exist $B_1 \in \mathcal{S}_2$, $B_2 \in \mathcal{S}_3$ such that $D_1 = Y^{-1}(B_1)$, $D_2 = Z^{-1}(B_2)$. With $D = D_1 \cap D_2$ one has

$\int_D \varphi(X)\,dP = \int_\Omega \mathbf{1}_{D_1}\mathbf{1}_{D_2}\varphi(X)\,dP = \int_\Omega \mathbf{1}_{B_1}(Y)\mathbf{1}_{B_2}(Z)\varphi(X)\,dP$
$= E(\mathbf{1}_{B_1}(Y)\varphi(X))\,E(\mathbf{1}_{B_2}(Z)) = E(\mathbf{1}_{B_1}(Y)\varphi(X))\,P(Z \in B_2)$
$= P(Z \in B_2)\,E[E(\mathbf{1}_{B_1}(Y)\varphi(X) \mid \sigma\{Y\})] = P(Z \in B_2)\,E[\mathbf{1}_{B_1}(Y)E(\varphi(X) \mid \sigma\{Y\})]$
$= E(\mathbf{1}_{B_2}(Z)\mathbf{1}_{B_1}(Y)E(\varphi(X) \mid \sigma\{Y\})) = E(\mathbf{1}_{D_2}\mathbf{1}_{D_1}E(\varphi(X) \mid \sigma\{Y\}))$
$= \int_{D_1\cap D_2} E(\varphi(X) \mid \sigma\{Y\})\,dP.$

Thus, the desired relation (of the type (4.14)) holds for sets $D = D_1 \cap D_2 \in \sigma\{Y, Z\}$. The class $\mathcal{C}$ of all such sets is closed under finite intersection. Also, the class $\mathcal{A}$ of all sets $D \in \sigma\{Y, Z\}$ for which $\int_D \varphi(X)\,dP = \int_D E(\varphi(X) \mid \sigma\{Y\})\,dP$ holds is a lambda class. Therefore, by Dynkin's Pi–Lambda Theorem, $\sigma(\mathcal{C}) \subset \mathcal{A}$. But $\sigma(\mathcal{C}) = \sigma\{Y, Z\}$. ∎

There is an extension of Jensen's inequality (2.7) for conditional expectations that is useful.

Proposition 4.6. (Conditional Jensen's Inequality). Let $\varphi$ be a convex function on an interval $J$. If $X$ is an integrable random variable such that $P(X \in J) = 1$, and $\varphi(X)$ is integrable, then

$E[\varphi(X) \mid \mathcal{D}] \ge \varphi(E(X \mid \mathcal{D})).$  (4.18)

Proof. In (2.5) take $y = X$, $x = E(X \mid \mathcal{D})$, to get

$\varphi(X) \ge \varphi(E(X \mid \mathcal{D})) + m(E(X \mid \mathcal{D}))(X - E(X \mid \mathcal{D})).$

Now take conditional expectations, given $\mathcal{D}$, on both sides. ∎

As an immediate corollary to (4.18) it follows that the operation of taking conditional expectation is a contraction on $L^p(\Omega, \mathcal{F}, P)$. That is, if $X \in L^p$ for some $p \ge 1$, then

$\|E(X \mid \mathcal{D})\|_p \le \|X\|_p.$  (4.19)

If $X \in L^2$, then in fact $E(X \mid \mathcal{D})$ is the orthogonal projection of $X$ onto $L^2(\Omega, \mathcal{D}, P) \subset L^2(\Omega, \mathcal{F}, P)$. For if $Y$ is an arbitrary element of $L^2(\Omega, \mathcal{D}, P)$, then

$E(X - Y)^2 = E(X - E(X \mid \mathcal{D}) + E(X \mid \mathcal{D}) - Y)^2 = E(X - E(X \mid \mathcal{D}))^2 + E(E(X \mid \mathcal{D}) - Y)^2 + 2E[(E(X \mid \mathcal{D}) - Y)(X - E(X \mid \mathcal{D}))].$

The last term on the right side vanishes, as one sees on first taking the conditional expectation given $\mathcal{D}$ (see Basic Property (e)). Hence, for all $Y \in L^2(\Omega, \mathcal{D}, P)$,

$E(X - Y)^2 = E(X - E(X \mid \mathcal{D}))^2 + E(E(X \mid \mathcal{D}) - Y)^2 \ge E(X - E(X \mid \mathcal{D}))^2.$  (4.20)

Note that $X$ has the orthogonal decomposition $X = E(X \mid \mathcal{D}) + (X - E(X \mid \mathcal{D}))$.
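The contraction property (4.19) and the projection inequality (4.20) can be checked in the same numerical setting as before. The sketch below (ours, with hypothetical names; a minimal illustration rather than a proof) compares $\|E(X \mid \mathcal{D})\|_p$ with $\|X\|_p$ for $p = 1, 2$, and the $L^2$ distances from $X$ to $E(X \mid \mathcal{D})$ and to another $\mathcal{D}$-measurable variable $Y$.

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.uniform(-2.0, 2.0, size=200_000)
    X = np.sin(3 * w) + w**2
    cells = np.floor(2 * w).astype(int)        # partition generating D

    EX_D = np.empty_like(X)                    # E(X|D): cell averages
    for n in np.unique(cells):
        m = cells == n
        EX_D[m] = X[m].mean()

    for p in (1, 2):                           # contraction (4.19)
        print(np.mean(np.abs(EX_D)**p)**(1/p) <= np.mean(np.abs(X)**p)**(1/p))

    Y = np.cos(cells.astype(float))            # an arbitrary D-measurable Y
    print(np.mean((X - Y)**2) >= np.mean((X - EX_D)**2))   # projection (4.20)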
Finally, the classical notion of conditional probability as a reassessed probability measure may be recovered under fairly general conditions. The technical difficulty lies in the fact that for every given pairwise disjoint sequence of events $\{A_n\}$ one may assert the equality $P(\bigcup A_n \mid \mathcal{D})(\omega) = \sum P(A_n \mid \mathcal{D})(\omega)$ only for all $\omega$ outside a $P$-null set (Basic Properties (b), (f)). Since in general there are uncountably many such sequences, there may not exist any choice of versions of $P(A \mid \mathcal{D})$ for all $A \in \mathcal{F}$ such that for each $\omega$, outside a set of zero probability, $A \to P(A \mid \mathcal{D})(\omega)$ is a probability measure on $\mathcal{F}$. When such a choice is possible, the corresponding family $\{P(A \mid \mathcal{D}): A \in \mathcal{F}\}$ is called a regular conditional probability. The problem becomes somewhat simpler if one does not ask for a conditional probability measure on $\mathcal{F}$, but on a smaller sigmafield. For example, one may consider the sigmafield $\sigma\{Y\} = \{Y^{-1}(B): B \in \mathcal{S}\}$, where $Y$ is a measurable function on $(\Omega, \mathcal{F}, P)$ into $(S, \mathcal{S})$. A function $(\omega, B) \to Q_\omega(B \mid \mathcal{D})$ on $\Omega \times \mathcal{S}$ into $[0, 1]$ is said to be a conditional distribution of $Y$ given $\mathcal{D}$ if (i) for each $B \in \mathcal{S}$, $Q_\omega(B \mid \mathcal{D}) = P(\{Y \in B\} \mid \mathcal{D})(\omega) = E(\mathbf{1}_B(Y) \mid \mathcal{D})(\omega)$ for all $\omega$ outside a $P$-null set, and (ii) for each $\omega \in \Omega$, $B \to Q_\omega(B \mid \mathcal{D})$ is a probability measure on $(S, \mathcal{S})$. Note that (i), (ii) say that there is a regular conditional probability on $\sigma\{Y\}$.
If there exists a conditional distribution $Q_\omega(B \mid \mathcal{D})$ of $Y$ given $\mathcal{D}$, then it is simple to check that

$E(\varphi(Y) \mid \mathcal{D})(\omega) = \displaystyle\int_S \varphi(y)\,Q_\omega(dy \mid \mathcal{D})$ a.s.  (4.21)

for every measurable $\varphi$ on $(S, \mathcal{S})$ such that $\varphi(Y)$ is integrable. Conditional distributions do exist if $S$ is a (Borel subset of a) complete separable metric space and $\mathcal{S}$ its Borel sigmafield.
In this book we often write $E(Z \mid \{X_\lambda: \lambda \in \Lambda\})$ in place of $E(Z \mid \sigma\{X_\lambda: \lambda \in \Lambda\})$ for simplicity.

5 CONVERGENCE IN DISTRIBUTION IN FINITE DIMENSIONS

A sequence of probability measures $\{P_n: n = 1, 2, \ldots\}$ on $(\mathbb{R}^1, \mathcal{B}^1)$ is said to converge weakly to a probability measure $P$ (on $\mathbb{R}^1$) if

$\lim_{n\to\infty}\int_{\mathbb{R}^1}\psi(x)\,dP_n(x) = \int_{\mathbb{R}^1}\psi(x)\,dP(x)$  (5.1)

holds for all bounded continuous functions $\psi$ on $\mathbb{R}^1$. It is actually sufficient to verify (5.1) for those continuous functions $\psi$ that vanish outside some finite interval. For suppose (5.1) holds for all such functions. Let $\psi$ be an arbitrary bounded continuous function, $|\psi(x)| \le c$ for all $x$. For notational convenience write $\{x \in \mathbb{R}^1: |x| \ge N\} = \{|x| \ge N\}$, etc. Given $\varepsilon > 0$ there exists $N$ such that $P(\{|x| \ge N\}) < \varepsilon/2c$. Let $\theta_N$, $\psi_N$ be as in Figure 5.1(a), (b); in particular, $\theta_N$ is continuous, $0 \le \theta_N \le 1$, $\theta_N = 1$ on $\{|x| \le N\}$, and $\theta_N = 0$ on $\{|x| \ge N+1\}$. Then

$\liminf_{n\to\infty} P_n(\{|x| \le N+1\}) \ge \lim_{n\to\infty}\int\theta_N(x)\,dP_n(x) = \int\theta_N(x)\,dP(x) \ge P(\{|x| \le N\}) > 1 - \dfrac{\varepsilon}{2c},$

so that

$\limsup_{n\to\infty} P_n(\{|x| > N+1\}) = 1 - \liminf_{n\to\infty} P_n(\{|x| \le N+1\}) \le \dfrac{\varepsilon}{2c}.$  (5.2)

Hence, writing $\psi_N = \psi\theta_N$ and noting that $\psi = \psi_N$ on $\{|x| \le N\}$ and that $|\psi(x) - \psi_N(x)| \le c(1 - \theta_N(x))$ for all $x$, we have

$\limsup_{n\to\infty}\Big|\int\psi\,dP_n - \int\psi\,dP\Big| \le \limsup_{n\to\infty}\Big|\int\psi_N\,dP_n - \int\psi_N\,dP\Big| + \limsup_{n\to\infty} c\int(1-\theta_N)\,dP_n + c\int(1-\theta_N)\,dP$
$= 2c\int(1-\theta_N)\,dP \le 2cP(\{|x| > N\}) < 2c\cdot\dfrac{\varepsilon}{2c} = \varepsilon,$

since $\int(1-\theta_N)\,dP_n = 1 - \int\theta_N\,dP_n \to 1 - \int\theta_N\,dP = \int(1-\theta_N)\,dP$.

Since $\varepsilon > 0$ is arbitrary, $\int_{\mathbb{R}^1}\psi\,dP_n \to \int_{\mathbb{R}^1}\psi\,dP$, and the proof of the italicized statement above is complete. Let us now show that it is enough to verify (5.1) for every infinitely differentiable function that vanishes outside a finite interval.

[Figure 5.1(a), (b): graphs of the piecewise linear cutoff functions used in the arguments of this section.]

For each $\varepsilon > 0$ define the function

$\rho_\varepsilon(x) = d(\varepsilon)\exp\{-1/(1 - x^2/\varepsilon^2)\}$ for $|x| < \varepsilon$, $\quad \rho_\varepsilon(x) = 0$ for $|x| \ge \varepsilon$,  (5.3)

where $d(\varepsilon)$ is so chosen as to make $\int \rho_\varepsilon(x)\,dx = 1$. One may check that $\rho_\varepsilon(x)$ is infinitely differentiable in $x$. Now let $\psi$ be a continuous function that vanishes outside a finite interval. Then $\psi$ is uniformly continuous and, therefore, $\delta(\varepsilon) := \sup\{|\psi(x) - \psi(y)|: |x - y| \le \varepsilon\} \to 0$ as $\varepsilon \downarrow 0$. Define

$\psi_\varepsilon(x) = \psi * \rho_\varepsilon(x) := \int \psi(x - y)\rho_\varepsilon(y)\,dy,$  (5.4)

and note that, since $\psi_\varepsilon(x)$ is an average over values of $\psi$ within the interval $(x - \varepsilon, x + \varepsilon)$, one has $|\psi_\varepsilon(x) - \psi(x)| \le \delta(\varepsilon)$ for all $x$; moreover, $\psi_\varepsilon$ is infinitely differentiable and vanishes outside a finite interval. Hence,

$\Big|\int \psi\,dP_n - \int \psi_\varepsilon\,dP_n\Big| \le \delta(\varepsilon)$ for all $n$, $\qquad \Big|\int \psi\,dP - \int \psi_\varepsilon\,dP\Big| \le \delta(\varepsilon),$

$\Big|\int \psi\,dP_n - \int \psi\,dP\Big| \le \Big|\int \psi\,dP_n - \int \psi_\varepsilon\,dP_n\Big| + \Big|\int \psi_\varepsilon\,dP_n - \int \psi_\varepsilon\,dP\Big| + \Big|\int \psi_\varepsilon\,dP - \int \psi\,dP\Big| \le 2\delta(\varepsilon) + \Big|\int \psi_\varepsilon\,dP_n - \int \psi_\varepsilon\,dP\Big| \to 2\delta(\varepsilon)$ as $n \to \infty$.

Since $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$, it follows that $\int_{\mathbb{R}^1} \psi\,dP_n \to \int_{\mathbb{R}^1} \psi\,dP$, as claimed.


Next let Fn , F be the distribution functions of P, P. respectively (n = 1, 2, ...). We
want to show that if {P : n = 1, 2, ...} converges weakly to P. then F (x) -. F(x) as
P F

n -+ oo for all points of continuity x of F. For this, fix a point of continuity x 0 of F.


Given e > 0 there exists 'j(e) > 0 such that IF(x) - F(x o )I < r for Ix - x 0 1 _< ry(c). Let
ii. (x) = I for x _< x 0 , = 0 for x > x 0 + rt(e), and /i. (x) be linearly interpolated for
x o <x <x 0 + ry(e). Similarly, define /i, (x) = I for x < x 0 - q(e), ' (x) = 0 for x
and linearly interpolated in the interval (x 0 - ry(e), x o ). Then, using (5.1),

lim F(x 0 ) <- lim J c+(X)dPn (X) = J (x)dP(x) <- F(xo + ?I(e)) < F(x 0 ) + F,
n-+m n-+m
(5.5)
lim F(x 0 ) lim J (x) dPP(x) = J ^E dP(x) >- F(xo - q(F)) > F(x o ) - e.
n-+m n-^w

Since r > 0 is arbitrary, lim n ^ FF (x o ) <_ F(x o ) _< lim n ^ F(x o ), showing that
F(x 0 ) -+ F(x o ) as n - co. One may show that the converse is also true. That is, if
FF (x) --* F(x) at all points of continuity of a distribution function (d.f.) F, then
{Pn : n = 1, 2, ...} converges weakly to P. For this take a continuous function f that
vanishes outside [a, b] where a, h are points of continuity of F. One may divide [a, b]
into a finite number of small subintervals whose end points are all points of continuity
of F, and approximate f by a step function constant over each subinterval. The integral
of this step function with respect to P. converges to that with respect to P. All the above
results easily extend to multidimensions and we have the following theorem.

Theorem 5.1. Let $P_1, P_2, \ldots, P$ be probability measures on $\mathbb{R}^k$. The following statements are equivalent.
(a) $\{P_n: n = 1, 2, \ldots\}$ converges weakly to $P$.
(b) Equation (5.1) holds for all infinitely differentiable functions vanishing outside a bounded set.
(c) $F_n(x) \to F(x)$ as $n \to \infty$, for every point of continuity $x$ of $F$.
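The restriction to continuity points in part (c) is essential, as the standard example $P_n = \delta_{1/n}$, $P = \delta_0$ shows; the fragment below (our illustration, not from the text) records it.

    # P_n = point mass at 1/n converges weakly to P = point mass at 0, yet
    # F_n(0) = 0 does not converge to F(0) = 1: x = 0 is the (only)
    # discontinuity point of F, excluded in Theorem 5.1(c).
    F_n = lambda x, n: float(x >= 1.0 / n)
    F = lambda x: float(x >= 0.0)
    for x in (-0.5, 0.0, 0.5):
        print(x, [F_n(x, n) for n in (1, 10, 100)], "->", F(x))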

Often the probability measures $P_n$ $(n \ge 1)$ arise as distributions of given random variables $X_n$ $(n \ge 1)$. In this situation, if $\{P_n\}$ converges weakly to $P$, one says that $\{X_n\}$ converges in distribution, or in law, to $P$.
It is simple to check that if $\{X_n\}$ converges in probability to a random variable $X$, then $\{X_n\}$ converges in distribution to the distribution of $X$.
In general, if $\{F_n\}$ is a sequence of distribution functions of probability measures $P_n$ that converges to a right-continuous, nondecreasing function $F$ at all points of continuity $x$ of $F$, it need not be the case that $F$ is the distribution function of a probability measure. While there will be a positive measure $\mu$ such that $F(x) = \mu((-\infty, x])$, $x \in \mathbb{R}^1$, it can happen that $\mu(\mathbb{R}^1) < 1$; for example, if $P_n = \delta_n$, i.e., $P_n(\{n\}) = 1$, then $F_n(x) = 0$ for $x < n$ and $F_n(x) = 1$ for $x \ge n$, so that $F \equiv 0$. Situations such as this are described by the expression "probability mass has escaped to infinity." A property that prevents this from happening for a sequence $\{P_n\}$ is the so-called tightness criterion. Namely, a family $\{P_n\}$ of probability measures on $\mathbb{R}^1$ (with d.f.'s $\{F_n\}$) is said to be tight if for each $\varepsilon > 0$ there is an interval $(a, b]$ in $\mathbb{R}^1$ such that $P_n((a, b]) = F_n(b) - F_n(a) > 1 - \varepsilon$ for each $n \ge 1$.
It is clear that if a sequence of probability measures on $\mathbb{R}^1$ (or $\mathbb{R}^k$) converges weakly to a probability measure $P$, then $\{P_n\}$ is tight. Conversely, one has the following result.

Theorem 5.2. Suppose a sequence of probability measures $\{P_n\}$ on $\mathbb{R}^1$ is tight. Then it has a subsequence converging weakly to a probability measure $P$.

Proof. Let $F_n$ be the d.f. of $P_n$. Let $\{r_1, r_2, \ldots\}$ be an enumeration of the rationals. Since $\{F_n(r_1)\}$ is bounded, there exists a subsequence $\{n_1\}$ (i.e., $\{1_1, 2_1, \ldots, n_1, \ldots\}$) of the positive integers such that $\{F_{n_1}(r_1)\}$ is convergent. Since $\{F_{n_1}(r_2)\}$ is bounded, there exists a subsequence $\{n_2\}$ of $\{n_1\}$ such that $\{F_{n_2}(r_2)\}$ is convergent. Then $\{F_{n_2}(r_i)\}$ $(i = 1, 2)$ converges. Continue in this manner. Consider the "diagonal sequence" $1_1, 2_2, \ldots, k_k, \ldots$. Then $\{F_{n_n}(r_i)\}$ converges, as $n \to \infty$, for every $i$, since $\{n_n: n \ge i\}$ is a subsequence of $\{n_i\}$. Let us write $n_n = n'$. Define

$G(r_i) := \lim_{n'\to\infty} F_{n'}(r_i) \quad (i = 1, 2, \ldots),$
$F(x) := \inf\{G(r_i): r_i > x\} \quad (x \in \mathbb{R}^1).$  (5.6)

Then $F$ is a nondecreasing and right-continuous function on $\mathbb{R}^1$, $0 \le F \le 1$. Let $x$ be a point of continuity of $F$. Let $x' < y' < x < y'' < x''$ with $y', y''$ rational. Then

$F(x') \le G(y') = \lim F_{n'}(y') \le \liminf F_{n'}(x) \le \limsup F_{n'}(x) \le \lim F_{n'}(y'') = G(y'') \le F(x'').$

Now let $x' \uparrow x$, $x'' \downarrow x$, and use the continuity of $F$ at $x$ to deduce

$\lim_{n'\to\infty} F_{n'}(x) = F(x).$  (5.7)

It remains to prove that $F$ is the d.f. of a probability measure. By tightness of $\{P_n\}$ (and, therefore, of $\{P_{n'}\}$), given $\varepsilon > 0$ there exist $x_\varepsilon, y_\varepsilon$ such that $F_n(x_\varepsilon) < \varepsilon$ and $F_n(y_\varepsilon) > 1 - \varepsilon$ for all $n$. Let $x'_\varepsilon, y'_\varepsilon$ be points of continuity of $F$ such that $x'_\varepsilon < x_\varepsilon$ and $y'_\varepsilon > y_\varepsilon$. Then

$F(x'_\varepsilon) = \lim F_{n'}(x'_\varepsilon) \le \varepsilon, \qquad F(y'_\varepsilon) = \lim F_{n'}(y'_\varepsilon) \ge 1 - \varepsilon.$

This shows $\lim_{x\to-\infty} F(x) = 0$, $\lim_{x\to\infty} F(x) = 1$. ∎

A similar proof applies to $P_n$, $P$ on $\mathbb{R}^k$.

6 CLASSICAL LAWS OF LARGE NUMBERS

Theorem 6.1. (Strong Law of Large Numbers). Let $\{X_n\}$ be a sequence of pairwise independent and identically distributed random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. If $E|X_1| < \infty$ then, with probability 1,

$\lim_{n\to\infty}\dfrac{X_1 + \cdots + X_n}{n} = EX_1.$  (6.1)

The proof we present here is due to Etemadi.³ It is based on Part 1 of the following Borel–Cantelli Lemmas. Part 2 is also important and is cited here for completeness. However, it is not used in Etemadi's proof of the SLLN.

Lemma 6.1. (Borel–Cantelli). Let $\{A_n\}$ be any sequence of events in $\mathcal{F}$.

Part 1. If $\sum_{n=1}^{\infty} P(A_n) < \infty$ then

$P(A_n \text{ i.o.}) := P\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Big) = 0.$

Part 2. If $A_1, A_2, \ldots$ are independent events and if $\sum P(A_n)$ diverges, then $P(A_n \text{ i.o.}) = 1$.

Proof. For Part 1 observe that the sequence of events $B_n = \bigcup_{k=n}^{\infty} A_k$, $n = 1, 2, \ldots$, is a decreasing sequence. Therefore, we have by the continuity property (1.2),

$P\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Big) = \lim_{n\to\infty} P\Big(\bigcup_{k=n}^{\infty} A_k\Big) \le \lim_{n\to\infty}\sum_{k=n}^{\infty} P(A_k) = 0.$

For Part 2 note that

$P(\{A_n \text{ i.o.}\}^c) = P\Big(\bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty}\lim_{m\to\infty}\prod_{k=n}^{m}(1 - P(A_k)).$

But

$\prod_{k=n}^{m}(1 - P(A_k)) \le \exp\Big\{-\sum_{k=n}^{m} P(A_k)\Big\} \to 0 \quad \text{as } m \to \infty. \quad ∎$

Without loss of generality we may assume for the proof of the SLLN that the random variables $X_n$ are nonnegative, since otherwise we can write $X_n = X_n^+ - X_n^-$, where $X_n^+ = \max(X_n, 0)$ and $X_n^- = -\min(X_n, 0)$ are both nonnegative random variables, and then the result in the nonnegative case yields that

$\dfrac{S_n}{n} = \dfrac{1}{n}\sum_{k=1}^{n} X_k = \dfrac{1}{n}\sum_{k=1}^{n} X_k^+ - \dfrac{1}{n}\sum_{k=1}^{n} X_k^-$

converges to $EX_1^+ - EX_1^- = EX_1$ with probability 1.

³ N. Etemadi (1983), "On the Laws of Large Numbers for Nonnegative Random Variables," J. Multivariate Analysis, 13, pp. 187–193.

Truncate the variables $X_n$ by $Y_n = X_n\mathbf{1}_{[X_n \le n]}$. Then $Y_n$ has moments of all orders. Let $T_n = \sum_{k=1}^{n} Y_k$ and consider the sequence $\{T_n\}$ on the "fast time scale" $\tau_n = [\alpha^n]$, for a fixed $\alpha > 1$, where brackets $[\,\cdot\,]$ denote the integer part. Let $\varepsilon > 0$. Then by Chebyshev's Inequality and pairwise independence,

$P\Big(\Big|\dfrac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) \le \dfrac{\operatorname{Var}(T_{\tau_n})}{\varepsilon^2\tau_n^2} = \dfrac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n}\operatorname{Var} Y_k \le \dfrac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} EY_k^2$
$= \dfrac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} E\{X_k^2\mathbf{1}_{[X_k\le k]}\} = \dfrac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} E\{X_1^2\mathbf{1}_{[X_1\le k]}\} \le \dfrac{1}{\varepsilon^2\tau_n} E\{X_1^2\mathbf{1}_{[X_1\le\tau_n]}\}.$  (6.2)

Therefore,

$\sum_{n=1}^{\infty} P\Big(\Big|\dfrac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) \le \dfrac{1}{\varepsilon^2}\sum_{n=1}^{\infty}\dfrac{1}{\tau_n} E\{X_1^2\mathbf{1}_{[X_1\le\tau_n]}\} = \dfrac{1}{\varepsilon^2} E\Big\{X_1^2\sum_{n=1}^{\infty}\dfrac{1}{\tau_n}\mathbf{1}_{[X_1\le\tau_n]}\Big\}.$  (6.3)

Let $x > 0$ and let $N = \min\{n \ge 1: \tau_n \ge x\}$. Then $\alpha^N \ge x$ and, since $y \le 2[y]$ for any $y \ge 1$,

$\sum_{n:\,\tau_n\ge x}\dfrac{1}{\tau_n} \le 2\sum_{n\ge N}\alpha^{-n} = \dfrac{2\alpha}{\alpha - 1}\,\alpha^{-N} \le \dfrac{k_\alpha}{x},$

where $k_\alpha = 2\alpha/(\alpha - 1)$. Therefore,

$\sum_{n=1}^{\infty}\dfrac{1}{\tau_n}\mathbf{1}_{[X_1\le\tau_n]} \le \dfrac{k_\alpha}{X_1} \quad \text{for } X_1 > 0.$

So

$\sum_{n=1}^{\infty} P\Big(\Big|\dfrac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) \le \dfrac{k_\alpha}{\varepsilon^2}\,EX_1 < \infty.$  (6.4)

By the Borel–Cantelli Lemma (Part 1), taking a union over positive rational values of $\varepsilon$, with probability 1, $(T_{\tau_n} - ET_{\tau_n})/\tau_n \to 0$ as $n \to \infty$. Therefore,

$\dfrac{T_{\tau_n}}{\tau_n} \to EX_1,$  (6.5)

since, by Cesàro averaging (note that $EY_k = E\{X_1\mathbf{1}_{[X_1\le k]}\} \uparrow EX_1$),

$\lim_{n\to\infty}\dfrac{ET_{\tau_n}}{\tau_n} = \lim_{n\to\infty} EY_n = EX_1.$

Since

$\sum_{n=1}^{\infty} P(X_n \ne Y_n) = \sum_{n=1}^{\infty} P(X_1 > n) \le \int_0^{\infty} P(X_1 > u)\,du = EX_1 < \infty,$

we get by another application of the Borel–Cantelli Lemma that, with probability 1,

$\dfrac{S_n - T_n}{n} \to 0 \quad \text{as } n \to \infty.$  (6.6)

Therefore, the previous results about $\{T_n\}$ give for $\{S_n\}$ that

$\dfrac{S_{\tau_n}}{\tau_n} \to EX_1 \quad \text{as } n \to \infty$  (6.7)

with probability 1. If $\tau_n \le k \le \tau_{n+1}$ then, since $X_n \ge 0$,

$\dfrac{\tau_n}{\tau_{n+1}}\cdot\dfrac{S_{\tau_n}}{\tau_n} \le \dfrac{S_k}{k} \le \dfrac{\tau_{n+1}}{\tau_n}\cdot\dfrac{S_{\tau_{n+1}}}{\tau_{n+1}}.$  (6.8)

But $\tau_{n+1}/\tau_n \to \alpha$, so that now we get, with probability 1,

$\dfrac{1}{\alpha}EX_1 \le \liminf_{k\to\infty}\dfrac{S_k}{k} \le \limsup_{k\to\infty}\dfrac{S_k}{k} \le \alpha\,EX_1.$  (6.9)

Take the intersection of all such events for rational $\alpha > 1$ to get $\lim_{k\to\infty} S_k/k = EX_1$ with probability 1. This is the Strong Law of Large Numbers (SLLN). ∎

The above proof of the SLLN is really quite remarkable, as the following observations show. First, pairwise independence is used only to make sure that the positive and negative parts of the $X_n$, and their truncations, remain pairwise uncorrelated for the calculation of the variance of $T_k$ as the sum of the variances. Observe that if the random variables are all of the same sign, say nonnegative, and bounded, then it suffices to require that they merely be uncorrelated for the same proof to go through. However, this means that, if the random variables are bounded, then it suffices that they be uncorrelated to get the SLLN; for one may simply add a sufficiently large constant to make them all nonnegative. Thus, we have the following.

Corollary 6.2. If $X_1, X_2, \ldots$ is a sequence of mean-zero uncorrelated random variables that are uniformly bounded, then with probability 1,

$\dfrac{X_1 + \cdots + X_n}{n} \to 0 \quad \text{as } n \to \infty.$
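As a quick Monte Carlo illustration of (6.1) (a sketch of ours; the Pareto example and all names are hypothetical), one may watch the running sample mean of an integrable but heavy-tailed i.i.d. sequence settle at its expectation; with infinite variance, the convergence guaranteed by Theorem 6.1 is visibly slow.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1_000_000
    X = rng.uniform(size=n) ** (-1 / 1.5)   # Pareto(1.5): EX_1 = 3, infinite variance
    running_mean = np.cumsum(X) / np.arange(1, n + 1)
    for k in (10**2, 10**4, 10**6):
        print(k, running_mean[k - 1])       # approaches EX_1 = 3 as k grows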

7 CLASSICAL CENTRAL LIMIT THEOREMS

In view of the great importance of the central limit theorem (CLT), we shall give a general but self-contained version due to Lindeberg. This version is applicable to non-identically distributed summands.

Theorem 7.1. (Lindeberg's CLT). For each $n$, let $X_{1,n}, \ldots, X_{k_n,n}$ be independent random variables satisfying

$EX_{j,n} = 0, \quad \sigma_{j,n} := (EX_{j,n}^2)^{1/2} < \infty, \quad \sum_{j=1}^{k_n}\sigma_{j,n}^2 = 1,$  (7.1)

and, for each $\varepsilon > 0$,

(Lindeberg condition) $\quad \lim_{n\to\infty}\sum_{j=1}^{k_n} E(X_{j,n}^2\mathbf{1}_{\{|X_{j,n}|>\varepsilon\}}) = 0.$  (7.2)

Then $\sum_{j=1}^{k_n} X_{j,n}$ converges in distribution to the standard normal law $N(0, 1)$.

Proof. Let $\{Z_j: j \ge 1\}$ be a sequence of i.i.d. $N(0, 1)$ random variables, independent of $\{X_{j,n}: 1 \le j \le k_n\}$. Write

$Z_{j,n} := \sigma_{j,n} Z_j \quad (1 \le j \le k_n),$  (7.3)

so that $EZ_{j,n} = 0 = EX_{j,n}$ and $EZ_{j,n}^2 = \sigma_{j,n}^2 = EX_{j,n}^2$. Define

$U_{m,n} := \sum_{j=1}^{m} X_{j,n} + \sum_{j=m+1}^{k_n} Z_{j,n} \quad (0 \le m \le k_n), \qquad V_{m,n} := U_{m,n} - X_{m,n} \quad (1 \le m \le k_n),$  (7.4)

with the convention that empty sums are zero (so $U_{0,n} = \sum_{j=1}^{k_n} Z_{j,n}$ and $U_{k_n,n} = \sum_{j=1}^{k_n} X_{j,n}$). Let $f$ be a real-valued function on $\mathbb{R}^1$ such that $f, f', f'', f'''$ are bounded. Recall the following version of the Taylor expansion, which is easy to check by integration by parts:

$f(x + h) = f(x) + hf'(x) + \dfrac{h^2}{2!}f''(x) + h^2\int_0^1(1 - \theta)\{f''(x + \theta h) - f''(x)\}\,d\theta \quad (x, h \in \mathbb{R}^1).$  (7.5)

Taking $x = V_{m,n}$, $h = X_{m,n}$ in (7.5), one gets

$Ef(U_{m,n}) = Ef(V_{m,n} + X_{m,n}) = Ef(V_{m,n}) + E(X_{m,n}f'(V_{m,n})) + \tfrac{1}{2}E(X_{m,n}^2 f''(V_{m,n})) + E(R_{m,n}),$  (7.6)

where

$R_{m,n} = X_{m,n}^2\int_0^1(1 - \theta)\{f''(V_{m,n} + \theta X_{m,n}) - f''(V_{m,n})\}\,d\theta.$  (7.7)

As $X_{m,n}$ and $V_{m,n}$ are independent, and $EX_{m,n} = 0$, $EX_{m,n}^2 = \sigma_{m,n}^2$, (7.6) reduces to

$Ef(U_{m,n}) = Ef(V_{m,n}) + \dfrac{\sigma_{m,n}^2}{2}Ef''(V_{m,n}) + E(R_{m,n}).$  (7.8)

Also $U_{m-1,n} = V_{m,n} + Z_{m,n}$, and $V_{m,n}$ and $Z_{m,n}$ are independent. Therefore, exactly as above one gets, using $EZ_{m,n} = 0$, $EZ_{m,n}^2 = \sigma_{m,n}^2$,

$Ef(U_{m-1,n}) = Ef(V_{m,n}) + \dfrac{\sigma_{m,n}^2}{2}Ef''(V_{m,n}) + E(R'_{m,n}),$  (7.9)

where

$R'_{m,n} = Z_{m,n}^2\int_0^1(1 - \theta)\{f''(V_{m,n} + \theta Z_{m,n}) - f''(V_{m,n})\}\,d\theta.$  (7.10)

Hence,

$|Ef(U_{m,n}) - Ef(U_{m-1,n})| \le E|R_{m,n}| + E|R'_{m,n}| \quad (1 \le m \le k_n).$  (7.11)

Now, given an arbitrary $\varepsilon > 0$,

$E|R_{m,n}| = E(|R_{m,n}|\mathbf{1}_{\{|X_{m,n}|>\varepsilon\}}) + E(|R_{m,n}|\mathbf{1}_{\{|X_{m,n}|\le\varepsilon\}})$
$\le E\Big[X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|>\varepsilon\}}\int_0^1(1-\theta)\cdot 2\|f''\|_\infty\,d\theta\Big] + E\Big[X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|\le\varepsilon\}}\int_0^1(1-\theta)|X_{m,n}|\,\|f'''\|_\infty\,d\theta\Big]$
$\le \|f''\|_\infty E(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|>\varepsilon\}}) + \dfrac{\varepsilon}{2}\sigma_{m,n}^2\|f'''\|_\infty.$  (7.12)

We have used the notation $\|g\|_\infty := \sup\{|g(x)|: x \in \mathbb{R}^1\}$. By (7.1), (7.2), and (7.12),

$\limsup_{n\to\infty}\sum_{m=1}^{k_n} E|R_{m,n}| \le \dfrac{\varepsilon}{2}\|f'''\|_\infty.$

As $\varepsilon > 0$ is arbitrary,

$\lim_{n\to\infty}\sum_{m=1}^{k_n} E|R_{m,n}| = 0.$  (7.13)

Also,

$E|R'_{m,n}| \le E\Big[Z_{m,n}^2\int_0^1(1-\theta)\|f'''\|_\infty|Z_{m,n}|\,d\theta\Big] = \tfrac{1}{2}\|f'''\|_\infty E|Z_{m,n}|^3 = \tfrac{1}{2}\|f'''\|_\infty\sigma_{m,n}^3 E|Z_1|^3 \le c\,\sigma_{m,n}^2\max_{1\le m\le k_n}\sigma_{m,n},$  (7.14)

where $c = \tfrac{1}{2}\|f'''\|_\infty E|Z_1|^3$. Now, for each $\delta > 0$,

$\sigma_{m,n}^2 = E(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|>\delta\}}) + E(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|\le\delta\}}) \le E(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|>\delta\}}) + \delta^2,$

which implies that

$\max_{1\le m\le k_n}\sigma_{m,n}^2 \le \sum_{m=1}^{k_n} E(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}|>\delta\}}) + \delta^2.$

Therefore, by (7.2),

$\max_{1\le m\le k_n}\sigma_{m,n} \to 0 \quad \text{as } n \to \infty.$  (7.15)

From (7.14) and (7.15), and since $\sum_{m=1}^{k_n}\sigma_{m,n}^2 = 1$, one gets

$\sum_{m=1}^{k_n} E|R'_{m,n}| \le c\max_{1\le m\le k_n}\sigma_{m,n} \to 0 \quad \text{as } n \to \infty.$  (7.16)

Combining (7.13) and (7.16), one finally gets

$|Ef(U_{k_n,n}) - Ef(U_{0,n})| \le \sum_{m=1}^{k_n}(E|R_{m,n}| + E|R'_{m,n}|) \to 0 \quad \text{as } n \to \infty.$  (7.17)

But $U_{0,n}$ is a standard normal random variable (its variance being $\sum_{j=1}^{k_n}\sigma_{j,n}^2 = 1$). Hence,

$Ef\Big(\sum_{j=1}^{k_n} X_{j,n}\Big) - \int_{-\infty}^{\infty} f(y)(2\pi)^{-1/2}\exp\{-\tfrac{1}{2}y^2\}\,dy \to 0 \quad \text{as } n \to \infty.$

By Theorem 5.1, the proof is complete. ∎

It has been shown by Feller⁴ that in the presence of the uniform asymptotic negligibility condition (7.15), the Lindeberg condition is also necessary for the CLT to hold.

Corollary 7.2. (The Classical CLT). Let $\{X_j: j \ge 1\}$ be i.i.d., $EX_j = \mu$, $0 < \sigma^2 := \operatorname{Var} X_j < \infty$. Then $\sum_{j=1}^{n}(X_j - \mu)/(\sigma\sqrt{n})$ converges in distribution to $N(0, 1)$.

Proof. Let $X_{j,n} = (X_j - \mu)/(\sigma\sqrt{n})$, $k_n = n$, and apply Theorem 7.1. ∎
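A numerical look at Corollary 7.2 (our sketch, not the book's; a deliberately skewed summand law is used so that the convergence is not an artifact of symmetry):

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(4)
    n, reps = 100, 20_000
    # X_j ~ Exponential(1), so mu = sigma = 1 in Corollary 7.2.
    Z = (rng.exponential(size=(reps, n)).sum(axis=1) - n) / sqrt(n)
    Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))        # N(0,1) d.f.
    for x in (-1.5, 0.0, 1.5):
        print(x, np.mean(Z <= x), Phi(x))               # empirical d.f. vs Phi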

Corollary 7.3. (Liapounov's CLT). For each $n$ let $X_{1,n}, X_{2,n}, \ldots, X_{k_n,n}$ be $k_n$ independent random variables such that

$\sum_{j=1}^{k_n} EX_{j,n} = \mu, \qquad \sum_{j=1}^{k_n}\operatorname{Var} X_{j,n} = \sigma^2 > 0,$

(Liapounov condition) $\quad \lim_{n\to\infty}\sum_{j=1}^{k_n} E|X_{j,n} - EX_{j,n}|^{2+\delta} = 0$  (7.18)

for some $\delta > 0$. Then $\sum_{j=1}^{k_n} X_{j,n}$ converges in distribution to the Gaussian law with mean $\mu$ and variance $\sigma^2$.

Proof. By normalizing one may assume, without loss of generality, that

$EX_{j,n} = 0, \qquad \sum_{j=1}^{k_n} EX_{j,n}^2 = 1.$

It then remains to show that the hypothesis of the corollary implies the Lindeberg condition (7.2). This is true since, for every $\varepsilon > 0$,

$\sum_{j=1}^{k_n} E(X_{j,n}^2\mathbf{1}_{\{|X_{j,n}|>\varepsilon\}}) \le \sum_{j=1}^{k_n} E\dfrac{|X_{j,n}|^{2+\delta}}{\varepsilon^{\delta}} \to 0,$  (7.19)

as $n \to \infty$, by (7.18). ∎

⁴ P. Billingsley (1986), loc. cit., p. 373.

Observe that the most crucial property of the normal distribution used in the proof of Theorem 7.1 is that the sum of independent normal random variables is normal. In other words, the normal distribution is infinitely divisible. In fact, the normal distribution $N(0, 1)$ may be realized as the distribution of the sum of independent normal random variables having zero means and variances $\sigma_j^2$, for any arbitrarily specified set of nonnegative numbers $\sigma_j^2$ adding up to 1. Another well-known infinitely divisible distribution is the Poisson distribution.
The following multidimensional version of Corollary 7.2 may be proved along the lines of the proof of Theorem 7.1.

Theorem 7.4. (Multivariate Classical CLT). Let $\{X_n: n = 1, 2, \ldots\}$ be a sequence of i.i.d. random vectors with values in $\mathbb{R}^k$. Let $EX_1 = \mu$ and assume that the dispersion matrix (i.e., variance–covariance matrix) $D$ of $X_1$ is nonsingular. Then as $n \to \infty$, $n^{-1/2}(X_1 + \cdots + X_n - n\mu)$ converges in distribution to the Gaussian probability measure with mean zero and dispersion matrix $D$.

8 FOURIER SERIES AND THE FOURIER TRANSFORM

Consider a real- or complex-valued periodic function $f$ on the real line. By changing the scale, if necessary, one may take the period to be $2\pi$. Is it possible to represent $f$ as a superposition of the periodic functions ("waves") $\cos nx$, $\sin nx$ of frequency $n$ $(n = 0, 1, 2, \ldots)$? The Weierstrass approximation theorem (Theorem 8.1) says that every continuous periodic function $f$ of period $2\pi$ is the limit (in the sense of uniform convergence of functions) of a sequence of trigonometric polynomials, i.e., functions of the form

$\sum_{n=-T}^{T} c_n e^{inx} = c_0 + \sum_{n=1}^{T}(a_n\cos nx + b_n\sin nx).$

The theory of Fourier series says, among other things, that with the weaker notion of $L^2$-convergence the approximation holds for a wider class of functions, namely for all square integrable functions $f$ on $[-\pi, \pi]$; here square integrability means that $|f|^2$ is measurable and $\int_{-\pi}^{\pi}|f(x)|^2\,dx < \infty$. This class of functions is denoted by $L^2[-\pi, \pi]$. The successive coefficients $c_n$ for this approximation are the so-called Fourier coefficients:

$c_n = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx \quad (n = 0, \pm 1, \pm 2, \ldots).$  (8.1)

The functions $\exp\{inx\}$ $(n = 0, \pm 1, \pm 2, \ldots)$ form an orthonormal set:

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}e^{-imx}\,dx = 0$ for $n \ne m$, $\quad = 1$ for $n = m$,  (8.2)

so that the Fourier series of $f$, written formally, without regard to convergence for the time being, as

$\sum_{n=-\infty}^{\infty} c_n e^{inx},$  (8.3)

is a representation of $f$ as a superposition of orthogonal components. To make matters precise we first prove the following theorem.

Theorem 8.1. Let $f$ be a continuous periodic function of period $2\pi$. Then, given $\delta > 0$, there exists a trigonometric polynomial $\sum_{n=-N}^{N} d_n\exp\{inx\}$ such that

$\sup_{x\in[-\pi,\pi]}\Big|f(x) - \sum_{n=-N}^{N} d_n\exp\{inx\}\Big| < \delta.$

Proof. For each positive integer $N$, introduce the Fejér kernel

$k_N(x) := \dfrac{1}{2\pi}\sum_{n=-N}^{N}\Big(1 - \dfrac{|n|}{N+1}\Big)\exp\{inx\}.$  (8.4)

This may also be expressed as

$2\pi(N+1)k_N(x) = \sum_{0\le j,k\le N}\exp\{i(j-k)x\} = \Big|\sum_{j=0}^{N}\exp\{ijx\}\Big|^2 = \Big|\dfrac{\exp\{i(N+1)x\} - 1}{\exp\{ix\} - 1}\Big|^2 = \dfrac{1 - \cos(N+1)x}{1 - \cos x} = \Big(\dfrac{\sin\{\tfrac{1}{2}(N+1)x\}}{\sin\tfrac{1}{2}x}\Big)^2.$  (8.5)

The first equality in (8.5) follows from the fact that there are $N + 1 - |n|$ pairs $(j, k)$ such that $j - k = n$. It follows from (8.5) that $k_N$ is a nonnegative continuous periodic function with period $2\pi$. Also, $k_N$ is a p.d.f. on $[-\pi, \pi]$, as follows from (8.4) on integration. For every $\varepsilon > 0$ it follows from (8.5) that $k_N(x)$ goes to zero uniformly on $[-\pi, -\varepsilon] \cup [\varepsilon, \pi]$ as $N \to \infty$, so that

$\int_{[-\pi,-\varepsilon]\cup[\varepsilon,\pi]} k_N(x)\,dx \to 0 \quad \text{as } N \to \infty.$  (8.6)

In other words, $k_N(x)\,dx$ converges weakly to $\delta_0(dx)$, the point mass at $0$, as $N \to \infty$. Consider now the approximation $f_N$ of $f$ defined by

$f_N(x) := \int_{-\pi}^{\pi} f(y)k_N(x-y)\,dy = \sum_{n=-N}^{N}\Big(1 - \dfrac{|n|}{N+1}\Big)c_n\exp\{inx\},$  (8.7)

where $c_n$ is the $n$th Fourier coefficient of $f$. By changing variables and using the periodicity of $f$ and $k_N$, one may express $f_N$ as

$f_N(x) = \int_{-\pi}^{\pi} f(x-y)k_N(y)\,dy.$

Therefore, writing $M = \sup\{|f(x)|: x \in \mathbb{R}^1\}$ and $\delta_\varepsilon = \sup\{|f(y) - f(y')|: |y - y'| \le \varepsilon\}$, one has

$|f(x) - f_N(x)| \le \int_{-\pi}^{\pi}|f(x-y) - f(x)|\,k_N(y)\,dy \le 2M\int_{[-\pi,-\varepsilon]\cup[\varepsilon,\pi]} k_N(y)\,dy + \delta_\varepsilon.$  (8.8)

It now follows from (8.6), letting first $N \to \infty$ and then $\varepsilon \downarrow 0$, that $f - f_N$ converges to zero uniformly as $N \to \infty$. Now write $d_n = (1 - |n|/(N+1))c_n$. ∎
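The Fejér (Cesàro) means (8.7) are easy to compute. The sketch below (ours; the grid-based coefficients are a numerical stand-in for (8.1)) exhibits the uniform convergence asserted in Theorem 8.1 for $f(x) = |x|$.

    import numpy as np

    x = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
    f = np.abs(x)                                   # continuous, f(-pi) = f(pi)

    def c(n):                                       # Fourier coefficient (8.1)
        return np.mean(f * np.exp(-1j * n * x))     # trapezoid-rule approximation

    for N in (4, 16, 64):
        f_N = sum((1 - abs(n) / (N + 1)) * c(n) * np.exp(1j * n * x)
                  for n in range(-N, N + 1))        # the Fejer mean (8.7)
        print(N, np.max(np.abs(f - f_N.real)))      # sup-norm error decreases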

The next task is to establish the convergence of the Fourier series (8.3) to $f$ in $L^2$. For this note that for every square integrable $f$ and all positive integers $N$,

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big(f(x) - \sum_{n=-N}^{N} c_n e^{inx}\Big)e^{-imx}\,dx = c_m - c_m = 0 \quad (m = 0, \pm 1, \ldots, \pm N).$  (8.9)

Therefore, if one defines the norm (or "length") of a function $g$ in $L^2[-\pi, \pi]$ by

$\|g\| = \Big(\dfrac{1}{2\pi}\int_{-\pi}^{\pi}|g(x)|^2\,dx\Big)^{1/2} = \|g\|_2,$  (8.10)

then, writing $\bar z$ for the complex conjugate of $z$,

$0 \le \Big\|f - \sum_{-N}^{N} c_n e^{inx}\Big\|^2 = \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big(f(x) - \sum_{-N}^{N} c_n e^{inx}\Big)\Big(\bar f(x) - \sum_{-N}^{N}\bar c_n e^{-inx}\Big)dx$
$= \dfrac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx - \sum_{-N}^{N}\bar c_n\Big(\dfrac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx\Big) - \sum_{-N}^{N} c_n\Big(\dfrac{1}{2\pi}\int_{-\pi}^{\pi}\bar f(x)e^{inx}\,dx\Big) + \sum_{-N}^{N} c_n\bar c_n$
$= \|f\|^2 - \sum_{-N}^{N} c_n\bar c_n = \|f\|^2 - \sum_{-N}^{N}|c_n|^2.$  (8.11)

This shows that $\|f - \sum_{-N}^{N} c_n e^{inx}\|^2$ decreases as $N$ increases and that

$\lim_{N\to\infty}\Big\|f - \sum_{-N}^{N} c_n e^{inx}\Big\|^2 = \|f\|^2 - \sum_{-\infty}^{\infty}|c_n|^2.$  (8.12)

To prove that the right side of (8.12) vanishes, first assume that $f$ is continuous and $f(-\pi) = f(\pi)$. Given $\varepsilon > 0$ there exists, by Theorem 8.1, a trigonometric polynomial $\sum_{n=-N_0}^{N_0} d_n e^{inx}$ such that

$\max_x\Big|f(x) - \sum_{-N_0}^{N_0} d_n e^{inx}\Big| < \varepsilon.$

This implies

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{-N_0}^{N_0} d_n e^{inx}\Big|^2 dx < \varepsilon^2.$  (8.13)

But, by (8.9), $f(x) - \sum_{-N_0}^{N_0} c_n\exp\{inx\}$ is orthogonal to $\exp\{imx\}$ $(m = 0, \pm 1, \ldots, \pm N_0)$, so that

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{-N_0}^{N_0} d_n e^{inx}\Big|^2 dx = \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{-N_0}^{N_0} c_n e^{inx} + \sum_{-N_0}^{N_0}(c_n - d_n)e^{inx}\Big|^2 dx$
$= \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{-N_0}^{N_0} c_n e^{inx}\Big|^2 dx + \dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|\sum_{-N_0}^{N_0}(c_n - d_n)e^{inx}\Big|^2 dx.$  (8.14)

Hence, by (8.13) and (8.14),

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{-N_0}^{N_0} c_n e^{inx}\Big|^2 dx < \varepsilon^2, \qquad \lim_{N\to\infty}\Big\|f - \sum_{-N}^{N} c_n e^{inx}\Big\|^2 \le \varepsilon^2.$  (8.15)

Since $\varepsilon > 0$ is arbitrary, it follows that

$\lim_{N\to\infty}\Big\|f - \sum_{-N}^{N} c_n e^{inx}\Big\| = 0,$  (8.16)

and, by (8.12),

$\|f\|^2 = \sum_{-\infty}^{\infty}|c_n|^2.$  (8.17)

This completes the proof of convergence for continuous periodic $f$. Now it may be shown that, given a square integrable $f$ and $\varepsilon > 0$, there exists a continuous periodic $g$ such that $\|f - g\| < \varepsilon/2$. Also, letting $\sum d_n\exp\{inx\}$, $\sum c_n\exp\{inx\}$ be the Fourier series of $g$, $f$, respectively, there exists $N_1$ such that

$\Big\|g - \sum_{-N_1}^{N_1} d_n\exp\{inx\}\Big\| < \dfrac{\varepsilon}{2}.$

Hence (see (8.14))

$\Big\|f - \sum_{-N_1}^{N_1} c_n e^{inx}\Big\| \le \Big\|f - \sum_{-N_1}^{N_1} d_n e^{inx}\Big\| \le \|f - g\| + \Big\|g - \sum_{-N_1}^{N_1} d_n e^{inx}\Big\| < \dfrac{\varepsilon}{2} + \dfrac{\varepsilon}{2} = \varepsilon.$  (8.18)

Since $\varepsilon > 0$ is arbitrary and $\|f - \sum_{-N}^{N} c_n e^{inx}\|^2$ decreases to $\|f\|^2 - \sum_{-\infty}^{\infty}|c_n|^2$ as $N \uparrow \infty$ (see Eq. 8.12), one has

$\lim_{N\to\infty}\Big\|f - \sum_{-N}^{N} c_n e^{inx}\Big\| = 0, \qquad \|f\|^2 = \sum_{-\infty}^{\infty}|c_n|^2.$  (8.19)

Thus, we have proved the first part of the following theorem.

Theorem 8.2
(a) For every $f$ in $L^2[-\pi, \pi]$, the Fourier series of $f$ converges to $f$ in $L^2$-norm, and the identity $\|f\| = (\sum|c_n|^2)^{1/2}$ holds for its Fourier coefficients $c_n$.
(b) If (i) $f$ is differentiable, (ii) $f(-\pi) = f(\pi)$, and (iii) $f'$ is square integrable, then the Fourier series of $f$ also converges uniformly to $f$ on $[-\pi, \pi]$.

Proof. To prove part (b), let $f$ be as specified. Let $\sum c_n\exp\{inx\}$ be the Fourier series of $f$, and $\sum c_n^{(1)}\exp\{inx\}$ that of $f'$. Then, integrating by parts and using $f(-\pi) = f(\pi)$,

$c_n^{(1)} = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} f'(x)e^{-inx}\,dx = \dfrac{1}{2\pi}\big[f(x)e^{-inx}\big]_{-\pi}^{\pi} + \dfrac{in}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = 0 + inc_n.$  (8.20)

Since $f'$ is square integrable,

$\sum|nc_n|^2 = \sum|c_n^{(1)}|^2 < \infty.$  (8.21)

Therefore, by the Cauchy–Schwarz Inequality,

$\sum_n|c_n| = |c_0| + \sum_{n\ne 0}\dfrac{1}{|n|}|nc_n| \le |c_0| + \Big(\sum_{n\ne 0}\dfrac{1}{n^2}\Big)^{1/2}\Big(\sum_{n\ne 0}|nc_n|^2\Big)^{1/2} < \infty.$  (8.22)

But this means that $\sum c_n\exp\{inx\}$ is uniformly absolutely convergent, since

$\max_x\Big|\sum_{|n|>N} c_n\exp\{inx\}\Big| \le \sum_{|n|>N}|c_n| \to 0 \quad \text{as } N \to \infty.$

Since the continuous functions $\sum_{-N}^{N} c_n\exp\{inx\}$ converge uniformly (as $N \to \infty$) to $\sum_{-\infty}^{\infty} c_n\exp\{inx\}$, the latter must be a continuous function, say $h$. Uniform convergence to $h$ also implies convergence in norm to $h$. Since $\sum_{-N}^{N} c_n\exp\{inx\}$ also converges in norm to $f$ (by part (a)), $f(x) = h(x)$ for all $x$. For if the two continuous functions $f$ and $h$ are not identically equal, then

$\int_{-\pi}^{\pi}|f(x) - h(x)|^2\,dx > 0. \quad ∎$
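Parseval's relation (8.17) can likewise be verified numerically; the fragment below (ours, a sketch) uses a square wave, whose Fourier coefficients decay like $1/|n|$, so the tail of $\sum|c_n|^2$ is visibly small.

    import numpy as np

    x = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
    f = np.where(x > 0, 1.0, -1.0)                  # square wave in L^2[-pi, pi]
    c = np.array([np.mean(f * np.exp(-1j * n * x)) for n in range(-2000, 2001)])
    print(np.mean(np.abs(f) ** 2))                  # ||f||^2 per (8.10): equals 1
    print(np.sum(np.abs(c) ** 2))                   # partial sum of (8.17), near 1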

For a finite measure (or a finite signed measure) $\mu$ on the circle $[-\pi, \pi)$ (identifying $-\pi$ and $\pi$), the $n$th Fourier coefficient of $\mu$ is defined by

$c_n = \int_{[-\pi,\pi)}\exp\{-inx\}\,\mu(dx) \quad (n = 0, \pm 1, \ldots).$  (8.23)

If $\mu$ has a density $f$ with respect to the normalized Lebesgue measure $dx/2\pi$, then (8.23) is the same as the $n$th Fourier coefficient of $f$ given by (8.1).
Proposition 8.3. A finite measure on the circle is determined by its Fourier coefficients.

Proof. Approximate the measure $\mu(dx)$ by $g_N(x)\,dx$, where

$g_N(x) := \int_{[-\pi,\pi)} k_N(x-y)\,\mu(dy) = \dfrac{1}{2\pi}\sum_{n=-N}^{N}\Big(1 - \dfrac{|n|}{N+1}\Big)c_n\exp\{inx\},$  (8.24)

with $c_n$ defined by (8.23). For every continuous periodic function $h$ (i.e., for every continuous function on the circle),

$\int_{[-\pi,\pi)} h(x)g_N(x)\,dx = \int_{[-\pi,\pi)}\Big(\int_{[-\pi,\pi)} h(x)k_N(x-y)\,dx\Big)\mu(dy).$  (8.25)

As $N \to \infty$, the probability measure $k_N(x-y)\,dx = k_N(y-x)\,dx$ on the circle converges weakly to $\delta_y(dx)$. Hence, the inner integral on the right side of (8.25) converges to $h(y)$. Since the inner integral is bounded by $\sup\{|h(y)|: y \in \mathbb{R}^1\}$, Lebesgue's Dominated Convergence Theorem implies that

$\lim_{N\to\infty}\int_{[-\pi,\pi)} h(x)g_N(x)\,dx = \int_{[-\pi,\pi)} h(y)\,\mu(dy).$  (8.26)

This means that $\mu$ is determined by $\{g_N: N \ge 1\}$. The latter in turn are determined by $\{c_n\}$. ∎
We are now ready to answer an important question: when is a given sequence $\{c_n: n = 0, \pm 1, \ldots\}$ the sequence of Fourier coefficients of a finite measure on the circle? A sequence of complex numbers $\{c_n: n = 0, \pm 1, \pm 2, \ldots\}$ is said to be positive definite if for any finite sequence of complex numbers $\{z_j: 1 \le j \le N\}$ one has

$\sum_{1\le j,k\le N} c_{j-k}\,z_j\bar z_k \ge 0.$  (8.27)

Theorem 8.4. (Herglotz's Theorem). $\{c_n: n = 0, \pm 1, \ldots\}$ is the sequence of Fourier coefficients of a probability measure on the circle if and only if it is positive definite and $c_0 = 1$.

Proof. (Necessity). If $\mu$ is a probability measure on the circle, and $\{z_j: 1 \le j \le N\}$ a given finite sequence of complex numbers, then

$\sum_{1\le j,k\le N} c_{j-k}\,z_j\bar z_k = \sum_{1\le j,k\le N} z_j\bar z_k\int_{[-\pi,\pi)}\exp\{-i(j-k)x\}\,\mu(dx)$
$= \int_{[-\pi,\pi)}\Big(\sum_{j=1}^{N} z_j\exp\{-ijx\}\Big)\overline{\Big(\sum_{k=1}^{N} z_k\exp\{-ikx\}\Big)}\,\mu(dx) = \int_{[-\pi,\pi)}\Big|\sum_{j=1}^{N} z_j\exp\{-ijx\}\Big|^2\mu(dx) \ge 0.$  (8.28)

Also,

$c_0 = \int_{[-\pi,\pi)}\mu(dx) = 1.$

(Sufficiency). Take $z_j = \exp\{i(j-1)x\}$, $j = 1, 2, \ldots, N+1$, in (8.27) to get

$g_N(x) := \dfrac{1}{N+1}\sum_{0\le j,k\le N} c_{j-k}\exp\{i(j-k)x\} \ge 0.$  (8.29)

Again, as there are $N + 1 - |n|$ pairs $(j, k)$ such that $j - k = n$ $(-N \le n \le N)$, (8.29) becomes

$0 \le g_N(x) = \sum_{n=-N}^{N}\Big(1 - \dfrac{|n|}{N+1}\Big)c_n\exp\{inx\}.$  (8.30)

In particular,

$\dfrac{1}{2\pi}\int_{[-\pi,\pi)} g_N(x)\,dx = c_0 = 1.$  (8.31)

Hence $(1/2\pi)g_N$ is a p.d.f. on $[-\pi, \pi]$. By Theorem 5.2, there exists a subsequence $\{g_{N'}\}$ such that $(1/2\pi)g_{N'}(x)\,dx$ converges weakly to a probability measure $\mu(dx)$ as $N' \to \infty$. Also, integration yields

$\dfrac{1}{2\pi}\int_{[-\pi,\pi)}\exp\{-inx\}g_N(x)\,dx = \Big(1 - \dfrac{|n|}{N+1}\Big)c_n \quad (n = 0, \pm 1, \ldots, \pm N).$  (8.32)

For each fixed $n$, take $N = N'$ in (8.32) and let $N' \to \infty$. Then

$c_n = \lim_{N'\to\infty}\Big(1 - \dfrac{|n|}{N'+1}\Big)c_n = \int_{[-\pi,\pi]}\exp\{-inx\}\,\mu(dx) \quad (n = 0, \pm 1, \ldots).$  (8.33)

In other words, $c_n$ is the $n$th Fourier coefficient of $\mu$. If $\mu(\{\pi\}) > 0$, one may change $\mu$ to $\mu'$, where $\mu'(\{-\pi\}) = \mu(\{-\pi\}) + \mu(\{\pi\})$ and $\mu' = \mu$ on $(-\pi, \pi)$, to get a probability measure $\mu'$ on the circle whose Fourier coefficients are $c_n$. Note that (8.33) holds with $\mu$ replaced by $\mu'$, because $\exp\{in\pi\} = \exp\{-in\pi\}$. ∎

Corollary 8.5. A sequence $\{c_n\}$ of complex numbers is the sequence of Fourier coefficients of a finite measure on the circle $[-\pi, \pi)$ if and only if $\{c_n\}$ is positive definite.

Proof. Since the measure $\mu = 0$ has Fourier coefficients $c_n = 0$ for all $n$, and the latter trivially comprise a positive definite sequence, it is enough to prove the correspondence between nonzero positive definite sequences and nonzero finite measures. It follows from Theorem 8.4, by normalization, that this correspondence is one-to-one between positive definite sequences $\{c_n\}$ with $c_0 = c > 0$ and measures on the circle having total mass $c$. ∎
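The positive definiteness criterion (8.27) amounts to the positive semidefiniteness of the Toeplitz matrices $[c_{j-k}]$. The following check (our sketch; the discrete measure is an arbitrary hypothetical example) computes $c_n$ per (8.23) for a probability measure on the circle and confirms (8.27) and $c_0 = 1$ numerically.

    import numpy as np

    atoms = np.array([-2.0, 0.3, 1.1])     # support points of mu in [-pi, pi)
    probs = np.array([0.2, 0.5, 0.3])      # their probabilities
    c = lambda n: np.sum(probs * np.exp(-1j * n * atoms))   # (8.23)
    N = 8
    T = np.array([[c(j - k) for k in range(N)] for j in range(N)])
    print(np.min(np.linalg.eigvalsh(T)))   # >= 0 up to rounding: (8.27) holds
    print(c(0))                            # c_0 = 1, as in Theorem 8.4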

The Fourier transform of an integrable (real- or complex-valued) function $f$ on $(-\infty, \infty)$ is the function $\hat f$ on $(-\infty, \infty)$ defined by

$\hat f(\xi) = \int_{-\infty}^{\infty} e^{i\xi y}f(y)\,dy, \quad -\infty < \xi < \infty.$  (8.34)

As a special case take $f = \mathbf{1}_{(c,d]}$. Then

$\hat f(\xi) = \dfrac{\exp\{i\xi d\} - \exp\{i\xi c\}}{i\xi},$  (8.35)

so that $\hat f(\xi) \to 0$ as $|\xi| \to \infty$. This convergence to zero as $|\xi| \to \infty$ is clearly valid for arbitrary step functions, i.e., finite linear combinations of indicator functions of finite intervals. Now let $f$ be an arbitrary integrable function. Given $\varepsilon > 0$ there exists a step function $f_\varepsilon$ such that

$\|f - f_\varepsilon\|_1 = \int_{-\infty}^{\infty}|f(y) - f_\varepsilon(y)|\,dy < \varepsilon.$  (8.36)

Now it follows from (8.34) that $|\hat f(\xi) - \hat f_\varepsilon(\xi)| \le \|f - f_\varepsilon\|_1$ for all $\xi$. Since $\hat f_\varepsilon(\xi) \to 0$ as $|\xi| \to \infty$, one has $\limsup_{|\xi|\to\infty}|\hat f(\xi)| \le \varepsilon$. Since $\varepsilon > 0$ is arbitrary,

$\hat f(\xi) \to 0 \quad \text{as } |\xi| \to \infty.$  (8.37)

The property (8.37) is generally referred to as the Riemann–Lebesgue Lemma.
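Numerically, (8.35) already displays the decay (8.37); the fragment below (ours) evaluates the transform of $f = \mathbf{1}_{(0,1]}$ at increasing frequencies.

    import numpy as np

    # (8.35) with c = 0, d = 1: the transform of the indicator of (0, 1].
    f_hat = lambda xi: (np.exp(1j * xi) - 1) / (1j * xi)
    for xi in (1.0, 10.0, 100.0, 1000.0):
        print(xi, abs(f_hat(xi)))          # decays like 1/|xi|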


If $f$ is continuously differentiable and $f, f'$ are both integrable, then integration by parts yields

$\widehat{f'}(\xi) = -i\xi\hat f(\xi).$  (8.38)

The boundary terms in deriving (8.38) vanish, for if $f'$ is integrable (as well as $f$) then $f(x) \to 0$ as $|x| \to \infty$. More generally, if $f$ is $r$-times continuously differentiable and $f^{(j)}$, $0 \le j \le r$, are all integrable, then one may iterate the relation (8.38) to get

$\widehat{f^{(r)}}(\xi) = (-i\xi)^r\hat f(\xi).$  (8.39)

In particular, (8.39) implies that if $f, f', f''$ are integrable, then $\hat f$ is integrable.
It is instructive to consider the Fourier transform as a limiting version of a Fourier series. Consider for this purpose an $f$ that is differentiable and vanishes outside a finite interval, and whose derivative $f'$ is square integrable. Then, for all sufficiently large integers $N$, the function

$g_N(x) := f(Nx)$  (8.40)

vanishes outside $(-\pi, \pi)$. Let $\sum c_{n,N}e^{inx}$, $\sum c^{(1)}_{n,N}e^{inx}$ be the Fourier series of $g_N$ and of its derivative $g_N'$, respectively. Then

$c_{n,N} = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} g_N(x)e^{-inx}\,dx = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} f(Nx)e^{-inx}\,dx = \dfrac{1}{2\pi N}\int_{-\infty}^{\infty} f(y)e^{-iny/N}\,dy = \dfrac{1}{2\pi N}\hat f\Big(-\dfrac{n}{N}\Big).$  (8.41)

Now writing $A = (\sum_{n\ne 0} n^{-2})^{1/2}$ and using (8.20) and (8.17), applied to $g_N$ and $g_N'$,

$\sum_n|c_{n,N}| = |c_{0,N}| + \sum_{n\ne 0}\dfrac{1}{|n|}|nc_{n,N}| \le |c_{0,N}| + A\Big(\sum_{n\ne 0}|c^{(1)}_{n,N}|^2\Big)^{1/2} = |c_{0,N}| + A\Big(\dfrac{1}{2\pi}\int_{-\pi}^{\pi}|g_N'(x)|^2\,dx\Big)^{1/2} < \infty.$

Therefore, for all sufficiently large $N$, the Fourier series of $g_N$ converges uniformly (Theorem 8.2(b)), and

$f(z) = g_N\Big(\dfrac{z}{N}\Big) = \sum_{n=-\infty}^{\infty} c_{n,N}e^{inz/N} = \sum_{n=-\infty}^{\infty}\dfrac{1}{2\pi N}\hat f\Big(-\dfrac{n}{N}\Big)e^{inz/N}.$  (8.42)

Letting $N \to \infty$ in (8.42), if $\hat f \in L^1(\mathbb{R}^1, dx)$, one gets (the right side of (8.42) being a Riemann sum) the Fourier inversion formula

$f(z) = \dfrac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\xi)e^{-iz\xi}\,d\xi.$  (8.43)

One may show that this formula holds for all $f$ such that both $f$ and $\hat f$ are integrable. Next, any $f$ that vanishes outside a finite interval and is square integrable is automatically integrable, and for such an $f$ one has, for all sufficiently large $N$,

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}|g_N(x)|^2\,dx = \dfrac{1}{2\pi N}\int_{-\infty}^{\infty}|f(y)|^2\,dy,$

while, by (8.17) and (8.41),

$\dfrac{1}{2\pi}\int_{-\pi}^{\pi}|g_N(x)|^2\,dx = \sum_n|c_{n,N}|^2 = \dfrac{1}{4\pi^2N^2}\sum_n\Big|\hat f\Big(\dfrac{n}{N}\Big)\Big|^2,$

so that

$\dfrac{1}{N}\sum_{n=-\infty}^{\infty}\Big|\hat f\Big(\dfrac{n}{N}\Big)\Big|^2 = 2\pi\int_{-\infty}^{\infty}|f(y)|^2\,dy.$  (8.44)

Therefore, letting $N \uparrow \infty$ (the left side of (8.44) being a Riemann sum), one has the Plancherel identity

$\int_{-\infty}^{\infty}|\hat f(\xi)|^2\,d\xi = 2\pi\int_{-\infty}^{\infty}|f(y)|^2\,dy.$  (8.45)

Now every square integrable $f$ on $(-\infty, \infty)$ may be approximated in norm arbitrarily closely by a continuous function that vanishes outside a finite interval. Since (8.45) holds for the latter, it also holds for all $f$ that are integrable as well as square integrable. We have, therefore, the following.

Theorem 8.6
(a) If $f$ and $\hat f$ are both integrable, then the Fourier inversion formula (8.43) holds.
(b) If $f$ is integrable as well as square integrable, then the Plancherel identity (8.45) holds.

Next define the Fourier transform of a finite measure $\mu$ by setting

$\hat\mu(\xi) = \int_{-\infty}^{\infty} e^{i\xi x}\,\mu(dx).$  (8.46)

If $\mu$ is a finite signed measure, i.e., $\mu = \mu_1 - \mu_2$ where $\mu_1, \mu_2$ are finite measures, then also one defines $\hat\mu$ by (8.46) directly, or by setting $\hat\mu = \hat\mu_1 - \hat\mu_2$. In particular, if $\mu(dx) = f(x)\,dx$, where $f$ is real-valued and integrable, then $\hat\mu = \hat f$. If $\mu$ is a probability measure, then $\hat\mu$ is also called the characteristic function of $\mu$ (or of any random variable whose distribution is $\mu$).
We next consider the convolution of two integrable functions $f, g$:

$f * g(x) = \int_{-\infty}^{\infty} f(x - y)g(y)\,dy \quad (-\infty < x < \infty).$  (8.47)
Since

$\int_{-\infty}^{\infty}|f*g(x)|\,dx \le \int\!\!\int|f(x-y)||g(y)|\,dy\,dx = \Big(\int_{-\infty}^{\infty}|f(x)|\,dx\Big)\Big(\int_{-\infty}^{\infty}|g(y)|\,dy\Big),$  (8.48)

$f*g$ is integrable. Its Fourier transform is

$(f*g)^\wedge(\xi) = \int_{-\infty}^{\infty} e^{i\xi x}\Big(\int_{-\infty}^{\infty} f(x-y)g(y)\,dy\Big)dx = \int\!\!\int e^{i\xi(x-y)}e^{i\xi y}f(x-y)g(y)\,dy\,dx$
$= \int\!\!\int e^{i\xi z}e^{i\xi y}f(z)g(y)\,dy\,dz = \hat f(\xi)\hat g(\xi),$  (8.49)

a result of importance in probability and analysis. By iteration, one defines the $n$-fold convolution $f_1 * \cdots * f_n$ of $n$ integrable functions $f_1, \ldots, f_n$, and it follows from (8.49) that $(f_1 * \cdots * f_n)^\wedge = \hat f_1\hat f_2\cdots\hat f_n$. Note also that if $f, g$ are real-valued integrable functions and one defines the measures $\mu, \nu$ by $\mu(dx) = f(x)\,dx$, $\nu(dx) = g(x)\,dx$, and $\mu*\nu$ by $(f*g)(x)\,dx$, then

$(\mu*\nu)(B) = \int_B (f*g)(x)\,dx = \int_{-\infty}^{\infty}\Big(\int_B f(x-y)\,dx\Big)g(y)\,dy = \int_{-\infty}^{\infty}\mu(B-y)g(y)\,dy = \int_{-\infty}^{\infty}\mu(B-y)\,d\nu(y),$  (8.50)

for every interval (or, more generally, for every Borel set) $B$. Here $B - y$ is the translate of $B$ by $-y$, obtained by subtracting from each point of $B$ the number $y$. Also, $(\mu*\nu)^\wedge = (f*g)^\wedge = \hat f\hat g = \hat\mu\hat\nu$. In general (i.e., whether or not $\mu$ and/or $\nu$ have densities), the last expression in (8.50) defines the convolution $\mu*\nu$ of finite signed measures $\mu$ and $\nu$. The Fourier transform of this finite signed measure is still given by $(\mu*\nu)^\wedge = \hat\mu\hat\nu$.
Recall that if $X_1, X_2$ are independent random variables on some probability space $(\Omega, \mathcal{A}, P)$ having distributions $Q_1, Q_2$, respectively, then the distribution of $X_1 + X_2$ is $Q_1 * Q_2$, whose characteristic function (i.e., Fourier transform) may also be computed from

$(Q_1 * Q_2)^\wedge(\xi) = Ee^{i\xi(X_1+X_2)} = Ee^{i\xi X_1}Ee^{i\xi X_2} = \hat Q_1(\xi)\hat Q_2(\xi).$  (8.51)

This argument extends to finite signed measures, and is an alternative way of thinking about (or deriving) the result $(\mu*\nu)^\wedge = \hat\mu\hat\nu$.
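The identity (8.51) is easy to test by simulation; the sketch below (ours, with an arbitrary pair of laws) compares a Monte Carlo estimate of the characteristic function of $X_1 + X_2$ with the product of the two characteristic functions.

    import numpy as np

    rng = np.random.default_rng(8)
    X1 = rng.exponential(size=500_000)     # X1, X2 independent by construction
    X2 = rng.normal(size=500_000)
    phi = lambda Z, xi: np.mean(np.exp(1j * xi * Z))
    for xi in (0.5, 1.0, 2.0):
        print(xi, phi(X1 + X2, xi), phi(X1, xi) * phi(X2, xi))   # (8.51)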
Theorem 8.4 and Corollary 8.5 may be extended to a correspondence between finite (probability) measures on $\mathbb{R}^1$ and positive definite functions on $\mathbb{R}^1$, defined as follows. A complex-valued function $\varphi$ on $\mathbb{R}^1$ is said to be positive definite if for every positive integer $N$ and finite sequences $\{\xi_1, \ldots, \xi_N\} \subset \mathbb{R}^1$ and $\{z_1, z_2, \ldots, z_N\} \subset \mathbb{C}$ (the set of complex numbers), one has

$\sum_{1\le j,k\le N} z_j\bar z_k\,\varphi(\xi_j - \xi_k) \ge 0.$  (8.52)

Theorem 8.7. (Bochner's Theorem). A function $\varphi$ on $\mathbb{R}^1$ is the Fourier transform of a finite measure on $\mathbb{R}^1$ if and only if it is positive definite and continuous.

The proof of necessity is entirely analogous to (8.28). The proof of sufficiency may be viewed as a limiting version of that for Fourier series, as the period increases to infinity. We omit this proof.
The above results and notions may be extended to higher dimensions $\mathbb{R}^k$. This extension is straightforward. The Fourier series of a square integrable function $f$ on $[-\pi, \pi) \times [-\pi, \pi) \times \cdots \times [-\pi, \pi) = [-\pi, \pi)^k$ is defined by $\sum_\nu c_\nu\exp\{i\nu\cdot x\}$, where the summation is over all integral vectors (or multi-indices) $\nu = (\nu^{(1)}, \nu^{(2)}, \ldots, \nu^{(k)})$, each $\nu^{(j)}$ being an integer. Also $\nu\cdot x = \nu^{(1)}x^{(1)} + \cdots + \nu^{(k)}x^{(k)}$ is the usual euclidean inner product on $\mathbb{R}^k$ between the two vectors $\nu = (\nu^{(1)}, \ldots, \nu^{(k)})$ and $x = (x^{(1)}, x^{(2)}, \ldots, x^{(k)})$. The Fourier coefficients are given by

$c_\nu = \dfrac{1}{(2\pi)^k}\int_{[-\pi,\pi)^k} f(x)e^{-i\nu\cdot x}\,dx.$  (8.53)

The extensions of Theorems 8.1–8.4 are fairly obvious. Similarly, the Fourier transform of a function $f$ integrable with respect to Lebesgue measure on $\mathbb{R}^k$ is defined by

$\hat f(\xi) = \int\cdots\int e^{i\xi\cdot y}f(y)\,dy \quad (\xi \in \mathbb{R}^k),$  (8.54)

and the Fourier inversion formula becomes

$f(z) = \dfrac{1}{(2\pi)^k}\int\cdots\int\hat f(\xi)e^{-iz\cdot\xi}\,d\xi,$  (8.55)

which holds when $f(x)$ and $\hat f(\xi)$ are integrable. The Plancherel identity (8.45) becomes

$\int\cdots\int|\hat f(\xi)|^2\,d\xi = (2\pi)^k\int\cdots\int|f(y)|^2\,dy,$  (8.56)

which holds whenever $f$ is integrable and square integrable. Theorem 8.6 now extends in an obvious manner. The definitions of the Fourier transform and convolution of finite signed measures on $\mathbb{R}^k$ are as in (8.46) and (8.50), with integrals over $(-\infty, \infty)$ replaced by integrals over $\mathbb{R}^k$. The proof of the property $(\mu_1 * \mu_2)^\wedge = \hat\mu_1\hat\mu_2$ is unchanged.
Our final result says that the correspondence $P \to \hat P$, of the set of probability measures onto the set of characteristic functions, is continuous.

Theorem 8.8. (Cramér–Lévy Continuity Theorem). Let $P_n$ $(n \ge 1)$, $P$ be probability measures on $(\mathbb{R}^k, \mathcal{B}^k)$.
(a) If $P_n$ converges weakly to $P$, then $\hat P_n(\xi)$ converges to $\hat P(\xi)$ for every $\xi \in \mathbb{R}^k$.
(b) If $\hat P_n(\xi) \to \hat P(\xi)$ for every $\xi$, then $P_n$ converges weakly to $P$.

Proof. (a) Since $\hat P_n(\xi)$, $\hat P(\xi)$ are the integrals of the bounded continuous function $\exp\{i\xi\cdot x\}$ with respect to $P_n$ and $P$, it follows from the definition of weak convergence that $\hat P_n(\xi) \to \hat P(\xi)$.
(b) Let $f$ be an infinitely differentiable function on $\mathbb{R}^k$ having compact support. Define $\tilde f(x) = f(-x)$. Then $\tilde f$ is also infinitely differentiable and has compact support. Now,

$\int\tilde f(x)\,P_n(dx) = (f * P_n)(0), \qquad \int\tilde f(x)\,P(dx) = (f * P)(0).$  (8.57)

As $\hat f$ is integrable, $(f * P_n)^\wedge = \hat f\hat P_n$ and $(f * P)^\wedge = \hat f\hat P$ are integrable. By Fourier inversion and Lebesgue's Dominated Convergence Theorem,

$(f * P_n)(0) = \dfrac{1}{(2\pi)^k}\int_{\mathbb{R}^k}\hat f(\xi)\hat P_n(\xi)\,d\xi \to (f * P)(0).$  (8.58)

Hence, $\int\tilde f\,dP_n \to \int\tilde f\,dP$ for all such $f$. The proof is complete by Theorem 5.1. ∎
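As an illustration of Theorem 8.8 (our sketch), one may watch the characteristic functions of Binomial$(n, 2/n)$ converge pointwise to that of the Poisson law with mean 2, in accordance with the classical law of rare events.

    import numpy as np

    phi_binom = lambda xi, n, p: (1 - p + p * np.exp(1j * xi)) ** n
    phi_poisson = lambda xi, lam: np.exp(lam * (np.exp(1j * xi) - 1))
    xi = 1.3                                  # an arbitrary frequency
    for n in (10, 100, 1000):
        print(n, phi_binom(xi, n, 2.0 / n), phi_poisson(xi, 2.0))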


Author Index

Aldous, D., 194 Feder. J., 356


Aris, R., 518 Feller, W., 67, 107, 223, 362, 498, 499. 616. 652
Athreya, K., 223 Flajolet, P., 362
Fleming, W. H., 562
Bahadur, R. R.. 518 Fomin, S. V., 97
Basak, G., 623 Freedman, D. A., 228, 499
Belkin, B., 506 Freidlin, M., 517
Bellman. R., 561 Friedman, A., 502, 516
Bhattacharya, R. N., 106, 228.513, 515. 518, Fuchs, W. H. J., 92
561, 623
Billingsley. P., 79, 93, 94, 96. 104, 511, 637, Garcia, A. M., 224
639, 652 Ghosh. J. K., 503, 518
Bjerknes, R., 366 Gilbarg, D., 516, 517, 622
Blackwell. D., 561 Girsanov, I. V., 618
Blumenthal, R. M., 228 Gordin, M. I., 513
Brown, B. M., 508 Griffeath. D.. 366
Brown, T. C., 356 Gupta, V. K., 106, 362, 518

Cameron. R. H., 618 Hall, P. G., 511


Chan. K. S., 228 Hall, W. J., 503
Chandrasekhar, S., 259 Harris, T. E., 363
Chow, Y. S., 562 Has'minskii, R. Z., 616, 623
Chung, K. L., 92, 105, 361 Heyde, C. C., 511
Clifford, P., 366 Holley, R., 366
Corson, H. K.. 228 Howard, R., 561

Diaconis, P., 194 Ibragimov, A., 511


Dobrushin, R. L., 366 Iglehart, D. L., 105, 506
Donsker, M. D., 103 Ikeda, N., 623
Doob, J. L., 216 Itô, K., 105, 356, 497, 501, 503, 504, 506
Dorfman, R., 64
Dubins. L. E., 228 Jain, N., 229
Durrett, R. T., 105. 362. 366. 506 Jamison, B., 229
Dynkin, E. B., 501, 516
Kac, M., 259
Eberlein, E., 356 Kaigh, W. D., 105
Etemadi, N., 647 Karatzas, I., 623
Ethier, S. N., 501 Karlin, S., 504


Kennedy, D. P., 105 Riesz, F., 505


Kesten, H., 190, 227, 362 Rishel, R. W., 562
Kieburtz, R. B., 217 Robbins, H., 562
Kindermann, R.. 366 Rogers. L. C. G., 105
Kolmogorov, A. N., 97 Ross, S. M., 561
Knuth, D., 193 Rota, G. C., 343
Kurtz, T. G., 501 Royden, H. L.. 93, 228, 230, 501,
613
Lamperti, J., 207.356
Lawrance, V. B., 217 Scarf, H., 562
Lee, O., 228, 513 Sen, K., 518
Levy, P., 105 Shannon, C. E., 231
Lifsic, B. A.. 513 Shreve. S. E., 623
Liggett, T. M., 366 Siegmund, D.. 562
Lindvall, T., 223 Skorokhod, A. V., 356
Lotka, A. J., 207 Snell. J. L.. 366
Spitzer, F.. 68. 73, 190, 227, 343, 366
McDonald, D., 223 Stroock, D. W., 502. 505, 517, 621
McKean, H. P. Jr., 105, 497, 501, 503, 504, Sudbury, A., 366
506
Maitra. A., 560, 561 Taqqu, M., 356
Majumdar. M., 228, 561, 562 Taylor, G. 1., 518
Mandelbrot, B. B., 107, 356 Taylor, H. M.. 504
Mandl, P., 500, 501, 504, 505 Thorisson. H., 223
Martin, W. T.. 618 Tidmore, F. E., 65
Merton, R. C., 562 Tong, H., 228
Mesa, 0., 362 Trudinger, N. S., 516. 517, 622
Miller, D. R., 105, 506 Turner, D. W., 65
Miller. L. H., 104 Tweedie, R. L.. 229
Mina, K. V., 217
Mirman, L. J., 228 Van Ness, J. W.. 107
Moran, P. A. P., 107 Varadhan, S. R. S., 502, 505, 517.621

Nelson, E., 93 Watanabe, S., 623


Ney, P., 223 Waymire, E., 106, 259, 362, 356, 518
Weaver, W., 231
Odlyzko. A. M., 362 Weisshaar, A., 216
Orey, S., 229 Wijsman, R. A.. 503
Ossiander, M., 356 Williams, D., 105
Williams, T., 366
Parthasarathy, K. R., 230
Yahav, J. A., 228
Rachev, S. T., 482 Young. D. M., 65
Radner. R.. 228, 562 Yosida, K., 500
Ramasubramanian, S., 516 Yushkevich, A. A., 516
Subject Index

Absolutely continuous, 639 Birkhoffs ergodic theorem, 224


Absorbing boundary: Birth-collapse model, 203
birth-death chain, 237, 309 Blackwell's renewal theorem, 220
branching, 158, 160, 321 Bochner's theorem, 663
Brownian motion, 412, 414, 451, 454 Borel-Cantelli lemma, 647
diffusion, 402-407, 448-460 Borel sigmafield:
Markov chain, 151, 318 of C[0,∞), C[0,1], 23, 93
random walk, 114, 122 of metric space, 627
Accessible boundary, 393, 499, 615. See also of R^1, R^k, 626
Explosion Boundary conditions, see also Absorbing
Actuary science: boundary, Reflecting boundary
collective risk, 78 backward, 396, 401.402, 452
contagion, accident proneness, 116, 273 forward,398
After-τ process, 118, 425 nonzero values, 485, 489
After-τ sigmafield, 425 time dependent, 485
Alexandrov theorem, 96 time dependent, 485
Annihilating random walk. 330 Branching process:
Arcsine law, 34, 35, 84 Bienayme-Galton-Watson branching,
AR(p), ARMA (p,q), 170, 172 115, 158.281
Artificial intelligence, 252 extinction probability, 160.283
Arzela-Ascoli theorem, 97 Lamperti's maximal branching, 207
Asymptotic equality (-), 55 time to extinction, 285
Autoregressive model, 166. 170 Brownian bridge. 36, 103
Brownian excursion, 85. 104, 288, 362, 505
Backward equation. See also Boundary Brownian meander, 85, 104, 505
conditions Brownian motion. 18, 350, 390,441
branching, 283.322 constructions, 93, 99, 105
Brownian motion, 78, 410. 412, 452, 467 max/min, 79.414.431, 486, 487, 490
diffusion, 374. 395, 442, 462, 463, 464 scaling property, 76
Markov chain, 266, 335 standard, 18.441
Backward induction, 523 tied-down, see Brownian bridge
Backward recursion, 522. 543
Bellman equation, see Dynamic Cameron-Martin-Girsanov theorem, 616,
programming equation 618
Bernoulli-Laplace model, 233, 240 Canonical construction, 15. 166
Bertrand's ballot problem, 70 Canonical sample space, 2
Bessel process, see Brownian motion Cantor set. 64
Biomolecular fixation, 319, 321 Cantor ternary function, 64


Capital accumulation model, 178 Coupon collector problem, 194


Caratheodory extension theorem, 626 Cox process, 333
Card shuffles, 191, 194 Cramer-Levy continuity theorem, 664
Cauchy-Schwarz inequality, 630
Central limit theorem (CLT): Detailed balance principle, see
classical, 652 Time-reversible Markov process
continuous parameter Markov process, Diffusion. 368. See also Boundary conditions
307, 513 multidimensional, 442
diffusion, 438-441 nonhomogeneous, 370, 610
discrete parameter Markov process, 150, singular, 591
227, 513 Diffusion coefficient, 18, 19, 442
functional (FCLT), 23, 99 Dirichlet boundary condition. 404
Liapounov, 652 Dirichlet problem, 454. 495. 606
Lindeberg, 649 Dispersion matrix, see Diffusion coefficient
Martingale, 503-511 Distribution function (c.d.f. or d.f), 627
multivariate classical, 653 Doeblin condition, 215
Chapman-Kolmogorov equations, see Doeblin coupling, see Coupling
Semigroup property Dominated convergence theorem. 633
Characteristic function, 662 Domination inequality. 628
Chebyshev inequality, 630 Donsker's invariance principle, see Central
Chemical reaction kinetics, 310 limit theorem
Choquet-Deny theorem, 366 Doob's maximal inequality. 49
Chung–Fuchs recurrence criterion, 92 Doob–Meyer decomposition, 89
Circular Brownian motion, 400 Dorfman's blood test problem. 64
Collocation polynomial, 206 Doubly stochastic, 199
Communicate, 121, 123, 303 Doubly stochastic Poisson, see Cox process
Completely deterministic motion, 113 Drift coefficient. 18. 19.442
Completely random motion, 113 Duhamel principle, 479,485
Compound Poisson process, 262, 267, 292 Dynamic inventory problem. 551
Compression coefficient, 185 Dynamic programming equation. 527, 528
Conditional expectation, 639 Dynamic programming principle, 526
Conditional probability, 639 discounted. 527
Co-normal reflection, 466 Dynkin's Pi-Lambda theorem, 637
Conservative process:
diffusion, see Explosion Eigenfunction expansion. 409, 505
Markov chain, 289 Empirical process, 36, 38
Continuity equation, 381 Entrance boundary, 503
Continuous in probability, see Stochastic Entropy. 186, 213.348
continuity Ergodic decomposition. 230
Control: Ergodic maximal inequality, 225
diffusion, 533-542 Ergodic theorem, see Birkhof s ergodic
-

feasible, 533 theorem


feedback, 533 Escape probability:
optimal, 533 birth-death, 235
Convergence: Brownian motion, 25.27
almost everywhere (a.e.), 631 diffusion. 377.428
almost surely (a.s.). 631 random walk, 6
in distribution (in law), see Weak Essential. 121, 303
convergence Event, 625
in measure, 631 finite dimensional, 2.62
Convex function, 159, 628 infinite dimensional. 2. 3, 23-24, 62
Convolution, 115,263.291.662.663 Exchangeable process, 116
Correlation length, see Relaxation time Exit boundary. 504
Coupling. 197, 221, 351, 364, 506 Expected value, 628


Explosion: Hermite polynomials, 487


diffusion. 612 Hewitt-Savage zero-one law, 91
Feller's test, 614 Hille -Yosida theorem. 500
Has'minskii's test. 615 Hitting distributions, 454
Markov chain, 288 Holder inequality, 630
pure birth, 289 Horizon, finite or infinite, 519
Exponential martingale, 88 Hurst exponent, 56

Factorial moments, 156 i.i.d., 3


Fatou's lemma, 633 Inaccessible, 393. 422. 503
Feller property, 332. 360. 505, 591 Independent:
Feynman-Ka c formula. 478, 606 events, 638
Field. 626 random variables, 638
First passage time: sigmafields, 638
Brownian motion, 32, 429, 430 Independent increments, 18.262, 349, 429
diffusion, 491 Inessential, 121.303
distribution, 430 Infinitely divisible distribution, 349
random walk, 11 Infinitesimal generator, 372
First passage time process, 429 compound Poisson. 334
Flux, 381 diffusion, 373, 500
Fokker- Planck equation. see Forward Markov chain. 266
equation, 78 multidimensional diffusion, 443, 604
Forward equation, see also Boundary radial Brownian motion. 446
conditions Initial value problem, 477, 604
branching. 285 with boundary conditions, 467, 485. 500
Brownian motion, 78. 380, 390 Invariance principle. 24. See also Central
diffusion. 379, 405.442 limit theorem
Markov chain. 266, 335 Invariant distribution:
Fourier coefficients, 653. 658. 663 birth-death chain, 238
Fourier inversion formula. 661, 664 diffusion, 432, 433, 483, 593, 621, 623
Fourier series, 654 Markov chain. 129, 307, 308
Fourier transform. 660. 662. 664 Invariant measure. 204
Fractional Brownian motion, 356 Inventory control problem, 557
Fubini's theorem. 637 Irreducible transition probability, 123
Functional central limit theorem (FCLT). Irreversible process, 248
see Central limit theorem Ising model, 203
Iterated averaging. 204
Gambler's ruin, 67. 70, 208 Iterated random maps. 174, 228
Gaussian pair correlation formula, 77 Itô integral, 564, 570, 577, 607
Gaussian process, 16 Itô's lemma, 585, 611
Geometric Brownian motion, 384 Itô's stochastic integral equation, 571, 575
Geometric Ornstein–Uhlenbeck, 384 multidimensional, 580
Gibbs distribution, 162
Glivenko-Cantelli lemma. 39, 86 Jacobi identity, 487
Gnedenko-Koroljuk formula. 87 Jensen's inequality. 629
Gradient (grad), 464. 466 conditional, 642
Green's function, 501
K-convex. 552
Harmonic function, 68, 329, 364, 366, 495 Killing, 477
Harmonic measure, 454. See also Hitting Kirchhoff circuit laws, 344, 347
distributions Kolmogorov's backward equation, see
Harmonic oscillator, 257 Backward equation
Harnack's inequality, 622 Kolmogorov's consistency theorem, see
Herglotz's theorem, 658 Kolmogorov's existence theorem


Kolmogorov's existence theorem. 16, 73, 92, Maximum principle. 365, 454. 485, 495, 516,
93 621
Kolmogorov's extension theorem, see Maxwell-Boltzmann distribution, 63.393,
Kolmogorov's existence theorem 481
Kolmogorov's forward equation, see Forward Measure, 626
equation Measure determining class, 101
Kolmogorov's inequality, 51 Measure preserving transformation, 227
Kolmogorov-Smirnov statistic, 38, 103 Method of images, 394, 484, 491. See also
Kolmogorov zero-one law, 90 Reflecting boundary, diffusion
Kronecker's delta, 265 Monotone convergence theorem, 632
Minimal process:
Lp convergence criterion, 634, 635 diffusion. 613
LP maximal inequality, 602, 603 Markov chain, 288
Lp space, 633 Minkowski inequality, 630
Lagrange's method of characteristics, 312 Mutually singular, 63
Laplacian, 454
Law of diminishing returns, 178 Natural scale:
Law of large numbers: birth-death chain, 249
classical, 646 diffusion, 416, 479
diffusion, 432 Neumann boundary condition, 404,408
Markov chain, 145, 307 Negative definite, 607
Law of proportionate effect, 78, 480 Newton's law in Hamiltonian form, 257
Learning model, see Artificial intelligence Newton's law of cooling, 247
Lebesgue dominated convergence theorem, Noiseless coding theorem. 213
see Dominated convergence theorem Nonanticipative functional. 564, 567. 577. 579
Lebesgue integral, 627 Nonhomogeneous:
Lebesgue measure, 626 diffusion, 370, 610
Levy-Khinchine representation, 350, 354 Markov chain, 263, 335, 336
Levy process. 350 Markov process, 506
Levy stable process. 355 Nonnegative definite. 17, 74, 658
Liapounov's inequality, 629 Normal reflection. 460
Linear space, normed, 633 Null recurrent:
Liouville theorem, 258 diffusion. 423
Lipschitzian, local, global, 612 Markov chain, 144, 306

m-step transition probability, 113 Occupation time, 35


Main channel length, 285 Optional stopping. 52, 544
Mann-Wald theorem, 102 Orbit, 502
Markov branching, see Branching process, Ornstein-Uhlenbeck process, 369, 391. 476.
Bienayme-Galton-Watson branching 580
Markov chain, 17, 109, 110, 111, 214
Markov property, 109, 110 Past up to time, 118, 424
for function of Markov process, 400, 448, Percolation construction, 325, 363
483, 502, 518 Periodic boundary, 256, 398, 400, 493, 518
Markov random field, 166 Period of state, 123
Markov time, see Stopping time Perron–Frobenius Theorem, 130
Martingale, Martingale difference, 43.45 p- harmonic, 153
Mass conservation, 311, 381 Plancherel identity. 662
Maximal inequality: Poincare recurrence, 248, 258
Doob, 49 Poisson kernel, 454
Kolmogorov. 51 Poisson process, 103, 261, 270, 279, 280, 351
LP, 602, 603 Poisson's equation, 496
Maximal invariant, 502 Policy:
diffusion on torus, 518 c- optimal, 526
radial diffusion, 518 feasible. 519


Markov, 521 diffusion, 419, 458


optimal, 520 Markov chain, 305
semi-Markov, 561 random walk, 8, 12
stationary, 527 Reflecting boundary:
(S. ), 550 birth-death chain, 237, 238. 241, 308. 315
Pólya characteristic function criterion, 74 Brownian motion, 410, 411, 412, 460, 467
Pólya recurrence criterion, 12 diffusion, 395-402, 463, 466
Pólya urn scheme, 115, 336 random walk, 114, 122, 243, 246
Positive definite, 17, 74, 356, 658. See also Reflection principle:
Nonnegative definite Brownian motion. 430
Positive recurrent: diffusion, see Method of images
birth-death, 240 random walk, 10. 70
diffusion, 423. 586, 621 Regular conditional probability, 642
Markov chain, 141, 306 Relaxation time, 129, 256, 347
Pre-τ sigmafield, 119, 424 Renewal age process, 342
Price adjustment model, 436 Renewal process, 193, 217
Probability density function (p.d.f.), 626 Renewal theorem, see Blackwell's renewal
Probability generating function (p.g.f.), 159 theorem
Probability mass function (p.m.f.), 626 Residual life time process, 342
Probability measure. 625 Resolvent:
Probability space, 625 equation, 338, 559
Process stopped at t, 424 operator, 501
Product measure space, 636 semigroup, 360
Products of random variables, 68 Reward function, 520
Progressively measurable, 497 discounted, 526
Prohorov theorem. 97 Reward rate, 539
Pseudo random number generator, 193 Riemann-Lebesgue lemma. 660
Pure birth process, 289
Pure death process, 338, 348 Sample path, 2, I11, 354, 358
Scale function. 416.499
Quadratic variation. 76 Scaling. 76. 82. 355
Queue. 193. 295.296. 309. 345 Scheffe's theorem, 635
Schwarz inequality, see Cauchy-Schwarz
r-th order Markov dependence, 189 inequality
r-th passage time, 39. 40. 45 Secretary problem, 545
Radon-Nikodym theorem. 639 Selfsimilar, 356
Random field, I Semigroup property, 264. 282. 372, 504, 604
Random replication model, 155 Semistable, 356
Random rounding model. 137, 216 Separable process, 94
Random variable, 627 Shannon's entropy. see Entropy
Random walk: Shifted process, 403
continuous parameter, 263, 317 Sigmafield, 625, 627
general, 114 Sigmafinite, 626
max/min, 6, 66. 70, 83 Signed measure. 639
multidimensional, II Simple function. 627
simple. 4, 114. 122 Singular diffusion, see Diffusion
Random walk bridge, 85 Size of industry model, 437
Random walk on group, 72, 191. 199 Slowly varying, 361
Random walk in random scenery. 190 Source/sinks. 478
Random walk on tree. 200 Spectral radius, 130, 197
Range of random walk. 67 Spectral representation. See also
Record value, time. 195 Eigenfunction expansion
Recurrence: birth-death chain, 242
birth-death chain. 236 diffusion, 410
Brownian motion, 25.447 Speed measure. 416.499

672 SUBJECT INDEX

Stable process. 355 Time-reversible Markov process, 200, 241,


Stationary process, 129. 223. 346 254.346.478
Statistical hypothesis test, 39 Time series, linear. 168
Stirling coefficient, 157 AR(p). 166, 170
Stirling formula, 11.67 ARMA( p.q). 172
Stochastic continuity, 350. 360 Traffic intensity of queue, 345
Stochastic differential equation, 563. 583. Transience:
592 birth-death, 237
Stochastic matrix, 110 Brownian motion. 27.447
Stopping time, 39, 51, 117, 277, 424 diffusion. 419.458
Strong law of large numbers (S.L.L.N.), see Markov chain. 305
Law of large numbers random walk, 7
Strong Markov property, 118, 277, 426, 491 Transition probability, density, law, 110,
Strong uniform time, 194 123. 368
Submartingale convergence theorem, 358 Tree graph. 200
Sub/super martingale. 88.89
Support theorem, 619 Umbrella problem, 199
Survival probability of economic agent, 182 Uniform asymptotic negligibility. 652
Survival under uncertainty, 535 Uniform integrability, 634
Symmetric dependence, see Exchangeable Uperossing inequality. 357, 358
process
Wald's equation. 42
Tauberian theorem, 361 Weak convergence. 24, 96, 97, 643
Taxicab problem, 558 Weierstrass approximation theorem, 653
Telegrapher equation, 299, 343 Wiener measure, 19, 105
Theta function, see Jacobi identity
Thinned Poisson process, 343 Yule process. 293
Tightness, 23, 646
Time dependent boundary, 485 Zero seeking particles. 348
Errata

p. 12, (5.3): Replace n by k on right side of (5.3).


p. 30, line 8 from top: f_0(t) should be f_{0z}(t).
p. 40, line 15 from bottom: ES_T = ES_0.
p. 53, line 8 from top: T_r should be T^(r).

p. 69, Exc. 1(iv): q - p^{-1} should be (q - p)^{-1}.


p. 70, Exc. 4: In 4(i) replace the equality by the inequality (Lévy's inequality) P(max_{n≤N} S_n > y) ≤ 2P(S_N > y). Delete 4().
p. 71, Exc. 10(i): P(T_y = N) ∼ (2/π)^{1/2} |y| N^{-3/2} for y ≠ 0 and N = |y| + 2m, m = 0, 1, ... .
p. 88, Exc. 7: "symmetric" should be "asymmetric".
p. 88, Exc. 12(ii), (iii): Assume EX_n = 0 for all n.
p. 91, proof of Theorem T.1.2: P(A_m ∩ A_n) should be P(A_1 ∩ A_n).
p. 93, line 6 from top: Delete p in i, . Should read p..,, (dxz, dxi ).
p. 123, line 14 from bottom: Insert "> 0" (A = {n ≥ 1: p_ii^(n) > 0}).
p. 127, (6.7): Superscript N + 1 on the left side should be n + 1.
p. 130, line 14 from bottom, should read: spectral radius is the largest magnitude of eigenvalues of A.
p. 135: line below (8.2): "return" times should be "passage" times.
p. 137, line 13 from top: The reference should be to Exercise 6 (not Exercise 7).
p. 137, (8.15): Should be (2p - 1) in place of (1 - 2p).
p. 140, (9.8): First sum should start with m = n + 1.
p. 150, (10.11): Replace X[nt]+1 by f(X[nt]+1).
p. 173, line before (13.35): Replace "p-th row" by "last row".
p. 194, Exc. 4(iv), should read: Z_n = X_n for n < T and Z_n = Y_n for n ≥ T.
p. 197, Exc. 6(c), last line should read: with transition law p and starting at i.
p. 197, Exc. 7(i), should read: the largest magnitude of the eigenvalues of A.
p. 200, Exc. 5, should read: A tree graph on r vertices v_1, ..., v_r is a connected graph that contains no cycles. [That is, ... unique sequence e_1, e_2, ..., e_r of edges e_i = {v_{k_i}, v_{m_i}} such that u ∈ e_1, e_i ∩ e_{i+1} ≠ ∅, i = 1, ..., r - 1.] The degree....


p. 202, Exc. 1(ii): Insert n in hint: P(|Y_n| > nε i.o.).


p. 210, line 7 from bottom: εE|Z_1|^{-1} should be E|Z_1| (delete ε and -1).
p. 225, line 2 from bottom: Replace T.9.1 by T.9.2.
p. 234, 1st line of (1.2), should read: p_{i,i+1} = (w + r - i)(2r)^{-2}.
p. 234, 2nd line of (1.2), should read: i(2r - i)(w - r + i)(w + r - i).

p. 236, (2.16): Last term should be (1 - δ_y - β_y)ρ_{yy} (i.e., missing factor ρ_{yy}).
p. 239, (3.6): Right side of 2nd equation in (3.6) should be π_k in place of j.
p. 240, 6 lines from bottom: Replace first denominator (2 - r + j) in (3.14) by (w - r + j).
p. 256, Exc. 4, should read: α_1 = e^{-λ_1} is the largest nontrivial (i.e., ≠ 1) eigenvalue of p; φ_1 is the eigenvector corresponding to α_1.

p. 262, 2nd line of (1.4), should read: j < i.


p. 278, lines 7, 8 from top: X_{T_1} should be X_{T_2}, and X_{T_n} should be X_{T_{n+1}}.

p. 279, line 4 from top: Insert at the end: on the set {Y_k = i_k : k ≥ 0}.
p. 280, 2nd line of Proposition 5.6: Replace {Yt} by {Xt}.
p. 285, line 4 from bottom: The reference should be to Example 8.3 (not Example 8.2).
p. 290, line 7 from top: A_{n,n+1} should be A_{n+1}.
p. 293, 3rd line of (7.2): Capital T in center of inequality should be lowercase t.

p. 302, line 6 from top: {V_t : 0 ≤ u ≤ t} should be {V_u : 0 ≤ u ≤ t}.
p. 303, line 5 from top: o(T) should be o(1).
p. 307, (8.15): The right side should be divided by λ².
p. 307, 3rd line after (8.15): The right side of the display should be divided by λ.
p. 309, 12 lines from top: The second and third sums in display (8.26) are ∑_{y=N} p_{yv} and ∑_{y=N} π_N^{(0)}(y/N)^N.
p. 309, (8.27): Delete (α + β)/((α + jβ)(α + Nβ)) wherever it appears in display (8.27).

p. 325, (11.2): First sum is over n ∈ Z^d.


p. 326, Figure 11.1: (σ_0(0)) should be (σ(0)).

p. 329, (11.10): Insert d: should be 2dp_{mn} in second line of display (11.10).


p. 333, Exc. 3: p(u) > 0.
p. 334, line 1 from top: Replace "positive" by "nonzero".
p. 357, lines 3, 4 above (T.5.1): T_k = N should be T_{2k} = N, and X_{T_k} = N should be X_{T_{2k}} = N.

p. 357, (T.5.3) and 2nd line from bottom: Replace the upper limit of the sum by [N/2] + 1 (not [N/2]).
p. 363, line 6 from bottom: Replace (S, 𝒯) by (E, 𝒢).
p. 363, line 5 from bottom: Replace ℱ by 𝒢.
p. 364, line 8 from top: Replace ℱ by 𝒢.
p. 364, line 14 from bottom, should read: to γ in the metric ρ as k → ∞, then σ^{(n_k)}(t) = σ_{m(n,t)}^{(n_k)}(0) → σ_{m(n,t)}^{(∞)}(0) = σ^{(∞)}(t) a.s. as k → ∞.

p. 364, line 12 from bottom, display should read: 1 - 2P_{μ_n}(σ^{(n)}(t) = -1) = Eσ^{(n)}(t) → Eσ(t) = 1 - 2P_μ(σ(t) = -1).
p. 373, (2.12): ‖f‖ should be ‖f″‖.
p. 374, line 2 from top: |f(y) - f(x)| should be |f″(y) - f″(x)|.
p. 375, line 3 from bottom: x = X_0 should be x = X_s.
p. 380, (2.44): Delete square brackets on the right-hand side, and insert one parenthesis after -∂/∂x.
p. 400, (6.27): Insert a minus sign in the exponential.
p. 414, (8.38), 1st line: Replace ∂/∂x by d/dx.

p. 439, 2nd line of (13.3): Replace ∫_0^t ∫_0^t by ∫_0^t ∫_0^s in the first term and insert ds′ds at the end of the line.
p. 476, Exc. 4(v): a² = γa_0 should be a² = γ²a_0.
p. 477, Exc. 5(i): Reference to (2.4) in place of (2.2).
p. 485, line 9: Insert ds in the double integral.
p. 487, Exc. 9(ii): The eigenvalues are 0, -1, -2, ... .
p. 505, line 18 from bottom: "nonnegative" should be "nonpositive".
p. 518, line 5 from top: The publisher is Wiley Eastern Division (not Eka Press).
p. 564, line 20: In the definition of P-complete, replace N ∈ ℱ_t by N ∈ ℱ.
p. 565, (1.9): dB_t should be dt.
p. 566, (1.18): Should be P, not P_θ, in (1.18).
p. 569, line 2: Insert (b - a) to read a + n(b - a) < t < a + (n + 1)(b - a).
p. 570, line 16: Insert ( - c) after < on right side of the display.
p. 571, (1.35): Delete first minus sign.
p. 571: Theorem 2.2 at the bottom of the page should be named Theorem 2.1.
p. 576, (2.29): Delete last = o(t).
p. 577, first line of (2.31): dx should be ds.
p. 578, (2.34): Insert ∫_t after 1.
p. 590, line 9 from top: P(|X_t - x| > ε).
p. 597, (4.37): Insert one prime on the right side: aa'.
p. 598, (4.39): Insert one prime on the right side: aa'.
p. 602, Exc. 8(i): In the last inequality of the Hint, transpose E and the first bracket {.
p. 607, line 4: Insert factor e^{y/2} in φ(s, y) = u(t - s, y_1)e^{y/2}.
p. 617, (T.3.30): Insert a′(X_t) after Z″ in the dB_t term on the right side of the last line of (T.3.30) and close parenthesis before dB_t. The last line of (T.3.30) should thus read: Ag(X_t)Z″ dt + Z″(a′(X_t) grad g(X_t) + g(X_t)f(X_t)) dB_t.

p. 618, lines 14, 15 from the bottom, should read: Then P and Q are absolutely continuous with respect to each other on (Ω, ℱ_t), with dQ/dP = Z_t″.
p. 619, (T.3.36): c(s) should be replaced by c(X_s).
p. 620, line 4 from top: Insert (r): Should read (r) :=.
p. 626, line 17 from top: I = [a, b).
p. 640, Theorem 4.4(f): Delete the last "a.s. and".
p. 648, lines 11, 12 from bottom: k = 2α/(α - 1) in line 11, and a should be α in line 12.
p. 651, 4 lines from bottom, should read: where c = 2‖f″‖_∞ E|Z_1|³.
p. 654: Missing factor √(2π) in (8.4) on the right; missing factor 2π on the left in (8.5).
p. 668, line 15 from top: Martingale pages are 507-515 (not 503-511).
p. 670, line 5 from top, should read: Kolmogorov forward equation, 78, 266, 335, 379, 380,
390, 405, 442, 477, 478, 481, 492. See also Forward equation.
p. 670, line 15 from bottom: Ornstein-Uhlenbeck pages are 369, 391, 476, 480, 581, 598 (not 580).
