
Please cite as: J. R. Movellan (2011) Tutorial on Stochastic Differential Equations, MPLab Tutorials Version 06.1.

Tutorial on Stochastic Differential Equations

Javier R. Movellan

Copyright © 2003, 2004, 2005, 2006, Javier R. Movellan

This document is being reorganized. Expect redundancy, inconsistencies, disorganized presentation ...

1 Motivation

There is a wide range of interesting processes in robotics, control, and economics that can be described as differential equations with non-deterministic dynamics. Suppose the original process is described by the following differential equation

dX_t/dt = a(X_t)   (1)

with initial condition X_0, which could be random. We wish to construct a mathematical model of how the process may behave in the presence of noise. We wish for this noise source to be stationary and independent of the current state of the system. We also want the resulting paths to be continuous. As it turns out, building such a model is tricky. An elegant mathematical solution to this problem may be found by considering a discrete time version of the process and then taking limits in some meaningful way. Let Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} be a partition of the interval [0, t]. Let Δt_k = t_{k+1} − t_k. For each partition Π we can construct a continuous time process X^Π defined as follows
X^Π_{t_0} = X_0   (2)

X^Π_{t_{k+1}} = X^Π_{t_k} + a(X^Π_{t_k}) Δt_k + c(X^Π_{t_k}) (N_{t_{k+1}} − N_{t_k})   (3)
where N is a noise process whose properties remain to be determined and c is a function that allows the amount of noise to depend on the state. To define the process for all times in [0, t], we make it piecewise constant between the points of the partition, i.e.,
X^Π_t = X^Π_{t_k},  for t ∈ [t_k, t_{k+1})   (4)

We want the noise N_t to be continuous and the increments N_{t_{k+1}} − N_{t_k} to have zero mean and to be independently and identically distributed. It turns out that the only noise source satisfying these requirements is Brownian motion. Thus we get
X^Π_t = X_0 + Σ_{k=0}^{n−1} a(X^Π_{t_k}) Δt_k + Σ_{k=0}^{n−1} c(X^Π_{t_k}) ΔB_k   (5)

where Δt_k = t_{k+1} − t_k and ΔB_k = B_{t_{k+1}} − B_{t_k}, with B Brownian motion. Let ‖Π‖ = max_k {Δt_k} be the norm of the partition Π. It can be shown that as ‖Π‖ → 0 the processes X^Π converge in probability to a stochastic process X. It follows that
lim_{‖Π‖→0} Σ_{k=0}^{n−1} a(X_{t_k}) Δt_k = ∫_0^t a(X_s) ds   (6)

and that
Σ_{k=0}^{n−1} c(X_{t_k}) ΔB_k   (7)

converges to a process I_t:

I_t = lim_{‖Π‖→0} Σ_{k=0}^{n−1} c(X_{t_k}) ΔB_k   (8)

Note that I_t looks like an integral in which the integrand c(X_s) is a random variable and the integrator ΔB_k is also a random variable. As we will see later, I_t turns out to be an Ito stochastic integral. We can now express the limit process X as a process satisfying the following equation:
X_t = X_0 + ∫_0^t a(X_s) ds + I_t   (9)

Sketch of Proof of Convergence: Construct a sequence of partitions Π_1, Π_2, ..., each one being a refinement of the previous one. Show that the corresponding processes X^{Π_i}_t form a Cauchy sequence in L² and therefore converge to a limit. Call that process X. In order to get a better understanding of the limit process X there are two things we need to do: (1) study the properties of Brownian motion, and (2) study the properties of the Ito stochastic integral.
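The discrete-time construction in equations (2)-(3), with Brownian noise, is what is now called the Euler-Maruyama scheme, and it is easy to simulate. Below is a minimal sketch in Python/NumPy; the drift a(x) = −x, noise scale c(x) = 0.5, initial condition, and horizon are all hypothetical choices made for illustration:

```python
import numpy as np

def euler_maruyama(a, c, x0, t, n, rng):
    """Simulate the discrete-time process of equations (2)-(3) on [0, t]."""
    dt = t / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        # Brownian increment N_{t_{k+1}} - N_{t_k} ~ N(0, dt)
        dB = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + a(x[k]) * dt + c(x[k]) * dB
    return x

rng = np.random.default_rng(0)
path = euler_maruyama(a=lambda x: -x, c=lambda x: 0.5, x0=1.0, t=1.0, n=1000, rng=rng)
```

Refining the partition (larger n) approximates the limit process X discussed below.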

2 Standard Brownian Motion

By Brownian motion we refer to a mathematical model of the random movement of particles suspended in a fluid. This type of motion was named after Robert Brown, who observed it in pollen grains suspended in water. The process was described mathematically by Norbert Wiener, and is thus also called a Wiener process. Mathematically, a standard Brownian motion (or Wiener process) is defined by the following properties:

1. The process starts at zero with probability 1, i.e., P(B_0 = 0) = 1.

2. The probability that a randomly generated Brownian path be continuous is 1.

3. The path increments are independent, zero mean Gaussian, with variance equal to the temporal extension of the increment. Specifically, for 0 ≤ s_1 ≤ t_1 ≤ s_2 ≤ t_2,

B_{t_1} − B_{s_1} ∼ N(0, t_1 − s_1)   (10)
B_{t_2} − B_{s_2} ∼ N(0, t_2 − s_2)   (11)

and B_{t_2} − B_{s_2} is independent of B_{t_1} − B_{s_1}.

Wiener showed that such a process exists, i.e., there is a stochastic process that does not violate the axioms of probability theory and that satisfies the three aforementioned properties.

2.1 Properties of Brownian Motion

2.1.1 Statistics

From the properties of Gaussian random variables,

E(B_t − B_s) = 0   (12)
Var(B_t − B_s) = E[(B_t − B_s)²] = t − s   (13)
E[(B_t − B_s)⁴] = 3(t − s)²   (14)
Var[(B_t − B_s)²] = E[(B_t − B_s)⁴] − E[(B_t − B_s)²]² = 2(t − s)²   (15)
Cov(B_s, B_t) = s,  for t > s   (16)
Corr(B_s, B_t) = √(s/t),  for t > s   (17)

Proof: For the variance of (B_t − B_s)² we used the fact that for a standard Gaussian random variable Z,

E(Z⁴) = 3   (18)

Note

Var(B_T) = Var(B_T − B_0) = T   (19)

since P(B_0 = 0) = 1, and for all t ≥ 0,

Var(B_{t+Δt} − B_t) = Δt   (20)

Moreover,

Cov(B_s, B_t) = Cov(B_s, B_s + (B_t − B_s)) = Cov(B_s, B_s) + Cov(B_s, B_t − B_s) = Var(B_s) = s   (21)

since B_s and B_t − B_s are uncorrelated.

2.1.2 Distributional Properties

Let B represent a standard Brownian motion (SBM) process.

Self-similarity: For any c ≠ 0, X_t = (1/c) B_{c²t} is SBM. We can use this property to simulate SBM on any given interval [0, T] if we know how to simulate it on [0, 1]: if B is SBM on [0, 1], taking c = 1/√T gives X_t = √T B_{t/T}, which is SBM on [0, T].

Time inversion: X_t = t B_{1/t} is SBM.

Time reversal: X_t = B_T − B_{T−t} is SBM on the interval [0, T].

Symmetry: X_t = −B_t is SBM.

2.1.3 Pathwise Properties

Brownian motion sample paths are non-differentiable with probability 1. This is the basic reason why we need to develop a generalization of ordinary calculus to handle stochastic differential equations. If we were to define such equations simply as

dX_t/dt = a(X_t) + c(X_t) dB_t/dt   (22)

we would have the obvious problem that the derivative of Brownian motion does not exist.

Proof: Let X be a real valued stochastic process. For a fixed t let Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} be a partition of the interval [0, t]. Let ‖Π‖ be the norm of the partition. The quadratic variation of X at t is a random variable represented as ⟨X, X⟩_t and defined as follows:

⟨X, X⟩_t = lim_{‖Π‖→0} Σ_{k=0}^{n−1} |X_{t_{k+1}} − X_{t_k}|²   (23)
We will show that the quadratic variation of SBM is larger than zero with probability one, and therefore that its paths are non-differentiable with probability 1. Let B be a standard Brownian motion. For a partition Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} let ΔB_k be defined as follows:

ΔB_k = B_{t_{k+1}} − B_{t_k}   (24)

Let

S_Π = Σ_{k=0}^{n−1} (ΔB_k)²   (25)

Note

E(S_Π) = Σ_{k=0}^{n−1} (t_{k+1} − t_k) = t   (26)

and

0 ≤ Var(S_Π) = Σ_{k=0}^{n−1} Var[(ΔB_k)²] = 2 Σ_{k=0}^{n−1} (t_{k+1} − t_k)² ≤ 2 ‖Π‖ Σ_{k=0}^{n−1} (t_{k+1} − t_k) = 2 ‖Π‖ t   (27)

Thus

lim_{‖Π‖→0} Var(S_Π) = lim_{‖Π‖→0} E[(Σ_{k=0}^{n−1} (ΔB_k)² − t)²] = 0   (28)

This shows mean square convergence of S_Π to t, which implies convergence in probability. (I think almost sure convergence can also be shown.)

Comments: If we were to define the stochastic integral ∫_0^t (dB_s)² as

∫_0^t (dB_s)² = lim_{‖Π‖→0} S_Π   (29)

then

∫_0^t (dB_s)² = ∫_0^t ds = t   (30)
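The claims in (26)-(28) are easy to check numerically: the sum of squared Brownian increments S_Π has mean t and a variance that shrinks with the partition norm. A small sketch (the sample sizes are arbitrary choices):

```python
import numpy as np

# For each partition size n, draw 500 Brownian paths on [0, t] and compute
# S_Pi, the sum of squared increments. E(S_Pi) = t and Var(S_Pi) <= 2*||Pi||*t,
# so the variance should shrink roughly like 1/n.
rng = np.random.default_rng(1)
t = 2.0
for n in (10, 100, 10_000):
    dB = rng.normal(0.0, np.sqrt(t / n), size=(500, n))
    S = (dB ** 2).sum(axis=1)
    print(n, round(S.mean(), 3), round(S.var(), 5))
```

The printed variances decrease as the partition is refined, while the means stay near t.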

If a path X_t(ω) were differentiable almost everywhere in the interval [0, T], then for an equispaced partition with Δt = T/n,

⟨X, X⟩_T(ω) = lim_{n→∞} Σ_{k=0}^{n−1} (X′_{t_k}(ω) Δt)²   (31)
 ≤ (max_{t∈[0,T]} X′_t(ω)²) lim_{n→∞} n (T/n)²   (32)
 = 0   (33)

where X′ = dX/dt. Since Brownian paths have non-zero quadratic variation with probability one, they are also non-differentiable with probability one.

2.2 Simulating Brownian Motion

Let Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} be a partition of the interval [0, t]. Let {Z_1, ..., Z_n} be i.i.d. Gaussian random variables with E(Z_i) = 0, Var(Z_i) = 1. Define the stochastic process B^Π as follows:

B^Π_{t_0} = 0   (34)
B^Π_{t_1} = B^Π_{t_0} + √(t_1 − t_0) Z_1   (35)
  ⋮
B^Π_{t_k} = B^Π_{t_{k−1}} + √(t_k − t_{k−1}) Z_k   (37)

Moreover,

B^Π_t = B^Π_{t_{k−1}},  for t ∈ [t_{k−1}, t_k)   (38)

For each partition Π this defines a continuous time process. It can be shown that as ‖Π‖ → 0 the process B^Π converges in distribution to standard Brownian motion.

2.2.1 Exercise

Simulate Brownian motion and verify numerically the following properties:

E(B_t) = 0   (39)
Var(B_t) = t   (40)
∫_0^t (dB_s)² = ∫_0^t ds = t   (41)
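A possible solution sketch (sample sizes are arbitrary): build many paths with the recursion (34)-(37) and compare the sample statistics against the three properties.

```python
import numpy as np

# Simulate 10,000 Brownian paths on [0, t] via cumulative sums of Gaussian
# increments, then verify E(B_t) ~ 0, Var(B_t) ~ t, and sum of (dB)^2 ~ t.
rng = np.random.default_rng(2)
t, n, n_paths = 1.0, 1_000, 10_000
dB = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))
B = np.cumsum(dB, axis=1)                  # B_{t_k} as in equations (34)-(37)
mean_Bt = B[:, -1].mean()                  # theory: 0
var_Bt = B[:, -1].var()                    # theory: t = 1
quad_var = (dB ** 2).sum(axis=1).mean()    # theory: t = 1
print(mean_Bt, var_Bt, quad_var)
```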

3 The Ito Stochastic Integral

We want to give meaning to the expression

∫_0^t Y_s dB_s   (42)

where B is standard Brownian motion and Y is a process that does not anticipate the future of Brownian motion. For example, Y_t = B_{t+2} would not be a valid integrand. A random process Y is simply a set of functions f(t, ω) from an outcome space Ω to the real numbers, i.e., for each outcome ω,

Y_t(ω) = f(t, ω)   (43)

We will first study the case in which f is piecewise constant. In such a case there is a partition Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} of the interval [0, t] such that

f_n(t, ω) = Σ_{k=0}^{n−1} C_k(ω) χ_k(t)   (44)

where

χ_k(t) = 1 if t ∈ [t_k, t_{k+1}), and 0 otherwise   (45)

where C_k is a non-anticipatory random variable, i.e., a function of X_0 and the Brownian noise up to time t_k. For such a piecewise constant process Y_t(ω) = f_n(t, ω) we define the stochastic integral as follows. For each outcome ω,

∫_0^t Y_s(ω) dB_s(ω) = Σ_{k=0}^{n−1} C_k(ω) (B_{t_{k+1}}(ω) − B_{t_k}(ω))   (46)

More succinctly,

∫_0^t Y_s dB_s = Σ_{k=0}^{n−1} C_k (B_{t_{k+1}} − B_{t_k})   (47)

This leads us to the more general definition of the Ito integral.

Definition of the Ito Integral: Let f(t, ω) be a non-anticipatory function from an outcome space Ω to the real numbers. Let {f_1, f_2, ...} be a sequence of elementary non-anticipatory functions such that

lim_{n→∞} E[∫_0^t (f(s, ω) − f_n(s, ω))² ds] = 0   (48)

Let the random process Y be defined as Y_t(ω) = f(t, ω). Then the Ito integral

∫_0^t Y_s dB_s   (49)

is a random variable defined as follows. For each outcome ω,

∫_0^t f(s, ω) dB_s(ω) = lim_{n→∞} ∫_0^t f_n(s, ω) dB_s(ω)   (50)

where the limit is taken in L²(P). It can be shown that an approximating sequence f_1, f_2, ... satisfying (48) exists. Moreover, the limit in (50) also exists and is independent of the choice of the approximating sequence.

Comment: Strictly speaking we need f to be measurable, i.e., to induce a proper random variable. We also need f(t, ·) to be F_t adapted. This basically means that Y_t must be a function of Y_0 and the Brownian motion up to time t; it cannot be a function of future values of B. Moreover we need E[∫_0^t f(s, ω)² ds] < ∞.

3.1 Properties of the Ito Integral


Let I_t = ∫_0^t X_s dB_s, where X is a non-anticipatory process. Then

E(I_t) = 0   (51)

Var(I_t) = E(I_t²) = ∫_0^t E(X_s²) ds   (52)

∫_0^t (X_s + Y_s) dB_s = ∫_0^t X_s dB_s + ∫_0^t Y_s dB_s   (53)

∫_0^T X_s dB_s = ∫_0^t X_s dB_s + ∫_t^T X_s dB_s,  for t ∈ (0, T)   (54)

The Ito integral is a martingale process, where E(I_t | F_s) is the least squares prediction of I_t based on all the information available up to time s:

E(I_t | F_s) = I_s,  for all t > s   (55)
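These properties can be checked by Monte Carlo for a concrete integrand. In the sketch below the integrand X_s = B_s is an illustrative choice; for it, (52) gives Var(I_t) = ∫_0^t E(B_s²) ds = ∫_0^t s ds = t²/2.

```python
import numpy as np

# Monte Carlo check of (51)-(52) for I_t = the Ito integral of B_s dB_s,
# approximated by the non-anticipatory sum (47) with LEFT endpoints.
rng = np.random.default_rng(3)
t, n, n_paths = 1.0, 1_000, 10_000
dB = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])  # B_{t_k} at left endpoints
I_t = (B_left * dB).sum(axis=1)
print(I_t.mean(), I_t.var())    # theory: E(I_t) = 0, Var(I_t) = t^2 / 2 = 0.5
```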

4 Stochastic Differential Equations

In the introduction we defined a limit process X, which was the limit of a dynamical system expressed as a differential equation with a Brownian noise perturbation of the system dynamics. The process was a solution to the following equation:
X_t = X_0 + ∫_0^t a(X_s) ds + I_t   (56)

where

I_t = lim_{‖Π‖→0} Σ_k c(X_{t_k}) ΔB_k   (57)

It should now be clear that I_t is in fact an Ito stochastic integral:


I_t = ∫_0^t c(X_s) dB_s   (58)

and thus X can be expressed as the solution of the following stochastic integral equation
X_t = X_0 + ∫_0^t a(X_s) ds + ∫_0^t c(X_s) dB_s   (59)

It is convenient to express the integral equation above using differential notation:

dX_t = a(X_t) dt + c(X_t) dB_t   (60)

with given initial condition X_0. We call this an Ito Stochastic Differential Equation (SDE). The differential notation is simply a pointer to, and thus acquires its meaning from, the corresponding integral equation.

4.1 Second Order Differentials

The following rules are useful:


∫_0^t X_s (ds)² = 0   (61)
∫_0^t X_s dB_s ds = 0   (62)
∫_0^t X_s dB_s dW_s = 0,  if B, W are independent Brownian motions   (63)
∫_0^t X_s (dB_s)² = ∫_0^t X_s ds   (64)

Symbolically this is commonly expressed as follows:

(dt)² = 0   (66)
dB_t dt = 0   (67)
dB_t dW_t = 0   (68)
(dB_t)² = dt   (69)

Sketch of proof: Let Π = {0 = t_0 ≤ t_1 ≤ ... ≤ t_n = t} be a partition of [0, t] with equal intervals, i.e., t_{k+1} − t_k = Δt. Regarding (dt)² = 0, note

lim_{Δt→0} Σ_{k=0}^{n−1} X_{t_k} (Δt)² = lim_{Δt→0} Δt ∫_0^t X_s ds = 0   (70)

Regarding dB_t dt = 0, note

lim_{Δt→0} Σ_{k=0}^{n−1} X_{t_k} Δt ΔB_k = lim_{Δt→0} Δt ∫_0^t X_s dB_s = 0   (71)

Regarding (dB_t)² = dt, note

E[(Σ_{k=0}^{n−1} X_{t_k} (ΔB_k)² − Σ_{k=0}^{n−1} X_{t_k} Δt)²] = Σ_{k=0}^{n−1} Σ_{k′=0}^{n−1} E[X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt)((ΔB_{k′})² − Δt)]   (72)

If k′ > k then ((ΔB_{k′})² − Δt) is independent of X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt), and therefore

E[X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt)((ΔB_{k′})² − Δt)] = E[X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt)] E[(ΔB_{k′})² − Δt] = 0   (73)

Equivalently, if k > k′ then ((ΔB_k)² − Δt) is independent of X_{t_k} X_{t_{k′}} ((ΔB_{k′})² − Δt), and therefore

E[X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt)((ΔB_{k′})² − Δt)] = E[X_{t_k} X_{t_{k′}} ((ΔB_{k′})² − Δt)] E[(ΔB_k)² − Δt] = 0   (74)

Thus

Σ_{k=0}^{n−1} Σ_{k′=0}^{n−1} E[X_{t_k} X_{t_{k′}} ((ΔB_k)² − Δt)((ΔB_{k′})² − Δt)] = Σ_{k=0}^{n−1} E[X_{t_k}² ((ΔB_k)² − Δt)²]   (75)

Note since ΔB_k is independent of X_{t_k},

E[X_{t_k}² ((ΔB_k)² − Δt)²] = E[X_{t_k}²] E[((ΔB_k)² − Δt)²]   (76)
 = E[X_{t_k}²] Var((ΔB_k)²) = 2 E[X_{t_k}²] (Δt)²   (77)

Thus

E[(Σ_{k=0}^{n−1} X_{t_k} (ΔB_k)² − Σ_{k=0}^{n−1} X_{t_k} Δt)²] = 2 Σ_{k=0}^{n−1} E[X_{t_k}²] (Δt)²   (78)

which goes to zero as Δt → 0. Thus, in the limit as Δt → 0,

lim_{Δt→0} Σ_{k=0}^{n−1} X_{t_k} (ΔB_k)² = lim_{Δt→0} Σ_{k=0}^{n−1} X_{t_k} Δt   (79)

where the limit is taken in the mean square sense. Thus

∫_0^t X_s (dB_s)² = ∫_0^t X_s ds   (80)

Regarding dB_t dW_t = 0, note

E[(Σ_{k=0}^{n−1} X_{t_k} ΔB_k ΔW_k)²] = Σ_{k=0}^{n−1} Σ_{k′=0}^{n−1} E[X_{t_k} X_{t_{k′}} ΔB_k ΔW_k ΔB_{k′} ΔW_{k′}]   (81)

If k′ > k then ΔB_{k′}, ΔW_{k′} are independent of X_{t_k} X_{t_{k′}} ΔB_k ΔW_k and therefore

E[X_{t_k} X_{t_{k′}} ΔB_k ΔW_k ΔB_{k′} ΔW_{k′}] = E[X_{t_k} X_{t_{k′}} ΔB_k ΔW_k] E[ΔB_{k′}] E[ΔW_{k′}] = 0   (82)

Equivalently, if k > k′ then ΔB_k, ΔW_k are independent of X_{t_k} X_{t_{k′}} ΔB_{k′} ΔW_{k′} and therefore

E[X_{t_k} X_{t_{k′}} ΔB_k ΔW_k ΔB_{k′} ΔW_{k′}] = E[X_{t_k} X_{t_{k′}} ΔB_{k′} ΔW_{k′}] E[ΔB_k] E[ΔW_k] = 0   (83)

Finally, for k = k′, ΔB_k, ΔW_k, and X_{t_k} are independent, thus

E[X_{t_k}² (ΔB_k)² (ΔW_k)²] = E[X_{t_k}²] E[(ΔB_k)²] E[(ΔW_k)²] = E[X_{t_k}²] (Δt)²   (84)

Thus

E[(Σ_{k=0}^{n−1} X_{t_k} ΔB_k ΔW_k)²] = Σ_{k=0}^{n−1} E[X_{t_k}²] (Δt)²   (85)

which converges to 0 as Δt → 0. Thus

∫_0^t X_s dB_s dW_s = 0   (86)

4.2 Vector Stochastic Differential Equations

The form

dX_t = a(X_t) dt + c(X_t) dB_t   (87)

is also used to represent multivariate equations. In this case X_t represents an n-dimensional random vector, B_t an m-dimensional vector of m independent standard Brownian motions, and c(X_t) an n × m matrix. Here a is commonly known as the drift vector and c as the dispersion matrix.

5 Ito's Rule

The main point of Ito's calculus is that, in the general case, a differential carries quadratic as well as linear components. For example, suppose that X_t is an Ito process and let Y_t = f(t, X_t). Then

dY_t = ∇f(t, X_t)ᵀ dX_t + (1/2) dX_tᵀ ∇²f(t, X_t) dX_t   (88)

where ∇, ∇² are the gradient and Hessian with respect to (t, x). Note this is basically the second order Taylor series expansion. In ordinary calculus the second order terms are zero, but in stochastic calculus, due to the fact that these processes have non-zero quadratic variation, the quadratic terms do not go away. This is really all you need to remember about stochastic calculus; everything else derives from this basic fact. The most important consequence of this fact is Ito's rule. Let X_t be governed by an SDE

dX_t = a(X_t, t) dt + c(X_t, t) dB_t   (90)

Let Y_t = f(X_t, t). Ito's rule tells us that Y_t is governed by the following SDE:

dY_t = ∂_t f(t, X_t) dt + ∇_x f(t, X_t)ᵀ dX_t + (1/2) dX_tᵀ ∇²_x f(t, X_t) dX_t   (91)

where, by definition,

dB_{i,t} dB_{j,t} = δ(i, j) dt   (92)
dX_t dt = 0   (93)
(dt)² = 0   (94)

Equivalently,

dY_t = ∂_t f(X_t, t) dt + ∇_x f(X_t, t)ᵀ a(X_t, t) dt + ∇_x f(X_t, t)ᵀ c(X_t, t) dB_t + (1/2) trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x f(X_t, t)] dt   (95)

where

∇_x f(x, t)ᵀ a(x, t) = Σ_i (∂f(x, t)/∂x_i) a_i(x, t)   (96)

trace[c(x, t) c(x, t)ᵀ ∇²_x f(x, t)] = Σ_i Σ_j (c(x, t) c(x, t)ᵀ)_{ij} ∂²f(x, t)/∂x_i ∂x_j   (97)
Note c is a matrix.

Sketch of Proof: To second order,

ΔY_t = f(X_{t+Δt}, t + Δt) − f(X_t, t)
 = ∂_t f(X_t, t) Δt + ∇_x f(X_t, t)ᵀ ΔX_t + (1/2)(Δt)² ∂²_t f(X_t, t) + (1/2) ΔX_tᵀ ∇²_x f(X_t, t) ΔX_t + Δt (∇_x ∂_t f(X_t, t))ᵀ ΔX_t   (98)

where ∂_t, ∇_x are the derivatives with respect to time and state, ∂²_t is the second derivative with respect to time, ∇²_x the Hessian with respect to the state, and ∇_x ∂_t the gradient with respect to the state of the derivative with respect to time. Summing over time,

Y_t = Y_0 + Σ_{k=0}^{n−1} ΔY_{t_k}   (99)

and taking limits,

Y_t = Y_0 + ∫_0^t dY_s = Y_0 + ∫_0^t ∂_s f(X_s, s) ds + ∫_0^t ∇_x f(X_s, s)ᵀ dX_s + (1/2) ∫_0^t ∂²_s f(X_s, s) (ds)² + (1/2) ∫_0^t dX_sᵀ ∇²_x f(X_s, s) dX_s + ∫_0^t (∇_x ∂_s f(X_s, s))ᵀ dX_s ds   (100)

In differential form,

dY_t = ∂_t f(X_t, t) dt + ∇_x f(X_t, t)ᵀ dX_t + (1/2) ∂²_t f(X_t, t) (dt)² + (1/2) dX_tᵀ ∇²_x f(X_t, t) dX_t + (∇_x ∂_t f(X_t, t))ᵀ dX_t dt   (101)

Expanding dX_t,

(∇_x ∂_t f(X_t, t))ᵀ dX_t dt = (∇_x ∂_t f(X_t, t))ᵀ a(X_t, t) (dt)² + (∇_x ∂_t f(X_t, t))ᵀ c(X_t, t) dB_t dt = 0   (102)

where we used the standard rules for second order differentials,

(dt)² = 0   (104)
dB_t dt = 0   (105)

Moreover,

dX_tᵀ ∇²_x f(X_t, t) dX_t = (a(X_t, t) dt + c(X_t, t) dB_t)ᵀ ∇²_x f(X_t, t) (a(X_t, t) dt + c(X_t, t) dB_t)
 = a(X_t, t)ᵀ ∇²_x f(X_t, t) a(X_t, t) (dt)² + 2 a(X_t, t)ᵀ ∇²_x f(X_t, t) c(X_t, t) dB_t dt + dB_tᵀ c(X_t, t)ᵀ ∇²_x f(X_t, t) c(X_t, t) dB_t   (106)

Using the rules for second order differentials, (dt)² = 0, dB_t dt = 0, and

dB_tᵀ K(X_t, t) dB_t = Σ_i Σ_j K_{i,j}(X_t, t) dB_{i,t} dB_{j,t} = Σ_i K_{i,i} dt   (109)

where

K(X_t, t) = c(X_t, t)ᵀ ∇²_x f(X_t, t) c(X_t, t)   (110)

Thus

dY_t = ∂_t f(X_t, t) dt + ∇_x f(X_t, t)ᵀ a(X_t, t) dt + ∇_x f(X_t, t)ᵀ c(X_t, t) dB_t + (1/2) trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x f(X_t, t)] dt   (111)

where we used the fact that

Σ_i K_{ii}(X_t, t) dt = trace(K) dt = trace[c(X_t, t)ᵀ ∇²_x f(X_t, t) c(X_t, t)] dt = trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x f(X_t, t)] dt   (112)

5.1 Product Rule

Let X, Y be Ito processes. Then

d(X_t Y_t) = X_t dY_t + Y_t dX_t + dX_t dY_t   (113)

Proof: Consider (X, Y) as a joint Ito process and take f(x, y, t) = xy. Then

∂f/∂t = 0   (114)
∂f/∂x = y   (115)
∂f/∂y = x   (116)
∂²f/∂x∂y = 1   (117)
∂²f/∂x² = ∂²f/∂y² = 0   (118)

Applying Ito's rule, the product rule follows.

Exercise: Solve ∫_0^t B_s dB_s symbolically.

Let a(X_t, t) = 0, c(X_t, t) = 1, f(x, t) = x². Thus

dX_t = dB_t   (119)
X_t = B_t   (120)

and

∂f(t, x)/∂t = 0   (121)
∂f(t, x)/∂x = 2x   (122)
∂²f(t, x)/∂x² = 2   (123)

Applying Ito's rule,

df(X_t, t) = (∂f/∂t) dt + (∂f/∂x) a(X_t, t) dt + (∂f/∂x) c(X_t, t) dB_t + (1/2) (∂²f/∂x²) c(X_t, t)² dt   (124)

we get

dB_t² = 2 B_t dB_t + dt   (125)

Equivalently,

B_t² = 2 ∫_0^t B_s dB_s + ∫_0^t ds   (126)
B_t² = 2 ∫_0^t B_s dB_s + t   (127)

Therefore

∫_0^t B_s dB_s = (1/2) B_t² − (1/2) t   (128)

NOTE: dB_t², the differential of B_t², is different from (dB_t)².
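The identity (128) can be verified pathwise: on a fine partition, the left-endpoint Ito sum for ∫_0^t B_s dB_s nearly coincides with B_t²/2 − t/2 on every simulated path. A quick sketch:

```python
import numpy as np

# Compare the non-anticipatory Ito sum with the closed form (128)
# on a single simulated Brownian path.
rng = np.random.default_rng(4)
t, n = 1.0, 200_000
dB = rng.normal(0.0, np.sqrt(t / n), size=n)
B = np.cumsum(dB)
B_left = np.concatenate([[0.0], B[:-1]])      # B at left endpoints
ito_sum = np.sum(B_left * dB)
closed_form = 0.5 * B[-1] ** 2 - 0.5 * t
print(ito_sum, closed_form)                   # nearly equal for large n
```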

Exercise: Get E[e^{σ B_t}].

Let a(X_t, t) = 0, c(X_t, t) = 1, i.e., dX_t = dB_t, and let Y_t = f(X_t, t) = e^{σ B_t}. Using Ito's rule,

dY_t = σ e^{σ B_t} dB_t + (σ²/2) e^{σ B_t} dt   (129)

Y_t = Y_0 + σ ∫_0^t e^{σ B_s} dB_s + (σ²/2) ∫_0^t e^{σ B_s} ds   (130)

Taking expected values,

E[Y_t] = E[Y_0] + (σ²/2) ∫_0^t E[Y_s] ds   (131)

where we used the fact that E[∫_0^t e^{σ B_s} dB_s] = 0, because for any non-anticipatory random process Y we know that E[∫_0^t Y_s dB_s] = 0. Thus

dE[Y_t]/dt = (σ²/2) E[Y_t]   (132)

and since E[Y_0] = 1,

E[e^{σ B_t}] = e^{σ² t / 2}   (133)
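The result (133) is straightforward to check by Monte Carlo, since B_t ∼ N(0, t); the values of σ and t below are arbitrary:

```python
import numpy as np

# Monte Carlo check of E[exp(sigma * B_t)] = exp(sigma^2 * t / 2).
rng = np.random.default_rng(5)
sigma, t, n_paths = 0.8, 1.5, 1_000_000
B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)   # B_t ~ N(0, t)
mc = np.exp(sigma * B_t).mean()
theory = np.exp(sigma ** 2 * t / 2)
print(mc, theory)
```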

Exercise: Solve the following SDE:

dX_t = μ X_t dt + σ X_t dB_t   (134)

In this case a(X_t, t) = μ X_t, c(X_t, t) = σ X_t. Using Ito's formula for f(x, t) = log(x),

∂f(t, x)/∂t = 0   (135)
∂f(t, x)/∂x = 1/x   (136)
∂²f(t, x)/∂x² = −1/x²   (137)

Thus

d log(X_t) = (1/X_t) μ X_t dt + (1/X_t) σ X_t dB_t − (1/(2 X_t²)) σ² X_t² dt = (μ − σ²/2) dt + σ dB_t   (138)

Integrating over time,

log(X_t) = log(X_0) + (μ − σ²/2) t + σ B_t   (139)
X_t = X_0 exp((μ − σ²/2) t) exp(σ B_t)   (140)

Note

E[X_t] = E[X_0] e^{(μ − σ²/2) t} E[exp(σ B_t)] = E[X_0] e^{μ t}   (141)

6 Moment Equations

Consider an SDE of the form

dX_t = a(X_t) dt + c(X_t) dB_t   (142)

Taking expected values we get the differential equation for the first order moments:

dE[X_t]/dt = E[a(X_t)]   (143)

[Seems weird that c has no effect. Double check with the generator of Ito diffusion result.] With respect to second order moments, let Y_t = f(X_t) = X_{i,t} X_{j,t}. Using Ito's product rule,

dY_t = d(X_{i,t} X_{j,t}) = X_{i,t} dX_{j,t} + X_{j,t} dX_{i,t} + dX_{i,t} dX_{j,t}   (144)
 = X_{i,t} (a_j(X_t) dt + (c(X_t) dB_t)_j) + X_{j,t} (a_i(X_t) dt + (c(X_t) dB_t)_i) + (c(X_t) c(X_t)ᵀ)_{ij} dt   (145)

Taking expected values,

dE[X_{i,t} X_{j,t}]/dt = E[X_{i,t} a_j(X_t)] + E[X_{j,t} a_i(X_t)] + E[(c(X_t) c(X_t)ᵀ)_{ij}]   (146)

In matrix form,

dE[X_t X_tᵀ]/dt = E[X_t a(X_t)ᵀ] + E[a(X_t) X_tᵀ] + E[c(X_t) c(X_t)ᵀ]   (147)

The moment formulas are particularly useful when a is linear and c is constant with respect to X_t, say a(x) = a x and c(x) = c. In that case,

dE[X_t]/dt = a E[X_t]   (148)
dE[X_t X_tᵀ]/dt = E[X_t X_tᵀ] aᵀ + a E[X_t X_tᵀ] + c cᵀ   (149)
dVar[X_t]/dt = Var[X_t] aᵀ + a Var[X_t] + c cᵀ   (150)

Example: Calculate the equilibrium mean and variance of the following process:

dX_t = −X_t dt + c dB_t   (152)

The first and second moment equations are

dE[X_t]/dt = −E[X_t]   (153)
dE[X_t²]/dt = −2 E[X_t²] + c²   (154)

Thus

lim_{t→∞} E[X_t] = 0   (155)
lim_{t→∞} E[X_t²] = lim_{t→∞} Var[X_t] = c²/2   (156)
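The equilibrium moments (155)-(156) can be checked by simulating the example process dX_t = −X_t dt + c dB_t with the Euler-Maruyama scheme until the initial condition is forgotten; c = 0.5 and the step sizes below are arbitrary choices, for which c²/2 = 0.125:

```python
import numpy as np

# Euler-Maruyama simulation of dX_t = -X_t dt + c dB_t, run long enough to
# reach equilibrium; theory: E[X_t] -> 0 and Var[X_t] -> c^2 / 2.
rng = np.random.default_rng(6)
c, dt, n_steps, n_paths = 0.5, 0.001, 10_000, 5_000
x = np.full(n_paths, 2.0)          # start far from equilibrium
for _ in range(n_steps):
    x += -x * dt + c * rng.normal(0.0, np.sqrt(dt), size=n_paths)
print(x.mean(), x.var())           # theory: 0 and 0.125
```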

7 Generator of an Ito Diffusion

The generator G_t of the Ito diffusion

dX_t = a(X_t, t) dt + c(X_t, t) dB_t   (157)

is a second order partial differential operator. For any function f it provides the directional derivative of f averaged across the paths generated by the diffusion. In particular, given the function f, the function G_t[f] is defined as follows:

G_t[f](x) = lim_{Δt→0} (E[f(X_{t+Δt}) | X_t = x] − f(x))/Δt = E[df(X_t) | X_t = x]/dt   (158)

Note using Ito's rule,

df(X_t) = ∇_x f(X_t)ᵀ a(X_t, t) dt + ∇_x f(X_t)ᵀ c(X_t, t) dB_t + (1/2) trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x f(X_t)] dt   (159)

Taking expected values,

G_t[f](x) = E[df(X_t) | X_t = x]/dt = ∇_x f(x)ᵀ a(x, t) + (1/2) trace[c(x, t) c(x, t)ᵀ ∇²_x f(x)]   (160)

In other words,

G_t[·] = Σ_i a_i(x, t) ∂[·]/∂x_i + (1/2) Σ_i Σ_j (c(x, t) c(x, t)ᵀ)_{i,j} ∂²[·]/∂x_i ∂x_j   (161)
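The defining limit (158) and the closed form (160)-(161) can be compared numerically for a scalar diffusion. In the sketch below, the drift a(x) = −x, noise c(x) = 0.7, test function f(x) = x², and evaluation point x₀ = 1.5 are all hypothetical choices; for these, G[f](x₀) = −2x₀² + c².

```python
import numpy as np

# One Euler step of dX_t = -X_t dt + c dB_t from x0, then a finite-difference
# estimate of the generator: (E[f(X_{t+dt})] - f(x0)) / dt with f(x) = x^2.
rng = np.random.default_rng(7)
x0, c, dt, n_paths = 1.5, 0.7, 1e-3, 4_000_000
x1 = x0 - x0 * dt + c * rng.normal(0.0, np.sqrt(dt), size=n_paths)
mc_generator = (np.mean(x1 ** 2) - x0 ** 2) / dt
exact = -2 * x0 ** 2 + c ** 2      # gradient term plus 0.5 * c^2 * f''(x0)
print(mc_generator, exact)
```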

8 Adjoints

Every linear operator G on a Hilbert space H with inner product ⟨·, ·⟩ has a corresponding adjoint operator G* such that

⟨G f, g⟩ = ⟨f, G* g⟩,  for all f, g ∈ H   (162)

In our case the elements of the Hilbert space are functions f, g and the inner product is of the form

⟨f, g⟩ = ∫ f(x) g(x) dx   (163)

Using integration by parts it can be shown that if

G[f](x) = Σ_i a_i(x, t) ∂f(x)/∂x_i + (1/2) trace[c(x, t) c(x, t)ᵀ ∇²_x f(x)]   (164)

then

G*[f](x) = −Σ_i ∂[f(x) a_i(x, t)]/∂x_i + (1/2) Σ_{i,j} ∂²[(c(x, t) c(x, t)ᵀ)_{ij} f(x)]/∂x_i ∂x_j   (166)

9 The Feynman-Kac Formula (Terminal Condition Version)

Let X be an Ito diffusion

dX_t = a(X_t, t) dt + c(X_t, t) dB_t   (167)

with generator G_t:

G_t[v](x) = Σ_i a_i(x, t) ∂v(x, t)/∂x_i + (1/2) Σ_i Σ_j (c(x, t) c(x, t)ᵀ)_{i,j} ∂²v(x, t)/∂x_i ∂x_j   (168)

Let v be the solution to the following PDE:

−∂v(x, t)/∂t = G_t[v](x, t) − v(x, t) f(x, t)   (169)

with a known terminal condition v(x, T) and a known function f. It can be shown that the solution to the PDE (169) is as follows:

v(x, s) = E[v(X_T, T) exp(−∫_s^T f(X_t) dt) | X_s = x]   (170)

We can think of v(X_T, T) as a terminal reward and of exp(−∫_s^T f(X_t) dt) as a discount factor.

Informal Proof: For s ≤ t ≤ T let Y_t = v(X_t, t), Z_t = exp(−∫_s^t f(X_τ) dτ), and U_t = Y_t Z_t. It can be shown (see the Lemma below) that

dZ_t = −Z_t f(X_t) dt   (171)

Using Ito's product rule,

dU_t = d(Y_t Z_t) = Z_t dY_t + Y_t dZ_t + dY_t dZ_t   (172)

Since dZ_t has only a dt term, it follows that dY_t dZ_t = 0. Thus

dU_t = Z_t dv(X_t, t) − v(X_t, t) Z_t f(X_t) dt   (173)

Using Ito's rule on dv we get

dv(X_t, t) = ∂_t v(X_t, t) dt + (∇_x v(X_t, t))ᵀ a(X_t, t) dt + (∇_x v(X_t, t))ᵀ c(X_t, t) dB_t + (1/2) trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x v(X_t, t)] dt   (174)

Thus

dU_t = Z_t [∂_t v(X_t, t) + (∇_x v(X_t, t))ᵀ a(X_t, t) + (1/2) trace[c(X_t, t) c(X_t, t)ᵀ ∇²_x v(X_t, t)] − v(X_t, t) f(X_t)] dt + Z_t (∇_x v(X_t, t))ᵀ c(X_t, t) dB_t   (175)

and since v is the solution to (169), the dt term vanishes:

dU_t = Z_t (∇_x v(X_t, t))ᵀ c(X_t, t) dB_t   (176)

Integrating,

U_T − U_s = ∫_s^T Z_t (∇_x v(X_t, t))ᵀ c(X_t, t) dB_t   (177)

and taking expected values,

E[U_T | X_s = x] − E[U_s | X_s = x] = 0   (178)

where we used the fact that the expected value of integrals with respect to Brownian motion is zero. Thus, since U_s = Y_s Z_s = v(X_s, s),

E[U_T | X_s = x] = E[U_s | X_s = x] = v(x, s)   (179)

Using the definition of U_T we get

v(x, s) = E[v(X_T, T) e^{−∫_s^T f(X_t) dt} | X_s = x]   (180)

We end the proof by showing that

dZ_t = −Z_t f(X_t) dt   (181)

First let V_t = −∫_s^t f(X_τ) dτ and note

ΔV_t = −∫_t^{t+Δt} f(X_τ) dτ ≈ −f(X_t) Δt   (182)
dV_t = −f(X_t) dt   (183)

Let Z_t = exp(V_t). Using Ito's rule,

dZ_t = e^{V_t} dV_t + (1/2) e^{V_t} (dV_t)² = −e^{V_t} f(X_t) dt = −Z_t f(X_t) dt   (184)

where we used the fact that

(dV_t)² = f(X_t)² (dt)² = 0   (185)

10 Kolmogorov Backward Equation

The Kolmogorov backward equation tells us, at time t, whether at a future time T the system will be in a target set A. Let χ be the indicator function of A, i.e., χ(x) = 1 if x ∈ A, and zero otherwise. We want to know, for every state x at time t < T, the probability of ending up in the target set A at time T. This is called the hit probability. Let X be an Ito diffusion

dX_t = a(X_t, t) dt + c(X_t, t) dB_t   (186)
X_0 = x   (187)

The hit probability p(x, t) satisfies the Kolmogorov backward PDE

−∂p(x, t)/∂t = G_t[p](x, t)   (188)

i.e.,

−∂p(x, t)/∂t = Σ_i a_i(x, t) ∂p(x, t)/∂x_i + (1/2) Σ_{i,j} (c(x, t) c(x, t)ᵀ)_{ij} ∂²p(x, t)/∂x_i ∂x_j   (189)

subject to the final condition p(x, T) = χ(x). The equation can be derived from the Feynman-Kac formula, noting that the hit probability is an expected value over paths that originate at x at time t ≤ T, and setting f(x) = 0 for all x:

p(x, t) = P(X_T ∈ A | X_t = x) = E[χ(X_T) | X_t = x] = E[χ(X_T) e^{−∫_t^T f(X_s) ds} | X_t = x]   (190)
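Since the hit probability is an expectation over paths, as in (190), it can be estimated by Monte Carlo without solving the PDE. A sketch under hypothetical choices dX_t = −X_t dt + dB_t, A = [1, ∞), x = 0, and horizon T = 1:

```python
import numpy as np

# Estimate p(x, 0) = P(X_T in A | X_0 = x) by simulating Euler-Maruyama
# paths and averaging the indicator chi(X_T).
rng = np.random.default_rng(8)
dt, n_steps, n_paths = 0.001, 1_000, 100_000
x = np.zeros(n_paths)                       # all paths start at x = 0
for _ in range(n_steps):
    x += -x * dt + rng.normal(0.0, np.sqrt(dt), size=n_paths)
hit_prob = np.mean(x >= 1.0)
print(hit_prob)
```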

11 The Kolmogorov Forward Equation

Let X be an Ito diffusion

dX_t = a(X_t, t) dt + c(X_t, t) dB_t   (191)
X_0 = x_0   (192)

with generator G. Let p(x, t) represent the probability density of X_t evaluated at x given the initial state x_0. Then

∂p(x, t)/∂t = G*[p](x, t)   (193)

where G* is the adjoint of G, i.e.,

∂p(x, t)/∂t = −Σ_i ∂[p(x, t) a_i(x, t)]/∂x_i + (1/2) Σ_{i,j} ∂²[(c(x, t) c(x, t)ᵀ)_{ij} p(x, t)]/∂x_i ∂x_j   (194)

It is sometimes useful to express the equation in terms of the negative divergence (inflow) of a probability current J, caused by a probability velocity V:

∂p(x, t)/∂t = −∇ · J(x, t) = −Σ_i ∂J_i(x, t)/∂x_i   (195)
J(x, t) = p(x, t) V(x, t)   (196)
V_i(x, t) = a_i(x, t) − (1/2) Σ_j k_{i,j}(x) ∂ log(p(x, t) k_{i,j}(x))/∂x_j   (197)
k(x) = c(x, t) c(x, t)ᵀ   (198)

From this point of view the Kolmogorov forward equation is just a law of conservation of probability: the rate of accumulation of probability at a state x equals the inflow of probability due to the probability field V.

11.1 Example: Discretizing an SDE in State/Time

Consider the following SDE:

dX_t = a(X_t) dt + c(X_t) dB_t   (199)

The Kolmogorov forward equation looks as follows:

∂p(x, t)/∂t = −∂[a(x) p(x, t)]/∂x + (1/2) ∂²[c(x)² p(x, t)]/∂x²   (200)

Discretizing in time and space, to first order,

∂p(x, t)/∂t ≈ (1/Δt)(p(x, t + Δt) − p(x, t))   (201)

and, writing a′ = ∂a/∂x and c′ = ∂c/∂x,

∂[a(x) p(x, t)]/∂x ≈ (1/2Δx)(a(x + Δx) p(x + Δx, t) − a(x − Δx) p(x − Δx, t))
 ≈ (1/2Δx)[(a(x) + Δx a′(x)) p(x + Δx, t) − (a(x) − Δx a′(x)) p(x − Δx, t)]
 = p(x + Δx, t)(a(x)/(2Δx) + a′(x)/2) − p(x − Δx, t)(a(x)/(2Δx) − a′(x)/2)   (202)

and

∂²[c²(x) p(x, t)]/∂x² ≈ (1/Δx²)[c²(x + Δx) p(x + Δx, t) + c²(x − Δx) p(x − Δx, t) − 2 c²(x) p(x, t)]   (203)
 ≈ (1/Δx²)[(c²(x) + 2Δx c(x) c′(x)) p(x + Δx, t) + (c²(x) − 2Δx c(x) c′(x)) p(x − Δx, t) − 2 c²(x) p(x, t)]
 = p(x + Δx, t)((c(x)/Δx)² + 2 c(x) c′(x)/Δx) − 2 p(x, t)(c(x)/Δx)² + p(x − Δx, t)((c(x)/Δx)² − 2 c(x) c′(x)/Δx)   (204)

Putting it together, the Kolmogorov forward equation can be approximated as follows:

(p(x, t + Δt) − p(x, t))/Δt = p(x − Δx, t)(a(x)/(2Δx) − a′(x)/2) − p(x + Δx, t)(a(x)/(2Δx) + a′(x)/2)
 + (1/2) p(x + Δx, t)((c(x)/Δx)² + 2 c(x) c′(x)/Δx) − p(x, t)(c(x)/Δx)² + (1/2) p(x − Δx, t)((c(x)/Δx)² − 2 c(x) c′(x)/Δx)   (205)

Rearranging terms,

p(x, t + Δt) = p(x, t)[1 − Δt (c(x)/Δx)²]
 + p(x − Δx, t)(Δt/2Δx)[c²(x)/Δx − 2 c(x) c′(x) + a(x) − Δx a′(x)]
 + p(x + Δx, t)(Δt/2Δx)[c²(x)/Δx + 2 c(x) c′(x) − a(x) − Δx a′(x)]   (206)

Considering a discrete time, discrete state system,

p(X_{t+Δt} = x) = Σ_{x′} p(X_t = x′) p(X_{t+Δt} = x | X_t = x′)   (207)

we make the following discrete time/discrete state approximation:

p(x_{t+Δt} | x_t) =
 (Δt/2Δx)[c²(x)/Δx + 2 c(x) c′(x) − a(x) − Δx a′(x)]  if x_{t+Δt} = x_t − Δx
 (Δt/2Δx)[c²(x)/Δx − 2 c(x) c′(x) + a(x) − Δx a′(x)]  if x_{t+Δt} = x_t + Δx
 1 − Δt (c(x)/Δx)²  if x_{t+Δt} = x_t
 0  otherwise   (208)

Note if the derivative of the drift function is zero, i.e., ∂a(x)/∂x = 0, the conditional probabilities add up to one. [Not sure how to deal with the case in which the derivative is not zero.]

11.2 Girsanov's Theorem (Version I)

Let (Ω, F, P) be a probability space. Let B be a standard m-dimensional Brownian motion adapted to the filtration F_t. Let X, Y be defined by the following SDEs:

dX_t = a(X_t) dt + c(X_t) dB_t   (209)
dY_t = (c(Y_t) U_t + a(Y_t)) dt + c(Y_t) dB_t   (210)
X_0 = Y_0 = x   (211)

where X_t ∈ Rⁿ, B_t ∈ Rᵐ, a and c satisfy the necessary conditions for the SDEs to be well defined, and U_t is an F_t adapted process such that P(∫_0^t ‖c(X_s) U_s‖² ds < ∞) = 1. Let

Z_t = −∫_0^t U_sᵀ dB_s − (1/2) ∫_0^t U_sᵀ U_s ds   (212)
Λ_t = e^{Z_t}   (213)

and

dQ_t = Λ_t dP   (214)

i.e., for all A ∈ F_t,

Q_t(A) = E^P[Λ_t I_A]   (215)

Then

W_t = ∫_0^t U_s ds + B_t   (216)

is a standard Brownian motion with respect to Q_t.

Informal Proof: We provide a heuristic argument for the discrete time case. In discrete time the equation for W = (W_1, ..., W_n) would look as follows:

ΔW_k = U_k Δt + √Δt G_k   (217)

where G_1, G_2, ... are independent standard Gaussian vectors under P. Thus, under P, the log-likelihood of W is as follows:

log p(W) = h(n, Δt) − (1/2Δt) Σ_{k=1}^{n−1} (ΔW_k − U_k Δt)ᵀ (ΔW_k − U_k Δt)   (218)

where h(n, Δt) is constant with respect to W, U. For W to behave as Brownian motion under Q we need the probability density of W under Q to be as follows:

log q(W) = h(n, Δt) − (1/2Δt) Σ_{k=1}^{n−1} ΔW_kᵀ ΔW_k   (219)

Let the random variable Z be defined as follows:

Z = log [q(W)/p(W)]   (220)

where q, p represent the probability densities of W under Q and under P respectively. Thus

Z = −Σ_{k=1}^{n−1} U_kᵀ (ΔW_k − U_k Δt) − (1/2) Σ_{k=1}^{n−1} U_kᵀ U_k Δt   (222)
 = −Σ_{k=1}^{n−1} U_kᵀ √Δt G_k − (1/2) Σ_{k=1}^{n−1} U_kᵀ U_k Δt   (223)

Note as Δt → 0,

Z → −∫_0^t U_sᵀ dB_s − (1/2) ∫_0^t U_sᵀ U_s ds   (224)
q(W) = e^Z p(W)   (225)
Remark 11.1. Note

dY_t = (c(Y_t) U_t + a(Y_t)) dt + c(Y_t) dB_t   (226)
 = a(Y_t) dt + c(Y_t)(U_t dt + dB_t)   (227)
 = a(Y_t) dt + c(Y_t) dW_t   (228)

Therefore the distribution of Y under Q_t is the same as the distribution of X under P, i.e., for all A ∈ F_t,

P(X ∈ A) = Q_t(Y ∈ A)   (229)

or more generally,

E^P[f(X_{0:t})] = E^{Q_t}[f(Y_{0:t})] = E^P[f(Y_{0:t}) Λ_t]   (230)

Remark 11.2. Radon-Nikodym derivative: Λ_t is the Radon-Nikodym derivative of Q_t with respect to P. This is typically represented as follows:

Λ_t = e^{Z_t} = dQ_t/dP   (231)

We can get the derivative of P with respect to Q_t by inverting Λ_t, i.e.,

dP/dQ_t = e^{−Z_t}   (232)

Remark 11.3. Likelihood Ratio: This tells us that Λ_t is the likelihood ratio between the process X_{0:t} and the process Y_{0:t}, i.e., the equivalent of p_X(X)/p_Y(Y), where p_X, p_Y are the probability densities of X and Y.

Remark 11.4. Importance Sampling: Λ_t can be used in importance sampling schemes. Suppose (y^{[1]}, λ^{[1]}), ..., (y^{[n]}, λ^{[n]}) are iid samples from (Y_{0:t}, Λ_t); then we can estimate E^P[f(X_{0:t})] as follows:

E^P[f(X_{0:t})] ≈ (1/n) Σ_{i=1}^{n} f(y^{[i]}) λ^{[i]}   (233)

11.2.1 Girsanov Version II

Let

dX_t = c(X_t) dB_t   (234)
dY_t = b(Y_t) dt + c(Y_t) dB_t   (235)
X_0 = Y_0 = x   (236)

Let

Z_t = ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ dB_s + (1/2) ∫_0^t b(Y_s)ᵀ k(Y_s) b(Y_s) ds   (237)
dP = e^{Z_t} dQ_t   (238)

where

k(Y_s) = (c(Y_s) c(Y_s)ᵀ)⁻¹   (239)

Then under Q_t the process Y has the same distribution as the process X under P.

Proof. We apply Girsanov's version I with a(X_t) = 0 and U_t = c(Y_t)⁻¹ b(Y_t). From Remark 11.2,

Z_t = ∫_0^t U_sᵀ dB_s + (1/2) ∫_0^t U_sᵀ U_s ds = ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ dB_s + (1/2) ∫_0^t b(Y_s)ᵀ k(Y_s) b(Y_s) ds   (240)

satisfies

dP = e^{Z_t} dQ_t   (241)

and under Q_t the process Y looks like the process X, i.e., a process with zero drift.

11.2.2 Girsanov Version III

Let

dX_t = c(X_t) dB_t   (242)
dY_t = b(Y_t) dt + c(Y_t) dB_t   (243)
X_0 = Y_0 = x   (244)

Let

Z_t = ∫_0^t b(Y_s)ᵀ k(Y_s) dY_s − (1/2) ∫_0^t b(Y_s)ᵀ k(Y_s) b(Y_s) ds   (245)
dP = e^{Z_t} dQ_t   (246)

where

k(Y_s) = (c(Y_s) c(Y_s)ᵀ)⁻¹   (247)

Then under Q_t the process Y has the same distribution as the process X under P.

Proof. We apply Girsanov's version II:

Z_t = ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ dB_s + (1/2) ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ c(Y_s)⁻¹ b(Y_s) ds   (248)
 = ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ c(Y_s)⁻¹ (dY_s − b(Y_s) ds) + (1/2) ∫_0^t b(Y_s)ᵀ (c(Y_s)⁻¹)ᵀ c(Y_s)⁻¹ b(Y_s) ds   (249)
 = ∫_0^t b(Y_s)ᵀ k(Y_s) dY_s − (1/2) ∫_0^t b(Y_s)ᵀ k(Y_s) b(Y_s) ds   (250)

and

dP = e^{Z_t} dQ_t   (251)
Informal Discrete Time Based Proof: For a given path x, the ratio of the probability density of x under P and under Q_t can be approximated as follows

\frac{dP(x)}{dQ_t(x)} \approx \prod_k \frac{p(x_{t_{k+1}} - x_{t_k} \mid x_{t_k})}{q(x_{t_{k+1}} - x_{t_k} \mid x_{t_k})}   (252)

where \nu = \{0 = t_0 < t_1 < \cdots < t_n = T\} is a partition of [0, T] and

p(x_{t_{k+1}} \mid x_{t_k}) = G(\Delta x_{t_k} \mid a(x_{t_k}) \Delta t_k,\; \Delta t_k\, k(x_{t_k})^{-1})   (253)
q(x_{t_{k+1}} \mid x_{t_k}) = G(\Delta x_{t_k} \mid 0,\; \Delta t_k\, k(x_{t_k})^{-1})   (254)

where \Delta x_{t_k} = x_{t_{k+1}} - x_{t_k} and G(\cdot \mid \mu, \Sigma) is the multivariate Gaussian density with mean \mu and covariance matrix \Sigma. Thus

\log \frac{dP(x)}{dQ_t(x)} \approx \sum_{k=0}^{n-1} -\frac{1}{2 \Delta t_k} \Big[ (\Delta x_{t_k} - a(x_{t_k}) \Delta t_k)' k(x_{t_k}) (\Delta x_{t_k} - a(x_{t_k}) \Delta t_k) - \Delta x_{t_k}' k(x_{t_k}) \Delta x_{t_k} \Big]
= \sum_{k=0}^{n-1} \Big[ a(x_{t_k})' k(x_{t_k}) \Delta x_{t_k} - \frac{1}{2} a(x_{t_k})' k(x_{t_k}) a(x_{t_k}) \Delta t_k \Big]   (255)

Taking limits as |\nu| \to 0,

\log \frac{dP(x)}{dQ_t(x)} = \int_0^T a(X_t)' k(X_t) dX_t - \frac{1}{2} \int_0^T a(X_t)' k(X_t) a(X_t) dt   (256)
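The discrete-time approximation above also gives a practical recipe for the importance-sampling scheme of Remark 11.4: simulate Y under P, weight each path by \lambda = e^{-Z_t}, and recover an expectation under the law that makes Y driftless. A minimal sketch in the scalar case (the constant drift b = 0.5 and dispersion c = 1 are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 20, 100_000
dt = T / n_steps
b = 0.5  # drift of Y; X is the driftless process dX = dB

# simulate Y under P: dY = b dt + dB
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
dY = b * dt + dB
Y_T = dY.sum(axis=1)

# discrete version of Z_T = int b k dY - (1/2) int b k b ds, with k = 1/c^2 = 1
Z_T = (b * dY).sum(axis=1) - 0.5 * b**2 * T
weights = np.exp(-Z_T)  # lambda = dQ/dP = e^{-Z}

# importance-sampling estimate of E_P[X_T^2]; here X_T ~ N(0, T), so the target is T = 1
est = np.mean(Y_T**2 * weights)
print(est)  # close to 1.0
```

Since the weights are themselves a likelihood ratio, their sample mean should be close to 1, which is a useful sanity check in practice.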

Theorem 11.1. The d\Lambda_t differential. Let X_t be an Ito process of the form

dX_t = a(t, X_t) dt + c(t, X_t) dB_t   (257)

Let

\Lambda_t = e^{Z_t}   (258)
Z_t = \int_0^t a(s, X_s)' k(s, X_s) dX_s - \frac{1}{2} \int_0^t a(s, X_s)' k(s, X_s) a(s, X_s) ds   (259)

where k(t, x) = (c(t, x) c(t, x)')^{-1}. Then

d\Lambda_t = \Lambda_t\, a(t, X_t)' k(t, X_t) dX_t   (260)

Proof. Applying Ito's rule to \Lambda_t = e^{Z_t},

d\Lambda_t = (\partial_z e^{Z_t}) dZ_t + \frac{1}{2} dZ_t (\partial_z^2 e^{Z_t}) dZ_t = \Lambda_t dZ_t + \frac{1}{2} \Lambda_t\, dZ_t dZ_t   (261)

From the definition of Z_t,

dZ_t = a(t, X_t)' k(t, X_t) dX_t - \frac{1}{2} a(t, X_t)' k(t, X_t) a(t, X_t) dt   (262)

Thus (suppressing arguments)

dZ_t dZ_t = dX_t' k a a' k\, dX_t   (263)
= dB_t' c' k a a' k c\, dB_t   (264)
= trace(c' k a a' k c)\, dt   (265)
= trace(c c' k a a' k)\, dt   (266)
= trace(a a' k)\, dt   (267)
= a' k a\, dt   (268)

where we used cc'k = I. Substituting (262) and (268) into (261), the dt terms cancel and

d\Lambda_t = \Lambda_t\, a(t, X_t)' k(t, X_t) dX_t   (269)

12 Zakai's Equation

Let

dX_t = a(X_t) dt + c(X_t) dB_t   (275)
dY_t = g(X_t) dt + h(X_t) dW_t   (276)
\Lambda_t = e^{Z_t}   (277)
Z_t = \int_0^t g(X_s)' k(X_s) dY_s - \frac{1}{2} \int_0^t g(X_s)' k(X_s) g(X_s) ds   (278)
k(X_t) = (h(X_t) h(X_t)')^{-1}   (279)

Using Ito's product rule,

d(f(X_t) \Lambda_t) = \Lambda_t\, df(X_t) + f(X_t)\, d\Lambda_t + df(X_t)\, d\Lambda_t   (280)

where

df(X_t) = \nabla_x f(X_t)' dX_t + \frac{1}{2} trace(c(X_t) c(X_t)' \nabla_x^2 f(X_t)) dt = G_t[f](X_t) dt + \nabla_x f(X_t)' c(X_t) dB_t   (281)
d\Lambda_t = \Lambda_t\, g(X_t)' k(X_t) dY_t   (282)

Following the rules of Ito calculus we note that dX_t dY_t' is an n \times m matrix of zeros, since B and W are independent Brownian motions, so the cross term vanishes. Thus

d(f(X_t) \Lambda_t) = \Lambda_t \big( G_t[f](X_t) dt + \nabla_x f(X_t)' c(X_t) dB_t + f(X_t) g(X_t)' k(X_t) dY_t \big)   (283)

13 Solving Stochastic Differential Equations

Let

dX_t = a(t, X_t) dt + c(t, X_t) dB_t   (284)

Conceptually, this is related to dX_t/dt = a(t, X_t) + c(t, X_t)\, dB_t/dt, where dB_t/dt is white noise. However, dB_t/dt does not exist in the usual sense, since Brownian motion is nowhere differentiable with probability one.

We interpret solving (284) as finding a process X_t that satisfies

X_t = X_0 + \int_0^t a(s, X_s) ds + \int_0^t c(s, X_s) dB_s   (285)

for a given standard Brownian process B. Here X_t is an Ito process with K_s = a(s, X_s) and H_s = c(s, X_s). a(t, X_t) is called the drift function and c(t, X_t) the dispersion function (also called the diffusion or volatility function). Setting c = 0 gives an ordinary differential equation.

Example 1: Geometric Brownian Motion

dX_t = a X_t dt + b X_t dB_t   (286)
X_0 = x_0 > 0   (287)

Using Ito's rule on \log X_t we get

d \log X_t = \frac{1}{X_t} dX_t + \frac{1}{2} \Big( \frac{-1}{X_t^2} \Big) (dX_t)^2   (288)
= \frac{1}{X_t} dX_t - \frac{1}{2} b^2 dt   (289)
= \Big( a - \frac{1}{2} b^2 \Big) dt + b\, dB_t   (290)

Thus

\log X_t = \log X_0 + \Big( a - \frac{1}{2} b^2 \Big) t + b B_t   (291)

and

X_t = X_0\, e^{(a - \frac{1}{2} b^2) t + b B_t}   (292)

Processes of the form

Y_t = Y_0\, e^{\alpha t + \sigma B_t}   (293)

where \alpha and \sigma are constant are called Geometric Brownian Motions. Geometric Brownian motion is characterized by the fact that the log of the process is Brownian motion (with drift). Thus, at each point in time, the distribution of the process is log-normal.

Let's study the dynamics of the average path. First let

Y_t = e^{b B_t}   (294)

Using Ito's rule,

dY_t = b e^{b B_t} dB_t + \frac{1}{2} b^2 e^{b B_t} (dB_t)^2   (295)
Y_t = Y_0 + b \int_0^t Y_s dB_s + \frac{1}{2} b^2 \int_0^t Y_s ds   (296)
E(Y_t) = E(Y_0) + \frac{1}{2} b^2 \int_0^t E(Y_s) ds   (297)
\frac{dE(Y_t)}{dt} = \frac{1}{2} b^2 E(Y_t)   (298)
E(Y_t) = E(Y_0) e^{\frac{1}{2} b^2 t} = e^{\frac{1}{2} b^2 t}   (299)

Thus

E(X_t) = E(X_0)\, e^{(a - \frac{1}{2} b^2) t}\, e^{\frac{1}{2} b^2 t} = E(X_0)\, e^{a t}   (300)

Thus the average path has the same dynamics as the noiseless system. Note the result above is somewhat trivial considering

E(dX_t) = dE(X_t) = E(a(X_t)) dt + E(c(X_t) dB_t)   (301)
dE(X_t) = E(a(X_t)) dt   (302)

and in the linear case

E(a(X_t)) dt = E(a_t X_t + u_t) dt = (a_t E(X_t) + u_t) dt   (303)

These symbolic operations on differentials trace back to the corresponding integral operations they refer to.
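The closed-form solution makes geometric Brownian motion easy to simulate exactly. A sketch checking E(X_t) = E(X_0) e^{at} by Monte Carlo (the parameter values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, X0, t = 0.3, 0.4, 1.0, 1.0
n = 500_000

# exact solution X_t = X0 * exp((a - b^2/2) t + b B_t), with B_t ~ N(0, t)
B_t = rng.normal(0.0, np.sqrt(t), size=n)
X_t = X0 * np.exp((a - 0.5 * b**2) * t + b * B_t)

# sample mean vs the noiseless trajectory X0 * e^{a t}
print(X_t.mean(), X0 * np.exp(a * t))
```

The two printed numbers should agree up to Monte Carlo error, illustrating that the average path ignores the noise entirely.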

14 Linear SDEs

14.1 The Deterministic Case (Linear ODEs)

Constant coefficients: Let x_t \in R^n be defined by the following ODE

\frac{dx_t}{dt} = a x_t + u   (305)

The solution takes the following form:

x_t = e^{at} x_0 + a^{-1} (e^{at} - I) u   (306)

To see why, note

\frac{dx_t}{dt} = a e^{at} x_0 + e^{at} u   (307)

and

a x_t + u = a e^{at} x_0 + e^{at} u - u + u = \frac{dx_t}{dt}   (308)

Example: Let x_t be a scalar such that

\frac{dx_t}{dt} = u - x_t   (309)

Thus

x_t = e^{-t} x_0 - (e^{-t} - 1) u = e^{-t} x_0 + (1 - e^{-t}) u   (310)

Time variant coefficients: Let x_t \in R^n be defined by the following ODE

\frac{dx_t}{dt} = a_t x_t + u_t   (311)
x_0 = \xi   (312)

where u_t is known as the driving, or input, signal. The solution takes the following form:

x_t = \Phi_t x_0 + \Phi_t \int_0^t \Phi_s^{-1} u_s ds   (313)

where \Phi_t is an n \times n matrix, known as the fundamental solution, defined by the following ODE

\frac{d\Phi_t}{dt} = a_t \Phi_t   (314)
\Phi_0 = I_n   (315)

14.2 The Stochastic Case

Linear SDEs have the following form

dX_t = (a_t X_t + u_t) dt + \sum_{i=1}^m (b_{i,t} X_t + v_{i,t}) dB_{i,t}   (316)
= (a_t X_t + u_t) dt + v_t dB_t + \sum_{i=1}^m b_{i,t} X_t dB_{i,t}   (317)
X_0 = \xi   (318)

where X_t is an n-dimensional random vector, B_t = (B_1, ..., B_m)', the b_{i,t} are n \times n matrices, and the v_{i,t} are the n-dimensional column vectors of the n \times m matrix v_t. If b_{i,t} = 0 for all i, t we say that the SDE is linear in the narrow sense. If v_t = 0 for all t we say that the SDE is homogeneous. The solution has the following form

X_t = \Phi_t \Big( X_0 + \int_0^t \Phi_s^{-1} \Big( u_s - \sum_{i=1}^m b_{i,s} v_{i,s} \Big) ds + \int_0^t \Phi_s^{-1} \sum_{i=1}^m v_{i,s} dB_{i,s} \Big)   (319)

where \Phi_t is an n \times n matrix satisfying the following matrix stochastic differential equation

d\Phi_t = a_t \Phi_t dt + \sum_{i=1}^m b_{i,t} \Phi_t dB_{i,t}   (320)
\Phi_0 = I_n   (321)

One property of linear Ito SDEs is that the trajectory of the expected value equals the trajectory of the associated deterministic system with zero noise. This is due to the fact that in the Ito integral the integrand is independent of the integrator dB_t:

E(dX_t) = dE(X_t) = E(a(X_t)) dt + E(c(X_t) dB_t) = E(a(X_t)) dt   (322)

and in the linear case

E(a(X_t)) dt = E(a_t X_t + u_t) dt = (a_t E(X_t) + u_t) dt   (323)

14.3 Solution to the Linear-in-Narrow-Sense SDEs

In this case

dX_t = (a_t X_t + u_t) dt + v_t dB_t   (326)
X_0 = \xi   (327)

where v_1, ..., v_m are the columns of the n \times m matrix v, and B_t is an m-dimensional Brownian motion. In this case the solution has the following form

X_t = \Phi_t \Big( X_0 + \int_0^t \Phi_s^{-1} u_s ds + \int_0^t \Phi_s^{-1} v_s dB_s \Big)   (328)

where \Phi is defined as in the ODE case,

\frac{d\Phi_t}{dt} = a_t \Phi_t   (329)
\Phi_0 = I_n   (330)

White Noise Interpretation: This solution can be interpreted using a symbolic view of white noise as

W_t = \frac{dB_t}{dt}   (331)

and thinking of the SDE as an ordinary ODE with a driving term given by u_t + v_t W_t. We will see later that this interpretation breaks down for the more general linear case with b_t \neq 0.

Moment Equations: Let

\Sigma_{r,s} = E((X_r - m_r)(X_s - m_s)')   (332)
\Sigma_t^2 = \Sigma_{t,t} = Var(X_t)   (333)

Then

\frac{dE(X_t)}{dt} = a_t E(X_t) + u_t   (334)
E(X_t) = \Phi_t E(X_0) + \Phi_t \int_0^t \Phi_s^{-1} u_s ds   (335)
\Sigma_t^2 = \Phi_t \Sigma_0^2 \Phi_t' + \Phi_t \Big( \int_0^t (\Phi_s^{-1} v_s)(\Phi_s^{-1} v_s)' ds \Big) \Phi_t'   (336)
\frac{d\Sigma_t^2}{dt} = a_t \Sigma_t^2 + \Sigma_t^2 a_t' + v_t v_t'   (337)
\Sigma_{r,s} = \Phi_r \Big( \Sigma_0^2 + \int_0^{r \wedge s} (\Phi_t^{-1} v_t)(\Phi_t^{-1} v_t)' dt \Big) \Phi_s'   (338)

where r \wedge s = \min\{r, s\}. Note the mean evolves according to the equivalent ODE with no driving noise.

Constant Coefficients: For

dX_t = (a X_t + u) dt + v\, dB_t   (339)

we have

\Phi_t = e^{at}   (340)
E(X_t) = e^{at} E(X_0) + e^{at} \int_0^t e^{-as} u\, ds   (341)
Var(X_t) = \Sigma_t^2 = e^{at} \Big( \Sigma_0^2 + \int_0^t e^{-as} v v' e^{-a's} ds \Big) e^{a't}   (342)
Example: Linear Boltzmann Machines (Multidimensional OU Process)

dX_t = \theta (\mu - X_t) dt + \sqrt{2}\, dB_t   (343)

where \theta is symmetric and positive definite. Thus in this case a = -\theta, u = \theta \mu, \Phi_t = e^{-\theta t}, and

X_t = e^{-\theta t} X_0 + \int_0^t e^{-\theta (t-s)} \theta \mu\, ds + \sqrt{2} \int_0^t e^{-\theta (t-s)} dB_s   (344)
= e^{-\theta t} X_0 + (I - e^{-\theta t}) \mu + \sqrt{2}\, e^{-\theta t} \int_0^t e^{\theta s} dB_s   (345)

Thus

E(X_t) = e^{-\theta t} E(X_0) + (I - e^{-\theta t}) \mu   (350)
\lim_{t \to \infty} E(X_t) = \mu   (351)

Var(X_t) = \Sigma_t^2 = e^{-\theta t} \Sigma_0^2 e^{-\theta t} + 2 e^{-\theta t} \Big( \int_0^t e^{2 \theta s} ds \Big) e^{-\theta t}   (352)
= e^{-2 \theta t} \Sigma_0^2 + 2 e^{-2 \theta t} \Big[ \frac{1}{2} \theta^{-1} e^{2 \theta s} \Big]_0^t   (353)
= e^{-2 \theta t} \Sigma_0^2 + \theta^{-1} (I - e^{-2 \theta t})   (354)
= \theta^{-1} + e^{-2 \theta t} (\Sigma_0^2 - \theta^{-1})   (355)
\lim_{t \to \infty} Var(X_t) = \theta^{-1}   (356)

where we used the fact that \theta and e^{\theta t} are symmetric matrices and that \int e^{2 \theta t} dt = \frac{1}{2} \theta^{-1} e^{2 \theta t}. If the distribution of X_0 is Gaussian, the distribution continues being Gaussian at all times.

Example: Harmonic Oscillator. Let X_t = (X_{1,t}, X_{2,t})' represent the location and velocity of an oscillator,

dX_t = a X_t dt + (0, b)' dB_t   (358)

where a is a 2 \times 2 matrix coupling location and velocity; thus

X_t = e^{at} X_0 + \int_0^t e^{a(t-s)} (0, b)' dB_s   (359)
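A scalar sketch of the OU example above: simulating with the exact Gaussian transition implied by (345) and checking that the stationary mean and variance approach \mu and \theta^{-1} (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, mu = 2.0, 1.5            # dX = theta*(mu - X) dt + sqrt(2) dB
dt, n_steps, n_paths = 0.01, 1000, 20_000

X = np.full(n_paths, mu + 3.0)  # start far from the stationary mean
decay = np.exp(-theta * dt)
# one-step variance of the exact OU transition: (1 - e^{-2 theta dt}) / theta
noise_sd = np.sqrt((1.0 - np.exp(-2 * theta * dt)) / theta)
for _ in range(n_steps):
    # exact Gaussian transition of the OU process over one step
    X = mu + decay * (X - mu) + noise_sd * rng.normal(size=n_paths)

print(X.mean(), X.var())  # approach mu and 1/theta as t grows
```

After t = 10 the factor e^{-\theta t} is negligible, so the sample mean and variance should sit near the stationary values 1.5 and 0.5.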

14.4 Solution to the General Scalar Linear Case

Here we sketch the proof for the general solution to the scalar linear case. We have X_t \in R defined by the following SDE

dX_t = (a_t X_t + u_t) dt + (b_t X_t + v_t) dB_t   (360)

In this case the solution takes the following form

X_t = \Phi_t \Big( X_0 + \int_0^t \Phi_s^{-1} (u_s - v_s b_s) ds + \int_0^t \Phi_s^{-1} v_s dB_s \Big)   (361)
d\Phi_t = a_t \Phi_t dt + b_t \Phi_t dB_t   (362)
\Phi_0 = 1   (363)

Sketch of the proof:

- Use Ito's rule to show

d\Phi_t^{-1} = (b_t^2 - a_t) \Phi_t^{-1} dt - b_t \Phi_t^{-1} dB_t   (364)

- Use Ito's rule to show that if Y_{1,t}, Y_{2,t} are scalar processes defined as follows

dY_{i,t} = a_i(t, Y_{i,t}) dt + b_i(t, Y_{i,t}) dB_t, for i = 1, 2   (365)

then

d(Y_{1,t} Y_{2,t}) = Y_{1,t} dY_{2,t} + Y_{2,t} dY_{1,t} + b_1(t, Y_{1,t}) b_2(t, Y_{2,t}) dt   (366)

- Use the property above to show that

d(X_t \Phi_t^{-1}) = \Phi_t^{-1} ((u_t - v_t b_t) dt + v_t dB_t)   (367)

- Integrate the above to get (361).
- Use Ito's rule to get

d \log \Phi_t = \Big( a_t - \frac{1}{2} b_t^2 \Big) dt + b_t dB_t   (369)
\log \Phi_t = \int_0^t \Big( a_s - \frac{1}{2} b_s^2 \Big) ds + \int_0^t b_s dB_s   (370)

White Noise Interpretation Does Not Work: The white noise interpretation of the general linear case would be

dX_t = (a_t X_t + u_t) dt + (b_t X_t + v_t) W_t dt   (371)
= (a_t + b_t W_t) X_t dt + (u_t + v_t W_t) dt   (372)

If we interpret this as an ODE with noisy driving terms and coefficients, we would have a solution of the form

X_t = \Phi_t X_0 + \Phi_t \int_0^t \Phi_s^{-1} (u_s + v_s W_s) ds   (373)
= \Phi_t X_0 + \Phi_t \int_0^t \Phi_s^{-1} u_s ds + \Phi_t \int_0^t \Phi_s^{-1} v_s dB_s   (374)

with

d\Phi_t = (a_t + b_t W_t) \Phi_t dt = a_t \Phi_t dt + b_t \Phi_t dB_t   (376)
\Phi_0 = 1   (377)

The ODE solution for \Phi would be of the form

\log \Phi_t = \int_0^t (a_s + b_s W_s) ds = \int_0^t a_s ds + \int_0^t b_s dB_s   (378)

which differs from the Ito SDE solution in (370) by the term -\frac{1}{2} \int_0^t b_s^2 ds.
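Even though the white noise interpretation fails pathwise, the mean of the Ito solution still follows the noiseless ODE. A sketch checking this for the general scalar linear SDE with an Euler-Maruyama simulation (coefficients a = -1, u = 0.5, b = 0.3, v = 0.2 are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
a, u, b, v = -1.0, 0.5, 0.3, 0.2   # dX = (aX + u) dt + (bX + v) dB
x0, T, n_steps, n_paths = 2.0, 1.0, 400, 100_000
dt = T / n_steps

# Euler-Maruyama simulation of the full linear SDE
X = np.full(n_paths, x0)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X = X + (a * X + u) * dt + (b * X + v) * dB

# mean of the noiseless system: dm/dt = a m + u
m_T = np.exp(a * T) * x0 + (u / a) * (np.exp(a * T) - 1.0)
print(X.mean(), m_T)  # the two should agree
```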

14.5 Solution to the General Vectorial Linear Case

Linear SDEs have the following form

dX_t = (a_t X_t + u_t) dt + \sum_{i=1}^m (b_{i,t} X_t + v_{i,t}) dB_{i,t}   (379)
= (a_t X_t + u_t) dt + v_t dB_t + \sum_{i=1}^m b_{i,t} X_t dB_{i,t}   (380)
X_0 = \xi   (381)

where X_t is an n-dimensional random vector, B_t = (B_1, ..., B_m)', the b_{i,t} are n \times n matrices, and the v_{i,t} are the n-dimensional column vectors of the n \times m matrix v_t. The solution has the following form

X_t = \Phi_t \Big( X_0 + \int_0^t \Phi_s^{-1} \Big( u_s - \sum_{i=1}^m b_{i,s} v_{i,s} \Big) ds + \int_0^t \Phi_s^{-1} \sum_{i=1}^m v_{i,s} dB_{i,s} \Big)   (382)

where \Phi_t is an n \times n matrix satisfying the following matrix stochastic differential equation

d\Phi_t = a_t \Phi_t dt + \sum_{i=1}^m b_{i,t} \Phi_t dB_{i,t}   (383)
\Phi_0 = I_n

An explicit solution for \Phi cannot be found in general, even when a and the b_i are constant. However, if in addition to being constant they pairwise commute, i.e., a b_i = b_i a and b_i b_j = b_j b_i for all i, j, then

\Phi_t = \exp \Big( \Big( a - \frac{1}{2} \sum_{i=1}^m b_i^2 \Big) t + \sum_{i=1}^m b_i B_{i,t} \Big)   (384)

Moment Equations: Let m_t = E(X_t) and s_t = E(X_t X_t'). Then

\frac{dm_t}{dt} = a_t m_t + u_t   (385)
\frac{ds_t}{dt} = a_t s_t + s_t a_t' + \sum_{i=1}^m b_{i,t} s_t b_{i,t}' + u_t m_t' + m_t u_t' + \sum_{i=1}^m \big( b_{i,t} m_t v_{i,t}' + v_{i,t} m_t' b_{i,t}' + v_{i,t} v_{i,t}' \big)   (386)

The first moment equation can be obtained by taking expected values in (379). Note it is equivalent to the differential equation one would obtain for the deterministic part of the original system. For the second moment equation apply Ito's product rule

d(X_t X_t') = X_t dX_t' + (dX_t) X_t' + (dX_t)(dX_t)'
= X_t dX_t' + (dX_t) X_t' + \sum_{i=1}^m (b_{i,t} X_t + v_{i,t})(b_{i,t} X_t + v_{i,t})' dt   (388)

substitute dX_t for its value in (379), and take expected values.

Example: Multidimensional Geometric Brownian Motion

dX_{i,t} = a_i X_{i,t} dt + X_{i,t} \sum_{j=1}^n b_{i,j} dB_{j,t}   (389)

for i = 1, ..., n. Using Ito's rule,

d \log X_{i,t} = \frac{1}{X_{i,t}} dX_{i,t} + \frac{1}{2} \Big( \frac{-1}{X_{i,t}^2} \Big) (dX_{i,t})^2 = \Big( a_i - \frac{1}{2} \sum_j b_{i,j}^2 \Big) dt + \sum_j b_{i,j} dB_{j,t}   (390)

Thus

X_{i,t} = X_{i,0} \exp \Big( \Big( a_i - \frac{1}{2} \sum_{j=1}^n b_{i,j}^2 \Big) t + \sum_{j=1}^n b_{i,j} B_{j,t} \Big)   (392)
\log X_{i,t} = \log X_{i,0} + \Big( a_i - \frac{1}{2} \sum_{j=1}^n b_{i,j}^2 \Big) t + \sum_{j=1}^n b_{i,j} B_{j,t}   (393)

and thus X_{i,t} has a log-normal distribution.

15 Important SDE Models

Stock Prices: Exponential Brownian Motion with Drift

dX_t = a X_t dt + b X_t dB_t   (394)

Vasicek (1977) interest rate model: OU process

dX_t = \alpha (\mu - X_t) dt + b\, dB_t   (395)

Cox-Ingersoll-Ross (1985) interest rate model

dX_t = \alpha (\mu - X_t) dt + b \sqrt{X_t}\, dB_t   (396)

Constant Elasticity of Variance process

dX_t = \mu X_t dt + \sigma X_t^{\gamma} dB_t, \gamma \geq 0   (397)

Generalized Cox-Ingersoll-Ross model for short term interest rates, proposed by Chan et al. (1992):

dX_t = (\theta_0 - \theta_1 X_t) dt + \sigma X_t^{\gamma} dB_t, for \sigma, \gamma > 0   (398)

Let

\tilde{X}_t = \frac{X_t^{1-\gamma}}{\sigma (1 - \gamma)}   (399)

Then

d\tilde{X}_t = a(\tilde{X}_t) dt + dB_t   (400)

where

a(\tilde{x}) = \frac{\theta_0 - \theta_1 x}{\sigma x^{\gamma}} - \frac{1}{2} \gamma \sigma x^{\gamma - 1}   (401)

with

x = (\sigma (1 - \gamma) \tilde{x})^{1/(1-\gamma)}   (402)

A special case is \gamma = 0.5. In this case the process increments are known to have a non-central chi-squared distribution (Cox, Ingersoll, Ross, 1985).

Logistic Growth Model I:

dX_t = a X_t (1 - X_t / k) dt + b X_t dB_t   (403)

The solution is

X_t = \frac{X_0 \exp \{ (a - b^2/2) t + b B_t \}}{1 + \frac{X_0}{k} a \int_0^t \exp \{ (a - b^2/2) s + b B_s \} ds}   (404)

Model II:

dX_t = a X_t (1 - X_t / k) dt + b X_t (1 - X_t / k) dB_t   (405)

Model III:

dX_t = r X_t (1 - X_t / k) dt + s r X_t dB_t   (406)

In all the models r (respectively a) is the Malthusian growth constant and k the carrying capacity of the environment. In model II, k is unattainable. In the other models X_t can take arbitrarily large values with nonzero probability.

Gompertz Growth:

dX_t = \theta X_t dt + r X_t \log \frac{k}{X_t} dt + b X_t dB_t   (407)

where r is the Malthusian growth constant and k the carrying capacity. For \theta = 0 we get the Skiadas version of the model; for r = 0 we get the lognormal model. Using Ito's rule on Y_t = e^{rt} \log X_t we can show the expected value follows

E(X_t) = \exp \big( \log(x_0) e^{-rt} \big) \exp \Big( \frac{\lambda}{r} (1 - e^{-rt}) \Big) \exp \Big( \frac{b^2}{4r} (1 - e^{-2rt}) \Big)   (408)

where

\lambda = \theta - \frac{b^2}{2}   (409)

[Author's note: something fishy in the expected value formula. Try \theta = b = 0!]

16 Stratonovich and Ito SDEs

Stochastic differential equations are convenient pointers to their corresponding stochastic integral equations. The two most popular stochastic integrals are the Ito and the Stratonovich versions. The advantage of the Ito integral is that the integrand is independent of the integrator, and thus the integral is a martingale. The advantage of the Stratonovich definition is that it does not require changing the rules of standard calculus. The Ito interpretation of

dX_t = f(t, X_t) dt + \sum_{j=1}^m g_j(t, X_t) dB_{j,t}   (410)

is equivalent to the Stratonovich equation

dX_t = f(t, X_t) dt - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^m g_{i,j}(t, X_t) \frac{\partial g_j}{\partial x_i}(t, X_t) dt + \sum_{j=1}^m g_j(t, X_t) dB_{j,t}   (411)

and the Stratonovich interpretation of

dX_t = f(t, X_t) dt + \sum_{j=1}^m g_j(t, X_t) dB_{j,t}   (412)

is equivalent to the Ito equation

dX_t = f(t, X_t) dt + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^m g_{i,j}(t, X_t) \frac{\partial g_j}{\partial x_i}(t, X_t) dt + \sum_{j=1}^m g_j(t, X_t) dB_{j,t}   (413)
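For scalar multiplicative noise g(x) = b x, the conversion above implies that the Stratonovich equation dX_t = a X_t dt + b X_t dB_t (Stratonovich sense) solves, by ordinary calculus, X_t = X_0 e^{at + bB_t}. A sketch comparing a Heun-type midpoint scheme, which converges to the Stratonovich solution, against that closed form on a single path (the scheme and parameter values are my own illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b, x0, T, n = 0.2, 0.5, 1.0, 1.0, 20_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)

# Heun (midpoint-in-diffusion) scheme for the Stratonovich SDE dX = aX dt + bX dB
X = x0
for db in dB:
    X_pred = X + a * X * dt + b * X * db        # Euler predictor
    X = X + a * X * dt + 0.5 * b * (X + X_pred) * db

# ordinary calculus gives the Stratonovich solution x0 * exp(a t + b B_t)
exact = x0 * np.exp(a * T + b * dB.sum())
print(X, exact)  # pathwise agreement up to discretization error
```

Running the same increments through a plain Euler-Maruyama scheme would instead converge to the Ito solution x0 e^{(a - b^2/2)t + bB_t}, which makes the drift correction visible numerically.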

17 SDEs and Diffusions

Diffusions are processes governed by the Fokker-Planck-Kolmogorov equation. All Ito SDEs are diffusions, i.e., they follow the FPK equation. There are diffusions that are not Ito diffusions, i.e., they cannot be described by an Ito SDE. Example: diffusions with reflecting boundaries.

18 Appendix I: Numerical Methods

18.1 Simulating Brownian Motion

18.1.1 Infinitesimal piecewise linear path segments

Get n independent standard Gaussian variables \{Z_1, ..., Z_n\}, i.e., E(Z_i) = 0, Var(Z_i) = 1. Define the stochastic process \hat{B} as follows

\hat{B}_{t_0} = 0   (414)
\hat{B}_{t_1} = \hat{B}_{t_0} + \sqrt{t_1 - t_0}\, Z_1   (415)
...
\hat{B}_{t_k} = \hat{B}_{t_{k-1}} + \sqrt{t_k - t_{k-1}}\, Z_k   (417)

Moreover,

\hat{B}_t = \hat{B}_{t_{k-1}} for t \in [t_{k-1}, t_k)   (418)

This defines a continuous time process that converges in distribution to Brownian motion as n \to \infty.

18.1.2 Linear Interpolation

Same as above but linearly interpolating the starting points of the path segments:

\hat{B}_t = \hat{B}_{t_{k-1}} + (t - t_{k-1}) \frac{\hat{B}_{t_k} - \hat{B}_{t_{k-1}}}{t_k - t_{k-1}} for t \in [t_{k-1}, t_k)   (419)

Note this approach is non-causal, in that it looks into the future. I believe it is inconsistent with the Ito interpretation and converges to Stratonovich solutions.
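The construction in 18.1.1 is a few lines of NumPy: cumulative sums of \sqrt{\Delta t_k} Z_k give the grid values of \hat{B}. A sketch with sanity checks against the defining properties of Brownian motion (grid and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
T, n, n_paths = 1.0, 1000, 50_000
t = np.linspace(0.0, T, n + 1)

# cumulative sums of sqrt(dt) * Z_k give B at the grid points (Eqs. 414-417)
Z = rng.normal(size=(n_paths, n))
B = np.concatenate([np.zeros((n_paths, 1)),
                    np.cumsum(np.sqrt(np.diff(t)) * Z, axis=1)], axis=1)

# B_T should have mean ~0 and variance ~T; Cov(B_s, B_t) should be ~min(s, t)
print(B[:, -1].mean(), B[:, -1].var())
print(np.cov(B[:, n // 2], B[:, -1])[0, 1])  # s = 0.5, t = 1, so ~0.5
```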

18.1.3 Fourier sine synthesis

\hat{B}_t(\omega) = \sum_{k=0}^{n-1} Z_k(\omega) \phi_k(t)

where the Z_k(\omega) are the same random variables as in the previous approach and

\phi_k(t) = \frac{2 \sqrt{2T}}{(2k+1) \pi} \sin \frac{(2k+1) \pi t}{2T}

As n \to \infty, \hat{B} converges to Brownian motion in distribution. Note this approach is non-causal, in that it looks into the future. I believe it is inconsistent with the Ito interpretation and converges to Stratonovich solutions.
18.2 Simulating SDEs

Our goal is to simulate

dX_t = a(X_t) dt + c(X_t) dB_t, 0 \leq t \leq T   (420)
X_0 = \xi

Order of Convergence: Let 0 = t_0 < t_1 < \cdots < t_n = T and \Delta t = \max_k (t_k - t_{k-1}). A method is said to have strong order of convergence \gamma if there is a constant K such that

\sup_{t_k} E|X_{t_k} - \hat{X}_k| < K (\Delta t)^{\gamma}   (421)

A method is said to have weak order of convergence \gamma if there is a constant K such that

\sup_{t_k} |E[X_{t_k}] - E[\hat{X}_k]| < K (\Delta t)^{\gamma}   (422)

Euler-Maruyama Method:

\hat{X}_k = \hat{X}_{k-1} + a(\hat{X}_{k-1})(t_k - t_{k-1}) + c(\hat{X}_{k-1})(B_k - B_{k-1})   (423)
B_k = B_{k-1} + \sqrt{t_k - t_{k-1}}\, Z_k   (424)

where Z_1, ..., Z_n are independent standard Gaussian random variables. The Euler-Maruyama method has strong convergence of order \gamma = 1/2, which is poorer than the convergence of the Euler method in the deterministic case, which is of order \gamma = 1. The Euler-Maruyama method has weak convergence of order \gamma = 1.

Milstein's Higher Order Method: It is based on a higher order truncation of the Ito-Taylor expansion:

\hat{X}_k = \hat{X}_{k-1} + a(\hat{X}_{k-1})(t_k - t_{k-1}) + c(\hat{X}_{k-1})(B_k - B_{k-1}) + \frac{1}{2} c(\hat{X}_{k-1}) \partial_x c(\hat{X}_{k-1}) \big( (B_k - B_{k-1})^2 - (t_k - t_{k-1}) \big)   (425)
B_k = B_{k-1} + \sqrt{t_k - t_{k-1}}\, Z_k   (426)

where Z_1, ..., Z_n are independent standard Gaussian random variables. This method has strong convergence of order \gamma = 1.
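A sketch comparing the two schemes on geometric Brownian motion, where the exact solution is known and the strong error can be measured directly (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, x0, T = 1.5, 1.0, 1.0, 1.0   # dX = aX dt + bX dB
n_steps, n_paths = 256, 20_000
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X_em = np.full(n_paths, x0)
X_mil = np.full(n_paths, x0)
for k in range(n_steps):
    db = dB[:, k]
    X_em = X_em + a * X_em * dt + b * X_em * db
    # Milstein adds (1/2) c c' ((dB)^2 - dt); for c(x) = b x, c c' = b^2 x
    X_mil = (X_mil + a * X_mil * dt + b * X_mil * db
             + 0.5 * b**2 * X_mil * (db**2 - dt))

# exact GBM solution driven by the same Brownian increments
X_exact = x0 * np.exp((a - 0.5 * b**2) * T + b * dB.sum(axis=1))
err_em = np.abs(X_em - X_exact).mean()
err_mil = np.abs(X_mil - X_exact).mean()
print(err_em, err_mil)  # the Milstein error is markedly smaller
```

Halving dt should roughly halve the Milstein error but shrink the Euler-Maruyama error only by a factor of \sqrt{2}, reflecting the strong orders 1 and 1/2.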

19 History

- The first version of this document, which was 17 pages long, was written by Javier R. Movellan in 1999.
- The document was made open source under the GNU Free Documentation License 1.1 on August 12, 2002, as part of the Kolmogorov Project.
- October 9, 2003: Javier R. Movellan changed the license to GFDL 1.2 and included an endorsement section.
- March 8, 2004: Javier added the Optimal Stochastic Control section, based on Oksendal's book.
- September 18, 2005: Javier. Some cleaning up of the Ito Rule section. Got rid of the "solving symbolically" section.
- January 15, 2006: Javier added new material to the Ito rule section. Added the Linear SDE sections. Added the Boltzmann Machine sections.
- January/February 2011: Major reorganization of the tutorial.
