
LECTURE NOTES

MEASURE THEORY and PROBABILITY


Rodrigo Bañuelos
Department of Mathematics
Purdue University
West Lafayette, IN 47907
June 20, 2003
I
SIGMA ALGEBRAS AND MEASURES
1 $\sigma$-Algebras: Definitions and Notation.
We use $\Omega$ to denote an abstract space, that is, a collection of objects called points. These points are denoted by $\omega$. We use the standard notation: for $A, B \subset \Omega$, we denote by $A \cup B$ their union, $A \cap B$ their intersection, $A^c$ the complement of $A$, $A \setminus B = A - B = \{x \in A : x \notin B\} = A \cap B^c$, and $A \triangle B = (A \setminus B) \cup (B \setminus A)$.

If $A_1 \subset A_2 \subset \cdots$ and $A = \bigcup_{n=1}^{\infty} A_n$, we will write $A_n \uparrow A$. If $A_1 \supset A_2 \supset \cdots$ and $A = \bigcap_{n=1}^{\infty} A_n$, we will write $A_n \downarrow A$. Recall that $(\bigcup_n A_n)^c = \bigcap_n A_n^c$ and $(\bigcap_n A_n)^c = \bigcup_n A_n^c$. With this notation we see that $A_n \uparrow A \Rightarrow A_n^c \downarrow A^c$ and $A_n \downarrow A \Rightarrow A_n^c \uparrow A^c$. If $A_1, \ldots, A_n \subset \Omega$, we can write

$$\bigcup_{j=1}^{n} A_j = A_1 \cup (A_1^c \cap A_2) \cup (A_1^c \cap A_2^c \cap A_3) \cup \cdots \cup (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n), \tag{1.1}$$

which is a disjoint union of sets. In fact, this can be done for infinitely many sets:

$$\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n). \tag{1.2}$$

If $A_n \uparrow$, then

$$\bigcup_{j=1}^{n} A_j = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \cdots \cup (A_n \setminus A_{n-1}). \tag{1.3}$$
Two sets which play an important role in studying convergence questions are

$$\overline{\lim}\, A_n = \limsup_{n} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \tag{1.4}$$

and

$$\underline{\lim}\, A_n = \liminf_{n} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k. \tag{1.5}$$

Notice

$$(\overline{\lim}\, A_n)^c = \Big(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k\Big)^c = \bigcup_{n=1}^{\infty} \Big(\bigcup_{k=n}^{\infty} A_k\Big)^c = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c = \underline{\lim}\, A_n^c.$$

Also, $x \in \overline{\lim}\, A_n$ if and only if $x \in \bigcup_{k=n}^{\infty} A_k$ for all $n$. Equivalently, for all $n$ there is at least one $k \geq n$ such that $x \in A_k$. That is, $x \in A_n$ for infinitely many $n$. For this reason, when $x \in \overline{\lim}\, A_n$ we say that $x$ belongs to infinitely many of the $A_n$'s, and write this as $x \in A_n$ i.o. If $x \in \underline{\lim}\, A_n$, this means that $x \in \bigcap_{k=n}^{\infty} A_k$ for some $n$ or, equivalently, $x \in A_k$ for all $k \geq n$. For this reason, when $x \in \underline{\lim}\, A_n$ we say that $x \in A_n$ eventually. We will see connections to $\overline{\lim}\, x_k$ and $\underline{\lim}\, x_k$, where $\{x_k\}$ is a sequence of points, later.
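As a concrete illustration of (1.4) and (1.5), the limsup and liminf of an eventually periodic sequence of finite sets can be computed directly from the definitions. This is a small Python sketch (the names `A`, `limsup`, `liminf` are ours, and the truncation at `N` is exact here because the sequence has period 2):

```python
from functools import reduce

# A_n alternates between two sets; the sequence is periodic, so the
# tail unions/intersections stabilize and a finite truncation is exact.
def A(n):
    return {0, 1} if n % 2 == 0 else {1, 2}

N = 10  # enough terms, since the sequence has period 2

# limsup A_n = intersection over n of union over k >= n of A_k:
# the points lying in infinitely many A_n
limsup = reduce(set.intersection,
                (set.union(*(A(k) for k in range(n, N))) for n in range(N // 2)))

# liminf A_n = union over n of intersection over k >= n of A_k:
# the points eventually in every A_n
liminf = reduce(set.union,
                (set.intersection(*(A(k) for k in range(n, N))) for n in range(N // 2)))

print(limsup)  # {0, 1, 2} -- each point occurs infinitely often
print(liminf)  # {1} -- only 1 belongs to A_n eventually
```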
Definition 1.1. Let $\mathcal{F}$ be a collection of subsets of $\Omega$. $\mathcal{F}$ is called a field (algebra) if $\Omega \in \mathcal{F}$ and $\mathcal{F}$ is closed under complementation and finite union. That is,

(i) $\Omega \in \mathcal{F}$,

(ii) $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$,

(iii) $A_1, A_2, \ldots, A_n \in \mathcal{F} \Rightarrow \bigcup_{j=1}^{n} A_j \in \mathcal{F}$.

If in addition (iii) can be replaced by countable unions, that is, if

(iv) $A_1, \ldots, A_n, \ldots \in \mathcal{F} \Rightarrow \bigcup_{j=1}^{\infty} A_j \in \mathcal{F}$,

then $\mathcal{F}$ is called a $\sigma$-algebra, or often also a $\sigma$-field.

Here are three simple examples of $\sigma$-algebras.

(i) $\mathcal{F} = \{\emptyset, \Omega\}$,

(ii) $\mathcal{F} = \{\text{all subsets of } \Omega\}$,

(iii) If $A \subset \Omega$, $\mathcal{F} = \{\emptyset, \Omega, A, A^c\}$.
An example of an algebra which is not a $\sigma$-algebra is given by the following. Let $\Omega = \mathbb{R}$, the real numbers, and take $\mathcal{F}$ to be the collection of all finite disjoint unions of intervals of the form $(a, b] = \{x : a < x \leq b\}$, $-\infty \leq a < b < \infty$. By convention we also count $(a, \infty)$ as right-semiclosed. $\mathcal{F}$ is an algebra but not a $\sigma$-algebra. Set

$$A_n = \Big(0,\ 1 - \frac{1}{n}\Big].$$

Then

$$\bigcup_{n=1}^{\infty} A_n = (0, 1) \notin \mathcal{F}.$$

The convention is important here because $(a, b]^c = (b, \infty) \cup (-\infty, a]$.

Remark 1.1. We will refer to the pair $(\Omega, \mathcal{F})$ as a measurable space. The reason for this will become clear in the next section when we introduce measures.

Definition 1.2. Given any collection $\mathcal{A}$ of subsets of $\Omega$, let $\sigma(\mathcal{A})$ be the smallest $\sigma$-algebra containing $\mathcal{A}$. That is, if $\mathcal{F}$ is another $\sigma$-algebra and $\mathcal{A} \subset \mathcal{F}$, then $\sigma(\mathcal{A}) \subset \mathcal{F}$.
Is there such a $\sigma$-algebra? The answer is, of course, yes. In fact,

$$\sigma(\mathcal{A}) = \bigcap \mathcal{F},$$

where the intersection is taken over all the $\sigma$-algebras $\mathcal{F}$ containing the collection $\mathcal{A}$. This collection is not empty, since $\mathcal{A} \subset \{\text{all subsets of } \Omega\}$, which is a $\sigma$-algebra. We call $\sigma(\mathcal{A})$ the $\sigma$-algebra generated by $\mathcal{A}$. If $\mathcal{F}_0$ is an algebra, we often write $\overline{\mathcal{F}}_0$ for $\sigma(\mathcal{F}_0)$.

Example 1.1. $\mathcal{A} = \{A\}$, $A \subset \Omega$. Then

$$\sigma(\mathcal{A}) = \{\emptyset, A, A^c, \Omega\}.$$
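On a finite $\Omega$ the $\sigma$-algebra generated by a collection can be computed by brute force: start from the collection together with $\emptyset$ and $\Omega$, and close under complements and pairwise unions until nothing new appears (finite unions suffice here, since $\Omega$ is finite). A small Python sketch, with `generated_sigma_algebra` a name of our own choosing:

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Close `collection` under complementation and union until stable.
    On a finite omega this yields sigma(A)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for s in current:
            c = omega - s                    # close under complements
            if c not in sigma:
                sigma.add(c); changed = True
        for s, t in combinations(current, 2):
            u = s | t                        # close under unions
            if u not in sigma:
                sigma.add(u); changed = True
    return sigma

# Example 1.1: the sigma-algebra generated by a single set A
omega = {1, 2, 3, 4}
A = {1, 2}
print(sorted(map(sorted, generated_sigma_algebra(omega, [A]))))
# [[], [1, 2], [1, 2, 3, 4], [3, 4]]
```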
Problem 1.1. Let $\mathcal{A}$ be a collection of subsets of $\Omega$ and $A \subset \Omega$. Set $\mathcal{A} \cap A = \{B \cap A : B \in \mathcal{A}\}$. Assume $\sigma(\mathcal{A}) = \mathcal{F}$. Show that $\sigma(\mathcal{A} \cap A) = \mathcal{F} \cap A$, relative to $A$.

Definition 1.3. Let $\Omega = \mathbb{R}$ and $\mathcal{B}_0$ the field of right-semiclosed intervals. Then $\sigma(\mathcal{B}_0) = \mathcal{B}$ is called the Borel $\sigma$-algebra of $\mathbb{R}$.

Problem 1.2. Prove that every open set in $\mathbb{R}$ is the countable union of right-semiclosed intervals.

Problem 1.3. Prove that every open set is in $\mathcal{B}$.

Problem 1.4. Prove that $\mathcal{B} = \sigma(\{\text{all open intervals}\})$.

Remark 1.2. The above construction works equally well in $\mathbb{R}^d$, where we take $\mathcal{B}_0$ to be the family of all intervals of the form

$$(a_1, b_1] \times \cdots \times (a_d, b_d], \quad -\infty \leq a_i < b_i < \infty.$$
2. Measures.

Definition 2.1. Let $(\Omega, \mathcal{F})$ be a measurable space. By a measure on this space we mean a function $\mu: \mathcal{F} \to [0, \infty]$ with the properties

(i) $\mu(\emptyset) = 0$

and

(ii) if $A_j \in \mathcal{F}$ are disjoint, then

$$\mu\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} \mu(A_j).$$
Remark 2.1. We will refer to the triple $(\Omega, \mathcal{F}, \mu)$ as a measure space. If $\mu(\Omega) = 1$ we refer to it as a probability space and often write this as $(\Omega, \mathcal{F}, P)$.

Example 2.1. Let $\Omega$ be a countable set and let $\mathcal{F}$ be the collection of all subsets of $\Omega$. Denote by $\#A$ the number of points in $A$. Define $\mu(A) = \#A$. This is called the counting measure. If $\Omega$ is a finite set with $n$ points and we define $P(A) = \frac{1}{n}\#A$, then we get a probability measure. Concrete examples of these are:

(i) Coin flips. Let $\Omega = \{0, 1\} = \{\text{Heads}, \text{Tails}\} = \{H, T\}$ and set $P\{0\} = 1/2$ and $P\{1\} = 1/2$.

(ii) Rolling a die. $\Omega = \{1, 2, 3, 4, 5, 6\}$, $P\{\omega\} = 1/6$.

Of course, these are nothing but two very simple examples of probability spaces, and our goal now is to enlarge this collection. First, we list several elementary properties of general measures.
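The die example can be checked in a few lines; here is a minimal sketch of the uniform probability measure $P(A) = \#A/n$ on a finite $\Omega$, using exact arithmetic via `fractions`:

```python
from fractions import Fraction

# Uniform probability on a finite Omega: P(A) = #A / #Omega
omega = {1, 2, 3, 4, 5, 6}          # rolling a die
P = lambda A: Fraction(len(A & omega), len(omega))

even = {2, 4, 6}
print(P(even))            # 1/2
print(P({1}) + P({2}))    # 1/3: additivity on disjoint sets
print(P(omega))           # 1 -- a probability measure
```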
Proposition 2.1. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Assume all sets mentioned below are in $\mathcal{F}$.

(i) If $A \subset B$, then $\mu(A) \leq \mu(B)$ (monotonicity).

(ii) If $A \subset \bigcup_{j=1}^{\infty} A_j$, then $\mu(A) \leq \sum_{j=1}^{\infty} \mu(A_j)$ (subadditivity).

(iii) If $A_j \uparrow A$, then $\mu(A_j) \uparrow \mu(A)$ (continuity from below).

(iv) If $A_j \downarrow A$ and $\mu(A_1) < \infty$, then $\mu(A_j) \downarrow \mu(A)$ (continuity from above).

Remark 2.2. The finiteness assumption in (iv) is needed. To see this, set $\Omega = \{1, 2, 3, \ldots\}$ and let $\mu$ be the counting measure. Let $A_j = \{j, j+1, \ldots\}$. Then $A_j \downarrow \emptyset$ but $\mu(A_j) = \infty$ for all $j$.
Proof. Write $B = A \cup (B \setminus A)$. Then

$$\mu(B) = \mu(A) + \mu(B \setminus A) \geq \mu(A),$$

which proves (i). As a side remark here, note that if $\mu(A) < \infty$, we have $\mu(B \setminus A) = \mu(B) - \mu(A)$. Next, recall that

$$\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} (A_1^c \cap \cdots \cap A_{n-1}^c \cap A_n),$$

where the sets in the last union are disjoint. Therefore,

$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n \cap A_1^c \cap \cdots \cap A_{n-1}^c) \leq \sum_{n=1}^{\infty} \mu(A_n),$$

which combined with (i) proves (ii).

For (iii), observe that if $A_n \uparrow A$, then (with $A_0 = \emptyset$)

$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \mu\Big(\bigcup_{n=1}^{\infty} (A_n \setminus A_{n-1})\Big) = \sum_{n=1}^{\infty} \mu(A_n \setminus A_{n-1}) = \lim_{m \to \infty} \sum_{n=1}^{m} \mu(A_n \setminus A_{n-1}) = \lim_{m \to \infty} \mu\Big(\bigcup_{n=1}^{m} A_n\Big) = \lim_{m \to \infty} \mu(A_m).$$

For (iv) we observe that if $A_n \downarrow A$, then $A_1 \setminus A_n \uparrow A_1 \setminus A$. By (iii), $\mu(A_1 \setminus A_n) \uparrow \mu(A_1 \setminus A)$, and since $\mu(A_1 \setminus A_n) = \mu(A_1) - \mu(A_n)$, we see that $\mu(A_1) - \mu(A_n) \to \mu(A_1) - \mu(A)$, from which the result follows assuming the finiteness of $\mu(A_1)$.
Definition 2.2. A Lebesgue–Stieltjes measure on $\mathbb{R}$ is a measure $\mu$ on $\mathcal{B} = \sigma(\mathcal{B}_0)$ such that $\mu(I) < \infty$ for each bounded interval $I$. By an extended distribution function on $\mathbb{R}$ we shall mean a map $F: \mathbb{R} \to \mathbb{R}$ that is increasing, $F(a) \leq F(b)$ if $a < b$, and right continuous, $\lim_{x \to x_0^+} F(x) = F(x_0)$. If in addition the function $F$ is nonnegative and satisfies $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$, we shall simply call it a distribution function.

We will show that the formula $\mu(a, b] = F(b) - F(a)$ sets up a 1-1 correspondence between the Lebesgue–Stieltjes measures and the extended distributions, where two extended distributions that differ by a constant are identified. Of course, probability measures correspond to distributions.
Proposition 2.2. Let $\mu$ be a Lebesgue–Stieltjes measure on $\mathbb{R}$. Define $F: \mathbb{R} \to \mathbb{R}$, up to additive constants, by $F(b) - F(a) = \mu(a, b]$. For example, fix $F(0)$ arbitrary and set $F(x) - F(0) = \mu(0, x]$ for $x \geq 0$, and $F(0) - F(x) = \mu(x, 0]$ for $x < 0$. Then $F$ is an extended distribution.

Proof. Let $a < b$. Then $F(b) - F(a) = \mu(a, b] \geq 0$. Also, if $\{x_n\}$ is such that $x_1 > x_2 > \cdots \downarrow x$, then $\mu(x, x_n] \to 0$ by Proposition 2.1 (iv), since $\bigcap_{n=1}^{\infty} (x, x_n] = \emptyset$ and $\mu(x, x_1] < \infty$. Thus $F(x_n) - F(x) \to 0$, implying that $F$ is right continuous.

We should notice also that

$$\mu\{b\} = \lim_{n \to \infty} \mu\Big(b - \frac{1}{n},\ b\Big] = \lim_{n \to \infty} \big(F(b) - F(b - 1/n)\big) = F(b) - F(b-).$$

Hence, in fact, $F$ is continuous at $b$ if and only if $\mu\{b\} = 0$.
Problem 2.1. Set $F(x-) = \lim_{y \to x^-} F(y)$. Prove that

$$\mu(a, b) = F(b-) - F(a) \tag{1}$$
$$\mu[a, b] = F(b) - F(a-) \tag{2}$$
$$\mu[a, b) = F(b-) - F(a-) \tag{3}$$
$$\mu(\mathbb{R}) = F(\infty) - F(-\infty). \tag{4}$$
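The formulas of Proposition 2.2 and Problem 2.1 are easy to experiment with numerically. The sketch below (our own construction, not part of the notes) takes a distribution function $F$ with a single jump at $0$ and recovers interval measures and the atom from $F$ and its left limits; `F_minus` approximates $F(x-)$ with a small offset, which is accurate for this piecewise $F$ up to floating-point noise:

```python
# Lebesgue-Stieltjes measure of intervals from a distribution function F,
# using the formulas of Problem 2.1.
def F(x):
    return 0.0 if x < 0 else x + 1.0   # jump of size 1 at x = 0

def F_minus(x, eps=1e-12):
    return F(x - eps)                  # numerical left limit F(x-)

mu_hoc = lambda a, b: F(b) - F(a)                 # mu(a, b]
mu_open = lambda a, b: F_minus(b) - F(a)          # mu(a, b)
mu_closed = lambda a, b: F(b) - F_minus(a)        # mu[a, b]
mu_point = lambda b: F(b) - F_minus(b)            # mu{b}

print(mu_point(0.0))      # ~1.0: F jumps at 0
print(mu_point(0.5))      # ~0.0: F continuous at 0.5
print(mu_hoc(-1.0, 1.0))  # 2.0 = jump (1) + Lebesgue part (1)
```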
Theorem 2.1. Suppose $F$ is a distribution function on $\mathbb{R}$. There is a unique measure $\mu$ on $\mathcal{B}(\mathbb{R})$ such that $\mu(a, b] = F(b) - F(a)$.

Definition 2.3. Suppose $\mathcal{A}$ is an algebra. $\mu$ is a measure on $\mathcal{A}$ if $\mu: \mathcal{A} \to [0, \infty]$, $\mu(\emptyset) = 0$, and if $A_1, A_2, \ldots$ are disjoint with $A = \bigcup_{j=1}^{\infty} A_j \in \mathcal{A}$, then $\mu(A) = \sum_{j=1}^{\infty} \mu(A_j)$. The measure $\mu$ is $\sigma$-finite if the space $\Omega = \bigcup_{j=1}^{\infty} \Omega_j$, where the $\Omega_j \in \mathcal{A}$ are disjoint and $\mu(\Omega_j) < \infty$.
Theorem 2.2 (Carathéodory's Extension Theorem). Suppose $\mu$ is $\sigma$-finite on an algebra $\mathcal{A}$. Then $\mu$ has a unique extension to $\sigma(\mathcal{A})$.

We return to the proof of this theorem later.

Definition 2.4. A collection $\mathcal{S}$ of subsets of $\Omega$ is a semialgebra if the following two conditions hold:

(i) $A, B \in \mathcal{S} \Rightarrow A \cap B \in \mathcal{S}$,

(ii) if $A \in \mathcal{S}$, then $A^c$ is the finite union of disjoint sets in $\mathcal{S}$.

Example 2.2. $\mathcal{S} = \{(a, b] : -\infty \leq a < b < \infty\}$, together with the intervals $(a, \infty)$. This is a semialgebra but not an algebra.

Lemma 2.1. If $\mathcal{S}$ is a semialgebra, then $\overline{\mathcal{S}} = \{\text{finite disjoint unions of sets in } \mathcal{S}\}$ is an algebra. This is called the algebra generated by $\mathcal{S}$.
Proof. Let $E_1 = \bigcup_{j=1}^{n} A_j$ and $E_2 = \bigcup_{j=1}^{m} B_j$, where the unions are disjoint and the sets are all in $\mathcal{S}$. Then $E_1 \cap E_2 = \bigcup_{i,j} A_i \cap B_j \in \overline{\mathcal{S}}$. Thus $\overline{\mathcal{S}}$ is closed under finite intersections. Also, if $E = \bigcup_{j=1}^{n} A_j \in \overline{\mathcal{S}}$, then $E^c = \bigcap_{j=1}^{n} A_j^c$. However, by the definition of a semialgebra, each $A_j^c \in \overline{\mathcal{S}}$, and since $\overline{\mathcal{S}}$ is closed under finite intersections, we see that $E^c \in \overline{\mathcal{S}}$. This proves that $\overline{\mathcal{S}}$ is an algebra.
Theorem 2.3. Let $\mathcal{S}$ be a semialgebra and let $\mu$ be defined on $\mathcal{S}$. Suppose $\mu(\emptyset) = 0$, with the additional properties:

(i) if $E \in \mathcal{S}$, $E = \bigcup_{i=1}^{n} E_i$, $E_i \in \mathcal{S}$ disjoint, then $\mu(E) = \sum_{i=1}^{n} \mu(E_i)$,

and

(ii) if $E \in \mathcal{S}$, $E = \bigcup_{i=1}^{\infty} E_i$, $E_i \in \mathcal{S}$ disjoint, then $\mu(E) \leq \sum_{i=1}^{\infty} \mu(E_i)$.

Then $\mu$ has a unique extension to $\overline{\mathcal{S}}$ which is a measure. In addition, if $\mu$ is $\sigma$-finite, then $\mu$ has a unique extension to a measure (which we continue to call $\mu$) on $\sigma(\mathcal{S})$, by the Carathéodory extension theorem.
Proof. Define $\mu$ on $\overline{\mathcal{S}}$ by

$$\mu(E) = \sum_{i=1}^{n} \mu(E_i),$$

if $E = \bigcup_{i=1}^{n} E_i$, $E_i \in \mathcal{S}$, and the union is disjoint. We first verify that this is well defined. That is, suppose we also have $E = \bigcup_{j=1}^{m} \widetilde{E}_j$, where the $\widetilde{E}_j \in \mathcal{S}$ are disjoint. Then

$$E_i = \bigcup_{j=1}^{m} (E_i \cap \widetilde{E}_j), \qquad \widetilde{E}_j = \bigcup_{i=1}^{n} (E_i \cap \widetilde{E}_j).$$

By (i),

$$\sum_{i=1}^{n} \mu(E_i) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(E_i \cap \widetilde{E}_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} \mu(E_i \cap \widetilde{E}_j) = \sum_{j=1}^{m} \mu(\widetilde{E}_j).$$

So $\mu$ is well defined. It remains to verify that $\mu$ so defined is a measure. We postpone the proof of this to state the following lemma, which will be used in its proof.
Lemma 2.2. Suppose (i) above holds and let $\mu$ be defined on $\overline{\mathcal{S}}$ as above.

(a) If $E, E_i \in \overline{\mathcal{S}}$, the $E_i$ disjoint, with $E = \bigcup_{i=1}^{n} E_i$, then $\mu(E) = \sum_{i=1}^{n} \mu(E_i)$.

(b) If $E, E_i \in \overline{\mathcal{S}}$, $E \subset \bigcup_{i=1}^{n} E_i$, then $\mu(E) \leq \sum_{i=1}^{n} \mu(E_i)$.

Note that (a) gives more than (i), since $E_i \in \overline{\mathcal{S}}$, not just $\mathcal{S}$. Also, the sets in (b) are not necessarily disjoint. We assume the Lemma for the moment.
Next, let $E = \bigcup_{i=1}^{\infty} E_i$, $E_i \in \overline{\mathcal{S}}$, where the sets are disjoint and assume, as required by the definition of measures on algebras, that $E \in \overline{\mathcal{S}}$. Since $E_i = \bigcup_{j=1}^{n_i} E_{ij}$, $E_{ij} \in \mathcal{S}$, with these also disjoint, we have

$$\sum_{i=1}^{\infty} \mu(E_i) \overset{(i)}{=} \sum_{i=1}^{\infty} \sum_{j=1}^{n_i} \mu(E_{ij}) = \sum_{i,j} \mu(E_{ij}).$$

So we may assume $E_i \in \mathcal{S}$ instead of $\overline{\mathcal{S}}$; otherwise replace it by the $E_{ij}$. Since $E \in \overline{\mathcal{S}}$, $E = \bigcup_{j=1}^{n} \widetilde{E}_j$, with $\widetilde{E}_j \in \mathcal{S}$ and again disjoint, and we can write

$$\widetilde{E}_j = \bigcup_{i=1}^{\infty} (\widetilde{E}_j \cap E_i).$$

Thus by assumption (ii),

$$\mu(\widetilde{E}_j) \leq \sum_{i=1}^{\infty} \mu(\widetilde{E}_j \cap E_i).$$

Therefore, using (a) in the last step (since $E_i = \bigcup_{j=1}^{n} (\widetilde{E}_j \cap E_i)$),

$$\mu(E) = \sum_{j=1}^{n} \mu(\widetilde{E}_j) \leq \sum_{j=1}^{n} \sum_{i=1}^{\infty} \mu(\widetilde{E}_j \cap E_i) = \sum_{i=1}^{\infty} \sum_{j=1}^{n} \mu(\widetilde{E}_j \cap E_i) = \sum_{i=1}^{\infty} \mu(E_i),$$

which proves one of the inequalities.

For the opposite inequality we set (recall $E = \bigcup_{i=1}^{\infty} E_i$) $A_n = \bigcup_{i=1}^{n} E_i$ and $C_n = E \cap A_n^c$, so that $E = A_n \cup C_n$ with $A_n, C_n \in \overline{\mathcal{S}}$ disjoint. Therefore,

$$\mu(E) = \mu(A_n) + \mu(C_n) = \mu(E_1) + \cdots + \mu(E_n) + \mu(C_n) \geq \sum_{i=1}^{n} \mu(E_i),$$

with $n$ arbitrary. This proves the other inequality.
Proof of Lemma 2.2. For (a), write $E = \bigcup_{i=1}^{n} E_i$ and $E_i = \bigcup_{j=1}^{m_i} E_{ij}$ with $E_{ij} \in \mathcal{S}$ and all unions disjoint. Then $E = \bigcup_{i,j} E_{ij}$ is a disjoint union of sets in $\mathcal{S}$, so by the definition of $\mu$ on $\overline{\mathcal{S}}$ and assumption (i),

$$\mu(E) = \sum_{i,j} \mu(E_{ij}) = \sum_{i=1}^{n} \mu(E_i),$$

proving (a).
For (b), assume first $n = 1$. If $E \subset E_1$, then

$$E_1 = E \cup (E_1 \cap E^c), \qquad E_1 \cap E^c \in \overline{\mathcal{S}},$$

so by (a),

$$\mu(E) \leq \mu(E) + \mu(E_1 \cap E^c) = \mu(E_1).$$

For $n > 1$, set

$$\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n} (E_i \cap E_1^c \cap \cdots \cap E_{i-1}^c) = \bigcup_{i=1}^{n} F_i.$$

Then

$$E = E \cap \Big(\bigcup_{i=1}^{n} E_i\Big) = (E \cap F_1) \cup \cdots \cup (E \cap F_n).$$

So by (a), $\mu(E) = \sum_{i=1}^{n} \mu(E \cap F_i)$. Now the case $n = 1$ gives $\mu(E \cap F_i) \leq \mu(E_i)$, since $E \cap F_i \subset E_i$, and therefore

$$\mu(E) = \sum_{i=1}^{n} \mu(E \cap F_i) \leq \sum_{i=1}^{n} \mu(E_i),$$

proving (b).
Proof of Theorem 2.1. Let $\mathcal{S} = \{(a, b] : -\infty \leq a < b < \infty\}$. Set $F(\infty) = \lim_{x \to \infty} F(x)$ and $F(-\infty) = \lim_{x \to -\infty} F(x)$. These quantities exist since $F$ is increasing. Define

$$\mu(a, b] = F(b) - F(a)$$

for any $-\infty \leq a < b \leq \infty$, where $F(\infty) > -\infty$ and $F(-\infty) < \infty$. Suppose $(a, b] = \bigcup_{i=1}^{n} (a_i, b_i]$, where the union is disjoint. By relabeling we may assume that

$$a_1 = a, \qquad b_n = b, \qquad a_i = b_{i-1}.$$

Then $\mu(a_i, b_i] = F(b_i) - F(a_i)$, and since the sum telescopes,

$$\sum_{i=1}^{n} \mu(a_i, b_i] = \sum_{i=1}^{n} \big(F(b_i) - F(a_i)\big) = F(b) - F(a) = \mu(a, b],$$

which proves that condition (i) holds.
For (ii), let $-\infty < a < b < \infty$ and $(a, b] \subset \bigcup_{i=1}^{\infty} (a_i, b_i]$, where the union is disjoint. (We can also order them if we want.) By right continuity of $F$, given $\varepsilon > 0$ there is a $\delta > 0$ such that

$$F(a + \delta) - F(a) < \varepsilon,$$

or equivalently,

$$F(a + \delta) < F(a) + \varepsilon.$$

Similarly, there are $\delta_i > 0$ such that

$$F(b_i + \delta_i) < F(b_i) + \varepsilon 2^{-i},$$

for all $i$. Now, the intervals $(a_i, b_i + \delta_i)$ form an open cover of $[a + \delta, b]$. By compactness, there is a finite subcover. Thus,

$$[a + \delta, b] \subset \bigcup_{i=1}^{N} (a_i, b_i + \delta_i)$$

and

$$(a + \delta, b] \subset \bigcup_{i=1}^{N} (a_i, b_i + \delta_i].$$

Therefore, by (b) of Lemma 2.2,

$$F(b) - F(a + \delta) = \mu(a + \delta, b] \leq \sum_{i=1}^{N} \mu(a_i, b_i + \delta_i] = \sum_{i=1}^{N} \big(F(b_i + \delta_i) - F(a_i)\big)$$
$$= \sum_{i=1}^{N} \big(F(b_i + \delta_i) - F(b_i)\big) + \sum_{i=1}^{N} \big(F(b_i) - F(a_i)\big) \leq \sum_{i=1}^{\infty} \varepsilon 2^{-i} + \sum_{i=1}^{\infty} \big(F(b_i) - F(a_i)\big) = \varepsilon + \sum_{i=1}^{\infty} \big(F(b_i) - F(a_i)\big).$$

Therefore,

$$\mu(a, b] = F(b) - F(a) \leq 2\varepsilon + \sum_{i=1}^{\infty} \big(F(b_i) - F(a_i)\big) = 2\varepsilon + \sum_{i=1}^{\infty} \mu(a_i, b_i],$$

proving (ii) provided $-\infty < a < b < \infty$.

If $(a, b] \subset \bigcup_{i=1}^{\infty} (a_i, b_i]$ with $a$ and $b$ arbitrary, and $(A, B] \subset (a, b]$ for any $-\infty < A < B < \infty$, we have by the above

$$F(B) - F(A) \leq \sum_{i=1}^{\infty} \big(F(b_i) - F(a_i)\big),$$

and the result follows by taking limits $A \downarrow a$, $B \uparrow b$.
If $F(x) = x$, $\mu$ is called the Lebesgue measure on $\mathbb{R}$. If

$$F(x) = \begin{cases} 0, & x \leq 0 \\ x, & 0 < x \leq 1 \\ 1, & x > 1, \end{cases}$$

the measure we obtain is called the Lebesgue measure on $\Omega = (0, 1]$. Notice that $\mu(\Omega) = 1$.

If $\mu$ is a probability measure, then $F(x) = \mu(-\infty, x]$ satisfies $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$.
Problem 2.2. Let $F$ be the distribution function defined by

$$F(x) = \begin{cases} 0, & x < -1 \\ 1 + x, & -1 \leq x < 0 \\ 2 + x^2, & 0 \leq x < 2 \\ 9, & x \geq 2, \end{cases}$$

and let $\mu$ be the Lebesgue–Stieltjes measure corresponding to $F$. Find $\mu(E)$ for

(i) $E = \{2\}$,

(ii) $E = [-1/2, 3)$,

(iii) $E = (-1, 0] \cup (1, 2)$,

(iv) $E = \{x : |x| + 2x^2 > 1\}$.
Proof of Theorem 2.2. For any $E \subset \Omega$ we define $\mu^*(E) = \inf \sum_i \mu(A_i)$, where the infimum is taken over all sequences $\{A_i\}$ in $\mathcal{A}$ such that $E \subset \bigcup_i A_i$. Let $\mathcal{A}^*$ be the collection of all subsets $E \subset \Omega$ with the property that

$$\mu^*(F) = \mu^*(F \cap E) + \mu^*(F \cap E^c)$$

for all sets $F \subset \Omega$. These two quantities satisfy:

(i) $\mathcal{A}^*$ is a $\sigma$-algebra and $\mu^*$ is a measure on $\mathcal{A}^*$.

(ii) If $\mu^*(E) = 0$, then $E \in \mathcal{A}^*$.

(iii) $\mathcal{A} \subset \mathcal{A}^*$ and $\mu^*(E) = \mu(E)$ if $E \in \mathcal{A}$.
We begin the proof of (i)–(iii) with a simple but very useful observation. It follows easily from the definition that $E_1 \subset E_2$ implies $\mu^*(E_1) \leq \mu^*(E_2)$, and that $E \subset \bigcup_{j=1}^{\infty} E_j$ implies

$$\mu^*(E) \leq \sum_{j=1}^{\infty} \mu^*(E_j).$$

Therefore,

$$\mu^*(F) \leq \mu^*(F \cap E) + \mu^*(F \cap E^c)$$

is always true. Hence, to prove that $E \in \mathcal{A}^*$, we need only verify that

$$\mu^*(F) \geq \mu^*(F \cap E) + \mu^*(F \cap E^c)$$

for all $F \subset \Omega$. Clearly, by symmetry, if $E \in \mathcal{A}^*$ we have $E^c \in \mathcal{A}^*$.
Suppose $E_1$ and $E_2$ are in $\mathcal{A}^*$. Then for all $F \subset \Omega$,

$$\mu^*(F) = \mu^*(F \cap E_1) + \mu^*(F \cap E_1^c)$$
$$= \big(\mu^*(F \cap E_1 \cap E_2) + \mu^*(F \cap E_1 \cap E_2^c)\big) + \big(\mu^*(F \cap E_1^c \cap E_2) + \mu^*(F \cap E_1^c \cap E_2^c)\big)$$
$$\geq \mu^*\big(F \cap (E_1 \cup E_2)\big) + \mu^*\big(F \cap (E_1 \cup E_2)^c\big),$$

where we used the fact that

$$E_1 \cup E_2 \subset (E_1 \cap E_2) \cup (E_1 \cap E_2^c) \cup (E_1^c \cap E_2)$$

and the subadditivity of $\mu^*$ observed above. We conclude that $E_1 \cup E_2 \in \mathcal{A}^*$. That is, $\mathcal{A}^*$ is an algebra.
Now, suppose the $E_j \in \mathcal{A}^*$ are disjoint. Let $E = \bigcup_{j=1}^{\infty} E_j$ and $A_n = \bigcup_{j=1}^{n} E_j$. Since $E_n \in \mathcal{A}^*$, we have (applying the definition with the set $F \cap A_n$)

$$\mu^*(F \cap A_n) = \mu^*(F \cap A_n \cap E_n) + \mu^*(F \cap A_n \cap E_n^c)$$
$$= \mu^*(F \cap E_n) + \mu^*(F \cap A_{n-1})$$
$$= \mu^*(F \cap E_n) + \mu^*(F \cap E_{n-1}) + \mu^*(F \cap A_{n-2})$$
$$= \cdots = \sum_{j=1}^{n} \mu^*(F \cap E_j).$$
Now, the measurability of $A_n$ together with this gives

$$\mu^*(F) = \mu^*(F \cap A_n) + \mu^*(F \cap A_n^c) = \sum_{j=1}^{n} \mu^*(F \cap E_j) + \mu^*(F \cap A_n^c) \geq \sum_{j=1}^{n} \mu^*(F \cap E_j) + \mu^*(F \cap E^c).$$

Letting $n \to \infty$, we find that

$$\mu^*(F) \geq \sum_{j=1}^{\infty} \mu^*(F \cap E_j) + \mu^*(F \cap E^c) \geq \mu^*(F \cap E) + \mu^*(F \cap E^c) \geq \mu^*(F),$$

where the middle step uses countable subadditivity, since $F \cap E = \bigcup_j (F \cap E_j)$. This proves that $E \in \mathcal{A}^*$. If we take $F = E$ we obtain

$$\mu^*(E) = \sum_{j=1}^{\infty} \mu^*(E_j).$$

From this we conclude that $\mathcal{A}^*$ is closed under countable disjoint unions and that $\mu^*$ is countably additive. Since any countable union can be written as a disjoint countable union, we see that $\mathcal{A}^*$ is a $\sigma$-algebra and that $\mu^*$ is a measure on it. This proves (i).
If $\mu^*(E) = 0$ and $F \subset \Omega$, then

$$\mu^*(F \cap E) + \mu^*(F \cap E^c) = \mu^*(F \cap E^c) \leq \mu^*(F).$$

Thus $E \in \mathcal{A}^*$, and we have proved (ii).

For (iii), let $E \in \mathcal{A}$. Clearly $\mu^*(E) \leq \mu(E)$.
Next, if $E \subset \bigcup_{j=1}^{\infty} E_j$, $E_j \in \mathcal{A}$, we have $E = \bigcup_{j=1}^{\infty} \widetilde{E}_j$, where

$$\widetilde{E}_j = E \cap \Big(E_j \setminus \bigcup_{i=1}^{j-1} E_i\Big),$$

and these sets are disjoint and their union is $E$. Since $\mu$ is a measure on $\mathcal{A}$, we have

$$\mu(E) = \sum_{j=1}^{\infty} \mu(\widetilde{E}_j) \leq \sum_{j=1}^{\infty} \mu(E_j).$$

Since this holds for any countable covering of $E$ by sets in $\mathcal{A}$, we have $\mu(E) \leq \mu^*(E)$. Hence

$$\mu(E) = \mu^*(E), \quad \text{for all } E \in \mathcal{A}.$$
Next, let $E \in \mathcal{A}$. Let $F \subset \Omega$ and assume $\mu^*(F) < \infty$. For any $\varepsilon > 0$, choose $E_j \in \mathcal{A}$ with $F \subset \bigcup_{j=1}^{\infty} E_j$ and

$$\sum_{j=1}^{\infty} \mu(E_j) \leq \mu^*(F) + \varepsilon.$$

Using again the fact that $\mu$ is a measure on $\mathcal{A}$,

$$\mu^*(F) + \varepsilon \geq \sum_{j=1}^{\infty} \mu(E_j) = \sum_{j=1}^{\infty} \mu(E_j \cap E) + \sum_{j=1}^{\infty} \mu(E_j \cap E^c) \geq \mu^*(F \cap E) + \mu^*(F \cap E^c),$$

and since $\varepsilon > 0$ is arbitrary, we have that $E \in \mathcal{A}^*$. This completes the proof of (iii).
With (i)–(iii) out of the way, it is clear how to define $\overline{\mu}$. Since $\mathcal{A} \subset \mathcal{A}^*$ and $\mathcal{A}^*$ is a $\sigma$-algebra, $\sigma(\mathcal{A}) \subset \mathcal{A}^*$. Define $\overline{\mu}(E) = \mu^*(E)$ for $E \in \sigma(\mathcal{A})$. This is clearly a measure, and it remains to prove that it is unique under the hypothesis of $\sigma$-finiteness of $\mu$. First, the construction of the measure $\mu^*$ clearly shows that whenever $\mu$ is finite or $\sigma$-finite, so are the measures $\mu^*$ and $\overline{\mu}$.
Suppose there is another measure $\nu$ on $\sigma(\mathcal{A})$ with $\nu(E) = \mu(E)$ for all $E \in \mathcal{A}$. Let $E \in \sigma(\mathcal{A})$ have finite $\mu^*$ measure. Since $\sigma(\mathcal{A}) \subset \mathcal{A}^*$,

$$\mu^*(E) = \inf\Big\{\sum_{j=1}^{\infty} \mu(E_j) : E \subset \bigcup_{j=1}^{\infty} E_j,\ E_j \in \mathcal{A}\Big\}.$$

However, since $\nu(E_j) = \mu(E_j)$, we see that

$$\nu(E) \leq \sum_{j=1}^{\infty} \nu(E_j) = \sum_{j=1}^{\infty} \mu(E_j).$$

This shows that

$$\nu(E) \leq \mu^*(E).$$

Now let $E_j \in \mathcal{A}$ be such that $E \subset \bigcup_{j=1}^{\infty} E_j$ and

$$\sum_{j=1}^{\infty} \mu(E_j) \leq \mu^*(E) + \varepsilon.$$

Set $\widetilde{E} = \bigcup_{j=1}^{\infty} E_j$ and $\widetilde{E}_n = \bigcup_{j=1}^{n} E_j$. Then, by continuity from below and the fact that $\widetilde{E}_n \in \mathcal{A}$,

$$\mu^*(\widetilde{E}) = \lim_{n \to \infty} \mu^*(\widetilde{E}_n) = \lim_{n \to \infty} \nu(\widetilde{E}_n) = \nu(\widetilde{E}).$$

Since

$$\mu^*(\widetilde{E}) \leq \mu^*(E) + \varepsilon,$$

we have

$$\mu^*(\widetilde{E} \setminus E) \leq \varepsilon.$$

Hence,

$$\mu^*(E) \leq \mu^*(\widetilde{E}) = \nu(\widetilde{E}) \leq \nu(E) + \nu(\widetilde{E} \setminus E) \leq \nu(E) + \mu^*(\widetilde{E} \setminus E) \leq \nu(E) + \varepsilon,$$

where we used $\nu \leq \mu^*$ (shown above) on the set $\widetilde{E} \setminus E$. Since $\varepsilon > 0$ is arbitrary, $\nu(E) = \overline{\mu}(E) = \mu^*(E)$ for all $E \in \sigma(\mathcal{A})$ of finite $\mu^*$ measure. Since $\mu^*$ is $\sigma$-finite, we can write any set $E = \bigcup_{j=1}^{\infty} (\Omega_j \cap E)$, where the union is disjoint and each of these sets has finite $\mu^*$ measure. Using the fact that both $\nu$ and $\overline{\mu}$ are measures, the uniqueness follows from what we have done for the finite case.
What is the difference between $\sigma(\mathcal{A})$ and $\mathcal{A}^*$? To properly answer this question we need the following

Definition 2.5. The measure space $(\Omega, \mathcal{F}, \mu)$ is said to be complete if whenever $E \in \mathcal{F}$ and $\mu(E) = 0$, then $A \in \mathcal{F}$ for all $A \subset E$.

By (ii), the measure space $(\Omega, \mathcal{A}^*, \mu^*)$ is complete. Now, if $(\Omega, \mathcal{F}, \mu)$ is a measure space, we define $\mathcal{F}_{\mu} = \{E \cup N : E \in \mathcal{F},\ N \subset \widetilde{N} \text{ for some } \widetilde{N} \in \mathcal{F} \text{ with } \mu(\widetilde{N}) = 0\}$. We leave the easy exercise to the reader to check that $\mathcal{F}_{\mu}$ is a $\sigma$-algebra. We extend the measure $\mu$ to a measure $\overline{\mu}$ on $\mathcal{F}_{\mu}$ by defining $\overline{\mu}(E \cup N) = \mu(E)$. The measure space $(\Omega, \mathcal{F}_{\mu}, \overline{\mu})$ is clearly complete. This measure space is called the completion of $(\Omega, \mathcal{F}, \mu)$. We can now answer the above question.

Theorem 2.4. The space $(\Omega, \mathcal{A}^*, \mu^*)$ is the completion of $(\Omega, \sigma(\mathcal{A}), \overline{\mu})$.
II

INTEGRATION THEORY

1 Measurable Functions.

In this section we will assume that the measure space $(\Omega, \mathcal{F}, \mu)$ is $\sigma$-finite. We will say that the set $A \subset \Omega$ is measurable if $A \in \mathcal{F}$. When we say that $A \subset \mathbb{R}$ is measurable, we will always mean with respect to the Borel $\sigma$-algebra $\mathcal{B}$ as defined in the last chapter.
Definition 1.1. Let $(\Omega, \mathcal{F})$ be a measurable space. Let $f$ be an extended real valued function defined on $\Omega$; that is, the function $f$ is allowed to take values in $\{+\infty, -\infty\}$. $f$ is measurable relative to $\mathcal{F}$ if $\{\omega : f(\omega) > \alpha\} \in \mathcal{F}$ for all $\alpha \in \mathbb{R}$.

Remark 1.1. When $(\Omega, \mathcal{F}, P)$ is a probability space and $f: \Omega \to \mathbb{R}$, we refer to measurable functions as random variables.

Example 1.1. Let $A$ be a measurable set. The indicator function of this set is defined by

$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{else.} \end{cases}$$

This function is clearly measurable, since

$$\{\omega : 1_A(\omega) < \alpha\} = \begin{cases} \emptyset, & \alpha \leq 0 \\ A^c, & 0 < \alpha \leq 1 \\ \Omega, & \alpha > 1. \end{cases}$$
This definition is equivalent to several others, as seen by the following

Proposition 1.1. The following conditions are equivalent.

(i) $\{\omega : f(\omega) > \alpha\} \in \mathcal{F}$ for all $\alpha \in \mathbb{R}$,

(ii) $\{\omega : f(\omega) \geq \alpha\} \in \mathcal{F}$ for all $\alpha \in \mathbb{R}$,

(iii) $\{\omega : f(\omega) < \alpha\} \in \mathcal{F}$ for all $\alpha \in \mathbb{R}$,

(iv) $\{\omega : f(\omega) \leq \alpha\} \in \mathcal{F}$ for all $\alpha \in \mathbb{R}$.

Proof. These follow from the fact that $\sigma$-algebras are closed under countable unions, intersections, and complementations, together with the following two identities:

$$\{\omega : f(\omega) \geq \alpha\} = \bigcap_{n=1}^{\infty} \Big\{\omega : f(\omega) > \alpha - \frac{1}{n}\Big\}$$

and

$$\{\omega : f(\omega) > \alpha\} = \bigcup_{n=1}^{\infty} \Big\{\omega : f(\omega) \geq \alpha + \frac{1}{n}\Big\}.$$
Problem 1.1. Let $f$ be a measurable function on $(\Omega, \mathcal{F})$. Prove that the sets $\{\omega : f(\omega) = +\infty\}$, $\{\omega : f(\omega) = -\infty\}$, $\{\omega : f(\omega) < \infty\}$, $\{\omega : f(\omega) > -\infty\}$, and $\{\omega : -\infty < f(\omega) < \infty\}$ are all measurable.

Problem 1.2.

(i) Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f: \Omega \to \mathbb{R}$. Prove that $f$ is measurable if and only if $f^{-1}(E) = \{\omega : f(\omega) \in E\} \in \mathcal{F}$ for every Borel set $E \subset \mathbb{R}$.

(ii) With $f$ as in (i), define $\mu$ on the Borel sets of $\mathbb{R}$ by $\mu(A) = P\{\omega : f(\omega) \in A\}$. Prove that $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B})$.
Proposition 1.2. If $f_1$ and $f_2$ are measurable, so are the functions $f_1 + f_2$, $f_1 f_2$, $\max(f_1, f_2)$, $\min(f_1, f_2)$, and $c f_1$ for any constant $c$.

Proof. For the sum, note that

$$\{\omega : f_1(\omega) + f_2(\omega) < \alpha\} = \bigcup_{r} \big(\{\omega : f_1(\omega) < r\} \cap \{\omega : f_2(\omega) < \alpha - r\}\big),$$

where the union is taken over all the rationals $r$. Again, the fact that countable unions of measurable sets are measurable implies the measurability of the sum. In the same way,

$$\{\omega : \max(f_1(\omega), f_2(\omega)) > \alpha\} = \{\omega : f_1(\omega) > \alpha\} \cup \{\omega : f_2(\omega) > \alpha\}$$

gives the measurability of $\max(f_1, f_2)$. The measurability of $\min(f_1, f_2)$ follows from this by taking complements. As for the product, first observe that for $\alpha \geq 0$,

$$\{\omega : f_1^2(\omega) > \alpha\} = \{\omega : f_1(\omega) > \sqrt{\alpha}\} \cup \{\omega : f_1(\omega) < -\sqrt{\alpha}\},$$

and hence $f_1^2$ is measurable. But then writing

$$f_1 f_2 = \frac{1}{2}\big[(f_1 + f_2)^2 - f_1^2 - f_2^2\big]$$

gives the measurability of the product.
Proposition 1.3. Let $\{f_n\}$ be a sequence of measurable functions on $(\Omega, \mathcal{F})$. Then $\inf_n f_n$, $\sup_n f_n$, $\limsup_n f_n$, and $\liminf_n f_n$ are measurable functions.

Proof. Clearly $\{\inf_n f_n < \alpha\} = \bigcup_n \{f_n < \alpha\}$ and $\{\sup_n f_n > \alpha\} = \bigcup_n \{f_n > \alpha\}$, and hence both sets are measurable. Also, since

$$\limsup_{n} f_n = \inf_{n} \Big(\sup_{m \geq n} f_m\Big)$$

and

$$\liminf_{n} f_n = \sup_{n} \Big(\inf_{m \geq n} f_m\Big),$$

the result follows from the first part.
Problem 1.3. Let $\{f_n\}$ be a sequence of measurable functions. Let $E = \{\omega : \lim f_n(\omega) \text{ exists}\}$. Prove that $E$ is measurable.

Problem 1.4. Let $\{f_n\}$ be a sequence of measurable functions converging pointwise to the function $f$. Prove that $f$ is measurable.
Proposition 1.4.

(i) Let $\Omega$ be a metric space and suppose the collection of all open sets is in the sigma algebra $\mathcal{F}$. Suppose $f: \Omega \to \mathbb{R}$ is continuous. Then $f$ is measurable. In particular, a continuous function $f: \mathbb{R}^n \to \mathbb{R}$ is measurable relative to the Borel $\sigma$-algebra in $\mathbb{R}^n$.

(ii) Let $\varphi: \mathbb{R} \to \mathbb{R}$ be continuous and $f: \Omega \to \mathbb{R}$ be measurable. Then $\varphi(f)$ is measurable.

Proof. Both follow from the fact that for every continuous function $f$, the set $\{\omega : f(\omega) > \alpha\} = f^{-1}(\alpha, \infty)$ is open for every $\alpha$.
Problem 1.5. Suppose $f$ is a measurable function. Prove that

(i) $f^p$, $p \geq 1$,

(ii) $|f|^p$, $p > 0$,

(iii) $f^+ = \max(f, 0)$,

(iv) $f^- = -\min(f, 0)$

are all measurable functions.
Definition 1.2. Let $f: \Omega \to \mathbb{R}$ be measurable. The sigma algebra generated by $f$ is the sigma algebra of subsets of $\Omega$ generated by the collection $\{f^{-1}(A) : A \in \mathcal{B}\}$. This is denoted by $\sigma(f)$.
Definition 1.3. A function $\varphi$ defined on $(\Omega, \mathcal{F}, \mu)$ is a simple function if

$$\varphi(\omega) = \sum_{i=1}^{n} a_i 1_{A_i}(\omega),$$

where the $A_i$'s are disjoint measurable sets which form a partition of $\Omega$ ($\bigcup_i A_i = \Omega$) and the $a_i$'s are constants.
Theorem 1.1. Let $f: \Omega \to [0, \infty]$ be measurable. There exists a sequence of simple functions $\{\varphi_n\}$ on $\Omega$ with the property that $0 \leq \varphi_1(\omega) \leq \varphi_2(\omega) \leq \cdots \leq f(\omega)$ and $\varphi_n(\omega) \to f(\omega)$, for every $\omega \in \Omega$.

Proof. Fix $n \geq 1$, and for $i = 1, 2, \ldots, n2^n$ define the measurable sets

$$A_i^n = f^{-1}\Big(\Big[\frac{i-1}{2^n},\ \frac{i}{2^n}\Big)\Big).$$

Set

$$F_n = f^{-1}([n, \infty])$$

and define the simple functions

$$\varphi_n(\omega) = \sum_{i=1}^{n2^n} \frac{i-1}{2^n}\, 1_{A_i^n}(\omega) + n\, 1_{F_n}(\omega).$$

Clearly $\varphi_n$ is a simple function and it satisfies $\varphi_n(\omega) \leq \varphi_{n+1}(\omega)$ and $\varphi_n(\omega) \leq f(\omega)$ for all $\omega \in \Omega$.

Fix $\varepsilon > 0$ and let $\omega \in \Omega$. If $f(\omega) < \infty$, then pick $n$ so large that $2^{-n} < \varepsilon$ and $f(\omega) < n$. Then $f(\omega) \in \big[\frac{i-1}{2^n}, \frac{i}{2^n}\big)$ for some $i = 1, 2, \ldots, n2^n$. Thus

$$\varphi_n(\omega) = \frac{i-1}{2^n},$$

and so

$$0 \leq f(\omega) - \varphi_n(\omega) < 2^{-n} < \varepsilon.$$

By our definition, if $f(\omega) = \infty$, then $\varphi_n(\omega) = n$ for all $n$, and we are done.
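The dyadic construction in the proof of Theorem 1.1 is concrete enough to code directly. In the sketch below, `phi_n(f, n)` (our own name) returns the simple function $\varphi_n$; note that $\lfloor y \cdot 2^n \rfloor / 2^n$ equals $(i-1)/2^n$ exactly when $y \in [(i-1)/2^n, i/2^n)$:

```python
import math

def phi_n(f, n):
    """Dyadic simple-function approximation of a nonnegative f (Theorem 1.1):
    phi_n = (i-1)/2^n on {(i-1)/2^n <= f < i/2^n}, i = 1..n*2^n, and n on {f >= n}."""
    def phi(omega):
        y = f(omega)
        if y >= n:
            return float(n)
        return math.floor(y * 2**n) / 2**n
    return phi

f = lambda x: x * x          # a nonnegative measurable function on R
for n in (1, 2, 4, 8):
    p = phi_n(f, n)
    # monotone in n, and within 2^-n of f once f(x) < n
    print(n, p(1.3), f(1.3) - p(1.3))
```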
2 The Integral: Definition and Basic Properties.
Definition 2.1. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space.

(i) If $\varphi(\omega) = \sum_{i=1}^{n} a_i 1_{A_i}(\omega)$ is a simple function and $E \in \mathcal{F}$ is measurable, we define the integral of the function $\varphi$ over the set $E$ by

$$\int_E \varphi(\omega)\, d\mu = \sum_{i=1}^{n} a_i\, \mu(A_i \cap E). \tag{2.1}$$

(We adopt the convention here, and for the rest of these notes, that $0 \cdot \infty = 0$.)

(ii) If $f \geq 0$ is measurable, we define the integral of $f$ over the set $E$ by

$$\int_E f\, d\mu = \sup_{\varphi} \int_E \varphi\, d\mu, \tag{2.2}$$

where the sup is taken over all simple functions $\varphi$ with $0 \leq \varphi \leq f$.

(iii) If $f$ is measurable and at least one of the quantities $\int_E f^+\, d\mu$ or $\int_E f^-\, d\mu$ is finite, we define the integral of $f$ over $E$ to be

$$\int_E f\, d\mu = \int_E f^+\, d\mu - \int_E f^-\, d\mu.$$

(iv) If

$$\int_E |f|\, d\mu = \int_E f^+\, d\mu + \int_E f^-\, d\mu < \infty,$$

we say that the function $f$ is integrable over the set $E$. If $E = \Omega$, we denote this collection of functions by $L^1(\mu)$.

We should remark here that since in our definition of simple functions we did not require the constants $a_i$ to be distinct, we may have different representations for the simple functions $\varphi$. For example, if $A_1$ and $A_2$ are two disjoint measurable sets, then $1_{A_1 \cup A_2}$ and $1_{A_1} + 1_{A_2}$ both represent the same simple function. It is clear from our definition of the integral that such representations lead to the same quantity, and hence the integral is well defined.
Here are some basic and easy properties of the integral.

Proposition 2.1. Let $f$ and $g$ be two measurable functions on $(\Omega, \mathcal{F}, \mu)$.

(i) If $f \leq g$ on $E$, then $\int_E f\, d\mu \leq \int_E g\, d\mu$.

(ii) If $A \subset B$ and $f \geq 0$, then $\int_A f\, d\mu \leq \int_B f\, d\mu$.

(iii) If $c$ is a constant, then $\int_E c f\, d\mu = c \int_E f\, d\mu$.

(iv) If $f \equiv 0$ on $E$, then $\int_E f\, d\mu = 0$, even if $\mu(E) = \infty$.

(v) If $\mu(E) = 0$, then $\int_E f\, d\mu = 0$, even if $f \equiv \infty$ on $E$.

(vi) If $f \geq 0$, then $\int_E f\, d\mu = \int_{\Omega} 1_E f\, d\mu$.
Proposition 2.2. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Suppose $\varphi$ and $\psi$ are simple functions.

(i) For $E \in \mathcal{F}$ define

$$\nu(E) = \int_E \varphi\, d\mu.$$

Then $\nu$ is a measure on $\mathcal{F}$.

(ii) $\int_{\Omega} (\varphi + \psi)\, d\mu = \int_{\Omega} \varphi\, d\mu + \int_{\Omega} \psi\, d\mu$.

Proof. Let $E_j \in \mathcal{F}$ be disjoint with $E = \bigcup_j E_j$, and write $\varphi = \sum_{i=1}^{n} a_i 1_{A_i}$. Then

$$\nu(E) = \int_E \varphi\, d\mu = \sum_{i=1}^{n} a_i\, \mu(A_i \cap E) = \sum_{i=1}^{n} a_i \sum_{j=1}^{\infty} \mu(A_i \cap E_j) = \sum_{j=1}^{\infty} \sum_{i=1}^{n} a_i\, \mu(A_i \cap E_j) = \sum_{j=1}^{\infty} \nu(E_j).$$

By the definition of the integral, $\nu(\emptyset) = 0$. This proves (i). (ii) follows from (i), and we leave it to the reader.
We now come to the three important limit theorems of integration: the Lebesgue Monotone Convergence Theorem, Fatou's Lemma, and the Lebesgue Dominated Convergence Theorem.

Theorem 2.1 (Lebesgue Monotone Convergence Theorem). Suppose $\{f_n\}$ is a sequence of measurable functions satisfying:

(i) $0 \leq f_1(\omega) \leq f_2(\omega) \leq \cdots$, for every $\omega \in \Omega$,

and

(ii) $f_n(\omega) \uparrow f(\omega)$, for every $\omega \in \Omega$.

Then

$$\int_{\Omega} f_n\, d\mu \uparrow \int_{\Omega} f\, d\mu.$$

Proof. Set

$$\alpha_n = \int_{\Omega} f_n\, d\mu.$$

Then $\{\alpha_n\}$ is nondecreasing and it converges to some $\alpha \in [0, \infty]$. Since

$$\int_{\Omega} f_n\, d\mu \leq \int_{\Omega} f\, d\mu$$

for all $n$, we see that $\alpha \leq \int_{\Omega} f\, d\mu$. We need to prove the opposite inequality. Let $0 \leq \varphi \leq f$ be simple and let $0 < c < 1$. Set

$$E_n = \{\omega : f_n(\omega) \geq c\, \varphi(\omega)\}.$$

Clearly $E_1 \subset E_2 \subset \cdots$. In addition, suppose $\omega \in \Omega$. If $f(\omega) = 0$, then $\varphi(\omega) = 0$ and $\omega \in E_1$. If $f(\omega) > 0$, then $c\varphi(\omega) < f(\omega)$, and since $f_n(\omega) \uparrow f(\omega)$, we have that $\omega \in E_n$ for some $n$. Hence $\bigcup_n E_n = \Omega$ or, in our notation of Proposition 2.1, Chapter I, $E_n \uparrow \Omega$. Hence

$$\int_{\Omega} f_n\, d\mu \geq \int_{E_n} f_n\, d\mu \geq c \int_{E_n} \varphi(\omega)\, d\mu = c\, \nu(E_n),$$

where $\nu$ is the measure of Proposition 2.2. Let $n \to \infty$. By Proposition 2.2 above and Proposition 2.1 of Chapter I,

$$\alpha = \lim_{n \to \infty} \int_{\Omega} f_n\, d\mu \geq c\, \nu(\Omega) = c \int_{\Omega} \varphi\, d\mu,$$

and letting $c \uparrow 1$,

$$\int_{\Omega} \varphi\, d\mu \leq \alpha$$

for all simple $\varphi \leq f$. Therefore

$$\int_{\Omega} f\, d\mu = \sup_{\varphi \leq f} \int_{\Omega} \varphi\, d\mu \leq \alpha,$$

proving the desired inequality.
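A quick numerical illustration of the theorem (our own sketch, using a grid approximation of the integral): take $f(x) = x^{-1/2}$ on $(0, 1]$ with Lebesgue measure, which is integrable with $\int_0^1 x^{-1/2}\, dx = 2$, and the increasing truncations $f_n = \min(f, n)$:

```python
# Monotone convergence on ((0,1], Lebesgue): f_n = min(f, n) increases to
# f(x) = x^{-1/2}, and the integrals increase toward int_0^1 x^{-1/2} dx = 2.
import numpy as np

x = (np.arange(1, 200001) - 0.5) / 200000      # midpoint grid on (0, 1)
f = x**-0.5

for n in (1, 2, 4, 8, 16):
    fn = np.minimum(f, n)
    print(n, np.mean(fn))    # increases toward 2
```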
Corollary 2.1. Let $\{f_n\}$ be a sequence of nonnegative measurable functions and set

$$f(\omega) = \sum_{n=1}^{\infty} f_n(\omega).$$

Then

$$\int_{\Omega} f\, d\mu = \sum_{n=1}^{\infty} \int_{\Omega} f_n\, d\mu.$$

Proof. Apply Theorem 2.1 to the sequence of functions

$$g_n = \sum_{j=1}^{n} f_j.$$
Corollary 2.2 (First Borel–Cantelli Lemma). Let $\{A_n\}$ be a sequence of measurable sets. Suppose

$$\sum_{n=1}^{\infty} \mu(A_n) < \infty.$$

Then $\mu\{A_n \text{ i.o.}\} = 0$.

Proof. Let $f(\omega) = \sum_{n=1}^{\infty} 1_{A_n}(\omega)$. Then

$$\int_{\Omega} f(\omega)\, d\mu = \sum_{n=1}^{\infty} \int_{\Omega} 1_{A_n}\, d\mu = \sum_{n=1}^{\infty} \mu(A_n) < \infty.$$

Thus $f(\omega) < \infty$ for almost every $\omega \in \Omega$. That is, the set $A$ where $f(\omega) = \infty$ has measure $0$. However, $f(\omega) = \infty$ if and only if $\omega \in A_n$ for infinitely many $n$. This proves the corollary.
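A concrete check of the lemma on $\Omega = (0, 1]$ with Lebesgue measure (our own example): take $A_n = (0, 1/n^2]$, so $\sum \mu(A_n) = \pi^2/6 < \infty$, and indeed every fixed $\omega$ lies in only finitely many $A_n$:

```python
import math

mu = lambda a, b: max(0.0, b - a)          # Lebesgue measure of (a, b]
total = sum(mu(0, 1 / n**2) for n in range(1, 100000))
print(total, math.pi**2 / 6)               # partial sums approach pi^2/6

# any fixed omega in (0,1] lies in A_n only for n <= 1/sqrt(omega):
omega = 0.01
hits = [n for n in range(1, 1000) if omega <= 1 / n**2]
print(hits)                                # [1, ..., 10] -- finitely many
```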
Let $\mu$ be the counting measure on $\Omega = \{1, 2, 3, \ldots\}$ and define the measurable function $f$ by $f(j) = a_j$, where $\{a_j\}$ is a sequence of nonnegative constants. Then

$$\int_{\Omega} f(j)\, d\mu(j) = \sum_{j=1}^{\infty} a_j.$$

From this and Theorem 2.1 we have

Corollary 2.3. Let $a_{ij} \geq 0$ for all $i, j$. Then

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$

The above theorem together with Theorem 1.1 and Proposition 2.2 gives

Corollary 2.4. Let $f$ be a nonnegative measurable function. Define

$$\nu(E) = \int_E f\, d\mu.$$

Then $\nu$ is a measure and

$$\int_{\Omega} g\, d\nu = \int_{\Omega} g f\, d\mu$$

for all nonnegative measurable functions $g$.
Theorem 2.2 (Fatou's Lemma). Let $\{f_n\}$ be a sequence of nonnegative measurable functions. Then

$$\int_{\Omega} \liminf_{n} f_n\, d\mu \leq \liminf_{n} \int_{\Omega} f_n\, d\mu.$$

Proof. Set

$$g_n(\omega) = \inf_{m \geq n} f_m(\omega), \quad n = 1, 2, \ldots$$

Then $\{g_n\}$ is a sequence of nonnegative measurable functions satisfying the hypothesis of Theorem 2.1. Since

$$\lim_{n \to \infty} g_n(\omega) = \liminf_{n} f_n(\omega)$$

and

$$\int_{\Omega} g_n\, d\mu \leq \int_{\Omega} f_n\, d\mu,$$

Theorem 2.1 gives

$$\int_{\Omega} \liminf_{n} f_n\, d\mu = \int_{\Omega} \lim_{n} g_n\, d\mu = \lim_{n} \int_{\Omega} g_n\, d\mu \leq \liminf_{n} \int_{\Omega} f_n\, d\mu.$$

This proves the theorem.
Proposition 2.2. Let $f$ be a measurable function. Then

$$\Big|\int_{\Omega} f\, d\mu\Big| \leq \int_{\Omega} |f|\, d\mu.$$

Proof. We may assume the right-hand side is finite. Set $\alpha = \int_{\Omega} f\, d\mu$ and take $\beta = \operatorname{sign}(\alpha)$, so that $\beta\alpha = |\alpha|$. Then

$$\Big|\int_{\Omega} f\, d\mu\Big| = |\alpha| = \beta\alpha = \beta \int_{\Omega} f\, d\mu = \int_{\Omega} \beta f\, d\mu \leq \int_{\Omega} |f|\, d\mu.$$
Theorem 2.3 (The Lebesgue Dominated Convergence Theorem). Let $\{f_n\}$ be a sequence of measurable functions such that $f_n(\omega) \to f(\omega)$ for every $\omega \in \Omega$. Suppose there is a $g \in L^1(\mu)$ with $|f_n(\omega)| \leq g(\omega)$. Then $f \in L^1(\mu)$ and

$$\lim_{n \to \infty} \int_{\Omega} |f_n - f|\, d\mu = 0.$$

In particular,

$$\lim_{n \to \infty} \int_{\Omega} f_n\, d\mu = \int_{\Omega} f\, d\mu.$$

Proof. Since $|f(\omega)| = \lim_n |f_n(\omega)| \leq g(\omega)$, we see that $f \in L^1(\mu)$. Since $|f_n - f| \leq 2g$, the functions $2g - |f_n - f|$ are nonnegative, and Fatou's Lemma gives

$$\int_{\Omega} 2g\, d\mu = \int_{\Omega} \liminf_{n} \big(2g - |f_n - f|\big)\, d\mu \leq \liminf_{n} \int_{\Omega} \big(2g - |f_n - f|\big)\, d\mu = \int_{\Omega} 2g\, d\mu - \limsup_{n} \int_{\Omega} |f_n - f|\, d\mu.$$

Since $\int_{\Omega} 2g\, d\mu < \infty$, it follows from this that

$$\lim_{n \to \infty} \int_{\Omega} |f_n - f|\, d\mu = 0.$$

Since

$$\Big|\int_{\Omega} f_n\, d\mu - \int_{\Omega} f\, d\mu\Big| \leq \int_{\Omega} |f_n - f|\, d\mu,$$

the first part follows. The second part follows from the first.
Definition 2.2. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Let $P$ be a property which a point $\omega \in \Omega$ may or may not have. We say that $P$ holds almost everywhere on $E$, and write this as a.e., if there exists a measurable subset $N \subset E$ such that $P$ holds for all $\omega \in E \setminus N$ and $\mu(N) = 0$.

For example, we say that $f_n \to f$ almost everywhere if $f_n(\omega) \to f(\omega)$ for all $\omega \in \Omega$ except for a set of measure zero. In the same way, $f = 0$ almost everywhere if $f(\omega) = 0$ except for a set of measure zero.
Proposition 2.3 (Chebyshev's Inequality). Fix $0 < p < \infty$ and let $f$ be a nonnegative measurable function on $(\Omega, \mathcal{F}, \mu)$. Then for any measurable set $E$ and any $\lambda > 0$ we have

$$\mu\{\omega \in E : f(\omega) > \lambda\} \leq \frac{1}{\lambda^p} \int_E f^p\, d\mu.$$

Proof.

$$\lambda^p\, \mu\{\omega \in E : f(\omega) > \lambda\} = \int_{\{\omega \in E : f(\omega) > \lambda\}} \lambda^p\, d\mu \leq \int_{\{\omega \in E : f(\omega) > \lambda\}} f^p\, d\mu \leq \int_E f^p\, d\mu,$$

which proves the proposition.
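Chebyshev's inequality can be sanity-checked on a finite measure space, taking uniform weights on sample points so that integrals become averages. A sketch assuming NumPy is available:

```python
import numpy as np

# Chebyshev on a finite measure space: mu uniform over 1000 sample points.
rng = np.random.default_rng(0)
f = np.abs(rng.normal(size=1000))   # a nonnegative "measurable function"

for p in (1, 2):
    for lam in (0.5, 1.0, 2.0):
        lhs = np.mean(f > lam)                  # mu{f > lam} (uniform weights)
        rhs = np.mean(f**p) / lam**p            # lam^{-p} * integral of f^p
        assert lhs <= rhs                       # the inequality holds exactly
        print(p, lam, round(lhs, 3), round(rhs, 3))
```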
Proposition 2.4.

(i) Let $f$ be a nonnegative measurable function. Suppose

$$\int_E f\, d\mu = 0.$$

Then $f = 0$ a.e. on $E$.

(ii) Suppose $f \in L^1(\mu)$ and $\int_E f\, d\mu = 0$ for all measurable sets $E \subset \Omega$. Then $f = 0$ a.e. on $\Omega$.

Proof. Observe that

$$\{\omega \in E : f(\omega) > 0\} = \bigcup_{n=1}^{\infty} \{\omega \in E : f(\omega) > 1/n\}.$$

By Proposition 2.3,

$$\mu\{\omega \in E : f(\omega) > 1/n\} \leq n \int_E f\, d\mu = 0.$$

Therefore $\mu\{\omega \in E : f(\omega) > 0\} = 0$, which proves (i).

For (ii), set $E = \{\omega : f(\omega) \geq 0\} = \{\omega : f(\omega) = f^+(\omega)\}$. Then

$$\int_E f^+\, d\mu = \int_E f\, d\mu = 0,$$

which by (i) implies that $f^+ = 0$ a.e. Similarly, with $E = \{\omega : f(\omega) \leq 0\}$,

$$0 = \int_E f\, d\mu = -\int_E f^-\, d\mu,$$

and this gives $\int_E f^-\, d\mu = 0$, which by (i) implies that $f^- = 0$ a.e.
Definition 2.3. The function $\varphi: (a, b) \to \mathbb{R}$ (the interval $(a, b) = \mathbb{R}$ is permitted) is convex if

$$\varphi((1 - \lambda)x + \lambda y) \leq (1 - \lambda)\varphi(x) + \lambda \varphi(y) \tag{2.3}$$

for all $x, y \in (a, b)$ and all $0 \leq \lambda \leq 1$.

An important property of convex functions is that they are always continuous. This is easy to see geometrically, but the proof is not as trivial. What follows easily from the definition is

Problem 2.1. Prove that (2.3) is equivalent to the following statement: for all $a < s < t < u < b$,

$$\frac{\varphi(t) - \varphi(s)}{t - s} \leq \frac{\varphi(u) - \varphi(t)}{u - t},$$

and conclude that a differentiable function is convex if and only if its derivative is a nondecreasing function.
Proposition 2.5 (Jensen's Inequality). Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. Let $f \in L^1(\mu)$ with $a < f(\omega) < b$, and suppose $\varphi$ is convex on $(a, b)$. Then $\varphi(f)$ is measurable and

$$\varphi\Big(\int_{\Omega} f\, d\mu\Big) \leq \int_{\Omega} \varphi(f)\, d\mu.$$

Proof. The measurability of the function $\varphi(f)$ follows from the continuity of $\varphi$ and the measurability of $f$, using Proposition 1.4. Since $a < f(\omega) < b$ for all $\omega \in \Omega$ and $\mu$ is a probability measure, we see that if

$$t = \int_{\Omega} f\, d\mu,$$

then $a < t < b$. Let $\ell(x) = c_1 x + c_2$ be the equation of the supporting line of the convex function $\varphi$ at the point $(t, \varphi(t))$. That is, $\ell$ satisfies $\ell(t) = \varphi(t)$ and $\ell(x) \leq \varphi(x)$ for all $x \in (a, b)$. The existence of such a line follows from Problem 2.1. Then for all $\omega \in \Omega$,

$$\varphi(f(\omega)) \geq c_1 f(\omega) + c_2 = \ell(f(\omega)).$$

Integrating this inequality and using the fact that $\mu(\Omega) = 1$, we have

$$\int_{\Omega} \varphi(f(\omega))\, d\mu \geq c_1 \int_{\Omega} f(\omega)\, d\mu + c_2 = \ell\Big(\int_{\Omega} f(\omega)\, d\mu\Big) = \varphi\Big(\int_{\Omega} f(\omega)\, d\mu\Big),$$

which is the desired inequality.
Examples.

(i) Let $\varphi(x) = e^x$. Then

$$\exp\Big(\int_{\Omega} f\, d\mu\Big) \leq \int_{\Omega} e^f\, d\mu.$$

(ii) If $\Omega = \{1, 2, \ldots, n\}$ with the measure $\mu$ defined by $\mu\{i\} = 1/n$ and the function $f$ given by $f(i) = x_i$, we obtain

$$\exp\Big(\frac{1}{n}(x_1 + x_2 + \cdots + x_n)\Big) \leq \frac{1}{n}\big(e^{x_1} + \cdots + e^{x_n}\big).$$

Setting $y_i = e^{x_i}$ we obtain the geometric mean inequality. That is,

$$(y_1 \cdots y_n)^{1/n} \leq \frac{1}{n}(y_1 + \cdots + y_n).$$
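The geometric mean inequality is easy to test numerically, and the same code checks its weighted form, which is the statement of Problem 2.2 below. A sketch using only the standard library:

```python
import math, random

# Weighted AM-GM (Problem 2.2) checked on random weights/values:
# y1^a1 * ... * yn^an <= a1*y1 + ... + an*yn when the a_i sum to 1.
random.seed(1)
for _ in range(1000):
    n = random.randint(2, 6)
    a = [random.random() for _ in range(n)]
    s = sum(a); a = [x / s for x in a]              # normalize the weights
    y = [random.uniform(0.1, 10.0) for _ in range(n)]
    geo = math.prod(yi**ai for yi, ai in zip(y, a))
    ari = sum(ai * yi for ai, yi in zip(a, y))
    assert geo <= ari + 1e-12
print("weighted AM-GM verified on 1000 random instances")
```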
More generally, extend this example in the following way.

Problem 2.2. Let $\alpha_1, \ldots, \alpha_n$ be a sequence of positive numbers with $\alpha_1 + \cdots + \alpha_n = 1$, and let $y_1, \ldots, y_n$ be positive numbers. Prove that

$$y_1^{\alpha_1} \cdots y_n^{\alpha_n} \leq \alpha_1 y_1 + \cdots + \alpha_n y_n.$$
Denition 2.4. Let (, T, ) be a measure space. Let 0 < p < and set
|f|
p
=
__

[f[
p
d
_
1/p
.
We say that f L
p
() if |f|
p
< . To dene L

() we set
E = m R
+
: : [f()[ > m = 0.
If E = , dene |f|

= . If E ,= , dene |f|

= inf E. The function


f L

() if |f|

< .
Suppose |f|

< . Since
f
1
(|f|

, ] =

_
n=1
f
1
_
|f|

+
1
n
,
_
and f
1
(|f|

+
1
n
, ] =, we see |f|

E. The quantity |f|

is called the
essential supremum of f.
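On a finite measure space the essential supremum is easy to compute explicitly, and doing so illustrates how ‖f‖_∞ ignores sets of measure zero. The following small Python illustration uses a hypothetical discrete example (not from the notes), where μ is a weighted measure on finitely many points:

```python
def ess_sup(values, weights):
    """Essential supremum of f on a finite measure space {0,...,n-1} with
    mu{i} = weights[i]: the smallest m with mu{|f| > m} = 0, i.e. the
    largest |f(i)| that carries positive mass."""
    carried = [abs(v) for v, w in zip(values, weights) if w > 0]
    return max(carried) if carried else 0.0

# f takes the huge value 1000 only on a point of measure zero, so that
# value is invisible to the L^infinity norm:
f = [1.0, 3.0, 1000.0]
mu = [0.5, 0.5, 0.0]
print(ess_sup(f, mu))   # 3.0, even though sup |f| = 1000
```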
Theorem 2.4.

(i) (Hölder's inequality) Let 1 ≤ p ≤ ∞ and let q be its conjugate exponent. That is, 1/p + 1/q = 1. If p = 1 we take q = ∞. Also note that when p = 2, q = 2. Let f ∈ L^p(μ) and g ∈ L^q(μ). Then fg ∈ L¹(μ) and

    ∫_Ω |fg| dμ ≤ ‖f‖_p ‖g‖_q.

(ii) (Minkowski's inequality) Let 1 ≤ p ≤ ∞. Then

    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.
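Both inequalities can be tested numerically on a counting measure, where every integral is a finite sum. A Python sketch (random data; the 1e-9 slack only absorbs floating-point rounding and is an arbitrary choice):

```python
import random

# Check Holder and Minkowski on Omega = {0,...,n-1} with the counting
# measure, where each integral is a finite sum.
random.seed(1)

def norm_p(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

for _ in range(500):
    n = random.randint(1, 8)
    f = [random.uniform(-5.0, 5.0) for _ in range(n)]
    g = [random.uniform(-5.0, 5.0) for _ in range(n)]
    p = random.uniform(1.01, 10.0)
    q = p / (p - 1.0)               # conjugate exponent: 1/p + 1/q = 1
    # Holder: sum |f g| <= ||f||_p ||g||_q
    assert sum(abs(a * b) for a, b in zip(f, g)) <= norm_p(f, p) * norm_p(g, q) + 1e-9
    # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    assert norm_p([a + b for a, b in zip(f, g)], p) <= norm_p(f, p) + norm_p(g, p) + 1e-9
```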
Proof. If p = 1 and q = ∞, or q = 1 and p = ∞, we have |fg(ω)| ≤ ‖g‖_∞|f(ω)|. This immediately gives the result when p = 1 or p = ∞. Assume 1 < p < ∞ and (without loss of generality) that both f and g are nonnegative. Let

    A = (∫_Ω f^p dμ)^{1/p} and B = (∫_Ω g^q dμ)^{1/q}.

If A = 0, then f = 0 almost everywhere; if B = 0, then g = 0 almost everywhere; and in either case the result follows. Assume 0 < A < ∞ and the same for B. Put F = f/A, G = g/B. Then

    ∫_Ω F^p dμ = ∫_Ω G^q dμ = 1,

and by Problem 2.2,

    F(ω)G(ω) ≤ (1/p)F^p(ω) + (1/q)G^q(ω).

Integrating both sides of this inequality gives

    ∫_Ω F(ω)G(ω) dμ ≤ 1/p + 1/q = 1,

which implies (i) after multiplying by AB.
For (ii), the cases p = 1 and p = ∞ are again clear. Assume therefore that 1 < p < ∞. As before, we may assume that both f and g are nonnegative. We start by observing that since the function φ(x) = x^p, x ∈ ℝ⁺, is convex, we have

    ((f + g)/2)^p ≤ (1/2)f^p + (1/2)g^p.

This gives

    ∫_Ω (f + g)^p dμ ≤ 2^{p-1} ∫_Ω f^p dμ + 2^{p-1} ∫_Ω g^p dμ.

Thus, f + g ∈ L^p(dμ). Next,

    (f + g)^p = (f + g)(f + g)^{p-1} = f(f + g)^{p-1} + g(f + g)^{p-1},

together with Hölder's inequality and the fact that q(p - 1) = p, gives

    ∫_Ω f(f + g)^{p-1} dμ ≤ (∫_Ω f^p dμ)^{1/p} (∫_Ω (f + g)^{q(p-1)} dμ)^{1/q}
                         = (∫_Ω f^p dμ)^{1/p} (∫_Ω (f + g)^p dμ)^{1/q}.

In the same way,

    ∫_Ω g(f + g)^{p-1} dμ ≤ (∫_Ω g^p dμ)^{1/p} (∫_Ω (f + g)^p dμ)^{1/q}.

Adding these inequalities we obtain

    ∫_Ω (f + g)^p dμ ≤ [(∫_Ω f^p dμ)^{1/p} + (∫_Ω g^p dμ)^{1/p}] (∫_Ω (f + g)^p dμ)^{1/q}.

Since f + g ∈ L^p(dμ), we may divide by the last factor to obtain the desired inequality.
For f, g ∈ L^p(μ) define d(f, g) = ‖f - g‖_p. For 1 ≤ p ≤ ∞, Minkowski's inequality shows that this function satisfies the triangle inequality. That is,

    d(f, g) = ‖f - g‖_p = ‖f - h + h - g‖_p ≤ ‖f - h‖_p + ‖h - g‖_p = d(f, h) + d(h, g),

for all f, g, h ∈ L^p(μ). It follows that L^p(μ) is a metric space with respect to d(·, ·).

Theorem 2.4. L^p(μ), 1 ≤ p ≤ ∞, is complete with respect to d(·, ·).
Lemma 2.1. Let {g_k} be a sequence of functions in L^p(μ), 0 < p < ∞, satisfying

    ‖g_k - g_{k+1}‖_p ≤ (1/4)^k, k = 1, 2, …

Then {g_k} converges a.e.

Proof. Set

    A_k = {ω ∈ Ω : |g_k(ω) - g_{k+1}(ω)| > 2^{-k}}.

By Chebyshev's inequality,

    μ(A_k) ≤ 2^{kp} ∫_Ω |g_k - g_{k+1}|^p dμ ≤ (1/4)^{kp} 2^{kp} = 1/2^{kp}.

This shows that

    Σ_{k=1}^∞ μ(A_k) < ∞.

By Corollary 2.2, μ{A_k i.o.} = 0. Thus, for almost every ω ∈ {A_k i.o.}^c there is an N = N(ω) such that

    |g_k(ω) - g_{k+1}(ω)| ≤ 2^{-k}

for all k > N. It follows from this that {g_k(ω)} is Cauchy in ℝ and hence g_k(ω) converges.
Lemma 2.2. The sequence of functions {f_n} converges to f in L^∞ if and only if there is a measurable set A with μ(A) = 0 such that f_n → f uniformly on A^c. Also, the sequence {f_n} is Cauchy in L^∞ if and only if there is a measurable set A with μ(A) = 0 such that {f_n} is uniformly Cauchy on A^c.

Proof. We prove the first statement, leaving the second to the reader. Suppose ‖f_n - f‖_∞ → 0. Then for each k ≥ 1 there is an n(k) sufficiently large so that ‖f_n - f‖_∞ < 1/k for all n > n(k). Thus, there is a set A_k such that μ(A_k) = 0 and |f_n(ω) - f(ω)| < 1/k for every ω ∈ A_k^c and every n > n(k). Let A = ⋃ A_k. Then μ(A) = 0 and f_n → f uniformly on A^c. For the converse, suppose f_n → f uniformly on A^c and μ(A) = 0. Then given ε > 0 there is an N such that for all n > N and ω ∈ A^c, |f_n(ω) - f(ω)| < ε. This is the same as saying that ‖f_n - f‖_∞ < ε for all n > N.
Proof of Theorem 2.4. Suppose {f_n} is Cauchy in L^p(μ). That is, given any ε > 0, there is an N such that d(f_n, f_m) = ‖f_n - f_m‖_p < ε for all n, m > N. Assume first 1 ≤ p < ∞. Then for each k = 1, 2, …, there is an n_k such that

    ‖f_n - f_m‖_p ≤ (1/4)^k

for all n, m ≥ n_k. Thus, by Lemma 2.1 applied to g_k = f_{n_k}, the subsequence f_{n_k}(ω) converges a.e. to a function f. We need to show that f ∈ L^p(μ) and that it is the L^p(μ) limit of {f_n}. Let ε > 0. Take N so large that ‖f_n - f_m‖_p < ε for all n, m ≥ N. Fix such an m. Then by the pointwise convergence of the subsequence and by Fatou's Lemma we have

    ∫_Ω |f - f_m|^p dμ = ∫_Ω lim_{k→∞} |f_{n_k} - f_m|^p dμ ≤ liminf_{k→∞} ∫_Ω |f_{n_k} - f_m|^p dμ ≤ ε^p.

Since ε > 0 was arbitrary, this shows that f_n → f in L^p(μ) and that

    ∫_Ω |f - f_m|^p dμ < ∞

for m sufficiently large. But then,

    ‖f‖_p = ‖f_m - (f_m - f)‖_p ≤ ‖f_m‖_p + ‖f_m - f‖_p,

which shows that f ∈ L^p(μ).
Now, suppose p = ∞. Let {f_n} be Cauchy in L^∞. There is a set A with μ(A) = 0 such that {f_n} is uniformly Cauchy on A^c, by Lemma 2.2. That is, given ε > 0 there is an N such that for all n, m > N and all ω ∈ A^c,

    |f_n(ω) - f_m(ω)| < ε.

Therefore the sequence {f_n} converges uniformly on A^c to a function f. Define f(ω) = 0 for ω ∈ A. Then f_n converges to f in L^∞(μ) and f ∈ L^∞(μ).
In the course of proving Theorem 2.4 we proved that if a sequence of functions in L^p(μ), 1 ≤ p < ∞, converges in L^p(μ), then there is a subsequence which converges a.e. This result is of sufficient importance that we list it here as a corollary.

Corollary 2.5. Let f_n ∈ L^p(μ) with 1 ≤ p < ∞ and f_n → f in L^p(μ). Then there exists a subsequence {f_{n_k}} with f_{n_k} → f a.e. as k → ∞.

The following proposition will be useful later.
Proposition 2.6. Let f ∈ L¹(μ). Given ε > 0 there exists a δ > 0 such that ∫_E |f| dμ < ε whenever μ(E) < δ.

Proof. Suppose the statement is false. Then we can find an ε > 0 and a sequence of measurable sets {E_n} with

    ∫_{E_n} |f| dμ ≥ ε and μ(E_n) < 1/2^n.

Let A_n = ⋃_{j=n}^∞ E_j and A = ⋂_{n=1}^∞ A_n = {E_n i.o.}. Then Σ μ(E_n) < ∞ and by the Borel–Cantelli Lemma, μ(A) = 0. Also, A_{n+1} ⊂ A_n for all n and since

    ν(E) = ∫_E |f| dμ

is a finite measure, we have

    ∫_A |f| dμ = lim_{n→∞} ∫_{A_n} |f| dμ ≥ lim_{n→∞} ∫_{E_n} |f| dμ ≥ ε.

This is a contradiction, since μ(A) = 0 and therefore the integral of any function over this set must be zero.
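Proposition 2.6 can be made concrete on a simple example (the function and closed-form integral below are illustrative choices, not part of the notes): for f(x) = x^{-1/2} on (0, 1], which is integrable but unbounded near 0, the integral over E = (0, δ) equals 2√δ, so δ = (ε/2)² works as the δ promised by the proposition.

```python
# Proposition 2.6 for f(x) = 1/sqrt(x) on (0, 1], integrable but unbounded.
# The integral of f over E = (0, delta) is 2*sqrt(delta), so choosing
# delta = (eps/2)**2 forces the integral below eps.
def integral_near_zero(delta):
    # closed form of the integral of x**(-1/2) over (0, delta)
    return 2.0 * delta ** 0.5

for eps in [1.0, 0.1, 0.01]:
    delta = (eps / 2.0) ** 2
    assert integral_near_zero(delta) <= eps + 1e-12
```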
3 Types of convergence for measurable functions.

Definition 3.1. Let {f_n} be a sequence of measurable functions on (Ω, F, μ).

(i) f_n → f in measure if for all ε > 0,

    lim_{n→∞} μ{ω ∈ Ω : |f_n(ω) - f(ω)| > ε} = 0.

(ii) f_n → f almost uniformly if given ε > 0 there is a set E ∈ F with μ(E) < ε such that f_n → f uniformly on E^c.
Proposition 3.1. Let {f_n} be measurable and 0 < p < ∞. Suppose f_n → f in L^p(μ). Then f_n → f in measure.

Proof. By Chebyshev's inequality,

    μ{|f_n - f| > ε} ≤ (1/ε^p) ∫_Ω |f_n - f|^p dμ,

and the result follows.
Example 3.1. Let Ω = [0, 1] with the Lebesgue measure. Let

    f_n(ω) = e^n for 0 ≤ ω ≤ 1/n, and f_n(ω) = 0 otherwise.

Then f_n → 0 in measure but f_n ↛ 0 in L^p(μ) for any 0 < p ≤ ∞. To see this, simply observe that

    ‖f_n‖_p^p = ∫_0^1 |f_n(x)|^p dx = (1/n)e^{np} → ∞

and that ‖f_n‖_∞ = e^n → ∞, as n → ∞.
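The quantities in Example 3.1 can be written out numerically; the short Python sketch below (not part of the notes) just evaluates the closed-form expressions from the example and confirms their opposite behaviors:

```python
import math

# Example 3.1 in numbers: f_n = e^n on [0, 1/n], zero elsewhere.
# mu{f_n > eps} = 1/n -> 0 (convergence in measure), while the p-th
# power of the L^p norm, e^(n p)/n, blows up for every p > 0.
def measure_above(n, eps):
    # mu{ |f_n| > eps } for any 0 < eps < e^n
    return 1.0 / n

def norm_p_pth_power(n, p):
    # integral of |f_n|^p = e^(n p) * (1/n)
    return math.exp(n * p) / n

assert measure_above(100, 1.0) == 0.01
assert norm_p_pth_power(10, 0.5) > norm_p_pth_power(5, 0.5) > 1.0
```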
Proposition 3.2. Suppose f_n → f almost uniformly. Then f_n → f in measure and almost everywhere.

Proof. Since f_n → f almost uniformly, given δ > 0 there is a measurable set E such that μ(E) < δ and f_n → f uniformly on E^c. Let ε > 0 be given. There is an N = N(ε) such that |f_n(ω) - f(ω)| < ε for all n ≥ N and for all ω ∈ E^c. That is, {ω ∈ Ω : |f_n(ω) - f(ω)| ≥ ε} ⊂ E for all n ≥ N. Hence, for all n ≥ N,

    μ{ω : |f_n(ω) - f(ω)| ≥ ε} < δ.

Since δ > 0 was arbitrary we see that for all ε > 0,

    lim_{n→∞} μ{|f_n(ω) - f(ω)| ≥ ε} = 0,

proving that f_n → f in measure.

Next, for each k take A_k ∈ F with μ(A_k) < 1/k and f_n → f uniformly on A_k^c. If E = ⋃_{k=1}^∞ A_k^c, then f_n → f on E and μ(E^c) = μ(⋂_{k=1}^∞ A_k) ≤ μ(A_k) < 1/k for all k. Thus μ(E^c) = 0 and we have the almost everywhere convergence as well.
Proposition 3.3. Suppose f_n → f in measure. Then there is a subsequence {f_{n_k}} which converges almost uniformly to f.

Proof. Since

    {|f_n - f_m| > ε} ⊂ {|f_n - f| > ε/2} ∪ {|f_m - f| > ε/2},

we see that μ{|f_n - f_m| > ε} → 0 as n, m → ∞. For each k, take n_k such that n_{k+1} > n_k and

    μ{|f_n(ω) - f_m(ω)| > 1/2^k} ≤ 1/2^k

for all n, m ≥ n_k. Setting g_k = f_{n_k} and A_k = {ω ∈ Ω : |g_{k+1}(ω) - g_k(ω)| > 1/2^k}, we see that

    Σ_{k=1}^∞ μ(A_k) < ∞.

By the Borel–Cantelli Lemma, Corollary 2.2, μ{A_n i.o.} = 0. However, for every ω ∉ {A_n i.o.} there is an N = N(ω) such that

    |g_{k+1}(ω) - g_k(ω)| ≤ 1/2^k

for all k ≥ N. This implies that the sequence of real numbers {g_k(ω)} is Cauchy and hence it converges to g(ω). Thus g_k → g a.e.

To get the almost uniform convergence, set E_n = ⋃_{k=n}^∞ A_k. Then μ(E_n) ≤ Σ_{k=n}^∞ μ(A_k), and this can be made smaller than ε as soon as n is large enough. If ω ∉ E_n, then

    |g_k(ω) - g_{k+1}(ω)| ≤ 1/2^k

for all k ∈ {n, n + 1, n + 2, …}. Thus g_k → g uniformly on E_n^c.

For the uniqueness, suppose f_n → f in measure. Then f_{n_k} → f in measure also. Since we also have f_{n_k} → g almost uniformly, clearly f_{n_k} → g in measure, and hence f = g a.e. This completes the proof.
Theorem 3.1 (Egorov's Theorem). Suppose (Ω, F, μ) is a finite measure space and that f_n → f a.e. Then f_n → f almost uniformly.

Proof. We use Problem 3.2 below. Let ε > 0 be given. For each k there is an n(k) such that if

    A_k = ⋃_{n=n(k)}^∞ {ω ∈ Ω : |f_n - f| ≥ 1/k},

then μ(A_k) ≤ ε/2^k. Thus if

    A = ⋃_{k=1}^∞ A_k,

then μ(A) ≤ Σ_{k=1}^∞ μ(A_k) < ε. Now, if δ > 0, take k so large that 1/k < δ; then for any n > n(k) and ω ∉ A, |f_n(ω) - f(ω)| < 1/k < δ. Thus f_n → f uniformly on A^c.
Let us recall that if {y_n} is a sequence of real numbers, then y_n converges to y if and only if every subsequence {y_{n_k}} has a further subsequence {y_{n_{k_j}}} which converges to y. For measurable functions we have the following result.

Proposition 3.4. The sequence of measurable functions {f_n} on (Ω, F, μ) converges to f in measure if and only if every subsequence {f_{n_k}} contains a further subsequence converging a.e. to f.

Proof. Suppose f_n → f in measure; then so does every subsequence, so it suffices to extract from the given subsequence a further subsequence converging a.e. Let {ε_k} be a sequence converging down to 0. Then μ{|f_n - f| > ε_k} → 0 as n → ∞, for each k. We may therefore choose a sub-subsequence {f_{n_k}} satisfying

    μ{|f_{n_k} - f| > ε_k} ≤ 1/2^k.

Hence,

    Σ_{k=1}^∞ μ{|f_{n_k} - f| > ε_k} < ∞,

and therefore, by the first Borel–Cantelli Lemma, μ{|f_{n_k} - f| > ε_k i.o.} = 0. Thus |f_{n_k} - f| ≤ ε_k eventually, a.e., and so f_{n_k} → f a.e.

For the converse, let ε > 0, put y_n = μ{|f_n - f| > ε}, and consider a subsequence {y_{n_k}}. By assumption, {f_{n_k}} has a further subsequence {f_{n_{k_j}}} such that f_{n_{k_j}} → f a.e., and hence y_{n_{k_j}} → 0. Thus every subsequence of {y_n} has a further subsequence converging to 0, and therefore y_n → 0. That is, f_n → f in measure.
Problem 3.1. Let Ω = [0, ∞) with the Lebesgue measure and define f_n(ω) = 1_{A_n}(ω), where A_n = {ω : n ≤ ω ≤ n + 1/n}. Prove that f_n → 0 a.e., in measure and in L^p(μ), but that f_n ↛ 0 almost uniformly.
Problem 3.2. Let (Ω, F, μ) be a finite measure space. Prove that f_n → f a.e. if and only if for all ε > 0,

    lim_{n→∞} μ(⋃_{k=n}^∞ A_k(ε)) = 0,

where

    A_k(ε) = {ω ∈ Ω : |f_k(ω) - f(ω)| ≥ ε}.
Problem 3.3.

(i) Give an example of a sequence of nonnegative measurable functions {f_n} for which we have strict inequality in Fatou's Lemma.

(ii) Let (Ω, F, μ) be a measure space and {A_n} a sequence of measurable sets. Recall that liminf_{n→∞} A_n = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k, and prove that

    μ(liminf_{n→∞} A_n) ≤ liminf_{n→∞} μ(A_n).

(iii) Suppose {f_n} is a sequence of nonnegative measurable functions on (Ω, F, μ) which is pointwise decreasing to f. That is, f₁(ω) ≥ f₂(ω) ≥ … ≥ 0 and f_n(ω) ↓ f(ω). Is it true that

    lim_{n→∞} ∫_Ω f_n dμ = ∫_Ω f dμ?
Problem 3.4. Let (Ω, F, P) be a probability space and suppose f ∈ L¹(P). Prove that

    lim_{p→0} ‖f‖_p = exp(∫_Ω log|f| dP),

where exp(-∞) is defined to be zero.
Problem 3.5. Let (Ω, F, μ) be a finite measure space. Prove that the function λ ↦ μ{|f| > λ}, for λ > 0, is right continuous and nonincreasing. Furthermore, if f, f₁, f₂ are nonnegative measurable functions and β₁, β₂ are positive numbers with the property that f ≤ β₁f₁ + β₂f₂, then for all λ > 0,

    μ{f > (β₁ + β₂)λ} ≤ μ{f₁ > λ} + μ{f₂ > λ}.
Problem 3.6. Let {f_n} be a nondecreasing sequence of measurable nonnegative functions converging a.e. on Ω to f. Prove that

    lim_{n→∞} μ{f_n > λ} = μ{f > λ}.
Problem 3.7. Let (Ω, F, μ) be a measure space and suppose {f_n} is a sequence of measurable functions satisfying

    Σ_{n=1}^∞ μ{|f_n| > λ_n} < ∞

for some sequence of real numbers {λ_n}. Prove that

    limsup_{n→∞} |f_n|/λ_n ≤ 1, a.e.
Problem 3.8. Let (Ω, F, μ) be a finite measure space and let {f_n} be a sequence of measurable functions on this space.

(i) Prove that f_n converges to f a.e. if and only if for any ε > 0,

    lim_{m→∞} μ{|f_n - f_{n′}| > ε for some n′ > n ≥ m} = 0.

(ii) Prove that f_n → 0 a.e. if and only if for all ε > 0,

    μ{|f_n| > ε, i.o.} = 0.

(iii) Suppose the functions are nonnegative. Prove that f_n → ∞ a.e. if and only if for all M > 0,

    μ{f_n < M, i.o.} = 0.
Problem 3.9. Let Ω = [0, 1] with its Lebesgue measure. Suppose f ∈ L¹(Ω). Prove that x^n f ∈ L¹(Ω) for every n = 1, 2, … and compute

    lim_{n→∞} ∫_Ω x^n f(x) dx.
Problem 3.10. Let (Ω, F, μ) be a finite measure space and f a nonnegative real valued measurable function on Ω. Prove that

    lim_{n→∞} ∫_Ω f^n dμ

exists, as a finite number, if and only if μ{f > 1} = 0.
Problem 3.11. Suppose f ∈ L¹(μ). Prove that

    lim_{n→∞} ∫_{|f|>n} f dμ = 0.
Problem 3.12. Let Ω = [0, 1], let (Ω, F, μ) be a finite measure space, and let f be a measurable function on this space. Let E be the set of all x such that f(x) is an integer. Prove that the set E is measurable and that

    lim_{n→∞} ∫_Ω (cos(πf(x)))^{2n} dμ = μ(E).
Problem 3.13. Let (Ω, F, P) be a probability space. Suppose f and g are positive measurable functions such that fg ≥ 1 a.e. on Ω. Prove that

    ∫_Ω fg dP ≥ 1.
Problem 3.14. Let (Ω, F, P) be a probability space and suppose f ∈ L¹(P). Prove that

    lim_{p→0} ‖f‖_p = exp(∫_Ω log|f| dP),

where exp(-∞) is defined to be zero.
Problem 3.15. Let (Ω, F, P) be a probability space. Suppose f ∈ L^∞(P) and ‖f‖_∞ > 0. Prove that

    lim_{n→∞} (∫_Ω |f|^{n+1} dP) / (∫_Ω |f|^n dP) = ‖f‖_∞.
Problem 3.16. Let (Ω, F, P) be a probability space and {f_n} a sequence of measurable functions converging to zero in measure. Let F be a bounded uniformly continuous function on ℝ. Prove that

    lim_{n→∞} ∫_Ω F(f_n) dP = F(0).
Problem 3.17. Let (Ω, F, P) be a probability space.

(i) Suppose F : ℝ → ℝ is a continuous function and f_n → f in measure. Prove that F(f_n) → F(f) in measure.

(ii) If f_n ≥ 0 and f_n → f in measure, prove that

    ∫_Ω f dμ ≤ liminf ∫_Ω f_n dμ.

(iii) Suppose |f_n| ≤ g where g ∈ L¹(μ) and f_n → f in measure. Prove that

    ∫_Ω f dμ = lim ∫_Ω f_n dμ.
Problem 3.18. Let (Ω, F, μ) be a measure space and let f₁, f₂, …, f_n be measurable functions. Suppose 1 < p < ∞. Prove that

    ∫_Ω |(1/n) Σ_{j=1}^n f_j(x)|^p dμ(x) ≤ (1/n) Σ_{j=1}^n ∫_Ω |f_j(x)|^p dμ(x)

and

    ∫_Ω |(1/n) Σ_{j=1}^n f_j(x)|^p dμ(x) ≤ ((1/n) Σ_{j=1}^n ‖f_j‖_p)^p.
Problem 3.19. Let (Ω, F, μ) be a measure space and let {f_n} be a sequence of measurable functions satisfying ‖f_n‖_p ≤ n^{1/p}, for 2 < p < ∞. Prove that the sequence (1/n)f_n converges to zero almost everywhere.
Problem 3.20. Suppose (Ω, F, P) is a probability space and that f ∈ L¹(P) is nonnegative. Prove that

    √(1 + ‖f‖₁²) ≤ ∫_Ω √(1 + f²) dP ≤ 1 + ‖f‖₁.
Problem 3.21. Compute, justifying all your steps,

    lim_{n→∞} ∫_0^n (1 - x/n)^n e^{x/2} dx.
Problem 3.22. Let (Ω, F, P) be a probability space. Let f be a measurable function with the property that ‖f‖₂ = 1 and ‖f‖₁ = 1/2. Prove that for 0 ≤ λ ≤ 1,

    (1/4)(1 - λ)² ≤ P{ω ∈ Ω : |f(ω)| ≥ λ/2}.
III

PRODUCT MEASURES

Our goal in this chapter is to present the essentials of integration in product spaces. We begin by defining the product measure. Many of the definitions and properties of product measures are, in some sense, obvious. However, we need to state them properly and prove them carefully so that they may be freely used in the subsequent chapters.

1 Definitions and Preliminaries.

Definition 1.1. If X and Y are any two sets, their Cartesian product X × Y is the set of all ordered pairs {(x, y) : x ∈ X, y ∈ Y}.

If A ⊂ X and B ⊂ Y, then A × B ⊂ X × Y is called a rectangle. Suppose (X, 𝒜) and (Y, ℬ) are measurable spaces. A measurable rectangle is a set of the form A × B, A ∈ 𝒜, B ∈ ℬ. A set of the form

    Q = R₁ ∪ … ∪ R_n,

where the R_i are disjoint measurable rectangles, is called an elementary set. We denote this collection by ℰ.

Exercise 1.1. Prove that the elementary sets form an algebra. That is, ℰ is closed under complementation and finite unions.

We shall denote by 𝒜 × ℬ the σ-algebra generated by the measurable rectangles, which is the same as the σ-algebra generated by the elementary sets.
Theorem 1.1. Let E ⊂ X × Y and define the projections

    E_x = {y ∈ Y : (x, y) ∈ E}, and E^y = {x ∈ X : (x, y) ∈ E}.

If E ∈ 𝒜 × ℬ, then E_x ∈ ℬ and E^y ∈ 𝒜 for all x ∈ X and y ∈ Y.

Proof. We shall only prove that if E ∈ 𝒜 × ℬ then E_x ∈ ℬ, the case of E^y being completely analogous. For this, let Ω be the collection of all sets E ∈ 𝒜 × ℬ for which E_x ∈ ℬ for every x ∈ X. We show Ω is a σ-algebra containing all measurable rectangles. To see this, note that if E = A × B, then

    E_x = B if x ∈ A, and E_x = ∅ if x ∉ A.

Thus, E ∈ Ω. The collection Ω also has the following properties:

(i) X × Y ∈ Ω.

(ii) If E ∈ Ω then E^c ∈ Ω. This follows from the fact that (E^c)_x = (E_x)^c and that ℬ is a σ-algebra.

(iii) If E_i ∈ Ω, then E = ⋃_{i=1}^∞ E_i ∈ Ω. For (iii), observe that E_x = ⋃_{i=1}^∞ (E_i)_x, where (E_i)_x ∈ ℬ. Once again, the fact that ℬ is a σ-algebra shows that E ∈ Ω.

(i)–(iii) show that Ω is a σ-algebra, and the theorem follows.
We next show that the projections of measurable functions are measurable. Let f : X × Y → ℝ. For x ∈ X, define f_x : Y → ℝ by f_x(y) = f(x, y), with a similar definition for f^y.

In the case when we have several σ-algebras it will be important to clearly distinguish measurability relative to each one of these σ-algebras. We shall use the notation f ∈ σ(F) to mean that the function f is measurable relative to the σ-algebra F.

Theorem 1.2. Suppose f ∈ σ(𝒜 × ℬ). Then

(i) For each x ∈ X, f_x ∈ σ(ℬ).

(ii) For each y ∈ Y, f^y ∈ σ(𝒜).

Proof. Let V be an open set in ℝ. We need to show that f_x^{-1}(V) ∈ ℬ. Put

    Q = f^{-1}(V) = {(x, y) : f(x, y) ∈ V}.

Since f ∈ σ(𝒜 × ℬ), Q ∈ 𝒜 × ℬ. However,

    Q_x = f_x^{-1}(V) = {y : f_x(y) ∈ V},

and it follows by Theorem 1.1 that Q_x ∈ ℬ and hence f_x ∈ σ(ℬ). The same argument proves (ii).
Definition 1.2. A monotone class ℳ is a collection of sets which is closed under increasing unions and decreasing intersections. That is:

(i) If A₁ ⊂ A₂ ⊂ … and A_i ∈ ℳ, then ⋃ A_i ∈ ℳ;

(ii) If B₁ ⊃ B₂ ⊃ … and B_i ∈ ℳ, then ⋂ B_i ∈ ℳ.
Lemma 1.1 (Monotone Class Theorem). Let F₀ be an algebra of subsets of X and let ℳ be a monotone class containing F₀. If F denotes the σ-algebra generated by F₀, then F ⊂ ℳ.

Proof. Let ℳ₀ be the smallest monotone class containing F₀. That is, ℳ₀ is the intersection of all the monotone classes which contain F₀. It is enough to show that F ⊂ ℳ₀. By Exercise 1.2 below, we only need to prove that ℳ₀ is an algebra. First we prove that ℳ₀ is closed under complementation. For this, let Ω = {E : E^c ∈ ℳ₀}. It follows from the fact that ℳ₀ is a monotone class that Ω is also a monotone class, and since F₀ is an algebra, if E ∈ F₀ then E ∈ Ω. Thus ℳ₀ ⊂ Ω, and this proves it.

Next, let Ω₁ = {E : E ∪ F ∈ ℳ₀ for all F ∈ F₀}. Again the fact that ℳ₀ is a monotone class implies that Ω₁ is also a monotone class, and since clearly F₀ ⊂ Ω₁, we have ℳ₀ ⊂ Ω₁. Define Ω₂ = {F : F ∪ E ∈ ℳ₀ for all E ∈ ℳ₀}. Again, Ω₂ is a monotone class. Let F ∈ F₀. Since ℳ₀ ⊂ Ω₁, if E ∈ ℳ₀, then E ∪ F ∈ ℳ₀. Thus F₀ ⊂ Ω₂ and hence ℳ₀ ⊂ Ω₂. Thus, if E, F ∈ ℳ₀ then E ∪ F ∈ ℳ₀. This shows that ℳ₀ is an algebra and completes the proof.
Exercise 1.2. Prove that an algebra F is a σ-algebra if and only if it is a monotone class.

Exercise 1.3. Let F₀ be an algebra and suppose the two measures μ₁ and μ₂ agree on F₀. Prove that they agree on the σ-algebra F generated by F₀.
2 Fubini's Theorem.

We begin this section with a lemma that will allow us to define the product of two measures.

Lemma 2.1. Let (X, 𝒜, μ) and (Y, ℬ, ν) be two σ-finite measure spaces. Suppose Q ∈ 𝒜 × ℬ. If

    φ(x) = ν(Q_x) and ψ(y) = μ(Q^y),

then

    φ ∈ σ(𝒜) and ψ ∈ σ(ℬ),

and

    ∫_X φ(x) dμ(x) = ∫_Y ψ(y) dν(y).  (2.1)

Remark 2.1. With the notation of §1 we can write

    ν(Q_x) = ∫_Y 1_Q(x, y) dν(y)  (2.2)

and

    μ(Q^y) = ∫_X 1_Q(x, y) dμ(x).  (2.3)

Thus (2.1) is equivalent to

    ∫_X ∫_Y 1_Q(x, y) dν(y) dμ(x) = ∫_Y ∫_X 1_Q(x, y) dμ(x) dν(y).

Remark 2.2. Lemma 2.1 allows us to define a new measure on 𝒜 × ℬ by

    (μ × ν)(Q) = ∫_X ν(Q_x) dμ(x) = ∫_Y μ(Q^y) dν(y).  (2.4)

To see that this is indeed a measure, let {Q_j} be a disjoint sequence of sets in 𝒜 × ℬ. Recalling that (⋃ Q_j)_x = ⋃ (Q_j)_x and using the fact that ν is a measure, we have

    (μ × ν)(⋃_{j=1}^∞ Q_j) = ∫_X ν((⋃_{j=1}^∞ Q_j)_x) dμ(x)
                          = ∫_X ν(⋃_{j=1}^∞ (Q_j)_x) dμ(x)
                          = ∫_X Σ_{j=1}^∞ ν((Q_j)_x) dμ(x)
                          = Σ_{j=1}^∞ ∫_X ν((Q_j)_x) dμ(x)
                          = Σ_{j=1}^∞ (μ × ν)(Q_j),

where the second to last equality follows from the Monotone Convergence Theorem.
Proof of Lemma 2.1. We first assume μ(X) < ∞ and ν(Y) < ∞. Let ℳ be the collection of all Q ∈ 𝒜 × ℬ for which the conclusion of the lemma is true. We will prove that ℳ is a monotone class which contains the elementary sets; ℰ ⊂ ℳ. By Exercise 1.1 and the Monotone Class Theorem, this will show that ℳ = 𝒜 × ℬ. This will be done in several stages. First we prove that rectangles are in ℳ. That is,

(i) Let Q = A × B, A ∈ 𝒜, B ∈ ℬ. Then Q ∈ ℳ.

To prove (i), observe that Q_x = B if x ∈ A and Q_x = ∅ if x ∉ A. Thus

    φ(x) = ν(B) if x ∈ A, and φ(x) = 0 if x ∉ A;

that is, φ(x) = 1_A(x)ν(B), and clearly φ ∈ σ(𝒜). Similarly, ψ(y) = 1_B(y)μ(A) ∈ σ(ℬ). Integrating, we obtain

    ∫_X φ(x) dμ(x) = μ(A)ν(B) = ∫_Y ψ(y) dν(y),

proving (i).

(ii) Let Q₁ ⊂ Q₂ ⊂ …, Q_j ∈ ℳ. Then Q = ⋃_{j=1}^∞ Q_j ∈ ℳ.

To prove this, let

    φ_n(x) = ν((Q_n)_x) = ν((⋃_{j=1}^n Q_j)_x)

and

    ψ_n(y) = μ(Q_n^y) = μ((⋃_{j=1}^n Q_j)^y).

Then

    φ_n(x) ↑ φ(x) = ν(Q_x) and ψ_n(y) ↑ ψ(y) = μ(Q^y).

Since φ_n ∈ σ(𝒜) and ψ_n ∈ σ(ℬ), we have φ ∈ σ(𝒜) and ψ ∈ σ(ℬ). Also, by assumption,

    ∫_X φ_n(x) dμ(x) = ∫_Y ψ_n(y) dν(y)

for all n. By the Monotone Convergence Theorem,

    ∫_X φ(x) dμ(x) = ∫_Y ψ(y) dν(y),

and we have proved (ii).

(iii) Let Q₁ ⊃ Q₂ ⊃ …, Q_j ∈ ℳ. Then Q = ⋂_{j=1}^∞ Q_j ∈ ℳ.

The proof of this is the same as (ii), except this time we use the Dominated Convergence Theorem. That is, this time the sequences φ_n(x) = ν((Q_n)_x) and ψ_n(y) = μ(Q_n^y) are both decreasing to φ(x) = ν(Q_x) and ψ(y) = μ(Q^y), respectively, and since both measures are finite, both sequences of functions are uniformly bounded.

(iv) Let Q_i ∈ ℳ with Q_i ∩ Q_j = ∅ for i ≠ j. Then ⋃_{i=1}^∞ Q_i ∈ ℳ.

For the proof of this, let Q̃_n = ⋃_{i=1}^n Q_i. Then Q̃_n ∈ ℳ, since the sets are disjoint. However, the Q̃_n's are increasing, and it follows from (ii) that their union is in ℳ, proving (iv).

It follows from (i)–(iv) that ℳ is a monotone class containing the elementary sets ℰ. By the Monotone Class Theorem and Exercise 1.1, 𝒜 × ℬ = σ(ℰ) = ℳ. This proves the lemma for finite measures, and the following exercise does the rest.

Exercise 2.1. Extend the proof of Lemma 2.1 to the case of σ-finite measures.
Theorem 2.1 (Fubini's Theorem). Let (X, 𝒜, μ) and (Y, ℬ, ν) be σ-finite measure spaces. Let f ∈ σ(𝒜 × ℬ).

(a) (Tonelli) If f is nonnegative and if

    φ(x) = ∫_Y f_x(y) dν(y), ψ(y) = ∫_X f^y(x) dμ(x),  (2.5)

then φ ∈ σ(𝒜), ψ ∈ σ(ℬ), and

    ∫_X φ(x) dμ(x) = ∫_{X×Y} f(x, y) d(μ × ν) = ∫_Y ψ(y) dν(y).  (2.6)

(b) If f is complex valued and

    φ*(x) = ∫_Y |f|_x(y) dν(y) = ∫_Y |f(x, y)| dν(y)

satisfies

    ∫_X φ*(x) dμ(x) < ∞,

then f ∈ L¹(μ × ν) and (2.6) holds. A similar statement holds with y in place of x.

(c) If f ∈ L¹(μ × ν), then f_x ∈ L¹(ν) for a.e. x ∈ X, f^y ∈ L¹(μ) for a.e. y ∈ Y, the functions defined in (2.5) are measurable, and (2.6) holds.

Proof of (a). If f = 1_Q, Q ∈ 𝒜 × ℬ, the result follows from Lemma 2.1. By linearity we also have the result for simple functions. Let 0 ≤ s₁ ≤ s₂ ≤ … be nonnegative simple functions such that s_n(x, y) ↑ f(x, y) for every (x, y) ∈ X × Y. Let

    φ_n(x) = ∫_Y (s_n)_x(y) dν(y) and ψ_n(y) = ∫_X s_n^y(x) dμ(x).

Then

    ∫_X φ_n(x) dμ(x) = ∫_{X×Y} s_n(x, y) d(μ × ν) = ∫_Y ψ_n(y) dν(y).

Since s_n(x, y) ↑ f(x, y) for every (x, y) ∈ X × Y, φ_n(x) ↑ φ(x) and ψ_n(y) ↑ ψ(y). The Monotone Convergence Theorem implies the result. Parts (b) and (c) follow directly from (a), and we leave these as exercises.
The assumption of σ-finiteness is needed, as the following example shows.

Example 2.1. Let X = Y = [0, 1] with μ = the Lebesgue measure and ν = the counting measure. Let f(x, y) = 1 if x = y and f(x, y) = 0 if x ≠ y. That is, the function f is the characteristic function of the diagonal of the square. Then

    ∫_X f(x, y) dμ(x) = 0, and ∫_Y f(x, y) dν(y) = 1.

Remark 2.3. Before we can integrate the function f in this example, however, we need to verify that it (and hence its projections) is (are) measurable. This can be seen as follows. Set

    I_j = [(j - 1)/n, j/n]

and

    Q_n = (I₁ × I₁) ∪ (I₂ × I₂) ∪ … ∪ (I_n × I_n).

Then Q_n is measurable and so is Q = ⋂ Q_n, and hence also f.
Example 2.2. Consider the function

    f(x, y) = (x² - y²)/(x² + y²)²

on (0, 1) × (0, 1), with μ = ν = the Lebesgue measure. Then

    ∫_0^1 ∫_0^1 f(x, y) dy dx = π/4

but

    ∫_0^1 ∫_0^1 f(x, y) dx dy = -π/4.

The problem here is that f ∉ L¹((0, 1) × (0, 1)), since

    ∫_0^1 |f(x, y)| dy ≥ 1/(2x).
Let m_k = Lebesgue measure on ℝ^k and recall that m_k is complete. That is, if m_k(E) = 0, then every subset of E is Lebesgue measurable. However, m₁ × m₁ is not complete, since {x} × B, for any set B ⊂ ℝ, has m₁ × m₁ measure zero. Thus m₂ ≠ m₁ × m₁. What is needed here is the notion of the completion of a measure. We leave the proofs of the first two theorems as exercises.

Theorem 2.2. If (X, F, μ) is a measure space, we let

    F* = {E ⊂ X : ∃ A, B ∈ F with A ⊂ E ⊂ B and μ(B∖A) = 0}.

Then F* is a σ-algebra, and the function μ* defined on F* by

    μ*(E) = μ(A)

is a measure. The measure space (X, F*, μ*) is complete. This new space is called the completion of (X, F, μ).

Theorem 2.3. Let m_n be the Lebesgue measure on ℝⁿ, n = r + s. Then m_n = (m_r × m_s)*, the completion of the product of the Lebesgue measures m_r and m_s.
The next theorem says that, as far as Fubini's theorem is concerned, we need not worry about incomplete measure spaces.

Theorem 2.4. Let (X, 𝒜, μ) and (Y, ℬ, ν) be two complete σ-finite measure spaces. Theorem 2.1 remains valid if μ × ν is replaced by (μ × ν)*, except that the functions φ and ψ are defined only almost everywhere relative to the measures μ and ν, respectively.

Proof. The proof of this theorem follows from the following two facts.

(i) Let (X, F, μ) be a measure space. Suppose f ∈ σ(F*). Then there is a g ∈ σ(F) such that f = g a.e. with respect to μ.

(ii) Let (X, 𝒜, μ) and (Y, ℬ, ν) be two complete and σ-finite measure spaces. Suppose f ∈ σ((𝒜 × ℬ)*) is such that f = 0 almost everywhere with respect to μ × ν. Then for almost every x ∈ X with respect to μ, f_x = 0 a.e. with respect to ν. In particular, f_x ∈ σ(ℬ) for almost every x ∈ X. A similar statement holds with y replacing x.

Let us assume (i) and (ii) for the moment. If f ∈ σ((𝒜 × ℬ)*) is nonnegative, there is a g ∈ σ(𝒜 × ℬ) such that f = g a.e. with respect to μ × ν. Now apply Theorem 2.1 to g, and the rest follows from (ii).

It remains to prove (i) and (ii). For (i), suppose that f = 1_E where E ∈ F*. By definition, A ⊂ E ⊂ B with μ(B∖A) = 0 and A, B ∈ F. If we set g = 1_A, we have f = g a.e. with respect to μ, and we have proved (i) for characteristic functions. We now extend this to simple functions and to nonnegative functions in the usual way; details are left to the reader. For (ii), let Λ = {(x, y) : f(x, y) ≠ 0}. Then Λ ∈ (𝒜 × ℬ)* and (μ × ν)(Λ) = 0. By definition there is a Λ̃ ∈ 𝒜 × ℬ such that Λ ⊂ Λ̃ and (μ × ν)(Λ̃) = 0. By Theorem 2.1,

    ∫_X ν(Λ̃_x) dμ(x) = 0,

and so ν(Λ̃_x) = 0 for almost every x with respect to μ. Since Λ_x ⊂ Λ̃_x and the space (Y, ℬ, ν) is complete, we see that Λ_x ∈ ℬ for almost every x ∈ X with respect to the measure μ. Thus for almost every x ∈ X the projection function f_x ∈ σ(ℬ) and f_x(y) = 0 almost everywhere with respect to ν. This completes the proof of (ii) and hence the theorem.
Exercise 2.3. Let f be a nonnegative measurable function on (X, F, μ). Prove that for any 0 < p < ∞,

    ∫_X f(x)^p dμ(x) = p ∫_0^∞ λ^{p-1} μ{x ∈ X : f(x) > λ} dλ.

Exercise 2.4. Let (X, F, μ) be a measure space. Suppose f and g are two nonnegative functions satisfying the following inequality: There exists a constant C such that for all λ > 0 and δ > 0,

    μ{x ∈ X : f(x) > 2λ, g(x) ≤ δλ} ≤ Cδ² μ{x ∈ X : f(x) > λ}.

Prove that

    ∫_X f(x)^p dμ ≤ C_p ∫_X g(x)^p dμ

for any 0 < p < ∞ for which both integrals are finite, where C_p is a constant depending on C and p.

Exercise 2.5. For any λ ∈ ℝ define

    sign(λ) = 1 if λ > 0, 0 if λ = 0, and -1 if λ < 0.

Prove that

    0 ≤ sign(λ) ∫_0^y (sin(λx)/x) dx ≤ ∫_0^π (sin(x)/x) dx  (2.7)

for all y > 0, and that

    ∫_0^∞ (sin(λx)/x) dx = (π/2) sign(λ)  (2.8)

and

    ∫_0^∞ ((1 - cos(λx))/x²) dx = (π/2)|λ|.  (2.9)
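Identity (2.9) converges absolutely, so it can be checked by direct numerical quadrature. The sketch below (not part of the notes; the midpoint rule, truncation point x = 800, and step count are arbitrary choices, and the truncation error of roughly 1/800 dictates the loose tolerance):

```python
import math

# Numerical check of (2.9): the integral over (0, infinity) of
# (1 - cos(lambda x))/x^2 equals (pi/2)|lambda|.
def lhs(lam, upper=800.0, steps=400_000):
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h           # midpoint rule avoids x = 0
        total += (1.0 - math.cos(lam * x)) / (x * x) * h
    return total

for lam in (1.0, 2.0):
    assert abs(lhs(lam) - math.pi * abs(lam) / 2.0) < 0.01
```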
Exercise 2.6. Prove that

    e^{-λ} = (2/π) ∫_0^∞ (cos(λs)/(1 + s²)) ds  (2.10)

for all λ > 0. Use (2.10), the fact that

    1/(1 + s²) = ∫_0^∞ e^{-(1+s²)t} dt,

and Fubini's theorem to prove that

    e^{-λ} = (1/√π) ∫_0^∞ (e^{-t}/√t) e^{-λ²/4t} dt.  (2.11)
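Identity (2.11) is also easy to check numerically. Substituting t = u² removes the 1/√t singularity and turns the right-hand side into (2/√π) ∫_0^∞ exp(-u² - λ²/(4u²)) du; the quadrature limits and step count below are arbitrary choices (not part of the notes):

```python
import math

# Numerical check of the subordination formula (2.11), after t = u^2:
#   e^(-lam) = (2/sqrt(pi)) * int_0^inf exp(-u^2 - lam^2/(4 u^2)) du.
def rhs(lam, upper=8.0, steps=100_000):
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h           # midpoint rule; integrand -> 0 as u -> 0
        total += math.exp(-u * u - lam * lam / (4.0 * u * u)) * h
    return 2.0 / math.sqrt(math.pi) * total

for lam in (0.5, 1.0, 2.0):
    assert abs(rhs(lam) - math.exp(-lam)) < 1e-6
```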
Exercise 2.7. Let S^{n-1} = {x ∈ ℝⁿ : |x| = 1} and for any Borel set E ⊂ S^{n-1} set Ẽ = {rθ : 0 < r < 1, θ ∈ E}. Define the measure σ on S^{n-1} by σ(E) = n|Ẽ|. Notice that with this definition the surface area ω_{n-1} of the sphere in ℝⁿ satisfies ω_{n-1} = nν_n = 2π^{n/2}/Γ(n/2), where ν_n is the volume of the unit ball in ℝⁿ. Prove (integration in polar coordinates) that for all nonnegative Borel functions f on ℝⁿ,

    ∫_{ℝⁿ} f(x) dx = ∫_0^∞ r^{n-1} (∫_{S^{n-1}} f(rθ) dσ(θ)) dr.

In particular, if f is a radial function, that is, f(x) = f(|x|), then

    ∫_{ℝⁿ} f(x) dx = (2π^{n/2}/Γ(n/2)) ∫_0^∞ r^{n-1} f(r) dr = nν_n ∫_0^∞ r^{n-1} f(r) dr.
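The radial formula can be verified on the Gaussian f(x) = e^{-|x|²}, whose integral over ℝⁿ is π^{n/2}. A short Python check (the quadrature parameters are ad hoc choices, not part of the notes):

```python
import math

# Check the radial-integration formula of Exercise 2.7 on f(x) = exp(-|x|^2):
# omega_{n-1} * int_0^inf r^(n-1) e^(-r^2) dr should equal pi^(n/2), where
# omega_{n-1} = 2*pi^(n/2)/Gamma(n/2) is the surface area of S^{n-1}.
def radial_integral(n, upper=10.0, steps=200_000):
    omega = 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** (n - 1) * math.exp(-r * r) * h
    return omega * total

for n in (1, 2, 3, 5):
    assert abs(radial_integral(n) - math.pi ** (n / 2.0)) < 1e-6
```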
Exercise 2.8. Prove that for any x ∈ ℝⁿ and any 0 < p < ∞,

    ∫_{S^{n-1}} |θ·x|^p dσ(θ) = |x|^p ∫_{S^{n-1}} |θ₁|^p dσ(θ),

where θ·x = θ₁x₁ + … + θ_n x_n is the inner product in ℝⁿ.
Exercise 2.9. Let e₁ = (1, 0, …, 0) and for any θ ∈ S^{n-1} define 0 ≤ α ≤ π such that θ·e₁ = cos α. Prove, by first integrating over L_α = {θ ∈ S^{n-1} : θ·e₁ = cos α}, that for any 1 ≤ p < ∞,

    ∫_{S^{n-1}} |θ₁|^p dσ(θ) = ω_{n-2} ∫_0^π |cos α|^p (sin α)^{n-2} dα.  (2.12)

Use (2.12) and the fact that for any r > 0 and s > 0,

    2 ∫_0^{π/2} (cos α)^{2r-1} (sin α)^{2s-1} dα = Γ(s)Γ(r)/Γ(r + s)

([Ru1, p. 194]) to prove that for any 1 ≤ p < ∞,

    ∫_{S^{n-1}} |θ₁|^p dσ(θ) = 2π^{(n-1)/2} Γ((p+1)/2)/Γ((n+p)/2).  (2.13)
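Formulas (2.12) and (2.13) can be checked against each other numerically, using math.gamma for the Γ factors (the quadrature step count is an arbitrary choice; this check is not part of the notes):

```python
import math

# Check that the integral in (2.12) reproduces the closed form (2.13):
# omega_{n-2} * int_0^pi |cos a|^p (sin a)^(n-2) da
#   = 2 * pi^((n-1)/2) * Gamma((p+1)/2) / Gamma((n+p)/2).
def via_2_12(n, p, steps=100_000):
    omega = 2.0 * math.pi ** ((n - 1) / 2.0) / math.gamma((n - 1) / 2.0)
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        total += abs(math.cos(a)) ** p * math.sin(a) ** (n - 2) * h
    return omega * total

def via_2_13(n, p):
    return (2.0 * math.pi ** ((n - 1) / 2.0)
            * math.gamma((p + 1) / 2.0) / math.gamma((n + p) / 2.0))

for n in (2, 3, 4):
    for p in (1.0, 2.0, 3.5):
        assert abs(via_2_12(n, p) - via_2_13(n, p)) < 1e-4
```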
IV

RANDOM VARIABLES

1 Some Basics.

From this point on, (Ω, F, P) will denote a probability space. X : Ω → ℝ is a random variable if X is measurable relative to F. We will use the notation

    E(X) = ∫_Ω X dP.

E(X) is called the expected value of X, or expectation of X. We recall from an earlier problem that if X is a random variable, then μ(A) = μ_X(A) = P{X ∈ A}, A ∈ B(ℝ), is a probability measure on (ℝ, B(ℝ)). This measure is called the distribution measure of the random variable X. Two random variables X, Y are equally distributed if μ_X = μ_Y. This is often written as X =^d Y or X ∼ Y.

If we take the set A = (-∞, x], for any x ∈ ℝ, then

    μ_X((-∞, x]) = P{X ≤ x} = F_X(x)

defines a distribution function, as we saw in Chapter I. We list some additional properties of this distribution function, given the fact that μ_X(ℝ) = 1 and that it arises from the random variable X.

(i) F_X(b) - F_X(a) = μ(a, b].

(ii) lim_{x→∞} F_X(x) = 1, lim_{x→-∞} F_X(x) = 0.

(iii) With F_X(x-) = lim_{y↑x} F_X(y), we see that F_X(x-) = P(X < x).

(iv) P{X = x} = μ_X{x} = F_X(x) - F_X(x-).

It follows from (iv) that F is continuous at x ∈ ℝ if and only if x is not an atom of the measure μ_X. That is, if and only if μ_X{x} = 0. As we saw in Chapter I, distribution functions are in a one-to-one correspondence with the probability measures on (ℝ, B). Also, as we have just seen, every random variable gives rise to a distribution function. The following theorem completes this circle.
Theorem 1.1. Suppose F is a distribution function. Then there is a probability space (Ω, F, P) and a random variable X defined on this space such that F = F_X.

Proof. We take (Ω, F, P) with Ω = (0, 1), F = the Borel sets, and P the Lebesgue measure. For each ω ∈ Ω, define

    X(ω) = sup{y : F(y) < ω}.

We claim this is the desired random variable. Suppose we can show that for each x ∈ ℝ,

    {ω ∈ Ω : X(ω) ≤ x} = {ω ∈ Ω : ω ≤ F(x)}.  (1.1)

Clearly then X is measurable and also P{X(ω) ≤ x} = F(x), proving that F = F_X. To prove (1.1), let ω₀ ∈ {ω ∈ Ω : ω ≤ F(x)}. That is, ω₀ ≤ F(x). Then x ∉ {y : F(y) < ω₀} and therefore X(ω₀) ≤ x. Thus {ω : ω ≤ F(x)} ⊂ {ω : X(ω) ≤ x}.

On the other hand, suppose ω₀ > F(x). Since F is right continuous, there exists ε > 0 such that F(x + ε) < ω₀. Hence X(ω₀) ≥ x + ε > x. This shows that ω₀ ∉ {ω : X(ω) ≤ x} and concludes the proof.
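The construction in the proof of Theorem 1.1 is the basis of what is often called inverse transform sampling. A Python sketch, specialized to the exponential distribution function F(y) = 1 - e^{-y} for y ≥ 0 (this particular F and the sample size are illustrative choices, not part of the notes): here X(ω) = sup{y : F(y) < ω} = -log(1 - ω), and feeding in uniform ω produces samples with distribution function F.

```python
import math
import random

# Theorem 1.1 for F(y) = 1 - e^(-y), y >= 0 (and F = 0 for y < 0):
# X(w) = sup{y : F(y) < w} = -log(1 - w), and with w uniform on (0, 1)
# the random variable X has distribution function F.
random.seed(7)

def X(w):
    return -math.log(1.0 - w)

samples = [X(random.random()) for _ in range(200_000)]

# Empirical check that P{X <= x} is close to F(x) = 1 - e^(-x):
for x in (0.5, 1.0, 2.0):
    empirical = sum(1 for s in samples if s <= x) / len(samples)
    assert abs(empirical - (1.0 - math.exp(-x))) < 0.01
```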
Theorem 1.2. Suppose X is a random variable and let G : ℝ → ℝ be Borel measurable. Suppose in addition that G is nonnegative or that E|G(X)| < ∞. Then

    ∫_Ω G(X(ω)) dP(ω) = E(G(X)) = ∫_ℝ G(y) dμ_X(y).  (1.2)

Proof. Let B ∈ B(ℝ). Then

    E(1_B(X(ω))) = P{ω ∈ Ω : X ∈ B} = μ_X(B) = ∫_B dμ_X = ∫_ℝ 1_B(y) dμ_X(y).

Thus the result holds for indicator functions. By linearity, it holds for simple functions. Now suppose G is nonnegative. Let φ_n be a sequence of nonnegative simple functions converging pointwise up to G. By the Monotone Convergence Theorem,

    E(G(X(ω))) = ∫_ℝ G(x) dμ_X(x).

If E|G(X)| < ∞, write

    G(X(ω)) = G⁺(X(ω)) - G⁻(X(ω)).

Apply the result for nonnegative functions to G⁺ and G⁻ and subtract the two, using the fact that E|G(X)| < ∞.
More generally, let X₁, X₂, …, X_n be n random variables and define their joint distribution by

    μ_n(A) = P{(X₁, X₂, …, X_n) ∈ A}, A ∈ B(ℝⁿ).

μ_n is then a Borel probability measure on (ℝⁿ, B(ℝⁿ)). As before, if G : ℝⁿ → ℝ is Borel measurable and either nonnegative or E|G(X₁, X₂, …, X_n)| < ∞, then

    E(G(X₁(ω), X₂(ω), …, X_n(ω))) = ∫_{ℝⁿ} G(x₁, x₂, …, x_n) dμ_n(x₁, …, x_n).
The quantity EX^p, for 1 ≤ p < ∞, is called the pth moment of the random variable X. The first moment m = EX is called the mean of X, and the variance is defined by var(X) = E|X − m|². Note that by expanding this quantity we can write

var(X) = EX² − 2(EX)² + (EX)² = EX² − (EX)².

If we take the function G(x) = x^p, then we can write the pth moment in terms of the distribution as

EX^p = ∫_ℝ x^p dμ_X,

and with G(x) = (x − m)² we can write the variance as

var(X) = ∫_ℝ (x − m)² dμ_X = ∫_ℝ x² dμ_X − m².
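The identity var(X) = EX² − (EX)² is easy to check directly on any distribution; here is a small sketch with a discrete law of our own choosing (the values and probabilities are illustrative, not from the notes):

```python
# Discrete distribution: P(X = x) for a few illustrative values.
pmf = {0: 0.2, 1: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())                    # EX = m
second_moment = sum(x**2 * p for x, p in pmf.items())        # EX^2
var_expanded = second_moment - mean**2                       # EX^2 - (EX)^2
var_direct = sum((x - mean)**2 * p for x, p in pmf.items())  # E|X - m|^2
print(mean, var_direct, var_expanded)
```

Both expressions for the variance agree, as the expansion above predicts.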
Now, recall that if f is a nonnegative measurable function on (Ω, F, P) then

ν(A) = ∫_A f dP

defines a new measure on (Ω, F) and

∫_Ω g dν = ∫_Ω g f dP.  (1.3)

In particular, suppose f is a nonnegative Borel measurable function on ℝ with

∫_ℝ f(x) dx = 1,

where here and for the rest of these notes we simply write dx in place of dm when m is the Lebesgue measure. Then

F(x) = ∫_{−∞}^x f(t) dt

is a distribution function. Hence if μ(A) = ∫_A f dt, A ∈ B(ℝ), then μ is a probability measure, and since

μ(a, b] = ∫_a^b f(t) dt = F(b) − F(a)

for all intervals (a, b], we see that F is the distribution function of μ (by the construction in Chapter I). Let X be a random variable with this distribution function. Then by (1.3) and Theorem 1.2,

E(g(X)) = ∫_ℝ g(x) dμ(x) = ∫_ℝ g(x) f(x) dx.  (1.4)

Distributions arising from such f's are called absolutely continuous distributions. We shall now give several classical examples of such distributions. The function f is called the density of the random variable associated with the distribution.
Example 1.1. The uniform distribution on (0, 1):

f(x) = 1 for x ∈ (0, 1), and f(x) = 0 for x ∉ (0, 1).

Then

F(x) = 0 for x ≤ 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for x ≥ 1.

If we take a random variable with this distribution we find that the mean is m = 1/2 and the variance is var(X) = 1/12.
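The mean and variance quoted for the uniform law can be recovered numerically from (1.4) with g(x) = x and g(x) = x²; a quick sketch using a plain midpoint Riemann sum (the step size is our choice):

```python
N = 200_000
h = 1.0 / N
mean = 0.0
second = 0.0
# Midpoint rule on (0,1): E g(X) = integral of g(x) * f(x) dx with f = 1.
for i in range(N):
    x = (i + 0.5) * h
    mean += x * h
    second += x * x * h
var = second - mean**2
print(mean, var)
```

The output approximates m = 1/2 and var(X) = 1/12 to many digits.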
Example 1.2. The exponential distribution of parameter λ. Let λ > 0 and set

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 otherwise.

If X is a random variable associated to this density, we write X ~ exp(λ). Its moments are

EX^k = ∫_0^∞ x^k λe^{−λx} dx = k!/λ^k.
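The moment formula EX^k = k!/λ^k is easy to sanity-check by numerical integration; a sketch with λ = 2 and k = 3 (the parameters are our choice):

```python
import math

lam, k = 2.0, 3
N, upper = 200_000, 40.0   # integrate x^k * lam * e^{-lam x} over [0, upper]
h = upper / N
integral = 0.0
for i in range(N):
    x = (i + 0.5) * h      # midpoint rule
    integral += x**k * lam * math.exp(-lam * x) * h
print(integral, math.factorial(k) / lam**k)
```

Both numbers should agree with k!/λ^k = 3!/2³ = 0.75 up to the quadrature error.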
Example 1.3. The Cauchy distribution of parameter a. Set

f(x) = (1/π) · a/(a² + x²).

We leave it to the reader to verify that if the random variable X has this distribution then E|X| = ∞.
Example 1.4. The normal distribution. Set

f(x) = (1/√(2π)) e^{−x²/2}.

The corresponding random variable is called standard normal, and we write X ~ N(0, 1). By symmetry,

E(X) = (1/√(2π)) ∫_ℝ x e^{−x²/2} dx = 0.

To compute the variance, let us recall first that for any α > 0,

Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt.

We note that, with the substitution u = x²/2,

∫_ℝ x² e^{−x²/2} dx = 2∫_0^∞ x² e^{−x²/2} dx = 2√2 ∫_0^∞ u^{1/2} e^{−u} du = 2√2 Γ(3/2) = 2√2 · (1/2)Γ(1/2) = √2 · √π = √(2π),

and hence var(X) = 1. If we take σ > 0 and μ ∈ ℝ, and set

f(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)},

we get the normal distribution with mean μ and variance σ², and write X ~ N(μ, σ²). For this we have EX = μ and var(X) = σ².
Example 1.5. The gamma distribution arises from

f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} for x ≥ 0, and f(x) = 0 for x < 0.

We write X ~ Γ(α, λ) when the random variable X has this density.
Random variables which take only discrete values are appropriately called discrete random variables. Here are some examples.

Example 1.6. X is a Bernoulli random variable with parameter p, 0 < p < 1, if X takes only two values, one with probability p and the other with probability 1 − p:

P(X = 1) = p and P(X = 0) = 1 − p.

For this random variable we have

EX = p·1 + (1 − p)·0 = p, EX² = 1²·p = p,

and

var(X) = p − p² = p(1 − p).
Example 1.7. We say X has the Poisson distribution of parameter λ > 0 if

P{X = k} = e^{−λ} λ^k/k!, k = 0, 1, 2, …

For this random variable,

EX = ∑_{k=0}^∞ k e^{−λ} λ^k/k! = λe^{−λ} ∑_{k=1}^∞ λ^{k−1}/(k−1)! = λ

and

var(X) = EX² − λ² = ∑_{k=0}^∞ k² e^{−λ} λ^k/k! − λ² = λ.
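Both identities EX = λ and var(X) = λ for the Poisson law can be confirmed by truncating the series at a large K; a small sketch with λ = 3 (our choice), using the recursion P(X = k+1) = P(X = k)·λ/(k+1) to avoid huge factorials:

```python
import math

lam, K = 3.0, 200
p = math.exp(-lam)            # P(X = 0)
mean = second = 0.0
for k in range(K):
    mean += k * p
    second += k * k * p
    p *= lam / (k + 1)        # P(X = k+1) from P(X = k)
var = second - mean**2
print(mean, var)
```

The truncated sums reproduce EX = 3 and var(X) = 3 to machine precision, since the tail beyond k = 200 is negligible.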
Example 1.8. The geometric distribution of parameter p. For 0 < p < 1 define

P{N = k} = p(1 − p)^{k−1}, k = 1, 2, …

The random variable N represents the number of independent trials needed to observe an event which has probability p. By the geometric series,

∑_{k=0}^∞ (1 − p)^k = 1/p,

so these probabilities indeed sum to one, and we leave it to the reader to verify that

EN = 1/p and var(N) = (1 − p)/p².
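The claimed mean and variance of the geometric law are easy to verify by summing the series far enough out; a sketch with p = 0.3 (our choice):

```python
p, K = 0.3, 400          # truncate the series; (1-p)^K is negligible
mean = second = 0.0
prob = p                 # P(N = 1) = p
for k in range(1, K + 1):
    mean += k * prob
    second += k * k * prob
    prob *= (1 - p)      # P(N = k+1) = p (1-p)^k
var = second - mean**2
print(mean, var)
```

The output matches EN = 1/p ≈ 3.333 and var(N) = (1 − p)/p² ≈ 7.778.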
2 Independence.

Definition 2.1.

(i) The collection F₁, F₂, …, F_n of σ-algebras is said to be independent if whenever A₁ ∈ F₁, A₂ ∈ F₂, …, A_n ∈ F_n, then

P(∩_{j=1}^n A_j) = ∏_{j=1}^n P(A_j).

(ii) A collection {X_j : 1 ≤ j ≤ n} of random variables is said to be (totally) independent if for any collection {B_j : 1 ≤ j ≤ n} of Borel sets in ℝ,

P{X₁ ∈ B₁, X₂ ∈ B₂, …, X_n ∈ B_n} = P(∩_{j=1}^n {X_j ∈ B_j}) = ∏_{j=1}^n P{X_j ∈ B_j}.

(iii) The collection of measurable subsets A₁, A₂, …, A_n in a σ-algebra F is independent if for any subset I ⊂ {1, 2, …, n} we have

P(∩_{j∈I} A_j) = ∏_{j∈I} P{A_j}.
Whenever we have a sequence X₁, …, X_n of independent random variables with the same distribution, we say that the random variables are independent and identically distributed, and write this as i.i.d. We note that (iii) is equivalent to asking that the random variables 1_{A₁}, 1_{A₂}, …, 1_{A_n} be independent. Indeed, for one direction we take B_j = {1} for j ∈ I and B_j = ℝ for j ∉ I. For the other direction the reader is asked to do

Problem 2.1. Let A₁, A₂, …, A_n be independent. Prove that A₁ᶜ, A₂ᶜ, …, A_nᶜ and 1_{A₁}, 1_{A₂}, …, 1_{A_n} are independent.
Problem 2.2. Let X and Y be two random variables and set F₁ = σ(X) and F₂ = σ(Y). (Recall that the sigma algebra generated by the random variable X, denoted σ(X), is the sigma algebra generated by the sets X^{−1}(B), where B ranges over all Borel sets in ℝ.) Prove that X, Y are independent if and only if F₁, F₂ are independent.
Suppose X₁, X₂, …, X_n are independent and set

μ_n(B) = P{(X₁, …, X_n) ∈ B}, B ∈ B(ℝⁿ),

as in §1. Then with B = B₁ × ⋯ × B_n we see that

μ_n(B₁ × ⋯ × B_n) = ∏_{j=1}^n μ_j(B_j),

and hence

μ_n = μ₁ × ⋯ × μ_n,

where the right-hand side is the product measure constructed from μ₁, …, μ_n as in Chapter III. Thus for this probability measure on (ℝⁿ, B(ℝⁿ)), the corresponding n-dimensional distribution function is

F(x) = ∏_{j=1}^n F_{X_j}(x_j),

where x = (x₁, x₂, …, x_n).
Definition 2.2. Suppose 𝒫 ⊂ F. 𝒫 is a π-system if it is closed under intersections: A ∈ 𝒫, B ∈ 𝒫 ⇒ A ∩ B ∈ 𝒫. The subcollection ℒ ⊂ F is a λ-system if (i) Ω ∈ ℒ, (ii) A, B ∈ ℒ and A ⊂ B ⇒ B∖A ∈ ℒ, and (iii) A_n ∈ ℒ and A_n ↑ A ⇒ A ∈ ℒ.

Theorem 2.1. Suppose 𝒫 is a π-system, ℒ is a λ-system and 𝒫 ⊂ ℒ. Then σ(𝒫) ⊂ ℒ.

Theorem 2.2. Let μ and ν be two probability measures on (Ω, F). Suppose they agree on the π-system 𝒫 and that there is a sequence of sets A_n ∈ 𝒫 with A_n ↑ Ω. Then μ = ν on σ(𝒫).

Theorem 2.3. Suppose 𝒜₁, 𝒜₂, …, 𝒜_n are independent π-systems. Then σ(𝒜₁), σ(𝒜₂), …, σ(𝒜_n) are independent.

Corollary 2.1. The random variables X₁, X₂, …, X_n are independent if and only if for all x = (x₁, …, x_n), x_i ∈ (−∞, ∞],

F(x) = ∏_{j=1}^n F_{X_j}(x_j),  (2.1)

where F is the distribution function of the measure μ_n.

Proof. We have already seen that if the random variables are independent then the distribution function F satisfies (2.1). For the other direction, let 𝒜_i be the collection of sets of the form {X_i ≤ x_i}, x_i ∈ (−∞, ∞]. Since

{X_i ≤ x_i} ∩ {X_i ≤ y_i} = {X_i ≤ x_i ∧ y_i} ∈ 𝒜_i,

each collection 𝒜_i is a π-system, and σ(𝒜_i) = σ(X_i). By (2.1) the π-systems 𝒜₁, …, 𝒜_n are independent, so Theorem 2.3 gives the independence of σ(X₁), …, σ(X_n), hence of X₁, …, X_n.

Corollary 2.2. μ_n = μ₁ × ⋯ × μ_n.
Corollary 2.3. Let X₁, …, X_n be independent, with X_i ≥ 0 or E|X_i| < ∞ for each i. Then

E(∏_{j=1}^n X_j) = ∏_{i=1}^n E(X_i).

Proof. Applying Fubini's Theorem with f(x₁, …, x_n) = x₁ ⋯ x_n, we have

∫_{ℝⁿ} (x₁ ⋯ x_n) d(μ₁ × ⋯ × μ_n) = (∫_ℝ x₁ dμ₁(x₁)) ⋯ (∫_ℝ x_n dμ_n(x_n)).

The same argument shows that if X₁, …, X_n are independent and g ≥ 0 or E|∏_{j=1}^n g(X_j)| < ∞, then

E(∏_{i=1}^n g(X_i)) = ∏_{i=1}^n E(g(X_i)).

We warn the reader not to make any inferences in the opposite direction. It may happen that E(XY) = E(X)E(Y) and yet X and Y may not be independent. Take the two random variables X and Y with joint distribution given by the table

           Y = −1   Y = 0   Y = 1
  X = −1     0        a       0
  X =  0     b        c       b
  X =  1     0        a       0

with 2a + 2b + c = 1 and a, b, c > 0. Then XY ≡ 0, so E(XY) = 0, and by symmetry EX = EY = 0, so E(XY) = E(X)E(Y). However, the random variables are not independent. Why? Observe that P(X = 1, Y = 1) = 0, while P(X = 1)P(Y = 1) = ab ≠ 0.
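The table above is concrete enough to check by machine; a sketch with a = b = c = 0.2, one of many choices satisfying 2a + 2b + c = 1:

```python
a, b, c = 0.2, 0.2, 0.2        # 2a + 2b + c = 1
joint = {(-1, 0): a, (1, 0): a, (0, -1): b, (0, 1): b, (0, 0): c}

e_xy = sum(x * y * p for (x, y), p in joint.items())
e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_x1_y1 = joint.get((1, 1), 0.0)
print(e_xy, e_x * e_y, p_x1_y1, p_x1 * p_y1)
```

E(XY) = E(X)E(Y) = 0 holds, yet P(X = 1, Y = 1) = 0 differs from P(X = 1)P(Y = 1) = ab = 0.04, so X and Y are uncorrelated but not independent.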
Definition 2.3. If F and G are two distribution functions, we define their convolution by

F ∗ G(z) = ∫_ℝ F(z − y) dν(y),

where ν is the probability measure associated with G. The right-hand side is often also written as

∫_ℝ F(z − y) dG(y).

In these notes we will use both notations.

Theorem 2.4. If X and Y are independent with X ~ F and Y ~ G, then X + Y ~ F ∗ G.
Proof. Let us fix z ∈ ℝ. Define

h(x, y) = 1_{(x+y≤z)}(x, y).

Then

F_{X+Y}(z) = P{X + Y ≤ z}
  = E(h(X, Y))
  = ∫_{ℝ²} h(x, y) d(μ_X × μ_Y)(x, y)
  = ∫_ℝ (∫_ℝ h(x, y) dμ_X(x)) dμ_Y(y)
  = ∫_ℝ (∫_ℝ 1_{(−∞, z−y]}(x) dμ_X(x)) dμ_Y(y)
  = ∫_ℝ μ_X(−∞, z − y] dμ_Y(y) = ∫_ℝ F(z − y) dG(y).

Corollary 2.4. Suppose X has a density f and Y ~ G, and X and Y are independent. Then X + Y has density

h(x) = ∫_ℝ f(x − y) dG(y).

If both X and Y have densities, with g denoting the density of Y, then

h(x) = ∫_ℝ f(x − y) g(y) dy.
Proof.

F_{X+Y}(z) = ∫_ℝ F(z − y) dG(y)
  = ∫_ℝ ∫_{−∞}^{z−y} f(x) dx dG(y)
  = ∫_ℝ ∫_{−∞}^{z} f(u − y) du dG(y)
  = ∫_{−∞}^{z} ∫_ℝ f(u − y) dG(y) du
  = ∫_{−∞}^{z} (∫_ℝ f(u − y) g(y) dy) du,

which completes the proof.

Problem 2.3. Let X ~ Γ(α, λ) and Y ~ Γ(β, λ) be independent. Prove that X + Y ~ Γ(α + β, λ).
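The convolution formula makes this problem checkable numerically: convolving the Γ(α, λ) and Γ(β, λ) densities at a point should reproduce the Γ(α + β, λ) density there. A sketch with α = 2, β = 3, λ = 1 and the evaluation point z = 2, all our choices:

```python
import math

def gamma_pdf(x, alpha, lam=1.0):
    # (lam^alpha / Gamma(alpha)) x^{alpha-1} e^{-lam x} for x >= 0
    if x <= 0:
        return 0.0
    return lam**alpha / math.gamma(alpha) * x**(alpha - 1) * math.exp(-lam * x)

alpha, beta, z = 2.0, 3.0, 2.0
N = 20_000
h = z / N
# h(z) = integral_0^z f_alpha(z - y) f_beta(y) dy, midpoint rule
conv = sum(gamma_pdf(z - (i + 0.5) * h, alpha) * gamma_pdf((i + 0.5) * h, beta) * h
           for i in range(N))
print(conv, gamma_pdf(z, alpha + beta))
```

Both numbers agree with the Γ(5, 1) density at 2, namely 2⁴e^{−2}/4!.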
3 Construction of independent random variables.

In the previous section we gave various properties of independent random variables. However, we have not yet discussed their existence. If we are given a finite sequence F₁, …, F_n of distribution functions, it is easy to construct independent random variables with these distributions. To do this, let Ω = ℝⁿ and F = B(ℝⁿ). Let P be the measure on this space such that

P((a₁, b₁] × ⋯ × (a_n, b_n]) = ∏_{j=1}^n (F_j(b_j) − F_j(a_j)).

Define the random variables X_j : Ω → ℝ by X_j(ω) = ω_j, where ω = (ω₁, …, ω_n). Then for any x_j ∈ ℝ,

P(X_j ≤ x_j) = P(ℝ × ⋯ × (−∞, x_j] × ⋯ × ℝ) = F_j(x_j).

Thus X_j ~ F_j. Clearly, these random variables are independent by Corollary 2.1. It is, however, extremely important to know that we can do this for infinitely many distributions.
Theorem 3.1. Let {F_j} be a finite or infinite sequence of distribution functions. Then there exists a probability space (Ω, F, P) and a sequence of independent random variables X_j on this space with X_j ~ F_j.

Let N = {1, 2, …} and let ℝ^N be the space of infinite sequences of real numbers. That is, ℝ^N = {ω = (ω₁, ω₂, …) : ω_i ∈ ℝ}. Let B(ℝ^N) be the σ-algebra on ℝ^N generated by the finite dimensional sets, that is, sets of the form {ω ∈ ℝ^N : ω_i ∈ B_i, 1 ≤ i ≤ n}, B_i ∈ B(ℝ).

Theorem 3.2 (Kolmogorov's Extension Theorem). Suppose we are given probability measures μ_n on (ℝⁿ, B(ℝⁿ)) which are consistent. That is,

μ_{n+1}((a₁, b₁] × ⋯ × (a_n, b_n] × ℝ) = μ_n((a₁, b₁] × ⋯ × (a_n, b_n]).

Then there exists a probability measure P on (ℝ^N, B(ℝ^N)) such that

P{ω : ω_i ∈ (a_i, b_i], 1 ≤ i ≤ n} = μ_n((a₁, b₁] × ⋯ × (a_n, b_n]).

To obtain Theorem 3.1, note that the product measures μ_n constructed above from F₁, …, F_n are consistent. Now define

X_j : ℝ^N → ℝ

by

X_j(ω) = ω_j.

Then the X_j are independent under the extension measure and X_j ~ F_j.
A different way of constructing independent random variables, at least Bernoulli random variables, is as follows. Consider Ω = (0, 1] and recall that for x ∈ (0, 1) we can write

x = ∑_{n=1}^∞ ε_n/2ⁿ,

where each ε_n is either 0 or 1. (This representation is actually unique except for x among the dyadic rationals.)
Problem 3.2. Let A
n
be a sequence of independent sets. Prove that
P

j=1
A
j
=

j=1
PA
j

and
P

_
j=1
A
j
= 1

j=1
(1 PA
j
)
Problem 3.3. Let X
1
, . . . , X
n
be independent random variables with X
j
F
j
.
Fin the distribution of the random variables max
1jn
X
j
and min
1jn
X
j
.
Problem 3.4. Let X
n
be independent random variables and f
n
be Borel mea-
surable. Prove that the sequence of random variables f
n
(X
n
) is independent.
Problem 3.5. Suppose X and Y are independent random variables and that X+
Y L
p
(P) for some 0 < p < . Prove that both X and Y must also be in L
p
(P).
Problem 3.6. The covariance of two random variables X and Y is dened by
Cov(X, Y ) = E[(X EX)(Y EY )]
= E(XY ) E(X)E(Y ).
Prove that
var(X
1
+X
2
+ +X
n
) =
n

j=1
var(X
j
) +
n

i,j=1,i=j
Cov(X
i
, X
j
)
and conclude that if the random variables are independent then
var(X
1
+X
2
+ +X
n
) =
n

j=1
var(X
j
)
81
V

THE CLASSICAL LIMIT THEOREMS

1 Bernoulli Trials.

Consider the sequence of independent random variables which arise from tossing a coin:

X_i = 1 with probability p, and X_i = 0 with probability 1 − p.

If we use 1 to denote success (= heads) and 0 to denote failure (= tails), and S_n for the number of successes in n trials, we can write

S_n = ∑_{j=1}^n X_j.

We can compute and find that the probability of exactly j successes in n trials is

P{S_n = j} = C(n, j) · P{any specific sequence of n trials with exactly j heads}
  = C(n, j) p^j (1 − p)^{n−j} = (n!/(j!(n − j)!)) p^j (1 − p)^{n−j},

where C(n, j) denotes the binomial coefficient. This is called Bernoulli's formula. Let us take p = 1/2, which represents a fair coin. Then S_n/n denotes the relative frequency of heads in n trials, or the average number of successes in n trials. We should expect, in the long run, for this to be 1/2. The precise statement of this is
Theorem 1.1 (Bernoulli's Law of Averages, or Weak Law of Large Numbers). As n increases, the probability that the average number of successes deviates from 1/2 by more than any preassigned ε > 0 tends to zero. That is,

P{|S_n/n − 1/2| > ε} → 0, as n → ∞.

Let x ∈ [0, 1] and consider its dyadic representation. That is, write

x = ∑_{n=1}^∞ ε_n/2ⁿ

with ε_n = 0 or 1. The number x is a normal number if each digit occurs the right proportion of times, namely 1/2.

Theorem 1.2 (Borel 1909). Except for a set of Lebesgue measure zero, all numbers in [0, 1] are normal numbers. That is, if X_n(x) = ε_n and S_n is the partial sum of these random variables, we have

S_n(x)/n → 1/2 a.s. as n → ∞.

The rest of this chapter is devoted to proving various generalizations of these results.
2 L² and Weak Laws.

First, to conform to the language of probability, we shall say that a sequence of random variables X_n converges almost surely, and write this as a.s., if it converges a.e. as defined in Chapter II. If the convergence is in measure, we shall say that X_n → X in probability. That is, X_n → X in probability if for all ε > 0,

P{|X_n − X| > ε} → 0 as n → ∞.

We recall that if X_n → X in L^p then X_n → X in probability, and that there is then a subsequence X_{n_k} → X a.s. In addition, recall that by Problem 3.8 in Chapter II, X_n → X a.s. if and only if for any ε > 0,

lim_{m→∞} P{|X_n − X| ≤ ε for all n ≥ m} = 1  (2.1)

or, equivalently,

lim_{m→∞} P{|X_n − X| > ε for some n ≥ m} = 0.  (2.2)

The proofs of these results are based on a convenient characterization of a.s. convergence. Set

A_m = ∩_{n=m}^∞ {|X_n − X| ≤ ε} = {|X_n − X| ≤ ε for all n ≥ m},

so that

A_m^c = {|X_n − X| > ε for some n ≥ m}.

Therefore,

{|X_n − X| > ε i.o.} = ∩_{m=1}^∞ ∪_{n=m}^∞ {|X_n − X| > ε} = ∩_{m=1}^∞ A_m^c.

However, since X_n → X a.s. if and only if, for every ε > 0, |X_n − X| ≤ ε eventually almost surely, we see that X_n → X a.s. if and only if

P{|X_n − X| > ε i.o.} = lim_{m→∞} P{A_m^c} = 0.  (2.3)

Now, (2.1) and (2.2) follow easily from this. Suppose there is a measurable set N with P(N) = 0 such that for all ω₀ ∈ Ω∖N, X_n(ω₀) → X(ω₀). Set

A_m(ε) = ∩_{n=m}^∞ {|X_n − X| ≤ ε},  (2.4)

so that A_m(ε) ⊂ A_{m+1}(ε). Now, for each ω₀ ∈ Ω∖N there exists an M(ω₀, ε) such that for all n ≥ M(ω₀, ε), |X_n − X| ≤ ε. Therefore ω₀ ∈ A_{M(ω₀,ε)}(ε). Thus

Ω∖N ⊂ ∪_{m=1}^∞ A_m(ε),

and therefore

1 = P(Ω∖N) ≤ lim_{m→∞} P{A_m(ε)},

which proves that (2.1) holds. Conversely, suppose (2.1) holds for all ε > 0. With the A_m(ε) as in (2.4), set A(ε) = ∪_{m=1}^∞ A_m(ε). Then

P{A(ε)} = lim_{m→∞} P{A_m(ε)} = 1.

For each ω₀ ∈ A(ε) there exists m = m(ω₀, ε) such that |X_n − X| ≤ ε for all n ≥ m. Let ε = 1/n and set

A = ∩_{n=1}^∞ A(1/n).

Then

P(A) = lim_{n→∞} P(A(1/n)) = 1,

and if ω₀ ∈ A we have ω₀ ∈ A(1/n) for all n. Therefore, for every n, eventually |X_m(ω₀) − X(ω₀)| ≤ 1/n, which is the same as X_m(ω₀) → X(ω₀).
Theorem 2.1 (L² weak law). Let {X_j} be a sequence of uncorrelated random variables. That is, suppose EX_iX_j = EX_i EX_j for i ≠ j. Assume that EX_i = μ and that var(X_i) ≤ C for all i, where C is a constant. Let S_n = ∑_{i=1}^n X_i. Then S_n/n → μ as n → ∞, in L²(P) and in probability.

Corollary 2.1. Suppose the X_i are i.i.d. with EX_i = μ and var(X_i) < ∞. Then S_n/n → μ in L² and in probability.

Proof. We begin by recalling from Problem 3.6 that if the X_i are uncorrelated and E(X_i²) < ∞, then var(X₁ + ⋯ + X_n) = var(X₁) + ⋯ + var(X_n), and that var(cX) = c² var(X) for any constant c. We need to verify that

E|S_n/n − μ|² → 0.

Observe that E(S_n/n) = μ and therefore,

E|S_n/n − μ|² = var(S_n/n) = (1/n²) var(S_n) = (1/n²) ∑_{i=1}^n var(X_i) ≤ Cn/n²,

and this last term goes to zero as n goes to infinity. This proves the L² convergence. Since convergence in L^p implies convergence in probability for any 0 < p < ∞, the result follows.

Here is a standard application of the above weak law.
Theorem 2.2 (The Weierstrass Approximation Theorem). Let f be a continuous function on [0, 1]. Then there exists a sequence p_n of polynomials such that p_n → f uniformly on [0, 1].

Proof. Without loss of generality we may assume that f(0) = f(1) = 0, for if this is not the case, apply the result to g(x) = f(x) − f(0) − x(f(1) − f(0)). Put

p_n(x) = ∑_{j=0}^n C(n, j) x^j (1 − x)^{n−j} f(j/n),

recalling that C(n, j) = n!/(j!(n − j)!). The functions p_n(x) are clearly polynomials. These are called the Bernstein polynomials of degree n associated with f.

Let X₁, X₂, … be i.i.d. according to the distribution P(X_i = 1) = x, P(X_i = 0) = 1 − x, for 0 < x < 1, so that E(X_i) = x and var(X_i) = x(1 − x). If S_n denotes their partial sums, we have from the above calculation that

P{S_n = j} = C(n, j) x^j (1 − x)^{n−j}.

Thus

E(f(S_n/n)) = ∑_{j=0}^n C(n, j) x^j (1 − x)^{n−j} f(j/n) = p_n(x).

Also, S_n/n → x in probability: by Chebyshev's inequality,

P{|S_n/n − x| > δ} ≤ (1/δ²) var(S_n/n) = (1/δ²)(1/n²) var(S_n) = x(1 − x)/(nδ²) ≤ 1/(4nδ²)

for all x ∈ [0, 1], since x(1 − x) ≤ 1/4. Set M = ‖f‖_∞ and let ε > 0. There exists a δ > 0 such that |f(x) − f(y)| < ε when |x − y| < δ. Thus

|p_n(x) − f(x)| = |E f(S_n/n) − f(x)|
  ≤ E|f(S_n/n) − f(x)|
  = ∫_{{|S_n/n − x| < δ}} |f(S_n/n) − f(x)| dP + ∫_{{|S_n/n − x| ≥ δ}} |f(S_n/n) − f(x)| dP
  < ε + 2M P{|S_n/n − x| ≥ δ}.

Now, the right-hand side can be made smaller than 2ε by taking n large enough, independently of x. This proves the result.
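The Bernstein construction above is concrete enough to compute directly. For the particular choice f(t) = t², expanding E(S_n/n)² = x² + x(1 − x)/n shows that p_n(x) equals x² + x(1 − x)/n exactly, which a sketch can confirm at a sample point (the choice of f and the point are ours):

```python
from math import comb

def bernstein(f, n, x):
    # p_n(x) = sum_j C(n, j) x^j (1-x)^{n-j} f(j/n)
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) * f(j / n)
               for j in range(n + 1))

def f(t):
    return t * t

n, x = 50, 0.3
approx = bernstein(f, n, x)
exact = x * x + x * (1 - x) / n   # closed form of p_n for f(t) = t^2
print(approx, exact, f(x))
```

The gap between p_n(x) and f(x) here is x(1 − x)/n ≤ 1/(4n), mirroring the 1/(4nδ²) bound in the proof.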
The assumption that the variances of the random variables are uniformly bounded can be considerably weakened.

Theorem 2.3. Let {X_i} be i.i.d. and assume

λ P{|X_i| > λ} → 0  (2.5)

as λ → ∞. Let S_n = ∑_{j=1}^n X_j and μ_n = E(X₁ 1_{(|X₁|≤n)}). Then

S_n/n − μ_n → 0

in probability.

Remark 2.1. The condition (2.5) is necessary in order to have a sequence of numbers a_n such that S_n/n − a_n → 0 in probability. For this, we refer the reader to Feller, Vol. II (1971).

Before proving the theorem we have a

Corollary 2.2. Let {X_i} be i.i.d. with E|X₁| < ∞ and let μ = EX_i. Then S_n/n → μ in probability.
Proof of Corollary 2.2. First, by Chebyshev's inequality and the Dominated Convergence Theorem,

λ P{|X_i| > λ} = λ P{|X₁| > λ} ≤ E(|X₁|; |X₁| > λ) → 0 as λ → ∞,

and likewise μ_n = E(X₁ 1_{(|X₁|≤n)}) → E(X₁) = μ. Hence,

P{|S_n/n − μ| > ε} = P{|S_n/n − μ_n + μ_n − μ| > ε} ≤ P{|S_n/n − μ_n| > ε/2} + P{|μ_n − μ| > ε/2},

and these two terms go to zero as n → ∞.
Lemma 2.1 (Triangular arrays). Let X_{n,k}, 1 ≤ k ≤ n, n = 1, 2, …, be a triangular array of random variables and assume that for each n, the X_{n,k}, 1 ≤ k ≤ n, are independent. Let b_n > 0 with b_n ↑ ∞ as n → ∞, and define the truncated random variables by X̄_{n,k} = X_{n,k} 1_{(|X_{n,k}|≤b_n)}. Suppose that

(i) ∑_{k=1}^n P{|X_{n,k}| > b_n} → 0 as n → ∞, and

(ii) (1/b_n²) ∑_{k=1}^n E X̄²_{n,k} → 0 as n → ∞.

Put a_n = ∑_{k=1}^n E X̄_{n,k} and set S_n = X_{n,1} + X_{n,2} + ⋯ + X_{n,n}. Then

(S_n − a_n)/b_n → 0

in probability.
Proof. Let S̄_n = X̄_{n,1} + ⋯ + X̄_{n,n}. Then

P{|S_n − a_n|/b_n > ε} = P{|S_n − a_n|/b_n > ε, S_n = S̄_n} + P{|S_n − a_n|/b_n > ε, S_n ≠ S̄_n}
  ≤ P{S_n ≠ S̄_n} + P{|S̄_n − a_n|/b_n > ε}.

However,

P{S_n ≠ S̄_n} ≤ P(∪_{k=1}^n {X̄_{n,k} ≠ X_{n,k}}) ≤ ∑_{k=1}^n P{X̄_{n,k} ≠ X_{n,k}} = ∑_{k=1}^n P{|X_{n,k}| > b_n},

and this last term goes to zero by (i).

Since a_n = E S̄_n, we have by Chebyshev's inequality

P{|S̄_n − a_n|/b_n > ε} ≤ (1/(ε²b_n²)) E|S̄_n − a_n|² = (1/(ε²b_n²)) ∑_{k=1}^n var(X̄_{n,k}) ≤ (1/(ε²b_n²)) ∑_{k=1}^n E X̄²_{n,k},

and this goes to zero by (ii).
Proof of Theorem 2.3. We apply the Lemma with X_{n,k} = X_k and b_n = n. We first need to check that this sequence satisfies (i) and (ii). For (i) we have

∑_{k=1}^n P{|X_{n,k}| > n} = n P{|X₁| > n},

which goes to zero as n → ∞ by our assumption. For (ii), since the random variables are i.i.d., we have

(1/n²) ∑_{k=1}^n E X̄²_{n,k} = (1/n) E X̄²_{n,1}.

Let us now recall that by Problem 2.3 in Chapter III, for any nonnegative random variable Y and any 0 < p < ∞,

EY^p = p ∫_0^∞ λ^{p−1} P{Y > λ} dλ.

Thus,

E X̄²_{n,1} = 2∫_0^∞ λ P{|X̄_{n,1}| > λ} dλ ≤ 2∫_0^n λ P{|X₁| > λ} dλ.

We claim that, as n → ∞,

(1/n) ∫_0^n λ P{|X₁| > λ} dλ → 0.

For this, let

g(λ) = λ P{|X₁| > λ}.

Then 0 ≤ g(λ) and g(λ) → 0 as λ → ∞. Set M = sup_{λ>0} |g(λ)| < ∞ and let ε > 0. Fix k₀ so large that g(λ) < ε for all λ > k₀. Then

∫_0^n λ P{|X₁| > λ} dλ ≤ Mk₀ + ∫_{k₀}^n g(λ) dλ < Mk₀ + ε(n − k₀).

Therefore

(1/n) ∫_0^n λ P{|X₁| > λ} dλ < Mk₀/n + ε(n − k₀)/n.

The last quantity goes to ε as n → ∞, and since ε was arbitrary this proves the claim.
3 Borel–Cantelli Lemmas.

Before we state our Borel–Cantelli lemmas for independent events, we recall a few already proven facts. If A_n ⊂ Ω, then

{A_n, i.o.} = limsup A_n = ∩_{m=1}^∞ ∪_{n=m}^∞ A_n

and

{A_n, eventually} = liminf A_n = ∪_{m=1}^∞ ∩_{n=m}^∞ A_n.

Notice that

limsup 1_{A_n} = 1_{{limsup A_n}} and liminf 1_{A_n}(ω) = 1_{{liminf A_n}}.

It follows from Fatou's Lemma that

P(liminf A_n) ≤ liminf P{A_n}

and that

limsup P{A_n} ≤ P{limsup A_n}.

Also recall Corollary 2.2 of Chapter II.

First Borel–Cantelli Lemma. If ∑_{n=1}^∞ P(A_n) < ∞, then P{A_n, i.o.} = 0.

Question: Is it possible to have a converse? That is, is it true that P{A_n, i.o.} = 0 implies ∑_{n=1}^∞ P{A_n} < ∞? The answer is no, at least not in general.

Example 3.1. Let Ω = (0, 1) with the Lebesgue measure on its Borel sets, and set A_n = (0, 1/n). Clearly then ∑ P(A_n) = ∞. But {A_n i.o.} = ∩_m (0, 1/m) = ∅, so P{A_n i.o.} = 0.
Theorem 3.1 (The Second Borel–Cantelli Lemma). Let {A_n} be a sequence of independent sets with the property that ∑ P(A_n) = ∞. Then P{A_n i.o.} = 1.

Proof. We use the elementary inequality (1 − x) ≤ e^{−x}, valid for 0 ≤ x ≤ 1. Fix m and N > m. By independence,

P(∩_{n=m}^N A_n^c) = ∏_{n=m}^N P{A_n^c} = ∏_{n=m}^N (1 − P{A_n}) ≤ ∏_{n=m}^N e^{−P{A_n}} = exp(−∑_{n=m}^N P{A_n}),

and this quantity converges to 0 as N → ∞. Therefore, for every m,

P(∪_{n=m}^∞ A_n) = 1,

which implies that P{A_n i.o.} = 1 and completes the proof.
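The lemma can be watched in a seeded simulation: for independent events with P(A_n) = 1/n the probabilities sum to the divergent harmonic series, so occurrences keep accumulating, at the rate of the harmonic sum. The choice P(A_n) = 1/n is ours for illustration:

```python
import math
import random

random.seed(7)
N = 20_000
# Independent events with P(A_n) = 1/n; sum of P(A_n) diverges,
# so the Second Borel-Cantelli Lemma says infinitely many occur.
count = sum(1 for n in range(1, N + 1) if random.random() < 1 / n)
expected = sum(1 / n for n in range(1, N + 1))   # harmonic sum ~ log N
print(count, expected)
```

The observed count fluctuates around the harmonic sum log N + γ ≈ 10.5, consistent with Theorem 4.3 below.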
4 Applications of the Borel–Cantelli Lemmas.

In Chapter II, §3, we had several applications of the First Borel–Cantelli Lemma. In the next section we will have several more applications of this and of the Second Borel–Cantelli Lemma. Before that, we give a simple application to a fairly weak version of the strong law of large numbers.

Theorem 4.1. Let {X_i} be i.i.d. with EX₁ = μ and EX₁⁴ ≤ C < ∞. Then S_n/n → μ a.s.

Proof. By considering X_i′ = X_i − μ we may assume that μ = 0. Then

E(S_n⁴) = E(∑_{i=1}^n X_i)⁴ = E ∑_{1≤i,j,k,l≤n} X_iX_jX_kX_l = ∑_{1≤i,j,k,l≤n} E(X_iX_jX_kX_l).

Since the random variables have zero expectation and they are independent, the only terms in this sum which are not zero are those where all four indices are equal and those where two pairs of indices are equal. That is, terms of the form EX_j⁴ and EX_i²X_j² = (EX_i²)² with i ≠ j. There are n of the first type and 3n(n − 1) of the second type. Thus,

E(S_n⁴) = n E(X₁⁴) + 3n(n − 1)(EX₁²)² ≤ Cn².

By Chebyshev's inequality with p = 4,

P{|S_n| > nε} ≤ Cn²/(n⁴ε⁴) = C/(n²ε⁴),

and therefore,

∑_{n=1}^∞ P{|S_n/n| > ε} < ∞,

and the First Borel–Cantelli Lemma gives

P{|S_n/n| > ε i.o.} = 0,
The following is an application of the Second Borel–Cantelli Lemma.

Theorem 4.2. If X₁, X₂, … are i.i.d. with E|X₁| = ∞, then P{|X_n| ≥ n i.o.} = 1 and P{lim S_n/n exists in (−∞, ∞)} = 0.

Thus E|X₁| < ∞ is necessary for the strong law of large numbers. It is also sufficient, as we shall see below.

Proof. We first note that

∑_{n=1}^∞ P{|X₁| ≥ n} ≥ E|X₁| − 1 = ∞,

which follows from the fact that

E|X₁| = ∫_0^∞ P{|X₁| > x} dx

and

∑_{n=0}^∞ ∫_n^{n+1} P{|X₁| > x} dx = ∫_0^∞ P{|X₁| > x} dx ≤ 1 + ∑_{n=1}^∞ P{|X₁| ≥ n}.

Thus,

∑_{n=1}^∞ P{|X_n| ≥ n} = ∑_{n=1}^∞ P{|X₁| ≥ n} = ∞,

and therefore by the Second Borel–Cantelli Lemma,

P{|X_n| ≥ n i.o.} = 1.

Next, set

A = {ω : lim_{n→∞} S_n(ω)/n exists in (−∞, ∞)}.

Clearly for ω ∈ A,

lim_{n→∞} ( S_n(ω)/n − S_{n+1}(ω)/(n+1) ) = 0

and

lim_{n→∞} S_n(ω)/(n(n+1)) = 0.

Hence for ω ∈ A there is an N such that for all n ≥ N,

|S_n(ω)/(n(n+1))| < 1/2.

Thus for ω ∈ A ∩ {|X_n| ≥ n i.o.},

|S_n(ω)/(n(n+1)) − X_{n+1}(ω)/(n+1)| > 1/2

infinitely often, since |X_{n+1}(ω)|/(n+1) ≥ 1 infinitely often. However, since

S_n/n − S_{n+1}/(n+1) = S_n/(n(n+1)) − X_{n+1}/(n+1)

and the left-hand side goes to zero on A as observed above, this is impossible. Hence A ∩ {|X_n| ≥ n i.o.} = ∅, and since P{|X_n| ≥ n i.o.} = 1, we conclude that P{A} = 0, which completes the proof.
The following result is stronger than the Second Borel–Cantelli Lemma; in particular, it implies it.

Theorem 4.3. Suppose the events A_j are pairwise independent and ∑_{j=1}^∞ P(A_j) = ∞. Then

lim_{n→∞} ( ∑_{j=1}^n 1_{A_j} / ∑_{j=1}^n P(A_j) ) = 1 a.s.

In particular,

lim_{n→∞} ∑_{j=1}^n 1_{A_j}(ω) = ∞ a.s.,

which means that P{A_n i.o.} = 1.
Proof. Let X_j = 1_{A_j} and consider the partial sums S_n = ∑_{j=1}^n X_j. Since these random variables are pairwise independent, we have as before var(S_n) = var(X₁) + ⋯ + var(X_n). Also, var(X_j) = E|X_j − EX_j|² ≤ E(X_j²) = E(X_j) = P{A_j}. Thus var(S_n) ≤ ES_n. Let δ > 0. By Chebyshev's inequality,

P{|S_n/ES_n − 1| > δ} = P{|S_n − ES_n| > δ ES_n} ≤ var(S_n)/(δ²(ES_n)²) ≤ 1/(δ² ES_n),

and this last term goes to 0 as n → ∞, since ES_n = ∑_{j=1}^n P(A_j) ↑ ∞. From this we conclude that S_n/ES_n → 1 in probability. However, we have claimed a.s. convergence.

Let

n_k = inf{n ≥ 1 : ES_n ≥ k²}

and set T_k = S_{n_k}. Since EX_n ≤ 1 for all n, we see that

k² ≤ ET_k ≤ E(S_{n_k − 1}) + 1 ≤ k² + 1

for all k. Replacing n with n_k in the above argument gives

P{|T_k − ET_k| > δ ET_k} ≤ 1/(δ² ET_k) ≤ 1/(δ² k²).

Thus

∑_{k=1}^∞ P{|T_k/ET_k − 1| > δ} < ∞,

and the First Borel–Cantelli Lemma gives

P{|T_k/ET_k − 1| > δ i.o.} = 0.

That is, T_k/ET_k → 1 a.s. Let Ω₀ ⊂ Ω with P(Ω₀) = 1 be such that

T_k(ω)/ET_k → 1

for every ω ∈ Ω₀. Let n be any integer with n_k ≤ n < n_{k+1}. Since S_n is nondecreasing in n,

T_k(ω)/ET_{k+1} ≤ S_n(ω)/ES_n ≤ T_{k+1}(ω)/ET_k.

We will be done if we can show that, for ω ∈ Ω₀,

T_k(ω)/ET_{k+1} → 1 and T_{k+1}(ω)/ET_k → 1.

Now, clearly,

(ET_k/ET_{k+1}) · (T_k(ω)/ET_k) ≤ S_n(ω)/ES_n ≤ (T_{k+1}(ω)/ET_{k+1}) · (ET_{k+1}/ET_k),

and since

k² ≤ ET_k ≤ ET_{k+1} ≤ (k + 1)² + 1,

we see that ET_{k+1}/ET_k → 1 and ET_k/ET_{k+1} → 1. Together with T_k(ω)/ET_k → 1 this proves the result.
5. Convergence of Random Series, Strong Law of Large Numbers.

Definition 5.1. Let {X_n} be a sequence of random variables. Define the σ-algebras F_n′ = σ(X_n, X_{n+1}, …) and T = ∩_{n≥1} F_n′. F_n′ is often called the future σ-algebra and T the remote (or tail) σ-algebra.

Example 5.1.

(i) If B_n ∈ B(ℝ), then {X_n ∈ B_n i.o.} ∈ T, and if we take X_n = 1_{A_n} we see that {A_n i.o.} ∈ T.

(ii) If S_n = X₁ + ⋯ + X_n, then clearly {lim_{n→∞} S_n exists} ∈ T and {limsup S_n/c_n > λ} ∈ T if c_n → ∞. However, {limsup S_n > 0} ∉ T, in general.
Theorem 5.1 (Kolmogorov 0–1 Law). If X₁, X₂, … are independent and A ∈ T, then P(A) = 0 or 1.

Proof. We shall show that A is independent of itself, and hence P(A) = P(A ∩ A) = P(A)P(A), which implies that P(A) = 0 or 1. First, since X₁, X₂, … are independent, if A ∈ σ(X₁, …, X_n) and B ∈ σ(X_{n+1}, …), then A and B are independent. Thus if A ∈ σ(X₁, …, X_n) and B ∈ T, then A and B are independent. Thus ∪_n σ(X₁, …, X_n) is independent of T. Since they are both π-systems (clearly, if A, B ∈ ∪_n σ(X₁, …, X_n), then A ∈ σ(X₁, …, X_n) and B ∈ σ(X₁, …, X_m) for some n and m, and so A ∩ B ∈ σ(X₁, …, X_{max(n,m)})), σ(∪_n σ(X₁, …, X_n)) is independent of T, by Theorem 2.3 of Chapter IV. Since A ∈ T implies A ∈ σ(X₁, X₂, …), we are done.

Corollary. Let {A_n} be independent. Then P{A_n i.o.} = 0 or 1. In the same way, if the X_n are independent, then P{lim S_n exists} = 0 or 1.

Our next task is to investigate when the above probabilities are indeed one. Recall that Chebyshev's inequality gives, for independent mean zero random variables,

P{|S_n| > λ} ≤ (1/λ²) var(S_n).

The following result is stronger and more useful, as we shall see soon.
Theorem 5.2 (Kolmogorov's inequality). Suppose the X_n are independent, EX_n = 0 and var(X_n) < ∞ for all n. Then for any λ > 0,

P{max_{1≤k≤n} |S_k| ≥ λ} ≤ (1/λ²) E|S_n|² = (1/λ²) var(S_n).

Proof. Set

A_k = {ω : |S_k(ω)| ≥ λ, |S_j(ω)| < λ for all j < k}.

Note that these sets are disjoint and that

ES_n² ≥ ∑_{k=1}^n ∫_{A_k} S_n² dP
  = ∑_{k=1}^n ∫_{A_k} ( S_k² + 2S_k(S_n − S_k) + (S_n − S_k)² ) dP
  ≥ ∑_{k=1}^n ∫_{A_k} S_k² dP + 2 ∑_{k=1}^n ∫_Ω S_k 1_{A_k}(S_n − S_k) dP.  (5.1)

Now,

S_k 1_{A_k} ∈ σ(X₁, …, X_k) and S_n − S_k ∈ σ(X_{k+1}, …, X_n),

and hence they are independent. Since E(S_n − S_k) = 0, we have E(S_k 1_{A_k}(S_n − S_k)) = 0, and therefore the second term in (5.1) is zero. We see that

ES_n² ≥ ∑_{k=1}^n ∫_{A_k} S_k² dP ≥ λ² ∑_{k=1}^n P(A_k) = λ² P{max_{1≤k≤n} |S_k| ≥ λ},

which proves the theorem.
Theorem 5.3. If the X_j are independent, EX_j = 0 and ∑_{n=1}^∞ var(X_n) < ∞, then ∑_{n=1}^∞ X_n converges a.s.

Proof. By Theorem 5.2, for N > M we have

P{max_{M≤n≤N} |S_n − S_M| > ε} ≤ (1/ε²) var(S_N − S_M) = (1/ε²) ∑_{n=M+1}^N var(X_n).

Letting N → ∞ gives

P{max_{n≥M} |S_n − S_M| > ε} ≤ (1/ε²) ∑_{n=M+1}^∞ var(X_n),

and this last quantity goes to zero as M → ∞, since the sum converges. Thus if

w_M = sup_{n,m≥M} |S_m − S_n|,

then

P{w_M > 2ε} ≤ P{max_{n≥M} |S_n − S_M| > ε} → 0

as M → ∞, and hence w_M ↓ 0 a.s. as M → ∞. Thus for almost every ω, {S_m(ω)} is a Cauchy sequence and hence it converges.
Example 5.2. Let X₁, X₂, … be i.i.d. N(0, 1). Then for every t,

B_t(ω) = ∑_{n=1}^∞ X_n(ω) sin(nt)/n

converges a.s. (This is a series representation of Brownian motion.)
Theorem 5.4 (Kolmogorov's Three Series Theorem). Let {X_j} be independent random variables. Let A > 0 and set Y_j = X_j 1_{(|X_j|≤A)}. Then ∑_{n=1}^∞ X_n converges a.s. if and only if the following three conditions hold:

(i) ∑_{n=1}^∞ P(|X_n| > A) < ∞,

(ii) ∑_{n=1}^∞ EY_n converges, and

(iii) ∑_{n=1}^∞ var(Y_n) < ∞.

Proof. Assume (i)–(iii). Let μ_n = EY_n. By (iii) and Theorem 5.3, ∑ (Y_n − μ_n) converges a.s. This and (ii) show that ∑_{n=1}^∞ Y_n converges a.s. However, (i) is equivalent to ∑_{n=1}^∞ P(X_n ≠ Y_n) < ∞, and by the First Borel–Cantelli Lemma,

P{X_n ≠ Y_n i.o.} = 0.

Therefore, P{X_n = Y_n eventually} = 1. Thus since ∑_{n=1}^∞ Y_n converges a.s., so does ∑_{n=1}^∞ X_n.

We will prove the necessity later as an application of the central limit theorem.
For the proof of the strong law of large numbers, we need

Lemma 5.1 (Kronecker's Lemma). Suppose that {a_n} is a sequence of positive real numbers converging up to ∞ and that ∑_{n=1}^∞ x_n/a_n converges. Then

(1/a_n) ∑_{m=1}^n x_m → 0.
Proof. Let b_n = ∑_{j=1}^n x_j/a_j. Then b_n → b_∞, by assumption. Set a₀ = 0, b₀ = 0. Then x_n = a_n(b_n − b_{n−1}), n = 1, 2, …, and

(1/a_n) ∑_{j=1}^n x_j = (1/a_n) ∑_{j=1}^n a_j(b_j − b_{j−1})
  = (1/a_n) ( a_n b_n − a₀b₀ − ∑_{j=0}^{n−1} b_j(a_{j+1} − a_j) )
  = b_n − (1/a_n) ∑_{j=0}^{n−1} b_j(a_{j+1} − a_j).

The last equality is by summation by parts. To see this, proceed by induction, observing first that

∑_{j=1}^n a_j(b_j − b_{j−1}) = ∑_{j=1}^{n−1} a_j(b_j − b_{j−1}) + a_n(b_n − b_{n−1})
  = a_{n−1}b_{n−1} − a₀b₀ − ∑_{j=0}^{n−2} b_j(a_{j+1} − a_j) + a_nb_n − a_nb_{n−1}
  = a_nb_n − a₀b₀ − ∑_{j=0}^{n−2} b_j(a_{j+1} − a_j) − b_{n−1}(a_n − a_{n−1})
  = a_nb_n − a₀b₀ − ∑_{j=0}^{n−1} b_j(a_{j+1} − a_j).

Now, recall that b_n → b_∞. We claim that

(1/a_n) ∑_{j=0}^{n−1} b_j(a_{j+1} − a_j) → b_∞.

Since b_n → b_∞, given ε > 0 there is an N such that for all j > N, |b_j − b_∞| < ε. Since

(1/a_n) ∑_{j=0}^{n−1} (a_{j+1} − a_j) = 1,

the triangle inequality gives

| (1/a_n) ∑_{j=0}^{n−1} b_j(a_{j+1} − a_j) − b_∞ | ≤ (1/a_n) ∑_{j=0}^{n−1} |b_∞ − b_j|(a_{j+1} − a_j)
  ≤ (1/a_n) ∑_{j=0}^{N} |b_∞ − b_j|(a_{j+1} − a_j) + (1/a_n) ∑_{j=N+1}^{n−1} |b_∞ − b_j|(a_{j+1} − a_j)
  ≤ M/a_n + ε,

where M = ∑_{j=0}^{N} |b_∞ − b_j|(a_{j+1} − a_j) does not depend on n. Letting first n → ∞ and then ε → 0 completes the proof.
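Kronecker's Lemma is easy to watch in action: with a_n = n and x_n = (−1)ⁿ, the series ∑ x_n/a_n = ∑ (−1)ⁿ/n converges (to −log 2), so the lemma forces (1/n) ∑_{m≤n} x_m → 0. A small sketch; the choice of a_n and x_n is ours:

```python
n = 100_000
partial = 0            # running sum of x_1 + ... + x_m with x_m = (-1)^m
series = 0.0           # running sum of x_m / a_m with a_m = m
for m in range(1, n + 1):
    x = -1 if m % 2 else 1
    partial += x
    series += x / m
avg = partial / n      # Kronecker: this must tend to 0
print(avg, series)
```

Here the Cesàro-type average is exactly 0 at even n, while the weighted series has settled near −log 2 ≈ −0.6931.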
Theorem 5.5 (The Strong Law of Large Numbers). Suppose the X_j are i.i.d. with E|X₁| < ∞, and set EX₁ = μ. Then S_n/n → μ a.s.

Proof. Let Y_k = X_k 1_{(|X_k|≤k)}. Then

∑_{k=1}^∞ P{X_k ≠ Y_k} = ∑_{k=1}^∞ P{|X_k| > k} ≤ ∫_0^∞ P{|X₁| > λ} dλ = E|X₁| < ∞.

Therefore by the First Borel–Cantelli Lemma, P{X_k ≠ Y_k i.o.} = 0, or put in other words, P{X_k = Y_k eventually} = 1. Thus if we set T_n = Y₁ + ⋯ + Y_n, it suffices to prove that T_n/n → μ a.s. Now set Z_k = Y_k − EY_k. Then E(Z_k) = 0 and

∑_{k=1}^∞ var(Z_k)/k² ≤ ∑_{k=1}^∞ E(Y_k²)/k²
  = ∑_{k=1}^∞ (1/k²) ∫_0^∞ 2λ P{|Y_k| > λ} dλ
  ≤ ∑_{k=1}^∞ (1/k²) ∫_0^k 2λ P{|X₁| > λ} dλ
  = 2 ∫_0^∞ λ ( ∑_{k>λ} 1/k² ) P{|X₁| > λ} dλ
  ≤ C E|X₁| < ∞,

where we used the fact that

∑_{k>λ} 1/k² ≤ C/λ

for some constant C, which follows from the integral test. By Theorem 5.3, ∑_{k=1}^∞ Z_k/k converges a.s., and Kronecker's Lemma gives that

(1/n) ∑_{k=1}^n Z_k → 0 a.s.,

which is the same as

(1/n) ∑_{k=1}^n (Y_k − EY_k) → 0 a.s.,

or

T_n/n − (1/n) ∑_{k=1}^n EY_k → 0 a.s.

We will be done if we can show that

(1/n) ∑_{k=1}^n EY_k → μ.  (5.2)

We know EY_k → μ as k → ∞. That is, given ε > 0 there exists an N such that for all k > N, |EY_k − μ| < ε. With this N fixed we have, for all n ≥ N,

| (1/n) ∑_{k=1}^n EY_k − μ | = | (1/n) ∑_{k=1}^n (EY_k − μ) | ≤ (1/n) ∑_{k=1}^N |EY_k − μ| + (1/n) ∑_{k=N+1}^n |EY_k − μ| ≤ (1/n) ∑_{k=1}^N |EY_k − μ| + ε.
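The strong law can be watched numerically: sample means of i.i.d. draws settle near μ. A seeded sketch with Uniform(0,1) draws, so μ = 1/2 (the distribution choice is ours):

```python
import random

random.seed(42)
n = 200_000
total = 0.0
for _ in range(n):
    total += random.random()     # i.i.d. Uniform(0,1), mean 1/2
sample_mean = total / n
print(sample_mean)
```

With n = 200000 draws the typical deviation from 1/2 is of order 1/√n, far inside the tolerance below.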
6. Variants of the Strong Law of Large Numbers.

Let us assume E(X_i) = 0. Then under the assumptions of the strong law of large numbers we have S_n/n → 0 a.s. The question we address now is: can we have a better rate of convergence? The answer is yes, under the right assumptions, and we begin with

Theorem 6.1. Let X₁, X₂, … be i.i.d. with EX_i = 0 and EX₁² = σ² < ∞. Then for any ε > 0,

lim_{n→∞} S_n/(n^{1/2}(log n)^{1/2+ε}) = 0 a.s.

We will show later that in fact

limsup_{n→∞} S_n/√(2σ²n log log n) = 1 a.s.

This last is the celebrated law of the iterated logarithm of Khinchine.
Proof. Set a
n
=

n(log n)
1
2
+
, n 2. a
1
> 0

n=1
var(X
n
/a
n
) =
_
1
a
2
1
+

n=2
1
n(log n)
1+2
_
< .
105
Then

n=1
X
n
a
n
converges a.s. and hence
1
a
n
n

k=1
X
k
0 a.s.
What if $E|X_1|^2=\infty$ but $E|X_1|^p<\infty$ for some $1<p<2$? For this we have

Theorem 6.2 (Marcinkiewicz and Zygmund). Let $X_j$ be i.i.d. with $EX_1=0$ and $E|X_1|^p<\infty$ for some $1<p<2$. Then
$$\lim_{n\to\infty}\frac{S_n}{n^{1/p}}=0,\quad a.s.$$

Proof. Let $Y_k=X_k1_{(|X_k|\le k^{1/p})}$ and set
$$T_n=\sum_{k=1}^nY_k.$$
It is enough to prove, as above, that $T_n/n^{1/p}\to0$ a.s. To see this, observe that
$$\sum_kP\{Y_k\ne X_k\}=\sum_{k=1}^\infty P\{|X_k|^p>k\}\le E(|X_1|^p)<\infty,$$
and therefore, by the first Borel–Cantelli lemma, $P\{Y_k\ne X_k\ \text{i.o.}\}=0$, which is the same as $P\{Y_k=X_k\ \text{eventually}\}=1$.

Next, estimating by the integral we have
$$\sum_{k>\lambda^p}\frac{1}{k^{2/p}}\le C\int_{\lambda^p}^\infty\frac{dx}{x^{2/p}}=C_p\,\lambda^{p-2},$$
and hence
$$\sum_{k=1}^\infty\operatorname{var}(Y_k/k^{1/p})\le\sum_{k=1}^\infty\frac{EY_k^2}{k^{2/p}}
=2\sum_{k=1}^\infty\frac{1}{k^{2/p}}\int_0^\infty\lambda P(|Y_k|>\lambda)\,d\lambda
=2\sum_{k=1}^\infty\frac{1}{k^{2/p}}\int_0^{k^{1/p}}\lambda P(|X_1|>\lambda)\,d\lambda$$
$$=2\int_0^\infty\lambda P(|X_1|>\lambda)\Big(\sum_{k>\lambda^p}\frac{1}{k^{2/p}}\Big)d\lambda
\le2C_p\int_0^\infty\lambda^{p-1}P(|X_1|>\lambda)\,d\lambda=C_p'\,E|X_1|^p<\infty.$$
Thus Theorem 5.3 and Kronecker's lemma imply, with $\mu_k=E(Y_k)$, that
$$\frac{1}{n^{1/p}}\sum_{k=1}^n(Y_k-\mu_k)\to0,\quad a.s.$$
If we are able to show that $\frac{1}{n^{1/p}}\sum_{k=1}^n\mu_k\to0$, we will be done. Observe that
$$0=E(X_1)=E\big(X_11_{(|X_1|\le k^{1/p})}\big)+E\big(X_11_{(|X_1|>k^{1/p})}\big),$$
so that $|\mu_k|\le\big|E\big(X_11_{(|X_1|>k^{1/p})}\big)\big|$, and therefore
$$\Big|\frac{1}{n^{1/p}}\sum_{k=1}^n\mu_k\Big|
\le\frac{1}{n^{1/p}}\sum_{k=1}^n\int_{k^{1/p}}^\infty P(|X_1|>\lambda)\,d\lambda
\le\frac{1}{p\,n^{1/p}}\sum_{k=1}^n\frac{1}{k^{1-1/p}}\int_{k^{1/p}}^\infty p\lambda^{p-1}P(|X_1|>\lambda)\,d\lambda$$
$$\le\frac{1}{p\,n^{1/p}}\sum_{k=1}^n\frac{1}{k^{1-1/p}}E\{|X_1|^p;|X_1|>k^{1/p}\}.$$
Since $X_1\in L^p$, given $\varepsilon>0$ there is an $N$ such that $E(|X_1|^p;|X_1|>k^{1/p})<\varepsilon$ if $k>N$. Also,
$$\sum_{k=1}^n\frac{1}{k^{1-1/p}}\le C\int_1^nx^{1/p-1}\,dx\le Cn^{1/p}.$$
The theorem follows from these. $\Box$
Theorem 6.3. Let $X_1,X_2,\dots$ be i.i.d. with $EX_j^+=\infty$ and $EX_j^-<\infty$. Then
$$\lim_{n\to\infty}\frac{S_n}{n}=\infty,\quad a.s.$$

Proof. Let $M>0$ and $X_j^M=X_j\wedge M$, the minimum of $X_j$ and $M$. Then the $X_i^M$ are i.i.d. and $E|X_i^M|<\infty$. (Here we have used the fact that $EX_j^-<\infty$.) Setting $S_n^M=X_1^M+\dots+X_n^M$, we see that $\frac{S_n^M}{n}\to EX_1^M$ a.s. Now, since $X_i\ge X_i^M$, we have
$$\liminf_{n\to\infty}\frac{S_n}{n}\ge\lim_{n\to\infty}\frac{S_n^M}{n}=EX_1^M,\quad a.s.$$
However, by the monotone convergence theorem, $E(X_1^M)^+\uparrow E(X_1^+)=\infty$ as $M\to\infty$, hence
$$EX_1^M=E(X_1^M)^+-E(X_1^M)^-\to+\infty.$$
Therefore
$$\liminf_{n\to\infty}\frac{S_n}{n}=\infty,\quad a.s.,$$
and the result is proved. $\Box$
7. Two Applications.

We begin with an example from renewal theory. Suppose $X_1,X_2,\dots$ are i.i.d. and $0<X_i<\infty$ a.s. Let $T_n=X_1+\dots+X_n$ and think of $T_n$ as the time of the $n$th occurrence of an event. For example, $X_i$ could be the lifetime of the $i$th lightbulb in a room with infinitely many lightbulbs; then $T_n$ is the time the $n$th lightbulb burns out. Let $N_t=\sup\{n:T_n\le t\}$, which in this example is the number of lightbulbs that have burnt out by time $t$.

Theorem 7.1. Let $X_j$ be i.i.d. and set $EX_1=\mu$, which may or may not be finite. Then
$$\frac{N_t}{t}\to\frac{1}{\mu},\quad a.s.\ \text{as }t\to\infty,$$
where this limit is $0$ if $\mu=\infty$. Also, $E(N_t)/t\to1/\mu$.

Continuing with our lightbulb example, note that if the mean lifetime $\mu$ is large, then the number of lightbulbs burnt out by time $t$ is small.

Proof. We know $\frac{T_n}{n}\to\mu$ a.s. Note that for every $\omega$, $N_t(\omega)$ is an integer and
$$T_{N_t}\le t<T_{N_t+1}.$$
Thus
$$\frac{T_{N_t}}{N_t}\le\frac{t}{N_t}<\frac{T_{N_t+1}}{N_t+1}\cdot\frac{N_t+1}{N_t}.$$
Now, since $T_n<\infty$ for all $n$, we have $N_t\uparrow\infty$ a.s. By the law of large numbers there is an $\Omega_0$ with $P(\Omega_0)=1$ such that for $\omega\in\Omega_0$,
$$\frac{T_{N_t(\omega)}(\omega)}{N_t(\omega)}\to\mu\quad\text{and}\quad\frac{N_t(\omega)+1}{N_t(\omega)}\to1.$$
Thus $t/N_t(\omega)\to\mu$ a.s., and we are done. $\Box$
Let $X_1,X_2,\dots$ be i.i.d. with distribution $F$. For $x\in\mathbb R$ set
$$F_n(x,\omega)=\frac1n\sum_{k=1}^n1_{(X_k\le x)}(\omega).$$
This is the observed frequency of values $\le x$. Now fix $\omega$ and set $a_k=X_k(\omega)$. Then $F_n(x,\omega)$ is the distribution function with a jump of size $\frac1n$ at each of the points $a_k$. This is called the empirical distribution based on $n$ samples of $F$. On the other hand, let us fix $x$. Then $F_n(x,\omega)$ is a random variable. What kind of random variable is it? Define
$$\eta_k(\omega)=1_{(X_k\le x)}(\omega)=\begin{cases}1,&X_k(\omega)\le x\\ 0,&X_k(\omega)>x.\end{cases}$$
Notice that in fact the $\eta_k$ are independent Bernoulli random variables with $p=F(x)$ and $E\eta_k=F(x)$. Writing
$$F_n(x,\omega)=\frac1n\sum_{k=1}^n\eta_k,$$
we see that $F_n(x,\omega)$ is of the form $\frac{S_n}{n}$, and the strong law of large numbers shows that for every $x\in\mathbb R$, $F_n(x,\omega)\to F(x)$ a.s. Of course, the exceptional set may depend on $x$. That is, what we have proved here is that given $x\in\mathbb R$ there is a set $N_x$ with $P(N_x)=0$ such that $F_n(x,\omega)\to F(x)$ for $\omega\notin N_x$. If we set $N=\bigcup_{x\in\mathbb Q}N_x$, where we use $\mathbb Q$ to denote the rational numbers, then this set also has probability zero, and off this set we have $F_n(x,\omega)\to F(x)$ for all $\omega\notin N$ and all $x\in\mathbb Q$. This, and the fact that the discontinuities of distribution functions are at most countable, turns out to be enough to prove

Theorem 7.2 (Glivenko–Cantelli Theorem). Let
$$D_n(\omega)=\sup_{x\in\mathbb R}|F_n(x,\omega)-F(x)|.$$
Then $D_n\to0$ a.s.
VI

THE CENTRAL LIMIT THEOREM

1 Convergence in Distribution.

If $X_n$ tends to a limit, what can you say about the sequence $F_n$ of distribution functions, or about the sequence $\mu_n$ of distribution measures?

Example 1.1. Suppose $X$ has distribution $F$ and define the sequence of random variables $X_n=X+1/n$. Clearly $X_n\to X$ a.s., and in several other ways. Also, $F_n(x)=P(X_n\le x)=P(X\le x-1/n)=F(x-1/n)$, and therefore
$$\lim_{n\to\infty}F_n(x)=F(x-).$$
Hence we do not have convergence of $F_n(x)$ to $F(x)$ at points where $F$ is discontinuous. Even worse, set $X_n=X+C_n$, where
$$C_n=\begin{cases}1/n,&n\ \text{even}\\ -1/n,&n\ \text{odd}.\end{cases}$$
Then the limit may not even exist.

Definition 1.1. The sequence $F_n$ of distribution functions converges weakly to the d.f. $F$ if $F_n(x)\to F(x)$ at every point of continuity of $F$. We write $F_n\Rightarrow F$. In all our discussions we assume $F$ is a d.f., but it could just as well be a sub-d.f.

The sequence of random variables $X_n$ converges weakly to $X$ if their distribution functions $F_n(x)=P(X_n\le x)$ converge weakly to $F(x)=P(X\le x)$. We will also write $X_n\Rightarrow X$.

Example 1.2.

(1) The Glivenko–Cantelli theorem.

(2) $X_i$ i.i.d., $\pm1$ with probability $1/2$ each. If $S_n=X_1+\dots+X_n$, then
$$F_n(y)=P\Big(\frac{S_n}{\sqrt n}\le y\Big)\to\frac{1}{\sqrt{2\pi}}\int_{-\infty}^ye^{-x^2/2}\,dx.$$
This last example can be written as $\frac{S_n}{\sqrt n}\Rightarrow N(0,1)$ and is called the De Moivre–Laplace central limit theorem. Our goal in this chapter is to obtain a very general version of this result. We begin with a detailed study of convergence in distribution.
Theorem 1.1 (Skorokhod's Theorem). If $F_n\Rightarrow F$, then there exist random variables $Y_n$, $Y$ with $Y_n\to Y$ a.s. and $Y_n\sim F_n$, $Y\sim F$.

Proof. We construct the random variables on the canonical space. That is, let $\Omega=(0,1)$, $\mathcal F$ the Borel sets and $P$ the Lebesgue measure. As in Chapter IV, Theorem 1.1,
$$Y_n(\theta)=\inf\{x:\theta\le F_n(x)\},\qquad Y(\theta)=\inf\{x:\theta\le F(x)\}$$
are random variables satisfying $Y_n\sim F_n$ and $Y\sim F$.

The idea is that if $F_n\to F$ then $F_n^{-1}\to F^{-1}$; but of course the problem is that this does not happen at every point, and the random variables are not exactly inverses of the distribution functions. Thus we need to proceed with some care. In fact, what we shall show is that $Y_n(\theta)\to Y(\theta)$ except for $\theta$ in a countable set. Let $0<\theta<1$. Given $\varepsilon>0$, choose and fix $x$ for which $Y(\theta)-\varepsilon<x<Y(\theta)$ and $F(x-)=F(x)$ (that is, for which $F$ is continuous at $x$). Then by definition $F(x)<\theta$. Since $F_n(x)\to F(x)$, there is an $N$ such that for all $n>N$, $F_n(x)<\theta$. Hence, again by definition, $Y(\theta)-\varepsilon<x<Y_n(\theta)$ for all such $n$. Therefore
$$\liminf_n Y_n(\theta)\ge Y(\theta).$$
It remains to show that
$$\limsup_n Y_n(\theta)\le Y(\theta).$$
Now, if $\theta<\theta'$ and $\varepsilon>0$, choose $y$ for which $Y(\theta')<y<Y(\theta')+\varepsilon$ and $F$ is continuous at $y$. Now,
$$\theta<\theta'\le F(Y(\theta'))\le F(y).$$
Again, since $F_n(y)\to F(y)$, we see that for all $n>N$, $\theta\le F_n(y)$ and hence $Y_n(\theta)\le y<Y(\theta')+\varepsilon$, which implies $\limsup_nY_n(\theta)\le Y(\theta')$. If $Y$ is continuous at $\theta$, letting $\theta'\downarrow\theta$ we must have
$$\limsup_nY_n(\theta)\le Y(\theta).$$
Since $Y$ is nondecreasing, its discontinuities form at most a countable set, and the proof is complete. $\Box$
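The generalized inverse $Y(\theta)=\inf\{x:\theta\le F(x)\}$ used in the proof can be sketched concretely. The three-point distribution below is an illustrative assumption, not from the notes:

```python
import bisect

# An assumed three-point distribution: P(X=-1)=0.2, P(X=0)=0.5, P(X=2)=0.3
atoms = [-1.0, 0.0, 2.0]
cum = [0.2, 0.7, 1.0]   # F evaluated at the atoms

def Y(theta):
    # generalized inverse Y(theta) = inf{x : theta <= F(x)}
    return atoms[bisect.bisect_left(cum, theta)]

print(Y(0.1), Y(0.2), Y(0.65), Y(0.9))  # -1.0 -1.0 0.0 2.0
```

With $\theta$ uniform on $(0,1)$, the value $Y(\theta)$ has exactly the distribution $F$, which is the content of the canonical-space construction.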
The following corollaries follow immediately from Theorem 1.1 and the results in Chapter II.

Corollary 1.1 (Fatou in distribution). Suppose $X_n\Rightarrow X$ and $g\ge0$ is continuous. Then $E(g(X))\le\liminf E(g(X_n))$.

Corollary 1.2 (Dominated convergence in distribution). If $X_n\Rightarrow X$, $g$ is continuous, and $|g(X_n)|\le C$, then
$$E(g(X_n))\to E(g(X)).$$

The following is a useful characterization of convergence in distribution.

Theorem 1.2. $X_n\Rightarrow X$ if and only if for every bounded continuous function $g$ we have $E(g(X_n))\to E(g(X))$.

Proof. If $X_n\Rightarrow X$, then Corollary 1.2 implies the convergence of the expectations. Conversely, let
$$g_{x,\varepsilon}(y)=\begin{cases}1,&y\le x\\ 0,&y\ge x+\varepsilon\\ \text{linear},&x\le y\le x+\varepsilon.\end{cases}$$
It follows from this that
$$P(X_n\le x)\le E(g_{x,\varepsilon}(X_n)),$$
and therefore
$$\limsup_nP(X_n\le x)\le\limsup_nE(g_{x,\varepsilon}(X_n))=E(g_{x,\varepsilon}(X))\le P(X\le x+\varepsilon).$$
Now let $\varepsilon\downarrow0$ to conclude that
$$\limsup_nP(X_n\le x)\le P(X\le x).$$
In the same way,
$$P(X\le x-\varepsilon)\le E(g_{x-\varepsilon,\varepsilon}(X))=\lim_{n\to\infty}E(g_{x-\varepsilon,\varepsilon}(X_n))\le\liminf_{n\to\infty}P(X_n\le x).$$
Now let $\varepsilon\downarrow0$. If $F$ is continuous at $x$, we obtain the result. $\Box$
Corollary 1.3. Suppose $X_n\to X$ in probability. Then $X_n\Rightarrow X$.

Lemma 1.1. Suppose $X_n\to0$ in probability and $|X_n|\le Y$ with $E(Y)<\infty$. Then $E|X_n|\to0$.

Proof. Fix $\varepsilon>0$. Then $P(|X_n|>\varepsilon)\to0$ as $n\to\infty$. Hence, by Proposition 2.6 in Chapter II,
$$\int_{\{|X_n|>\varepsilon\}}|Y|\,dP\to0,\quad\text{as }n\to\infty.$$
Since
$$E|X_n|=\int_{\{|X_n|<\varepsilon\}}|X_n|\,dP+\int_{\{|X_n|>\varepsilon\}}|X_n|\,dP
<\varepsilon+\int_{\{|X_n|>\varepsilon\}}|Y|\,dP,$$
the result follows. $\Box$

Proof of Corollary 1.3. If $X_n\to X$ in probability and $g$ is bounded and continuous, then $g(X_n)\to g(X)$ in probability (why?) and hence $E(g(X_n))\to E(g(X))$, proving $X_n\Rightarrow X$.

An alternative proof is as follows. Set $a_n=E(g(X_n))$ and $a=E(g(X))$, and let $a_{n_k}$ be any subsequence. Since $X_{n_k}$ also converges to $X$ in probability, there is a further subsequence $X_{n_{k_j}}$ which converges almost everywhere, and hence by the dominated convergence theorem $a_{n_{k_j}}\to a$. It follows that the full sequence $a_n$ converges to $a$, proving the result. $\Box$
Theorem 1.3 (Continuous mapping theorem). Let $g$ be a measurable function on $\mathbb R$ and let $D_g=\{x:g\ \text{is discontinuous at}\ x\}$. If $X_n\Rightarrow X$ and $P\{X\in D_g\}=\mu(D_g)=0$, then $g(X_n)\Rightarrow g(X)$.

Proof. By Skorokhod's theorem there are $Y_n\sim X_n$ and $Y\sim X$ with $Y_n\to Y$ a.s. Let $f$ be continuous and bounded. Then $D_{f\circ g}\subset D_g$, so
$$P\{Y\in D_{f\circ g}\}=0.$$
Thus
$$f(g(Y_n))\to f(g(Y))$$
a.s., and the dominated convergence theorem implies that $E(f(g(Y_n)))\to E(f(g(Y)))$, and this proves the result. $\Box$

The next result gives a number of useful equivalent definitions.
Theorem 1.4. The following are equivalent:

(i) $X_n\Rightarrow X$.

(ii) For all open sets $G\subset\mathbb R$, $\liminf P(X_n\in G)\ge P(X\in G)$ or, what is the same, $\liminf\mu_n(G)\ge\mu(G)$, where $X_n\sim\mu_n$ and $X\sim\mu$.

(iii) For all closed sets $K\subset\mathbb R$, $\limsup P(X_n\in K)\le P(X\in K)$.

(iv) For all sets $A\subset\mathbb R$ with $P(X\in\partial A)=0$, we have $\lim_{n\to\infty}P(X_n\in A)=P(X\in A)$.

We recall that for any set $A$, $\partial A=\bar A\setminus A^0$, where $\bar A$ is the closure of the set and $A^0$ is its interior. It can very well be that we have strict inequality in (ii) and (iii). Consider, for example, $X_n=1/n$, so that $P(X_n=1/n)=1$, and take $G=(0,1)$. Then $P(X_n\in G)=1$. But $1/n\to0\notin G$, so
$$P(X\in G)=0.$$
Also, the last property can be used to define weak convergence of probability measures. That is, let $\mu_n$ and $\mu$ be probability measures on $(\mathbb R,\mathcal B)$. We shall say that $\mu_n$ converges to $\mu$ weakly if $\mu_n(A)\to\mu(A)$ for all Borel sets $A$ in $\mathbb R$ with the property that $\mu(\partial A)=0$.

Proof. We shall prove that (i) $\Rightarrow$ (ii) and that (ii) $\Leftrightarrow$ (iii), then that (ii) and (iii) $\Rightarrow$ (iv), and finally that (iv) $\Rightarrow$ (i).

Assume (i). Let $Y_n\sim X_n$, $Y\sim X$, with $Y_n\to Y$ a.s. Since $G$ is open,
$$\liminf_n1_{(Y_n\in G)}(\omega)\ge1_{(Y\in G)}(\omega).$$
Therefore Fatou's lemma implies
$$P(Y\in G)\le\liminf_nP(Y_n\in G),$$
proving (ii). Next, (ii) $\Leftrightarrow$ (iii). Let $K$ be closed. Then $K^c$ is open, and
$$P(X_n\in K)=1-P(X_n\in K^c),\qquad P(X\in K)=1-P(X\in K^c).$$
The equivalence of (ii) and (iii) follows from this.

Now, (ii) and (iii) $\Rightarrow$ (iv). Let $K=\bar A$, $G=A^0$, and $\partial A=\bar A\setminus A^0$, so that $G\subset A\subset K$. Under our assumption that $P(X\in\partial A)=0$,
$$P(X\in K)=P(X\in A)=P(X\in G).$$
Therefore (ii) and (iii) give
$$\limsup_nP(X_n\in A)\le\limsup_nP(X_n\in K)\le P(X\in K)=P(X\in A)$$
and
$$\liminf_nP(X_n\in A)\ge\liminf_nP(X_n\in G)\ge P(X\in G)=P(X\in A),$$
and this gives
$$\lim_nP(X_n\in A)=P(X\in A).$$
To prove that (iv) implies (i), take $A=(-\infty,x]$. Then $\partial A=\{x\}$, and this completes the proof. $\Box$
Next, recall that any bounded sequence of real numbers contains a convergent subsequence. Suppose we have a sequence of probability measures $\mu_n$. Is it possible to pull out a subsequence $\mu_{n_k}$ so that it converges weakly to a probability measure $\mu$? Or, is it true that given distribution functions $F_n$ there is a subsequence $F_{n_k}$ such that $F_{n_k}$ converges weakly to a distribution function $F$? The answer is no, in general.

Example 1.3. Take
$$F_n(x)=\frac13\,1_{(x\ge n)}(x)+\frac13\,1_{(x\ge-n)}(x)+\frac13\,G(x),$$
where $G$ is a distribution function. Then
$$\lim_{n\to\infty}F_n(x)=F(x)=\frac13+\frac13G(x),$$
$$\lim_{x\to\infty}F(x)=\frac23<1,\qquad\lim_{x\to-\infty}F(x)=\frac13\ne0,$$
so the limit $F$ is not a distribution function.

Lemma 1.1. Let $f$ be an increasing function on the rationals $\mathbb Q$ and define $\tilde f$ on $\mathbb R$ by
$$\tilde f(x)=\inf_{x<t\in\mathbb Q}f(t)=\inf\{f(t):x<t\in\mathbb Q\}=\lim_{t_n\downarrow x}f(t_n).$$
Then $\tilde f$ is increasing and right continuous.

Proof. The function $\tilde f$ is clearly increasing. Let $x_0\in\mathbb R$ and fix $\varepsilon>0$. We shall show that there is an $x>x_0$ such that
$$0\le\tilde f(x)-\tilde f(x_0)<\varepsilon.$$
By the definition, there exists $t_0\in\mathbb Q$ such that $t_0>x_0$ and
$$\tilde f(x_0)\le f(t_0)<\tilde f(x_0)+\varepsilon.$$
Thus if $t\in\mathbb Q$ is such that $x_0<t<t_0$, we have
$$0\le f(t)-\tilde f(x_0)\le f(t_0)-\tilde f(x_0)<\varepsilon.$$
That is, for all $x_0<t<t_0$,
$$f(t)<\tilde f(x_0)+\varepsilon,$$
and therefore if $x_0<x<t_0$ we see that
$$0\le\tilde f(x)-\tilde f(x_0)<\varepsilon,$$
proving the right continuity of $\tilde f$. $\Box$
Theorem 1.5 (Helly's Selection Theorem). Let $F_n$ be a sequence of distribution functions. There exists a subsequence $F_{n_k}$ and a right continuous nondecreasing function $F$ such that $F_{n_k}(x)\to F(x)$ at all points $x$ of continuity of $F$.

Proof. Let $q_1,q_2,\dots$ be an enumeration of the rationals. The sequence $F_n(q_1)$ has values in $[0,1]$; hence there exists a subsequence $F_{n^1_j}$ with $F_{n^1_j}(q_1)\to G(q_1)$. From this we extract a further subsequence $F_{n^2_j}$ with $F_{n^2_j}(q_2)\to G(q_2)$, and so on. Schematically,
$$q_1:\ F_{n^1_1},F_{n^1_2},\dots\to G(q_1)$$
$$q_2:\ F_{n^2_1},F_{n^2_2},\dots\to G(q_2)$$
$$\vdots$$
$$q_k:\ F_{n^k_j}(q_k)\to G(q_k)$$
$$\vdots$$
Now let $F_{n_k}=F_{n^k_k}$ be the diagonal subsequence. For any rational $q_j$ we then have
$$F_{n_k}(q_j)\to G(q_j).$$
So we have a nondecreasing function $G$ defined on all the rationals. Set
$$F(x)=\inf\{G(q):q\in\mathbb Q,\ q>x\}=\lim_{q_n\downarrow x}G(q_n).$$
By Lemma 1.1, $F$ is right continuous and nondecreasing. Next, let us show that $F_{n_k}(x)\to F(x)$ at all points of continuity of $F$. Let $x$ be such a point, let $\varepsilon>0$, and pick $r_1,r_2,s\in\mathbb Q$ with $r_1<r_2<x<s$ so that
$$F(x)-\varepsilon<F(r_1)\le F(r_2)\le F(x)\le F(s)<F(x)+\varepsilon.$$
Now, since $F_{n_k}(r_2)\to G(r_2)\ge F(r_1)$ and $F_{n_k}(s)\to G(s)\le F(s)$, we have for $n_k$ large enough
$$F(x)-\varepsilon<F_{n_k}(r_2)\le F_{n_k}(x)\le F_{n_k}(s)<F(x)+\varepsilon,$$
and this shows that $F_{n_k}(x)\to F(x)$, as claimed. $\Box$
When can we guarantee that the function produced above is indeed a distribution function?

Theorem 1.6. Every weak subsequential limit of $\{\mu_n\}$ is a probability measure if and only if for every $\varepsilon>0$ there exists a bounded interval $I_\varepsilon=(a,b]$ such that
$$\inf_n\mu_n(I_\varepsilon)>1-\varepsilon.\qquad(*)$$
In terms of the distribution functions this is equivalent to the statement that for all $\varepsilon>0$ there exists an $M_\varepsilon>0$ such that
$$\sup_n\{1-F_n(M_\varepsilon)+F_n(-M_\varepsilon)\}<\varepsilon.$$
A sequence of probability measures satisfying $(*)$ is said to be tight. Notice that if $\mu_n$ is unit mass at $n$ then clearly $\{\mu_n\}$ is not tight: the mass of $\mu_n$ escapes to infinity. The tightness condition prevents this from happening.

Proof. Let $\mu_{n_k}\Rightarrow\mu$, and let $J\supset I_\varepsilon$ be a bounded interval with $\mu(\partial J)=0$. Then
$$\mu(\mathbb R)\ge\mu(J)=\lim_{k\to\infty}\mu_{n_k}(J)\ge\limsup_k\mu_{n_k}(I_\varepsilon)>1-\varepsilon.$$
Therefore $\mu(\mathbb R)=1$ and $\mu$ is a probability measure.

Conversely, suppose $(*)$ fails. Then we can find an $\varepsilon>0$ and a sequence $n_k$ such that
$$\mu_{n_k}(I)\le1-\varepsilon$$
for all $n_k$ and all bounded intervals $I$. Let $\mu_{n_{k_j}}\Rightarrow\mu$ weakly, and let $J$ be a continuity interval for $\mu$. Then
$$\mu(J)=\lim_{j\to\infty}\mu_{n_{k_j}}(J)\le1-\varepsilon.$$
Therefore $\mu(\mathbb R)\le1-\varepsilon$ and $\mu$ is not a probability measure. $\Box$
2 Characteristic Functions.

Let $\mu$ be a probability measure on $\mathbb R$ and define its Fourier transform by $\hat\mu(t)=\int_{\mathbb R}e^{itx}\,d\mu(x)$. Notice that the Fourier transform is a complex valued function satisfying $|\hat\mu(t)|\le\mu(\mathbb R)=1$ for all $t\in\mathbb R$. If $X$ is a random variable, its characteristic function is defined by
$$\varphi_X(t)=E(e^{itX})=E(\cos(tX))+iE(\sin(tX)).$$
Notice that if $\mu$ is the distribution measure of $X$, then
$$\varphi_X(t)=\int_{\mathbb R}e^{itx}\,d\mu(x)=\hat\mu(t),$$
and again $|\varphi_X(t)|\le1$. Note that if $X\sim Y$ then $\varphi_X(t)=\varphi_Y(t)$, and if $X$ and $Y$ are independent, then
$$\varphi_{X+Y}(t)=E(e^{itX}e^{itY})=\varphi_X(t)\varphi_Y(t).$$
In particular, if $X_1,X_2,\dots,X_n$ are i.i.d., then
$$\varphi_{S_n}(t)=(\varphi_{X_1}(t))^n.$$
Notice also that, writing $\overline{a+ib}=a-ib$, we have $\overline{\varphi_X(t)}=\varphi_X(-t)$. The function $\varphi$ is uniformly continuous. To see this, observe that
$$|\varphi(t+h)-\varphi(t)|=|E[e^{i(t+h)X}-e^{itX}]|\le E|e^{ihX}-1|,$$
and use the continuity of the exponential to conclude the uniform continuity of $\varphi_X$. Next, suppose $a$ and $b$ are constants. Then
$$\varphi_{aX+b}(t)=e^{itb}\varphi_X(at).$$
In particular,
$$\varphi_{-X}(t)=\varphi_X(-t)=\overline{\varphi_X(t)}.$$
If $X\sim-X$, then $\varphi_X(t)=\varphi_X(-t)$ and $\varphi_X$ is real. We now proceed to present some examples which will be useful later.
Examples 2.1.

(i) (Point mass at $a$) Suppose $X\sim F=\delta_a$. Then
$$\varphi(t)=E(e^{itX})=e^{ita}.$$

(ii) (Coin flips) $P(X=1)=P(X=-1)=1/2$. Then
$$\varphi(t)=E(e^{itX})=\frac12e^{it}+\frac12e^{-it}=\frac12(e^{it}+e^{-it})=\cos t.$$

(iii) (Bernoulli) $P(X=1)=p$, $P(X=0)=1-p$. Then
$$\varphi(t)=E(e^{itX})=pe^{it}+(1-p)=1+p(e^{it}-1).$$

(iv) (Poisson distribution) $P(X=k)=e^{-\lambda}\frac{\lambda^k}{k!}$, $k=0,1,2,3,\dots$. Then
$$\varphi(t)=\sum_{k=0}^\infty e^{itk}e^{-\lambda}\frac{\lambda^k}{k!}
=e^{-\lambda}\sum_{k=0}^\infty\frac{(\lambda e^{it})^k}{k!}
=e^{-\lambda}e^{\lambda e^{it}}=e^{\lambda(e^{it}-1)}.$$

(v) (Exponential) Let $X$ be exponential with density $e^{-y}$, $y>0$. Integration by parts gives
$$\varphi(t)=\frac{1}{1-it}.$$

(vi) (Normal) $X\sim N(0,1)$:
$$\varphi(t)=e^{-t^2/2}.$$

Proof of (vi). Writing $e^{itx}=\cos(tx)+i\sin(tx)$, we obtain
$$\varphi(t)=\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}e^{itx}e^{-x^2/2}\,dx=\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}\cos(tx)\,e^{-x^2/2}\,dx,$$
since $x\mapsto\sin(tx)e^{-x^2/2}$ is odd. Differentiating under the integral sign and integrating by parts,
$$\varphi'(t)=-\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}x\sin(tx)e^{-x^2/2}\,dx
=-\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}t\cos(tx)e^{-x^2/2}\,dx
=-t\varphi(t).$$
This gives $\frac{\varphi'(t)}{\varphi(t)}=-t$ which, together with the initial condition $\varphi(0)=1$, immediately yields $\varphi(t)=e^{-t^2/2}$, as desired. $\Box$
Theorem 2.1 (The Fourier Inversion Formula). Let $\mu$ be a probability measure and let $\varphi(t)=\int_{\mathbb R}e^{itx}\,d\mu(x)$. Then for $x_1<x_2$,
$$\mu(x_1,x_2)+\frac12\mu\{x_1\}+\frac12\mu\{x_2\}
=\lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-itx_1}-e^{-itx_2}}{it}\,\varphi(t)\,dt.$$

Remark. The existence of the limit is part of the conclusion. Also, we do not mean that the integral converges absolutely. For example, if $\mu=\delta_0$ then $\varphi(t)=1$, and if $x_1=-1$ and $x_2=1$ we have the integral of $\frac{2\sin t}{t}$, which does not converge absolutely.
Recall that
$$\operatorname{sign}(\theta)=\begin{cases}1,&\theta>0\\ 0,&\theta=0\\ -1,&\theta<0.\end{cases}$$

Lemma 2.1. For all $y>0$,
$$0\le\operatorname{sign}(\theta)\int_0^y\frac{\sin(\theta x)}{x}\,dx\le\int_0^\pi\frac{\sin x}{x}\,dx,\qquad(2.1)$$
$$\int_0^\infty\frac{\sin(\theta x)}{x}\,dx=\frac{\pi}{2}\operatorname{sign}(\theta),\qquad(2.2)$$
$$\int_0^\infty\frac{1-\cos(\theta x)}{x^2}\,dx=\frac{\pi}{2}|\theta|.\qquad(2.3)$$

Proof. Substituting $u=\theta x$, it suffices to prove (2.1)–(2.3) for $\theta=1$. For (2.1), write $[0,\infty)=[0,\pi]\cup[\pi,2\pi]\cup\dots$ and choose $n$ so that $n\pi<y\le(n+1)\pi$. Then
$$\int_0^y\frac{\sin x}{x}\,dx=\sum_{k=0}^{n-1}\int_{k\pi}^{(k+1)\pi}\frac{\sin x}{x}\,dx+\int_{n\pi}^y\frac{\sin x}{x}\,dx$$
$$=\int_0^\pi\frac{\sin x}{x}\,dx+(-1)a_1+(-1)^2a_2+\dots+(-1)^{n-1}a_{n-1}+(-1)^n\int_{n\pi}^y\frac{\sin x}{x}\,dx,$$
where $a_k=\big|\int_{k\pi}^{(k+1)\pi}\frac{\sin x}{x}\,dx\big|$ and $|a_{j+1}|<|a_j|$. If $n$ is odd, then $n-1$ is even and $\int_{n\pi}^y\frac{\sin x}{x}\,dx\le0$; comparing terms we are done. If $n$ is even, the result follows by replacing $y$ with $(n+1)\pi$ and using the same argument.

For (2.2) and (2.3) apply Fubini's theorem to obtain
$$\int_0^\infty\frac{\sin x}{x}\,dx=\int_0^\infty\sin x\int_0^\infty e^{-ux}\,du\,dx
=\int_0^\infty\Big(\int_0^\infty e^{-ux}\sin x\,dx\Big)du
=\int_0^\infty\frac{du}{1+u^2}=\frac{\pi}{2}$$
and
$$\int_0^\infty\frac{1-\cos x}{x^2}\,dx=\int_0^\infty\frac{1}{x^2}\int_0^x\sin u\,du\,dx
=\int_0^\infty\sin u\int_u^\infty\frac{dx}{x^2}\,du
=\int_0^\infty\frac{\sin u}{u}\,du=\frac{\pi}{2}.$$
This completes the proof. $\Box$
Proof of Theorem 2.1. We begin by observing that
$$\Big|\frac{e^{it(x-x_1)}-e^{it(x-x_2)}}{it}\Big|=\Big|\int_{x_1}^{x_2}e^{it(x-u)}\,du\Big|\le x_2-x_1,$$
and hence for any $T>0$,
$$\int_{\mathbb R}\int_{-T}^{T}(x_2-x_1)\,dt\,d\mu(x)=2T(x_2-x_1)<\infty.$$
From this, the definition of $\varphi$, and Fubini's theorem, we obtain
$$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-itx_1}-e^{-itx_2}}{it}\,\varphi(t)\,dt
=\int_{\mathbb R}\int_{-T}^{T}\frac{e^{it(x-x_1)}-e^{it(x-x_2)}}{2\pi it}\,dt\,d\mu(x)
=\int_{\mathbb R}F(T,x,x_1,x_2)\,d\mu(x).\qquad(2.4)$$
Now,
$$F(T,x,x_1,x_2)=\frac{1}{\pi}\int_0^T\frac{\sin(t(x-x_1))}{t}\,dt-\frac{1}{\pi}\int_0^T\frac{\sin(t(x-x_2))}{t}\,dt,$$
using the fact that $\sin(t(x-x_i))/t$ is even in $t$ and $\cos(t(x-x_i))/t$ is odd, so the cosine contributions over $[-T,T]$ vanish.

By (2.1) and (2.2),
$$|F(T,x,x_1,x_2)|\le\frac{2}{\pi}\int_0^\pi\frac{\sin t}{t}\,dt$$
and
$$\lim_{T\to\infty}F(T,x,x_1,x_2)=
\begin{cases}
-\frac12-(-\frac12)=0,&\text{if }x<x_1\\[2pt]
0-(-\frac12)=\frac12,&\text{if }x=x_1\\[2pt]
\frac12-(-\frac12)=1,&\text{if }x_1<x<x_2\\[2pt]
\frac12-0=\frac12,&\text{if }x=x_2\\[2pt]
\frac12-\frac12=0,&\text{if }x>x_2.
\end{cases}$$
Therefore, by the dominated convergence theorem, the right hand side of (2.4) converges to
$$\int_{(-\infty,x_1)}0\,d\mu+\int_{\{x_1\}}\frac12\,d\mu+\int_{(x_1,x_2)}1\,d\mu+\int_{\{x_2\}}\frac12\,d\mu+\int_{(x_2,\infty)}0\,d\mu
=\mu(x_1,x_2)+\frac12\mu\{x_1\}+\frac12\mu\{x_2\},$$
proving the theorem. $\Box$
Corollary 2.1. If two probability measures have the same characteristic function, then they are equal.

This follows from the following

Lemma 2.2. Suppose two probability measures $\mu_1$ and $\mu_2$ agree on all intervals with endpoints in a given dense set. Then they agree on all of $\mathcal B(\mathbb R)$.

This follows from our construction (see also Chung, page 28).

Proof of Corollary 2.1. The atoms of both measures are countable, so the union of their atoms is also countable. By Theorem 2.1 the two measures agree on all intervals whose endpoints lie outside this countable union, and hence we may apply the lemma. $\Box$

Corollary 2.2. Suppose $X$ is a random variable with distribution function $F$ and characteristic function satisfying $\int_{\mathbb R}|\varphi_X(t)|\,dt<\infty$. Then $F$ is continuously differentiable and
$$F'(x)=\frac{1}{2\pi}\int_{\mathbb R}e^{-itx}\varphi_X(t)\,dt.$$

Proof. Let $x_1=x-h$, $x_2=x$, $h>0$. By Theorem 2.1,
$$\mu(x_1,x_2)+\frac12\mu\{x_1\}+\frac12\mu\{x_2\}
=\frac{1}{2\pi}\int_{\mathbb R}\Big(\frac{e^{-it(x-h)}-e^{-itx}}{it}\Big)\varphi_X(t)\,dt,$$
the integral now converging absolutely. Since
$$\Big|\frac{e^{-it(x-h)}-e^{-itx}}{it}\Big|=\Big|\int_{x-h}^xe^{-ity}\,dy\Big|\le h,$$
we see that
$$\lim_{h\to0}\Big(\mu(x_1,x_2)+\frac12\mu\{x_1\}+\frac12\mu\{x_2\}\Big)
\le\lim_{h\to0}\frac{h}{2\pi}\int_{\mathbb R}|\varphi_X(t)|\,dt=0.$$
Hence $\mu\{x\}=0$ for every $x\in\mathbb R$, proving the continuity of $F$. Now,
$$\frac{F(x+h)-F(x)}{h}=\frac1h\mu(x,x+h]
=\frac{1}{2\pi}\int_{\mathbb R}\Big(\frac{e^{-itx}-e^{-it(x+h)}}{hit}\Big)\varphi_X(t)\,dt.$$
Let $h\to0$ (using the dominated convergence theorem) to arrive at
$$F'(x)=\frac{1}{2\pi}\int_{\mathbb R}e^{-itx}\varphi_X(t)\,dt.$$
Note that the continuity of $F'$ follows from this, the continuity of the exponential, and the dominated convergence theorem. Writing
$$F(x)=\int_{-\infty}^xF'(t)\,dt=\int_{-\infty}^xf(t)\,dt,$$
we see that $F$ has a density
$$f(x)=\frac{1}{2\pi}\int_{\mathbb R}e^{-itx}\varphi_X(t)\,dt,$$
and hence also
$$\varphi(t)=\int_{\mathbb R}e^{itx}f(x)\,dx.$$
3 Weak convergence and characteristic functions.

Theorem 3.1. Let $\mu_n$ be a sequence of probability measures with characteristic functions $\varphi_n$.

(i) If $\mu_n$ converges weakly to a probability measure $\mu$ with characteristic function $\varphi$, then $\varphi_n(t)\to\varphi(t)$ for all $t\in\mathbb R$.

(ii) If $\varphi_n(t)\to\varphi(t)$ for all $t\in\mathbb R$, where $\varphi$ is a function continuous at $0$, then the sequence of measures $\{\mu_n\}$ is tight and converges weakly to a measure $\mu$, and $\varphi$ is the characteristic function of $\mu$. In particular, if $\varphi_n(t)$ converges to a characteristic function, then $\mu_n\Rightarrow\mu$.

Example 3.1. Let $\mu_n\sim N(0,n)$. Then $\varphi_n(t)=e^{-nt^2/2}$. (By scaling, if $X\sim N(\mu,\sigma^2)$ then $\varphi_X(t)=e^{it\mu-\sigma^2t^2/2}$.) Clearly $\varphi_n(t)\to0$ for all $t\ne0$, while $\varphi_n(0)=1$ for all $n$. Thus $\varphi_n(t)$ converges for every $t$, but the limit is not continuous at $0$. Also, with
$$\mu_n(-\infty,x]=\frac{1}{\sqrt{2\pi n}}\int_{-\infty}^xe^{-t^2/2n}\,dt,$$
a simple change of variables ($r=t/\sqrt n$) gives
$$\mu_n(-\infty,x]=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x/\sqrt n}e^{-r^2/2}\,dr\to\frac12,$$
and hence there is no weak convergence to a probability measure.
Proof of (i). This is the easy part. Note that $g(x)=e^{itx}$ is bounded and continuous. Since $\mu_n\Rightarrow\mu$, we get $E(g(X_n))\to E(g(X))$ (applying Theorem 1.2 to the real and imaginary parts), and this gives
$$\varphi_n(t)\to\varphi(t)$$
for every $t\in\mathbb R$.

For the proof of (ii) we need the following lemma.

Lemma 3.1 (Estimate of $\mu$ in terms of $\varphi$). For all $A>0$ we have
$$\mu[-2A,2A]\ge A\Big|\int_{-A^{-1}}^{A^{-1}}\varphi(t)\,dt\Big|-1.\qquad(3.1)$$
This, of course, can also be written as
$$1-\mu[-2A,2A]\le2-A\Big|\int_{-A^{-1}}^{A^{-1}}\varphi(t)\,dt\Big|,\qquad(3.2)$$
or
$$P(|X|>2A)\le2-A\Big|\int_{-A^{-1}}^{A^{-1}}\varphi(t)\,dt\Big|.\qquad(3.3)$$

Proof of (ii). Let $\varepsilon>0$. We have
$$\Big|\frac{1}{2\delta}\int_{-\delta}^{\delta}\varphi(t)\,dt\Big|
\le\Big|\frac{1}{2\delta}\int_{-\delta}^{\delta}\varphi_n(t)\,dt\Big|
+\frac{1}{2\delta}\int_{-\delta}^{\delta}|\varphi_n(t)-\varphi(t)|\,dt.$$
Since $\varphi_n(t)\to\varphi(t)$ for all $t$, we have for each fixed $\delta>0$ (by the dominated convergence theorem)
$$\lim_{n\to\infty}\frac{1}{2\delta}\int_{-\delta}^{\delta}|\varphi_n(t)-\varphi(t)|\,dt=0.$$
Since $\varphi$ is continuous at $0$ and $\varphi(0)=\lim_n\varphi_n(0)=1$, we have $\lim_{\delta\to0}\frac{1}{2\delta}\int_{-\delta}^{\delta}\varphi(t)\,dt=\varphi(0)=1$. Thus for all $\varepsilon>0$ there exist a $\delta=\delta(\varepsilon)>0$ and an $n_0=n_0(\varepsilon)$ such that for all $n\ge n_0$,
$$1-\frac\varepsilon2<\Big|\frac{1}{2\delta}\int_{-\delta}^{\delta}\varphi_n(t)\,dt\Big|+\frac\varepsilon2,$$
or
$$\frac1\delta\Big|\int_{-\delta}^{\delta}\varphi_n(t)\,dt\Big|>2(1-\varepsilon).$$
Applying the lemma with $A=\delta^{-1}$ gives
$$\mu_n[-2\delta^{-1},2\delta^{-1}]\ge\frac1\delta\Big|\int_{-\delta}^{\delta}\varphi_n(t)\,dt\Big|-1>2(1-\varepsilon)-1=1-2\varepsilon,$$
for all $n\ge n_0$. Thus the sequence $\{\mu_n\}$ is tight. Let $\mu_{n_k}\Rightarrow\mu$. Then $\mu$ is a probability measure. Let $\psi$ be the characteristic function of $\mu$. Since $\mu_{n_k}\Rightarrow\mu$, the first part implies that $\varphi_{n_k}(t)\to\psi(t)$ for all $t$. Therefore $\psi(t)=\varphi(t)$; hence $\varphi$ is a characteristic function, and any weakly convergent subsequence must converge to a measure whose characteristic function is $\varphi$. This completes the proof. $\Box$
Proof of Lemma 3.1. For any $T>0$,
$$\int_{-T}^{T}(1-e^{itx})\,dt=2T-\int_{-T}^{T}(\cos tx+i\sin tx)\,dt=2T-\frac{2\sin(Tx)}{x}.$$
Therefore,
$$\frac1T\int_{\mathbb R}\int_{-T}^{T}(1-e^{itx})\,dt\,d\mu(x)=2-\int_{\mathbb R}\frac{2\sin(Tx)}{Tx}\,d\mu(x),$$
or
$$2-\frac1T\int_{-T}^{T}\varphi(t)\,dt=2-2\int_{\mathbb R}\frac{\sin(Tx)}{Tx}\,d\mu(x).$$
That is, for all $T>0$,
$$\frac{1}{2T}\int_{-T}^{T}\varphi(t)\,dt=\int_{\mathbb R}\frac{\sin(Tx)}{Tx}\,d\mu(x).$$
Now, for any $|x|>2A$,
$$\Big|\frac{\sin(Tx)}{Tx}\Big|\le\frac{1}{|Tx|}\le\frac{1}{2TA},$$
and also, clearly,
$$\Big|\frac{\sin(Tx)}{Tx}\Big|\le1,\quad\text{for all }x.$$
Thus for any $A>0$ and any $T>0$,
$$\Big|\int_{\mathbb R}\frac{\sin(Tx)}{Tx}\,d\mu(x)\Big|
\le\Big|\int_{-2A}^{2A}\frac{\sin(Tx)}{Tx}\,d\mu(x)\Big|+\Big|\int_{|x|>2A}\frac{\sin(Tx)}{Tx}\,d\mu(x)\Big|$$
$$\le\mu[-2A,2A]+\frac{1}{2TA}\big(1-\mu[-2A,2A]\big)
=\Big(1-\frac{1}{2TA}\Big)\mu[-2A,2A]+\frac{1}{2TA}.$$
Now take $T=A^{-1}$ to conclude that
$$\frac A2\Big|\int_{-A^{-1}}^{A^{-1}}\varphi(t)\,dt\Big|\le\frac12\mu[-2A,2A]+\frac12,$$
which completes the proof. $\Box$

Corollary. $\mu\{x:|x|>2/T\}\le\frac1T\int_{-T}^{T}(1-\varphi(t))\,dt$ or, in terms of the random variable,
$$P(|X|>2/T)\le\frac1T\int_{-T}^{T}(1-\varphi(t))\,dt,$$
or
$$P(|X|>T)\le\frac T2\int_{-2/T}^{2/T}(1-\varphi(t))\,dt.$$
4 Moments and Characteristic Functions.

Theorem 4.1. Suppose $X$ is a random variable with $E|X|^n<\infty$ for some positive integer $n$. Then its characteristic function $\varphi$ has bounded continuous derivatives of every order less than or equal to $n$, and
$$\varphi^{(k)}(t)=\int_{\mathbb R}(ix)^ke^{itx}\,d\mu(x)$$
for any $k\le n$.

Proof. Let $\mu$ be the distribution measure of $X$ and suppose $n=1$. Since $\int_{\mathbb R}|x|\,d\mu(x)<\infty$ and $|(e^{ihx}-1)/h|\le|x|$, the dominated convergence theorem implies that
$$\varphi'(t)=\lim_{h\to0}\frac{\varphi(t+h)-\varphi(t)}{h}
=\lim_{h\to0}\int_{\mathbb R}\Big(\frac{e^{i(t+h)x}-e^{itx}}{h}\Big)d\mu
=\int_{\mathbb R}(ix)e^{itx}\,d\mu(x).$$
We now continue by induction to complete the proof. $\Box$

Corollary 4.1. Suppose $E|X|^n<\infty$, $n$ an integer. Then the characteristic function of $X$ has the following Taylor expansion in a neighborhood of $t=0$:
$$\varphi(t)=\sum_{m=0}^n\frac{i^mt^mE(X^m)}{m!}+o(t^n).$$
We recall here that $g(t)=o(t^m)$ as $t\to0$ means $g(t)/t^m\to0$ as $t\to0$.

Proof. By calculus, if $\varphi$ has $n$ continuous derivatives at $0$, then
$$\varphi(t)=\sum_{m=0}^n\frac{\varphi^{(m)}(0)}{m!}t^m+o(t^n).$$
In the present case, $\varphi^{(m)}(0)=i^mE(X^m)$ by the above theorem. $\Box$
Theorem 4.2. For any random variable $X$ and any $n\ge1$,
$$\Big|Ee^{itX}-\sum_{m=0}^n\frac{E(itX)^m}{m!}\Big|
\le E\Big|e^{itX}-\sum_{m=0}^n\frac{(itX)^m}{m!}\Big|
\le E\Big[\min\Big(\frac{|tX|^{n+1}}{(n+1)!},\frac{2|tX|^n}{n!}\Big)\Big].$$
This follows directly from

Lemma 4.2. For any real $x$ and any $n\ge1$,
$$\Big|e^{ix}-\sum_{m=0}^n\frac{(ix)^m}{m!}\Big|\le\min\Big(\frac{|x|^{n+1}}{(n+1)!},\frac{2|x|^n}{n!}\Big).$$
We note that this is just the Taylor expansion for $e^{ix}$ with some information on the error.

Proof. For all $n\ge0$, integration by parts gives
$$\int_0^x(x-s)^ne^{is}\,ds=\frac{x^{n+1}}{n+1}+\frac{i}{n+1}\int_0^x(x-s)^{n+1}e^{is}\,ds.$$
For $n=0$ this is the same as
$$\frac1i(e^{ix}-1)=\int_0^xe^{is}\,ds=x+i\int_0^x(x-s)e^{is}\,ds,$$
or
$$e^{ix}=1+ix+i^2\int_0^x(x-s)e^{is}\,ds.$$
For $n=1$,
$$e^{ix}=1+ix+\frac{i^2x^2}{2}+\frac{i^3}{2}\int_0^x(x-s)^2e^{is}\,ds,$$
and continuing we get, for any $n$,
$$e^{ix}-\sum_{m=0}^n\frac{(ix)^m}{m!}=\frac{i^{n+1}}{n!}\int_0^x(x-s)^ne^{is}\,ds.$$
So we need to estimate the right hand side. First,
$$\Big|\frac{i^{n+1}}{n!}\int_0^x(x-s)^ne^{is}\,ds\Big|\le\frac{1}{n!}\Big|\int_0^x(x-s)^n\,ds\Big|=\frac{|x|^{n+1}}{(n+1)!}.$$
This is good for $|x|$ small. Next, by the identity above,
$$\frac in\int_0^x(x-s)^ne^{is}\,ds=\int_0^x(x-s)^{n-1}e^{is}\,ds-\frac{x^n}{n}.$$
Since
$$\frac{x^n}{n}=\int_0^x(x-s)^{n-1}\,ds,$$
we get
$$\frac in\int_0^x(x-s)^ne^{is}\,ds=\int_0^x(x-s)^{n-1}(e^{is}-1)\,ds,$$
or
$$\frac{i^{n+1}}{n!}\int_0^x(x-s)^ne^{is}\,ds=\frac{i^n}{(n-1)!}\int_0^x(x-s)^{n-1}(e^{is}-1)\,ds.$$
This gives
$$\Big|\frac{i^{n+1}}{n!}\int_0^x(x-s)^ne^{is}\,ds\Big|
\le\frac{2}{(n-1)!}\Big|\int_0^x(x-s)^{n-1}\,ds\Big|=\frac{2}{n!}|x|^n,$$
and this completes the proof. $\Box$
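The bound in Lemma 4.2 can be checked numerically; the sample points $x$ and orders $n$ below are arbitrary illustrative choices:

```python
import cmath
import math

def taylor_err(x, n):
    """|e^{ix} - sum_{m<=n} (ix)^m / m!| for real x."""
    partial = sum((1j * x) ** m / math.factorial(m) for m in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

ok = all(
    taylor_err(x, n)
    <= min(abs(x) ** (n + 1) / math.factorial(n + 1),
           2 * abs(x) ** n / math.factorial(n)) + 1e-9  # tiny float slack
    for x in (-7.5, -0.3, 0.1, 2.0, 10.0)
    for n in range(1, 6)
)
print(ok)  # True
```

Note how the first term of the minimum wins for small $|x|$ and the second for large $|x|$, exactly as the proof suggests.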
Corollary 1. If $EX=\mu$ and $E|X|^2=\sigma^2<\infty$, then
$$\varphi(t)=1+it\mu-\frac{t^2\sigma^2}{2}+o(t^2),$$
as $t\to0$.

Proof. Applying Theorem 4.2 with $n=2$ gives
$$\Big|\varphi(t)-\Big(1+it\mu-\frac{t^2\sigma^2}{2}\Big)\Big|
\le t^2\,E\Big[\min\Big(\frac{|t||X|^3}{3!},\frac{2|X|^2}{2!}\Big)\Big],$$
and the expectation goes to zero as $t\to0$ by the dominated convergence theorem. $\Box$
5. The Central Limit Theorem.

We shall first look at the i.i.d. case.

Theorem 5.1. Let $X_i$ be i.i.d. with $EX_i=\mu$ and $\operatorname{var}(X_i)=\sigma^2<\infty$, and set $S_n=X_1+\dots+X_n$. Then
$$\frac{S_n-n\mu}{\sigma\sqrt n}\Rightarrow N(0,1).$$
Equivalently, for any real number $x$,
$$P\Big(\frac{S_n-n\mu}{\sigma\sqrt n}\le x\Big)\to\frac{1}{\sqrt{2\pi}}\int_{-\infty}^xe^{-y^2/2}\,dy.$$

Proof. By looking at $X_i'=X_i-\mu$, we may assume $\mu=0$. By the above,
$$\varphi_{X_1}(t)=1-\frac{t^2\sigma^2}{2}+g(t)$$
with $\frac{g(t)}{t^2}\to0$ as $t\to0$. Since the $X_i$ are i.i.d.,
$$\varphi_{S_n}(t)=\Big(1-\frac{t^2\sigma^2}{2}+g(t)\Big)^n,$$
so
$$\varphi_{\frac{S_n}{\sigma\sqrt n}}(t)=\varphi_{S_n}\Big(\frac{t}{\sigma\sqrt n}\Big)
=\Big(1-\frac{t^2}{2n}+g\Big(\frac{t}{\sigma\sqrt n}\Big)\Big)^n.$$
Since $\frac{g(t)}{t^2}\to0$ as $t\to0$, we have (for fixed $t$) that
$$\frac{g\big(\frac{t}{\sigma\sqrt n}\big)}{(1/\sqrt n)^2}\to0,\quad\text{as }n\to\infty,$$
which can be written as
$$n\,g\Big(\frac{t}{\sigma\sqrt n}\Big)\to0,\quad\text{as }n\to\infty.$$
Next, set $C_n=-\frac{t^2}{2}+n\,g\big(\frac{t}{\sigma\sqrt n}\big)$ and $C=-t^2/2$. Apply Lemma 5.1 below to get
$$\varphi_{\frac{S_n}{\sigma\sqrt n}}(t)=\Big(1-\frac{t^2}{2n}+g\Big(\frac{t}{\sigma\sqrt n}\Big)\Big)^n\to e^{-t^2/2}$$
and complete the proof. $\Box$
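A Monte Carlo sketch of Theorem 5.1, with Uniform$(-1,1)$ summands as an illustrative choice ($\mu=0$, $\sigma^2=1/3$); the trial counts and seed are arbitrary:

```python
import math
import random

random.seed(11)
trials, n = 4000, 300
sigma = math.sqrt(1 / 3)   # sd of Uniform(-1, 1); mean is 0
count = 0
for _ in range(trials):
    s = sum(random.uniform(-1, 1) for _ in range(n))
    if s / (sigma * math.sqrt(n)) <= 1.0:  # standardized S_n
        count += 1
frac = count / trials
print(frac)   # close to Phi(1) ~ 0.8413
```

The empirical frequency of $\{S_n/(\sigma\sqrt n)\le1\}$ matches the standard normal distribution function at $1$, up to Monte Carlo error.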
Lemma 5.1. If $C_n$ are complex numbers with $C_n\to C\in\mathbb C$, then
$$\Big(1+\frac{C_n}{n}\Big)^n\to e^{C}.$$

Proof. First we claim that if $z_1,z_2,\dots,z_n$ and $w_1,\dots,w_n$ are complex numbers with $|z_j|\le\theta$ and $|w_j|\le\theta$ for all $j$, then
$$\Big|\prod_{m=1}^nz_m-\prod_{m=1}^nw_m\Big|\le\theta^{n-1}\sum_{m=1}^n|z_m-w_m|.\qquad(5.1)$$
For $n=1$ the result is clearly true, with equality in fact. Assume it for $n-1$. Then
$$\Big|\prod_{m=1}^nz_m-\prod_{m=1}^nw_m\Big|
\le\Big|z_n\prod_{m=1}^{n-1}z_m-z_n\prod_{m=1}^{n-1}w_m\Big|
+\Big|z_n\prod_{m=1}^{n-1}w_m-w_n\prod_{m=1}^{n-1}w_m\Big|$$
$$\le\theta\cdot\theta^{n-2}\sum_{m=1}^{n-1}|z_m-w_m|+\theta^{n-1}|z_n-w_n|
=\theta^{n-1}\sum_{m=1}^n|z_m-w_m|.$$
Next, if $b\in\mathbb C$ and $|b|\le1$, then
$$|e^b-(1+b)|\le|b|^2.\qquad(5.2)$$
For this, write $e^b=1+b+\frac{b^2}{2}+\frac{b^3}{3!}+\dots$. Then
$$|e^b-(1+b)|\le\frac{|b|^2}{2}\Big(1+\frac{2|b|}{3!}+\frac{2|b|^2}{4!}+\dots\Big)
\le\frac{|b|^2}{2}\Big(1+\frac12+\frac{1}{2^2}+\frac{1}{2^3}+\dots\Big)=|b|^2,$$
which establishes (5.2).

With both (5.1) and (5.2) established, let $\varepsilon>0$ and choose $\gamma>|C|$. Take $n$ large enough so that $|C_n|<\gamma$, $\gamma^2e^{2\gamma}/n<\varepsilon$, and $\big|\frac{C_n}{n}\big|\le1$. Set $z_i=1+\frac{C_n}{n}$ and $w_i=e^{C_n/n}$ for all $i=1,2,\dots,n$. Then
$$|z_i|=\Big|1+\frac{C_n}{n}\Big|\le1+\frac\gamma n\le e^{\gamma/n}\quad\text{and}\quad|w_i|\le e^{\gamma/n},$$
so for both $z_i$ and $w_i$ we have the bound $\theta=e^{\gamma/n}$. By (5.1),
$$\Big|\Big(1+\frac{C_n}{n}\Big)^n-e^{C_n}\Big|
\le e^{\frac\gamma n(n-1)}\sum_{m=1}^n\Big|e^{C_n/n}-\Big(1+\frac{C_n}{n}\Big)\Big|.$$
Setting $b=C_n/n$ and using (5.2), we see that this quantity is dominated by
$$e^{\frac\gamma n(n-1)}\,n\Big|\frac{C_n}{n}\Big|^2\le\frac{\gamma^2e^{\gamma}}{n}\le\frac{\gamma^2e^{2\gamma}}{n}<\varepsilon.$$
Since $e^{C_n}\to e^C$, this proves the lemma. $\Box$
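The lemma is easy to test numerically for a concrete complex sequence $C_n\to C$; the values below are arbitrary illustrative choices:

```python
import cmath

C = complex(-0.5, 0.3)          # an arbitrary complex limit
errs = []
for n in (10, 100, 10_000):
    Cn = C + 1.0 / n            # a sequence C_n -> C
    errs.append(abs((1 + Cn / n) ** n - cmath.exp(C)))
print(errs)                     # the errors shrink as n grows
```

This is precisely the mode of convergence used in the proof of Theorem 5.1, where $C_n=-t^2/2+n\,g(t/\sigma\sqrt n)$.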
Example 5.1. Let $X_i$ be i.i.d. Bernoulli, taking the values $1$ and $0$ with probability $1/2$ each, and let
$$S_n=X_1+\dots+X_n=\text{total number of heads after }n\text{ tosses}.$$
Then
$$EX_i=1/2,\qquad\operatorname{var}(X_i)=EX_i^2-(EX_i)^2=\frac12-\frac14=\frac14,$$
and hence
$$\frac{S_n-\frac n2}{\sqrt{n/4}}\Rightarrow\chi=N(0,1).$$
From a table of the normal distribution we find that
$$P(\chi>2)\approx1-0.9772=0.0228.$$
By symmetry,
$$P(|\chi|<2)\approx1-2(0.0228)=0.9544.$$
Hence for $n$ large we should have
$$0.95\approx P\Big(\Big|\frac{S_n-\frac n2}{\sqrt{n/4}}\Big|<2\Big)
=P\Big(-\sqrt n\le S_n-\frac n2<\sqrt n\Big)
=P\Big(\frac n2-\sqrt n<S_n\le\frac n2+\sqrt n\Big).$$
If $n=250{,}000$, then
$$\frac n2-\sqrt n=125{,}000-500\qquad\text{and}\qquad\frac n2+\sqrt n=125{,}000+500.$$
That is, with probability approximately $0.95$, after $250{,}000$ tosses you will get between $124{,}500$ and $125{,}500$ heads.
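The interval in this example is a direct computation:

```python
import math

n = 250_000
half_width = 2 * math.sqrt(n / 4)   # two standard deviations of S_n
lo, hi = n / 2 - half_width, n / 2 + half_width
print(lo, hi)  # 124500.0 125500.0
```

Two standard deviations of $S_n$ amount to only $500$ heads out of a quarter million tosses, which is the concentration the CLT quantifies.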
Example 5.2. A roulette wheel has 38 slots: 18 red, 18 black, and two slots, 0 and 00, that are painted green. A player betting \$1 on red wins \$1 if the ball falls on a red slot and loses \$1 otherwise. Let $X_1,\dots,X_n$ be i.i.d. with $X_i=\pm1$, $P(X_i=1)=\frac{18}{38}$ and $P(X_i=-1)=\frac{20}{38}$. Then $S_n=X_1+\dots+X_n$ is the total fortune of the player after $n$ games. Suppose we want to know $P(S_n\ge0)$ after a large number of tries. Since
$$E(X_i)=\frac{18}{38}-\frac{20}{38}=-\frac{2}{38}=-\frac{1}{19},\qquad
\operatorname{var}(X_i)=EX_i^2-(EX_i)^2=1-\Big(\frac1{19}\Big)^2\approx0.9972,$$
we have
$$P(S_n\ge0)=P\Big(\frac{S_n-n\mu}{\sigma\sqrt n}\ge\frac{-n\mu}{\sigma\sqrt n}\Big).$$
Take $n$ such that
$$\frac{-n\mu}{\sigma\sqrt n}=\frac{\sqrt n\,(1/19)}{\sqrt{0.9972}}=2.$$
This gives $\sqrt n\approx2(19)\sqrt{0.9972}\approx38$, or $n\approx1444$. Hence
$$P(S_{1444}\ge0)\approx P(\chi\ge2)=1-0.9772=0.0228.$$
Also,
$$E(S_{1444})=-\frac{1444}{19}=-76.$$
Thus, after $n=1444$ games the casino would have won \$76 of your hard earned dollars on average, but there is a probability of about $0.0228$ that you will be ahead. So, you decide if you want to play!
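The arithmetic in this example can be reproduced directly (the value $n=1444$ is from the text; everything else is computed):

```python
import math

p = 18 / 38                        # chance of winning one game
mu = p - (1 - p)                   # E X_i = -1/19
var = 1 - mu * mu                  # E X_i^2 - (E X_i)^2
n = 1444
mean_Sn = n * mu                   # expected fortune after n games: -76
z = -mean_Sn / math.sqrt(n * var)  # standardized threshold for {S_n >= 0}
print(mean_Sn, z)
```

The standardized threshold lands almost exactly at $2$, which is why the table value $1-\Phi(2)\approx0.0228$ applies.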
Lemma 5.2. Let $C_{n,m}$ be nonnegative numbers with the property that $\max_{1\le m\le n}C_{n,m}\to0$ and $\sum_{m=1}^nC_{n,m}\to\lambda$. Then
$$\prod_{m=1}^n(1-C_{n,m})\to e^{-\lambda}.$$

Proof. Recall that
$$\lim_{a\to0}\frac{\log\big(\frac{1}{1-a}\big)}{a}=1.$$
Therefore, given $\varepsilon>0$ there exists $\delta>0$ such that $0<a<\delta$ implies
$$(1-\varepsilon)a\le\log\Big(\frac{1}{1-a}\Big)\le(1+\varepsilon)a.$$
If $n$ is large enough, $C_{n,m}<\delta$ for all $m\le n$, and
$$(1-\varepsilon)C_{n,m}\le\log\Big(\frac{1}{1-C_{n,m}}\Big)\le(1+\varepsilon)C_{n,m}.$$
Thus
$$\sum_{m=1}^n\log\Big(\frac{1}{1-C_{n,m}}\Big)\to\lambda,$$
and this is the same as
$$\sum_{m=1}^n\log(1-C_{n,m})\to-\lambda,$$
or
$$\log\Big(\prod_{m=1}^n(1-C_{n,m})\Big)\to-\lambda.$$
This implies the result. $\Box$
Theorem 5.2 (The Lindeberg–Feller Theorem). For each $n$, let $X_{n,m}$, $1\le m\le n$, be independent random variables with $EX_{n,m}=0$. Suppose

(i) $\displaystyle\sum_{m=1}^nEX_{n,m}^2\to\sigma^2\in(0,\infty)$, and

(ii) for all $\varepsilon>0$, $\displaystyle\lim_{n\to\infty}\sum_{m=1}^nE(|X_{n,m}|^2;|X_{n,m}|>\varepsilon)=0$.

Then $S_n=X_{n,1}+X_{n,2}+\dots+X_{n,n}\Rightarrow N(0,\sigma^2)$.

Example 5.3. Let $Y_1,Y_2,\dots$ be i.i.d. with $EY_i=0$ and $E(Y_i^2)=\sigma^2$, and let $X_{n,m}=Y_m/n^{1/2}$. Then $X_{n,1}+X_{n,2}+\dots+X_{n,n}=\frac{S_n}{\sqrt n}$. Clearly,
$$\sum_{m=1}^n\frac{E(Y_m^2)}{n}=\frac{\sigma^2}{n}\sum_{m=1}^n1=\sigma^2.$$
Also, for all $\varepsilon>0$,
$$\sum_{m=1}^nE(|X_{n,m}|^2;|X_{n,m}|>\varepsilon)
=nE\Big(\frac{|Y_1|^2}{n};\frac{|Y_1|}{n^{1/2}}>\varepsilon\Big)
=E(|Y_1|^2;|Y_1|>\varepsilon n^{1/2}),$$
and this goes to $0$ as $n\to\infty$ since $E|Y_1|^2<\infty$.
Proof. Let $\varphi_{n,m}(t)=E(e^{itX_{n,m}})$ and $\sigma_{n,m}^2=E(X_{n,m}^2)$. It is enough to show that
$$\prod_{m=1}^n\varphi_{n,m}(t)\to e^{-t^2\sigma^2/2}.$$
Let $\varepsilon>0$ and set $z_{n,m}=\varphi_{n,m}(t)$, $w_{n,m}=1-\frac{t^2\sigma_{n,m}^2}{2}$. By Theorem 4.2,
$$|z_{n,m}-w_{n,m}|\le E\Big[\min\Big(\frac{|tX_{n,m}|^3}{3!},\frac{2|tX_{n,m}|^2}{2!}\Big)\Big]$$
$$\le E\Big(\frac{|tX_{n,m}|^3}{3!};|X_{n,m}|\le\varepsilon\Big)
+E\big(|tX_{n,m}|^2;|X_{n,m}|>\varepsilon\big)$$
$$\le\frac{\varepsilon|t|^3}{6}E|X_{n,m}|^2+t^2E(|X_{n,m}|^2;|X_{n,m}|>\varepsilon).$$
Summing from $1$ to $n$ and letting $n\to\infty$ gives, using (i) and (ii),
$$\limsup_{n\to\infty}\sum_{m=1}^n|z_{n,m}-w_{n,m}|\le\frac{\varepsilon|t|^3\sigma^2}{6}.$$
Let $\varepsilon\to0$ to conclude that
$$\lim_{n\to\infty}\sum_{m=1}^n|z_{n,m}-w_{n,m}|=0.$$
Hence, with $\theta=1$, (5.1) gives
$$\Big|\prod_{m=1}^n\varphi_{n,m}(t)-\prod_{m=1}^n\Big(1-\frac{t^2\sigma_{n,m}^2}{2}\Big)\Big|\to0,$$
as $n\to\infty$. Now,
$$\sigma_{n,m}^2\le\varepsilon^2+E(|X_{n,m}|^2;|X_{n,m}|>\varepsilon),$$
and therefore
$$\max_{1\le m\le n}\sigma_{n,m}^2\le\varepsilon^2+\sum_{m=1}^nE(|X_{n,m}|^2;|X_{n,m}|>\varepsilon).$$
The second term goes to $0$ as $n\to\infty$; that is, $\max_{1\le m\le n}\sigma_{n,m}^2\to0$. Set
$$C_{n,m}=\frac{t^2\sigma_{n,m}^2}{2}.$$
Then
$$\sum_{m=1}^nC_{n,m}\to\frac{t^2\sigma^2}{2},$$
and Lemma 5.2 shows that
$$\prod_{m=1}^n\Big(1-\frac{t^2\sigma_{n,m}^2}{2}\Big)\to e^{-\frac{t^2\sigma^2}{2}},$$
completing the proof of the theorem. $\Box$
We shall now return to the Kolmogorov three series theorem and prove the necessity of the condition. This was not done when we first stated the result earlier. For the sake of completeness we state it in full again.

The Kolmogorov Three Series Theorem. Let $X_1, X_2, \ldots$ be independent random variables. Let $A > 0$ and $Y_m = X_m 1_{(|X_m| \leq A)}$. Then $\sum_{n=1}^{\infty} X_n$ converges a.s. if and only if the following three hold:

(i) $\displaystyle\sum_{n=1}^{\infty} P(|X_n| > A) < \infty$,

(ii) $\displaystyle\sum_{n=1}^{\infty} E Y_n$ converges, and

(iii) $\displaystyle\sum_{n=1}^{\infty} \mathrm{var}(Y_n) < \infty$.

Proof. We have shown that if (i), (ii), (iii) are true then $\sum_{n=1}^{\infty} X_n$ converges a.s. We now show that if $\sum_{n=1}^{\infty} X_n$ converges then (i)–(iii) hold. We begin by proving (i). Suppose this is false. That is, suppose
\[
\sum_{n=1}^{\infty} P(|X_n| > A) = \infty.
\]
Then the Borel–Cantelli lemma implies that
\[
P(|X_n| > A \ \text{i.o.}) > 0.
\]
Thus, $\lim_{n \to \infty} \sum_{m=1}^{n} X_m$ cannot exist. Hence if the series converges we must have (i).

Next, suppose (i) holds but $\sum_{n=1}^{\infty} \mathrm{var}(Y_n) = \infty$. Let
\[
C_n = \sum_{m=1}^{n} \mathrm{var}(Y_m) \quad \text{and} \quad X_{n,m} = \frac{Y_m - E Y_m}{C_n^{1/2}}.
\]
Then
\[
E X_{n,m} = 0 \quad \text{and} \quad \sum_{m=1}^{n} E X_{n,m}^2 = 1.
\]
Let $\varepsilon > 0$ and choose $n$ so large that $\frac{2A}{C_n^{1/2}} < \varepsilon$. Then
\[
\sum_{m=1}^{n} E(|X_{n,m}|^2; |X_{n,m}| > \varepsilon)
\leq \sum_{m=1}^{n} E\Big( |X_{n,m}|^2; |X_{n,m}| > \frac{2A}{C_n^{1/2}} \Big)
\leq \sum_{m=1}^{n} E\Big( |X_{n,m}|^2; \frac{2A}{C_n^{1/2}} < \frac{|Y_m| + E|Y_m|}{C_n^{1/2}} \Big).
\]
But
\[
\frac{|Y_m| + E|Y_m|}{C_n^{1/2}} \leq \frac{2A}{C_n^{1/2}}.
\]
So the above sum is zero. Let
\[
S_n = X_{n,1} + X_{n,2} + \ldots + X_{n,n} = \frac{1}{C_n^{1/2}} \sum_{m=1}^{n} (Y_m - E Y_m).
\]
By Theorem 5.2, $S_n \Rightarrow N(0, 1)$.

Now, if $\lim_{n \to \infty} \sum_{m=1}^{n} X_m$ exists, then $\lim_{n \to \infty} \sum_{m=1}^{n} Y_m$ exists also. (This follows from (i).) Let
\[
T_n = \frac{1}{C_n^{1/2}} \sum_{m=1}^{n} Y_m
\]
and observe that $T_n \Rightarrow 0$, since $C_n \to \infty$. Therefore $(S_n - T_n) \Rightarrow \chi$, where $\chi \sim N(0, 1)$. (This follows from the fact that $\lim_{n \to \infty} E(g(S_n - T_n)) = \lim_{n \to \infty} E(g(S_n)) = E(g(\chi))$ for bounded continuous $g$.) But
\[
S_n - T_n = -\frac{1}{C_n^{1/2}} \sum_{m=1}^{n} E(Y_m),
\]
which is nonrandom. This gives a contradiction and shows that (i) and (iii) hold.

Now, $\sum_{n=1}^{\infty} \mathrm{var}(Y_n) < \infty$ implies $\sum_{m=1}^{\infty} (Y_m - E Y_m)$ converges, by the corollary to the Kolmogorov maximal inequality. Thus if $\sum_{m=1}^{n} X_m$ converges so does $\sum Y_m$, and hence also $\sum E Y_m$.
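The sufficiency direction can be illustrated numerically. In this sketch (an illustration, not from the notes) we take $X_n = Y_n/n$ with $Y_n = \pm 1$ fair coin flips and $A = 1$, check the three series, and watch a simulated path of partial sums settle.

```python
import math
import random

N = 100_000
# (i)  |X_n| = 1/n <= 1 = A, so every term P(|X_n| > A) is 0.
series_i = 0.0
# (ii) E(Y_n) = 0 for every n, so the series of truncated means is 0.
series_ii = 0.0
# (iii) sum var(X_n) = sum 1/n^2 -> pi^2/6 < infinity.
series_iii = sum(1 / n ** 2 for n in range(1, N + 1))

# One simulated path of partial sums of sum Y_n / n.
random.seed(1)
partial, path = 0.0, []
for n in range(1, N + 1):
    partial += random.choice((-1, 1)) / n
    path.append(partial)
# Oscillation over the second half of the path; small if the series settles.
tail_osc = max(path[N // 2:]) - min(path[N // 2:])
```

All three series are finite, so the theorem predicts a.s. convergence, and indeed the simulated path barely moves over its second half.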
6. The Polya distribution.

We begin with some discussion on the Polya distribution. Consider the density function given by
\[
f(x) = (1 - |x|) 1_{x \in (-1,1)} = (1 - |x|)_+.
\]
Its characteristic function is given by
\[
\varphi(t) = \frac{2(1 - \cos t)}{t^2},
\]
and therefore, by the inversion formula, for all $y \in \mathbb{R}$,
\[
(1 - |y|)_+ = \frac{2}{2\pi} \int_{\mathbb{R}} e^{-ity}\, \frac{(1 - \cos t)}{t^2}\, dt
= \frac{1}{\pi} \int_{\mathbb{R}} \Big( \frac{1 - \cos t}{t^2} \Big) e^{-ity}\, dt.
\]
Replacing $y$ by $-y$, this gives
\[
(1 - |y|)_+ = \frac{1}{\pi} \int_{\mathbb{R}} \frac{(1 - \cos t)}{t^2}\, e^{ity}\, dt.
\]
So, if $f_1(x) = \frac{1 - \cos x}{\pi x^2}$, which has $\int_{\mathbb{R}} f_1(x)\, dx = 1$, and we take $X \sim F$ where $F$ has density $f_1$, we see that $(1 - |t|)_+$ is its characteristic function. This is called the Polya distribution. More generally, if $f_a(x) = \frac{1 - \cos ax}{\pi a x^2}$, then we get the characteristic function $\varphi_a(t) = \big( 1 - \big| \frac{t}{a} \big| \big)_+$, just by changing variables. The following fact will be useful below. If $F_1, \ldots, F_n$ have characteristic functions $\varphi_1, \ldots, \varphi_n$, respectively, and $\lambda_i \geq 0$ with $\sum \lambda_i = 1$, then the characteristic function of $\sum_{i=1}^{n} \lambda_i F_i$ is $\sum_{i=1}^{n} \lambda_i \varphi_i$.
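The inversion computation above can be checked by direct numerical integration. The sketch below (grid sizes are arbitrary choices, not from the notes) integrates $e^{ity} f_1(t)$ and compares with the triangle $(1-|y|)_+$.

```python
import math

# Polya density f_1(t) = (1 - cos t) / (pi t^2), with f_1(0) = 1/(2 pi).
def f1(t):
    return 0.5 / math.pi if t == 0.0 else (1 - math.cos(t)) / (math.pi * t * t)

def char_fn(y, T=5000.0, steps=500_000):
    # phi(y) = int_R e^{ity} f_1(t) dt; the integrand is even in t and the
    # imaginary part cancels, so integrate 2*cos(ty)*f_1(t) over [0, T].
    h = T / steps
    total = 0.5 * f1(0.0)          # trapezoid half-weight at t = 0
    for k in range(1, steps + 1):
        t = k * h
        total += math.cos(t * y) * f1(t)
    return 2 * h * total

phi_half = char_fn(0.5)            # triangle predicts 1 - 0.5 = 0.5
phi_two = char_fn(2.0)             # triangle predicts 0
```

The truncation error is bounded by $\int_T^\infty 2/(\pi t^2)\,dt = 2/(\pi T)$, so both values land within a couple of thousandths of the triangle function.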
Theorem 6.1 (The Polya Criterion). Let $\varphi(t)$ be a real and nonnegative function with $\varphi(0) = 1$, $\varphi(t) = \varphi(-t)$, decreasing and convex on $(0, \infty)$, with $\lim_{t \downarrow 0} \varphi(t) = 1$ and $\lim_{t \to \infty} \varphi(t) = 0$. Then there is a probability measure $\nu$ on $(0, \infty)$ so that
\[
\varphi(t) = \int_{0}^{\infty} \Big( 1 - \Big| \frac{t}{s} \Big| \Big)_+ d\nu(s),
\]
and $\varphi(t)$ is a characteristic function.

Example 6.1. $\varphi(t) = e^{-|t|^{\alpha}}$ for any $0 < \alpha \leq 2$. If $\alpha = 2$, we have the normal density. If $\alpha = 1$, we have the Cauchy density. Let us show here that $\exp(-|t|^{\alpha})$ is a characteristic function for any $0 < \alpha \leq 1$. With a more delicate argument, one can do the case $1 < \alpha < 2$. We only need to verify that the function is convex on $(0, \infty)$. Differentiating twice, this reduces to proving that
\[
\alpha^2 t^{2\alpha - 2} - \alpha(\alpha - 1) t^{\alpha - 2} > 0,
\]
which is true if $\alpha t^{\alpha} - (\alpha - 1) > 0$, which is the same as $\alpha t^{\alpha} + (1 - \alpha) > 0$, and this holds since $0 < \alpha \leq 1$.
7. Rates of Convergence; Berry–Esseen Estimates.

Theorem 7.1. Let $X_i$ be i.i.d., $E|X_i|^2 = \sigma^2$, $E X_i = 0$, and $E|X_i|^3 = \rho < \infty$. If $F_n$ is the distribution of $\frac{S_n}{\sigma\sqrt{n}}$ and $\Phi(x)$ is the normal distribution, we have
\[
\sup_{x \in \mathbb{R}} |F_n(x) - \Phi(x)| \leq \frac{c\,\rho}{\sigma^3 \sqrt{n}},
\]
where $c$ is an absolute constant. In fact, we may take $c = 3$.

More is actually true:
\[
F_n(x) = \Phi(x) + \frac{H_1(x)}{\sqrt{n}} + \frac{H_2(x)}{n} + \ldots + \frac{H_3(x)}{n^{3/2}} + \ldots,
\]
where the $H_i(x)$ are explicit functions involving Hermite polynomials. We shall not prove this, however.
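For a concrete check of Theorem 7.1, the distribution of $S_n/\sqrt{n}$ can be computed exactly when $X_i = \pm 1$ fair coin flips ($\sigma = 1$, $\rho = E|X_i|^3 = 1$). The sketch below (an illustration, not from the notes) compares it with $\Phi$ at every atom and against the bound $3\rho/(\sigma^3\sqrt{n})$.

```python
import math

def Phi(x):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100
# S_n = 2k - n with probability C(n, k) / 2^n for k = 0, ..., n.
pmf = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
cdf = []
acc = 0.0
for p in pmf:
    acc += p
    cdf.append(acc)

# sup_x |F_n(x) - Phi(x)| is attained at the atoms; check both one-sided limits.
sup_err = 0.0
for k in range(n + 1):
    x = (2 * k - n) / math.sqrt(n)
    sup_err = max(sup_err,
                  abs(cdf[k] - Phi(x)),               # value at the atom
                  abs(cdf[k] - pmf[k] - Phi(x)))      # left limit at the atom
berry_esseen_bound = 3.0 / math.sqrt(n)
```

For $n = 100$ the true discrepancy is roughly $1/\sqrt{2\pi n} \approx 0.04$, an order of magnitude inside the bound $0.3$, so the constant $3$ is far from sharp in this case.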
Lemma 7.1. Let $F$ be a distribution function and $G$ a real-valued function with the following conditions:

(i) $\lim_{x \to -\infty} G(x) = 0$, $\lim_{x \to +\infty} G(x) = 1$,

(ii) $G$ has bounded derivative with $\sup_{x \in \mathbb{R}} |G'(x)| \leq M$.

Set $A = \frac{1}{2M} \sup_{x \in \mathbb{R}} |F(x) - G(x)|$. Then there is a number $a$ such that for all $T > 0$,
\[
2MTA \Big\{ 3 \int_{0}^{TA} \frac{1 - \cos x}{x^2}\, dx - \pi \Big\}
\leq \Big| \int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2} \{ F(x + a) - G(x + a) \}\, dx \Big|.
\]

Proof. Observe that $A < \infty$, since $G$ is bounded, and we may obviously assume that $A > 0$. Since $F(t) - G(t) \to 0$ as $t \to \pm\infty$, there is a sequence $x_n \to b \in \mathbb{R}$ such that
\[
F(x_n) - G(x_n) \to 2MA \quad \text{or} \quad -2MA.
\]
Since $F(b) \geq F(b-)$, it follows that either
\[
F(b) - G(b) = 2MA \quad \text{or} \quad F(b-) - G(b) = -2MA.
\]
Assume $F(b) - G(b) = 2MA$, the other case being similar. Put
\[
a = b + A > b, \quad \text{so that} \quad A = a - b.
\]
If $|x| < A$ we have, for some $\xi$ between $b$ and $x + a$,
\[
G(x + a) = G(b) + (x + A) G'(\xi) \leq G(b) + (x + A) M,
\]
since $|G'(\xi)| \leq M$ and $x + A > 0$. Since $F$ is nondecreasing and $x + a > b$,
\[
F(x + a) - G(x + a) \geq F(b) - [G(b) + (x + A) M] = 2MA - (x + A) M = M(A - x)
\]
for all $x \in [-A, A]$. Therefore for all $T > 0$, since the odd part integrates to zero over the symmetric interval,
\[
\int_{-A}^{A} \frac{1 - \cos Tx}{x^2} \{ F(x + a) - G(x + a) \}\, dx
\geq M \int_{-A}^{A} \frac{1 - \cos Tx}{x^2} (A - x)\, dx
= 2MA \int_{0}^{A} \frac{1 - \cos Tx}{x^2}\, dx.
\]
Also,
\[
\Big| \Big( \int_{-\infty}^{-A} + \int_{A}^{\infty} \Big) \frac{1 - \cos Tx}{x^2} \{ F(x + a) - G(x + a) \}\, dx \Big|
\leq 2MA \Big( \int_{-\infty}^{-A} + \int_{A}^{\infty} \Big) \frac{1 - \cos Tx}{x^2}\, dx
= 4MA \int_{A}^{\infty} \frac{1 - \cos Tx}{x^2}\, dx.
\]
Adding these two estimates and using $\int_{0}^{\infty} \frac{1 - \cos Tx}{x^2}\, dx = \frac{\pi T}{2}$ gives
\[
\int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2} \{ F(x + a) - G(x + a) \}\, dx
\geq 2MA \Big\{ 3 \int_{0}^{A} \frac{1 - \cos Tx}{x^2}\, dx - 2 \int_{0}^{\infty} \frac{1 - \cos Tx}{x^2}\, dx \Big\}
\]
\[
= 2MA \Big\{ 3 \int_{0}^{A} \frac{1 - \cos Tx}{x^2}\, dx - \pi T \Big\}
= 2MTA \Big\{ 3 \int_{0}^{TA} \frac{1 - \cos x}{x^2}\, dx - \pi \Big\},
\]
the last step by the change of variables $x \mapsto x/T$, proving the result.
Lemma 7.2. Suppose in addition that $G$ is of bounded variation on $(-\infty, \infty)$ (for example, if $G$ has a density) and that
\[
\int_{-\infty}^{\infty} |F(x) - G(x)|\, dx < \infty.
\]
Let $f(t)$ and $g(t)$ be the characteristic functions of $F$ and $G$, respectively. Then
\[
A \leq \frac{1}{2\pi M} \int_{-T}^{T} \Big| \frac{f(t) - g(t)}{t} \Big|\, dt + \frac{12}{\pi T}
\]
for any $T > 0$; equivalently, $\sup_x |F(x) - G(x)| \leq \frac{1}{\pi} \int_{-T}^{T} \big| \frac{f(t) - g(t)}{t} \big|\, dt + \frac{24M}{\pi T}$.

Proof. Since $F$ and $G$ are of bounded variation, integration by parts gives
\[
f(t) - g(t) = -it \int_{-\infty}^{\infty} \{ F(x) - G(x) \} e^{itx}\, dx.
\]
Therefore,
\[
\frac{f(t) - g(t)}{-it}\, e^{-ita}
= \int_{-\infty}^{\infty} \{ F(x) - G(x) \} e^{-ita + itx}\, dx
= \int_{-\infty}^{\infty} \{ F(x + a) - G(x + a) \} e^{itx}\, dx.
\]
It follows from our assumptions that both sides are uniformly bounded in $t$. Multiplying by $(T - |t|)$ and integrating gives
\[
\int_{-T}^{T} \Big( \frac{f(t) - g(t)}{-it} \Big) e^{-ita} (T - |t|)\, dt
= \int_{-T}^{T} \int_{-\infty}^{\infty} \{ F(x + a) - G(x + a) \} e^{itx} (T - |t|)\, dx\, dt
\]
\[
= \int_{-\infty}^{\infty} \{ F(x + a) - G(x + a) \} \int_{-T}^{T} e^{itx} (T - |t|)\, dt\, dx =: I.
\]
Writing
\[
\frac{1 - \cos Tx}{x^2} = \frac{1}{2} \int_{-T}^{T} (T - |t|) e^{itx}\, dt,
\]
we see that
\[
I = 2 \int_{-\infty}^{\infty} \{ F(x + a) - G(x + a) \} \Big( \frac{1 - \cos Tx}{x^2} \Big)\, dx,
\]
which gives
\[
\Big| \int_{-\infty}^{\infty} \{ F(x + a) - G(x + a) \} \Big( \frac{1 - \cos Tx}{x^2} \Big)\, dx \Big|
\leq \frac{1}{2} \Big| \int_{-T}^{T} \frac{f(t) - g(t)}{-it}\, e^{-ita} (T - |t|)\, dt \Big|
\leq \frac{T}{2} \int_{-T}^{T} \Big| \frac{f(t) - g(t)}{t} \Big|\, dt.
\]
Therefore, by Lemma 7.1,
\[
2MTA \Big\{ 3 \int_{0}^{TA} \frac{1 - \cos x}{x^2}\, dx - \pi \Big\}
\leq \frac{T}{2} \int_{-T}^{T} \Big| \frac{f(t) - g(t)}{t} \Big|\, dt.
\]
However, since $1 - \cos x \leq 2$,
\[
3 \int_{0}^{TA} \frac{1 - \cos x}{x^2}\, dx
= 3 \int_{0}^{\infty} \frac{1 - \cos x}{x^2}\, dx - 3 \int_{TA}^{\infty} \frac{1 - \cos x}{x^2}\, dx
\geq \frac{3\pi}{2} - 6 \int_{TA}^{\infty} \frac{dx}{x^2}
= \frac{3\pi}{2} - \frac{6}{TA}.
\]
Hence,
\[
2MTA \Big\{ \frac{\pi}{2} - \frac{6}{TA} \Big\} = \pi M T A - 12 M
\leq \frac{T}{2} \int_{-T}^{T} \Big| \frac{f(t) - g(t)}{t} \Big|\, dt,
\]
or equivalently,
\[
A \leq \frac{1}{2\pi M} \int_{-T}^{T} \Big| \frac{f(t) - g(t)}{t} \Big|\, dt + \frac{12}{\pi T},
\]
which proves the lemma.
Proof of Theorem 7.1. Without loss of generality, $\sigma^2 = 1$. Then $\rho \geq 1$, since by Jensen's inequality $\rho = E|X_i|^3 \geq (E|X_i|^2)^{3/2} = 1$. We will apply the above lemmas with
\[
F(x) = F_n(x) = P\Big( \frac{S_n}{\sqrt{n}} \leq x \Big)
\quad \text{and} \quad
G(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.
\]
Clearly they satisfy the hypotheses of Lemma 7.1, and in fact we may take $M = 2/5$ since
\[
\sup_{x \in \mathbb{R}} |\Phi'(x)| = \frac{1}{\sqrt{2\pi}} = 0.39894\ldots < 2/5.
\]
Also, $\Phi$ is clearly of bounded variation. We need to show that
\[
\int_{\mathbb{R}} |F_n(x) - \Phi(x)|\, dx < \infty.
\]
To see this last fact, note that clearly
\[
\int_{-1}^{1} |F_n(x) - \Phi(x)|\, dx < \infty,
\]
and we need to verify that
\[
\int_{-\infty}^{-1} |F_n(x) - \Phi(x)|\, dx + \int_{1}^{\infty} |F_n(x) - \Phi(x)|\, dx < \infty. \quad (7.1)
\]
For $x > 0$, $P(|X| > x) \leq \frac{1}{x^2} E|X|^2$, by Chebyshev's inequality. Therefore,
\[
1 - F_n(x) = P\Big( \frac{S_n}{\sqrt{n}} > x \Big)
\leq \frac{1}{x^2} E\Big| \frac{S_n}{\sqrt{n}} \Big|^2 = \frac{1}{x^2},
\]
and if $N$ denotes a normal random variable with mean zero and variance $1$, we also have
\[
1 - \Phi(x) = P(N > x) \leq \frac{1}{x^2} E|N|^2 = \frac{1}{x^2}.
\]
In particular, for $x > 0$, $\max\{ 1 - F_n(x),\, 1 - \Phi(x) \} \leq \frac{1}{x^2}$. If $x < 0$ then
\[
F_n(x) = P\Big( \frac{S_n}{\sqrt{n}} \leq x \Big)
\leq P\Big( \Big| \frac{S_n}{\sqrt{n}} \Big| > |x| \Big)
\leq \frac{1}{x^2} E\Big| \frac{S_n}{\sqrt{n}} \Big|^2 = \frac{1}{x^2}
\]
and
\[
\Phi(x) = P(N \leq x) \leq \frac{1}{x^2}.
\]
Once again, $\max\{ F_n(x),\, \Phi(x) \} \leq \frac{1}{x^2}$ for $x < 0$; hence for all $x \neq 0$ we have
\[
|F_n(x) - \Phi(x)| \leq \frac{1}{x^2}.
\]
Therefore (7.1) holds and we have verified the hypotheses of both lemmas. We obtain
\[
|F_n(x) - \Phi(x)|
\leq \frac{1}{\pi} \int_{-T}^{T} \frac{|\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|}{|t|}\, dt + \frac{24M}{\pi T}
\leq \frac{1}{\pi} \int_{-T}^{T} \frac{|\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|}{|t|}\, dt + \frac{48}{5\pi T}.
\]
Assume $n \geq 10$ and take $T = \frac{4\sqrt{n}}{3\rho}$. Then
\[
\frac{48}{5\pi T} = \frac{48 \cdot 3\rho}{5\pi \cdot 4\sqrt{n}} = \frac{36\,\rho}{5\pi\sqrt{n}}.
\]
Next we claim that
\[
\frac{1}{|t|}\, |\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|
\leq \frac{1}{T}\, e^{-t^2/4} \Big( \frac{2t^2}{9} + \frac{|t|^3}{18} \Big) \quad (7.2)
\]
for $-T \leq t \leq T$, $T = 4\sqrt{n}/(3\rho)$ and $n \geq 10$. If this were the case, then
\[
\pi T\, |F_n(x) - \Phi(x)|
\leq \int_{-T}^{T} e^{-t^2/4} \Big( \frac{2t^2}{9} + \frac{|t|^3}{18} \Big)\, dt + \frac{48}{5}
\leq \frac{2}{9} \int_{-\infty}^{\infty} e^{-t^2/4} t^2\, dt
+ \frac{1}{18} \int_{-\infty}^{\infty} e^{-t^2/4} |t|^3\, dt + 9.6
= I + II + 9.6.
\]
Now
\[
I = \frac{2}{9} \int_{-\infty}^{\infty} e^{-t^2/4} t^2\, dt = \frac{2}{9} \cdot 4\sqrt{\pi} = \frac{8\sqrt{\pi}}{9},
\]
and, substituting $u = t^2/4$,
\[
II = \frac{1}{18} \int_{-\infty}^{\infty} |t|^3 e^{-t^2/4}\, dt
= \frac{2}{18} \int_{0}^{\infty} t^3 e^{-t^2/4}\, dt
= \frac{2}{18} \cdot 8 = \frac{16}{18} = \frac{8}{9}.
\]
Therefore,
\[
\pi T\, |F_n(x) - \Phi(x)| \leq \frac{8\sqrt{\pi}}{9} + \frac{8}{9} + 9.6.
\]
This gives
\[
|F_n(x) - \Phi(x)|
\leq \frac{1}{\pi T} \Big( \frac{8}{9}(1 + \sqrt{\pi}) + 9.6 \Big)
= \frac{3\rho}{4\pi\sqrt{n}} \Big( \frac{8}{9}(1 + \sqrt{\pi}) + 9.6 \Big)
< \frac{3\rho}{\sqrt{n}}.
\]
For $n \leq 9$ the result is clear, since $\rho \geq 1$ makes the bound $3\rho/\sqrt{n} \geq 1$. It remains to prove (7.2). Recall that
\[
\Big| \varphi(t) - \sum_{m=0}^{k} \frac{E(itX)^m}{m!} \Big|
\leq E\Big( \min\Big\{ \frac{|tX|^{k+1}}{(k+1)!},\, \frac{2|tX|^k}{k!} \Big\} \Big).
\]
This gives
\[
\Big| \varphi(t) - 1 + \frac{t^2}{2} \Big| \leq \frac{|t|^3 \rho}{6},
\]
and hence
\[
|\varphi(t)| \leq 1 - \frac{t^2}{2} + \frac{|t|^3 \rho}{6}, \quad \text{for} \quad t^2 \leq 2.
\]
With $T = \frac{4\sqrt{n}}{3\rho}$, if $|t| \leq T$ then $\frac{|t|\rho}{\sqrt{n}} \leq \frac{4}{3}$ and $\frac{t^2}{n} \leq \frac{16}{9} < 2$. Thus
\[
\Big| \varphi\Big( \frac{t}{\sqrt{n}} \Big) \Big|
\leq 1 - \frac{t^2}{2n} + \frac{|t|^3 \rho}{6 n^{3/2}}
= 1 - \frac{t^2}{2n} + \frac{|t|\rho}{6\sqrt{n}} \cdot \frac{t^2}{n}
\leq 1 - \frac{t^2}{2n} + \frac{4}{18} \cdot \frac{t^2}{n}
= 1 - \frac{5 t^2}{18 n}
\leq e^{-\frac{5t^2}{18n}},
\]
given that $1 - x \leq e^{-x}$. Now, let $z = \varphi(t/\sqrt{n})$, $w = e^{-t^2/2n}$ and $\gamma = e^{-\frac{5t^2}{18n}}$; note $|z| \leq \gamma$ and $w \leq \gamma$. Then for $n \geq 10$, $\gamma^{n-1} \leq e^{-t^2/4}$, and the lemma above gives
\[
|z^n - w^n| \leq n\, \gamma^{n-1}\, |z - w|,
\]
which implies that
\[
|\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|
\leq n\, e^{-t^2/4} \Big| \varphi(t/\sqrt{n}) - e^{-t^2/2n} \Big|
\leq n\, e^{-t^2/4} \Big| \varphi(t/\sqrt{n}) - 1 + \frac{t^2}{2n} \Big|
+ n\, e^{-t^2/4} \Big| 1 - \frac{t^2}{2n} - e^{-t^2/2n} \Big|
\]
\[
\leq n\, e^{-t^2/4} \frac{|t|^3 \rho}{6 n^{3/2}} + n\, e^{-t^2/4} \frac{t^4}{2 \cdot 4 n^2},
\]
using the fact that $|e^{-x} - (1 - x)| \leq \frac{x^2}{2}$ for $0 < x < 1$. We get
\[
\frac{1}{|t|}\, |\varphi^n(t/\sqrt{n}) - e^{-t^2/2}|
\leq \frac{t^2 \rho\, e^{-t^2/4}}{6\sqrt{n}} + \frac{e^{-t^2/4}\, |t|^3}{8n}
= e^{-t^2/4} \Big( \frac{t^2 \rho}{6\sqrt{n}} + \frac{|t|^3}{8n} \Big)
\leq \frac{1}{T}\, e^{-t^2/4} \Big( \frac{2t^2}{9} + \frac{|t|^3}{18} \Big),
\]
using $\frac{\rho}{\sqrt{n}} = \frac{4}{3T}$ and $\frac{1}{n} = \frac{1}{\sqrt{n}} \cdot \frac{1}{\sqrt{n}} \leq \frac{4}{3T} \cdot \frac{1}{3}$, since $\rho \geq 1$ and $n \geq 10$. This completes the proof of (7.2) and the proof of the theorem.
Let us now take a look at the following question. Suppose $F$ has density $f$. Is it true that the density of $\frac{S_n}{\sqrt{n}}$ tends to the density of the normal? This is not always true, as shown in Feller, volume 2, page 489. However, it is true if we add some other conditions. We state the theorem without proof.

Theorem. Let $X_i$ be i.i.d., $E X_i = 0$ and $E X_i^2 = 1$. If the characteristic function $\varphi \in L^1$, then $\frac{S_n}{\sqrt{n}}$ has a density $f_n$ which converges uniformly to the standard normal density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
8. Limit Theorems in $\mathbb{R}^d$.

Recall that $\mathbb{R}^d = \{(x_1, \ldots, x_d) : x_i \in \mathbb{R}\}$. For any two vectors $x, y \in \mathbb{R}^d$ we will write $x \leq y$ if $x_i \leq y_i$ for all $i = 1, \ldots, d$, and write $x \to \infty$ if $x_i \to \infty$ for all $i$. Let $X = (X_1, \ldots, X_d)$ be a random vector and define its distribution function by $F(x) = P(X \leq x)$. $F$ has the following properties:

(i) If $x \leq y$ then $F(x) \leq F(y)$.

(ii) $\lim_{x \to \infty} F(x) = 1$, $\lim_{x_i \to -\infty} F(x) = 0$.

(iii) $F$ is right continuous. That is, $\lim_{y \downarrow x} F(y) = F(x)$.

The distribution measure is given by $\mu(A) = P(X \in A)$, for all $A \in \mathcal{B}(\mathbb{R}^d)$. However, unlike the situation on the real line, a function satisfying (i)–(iii) may not be the distribution function of a random vector. The reason is that we must also have, for example,
\[
P(X \in (a_1, b_1] \times (a_2, b_2])
= F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2) \geq 0;
\]
that is, the measure assigned to each rectangle must be nonnegative.

Example 8.1.
\[
F(x_1, x_2) =
\begin{cases}
1, & x_1 \geq 1,\ x_2 \geq 1 \\
2/3, & x_1 \geq 1,\ 0 \leq x_2 < 1 \\
2/3, & x_2 \geq 1,\ 0 \leq x_1 < 1 \\
0, & \text{else.}
\end{cases}
\]
If $0 < a_1, a_2 < 1 \leq b_1, b_2 < \infty$, then
\[
F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2) = 1 - 2/3 - 2/3 + 0 = -1/3.
\]
Hence the "measure" would have
\[
\mu(\{(0, 1)\}) = \mu(\{(1, 0)\}) = 2/3, \qquad \mu(\{(1, 1)\}) = -1/3,
\]
which is a signed measure, not a probability measure.

If $F$ is the distribution function of $(X_1, \ldots, X_d)$, then $F_i(x) = P(X_i \leq x)$, $x \in \mathbb{R}$, are called the marginal distributions of $F$. We also see that
\[
F_i(x) = \lim_{m \to \infty} F(m, \ldots, m, x_i, m, \ldots, m),
\]
with $x_i$ in the $i$-th coordinate. As on the real line, $F$ has a density if there is a nonnegative function $f$ with
\[
\int_{\mathbb{R}^d} f(y)\, dy = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f(y_1, y_2, \ldots, y_d)\, dy_1 \ldots dy_d = 1
\]
and
\[
F(x_1, x_2, \ldots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(y)\, dy_d \ldots dy_1.
\]
Definition 8.1. If $F_n$ and $F$ are distribution functions on $\mathbb{R}^d$, we say $F_n$ converges weakly to $F$, and write $F_n \Rightarrow F$, if $\lim_{n \to \infty} F_n(x) = F(x)$ for all points of continuity $x$ of $F$. As before, we also write $X_n \Rightarrow X$ and $\mu_n \Rightarrow \mu$.

As on the real line, recall that $\bar{A}$ is the closure of $A$, $A^o$ is its interior, and $\partial A = \bar{A} \setminus A^o$ is its boundary. The following two results are exactly as in the real case. We leave the proofs to the reader.

Theorem 8.1 (Skorohod). Suppose $X_n \Rightarrow X$. Then there exist a sequence of random vectors $Y_n$ and a random vector $Y$ with $Y_n \sim X_n$ and $Y \sim X$ such that $Y_n \to Y$ a.e.

Theorem 8.2. The following statements are equivalent to $X_n \Rightarrow X$.

(i) $E f(X_n) \to E(f(X))$ for all bounded continuous functions $f$.

(ii) For all closed sets $K$, $\overline{\lim}\, P(X_n \in K) \leq P(X \in K)$.

(iii) For all open sets $G$, $\underline{\lim}\, P(X_n \in G) \geq P(X \in G)$.

(iv) For all Borel sets $A$ with $P(X \in \partial A) = 0$,
\[
\lim_{n \to \infty} P(X_n \in A) = P(X \in A).
\]

(v) Let $f : \mathbb{R}^d \to \mathbb{R}$ be bounded and measurable, and let $D_f$ be the set of discontinuity points of $f$. If $P(X \in D_f) = 0$, then $E(f(X_n)) \to E(f(X))$.

Proof. That $X_n \Rightarrow X$ implies (i) is immediate from the Skorohod theorem. We prove (i) $\Rightarrow$ (ii). Let $d(x, K) = \inf\{|x - y| : y \in K\}$. Set
\[
\varphi_j(t) =
\begin{cases}
1, & t \leq 0 \\
1 - jt, & 0 \leq t \leq 1/j \\
0, & t \geq 1/j
\end{cases}
\]
and let $f_j(x) = \varphi_j(d(x, K))$. The functions $f_j$ are continuous, bounded by $1$, and $f_j(x) \downarrow I_K(x)$ as $j \to \infty$, since $K$ is closed. Therefore,
\[
\limsup_{n} \mu_n(K) \leq \lim_{n \to \infty} E(f_j(X_n)) = E(f_j(X)),
\]
and this last quantity decreases to $P(X \in K)$ as $j \to \infty$.

That (ii) $\Leftrightarrow$ (iii) follows by taking complements. To see that (iv) implies convergence in distribution, assume $F$ is continuous at $x = (x_1, \ldots, x_d)$ and set $A = (-\infty, x] = (-\infty, x_1] \times \ldots \times (-\infty, x_d]$. We have $\mu(\partial A) = 0$, so $F_n(x) = P(X_n \in A) \to P(X \in A) = F(x)$.

As in the real case, we say that a sequence of measures $\mu_n$ is tight if, given $\varepsilon > 0$, there exists $M_{\varepsilon} > 0$ such that
\[
\inf_{n} \mu_n([-M_{\varepsilon}, M_{\varepsilon}]^d) \geq 1 - \varepsilon.
\]
We remark here that Theorem 1.6 above holds also in the setting of $\mathbb{R}^d$. The characteristic function of the random vector $X = (X_1, \ldots, X_d)$ is defined as $\varphi(t) = E(e^{i t \cdot X})$, where $t \cdot X = t_1 X_1 + \ldots + t_d X_d$.

Theorem 8.3 (The inversion formula in $\mathbb{R}^d$). Let $A = [a_1, b_1] \times \ldots \times [a_d, b_d]$ with $\mu(\partial A) = 0$. Then
\[
\mu(A) = \lim_{T \to \infty} \frac{1}{(2\pi)^d} \int_{[-T,T]^d} \prod_{j=1}^{d} \psi_j(t_j)\, \varphi(t)\, dt,
\]
where
\[
\psi_j(s) = \Big( \frac{e^{-isa_j} - e^{-isb_j}}{is} \Big)
\]
for $s \in \mathbb{R}$.

Proof. Applying Fubini's theorem, and writing $e^{it \cdot x} = \prod_{j=1}^{d} e^{it_j x_j}$, we have
\[
\int_{[-T,T]^d} \prod_{j=1}^{d} \psi_j(t_j) \int_{\mathbb{R}^d} e^{it \cdot x}\, d\mu(x)\, dt
= \int_{\mathbb{R}^d} \int_{[-T,T]^d} \prod_{j=1}^{d} \psi_j(t_j)\, e^{it_j x_j}\, dt\, d\mu(x)
= \int_{\mathbb{R}^d} \prod_{j=1}^{d} \Big( \int_{-T}^{T} \psi_j(t_j)\, e^{it_j x_j}\, dt_j \Big)\, d\mu(x)
\]
\[
\longrightarrow \int_{\mathbb{R}^d} \prod_{j=1}^{d} \Big( 2\pi\, 1_{(a_j, b_j)}(x_j) + \pi\, 1_{\{a_j, b_j\}}(x_j) \Big)\, d\mu(x)
\]
as $T \to \infty$, by the one-dimensional computation. Since $\mu(\partial A) = 0$, dividing by $(2\pi)^d$ proves the result.
Theorem 8.4 (Continuity Theorem). Let $X_n$ and $X$ be random vectors with characteristic functions $\varphi_n$ and $\varphi$, respectively. Then $X_n \Rightarrow X$ if and only if $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}^d$.

Proof. As before, one direction is trivial. Let $f(x) = e^{it \cdot x}$. This is bounded and continuous, so $X_n \Rightarrow X$ implies $\varphi_n(t) = E(f(X_n)) \to \varphi(t)$.

For the other direction we need to show tightness. Fix $\theta \in \mathbb{R}^d$. Then for all $s \in \mathbb{R}$, $\varphi_n(s\theta) \to \varphi(s\theta)$. Let $\tilde{X}_n = \theta \cdot X_n$. Then $\varphi_{\tilde{X}_n}(s) = \varphi_{X_n}(s\theta) \to \varphi_X(s\theta) = \varphi_{\theta \cdot X}(s)$. Therefore the distributions of $\tilde{X}_n$ are tight, by what we did earlier in one dimension. In particular, taking $\theta = e_j$, the random variables $e_j \cdot X_n$ are tight for each coordinate $j$. Let $\varepsilon > 0$. There exists a positive constant $M_j$ such that
\[
\liminf_{n} P(e_j \cdot X_n \in [-M_j, M_j]) \geq 1 - \varepsilon.
\]
Now take $M = \max_{1 \leq j \leq d} M_j$. Then, by a union bound over the coordinates,
\[
P(X_n \in [-M, M]^d) \geq 1 - d\varepsilon,
\]
and the result follows.

Remark. As before, if $\varphi_n(t) \to \varphi(t)$ and $\varphi$ is continuous at $0$, then $\varphi(t)$ is the characteristic function of a random vector $X$ and $X_n \Rightarrow X$. Also, it follows from the above argument that if $\theta \cdot X_n \Rightarrow \theta \cdot X$ for all $\theta \in \mathbb{R}^d$, then $X_n \Rightarrow X$. This is often called the Cramér–Wold device.

Next let $X = (X_1, \ldots, X_d)$ with the $X_i$ independent, $X_i \sim N(0, 1)$. Then $X$ has density
\[
\frac{1}{(2\pi)^{d/2}}\, e^{-|x|^2/2}, \quad \text{where} \quad |x|^2 = \sum_{i=1}^{d} |x_i|^2.
\]
This is called the standard normal distribution in $\mathbb{R}^d$, and its characteristic function is
\[
\varphi_X(t) = E\Big( \prod_{j=1}^{d} e^{it_j X_j} \Big) = e^{-|t|^2/2}.
\]
159
Let A = (a
ij
) be a d d matrix. and set Y = AX where X is standard normal.
The covariance matrix of this new random vector is

ij
= E(Y
i
Y
j
)
= E
_
d

l=1
a
il
X
l

d

m=1
a
jm
X
m
_
=
d

l=1
d

m=1
a
il
a
jm
E(X
l
X
m
)
=
d

l=1
a
il
a
jl
.
Thus = (
ij
) = AA
T
. and the matrix is symmetric;
T
= . Also the quadratic
form of is positive semidenite. That is,

ij

ij
t
i
t
j
= t, t) = A
T
t, A
T
t)
= [A
T
t[
2
0.

Y
(t) = E(e
itAX
)
= E(e
iA
T
tX
)
= e

|A
T
t|
2
2
= e

P
ij

ij
t
i
t
j
.
So, the random vector Y = AX has a multivariate normal distribution with co-
variance matrix .
Conversely, let be a symmetric and nonnegative denite dd matrix. Then
there exists an orthogonal matrix O such that
O
T
O = D
160
where D is diagonal. Let D
0
=

D and A = OD
0
.
Then AA
T
= OD
0
(D
T
0
O
T
) = ODO
T
= . So, if we let Y = AX, X normal, then
Y is multivariate normal with covariance matrix . If is nonsingular, so is A
and Y has a density.
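A small Monte Carlo sketch of the identity $\Gamma = AA^T$ (the matrix $A$ and the sample size below are arbitrary illustrative choices, not from the notes):

```python
import random

# A fixed 2x2 matrix and its Gamma = A A^T, computed entrywise.
A = [[2.0, 0.0],
     [1.0, 3.0]]
gamma = [[sum(A[i][l] * A[j][l] for l in range(2)) for j in range(2)]
         for i in range(2)]                  # should be [[4, 2], [2, 10]]

# Sample Y = AX for X standard normal in R^2 and form the sample covariance.
random.seed(42)
N = 200_000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(N):
    x0, x1 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    y = (A[0][0] * x0 + A[0][1] * x1, A[1][0] * x0 + A[1][1] * x1)
    for i in range(2):
        for j in range(2):
            acc[i][j] += y[i] * y[j]
sample_cov = [[acc[i][j] / N for j in range(2)] for i in range(2)]
```

The sample covariance of $Y$ matches $AA^T$ entry by entry up to the usual $O(1/\sqrt{N})$ Monte Carlo noise.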
Theorem 8.5. Let $X_1, X_2, \ldots$ be i.i.d. random vectors with $E X_n = \mu$ and covariance matrix
\[
\Gamma_{ij} = E\big( (X_{1,i} - \mu_i)(X_{1,j} - \mu_j) \big).
\]
If $S_n = X_1 + \ldots + X_n$, then
\[
\frac{S_n - n\mu}{\sqrt{n}} \Rightarrow \chi,
\]
where $\chi$ is a multivariate normal with mean zero and covariance matrix $\Gamma = (\Gamma_{ij})$.

Proof. By setting $X_n' = X_n - \mu$ we may assume $\mu = 0$. Let $t \in \mathbb{R}^d$. Then $\tilde{X}_n = t \cdot X_n$ are i.i.d. random variables with $E(\tilde{X}_n) = 0$ and
\[
E|\tilde{X}_n|^2 = E\Big( \sum_{i=1}^{d} t_i (X_n)_i \Big)^2 = \sum_{ij} t_i t_j \Gamma_{ij}.
\]
So, with $\tilde{S}_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} (t \cdot X_j)$, the one-dimensional central limit theorem gives
\[
E(e^{i\tilde{S}_n}) \to e^{-\frac{1}{2} \sum_{ij} \Gamma_{ij} t_i t_j}.
\]
This is equivalent to
\[
\varphi_{S_n/\sqrt{n}}(t) = E(e^{it \cdot S_n/\sqrt{n}}) \to e^{-\frac{1}{2} \sum_{ij} \Gamma_{ij} t_i t_j},
\]
and the result follows from the continuity theorem.
Math/Stat 539. Ideas for some of the problems in nal homework assignment. Fall
1996.
#1b) Approximate the integral by
1
n
n

k=1
_
B
k
n

B
k1
n
_
and . . .
E
__
1
0
B
t
dt
_
2
= E
__
1
0
_
1
0
B
s
B
t
dsdt
_
= 2
_
1
0
_
1
s
E(B
s
B
t
)dtds
= 2
_
1
0
_
1
s
sdtds =
1
3
.
#2a) Use the estimate C
p
=
2
p
C
1
e
c/
2

p
(e
c/
2
2
p
)
from class and choose c/

p.
#2b) Use (a) and sum the series for the exponential.
#2c) Show (2) c() for some constant c. Use formula E((X)) =

_
0

()PX d
and apply good inequalities.
#3a) Use the exponential martingale and ...
#3b) Take b = 0 in #3a).
#4)(i) As in the proof of the reection property. Let
Y
1
s
() =
_
1, s < t and u < (t s) < v
0 else
and
Y
2
s
() =
_
1, s < t, 2a v < (t s) < 2a u
0 else.
Then E
x
(Y
1
s
) = E
x
(Y
2
s
) (why?) and with = (infs: B
s
= a) t, we apply the
strong Markov property to get
E
x
(Y
1

[T

)
why
= E
x
(Y
2

[T

)
179
and ... gives the result.
#4)(i) Let the interval (u, v) x to get
P
0
M
t
> a, B
t
= x = P
0
B
t
= 2a x =
1

2t
e

(2ax)
2
2t
and dierentiate with respect to a.
#5a) Follow Durrett, page 402, and apply the Markov property at the end.
#7)(i)
E(X
n+1
[T
n
) = e
S
n
(n+1)()
E(e

n+1
[T
n
)
= e
S
n
n()
(
n+1
independent of T
n
).
(ii) Show $\psi'(\theta) = \varphi'(\theta)/\varphi(\theta)$ and
$$\Big(\frac{\varphi'(\theta)}{\varphi(\theta)}\Big)' = \frac{\varphi''(\theta)}{\varphi(\theta)} - \Big(\frac{\varphi'(\theta)}{\varphi(\theta)}\Big)^2 = E(Y_\theta^2) - \big(E(Y_\theta)\big)^2 > 0,$$
where $Y_\theta$ has distribution $\dfrac{e^{\theta x}}{\varphi(\theta)}\,dF(x)$ ($F$ the distribution of $\xi_1$). (Why is this true?)
(iii)
$$\sqrt{X_n^\theta} = e^{\frac{\theta}{2} S_n - \frac{n}{2}\psi(\theta)} = X_n^{\theta/2}\, e^{n\{\psi(\theta/2) - \frac{1}{2}\psi(\theta)\}}.$$
Strict convexity, $\psi(0) = 0$, and ... imply that
$$E\sqrt{X_n^\theta} = e^{n\{\psi(\theta/2) - \frac{1}{2}\psi(\theta)\}} \to 0$$
as $n \to \infty$. This implies $X_n^\theta \to 0$ in probability.
Chapter 7
1) Conditional Expectation.
(a) The Radon–Nikodym Theorem.
Durrett p. 476
Signed measures: If $\mu_1$ and $\mu_2$ are two measures, particularly prob. measures, we could add them, i.e. $\mu = \mu_1 + \mu_2$ is a measure. But what about $\mu_1 - \mu_2$?
Definition 1.1. By a signed measure on a measurable space $(\Omega, \mathcal{F})$ we mean an extended real valued function $\mu$ defined on $\mathcal{F}$ such that
(i) $\mu$ assumes at most one of the values $+\infty$ or $-\infty$;
(ii) $\mu\Big(\bigcup_{j=1}^\infty E_j\Big) = \sum_{j=1}^\infty \mu(E_j)$ for disjoint $E_j$'s in $\mathcal{F}$. By (ii) we mean that the series is absolutely convergent if $\mu\big(\bigcup_{j=1}^\infty E_j\big)$ is finite and properly divergent if $\mu\big(\bigcup_{j=1}^\infty E_j\big)$ is $+\infty$ or $-\infty$.
Example. $f \in L^1[0,1]$; then (if $f \ge 0$, we get a measure)
$$\mu(E) = \int_E f\,dx.$$
(Positive sets): A set $A \in \mathcal{F}$ is a positive set if $\mu(E) \ge 0$ for every measurable subset $E \subset A$.
(Negative sets): A set $A \in \mathcal{F}$ is negative if for every measurable subset $E \subset A$, $\mu(E) \le 0$.
(Null): A set which is both positive and negative is a null set. Thus a set is null iff every measurable subset of it has measure zero.
Remark. Null sets are not the same as sets of measure zero.
Example. Take $\mu$ given above: a set on which $f$ takes both signs can have $\mu(E) = 0$ without being null.
Our goal now is to prove that the space $\Omega$ can be written as the disjoint union of a positive set and a negative set. This is called the Hahn Decomposition.
Lemma 1.1. (i) Every measurable subset of a positive set is positive.
(ii) If $A_1, A_2, \ldots$ are positive then $A = \bigcup_{i=1}^\infty A_i$ is positive.
Proof. (i): Trivial.
Proof of (ii): Let $A = \bigcup_{i=1}^\infty A_i$, $A_i$ positive. Let $E \subset A$ be measurable. Write
$$E = \bigcup_{j=1}^\infty E_j,\qquad E_i \cap E_j = \emptyset,\ i \neq j,\qquad E_j = E \cap A_j \cap A_{j-1}^c \cap \cdots \cap A_1^c \subset A_j. \tag{$*$}$$
Then $\mu(E_j) \ge 0$ and
$$\mu(E) = \sum_j \mu(E_j) \ge 0.$$
We show ($*$): Let $x \in E_j$. Then $x \in E$ and $x \in A_j$ but $x \notin A_{j-1}, \ldots, A_1$; hence $x \notin E_i$ for $i \neq j$. If $x \in E$, let $j$ = first $j$ such that $x \in A_j$. Then $x \in E_j$, done. (Such a $j$ exists because $E \subset A$.)
Lemma 1.2. Let $E$ be measurable with $0 < \mu(E) < \infty$. Then there is a measurable set $A \subset E$, $A$ positive, such that $0 < \mu(A)$.
Proof. If $E$ is positive we are done. If not, let $n_1$ = smallest positive integer such that there is an $E_1 \subset E$ with
$$\mu(E_1) < -1/n_1.$$
Now, consider $E \setminus E_1 \subset E$.
Again, if $E \setminus E_1$ is positive with $\mu(E \setminus E_1) > 0$ we are done. If not, let $n_2$ = smallest positive integer such that there is
$$E_2 \subset E \setminus E_1 \quad\text{with}\quad \mu(E_2) < -1/n_2.$$
Continue: let $n_k$ = smallest positive integer such that there is
$$E_k \subset E \setminus \bigcup_{j=1}^{k-1} E_j \quad\text{with}\quad \mu(E_k) < -\frac{1}{n_k}.$$
Let
$$A = E \setminus \bigcup_{k=1}^\infty E_k.$$
Claim: $A$ will do.
First: $\mu(A) > 0$. Why? $E = A \cup \bigcup_{k=1}^\infty E_k$, a disjoint union, so
$$\mu(E) = \mu(A) + \sum_{k=1}^\infty \mu(E_k) \le \mu(A),$$
since each $\mu(E_k)$ is negative; hence $\mu(A) \ge \mu(E) > 0$.
Also, since $0 < \mu(E) < \infty$, the series $\sum_{k=1}^\infty \mu(E_k)$ converges.
Problem 1: Prove that A is also positive.
The convergence is absolute, so
$$\sum_{k=1}^\infty \frac{1}{n_k} \le \sum_{k=1}^\infty \big(-\mu(E_k)\big) < \infty.$$
Suppose $A$ is not positive. Then $A$ has a subset $A_0$ with $\mu(A_0) < -\varepsilon$ for some $\varepsilon > 0$. Now, since $\sum 1/n_k < \infty$, $n_k \to \infty$, and this contradicts the minimality of the $n_k$.
Theorem 1.1 (Hahn Decomposition). Let $\mu$ be a signed measure on $(\Omega, \mathcal{F})$. There is a positive set $A$ and a negative set $B$ with $A \cap B = \emptyset$, $A \cup B = \Omega$.
Proof. Assume $\mu$ does not take the value $+\infty$. Let $\lambda = \sup\{\mu(A) : A \text{ positive}\}$. Since $\emptyset$ is positive, $\lambda \ge 0$. Let $A_n$ be positive sets such that
$$\lambda = \lim_{n\to\infty} \mu(A_n).$$
Set
$$A = \bigcup_{n=1}^\infty A_n.$$
$A$ is positive by Lemma 1.1. Also, $\lambda \ge \mu(A)$. Since
$$A \setminus A_n \subset A \Rightarrow \mu(A \setminus A_n) \ge 0$$
and
$$\mu(A) = \mu(A_n) + \mu(A \setminus A_n) \ge \mu(A_n),$$
we get $\mu(A) \ge \lambda$. Thus $\mu(A) = \lambda < \infty$, and $0 \le \mu(A)$.
Let $B = A^c$. Claim: $B$ is negative. Let $E \subset B$, $E$ positive; we show $\mu(E) = 0$. This will do it: for if some $E \subset B$ had $0 < \mu(E) < \infty$, then $E$ would have a positive subset of positive measure by Lemma 1.2, contradicting the claim. To show $\mu(E) = 0$, observe $E \cup A$ is positive, so
$$\lambda \ge \mu(E \cup A) = \mu(E) + \mu(A) = \mu(E) + \lambda \Rightarrow \mu(E) = 0.$$
Q.E.D.
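On a finite space a signed measure is just an assignment of (possibly negative) masses to points, and a Hahn decomposition can be read off from the signs. A minimal Python sketch (the point masses are invented for illustration):

```python
# Signed measure on a finite space: mu({w}) given pointwise.
mu = {"w1": 2.0, "w2": -1.5, "w3": 0.5, "w4": -0.25, "w5": 0.0}

# Hahn decomposition: A carries the nonnegative masses, B the negative ones.
A = {w for w, m in mu.items() if m >= 0}
B = {w for w, m in mu.items() if m < 0}

def measure(E):
    return sum(mu[w] for w in E)

# Every subset of A has nonnegative measure, every subset of B nonpositive.
from itertools import chain, combinations
def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

assert all(measure(E) >= 0 for E in subsets(A))
assert all(measure(E) <= 0 for E in subsets(B))

# Jordan-style decomposition: mu = mu_plus - mu_minus, mutually singular.
mu_plus = lambda E: measure(set(E) & A)
mu_minus = lambda E: -measure(set(E) & B)
print(mu_plus(mu), mu_minus(mu))  # 2.5 1.75
```

Note that the null point `w5` could equally well be placed in $B$, which is the phenomenon behind Problem 1.b below.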
Problem 1.b: Give an example to show that the Hahn decomposition is not unique.
Remark 1.1. The Hahn decomposition gives two measures $\mu^+$ and $\mu^-$ defined by
$$\mu^+(E) = \mu(A \cap E),\qquad \mu^-(E) = -\mu(B \cap E).$$
Notice that $\mu^+(B) = 0$ and $\mu^-(A) = 0$. Clearly $\mu(E) = \mu^+(E) - \mu^-(E)$.
Definition 1.2. Two measures $\mu_1$ and $\mu_2$ are mutually singular ($\mu_1 \perp \mu_2$) if there are two measurable subsets $A$ and $B$ with $A \cap B = \emptyset$, $A \cup B = \Omega$ and $\mu_1(A) = \mu_2(B) = 0$. Notice that $\mu^+ \perp \mu^-$.
Theorem 1.2 (Jordan Decomposition). Let $\mu$ be a signed measure. There are two mutually singular measures $\mu^+$ and $\mu^-$ such that $\mu = \mu^+ - \mu^-$. This decomposition is unique.
Example. $f \in L^1[a, b]$,
$$\mu(E) = \int_E f\,dx.$$
Then
$$\mu^+(E) = \int_E f^+\,dx,\qquad \mu^-(E) = \int_E f^-\,dx.$$
Definition 1.3. The measure $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, if $\mu(A) = 0$ implies $\nu(A) = 0$.
Example. Let $f \ge 0$ be measurable and set $\nu(A) = \int_A f\,d\mu$. Then $\nu \ll \mu$.
Theorem 1.3 (Radon–Nikodym Theorem). Let $\mu$ and $\nu$ be $\sigma$-finite measures on $(\Omega, \mathcal{F})$. Assume $\nu \ll \mu$. Then there is a nonnegative measurable function $f$ such that
$$\nu(E) = \int_E f\,d\mu.$$
The function $f$ is unique a.e. $[\mu]$. We call $f$ the Radon–Nikodym derivative of $\nu$ with respect to $\mu$ and write
$$f = \frac{d\nu}{d\mu}.$$
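On a finite (or countable) space the Radon–Nikodym derivative is simply the ratio of point masses wherever $\mu$ puts mass. A small Python sketch with exact rational arithmetic (the two measures are invented for illustration):

```python
from fractions import Fraction as F
from itertools import chain, combinations

# Two measures on a finite space; nu << mu (nu vanishes wherever mu does).
mu = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4), 4: F(0)}
nu = {1: F(1, 8), 2: F(1, 2), 3: F(3, 8), 4: F(0)}

# Radon-Nikodym derivative f = d(nu)/d(mu), defined mu-a.e.
# (its value on the mu-null point 4 is arbitrary; we pick 0)
f = {w: (nu[w] / mu[w] if mu[w] != 0 else F(0)) for w in mu}

# Check nu(E) equals the integral of f d(mu) over E, for every subset E.
points = list(mu)
for E in chain.from_iterable(combinations(points, r) for r in range(len(points) + 1)):
    assert sum(nu[w] for w in E) == sum(f[w] * mu[w] for w in E)

print(f)  # derivative values 1/4, 2, 3/2 on points 1, 2, 3
```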
Remark 1.2. The space needs to be $\sigma$-finite.
Example. $(\Omega, \mathcal{F}, \mu) = ([0,1], \text{Borel}, \mu = \text{counting measure})$. Then $m \ll \mu$, where $m$ is Lebesgue measure. If
$$m(E) = \int_E f\,d\mu,$$
then taking $E = \{x\}$ gives $f(x) = 0$ for all $x \in [0,1]$, hence $m \equiv 0$, a contradiction.
3. Lemmas.
Lemma 1.3. Suppose $\{B_\alpha\}_{\alpha \in D}$ is a collection of measurable sets indexed by a countable set $D$ of real numbers. Suppose $B_\alpha \subset B_\beta$ whenever $\alpha < \beta$. Then there is a measurable function $f$ such that $f(x) \le \alpha$ on $B_\alpha$ and $f(x) \ge \alpha$ on $B_\alpha^c$.
Proof. For $x \in \Omega$, set
$$f(x) = \text{first } \alpha \text{ such that } x \in B_\alpha = \inf\{\alpha \in D : x \in B_\alpha\},$$
with $\inf \emptyset = \infty$. If $x \notin B_\alpha$, then $x \notin B_\beta$ for any $\beta < \alpha$ and so $f(x) \ge \alpha$. If $x \in B_\alpha$, then $f(x) \le \alpha$. It remains to show that $f$ is measurable.
Claim: for every real $\alpha$,
$$\{x : f(x) < \alpha\} = \bigcup_{\beta < \alpha,\ \beta \in D} B_\beta.$$
If $f(x) < \alpha$, then $x \in B_\beta$ for some $\beta \in D$ with $\beta < \alpha$. Conversely, if $x \in B_\beta$ with $\beta < \alpha$, then $f(x) \le \beta < \alpha$. Q.E.D.
Lemma 1.4. Suppose $\{B_\alpha\}_{\alpha \in D}$ is as in Lemma 1.3 but this time $\alpha < \beta$ implies only $\mu(B_\alpha \setminus B_\beta) = 0$. Then there exists a measurable function $f$ on $\Omega$ such that $f(x) \le \alpha$ a.e. on $B_\alpha$ and $f(x) \ge \alpha$ a.e. on $B_\alpha^c$.
Lemma 1.5. Suppose $D$ is dense. Then the function in Lemma 1.3 is unique and the function in Lemma 1.4 is unique a.e.
Proof of Theorem 1.3. Assume first that $\mu$ and $\nu$ are finite. Let
$$\nu_\alpha = \nu - \alpha\mu,\qquad \alpha \in \mathbb{Q}.$$
Each $\nu_\alpha$ is a signed measure. Let $A_\alpha, B_\alpha$ be the Hahn decomposition of $\nu_\alpha$. Notice:
$$A_\alpha = \Omega,\quad B_\alpha = \emptyset,\quad \text{if } \alpha \le 0, \tag{1}$$
$$B_\beta \setminus B_\alpha = B_\beta \cap (\Omega \setminus B_\alpha) = B_\beta \cap A_\alpha. \tag{2}$$
Thus,
$$\nu_\beta(B_\beta \setminus B_\alpha) \le 0 \tag{1}$$
$$\nu_\alpha(B_\beta \setminus B_\alpha) \ge 0 \tag{2}$$
or
$$\nu(B_\beta \setminus B_\alpha) - \beta\,\mu(B_\beta \setminus B_\alpha) \le 0 \tag{1}$$
$$\nu(B_\beta \setminus B_\alpha) - \alpha\,\mu(B_\beta \setminus B_\alpha) \ge 0. \tag{2}$$
Thus,
$$\alpha\,\mu(B_\beta \setminus B_\alpha) \le \nu(B_\beta \setminus B_\alpha) \le \beta\,\mu(B_\beta \setminus B_\alpha).$$
Thus, if $\beta < \alpha$, we have
$$\mu(B_\beta \setminus B_\alpha) = 0,$$
so the $B_\alpha$ increase with $\alpha$ modulo $\mu$-null sets. By Lemma 1.4 there is a measurable $f$ such that, for every $\alpha \in \mathbb{Q}$, $f \ge \alpha$ a.e. on $A_\alpha$ and $f \le \alpha$ a.e. on $B_\alpha$. Since $B_0 = \emptyset$, $f \ge 0$ a.e.
Let $N$ be very large. For $E \in \mathcal{F}$ put
$$E_k = E \cap \big(B_{(k+1)/N} \setminus B_{k/N}\big),\quad k = 0, 1, 2, \ldots,$$
$$E_\infty = E \setminus \bigcup_{k=0}^\infty B_{k/N}.$$
Then $E_0, E_1, \ldots, E_\infty$ are disjoint and
$$E = \bigcup_{k=0}^\infty E_k \cup E_\infty.$$
So,
$$\nu(E) = \nu(E_\infty) + \sum_{k=0}^\infty \nu(E_k).$$
On
$$E_k \subset B_{(k+1)/N} \setminus B_{k/N} = B_{(k+1)/N} \cap A_{k/N},$$
we have
$$\frac{k}{N} \le f(x) \le \frac{k+1}{N} \quad\text{a.e.,}$$
and so,
$$\frac{k}{N}\,\mu(E_k) \le \int_{E_k} f\,d\mu \le \frac{k+1}{N}\,\mu(E_k). \tag{1}$$
Also
$$E_k \subset A_{k/N} \Rightarrow \frac{k}{N}\,\mu(E_k) \le \nu(E_k) \tag{2}$$
and
$$E_k \subset B_{(k+1)/N} \Rightarrow \nu(E_k) \le \frac{k+1}{N}\,\mu(E_k). \tag{3}$$
Thus:
$$\nu(E_k) - \frac{1}{N}\,\mu(E_k) \le \frac{k}{N}\,\mu(E_k) \le \int_{E_k} f\,d\mu \le \frac{k}{N}\,\mu(E_k) + \frac{1}{N}\,\mu(E_k) \le \nu(E_k) + \frac{1}{N}\,\mu(E_k).$$
On $E_\infty$, $f = \infty$ a.e. If $\mu(E_\infty) > 0$, then $\nu(E_\infty) = \infty$, since $(\nu - \alpha\mu)(E_\infty) \ge 0$ for all $\alpha$. If $\mu(E_\infty) = 0$, then $\nu(E_\infty) = 0$, as $\nu \ll \mu$. So, either way:
$$\nu(E_\infty) = \int_{E_\infty} f\,d\mu.$$
Adding over the pieces:
$$\nu(E) - \frac{1}{N}\,\mu(E) \le \int_E f\,d\mu \le \nu(E) + \frac{1}{N}\,\mu(E).$$
Since $N$ is arbitrary, we are done.
Uniqueness: If also $\nu(E) = \int_E g\,d\mu$ for all $E$, then for $E \subset A_\alpha$,
$$0 \le \nu_\alpha(E) = \nu(E) - \alpha\,\mu(E) = \int_E (g - \alpha)\,d\mu.$$
We have $g - \alpha \ge 0$ a.e. $[\mu]$ on $A_\alpha$, or $g \ge \alpha$ a.e. on $A_\alpha$. Similarly, $g \le \alpha$ a.e. on $B_\alpha$. By Lemma 1.5, $f = g$ a.e.
Suppose now $\mu$ and $\nu$ are $\sigma$-finite with $\nu \ll \mu$. Let $\Omega_i$ be such that $\Omega_i \cap \Omega_j = \emptyset$, $\bigcup_i \Omega_i = \Omega$ and $\mu(\Omega_i), \nu(\Omega_i) < \infty$. Put $\mu_i(E) = \mu(E \cap \Omega_i)$ and $\nu_i(E) = \nu(E \cap \Omega_i)$. Then $\nu_i \ll \mu_i$, so there are $f_i \ge 0$ such that
$$\nu_i(E) = \int_E f_i\,d\mu_i,$$
or
$$\nu(E \cap \Omega_i) = \int_{E \cap \Omega_i} f_i\,d\mu = \int_E f_i \mathbf{1}_{\Omega_i}\,d\mu.$$
Summing over $i$ gives the result.
Theorem 1.4 (The Lebesgue decomposition for measures). Let $(\Omega, \mathcal{F})$ be a measurable space and $\mu$ and $\nu$ $\sigma$-finite measures on $\mathcal{F}$. Then there are measures $\nu_0 \perp \mu$ and $\nu_1 \ll \mu$ such that $\nu = \nu_0 + \nu_1$. The measures $\nu_0$ and $\nu_1$ are unique.
(Compare: $f \in BV \Rightarrow f = h + g$, $h$ singular, $g$ absolutely continuous.)
Proof. Let $\lambda = \mu + \nu$; $\lambda$ is $\sigma$-finite and
$$\lambda(E) = 0 \Rightarrow \mu(E) = \nu(E) = 0.$$
By Radon–Nikodym,
$$\mu(E) = \int_E f\,d\lambda,\qquad \nu(E) = \int_E g\,d\lambda.$$
Let
$$A = \{f > 0\},\qquad B = \{f = 0\},$$
so $\Omega = A \cup B$, $A \cap B = \emptyset$, $\mu(B) = 0$.
Let $\nu_0(E) = \nu(E \cap B)$. Then $\nu_0(A) = 0$, so $\nu_0 \perp \mu$. Set
$$\nu_1(E) = \nu(E \cap A) = \int_{E \cap A} g\,d\lambda.$$
Clearly $\nu_1 + \nu_0 = \nu$ and it only remains to show that $\nu_1 \ll \mu$. Assume $\mu(E) = 0$. Then
$$\int_E f\,d\lambda = 0 \Rightarrow f = 0 \text{ a.e. } [\lambda] \text{ on } E \quad (f \ge 0).$$
Since $f > 0$ on $E \cap A$, $\lambda(E \cap A) = 0$. Thus
$$\nu_1(E) = \int_{E \cap A} g\,d\lambda = 0. \quad\text{Q.E.D.}$$
Uniqueness: Problem.
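The decomposition in the proof is easy to see on a finite space. In the Python sketch below (the measures are invented for illustration), $\mu$ puts no mass on point 4, so the mass $\nu$ carries there forms the singular part $\nu_0$, while the rest is absolutely continuous:

```python
from fractions import Fraction as F

# mu: reference measure; nu: a measure with a part mu cannot see (point 4)
mu = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4), 4: F(0)}
nu = {1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(1, 4)}

# Following the proof: on a finite space, A = {mu > 0}, B = {mu = 0}.
A = {w for w in mu if mu[w] > 0}
B = {w for w in mu if mu[w] == 0}

nu0 = {w: (nu[w] if w in B else F(0)) for w in nu}   # singular part: nu0 _|_ mu
nu1 = {w: (nu[w] if w in A else F(0)) for w in nu}   # a.c. part: nu1 << mu

assert all(nu0[w] + nu1[w] == nu[w] for w in nu)     # nu = nu0 + nu1
assert sum(nu0[w] for w in A) == 0                   # nu0 lives off A
assert all(nu1[w] == 0 for w in nu if mu[w] == 0)    # nu1 << mu
print(nu0[4], nu1[1])
```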
You know: $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$ for events $A, B$, and if $A$ and $B$ are independent then $P(A|B) = P(A)$. Now we work with probability measures. Let $(\Omega, \mathcal{F}_0, P)$ be a probability space, $\mathcal{F} \subset \mathcal{F}_0$ a $\sigma$-algebra, and $X$ an $\mathcal{F}_0$-measurable random variable with $E|X| < \infty$.
Definition 1.4. The conditional expectation of $X$ given $\mathcal{F}$, written as $E(X|\mathcal{F})$, is any random variable $Y$ with the properties
(i) $Y \in \mathcal{F}$, i.e. $Y$ is $\mathcal{F}$-measurable;
(ii) for all $A \in \mathcal{F}$,
$$\int_A X\,dP = \int_A Y\,dP,\quad\text{or}\quad E(X; A) = E(Y; A).$$
Existence, uniqueness.
First, let us show that if $Y$ has (i) and (ii), then $E|Y| \le E|X|$. With $A = \{Y > 0\} \in \mathcal{F}$, observe that
$$\int_A Y\,dP = \int_A X\,dP \le \int_A |X|\,dP$$
and
$$-\int_{A^c} Y\,dP = -\int_{A^c} X\,dP \le \int_{A^c} |X|\,dP.$$
So, $E|Y| \le E|X|$.
Uniqueness: If $Y'$ also satisfies (i) and (ii), then
$$\int_A Y\,dP = \int_A Y'\,dP \quad \forall A \in \mathcal{F},\quad\text{or}\quad \int_A (Y - Y')\,dP = 0 \quad \forall A \in \mathcal{F}.$$
Taking $A = \{Y - Y' > 0\}$ and $A = \{Y' - Y > 0\}$ gives $Y = Y'$ a.e.
Existence:
Consider the set function $\nu$ defined on $(\Omega, \mathcal{F})$ by
$$\nu(A) = \int_A X\,dP,\qquad A \in \mathcal{F}.$$
$\nu$ is a signed measure and $\nu \ll P$. By Radon–Nikodym there is $Y \in \mathcal{F}$ such that
$$\nu(A) = \int_A Y\,dP,\qquad A \in \mathcal{F},$$
that is,
$$\int_A Y\,dP = \int_A X\,dP,\qquad A \in \mathcal{F}.$$
Example 1. Let $A$, $B$ be fixed sets in $\mathcal{F}_0$ and let
$$\mathcal{F} = \sigma(B) = \{\emptyset, \Omega, B, B^c\}.$$
What is $E(\mathbf{1}_A|\mathcal{F})$? This is a function such that when we integrate it over sets in $\mathcal{F}$, we get the integral of $\mathbf{1}_A$ over those sets, i.e.
$$\int_B E(\mathbf{1}_A|\mathcal{F})\,dP = \int_B \mathbf{1}_A\,dP = P(A \cap B).$$
Being $\mathcal{F}$-measurable, $E(\mathbf{1}_A|\mathcal{F})$ is constant on $B$, so
$$E(\mathbf{1}_A|\mathcal{F})\,P(B) = P(A \cap B) \quad\text{on } B,$$
or
$$E(\mathbf{1}_A|\mathcal{F})\,\mathbf{1}_B = \frac{P(A \cap B)}{P(B)}\,\mathbf{1}_B = P(A|B)\,\mathbf{1}_B.$$
In general, if $X$ is a random variable and $\mathcal{F} = \sigma(\Omega_1, \Omega_2, \ldots)$ where the $\Omega_i$ are disjoint, then on $\Omega_i$,
$$E(X|\mathcal{F}) = \frac{E(X; \Omega_i)}{P(\Omega_i)},$$
or
$$E(X|\mathcal{F}) = \sum_{i=1}^\infty \frac{E(X; \Omega_i)}{P(\Omega_i)}\,\mathbf{1}_{\Omega_i}(\omega).$$
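For a finite partition this formula is just "average $X$ over each cell". A Monte Carlo sketch in Python (the partition and the law of $X$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100000

# Simulated outcomes; L labels which cell Omega_i each outcome falls in.
L = rng.integers(0, 4, size=N)
X = L + rng.standard_normal(N)          # X depends on the cell plus noise

# E(X|F) = sum_i  E(X; Omega_i)/P(Omega_i) * 1_{Omega_i}  -- cellwise averages
cond_exp = np.empty(N)
for i in range(4):
    cell = (L == i)
    cond_exp[cell] = X[cell].mean()     # empirical E(X; Omega_i)/P(Omega_i)

# On cell i the conditional expectation is close to i (the noise averages out),
# and averaging the conditional expectation recovers E(X).
print([round(cond_exp[L == i][0], 2) for i in range(4)])
print(cond_exp.mean(), X.mean())
```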
Notice that if $\mathcal{F} = \{\emptyset, \Omega\}$ is the trivial $\sigma$-algebra, then
$$E(X|\mathcal{F}) = E(X).$$
Properties:
(1) If $X \in \mathcal{F}$, then $E(X|\mathcal{F}) = X$. In general, $E(X) = E(E(X|\mathcal{F}))$.
(2) If $X$ is independent of $\mathcal{F}$, i.e. $\sigma(X)$ and $\mathcal{F}$ are independent:
$$P(\{X \in B\} \cap A) = P(X \in B)\,P(A)\qquad\text{for all } A \in \mathcal{F} \text{ and Borel } B,$$
then $E(X|\mathcal{F}) = E(X)$ (as in $P(A|B) = P(A)$).
Proof. To check this: (i) the constant $E(X)$ is trivially $\mathcal{F}$-measurable. (ii) Let $A \in \mathcal{F}$. Then, by independence,
$$\int_A E(X)\,dP = E(X)\,E(\mathbf{1}_A) = E(X\mathbf{1}_A) = \int_A X\,dP.$$
Hence $E(X|\mathcal{F}) = EX$.
Theorem 1.5. Suppose that $X$, $Y$ and $X_n$ are integrable.
(i) If $X = a$ then $E(X|\mathcal{F}) = a$.
(ii) For constants $a$ and $b$, $E(aX + bY|\mathcal{F}) = aE(X|\mathcal{F}) + bE(Y|\mathcal{F})$.
(iii) If $X \le Y$ then $E(X|\mathcal{F}) \le E(Y|\mathcal{F})$. In particular, $|E(X|\mathcal{F})| \le E(|X|\,|\,\mathcal{F})$.
(iv) If $\lim_n X_n = X$ a.s. and $|X_n| \le Y$, $Y$ integrable, then $E(X_n|\mathcal{F}) \to E(X|\mathcal{F})$ a.s.
(v) The monotone convergence theorem and Fatou's lemma hold the same way.
Proof. (i) Done above, since clearly the constant $a$ is $\mathcal{F}$-measurable.
(ii) For $A \in \mathcal{F}$,
$$\int_A \big(aE(X|\mathcal{F}) + bE(Y|\mathcal{F})\big)\,dP = a\int_A E(X|\mathcal{F})\,dP + b\int_A E(Y|\mathcal{F})\,dP$$
$$= a\int_A X\,dP + b\int_A Y\,dP = \int_A (aX + bY)\,dP = \int_A E(aX + bY|\mathcal{F})\,dP.$$
(iii)
$$\int_A E(X|\mathcal{F})\,dP = \int_A X\,dP \le \int_A Y\,dP = \int_A E(Y|\mathcal{F})\,dP,\qquad A \in \mathcal{F}.$$
Hence $E(X|\mathcal{F}) \le E(Y|\mathcal{F})$ a.s.
(iv) Let $Z_n = \sup_{k \ge n} |X_k - X|$. Then $Z_n \downarrow 0$ a.s. and
$$|E(X_n|\mathcal{F}) - E(X|\mathcal{F})| \le E(|X_n - X|\,|\,\mathcal{F}) \le E(Z_n|\mathcal{F}).$$
We need to show $E(Z_n|\mathcal{F}) \to 0$ with probability 1. By (iii), $E(Z_n|\mathcal{F})$ is decreasing; let $Z$ be its limit. We need to show $Z = 0$ a.s. We have $Z_n \le 2Y$ and
$$E(Z) = E(E(Z|\mathcal{F})) \le E(E(Z_n|\mathcal{F})) = E(Z_n) \to 0$$
by the dominated convergence theorem. Hence $Z = 0$ a.s.
Theorem 1.6 (Jensen's Inequality). If $\varphi$ is convex and $E|X|$, $E|\varphi(X)| < \infty$, then
$$\varphi(E(X|\mathcal{F})) \le E(\varphi(X)|\mathcal{F}).$$
Proof. Since $\varphi$ is convex, for each $x_0$ there is a support-line slope $A(x_0)$ with
$$\varphi(x_0) + A(x_0)(x - x_0) \le \varphi(x).$$
Take $x_0 = E(X|\mathcal{F})$ and $x = X$:
$$\varphi(E(X|\mathcal{F})) + A(E(X|\mathcal{F}))\big(X - E(X|\mathcal{F})\big) \le \varphi(X).$$
Take conditional expectations of both sides:
$$E\Big[\varphi(E(X|\mathcal{F})) + A(E(X|\mathcal{F}))\big(X - E(X|\mathcal{F})\big)\,\Big|\,\mathcal{F}\Big] \le E(\varphi(X)|\mathcal{F}),$$
that is,
$$\varphi(E(X|\mathcal{F})) + A(E(X|\mathcal{F}))\big[E(X|\mathcal{F}) - E(X|\mathcal{F})\big] \le E(\varphi(X)|\mathcal{F}),$$
which gives the result.
Corollary.
$$|E(X|\mathcal{F})|^p \le E(|X|^p\,|\,\mathcal{F})\quad\text{for } 1 \le p < \infty,$$
$$\exp\big(E(X|\mathcal{F})\big) \le E(e^X|\mathcal{F}).$$
Theorem 1.7. (1) If $\mathcal{F}_1 \subset \mathcal{F}_2$ then
(a) $E(E(X|\mathcal{F}_1)|\mathcal{F}_2) = E(X|\mathcal{F}_1)$;
(b) $E(E(X|\mathcal{F}_2)|\mathcal{F}_1) = E(X|\mathcal{F}_1)$. (The smallest $\sigma$-field always wins.)
(2) If $X \in \mathcal{F}$ and $E|Y|, E|XY| < \infty$, then
$$E(XY|\mathcal{F}) = X\,E(Y|\mathcal{F})$$
(measurable functions act like constants). ($Y = 1$: done before.)
Proof. (a) $E(X|\mathcal{F}_1) \in \mathcal{F}_1 \subset \mathcal{F}_2$, and conditioning on $\mathcal{F}_2$ leaves an $\mathcal{F}_2$-measurable variable unchanged. Done.
(b) $E(X|\mathcal{F}_1) \in \mathcal{F}_1$, and for $A \in \mathcal{F}_1 \subset \mathcal{F}_2$ we have
$$\int_A E(X|\mathcal{F}_1)\,dP = \int_A X\,dP = \int_A E(X|\mathcal{F}_2)\,dP,$$
which says precisely that $E(X|\mathcal{F}_1) = E(E(X|\mathcal{F}_2)|\mathcal{F}_1)$.
Durrett: p. 220: #1.1; p. 222: #1.2; p. 225: #1.3; p. 227: #1.6; p. 228: #1.8.
Let
$$L^2(\mathcal{F}_0) = \{X \in \mathcal{F}_0 : EX^2 < \infty\}$$
and
$$L^2(\mathcal{F}_1) = \{Y \in \mathcal{F}_1 : EY^2 < \infty\}.$$
With $\langle X_1, X_2\rangle = E(X_1X_2)$, $L^2(\mathcal{F}_0)$ and $L^2(\mathcal{F}_1)$ are Hilbert spaces, and $L^2(\mathcal{F}_1)$ is a closed subspace of $L^2(\mathcal{F}_0)$. Given any $X \in L^2(\mathcal{F}_0)$, there is $Y \in L^2(\mathcal{F}_1)$ such that
$$\operatorname{dist}\big(X, L^2(\mathcal{F}_1)\big)^2 = E(X - Y)^2.$$
Theorem 1.8. Suppose $EX^2 < \infty$. Then
$$\inf_{Y \in L^2(\mathcal{F}_1)} E\big(|X - Y|^2\big) = E\big(|X - E(X|\mathcal{F}_1)|^2\big).$$
Proof. We need to show
$$E\big(|X - Y|^2\big) \ge E\big(|X - E(X|\mathcal{F}_1)|^2\big)$$
for any $Y \in L^2(\mathcal{F}_1)$. Let $Y \in L^2(\mathcal{F}_1)$ and set
$$Z = Y - E(X|\mathcal{F}_1) \in L^2(\mathcal{F}_1),\qquad Y = Z + E(X|\mathcal{F}_1).$$
Now, since $Z \in \mathcal{F}_1$,
$$E\big(Z\,E(X|\mathcal{F}_1)\big) = E\big(E(ZX|\mathcal{F}_1)\big) = E(ZX),$$
so
$$E\big(Z\,E(X|\mathcal{F}_1)\big) - E(ZX) = 0.$$
Hence
$$E(X - Y)^2 = E\big(X - Z - E(X|\mathcal{F}_1)\big)^2 = E\big(X - E(X|\mathcal{F}_1)\big)^2 + E(Z^2) - 2E\big((X - E(X|\mathcal{F}_1))Z\big)$$
$$= E\big(X - E(X|\mathcal{F}_1)\big)^2 + E(Z^2) \ge E\big(X - E(X|\mathcal{F}_1)\big)^2,$$
since $E\big((X - E(X|\mathcal{F}_1))Z\big) = E(XZ) - E\big(Z\,E(X|\mathcal{F}_1)\big) = 0$. Q.E.D.
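The projection property can be checked numerically: with $\mathcal{F}_1$ generated by a finite partition, $E(X|\mathcal{F}_1)$ averages $X$ over cells, and no other $\mathcal{F}_1$-measurable $Y$ (i.e. no other assignment of constants to the cells) achieves a smaller mean square error. A Python sketch (all distributions here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50000

L = rng.integers(0, 3, size=N)              # partition generating F_1
X = 2.0 * L + rng.standard_normal(N)        # X in L^2(F_0)

# E(X|F_1): average of X over each cell
cond = np.empty(N)
for i in range(3):
    cond[L == i] = X[L == i].mean()

best = np.mean((X - cond) ** 2)

# Any other F_1-measurable Y (constant on cells) does no better
for _ in range(20):
    vals = rng.uniform(-5, 5, size=3)       # arbitrary values, one per cell
    Y = vals[L]
    assert np.mean((X - Y) ** 2) >= best
print(best)  # roughly the variance of the noise term
```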
By the way: if $X$ and $Y$ are two random variables, we define
$$E(X|Y) = E(X|\sigma(Y)).$$
Recall conditional probabilities; here is the analogue for densities. Suppose $X$ and $Y$ have joint density $f(x, y)$:
$$P\big((X, Y) \in B\big) = \int_B f(x, y)\,dx\,dy,\qquad B \in \mathcal{B}(\mathbb{R}^2),$$
and suppose $\int_{\mathbb{R}} f(x, y)\,dx > 0$ for every $y$. We claim that in this case, if $E|g(X)| < \infty$, then $E(g(X)|Y) = h(Y)$ with
$$h(y) = \frac{\int_{\mathbb{R}} g(x)f(x, y)\,dx}{\int_{\mathbb{R}} f(x, y)\,dx}.$$
Heuristically, treat the density as if
$$P(X = x|Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{\int_{\mathbb{R}} f(x, y)\,dx}.$$
Now integrate:
$$E(g(X)|Y = y) = \int g(x)\,P(X = x|Y = y)\,dx.$$
To verify this: (i) clearly $h(Y) \in \sigma(Y)$. For (ii): let $A = \{Y \in B\}$ for $B \in \mathcal{B}(\mathbb{R})$. Then we need to show
$$E(h(Y); A) = E(g(X); A).$$
The left-hand side is $E\big(h(Y)\,\mathbf{1}_{\mathbb{R}}(X)\,\mathbf{1}_B(Y)\big)$:
$$\int_B \int_{\mathbb{R}} h(y)f(x, y)\,dx\,dy = \int_B \int_{\mathbb{R}} \frac{\int_{\mathbb{R}} g(z)f(z, y)\,dz}{\int_{\mathbb{R}} f(x, y)\,dx}\,f(x, y)\,dx\,dy$$
$$= \int_B \int_{\mathbb{R}} g(z)f(z, y)\,dz\,dy = E\big(g(X)\mathbf{1}_B(Y)\big) = E(g(X); A).$$
(If $\int f(x, y)\,dx = 0$, define $h$ by $h(y)\int f(x, y)\,dx = \int g(x)f(x, y)\,dx$; i.e. $h$ can be anything on the set where $\int f(x, y)\,dx = 0$.)
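As a sanity check of the formula for $h$, take $(X, Y)$ standard bivariate normal with correlation $\rho$, where $E(X|Y = y) = \rho y$ is known in closed form, and compute $h(y)$ by numerical integration on a grid (a Python sketch; the grid parameters are arbitrary choices):

```python
import numpy as np

rho, y = 0.6, 1.3
x = np.linspace(-10.0, 10.0, 4001)

# joint density f(x, y) of a standard bivariate normal with correlation rho
f = np.exp(-(x**2 - 2*rho*x*y + y**2) / (2*(1 - rho**2))) \
    / (2*np.pi*np.sqrt(1 - rho**2))

# h(y) = int g(x) f(x, y) dx / int f(x, y) dx  with g(x) = x;
# the grid spacing cancels in the ratio, so plain Riemann sums suffice
h = (x * f).sum() / f.sum()
print(h)  # should be close to rho * y = 0.78
```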