McFadden, Statistical Tools, 2000
CHAPTER 3. A REVIEW OF PROBABILITY THEORY
3.1. SAMPLE SPACE
The starting point for probability theory is the concept of a state of Nature, which is a description of everything that has happened and will happen in the universe. In particular, this description includes the outcomes of all probability and sampling experiments. The set of all possible states of Nature is called the sample space. Let s denote a state of Nature, and S the sample space. These are abstract objects that play a conceptual rather than a practical role in the development of probability theory. Consequently, there can be considerable flexibility in thinking about what goes into the description of a state of Nature and into the specification of the sample space; the only critical restriction is that there be enough states of Nature so that distinct observations are always associated with distinct states of Nature. In elementary probability theory, it is often convenient to think of the states of Nature as corresponding to the outcomes of a particular experiment, such as flipping coins or tossing dice, and to suppress the description of everything else in the universe. Sections 3.2-3.4 in this Chapter contain a few crucial definitions, for events, probabilities, conditional probabilities, and statistical independence. They also contain a treatment of measurability, the theory of integration, and probability on product spaces that is needed mostly for more advanced topics in econometrics. Therefore, readers who do not have a good background in mathematical analysis may find it useful to concentrate on the definitions and examples in these sections, and postpone study of the more mathematical material until it is needed.
3.2. EVENT FIELDS AND INFORMATION
3.2.1. An event is a set of states of Nature with the property that one can in principle determine whether the event occurs or not. If states of Nature describe all happenings, including the outcome of a particular coin toss, then one event might be the set of states of Nature in which this coin toss comes up heads. The family of potentially observable events is denoted by F. This family is assumed to have the following properties:
(i) The "anything can happen" event S is in F.
(ii) If event A is in F, then the event "not A", denoted A^c or S\A, is in F.
(iii) If A and B are events in F, then the event "both A and B", denoted A∩B, is in F.
(iv) If A_1, A_2, ... is a finite or countable sequence of events in F, then the event "one or more of A_1 or A_2 or ...", denoted ∪_{i=1}^∞ A_i, is in F.
A family F with these properties is called a σ-field (or Boolean σ-algebra) of subsets of S. The pair (S,F) consisting of an abstract set S and a σ-field F of subsets of S is called a measurable space, and the sets in F are called the measurable subsets of S. Implications of the definition of a σ-field are
(v) If A_1, A_2, ... is a finite or countable sequence of events in F, then ∩_{i=1}^∞ A_i is also in F.
(vi) If A_1, A_2, ... is a countable sequence of events in F that is monotone decreasing (i.e., A_1 ⊇ A_2 ⊇ ...), then its limit A_0 = ∩_{i=1}^∞ A_i, also denoted A_i ↓ A_0, is also in F. Similarly, if a sequence in F is monotone increasing (i.e., A_1 ⊆ A_2 ⊆ ...), then its limit A_0 = ∪_{i=1}^∞ A_i is also in F.
(vii) The empty event ∅ is in F.
We will use a few concrete examples of sample spaces and σ-fields:
Example 1. [Two coin tosses] A coin is tossed twice, and for each toss a head or tail appears. Let HT denote the state of Nature in which the first toss yields a head and the second toss yields a tail. Then S = {HH,HT,TH,TT}. Let F be the class of all possible subsets of S; F has 2^4 members.
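For a finite sample space like this one, the closure properties can be verified mechanically. The following Python sketch (an added illustration, not part of the original notes) builds the power set of S for the two-toss experiment and checks properties (i)-(iv):

```python
from itertools import combinations

S = frozenset({"HH", "HT", "TH", "TT"})

# F = all subsets of S (the power set), each represented as a frozenset
F = {frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)}
assert len(F) == 2 ** 4  # 16 members

assert S in F                                      # (i) S is in F
assert all(S - A in F for A in F)                  # (ii) closed under complements
assert all(A & B in F for A in F for B in F)       # (iii) closed under intersections
assert all(A | B in F for A in F for B in F)       # (iv) closed under (finite) unions
print("the power set of S is a sigma-field with", len(F), "members")
```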
Example 2. [Coin toss until a tail] A coin is tossed until a tail appears. The sample space is S = {T, HT, HHT, HHHT, ...}. In this example, the sample space is infinite, but countable. Let F be the σ-field generated by the finite subsets of S. This σ-field contains events such as "At most ten heads", and also, using the monotone closure property (vi) above, events such as "Ten or more tosses without a tail" and "an even number of heads before a tail". A set that is not in F will have the property that both the set and its complement are infinite. It is difficult to describe such a set, primarily because the language that we normally use to construct sets tends to correspond to elements in the σ-field. However, mathematical analysis shows that such sets must exist, because the cardinality of the class of all possible subsets of S is greater than the cardinality of F.
Example 3. [S&P stock index] The stock index is a number in the positive real line ℝ₊, so S = ℝ₊. Take the σ-field of events to be the Borel σ-field B(ℝ₊), which is defined as the smallest family of subsets of the real line that contains all the open intervals in ℝ₊ and satisfies the properties (i)-(iv) of a σ-field. The subsets of ℝ₊ that are in B are said to be measurable, and those not in B are said to be non-measurable.
Example 4. [S&P stock index on successive days] The set of states of Nature is the Cartesian product of the set of values on day one and the set of values on day two, S = ℝ₊×ℝ₊ (also denoted ℝ₊²). Take the σ-field of events to be the product of the one-dimensional σ-fields, F = B_1⊗B_2, where "⊗" denotes an operation that forms the smallest σ-field containing all sets of the form A×C with A ∈ B_1 and C ∈ B_2. In this example, B_1 and B_2 are identical copies of the Borel σ-field on ℝ₊. Assume that the index was normalized to be one at the beginning of the previous year. Examples of events in F are "below 1 on day 1", "at least 2 on both days", and "higher on the second day than the first day". The operation "⊗" is different than the Cartesian product "×", where B_1×B_2 is the family of all rectangles A×C formed from A ∈ B_1 and C ∈ B_2. This family is not itself a σ-field, but the σ-field that it generates is B_1⊗B_2. For example, the event "higher on the second day than the first day" is not a rectangle, but is obtained as a monotone limit of rectangles.
In the first example, the σ-field consisted of all possible subsets of the sample space. This was not the case in the last two examples, because the Borel σ-field does not contain all subsets of the real line. There are two reasons to introduce the complication of dealing with σ-fields that do not contain all the subsets of the sample space, one substantive and one technical. The substantive reason is that the σ-field can be interpreted as the potential information that is available by observation. If an observer is incapable of making observations that distinguish two states of Nature, then the σ-field cannot contain sets that include one of these states and exclude the other. Thus, the specification of the σ-field will depend on what is observable in an application. The technical reason is that when the sample space contains an infinite number of states, it may be mathematically impossible to define probabilities with sensible properties on all subsets of the sample space. Restricting the definition of probabilities to appropriately chosen σ-fields solves this problem.

3.2.2. It is possible that more than one σ-field of subsets is defined for a particular sample space S. If A is an arbitrary collection of subsets of S, then the smallest σ-field that contains A is said to be the σ-field generated by A. It is sometimes denoted σ(A). If F and G are both σ-fields, and G ⊆ F, then G is said to be a sub-field of F, and F is said to contain more information or refine G. It is possible that neither F ⊆ G nor G ⊆ F. The intersection F∩G of two σ-fields is again a σ-field that contains the common information in F and G. Further, the intersection of an arbitrary countable or uncountable collection of σ-fields is again a σ-field. The union F∪G of two σ-fields is not necessarily a σ-field, but there is always a smallest σ-field that refines both F and G, which is simply the σ-field σ(F∪G) generated by the sets in the union of F and G, or put another way, the intersection of all σ-fields that contain both F and G.
Example 1. (continued) Let F denote the σ-field of all subsets of S. Another σ-field is G = {∅,S,{HT,HH},{TT,TH}}, containing events with information only on the outcome of the first coin toss. Yet another σ-field contains the events with information only on the number of heads, but not their order, H = {∅,S,{HH},{TT},{HT,TH},{HH,TT},{HT,TH,TT},{HH,HT,TH}}. Then, F contains more information than G or H. The intersection G∩H is the "no information" σ-field {∅,S}. The union G∪H is not a σ-field, and the σ-field σ(G∪H) that it generates is F. This can be verified constructively (in this finite S case) by building up σ(G∪H) by forming intersections and unions of members of G∪H, but is also obvious since knowing the outcome of the first toss and knowing the total number of heads reveals full information on both tosses.
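The constructive argument can be sketched in a few lines of Python (an added illustration; the set names follow the example): starting from G∪H, close under complements, pairwise unions, and intersections until nothing new appears. For this finite S the closure is the full power set, i.e., σ(G∪H) = F.

```python
S = frozenset({"HH", "HT", "TH", "TT"})
G = {frozenset(), S, frozenset({"HT", "HH"}), frozenset({"TT", "TH"})}
H = {frozenset(), S, frozenset({"HH"}), frozenset({"TT"}),
     frozenset({"HT", "TH"}), frozenset({"HH", "TT"}),
     frozenset({"HT", "TH", "TT"}), frozenset({"HH", "HT", "TH"})}

sigma = set(G) | set(H)            # start from the union G u H
changed = True
while changed:                     # close under complement, union, intersection
    changed = False
    for A in list(sigma):
        for B in list(sigma):
            for C in (S - A, A | B, A & B):
                if C not in sigma:
                    sigma.add(C)
                    changed = True

print(len(sigma))                  # 16 = all subsets of S, so sigma(G u H) = F
```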
Example 3. (continued) Let F denote the Borel σ-field. Then G = {∅,S,(1,∞),(−∞,1]} and D = {∅,S,(−∞,2],(2,∞)} are both σ-fields, the first corresponding to the ability to observe whether the index is above 1, the second corresponding to the ability to tell whether it is above 2. For shorthand, let a = (−∞,1], b = (−∞,2], c = (1,∞), d = (2,∞), and e = (1,2]. Neither G nor D contains the other, both are contained in F, and their intersection is the "no information" σ-field {∅,S}. The σ-field generated by their union, corresponding to the ability to tell if the index is in a, e, or d, is σ(G∪D) = {∅,S,a,b,c,d,e,a∪d}.
An element B in a σ-field G of subsets of S is an atom if the only set in G that is a proper subset of B is the empty set ∅. In the last example, D has atoms b and d, and the atoms of σ(G∪D) are a, d, and e, but not b = a∪e or c = e∪d. The atoms of the Borel σ-field are the individual real numbers. An economic interpretation of this concept is that if the σ-field defining the common information of two economic agents contains an atom, then a contingent contract between them must have the same realization no matter what state of Nature within this atom occurs.
3.3. PROBABILITY
3.3.1. Given a sample space S and σ-field of subsets F, a probability (or probability measure) is defined as a function P from F into the real line with the following properties:
(i) P(A) ≥ 0 for all A ∈ F.
(ii) P(S) = 1.
(iii) [Countable Additivity] If A_1, A_2, ... is a finite or countable sequence of events in F that are mutually exclusive (i.e., A_i∩A_j = ∅ for all i ≠ j), then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
With conditions (i)-(iii), P has the following additional intuitive properties of a probability when A and B are events in F:
(iv) P(A) + P(A^c) = 1.
(v) P(∅) = 0.
(vi) P(A∪B) = P(A) + P(B) − P(A∩B).
(vii) P(A) ≥ P(B) when B ⊆ A.
(viii) If A_i in F is monotone decreasing to ∅ (denoted A_i ↓ ∅), then P(A_i) → 0.
(ix) If A_i ∈ F, not necessarily disjoint, then P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
(x) If {A_i} is a finite or countable partition of S (i.e., the events A_i ∈ F are mutually exclusive and exhaustive, or A_i∩A_j = ∅ for all i ≠ j and ∪_{i=1}^∞ A_i = S), then P(B) = Σ_{i=1}^∞ P(B∩A_i).
The triplet (S,F,P) consisting of a measurable space (S,F) and a probability measure P is called a probability space.
Example 1 (continued). Consider the σ-field H containing information on the number of heads, but not their order. The table below gives three functions P_1, P_2, P_3 defined on H. All satisfy properties (i) and (ii) for a probability. Functions P_2 and P_3 also satisfy (iii), and are probabilities, but P_1 violates (iii) since P_1({HH}∪{TT}) ≠ P_1({HH}) + P_1({TT}). The probability P_2 is generated by fair coins, and the probability P_3 by one fair coin and one biased coin.

Event  ∅   S   {HH}  {TT}  {HT,TH}  {HH,TT}  {HT,TH,TT}  {HH,HT,TH}
P_1    0   1   1/3   1/3   1/2      1/2      2/3         2/3
P_2    0   1   1/4   1/4   1/2      1/2      3/4         3/4
P_3    0   1   1/3   1/6   1/2      1/2      2/3         5/6
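A small script (an added illustration) makes the additivity check explicit: on H, additivity over the disjoint events {HH}, {TT}, and {HT,TH} fails for P_1 and holds for P_2 and P_3.

```python
from fractions import Fraction as F

tables = {
    "P1": {"HH": F(1,3), "TT": F(1,3), "HT,TH": F(1,2), "HH,TT": F(1,2),
           "HT,TH,TT": F(2,3), "HH,HT,TH": F(2,3), "S": F(1)},
    "P2": {"HH": F(1,4), "TT": F(1,4), "HT,TH": F(1,2), "HH,TT": F(1,2),
           "HT,TH,TT": F(3,4), "HH,HT,TH": F(3,4), "S": F(1)},
    "P3": {"HH": F(1,3), "TT": F(1,6), "HT,TH": F(1,2), "HH,TT": F(1,2),
           "HT,TH,TT": F(2,3), "HH,HT,TH": F(5,6), "S": F(1)},
}

for name, P in tables.items():
    # check P over unions of the disjoint events {HH}, {TT}, {HT,TH}
    additive = (P["HH,TT"] == P["HH"] + P["TT"]
                and P["HT,TH,TT"] == P["HT,TH"] + P["TT"]
                and P["HH,HT,TH"] == P["HH"] + P["HT,TH"]
                and P["S"] == P["HH"] + P["TT"] + P["HT,TH"])
    print(name, "satisfies additivity on H:", additive)   # False, True, True
```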
3.3.2. If A ∈ F has P(A) = 1, then A is said to occur almost surely (a.s.), or with probability one (w.p.1). If A ∈ F has P(A) = 0, then A is said to occur with probability zero (w.p.0). Finite or countable intersections of events that occur almost surely again occur almost surely, and finite or countable unions of events that occur with probability zero again occur with probability zero.
Example 2. (continued) If the coin is fair, then the probability of k−1 heads followed by a tail is 1/2^k. Use the geometric series formulas in 2.1.10 to verify that the probability of "At most 3 heads" is 15/16, of "Ten or more heads" is 1/2^10, and of "an even number of heads" is 2/3.
Example 3. (continued) Consider the function P defined on open sets (s,∞) ⊆ ℝ₊ by P((s,∞)) = e^{−s/2}. This function maps into the unit interval. It is then easy to show that P satisfies properties (i)-(iii) of a probability on the restricted family of open intervals, and a little work to show that when a probability is determined on this family of open intervals, then it is uniquely determined on the σ-field generated by these intervals. Each single point, such as {1}, is in F. Taking intervals that shrink to this point, each single point occurs with probability zero. Then, a countable set of points occurs w.p.0.
3.3.3. Often a measurable space (S,F) will have an associated measure ν that is a countably additive function from F into the nonnegative real line; i.e., ν(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ ν(A_i) for any sequence of disjoint A_i ∈ F. The measure is positive if ν(A) ≥ 0 for all A ∈ F; we will consider only positive measures. The measure ν is finite if ν(A) ≤ M for some constant M and all A ∈ F, and σ-finite if F contains a countable partition {A_i} of S such that the measure of each partition set is finite; i.e., ν(A_i) < ∞. The measure ν may be a probability, but more commonly it is a measure of "length" or "volume". For example, it is common when the sample space S is the countable set of positive integers to define ν to be counting measure with ν(A) equal to the number of points in A. When the sample space S is the real line, with the Borel σ-field B, it is common to define ν to be Lebesgue measure, with ν((a,b)) = b − a for any open interval (a,b). Both of these examples are positive σ-finite measures. A set A is said to be of ν-measure zero if ν(A) = 0. A property that holds except on a set of measure zero is said to hold almost everywhere (a.e.). It will sometimes be useful to talk about a σ-finite measure space (S,F,ν) where ν is positive and σ-finite and may either be a probability measure or a more general counting or length measure such as Lebesgue measure.
3.3.4. Suppose f is a real-valued function on a σ-finite measure space (S,F,ν). This function is measurable if f^{-1}(C) ∈ F for each open set C in the real line. A measurable function has the property that its contour sets of the form {s∈S : a < f(s) < c} are contained in F. This implies that if B ∈ F is an atom, then f(s) must be constant for all s ∈ B.
The integral of measurable f on a set A ∈ F, denoted ∫_A f(s)ν(ds), is defined for ν(A) < ∞ as the limit as n → ∞ of sums of the form Σ_{k=−∞}^{+∞} (k/n)·ν(C_{kn}), where C_{kn} is the set of states of Nature in A for which f(s) is contained in the interval (k/n,(k+1)/n]. A finite limit exists if Σ_{k=−∞}^{+∞} |k/n|·ν(C_{kn}) < ∞, in which case f is said to be integrable on A. Let {A_i} ⊆ F be a countable partition of S with ν(A_i) < ∞, guaranteed by the σ-finite property of ν. The function f is integrable on a general set A ∈ F if it is integrable on A∩A_i for each i and if ∫_A |f(s)|ν(ds) = lim_{n→∞} Σ_{i=1}^{n} ∫_{A∩A_i} |f(s)|ν(ds) exists, and simply integrable if it is integrable for A = S. In general, the measure ν can have point masses (at atoms), or continuous measure, or both, so that the notation for integration with respect to ν includes sums and mixed cases. The integral ∫_A f(s)ν(ds) will sometimes be denoted ∫_A f(s)dν, or in the case of Lebesgue measure, ∫_A f(s)ds.
3.3.5. For a σ-finite measure space (S,F,ν), define L_q(S,F,ν) for 1 ≤ q < ∞ to be the set of measurable real-valued functions on S with the property that |f|^q is integrable, and define ‖f‖_q = [∫ |f(s)|^q ν(ds)]^{1/q} to be the norm of f. Then, L_q(S,F,ν) is a linear space, since linear combinations of integrable functions are again integrable. This space has many, but not all, of the familiar properties of finite-dimensional Euclidean space. The set of all linear functions on the space L_q(S,F,ν) for q > 1 is the space L_r(S,F,ν), where 1/r = 1 − 1/q. This follows from an application of Hölder's inequality, which generalizes from finite vector spaces to the condition
f ∈ L_q(S,F,ν) and g ∈ L_r(S,F,ν) with q^{-1} + r^{-1} = 1 imply ∫ |f(s)g(s)| ν(ds) ≤ ‖f‖_q·‖g‖_r.
The case q = r = 2 gives the Cauchy-Schwartz inequality in general form. This case arises often in statistics, with the functions f interpreted as random variables and the norm ‖f‖_2 interpreted as a quadratic mean or variance.
3.3.6. There are three important concepts for the limit of a sequence of functions f_n ∈ L_q(S,F,ν). First, there is convergence in norm, or strong convergence: f is a limit of f_n if ‖f_n − f‖_q → 0. Second, there is convergence in ν-measure: f is a limit of f_n if ν({s∈S : |f_n(s) − f(s)| > ε}) → 0 for each ε > 0. Third, there is weak convergence: f is a limit of f_n if ∫ (f_n(s) − f(s))g(s) ν(ds) → 0 for each g ∈ L_r(S,F,ν) with 1/r = 1 − 1/q. The following relationship holds between these modes of convergence:

Strong Convergence ⟹ Weak Convergence ⟹ Convergence in ν-measure

An example shows that convergence in ν-measure does not in general imply weak convergence: Consider L_2((0,1],B,ν) where B is the Borel σ-field and ν is Lebesgue measure. Consider the sequence f_n(s) = n·1(s<1/n). Then ν({s∈S : |f_n(s)| > ε}) = 1/n, so that f_n converges in ν-measure to zero, but for g(s) = s^{-1/3}, one has ‖g‖_2 = 3^{1/2} and ∫ f_n(s)g(s) ν(ds) = 3n^{1/3}/2, divergent. Another example shows that weak convergence does not in general imply strong convergence: Consider S = {1,2,...} endowed with the σ-field generated by the family of finite sets and the measure ν that gives weight k^{-1/2} to point k. Consider f_n(k) = n^{1/4}·1(k = n). Then ‖f_n‖_2 = 1. If g is a function for which Σ_k f_n(k)g(k)ν({k}) = g(n)·n^{1/4}·ν({n}) does not converge to zero, then g(k)²ν({k}) is bounded away from zero infinitely often, implying ‖g‖_2² = Σ_{k=1}^∞ g(k)²ν({k}) = ∞. Then, f_n converges weakly, but not strongly, to zero. The following theorem, which is of great importance in advanced econometrics, gives a uniformity condition under which these modes of convergence coincide.
Theorem 3.1. (Lebesgue Dominated Convergence) If g and f_n for n = 1,2,... are in L_q(S,F,ν) for 1 ≤ q < ∞ and a σ-finite measure space (S,F,ν), and if |f_n(s)| ≤ g(s) almost everywhere, then f_n converges in ν-measure to a function f if and only if f ∈ L_q(S,F,ν) and ‖f_n − f‖_q → 0.
One application of this theorem is a result for interchange of the order of integration and differentiation. Suppose f(·,t) ∈ L_q(S,F,ν) for t in an open set T ⊆ ℝ^n. Suppose f is differentiable, meaning that there exists a function ∇_t f(·,t) ∈ L_q(S,F,ν) for t ∈ T such that if t+h ∈ T and h ≠ 0, then the remainder function r(s,t,h) = [f(s,t+h) − f(s,t) − ∇_t f(s,t)h]/|h| ∈ L_q(S,F,ν) converges in ν-measure to zero as h → 0. Define F(t) = ∫ f(s,t)ν(ds). If there exists g ∈ L_q(S,F,ν) which dominates the remainder function (i.e., |r(s,t,h)| ≤ g(s) a.e.), then Theorem 3.1 implies lim_{h→0} ‖r(·,t,h)‖_q = 0, and F(t) is differentiable and satisfies ∇_t F(t) = ∫ ∇_t f(s,t)ν(ds).
A finite measure P on (S,F) is absolutely continuous with respect to a measure ν if A ∈ F and ν(A) = 0 imply P(A) = 0. If P is a probability measure that is absolutely continuous with respect to the measure ν, then an event of ν-measure zero occurs w.p.0, and an event that is true almost everywhere occurs almost surely. A fundamental result from analysis is the theorem:

Theorem 3.2. (Radon-Nikodym) If a finite measure P on a measurable space (S,F) is absolutely continuous with respect to a positive σ-finite measure ν on (S,F), then there exists an integrable real-valued function p ∈ L_1(S,F,ν) such that
∫_A p(s)ν(ds) = P(A) for each A ∈ F.

When P is a probability, the function p given by the theorem is nonnegative, and is called the probability density. An implication of the Radon-Nikodym theorem is that if a measurable space (S,F) has a positive σ-finite measure ν and a probability measure P that is absolutely continuous with respect to ν, then there exists a density p such that for every f ∈ L_q(S,F,P) for some 1 ≤ q < ∞, one has ∫ f(s)P(ds) = ∫ f(s)p(s)ν(ds).
3.3.7. In applications where the probability space is the real line with the Borel σ-field, with a probability P such that P((−∞,s]) = F(s) is continuously differentiable, the fundamental theorem of integral calculus states that p(s) = F′(s) satisfies F(A) = ∫_A p(s)ds. What the Radon-Nikodym theorem does is extend this result to σ-finite measure spaces and weaken the assumption from continuous differentiability to absolute continuity. In basic econometrics, we will often characterize probabilities both in terms of the probability measure (or distribution) and the density, and will usually need only the elementary calculus version of the Radon-Nikodym result. However, it is useful in theoretical discussions to remember that the Radon-Nikodym theorem makes the connection between probabilities and densities. We give two examples that illustrate practical use of the calculus version of the Radon-Nikodym theorem.

Example 3. (continued) Given P((s,∞)) = e^{−s/2}, one can use the differentiability of the function in s to argue that it is absolutely continuous with respect to Lebesgue measure on the line. Verify by integration that the density implied by the Radon-Nikodym theorem is p(s) = e^{−s/2}/2.
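A quick numerical check (an added illustration, using scipy) confirms that this candidate density integrates back to the given tail probabilities:

```python
import numpy as np
from scipy.integrate import quad

density = lambda s: 0.5 * np.exp(-s / 2)      # candidate Radon-Nikodym derivative

for s in (0.0, 1.0, 2.0, 5.0):
    tail, _ = quad(density, s, np.inf)        # integral of p over (s, infinity)
    print(s, tail, np.exp(-s / 2))            # matches P((s, inf)) = exp(-s/2)
```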
Example 5. A probability that appears frequently in statistics is the normal, which is defined on (ℝ,B), where ℝ is the real line and B the Borel σ-field, by the density n(s−μ,σ) = (2πσ²)^{−1/2}·e^{−(s−μ)²/2σ²}, so that P(A) = ∫_A (2πσ²)^{−1/2}·e^{−(s−μ)²/2σ²} ds. In this probability, μ and σ are parameters that are interpreted as determining the location and scale of the probability, respectively. When μ = 0 and σ = 1, this probability is called the standard normal.
3.3.8. Consider a probability space (S,F,P), and a σ-field G ⊆ F. If the event B ∈ G has P(B) > 0, then the conditional probability of A given B is defined as P(A|B) = P(A∩B)/P(B). Stated another way, P(A|B) is a real-valued function on F×G with the property that P(A∩B) = P(A|B)·P(B) for all A ∈ F and B ∈ G. When B is a finite set, the conditional probability of A given B is the ratio of sums
P(A|B) = Σ_{s∈A∩B} P({s}) / Σ_{s∈B} P({s}).
Example 6. On a quiz show, a contestant is shown three doors, one of which conceals a prize, and is asked to select one. Before it is opened, the host opens one of the remaining doors which he knows does not contain the prize, and asks the contestant whether she wants to keep her original selection or switch to the other remaining unopened door. Should the contestant switch? Designate the contestant's initial selection as door 1. The sample space consists of pairs of numbers ab, where a = 1,2,3 is the number of the door containing the prize and b = 2,3 is the number of the door opened by the host, with b ≠ a: S = {12,13,23,32}. The probability is 1/3 that the prize is behind each door. The conditional probability of b = 2, given a = 1, is 1/2, since in this case the host opens door 2 or door 3 at random. However, the conditional probability of b = 2 given a = 2 is zero, and the conditional probability of b = 2 given a = 3 is one. Hence, P(12) = P(13) = (1/3)(1/2) = 1/6, and P(23) = P(32) = 1/3. Let A = {12,13} be the event that door 1 contains the prize and B = {12,32} be the event that the host opens door 2. Then the conditional probability of A given B is P(12)/(P(12)+P(32)) = (1/6)/((1/6)+(1/3)) = 1/3. Hence, the probability of receiving the prize is 1/3 if the contestant stays with her original selection, and 2/3 if she switches to the other unopened door.
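The conclusion is easy to confirm by simulation; the following sketch (an added illustration, not part of the original notes) plays the game under the stated rules many times:

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randint(1, 3)              # door hiding the prize
        pick = 1                                  # contestant always picks door 1
        # host opens a door that is neither the pick nor the prize
        opened = random.choice([d for d in (2, 3) if d != prize])
        if switch:
            pick = next(d for d in (1, 2, 3) if d not in (pick, opened))
        wins += (pick == prize)
    return wins / trials

print("stay:", play(switch=False))     # approx 1/3
print("switch:", play(switch=True))    # approx 2/3
```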
Example 7. Two fast food stores are sited at random points along a street that is ten miles long. What is the probability that they are less than five miles apart? Given that the first store is located at the three mile marker, what is the probability that the second store is less than five miles away? The answers are obvious from the diagram below, in which the sample space is depicted as a rectangle of dimension 10 by 10, with the horizontal axis giving the location of the first store and the vertical axis giving the location of the second store. The shaded areas correspond to the event that the two are more than five miles apart, and the proportion of the rectangle in these areas is 1/4. Conditioned on the first store being at point 3 on the horizontal axis, the second store is located at random on a vertical line through this point, and the proportion of this line that lies in the shaded area is 1/5. Let x be the location of the first store, y the location of the second. The conditional probability of the event that |x − y| > 5, given x, is |x − 5|/10. This could have been derived by forming the probability of the event |x − y| > 5 and c < x < c+δ for a small positive δ, taking the ratio of this probability to the probability of the event c < x < c+δ to obtain the conditional probability of the event |x − y| > 5 given c < x < c+δ, and taking the limit δ → 0.

[Figure: Location of Fast Food Stores. Axes: First Store (0 to 10 miles, horizontal) and Second Store (0 to 10 miles, vertical); the shaded regions mark the event that the two stores are more than five miles apart.]
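A Monte Carlo check of both answers (an added sketch; the conditional probability is approximated by restricting to draws with the first store near the three-mile marker):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0, 10, n)              # first store location
y = rng.uniform(0, 10, n)              # second store location

apart = np.abs(x - y) > 5
print("P(more than 5 miles apart):", apart.mean())       # approx 1/4

window = np.abs(x - 3) < 0.05          # condition on x near the 3-mile marker
print("P(apart | x = 3):", apart[window].mean())          # approx 1/5
```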

The idea behind conditional probabilities is that one has partial information on what the state of Nature may be, and one wants to calculate the probability of events using this partial information. One way to represent partial information is in terms of a subfield; e.g., F is the field of events which distinguish outcomes in both the past and the future, and a subfield G contains events which distinguish only past outcomes. A conditional probability P(A|B) defined for B ∈ G can be interpreted for fixed A as a function from G into [0,1]. To emphasize this, conditional probabilities are sometimes written P(A|G), and G is termed the information set, or a family of events with the property that you know whether or not they happened at the time you are forming the conditional probability.
Example 1. (continued) If G = {∅,S,{HT,HH},{TT,TH}}, so that events in G describe the outcome of the first coin toss, then P({HH}|{HH,HT}) = P(HH)/(P(HH)+P(HT)) is the probability of heads on the second toss, given heads on the first toss. In this example, the conditional probability of a head on the second toss equals the unconditional probability of this event. In this case, the outcome of the first coin toss provides no information on the probabilities of heads from the second coin, and the two tosses are said to be statistically independent. If G = {∅,S,{HT,TH},{HH},{TT},{HH}^c,{TT}^c}, the family of events that determine the number of heads that occur in two tosses without regard for order, then the conditional probability of heads on the first toss, given at least one head, is P({HT,HH}|{TT}^c) = (P(HT)+P(HH))/(1−P(TT)) = 2/3. Then, the conditional probability of heads on the first toss given at least one head is not equal to the unconditional probability of heads on the first toss.
Example 3. (continued) Suppose G = {∅,S,(1,∞),(−∞,1]} is the σ-field corresponding to the event that the index exceeds 1, and let B denote the Borel σ-field containing all the open intervals. The unconditional probability P((s,∞)) = e^{−s/2} implies P((1,∞)) = e^{−1/2} = 0.6065. The conditional probability of (2,∞) given (1,∞) satisfies P((2,∞)|(1,∞)) = P((1,∞)∩(2,∞))/P((1,∞)) = e^{−1}/e^{−1/2} = 0.6065 > P((2,∞)) = 0.3679. The conditional and unconditional probabilities are not the same, so that the conditioning event provides information on the probability of (2,∞).
For a probability space (S,F,P), suppose A_1,...,A_k is a finite partition of S; i.e., A_i∩A_j = ∅ for i ≠ j and ∪_{i=1}^k A_i = S. The partition generates a finite field G ⊆ F. From the formula P(A∩B) = P(A|B)·P(B) satisfied by conditional probabilities, one has for an event C ∈ F the formula
P(C) = Σ_{i=1}^k P(C|A_i)·P(A_i).
This is often useful in calculating probabilities in applications where the conditional probabilities are available.

3.3.9. In a probability space (S,F,P), the concept of a conditional probability P(A|B) of A ∈ F given an event B in a σ-field G ⊆ F can be extended to cases where P(B) = 0 by defining P(A|B) as the limit of P(A|B_i) for sequences B_i ∈ G that satisfy P(B_i) > 0 and B_i → B, provided the limit exists. If we fix A, and consider P(A∩B) as a measure defined for B ∈ G, this measure obviously satisfies P(A∩B) ≤ P(B), so that it is absolutely continuous with respect to P(B). Then, Theorem 3.2 implies that there exists a function P(A|·) ∈ L_1(S,G,P) such that P(A∩B) = ∫_B P(A|s)P(ds). We have written this function as if it were a conditional probability of A given the "event" {s}, and it can be given this interpretation. If B ∈ G is an atom, then the measurability of P(A|·) with respect to G requires that it be constant for s ∈ B, so that P(A∩B) = P(A|s)·P(B) for any s ∈ B, and we can instead write P(A∩B) = P(A|B)·P(B), satisfying the definition of conditional probability even if P(B) = 0.
Example 4. (continued) Consider F = B⊗B, the product Borel σ-field on ℝ₊², and G = B⊗{∅,ℝ₊}, the σ-field corresponding to having complete information on the level of the index on the first day and no information on the second day. Suppose P((s,∞)×(t,∞)) = 2/(1+e^{s+t}). This is a probability on these open intervals that extends to F; verifying this takes some work. The conditional probability of (s,∞)×(t,∞) given the event (r,∞)×(0,∞) ∈ G and s ≤ r equals P((r,∞)×(t,∞)) divided by P((r,∞)×(0,∞)), or (1+e^r)/(1+e^{r+t}). The conditional probability of (s,∞)×(t,∞) given the event (r,r+δ)×(0,∞) ∈ G and s ≤ r is [1/(1+e^{r+t}) − 1/(1+e^{r+δ+t})]/[1/(1+e^r) − 1/(1+e^{r+δ})]. The limit of this expression as δ → 0 is e^t(1+e^r)²/(1+e^{r+t})² = P((s,∞)×(t,∞)|{r}×(0,∞)); this function of r is also the integrand that satisfies Theorem 3.2. Note that P((s,∞)×(t,∞)|{r}×(0,∞)) ≠ P((s,∞)×(t,∞)), so that the conditioning event conveys information about the probability of (s,∞)×(t,∞).
3.4. STATISTICAL INDEPENDENCE AND REPEATED TRIALS
3.4.1. Consider a probability space (S,F,P). Events A and C in F are statistically independent if P(A∩C) = P(A)·P(C). From the definition of conditional probability, if A and C are statistically independent and P(A) > 0, then P(C|A) = P(A∩C)/P(A) = P(C). Thus, when A and C are statistically independent, knowing that A occurs is unhelpful in calculating the probability that C occurs. The idea of statistical independence of events has an exact analogue in a concept of statistical independence of subfields. Let A = {∅,A,A^c,S} and C = {∅,C,C^c,S} be the subfields of F generated by A and C, respectively. Verify as an exercise that if A and C are statistically independent, then so is any pair of events taken from these two subfields. Then, one can say that the subfields A and C are statistically independent. One can extend this idea and talk about statistical independence in a collection of subfields. Let N denote an index set, which may be finite, countable, or non-countable. Let F_i denote a σ-subfield of F (F_i ⊆ F) for each i ∈ N. The subfields F_i are mutually statistically independent (MSI) if and only if P(∩_{j∈K} A_j) = Π_{j∈K} P(A_j) for all finite K ⊆ N and A_j ∈ F_j for j ∈ K. As in the case of statistical independence between two events (subfields), the concept of MSI can be stated in terms of conditional probabilities: F_i for i ∈ N are mutually statistically independent (MSI) if, for all i ∈ N, finite K ⊆ N\{i}, and A_j ∈ F_j for j ∈ {i}∪K, one has P(A_i | ∩_{j∈K} A_j) = P(A_i), so the conditional and unconditional probabilities are the same.
Example 1. (continued) Let A = {HH,HT} denote the event of a head for the first coin, C = {HH,TH} denote the event of a head for the second coin, D = {HH,TT} denote the event of a match, G = {HH} the event of two heads. The table below gives the probabilities of various events (for fair coins).

Event  A    C    D    G    A∩C  A∩D  C∩D  A∩C∩D  A∩G
Prob.  1/2  1/2  1/2  1/4  1/4  1/4  1/4  1/4    1/4

The result P(A∩C) = P(A)·P(C) = 1/4 establishes that A and C are statistically independent. Verify that A and D are statistically independent, and that C and D are statistically independent, but that P(A∩C∩D) ≠ P(A)·P(C)·P(D), so that A, C, and D are not MSI. Verify that A and G are not statistically independent.
Example 4. (continued) Recall that S = ℝ₊² with F = B⊗B, the product Borel σ-field. Define N = {∅,ℝ₊} and the subfields F_1 = B⊗N and F_2 = N⊗B, containing information on the index levels on the first and second day, respectively. Define G = {∅,(0,1],(1,∞),ℝ₊}, the subfield of B containing the qualitative information on whether a day's index is above one. Define F_3 to be the σ-subfield of B⊗B generated by sets of the form A_1×A_2 with A_1 ∈ G and A_2 ∈ B; then F_3 contains full information on the second day index, but only the qualitative information on whether the first day index is above one. Suppose P((s,∞)×(t,∞)) = e^{−s−t}. Then {F_1,F_2} are MSI. However, {F_1,F_3} are not independent.
Example 8. Consider S = {0, 1, 2, 3, 4, 5, 6, 7}, with F equal to all subsets of S. As a shorthand, let 0123 denote {0,1,2,3}, etc. Define the subfields
F_1 = {∅,0123,4567,S}, F_2 = {∅,2345,0167,S}, F_3 = {∅,0246,1357,S},
F_4 = {∅,01,23,4567,0123,234567,014567,S},
F_5 = {∅,01,23,45,67,0123,0145,0167,2345,2367,4567,012345,012367,014567,234567,S},
F_6 = {∅,06,17,24,35,0167,0246,0356,1247,1357,2345,123457,023456,013567,012467,S}.
The field F_4 is a refinement of the field F_1 (i.e., F_1 ⊆ F_4), and can be said to contain more information than F_1. The field F_5 is a mutual refinement of F_1 and F_2 (i.e., F_1∪F_2 ⊆ F_5), and is in fact the smallest mutual refinement. It contains all the information available in either F_1 or F_2. Similarly, F_6 is a mutual refinement of F_2 and F_3. The intersection of F_5 and F_6 is the field F_2; it is the common information available in F_5 and F_6. If, for example, F_5 characterized the information available to one economic agent, and F_6 characterized the information available to a second agent, then F_2 would characterize the common information upon which they could base contingent contracts. Suppose P({i}) = 1/8 for each i. Then {F_1, F_2, F_3} are MSI. E.g., P(0123|2345) = P(0123|0246) = P(0123|2345∩0246) = P(0123) = 1/2. However, {F_1, F_4} are not independent; e.g., 1 = P(0123|01) ≠ P(0123) = 1/2.
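For finite subfields like these, MSI can be verified by brute force over all choices of events; the sketch below (an added illustration, assuming the uniform probability P({i}) = 1/8) confirms that F_1, F_2, F_3 are MSI while F_1 and F_4 are not independent:

```python
from fractions import Fraction
from itertools import product
from math import prod

P = lambda A: Fraction(len(A), 8)        # uniform P({i}) = 1/8 on S = {0,...,7}

def field(*gens):                        # list events by their elements
    return [frozenset(g) for g in gens]

F1 = field((), (0,1,2,3), (4,5,6,7), range(8))
F2 = field((), (2,3,4,5), (0,1,6,7), range(8))
F3 = field((), (0,2,4,6), (1,3,5,7), range(8))
F4 = field((), (0,1), (2,3), (4,5,6,7), (0,1,2,3), (2,3,4,5,6,7), (0,1,4,5,6,7), range(8))

def mutually_independent(*fields):
    # product rule for every choice of one event from each field; since each
    # field contains S, this also covers every smaller subset of the collection
    return all(P(frozenset.intersection(*events)) == prod(P(A) for A in events)
               for events in product(*fields))

print(mutually_independent(F1, F2, F3))   # True
print(mutually_independent(F1, F4))       # False
```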
For M ⊆ N, let F_M denote the smallest σ-field containing F_i for all i ∈ M. Then MSI satisfies the following theorem, which provides a useful criterion for determining whether a collection of subfields is MSI:

Theorem 3.3. If F_i are MSI for i ∈ N, and M ⊆ N\{i}, then {F_i,F_M} are MSI. Further, F_i for i ∈ N are MSI if and only if {F_i,F_{N\{i}}} are MSI for all i ∈ N.
Example 8. (continued) If M = {2,3}, then F_M = F_6, and P(0123|A) = P(0123) = 1/2 for each A ∈ F_M with P(A) > 0.
3.4.2. The idea of repeated trials is that an experiment, such as a coin toss, is replicated over and over. It is convenient to have a common probability space in which to describe the outcomes of larger and larger experiments with more and more replications. The notation for repeated trials will be similar to that introduced in the definition of mutual statistical independence. Let N denote a finite or countable index set of trials, S_i a sample space for trial i, and G_i a σ-field of subsets of S_i. Note that (S_i,G_i) may be the same for all i. Assume that (S_i,G_i) is the real line with the Borel σ-field, or a countable set with the field of all subsets, or a pair with comparable mathematical properties (i.e., S_i is a complete separable metric space and G_i is its Borel field). Let t = (s_1,s_2,...) = (s_i : i∈N) denote an ordered sequence of outcomes of trials, and S_N = ×_{i∈N} S_i denote the sample space of these sequences. Let F_N = ⊗_{i∈N} G_i denote the σ-field of subsets of S_N generated by the finite rectangles, which are sets of the form (×_{i∈K} A_i)×(×_{i∈N\K} S_i) with K a finite subset of N and A_i ∈ G_i for i ∈ K. The collection F_N is called the product σ-field of subsets of S_N.
Example 9. N = {1,2,3}, S_i = {0,1}, G_i = {∅,{0},{1},S_i} is a sample space for a coin toss, coded "1" if heads and "0" if tails. Then S_N = {s_1s_2s_3 : s_i ∈ S_i} = {000, 001, 010, 011, 100, 101, 110, 111}, where 000 is shorthand for the event {0}×{0}×{0}, and so forth, is the sample space for three coin tosses. The field F_N is the family of all subsets of S_N.
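For this finite case the product construction is just an iterated Cartesian product; a minimal Python sketch (an added illustration):

```python
from itertools import combinations, product

S_i = (0, 1)                                   # one coin toss, coded 1 = heads
S_N = ["".join(map(str, s)) for s in product(S_i, repeat=3)]
print(S_N)        # ['000', '001', '010', '011', '100', '101', '110', '111']

# F_N: for this finite S_N, the product sigma-field is the full power set
F_N = [frozenset(c) for r in range(len(S_N) + 1) for c in combinations(S_N, r)]
print(len(F_N))   # 2**8 = 256 events
```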
For any subset K of N, define S_K = ×_{i∈K} S_i and G_K = ⊗_{i∈K} G_i. Then, G_K is the product σ-field on S_K. Define F_K to be the σ-field on S_N generated by sets of the form A×S_{N\K} for A ∈ G_K. Then G_K and F_K contain essentially the same information, but G_K is a field of subsets of S_K and F_K is a corresponding field of subsets of S_N which contains no information on events outside of K. Suppose P_N is a probability on (S_N,F_N). The restriction of P_N to (S_K,G_K) is a probability P_K defined for A ∈ G_K by P_K(A) = P_N(A×S_{N\K}). The following result establishes a link between different restrictions:
Theorem 3.4. If M ⊆ K and P_M, P_K are restrictions of P_N, then P_M and P_K satisfy the compatibility condition that P_M(A) = P_K(A×S_{K\M}) for all A ∈ G_M.
There is then a fundamental result that establishes that when probabilities are defined on all finite sequences of trials and are compatible, then there exists a probability defined on the infinite sequence of trials that yields each of the probabilities for a finite sequence as a restriction.

Theorem 3.5. If P_K on (S_K,G_K) for all finite K ⊆ N satisfy the compatibility condition, then there exists a unique P_N on (S_N,F_N) such that each P_K is a restriction of P_N.

This result guarantees that it is meaningful to make probability statements about events such as "an infinite number of heads in repeated coin tosses".
Suppose trials (S_i,G_i,P_i) indexed by i in a countable set N are mutually statistically independent. For finite K ⊆ N, let G_K denote the product σ-field on S_K. Then MSI implies that the probability of a set ×_{i∈K} A_i ∈ G_K satisfies P_K(×_{i∈K} A_i) = Π_{j∈K} P_j(A_j). Then, the compatibility condition in Theorem 3.5 is satisfied, and that result implies the existence of a probability P_N on (S_N,F_N) whose restrictions to (S_K,G_K) for finite K ⊆ N are the probabilities P_K.
3.4.3. The assumption of statistically independent repeated trials is a natural one for many statistical and econometric applications where the data comes from random samples from the population, such as surveys of consumers or firms. This assumption has many powerful implications, and will be used to get most of the results of basic econometrics. However, it is also common in econometrics to work with aggregate time series data. In these data, each period of observation can be interpreted as a new trial. The assumption of statistical independence across these trials is unlikely in many cases, because in most cases real random effects do not conveniently limit themselves to single time periods. The question becomes whether there are weaker assumptions that time series data are likely to satisfy that are still strong enough to get some of the basic statistical theorems. It turns out that there are quite general conditions, called mixing conditions, that are enough to yield many of the key results. The idea behind these conditions is that usually events that are far apart in time are nearly independent, because intervening shocks overwhelm the older history in determining the later event. This idea is formalized in Chapter 4.
3.5. RANDOM VARIABLES, DISTRIBUTION FUNCTIONS, AND EXPECTATIONS
3.5.1. A random variable X is a measurable real-valued function on a probability space (S,F,P), or X:S → ℝ. Then each state of Nature s determines a value X(s) of the random variable, termed its realization in state s. When the functional nature of the random variable is to be emphasized, it is denoted X(·), or simply X. When its values or realizations are used, they are denoted X(s) or x. For each set B ∈ B, the probability of the event that the realization of X is contained in B is well-defined and equals P′(B) = P(X^{-1}(B)), where P′ is termed the probability induced on ℝ by the random variable X. One can have many random variables defined on the same probability space; another measurable function y = Y(s) defines a second random variable. It is important in working with random variables to keep in mind that the random variable itself is a function of states of Nature, and that observations are of realizations of the random variable. Thus, when one talks about convergence of a sequence of random variables, one is actually talking about convergence of a sequence of functions, and notions of distance and closeness need to be formulated as distance and closeness of functions. Multiplying a random variable by a scalar, or adding random variables, results in another random variable. Then, the family of random variables forms a linear vector space. In addition, products of random variables are again random variables, so that the family of random variables forms an Abelian group under multiplication. The family of random variables is also closed under majorization, so that Z:S → ℝ defined by Z(s) = max(X(s),Y(s)) for random variables X and Y is again a random variable. Then, the family of random variables forms a lattice with respect to the partial order X ≤ Y (i.e., X(s) ≤ Y(s) almost surely).
3.5.2. The term measurable in the definition of a random variable means that for each set A in the Borel σ-field B of subsets of the real line, the inverse image X^{-1}(A) = {s∈S : X(s)∈A} is in the σ-field F of subsets of the sample space S. The assumption of measurability is a mathematical technicality that ensures that probability statements about the random variable are meaningful. We shall not make any explicit reference to measurability in basic econometrics, and shall always assume implicitly that the random variables we are dealing with are measurable.
3.5.3. The probability that a random variable X has a realization in a set A ∈ B is given by
F(A) = P(X^{-1}(A)) = P({s∈S : X(s)∈A}).
The function F is a probability on B; it is defined in particular for half-open intervals of the form A = (−∞,x], in which case F((−∞,x]) is abbreviated to F(x) and is called the distribution function (or cumulative distribution function, CDF) of X. From the properties of a probability, the distribution function has the properties
(i) F(−∞) = 0 and F(+∞) = 1.
(ii) F(x) is non-decreasing in x, and continuous from the right.
(iii) F(x) has at most a countable number of jumps, and is continuous except at these jumps. (Points without jumps are called continuity points.)
Conversely, any function F that satisfies (i) and (ii) determines uniquely a probability F on B. The support of the distribution F is the smallest closed set A ∈ B such that F(A) = 1.
Example 5. (continued) The standard normal CDF is Φ(x) = ∫_{−∞}^{x} (2π)^{−1/2}e^{−s²/2} ds, obtained by integrating the density φ(s) = (2π)^{−1/2}e^{−s²/2}. Other examples are the CDF for the standard exponential distribution, F(x) = 1 − e^{−x} for x > 0, and the CDF for the logistic distribution, F(x) = 1/(1+e^{−x}). An example of a CDF that has jumps is F(x) = 1 − e^{−x}/2 − Σ_{k=1}^∞ 1(k≥x)/2^{k+1} for x > 0.
3.5.4. If F is absolutely continuous with respect to a σ-finite measure ν on ℝ; i.e., F gives probability zero to any set that has ν-measure zero, then (by the Radon-Nikodym theorem) there exists a real-valued function f on ℝ, called the density (or probability density function, pdf) of X, such that
F(A) = ∫_A f(x)ν(dx)
for every A ∈ B. With the possible exception of a set of ν-measure zero, F is differentiable and the derivative of the distribution gives the density, f(x) = F′(x). When the measure ν is Lebesgue measure, so that the measure of an interval is its length, it is customary to simplify the notation and write F(A) = ∫_A f(x)dx.
If F is absolutely continuous with respect to counting measure on a countable subset C of ℝ, then it is called a discrete distribution, and there is a real-valued function f on C such that
F(A) = Σ_{x∈A} f(x).
Recall that the probability is itself a measure. This suggests a notation F(A) = ∫_A F(dx) that covers both continuous and counting cases. This is called a Lebesgue-Stieltjes integral.
3.5.5. If (ℝ,B,F) is the probability space associated with a random variable X, and g:ℝ → ℝ is a measurable function, then Y = g(X) is another random variable. The random variable Y is integrable with respect to the probability F if ∫ |g(x)|F(dx) < ∞; if it is integrable, then the integral ∫ g(x)F(dx) = ∫ g dF exists, is denoted E g(X), and is called the expectation of g(X). When necessary, this expectation will also be denoted E_X g(X) to identify the distribution used to form the expectation. When F is absolutely continuous with respect to Lebesgue measure, so that F has a density f, the expectation is written E g(X) = ∫ g(x)f(x)dx. Alternately, for counting measure on the integers with density f(k), E g(X) = Σ_{k=−∞}^{+∞} g(k)f(k).
The expectation of X, if it exists, is called the mean of X. The expectation of (X − EX)², if it exists, is called the variance of X. Define 1(X≤a) to be an indicator function that is one if X(s) ≤ a, and zero otherwise. Then, E 1(X≤a) = F(a), and the distribution function can be recovered from the expectations of the indicator functions. Most econometric applications deal with random variables that have finite variances. The space of these random variables is L_2(S,F,P), the space of random variables X for which E X² = ∫_S X(s)²P(ds) < ∞. The space L_2(S,F,P) is also termed the space of square-integrable functions. The norm in this space is root-mean-square, ‖X‖_2 = [∫_S X(s)²P(ds)]^{1/2}. Implications of X ∈ L_2(S,F,P) are E |X| ≤ ∫_S max(|X(s)|,1)P(ds) ≤ ∫_S (X(s)²+1)P(ds) = ‖X‖_2² + 1 and E (X − EX)² = ‖X‖_2² − (E X)² ≤ ‖X‖_2², so that X has a well-defined, finite mean and variance.
Example 1. (continued) Define a random variable X by
X(s) = 0 if s = TT, 1 if s = TH or HT, 2 if s = HH.
Then, X is the number of heads in two coin tosses. For a fair coin, E X = 1.
Example 2. (continued) Let X be a random variable defined to equal the number of heads that appear before a tail occurs. Then, possible values of X are the integers C = {0,1,2,...}, and C is the support of X. For x real, define ⌊x⌋ to be the largest integer k satisfying k ≤ x. A distribution function for X, defined on the real line, is F(x) = 1 − 2^{−⌊x+1⌋} for 0 ≤ x, and F(x) = 0 for x < 0; the associated density defined on C is f(k) = 2^{−k−1}. The expectation of X, obtained using evaluation of a special series from 2.1.10, is E X = Σ_{k=0}^∞ k·2^{−k−1} = 1.
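A quick numerical check of this mean (an added illustration; the series is truncated at a large cutoff):

```python
# E X = sum over k of k * P(X = k) with P(X = k) = 2**(-k-1)
mean = sum(k * 0.5 ** (k + 1) for k in range(200))
print(mean)   # 1.0 up to truncation error, matching E X = 1
```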
Example 3. (continued) Define a random variable X by X(s) = |s − 1|. Then, X is the magnitude of the deviation of the index from one. The inverse image of an interval (a,b) is (1−b,1−a)∪(1+a,1+b) ∈ F, so that X is measurable. Other examples of measurable random variables are Y defined by Y(s) = max{1,s} and Z defined by Z(s) = s³.
3.5.6. Consider a random variable Y on (ℝ,B). The expectation EY^k is the k-th moment of Y, and E(Y−EY)^k is the k-th central moment. Sometimes moments fail to exist. However, if g(Y) is continuous and bounded, then Eg(Y) always exists. The expectation m(t) = Ee^{tY} is termed the moment generating function (mgf) of Y; it sometimes fails to exist. Call a mgf proper if it is finite for t in an interval around 0. When a proper mgf exists, the random variable has finite moments of all orders. The expectation ψ(t) = Ee^{itY}, where i is the square root of −1, is termed the characteristic function (cf) of Y. The characteristic function always exists.
Example 5. (continued) A density f(x) that is symmetric about zero, such as the standard normal, has EX^k = ∫_{−∞}^{+∞} x^k f(x)dx = ∫_{0}^{+∞} [(−x)^k f(−x) + x^k f(x)]dx = [1 + (−1)^k]·∫_{0}^{+∞} x^k f(x)dx = 0 for k odd. Integration by parts yields the formula EX^k = 2k·∫_{0}^{+∞} x^{k−1}[1 − F(x)]dx for k even. For the standard normal, integration by parts in EX^{2k} = 2·∫_{0}^{+∞} (2π)^{−1/2} x^{2k−1} e^{−x²/2}·x dx gives EX^{2k} = (2k−1)·EX^{2k−2} for k ≥ 2, and EX² = 2[1 − Φ(0)] = 1. Then, EX⁴ = 3 and EX⁶ = 15. The moment generating function of the standard normal is m(t) = ∫_{−∞}^{+∞} (2π)^{−1/2} e^{tx} e^{−x²/2} dx. Completing the square in the exponent gives m(t) = e^{t²/2}·∫_{−∞}^{+∞} (2π)^{−1/2} e^{−(x−t)²/2} dx = e^{t²/2}.
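These moments and the mgf are easy to confirm numerically; the sketch below (an added illustration) integrates against the standard normal density:

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

for k in (1, 2, 3, 4, 6):
    moment, _ = quad(lambda x: x**k * phi(x), -np.inf, np.inf)
    print("E X^%d =" % k, round(moment, 6))               # 0, 1, 0, 3, 15

mgf = lambda t: quad(lambda x: np.exp(t * x) * phi(x), -np.inf, np.inf)[0]
print(mgf(0.5), np.exp(0.5**2 / 2))                       # both approx 1.1331
```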
3.5.7. If T random variables are formed into a vector, X(·) = (X(·,1),...,X(·,T)), the result is termed a random vector. For each s ∈ S, the realization of the random vector is a point (X(s,1),...,X(s,T)) in ℝ^T, and the random vector has an induced probability on ℝ^T which is characterized by its multivariate CDF, F_X(x_1,...,x_T) = P({s∈S : X(s,1)≤x_1,...,X(s,T)≤x_T}). Note that all the components of a random vector are functions of the same state of Nature s, and the random vector can be written as a measurable function X from the probability space (S,F,P) into (ℝ^T,B^T). (The notation B^T means B⊗B⊗...⊗B, T times, where B is the Borel σ-field on the real line. This is also called the product σ-field, and is sometimes written B^T = ⊗_{i=1,...,T} B_i, where the B_i are identical copies of B.) The measurability of X requires X^{-1}(C) ∈ F for each open rectangle C in ℝ^T. The independence or dependence of the components of X is determined by the fine structure of P on S.
A useful insight comes from considering different representations of vectors in finite-dimensional spaces, and extending these ideas to infinite-dimensional situations. To be specific, consider ℝ². When we express a function X on T = {1,2} as a point (X(1),X(2)) in this space, what we are really doing is defining two functions Z_1 = (1,0) and Z_2 = (0,1) with the property that Z_1 and Z_2 span the space, and then writing X as the linear combination X = X(1)·Z_1 + X(2)·Z_2. The pair of functions (points) Z_1 and Z_2 is called a Hamel basis for ℝ², and every point in the space has a unique representation in terms of this basis. However, there may be many different Hamel bases. For example, the unit function (1,1) and the function cos(πt), or (−1,1), also form a Hamel basis, and in terms of this basis X has the representation X = [(X(1)+X(2))/2]·(1,1) + [(X(2)−X(1))/2]·(−1,1).
Another way to write a random vector X is to define an index set T = {1,...,T}, and then define X as a real-valued function on S and T, X:S×T → ℝ. Then, X(·,t) is a simple random variable for each t ∈ T, and X(s,·) is a real vector that is a realization of X for each s ∈ S. A function defined in this way is also called a stochastic process, particularly when T is not finite. The measurability requirement on X is the same as before, but can be written in a different form as requiring that the inverse image of each open interval in ℝ be contained in F⊗T, where T is a σ-field of subsets of T that can be taken to be the family of all subsets of T and "⊗" denotes the operation that forms the smallest σ-field containing all sets A×B with A ∈ F and B ∈ T. There is then a complete duality between random vectors in a T-dimensional linear space and random functions on a T-dimensional index set. This duality between vectors and functions will generalize and provide useful insights into statistical applications in which T is a more general set indexing time. The distribution function (CDF) of X is
F(x_1,...,x_T) = P({s∈S : X_i(s) ≤ x_i for i = 1,...,T}).
If A ∈ B^T, define F(A) = P({s∈S : X(s)∈A}). If F(A) = 0 for every set A of Lebesgue measure zero, then there exists a probability density function (pdf) f(x_1,...,x_T) such that
(1) F(x_1,...,x_T) = ∫_{−∞}^{x_1} ∫_{−∞}^{x_2} ... ∫_{−∞}^{x_T} f(y_1,...,y_T) dy_1...dy_T.
F and f are termed the joint or multivariate CDF and pdf, respectively, of X. The random variable X_1 has a distribution that satisfies
F_1(x_1) = P({s∈S : X_1(s) ≤ x_1}) = F(x_1,+∞,...,+∞).
This random variable is measurable with respect to the σ-subfield G_1 containing the events whose occurrence is determined by X_1 alone; i.e., G_1 is the family generated by sets of the form A×ℝ×...×ℝ with A ∈ B. If F is absolutely continuous with respect to Lebesgue measure on B^T, then there are associated densities f and f_1 satisfying
(2) F_1(x_1) = ∫_{−∞}^{x_1} f_1(y_1) dy_1
(3) f_1(x_1) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f(x_1,y_2,...,y_T) dy_2...dy_T.
F_1 and f_1 are termed the marginal CDF and pdf, respectively, of X_1.

3.5.8. Corresponding to the concept of a conditional probability, we can define a conditional distribution: Suppose C is an event in G_1 with P(C) > 0. Then, define F^(2)(x_2,...,x_n|C) = F({y∈ℝ^n : y_1∈C, y_2≤x_2,...,y_n≤x_n})/F_1(C) to be the conditional distribution of (X_2,...,X_n) given X_1 ∈ C. When F is absolutely continuous with respect to Lebesgue measure on ℝ^n, the conditional distribution can be written in terms of the joint density,

F^(2)(x_2,...,x_n|C) = [∫_{y_1∈C} ∫_{−∞}^{x_2} ... ∫_{−∞}^{x_n} f(y_1,y_2,...,y_n) dy_1 dy_2...dy_n] / [∫_{y_1∈C} ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f(y_1,y_2,...,y_n) dy_1 dy_2...dy_n].

Taking the limit as C shrinks to a point X_1 = x_1, one obtains the conditional distribution of (X_2,...,X_n) given X_1 = x_1,

F^(2)(x_2,...,x_n|X_1=x_1) = [∫_{−∞}^{x_2} ... ∫_{−∞}^{x_n} f(x_1,y_2,...,y_n) dy_2...dy_n] / f_1(x_1),

provided f_1(x_1) > 0. Finally, associated with this conditional distribution is the conditional density f^(2)(x_2,...,x_n|X_1=x_1) = f(x_1,x_2,...,x_n)/f_1(x_1). More generally, one could consider the marginal distributions of any subset, say X_1,...,X_k, of the vector X, with X_{k+1},...,X_n integrated out; and the conditional distributions of one or more of the variables X_{k+1},...,X_n given one or more of the conditions X_1 = x_1,...,X_k = x_k.
3.5.9. ust as expectations are deIined Ior a single random variable, it is possible to deIine
expectations Ior a vector oI random variables. For example, E(X
1
- EX
1
)(X
2
-EX
2
) is called the
covariance oI X
1
and X
2
, and Ee
tX
, where t (t
1
,...,t
n
) is a vector oI constants, is a (multivariate)
moment generating Iunction Ior the random vector X. Here are some useIul properties oI
expectations oI vectors:
(a) II g(X) is a Iunction oI a random vector, then Eg(X) is the integral oI g with respect to the
distribution oI X. When g depends on a subvector oI X, then Eg(X) is the integral oI g(y) with
respect to the marginal distribution oI this subvector.
(b) II X and Z are random vectors oI length n, and a and b are scalars, then E(aX bZ) aEX
bEZ.
(c) |Cauchy-Schwartz inequality| II X and Z are random vectors oI length n, then (EXZ)
2
<
(EXX)(EZZ).
(d) [Minkowski inequality] If X is a random vector of length n and r ≥ 1 is a scalar, then (E|Σ_{i=1}^n Xi|^r)^{1/r} ≤ Σ_{i=1}^n (E|Xi|^r)^{1/r}.
(e) [Loève inequality] If X is a random vector of length n and r > 0, then E|Σ_{i=1}^n Xi|^r ≤ max(1,n^{r-1}) Σ_{i=1}^n E|Xi|^r.
(f) [Jensen inequality] If X is a random vector and g(x) is a convex function, then E g(X) ≥ g(EX). If g(x) is a concave function, the inequality is reversed.
When expectations exist, they can be used to bound the probability that a random variable takes on extreme values.
Theorem 3.6. Suppose X is an n×1 random vector and C is a positive scalar.
a. [Markov bound] If max_i E|Xi| < ∞, then max_i Pr(|Xi| > C) ≤ max_i E|Xi|/C.
b. [Chebyshev bound] If EX′X < ∞, then Pr(|X| > C) ≤ EX′X/C².
c. [Chernoff bound] If Ee^{t′X} exists for all vectors t in some neighborhood of zero, then for some positive scalars λ and M, Pr(|X| > C) ≤ Me^{-λC}.
Proof: All these inequalities are established by the same technique: If r(y) is a positive non-decreasing function of y > 0, and Er(|X|) < ∞, then

Pr(|X| > C) = ∫_{|x|>C} F(dx) ≤ ∫_{|x|>C} [r(|x|)/r(C)] F(dx) ≤ Er(|X|)/r(C).

Taking r(y) = y² gives the result directly for the Chebyshev bound. In the remaining cases, first get a component-by-component inequality. For the Markov bound, Pr(|Xi| > C) ≤ E|Xi|/C for each i gives the result. For the Chernoff bound,

Pr(|X| > C) ≤ Σ_{i=1}^n [Pr(Xi > Cn^{-1/2}) + Pr(Xi < -Cn^{-1/2})],

since if the event on the left occurs, one of the events on the right must occur. Then apply the inequality Pr(Xi > Cn^{-1/2}) ≤ Er(Xi)/r(Cn^{-1/2}) with r(y) = e^{λy} to each term in the right-hand-side sum. The inequality for vectors is built up from a corresponding inequality for each component. □
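For illustration, a minimal Python sketch (with an assumed exponential(1) sample, not an example from the text) compares these bounds with an empirical tail frequency; the bounds hold but are typically loose.

```python
import numpy as np

# Compare Markov and Chebyshev tail bounds from Theorem 3.6 with the empirical
# tail frequency of an exponential(1) sample at the threshold C = 4.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)
C = 4.0

markov = x.mean() / C                 # Pr(X > C) <= EX/C  (X is nonnegative here)
chebyshev = np.mean(x**2) / C**2      # Pr(|X| > C) <= EX^2/C^2
empirical = np.mean(x > C)            # exact value is exp(-4), about 0.018

print(empirical, markov, chebyshev)   # the bounds hold, but are not tight
```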
3.5.10. When the expectation of a random variable is taken with respect to a conditional distribution, it is called a conditional expectation. If F(x|C) is the conditional distribution of a random vector X given the event C, then the conditional expectation of a function g(X) given C is defined as

E_{X|C} g(X) = ∫ g(y) F(dy|C).

Another notation for this expectation is E(g(X)|C). When the distribution of the random variable X is absolutely continuous with respect to Lebesgue measure, so that it has a density f(x), the conditional density can be written as f(x|C) = f(x)·1(x∈C)/∫_C f(s)ds, and the conditional expectation can then be written

E_{X|C} g(X) = ∫_C g(x) f(x|C) dx = ∫_C g(x) f(x) dx / ∫_C f(x) dx.
When the distribution of X is discrete, this formula becomes

E_{X|C} g(X) = Σ_{k∈C} g(k) f(k) / Σ_{k∈C} f(k).

The conditional expectation is actually a function on the σ-field C of conditioning events, and is sometimes written E_{X|C} g(X) or E(g(X)|C) to emphasize this dependence.
Suppose A1,...,Ak partition the domain of X. Then the distribution satisfies
F(x) = Σ_{i=1}^k F(x|Ai) F(Ai),

implying

Eg(X) = ∫ g(x) F(dx) = Σ_{i=1}^k ∫ g(x) F(dx|Ai) F(Ai) = Σ_{i=1}^k E{g(X)|Ai} F(Ai).

This is called the law of iterated expectations, and is heavily used in econometrics.
Example 2. (continued) Recall that X is the number of heads that appear before a tail in a sequence of coin tosses, and that the probability of X = k is 2^{-k-1} for k = 0,1,... . Let C be the event of an even number of heads. Then,

E_{X|C} X = Σ_{k=0,2,4,...} k·2^{-k-1} / Σ_{k=0,2,4,...} 2^{-k-1} = Σ_{j=0,1,2,...} j·4^{-j} / [Σ_{j=0,1,2,...} 4^{-j}/2] = 2/3,

where the second ratio is obtained by substituting k = 2j, and the value is obtained using the summation formulas for a geometric series from 2.1.10. A similar calculation for the event A of an odd number of heads yields E_{X|A} X = 5/3. The probability of an even number of heads is P(C) = Σ_{k=0,2,4,...} 2^{-k-1} = 2/3. The law of iterated expectations then gives

E X = E(X|C)P(C) + E(X|A)P(A) = (2/3)(2/3) + (5/3)(1/3) = 1,

which confirms the direct calculation of E X.
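These numbers can also be checked by simulation; the following minimal Python sketch draws the geometric count of heads before the first tail and verifies the conditional expectations, the probability of an even count, and the iterated-expectations identity.

```python
import numpy as np

# Simulate X = number of heads before the first tail with a fair coin,
# P(X = k) = 2^{-(k+1)}, and check E[X|even] = 2/3, E[X|odd] = 5/3, P(even) = 2/3, EX = 1.
rng = np.random.default_rng(0)
x = rng.geometric(p=0.5, size=1_000_000) - 1   # numpy's geometric counts trials, so subtract 1
even = (x % 2 == 0)

e_even, e_odd, p_even = x[even].mean(), x[~even].mean(), even.mean()
print(e_even, e_odd, p_even)                               # ~0.667, ~1.667, ~0.667
print(e_even * p_even + e_odd * (1 - p_even), x.mean())    # both ~1.0
```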
The concept of a conditional expectation is very important in econometrics and in economic theory, so we will work out its properties in some detail for the case of two variables. Suppose random variables (U,X) have a joint density f(u,x). The marginal density of X is defined by

g(x) = ∫_{-∞}^{+∞} f(u,x) du,

and the conditional density of U given X = x is defined by f(u|x) = f(u,x)/g(x), provided g(x) > 0. The conditional expectation of a function h(U,X) satisfies E(h(U,X)|X=x) = ∫ h(u,x) f(u|x) du, and is a function of x. The unconditional expectation of h(U,X) satisfies
Eh(U,X) = ∫∫ h(u,x) f(u,x) du dx = ∫_{-∞}^{+∞} [∫_{-∞}^{+∞} h(u,x) f(u|x) du] g(x) dx = E_X E_{U|X} h(U,X);
another example of the law of iterated expectations. The conditional mean of U given X = x is M_{U|X}(x) = E_{U|X=x} U; by the law of iterated expectations, the conditional and unconditional mean are related by E_U U = E_X E_{U|X} U = E_X M_{U|X}(X). The conditional variance of U is defined by V(U|x) = E_{U|X}(U - M_{U|X}(x))². It is related to the unconditional variance by the formula
E_U(U - E_U U)² = E_X E_{U|X}(U - M_{U|X}(X) + M_{U|X}(X) - E_U U)²
  = E_X E_{U|X}(U - M_{U|X}(X))² + E_X E_{U|X}(M_{U|X}(X) - E_U U)² + 2 E_X E_{U|X}(U - M_{U|X}(X))(M_{U|X}(X) - E_U U)
  = E_X V(U|X) + E_X (M_{U|X}(X) - E_U U)² + 2 E_X (M_{U|X}(X) - E_U U)·E_{U|X}(U - M_{U|X}(X))
  = E_X V(U|X) + E_X (M_{U|X}(X) - E_U U)².

Thus, the unconditional variance equals the expectation of the conditional variance plus the variance of the conditional expectation.
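This decomposition is easy to verify numerically; the following sketch uses an assumed model U = X + e with X ~ N(0,1) and independent e ~ N(0, 0.25), so that M_{U|X}(x) = x and V(U|x) = 0.25.

```python
import numpy as np

# Check Var(U) = E[V(U|X)] + Var(M_{U|X}(X)) for U = X + e, e ~ N(0, 0.25).
rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
u = x + rng.normal(scale=0.5, size=x.size)

total_var = u.var()
decomposition = 0.25 + x.var()        # E[V(U|X)] + Var(M(X))
print(total_var, decomposition)       # both approximately 1.25
```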
Example 10: Suppose (U,X) are bivariate normal with means EU = μ_u and EX = μ_x, and second moments E(U-μ_u)² = σ_u², E(X-μ_x)² = σ_x², and E(U-μ_u)(X-μ_x) = σ_ux = ρσ_uσ_x. Define

Q = [1/(1-ρ²)]·[((u-μ_u)/σ_u)² + ((x-μ_x)/σ_x)² - 2ρ((u-μ_u)/σ_u)((x-μ_x)/σ_x)],

and observe that

Q = ((x-μ_x)/σ_x)² + [1/(1-ρ²)]·[(u-μ_u)/σ_u - ρ(x-μ_x)/σ_x]².

The bivariate normal density is f(u,x) = [2πσ_uσ_x(1-ρ²)^{1/2}]^{-1} exp(-Q/2). The marginal density of X is normal with mean μ_x and variance σ_x²: n(x-μ_x,σ_x²) = (2πσ_x²)^{-1/2} exp(-(x-μ_x)²/2σ_x²). This can be derived from the bivariate density by completing the square for u in Q and integrating over u. The conditional density of U given X then satisfies

f(u|x) = [2πσ_uσ_x(1-ρ²)^{1/2}]^{-1} exp(-Q/2) / [(2πσ_x²)^{-1/2} exp(-(x-μ_x)²/2σ_x²)]
       = [2πσ_u²(1-ρ²)]^{-1/2} exp( -[1/(2(1-ρ²))]·[(u-μ_u)/σ_u - ρ(x-μ_x)/σ_x]² ).

Hence the conditional distribution of U, given X = x, is normal with conditional mean E(U|X=x) = μ_u + ρσ_u(x-μ_x)/σ_x = μ_u + σ_ux(x-μ_x)/σ_x² and variance V(U|X=x) = E((U-E(U|X=x))²|X=x) = σ_u²(1-ρ²) = σ_u² - σ_ux²/σ_x². When U and X are joint normal random vectors with EU = μ_u, EX = μ_x, E(U-μ_u)(U-μ_u)′ = Σ_uu, E(X-μ_x)(X-μ_x)′ = Σ_xx, and E(U-μ_u)(X-μ_x)′ = Σ_ux, then (U|X=x) is normal with E(U|X=x) = μ_u + Σ_uxΣ_xx^{-1}(x - μ_x) and V(U|X=x) = Σ_uu - Σ_uxΣ_xx^{-1}Σ_xu.
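The conditional mean and variance formulas can be checked by simulation; in the following sketch the means and covariance are illustrative numbers chosen for the check, not values from the example.

```python
import numpy as np

# Check E(U|X=x0) = mu_u + (sigma_ux/sigma_x^2)(x0 - mu_x) and
# V(U|X=x0) = sigma_u^2 - sigma_ux^2/sigma_x^2 for an assumed bivariate normal.
rng = np.random.default_rng(2)
mu_u, mu_x = 1.0, 2.0
var_u, var_x, cov_ux = 4.0, 1.0, 1.2
u, x = rng.multivariate_normal([mu_u, mu_x],
                               [[var_u, cov_ux], [cov_ux, var_x]],
                               size=2_000_000).T

x0 = 2.5
keep = np.abs(x - x0) < 0.01                                  # crude conditioning on X near x0
print(u[keep].mean(), mu_u + cov_ux / var_x * (x0 - mu_x))    # both about 1.6
print(u[keep].var(), var_u - cov_ux ** 2 / var_x)             # both about 2.56
```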
3.5.11. Conditional densities satisfy f(u,x) = f(u|x)g(x) = f(x|u)h(u), where h(u) is the marginal density of U, and hence f(u|x) = f(x|u)h(u)/g(x). This is called Bayes Law. When U and X are independent, f(u,x) = h(u)g(x), or f(u|x) = h(u) and f(x|u) = g(x). For U and X independent, and r(·) and s(·) any functions, one has E(r(U)|X=x) = ∫r(u)f(u|x)du = ∫r(u)h(u)du = Er(U), and E(r(U)s(X)) = ∫∫r(u)s(x)f(u,x)dudx = ∫s(x)g(x)[∫r(u)f(u|x)du]dx = ∫s(x)g(x)Er(U)dx = [Es(X)][Er(U)], or cov(r(U),s(X)) = 0, provided Er(U) and Es(X) exist. If r(u) = u - EU, then E(r(U)|X=x) = 0 and cov(U,X) = E(U-EU)X = 0. Conversely, suppose U and X are jointly distributed. If cov(r(U),s(X)) = 0 for all functions r(·), s(·) such that Er(U) and Es(X) exist, then X and U are independent. To see this, choose r(u) = 1 for u ≤ u*, r(u) = 0 otherwise; choose s(x) = 1 for x ≤ x*, s(x) = 0 otherwise. Then Er(U) = H(u*) and Es(X) = G(x*), where H and G are the marginal cumulative distribution functions, and 0 = cov(r(U),s(X)) = F(u*,x*) - H(u*)G(x*), where F is the joint cumulative distribution function. Hence, F(u,x) = H(u)G(x), and X, U are independent.
Note that cov(U,X) = 0 is not sufficient to imply U,X independent. For example, g(x) = 1/2 for -1 ≤ x ≤ 1 and f(u|x) = 1/2 for -1 ≤ u - x² ≤ 1 is nonindependent with E(U|X=x) = x², but cov(U,X) = EX³ = 0. Furthermore, E(U|X=x) = 0 is not sufficient to imply U,X independent. For example, g(x) = 1/2 for -1 ≤ x ≤ 1 and f(u|x) = 1/[2(1+x²)] for -(1+x²) ≤ u ≤ (1+x²) is nonindependent, since E(U²|X=x) = (1+x²)²/3 varies with x while EU² = 28/45, but E(U|X=x) = 0.
Example 11. Suppose monthly family income (in thousands of dollars) is a random variable Y with CDF F(y) = 1 - y^{-2} for y > 1. Suppose a random variable Z is one for home owners and zero otherwise, and that the conditional probability of the event Z = 1, given Y, is (Y-1)/Y. The unconditional expectation of Y is 2. The joint density of Y and Z is f(y)g(z|y) = (2y^{-3})(1 - y^{-1}) for z = 1. The unconditional probability of Z = 1 is then ∫_1^∞ f(y)g(1|y) dy = 1/3. Bayes Law gives the conditional density of Y given Z = 1, f(y|z=1) = f(y)g(1|y)/∫_1^∞ f(y)g(1|y)dy = (6y^{-3})(1 - y^{-1}), so that the conditional expectation of Y given Z = 1 is E(Y|Z=1) = ∫_1^∞ y f(y|z=1) dy = 3.
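A quick simulation check of this example (a sketch, drawing Y by the inverse-CDF method) reproduces P(Z=1) = 1/3 and E(Y|Z=1) = 3; convergence of the conditional mean is slow because the conditional distribution of Y has heavy tails.

```python
import numpy as np

# Draw Y from F(y) = 1 - y^-2 via the inverse CDF, draw Z = 1 with probability (Y-1)/Y,
# and verify P(Z=1) = 1/3 and E(Y|Z=1) = 3 by simulation.
rng = np.random.default_rng(3)
y = (1.0 - rng.uniform(size=2_000_000)) ** -0.5   # inverse of F(y) = 1 - y^-2
z = rng.uniform(size=y.size) < (y - 1.0) / y

print(z.mean())        # about 1/3
print(y[z].mean())     # about 3 (slowly converging: infinite conditional variance)
```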
Example 12. The problem of interpreting the results of medical tests illustrates Bayes Law. A blood test for prostate cancer is known to yield a "positive" with probability 0.9 if cancer is present, and a false "positive" with probability of 0.2 if cancer is not present. The prevalence of the cancer in the population of males is 0.05. Then, the conditional probability of cancer, given a "positive" test result, equals the joint probability of cancer and a positive test result, (0.05)(0.9) = 0.045, divided by the probability of a positive test result, (0.05)(0.9) + (0.95)(0.2) = 0.235, or about 0.19. Thus, a "positive" test has a low probability of identifying a case of cancer, and if all "positive" tests were followed by surgery, about 80 percent of these surgeries would prove unnecessary.
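The same Bayes Law calculation, written out in a few lines of Python for illustration:

```python
# Bayes Law with the prevalence, sensitivity, and false-positive rate given above.
prevalence, sensitivity, false_positive = 0.05, 0.9, 0.2

p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive
p_cancer_given_positive = prevalence * sensitivity / p_positive

print(p_positive)                   # 0.235
print(p_cancer_given_positive)      # about 0.19
print(1 - p_cancer_given_positive)  # about 0.81, the share of unnecessary surgeries
```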
3.5.12. The discussion of expectations will be concluded with a list of detailed properties of characteristic functions and moment generating functions:
a. ψ(t) = Ee^{itY} = E cos(tY) + i·E sin(tY).
b. Z = a + bY has the cf e^{ita}ψ(bt), and Z = f(Y) has the cf Ee^{itf(Y)}.
c. If E|Y|^k exists, then ψ^{(k)}(t) = d^kψ(t)/dt^k exists, satisfies the bound |d^kψ(t)/dt^k| ≤ E|Y|^k, and is uniformly continuous, and EY^k = i^{-k}ψ^{(k)}(0). If ψ^{(k)}(t) exists, then E|Y|^k exists.
d. If Y has finite moments through order k, then ψ(t) has a Taylor's expansion

ψ(t) = Σ_{j=0}^{k} (i^j EY^j) t^j/j! + [ψ^{(k)}(λt) - ψ^{(k)}(0)] t^k/k!,

where λ is a scalar with 0 < λ < 1; the Taylor's expansion satisfies the bounds

|ψ(t) - Σ_{j=0}^{k-1} (i^j EY^j) t^j/j!| ≤ |t|^k E|Y|^k / k!

and

|ψ(t) - Σ_{j=0}^{k} (i^j EY^j) t^j/j!| ≤ 2|t|^k E|Y|^k / k!.
If E|Y|^k exists, then the expression ζ(t) = Ln ψ(t), called the second characteristic function or cumulant generating function, has a Taylor's expansion

ζ(t) = Σ_{j=1}^{k} κ_j (it)^j/j! + [ζ^{(k)}(λt) - ζ^{(k)}(0)] t^k/k!,

where ζ^{(k)} = d^kζ/dt^k, and λ is a scalar with 0 < λ < 1. The expressions κ_j are called the cumulants of the distribution, and satisfy κ_1 = EY and κ_2 = Var(Y). The expression κ_3/κ_2^{3/2} is called the skewness, and the expression κ_4/κ_2² (equivalently, μ_4/μ_2² - 3 in terms of central moments μ_j) is called the kurtosis (i.e., thickness of tails relative to center) of the distribution.
e. If Y is normally distributed with mean μ and variance σ², then its characteristic function is exp(iμt - σ²t²/2). The normal has cumulants κ_1 = μ, κ_2 = σ², κ_3 = κ_4 = 0.
f. Random variables X and Y have identical distribution functions if and only if they have identical characteristic functions.
g. If Y_n →_d Y (see Chap. 4.1), then the associated characteristic functions satisfy ψ_n(t) → ψ(t) for each t. Conversely, if Y_n has characteristic function ψ_n(t) converging pointwise to a function ψ(t) that is continuous at t = 0, then there exists Y such that ψ(t) is the characteristic function of Y and Y_n →_d Y.
h. The characteristic function of a sum of independent random variables equals the product of the characteristic functions of these random variables, and the second characteristic function of a sum of independent random variables is the sum of the second characteristic functions of these variables; the characteristic function of a mean of n independently identically distributed random variables, with characteristic function ψ(t), is ψ(t/n)^n.
Similar properties hold for proper moment generating functions, with obvious modifications: Suppose a random variable Y has a proper mgf m(t), finite for |t| < τ, where τ is a positive constant. Then, the following properties hold:
a. m(t) = Ee^{tY} for |t| < τ.
b. Z = a + bY has the mgf e^{ta}m(bt).
c. EY^k exists for all k > 0, and m^{(k)}(t) = d^km(t)/dt^k exists and is uniformly continuous for |t| < τ, with EY^k = m^{(k)}(0).
d. m(t) has a Taylor's expansion (for any k) m_Y(t) = Σ_{j=0}^{k} (EY^j)t^j/j! + [m^{(k)}(λt) - m^{(k)}(0)]t^k/k!, where λ is a scalar with 0 < λ < 1.
e. If Y is normally distributed with mean μ and variance σ², then it has mgf exp(μt + σ²t²/2).
f. Random variables X and Y with proper mgf have identical distribution functions if and only if their mgf are identical.
g. If Y_n →_d Y and the associated mgf are finite for |t| < τ, then the mgf of Y_n converges pointwise to the mgf of Y. Conversely, if the Y_n have proper mgf which converge pointwise to a function m(t) that is finite for |t| < τ, then there exists Y such that m(t) is the mgf of Y and Y_n →_d Y.
h. The mgf of a sum of independent random variables equals the product of the mgf of these random variables; the mgf of the mean of n independently identically distributed random variables, each with proper mgf m(t), is m(t/n)^n.
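Property (h) is easy to see at work numerically; the following sketch (with an assumed exponential(1) example) compares a Monte Carlo estimate of the mgf of a sample mean with m(t/n)^n.

```python
import numpy as np

# For Y_i i.i.d. exponential with mean 1, m(t) = 1/(1-t) for t < 1, so the mgf of the
# sample mean of n draws is m(t/n)^n = (1 - t/n)^(-n).
rng = np.random.default_rng(4)
n, t = 5, 0.8
ybar = rng.exponential(size=(400_000, n)).mean(axis=1)

print(np.exp(t * ybar).mean())     # Monte Carlo estimate of E exp(t * Ybar)
print((1 - t / n) ** (-n))         # exact value, about 2.39
```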
The definitions of characteristic and moment generating functions can be extended to vectors of random variables. Suppose Y is an n×1 random vector, and let t be an n×1 vector of constants. Then ψ(t) = Ee^{it′Y} is the characteristic function and m(t) = Ee^{t′Y} is the moment generating function. The properties of cf and mgf listed above also hold in their multivariate versions, with obvious modifications. For characteristic functions, two of the important properties translate to
(b′) Z = a + BY, where a is an m×1 vector and B is an m×n matrix, has cf e^{it′a}ψ(B′t).
(e′) If Y is multivariate normal with mean μ and covariance matrix Σ, then its characteristic function is exp(iμ′t - t′Σt/2).
A useful implication of (b′) and (e′) is that a linear transformation of a multivariate normal vector is again multivariate normal. Conditions (c) and (d) relating Taylor's expansions and moments for univariate cf have multivariate versions where the expansions are in terms of partial derivatives of various orders. Conditions (f) through (h) are unchanged in the multivariate version.
The properties of characteristic functions and moment generating functions are discussed and established in C. R. Rao, Linear Statistical Inference, 2b.4, and W. Feller, An Introduction to Probability Theory, II, Chap. 13 and 15.
3.6. TRANSFORMATIONS OF RANDOM VARIABLES
3.6.1. Suppose X is a measurable random variable on (ℝ,B) with a distribution F(x) that is absolutely continuous with respect to Lebesgue measure, so that X has a density f(x). Consider an increasing transformation Y = H(X); then Y is another random variable. Let h denote the inverse function of H; i.e., y = H(x) implies x = h(y). The distribution function of Y is given by

G(y) = Pr(Y ≤ y) = Pr(H(X) ≤ y) = Pr(X ≤ h(y)) = F(h(y)).

When h(y) is differentiable, with a derivative h′(y) = dh(y)/dy, the density of Y is obtained by differentiating, and satisfies g(y) = f(h(y))h′(y). Since y = H(h(y)), one obtains by differentiation the formula 1 = H′(h(y))h′(y), or h′(y) = 1/H′(h(y)). Substituting this formula gives g(y) = f(h(y))/H′(h(y)).
Example 13. Suppose X has the distribution function F(x) = 1 - e^{-x} for x > 0, with F(x) = 0 for x ≤ 0; then X is said to have an exponential distribution. Suppose Y = H(X) = log X, so that X = h(Y) = e^Y. Then, G(y) = 1 - exp(-e^y) and g(y) = exp(-e^y)·e^y = exp(y - e^y) for -∞ < y < +∞. This is called an extreme value distribution. A third example is X with some distribution function F and density f, and Y = F(X), so that for any value of X, the corresponding value of Y is the proportion of all X that are below this value. Let x_p denote the solution to F(x) = p. The distribution function of Y is G(y) = F(x_y) = y. Hence, Y has the uniform density on the unit interval.
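Both transformations are easy to check by simulation; the sketch below draws exponential variables and verifies the extreme value CDF of log X and the uniformity of F(X).

```python
import numpy as np

# If X is exponential(1), then Y = log X has CDF 1 - exp(-exp(y)), and
# U = F(X) = 1 - exp(-X) is uniform on (0,1) (the probability integral transform).
rng = np.random.default_rng(5)
x = rng.exponential(size=500_000)

y = np.log(x)
print(np.mean(y <= 0.3), 1 - np.exp(-np.exp(0.3)))    # empirical vs. exact CDF at 0.3

u = 1 - np.exp(-x)
print(np.mean(u <= 0.25), np.mean(u <= 0.75))         # about 0.25 and 0.75
```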
The rule for an increasing transformation of a random variable X can be extended in several ways. If the transformation Y = H(X) is decreasing rather than increasing, then

G(y) = Pr(Y ≤ y) = Pr(H(X) ≤ y) = Pr(X ≥ h(y)) = 1 - F(h(y)),

where h is the inverse function of H. Differentiating,

g(y) = f(h(y))(-h′(y)).

Then, combining cases, one has the result that for any one-to-one transformation Y = H(X) with inverse X = h(Y), the density of Y is

g(y) = f(h(y))|h′(y)| = f(h(y))/|H′(h(y))|.

An example of a decreasing transformation is X with the exponential density e^{-x} for x > 0, and Y = 1/X. Show as an exercise that G(y) = e^{-1/y} and g(y) = e^{-1/y}/y² for y > 0.
Consider a transformation Y = H(X) that is not one-to-one. The interval (-∞,y] is the image of a set A_y of x values that may have a complicated structure. One can write

G(y) = Pr(Y ≤ y) = Pr(H(X) ≤ y) = Pr(X ∈ A_y) = F(A_y).

If this expression is differentiable, then its derivative gives the density.
Example 14. If X has a distribution F and density f, and Y = |X|, then A_y = [-y,y], implying G(y) = F(y) - F(-y) and g(y) = f(y) + f(-y).
Example 15. If Y = X², then A_y = [-y^{1/2}, y^{1/2}], G(y) = F(y^{1/2}) - F(-y^{1/2}). Differentiating for y > 0, g(y) = [f(y^{1/2}) + f(-y^{1/2})]/2y^{1/2}. Applying this to the standard normal with F(x) = Φ(x), the density of Y is g(y) = φ(y^{1/2})/y^{1/2} = (2πy)^{-1/2}e^{-y/2}, called the chi-square with one degree of freedom.
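A quick Monte Carlo check of this example, comparing the squared standard normal with the chi-square(1) CDF:

```python
import numpy as np
from scipy import stats

# The square of a standard normal draw has the chi-square(1) distribution.
rng = np.random.default_rng(6)
y = rng.normal(size=500_000) ** 2

print(np.mean(y <= 1.0), stats.chi2.cdf(1.0, df=1))   # both about 0.683
print(y.mean(), y.var())                              # about 1 and 2, matching chi-square(1)
```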
3.6.2. Next consider transformations of random vectors. These transformations will permit us to analyze sums or other functions of random variables. Suppose X is an n×1 random vector. Consider first the transformation Y = AX, where A is a nonsingular n×n matrix. The following result from multivariate calculus relates the densities of X and Y:

Theorem 3.8. If X has density f(x), and Y = AX, with A nonsingular, then the density of Y is g(y) = f(A^{-1}y)/|det(A)|.
Proof: We will prove the result in two dimensions, leaving the general case to the reader. First, consider the diagonal case

(Y1, Y2)′ = [a11 0; 0 a22](X1, X2)′

with a11 > 0 and a22 > 0. One has G(y1,y2) = F(y1/a11, y2/a22). Differentiating with respect to y1 and y2, g(y1,y2) = f(y1/a11, y2/a22)/a11a22. This establishes the result for diagonal transformations. Second, consider the triangular case

(Y1, Y2)′ = [a11 0; a21 a22](X1, X2)′

with a11 > 0 and a22 > 0. Then

G(y1,y2) = ∫_{-∞}^{y1/a11} ∫_{-∞}^{(y2 - a21·x1)/a22} f(x1,x2) dx2 dx1.

Differentiating with respect to y1 and y2 yields

∂²G(y1,y2)/∂y1∂y2 = g(y1,y2) = (a11a22)^{-1} f(y1/a11, (y2 - y1a21/a11)/a22).
This establishes the result for triangular transformations. Finally, consider the general transformation

(Y1, Y2)′ = [a11 a12; a21 a22](X1, X2)′

with a11 > 0 and a11a22 - a12a21 > 0. Apply the result for triangular transformations first to (Z1, Z2)′ = [1 a12/a11; 0 1](X1, X2)′, and second to (Y1, Y2)′ = [a11 0; a21 a22 - a12a21/a11](Z1, Z2)′. This gives the general transformation, as

[a11 a12; a21 a22] = [a11 0; a21 a22 - a12a21/a11][1 a12/a11; 0 1].

The density of Z is h(z1,z2) = f(z1 - z2a12/a11, z2), and that of Y is g(y1,y2) = [a11(a22 - a12a21/a11)]^{-1} h(y1/a11, (y2 - y1a21/a11)/(a22 - a12a21/a11)). Substituting for h in the last expression and simplifying gives

g(y1,y2) = f((a22y1 - a12y2)/D, (a11y2 - a21y1)/D)/D,

where D = a11a22 - a12a21 is the determinant of the transformation.
We leave as an exercise the proof of the theorem for the density of Y = AX in the general case with A n×n and nonsingular. First, recall that A can be factored so that A = PLDU, where P is a permutation matrix, L is lower triangular and U is upper triangular, each with ones down the diagonal, and D is a nonsingular diagonal matrix. Write Y = PLDUX. Then consider the series of intermediate transformations obtained by applying each matrix in turn, constructing the densities as was done previously.
3.6.3. The extension from linear transformations to one-to-one nonlinear transformations of vectors is straightforward. Consider Y = H(X), with an inverse transformation X = h(Y). At a point y° and x° = h(y°), a first-order Taylor's expansion gives

y - y° = A(x - x°) + o(x - x°),

where A is the Jacobian matrix

A = [∂H1(x°)/∂x1 ... ∂H1(x°)/∂xn; ... ; ∂Hn(x°)/∂x1 ... ∂Hn(x°)/∂xn],

and the notation o(z) means an expression that is small relative to z. Alternately, one has
B = A^{-1} = [∂h1(y°)/∂y1 ... ∂h1(y°)/∂yn; ... ; ∂hn(y°)/∂y1 ... ∂hn(y°)/∂yn].

The probability of Y in the little rectangle [y°, y° + Δy] is approximately equal to the probability of X in the little rectangle [x°, x° + A^{-1}Δy]. This is the same situation as in the linear case, except there the equality was exact. Then, the formulas for the linear case carry over directly, with the Jacobian matrix of the transformation replacing the linear transformation matrix A. If f(x) is the density of X, then g(y) = f(h(y))|det(B)| = f(h(y))/|det(A)| is the density of Y.
Example 16. Suppose a random vector (X,Z) has a density f(x,z) for x,z > 0, and consider the nonlinear transformation W = XZ and Y = X/Z, which has the inverse transformation X = (WY)^{1/2} and Z = (W/Y)^{1/2}. The Jacobian matrix of the inverse transformation is

B = [W^{-1/2}Y^{1/2}/2  W^{1/2}Y^{-1/2}/2;  W^{-1/2}Y^{-1/2}/2  -W^{1/2}Y^{-3/2}/2],

and |det(B)| = 1/2y. Hence, the density of (W,Y) is g(w,y) = f((wy)^{1/2}, (w/y)^{1/2})/2y.
In principle, it is possible to analyze n-dimensional nonlinear transformations that are not one-to-one in the same manner as the one-dimensional case, by working with the one-to-many inverse transformation. There are no general formulas, and each case needs to be treated separately. Often in applications, one is interested in a transformation from an n×1 vector of random variables X to a lower dimension. For example, one may be interested in the scalar random variable S = X1 + ... + Xn. If one "fills out" the transformation in a one-to-one way, so that the random variables of interest are components of the complete transformation, then Theorem 3.8 can be applied. In the case of S, the transformation Y1 = S filled out by Yi = Xi for i = 2,...,n is one-to-one, with

(Y1, Y2, Y3, ..., Yn)′ = [1 1 1 ... 1; 0 1 0 ... 0; 0 0 1 ... 0; ... ; 0 0 0 ... 1](X1, X2, X3, ..., Xn)′.
Example 17. Consider a random vector (X,Z) with a density f(x,z), and the transformation S = X + Z and T = Z, or

(S, T)′ = [1 1; 0 1](X, Z)′.

The Jacobian of this transformation is one, and its inverse is

(X, Z)′ = [1 -1; 0 1](S, T)′,

so the density of (S,T) is g(s,t) = f(s-t,t). The marginal density of S is then g1(s) = ∫_{-∞}^{+∞} f(s-t,t) dt. If X and Z are statistically independent, so that their density is f(x,z) = f1(x)f2(z), then this becomes g1(s) = ∫_{-∞}^{+∞} f1(s-t)f2(t) dt. This is termed a convolution formula.
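For a concrete (assumed) case of the convolution formula, take X and Z independent exponential(1); the density of S = X + Z is then the gamma density s·e^{-s}, which the following numerical sketch reproduces.

```python
import numpy as np

# Numerical convolution g1(s) = integral of f1(s-t) f2(t) dt for two exponential(1)
# densities, compared with the exact gamma(2,1) density s*exp(-s).
t = np.linspace(0.0, 30.0, 30_001)
dt = t[1] - t[0]
f2 = np.exp(-t)                                               # density of Z on a grid

s = 2.0
f1_shifted = np.where(s - t >= 0.0, np.exp(-(s - t)), 0.0)    # f1(s - t)
g1 = np.sum(f1_shifted * f2) * dt
print(g1, s * np.exp(-s))                                     # both approximately 0.271
```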
3.7. SPECIAL DISTRIBUTIONS
3.7.1. A number of special probability distributions appear frequently in statistics and econometrics, because they are convenient for applications or illustrations, because they are useful for approximations, or because they crop up in limiting arguments. The tables at the end of this chapter list many of these distributions.
3.7.2. Table 3.1 lists discrete distributions. The binomial and geometric distributions are particularly simple, and are associated with statistical experiments such as coin tosses. The Poisson distribution is often used to model the occurrence of rare events. The hypergeometric distribution is associated with classical probability experiments of drawing red and white balls from urns, and is also used to approximate many other distributions.
3.7.3. Table 3.2 lists a number of continuous distributions, including some basic distributions such as the gamma and beta from which other distributions are constructed. The extreme value and logistic distributions are used in the economic theory of discrete choice, and are also of statistical interest because they have simple closed form CDF's.
3.7.4. The normal distribution and its related distributions play a central role in econometrics, both because they provide the foundation for finite-sample distribution results for regression models with normally distributed disturbances, and because they appear as limiting approximations in large samples even when the finite sample distributions are unknown or intractable. Table 3.3 lists the normal distribution, and a number of other distributions that are related to it. The t and F distributions appear in the theory of hypothesis testing, and the chi-square distribution appears in large-sample approximations. The non-central versions of these distributions appear in calculations of the power of hypothesis tests.
It is a standard exercise in mathematical statistics to establish the relationships between normal, chi-square, F, and t distributions. For completeness, we state the most important result:
Theorem 3.9. Normal and chi-square random variables have the following properties:
(i) If S = Y1² + ... + Yk², where the Yk are independent normal random variables with means μk and unit variances, then S has a non-central chi-square distribution with degrees of freedom parameter k and non-centrality parameter δ = μ1² + ... + μk², denoted χ²(k,δ). If δ = 0, this is a (central) chi-square distribution with degrees of freedom parameter k, denoted χ²(k).
(ii) If Y and S are independent, Y is normal with mean λ and unit variance, and S is chi-square with k degrees of freedom, then T = Y/(S/k)^{1/2} is non-central t-distributed with degrees of freedom parameter k and non-centrality parameter λ, denoted t(k,λ). If λ = 0, this is a (central) t-distribution with degrees of freedom parameter k, denoted t(k).
(iii) If R and S are independent, R is non-central chi-square with degrees of freedom parameter k and non-centrality parameter δ, and S is central chi-square with degrees of freedom parameter n, then F = nR/kS is non-central F-distributed with degrees of freedom parameters (k,n) and non-centrality parameter δ, denoted F(k,n,δ). If δ = 0, this distribution is F-distributed with degrees of freedom parameters (k,n), and is denoted F(k,n).
(iv) T is non-central t-distributed with degrees of freedom parameter k and non-centrality parameter λ if and only if F = T² is non-central F-distributed with degrees of freedom parameters (1,k) and non-centrality parameter δ = λ².
Proof: These results can be found in most classical texts in mathematical statistics; see particularly Rao (1973), pp. 166-167, 170-172, 181-182, Johnson & Kotz (1970), Chap. 26-31, and Graybill (1961), Chap. 4. □
In applied statistics, it is important to be able to calculate values x = G^{-1}(p), where G is the CDF of the central chi-square, F, or t distribution, and values p = G(x) where G is the CDF of the non-central chi-square, F, or t distribution. Selected points of these distributions are tabled in many books of mathematical and statistical tables, but it is more convenient and accurate to calculate these values within a statistical or econometrics software package. Most current packages, including TSP, STATA, and SST, can provide these values.
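The same quantities can be computed, for instance, with Python's scipy.stats, shown here only as an illustration of such a package:

```python
from scipy import stats

# Quantiles x = G^{-1}(p) of central distributions and tail probabilities of
# noncentral distributions, as discussed above.
print(stats.chi2.ppf(0.95, df=5))         # 95th percentile of chi-square(5), about 11.07
print(stats.f.ppf(0.95, dfn=3, dfd=20))   # upper 5% point of F(3,20), about 3.10
print(stats.t.ppf(0.975, df=12))          # two-sided 5% point of t(12), about 2.18
print(stats.ncx2.cdf(11.07, df=5, nc=2))  # p = G(x) for noncentral chi-square(5, delta=2)
```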
3.7.5. One of the most heavily used distributions in econometrics is the multivariate normal. We describe this distribution and summarize some of its properties. An n×1 random vector Y is multivariate normal with a vector of means μ and a positive definite covariance matrix Σ if it has the density

n(y - μ, Σ) = (2π)^{-n/2} det(Σ)^{-1/2} exp(-(y - μ)′Σ^{-1}(y - μ)/2).
This density is also sometimes denoted n(y;μ,Σ), and the CDF denoted N(y;μ,Σ). Its characteristic function is exp(iμ′t - t′Σt/2), and it has the moments E Y = μ and E (Y-μ)(Y-μ)′ = Σ. From the characteristic function and the rule for linear transformations, one has immediately the property that a linear transformation of a multivariate normal vector is again multivariate normal. Specifically, if Y is distributed N(y;μ,Σ), then the linear transformation Z = a + BY, which has mean a + Bμ and covariance matrix BΣB′, is distributed N(z;a + Bμ, BΣB′). The dimension of Z need not be the same as the dimension of Y, nor does B have to be of maximum rank; if BΣB′ is less than full rank, then the distribution of Z is concentrated on an affine linear subspace, of dimension equal to the rank of BΣB′, through the point a + Bμ. Let σ_k = (Σ_kk)^{1/2} denote the standard deviation of Y_k, and let ρ_kj = Σ_kj/σ_kσ_j denote the correlation of Y_k and Y_j. Then the covariance matrix Σ can be written
Σ = DRD, where D = diag(σ1,...,σn) = [σ1 0 ... 0; 0 σ2 ... 0; ... ; 0 0 ... σn] and R = [1 ρ12 ... ρ1n; ρ21 1 ... ρ2n; ... ; ρn1 ρn2 ... 1] is the array of correlation coefficients.
Theorem 3.10. Suppose Y is partitioned Y = (Y1′, Y2′)′, where Y1 is m×1, and let μ = (μ1′, μ2′)′ and Σ = [Σ11 Σ12; Σ21 Σ22] be commensurate partitions of μ and Σ. Then the marginal density of Y1 is multivariate normal with mean μ1 and covariance matrix Σ11. The conditional density of Y2, given Y1 = y1, is multivariate normal with mean μ2 + Σ21Σ11^{-1}(y1 - μ1) and covariance matrix Σ22 - Σ21Σ11^{-1}Σ12. Thus, the conditional mean of a multivariate normal is linear in the conditioning variables.
Proof: The easiest way to demonstrate the theorem is to recall from Chapter 2 that the positive definite matrix Σ has a Cholesky factorization Σ = LL′, where L is lower triangular, and that L has an inverse K that is again lower triangular. If Z is an n×1 vector of independent standard normal random variables (i.e., each Zi has mean zero and variance 1), then Y = μ + LZ is normal with mean μ and covariance matrix Σ. Conversely, if Y has density n(y - μ, Σ), then Z = K(Y - μ) is a vector of i.i.d. standard normal random variables. These statements use the important property of normal random vectors that a linear transformation is again normal. This can be shown directly by using the formulas in Section 3.6 for densities of linear transformations, or by observing that the (multivariate) characteristic function of Y with density n(y - μ, Σ) is exp(iμ′t - t′Σt/2), and the form of this characteristic function is unchanged by linear transformations.
The Cholesky construction Y = μ + LZ provides an easy demonstration for the densities of marginal or conditional subvectors of Y. Partition L and Z commensurately with (Y1′, Y2′)′, so that

L = [L11 0; L21 L22] and Z = (Z1′, Z2′)′.

Then Σ11 = L11L11′, Σ21 = L21L11′, Σ22 = L22L22′ + L21L21′, and hence Σ21Σ11^{-1} = L21L11^{-1}, implying L22L22′ = Σ22 - Σ21Σ11^{-1}Σ12. Then, Y1 = μ1 + L11Z1 has a marginal multivariate normal density with mean μ1 and covariance matrix L11L11′ = Σ11. Also, Y2 = μ2 + L21Z1 + L22Z2, implying Y2 = μ2 + L21L11^{-1}(Y1 - μ1) + L22Z2. Conditioned on Y1 = y1, this implies Y2 = μ2 + Σ21Σ11^{-1}(y1 - μ1) + L22Z2 is multivariate normal with mean μ2 + Σ21Σ11^{-1}(y1 - μ1) and covariance matrix Σ22 - Σ21Σ11^{-1}Σ12. □
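The conditional-mean and conditional-covariance formulas of Theorem 3.10 can be computed and checked directly; in the sketch below the mean vector and covariance matrix are illustrative numbers, and the conditioning is done crudely by keeping simulated draws with Y1 near y1.

```python
import numpy as np

# Conditional mean mu2 + S21 S11^{-1}(y1 - mu1) and covariance S22 - S21 S11^{-1} S12
# for an assumed 3-dimensional normal, compared with a simulated conditional sample.
rng = np.random.default_rng(7)
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])
m = 1                                              # Y1 is the first component
S11, S12 = sigma[:m, :m], sigma[:m, m:]
S21, S22 = sigma[m:, :m], sigma[m:, m:]

y1 = np.array([0.5])
cond_mean = mu[m:] + S21 @ np.linalg.solve(S11, y1 - mu[:m])
cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)
print(cond_mean, cond_cov)

y = rng.multivariate_normal(mu, sigma, size=2_000_000)
keep = np.abs(y[:, 0] - y1[0]) < 0.01              # crude conditioning on Y1 near y1
print(y[keep, 1:].mean(axis=0), np.cov(y[keep, 1:].T))
```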
The next theorem gives some additional useful properties of the multivariate normal and of quadratic forms in normal vectors.

Theorem 3.11. Let Y be an n×1 random vector. Then,
(i) If Y = (Y1′, Y2′)′ is multivariate normal, then Y1 and Y2 are independent if and only if they are uncorrelated. However, Y1 and Y2 can be uncorrelated and each have a marginal normal distribution without necessarily being independent.
(ii) If every linear combination c′Y is normal, then Y is multivariate normal.
(iii) If Y is i.i.d. standard normal and A is an idempotent n×n matrix of rank k, then Y′AY is distributed χ²(k).
(iv) If Y is distributed N(μ,I) and A is an idempotent n×n matrix of rank k, then Y′AY is distributed χ²(k,δ) with δ = μ′Aμ.
(v) If Y is i.i.d. standard normal and A and B are positive semidefinite n×n matrices, then Y′AY and Y′BY are independent if and only if AB = 0.
(vi) If Y is distributed N(μ,I), and A_i is an idempotent n×n matrix of rank k_i for i = 1,...,J, then the Y′A_iY are mutually independent and distributed χ²(k_i,δ_i) with δ_i = μ′A_iμ if and only if either (a) A_iA_j = 0 for i ≠ j or (b) A_1 + ... + A_J is idempotent.
(vii) If Y is distributed N(μ,I), A is a positive semidefinite n×n matrix, B is a k×n matrix, and BA = 0, then BY and Y′AY are independent.
(viii) If Y is distributed N(μ,I) and A is a positive semidefinite n×n matrix, then E Y′AY = μ′Aμ + tr(A).
Proof: Results (i) and (ii) are proved in Anderson (1958), Thm. 2.4.2 and 2.6.2. For (iii) and (iv), write A = UU′, where this is its singular value decomposition with U an n×k column-orthogonal matrix. Then U′Y is distributed N(U′μ, I_k), and the result follows from Theorem 3.9. For (v), let k be the rank of A and m the rank of B. There exists an n×k matrix U of rank k and an n×m matrix V of rank m such that A = UU′ and B = VV′. The vectors U′Y and V′Y are uncorrelated, hence independent, if and only if U′V = 0. But AB = U(U′V)V′ is zero if and only if U′V = 0, since U and V are of maximum rank. For (vi), use the SVD decomposition as in (iv). For (vii), write A = UU′ with U of maximum rank as in (v). Then BA = (BU)U′ = 0 implies BU = 0, so that BY and U′Y are independent by (i). For (viii), E Y′AY = μ′Aμ + E (Y-μ)′A(Y-μ) = μ′Aμ + tr(A·E(Y-μ)(Y-μ)′) = μ′Aμ + tr(A). □
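Result (iii) is the one most often used in regression theory; a short numerical check with an assumed idempotent projection matrix is given below.

```python
import numpy as np
from scipy import stats

# For the idempotent projection A = X(X'X)^{-1}X' of rank k, Y'AY with Y i.i.d.
# standard normal is chi-square(k), as in Theorem 3.11(iii).
rng = np.random.default_rng(8)
n, k = 20, 3
x = rng.normal(size=(n, k))
a = x @ np.linalg.solve(x.T @ x, x.T)                      # idempotent, rank k

y = rng.normal(size=(200_000, n))
q = np.einsum('ij,jk,ik->i', y, a, y)                      # quadratic forms Y'AY
print(q.mean(), q.var())                                   # about k and 2k
print(np.mean(q <= 7.815), stats.chi2.cdf(7.815, df=k))    # both about 0.95
```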
3.8. NOTES AND COMMENTS
The purpose of this chapter has been to collect the key results from probability theory that are used in econometrics. While the chapter is reasonably self-contained, it is expected that the reader will already be familiar with most of the concepts, and can if necessary refer to one of the excellent texts in basic probability theory and mathematical statistics, such as P. Billingsley, Probability and Measure, Wiley, 1986, or Y. Chow and H. Teicher, Probability Theory, 1997. A classic that provides an accessible treatment of fields of subsets, measure, and statistical independence is J. Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, 1965. Another classic that contains many results from mathematical statistics is C. R. Rao (1973), Linear Statistical Inference and Its Applications, Wiley. A comprehensive classical text with treatment of many topics, including characteristic functions, is W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1 & 2, Wiley, 1957. For special distributions, properties of distributions, and computation, a four-volume compendium by N. Johnson and S. Kotz, Distributions in Statistics, Houghton-Mifflin, 1970, is a good source. For the multivariate normal distribution, T. Anderson (1958), An Introduction to Multivariate Statistical Analysis, Wiley, and F. Graybill (1961), An Introduction to Linear Statistical Models, McGraw-Hill, are good sources. Readers who find some sections of this chapter unfamiliar or too dense may find it useful to first review an introductory text at the undergraduate level, such as K. Chung, A Course in Probability Theory, Academic Press, New York, or R. Larsen and M. Marx, Probability Theory, Prentice-Hall.
TABLE 3.1. SPECIAL DISCRETE DISTRIBUTIONS
(C(n,k) denotes the binomial coefficient n!/[k!(n-k)!].)

1. Binomial. Domain: k = 0,1,...,n; 0 < p < 1. Density: C(n,k)·p^k(1-p)^{n-k}. Moments: μ = np, σ² = np(1-p). Char. fn.: (1-p+pe^{it})^n. (Note 1)

2. Hypergeometric. Domain: k an integer, max{0,n-w} ≤ k ≤ min{r,n}; r+w ≥ n; r,w,n positive integers. Density: C(r,k)·C(w,n-k)/C(r+w,n). Moments: μ = nr/(r+w), σ² = nrw(r+w-n)/[(r+w)²(r+w-1)]. Char. fn.: see Note 2.

3. Geometric. Domain: k = 0,1,2,...; 0 < p < 1. Density: p(1-p)^k. Moments: μ = (1-p)/p, σ² = (1-p)/p². Char. fn.: see Note 3.

4. Poisson. Domain: k = 0,1,2,...; λ > 0. Density: e^{-λ}λ^k/k!. Moments: μ = λ, σ² = λ. Char. fn.: exp[λ(e^{it}-1)]. (Note 4)

5. Negative Binomial. Domain: k = 0,1,2,...; r integer, r > 0, and 0 < p < 1. Density: C(r+k-1,k)·p^r(1-p)^k. Moments: μ = r(1-p)/p, σ² = r(1-p)/p². Char. fn.: see Note 5.

NOTES
1. μ = EX (the mean), and σ² = E(X-μ)² (the variance). The density is often denoted b(k;n,p). The moment generating function is (1-p+pe^t)^n.
2. The characteristic and moment generating functions are complicated.
3. The characteristic function is p/(1-(1-p)e^{it}) and the moment generating function is p/(1-(1-p)e^t), defined for t < -ln(1-p).
4. The moment generating function is exp(λ(e^t-1)), defined for all t.
5. The characteristic function is p^r/(1-(1-p)e^{it})^r, and the moment generating function is p^r/(1-(1-p)e^t)^r, defined for t < -ln(1-p).
TABLE 3.2. SPECIAL CONTINUOUS DISTRIBUTIONS

1. Uniform. Domain: a ≤ x ≤ b. Density: 1/(b-a). Moments: μ = (a+b)/2, σ² = (b-a)²/12. Char. fn.: (e^{ibt} - e^{iat})/[it(b-a)]. (Note 1)

2. Triangular. Domain: |x| < a. Density: (1 - |x|/a)/a. Moments: μ = 0, σ² = a²/6. Char. fn.: 2(1 - cos at)/(a²t²).

3. Cauchy. Domain: -∞ < x < +∞. Density: a/[π(a² + (x-μ)²)]. Moments: none. Char. fn.: e^{iμt - a|t|}.

4. Exponential. Domain: x ≥ 0; λ > 0. Density: e^{-x/λ}/λ. Moments: μ = λ, σ² = λ². Char. fn.: 1/(1 - iλt). (Note 2)

5. Pareto. Domain: x ≥ a; b > 0. Density: b·a^b·x^{-b-1}. Moments: μ = ab/(b-1), σ² = ba²/[(b-1)²(b-2)]. (Note 3)

6. Gamma. Domain: x > 0; a,b > 0. Density: x^{a-1}e^{-x/b}/[Γ(a)b^a]. Moments: μ = ab, σ² = ab². Char. fn.: (1 - ibt)^{-a}. (Note 4)

7. Beta. Domain: 0 < x < 1; a,b > 0. Density: x^{a-1}(1-x)^{b-1}·Γ(a+b)/[Γ(a)Γ(b)]. Moments: μ = a/(a+b), σ² = ab/[(a+b)²(a+b+1)]. (Note 5)

8. Extreme Value. Domain: -∞ < x < +∞; b > 0. Density: (1/b)·exp(-(x-a)/b)·exp(-e^{-(x-a)/b}). Moments: μ = a + 0.57721b, σ² = π²b²/6. (Note 6)

9. Logistic. Domain: -∞ < x < +∞; b > 0. Density: (1/b)·exp((a-x)/b)/[1 + exp((a-x)/b)]². Moments: μ = a, σ² = π²b²/3. (Note 7)

NOTES
1. The moment generating function is (e^{bt} - e^{at})/[(b-a)t], defined for all t.
2. The moment generating function is 1/(1 - λt), defined for t < 1/λ.
3. The moment generating function does not exist. The mean exists for b > 1, the variance exists for b > 2.
4. For a > 0, Γ(a) = ∫_0^∞ x^{a-1}e^{-x}dx is the gamma function. If a is an integer, Γ(a) = (a-1)!.
5. For the characteristic function, see C. R. Rao, Linear Statistical Inference, Wiley, 1973, p. 151.
6. The moment generating function is e^{at}Γ(1 - tb) for t < 1/b.
7. The moment generating function is e^{at}·πbt/sin(πbt), defined for |t| < 1/b.
TABLE 3.3. THE NORMAL DISTRIBUTION AND ITS RELATIVES

1. Normal n(x-μ,σ²). Domain: -∞ < x < +∞; σ > 0. Density: (2πσ²)^{-1/2}exp(-(x-μ)²/2σ²). Moments: mean μ, variance σ². Char. fn.: exp(iμt - σ²t²/2). (Note 1)

2. Standard Normal φ(x). Domain: -∞ < x < +∞. Density: φ(x) = (2π)^{-1/2}exp(-x²/2). Moments: μ = 0, σ² = 1. Char. fn.: exp(-t²/2).

3. Chi-Square χ²(x;k). Domain: 0 < x < +∞; k = 1,2,.... Density: x^{(k/2)-1}e^{-x/2}/[Γ(k/2)·2^{k/2}]. Moments: μ = k, σ² = 2k. Char. fn.: (1-2it)^{-k/2}. (Note 2)

4. F-distribution F(x;k,n). Domain: 0 < x < +∞; k,n positive integers. Moments: μ = n/(n-2) if n > 2, σ² = 2n²(k+n-2)/[k(n-2)²(n-4)] if n > 4. (Note 3)

5. t-distribution t(x;k). Domain: -∞ < x < +∞; k = 1,2,.... Density: Γ((k+1)/2)·(1+x²/k)^{-(k+1)/2}/[√k·Γ(1/2)·Γ(k/2)]. Moments: μ = 0 if k > 1, σ² = k/(k-2) if k > 2. (Note 4)

6. Noncentral Chi-Square χ²(x;k,δ). Domain: x > 0; k a positive integer; δ ≥ 0. Moments: μ = k+δ, σ² = 2(k+2δ). (Note 5)

7. Noncentral F-distribution F(x;k,n,δ). Domain: x > 0; k,n positive integers; δ ≥ 0. Moments: μ = n(k+δ)/[k(n-2)] if n > 2, σ² = 2(n/k)²·[(k+δ)² + (k+2δ)(n-2)]/[(n-2)²(n-4)] if n > 4. (Note 6)

8. Noncentral t-distribution t(x;k,λ). Domain: -∞ < x < +∞; k a positive integer. Moments: μ = λ·(k/2)^{1/2}·Γ((k-1)/2)/Γ(k/2) if k > 1, σ² = (1+λ²)k/(k-2) - μ² if k > 2. (Note 7)
NOTES TO TABLE 3.3
1. The density is often denoted n(x-μ,σ²), and the cumulative distribution referred to as N(x-μ,σ²), or simply N(μ,σ²). The moment generating function is exp(μt + σ²t²/2), defined for all t. The standard normal density is often denoted φ(x), and the standard normal CDF is denoted Φ(x). The general normal and standard normal formulas are related by n(x-μ,σ²) = φ((x-μ)/σ)/σ and N(x-μ,σ²) = Φ((x-μ)/σ).
2. The moment generating function is (1-2t)^{-k/2} for t < 1/2. The Chi-Square distribution with parameter k (= degrees of freedom) is the distribution of the sum of squares of k independent standard normal random variables. The Chi-Square density is the same as the gamma density with b = 2 and a = k/2.
3. The F-distribution is the distribution of the expression nU/kV, where U is a random variable with a Chi-square distribution with parameter k, and V is an independent random variable with a Chi-square distribution with parameter n. The density is [Γ((k+n)/2)/(Γ(k/2)Γ(n/2))]·k^{k/2}n^{n/2}·x^{(k/2)-1}·(n+kx)^{-(k+n)/2}. For n ≤ 2, the mean does not exist, and for n ≤ 4, the variance does not exist. The characteristic and moment generating functions are complicated.
4. If Y is standard normal and Z is independently Chi-squared distributed with parameter k, then Y/(Z/k)^{1/2} has a t-distribution with parameter k (= degrees of freedom). The characteristic function is complicated; the moment generating function does not exist.
5. The Noncentral Chi-square is the distribution of the sum of squares of k independent normal random variables, each with variance one, and with means whose squares sum to δ. The Noncentral Chi-Square density is a Poisson mixture of (central) Chi-square densities, Σ_{j=0}^∞ [e^{-δ/2}(δ/2)^j/j!]·χ²(x;k+2j).
6. The Noncentral F-distribution has a density that is a Poisson mixture of rescaled (central) F-distributed densities, Σ_{j=0}^∞ [e^{-δ/2}(δ/2)^j/j!]·[k/(k+2j)]·F(kx/(k+2j); k+2j, n). It is the distribution of the expression nU/kV, where U is a Noncentral Chi-Squared random variable with parameters k and δ, and V is an independent central Chi-Squared random variable with parameter n.
7. If Y is standard normal and Z is independently Chi-squared distributed with parameter k, then (Y+λ)/(Z/k)^{1/2} has a Noncentral t-Distribution with parameters k and λ. The density is a Poisson mixture, with weights e^{-λ²/2}(λ²/2)^j/j!, of scaled Beta distributed densities. The square of a Noncentral t-Distributed random variable has a Noncentral F-Distribution with parameters 1, k, and δ = λ².
3.9. EXERCISES

1. In Example 1, write out all the members of F.

2. Prove that a σ-field of events contains countable intersections of its members.

3. Example 2 claims that the class of all subsets of countable S has greater cardinality than S itself. Mathematically, this means that it is not possible to associate a unique element of S with each member of the class. Use the following device to convince yourself this is true: Write each number in the unit interval as a fraction in binary notation, 0.b1b2.... Associate with each number the class member that contains the sequence with j heads if and only if bj = 1. Then, the real numbers, which are uncountable, map into unique members of the class, so the class is also uncountable.

4. In Example 4, show that the event "the change is the same on successive days" is not in F1×F2, but is a monotone limit of sets in F1×F2.

5. Economic agents can make contingent trades only if it is common knowledge if the contingency is realized. In Example 1, Agent 1 knows F1, Agent 2 knows F2, Agent 3 knows F3 = {∅,{HH,TT},{HT,TH},S}. What is the common knowledge of Agents 1 and 2? Of Agents 1 and 3?

6. Suppose, in Example 1, that instead of H and T being equally likely, the probability measure satisfies P{HH} = 0.2, P{HT} = 0.4, P{TH} = 0.1, P{TT} = 0.3. What is the probability that the first coin is heads? That the second coin is heads? That the two coins give the same result?

7. Consider the sequence of functions fn(x) = x^{1/n} for 0 < x < 1. These are square integrable. Do they converge to a limit, and if so, is the convergence strong, in measure, or weak?

8. Consider the probability measure P([0,x]) = x^{1/2} on 0 ≤ x ≤ 1. Does it meet the Radon-Nikodym conditions for the existence of a probability density?

9. It is known that 0.2 percent of the population is HIV-positive. It is known that a screening test for HIV has a 10 percent chance of incorrectly showing positive when the subject is negative, and a 2 percent chance of incorrectly showing negative when the subject is positive. What proportion of the population that tests positive has HIV?

10. John and Kate are 80 years old. The probability that John will die in the next year is 0.08, and the probability that Kate will die in the next year is 0.05. The probability that John will die, given that Kate dies, is 0.2. What is the probability that both will die? That at least one will die? That Kate will die, given that John dies?

11. The probability that a driver will have an accident next year if she has a Ph.D. is 0.2. The probability she will have an accident if she does not have a Ph.D. is 0.25. The probability the driver has a Ph.D. and an accident is 0.01. What is the probability the driver has a Ph.D.? What is the probability of a Ph.D. given an accident?
12. A quiz show offers you the opportunity to become a millionaire if you answer nine questions correctly. Questions can be easy (E), moderate (M), or hard (H). The respective probabilities that you will answer an E, M, or H question correctly are 2/3, 1/2, and 1/3. If you get an E question, your next question will be E, M, or H with probabilities 1/4, 1/2, and 1/4 respectively. If you get an M question, your next question will be E, M, or H with probabilities 1/3, 1/3, and 1/3 respectively. If you get an H question, your next question will be E, M, or H with probabilities 1/2, 0, and 1/2 respectively. The first question is always an E question. What is the probability that you will become a millionaire? [Hint: Show that the probability of winning if you reach question 9 is independent of whether this question is E, M, or H. Then use backward recursion.]

13. Show that if A ⊆ B and P(A) > 0, then P(C|A) can be either larger or smaller than P(C|B).

14. An airplane has 100 seats. The probability that a ticketed passenger shows up for the flight is 0.9, and the events that any two different passengers show up are statistically independent. If the airline sells 105 seats, what is the probability that the plane will be overbooked? How many seats can the airline sell, and keep the probability of overbooking to 5 percent or less?

15. Prove that the expectation E(X - c)² is minimized when c = EX.

16. Prove that the expectation E|X - c| is minimized when c = median(X).

17. What value of c minimizes E{α·max(X-c,0) + (1-α)·max(c-X,0)}? [Hint: describe the solution in terms of the distribution F of X.]

18. A sealed bid auction has a tract of land for sale to the highest of n bidders. You are bidder 1. Your experience is that the bid of each other bidder is distributed with a Power distribution F(x) = x^θ for 0 ≤ x ≤ 1. Your profit if you are successful in buying the tract at price y is 1 - y. What should you bid to maximize your expected profit? What is your probability of winning the auction?

19. A random variable X has a normal distribution if its density is f(x) = (2πσ²)^{-1/2}exp(-(x-μ)²/2σ²), where μ and σ² are parameters. Prove that X has mean μ and variance σ². Prove that E(X-μ)³ = 0 and E(X-μ)⁴ = 3σ⁴. [Hint: First show that ∫x·exp(-x²/2)dx = -exp(-x²/2), and for k > 1, the integration by parts formula ∫x^k·exp(-x²/2)dx = -x^{k-1}exp(-x²/2) + (k-1)∫x^{k-2}exp(-x²/2)dx.]

20. Suppose the stock market has two regimes, Up and Down. In an Up regime, the probability that the market index will rise on any given day is P. In a Down regime, the probability that the market index will rise on any given day is Q, with Q < P. Within a regime, the probability that the market rises on a given day is independent of its history. The probability of being in an Up regime is 1/2, so that if you do not know which regime you are in, then all you can say is that the probability that the market will rise on any given day is R = (P+Q)/2. Assume that regimes persist far longer than runs of rises, so that when analyzing runs the regime can be treated as persisting indefinitely. Show that when you are in the Up regime, the probability of a run of k or more successive days in which the market rises is P^{k-1}, and that the probability of a run of exactly k days in which the market rises is P^{k-1}(1-P). A similar formula with Q instead of P holds when you are in a Down regime. Show that the expected length in an Up regime of a run of rises is 1/(1-P). Show that ½/(1-P) + ½/(1-Q) ≥ 1/(1-R).
21. The random vector (X1,X2) has the distribution function exp(-(exp(-2x1) + exp(-2x2))^{1/2}). What is the marginal distribution of X1? What is the conditional distribution of X1 given X2 ≤ c? Given X2 = c?

22. The expectation E(X + aZ)² ≥ 0 for random variables X, Z and any scalar a. Use this property to prove the Cauchy-Schwartz inequality.

23. Prove Jensen's inequality for a probability concentrated at two points.

24. In Example 2, use the law of iterated expectations to calculate the expectation of the number of heads, given that the number exceeds one.

25. If X and Z are bivariate normal with means 1 and 2, and variances 1 and 4, respectively, and covariance σXZ, what is the density of X given Z = z? Use Bayes law to deduce the conditional density of Z given X = x.

26. Prove the formula for the characteristic function of a standard normal random variable.

27. What is the domain of the moment generating function of an exponentially distributed random variable with density f(x) = 3exp(-3x) for x > 0?

28. If (X,Z) is a random vector with density f(x,z) and Z > 0, and S = X/Z, T = Z, what is the Jacobian of the transformation?

29. If X and Y are multivariate normal with zero means, EXX′ = A, EYY′ = B, and EXY′ = C, show that X and Z = Y - C′A^{-1}X are independent.

30. For the binomial distribution b(k;n,p), what is the variance of the frequency f = k/n?

31. The hypergeometric distribution describes the probability that k of n balls drawn from an urn will be red, where the urn contains r red and w white balls, and sampling is without replacement. Calculate the same probability if sampling is with replacement. Calculate the probabilities, with and without replacement, when r = 10, w = 90, n = 5, k = 1.

32. In a Poisson distribution, what is the expected count conditioned on the count being positive?

33. Under what conditions is the characteristic function of a uniform distribution on [-a,b] real?

34. Show that if X and Y are independent identically distributed extreme value, then X - Y is logistic distributed.

35. Suppose that the duration of a spell of unemployment (in days) can be described by a geometric distribution, Prob(k) = p^k(1-p), where 0 < p < 1 is a parameter and k is a non-negative integer. What is the expected duration of unemployment? What is the probability of a spell of unemployment lasting longer than k days? What is the conditional expectation of the duration of unemployment, given the event that the duration exceeds m, where m is a positive integer? [Hint: Use formulas for geometric series, see 2.1.10.]

36. Use the moment generating function to find EX³ when X has density e^{-x/λ}/λ, x > 0.

37. A log normal random variable Y is one that has log(Y) normal. If log(Y) has mean μ and variance σ², find the mean and variance of Y. [Hint: It is useful to find the moment generating function of Z = log(Y).]
38. If X and Y are independent normal, then X+Y is again normal, so that one can say that the normal family is closed under addition. (Addition of random variables is also called convolution, from the formula for the density of the sum.) Now suppose X and Y are independent and have extreme value distributions, Prob(X ≤ x) = exp(-e^{a-x}) and Prob(Y ≤ y) = exp(-e^{b-y}), where a and b are location parameters. Show that max(X,Y) once again has an extreme value distribution (with location parameter c = log(e^a + e^b)), so that the extreme value family is closed under maximization.

39. If X is standard normal, derive the density and characteristic function of Y = X², and confirm that this is the same as the tabled density of a chi-square random variable with one degree of freedom. If X is normal with variance one and a mean μ that is not zero, derive the density of Y, which is non-central chi-square distributed with one degree of freedom and noncentrality parameter μ².

40. Random variables X and Y are bivariate normal, with EX = 1, EY = 3, and Var(X) = 4, Var(Y) = 9, Covariance(X,Y) = 5.
(a) What is the mean of Z = 2X - Y?
(b) What is the variance of Z = 2X - Y?
(c) What is the conditional mean of Z given X = 5?
(d) What is the conditional variance of Z given X = 5?

41. What is the probability that the larger of two random observations from any continuous distribution will exceed the population median?

42. If random variables X and Y are independent, with EX = 1, EY = 2, EX² = 4, EY² = 9, what is the unconditional mean and variance of 3X + Y? What is the conditional mean and variance of 3X + Y given Y = 5?
43. Jobs are characterized by a wage rate W and a duration of employment X, and (W,X) can be interpreted as a random vector. The duration of employment has an exponential density θe^{-θx}, and the wage rate W has an exponential density, conditioned on X = x, equal to (α+βx)e^{-(α+βx)w}, where θ, α, and β are positive parameters. What is the marginal density of W? The conditional density of X given W?
44. Random variables X and Y are bivariate normal, with EX = 1, EY = 3, and Var(X) = 4, Var(Y) = 9, Covariance(X,Y) = 5.
(a) What is the mean of Z = 2X - Y?
(b) What is the variance of Z = 2X - Y?
(c) What is the conditional mean of Z given X = 5?
(d) What is the conditional variance of Z given X = 5?
45. The data set nyse.txt in the class data area of the class home page contains daily observations on stock market returns from Jan. 2, 1968 through Dec. 31, 1998, a total of 7806 observations corresponding to days the market was open. There are four variables, in columns delimited by spaces. The first variable (DAT) is the date in yymmdd format, the second variable (RNYSE) is the daily return to the NYSE market index, defined as the log of the ratio of the closing value of the index today to the closing index on the previous day the market was open, with distributions (dividends) factored in. The third variable (SP500) is the S&P500 market index, an index of a majority of the high market value stocks in the New York stock exchange. The fourth variable (RTB90) is the rate of interest in the secondary market for 90-day Treasury Bills, converted to a daily rate commensurate with RNYSE.
a. Let E_n denote a sample average (empirical expectation). Find the sample mean μ = E_nX, variance σ² = E_n(X - μ)², skewness E_n(X - μ)³/σ³, and kurtosis E_n(X - μ)⁴/σ⁴ - 3, for the variables RNYSE and RTB90. Normally distributed random variables have zero skewness and kurtosis in the population. Making an "eyeball" comparison, do the sample moments appear to be consistent with the proposition that RNYSE and RTB90 are normally distributed?
b. For RNYSE, form the standardized variable Z = (RNYSE - μ)/σ, by subtracting this variable's sample mean and then dividing by the square root of its variance (or standard deviation). Sort the values of Z from low to high, and then construct a new variable Y that equals i/7806 for 1 ≤ i ≤ 7806. The sorted values of Z are called the order statistics of the sample, and Y is the empirical CDF, a CDF that puts 1/7806 probability at each observed value of RNYSE. Plot Y against Φ(Z), where Φ is the standard normal CDF. If RNYSE is normal, then these curves will differ only because of sampling noise in Y. Does it appear by eyeball comparison that they are likely to be the same? A particular issue is the theoretical question of whether the distribution of returns has fat tails, so that the variance and higher moments are hard to estimate precisely or may fail to exist. In a normal sample, one would expect that on average 99 percent of standardized observations are less than 2.575 in magnitude. Do the standardized values Z appear to be consistent with this frequency?
c. A claim in the analysis of stock market returns is that the introduction of financial derivatives and index funds through the 1980s made it easier for arbitragers to close windows of profit opportunity. The argument is made that the resulting actions of arbitragers have made the market more volatile. Compare the subsamples of NYSE excess returns (EXCESS = RNYSE - RTB90) for the periods 1968-1978 and 1988-1998. By eyeball comparison, were there differences in mean excess return in the two decades? In the variance (or standard deviation) of excess return? Now do a 2×2 table of sample means classified by the two decades above and by whether or not the previous day's excess return was above its decade average. Does it appear that the gap between mean excess returns on days following previous rises and falls has increased or shrunk in the decade of the 90s?