
School of Mathematical Sciences

MTH2222

Mathematics of Uncertainty

Semester 2, 2015

Prepared by:
Kais Hamza

Produced and Published by:


School of Mathematical Sciences
Faculty of Science
Monash University
Clayton, Victoria, Australia, 3800

© Copyright 2010

NOT FOR RESALE. All materials produced for this unit are protected by copyright.
Monash students are permitted to use these materials for personal study and
research only, as permitted under the Copyright Act. Use of these materials for any
other purposes, including copying or resale may infringe copyright unless written
permission has been obtained from the copyright owners. Enquiries should be made
to the publisher
MTH2222 – Mathematics of Uncertainty

Lecture Notes
Contents

1 Sample Space and Probability 3


1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Total Probability Theorem and Bayes’ Rule . . . . . . . . . . . . . . . . . . . . 8
5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Discrete Random Variables 11


1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Probability Mass Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Functions of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Expectation, Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Joint PMFs of Multiple Random Variables . . . . . . . . . . . . . . . . . . . . . 17
6 Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 General Random Variables 21


1 Continuous Random Variables and PDFs . . . . . . . . . . . . . . . . . . . . . . 21
2 Cumulative Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 The Gamma Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Normal Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Conditioning on an Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 Multiple Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . 29
7 Derived Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4 Further Topics on Random Variables 39


1 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2 Sums of Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . 43
2.1 MGF Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Direct Approach - Convolution . . . . . . . . . . . . . . . . . . . . . . . 44
3 More on Conditional Expectation and Variance . . . . . . . . . . . . . . . . . . 45
4 Sum of a Random Number of Random Variables . . . . . . . . . . . . . . . . . . 48
5 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Least Square Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7 The Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . 52


5 Limit Theorems 55
1 Markov and Chebyshev’s Inequalities . . . . . . . . . . . . . . . . . . . . . . . . 55
2 The Weak Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4 The Strong Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 1

Sample Space and Probability

1 Sets
• S ∪ T = {x; x ∈ S or x ∈ T } and S ∩ T = {x; x ∈ S and x ∈ T }

– If S = {♣, ♢, ♡} and T = {♢, ♡, ♠} then S ∪T = {♣, ♢, ♡, ♠} and S ∩T = {♢, ♡}.


– Shade on separate graphs S ∩ T and S ∪ T .

[Two Venn diagrams showing sets S and T]

– Let S be the set of (strictly) positive even integers and T be the set of integers
less than or equal to 9. Then S ∪ T = {. . . , −2, −1, . . . , 8, 9, 10, 12, 14, 16, . . .} and
S ∩ T = {2, 4, 6, 8}.
– Let S be the set of polynomials of degree less than or equal to 2 and T be the set of
differentiable functions f with f (0) = f ′ (0) = 0. Describe S ∩ T .

• S c = {x; x ̸∈ S}, (S c )c = S, S ∩ S c = ∅ and S ∪ S c = Ω

– If Ω = {♣, ♢, ♡, ♠} and S = {♢, ♡} then S c = {♣, ♠}.


– Shade on separate graphs S c ∩ T c and (S ∪ T )c .

[Two Venn diagrams showing sets S and T]

What do you notice?

– Within the set of positive integers, what is the complement of the set of even integers?

• S \ T = {x; x ∈ S and x ̸∈ T } = S ∩ T c

– Shade S \ T and T \ S.

• S∆T = (S \ T ) ∪ (T \ S)

– What is (S∆T ) ∪ (S ∩ T )?

• S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )

[Venn diagrams with sets S, T and U illustrating the identity]

and S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )

[Venn diagrams with sets S, T and U illustrating the identity]



• ⋃_{n=1}^∞ Sn = S1 ∪ S2 ∪ . . . = {x; x ∈ Sn for some n}



– What is ⋃_{n=1}^∞ [0, 1 − 1/n]?



– What is ⋃_{n=1}^∞ [0, 1 − 1/n)?



• ⋂_{n=1}^∞ Sn = S1 ∩ S2 ∩ . . . = {x; x ∈ Sn for all n}



– What is ⋂_{n=1}^∞ [0, 1 + 1/n]?



– What is ⋂_{n=1}^∞ [0, 1 + 1/n)?

• De Morgan’s Laws. (⋃_{n=1}^∞ Sn)^c = ⋂_{n=1}^∞ Sn^c and (⋂_{n=1}^∞ Sn)^c = ⋃_{n=1}^∞ Sn^c

2 Probabilistic Models
• Probability Axioms:

– P(A) ≥ 0
– P(Ω) = 1
– For disjoint events (Am ∩ An = ∅ for m ≠ n), P(⋃_{n=1}^∞ An) = ∑_{n=1}^∞ P(An)

• Properties of Probability Laws:



– If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

– P(Ac ) = 1 − P(A)

– P(∅) = 0

– If A ⊂ B, then P(A) ≤ P(B).

– P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

– P(A ∪ B) ≤ P(A) + P(B).

– P(A ∪ B ∪ C) = P(A) + P(Ac ∩ B) + P(Ac ∩ Bc ∩ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

[Venn diagram with sets A, B and C]

∗ What is P(A ∪ B ∪ C ∪ D)?



3 Conditional Probability
• P(A|B) = P(A ∩ B)/P(B), for P(B) ≠ 0

– If B ⊂ A then P(A|B) = 1.
∗ What does this mean?

– If A ⊂ B c then P(A|B) = 0.
∗ What does this mean?

– If A ⊂ B then P(A|B) = P(A)/P(B).
– Assuming equally likely outcomes in the diagram below, compute
  ∗ P(A|B)?
  ∗ P(B|A)?
  ∗ P(Ac |B)?

[Diagram: a grid of equally likely outcome points, with overlapping regions A and B]

• P(·|B) satisfies all axioms of probability

– P(A|B) ≥ 0
– P(Ω|B) = P(B|B) = 1
– For disjoint events, P(⋃_{n=1}^∞ An | B) = ∑_{n=1}^∞ P(An |B)

∗ Check this.

• Multiplication Rule:

– For P(A) ≠ 0 (resp. P(B) ≠ 0), P(A ∩ B) = P(B|A)P(A) (resp. P(A|B)P(B)).


∗ BT - Example 1.9: If an aircraft is present in a certain area, a radar correctly
registers its presence with probability 0.99. If it is not present, the radar falsely
registers an aircraft presence with probability 0.10. We assume that an aircraft
is present with probability 0.05. What is the probability of false alarm (a false
indication of aircraft presence), and the probability of missed detection (nothing
registers, even though an aircraft is present)?

– If P(⋂_{k=1}^{n−1} Ak) ≠ 0, then

  P(⋂_{k=1}^{n} Ak) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) . . . P(An | ⋂_{k=1}^{n−1} Ak)

4 Total Probability Theorem and Bayes’ Rule



• Total Probability Theorem. If A1, . . . , An is a partition of Ω (i.e. ⋃_{k=1}^n Ak = Ω and Ai ∩ Aj = ∅, for i ≠ j), then

  P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + . . . + P(B|An)P(An)

[Diagram: partition A1, A2, A3 of Ω, with an event B overlapping each part]

• Bayes’ Rule. If A1, . . . , An is a partition of Ω, then

  P(Ak|B) = P(Ak)P(B|Ak) / P(B) = P(Ak)P(B|Ak) / [P(B|A1)P(A1) + . . . + P(B|An)P(An)]

– BT - Example 1.9 (contd.): A radar registers the presence of an aircraft. What is the probability that an aircraft is present?
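
The two parts of BT Example 1.9 can be checked numerically. A short Python sketch (the probabilities 0.05, 0.99 and 0.10 are the ones given in the example; the variable names are ours):

    # A = aircraft present, B = radar registers a presence
    p_A = 0.05            # prior probability an aircraft is present
    p_B_given_A = 0.99    # correct detection
    p_B_given_Ac = 0.10   # false registration

    p_false_alarm = (1 - p_A) * p_B_given_Ac        # P(A^c and B)
    p_missed_detection = p_A * (1 - p_B_given_A)    # P(A and B^c)

    # Total probability theorem, then Bayes' rule
    p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_Ac
    p_A_given_B = p_A * p_B_given_A / p_B

    print(p_false_alarm, p_missed_detection, p_A_given_B)
    # approximately 0.095, 0.0005 and 0.34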

5 Independence
• Assume equally likely outcomes.

In each of the following cases, how do P(A|B) and P(A) compare?

[Two diagrams: grids of equally likely outcome points, with regions A and B positioned differently in each]
• For P(A) ≠ 0 and P(B) ≠ 0, P(A|B) = P(A) ⇐⇒ P(B|A) = P(B) ⇐⇒ P(A ∩ B) = P(A)P(B)

• A and B are said to be independent if P(A ∩ B) = P(A)P(B).

– Toss two fair coins. Let A = {HH, HT }, B = {HH, T H} and C = {HT, T H}.
Then A and B are independent, A and C are independent, and B and C are independent.

• A and B are independent ⇒ A and B c are independent.

• A, B and C are independent if

– A, B and C are pairwise independent and


– P(A ∩ B ∩ C) = P(A)P(B)P(C).
∗ Back to the toss of two fair coins. A, B and C are not independent.

• A1 , A2 , . . . , An are independent if

  P(⋂_{i∈I} Ai) = ∏_{i∈I} P(Ai),

for any subset I of {1, 2, . . . , n} with at least two indexes.

• With P(C) ≠ 0, A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C).

• If P(B ∩C) ̸= 0, A and B are conditionally independent given C iff P(A|B ∩C) = P(A|C).

• Independence does not imply conditional independence and vice versa.

– A supplier sends boxes of items to a factory: 90% of the boxes contain 1% defective,
9% contain 10% defective, and 1% contain 100% defective (e.g. wrong size). What
percentage of screws supplied are defective?
Two screws are chosen from a randomly selected box. What is the probability that
both are defective? Given that both are defective, what is the probability that the
box is 100% defective?
Chapter 2

Discrete Random Variables

1 Basic Concepts
• A random variable is a real-valued function of the outcome of the experiment.

– The number of heads out of two tosses of a coin defines a function (mapping) from
the sample space Ω = {HH, HT, T H, T T } into R:

X(HH) = 2, X(HT ) = X(T H) = 1 and X(T T ) = 0.

– The number of heads until the first tail in a sequence of tosses of a coin defines a
function from the sample space Ω = {T, HT, HHT, HHHT, . . .} into R:

X(T ) = 0, X(HT ) = 1, X(HHT ) = 2, X(HHHT ) = 3 . . .

• A function of a random variable defines another random variable.


• A discrete random variable is a real-valued function of the outcome of the experiment
that can take a finite or countably infinite number of values.
• The event {ω : X(ω) ∈ A} is commonly denoted {X ∈ A}.

2 Probability Mass Functions


• pX (x) = P[X = x] is the probability mass function of the random variable X.

– Let X be the number of heads in the two tosses of a fair coin. X may take the values
0, 1 or 2:
P[X = 0] = 0.25, P[X = 1] = 0.50 and P[X = 2] = 0.25.
– Let X be the number of heads until the first tail in a sequence of tosses of a coin:

P[X = 0] = 0.5, P[X = 1] = 0.25, P[X = 2] = 0.125, P[X = 3] = 0.0625 . . .

Starting with $y, you double your wealth after each head. Let Y be the amount of
money you hold after the first tail: Y = 2X y,

P[Y = y] = 0.5, P[Y = 2y] = 0.25, P[Y = 4y] = 0.125, P[Y = 8y] = 0.0625 . . .


• A probability mass function must satisfy the following properties:

– p(x) ≥ 0, for all x in R

– p(x) > 0, for at most a countable number of x’s

– ∑_x p(x) = 1

• The Bernoulli Random Variable: A Bernoulli trial has only two possible outcomes
usually referred to as success and failure.
The Bernoulli random variable takes two values, 1 with probability say p, and 0 with
probability 1 − p:

  p(x) = 1 − p if x = 0, and p(x) = p if x = 1

– A random variable with the above probability mass function is said to have a
Bernoulli distribution with parameter p.

• The Binomial Random Variable: The Binomial random variable is used to model
the number of successes in a sequence of say n Bernoulli trials with probability say p of
success (and probability q = 1 − p of failure):

  p(x) = (n choose x) p^x q^{n−x} = [n!/(x!(n − x)!)] p^x q^{n−x},   x = 0, 1, . . . , n

– A random variable with the above probability mass function is said to have a bino-
mial distribution with parameters n and p.
– Counting: The number of ways one can select k objects out of n objects is (n choose k) = n!/(k!(n − k)!). It is the number of subsets of size k taken from a set of size n.
– The Binomial Formula: (a + b)^n = ∑_{k=0}^n (n choose k) a^k b^{n−k} = ∑_{k=0}^n [n!/(k!(n − k)!)] a^k b^{n−k}.

• The Discrete Uniform Random Variable: A random variable X with the following
probability mass function

  p(x) = 1/(n − m + 1),   x = m, . . . , n
is said to have a Discrete Uniform distribution (or that it is a Discrete Uniform random
variable) over the interval [m, n].

• The Geometric Random Variable: The Geometric random variable is used to model
the number of trials in a sequence of Bernoulli trials with probability say p of success
(and probability q = 1 − p of failure), until the first success:

  p(x) = p q^{x−1},   x = 1, 2, . . .

– A random variable with the above probability mass function is said to have a geo-
metric distribution with parameter p.
– The Geometric Series: for |a| < 1, ∑_{n=k}^∞ a^n = a^k/(1 − a).

• The Poisson Random Variable: The Poisson random variable with parameter λ has probability mass function:

  p(x) = e^{−λ} λ^x / x!,   x = 0, 1, 2, . . .
– A random variable with the above probability mass function is said to have a Poisson
distribution with parameter λ.
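
To get a feel for these PMFs it helps to evaluate the formulas directly. A small Python sketch (standard library only; the parameter values n = 10, p = 0.3 and λ = 2.5 are arbitrary) tabulates the binomial, geometric and Poisson PMFs and checks that each sums to (approximately) 1:

    from math import comb, exp, factorial

    def binomial_pmf(x, n, p):
        return comb(n, x) * p**x * (1 - p)**(n - x)

    def geometric_pmf(x, p):
        return p * (1 - p)**(x - 1)

    def poisson_pmf(x, lam):
        return exp(-lam) * lam**x / factorial(x)

    n, p, lam = 10, 0.3, 2.5
    print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))   # exactly 1 (up to rounding)
    print(sum(geometric_pmf(x, p) for x in range(1, 200)))    # close to 1 (truncated sum)
    print(sum(poisson_pmf(x, lam) for x in range(60)))        # close to 1 (truncated sum)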

3 Functions of Random Variables


• The PMF of the random variable Y = g(X) is

  pY(y) = ∑_{x: g(x)=y} pX(x).

– Let X be a binomial random variable with parameters n and p. What is the distri-
bution of n − X?

– Let X be a Discrete Uniform random variable over the interval [−n, n], n ≥ 1. What
is the distribution of |X|?

4 Expectation, Mean and Variance


• The expected value or mean of a random variable X with PMF pX is

  E[X] = ∑_x x pX(x)

– If X > c, then E[X] > c

– E[aX + b] = aE[X] + b

– The Bernoulli Random Variable with probability p of success: E[X] = p

– The Binomial and Associated Formulae:


  ∑_{k=0}^n (n choose k) a^k b^{n−k} = (a + b)^n,
  ∑_{k=1}^n k (n choose k) a^k b^{n−k} = na(a + b)^{n−1},
  ∑_{k=1}^n k² (n choose k) a^k b^{n−k} = na(na + b)(a + b)^{n−2}
– The Binomial Random Variable with n trials and probability p of success:
E[X] = np

– The Arithmetic and Associated Sums:



  ∑_{k=1}^n k = n(n + 1)/2,   ∑_{k=1}^n k² = n(n + 1)(2n + 1)/6,   ∑_{k=1}^n k³ = n²(n + 1)²/4

– The Discrete Uniform Random Variable over [m, n]: E[X] = (m + n)/2

– The Geometric and Associated Sums: For all z ≠ 1,

  ∑_{k=0}^n z^k = (1 − z^{n+1})/(1 − z)

and for |z| < 1,

  ∑_{k=0}^∞ z^k = 1/(1 − z),   ∑_{k=1}^∞ k z^{k−1} = 1/(1 − z)²,   ∑_{k=2}^∞ k(k − 1) z^{k−2} = 2/(1 − z)³
– The Geometric Random Variable with probability of success p: E[X] = 1/p.

– The Exponential Power Series: For all z, ∑_{k=0}^∞ z^k/k! = e^z

– The Poisson Random Variable with parameter λ: E[X] = λ.

• The variance of a random variable X with PMF pX is

var(X) = E[(X − E[X])2 ]

– var(X) = E[X 2 ] − E[X]2

– var(aX + b) = a2 var(X)

– The Bernoulli Random Variable with probability p of success: var(X) = p(1−p)

– The Binomial Random Variable with n trials and probability p of success:


var(X) = np(1 − p)
– The Discrete Uniform Random Variable over [m, n]: var(X) = (n − m)(n − m + 2)/12

– The Geometric Random Variable with probability of success p: var(X) = (1 − p)/p².

– The Poisson Random Variable with parameter λ: var(X) = λ.

• The nth moment of a random variable X with PMF pX is



  E[X^n] = ∑_x x^n pX(x)

– Let X be a discrete uniform random variable over [m, n]:


  E[X³] = [n²(n + 1)² − m²(m − 1)²] / [4(n − m + 1)]

• The expected value of the random variable Y = g(X) is



E[Y ] = E[g(X)] = g(x)pX (x).
x

– Let X be a geometric random variable with parameter p. For 0 < z < 1,


  E[z^X] = pz/(1 − qz)

5 Joint PMFs of Multiple Random Variables


• How do we deal with two (or more) random variables?

– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. Compute P[S = 9] and P[M = 5]. What is P[S = 9, M = 5]?

– Roll three fair dice. Let X be the sum of the outcomes of dice 1 and 2, and Y be
the sum of the outcomes of dice 2 and 3. Compute P[X = 9] and P[Y = 9]. What
is P[X = 9, Y = 9]?

• The joint PMF of the random variables X and Y is

pX,Y (x, y) = P[X = x, Y = y]

– Roll two fair dice. Let X be the minimum of the outcomes of the two dice and Y be
their maximum. Find the joint PMF of X and Y .

• The marginal PMFs of X and Y can be obtained from the joint PMF:

  pX(x) = ∑_y pX,Y(x, y),   pY(y) = ∑_x pX,Y(x, y)

– Consider the grid {0, 1, . . . , n} × {0, 1, . . . , n}. Assume that points on the grid are
equally likely to be selected and denote by (X, Y ) the coordinates of a random se-
lected point. What is the joint PMF of X and Y ? What is the marginal distribution
of X?

• Let Z be the random variable g(X, Y ). Then


– pZ(z) = ∑_{(x,y): g(x,y)=z} pX,Y(x, y)

– E[Z] = E[g(X, Y)] = ∑_x ∑_y g(x, y) pX,Y(x, y)

– in particular, E[aX + bY ] = aE[X] + bE[Y ].

6 Conditioning
• The conditional PMF of X given an event A with P(A) > 0, is

pX|A (x) = P[X = x|A]

– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum:

  P[S = x|M = 3] = 0.4 for x = 4, 0.4 for x = 5, and 0.2 for x = 6


• Conditional PMFs are true PMFs; i.e. pX|A(x) ≥ 0 and ∑_x pX|A(x) = 1.

• The conditional PMF of X given Y = y with pY(y) > 0, is

  pX|Y(x|y) = P[X = x|Y = y] = pX,Y(x, y)/pY(y)

• The conditional expectation of X given an event A with P(A) > 0, is



  E[X|A] = ∑_x x pX|A(x)

• The conditional expectation of X given Y = y with pY(y) > 0, is

  E[X|Y = y] = ∑_x x pX|Y(x|y)

and we have

  E[X] = ∑_y pY(y) E[X|Y = y]

– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. What is E[S|M = 3]?

Find E[S|M = y], y = 1, . . . , 6 and check that E[S] = ∑_{y=1}^6 pM(y) E[S|M = y].

7 Independence
• A random variable X is independent of the event A if for all x, {X = x} and A are
independent, i.e.
pX|A (x) = pX (x), for all x

– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. Is S independent of {M = 3}?

• X and Y are independent if for all pairs (x, y), {X = x} and {Y = y} are independent,
i.e.
pX,Y (x, y) = pX (x)pY (y), for all x, y

– Roll two fair dice. Let X be the minimum of the outcomes of the two dice and Y be
their maximum. Are X and Y independent?

• If X and Y are independent then, for any functions g and h, g(X) and h(Y ) are inde-
pendent.

• If X and Y are independent then

E[XY ] = E[X]E[Y ] and var(X + Y ) = var(X) + var(Y )



• The owner of a small drugstore is to order copies of a news magazine for the n potential
readers among his customers. Customers act independently and each one of them will
actually express interest in buying the news magazine with probability p. Suppose that
the store owner actually pays $B for each copy of the news magazine, and the price to
customers is $C. If magazines left at the end of the week have no salvage value, what is
the optimum number of copies to order?
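
One way to explore this question numerically is to compute the expected profit for each possible order quantity and take the best one. A sketch in Python (the values n = 20, p = 0.3, B = 2 and C = 5 are made up for illustration; the analysis the notes have in mind can also be done in closed form):

    from math import comb

    n, p = 20, 0.3      # potential readers, probability each one buys
    B, C = 2.0, 5.0     # cost and price per copy

    def expected_profit(m):
        # demand D is binomial(n, p); profit is C*min(D, m) - B*m
        total = 0.0
        for d in range(n + 1):
            pmf = comb(n, d) * p**d * (1 - p)**(n - d)
            total += pmf * (C * min(d, m) - B * m)
        return total

    best = max(range(n + 1), key=expected_profit)
    print(best, expected_profit(best))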
Chapter 3

General Random Variables

1 Continuous Random Variables and PDFs


• A random variable X is continuous if there is a nonnegative fX , called the probability
density function of X (PDF), such that
  P[a ≤ X ≤ b] = ∫_a^b fX(x) dx

• If X is a continuous RV with PDF fX , then

– For any x, P[X = x] = 0, and for any a and b, P[a ≤ X ≤ b] = P[a < X ≤ b] =
P[a ≤ X < b] = P[a < X < b].
– ∫_{−∞}^{+∞} fX(x) dx = 1.
– If δ is very small P[x < X < x + δ] ≈ fX (x)δ.
– Consider a continuous random variable whose PDF is given by

  fX(x) = cx² for 0 ≤ x ≤ 1, and 0 otherwise

∗ Find c.

∗ Compute P[X < 1/2], P[X ≤ 2] and more generally P[X ≤ x].

• Let X be a continuous RV with PDF fX , then

– E[X] = ∫_{−∞}^{+∞} x fX(x) dx

  ∗ What is E[X] if fX(x) = 3x², 0 ≤ x ≤ 1?

– E[g(X)] = ∫_{−∞}^{+∞} g(x) fX(x) dx

– var(X) = E[X²] − E[X]² = ∫_{−∞}^{+∞} x² fX(x) dx − (∫_{−∞}^{+∞} x fX(x) dx)²

  ∗ What is var(X) if fX(x) = 3x², 0 ≤ x ≤ 1?

• var(X) = E[(X − E[X])2 ] ≥ 0, E[aX + b] = aE[X] + b and var(aX) = a2 var(X).

• The Continuous Uniform Random Variable


– The PDF is fX(x) = 1/(b − a), a ≤ x ≤ b.
– The mean is E[X] = (a + b)/2.
– The variance is var(X) = (b − a)²/12.
• The Exponential Random Variable

– The PDF is fX(x) = λe^{−λx}, x ≥ 0.
– The mean is E[X] = 1/λ.
– The variance is var(X) = 1/λ².
• The Cauchy Random Variable
– The PDF is fX(x) = α/(π(x² + α²)).
– The mean and variance do not exist.

2 Cumulative Distribution Functions


• The cumulative distribution function (CDF) of a random variable X is

FX (x) = P[X ≤ x]

– Draw the CDF of a binomial random variable with parameters n = 3 and p = .5.

– Draw the CDF of an arbitrary discrete random variable (taking an arbitrary but
finite number of values).

– Draw the CDF of a uniform random variable over the interval [0, 1].

– Draw the CDF of an arbitrary continuous random variable.


– The Geometric Random Variable
  The CDF is FX(x) = 1 − (1 − p)^⌊x⌋, x ≥ 0, where ⌊x⌋ is the integer part of x.
– The Continuous Uniform Random Variable
  The CDF is FX(x) = (x − a)/(b − a), a ≤ x ≤ b.
– The Exponential Random Variable
  The CDF is FX(x) = 1 − e^{−λx}, x ≥ 0.
– The Cauchy Random Variable
  The CDF is FX(x) = 1/2 + (1/π) arctan(x/α).
2 π
• If FX is a CDF, then

– FX is non-decreasing (y ≤ z ⇒ FX (y) ≤ FX (z)),


– lim FX (x) = 0 and lim FX (x) = 1,
x→−∞ x→+∞
– FX is right-continuous.

• If X is discrete with PMF pX and CDF FX , then

– FX is piecewise constant and increases only by jumps,


– FX(x) = ∑_{y≤x} pX(y); in particular, if X takes integer values, FX(x) = ∑_{y=−∞}^{x} pX(y),

– pX(x) = jump of FX at x; in particular, if X takes integer values, pX(x) = FX(x) − FX(x − 1).

• If X is continuous with PDF fX and CDF FX , then

– FX is continuous,
– FX(x) = ∫_{−∞}^{x} fX(y) dy,
– fX (x) = FX′ (x).

3 The Gamma Random Variable


• The gamma function

  Γ(z) = ∫_0^∞ u^{z−1} e^{−u} du,   z > 0

has the following properties



– Γ(1/2) = √π,
– Γ(z + 1) = zΓ(z), for z > 0,

– Γ(n) = (n − 1)!, for an integer n ≥ 1.

• A gamma random variable X has PDF

  fX(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx},   x ≥ 0

• If X is a gamma random variable, then

– The mean is E[X] = α/λ.

– The variance is var(X) = α/λ².

4 Normal Random Variables


• A continuous random variable is said to be normal or Gaussian with mean µ and variance
σ 2 , σ > 0, if its PDF is

  f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.

• If X is a normal random variable with mean µ and variance σ 2 , then E[X] = µ and
var(X) = σ 2 .

• If X is normal with mean µ and variance σ 2 , then Y = aX + b, a ̸= 0, is normal with


mean aµ + b and variance a2 σ 2 .

• A normal random variable with mean 0 and variance 1 is called a standard normal random
variable.

• If Z is standard normal then X = σZ + µ is normal with mean µ and variance σ 2 .


Conversely, if X is normal with mean µ and variance σ², then Z = (X − µ)/σ is standard normal.

• If Z is a standard normal random variable then


  E[Z^{2k}] = (2k)!/(2^k k!) and E[Z^{2k+1}] = 0,   k = 0, 1, 2, . . .

5 Conditioning on an Event
• The conditional PDF fX|A of a continuous random variable X given an event A with
P(A) > 0 is defined as satisfying
  P[a < X < b|A] = ∫_a^b fX|A(x) dx.

– In particular, the conditional PDF fX|X∈B of a continuous random variable X given


the event {X ∈ B} with P(X ∈ B) > 0 is

  fX|X∈B(x) = fX(x)/P(X ∈ B),   x ∈ B.

∗ Let X be an exponential random variable with parameter λ. What is the con-


ditional PDF fX|X>a for a > 0?

The memoryless property


P[X > x + a|X > a] = P[X > x], x, a > 0.
• The conditional expectation of X given an event A with P(A) > 0, is E[X|A] = ∫_{−∞}^{+∞} x fX|A(x) dx.

  More generally, E[g(X)|A] = ∫_{−∞}^{+∞} g(x) fX|A(x) dx, and the conditional variance of X given A is var(X|A) = E[X²|A] − E[X|A]².

– In particular, for A = {X ∈ B}, E[X|X ∈ B] = ∫_B x fX|X∈B(x) dx and E[g(X)|X ∈ B] = ∫_B g(x) fX|X∈B(x) dx.
∗ Let X be an exponential random variable with parameter λ. Compute the
conditional expectation of X given the event {X > a} for a > 0.

∗ Compute var(X|X > a) for a > 0.

• If A1 , A2 , . . . , An form a partition with P(Ai ) > 0, for each i, then


– fX(x) = ∑_{i=1}^n fX|Ai(x) P(Ai)

– E[X] = ∑_{i=1}^n E[X|Ai] P(Ai)

– E[g(X)] = ∑_{i=1}^n E[g(X)|Ai] P(Ai)

∗ Example 3.11 The metro train arrives at the station near your home every
quarter hour starting at 6:00am. You walk into the station every morning be-
tween 7:10am and 7:30am, with the time in this interval being a uniform random
variable. What is the PDF of the time you have to wait for the first train to
arrive?

6 Multiple Continuous Random Variables


• Two random variables are said to be jointly continuous random variables if there exists
a non-negative function fX,Y(x, y) such that

  P[X ∈ A, Y ∈ B] = ∫_A ∫_B fX,Y(x, y) dx dy,   ∀A, B.

fX,Y (x, y) is called the joint density of the pair (X, Y ).

– fX,Y (x, y) must satisfy


∗ fX,Y (x, y) ≥ 0 and
∗ ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fX,Y(x, y) dx dy = 1.

– We select a point “at random” from the unit square [0, 1] × [0, 1] and denote by
(X, Y ) its coordinate. Then
  f(X,Y)(x, y) = 1 for 0 ≤ x, y ≤ 1, and 0 elsewhere

– f (x, y) = λ2 e−λy , 0 < x < y, is a joint PDF.

– f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²))) is a joint PDF.

• If X and Y are jointly continuous with joint density fX,Y (x, y), then

– X and Y are both continuous.


– fX(x) = ∫_{−∞}^{+∞} fX,Y(x, y) dy is the density of X, and fY(y) = ∫_{−∞}^{+∞} fX,Y(x, y) dx is the density of Y. fX(x) and fY(y) are called the marginal densities of X and Y respectively.

  ∗ f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d

  ∗ f(x, y) = λ²e^{−λy}, 0 < x < y

  ∗ f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²)))

• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y).
  E[g(X, Y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) fX,Y(x, y) dx dy.

In particular

  E[XY] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} xy fX,Y(x, y) dx dy.

– f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d

– f(x, y) = λ²e^{−λy}, 0 < x < y

– f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²)))

• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y). The
conditional PDF fX|Y (x|y) of X given Y is defined, whenever fY (y) > 0, as

  fX|Y(x|y) = fX,Y(x, y)/fY(y).

– f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d

– f(x, y) = λ²e^{−λy}, 0 < x < y

– f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²)))

– fX,Y(x, y) = fX|Y(x|y) fY(y), fX(x) = ∫_{−∞}^{+∞} fX|Y(x|y) fY(y) dy and P[X ∈ A|Y = y] = ∫_A fX|Y(x|y) dx.
  Furthermore, Bayes’ rule for continuous random variables holds:

  fY|X(y|x) = fX|Y(x|y) fY(y) / fX(x) = fX|Y(x|y) fY(y) / ∫_{−∞}^{+∞} fX|Y(x|v) fY(v) dv.

• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y). The
conditional expectations of X, g(X) and h(X, Y ) given Y = y are

– E[X|Y = y] = ∫_{−∞}^{+∞} x fX|Y(x|y) dx,

  ∗ f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d

  ∗ f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²)))

– E[g(X)|Y = y] = ∫_{−∞}^{+∞} g(x) fX|Y(x|y) dx,

– E[h(X, Y)|Y = y] = ∫_{−∞}^{+∞} h(x, y) fX|Y(x|y) dx.

• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y). Then

E[h(X, Y )k(Y )|Y = y] = k(y)E[h(X, Y )|Y = y].

• Let X and Y be two jointly continuous random variables. Then


  E[g(X, Y)] = ∫_{−∞}^{+∞} E[g(X, Y)|Y = y] fY(y) dy.

In particular

– E[X] = ∫_{−∞}^{+∞} E[X|Y = y] fY(y) dy,

– E[g(X)] = ∫_{−∞}^{+∞} E[g(X)|Y = y] fY(y) dy,

• Two jointly continuous random variables, X and Y , are said to be independent if

fX,Y (x, y) = fX (x)fY (y).

– X and Y are independent if and only if

fX|Y (x|y) = fX (x).



– If X and Y are independent then, for any functions g and h,

E[g(X)h(Y )] = E[g(X)]E[h(Y )].

In particular E[XY ] = E[X]E[Y ] and var(X + Y ) = var(X) + var(Y ).


∗ f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d

∗ f(x, y) = λ²e^{−λy}, 0 < x < y

7 Derived Distributions
• Let X be a continuous random variable. To find the distribution of Y = g(X), one must
obtain the CDF of Y :

  FY(y) = P[g(X) ≤ y] = ∫_{x: g(x)≤y} fX(x) dx

and differentiate the result to obtain the PDF of Y .

– Let X be standard normal. What is the distribution of X 2 ?

• Let X be a continuous random variable and Y = aX + b (a ≠ 0). Then

  fY(y) = (1/|a|) fX((y − b)/a).

• Let X be a continuous random variable and Y = g(X) where g is a strictly monotone and differentiable function. Then

  fY(y) = fX(g⁻¹(y)) / |g′(g⁻¹(y))|.

8 Simulations
The Inverse Transform Method
• Let X be a continuous random variable. Assume that FX is strictly increasing on the
range of X. Then FX(X) is a uniform over [0, 1] random variable. Conversely, if U is a uniform over [0, 1] random variable, then FX⁻¹(U) is distributed like X.

• To generate an exponential random variable with parameter λ, simply generate a uniform


over [0, 1] random variable U and transform it in this way:

  X = −(1/λ) ln(1 − U).

Since 1 − U and U have the same distribution,

  X = −(1/λ) ln U
also has an exponential distribution with parameter λ.
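
A minimal sketch of the inverse transform method in Python (λ = 2 and the sample size are arbitrary choices):

    import random
    from math import log

    lam = 2.0
    n = 100_000

    # X = -(1/lam) * ln(1 - U) is exponential with parameter lam
    sample = [-log(1.0 - random.random()) / lam for _ in range(n)]

    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    print(mean, var)    # should be close to 1/lam = 0.5 and 1/lam**2 = 0.25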

The Acceptance-Rejection Method


• The acceptance-rejection method is useful when the inverse transform method is not
practical because FX−1 is hard to compute.

• Suppose X has a density f and that there exists a density g and a constant C such that

  ∀x, f(x) ≤ Cg(x).

Then, given a sequence of random variables having density g, the following algorithm
produces a (single) simulation of X.

1. Simulate a number from the distribution with density g. Call the outcome y.
2. Simulate a number from a uniform over [0, Cg(y)]. Call the outcome u.
3. If u ≤ f (y), then take X = y. Otherwise return to step 1.

Indeed, let Y be a random variable with density g and U, conditionally on Y = y, be uniform over [0, Cg(y)]. Then

  P(Y ≤ x | U ≤ f(Y)) = F(x),

where F is the CDF of X.

It follows that the accepted value has CDF F and density f.
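
As an illustration of the three steps (a sketch only; the target density f(x) = 3x² on [0, 1], the uniform proposal g and the constant C = 3 are chosen for this example, not taken from the notes):

    import random

    def f(x):
        return 3 * x * x            # target density on [0, 1]

    C = 3.0                         # f(x) <= C * g(x) with g(x) = 1 on [0, 1]

    def accept_reject():
        while True:
            y = random.random()                # step 1: draw from g (uniform on [0, 1])
            u = random.uniform(0, C * 1.0)     # step 2: uniform on [0, C*g(y)]
            if u <= f(y):                      # step 3: accept, otherwise start again
                return y

    sample = [accept_reject() for _ in range(100_000)]
    print(sum(sample) / len(sample))           # close to E[X] = 3/4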


Chapter 4

Further Topics on Random Variables

1 Transforms
• The moment generating function (MGF) of a random variable X is
MX (t) = E[etX ].
In the discrete case,

  MX(t) = ∑_x e^{tx} pX(x).

In the continuous case,

  MX(t) = ∫_{−∞}^{+∞} e^{tx} fX(x) dx.

Note that in general MX (t) may not be defined for all values of t, however MX (0) is
always defined and equals 1.
• Moment generating functions for some common random variables

– Bernoulli p: pX(x) = p^x (1 − p)^{1−x}, x = 0, 1

  MX(t) = 1 − p + pe^t

– Binomial (n, p): pX(x) = (n choose x) p^x (1 − p)^{n−x}, x = 0, 1, . . . , n

  MX(t) = (1 − p + pe^t)^n


– Discrete Uniform over [m, n]: pX(x) = 1/(n − m + 1), x = m, . . . , n

  MX(t) = (e^{(n+1)t} − e^{mt}) / ((n − m + 1)(e^t − 1))

– Geometric p: pX(x) = p(1 − p)^{x−1}, x = 1, 2, . . .

  MX(t) = pe^t / (1 − (1 − p)e^t),   t < − ln(1 − p)

– Poisson λ: pX(x) = e^{−λ} λ^x / x!, x = 0, 1, . . .

  MX(t) = e^{λ(e^t − 1)}

– Uniform over [a, b]: fX(x) = 1/(b − a), a ≤ x ≤ b

  MX(t) = (e^{bt} − e^{at}) / ((b − a)t)

– Exponential λ: fX(x) = λe^{−λx}, x > 0

  MX(t) = λ/(λ − t),   t < λ

– Gamma (α, λ): fX(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x > 0

  MX(t) = (λ/(λ − t))^α,   t < λ

– Normal (µ, σ²): fX(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)), x ∈ R

  MX(t) = exp(µt + σ²t²/2)

• If MX (t) = MY (t) < +∞ for all values of t in an open interval containing 0, then X and
Y have the same CDF (distribution). In other words, if MX (t) is finite for all values of t
in an open interval containing 0, then MX (t) determines uniquely the CDF (distribution)
of X.

– Identify the distributions of the following random variables:


∗ MU(t) = (2/3 + (1/3)e^t)^4

∗ MV(t) = exp(t² − 2t)

∗ MW(t) = e^t / (2 − e^t)

∗ MX(t) = 1/(1 − 2t)

∗ MY(t) = sinh(t)/t

∗ MZ(t) = e^t (2/3 + (1/3)e^t)^4

• If the MGF of X can be expressed as



  MX(t) = p1 e^{t x1} + p2 e^{t x2} + . . . + pn e^{t xn} = ∑_{k=1}^n pk e^{t xk},

then X has a discrete distribution with PMF

  P[X = xk] = pk,   k = 1, . . . , n.

– MX(t) = (2/3 + (1/3)e^t)²

• MaX+b (t) = ebt MX (at).

– If Z is standard normal and X = σZ + µ, then X is normal (µ, σ 2 ).

• If MX (t) is finite for all values of t in an open interval containing 0, then it admits
derivatives at 0 of all orders and
  MX^(n)(0) = E[X^n].

– Let X be binomial (n, p). Then MX′ (0) = np.

– Let X be normal (µ, σ 2 ). Then MX′′ (0) = σ 2 + µ2 .



• If the MGF of X can be expressed as


  MX(t) = ∑_{n=0}^{+∞} µn t^n/n!,

then X has moments

  E[X^n] = µn,   n = 0, 1, . . . .

– Let X be standard normal. Then

  E[X^n] = (2k)!/(2^k k!) if n is even and n = 2k, and E[X^n] = 0 if n is odd

• If X and Y are independent, then

MX+Y (t) = MX (t)MY (t).

– Let X and Y be independent continuous uniform over [0, 1]. Then


  MX+Y(t) = ((e^t − 1)/t)²

2 Sums of Independent Random Variables


2.1 MGF Approach
• The distribution of the sum of two (or more) independent random variables can be ob-
tained by first computing its MGF, and then identifying (inverting) it.

– The sum of n independent Bernoulli random variables with the same parameter is
binomial.

– The sum of two independent Poisson random variables is Poisson.

– The sum of two independent normal random variables is normal.

– The sum of two independent exponential random variables with the same parameter
is gamma.

2.2 Direct Approach - Convolution


• The discrete case: if X and Y are independent discrete random variables, then

  pX+Y(z) = ∑_x pX(x) pY(z − x) = ∑_y pX(z − y) pY(y).

– The sum of two independent Bernoulli p random variables is binomial (2, p).

– The function γ(z) = ∑_x ϕ(x)ψ(z − x) = ∑_y ϕ(z − y)ψ(y) is called the convolution of ϕ and ψ.

• The continuous case: if X and Y are independent continuous random variables, then

  fX+Y(z) = ∫_{−∞}^{+∞} fX(x) fY(z − x) dx = ∫_{−∞}^{+∞} fX(z − y) fY(y) dy.

– The sum of two independent exponential λ random variables is gamma (2, λ).

– The function γ(z) = ∫_{−∞}^{+∞} ϕ(x)ψ(z − x) dx = ∫_{−∞}^{+∞} ϕ(z − y)ψ(y) dy is also called the convolution of ϕ and ψ.
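
The discrete convolution formula translates directly into code. A short sketch (the dictionary representation of a PMF and the choice p = 0.3 are ours):

    def convolve(pX, pY):
        # pX, pY: dicts mapping values to probabilities
        pZ = {}
        for x, px in pX.items():
            for y, py in pY.items():
                pZ[x + y] = pZ.get(x + y, 0.0) + px * py
        return pZ

    p = 0.3
    bernoulli = {0: 1 - p, 1: p}
    print(convolve(bernoulli, bernoulli))
    # approximately {0: 0.49, 1: 0.42, 2: 0.09}, the binomial (2, 0.3) PMF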

3 More on Conditional Expectation and Variance


• Recall that E[Y |X = x] is a function of x. We define E[Y |X] as the random variable
obtained by replacing x by X in E[Y |X = x].
– If (X, Y) has pdf f(x, y) = (1/(2π√(1 − r²))) exp(−(x² − 2rxy + y²)/(2(1 − r²))), then E[Y|X = x] = rx and E[Y|X] = rX

– If (X, Y) has pdf f(x, y) = λ²e^{−λy}, 0 < x < y, then E[Y|X = x] = x + 1/λ and E[Y|X] = X + 1/λ.

– If (X, Y) has pdf f(x, y) = e^{−x}/(e^x − 1), 0 < ln y < x, then E[Y|X = x] = (1 + e^x)/2 and E[Y|X] = (1 + e^X)/2.

– If (X, Y) has pdf f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d, then E[Y|X = x] = (c + d)/2 and E[Y|X] = (c + d)/2.

• The law of total expectation (also known as the law of iterated expectations)
E[E[Y |X]] = E[Y ].

– In the discrete case:

  E[Y] = ∑_x E[Y|X = x] pX(x) = E[E[Y|X]].

– In the continuous case:

  E[Y] = ∫_{−∞}^{+∞} E[Y|X = x] fX(x) dx = E[E[Y|X]].

  ∗ If (X, Y) has pdf f(x, y) = λ²e^{−λy}, 0 < x < y, then E[Y] = 2/λ.

• The conditional variance of Y given X = x is the variance of the conditional PMF/PDF


of Y given X = x; that is,

var(Y |X = x) = E[Y 2 |X = x] − E[Y |X = x]2 .

– In the discrete case,

  var(Y|X = x) = ∑_y y² pY|X(y|x) − (∑_y y pY|X(y|x))².

– In the continuous case,

  var(Y|X = x) = ∫ y² fY|X(y|x) dy − (∫ y fY|X(y|x) dy)².

  ∗ If (X, Y) has pdf f(x, y) = λ²e^{−λy}, 0 < x < y, then var(Y|X = x) = 1/λ².

– var(Y |X) is the random variable obtained by replacing x by X in var(Y |X = x).

• The law of total variance

– Let R be a continuous uniform over [0, 1] random variable and, conditional on R = r, X is binomial with parameters n and r. Then X is discrete uniform over [0, n] and var(X) = n(n + 2)/12.

– The law of total variance:

var(Y ) = E[var(Y |X)] + var(E[Y |X]).


– If (X, Y) has pdf f(x, y) = λ²e^{−λy}, 0 < x < y, then var(Y) = 2/λ².

4 Sum of a Random Number of Random Variables


• Example 4.22 Jane visits a number of bookstores, looking for Great Expectations. Any
given bookstore carries the book with probability p, independent of the others. In a typical
bookstore visited, Jane spends a random amount of time, exponentially distributed with
parameter λ, until she either finds the book or she determines that the bookstore does not
carry it. Assuming that Jane will keep visiting bookstores until she buys the book and
that the time spent in each is independent of everything else, what is the mean, variance
and PDF of the total time Y spent in bookstores?

– Conditional on Jane finding the book in the first bookstore, Y is exponential λ, and therefore has conditional mean equal to 1/λ, conditional variance equal to 1/λ², and conditional PDF equal to λe^{−λx}, x > 0.
– Let N be the number of bookstores needed to find the book. Then

  E[Y|N = n] = n/λ,   var(Y|N = n) = n/λ²,   fY|N=n(y) = (λ^n/(n − 1)!) y^{n−1} e^{−λy},   y > 0.

– N is geometric p. Therefore

  E[Y] = 1/(pλ),   var(Y) = 1/(p²λ²),   fY(y) = pλ e^{−pλy},   y > 0.

• Let X1, X2, . . . be a sequence of independent and identically distributed random variables with mean µ, variance σ² and MGF MX(t). Consider the sum of a random number of random variables

  Y = X1 + . . . + XN

where N is an integer-valued random variable independent of the family X1, X2, . . ..

– E[Y ] = µE[N ].

– var(Y ) = σ 2 E[N ] + µ2 var(N ).


– MY(t) = ∑_{n=0}^{+∞} MX(t)^n pN(n).
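
These formulas (and the bookstore example) can be checked by simulation. A sketch with assumed parameters p = 0.4 and λ = 2, so that E[Y] = 1/(pλ) = 1.25 and var(Y) = 1/(p²λ²) = 1.5625:

    import random
    from math import log

    p, lam = 0.4, 2.0
    trials = 200_000

    def total_time():
        total = 0.0
        while True:
            total += -log(1.0 - random.random()) / lam   # exponential time in one bookstore
            if random.random() < p:                      # this bookstore carries the book
                return total

    ys = [total_time() for _ in range(trials)]
    mean = sum(ys) / trials
    var = sum((y - mean) ** 2 for y in ys) / trials
    print(mean, var)    # close to 1.25 and 1.5625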

5 Covariance and Correlation


• The covariance of two random variables X and Y is given by
cov(X, Y ) = E[XY ] − E[X]E[Y ].
If cov(X, Y ) = 0, we say that X and Y are uncorrelated.
• Let X and Y be two random variables.

– cov(X, Y ) = E[(X − E[X])(Y − E[Y ])].

– If X and Y are independent, then they are uncorrelated. The converse is not always
true.
– var(X + Y ) = var(X) + var(Y ) + 2cov(X, Y ).

• If X1 , X2 , . . . , Xn are n random variables, then


  var(∑_{k=1}^n Xk) = ∑_{k=1}^n var(Xk) + 2 ∑_{k=2}^n ∑_{l=1}^{k−1} cov(Xk, Xl).

• The correlation coefficient of two random variables X and Y is given by


  corr(X, Y) = cov(X, Y) / √(var(X) var(Y)).

• Let X and Y be two random variables. Then

– −1 ≤ corr(X, Y ) ≤ 1 and
– |corr(X, Y)| = 1 if and only if Y = aX + b for some a ≠ 0 and some b.

6 Least Square Estimation


• E[(X − c)2 ] is minimum for c = E[X]:

E[(X − c)2 ] ≥ E[(X − E[X])2 ], for all c.

• E[(X − C(y))2 |Y = y] is minimum for C(y) = E[X|Y = y]:

E[(X − C(y))2 |Y = y] ≥ E[(X − E[X|Y = y])2 |Y = y],

for all functions C(y).

• Out of all estimators (random variables) g(Y ) based on Y , the mean squared estimation
error E[(X − g(Y ))2 ] is minimum for g(Y ) = E[X|Y ]:

E[(X − g(Y ))2 ] ≥ E[(X − E[X|Y ])2 ], for all functions g(Y ).

• Out of all linear estimators aY + b based on Y , the mean squared estimation error E[(X −
(aY + b))2 ] is minimum for

  a = cov(X, Y)/var(Y)   and   b = E[X] − (cov(X, Y)/var(Y)) E[Y].
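
A small numerical illustration of the last point (a sketch; the model X = 2Y + noise is invented purely so the answer is known in advance):

    import random

    random.seed(0)
    n = 50_000
    ys = [random.gauss(0, 1) for _ in range(n)]
    xs = [2 * y + random.gauss(0, 0.5) for y in ys]    # X = 2Y + noise

    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_y = sum((y - my) ** 2 for y in ys) / n

    a = cov / var_y
    b = mx - a * my
    print(a, b)    # close to 2 and 0: the best linear estimator of X based on Y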

7 The Multivariate Normal Distribution

• The pair (U, V ) is said to have a standard bivariate normal distribution if their joint PDF
is

  fU,V(u, v) = (1/(2π√(1 − ρ²))) exp{−(u² + v² − 2ρuv)/(2(1 − ρ²))}

– U and V are standard normal.

– The correlation coefficient of U and V is ρ and, U and V are independent if and


only if ρ = 0.

– Conditionally on V = v, U is normal with mean ρv and variance 1 − ρ2 .

– If U and Z are two independent standard normal random variables, and V = ρU + √(1 − ρ²) Z (|ρ| < 1), then (U, V) has a standard bivariate normal distribution with correlation ρ.

• The pair (X, Y ) is said to have a bivariate normal distribution if their joint PDF is

  fX,Y(x, y) = (1/(2πστ√(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x − µ)²/σ² + (y − ν)²/τ² − 2ρ(x − µ)(y − ν)/(στ) ] }

– If (U, V ) has a standard bivariate normal distribution with correlation ρ, then X =


σU + µ and Y = τ V + ν has the bivariate normal distribution displayed above.

– X is normal with mean µ and variance σ 2

– Y is normal with mean ν and variance τ 2

– The correlation coefficient of X and Y is ρ and, X and Y are independent if and only if ρ = 0

– Conditionally on Y = y, X is normal with mean µ + ρ(σ/τ)(y − ν) and variance σ²(1 − ρ²).

• If we write Z for the column vector (X, Y)′, m for (µ, ν)′ and Σ for the matrix

    Σ = [ σ²    ρστ ]
        [ ρστ   τ²  ],

  then the determinant of Σ is |Σ| = σ²τ²(1 − ρ²), its inverse is

    Σ⁻¹ = (1/(σ²τ²(1 − ρ²))) [ τ²    −ρστ ]
                             [ −ρστ   σ²  ]

  and, with z = (x, y)′,

    fX,Y(x, y) = fZ(z) = (1/(2π|Σ|^{1/2})) exp(−(1/2)(z − m)′ Σ⁻¹ (z − m)).
Here z′ denotes the transpose of z.

• An n-dimensional vector Z is said to have a multivariate normal distribution with (n-dimensional) mean vector m and (n-by-n-dimensional) covariance matrix Σ if

  fZ(z) = (1/((2π)^{n/2}|Σ|^{1/2})) exp(−(1/2)(z − m)′ Σ⁻¹ (z − m)).

• Suppose Z has an n-dimensional multivariate normal distribution with mean vector m and covariance matrix Σ, and let a be a non-zero n-dimensional vector. Then a′Z is (univariate) normal with mean a′m and variance a′Σa.
• In particular all the marginals of Z are (univariate) normal.
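
The construction given earlier (V = ρU + √(1 − ρ²)Z) makes these facts easy to check by simulation. A sketch with arbitrary illustrative values of ρ, µ, ν, σ and τ:

    import random
    from math import sqrt

    rho, mu, nu, sigma, tau = 0.6, 1.0, -2.0, 2.0, 0.5
    n = 200_000

    xs, ys = [], []
    for _ in range(n):
        u = random.gauss(0, 1)
        z = random.gauss(0, 1)
        v = rho * u + sqrt(1 - rho**2) * z    # (U, V) standard bivariate normal, correlation rho
        xs.append(sigma * u + mu)             # X = sigma*U + mu
        ys.append(tau * v + nu)               # Y = tau*V + nu

    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    print(mx, my)                   # close to mu and nu
    print(cov, rho * sigma * tau)   # both close to 0.6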
Chapter 5

Limit Theorems

• Let X1, . . . , Xn be independent and identically distributed random variables with mean µ and variance σ². If Mn = (1/n)(X1 + . . . + Xn), then

  E[Mn] = µ   and   var(Mn) = σ²/n.

• As n ↑ +∞, var(Mn ) ↓ 0 and “Mn approaches µ” (in some sense). This is a “first order
approximation” of Mn .

• The Central Limit Theorem gives a “second order approximation” of Mn :

  (Mn − µ)/(σ/√n) converges (in some sense) to a standard normal random variable.

1 Markov and Chebyshev’s Inequalities


• If X is a nonnegative random variable, then

  P[X ≥ a] ≤ E[X]/a,   a > 0.

– Let X be exponential λ. Compare the Markov bound to the exact probability


P[X ≥ a].

• If X is a random variable with mean µ and variance σ 2 , then

  P[|X − µ| ≥ c] ≤ σ²/c²,   c > 0.

In particular

  P[|X − µ| ≥ kσ] ≤ 1/k²,   k > 0.
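
For the exercise above, the bounds can be compared with the exact exponential tail probabilities numerically. A sketch assuming λ = 1 (so µ = σ = 1):

    from math import exp

    lam = 1.0    # exponential rate; E[X] = 1/lam, var(X) = 1/lam**2
    for a in [1, 2, 5, 10]:
        exact = exp(-lam * a)        # P[X >= a]
        markov = (1 / lam) / a       # Markov bound E[X]/a
        print(a, exact, markov)

    mu, sigma = 1 / lam, 1 / lam
    for c in [1, 2, 5]:
        # for c >= mu the left tail is empty, so P[|X - mu| >= c] = P[X >= mu + c]
        exact = exp(-lam * (mu + c))
        chebyshev = sigma**2 / c**2
        print(c, exact, chebyshev)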

2 The Weak Law of Large Numbers


• Let X1 , . . . , Xn be independent and identically distributed random variables with mean
µ and variance σ 2 . For every ε > 0, we have

P[|Mn − µ| > ε] −→ 0, as n → ∞.

We say that Mn converges in probability to µ.

• Probabilities and Frequencies. Let p = P(A), for some event A. We consider n independent repetitions of the experiment, and let Mn be the fraction of times A occurs (empirical frequency). Then
P[|Mn − p| > ε] −→ 0, as n → ∞.

3 The Central Limit Theorem


• Let X1 , . . . , Xn be independent and identically distributed random variables with mean
µ and variance σ 2 , and define

  Zn = (Mn − µ)/(σ/√n).

Then, the CDF of Zn converges to the standard normal CDF

  Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−x²/2} dx,

in the sense that

  lim_{n↑∞} P[Zn ≤ z] = Φ(z).

• Let Sn = X1 + . . . + Xn . If n is large,

  P[Sn ≤ c] ≈ Φ((c − nµ)/(σ√n)).

• A machine processes parts one at a time. The processing times of different parts are inde-
pendent random variables uniformly distributed over [1, 5]. Approximate the probability
that the number of parts processed within 320 time units, denoted by N , is at least
100.

• De Moivre-Laplace Approximation to the Binomial

– A binomial random variable Sn with parameters n and p can be viewed as the sum
of n independent Bernoulli random variables X1 , . . . , Xn with common parameter p:

Sn = X1 + . . . + Xn .

Therefore,
  P[a ≤ Sn ≤ b] ≈ Φ((b − np)/√(np(1 − p))) − Φ((a − np)/√(np(1 − p))).

– Let Sn be a binomial random variable with parameters n = 36 and p = 0.5. An exact calculation yields

  P[Sn ≤ 21] = ∑_{k=0}^{21} (36 choose k) (0.5)^{36} = 0.8785.

The CLT approximation gives

P[Sn ≤ 21] ≈ 0.8413.

Note that
P[Sn ≤ 21.5] ≈ 0.879.
This “continuity correction” is often used as a refinement to the CLT approximation.
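
The three numbers quoted above can be reproduced with a few lines of Python (standard library only; Φ is obtained from math.erf):

    from math import comb, erf, sqrt

    def Phi(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, p = 36, 0.5
    mu, sd = n * p, sqrt(n * p * (1 - p))

    exact = sum(comb(n, k) * 0.5**n for k in range(22))   # P[Sn <= 21]
    clt = Phi((21 - mu) / sd)                             # plain CLT approximation
    corrected = Phi((21.5 - mu) / sd)                     # with continuity correction

    print(exact, clt, corrected)    # about 0.8785, 0.8413 and 0.879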

4 The Strong Law of Large Numbers


• Let X1 , . . . , Xn be independent and identically distributed random variables with mean
µ. Then the sequence of sample means Mn converges to µ, with probability 1, in the
sense that

  P[ lim_{n↑∞} Mn = µ ] = 1.

• Convergence with probability 1 (also called almost sure convergence) is much stronger
than convergence in probability, and the SLLN is much stronger than the WLLN.

• Consider a discrete-time arrival process. The set of times is partitioned into consecutive
intervals of the forms Ik = {2k , 2k + 1, . . . , 2k+1 − 1}. During each interval Ik , there is
exactly one arrival, and all times are equally likely. The arrival times within different
intervals are assumed to be independent. Let Yn = 1 if there is an arrival at time n, and
Yn = 0 if there is no arrival. Then Yn converges to 0 in probability but does not converge to 0 with probability 1.

Probability is common sense reduced to calculation.


Laplace
MTH2222 – Mathematics of Uncertainty

Problem Sets

Tutorial 01 – Problem Set


Week 2: Mon 3 Aug – Fri 7 Aug 2015

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Let A and B be two sets.

(a) Show that (Ac ∩ B c )c = A ∪ B and (Ac ∪ B c )c = A ∩ B.


(b) Consider rolling a six-sided die once. Let A be the set of outcomes where an odd
number comes up. Let B be the set of outcomes where a 1 or a 2 comes up. Calculate
the sets on both sides of the equalities in part (a), and verify that the equalities hold.

2. Let A and B be two sets with a finite number of elements. Show that the number of
elements in A ∩ B plus the number of elements in A ∪ B is equal to the number of
elements in A plus the number of elements in B.

3. We are given that P(Ac ) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Determine P(A ∪ B).

4. We roll a four-sided die once and then we roll it as many times as is necessary to obtain a
different face than the one obtained in the first roll. Let the outcome of the experiment be
(r1 , r2 ) where r1 and r2 are the results of the first and the last rolls, respectively. Assume
that all possible outcomes have equal probability. Find the probability that:

(a) r1 is even.
(b) Both r1 and r2 are even.
(c) r1 + r2 < 5.

5. Alice and Bob each choose at random a number between zero and two. We assume a
uniform probability law under which the probability of an event is proportional to its
area. Consider the following events:
A: The magnitude of the difference of the two numbers is greater than 1/3.
B: At least one of the numbers is greater than 1/3.
C: The two numbers are equal.
D: Alice's number is greater than 1/3.
Find the probabilities P(A), P(B), P(A ∩ B), P(C), P(D), P(A ∩ D).

6. Show the following generalizations of the formula

P(A ∪ B ∪ C) = P(A) + P(Ac ∩ B) + P (Ac ∩ B c ∩ C).



(a) Let A, B, C, and D be events. Then


P(A ∪ B ∪ C ∪ D) = P(A) + P(Ac ∩ B) + P (Ac ∩ B c ∩ C) + P(Ac ∩ B c ∩ C c ∩ D).
(b) Let A1 , A2 , . . . , An be events. Then
  P(⋃_{k=1}^n Ak) = P(A1) + P(A1^c ∩ A2) + P(A1^c ∩ A2^c ∩ A3) + . . . + P(A1^c ∩ . . . ∩ A_{n−1}^c ∩ An).

7. Suppose that P(E) = 0.6. What can you say about P(E | F ) when

(a) E and F are disjoint?


(b) E ⊂ F ?
(c) F ⊂ E?

8. We roll two fair 6-sided dice. Each one of the 36 possible outcomes is assumed to be
equally likely.

(a) Find the probability that doubles were rolled.


(b) Given that the roll resulted in a sum of 4 or less, find the conditional probability
that doubles were rolled.
(c) Find the probability that at least one die is a 6.
(d) Given that the two dice land on different numbers, find the conditional probability
that at least one die is a 6.

9. A new test has been developed to determine whether a given student is overstressed.
This test is 95% accurate if the student is not overstressed, but only 85% accurate if the
student is in fact overstressed. It is known that 99.5% of all students are overstressed.
Given that a particular student tests negative for stress, what is the probability that the
test results are correct, and that this student is not overstressed?

Revision problems
1. We are given that P(A) = 0.55, P(B c ) = 0.35, and P(A ∪ B) = 0.75. Determine P(B)
and P(A ∩ B).
2. Let A and B be two sets. Under what conditions is the set A ∩ (A ∪ B)c empty?
3. A magical four-sided die is rolled twice. Let S be the sum of the results of the two rolls.
We are told that the probability that S = k is proportional to k, for k = 2, 3, . . . , 8,
and that all possible ways that a given sum k can arise are equally likely. Construct an
appropriate probabilistic model and find the probability of getting doubles.
4. Show the formula
P((A ∩ B c ) ∪ (Ac ∩ B)) = P(A) + P(B) − 2P(A ∩ B),
which gives the probability that exactly one of the events A and B will occur. [Compare
with the formula P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which gives the probability that
at least one of the events A and B will occur.]

5. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the
coins at random, and when he flips it, it shows heads. What is the probability that
it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. What
is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows heads. What is now
the probability that it is the fair coin?

6. Alice and Bob have 2n + 1 coins, each with probability of a head equal to 1/2. Bob tosses
n + 1 coins, while Alice tosses the remaining n coins. Show that the probability that after
all the coins have been tossed, Bob will have gotten more heads than Alice is 1/2.

Tutorial 02 – Problem Set


Week 3: Mon 10 Aug – Fri 14 Aug 2015

Problems to be submitted at the start of your tutorial


1. Write the law of total probability for an event U and a partition {A, B, C}. [2]

2. Give (a precise and complete definition of) the probability mass function of a binomial
random variable with k trials and probability of success u. [2]

3. Let X be a random variable with PMF


x 0 2 4 5
p(x) 0.15 0.30 0.30 0.25
Find P[2 < X < 4], P[X ≤ 3] and E[X]. [4]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. A magnetic tape storing information in binary form has been corrupted, so it can only
be read with bit errors. The probability that you correctly detect a 0 is 0.9, while
the probability that you correctly detect a 1 is 0.85. Each digit is a 1 or a 0 with equal
probability. Given that you read a 1, what is the probability that this is a correct reading?

2. Bonferroni’s inequality.
Prove that for any two events A and B, we have

P(A ∩ B) ≥ P(A) + P(B) − 1.

3. (a) Find an example in which P(A ∩ B) < P(A)P(B).


(b) Find an example in which P(A ∩ B) > P(A)P(B).

4. Let A and B be events such that A ⊂ B. Can A and B be independent? Explain.

5. Suppose that A, B, and C are independent. Use the definition of independence to show
that A and B ∪ C are independent.

6. A parking lot consists of a single row containing n parking spaces (n ≥ 2). Mary arrives
when all spaces are free. Tom is the next person to arrive. Each person makes an equally
likely choice among all available spaces at the time of arrival. Describe the sample space.
Obtain P(A), the probability the parking spaces selected by Mary and Tom are at most
2 spaces apart.

7. A company is interviewing potential employees. Suppose that each candidate is either


qualified or unqualified, with given probabilities q and 1 − q, respectively. The company
tries to determine a candidate's qualifications by asking 20 true-false questions. A qual-
ified candidate has probability p of answering a question correctly, while an unqualified
candidate has a probability p of answering incorrectly. The answers to different questions
are assumed to be independent. If the company considers anyone with at least 15 correct
answers qualified, and everyone else unqualified, give a formula for the probability that
the 20 questions will correctly identify someone to be qualified or unqualified.

8. Let A1 , A2 , . . . , An be an independent events and P(Ak ) = pk , 1 ≤ k ≤ n.

(a) What is the probability that at least one of the events A1 , A2 , . . . , An occurs?
(b) What is the probability that none of the events A1 , A2 , . . . , An occurs?

9. Let X be the outcome of the roll of a fair die. Write the PMF of X.

10. The annual premium of a special kind of insurance starts at $1000 and is reduced by
10% after each year where no claim has been filed. The probability that a claim is filed
in a given year is 0.05, independently of preceding years. What is the PMF of the total
premium paid up to and including the year when the first claim is filed?

11. Let X be a discrete random variable that is uniformly distributed over the set of integers
in the range [a, b], where a and b are integers with a < 0 < b. Find the PMF of the
random variable max(0, X).

12. Let Y be a random variable with PMF


y 0 1 2 3 4 5
p(y) 0.05 0.30 0.30 0.20 0.10 0.05
Find P[Y < 3], P[Y ≥ 2], E[Y ] and var(Y ).

13. Let X be a discrete random variable, and let Y = |X|.

(a) Assume that the PMF of X is


{
Kx2 if x = −3, −2, −1, 0, 1, 2, 3
pX (x) =
0 otherwise

where K is a suitable constant. Determine the value of K.


(b) For the PMF of X given in part (a) calculate the PMF of Y .

14. You are visiting the rainforest, but unfortunately your insect repellent has run out. As a
result, at each second, a mosquito lands on your neck with probability 0.5. If a mosquito
lands, it will bite you with probability 0.2, and it will never bother you with probability
0.8, independently of other mosquitoes. What is the expected time between successive
bites?

Revision problems
1. In general, what is P(A ∪ B ∪ C)?

2. Write the law of total probability for the event B and the partition {A ∩ B, A ∩ B c , Ac }.

3. Bonferroni’s inequality.
Generalize Problem 2 of the previous section to the case of n events A1 , A2 , ..., An , by
showing that

P(A1 ∩ A2 ∩ . . . ∩ An ) ≥ P(A1 ) + P(A2 ) + . . . + P(An ) − (n − 1).

4. Show that an event is independent of itself if and only if P(A) = 0 or P(A) = 1.

5. We are told that events A and B are independent. In addition, events A and C are
independent. Is it true that A is independent of B∪C? Provide a proof or counterexample
to support your answer.

6. Suppose that A, B, and C are independent. Use the definition of independence to show
that A and B ∩ C are independent.

7. Continuity of Probability Measures. Let (An)n be an increasing sequence of events (i.e. An ⊆ An+1, for all n). For such a sequence, it is natural to call ⋃_{n=1}^∞ An its limit. Let B1 = A1 and Bn+1 = An+1 \ An, for n ≥ 1. Show that

   (a) (Bn)n is a sequence of disjoint events

   (b) ⋃_{k=1}^n Ak = ⋃_{k=1}^n Bk and ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn

   Deduce that P(⋃_{n=1}^∞ An) = lim_{n→∞} P(An). Prove the corresponding result for a decreasing sequence of events.

8. Give (a precise and complete definition of) the probability mass function of a Poisson
random variable with parameter p.

9. Let X be a random variable that takes integer values and is symmetric, that is, P(X =
k) = P(X = −k) for all integers k. What is the expected value of Y = X cos(Xπ) and
Z = sin(Xπ)?

Tutorial 03 – Problem Set


Week 4: Mon 17 Aug – Fri 21 Aug 2015

Problems to be submitted at the start of your tutorial


1. Let X be a random variable with PMF
x 0 1 2 3 4
p(x) 5k 4k 3k 2k k

(a) Find k. [1]


(b) Find E[X] and var(X). [2]
(c) Find E[2X ]. [1]

2. The joint PMF of X and Y is pX,Y (x, y) = c(x2 + y 2 ), x, y = 1, 2, 3. Find

(a) c and the marginal PMFs of X and Y ; [2]


(b) E[X]; [1]
(c) The PMF of 3X − 2Y ; [2]
(d) E[3X − 2Y ] in two ways. [1]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Fischer and Spassky play a sudden-death chess match whereby the first player to win a
game wins the match. Each game is won by Fischer with probability p, by Spassky with
probability q, and is a draw with probability 1 − p − q.

(a) What is the probability that Fischer wins the match?


(b) What is the PMF, the mean, and the variance of the duration of the match?

2. Let X be a Poisson random variable with parameter λ. Compute E[X], E[X(X − 1)] and
deduce var(X).

3. Imagine a TV game show where each contestant i spins an infinitely calibrated wheel of
fortune, which assigns him/her a real number between 1 and 100. Denote by Xi the value obtained by contestant i.
All values are equally likely and the value obtained by each contestant is independent of the value obtained by any other contestant.

(a) Find P(X1 < X2 ).


(b) Find P(X1 < X2 , X1 < X3 ).

(c) Let N be the integer-valued random variable whose value is the index of the first
contestant who is assigned a smaller number than contestant 1. As an illustration,
if contestant 1 obtains a smaller value than contestants 2 and 3 but contestant 4
has a smaller value than contestant 1 (X4 < X1 ), then N = 4. Find P(N > n) as a
function of n.
(d) Find E[N ], assuming an infinite number of contestants.

4. Let N be a nonnegative integer-valued random variable. Show that

   E[N] = ∑_{i=1}^∞ P(N ≥ i)

5. A city’s temperature is modelled as a random variable with mean and standard deviation
both equal to 10 degrees Celsius. A day is described as “normal” if the temperature
during that day ranges within one standard deviation from the mean. What would be the
temperature range for a normal day if temperature were expressed in degrees Fahrenheit?
6. Let a and b be positive integers with a ≤ b, and let X be a random variable that takes as
values, with equal probability, the powers of 2 in the interval [2^a, 2^b]. Find the expected
value and the variance of X.
7. As an advertising campaign, a chocolate factory places golden tickets in some of its candy
bars, with the promise that a golden ticket is worth a trip through the chocolate factory,
and all the chocolate you can eat for life. If the probability of finding a golden ticket is
p, find the mean and the variance of the number of candy bars you need to eat to find a
ticket.
8. The MIT football team wins any one game with probability p, and loses it with probability
1 − p. Its performance in each game is independent of its performance in other games.
Let L1 be the number of losses before its first win, and let L2 be the number of losses
after its first win and before its second win. Find the joint PMF of L1 and L2 .
9. Your probability class has 250 undergraduate students and 50 graduate students. The
probability of an undergraduate (or graduate) student getting an A is 1/3 (or 1/2, re-
spectively). Let X be the number of students that get an A in your class.

(a) Calculate E[X] by first finding the PMF of X.


(b) Calculate E[X] by viewing X as a sum of random variables, whose mean is easily
calculated.

10. A scalper is considering buying tickets for a particular game. The price of the tickets is
$75, and the scalper will sell them at $150. However, if she can’t sell them at $150, she
won’t sell them at all. Given that the demand for tickets is a binomial random variable
with parameters n = 10 and p = 1/2, how many tickets should she buy in order to
maximize her expected profit?
11. Suppose that X and Y are independent discrete random variables with the same geometric
PMF: pX (k) = pY (k) = p(1 − p)k−1 , k = 1, 2, . . ., where p is a scalar with 0 < p < 1.
Show that for any integer n ≥ 2, the conditional PMF P(X = k|X + Y = n) is uniform.

Revision problems
1. Let X be a discrete random variable that is uniformly distributed over the set of integers
in the range [a, b], where a and b are integers with a < 0 < b. Find the PMF of the
random variable min(0, X).

2. Let X be a discrete random variable, and let Y = |X|.


Give a general formula for the PMF of Y in terms of the PMF of X.

3. A particular binary data transmission and reception device is prone to some error when
receiving data. Suppose that each bit is read correctly with probability p. Find a value
of p such that when 10,000 bits are received, the expected number of errors is at most 10.

4. Let X1, . . . , Xn be independent, identically distributed random variables with common
mean and variance. Find the values of c and d that will make the following formula true:

E[(X1 + . . . + Xn)^2] = c E[X1^2] + d (E[X1])^2

5. St. Petersburg paradox. You toss independently a fair coin and you count the number
of tosses until the first tail appears. If the number is n, you receive 2^n. What is the
expected amount that you receive? How much would you be willing to pay to play this
game?
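
A small simulation sketch for this game; it illustrates that the empirical average payout does
not settle down as the number of plays grows, which is the paradox. The numbers of plays
used are arbitrary choices.

    import random

    def play_once():
        """Toss a fair coin until the first tail; if that takes n tosses the payout is 2**n."""
        n = 1
        while random.random() < 0.5:    # heads: keep tossing
            n += 1
        return 2 ** n

    for plays in (10**3, 10**4, 10**5, 10**6):
        avg = sum(play_once() for _ in range(plays)) / plays
        print(f"average payout over {plays:>7} plays: {avg:.2f}")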

6. A fair coin is tossed successively until two consecutive heads or two consecutive tails
appear. Find the PMF, the expected value, and the variance of the number of tosses.

Tutorial 04 – Problem Set


Week 5: Mon 24 Aug – Fri 28 Aug 2015

Problems to be submitted at the start of your tutorial


1. Give (a precise and complete definition of) the probability density function of a continuous
uniform random variable over the interval [u, v]. [2]

2. Let X be a continuous random variable with PDF

f(x) = x for 0 < x < 1,    f(x) = 2 − x for 1 < x < 2

Find P[X < 1], P[|X| ≤ 1], the CDF of X, E[X] and var(X). [6]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Alvin shops for probability books for K hours, where K is a random variable that is
equally likely to be 1, 2, 3, or 4. The number of books N that he buys is random and
depends on how long he shops according to the conditional PMF pN|K(n|k) = 1/k, for
n = 1, . . . , k.

(a) Find the joint PMF of K and N .


(b) Find the marginal PMF of N .
(c) Find the conditional PMF of K given that N = 2.
(d) Find the conditional mean and variance of K, given that he bought at least 2 but
no more than 3 books.
(e) The cost of each book is a random variable with mean $30. What is the expected
value of his total expenditure? Hint: Condition on the events {N = 1}, . . . , {N = 4},
and use the total expectation theorem.

2. At his workplace, the first thing Oscar does every morning is to go to the supply room
and pick up one, two, or three pens with equal probability 1/3. If he picks up three pens,
he does not return to the supply room again that day. If he picks up one or two pens, he
will make one additional trip to the supply room, where he again will pick up one, two,
or three pens with equal probability 1/3. (The number of pens taken in one trip will not
affect the number of pens taken in any other trip.) Calculate the following:

(a) The probability that Oscar gets a total of three pens on any particular day.

(b) The conditional probability that he visited the supply room twice on a given day,
given that it is a day in which he got a total of three pens.
(c) E[N ] and E[N |C], where E[N ] is the unconditional expectation of N , the total num-
ber of pens Oscar gets on any given day, and E[N |C] is the conditional expectation
of N given the event C = {N > 3}.
(d) σN |C , the conditional standard deviation of the total number of pens Oscar gets on
a particular day, where N and C are as in part (c).
(e) The probability that he gets more than three pens on each of the next 16 days.
(f) The conditional standard deviation of the total number of pens he gets in the next
16 days given that he gets more than three pens on each of those days.

3. Your computer has been acting very strangely lately, and you suspect that it might have
a virus on it. Unfortunately, all 12 of the different virus detection programs you own
are outdated. You know that if your computer does have a virus, each of the programs,
independently of the others, has a 0.8 chance of believing that your computer is infected,
and a 0.2 chance of thinking your computer is fine. On the other hand, if your computer
does not have a virus, each program has a 0.9 chance of believing that your computer is
fine, and a 0.1 chance of wrongly thinking your computer is infected. Given that your
computer has a 0.65 chance of being infected with some virus, and given that you will
believe your virus protection programs only if 9 or more of them agree, find the probability
that your detection programs will lead you to the right answer.
4. The runner-up in a road race is given a reward that depends on the difference between
his time and the winner’s time. He is given 10 dollars for being one minute behind, 6
dollars for being one to three minutes behind, 2 dollars for being 3 to 6 minutes behind,
and nothing otherwise. Given that the difference between his time and the winner’s time
is uniformly distributed between 0 and 12 minutes, find the mean and variance of the
reward of the runner-up.
5. Let X be a random variable with PDF fX(x) = 2x/3, 1 < x ≤ 2, and let Y = X^2.
Calculate E[Y ] and var(Y ).
6. Find the PDF, the mean, and the variance of the random variable X with CDF FX(x) =
1 − a^3/x^3, x ≥ a, where a is a positive constant.
7. The median of a random variable X is a number µ that satisfies FX (µ) = 1/2. Find the
median of the exponential random variable with parameter λ.

Revision problems
1. A class of n students takes a test in which each student gets an A with probability p,
a B with probability q, and a grade below B with probability 1 − p − q, independently
of any other student. If X and Y are the numbers of students that get an A and a B,
respectively, calculate the joint PMF pX,Y .
2. Let X, Y , and Z be independent geometric random variables with the same PMF: pX (k) =
pY (k) = pZ (k) = p(1 − p)k−1 , k = 1, 2, . . ., where p is a scalar with 0 < p < 1. Find
P(X = k|X + Y + Z = n). Hint: Try thinking in terms of coin tosses.

3. Joe Lucky plays the lottery on any given week with probability p, independently of
whether he played on any other week. Each time he plays, he has a probability q of
winning, again independently of everything else. During a fixed time period of n weeks,
let X be the number of weeks that he played the lottery and Y be the number of weeks
that he won.

(a) What is the probability that he played the lottery on any particular week, given that
he did not win on that week?
(b) Find the conditional PMF pY |X (y|x).
(c) Find the joint PMF pX,Y (x, y).
(d) Find the marginal PMF pY (y). Hint: One possibility is to start with the answer
to part (c), but the algebra can be messy. But if you think intuitively about the
procedure that generates Y , you may be able to guess the answer.
(e) Find the conditional PMF pX|Y (x|y). Do this algebraically using the preceding
answers.
(f) Re-derive the answer to part (e) by thinking as follows: for each one of the n − Y
weeks that he did not win, the answer to part (a) should tell you something.

4. Give (a precise and complete definition of) the probability density function of an expo-
nential random variable with parameter α.

Tutorial 05 – Problem Set


Week 6: Mon 31 Aug – Fri 4 Sep 2015

Problems to be submitted at the start of your tutorial


1. Write in integral form Γ(a), for a > 0. [2]

2. Compute Γ(3/2). [2]

3. How do Γ(a + 3) and Γ(a) relate? [2]

4. What is the distribution of Y = −2X − 3 if X is normal with mean 13 and variance 6? [2]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Compute the nth moment, n ≥ 1, of a uniform random variable over the interval [0, 1].

2. Compute the nth moment, n ≥ 1, of an exponential random variable with parameter λ.

3. The maintenance manager at a chemical facility knows that the times between repairs,
X, for a specific chemical reactor are well modelled by this distribution:

f(x) = 0.01 e^{−0.01x},    x > 0

(a) Find the probability that X is less than 30.


(b) Find the probability that X is greater than 15.
(c) Find the probability that X is exactly equal to 100.
(d) Find the probability that X is between 50 and 150.
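
These four probabilities can be checked numerically from the exponential CDF
F(x) = 1 − e^{−0.01x}; a minimal Python sketch:

    from math import exp

    F = lambda x: 1 - exp(-0.01 * x)    # exponential CDF with rate 0.01

    print("P(X < 30)       =", F(30))
    print("P(X > 15)       =", 1 - F(15))
    print("P(X = 100)      =", 0.0)     # a continuous random variable takes any fixed value with probability 0
    print("P(50 < X < 150) =", F(150) - F(50))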

4. Engineers often use the uniform distribution to model the arrival time of some event
given that the event did occur within some interval. For example, production knows that
a particular pump failed at some time between 1.00 and 3.00 pm. Given that we know it
failed at some time during this period, the pdf for the specific time within the period is

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise,

where a = 1 and b = 3. This pdf essentially says that all the times within this interval
are equally likely to occur.

(a) Derive the mean for this distribution.



(b) Derive the variance and the standard deviation for this distribution.
(c) Find the probability that the pump failed after 1.30 pm.

5. A radar tends to overestimate the distance of an aircraft, and the error is a normal
random variable with a mean of 50 meters and a standard deviation 100 meters. What
is the probability that the measured distance will be smaller than the true distance?
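
A one-line numerical check, assuming scipy is available: the measured distance is smaller than
the true distance exactly when the (normal) error is negative.

    from scipy.stats import norm

    # error W ~ N(mean 50, sd 100); measured < true exactly when W < 0
    print(norm.cdf(0, loc=50, scale=100))   # about 0.3085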

6. Let X be normal with mean 1 and variance 4. Let Y = 2X + 3. Calculate the PDF of Y
and find P(Y ≥ 0).

7. A signal of amplitude s = 2 is transmitted from a satellite but is corrupted by noise, and


the received signal is X = s + W , where W is noise. When the weather is good, W is
normal with zero mean and variance 1. When the weather is bad, W is normal with zero
mean and variance 4. Good and bad weather are equally likely. In the absence of any
weather information:

(a) Calculate the PDF of X.


(b) Calculate the probability that X is between 1 and 3.

Revision problems
1. Compute the nth moment, n ≥ 1, of a uniform random variable over the interval [a, b].

2. A random variable with PDF


f(x) = c e^{−λ|x|}
is said to have a Laplace distribution. Find c in terms of λ and compute its nth moment,
n ≥ 1.

3. The CDF of a Weibull random variable X is given by

F(x) = 1 − exp[−(λx)^β],    x > 0

Find the PDF of X, and express its mean and variance in terms of the Gamma function:

Γ(x) = ∫_0^∞ z^{x−1} e^{−z} dz,    x > 0.
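
Once you have the closed-form answers, they can be checked against a simulation. The sketch
below assumes the standard expressions E[X] = Γ(1 + 1/β)/λ and
var(X) = [Γ(1 + 2/β) − Γ(1 + 1/β)^2]/λ^2, which is what the exercise asks you to derive; the
parameter values λ = 2 and β = 1.5 are arbitrary.

    import random
    from math import gamma, log

    lam, beta = 2.0, 1.5                                             # illustrative parameters
    mean_formula = gamma(1 + 1 / beta) / lam
    var_formula = (gamma(1 + 2 / beta) - gamma(1 + 1 / beta) ** 2) / lam ** 2

    # inverse-CDF sampling: if U ~ Uniform(0,1), then X = (-log(1 - U))**(1/beta) / lam has the CDF above
    n = 200_000
    xs = [(-log(1 - random.random())) ** (1 / beta) / lam for _ in range(n)]
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    print("mean:", m, "vs", mean_formula)
    print("var :", v, "vs", var_formula)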

Tutorial 06 – Problem Set


Week 7: Mon 7 Sep – Fri 11 Sep 2015

Problems to be submitted at the start of your tutorial


1. Let X be a standard normal random variable and A = {X > 0}.
Write down the conditional PDF of X given A. [2]

2. Let X be a random variable and A be an event such that, conditional on A, X is
exponential with parameter λ, and conditional on Ac, X is exponential with parameter µ.
Write E[X] in terms of λ, µ and p, the probability of A. [3]

3. The joint PDF of X and Y is given by

f (x, y) = cxy, 0 < x < 1, 0 < y < 1.

Find c. [3]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Oscar uses his high-speed modem to connect to the internet. The modem transmits zeros
and ones by sending signals −1 and +1, respectively. We assume that any given bit has
probability p of being a zero. The telephone line introduces additive zero-mean Gaussian
(normal) noise with variance σ 2 (so, the receiver at the other end receives a signal which
is the sum of the transmitted signal and the channel noise). The value of the noise is
assumed to be independent of the encoded signal value.

(a) Let a be a constant between −1 and +1. The receiver at the other end decides
that the signal −1 (respectively, +1) was transmitted if the value it receives is less
(respectively, more) than a. Find a formula for the probability of making an error.
(b) Find a numerical answer for the question of part (a) assuming that p = 2/5, a = 1/2
and σ 2 = 1/4.
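
A Monte Carlo sketch for part (b), assuming numpy is available; it estimates the error
probability directly from the channel description, so it can be used to check the formula
derived in part (a).

    import numpy as np

    rng = np.random.default_rng(0)
    p, a, sigma = 2 / 5, 1 / 2, 0.5                  # part (b): sigma**2 = 1/4
    n = 1_000_000

    zeros = rng.random(n) < p                        # True where a zero (signal -1) is transmitted
    signal = np.where(zeros, -1.0, 1.0)
    received = signal + rng.normal(0.0, sigma, n)    # additive Gaussian noise
    decided_zero = received < a                      # receiver decides -1 below the threshold a
    print("estimated error probability:", np.mean(decided_zero != zeros))   # roughly 0.096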

2. Compute the nth moment, n ≥ 1, of a standard normal random variable.

3. Let X be a standard normal random variable, compute E[e^{tX}], for a scalar t.

4. An old modem can take anywhere from 0 to 30 seconds to establish a connection, with
all times between 0 and 30 being equally likely.

(a) What is the probability that if you use this modem you will have to wait more than
15 seconds to connect?
(b) Given that you have already waited 10 seconds, what is the probability of having to
wait at least 10 more seconds?

5. Consider a random variable X with PDF fX (x) = 2x/3, 1 < x ≤ 2, and let A be the
event {X ≥ 1.5}. Calculate E[X], P(A), and E[X|A].

6. Dino, the cook, has good days and bad days with equal frequency. On a good day, the
time (in hours) it takes Dino to cook a souffle is described by the PDF fG(g) = 2, if
1/2 < g ≤ 1, but on a bad day, the time it takes is described by the PDF fB(b) = 1, if
1/2 < b ≤ 3/2. Find the conditional probability that today was a bad day, given that it
took Dino less than three quarters of an hour to cook a souffle.

7. One of two wheels of fortune, A and B, is selected by the toss of a fair coin, and the
wheel chosen is spun once to determine the value of a random variable X. If wheel A is
selected, the PDF of X is fX|A (x|A) = 1 if 0 < x ≤ 1. If wheel B is selected, the PDF of
X is fX|B (x|B) = 3 if 0 < x ≤ 1/3. If we are told that the value of X was less than 1/4,
what is the conditional probability that wheel A was the one selected?

8. Alexei is vacationing in Monte Carlo. The amount X (in dollars) he takes to the casino
each evening is a random variable with a PDF of the form fX (x) = ax, if 0 ≤ x ≤ 40.
At the end of each night, the amount Y that he has when leaving the casino is uniformly
distributed between zero and twice the amount that he came with.

(a) Determine the joint PDF fX,Y (x, y).


(b) What is the probability that on a given night Alexei makes a positive profit at the
casino?
(c) Find the PDF of Alexei’s profit Y − X on a particular night, and also determine its
expected value.

9. A family has three children, A, B and C, of height X1, X2, X3, respectively. If X1, X2, X3
are independent and identically distributed continuous random variables, evaluate the
following probabilities:

(a) P(A is the tallest child).


(b) P (A is taller than B|A is taller than C).
(c) P(A is taller than B|B is taller than C).
(d) P(A is taller than B|A is shorter than C).
(e) P(A is taller than B|B is shorter than C).

10. Let X have a uniform distribution in the unit interval [0, 1], and let Y have an exponential
distribution with parameter ν = 2. Assume that X and Y are independent. Let Z =
X +Y.

(a) Find P(Y ≥ X).



(b) Find the conditional PDF of Z given that Y = y.


(c) Find the conditional PDF of Y given that Z = 3.

11. Let X and Y be independent random variables, with each one uniformly distributed in
the interval [0, 1]. Find the probability of each of the following events.

(a) X > 6/10.


(b) Y < X.
(c) X + Y ≤ 3/10.
(d) max(X, Y ) ≥ 1/3.

Revision problems
1. Let X be a normal random variable with mean µ and variance σ^2, compute E[e^{tX}], for a
scalar t.

2. Let X be an exponential λ random variable. Find the PDF of

(a) Y = aX, a ≠ 0;
(b) Z = X 2 ;
(c) R = 1 − e−λX .

3. Let X be a standard normal random variable. Find the PDF of X 2 , and identify its
distribution.

4. Let X be a standard normal random variable and A = {X > 0}.


Compute the conditional mean and variance of X given A.

5. Let X and Y be independent random variables, with each one uniformly distributed in
the interval [0, 1]. Find the probability of {XY ≤ 1/4}.

6. Let P be a random variable which is uniformly distributed between 0 and 1. On any given
day, a particular machine is functional with probability P . Furthermore, given the value
of P , the status of the machine on different days is independent.

(a) Find the probability that the machine is functional on a particular day.
(b) We are told that the machine was functional on m out of the last n days. Find the
conditional PDF of P . You may use the identity
∫_0^1 p^k (1 − p)^{n−k} dp = k!(n − k)!/(n + 1)!.

(c) Find the conditional probability that the machine is functional today given that it
was functional on m out of the last n days.

Tutorial 07 – Problem Set


Week 8: Mon 14 Sep – Fri 18 Sep 2015

Problems to be submitted at the start of your tutorial


1. For a jointly continuous pair (X, Y ), write E[Y ] in terms of E[Y |X = x] and fX (x). [2]

2. For a positive continuous random variable X, write down the PDF of Y = X 2 in terms
of the PDF of X. [2]

3. The random variables X and Y have the joint PDF

fX,Y(x, y) = (1/(π√2)) exp(−x^2 − √2 xy − y^2)

Obtain fX(x), fY(y) and E[XY]. [2]
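
As a quick sanity check before computing the marginals, one can verify numerically that the
constant 1/(π√2) normalises this density; the sketch below assumes scipy is available and
integrates over a box large enough to capture essentially all of the mass.

    import numpy as np
    from scipy.integrate import dblquad

    f = lambda y, x: np.exp(-x**2 - np.sqrt(2) * x * y - y**2) / (np.pi * np.sqrt(2))
    total, err = dblquad(f, -10, 10, lambda x: -10, lambda x: 10)
    print(total)   # should be very close to 1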

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Let X be exponential λ and Y be such that E[Y |X = x] = x2 . Obtain E[XY ].

2. The random variables X and Y have the joint PDF fX,Y(x, y) = λ^3 (y − x) e^{−λy}, 0 < x < y.
Obtain fX (x), fY (y) and fX|Y (x|y).

3. Let Y be a gamma random variable with parameters (2, λ), and let X, conditionally on
Y = y, be continuous uniform over [0, y].

(a) Write down the joint PDF of (X, Y ).


(b) Obtain the marginal distribution of X.

4. Your driving time to work is between 30 and 45 minutes if the day is sunny, and between
40 and 60 minutes if the day is rainy, with all times being equally likely in each case.
Assume that a day is sunny with probability 2/3 and rainy with probability 1/3.

(a) Find the PDF, the mean, and the variance of your driving time.
(b) Your distance to work is 20 miles. What is the PDF, the mean, and the variance of
your average speed (driving distance over driving time)?

5. The random variables X and Y have the joint PDF fX,Y (x, y) = 2, x > 0, y > 0 and
x + y ≤ 1. Let A be the event {Y ≤ 0.5} and let B be the event {Y > X}.

(a) Calculate P(B|A).



(b) Calculate fX|Y (x|0.5). Calculate also the conditional expectation and the conditional
variance of X, given that Y = 0.5.
(c) Calculate fX|B (x).
(d) Calculate E[XY ].
(e) Calculate the PDF of Y /X.

6. Let X be an exponential random variable with parameter λ. Obtain the PDF of Y = X,
its mean and variance.

7. Let X be a random variable with PDF fX . Find the PDF of the random variable |X| in
the following three cases.

(a) X is exponentially distributed with parameter λ.


(b) X is uniformly distributed in the interval [−1, 2].
(c) fX is a general PDF.

Revision problems
1. The random variables X and Y have the joint PDF fX,Y(x, y) = (λ^{n+1}/(n − 1)!) (y − x)^{n−1} e^{−λy},
0 < x < y, n integer, n ≥ 2. Obtain fX(x), fY(y) and fX|Y(x|y).

2. The lifetime of a light bulb is supposed to be exponentially distributed with mean inversely
proportional to the length of its filament. That is, if x is the filament length, then it is
assumed that the mean lifetime of the light bulb is K/x.

(a) For a filament of length x, write down the PDF of the lifetime Zx .
(b) Production of filaments is such that their lengths can be anywhere between two
limits l and L. In fact a randomly selected light bulb can be assumed to have a
uniformly distributed over [l, L] filament length, herein denoted by X. Let Y be the
lifetime of a randomly selected light bulb.
i. What is the conditional PDF of Y given X = x?
ii. What is the joint PDF of (X, Y )?
iii. What is E[Y ]?
iv. What is the PDF of Y ?

Tutorial 08 – Problem Set


Week 9: Mon 21 Sep – Fri 25 Sep 2015

Problems to be submitted at the start of your tutorial


1. Find the MGF of a discrete random variable with PMF
x     1    2    3    4
p(x)  0.4  0.3  0.2  0.1
[2]

2. Find the MGF of a continuous random variable with PDF

f (x) = 2x, 0 < x < 1.

[2]

3. Find the MGF of the random variable X with PMF

pX(x) = p e^{−λ} λ^x/x! + (1 − p) e^{−µ} µ^x/x!,    x = 0, 1, . . .

where λ and µ are positive scalars, and p satisfies 0 ≤ p ≤ 1. [2]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Let X be a random variable such that

MX(t) = a + b e^{2t} + c e^{4t},    E[X] = 3,    var(X) = 2.

Find a, b and c, and the PMF of X.

2. The MGF of a random variable Y has the form

MY(t) = a^6 (0.1 + 2e^t + 0.1e^{4t} + 0.4e^{7t})^6.

Find a, pY(41), pY(11), the third largest possible value of Y, and its corresponding
probability.

3. The MGFs of two independent discrete random variables X and Y are

MX(t) = ((1/2) e^{2t} + (1/2) e^{4t})^7,    MY(t) = e^{8(e^t − 1)}.

Find pX(15), pY(5), E[X], E[Y^2] and P(X + Y = 15).



4. The MGF and mean of a discrete random variable X are

MX(t) = a e^t + b e^{4(e^t − 1)},    E[X] = 3.

Find

(a) The scalar parameters a and b;


(b) pX(1), E[X^2], E[2^X]
(c) P(X + Y = 2), where Y is a random variable that is independent of X and is
identically distributed with X.

5. Suppose that

MX(t) = (6 − 3t) / (2(1 − t)(3 − t)).

Find the PDF of the associated random variable.

6. Let X1, X2, X3, X4 be independent random variables with common mean, variance, and
MGF denoted by E[X], var(X), and MX(t), respectively. Let Y be a random variable
that is independent of X1, X2, X3, X4, and has MGF MY(t). Each part of this problem
introduces a new random variable either as a function of X1, X2, X3, X4 and Y, or as an
MGF defined in terms of MX(t) and MY(t). For each part, determine the mean and
variance of the new random variable.

(a) W = X1 + X2 + X3 + X4 .
(b) V = 0.25(X1 + X2 + X3 + X4 ).
(c) U = X1 + X2 + X3 + X4 + Y .
(d) MQ (t) = [MX (t)]5 .
(e) MH (t) = [MX (t)]2 [MY (t)]3 .

7. Let X and Y be independent exponential random variables with a common parameter λ.

(a) Find the MGF of aX + Y , where a is a constant.


(b) Use the result of part (a) to find the PDF of aX + Y , for the case where a is positive
and different from 1.
(c) Use the result of part (a) to find the PDF of X − Y .

8. Use the formula for the MGF of a Poisson random variable X to calculate E[X] and
E[X 2 ].
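
A short symbolic sketch, assuming sympy is available, that carries out the differentiation of
the Poisson MGF M_X(t) = e^{λ(e^t − 1)} at t = 0.

    import sympy as sp

    t, lam = sp.symbols('t lambda', positive=True)
    M = sp.exp(lam * (sp.exp(t) - 1))        # MGF of a Poisson(lambda) random variable

    EX  = sp.diff(M, t).subs(t, 0)           # E[X]   = M'(0)
    EX2 = sp.diff(M, t, 2).subs(t, 0)        # E[X^2] = M''(0)
    print(sp.simplify(EX), sp.expand(EX2))   # lambda,  lambda**2 + lambda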

Revision problems
1. Let X1, X2, X3, X4 be independent random variables with common mean, variance, and
MGF denoted by E[X], var(X), and MX(t), respectively. Let Y be a random variable
that is independent of X1, X2, X3, X4, and has MGF MY(t). Each part of this problem
introduces a new random variable either as a function of X1, X2, X3, X4 and Y, or as an
MGF defined in terms of MX(t) and MY(t). For each part, determine the mean and
variance of the new random variable.

(a) R = 4X1 − Y .
(b) MG (t) = e6t MX (t).
(c) MD (t) = MX (7t).

2. Let X1 and X2 be independent random variables. Use the properties of MGFs to verify
that var(X1 + X2 ) = var(X1 ) + var(X2 ).

Tutorial 09 – Problem Set


Week 10: Mon 5 Oct – Fri 9 Oct 2015

Problems to be submitted at the start of your tutorial


1. Let X1 and X2 be independent random variables with the same PMF:

pX(x) = 1/4 if x = 1,    1/4 if x = 2,    1/2 if x = 3,    0 otherwise.

Use convolution to obtain the PMF of Y = X1 + X2. [4]
2. The random variables X and Y have joint PDF given by

fX,Y(x, y) = 1/10,    (x, y) ∈ ([−1, 1] × [−2, 2]) ∪ ([1, 2] × [−1, 1])
(a) Find the conditional PDFs fY |X (y|x) and fX|Y (x|y). [2]
(b) Find E[X|Y ], E[X] and var(X|Y ). Use these to calculate var(X). [2]
(c) Find E[Y |X], E[Y ] and var(Y |X). Use these to calculate var(Y ). [2]

Problems discussed in tutorials


You should attempt these before attending your tutorial.
1. Let X be continuous uniform over [0, 2] and Y be continuous uniform over [3, 4]. Find
and sketch the PDF of Z = X + Y , using convolutions.
2. Let Y be exponentially distributed with parameter 1, and let Z be uniformly distributed
over the interval [0, 1]. Assume that Y and Z are independent. Find the distribution of
−Z, use convolution to find the PDF of Y − Z, and deduce that of |Y − Z|.
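
A simulation sketch (assuming numpy) that can be used to check the density of Y − Z obtained
by convolution; the bin choices are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000
    y = rng.exponential(scale=1.0, size=n)   # Y ~ Exp(1)
    z = rng.random(n)                        # Z ~ Uniform(0, 1)
    d = y - z

    hist, edges = np.histogram(d, bins=np.linspace(-1, 3, 41), density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    for c, h in zip(centres[::8], hist[::8]):
        print(f"estimated f_{{Y-Z}}({c:+.2f}) ~ {h:.3f}")
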
3. Let X be a discrete random variable with PMF pX and let Y be a continuous random
variable, independent of X, with PDF fY. Derive a formula for the PDF of the random
variable X + Y.
4. Oscar is an engineer who is equally likely to work between zero and one hundred hours
each week (i.e. the time he works is uniformly distributed between zero and one hundred).
He gets paid one dollar an hour. If Oscar works more than fifty hours during a week,
there is a probability of 1/2 that he will actually be paid overtime, which means he will
receive two dollars an hour for each hour he works longer than fifty hours. Otherwise, he
will just get his normal pay for all hours he worked that week. Independently of receiving
overtime pay, if Oscar works more than seventy five hours a week, there is a probability
of 1/2 that he will receive a one hundred dollar bonus, in addition to whatever else he
earns. Find the expected value and variance of Oscar’s weekly pay.

5. Let X be a geometric random variable with parameter P , where P is itself random and
uniformly distributed from 1/n to 1. Let Z = E[X|P ]. Find E[Z] and limn→∞ E[Z].

6. The random variables X and Y are described by a joint PDF which is constant within
the unit area quadrilateral with vertices (0, 0), (0, 1), (1, 2) and (1, 1). Use the law of
total variance to find the variance of X + Y .

7. (a) You roll a fair six-sided die, and then you flip a fair coin the number of times shown by
the die. Find the expected value and the variance of the number of heads obtained.
(b) Repeat part (a) for the case where you roll two dice, instead of one.
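
For part (a), a small simulation sketch to check the values obtained from the laws of total
expectation and total variance; the sample size is an arbitrary choice.

    import random

    def heads_in_one_experiment():
        flips = random.randint(1, 6)                          # the die fixes the number of coin flips
        return sum(random.random() < 0.5 for _ in range(flips))

    n = 200_000
    samples = [heads_in_one_experiment() for _ in range(n)]
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / n
    print("mean ~", m, "  variance ~", v)   # compare with the exact values 7/4 and 77/48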

8. A fair coin is flipped independently until the first head is obtained. For each tail observed
before the first head, the value of a continuous random variable with uniform PDF over
[0, 3] is generated. Let the random variable X be defined as the sum of all the values
obtained before the first head. Find the mean and variance of X.

9. Consider n independent tosses of a die. Each toss has probability pi of resulting in i.
Let Xi be the number of tosses that result in i. Show that X1 and X2 are negatively
correlated (i.e. a large number of ones suggests a smaller number of twos).

Revision problems
1. Consider two independent and identically distributed discrete random variables X and
Y. Assume that their common PMF, denoted by p(x), is symmetric around zero, i.e.
p(x) = p(−x), for all x. Show that the PMF of X + Y is also symmetric around zero and is
largest at zero. Hint: Use the Schwarz inequality: ∑_k a_k b_k ≤ (∑_k a_k^2)^{1/2} (∑_k b_k^2)^{1/2}.

2. Let X, Y and Z be discrete random variables. Show the following generalisations of the
law of iterated expectations.

(a) E[Z] = E[E[Z|X, Y ]].


(b) E[Z|X] = E[E[Z|X, Y ]|X].
(c) E[Z] = E[E[E[Z|X, Y ]|X]].

3. The random variables X1 , . . . , Xn have common mean µ, common variance σ 2 and, fur-
thermore, E[Xi Xj ] = c for every pair of distinct i and j. Derive a formula for the variance
of X1 + . . . + Xn in terms of µ, σ 2 , c and n.

4. Let X1, . . . , Xn be some random variables and let cij = cov(Xi, Xj). Show that for any
numbers a1, . . . , an, we have

∑_{i=1}^n ∑_{j=1}^n ai aj cij ≥ 0.

5. Let X = Y − Z where Y and Z are non-negative random variables such that Y Z = 0.

(a) Show that cov(Y, Z) ≤ 0.



(b) Show that var(X) ≥ var(Y ) + var(Z).


(c) Use the result of part (b) to show that

var(X) ≥ var(max(0, X)) + var(max(0, −X)).

6. Consider two random variables X and Y . Assume for simplicity that they both have zero
mean.

(a) Show that X and E[X|Y ] are positively correlated.


(b) Show that the correlation coefficient of Y and E[X|Y ] has the same sign as the
correlation coefficient of X and Y .

Tutorial 10 – Problem Set


Week 11: Mon 12 Oct – Fri 16 Oct 2015

Problems to be submitted at the start of your tutorial


1. The random variables X and Y are described by a joint PDF of the form

fX,Y(x, y) = c e^{−8x^2 − 6xy − 18y^2}.

Find the means, variances, and the correlation coefficient of X and Y . Also, find the
value of the constant c. [4]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. A police radar always overestimates the speed of incoming cars by an amount that is
uniformly distributed between 0 and 5 miles/hour. Assume that car speeds are uniformly
distributed from 55 to 75 miles/hour. What is the least squares estimate of the car speed
based on the radar’s measurement?

2. The continuous random variables X and Y have joint PDF given by

fX,Y(x, y) = c, for (0 ≤ x ≤ 1 and 0 ≤ y ≤ 1) or (1 ≤ x ≤ 2 and x − 1 ≤ y ≤ x).

(a) Find the least squares estimate of Y given that X = x, for all possible values x.
(b) Let g ∗ (x) be the estimate from part (a), as a function of x. Find E[g ∗ (X)] and
var(g ∗ (X)),
(c) Find the mean square error E[(Y − g ∗ (X))2 ]. Is it the same as E[var(Y |X)]?
(d) Find var(Y ).

3. We are given that E[X] = 1, E[Y ] = 2, E[X 2 ] = 5, E[Y 2 ] = 8, and E[XY ] = 1. Find the
linear least squares estimator of Y given X.
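
A tiny numerical sketch of this computation, using the standard form of the linear least
squares estimator L[Y|X] = E[Y] + (cov(X, Y)/var(X))(X − E[X]) with the given moments.

    # given moments
    EX, EY, EX2, EY2, EXY = 1, 2, 5, 8, 1

    var_X  = EX2 - EX**2       # = 4
    cov_XY = EXY - EX * EY     # = -1

    a = cov_XY / var_X         # slope cov(X,Y)/var(X)
    b = EY - a * EX            # intercept
    print(f"L[Y|X] = {b} + ({a})*X")   # -> 2.25 + (-0.25)*X
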
4. In a communication system, the value of a random variable X is transmitted, but what
is received (denoted by Y ) is the value of X corrupted by some additive noise; that is
Y = X + W . We know the distribution of X and W , and let us assume that these two
random variables are independent and have the same PDF. Calculate the least squares
estimate of X given Y . What happens if X and W are dependent?
5. Consider three zero-mean random variables X, Y and Z with known variances and co-
variances. Give a formula for the linear least squares estimate of X based on Y and Z,
that is, find a and b that minimize E[(X − aY − bZ)2 ]. For simplicity, assume that Y and
Z are uncorrelated.

6. Use the law of total variance to provide a derivation of the formula

var(X) = var(X̂) + var(X̃),

where X̂ = E[X|Y ] and X̃ = X − X̂.

7. Let U and V be independent standard normal random variables. Let

X = U + V, Y = U − 2V.

(a) Do X and Y have a bivariate normal distribution?


(b) Provide a formula for E[X|Y ].
(c) Write down the joint PDF of X and Y .

Revision problems
1. Linear least squares estimate based on several measurements. Let X be a
random variable with mean µ and variance v, and let Y1 , . . . , Yn be measurements of the
form Yi = X +Wi , where the Wi are random variables with mean 0 and variance vi , which
represent measurement errors. We assume that the random variables X, W1 , . . . , Wn are
independent. Show that the linear least squares estimator of X based on Y1 , . . . , Yn is

[(µ/v) + ∑_{i=1}^n (Yi/vi)] / [(1/v) + ∑_{i=1}^n (1/vi)].

2. Suppose that X and Y are independent normal random variables with the same variance.
Show that X − Y and X + Y are independent.

Tutorial 11 – Problem Set


Week 12: Mon 19 Oct – Fri 23 Oct 2015

Problems to be submitted at the start of your tutorial


1. Given the information E[X] = 7 and var(X) = 9, use the Chebyshev inequality to find a
lower bound for the probability P[4 < X < 10]. [4]

Problems discussed in tutorials


You should attempt these before attending your tutorial.

1. Let X be a random variable and let α be a positive constant. Show that

P[|X| ≥ c] ≤ E[|X|^α]/c^α,    for all c > 0.

2. Chernoff bound for a Poisson random variable. Let X be a Poisson random variable
with parameter λ.

(a) Show that for every s ≥ 0, we have

P[X ≥ k] ≤ e^{λ(e^s − 1)} e^{−sk}.

(b) Assuming that k > λ, show that

P[X ≥ k] ≤ e^{−λ}(eλ)^k / k^k.
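
It is instructive to compare the bound from part (b) with the exact tail probability; the sketch
below assumes scipy is available, and the values of λ and k are illustrative.

    from math import exp
    from scipy.stats import poisson

    lam = 5.0                                             # illustrative parameter value
    for k in (8, 12, 16):
        bound = exp(-lam) * (exp(1) * lam) ** k / k ** k  # bound from part (b)
        exact = poisson.sf(k - 1, lam)                    # exact P(X >= k)
        print(f"k = {k:2d}: bound = {bound:.3e}, exact = {exact:.3e}")
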
3. Let X1 , X2 , . . . be a sequence of independent random variables that are uniformly dis-
tributed between 0 and 1. For every n, we let Yn be the median of the values of
X1 , X2 , . . . , X2n+1 . [That is, we order X1 , . . . , X2n+1 in increasing order and let Yn be
the (n+1)st element in this ordered sequence.] Apply the Weak Law of Large Numbers
to the sequence of Bernoulli random variables which equal 1 when Xi ≥ 0.5 + c to show
that the sequence Yn converges to 1/2, in probability.

4. Uncle Henry has been having trouble keeping his weight constant. In fact, at the end
of each week, he notices that his weight has changed by a random amount, uniformly
distributed between -0.5 and 0.5 pounds. Assuming that the weight change during any
given week is independent of the weight change of any other week, find the probability
that Uncle Henry will gain or lose more than 3 pounds in the next 50 weeks.
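
A normal-approximation check for this problem, assuming scipy is available: the total change
over 50 weeks has mean 0 and variance 50 × (1/12), and we want the probability that its
absolute value exceeds 3.

    from math import sqrt
    from scipy.stats import norm

    sigma_total = sqrt(50 / 12)                  # std of the sum of 50 Uniform(-0.5, 0.5) changes
    print(2 * (1 - norm.cdf(3 / sigma_total)))   # P(|total change| > 3), roughly 0.14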

5. Let Sn be the number of successes in n independent Bernoulli trials, where the probability
of success in each trial is p = 1/2. Provide a numerical value for the limit as n tends to
∞ for each of the following three expressions.
(a) P[n/2 − 10 ≤ Sn ≤ n/2 + 10].

(b) P[n/2 − n/10 ≤ Sn ≤ n/2 + n/10].

(c) P[n/2 − √n/2 ≤ Sn ≤ n/2 + √n/2].

Revision problems
1. Bo assumes that X, the height in meters of any Canadian selected by an equally likely
choice among all Canadians, is a random variable with E[X] = h. Because Bo is sure that
no Canadian is taller than 3 meters, he decides to use 1.5 meters as a conservative value
for the standard deviation of X. To estimate h, Bo uses the average H of the heights of
n Canadians he selects at random.

(a) In terms of h and Bo's 1.5 meter bound for the standard deviation of X, determine
the expectation and standard deviation of H.
(b) Find as small a value of n as possible such that the standard deviation of Bo's
estimator is guaranteed to be less than 0.01 meters.
(c) Bo would like to be 99% sure that his estimate is within 5 centimeters of the true
average height of Canadians. Using the Chebyshev inequality, calculate the minimum
value of n that will achieve this objective.
(d) If we agree that no Canadians are taller than three meters, why is it correct to use
1.5 meters as an upper bound on the standard deviation for X, the height of any
Canadian selected at random?

2. On any given flight, an airline's goal is to fill the plane as much as possible, without
overbooking. If, on average, 10% of customers cancel their tickets, all independently
of each other, what is the probability that a particular flight will be overbooked if the
airline sells 320 tickets, for a plane that has maximum capacity 300 people? What is the
probability that a plane with maximum capacity 150 people will be overbooked if the
airline sells 160 tickets?

Assessment Summary

The assessment of MTH2222 consists of the following.

1. Weekly homework (8×1.5%=12%)


A small number of exercises from each weekly tutorial sheet will be
assessed. The best 8 out of 10 (weeks 3 – 12 inclusive) will contribute
1.5% each. These are basic problems that are solvable by simple and
direct application of the lecture material studied in the previous week.
They must be submitted at the start of your tutorial.
2. Assignments (3×4%=12%)
Assignment 1 will be handed out on Friday 14 August, is due on Friday
28 August and will be returned during Week 7.
Assignment 2 will be handed out on Friday 4 September, is due on
Friday 18 September and will be returned during Week 10.
Assignment 3 will be handed out on Friday 25 September, is due on
Friday 9 October and will be returned during Week 12.
3. Mid-semester Test (6%)
A short mid-semester test will be conducted during the Thursday
lecture in Week 9 (24 September). It will be returned during Week 11.
4. Final exam (70%)
A three-hour paper will constitute the final exam. It will be held during
the normal second semester examination period.

Assessment Task Value Due Date

1. Assignment 1 4% Friday 28 August

2. Assignment 2 4% Friday 18 September

3. Mid-semester Test 6% Thursday 24 September

4. Assignment 3 4% Friday 9 October



Unit Schedule

An approximate week-by-week activity schedule is given below.

Week Topics/Chapters Assessment

0 No formal assessment

1 Sample space and probability No formal assessment

2 Sample space and probability No formal assessment

3 Discrete random variables Homework

4 Discrete random variables Homework

5 General random variables Homework & Assignment 1

6 General random variables Homework

7 General random variables Homework

8 Further topics on random variables Homework & Assignment 2

9 Further topics on random variables Homework & Test

10 Further topics on random variables Homework & Assignment 3

11 Limit theorems Homework

12 Limit theorems Homework

SWOT VAC No formal assessment

Examination period    LINK to Assessment Policy:
http://www.policy.monash.edu/policy-bank/academic/education/assessment/assessment-in-coursework-policy.html
