
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

July 22, 2017



Course Plan (each lecture is of 90 minutes duration)

1. Introduction. Introduces random experiment, sample space, events, probability measure, σ-fields, probability space. [Lectures 1-2]

2. Random variables. Definition and examples, Borel σ-field of subsets of R, σ-field generated by a random variable. [Lectures 3-5]

3. Conditional probability and Independence. Conditional probability, Law of total probability, Bayes theorem, Independence of events, Independence of σ-fields, Independence of random variables, lim inf and lim sup of events, Borel-Cantelli lemma. [Lectures 6-9]

4. Distributions. Distribution function and its properties, Law of a random variable, Important distributions, Classification of random variables, pmf and pdf of random variables. [Lectures 10-13]

5. Random vectors, Joint distributions. Borel σ-field of subsets of R^n, Definition and a characterization of random vector, Law of a random vector, Joint distribution function and its properties, Marginal distribution functions, Conditional pmf and conditional pdf. [Lectures 14-16]

6. Expectation and Conditional expectation. Definition of expectation of discrete random variable and its properties, Simple random variables, Definition of expectation of nonnegative random variables, Definition of expectation of general random variables and properties, Expectation of continuous random variable with pdf, Monotone and Dominated convergence theorems (statements), Conditional expectation of discrete and continuous random variables. [Lectures 17-20]

7. Moment generating functions and Characteristic functions. Definition and properties, Inversion theorem, Uniqueness theorem, Continuity theorem (statement). [Lectures 21-24]

8. Limit Theorems. Markov and Chebyshev inequalities, Weak Law of Large Numbers, Strong Law of Large Numbers, Central Limit Theorem, Applications. [Lectures 25-27]
Chapter 1

Introduction

LECTURES 1 - 2

Key words: Random experiment, sample space, probability measure, σ-fields, probability space, Borel σ-field of subsets of (0, 1].

In this chapter we introduce the basic notions of random experiment, sample space, events and probability of an event.
We begin with a snapshot of the history of probability. The prehistory of probability theory is about the calculation of probabilities of outcomes of games of chance with dice or cards. One of the earliest books on this is Liber de ludo aleae (The book on games of chance) by Gerolamo Cardano (1501 - 1576), published posthumously in 1663. In Chapter 14 of this 15-page work, the first definition of classical probability is given. This reads as:
So there is one general rule, that we should consider the whole circuit (all possible outcomes) and the number of those casts which represents in how many ways the favorable result can occur and compare the number to the rest of the circuit and according to that proportion should the mutual wagers be laid so that one may contend on equal terms.
This definition in spirit resembles the definition given by Laplace, the one we now use as the classical definition of probability.
The mathematical methods of probability are widely believed to have begun in the correspondence between Fermat and Pascal on a question, known as the problem of points, posed in 1654 by the gambler Antoine Gombaud (Chevalier de Méré). The statement of the problem of points is the following.
Suppose two players A and B stake equal money on being the first to win
n points in a game in which the winner of each point is decided by the toss of


a fair coin, with head for A and tail for B. If the game is interrupted when
A still lacks m points and B lacks l, how should the total stake be divided
between them?
The problem was solved by both, in multiple ways. The method which essentially opened the way to the mathematical theory of probability is the one using combinatorial arguments. In fact this method put the problem into a general framework and used combinatorial techniques (here, Pascal's triangle) to solve it.
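Pascal's combinatorial solution admits a short computational illustration (a Python sketch; the function name and example values are ours, not part of the original correspondence):

```python
from math import comb

# Pascal's combinatorial solution of the problem of points.  If A lacks m
# points and B lacks l, the game is settled within at most m + l - 1
# further fair tosses, and A wins exactly when at least m of those
# tosses come up heads.
def stake_fraction_for_A(m, l):
    n = m + l - 1
    favorable = sum(comb(n, k) for k in range(m, n + 1))
    return favorable / 2 ** n

# If A lacks 1 point and B lacks 2, A should receive 3/4 of the stake.
print(stake_fraction_for_A(1, 2))  # 0.75
```

When both players lack the same number of points the split is even, as symmetry demands.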
At the beginning of the 18th century, Jacob Bernoulli (in Ars Conjectandi) and de Moivre (in The Doctrine of Chances) put probability on a solid mathematical framework. In fact Bernoulli proved a version of the Law of Large Numbers.
The second half of the 19th century saw another revolution for probability through the birth of Statistical Mechanics. James Clerk Maxwell used probability theory to explain the kinetic theory of gases. In fact, Maxwell realized that one needs to compute how molecules are distributed with respect to their speed, as opposed to the early kinetic theory of Rudolf Clausius, which was based on the assumption that all molecules have the same speed. Maxwell used the concept of a distribution function, a concept from probability, and gave a heuristic derivation of the velocity distribution of molecules. This was later refined by Boltzmann to arrive at the Maxwell-Boltzmann distribution.
The axiomatic foundation of probability theory was laid by Andrei Kolmogorov in 1933.
By a random experiment, we mean an experiment which has multiple outcomes and one does not know in advance which outcome is going to occur. We call this an experiment with 'random' outcome. We assume that the set of all possible outcomes of the experiment is known.

Definition 1.1. Sample space of a random experiment is the set of all possible outcomes of the random experiment. Unless specified otherwise, we denote the sample space by Ω.

The notion of sample space was formally introduced by von Mises in 1931, though the idea of a sample space existed and was used even at the time of Cardano.

Example 1.0.1 Toss a coin and note down the face. This is a random experiment, since there are multiple outcomes and the outcome is not known before the toss; in other words, the outcome occurs randomly. Moreover, the sample space is {H, T }.

Example 1.0.2 Toss two coins and note down the number of heads ob-
tained. Here Ω = {0, 1, 2}.

Example 1.0.3 (Urn problem) Two balls, say 'R' and 'B', are distributed 'at random' in three urns labeled 1, 2, 3. Here the order of occupancy within an urn is irrelevant. The sample space is

Ω = {(RB, −, −), (−, RB, −), (−, −, RB), (R, B, −), (R, −, B),
(B, R, −), (B, −, R), (−, R, B), (−, B, R)}.
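For such small experiments the sample space can be enumerated mechanically. The following Python sketch (the helper name and string encoding are ours) reproduces the nine outcomes listed above:

```python
from itertools import product

# Each of the two balls R and B goes independently into one of the urns
# 1, 2, 3; an outcome records, for each urn, which balls it holds
# (order inside an urn is irrelevant).
def outcome(r_urn, b_urn):
    urns = ['', '', '']
    urns[r_urn] += 'R'
    urns[b_urn] += 'B'
    return tuple(u if u else '-' for u in urns)

sample_space = {outcome(r, b) for r, b in product(range(3), repeat=2)}
print(len(sample_space))  # 9, matching the listing above
```

The count 9 = 3 × 3 reflects that each ball independently chooses one of three urns.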

We will now give examples with an infinite number of outcomes.

Example 1.0.4 A coin is tossed until two heads or two tails come in succession. The sample space is

Ω = {HH, TT, HTT, THH, HTHH, THTT, · · · }.

Example 1.0.5 Pick a point 'at random' from the interval (0, 1]. 'At random' means there is no bias in picking any point. The sample space is (0, 1].

Definition 1.2 ( Event ) Any subset of a sample space is said to be an event.

Example 1.0.6 {H} is an event corresponding to the sample space in Example 1.0.1.

Definition 1.3 (mutually exclusive events) Two events A, B are said to be mutually exclusive if A ∩ B = ∅.

If A and B are mutually exclusive, then occurrence of A implies non-occurrence of B and vice versa. Note that non-occurrence of A need not imply occurrence of B, since in general Ac ≠ B.

Example 1.0.7 The events {H}, {T } of the sample space in Example 1.0.1 are mutually exclusive. But the events {H, T }, {T } are not mutually exclusive.

Now we introduce the concept of probability of events (in other words, a probability measure). Intuitively probability quantifies the chance of the occurrence of an event. We say that an event has occurred if the outcome belongs to the event. In general it is not possible to assign probabilities to all events from the sample space. For the experiment given in Example 1.0.5, it is not possible to assign probabilities to all subsets of (0, 1]; for example, it is not possible to assign 'uniform' probability to a Vitali set. So one needs to restrict to a smaller class of subsets of the sample space. For the random experiment given in Example 1.0.5, it turns out that one can assign probability to each interval in (0, 1] as its length. Therefore, one can assign probability to any finite union of intervals in (0, 1], by representing the finite union of intervals as a finite disjoint union of intervals and assigning the probability as the sum of the lengths of these disjoint intervals. In fact one can assign probability to any countable union of intervals in (0, 1] by preserving the desirable property "probability of a countable disjoint union is the sum of the probabilities". Also note that if one can assign probability to an event, then one can assign probability to its complement, since occurrence of the event is the same as the non-occurrence of its complement. Thus one seeks to define probability on a class of events which is "closed under complementation" and "closed under countable union". This leads to the following special family of events on which one can assign probabilities.

Definition 1.4 A family of subsets F of a nonempty set Ω is said to be a σ-field if it satisfies the following.
(i) Ω ∈ F
(ii) if A ∈ F, then Ac ∈ F
(iii) if A1, A2, · · · ∈ F, then ∪_{n=1}^∞ An ∈ F.
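For a finite Ω the axioms can be checked by brute force, since in that case every countable union reduces to a finite one. A Python sketch (the function name is ours, not from the text):

```python
from itertools import combinations

# Brute-force check of the sigma-field axioms of Definition 1.4 for a
# FINITE omega.  For finite omega, closure under pairwise unions already
# gives closure under all countable unions.  Sets are held as frozensets.
def is_sigma_field(omega, family):
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                           # (i)  Omega in F
        return False
    if any(omega - a not in family for a in family):  # (ii) closed under complement
        return False
    return all(a | b in family                        # (iii) closed under union
               for a, b in combinations(family, 2))

print(is_sigma_field({1, 2, 3}, [set(), {1}, {2, 3}, {1, 2, 3}]))  # True
print(is_sigma_field({1, 2, 3}, [set(), {1}, {1, 2, 3}]))          # False
```

The second family fails because the complement {2, 3} of {1} is missing.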

Example 1.0.8 Let Ω be a nonempty set. Define

F0 = {∅, Ω}, P(Ω) = {A | A ⊆ Ω} .

Then F0, P(Ω) are σ-fields. Moreover, if F is a σ-field of subsets of Ω, then
F0 ⊆ F ⊆ P(Ω),
i.e., F0 is the smallest and P(Ω) is the largest σ-field of subsets of Ω.

Example 1.0.9 Let Ω be a nonempty set and A ⊆ Ω. Define

σ(A) = {∅, A, Ac , Ω} .

Then σ(A) is a σ-field and is the smallest σ-field containing the set A.
σ(A) is called the σ-field generated by A.

Lemma 1.0.1 Let I be an index set and {Fi | i ∈ I} be a family of σ-fields. Then F = ∩_{i∈I} Fi is a σ-field.

Proof. Since Ω ∈ Fi for all i ∈ I, we have Ω ∈ F. Now,

A ∈ F ⇒ Ac ∈ Fi for all i ∈ I ⇒ Ac ∈ F.

Similarly it follows that

A1, A2, · · · ∈ F ⇒ A1, A2, · · · ∈ Fi for all i ∈ I
⇒ ∪_{n=1}^∞ An ∈ Fi for all i
⇒ ∪_{n=1}^∞ An ∈ F.

Hence F is a σ-field.
Example 1.0.10 Let A ⊆ P(Ω). Then
∩{F | F is a σ-field and A ⊆ F}
is a σ-field and is the smallest σ-field containing A. We denote it by σ(A).
This can be seen as follows. From Lemma 1.0.1, σ(A) is a σ-field. From the definition of σ(A), clearly A ⊆ σ(A). If F is a σ-field containing A, then σ(A) ⊆ F. Hence, σ(A) is the smallest σ-field containing A.
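For a finite Ω, σ(A) can also be computed directly, by closing A under complements and unions until nothing new appears; for finite Ω this closure coincides with the intersection above. A Python sketch (the function name is ours):

```python
# Smallest sigma-field of subsets of a FINITE omega containing the
# generators: start from the generators together with the empty set and
# omega, and repeatedly add complements and pairwise unions until the
# family stabilizes.
def generated_sigma_field(omega, generators):
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:          # nothing new: family is closed
            return family
        family |= new

omega = {1, 2, 3, 4}
print(sorted(map(sorted, generated_sigma_field(omega, [{1}]))))
# [[], [1], [1, 2, 3, 4], [2, 3, 4]]
```

With a single generator A = {1}, the result is exactly {∅, A, Ac, Ω}, agreeing with Example 1.0.9.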
Definition 1.5 A family F of subsets of a nonempty set Ω is said to be a field if F satisfies
(i) Ω ∈ F
(ii) if A ∈ F, then Ac ∈ F
(iii) if A1, A2 ∈ F, then A1 ∪ A2 ∈ F.

Example 1.0.11 Any σ-field is a field. In particular, F0, σ(A), P(Ω) are fields.

Example 1.0.12 Let Ω = {1, 2, . . . }. Define

F = {A ⊆ Ω | either A is finite or Ac is finite} .

Then F is a field but not a σ-field.
Note that (i) and (ii) in the definition of a field follow easily. To see (iii), for A1, A2 ∈ F, if both A1, A2 are finite, so is A1 ∪ A2, and if either A1 or A2 is not finite, then (A1 ∪ A2)c = A1c ∩ A2c is finite. Hence (iii) follows, i.e., F is a field.

To see that F is not a σ-field, take

An = {2n + 1}, n = 1, 2, . . . .

Now
∪_{n=1}^∞ An = {3, 5, . . . } ∉ F,

since both this set and its complement are infinite.

Definition 1.6 (Probability measure)
Let Ω be a nonempty set and F be a σ-field of subsets of Ω. A map P : F → [0, 1] is said to be a probability measure if P satisfies
(i) (total mass 1) P(Ω) = 1,
(ii) (countable additivity) if A1, A2, · · · ∈ F are pairwise disjoint, then

P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An).
Definition 1.7 (Probability space)
The triplet (Ω, F, P ), where Ω is a nonempty set (the sample space), F a σ-field and P a probability measure, is called a probability space.
Example 1.0.13 Let Ω = {H, T }, F = P(Ω). Define P on F as follows:

P(∅) = 0, P({H}) = P({T }) = 1/2, P(Ω) = 1.

Then (Ω, F, P ) is a probability space. This probability space corresponds to the random experiment of tossing an unbiased coin and noting the face.
Example 1.0.14 Let Ω = {0, 1, 2, · · · }, F = P(Ω). Define P on F as follows:

P(A) = Σ_{k∈A} e^{−5} 5^k / k! , A ∈ F.

Then (Ω, F, P ) is a probability space.
Solution.

P(Ω) = Σ_{k∈Ω} e^{−5} 5^k / k! = Σ_{k=0}^∞ e^{−5} 5^k / k! = e^{−5} e^5 = 1.

If A1, A2, · · · ∈ F are pairwise disjoint, then

Σ_{k∈∪_{n=1}^∞ An} e^{−5} 5^k / k! = lim_{m→∞} Σ_{k∈∪_{n=1}^m An} e^{−5} 5^k / k!
= lim_{m→∞} Σ_{n=1}^m Σ_{k∈An} e^{−5} 5^k / k!
= lim_{m→∞} Σ_{n=1}^m P(An)
= Σ_{n=1}^∞ P(An).

Here the first and the last equalities follow from the definition of convergence of a series, and the second from splitting the sum over a finite disjoint union into a finite sum of series.
Therefore properties (i) and (ii) are satisfied. Hence P is a probability measure.
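The computation above can also be checked numerically (a sanity check, not a proof; the truncation at 100 terms and all names are our choices):

```python
from math import exp, factorial

# Numerical check of Example 1.0.14: the map P(A) = sum over k in A of
# e^{-5} 5^k / k! has total mass close to 1 over a long initial segment
# of Omega, and is additive over the disjoint split of the outcomes
# into evens and odds.
def p(A):
    return sum(exp(-5) * 5 ** k / factorial(k) for k in A)

total = p(range(100))                 # truncated P(Omega); the tail is negligible
print(abs(total - 1) < 1e-9)          # True: total mass 1
evens, odds = range(0, 100, 2), range(1, 100, 2)
print(abs(p(evens) + p(odds) - total) < 1e-12)  # True: additivity
```

This is the Poisson distribution with parameter 5, which will reappear later in the course.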

Theorem 1.0.1 (Properties of probability measure) Let (Ω, F, P ) be a probability space and let A, B, A1, A2, . . . be in F. Then

(1) P (Ac ) = 1 − P (A) .

(2) Monotonicity: if A ⊆ B, then

P (A) ≤ P (B) .

(3) Inclusion-exclusion formula:

P(∪_{k=1}^n Ak) = Σ_{k=1}^n P(Ak) − Σ_{i<j} P(Ai Aj) + Σ_{i<j<k} P(Ai Aj Ak) − · · · + (−1)^{n+1} P(A1 A2 . . . An).

(4) Finite sub-additivity:

P (A ∪ B) ≤ P (A) + P (B) .

(5) Continuity property:

(i) For A1 ⊆ A2 ⊆ . . . ,

P(∪_{n=1}^∞ An) = lim_{n→∞} P(An).

(ii) For A1 ⊇ A2 ⊇ . . . ,

P(∩_{n=1}^∞ An) = lim_{n→∞} P(An).

(6) Boole's inequality (countable sub-additivity):

P(∪_{n=1}^∞ An) ≤ Σ_{n=1}^∞ P(An).
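Properties (3) and (6) can be verified by brute force on a small probability space, for instance a fair die with P(A) = |A|/6 (a Python sketch; the three events below are arbitrary choices of ours):

```python
from itertools import combinations
from fractions import Fraction

# A fair-die probability space: Omega = {1,...,6}, F = P(Omega),
# P(A) = |A|/6.  We check the inclusion-exclusion formula (3) and
# Boole's inequality (6) of Theorem 1.0.1 for three events.
def P(A):
    return Fraction(len(A), 6)

A1, A2, A3 = {1, 2, 3}, {2, 4, 6}, {3, 4, 5}
events = [A1, A2, A3]

lhs = P(A1 | A2 | A3)
rhs = sum((-1) ** (r + 1) * sum(P(set.intersection(*c))
                                for c in combinations(events, r))
          for r in range(1, 4))
print(lhs == rhs)                        # True: inclusion-exclusion
print(lhs <= sum(P(A) for A in events))  # True: Boole's inequality
```

Using Fraction keeps the arithmetic exact, so the equality in (3) is tested without rounding error.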

Proof. Since
Ω = A ∪ Ac ∪ ∅ ∪ ∅ ∪ · · · ,
we have
1 = P(A) + P(Ac) + P(∅) + P(∅) + · · · .
Since the RHS is a convergent series, it follows that P(∅) = 0, and hence we get (1).
Now
A ⊆ B ⇒ B = A ∪ (B \ A).
Therefore
P(B) = P(A) + P(B \ A) ⇒ P(B) ≥ P(A),
since P(B \ A) ≥ 0. This proves (2).


We will complete the rest of the proofs in the next class.
