
Course 003: Basic Econometrics, 2012-2013

Course 003: Basic Econometrics


Rohini Somanathan: Part I
Sunil Kanwar: Part II
Delhi School of Economics, 2014-2015

Rohini Somanathan

Outline of Part I


Main text: Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, fourth edition.
1. Probability Theory: Chapters 1-6
Probability basics: The definition of probability, combinatorial methods, independent
events, conditional probability.
Random variables: Distribution functions, marginal and conditional distributions,
distributions of functions of random variables, moments of a random variable,
properties of expectations.
Special distributions, laws of large numbers, central limit theorems
2. Statistical Inference: Chapters 7-10
Estimation: definition of an estimator, maximum likelihood estimation, sufficient
statistics, sampling distributions of estimators.
Hypothesis Testing: simple and composite hypotheses, tests for differences in means,
test size and power, uniformly most powerful tests.
Nonparametric Methods

Administrative Information
Internal Assessment: 25% for Part I
1. Midterm: 20%
2. Lab assignments, Tutorial attendance and class participation: 5%
Problem Sets: Do as many problems from the book as you can. All odd-numbered
exercises have solutions, so focus on these.
Tutorials: Check the notice board in front of the lecture theatre for lists.
Punctuality is critical: coming in late disturbs the rest of the class and me.

Why is this course useful?


We (as economists, citizens, consumers, exam-takers) are often faced with situations in
which we have to make decisions in the face of uncertainty. This may be caused by:
randomness in the world (a farmer making planting decisions does not know how much
it will rain during the season; we do not know how many days we'll be sick next year,
or what the chances are of an economic crisis or recovery)
incomplete information about the realized state of the world (Is a politician's promise
sincere? Is a firm telling us the truth about a product? Has our opponent been dealt a
better hand of cards? Is a prisoner guilty or innocent?)
By putting structure on this uncertainty, we can arrive at
decision rules: firms choose techniques, doctors choose drug regimes, electors choose
politicians- these rules have to tell us how best to incorporate new information.
estimates: of empirical relationships (wages and education, drugs and health...)
tests: how likely is it that population parameters take particular values, based on the
estimates we've obtained?
Probability theory puts structure on uncertain events and allows us to derive systematic
decision rules. The field of statistics shows us how we can collect and use data to estimate
empirical models and test hypotheses about the population based on our estimates.

A motivating example: gender ratios


We are interested in whether the gender ratio in a population reflects discrimination, either
before or after birth.
Suppose it is equally likely for a child of either sex to be conceived.
We visit a small village with 10 children under the age of 1. If each birth is independent, we
would get considerable variation in the sex-ratio in the absence of discrimination.

[Figure: binomial(10, k, 0.5) probabilities for the number of girls among 10 births; vertical axis is probability, from .05 to .25]
P(0)=.001, P(1)=.010, P(2)=.044, P(3)=.117, P(4)=.205, P(5)=.246, and symmetrically for k = 6, …, 10 (display binomial(10, k, .5))
When should we conclude that there is gender bias? Can we get an estimate of this bias?
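These probabilities can be reproduced with a short calculation. A minimal sketch in Python (our choice of language; the name binom_pmf is ours, not from the slides):

```python
from math import comb

# P(k girls out of n births) under the slide's assumptions:
# independent births, each equally likely to be a girl (p = 0.5).
def binom_pmf(n, k, p=0.5):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(6):
    print(k, round(binom_pmf(10, k), 3))
```

The distribution is symmetric around k = 5, so the remaining values mirror these.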

Origins of probability theory


A probability is a number attached to some event which expresses the likelihood of the
event occurring.
A theory of probability was first exposited by European mathematicians in the 16th C
studying gambling problems.
How are probabilities assigned to events?
By thinking about all possible outcomes. If there are n of these, all equally likely, we
can attach the number 1/n to each of them. If an event contains k of these outcomes, we
attach a probability k/n to the event. This is the classical interpretation of probability.
Alternatively, imagine the event as a possible outcome of an experiment. Its probability
is the fraction of times it occurs when the experiment is repeated a large number of
times. This is the frequency interpretation of probability.
In many cases events cannot be thought of in terms of repeated experiments or equally
likely outcomes. We could base likelihoods in this case on what we believe about the
world: subjective probabilities. The subjective probability of an event A is a real number
in the interval [0, 1] which reflects a subjective belief in the validity or occurrence of event
A. Different people might attach different probabilities to the same events. Examples?
We formalize this subjective interpretation by imposing certain consistency conditions on
combinations of events.

Definitions
An experiment is any process whose outcome is not known in advance with certainty. These
outcomes may be random or non-random, but we should be able to specify all of them and
attach probabilities to them.
Experiment                   Event
10 coin tosses               4 heads
select 10 LS MPs             one is female
go to your bus-stop at 8     bus arrives within 5 min.

A sample space is the collection of all possible outcomes of an experiment.


An event is a subset of possible outcomes in the sample space S.
The complement of an event A is the event that contains all outcomes in the sample space
that do not belong to A. We denote this event by Aᶜ.
The subsets A1, A2, A3, … of sample space S are called mutually disjoint sets if no two of
these sets have an element in common. The corresponding events A1, A2, A3, … are said to
be mutually exclusive events.
If A1, A2, A3, … are mutually exclusive events such that S = A1 ∪ A2 ∪ A3 ∪ …, these are
called exhaustive events.

Example: 3 tosses of a coin


The experiment has 2³ = 8 possible outcomes and we can define the sample space S = {s1, …, s8}
where
s1 = HHH, s2 = HHT, s3 = HTH, s4 = HTT, s5 = THH, s6 = THT, s7 = TTH, s8 = TTT
Any subset of this sample space is an event.
If we have a fair coin, each of the listed outcomes is equally likely and we attach probability
1/8 to each of them.

Let us define the event A as "at least one head". Then A = {s1, …, s7} and Aᶜ = {s8}. A and Aᶜ are
exhaustive events.
The events exactly one head and exactly two heads are mutually exclusive events.
Notice that there are lots of different ways in which we can define a sample space, and the
most useful way to do so depends on the event we are interested in (the number of heads; or,
when picking from a deck of cards, we may be interested in the suit, the number, or both).

The definition of probability


Definition: Let 𝒮 be a collection of all events in S. A probability distribution is a function
P : 𝒮 → [0, 1] which satisfies the following axioms:
1. The probability of every event must be non-negative: P(A) ≥ 0 for all events A ∈ 𝒮
2. If an event is certain to occur, its probability is 1: P(S) = 1
3. For any sequence of disjoint events A1, A2, …
P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)

Note:
We will typically write P(A) or Pr(A) for the probability of event A.
For finite sample spaces, 𝒮 is straightforward to define. For any S which is a subset of the
real line (and therefore infinite), let 𝒮 be the set of all intervals in S.

Probability measures... some useful results


We can use our three axioms to derive some useful results:
Result 1: For each A ∈ 𝒮, P(A) = 1 − P(Aᶜ)
Proof: A ∪ Aᶜ = S. By our second axiom, P(S) = 1, and by axiom 3,
P(A ∪ Aᶜ) = P(A) + P(Aᶜ).
Result 2: P(∅) = 0
Proof: Let A = ∅, so Aᶜ = S. Since P(S) = 1, P(∅) = 0 using the first result above.
Result 3: If A1 and A2 are subsets of S such that A1 ⊆ A2, then P(A1) ≤ P(A2)
Proof: Write A2 as A2 = A1 ∪ (A1ᶜ ∩ A2). Since these are disjoint, we can use axiom
3 to get P(A2) = P(A1) + P(A1ᶜ ∩ A2). The second term on the RHS is non-negative (by
axiom 1), so P(A2) ≥ P(A1).
Result 4: For each A ∈ 𝒮, 0 ≤ P(A) ≤ 1
Proof: Since ∅ ⊆ A ⊆ S, we can directly apply the previous result to obtain
P(∅) ≤ P(A) ≤ P(S), or 0 ≤ P(A) ≤ 1.

Some useful results..


Result 5: If A1 and A2 are subsets of S, then P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
Proof: As before, the trick is to write A1 ∪ A2 as a union of disjoint sets and then add the
probabilities associated with them. Drawing a Venn diagram helps to do this.
A1 ∪ A2 = (A1 ∩ A2ᶜ) ∪ (A1 ∩ A2) ∪ (A2 ∩ A1ᶜ)    (1)
but A1 = (A1 ∩ A2ᶜ) ∪ (A1 ∩ A2) and A2 = (A2 ∩ A1ᶜ) ∪ (A1 ∩ A2), so
P(A1) + P(A2) = P(A1 ∩ A2ᶜ) + P(A1 ∩ A2) + P(A2 ∩ A1ᶜ) + P(A1 ∩ A2)
Subtracting P(A1 ∩ A2) gives us the expression in (1).

Examples using the probability axioms


1. Consider two events A and B such that Pr(A) = 1/3 and Pr(B) = 1/2. Determine the value of
Pr(B ∩ Aᶜ) for each of the following conditions: (a) A and B are disjoint (b) A ⊆ B (c)
Pr(A ∩ B) = 1/8
2. Consider two events A and B, where Pr(A) = .4 and Pr(B) = .7. Determine the minimum and
maximum values of Pr(A ∩ B) and the conditions under which they are obtained.
3. A point (x, y) is to be selected from the square S containing all points (x, y) such that
0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Suppose that the probability that the point will belong to any
specified subset of S is equal to the area of that subset. Find the following probabilities:
(a) (x − 1/2)² + (y − 1/2)² ≥ 1/4
(b) 1/2 < x + y < 3/2
(c) y ≤ 1 − x²
(d) x = y
answers: (1) 1/2, 1/6, 3/8 (2) .1, .4 (3) 1 − π/4, 3/4, 2/3, 0
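The answers to problem 3 can be sanity-checked by Monte Carlo simulation. A sketch of our own (not part of the course material), drawing uniform points in the unit square:

```python
import random

# Estimate the three area-based probabilities from problem 3 by drawing
# uniform points (x, y) in the unit square.
rng = random.Random(0)
n = 200_000
a = b = c = 0
for _ in range(n):
    x, y = rng.random(), rng.random()
    a += (x - 0.5) ** 2 + (y - 0.5) ** 2 >= 0.25  # outside the inscribed circle
    b += 0.5 < x + y < 1.5
    c += y <= 1 - x ** 2

# Exact answers: 1 - pi/4 ≈ .215, 3/4, 2/3
print(a / n, b / n, c / n)
```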

Finite sample spaces


If a sample space S contains a finite number of points s1 , . . . sn , we can specify a probability
distribution on S by assigning a probability to each point si S. This probability must
satisfy the two conditions:
1. pi ≥ 0 for i = 1, 2, …, n, and
2. Σ_{i=1}^{n} pi = 1

The probability of any event A can now be found as the sum of pi for all outcomes si that
belong to A.
A sample space containing n outcomes is called a simple sample space if the probability
assigned to each of the outcomes s1, …, sn is 1/n. Probability measures are easy to define in
such spaces. If the event A contains exactly m outcomes, then P(A) = m/n.
Notice that for the same experiment, we can define the sample space in multiple ways
depending on the events of interest. For example, suppose we're interested in obtaining a
given number of heads when tossing 3 coins: our sample space can either comprise all
8 possible outcomes (a simple space) or just four outcomes (0, 1, 2 and 3 heads).
We can arrive at the total number of elements in a sample space by listing all possible
outcomes. A simple sample space for a coin-tossing experiment with 3 fair coins has
eight possible outcomes, a roll of two dice has 36, etc. We then just count the
number of elements contained in our event A and divide this by the total number of
outcomes to get our probability (P(2 heads) = 3/8 and P(sum of 7) = 1/6).
Listing outcomes can take a long time, and we can use a number of counting methods to
make things easier and avoid mistakes.

Counting methods..the multiplication rule


Sometimes it is useful to think of an experiment as being performed in multiple stages
(tossing coins, picking cards, questions on an exam, going from one city to another via a
third).
If the first stage has m possible outcomes and the second n outcomes, then we can define a
simple sample space with exactly mn outcomes. Each element in this space will be a pair
(xi, yj).
Example: the experiment of tossing 5 fair coins will have 32 elements in the simple sample
space, the probability of five heads is 1/32 and of one head is 5/32.

Permutations
Suppose we are sampling k objects from a total of n distinct objects without replacement.
We are interested in the total number of different arrangements of these objects we can
obtain.
We first pick one object- this can happen in n different ways. Since we are now left with
n 1 objects, the second one can be picked in (n 1) different ways, and so on.
The total number of permutations of n objects taken k at a time is given by
Pn,k = n(n − 1) ⋯ (n − k + 1), and Pn,n = n!
Pn,k can alternatively be written as:
Pn,k = n(n − 1) ⋯ (n − k + 1) = n(n − 1) ⋯ (n − k + 1) · (n − k)!/(n − k)! = n!/(n − k)!

In the case with replacement, we can apply the multiplication rule derived above. In this
case there are n outcomes possible for each of the k selections, so the number of elements in
S is nᵏ.
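The formula Pn,k = n!/(n − k)! is easy to check in code. A small Python sketch (the helper name P_nk is ours), compared against the standard library's math.perm:

```python
from math import factorial, perm

# P(n, k): ordered arrangements of k objects drawn from n without replacement,
# computed from the formula n!/(n-k)! derived above.
def P_nk(n, k):
    return factorial(n) // factorial(n - k)

print(P_nk(5, 2))                 # 20 ordered pairs from 5 objects
print(P_nk(5, 2) == perm(5, 2))   # True
```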

The birthday problem


You go to watch an India-Australia cricket match with a friend.
He would like to bet Rs. 100 that among the group of 23 players on the field (2 teams plus
a referee) at least two people share a birthday
Should you take the bet?
What is the probability that out of a group of k people, at least two share a birthday?
the total number of possible lists of birthdays is 365ᵏ
the number of different ways in which all of them have different birthdays is 365!/(365 − k)!
(because the second person has only 364 days to choose from, etc.). The required
probability is therefore p = 1 − 365!/((365 − k)! · 365ᵏ)

It turns out that for k = 23 this number is .507, so the odds are (slightly) in your friend's
favour and you should not take the bet.
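The birthday probability can be computed directly from the formula above. A Python sketch (the function name birthday_prob is ours):

```python
# P(at least two of k people share a birthday), assuming 365 equally
# likely birthdays and independence across people.
def birthday_prob(k):
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (365 - i) / 365  # the i-th person avoids the first i birthdays
    return 1 - p_distinct

print(round(birthday_prob(23), 3))  # 0.507
```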

Combinatorial methods..the binomial coefficient


How many different subsets of k elements can be chosen from a set of n distinct elements?
We are not interested in the order in which the k elements are arranged.
Each such subset is called a combination, denoted by Cn,k
We derived above the number of permutations of n elements, taken k at a time. We can
think of these permutations as being derived by the following process:
First pick a set or combination of k elements.
Since there are k! permutations of these elements, this particular combination will give
rise to k! permutations.
This is true of each such combination; therefore the number of permutations is given by
Pn,k = k! · Cn,k, or
Cn,k = Pn,k/k! = n!/(k!(n − k)!)
This is also denoted by C(n, k), read "n choose k", and called the binomial coefficient.
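The identity Cn,k = Pn,k/k! = n!/(k!(n − k)!) can be checked against the standard library's math.comb. A sketch (the helper name C is ours):

```python
from math import comb, factorial

# C(n, k): number of k-element subsets of an n-element set,
# computed from the formula n!/(k!(n-k)!) derived above.
def C(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

print(C(52, 5))  # 2598960 possible five-card hands
print(all(C(n, k) == comb(n, k) for n in range(10) for k in range(n + 1)))  # True
```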

The multinomial coefficient


Suppose we have elements of multiple types (jobs, modes of transport, methods of water
filtration, …) and want to find the number of ways that n distinct elements (individuals,
trips, …) can be divided into k groups such that for j = 1, 2, …, k the jth group contains exactly
nj elements.
The n1 elements for the first group can be chosen in C(n, n1) ways; the second group is chosen
out of the remaining (n − n1) elements, which can be done in C(n − n1, n2) ways, and so on.
The total number of ways of dividing the n elements into k groups is therefore
C(n, n1) · C(n − n1, n2) · C(n − n1 − n2, n3) ⋯ C(n_{k−1} + n_k, n_{k−1})
This can be simplified to
n!/(n1! n2! ⋯ nk!)
This expression is known as the multinomial coefficient.

Examples:
A student organization of 1000 people is picking 4 office-bearers and 8 members for its
managing council. The total number of ways of picking these groups is given by 1000!/(4! 8! 988!)
105 students have to be organized into 4 tutorial groups, 3 with 25 students each and
one with the remaining 30 students. How many ways can students be assigned to
groups?
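The multinomial coefficient can be computed directly; this Python sketch (the helper name multinomial is ours) answers the tutorial-group question:

```python
from math import factorial

# n!/(n1! n2! ... nk!): number of ways to divide n distinct items into
# groups of the given sizes (the sizes must sum to n).
def multinomial(*sizes):
    out = factorial(sum(sizes))
    for s in sizes:
        out //= factorial(s)
    return out

# 105 students into tutorial groups of 25, 25, 25 and 30:
print(multinomial(25, 25, 25, 30))
```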

Unions of finite numbers of events


We can extend our formula for the probability of a union of events to the case where the number
of events is greater than 2 but finite:
For any three events A1, A2, and A3,
P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) − [P(A1 ∩ A2) + P(A1 ∩ A3) + P(A2 ∩ A3)] + P(A1 ∩ A2 ∩ A3)
The easiest way to see this is to draw a Venn diagram and express the desired set in terms
of 7 disjoint sets, p1, …, p7.
For a finite number of events, we have:
P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − … + (−1)^{n+1} P(A1 ∩ A2 ∩ … ∩ An)
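The general inclusion-exclusion formula can be verified on a small discrete example. The three sets below are our own arbitrary choices for illustration:

```python
from itertools import combinations

# Verify inclusion-exclusion for three events in a 10-point
# equally likely sample space.
space = set(range(10))
P = lambda E: len(E) / len(space)

A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 5, 6}]

lhs = P(set().union(*A))
rhs = sum(
    (-1) ** (r + 1) * P(set.intersection(*combo))
    for r in range(1, len(A) + 1)
    for combo in combinations(A, r)
)
print(abs(lhs - rhs) < 1e-9)  # True
```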

Independent Events
Definition: Let A and B be two events in a sample space S. Then A and B are independent
iff P(A ∩ B) = P(A)P(B). If A and B are not independent, A and B are said to be dependent.
Events may be independent because they are physically unrelated: tossing a coin and rolling
a die, two different people falling sick with some non-infectious disease, etc.
This need not be the case, however; it may just be that one event provides no relevant
information on the likelihood of occurrence of the other.
Example:
The event A is getting an even number on a roll of a die.
The event B is getting one of the first four numbers.
The intersection of these two events is the event of rolling the number 2 or 4, which we
know has probability 1/3.
Are A and B independent? Yes, because P(A)P(B) = (1/2)(2/3) = 1/3 = P(A ∩ B).
This is because the occurrence of A does not affect the likelihood that B will occur, or
vice-versa. Why?
If A and B are independent, then A and Bᶜ are also independent, as are Aᶜ and Bᶜ. (We
require P(A ∩ Bᶜ) = P(A)P(Bᶜ). But A = (A ∩ B) ∪ (A ∩ Bᶜ), so with A and B independent,
P(A ∩ Bᶜ) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(Bᶜ). Starting now with A and Bᶜ,
we can use the same argument to show Aᶜ and Bᶜ independent.)

Independent Events..examples and special cases


1. A company has 100 employees, 40 men and 60 women. There are 6 male executives. How
many female executives should there be for gender and rank to be independent?
solution: If gender and rank are independent, then P(M ∩ E) = P(M)P(E). We can solve
for P(E) as P(E) = P(M ∩ E)/P(M) = .06/.4 = .15. So there must be 15 executives in all,
and hence 9 female executives.
2. The experiment involves flipping two coins. A is the event that the coins match and B is
the event that the first coins is heads. Are these events independent?
solution: In this case P(B) = P(A) = 1/2 (A = {HH, TT}) and P(A ∩ B) = 1/4, so yes, the
events are independent.

3. Suppose A and B are disjoint sets in S. Does it tell us anything about the independence of
events A and B?
4. Remember that disjointness is a property of sets whereas independence is a property of the
associated probability measure and the dependence of events will depend on the probability
measure that is being used.

Independence with 3 or more events


Definition: Let A1, A2, …, An be events in the sample space S. Then A1, A2, …, An are
mutually independent iff P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1)P(A2) ⋯ P(Ak) for every collection of
k of these events, where 2 ≤ k ≤ n. These events are pairwise independent iff
P(Ai ∩ Aj) = P(Ai)P(Aj) for all i ≠ j.
Clearly mutual independence implies pairwise independence but not vice-versa.
Examples:
One ticket is chosen at random from a box containing 4 lottery tickets with numbers
112, 121, 211, 222.
The event Ai is that a 1 occurs in the ith place of the chosen number.
P(Ai) = 1/2 for i = 1, 2, 3, and P(A1 ∩ A2) = P({112}) = 1/4; similarly for A1 ∩ A3 and A2 ∩ A3.
These 3 events are therefore pairwise independent.
Are they mutually independent? Clearly not: P(A1 ∩ A2 ∩ A3) = 0 ≠ P(A1)P(A2)P(A3) = 1/8
Toss two dice, white and black. The sample space consists of all ordered pairs
(i, j), i, j = 1, 2, …, 6. Define the following events:
A1: first die ∈ {1, 2, 3}, P(A1) = 1/2
A2: first die ∈ {3, 4, 5}, P(A2) = 1/2
A3: the sum of the faces equals 9, P(A3) = 1/9
In this case, P(A1 ∩ A2 ∩ A3) = P({(3, 6)}) = 1/36 = (1/2)(1/2)(1/9) = P(A1)P(A2)P(A3), but
P(A1 ∩ A3) = P({(3, 6)}) = 1/36 ≠ P(A1)P(A3) = 1/18, so the events are not mutually
independent, nor pairwise independent.
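The dice example can be checked exactly with rational arithmetic. A Python sketch of our own (the helpers P and both are our names):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes and check the triple-product
# condition and one pairwise condition exactly.
space = list(product(range(1, 7), repeat=2))
P = lambda E: Fraction(sum(1 for s in space if E(s)), len(space))

A1 = lambda s: s[0] in {1, 2, 3}
A2 = lambda s: s[0] in {3, 4, 5}
A3 = lambda s: sum(s) == 9

both = lambda *Es: (lambda s: all(E(s) for E in Es))

print(P(both(A1, A2, A3)) == P(A1) * P(A2) * P(A3))  # True
print(P(both(A1, A3)) == P(A1) * P(A3))              # False
```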

Conditional probability
When we conduct an experiment, we are absolutely sure that the event S will occur.
Suppose now we have some additional information about the outcome, say that it is an
element of B ⊆ S.
What effect does this have on the probabilities of events in S? How exactly can we use such
additional information to compute conditional probabilities?
Example: The experiment involves tossing two fair coins in succession. What is the
probability of two tails? Suppose you know the first one is a head? What if it is a tail?
We denote the conditional probability of event A, given B by P(A|B)
B is now the conditional sample space and since B is certain to occur, P(B|B) = 1
Event A will now occur iff A B occurs
Definition: Let A and B be two events in a sample space S. If P(B) ≠ 0, then the conditional
probability of event A given event B is given by
P(A|B) = P(A ∩ B)/P(B)

Notice that P(.|B) is now a probability set function (probability measure) defined for
subsets of B.
For independent events A and B, the conditional and unconditional probabilities are equal:
P(A|B) = P(A)P(B)/P(B) = P(A)

Conditional probability...the multiplication rule


The above definition of conditional probability can be manipulated to arrive at a set of rules
that are useful for computing conditional probabilities in particular types of problems.
We defined the conditional probability of event A given event B as P(A|B) = P(A ∩ B)/P(B)
Multiplying both sides by P(B), we have the multiplication rule for probabilities:
P(A ∩ B) = P(A|B)P(B)
This is especially useful in cases where an experiment can be interpreted as being
conducted in two stages. In such cases, P(A|B) and P(B) can often be very easily assigned.
Examples:
Two cards are drawn successively, without replacement from an ordinary deck of
playing cards. What is the probability of drawing two aces?
Here the event B is that the first card drawn is an ace and the event A is that the
second card is an ace. P(B) is clearly 4/52 = 1/13 and P(A|B) = 3/51 = 1/17. The required
probability P(A ∩ B) is therefore (1/13)(1/17) = 1/221
There are two types of candidates, competent and incompetent (C and I). The share of
I-type candidates seeking admission is 0.3. All candidates are interviewed by a
committee and the committee rejects incompetent candidates with probability 0.9.
What is the probability that an incompetent candidate is admitted?
Here we're interested in P(A ∩ I), where P(I) = .3 and P(A|I) = .1, so the required
probability is .03.
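The two-aces computation is a one-liner with exact fractions; a Python sketch (variable names ours):

```python
from fractions import Fraction

# Multiplication rule: P(both aces) = P(A|B) * P(B).
p_B = Fraction(4, 52)          # first card is an ace
p_A_given_B = Fraction(3, 51)  # second card is an ace, given the first was
print(p_A_given_B * p_B)  # 1/221
```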

The law of total probability


Let S denote the sample space of an experiment and consider k events A1, A2, …, Ak in S
such that A1, A2, …, Ak are disjoint and ∪_{i=1}^{k} Ai = S. These events are said to form a
partition of the sample space S.
If B is any other event, then the events A1 ∩ B, A2 ∩ B, …, Ak ∩ B form a partition of B:
B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ … ∪ (Ak ∩ B) and, since these are disjoint events,
P(B) = Σ_{i=1}^{k} P(Ai ∩ B).

If P(Ai) > 0 for all i, then using the multiplication rule derived above, this can be written as:
P(B) = Σ_{i=1}^{k} P(Ai) P(B|Ai)
This is known as the law of total probability.


Example: You're playing a game in which your score is equally likely to take any integer
value between 1 and 50. If your score the first time you play is equal to X, and you play
until you score Y ≥ X, what is the probability that Y = 50?
Solution: For each value xi, P(X = xi) = 1/50. We can compute the conditional probability of
Y = 50 for each of these values. The event Ai is the event that X = xi and the event B
is getting a 50 to end the game. Given X = xi, the final score is equally likely to be any of
the values xi, …, 50, so P(B|Ai) = 1/(51 − xi). The probability of getting xi in the first round
and 50 to end the game is given by the product P(B|Ai)P(Ai). The required probability is the
sum of these products over all possible values of i:

P(Y = 50) = Σ_{x=1}^{50} (1/(51 − x)) · (1/50) = (1/50)(1 + 1/2 + 1/3 + … + 1/50) ≈ .09

Bayes Theorem
Bayes Theorem: (or Bayes Rule) Let the events A1 , A2 , . . . Ak form a partition of S such that
P(Aj ) > 0 for all j = 1, 2, . . . , k, and let B be any event such that P(B) > 0. Then for i = 1, . . . , k,
P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^{k} P(Aj)P(B|Aj)

Proof:
By the definition of conditional probability,
P(Ai|B) = P(Ai ∩ B)/P(B)

The denominators in these expressions are the same by the law of total probability and the
numerators are the same using the multiplication rule.
In the case where the partition of S consists of only two events,
P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]

Bayes Rule...remarks
Bayes rule provides us with a method of updating the probabilities of events in the partition
based on the new information provided by the occurrence of the event B
Since P(Aj ) is the probability of event Aj prior to the occurrence of event B, it is referred
to as the prior probability of event Aj .
P(Aj |B) is the updated probability of the same event after the occurrence of B and is called
the posterior probability of event Aj .
Bayes rule is very commonly used in game-theoretic models. For example, in political
economy models a Bayes-Nash equilibrium is a standard equilibrium concept: Players (say
voters) start with beliefs about politicians and update these beliefs when politicians take
actions. Beliefs are constrained to be updated based on Bayes conditional probability
formula.
In Bayesian estimation, prior distributions on population parameters are updated given
information contained in a sample. This is in contrast to more standard procedures where
only the sample information is used. The sample would now lead to different estimates,
depending on the prior distribution of the parameter that is used.
A word about Bayes: He was a non-conformist clergyman (1702-1761), with no formal
mathematics degree. He studied logic and theology at the University of Edinburgh.

Bayes Rule ...examples


C1, C2 and C3 are plants producing 10, 50 and 40 per cent of a company's output. The
percentage of defective pieces produced by each of these is 1, 3 and 4 respectively. Given
that a randomly selected piece is defective (event D), what is the probability that it is from
the first plant?
P(C1|D) = P(D|C1)P(C1)/P(D) = (.01)(.1) / [(.01)(.1) + (.03)(.5) + (.04)(.4)] = 1/32 ≈ .03
How do the prior and posterior probabilities of the event C1 compare? What does this tell
you about the difference between the priors and posteriors for the other events?
Suppose that there is a new blood test to detect a virus. Only 1 in every thousand people
in the population has the virus. The test is 98 per cent effective in detecting the disease in
people who have it and gives a false positive for one per cent of disease-free persons tested.
What is the probability that the person actually has the disease given a positive test result?
P(Disease|Positive) = P(Positive|Disease)P(Disease)/P(Positive)
= (.98)(.001) / [(.98)(.001) + (.01)(.999)] = .089

So in spite of the test being very effective in catching the disease, we have a large number of
false positives.
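The same computation, as a reusable Python sketch (the function and parameter names are ours):

```python
# Posterior P(disease | positive) via Bayes' rule with a two-event partition.
def posterior(prior, sensitivity, false_positive_rate):
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

print(round(posterior(0.001, 0.98, 0.01), 3))  # 0.089
```

Try varying the prior: with a prior of 0.1 instead of 0.001, the posterior rises sharply, which is the point of the remarks on priors and posteriors above.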

Bayes Rule ... priors, posteriors and politics


To understand the relationship between prior and posterior probabilities a little better, consider
the following example:
A politician, on entering parliament, has a reasonably good reputation. A citizen attaches a
prior probability of 3/4 to his being honest (undertaking policies to maximize social welfare,
rather than his bank balance).
At the end of his tenure, the citizen finds a very large number of potholes on roads in the
politician's constituency. While these do not leave the citizen with a favorable impression of
the incumbent, it is possible that the unusually heavy rainfall over these years was
responsible.
Elections are coming up. How does the citizen use this information on road conditions to
update his assessment of the moral standing of the politician? Let us compute the posterior
probability of the politician's being honest, given the event that the roads are in bad
condition:
Suppose that the probability of bad roads is 1/3 if the politician is honest and 2/3 if
he/she is dishonest.

The posterior probability of the politician being honest is now given by
P(honest|bad roads) = P(bad roads|honest)P(honest)/P(bad roads)
= (1/3)(3/4) / [(1/3)(3/4) + (2/3)(1/4)] = 3/5
What would the posterior be if the prior is equal to 1? What if the prior is zero? What if
the probability of bad roads were equal to 1/2 for both types of politicians? When are
differences between priors and posteriors going to be large?

The Monty Hall problem


A game show host leads the contestant to a wall with three closed doors.
Behind one of these is a fancy car, behind the other two a consolation prize (a bag of sweets)
The contestant must first choose a door without any prior knowledge of what is behind each
door.
The host then opens one of the two remaining doors, always one hiding a bag of sweets.
The contestant is given an opportunity to switch doors, and wins whatever is behind the
door that is finally chosen by him.
Does he raise his chances of winning the car by switching?
Suppose that the contestant chooses door 1 and the host opens door three. Denote by
A1, A2 and A3 the events that the car is behind doors 1, 2 and 3 respectively. Let B be
the event that the host opens door 3.
We'd like to compare P(A1|B) and P(A2|B).
By Bayes rule, the denominator of both these expressions is P(B); we therefore need to
compare P(B|A1)P(A1) and P(B|A2)P(A2).
The first expression is (1/2)(1/3); the second is (1)(1/3), because if the car is behind
door 2 then door three will certainly be opened, so P(B|A2) = 1.
The contestant can therefore double his probability of being correct by switching. The
posterior probability of A2 is 2/3 while that of A1 remains 1/3.
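The Bayes argument can be cross-checked by simulation. A Monte Carlo sketch of our own (not the slide's derivation; names are ours):

```python
import random

# Simulate the game: the host always opens an unchosen door hiding sweets.
def play(switch, rng):
    car = rng.randrange(3)
    choice = rng.randrange(3)
    opened = next(d for d in range(3) if d != choice and d != car)
    if switch:
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == car

rng = random.Random(0)
n = 100_000
print(sum(play(True, rng) for _ in range(n)) / n)   # ≈ 2/3
print(sum(play(False, rng) for _ in range(n)) / n)  # ≈ 1/3
```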

Bayes Rule: The Sally Clark case


Sally Clark was a British solicitor who became the victim of one of the great miscarriages
of justice in modern British legal history.
Her first son died within a few weeks of his birth in 1996 and her second died
similarly in 1998, after which she was arrested and tried for their murder.
A well-known paediatrician, Professor Sir Roy Meadow, testified that the chance of two
children from an affluent family suffering sudden infant death syndrome was 1 in 73 million,
a figure arrived at by squaring 1 in 8500, the likelihood of a single cot death in similar
circumstances.
Clark was convicted in November 1999. In 2001 the Royal Statistical Society issued a public
statement expressing its concern at the misuse of statistics in the courts and arguing that
there was no statistical basis for Meadow's claim.
In January 2003, she was released from prison, having served more than three years of her
sentence, after it emerged that the prosecutor's pathologist had failed to disclose
microbiological reports that suggested one of her sons had died of natural causes.
RSS statement excerpts:

In the recent highly-publicised case of R v. Sally Clark, a medical expert
witness drew on published studies to obtain a figure for the frequency of sudden infant death syndrome
(SIDS, or cot death) in families having some of the characteristics of the defendant's family. He went on
to square this figure to obtain a value of 1 in 73 million for the frequency of two cases of SIDS in such a
family. … This approach is, in general, statistically invalid. It would only be valid if SIDS cases arose
independently within families; … there are very strong a priori reasons for supposing that the assumption
will be false. There may well be unknown genetic or environmental factors that predispose families to SIDS,
so that a second case within the family becomes much more likely. The true frequency of families with two
cases of SIDS may be very much less incriminating than the figure presented to the jury at trial.
