
Contents

1. Introduction to Probability
   1.1 Definitions of Probability
   1.2 Problems on Chapter 1
2. Mathematical Probability Models
   2.1 Sample Spaces and Probability
   2.2 Problems on Chapter 2
3. Probability Counting Techniques
   3.1 Counting Arguments
   3.2 Review of Useful Series and Sums
   3.3 Problems on Chapter 3
4. Probability Rules and Conditional Probability
   4.1 General Methods
   4.2 Rules for Unions of Events
   4.3 Intersections of Events and Independence
   4.4 Conditional Probability
   4.5 Multiplication and Partition Rules
   4.6 Problems on Chapter 4
5. Discrete Random Variables and Probability Models
   5.1 Random Variables and Probability Functions
   5.2 Discrete Uniform Distribution
   5.3 Hypergeometric Distribution
   5.4 Binomial Distribution
   5.5 Negative Binomial Distribution
   5.6 Geometric Distribution
   5.7 Poisson Distribution from Binomial
   5.8 Poisson Distribution from Poisson Process
   5.9 Combining Other Models with the Poisson Process
   5.10 Summary of Single Variable Discrete Models
   5.11 Problems on Chapter 5
6. Computational Methods and R [1]
   6.1 Preliminaries
   6.2 Vectors
   6.3 Arithmetic Operations
   6.4 Some Basic Functions
   6.5 R Objects
   6.6 Graphs
   6.7 Distributions
   6.8 Problems on Chapter 6
7. Expected Value and Variance
   7.1 Summarizing Data on Random Variables
   7.2 Expectation of a Random Variable
   7.3 Some Applications of Expectation
   7.4 Means and Variances of Distributions
   7.5 Moment Generating Functions
   7.6 Problems on Chapter 7
8. Discrete Multivariate Distributions
   8.1 Basic Terminology and Techniques
   8.2 Multinomial Distribution
   8.3 Markov Chains
   8.4 Expectation for Multivariate Distributions: Covariance and Correlation
   8.5 Mean and Variance of a Linear Combination of Random Variables
   8.6 Multivariate Moment Generating Functions
   8.7 Problems on Chapter 8
9. Continuous Probability Distributions
   9.1 General Terminology and Notation
   9.2 Continuous Uniform Distribution
   9.3 Exponential Distribution
   9.4 A Method for Computer Generation of Random Variables
   9.5 Normal Distribution
   9.6 Use of the Normal Distribution in Approximations
   9.7 Problems on Chapter 9
Sample Final Exam
Solutions to Section Problems
Answers to End of Chapter Problems
Summary of Distributions
Probabilities For the Standard Normal Distribution N(0, 1)

[1] Sections 5.3, 5.5, 5.8, 5.9, Chapter 6, and Sections 7.5, 8.3, 8.6 and 9.4 are normally optional for Stat 220. Chapter 6 and Sections 8.6 and parts of Section 8.3 are often optional for Stat 230.

1. Introduction to Probability
1.1 Definitions of Probability

You are the product of a random universe. From the Big Bang to your own conception and birth, random events have determined who we are as a species, who you are as a person, and much of your experience to date. It is ironic, therefore, that we are not well tuned to understanding the randomness around us, perhaps because millions of years of evolution have cultivated our ability to see regularity, certainty and deterministic cause-and-effect in the events and environment about us. We are good at finding patterns in numbers and symbols, or relating the eating of certain plants with illness and others with a healthy meal. In many areas, such as mathematics or logic, we assume we know the results of certain processes with certainty (e.g., 2 + 3 = 5), though even these are often subject to assumed axioms. Most of the real world, however, from the biological sciences to quantum physics [2], involves variability and uncertainty. For example, it is uncertain whether it will rain tomorrow; the price of a given stock a week from today is uncertain; the number of claims that a car insurance policy holder will make over a one-year period is uncertain. Uncertainty or "randomness" (i.e. variability of results) is usually due to some mixture of at least two factors: (1) variability in populations consisting of animate or inanimate objects (e.g., people vary in size, weight, blood type etc.), and (2) variability in processes or phenomena (e.g., the random selection of 6 numbers from 49 in a lottery draw can lead to a very large number of different outcomes). Which of these would you use to describe the fluctuations in stock prices or currency exchange rates?

Variability and uncertainty in a system make it more difficult to plan or to make decisions without suitable tools. We cannot eliminate uncertainty, but it is usually possible to describe, quantify and deal with variability and uncertainty using the theory of probability. This course develops both the mathematical theory and some of the applications of probability. The applications of this methodology are far-reaching, from finance to the life sciences, from the analysis of computer algorithms to simulation of queues and networks or the spread of epidemics. Of course we do not have the time in this course to develop these applications in detail, but some of the end-of-chapter problems will give a hint of the extraordinary range of application of the mathematical theory of probability and statistics.

[2] "As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." Albert Einstein, 1921.

It seems logical to begin by defining probability. People have attempted to do this by giving definitions that reflect the uncertainty whether some specified outcome or "event" will occur in a given setting. The setting is often termed an "experiment" or "process" for the sake of discussion. We often consider simple "toy" examples: it is uncertain whether the number 2 will turn up when a 6-sided die is rolled. It is similarly uncertain whether the Canadian dollar will be higher tomorrow, relative to the U.S. dollar, than it is today. So one step in defining probability requires envisioning a random experiment with a number of possible outcomes. We refer to the set of all possible distinct outcomes to a random experiment as the sample space (usually denoted by $S$). Groups or sets of outcomes of possible interest, subsets of the sample space, we will call events. Then we might define probability in three different ways:

1. The classical definition: The probability of some event is
   $$\frac{\text{number of ways the event can occur}}{\text{number of outcomes in } S},$$
   provided all points in the sample space $S$ are equally likely. For example, when a die is rolled the probability of getting a 2 is $\frac{1}{6}$ because one of the six faces is a 2.

2. The relative frequency definition: The probability of an event is the (limiting) proportion (or fraction) of times the event occurs in a very long series of repetitions of an experiment or process. For example, this definition could be used to argue that the probability of getting a 2 from a rolled die is $\frac{1}{6}$.

3. The subjective probability definition: The probability of an event is a measure of how sure the person making the statement is that the event will happen. For example, after considering all available data, a weather forecaster might say that the probability of rain today is 30% or 0.3.

Unfortunately, all three of these definitions have serious limitations.

Classical Definition: What does "equally likely" mean? This appears to use the concept of probability while trying to define it! We could remove the phrase "provided all outcomes are equally likely", but then the definition would clearly be unusable in many settings where the outcomes in $S$ did not tend to occur equally often.

Relative Frequency Definition: Since we can never repeat an experiment or process indefinitely, we can never know the probability of any event from the relative frequency definition. In many cases we can't even obtain a long series of repetitions due to time, cost, or other limitations. For example, the probability of rain today can't really be obtained by the relative frequency definition since today can't be repeated again under identical conditions. Intuitively, however, if a probability is correct, we expect it to be close to the relative frequency when the experiment is repeated many times.
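
The intuition in the last sentence can be explored by simulation. The following is only a sketch and is not part of the original notes: the function name `relative_frequency`, the fixed seed and the chosen numbers of rolls are illustrative assumptions, and Python's pseudo-random generator stands in for a physical die.

```python
import random


def relative_frequency(target, n_rolls, seed=0):
    """Fraction of n_rolls simulated fair-die rolls that show `target`."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls


# Relative frequency of rolling a 2 for increasingly long series of rolls.
for n in (100, 10_000, 200_000):
    print(n, relative_frequency(2, n))
```

For long series the printed fractions should settle near 1/6 = 0.1667, although no finite run determines the probability exactly, which is the limitation described above.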

Subjective Probability: This definition gives no rational basis for people to agree on a right answer, and thus would disqualify probability as an objective science. Are everyone's opinions equally valid, or should we only consult "experts"? There is some controversy about when, if ever, to use subjective probability except for personal decision-making, but it does play a part in a branch of Statistics that is often called "Bayesian Statistics". This will not be discussed in Stat 230, but it is a common and useful method for updating subjective probabilities with objective experimental results.

The difficulties in producing a satisfactory definition can be overcome by treating probability as a mathematical system defined by a set of axioms. We do not worry about the numerical values of probabilities until we consider a specific application. This is consistent with the way that other branches of mathematics are defined and then used in specific applications (e.g., the way calculus and real-valued functions are used to model and describe the physics of gravity and motion).

The mathematical approach that we will develop and use in the remaining chapters is based on the following description of a probability model:

• a sample space of all possible outcomes of a random experiment is defined
• a set of events, subsets of the sample space to which we can assign probabilities, is defined
• a mechanism for assigning probabilities (numbers between 0 and 1) to events is specified.

Of course in a given run of the random experiment, a particular event may or may not occur.

In order to understand the material in these notes, you may need to review your understanding of basic counting arguments and elementary set theory, as well as some of the important series that you have encountered in Calculus that provide a basis for some of the distributions discussed in these notes. In the next chapter, we begin a more mathematical description of probability theory.

1.2 Problems on Chapter 1
1.1 Try to think of examples of probabilities you have encountered which might have been obtained by each of the three "definitions".

1.2 Which definitions do you think could be used for obtaining the following probabilities?
    (a) You have a claim on your car insurance in the next year.
    (b) There is a meltdown at a nuclear power plant during the next 5 years.
    (c) A person's birthday is in April.

1.3 Give examples of how probability applies to each of the following areas.
    (a) Lottery draws
    (b) Auditing of expense items in a financial statement
    (c) Disease transmission (e.g. measles, tuberculosis, STDs)
    (d) Public opinion polls

1.4 Which of the following can be accurately described by a "deterministic" model, i.e. a model which does not require any concept of probability?
    (a) The position of a small particle in space
    (b) The velocity of an object dropped from the leaning tower of Pisa
    (c) The value of a stock which you purchased for $20 one month ago
    (d) The purchasing power of $20 CAN according to the Consumer Price Index in one month.

2. Mathematical Probability Models
2.1 Sample Spaces and Probability
Consider some phenomenon or process which is repeatable, at least in theory, and suppose that certain events or outcomes $A_1, A_2, A_3, \ldots$ are defined. We will often term the phenomenon or process an "experiment" and refer to a single repetition of the experiment as a "trial". The probability of an event $A$, denoted $P(A)$, is a number between 0 and 1. For probability to be a useful mathematical concept, it should possess some other properties. For example, if our experiment consists of tossing a coin with two sides, Head and Tail, then we might wish to consider the two events $A_1$ = "Head turns up" and $A_2$ = "Tail turns up". It does not make much sense to allow $P(A_1) = 0.6$ and $P(A_2) = 0.6$, so that $P(A_1) + P(A_2) > 1$. (Why is this so? Is there a fundamental reason or have we simply adopted 1 as a convenient scale?) To avoid this sort of thing we begin with the following definition.

Definition 1 A sample space $S$ is a set of distinct outcomes for an experiment or process, with the property that in a single trial, one and only one of these outcomes occurs.

The outcomes that make up the sample space may sometimes be called "sample points" or just "points" on occasion. A sample space is defined as part of the probability model in a given setting but it is not necessarily uniquely defined, as the following example shows.

Example: Roll a 6-sided die, and define the events
    $a_i$ = top face is $i$, for $i = 1, 2, 3, 4, 5, 6$.
Then we could take the sample space as $S = \{a_1, a_2, a_3, a_4, a_5, a_6\}$. (Note we use the curly brackets "{...}" to indicate the elements of a set.) Instead of using this definition of the sample space we could define the events
    $E$ is the event that an even number turns up
    $O$ is the event that an odd number turns up
and take $S = \{E, O\}$. Both sample spaces satisfy the definition. Which one we use would depend on what we wanted to use the probability model for. If we expect never to have to consider events like "a number less than 3 turns up" then the space $S = \{E, O\}$ will suffice, but in most cases, if possible, we choose sample points that are the smallest possible or "indivisible". Thus the first sample space is likely preferred in this example.

Sample spaces may be either discrete or non-discrete; $S$ is discrete if it consists of a finite or countably infinite set of simple events. Recall that a countably infinite sequence is one that can be put in one-one correspondence with the positive integers, so for example $\{\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots\}$ is countably infinite, as is the set of all rational numbers. The two sample spaces in the preceding example are discrete. A sample space $S = \{1, 2, 3, \ldots\}$ consisting of all the positive integers is discrete, but a sample space $S = \{x : x > 0\}$ consisting of all positive real numbers is not. For the next few chapters we consider only discrete sample spaces. For discrete sample spaces it is much easier to specify the class of events to which we may wish to assign probabilities; we will allow all possible subsets of the sample space. For example, if $S = \{a_1, a_2, a_3, a_4, a_5, a_6\}$ is the sample space then $A = \{a_1, a_2, a_3, a_4\}$ and $B = \{a_6\}$ and $S$ itself are all examples of events.
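
To make "all possible subsets of the sample space" concrete, here is a small Python sketch that lists every subset of the six-point die sample space. It is an illustration only: the helper name `all_events` and the string labels for the points are assumptions, not notation from these notes. The list also contains the empty subset, since it is generated as the subset of size 0.

```python
from itertools import chain, combinations


def all_events(sample_space):
    """Return every subset of a finite sample space as a frozenset."""
    points = list(sample_space)
    subsets = chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]


S = ["a1", "a2", "a3", "a4", "a5", "a6"]
events = all_events(S)
print(len(events))  # 2**6 = 64 subsets in total

A = frozenset({"a1", "a2", "a3", "a4"})
B = frozenset({"a6"})
print(A in events, B in events, frozenset(S) in events)  # True True True
```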

Definition 2 An event in a discrete sample space is a subset $A \subseteq S$. If the event is indivisible so it contains only one point, e.g. $A_1 = \{a_1\}$, we call it a simple event. An event made up of two or more simple events such as $A = \{a_1, a_2\}$ is called a compound event.

Our notation will often not distinguish between the point $a_i$ and the simple event $A_i = \{a_i\}$ which has this point as its only element, although they differ as mathematical objects. When we mean the probability of the event $A_1 = \{a_1\}$, we should write $P(A_1)$ or $P(\{a_1\})$, but the latter is often shortened to $P(a_1)$. In the case of a discrete sample space it is easy to specify probabilities of events since they are determined by the probabilities of simple events.

Definition 3 Let $S = \{a_1, a_2, a_3, \ldots\}$ be a discrete sample space. Then probabilities $P(a_i)$ are numbers attached to the $a_i$'s ($i = 1, 2, 3, \ldots$) such that the following two conditions hold:
    (1) $0 \le P(a_i) \le 1$
    (2) $\sum_i P(a_i) = 1$

The above function $P(\cdot)$ on $S$ which describes the set of probabilities $\{P(a_i), i = 1, 2, \ldots\}$ is called a probability distribution on $S$. The condition $\sum_i P(a_i) = 1$ above reflects the idea that when the process or experiment happens, one or other of the simple events $\{a_i\}$ in $S$ must occur (recall that the sample space includes all possible outcomes). The probability of a more general event $A$ (not necessarily a simple event) is then defined as follows:

Definition 4 The probability $P(A)$ of an event $A$ is the sum of the probabilities for all the simple events that make up $A$, or $P(A) = \sum_{a \in A} P(a)$.

For example, the probability of the compound event $A = \{a_1, a_2, a_3\}$ is $P(a_1) + P(a_2) + P(a_3)$. Probability theory does not say what numbers to assign to the simple events for a given application, only those properties guaranteeing mathematical consistency. In an actual application of a probability model, we try to specify numerical values of the probabilities that are more or less consistent with the frequencies of events when the experiment is repeated. In other words we try to specify probabilities that are consistent with the real world. There is nothing mathematically wrong with a probability model for a toss of a coin that specifies that the probability of heads is zero, except that it likely won't agree with the frequencies we obtain when the experiment is repeated.

Example: Suppose a 6-sided die is rolled, and let the sample space be $S = \{1, 2, 3, 4, 5, 6\}$, where 1 means the top face is 1, and so on. If the die is an ordinary one (a fair die) we would likely define probabilities as
    $P(i) = 1/6$ for $i = 1, 2, 3, 4, 5, 6$,    (2.1)
because if the die were tossed repeatedly by a fair roller (as in some games or gambling situations) then each number would occur close to $1/6$ of the time. However, if the die were weighted in some way, or if the roller were able to manipulate the die so that 1 is more likely, these numerical values would not be so useful. To have a useful mathematical model, some degree of compromise or approximation is usually required. Is it likely that the die or the roller are perfectly "fair"? Given (2.1), if we wish to consider some compound event, the probability is easily obtained. For example, if $A$ = "even number obtains" then because $A = \{2, 4, 6\}$ we get $P(A) = P(2) + P(4) + P(6) = 1/2$.
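
The calculation just made can be written out as a short Python sketch. It is only an illustration of Definitions 3 and 4 under the fair-die assignment (2.1); the names `is_distribution` and `prob` are hypothetical helpers, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Probability distribution (2.1): each face of a fair die has probability 1/6.
P = {i: Fraction(1, 6) for i in range(1, 7)}


def is_distribution(p):
    """Check the two conditions of Definition 3 on a finite sample space."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1


def prob(event, p):
    """Definition 4: add the probabilities of the simple events in `event`."""
    return sum(p[a] for a in event)


A = {2, 4, 6}  # the compound event "even number obtains"
print(is_distribution(P))  # True
print(prob(A, P))          # 1/2
```

Replacing the values in `P` with other numbers that still satisfy Definition 3 (for a weighted die, say) changes the answer but not the method.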

We now consider some additional examples, starting with some simple "toy" problems involving cards, coins and dice. Once again, to calculate probability for discrete sample spaces, we usually approach a given problem using three steps:

    (1) Specify a sample space $S$.
    (2) Assign numerical probabilities to the simple events in $S$.
    (3) For any compound event $A$, find $P(A)$ by adding the probabilities of all the simple events that make up $A$.

Later we will discover that having a detailed specification or list of the elements of the sample space may be difficult. Indeed in many cases the sample space is so large that at best we can describe it in words. For the present we will solve problems that are stated as "Find the probability that ..." by carrying out step (2) above, assigning probabilities that we expect should reflect the long run relative frequencies of the simple events in repeated trials, and then summing these probabilities to obtain $P(A)$.

Some Examples

When $S$ has only a few points, one of the easiest methods for finding the probability of an event is to list all outcomes. In many problems a sample space $S$ with equally probable simple events can be used, and the first few examples are of this type.

Example: Draw 1 card from a standard well-shuffled deck (13 cards of each of 4 suits - spades, hearts, diamonds, clubs). Find the probability the card is a club.

Solution 1: Let $S$ = {spade, heart, diamond, club}. Then $S$ has 4 points, with 1 of them being "club", so $P(\text{club}) = \frac{1}{4}$.

Solution 2: Let $S = \{2\spadesuit, 3\spadesuit, 4\spadesuit, \ldots, \text{A}\spadesuit, 2\heartsuit, \ldots, \text{A}\clubsuit\}$. Then each of the 52 cards in $S$ has probability $\frac{1}{52}$. The event of interest is
    $A = \{2\clubsuit, 3\clubsuit, \ldots, \text{A}\clubsuit\}$
and this event has 13 simple outcomes in it, all with the same probability $\frac{1}{52}$. Therefore
    $P(A) = \frac{1}{52} + \frac{1}{52} + \cdots + \frac{1}{52} = \frac{13}{52} = \frac{1}{4}$.

Note 1: A sample space is not necessarily unique, as mentioned earlier. The two solutions illustrate this. Note that in the first solution the event $A$ = "the card is a club" is a simple event because of the way the sample space was defined, but in the second it is a compound event.

Note 2: In solving the problem we have assumed that each simple event in $S$ is equally probable. For example in Solution 1 each simple event has probability $1/4$. This seems to be the only sensible choice of numerical value in this setting, but you will encounter problems later on where it is not obvious whether outcomes all are equiprobable.
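
Solution 2 of the card example can be mirrored by listing the 52-point sample space explicitly. The sketch below is illustrative only: the rank and suit labels are assumed names for the cards, and equal probability for every card is the same modelling assumption made in Note 2.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["spades", "hearts", "diamonds", "clubs"]

# Step (1): the sample space, one point per card.
S = list(product(ranks, suits))

# Step (2): every simple event gets the same probability 1/52.
p_card = Fraction(1, len(S))

# Step (3): add the probabilities of the points that make up the event.
A = [card for card in S if card[1] == "clubs"]
p_club = sum(p_card for _ in A)

print(len(S), len(A), p_club)  # 52 13 1/4
```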

Figure 2.1: 9 tosses of two coins each

The term "odds" is sometimes used in describing probabilities. In this card example the odds in favour of clubs are 1:3; we could also say the odds against clubs are 3:1. In general,

Definition 5 The odds in favour of an event $A$ is the probability the event occurs divided by the probability it does not occur, or $\frac{P(A)}{1 - P(A)}$. The odds against the event is the reciprocal of this, $\frac{1 - P(A)}{P(A)}$.

If the odds against a given horse winning a race are 20 to 1 (or 20:1), what is the corresponding probability that the horse will win the race? According to the definition above, $\frac{1 - P(A)}{P(A)} = 20$, which gives $P(A) = \frac{1}{21}$. Note that these odds are derived from bettors' collective opinion and are therefore subjective.
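
Definition 5 and the horse-race calculation translate directly into code. The two function names below are hypothetical, chosen for this sketch; the second one simply solves $(1 - p)/p = \text{odds}$ for $p$, giving $p = 1/(1 + \text{odds})$.

```python
from fractions import Fraction


def odds_in_favour(p):
    """Odds in favour of an event with probability p (requires p < 1)."""
    return p / (1 - p)


def prob_from_odds_against(odds):
    """Probability of an event whose odds against are `odds` to 1."""
    # (1 - p) / p = odds  rearranges to  p = 1 / (1 + odds)
    return 1 / (1 + Fraction(odds))


print(odds_in_favour(Fraction(1, 4)))  # 1/3, i.e. odds of 1:3 in favour of clubs
print(prob_from_odds_against(20))      # 1/21, the horse in the example
```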

Example: Toss a coin twice. Find the probability of getting one head. (In this course, "one head" is taken to mean exactly one head. If we meant "at least one head" we would say so.)

Solution 1: Let $S = \{HH, HT, TH, TT\}$ and assume the simple events each have probability $\frac{1}{4}$. (Here, the notation $HT$ means head on the 1st toss and tails on the 2nd.) Since one head occurs for simple events $HT$ and $TH$, the event of interest is $A = \{HT, TH\}$ and we get $P(A) = \frac{1}{4} + \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$.

Solution 2: Let $S$ = {0 heads, 1 head, 2 heads} and assume the simple events each have probability $\frac{1}{3}$. Then $P(\text{1 head}) = \frac{1}{3}$.

Which solution is right? Both are mathematically correct in the sense that they are both consequences of probability models. However, we want a solution that reflects the relative frequency of occurrence in repeated trials in the real world, not just one that agrees with some mathematical model. In that respect, the points in Solution 2 are not equally likely. The event {1 head} occurs more often than either {0 heads} or {2 heads} in actual repeated trials. You can experiment to verify this (for example, of the nine replications of the experiment in Figure 2.1, 2 heads occurred 2 of the nine times and 1 head occurred 6 of the nine times). For more certainty you should replicate this experiment many times. You can do this without benefit of coin at http://shazam.econ.ubc.ca/flip/index.html. So we say Solution 2 is incorrect for ordinary fair coins because it is based on an incorrect model. If we were determined to use the sample space in Solution 2, we could do it by assigning appropriate probabilities to each of the three simple events, but then 0 heads would need to have a probability of $\frac{1}{4}$, 1 head a probability of $\frac{1}{2}$ and 2 heads $\frac{1}{4}$. We do not usually do this because there seems little point in using a sample space whose points are not equally probable when one with equally probable points is readily available.
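
The probabilities 1/4, 1/2 and 1/4 quoted for the coarser sample space can be recovered by starting from the four equally likely points of Solution 1 and grouping them by number of heads. This is a sketch under that fair-coin assumption; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of Solution 1: HH, HT, TH, TT, each with probability 1/4.
S = ["".join(tosses) for tosses in product("HT", repeat=2)]
p_point = Fraction(1, len(S))

# Group the points by number of heads to get the points of Solution 2.
heads = Counter()
for outcome in S:
    heads[outcome.count("H")] += p_point

for k in sorted(heads):
    print(k, "heads:", heads[k])
# 0 heads: 1/4
# 1 heads: 1/2
# 2 heads: 1/4
```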

Example: Roll a red die and a green die. Find the probability the total is 5.

Solution: Let $(x, y)$ represent getting $x$ on the red die and $y$ on the green die. Then, with these as simple events, the sample space is

$$
S = \left\{\begin{array}{ccccc}
(1,1) & (1,2) & (1,3) & \cdots & (1,6) \\
(2,1) & (2,2) & (2,3) & \cdots & (2,6) \\
(3,1) & (3,2) & (3,3) & \cdots & (3,6) \\
\vdots & \vdots & \vdots & & \vdots \\
(6,1) & (6,2) & (6,3) & \cdots & (6,6)
\end{array}\right\}
$$

Each simple event, for example $\{(1, 1)\}$, is assigned probability $\frac{1}{36}$. Then the event of interest is the event that the total is 5, $A = \{(1, 4), (2, 3), (3, 2), (4, 1)\}$. Therefore $P(A) = \frac{4}{36}$.
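
The 36-point sample space and the event "total is 5" can be generated rather than written out by hand. The following sketch assumes, as in the solution above, that all 36 ordered pairs are equally likely; note that Python prints the fraction 4/36 in lowest terms.

```python
from fractions import Fraction
from itertools import product

# Ordered pairs (x, y): x on the red die, y on the green die.
S = list(product(range(1, 7), repeat=2))

# The event that the total is 5.
A = [(x, y) for (x, y) in S if x + y == 5]

print(len(S))                    # 36
print(A)                         # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(Fraction(len(A), len(S)))  # 1/9, which equals 4/36
```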

Example: Suppose the 2 dice were now identical red dice or equivalently that the observer is colour-blind. Find the probability the total is 5.

Solution 1: Since we can no longer distinguish between $(x, y)$ and $(y, x)$, the only distinguishable points in $S$ are:

$$
S = \left\{\begin{array}{ccccc}
(1,1) & (1,2) & (1,3) & \cdots & (1,6) \\
      & (2,2) & (2,3) & \cdots & (2,6) \\
      &       & (3,3) & \cdots & (3,6) \\
      &       &       & \ddots & \vdots \\
      &       &       &        & (6,6)
\end{array}\right\}
$$

Using this sample space, we get a total of 5 from points $(1, 4)$ and $(2, 3)$ only. If we assign equal probability $\frac{1}{21}$ to each point (simple event) then we get $P(\text{total is 5}) = \frac{2}{21}$.

At this point you should be suspicious since $\frac{2}{21} \ne \frac{4}{36}$. The colour of the dice shouldn't have any effect on what total we get. The universe does not change the frequency of real physical events depending on whether the observer is colour-blind or not, so one answer must be wrong! The problem is that the 21 points in $S$ here are not equally likely. There was nothing theoretically wrong with the probability model except that if this experiment is repeated in the real world, the point $(1, 2)$ occurs about twice as often in the long run as the point $(1, 1)$. So the only sensible way to use this sample space consistent with the real world is to assign probability weights $\frac{1}{36}$ to the points of the form $(x, x)$ and $\frac{2}{36}$ to the points $(x, y)$ for $x \ne y$. We can compare these probabilities with experimental evidence. On the website http://www.math.duke.edu/education/postcalc/probability/dice/index.html you may throw virtual dice up to 10,000 times and record the results. For example on 1000 throws of two dice (see Figure 2.2), there were 121 occasions when the sum of the values on the dice was 5, indicating the probability is around 121/1000 or 0.121. This compares with the true probability $4/36 = 0.111$.

Figure 2.2: Results of 1000 throws of 2 dice

Solution 2: For a more straightforward solution to the above problem, pretend the dice can be distinguished even though they can't. (Imagine, for example, that we put a tiny mark on one die, or label one of them differently.) We then get the same 36 sample points as in the example with the red die and the green die. The fact that one die has a tiny mark cannot change the probabilities, so that
    $P(\text{total is 5}) = \frac{4}{36}$.
The laws determining the probabilities associated with these two dice do not, of course, know whether your eyesight is so keen that you can or cannot distinguish the dice. These probabilities must be the same in either case. In many problems when objects are indistinguishable and we are interested in calculating a probability, you will discover that the calculation is made easier by pretending the objects can be distinguished.

This illustrates a common pitfall. When treating objects in an experiment as distinguishable leads to a different answer from treating them as identical, the points in the sample space for identical objects are usually not "equally likely" in terms of their long run relative frequencies. It is generally safer to pretend objects can be distinguished even when they can't be, in order to get equally likely sample points.
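
The weights 1/36 and 2/36 for the 21 indistinguishable points can be derived mechanically by collapsing the 36 equally likely ordered pairs. The sketch below does this; representing an unordered pair as a sorted tuple is an implementation choice made for this illustration, not notation from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Collapse each ordered pair (x, y) onto the unordered pair it represents,
# carrying along its probability 1/36.
weights = Counter()
for x, y in product(range(1, 7), repeat=2):
    weights[tuple(sorted((x, y)))] += Fraction(1, 36)

print(len(weights))                        # 21 distinguishable points
print(weights[(1, 1)], weights[(1, 2)])    # 1/36 1/18, i.e. 1/36 and 2/36

# Probability that the total is 5, using the unequal weights.
p_total_5 = sum(w for (x, y), w in weights.items() if x + y == 5)
print(p_total_5)                           # 1/9, i.e. 4/36 as in Solution 2
```

With these weights the 21-point sample space gives the same answer as the 36-point one, which is what the argument above requires.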

While the method of finding probability by listing all the points in $S$ can be useful, it isn't practical when there are a lot of points to write out (e.g., if 3 dice were tossed there would be 216 points in $S$). We need to have more efficient ways of figuring out the number of outcomes in $S$ or in a compound event without having to list them all. Chapter 3 considers ways to do this, and then Chapter 4 develops other ways to manipulate and calculate probabilities.
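
The count of 216 points for three dice can be confirmed by letting a computer do the listing. This sketch treats the three dice as distinguishable, following the advice above; the event "total is 5" is chosen here purely as an illustration and is not an example from the notes.

```python
from fractions import Fraction
from itertools import product

# All ordered triples of faces for three distinguishable dice.
S = list(product(range(1, 7), repeat=3))
print(len(S))  # 216 = 6**3

# Illustrative compound event: the three faces total 5.
A = [point for point in S if sum(point) == 5]
print(len(A), Fraction(len(A), len(S)))  # 6 1/36
```

Brute-force listing of this kind stops being feasible quickly as the number of dice grows, which is why counting arguments are still needed.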

Although we often use "toy" problems involving things such as coins, dice and simple games for examples, probability is used to deal with a huge variety of practical problems from finance to clinical trials. In some settings, such as in questions 2.6 and 2.7 below, we need to rely on previous repetitions of an experiment, or on related scientific data, to assign numerical probabilities to events.

2.2 Problems on Chapter 2
2.1 Students in a particular program have the same 4 math profs. Two students in the program each independently ask one of their math professors [3] for a letter of reference. Assume each is equally likely to ask any of the math profs.
    a) List a sample space for this experiment.
    b) Use this sample space to find the probability both students ask the same prof.

    [3] "America believes in education: the average professor earns more money in a year than a professional athlete earns in a whole week." Evan Esar (1899 - 1995)

2.2 a) List a sample space for tossing a fair coin 3 times.
    b) What is the probability of 2 consecutive tails (but not 3)?

2.3 You wish to choose 2 different numbers without replacement (so the same number cannot be chosen twice) from {1, 2, 3, 4, 5}. List all possible pairs you could obtain, assume all pairs are equally probable, and find the probability the numbers chosen differ by 1 (i.e. the two numbers are consecutive).

2.4 Four letters addressed to individuals $W$, $X$, $Y$ and $Z$ are randomly placed in four addressed envelopes, one letter in each envelope.
    (a) List a 24-point sample space for this experiment. Be sure to explain your notation.
    (b) List the sample points belonging to each of the following events:
        $A$: $W$'s letter goes into the correct envelope;
        $B$: no letters go into the correct envelopes;
        $C$: exactly two letters go into the correct envelopes;
        $D$: exactly three letters go into the correct envelopes.
    (c) Assuming that the 24 sample points are equally probable, find the probabilities of the four events in (b).

2.5 (a) Three balls are placed at random in three boxes, with no restriction on the number of balls per box; list the 27 possible outcomes of this experiment. Be sure to explain your notation. Assuming that the outcomes are all equally probable, find the probability of each of the following events:
        $A$: the first box is empty;
        $B$: the first two boxes are empty;
        $C$: no box contains more than one ball.
    (b) Find the probabilities of events $A$, $B$ and $C$ when three balls are placed at random in $n$ boxes ($n \ge 3$).
    (c) Find the probabilities of events $A$, $B$ and $C$ when $r$ balls are placed in $n$ boxes ($n \ge r$).

2.6 Diagnostic Tests. Suppose that in a large population some persons have a specific disease at a given point in time. A person can be tested for the disease, but inexpensive tests are often imperfect, and may give either a false positive result (the person does not have the disease but the test says they do) or a false negative result (the person has the disease but the test says they do not).
    In a random sample of 1000 people, individuals with the disease were identified according to a completely accurate but expensive test, and also according to a less accurate but inexpensive test. The results for the less accurate test were that
    • 920 persons without the disease tested negative
    • 60 persons without the disease tested positive
    • 18 persons with the disease tested positive
    • 2 persons with the disease tested negative.
    (a) Estimate the fraction of the population that has the disease and tests positive using the inexpensive test.
    (b) Estimate the fraction of the population that has the disease.
    (c) Suppose that someone randomly selected from the same population as those tested above was administered the inexpensive test and it indicated positive. Based on the above information, how would you estimate the probability that they actually have the disease?

2.7 Machine Recognition of Handwritten Digits. Suppose that you have an optical scanner and associated software for determining which of the digits 0, 1, ..., 9 an individual has written in a square box. The system may of course be wrong sometimes, depending on the legibility of the handwritten number.
(a) Describe a sample space S that includes points (x, y), where x stands for the number actually written, and y stands for the number that the machine identifies.
(b) Suppose that the machine is asked to identify very large numbers of digits, of which 0, 1, ..., 9 occur equally often, and suppose that the following probabilities apply to the points in your sample space:
p(0, 6) = p(6, 0) = .004; p(0, 0) = p(6, 6) = .096
p(5, 9) = p(9, 5) = .005; p(5, 5) = p(9, 9) = .095
p(4, 7) = p(7, 4) = .002; p(4, 4) = p(7, 7) = .098
p(x, x) = .100 for x = 1, 2, 3, 8
Give a table with probabilities for each point (x, y) in S. What fraction of numbers is correctly identified?
2.8 [1] Anonymous professor X has an integer n ($1 \leq n \leq 9$) in mind and asks two students, Allen and Beth, to pick numbers between 1 and 9. Whichever is closer to n gets 90% and the other 80% in Stat 230. If they are equally close, they both get 85%. If the professor's number and that of Allen are chosen purely at random and Allen announces his number out loud, describe a sample space and a strategy which leads Beth to the highest possible mark.
2.9 [1] In questions 2.4-2.7, what can you say about how appropriate you think the probability model is for the experiment being modelled?
[1] Solution not in appendix.
3. Probability Counting Techniques
Some probability problems can be attacked by specifying a sample space $S = \{a_1, a_2, \ldots, a_n\}$ in which each simple event has probability $\frac{1}{n}$ (i.e. is "equally likely"). This is referred to as a uniform distribution over the set $\{a_1, a_2, \ldots, a_n\}$. If a compound event A contains r points, then $P(A) = \frac{r}{n}$. In other words, we need to be able to count the number of points in S which are in A. We review first some basic ways to count outcomes from "experiments".
3.1 Counting Arguments
There are two helpful rules for counting, phrased in terms of "jobs" which are to be done.
1. The Addition Rule: Suppose we can do job 1 in p ways and job 2 in q ways. Then we can do either job 1 OR job 2 (but not both), in p + q ways.
For example, suppose a class has 30 men and 25 women. There are 30 + 25 = 55 ways the prof. can pick one student to answer a question. If there are 5 vowels and 20 consonants on a list and I must pick one letter, this can be done in 5 + 20 ways.
2. The Multiplication Rule: Suppose we can do job 1 in p ways and, for each of these ways, we can do job 2 in q ways. Then we can do both job 1 AND job 2 in $p \times q$ ways.
For example, if there are 5 vowels and 20 consonants and I must choose one consonant followed by one vowel for a two-letter word, this can be done in $20 \times 5$ ways (there are 100 such words). To ride a bike, you must have the chain on both a front sprocket and a rear sprocket. For a 21 speed bike there are 3 ways to select the front sprocket and 7 ways to select the rear sprocket, i.e. $3 \times 7 = 21$ such combinations.
This interpretation of "OR" as addition and "AND" as multiplication, evident in the addition and multiplication rules above, will occur throughout probability, so it is helpful to make this association in your mind. Of course, questions do not always have an AND or an OR in them, and you may have to play around with re-wording the question to discover implied ANDs or ORs.
Example: Suppose we pick 2 numbers from the digits 1, 2, 3, 4, 5 with replacement. (Note: "with replacement" means that after the first number is picked it is "replaced" in the set of numbers, so it could be picked again as the second number.) Assume a uniform distribution on the sample space, i.e. that every pair of numbers has the same probability. Let us find the probability that exactly one number is even.
This can be reworded as: "The first number is even AND the second is odd (this can be done in $2 \times 3$ ways) OR the first is odd AND the second is even (done in $3 \times 2$ ways)." Since these are connected with the word OR, we combine them using the addition rule to calculate that there are $(2 \times 3) + (3 \times 2) = 12$ ways for this event to occur. Since the first number can be chosen in 5 ways AND the second in 5 ways, S contains $5 \times 5 = 25$ points, and since each point has the same probability, they all have probability $\frac{1}{25}$.
Therefore $P(\text{one number is even}) = \frac{12}{25}$.
When objects are selected and replaced after each draw, the addition and multiplication rules are generally sufficient to find probabilities. When objects are drawn without being replaced, some special rules may simplify the solution.
Note: The phrases "at random" or "uniformly" are often used to mean that all of the points in the sample space are equally likely, so that in the above problem every possible pair of numbers chosen from this set has the same probability $\frac{1}{25}$.
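The count in the example above is small enough to check by listing every outcome. The following is a minimal sketch, not part of the original notes; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

digits = [1, 2, 3, 4, 5]
# Sample space: all ordered pairs drawn with replacement (5 x 5 = 25 points).
sample_space = list(product(digits, repeat=2))
# Event: exactly one of the two numbers is even.
event = [(a, b) for a, b in sample_space if (a % 2 == 0) != (b % 2 == 0)]
p_one_even = Fraction(len(event), len(sample_space))
print(len(event), len(sample_space), p_one_even)  # 12 25 12/25
```

Enumeration like this only works for small sample spaces, which is why the counting rules matter.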
Problems:
3.1.1 (a) A course has 4 sections with no limit on how many can enrol in each section. Three students each pick a section at random.
(i) Specify the sample space S.
(ii) Find the probability they all end up in the same section.
(iii) Find the probability they all end up in different sections.
(iv) Find the probability nobody picks section 1.
(b) Repeat (a) in the case when there are n sections and s students ($n \geq s$).
3.1.2 Canadian postal codes consist of 3 letters (of 26 possible letters) alternated with 3 digits (of the 10 possible), starting with a letter (e.g. N2L 3G1). Assume no other restrictions on the construction of postal codes. For a postal code chosen at random, what is the probability:
(a) all 3 letters are the same?
(b) the digits are all even or all odd? Treat 0 as being neither even nor odd.
3.1.3 Suppose a password has to contain between six and eight digits, with each digit either a letter or a number from 1 to 9. There must be at least one number present.
(a) What is the total number of possible passwords?
(b) If you started to try passwords in random order, what is the probability you would find the correct password for a given situation within the first 1,000 passwords you tried?
We have already discussed a special class of discrete probability models, the uniform model, in which all of the outcomes have the same probability. In such a model, we can calculate the probability of any event A by counting the number of outcomes in the event A,
$$P(A) = \frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in } S}$$
Here we look at some formal counting methods to help calculate probabilities in uniform models.
Counting Arrangements: In many problems, the sample space is a set of arrangements or sequences. These are classically called permutations. A key step in the argument is to be sure to understand what it is you are counting. It is helpful to invent a notation for the outcomes in the sample space and the events of interest (these are the objects you are counting).
Example. Suppose the letters a, b, c, d, e, f are arranged at random to form a six-letter word (an arrangement); we must use each letter once only. The sample space
$$S = \{abcdef, abcdfe, \ldots, fedcba\}$$
has a large number of outcomes and, because we formed the word at random, we assign the same probability to each. To count the number of words in S, count the number of ways that we can construct such a word; each way corresponds to a unique word. Consider filling six boxes corresponding to the six positions in the arrangement.
We can fill the first box in 6 ways with any one of the letters. For each of these choices, we can fill the second box in 5 ways with any one of the remaining letters. Thus there are $6 \times 5 = 30$ ways to fill the first two boxes. (If you are not convinced by this argument, list all the possible ways that the first two boxes can be filled.)
For each of these 30 choices, we can fill the third box in 4 ways using any one of the remaining letters, so there are $6 \times 5 \times 4 = 120$ ways to fill the first three boxes. Applying the same reasoning, we see that there are $6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$ ways to fill the 6 boxes and hence 720 equally probable words in S.
Now consider events such as A: the second letter is e or f, so A = {afbcde, aebcdf, ..., efdcba}. We can count the number of outcomes in A using a similar argument if we start with the second box.
We can fill the second box in 2 ways, i.e. with an e or f. For each of these choices, we can then fill the first box in 5 ways, so now we can fill the first two boxes in $2 \times 5 = 10$ ways. For each of these choices, we can fill the remaining four boxes in $4 \times 3 \times 2 \times 1 = 24$ ways, so the number of outcomes in A is $10 \times 24 = 240$. Since we have a uniform probability model, we have
$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S} = \frac{240}{720} = \frac{1}{3}.$$
In determining the number of outcomes in A, it is important that we start with the second box. Suppose, instead, we start by saying there are 6 ways to fill the first box. Now the number of ways of filling the second box depends on what happened in the first. If we used e or f in the first box, there is only one way to fill the second. If we used a, b, c or d for the first box, there are 2 ways of filling the second. We avoid this complication by starting with the second box.
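As a check on the box-filling argument, all 720 words can be generated and the event counted directly. This is a small illustrative sketch (the names `words` and `second_is_e_or_f` are made up here), not a method from the notes.

```python
from fractions import Fraction
from itertools import permutations

# Every arrangement of the six letters, each used exactly once.
words = ["".join(p) for p in permutations("abcdef")]
# Event: the second letter (index 1) is e or f.
second_is_e_or_f = [w for w in words if w[1] in "ef"]
p_event = Fraction(len(second_is_e_or_f), len(words))
print(len(words), len(second_is_e_or_f), p_event)  # 720 240 1/3
```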
We can generalize the above problem in several ways. In each case we count the number of arrangements by counting the number of ways we can fill the positions in the arrangement. Suppose we start with n symbols. Then we can make
- $n \times (n-1) \times \cdots \times 1$ arrangements of length n using each symbol once and only once. This product is denoted by $n!$ (read "n factorial"). Note that $n! = n \times (n-1)!$.
- $n \times (n-1) \times \cdots \times (n-k+1)$ arrangements of length k using each symbol at most once. This product is denoted by $n^{(k)}$ (read "n to k factors"). Note that $n^{(k)} = \frac{n!}{(n-k)!}$.
- $n \times n \times \cdots \times n = n^k$ arrangements of length k using each symbol as often as we wish.
The terms above, especially the factorial $n!$, grow at an extraordinary rate as a function of n. For example (we will discuss 0! shortly),
n    0  1  2  3  4   5    6    7     8      9       10
n!   1  1  2  6  24  120  720  5040  40320  362880  3628800
There is an approximation to $n!$ called Stirling's formula which is often used for large n. First, what would it mean for two sequences of numbers which are growing very quickly to be asymptotically equal? Suppose we wish to approximate one sequence $a_n$ with another sequence $b_n$ and want the percentage error of the approximation to approach zero as n grows. This is equivalent to saying $a_n / b_n \to 1$ as $n \to \infty$, and under these circumstances we will call the two sequences asymptotically equivalent. Stirling's approximation says that $n!$ is asymptotically equivalent to $n^n e^{-n} \sqrt{2\pi n}$. The error in Stirling's approximation is roughly 1% when n = 8, falls below 1% for $n \geq 9$, and becomes very small quite quickly as n increases.
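To see how quickly Stirling's approximation improves, one can compare it with the exact factorial for a few values of n. The sketch below is illustrative only; the helper names `stirling` and `relative_error` are assumptions, not notation from the notes. The relative error behaves roughly like $1/(12n)$.

```python
import math

def stirling(n):
    """Stirling's approximation n^n e^(-n) sqrt(2 pi n)."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

def relative_error(n):
    """Relative error (n! - approximation) / n!."""
    exact = math.factorial(n)
    return (exact - stirling(n)) / exact

for n in (1, 2, 5, 8, 9, 10, 20, 50):
    print(n, f"{relative_error(n):.4%}")
```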
For many problems involving sampling from a deck of cards or a reasonably large population, counting the number of cases by simple conventional means is virtually impossible, and we need the counting arguments dealt with here. The extraordinarily large size of populations, in part due to the large size of quantities like $n^n$ and $n!$, is part of the reason that statistics, sampling, counting methods and probability calculations play such an important part in modern science and business.
Example. A pin number of length 4 is formed by randomly selecting (with replacement) 4 digits from the set {0, 1, 2, ..., 9}. Find the probability of the events:
A: the pin number is even
B: the pin number has only even digits
C: all of the digits are unique
D: the pin number contains at least one 1.
Since we pick the digits with replacement, the outcomes in the sample space can have repeated digits.
The sample space is S = {0000, 0001, ..., 9999} with $10^4$ equally probable outcomes. For the event A = {0000, 0002, ..., 9998}, we can select the last digit to be any one of 0, 2, 4, 6, 8 in 5 ways. Then for each of these choices, we can select the first digit in 10 ways and so on. There are $5 \times 10^3$ outcomes in A and
$$P(A) = \frac{5 \times 10^3}{10^4} = \frac{1}{2}.$$
The event B = {0000, 0002, ..., 8888}. We can select the first digit in 5 ways, and for each of these choices, the second in 5 ways, and so on. There are $5^4$ outcomes in B and
$$P(B) = \frac{5^4}{10^4} = \frac{1}{16}.$$
The event C = {0123, 0124, ..., 9876}. We can select the first digit in 10 ways and for each of these choices, the second in 9 ways and so on. There are $10 \times 9 \times 8 \times 7$ outcomes in C and so
$$P(C) = \frac{10 \times 9 \times 8 \times 7}{10^4} = \frac{63}{125}.$$
The event D = {0001, 0011, 0111, 1111, ...}. To count the number of outcomes, consider the complement of D, or the set of all outcomes in S but not in D. We denote this event $\bar{D}$ = {0000, 0002, ..., 9999}. There are $9^4$ outcomes in $\bar{D}$ and so there are $10^4 - 9^4$ outcomes in D and
$$P(D) = \frac{10^4 - 9^4}{10^4} = \frac{3439}{10000}.$$
For a general event A, the complement of A, denoted $\bar{A}$, is the set of all outcomes in S which are not in A. It is often easier to count outcomes in the complement rather than in the event itself.
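The four probabilities in the pin-number example (digits chosen with replacement) can be confirmed by enumerating all $10^4$ outcomes. This is a sketch under the example's uniform model; the helper `prob` and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All pin numbers of length 4 with repetition allowed: 10^4 outcomes.
pins = list(product(range(10), repeat=4))

def prob(event):
    """Probability of an event (a predicate on a pin) under the uniform model."""
    return Fraction(sum(1 for pin in pins if event(pin)), len(pins))

p_even = prob(lambda pin: pin[-1] % 2 == 0)                  # last digit even
p_all_even = prob(lambda pin: all(d % 2 == 0 for d in pin))  # only even digits
p_unique = prob(lambda pin: len(set(pin)) == 4)              # no repeated digit
p_has_one = prob(lambda pin: 1 in pin)                       # at least one 1
print(p_even, p_all_even, p_unique, p_has_one)  # 1/2 1/16 63/125 3439/10000
```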
Example. A pin number of length 4 is formed by randomly selecting (without replacement) 4 digits from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Find the probability of the events:
A: the pin number is even.
B: the pin number has only even digits.
C: the pin number begins or ends with a 1.
D: the pin number contains 1.
The sample space is
$$S = \{0123, 0132, \ldots, 6789\}$$
with $10^{(4)}$ equally probable outcomes. For the event A = {1230, 0134, ..., 9876}, we can select the last digit to be any one of 0, 2, 4, 6, 8 in 5 ways. Then for each of these choices, we can select the first digit in 9 ways, the second in 8 ways and so on. There are $5 \times 9 \times 8 \times 7$ outcomes in A and
$$P(A) = \frac{5 \times 9 \times 8 \times 7}{10^{(4)}} = \frac{1}{2}.$$
The event B = {0246, 0248, ..., 8642}. The pin numbers in B are all $5^{(4)}$ arrangements of length 4 using only the even digits {0, 2, 4, 6, 8} and so
$$P(B) = \frac{5^{(4)}}{10^{(4)}} = \frac{5 \times 4 \times 3 \times 2}{10 \times 9 \times 8 \times 7} = \frac{1}{42}.$$
The event C = {1023, 0231, ..., 9871}. There are 2 positions for the 1. For each of these choices, we can fill the remaining three positions in $9^{(3)}$ ways and so
$$P(C) = \frac{2 \times 9^{(3)}}{10^{(4)}} = \frac{1}{5}.$$
The event D = {1234, 2134, ..., 9871}. We can use the complement and count the number of pin numbers that do not contain a 1. There are $9^{(4)}$ pin numbers that do not contain 1 and so there are $10^{(4)} - 9^{(4)}$ that do contain a 1. Therefore
$$P(D) = \frac{10^{(4)} - 9^{(4)}}{10^{(4)}} = 1 - \frac{9^{(4)}}{10^{(4)}} = \frac{2}{5}.$$
Note that this is $1 - P(\bar{D})$ where $\bar{D}$ is the complement of D.
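The same kind of check works when digits are drawn without replacement; the sample space is then the $10^{(4)} = 5040$ arrangements of four distinct digits. The following sketch uses illustrative names and is not taken from the notes.

```python
from fractions import Fraction
from itertools import permutations

# All pin numbers of length 4 with distinct digits: 10 * 9 * 8 * 7 outcomes.
pins = list(permutations(range(10), 4))

def prob(event):
    """Probability of an event (a predicate on a pin) under the uniform model."""
    return Fraction(sum(1 for pin in pins if event(pin)), len(pins))

p_even = prob(lambda pin: pin[-1] % 2 == 0)
p_all_even = prob(lambda pin: all(d % 2 == 0 for d in pin))
p_one_at_end = prob(lambda pin: pin[0] == 1 or pin[-1] == 1)
p_has_one = prob(lambda pin: 1 in pin)
print(len(pins), p_even, p_all_even, p_one_at_end, p_has_one)  # 5040 1/2 1/42 1/5 2/5
```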
Counting Subsets. In some problems, the outcomes in the sample space are subsets of a fixed size. Here we look at counting such subsets. Again, it is useful to write a short list of the subsets you are counting.
Example. Suppose we randomly select a subset of 3 digits from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} so that the sample space is
$$S = \{\{1, 2, 3\}, \{0, 1, 3\}, \{0, 1, 4\}, \ldots, \{7, 8, 9\}\}.$$
All the digits in each outcome are unique, i.e. we do not consider {1, 1, 2} to be an outcome in S. Also, the order of the elements in a subset is not relevant. This is true in general for sets; the subsets {1, 2, 3} and {3, 1, 2} are the same. To count the number of outcomes in S, we use what we have learned about counting arrangements. Suppose there are m such subsets. Using the elements of any subset of size 3, we can form 3! arrangements of length 3. For example, the subset {1, 2, 3} generates the 3! = 6 arrangements
123, 132, 213, 231, 312, 321
and any other subset generates a different 3! arrangements, so that the total number of arrangements of 3 digits taken without replacement from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} is $m \times 3!$. But we know the total number of arrangements is $10^{(3)}$, so $m \times 3! = 10^{(3)}$. Solving, we get
$$m = \frac{10^{(3)}}{3!} = 120.$$
Number of subsets of size k. We use the combinatorial symbol $\binom{n}{k}$ (read "n choose k") to denote the number of subsets of size k that can be selected from a set of n objects. By an argument similar to that above, if m denotes the number of subsets of size k that can be selected from n things, then $m \times k! = n^{(k)}$ and so m is equal to
$$\binom{n}{k} = \frac{n^{(k)}}{k!}.$$
In the example, since we selected the subset at random, each of the 120 subsets has the same probability 1/120. Now find the probability of the following events.
A: the digit 1 is included in the selected subset
B: all the digits in the selected subset are even
C: at least one of the digits in the selected subset is less than or equal to 5
The event A: To count the outcomes, we must have 1 in the subset and we can select the other two elements from the remaining 9 digits in $\binom{9}{2}$ ways. And so
$$P(A) = \frac{\binom{9}{2}}{\binom{10}{3}} = \frac{9^{(2)}/2!}{10^{(3)}/3!} = \frac{3}{10}.$$
The event B = {{0, 2, 4}, {0, 2, 6}, ...}. We can form the outcomes in B by selecting 3 digits from the five even digits {0, 2, 4, 6, 8} in $\binom{5}{3}$ ways. And so
$$P(B) = \frac{\binom{5}{3}}{\binom{10}{3}}.$$
The event C = {{0, 1, 2}, {0, 1, 6}, {0, 6, 7}, ...}. Here it is convenient to consider the complement $\bar{C}$ in which the outcomes are {{6, 7, 8}, {6, 7, 9}, ...}, i.e. subsets with all elements greater than 5. We can form the subsets in $\bar{C}$ by selecting a subset of size 3 from the set {6, 7, 8, 9} in $\binom{4}{3}$ ways. Therefore the number of points in C is $\binom{10}{3} - \binom{4}{3}$ and its probability is
$$P(C) = \frac{\binom{10}{3} - \binom{4}{3}}{\binom{10}{3}} = 1 - \frac{\binom{4}{3}}{\binom{10}{3}} = 1 - P(\bar{C}).$$
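A short enumeration confirms the subset counts. In this sketch (illustrative names only) the third event is taken to be "at least one digit is at most 5", which is the event whose complement is counted by $\binom{4}{3}$ in the calculation above; that reading is an assumption.

```python
from fractions import Fraction
from itertools import combinations

# All subsets of size 3 from the digits 0..9 (order ignored): C(10,3) = 120.
subsets = list(combinations(range(10), 3))

def prob(event):
    """Probability of an event (a predicate on a subset) under the uniform model."""
    return Fraction(sum(1 for s in subsets if event(s)), len(subsets))

p_contains_1 = prob(lambda s: 1 in s)
p_all_even = prob(lambda s: all(d % 2 == 0 for d in s))
p_some_at_most_5 = prob(lambda s: min(s) <= 5)
print(len(subsets), p_contains_1, p_all_even, p_some_at_most_5)  # 120 3/10 1/12 29/30
```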
Example. Suppose a box contains 10 balls of which 3 are red, 4 are white and 3 are green. A sample of 4 balls is selected at random without replacement. Find the probability of the events
E: the sample contains 2 red balls
F: the sample contains 2 red, 1 white and 1 green ball
G: the sample contains 2 or more red balls
Imagine that we label the balls from 1 to 10 with labels 1, 2, 3 being red, 4, 5, 6, 7 being white and 8, 9, 10 being green. Construct a uniform probability model in which all subsets of size 4 are equally probable. The sample space is
$$S = \{\{1, 2, 3, 4\}, \{1, 2, 3, 5\}, \ldots, \{7, 8, 9, 10\}\}$$
and each outcome has probability $1/\binom{10}{4}$.
The event E: To count the number of outcomes in E, we can construct a subset with two red balls by first choosing the two red balls from the three in $\binom{3}{2}$ ways. For each of these choices we can select the other two balls from the seven non-red balls in $\binom{7}{2}$ ways, so there are $\binom{3}{2}\binom{7}{2}$ outcomes in E and
$$P(E) = \frac{\binom{3}{2}\binom{7}{2}}{\binom{10}{4}} = \frac{3}{10}.$$
The event F = {{1, 2, 4, 8}, {1, 2, 4, 9}, ...}: To count the number of outcomes in F, we can select the two red balls in $\binom{3}{2}$ ways, then the white ball in $\binom{4}{1}$ ways and the green ball in $\binom{3}{1}$ ways. So we have
$$P(F) = \frac{\binom{3}{2}\binom{4}{1}\binom{3}{1}}{\binom{10}{4}} = \frac{6}{35}.$$
The event G = {{1, 2, 3, 4}, {1, 2, 4, 5}, ...} has outcomes with both 2 and 3 red balls. We need to count these separately (see below). There are $\binom{3}{2}\binom{7}{2}$ outcomes with exactly two red balls and $\binom{3}{3}\binom{7}{1}$ outcomes with three red balls. Hence we have
$$P(G) = \frac{\binom{3}{2}\binom{7}{2} + \binom{3}{3}\binom{7}{1}}{\binom{10}{4}} = \frac{1}{3}.$$
A common mistake is to count the outcomes in G as follows. There are $\binom{3}{2}$ ways to select two red balls and then for each of these choices we can select the remaining two balls from the remaining eight in $\binom{8}{2}$ ways. So the number of outcomes in G is $\binom{3}{2} \times \binom{8}{2}$. You can easily check that this is greater than $\binom{3}{2}\binom{7}{2} + \binom{3}{3}\binom{7}{1}$. The reason for the error is that some of the outcomes in G have been counted more than once. For example, you might pick red balls 1, 2 and then other balls 3, 4 to get the subset {1, 2, 3, 4}. Or you may pick red balls 1, 3 and then other balls 2, 4 to get the subset {1, 3, 2, 4}. These are counted as two separate outcomes but they are in fact the same subset. To avoid this counting error, whenever you are asked about events defined in terms such as "at most ...", "more than ...", "fewer than ..." etc., break the events into pieces where each piece has outcomes with specific values, e.g. two red balls, three red balls.
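The double counting described above can be made concrete by listing all $\binom{10}{4} = 210$ samples and comparing the true count for "two or more red" with the flawed product $\binom{3}{2}\binom{8}{2}$. This is a minimal sketch with the ball labels used in the example; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

RED, WHITE, GREEN = {1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10}
# Every sample of 4 balls drawn without replacement, as a set of labels.
samples = [set(s) for s in combinations(range(1, 11), 4)]

def count(event):
    """Number of samples satisfying the predicate `event`."""
    return sum(1 for s in samples if event(s))

n_two_red = count(lambda s: len(s & RED) == 2)
n_two_red_one_white_one_green = count(
    lambda s: (len(s & RED), len(s & WHITE), len(s & GREEN)) == (2, 1, 1))
n_at_least_two_red = count(lambda s: len(s & RED) >= 2)

total = len(samples)
print(Fraction(n_two_red, total),
      Fraction(n_two_red_one_white_one_green, total),
      Fraction(n_at_least_two_red, total))               # 3/10 6/35 1/3
print(n_at_least_two_red, comb(3, 2) * comb(8, 2))       # 70 84 (the flawed count is too big)
```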
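Properties of $\binom{n}{k}$. You should be able to prove the following:
1. $n^{(k)} = \frac{n!}{(n-k)!} = n(n-1)^{(k-1)}$ for $k \geq 1$.
2. $\binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n^{(k)}}{k!}$
3. $\binom{n}{k} = \binom{n}{n-k}$ for all k = 0, 1, ..., n.
4. If we define 0! = 1, then the formulas above make sense for $\binom{n}{0} = \binom{n}{n} = 1$.
5. $(1 + x)^n = \binom{n}{0} + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{n}x^n$ (this is the binomial theorem).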
In many problems, we can combine counting arguments for arrangements and subsets as in the following example.
Example. A binary sequence is an arrangement of zeros and ones. Suppose we have a uniform probability model on the sample space of all binary sequences of length 10. What is the probability that the sequence has exactly 5 zeros?
The sample space is S = {0000000000, 0000000001, ..., 1111111111}. We can fill each of the 10 positions in the sequence in 2 ways and hence S has $2^{10}$ outcomes, each with probability $\frac{1}{2^{10}}$. The event E with exactly 5 zeros and 5 ones is E = {0000011111, 1000001111, ..., 1111100000}. To count the outcomes in E, think of constructing the sequence by filling 10 boxes, one for each position.
We can choose the 5 boxes for the zeros in $\binom{10}{5}$ ways and then the ones go in the remaining boxes in 1 way. Hence we have
$$P(E) = \frac{\binom{10}{5}}{2^{10}}.$$
Example. Suppose the letters of the word STATISTICS are arranged at random. Find the probability of the event G that the arrangement begins and ends with S. The sample space is S = {SSSTTTIIAC, SSSTTTIICA, ...}. Here we need to count arrangements when some of the elements are the same. We use the same idea as in the last example. We construct the arrangements by filling 10 boxes corresponding to the positions in the arrangement.
We can choose the three positions for the three Ss in $\binom{10}{3}$ ways. For each of these choices, we can choose the positions for the three Ts in $\binom{7}{3}$ ways. Then we can place the two Is in $\binom{4}{2}$ ways, then the C in $\binom{2}{1}$ ways and finally the A in $\binom{1}{1}$ ways. The number of equally probable outcomes in S is
$$\binom{10}{3}\binom{7}{3}\binom{4}{2}\binom{2}{1}\binom{1}{1} = \frac{10!}{3!\,7!} \cdot \frac{7!}{3!\,4!} \cdot \frac{4!}{2!\,2!} \cdot \frac{2!}{1!\,1!} \cdot \frac{1!}{1!\,0!} = \frac{10!}{3!\,3!\,2!\,1!\,1!}$$
The event G = {SSTTTIIACS, SSTTTIICAS, ...}: To count the outcomes in G we must have S in the first and last position:
S _ _ _ _ _ _ _ _ S
Now we can use the same technique to arrange the remaining 8 letters. Having placed two of the Ss, there remain 8 free boxes, in which we are to place three Ts in $\binom{8}{3}$ ways, two Is in $\binom{5}{2}$ ways, one C in $\binom{3}{1}$ ways, one A in $\binom{2}{1}$ ways and finally the remaining S in the last empty box in $\binom{1}{1}$ way. There are
$$\binom{8}{3}\binom{5}{2}\binom{3}{1}\binom{2}{1}\binom{1}{1} = \frac{8!}{3!\,2!\,1!\,1!\,1!} = 3360$$
elements in G and
$$P(G) = \frac{8!/(3!\,2!\,1!\,1!\,1!)}{10!/(3!\,3!\,2!\,1!\,1!)} = \frac{3360}{50400} = \frac{1}{15}.$$
Number of Arrangements when some symbols are alike: In general, if we have $n_i$ symbols of type i, i = 1, 2, ..., k with $n_1 + n_2 + \cdots + n_k = n$, then the number of arrangements using all of the symbols is
$$\binom{n}{n_1}\binom{n - n_1}{n_2}\binom{n - n_1 - n_2}{n_3} \cdots \binom{n_k}{n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}$$
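The formula $n!/(n_1!\,n_2!\cdots n_k!)$ is easy to evaluate for any word. The sketch below (the function name `arrangements` is an illustrative choice) reproduces the two counts from the STATISTICS example; fixing an S at each end leaves the eight letters T, A, T, I, S, T, I, C to arrange.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def arrangements(word):
    """Number of distinct arrangements of the letters of `word`: n!/(n1! n2! ... nk!)."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)  # exact: each partial quotient is an integer
    return count

total = arrangements("STATISTICS")      # all 10 letters
begin_end_s = arrangements("TATISTIC")  # the 8 letters left after fixing S at both ends
print(total, begin_end_s, Fraction(begin_end_s, total))  # 50400 3360 1/15
```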
Example. Suppose we make a random arrangement of length 3 using letters from the set {a, b, c, d, e, f, g, h, i, j}. What is the probability of the event B that the letters are in alphabetic order if
a) letters are selected without replacement
b) letters are selected with replacement
For (a), the sample space is {abc, bac, ..., hij} with $10^{(3)}$ equally probable outcomes.
The event B = {abc, abd, ..., hij}. To count the outcomes in B, we first select the three (different) letters to form the arrangement in $\binom{10}{3}$ ways. There is then 1 way to make an arrangement with the selected letters in alphabetic order. So we have
$$P(B) = \frac{\binom{10}{3}}{10^{(3)}} = \frac{1}{6}.$$
For (b), the sample space is {aaa, aab, abc, ...} with $10^3$ equally probable outcomes. To count the elements in B, consider the following cases.
Case 1: all three letters are the same. There are 10 such arrangements {aaa, bbb, ccc, ...}, all in alphabetic order.
Case 2: there are two different letters, e.g. {aab, aba, baa, abb, bab, bba}. We can choose the two letters in $\binom{10}{2}$ ways. For each of these choices, we can then make 2 arrangements with the letters in alphabetic order, e.g. {aab, abb}. There are $\binom{10}{2} \times 2$ arrangements in this case.
Case 3: all three letters are different. We can select the three letters in $\binom{10}{3}$ ways and then make 1 arrangement that is in alphabetic order (as in part (a)).
Combining the three cases, we have
$$P(B) = \frac{10 + \binom{10}{2} \times 2 + \binom{10}{3}}{10^3} = \frac{11}{50}$$
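Both parts of the alphabetic-order example can be verified by listing the arrangements. In this sketch "alphabetic order" is taken to allow repeated letters (so aab counts), as in the case analysis above; the helper name `in_order` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

letters = "abcdefghij"

def in_order(word):
    """True if the letters of `word` are in (non-decreasing) alphabetic order."""
    return list(word) == sorted(word)

without_repl = list(permutations(letters, 3))   # 10 * 9 * 8 = 720 outcomes
with_repl = list(product(letters, repeat=3))    # 10^3 = 1000 outcomes

p_without = Fraction(sum(map(in_order, without_repl)), len(without_repl))
p_with = Fraction(sum(map(in_order, with_repl)), len(with_repl))
print(p_without, p_with)  # 1/6 11/50
```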
Example: We form a 4 digit number by randomly selecting and arranging 4 digits from 1, 2, 3, ..., 7 without replacement. Find the probability the number formed is (a) even (b) over 3000 (c) an even number over 3000.
Solution: Let S be the set of all possible 4 digit numbers using digits 1, 2, ..., 7 sampled without replacement. Then S has $7^{(4)}$ outcomes.
(a) For a number to be even, the last digit must be even. We can fill this last position with a 2, 4, or 6; i.e. in 3 ways. The first 3 positions can be filled by choosing and arranging 3 of the 6 digits not used in the final position, i.e. in $6^{(3)}$ ways. Then there are $3 \times 6^{(3)}$ ways to fill the final position AND the first 3 positions to produce an even number. Therefore the probability the number is even is $\frac{3 \times 6^{(3)}}{7^{(4)}} = \frac{3}{7}$. Alternatively, the four digit number is even if and only if the last digit is even. The last digit is equally likely to be any one of the numbers 1, ..., 7, so the probability it is even is the probability it is either 2, 4, or 6, or $\frac{3}{7}$.
(b) To get a number over 3000, we require the first digit to be 3, 4, 5, 6, or 7; i.e. it can be chosen in 5 ways. The remaining 3 positions can be filled in $6^{(3)}$ ways. Therefore the probability the number is greater than 3000 is $\frac{5 \times 6^{(3)}}{7^{(4)}} = \frac{5}{7}$. Alternatively, note that the four digit number is over 3000 if and only if the first digit is one of 3, 4, 5, 6 or 7. Since each of 1, ..., 7 is equally likely to be the first digit, we get the probability the number is greater than 3000 is $\frac{5}{7}$.
In both (a) and (b) we dealt with positions which had restrictions first, before considering positions with no restrictions. This is generally the best approach to follow in applying counting techniques.
(c) This part has restrictions on both the first and last positions. To illustrate the complication this introduces, suppose we decide to fill positions in the order 1 then 4 then the middle two. We can fill position 1 in 5 ways. How many ways can we then fill position 4? The answer is either 2 or 3 ways, depending on whether the first position was filled with an even or odd digit. Whenever we encounter a situation such as this, we have to break the solution into separate cases. One case is where the first digit is even. The positions can be filled in 2 ways for the first (i.e. with a 4 or 6), 2 ways for the last, and then $5^{(2)}$ ways to arrange 2 of the remaining 5 digits in the middle positions. This first case then occurs in $2 \times 2 \times 5^{(2)}$ ways. The second case has an odd digit in position one. There are 3 ways to fill position one (3, 5, or 7), 3 ways to fill position four (2, 4, or 6), and $5^{(2)}$ ways to fill the remaining positions. Case 2 then occurs in $3 \times 3 \times 5^{(2)}$ ways. We need case 1 OR case 2. Therefore the probability we obtain an even number greater than 3000 is
$$\frac{2 \times 2 \times 5^{(2)} + 3 \times 3 \times 5^{(2)}}{7^{(4)}} = \frac{13 \times 5^{(2)}}{7 \times 6 \times 5^{(2)}} = \frac{13}{42}.$$
Another way to do this is to realize that we need only to consider the first and last digit, and to find P(first digit is $\geq 3$ and last digit is even). There are $7 \times 6 = 42$ different choices for (first digit, last digit) and it is easy to see there are 13 choices for which the first digit is $\geq 3$ and the last digit is even ($5 \times 3$ minus the impossible outcomes (4, 4) and (6, 6)). Thus the desired probability is $\frac{13}{42}$.
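All three answers can be checked by generating the $7^{(4)} = 840$ possible numbers. This is an illustrative sketch only; representing each number as a tuple of digits is a choice made here for convenience.

```python
from fractions import Fraction
from itertools import permutations

# Each outcome is a tuple (d1, d2, d3, d4) of distinct digits from 1..7.
numbers = list(permutations(range(1, 8), 4))

def prob(event):
    """Probability of an event (a predicate on a digit tuple) under the uniform model."""
    return Fraction(sum(1 for d in numbers if event(d)), len(numbers))

p_even = prob(lambda d: d[3] % 2 == 0)             # last digit even
p_over_3000 = prob(lambda d: d[0] >= 3)            # first digit 3, 4, 5, 6 or 7
p_both = prob(lambda d: d[0] >= 3 and d[3] % 2 == 0)
print(len(numbers), p_even, p_over_3000, p_both)   # 840 3/7 5/7 13/42
```

Note that $\frac{13}{42}$ differs from $\frac{3}{7} \times \frac{5}{7} = \frac{15}{49}$, so the two conditions cannot simply be multiplied together.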
Exercise: Try to solve part (c) by filling positions in the order 4, 1, middle. You should get the same answer.
Exercise: Can you spot the flaw in the following argument? There are $3 \times 6^{(3)}$ ways to get an even number (part (a)). There are $5 \times 6^{(3)}$ ways to get a number > 3000 (part (b)). Therefore by the multiplication rule there are $[3 \times 6^{(3)}] \times [5 \times 6^{(3)}]$ ways to get a number which is even and > 3000.
Example: 5 men and 3 women are placed in random seats in a row. Find the probability that
(a) the same gender is at each end
(b) the women all sit together.
What are you assuming in your solution? Is it likely in real life that individuals are randomly seated?
Solution: If we treat the people as being 8 objects, 5 of one type and 3 of another, i.e. 5 M's and 3 W's, our sample space will have $\frac{8!}{5!\,3!} = 56$ points.
(a) To get the same gender at each end we need either
M _ _ _ _ _ _ M OR W _ _ _ _ _ _ W
The number of distinct arrangements with a man at each end is $\frac{6!}{3!\,3!} = 20$, since we are arranging 3 M's and 3 W's in the middle 6 positions. The number with a woman at each end is $\frac{6!}{5!\,1!} = 6$. Thus
$$P(\text{same gender at each end}) = \frac{20 + 6}{56} = \frac{13}{28}$$
assuming each arrangement is equally likely.
(b) Treating WWW as a single unit, we are arranging 6 objects, 5 M's and 1 object we might call "WWW". There are $\frac{6!}{5!\,1!} = 6$ arrangements. Thus,
$$P(\text{women sit together}) = \frac{6}{56} = \frac{3}{28}.$$
Our solution is based on the assumption that all points in S are equally likely. This would mean the people sit in a purely random order. Random seating is unlikely in real life, since friends are more likely to sit together.
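Since each distinct M/W arrangement is determined by which 3 of the 8 seats hold women, the 56 sample points can be listed as 3-seat subsets. The sketch below uses that representation (an assumption made for the illustration, with seats numbered 0 to 7).

```python
from fractions import Fraction
from itertools import combinations

# Each outcome: the sorted tuple of seat numbers (0..7) occupied by the 3 women.
seatings = list(combinations(range(8), 3))

# Same gender at both ends: seats 0 and 7 are both women or both men.
same_ends = [w for w in seatings if (0 in w) == (7 in w)]
# Women together: their three seats are consecutive.
together = [w for w in seatings if w[2] - w[0] == 2]

p_same_ends = Fraction(len(same_ends), len(seatings))
p_together = Fraction(len(together), len(seatings))
print(len(seatings), p_same_ends, p_together)  # 56 13/28 3/28
```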
Problems:
3.1.4 Digits 1, 2, 3, ..., 7 are arranged at random to form a 7 digit number. Find the probability that
(a) the even digits occur together, in any order
(b) the digits at the 2 ends are both even or both odd.
3.1.5 The letters of the word EXCELLENT are arranged in a random order. Find the probability that
(a) the same letter occurs at each end.
(b) X, C, and N occur together, in any order.
(c) the letters occur in alphabetical order.
Example: In the Lotto 6/49 lottery, six numbers are drawn at random, without replacement, from the numbers 1 to 49. Find the probability that
(a) the numbers drawn are {1, 2, 3, 4, 5, 6}.
(b) no even number is drawn.
Solution:
(a) Let the sample space S consist of all subsets of 6 numbers from 1, ..., 49; there are $\binom{49}{6}$ of them. Since {1, 2, 3, 4, 5, 6} is one of these subsets, the probability of this particular set is $1/\binom{49}{6}$, which is about 1 in 14 million (1 in 13,983,816).
(b) There are 25 odd and 24 even numbers, so there are $\binom{25}{6}$ choices in which all the numbers are odd. Therefore the probability no even number is drawn is the probability they are all odd, or
$$\frac{\binom{25}{6}}{\binom{49}{6}} \approx 0.0127.$$
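The sample space here is far too large to list by hand, but the binomial coefficients are easy to evaluate. A brief sketch using Python's `math.comb` (variable names are illustrative):

```python
from fractions import Fraction
from math import comb

n_tickets = comb(49, 6)                       # number of 6-number subsets of 1..49
p_specific_set = Fraction(1, n_tickets)       # probability of {1, 2, 3, 4, 5, 6}
p_all_odd = Fraction(comb(25, 6), n_tickets)  # all six numbers among the 25 odd ones
print(n_tickets, float(p_specific_set), round(float(p_all_odd), 4))
```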
Example: Find the probability a bridge hand (13 cards picked at random from a standard deck [4] without replacement) has
(a) 3 aces
(b) at least 1 ace
(c) 6 spades, 4 hearts, 2 diamonds, 1 club
(d) a 6-4-2-1 split between the 4 suits
(e) a 5-4-2-2 split.
Solution: Since order of selection does not matter, we take S to have $\binom{52}{13}$ outcomes, each with the same probability.
(a) We can choose 3 aces in $\binom{4}{3}$ ways. We also have to choose 10 other cards from the 48 non-aces. This can be done in $\binom{48}{10}$ ways. Hence the probability of exactly three aces is
$$\frac{\binom{4}{3}\binom{48}{10}}{\binom{52}{13}}$$
[4] A standard deck has 13 cards in each of four suits, hearts, diamonds, clubs and spades, for a total of 52 cards. There are four aces in the deck (one of each suit).
(b) Solution 1: At least 1 ace means 1 ace or 2 aces or 3 aces or 4 aces. Calculate each part as in (a) and use the addition rule to get that the probability of at least one ace is
$$\frac{\binom{4}{1}\binom{48}{12} + \binom{4}{2}\binom{48}{11} + \binom{4}{3}\binom{48}{10} + \binom{4}{4}\binom{48}{9}}{\binom{52}{13}}.$$
Solution 2: If we subtract all cases with 0 aces from the $\binom{52}{13}$ points in S we are left with all points having at least 1 ace. There are $\binom{4}{0}\binom{48}{13} = \binom{48}{13}$ possible hands with 0 aces since all cards must be drawn from the non-aces. (The term $\binom{4}{0}$ can be omitted since $\binom{4}{0} = 1$, but was included here to show that we were choosing 0 of the 4 aces.) This gives that the probability of at least one ace is
$$\frac{\binom{52}{13} - \binom{48}{13}}{\binom{52}{13}} = 1 - \frac{\binom{48}{13}}{\binom{52}{13}}$$
Solution 3: This solution is incorrect, but illustrates a common error. Choose 1 of the 4 aces then any 12 of the remaining 51 cards. This guarantees we have at least 1 ace, so the probability of at least one ace is $\frac{\binom{4}{1}\binom{51}{12}}{\binom{52}{13}}$. The flaw in this solution is that it counts some points more than once by partially keeping track of order. For example, we could get the ace of spades on the first choice and happen to get the ace of clubs in the last 12 draws. We also could get the ace of clubs on the first draw and then get the ace of spades in the last 12 draws. Though in both cases we have the same outcome, they would be counted as 2 different outcomes. The strategies in solutions 1 and 2 above are safer. We often need to inspect a solution carefully to avoid double or multiple counting.
(c) Choose the 6 spades in $\binom{13}{6}$ ways and the hearts in $\binom{13}{4}$ ways and the diamonds in $\binom{13}{2}$ ways and the clubs in $\binom{13}{1}$ ways. Therefore the probability of 6 spades, 4 hearts, 2 diamonds and one club is
$$\frac{\binom{13}{6}\binom{13}{4}\binom{13}{2}\binom{13}{1}}{\binom{52}{13}} \approx 0.00196$$
(d) The split in (c) is only 1 of several possible 6-4-2-1 splits. In fact, filling in the numbers 6, 4, 2 and 1 in the spaces below
___ Spades   ___ Hearts   ___ Diamonds   ___ Clubs
defines a 6-4-2-1 split. There are 4! ways to do this, and having done this, there are $\binom{13}{6}\binom{13}{4}\binom{13}{2}\binom{13}{1}$ ways to pick the cards from these suits. Therefore the probability of a 6-4-2-1 split between the 4 suits is
$$\frac{4!\,\binom{13}{6}\binom{13}{4}\binom{13}{2}\binom{13}{1}}{\binom{52}{13}} \approx 0.047$$
(e) This is the same question as (d) except the numbers 5-4-2-2 are not all different. There are $\frac{4!}{2!}$ different arrangements of 5-4-2-2 in the spaces below.
___ Spades   ___ Hearts   ___ Diamonds   ___ Clubs
Therefore, the probability of a 5-4-2-2 split is
$$\frac{\frac{4!}{2!}\binom{13}{5}\binom{13}{4}\binom{13}{2}\binom{13}{2}}{\binom{52}{13}} \approx 0.1058$$
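The bridge-hand answers involve numbers too large for mental arithmetic, so a short numerical check is helpful. The sketch below evaluates the expressions from the solution with `math.comb`; the variable names are illustrative. It also shows how badly the flawed "choose one ace, then any 12 of the other 51 cards" count overshoots: $\binom{4}{1}\binom{51}{12}$ equals $\binom{52}{13}$ exactly, which would give a probability of 1.

```python
from fractions import Fraction
from math import comb, factorial

hands = comb(52, 13)  # number of equally likely bridge hands

# (b) at least one ace: sum over 1..4 aces, and via the complement.
at_least_one_ace_sum = sum(comb(4, j) * comb(48, 13 - j) for j in range(1, 5))
at_least_one_ace_complement = hands - comb(48, 13)
flawed_count = comb(4, 1) * comb(51, 12)  # double counts hands with several aces

# (c), (d), (e) suit splits.
p_6421_named_suits = Fraction(comb(13, 6) * comb(13, 4) * comb(13, 2) * comb(13, 1), hands)
p_6421_any_suits = factorial(4) * p_6421_named_suits
p_5422 = Fraction(factorial(4) // factorial(2)
                  * comb(13, 5) * comb(13, 4) * comb(13, 2) ** 2, hands)

print(at_least_one_ace_sum == at_least_one_ace_complement, flawed_count == hands)
print(float(p_6421_named_suits), float(p_6421_any_suits), float(p_5422))
```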
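Notes. While $n^{(k)}$ only has a physical interpretation when n and k are positive integers with $n \geq k$, it still has meaning when n is not a positive integer, as long as k is a non-negative integer. In general we can define $n^{(k)} = n(n-1)\cdots(n-k+1)$. For example:
$$(-2)^{(3)} = (-2)(-2-1)(-2-2) = (-2)(-3)(-4) = -24 \quad \text{and} \quad 1.3^{(2)} = (1.3)(1.3 - 1) = 0.39$$
Note that in order for $\binom{n}{0} = \binom{n}{n} = 1$ we must define
$$n^{(0)} = \frac{n!}{(n-0)!} = 1 \quad \text{and} \quad 0! = 1.$$
Also $\binom{n}{k}$ loses its physical meaning when n is not a non-negative integer $\geq k$, but we can use
$$\binom{n}{k} = \frac{n^{(k)}}{k!}$$
to define it when n is not a positive integer but k is. For example,
$$\binom{\frac{1}{2}}{3} = \frac{\left(\frac{1}{2}\right)^{(3)}}{3!} = \frac{\left(\frac{1}{2}\right)\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)}{3!} = \frac{1}{16}$$
Also, when n and k are non-negative integers and k > n, notice that
$$\binom{n}{k} = \frac{n^{(k)}}{k!} = \frac{n(n-1)\cdots(0)\cdots}{k!} = 0.$$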
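The extended definitions in these notes translate directly into code. The following sketch is an illustration only (the function names `falling` and `gen_binom` are made up here); it uses exact fractions so the worked values above can be reproduced without rounding.

```python
from fractions import Fraction
from math import factorial

def falling(n, k):
    """n^(k) = n (n-1) ... (n-k+1) for a non-negative integer k; n^(0) = 1."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

def gen_binom(n, k):
    """Generalized binomial coefficient n^(k) / k!; n need not be a positive integer."""
    return Fraction(falling(n, k)) / factorial(k)

print(falling(-2, 3))                  # -24
print(falling(Fraction(13, 10), 2))    # 39/100, i.e. 0.39
print(gen_binom(Fraction(1, 2), 3))    # 1/16
print(gen_binom(4, 5))                 # 0, since the product includes a factor of 0
```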
Problems:
3.2.1 A factory parking lot has 160 cars in it, of which 35 have faulty emission controls. An air quality inspector does spot checks on 8 cars on the lot.
(a) Give an expression for the probability that at least 3 of these 8 cars will have faulty emission controls.
(b) What assumption does your answer to (a) require? How likely is it that this assumption holds if the inspector hopes to catch as many cars with faulty controls as possible?
3.2.2 In a race, the 15 runners are randomly assigned the numbers 1, 2, ..., 15. Find the probability that
(a) 4 of the first 6 finishers have single digit numbers.
(b) the fifth runner to finish is the 3rd finisher with a single digit number.
(c) number 13 is the highest number among the first 7 finishers.
3.2 Review of Useful Series and Sums
We will be making use of the following series and sums.
1. Geometric Series:
$$a + ar + ar^2 + \cdots + ar^{n-1} = \frac{a(1 - r^n)}{1 - r} = \frac{a(r^n - 1)}{r - 1} \quad \text{for } r \neq 1$$
If $|r| < 1$, then
$$a + ar + ar^2 + \cdots = \frac{a}{1 - r}$$
Note that other identities can be obtained from this one by differentiation, for example
$$\frac{d}{dr} \sum_{i=0}^{\infty} a r^i = \frac{d}{dr}\, \frac{a}{1 - r}$$
or
$$a \sum_{i=1}^{\infty} i r^{i-1} = \frac{a}{(1 - r)^2}$$
You should be able to determine other identities by taking second and higher derivatives.
2. Binomial Theorem: There are various forms of this theorem. We will use the form
$$(1 + a)^n = 1 + \binom{n}{1} a^1 + \binom{n}{2} a^2 + \cdots + \binom{n}{n} a^n = \sum_{x=0}^{n} \binom{n}{x} a^x.$$
Justification: One way of verifying this formula uses the counting arguments of this chapter. Imagine a product of the individual terms:
$$(1 + a)(1 + a)(1 + a) \cdots (1 + a)$$
To evaluate this product we must add together all of the possibilities obtained by taking one of the two possible terms from the first bracketed expression, i.e. one of {1, a}, multiplying by one of {1, a} taken from the second bracketed expression, etc. In how many ways do we obtain the term $a^x$ where x = 0, 1, 2, ..., n? We might choose a from each of the first x terms above and then 1 from the remaining terms, or indeed we could choose a from any x of the terms in $\binom{n}{x}$ ways and then 1 from the remaining terms.
3. Binomial Theorem: There is a more general version of the binomial theorem that results in an infinite series and that holds when $n$ is not a positive integer:
$$(1+a)^n = \sum_{x=0}^{\infty}\binom{n}{x}a^x \quad \text{if } |a| < 1.$$
Proof: Recall from Calculus the Maclaurin series, which says that a sufficiently smooth function $f(x)$ can be written as an infinite series using an expansion around $x = 0$,
$$f(x) = f(0) + \frac{f'(0)}{1}x + \frac{f''(0)}{2!}x^2 + \cdots$$
provided that this series is convergent. In this case, with $f(a) = (1+a)^n$, $f(0) = 1$, $f'(0) = n$, $f''(0) = n(n-1)$ and $f^{(x)}(0) = n^{(x)}$. Substituting,
$$f(a) = 1 + \frac{n}{1}a + \frac{n(n-1)}{2!}a^2 + \cdots + \frac{n^{(x)}}{x!}a^x + \cdots = \sum_{x=0}^{\infty}\binom{n}{x}a^x$$
It is not hard to show that this converges whenever $|a| < 1$.
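As a rough numerical check of the general binomial theorem, the sketch below computes $\binom{n}{x} = n^{(x)}/x!$ for a non-integer $n$ and sums the first 60 terms of the series for $(1+a)^{1/2}$ with $a = 0.2$. The helper name `gen_binom`, the cutoff of 60 terms, and the test values are assumptions made for this illustration only.

```python
from fractions import Fraction

def gen_binom(n, x):
    """Generalized binomial coefficient n^(x) / x! for a non-negative integer x.

    n may be any rational number; the product n(n-1)...(n-x+1) is built up
    one factor at a time and divided by 1, 2, ..., x along the way.
    """
    result = Fraction(1)
    for j in range(x):
        result = result * (n - j) / (j + 1)
    return result

half = Fraction(1, 2)
example_value = gen_binom(half, 3)      # the (1/2 choose 3) example in the text
zero_value = gen_binom(4, 5)            # k > n with integers gives 0

# Partial sum of sum_x C(n, x) a^x for n = 1/2 and a = 0.2 (so |a| < 1).
a = 0.2
series = sum(float(gen_binom(half, x)) * a**x for x in range(60))
direct = (1 + a) ** 0.5

print(example_value, zero_value, series, direct)
```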
4. Multinomial Theorem: A generalization of the binomial theorem is
$$(a_1 + a_2 + \cdots + a_k)^n = \sum \frac{n!}{x_1!\,x_2!\cdots x_k!}\,a_1^{x_1}a_2^{x_2}\cdots a_k^{x_k}$$
with the summation over all $x_1, x_2, \ldots, x_k$ with $\sum x_i = n$.
Justification: Again, we could verify this formula using the counting arguments. Imagine a product of the individual terms:
$$(a_1 + a_2 + \cdots + a_k)\,(a_1 + a_2 + \cdots + a_k)\cdots(a_1 + a_2 + \cdots + a_k)$$
To evaluate this product we must add together all of the possibilities obtained by taking one of the terms from the first bracketed expression, i.e. one of $\{a_1, a_2, \ldots, a_k\}$, multiplying by one of $\{a_1, a_2, \ldots, a_k\}$ taken from the second bracketed expression, etc. In how many ways do we obtain the term $a_1^{x_1}a_2^{x_2}\cdots a_k^{x_k}$ where $\sum x_i = n$? We can choose $a_1$ a total of $x_1$ times from any of the $n$ terms in $\binom{n}{x_1}$ ways, and then $a_2$ from any of the remaining $n - x_1$ terms in $\binom{n-x_1}{x_2}$ ways, and so on, so there are
$$\binom{n}{x_1}\binom{n-x_1}{x_2}\binom{n-x_1-x_2}{x_3}\cdots\binom{x_k}{x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}$$
ways of obtaining this term in the product. The case $k = 2$ gives the binomial theorem in the form
$$(a_1 + a_2)^n = \sum_{x_1=0}^{n}\binom{n}{x_1}a_1^{x_1}a_2^{n-x_1}$$
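The counting argument above can be imitated directly on a small case. The sketch below, with $k = 3$ and $n = 4$ picked arbitrarily for illustration, takes one term from each of the $n$ brackets in every possible way, tallies how often each set of exponents arises, and compares the tally with both the chained binomial product and $n!/(x_1!\cdots x_k!)$. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

def multinomial(counts):
    """n! / (x1! x2! ... xk!) where n is the sum of the counts."""
    result = factorial(sum(counts))
    for x in counts:
        result //= factorial(x)
    return result

def chained_binomials(counts):
    """C(n, x1) * C(n - x1, x2) * ... as in the counting argument."""
    remaining = sum(counts)
    ways = 1
    for x in counts:
        ways *= comb(remaining, x)
        remaining -= x
    return ways

# Expand (a1 + a2 + a3)^4 by picking one term index from each bracket.
k, n = 3, 4
tally = Counter()
for choice in product(range(k), repeat=n):
    exponents = tuple(choice.count(i) for i in range(k))
    tally[exponents] += 1

all_agree = all(
    tally[e] == multinomial(e) == chained_binomials(e) for e in tally
)
print(all_agree, tally[(2, 1, 1)])
```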
5. Hypergeometric Identity:
$$\sum_{x=0}^{\infty}\binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}.$$
There will not be an infinite number of terms if $a$ and $b$ are positive integers since the terms become 0 eventually. For example
$$\binom{4}{5} = \frac{4^{(5)}}{5!} = \frac{(4)(3)(2)(1)(0)}{5!} = 0$$
Proof: We prove this in the case that $a$ and $b$ are non-negative integers. Obviously
$$(1+y)^{a+b} = (1+y)^a\,(1+y)^b.$$
If we expand each term using the binomial theorem we obtain
$$\sum_{k=0}^{a+b}\binom{a+b}{k}y^k = \sum_{i=0}^{a}\binom{a}{i}y^i\;\sum_{j=0}^{b}\binom{b}{j}y^j.$$
Note that the coefficient of $y^k$ on the right side is $\sum_{i=0}^{a}\binom{a}{i}\binom{b}{k-i}$ and so this must equal $\binom{a+b}{k}$, the coefficient of $y^k$ on the left side.
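The hypergeometric identity is easy to spot-check by brute force for small non-negative integers. The following sketch relies on Python's `math.comb`, which returns 0 when the lower argument exceeds the upper one, matching the convention used in the text; the function name and the range of values tested are choices made for this illustration.

```python
from math import comb

def hypergeometric_sum(a, b, n):
    """Left side of the identity: sum over x of C(a, x) * C(b, n - x).

    Terms with x > n are omitted because C(b, n - x) is then zero.
    """
    return sum(comb(a, x) * comb(b, n - x) for x in range(n + 1))

# Compare with C(a + b, n) for every small case.
mismatches = [
    (a, b, n)
    for a in range(7)
    for b in range(7)
    for n in range(a + b + 1)
    if hypergeometric_sum(a, b, n) != comb(a + b, n)
]
print(mismatches)
```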
6. Exponential series: This is another example of a Maclaurin series expansion. If we let $f(x) = e^x$, then $f^{(n)}(0) = 1$ and so
$$e^x = \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty}\frac{x^n}{n!}$$
We will also use the limit definition of the exponential function: for all real $x$,
$$e^x = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n$$
7. Special series involving integers:
$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$$
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$$
$$1^3 + 2^3 + 3^3 + \cdots + n^3 = \left[\frac{n(n+1)}{2}\right]^2$$
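Example: Find
$$\sum_{x=0}^{\infty} x(x-1)\binom{a}{x}\binom{b}{n-x}$$
Solution: For $x = 0$ or $1$ the term becomes 0, so we can start summing at $x = 2$. For $x \geq 2$, we can expand $x!$ as $x(x-1)(x-2)!$
$$\sum_{x=0}^{\infty} x(x-1)\binom{a}{x}\binom{b}{n-x} = \sum_{x=2}^{\infty} x(x-1)\,\frac{a!}{x(x-1)(x-2)!\,(a-x)!}\binom{b}{n-x}.$$
Cancel the $x(x-1)$ terms and try to re-group the factorial terms as "something choose something".
$$\frac{a!}{(x-2)!\,(a-x)!} = \frac{a(a-1)(a-2)!}{(x-2)!\,[(a-2)-(x-2)]!} = a(a-1)\binom{a-2}{x-2}.$$
Then
$$\sum_{x=0}^{\infty} x(x-1)\binom{a}{x}\binom{b}{n-x} = \sum_{x=2}^{\infty} a(a-1)\binom{a-2}{x-2}\binom{b}{n-x}.$$
Factor out $a(a-1)$ and let $y = x - 2$ to get
$$a(a-1)\sum_{y=0}^{\infty}\binom{a-2}{y}\binom{b}{n-(y+2)} = a(a-1)\binom{a+b-2}{n-2}$$
by the hypergeometric identity.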
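The result of this example can be confirmed by direct summation for small integer values. In the sketch below the names `lhs` and `rhs` are invented for the illustration, and the check is restricted to $a \geq 2$ and $2 \leq n \leq a + b$ so that every argument passed to `math.comb` is non-negative.

```python
from math import comb

def lhs(a, b, n):
    """Direct evaluation of sum_x x(x-1) C(a, x) C(b, n - x)."""
    return sum(x * (x - 1) * comb(a, x) * comb(b, n - x) for x in range(n + 1))

def rhs(a, b, n):
    """Closed form a(a-1) C(a+b-2, n-2) obtained in the example."""
    return a * (a - 1) * comb(a + b - 2, n - 2)

failures = [
    (a, b, n)
    for a in range(2, 8)
    for b in range(0, 8)
    for n in range(2, a + b + 1)
    if lhs(a, b, n) != rhs(a, b, n)
]
print(failures)
```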
3.3 Problems on Chapter 3
3.1 Six digits from 2, 3, 4, ..., 8 are chosen and arranged in a row without replacement. Find the probability that
(a) the number is divisible by 2
(b) the digits 2 and 3 appear consecutively in the proper order (i.e. 23)
(c) digits 2 and 3 appear in the proper order but not consecutively.
3.2 Suppose $r$ passengers get on an elevator at the basement floor. There are $n$ floors above (numbered 1, 2, 3, ..., $n$) where passengers may get off.
(a) Find the probability
(i) no passenger gets off at floor 1
(ii) passengers all get off at different floors ($n \geq r$).
(b) What assumption(s) underlies your answer to (a)? Comment briefly on how likely it is that the assumption(s) is valid.
3.3 There are 6 stops left on a subway line and 4 passengers on a train. Assume they are each equally likely to get off at any stop. What is the probability
(a) they all get off at different stops?
(b) 2 get off at one stop and 2 at another stop?
3.4 Give an expression for the probability a bridge hand of 13 cards contains 2 aces, 4 face cards (Jack, Queen or King) and 7 others. You might investigate the various permutations and combinations relating to card hands using the Java applet at /ttj : ,,nnn.ncr|.ar:.n:da.qo,ccc,,aa,co:/./t:
3.5 The letters of the word STATISTICS are arranged in a random order. Find the probability
(a) they spell statistics
(b) the same letter occurs at each end.
4/7
((n-1)/n)^r
nPr/n^r
The probability of each passenger gets off on each floor is equal likely.
In reality, more passengers in the elevator will get off on upper floors
because they are more willing to get on an elevator.
6P4/6^4=5/18
6C2*4C2/6^4=5/72
(4C2*12C4*36C7)/(52C13)
3!*3!*2!/10!
All n^r outcomes are equally likely.
That is, all n floors are
equally likely to be selected,
and each passengers selection
is unrelated to each
other persons selection. Both
assumptions are doubtful
since people may be travelling
together (e.g. same family) and
the floors may
not have equal
traffic (e.g. more likely to use the stairs for going up 1 floor
than for 10 floors);
equally likely
independent
7/45
(2*8!/(3!*2!)+8!/(3!*3!))/(10! / (3! * 3! * 2!))=7/45
1/7
5/21=5C2* 5P4 / 7P6
5/42=5*5P4 / 7P6
3.6 Three digits are chosen in order from 0, 1, 2, ..., 9. Find the probability the digits are drawn in increasing order (i.e., the first < the second < the third) if
(a) draws are made without replacement
(b) draws are made with replacement.
3.7 The Birthday Problem.$^{5}$ Suppose there are $r$ persons in a room. Ignoring February 29 and assuming that every person is equally likely to have been born on any of the 365 other days in a year, find the probability that no two persons in the room have the same birthday. Find the numerical value of this probability for $r = 20$, $40$ and $60$. There is a graphic Java applet for illustrating the frequency of common birthdays at http://www-stat.stanford.edu/%7Esusan/surprise/Birthday.html
3.8 You have $n$ identical looking keys on a chain, and one opens your office door. If you try the keys in random order then
(a) what is the probability the $k$th key opens the door?
(b) what is the probability one of the first two keys opens the door (assume $n \geq 3$)?
(c) Determine numerical values for the answer in part (b) for the cases $n = 3, 5, 7$.
3.9 From a set of $2n + 1$ consecutively numbered tickets, three are selected at random without replacement. Find the probability that the numbers of the tickets form an arithmetic progression. [The order in which the tickets are selected does not matter.]
3.10 The 10,000 tickets for a lottery are numbered 0000 to 9999. A four-digit winning number is drawn and a prize is paid on each ticket whose four-digit number is any arrangement of the number drawn. For instance, if winning number 0011 is drawn, prizes are paid on tickets numbered 0011, 0101, 0110, 1001, 1010, and 1100. A ticket costs \$1 and each prize is \$500.
(a) What is the probability of winning a prize (i) with ticket number 7337? (ii) with ticket number 7235? What advice would you give to someone buying a ticket for this lottery?
(b) Assuming that all tickets are sold, what is the probability that the operator will lose money on the lottery?
$^{5}$ "My birthday was a natural disaster, a shower of paper full of flattery under which one almost drowned" Albert Einstein, 1954 on his seventy-fifth birthday.
1/6
10C3 / 10^3 =0.12
365P20 / 365^20 =0.589
365P40 / 365^40 =0.109
365P60 / 365^60 =0.006
1/n
2/n
2/3 2/5 2/7
i) (4!/(2! * 2!)) / 10000 =3/5000 ii)4!/10000=3/1250 Buy a ticket with number which is combined with 4 different numbers
500*4! = 12000 > 1000 500*4!/2!=6000<10000 P(b)=P(the # of ticket has 4 different digits) =10P4/10000 = 63/125
arithmetic
sequence
1+3+5+......+(2n-3)+(2n-1)=((1+(2n-1))*n/2 =n^2
And generally for any difference b (where b < n) there are 2n (2b-1)
possible starting points. P(E) = n^2 / (2n+1)C3
3.11 (a) There are 25 deer in a certain forested area, and 6 have been caught temporarily and tagged. Some time later, 5 deer are caught. Find the probability that 2 of them are tagged. (What assumption did you make to do this?)
(b) Suppose that the total number of deer in the area was unknown to you. Describe how you could estimate the number of deer based on the information that 6 deer were tagged earlier, and later when 5 deer are caught, 2 are found to be tagged. What estimate do you get?
3.12 Lotto 6/49. In Lotto 6/49 you purchase a lottery ticket with 6 different numbers, selected from the set {1, 2, ..., 49}. In the draw, six (different) numbers are randomly selected. Find the probability that
(a) Your ticket has the 6 numbers which are drawn. (This means you win the main Jackpot.)
(b) Your ticket matches exactly 5 of the 6 numbers drawn.
(c) Your ticket matches exactly 4 of the 6 numbers drawn.
(d) Your ticket matches exactly 3 of the 6 numbers drawn.
3.13 (Texas Hold-em) Texas Hold-em is a poker game in which players are each dealt two cards face down (called your hole or pocket cards), from a standard deck of 52 cards, followed by a round of betting, and then five cards are dealt face up on the table with various breaks to permit players to bet the farm. These are communal cards that anyone can use in combination with their two pocket cards to form a poker hand. Players can use any five of the face-up cards and their two cards to form a five card poker hand. Probability calculations for this game are not only required at the end, but also at intermediate steps and are quite complicated so that usually simulation is used to determine the odds that you will win given your current information, so consider a simple example. Suppose we were dealt 2 Jacks in the first round.
(a) What is the probability that the next three cards (face up) include at least one Jack?
(b) Given that there was no Jack among these next three cards, what is the probability that there is at least one among the last two cards dealt face-up?
(c) What is the probability that the 5 face-up cards show two Jacks, given that I have two in my pocket cards?
3.14 Show that
$$\sum_{x=0}^{n} x\binom{n}{x}p^x(1-p)^{n-x} = np$$
(use the binomial theorem).
(6C2*19C3) / 25C5 =0.274
6/2*5=15
1/ 49C6
(6C5 * 43C1) / 49C6
(6C4 * 43C2) / 49C6
(6C3 * 43C3) /49C6
1- 48C3 / 50C3 = 144/1225 = 0.118
1- 45C2 / 47C2 =23/188=0.122
( 2C2 * 48C3 ) / 50C5= 2/245
The best you can do is assume your sample is a fair representation of the population as a whole.
3.15 I have a quarter which turns up heads with probability 0.6, and a fair dime. The quarter is flipped until a head occurs. Independently the dime is flipped until a head occurs. Find the probability that the number of flips is the same for both coins.
3.16 Some other summation formulas can be obtained by differentiating the above equations on both sides. Assume $|r| < 1$. Show that $a + 2ar + 3ar^2 + \cdots = \frac{a}{(1-r)^2}$ starting with the geometric series formula. Show also that $\sum_{i=2}^{\infty} i(i-1)r^{i-2} = \frac{2}{(1-r)^3}$.
3.17 Players $A$ and $B$ decide to play chess until one of them wins. Assume games are independent with $P(A \text{ wins}) = .3$, $P(B \text{ wins}) = .25$ and $P(\text{draw}) = .45$ on each game. If the game ends in a draw another game will be played. Find the probability $A$ wins before $B$.
A wins before B iff
A wins on the 1st game OR
A ties the first game and A wins on the 2nd game OR
A ties the first 2 games and A wins on the 3rd game OR....
So P(A wins before B) = P(A) + P(D)*P(A) + P(D)^2*P(A) + .... = P(A)/(1 - P(D))
where P(A) = Prob that A wins, P(D) = Prob of a draw
Let Q = { heads on quarter} and D = { heads on dime}. Then P {both heads at same time}
= P (QD V -Q -DQD V -Q-D-Q-DQD V ......)
=(.6)(.5) + (.4)(.5)(.6)(.5)+(.4)(.5)(.4)(.5)(.6)(.5)+...
=(.6)(.5)/(1 - (.4)(.5) = 3/8
4. Probability Rules and Conditional
Probability
4.1 General Methods
Recall that a probability model consists of a sample space $S$, a set of events or subsets of the sample space to which we can assign probabilities and a mechanism for assigning these probabilities. The probability of an arbitrary event $A$ can be determined by summing the probabilities of simple events in $A$ and so we have the following rules:
Rule 1 $P(S) = 1$
Proof: $P(S) = \sum_{a \in S} P(a) = \sum_{\text{all } a} P(a) = 1$
Rule 2 For any event $A$, $0 \leq P(A) \leq 1$.
Proof: $P(A) = \sum_{a \in A} P(a) \leq \sum_{a \in S} P(a) = 1$ and so since each $P(a) \geq 0$, we have $0 \leq P(A) \leq 1$.
Rule 3 If $A$ and $B$ are two events with $A \subseteq B$ (that is, all of the points in $A$ are also in $B$), then $P(A) \leq P(B)$.
Proof: $P(A) = \sum_{a \in A} P(a) \leq \sum_{a \in B} P(a) = P(B)$ so $P(A) \leq P(B)$.
Before continuing with the set-theoretic description of a probability model, let us review some of the basic ideas in set theory. First, what do sets have to do with the occurrence of events? Suppose a random experiment having sample space $S$ is run (for example a die with $S = \{1, 2, 3, 4, 5, 6\}$ is thrown). When would we say an event $A \subset S$, or in the case of the die, the event $A = \{2, 4, 6\}$ occurs? In the latter case, the event $A$ means that the number showing is even, i.e. in general that one of the simple outcomes in $A$ occurred. We often illustrate the relationship among sets using Venn diagrams. In the drawings below, think of $S$ consisting of all of the points in a rectangle of area one$^{6}$. To illustrate the event $A$ we can draw a region within the rectangle with area roughly proportional to the probability of the event $A$. We might think of the random experiment as throwing a dart at the rectangle in Figure 4.3, and we say the event $A$ occurs if the dart lands within the region $A$.
$^{6}$ As you may know, however, the number of points in a rectangle is NOT countable, so this is not a discrete sample space. Nevertheless this definition of $S$ is used to illustrate various combinations of sets.
Figure 4.3: Set $A$ in sample space $S$
What if we combine two events $A$, $B$ by including all of the points in either $A$ or $B$ or both? This is the union of the two events, $A \cup B$, illustrated in Figure 4.4. The union of the events occurs if one of the outcomes in either $A$ or $B$ or both occurs. In language we refer to this as the event "$A$ or $B$" with the understanding that in this course we will use the word "or" inclusively to also permit both. Another way of expressing a union is that $A \cup B$ occurs if at least one of $A$, $B$ occurs. Similarly if we have three events $A$, $B$, $C$, the event $A \cup B \cup C$ means "at least one of $A$, $B$, $C$".
What about the intersection of two events ($A \cap B$), or the set of all points in $S$ that are in both $A$ and $B$? This is illustrated in Figure 4.5. The event $A \cap B$ occurs if and only if a point in the intersection occurs, which means both $A$ and $B$ occur. It is common to shorten the notation for the intersection of two events so that $AB$ means $A \cap B$ and $ABC$ means $A \cap B \cap C$. Finally the complement of the event $A$ is denoted $\bar{A}$ and means the set of all points which are in $S$ but not in $A$ as in Figure 4.6.
There are two special events in a probability model that we will use. One is the whole sample space $S$. Because $P(S) = 1$, this event is certain to occur. Another is the empty event, or the null set $\varnothing$. This is a set with no elements at all and so it must have probability 0. Notice that $\varnothing = \bar{S}$.
The illustrations above showing the relationship among sets are examples of Venn diagrams. At the URL http://stat-www.berkeley.edu/users/stark/Java/Venn.htm, there is an applet which allows you to vary the area of the intersection and construct Venn diagrams for a variety of purposes. Since probability theory is built from the relationships among sets, it is often helpful to use Venn diagrams in solving problems. For example there are rules (De Morgan's laws) governing taking the complements of unions and intersections that can easily be verified using Venn diagrams.
Figure 4.4: The union of two sets $A \cup B$
Figure 4.5: The intersection of two events $A \cap B$
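Exercise: Verify De Morgan's laws:
a. $\overline{A \cup B} = \bar{A} \cap \bar{B}$
b. $\overline{A \cap B} = \bar{A} \cup \bar{B}$
Proof of a: One can argue such set theoretic rules using the definitions of the sets. For example, when is a point $a$ in the set $\overline{A \cup B}$? This means $a \in S$ but $a$ is not in $A \cup B$, which in turn implies $a$ is not in $A$ and it is not in $B$, or $a \in \bar{A}$ and $a \in \bar{B}$, equivalently $a \in \bar{A} \cap \bar{B}$. As an alternative demonstration, we can use a Venn diagram (Figure 4.7) in which $\bar{A}$ is indicated with vertical lines, $\bar{B}$ with horizontal lines and so $\bar{A} \cap \bar{B}$ is the region with cross hatching. This agrees with the shaded region $\overline{A \cup B}$.
The following example demonstrates solving a problem using a Venn diagram.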
Figure 4.6: $\bar{A}$ = the complement of the event $A$
Example: Suppose for students finishing second year Math that 22% have a math average greater than 80%, 24% have a STAT 230 mark greater than 80%, 20% have an overall average greater than 80%, 14% have both a math average and STAT 230 greater than 80%, 13% have both an overall average and STAT 230 greater than 80%, 10% have all 3 of these averages greater than 80%, and 67% have none of these 3 averages greater than 80%. Find the probability a randomly chosen math student finishing 2A has math and overall averages both greater than 80% and STAT 230 less than or equal to 80%.
Solution: When using rules of probability it is generally helpful to begin by labeling the events of interest. Imagine a student is chosen at random from all students finishing second year Math. For this student, let
$A$ be the event "math average greater than 80%"
$B$ be the event "overall average greater than 80%"
$C$ be the event "STAT 230 mark greater than 80%"
In terms of these symbols, we are given:
$P(A) = 0.22$, $P(BC) = 0.13$,
$P(B) = 0.20$, $P(ABC) = 0.1$,
$P(C) = 0.24$, $P(\bar{A}\bar{B}\bar{C}) = 0.67$,
$P(AC) = 0.14$.
Let us interpret some of these expressions; for example $\bar{A}\bar{B}\bar{C}$ means $\bar{A} \cap \bar{B} \cap \bar{C}$ or "(not $A$) and (not $B$) and (not $C$)", or that none of the marks or averages are greater than 80% for the randomly chosen student. We are asked to find $P(AB\bar{C})$, the region labelled with "(5) x" in Figure 4.8. Filling in this information on a Venn diagram, in the order indicated by (1), (2), (3), etc. below (and rather loosely identifying the area of a set with its probability)
Figure 4.7: Illustration of De Morgan's law using a Venn diagram. The region indicated with vertical bars is $\bar{A}$ and with horizontal lines, $\bar{B}$. The shaded region, $\bar{A} \cap \bar{B}$, is identical to $\overline{A \cup B}$.
(1) $P(ABC)$ is given $= 0.1$
(2) $P(AC) - P(ABC) = 0.14 - 0.1 = 0.04$
(3) $P(BC) - P(ABC) = 0.13 - 0.1 = 0.03$
(4) $P(C) - P(AC) - 0.03 = 0.24 - 0.14 - 0.03 = 0.07$
(5) $P(AB\bar{C})$ is unknown, so let $P(AB\bar{C}) = x$
(6) $P(A) - P(AC) - P(AB\bar{C}) = 0.22 - 0.14 - x = 0.08 - x$
(7) $P(B) - P(BC) - P(AB\bar{C}) = 0.20 - 0.13 - x = 0.07 - x$
(8) $P(\bar{A}\bar{B}\bar{C}) = 0.67$ is given.
Adding all probabilities from (1) to (8) we obtain, since $P(S) = 1$,
$$0.1 + 0.04 + 0.03 + 0.07 + x + 0.08 - x + 0.07 - x + 0.67 = 1$$
giving $1.06 - x = 1$ and solving for $x$, $P(AB\bar{C}) = x = 0.06$.
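The bookkeeping in steps (1) to (8) can be reproduced with exact fractions. This is only a sketch of the arithmetic: the variable names (`r1`, `r2`, and so on, one per numbered region) are invented here, and the approach simply uses the fact that the eight regions must have total probability 1.

```python
from fractions import Fraction

def pct(k):
    """Convert a whole-number percentage to an exact fraction."""
    return Fraction(k, 100)

# Given probabilities from the example.
p_a, p_b, p_c = pct(22), pct(20), pct(24)
p_ac, p_bc, p_abc = pct(14), pct(13), pct(10)
p_none = pct(67)

# Regions fully determined by the givens.
r1 = p_abc                 # (1) in A, B and C
r2 = p_ac - p_abc          # (2) in A and C but not B
r3 = p_bc - p_abc          # (3) in B and C but not A
r4 = p_c - p_ac - r3       # (4) in C only
r8 = p_none                # (8) in none of A, B, C

# Regions (5), (6), (7) are x, (p_a - p_ac) - x and (p_b - p_bc) - x,
# so the eight regions add up to `constant - x`, which must equal 1.
constant = r1 + r2 + r3 + r4 + (p_a - p_ac) + (p_b - p_bc) + r8
x = constant - 1

print(constant, x)
```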
Problems:
4.1.1 In a typical year, 20% of the days have a high temperature $> 22^{\circ}$C. On 40% of these days there is no rain. In the rest of the year, when the high temperature $\leq 22^{\circ}$C, 70% of the days have no rain. What percent of days in the year have rain and a high temperature $\leq 22^{\circ}$C?
4.1.2 According to a survey of people on the last Ontario voters list, 55% are female, 55% are politically to the right, and 15% are male and politically to the left. What percent are female and politically to the right? Assume voter attitudes are classified simply as left or right.
1- 0.2+0.4*0.7 =0.52
55+55- 100-15 =110-85=25
P( R^ -T) = P(-T) - P(-T ^ -R) = 0.8 - (0.7*0.8) = 0.8 - 0.56 = 0.24
Figure 4.8: Venn Diagram for Math Averages Example
4.2 Rules for Unions of Events
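In addition to the rules which govern probabilities listed in Section 4.1, we have the following
Rule 4a (probability of unions) $P(A \cup B) = P(A) + P(B) - P(AB)$
Proof: Suppose we denote set differences by $A/B = A \cap \bar{B}$, the set of points which are in $A$ but not in $B$. Then
$$\begin{aligned}
P(A) + P(B) &= \sum_{a \in A} P(a) + \sum_{a \in B} P(a) \\
&= \Big(\sum_{a \in A/B} P(a) + \sum_{a \in AB} P(a)\Big) + \Big(\sum_{a \in B/A} P(a) + \sum_{a \in AB} P(a)\Big) \\
&= \Big(\sum_{a \in A/B} P(a) + \sum_{a \in AB} P(a) + \sum_{a \in B/A} P(a)\Big) + \sum_{a \in AB} P(a) \\
&= \sum_{a \in A \cup B} P(a) + \sum_{a \in AB} P(a) \\
&= P(A \cup B) + P(AB)
\end{aligned}$$
Subtracting $P(AB)$ we obtain $P(A \cup B) = P(A) + P(B) - P(AB)$ as required. This can also be justified by using a Venn diagram. Each point in $A \cup B$ must be counted once. In the expression $P(A) + P(B)$, however, points in $AB$ have their probability counted twice, once in $P(A)$ and once in $P(B)$, so they need to be subtracted once.
Figure 4.9: The union $A \cup B \cup C$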
Rule 4b (the probability of the union of three events) By a similar argument, we have
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC) \tag{4.2}$$
(see Figure 4.9). The proof is similar. In the sum $P(A) + P(B) + P(C)$ those points in the regions of Figure 4.9 that lie in only one of the events have their probabilities added only once. However points in the regions that lie in exactly two of the events are counted twice. We can compensate for this double-counting by subtracting these probabilities once, e.g. using $P(A) + P(B) + P(C) - [P(AB) + P(AC) + P(BC)]$. However, now those points in all three sets, i.e. those points in $ABC$, have their probabilities added in three times and then subtracted three times so they are not included at all: we must correct the formula to give (4.2).
Rule 4c There is an obvious generalization of the above formula to $n$ events $A_1, \ldots, A_n$. This is often referred to as the inclusion-exclusion principle because of the process discussed above for constructing it:
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = \sum_{i} P(A_i) - \sum_{i<j} P(A_iA_j) + \sum_{i<j<k} P(A_iA_jA_k) - \sum_{i<j<k<l} P(A_iA_jA_kA_l) + \cdots \tag{4.3}$$
(where the subscripts are all distinct, for example $i < j < k < l$).
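Proof: This is easy to prove using rule 4a and induction. Let $B_n = A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n$ for $n = 1, 2, \ldots$. Then 4a shows that (4.3) holds for $n = 2$. Suppose the rule is true for $n$. Then
$$\begin{aligned}
P(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1}) &= P(B_n \cup A_{n+1}) \\
&= P(B_n) + P(A_{n+1}) - P(B_nA_{n+1}) \\
&= \sum_{i \leq n} P(A_i) - \sum_{i<j\leq n} P(A_iA_j) + \sum_{i<j<k\leq n} P(A_iA_jA_k) - \cdots + P(A_{n+1}) \\
&\quad - \sum_{i \leq n} P(A_iA_{n+1}) + \sum_{i<j\leq n} P(A_iA_jA_{n+1}) - \sum_{i<j<k\leq n} P(A_iA_jA_kA_{n+1}) + \cdots
\end{aligned}$$
where the last line applies the induction hypothesis to the $n$ events $A_1A_{n+1}, \ldots, A_nA_{n+1}$, whose union is $B_nA_{n+1}$. Collecting terms gives (4.3) for $n + 1$ events.
We will use (4.3) rarely in this course$^{7}$.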
Definition 6 Events $A$ and $B$ are mutually exclusive if $AB = \varnothing$ (the empty event).
Since mutually exclusive events $A$ and $B$ have no common points, $P(AB) = P(\varnothing) = 0$.
In general, events $A_1, A_2, \ldots, A_n$ are mutually exclusive if $A_iA_j = \varnothing$ for all $i \neq j$. This means that there is no chance of two or more of these events occurring together; we either have exactly one of the events occur, or none. For example, if a die is rolled twice, the events
$A$ is the event that 2 occurs on the first roll,
$B$ is the event that the total is 10,
are mutually exclusive. Similarly the events $A_2, A_3, \ldots, A_{12}$, where $A_j$ is the event that the total on the two dice is $j$, are all mutually exclusive events. In the case of mutually exclusive events, rule 4 above simplifies to rule 5 below.
Rule 5a (unions of mutually exclusive events). Let $A$ and $B$ be mutually exclusive events. Then $P(A \cup B) = P(A) + P(B)$. This is a consequence of rule 4a and the fact that $P(AB) = P(\varnothing) = 0$.
Rule 5b In general, let $A_1, A_2, \ldots, A_n$ be mutually exclusive events. Then
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i).$$
This is easily proven from rule 5a above using induction or as an immediate consequence of 4c.
$^{7}$ i.e. do not memorize
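Rule 6 (probability of complements) $P(A) = 1 - P(\bar{A})$.
Proof: $A$ and $\bar{A}$ are mutually exclusive and $A \cup \bar{A} = S$, so by Rule 5a,
$$P(A \cup \bar{A}) = P(A) + P(\bar{A}).$$
But since $P(A \cup \bar{A}) = P(S) = 1$,
$$1 = P(A) + P(\bar{A}) \quad \text{or} \quad P(A) = 1 - P(\bar{A}).$$
This result is useful whenever $P(\bar{A})$ is easier to obtain than $P(A)$.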
Example: Two ordinary dice are rolled. Find the probability that at least one of them turns up a six.
Solution 1: The sample space is $S = \{(1,1), (1,2), (1,3), \ldots, (6,6)\}$. Let $A$ be the event that we obtain 6 on the first die, $B$ be the event that we obtain 6 on the second die and note (by rule 4) that
$$\begin{aligned}
P(\text{at least one die shows 6}) &= P(A \cup B) \\
&= P(A) + P(B) - P(AB) \\
&= \frac{1}{6} + \frac{1}{6} - \frac{1}{36} = \frac{11}{36}
\end{aligned}$$
Solution 2: This is an example where it is perhaps somewhat easier to obtain the complement of the event $A \cup B$ since the complement is the event that there is no six showing on either die, and there are exactly 25 such points, $(1,1), \ldots, (1,5), (2,1), \ldots, (2,5), \ldots, (5,5)$. Therefore
$$\begin{aligned}
P(\text{at least one die shows 6}) &= 1 - P(\text{no 6 on either die}) \\
&= 1 - \frac{25}{36} = \frac{11}{36}
\end{aligned}$$
Example: Roll a die 3 times. Find the probability of getting at least one 6.
Solution 1: Let $A$ be the event "at least one die shows 6". Then $\bar{A}$ is the event that no 6 shows on any die. Using counting arguments, there are 6 outcomes on each roll, so $S = \{(1,1,1), (1,1,2), \ldots, (6,6,6)\}$ has $6 \times 6 \times 6 = 216$ points. For $\bar{A}$ to occur we can't have a 6 on any roll. Then $\bar{A}$ can occur in $5 \times 5 \times 5 = 125$ ways.
Therefore $P(\bar{A}) = \frac{125}{216}$. Hence $P(A) = 1 - \frac{125}{216} = \frac{91}{216}$.
1/6 + 1/6 + 1/6 - 1/36 * 3 + 1/216 = 91/216
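Solution 2: Can you spot the flaw in the following argument? Let
$A$ be the event that 6 occurs on the first roll
$B$ be the event that 6 occurs on the second roll
$C$ be the event that 6 occurs on the third roll
Then
$$\begin{aligned}
P(\text{one or more six}) &= P(A \cup B \cup C) \\
&= P(A) + P(B) + P(C) \\
&= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}
\end{aligned}$$
You should have noticed that $A$, $B$, and $C$ are not mutually exclusive events, so we should have used
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$$
Each of $AB$, $AC$, and $BC$ occurs 6 times in the 216 point sample space and so $P(AB) = \frac{1}{36} = P(AC) = P(BC)$. Also $P(ABC) = \frac{1}{216}$.
Therefore
$$P(A \cup B \cup C) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} - \frac{1}{36} - \frac{1}{36} - \frac{1}{36} + \frac{1}{216} = \frac{91}{216}.$$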
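Since the sample space has only 216 equally likely points, both solutions can be checked by listing every outcome. The sketch below is an illustration, with the helper `prob` and the event functions named here for convenience; it counts outcomes directly and then applies the three-event formula (4.2).

```python
from fractions import Fraction
from itertools import product

# All 216 equally likely outcomes of three rolls.
rolls = list(product(range(1, 7), repeat=3))

def prob(event):
    """Probability of an event given as a true/false function of an outcome."""
    return Fraction(sum(1 for w in rolls if event(w)), len(rolls))

def A(w): return w[0] == 6   # 6 on the first roll
def B(w): return w[1] == 6   # 6 on the second roll
def C(w): return w[2] == 6   # 6 on the third roll

# Direct count of "at least one 6".
direct = prob(lambda w: A(w) or B(w) or C(w))

# Inclusion-exclusion for three events.
incl_excl = (
    prob(A) + prob(B) + prob(C)
    - prob(lambda w: A(w) and B(w))
    - prob(lambda w: A(w) and C(w))
    - prob(lambda w: B(w) and C(w))
    + prob(lambda w: A(w) and B(w) and C(w))
)

# The flawed argument simply adds the three probabilities.
flawed = prob(A) + prob(B) + prob(C)

print(direct, incl_excl, flawed)
```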
Note: Rules 3, 4, and (indirectly) 5 link the concepts of addition of probabilities with unions of events, and complements. The next segment will consider intersection, multiplication of probabilities, and a concept known as independence. Making these linkages will make problem solving and the construction of probability models easier.
Problems:
4.2.1 Let $A$, $B$, and $C$ be events for which
$P(A) = 0.2$, $P(B) = 0.5$, $P(C) = 0.3$ and $P(AB) = 0.1$
(a) Find the largest possible value for $P(A \cup B \cup C)$
(b) For this largest value to occur, are the events $A$ and $C$ mutually exclusive, not mutually exclusive, or can this not be determined?
4.2.2 Prove that $P(A \cup B) = 1 - P(\bar{A}\bar{B})$ for arbitrary events $A$ and $B$ in $S$.
0.2+0.5-0.1+0.3=0.9
A and C are mutually exclusive.
4.3 Intersections of Events and Independence
Dependent and Independent Events:
Consider the events $A$: airplane engine fails in flight and $B$: airplane reaches its destination safely. Do we normally consider these events as related or dependent in some way? Certainly if a Canada Goose is sucked into one jet engine, that affects the probability that the airplane safely reaches its destination, i.e. it affects the probability that should be assigned to the event $B$. Suppose we toss a fair coin twice. What about the two events $A$: H is obtained on first toss and $B$: H is obtained on both tosses? Again there appears to be some dependence. On the other hand if we replace $B$ by $B$: H is obtained on second toss, we do not think that the occurrence of $A$ affects the chances that $B$ will occur. When we should reassess the probability of one event $B$ given that the event $A$ occurred we call a pair of events dependent, and otherwise we call them independent. We formalize this concept in the following mathematical definition.
Definition 7 Events $A$ and $B$ are independent if and only if $P(AB) = P(A)P(B)$. If they are not independent, we call the events dependent.
When we used Venn diagrams, we imagined that the probability of events was roughly proportional to their area. This is justified in part because area and probability are two examples of "measures" in mathematics and share much the same properties. Let us continue this tradition, so that in the figure below, the probability of events is represented by the area of the corresponding region. Then if two events are independent, the "size" of their intersection as measured by the probability measure is required to be the product of the individual probabilities. This means, of course, that the intersection must be non-empty, and so the events are not mutually exclusive$^{8}$. For example in the Venn diagram depicted in Figure 4.10, $P(A) = 0.3$, $P(B) = 0.4$ and $P(AB) = 0.12$ so in this case the two events are independent. If you were to hold the rectangle $A$ in place and move the rectangle $B$ down and to the right, the probability of the intersection as represented by the area would decrease and the events would become dependent.
For another example, suppose we toss a fair coin twice. Let $A$ = {head on 1st toss} and $B$ = {head on 2nd toss}. Clearly $A$ and $B$ are independent since the outcome on each toss is unrelated to other tosses, so $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(AB) = \frac{1}{4} = P(A)P(B)$.
$^{8}$ Can you think of a pair of events that are both independent and mutually exclusive? Suppose $P(A) = 0.5$ and $B$ is an event such that $AB = \varnothing$ and $P(B) = 0$. Then $P(A)P(B) = 0 = P(A \cap B)$ so this pair of events is independent. Does this make sense to you?
Figure 4.10: Suppose that the probability of a region is equal to its area (so that the area of $S$ is 1). Then this illustrates independent events $A$, $B$, with $P(AB) = P(A)P(B)$.
However, if we roll a die once and let $A$ = {the number is even} and $B$ = {number $> 3$} the events will be dependent since
$$P(A) = \frac{1}{2}, \quad P(B) = \frac{1}{2}, \quad P(AB) = P(4 \text{ or } 6 \text{ occurs}) = \frac{2}{6} \neq P(A)P(B).$$
(Rationale: $B$ only happens half the time. If $A$ occurs we know the number is 2, 4, or 6. So $B$ occurs $\frac{2}{3}$ of the time when $A$ occurs. The occurrence of $A$ does affect the chances of $B$ occurring so $A$ and $B$ are not independent.)
When there are more than 2 events, the above definition generalizes to:
Definition 8 The events $A_1, A_2, \ldots, A_n$ are independent if and only if
$$P(A_{i_1}A_{i_2}\cdots A_{i_k}) = P(A_{i_1})P(A_{i_2})\cdots P(A_{i_k})$$
for all sets $(i_1, i_2, \ldots, i_k)$ of distinct subscripts chosen from $(1, 2, \ldots, n)$$^{9}$
For example, for $n = 3$, we need
$$P(A_1A_2) = P(A_1)P(A_2),$$
$$P(A_1A_3) = P(A_1)P(A_3),$$
$$P(A_2A_3) = P(A_2)P(A_3)$$
and
$$P(A_1A_2A_3) = P(A_1)P(A_2)P(A_3)$$
Technically, we have defined "mutually independent" events, but we will shorten the name to "independent" to reduce confusion with "mutually exclusive".
$^{9}$ We need all subsets so that events are independent of combinations of other events. For example if $A_1$ is independent of $A_2$ and $A_4$ is to be independent of $A_1A_2$ then $P(A_1A_2A_4) = P(A_1A_2)P(A_4) = P(A_1)P(A_2)P(A_4)$.
The definition of independence works two ways. If we can find $P(A)$, $P(B)$, and $P(AB)$ then we can determine whether $A$ and $B$ are independent. Conversely, if we know (or assume) that $A$ and $B$ are independent, then we can use the definition as a rule of probability to calculate $P(AB)$. Examples of each follow.
Example: Toss a die twice. Let $A$ be the event that the first toss is a 3 and $B$ the event that the total is 7. Are $A$ and $B$ independent? (What do you think?) Using the definition to check, we get $P(A) = \frac{1}{6}$, $P(B) = \frac{6}{36}$ (points (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1) give a total of 7) and $P(AB) = \frac{1}{36}$ (only the point (3,4) makes $AB$ occur).
Therefore, $P(AB) = P(A)P(B)$ and so $A$ and $B$ are independent events.
Now suppose we define $C$ to be the event that the total is 8. This is a minor change from the definition of $B$.
Then
$$P(A) = \frac{1}{6}, \quad P(C) = \frac{5}{36} \quad \text{and} \quad P(AC) = \frac{1}{36}$$
Therefore $P(AC) \neq P(A)P(C)$
and consequently $A$ and $C$ are dependent events.
This example often puzzles students. Why are they independent if $B$ is "a total of 7" but dependent for $C$: "total is 8"? The key is that regardless of the first toss, there is always one number on the 2nd toss which makes the total 7. Since the probability of getting a total of 7 started off being $\frac{6}{36} = \frac{1}{6}$, the outcome of the 1st toss doesn't affect the chances. However, for any total other than 7, the outcome of the 1st toss does affect the chances of getting that total (e.g., a first toss of 1 guarantees the total cannot be 8)$^{10}$.
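The contrast between a total of 7 and a total of 8 can be seen by checking the definition $P(AB) = P(A)P(B)$ over all 36 outcomes. The sketch below is illustrative only; the helper names `prob` and `independent` are not from the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two tosses of a die.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false function of an outcome."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

def independent(e1, e2):
    """Check the definition P(E1 E2) = P(E1) P(E2) exactly."""
    return prob(lambda w: e1(w) and e2(w)) == prob(e1) * prob(e2)

def first_is_3(w): return w[0] == 3
def total_is(t): return lambda w: w[0] + w[1] == t

seven_independent = independent(first_is_3, total_is(7))
eight_independent = independent(first_is_3, total_is(8))

# Totals (2 to 12) that are independent of "first toss is 3".
independent_totals = [t for t in range(2, 13) if independent(first_is_3, total_is(t))]

print(seven_independent, eight_independent, independent_totals)
```

Listing `independent_totals` shows which totals behave like 7 for this particular first-toss event.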
$^{10}$ This argument is in terms of conditional probability, closely related to independence and to be treated in the next section.
Example: A random number generator on the computer can give a sequence of independent random digits chosen from $S = \{0, 1, \ldots, 9\}$. This means that (i) each digit has probability of $\frac{1}{10}$ of being any of $0, 1, \ldots, 9$, and (ii) events determined by the different trials are independent of one another. We call this an experiment with independent trials. Determine the probability that
(a) in a sequence of 5 trials, all the digits generated are odd
(b) the number 9 occurs for the first time on trial 10.
Solution:
(a) Define the events $A_i$: digit from trial $i$ is odd, $i = 1, \ldots, 5$.
Then
$$P(\text{all digits are odd}) = P(A_1A_2A_3A_4A_5) = \prod_{i=1}^{5} P(A_i),$$
since the $A_i$'s are mutually independent. Since $P(A_i) = \frac{1}{2}$, we get $P(\text{all digits are odd}) = \frac{1}{2^5}$.
(b) Define events $A_i$: 9 occurs on trial $i$, for $i = 1, 2, \ldots$. Then we want
$$P(\bar{A}_1\bar{A}_2\cdots\bar{A}_9A_{10}) = P(\bar{A}_1)P(\bar{A}_2)\cdots P(\bar{A}_9)P(A_{10}) = (.9)^9(.1),$$
because the $A_i$'s are independent, and $P(A_i) = 1 - P(\bar{A}_i) = 0.1$.
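Both answers are products of per-trial probabilities, so they can be evaluated exactly. As a sanity check on part (a), the sketch below also counts the all-odd sequences among the $10^5$ equally likely five-digit sequences. The variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# (a) Five independent trials, each odd with probability 5/10.
p_odd = Fraction(5, 10)
p_all_odd = p_odd ** 5

# Same probability by counting all-odd sequences among the 10**5 sequences.
all_odd_count = sum(
    1 for seq in product(range(10), repeat=5) if all(d % 2 == 1 for d in seq)
)
p_all_odd_counted = Fraction(all_odd_count, 10 ** 5)

# (b) Nine trials without a 9, followed by a 9 on trial 10.
p_nine = Fraction(1, 10)
p_first_nine_on_trial_10 = (1 - p_nine) ** 9 * p_nine

print(p_all_odd, p_all_odd_counted, float(p_first_nine_on_trial_10))
```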
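Note: We have used the fact here that if $A$ and $B$ are independent events, then so are $\bar{A}$ and $B$. To see this note that
$$B = AB \cup \bar{A}B$$
where $AB$ and $\bar{A}B$ are mutually exclusive events, so
$$P(B) = P(AB) + P(\bar{A}B).$$
Therefore
$$\begin{aligned}
P(\bar{A}B) &= P(B) - P(AB) \\
&= P(B) - P(A)P(B) \quad (\text{since } A \text{ and } B \text{ are independent}) \\
&= (1 - P(A))P(B) \\
&= P(\bar{A})P(B).
\end{aligned}$$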
Note: We have implicitly assumed independence of events by using the discrete uniform model in some of our earlier probability calculations. For example, suppose a coin is tossed 3 times, and we consider the sample space
$$S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$
(1/2)^5
(9/10)^9*1/10=
Assuming that the outcomes on the three tosses are independent, and that
$$P(H) = P(T) = \frac{1}{2}$$
on any single toss, we get that
$$P(HHH) = P(H)P(H)P(H) = \left(\frac{1}{2}\right)^3 = \frac{1}{8}.$$
Similarly, all the other simple events have probability $\frac{1}{8}$. In earlier calculations we implicitly assumed this was true by assigning the same probability $\frac{1}{8}$ to all possible outcomes without thinking directly about independence. However, it is clear that if somehow the 3 tosses were not independent then it might be a bad idea to assume each outcome had probability $\frac{1}{8}$. (For example, instead of heads and tails, suppose H stands for "rain" and T stands for "no rain" on a given day; now consider 3 consecutive days. Would you want to assign a probability of $\frac{1}{8}$ to each of the 8 simple events even if this were in a season when the probability of rain on a day was $\frac{1}{2}$?)
Note: The definition of independent events can thus be used either to check for independence or, if events are known to be independent, to calculate $P(AB)$. Many problems are not obvious, and scientific study is needed to determine if two events are independent. For example, are the events $A$ and $B$ independent if, for a random child living in a country, the events are defined as $A$: the child lives within 5 km of a nuclear power plant and $B$: the child has leukemia? Determining whether such events are dependent and if so the extent of the dependence are problems of substantial importance, and can be handled by methods in later statistics courses.
Problems:
4.3.1 A weighted die is such that $P(1) = P(2) = P(3) = 0.1$, $P(4) = P(5) = 0.2$, and $P(6) = 0.3$. Assume that events determined by different throws of the die are independent.
(a) If the die is thrown twice what is the probability the total is 9?
(b) If a die is thrown twice, and this process repeated 4 times, what is the probability the total will be 9 on exactly 1 of the 4 repetitions?
4.3.2 Suppose among UW students that 15% speak French and 45% are women. Suppose also that 20% of the women speak French. A committee of 10 students is formed by randomly selecting from UW students. What is the probability there will be at least 1 woman and at least 1 French speaking student on the committee?^11
4.3.3 Prove that $A$ and $B$ are independent events if and only if $\bar{A}$ and $\bar{B}$ are independent.
^11 Although the sampling is conducted without replacement, because the population is very large, whether we replace or
4.4 Conditional Probability
In many situations we may want to determine the probability of some event $A$, while knowing that some other event $B$ has already occurred. For example, what is the probability a randomly selected person is over 6 feet tall, given that she is female? Let the symbol $P(A|B)$ represent the probability that event $A$ occurs, when we know that $B$ occurs. We call this the conditional probability of $A$ given $B$. While we will give a definition of $P(A|B)$, let's first consider an example we looked at earlier, to get some sense of why $P(A|B)$ is defined as it is.
Example: Suppose we roll a die once so that the sample space is $S = \{1, 2, 3, 4, 5, 6\}$. Let $A$ be the event that the number is even and $B$ the event that the number is greater than 3. If we know that $B$ occurs, that tells us that we have a 4, 5, or 6. Of the times when $B$ occurs, we have an even number $\frac{2}{3}$ of the time. So $P(A|B) = \frac{2}{3}$. More formally, we could obtain this result by calculating $\frac{P(AB)}{P(B)}$, since $P(AB) = P(4 \text{ or } 6) = \frac{2}{6}$ and $P(B) = \frac{3}{6}$.
Definition 9 The conditional probability of event $A$, given event $B$, is
$$P(A|B) = \frac{P(AB)}{P(B)}, \quad \text{provided } P(B) \neq 0.$$
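As a small illustration of the definition, the die example can be checked by listing the sample space and counting. This is only a sketch: the helper names `prob` and `cond_prob` are made up for this illustration, and the equally likely model on the six faces is assumed.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                 # sample space for one roll of a fair die
A = {s for s in S if s % 2 == 0}       # event A: the number is even
B = {s for s in S if s > 3}            # event B: the number is greater than 3

def prob(event):
    """Probability of an event under the equally likely model on S."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) is nonzero."""
    if prob(b) == 0:
        raise ValueError("P(b) must be nonzero")
    return prob(a & b) / prob(b)

p_A_given_B = cond_prob(A, B)
print(p_A_given_B)                     # 2/3
```

The intersection `a & b` plays the role of the event $AB$, so the function is a direct transcription of the definition.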
Note: If $A$ and $B$ are independent, $P(AB) = P(A)P(B)$ so
$$P(A|B) = \frac{P(A)P(B)}{P(B)} = P(A).$$
This can be taken as an equivalent definition of independence; that is, $A$ and $B$ are independent iff $P(A|B) = P(A)$. We did not use this definition simply because it does not apply in the case that $P(B) = 0$. You should investigate the behaviour of the conditional probabilities as we move the events around on the web-site http://stat-www.berkeley.edu/%7Estark/Java/Venn3.htm.
Example: If a fair coin is tossed 3 times, find the probability that if at least 1 Head occurs, then exactly 1 Head occurs.
Solution: The sample space is $S = \{HHH, HHT, HTH, \ldots\}$. Define the events $A$: we obtain 1 Head, and $B$: we obtain at least 1 Head. What we are being asked to find is $P(A|B)$. This equals $P(AB)/P(B)$, and so we find
$$P(B) = 1 - P(0 \text{ heads}) = \frac{7}{8}$$
not will make little difference. Therefore assume in your calculations that sampling is with replacement so the 10 draws are independent.
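and
$$\begin{aligned}
P(AB) &= P(\text{we obtain one head AND we obtain at least one head}) \\
&= P(\text{we obtain one head}) \\
&= P(\{HTT, THT, TTH\}) \\
&= \frac{3}{8}
\end{aligned}$$
using either the sample space with equally probable points, or the fact that the 3 tosses are independent. Thus,
$$P(A|B) = \frac{P(AB)}{P(B)} = \frac{3/8}{7/8} = \frac{3}{7}.$$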
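The same answer can be reproduced by brute-force enumeration of the 8 equally likely outcomes. The sketch below assumes the fair-coin, independent-toss model used in the solution; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three tosses, e.g. "HHT"
outcomes = ["".join(t) for t in product("HT", repeat=3)]

A = {w for w in outcomes if w.count("H") == 1}   # exactly 1 head
B = {w for w in outcomes if w.count("H") >= 1}   # at least 1 head

p_B = Fraction(len(B), len(outcomes))            # 7/8
p_AB = Fraction(len(A & B), len(outcomes))       # 3/8
p_A_given_B = p_AB / p_B

print(p_B, p_AB, p_A_given_B)                    # 7/8 3/8 3/7
```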
Example: The probability a randomly selected male is colour-blind is .05, whereas the probability a female is colour-blind is only .0025. If the population is 50% male, what is the fraction that is colour-blind?
Solution: Let $C$ be the event that the person selected is colour-blind, $M$ the event that the person selected is male and $F = \bar{M}$ the event that the person selected is female. We are asked to find $P(C)$. We are told that
$$P(C|M) = 0.05, \quad P(C|F) = 0.0025, \quad \text{and} \quad P(M) = 0.5 = P(F).$$
Note that from the definition of conditional probability
$$P(C|M)P(M) = \frac{P(CM)}{P(M)} P(M) = P(CM)$$
and similarly $P(C|F)P(F) = P(CF)$. To get $P(C)$ we can therefore use the fact that $C = CM \cup CF$ and the events $CM$ and $CF$ are mutually exclusive so
$$\begin{aligned}
P(C) &= P(CM) + P(CF) \\
&= P(C|M)P(M) + P(C|F)P(F) \\
&= (0.05)(0.5) + (0.0025)(0.5) \\
&= 0.02625.
\end{aligned}$$
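The arithmetic in this solution is a weighted average of the two conditional probabilities, which the following short sketch makes explicit. The variable names are chosen here for illustration and are not from the notes.

```python
# Given quantities from the example
p_M = 0.5              # P(M): selected person is male
p_F = 0.5              # P(F): selected person is female
p_C_given_M = 0.05     # P(C | M)
p_C_given_F = 0.0025   # P(C | F)

# P(C) = P(C|M)P(M) + P(C|F)P(F), since CM and CF are mutually exclusive
p_CM = p_C_given_M * p_M
p_CF = p_C_given_F * p_F
p_C = p_CM + p_CF

print(round(p_C, 5))   # 0.02625
```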
4.5 Multiplication and Partition Rules
The preceding example suggests two more useful probability rules. They are based on breaking events of interest into mutually exclusive pieces.
Rule 6. Product rules. Let $A, B, C, D, \ldots$ be arbitrary events in a sample space. Assume that $P(A) > 0$, $P(AB) > 0$, and $P(ABC) > 0$. Then
$$\begin{aligned}
P(AB) &= P(A)P(B|A) \\
P(ABC) &= P(A)P(B|A)P(C|AB) \\
P(ABCD) &= P(A)P(B|A)P(C|AB)P(D|ABC)
\end{aligned}$$
and so on.
Proof:
The first rule comes directly from the definition of $P(B|A)$ since
$$P(A)P(B|A) = P(A)\frac{P(AB)}{P(A)} = P(AB), \quad \text{assuming } P(A) > 0.$$
The right hand side of the second rule equals (assuming $P(AB) > 0$ and $P(A) > 0$)
$$\begin{aligned}
P(A)P(B|A)P(C|AB) &= P(A)\frac{P(AB)}{P(A)}P(C|AB) \\
&= P(AB)P(C|AB) \\
&= P(AB)\frac{P(CAB)}{P(AB)} \\
&= P(ABC),
\end{aligned}$$
and so on.
In order to remember these rules you can imagine that the events unfold in some chronological order, even if they do not. For example $P(ABCD) = P(A)P(B|A)P(C|AB)P(D|ABC)$ could be interpreted as the probability that $A$ occurs (first) and then "given $A$ occurs, that $B$ occurs" (next), etc.
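A quick numerical check of the second product rule can be made on a small sample space. The sketch below uses two fair dice and three events picked only for this illustration (they do not appear in the notes); both sides of $P(ABC) = P(A)P(B|A)P(C|AB)$ are computed exactly by counting.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), 36 equally likely points
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

def cond(a, b):
    """P(a | b), assuming P(b) > 0."""
    return prob(a & b) / prob(b)

# Illustrative events (assumed for this example)
A = {s for s in S if s[0] % 2 == 0}    # first die is even
B = {s for s in S if sum(s) >= 8}      # total is at least 8
C = {s for s in S if s[1] == 6}        # second die shows 6

lhs = prob(A & B & C)
rhs = prob(A) * cond(B, A) * cond(C, A & B)
print(lhs, rhs)                        # 1/12 1/12
```

Reading the right hand side left to right mirrors the "chronological" interpretation: first $A$, then $B$ given $A$, then $C$ given both.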
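Partition Rule. Let $A_1, \ldots, A_k$ be a partition of the sample space $S$ into disjoint (mutually exclusive) events, that is
$$A_1 \cup A_2 \cup \cdots \cup A_k = S \quad \text{and} \quad A_i \cap A_j = \varnothing \text{ if } i \neq j.$$
Let $B$ be an arbitrary event in $S$. Then
$$\begin{aligned}
P(B) &= P(BA_1) + P(BA_2) + \cdots + P(BA_k) \\
&= \sum_{i=1}^{k} P(B|A_i)P(A_i).
\end{aligned}$$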
Proof: Note that the events $BA_1, \ldots, BA_k$ are all mutually exclusive and their union is $B$, that is $B = (BA_1) \cup \cdots \cup (BA_k)$. Therefore $P(B) = P(BA_1) + P(BA_2) + \cdots + P(BA_k)$. By the product rule, $P(BA_i) = P(B|A_i)P(A_i)$ so this becomes $P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_k)P(A_k)$.
Example: In an insurance portfolio 10% of the policy holders are in Class $A_1$ (high risk), 40% are in Class $A_2$ (medium risk), and 50% are in Class $A_3$ (low risk). The probability there is a claim on a Class $A_1$ policy in a given year is .10; similar probabilities for Classes $A_2$ and $A_3$ are .05 and .02. Find the probability that if a claim is made, it is made on a Class $A_1$ policy.
Solution: For a randomly selected policy, let
$B$ = {policy has a claim}
$A_i$ = {policy is of Class $A_i$}, $i = 1, 2, 3$
We are asked to find $P(A_1|B)$. Note that
$$P(A_1|B) = \frac{P(A_1B)}{P(B)}$$
and that
$$P(B) = P(A_1B) + P(A_2B) + P(A_3B).$$
We are told that
$$P(A_1) = 0.10, \quad P(A_2) = 0.40, \quad P(A_3) = 0.50$$
and that
$$P(B|A_1) = 0.10, \quad P(B|A_2) = 0.05, \quad P(B|A_3) = 0.02.$$
Thus
$$\begin{aligned}
P(A_1B) &= P(A_1)P(B|A_1) = .01 \\
P(A_2B) &= P(A_2)P(B|A_2) = .02 \\
P(A_3B) &= P(A_3)P(B|A_3) = .01
\end{aligned}$$
Therefore $P(B) = .04$ and $P(A_1|B) = .01/.04 = .25$.
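The solution above follows a fixed recipe: multiply each class probability by its claim probability, add the products to get $P(B)$, then divide. A minimal sketch of that recipe, with dictionary names invented for this illustration, is:

```python
# P(A_i): proportion of policies in each class (from the example)
prior = {1: 0.10, 2: 0.40, 3: 0.50}
# P(B | A_i): probability of a claim for a policy in each class
claim_given_class = {1: 0.10, 2: 0.05, 3: 0.02}

# Product rule: P(A_i B) = P(A_i) P(B | A_i)
joint = {i: prior[i] * claim_given_class[i] for i in prior}

# Partition rule: P(B) is the sum of the joint probabilities
p_claim = sum(joint.values())

# Conditional probability: P(A_i | B) = P(A_i B) / P(B)
posterior = {i: joint[i] / p_claim for i in joint}

print(round(p_claim, 4), round(posterior[1], 4))   # 0.04 0.25
```

Although high risk policies are only 10% of the portfolio, they account for a quarter of the claims, the same share as the low risk class that is five times larger.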
Tree Diagrams
Tree diagrams can be a useful device for keeping track of conditional probabilities when using multiplication and partition rules. The idea is to draw a tree where each path represents a sequence of events. On any given branch of the tree we write the conditional probability of that event given all the events on branches leading to it. The probability at any node of the tree is obtained by multiplying the probabilities on the branches leading to the node, and equals the probability of the intersection of the events leading to it.
For example, the immediately preceding example could be represented by the tree in Figure 4.11. Note that the probabilities on the terminal nodes must add up to 1.
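[Figure 4.11: Tree diagram for the insurance example. First-level branches: $P(A_1) = .1$, $P(A_2) = .4$, $P(A_3) = .5$. Second-level branches: $P(B|A_1) = .1$, $P(B|A_2) = .05$, $P(B|A_3) = .02$, with complementary branches .9, .95, .98. Terminal nodes: $P(A_1B) = .01$, $P(A_2B) = .02$, $P(A_3B) = .01$, $P(A_1\bar{B}) = .09$, $P(A_2\bar{B}) = .38$, $P(A_3\bar{B}) = .49$.]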
Here is another example involving diagnostic tests for disease. See if you can represent the problem by a tree.
Example. Testing for HIV
Tests used to diagnose medical conditions are often imperfect, and give false positive or false negative results, as described in Problem 2.6 of Chapter 2. A fairly cheap blood test for the Human Immunodeficiency Virus (HIV) that causes AIDS (Acquired Immune Deficiency Syndrome) has the following characteristics: the false negative rate is 2% and the false positive rate is 0.5%. It is assumed that around .04% of Canadian males are infected with HIV.
Find the probability that if a male tests positive for HIV, he actually has HIV.
Solution: Suppose a male is randomly selected from the population, and define the events
$A$ = {selected male has HIV}
$B$ = {blood test is positive}
We are asked to find $P(A|B)$. From the information given we know that
$$P(A) = .0004, \quad P(\bar{A}) = .9996$$
$$P(B|A) = .98, \quad P(B|\bar{A}) = .005$$
Therefore we can find
$$\begin{aligned}
P(AB) &= P(A)P(B|A) = .000392 \\
P(\bar{A}B) &= P(\bar{A})P(B|\bar{A}) = .004998
\end{aligned}$$
Therefore $P(B) = P(AB) + P(\bar{A}B) = .00539$ and
$$P(A|B) = \frac{P(AB)}{P(B)} = .0727$$
Thus, if a randomly selected male tests positive, there is still only a small probability (.0727) that they actually have HIV!
Exercise: Try to explain in ordinary words why this is the case.
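The calculation in the HIV example can be packaged as a small function of three inputs. This is a sketch only; the function name `posterior` and its parameter names are assumptions made for this illustration.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """P(A | B) for a test result B, from P(A), P(B | A) and P(B | not A)."""
    p_ab = prior * p_pos_given_a                 # P(AB), product rule
    p_not_a_b = (1 - prior) * p_pos_given_not_a  # P(A-bar B), product rule
    p_b = p_ab + p_not_a_b                       # P(B), partition rule
    return p_ab / p_b

# Values from the example: P(A) = .0004, false negative rate 2%,
# false positive rate 0.5%
p_hiv_given_positive = posterior(0.0004, 0.98, 0.005)
print(round(p_hiv_given_positive, 4))            # 0.0727
```

Trying other values of `prior` in this function shows how strongly the answer depends on how rare the condition is in the population being tested.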
Note: Bayes Theorem. Bayes theorem allows us to write conditional probabilities in terms of similar conditional probabilities but with the order of conditioning reversed:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B|\bar{A})P(\bar{A}) + P(B|A)P(A)}$$
The proof of this result is simple since using the product rule,
$$\begin{aligned}
\frac{P(B|A)P(A)}{P(B|\bar{A})P(\bar{A}) + P(B|A)P(A)} &= \frac{P(AB)}{P(\bar{A}B) + P(AB)} \\
&= \frac{P(AB)}{P(B)} \quad \text{by the partition rule} \\
&= P(A|B)
\end{aligned}$$
This result is called Bayes Theorem, after a mathematician^12 who proved it in the 1700s. It is a simple theorem, but it has inspired approaches to problems in statistics and other areas such as machine learning, classification and pattern recognition. In these areas the term "Bayesian methods" is often used.
Problems:
4.4.1 If you take a bus to work in the morning there is a 20% chance you'll arrive late. When you go by bicycle there is a 10% chance you'll be late. 70% of the time you go by bike, and 30% by bus. Given that you arrive late, what is the probability you took the bus?
^12 (Rev) Thomas Bayes (1702-1761) was an English Nonconformist minister, turned Presbyterian. He may have been tutored by De Moivre. His famous paper introducing this rule was published after his death. "Bayesians" are statisticians who opt for a purely probabilistic view of inference. All unknowns obtain from some distribution and ultimately, the distribution says it all.
4.4.2 A box contains 4 coins: 3 fair coins and 1 biased coin for which $P(\text{heads}) = .8$. A coin is picked at random and tossed 6 times. It shows 5 heads. Find the probability this coin is fair.
4.4.3 At a police spot check, 10% of cars stopped have defective headlights and a faulty muffler. 15% have defective headlights and a muffler which is satisfactory. If a car which is stopped has defective headlights, what is the probability that the muffler is also faulty?
4.6 Problems on Chapter 4
4.1 If $A$ and $B$ are mutually exclusive events with $P(A) = 0.25$ and $P(B) = 0.4$, find the probability of each of the following events:

;

1; 1; 1;


1;


1; 1.
4.2 Three digits are chosen at random with replacement from 0, 1, ..., 9; find the probability of each of the following events.
$A$: all three digits are the same; $B$: all three digits are different;
$C$: the digits are all nonzero; $D$: the digits all exceed 4;
$E$: digits all have the same parity (all odd or all even).
Then find the probability of each of the following events, which are combinations of the previous five events:
$BE$; $B \cup D$; $B \cup D \cup E$; $(A \cup B)D$; $A \cup (BD)$.
Show the last two of these events in Venn diagrams.
4.3 Let $A$ and $B$ be events defined on the same sample space, with $P(A) = 0.3$, $P(B) = 0.4$ and $P(A|B) = 0.5$. Given that event $B$ does not occur, what is the probability of event $A$?
4.4 A die is loaded to give the probabilities:
number        1    2    3    4    5    6
probability   .3   .1   .15  .15  .15  .15
The die is thrown 8 times. Events determined by different throws of the die are assumed independent. Find the probability
(a) 1 does not occur
(b) 2 does not occur
(c) neither 1 nor 2 occurs
(d) both 1 and 2 occur.
4.5 Events $A$ and $B$ are independent with $P(A) = .3$ and $P(B) = .2$. Find $P(A \cup B)$.
4.6 Students $A$, $B$ and $C$ each independently answer a question on a test. The probability of getting the correct answer is .9 for $A$, .7 for $B$ and .4 for $C$. If 2 of them get the correct answer, what is the probability $C$ was the one with the wrong answer?
4.7 Customers at a store independently decide whether to pay by credit card or with cash. Suppose the probability is 70% that a customer pays by credit card. Find the probability
(a) 3 out of 5 customers pay by credit card
(b) the 5th customer is the 3rd one to pay by credit card.
4.8 Let $E$ and $F$ be independent with $E = A \cup B$ and $F = AB$. Prove that either $P(AB) = 0$ or else $P(\bar{A}\,\bar{B}) = 0$.
4.9 In a large population, people are one of 3 genetic types $A$, $B$ and $C$: 30% are type $A$, 60% type $B$ and 10% type $C$. The probability a person carries another gene making them susceptible for a disease is .05 for $A$, .04 for $B$ and .02 for $C$. If ten unrelated persons are selected, what is the probability at least one is susceptible for the disease?
4.10 Two baseball teams play a best-of-seven series, in which the series ends as soon as one team wins four games. The first two games are to be played on $A$'s field, the next three games on $B$'s field, and the last two on $A$'s field. The probability that $A$ wins a game is 0.7 at home and 0.5 away. Assume that the results of the games are independent. Find the probability that:
(a) $A$ wins the series in 4 games; in 5 games;
(b) the series does not go to 6 games.
4.11 A population consists of $F$ females and $M$ males; the population includes $f$ female smokers and $m$ male smokers. An individual is chosen at random from the population. If $A$ is the event that this individual is female and $B$ is the event he or she is a smoker, find necessary and sufficient conditions on $f$, $m$, $F$ and $M$ so that $A$ and $B$ are independent events.
4.12 An experiment has three possible outcomes $A$, $B$ and $C$ with respective probabilities $p$, $q$ and $r$, where $p + q + r = 1$. The experiment is repeated until either outcome $A$ or outcome $B$ occurs. Show that $A$ occurs before $B$ with probability $p/(p + q)$.
4.13 In the game of craps, a player rolls two dice. They win at once if the total is 7 or 11, and lose at once if the total is 2, 3, or 12. Otherwise, they continue rolling the dice until they either win by throwing their initial total again, or lose by rolling 7.
Show that the probability they win is 0.493.
(Hint: You can use the result of Problem 4.12)
4.14 A researcher wishes to estimate the proportion $p$ of university students who have cheated on an examination. The researcher prepares a box containing 100 cards, 20 of which contain Question A and 80 Question B.
Question A: Were you born in July or August?
Question B: Have you ever cheated on an examination?
Each student who is interviewed draws a card at random with replacement from the box and answers the question it contains. Since only the student knows which question he or she is answering, confidentiality is assured and so the researcher hopes that the answers will be truthful^13. It is known that one-sixth of birthdays fall in July or August.
(a) What is the probability that a student answers "yes"?
(b) If $x$ of $n$ students answer "yes", estimate $p$.
(c) What proportion of the students who answer "yes" are responding to Question B?
4.15 Diagnostic tests. Recall the discussion of diagnostic tests in Problem 2.6 for Chapter 2. For a randomly selected person let $D$ = "person has the disease" and $R$ = "the test result is positive". Give estimates of the following probabilities: $P(R|D)$, $P(R|\bar{D})$, $P(R)$.
4.16 Slot machines. Standard slot machines have three wheels, each marked with some number of symbols at equally spaced positions around the wheel. For this problem suppose there are 10 positions on each wheel, with three different types of symbols being used: flower, dog, and house. The three wheels spin independently and each has probability 0.1 of landing at any position. Each of the symbols (flower, dog, house) is used in a total of 10 positions across the three wheels. A payout occurs whenever all three symbols showing are the same.
(a) If wheels 1, 2, 3 have 2, 6, and 2 flowers, respectively, what is the probability all three positions show a flower?
(b) In order to minimize the probability of all three positions showing a flower, what number of flowers should go on wheels 1, 2 and 3? Assume that each wheel must have at least one flower.
4.17 Spam detection 1. Many methods of spam detection are based on words or features that appear much more frequently in spam than in regular email. Conditional probability methods are then used to decide whether an email is spam or not. For example, suppose we define the following events associated with a random email message.
Spam = "Message is spam"
Not Spam = "Message is not spam (regular)"
A = "Message contains the word Viagra"
If we know the values of the probabilities $P(\text{Spam})$, $P(A|\text{Spam})$ and $P(A|\text{Not Spam})$, then we can find the probabilities $P(\text{Spam}|A)$ and $P(\text{Not Spam}|A)$.
^13 "A foolish faith in authority is the worst enemy of truth" Albert Einstein, 1901.
(a) From a study of email messages coming into a certain system it is estimated that $P(\text{Spam}) = .5$, $P(A|\text{Spam}) = .2$, and $P(A|\text{Not Spam}) = .001$. Find $P(\text{Spam}|A)$ and $P(\text{Not Spam}|A)$.
(b) If you declared that any email containing the word Viagra was Spam, then find what fraction of regular emails would be incorrectly identified as Spam.
4.18 Spam detection 2. The method in part (b) of the preceding question would only filter out 20% of Spam messages. (Why?) To increase the probability of detecting spam, we can use a larger set of email "features"; these could be words or other features of a message which tend to occur with much different probabilities in spam and in regular email. (From your experience, what might be some useful features?) Suppose we identify $n$ binary features, and define events
$A_i$ = feature $i$ appears in a message.
We will assume that $A_1, \ldots, A_n$ are independent events, given that a message is spam, and that they are also independent events, given that a message is regular.
Suppose $n = 3$ and that
$P(A_1|\text{Spam}) = .2$    $P(A_1|\text{Not Spam}) = .005$
$P(A_2|\text{Spam}) = .1$    $P(A_2|\text{Not Spam}) = .004$
$P(A_3|\text{Spam}) = .1$    $P(A_3|\text{Not Spam}) = .005$
Assume as in the preceding question that $P(\text{Spam}) = .5$.
(a) Suppose a message has all of features 1, 2, and 3 present. Determine $P(\text{Spam}|A_1A_2A_3)$.
(b) Suppose a message has features 1 and 2 present, but feature 3 is not present. Determine $P(\text{Spam}|A_1A_2\bar{A}_3)$.
(c) If you declared as spam any message with one or more of features 1, 2 or 3 present, what fraction of spam emails would you detect?
4.19 Online fraud detection. Methods like those in problems 4.17 and 4.18 are also used in monitoring events such as credit card transactions for potential fraud. Unlike the case of spam email, however, the fraction of transactions that are fraudulent is usually very small. What we hope to do in this case is to "flag" certain transactions so that they can be checked for potential fraud, and perhaps to block (deny) certain transactions. This is done by identifying features of a transaction so that if $F$ = "transaction is fraudulent", then
$$r = \frac{P(\text{feature present}|F)}{P(\text{feature present}|\bar{F})}$$
is large.
(a) Suppose $P(F) = 0.0005$ and that $P(\text{feature present}|\bar{F}) = .02$. Determine $P(F|\text{feature present})$ as a function of $r$, and give the values when $r = 10$, 30 and 100.
(b) Suppose $r = 100$ and you decide to flag transactions with the feature present. What percentage of transactions would be flagged? Does this seem like a good idea?
4.20 Challenge problem: $n$ music lovers have reserved seats in a theatre containing a total of $n + k$ seats ($k$ seats are unassigned). The first person who enters the theatre, however, lost his seat assignment and chooses a seat at random. Subsequently, people enter the theatre one at a time and sit in their assigned seat unless it is already occupied. If it is, they choose a seat at random from the remaining empty seats. What is the probability that person $n$, the last person to enter the theatre, finds their seat already occupied?
4.21 Challenge problem: (Monty Hall) You have been chosen as finalist on a television show. For your prize, the host shows you three doors. Behind one door is a sports car, and behind the other two are goats. After you choose one door, the host, who knows what is behind each of the three doors, opens one (never the one you chose or the one with the car) and then says: "You are allowed to switch the door you chose if you find that advantageous". Should you switch?
5. Discrete Random Variables and
Probability Models
5.1 Random Variables and Probability Functions
Probability models are used to describe outcomes associated with random processes. So far we have used sets $A, B, C, \ldots$ in sample spaces to describe such outcomes. In this chapter we introduce numerical-valued variables $X, Y, \ldots$ to describe outcomes. This allows probability models to be manipulated easily using ideas from algebra, calculus, or geometry.
A random variable (r.v.) is a numerical-valued variable that represents outcomes in an experiment or random process. For example, suppose a coin is tossed 3 times; then
$X$ = Number of Heads that occur
would be a random variable. Associated with any random variable is a range $A$, which is the set of possible values for the variable. For example, the random variable $X$ defined above has range $A = \{0, 1, 2, 3\}$.
Random variables are denoted by capital letters like $X, Y, \ldots$ and their possible values are denoted by $x, y, \ldots$. This gives a nice short-hand notation for outcomes: for example, "$X = 2$" in the experiment above stands for "2 heads occurred".
Random variables are always defined for every outcome of the random experiment, i.e. for every outcome $a \in S$. For each possible value $x$ of the random variable $X$, there is a corresponding set of outcomes $a$ in the sample space $S$ which results in this value of $x$ (i.e. so that "$X = x$" occurs). In rigorous mathematical treatments of probability, a random variable is defined as a function on a sample space, as follows:
Definition 10 A random variable is a function that assigns a real number to each point in a sample space $S$.
To understand this definition, consider the experiment in which a coin is tossed 3 times, and suppose that we used the sample space
$$S = \{HHH, THH, HTH, HHT, HTT, THT, TTH, TTT\}$$
and define a random variable as $X$ = Number of heads. In this case the range of the random variable, or the set of possible values of $X$, is the set $\{0, 1, 2, 3\}$. For points in the sample space, for example $a = THH$, the value of the function $X(a)$ is obtained by counting the number of heads, $X(a) = 2$ in this case. Each of the outcomes "$X = x$" (where $X$ = number of heads) represents an event (either simple or compound). For example they are as follows:

Event     Definition of this event
X = 0     {TTT}
X = 1     {HTT, THT, TTH}
X = 2     {HHT, HTH, THH}
X = 3     {HHH}
Table 4.1
and since some value of $X$ in the range $A$ must occur, the events of the form "$X = x$" for $x \in A$ form a partition of the sample space $S$. For example the events in the second column of Table 4.1 are mutually exclusive (for example $\{TTT\} \cap \{HTT, THT, TTH\} = \varnothing$) and their union is the whole sample space: $\{TTT\} \cup \{HTT, THT, TTH\} \cup \{HHT, HTH, THH\} \cup \{HHH\} = S$.
As you may recall, a function is a mapping of each point in a domain into a unique point. For example, the function $f(x) = x^3$ maps the point $x = 2$ in the domain into the point $f(2) = 8$ in the range. We are familiar with this rule for mapping being defined by a mathematical formula. However, the rule for mapping a point in the sample space (domain) into the real number in the range of a random variable is often given in words rather than by a formula. As mentioned above, we generally denote random variables, in the abstract, by capital letters ($X$, $Y$, etc.) and denote the actual numbers taken by random variables by small letters ($x$, $y$, etc.). You should know that there is a difference between a function ($f(x)$ or $X(a)$) and the value of a function (for example $f(2)$ or $X(a) = 2$).
Since "$X = x$" represents an event of some kind, we will be interested in its probability, which we write as $P(X = x)$. In the above example in which a fair coin is tossed three times, we might wish the probability that $X$ is equal to 2, or $P(X = 2)$. This is $P(\{HHT, HTH, THH\}) = \frac{3}{8}$ in the example.
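The idea that a random variable is a function on the sample space can be made concrete in a few lines. The sketch below writes $X$ as an ordinary Python function and groups the 8 equally likely outcomes by the value it assigns; the names used are illustrative assumptions only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin
S = ["".join(t) for t in product("HT", repeat=3)]

def X(a):
    """Random variable: maps a sample point such as 'THH' to its number of heads."""
    return a.count("H")

# Count how many sample points are mapped to each value x
counts = Counter(X(a) for a in S)

# P(X = x) under the equally likely model
f = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}

print(X("THH"))   # 2
print(f[2])       # 3/8
```

Each key of `f` corresponds to one row of Table 4.1, and the values sum to 1 because the events "$X = x$" partition $S$.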
We classify random variables into two types, according to how big their range of values is:
Discrete random variables take integer values or, more generally, values in a countable set (recall that a set is countable if its elements can be placed in a one-one correspondence with a subset of the positive integers).
Continuous random variables take values in some interval of real numbers like $(0, 1)$ or $(0, \infty)$ or $(-\infty, \infty)$. You should be aware that the cardinality of the real numbers in an interval is NOT countable.
Examples of each might be

Discrete                            Continuous^14
number of people in a car           total weight of people in a car
number of cars in a parking lot     distance between cars in a parking lot
number of phone calls to 911        time between calls to 911

In theory there could also be mixed random variables which are discrete-valued over part of their range and continuous-valued over some other portion of their range. We will ignore this possibility here and concentrate first on discrete random variables. Continuous random variables are considered in Chapter 9.
Our aim is to set up general models which describe how the probability is distributed among the possible values a random variable can take. To do this we define for any discrete random variable $X$ the probability function.
Definition 11 The probability function (p.f.) of a random variable $X$ is the function
$$f(x) = P(X = x), \quad \text{defined for all } x \in A.$$
The set of pairs $\{(x, f(x)) : x \in A\}$ is called the probability distribution of $X$. All probability functions must have two properties:
1. $f(x) \geq 0$ for all values of $x$ (i.e. for $x \in A$)
2. $\sum_{\text{all } x \in A} f(x) = 1$
By implication, these properties ensure that $f(x) \leq 1$ for all $x$. We consider a few "toy" examples before dealing with more complicated problems.
Example: Let $X$ be the number obtained when a die is thrown. We would normally use the probability function $f(x) = 1/6$ for $x = 1, 2, 3, \ldots, 6$. In fact there probably is no absolutely perfect die in existence. For most dice, however, the 6 sides will be close enough to being equally likely that $f(x) = 1/6$ is a satisfactory model for the distribution of probability among the possible outcomes.
Example: Suppose a "fair" coin is tossed 3 times, with the results on the three tosses independent, and let $X$ be the total number of heads occurring. Refer to Table 4.1 and compute the probabilities of the four events listed there; you obtain

Event     Definition of this event     P(X = x)
X = 0     {TTT}                        1/8
X = 1     {HTT, THT, TTH}              3/8
X = 2     {HHT, HTH, THH}              3/8
X = 3     {HHH}                        1/8
Table 4.2

Thus the probability function has values $f(0) = \frac{1}{8}$, $f(1) = \frac{3}{8}$, $f(2) = \frac{3}{8}$, $f(3) = \frac{1}{8}$. In this case it is easy to see that the number of points in each of the four events of the form "$X = x$" is $\binom{3}{x}$ using the counting arguments of Chapter 3, so we can give a simple algebraic expression for the probability function,
$$f(x) = \frac{\binom{3}{x}}{8} \quad \text{for } x = 0, 1, 2, 3.$$
Example 3: Find the value of $k$ which makes $f(x)$ below a probability function.

x       0     1     2     3
f(x)    k     2k    0.3   4k

Since the probability of all possible outcomes must add to one, $\sum_{x=0}^{3} f(x) = 1$, giving $7k + 0.3 = 1$. Hence $k = 0.1$.
While the probability function is the most common way of describing a probability model, there are other possibilities. One of them is by using the cumulative distribution function (c.d.f.).
Definition 12 The cumulative distribution function (c.d.f.) of $X$ is the function usually denoted by $F(x)$,
$$F(x) = P(X \leq x),$$
defined for all real numbers $x$.
In the last example, with $k = 0.1$, the range of values for the random variable is $A = \{0, 1, 2, 3\}$ and we have for $x \in A$

x     f(x)    F(x) = P(X ≤ x)
0     0.1     0.1
1     0.2     0.3
2     0.3     0.6
3     0.4     1

[Figure 5.1: A simple cumulative distribution function. Step plot of $F(x)$ against $x$.]
Note that the values in the third column are partial sums of the values of the probability function in the second column. For example,
$$\begin{aligned}
F(1) &= P(X \leq 1) = P(X = 0) + P(X = 1) = f(0) + f(1) = 0.3 \\
F(2) &= P(X \leq 2) = f(0) + f(1) + f(2) = 0.6.
\end{aligned}$$
Similarly, $F(x)$ is defined for real numbers $x$ not in the range of the random variable, for example
$$F(2.5) = F(2) = 0.6 \quad \text{and} \quad F(3.8) = 1.$$
The c.d.f. for this example is plotted in Figure 5.1.
In general, $F(x)$ can be obtained from $f(x)$ by the fact that
$$F(x) = P(X \leq x) = \sum_{u \leq x} f(u).$$
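The partial-sum relationship between $f$ and $F$ is easy to mirror in code. The sketch below uses the probability function of Example 3 with $k$ solved from $7k + 0.3 = 1$; exact fractions are used to avoid rounding, and the names `xs`, `f` and `F` are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import accumulate

xs = [0, 1, 2, 3]                          # range of X
k = (1 - Fraction(3, 10)) / 7              # solves 7k + 0.3 = 1, so k = 1/10
f = [k, 2 * k, Fraction(3, 10), 4 * k]     # f(0), f(1), f(2), f(3)

# Values of F at the points of the range: running totals of f
F_at_xs = list(accumulate(f))

def F(x):
    """F(x) = P(X <= x) for any real x: add f(u) over range values u <= x."""
    return sum(p for u, p in zip(xs, f) if u <= x)

print([str(v) for v in F_at_xs])           # ['1/10', '3/10', '3/5', '1']
print(F(2.5), F(3.8), F(-1))               # 3/5 1 0
```

Evaluating `F` between range points returns the value at the nearest range point below, which is the flat part of each step in the c.d.f. plot.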
A c.d.f. $F(x)$ has certain properties, just as a probability function $f(x)$ does. Obviously, since it represents a probability, $F(x)$ must be between 0 and 1. In addition it must be a non-decreasing function (e.g. $P(X \leq 8)$ cannot be less than $P(X \leq 7)$). Thus we note the following properties of a c.d.f. $F(x)$:
1. $F(x)$ is a non-decreasing function of $x$.
2. $0 \leq F(x) \leq 1$ for all $x$.
3. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
We have noted above that $F(x)$ can be obtained from $f(x)$. The opposite is also true; for example the following result holds:
If $X$ takes on integer values then for values $x$ such that $x \in A$ and $x - 1 \in A$,
$$f(x) = F(x) - F(x - 1).$$
This says that $f(x)$ is the size of the jump in $F(x)$ at the point $x$.
To prove this, just note that
$$F(x) - F(x - 1) = P(X \leq x) - P(X \leq x - 1) = P(X = x).$$
When a random variable has been defined it is sometimes simpler to find its probability function (p.f.) $f(x)$ first, and sometimes it is simpler to find $F(x)$ first. The following example gives two approaches for the same problem.
Example: Suppose that $N$ balls labelled $1, 2, \ldots, N$ are placed in a box, and $n$ balls ($n \leq N$) are randomly selected without replacement. Define the r.v.
$X$ = largest number selected
Find the probability function for $X$.
Solution 1: If $X = x$ then we must select the number $x$ plus $n - 1$ numbers from the set $\{1, 2, \ldots, x - 1\}$. (Note that this means we need $x \geq n$.) This gives
$$f(x) = P(X = x) = \frac{\binom{1}{1}\binom{x-1}{n-1}}{\binom{N}{n}} = \frac{\binom{x-1}{n-1}}{\binom{N}{n}}, \quad x = n, n + 1, \ldots, N.$$
Solution 2: First find $F(x) = P(X \leq x)$. Noting that $X \leq x$ if and only if all $n$ balls selected are from the set $\{1, 2, \ldots, x\}$, we get
$$F(x) = \frac{\binom{x}{n}}{\binom{N}{n}} \quad \text{for } x = n, n + 1, \ldots, N.$$
We can now find
$$\begin{aligned}
f(x) &= F(x) - F(x - 1) \\
&= \frac{\binom{x}{n} - \binom{x-1}{n}}{\binom{N}{n}} \\
&= \frac{\binom{x-1}{n-1}}{\binom{N}{n}}
\end{aligned}$$
as before.

[Figure 5.2: Probability histogram for $f(x) = \frac{x+1}{10}$, $x = 0, 1, 2, 3$.]
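For the largest-number example, the agreement between the two solutions can be checked numerically for particular values of $N$ and $n$. This is a sketch under assumed values ($N = 10$, $n = 3$ are arbitrary choices, and the function names are invented here); it relies on `math.comb`, which returns 0 when the lower index exceeds the upper one.

```python
from fractions import Fraction
from math import comb

def f_direct(x, n, N):
    """Solution 1: P(X = x) = C(x-1, n-1) / C(N, n)."""
    return Fraction(comb(x - 1, n - 1), comb(N, n))

def F(x, n, N):
    """Solution 2: P(X <= x) = C(x, n) / C(N, n)."""
    return Fraction(comb(x, n), comb(N, n))

def f_from_cdf(x, n, N):
    """f(x) recovered as the jump F(x) - F(x - 1)."""
    return F(x, n, N) - F(x - 1, n, N)

N, n = 10, 3   # assumed values, for illustration only
values = range(n, N + 1)

same = all(f_direct(x, n, N) == f_from_cdf(x, n, N) for x in values)
total = sum(f_direct(x, n, N) for x in values)
print(same, total)   # True 1
```

The check that the probabilities sum to 1 over $x = n, \ldots, N$ also confirms that the stated domain of $f$ is the right one.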
Remark: When you write down a probability function, don't forget to give its domain (i.e. the possible values of the random variable, or the values $x$ for which $f(x)$ is defined). This is an essential part of the function's definition.
We frequently graph the probability function $f(x)$ using a (probability) histogram. For now, we'll define this only for random variables whose range is some set of consecutive integers $\{0, 1, 2, \ldots\}$. A histogram of $f(x)$ is then a graph consisting of adjacent bars or rectangles. At each $x$ we place a rectangle with base on $(x - .5, x + .5)$ and with height $f(x)$. In the above Example 3, a histogram of $f(x)$ looks like that in Figure 5.2.
Notice that the areas of these rectangles correspond to the probabilities, so for example $P(X = 1)$ is the area of the bar above and centered around the value 1 and $P(1 \leq X \leq 3)$ is the sum of the area of the three rectangles above the points 1, 2, and 3 (actually the area of the region above between the points $x = 0.5$ and $x = 3.5$). In general in a probability histogram, probabilities are depicted by areas.
Model Distributions:
Many processes or problems have the same structure. In the remainder of this course we will identify common types of problems and develop probability distributions that represent them. In doing this it is important to be able to strip away the particular wording of a problem and look for its essential features. For example, the following three problems are all essentially the same.
(a) A fair coin is tossed 10 times and the "number of heads obtained" ($X$) is recorded.
(b) Twenty seeds are planted in separate pots and the "number of seeds germinating" ($X$) is recorded.
(c) Twelve items are picked at random from a factory's production line and examined for defects. The number of items having no defects ($X$) is recorded.
What are the common features? In each case the process consists of "trials" which are repeated a stated number of times: 10, 20, and 12. In each repetition there are two types of outcomes: heads/tails, germinate/don't germinate, and no defects/defects. These repetitions are independent (as far as we can determine), with the probability of each type of outcome remaining constant for each repetition. The random variable we record is the number of times one of these two types of outcome occurred.
Six model distributions for discrete random variables will be developed in the rest of this chapter. Students often have trouble deciding which one (if any) to use in a given setting, so be sure you understand the physical setup which leads to each one. Also, as illustrated above you will need to learn to focus on the essential features of the situation as well as the particular content of the problem.
Statistical Computing
A number of major software systems have been developed for probability and statistics. We will use a system called R, which has a wide variety of features and which has Unix and Windows versions. Appendix 6.1 at the end of this chapter gives a brief introduction to R, and how to access it. For this course, R can compute probabilities for all the distributions we consider, can graph functions or data, and can simulate random processes. In the sections below we will indicate how R can be used for some of these tasks.
Problems:
5.1.1 Let $X$ have probability function

x       0        1       2
f(x)    $9c^2$   $9c$    $c^2$

Find $c$.
5.1.2 Suppose that 5 people, including you and a friend, line up at random. Let $X$ be the number of people standing between you and your friend. Tabulate the probability function and the cumulative distribution function for $X$.

Annotation (Problem 5.1.2), with Y = you, F = friend, O = other person:
X = 0: YFOOO, FYOOO, ..., OOOYF; f(0) = 8/20, F(0) = 8/20
X = 1: YOFOO, OYOFO, OOYOF, ...; f(1) = 6/20, F(1) = 14/20
X = 2: YOOFO, OYOOF, FOOYO, OFOOY; f(2) = 4/20, F(2) = 18/20
X = 3: YOOOF, FOOOY; f(3) = 2/20, F(3) = 20/20

5.2 Discrete Uniform Distribution
We define each model in terms of an abstract "physical setup", or setting, and then consider specific examples of the setup.
Physical Setup: Suppose $X$ takes values $a, a+1, a+2, \ldots, b$ with all values being equally likely. Then $X$ has a discrete uniform distribution on the set $\{a, a+1, a+2, \ldots, b\}$.
Illustrations:
1. If $X$ is the number obtained when a die is rolled, then $X$ has a discrete uniform distribution with $a = 1$ and $b = 6$.
2. Computer random number generators give uniform $[1, N]$ variables, for a specified positive integer $N$. These are used for many purposes, e.g. generating lottery numbers or providing automated random sampling from a set of $N$ items.
Probability Function: There are $b - a + 1$ values $X$ can take, so the probability at each of these values must be $\frac{1}{b-a+1}$ in order that $\sum_{x=a}^{b} f(x) = 1$. Therefore
$$f(x) = \begin{cases} \frac{1}{b-a+1}; & x = a, a+1, \ldots, b \\ 0; & \text{otherwise} \end{cases}$$
Example. Suppose a fair die is thrown once and let $X$ be the number on the face. Find the c.d.f., $F(x)$, of $X$.
This is an example of a discrete uniform distribution on the set $\{1, 2, 3, 4, 5, 6\}$ having $a = 1$, $b = 6$ and probability function
$$f(x) = \begin{cases} \frac{1}{6}; & x = 1, 2, \ldots, 6 \\ 0; & \text{otherwise} \end{cases}$$
The cumulative distribution function is $F(x) = P(X \le x)$,
$$F(x) = \begin{cases} 0 & \text{if } x < 1 \\ \frac{[x]}{6} & \text{if } 1 \le x < 6 \\ 1 & \text{if } x \ge 6 \end{cases}$$
where by $[x]$ we mean the integer part of the real number $x$, or the largest whole number less than or equal to $x$.
Many distributions are constructed using discrete uniform random variables. For example we might throw two dice and sum the values on their faces.
Example. Suppose two fair dice (suppose for simplicity one is red and the other is green) are thrown. Let $X$ be the sum of the values on their faces. Find the c.d.f., $F(x)$, of $X$.
In this case we can consider the sample space to be
$$S = \{(1,1), (1,2), (1,3), \ldots, (5,6), (6,6)\}$$
where for example the outcome $(i, j)$ means we obtained $i$ on the red die and $j$ on the green. There are 36 outcomes in this sample space, all with the same probability $\frac{1}{36}$. The probability function of $X$ is easily found. For example $f(5)$ is the probability of the event $X = 5$, or the probability of $\{(1,4), (2,3), (3,2), (4,1)\}$, so $f(5) = \frac{4}{36}$. The probability function and the cumulative distribution function are as listed below:

x       2      3      4      5       6       7       8       9       10      11      12
f(x)    1/36   2/36   3/36   4/36    5/36    6/36    5/36    4/36    3/36    2/36    1/36
F(x)    1/36   3/36   6/36   10/36   15/36   21/36   26/36   30/36   33/36   35/36   1

Although it is a bit more difficult to give a formula for the c.d.f. for general argument $x$ in this case, it is clear for example that $F(x) = F([x])$, and that $F(x) = 0$ for $x < 2$ and $F(x) = 1$ for $x \ge 12$.
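The enumeration behind the two-dice table can be sketched in code. The following Python snippet is only an illustration (the notes themselves use R, and the names `f`, `F` and `cdf` are our own): it tallies the 36 equally likely outcomes, accumulates the c.d.f., and extends it to real arguments through $F(x) = F([x])$.

```python
from fractions import Fraction
from itertools import product
from math import floor

# Tally the sum over the 36 equally likely (red, green) outcomes.
counts = {}
for red, green in product(range(1, 7), repeat=2):
    counts[red + green] = counts.get(red + green, 0) + 1

# Probability function f(x) for x = 2, ..., 12, as exact fractions.
f = {x: Fraction(counts[x], 36) for x in sorted(counts)}

# Cumulative distribution function at the integer points.
F = {}
running = Fraction(0)
for x in f:
    running += f[x]
    F[x] = running

def cdf(x):
    """F(x) for any real x, using F(x) = F([x])."""
    if x < 2:
        return Fraction(0)
    if x >= 12:
        return Fraction(1)
    return F[floor(x)]

for x in f:
    print(x, f[x], F[x])
```

Exact fractions are used so the values can be compared directly with the table (Python prints them in lowest terms).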
Example. Let $X$ be the largest number when a die is rolled 3 times. First find the c.d.f., $F(x)$, and then find the probability function, $f(x)$, of $X$.
This is another example of a distribution constructed from the discrete uniform. In this case the sample space
$$S = \{(1,1,1), (1,1,2), \ldots, (6,6,6)\}$$
consists of all $6^3$ possible outcomes of the 3 dice, with each outcome having probability $\frac{1}{216}$. Suppose that $x$ is an integer between 1 and 6. What is the probability that the largest of these three numbers is less than or equal to $x$? This requires that all three of the dice show numbers less than or equal to $x$, and there are exactly $x^3$ points in $S$ which satisfy this requirement. Therefore the probability that the largest number is less than or equal to $x$ is, for $x = 1, 2, 3, 4, 5,$ or $6$,
$$F(x) = \frac{x^3}{6^3}$$
and more generally, if $x$ is not an integer between 1 and 6,
$$F(x) = \begin{cases} \frac{[x]^3}{216} & \text{for } 1 \le x < 6 \\ 0 & \text{for } x < 1 \\ 1 & \text{for } x \ge 6 \end{cases}$$
To find the probability function we may use the fact that for $x$ in the domain of the probability function (in this case for $x \in \{1, 2, 3, 4, 5, 6\}$) we have $P(X = x) = P(X \le x) - P(X < x)$, so that for $x \in \{1, 2, 3, 4, 5, 6\}$,
$$f(x) = F(x) - F(x-1) = \frac{x^3 - (x-1)^3}{216} = \frac{[x - (x-1)][x^2 + x(x-1) + (x-1)^2]}{216} = \frac{3x^2 - 3x + 1}{216}$$
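As a sanity check on the formula $f(x) = (3x^2 - 3x + 1)/216$, one can enumerate all 216 outcomes directly. This is a minimal Python sketch; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 6^3 = 216 equally likely outcomes of three rolls.
outcomes = list(product(range(1, 7), repeat=3))

def f_enumerated(x):
    """P(largest of the three rolls equals x), by direct counting."""
    hits = sum(1 for rolls in outcomes if max(rolls) == x)
    return Fraction(hits, len(outcomes))

def f_formula(x):
    """The closed form derived from F(x) - F(x - 1)."""
    return Fraction(3 * x * x - 3 * x + 1, 216)

for x in range(1, 7):
    print(x, f_enumerated(x), f_formula(x))
```

The two columns agree for every $x$, and the six probabilities sum to 1.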
5.3 Hypergeometric Distribution [15]
Physical Setup: We have a collection of $N$ objects which can be classified into two distinct types. Call one type "success" [16] ($S$) and the other type "failure" ($F$). There are $r$ successes and $N - r$ failures. Pick $n$ objects at random without replacement. Let $X$ be the number of successes obtained. Then $X$ has a hypergeometric distribution.
Illustrations:
1. The number of aces $X$ in a bridge hand has a hypergeometric distribution with $N = 52$, $r = 4$, and $n = 13$.
2. In a fleet of 200 trucks there are 12 which have defective brakes. In a safety check 10 trucks are picked at random for inspection. The number of trucks $X$ with defective brakes chosen for inspection has a hypergeometric distribution with $N = 200$, $r = 12$, $n = 10$.
Probability Function: Using counting techniques we note there are $\binom{N}{n}$ points in the sample space $S$ if we don't consider order of selection. There are $\binom{r}{x}$ ways to choose the $x$ success objects from the $r$ available and $\binom{N-r}{n-x}$ ways to choose the remaining $(n - x)$ objects from the $(N - r)$ failures. Hence
$$f(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$$
The range of values for $x$ is somewhat complicated. Of course, $x \ge 0$. However if the number, $n$, picked exceeds the number, $N - r$, of failures, the difference, $n - (N - r)$, must be successes. So $x \ge \max(0, n - N + r)$. Also, $x \le r$ since we can't get more successes than the number available. But $x \le n$, since we can't get more successes than the number of objects chosen. Therefore $x \le \min(r, n)$.
Example: In Lotto 6/49 a player selects a set of six numbers (with no repeats) from the set $\{1, 2, \ldots, 49\}$. In the lottery draw six numbers are selected at random. Find the probability function for $X$, the number from your set which are drawn.
Solution: Think of your numbers as the $S$ objects and the remainder as the $F$ objects. Then $X$ has a hypergeometric distribution with $N = 49$, $r = 6$ and $n = 6$, so
$$P(X = x) = f(x) = \frac{\binom{6}{x}\binom{43}{6-x}}{\binom{49}{6}}, \quad \text{for } x = 0, 1, \ldots, 6$$
For example, you win the jackpot prize if $X = 6$; the probability of this is $\binom{6}{6} \big/ \binom{49}{6}$, or about 1 in 14 million.

[15] This section optional for stat 220.
[16] "If A is a success in life, then A equals x plus y plus z. Work is x; y is play; and z is keeping your mouth shut." Albert Einstein, 1950
Remark: When parameter values are large, hypergeometric probabilities may be tedious to compute using a basic calculator. The R functions dhyper and phyper can be used to evaluate $f(x)$ and the c.d.f. $F(x)$. In particular, dhyper(x, r, N − r, n) gives $f(x)$ and phyper(x, r, N − r, n) gives $F(x)$. Using this we find for the Lotto 6/49 problem here, for example, that $f(6)$ is calculated by typing dhyper(6, 6, 43, 6) in R, which returns the answer $7.151124 \times 10^{-8}$, or 1/13,983,816.
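For all of our model distributions we can also confirm that $\sum_{\text{all } x} f(x) = 1$. To do this here we use a summation result from Chapter 3 called the hypergeometric identity. Letting $a = r$, $b = N - r$ in that identity we get
$$\sum_{\text{all } x} f(x) = \sum_{\text{all } x} \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{1}{\binom{N}{n}} \sum_{\text{all } x} \binom{r}{x}\binom{N-r}{n-x} = \frac{\binom{r+N-r}{n}}{\binom{N}{n}} = 1$$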
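For readers without R at hand, the hypergeometric probability function and its range can be sketched with the Python standard library. This is an illustration only: `hyper_pf` is a hypothetical helper, and its argument order (x, N, r, n) follows the notation of these notes rather than R's dhyper(x, r, N − r, n).

```python
from fractions import Fraction
from math import comb

def hyper_pf(x, N, r, n):
    """f(x) = C(r, x) C(N - r, n - x) / C(N, n) on its range, else 0."""
    lowest = max(0, n - (N - r))   # forced successes when n exceeds the failures
    highest = min(r, n)            # cannot exceed successes available or objects drawn
    if x < lowest or x > highest:
        return Fraction(0)
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

# Lotto 6/49: N = 49, r = 6 (your numbers), n = 6 (numbers drawn).
lotto = [hyper_pf(x, 49, 6, 6) for x in range(7)]
print(lotto[6], float(lotto[6]))   # jackpot probability
print(sum(lotto))                  # the probabilities over the range sum to 1
```

Using exact fractions makes the check $\sum f(x) = 1$ free of rounding error.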
Problems:
5.3.1 A box of 12 tins of tuna contains $d$ which are tainted. Suppose 7 tins are opened for inspection and none of these 7 is tainted.
a) Calculate the probability that none of the 7 is tainted for $d = 0, 1, 2, 3$.
b) Do you think it is likely that the box contains as many as 3 tainted tins?
5.3.2 Suppose our sample space distinguishes points with different orders of selection. For example suppose that $S = \{SSSSFFF\ldots, \ldots\}$ consists of all words of length $n$ where letters are drawn without replacement from a total of $r$ $S$'s and $N - r$ $F$'s. Derive a formula for the probability that the word contains exactly $X$ $S$'s. In other words, determine the hypergeometric probability function using a sample space in which order of selection is considered.

Annotation (Problem 5.3.1 b): While we could find none tainted if d is as big as 3, it is not likely to happen. This implies the box is not likely to have as many as 3 tainted tins.
Annotation (hypergeometric range): X ranges from max(0, n-(N-r)) to min(n, r).

5.4 Binomial Distribution
Physical Setup:
Suppose an "experiment" has two types of distinct outcomes. Call these types "success" ($S$) and "failure" ($F$), and let their probabilities be $p$ (for $S$) and $1 - p$ (for $F$). Repeat the experiment $n$ independent times. Let $X$ be the number of successes obtained. Then $X$ has what is called a binomial distribution. (We write $X \sim Bi(n, p)$ as a shorthand for "$X$ is distributed according to a binomial distribution with $n$ repetitions and probability $p$ of success".) The $n$ individual experiments in the process just described are often called "trials" or "Bernoulli trials" and the process is called a Bernoulli [17] process or a binomial process.
Illustrations:
1. Toss a fair die 10 times and let $X$ be the number of sixes that occur. Then $X \sim Bi(10, 1/6)$.
2. In a microcircuit manufacturing process, 90% of the chips produced work (10% are defective). Suppose we select 25 chips, independently [18], and let $X$ be the number that work. Then $X \sim Bi(25, 0.9)$.
Comment: We must think carefully whether the physical process we are considering is closely approximated by a binomial process, for which the key assumptions are that (i) the probability $p$ of success is constant over the $n$ trials, and (ii) the outcome ($S$ or $F$) on any trial is independent of the outcome on the other trials. For Illustration 1 these assumptions seem appropriate. For Illustration 2 we would need to think about the manufacturing process. Microcircuit chips are produced on "wafers" containing a large number of chips and it is common for defective chips to cluster on wafers. This could mean that if we selected 25 chips from the same wafer, or from only 2 or 3 wafers, the "trials" (chips) might not be independent, or perhaps that the probability of defectives changes.
Probability Function: There are $\frac{n!}{x!(n-x)!} = \binom{n}{x}$ different arrangements of $x$ $S$'s and $(n - x)$ $F$'s over the $n$ trials. The probability for each of these arrangements has $p$ multiplied together $x$ times and $(1 - p)$ multiplied $(n - x)$ times, in some order, since the trials are independent. So each arrangement has probability $p^x (1-p)^{n-x}$. Therefore
$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}; \quad x = 0, 1, 2, \ldots, n.$$

[17] After James (Jakob) Bernoulli (1654-1705), a Swiss member of a family of eight mathematicians. Nicolaus Bernoulli was an important citizen of Basel, being a member of the town council and a magistrate. Jacob Bernoulli's mother also came from an important Basel family of bankers and local councillors. Jacob Bernoulli was the brother of Johann Bernoulli and the uncle of Daniel Bernoulli. He was compelled to study philosophy and theology by his parents, graduated from the University of Basel with a master's degree in philosophy and a licentiate in theology but, against his parents' wishes, studied mathematics and astronomy. He was offered an appointment in the Church; he turned it down and instead taught mechanics at the University in Basel from 1683, giving lectures on the mechanics of solids and liquids. Jakob Bernoulli is responsible for many of the combinatorial results dealing with independent random variables which take values 0 or 1 in these notes. He was also a fierce rival of his younger brother Johann Bernoulli, also a mathematician, who would have liked the chair of mathematics at Basel which Jakob held.
[18] For example, we select at random with replacement, or without replacement from a very large number of chips.
[Figure 5.3: The Binomial(20, 0.3) probability histogram. Horizontal axis: $x$; vertical axis: $f(x)$.]
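Checking that $\sum f(x) = 1$:
$$\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (1-p)^n \sum_{x=0}^{n} \binom{n}{x} \left(\frac{p}{1-p}\right)^x$$
$$= (1-p)^n \left(1 + \frac{p}{1-p}\right)^n \quad \text{by the binomial theorem}$$
$$= (1-p)^n \left(\frac{1-p+p}{1-p}\right)^n = 1^n = 1.$$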
We graph in Figure 5.3 the probability function for the Binomial distribution with parameters $n = 20$ and $p = 0.3$. Although the formula for $f(x)$ may seem complicated, its shape is simple: it increases to a maximum value near $np$ and then decreases thereafter.
Computation: Many software packages and some calculators give binomial probabilities. In R we use the function dbinom(x, n, p) to compute $f(x)$ and pbinom(x, n, p) to compute the corresponding c.d.f. $F(x) = P(X \le x)$.
Example Suppose that in a weekly lottery you have probability .02 of winning a prize with a single ticket. If you buy 1 ticket per week for 52 weeks, what is the probability that (a) you win no prizes, and (b) that you win 3 or more prizes?
Solution: Let $X$ be the number of weeks that you win; then $X \sim Bi(52, .02)$. We find
(a) $P(X = 0) = f(0) = \binom{52}{0} (.02)^0 (.98)^{52} = 0.350$
(b) $P(X \ge 3) = 1 - P(X \le 2) = 1 - f(0) - f(1) - f(2) = 0.0859$
(Note that $P(X \le 2)$ is given by the R command pbinom(2, 52, .02).)
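The lottery calculation can be reproduced with a few lines of Python. This is a sketch under our own naming: `binom_pf` and `binom_cdf` are hypothetical stand-ins that play the role of R's dbinom and pbinom, written directly from the formula for $f(x)$.

```python
from math import comb

def binom_pf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """F(x) = P(X <= x), summing f over 0, 1, ..., x."""
    return sum(binom_pf(i, n, p) for i in range(x + 1))

p_no_prizes = binom_pf(0, 52, 0.02)            # part (a)
p_three_or_more = 1 - binom_cdf(2, 52, 0.02)   # part (b)
print(round(p_no_prizes, 3), round(p_three_or_more, 4))
```

Part (b) uses the complement because summing $f(0) + f(1) + f(2)$ is far shorter than summing $f(3)$ through $f(52)$.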
Comparison of Binomial and Hypergeometric Distributions:
These distributions are similar in that an experiment with 2 types of outcome ($S$ and $F$) is repeated $n$ times and $X$ is the number of successes. The key difference is that the binomial requires independent repetitions with the same probability of $S$, whereas the draws in the hypergeometric are made from a fixed collection of objects without replacement. The trials (draws) are therefore not independent. For example, if there are $r = 10$ $S$ objects and $N - r = 10$ $F$ objects, then the probability of getting an $S$ on draw 2 depends on what was obtained in draw 1. If these draws had been made with replacement, however, they would be independent and we'd use the binomial rather than the hypergeometric model.
If $N$ is large and the number, $n$, being drawn is relatively small in the hypergeometric setup then we are unlikely to get the same object more than once even if we do replace it. So it makes little practical difference whether we draw with or without replacement. This suggests that when we are drawing a fairly small proportion of a large collection of objects the binomial and the hypergeometric models should produce similar probabilities. As the binomial is easier to calculate, it is often used as an approximation to the hypergeometric in such cases.
Example: Suppose we have 15 cans of soup with no labels, but 6 are tomato and 9 are pea soup. We randomly pick 8 cans and open them. Find the probability 3 are tomato.
Solution: The correct solution uses hypergeometric, and is (with $X$ = number of tomato soups picked)
$$f(3) = P(X = 3) = \frac{\binom{6}{3}\binom{9}{5}}{\binom{15}{8}} = \frac{2520}{6435} = 0.392.$$
If we incorrectly used binomial, we'd get
$$f(3) = \binom{8}{3} \left(\frac{6}{15}\right)^3 \left(\frac{9}{15}\right)^5 = 0.279$$
As expected, this is a poor approximation since we're picking over half of a fairly small collection of cans.
However, if we had 1500 cans (600 tomato and 900 pea), we're not likely to get the same can again even if we did replace each of the 8 cans after opening it. (Put another way, the probability we get a tomato soup on each pick is very close to .4, regardless of what the other picks give.) The exact, hypergeometric, probability is now $\binom{600}{3}\binom{900}{5} \big/ \binom{1500}{8} = .2794$. Here the binomial probability,
$$\binom{8}{3} \left(\frac{600}{1500}\right)^3 \left(\frac{900}{1500}\right)^5 = 0.279$$
is a very good approximation.
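The contrast between the small and large collections of cans is easy to see numerically. The sketch below is illustrative only; the helper names are our own, and the functions are written straight from the two probability functions.

```python
from math import comb

def hyper_pf(x, N, r, n):
    """Hypergeometric f(x): x successes when n are drawn from r S's and N - r F's."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binom_pf(x, n, p):
    """Binomial f(x) with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 15 cans (6 tomato): drawing 8 is over half the collection.
small_exact = hyper_pf(3, 15, 6, 8)
small_approx = binom_pf(3, 8, 6 / 15)

# 1500 cans (600 tomato): drawing 8 is a tiny proportion.
large_exact = hyper_pf(3, 1500, 600, 8)
large_approx = binom_pf(3, 8, 600 / 1500)

print(round(small_exact, 4), round(small_approx, 4))
print(round(large_exact, 4), round(large_approx, 4))
```

The binomial value is the same in both cases because $p = 0.4$ in each; only the exact hypergeometric value moves toward it as the collection grows.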
Problems:
5.4.1 Megan audits 130 clients during a year and finds irregularities for 26 of them.
a) Give an expression for the probability that 2 clients will have irregularities when 6 of her clients are picked at random.
b) Evaluate your answer to (a) using a suitable approximation.
5.4.2 The flash mechanism on camera A fails on 10% of shots, while that of camera B fails on 5% of shots. The two cameras being identical in appearance, a photographer selects one at random and takes 10 indoor shots using the flash.
(a) Give the probability that the flash mechanism fails exactly twice. What assumption(s) are you making?
(b) Given that the flash mechanism failed exactly twice, what is the probability camera A was selected?

Annotation (Problem 5.4.1):
a) 26C2 * (130-26)C4 / (130C6)
b) Using the binomial distribution as an approximation, f(2) = 6C2 * (26/130)^2 * (1-26/130)^4 = 0.2458
Annotation (Problem 5.4.2):
(a) P(fail twice) = P(A)P(fail twice | A) + P(B)P(fail twice | B)
= 0.5 * 10C2 * 0.1^2 * 0.9^8 + 0.5 * 10C2 * 0.05^2 * 0.95^8 = 0.1342
This assumes shots are independent with a constant failure probability.
(b) P(A | failed twice) = 0.5 * 10C2 * 0.1^2 * 0.9^8 / P(fail twice) = 0.7219

5.5 Negative Binomial Distribution [19]
Physical Setup:
The setup for this distribution is almost the same as for binomial; i.e. an experiment (trial) has two distinct types of outcome ($S$ and $F$) and is repeated independently with the same probability, $p$, of success each time. Continue doing the experiment until a specified number, $k$, of successes have been obtained. Let $X$ be the number of failures obtained before the $k$th success. Then $X$ has a negative binomial distribution. We often write $X \sim NB(k, p)$ to denote this.
Illustrations:
(1) If a fair coin is tossed until we get our 5th head, the number of tails we obtain has a negative binomial distribution with $k = 5$ and $p = \frac{1}{2}$.
(2) As a rough approximation, the number of half credit failures a student collects before successfully completing 40 half credits for an honours degree has a negative binomial distribution. (Assume all course attempts are independent, with the same probability of being successful, and ignore the fact that getting more than 6 half credit failures prevents a student from continuing toward an honours degree.)

[19] This section optional for stat 220.
Probability Function: In all there will be $x + k$ trials ($x$ $F$'s and $k$ $S$'s) and the last trial must be a success. In the first $x + k - 1$ trials we therefore need $x$ failures and $(k - 1)$ successes, in any order. There are $\frac{(x+k-1)!}{x!(k-1)!} = \binom{x+k-1}{x}$ different orders. Each order will have probability $p^k (1-p)^x$ since there must be $x$ trials which are failures and $k$ which are successes. Hence
$$f(x) = \binom{x+k-1}{x} p^k (1-p)^x; \quad x = 0, 1, 2, \ldots$$
Note: An alternate version of the negative binomial distribution defines $X$ to be the total number of trials needed to get the $k$th success. This is equivalent to our version. For example, asking for the probability of getting 3 tails before the 5th head is exactly the same as asking for a total of 8 tosses in order to get the 5th head. You need to be careful to read how $X$ is defined in a problem rather than mechanically "plugging in" numbers in the above formula for $f(x)$.
Checking that $\sum f(x) = 1$ requires somewhat more work for the negative binomial distribution. We first re-arrange the $\binom{x+k-1}{x}$ term,
$$\binom{x+k-1}{x} = \frac{(x+k-1)^{(x)}}{x!} = \frac{(x+k-1)(x+k-2)\cdots(k+1)(k)}{x!}$$
Factor a $(-1)$ out of each of the $x$ terms in the numerator, and re-write these terms in reverse order,
$$\binom{x+k-1}{x} = (-1)^x \frac{(-k)(-k-1)\cdots(-k-x+2)(-k-x+1)}{x!} = (-1)^x \frac{(-k)^{(x)}}{x!} = (-1)^x \binom{-k}{x}$$
Then (using the binomial theorem)
$$\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} \binom{-k}{x} (-1)^x p^k (1-p)^x = p^k \sum_{x=0}^{\infty} \binom{-k}{x} [(-1)(1-p)]^x = p^k [1 + (-1)(1-p)]^{-k} = p^k p^{-k} = 1$$
Comparison of Binomial and Negative Binomial Distributions
These should be easily distinguished because they reverse what is specified or known in advance and what is variable.
Binomial: we know the number $n$ of trials in advance but we do not know the number of successes we will obtain until after the experiment.
Negative Binomial: we know the number $k$ of successes in advance but do not know the number of trials that will be needed to obtain this number of successes until after the experiment.
Example: The fraction of a large population that has a specific blood type T is .08 (8%). For blood donation purposes it is necessary to find 5 people with type T blood. If randomly selected individuals from the population are tested one after another, then (a) What is the probability $y$ persons have to be tested to get 5 type T persons, and (b) What is the probability that over 80 people have to be tested?
Solution: Think of a type T person as a success ($S$) and a non-type T as an $F$. Let $Y$ = number of persons who have to be tested and let $X$ = number of non-type T persons in order to get 5 $S$'s. Then $X \sim NB(k = 5, p = .08)$ and
$$P(X = x) = f(x) = \binom{x+4}{x} (.08)^5 (.92)^x, \quad x = 0, 1, 2, \ldots$$
We are actually asked here about $Y = X + 5$. Thus
$$P(Y = y) = P(X = y - 5) = f(y - 5) = \binom{y-1}{y-5} (.08)^5 (.92)^{y-5} \quad \text{for } y = 5, 6, 7, \ldots$$
Thus we have the answer to (a) as given above, and for (b)
$$P(Y > 80) = P(X > 75) = 1 - P(X \le 75) = 1 - \sum_{x=0}^{75} f(x) = 0.2235$$
Note: Calculating such probabilities is easy with R. To get $f(x)$ we use dnbinom(x, k, p) and to get $F(x) = P(X \le x)$ we use pnbinom(x, k, p).
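The blood-type example can be checked with a short Python sketch. The helper names are hypothetical; `nbinom_pf` counts failures before the $k$th success, as in these notes, and `total_tests_pf` shifts it to the total number of people tested.

```python
from math import comb

def nbinom_pf(x, k, p):
    """f(x) = C(x + k - 1, x) p^k (1 - p)^x: x failures before the k-th success."""
    return comb(x + k - 1, x) * p**k * (1 - p)**x

def total_tests_pf(y, k, p):
    """P(Y = y) where Y = X + k is the total number of trials needed."""
    return nbinom_pf(y - k, k, p) if y >= k else 0.0

k, p = 5, 0.08

# (b) More than 80 people tested means more than 75 non-type T persons.
p_over_80 = 1 - sum(nbinom_pf(x, k, p) for x in range(76))
print(round(p_over_80, 4))

# Cross-check: Y > 80 exactly when the first 80 people include at most 4 of type T.
p_at_most_4_in_80 = sum(comb(80, s) * p**s * (1 - p)**(80 - s) for s in range(5))
print(round(p_at_most_4_in_80, 4))
```

The cross-check illustrates the link between the two models: the negative binomial tail equals a binomial c.d.f. value for a fixed number of trials.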
Problems:
5.5.1 You can get a group rate on tickets to a play if you can find 25 people to go. Assume each person you ask responds independently and has a 20% chance of agreeing to buy a ticket. Let $X$ be the total number of people you have to ask in order to find 25 who agree to buy a ticket. Find the probability function of $X$.

Annotation (Problem 5.5.1): f(x) = (x-1)C(x-25) * (0.2)^25 * (0.8)^(x-25); x = 25, 26, 27, ... DO NOT FORGET RANGE!
5.5.2 A shipment of 2500 car headlights contains 200 which are defective. You choose from this shipment without replacement until you have 18 which are not defective. Let $X$ be the number of defective headlights you obtain.
(a) Give the probability function, $f(x)$.
(b) Using a suitable approximation, find $f(2)$.

Annotation (Problem 5.5.2):
(a) The first x + 17 draws must contain x defective and 17 non-defective headlights (hypergeometric), and the (x+18)th must be not defective:
f(x) = (200Cx) * (2300C17) / (2500C(x+17)) * (2500-200-17) / (2500-(x+17)); x = 0, 1, 2, ..., 200
(b) 2500 is large and we only select a few of them, so use the negative binomial distribution as an approximation:
f(x) = ((x+17)Cx) * (200/2500)^x * (2300/2500)^18, so f(2) = 19C2 * (2/25)^2 * (23/25)^18

5.6 Geometric Distribution
Physical Setup: Consider the negative binomial distribution with $k = 1$. In this case we repeat independent Bernoulli trials with two types of outcome ($S$ and $F$) each time, and the same probability, $p$, of success each time until we obtain the first success. Let $X$ be the number of failures obtained before the first success.
Illustrations:
(1) The probability you win a lottery prize in any given week is a constant $p$. The number of weeks before you win a prize for the first time has a geometric distribution.
(2) If you take STAT 230 until you pass it and attempts are independent with the same probability of a pass each time [20], then the number of failures would have a geometric distribution. (Thankfully these assumptions are unlikely to be true for most persons! Why is this?)
Probability Function: There is only the one arrangement with $x$ failures followed by 1 success. This arrangement has probability
$$f(x) = (1-p)^x p; \quad x = 0, 1, 2, \ldots$$
Alternatively if we substitute $k = 1$ in the probability function for the negative binomial, we obtain
$$f(x) = \binom{x+1-1}{x} p^1 (1-p)^x = p(1-p)^x \quad \text{for } x = 0, 1, 2, \ldots$$
which is the same. To check that $\sum f(x) = 1$, we will need to evaluate a geometric series,
$$\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} (1-p)^x p = p + (1-p)p + (1-p)^2 p + \cdots = \frac{p}{1 - (1-p)} = \frac{p}{p} = 1$$

[20] you burn all notes and purge your memory of the course after each failure
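The geometric series argument can also be observed numerically: partial sums of $f(x)$ approach 1, and the geometric p.f. coincides with the negative binomial p.f. at $k = 1$. The sketch below uses hypothetical helper names and an illustrative value $p = 0.3$ that is not tied to any example above.

```python
from math import comb

def geom_pf(x, p):
    """f(x) = (1 - p)^x p: x failures before the first success."""
    return (1 - p)**x * p

def nbinom_pf(x, k, p):
    """Negative binomial p.f. with x failures before the k-th success."""
    return comb(x + k - 1, x) * p**k * (1 - p)**x

p = 0.3  # illustrative success probability

# Partial sums of the geometric series p + (1-p)p + (1-p)^2 p + ...
partial_sums = [sum(geom_pf(x, p) for x in range(m)) for m in (5, 20, 200)]
print(partial_sums)

# Setting k = 1 in the negative binomial reproduces the geometric p.f.
same = all(abs(geom_pf(x, p) - nbinom_pf(x, 1, p)) < 1e-15 for x in range(50))
print(same)
```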
Note: The names of the models so far derive from the summation results which show $f(x)$ sums to 1. The geometric distribution involved a geometric series; the hypergeometric distribution used the hypergeometric identity; both the binomial and negative binomial distributions used the binomial theorem.
Bernoulli Trials. Once again remember that the binomial, negative binomial and geometric models all involve trials (experiments) which:
(1) are independent
(2) have 2 distinct types of outcome ($S$ and $F$)
(3) have the same probability $p$ of "success" ($S$) each time.
Such trials are known as Bernoulli trials.
Problem 5.6.1
Suppose there is a 30% chance of a car from a certain production line having a leaky windshield. The probability an inspector will have to check at least $n$ cars to find the first one with a leaky windshield is .05. Find $n$.

Annotation (Problem 5.6.1), with X = number of cars checked before the first leaky one:
P(X >= n-1) = f(n-1) + f(n) + f(n+1) + ... = (0.7)^(n-1)*0.3 + (0.7)^n*0.3 + (0.7)^(n+1)*0.3 + ...
= (0.7)^(n-1)*0.3 / (1 - 0.7) = (0.7)^(n-1) = 0.05
(n-1) log 0.7 = log 0.05, so n = 9.4

5.7 Poisson Distribution from Binomial
The Poisson [21] distribution has probability function (p.f.) of the form
$$f(x) = e^{-\mu} \frac{\mu^x}{x!}, \quad x = 0, 1, 2, \ldots$$
where $\mu > 0$ is a parameter whose value depends on the setting for the model. Mathematically, we can see that $f(x)$ has the properties of a p.f., since $f(x) \ge 0$ for $x = 0, 1, 2, \ldots$ and since
$$\sum_{x=0}^{\infty} f(x) = e^{-\mu} \sum_{x=0}^{\infty} \frac{\mu^x}{x!} = e^{-\mu}(e^{\mu}) = 1$$

[21] After Siméon Denis Poisson (1781-1840), a French mathematician who was supposed to become a surgeon but, fortunately for his patients, failed medical school for lack of coordination. He was forced to do theoretical research, being too clumsy for anything in the lab. He wrote a major work on probability and the law, Recherches sur la probabilité des jugements en matière criminelle et matière civile (1837), discovered the Poisson distribution (called law of large numbers) and to him is ascribed one of the more depressing quotes in our discipline: "Life is good for only two things: to study mathematics and to teach it."
The Poisson distribution arises in physical settings where the random variable $X$ represents the number of events of some type. In this section we show how it arises from a binomial process, and in the following section we consider another derivation of the model.
We will sometimes write $X \sim Poisson(\mu)$ to denote that $X$ has the p.f. above.
Physical Setup: One way the Poisson distribution arises is as a limiting case of the binomial distribution as $n \to \infty$ and $p \to 0$. In particular, we keep the product $np$ fixed at some constant value, $\mu$, while letting $n \to \infty$. This automatically makes $p \to 0$. Let us see what the limit of the binomial p.f. $f(x)$ is in this case.
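Probability Function: Since $np = \mu$, we have $p = \frac{\mu}{n}$ and, for $x$ fixed,
$$f(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n^{(x)}}{x!} \left(\frac{\mu}{n}\right)^x \left(1 - \frac{\mu}{n}\right)^{n-x}$$
$$= \frac{\mu^x}{x!} \cdot \frac{\overbrace{n(n-1)(n-2)\cdots(n-x+1)}^{x \text{ terms}}}{(n)(n)\cdots(n)(n)} \left(1 - \frac{\mu}{n}\right)^{n-x}$$
$$= \frac{\mu^x}{x!} \left(\frac{n}{n}\right)\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\cdots\left(\frac{n-x+1}{n}\right)\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}$$
$$= \frac{\mu^x}{x!} (1)\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}$$
$$\lim_{n \to \infty} f(x) = \frac{\mu^x}{x!} \underbrace{(1)(1)(1)\cdots(1)}_{x \text{ terms}} \, e^{-\mu} (1)^{-x} \quad \text{since } e^{k} = \lim_{n \to \infty}\left(1 + \frac{k}{n}\right)^n$$
$$= \frac{\mu^x e^{-\mu}}{x!}; \quad \text{for } x = 0, 1, 2, \ldots$$
(For the binomial the upper limit on $x$ is $n$, but we are letting $n \to \infty$.) This result allows us to use the Poisson distribution with $\mu = np$ as a close approximation to the binomial distribution $Bi(n, p)$ in processes for which $n$ is large and $p$ is small.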
Example: 200 people are at a party. What is the probability that 2 of them were born on Jan. 1?
Solution: Assuming all days of the year are equally likely for a birthday (and ignoring February 29) and that the birthdays are independent (e.g. no twins!) we can use the binomial distribution with $n = 200$ and $p = 1/365$ for $X$ = number born on January 1, giving
$$f(2) = \binom{200}{2} \left(\frac{1}{365}\right)^2 \left(1 - \frac{1}{365}\right)^{198} = .086767$$
Since $n$ is large and $p$ is close to 0, we can use the Poisson distribution to approximate this binomial probability, with $\mu = np = \frac{200}{365}$, giving
$$f(2) = \frac{\left(\frac{200}{365}\right)^2 e^{-\left(\frac{200}{365}\right)}}{2!} = .086791$$
As might be expected, this is a very good approximation.
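The two numbers in the birthday example can be reproduced with the Python standard library. This is a sketch with hypothetical helper names, written directly from the binomial and Poisson probability functions.

```python
from math import comb, exp, factorial

def binom_pf(x, n, p):
    """Binomial f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pf(x, mu):
    """Poisson f(x) = mu^x e^(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)

n, p = 200, 1 / 365
exact = binom_pf(2, n, p)          # binomial probability
approx = poisson_pf(2, n * p)      # Poisson approximation with mu = np
print(round(exact, 6), round(approx, 6))
```

The two values differ only in the fifth decimal place, which is what the limiting argument suggests for $n$ large and $p$ small.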
Notes:
(1) If $p$ is close to 1 we can also use the Poisson distribution to approximate the binomial. By interchanging the labels "success" and "failure", we can get the probability of "success" (formerly labelled "failure") close to 0.
(2) The Poisson distribution used to be very useful for approximating binomial probabilities with $n$ large and $p$ near 0 since the calculations are easier. (This assumes values of $e^x$ to be available.) With the advent of computers, it is just as easy to calculate the exact binomial probabilities as the Poisson probabilities. However, the Poisson approximation is useful when employing a calculator without a built-in binomial function.
(3) The R functions dpois(x, μ) and ppois(x, μ) give $f(x)$ and $F(x)$.
Problem 5.7.1
An airline knows that 97% of the passengers who buy tickets for a certain flight will show up on time. The plane has 120 seats.
a) They sell 122 tickets. Find the probability that more people will show up than can be carried on the flight. Compare this answer with the answer given by the Poisson approximation.
b) What assumptions does your answer depend on? How well would you expect these assumptions to be met?
5.8 Poisson Distribution from Poisson Process [22]
We now derive the Poisson distribution as a model for the number of a certain kind of event or occurrence (e.g. births, insurance claims, web site hits) that occur at points in time or in space. To this end, we use the "order" notation $g(\Delta t) = o(\Delta t)$ as $\Delta t \to 0$ to mean that the function $g$ approaches 0 faster than $\Delta t$ as $\Delta t$ approaches zero, or that
$$\frac{g(\Delta t)}{\Delta t} \to 0 \text{ as } \Delta t \to 0.$$
For example $g(\Delta t) = (\Delta t)^2 = o(\Delta t)$ but $(\Delta t)^{1/2}$ is not $o(\Delta t)$.

[22] This section optional for stat 220.
Physical Setup: Consider a situation in which a certain type of event occurs at random points in time (or space) according to the following conditions:
1. Independence: the numbers of occurrences in non-overlapping intervals are independent.
2. Individuality: for sufficiently short time periods of length $\Delta t$, the probability of 2 or more events occurring in the interval is close to zero, i.e. events occur singly, not in clusters. More precisely, as $\Delta t \to 0$, the probability of two or more events in the interval of length $\Delta t$ must go to zero faster than $\Delta t$, or
$$P(\text{2 or more events in } (t, t + \Delta t)) = o(\Delta t) \text{ as } \Delta t \to 0.$$
3. Homogeneity or Uniformity: events occur at a uniform or homogeneous rate $\lambda$ over time so that the probability of one occurrence in an interval $(t, t + \Delta t)$ is approximately $\lambda \Delta t$ for small $\Delta t$, for any value of $t$. More precisely,
$$P(\text{one event in } (t, t + \Delta t)) = \lambda \Delta t + o(\Delta t).$$
These three conditions together define a Poisson Process.
Let $X$ be the number of event occurrences in a time period of length $t$. Then it can be shown (see below) that $X$ has a Poisson distribution with $\mu = \lambda t$.
Illustrations:
(1) The emission of radioactive particles from a substance follows a Poisson process. (This is used in medical imaging and other areas.)
(2) Hits on a web site during a given time period often follow a Poisson process.
(3) Occurrences of certain non-communicable diseases sometimes follow a Poisson process.
Probability Function: We can derive the probability function $f(x) = P(X = x)$ from the conditions above. We are interested in time intervals of arbitrary length $t$, so as a temporary notation, let $f_t(x)$ be the probability of $x$ occurrences in a time interval of length $t$. We now relate $f_t(x)$ and $f_{t+\Delta t}(x)$. From that we can determine what $f_t(x)$ is. To find $f_{t+\Delta t}(x)$ we note that for $\Delta t$ small there are only 2 ways to get a total of $x$ event occurrences by time $t + \Delta t$. Either there are $x$ events by time $t$ and no more from $t$ to $t + \Delta t$, or there are $x - 1$ by time $t$ and 1 more from $t$ to $t + \Delta t$. (Since $P(\text{2 or more events in } (t, t + \Delta t)) = o(\Delta t)$, other possibilities are negligible if $\Delta t$ is small.) This and condition 1 above (independence) imply that
$$f_{t+\Delta t}(x) \doteq f_t(x)(1 - \lambda \Delta t) + f_t(x-1)(\lambda \Delta t) + o(\Delta t)$$
Re-arranging gives $\frac{f_{t+\Delta t}(x) - f_t(x)}{\Delta t} \doteq \lambda[f_t(x-1) - f_t(x)] + o(1)$. Taking the limit as $\Delta t \to 0$ we get
$$\frac{d}{dt} f_t(x) = \lambda[f_t(x-1) - f_t(x)]. \quad (5.4)$$
This provides a "differential-difference" equation that needs to be solved for the functions $f_t(x)$ as functions of $t$ for each fixed integer value of $x$. We know that in an interval of length 0, zero events will occur, so that $f_0(0) = 1$ and $f_0(x) = 0$ for $x = 1, 2, 3, \ldots$. At the moment we may not know how to solve such a system but let's approach the problem using the binomial approximation of the last section. Suppose that the interval $(0, t)$ is divided into $n = \frac{t}{\Delta t}$ small subintervals of length $\Delta t$. The probability that an event falls in any subinterval (record this as a "success") is approximately $p = \lambda \Delta t$ provided the interval length is small. The probability of two or more events falling in any one subinterval is less than $n P(\text{2 or more events in } (t, t + \Delta t)) = n \, o(\Delta t)$, which goes to 0 as $\Delta t \to 0$, so we can ignore the possibility that one of the subintervals has 2 or more events in it. Also the "successes" are independent on the $n$ different subintervals or "trials", and so the total number of successes recorded, $X$, is approximately binomial$(n, p)$. Therefore
$$P(X = x) \simeq \binom{n}{x} p^x (1-p)^{n-x} = \frac{n^{(x)} p^x}{x!} (1-p)^n \left(\frac{1}{1-p}\right)^x$$
Notice that for fixed $t$, $x$, as $\Delta t \to 0$, $p = \lambda \Delta t \to 0$ and $n = \frac{t}{\Delta t} \to \infty$, so that $(1-p)^n \to e^{-\lambda t}$ and $\left(\frac{1}{1-p}\right)^x \to 1$. Also, for fixed $x$, $n^{(x)} p^x \to (\lambda t)^x$. This yields the approximation
$$P(X = x) \simeq \frac{(\lambda t)^x e^{-\lambda t}}{x!}$$
You can easily confirm that this, i.e.
$$f_t(x) = f(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}; \quad x = 0, 1, 2, \ldots$$
provides a solution to the system (5.4) with the required initial conditions. If we let $\mu = \lambda t$, we can re-write $f(x)$ as $f(x) = \frac{\mu^x e^{-\mu}}{x!}$, which is the Poisson distribution from Section 5.7. That is:
In a Poisson process with rate of occurrence $\lambda$, the number of event occurrences $X$ in a time interval of length $t$ has a Poisson distribution with $\mu = \lambda t$.
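The claim that $f_t(x) = (\lambda t)^x e^{-\lambda t}/x!$ solves the differential-difference system (5.4) can be checked numerically. The following Python sketch is an illustration under our own assumptions: the rate `lam = 2.0`, the time points, and the step `h` are arbitrary choices, and the derivative is approximated by a central difference rather than computed exactly.

```python
from math import exp, factorial

def f(t, x, lam):
    """Probability of x events in an interval of length t, rate lam."""
    if x < 0:
        return 0.0  # no probability on negative counts
    return (lam * t)**x * exp(-lam * t) / factorial(x)

def residual(t, x, lam, h=1e-6):
    """d/dt f_t(x) minus lam * [f_t(x-1) - f_t(x)], using a central difference."""
    derivative = (f(t + h, x, lam) - f(t - h, x, lam)) / (2 * h)
    return derivative - lam * (f(t, x - 1, lam) - f(t, x, lam))

lam = 2.0  # illustrative rate, not taken from the notes

# Largest violation of (5.4) over a small grid of t and x values.
worst = max(abs(residual(t, x, lam)) for t in (0.5, 1.0, 3.0) for x in range(6))
print(worst)

# Initial conditions: f_0(0) = 1 and f_0(x) = 0 for x >= 1.
print(f(0, 0, lam), f(0, 3, lam))
```

The residual is at the level of floating-point noise, which is consistent with (though of course not a proof of) the formula satisfying (5.4).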
Interpretation of $\mu$ and $\lambda$: $\lambda$ is referred to as the intensity or rate of occurrence parameter for the events. It represents the average rate of occurrence of events per unit of time (or area or volume, as discussed below). Then $\lambda t = \mu$ represents the average number of occurrences in $t$ units of time. It is important to note that the value of $\lambda$ depends on the units used to measure time. For example, if phone calls arrive at a store at an average rate of 20 per hour, then $\lambda = 20$ when time is in hours and the average in 3 hours will be $3 \times 20$ or 60. However, if time is measured in minutes then $\lambda = 20/60 = 1/3$; the average in 180 minutes (3 hours) is still $(1/3)(180) = 60$.
Example Suppose earthquakes recorded in Ontario each year follow a Poisson process with an
average of 6 per year. What is the probability that 7 will be recorded in a 2-year period?
In this case $t = 2$ (years) and the intensity of earthquakes is $\lambda = 6$. Therefore $X$, the number of earthquakes in the two-year period, follows a Poisson distribution with parameter $\mu = \lambda t = 12$. The probability that 7 earthquakes will be recorded in a 2-year period is
\[ f(7) = \frac{12^7 e^{-12}}{7!} = .0437. \]
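The earthquake calculation can be reproduced in R. This is a minimal sketch using the built-in `dpois` function; the variable names are chosen here for illustration.

```r
lambda <- 6        # average earthquakes per year (from the example)
t <- 2             # length of the period in years
mu <- lambda * t   # Poisson mean for the 2-year period

p7 <- dpois(7, mu)                           # P(X = 7) from the built-in function
p7_formula <- mu^7 * exp(-mu) / factorial(7) # the same value from the formula
print(round(p7, 4))
```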
Example At a nuclear power station an average of 8 leaks of heavy water are reported per year. Find
the probability of 2 or more leaks in 1 month, if leaks follow a Poisson process.
Solution: Assume leaks satisfy the conditions for a Poisson process and that a month is 1/12 of a year. Let $X$ be the number of leaks in one month. Then $X$ has the Poisson distribution with $\lambda = 8$ and $t = 1/12$, so $\mu = \lambda t = 8/12$. Thus
\[ P(X \ge 2) = 1 - P(X < 2) = 1 - [f(0) + f(1)] = 1 - \left[\frac{(8/12)^0 e^{-8/12}}{0!} + \frac{(8/12)^1 e^{-8/12}}{1!}\right] \approx 0.1443. \]
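A short R sketch of the same calculation, using the built-in cumulative distribution function `ppois`. The variable names are illustrative.

```r
mu <- 8 * (1 / 12)             # lambda = 8 per year, t = 1/12 year

# P(X >= 2) = 1 - P(X <= 1)
p_two_or_more <- 1 - ppois(1, mu)
print(round(p_two_or_more, 4))
```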
Random Occurrence of Events in Space: The Poisson process also applies when events occur randomly in space (either 2 or 3 dimensions). For example, the "events" might be bacteria in a volume of water or blemishes in the finish of a paint job on a metal surface. If $X$ is the number of events in a volume or area in space of size $v$ and if $\lambda$ is the average number of events per unit volume (or area), then $X$ has a Poisson distribution with $\mu = \lambda v$. For this model to be valid, it is assumed that the Poisson process conditions given previously apply here, with "time" replaced by "volume" or "area". Once again, note that the value of $\lambda$ depends on the units used to measure volume or area.
Example: Coliform bacteria occur in river water with an average intensity of 1 bacteria per 10 cubic centimeters (cc) of water. Find (a) the probability there are no bacteria in a 20 cc sample of water which is tested, and (b) the probability there are 5 or more bacteria in a 50 cc sample. (To do this assume that a Poisson process describes the location of bacteria in the water at any given time.)
Solution: Let $X$ = number of bacteria in a sample of volume $v$ cc. Since $\lambda = 0.1$ bacteria per cc (1 per 10 cc) the p.f. of $X$ is Poisson with $\mu = 0.1v$,
\[ f(x) = \frac{e^{-0.1v} (0.1v)^x}{x!}; \quad x = 0, 1, 2, \ldots \]
Thus we find
(a) With $v = 20$, $\mu = 2$ so $P(X = 0) = f(0) = e^{-2} = .135$
(b) With $v = 50$, $\mu = 5$ so $f(x) = e^{-5} 5^x / x!$ and $P(X \ge 5) = 1 - P(X \le 4) = 1 - .440 = .560$
(Note: we can use the R command ppois(4, 5) to get $P(X \le 4)$.)
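Both parts of the bacteria example can be checked in R. In this sketch `dpois` gives $P(X = 0)$ directly, and $P(X \ge 5)$ is obtained as the complement of `ppois(4, 5)`; the variable names are illustrative.

```r
lambda <- 0.1                      # bacteria per cc

p_a <- dpois(0, lambda * 20)       # (a) no bacteria in a 20 cc sample
p_le4 <- ppois(4, lambda * 50)     # P(X <= 4) in a 50 cc sample
p_b <- 1 - p_le4                   # (b) 5 or more bacteria in a 50 cc sample

print(round(c(p_a, p_le4, p_b), 4))
```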
Exercise: In each of the above examples, how well are each of the conditions for a Poisson process likely to be satisfied?

Distinguishing Poisson from Binomial and Other Distributions

Students often have trouble knowing when to use the Poisson distribution and when not to use it. To be certain, the three conditions for a Poisson process need to be checked. However, a quick decision can often be made by asking yourself the following questions:

1. Can we specify in advance the maximum value which $X$ can take?
If we can, then the distribution is not Poisson. If there is no fixed upper limit, the distribution might be Poisson, but is certainly not binomial or hypergeometric, e.g. the number of seeds which germinate out of a package of 25 does not have a Poisson distribution since we know in advance that $X \le 25$. The number of cardinals sighted at a bird feeding station in a week might be Poisson since we can't specify a fixed upper limit on $X$. At any rate, this number would not have a binomial or hypergeometric distribution. Of course if it is binomial with a very large value of $n$ and a small value of $p$ we may still use the Poisson distribution, but in this case it is being used to approximate a binomial.

2. Does it make sense to ask how often the event did not occur?
If it does make sense, the distribution is not Poisson. If it does not make sense, the distribution might be Poisson. For example, it does not make sense to ask how often a person did not hiccup during an hour. So the number of hiccups in an hour might have a Poisson distribution. It would certainly not be binomial, negative binomial, or hypergeometric. If a coin were tossed until the 3rd head occurs it does make sense to ask how often heads did not come up. So the distribution would not be Poisson. (In fact, we'd use negative binomial for the number of non-heads; i.e. tails.)
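The remark in question 1 about using the Poisson distribution to approximate a binomial with large $n$ and small $p$ can be illustrated numerically. The sketch below uses assumed values $n = 1000$ and $p = 0.002$ (not taken from the text) and compares the two probability functions term by term.

```r
# Assumed illustrative values: large n, small p
n <- 1000
p <- 0.002
x <- 0:10

binom_pf <- dbinom(x, n, p)   # exact binomial probabilities
pois_pf  <- dpois(x, n * p)   # Poisson probabilities with mu = n * p

max_diff <- max(abs(binom_pf - pois_pf))  # largest pointwise discrepancy
print(round(cbind(x, binom_pf, pois_pf), 5))
print(max_diff)
```

Note that the binomial variable here still has a fixed upper limit of $n$; the Poisson model is only an approximation to it, which is the distinction the text draws.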
Problems:
5.8.1 Suppose that emergency calls to 911 follow a Poisson process with an average of 3 calls per minute. Find the probability there will be
a) 6 calls in a period of $2\tfrac{1}{2}$ minutes.
b) 2 calls in the first minute of a $2\tfrac{1}{2}$ minute period, given that 6 calls occur in the entire period.
5.8.2 Misprints are distributed randomly and uniformly in a book, at a rate of 2 per 100 lines.
(a) What is the probability a line is free of misprints?
(b) Two pages are selected at random. One page has 80 lines and the other 90 lines. What is the probability that there are exactly 2 misprints on each of the two pages?
5.9 Combining Other Models with the Poisson Process [23]
[23] This section optional for Stat 220.
While we've considered the model distributions in this chapter one at a time, we will sometimes need to use two or more distributions to answer a question. To handle this type of problem you'll need to be very clear about the characteristics of each model. Here is a somewhat artificial illustration. Lots of other examples are given in the problems at the end of the chapter.
Example: A very large (essentially infinite) number of ladybugs is released in a large orchard. They scatter randomly so that on average a tree has 6 ladybugs on it. Trees are all the same size.
a) Find the probability a tree has $> 3$ ladybugs on it.
b) When 10 trees are picked at random, what is the probability 8 of these trees have $> 3$ ladybugs on them?
c) Trees are checked until 5 with $> 3$ ladybugs are found. Let $X$ be the total number of trees checked. Find the probability function, $f(x)$.
d) Find the probability a tree with $> 3$ ladybugs on it has exactly 6.
e) On 2 trees there are a total of $t$ ladybugs. Find the probability that $x$ of these are on the first of these 2 trees.
Solution:
a) If the ladybugs are randomly scattered the most suitable model is the Poisson distribution with $\lambda = 6$ and $v = 1$ (i.e. any tree has a "volume" of 1 unit), so $\mu = 6$ and
\[ P(X > 3) = 1 - P(X \le 3) = 1 - [f(0) + f(1) + f(2) + f(3)] \]
\[ = 1 - \left[\frac{6^0 e^{-6}}{0!} + \frac{6^1 e^{-6}}{1!} + \frac{6^2 e^{-6}}{2!} + \frac{6^3 e^{-6}}{3!}\right] = .8488 \]
b) Using the binomial distribution where "success" means $> 3$ ladybugs on a tree, we have $n = 10$, $p = .8488$ and
\[ f(8) = \binom{10}{8} (.8488)^8 (1 - .8488)^2 = .2772 \]
c) Using the negative binomial distribution, we need the number of successes, $k$, to be 5, and the number of failures to be $(x - 5)$. Then
\[ f(x) = \binom{x-5+5-1}{x-5} (.8488)^5 (1 - .8488)^{x-5} = \binom{x-1}{x-5} (.8488)^5 (1 - .8488)^{x-5} \]
\[ \text{or } \binom{x-1}{4} (.8488)^5 (.1512)^{x-5}; \quad x = 5, 6, 7, \ldots \]
d) This is conditional probability. Let $A$ = {6 ladybugs} and $B$ = {more than 3 ladybugs}. Then
\[ P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(\text{6 ladybugs})}{P(\text{more than 3 ladybugs})} = \frac{6^6 e^{-6}/6!}{0.8488} = 0.1892. \]
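e) Again we need to use conditional probability.
\[ P(x \text{ on 1st tree} \mid \text{total of } t) = \frac{P(x \text{ on 1st tree and total of } t)}{P(\text{total of } t)} = \frac{P(x \text{ on 1st tree and } t-x \text{ on 2nd tree})}{P(\text{total of } t)} = \frac{P(x \text{ on 1st tree}) \cdot P(t-x \text{ on 2nd tree})}{P(\text{total of } t)} \]
Use the Poisson distribution to calculate each, with $\mu = 6 \times 2 = 12$ in the denominator since there are 2 trees.
\[ P(x \text{ on 1st tree} \mid \text{total of } t) = \frac{\left(\frac{6^x e^{-6}}{x!}\right)\left(\frac{6^{t-x} e^{-6}}{(t-x)!}\right)}{\frac{12^t e^{-12}}{t!}} = \frac{t!}{x!(t-x)!} \left(\frac{6}{12}\right)^x \left(\frac{6}{12}\right)^{t-x} = \binom{t}{x} \left(\frac{1}{2}\right)^x \left(1 - \frac{1}{2}\right)^{t-x}, \quad x = 0, 1, \ldots, t. \]
Caution: Don't forget to give the range of $x$. If the total is $t$, there couldn't be more than $t$ ladybugs on the 1st tree.
Exercise: The answer to (e) is a binomial probability function. Can you reach this answer by general reasoning rather than using conditional probability to derive it?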
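The ladybug example chains several models together, and the numerical answers can be reproduced in R with the built-in functions `ppois`, `dbinom`, `dnbinom` and `dpois`. This is a sketch only: the variable names are illustrative, and the total `t_total = 9` used to check part (e) is an arbitrary assumed value. Note that R's `dnbinom` counts failures, so part (c) passes `x - 5`.

```r
mu <- 6

p_a <- 1 - ppois(3, mu)            # (a) P(X > 3) for one tree
p_b <- dbinom(8, 10, p_a)          # (b) 8 of 10 trees have > 3 ladybugs

# (c) x = total trees checked; dnbinom takes the number of failures, x - 5
f_c <- function(x) dnbinom(x - 5, size = 5, prob = p_a)

p_d <- dpois(6, mu) / p_a          # (d) P(exactly 6 | more than 3)

# (e) conditional distribution on the first tree, for an assumed total
t_total <- 9
x <- 0:t_total
p_e <- dpois(x, 6) * dpois(t_total - x, 6) / dpois(t_total, 12)
e_is_binomial <- isTRUE(all.equal(p_e, dbinom(x, t_total, 0.5)))

print(round(c(p_a, p_b, p_d), 4))
print(e_is_binomial)
```

Carrying the unrounded value of `p_a` through parts (b) to (d) gives answers that agree with the text to the precision shown there.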
Problems:
5.9.1 In a Poisson process the average number of occurrences is $\lambda$ per minute. Independent 1 minute intervals are observed until the first minute with no occurrences is found. Let $X$ be the number of 1 minute intervals required, including the last one. Find the probability function, $f(x)$.
5.9.2 Calls arrive at a telephone distress centre during the evening according to the conditions for a Poisson process. On average there are 1.25 calls per hour.
(a) Find the probability there are no calls during a 3 hour shift.
(b) Give an expression for the probability a person who starts working at this centre will have the first shift with no calls on the 15th shift.
(c) A person works one hundred 3 hour evening shifts during the year. Give an expression for the probability there are no calls on at least 4 of these 100 shifts. Calculate a numerical answer using a Poisson approximation.
5.10 Summary of Single Variable Discrete Models

Name                 Probability Function
Discrete Uniform     $f(x) = \frac{1}{b-a+1}$;  $x = a, a+1, a+2, \ldots, b$
Hypergeometric       $f(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$;  $x = \max(0, n-(N-r)), \ldots, \min(n, r)$
Binomial             $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$;  $x = 0, 1, 2, \ldots, n$
Negative Binomial    $f(x) = \binom{x+k-1}{x} p^k (1-p)^x$;  $x = 0, 1, 2, \ldots$
Geometric            $f(x) = p(1-p)^x$;  $x = 0, 1, 2, \ldots$
Poisson              $f(x) = \frac{e^{-\mu} \mu^x}{x!}$;  $x = 0, 1, 2, \ldots$
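Each probability function in this summary has a counterpart among R's built-in functions, though R's argument conventions differ from the notation used here. The sketch below compares the formulas with `dhyper`, `dbinom`, `dnbinom`, `dgeom` and `dpois` for arbitrary assumed parameter values (none are taken from the text). In particular, `dhyper(x, m, n, k)` takes the number of successes `m`, the number of failures `n` and the sample size `k`.

```r
# Assumed illustrative parameter values
N <- 20; r <- 8; n <- 5      # hypergeometric: N objects, r successes, sample of n
p <- 0.3; k <- 4; mu <- 2.5  # success probability, successes needed, Poisson mean
x <- 0:5

same <- function(a, b) isTRUE(all.equal(a, b))

hyper_ok  <- same(choose(r, x) * choose(N - r, n - x) / choose(N, n),
                  dhyper(x, r, N - r, n))
binom_ok  <- same(choose(n, x) * p^x * (1 - p)^(n - x), dbinom(x, n, p))
nbinom_ok <- same(choose(x + k - 1, x) * p^k * (1 - p)^x,
                  dnbinom(x, size = k, prob = p))
geom_ok   <- same(p * (1 - p)^x, dgeom(x, p))
pois_ok   <- same(exp(-mu) * mu^x / factorial(x), dpois(x, mu))

print(c(hyper_ok, binom_ok, nbinom_ok, geom_ok, pois_ok))
```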
5.11 Problems on Chapter 5
5.1 Suppose that the probability $p(x)$ a person born in 1950 lives at least to certain ages $x$ is as given in the table below.
x:        30    50    70    80    90
Females  .980  .955  .910  .595  .240
Males    .960  .920  .680  .375  .095
(a) If a female lives to age 50, what is the probability she lives to age 80? To age 90? What are the corresponding probabilities for males?
(b) If 51% of persons born in 1950 were male, find the fraction of the total population (males and females) that will live to age 90.
5.2 Let $X$ be a non-negative discrete random variable with cumulative distribution function
\[ F(x) = 1 - 2^{-x} \quad \text{for } x = 0, 1, 2, \ldots \]
(a) Find the probability function of $X$.
(b) Find the probability of the event $X = 5$; the event $X \ge 5$.
5.3 Two balls are drawn at random from a box containing ten balls numbered 0, 1, ..., 9. Let random variable $X$ be the larger of the numbers on the two balls and random variable $Y$ be their total.
(a) Tabulate the p.f. of $X$ and of $Y$ if the sampling is without replacement.
(b) Repeat (a) if the sampling is with replacement.
5.4 Let $X$ have a geometric distribution with $f(x) = p(1-p)^x$; $x = 0, 1, 2, \ldots$. Find the probability function of $R$, the remainder when $X$ is divided by 4.
5.5 (a) Todd decides to keep buying a lottery ticket each week until he has 4 winners (of some prize). Suppose 30% of the tickets win some prize. Find the probability he will have to buy 10 tickets.
(b) A coffee chain claims that you have a 1 in 9 chance of winning a prize on their "roll up the edge" promotion, where you roll up the edge of your paper cup to see if you win. If so, what is the probability you have no winners in a one week period where you bought 15 cups of coffee?
(c) Over the last week of a month long promotion you and your friends bought 60 cups of coffee, but there was only 1 winner. Find the probability that there would be this few (i.e. 1 or 0) winners. What might you conclude?
0.595/0.955=0.623 0.240/0.955=0.251
0.375/0.920=0.408 0.095/0.920=0.103
0.49*0.240+0.51*0.095=0.166
f(x)= F(x) - F(x-1) = 2^(-x+1) - 2^(-x) = 1/ 2^x x=0,1,2.....
f(5)= 1/32 P(X>= 5) = 1 - P(X <= 4) = 1- (1- 2^-4) = 1 - 15/16 = 1/16
R(x) = p(1-p)^r / (1-(1-p)^4), r=0,1,2,3
9C3*(0.3)^3 *(0.7)^6*0.3=0.080
(8/9)^15=0.171
(8/9)^59 * 60C1 * (1/9)^1 + (8/9)^60 = 0.00725
5.6 An oil company runs a contest in which there are 500,000 tickets; a motorist receives one ticket with each fill-up of gasoline, and 500 of the tickets are winners.
(a) If a motorist has ten fill-ups during the contest, what is the probability that he or she wins at least one prize?
(b) If a particular gas bar distributes 2,000 tickets during the contest, what is the probability that there is at least one winner among the gas bar's customers?
5.7 Jury selection. During jury selection a large number of people are asked to be present, then persons are selected one by one in a random order until the required number of jurors has been chosen. Because the prosecution and defense teams can each reject a certain number of persons, and because some individuals may be exempted by the judge, the total number of persons selected before a full jury is found can be quite large.
(a) Suppose that you are one of 150 persons asked to be present for the selection of a jury. If it is necessary to select 40 persons in order to form the jury, what is the probability you are chosen?
(b) In a recent trial the numbers of men and women present for jury selection were 74 and 76. Let $Y$ be the number of men picked for a jury of 12 persons. Give an expression for $P(Y = y)$, assuming that men and women are equally likely to be picked.
(c) For the trial in part (b), the number of men selected turned out to be two. Find $P(Y \le 2)$. What might you conclude from this?
5.8 A waste disposal company averages 6.5 spills of toxic waste per month. Assume spills occur randomly at a uniform rate, and independently of each other, with a negligible chance of 2 or more occurring at the same time. Find the probability there are 4 or more spills in a 2 month period.
5.9 Coliform bacteria are distributed randomly and uniformly throughout river water at the average concentration of one per twenty cubic centimeters of water.
(a) What is the probability of finding exactly two coliform bacteria in a 10 cubic centimeter sample of the river water?
(b) What is the probability of finding at least one coliform bacterium in a 1 cubic centimeter sample of the river water?
(c) In testing for the concentration (average number per unit volume) of bacteria it is possible to determine cheaply whether a sample has any (i.e. 1 or more) bacteria present or not.
P(X>=1) = 1 - P(X=0) = 1- (499,500/500,000)^10 = 0.010
P(X>=1) = 1- 2000C0 * (500/500,000)^0
* (499,500/500,000)^2000=0.865
40/150=0.267
P(Y=y) = 74Cy * 76C(12-y) / 150C12
P(Y<=2) = P(Y=0)+P(Y=1) + P(Y=2) = 0.0018+0.00246+0.01495=0.0176
Suppose the average concentration of bacteria in a body of water is $\lambda$ per cubic centimeter. If 10 independent water samples of 10 c.c. each are tested, let the random variable $Y$ be the number of samples with no bacteria. Find $P(Y = y)$.
(d) Suppose that of 10 samples, 3 had no bacteria. Find an estimate for the value of $\lambda$.
5.10 In a group of policy holders for house insurance, the average number of claims per 100 policies per year is $\lambda = 8.0$. The number of claims for an individual policy holder is assumed to follow a Poisson distribution.
(a) In a given year, what is the probability an individual policy holder has at least one claim?
(b) In a group of 20 policy holders, what is the probability there are no claims in a given year? What is the probability there are two or more claims?
5.11 Assume power failures occur independently of each other at a uniform rate through the months of the year, with little chance of 2 or more occurring simultaneously. Suppose that 80% of months have no power failures.
a) Seven months are picked at random. What is the probability that 5 of these months have no power failures?
b) Months are picked at random until 5 months without power failures have been found. What is the probability that 7 months will have to be picked?
c) What is the probability a month has more than one power failure?
5.12 a) Let $f(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$, and keep $p = \frac{r}{N}$ fixed. (e.g. If $N$ doubles then $r$ also doubles.) Prove that
\[ \lim_{N \to \infty} f(x) = \binom{n}{x} p^x (1-p)^{n-x}. \]
b) What part of the chapter is this related to?
5.13 Spruce budworms are distributed through a forest according to a Poisson process so that the average is $\lambda$ per hectare.
a) Give an expression for the probability that at least 1 of $n$ one hectare plots contains at least $k$ spruce budworms.
b) Discuss briefly which assumption(s) for a Poisson process may not be well satisfied in this situation.
5.14 A person working in telephone sales has a 20% chance of making a sale on each call, with calls being independent. Assume calls are made at a uniform rate, with the numbers made in non-overlapping periods being independent. On average there are 20 calls made per hour.
7C5*(0.8)^5*(0.2)^2 = 0.2753
6C4 * (0.8)^4 * (0.2)^2 *0.8=0.1966
P(X>1) = 1- P(X=0)-P(X=1)
1-p = 0.8 ^(1/30), p = 0.0074

0.0215
Enables us to approximate hypergeometric distribution
by binomial distribution
a) Find the probability there are 2 sales in 5 calls.
b) Find the probability exactly 8 calls are needed to make 2 sales.
c) If 8 calls were needed to make 2 sales, what is the probability there was 1 sale in the first 3 of these calls?
d) Find the probability of 3 calls being made in a 15 minute period.
5.15 A bin at a hardware store contains 35 forty watt lightbulbs and 70 sixty watt bulbs. A customer wants to buy 8 sixty watt bulbs, and withdraws bulbs without replacement until these 8 bulbs have been found. Let $X$ be the number of 40 watt bulbs drawn from the bin. Find the probability function, $f(x)$.
5.16 During rush hour the number of cars passing through a particular intersection [24] has a Poisson distribution with an average of 540 per hour.
a) Find the probability there are 11 cars in a 30 second interval and the probability there are 11 or more cars.
b) Find the probability that when 20 disjoint 30 second intervals are studied, exactly 2 of them had 11 cars.
c) We want to find 12 intervals having 11 cars in 30 seconds.
(i) Give an expression for the probability 1400 30 second intervals have to be observed to find the 12 having the desired traffic flow.
(ii) Use an approximation which involves the Poisson distribution to evaluate this probability and justify why this approximation is suitable.
5.17 (a) Bubbles are distributed in sheets of glass, as a Poisson process, at an intensity of 1.2 bubbles per square metre. Let $X$ be the number of sheets of glass, in a shipment of $n$ sheets, which have no bubbles. Each sheet is 0.8 m$^2$. Give the probability function of $X$.
(b) The glass manufacturer wants to have at least 50% of the sheets of glass with no bubbles. How small will the intensity $\lambda$ need to be to achieve this?
5.18 Random variable $X$ takes values 1, 2, 3, 4, 5 and has c.d.f.
x      0   1     2    3     4   5
F(x)   0   .1k   .2   .5k   k   4k^2
Find $k$, $f(x)$ and $P(2 < X \le 4)$. Draw a histogram of $f(x)$.
[24] "Traffic signals in New York are just rough guidelines." David Letterman (1947- )
f(x)=(x+7)Cx * (35Cx) * (70C7) * ((70-7)/ (105-x-7)
f(x) = 35Cx * 70C7 / 105C(X+7) * (63/(98-x)) x=0,1,2...35 NO (x+7) C x ???
5.19 Let random variable $Y$ have a geometric distribution $P(Y = y) = p(1-p)^y$ for $y = 0, 1, 2, \ldots$.
(a) Find an expression for $P(Y \ge y)$, and show that $P(Y \ge s + t \mid Y \ge s) = P(Y \ge t)$ for all non-negative integers $s$, $t$.
(b) What is the most probable value of $Y$?
(c) Find the probability that $Y$ is divisible by 3.
(d) Find the probability function of random variable $R$, the remainder when $Y$ is divided by 3.
5.20 Polls and Surveys. Polls or surveys in which people are selected and their opinions or other characteristics are determined are very widely used. For example, in a survey on cigarette use among teenage girls, we might select a random sample of $n$ girls from the population in question, and determine the number $X$ who are regular smokers. If $p$ is the fraction of girls who smoke, then $X \sim Bi(n, p)$. Since $p$ is unknown (that is why we do the survey) we then estimate it as $\hat{p} = X/n$. (In probability and statistics a "hat" is used to denote an estimate of a model parameter based on data.) The binomial distribution can be used to study how "good" such estimates are, as follows.
(a) Suppose $p = .3$ and $n = 100$. Find the probability $P(.27 \le \frac{X}{n} \le .33)$. Many surveys try to get an estimate $X/n$ which is within 3% (.03) of $p$ with high probability. What do you conclude here?
(b) Repeat the calculation in (a) if $n = 400$ and $n = 1000$. What do you conclude?
(c) If $p = .5$ instead of .3, find $P(.47 \le \frac{X}{n} \le .53)$ when $n = 400$ and 1000.
(d) Your employer asks you to design a survey to estimate the fraction $p$ of persons age 25-34 who download music via the internet. The objective is to get an estimate accurate to within 3%, with probability close to .95. What size of sample ($n$) would you recommend?
5.21 Telephone surveys. In some "random digit dialing" surveys, a computer phones randomly selected telephone numbers. However, not all numbers are "active" (belong to a telephone account) and they may belong to businesses as well as to individuals or residences.
Suppose that for a given large set of telephone numbers, 57% are active residential or individual numbers. We will call these "personal" numbers.
Suppose that we wish to interview (over the phone) 1000 persons in a survey.
(a) Suppose that the probability a call to a personal number is answered is .8, and that the probability the person answering agrees to be interviewed is .7. Give the probability distribution for $X$, the number of calls needed to obtain 1000 interviews.
(p+p(1-p)^y) / (1-(1-p)) = (1-p)^y (1-p)^(s+t) / (1-p)^s = (1-p)^t
y=0,1,2... = P(Y>= t)
Y=0
p/(1-(1-p)^3)
R(y) = p(1-p)^r / (1 - (1-p)^3) , r = 0,1,2
(b) Use R software to find $P(X \le x)$ for the values $x = 2900, 3000, 3100, 3200$.
(c) Suppose instead that 3200 randomly selected numbers were dialed. Give the probability distribution for $Y$, the number of interviews obtained, and find $P(Y \ge 1000)$.
(Note: The R functions pnbinom and pbinom give negative binomial and binomial probabilities, respectively.)
5.22 Challenge problem: Suppose that $n$ independent tosses of a coin having probability $p$ of coming up heads are made. Show that the probability of an even number of heads is given by $\frac{1}{2}\left[1 + (q - p)^n\right]$ where $q = 1 - p$.
6. Computational Methods and R [25]
One of the giant steps towards democracy in the last century was the increased democratization of knowledge [26], facilitated by the personal computer, Wikipedia and the advent of free open-source (GNU) software such as Linux. The statistical software package R implements a dialect of the S language that was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks. Versions of R are available, at no cost, for 32-bit versions of Microsoft Windows, for Linux, for Unix and for Macintosh systems. It is available through the Comprehensive R Archive Network (CRAN) (downloadable for Unix, Windows or Mac platforms at http://cran.r-project.org/). This means that a community of interested statisticians voluntarily maintains and updates the software. Like the licensed software Matlab and Splus, R permits easy matrix and numerical calculations, as well as a programming environment for high-level computations. The R software also provides a powerful tool for handling probability distributions, generating random variables, and graphical display. Because it is freely available and used by statisticians world-wide, high level programs in R are often available on the web. These notes provide a glimpse of a few of the features of R. Web resources have much more information and more links can be found on the Stat 230 web page. We will provide a brief description of commands on a Windows machine here, but the Mac and Unix commands will generally be similar once R is started.
[25] This section optional for Stat 220 and Stat 230.
[26] "Knowledge is the most democratic source of power." Alvin Toffler
6.1 Preliminaries
Begin by installing R on your personal computer. Then invoke it, on Math Unix machines by typing R, or on a Windows machine by clicking on the R icon. For these notes, we will simply describe typing commands into the R command window following the R prompt ">" in interactive mode. This window is displayed below in Figure 6.1.
Figure 6.1: An R, version 2.7.1 command window in Windows
Objects include variables, functions, vectors, arrays, lists and other items. To see online documentation about something, we use the "help" function. For example, to see documentation on the function mean(), type
help(mean)
In some cases help.search() is helpful. For example
help.search("matrix")
lists all functions whose help pages have a title or alias in which the text string "matrix" appears.
The <- is a left diamond bracket (<) followed by a minus sign (-). It means "is assigned to", for example,
x <- 15
assigns the value 15 to variable x. To quit an R session, type
q()
You need the brackets () because you wish to run the function "q". Typing q on its own, without the parentheses, displays the text of the function on the screen. Try it! Alternatively, to quit R you can click on the "File" menu and then on Exit, or on the x in the top right corner of the R window. You are asked whether you want to save the workspace image. Clicking "Yes" (safer) will save all the objects that remain in the workspace, both those at the start of the session and those added.
6.2 Vectors
Vectors can consist of numbers or other symbols; we will consider only numbers here. Vectors are defined using c(): for example,
x <- c(1, 3, 5, 7, 9)
defines a vector of length 5 with the elements given. Vectors and other classes of objects possess certain attributes. For example, typing
length(x)
will give the length of the vector x. Vectors are a convenient way to store values of a function (e.g. a probability function or a c.d.f.) or values of a random variable that have been recorded in some experiment or process. We can also read a table of values from a text file that we created earlier, called say "mydata.txt", on a disk in drive c:
> mydata <- read.table("c:/mydata.txt", header=T)
Use of "header=T" causes R to use the first line of the text file to get header information for the columns. If column headings are not included in the file, the argument can be omitted and we obtain a table with just the data. The R object "mydata" is a special form known as a "data frame". Data frames that consist entirely of numeric data have a structure that is similar to that of numeric matrices. The names of the columns can be displayed with the command
> names(mydata)
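As a small illustration of storing a probability function and a c.d.f. in vectors, the sketch below uses a fair six-sided die, an example chosen here for illustration rather than taken from the text. The function `cumsum` returns the running totals of a vector, which turns a probability function into the corresponding c.d.f.

```r
x <- 1:6                 # possible outcomes of a fair die (assumed example)
pf <- rep(1/6, 6)        # probability function f(x) stored as a vector
cdf <- cumsum(pf)        # c.d.f. F(x) = f(1) + ... + f(x)

print(length(pf))        # number of stored values
print(cdf[3])            # F(3) = P(X <= 3)
print(x[cdf >= 0.5][1])  # smallest x with F(x) >= 0.5
```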
6.3 Arithmetic Operations
The following R commands and responses should explain the most basic arithmetic operations.
> 7+3
[1] 10
> 7*3
[1] 21
> 7/3
[1] 2.333333
> 2^3
[1] 8
In the last example the result is 8. The [1] says basically "first requested element follows", but here there is just one element. The ">" indicates that R is ready for another command.
6.4 Some Basic Functions
Functions of many types exist in R. Many operate on vectors in a transparent way, as do arithmetic operations. For example, if x and y are vectors then x+y adds the vectors element-wise; thus x and y must be the same length. Some examples, with comments, follow. Note that anything that follows a # on the command line is taken as comment and ignored by R.
> x <- c(1, 3, 5, 7, 9)   # Defines a vector x
> x                       # displays x
[1] 1 3 5 7 9
> y <- seq(1, 2, .25)     # seq defines vector whose elements are an arithmetic progression
> y
[1] 1.00 1.25 1.50 1.75 2.00
> y[2]                    # displays the second element of vector y
[1] 1.25
> y[c(2, 3)]              # displays vector of second and third elements of vector y
[1] 1.25 1.50
> mean(x)                 # computes mean of the elements of vector x
[1] 5
> summary(x)              # function which summarizes features of a vector x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       3       5       5       7       9
> var(x)                  # Computes the (sample) variance of the elements of x
[1] 10
> exp(1)                  # The exponential function
[1] 2.718282
> exp(y)
[1] 2.718282 3.490343 4.481689 5.754603 7.389056
> round(exp(y), 2)        # round(y, n) rounds elements of vector y to n decimals
[1] 2.72 3.49 4.48 5.75 7.39
> x+2*y
[1]  3.0  5.5  8.0 10.5 13.0
6.5 R Objects
Type "ls()" to see a list of names of all objects, including functions and data structures, in your workspace.
If you type the name of an object, vector, matrix or function, you are returned its contents. (Try typing "q" or "mean".)
Before you quit, you may remove objects that you no longer require with "rm()" and then save the workspace image. The workspace image is automatically loaded when you restart R in that directory.
6.6 Graphs
To open a graphics window in Unix, type x11(). Note that in R, a graphics window opens automatically when a graphical function is used.
There are various plotting and graphical functions. Two useful ones are
plot(x, y)   # Gives a scatterplot of x versus y; thus x and y must be
             # vectors of the same length.
hist(x)      # Creates a frequency histogram based on the values in the
             # vector x. To get a relative frequency histogram (areas of
             # rectangles sum to one) use
hist(x, prob=T)
Graphs can be tailored with respect to axis labels, titles, numbers of plots to a page etc. Type help(plot), help(hist) or help(par) for some information. Try
x <- (0:20)*pi/10
plot(x, sin(x))
Is it obvious that these points lie on a sine curve? One can make it more obvious by changing the shape of the graph. Place the cursor over the lower border of the graph sheet, until it becomes a double-sided arrow, and then drag the border in towards the top border, to make the graph sheet short and wide.
To save/print a graph in R using UNIX, you generate the graph you would like to save/print in R using a graphing function like plot() and type:
dev.print(device, file="filename")
where device is the device you would like to save the graph to (i.e. x11) and filename is the name of the file that you would like the graph saved to. To look at a list of the different graphics devices you can save to, type help(Devices).
To save/print a graph in R using Windows, you can do one of two things.
a) You can go to the File menu when the graph window is active and save the graph using one of several formats (i.e. postscript, jpeg, etc.) or print it. You may also copy the graph to the clipboard using one of the formats and then paste to an editor, such as MS Word.
b) You can right click on the graph. This gives you a choice of copying the graph and then pasting to an editor, such as MS Word, or saving the graph as a metafile or bitmap, or printing directly to a printer.
6.7 Distributions
There are functions which compute values of probability or probability density functions, cumulative distribution functions, and quantiles for various distributions. It is also possible to generate (pseudo) random samples from these distributions. Some examples follow for Binomial and Poisson distributions. For other distribution information, type
help(rhyper),
help(rnbinom)
and so on. Note that R does not have any function specifically designed to generate random samples from a discrete uniform distribution (although there is one for a continuous uniform distribution). To generate n random samples from a discrete UNIF(a,b), use
sample(a:b, n, replace=T)
> y <- rbinom(10, 100, 0.25)   # Generate 10 random values from the Binomial
                               # distribution Bi(100, 0.25). The values are stored in the vector y.
> y                            # Display the values
[1] 24 24 26 18 29 29 33 28 28 28
> pbinom(3, 10, 0.5)           # Compute P(Y<=3) for a Bi(10, 0.5) random variable.
[1] 0.171875
> qbinom(.95, 10, 0.5)         # Find the .95 quantile (95th percentile) for Bi(10, 0.5).
[1] 8
> z <- rpois(10, 10)           # Generate 10 random values from the Poisson distribution
                               # Poisson(10). The values are stored in the vector z.
> z                            # Display the values
[1] 6 5 12 10 9 7 9 12 5 9
> ppois(3, 10)                 # Compute P(Y<=3) for a Poisson(10) random variable.
[1] 0.01033605
> qpois(.95, 10)               # Find the .95 quantile (95th percentile) for Poisson(10).
[1] 15
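The functions above are related to one another: the "p" functions are cumulative sums of the "d" (probability function) values, and the "q" functions return the smallest value whose cumulative probability reaches the requested level. The following sketch checks both relationships for the Bi(10, 0.5) example; the variable names are illustrative.

```r
# P(Y <= 3) two ways: from the c.d.f. and by summing the probability function
p_le3 <- pbinom(3, 10, 0.5)
p_sum <- sum(dbinom(0:3, 10, 0.5))

# The .95 quantile is the smallest y with P(Y <= y) >= .95
q95 <- qbinom(0.95, 10, 0.5)
at_q    <- pbinom(q95, 10, 0.5)      # reaches at least .95
below_q <- pbinom(q95 - 1, 10, 0.5)  # still below .95

print(c(p_le3, p_sum))
print(c(q95, at_q, below_q))
```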
To illustrate how to plot the probability function for a random variable, a Bi(10,0.5) random variable is used.
# Assign all possible values of the random variable, X ~ Bi(10, 0.5)
x <- seq(0, 10, by=1)
# Determine the value of the probability function for possible values of X
x.pf <- dbinom(x, 10, 0.5)
# Plot the probability function
barplot(x.pf, xlab="X", ylab="Probability Function",
        names.arg=c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
Loops in R are easy to construct but long loops can be slow and should be avoided where possible. For example
x=0
for (i in 1:10) x <- c(x, i)
can be replaced by
x=c(0:10)
Commonly used functions.
print()     # Prints a single R object
cat()       # Prints multiple objects, one after the other
length()    # Number of elements in a vector or of a list
mean()      # mean of a vector of data
median()    # median of a vector of data
range()     # Range of values of a vector of data
unique()    # Gives the vector of distinct values
diff()      # the vector of first differences so diff(x) has
            # one less element than x
sort()      # Sort elements into order, omitting NAs
order()     # x[order(x)] orders elements of x, with NAs last
cumsum()    # vector of partial or cumulative sums
cumprod()   # vector of partial or cumulative products
rev()       # reverse the order of vector elements
6.8 Problems on Chapter 6
6.1 The following ten observations, taken during the years 1970-79, are on October snow cover for Eurasia. (Snow cover is in millions of square kilometers).
Year   Snow.cover
1970   6.5
1971   12
1972   14.9
1973   10
1974   10.7
1975   7.9
1976   21.9
1977   12.5
1978   14.5
1979   9.2
(a) Enter the data into R. To save keystrokes, enter the successive years as 1970:1979
(b) Plot snow.cover versus year.
(c) Use "hist()" to plot a histogram of the snow cover values.
(d) Repeat b and c after taking logarithms of snow cover.
6.2 Input the following data, on damage that had occurred in space shuttle launches prior to the disastrous Challenger space shuttle launch of Jan 28 1986.
Date       Temperature (F)   Number of Damage Incidents
4/12/81    66                0
11/12/81   70                1
3/22/82    69                0
6/27/82    80                NA
1/11/82    68                0
4/4/83     67                0
6/18/83    72                0
8/30/83    73                0
11/28/83   70                0
2/3/84     57                1
4/6/84     63                1
8/30/84    70                1
10/5/84    78                0
11/8/84    67                0
1/24/85    53                3
4/12/85    67                0
4/29/85    75                0
6/17/85    70                0
7/29/85    81                0
8/27/85    76                0
10/3/85    79                0
10/30/85   75                2
11/26/85   76                0
1/12/86    58                1
This was then followed by the disastrous Challenger incident on 1/28/86.
(a) Enter the temperature data into a data frame, with (for example) column names temperature, damage.
(b) Plot total incidents against temperature. Do you see any relationship? On the date of the Challenger incident the temperature at launch was 31 degrees F. What would you expect for the number of damage incidents?
7. Expected Value and Variance
7.1 Summarizing Data on Random Variables
When we return midterm tests, someone almost always asks what the average was. While we could list out all marks to give a picture of how students performed, this would be tedious. It would also give more detail than could be immediately digested. If we summarize the results by telling a class the average mark, students immediately get a sense of how well the class performed. For this reason, "summary statistics" are often more helpful than giving full details of every outcome.
To illustrate some of the ideas involved, suppose we were to observe cars crossing a toll bridge, and record the number, X, of people in each car. Suppose in a small study [27] data on 25 cars were collected. We could list out all 25 numbers observed, but a more helpful way of presenting the data would be in terms of the frequency distribution below, which gives the number of times (the "frequency") each value of X occurred.
X   Frequency Count   Frequency
1   ||||||            6
2   ||||||||          8
3   |||||             5
4   |||               3
5   ||                2
6   |                 1
We could also draw a frequency histogram of these frequencies:
Figure 7.1: Frequency Histogram
Frequency distributions or histograms are good summaries of data because they show the variability in the observed outcomes very clearly. Sometimes, however, we might prefer a single-number summary. The most common such summary is the average, or arithmetic mean of the outcomes. The mean of n outcomes \(x_1, \ldots, x_n\) for a random variable X is \(\sum_{i=1}^{n} x_i / n\), and is denoted by \(\bar{x}\). The arithmetic mean for the example above can be calculated as
\[
\frac{(6 \times 1) + (8 \times 2) + (5 \times 3) + (3 \times 4) + (2 \times 5) + (1 \times 6)}{25} = \frac{65}{25} = 2.60
\]
[27] "Study without desire spoils the memory, and it retains nothing that it takes in." Leonardo da Vinci
That is, there was an average of 2.6 persons per car. A set of observed outcomes \(x_1, \ldots, x_n\) for a random variable X is termed a sample in probability and statistics. To reflect the fact that this is the average for a particular sample, we refer to it as the sample mean. Unless somebody deliberately "cooked" the study, we would not expect to get precisely the same sample mean if we repeated it another time. Note also that \(\bar{x}\) is not in general an integer, even though X is.
Two other common summary statistics are the median and mode.
Definition 13 The median of a sample is a value such that half the results are below it and half above it, when the results are arranged in numerical order.
If these 25 results were written in order, the 13th outcome would be a 2. So the median is 2. By convention, we go halfway between the middle two values if there are an even number of observations.
Definition 14 The mode of the sample is the value which occurs most often. In this case the mode is 2. There is no guarantee there will be only a single mode.
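The three summary statistics for the toll-bridge sample can be reproduced with a few lines of code. The following is a minimal sketch in Python (the notes themselves use R); the names `frequency` and `sample` are illustrative choices, not part of the original text.

```python
from statistics import mean, median, multimode

# Frequency table from the toll-bridge study: value of X -> number of cars
frequency = {1: 6, 2: 8, 3: 5, 4: 3, 5: 2, 6: 1}

# Expand the table back into the 25 individual observations
sample = [x for x, count in frequency.items() for _ in range(count)]

sample_mean = mean(sample)        # 65 / 25
sample_median = median(sample)    # the 13th of the 25 ordered values
sample_modes = multimode(sample)  # a list, since there may be several modes

print(len(sample), sample_mean, sample_median, sample_modes)
```

`multimode` returns a list because, as noted above, a sample need not have a single mode.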
Exercise: Give a data set with a total of 11 values for which the median < mode < mean.
7.2 Expectation of a Random Variable
The statistics in the preceding section summarize features of a sample of observed X-values. The same idea can be used to summarize the probability distribution of a random variable X. To illustrate, consider the previous example, where X is the number of persons in a randomly selected car crossing a toll bridge.
Note that we can re-arrange the expression used to calculate \(\bar{x}\) for the sample, as
\[
\frac{(6 \times 1) + (8 \times 2) + \cdots + (1 \times 6)}{25}
= (1)\left(\frac{6}{25}\right) + (2)\left(\frac{8}{25}\right) + (3)\left(\frac{5}{25}\right) + (4)\left(\frac{3}{25}\right) + (5)\left(\frac{2}{25}\right) + (6)\left(\frac{1}{25}\right)
= \sum_{x=1}^{6} x \times (\text{fraction of times } x \text{ occurs})
\]
Now suppose we know that the probability function of X is given by
x      1    2    3    4    5    6
f(x)  .30  .25  .20  .15  .09  .01
Using the relative frequency "definition" of probability, if we observed a very large number of cars, the fraction (or relative frequency) of times X = 1 would be .30, for X = 2 this proportion would be .25, etc. So, in theory (according to the probability model), we would expect the mean to be
(1)(.30) + (2)(.25) + (3)(.20) + (4)(.15) + (5)(.09) + (6)(.01) = 2.51
if we observed an infinite number of cars. This "theoretical" mean is usually denoted by \(\mu\) or \(E(X)\), and requires us to know the distribution of X. With this background we make the following mathematical definition.
Definition 15 The expected value (also called the mean or the expectation) of a discrete random variable X with probability function f(x) is
\[
E(X) = \sum_{\text{all } x} x f(x).
\]
The expected value of X is also often denoted by the Greek letter \(\mu\). The expected value [28] of X can be thought of physically as the average of the X-values that would occur in an infinite series of repetitions of the process where X is defined. This value not only describes one aspect of a probability distribution, but is also very important in certain types of applications. For example, if you are playing a casino game in which X represents the amount you win in a single play, then \(E(X)\) represents your average winnings (or losses!) per play.
[28] "Oft expectation fails, and most oft where most it promises; and oft it hits where hope is coldest; and despair most sits." William Shakespeare (1564-1616)
Sometimes we may not be interested in the average value of X itself, but in some function of X. Consider the toll bridge example once again, and suppose there is a toll which depends on the number of car occupants. For example, a toll of $1 per car plus 25 cents per occupant would produce an average toll for the 25 cars in the study of Section 7.1 equal to
\[
(1.25)\left(\frac{6}{25}\right) + (1.50)\left(\frac{8}{25}\right) + (1.75)\left(\frac{5}{25}\right) + (2.00)\left(\frac{3}{25}\right) + (2.25)\left(\frac{2}{25}\right) + (2.50)\left(\frac{1}{25}\right) = \$1.65
\]
If X has the theoretical probability function f(x) given above, then the average value of this $(.25X + 1) toll would be defined in the same way, as
(1.25)(.30) + (1.50)(.25) + (1.75)(.20) + (2.00)(.15) + (2.25)(.09) + (2.50)(.01) = $1.6275
We call this the expected value of (0.25X + 1) and write \(E(0.25X + 1) = 1.6275\).
As a further illustration, suppose a toll designed to encourage car pooling charged \(\$12/x^2\) if there were x people in the car. This scheme would yield an average toll, in theory, of
\[
\left(\frac{12}{1}\right)(.30) + \left(\frac{12}{4}\right)(.25) + \left(\frac{12}{9}\right)(.20) + \left(\frac{12}{16}\right)(.15) + \left(\frac{12}{25}\right)(.09) + \left(\frac{12}{36}\right)(.01) = \$4.7757
\]
that is,
\[
E\left(\frac{12}{X^2}\right) = 4.7757
\]
is the "expected value" of \(12/X^2\).
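The three theoretical averages computed so far for the toll-bridge model (2.51, 1.6275 and 4.7757) all follow the same pattern: multiply a function of x by f(x) and add. A minimal Python sketch of that pattern is given below; the helper name `expected_value` is an illustrative choice, not notation from these notes.

```python
# Theoretical probability function f(x) for the number of people in a car
f = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.09, 6: 0.01}

def expected_value(g, pf):
    """Return the sum of g(x) * f(x) over all x in the probability function pf."""
    return sum(g(x) * p for x, p in pf.items())

mean_occupants = expected_value(lambda x: x, f)          # E(X)
flat_toll = expected_value(lambda x: 0.25 * x + 1, f)    # E(0.25X + 1)
pooling_toll = expected_value(lambda x: 12 / x**2, f)    # E(12 / X^2)

print(round(mean_occupants, 4), round(flat_toll, 4), round(pooling_toll, 4))
```

Passing the function g as an argument keeps one summation routine for every toll scheme, which mirrors how the same formula is reused in the text.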
With this as background, we can now make a formal definition.
Theorem 16 Suppose the random variable X has probability function f(x). Then the expected value of some function g(X) of X is given by
\[
E[g(X)] = \sum_{\text{all } x} g(x) f(x)
\]
Proof. To use Definition 15, we need to determine the expected value of the random variable Y = g(X) by first finding the probability function of Y, say \(f_Y(y) = P(Y = y)\), and then computing
\[
E[g(X)] = E(Y) = \sum_{\text{all } y} y f_Y(y) \tag{7.5}
\]
Notice that if we let \(D_y = \{x;\ g(x) = y\}\) be the set of x values with a given value y for g(x), then
\[
f_Y(y) = P(g(X) = y) = \sum_{x \in D_y} f(x)
\]
Substituting this in (7.5) we obtain
\[
\begin{aligned}
E[g(X)] &= \sum_{\text{all } y} y f_Y(y) \\
&= \sum_{\text{all } y} y \sum_{x \in D_y} f(x) \\
&= \sum_{\text{all } y} \sum_{x \in D_y} g(x) f(x) \\
&= \sum_{\text{all } x} g(x) f(x)
\end{aligned}
\]
Notes:
(1) You can interpret E[g(X)] as the average value of g(X) in an infinite series of repetitions of the process where X is defined.
(2) E[g(X)] is also known as the "expected value" of g(X). This name is somewhat misleading since the average value of g(X) may be a value which g(X) never takes - hence unexpected!
(3) The case where g(x) = x reduces to our earlier definition of E(X).
(4) Confusion sometimes arises because we have two notations for the mean of a probability distribution: \(\mu\) and E(X) mean the same thing. There is a small advantage to using the (lower case) letter \(\mu\). It makes it visually clearer that the expected value is NOT a random variable like X but a non-random constant.
(5) When calculating expectations, look at your answer to be sure it makes sense. If X takes values from 1 to 10, you should know you've made an error if you get E(X) > 10 or E(X) < 1. In physical terms, E(X) is the balance point for the histogram of f(x).
Let us note a couple of mathematical properties of expected value that can help to simplify calculations.
Linearity Properties of Expectation: If your linear algebra is good, it may help if you think of E as being a linear operator, and this may save memorizing these properties.
1. For constants a and b,
\[
E[a g(X) + b] = a E[g(X)] + b
\]
Proof:
\[
\begin{aligned}
E[a g(X) + b] &= \sum_{\text{all } x} [a g(x) + b] f(x) \\
&= \sum_{\text{all } x} [a g(x) f(x) + b f(x)] \\
&= a \sum_{\text{all } x} g(x) f(x) + b \sum_{\text{all } x} f(x) \\
&= a E[g(X)] + b \quad \text{since } \sum_{\text{all } x} f(x) = 1
\end{aligned}
\]
2. Similarly for constants a and b and two functions \(g_1\) and \(g_2\), it is also easy to show
\[
E[a g_1(X) + b g_2(X)] = a E[g_1(X)] + b E[g_2(X)]
\]
Don't let expected value intimidate you. Much of it is common sense. For example, using property 1, if we let a = 0 and b = 13 we obtain E(13) = 13. The expected value of a constant b is, of course, equal to b. The property also implies E(2X) = 2E(X) if we use a = 2, b = 0, and g(X) = X. This is obvious also. Note, however, that for g(x) a nonlinear function, it is NOT generally true that E[g(X)] = g(E(X)); this is a common mistake. (Check this for the example above when \(g(X) = 12/X^2\).)
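As a quick numerical illustration of this warning, take the toll-bridge probability function of Section 7.2, for which E(X) = 2.51 and \(E(12/X^2) = 4.7757\) were computed earlier. Applying g to the mean gives a very different number:
\[
g(E(X)) = \frac{12}{(2.51)^2} = \frac{12}{6.3001} \approx 1.905 \neq 4.7757 = E\left[\frac{12}{X^2}\right]
\]
By contrast, for the linear toll 0.25X + 1 the two agree: \(0.25(2.51) + 1 = 1.6275 = E(0.25X + 1)\), as property 1 guarantees.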
7.3 Some Applications of Expectation
Because expected value is an average value, it is frequently used in problems where costs or profits are connected with the outcomes of a random variable X. It is also used as a summary statistic; for example, one often hears about the expected life (expectation of lifetime) for a person or the expected return on an investment. Be cautious however. The expected value does NOT tell the whole story about a distribution. One investment could have a higher expected value than another but a much larger probability of large losses.
The following are examples.
Example: Expected Winnings in a Lottery. A small lottery [29] sells 1000 tickets numbered 000, 001, . . . , 999; the tickets cost $10 each. When all the tickets have been sold the draw takes place: this consists of a single ticket from 000 to 999 being chosen at random. For ticket holders the prize structure is as follows:
- Your ticket is drawn - win $5000.
- Your ticket has the same first two numbers as the winning ticket, but the third is different - win $100.
- Your ticket has the same first number as the winning ticket, but the second number is different - win $10.
- All other cases - win nothing.
Let the random variable X represent the winnings from a given ticket. Find E(X).
[29] "Here's something to think about: How come you never see a headline like 'Psychic Wins Lottery'?" Jay Leno (1950- )
Solution: The possible values for X are 0, 10, 100, 5000 (dollars). First, we need to find the probability function for X. We find (make sure you can do this) that f(x) = P(X = x) has values
f(0) = 0.900, f(10) = 0.090, f(100) = .009, f(5000) = .001
The expected winnings are thus the expected value of X, or
\[
E(X) = \sum_{\text{all } x} x f(x) = \$6.80
\]
Thus, the gross expected winnings per ticket are $6.80. However, since a ticket costs $10 your expected net winnings are negative, -$3.20 (i.e. an expected loss of $3.20).
Remark: For any lottery or game of chance the expected net winnings per play is a key value. A fair game is one for which this value is 0. Needless to say, casino games and lotteries are never fair: the expected net winnings for a player are always negative.
Remark: The random variable associated with a given problem may be defined in different ways but the expected winnings will remain the same. For example, instead of defining X as the amount won we could have defined X = 0, 1, 2, 3 as follows:
X = 3   all 3 digits of number match winning ticket
X = 2   1st 2 digits (only) match
X = 1   1st digit (but not the 2nd) match
X = 0   1st digit does not match
Now, we would define the function g(x) as the winnings when the outcome X = x occurs. Thus,
g(0) = 0, g(1) = 10, g(2) = 100, g(3) = 5000
The expected winnings are then
\[
E(g(X)) = \sum_{x=0}^{3} g(x) f(x) = \$6.80,
\]
the same as before.
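The probability function in this example can be checked by brute force. The sketch below, in Python, fixes an arbitrary winning number and counts the prize for each of the 1000 possible tickets; by symmetry this gives the same distribution as fixing your ticket and drawing the winner at random. The function name `prize` and the choice of "000" as the winner are assumptions made only for illustration.

```python
from fractions import Fraction

def prize(ticket, winner):
    """Prize in dollars for a 3-digit ticket string, given the winning string."""
    if ticket == winner:
        return 5000
    if ticket[:2] == winner[:2]:   # first two digits match, third differs
        return 100
    if ticket[0] == winner[0]:     # first digit matches, second differs
        return 10
    return 0

winner = "000"  # arbitrary choice; any winning number gives the same answer
tickets = [f"{i:03d}" for i in range(1000)]

# Each of the 1000 tickets is equally likely, so each contributes prize / 1000
expected_gross = sum(Fraction(prize(t, winner), 1000) for t in tickets)
expected_net = expected_gross - 10  # each ticket costs $10

print(float(expected_gross), float(expected_net))
```

Exact fractions are used so that the result can be compared with $6.80 and -$3.20 without rounding error.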
Example: Diagnostic Medical Tests: Often there are cheaper, less accurate tests for diagnosing the presence of some conditions in a person, along with more expensive, accurate tests. Suppose we have two cheap tests and one expensive test, with the following characteristics. All three tests are positive if a person has the condition (there are no "false negatives"), but the cheap tests give "false positives".
Let a person be chosen at random, and let D = {person has the condition}. The three tests are
Test 1: \(P(\text{positive test} \mid \bar{D}) = .05\); test costs $5.00
Test 2: \(P(\text{positive test} \mid \bar{D}) = .03\); test costs $8.00
Test 3: \(P(\text{positive test} \mid \bar{D}) = 0\); test costs $40.00
We want to check a large number of people for the condition, and have to choose among three testing strategies:
(i) Use Test 1, followed by Test 3 if Test 1 is positive [30].
(ii) Use Test 2, followed by Test 3 if Test 2 is positive.
(iii) Use Test 3.
Determine the expected cost per person under each of strategies (i), (ii) and (iii). We will then choose the strategy with the lowest expected cost. It is known that about .001 of the population have the condition (i.e. \(P(D) = .001\), \(P(\bar{D}) = .999\)).
Solution: Define the random variable X as follows (for a random person who is tested):
X = 1 if the initial test is negative
X = 2 if the initial test is positive
[30] Assume that given D or \(\bar{D}\), tests are independent of one another.
Also let g(x) be the total cost of testing the person. The expected cost per person is then
\[
E[g(X)] = \sum_{x=1}^{2} g(x) f(x)
\]
The probability function f(x) for X and function g(x) differ for strategies (i), (ii) and (iii). Consider for example strategy (i). Then
\[
\begin{aligned}
P(X = 2) &= P(\text{initial test positive}) \\
&= P(D) + P(\text{positive} \mid \bar{D}) P(\bar{D}) \\
&= .001 + (.05)(.999) = 0.0510
\end{aligned}
\]
The rest of the probabilities, associated values of g(X) and E[g(X)] are obtained below.
(i) f(2) = 0.0510 (obtained above)
    f(1) = P(X = 1) = 1 - f(2) = 1 - 0.0510 = 0.949
    g(1) = 5, g(2) = 45
    E[g(X)] = 5(.949) + 45(.0510) = $7.04
(ii) f(2) = .001 + (.03)(.999) = .03097
    f(1) = 1 - f(2) = .96903
    g(1) = 8, g(2) = 48
    E[g(X)] = 8(.96903) + 48(.03097) = $9.2388
(iii) f(2) = .001, f(1) = .999
    g(1) = g(2) = 40
    E[g(X)] = $40.00
Thus, it is cheapest to use strategy (i).
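The comparison of the three strategies can be sketched in a few lines of Python. The function name `expected_cost` and its parameters are illustrative assumptions; the probabilities and costs are the ones given in the example.

```python
P_D = 0.001  # P(D): proportion of the population with the condition

def expected_cost(false_positive_rate, first_cost, followup_cost=40.0):
    """Expected cost per person: cheap screening test, then Test 3 if positive."""
    # No false negatives, so everyone with the condition tests positive
    p_positive = P_D + false_positive_rate * (1 - P_D)
    return first_cost * (1 - p_positive) + (first_cost + followup_cost) * p_positive

costs = {
    "(i)": expected_cost(0.05, 5.0),
    "(ii)": expected_cost(0.03, 8.0),
    "(iii)": 40.0,  # Test 3 alone: every person costs $40
}
cheapest = min(costs, key=costs.get)
print({k: round(v, 4) for k, v in costs.items()}, cheapest)
```

Without rounding f(2) to 0.0510 first, strategy (i) comes out as $7.038 rather than $7.04; the ranking of the strategies is unchanged.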
Problem:
7.3.1 A lottery [31] has tickets numbered 000 to 999 which are sold for $1 each. One ticket is selected at random and a prize of $200 is given to any person whose ticket number is a permutation of the selected ticket number. All 1000 tickets are sold. What is the expected profit or loss to the organization running the lottery?
[31] "I've done the calculation and your chances of winning the lottery are identical whether you play or not." Fran Lebowitz (1950- )
7.4 Means and Variances of Distributions
It's useful to know the means, \(\mu = E(X)\), of probability models derived in Chapter 6.
Example: (Expected Value of the binomial distribution) Let \(X \sim Bi(n, p)\). Find E(X).
Solution:
\[
\mu = E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}
\]
When x = 0 the value of the expression is 0. We can therefore begin our sum at x = 1. Provided \(x \neq 0\), we can expand x! as x(x - 1)! (so it is important to eliminate the term when x = 0).
\[
\begin{aligned}
\text{Therefore } \mu &= \sum_{x=1}^{n} \frac{n(n-1)!}{(x-1)!\,[(n-1)-(x-1)]!}\, p\, p^{x-1} (1-p)^{(n-1)-(x-1)} \\
&= np(1-p)^{n-1} \sum_{x=1}^{n} \binom{n-1}{x-1} \left(\frac{p}{1-p}\right)^{x-1}
\end{aligned}
\]
Let y = x - 1 in the sum, to get
\[
\begin{aligned}
\mu &= np(1-p)^{n-1} \sum_{y=0}^{n-1} \binom{n-1}{y} \left(\frac{p}{1-p}\right)^{y} \\
&= np(1-p)^{n-1} \left(1 + \frac{p}{1-p}\right)^{n-1} \quad \text{(binomial theorem)} \\
&= np(1-p)^{n-1} \frac{(1-p+p)^{n-1}}{(1-p)^{n-1}} = np
\end{aligned}
\]
Exercise: Does this result make sense? If you try something 100 times and there is a 20% chance of success each time, how many successes do you expect to get, on average?
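The case mentioned in the exercise (100 trials, 20% chance of success) can be checked numerically by summing x f(x) directly. This is a minimal Python sketch; the function name `binomial_pf` is an illustrative choice.

```python
from math import comb

def binomial_pf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 100, 0.2
total_probability = sum(binomial_pf(x, n, p) for x in range(n + 1))
mean_by_summation = sum(x * binomial_pf(x, n, p) for x in range(n + 1))

print(total_probability, mean_by_summation, n * p)
```

Up to floating-point error the direct summation agrees with np, which is the point of the algebraic derivation above.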
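Example: (Expected value of the Poisson distribution) Let X have a Poisson distribution where \(\lambda\) is the average rate of occurrence and the time interval is of length t. Find \(\mu = E(X)\).
Solution: The probability function of X is \(f(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}\). Then \(\mu = E(X) = \sum_{x=0}^{\infty} x \frac{(\lambda t)^x e^{-\lambda t}}{x!}\).
As in the binomial example, we can eliminate the term when x = 0 and expand x! as x(x - 1)! for \(x = 1, 2, \ldots, \infty\).
\[
\begin{aligned}
\mu &= \sum_{x=1}^{\infty} x \frac{(\lambda t)^x e^{-\lambda t}}{x!} = \sum_{x=1}^{\infty} x \frac{(\lambda t)^x e^{-\lambda t}}{x(x-1)!} \\
&= \sum_{x=1}^{\infty} (\lambda t) e^{-\lambda t} \frac{(\lambda t)^{x-1}}{(x-1)!} \\
&= (\lambda t) e^{-\lambda t} \sum_{x=1}^{\infty} \frac{(\lambda t)^{x-1}}{(x-1)!} \\
&= (\lambda t) e^{-\lambda t} \sum_{y=0}^{\infty} \frac{(\lambda t)^{y}}{y!} \quad \text{letting } y = x - 1 \text{ in the sum} \\
&= (\lambda t) e^{-\lambda t} e^{\lambda t} \quad \text{since } e^{x} = \sum_{y=0}^{\infty} \frac{x^{y}}{y!} \\
&= \lambda t.
\end{aligned}
\]
Note that we used the symbol \(\mu = \lambda t\) earlier in connection with the Poisson model; this was because we knew (but couldn't show until now) that \(E(X) = \mu\).
Exercise: These techniques can also be used to work out the mean for the hypergeometric or negative binomial distributions. Looking back at how we proved that \(\sum f(x) = 1\) shows the same method of summation used to find \(\mu\). However, in Chapter 8 we will give a simpler method of finding the means of these distributions, which are \(E(X) = nr/N\) (hypergeometric) and \(E(X) = k(1-p)/p\) (negative binomial).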
Variability: While an average or expected value is a useful summary of a set of observations, or a probability distribution, it omits another important piece of information, namely the amount of variability. For example, it would be possible for car doors to be the right width, on average, and still have no doors fit properly. In the case of fitting car doors, we would also want the door widths to all be close to this correct average. We give a way of measuring the amount of variability next. You might think we could use the average difference between X and \(\mu\) to indicate the amount of variation. In terms of expectation, this would be \(E(X - \mu)\). However, \(E(X - \mu) = E(X) - \mu\) (since \(\mu\) is a constant) = 0. We soon realize that for a measure of variability, we can use the expected value of a function that has the same sign for \(X > \mu\) and for \(X < \mu\). One might try the expected value of the distance between X and its mean, e.g. \(E(|X - \mu|)\). An alternative, more mathematically tractable version, which squares the distance (much as Euclidean distance in \(\mathbb{R}^n\) involves a sum of squared distances), is the variance.
Definition 17 The variance of a r.v. X is \(E\left[(X - \mu)^2\right]\), and is denoted by \(\sigma^2\) or by Var(X).
In words, the variance is the average square of the distance from the mean. This turns out to be a very useful measure of the variability of X.
Example: Let X be the number of heads when a fair coin is tossed 4 times. Then \(X \sim Bi\left(4, \frac{1}{2}\right)\) so \(\mu = np = (4)\left(\frac{1}{2}\right) = 2\). Find Var(X).
Without doing any calculations we know \(\mathrm{Var}(X) = \sigma^2 \le 4\). This is because X is always between 0 and 4 and so the maximum possible value for \((X - \mu)^2\) is \((4 - 2)^2\) or \((0 - 2)^2\), which is 4. An expected value of a function, say E[g(X)], is always somewhere between the minimum and the maximum value of the function g(x), so in this case \(0 \le \mathrm{Var}(X) \le 4\). The values of f(x) are
x      0     1     2     3     4
f(x)  1/16  4/16  6/16  4/16  1/16
since \(f(x) = \binom{4}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{4-x} = \binom{4}{x}\left(\frac{1}{2}\right)^{4}\).
The value of Var(X) (i.e. \(\sigma^2\)) is easily found here:
\[
\begin{aligned}
\sigma^2 &= E\left[(X - \mu)^2\right] = \sum_{x=0}^{4} (x - \mu)^2 f(x) \\
&= (0-2)^2\left(\frac{1}{16}\right) + (1-2)^2\left(\frac{4}{16}\right) + (2-2)^2\left(\frac{6}{16}\right) + (3-2)^2\left(\frac{4}{16}\right) + (4-2)^2\left(\frac{1}{16}\right) \\
&= 1
\end{aligned}
\]
If we keep track of units of measurement the variance will be in peculiar units; e.g. if X is the number of heads in 4 tosses of a coin, \(\sigma^2\) is in units of heads\(^2\)! We can regain the original units by taking the (positive) \(\sqrt{\text{variance}}\). This is called the standard deviation of X, and is denoted by \(\sigma\), or as SD(X).
Definition 18 The standard deviation of a random variable X is \(\sigma = \sqrt{E\left[(X - \mu)^2\right]}\)
Both variance and standard deviation are commonly used to measure variability.
The basic definition of variance is often awkward to use for mathematical calculation of \(\sigma^2\), whereas the following two results are often useful:
(1) \(\sigma^2 = E(X^2) - \mu^2\)
(2) \(\sigma^2 = E[X(X-1)] + \mu - \mu^2\)
Proof:
(1) Using properties of expected value,
\[
\begin{aligned}
\sigma^2 &= E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right] \\
&= E(X^2) - 2\mu E(X) + \mu^2 \quad (\text{since } \mu \text{ is constant}) \\
&= E(X^2) - 2\mu^2 + \mu^2 \quad (\text{since } E(X) = \mu) \\
&= E(X^2) - \mu^2
\end{aligned}
\]
(2) Since \(X^2 = X(X-1) + X\),
\[
\begin{aligned}
E(X^2) - \mu^2 &= E[X(X-1) + X] - \mu^2 \\
&= E[X(X-1)] + E(X) - \mu^2 \\
&= E[X(X-1)] + \mu - \mu^2
\end{aligned}
\]
Formula (2) is most often used when there is an x! term in the denominator of f(x). Otherwise, formula (1) is generally easier to use.
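The definition of variance and the two shortcut formulas can be compared on the coin-tossing example, where X is the number of heads in 4 tosses of a fair coin. The following Python sketch uses exact fractions so the three answers can be compared without rounding; the helper name `E` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

# X ~ Bi(4, 1/2): number of heads in 4 tosses of a fair coin
f = {x: Fraction(comb(4, x), 16) for x in range(5)}

def E(g):
    """Expected value of g(X) under the probability function f."""
    return sum(g(x) * p for x, p in f.items())

mu = E(lambda x: x)
var_definition = E(lambda x: (x - mu) ** 2)             # E[(X - mu)^2]
var_formula_1 = E(lambda x: x**2) - mu**2               # E(X^2) - mu^2
var_formula_2 = E(lambda x: x * (x - 1)) + mu - mu**2   # E[X(X-1)] + mu - mu^2

print(mu, var_definition, var_formula_1, var_formula_2)
```

All three routes give the same variance, as the proof above says they must.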
Example: (Variance of binomial distribution)
Let \(X \sim Bi(n, p)\). Find Var(X).
Solution: The probability function for the binomial is
\[
f(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}
\]
so we'll use formula (2) above,
\[
E[X(X-1)] = \sum_{x=0}^{n} x(x-1) \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}
\]
If x = 0 or x = 1 the value of the term is 0, so we can begin summing at x = 2. For \(x \neq 0\) or 1, we can expand the x! as x(x - 1)(x - 2)!
\[
\text{Therefore } E[X(X-1)] = \sum_{x=2}^{n} \frac{n!}{(x-2)!(n-x)!} p^x (1-p)^{n-x}
\]
Now re-group to fit the binomial theorem, since that was the summation technique used to show \(\sum f(x) = 1\) and to derive \(\mu = np\).
\[
\begin{aligned}
E[X(X-1)] &= \sum_{x=2}^{n} \frac{n(n-1)(n-2)!}{(x-2)!\,[(n-2)-(x-2)]!}\, p^2 p^{x-2} (1-p)^{(n-2)-(x-2)} \\
&= n(n-1)p^2 (1-p)^{n-2} \sum_{x=2}^{n} \binom{n-2}{x-2} \left(\frac{p}{1-p}\right)^{x-2}
\end{aligned}
\]
Let y = x - 2 in the sum, giving
\[
\begin{aligned}
E[X(X-1)] &= n(n-1)p^2 (1-p)^{n-2} \sum_{y=0}^{n-2} \binom{n-2}{y} \left(\frac{p}{1-p}\right)^{y} \\
&= n(n-1)p^2 (1-p)^{n-2} \left(1 + \frac{p}{1-p}\right)^{n-2} \\
&= n(n-1)p^2 (1-p)^{n-2} \frac{(1-p+p)^{n-2}}{(1-p)^{n-2}} \\
&= n(n-1)p^2
\end{aligned}
\]
Then
\[
\begin{aligned}
\sigma^2 &= E[X(X-1)] + \mu - \mu^2 \\
&= n(n-1)p^2 + np - (np)^2 \\
&= n^2p^2 - np^2 + np - n^2p^2 = np(1-p)
\end{aligned}
\]
Remember that the variance of a binomial distribution is np(1 - p), since we'll be using it later in the course.
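Example: (Variance of Poisson distribution) Find the variance of the Poisson distribution.
Solution: The probability function of the Poisson is
\[
f(x) = \frac{\mu^x e^{-\mu}}{x!}
\]
from which we obtain
\[
\begin{aligned}
E[X(X-1)] &= \sum_{x=0}^{\infty} x(x-1) \frac{\mu^x e^{-\mu}}{x!} \\
&= \sum_{x=2}^{\infty} x(x-1) \frac{\mu^x e^{-\mu}}{x(x-1)(x-2)!}, \quad \text{setting the lower limit to 2 and expanding } x! \\
&= \mu^2 e^{-\mu} \sum_{x=2}^{\infty} \frac{\mu^{x-2}}{(x-2)!}
\end{aligned}
\]
Let y = x - 2 in the sum, giving
\[
E[X(X-1)] = \mu^2 e^{-\mu} \sum_{y=0}^{\infty} \frac{\mu^{y}}{y!} = \mu^2 e^{-\mu} e^{\mu} = \mu^2
\]
so
\[
\sigma^2 = E[X(X-1)] + \mu - \mu^2 = \mu^2 + \mu - \mu^2 = \mu
\]
(For the Poisson distribution, the variance equals the mean.)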
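The statement that the Poisson variance equals its mean can be checked numerically by truncating the infinite sums. The Python sketch below assumes a parameter value of 3.5 and a cutoff of 100 terms, both chosen only for illustration; the function name `poisson_pf_values` is likewise hypothetical.

```python
from math import exp

def poisson_pf_values(mu, terms):
    """Return [f(0), ..., f(terms - 1)] for a Poisson(mu) probability function."""
    values = [exp(-mu)]
    for x in range(1, terms):
        values.append(values[-1] * mu / x)  # f(x) = f(x - 1) * mu / x
    return values

mu = 3.5  # hypothetical parameter value, for illustration only
f = poisson_pf_values(mu, terms=100)  # the tail beyond x = 99 is negligible here

mean = sum(x * p for x, p in enumerate(f))
factorial_moment = sum(x * (x - 1) * p for x, p in enumerate(f))  # E[X(X-1)]
variance = factorial_moment + mean - mean**2                      # formula (2)

print(mean, factorial_moment, variance)
```

The recurrence f(x) = f(x - 1) μ / x avoids computing large factorials directly.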
Properties of Mean and Variance
If a and b are constants and Y = aX + b, then
\[
\mu_Y = a\mu_X + b \quad \text{and} \quad \sigma_Y^2 = a^2 \sigma_X^2
\]
(where \(\mu_X\) and \(\sigma_X^2\) are the mean and variance of X and \(\mu_Y\) and \(\sigma_Y^2\) are the mean and variance of Y).
Proof:
We already showed that E(aX + b) = aE(X) + b, i.e. \(\mu_Y = a\mu_X + b\), and then
\[
\begin{aligned}
\sigma_Y^2 &= E\left[(Y - \mu_Y)^2\right] = E\left\{[(aX + b) - (a\mu_X + b)]^2\right\} \\
&= E\left[(aX - a\mu_X)^2\right] = E\left[a^2 (X - \mu_X)^2\right] \\
&= a^2 E\left[(X - \mu_X)^2\right] = a^2 \sigma_X^2
\end{aligned}
\]
This result is to be expected. Adding a constant, b, to all values of X has no effect on the amount of variability. So it makes sense that Var(aX + b) doesn't depend on the value of b. Also since variance is in squared units, multiplication by a constant results in multiplying the variance by the constant squared. A simple way to relate to this result is to consider a random variable X which represents a temperature in degrees Celsius (even though this is a continuous random variable which we don't study until Chapter 9). Now let Y be the corresponding temperature in degrees Fahrenheit. We know that
\[
Y = \frac{9}{5} X + 32
\]
and it is clear if we think about it that \(\mu_Y = \left(\frac{9}{5}\right)\mu_X + 32\) and that \(\sigma_Y^2 = \left(\frac{9}{5}\right)^2 \sigma_X^2\).
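The Celsius-to-Fahrenheit relationship can be tried on a small made-up distribution. In the Python sketch below the three-point distribution `f_X` is purely hypothetical and serves only to illustrate the two properties; `mean_and_variance` is an illustrative helper name.

```python
from fractions import Fraction

def mean_and_variance(pf):
    """Return (mean, variance) of a discrete probability function {value: prob}."""
    mu = sum(x * p for x, p in pf.items())
    var = sum((x - mu) ** 2 * p for x, p in pf.items())
    return mu, var

# Hypothetical distribution of a Celsius temperature X, for illustration only
f_X = {10: Fraction(1, 4), 20: Fraction(1, 2), 30: Fraction(1, 4)}

a, b = Fraction(9, 5), 32
# Y = aX + b takes the value a*x + b with the same probability that X takes x
f_Y = {a * x + b: p for x, p in f_X.items()}

mu_X, var_X = mean_and_variance(f_X)
mu_Y, var_Y = mean_and_variance(f_Y)
print(mu_X, var_X, mu_Y, var_Y)
```

Here the mean shifts from 20 to 68 while the variance is scaled by (9/5)^2 and is unaffected by the added 32.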
Problems:
7.4.1 An airline knows that there is a 97% chance a passenger for a certain flight will show up, and assumes passengers arrive independently of each other. Tickets cost $100, but if a passenger shows up and can't be carried on the flight the airline has to refund the $100 and pay a penalty of $400 to each such passenger. If the passenger does not show up, the airline must fully refund the price of the ticket. How many tickets should they sell for a plane with 120 seats to maximize their expected ticket revenues after paying any penalty charges? Assume ticket holders who don't show up get a full refund for their unused ticket.
7.4.2 A typist typing at a constant speed of 60 words per minute makes a mistake in any particular word with probability .04, independently from word to word. Each incorrect word must be corrected, a task which takes 15 seconds per word.
(a) Find the mean and variance of the time (in seconds) taken to finish a 450 word passage.
(b) Would it be less time consuming, on average, to type at 45 words per minute if this reduces the probability of an error to .02?
7.5 Moment Generating Functions [32]
We have now seen two functions which characterize a distribution, the probability function and the cumulative distribution function. There is a third type of function, the moment generating function, which uniquely determines a distribution. The moment generating function is closely related to other transforms used in mathematics, the Laplace and Fourier transforms.
Definition 19 Consider a discrete random variable X with probability function f(x). The moment generating function (m.g.f.) of X is defined as
\[
M(t) = E(e^{tX}) = \sum_{x} e^{tx} f(x).
\]
We will assume that the moment generating function is defined and finite for values of t in an interval around 0 (i.e. for some \(a > 0\), \(\sum_x e^{tx} f(x) < \infty\) for all \(t \in [-a, a]\)).
The moments of a random variable X are the expectations of the functions \(X^r\) for r = 1, 2, . . . . The expected value \(E(X^r)\) is called the r-th moment of X. The mean \(\mu = E(X)\) is therefore the first moment, \(E(X^2)\) the second and so on. It is often easy to find the moments of a probability distribution mathematically by using the moment generating function. This often gives easier derivations of means and variances than the direct summation methods in the preceding section. The following theorem gives a useful property of m.g.f.'s.
Theorem 20 Let the random variable X have m.g.f. M(t). Then
\[
E(X^r) = M^{(r)}(0), \quad r = 1, 2, \ldots
\]
where \(M^{(r)}(0)\) stands for \(d^r M(t)/dt^r\) evaluated at t = 0.
Proof:
\(M(t) = \sum_x e^{tx} f(x)\) and if the sum converges, then
\[
\begin{aligned}
M^{(r)}(t) &= \frac{d^r}{dt^r} \sum_x e^{tx} f(x) \\
&= \sum_x \frac{\partial^r}{\partial t^r}\left(e^{tx}\right) f(x) \\
&= \sum_x x^r e^{tx} f(x)
\end{aligned}
\]
Therefore \(M^{(r)}(0) = \sum_x x^r f(x) = E(X^r)\), as stated.
[32] This section optional for stat 220
This sometimes gives a simple way to find the moments for a distribution.
Example 1. Suppose X has a Binomial(n, p) distribution. Then its moment generating function is
\[
\begin{aligned}
M(t) &= \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} \\
&= \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} \\
&= (pe^t + 1 - p)^n
\end{aligned}
\]
Therefore
\[
\begin{aligned}
M'(t) &= npe^t (pe^t + 1 - p)^{n-1} \\
M''(t) &= npe^t (pe^t + 1 - p)^{n-1} + n(n-1)p^2 e^{2t} (pe^t + 1 - p)^{n-2}
\end{aligned}
\]
and so
\[
\begin{aligned}
E(X) &= M'(0) = np, \\
E(X^2) &= M''(0) = np + n(n-1)p^2 \\
\mathrm{Var}(X) &= E(X^2) - E(X)^2 = np(1-p)
\end{aligned}
\]
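The relationship between derivatives of the m.g.f. at 0 and the moments can be illustrated numerically. The Python sketch below approximates M'(0) and M''(0) for the binomial m.g.f. by central differences instead of differentiating symbolically; the step size h and the values n = 10, p = 0.3 are assumptions chosen only for illustration.

```python
from math import exp

def binomial_mgf(t, n, p):
    """M(t) = (p e^t + 1 - p)^n for X ~ Binomial(n, p)."""
    return (p * exp(t) + 1 - p) ** n

def first_two_derivatives_at_zero(M, h=1e-4):
    """Central-difference approximations to M'(0) and M''(0)."""
    d1 = (M(h) - M(-h)) / (2 * h)
    d2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2
    return d1, d2

n, p = 10, 0.3  # hypothetical values, for illustration only
d1, d2 = first_two_derivatives_at_zero(lambda t: binomial_mgf(t, n, p))

mean = d1              # approximates E(X) = np
variance = d2 - d1**2  # approximates E(X^2) - E(X)^2 = np(1 - p)
print(mean, variance)
```

The finite-difference values are approximations only; they agree with np = 3 and np(1 - p) = 2.1 to several decimal places.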
Exercise. Poisson distribution
Show that the Poisson distribution with probability function
\[
f(x) = e^{-\mu} \mu^x / x!, \quad x = 0, 1, 2, \ldots
\]
has m.g.f. \(M(t) = e^{-\mu + \mu e^t}\). Then show that \(E(X) = \mu\) and \(\mathrm{Var}(X) = \mu\).
The m.g.f. also uniquely identifies a distribution in the sense that two different distributions cannot have the same m.g.f. This result is often used to find the distribution of a random variable. For example if I can show somehow that the moment generating function of a random variable X is
\[
e^{2(e^t - 1)}
\]
then I know, from the above exercise, that the random variable must have a Poisson(2) distribution.
Moment generating functions are often used to identify a given distribution. If two random variables have the same moment generating function, they have the same distribution (so the same probability function, cumulative distribution function, moments, etc.). Of course the moment generating functions must match for all values of t; in other words they agree as functions, not just at a few points. Moment generating functions can also be used to determine that a sequence of distributions gets closer and closer to some limiting distribution. To show this (albeit a bit loosely), suppose that a sequence of probability functions \(f_n(x)\) have corresponding moment generating functions
\[
M_n(t) = \sum_x e^{tx} f_n(x)
\]
Suppose moreover that the probability functions \(f_n(x)\) converge to another probability function f(x) pointwise in x as \(n \to \infty\). This is what we mean by convergence of discrete distributions. Then since
\[
f_n(x) \to f(x) \text{ as } n \to \infty \text{ for each } x, \tag{7.6}
\]
\[
\sum_x e^{tx} f_n(x) \to \sum_x e^{tx} f(x) \text{ as } n \to \infty \text{ for each } t \tag{7.7}
\]
which says that \(M_n(t)\) converges to M(t), the moment generating function of the limiting distribution. It shouldn't be too surprising that a very useful converse to this result also holds. (This is strictly an aside and may be of interest only to those with a thing for infinite series, but is it always true that because the individual terms in a series converge as in (7.6), the sum of the series also converges as in (7.7)?)
Suppose conversely that \(X_n\) has moment generating function \(M_n(t)\) and \(M_n(t) \to M(t)\) for each t such that \(M(t) < \infty\). For example we saw in Chapter 6 that a Binomial(n, p) distribution with very large n and very small p is close to a Poisson distribution with parameter \(\mu = np\). Consider the moment generating function of such a binomial random variable
\[
M(t) = (pe^t + 1 - p)^n = \{1 + p(e^t - 1)\}^n = \left\{1 + \frac{\mu}{n}(e^t - 1)\right\}^n \tag{7.8}
\]
Now take the limit of this expression as \(n \to \infty\). Since in general
\[
\left(1 + \frac{c}{n}\right)^n \to e^c
\]
the limit of (7.8) as \(n \to \infty\) is
\[
e^{\mu(e^t - 1)} = e^{-\mu + \mu e^t}
\]
and this is the moment generating function of a Poisson distribution with parameter \(\mu\). This shows a little more formally than we did earlier that the binomial(n, p) distribution with (small) \(p = \mu/n\) approaches the Poisson(\(\mu\)) distribution as \(n \to \infty\).
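The limit in (7.8) can be watched numerically. The Python sketch below evaluates the binomial m.g.f. with p = μ/n for increasing n and compares it with the Poisson m.g.f. at one value of t; the values μ = 2 and t = 0.5 are assumptions chosen only for illustration.

```python
from math import exp

def binomial_mgf(t, n, p):
    """M(t) = (p e^t + 1 - p)^n for X ~ Binomial(n, p)."""
    return (p * exp(t) + 1 - p) ** n

def poisson_mgf(t, mu):
    """M(t) = exp(mu (e^t - 1)) for X ~ Poisson(mu)."""
    return exp(mu * (exp(t) - 1))

mu, t = 2.0, 0.5  # hypothetical values, for illustration only
limit = poisson_mgf(t, mu)

# Hold mu = n p fixed while n grows, so p = mu / n shrinks
gaps = [abs(binomial_mgf(t, n, mu / n) - limit) for n in (10, 100, 1000, 10000)]
print(limit, gaps)
```

The gap shrinks as n grows, which is the numerical counterpart of the limit derived above; a check at a single t is of course only an illustration, since the m.g.f.s must agree for all t.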
7.6 Problems on Chapter 7
7.1 Let X have probability function
\[
f(x) = \begin{cases} \dfrac{1}{2x} & \text{for } x = 2, 3, 4, 5, \text{ or } 6 \\[4pt] 11/40 & \text{for } x = 1 \end{cases}
\]
Find the mean and variance for X.
7.2 A game is played where a fair coin is tossed until the first tail occurs. The probability x tosses will be needed is \(f(x) = .5^x\); x = 1, 2, 3, . . . . You win \(\$2^x\) if x tosses are needed for x = 1, 2, 3, 4, 5 but lose $256 if x > 5. Determine your expected winnings.
7.3 Diagnostic tests. Consider diagnostic tests like those discussed above in the example of Section 7.3 and in Problem 15 for Chapter 4. Assume that for a randomly selected person, \(P(D) = .02\), \(P(\text{positive test} \mid D) = 1\), \(P(\text{positive test} \mid \bar{D}) = .05\), so that the inexpensive test only gives false positive, and not false negative, results.
Suppose that this inexpensive test costs $10. If a person tests positive then they are also given a more expensive test, costing $100, which correctly identifies all persons with the disease. What is the expected cost per person if a population is tested for the disease using the inexpensive test followed, if necessary, by the expensive test?
7.4 Diagnostic tests II. Two percent of the population has a certain condition for which there are two diagnostic tests. Test A, which costs $1 per person, gives positive results for 80% of persons with the condition and for 5% of persons without the condition. Test B, which costs $100 per person, gives positive results for all persons with the condition and negative results for all persons without it.
(a) Suppose that test B is given to 150 persons, at a cost of $15,000. How many cases of the condition would one expect to detect?
(b) Suppose that 2000 persons are given test A, and then only those who test positive are given test B. Show that the expected cost is $15,000 but that the expected number of cases detected is much larger than in part (a).
7.5 The probability that a roulette wheel stops on a red number is 18/37. For each bet on "red" you are returned twice your bet (including your bet) if the wheel stops on a red number, and lose your money if it does not.
(a) If you bet $1 on each of 10 consecutive plays, what is your expected winnings? What is your expected winnings if you bet $10 on a single play?
(b) For each of the two cases in part (a), calculate the probability that you made a profit (that is, your "winnings" are positive, not negative).
7.6 Slot machines. Consider the slot machine discussed above in Problem 16 for Chapter 4. Suppose that the number of each type of symbol on wheels 1, 2 and 3 is as given below:
            Wheel
Symbols   1   2   3
Flower    2   6   2
Dog       4   3   3
House     4   1   5
If all three wheels stop on a flower, you win $20 for a $1 bet. If all three wheels stop on a dog, you win $10, and if all three stop on a house, you win $5. Otherwise you win nothing.
Find your expected winnings per dollar spent.
7.7 Suppose that n people take a blood test for a disease, where each person has probability p of having the disease, independent of other persons. To save time and money, blood samples from k people are pooled and analyzed together. If none of the k persons has the disease then the test will be negative, but otherwise it will be positive. If the pooled test is positive then each of the k persons is tested separately (so k + 1 tests are done in that case).
(a) Let X be the number of tests required for a group of k people. Show that
\[
E(X) = k + 1 - k(1-p)^k.
\]
(b) What is the expected number of tests required for n/k groups of k people each? If p = .01, evaluate this for the cases k = 1, 5, 10.
(c) Show that if p is small, the expected number of tests in part (b) is approximately \(n(kp + k^{-1})\), and is minimized for \(k \doteq p^{-1/2}\).
7.8 A manufacturer of car radios ships them to retailers in cartons of n radios. The profit per radio is $59.50, less shipping cost of $25 per carton, so the profit is $(59.5n - 25) per carton. To promote sales by assuring high quality, the manufacturer promises to pay the retailer \(\$200X^2\) if X radios in the carton are defective. (The retailer is then responsible for repairing any defective radios.) Suppose radios are produced independently and that 5% of radios are defective. How many radios should be packed per carton to maximize expected net profit per carton?
7.9 Let X have a geometric distribution with probability function
\[
f(x) = p(1-p)^x; \quad x = 0, 1, 2, \ldots
\]
(a) Calculate the m.g.f. \(M(t) = E\left(e^{tX}\right)\), where t is a parameter.
(b) Find the mean and variance of X.
(c) Use your result in (b) to show that if p is the probability of "success" (S) in a sequence of Bernoulli trials, then the expected number of trials until the first S occurs is 1/p. Explain why this is "obvious".
7.10 Analysis of Algorithms: Quicksort. Suppose we have a set $S$ of distinct numbers and we wish to sort them from smallest to largest. The quicksort algorithm works as follows: When $n = 2$ it just compares the numbers and puts the smallest one first. For $n > 2$ it starts by choosing a random "pivot" number from the $n$ numbers. It then compares each of the other $n - 1$ numbers with the pivot and divides them into groups $S_1$ (numbers smaller than the pivot) and $\bar{S}_1$ (numbers bigger than the pivot). It then does the same thing with $S_1$ and $\bar{S}_1$ as it did with $S$, and repeats this recursively until the numbers are all sorted. (Try this out with, say $n = 10$ numbers to see how it works.) In computer science it is common to analyze such algorithms by finding the expected number of comparisons (or other operations) needed to sort a list. Thus, let
$$C_n = \text{expected number of comparisons for lists of length } n$$
(a) Show that if $X$ is the number of comparisons needed,
$$C_n = \sum_{i=1}^{n} E(X \mid \text{initial pivot is $i$th smallest number}) \left(\frac{1}{n}\right)$$
(b) Show that
$$E(X \mid \text{initial pivot is $i$th smallest number}) = n - 1 + C_{i-1} + C_{n-i}$$
and thus that $C_n$ satisfies the recursion (note $C_0 = C_1 = 0$)
$$C_n = n - 1 + \frac{2}{n}\sum_{k=1}^{n-1} C_k, \quad n = 2, 3, \ldots$$
(c) Show that
$$(n + 1)C_{n+1} = 2n + (n + 2)C_n, \quad n = 1, 2, \ldots$$
(d) (Harder) Use the result of part (c) to show that for large $n$,
$$\frac{C_{n+1}}{n + 1} \sim 2\log(n + 1)$$
(Note: $a_n \sim b_n$ means $a_n/b_n \to 1$ as $n \to \infty$.) This proves a result from computer science which says that for Quicksort, $C_n \sim O(n \log n)$.
7.11 Find the distributions that correspond to the following moment-generating functions:
(a) $M(t) = \dfrac{1}{3e^{-t} - 2}$, for $t < \ln(3/2)$
(b) $M(t) = e^{2(e^{t} - 1)}$, for $t < \infty$
7.12 Find the moment generating function of the discrete uniform distribution $X$ on $\{a, a + 1, \ldots, b\}$;
$$P(X = x) = \frac{1}{b - a + 1}, \text{ for } x = a, a + 1, \ldots, b.$$
What do you get in the special case $a = b$ and in the case $b = a + 1$? Use the moment generating function in these two cases to confirm the expected value and the variance of $X$.
7.13 Let $X$ be a random variable taking values in the set $\{0, 1, 2\}$ with moments $E(X) = 1$, $E(X^2) = 3/2$.
(a) Find the moment generating function of $X$
(b) Find the first six moments of $X$
(c) Find $P(X = i)$, $i = 0, 1, 2$.
(d) Show that any probability distribution on $\{0, 1, 2\}$ is completely determined by its first two moments.
7.14 Assume that each week a stock either increases in value by \$1 with probability $\frac{1}{2}$ or decreases by \$1, these moves independent of the past. The current price of the stock is \$50. I wish to purchase a call option which allows me (if I wish to do so) the option of buying the stock 13 weeks from now at a "strike price" of \$55. Of course if the stock price at that time is \$55 or less there is no benefit to the option and it is not exercised. Assume that the return from the option is
$$g(S_{13}) = \max(S_{13} - 55, 0)$$
where $S_{13}$ is the price of the stock in 13 weeks. What is the fair price of the option today assuming no transaction costs and 0% interest; i.e. what is $E[g(S_{13})]$?
7.15 Challenge problem: Let $X_n$ be the number of ascents in a random permutation of the integers $\{1, 2, \ldots, n\}$. For example, the number of ascents in the permutation 213546 is three, since 2, 135, 46 form ascending sequences.
(a) Show that the following recursion holds for the probabilities $p_n(k) = P[X_n = k]$:
$$p_n(k) = \frac{k + 1}{n}\,p_{n-1}(k) + \frac{n - k}{n}\,p_{n-1}(k - 1)$$
(b) Cards numbered $1, 2, \ldots, n$ are shuffled, drawn and put into a pile as long as the card drawn has a number lower than its predecessor. A new pile is started whenever a higher card is drawn. Show that the distribution of the number of piles that we end with is that of $1 + X_n$ and that the expected number of piles is $\frac{n+1}{2}$.
7.16 Yasmin and Zack are undergraduate mathematics students currently taking the same 5 courses. Let $X$ be the number of assignments they have in one week. The probability function of $X$ is given as follows.
x       0      1      2      3      4      5
f(x)    0.09   0.10   0.25   0.40   0.15   0.01
The number of cups of coffee Yasmin and Zack drink in one week both depend on the number of assignments they have. Yasmin drinks about $2X^2$ cups per week and Zack drinks about $|2X - 1|$ cups per week. On average how many cups of coffee will Yasmin and Zack drink in a week?
7.17 A contestant on a game show has two questions, one from category A and one from category B. She may choose which category to attempt first but she must answer the first question correctly to be able to attempt the remaining question. If she answers A correctly she receives \$100 and if she answers B correctly she receives \$200. She knows the answer to A with probability 0.8 and the answer to B with probability 0.6. (Assume independence in knowing the answers to the 2 questions.)
(a) Which question should she attempt first to maximize her winnings?
(b) Suppose that she must now pay a \$50 penalty if she gets the first question wrong. What question should she attempt first?
7.18 On Halloween trick-or-treaters arrive at a house from 5:30 pm until 9 pm according to a Poisson Process with an average of 12 kids per hour.
(a) What is the probability that between 5 and 7 kids (inclusive) arrive in the first half hour?
(b) How many kids would be expected to arrive over the whole evening?
(c) What number of trick-or-treaters is most likely to arrive?
(d) What is the variance of the number of kids that arrive over the whole evening?
8. Discrete Multivariate Distributions
8.1 Basic Terminology and Techniques
Many problems involve more than a single random variable. When there are multiple random variables associated with an experiment or process we usually denote them as $X, Y, \ldots$ or as $X_1, X_2, \ldots$. For example, your final mark in a course might involve $X_1$ = your assignment mark, $X_2$ = your midterm test mark, and $X_3$ = your exam mark. We need to extend the ideas introduced for single variables to deal with multivariate problems. Here we only consider discrete multivariate problems, though continuous multivariate variables are also common in daily life (e.g. consider a person's height $X$ and weight $Y$, or $X_1$ = the return from Stock 1, $X_2$ = return from Stock 2). To introduce the ideas in a simple setting, we'll first consider an example in which there are only a few possible values of the variables. Later we'll apply these concepts to more complex examples. The ideas themselves are simple even though some applications can involve fairly messy algebra.
Joint Probability Functions:
First, suppose there are two random variables $X$ and $Y$, and define the function
$$f(x, y) = P(X = x \text{ and } Y = y) = P(X = x, Y = y).$$
We call $f(x, y)$ the joint probability function of $(X, Y)$. In general,
$$f(x_1, x_2, \ldots, x_n) = P(X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \ldots \text{ and } X_n = x_n)$$
if there are $n$ random variables $X_1, \ldots, X_n$.
The properties of a joint probability function are similar to those for a single variable; for two random variables we have $f(x, y) \geq 0$ for all $(x, y)$ and
$$\sum_{\text{all } (x,y)} f(x, y) = 1.$$
Example: Consider the following numerical example, where we show $f(x, y)$ in a table.
                  x
f(x, y)      0     1     2
y     1     .1    .2    .3
      2     .2    .1    .1
For example $f(0, 2) = P(X = 0 \text{ and } Y = 2) = 0.2$. We can check that $f(x, y)$ is a proper joint probability function since $f(x, y) \geq 0$ for all 6 combinations of $(x, y)$ and the sum of these 6 probabilities is 1. When there are only a few values for $X$ and $Y$ it is often easier to tabulate $f(x, y)$ than to find a formula for it. We'll use this example below to illustrate other definitions for multivariate distributions, but first we give a short example where we need to find $f(x, y)$.
Example: Suppose a fair coin is tossed 3 times. Define the random variables $X$ = number of Heads and $Y = 1\,(0)$ if $H\,(T)$ occurs on the first toss. Find the joint probability function for $(X, Y)$.
Solution: First we should note the range for $(X, Y)$, which is the set of possible values $(x, y)$ which can occur. Clearly $X$ can be 0, 1, 2, or 3 and $Y$ can be 0 or 1, but we'll see that not all 8 combinations $(x, y)$ are possible.
We can find $f(x, y) = P(X = x, Y = y)$ by just writing down the sample space $S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$ that we have used before for this process. Then simple counting gives $f(x, y)$ as shown in the following table:
                  x
f(x, y)      0      1      2      3
y     0     1/8    2/8    1/8    0
      1     0      1/8    2/8    1/8
For example, $(X, Y) = (0, 0)$ if and only if the outcome is $TTT$; $(X, Y) = (1, 0)$ iff the outcome is either $THT$ or $TTH$.
Note that the range or joint p.f. for $(X, Y)$ is a little awkward to write down here in formulas, so we just use the table.
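The counting argument in the coin example can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming the 8 outcomes are equally likely as stated; the variable names (`outcomes`, `counts`, `joint`) are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

counts = Counter()
for toss in outcomes:
    x = toss.count("H")                 # X = number of heads
    y = 1 if toss[0] == "H" else 0      # Y = 1 if the first toss is a head, else 0
    counts[(x, y)] += 1

# Joint probability function f(x, y); pairs that never occur are simply absent.
joint = {xy: Fraction(c, len(outcomes)) for xy, c in counts.items()}

for y in (0, 1):
    print(y, [str(joint.get((x, y), Fraction(0))) for x in range(4)])
```

Note that `Fraction` prints in lowest terms, so 2/8 appears as 1/4.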
Marginal Distributions: We may be given a joint probability function involving more variables than we're interested in using. How can we eliminate any which are not of interest? Look at the first example above. If we're only interested in $X$, and don't care what value $Y$ takes, we can see that
$$P(X = 0) = P(X = 0, Y = 1) + P(X = 0, Y = 2),$$
so $P(X = 0) = f(0, 1) + f(0, 2) = 0.3$. Similarly
$$P(X = 1) = f(1, 1) + f(1, 2) = .3 \quad \text{and}$$
$$P(X = 2) = f(2, 1) + f(2, 2) = .4$$
The distribution of $X$ obtained in this way from the joint distribution is called the marginal probability function of $X$:
x       0     1     2
f(x)    .3    .3    .4
In the same way, if we were only interested in $Y$, we obtain
$$P(Y = 1) = f(0, 1) + f(1, 1) + f(2, 1) = .6$$
since $X$ can be 0, 1, or 2 when $Y = 1$. The marginal probability function of $Y$ would be:
y       1     2
f(y)    .6    .4
Our notation for marginal probability functions is still inadequate. What is $f(1)$? As soon as we substitute a number for $x$ or $y$, we don't know which variable we're referring to. For this reason, we generally put a subscript on the $f$ to indicate whether it is the marginal probability function for the first or second variable. So $f_1(1)$ would be $P(X = 1) = .3$, while $f_2(1)$ would be $P(Y = 1) = 0.6$. An alternative notation that you may see is $f_X(x)$ and $f_Y(y)$.
In general, to find $f_1(x)$ we add over all values of $y$ where $X = x$, and to find $f_2(y)$ we add over all values of $x$ with $Y = y$. Then
$$f_1(x) = \sum_{\text{all } y} f(x, y) \quad \text{and} \quad f_2(y) = \sum_{\text{all } x} f(x, y).$$
This reasoning can be extended beyond two variables. For example, with three variables $(X_1, X_2, X_3)$,
$$f_1(x_1) = \sum_{\text{all } (x_2, x_3)} f(x_1, x_2, x_3) \quad \text{and}$$
$$f_{1,3}(x_1, x_3) = \sum_{\text{all } x_2} f(x_1, x_2, x_3) = P(X_1 = x_1, X_3 = x_3)$$
where $f_{1,3}(x_1, x_3)$ is the marginal joint distribution of $(X_1, X_3)$.
Independent Random Variables:
For events $A$ and $B$, we have defined $A$ and $B$ to be independent if and only if $P(AB) = P(A)P(B)$. This definition can be extended to random variables $(X, Y)$: two random variables are independent if their joint probability function is the product of the marginal probability functions.
Definition 21 $X$ and $Y$ are independent random variables iff $f(x, y) = f_1(x) f_2(y)$ for all values $(x, y)$
Definition 22 In general, $X_1, X_2, \ldots, X_n$ are independent random variables iff
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n) \text{ for all } x_1, x_2, \ldots, x_n$$
In our first example $X$ and $Y$ are not independent since $f_1(x) f_2(y) \neq f(x, y)$ for any of the 6 combinations of $(x, y)$ values; e.g., $f(1, 1) = .2$ but $f_1(1) f_2(1) = (0.3)(0.6) \neq 0.2$. Be careful applying this definition. You can only conclude that $X$ and $Y$ are independent after checking all $(x, y)$ combinations. Even a single case where $f_1(x) f_2(y) \neq f(x, y)$ makes $X$ and $Y$ dependent.
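Checking every $(x, y)$ combination is tedious by hand but trivial in code. The sketch below tests Definition 21 on the first numerical example and, for contrast, on a table built as a product of marginals; `are_independent` is a hypothetical helper name.

```python
from fractions import Fraction as F
from itertools import product

joint = {
    (0, 1): F(1, 10), (1, 1): F(2, 10), (2, 1): F(3, 10),
    (0, 2): F(2, 10), (1, 2): F(1, 10), (2, 2): F(1, 10),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
f1 = {x: sum(joint.get((x, y), F(0)) for y in ys) for x in xs}
f2 = {y: sum(joint.get((x, y), F(0)) for x in xs) for y in ys}

def are_independent(joint_pf, f1, f2):
    """True only if f(x, y) = f1(x) f2(y) for every (x, y) combination."""
    return all(
        joint_pf.get((x, y), F(0)) == f1[x] * f2[y] for x, y in product(f1, f2)
    )

# Combinations where the product rule fails in the first example.
mismatches = [(x, y) for x, y in product(xs, ys) if joint[(x, y)] != f1[x] * f2[y]]

# A table constructed as a product of marginals is independent by design.
product_table = {(x, y): f1[x] * f2[y] for x, y in product(xs, ys)}

print(are_independent(joint, f1, f2), len(mismatches))
print(are_independent(product_table, f1, f2))
```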
Conditional Probability Functions:
Again we can extend a definition from events to random variables. For events $A$ and $B$, recall that $P(A|B) = \frac{P(AB)}{P(B)}$. Since $P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y)$, we make the following definition.
Definition 23 The conditional probability function of $X$ given $Y = y$ is $f(x|y) = \frac{f(x,y)}{f_2(y)}$.
Similarly, $f(y|x) = \frac{f(x,y)}{f_1(x)}$ (provided, of course, the denominator is not zero).
In our first example let us find $f(x | Y = 1)$.
$$f(x | Y = 1) = \frac{f(x, 1)}{f_2(1)}.$$
This gives:
x               0               1               2
f(x | Y = 1)    .1/.6 = 1/6     .2/.6 = 1/3     .3/.6 = 1/2
As you would expect, marginal and conditional probability functions are probability functions in that they are always $\geq 0$ and their sum is 1.
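Definition 23 amounts to picking out one row of the joint table and rescaling it to sum to 1. A minimal sketch for the first numerical example follows; the function name `conditional_x_given_y` is illustrative.

```python
from fractions import Fraction as F

joint = {
    (0, 1): F(1, 10), (1, 1): F(2, 10), (2, 1): F(3, 10),
    (0, 2): F(2, 10), (1, 2): F(1, 10), (2, 2): F(1, 10),
}

def conditional_x_given_y(joint_pf, y):
    """Return f(x | y) = f(x, y) / f2(y) as a dictionary keyed by x."""
    f2_y = sum(p for (_, yy), p in joint_pf.items() if yy == y)
    if f2_y == 0:
        raise ValueError("f2(y) is zero, so f(x | y) is undefined")
    return {x: p / f2_y for (x, yy), p in joint_pf.items() if yy == y}

cond = conditional_x_given_y(joint, 1)
print(cond)
```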
Functions of Variables:
In an example earlier, your final mark in a course might be a function of the 3 variables $X_1, X_2, X_3$ - assignment, midterm, and exam marks$^{33}$. Indeed, we often encounter problems where we need to find the probability distribution of a function of two or more random variables. The most general method for finding the probability function for some function of random variables $X$ and $Y$ involves looking at every combination $(x, y)$ to see what value the function takes. For example, if we let $U = 2(Y - X)$ in our example, the possible values of $U$ are seen by looking at the value of $u = 2(y - x)$ for each $(x, y)$ in the range of $(X, Y)$.
             x
u       0     1     2
y  1    2     0    -2
   2    4     2     0
Then $P(U = -2) = P(X = 2 \text{ and } Y = 1) = f(2, 1) = .3$
$P(U = 0) = P(X = 1 \text{ and } Y = 1, \text{ or } X = 2 \text{ and } Y = 2) = f(1, 1) + f(2, 2) = .3$
$P(U = 2) = f(0, 1) + f(1, 2) = .2$
$P(U = 4) = f(0, 2) = .2$
The probability function of $U$ is thus
u       -2    0     2     4
f(u)    .3    .3    .2    .2
$^{33}$ "Don't worry about your marks. Just make sure that you keep up with the work and that you don't have to repeat a year. It's not necessary to have good marks in everything" Albert Einstein in letter to his son, 1916.
For some functions it is possible to approach the problem more systematically. One of the most common functions of this type is the total. Let $T = X + Y$. This gives:
             x
t       0     1     2
y  1    1     2     3
   2    2     3     4
Then $P(T = 3) = f(1, 2) + f(2, 1) = .4$, for example. Continuing in this way, we get
t       1     2     3     4
f(t)    .1    .4    .4    .1
(We are being a little sloppy with our notation by using "$f$" for both $f(t)$ and $f(x, y)$. No confusion arises here, but better notation would be to write $f_T(t)$ for $P(T = t)$.) In fact, to find $P(T = t)$ we are simply adding the probabilities for all $(x, y)$ combinations with $x + y = t$. This could be written as:
$$f(t) = \sum_{\substack{\text{all } (x,y) \\ \text{with } x+y=t}} f(x, y).$$
However, if $x + y = t$, then $y = t - x$. To systematically pick out the right combinations of $(x, y)$, all we really need to do is sum over values of $x$ and then substitute $t - x$ for $y$. Then,
$$f(t) = \sum_{\text{all } x} f(x, t - x) = \sum_{\text{all } x} P(X = x, Y = t - x)$$
So $P(T = 3)$ would be
$$P(T = 3) = \sum_{\text{all } x} f(x, 3 - x) = f(0, 3) + f(1, 2) + f(2, 1) = 0.4.$$
(note $f(0, 3) = 0$ since $Y$ can't be 3.)
We can summarize the method of finding the probability function for a function $U = g(X, Y)$ of two random variables $X$ and $Y$ as follows:
Let $f(x, y) = P(X = x, Y = y)$ be the probability function for $(X, Y)$. Then the probability function for $U$ is
$$f_U(u) = P(U = u) = \sum_{\substack{\text{all } (x,y): \\ g(x,y)=u}} f(x, y)$$
This can also be extended to functions of three or more random variables $U = g(X_1, X_2, \ldots, X_n)$:
$$f_U(u) = P(U = u) = \sum_{\substack{(x_1,\ldots,x_n): \\ g(x_1,\ldots,x_n)=u}} f(x_1, \ldots, x_n).$$
(Note: Do not get confused between the functions $f$ and $g$ in the above: $f(x, y)$ is the joint probability function of the random variables $X, Y$ whereas $U = g(X, Y)$ defines the "new" random variable that is a function of $X$ and $Y$, and whose distribution we want to find.)
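The general recipe, adding $f(x, y)$ over all points where $g(x, y) = u$, can be sketched in a few lines. Here it is applied to the first numerical example for both $U = 2(Y - X)$ and $T = X + Y$; the helper name `pf_of_function` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

joint = {
    (0, 1): F(1, 10), (1, 1): F(2, 10), (2, 1): F(3, 10),
    (0, 2): F(2, 10), (1, 2): F(1, 10), (2, 2): F(1, 10),
}

def pf_of_function(joint_pf, g):
    """Probability function of U = g(X, Y): add f(x, y) over all (x, y) with g(x, y) = u."""
    out = defaultdict(F)
    for (x, y), prob in joint_pf.items():
        out[g(x, y)] += prob
    return dict(sorted(out.items()))

f_U = pf_of_function(joint, lambda x, y: 2 * (y - x))
f_T = pf_of_function(joint, lambda x, y: x + y)
print(f_U)
print(f_T)
```

Passing `g` as an argument keeps the probability function $f$ and the transformation $g$ separate, which mirrors the warning in the note above about not confusing the two.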
Example: Let $X$ and $Y$ be independent random variables having Poisson distributions with averages (means) of $\mu_1$ and $\mu_2$ respectively. Let $T = X + Y$. Find its probability function, $f(t)$.
Solution: We first need to find $f(x, y)$. Since $X$ and $Y$ are independent we know
$$f(x, y) = f_1(x) f_2(y)$$
Using the Poisson probability function,
$$f(x, y) = \frac{\mu_1^{x} e^{-\mu_1}}{x!} \cdot \frac{\mu_2^{y} e^{-\mu_2}}{y!}$$
where $x$ and $y$ can equal $0, 1, 2, \ldots$. Now,
$$P(T = t) = P(X + Y = t) = \sum_{\text{all } x} P(X = x, Y = t - x).$$
Then
$$f(t) = \sum_{\text{all } x} f(x, t - x) = \sum_{x=0}^{t} \frac{\mu_1^{x} e^{-\mu_1}}{x!} \cdot \frac{\mu_2^{t-x} e^{-\mu_2}}{(t - x)!}$$
To evaluate this sum, factor out constant terms and try to regroup in some form which can be evaluated by one of our summation techniques.
$$f(t) = \mu_2^{t}\, e^{-(\mu_1+\mu_2)} \sum_{x=0}^{t} \frac{1}{x!(t - x)!} \left(\frac{\mu_1}{\mu_2}\right)^{x}$$
If we had a $t!$ on the top inside the $\sum_{x=0}^{t}$, the sum would be of the form $\sum_{x=0}^{t} \binom{t}{x} \left(\frac{\mu_1}{\mu_2}\right)^{x}$. This is the right hand side of the binomial theorem. Multiply top and bottom by $t!$ to get:
$$f(t) = \frac{\mu_2^{t}\, e^{-(\mu_1+\mu_2)}}{t!} \sum_{x=0}^{t} \binom{t}{x} \left(\frac{\mu_1}{\mu_2}\right)^{x} = \frac{\mu_2^{t}\, e^{-(\mu_1+\mu_2)}}{t!} \left(1 + \frac{\mu_1}{\mu_2}\right)^{t} \quad \text{by the binomial theorem.}$$
Take a common denominator of $\mu_2$ to get
$$f(t) = \frac{\mu_2^{t}\, e^{-(\mu_1+\mu_2)}}{t!} \cdot \frac{(\mu_1 + \mu_2)^{t}}{\mu_2^{t}} = \frac{(\mu_1 + \mu_2)^{t}}{t!}\, e^{-(\mu_1+\mu_2)}, \quad \text{for } t = 0, 1, 2, \ldots$$
Note that we have just shown that the sum of 2 independent Poisson random variables also has a Poisson distribution.
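The algebra above can be checked numerically by evaluating the convolution sum directly and comparing it with the Poisson probability function with mean $\mu_1 + \mu_2$. This is only a sketch: the values of `mu1` and `mu2` are arbitrary illustrative choices, and the comparison is made over a finite range of $t$.

```python
import math

def poisson_pf(x, mu):
    """Poisson probability function mu^x e^(-mu) / x!."""
    return mu**x * math.exp(-mu) / math.factorial(x)

def pf_of_sum(t, mu1, mu2):
    """f(t) = sum over x = 0..t of f1(x) f2(t - x), for independent Poisson X and Y."""
    return sum(poisson_pf(x, mu1) * poisson_pf(t - x, mu2) for x in range(t + 1))

mu1, mu2 = 1.5, 2.5   # illustrative means, not from the notes

max_gap = max(
    abs(pf_of_sum(t, mu1, mu2) - poisson_pf(t, mu1 + mu2)) for t in range(30)
)
print(max_gap)   # should be at the level of floating-point rounding
```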
Example: Three sprinters, $A$, $B$ and $C$, compete against each other in 10 independent 100 m. races. The probabilities of winning any single race are .5 for $A$, .4 for $B$, and .1 for $C$. Let $X_1$, $X_2$ and $X_3$ be the number of races $A$, $B$ and $C$ win.
(a) Find the joint probability function, $f(x_1, x_2, x_3)$
(b) Find the marginal probability function, $f_1(x_1)$
(c) Find the conditional probability function, $f(x_2|x_1)$
(d) Are $X_1$ and $X_2$ independent? Why?
(e) Let $T = X_1 + X_2$. Find its probability function, $f(t)$.
Solution: Before starting, note that $x_1 + x_2 + x_3 = 10$ since there are 10 races in all. We really only have two variables since $x_3 = 10 - x_1 - x_2$. However it is convenient to use $x_3$ to save writing and preserve symmetry.
(a) The reasoning will be similar to the way we found the binomial distribution in Chapter 6 except that there are now 3 types of outcome. There are $\frac{10!}{x_1!\,x_2!\,x_3!}$ different outcomes (i.e. results for races 1 to 10) in which there are $x_1$ wins by $A$, $x_2$ by $B$, and $x_3$ by $C$. Each of these arrangements has a probability of (.5) multiplied $x_1$ times, (.4) $x_2$ times, and (.1) $x_3$ times in some order; i.e., $(.5)^{x_1}(.4)^{x_2}(.1)^{x_3}$.
Therefore
$$f(x_1, x_2, x_3) = \frac{10!}{x_1!\,x_2!\,x_3!}\,(.5)^{x_1}(.4)^{x_2}(.1)^{x_3}$$
The range for $f(x_1, x_2, x_3)$ is triples $(x_1, x_2, x_3)$ where each $x_i$ is an integer between 0 and 10, and where $x_1 + x_2 + x_3 = 10$.
(b) It would also be acceptable to drop $x_3$ as a variable and write down the probability function for $X_1, X_2$ only; this is
$$f(x_1, x_2) = \frac{10!}{x_1!\,x_2!\,(10 - x_1 - x_2)!}\,(.5)^{x_1}(.4)^{x_2}(.1)^{10-x_1-x_2},$$
because of the fact that $X_3$ must equal $10 - X_1 - X_2$. For this probability function $x_1 = 0, 1, \ldots, 10$; $x_2 = 0, 1, \ldots, 10$ and $x_1 + x_2 \leq 10$. This simplifies finding $f_1(x_1)$ a little. We now have $f_1(x_1) = \sum_{x_2} f(x_1, x_2)$. The limits of summation need care: $x_2$ could be as small as 0, but since $x_1 + x_2 \leq 10$, we also require $x_2 \leq 10 - x_1$. (For example if $x_1 = 7$ then $B$ can win 0, 1, 2, or 3 races.) Thus,
$$f_1(x_1) = \sum_{x_2=0}^{10-x_1} \frac{10!}{x_1!\,x_2!\,(10 - x_1 - x_2)!}\,(.5)^{x_1}(.4)^{x_2}(.1)^{10-x_1-x_2}$$
$$= \frac{10!}{x_1!}\,(.5)^{x_1}(.1)^{10-x_1} \sum_{x_2=0}^{10-x_1} \frac{1}{x_2!\,(10 - x_1 - x_2)!} \left(\frac{.4}{.1}\right)^{x_2}$$
(Hint: In $\binom{n}{r} = \frac{n!}{r!(n-r)!}$ the 2 terms in the denominator add to the term in the numerator, if we ignore the ! sign.) Multiply top and bottom by $[x_2 + (10 - x_1 - x_2)]! = (10 - x_1)!$ This gives
$$f_1(x_1) = \frac{10!}{x_1!\,(10 - x_1)!}\,(0.5)^{x_1}(0.1)^{10-x_1} \sum_{x_2=0}^{10-x_1} \binom{10 - x_1}{x_2} \left(\frac{0.4}{0.1}\right)^{x_2}$$
$$= \binom{10}{x_1}(0.5)^{x_1}(0.1)^{10-x_1} \left(1 + \frac{.4}{.1}\right)^{10-x_1} \quad \text{(again using the binomial theorem)}$$
$$= \binom{10}{x_1}(0.5)^{x_1}(0.1)^{10-x_1}\, \frac{(0.1 + 0.4)^{10-x_1}}{(0.1)^{10-x_1}} = \binom{10}{x_1}(0.5)^{x_1}(0.5)^{10-x_1}$$
Here $f_1(x_1)$ is defined for $x_1 = 0, 1, 2, \ldots, 10$.
Note: While this derivation is included as an example of how to find marginal distributions by summing a joint probability function, there is a much simpler method for this problem. Note that each race is either won by $A$ ("success") or it is not won by $A$ ("failure"). Since the races are independent and $X_1$ is now just the number of "success" outcomes, $X_1$ must have a binomial distribution, with $n = 10$ and $p = .5$.
Hence $f_1(x_1) = \binom{10}{x_1}(.5)^{x_1}(.5)^{10-x_1}$; for $x_1 = 0, 1, \ldots, 10$, as above.
(c) Remember that $f(x_2|x_1) = P(X_2 = x_2 | X_1 = x_1)$, so that
$$f(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{\dfrac{10!}{x_1!\,x_2!\,(10-x_1-x_2)!}\,(.5)^{x_1}(.4)^{x_2}(.1)^{10-x_1-x_2}}{\dfrac{10!}{x_1!\,(10-x_1)!}\,(.5)^{x_1}(.5)^{10-x_1}}$$
$$= \frac{(10 - x_1)!}{x_2!\,(10 - x_1 - x_2)!} \cdot \frac{(.4)^{x_2}(.1)^{10-x_1-x_2}}{(.5)^{x_2}(.5)^{10-x_1-x_2}} = \binom{10 - x_1}{x_2} \left(\frac{4}{5}\right)^{x_2} \left(\frac{1}{5}\right)^{10-x_1-x_2}$$
For any given value of $x_1$, $x_2$ ranges through $0, 1, \ldots, (10 - x_1)$. (So the range of $X_2$ depends on the value $x_1$, which makes sense: if $A$ wins $x_1$ races then the most $B$ can win is $10 - x_1$.)
Note: As in (b), this result can be obtained more simply by general reasoning. Once we are given that $A$ wins $x_1$ races, the remaining $(10 - x_1)$ races are all won by either $B$ or $C$. For these races, $B$ wins $\frac{4}{5}$ of the time and $C$ $\frac{1}{5}$ of the time, because $P(B \text{ wins}) = 0.4$ and $P(C \text{ wins}) = 0.1$; i.e., $B$ wins 4 times as often as $C$. More formally
$$P(B \text{ wins} \mid B \text{ or } C \text{ wins}) = 0.8.$$
Therefore $f(x_2|x_1) = \binom{10 - x_1}{x_2} \left(\frac{4}{5}\right)^{x_2} \left(\frac{1}{5}\right)^{10-x_1-x_2}$ from the binomial distribution.
(d) $X_1$ and $X_2$ are clearly not independent since the more races $A$ wins, the fewer races there are for $B$ to win. More formally,
$$f_1(x_1) f_2(x_2) = \binom{10}{x_1}(.5)^{x_1}(.5)^{10-x_1} \binom{10}{x_2}(.4)^{x_2}(.6)^{10-x_2} \neq f(x_1, x_2)$$
(In general, if the range for $X_1$ depends on the value of $X_2$, then $X_1$ and $X_2$ cannot be independent.)
(e) If $T = X_1 + X_2$ then
$$f(t) = \sum_{x_1} f(x_1, t - x_1) = \sum_{x_1=0}^{t} \frac{10!}{x_1!\,(t - x_1)!\,\underbrace{(10 - x_1 - (t - x_1))!}_{(10-t)!}}\,(.5)^{x_1}(.4)^{t-x_1}(.1)^{10-t}$$
The upper limit on $x_1$ is $t$ because, for example, if $t = 7$ then $A$ could not have won more than 7 races. Then
$$f(t) = \frac{10!}{(10 - t)!}\,(.4)^{t}(.1)^{10-t} \sum_{x_1=0}^{t} \frac{1}{x_1!\,(t - x_1)!} \left(\frac{.5}{.4}\right)^{x_1}$$
What do we need to multiply by on the top and bottom? Can you spot it before looking below?
$$f(t) = \frac{10!}{t!\,(10 - t)!}\,(.4)^{t}(.1)^{10-t} \sum_{x_1=0}^{t} \frac{t!}{x_1!\,(t - x_1)!} \left(\frac{.5}{.4}\right)^{x_1}$$
$$= \binom{10}{t}(.4)^{t}(.1)^{10-t} \left(1 + \frac{.5}{.4}\right)^{t}$$
$$= \binom{10}{t}(.4)^{t}(.1)^{10-t}\, \frac{(.4 + .5)^{t}}{(.4)^{t}} = \binom{10}{t}(.9)^{t}(.1)^{10-t} \quad \text{for } t = 0, 1, \ldots, 10.$$
Exercise: Explain to yourself how this answer can be obtained from the binomial distribution, as we did in the notes following parts (b) and (c).
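The three closed forms obtained in parts (b), (c) and (e) can be cross-checked against direct sums of the joint probability function. The sketch below does this numerically; `joint_pf` and `binom_pf` are hypothetical helper names, and the conditional check uses $x_1 = 7$ purely as an illustration.

```python
from math import comb, factorial

def joint_pf(x1, x2, n=10, p=(0.5, 0.4, 0.1)):
    """f(x1, x2) for the sprinters, with x3 = n - x1 - x2 implied."""
    x3 = n - x1 - x2
    if min(x1, x2, x3) < 0:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p[0]**x1 * p[1]**x2 * p[2]**x3

def binom_pf(x, n, p):
    """Binomial probability function C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# (b) Marginal of X1: sum over x2 = 0..10 - x1, compare with Binomial(10, .5).
marginal_gap = max(
    abs(sum(joint_pf(x1, x2) for x2 in range(11 - x1)) - binom_pf(x1, 10, 0.5))
    for x1 in range(11)
)

# (c) Conditional of X2 given X1 = 7, compare with Binomial(3, .8).
cond_gap = max(
    abs(joint_pf(7, x2) / binom_pf(7, 10, 0.5) - binom_pf(x2, 3, 0.8))
    for x2 in range(4)
)

# (e) T = X1 + X2: sum over x1 = 0..t, compare with Binomial(10, .9).
total_gap = max(
    abs(sum(joint_pf(x1, t - x1) for x1 in range(t + 1)) - binom_pf(t, 10, 0.9))
    for t in range(11)
)

print(marginal_gap, cond_gap, total_gap)   # all at rounding-error level
```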
The following problem is similar to conditional probability problems that we solved in Chapter 4. Now we are dealing with events defined in terms of random variables. Earlier results give us things like
$$P(Y = y) = \sum_{\text{all } x} P(Y = y | X = x) P(X = x) = \sum_{\text{all } x} f(y|x) f_1(x)$$
Example: In an auto parts company an average of $\mu$ defective parts are produced per shift. The number, $X$, of defective parts produced has a Poisson distribution. An inspector checks all parts prior to shipping them, but there is a 10% chance that a defective part will slip by undetected. Let $Y$ be the number of defective parts the inspector finds on a shift. Find $f(x|y)$. (The company wants to know how many defective parts are produced, but can only know the number which were actually detected.)
Solution: Think of $X = x$ being event $A$ and $Y = y$ being event $B$; we want to find $P(A|B)$. To do this we'll use
$$P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}$$
We know $f_1(x) = \frac{\mu^{x} e^{-\mu}}{x!} = P(X = x)$. Also, for a given number $x$ of defective items produced, the number, $Y$, detected has a binomial distribution with $n = x$ and $p = .9$, assuming each inspection takes place independently. Then
$$f(y|x) = \binom{x}{y}(.9)^{y}(.1)^{x-y} = \frac{f(x, y)}{f_1(x)}.$$
Therefore
$$f(x, y) = f_1(x) f(y|x) = \frac{\mu^{x} e^{-\mu}}{x!} \cdot \frac{x!}{y!\,(x - y)!}\,(.9)^{y}(.1)^{x-y}$$
To get $f(x|y)$ we'll need $f_2(y)$. We have
$$f_2(y) = \sum_{\text{all } x} f(x, y) = \sum_{x=y}^{\infty} \frac{\mu^{x} e^{-\mu}}{y!\,(x - y)!}\,(.9)^{y}(.1)^{x-y}$$
($x \geq y$ since the number of defective items produced can't be less than the number detected.)
$$= \frac{(.9)^{y} e^{-\mu}}{y!} \sum_{x=y}^{\infty} \frac{\mu^{x}(.1)^{x-y}}{(x - y)!}$$
We could fit this into the summation result $e^{x} = \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \cdots$ by writing $\mu^{x}$ as $\mu^{x-y}\mu^{y}$. Then
$$f_2(y) = \frac{(.9\mu)^{y} e^{-\mu}}{y!} \sum_{x=y}^{\infty} \frac{(.1\mu)^{x-y}}{(x - y)!} = \frac{(.9\mu)^{y} e^{-\mu}}{y!} \left[\frac{(.1\mu)^0}{0!} + \frac{(.1\mu)^1}{1!} + \frac{(.1\mu)^2}{2!} + \cdots\right]$$
$$= \frac{(.9\mu)^{y} e^{-\mu}}{y!}\, e^{.1\mu} = \frac{(.9\mu)^{y} e^{-.9\mu}}{y!}$$
$$f(x|y) = \frac{f(x, y)}{f_2(y)} = \frac{\dfrac{\mu^{x} e^{-\mu}(.9)^{y}(.1)^{x-y}}{y!\,(x - y)!}}{\dfrac{(.9\mu)^{y} e^{-.9\mu}}{y!}} = \frac{(.1\mu)^{x-y} e^{-.1\mu}}{(x - y)!} \quad \text{for } x = y, y + 1, y + 2, \ldots$$
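The two conclusions of this example, that $Y$ is Poisson with mean $.9\mu$ and that $X - y$ given $Y = y$ is Poisson with mean $.1\mu$, can be checked numerically. The sketch below truncates the infinite sum at a large cutoff; the values of `mu`, `y` and `x_max` are illustrative assumptions only.

```python
import math

def poisson_pf(x, mu):
    """Poisson probability function mu^x e^(-mu) / x!."""
    return mu**x * math.exp(-mu) / math.factorial(x)

def joint_pf(x, y, mu, detect=0.9):
    """f(x, y) = f1(x) f(y | x): Poisson number produced, binomial number detected."""
    if y > x:
        return 0.0
    return poisson_pf(x, mu) * math.comb(x, y) * detect**y * (1 - detect)**(x - y)

mu, y = 4.0, 3     # illustrative mean and observed count
x_max = 60         # truncation point for the infinite sum over x

# Marginal f2(y): should match a Poisson p.f. with mean .9 * mu.
f2_y = sum(joint_pf(x, y, mu) for x in range(y, x_max))
gap_marginal = abs(f2_y - poisson_pf(y, 0.9 * mu))

# Conditional f(x | y): should match a Poisson p.f. with mean .1 * mu, evaluated at x - y.
gap_conditional = max(
    abs(joint_pf(x, y, mu) / f2_y - poisson_pf(x - y, 0.1 * mu))
    for x in range(y, y + 15)
)

print(gap_marginal, gap_conditional)
```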
Problems:
8.1.1 The joint probability function of $(X, Y)$ is:
                  x
f(x, y)      0      1      2
      0     .09    .06    .15
y     1     .15    .05    .20
      2     .06    .09    .15
a) Are $X$ and $Y$ independent? Why?
b) Tabulate the conditional probability function, $f(y | X = 0)$.
c) Tabulate the probability function of $D = X - Y$.
8.1.2 In problem 6.14, given that $x$ sales were made in a 1 hour period, find the probability function for $Y$, the number of calls made in that hour.
8.1.3 $X$ and $Y$ are independent, with $f(x) = \binom{x+k-1}{x} p^{k}(1 - p)^{x}$ and $f(y) = \binom{y+\ell-1}{y} p^{\ell}(1 - p)^{y}$. Let $T = X + Y$. Find the probability function, $f(t)$. You may use the result $\binom{a+b-1}{a} = (-1)^{a}\binom{-b}{a}$.
8.2 Multinomial Distribution
There is only this one multivariate model distribution introduced in this course, though other multivariate distributions exist. The multinomial distribution defined below is very important. It is a generalization of the binomial model to the case where each trial has $k$ possible outcomes.
Physical Setup: This distribution is the same as binomial except there are $k$ types of outcome rather than two. An experiment is repeated independently $n$ times with $k$ distinct types of outcome each time. Let the probabilities of these $k$ types be $p_1, p_2, \ldots, p_k$ each time. Let $X_1$ be the number of times the 1st type occurs, $X_2$ the number of times the 2nd occurs, $\ldots$, $X_k$ the number of times the $k$th type occurs. Then $(X_1, X_2, \ldots, X_k)$ has a multinomial distribution.
Notes:
(1) $p_1 + p_2 + \cdots + p_k = 1$
(2) $X_1 + X_2 + \cdots + X_k = n$,
If we wish we can drop one of the variables (say the last), and just note that $X_k$ equals $n - X_1 - X_2 - \cdots - X_{k-1}$.
Illustrations:
(1) In the example of Section 8.1 with sprinters A, B, and C running 10 races we had a multinomial distribution with $n = 10$ and $k = 3$.
(2) Suppose student marks are given in letter grades as A, B, C, D, or F. In a class of 80 students the number getting A, B, ..., F might have a multinomial distribution with $n = 80$ and $k = 5$.
Joint Probability Function: The joint probability function of $X_1, \ldots, X_k$ is given by extending the argument in the sprinters example from $k = 3$ to general $k$. There are $\frac{n!}{x_1!\,x_2! \cdots x_k!}$ different outcomes of the $n$ trials in which $x_1$ are of the 1st type, $x_2$ are of the 2nd type, etc. Each of these arrangements has probability $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$ since $p_1$ is multiplied $x_1$ times in some order, etc.
Therefore
$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
The restrictions on the $x_i$'s are $x_i = 0, 1, \ldots, n$ and $\sum_{i=1}^{k} x_i = n$.
As a check that $\sum f(x_1, x_2, \ldots, x_k) = 1$ we use the multinomial theorem to get
$$\sum \frac{n!}{x_1!\,x_2! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k} = (p_1 + p_2 + \cdots + p_k)^{n} = 1.$$
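The multinomial probability function and the sum-to-one check are easy to sketch in code. The example below reuses the sprinters' parameters ($n = 10$, probabilities .5, .4, .1) as illustrative inputs; `multinomial_pf` is a hypothetical helper name.

```python
from math import factorial, prod

def multinomial_pf(xs, ps):
    """f(x_1, ..., x_k) = n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k with n = sum(xs)."""
    coef = factorial(sum(xs))
    for x in xs:
        coef //= factorial(x)          # each partial quotient is an exact integer
    return coef * prod(p**x for p, x in zip(ps, xs))

n, ps = 10, (0.5, 0.4, 0.1)

# Sum f over every triple (x1, x2, x3) of non-negative integers adding to n.
total = sum(
    multinomial_pf((x1, x2, n - x1 - x2), ps)
    for x1 in range(n + 1)
    for x2 in range(n + 1 - x1)
)

print(multinomial_pf((5, 4, 1), ps))
print(total)
```

Enumerating only $(x_1, x_2)$ and setting $x_3 = n - x_1 - x_2$ mirrors the remark that one variable can always be dropped.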
We have already seen one example of the multinomial distribution in the sprinter example. Here is another simple example.
Example: Every person is one of four blood types: A, B, AB and O. (This is important in determining, for example, who may give a blood transfusion to a person.) In a large population let the fraction that has type A, B, AB and O, respectively, be $p_1, p_2, p_3, p_4$. Then, if $n$ persons are randomly selected from the population, the numbers $X_1, X_2, X_3, X_4$ of types A, B, AB, O have a multinomial distribution with $k = 4$. (In Caucasian people the values of the $p_i$'s are approximately $p_1 = .45$, $p_2 = .08$, $p_3 = .03$, $p_4 = .44$.)
Remark: We sometimes use the notation $(X_1, \ldots, X_k) \sim Mult(n; p_1, \ldots, p_k)$ to indicate that $(X_1, \ldots, X_k)$ have a multinomial distribution.
Remark: For some types of problems it's helpful to write formulas in terms of $x_1, \ldots, x_{k-1}$ and $p_1, \ldots, p_{k-1}$ using the fact that
$$x_k = n - x_1 - \cdots - x_{k-1} \quad \text{and} \quad p_k = 1 - p_1 - \cdots - p_{k-1}.$$
In this case we can write the joint p.f. as $f(x_1, \ldots, x_{k-1})$ but we must remember then that $x_1, \ldots, x_{k-1}$ satisfy the condition $0 \leq x_1 + \cdots + x_{k-1} \leq n$.
The multinomial distribution can also arise in combination with other models, and students often have trouble recognizing it then.
Example: A potter is producing teapots one at a time. Assume that they are produced independently of each other and with probability $p$ the pot produced will be "satisfactory"; the rest are sold at a lower price. The number, $X$, of rejects before producing a satisfactory teapot is recorded. When 12 satisfactory teapots are produced, what is the probability the 12 values of $X$ will consist of six 0's, three 1's, two 2's and one value which is $\geq 3$?
Solution: Each time a "satisfactory" pot is produced the value of $X$ falls in one of the four categories $X = 0$, $X = 1$, $X = 2$, $X \geq 3$. Under the assumptions given in this question, $X$ has a geometric distribution with
$$f(x) = p(1 - p)^{x}; \quad \text{for } x = 0, 1, 2, \ldots$$
so we can find the probability for each of these categories. We have $P(X = x) = f(x)$ for 0, 1, 2, and we can obtain $P(X \geq 3)$ in various ways:
a)
$$P(X \geq 3) = f(3) + f(4) + f(5) + \cdots = p(1 - p)^3 + p(1 - p)^4 + p(1 - p)^5 + \cdots = \frac{p(1 - p)^3}{1 - (1 - p)} = (1 - p)^3$$
since we have a geometric series.
b)
$$P(X \geq 3) = 1 - P(X < 3) = 1 - f(0) - f(1) - f(2).$$
With some re-arranging, this also gives $(1 - p)^3$.
c) The only way to have $X \geq 3$ is to have the first 3 pots produced all being rejects. Therefore $P(X \geq 3) = P(\text{3 consecutive rejects}) = (1 - p)(1 - p)(1 - p) = (1 - p)^3$
Reiterating that each time a pot is successfully produced, the value of $X$ falls in one of 4 categories (0, 1, 2, or $\geq 3$), we see that the probability asked for is given by a multinomial distribution, $Mult(12; f(0), f(1), f(2), P(X \geq 3))$:
$$f(6, 3, 2, 1) = \frac{12!}{6!\,3!\,2!\,1!}\,[f(0)]^6 [f(1)]^3 [f(2)]^2 [P(X \geq 3)]^1$$
$$= \frac{12!}{6!\,3!\,2!\,1!}\, p^6\, [p(1 - p)]^3 \left[p(1 - p)^2\right]^2 \left[(1 - p)^3\right]^1$$
$$= \frac{12!}{6!\,3!\,2!\,1!}\, p^{11}(1 - p)^{10}$$
Problems:
8.2.1 An insurance company classifies policy holders as class A, B, C, or D. The probabilities of a randomly selected policy holder being in these categories are .1, .4, .3 and .2, respectively. Give expressions for the probability that 25 randomly chosen policy holders will include
(a) 3 A's, 11 B's, 7 C's, and 4 D's.
(b) 3 A's and 11 B's.
(c) 3 A's and 11 B's, given that there are 4 D's.
8.2.2 Chocolate chip cookies are made from batter containing an average of 0.6 chips per c.c. Chips are distributed according to the conditions for a Poisson process. Each cookie uses 12 c.c. of batter. Give expressions for the probabilities that in a dozen cookies:
(a) 3 have fewer than 5 chips.
(b) 3 have fewer than 5 chips and 7 have more than 9.
(c) 3 have fewer than 5 chips, given that 7 have more than 9.
8.3 Markov Chains$^{34}$
Consider a sequence of (discrete) random variables $X_1, X_2, \ldots$ each of which takes integer values $1, 2, \ldots, N$ (called states). We assume that for a certain matrix $P$ (called the transition probability matrix), the conditional probabilities are given by corresponding elements of the matrix; i.e.
$$P[X_{n+1} = j \mid X_n = i] = P_{ij}, \quad i = 1, \ldots, N, \; j = 1, \ldots, N$$
and furthermore that the chain only uses the last state occupied in determining its future; i.e. that
$$P[X_{n+1} = j \mid X_n = i, X_{n-1} = i_1, X_{n-2} = i_2, \ldots, X_{n-l} = i_l] = P[X_{n+1} = j \mid X_n = i] = P_{ij}$$
for all $j, i, i_1, i_2, \ldots, i_l$, and $l = 2, 3, \ldots$. Then the sequence of random variables $X_n$ is called a Markov$^{35}$ Chain. Markov Chain models are the most common simple models for dependent variables, and are used to predict weather as well as movements of security prices. They allow the future of the process to depend on the present state of the process, but the past behaviour can influence the future only through the present state.
$^{34}$ This section optional for stat 220
$^{35}$ After Andrei Andreyevich Markov (1856-1922), a Russian mathematician, Professor at Saint Petersburg University. Markov studied sequences of mutually dependent variables, hoping to establish the limiting laws of probability in their most general form and discovered Markov chains, launched the theory of stochastic processes. As well, Markov applied the method of continued fractions, pioneered by his teacher Pafnuty Chebyshev, to probability theory, completed Chebyshev's proof of the central limit theorem (see Chapter 9) for independent non-identically distributed random variables. For entertainment, Markov was also interested in poetry and studied poetic style.
Example. Rain-No rain
Suppose that the probability that tomorrow is rainy given that today is not raining is $\alpha$ (and it does not otherwise depend on whether it rained in the past) and the probability that tomorrow is dry given that today is rainy is $\beta$. If tomorrow's weather depends on the past only through whether today is wet or dry, we can define random variables
$$X_n = \begin{cases} 1 & \text{if Day } n \text{ is wet} \\ 0 & \text{if Day } n \text{ is dry} \end{cases}$$
(beginning at some arbitrary time origin, day $n = 0$). Then the random variables $X_n$, $n = 0, 1, 2, \ldots$ form a Markov chain with $N = 2$ possible states and having probability transition matrix
$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$
Properties of the Transition Matrix $P$
Note that $P_{ij} \geq 0$ for all $i, j$ and $\sum_{j=1}^{N} P_{ij} = 1$ for all $i$. This last property holds because given that $X_n = i$, $X_{n+1}$ must occupy one of the states $j = 1, 2, \ldots, N$.
The distribution of $X_n$
Suppose that the chain is started by randomly choosing a state for $X_0$ with distribution $P[X_0 = i] = q_i$, $i = 1, 2, \ldots, N$. Then the distribution of $X_1$ is given by
$$P(X_1 = j) = \sum_{i=1}^{N} P(X_1 = j, X_0 = i) = \sum_{i=1}^{N} P(X_1 = j \mid X_0 = i) P(X_0 = i) = \sum_{i=1}^{N} P_{ij}\, q_i$$
and this is the $j$'th element of the vector $q^T P$ where $q$ is the column vector of values $q_i$. To obtain the distribution at time $n = 1$, premultiply the transition matrix $P$ by a vector representing the distribution at time $n = 0$. Similarly the distribution of $X_2$ is the vector $q^T P^2$ where $P^2$ is the product of the matrix $P$ with itself and the distribution of $X_n$ is $q^T P^n$. Under very general conditions, it can be shown that these probabilities converge because the matrix $P^n$ converges pointwise to a limiting matrix as $n \to \infty$. In fact, in many such cases, the limit does not depend on the initial distribution $q$ because the limiting matrix has all of its rows identical and equal to some vector of probabilities $\pi$. Identifying this vector $\pi$ when convergence holds is reasonably easy.
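The statement that the distribution at time $n$ is the initial row vector multiplied by the $n$th power of the transition matrix can be illustrated with repeated vector-matrix products. The sketch below uses plain Python lists for the two-state weather chain; the transition probabilities 0.3 and 0.6 are made-up illustrative values, and the function names are hypothetical.

```python
def step(dist, P):
    """One step of the chain: return the row vector dist^T P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(n_steps, initial, P):
    """Distribution of X_n obtained by applying `step` n_steps times to the initial distribution."""
    dist = list(initial)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

alpha, beta = 0.3, 0.6                       # illustrative values only
P = [[1 - alpha, alpha],                     # row 0: today dry
     [beta, 1 - beta]]                       # row 1: today wet

after_one = step([1.0, 0.0], P)              # start dry with certainty
from_dry = distribution_at(50, [1.0, 0.0], P)
from_wet = distribution_at(50, [0.0, 1.0], P)

print(after_one)
print(from_dry, from_wet)                    # nearly identical: the start is forgotten
```

Starting the chain from a dry day and from a wet day gives essentially the same distribution after many steps, which is the behaviour described in the text.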
Definition
A limiting distribution of a Markov chain is a vector ($\pi$ say) of long run probabilities of the individual states so
$$\pi_i = \lim_{t \to \infty} P[X_t = i].$$
Now let us suppose that convergence to this distribution holds for a particular initial distribution $q$ so we assume that
$$q^T P^n \to \pi^T \text{ as } n \to \infty.$$
Then notice that
$$(q^T P^n) P \to \pi^T P$$
but also
$$(q^T P^n) P = q^T P^{n+1} \to \pi^T \text{ as } n \to \infty$$
so $\pi^T$ must have the property that
$$\pi^T P = \pi^T$$
Any limiting distribution must have this property and this makes it easy in many examples to identify the limiting behaviour of the chain.
Definition 24 A stationary distribution of a Markov chain is the column vector ($\pi$ say) of probabilities of the individual states such that $\pi^T P = \pi^T$.
Example: (weather continued)
Let us return to the weather example in which the transition probabilities are given by the matrix
$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$
What is the long-run proportion of rainy days? To determine this we need to solve the equations
$$\pi^T P = \pi^T$$
$$\begin{pmatrix} \pi_0 & \pi_1 \end{pmatrix} \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix} = \begin{pmatrix} \pi_0 & \pi_1 \end{pmatrix}$$
subject to the conditions that the values $\pi_0, \pi_1$ are both probabilities (non-negative) and add to one. It is easy to see that the solution is
$$\pi_0 = \frac{\beta}{\alpha + \beta}, \qquad \pi_1 = \frac{\alpha}{\alpha + \beta}$$
which is intuitively reasonable in that it says that the long-run probability of the two states is proportional to the probability of a switch to that state from the other. So the long-run probability of a dry day is the limit
$$\pi_0 = \lim_{n \to \infty} P(X_n = 0) = \frac{\beta}{\alpha + \beta}.$$
You might try verifying this by computing the powers of the matrix $P^n$ for $n = 1, 2, \ldots$ and show that $P^n$ approaches the matrix
$$\begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \\ \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix}$$
as $n \to \infty$. There are various mathematical conditions under which the limiting distribution of a Markov chain is unique and independent of the initial state of the chain but roughly they assert that the chain is such that it forgets the more and more distant past.
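The suggested verification, computing powers of the transition matrix and watching both rows approach the stationary vector, can be sketched as follows. As before, the two transition probabilities are made-up illustrative values (0.3 for dry-to-wet, 0.6 for wet-to-dry), and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_power(P, n):
    """P multiplied by itself n times (n >= 1)."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

alpha, beta = 0.3, 0.6                       # illustrative values only
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

# Candidate stationary distribution from the closed-form solution.
pi = [beta / (alpha + beta), alpha / (alpha + beta)]

# Check the defining property pi^T P = pi^T.
pi_P = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

# Both rows of a high power of P should be close to pi.
P_100 = mat_power(P, 100)

print(pi, pi_P)
print(P_100)
```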
Independent Random Variables
Consider a Markov chain with transition probability matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ 1-\alpha & \alpha \end{pmatrix}.$$
Notice that both rows of this matrix are identical so $P(X_{n+1} = 1 | X_n = 0) = \alpha = P(X_{n+1} = 1 | X_n = 1)$. For this chain the conditional distribution of $X_{n+1}$ given $X_n = i$ evidently does not depend on the value of $i$. This demonstrates independence. Indeed if $X$ and $Y$ are two discrete random variables and if the conditional probability function $f_{y|x}(y|x)$ of $Y$ given $X$ is identical for all possible values of $x$ then it must be equal to the unconditional (marginal) probability function $f_y(y)$. If $f_{y|x}(y|x) = f_y(y)$ for all values of $x$ and $y$ then $X$ and $Y$ are independent random variables. Therefore if a Markov Chain has transition probability matrix with all rows identical, it corresponds to independent random variables $X_1, X_2, \ldots$. This is the most forgetful of all Markov chains. It pays no attention whatever to the current state in determining the next state.
Is the stationary distribution unique? One might wonder whether it is possible for a Markov chain to have more than one stationary distribution and consequently possibly more than one limiting distribution. We have seen that the $2 \times 2$ Markov chain with transition probability matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$$
has a solution of $\pi^T P = \pi^T$ and $\pi_0 + \pi_1 = 1$ given by $\pi_0 = \frac{\beta}{\alpha+\beta}$, $\pi_1 = \frac{\alpha}{\alpha+\beta}$. Is there any other solution possible? Rewriting the equation $\pi^T P = \pi^T$ in the form $\pi^T (P - I) = 0$, note that the dimension of the subspace of solutions $\pi^T$ is one provided that the rank of the matrix $P - I$ is one (i.e. the solutions are all scalar multiples of a single vector $\pi^T$), and the dimension is 2 provided that the rank of the matrix $P - I$ is 0. Only if $\text{rank}(P - I) = 0$ will there be two linearly independent solutions and hence two possible candidates for equilibrium distributions. But if $P - I$ has rank 0, then $P = I$, the transition probability matrix of a very stubborn Markov chain which always stays in the state currently occupied. For two-dimensional Markov Chains, only in the case $P = I$ is there more than one stationary distribution, and then any probability vector $\pi^T$ satisfies $\pi^T P = \pi^T$ and is a stationary distribution. This is at the opposite end of the spectrum from the independent case above which pays no attention to the current state in determining the next state. The chain with $P = I$ never leaves the current state.
Example (Gene Model) A simple form of inheritance of traits occurs when a trait is governed by a pair of genes $A$ and $a$. An individual may have an $AA$ or an $Aa$ combination (in which case they are indistinguishable in appearance, or "$A$ dominates $a$"). Let us call an $AA$ individual dominant, $aa$ recessive and $Aa$ hybrid. When two individuals mate, the offspring inherits one gene of the pair from each parent, and we assume that these genes are selected at random. Now let us suppose that two individuals of opposite sex selected at random mate, and then two of their offspring mate, etc. Here the state is determined by a pair of individuals, so the states of our process can be considered to be objects like $(AA, Aa)$ indicating that one of the pair is $AA$ and the other is $Aa$ (we do not distinguish the order of the pair, or male and female, assuming these genes do not depend on the sex of the individual).

Number   State
1        (AA, AA)
2        (AA, Aa)
3        (AA, aa)
4        (Aa, Aa)
5        (Aa, aa)
6        (aa, aa)
For example, consider the calculation of $P(X_{t+1} = j | X_t = 2)$. In this case each offspring has probability 1/2 of being a dominant $AA$, and probability 1/2 of being a hybrid ($Aa$). If two offspring are selected independently from this distribution the possible pairs are $(AA, AA)$, $(AA, Aa)$, $(Aa, Aa)$ with probabilities 1/4, 1/2, 1/4 respectively. So the transitions have probabilities below:

            (AA,AA)  (AA,Aa)  (AA,aa)  (Aa,Aa)  (Aa,aa)  (aa,aa)
(AA,AA)     1        0        0        0        0        0
(AA,Aa)     .25      .5       0        .25      0        0
(AA,aa)     0        0        0        1        0        0
(Aa,Aa)     .0625    .25      .125     .25      .25      .0625
(Aa,aa)     0        0        0        .25      .5       .25
(aa,aa)     0        0        0        0        0        1

and transition probability matrix
$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
.25 & .5 & 0 & .25 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
.0625 & .25 & .125 & .25 & .25 & .0625 \\
0 & 0 & 0 & .25 & .5 & .25 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
What is the long-run behaviour in such a system? For example, the two-generation transition probabilities are given by
$$P^2 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0.3906 & 0.3125 & 0.0313 & 0.1875 & 0.0625 & 0.0156 \\
0.0625 & 0.25 & 0.125 & 0.25 & 0.25 & 0.0625 \\
0.1406 & 0.1875 & 0.0313 & 0.3125 & 0.1875 & 0.1406 \\
0.0156 & 0.0625 & 0.0313 & 0.1875 & 0.3125 & 0.3906 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
which seems to indicate a drift to one or other of the extreme states 1 or 6. To confirm the long-run behaviour calculate:
$$P^{100} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0.75 & 0 & 0 & 0 & 0 & 0.25 \\
0.5 & 0 & 0 & 0 & 0 & 0.5 \\
0.5 & 0 & 0 & 0 & 0 & 0.5 \\
0.25 & 0 & 0 & 0 & 0 & 0.75 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
(entries rounded), which shows that eventually the chain is absorbed in either state 1 or state 6, with the probability of absorption depending on the initial state. This chain, unlike the ones studied before, has more than one possible stationary distribution, for example, $\pi^T = (1, 0, 0, 0, 0, 0)$ and $\pi^T = (0, 0, 0, 0, 0, 1)$, and in these circumstances the chain does not have the same limiting distribution for all initial states.
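The two matrix powers quoted for the gene model can be reproduced with a few lines of code. This is a sketch only, using exact fractions from the Python standard library; the helper name `mat_mul` is hypothetical and the matrix entries are copied from the transition table above.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


F = Fraction
# Transition matrix for states 1..6 = (AA,AA), (AA,Aa), (AA,aa),
# (Aa,Aa), (Aa,aa), (aa,aa), as in the table.
P = [
    [F(1), 0, 0, 0, 0, 0],
    [F(1, 4), F(1, 2), 0, F(1, 4), 0, 0],
    [0, 0, 0, F(1), 0, 0],
    [F(1, 16), F(1, 4), F(1, 8), F(1, 4), F(1, 4), F(1, 16)],
    [0, 0, 0, F(1, 4), F(1, 2), F(1, 4)],
    [0, 0, 0, 0, 0, F(1)],
]

# Two-generation transition probabilities.
P2 = mat_mul(P, P)

# Hundred-generation transition probabilities (99 further multiplications).
P100 = P
for _ in range(99):
    P100 = mat_mul(P100, P)

for row in P100:
    print([round(float(x), 4) for x in row])
```

Working with fractions avoids any doubt about rounding: the middle four columns of the hundredth power are tiny but not exactly zero, and the first and last columns give the absorption probabilities for each starting state.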
8.4 Expectation for Multivariate Distributions: Covariance and Correlation
It is easy to extend the definition of expected value to multiple variables. Generalizing $E[g(X)] = \sum_{\text{all } x} g(x) f(x)$ leads to the definition of expected value in the multivariate case.
Definition 25
$$E[g(X, Y)] = \sum_{\text{all } (x,y)} g(x, y) f(x, y)$$
and
$$E[g(X_1, X_2, \ldots, X_n)] = \sum_{\text{all } (x_1, x_2, \ldots, x_n)} g(x_1, x_2, \ldots, x_n) f(x_1, \ldots, x_n)$$
As before, these represent the average value of $g(X, Y)$ and $g(X_1, \ldots, X_n)$. $E[g(X, Y)]$ could also be determined by finding the probability function $f_Z(z)$ of $Z = g(X, Y)$ and then using the definition of expected value $E(Z) = \sum_{\text{all } z} z f_Z(z)$.
Example: Let the joint probability function, $f(x, y)$, be given by

                 x
f(x, y)      0    1    2
y     1      .1   .2   .3
      2      .2   .1   .1

Find $E(XY)$ and $E(X)$.
Solution:
$$E(XY) = \sum_{\text{all } (x,y)} xy f(x, y)$$
$$= (0 \times 1 \times .1) + (1 \times 1 \times .2) + (2 \times 1 \times .3) + (0 \times 2 \times .2) + (1 \times 2 \times .1) + (2 \times 2 \times .1)$$
$$= 1.4$$
To find $E(X)$ we have a choice of methods. First, taking $g(x, y) = x$ we get
$$E(X) = \sum_{\text{all } (x,y)} x f(x, y)$$
$$= (0 \times .1) + (1 \times .2) + (2 \times .3) + (0 \times .2) + (1 \times .1) + (2 \times .1)$$
$$= 1.1$$
Alternatively, since $E(X)$ only involves $X$, we could find $f_1(x)$ and use
$$E(X) = \sum_{x=0}^{2} x f_1(x) = (0 \times .3) + (1 \times .3) + (2 \times .4) = 1.1$$
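Both routes to $E(X)$, and the calculation of $E(XY)$, are direct sums over the joint table and can be sketched in code. The dictionary layout and the helper name `expect` below are illustrative choices, not part of the notes.

```python
# Joint probability function f(x, y) from the example, keyed by (x, y).
f = {(0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
     (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1}


def expect(g, joint):
    """E[g(X, Y)]: the sum of g(x, y) * f(x, y) over all (x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint.items())


# Method 1: sum directly over the joint probability function.
e_xy = expect(lambda x, y: x * y, f)
e_x = expect(lambda x, y: x, f)

# Method 2: build the marginal f1(x) first, then sum x * f1(x).
f1 = {}
for (x, _y), prob in f.items():
    f1[x] = f1.get(x, 0.0) + prob
e_x_from_marginal = sum(x * prob for x, prob in f1.items())

print(round(e_xy, 10), round(e_x, 10), round(e_x_from_marginal, 10))
```

The two values of $E(X)$ agree, as they must, since summing the joint table over $y$ first is just a reordering of the same sum.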
Example: In the example of Section 8.1 with sprinters A, B, and C we had (using only $X_1$ and $X_2$ in our formulas)
$$f(x_1, x_2) = \frac{10!}{x_1!\, x_2!\, (10 - x_1 - x_2)!} (.5)^{x_1} (.4)^{x_2} (.1)^{10 - x_1 - x_2}$$
where A wins $x_1$ times and B wins $x_2$ times in 10 races. Find $E(X_1 X_2)$.
Solution: This will be similar to the way we derived the mean of the binomial distribution but, since this is a multinomial distribution, we'll be using the multinomial theorem to sum.
$$E(X_1 X_2) = \sum x_1 x_2 f(x_1, x_2) = \sum_{x_1 \neq 0,\; x_2 \neq 0} x_1 x_2 \frac{10!}{x_1 (x_1 - 1)!\, x_2 (x_2 - 1)!\, (10 - x_1 - x_2)!} (.5)^{x_1} (.4)^{x_2} (.1)^{10 - x_1 - x_2}$$
$$= \sum_{x_1 \neq 0,\; x_2 \neq 0} \frac{(10)(9)(8!)}{(x_1 - 1)!\, (x_2 - 1)!\, [(10 - 2) - (x_1 - 1) - (x_2 - 1)]!} (.5)(.5)^{x_1 - 1} (.4)(.4)^{x_2 - 1} (.1)^{(10 - 2) - (x_1 - 1) - (x_2 - 1)}$$
$$= (10)(9)(.5)(.4) \sum_{x_1 \neq 0,\; x_2 \neq 0} \frac{8!}{(x_1 - 1)!\, (x_2 - 1)!\, [8 - (x_1 - 1) - (x_2 - 1)]!} (.5)^{x_1 - 1} (.4)^{x_2 - 1} (.1)^{8 - (x_1 - 1) - (x_2 - 1)}$$
Let $y_1 = x_1 - 1$ and $y_2 = x_2 - 1$ in the sum and we obtain
$$E(X_1 X_2) = (10)(9)(.5)(.4) \sum_{(y_1, y_2)} \frac{8!}{y_1!\, y_2!\, (8 - y_1 - y_2)!} (.5)^{y_1} (.4)^{y_2} (.1)^{8 - y_1 - y_2}$$
$$= 18 (.5 + .4 + .1)^8 = 18$$
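The algebra above can be checked by summing $x_1 x_2 f(x_1, x_2)$ over every possible pair by brute force. The sketch below does this with the standard library only; the function name `f` simply mirrors the notation of the example.

```python
from math import factorial


def f(x1, x2, n=10, p1=0.5, p2=0.4, p3=0.1):
    """Multinomial probability that A wins x1 races and B wins x2 of n."""
    x3 = n - x1 - x2
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1 ** x1 * p2 ** x2 * p3 ** x3


# All pairs (x1, x2) with x1 + x2 <= 10.
pairs = [(x1, x2) for x1 in range(11) for x2 in range(11 - x1)]

total_probability = sum(f(x1, x2) for x1, x2 in pairs)
e_x1x2 = sum(x1 * x2 * f(x1, x2) for x1, x2 in pairs)

print(round(total_probability, 10), round(e_x1x2, 10))
```

The brute-force sum agrees with the closed form $(10)(9)(.5)(.4) = 18$ obtained from the multinomial theorem.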
Property of Multivariate Expectation: It is easily proved (make sure you can do this) that
$$E[a g_1(X, Y) + b g_2(X, Y)] = a E[g_1(X, Y)] + b E[g_2(X, Y)]$$
This can be extended beyond 2 functions $g_1$ and $g_2$, and beyond 2 variables $X$ and $Y$.
Relationships between Variables:
Independence is a "yes/no" way of defining a relationship between variables. We all know that there can be different types of relationships between variables which are dependent. For example, if $X$ is your height in inches and $Y$ your height in centimeters the relationship is one-to-one and linear. More generally, two random variables may be related (non-independent) in a probabilistic sense. For example, a person's weight $Y$ is not an exact linear function of their height $X$, but $Y$ and $X$ are nevertheless related. We'll look at two ways of measuring the strength of the relationship between two random variables. The first is called covariance.
Definition 26 The covariance of $X$ and $Y$, denoted $\text{Cov}(X, Y)$ or $\sigma_{XY}$, is
$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
For calculation purposes this definition is usually harder to use than the formula which follows, which is proved by noting that
$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY - \mu_X Y - X \mu_Y + \mu_X \mu_Y)$$
$$= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y$$
$$= E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y)$$
Therefore $\text{Cov}(X, Y) = E(XY) - E(X)E(Y)$
Example:
In the example with joint probability function

                 x
f(x, y)      0    1    2
y     1      .1   .2   .3
      2      .2   .1   .1

find $\text{Cov}(X, Y)$.
Solution: We previously calculated $E(XY) = 1.4$ and $E(X) = 1.1$. Similarly, $E(Y) = (1 \times .6) + (2 \times .4) = 1.4$.
Therefore $\text{Cov}(X, Y) = 1.4 - (1.1)(1.4) = -.14$
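As a check that the definition and the shortcut formula give the same number, the covariance for this table can be computed both ways. This is an illustrative sketch; the helper name `expect` is hypothetical.

```python
# Joint probability function f(x, y) from the example, keyed by (x, y).
f = {(0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
     (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1}


def expect(g):
    """E[g(X, Y)] for the joint table f."""
    return sum(g(x, y) * prob for (x, y), prob in f.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Definition: E[(X - mu_X)(Y - mu_Y)].
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Shortcut: E(XY) - E(X)E(Y).
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y

print(round(cov_definition, 10), round(cov_shortcut, 10))
```

Both calculations return the same negative value, matching the hand computation.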
Exercise: Calculate the covariance of $X_1$ and $X_2$ for the sprinter example. We have already found that $E(X_1 X_2) = 18$. The marginal distributions of $X_1$ and of $X_2$ are models for which we've already derived the mean. If your solution takes more than a few lines you're missing an easier solution.
Interpretation of Covariance:
(1) Suppose large values of $X$ tend to occur with large values of $Y$ and small values of $X$ with small values of $Y$. Then $(X - \mu_X)$ and $(Y - \mu_Y)$ will tend to be of the same sign, whether positive or negative. Thus $(X - \mu_X)(Y - \mu_Y)$ will tend to be positive. Hence $\text{Cov}(X, Y) > 0$. For example in Figure 8.2 we see several hundred points plotted. Notice that the majority of the points are in the two quadrants (lower left and upper right) labelled with "+" so that for these $(X - \mu_X)(Y - \mu_Y) > 0$. A minority of points are in the other two quadrants labelled "-" and for these $(X - \mu_X)(Y - \mu_Y) < 0$. Moreover the points in the latter two quadrants appear closer to the mean $(\mu_X, \mu_Y)$, indicating that on average, over all points generated, $\text{average}((X - \mu_X)(Y - \mu_Y)) > 0$. Presumably this implies that over the joint distribution of $(X, Y)$, $E[(X - \mu_X)(Y - \mu_Y)] > 0$ or $\text{Cov}(X, Y) > 0$.
Figure 8.2: Random points $(X, Y)$ with covariance 0.5, variances 1. (Scatter plot with axes $x$ and $y$; the quadrants about the mean are labelled "+" and "-".)
For example, if $X$ = person's height and $Y$ = person's weight, then these two random variables will have positive covariance.
(2) Suppose large values of $X$ tend to occur with small values of $Y$ and small values of $X$ with large values of $Y$. Then $(X - \mu_X)$ and $(Y - \mu_Y)$ will tend to be of opposite signs. Thus $(X - \mu_X)(Y - \mu_Y)$ tends to be negative. Hence $\text{Cov}(X, Y) < 0$. For example see Figure 8.3. For example if $X$ = thickness of attic insulation in a house and $Y$ = heating cost for the house, then $\text{Cov}(X, Y) < 0$.
Figure 8.3: Covariance = -0.5, variances = 1
Theorem 27 If $X$ and $Y$ are independent then $\text{Cov}(X, Y) = 0$.
Proof: Recall $E(X - \mu_X) = E(X) - \mu_X = 0$. Let $X$ and $Y$ be independent. Then $f(x, y) = f_1(x) f_2(y)$.
$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{\text{all } y} \left[ \sum_{\text{all } x} (x - \mu_X)(y - \mu_Y) f_1(x) f_2(y) \right]$$
$$= \sum_{\text{all } y} \left[ (y - \mu_Y) f_2(y) \sum_{\text{all } x} (x - \mu_X) f_1(x) \right]$$
$$= \sum_{\text{all } y} \left[ (y - \mu_Y) f_2(y) E(X - \mu_X) \right]$$
$$= \sum_{\text{all } y} 0 = 0$$
The following theorem gives a direct proof of the result above, and is useful in many other situations.
Theorem 28 Suppose random variables $X$ and $Y$ are independent. Then, if $g_1(X)$ and $g_2(Y)$ are any two functions,
$$E[g_1(X) g_2(Y)] = E[g_1(X)] E[g_2(Y)].$$
Proof: Since $X$ and $Y$ are independent, $f(x, y) = f_1(x) f_2(y)$. Thus
$$E[g_1(X) g_2(Y)] = \sum_{\text{all } (x,y)} g_1(x) g_2(y) f(x, y)$$
$$= \sum_{\text{all } x} \sum_{\text{all } y} g_1(x) f_1(x) g_2(y) f_2(y)$$
$$= \left[ \sum_{\text{all } x} g_1(x) f_1(x) \right] \left[ \sum_{\text{all } y} g_2(y) f_2(y) \right]$$
$$= E[g_1(X)] E[g_2(Y)]$$
To prove result (3) above, we just note that if $X$ and $Y$ are independent then
$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(X - \mu_X) E(Y - \mu_Y) = 0$$
Caution: This result is not reversible. If $\text{Cov}(X, Y) = 0$ we can not conclude that $X$ and $Y$ are independent. For example suppose that the random variable $Z$ is uniformly distributed on the values $\{-1, -0.9, \ldots, 0.9, 1\}$ and define $X = \sin(2\pi Z)$ and $Y = \cos(2\pi Z)$. It is easy to see that $\text{Cov}(X, Y) = 0$ but the two random variables $X, Y$ are clearly related because the points $(X, Y)$ are always on a circle.
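The claim in this caution is easy to confirm numerically. The sketch below assumes, as in the reconstruction of the example, that $Z$ takes the 21 equally likely values $-1, -0.9, \ldots, 1$ and that $X = \sin(2\pi Z)$, $Y = \cos(2\pi Z)$; the variable names are illustrative.

```python
from math import sin, cos, pi

# Z is uniform on the 21 values -1, -0.9, ..., 0.9, 1.
z_values = [k / 10 for k in range(-10, 11)]
prob = 1 / len(z_values)

xs = [sin(2 * pi * z) for z in z_values]
ys = [cos(2 * pi * z) for z in z_values]

e_x = sum(x * prob for x in xs)
e_y = sum(y * prob for y in ys)
e_xy = sum(x * y * prob for x, y in zip(xs, ys))

# Covariance via the shortcut formula E(XY) - E(X)E(Y).
cov = e_xy - e_x * e_y

# Every point (X, Y) lies on the unit circle, so X and Y are strongly related.
on_circle = all(abs(x * x + y * y - 1) < 1e-12 for x, y in zip(xs, ys))

print(abs(cov) < 1e-12, on_circle)
```

The covariance vanishes up to rounding error even though knowing $X$ pins $Y$ down to at most two values, which is about as far from independence as one can get.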
Example: Let $(X, Y)$ have the joint probability function $f(0, 0) = 0.2$, $f(1, 1) = 0.6$, $f(2, 0) = 0.2$; i.e. $(X, Y)$ only takes 3 values.

x          0    1    2
f_1(x)     .2   .6   .2

and

y          0    1
f_2(y)     .4   .6

are marginal probability functions. Since $f_1(x) f_2(y) \neq f(x, y)$, $X$ and $Y$ are not independent. However,
$$E(XY) = (0 \times 0 \times .2) + (1 \times 1 \times .6) + (2 \times 0 \times .2) = .6$$
$$E(X) = (0 \times .2) + (1 \times .6) + (2 \times .2) = 1 \quad \text{and} \quad E(Y) = (0 \times .4) + (1 \times .6) = .6$$
Therefore $\text{Cov}(X, Y) = E(XY) - E(X)E(Y) = .6 - (1)(.6) = 0$
So $X$ and $Y$ have covariance 0 but are not independent. If $\text{Cov}(X, Y) = 0$ we say that $X$ and $Y$ are uncorrelated, because of the definition of correlation$^{36}$ given below.

36. "The finest things in life include having a clear grasp of correlations." Albert Einstein, 1919.

(4) The actual numerical value of $\text{Cov}(X, Y)$ has no interpretation, so covariance is of limited use in measuring relationships.
Exercise:
(a) Look back at the example in which $f(x, y)$ was tabulated and $\text{Cov}(X, Y) = -.14$. Considering how covariance is interpreted, does it make sense that $\text{Cov}(X, Y)$ would be negative?
(b) Without looking at the actual covariance for the sprinter exercise, would you expect $\text{Cov}(X_1, X_2)$ to be positive or negative? (If A wins more of the 10 races, will B win more races or fewer races?)
We now consider a second, related way to measure the strength of relationship between $X$ and $Y$.
Definition 29 The correlation coefficient of $X$ and $Y$ is
$$\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
The correlation coefficient measures the strength of the linear relationship between $X$ and $Y$ and is simply a rescaled version of the covariance, scaled to lie in the interval $[-1, 1]$. You can attempt to guess the correlation between two variables based on a scatter diagram of values of these variables at the web page
http://statweb.calpoly.edu/chance/applets/guesscorrelation/GuessCorrelation.html
For example in Figure 8.4 I guessed a correlation of -0.9 whereas the true correlation coefficient generating these data was $\rho = -0.92$.
Properties of $\rho$:
1) Since $\sigma_X$ and $\sigma_Y$, the standard deviations of $X$ and $Y$, are both positive, $\rho$ will have the same sign as $\text{Cov}(X, Y)$. Hence the interpretation of the sign of $\rho$ is the same as for $\text{Cov}(X, Y)$, and $\rho = 0$ if $X$ and $Y$ are independent. When $\rho = 0$ we say that $X$ and $Y$ are uncorrelated.
2) $-1 \leq \rho \leq 1$ and as $\rho \to \pm 1$ the relation between $X$ and $Y$ becomes one-to-one and linear.
Proof: Define a new random variable $S = X + tY$, where $t$ is some real number. We'll show that the fact that $\text{Var}(S) \geq 0$ leads to 2) above. We have
$$\text{Var}(S) = E\{(S - \mu_S)^2\}$$
$$= E\{[(X + tY) - (\mu_X + t\mu_Y)]^2\}$$
$$= E\{[(X - \mu_X) + t(Y - \mu_Y)]^2\}$$
$$= E\{(X - \mu_X)^2 + 2t(X - \mu_X)(Y - \mu_Y) + t^2 (Y - \mu_Y)^2\}$$
$$= \sigma_X^2 + 2t\,\text{Cov}(X, Y) + t^2 \sigma_Y^2$$
Figure 8.4: Guessing the correlation based on a scatter diagram of points
Since $\text{Var}(S) \geq 0$ for any real number $t$, this quadratic in $t$ must have at most one real root (value of $t$ for which it is zero). Therefore
$$(2\,\text{Cov}(X, Y))^2 - 4\sigma_X^2 \sigma_Y^2 \leq 0$$
leading to the inequality
$$\left| \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \right| \leq 1$$
To see that $\rho = \pm 1$ corresponds to a one-to-one linear relationship between $X$ and $Y$, note that $\rho = \pm 1$ corresponds to a zero discriminant in the quadratic equation. This means that there exists one real number $t^*$ for which
$$\text{Var}(S) = \text{Var}(X + t^* Y) = 0$$
But for $\text{Var}(X + t^* Y)$ to be zero, $X + t^* Y$ must equal a constant $c$. Thus $X$ and $Y$ satisfy a linear relationship.
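To see the definition and the bound in action, here is a small sketch that computes $\rho$ for the tabulated joint probability function used in the covariance example earlier in this section, and evaluates the quadratic $\text{Var}(X + tY)$ from the proof at a few values of $t$. The helper names are illustrative and the grid of $t$ values is arbitrary.

```python
from math import sqrt

# Joint probability function from the earlier covariance example.
f = {(0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
     (0, 2): 0.2, (1, 2): 0.1, (2, 2): 0.1}


def expect(g):
    """E[g(X, Y)] for the joint table f."""
    return sum(g(x, y) * prob for (x, y), prob in f.items())


mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))

rho = cov / (sqrt(var_x) * sqrt(var_y))


def var_s(t):
    """Var(X + tY) written as the quadratic in t used in the proof."""
    return var_x + 2 * t * cov + t * t * var_y


# The quadratic is never negative on this (arbitrary) grid of t values.
quadratic_values = [var_s(t / 4) for t in range(-20, 21)]

print(round(rho, 4), min(quadratic_values) >= 0)
```

Because the quadratic never dips below zero its discriminant cannot be positive, which is exactly the inequality $|\rho| \leq 1$.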
Exercise: Calculate $\rho$ for the sprinter example. Does your answer make sense? (You should already have found $\text{Cov}(X_1, X_2)$ in a previous exercise, so little additional work is needed.)
Problems:
8.4.1 The joint probability function of $(X, Y)$ is:

                 x
f(x, y)      0     1     2
y     0      .06   .15   .09
      1      .14   .35   .21

Calculate the correlation coefficient, $\rho$. What does it indicate about the relationship between $X$ and $Y$?
8.4.2 Suppose that $X$ and $Y$ are random variables with joint probability function:

                 x
f(x, y)      2      4      6
y    -1      1/8    1/4    p
      1      1/4    1/8    1/4 - p

(a) For what value of $p$ are $X$ and $Y$ uncorrelated?
(b) Show that there is no value of $p$ for which $X$ and $Y$ are independent.
8.5 Mean and Variance of a Linear Combination of Random Variables
Many problems require us to consider linear combinations of random variables; examples will be given below and in Chapter 9. Although writing down the formulas is somewhat tedious, we give here some important results about their means and variances.
Results for Means:
1. $E(aX + bY) = aE(X) + bE(Y) = a\mu_X + b\mu_Y$, when $a$ and $b$ are constants. (This follows from the definition of expected value.) In particular, $E(X + Y) = \mu_X + \mu_Y$ and $E(X - Y) = \mu_X - \mu_Y$.
2. Let $a_i$ be constants (real numbers) and $E(X_i) = \mu_i$. Then $E\left(\sum a_i X_i\right) = \sum a_i \mu_i$. In particular, $E\left(\sum X_i\right) = \sum E(X_i)$.
3. Let $X_1, X_2, \ldots, X_n$ be random variables which have mean $\mu$. (You can imagine these being some sample results from an experiment such as recording the number of occupants in cars travelling over a toll bridge.) The sample mean is $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$. Then $E(\bar{X}) = \mu$.
It only indicates that there is no linear relationship between X and Y
Proof: From (2), $E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \mu = n\mu$. Thus
$$E\left(\frac{1}{n} \sum X_i\right) = \frac{1}{n} E\left(\sum X_i\right) = \frac{1}{n} n\mu = \mu$$
Results for Covariance:
1. $\text{Cov}(X, X) = E[(X - \mu_X)(X - \mu_X)] = E[(X - \mu)^2] = \text{Var}(X)$
2. $\text{Cov}(aX + bY, cU + dV) = ac\,\text{Cov}(X, U) + ad\,\text{Cov}(X, V) + bc\,\text{Cov}(Y, U) + bd\,\text{Cov}(Y, V)$ where $a$, $b$, $c$, and $d$ are constants.
Proof:
$$\text{Cov}(aX + bY, cU + dV) = E[(aX + bY - a\mu_X - b\mu_Y)(cU + dV - c\mu_U - d\mu_V)]$$
$$= E\{[a(X - \mu_X) + b(Y - \mu_Y)][c(U - \mu_U) + d(V - \mu_V)]\}$$
$$= ac\,E[(X - \mu_X)(U - \mu_U)] + ad\,E[(X - \mu_X)(V - \mu_V)] + bc\,E[(Y - \mu_Y)(U - \mu_U)] + bd\,E[(Y - \mu_Y)(V - \mu_V)]$$
$$= ac\,\text{Cov}(X, U) + ad\,\text{Cov}(X, V) + bc\,\text{Cov}(Y, U) + bd\,\text{Cov}(Y, V)$$
This type of result can be generalized, but gets messy to write out.
Results for Variance:
1. Variance of a linear combination:
$$\text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X, Y)$$
Proof:
$$\text{Var}(aX + bY) = E\left[(aX + bY - a\mu_X - b\mu_Y)^2\right]$$
$$= E\left\{[a(X - \mu_X) + b(Y - \mu_Y)]^2\right\}$$
$$= E\left[a^2 (X - \mu_X)^2 + b^2 (Y - \mu_Y)^2 + 2ab(X - \mu_X)(Y - \mu_Y)\right]$$
$$= a^2 E\left[(X - \mu_X)^2\right] + b^2 E\left[(Y - \mu_Y)^2\right] + 2ab\,E[(X - \mu_X)(Y - \mu_Y)]$$
$$= a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\,\text{Cov}(X, Y)$$
Exercise: Try to prove this result by writing $\text{Var}(aX + bY)$ as $\text{Cov}(aX + bY, aX + bY)$ and using properties of covariance.
2. Variance of a sum of independent random variables: Let $X$ and $Y$ be independent. Since $\text{Cov}(X, Y) = 0$, result 1. gives
$$\text{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2;$$
i.e., for independent variables, the variance of a sum is the sum of the variances. Also note
$$\text{Var}(X - Y) = \sigma_X^2 + (-1)^2 \sigma_Y^2 = \sigma_X^2 + \sigma_Y^2;$$
i.e., for independent variables, the variance of a difference is the sum of the variances.
3. Variance of a general linear combination: Let $a_i$ be constants and $\text{Var}(X_i) = \sigma_i^2$. Then
$$\text{Var}\left(\sum a_i X_i\right) = \sum a_i^2 \sigma_i^2 + 2 \sum_{i<j} a_i a_j\,\text{Cov}(X_i, X_j).$$
This is a generalization of result 1. and can be proved using either of the methods used for 1.
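4. Variance of a linear combination of independent random variables: Special cases of result 3. are:
a) If $X_1, X_2, \ldots, X_n$ are independent then $\text{Cov}(X_i, X_j) = 0$, so that
$$\text{Var}\left(\sum a_i X_i\right) = \sum a_i^2 \sigma_i^2.$$
b) If $X_1, X_2, \ldots, X_n$ are independent and all have the same variance $\sigma^2$, then
$$\text{Var}(\bar{X}) = \sigma^2 / n$$
Proof of 4 (b): $\bar{X} = \frac{1}{n} \sum X_i$. From 4(a), $\text{Var}\left(\sum X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) = n\sigma^2$. Using $\text{Var}(aX + b) = a^2\,\text{Var}(X)$, we get:
$$\text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{n} \sum X_i\right) = \frac{1}{n^2}\,\text{Var}\left(\sum X_i\right) = \frac{n\sigma^2}{n^2} = \sigma^2 / n.$$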
Remark: This result is a very important one in probability and statistics. To recap, it says that if $X_1, \ldots, X_n$ are independent random variables with the same mean $\mu$ and the same variance $\sigma^2$, then the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ has
$$E(\bar{X}) = \mu$$
$$\text{Var}(\bar{X}) = \sigma^2 / n$$
This shows that the average $\bar{X}$ of $n$ random variables with the same distribution is less variable than any single observation $X_i$, and that the larger $n$ is the less variability there is. This explains mathematically why, for example, if we want to estimate the unknown mean height $\mu$ in a population of people, we are better to take the average height for a random sample of $n = 10$ persons than to just take the height of one randomly selected person. A sample of $n = 20$ persons would be better still. There are interesting applets at the url http://users.ece.gatech.edu/users/gtz/java/samplemean/notes.html and http://www.ds.uni.it/VL/VL_EN/applets/BinomialCoinExperiment.html which allow one to sample and explore the rate at which the sample mean approaches the expected value. In Chapter 9 we will see how to decide how large a sample we should take for a certain degree of precision. Also note that as $n \to \infty$, $\text{Var}(\bar{X}) \to 0$, which means that $\bar{X}$ becomes arbitrarily close to $\mu$. This is sometimes called "the law of averages"$^{37}$. There is a formal theorem which supports the claim that for large sample sizes, sample means approach the expected value, called the "law of large numbers".
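The result $\text{Var}(\bar{X}) = \sigma^2 / n$ can be verified exactly in a small case. The sketch below uses rolls of a fair six-sided die as the independent observations (an illustrative choice, not an example from the notes) and enumerates every possible sample of size $n$.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Mean and variance of a single roll of a fair die.
mu = Fraction(sum(faces), 6)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in faces) / 6


def sample_mean_variance(n):
    """Exact variance of the mean of n independent rolls, by enumeration."""
    means = [Fraction(sum(rolls), n) for rolls in product(faces, repeat=n)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


variances = {n: sample_mean_variance(n) for n in (1, 2, 3, 4)}
for n, v in variances.items():
    print(n, v, sigma2 / n)
```

Each enumerated variance equals $\sigma^2 / n$ exactly, so doubling the sample size halves the variance of the sample mean.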
Indicator Variables
The results for linear combinations of random variables provide a way of breaking up more complicated problems, involving mean and variance, into simpler pieces using indicator variables; an indicator variable is just a binary variable (0 or 1) that indicates whether or not some event occurs. We'll illustrate this important method with 3 examples.
Example: Mean and Variance of a Binomial R.V.
Let $X \sim Bi(n, p)$ in a binomial process. Define new variables $X_i$ by:
$X_i = 0$ if the $i$th trial was a failure
$X_i = 1$ if the $i$th trial was a success.
i.e. $X_i$ indicates whether the outcome "success" occurred on the $i$th trial. The trick we use is that the total number of successes, $X$, is the sum of the $X_i$'s:
$$X = \sum_{i=1}^{n} X_i.$$
We can find the mean and variance of $X_i$ and then use our results for the mean and variance of a sum to get the mean and variance of $X$. First,
$$E(X_i) = \sum_{x_i=0}^{1} x_i f(x_i) = 0 f(0) + 1 f(1) = f(1)$$
But $f(1) = p$ since the probability of success is $p$ on each trial. Therefore $E(X_i) = p$. Since $X_i = 0$ or 1, $X_i = X_i^2$, and therefore
$$E(X_i^2) = E(X_i) = p.$$
Thus
$$\text{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p).$$
In the binomial distribution the trials are independent so the $X_i$'s are also independent. Thus
$$E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np$$
$$\text{Var}(X) = \text{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) = \sum_{i=1}^{n} p(1 - p) = np(1 - p)$$
These, of course, are the same as we derived previously for the mean and variance of the binomial distribution. Note how simple the derivation here is!

37. "I feel like a fugitive from the law of averages." William H. Mauldin (1921 - 2003)

Remark: If $X_i$ is a binary random variable with $P(X_i = 1) = p = 1 - P(X_i = 0)$ then $E(X_i) = p$ and $\text{Var}(X_i) = p(1 - p)$, as shown above. (Note that $X_i \sim Bi(1, p)$ is actually a binomial r.v.) In some problems the $X_i$'s are not independent, and then we also need covariances.
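The indicator-variable answers $np$ and $np(1 - p)$ can be compared with a direct calculation from the binomial probability function. The values $n = 12$ and $p = 0.3$ below are arbitrary illustrative choices, and `binomial_pf` is a hypothetical helper name.

```python
from math import comb


def binomial_pf(n, p):
    """Binomial probabilities f(0), ..., f(n) as a list."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


n, p = 12, 0.3  # illustrative values only
pf = binomial_pf(n, p)

# Mean and variance straight from the probability function.
mean_direct = sum(x * fx for x, fx in enumerate(pf))
var_direct = sum(x * x * fx for x, fx in enumerate(pf)) - mean_direct ** 2

# Mean and variance from the indicator-variable argument.
mean_indicator = n * p
var_indicator = n * p * (1 - p)

print(round(mean_direct, 10), round(var_direct, 10))
print(round(mean_indicator, 10), round(var_indicator, 10))
```

The two routes agree to rounding error; the indicator argument simply avoids having to sum the series.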
Example: Let $X$ have a hypergeometric distribution. Find the mean and variance of $X$.
Solution: As above, let us think of the setting, which involves drawing $n$ items at random from a total of $N$, of which $r$ are "$S$" and $N - r$ are "$F$" items. Define
$$X_i = \begin{cases} 0 & \text{if the } i\text{th draw is a failure } (F) \text{ item} \\ 1 & \text{if the } i\text{th draw is a success } (S) \text{ item.} \end{cases}$$
Then $X = \sum_{i=1}^{n} X_i$ as for the binomial example, but now the $X_i$'s are dependent. (For example, what we get on the first draw affects the probabilities of $S$ and $F$ for the second draw, and so on.) Therefore we need to find $\text{Cov}(X_i, X_j)$ for $i \neq j$ as well as $E(X_i)$ and $\text{Var}(X_i)$ in order to use our formula for the variance of a sum.
We see first that $P(X_i = 1) = r/N$ for each of $i = 1, \ldots, n$. (If the draws are random then the probability an $S$ occurs in draw $i$ is just equal to the probability position $i$ is an $S$ when we arrange $r$ $S$'s and $N - r$ $F$'s in a row.) This immediately gives
$$E(X_i) = r/N$$
$$\text{Var}(X_i) = \frac{r}{N}\left(1 - \frac{r}{N}\right)$$
since
$$\text{Var}(X_i) = E(X_i^2) - E(X_i)^2 = E(X_i) - E(X_i)^2.$$
The covariance of $X_i$ and $X_j$ ($i \neq j$) is equal to $E(X_i X_j) - E(X_i)E(X_j)$, so we need
$$E(X_i X_j) = \sum_{x_i=0}^{1} \sum_{x_j=0}^{1} x_i x_j f(x_i, x_j) = f(1, 1) = P(X_i = 1, X_j = 1)$$
The probability of an $S$ on both draws $i$ and $j$ is just
$$r(r - 1)/[N(N - 1)] = P(X_i = 1) P(X_j = 1 | X_i = 1)$$
Thus,
$$\text{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$$
$$= \frac{r(r-1)}{N(N-1)} - \left(\frac{r}{N}\right)\left(\frac{r}{N}\right)$$
$$= \left(\frac{r}{N}\right)\left(\frac{r-1}{N-1} - \frac{r}{N}\right)$$
$$= -\frac{r(N-r)}{N^2 (N-1)}$$
(Does it make sense that $\text{Cov}(X_i, X_j)$ is negative? If you draw a success in draw $i$, are you more or less likely to have a success on draw $j$?) Now we find $E(X)$ and $\text{Var}(X)$. First,

X
A
i

=
a
X
i=1
1(A
i
) =
a
X
i=1

= :

BeforendingVar (A), howmanycombinationsA


i
, A
)
aretherefor whichi < ,? Eachi and, takes
valuesfrom1, 2, , : sothereare

a
2

different combinationsof (i, ,) values. Eachof thesecanonly


bewrittenin1waytomakei < ,. Therefore Thereare

a
2

combinationswithi < , (e.g. if i = 1, 2, 3


and, = 1, 2, 3, thecombinations withi < , are(1,2) (1,3) and(2,3). Sothereare

3
2

= 3 different
combinations.)
Nowwecannd
Var(A) = Var

a
P
i=1
A
i

=
a
P
i=1
Var (A
i
) + 2
P
i<)
Cov (A
i
, A
)
)
= :
v(.v)
.
2
+ 2

a
2

v(.v)
.
2
(.1)
i
= :

v
.

.v
.

h
1
(a1)
(.1)
i
since2

a
2

=
2a(a1)
2
= :(: 1)

= :

v
.

1
v
.

.a
.1
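As a sanity check on the indicator-variable formulas, the mean and variance can also be computed directly from the hypergeometric probability function. The values $N = 20$, $r = 7$, $n = 5$ below are arbitrary illustrative choices, and exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from math import comb

N, r, n = 20, 7, 5  # illustrative values only


def hypergeometric_pf(x):
    """P(X = x): x successes in n draws without replacement."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))


support = range(0, n + 1)
mean_direct = sum(x * hypergeometric_pf(x) for x in support)
var_direct = sum(x * x * hypergeometric_pf(x) for x in support) - mean_direct ** 2

# Formulas obtained from the indicator variables.
mean_formula = Fraction(n * r, N)
var_formula = n * Fraction(r, N) * (1 - Fraction(r, N)) * Fraction(N - n, N - 1)

print(mean_direct, mean_formula)
print(var_direct, var_formula)
```

The factor $(N - n)/(N - 1)$ is what distinguishes this variance from the binomial one with $p = r/N$; it comes entirely from the negative covariances between draws.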

In the last two examples, we know $f(x)$, and could have found $E(X)$ and $\text{Var}(X)$ without using indicator variables. In the next example $f(x)$ is not known and is hard to find, but we can still use indicator variables for obtaining $\mu$ and $\sigma^2$. The following example is a famous problem in probability.
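Example: We have $N$ letters to $N$ different people, and $N$ envelopes addressed to those $N$ people. One letter is put in each envelope at random. Find the mean and variance of the number of letters placed in the right envelope.
Solution:
$$\text{Let } X_i = \begin{cases} 0 & \text{if letter } i \text{ is not in envelope } i \\ 1 & \text{if letter } i \text{ is in envelope } i. \end{cases}$$
Then $\sum_{i=1}^{N} X_i$ is the number of correctly placed letters. Once again, the $X_i$'s are dependent (Why?).
First $E(X_i) = \sum_{x_i=0}^{1} x_i f(x_i) = f(1) = \frac{1}{N} = E(X_i^2)$ (since there is 1 chance in $N$ that letter $i$ will be put in envelope $i$) and then,
$$\text{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = \frac{1}{N} - \frac{1}{N^2} = \frac{1}{N}\left(1 - \frac{1}{N}\right)$$
Exercise: Before calculating $\text{Cov}(X_i, X_j)$, what sign do you expect it to have? (If letter $i$ is correctly placed does that make it more or less likely that letter $j$ will be placed correctly?)
Next, $E(X_i X_j) = f(1, 1)$ (As in the last example, this is the only non-zero term in the sum.) Now, $f(1, 1) = \frac{1}{N} \cdot \frac{1}{N-1}$ since once letter $i$ is correctly placed there is 1 chance in $N - 1$ of letter $j$ going in envelope $j$.
Therefore $E(X_i X_j) = \frac{1}{N(N-1)}$
For the covariance,
$$\text{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \frac{1}{N(N-1)} - \left(\frac{1}{N}\right)\left(\frac{1}{N}\right) = \frac{1}{N}\left(\frac{1}{N-1} - \frac{1}{N}\right) = \frac{1}{N^2 (N-1)}$$
$$E\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} E(X_i) = \sum_{i=1}^{N} \frac{1}{N} = N\left(\frac{1}{N}\right) = 1$$
$$\text{Var}\left(\sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \text{Var}(X_i) + 2 \sum_{i<j} \text{Cov}(X_i, X_j)$$
$$= \sum_{i=1}^{N} \frac{1}{N}\left(1 - \frac{1}{N}\right) + 2\binom{N}{2} \frac{1}{N^2 (N-1)}$$
$$= N \cdot \frac{1}{N}\left(1 - \frac{1}{N}\right) + 2\binom{N}{2} \frac{1}{N^2 (N-1)}$$
$$= 1 - \frac{1}{N} + 2 \cdot \frac{N(N-1)}{2} \cdot \frac{1}{N^2 (N-1)}$$
$$= 1$$
(Common sense often helps in this course, but we have found no way of being able to say this result is obvious. On average 1 letter will be correctly placed and the variance will be 1, regardless of how many letters there are.)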
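Since the result is not obvious, it is reassuring to confirm it by brute force for small $N$. The sketch below lists every way of putting $N$ letters into $N$ envelopes and counts the correctly placed letters; the function name `matches_mean_var` is hypothetical. The check starts at $N = 2$ because the covariance formula above has $N - 1$ in a denominator (with a single letter the count is always 1 and its variance is 0).

```python
from fractions import Fraction
from itertools import permutations


def matches_mean_var(n_letters):
    """Exact mean and variance of the number of correctly placed letters."""
    counts = [sum(1 for i, envelope in enumerate(perm) if i == envelope)
              for perm in permutations(range(n_letters))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    variance = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, variance


results = {n: matches_mean_var(n) for n in range(2, 8)}
for n, (mean, variance) in results.items():
    print(n, mean, variance)
```

For every $N$ in this range both the mean and the variance come out as exactly 1.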
Problems:
8.5.1 The joint probability function of $(X, Y)$ is given by:

                 x
f(x, y)      0     1    2
y     0      .15   .1   .05
      1      .35   .2   .15

Calculate $E(X)$, $\text{Var}(X)$, $\text{Cov}(X, Y)$ and $\text{Var}(3X - 2Y)$. You may use the fact that $E(Y) = .7$ and $\text{Var}(Y) = .21$ without verifying these figures.
8.5.2 In a row of 25 switches, each is considered to be "on" or "off". The probability of being on is .6 for each switch, independently of the other switches. Find the mean and variance of the number of unlike pairs among the 24 pairs of adjacent switches.
8.5.3 Suppose $\text{Var}(X) = 1.69$, $\text{Var}(Y) = 4$, $\rho = 0.5$; and let $U = 2X - Y$. Find the standard deviation of $U$.
8.5.4 Let $Y_0, Y_1, \ldots, Y_n$ be uncorrelated random variables with mean 0 and variance $\sigma^2$. Let $X_1 = Y_0 + Y_1$, $X_2 = Y_1 + Y_2$, $\ldots$, $X_n = Y_{n-1} + Y_n$. Find $\text{Cov}(X_{i-1}, X_i)$ for $i = 2, 3, \ldots, n$ and $\text{Var}\left(\sum_{i=1}^{n} X_i\right)$.
8.5.5 A plastic fabricating company produces items in strips of 24, with the items connected by a thin piece of plastic:
Item 1 - Item 2 - ... - Item 24
A cutting machine then cuts the connecting pieces to separate the items, with the 23 cuts made independently. There is a 10% chance the machine will fail to cut a connecting piece. Find the mean and standard deviation of the number of the 24 items which are completely separate after the cuts have been made. (Hint: Let $X_i = 0$ if item $i$ is not completely separate, and $X_i = 1$ if item $i$ is completely separate.)
8.6 Multivariate Moment Generating Functions$^{38}$
Suppose we have two possibly dependent random variables $(X, Y)$ and we wish to characterize their joint distribution using a moment generating function. Just as the probability function and the cumulative distribution function are, in this case, functions of two arguments, so is the moment generating function.
Definition 30 The joint moment generating function of $(X, Y)$ is
$$M(s, t) = E\{e^{sX + tY}\}$$
Recall that if $X, Y$ happen to be independent, and $g_1(X)$ and $g_2(Y)$ are any two functions,
$$E[g_1(X) g_2(Y)] = E[g_1(X)] E[g_2(Y)]. \qquad (8.9)$$
and so with $g_1(X) = e^{sX}$ and $g_2(Y) = e^{tY}$ we obtain, for independent random variables $X, Y$
$$M(s, t) = M_X(s) M_Y(t)$$
the product of the moment generating functions of $X$ and $Y$ respectively.
There is another labour-saving property of moment generating functions for independent random variables. Suppose $X, Y$ are independent random variables with moment generating functions $M_X(t)$ and $M_Y(t)$. Suppose you wish the moment generating function of the sum $Z = X + Y$. One could attack this problem by first determining the probability function of $Z$,
$$f_Z(z) = P(Z = z) = \sum_{\text{all } x} P(X = x, Y = z - x)$$
$$= \sum_{\text{all } x} P(X = x) P(Y = z - x)$$
$$= \sum_{\text{all } x} f_X(x) f_Y(z - x)$$
and then calculating
$$E(e^{tZ}) = \sum_{\text{all } z} e^{tz} f_Z(z).$$
Evidently lots of work! On the other hand recycling (8.9) with
$$g_1(X) = e^{tX}, \qquad g_2(Y) = e^{tY}$$
gives
$$M_Z(t) = E e^{t(X+Y)} = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t).$$

38. This section is optional for Stat 220 and Stat 230.
Theorem 31 The moment generating function of the sum of independent random variables is the product of the individual moment generating functions.
For example if both $X$ and $Y$ are independent with the same (Bernoulli) distribution

x        0        1
f(x)     1 - p    p

then both have moment generating function
$$M_X(t) = M_Y(t) = (1 - p + pe^t)$$
and so the moment generating function of the sum $Z$ is $M_X(t) M_Y(t) = (1 - p + pe^t)^2$. Similarly if we add another independent Bernoulli the moment generating function is $(1 - p + pe^t)^3$ and in general the moment generating function of the sum of $n$ independent Bernoulli random variables is $(1 - p + pe^t)^n$, the moment generating function of a Binomial$(n, p)$ distribution. This confirms that the sum of independent Bernoulli random variables has a Binomial$(n, p)$ distribution.
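The identity behind this conclusion can be checked numerically: the product form $(1 - p + pe^t)^n$ should equal $E(e^{tX})$ computed term by term from the Binomial$(n, p)$ probability function. The values of $n$, $p$ and $t$ below are arbitrary illustrative choices and the function names are hypothetical.

```python
from math import comb, exp


def bernoulli_sum_mgf(t, n, p):
    """Product of n identical Bernoulli moment generating functions."""
    return (1 - p + p * exp(t)) ** n


def binomial_mgf(t, n, p):
    """E(e^{tX}) computed directly from the Binomial(n, p) probabilities."""
    return sum(exp(t * x) * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))


n, p = 5, 0.3  # illustrative values only
t_values = [-1.0, 0.0, 0.5, 1.0]
differences = [abs(bernoulli_sum_mgf(t, n, p) - binomial_mgf(t, n, p))
               for t in t_values]

print(max(differences))
```

The agreement at every $t$ is just the binomial theorem applied to $(pe^t + (1 - p))^n$.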
8.7 Problems on Chapter 8
8.1 The joint probability function of $(X, Y)$ is given by:

                 x
f(x, y)      0     1    2
y     0      .15   .1   .05
      1      .35   .2   .15

a) Are $X$ and $Y$ independent? Why?
b) Find $P(X > Y)$ and $P(X = 1 | Y = 0)$
8.2 For a person whose car insurance and house insurance are with the same company, let $X$ and $Y$ represent the number of claims on the car and house policies, respectively, in a given year. Suppose that for a certain group of individuals, $X \sim$ Poisson (mean = .10) and $Y \sim$ Poisson (mean = .05).
(a) If $X$ and $Y$ are independent, find $P(X + Y > 1)$ and find the mean and variance of $X + Y$.
(b) Suppose it was learned that $P(X = 0, Y = 0)$ was very close to .94. Show why $X$ and $Y$ cannot be independent in this case. What might explain the non-independence?
8.3 Consider Problem 2.7 for Chapter 2, which concerned machine recognition of handwritten digits. Recall that $p(x, y)$ was the probability that the number actually written was $x$, and the number identified by the machine was $y$.
(a) Are the random variables $X$ and $Y$ independent? Why?
(b) What is $P(X = Y)$, that is, the probability that a random number is correctly identified?
(c) What is the probability that the number 5 is incorrectly identified?
8.4 Blood donors arrive at a clinic and are classified as type A, type O, or other types. Donors' blood types are independent with $P(\text{type A}) = p$, $P(\text{type O}) = q$, and $P(\text{other type}) = 1 - p - q$. Consider the number, $X$, of type A and the number, $Y$, of type O donors arriving before the 10th other type.
a) Find the joint probability function, $f(x, y)$
b) Find the conditional probability function, $f(y|x)$.
8.5 Slot machine payouts. Suppose that in a slot machine there are $n + 1$ possible outcomes $A_1, \ldots, A_{n+1}$ for a single play. A single play costs \$1. If outcome $A_i$ occurs, you win \$$a_i$, for $i = 1, \ldots, n$. If outcome $A_{n+1}$ occurs, you win nothing. In other words, if outcome $A_i$ ($i = 1, \ldots, n$) occurs your net profit is $a_i - 1$; if $A_{n+1}$ occurs your net profit is $-1$.
(a) Give a formula for your expected profit from a single play, if the probabilities of the $n + 1$ outcomes are $p_i = P(A_i)$, $i = 1, \ldots, n + 1$.
(b) The owner of the slot machine wants the player's expected profit to be negative. Suppose $n = 4$, with $p_1 = .1$, $p_2 = p_3 = p_4 = .04$ and $p_5 = .78$. If the slot machine is set to pay \$3 when outcome $A_1$ occurs, and \$5 when either of outcomes $A_2, A_3, A_4$ occur, determine the player's expected profit per play.
(c) The slot machine owner wishes to pay $d a_i$ dollars when outcome $A_i$ occurs, where $a_i = \frac{1}{p_i}$ and $d$ is a number between 0 and 1. The owner also wishes his or her expected profit to be \$.05 per play. (The player's expected profit is $-.05$ per play.) Find $d$ as a function of $n$ and $p_{n+1}$. What is the value of $d$ if $n = 10$ and $p_{n+1} = .7$?
8.6 Bacteria are distributed through river water according to a Poisson process with an average of 5 per 100 c.c. of water. What is the probability five 50 c.c. samples of water have 1 with no bacteria, 2 with one bacterium, and 2 with two or more?
8.7 A box contains 5 yellow and 3 red balls, from which 4 balls are drawn one at a time, at random, without replacement. Let $X$ be the number of yellow balls on the first two draws and $Y$ the number of yellow balls on all 4 draws.
a) Find the joint probability function, $f(x, y)$.
b) Are $X$ and $Y$ independent? Justify your answer.
8.8 In a quality control inspection items are classified as having a minor defect, a major defect, or as being acceptable. A carton of 10 items contains 2 with minor defects, 1 with a major defect, and 7 acceptable. Three items are chosen at random without replacement. Let $X$ be the number selected with minor defects and $Y$ be the number with major defects.
a) Find the joint probability function of $X$ and $Y$.
b) Find the marginal probability functions of $X$ and of $Y$.
c) Evaluate numerically $P(X = Y)$ and $P(X = 1 | Y = 0)$.
8.9 Let $X$ and $Y$ be discrete random variables with joint probability function $f(x, y) = k \frac{2^{x+y}}{x!\,y!}$ for $x = 0, 1, 2, \ldots$ and $y = 0, 1, 2, \ldots$, where $k$ is a positive constant.
a) Derive the marginal probability function of $X$.
b) Evaluate $k$.
c) Are $X$ and $Y$ independent? Explain.
d) Derive the probability function of $T = X + Y$.
8.10 Thinning a Poisson process. SupposethateventsareproducedaccordingtoaPoissonprocess
withanaverageof ` eventsper minute. Eacheventhasaprobabilityj of beingaTypeAevent,
independent of other events.
(a) Let the randomvariable 1 represent the number of Type A events that occur in a one-
minuteperiod. Provethat 1 hasaPoissondistributionwithmean`j. (Hint: let A bethe
total number of events ina1minuteperiodandconsider theformulajust beforethelast
exampleinSection8.1).
(b) Lighting strikes in a large forest region occur over the summer according to a Poisson
processwith` = 3 strikesper day. Eachstrikehasprobability .05of startingare. Find
theprobabilitythat thereareat least 5resover a30dayperiod.
175
8.11 In a breeding experiment involving horses the offspring are of four genetic types with probabilities:
Type          1      2      3      4
Probability   3/16   5/16   5/16   3/16
A group of 40 independent offspring are observed. Give expressions for the following probabilities:
(a) There are 10 of each type.
(b) The total number of types 1 and 2 is 16.
(c) There are exactly 10 of type 1, given that the total number of types 1 and 2 is 16.
8.12 In a particular city, let the random variable \( X \) represent the number of children in a randomly selected household, and let \( Y \) represent the number of female children. Assume that the probability a child is female is 0.5, regardless of what size household they live in, and that the marginal distribution of \( X \) is as follows:
\( f(0) = .20, \; f(1) = .25, \; f(2) = .35, \; f(3) = .10, \; f(4) = .05, \)
\( f(5) = .02, \; f(6) = .01, \; f(7) = .01, \; f(8) = .01 \)
(a) Determine \( E(X) \).
(b) Find the probability function for the number of girls \( Y \) in a randomly chosen family. What is \( E(Y) \)?
8.13 In a particular city, the probability that a call to a fire department concerns various situations is as given below:
1. fire in a detached home - \( p_1 = .10 \)
2. fire in a semi-detached home - \( p_2 = .05 \)
3. fire in an apartment or multiple unit residence - \( p_3 = .05 \)
4. fire in a non-residential building - \( p_4 = .15 \)
5. non-fire-related emergency - \( p_5 = .15 \)
6. false alarm - \( p_6 = .50 \)
In a set of 10 calls, let \( X_1, \ldots, X_6 \) represent the numbers of calls of each of types 1, ..., 6.
(a) Give the joint probability function for \( X_1, \ldots, X_6 \).
(b) What is the probability there is at least one apartment fire, given that there are 4 fire-related calls?
(c) If the average costs of calls of types 1, ..., 6 are (in $100 units) 5, 5, 7, 20, 4, 2 respectively, what is the expected total cost of the 10 calls?
8.14 Suppose \( X_1, \ldots, X_n \) have joint p.f. \( f(x_1, \ldots, x_n) \). If \( g(x_1, \ldots, x_n) \) is a function such that
\( a \le g(x_1, \ldots, x_n) \le b \) for all \( (x_1, \ldots, x_n) \) in the range of \( f \),
then show that \( a \le E[g(X_1, \ldots, X_n)] \le b \).
8.15 Let \( X \) and \( Y \) be random variables with \( \mathrm{Var}(X) = 13 \), \( \mathrm{Var}(Y) = 34 \) and \( \rho = 0.7 \). Find \( \mathrm{Var}(X - 2Y) \).
8.16 Let \( X \) and \( Y \) have a trinomial distribution with joint probability function
\[ f(x, y) = \frac{n!}{x! \, y! \, (n - x - y)!} \, p^x q^y (1 - p - q)^{n - x - y}; \quad x = 0, 1, \ldots, n; \; y = 0, 1, \ldots, n \]
and \( x + y \le n \). Let \( T = X + Y \).
a) What distribution does \( T \) have? Either explain why or derive this result.
b) For the distribution in (a), what is \( E(T) \) and \( \mathrm{Var}(T) \)?
c) Using (b) find \( \mathrm{Cov}(X, Y) \), and explain why you expect it to have the sign it does.
8.17 Jane and Jack each toss a fair coin twice. Let \( X \) be the number of heads Jane obtains and \( Y \) the number of heads Jack obtains. Define \( U = X + Y \) and \( V = X - Y \).
a) Find the means and variances of \( U \) and \( V \).
b) Find \( \mathrm{Cov}(U, V) \).
c) Are \( U \) and \( V \) independent? Why?
8.18 A multiple choice exam has 100 questions, each with 5 possible answers. One mark is awarded for a correct answer and 1/4 mark is deducted for an incorrect answer. A particular student has probability \( p_i \) of knowing the correct answer to the \( i \)th question, independently of other questions.
a) Suppose that on a question where the student does not know the answer, he or she guesses randomly. Show that his or her total mark has mean \( \sum p_i \) and variance \( \sum p_i (1 - p_i) + \frac{100 - \sum p_i}{4} \).
b) Show that the total mark for a student who refrains from guessing also has mean \( \sum p_i \), but with variance \( \sum p_i (1 - p_i) \). Compare the variances when all \( p_i \)'s equal (i) .9, (ii) .5.
8.19 Let \( X \) and \( Y \) be independent random variables with \( E(X) = E(Y) = 0 \), \( \mathrm{Var}(X) = 1 \) and \( \mathrm{Var}(Y) = 2 \). Find \( \mathrm{Cov}(X + Y, X - Y) \).
8.20 An automobile driveshaft is assembled by placing parts A, B and C end to end in a straight line. The standard deviations of the lengths of parts A, B and C are 0.6, 0.8, and 0.7 respectively.
(a) Find the standard deviation of the length of the assembled driveshaft.
(b) What percent reduction would there be in the standard deviation of the assembled driveshaft if the standard deviation of the length of part B were cut in half?
8.21 The inhabitants of the beautiful and ancient canal city of Pentapolis live on 5 islands separated from each other by water. Bridges cross from one island to another as shown.
[Figure: map of the five islands, labelled 1 to 5, and the bridges joining them]
On any day, a bridge can be closed, with probability \( p \), for restoration work. Assuming that the 8 bridges are closed independently, find the mean and variance of the number of islands which are completely cut off because of restoration work.
8.22 A Markov chain has a doubly stochastic transition matrix if both the row sums and the column sums of the transition matrix \( P \) are all 1. Show that for such a Markov chain, the uniform distribution on \( \{1, 2, \ldots, N\} \) is a stationary distribution.
8.23 A salesman sells in three cities A, B, and C. He never sells in the same city on successive weeks. If he sells in city A, then the next week he always sells in B. However if he sells in either B or C, then the next week he is twice as likely to sell in city A as in the other city. What is the long-run proportion of time he spends in each of the three cities?
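8.24 Find \( \lim_{n \to \infty} P^n \) where
\[ P = \begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{6} & \frac{1}{2} & \frac{1}{3} \\ 0 & \frac{2}{3} & \frac{1}{3} \end{pmatrix} \]
8.25 Suppose \( X \) and \( Y \) are independent having Poisson distributions with parameters \( \lambda_1 \) and \( \lambda_2 \) respectively. Use moment generating functions to identify the distribution of the sum \( X + Y \).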
8.26 Waterloo in January is blessed by many things, but not by good weather. There are never two nice days in a row. If there is a nice day, we are just as likely to have snow as rain the next day. If we have snow or rain, there is an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this a change to a nice day. Taking as states the kinds of weather R, N, and S, the transition probabilities \( P \) are as follows:
\[ P = \begin{array}{c|ccc} & R & N & S \\ \hline R & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ N & \frac{1}{2} & 0 & \frac{1}{2} \\ S & \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \end{array} \]
If today is raining, find the probability of Rain, Nice, Snow three days from now. Find the probabilities of the three states in five days, given (i) today is raining (ii) today is nice (iii) today is snowing.
8.27 (One-card Poker) A card game, which, for the purposes of this question we will call Metzler Poker, is played as follows. Each of 2 players bets an initial $1 and is dealt a card from a deck of 13 cards numbered 1-13. Upon looking at their card, each player then decides (unaware of the other's decision) whether or not to increase their bet by $5 (to a total stake of $6). If both increase the stake ("raise"), then the player with the higher card wins both stakes, i.e. they get their money back as well as the other player's $6. If one person increases and the other does not, then the player who increases automatically wins the pot (i.e. money back + $1). If neither person increases the stake, then it is considered a draw and each player receives their own $1 back. Suppose that Players A and B have similar strategies, based on threshold numbers {a, b} they have chosen between 1 and 13. A chooses to raise whenever their card is greater than or equal to a, and B whenever B's card is greater than or equal to b.
(a) Suppose B always raises (so that b = 1). What is the expected value of A's win or loss for the different possible values of a = 1, 2, ..., 13?
(b) Suppose a and b are arbitrary. Given that both players raise, what is the probability that A wins? What is the expected value of A's win or loss?
(c) Suppose you know that b = 11. Find your expected win or loss for various values of a and determine the optimal value. How much do you expect to make or lose per game under this optimal strategy?
8.28 (Searching a database) Suppose that we are given 3 records, \( R_1, R_2, R_3 \), initially stored in that order. The cost of accessing the \( j \)th record in the list is \( j \), so we would like the more frequently accessed records near the front of the list. Whenever a request for record \( j \) is processed, the move-to-front heuristic stores \( R_j \) at the front of the list and the others in the original order. For example if the first request is for record 2, then the records will be re-stored in the order \( R_2, R_1, R_3 \). Assume that on each request, record \( j \) is requested with probability \( p_j \), for \( j = 1, 2, 3 \).
(a) Show that if \( X_j \) is the permutation that obtains after \( j \) requests for records (e.g. \( X_2 = (2, 1, 3) \)), then \( X_j \), \( j = 1, 2, \ldots \) is a Markov chain.
(b) Find the stationary distribution of this Markov chain. (Hint: what is the probability that \( X_j \) takes the form \( (2, *, *) \) for large \( j \)?)
(c) Find the expected long-run cost per record accessed in the case \( p_1, p_2, p_3 = 0.1, 0.3, 0.6 \) respectively.
(d) How does this expected long-run cost compare with keeping the records in random order, and with keeping them in order of decreasing values of \( p_j \) (only possible if we know \( p_j \))?
8.29 (Secretary Problem) Suppose you are to interview \( N \) candidates for a job, one at a time. You must decide immediately after each interview whether to hire the current candidate or not, and you wish to maximize your chances of choosing the best person for the job (there is no benefit from choosing the second or third best). For simplicity, assume candidate \( i \) has numerical value \( X_i \) chosen without replacement from \( \{1, 2, \ldots, N\} \) where 1 = worst, \( N \) = best. Our strategy is to interview \( k \) candidates first, and then pick the first of the remaining \( N - k \) that has value greater than \( \max(X_1, X_2, \ldots, X_k) \). What is the best choice of \( k \)? (Hint: you may use the approximation \( \sum_{j=1}^{n-1} \frac{1}{j} \approx \ln(n) \).) For this choice, what is the approximate probability that you do choose the maximum?
8.30 Three stocks are assumed to have returns over the next year \( X_1, X_2, X_3 \) which have the same expected value \( E(X_i) = 0.08 \), \( i = 1, 2, 3 \), and variances \( \mathrm{Var}(X_1) = (0.2)^2 \), \( \mathrm{Var}(X_2) = (0.3)^2 \), \( \mathrm{Var}(X_3) = (0.4)^2 \). Assuming that the returns are independent, find portfolio weights \( w_1, w_2, w_3 \) so that the linear combination
\[ w_1 X_1 + w_2 X_2 + w_3 X_3 \]
has the smallest variance among all such linear combinations subject to \( w_1 + w_2 + w_3 = 1 \).
8.31 Challenge problem: A drunken probabilist stands \( n \) steps from a cliff's edge. He takes random steps, either towards or away from the cliff, each step independent of the past. At any point, the probability of taking a step away is 2/3, or a step toward, 1/3. What are his chances of escaping the cliff?
9. Continuous Probability Distributions
9.1 General Terminology and Notation
Continuous random variables have a range (set of possible values) that is an interval (or a collection of intervals) on the real number line. They have to be treated a little differently than discrete random variables because \( P(X = x) \) is zero for each \( x \). To illustrate a random variable with a continuous distribution, consider the simple spinning pointer in Figure 9.1, and suppose that all numbers in the interval (0,4] are equally likely.
[Figure 9.1: Spinner: a device for generating a continuous random variable (in a zero-gravity, virtually frictionless environment)]
The probability of the pointer stopping precisely at any given number \( x \) must be zero, because if each number has the same probability \( p > 0 \), then the probability of \( R = \{x : 0 < x \le 4\} \) is the sum \( \sum_{x \in (0,4]} p = \infty \), since the set \( R \) is uncountably infinite. For a continuous random variable the probability of each point is 0 and probability functions cannot be used to describe a distribution. On the other hand, intervals of the same length entirely contained in (0,4], for example the intervals \( (0, \frac{1}{4}] \) and \( (1\frac{3}{4}, 2] \), all have the same probability (1/16 in this case). For continuous random variables we specify the probability of intervals, rather than individual points.
[Figure 9.2]
Consider another example produced by choosing a random point in a region. Suppose we plot a graph of a function \( f(x) \) as in Figure 9.2 (assume the function is positive and has finite integral) and then generate a point at random by closing our eyes and firing a dart from a distance until at least one lands in the shaded region under the graph. We assume such a point, here denoted "*", is uniformly distributed under the graph. This means that the point is equally likely to fall in any one of many possible regions of a given area located in the shaded region, so we only need to know the area of a region to determine the probability that a point falls in it. Consider the x-coordinate \( X \) of the point "*" as our random variable (in Figure 9.2 it appears to be around 0.4). Notice that the probability that \( X \) falls in a particular interval \( (a, b) \) is measured by the area of the region above this interval, i.e. \( \int_a^b f(x)\,dx \), and so the probability of any particular point \( P(X = a) \) is the area of the region immediately above this single point, \( \int_a^a f(x)\,dx = 0 \). This is another example of a random variable \( X \) which has a continuous distribution. For continuous \( X \), there are two commonly used functions which describe its distribution. The first is the cumulative distribution function, used before for discrete distributions, and the second is the probability density function, the derivative of the c.d.f.
Cumulative Distribution Function:
For discrete random variables we defined the c.d.f., \( F(x) = P(X \le x) \); the same definition is used for continuous random variables. For the spinner, the probability the pointer stops between 0 and 1 is 1/4 if all values \( x \) are equally likely; between 0 and 2 the probability is 1/2, between 0 and 3 it is 3/4; and so on. In general, \( F(x) = x/4 \) for \( 0 < x \le 4 \). Also, \( F(x) = 0 \) for \( x \le 0 \) since there is no chance of the pointer stopping at a number \( \le 0 \), and \( F(x) = 1 \) for \( x > 4 \) since the pointer is certain to stop at a number below \( x \) if \( x > 4 \). In our second example, in which we generated a point at random under the graph of a function \( f(x) \), if we assume that the total area under the graph is one, the cumulative distribution function \( F(x) \) is the area under the graph to the left of the point \( x \), as in Figure 9.3.
[Figure 9.3]
Most properties of a c.d.f. are the same for continuous variables as for discrete variables. These are:
1. \( F(-\infty) = 0 \) and \( F(\infty) = 1 \)
2. \( F(x) \) is a non-decreasing function of \( x \)
3. \( P(a < X \le b) = F(b) - F(a) \).
Note that, as indicated before, for a continuous distribution, we have
\[ 0 = P(X = a) = \lim_{\varepsilon \to 0} P(a - \varepsilon < X \le a) = \lim_{\varepsilon \to 0} \left[ F(a) - F(a - \varepsilon) \right]. \]
This means that \( \lim_{\varepsilon \to 0} F(a - \varepsilon) = F(a) \), or that the continuous distribution function is a continuous function (in the sense of continuity in calculus). Also, since the probability is 0 at each point:
\[ P(a < X < b) = P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = F(b) - F(a) \]
(For a discrete random variable, each of these 4 probabilities could be different.) For the continuous distributions in this chapter, we do not worry about whether intervals are open, closed, or half-open since the probability of these intervals is the same.
Probability Density Function (p.d.f.): While the c.d.f. can be used to find probabilities, it does not give an intuitive picture of which values of \( x \) are more likely, and which are less likely. To develop such a picture suppose that we take a short interval of \( X \)-values, \( [x, x + \Delta x] \). The probability \( X \) lies in the interval is
\[ P(x \le X \le x + \Delta x) = F(x + \Delta x) - F(x). \]
To compare the probabilities for two intervals, each of length \( \Delta x \), is easy. Now suppose we consider what happens as \( \Delta x \) becomes small, and we divide the probability by \( \Delta x \). This leads to the following definition.
Definition 32 The probability density function (p.d.f.) \( f(x) \) for a continuous random variable \( X \) is the derivative
\[ f(x) = \frac{dF(x)}{dx} \]
where \( F(x) \) is the c.d.f. for \( X \).
Notice that if the function \( f(x) \) graphed in Figure 9.3 has total integral one, the c.d.f., or the area to the left of a point \( x \), is given by \( F(x) = \int_{-\infty}^{x} f(z)\,dz \), and so the derivative of the c.d.f. is \( F'(x) = f(x) \). It is clear from the way in which \( X \) was generated that \( f(x) \) represents the relative likelihood of (small intervals around) different \( x \)-values. To see this in general we first note some properties of a p.d.f. It is assumed that \( f(x) \) is a continuous function of \( x \) at all points for which \( 0 < F(x) < 1 \).
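Properties of a probability density function
1. \( P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx \). (This follows from the definition of \( f(x) \).)
2. \( f(x) \ge 0 \). (Since \( F(x) \) is non-decreasing, its derivative is non-negative.)
3. \( \int_{-\infty}^{\infty} f(x)\,dx = \int_{\text{all } x} f(x)\,dx = 1 \). (This is because \( P(-\infty \le X \le \infty) = 1 \).)
4. \( F(x) = \int_{-\infty}^{x} f(u)\,du \). (This is just property 1 with \( a = -\infty \).)
To see that \( f(x) \) represents the relative likelihood of different outcomes, we note that for \( \Delta x \) small,
\[ P\left(x - \frac{\Delta x}{2} \le X \le x + \frac{\Delta x}{2}\right) = F\left(x + \frac{\Delta x}{2}\right) - F\left(x - \frac{\Delta x}{2}\right) \approx f(x)\,\Delta x. \]
Thus, \( f(x) \ne P(X = x) \), but \( f(x)\,\Delta x \) is the approximate probability that \( X \) is inside an interval of length \( \Delta x \) centered about the value \( x \) when \( \Delta x \) is small. A plot of the function \( f(x) \) shows such values clearly, and for this reason it is very common to plot the probability density functions of continuous random variables.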
Example: Consider the spinner example, where
\[ F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \frac{x}{4} & \text{for } 0 < x \le 4 \\ 1 & \text{for } x > 4 \end{cases} \]
Thus, the p.d.f. is \( f(x) = F'(x) \), or
\[ f(x) = \frac{1}{4} \quad \text{for } 0 < x < 4, \]
and outside this interval the p.d.f. is 0. Figure 9.4 shows the probability density function \( f(x) \); for obvious reasons this is called a uniform distribution.
[Figure 9.4: Uniform p.d.f.]
Remark: Continuous probability distributions are, like discrete distributions, mathematical models.[39] Thus, the uniform distribution assumed for the spinner above is a model, though it seems likely it would be a good model for many real spinners.
[39] "How can it be that mathematics, being after all a product of human thought which is independent of experience, is so admirably appropriate to the objects of reality? Is human reason, then, without experience, merely by taking thought, able to fathom the properties of real things?" Albert Einstein.
Remark: It may seem paradoxical that \( P(X = x) = 0 \) for a continuous r.v. and yet we record the outcomes \( X = x \) in real "experiments" with continuous variables. The catch is that all measurements have finite precision; they are in effect discrete. For example, the height \( 60 + \pi \) inches is within the range of the height \( X \) of people in a population, but we could never observe the outcome \( X = 60 + \pi \) if we selected a person at random and measured their height.
To summarize, in measurements we are actually observing something like
\[ P(x - 0.5\Delta \le X \le x + 0.5\Delta) \]
where \( \Delta \) may be very small, but not zero. The probability of this outcome is not zero: it is (approximately) \( f(x)\Delta \).
We now consider a more complicated mathematical example of a continuous random variable. Then we'll consider real problems that involve continuous variables. Remember that it is always a good idea to sketch or plot the p.d.f. \( f(x) \) for a random variable.
Example:
Let
\[ f(x) = \begin{cases} kx^2; & 0 < x \le 1 \\ k(2 - x); & 1 < x < 2 \\ 0; & \text{otherwise} \end{cases} \]
be a p.d.f.
Find
a) \( k \)
b) \( F(x) \)
c) \( P\left(\frac{1}{2} < X < 1\frac{1}{2}\right) \)
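Solution:
a) Set \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \) to solve for \( k \). When finding the area of a region bounded by different functions we split the integral into pieces. (We normally wouldn't even write down the parts with \( \int 0\,dx \).)
[Figure: graph of f(x) against x for x between 0 and 2]
\[ \begin{aligned} 1 &= \int_{-\infty}^{\infty} f(x)\,dx \\ &= \int_{-\infty}^{0} 0\,dx + \int_0^1 kx^2\,dx + \int_1^2 k(2 - x)\,dx + \int_2^{\infty} 0\,dx \\ &= 0 + k\int_0^1 x^2\,dx + k\int_1^2 (2 - x)\,dx + 0 \\ &= k\left[\frac{x^3}{3}\right]_0^1 + k\left[2x - \frac{x^2}{2}\right]_1^2 \\ &= \frac{5k}{6} \end{aligned} \]
Therefore \( k = \frac{6}{5} \).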
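b) Doing the easy pieces, which are often left out, first:
\( F(x) = 0 \) if \( x \le 0 \)
and \( F(x) = 1 \) if \( x \ge 2 \) (since all probability is below \( x \) if \( x \) is a number above 2).
For \( 0 < x < 1 \),
\[ P(X \le x) = \int_0^x \frac{6}{5} z^2\,dz = \frac{6}{5}\left[\frac{z^3}{3}\right]_0^x = \frac{2x^3}{5} \]
For \( 1 < x < 2 \) (the area under \( f \) to the left of \( x \)),
\[ \begin{aligned} P(X \le x) &= \int_0^1 \frac{6}{5} z^2\,dz + \int_1^x \frac{6}{5}(2 - z)\,dz \\ &= \frac{6}{5}\left[\frac{z^3}{3}\right]_0^1 + \frac{6}{5}\left[2z - \frac{z^2}{2}\right]_1^x = \frac{12x - 3x^2 - 7}{5} \end{aligned} \]
[Figure: graph of f(x) with the area F(x) to the left of x shaded]
i.e.
\[ F(x) = \begin{cases} 0; & x \le 0 \\ 2x^3/5; & 0 < x \le 1 \\ \frac{12x - 3x^2 - 7}{5}; & 1 < x < 2 \\ 1; & x \ge 2 \end{cases} \]
As a rough check, since for a continuous distribution there is no probability at any point, \( F(x) \) should have the same value as we approach each boundary point from above and from below, e.g.
As \( x \to 0^+ \), \( \frac{2x^3}{5} \to 0 \)
As \( x \to 1^- \), \( \frac{2x^3}{5} \to \frac{2}{5} \)
As \( x \to 1^+ \), \( \frac{12x - 3x^2 - 7}{5} \to \frac{2}{5} \)
As \( x \to 2^- \), \( \frac{12x - 3x^2 - 7}{5} \to 1 \)
This quick check won't prove your answer is right, but will detect many careless errors.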
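c)
\[ P\left(\frac{1}{2} < X < 1\frac{1}{2}\right) = \int_{1/2}^{3/2} f(x)\,dx \]
or \( F\left(1\frac{1}{2}\right) - F\left(\frac{1}{2}\right) \) (easier)
\[ = \frac{12\left(\frac{3}{2}\right) - 3\left(\frac{3}{2}\right)^2 - 7}{5} - \frac{2\left(\frac{1}{2}\right)^3}{5} = \frac{4}{5} \]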
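The three parts of this example can be checked numerically. The sketch below is only an illustration and not part of the original solution: the helper names `f`, `F` and `integrate` are made up for this check, and a simple midpoint rule stands in for exact integration.

```python
def f(x, k=6 / 5):
    """Piecewise p.d.f. from the example, using k = 6/5 from part (a)."""
    if 0 < x <= 1:
        return k * x ** 2
    if 1 < x < 2:
        return k * (2 - x)
    return 0.0


def F(x):
    """Closed-form c.d.f. from part (b)."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return 2 * x ** 3 / 5
    if x < 2:
        return (12 * x - 3 * x ** 2 - 7) / 5
    return 1.0


def integrate(g, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


total_area = integrate(f, 0, 2)       # should be close to 1
prob_direct = integrate(f, 0.5, 1.5)  # part (c) by integrating f
prob_cdf = F(1.5) - F(0.5)            # part (c) using the c.d.f.
print(total_area, prob_direct, prob_cdf)
```

Both routes to part (c) should agree with 4/5 up to the error of the numerical rule.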
Defined Variables or Change of Variable:
When we know the p.d.f. or c.d.f. for a continuous random variable \( X \) we sometimes want to find the p.d.f. or c.d.f. for some other random variable \( Y \) which is a function of \( X \). The procedure for doing this is summarized below. It is based on the fact that the c.d.f. \( F_Y(y) \) for \( Y \) equals \( P(Y \le y) \), and this can be rewritten in terms of \( X \) since \( Y \) is a function of \( X \). Thus:
1) Write the c.d.f. of \( Y \) as a function of \( X \).
2) Use \( F_X(x) \) to find \( F_Y(y) \). Then if you want the p.d.f. \( f_Y(y) \), you can differentiate the expression for \( F_Y(y) \).
3) Find the range of values of \( y \).
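Example: In the earlier spinner example,
\[ f(x) = \frac{1}{4}; \quad 0 < x \le 4 \]
and
\[ F(x) = \frac{x}{4}; \quad 0 < x \le 4 \]
Let \( Y = 1/X \). Find \( f_Y(y) \).
Solution:
\[ \begin{aligned} F_Y(y) = P(Y \le y) &= P\left(\frac{1}{X} \le y\right) = P\left(X \ge \frac{1}{y}\right) \\ &= 1 - P(X < 1/y) \\ &= 1 - F_X(1/y) \quad \text{(this completes step (1))} \end{aligned} \]
For step (2), we can do either:
\[ F_Y(y) = 1 - \frac{1/y}{4} \quad \left(\text{substituting } \frac{1}{y} \text{ for } x \text{ in } F_X(x)\right) \]
\[ = 1 - \frac{1}{4y} \]
Therefore
\[ f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{4y^2}; \quad \frac{1}{4} \le y < \infty \]
(As \( x \) goes from 0 to 4, \( y = \frac{1}{x} \) goes between \( \infty \) and \( \frac{1}{4} \).)
or:
\[ \begin{aligned} f_Y(y) = \frac{d}{dy} F_Y(y) &= \frac{d}{dy}\left(1 - F_X(1/y)\right) \\ &= -\frac{d}{dy} F_X(1/y) = -\left.\frac{d}{dx} F_X(x)\right|_{x = 1/y} \frac{dx}{dy} \quad \text{(chain rule)} \\ &= -f_X(1/y)\left(-\frac{1}{y^2}\right) = \frac{1}{4}\left(\frac{1}{y^2}\right) = \frac{1}{4y^2}; \quad \frac{1}{4} \le y < \infty \end{aligned} \]
Generally if \( F_X(x) \) is known it is easier to substitute first, then differentiate. If \( F_X(x) \) is in the form of an integral that can't be solved, it is usually easier to differentiate first, then substitute \( f_X(x) \).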
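A quick simulation can make the change of variable concrete. The following sketch is an illustrative check only: the function names are invented here, and `4 * (1 - random())` is used as an assumed way to draw the spinner value on (0, 4] so that division by zero cannot occur. It compares the empirical c.d.f. of Y = 1/X with the derived formula 1 - 1/(4y).

```python
import random


def empirical_cdf_of_reciprocal(y, n=200_000, seed=1):
    """Estimate P(1/X <= y) by simulation, with X uniform on (0, 4]."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        x = 4.0 * (1.0 - rng.random())  # rng.random() is in [0, 1), so x is in (0, 4]
        if 1.0 / x <= y:
            count += 1
    return count / n


def cdf_y(y):
    """Derived c.d.f. of Y = 1/X: 1 - 1/(4y) for y >= 1/4, and 0 below 1/4."""
    return 1.0 - 1.0 / (4.0 * y) if y >= 0.25 else 0.0


for y in (0.5, 1.0, 2.0):
    print(y, round(empirical_cdf_of_reciprocal(y), 3), cdf_y(y))
```

The simulated and derived values should agree to about two decimal places for this sample size.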
Extension of Expectation, Mean, and Variance to Continuous Distributions
Definition 33 When \( X \) is continuous, we still define
\[ E(g(X)) = \int_{\text{all } x} g(x) f(x)\,dx. \]
With this definition, all of the earlier properties of expected value and variance still hold; for example with \( \mu = E(X) \),
\[ \sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) - \mu^2. \]
(This definition can be justified by writing \( \int_{\text{all } x} g(x) f(x)\,dx \) as a limit of a Riemann sum and recognizing the Riemann sum as being in the form of an expected value for discrete random variables.)
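Example: In the spinner example with \( f(x) = \frac{1}{4}; \; 0 < x \le 4 \),
\[ \mu = \int_0^4 x \frac{1}{4}\,dx = \frac{1}{4}\left[\frac{x^2}{2}\right]_0^4 = 2 \]
\[ E(X^2) = \int_0^4 x^2 \frac{1}{4}\,dx = \frac{1}{4}\left[\frac{x^3}{3}\right]_0^4 = \frac{16}{3} \]
\[ \sigma^2 = E(X^2) - \mu^2 = \frac{16}{3} - 4 = 4/3 \]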
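Example: Let \( X \) have p.d.f.
\[ f(x) = \begin{cases} \frac{6x^2}{5}; & 0 < x \le 1 \\ \frac{6}{5}(2 - x); & 1 < x < 2 \\ 0; & \text{otherwise} \end{cases} \]
Then
\[ \begin{aligned} \mu = \int_{\text{all } x} x f(x)\,dx &= \int_0^1 x \frac{6}{5} x^2\,dx + \int_1^2 x \frac{6}{5}(2 - x)\,dx \quad \text{(splitting the integral)} \\ &= \frac{6}{5}\left\{ \left[\frac{x^4}{4}\right]_0^1 + \left[x^2 - \frac{x^3}{3}\right]_1^2 \right\} = 11/10 \text{ or } 1.1 \end{aligned} \]
\[ \begin{aligned} E(X^2) &= \int_0^1 x^2 \frac{6}{5} x^2\,dx + \int_1^2 x^2 \frac{6}{5}(2 - x)\,dx \\ &= \frac{6}{5}\left\{ \left[\frac{x^5}{5}\right]_0^1 + 2\left[\frac{x^3}{3}\right]_1^2 - \left[\frac{x^4}{4}\right]_1^2 \right\} = \frac{67}{50} \end{aligned} \]
\[ \sigma^2 = E(X^2) - \mu^2 = \frac{67}{50} - \left(\frac{11}{10}\right)^2 = \frac{13}{100} \text{ or } 0.13 \]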
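Because each piece of this p.d.f. is a polynomial, the moments can be reproduced with exact rational arithmetic. The sketch below is one possible check, not the notes' own method; the representation `PIECES` (interval endpoints plus a map from power to coefficient) and the function `moment` are assumptions made for this illustration.

```python
from fractions import Fraction

# Each piece: (lower limit, upper limit, {power of x: coefficient}).
# 6/5 * x^2 on (0, 1] and 6/5 * (2 - x) = 12/5 - 6/5 * x on (1, 2).
PIECES = [
    (0, 1, {2: Fraction(6, 5)}),
    (1, 2, {0: Fraction(12, 5), 1: Fraction(-6, 5)}),
]


def moment(m):
    """Return E(X^m), integrating x^m * f(x) term by term over each piece."""
    total = Fraction(0)
    for lo, hi, coeffs in PIECES:
        for power, c in coeffs.items():
            n = power + m + 1  # antiderivative of x^(power + m) is x^n / n
            total += c * (Fraction(hi) ** n - Fraction(lo) ** n) / n
    return total


mean = moment(1)
second_moment = moment(2)
variance = second_moment - mean ** 2
print(moment(0), mean, second_moment, variance)
```

Here `moment(0)` is the total area under the p.d.f., so it doubles as a check that the density integrates to one.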
Problems:
9.1.1 Let \( X \) have p.d.f.
\[ f(x) = \begin{cases} kx^2; & -1 < x < 1 \\ 0; & \text{otherwise} \end{cases} \]
Find
a) \( k \)
b) the c.d.f., \( F(x) \)
c) \( P(-.1 < X < .2) \)
d) the mean and variance of \( X \).
e) let \( Y = X^2 \). Derive the p.d.f. of \( Y \).
9.1.2 A continuous distribution has c.d.f. \( F(x) = \frac{kx^n}{1 + x^n} \) for \( x \ge 0 \), where \( n \) is a positive constant.
(a) Evaluate \( k \).
(b) Find the p.d.f., \( f(x) \).
(c) What is the median of this distribution? (The median is the value of \( x \) such that half the time we get a value below it and half the time above it.)
9.2 Continuous Uniform Distribution
Just as we did for discrete random variables, we now consider some special types of continuous probability distributions. These distributions arise in certain settings, described below. This section considers what we call uniform distributions.
Physical Setup:
Suppose \( X \) takes values in some interval [a,b] (it doesn't actually matter whether the interval is open or closed) with all subintervals of a fixed length being equally likely. Then \( X \) has a continuous uniform distribution. We write \( X \sim U[a, b] \).
Illustrations:
(1) In the spinner example \( X \sim U(0, 4] \).
(2) Computers can generate a random number \( X \) which appears as though it is drawn from the distribution \( U(0, 1) \). This is the starting point for many computer simulations of random processes; an example is given below.
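The probability density function and the cumulative distribution function:
Since all points are equally likely (more precisely, intervals contained in \( [a, b] \) of a given length, say 0.01, all have the same probability), the probability density function must be a constant \( f(x) = k; \; a \le x \le b \) for some constant \( k \). To make \( \int_a^b f(x)\,dx = 1 \), we require \( k = \frac{1}{b - a} \).
Therefore
\[ f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b \]
\[ F(x) = \begin{cases} 0 & \text{for } x < a \\ \int_a^x \frac{1}{b - a}\,du = \frac{x - a}{b - a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases} \]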
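Mean and Variance:
\[ \mu = \int_a^b x \frac{1}{b - a}\,dx = \frac{1}{b - a}\left[\frac{x^2}{2}\right]_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{(b - a)(b + a)}{2(b - a)} = \frac{b + a}{2} \]
\[ E(X^2) = \int_a^b x^2 \frac{1}{b - a}\,dx = \frac{1}{b - a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{(b - a)(b^2 + ab + a^2)}{3(b - a)} = \frac{b^2 + ab + a^2}{3} \]
\[ \begin{aligned} \sigma^2 = E(X^2) - \mu^2 &= \frac{b^2 + ab + a^2}{3} - \left(\frac{b + a}{2}\right)^2 \\ &= \frac{4b^2 + 4ab + 4a^2 - 3b^2 - 6ab - 3a^2}{12} \\ &= \frac{b^2 - 2ab + a^2}{12} = \frac{(b - a)^2}{12} \end{aligned} \]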
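As a sanity check on these formulas, the mean and variance of a uniform distribution can be approximated by numerical integration. This is a minimal sketch with an invented helper `uniform_moments`; the endpoints 0 and 4 are taken from the spinner example.

```python
def uniform_moments(a, b, n=100_000):
    """Approximate the mean and variance of U[a, b] with a midpoint rule."""
    h = (b - a) / n
    density = 1.0 / (b - a)
    mids = [a + (i + 0.5) * h for i in range(n)]
    mean = sum(x * density * h for x in mids)
    second = sum(x * x * density * h for x in mids)
    return mean, second - mean ** 2


a, b = 0.0, 4.0  # the spinner example
mean, variance = uniform_moments(a, b)
print(mean, (a + b) / 2)
print(variance, (b - a) ** 2 / 12)
```

For the spinner this should reproduce the values 2 and 4/3 found earlier.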
Example: Suppose \( X \) has the continuous p.d.f.
\[ f(x) = .1 e^{-.1x}, \quad x > 0 \]
(This is called an exponential distribution and is discussed in the next section. It is used in areas such as queueing theory and reliability.) We'll show that the new random variable
\[ Y = e^{-.1X} \]
has a uniform distribution, \( U(0, 1) \). To see this, we follow the steps in Section 9.1:
\[ \begin{aligned} F_Y(y) &= P(Y \le y) \\ &= P(e^{-.1X} \le y) \\ &= P(X \ge -10 \ln y) \\ &= 1 - P(X < -10 \ln y) \\ &= 1 - F_X(-10 \ln y) \end{aligned} \]
Since \( F_X(x) = \int_0^x .1 e^{-.1u}\,du = 1 - e^{-.1x} \) we get
\[ \begin{aligned} F_Y(y) &= 1 - \left(1 - e^{-.1(-10 \ln y)}\right) \\ &= y \quad \text{for } 0 < y < 1 \end{aligned} \]
(The range of \( Y \) is (0,1) since \( X > 0 \).) Thus \( f_Y(y) = F_Y'(y) = 1 \) \( (0 < y < 1) \) and so \( Y \sim U(0, 1) \).
Many computer software systems have "random number generator" functions that will simulate observations \( Y \) from a \( U(0, 1) \) distribution. (These are more properly called pseudo-random number generators because they are based on deterministic algorithms. In addition they give observations \( Y \) that have finite precision so they cannot be exactly like continuous \( U(0, 1) \) random variables. However, good generators give \( Y \)'s that appear indistinguishable in most ways from \( U(0, 1) \) random variables.) Given such a generator, we can also simulate random variables \( X \) with the exponential distribution above by the following algorithm:
1. Generate \( Y \sim U(0, 1) \) using the computer random number generator.
2. Compute \( X = -10 \ln Y \).
Then \( X \) has the desired distribution. This is a particular case of a method described in Section 9.4 for generating random variables from a general distribution. In R software the command runif(n) produces a vector consisting of \( n \) independent \( U(0, 1) \) values.
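The two-step algorithm can be sketched in a few lines. This is an illustrative Python version rather than the R code the notes refer to; the function name `simulate_exponential` is invented, and `1 - random()` is used as the uniform draw (an assumption made here so that the logarithm never receives 0, since Python's `random()` can return 0 but never 1).

```python
import math
import random


def simulate_exponential(n, mean=10.0, seed=0):
    """Simulate n values X = -mean * ln(Y), where Y is uniform on (0, 1]."""
    rng = random.Random(seed)
    return [-mean * math.log(1.0 - rng.random()) for _ in range(n)]


sample = simulate_exponential(100_000)
sample_mean = sum(sample) / len(sample)
frac_above_10 = sum(x > 10 for x in sample) / len(sample)
print(sample_mean)                   # should be near 10
print(frac_above_10, math.exp(-1))   # P(X > 10) = e^{-1}
```

With a mean of 10 this matches the p.d.f. \( .1 e^{-.1x} \) used in the example, so the sample mean should be near 10 and the fraction of values above 10 near \( e^{-1} \).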
Problem:
9.2.1 If \( X \) has c.d.f. \( F(x) \), then \( Y = F(X) \) has a uniform distribution on [0,1]. (Show this.) Suppose you want to simulate observations from a distribution with \( f(x) = \frac{3}{2} x^2; \; -1 < x < 1 \), by using the random number generator on a computer to generate \( U[0, 1) \) numbers. What value would \( X \) take when you generated the random number .27125?
9.3 Exponential Distribution
The continuous random variable \( X \) is said to have an exponential distribution if its p.d.f. is of the form
\[ f(x) = \lambda e^{-\lambda x}, \quad x > 0 \]
where \( \lambda > 0 \) is a real parameter value. This distribution arises in various problems involving the time until some event occurs. The following gives one such setting.
Physical Setup: In a Poisson process for events in time let \( X \) be the length of time we wait for the first event occurrence. We'll show that \( X \) has an exponential distribution. (Recall that the number of occurrences in a fixed time has a Poisson distribution. The difference between the Poisson and exponential distributions lies in what is being measured.)
Illustrations:
(1) The length of time \( X \) we wait with a Geiger counter until the emission of a radioactive particle is recorded follows an exponential distribution.
(2) The length of time between phone calls to a fire station (assuming calls follow a Poisson process) follows an exponential distribution.
Derivation of the probability density function and the c.d.f.
\[ \begin{aligned} F(x) = P(X \le x) &= P(\text{time to 1st occurrence} \le x) \\ &= 1 - P(\text{time to 1st occurrence} > x) \\ &= 1 - P(\text{no occurrences in the interval } (0, x)) \end{aligned} \]
Check that you understand this last step. If the time to the first occurrence \( > x \), there must be no occurrences in \( (0, x) \), and vice versa. We have now expressed \( F(x) \) in terms of the number of occurrences in a Poisson process by time \( x \). But the number of occurrences has a Poisson distribution with mean \( \mu = \lambda x \), where \( \lambda \) is the average rate of occurrence.
Therefore
\[ F(x) = 1 - \frac{\mu^0 e^{-\mu}}{0!} = 1 - e^{-\mu}. \]
Since \( \mu = \lambda x \), \( F(x) = 1 - e^{-\lambda x} \) for \( x > 0 \). Thus
\[ f(x) = \frac{d}{dx} F(x) = \lambda e^{-\lambda x}; \quad \text{for } x > 0 \]
which is the formula we gave above.
Alternate Form: It is common to use the parameter \( \theta = 1/\lambda \) in the exponential distribution. (We'll see below that \( \theta = E(X) \).) This makes
\[ F(x) = 1 - e^{-x/\theta} \quad \text{and} \quad f(x) = \frac{1}{\theta} e^{-x/\theta} \]
Exercise:
Suppose trees in a forest are distributed according to a Poisson process. Let \( X \) be the distance from an arbitrary starting point to the nearest tree. The average number of trees per square metre is \( \lambda \). Derive \( f(x) \) the same way we derived the exponential p.d.f. You're now using the Poisson distribution in 2 dimensions (area) rather than 1 dimension (time).
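Mean and Variance:
Finding \( \mu \) and \( \sigma^2 \) directly involves integration by parts. An easier solution uses properties of gamma functions, which extend the notion of factorials beyond the integers to the positive real numbers.
Definition 34 The Gamma Function: \( \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx \) is called the gamma function of \( \alpha \), where \( \alpha > 0 \).
Note that \( \alpha \) is 1 more than the power of \( x \) in the integrand, e.g. \( \int_0^{\infty} x^4 e^{-x}\,dx = \Gamma(5) \). There are 3 properties of gamma functions which we'll use.
1. \( \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) \) for \( \alpha > 1 \)
Proof: Using integration by parts,
\[ \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx = \left. -x^{\alpha - 1} e^{-x} \right|_0^{\infty} + (\alpha - 1) \int_0^{\infty} x^{\alpha - 2} e^{-x}\,dx \]
and provided that \( \alpha > 1 \), \( \left. x^{\alpha - 1} e^{-x} \right|_0^{\infty} = 0 \). Therefore
\[ \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx = (\alpha - 1) \int_0^{\infty} x^{\alpha - 2} e^{-x}\,dx \]
2. \( \Gamma(\alpha) = (\alpha - 1)! \) if \( \alpha \) is a positive integer.
Proof: It is easy to show that \( \Gamma(1) = 1 \). Using property 1 repeatedly, we obtain
\( \Gamma(2) = 1\,\Gamma(1) = 1 \),
\( \Gamma(3) = 2\,\Gamma(2) = 2! \),
\( \Gamma(4) = 3\,\Gamma(3) = 3! \), etc.
In general, \( \Gamma(n + 1) = n! \) for integer \( n \).
3. \( \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi} \)
(This can be proved using double integration.)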
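The three properties are easy to spot-check numerically. The sketch below relies on Python's standard `math.gamma`; the test value 3.7 is an arbitrary choice made for this illustration.

```python
import math

alpha = 3.7  # arbitrary non-integer value greater than 1

# Property 1: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1)
lhs = math.gamma(alpha)
rhs = (alpha - 1) * math.gamma(alpha - 1)

# Property 2: Gamma(n + 1) = n! for integers n >= 0
factorial_check = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(10)
)

# Property 3: Gamma(1/2) = sqrt(pi)
half = math.gamma(0.5)

print(lhs, rhs)
print(factorial_check)
print(half, math.sqrt(math.pi))
```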
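Returning to the exponential distribution:
\[ \mu = \int_0^{\infty} x \frac{1}{\theta} e^{-x/\theta}\,dx \]
Let \( y = \frac{x}{\theta} \). Then \( dx = \theta\,dy \) and
\[ \mu = \int_0^{\infty} y e^{-y}\,\theta\,dy = \theta \int_0^{\infty} y^1 e^{-y}\,dy = \theta\,\Gamma(2) = \theta \]
Note: Read questions carefully. If you're given the average rate of occurrence in a Poisson process, that is \( \lambda \). If you're given the average time you wait for an occurrence, that is \( \theta \).
To get \( \sigma^2 = \mathrm{Var}(X) \), we first find
\[ \begin{aligned} E(X^2) &= \int_0^{\infty} x^2 \frac{1}{\theta} e^{-x/\theta}\,dx \\ &= \int_0^{\infty} \theta^2 y^2 \frac{1}{\theta} e^{-y}\,\theta\,dy = \theta^2 \int_0^{\infty} y^2 e^{-y}\,dy \\ &= \theta^2\,\Gamma(3) = 2!\,\theta^2 = 2\theta^2 \end{aligned} \]
Therefore
\[ \sigma^2 = E(X^2) - \mu^2 = 2\theta^2 - \theta^2 = \theta^2 \]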
Example:
Suppose #7 buses arrive at a bus stop according to a Poisson process with an average of 5 buses per hour. (i.e. \( \lambda = 5 \)/hr. So \( \theta = \frac{1}{5} \) hr. or 12 min.) Find the probability (a) you have to wait longer than 15 minutes for a bus (b) you have to wait more than 15 minutes longer, having already been waiting for 6 minutes.
Solution:
a)
\[ P(X > 15) = 1 - P(X \le 15) = 1 - F(15) = 1 - \left(1 - e^{-15/12}\right) = e^{-1.25} = .2865 \]
b) If \( X \) is the total waiting time, the question asks for the probability
\[ \begin{aligned} P(X > 21 | X > 6) &= \frac{P(X > 21 \text{ and } X > 6)}{P(X > 6)} = \frac{P(X > 21)}{P(X > 6)} \\ &= \frac{1 - \left(1 - e^{-21/12}\right)}{1 - \left(1 - e^{-6/12}\right)} = \frac{e^{-21/12}}{e^{-6/12}} = e^{-15/12} = e^{-1.25} = .2865 \end{aligned} \]
Does this surprise you? The fact that you've already waited 6 minutes doesn't seem to matter. This illustrates the memoryless property of the exponential distribution:
\[ P(X > a + b | X > b) = P(X > a) \]
Fortunately, buses don't follow a Poisson process so this example needn't cause you to stop using the bus.
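The two answers in the bus example can be reproduced directly from the survival function \( P(X > x) = e^{-x/\theta} \). This short sketch uses an invented helper name `survival` and the value \( \theta = 12 \) minutes from the example.

```python
import math

theta = 12.0  # mean waiting time in minutes, from the example


def survival(x):
    """P(X > x) = 1 - F(x) = exp(-x / theta) for an exponential waiting time."""
    return math.exp(-x / theta)


p_a = survival(15)                 # part (a): wait longer than 15 minutes
p_b = survival(21) / survival(6)   # part (b): P(X > 21 | X > 6)
print(round(p_a, 4), round(p_b, 4))
```

The two values coincide, which is the memoryless property with a = 15 and b = 6.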
Problems:
9.3.1 In a bank with on-line terminals, the time the system runs between disruptions has an exponential distribution with mean \( \theta \) hours. One quarter of the time the system shuts down within 8 hours of the previous disruption. Find \( \theta \).
9.3.2 Flaws in painted sheets of metal occur over the surface according to the conditions for a Poisson process, at an intensity of \( \lambda \) per \( m^2 \). Let \( X \) be the distance from an arbitrary starting point to the second closest flaw. (Assume sheets are of infinite size!)
(a) Find the p.d.f., \( f(x) \).
(b) What is the average distance to the second closest flaw?
9.4 A Method for Computer Generation of Random Variables [40]
Most computer software has a built-in pseudo-random number [41] generator that will simulate observations \( U \) from a \( U(0, 1) \) distribution, or at least a reasonable approximation to this uniform distribution. If we wish a random variable with a non-uniform distribution, the standard approach is to take a suitable function of \( U \). By far the simplest and most common method for generating non-uniform variates is based on the inverse cumulative distribution function. For an arbitrary c.d.f. \( F(x) \), define \( F^{-1}(y) = \min\{x; F(x) \ge y\} \). This is a real inverse (i.e. \( F(F^{-1}(y)) = F^{-1}(F(y)) = y \)) in the case that the c.d.f. is continuous and strictly increasing, as it is, for example, for a continuous distribution whose p.d.f. is positive throughout its range. However, in the more general case of a possibly discontinuous non-decreasing c.d.f. (such as the c.d.f. of a discrete distribution) the function continues to enjoy at least some of the properties of an inverse. \( F^{-1} \) is useful for generating a random variable having c.d.f. \( F(x) \) from \( U \), a uniform random variable on the interval [0, 1].
[40] This section optional for Stat 220.
[41] "The generation of random numbers is too important to be left to chance." Robert R. Coveyou, Oak Ridge National Laboratory
Theorem 35 If \( F \) is an arbitrary c.d.f. and \( U \) is uniform on [0, 1] then the random variable defined by \( X = F^{-1}(U) \) has c.d.f. \( F(x) \).
Proof:
The proof is a consequence of the fact that
\[ [U < F(x)] \subset [X \le x] \subset [U \le F(x)] \quad \text{for all } x. \]
You can check this graphically by checking, for example, that if \( [U < F(x)] \) then \( [F^{-1}(U) \le x] \) (this confirms the left hand "\( \subset \)"). Taking probabilities on all sides of this, and using the fact that \( P[U \le F(x)] = P[U < F(x)] = F(x) \), we discover that \( P[X \le x] = F(x) \).
[Figure 9.5: Inverting a Cumulative Distribution Function. The c.d.f. $F(x)$ is plotted against $x$; a horizontal line at height $U$ locates $X = F^{-1}(U)$ on the $x$-axis.]
The relation $X = F^{-1}(U)$ implies that $F(X) \ge U$ and for any point $z < X$, $F(z) < U$. For example, for the rather unusual looking piecewise linear cumulative distribution function in Figure 9.5, we find the solution $X = F^{-1}(U)$ by drawing a horizontal line at $U$ until it strikes the graph of the c.d.f. (or where the graph would have been if we had joined the ends at the jumps) and then $X$ is the $x$-coordinate of this point. This is true in general: $X$ is the $x$-coordinate of the point where a horizontal line first strikes the graph of the c.d.f. We provide one simple example of generating random variables by this method, for the geometric distribution.
Example: A geometric random number generator
For the Geometric distribution, the cumulative distribution function is given by
$$F(x) = 1 - (1-p)^{x+1}, \quad \text{for } x = 0, 1, 2, \ldots$$
Then if $U$ is a uniform random number in the interval [0, 1], we seek an integer $X$ such that
$$F(X-1) < U \le F(X)$$
(you should confirm that this is the value of $X$ at which the above horizontal line strikes the graph of the c.d.f.) and solving these inequalities gives
$$1 - (1-p)^{X} < U \le 1 - (1-p)^{X+1}$$
$$(1-p)^{X} > 1 - U \ge (1-p)^{X+1}$$
$$X \ln(1-p) > \ln(1-U) \ge (X+1)\ln(1-p)$$
$$X < \frac{\ln(1-U)}{\ln(1-p)} \le X + 1$$
so we compute the value of $\frac{\ln(1-U)}{\ln(1-p)}$ and round down to the next lower integer.
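The inverse transform recipe for the geometric distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `geometric_from_uniform` and `sample_geometric` are invented here, and `math.floor` is used for "round down", which agrees with the inequalities above except when the ratio is exactly an integer (an event of probability zero).

```python
import math
import random


def geometric_from_uniform(u, p):
    """Map a uniform value u in [0, 1) to a geometric value on 0, 1, 2, ...

    Computes floor(ln(1 - u) / ln(1 - p)), the inverse transform for the
    c.d.f. F(x) = 1 - (1 - p)^(x + 1).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))


def sample_geometric(n, p, seed=0):
    """Draw n geometric variates by the inverse transform method."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [geometric_from_uniform(rng.random(), p) for _ in range(n)]


if __name__ == "__main__":
    draws = sample_geometric(100_000, p=0.25)
    # The sample mean should be close to the theoretical mean (1 - p) / p.
    print(sum(draws) / len(draws))
```

Passing the uniform value in explicitly keeps the transform itself deterministic, so it can be checked by hand against $F(X-1) < U \le F(X)$ for particular values of $U$.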
Exercise: An exponential random number generator.
Show that the inverse transform method above results in the generator for the exponential distribution
$$X = -\frac{1}{\lambda}\ln(1-U)$$
9.5 Normal Distribution
Physical Setup:
A random variable $X$ defined on $(-\infty, \infty)$ has a normal^42 distribution if it has probability density function of the form
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$
where $-\infty < \mu < \infty$ and $\sigma > 0$ are parameters. It turns out (and is shown below) that $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$ for this distribution; that is why its p.d.f. is written using the symbols $\mu$ and $\sigma$. We write
$$X \sim N(\mu, \sigma^2)$$
to denote that $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$).
The normal distribution is the most widely used distribution in probability and statistics. Physical processes leading to the normal distribution exist but are a little complicated to describe. (For example, it arises in physics via statistical mechanics and maximum entropy arguments.) It is used for many processes where $X$ represents a physical dimension of some kind, but also in many other settings. We'll see other applications of it below. The shape of the p.d.f. $f(x)$ above is what is often termed a "bell shape" or "bell curve", symmetric about $\mu$ (about 0 for the standard normal distribution shown in Figure 9.6). (You should be able to verify the shape without graphing the function.)
[Figure 9.6: The Standard Normal ($N(0,1)$) probability density function, plotted against $x$ for $-5 \le x \le 5$.]
Illustrations:
(1) Heights or weights of males (or of females) in large populations tend to follow normal distributions.
(2) The logarithms of stock prices are often assumed to be normally distributed.
^42 "The only normal people are the ones you don't know very well." Joe Ancis
The cumulative distribution function: The c.d.f. of the normal distribution $N(\mu, \sigma^2)$ is
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\, dy$$
as shown in Figure 9.7. This integral cannot be given a simple mathematical expression, so numerical methods are used to compute its value for given values of $x$, $\mu$ and $\sigma$. This function is included in many software packages and some calculators.
[Figure 9.7: The standard normal c.d.f., plotted against $x$ for $-5 \le x \le 5$.]
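In the statistical packages R and S-Plus we get $F(x)$ above using the function pnorm($x$, $\mu$, $\sigma$).
Before computers, people needed to produce tables of probabilities $F(x)$ by numerical integration, using mechanical calculators. Fortunately it is necessary to do this only for a single normal distribution: the one with $\mu = 0$ and $\sigma = 1$. This is called the "standard" normal distribution and denoted $N(0, 1)$.
It is easy to see that if $X \sim N(\mu, \sigma^2)$ then the "new" random variable $Z = (X - \mu)/\sigma$ is distributed as $Z \sim N(0, 1)$. (Just use the change of variables methods in Section 9.1.) We'll use this to compute $F(x)$ and probabilities for $X$ below, but first we show that $f(x)$ integrates to 1 and that $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. For the first result, note that
$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz \quad \left(\text{where we let } z = (x-\mu)/\sigma\right) \\
&= 2\int_{0}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz \\
&= 2\int_{0}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y}\, \frac{dy}{\sqrt{2}\, y^{1/2}} \quad \left(\text{Note: } y = \tfrac{1}{2}z^2 \text{ and } dz = \frac{dy}{\sqrt{2}\, y^{1/2}}\right) \\
&= \frac{1}{\sqrt{\pi}} \int_{0}^{\infty} y^{-\frac{1}{2}} e^{-y}\, dy \\
&= \frac{1}{\sqrt{\pi}}\, \Gamma\left(\tfrac{1}{2}\right) \quad (\text{where } \Gamma \text{ is the gamma function}) \\
&= 1
\end{aligned}$$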
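Mean, Variance, Moment generating function: Recall that an odd function, $f(x)$, has the property that $f(-x) = -f(x)$. If $f(x)$ is an odd function then $\int_{-\infty}^{\infty} f(x)\,dx = 0$, provided the integral exists.
Consider
$$E(X - \mu) = \int_{-\infty}^{\infty} (x - \mu)\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx.$$
Let $y = x - \mu$. Then
$$E(X - \mu) = \int_{-\infty}^{\infty} y\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}\, dy,$$
where $y\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}$ is an odd function so that $E(X - \mu) = 0$. But since $E(X - \mu) = E(X) - \mu$, this implies
$$E(X) = \mu,$$
and so $\mu$ is the mean. To obtain the variance,
$$\begin{aligned}
\mathrm{Var}(X) = E\left[(X-\mu)^2\right] &= \int_{-\infty}^{\infty} (x-\mu)^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= 2\int_{\mu}^{\infty} (x-\mu)^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \quad (\text{since the function is symmetric about } \mu).
\end{aligned}$$
We can obtain a gamma function by letting $y = \frac{(x-\mu)^2}{2\sigma^2}$. Then
$$(x-\mu)^2 = 2\sigma^2 y, \qquad (x - \mu) = \sigma\sqrt{2y} \quad (x > \mu, \text{ so the positive root is taken}), \qquad dx = \frac{\sigma\sqrt{2}\, dy}{2\sqrt{y}} = \frac{\sigma}{\sqrt{2y}}\, dy.$$
Then
$$\begin{aligned}
\mathrm{Var}(X) &= 2\int_{0}^{\infty} 2\sigma^2 y \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-y}\right) \left(\frac{\sigma}{\sqrt{2y}}\, dy\right) \\
&= \frac{2\sigma^2}{\sqrt{\pi}} \int_{0}^{\infty} y^{1/2} e^{-y}\, dy = \frac{2\sigma^2}{\sqrt{\pi}}\, \Gamma\left(\tfrac{3}{2}\right) \\
&= \frac{2\sigma^2}{\sqrt{\pi}} \left(\tfrac{1}{2}\right) \Gamma\left(\tfrac{1}{2}\right) = \frac{2\sigma^2}{\sqrt{\pi}} \left(\tfrac{1}{2}\right) \sqrt{\pi} \\
&= \sigma^2
\end{aligned}$$
and so $\sigma^2$ is the variance. We now find the moment generating function of the $N(\mu, \sigma^2)$ distribution.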
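If $X$ has the $N(\mu, \sigma^2)$ distribution, then
$$\begin{aligned}
M_X(t) = E(e^{Xt}) &= \int_{-\infty}^{\infty} e^{xt} f(x)\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{xt}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x - 2xt\sigma^2 + \mu^2\right)}\, dx \\
&= \frac{e^{\mu t + \sigma^2 t^2/2}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left\{x^2 - 2(\mu + t\sigma^2)x + (\mu + t\sigma^2)^2\right\}}\, dx \\
&= \frac{e^{\mu t + \sigma^2 t^2/2}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left\{x - (\mu + t\sigma^2)\right\}^2}\, dx \\
&= e^{\mu t + \sigma^2 t^2/2}
\end{aligned}$$
where the last step follows since
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left\{x - (\mu + t\sigma^2)\right\}^2}\, dx$$
is just the integral of a $N(\mu + t\sigma^2, \sigma^2)$ probability density function and is therefore equal to one. This confirms the values we already obtained for the mean and the variance of the normal distribution:
$$M_X'(0) = \left. e^{\mu t + \sigma^2 t^2/2}\,(\mu + t\sigma^2) \right|_{t=0} = \mu$$
$$M_X''(0) = \mu^2 + \sigma^2 = E(X^2)$$
from which we obtain
$$\mathrm{Var}(X) = \sigma^2.$$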
Finding Normal Probabilities Via $N(0, 1)$ Tables
As noted above, $F(x)$ does not have an explicit closed form so numerical computation is needed. The following result shows that if we can compute the c.d.f. for the standard normal distribution $N(0, 1)$, then we can compute it for any other normal distribution $N(\mu, \sigma^2)$ as well.
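Theorem 36 Let $X \sim N(\mu, \sigma^2)$ and define $Z = (X - \mu)/\sigma$. Then $Z \sim N(0, 1)$ and
$$F_X(x) = P(X \le x) = F_Z\left(\frac{x - \mu}{\sigma}\right).$$
Proof: The fact that $Z \sim N(0, 1)$ has p.d.f.
$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$$
follows immediately by change of variables. Alternatively, we can just note that
$$\begin{aligned}
F_X(x) &= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\, dy \\
&= \int_{-\infty}^{(x-\mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz \quad \left(z = \frac{y - \mu}{\sigma}\right) \\
&= F_Z\left(\frac{x - \mu}{\sigma}\right)
\end{aligned}$$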
A table of probabilities $F_Z(z) = P(Z \le z)$ is given on the last page of these notes. A space-saving feature is that only the values for $z \ge 0$ are shown; for negative values we use the fact that the $N(0, 1)$ p.d.f. is symmetric about 0. The following examples illustrate how to get probabilities for $Z$ using the tables.
Examples: Find the following probabilities, where $Z \sim N(0, 1)$.
(a) $P(Z \le 2.11)$
(b) $P(Z \le 3.40)$
(c) $P(Z > 1.06)$
(d) $P(Z < -1.06)$
(e) $P(-1.06 < Z < 2.11)$
Solution:
a) Look up 2.11 in the table by going down the left column to 2.1 then across to the heading .01. We find the number .9826. Then $P(Z \le 2.11) = F(2.11) = .9826$. See Figure 9.8.
[Figure 9.8: The $N(0,1)$ p.d.f. $f(z)$ plotted against $z$, with the area 0.9826 to the left of $z = 2.11$ marked.]
b) $P(Z \le 3.40) = F(3.40) = .99966$
c) $P(Z > 1.06) = 1 - P(Z \le 1.06) = 1 - F(1.06) = 1 - .8554 = .1446$
d) Now we have to use symmetry:
$P(Z < -1.06) = P(Z > 1.06) = 1 - P(Z \le 1.06) = 1 - F(1.06) = .1446$
See Figure 9.5.
e) $P(-1.06 < Z < 2.11) = F(2.11) - F(-1.06)$
$= F(2.11) - P(Z \le -1.06) = F(2.11) - [1 - F(1.06)]$
$= .9826 - (1 - .8554) = .8380$
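The table look-ups in (a) to (e) can be checked numerically. The sketch below is an illustration only: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed table, and the function name `normal_table_examples` is invented here.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal N(0, 1)


def normal_table_examples():
    """Return the probabilities (a)-(e), computed from the N(0, 1) c.d.f."""
    F = Z.cdf
    return {
        "a": F(2.11),             # P(Z <= 2.11)
        "b": F(3.40),             # P(Z <= 3.40)
        "c": 1.0 - F(1.06),       # P(Z > 1.06)
        "d": F(-1.06),            # P(Z < -1.06), equal to (c) by symmetry
        "e": F(2.11) - F(-1.06),  # P(-1.06 < Z < 2.11)
    }


if __name__ == "__main__":
    for label, prob in normal_table_examples().items():
        print(label, round(prob, 4))
```

Unlike the table, the c.d.f. function accepts negative arguments directly, so the symmetry step in (d) and (e) is needed only when working by hand.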
In addition to using the tables to find the probabilities for given numbers, we sometimes are given the probabilities and asked to find the number. With R or S-Plus software, the function qnorm($p$, $\mu$, $\sigma$) gives the 100$p$-th percentile (where $0 < p < 1$). We can also use tables to find desired values.
Examples:
a) Find a number $c$ such that $P(Z < c) = .85$
b) Find a number $d$ such that $P(Z > d) = .90$
c) Find a number $b$ such that $P(-b < Z < b) = .95$
[Figure: two panels showing the $N(0,1)$ p.d.f. $f(z)$ plotted against $z$, with the values $\pm 1.06$ marked.]
Solutions:
a) We can look in the body of the table to get an entry close to .8500. This occurs for $z$ between 1.03 and 1.04; $z = 1.04$ gives the closest value to .85. For greater accuracy, the table at the bottom of the last page is designed for finding numbers, given the probability. Looking beside the entry .85 we find $z = 1.0364$.
b) Since $P(Z > d) = .90$ we have $F(d) = P(Z \le d) = 1 - P(Z > d) = .10$. There is no entry for which $F(z) = .10$ so we again have to use symmetry, since $d$ will be negative.
$P(Z \le d) = P(Z \ge |d|)$
$= 1 - F(|d|) = .10$
Therefore $F(|d|) = .90$
Therefore $|d| = 1.2816$
Therefore $d = -1.2816$
The key to this solution lies in recognizing that $d$ will be negative. If you can picture the situation it will probably be easier to handle the question than if you rely on algebraic manipulations.
Exercise: Will $a$ be positive or negative if $P(Z > a) = .05$? What if $P(Z < a) = .99$?
[Figure: the $N(0,1)$ p.d.f. $f(z)$ plotted against $z$, with $d$ and $|d|$ marked and an area of 0.1 in each tail.]
c) If $P(-b < Z < b) = .95$ we again use symmetry.
[Figure 9.9: The $N(0,1)$ p.d.f. $f(z)$ plotted against $z$, with area 0.95 between $-b$ and $b$ and area 0.025 in each tail.]
The probability outside the interval $(-b, b)$ must be .05, and this is evenly split between the area above $b$ and the area below $-b$.
Therefore $P(Z < -b) = P(Z > b) = .025$
and $P(Z \le b) = .975$
Looking in the table, $b = 1.96$.
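Finding the number for a given probability is the inverse c.d.f. (percentile) calculation. As a hedged sketch, Python's standard-library `NormalDist.inv_cdf` plays the role that the notes assign to qnorm or to the table; the variable names below simply mirror the three examples.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)

# a) P(Z < c) = .85, so c is the 85th percentile.
c = Z.inv_cdf(0.85)

# b) P(Z > d) = .90 is the same statement as P(Z <= d) = .10.
d = Z.inv_cdf(0.10)

# c) P(-b < Z < b) = .95 leaves .025 in each tail, so P(Z <= b) = .975.
b = Z.inv_cdf(0.975)

if __name__ == "__main__":
    print(round(c, 4), round(d, 4), round(b, 4))
```

Rewriting each condition as a statement about $P(Z \le \cdot)$ before calling the inverse is the same step as picturing the shaded area in the solutions above.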
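To find $N(\mu, \sigma^2)$ probabilities in general, we use the theorem given earlier, which implies that if $X \sim N(\mu, \sigma^2)$ then
$$P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = F_Z\left(\frac{b-\mu}{\sigma}\right) - F_Z\left(\frac{a-\mu}{\sigma}\right)$$
where $Z \sim N(0, 1)$.
Example: Let $X \sim N(3, 25)$.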
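a) Find $P(X < 2)$
b) Find a number $c$ such that $P(X < c) = .95$.
Solution:
a)
$$\begin{aligned}
P(X < 2) &= P\left(\frac{X-\mu}{\sigma} < \frac{2-3}{5}\right) = P(Z < -.20) = 1 - P(Z < .20) \\
&= 1 - F(.20) = 1 - .5793 = .4207
\end{aligned}$$
b)
$$P(X < c) = P\left(\frac{X-\mu}{\sigma} < \frac{c-3}{5}\right) = P\left(Z < \frac{c-3}{5}\right) = .95$$
[Figure 9.10: The $N(0,1)$ p.d.f. $f(z)$ plotted against $z$, with area 0.95 to the left of $z = (c-3)/5$.]
Therefore $\frac{c-3}{5} = 1.6449$
and $c = 3 + 5(1.6449) = 11.2245$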
Gaussian Distribution: The normal distribution is also known as the Gaussian^43 distribution. The notation $X \sim G(\mu, \sigma)$ means that $X$ has Gaussian (normal) distribution with mean $\mu$ and standard deviation $\sigma$. So, for example, if $X \sim N(1, 4)$ then we could also write $X \sim G(1, 2)$.
^43 After Johann Carl Friedrich Gauss (1777-1855), a German mathematician, physicist and astronomer, discoverer of Bode's Law, the Binomial Theorem and a regular 17-gon. He discovered the prime number theorem while an 18 year-old student and used least-squares (what is called statistical regression in most statistics courses) to predict the position of Ceres.
Example: The heights of adult males in Canada are close to normally distributed, with a mean of 69.0 inches and a standard deviation of 2.4 inches. Find the 10th and 90th percentiles of the height distribution. (Recall that the $a$-th percentile is such that $a$% of the population has height less than this value.)
Solution: We are being told that if $X$ is the height of a randomly selected Canadian adult male, then $X \sim G(69.0, 2.4)$, or equivalently $X \sim N(69.0, 5.76)$. To find the 90th percentile $c$, we use
$$P(X \le c) = P\left(\frac{X - 69.0}{2.4} \le \frac{c - 69.0}{2.4}\right) = P\left(Z \le \frac{c - 69.0}{2.4}\right) = .90$$
From the table we see $P(Z \le 1.2816) = .90$ so we need
$$\frac{c - 69.0}{2.4} = 1.2816,$$
which gives $c = 72.08$ inches. Similarly, to find $c$ such that $P(X \le c) = .10$ we find that $P(Z \le -1.2816) = .10$, so we need
$$\frac{c - 69.0}{2.4} = -1.2816,$$
or $c = 65.92$ inches, as the 10th percentile.
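The percentile calculation follows one pattern: look up the standard normal percentile, then rescale by $\sigma$ and shift by $\mu$. A minimal sketch, assuming Python's standard-library `NormalDist` in place of the table (the helper name `normal_percentile` is invented for illustration):

```python
from statistics import NormalDist


def normal_percentile(mu, sigma, q):
    """Return c such that P(X <= c) = q when X ~ G(mu, sigma).

    Mirrors the hand calculation: find z with P(Z <= z) = q for the
    standard normal, then un-standardize with c = mu + sigma * z.
    """
    z = NormalDist().inv_cdf(q)
    return mu + sigma * z


if __name__ == "__main__":
    print(round(normal_percentile(69.0, 2.4, 0.90), 2))  # 90th percentile
    print(round(normal_percentile(69.0, 2.4, 0.10), 2))  # 10th percentile
```

Note that the second argument is the standard deviation, as in the $G(\mu, \sigma)$ notation, not the variance used in $N(\mu, \sigma^2)$.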
Linear Combinations of Independent Normal Random Variables
Linear combinations of normal random variables are important in many applications. Since we have not covered continuous multivariate distributions, we can only quote the second and third of the following results without proof. The first result follows easily from the change of variables method.
1. Let $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, where $a$ and $b$ are constant real numbers. Then $Y \sim N(a\mu + b, a^2\sigma^2)$.
2. Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ be independent, and let $a$ and $b$ be constants. Then $aX + bY \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$.
In general if $X_i \sim N(\mu_i, \sigma_i^2)$ are independent and $a_i$ are constants, then $\sum a_i X_i \sim N\left(\sum a_i \mu_i,\ \sum a_i^2 \sigma_i^2\right)$.
3. Let $X_1, X_2, \ldots, X_n$ be independent $N(\mu, \sigma^2)$ random variables. Then $\sum X_i \sim N(n\mu, n\sigma^2)$ and $\bar{X} \sim N(\mu, \sigma^2/n)$.
Actually, the only new result here is that the distributions are normal. The means and variances of linear combinations of random variables were previously obtained in section 8.3.
Example: Let $X \sim N(3, 5)$ and $Y \sim N(6, 14)$ be independent. Find $P(X > Y)$.
Solution: Whenever we have variables on both sides of the inequality we should collect them on one side, leaving us with a linear combination.
$$P(X > Y) = P(X - Y > 0)$$
$$X - Y \sim N(3 - 6,\ 5 + 14) \quad \text{i.e. } N(-3, 19)$$
$$P(X - Y > 0) = P\left(Z > \frac{0 - (-3)}{\sqrt{19}} = .69\right) = 1 - F(.69) = .2451$$
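The same calculation can be sketched numerically: form the mean and variance of $X - Y$ from result 2, then evaluate a single normal tail probability. This is an illustration under assumptions, using Python's standard-library `NormalDist`; the function name `prob_x_greater_than_y` is hypothetical.

```python
import math
from statistics import NormalDist


def prob_x_greater_than_y(mu_x, var_x, mu_y, var_y):
    """P(X > Y) for independent X ~ N(mu_x, var_x) and Y ~ N(mu_y, var_y).

    X - Y is normal with mean mu_x - mu_y and variance var_x + var_y
    (variances add even though the variables are subtracted).
    """
    diff = NormalDist(mu_x - mu_y, math.sqrt(var_x + var_y))
    return 1.0 - diff.cdf(0.0)


if __name__ == "__main__":
    # X ~ N(3, 5), Y ~ N(6, 14); the table answer above rounds z to .69.
    print(round(prob_x_greater_than_y(3, 5, 6, 14), 4))
```

The result differs slightly in the fourth decimal from the table-based answer because the table requires rounding $3/\sqrt{19}$ to two decimals.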
Example: Three cylindrical parts are joined end to end to make up a shaft in a machine; 2 type A parts and 1 type B. The lengths of the parts vary a little, and have the distributions: $A \sim N(6, .4)$ and $B \sim N(35.2, .6)$. The overall length of the assembled shaft must lie between 46.8 and 47.5 or else the shaft has to be scrapped. Assume the lengths of different parts are independent. What percent of assembled shafts have to be scrapped?
Exercise: Why would it be wrong to represent the length of the shaft as $2A + B$? How would this length differ from the solution given below?
Solution: Let $L$, the length of the shaft, be $L = A_1 + A_2 + B$.
Then
$$L \sim N(6 + 6 + 35.2,\ .4 + .4 + .6) = N(47.2, 1.4)$$
and so
$$P(46.8 < L < 47.5) = P\left(\frac{46.8 - 47.2}{\sqrt{1.4}} < Z < \frac{47.5 - 47.2}{\sqrt{1.4}}\right) = P(-.34 < Z < .25) = .2318.$$
i.e. 23.18% are acceptable and 76.82% must be scrapped. Obviously we have to find a way to reduce the variability in the lengths of the parts. This is a common problem in manufacturing.
Exercise: How could we reduce the percent of shafts being scrapped? (What if we reduced the variance of $A$ and $B$ parts each by 50%?)
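Example: The heights of adult females in a large population are well represented by a normal distribution with mean 64 in. and variance 6.2 in$^2$.
(a) Find the proportion of females whose height is between 63 and 65 inches.
(b) Suppose 10 women are randomly selected, and let $\bar{X}$ be their average height (i.e. $\bar{X} = \sum_{i=1}^{10} X_i / 10$, where $X_1, \ldots, X_{10}$ are the heights of the 10 women). Find $P(63 \le \bar{X} \le 65)$.
(c) How large must $n$ be so that a random sample of $n$ women gives an average height $\bar{X}$ so that $P(|\bar{X} - \mu| \le 1) \ge .95$?
Solution:
(a) $X \sim N(64, 6.2)$ so for the height $X$ of a random woman,
$$P(63 \le X \le 65) = P\left(\frac{63 - 64}{\sqrt{6.2}} \le \frac{X - \mu}{\sigma} \le \frac{65 - 64}{\sqrt{6.2}}\right) = P(-0.402 \le Z \le 0.402) = 0.312$$
(b) $\bar{X} \sim N\left(64, \frac{6.2}{10}\right)$ so
$$P(63 \le \bar{X} \le 65) = P\left(\frac{63 - 64}{\sqrt{.62}} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{65 - 64}{\sqrt{.62}}\right) = P(-1.27 \le Z \le 1.27) = 0.796$$
(c) If $\bar{X} \sim N\left(64, \frac{6.2}{n}\right)$ then
$$\begin{aligned}
P(|\bar{X} - \mu| \le 1) &= P(|\bar{X} - 64| \le 1) \\
&= P(63 \le \bar{X} \le 65) \\
&= P\left(\frac{63 - 64}{\sqrt{6.2/n}} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{65 - 64}{\sqrt{6.2/n}}\right) \\
&= P(-0.402\sqrt{n} \le Z \le 0.402\sqrt{n}) = .95
\end{aligned}$$
iff $.402\sqrt{n} = 1.96$. (This is because $P(-1.96 \le Z \le 1.96) = .95$.) So $P(|\bar{X} - 64| \le 1) \ge .95$ iff $0.402\sqrt{n} \ge 1.96$, which is true if $n \ge (1.96/.402)^2$, or $n \ge 23.77$. Thus we require $n \ge 24$ since $n$ is an integer.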
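The sample-size argument in part (c) generalizes to any variance, half-width and probability. The following is a sketch only, with a hypothetical helper name `required_sample_size`; it uses the exact normal percentile from Python's standard library instead of the rounded table values, so intermediate numbers differ slightly from 23.77.

```python
import math
from statistics import NormalDist


def required_sample_size(variance, half_width, prob=0.95):
    """Smallest n with P(|Xbar - mu| <= half_width) >= prob.

    Assumes Xbar ~ N(mu, variance / n), so the condition is
    half_width * sqrt(n / variance) >= z, where z cuts off
    (1 - prob) / 2 in each tail of the N(0, 1) distribution.
    """
    z = NormalDist().inv_cdf(0.5 + prob / 2.0)
    return math.ceil(z * z * variance / (half_width * half_width))


if __name__ == "__main__":
    # Heights example: variance 6.2, within 1 inch with probability .95.
    print(required_sample_size(6.2, 1.0))
```

Since $n$ enters through $\sqrt{n}$, halving the half-width roughly quadruples the required sample size.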
Remark: This shows that if we were to select a random sample of $n = 24$ persons, then their average height $\bar{X}$ would, with probability at least .95, be within 1 inch of the average height $\mu$ of the whole population of women. So if we did not know $\mu$ then we could estimate it to within $\pm 1$ inch (with probability .95) by taking this small a sample.
Exercise: Find how large $n$ would have to be to make $P(|\bar{X} - \mu| \le .5) \ge .95$.
These ideas form the basis of statistical sampling and estimation of unknown parameter values in populations and processes. If $X \sim N(\mu, \sigma^2)$ and we know roughly what $\sigma$ is, but don't know $\mu$, then we can use the fact that $\bar{X} \sim N(\mu, \sigma^2/n)$ to find the probability that the mean $\bar{X}$ from a sample of size $n$ will be within a given distance of $\mu$.
Problems:
9.5.1 Let $X \sim N(10, 4)$ and $Y \sim N(3, 100)$ be independent. Find the probability
a) $8.4 < X < 12.2$
b) $2Y > X$
c) $\bar{Y} < 0$ where $\bar{Y}$ is the sample mean of 25 independent observations on $Y$.
9.5.2 Let $X$ have a normal distribution. What percent of the time does $X$ lie within one standard deviation of the mean? Two standard deviations? Three standard deviations?
9.5.3 Let $X \sim N(5, 4)$. An independent variable $Y$ is also normally distributed with mean 7 and standard deviation 3. Find:
(a) The probability $2X$ differs from $Y$ by more than 4.
(b) The minimum number, $n$, of independent observations needed on $X$ so that $P\left(|\bar{X} - 5| < 0.1\right) \ge .98$. ($\bar{X} = \sum_{i=1}^{n} X_i / n$ is the sample mean.)
9.6 Use of the Normal Distribution in Approximations
The normal distribution can, under certain conditions, be used to approximate probabilities for linear combinations of variables having a non-normal distribution. This remarkable property follows from an amazing result called the central limit theorem. There are actually several versions of the central limit theorem. The version given below is one of the simplest.
Central Limit Theorem (CLT):
The major reason that the normal distribution is so commonly used is that it tends to approximate the distribution of sums of random variables. For example, if we throw $n$ fair dice and $S_n$ is the sum of the outcomes, what is the distribution of $S_n$? The tables below provide the number of ways in which a given value can be obtained. The corresponding probability is obtained by dividing by $6^n$. For example, on the throw of $n = 1$ die the possible outcomes are 1, 2, ..., 6 with probabilities all 1/6, as indicated in the first panel of the histogram in Figure 9.11.
If we sum the values on two fair dice, the possible outcomes are the values 2, 3, ..., 12 as shown in the following table and the probabilities are the values below:

Values                2  3  4  5  6  7  8  9  10  11  12
Probabilities x 36    1  2  3  4  5  6  5  4   3   2   1

[Figure 9.11: The probability histogram of the sum of $n$ = 1, 2, 3 discrete uniform {1,2,3,4,5,6} random variables (three panels: n=1, n=2, n=3).]
The probability histogram of these values is shown in the second panel. Finally, for the sum of the values on three independent dice, the values range from 3 to 18 and have probabilities which, when multiplied by $6^3$, result in the values

1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1

to which we can fit three separate quadratic functions, one in the middle region and one in each of the two tails. The histogram of these values is shown in the third panel of Figure 9.11 and already resembles a normal probability density function. In general, these distributions show a simple pattern. For $n = 1$, the probability function is a constant (polynomial of degree 0). For $n = 2$, it is two linear functions spliced together. For $n = 3$, the histogram can be constructed from three quadratic pieces (polynomials of degree $n - 1$). These probability histograms rapidly approach the shape of the normal probability density function, as is the case with the sum or the average of independent random variables from most distributions. You can simulate the throws of any number of dice and illustrate the behaviour of the sums at the url http://www.math.csusb.edu/faculty/stanton/probstat/clt.html.
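The counts quoted for two and three dice can be reproduced by repeatedly convolving the single-die distribution with itself. The sketch below is illustrative only (the function name `dice_sum_counts` is invented here); it counts the number of ways each total can occur, so dividing by $6^n$ gives the probabilities.

```python
def dice_sum_counts(n, faces=6):
    """Number of ways each total can occur when n fair dice are thrown.

    Returns a dict mapping total -> count; dividing each count by
    faces ** n gives the probability function of the sum.
    """
    counts = {0: 1}  # zero dice: the empty sum 0 occurs in exactly one way
    for _ in range(n):
        new_counts = {}
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                key = total + face
                new_counts[key] = new_counts.get(key, 0) + ways
        counts = new_counts
    return counts


if __name__ == "__main__":
    for n in (1, 2, 3):
        counts = dice_sum_counts(n)
        print(n, [counts[s] for s in sorted(counts)])
```

Running the loop for larger $n$ shows the counts taking on the bell shape described above, without any simulation error.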
Let $X_1, X_2, \ldots, X_n$ be independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^2$. Then as $n \to \infty$,
$$\sum_{i=1}^{n} X_i \sim N\left(n\mu, n\sigma^2\right) \qquad (9.10)$$
and
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \qquad (9.11)$$
This is actually a rough statement of the result since, as $n \to \infty$, both the $N(n\mu, n\sigma^2)$ and $N(\mu, \sigma^2/n)$ distributions fail to exist. (The former because both $n\mu$ and $n\sigma^2 \to \infty$, the latter because $\frac{\sigma^2}{n} \to 0$.) A precise version of the results is:
Theorem 37 If $X_1, X_2, \ldots, X_n$ are independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^2$, then as $n \to \infty$, the cumulative distribution function of the random variable
$$\frac{\sum X_i - n\mu}{\sigma\sqrt{n}}$$
approaches the $N(0, 1)$ c.d.f. Similarly, the c.d.f. of
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
approaches the standard normal c.d.f.
Although this is a theorem about limits, we will use it when $n$ is large, but finite, to approximate the distribution of $\sum X_i$ or $\bar{X}$ by a normal distribution, so the rough version of the theorem in (9.10) and (9.11) is adequate for our purposes.
Notes:
(1) This theorem works for essentially all distributions which $X_i$ could have. The only exception occurs when $X_i$ has a distribution whose mean or variance don't exist. There are such distributions, but they are rare.
(2) We will use the Central Limit Theorem to approximate the distribution of sums $\sum_{i=1}^{n} X_i$ or averages $\bar{X}$. The accuracy of the approximation depends on $n$ (bigger is better) and also on the actual distribution the $X_i$'s come from. The approximation works better for small $n$ when the $X_i$'s p.d.f. is close to symmetric.
(3) If you look at the section on linear combinations of independent normal random variables you will find two results which are very similar to the central limit theorem. These are:
For $X_1, \ldots, X_n$ independent and $N(\mu, \sigma^2)$, $\sum X_i \sim N(n\mu, n\sigma^2)$, and $\bar{X} \sim N(\mu, \sigma^2/n)$.
Thus, if the $X_i$'s themselves have a normal distribution, then $\sum X_i$ and $\bar{X}$ have exactly normal distributions for all values of $n$. If the $X_i$'s do not have a normal distribution themselves, then $\sum X_i$ and $\bar{X}$ have approximately normal distributions when $n$ is large. From this distinction you should be able to guess that if the $X_i$'s distribution is somewhat normal shaped the approximation will be good for smaller values of $n$ than if the $X_i$'s distribution is very non-normal in shape. (This is related to the second remark in (2).)
Example: Hamburger patties are packed 8 to a box, and each box is supposed to have 1 kg of meat in it. The weights of the patties vary a little because they are mass produced, and the weight $X$ of a single patty is actually a random variable with mean $\mu = 0.128$ kg and standard deviation $\sigma = 0.005$ kg. Find the probability a box has at least 1 kg of meat, assuming that the weights of the 8 patties in any given box are independent.
Solution: Let $X_1, \ldots, X_8$ be the weights of the 8 patties in a box, and $Y = X_1 + \cdots + X_8$ be their total weight. By the Central Limit Theorem, $Y$ is approximately $N(8\mu, 8\sigma^2)$; we'll assume this approximation is reasonable even though $n = 8$ is small. (This is likely OK because $X$'s distribution is likely fairly close to normal itself.) Thus $Y \sim N(1.024, .0002)$ and
$$P(Y \ge 1) = P\left(Z \ge \frac{1 - 1.024}{\sqrt{.0002}}\right) = P(Z \ge -1.70) \doteq .9554$$
(Note: We see that only about 95% of the boxes actually have 1 kg or more of hamburger. What would you recommend be done to increase this probability to 99%?)
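The box-weight calculation can be sketched as a small function so that the effect of changing $\mu$, $\sigma$ or the number of patties is easy to explore. This is an illustrative sketch only: the function name and its default arguments are taken from the example, not from any library, and the normal approximation for the total is assumed as in the solution.

```python
import math
from statistics import NormalDist


def prob_box_meets_weight(n_patties=8, mu=0.128, sigma=0.005, target=1.0):
    """Approximate P(total weight >= target) for a box of n_patties.

    By the Central Limit Theorem the total is treated as
    N(n * mu, n * sigma^2), i.e. standard deviation sigma * sqrt(n).
    """
    total = NormalDist(n_patties * mu, sigma * math.sqrt(n_patties))
    return 1.0 - total.cdf(target)


if __name__ == "__main__":
    print(round(prob_box_meets_weight(), 4))
    # Raising the mean patty weight is one way to raise this probability.
    print(round(prob_box_meets_weight(mu=0.130), 4))
```

Calling the function with different values of `mu` or `sigma` is a convenient way to experiment with the question posed in the note above.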
Example: Suppose fires reported to a fire station satisfy the conditions for a Poisson process, with a mean of 1 fire every 4 hours. Find the probability the 500th fire of the year is reported on the 84th day of the year.
Solution: Let $X_i$ be the time between the $(i-1)$st and $i$th fires ($X_1$ is the time to the 1st fire). Then $X_i$ has an exponential distribution with $\theta = 1/\lambda = 4$ hours, or $\theta = 1/6$ day. Since $\sum_{i=1}^{500} X_i$ is the time until the 500th fire, we want to find $P\left(83 < \sum_{i=1}^{500} X_i \le 84\right)$. While the exponential distribution is not close to normal shaped, we are summing a large number of independent exponential variables. Hence, by the central limit theorem, $\sum X_i$ has approximately a $N(500\mu, 500\sigma^2)$ distribution, where $\mu = E(X_i)$ and $\sigma^2 = \mathrm{Var}(X_i)$.
For exponential distributions, $\mu = \theta = 1/6$ and $\sigma^2 = \theta^2 = 1/36$ so
$$P\left(83 < \sum X_i \le 84\right) = P\left(\frac{83 - \frac{500}{6}}{\sqrt{\frac{500}{36}}} < Z \le \frac{84 - \frac{500}{6}}{\sqrt{\frac{500}{36}}}\right) = P(-.09 < Z \le .18) = .1073$$
Example: This example is frivolous but shows how the normal distribution can approximate even sums of discrete random variables. In an orchard, suppose the number $X$ of worms in an apple has probability function:

x       0    1    2    3
f(x)    .4   .3   .2   .1

Find the probability a basket with 250 apples in it has between 225 and 260 (inclusive) worms in it.
Solution:
$$\mu = E(X) = \sum_{x=0}^{3} x f(x) = 1$$
$$E\left(X^2\right) = \sum_{x=0}^{3} x^2 f(x) = 2$$
Therefore $\sigma^2 = E\left(X^2\right) - \mu^2 = 1$.
By the central limit theorem, $\sum_{i=1}^{250} X_i$ has approximately a $N(250\mu, 250\sigma^2)$ distribution, where $X_i$ is the number of worms in the $i$th apple,
i.e. $\sum X_i \sim N(250, 250)$
$$P\left(225 \le \sum X_i \le 260\right) = P\left(\frac{225 - 250}{\sqrt{250}} \le Z \le \frac{260 - 250}{\sqrt{250}}\right) = P(-1.58 \le Z \le .63) = .6786$$
While this approximation is adequate, we can improve its accuracy, as follows. When $X_i$ has a discrete distribution, as it does here, $\sum X_i$ will always remain discrete no matter how large $n$ gets. So the distribution of $\sum X_i$, while normal shaped, will never be precisely normal. Consider a probability histogram of the distribution of $\sum X_i$, as shown in Figure 9.12. (Only part of the histogram is shown.)
[Figure 9.12: Part of the probability histogram of $\sum X_i$ for $x$ from 220 to 265, with the approximating normal p.d.f. and the points 224.5 and 260.5 marked.]
The area of each bar of this histogram is the probability at the $x$ value in the centre of the interval. The smooth curve is the p.d.f. for the approximating normal distribution. Then $\sum_{x=225}^{260} P\left(\sum X_i = x\right)$ is the total area of all bars of the histogram for $x$ from 225 to 260. These bars actually span continuous $x$ values from 224.5 to 260.5. We could then get a more accurate approximation by finding the area under the normal curve from 224.5 to 260.5,
i.e.
$$\begin{aligned}
P\left(225 \le \sum X_i \le 260\right) &\doteq P\left(224.5 < \sum X_i < 260.5\right) \\
&= P\left(\frac{224.5 - 250}{\sqrt{250}} < Z < \frac{260.5 - 250}{\sqrt{250}}\right) = P(-1.61 < Z < .66) = .6917
\end{aligned}$$
Unless making this adjustment greatly complicates the solution, it is preferable to make this "continuity correction".
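For this example the exact distribution of the total is small enough to compute by repeated convolution, which gives a direct check on how much the continuity correction helps. The sketch below is an illustration under assumptions: the helper names `sum_pmf` and `normal_interval` are invented, and the normal probabilities come from Python's standard-library `NormalDist`, so they differ in the third decimal from table values based on rounded $z$.

```python
import math
from statistics import NormalDist

# Probability function of the number of worms in one apple (from the example).
WORM_PMF = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}


def sum_pmf(pmf, n):
    """Exact probability function of the sum of n independent draws from pmf."""
    dist = {0: 1.0}
    for _ in range(n):
        new_dist = {}
        for s, ps in dist.items():
            for x, px in pmf.items():
                new_dist[s + x] = new_dist.get(s + x, 0.0) + ps * px
        dist = new_dist
    return dist


def normal_interval(lo, hi, mean, var, correction=0.0):
    """Normal approximation to P(lo <= S <= hi), widening by `correction`."""
    approx = NormalDist(mean, math.sqrt(var))
    return approx.cdf(hi + correction) - approx.cdf(lo - correction)


if __name__ == "__main__":
    dist = sum_pmf(WORM_PMF, 250)
    exact = sum(dist[s] for s in range(225, 261))
    print("exact            ", round(exact, 4))
    print("normal, no c.c.  ", round(normal_interval(225, 260, 250, 250), 4))
    print("normal, with c.c.", round(normal_interval(225, 260, 250, 250, 0.5), 4))
```

Setting `correction=0.5` widens the interval by half a unit at each end, which is exactly the step from 225 and 260 to 224.5 and 260.5 described above.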
Notes:
(1) A continuity correction should not be applied when approximating a continuous distribution by the normal distribution. Since it involves going halfway to the next possible value of $x$, there would be no adjustment to make if $x$ takes real values.
(2) Rather than trying to guess or remember when to add .5 and when to subtract .5, it is often helpful to sketch a histogram and shade the bars we wish to include. It should then be obvious which value to use.
(3) When you are approximating a probability such as $P(X = 50)$ where $X$ is Binomial(100, 0.5) it is essential to use the continuity correction because without it, we obtain the silly approximation $P(X = 50) \simeq 0$.
Example: Normal approximation to the Poisson Distribution
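Let $X$ be a random variable with a Poisson($\lambda$) distribution and suppose $\lambda$ is large. For the moment suppose that $\lambda$ is an integer and recall that if we add $\lambda$ independent Poisson random variables, each with parameter 1, then the sum has the Poisson distribution with parameter $\lambda$. In general, a Poisson random variable with large expected value can be written as the sum of a large number of independent random variables, and so the central limit theorem implies that it must be close to normally distributed. We can prove this using moment generating functions. In Section 7.5 we found the moment generating function of a Poisson random variable $X$:
$$M_X(t) = e^{-\lambda + \lambda e^{t}}.$$
Then the standardized random variable is
$$Z = \frac{X - \lambda}{\sqrt{\lambda}}$$
and this has moment generating function
$$\begin{aligned}
M_Z(t) = E(e^{Zt}) &= E\left(e^{t\left(\frac{X - \lambda}{\sqrt{\lambda}}\right)}\right) \\
&= e^{-t\sqrt{\lambda}}\, E\left(e^{Xt/\sqrt{\lambda}}\right) \\
&= e^{-t\sqrt{\lambda}}\, M_X\left(t/\sqrt{\lambda}\right)
\end{aligned}$$
This is easier to work with if we take logarithms,
$$\ln(M_Z(t)) = -t\sqrt{\lambda} - \lambda + \lambda e^{t/\sqrt{\lambda}} = \lambda\left(e^{t/\sqrt{\lambda}} - 1 - \frac{t}{\sqrt{\lambda}}\right).$$
Now as $\lambda \to \infty$,
$$\frac{t}{\sqrt{\lambda}} \to 0$$
and
$$e^{t/\sqrt{\lambda}} = 1 + \frac{t}{\sqrt{\lambda}} + \frac{1}{2}\frac{t^2}{\lambda} + o(\lambda^{-1})$$
so
$$\begin{aligned}
\ln(M_Z(t)) &= \lambda\left(e^{t/\sqrt{\lambda}} - 1 - \frac{t}{\sqrt{\lambda}}\right) \\
&= \lambda\left(\frac{t^2}{2\lambda} + o(\lambda^{-1})\right) \\
&\to \frac{t^2}{2} \quad \text{as } \lambda \to \infty.
\end{aligned}$$
Therefore the moment generating function of the standardized Poisson random variable $Z$ approaches $e^{t^2/2}$, the moment generating function of the standard normal, and this implies that the Poisson distribution approaches the normal as $\lambda \to \infty$.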
Example: Suppose $X \sim$ Poisson($\lambda$). Use the normal approximation to approximate
$$P(X > \lambda).$$
Compare this approximation with the true value when $\lambda = 9$.
Solution: We have verified above that the moment generating function of the Poisson($\lambda$) distribution approaches $e^{t^2/2}$, the moment generating function of the standard normal distribution, as $\lambda \to \infty$. This implies that the cumulative distribution function of the standardized random variable
$$Z_\lambda = \frac{X - \lambda}{\sqrt{\lambda}}$$
(note: identify $E(X)$ and $\mathrm{Var}(X)$ in the above standardization) approaches the cumulative distribution function of a standard normal random variable $Z$. In particular, without a continuity correction,
$$P(X > \lambda) = P(Z_\lambda > 0) \to P(Z > 0) = \frac{1}{2} \quad \text{as } \lambda \to \infty.$$
Computing the true value when $\lambda = 9$,
$$P(X > 9) = 1 - P(X \le 9) = 1 - e^{-9} - 9e^{-9} - \frac{9^2}{2!}e^{-9} - \cdots - \frac{9^9}{9!}e^{-9} = 1 - 0.5874 = 0.4126.$$
There is a considerable difference here between the true value 0.4126 and the normal approximation $\frac{1}{2}$ since the value of $\lambda = 9$ is still quite small. However, if we use the continuity correction when we apply the normal approximation,
$$P(X > 9) = P(X > 9.5) = P\left(Z_\lambda > \frac{0.5}{3}\right) \approx P(Z > 0.1667) = 0.4338$$
which is much closer to the true value 0.4126.
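The comparison for $\lambda = 9$ can be reproduced numerically, and the same code shows how the approximation behaves for other values of $\lambda$. This is a sketch only, with invented helper names; the Poisson c.d.f. is summed term by term and the normal probabilities come from Python's standard-library `NormalDist`.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam), summing the probability function."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for x in range(1, k + 1):
        term *= lam / x    # P(X = x) obtained from P(X = x - 1)
        total += term
    return total


def poisson_upper_tail_normal(k, lam, continuity=True):
    """Normal approximation to P(X > k), optionally continuity corrected."""
    cut = k + 0.5 if continuity else k
    return 1.0 - NormalDist(lam, math.sqrt(lam)).cdf(cut)


if __name__ == "__main__":
    lam = 9
    print("exact            ", round(1.0 - poisson_cdf(lam, lam), 4))
    print("normal, no c.c.  ", round(poisson_upper_tail_normal(lam, lam, False), 4))
    print("normal, with c.c.", round(poisson_upper_tail_normal(lam, lam, True), 4))
```

Trying larger values of `lam` illustrates the limiting statement: the exact tail probability moves toward one half as $\lambda$ grows.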
Normal approximation to the Binomial Distribution
It is well-known that the binomial distribution, at least for large values of $n$, resembles a bell-shaped or normal curve. The most common demonstration of this is with a mechanical device common in science museums called a "Galton board" or "Quincunx"^44 which drops balls through a mesh of equally spaced pins (see Figure 9.13 and the applet at http://javaboutique.internet.com/BallDrop/). Notice that if balls either go to the right or left at each of the 8 levels of pins, independently of the movement of the other balls, then $X$ = number of moves to the right has a $Bin(8, \frac{1}{2})$ distribution. If the balls are dropped from location 0 (on the $x$-axis) then the ball eventually rests at location $2X - 8$, which is approximately normally distributed since $X$ is approximately normal.
[Figure 9.13: A "Galton Board" or "Quincunx"]
^44 The word comes from Latin quinque (five) unicia (twelve) and means five twelfths.
The following result is easily proved using the Central Limit Theorem.
Theorem 38 Let $X$ have a binomial distribution, $Bi(n, p)$. Then for $n$ large, the random variable
$$W = \frac{X - np}{\sqrt{np(1-p)}}$$
is approximately $N(0, 1)$.
Proof: We use indicator variables $X_i$ $(i = 1, \ldots, n)$ where $X_i = 1$ if the $i$th trial in the binomial process is an $S$ outcome and 0 if it is an $F$ outcome. Then $X = \sum_{i=1}^{n} X_i$ and we can use the CLT. Since
$$\mu = E(X_i) = p, \quad \text{and} \quad \sigma^2 = \mathrm{Var}(X_i) = p(1-p)$$
we have that as $n \to \infty$
$$\frac{\sum X_i - np}{\sqrt{np(1-p)}} = \frac{X - np}{\sqrt{np(1-p)}}$$
is $N(0, 1)$, as stated.
An alternative proof uses moment generating functions and is essentially a proof of this particular case of the Central Limit Theorem. Recall that the moment generating function of the binomial random variable $X$ is
$$M_X(t) = (1 - p + pe^{t})^{n}.$$
As we did with the standardized Poisson random variable, we can show with some algebraic effort that the moment generating function of $W$ satisfies
$$E(e^{Wt}) \to e^{t^2/2} \quad \text{as } n \to \infty,$$
proving that the standardized binomial random variable $W$ approaches the standard normal distribution.
Remark: We can write the normal approximation either as $W \sim N(0, 1)$ or as $X \sim N(np, np(1-p))$.
Remark: The continuity correction method can be used here. The following numerical example illustrates the procedure.
Example: If (i) $X \sim Bi(n = 20, p = .4)$, use the theorem to find the approximate probability $P(4 \le X \le 12)$ and (ii) if $X \sim Bi(100, .4)$ find the approximate probability $P(34 \le X \le 48)$. Compare the answer with the exact value in each case.
Solution: (i) By the theorem above, $X \sim N(8, 4.8)$ approximately. Without the continuity correction,
$$P(4 \le X \le 12) = P\left(\frac{4 - 8}{\sqrt{4.8}} \le \frac{X - 8}{\sqrt{4.8}} \le \frac{12 - 8}{\sqrt{4.8}}\right) \doteq P(-1.826 \le Z \le 1.826) = 0.932$$
where $Z \sim N(0, 1)$. Using the continuity correction method, we get
$$P(4 \le X \le 12) \doteq P\left(\frac{3.5 - 8}{\sqrt{4.8}} \le Z \le \frac{12.5 - 8}{\sqrt{4.8}}\right) = P(-2.054 \le Z \le 2.054) = 0.960$$
The exact probability is $\sum_{x=4}^{12} \binom{20}{x} (.4)^{x} (.6)^{20-x}$, which (using the R function pbinom( )) is 0.963. As expected, the continuity correction method gives a more accurate approximation.
(ii) $X \sim N(40, 24)$ approximately, so without the continuity correction
$$\begin{aligned}
P(34 \le X \le 48) &\simeq P\left(\frac{34 - 40}{\sqrt{24}} \le Z \le \frac{48 - 40}{\sqrt{24}}\right) \simeq P(-1.225 \le Z \le 1.633) \\
&\simeq 0.9488 - (1 - 0.8897) = 0.8385
\end{aligned}$$
With the continuity correction
$$\begin{aligned}
P(34 \le X \le 48) &\simeq P\left(\frac{33.5 - 40}{\sqrt{24}} \le Z \le \frac{48.5 - 40}{\sqrt{24}}\right) \simeq P(-1.327 \le Z \le 1.735) \\
&\simeq 0.9586 - (1 - 0.9076) = 0.866
\end{aligned}$$
The exact value, $\sum_{x=34}^{48} f(x)$, equals 0.866 (correct to 3 decimals). The error of the normal approximation decreases as $n$ increases, but it is a good idea to use the continuity correction when it is convenient. For example, if we are using a normal approximation to a discrete distribution like the binomial which takes integer values and the standard deviation of the binomial is less than 10, then the continuity correction makes a difference of $0.5/10 = 0.05$ to the number we look up in the table. This can result in a difference in the probability of up to around 0.02. If you are willing to tolerate errors in probabilities of that magnitude, your rule of thumb might be to use the continuity correction whenever the standard deviation of the integer-valued random variable being approximated is less than 10.
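Both parts of this example can be checked by computing the exact binomial sum alongside the two normal approximations. The following is a minimal sketch with hypothetical function names, using only Python's standard library (`math.comb` for the binomial coefficients and `NormalDist` for the normal c.d.f.); it is not the pbinom routine mentioned above.

```python
import math
from statistics import NormalDist


def binomial_interval_exact(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Bi(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(lo, hi + 1))


def binomial_interval_normal(n, p, lo, hi, continuity=True):
    """Normal approximation using X ~ N(np, np(1 - p)) approximately."""
    shift = 0.5 if continuity else 0.0
    approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    return approx.cdf(hi + shift) - approx.cdf(lo - shift)


if __name__ == "__main__":
    for n, lo, hi in ((20, 4, 12), (100, 34, 48)):
        print(n,
              round(binomial_interval_exact(n, 0.4, lo, hi), 3),
              round(binomial_interval_normal(n, 0.4, lo, hi, False), 3),
              round(binomial_interval_normal(n, 0.4, lo, hi, True), 3))
```

Each printed row gives the exact value, the uncorrected approximation and the continuity-corrected approximation, in that order.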
Example: Let j betheproportionof CanadianswhothinkCanadashouldadopt theUSdollar.
a) Suppose400Canadiansarerandomlychosenandaskedtheir opinion. LetA bethenumber who
sayyes. Findtheprobabilitythat theproportion,
A
400
, of peoplewhosayyesiswithin0.02 of j,
if j is0.20.
b) Findthenumber, :, whomust besurveyedsothereisa95%chancethat
A
a
lieswithin0.02 of j.
Againsupposej is0.20.
c) Repeat (b) whenthevalueof j isunknown.
Solution:
a) A 1i (400, .2). Usingthenormal approximationwetake
A Normal withmean:j = (400)(.2) = 80 andvariance:j(1 j) = (400)(.2)(.8) = 64
222
If
A
400
lieswithinj .02, then.18
A
400
.22, so72 A 88. Thus, wend
1 (72 A 88)
.
= 1

71.5 80

64
< 7 <
88.5 80

64

= 1 (1.06 < 7 < 1.06) = .711


b) Since $n$ is unknown, it is difficult to apply a continuity correction, so we omit it in this part. By the normal approximation,
$$X \sim N(np = .2n,\; np(1-p) = .16n)$$
Therefore,
$$\frac{X}{n} \sim N\left(\frac{0.2n}{n} = 0.2,\; \frac{0.16n}{n^2} = \frac{0.16}{n}\right)$$
(Recall $Var(aX) = a^2 Var(X)$.)
$P(.18 \le X/n \le .22) = .95$ is the condition we need to satisfy. This gives
$$P\left(\frac{.18 - .2}{\sqrt{.16/n}} \le Z \le \frac{.22 - .2}{\sqrt{.16/n}}\right) = .95$$
$$P(-.05\sqrt{n} \le Z \le .05\sqrt{n}) = 0.95$$
Therefore $P(Z \le .05\sqrt{n}) = .975$ and so $.05\sqrt{n} = 1.9600$, giving $n = 1536.64$. In other words, we need to survey 1537 people to be at least 95% sure that $X/n$ lies within .02 either side of $p$.
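c) Now using the normal approximation to the binomial, approximately $X \sim N(np, np(1-p))$ and so
$$\frac{X}{n} \sim N\left(p,\; \frac{p(1-p)}{n}\right)$$
We wish to find $n$ such that
$$0.95 = P\left(p - .02 \le \frac{X}{n} \le p + .02\right) = P\left(\frac{p - .02 - p}{\sqrt{p(1-p)/n}} \le Z \le \frac{p + .02 - p}{\sqrt{p(1-p)/n}}\right) = P\left(\frac{-.02}{\sqrt{p(1-p)/n}} \le Z \le \frac{.02}{\sqrt{p(1-p)/n}}\right)$$
As in part (b),
$$P\left(Z \le \frac{.02}{\sqrt{p(1-p)/n}}\right) = .975, \qquad \frac{.02\sqrt{n}}{\sqrt{p(1-p)}} = 1.96$$
Solving for $n$,
$$n = \left(\frac{1.96}{.02}\right)^2 p(1-p)$$
Unfortunately this does not give us an explicit expression for $n$ because we don't know $p$. The way out of this dilemma is to find the maximum value $\left(\frac{1.96}{.02}\right)^2 p(1-p)$ could take. If we choose $n$ this large, then we can be sure of having the required precision in our estimate, $X/n$, for any $p$. Since $p(1-p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2$, it is easy to see that $p(1-p)$ is a maximum when $p = \frac{1}{2}$. Therefore we take
$$n = \left(\frac{1.96}{.02}\right)^2 \frac{1}{2}\left(1 - \frac{1}{2}\right) = 2401$$
i.e., if we survey 2401 people we can be 95% sure that $X/n$ lies within .02 of $p$, regardless of the value of $p$.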
Remark: This method is used when poll results are reported in the media: you often see or hear that "this poll is accurate to within 3 percent 19 times out of 20". This is saying that $n$ was big enough so that $P(p - .03 \le X/n \le p + .03)$ was 95%. (This requires $n$ of about 1067.)
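The sample-size formula $n = (z/d)^2 p(1-p)$ used in parts (b) and (c) can be sketched in a few lines. The function name `sample_size` and its parameters are illustrative, not part of the notes; the sketch uses the exact normal quantile rather than the rounded table value 1.96 and rounds up to a whole number of respondents, so results can differ by one from hand calculations.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n with P(|X/n - p| <= margin) >= confidence,
    using the normal approximation without a continuity correction."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    return ceil((z / margin) ** 2 * p * (1 - p))

print(sample_size(0.02, p=0.2))  # part (b): p known to be 0.20
print(sample_size(0.02))         # part (c): worst case p = 1/2
print(sample_size(0.03))         # the "3 percent, 19 times out of 20" poll
```

Defaulting to $p = 1/2$ reflects the worst-case argument in part (c): any other value of $p$ needs a smaller sample.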
Problems:
9.6.1 Tomato seeds germinate (sprout to produce a plant) independently of each other, with probability 0.8 of each seed germinating. Give an expression for the probability that at least 75 seeds out of 100 which are planted in soil germinate. Evaluate this using a suitable approximation.
9.6.2 A metal parts manufacturer inspects each part produced. 60% are acceptable as produced, 30% have to be repaired, and 10% are beyond repair and must be scrapped. It costs the manufacturer $10 to repair a part, and $100 (in lost labour and materials) to scrap a part. Find the approximate probability that the total cost associated with inspecting 80 parts will exceed $1200.
9.7 Problems on Chapter 9
9.1 The diameters $X$ of spherical particles produced by a machine are randomly distributed according to a uniform distribution on [.6, 1.0] (cm). Find the distribution of $Y$, the volume of a particle.
9.2 A continuous random variable $X$ has p.d.f.
$$f(x) = k(1 - x^2), \quad -1 \le x \le 1.$$
(a) Find $k$ and the c.d.f. of $X$. Graph $f(x)$ and the c.d.f.
(b) Find the value of $c$ such that $P(-c \le X \le c) = .95$.
9.3 a) When people are asked to make up a random number between 0 and 1, it has been found that the distribution of the numbers, $X$, has p.d.f. close to
$$f(x) = \begin{cases} 4x; & 0 < x \le 1/2 \\ 4(1 - x); & 1/2 < x < 1 \end{cases}$$
(rather than the $U[0, 1]$ distribution which would be expected). Find the mean and variance of $X$.
b) For 100 random numbers from the above distribution find the probability their sum lies between 49.0 and 50.5.
c) What would the answer to (b) be if the 100 numbers were truly $U[0, 1]$?
9.4 Let $X$ have p.d.f. $f(x) = \frac{1}{20}$; $-10 < x < 10$, and let $Y = \frac{X + 10}{20}$. Find the p.d.f. of $Y$.
9.5 A continuous random variable $X$ which takes values between 0 and 1 has probability density function
$$f(x) = (\alpha + 1) x^{\alpha}; \quad 0 < x < 1$$
a) For what values of $\alpha$ is this a p.d.f.? Explain.
b) Find $P\left(X \le \frac{1}{2}\right)$ and $E(X)$.
c) Find the probability density function of $T = 1/X$.
9.6 The magnitudes of earthquakes in a region of North America can be modelled by an exponential distribution with mean 2.5 (measured on the Richter scale).
(a) If 3 earthquakes occur in a given month, what is the probability that none exceed 5 on the Richter scale?
(b) If an earthquake exceeds 4, what is the probability it also exceeds 5?
9.7 A certain type of light bulb has lifetimes that follow an exponential distribution with mean 1000 hours. Find the median lifetime (that is, the lifetime $x$ such that 50% of the light bulbs fail before $x$).
9.8 The examination scores obtained by a large group of students can be modelled by a normal distribution with a mean of 65% and a standard deviation of 10%.
(a) Find the percentage of students who obtain each of the following letter grades:
A ($\ge$ 80%), B (70 - 80%), C (60 - 70%), D (50 - 60%), F (< 50%)
(b) Find the probability that the average score in a random group of 25 students exceeds 70%.
(c) Find the probability that the average scores of two distinct random groups of 25 students differ by more than 5%.
9.9 The number of liters $X$ that a filling machine in a water bottling plant deposits in a nominal two liter bottle follows a normal distribution $N(\mu, \sigma^2)$, where $\sigma = .01$ (liters) and $\mu$ is the setting on the machine.
(a) If $\mu = 2.00$, what is the probability a bottle has less than 2 liters of water in it?
(b) What should $\mu$ be set at to make the probability a bottle has less than 2 liters be less than .01?
9.10 A turbine shaft is made up of 4 different sections. The lengths of those sections are independent and have normal distributions with $\mu$ and $\sigma$: (8.10, .22), (7.25, .20), (9.75, .24), and (3.10, .20). What is the probability an assembled shaft meets the specifications $28 \pm .26$?
9.11 Let $X \sim G(9.5, 2)$ and $Y \sim N(-2.1, 0.75)$ be independent.
Find:
(a) $P(9.0 < X < 11.1)$
(b) $P(X + 4Y > 0)$
(c) a number $b$ such that $P(X > b) = .90$.
9.12 The amount, $A$, of wine in a bottle $\sim N(1.05\,l, .0004\,l^2)$. (Note: $l$ means liters.)
a) The bottle is labelled as containing $1\,l$. What is the probability a bottle contains less than $1\,l$?
b) Casks are available which have a volume, $V$, which is $N(22\,l, .16\,l^2)$. What is the probability the contents of 20 randomly chosen bottles will fit inside a randomly chosen cask?
9.13 In problem 8.18, calculate the probability of passing the exam, both with and without guessing if (a) each $p_i = .45$; (b) each $p_i = .55$.
What is the best strategy for passing the course if (a) $p_i = .45$ (b) $p_i = .55$?
9.14 Suppose that the diameters in millimeters of the eggs laid by a large flock of hens can be modelled by a normal distribution with a mean of 40 mm and a variance of 4 mm$^2$. The wholesale selling price is 5 cents for an egg less than 37 mm in diameter, 6 cents for eggs between 37 and 42 mm, and 7 cents for eggs over 42 mm. What is the average wholesale price per egg?
9.15 In a survey of $n$ voters from a given riding in Canada, the proportion $x/n$ who say they would vote Conservative is used to estimate $p$, the probability a voter would vote P.C. ($x$ is the number of Conservative supporters in the survey.) If Conservative support is actually 16%, how large should $n$ be so that with probability .95, the estimate will be in error at most .03?
9.16 When blood samples are tested for the presence of a disease, samples from 20 people are pooled and analysed together. If the analysis is negative, none of the 20 people is infected. If the pooled sample is positive, at least one of the 20 people is infected so they must each be tested separately; i.e., a total of 21 tests is required. The probability a person has the disease is .02.
a) Find the mean and variance of the number of tests required for each group of 20.
b) For 2000 people, tested in groups of 20, find the mean and variance of the total number of tests. What assumption(s) has been made about the pooled samples?
c) Find the approximate probability that more than 800 tests are required for the 2000 people.
9.17 Suppose 80% of people who buy a new car say they are satisfied with the car when surveyed one year after purchase. Let $X$ be the number of people in a group of 60 randomly chosen new car buyers who report satisfaction with their car. Let $Y$ be the number of satisfied owners in a second (independent) survey of 62 randomly chosen new car buyers. Using a suitable approximation, find $P(|X - Y| \ge 3)$. A continuity correction is expected.
9.18 Suppose that the unemployment rate in Canada is 7%.
(a) Find the approximate probability that in a random sample of 10,000 persons in the labour force, the number of unemployed will be between 675 and 725 inclusive.
(b) How large a random sample would it be necessary to choose so that, with probability .95, the proportion of unemployed persons in the sample is between 6.9% and 7.1%?
9.19 Gambling. Your chances of winning or losing money can be calculated in many games of chance as described here.
Suppose each time you play a game (or place a bet) of $1 that the probability you win (thus ending up with a profit of $1) is .49 and the probability you lose (meaning your "profit" is -$1) is .51.
(a) Let $X$ represent your profit after $n$ independent plays or bets. Give a normal approximation for the distribution of $X$.
(b) If $n = 20$, determine $P(X \ge 0)$. (This is the probability you are "ahead" after 20 plays.) Also find $P(X \ge 0)$ if $n = 50$ and $n = 100$. What do you conclude?
Note: For many casino games (roulette, blackjack) there are bets for which your probability of winning is only a little less than .5. However, as you play more and more times, the probability you lose (end up "behind") approaches 1.
(c) Suppose now you are the casino. If all players combined place $n = 100,000$ $1 bets in an evening, let $X$ be your profit. Find the value $c$ with the property that $P(X \ge c) = .99$. Explain in words what this means.
9.20 Gambling: Crown and Anchor. Crown and Anchor is a game that is sometimes played at charity casinos or just for fun. It can be played with a "wheel of fortune" or with 3 dice, in which each die has its 6 sides labelled with a crown, an anchor, and the four card suits club, diamond, heart and spade, respectively. You bet an amount (let's say $1) on one of the 6 symbols: let's suppose you bet on "heart". The 3 dice are then rolled simultaneously and you win $t$ dollars if $t$ hearts turn up ($t = 0, 1, 2, 3$).
(a) Let $X$ represent your profits from playing the game $n$ times. Give a normal approximation for the distribution of $X$.
(b) Find (approximately) the probability that $X \ge 0$ if (i) $n = 10$, (ii) $n = 50$.
9.21 Binary classification. Many situations require that we "classify" a unit of some type as being one of two types, which for convenience we will term Positive and Negative. For example, a diagnostic test for a disease might be positive or negative; an email message may be spam or not spam; a credit card transaction may be fraudulent or not. The problem is that in many cases we cannot tell for certain whether a unit is Positive or Negative, so when we have to decide which a unit is, we may make errors. The following framework helps us to deal with these problems.
For a randomly selected unit from the population being considered, define the indicator random variable
$$Y = I(\text{unit is Positive})$$
Suppose that we cannot know for certain whether $Y = 0$ or $Y = 1$ for a given unit, but that we can get a measurement $X$ with the property that
If $Y = 1$, $X \sim N(\mu_1, \sigma_1^2)$
If $Y = 0$, $X \sim N(\mu_0, \sigma_0^2)$
where $\mu_1 > \mu_0$. We now decide to classify units as follows, based on their measurement $X$: select some value $d$ between $\mu_0$ and $\mu_1$, and then
if $X \ge d$, classify the unit as Positive
if $X < d$, classify the unit as Negative
(a) Suppose $\mu_0 = 0$, $\mu_1 = 10$, $\sigma_0 = 4$, $\sigma_1 = 6$ and $d = 5$. Find the probability that
(i) If a unit is really Positive, it is wrongly classified as Negative. (This is called the "false negative" probability.)
(ii) If a unit is really Negative, it is wrongly classified as Positive. (This is called the "false positive" probability.)
(b) Repeat the calculations if $\mu_0 = 0$, $\mu_1 = 10$ as in (a), but $\sigma_0 = 3$, $\sigma_1 = 3$. Explain in plain English why the false negative and false positive misclassification probabilities are smaller than in (a).
9.22 Binary classification and spam detection. The approach in the preceding question can be used for problems such as spam detection, which was discussed earlier in Problems 4.17 and 4.18. Instead of using binary features as in those problems, suppose that for a given email message we compute a measure $X$, designed so that $X$ tends to be high for spam messages and low for regular (non-spam) messages. (For example $X$ can be a composite measure based on the presence or absence of certain words in a message, as well as other features.) We will treat $X$ as a continuous random variable.
Suppose that for spam messages, the distribution of $X$ is approximately $N(\mu_1, \sigma_1^2)$, and that for regular messages, it is approximately $N(\mu_0, \sigma_0^2)$, where $\mu_1 > \mu_0$. This is the same setup as for Problem 9.21. We will filter spam by picking a value $d$, and then filtering any message for which $X \ge d$. The trick here is to decide what value of $d$ to use.
(a) Suppose that $\mu_0 = 0$, $\mu_1 = 10$, $\sigma_0 = 3$, $\sigma_1 = 3$. Calculate the probability of a false positive (filtering a message that is regular) and a false negative (not filtering a message that is spam) under each of the three choices (i) $d = 5$ (ii) $d = 4$ (iii) $d = 6$.
(b) What factors would determine which of the three choices of $d$ would be best to use?
Figure 9.14: Bertrand's Paradox
9.23 Random chords of a circle. Given a circle, find the probability that a chord chosen at random is longer than the side of an inscribed equilateral triangle. For example in Figure 9.14, the line joining $A$ and $B$ satisfies the condition, the other lines do not. This is called Bertrand's paradox (see the Java applet at http://www.cut-the-knot.org/bertrand.shtml) and there are various possible solutions, depending on exactly how you interpret the phrase "a chord chosen at random". For example, since the only important thing is the position of the second point relative to the first one, we can fix the point $A$ and consider only the chords that emanate from this point. Then it becomes clear that 1/3 of the outcomes (those with angle with the tangent at that point between 60 and 120 degrees) will result in a chord longer than the side of an equilateral triangle. But a chord is fully determined by its midpoint. Chords whose length exceeds the side of an equilateral triangle have their midpoints inside a smaller circle with radius equal to 1/2 that of the given one. If we choose the midpoint of the chord at random and uniformly from the points within the circle, what is the probability that the corresponding chord has length greater than the side of the triangle? Can you think of any other interpretations which lead to different answers?
9.24 A model for stock returns. A common model for stock returns is as follows: the number of trades $N$ of stock XXX in a given day has a Poisson distribution with parameter $\lambda$. At each trade, say the $i$'th trade, the change in the price of the stock is $X_i$ and has a normal distribution with mean 0 and variance $\sigma^2$, say, and these changes are independent of one another and independent of $N$. Find the moment generating function of the total change in stock price over the day. Is this a distribution that you recognise? What is its mean and variance?
9.25 Let $X_1, X_2, \ldots, X_n$ be independent random variables with a Normal distribution having mean 1 and variance 2. Find the moment generating function for
(a) $X_1$
(b) $X_1 + X_2$
(c) $S_n = X_1 + X_2 + \ldots + X_n$
(d) $n^{-1/2}(S_n - n)$
9.26 Challenge problem: Suppose $U_1, U_2, \ldots$ is a sequence of independent $U[0, 1]$ random variables. For a given number $k$, define the random variable
$$N = \min\left\{n;\; \sum_{i=1}^{n} U_i \ge k\right\}$$
What is the expected value of $N$? How would you approximate this expected value if $k$ were large?
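9.27 A continuous random variable $X$ is said to have the Gamma distribution with parameters $\alpha > 0$ and $\beta > 0$ if
$$f(x) = \begin{cases} \dfrac{x^{\alpha - 1}}{\Gamma(\alpha)\beta^{\alpha}} e^{-x/\beta} & x > 0 \\ 0 & \text{oth.} \end{cases}$$
Here, $\Gamma(\cdot)$ is the Gamma function introduced in class.
(a)- Show that $f(\cdot)$ is a legitimate Probability Density Function.
(b)- Use the properties of the Gamma function to obtain $E(X)$ and $Var(X)$.
(c)- Verify that setting $\alpha = 1$ results in the exponential distribution with parameter $\lambda = \frac{1}{\beta}$.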
9.28 A random variable $X$ is said to have a Cauchy distribution with parameter $\alpha > 0$ if
$$f_X(x) = \frac{\alpha}{\pi(\alpha^2 + x^2)}, \quad x \in \mathbb{R}.$$
(a)- Show that $E(X)$ and $Var(X)$ both do not exist.
(b)- Let $Y = \frac{1}{X}$. Show that $Y$ has a Cauchy distribution with parameter $\frac{1}{\alpha}$.
(c)- Find the c.d.f. $F(\cdot)$ and the inverse c.d.f. $F^{-1}(\cdot)$ for the random variable $X$.
(d)- Assume we have access to a uniform random variable $U \sim$ Uniform[0, 1]. Suggest a function $g(\cdot)$ such that $g(U)$ is a Cauchy random variable with parameter $\alpha$.
9.29 Let $A$ be an event and $X$ be a continuous random variable. We define
$$f_{X|A}(x|A) = \frac{d}{dx} P(X \le x | A) \qquad (1)$$
and
$$P(A | X = x) = \frac{f_{X|A}(x|A) P(A)}{f_X(x)}. \qquad (2)$$
(a)- Let $A = \{|X| < 1\}$. Show that
$$f_{X|A}(x|A) = \begin{cases} \dfrac{f_X(x)}{P(|X| < 1)} & |x| < 1 \\ 0 & |x| \ge 1 \end{cases}.$$
(b)- Show that for any event $A$,
$$P(A) = \int_{-\infty}^{\infty} P(A | X = x) f_X(x)\,dx.$$
(c)- Assume we have a coin which is not fair and we do not know the probability of heads. As such, we model the probability of heads by a random variable $X \sim$ Uniform(0, 1). This is an appropriate model as the probability of heads can be any number in the interval [0, 1]. We want to find the probability of observing $k$ heads in $n$ tosses of the coin. Call this event $A$. Clearly,
$$P(A | X = x) = \binom{n}{k} x^k (1 - x)^{n-k}.$$
Use the result in part (b) to show that
$$P(A) = \frac{1}{n + 1}.$$
Hint. Use the identity $\int_0^1 \theta^u (1 - \theta)^v\,d\theta = \frac{u!\,v!}{(u + v + 1)!}$.
9.30 (a)- Jones figures that the total number of thousands of kilometers that an auto can be driven before it needs to be junked is an exponential random variable with parameter $\frac{1}{20}$. Smith has a used car that he claims has been driven 10,000 kilometers. If Jones purchases the car, what is the probability that she would get at least 20,000 additional kilometers out of it?
(b)- Repeat under the assumption that the lifetime kilometer-age of the car is not exponentially distributed but rather is (in thousands of kilometers) uniformly distributed over (0, 40).
9.31 A manufacturer produces bolts that are specified to be between 1.19 and 1.21 inches in diameter. If his production process results in a bolt's diameter being normally distributed with mean 1.20 inches and standard deviation 0.005, what percentage of bolts will not meet specifications?
9.32 Let $T_1, T_2, \ldots, T_n$ denote the first $n$ interarrival times of a Poisson process $X_t$ (recall that $X_t$ is the number of hits in $[0, t)$) with intensity $\lambda$ and set $S_n = \sum_{i=1}^{n} T_i$.
(a)- What is the interpretation of $S_n$?
(b)- Argue that the two events $\{S_n \le t\}$ and $\{X_t \ge n\}$ are identical.
(c)- Use (b) to show that
$$P(S_n \le t) = 1 - \sum_{j=0}^{n-1} \frac{e^{-\lambda t}(\lambda t)^j}{j!}.$$
(d)- By differentiating the c.d.f. of $S_n$ given in (c), show that $S_n$ is a Gamma random variable (introduced in Problem 1 above) with parameters $\alpha = n$ and $\beta = \lambda$.
Sample Final Exam
Part I. Multiple Choice Questions
There are four choices for each of the following questions and only one of them is correct. Circle the correct answer for each of the questions. (Total mark: 36; 3 marks for each correctly circled question and 0 otherwise)
1. Suppose that the sample space is S = {1, 2, 3, 4, 5, 6, 7, 8} and events A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. What is the subset for event

1 ?
(a) {1, 2, 3, 4, 5, 6} (b) {1, 2} (c) {5, 6} (d) {7, 8}
2. Suppose P(A) = 0.5, P(B) = 0.3, and A and B are independent. What is the probability P(

1)?
(a) 0.20 (b) 0.15 (c) 0.35 (d) 0.65
3. Suppose $P(A) = 0.4$, $P(B) = 0.3$, and $A$ and $B$ are mutually exclusive (i.e. disjoint). What is the probability $P(A | B)$?
(a) 0.00 (b) 0.10 (c) 0.30 (d) 0.40
4. Let $X$ be a discrete random variable with probability function $f(i) = P(X = i) = i/6$, $i = 1, 2, 3$. What is $E(X^2 - 1)$?
(a) 3 (b) 4 (c) 5 (d) 6
5. Let $X$ be a continuous random variable with probability density function (pdf) $f_X(x) = 2x$ for $0 \le x \le 1$ and $f_X(x) = 0$ otherwise. What is the probability $P(0.5 < X < 1.5)$?
(a) 0.25 (b) 0.50 (c) 0.75 (d) 1.00
6. Let $X$ be a continuous random variable with probability density function (pdf) $f_X(x) = 2x$ for $0 \le x \le 1$ and $f_X(x) = 0$ otherwise. What is the pdf of $Y = \sqrt{X}$?
(a) $f_Y(y) = 2\sqrt{y}$ for $0 \le y \le 1$ and $f_Y(y) = 0$ otherwise
(b) $f_Y(y) = 2y^2$ for $0 \le y \le 1$ and $f_Y(y) = 0$ otherwise
(c) $f_Y(y) = 3y^2$ for $0 \le y \le 1$ and $f_Y(y) = 0$ otherwise
(d) $f_Y(y) = 4y^3$ for $0 \le y \le 1$ and $f_Y(y) = 0$ otherwise
Questions 7, 8 and 9 are based on the following tabulated joint probability function $f(x, y) = P(X = x, Y = y)$ of random variables $X$ and $Y$:
                  x
f(x, y)      1     2     3
       0    0.2   0.1   0.0
  y    1    0.1   0.1   0.2
       2    0.0   0.0   0.3
7. What is the conditional probability $P(Y = 1 | X = 2)$?
(a) 0.2 (b) 0.3 (c) 0.4 (d) 0.5
8. What is the probability $P(X = Y)$?
(a) 0.0 (b) 0.1 (c) 0.2 (d) 0.3
9. What is the covariance $Cov(X, Y)$?
(a) 0.3 (b) 0.4 (c) 0.5 (d) 0.6
10. Suppose the lifetime of an Olson tire is normally distributed with a mean life of 62,000 kilometers and a standard deviation of 3,000 kilometers. Suppose that you buy four Olson tires. What is the probability that all four will last longer than 65,000 kilometers?
(a) 0.8413 (b) 0.5010 (c) 0.1587 (d) 0.0006
11. David is waiting for the next bus, which arrives according to a Poisson process with the rate of 5 buses per hour. He has already waited for 10 minutes. What is the probability that he'll have to wait for 10 more minutes?
(a) $e^{-5/6}$ (b) $e^{-5/3}$ (c) $e^{-50}$ (d) $e^{-100}$
12. A long line of customers queue overnight to buy tickets to a concert. In the morning, a single cashier arrives and starts selling tickets. The distribution of the service times is the same for all customers and has mean 3 minutes and variance 4 minutes$^2$. Assuming that service times for different customers are independent, what is the approximate probability that the 83rd person in line will meet the cashier within the next 4 hours?
(a) 0.6298 (b) 0.3702 (c) 0.7576 (d) 0.5151
(End of Part I)
Part II. Full Answer Questions
1. Three travellers A, B and C arrive at the Comfort Inn where five single rooms (numbered 1, 2, 3, 4 and 5, all adjacent to each other on the same side of the building) are still available. The three travellers are randomly assigned to three different rooms. Give expressions for the probabilities that
[3] (a) The three travellers are in rooms 1, 2 and 3.
[3] (b) The three travellers are in three adjacent rooms.
[3] (c) Room 1 is not occupied by any of the three travellers.
[3] (d) A and B are in two adjacent rooms but C's room is not adjacent to either A or B.
2. A call center has 5 phone lines which are independent of each other. For each line, calls arrive at the center following a Poisson process with an average of 12 calls per hour. There is a red light for each line, and if there have been more than 2 calls in the past 10 minutes on this line, the red light will flash. The call center also has a single yellow light. When 4 or more red lights flash at the same time, the yellow light will flash.
[3] (a) For any phone line, show that the probability that the red light is flashing is approximately 0.32. (This value can be used later if needed)
[3] (b) Find the probability that the yellow light at the center is flashing.
[3] (c) If the red light of line 1 is flashing, find the probability that the yellow light is also flashing.
[3] (d) If the yellow light is flashing, find the probability that the red light of line 1 is also flashing.
3. Blood samples of 60 patients have arrived at a test center. Each patient has a probability 0.3 of being HIV-positive, independent of each other. The 60 samples are randomly divided into 6 groups, with 10 samples in each group. A pooled blood sample for each group is tested first. If the result is negative, all 10 patients in that group are HIV-negative and hence no more tests are required for that group; if the pooled test is positive, each of the 10 samples will have to be tested one by one so an additional 10 tests are required for that group.
[4] (a) Let $X_1$ be the number of tests required for the first group. Find $E(X_1)$ and $Var(X_1)$.
[3] (b) Let $Y$ be the number of tests required for all 6 groups. Find $E(Y)$ and $Var(Y)$.
[3] (c) Suppose each pooled test costs $25 and each individual test costs $15. Find the expected cost for completing all tests (6 groups).
4. Suppose $X$ and $Y$ are independent random variables with $X \sim N(60, 3^2)$ and $Y \sim N(65, 4^2)$.
[5] (a) Find the value of $c$ such that $P(X - Y + 5 < c) = 0.975$.
[5] (b) Suppose $X_i \sim N(60, 3^2)$, $i = 1, 2, \ldots, n$ are independent, and $Y_i \sim N(65, 4^2)$, $i = 1, 2, \ldots, n$ are independent. The $X_i$'s are also independent of the $Y_i$'s. Find the smallest value of $n$ such that
$$P\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i + 5) < 0.5\right) \ge 0.975 .$$
5. An irregular coin is tossed repeatedly until a head appears. Let $X$ be the number of tosses needed to obtain the first head (including the last toss on which the head is actually obtained). Then a second irregular coin is tossed for $X$ number of times. Let $Y$ be the number of times that the second coin turns up head. Assume that both coins have a probability of 1/3 of turning up head.
[3] (a) Find $P(X = n)$.
[3] (b) Find $P(Y = 0 | X = n)$.
[4] (c) Find $P(Y = 0)$.
6. The Toronto Raptors and the Los Angeles Lakers play a series of 10 games. The Raptors have a winning probability 0.3 for each game, and results from different games are independent. Two consecutive games are called a run if the result for the Raptors is (Win, Win). If the result from 3 consecutive games is (Win, Win, Win), it is counted as two runs. Let $X$ be the number of runs for the Raptors in the 10 game series.
[3] (a) Define suitable indicator variables which could be used for computing the expectation and the variance of $X$.
[3] (b) Find $E(X)$.
[4] (c) Find $Var(X)$.
Solutions to Section Problems
3.1.1 (a) Each student can choose in 4 ways and they each get to choose.
(i) Suppose we list the points in $S$ in a specific order, for example (choice of student A, choice of student B, choice of student C) so that the point (1, 2, 3) indicates A chose section 1, B chose section 2 and C chose section 3. Then $S$ looks like
{(1, 1, 1), (1, 1, 2), (1, 1, 3), ...}
Since each student can choose in 4 ways regardless of the choice of the other two students, by the multiplication rule $S$ has $4 \times 4 \times 4 = 64$ points.
(ii) To satisfy the condition, the first student can choose in 4 ways and the others then only have 1 section they can go in. Therefore the probability they are all in the same section is $\frac{4 \times 1 \times 1}{64} = 1/16$.
(iii) To satisfy the condition, the first to pick has 4 ways to choose, the next has 3 sections left, and the last has 2 sections left. Therefore the probability they are all in different sections is $\frac{4 \times 3 \times 2}{64} = 3/8$.
(iv) To satisfy the condition, each has 3 ways to choose a section. Therefore the probability there is no-one in section 1 is $\frac{3 \times 3 \times 3}{64} = 27/64$.
(b) (i) Now $S$ has $n^s$ points, each a sequence like (1, 2, 3, 2, ...) of length $s$.
(ii) $P$(all in same section) $= n \times 1 \times 1 \times \ldots \times 1 / n^s = 1/n^{s-1}$.
(iii) $P$(different sections) $= n(n-1)(n-2)\ldots(n-s+1)/n^s = \frac{n^{(s)}}{n^s}$.
(iv) $P$(nobody in section 1) $= (n-1)(n-1)(n-1)\ldots(n-1)/n^s = \frac{(n-1)^s}{n^s}$.
3.1.2 (a) There are 26 ways to choose each of the 3 letters, so in all the letters can be chosen in $26 \times 26 \times 26$ ways. If all letters are the same, there are 26 ways to choose the first letter, and only 1 way to choose the remaining 2 letters. So $P$(all letters the same) is $\frac{26 \times 1 \times 1}{26^3} = 1/26^2$.
(b) There are $10 \times 10 \times 10$ ways to choose the 3 digits. The number of ways to choose all even digits is $4 \times 4 \times 4$. The number of ways to choose all odd digits is $5 \times 5 \times 5$. Therefore $P$(all even or all odd) $= \frac{4^3 + 5^3}{10^3} = .189$.
3.1.3 (a) There are 35 symbols in all (26 letters + 9 numbers). The number of different 6-symbol passwords is $35^6 - 26^6$ (we need to subtract off the $26^6$ arrangements in which only letters are used, since there must be at least one number). Similarly, we get the number of 7-symbol and 8-symbol passwords as $35^7 - 26^7$ and $35^8 - 26^8$. The total number of possible passwords is then
$$(35^6 - 26^6) + (35^7 - 26^7) + (35^8 - 26^8).$$
(b) Let $N$ be the answer to part (a) (the total no. of possible passwords). Assuming you never try the same password twice, the probability you find the correct password within the first 1,000 tries is $1000/N$.
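To get a feel for the size of these numbers, the count in part (a) and the probability in part (b) can be evaluated directly. This is only a numerical illustration of the expressions above; the variable names are arbitrary.

```python
# N = number of 6-, 7- or 8-symbol passwords with at least one digit,
# drawn from 26 letters plus 9 digits (35 symbols), as in part (a).
N = sum(35**k - 26**k for k in (6, 7, 8))

# Part (b): probability that one of 1,000 distinct guesses is correct.
p_success = 1000 / N
print(N, p_success)
```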
3.1.4 There are 7! different orders.
(a) We can stick the even digits together in 3! orders. This block of even digits plus the 4 odd digits can be arranged in 5! orders. Therefore $P$(even together) $= \frac{3!\,5!}{7!} = 1/7$.
(b) For even at ends, there are 3 ways to fill the first place, and 2 ways to fill the last place and 5! ways to arrange the middle 5 digits. For odd at ends there are 4 ways to fill the first place and 3 ways to fill the last place and 5! ways to arrange the middle 5 digits. $P$(even or odd at ends) $= \frac{(3)(2)(5!) + (4)(3)(5!)}{7!} = \frac{3}{7}$.
3.1.5 The number of arrangements in $S$ is $\frac{9!}{3!\,2!}$.
(a) E at each end gives $\frac{7!}{2!}$ arrangements of the middle 7 letters. L at each end gives $\frac{7!}{3!}$ arrangements of the middle 7 letters. Therefore $P$(same letter at ends) $= \frac{\frac{7!}{2!} + \frac{7!}{3!}}{\frac{9!}{3!\,2!}} = \frac{1}{9}$.
(b) The X, C and N can be stuck together in 3! ways to form a single unit. We can then arrange the 3 E's, 2 L's, T, and (XCN) in $\frac{7!}{3!\,2!}$ ways. Therefore $P$(XCN together) $= \frac{\frac{7!}{3!\,2!} \times 3!}{\frac{9!}{3!\,2!}} = \frac{1}{12}$.
(c) There is only 1 way to arrange the letters in the order CEEELLNTX. Therefore $P$(alphabetical order) $= \frac{1}{\frac{9!}{3!\,2!}} = \frac{12}{9!}$.
3.2.1 (a) The 8 cars can be chosen in $\binom{160}{8}$ ways. We can choose $x$ with faulty emission controls and $(8 - x)$ with good ones in $\binom{35}{x}\binom{125}{8-x}$ ways. Therefore
$$P(\text{at least 3 faulty}) = \sum_{x=3}^{8} \frac{\binom{35}{x}\binom{125}{8-x}}{\binom{160}{8}}$$
since we need $x$ = 3 or 4 or ... or 8.
(b) This assumes all $\binom{160}{8}$ combinations are equally likely. This assumption probably doesn't hold since the inspector would tend to select older cars or those in bad shape.
3.2.2 (a) The first 6 finishers can be chosen in $\binom{15}{6}$ ways. Choose 4 from numbers 1, 2, ..., 9 in $\binom{9}{4}$ ways and 2 from numbers 10, ..., 15 in $\binom{6}{2}$ ways. Therefore $P$(4 single digits in top 6) $= \frac{\binom{9}{4}\binom{6}{2}}{\binom{15}{6}} = \frac{54}{143}$.
(b) Need 2 single digits and 2 double digit numbers in 1st 4 and then a single digit. This occurs in $\binom{9}{2}\binom{6}{2} \times 7$ ways. Therefore
$$P(\text{5th is 3rd single digit}) = \frac{\binom{9}{2}\binom{6}{2} \times 7}{\binom{15}{4} \times 11} = \frac{36}{143}$$
(since we can choose 1st 4 in $\binom{15}{4}$ ways and then have 11 choices for the 5th).
Alternate Solution: There are $15^{(5)}$ ways to choose the first 5 in order. We can choose in order, 2 double digit and 3 single digit finishers in $6^{(2)} 9^{(3)}$ ways, and then choose which 2 of the first 4 places have double digit numbers in $\binom{4}{2}$ ways. Therefore $P$(5th is 3rd single digit) $= \frac{6^{(2)} 9^{(3)} \binom{4}{2}}{15^{(5)}} = \frac{36}{143}$.
(c) Choose 13 in 1 way and the other 6 numbers in $\binom{12}{6}$ ways (from 1, 2, ..., 12). Therefore $P$(13 is highest) $= \frac{\binom{12}{6}}{\binom{15}{7}} = \frac{28}{195}$.
Alternate Solution: From the $\binom{13}{7}$ ways to choose 7 numbers from 1, 2, ..., 13 subtract the $\binom{12}{7}$ which don't include 13 (i.e. all 7 chosen from 1, 2, ..., 12). Therefore $P$(13 is highest) $= \frac{\binom{13}{7} - \binom{12}{7}}{\binom{15}{7}} = \frac{28}{195}$.
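The counting arguments in 3.2.2, including the two alternate solutions, can be cross-checked with exact rational arithmetic. This is a verification sketch only; `math.perm(n, k)` plays the role of the falling factorial $n^{(k)}$ used above, and the variable names are arbitrary.

```python
from fractions import Fraction
from math import comb, perm

# (a) 4 single-digit numbers among the first 6 finishers
a = Fraction(comb(9, 4) * comb(6, 2), comb(15, 6))

# (b) 5th finisher is the 3rd single-digit number, by both counting methods
b = Fraction(comb(9, 2) * comb(6, 2) * 7, comb(15, 4) * 11)
b_alt = Fraction(perm(6, 2) * perm(9, 3) * comb(4, 2), perm(15, 5))

# (c) 13 is the highest of the 7 numbers, by both counting methods
c = Fraction(comb(12, 6), comb(15, 7))
c_alt = Fraction(comb(13, 7) - comb(12, 7), comb(15, 7))

print(a, b, b_alt, c, c_alt)
```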
4.1.1 Let $R$ = {rain} and $T$ = {temp. > 22°C}, and draw a Venn diagram. Then
$$P(\bar{T}\bar{R}) = .4 \times .2 = .08$$
$$P(T\bar{R}) = .7 \times .8 = .56$$
(Note that the information that 40% of days with temp $\le$ 22° have no rain is not needed to solve the question.) Therefore $P(RT) = 24\%$. This result is to be expected since 80% of days have a high temperature > 22°C and 30% of these days have rain.
4.1.2 $P(ML) = .15$, $P(M) = .45$, $P(L) = .45$ (see the Figure below).
[Venn diagram: circles $L$ and $M$, with probabilities .30, .15 (overlap) and .30]
The region outside the circles represents females to the right. To make $P(S) = 1$ we need $P(FR) = .25$.
4.2.1 (a)
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$$
$$= 1 - .1 - [P(AC) + P(BC) - P(ABC)]$$
$$= 0.9 - P(AC \cup BC)$$
Therefore $P(A \cup B \cup C) = .9$ is the largest value, and this occurs when $P(AC \cup BC) = 0$.
(b) If each point in the sample space has strictly positive probability then if $P(AC \cup BC) = 0$, then $AC = \emptyset$ and $BC = \emptyset$, so that $A$ and $C$ are mutually exclusive and $B$ and $C$ are mutually exclusive. Otherwise we cannot make this determination. While $A$ and $C$ could be mutually exclusive, it can't be determined for sure.
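4.2.2
$$P(A \cup B) = P(A \text{ or } B \text{ occur}) = 1 - P(A \text{ doesn't occur AND } B \text{ doesn't occur}) = 1 - P(\bar{A}\bar{B}).$$
Alternatively (look at a Venn diagram), $S = (A \cup B) \cup (\bar{A}\bar{B})$ is a partition, so $P(S) = 1 \Rightarrow P(A \cup B) + P(\bar{A}\bar{B}) = 1$.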
4.3.1 (a) Points giving a total of 9 are: (3, 6), (4, 5), (5, 4) and (6, 3). The probabilities are (.1)(.3) = .03 for (3, 6) and for (6, 3), and (.2)(.2) = .04 for (4, 5) and for (5, 4). Therefore $P$\{(3, 6) or (4, 5) or (5, 4) or (6, 3)\} = .03 + .04 + .04 + .03 = .14.
(b) There are $\binom{4}{1}$ arrangements with 1 nine and 3 non-nines. Each arrangement has probability $(.14)(.86)^3$.
Therefore $P$(nine on 1 of 4 repetitions) $= \binom{4}{1}(.14)(.86)^3 = .3562$.
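4.3.2 Let $W$ = {at least 1 woman} and $F$ = {at least 1 French speaking student}.
$$P(WF) = 1 - P(\overline{WF}) = 1 - P(\bar{W} \cup \bar{F}) = 1 - \left[P(\bar{W}) + P(\bar{F}) - P(\bar{W}\bar{F})\right]$$
(see figure below)
[Venn diagram, Problem 4.3.2: events $W$ and $F$]
But $P(\bar{W}\bar{F}) = P$(no woman and no French speaking student) $= P$(all men who don't speak French).
$P$(woman who speaks French) $= P$(woman)$P$(French|woman) $= .45 \times .20 = .09$.
From Venn diagram, $P$(man without French) = .49.
[Figure 11.1: P(woman who speaks French). Venn diagram with circles Woman and French: woman only .36, both .09, French only .06, neither .49]
$P(\bar{W}\bar{F}) = (.49)^{10}$ and $P(\bar{W}) = (.55)^{10}$; $P(\bar{F}) = (.85)^{10}$.
Therefore $P(WF) = 1 - \left[(.55)^{10} + (.85)^{10} - (.49)^{10}\right] = 0.8014$.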
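The inclusion-exclusion step in 4.3.2 is easy to check numerically. The sketch below takes the per-student probabilities from the Venn diagram and, as the powers of 10 in the solution suggest, assumes 10 independently selected students; the variable names are illustrative.

```python
# Per-student probabilities read off the Venn diagram
p_man = 0.55             # student is not a woman
p_no_french = 0.85       # student does not speak French
p_man_no_french = 0.49   # student is a man who does not speak French
n = 10                   # assumed number of independently chosen students

p_no_woman = p_man**n                    # P(no woman in the group)
p_no_french_speaker = p_no_french**n     # P(no French speaker in the group)
p_neither = p_man_no_french**n           # P(no woman and no French speaker)

# P(at least one woman AND at least one French speaker)
p_both = 1 - (p_no_woman + p_no_french_speaker - p_neither)
print(round(p_both, 4))
```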
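4.3.3 From a Venn diagram: (1) $P(\bar{A}B) = P(B) - P(AB)$ (2) $P(\bar{A}\bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B)$.
$$P(\bar{A}\bar{B}) = P(\bar{A})P(\bar{B})$$
$$\Leftrightarrow 1 - P(A \cup B) = P(\bar{A})P(\bar{B})$$
$$\Leftrightarrow 1 - [P(A) + P(B) - P(AB)] = P(\bar{A})[1 - P(B)]$$
$$\Leftrightarrow [1 - P(A)] - [P(B) - P(AB)] = P(\bar{A}) - P(\bar{A})P(B)$$
$$\Leftrightarrow P(\bar{A}) - P(\bar{A}B) = P(\bar{A}) - P(\bar{A})P(B)$$
$$\Leftrightarrow P(\bar{A})P(B) = P(\bar{A}B)$$
Therefore $\bar{A}$ and $\bar{B}$ are independent iff $\bar{A}$ and $B$ are independent.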
4.4.1 Let $B$ = {bus} and $L$ = {late}.
$$P(B|L) = \frac{P(BL)}{P(L)} = \frac{P(L|B)P(B)}{P(L|B)P(B) + P(L|\bar{B})P(\bar{B})} = \frac{(.3)(.2)}{(.3)(.2) + (.7)(.1)} = 6/13.$$
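4.4.2 Let $F$ = {fair} and $H$ = {5 heads}.
$$P(F|H) = \frac{P(FH)}{P(H)} = \frac{P(H|F)P(F)}{P(H|F)P(F) + P(H|\bar{F})P(\bar{F})} = \frac{\left(\frac{3}{4}\right)\binom{6}{5}\left(\frac{1}{2}\right)^6}{\left(\frac{3}{4}\right)\binom{6}{5}\left(\frac{1}{2}\right)^6 + \left(\frac{1}{4}\right)\binom{6}{5}(.8)^5(.2)^1} = 0.4170$$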
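The Bayes calculation in 4.4.2 can be reproduced in a few lines. The numbers are those in the solution above (prior 3/4 for a fair coin, 5 heads in 6 tosses, biased coin with probability .8 of heads); the variable names are illustrative.

```python
from math import comb

prior_fair = 3 / 4                            # P(F)
like_fair = comb(6, 5) * 0.5**6               # P(H | F): 5 heads in 6 fair tosses
like_biased = comb(6, 5) * 0.8**5 * 0.2**1    # P(H | not F)

# Bayes' rule: P(F | H)
posterior_fair = (prior_fair * like_fair) / (
    prior_fair * like_fair + (1 - prior_fair) * like_biased
)
print(round(posterior_fair, 4))
```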
4.4.3 Let $H$ = {defective headlights}, $M$ = {defective muffler}.
$$P(M|H) = \frac{P(MH)}{P(H)} = \frac{P(MH)}{P(MH \cup \bar{M}H)} = \frac{.1}{.1 + .15} = .4$$
5.1.1 We need $f(x) \ge 0$ and $\sum_{x=0}^{2} f(x) = 1$:
$$9c^2 + 9c + c^2 = 10c^2 + 9c = 1$$
Therefore $10c^2 + 9c - 1 = 0$
$$(10c - 1)(c + 1) = 0$$
$$c = 1/10 \text{ or } -1$$
But if $c = -1$ we have $f(1) < 0$, which is impossible. Therefore $c = .1$.
5.1.2 We are arranging Y F O O O where Y = {you}, F = {friend}, O = {other}. There are $\frac{5!}{3!} = 20$ distinct arrangements.
$X = 0$: YFOOO, ..., OOOYF has 4 arrangements with Y first and 4 with F first.
$X = 1$: YOFOO, ..., OOYOF has 3 arrangements with Y first and 3 with F first.
$X = 2$: YOOFO, OYOOF has 2 with Y first and 2 with F.
$X = 3$: YOOOF has 1 with Y first and 1 with F.
x       0    1    2    3
f(x)   .4   .3   .2   .1
F(x)   .4   .7   .9    1
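The table can be confirmed by brute-force enumeration. The sketch below assumes, as the arrangements listed above suggest, that $X$ is the number of people standing between you and your friend in a line of 5; the positions are labelled 0 to 4 and the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Each ordered pair (y, f) gives the positions of Y (you) and F (friend)
# among positions 0..4; the other three people are interchangeable.
counts = Counter(abs(y - f) - 1 for y, f in permutations(range(5), 2))
total = sum(counts.values())  # 20 equally likely arrangements

# Probability function f(x) of X = number of people between Y and F
f_x = {x: Fraction(c, total) for x, c in sorted(counts.items())}
print(f_x)
```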
5.3.1 (a) Using the hypergeometric distribution,
$$f(0) = \frac{\binom{d}{0}\binom{12-d}{7}}{\binom{12}{7}}$$
d       0     1      2      3
f(0)    1   5/12   5/33   5/110
(b) While we could find none tainted if $d$ is as big as 3, it is not likely to happen. This implies the box is not likely to have as many as 3 tainted tins.
5.3.2 Considering order, there are $N^{(n)}$ points in $S$. We can choose which $x$ of the $n$ selections will have "success" in $\binom{n}{x}$ ways. We can arrange the $x$ "successes" in their selected positions in $r^{(x)}$ ways and the $(n - x)$ "failures" in the remaining positions in $(N - r)^{(n-x)}$ ways.
Therefore $f(x) = \frac{\binom{n}{x} r^{(x)} (N - r)^{(n-x)}}{N^{(n)}}$ with $x$ ranging from $\max(0, n - (N - r))$ to $\min(n, r)$.
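5.4.1 (a) Using hypergeometric, with $N = 130$, $r = 26$, $n = 6$,
$$f(2) = \frac{\binom{26}{2}\binom{104}{4}}{\binom{130}{6}} \quad (= .2506)$$
(b) Using binomial as an approximation,
$$f(2) = \binom{6}{2}\left(\frac{26}{130}\right)^2\left(\frac{104}{130}\right)^4 = 0.2458$$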
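Both values in 5.4.1 can be reproduced with the standard library. This is a numerical check of the two formulas above, with variable names chosen to match the symbols $N$, $r$, $n$ and $x$.

```python
from math import comb

N, r, n, x = 130, 26, 6, 2

# (a) exact hypergeometric probability of x successes in n draws
hypergeometric = comb(r, x) * comb(N - r, n - x) / comb(N, n)

# (b) binomial approximation with success probability r / N
binomial = comb(n, x) * (r / N) ** x * (1 - r / N) ** (n - x)

print(round(hypergeometric, 4), round(binomial, 4))
```

The two answers differ in the second decimal place because 6 draws out of 130 is not a negligible fraction of the population.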
5.4.2 (a) $P$(fail twice)
$$= P(A)P(\text{fail twice}|A) + P(B)P(\text{fail twice}|B) = \left(\frac{1}{2}\right)\binom{10}{2}(.1)^2(.9)^8 + \left(\frac{1}{2}\right)\binom{10}{2}(.05)^2(.95)^8 = .1342.$$
Where $A$ = {camera A is picked} and $B$ = {camera B is picked}. This assumes shots are independent with a constant failure probability.
(b) $P(A|\text{failed twice}) = \frac{P(A \text{ and fail twice})}{P(\text{fail twice})} = \frac{\left(\frac{1}{2}\right)\binom{10}{2}(.1)^2(.9)^8}{.1342} = .7219$
5.5.1 We need \((x - 25)\) "failures" before our 25th "success".
\[f(x) = \binom{x-1}{x-25}(.2)^{25}(.8)^{x-25} \quad\text{or}\quad \binom{x-1}{24}(.2)^{25}(.8)^{x-25}; \quad x = 25, 26, 27, \ldots\]
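5.5.2 (a) In the first \((x + 17)\) selections we need to get \(x\) defective (use the hypergeometric distribution) and then we need a good one on the \((x + 18)\)th draw.
Therefore
\[f(x) = \frac{\binom{200}{x}\binom{2300}{17}}{\binom{2500}{x+17}}\cdot\frac{2283}{2500 - (x + 17)}; \quad x = 0, 1, \ldots, 200.\]
(b) Since 2500 is large and we're only choosing a few of them, we can approximate the hypergeometric portion of \(f(x)\) using binomial:
\[f(2) \doteq \binom{19}{2}\left(\frac{200}{2500}\right)^{2}\left(1 - \frac{200}{2500}\right)^{17}\left(\frac{2283}{2481}\right) = .2440.\]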
5.6.1 Using geometric,
\(P(x \text{ not leaky found before first leaky}) = (0.7)^{x}(0.3) = f(x)\)
\[P(X \ge n - 1) = f(n - 1) + f(n) + f(n + 1) + \cdots\]
\[= (0.7)^{n-1}(0.3) + (0.7)^{n}(0.3) + (0.7)^{n+1}(0.3) + \cdots = \frac{(.7)^{n-1}(.3)}{1 - .7} = (.7)^{n-1} = 0.05\]
\((n - 1)\log .7 = \log .05\); so \(n = 9.4\)
At least 9.4 cars means 10 or more cars must be checked. Therefore \(n = 10\).
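The threshold in 5.6.1 can be found either from the logarithm formula or by direct search. A small sketch, assuming the condition being solved is that \((0.7)^{n-1}\) should be no larger than 0.05:

```python
from math import ceil, log

# Real-valued solution of 0.7**(n - 1) = 0.05, then round up to a whole car
n_real = 1 + log(0.05) / log(0.7)
n_log = ceil(n_real)

# Direct search: smallest n with 0.7**(n - 1) <= 0.05
n_search = 1
while 0.7 ** (n_search - 1) > 0.05:
    n_search += 1

print(round(n_real, 1), n_log, n_search)
```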
5.7.1 (a) Let \(X\) be the number who don't show. Then \(X \sim Bi(122, .03)\).
\[P(\text{not enough seats}) = P(X = 0 \text{ or } 1) = \binom{122}{0}(.03)^{0}(.97)^{122} + \binom{122}{1}(.03)^{1}(.97)^{121} = 0.1161\]
(To use a Poisson approximation we need \(p\) near 0. That is why we defined "success" as not showing up.)
For Poisson, \(\mu = np = (122)(.03) = 3.66\)
\[f(0) + f(1) = e^{-3.66} + 3.66e^{-3.66} = 0.1199\]
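The exact binomial value and its Poisson approximation in 5.7.1(a) can be reproduced with a few lines. This is just a numerical check using the parameters stated above.

```python
from math import comb, exp, factorial

n, p = 122, 0.03  # tickets sold, probability a passenger does not show

# Exact: P(X = 0) + P(X = 1) for X ~ Bi(122, .03)
binom = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in (0, 1))

# Poisson approximation with mu = n * p
mu = n * p
poisson = sum(exp(-mu) * mu**k / factorial(k) for k in (0, 1))

print(round(binom, 4), round(poisson, 4))
```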
(b) Binomial requires all passengers to be independent as to showing up for the flight, and that each passenger has the same probability of showing up. Passengers are not likely independent since people from the same family or company are likely to all show up or all not show. Even strangers arriving on an earlier incoming flight would not miss their flight independently if the flight was delayed. Passengers may all have roughly the same probability of showing up, but even this is suspect. People travelling in different fare categories or in different classes (e.g. charter fares versus first class) may have different probabilities of showing up.
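5.8.1 (a) \(\lambda = 3\), \(t = 2\frac{1}{2}\), \(\mu = \lambda t = 7.5\)
\[f(6) = \frac{7.5^{6}e^{-7.5}}{6!} = 0.1367\]
(b)
\[P(2 \text{ in 1st minute}\mid 6 \text{ in } 2\tfrac{1}{2} \text{ minutes}) = \frac{P(2 \text{ in 1st min. and 6 in } 2\tfrac{1}{2} \text{ min.})}{P(6 \text{ in } 2\tfrac{1}{2} \text{ min.})} = \frac{P(2 \text{ in 1st min. and 4 in last } 1\tfrac{1}{2} \text{ min.})}{P(6 \text{ in } 2\tfrac{1}{2} \text{ min.})}\]
\[= \frac{\left(\frac{3^{2}e^{-3}}{2!}\right)\left(\frac{4.5^{4}e^{-4.5}}{4!}\right)}{\left(\frac{7.5^{6}e^{-7.5}}{6!}\right)} = \binom{6}{2}\left(\frac{3}{7.5}\right)^{2}\left(\frac{4.5}{7.5}\right)^{4} = .3110\]
Note this is a binomial probability function.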
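5.8.2 Assuming the conditions for a Poisson process are met, with lines as units of "time":
(a) \(\lambda = .02\) per line; \(t = 1\) line; \(\mu = \lambda t = .02\)
\[f(0) = \frac{\mu^{0}e^{-\mu}}{0!} = e^{-.02} = .9802\]
(b) \(\mu_{1} = 80 \times .02 = 1.6\); \(\mu_{2} = 90 \times .02 = 1.8\)
\[\left[\frac{\mu_{1}^{2}e^{-\mu_{1}}}{2!}\right]\left[\frac{\mu_{2}^{2}e^{-\mu_{2}}}{2!}\right] = .0692\]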
5.9.1 Consider a 1 minute period with no occurrences as a "success". Then \(X\) has a geometric distribution. The probability of "success" is
\[f(0) = \frac{\lambda^{0}e^{-\lambda}}{0!} = e^{-\lambda}.\]
Therefore \(f(x) = (e^{-\lambda})(1 - e^{-\lambda})^{x-1}\); \(x = 1, 2, 3, \ldots\)
(There must be \((x - 1)\) failures before the first success.)
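5.9.2 (a) \(\mu = 3 \times 1.25 = 3.75\)
\[f(0) = \frac{3.75^{0}e^{-3.75}}{0!} = .0235\]
(b) \(\left(1 - e^{-3.75}\right)^{14}e^{-3.75}\), using a geometric distribution
(c) Using a binomial distribution
\[f(x) = \binom{100}{x}\left(e^{-3.75}\right)^{x}\left(1 - e^{-3.75}\right)^{100-x}\]
Approximate by Poisson with \(\mu = np = 100e^{-3.75} = 2.35\): \(f(x) \doteq e^{-2.35}\frac{2.35^{x}}{x!}\) (\(n\) large, \(p\) small). Thus, \(P(X \ge 4) = 1 - P(X \le 3) = 1 - .789 = .211\).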
7.3.1 There are 10 tickets with all digits identical. For these there is only 1 prize. There are \(10 \times 9\) ways to pick a digit to occur twice and a different digit to occur once. These can be arranged in \(\frac{3!}{2!\,1!} = 3\) different orders; i.e. there are 270 tickets for which 3 prizes are paid. There are \(10 \times 9 \times 8\) ways to pick 3 different digits in order. For each of these 720 tickets there will be 3! prizes paid.
The organization takes in $1,000.
Therefore
\[E(\text{Profit}) = \left[(1000 - 200) \times \frac{10}{1000}\right] + \left[(1000 - 600) \times \frac{270}{1000}\right] + \left[(1000 - 1200) \times \frac{720}{1000}\right] = -\$28\]
i.e., on average they lose $28.
7.4.1 Let them sell \(n\) tickets. Suppose \(X\) show up. Then \(X \sim Bi(n, .97)\). For the binomial distribution, \(\mu = E(X) = np = .97n\).
If \(n \le 120\), the revenues will be \(100X\), and the expected revenue is \(E(100X) = 100E(X) = 97n\). This is maximized for \(n = 120\). Therefore the maximum expected revenue is $11,640. For \(n = 121\), revenues are \(100X\), less $500 if all 121 show up; i.e. the expected revenue is
\[100 \times 121 \times .97 - 500f(121) = 11,737 - 500(.97)^{121} = \$11,724.46.\]
For \(n = 122\), revenues are \(100X\), less $500 if 121 show up, less $1000 if all 122 show; i.e. the expected revenue is
\[100 \times 122 \times .97 - 500\binom{122}{121}(.97)^{121}(.03) - 1000(.97)^{122} = \$11,763.77\]
For \(n = 123\), revenues are \(100X\), less $500 if 121 show, less $1,000 if 122 show, less $1500 if all 123 show; i.e. the expected revenue is
\[100 \times 123 \times .97 - 500\binom{123}{121}(.97)^{121}(.03)^{2} - 1000\binom{123}{122}(.97)^{122}(.03) - 1500(.97)^{123} = \$11,721.13\]
Therefore they should sell 122 tickets.
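The case-by-case calculation in 7.4.1 can be generalized to any number of tickets sold. The sketch below assumes the revenue model implied by the solution (120 seats, $100 per passenger who shows up, $500 paid for each passenger beyond 120 who shows up); the function name and its parameters are illustrative.

```python
from math import comb


def expected_revenue(n, seats=120, p_show=0.97, fare=100, penalty=500):
    """Expected revenue when n tickets are sold and each holder shows up independently."""
    total = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p_show**x * (1 - p_show) ** (n - x)
        # Assumed model: fare per passenger who shows, penalty per bumped passenger
        total += prob * (fare * x - penalty * max(x - seats, 0))
    return total


revenues = {n: expected_revenue(n) for n in range(118, 126)}
best_n = max(revenues, key=revenues.get)
print(best_n, round(revenues[best_n], 2))
```

Scanning a range of values rather than stopping at 123 makes it easy to see that the expected revenue keeps falling once the penalty terms dominate.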
7.4.2 (a) Let \(X\) be the number of words needing correction and let \(T\) be the time to type the passage. Then \(X \sim Bi(450, .04)\) and \(T = 450 + 15X\). \(X\) has mean \(np = 18\) and variance \(np(1 - p) = 17.28\).
\[E(T) = E(450 + 15X) = 450 + 15E(X) = 450 + (15)(18) = 720\]
\[Var(T) = Var(450 + 15X) = 15^{2}\,Var(X) = 3888.\]
(b) At 45 words per minute, each word takes \(1\frac{1}{3}\) seconds. \(X \sim Bi(450, .02)\) and \(T = \left(450 \times 1\frac{1}{3}\right) + 15X = 600 + 15X\)
\(E(X) = 450 \times .02 = 9\); \(E(T) = 600 + (15)(9) = 735\), so it takes longer on average.
8.1.1 (a) The marginal probability functions are:

x        0    1    2            y        0    1    2
f_1(x)   .3   .2   .5    and    f_2(y)   .3   .4   .3

Since \(f_{1}(x)f_{2}(y) = f(x, y)\) does not hold for all \((x, y)\), \(X\) and \(Y\) are not independent. e.g. \(f_{1}(1)f_{2}(1) = 0.08 \ne 0.05\)
(b)
\[f(y\mid X = 0) = \frac{f(0, y)}{f_{1}(0)} = \frac{f(0, y)}{.3}\]

y              0    1    2
f(y | X = 0)   .3   .5   .2

(c)

d      -2    -1    0     1     2
f(d)   .06   .24   .29   .26   .15

(e.g. \(P(D = 0) = f(0, 0) + f(1, 1) + f(2, 2)\))
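8.1.2
\[f(y\mid x) = \frac{f(x, y)}{f_{1}(x)}\]
\[f(x, y) = P(y \text{ calls})\,P(x \text{ sales}\mid y \text{ calls}) = \left[\frac{20^{y}e^{-20}}{y!}\right]\left[\binom{y}{x}(.2)^{x}(.8)^{y-x}\right] = \frac{(20^{y-x})(.8)^{y-x}}{(y-x)!}\cdot\frac{(20)^{x}(.2)^{x}}{x!}\,e^{-20};\]
for \(x = 0, 1, 2, \ldots\) and \(y = x, x + 1, x + 2, \ldots\)
(\(y\) starts at \(x\) since no. of calls \(\ge\) no. of sales).
\[f_{1}(x) = \sum_{y=x}^{\infty} f(x, y) = \frac{[(20)(.2)]^{x}}{x!}e^{-20}\sum_{y=x}^{\infty}\frac{[(20)(.8)]^{y-x}}{(y - x)!} = \frac{4^{x}e^{-20}}{x!}\left[\frac{16^{0}}{0!} + \frac{16^{1}}{1!} + \frac{16^{2}}{2!} + \cdots\right] = \frac{4^{x}e^{-20}}{x!}e^{16} = \frac{4^{x}e^{-4}}{x!}\]
Therefore
\[f(y\mid x) = \frac{\dfrac{16^{y-x}}{(y-x)!}\cdot\dfrac{4^{x}}{x!}\,e^{-20}}{\dfrac{4^{x}e^{-4}}{x!}} = \frac{16^{y-x}e^{-16}}{(y - x)!}; \quad y = x, x + 1, x + 2, \ldots\]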
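8.1.3
\[f(x, y) = f(x)f(y) = \binom{x + k - 1}{x}\binom{y + \ell - 1}{y}p^{k+\ell}(1 - p)^{x+y}\]
\[f(t) = \sum_{x=0}^{t} f(x, y = t - x) = \sum_{x=0}^{t}\binom{x + k - 1}{x}\binom{t - x + \ell - 1}{t - x}p^{k+\ell}(1 - p)^{t}\]
\[= \sum_{x=0}^{t}(-1)^{x}\binom{-k}{x}(-1)^{t-x}\binom{-\ell}{t - x}p^{k+\ell}(1 - p)^{t} = (-1)^{t}p^{k+\ell}(1 - p)^{t}\sum_{x=0}^{t}\binom{-k}{x}\binom{-\ell}{t - x}\]
\[= (-1)^{t}p^{k+\ell}(1 - p)^{t}\binom{-k - \ell}{t} \quad\text{using the hypergeometric identity}\]
\[= \binom{t + k + \ell - 1}{t}p^{k+\ell}(1 - p)^{t}; \quad t = 0, 1, 2, \ldots\]
using the given identity on \((-1)^{t}\binom{-k-\ell}{t}\). (\(T\) has a negative binomial distribution.)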
8.2.1 (a) Use a multinomial distribution.
\[f(3, 11, 7, 4) = \frac{25!}{3!\,11!\,7!\,4!}(.1)^{3}(.4)^{11}(.3)^{7}(.2)^{4}\]
(b) Group C's and D's into a single category.
\[f(3, 11, 11) = \frac{25!}{3!\,11!\,11!}(.1)^{3}(.4)^{11}(.5)^{11}\]
(c) Of the 21 non-D's we need 3 A's, 11 B's and 7 C's. The (conditional) probabilities for the non-D's are: 1/8 for A, 4/8 for B, and 3/8 for C.
(e.g. \(P(A\mid \bar{D}) = P(A)/P(\bar{D}) = .1/.8 = 1/8\))
Therefore
\[f(3\ A\text{'s}, 11\ B\text{'s}, 7\ C\text{'s}\mid 4\ D\text{'s}) = \frac{21!}{3!\,11!\,7!}\left(\frac{1}{8}\right)^{3}\left(\frac{4}{8}\right)^{11}\left(\frac{3}{8}\right)^{7}.\]
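8.2.2 \(\mu = .6 \times 12 = 7.2\)
\[p_{1} = P(\text{fewer than 5 chips}) = \sum_{x=0}^{4}\frac{7.2^{x}e^{-7.2}}{x!}\]
\[p_{2} = P(\text{more than 9 chips}) = 1 - \sum_{x=0}^{9}\frac{7.2^{x}e^{-7.2}}{x!}\]
(a) \(\binom{12}{3}p_{1}^{3}(1 - p_{1})^{9}\)
(b) \(\frac{12!}{3!\,7!\,2!}\,p_{1}^{3}\,p_{2}^{7}\,(1 - p_{1} - p_{2})^{2}\)
(c) Given that 7 have more than 9 chips, the remaining 5 are of 2 types: under 5 chips, or 5 to 9 chips.
\[P(< 5\mid \le 9 \text{ chips}) = \frac{P(< 5 \text{ and } \le 9)}{P(\le 9)} = \frac{p_{1}}{1 - p_{2}}.\]
Using a binomial distribution,
\[P(3 \text{ under } 5\mid 7 \text{ over } 9) = \binom{5}{3}\left(\frac{p_{1}}{1 - p_{2}}\right)^{3}\left(1 - \frac{p_{1}}{1 - p_{2}}\right)^{2}\]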
8.4.1

x        0    1    2            y        0    1
f_1(x)   .2   .5   .3           f_2(y)   .3   .7

\(E(X) = (0 \times .2) + (1 \times .5) + (2 \times .3) = 1.1\)
\(E(Y) = (0 \times .3) + (1 \times .7) = .7\)
\(E(X^{2}) = (0^{2} \times .2) + (1^{2} \times .5) + (2^{2} \times .3) = 1.7\); \(E(Y^{2}) = .7\)
\(Var(X) = 1.7 - 1.1^{2} = .49\)
\(Var(Y) = .7 - .7^{2} = .21\)
\(E(XY) = (1 \times 1 \times .35) + (2 \times 1 \times .21) = .77\)
\(Cov(X, Y) = .77 - (1.1)(.7) = 0\)
Therefore \(\rho = \dfrac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}} = 0\)
While \(\rho = 0\) indicates \(X\) and \(Y\) may be independent (and indeed are in this case), it does not prove that they are independent. It only indicates that there is no linear relationship between \(X\) and \(Y\).
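8.4.2
(a)

x        2     4     6          y        -1        1
f_1(x)   3/8   3/8   1/4        f_2(y)   3/8 + p   5/8 - p

\[E(X) = \left(2 \times \frac{3}{8}\right) + \left(4 \times \frac{3}{8}\right) + \left(6 \times \frac{1}{4}\right) = 15/4; \qquad E(Y) = -\frac{3}{8} - p + \frac{5}{8} - p = \frac{1}{4} - 2p;\]
\[E(XY) = \left(-2 \times \frac{1}{8}\right) + \left(-4 \times \frac{1}{4}\right) + \cdots + 6\left(\frac{1}{4} - p\right) = \frac{5}{4} - 12p\]
\[Cov(X, Y) = 0 = E(XY) - E(X)E(Y) \;\Rightarrow\; \frac{5}{4} - 12p = \frac{15}{16} - \frac{15}{2}p\]
Therefore \(p = 5/72\)
(b) If \(X\) and \(Y\) are independent then \(Cov(X, Y) = 0\), and so \(p\) must be 5/72. But if \(p = 5/72\) then
\[f_{1}(2)f_{2}(-1) = \frac{3}{8} \times \frac{4}{9} = \frac{1}{6} \ne f(2, -1)\]
Therefore \(X\) and \(Y\) cannot be independent for any value of \(p\).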
8.5.1

x        0     1     2
f_1(x)   0.5   0.3   0.2

\(E(X) = (0 \times .5) + (1 \times .3) + (2 \times .2) = 0.7\)
\(E(X^{2}) = (0^{2} \times .5) + (1^{2} \times .3) + (2^{2} \times .2) = 1.1\)
\(Var(X) = E(X^{2}) - [E(X)]^{2} = 0.61\)
\(E(XY) = \sum_{\text{all } x, y} xy\,f(x, y)\) and this has only two non-zero terms
\(= (1 \times 1 \times 0.2) + (2 \times 1 \times .15) = 0.5\)
\(Cov(X, Y) = E(XY) - E(X)E(Y) = 0.01\)
\[Var(3X - 2Y) = 9\,Var(X) + (-2)^{2}\,Var(Y) + 2(3)(-2)\,Cov(X, Y) = 9(.61) + 4(.21) - 12(.01) = 6.21\]
8.5.2 Let \(X_{i} = 0\) if the \(i\)th pair is alike and \(X_{i} = 1\) if the \(i\)th pair is unlike, \(i = 1, 2, \ldots, 24\).
\[E(X_{i}) = \sum_{x_{i}=0}^{1} x_{i}f(x_{i}) = 1f(1) = P(\text{ON OFF} \cup \text{OFF ON}) = (0.6)(0.4) + (0.4)(0.6) = 0.48\]
\(E(X_{i}^{2}) = E(X_{i}) = .48\) (for \(X_{i} = 0\) or 1)
\(Var(X_{i}) = .48 - (.48)^{2} = .2496\)
Consider a pair which has no common switch such as \(X_{1}, X_{3}\). Since \(X_{1}\) depends on switches 1 & 2 and \(X_{3}\) on switches 3 & 4, and since the switches are set independently, \(X_{1}\) and \(X_{3}\) are independent and so \(Cov(X_{1}, X_{3}) = 0\). In fact all pairs are independent if they have no common switch, but may not be independent if the pairs are adjacent. In this case, for example, since \(X_{i}X_{i+1}\) is also an indicator random variable,
\[E(X_{i}X_{i+1}) = P(X_{i}X_{i+1} = 1) = P(\text{ON OFF ON} \cup \text{OFF ON OFF}) = (.6)(.4)(.6) + (.4)(.6)(.4) = .24\]
Therefore
\[Cov(X_{i}, X_{i+1}) = E(X_{i}X_{i+1}) - E(X_{i})E(X_{i+1}) = 0.24 - (.48)^{2} = .0096\]
\[E\left(\sum X_{i}\right) = \sum_{i=1}^{24} E(X_{i}) = 24 \times .48 = 11.52\]
\[Var\left(\sum X_{i}\right) = \sum_{i=1}^{24} Var(X_{i}) + 2\sum_{i=1}^{23} Cov(X_{i}, X_{i+1}) = (24 \times .2496) + (2 \times 23 \times .0096) = 6.432\]
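The moments used in 8.5.2 can be verified exactly by enumerating the eight states of three consecutive switches. This sketch assumes, as in the solution, that each switch is independently ON with probability 0.6; the variable names are illustrative.

```python
from itertools import product

p_on = 0.6  # assumed probability that a single switch is ON

e_x = 0.0   # E(X_i): adjacent pair (switch 1, switch 2) is unlike
e_xx = 0.0  # E(X_i * X_{i+1}): both adjacent pairs are unlike

for state in product((0, 1), repeat=3):
    prob = 1.0
    for s in state:
        prob *= p_on if s == 1 else 1 - p_on
    x1 = int(state[0] != state[1])
    x2 = int(state[1] != state[2])
    e_x += prob * x1
    e_xx += prob * x1 * x2

var_x = e_x - e_x**2          # indicator variable: E(X^2) = E(X)
cov_adjacent = e_xx - e_x**2
var_sum = 24 * var_x + 2 * 23 * cov_adjacent

print(e_x, var_x, cov_adjacent, var_sum)
```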
8.5.3
\[\rho = \frac{Cov(X, Y)}{\sigma_{X}\sigma_{Y}} = 0.5\]
\(Cov(X, Y) = 0.5\sqrt{1.69 \times 4} = 1.3\)
\(Var(U) = Var(2X - Y) = 4\sigma_{X}^{2} + \sigma_{Y}^{2} - 4\,Cov(X, Y) = 5.56\)
Therefore \(s.d.(U) = 2.36\)
8.5.4
\[Cov(X_{i-1}, X_{i}) = Cov(Y_{i-2} + Y_{i-1},\ Y_{i-1} + Y_{i})\]
\[= Cov(Y_{i-2}, Y_{i-1}) + Cov(Y_{i-2}, Y_{i}) + Cov(Y_{i-1}, Y_{i-1}) + Cov(Y_{i-1}, Y_{i}) = 0 + 0 + Var(Y_{i-1}) + 0 = \sigma^{2}\]
Also, \(Cov(X_{i}, X_{j}) = 0\) for \(j \ne i \pm 1\) and \(Var(X_{i}) = Var(Y_{i-1}) + Var(Y_{i}) = 2\sigma^{2}\)
\[Var\left(\sum X_{i}\right) = \sum Var(X_{i}) + 2\sum_{i=2}^{n} Cov(X_{i-1}, X_{i}) = n(2\sigma^{2}) + 2(n - 1)\sigma^{2} = (4n - 2)\sigma^{2}\]
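8.5.5 Using \(X_{i}\) as defined, \(E(X_{i}) = \sum_{x_{i}=0}^{1} x_{i}f(x_{i}) = f(1) = E\left(X_{i}^{2}\right)\) since \(X_{i} = X_{i}^{2}\)
\(E(X_{1}) = E(X_{24}) = .9\) since only 1 cut is needed
\(E(X_{2}) = E(X_{3}) = \cdots = E(X_{23}) = .9^{2} = .81\) since 2 cuts are needed.
\(Var(X_{1}) = Var(X_{24}) = .9 - .9^{2} = .09\)
\(Var(X_{2}) = Var(X_{3}) = \cdots = Var(X_{23}) = .81 - .81^{2} = .1539\)
\(Cov(X_{i}, X_{j}) = 0\) if \(j \ne i \pm 1\) since there are no common pieces and cuts are independent.
\(E(X_{i}X_{i+1}) = \sum x_{i}x_{i+1}f(x_{i}, x_{i+1}) = f(1, 1)\) (the product is 0 if either \(x_{i}\) or \(x_{i+1}\) is 0)
\[= \begin{cases} .9^{2} & \text{for } i = 1 \text{ or } 23 \quad (2 \text{ cuts needed}) \\ .9^{3} & \text{for } i = 2, \ldots, 22 \quad (3 \text{ cuts needed}) \end{cases}\]
\[Cov(X_{i}, X_{i+1}) = E(X_{i}X_{i+1}) - E(X_{i})E(X_{i+1}) = \begin{cases} .9^{2} - (.9)(.9^{2}) = .081 & \text{for } i = 1 \text{ or } 23 \\ .9^{3} - (.9^{2})(.9^{2}) = .0729 & \text{for } i = 2, \ldots, 22 \end{cases}\]
\[E\left(\sum_{i=1}^{24} X_{i}\right) = \sum_{i=1}^{24} E(X_{i}) = (2 \times .9) + (22 \times .81) = 19.62\]
\[Var\left(\sum_{i=1}^{24} X_{i}\right) = \sum_{i=1}^{24} Var(X_{i}) + 2\sum_{i<j} Cov(X_{i}, X_{j}) = (2 \times .09) + (22 \times .1539) + 2\left[(2 \times .081) + (21 \times .0729)\right] = 6.9516\]
Therefore \(s.d.\left(\sum X_{i}\right) = \sqrt{6.9516} = 2.64\)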
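9.1.1 (a) \(\int_{-1}^{1} kx^{2}\,dx = k\frac{x^{3}}{3}\Big|_{-1}^{1} = \frac{2k}{3} = 1\)
Therefore \(k = 3/2\)
(b)
\[F(x) = \begin{cases} 0; & \text{for } x \le -1 \\ \int_{-1}^{x}\frac{3}{2}x^{2}\,dx = \frac{x^{3}}{2}\Big|_{-1}^{x} = \frac{x^{3}}{2} + \frac{1}{2}; & \text{for } -1 < x < 1 \\ 1; & \text{for } x \ge 1 \end{cases}\]
(c) \(P(-.1 < X < .2) = F(.2) - F(-.1) = .504 - .4995 = .0045\)
(d)
\[E(X) = \int_{-1}^{1} x\,\frac{3}{2}x^{2}\,dx = \frac{3}{2}\int_{-1}^{1} x^{3}\,dx = \frac{3}{8}x^{4}\Big|_{-1}^{1} = 0\]
\[E(X^{2}) = \int_{-1}^{1} x^{2}\,\frac{3}{2}x^{2}\,dx = \frac{3}{10}x^{5}\Big|_{-1}^{1} = 3/5\]
Therefore \(Var(X) = E(X^{2}) - \mu^{2} = 3/5\)
(e)
\[F_{Y}(y) = P(Y \le y) = P(X^{2} \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_{X}(\sqrt{y}) - F_{X}(-\sqrt{y}) = \left[\frac{(\sqrt{y})^{3}}{2} + \frac{1}{2}\right] - \left[\frac{(-\sqrt{y})^{3}}{2} + \frac{1}{2}\right] = y^{3/2}\]
Therefore \(f(y) = \frac{d}{dy}F_{Y}(y) = \frac{3}{2}\sqrt{y}\) for \(0 \le y < 1\) and is 0 otherwise.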
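9.1.2 (a) \(F(\infty) = 1 = \lim_{x\to\infty}\frac{kx^{n}}{1 + x^{n}} = \lim_{x\to\infty}\frac{k}{\frac{1}{x^{n}} + 1} = k\). Therefore \(k = 1\)
(b) \(f(x) = \frac{d}{dx}F(x) = \frac{nx^{n-1}}{(1 + x^{n})^{2}}\); for \(x > 0\).
(c) Let \(m\) be the median. Then \(F(m) = .5 = \frac{m^{n}}{1 + m^{n}}\). Therefore \(m^{n} = 1\) and so the median is 1.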
9.2.1 \(F(x) = \int_{-1}^{x}\frac{3}{2}x^{2}\,dx = \frac{x^{3} + 1}{2}\). If \(y = F(x) = \frac{x^{3} + 1}{2}\) is a random number between 0 and 1, then \(x = (2y - 1)^{1/3}\). For \(y = .27125\) we get \(x = (-.4575)^{1/3} = -.77054\).
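The inverse-transform step in 9.2.1 translates directly into code. This is a minimal sketch; the helper name `inverse_cdf` and the seeded generator are illustrative choices. A real cube root of a negative number needs explicit sign handling in Python, since `(-0.4575) ** (1 / 3)` returns a complex value.

```python
import math
import random


def inverse_cdf(u):
    """Solve u = (x**3 + 1) / 2 for x, giving a value in [-1, 1]."""
    v = 2 * u - 1
    # Real cube root that preserves the sign of v
    return math.copysign(abs(v) ** (1 / 3), v)


x = inverse_cdf(0.27125)

# Generating a sample: feed uniform random numbers through the inverse c.d.f.
rng = random.Random(0)
sample = [inverse_cdf(rng.random()) for _ in range(10_000)]

print(round(x, 5))
```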
9.3.1 Let the time to disruption be \(X\).
Then \(P(X \le 8) = F(8) = 1 - e^{-8/\theta} = .25\)
Therefore \(e^{-8/\theta} = .75\). Take natural logs, giving \(\theta = -\frac{8}{\ln .75} = 27.81\) hours.
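9.3.2 (a) \(F(x) = P(\text{distance} \le x) = 1 - P(\text{distance} > x) = 1 - P(0 \text{ flaws or } 1 \text{ flaw within radius } x)\)
The number of flaws has a Poisson distribution with mean \(\mu = \lambda\pi x^{2}\).
\[F(x) = 1 - \frac{\mu^{0}e^{-\mu}}{0!} - \frac{\mu^{1}e^{-\mu}}{1!} = 1 - e^{-\lambda\pi x^{2}}\left(1 + \lambda\pi x^{2}\right)\]
\[f(x) = \frac{d}{dx}F(x) = 2\lambda^{2}\pi^{2}x^{3}e^{-\lambda\pi x^{2}} \quad\text{for } x > 0\]
(b) \(\mu = E(X) = \int_{0}^{\infty} x\,2\lambda^{2}\pi^{2}x^{3}e^{-\lambda\pi x^{2}}\,dx = \int_{0}^{\infty} 2\lambda^{2}\pi^{2}x^{4}e^{-\lambda\pi x^{2}}\,dx\). Let \(y = \lambda\pi x^{2}\). Then \(dy = 2\lambda\pi x\,dx\), so \(dx = \frac{dy}{2\sqrt{\lambda\pi y}}\)
\[\mu = \int_{0}^{\infty} 2y^{2}e^{-y}\frac{dy}{2\sqrt{\lambda\pi y}} = \frac{1}{\sqrt{\lambda\pi}}\int_{0}^{\infty} y^{3/2}e^{-y}\,dy = \frac{1}{\sqrt{\lambda\pi}}\Gamma\left(\frac{5}{2}\right) = \frac{1}{\sqrt{\lambda\pi}}\left(\frac{3}{2}\right)\Gamma\left(\frac{3}{2}\right) = \frac{1}{\sqrt{\lambda\pi}}\left(\frac{3}{2}\right)\left(\frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right) = \frac{\left(\frac{3}{2}\right)\left(\frac{1}{2}\right)\sqrt{\pi}}{\sqrt{\pi\lambda}} = \frac{3}{4\sqrt{\lambda}}\]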
9.5.1 (a)
\[P(8.4 < X < 12.2) = P\left(\frac{8.4 - 10}{2} < Z < \frac{12.2 - 10}{2}\right) = P(-.8 < Z < 1.1)\]
\[= F(1.1) - F(-.8) = F(1.1) - [1 - F(.8)] = .8643 - (1 - .7881) = .6524\]
(see Figure 11.2)
Figure 11.2: (plot of the standard normal density f(z) against z, with z = -0.8 and z = 1.1 marked)
(b) \(2Y - X\) is normally distributed with mean \(2(3) - 10 = -4\), and variance \(2^{2}(100) + (-1)^{2}(4) = 404\).
\[P(2Y > X) = P(2Y - X > 0) = P\left(Z > \frac{0 - (-4)}{\sqrt{404}} = .20\right) = P(Z > .20) = 1 - F(.20) = 1 - .5793 = .4207\]
(c) \(\bar{Y}\) is normally distributed with mean 3, and variance \(\frac{100}{25} = 4\). Therefore
\[P(\bar{Y} < 0) = P\left(Z < \frac{0 - 3}{2} = -1.5\right) = P(Z > 1.5) = 1 - F(1.5) = 1 - .9332 = .0668\]
Figure 11.3: (plot of f(z) against z over the range 60 to 100, with 74.5 marked)
9.5.2
\(P(|X - \mu| < \sigma) = P(-\sigma < X - \mu < \sigma) = P(-1 < Z < 1)\)
\(= F(1) - [1 - F(1)] = .8413 - (1 - .8413) = 68.26\%\) (about 2/3)
\(P(|X - \mu| < 2\sigma) = P(-2\sigma < X - \mu < 2\sigma) = P(-2 < Z < 2)\)
\(= F(2) - [1 - F(2)] = .9772 - (1 - .9772) = 95.44\%\) (about 95%)
Similarly, \(P(|X - \mu| < 3\sigma) = P(-3 < Z < 3) = 99.73\%\) (over 99%)
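9.5.3 (a) \(2X - Y\) is normally distributed with mean \(2(5) - 7 = 3\), and variance \(2^{2}(4) + 9 = 25\).
\[P(|2X - Y| > 4) = P(2X - Y > 4) + P(2X - Y < -4) = P\left(Z > \frac{4 - 3}{5} = .20\right) + P\left(Z < \frac{-4 - 3}{5} = -1.40\right) = .42074 + .08076 = .5015\]
(b) \(P\left(\left|\bar{X} - 5\right| < 0.1\right) = P\left(|Z| < \frac{0.1}{2/\sqrt{n}}\right) = .98\) (since \(\bar{X} \sim N(5, 4/n)\)). Therefore \(F\left(\frac{0.1}{2/\sqrt{n}}\right) = .99\).
Therefore \(.05\sqrt{n} = 2.3263\) and \(n = 2164.7\), so we take \(n = 2165\) observations.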
9.6.1 Let \(X\) be the number germinating. Then \(X \sim Bi(100, .8)\).
\[P(X \ge 75) = \sum_{x=75}^{100}\binom{100}{x}(.8)^{x}(.2)^{100-x}.\]
Approximate using a normal distribution with \(\mu = np = 80\) and \(\sigma^{2} = np(1 - p) = 16\).
\[P(X \ge 75) \simeq P(X > 74.5) \quad\text{(see Figure 11.3)}\]
\[= P\left(Z > \frac{74.5 - 80}{4} = -1.375\right) \simeq F(1.38) = .9162\]
Possible variations on this solution include calculating \(F(1.375)\) as \(\frac{F(1.37) + F(1.38)}{2}\) and realizing that \(X \le 100\) means
\[P(X \ge 75) \simeq P(74.5 < X < 100.5)\]
However,
\[P(X \ge 100.5) \simeq P\left(Z \ge \frac{100.5 - 80}{4} = 5.125\right) \simeq 0\]
so we get the same answer as before.
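The quality of the normal approximation in 9.6.1 can be checked against the exact binomial sum. The sketch below is only illustrative: `phi` is a hypothetical helper that builds the standard normal c.d.f. from `math.erf`, and it evaluates the c.d.f. at 1.375 rather than at the table value 1.38.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p = 100, 0.8
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact binomial tail P(X >= 75)
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(75, n + 1))

# Normal approximation with continuity correction: P(X > 74.5)
approx = 1 - phi((74.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```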
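9.6.2 Let \(X_{i}\) be the cost associated with inspecting part \(i\)
\(E(X_{i}) = (0 \times .6) + (10 \times .3) + (100 \times .1) = 13\)
\(E\left(X_{i}^{2}\right) = \left(0^{2} \times .6\right) + \left(10^{2} \times .3\right) + \left(100^{2} \times .1\right) = 1030\)
\(Var(X_{i}) = 1030 - 13^{2} = 861\)
By the central limit theorem \(\sum_{i=1}^{80} X_{i}\) is Normal with mean \(80 \times 13 = 1040\) and variance \(80 \times 861 = 68,880\) approximately. Since \(\sum X_{i}\) increases in $10 increments,
\[P\left(\sum X_{i} > 1200\right) \simeq P\left(Z > \frac{1205 - 1040}{\sqrt{68,880}} = 0.63\right) = 0.2643\]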
Answers to End of Chapter Problems
Chapter 2:
2.1 (a) Label the profs A, B, C and D.
S = {AA, AB, AC, AD, BA, BB, BC, BD, CA, CB, CC, CD, DA, DB, DC, DD}
(b) 1/4
2.2 (a) {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
(b) \(\frac{1}{4}\);
2.3 S = {(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5), (2, 1), (3, 1), (4, 1), ..., (5, 4)};
probability consecutive = 0.4;
2.4 (c) \(\frac{1}{4}, \frac{3}{8}, \frac{1}{4}, 0\)
2.5 (a) \(\frac{8}{27}, \frac{1}{27}, \frac{2}{9}\) (b) \(\frac{(n-1)^{3}}{n^{3}}, \frac{(n-2)^{3}}{n^{3}}, \frac{n^{(3)}}{n^{3}}\) (c) \(\frac{(n-1)^{r}}{n^{r}}, \frac{(n-2)^{r}}{n^{r}}, \frac{n^{(r)}}{n^{r}}\)
2.6 (a) .018 (b) .020 (c) 18/78 = .231
2.7 (b) .978
Chapter 3:
3.1 (a) 4/7 (b) 5/42 (c) 5/21;
3.2 (a) (i) \(\frac{(n-1)^{r}}{n^{r}}\) (ii) \(\frac{n^{(r)}}{n^{r}}\)
(b) All \(n^{r}\) outcomes are equally likely. That is, all \(n\) floors are equally likely to be selected, and each passenger's selection is unrelated to each other person's selection. Both assumptions are doubtful since people may be travelling together (e.g. same family) and the floors may not have equal traffic (e.g. more likely to use the stairs for going up 1 floor than for 10 floors);
3.3 (a) 5/18 (b) 5/72;
3.4 \(\dfrac{\binom{4}{2}\binom{12}{4}\binom{36}{7}}{\binom{52}{13}}\)
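3.5 (a) 1/50,400 (b) 7/45;
3.6 (a) 1/6 (b) 0.12;
3.7 Values for \(r = 20\), 40 and 60 are .589, .109 and .006.
3.8 (a) \(\frac{1}{n}\) (b) \(\frac{2}{n}\)
3.9 \(\dfrac{1 + 3 + \cdots + (2n - 1)}{\binom{2n+1}{3}} = \dfrac{n^{2}}{\binom{2n+1}{3}}\)
3.10 (a) (i) .0006 (ii) .0024 (b) \(\frac{10^{(4)}}{10^{4}} = .504\)
3.11 (a) \(\dfrac{\binom{6}{2}\binom{19}{3}}{\binom{25}{5}}\) (b) 15
3.12 (a) \(1/\binom{49}{6}\) (b) \(\dfrac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}}\) (c) \(\dfrac{\binom{6}{4}\binom{43}{2}}{\binom{49}{6}}\) (d) \(\dfrac{\binom{6}{3}\binom{43}{3}}{\binom{49}{6}}\)
3.13 (a) \(1 - \dfrac{\binom{48}{3}}{\binom{50}{3}}\) (b) \(1 - \dfrac{\binom{45}{2}}{\binom{47}{2}}\) (c) \(\dfrac{\binom{48}{3}}{\binom{50}{5}}\)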
3.14 Bythebinomial theorem


a
P
a=0

a
a

a
a
= (1 +a)
a
Differentiatewithrespect toa onbothsides:
a
P
a=0
r

a
a

a
a1
= :(1 +a)
a1
. Multiplybya toget
a
P
a=0
r

a
a

a
a
= :a(1 +a)
a1
Let a =

j
1j

. Then
a
P
a=0
r

a
a

j
1j

a
= :

j
j1

1 +
j
1j

a1
=
aj
(1j)
n
(1)
a1
Multiplyby(1 j)
a
:
a
X
a=0
r

:
r

j
1 j

a
(1 j)
a
=
a
X
a=0
r

:
r

j
a
(1 j)
aa
=
:j
(1 j)
a
(1 j)
a
= :j
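3.15 Let \(Q\) = {heads on quarter} and \(D\) = {heads on dime}. Then
\[P(\text{Both heads at same time}) = P(QD \cup \bar{Q}\bar{D}\,QD \cup \bar{Q}\bar{D}\,\bar{Q}\bar{D}\,QD \cup \cdots)\]
\[= (.6)(.5) + (.4)(.5)(.6)(.5) + (.4)(.5)(.4)(.5)(.6)(.5) + \cdots = \frac{(.6)(.5)}{1 - (.4)(.5)} = 3/8\]
(using \(a + ar + ar^{2} + \cdots = \frac{a}{1 - r}\) with \(r = (.4)(.5)\))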
3.16
3.17
Chapter 4:
1. 4.1 0.75, 0.6, 0.65, 0, 1, 0.35, 1
4.2 1() = 0.01, 1(1) = 0.72, 1(C) = (0.9)
3
, 1(1) = (.5)
3
, 1(1) = (0.5)
2
4.3 \(P(A\mid \bar{B}) = \dfrac{P(A\bar{B})}{P(\bar{B})} = \dfrac{P(A) - P(AB)}{1 - P(B)} = \dfrac{P(A) - P(B)P(A\mid B)}{1 - P(B)} = \dfrac{0.3 - 0.4(0.5)}{1 - 0.4} = \dfrac{1}{6}\)
4.4 (a) 0.0576 (b) 0.4305 (c) 0.0168 (d) 0.5287
4.5 0.44
4.6 0.7354
4.7 (a) 0.3087 (b) 0.1852;
4.8
4.9 0.342
4.10 (a) 0.1225, 0.175 (b) 0.395
4.11 (
)
1
) = (
n
A
)
4.12
4.13
4.14 (a)
1
30
+
41
5
(b) j =
(30aa)1
24
(c)
24j
1+24j
4.15 0.9, 0.061, 0.078
4.16 (a) 0.024 (b) 8onanyonewheel and1ontheothers
4.17 (a) 0.995 and0.005 (b) 0.001
4.18
(a) 0.99995
(b) 0.99889
(c) 0.2 + 0.1 + 0.1 (.2)(.1) (.2)(.1) (.1)(.1) + (.2)(.1)(.1) = 0.352
4.19 (a)
v
v+1999
; 0.005, 0.0148, 0.0476 (b) 2.1%
Chapter 5:
1. 5.1 (a) .623, .251; for males, .408, .103 (b) .166
5.2 (a) )(0) = 0, )(r) = 2
a
(r = 1, 2, . . . ) (b) )(5) =
1
32
; 1(A 5) =
1
16
5.3
5.4
j(1j)
r
1(1j)
4
; r = 0, 1, 2, 3
5.5 (a) .0800 (b) .171 (c) .00725
5.6 (a) .010 (b) .864
5.7 (a)
4
15
(b)

74
j

76
12j

150
12

(c) .0176
5.8 0.9989
5.9 (a) .0758 (b) .0488 (c)

10
j

(c
10A
)
j
(1 c
10A
)
10j
(d) ` = .12
5.10 (a) .0769 (b) 0.2019; 0.4751
5.11 (a) 0.2753 (b) 0.1966 (c) 0.0215
5.12 (b) enablesustoapproximatehypergeometricdistributionbybinomial distributionwhen:
islargeandj near 0.
5.13 (a) 1

I1
P
a=0
A
i
c
A
a!

a
(b) (Could probably argue for other answers also). Budworms
probablyarent distributedat auniformrateover theforest andmaynot occur singly
5.14 (a) .2048 (b) .0734 (c) .428 (d) .1404
5.15
(
35
i
)(
70
7
)
(
105
i+7
)
63
98a
; r = 0, 1, , 35
5.16 (a) .004264; .006669 (b) .0032 (c) (i)

1399
11

(.004264)
12
(.995736)
1388
(ii) 9.33610
5
Ontherst1399attemptsweessentiallyhaveabinomial distributionwith: = 1399 (large)
andj = .004264 (near 0)
5.17 (a)

a
a

c
0.96

1 c
0.96

aa
; r = 0, 1, , : (b) ` 0.866 bubblesper :
2
5.18 0.5;
r 0 1 2 3 4 5
)(r) 0 .05 .15 .05 .25 .5
; 0.3
5.19 (a) (1 j)
j
(b) 1 = 0 (c) j,[1 (1 j)
3
] (d) 1(1 = r) =
j(1j)
r
1(1j)
3
for r = 0, 1, 2
5.20 (a) .555 (b) .809; .965 (c) .789; .946 (d) : = 1067
5.21 (a)

a1
999

(.3192
1000
)(.6808
a1000
) (b) .002, .051, .350, .797 (c)

3200
j

(.3192
j
)(.6808
3200j
); .797
Chapter 7:
1. 7.1 2.775; 2.574375
7.2 -$3
7.3 $16.90
7.4 (a) 3cases (b) 32cases
7.5 (a) - 10/37dollarsinbothcases (b) .3442; 0.4865
7.6 $.94
7.7 (b) : +
a
I
:(1 j)
I
, whichgives1.01:, 0.249:, 0.196: for / = 1, 5, 10
7.8 50
7.9 (a)
j
1(1j)c
I
for t < ln(1 j); (b)
1j
j
;
1j
j
2
7.10
7.11 (a) Expand'(t) inapower seriesinpowersof c
t
, i.e.
'
A
(t) =
1
3
c
t
+
2
9
c
2t
+
4
27
c
3t
+
8
81
c
4t
+
16
243
c
5t
+...
andthisconvergesfor
|
2
3
c
t
| < 1 or t < ln(
3
2
).
Then1(A = ,) =coefcient of c
)t
=
1
3
(
2
3
)
)1
, , = 1, 2, ...
(b) Similarly
'
A
(t) = c
2
+ 2c
2
c
t
+ 2c
2
c
2t
+
4
3
c
2
c
3t
+
2
3
c
2
c
4t
+
4
15
c
2
c
5t
+....
Then1(A = ,) = c
2 2

)!
, , = 0, 1, ...
7.12 '(t) =
1
bo+1
P
b
a=o
c
at
=
c
aI
c
(l+1)I
(1c
I
)(bo+1)
. 1(A) = '
0
(0) =
1
bo+1
P
b
a=o
r, 1(A
2
) =
'
00
(0) =
1
bo+1
P
b
a=o
r
2
7.13 (a) '(t) = 0.25 + 0.5c
t
+ 0.25c
2t
(b) '
(I)
(0), / = 1, 2, ..., 6.
1
2
+
1
4
(c) j
0
= 1,4, j
1
= 1,2, j
2
= 1,4
(d) Notethat for givenvaluesof themean1(A) = j
1
, 1(A
2
) = j
2
, thereis aunique
solutiontotheequationsj
0
+j
1
+j
2
= 1, j
1
+ 2j
2
= j
1
, j
1
+ 4j
2
= j
2
7.14 If A isBin(13,
1
2
) thenq(o
13
) = max(2A 18, 0) and
1[q(o
13
)] =

13
10

+ 2

13
11

+ 3

13
12

+ 4

13
13

2
12
=
485
4096
7.16 Let \(Y\) = the number of cups of coffee Yasmin drinks in a week and let \(Z\) = the number of cups of coffee Zack drinks in a week. We have \(Y(x) = 2x^{2}\) and \(Z(x) = |2x - 1|\).
\[E[Y] = \sum_{x \in S} Y(x)P(X = x) = \sum_{x=0}^{5}(2x^{2})P(X = x)\]
\[= 2(0)^{2}(0.09) + 2(1)^{2}(0.10) + 2(2)^{2}(0.25) + 2(3)^{2}(0.40) + 2(4)^{2}(0.15) + 2(5)^{2}(0.01) = 14.7\]
On average Yasmin drinks 14.7 cups of coffee per week.
\[E[Z] = \sum_{x \in S} Z(x)P(X = x) = \sum_{x=0}^{5}|2x - 1|P(X = x)\]
\[= |2(0) - 1|(0.09) + |2(1) - 1|(0.10) + |2(2) - 1|(0.25) + |2(3) - 1|(0.40) + |2(4) - 1|(0.15) + |2(5) - 1|(0.01) = 4.08\]
On average Zack drinks 4.08 cups of coffee per week.
7.17 (a) Let \(X_{i}\) be the winnings when attempting Question \(i\) first and let \(i\) be the event that Question \(i\) is answered correctly, \(i = A, B\).
\[E[X_{A}] = \sum_{x \in S} xP(X_{A} = x) = (\$100 + \$200)P(AB) + (\$100)P(A\bar{B}) + (\$0)P(\bar{A})\]
\[= (\$300)P(A)P(B) + (\$100)P(A)P(\bar{B}) = (\$300)(0.8)(0.6) + (\$100)(0.8)(0.4) = \$176\]
\[E[X_{B}] = \sum_{x \in S} xP(X_{B} = x) = (\$100 + \$200)P(BA) + (\$200)P(B\bar{A}) + (\$0)P(\bar{B})\]
\[= (\$300)P(B)P(A) + (\$200)P(B)P(\bar{A}) = (\$300)(0.6)(0.8) + (\$200)(0.6)(0.2) = \$168\]
Therefore the expected winnings are greatest when attempting Question A first.
(b)
\[E[X_{A}] = (\$300)P(AB) + (\$100)P(A\bar{B}) - (\$50)P(\bar{A}) = (\$300)(0.8)(0.6) + (\$100)(0.8)(0.4) - (\$50)(0.2) = \$166\]
\[E[X_{B}] = (\$300)P(BA) + (\$200)P(B\bar{A}) - (\$50)P(\bar{B}) = (\$300)(0.6)(0.8) + (\$200)(0.6)(0.2) - (\$50)(0.4) = \$148\]
Question A should still be attempted first to maximize expected winnings.
7.18 (a) Let \(N\) = the number of kids that arrive in the first half hour. Since arrivals are a Poisson process, \(N \sim \text{Poisson}(12/\text{hr} \times 0.5\ \text{hr}) = \text{Poisson}(6)\).
\[P(5 \le N \le 7) = P(N = 5) + P(N = 6) + P(N = 7) = \frac{e^{-6}(6)^{5}}{5!} + \frac{e^{-6}(6)^{6}}{6!} + \frac{e^{-6}(6)^{7}}{7!} = e^{-6}\,\frac{1296}{7} \approx 0.45892\ldots\]
(b) Let \(M\) = the number of kids that arrive per 3.5 hours. Therefore \(M \sim \text{Poisson}(\mu)\) where \(\mu = 3.5 \times 12 = 42\). If we are already familiar with the expected value of a Poisson random variable we know that \(E[M] = \mu\); however, if we are just learning about expected value we can derive this as follows:
\[E[M] = \sum_{i=0}^{\infty} iP(M = i) = \sum_{i=0}^{\infty} i\,\frac{e^{-\mu}\mu^{i}}{i!} = e^{-\mu}\sum_{i=1}^{\infty}\frac{\mu^{i}}{(i - 1)!} = \mu e^{-\mu}\sum_{j=0}^{\infty}\frac{\mu^{j}}{j!} \quad (\text{where } j = i - 1)\]
\[= \mu e^{-\mu}(e^{\mu}) = \mu = 42\]
The expected number of kids over the whole evening is 42.
(c) Using \(M\) as in the previous part, we note that
\[\frac{P(M = x)}{P(M = x + 1)} = \frac{e^{-42}\frac{42^{x}}{x!}}{e^{-42}\frac{42^{x+1}}{(x+1)!}} = \frac{x + 1}{42}\]
This is \(< 1\) if \(x < 41\) and \(\ge 1\) if \(x \ge 41\). It follows that outcomes become more likely as \(x\) increases to a maximum of 41 and then less likely thereafter. The two outcomes \(M = 41\) and \(M = 42\) are equally likely (and occur with probability \(e^{-42}\frac{42^{41}}{41!} = e^{-42}\frac{42^{42}}{42!} \approx 0.0614\ldots\)).
(d) Again, if we are familiar with the Poisson distribution we know right away that \(Var(M) = \mu\), but it would be useful to derive this from the fact that \(Var(M) = E[M(M - 1)] + \mu - \mu^{2}\).
Now
\[E[M(M - 1)] = \sum_{i=0}^{\infty} i(i - 1)P(M = i) = e^{-\mu}\sum_{i=0}^{\infty} i(i - 1)\frac{\mu^{i}}{i!} = e^{-\mu}\mu^{2}\frac{\partial^{2}}{\partial\mu^{2}}\sum_{i=0}^{\infty}\frac{\mu^{i}}{i!} = e^{-\mu}\mu^{2}\frac{\partial^{2}}{\partial\mu^{2}}e^{\mu} = e^{-\mu}\mu^{2}e^{\mu} = \mu^{2}\]
So \(Var(M) = \mu^{2} + \mu - \mu^{2} = \mu\). The variance of the number of kids arriving over the whole evening is 42.
Chapter 8:
1. 8.1 (a) no )(1, 0) 6= )
1
(1))
2
(0) (b) 0.3and1/3
8.2 (a) mean=0.15, variance=0.15
8.3 (a) No (b) 0.978 (c) .05
8.4 (a)
(a+j+9)!
a!j!9!
j
a

j
(1 j )
10
for r, j = 0, 1, 2, .....
(b)

a+j+9
j


j
(1 )
a+10
; j = 0, 1, 2, ...; 6 .0527
8.5 (b) - .10dollars, (c) d = .95,:
8.6 Let A = :n:/cr of bacteriain50 ccof water. ThenA hasaPoisson(2.5) distribution.
1(A = ,) =
2.5
)
c
2.5
,!
, , = 0, 1, ...
Thenif (A
1
, A
2
, A
3
, A
4
, A
5
) represent thecountsinthevesamples
1(A
1
= 0, A
2
= 1, A
3
= 1, A
4
= 2, A
5
= 2) = c
2.5

2.5c
2.5

2.5
2
c
2.5
2!

1 c
2.5
2.5c
2.5

1(A
1
= 0, A
2
= 1, A
3
= 1, A
4
= 2, A
5
2) = c
2.5

2.5c
2.5

2.5
2
c
2.5
2!

1 c
2.5
2.5c
2.5

2
269
Sincethesecanoccur inanyorder, thereare
5!
2!2!
orderings of therstpossibilityand
5!
2!
of
thesecondsothat theprobability ve50c.c. samplesof water have1withnobacteria, 2
withonebacterium, and2withtwoor moreis
5!
2!2!
c
10

2.5
6
4

1 c
2.5
2.5c
2.5

+
5!
2!
c
7.5

2.5
4
2

1 c
2.5
2.5c
2.5

2.5
2
c
2.5
2!

8.7 (a)

5
a

3
2a

5a
ja

1+a
2+aj

8
2

6
2
; for r = 0, 1, 2, ; j = :ar(1, r), r + 1, r + 2;
(b) notee.g. that )
1
(0) 6= 0; )
2
(3) 6= 0, but )(0, 3) = 0
8.8 (a)
(
2
i
)(
1

)(
7
3i
)
(
10
3
)
; r = 0, 1, 2, andj = 0, 1
(b) )
1
(r) =

2
a

8
3a

10
3

; r = 0, 1, 2;
)
2
(j) =

1
j

9
3 j

10
3

; j = 0, 1
(c) 49/120and1/2
8.9 (a) /
2
i
c
2
a!
; r = 0, 1, 2, ... (b) c
4
(c) Yes. )(r, j) = )
1
(r))
2
(j)
(d)
4
I
c
4
t!
; t = 0, 1, 2, . . .
8.10 (b) .468
8.11 (a)
40!
(10!)
4

3
16

20

5
16

20
(b)

40
16

(1,2)
40
(c)

16
10

3
8

10

5
8

6
8.12 (a) 1.76 (b) 1(1 = j) =
8
P
a=j

a
j

(
1
2
)
a
)(r); 1(1 ) = 0.88 =
1
2
1(A)
8.13 (a) Multinomial (b) .4602 (c) $5700
8.15 207.867
8.16 (a) 1i(:, j +) (b) :(j +) and:(j +)(1 j ) (c) :j
8.17 (a) j
l
= 2, j
\
= 0, o
2
l
= o
2
\
= 1 (b) 0
(c) no. e.g. 1(l = 0) 6= 0; 1(\ = 1) 6= 0; 1(l = 0 and\ = 1) = 0
8.19 -1
8.20 (a) 1.22 (b) 17.67%
8.21 j
3
(4 +j); 4j
3
(1 j
3
) +j
4
(1 j
4
) + 8j
5
(1 j
2
)
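8.22 Suppose \(P\) is \(N \times N\) and let \(\mathbf{1}\) be a column vector of ones of length \(N\). Consider the probability vector corresponding to the discrete uniform distribution \(\pi = \frac{1}{N}\mathbf{1}\). Then
\[\pi^{T}P = \frac{1}{N}\mathbf{1}^{T}P = \frac{1}{N}\left(\sum_{i} P_{i1}, \sum_{i} P_{i2}, \ldots, \sum_{i} P_{iN}\right) = \frac{1}{N}\mathbf{1}^{T} = \pi^{T}\]
since \(P\) is doubly stochastic. Therefore \(\pi\) is a stationary distribution of the Markov chain.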
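8.23 The transition matrix is
\[P = \begin{pmatrix} 0 & 1 & 0 \\ \frac{2}{3} & 0 & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & 0 \end{pmatrix}\]
from which, solving \(\pi^{T}P = \pi^{T}\) and rescaling so that the sum of the probabilities is one, we obtain \(\pi^{T} = (0.4, 0.45, 0.15)\), the long run fraction of time spent in cities A, B, C respectively.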
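The stationary distribution in 8.23 can also be approximated numerically by repeatedly applying the transition matrix to a starting distribution. This is a sketch of that power-iteration idea in plain Python, not the method used in the notes (which solve the linear equations directly); the helper name `step` and the choice of 200 iterations are illustrative.

```python
# Transition matrix from 8.23 (rows: current city A, B, C)
P = [
    [0, 1, 0],
    [2 / 3, 0, 1 / 3],
    [2 / 3, 1 / 3, 0],
]


def step(pi, P):
    """One step of the chain: return the row vector pi multiplied by P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


# Start from the uniform distribution and iterate until it settles
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):
    pi = step(pi, P)

print([round(v, 4) for v in pi])
```

The same loop applies to any finite chain whose distribution converges, such as the matrix in 8.24.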
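8.24 By arguments similar to those in section 8.3, the limiting matrix has rows all identically \(\pi^{T}\) where the vector \(\pi^{T}\) are the stationary probabilities satisfying \(\pi^{T}P = \pi^{T}\) and
\[P = \begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{6} & \frac{1}{2} & \frac{1}{3} \\ 0 & \frac{2}{3} & \frac{1}{3} \end{pmatrix}\]
The solution is \(\pi^{T} = (0.1, 0.6, 0.3)\) and the limit is
\[\begin{pmatrix} 0.1 & 0.6 & 0.3 \\ 0.1 & 0.6 & 0.3 \\ 0.1 & 0.6 & 0.3 \end{pmatrix}\]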
8.25 With \(Z = X + Y\),
\[M_{Z}(t) = Ee^{t(X+Y)} = E(e^{tX})E(e^{tY}) = M_{X}(t)M_{Y}(t) = \exp(-\lambda_{1} + \lambda_{1}e^{t})\exp(-\lambda_{2} + \lambda_{2}e^{t}) = \exp(-(\lambda_{1} + \lambda_{2}) + (\lambda_{1} + \lambda_{2})e^{t})\]
and since this is the MGF of a Poisson(\(\lambda_{1} + \lambda_{2}\)) distribution, this must be the distribution of \(Z\).
8.26 If today is raining, the probability of Rain, Nice, Snow three days from now is obtainable from the first row of the matrix \(P^{3}\), i.e. (0.406 0.203 0.391). The probabilities of the three states in five days, given (i) today is raining (ii) today is nice (iii) today is snowing, are the three rows of the matrix \(P^{5}\). In this case all rows are identical to three decimals; they are all equal to the equilibrium distribution \(\pi^{T} = (0.400\ 0.200\ 0.400)\).
(b) If ab, andbothpartiesraisethentheprobabilityB winsis
13 /
2(13 a)
<
1
2
andtheprobabilityA winsis1minusthisor
132o+b
2(13o)
. If a /, thentheprobabilityA
winsis
13 a
2(13 /)
.
8.27 (a) Inthespecial caseb=1, countthenumber of possible pairs(i, ,) for which = i a
and1 = , .
1 ( = 12)
2 ( = 11)
:
13 a ( = a)
(13o)(13o+1)
2
Total
Thisleadsto
1(1 , a) =
(13 a)(13 a + 1)
2(13
2
)
Similarly, sincethenumber of pairs(, 1) for which a, and1 < a is(13 a +
1)(a 1), wehave
1( 1, a) = 1( 1, a, 1 a) +1( 1, a, 1 < a)
=
(13 a)(13 a + 1)
2(13
2
)
+
(13 a + 1)(a 1)
13
2
=
(14 a)(a + 11)
2(13
2
)
Therefore, incaseb=1, theexpectedwinningsof A are
11(B raises, A doesnot) 61(bothraise, B wins) + 61(bothraise, A wins)
= 11( < a) 61(1 , a) + 61( 1, a)
= 1
a 1
13
6
(13 a)(13 a + 1)
2(13
2
)
+ 6
(14 a)(a + 11)
2(13
2
)
=
6
169
a
2
+
77
169
a
71
169
=
1
169
(a 1) (6a 71)
:whosemaximum(overreal a) isat77,12 andoverintegera, at6 or7. Fora=1,2,...,13
thisgivesexpectedwinningsof 0, 0.38462, 0.69231, 0.92308, 1.0769, 1.1538, 1.1538,
1.0769, 0.92308, 0.69231, 0.38462, 0, -0.46154 respectively, andthemaximum is
for a=6 or 7.
(b) Wewant1( 1, a, 1 /). Countthenumber of pairs(i, ,) for which a
and1 / and 1. Assumethat / a.
1 (1 = 12)
2 (1 = 11)
: :
13 a (1 = a)
(a /)(13 a + 1) (/ 1 < a)
for atotal of
(13 a)(13 a + 1)
2
+ (a /)(13 a + 1) =
1
2
(14 a) (13 +a 2/)
and
1( 1, a, 1 /) =
(14 a) (13 +a 2/)
2(13
2
)
Similarly
1( < 1, a, 1 /) = 1( < 1, a) =
(13 a)(13 a + 1)
2(13
2
)
Thereforetheexpectedreturnto (still assuming/ a) is
11( < a, 1 /) + 11( a, 1 < /)
+61( 1, a, 1 /) 61( < 1, a, 1 /)
= 1
(a 1)(13 / + 1)
13
2
+1
(/ 1)(13 a + 1)
13
2
+6
(14 a) (13 +a 2/)
2(13
2
)
6
(13 a)(13 a + 1)
2(13
2
)
=
1
13
2
(71 6a) (a /)
If / a thentheexpectedreturnto1 isobtainedbyswitchingtheroleof a, / above,
namely
1
13
2
(71 6/) (/ a)
andsotheexpectedreturnto is
1
13
2
(71 6/) (a /)
Ingeneral, thentheexpectedreturntoA is
1
13
2
(71 6 max(a, /)) (a /)
(c) Bypart (b), Aspossibleexpectedprot per gamefor a=1,2,...,13 and / = 11 is
1
13
2
(71 6 max(a, 11)) (a 11) =
6
13
2

max(a, 11)
71
6

(a 11)
For a = 1, 2, ...13 theseare, respectively,-0.2959, -0.2663, -0.2367, -0.2071, -0.1775,
-0.1479, -0.1183,-0.0888, -0.0592, -0.0296, 0, -0.0059, -0.0828. Thereisnostrategy
that providesapositiveexpectedreturn. Theoptimal isthebreak-evenstrategya=11.
(Note: inthistwo-personzero-sumgame, a= 11 andb=11 isaminimaxsolution)
8.28 i. ThepermutationA
)+1
after ,+1 requestsdependsonlyonthepermutationA
)
before
andtherecordrequestedat time, + 1. Thusthenewstatedependsonly only theold
stateA
t
(without knowingthepreviousstates) andtherecordcurrentlyrequested.
ii. For examplethelong-runprobabilityof thestate(i, ,, /) is, with
i
= j
i
,(1 j
i
),

i
j
)
iii. Theprobabilitythat record, isinposition/ = 1, 2, 3 is,
j
)
for / = 1, (Q
)
)j
)
, for / = 2, 1 j
)
(1 +Q
)
) for / = 3
whereQ =
P
3
i=1

i
. Theexpectedcost of accessingarecordinthelongrunis
3
X
)=1
{j
2
)
+ 2j
2
)
(Q
)
) + 3j
)
[1 j
)
(1 +Q
)
)]} (11.12)
Substitute j
1
= 0.1, j
2
= 0.3, j
3
= 0.6 so
1
=
1
9
,
2
=
3
7
,
3
=
6
4
and Q =
1
9
+
3
7
+
6
4
= 2.0397 and(11.12) is1.7214.
iv. If they areinrandomorder, theexpectedcost= 1(
1
3
) + 2(
1
3
) + 3(
1
3
) = 2. If they are
orderedintermsof decreasingj
)
, expectedcost isj
2
3
+ 2j
2
2
+ 3j
2
1
= 0.57
8.29 Let J =index of maximum. 1(J = ,) = 1,, for , = 1, 2, ..., . Let =" your
strategy chooses themaximum". occurs only J / andif max{A
i
; / < i < J} <
max{A
i
; 1 i /}. GivenJ = , /, theprobability of this is theprobability that
themaximummax{A
i
; 1 i < ,} occursamongtherst / values, i.e. theprobabilityis
/,(, 1). Therefore,
1() =
X
)
1(|J = ,)1(J = ,) =
.
X
)=I+1
1(|J = ,)
1

=
.
X
)=I+1
/
, 1
1

=
/

{
1
/
+
1
/ + 1
+... +
1
1
}

ln(

/
)
Notethat thevalueof r maximizingrln(1,r) isr = c
1
0.37 soroughly, thebest / is
c
1
. Theprobabilitythat youselect themaximumisapproximatelyc
1
0.37.
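8.30 The optimal weights are
\[w_{1} = \frac{1}{c\sigma_{1}^{2}}, \quad w_{2} = \frac{1}{c\sigma_{2}^{2}}, \quad w_{3} = \frac{1}{c\sigma_{3}^{2}}\]
where \(c = \frac{1}{\sigma_{1}^{2}} + \frac{1}{\sigma_{2}^{2}} + \frac{1}{\sigma_{3}^{2}}\) and \(\sigma_{1} = 0.2\), \(\sigma_{2} = 0.3\), \(\sigma_{3} = 0.4\)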
Chapter 9:
1. 9.1 )(j) = (
5
6
)(
6

)
1
3
j

2
3
for .036 j

6
9.2 (a) / = .75; 1(r) = .75

2
3
+r
a
3
3

for 1 r 1
(b) Findc suchthat c
3
3c + 1.9 = 0. Thisgivesc = .811
9.3 (a) 1/2, 1/24 (b) 0.2828 (c) 0.2043
9.4 )(j) = 1; 0 < j < 1
9.5 (a) c 1 (b) 0.5
c+1
,
c+1
c+2
(c)
c+1
t
o+2
; 1 < t <
9.6 (a) (1 c
2
)
3
(b) c
.4
9.7 1000 log 2 = 693.14
9.8 (a) .0668, .2417, .3829, .2417, .0668 (b) .0062 (c) .0771
9.9 (a) .5 (b) j 2.023
9.10 0.4134
9.11 (a) .3868 (b) .6083 (c) 6.94
9.12 (a) .0062 (b) .9927
9.13 (a) .2327, .1841 (b) .8212, .8665; Guessif j
i
= 0.45, dont guessif j
i
= 0.55
9.14 6.092cents
9.15 574
9.16 (a) 7.6478, 88.7630
(b) 764.78, 8876.30, peoplewithinpooledsamplesareindependent andeachpooledsam-
pleisindependent of eachother pooledsample.
(c) 0.3520
9.17 0.5969
9.18 (a) .6728 (b) 250,088
9.19 (a) A (.02:, .9996:)
(b) 1(A 0) = .4641, .4443, .4207 (usingtable) for : = 20, 50, 100 Themoreyouplay,
thesmaller your chanceof winning.
(c) 1264.51
Withprobability.99thecasinosprot isat least $1264.51.
9.20 (a) A is approximately

a
2
,
5a
12

(b) (i)1(A 0) = 1(7 2.45) = 0.0071. (ii)


1(A 0) = 1(7 5.48) ' 0.
9.21 (a) (i) .202 (ii) .106 (b) .0475, .0475
9.22 (a) Falsepositiveprobabilitiesare1(7
o
3
) = 0.0475, 0.092, 0.023 for 7 standardnor-
mal andd = 5, 4, 6 in(i), (ii), (iii). Falsenegativeprobabilitiesare1(7 <
o10
3
) =.0475,
0.023, 0.092 for 7 standard normal and d = 5, 4, 6 in (i), (ii), (iii). (b)Thefactors are
thesecurity (proportionof spaminemail) andproportionof legitimatemessages that are
lteredout.
9.23
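9.24 Let $Y$ = total change over the day. Given $N = n$, $Y$ has a Normal$(0, n\sigma^2)$ distribution and therefore
$$E[e^{tY} \mid N = n] = \exp(n\sigma^2 t^2/2)$$
$$\begin{aligned}
M_Y(t) = E[e^{tY}] &= \sum_n E[e^{tY} \mid N = n]\,P(N = n) = e^{-\lambda} \sum_n \exp(n\sigma^2 t^2/2)\,\frac{\lambda^n}{n!} \\
&= e^{-\lambda} \sum_n \frac{\left(e^{\sigma^2 t^2/2}\lambda\right)^n}{n!} = \exp\left(-\lambda + e^{\sigma^2 t^2/2}\lambda\right)
\end{aligned}$$
This is not an MGF seen in this course, at least. The mean is $M_Y'(0) = 0$ and the variance is $M_Y''(0) = \lambda\sigma^2$.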
9.25 (a) i. $\exp(t + t^2)$
ii. $\exp(2t + 2t^2)$
iii. $\exp(nt + nt^2)$
iv. $\exp(t^2)$
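9.27 (a) Clearly, $f(x) \ge 0$ for any $x \in \mathbb{R}$. We need to check that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. We have
$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= \int_0^{\infty} \frac{x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}\, e^{-x/\beta}\,dx \\
&= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx \\
&= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} (\beta y)^{\alpha-1} e^{-y}\,d(\beta y) && \text{(change of variable } x/\beta = y\text{)} \\
&= \frac{1}{\Gamma(\alpha)} \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy \\
&= 1. && \text{(by definition of the Gamma function)}
\end{aligned}$$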
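(b) We first find the moment generating function of $X$:
$$\begin{aligned}
M(t) = E[e^{tX}] &= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} e^{-\left(\frac{1}{\beta} - t\right)x}\,dx \\
&= \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} \left(\frac{y}{\frac{1}{\beta} - t}\right)^{\alpha-1} e^{-y}\,d\!\left(\frac{y}{\frac{1}{\beta} - t}\right) && \text{(after the substitution } y = \left(\tfrac{1}{\beta} - t\right)x\text{)} \\
&= \frac{1}{\Gamma(\alpha)(1 - \beta t)^{\alpha}} \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy \\
&= \frac{1}{(1 - \beta t)^{\alpha}} && \text{(by definition of } \Gamma\text{)}
\end{aligned}$$
For $M(\cdot)$ to exist we must have $t < \frac{1}{\beta}$; otherwise, the integrals do not exist. Since $\beta > 0$, $t = 0$ is in the domain of $M(\cdot)$. Therefore,
$$E[X] = \frac{dM}{dt}(0) = \alpha\beta, \quad \text{and} \quad E[X^2] = \frac{d^2M}{dt^2}(0) = \alpha(\alpha + 1)\beta^2.$$
Hence,
$$Var(X) = E[X^2] - (E[X])^2 = \alpha\beta^2.$$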
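(c) For $\alpha = 1$, we have $\Gamma(1) = 1$. Then,
$$f(x) = \begin{cases} \frac{1}{\beta} e^{-x/\beta} & x > 0 \\ 0 & \text{otherwise.} \end{cases}$$
This is the PDF of an exponential r.v. with parameter $\lambda = \frac{1}{\beta}$ or $\theta = \beta$.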
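9.28 (a) In this part, we use the following result from calculus:
Fact: Let $f : \mathbb{R} \to [0, \infty)$ be a nonnegative function. Also, $f(\cdot)$ is monotonic. Then,
$$\int_0^{\infty} f(x)\,dx \ge \sum_{n=1}^{\infty} f(n).$$
We have $E[X] = \int_{-\infty}^{\infty} \frac{cx}{\pi(c^2 + x^2)}\,dx$ and $E[X^2] = \int_{-\infty}^{\infty} \frac{cx^2}{\pi(c^2 + x^2)}\,dx$.
The integral $\int_{-\infty}^{\infty} \frac{cx}{\pi(c^2 + x^2)}\,dx$ does not exist because, in particular, $\int_c^{\infty} \frac{x}{c^2 + x^2}\,dx$ does not exist. To see this, note that the function $x \mapsto \frac{x}{c^2 + x^2}$ is decreasing for $x \ge c$. Therefore,
$$\int_c^{\infty} \frac{x}{c^2 + x^2}\,dx \ge \sum_{n=\lceil c \rceil + 1}^{\infty} \frac{n}{c^2 + n^2}.$$
But $\sum_{n=\lceil c \rceil + 1}^{\infty} \frac{n}{c^2 + n^2}$ does not exist, since its terms behave like $\frac{1}{n}$ for large $n$.
Also, $\int_{-\infty}^{\infty} \frac{cx^2}{\pi(c^2 + x^2)}\,dx$ does not exist because $\int_0^{\infty} \frac{x^2}{c^2 + x^2}\,dx$ does not exist. To see this, note that the function $x \mapsto \frac{x^2}{c^2 + x^2}$ is increasing for $x \ge 0$. Hence,
$$\int_0^{\infty} \frac{x^2}{c^2 + x^2}\,dx \ge \sum_{n=0}^{\infty} \frac{n^2}{c^2 + n^2}.$$
But $\sum_{n=0}^{\infty} \frac{n^2}{c^2 + n^2}$ does not exist, since its terms tend to 1.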
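(b) Let $Y = \frac{1}{X}$. Show that $Y$ has a Cauchy distribution with parameter $\frac{1}{c}$. Let us first find $P(Y \le y)$ for some $y \in \mathbb{R}$. We have $P(Y \le y) = P\left(\frac{1}{X} \le y\right)$. Since $X$ can be positive or negative, we consider two cases:
1- Let $y > 0$. Then,
$$P\left(\frac{1}{X} \le y\right) = P\left(\frac{1}{X} \le y, X > 0\right) + P\left(\frac{1}{X} \le y, X < 0\right) = P\left(X \ge \frac{1}{y}\right) + P(X < 0).$$
The second equality is due to the fact that for $y > 0$, $\left\{\frac{1}{X} \le y, X > 0\right\} = \left\{X \ge \frac{1}{y}\right\}$ and $\left\{\frac{1}{X} \le y, X < 0\right\} = \{X < 0\}$. Also, since the PDF of $X$ is symmetric around the origin, $P(X < 0) = \frac{1}{2}$. Therefore, using $P\left(X \ge \frac{1}{y}\right) = 1 - P\left(X \le \frac{1}{y}\right)$,
$$P\left(\frac{1}{X} \le y\right) = \frac{3}{2} - P\left(X \le \frac{1}{y}\right). \tag{11.13}$$
2- Let $y < 0$. In this case, $\left\{\frac{1}{X} \le y\right\} = \left\{\frac{1}{y} \le X < 0\right\}$. Then,
$$P\left(\frac{1}{X} \le y\right) = P\left(\frac{1}{y} \le X < 0\right) = P(X < 0) - P\left(X \le \frac{1}{y}\right) = \frac{1}{2} - P\left(X \le \frac{1}{y}\right). \tag{11.14}$$
To recap,
$$P(Y \le y) = \begin{cases} \frac{3}{2} - P\left(X \le \frac{1}{y}\right) & y > 0 \\ \frac{1}{2} - P\left(X \le \frac{1}{y}\right) & y < 0 \end{cases} \tag{11.15}$$
Therefore, for any $y \ne 0$,
$$f_Y(y) = \frac{d}{dy} P(Y \le y) = -\frac{d}{dy} P\left(X \le \frac{1}{y}\right) = \frac{1}{y^2} f_X\left(\frac{1}{y}\right) = \frac{c}{\pi(c^2 y^2 + 1)} = \frac{\frac{1}{c}}{\pi\left(y^2 + \frac{1}{c^2}\right)}. \tag{11.16}$$
Therefore $\frac{1}{X}$ is a Cauchy random variable with parameter $\frac{1}{c}$.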
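(c) Find the c.d.f. $F(\cdot)$ and the inverse c.d.f. $F^{-1}(\cdot)$ for the random variable $X$. We know that
$$F(x) = \int_{-\infty}^{x} f(w)\,dw = \int_{-\infty}^{x} \frac{c}{\pi(c^2 + w^2)}\,dw. \tag{11.17}$$
Recall that
$$\frac{d}{dw} \tan^{-1}\left(\frac{w}{c}\right) = \frac{c}{c^2 + w^2}. \tag{11.18}$$
Therefore,
$$\begin{aligned}
F(x) &= \left[\frac{1}{\pi} \tan^{-1}\left(\frac{w}{c}\right)\right]_{-\infty}^{x} = \frac{1}{\pi}\left[\tan^{-1}\left(\frac{x}{c}\right) - \tan^{-1}(-\infty)\right] \\
&= \frac{1}{\pi}\left[\tan^{-1}\left(\frac{x}{c}\right) + \frac{\pi}{2}\right] = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}\left(\frac{x}{c}\right).
\end{aligned} \tag{11.19}$$
Since $F(\cdot)$ is increasing, the inverse c.d.f. is in fact the inverse function of $F(\cdot)$, i.e.,
$$F^{-1}(u) = c \tan\left(\pi\left(u - \frac{1}{2}\right)\right), \quad u \in (0, 1). \tag{11.20}$$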
(d) Assume we have access to a uniform random variable $U \sim$ Uniform$[-1, 0]$. Suggest a function $g(\cdot)$ such that $g(U)$ is a Cauchy random variable with parameter $c$. We know that if $V \sim$ Uniform$[0, 1]$, then $F^{-1}(V)$ is a Cauchy random variable with parameter $c$. However, $U$ is uniform in $[-1, 0]$. Then, one can easily check that $1 + U$ is simply a Uniform$[0, 1]$ random variable. (Note: $-U$ would also work here.) Therefore, $g(U) = F^{-1}(1 + U) = c \tan\left(\pi\left(U + \frac{1}{2}\right)\right)$ is the desired function.
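The construction in part (d) is the inverse transform method, and it can be sketched in a few lines. The code below is an illustration only: the function names, the seed, the value $c = 2$, and the sample size are all assumptions made for the example.

```python
import math
import random

def cauchy_from_uniform(u, c):
    """Map u from Uniform[-1, 0] to a Cauchy(c) value via g(u) = c*tan(pi*(u + 1/2))."""
    return c * math.tan(math.pi * (u + 0.5))

def cauchy_cdf(x, c):
    """C.d.f. of the Cauchy distribution: F(x) = 1/2 + arctan(x/c)/pi."""
    return 0.5 + math.atan(x / c) / math.pi

random.seed(0)          # fixed seed so the run is repeatable
c = 2.0                 # illustrative parameter value
# -random.random() is uniform on (-1, 0], standing in for U.
samples = [cauchy_from_uniform(-random.random(), c) for _ in range(100_000)]

# The empirical fraction of samples <= c should be near F(c) = 3/4.
frac = sum(s <= c for s in samples) / len(samples)
print(round(frac, 3), cauchy_cdf(c, c))
```

Applying `cauchy_cdf` to `cauchy_from_uniform(u, c)` returns $1 + u$, which is the check that $g(U)$ has the intended distribution.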
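9.29 Let $\mathcal{E}$ be an event and $X$ be a continuous random variable. We define
$$f_{X|\mathcal{E}}(x|\mathcal{E}) = \frac{d}{dx} P(X \le x \mid \mathcal{E})$$
and
$$P(\mathcal{E} \mid X = x) = \frac{f_{X|\mathcal{E}}(x|\mathcal{E})\,P(\mathcal{E})}{f_X(x)}.$$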
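(a) Here $\mathcal{E} = \{|X| < 1\}$. By the definition of $f_{X|\mathcal{E}}(x|\mathcal{E})$, we need to first find $P(X \le x \mid \mathcal{E})$. We have
$$P(X \le x \mid \mathcal{E}) = P(X \le x \mid |X| < 1) = \frac{P(X \le x, |X| < 1)}{P(|X| < 1)}.$$
Now, it is clear that
$$P(X \le x, |X| < 1) = \begin{cases} 0 & x < -1 \\ P(-1 < X \le x) & |x| < 1 \\ P(|X| < 1) & x \ge 1 \end{cases} = \begin{cases} 0 & x < -1 \\ F(x) - F(-1) & |x| < 1 \\ P(|X| < 1) & x \ge 1 \end{cases} \tag{11.21}$$
Therefore,
$$\frac{P(X \le x, |X| < 1)}{P(|X| < 1)} = \begin{cases} 0 & x < -1 \\ \frac{F(x)}{P(|X| < 1)} - \frac{F(-1)}{P(|X| < 1)} & |x| < 1 \\ 1 & x \ge 1 \end{cases} \tag{11.22}$$
Therefore,
$$f_{X|\mathcal{E}}(x|\mathcal{E}) = \frac{d}{dx} P(X \le x \mid \mathcal{E}) = \begin{cases} 0 & x < -1 \\ \frac{d}{dx} \frac{F(x)}{P(|X| < 1)} & |x| < 1 \\ 0 & x \ge 1 \end{cases} = \begin{cases} \frac{f(x)}{P(|X| < 1)} & |x| < 1 \\ 0 & |x| \ge 1 \end{cases} \tag{11.23}$$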
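(b) For an event $\mathcal{E}$, we have
$$P(\mathcal{E} \mid X = x) f(x) = f_{X|\mathcal{E}}(x|\mathcal{E})\,P(\mathcal{E}). \tag{11.24}$$
Now, integrate both sides from $-\infty$ to $\infty$,
$$\begin{aligned}
\int_{-\infty}^{\infty} P(\mathcal{E} \mid X = x) f(x)\,dx &= \int_{-\infty}^{\infty} f_{X|\mathcal{E}}(x|\mathcal{E})\,P(\mathcal{E})\,dx \\
&= P(\mathcal{E}) \int_{-\infty}^{\infty} f_{X|\mathcal{E}}(x|\mathcal{E})\,dx \\
&= P(\mathcal{E}) \int_{-\infty}^{\infty} \frac{d}{dx} P(X \le x \mid \mathcal{E})\,dx \\
&= P(\mathcal{E}) \left(P(X < \infty \mid \mathcal{E}) - P(X < -\infty \mid \mathcal{E})\right).
\end{aligned} \tag{11.25}$$
But $\{X < \infty\} = S$, where $S$ is the sample space, and $\{X < -\infty\} = \emptyset$. Therefore,
$$\int_{-\infty}^{\infty} P(\mathcal{E} \mid X = x) f(x)\,dx = P(\mathcal{E})(1 - 0) = P(\mathcal{E}) \tag{11.26}$$
as desired.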
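(c) By part (b),
$$\begin{aligned}
P(\mathcal{E}) = \int_{-\infty}^{\infty} P(\mathcal{E} \mid X = x) f(x)\,dx &= \int_0^1 \binom{n}{k} x^k (1 - x)^{n-k}\,dx \\
&= \binom{n}{k} \int_0^1 x^k (1 - x)^{n-k}\,dx \\
&= \binom{n}{k} \frac{k!\,(n - k)!}{(k + n - k + 1)!} \\
&= \frac{n!}{k!\,(n - k)!} \cdot \frac{k!\,(n - k)!}{(n + 1)!} \\
&= \frac{n!}{(n + 1)!} = \frac{1}{n + 1}.
\end{aligned} \tag{11.27}$$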
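The identity in (11.27) can be checked exactly for small cases. The sketch below evaluates the integral by expanding $(1 - x)^{n-k}$ with the binomial theorem and integrating term by term, which is a different route from the factorial formula used above; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def beta_integral(k, n):
    """Exact integral of x^k (1-x)^(n-k) over [0, 1], by term-by-term integration."""
    m = n - k
    # (1-x)^m = sum_j C(m, j) (-1)^j x^j, and the integral of x^(k+j) is 1/(k+j+1).
    return sum(Fraction((-1) ** j * comb(m, j), k + j + 1) for j in range(m + 1))

def prob_event(k, n):
    """C(n, k) times the integral, which (11.27) says equals 1/(n+1)."""
    return comb(n, k) * beta_integral(k, n)

for n in range(1, 6):
    print(n, [prob_event(k, n) for k in range(n + 1)])
```

Every entry printed for a given $n$ is the same fraction $\frac{1}{n+1}$, independent of $k$.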
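9.30 (a) Let $X$ be the lifetime of the car in kilometres. It is an exponential random variable with parameter $\lambda = \frac{1}{E[X]} = \frac{1}{200{,}000}$. By the memoryless property of exponential random variables,
$$\begin{aligned}
P(X \ge 200{,}000 + 100{,}000 \mid X \ge 100{,}000) &= P(X \ge 200{,}000) = 1 - P(X < 200{,}000) \\
&= 1 - \left(1 - e^{-\lambda \cdot 200{,}000}\right) \\
&= e^{-\frac{1}{200{,}000} \cdot 200{,}000} = e^{-1} = 0.3679.
\end{aligned} \tag{11.28}$$
(b) Let $X$ be the lifetime of the car in kilometres, now Uniform$[0, 400{,}000]$. We have
$$\begin{aligned}
P(X \ge 200{,}000 + 100{,}000 \mid X \ge 100{,}000) &= \frac{P(X \ge 300{,}000, X \ge 100{,}000)}{P(X \ge 100{,}000)} = \frac{P(X \ge 300{,}000)}{P(X \ge 100{,}000)} \\
&= \frac{\int_{300{,}000}^{400{,}000} \frac{1}{400{,}000}\,dx}{\int_{100{,}000}^{400{,}000} \frac{1}{400{,}000}\,dx} = \frac{1}{3} = 0.3333
\end{aligned} \tag{11.29}$$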
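The two conditional probabilities in 9.30 can be computed side by side from the survival functions. This is a small sketch with illustrative helper names; it uses only the definition $P(X \ge a + b \mid X \ge a) = P(X \ge a + b) / P(X \ge a)$.

```python
from math import exp

def exp_survival(x, mean):
    """P(X >= x) for an exponential lifetime with the given mean."""
    return exp(-x / mean)

def uniform_survival(x, upper):
    """P(X >= x) for a Uniform[0, upper] lifetime."""
    return max(0.0, min(1.0, (upper - x) / upper))

def conditional_survival(survival, extra, already):
    """P(X >= already + extra | X >= already)."""
    return survival(already + extra) / survival(already)

p_exp = conditional_survival(lambda x: exp_survival(x, 200_000), 200_000, 100_000)
p_unif = conditional_survival(lambda x: uniform_survival(x, 400_000), 200_000, 100_000)
print(round(p_exp, 4), round(p_unif, 4))
```

For the exponential model the result does not depend on the 100,000 km already driven, while for the uniform model it does.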
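9.31 Let $X$ be the bolt's diameter. Then $X \sim N(1.2, (0.005)^2)$. We want to find the quantity $P(X > 1.21 \text{ or } X < 1.19)$. Note that $Z = \frac{X - 1.2}{0.005} \sim N(0, 1)$. We have
$$\begin{aligned}
P(X > 1.21 \text{ or } X < 1.19) &= P(X > 1.21) + P(X < 1.19) \\
&= P\left(\frac{X - 1.2}{0.005} > \frac{1.21 - 1.2}{0.005}\right) + P\left(\frac{X - 1.2}{0.005} < \frac{1.19 - 1.2}{0.005}\right) \\
&= P(Z > 2) + P(Z < -2) \\
&= 1 - P(Z < 2) + (1 - P(Z < 2)) \\
&= 2(1 - P(Z < 2)) = 2(1 - F(2)) = 2(1 - 0.97725) = 0.0455.
\end{aligned}$$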
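The same two-sided tail probability can be evaluated without the table. The sketch below writes the normal c.d.f. in terms of the error function from Python's standard library; the function name `normal_cdf` is illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """C.d.f. of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma = 1.2, 0.005
# P(X > 1.21) + P(X < 1.19)
p_out = (1.0 - normal_cdf(1.21, mu, sigma)) + normal_cdf(1.19, mu, sigma)
print(round(p_out, 4))
```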
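9.32 (a) $S_n$ is the time we need to wait until the $n^{th}$ hit occurs.
(b) If $S_n \le t$, then the $n^{th}$ hit has happened sometime in $(0, t]$. Therefore, $X_t \ge n$, because $X_t$ counts the number of hits in $(0, t]$. Conversely, if $X_t \ge n$, it means that the number of hits that occurred up to time $t$ is at least $n$. Therefore, the $n^{th}$ hit has happened sometime in $(0, t]$, i.e., $S_n \le t$.
(c) We know that $X_t \sim$ Poisson$(\lambda t)$. Therefore,
$$\begin{aligned}
P(S_n \le t) = P(X_t \ge n) &= 1 - P(X_t < n) \\
&= 1 - \sum_{j=0}^{n-1} P(X_t = j) \\
&= 1 - \sum_{j=0}^{n-1} \frac{e^{-\lambda t} (\lambda t)^j}{j!}.
\end{aligned} \tag{11.30}$$
(d) The PDF of $S_n$ is given by
$$\begin{aligned}
f_{S_n}(t) &= \frac{d}{dt} P(S_n \le t) \\
&= -\sum_{j=0}^{n-1} \left[-\lambda e^{-\lambda t} \frac{(\lambda t)^j}{j!} + j\lambda(\lambda t)^{j-1} \frac{e^{-\lambda t}}{j!}\right] \\
&= \lambda e^{-\lambda t} \left[1 + \sum_{j=1}^{n-1} \left(\frac{(\lambda t)^j}{j!} - \frac{(\lambda t)^{j-1}}{(j - 1)!}\right)\right] \\
&= \lambda e^{-\lambda t} \left[1 + \left(\frac{(\lambda t)^{n-1}}{(n - 1)!} - \frac{(\lambda t)^0}{0!}\right)\right] \\
&= \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n - 1)!}.
\end{aligned}$$
This is for $t \ge 0$. For $t < 0$, we simply have $f_{S_n}(t) = 0$. Therefore,
$$f_{S_n}(t) = \begin{cases} \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n - 1)!} & t \ge 0 \\ 0 & t < 0 \end{cases} \tag{11.31}$$
Noting that $\Gamma(n) = (n - 1)!$, we have
$$f_{S_n}(t) = \begin{cases} \frac{\lambda^n t^{n-1} e^{-\lambda t}}{\Gamma(n)} & t \ge 0 \\ 0 & t < 0 \end{cases} \tag{11.32}$$
which is the PDF of a Gamma random variable with parameters $\alpha = n$ and $\beta = \frac{1}{\lambda}$.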
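Summary of Distributions

Discrete distributions. Each entry lists the notation and parameters, the probability function $f(x)$, the mean, the variance, and the moment generating function $M_X(t)$.

Binomial$(n, p)$, $0 < p < 1$, $q = 1 - p$: $f(x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, 2, \ldots, n$; mean $np$; variance $npq$; $M_X(t) = (pe^t + q)^n$
Bernoulli$(p)$, $0 < p < 1$, $q = 1 - p$: $f(x) = p^x (1 - p)^{1-x}$, $x = 0, 1$; mean $p$; variance $p(1 - p)$; $M_X(t) = pe^t + q$
Negative Binomial$(k, p)$, $0 < p < 1$, $q = 1 - p$: $f(x) = \binom{x + k - 1}{x} p^k q^x$, $x = 0, 1, 2, \ldots$; mean $\frac{kq}{p}$; variance $\frac{kq}{p^2}$; $M_X(t) = \left(\frac{p}{1 - qe^t}\right)^k$
Geometric$(p)$, $0 < p < 1$, $q = 1 - p$: $f(x) = pq^x$, $x = 0, 1, 2, \ldots$; mean $\frac{q}{p}$; variance $\frac{q}{p^2}$; $M_X(t) = \frac{p}{1 - qe^t}$
Hypergeometric$(N, r, n)$, $r < N$, $n < N$: $f(x) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}}$, $x = 0, 1, 2, \ldots, \min(r, n)$; mean $\frac{nr}{N}$; variance $n \frac{r}{N}\left(1 - \frac{r}{N}\right)\frac{N - n}{N - 1}$; $M_X(t)$ intractable
Poisson$(\lambda)$, $\lambda > 0$: $f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, \ldots$; mean $\lambda$; variance $\lambda$; $M_X(t) = e^{\lambda(e^t - 1)}$

Continuous distributions. Each entry lists the notation and parameters, the p.d.f. $f(x)$, the mean, the variance, and the moment generating function $M_X(t)$.

Uniform$(a, b)$: $f(x) = \frac{1}{b - a}$, $a < x < b$; mean $\frac{a + b}{2}$; variance $\frac{(b - a)^2}{12}$; $M_X(t) = \frac{e^{bt} - e^{at}}{(b - a)t}$
Exponential$(\theta)$, $0 < \theta$: $f(x) = \frac{1}{\theta} e^{-x/\theta}$, $0 < x$; mean $\theta$; variance $\theta^2$; $M_X(t) = \frac{1}{1 - \theta t}$, $t < 1/\theta$
Normal$(\mu, \sigma^2)$, $-\infty < \mu < \infty$, $\sigma^2 > 0$: $f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x - \mu)^2/(2\sigma^2)}$, $-\infty < x < \infty$; mean $\mu$; variance $\sigma^2$; $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$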
Probabilities for Standard Normal N(0,1) Distribution

The table gives the values of F(x) for x ≥ 0

x 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.99952 0.99953 0.99955 0.99957 0.99958 0.99960 0.99961 0.99962 0.99964 0.99965
3.4 0.99966 0.99968 0.99969 0.99970 0.99971 0.99972 0.99973 0.99974 0.99975 0.99976
3.5 0.99977 0.99978 0.99978 0.99979 0.99980 0.99981 0.99981 0.99982 0.99983 0.99983

Values of $F^{-1}(p)$
p 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.5 0 0.0251 0.05015 0.07527 0.1004 0.1257 0.15097 0.1764 0.20189 0.2275
0.6 0.25335 0.2793 0.30548 0.33185 0.3585 0.3853 0.41246 0.4399 0.4677 0.4959
0.7 0.52440 0.5534 0.58284 0.61281 0.6433 0.6745 0.7063 0.7388 0.77219 0.8064
0.8 0.84162 0.8779 0.91537 0.95417 0.9945 1.0364 1.08032 1.1264 1.17499 1.2265
0.9 1.28155 1.3408 1.40507 1.47579 1.5548 1.6449 1.75069 1.8808 2.05375 2.3263
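Entries of the two tables can be cross-checked with Python's standard library, which provides the standard normal c.d.f. and its inverse through `statistics.NormalDist` (available from Python 3.8). This is a small sketch; the probabilities looped over are chosen only as examples.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Inverse c.d.f. values, comparable with the "Values of F^-1(p)" table.
for p in (0.70, 0.80, 0.90, 0.95, 0.99):
    print(p, round(std_normal.inv_cdf(p), 4))

# A forward c.d.f. value, comparable with the row x = 2.0 of the main table.
print(round(std_normal.cdf(2.0), 4))
```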
