You are on page 1of 135

5/13/2016

QuantitativeMethodsOnlineCourse

ThisdocumentisauthorizedforuseonlybyNishithkumarRaval.Copyorpostingisaninfringementof
copyright.

QuantitativeMethodsOnlineCourse
PreAssessmentTestIntroduction
WelcometothepreassessmenttestfortheHBSQuantitativeMethodsTutorial.
Allquestionsmustbeansweredforyourexamtobescored.
Navigation:
Toadvancefromonequestiontothenext,selectoneoftheanswerchoicesor,ifapplicable,completewithyourown
choiceandclicktheSubmitbutton.Aftersubmittingyouranswer,youwillnotbeabletochangeit,somakesureyou
aresatisfiedwithyourselectionbeforeyousubmiteachanswer.Youmayalsoskipaquestionbypressingtheforward
advancearrow.PleasenotethatyoucanreturntoskippedquestionsusingtheJumptounansweredquestion
selectionmenuorthenavigationalarrowsatanytime.Althoughyoucanskipaquestion,youmustnavigatebacktoitand
answeritallquestionsmustbeansweredfortheexamtobescored.
Inthebriefcase,linkstoExcelspreadsheetscontainingzvalueandtvaluetablesareprovidedforyourconvenience.For
somequestions,additionallinkstoExcelspreadsheetscontainingrelevantdatawillappearimmediatelybelowthe
questiontext.
Yourresultswillbedisplayedimmediatelyuponcompletionoftheexam.
Aftercompletion,youcanreviewyouranswersatanytimebyreturningtotheexam.
Goodluck!

FrequentlyAskedQuestions
Howdifficultarethequestionsontheexam?Theexamquestionshavealevelofdifficultysimilartothe
exercisesinthecourse.
CanIrefertostatisticstextbooksandonlineresourcestohelpmeduringthetest?Yes.Thisisanopen
bookexamination.
MayIreceiveassistanceontheexam?No.AlthoughwestronglyencouragecollaborativelearningatHBS,workon
examssuchastheassessmenttestsmustbeentirelyyourown.Thusyoumayneithergivenorreceivehelponany
examquestion.
Isthisatimedexam?No.Youshouldtakeabout6090minutestocompletetheexam,dependingonyour
familiaritywiththematerial,butyoumaytakelongerifyouneedto.
WhathappensifIam(ormyinternetconnectionis)interruptedwhiletakingtheexam?Youranswer
choiceswillberecordedforthequestionsyouwereabletocompleteandyouwillbeabletopickupwhereyouleftoff
whenyoureturntotheexamsite.
HowdoIseemyexamresults?Yourresultswillbedisplayedassoonasyousubmityouranswertothefinal
question.Theresultsscreenwillindicatewhichquestionsyouansweredcorrectly.

Overview&Introduction
WelcometoQM...
Welcome!Youareabouttoembarkonajourneythatwillintroduceyoutothebasicsofquantitativeandstatistical
analysis.Thiscoursewillhelpyoudevelopyourskillsandinstinctsinapplyingquantitativemethodstoformulate,
analyze,andsolvemanagementdecisionmakingproblems.
Clickonthelinklabeled"TheTutorialanditsMethod"intheleftmenutogetstarted.

TheTutorialanditsMethod
1/135

5/13/2016

QuantitativeMethodsOnlineCourse

QMisdesignedtohelpyoudevelopquantitativeanalysisskillsinbusinesscontexts.Masteringitscontentwillhelp
youevaluatemanagementsituationsyouwillfacenotonlyinyourstudiesbutalsoasamanager.Clickontheright
arrowiconbelowtoadvancetothenextpage.
Thisisn'taformalorcomprehensivetutorialinquantitativemethods.QMwon'tmakeyouastatistician,butitwill
helpyoubecomeamoreeffectivemanager.
Thetutorial'sprimaryemphasisisondevelopinggoodjudgmentinanalyzingmanagementproblems.Whetheryou
arelearningthematerialforthefirsttimeorareusingQMtorefreshyourquantitativeskills,youcanexpectthe
tutorialtoimproveyourabilitytoformulate,analyze,andsolvemanagerialproblems.
Youwon'tbelearningquantitativeanalysisinthetypicaltextbookfashion.QM'sinteractivenatureprovides
frequentopportunitiestoassessyourunderstandingoftheconceptsandhowtoapplythemallinthecontextof
actualmanagementproblems.
Youshouldtake15to20hourstorunthroughthewholetutorial,dependingonyourfamiliaritywiththematerial.
QMoffersmanyfeatureswehopeyouwillexplore,utilize,andenjoy.

TheStoryanditsCharacters
Naturally,themostappropriatesettingforacourseonstatisticsisatropicalisland...
Somehow,"internship"isnotthewayyou'ddescribeyoursummerplanstoyourfriends.You'reflyingouttoHawaii
afterall,stayingata5starhotelasaSummerAssociatewithAvioConsulting.
Thisisagreatlearningopportunity,nodoubtaboutit.Tothinkthatyouhadalmostskippedoverthissummer
internship,asyoupreparedtoenrollinatwoyearMBAprogramthisfall.
YouarealsoexcitedthatthefirmhasassignedAlice,oneofitsrisingstars,asyourmentor.ItseemsclearthatAvio
partnersconsideryouahighpotentialinterntheyarewillingtoinvestinyouwiththehopethatyouwilllater
returnafteryoucompleteyourMBAprogram.
AlicerecentlyreceivedthelatestinaseriesofquickpromotionsatAvio.Thisisherfirstassignmentasaproject
lead:providingconsultingassistancetotheKahana,anexclusiveresorthotelontheHawaiianislandKauai.
Needlesstosay,oneoftheperksofthejobisthelodging.TheKahana'sbrochurelooksinvitingluxurysuites,fine
cuisine,aspa,sportsactivities.Andaboveall,thepristinebeachandgloriousocean.
AfteryoursuccessfulinterviewwithAvio,Alicehadgivenyouaquickbriefingonthehotelanditsmanager,Leo.
LeoinheritedtheKahanajustthreeyearsago.Hehasalwaysbeeninthehospitalityindustry,butthesheerscopeof
theluxuryhotel'soperationshashimslightlyoverwhelmed.HehasaskedforAvio'shelptobringamorerigorous
approachtohismanagementdecisionmakingprocesses.

UsingtheTutorial:AGuidetoTutorialResources
Beforeyoustartpackingyourbeachtowel,readthissectiontolearnhowtousethistutorialtoyourgreatest
advantage.
QM'sstructureandnavigationaltoolsareeasytomaster.Ifyou'rereadingthistext,youmusthaveclickedonthe
linklabeled"UsingtheTutorial"ontheleft.
Thesenavigationlinksopeninteractiveclips(likethisone)here.
Therearethreetypesofinteractiveclips:KahanaClips,ExplanatoryClips,andExerciseClips.
KahanaClipsposeproblemsthatariseinthecontextofyourconsultingengagementattheKahana.Typically,one
clipwillhaveLeoassignyouandAliceaspecifictask.InalaterKahanaClipyouwillanalyzetheproblem,andyou
andAlicewillpresentyourresultstoLeoforhisconsideration.TheKahanaclipswillgiveyouexposuretothetypes
ofbusinessproblemsthatbenefitfromtheanalyticalmethodsyou'llbelearning,andacontextforpracticingthe
methodsandinterpretingtheirresults.
Tofullybenefitfromthetutorial,youshouldsolveallofLeo'sproblems.Attheendofthetutorial,amultiplechoice
assessmentexamwillevaluateyourunderstandingofthematerial.
InExplanatoryClips,youwilllearneverythingneededtoanalyzemanagementproblemslikeLeo's.
2/135

5/13/2016

QuantitativeMethodsOnlineCourse

Complementingthetextaregraphs,illustrations,andanimationsthatwillhelpyouunderstandthematerial.Keep
onyourtoes:you'llbeaskedquestionseveninExplanatoryClipsthatyoushouldanswertocheckyour
understandingoftheconcepts.
Someexplanatoryclipsgiveyoudirectionsortipsonhowtousetheanalyticalandcomputationalfeaturesof
MicrosoftExcel.FacilitywiththenecessaryExcelfunctionswillbecriticaltosolvingthemanagementdecision
problemsinthiscourse.
QMissupplementedwithspreadsheetsofdatarelatingtotheexamplesandproblemspresented.Whenyouseea
Briefcaselinkinaclip,westronglyencourageyoutoclickonthelinktoaccessthedata.Then,practiceusingthe
Excelfunctionstoreproducethegraphsandanalysesthatappearintheclips.
YouwillalsoseeDatalinksthatyoushouldclicktoviewsummarydatarelatingtotheproblem.
ExerciseClipsprovideadditionalopportunitiesforyoutotestyourunderstandingofthematerial.Theyarea
resourcethatyoucanusetomakesurethatyouhavemasteredtheimportantconceptsineachsection.
Workthroughexercisestosolidifyyourknowledgeofthematerial.Challengeexercisesprovideopportunitiesto
tacklesomewhatmoreadvancedproblems.Thechallengeexercisesareoptionalyoushouldnothavetocomplete
themtogainthemasteryneededtopassthetutorialassessmenttest.
Thearrowbuttonsimmediatelybelowareusedfornavigationwithinclips.Ifyou'vemadeitthisfar,you'vebeen
usingtheoneontherighttomoveforward.
Usetheoneontheleftifyouwanttobackupapageortwo.
IntheupperrightoftheQMtutorialscreenarethreebuttons.FromlefttorighttheyarelinkstotheHelp,
Briefcase,andGlossary.
ToaccessadditionalHelpfeatures,clickontheHelpicon.
InyourBriefcaseyou'llfindallthedatayou'llneedtocompletethecourse,neatlystoredasExcelWorkbooks.In
manyoftheclipstherewillbelinkstospecificdocumentsintheBriefcase,buttheentireBriefcaseisavailableat
anytime.
IntheGlossary/Indexyou'llfindalistofhelpfuldefinitionsoftermsusedinthecourse,alongwithbrief
descriptionsoftheExcelfunctionsusedinthecourse.
WeencourageyoutouseallofQM'sfeaturesandresourcestothefullest.Theyaredesignedtohelpyoubuildan
intuitionforquantitativeanalysisthatyouwillneedasaneffectiveandsuccessfulmanager.

...andWelcometoHawaii!
Thedayofdeparturehascome,andyou'reinflightoverthePacificOcean.Alicegraciouslyletyoutakethewindow
seat,andyouwatchasthefoggyWestCoastrecedesbehindyou.
I'vebeentoHawaiibefore,soI'llletyouhavetheexperienceofseeingtheislandsfromtheairbeforeyousetfooton
them.
ThisLeosoundslikequiteacharacter.He'sbeeninbusinessallhislife,involvedinmanyventuressomemore
successfulthanothers.Apparently,heonceownedandmanagedagourmetspamrestaurant!
Spamisreallypopularamongtheislanders.LeotriedtoopenasecondlocationindowntownHonoluluforthetourists,
butthatdidn'tdosowell.Hehadtodeclarebankruptcy.
Then,justthreeyearsago,hisauntunexpectedlylefthimtheKahana.NowLeoisbackinbusiness,thistimewitha
largeoperationonhishands.
Itsoundstomelikehe'sthekindofmanagerwhousuallyreliesongutinstinctstomakebusinessdecisions,andlikes
totakerisks.Ithinkhe'shiredAviotohelphimmakemanagerialdecisionswith,well,betterjudgment.Hewantsto
learnhowtoapproachmanagementproblemsinamoresophisticated,analyticalfashion.
We'llbeusingsomebasicstatisticaltoolsandmethods.Iknowyou'renoexpertinstatistics,butI'llfillyouinalongthe
way.You'llbesurprisedathowquicklythey'llbecomesecondnaturetoyou.I'mconfidentyou'llbeabletodoquitea
bitoftheanalyticworksoon.

3/135

5/13/2016

QuantitativeMethodsOnlineCourse

LeoandtheHotelKahana
OnceyourplanetouchesdowninKauai,youquicklypickupyourbaggageandmeetyourhost,Leo,outsidethe
airport.
InheritingtheKahanacameasabigsurprise.MyaunthadruntheKahanaforalongtime,butIneverconsidered
thatshewouldleaveittome.
Anyway,I'vebeentryingmybesttoruntheKahanathewayahotelofitsqualitydeserves.I'vehadsomeupsand
downs.Thingshavebeenfairlysmoothforthepastyearnow,butI'verealizedthatIhavetogetmoreseriousabout
thewayImakedecisions.That'swhereyoucomeintothepicture.
Iusedtobequitearisktaker.Imadealotofdecisionsonimpulse.Now,whenIthinkofwhatIhavetolose,Ijust
wanttogetitright.
AfteryouarriveattheKahana,Leopersonallyshowsyoutoyourrooms."Ihaveatablereservedforthethreeofusat
8inthemainrestaurant,"Leoannounces."Youjusthavetotryournewchef'smangoandbrietart."

Basics:DataDescription
Leo'sDataMine
AfteryourwelcomedinnerintheKahana'smainrestaurant,LeoasksyouandAlicetomeethimthenextmorning.You
wakeupearlyenoughtotakeashortwalkonthebeachbeforeyoumakeyourwaytoLeo'soffice.
Goodmorning!Ihopeyoufoundyourroomscomfortablelastnightandarestartingtorecoverfromyourtrip.
Unfortunately,Idon'thavemuchtimethismorning.Asyourequestedonthephone,I'veassembledthemost
importantdataontheKahana.Itwasn'teasythishasn'tbeenthemostorganizedhotelintheworld,especiallysince
Itookover.There'sjustsomuchtokeeptrackof.
Thankyou,Leo.We'llhavealookatyourdatarightaway,sowecangetamoredetailedunderstandingoftheKahana
andthetypeofdatayouhaveavailableforustoworkwith.Anythinginparticularthatyou'dlikeustofocusonaswe
peruseyourfiles?
Yes.Therearetwothingsinparticularthathavebeenonmymindrecently.
Forone,weoffersomerecreationalactivitieshereattheKahana,includingascubadivingcertificationcourse.I
contractouttheoperationstoalocaldivingschool.Thecontractisupsoon,andIneedtorenewit,hireanotherschool,
ordiscontinueofferingscubalessonsalltogether.
I'dlikeyoutogetmesomequotesfromotherdivingschoolsontheislandsoIgetanideaofthecompetition'spricing
andhowitcomparestotheschoolI'vebeenusing.
I'malsoveryconcernedabouthoteloccupancyrates.Asyoumightimagine,theKahana'soccupancyfluctuatesduring
theyear,andI'dliketoknowhow,when,andwhy.I'dlovetohaveabetterfeelingforhowmanyguestsIcanexpectin
agivenmonth.
Thesefilescontainsomeinformationabouttourismontheisland,butI'dreallylikeyoutohelpmemakebettersense
ofit.SomehowIfeelthatifIcouldunderstandthepatternsinthedata,Icouldbetterpredictmyownoccupancyrates.
That'swhatwe'reheretodo.We'lltakealookatyourfilestogetbetteracquaintedwiththeKahana,andthenfocuson
divingschoolpricesandoccupancypatterns.
Thanks,oraswesayinHawaiian,Mahalo.Bytheway,we'renottooformalhereonHawaii.Asyouprobablynoticed,
yoursuite,Alice,includesaroomthathasbeensetupasanoffice.Butfeelfreetotakeyourworkdowntothebeachor
bythepoolwheneveryoulike.
Thanks!We'llcertainlytakeadvantageofthat.
Later,underaparasolatthebeach,youporeoverLeo'sfolders.Feelingabitoverwhelmed,youfindyourselfstaring
outtosea.
Alicetellsyounottoworry:"Wehaveanumberofstrategieswecanusetocompileamountainofdatalikethisinto
conciseandusefulinformation.Butnomatterwhatdatayouareworkingwith,alwaysmakesureyoureally
understandthedatabeforedoingalotofanalysisormakingmanagerialdecisions."
4/135

5/13/2016

QuantitativeMethodsOnlineCourse

WhatisAlicegettingatwhenshetellsyouto"understandthedata?"Andhowcanyoudevelopsuchanunderstanding?

DescribingandSummarizingData
Datacanberepresentedbygraphslikehistograms.Thesevisualdisplaysallowyoutoquicklyrecognizepatternsinthe
distributionofdata.

WorkingwithData
Informationoverload.Inventorycosts.Payroll.Productionvolume.Assetutilization.What'samanagertodo?
Thedataweencountereachdayhavevaluableinformationburiedwithinthem.Asmanagers,correctlyanalyzing
financial,production,ormarketingdatacangreatlyimprovethequalityofthedecisionswemake.
Analyzingdatacanberevealing,butchallenging.Asmanagers,wewanttoextractasmuchoftherelevant
informationandinsightaspossiblefromourdatawehaveavailable.
Whenweacquireasetofdata,weshouldbeginbyaskingsomeimportantquestions:Wheredothedatacomefrom?
Howweretheycollected?Howcanwehelpthedatatelltheirstory?
Supposeafriendclaimstohavemeasuredtheheightsofeveryoneinabuilding.Shereportsthattheaverageheight
wasthreeandahalffeet.Wemightbesurprised...
...untilwelearnthatthebuildingisanelementaryschool.
We'dalsowanttoknowifourfriendusedapropermeasuringstick.Finally,we'dwanttobesureweknewhowshe
measuredheight:withorwithoutshoes.
Beforestartinganytypeofformaldataanalysis,weshouldtrytogetapreliminarysenseofthedata.Forexample,
wemightfirsttrytodetectanypatterns,trends,orrelationshipsthatexistinthedata.
Wemightstartbygroupingthedataintologicalcategories.Groupingdatacanhelpusidentifypatternswithina
singlecategoryoracrossdifferentcategories.Buthowdowedothis?Andisthisoftentimeconsumingprocess
worthit?
Accountantsthinkso.BalanceSheetsandProfitandLossStatementsarrangeinformationtomakeiteasierto
comprehend.
Inaddition,accountantsseparatecostsintocategoriessuchascapitalinvestments,laborcosts,andrent.Wemight
ask:Areoperatingexpensesincreasingordecreasing?Doofficespacecostsvarymuchfromyeartoyear?
Comparingdataacrossdifferentyearsordifferentcategoriescangiveusfurtherinsight.Aresellingcostsgrowing
morerapidlythansales?Whichdivisionhasthehighestinventoryturns?

Histograms
Inadditiontogroupingdata,weoftengraphthemtobettervisualizeanypatternsinthedata.Seeingdata
displayedgraphicallycansignificantlydeepenourunderstandingofadatasetandthesituationitdescribes.
Toseethevalueagraphicalapproachcanadd,let'slookatworldwideconsumptionofoilandgasin2000.What
questionsmightwewanttoanswerwiththeenergydata?Whichcountryisthelargestconsumer?Howmuch
energydomostcountriesuse?
Source
Inordertocreateagraphthatprovidesgoodvisualinsightintothesequestions,wemightsortthecountriesby
theirlevelofenergyconsumption,thengrouptogethercountrieswhoseconsumptionfallsinthesamerange
e.g.,thecountriesthatuse100to199milliontonnesperyear,or200to299milliontonnes.
Source
Wecanfindthenumberofcountriesineachrange,andthencreateabargraphinwhichtheheightofeachbar
representsthenumberofcountriesineachrange.Thisgraphiscalledahistogram.
Ahistogramshowsuswherethedatatendtocluster.Whatarethemostcommonvalues?Theleastcommon?For
example,weseethatmostcountriesconsumelessthan100milliontonnesperyear,andthevastmajorityless
5/135

5/13/2016

QuantitativeMethodsOnlineCourse

than200milliontonnes.Onlythreecountries,Japan,Russia,andtheUS,consumemorethan300milliontonnes
peryear.
Whyaretheresomanycountriesinthefirstrangethelowestconsumption?Whatfactorsmightinfluencethis?
Populationmightbeourfirstguess.
Yetdespitealargepopulation,India'senergyconsumptionissignificantlylessthanthatofGermany,amuch
smallernation.Whymightthisbe?Clearlyotherfactors,likeclimateandtheextentofindustrialization,influence
acountry'senergyusage.

Outliers
Inmanydatasets,thereareoccasionalvaluesthatfallfarfromtherestofthedata.Forexample,ifwegraphthe
agedistributionofstudentsinacollegecourse,wemightseeadatapointat75years.Datapointslikethisonethat
fallfarfromtherestofthedataareknownasoutliers.Howdoweinterpretthem?
First,wemustinvestigatewhyanoutlierexists.Isitjustanunusual,butvalidvalue?Coulditbeadataentry
error?Wasitcollectedinadifferentwaythantherestofthedata?Atadifferenttime?
Wemightdiscoverthatthedatapointreferstoa75yearoldretiree,takingthecourseforfun.
Aftermakinganefforttounderstandwhereanoutliercomesfrom,weshouldhaveadeeperunderstandingofthe
situationthedatarepresent.Then,wecanthinkabouthowtohandletheoutlierinouranalysis.Typically,wedo
oneofthreethings:leavetheoutlieralone,orveryrarelyremoveitorchangeittoacorrectedvalue.
Aseniorcitizeninacollegeclassmaybeanoutlier,buthisagerepresentsalegitimatevalueinthedataset.Ifwe
trulywanttounderstandtheagedistributionofallstudentsintheclass,wewouldleavethepointin.
Or,ifwenowrealizethatwhatwereallywantistheagedistributionofstudentsinthecoursewhoarealso
enrolledinfulltimedegreegrantingprograms,wewouldexcludetheseniorcitizenandallothernondegree
programstudentsenrolledinthecourse.
Occasionally,wemightchangethevalueofanoutlier.Thisshouldbedoneonlyafterexaminingtheunderlying
situationingreatdetail.
Forexample,ifwelookattheinventorygraphbelow,adatapointshowing80pairsofrollerbladesininventory
wouldbehighlyunusual.
Noticethatthedatapoint"80"wasrecordedonApril13th,andthattheinventorywas10pairsonApril12th,and
6onApril14th.
Basedonourmanagementunderstandingofhowinventorylevelsriseandfall,werealizethatthevalueof80is
extraordinarilyunlikely.Weconcludethatthedatapointwaslikelyadataentryerror.Furtherinvestigationof
salesandpurchasingrecordsrevealsthattheactualinventorylevelonthatdaywas8,not80.Havingfounda
reliablevalue,wecorrectthedatapoint.
Excludingorchangingdataisnotsomethingwedooften.Weshouldneverdoittohelpthedata'fit'aconclusion
wewanttodraw.Suchchangestoadatasetshouldbemadeonacasebycasebasisonlyaftercarefulinvestigation
ofthesituation.

Summary
Withanydatasetweencounter,wemustfindwaystoallowthedatatotelltheirstory.Orderingandgraphing
datasetsoftenexposepatternsandtrends,thushelpingustolearnmoreaboutthedataandtheunderlying
situation.Ifdatacanprovideinsightintoasituation,theycanhelpustomaketherightdecisions.

CreatingHistograms
Note:UnlessyouhaveinstalledtheExcelDataAnalysisToolPakaddin,youwillnotbeabletocreatehistograms
usingtheHistogramtool.However,wesuggestyoureadthroughtheinstructionstolearnhowExcelcreates
histogramssoyoucanconstructtheminthefuturewhenyoudohaveaccesstotheDataAnalysisToolpak.
TocheckiftheToolpakisinstalledonyourcomputer,gototheDatatabintheToolbarinExcel2007.If"Data
Analysis"appearsintheRibbon,theToolpakhasalreadybeeninstalled.Ifnot,clicktheOfficeButtoninthetop
leftandselect"ExcelOptions."Choose"AddIns"andhighlightthe"AnalysisToolpak"inthelistandclick"Go."
6/135

5/13/2016

QuantitativeMethodsOnlineCourse

ChecktheboxnexttoAnalysisToolpakandclick"OK."Excelwillthenwalkyouthroughasetupprocesstoinstall
thetoolpak.
CreatingahistogramwithExcelinvolvestwosteps:preparingourdata,andprocessingthemwiththeData
AnalysisHistogramtool.
Topreparethedata,weenterorcopythevaluesintoasinglecolumninanExcelworksheet.
Often,wehavespecificrangesinmindforclassifyingthedata.Wecanentertheseranges,whichExcelcalls
"bins,"intoasecondcolumnofdata.
IntheToolbar,selecttheDatatab,andthenchooseDataAnalysis.
IntheDataAnalysispopupwindow,chooseHistogramandclickOK.
ClickontheInputRangefieldandentertherangeofdatavaluesbyeithertypingtherangeorbydraggingthe
cursorovertherange.
Next,tousethebinswespecified,clickontheBinRangefieldandentertheappropriaterange.Note:ifwedon't
specifyourownbins,Excelwillcreateitsownbins,whichareoftenquitepeculiar.
ClicktheChartOutputcheckboxtoindicatethatwewantahistogramcharttobegeneratedinadditiontothe
summarytable,whichiscreatedbydefault.
ClickNewWorksheetPly,andenterthenameyouwouldliketogivetheoutputsheet.
Finally,clickOK,andthehistogramwiththesummarytablewillbecreatedinanewsheet.

CentralValuesforData
Graphsareveryusefulforgaininginsightintodata.However,sometimeswewouldliketosummarizethedataina
concisewaywithasinglenumber.

TheMean
Often,we'dliketosummarizeasetofdatawithasinglenumber.We'dlikethatsummaryvaluetodescribethe
dataaswellaspossible.Buthowdowedothis?Whichsinglevaluebestrepresentsanentiresetofdata?That
dependsonthedatawe'reinvestigatingandthetypeofquestionswe'dlikethedatatoanswer.
Whatnumberwouldbestdescribeemployeesatisfactiondatacollectedfromannualreviewquestionnaires?The
numericalaveragewouldprobablyworkquitewellasasinglevaluerepresentingemployees'experiences.
Tocalculateaverageormeanemployeesatisfaction,wetakeallthescores,sumthemup,anddividetheresult
by11,thenumberofsurveys.TheGreeklettermurepresentsthemeanofthedataset.
Themeanisbyfarthemostcommonmeasureusedtodescribethe"center"or"centraltendency"ofadataset.
However,itisn'talwaysthebestvaluetorepresentdata.Outlierscanexerciseundueinfluenceandpullthemean
valuetowardsoneextreme.
Inaddition,ifthedistributionhasatailthatextendsouttoonesideaskeweddistributionthevaluesonthat
sidewillpullthemeantowardsthem.Here,thedistributionisstronglyskewedtotheright:thehighvalueofUS
consumptionpullsthemeantoavaluehigherthantheconsumptionofmostothercountries.Whatothernumbers
canweusetofindthecentraltendencyofthedata?

TheMedian
Let'slookattherevenuesofthetop100companiesintheUS.Themeanrevenueofthesecompaniesisabout$42
billion.Howshouldweinterpretthisnumber?Howwelldoesthisaveragerepresenttherevenuesofthese
companies?
Whenweexaminetherevenuedistributiongraphically,weseethatmostcompaniesbringinlessthan$42billion
ofrevenueayear.Ifthisistrue,whyisthemeansohigh?
Source
Asourintuitionmighttellus,thetopcompanieshaverevenuesthataremuchhigherthan$42billion.These
7/135

5/13/2016

QuantitativeMethodsOnlineCourse

higherrevenuespulluptheaverageconsiderably.
Source
Incaseslikeincome,wherethedataaretypicallyveryskewed,themeanoftenisn'tthebestvaluetorepresentthe
data.Inthesecases,wecanuseanothercentralvaluecalledthemedian.
Source
Themedianisthemiddlevalueofadatasetwhosevaluesarearrangedinnumericalorder.Halfthevaluesare
higherthanthemedian,andhalfarelower.
Source
Forincome,themedianrevenuesofthetop100UScompaniesis$30billionsignificantlylessthan$42billion.
Halfofallthecompaniesearnlessthan$30billion,andhalfearnmorethan$30billion.
Source
Medianrevenueisamoreinformativerevenueestimatebecauseitisnotpulledupwardsbyasmallnumberof
highrevenueearners.Howcanwefindthemedian?
Source
Withanoddnumberofdatapoints,listedinorder,themedianissimplythemiddlevalue.Forexample,
considerthissetof7datapoints.Themedianisthe4thdatapoint,$32.51.
Inadatasetwithanevennumberofpoints,weaveragethetwomiddlevalueshere,thefourthandfifthvalues
andobtainamedianof$41.92.
Whendecidingwhethertouseameanormediantorepresentthecentraltendencyofourdata,weshouldweigh
theprosandconsofeach.Themeanweighsthevalueofeverydatapoint,butissometimesbiasedbyoutliersor
byahighlyskeweddistribution.
Bycontrast,themedianisnotbiasedbyoutliersandisoftenabettervaluetorepresentskeweddata.

TheMode
Athirdstatistictorepresentthe"center"ofadatasetisitsmode:thedataset'smostfrequentlyoccurringvalue.
Wemightusethemodetorepresentdatawhenknowingtheaveragevalueisn'tasimportantasknowingthemost
commonvalue.
Insomecases,datamayclusteraroundtwoormorepointsthatoccurespeciallyfrequently,givingthehistogram
morethanonepeak.Adistributionthathastwopeaksiscalledabimodaldistribution.

Summary
Tosummarizeadatasetusingasinglevalue,wecanchooseoneofthreevalues:themean,themedian,orthe
mode.Theyareoftencalledsummarystatisticsordescriptivestatistics.Allthreegiveasenseofthe"center"
or"centraltendency"ofthedataset,butweneedtounderstandhowtheydifferbeforeusingthem:

FindingTheMeanInExcel
TofindthemeanofadatasetenteredinExcel,weusetheAVERAGEfunction.
WecanfindthemeanofnumericalvaluesbyenteringthevaluesintheAVERAGEfunction,separatedby
commas.
Inmostcases,it'seasiertocalculateameanforadatasetbyindicatingtherangeofcellreferenceswherethedata
arelocated.
Excelignoresblankvaluesincells,butnotzeros.Therefore,wemustbecarefulnottoputazerointhedatasetifit
doesnotrepresentanactualdatapoint.

FindingTheMedianInExcel
8/135

5/13/2016

QuantitativeMethodsOnlineCourse

Excelcanfindthemedian,evenifadatasetisunordered,usingtheMEDIANfunction.
Theeasiestwaytocalculateadataset'smedianistoselectarangeofcellreferences.

FindingTheModeInExcel
Excelcanalsofindthemostcommonvalueofadataset,themode,usingtheMODEfunction.
Ifmorethanonemodeexistsinadataset,Excelwillfindtheonethatoccursfirstinthedata.
Mean,median,andmodearefairlyintuitiveconcepts.Already,Leo'smountainofdataseemslessintimidating.

Variability
Themean,medianandmodegiveyouasenseofthecenterofthedata,butnoneoftheseindicatehowfarthedataare
spreadaroundthecenter."Twosetsofdatacouldhavethesamemeanandmedian,andyetbedistributedcompletely
differentlyaroundthecentervalue,"Alicetellsyou."Weneedawaytomeasurevariationinthedata."

TheStandardDeviation
It'softencriticaltohaveasenseofhowmuchdatavary.Dothedataclusterclosetothecenter,orarethevalues
widelydispersed?
Let'slookatanexample.Toidentifygoodtargetmarkets,acardealershipmightlookatseveralcommunitiesand
findtheaverageincomeofeach.TwocommunitiesSilverhavenandBrightonhaveaveragehouseholdincomes
of$95,500and$97,800.Ifthedealerwantstotargethouseholdswithincomesabove$90,000,heshouldfocuson
Brighton,right?
Weneedtobemorecareful:themeanincomedoesn'ttellthewholestory.Aremostoftheincomesnearthemean,
oristhereawiderangearoundtheaverageincome?Amarketmightbelessattractiveiffewerhouseholdshavean
incomeabovethedealer'stargetlevel.Basedonaverageincomealone,Brightonmightlookmoreattractive,butlet's
takeacloserlookatthedata.
Despitehavingaloweraverageincome,incomesinSilverhavenhavelessvariability,andmorehouseholdsarein
thedealer'stargetincomerange.Withoutunderstandingthevariabilityinthedata,thedealermighthavechosen
Brighton,whichhasfewertargetedhomes.
Clearlyitwouldbehelpfultohaveasimplewaytocommunicatethelevelofvariabilityinthehouseholdincomesin
twocommunities.
Justaswehavesummarystatisticslikethemean,median,andmodetogiveusasenseofthe'centraltendency'ofa
dataset,weneedasummarystatisticthatcapturesthelevelofdispersioninasetofdata.
Thestandarddeviationisacommonmeasurefordescribinghowmuchvariabilitythereisinasetofdata.We
representthestandarddeviationwiththeGreeklettersigma:
Thestandarddeviationemergesfromaformulathatlooksabitcomplicatedinitially,solet'strytounderstanditata
conceptuallevelfirst.Thenwe'llbuildupstepbysteptohelpunderstandwheretheformulacomesfrom.
Thestandarddeviationtellsushowfarthedataarespreadout.Alargestandarddeviationindicatesthatthedataare
widelydispersed.Asmallerstandarddeviationtellsusthatthedatapointsaremoretightlyclusteredtogether.

Calculating
Ahotelmanagerhastostaffthefrontreceptiondeskinherlobby.Sheinitiallyfocusesonastaffingplanfor
Saturdays,typicallyaheavytrafficday.Inthehospitalityindustry,likemanyserviceindustries,properstaffing
canmakethedifferencebetweenunhappyguestsandsatisfiedcustomerswhowanttoreturn.
Ontheotherhand,overstaffingisacostlymistake.Knowingtheaveragenumberofcustomerrequestsforservices
duringashiftgivesthemanageraninitialsenseofherstaffingneedsknowingthestandarddeviationgivesher
invaluableadditionalinformationabouthowthoserequestsmightvaryacrossdifferentdays.
Theaveragenumberofcustomerrequestsis172,butthisdoesn'ttellusthereare172requestseverySaturday.To
staffproperly,thehotelmanagerneedsasenseofwhetherthenumberofrequestswilltypicallybebetween150
9/135

5/13/2016

QuantitativeMethodsOnlineCourse

and195,forexample,orbetween120and220.
Tocalculatethestandarddeviationfordatainthiscasethehoteltrafficweperformtwosteps.Thefirstisto
calculateasummarystatisticcalledthevariance.
EachSaturday'snumberofrequestsliesacertaindistancefrom172,themeannumberofrequests.Tofindthe
variance,wefirstsumthesquaresofthesedifferences.Whysquarethedifferences?
Ahotelmanagerwouldwantinformationaboutthemagnitudeofeachdifference,whichcanbepositive,negative,
orzero.IfwesimplysummedthedifferencesbetweeneachSaturday'srequestsandthemean,positiveand
negativedifferenceswouldcanceleachotherout.
Butweareinterestedinthemagnitudeofthedifferences,regardlessoftheirsign.Bysquaringthedifferences,we
getonlypositivenumbersthatdonotcanceleachotheroutinasum.
Theformulaforvarianceaddsupthesquareddifferencesanddividesbyn1togetatypeof"average"squared
differenceasameasureofvariability.(Thereasonwedividebyn1togetanaveragehereisatechnicalitybeyond
thescopeofthiscourse.)Thevarianceinthehotel'sfrontdeskrequestsis637.2.Canweusethisnumberto
expressthevariabilityofthedata?
Sure,butvariancesdon'tcomeoutinthemostconvenientform.Becausewesquarethedifferences,weendup
withavaluein'squared'requests.Whatisarequestsquared?Oradollarsquared,ifweweresolvingaproblem
involvingmoney?
Wewouldlikeawaytoexpressvariabilitythatisinthesameunitsastheoriginaldatafrontdeskrequests,for
example.Thestandarddeviationthefirstformulawesawaccomplishesthis.
Thestandarddeviationissimplythesquarerootofthevariance.Itreturnsourmeasuretoouroriginalunits.The
standarddeviationforthehotel'sSaturdaydesktrafficis25.2requests.

Interpreting
Whatdoesastandarddeviationof25.2requeststellus?Supposethestandarddeviationhadbeen50requests.
Withalargerstandarddeviation,thedatawouldbespreadfartherfromthemean.Ahigherstandarddeviation
wouldtranslateintomoredifficultstaffing:whenrequesttrafficisunusuallyhigh,disgruntledcustomerswaitin
longlineswhentrafficisverylow,deskstaffareidle.
Foradataset,asmallerstandarddeviationindicatesthatmoredatapointsarenearthemean,andthatthemean
ismorerepresentativeofthedata.Thelowerthestandarddeviation,themorestablethetraffic,therebyreducing
bothcustomerdissatisfactionandstaffidletime.
Fortunately,wealmostneverhavetocalculateastandarddeviationbyhand.SpreadsheettoolslikeExcelmakeit
easyforustocalculatevarianceandstandarddeviation.

Summary
Thestandarddeviationmeasureshowmuchdatavaryabouttheirmeanvalue.

FindinginExcel
Excel'sSTDEVfunctioncalculatesthestandarddeviation.
Tofindthestandarddeviation,wecanenterdatavaluesintotheSTDEVformula,onebyone,separatedby
commas.
Inmostcases,however,it'smucheasiertoselectarangeofcellreferencestocalculateastandarddeviation.
Tocalculatevariance,wecanuseExcel'sVARfunctioninthesameway.

TheCoefficientofVariation
Thestandarddeviationmeasureshowmuchadatasetvariesfromitsmean.Butthestandarddeviationonlytells
yousomuch.Howcanyoucomparethevariabilityindifferentdatasets?
10/135

5/13/2016

QuantitativeMethodsOnlineCourse

Astandarddeviationdescribeshowmuchthedatainasingledatasetvary.Howcanwecomparethevariabilityof
twodatasets?Dowejustcomparetheirstandarddeviations?Ifonestandarddeviationislarger,canwesaythat
datasetis"morevariable"?
Standarddeviationsmustbeconsideredwithinthedata'scontext.Thestandarddeviationsfortwostockindices
belowTheStreet.Com(TSC)InternetIndexandthePacificExchangeTechnology(PET)Indexwereroughly
equivalentoveraperiod.Butwerethetwoindicesequallyvariable?
Source
Iftheaveragepriceofanindexis$200,a$20standarddeviationisrelativelyhigh(10%oftheaverage)ifthe
averageis$700,$20isrelativelylow(notquite3%oftheaverage).Togaugevolatility,we'dcertainlywanttoknow
thatPET'saverageindexpricewasoverthreeandhalftimeshigherthanTSC'saverageindexprice.
Source
Togetasenseoftherelativemagnitudeofthevariationinadataset,wewanttocomparethestandarddeviationof
thedatatothedata'smean.
Source
Wecantranslatethisconceptofrelativevolatilityintoastandardizedmeasurecalledthecoefficientofvariation,
whichissimplytheratioofthestandarddeviationtothemean.Itcanbeinterpretedasthestandarddeviation
expressedasapercentofthemean.
Togetafeelingforthecoefficientofvariation,let'scompareafewdatasets.Whichsethasthehighestrelative
variation?Clicktheansweryouselect.
Becausethecoefficientofvariationhasnounits,wecanuseittocomparedifferentkindsofdatasetsandfindout
whichdatasetismostvariableinthisrelativesense.
Thecoefficientofvariationdescribesthestandarddeviationasafractionofthemean,givingyouastandard
measureofvariability.

Summary
Thecoefficientofvariationexpressesthestandarddeviationasafractionofthemean.Wecanuseittocompare
variationindifferentdatasetsofdifferentscalesorunits.

ApplyingDataAnalysis
Afteragoodnight'ssleep,youmeetAliceforBreakfast.
"It'stimetogetstartedonLeo'sassignments.Couldyougetthosepricequotesfromdivingschoolsandpreparea
presentationforLeo?We'llwanttopresentourfindingsasneatlyandconciselyaspossible.Usegraphsandsummary
statisticswhereverappropriate.Meanwhile,I'llstartworkingonLeo'shoteloccupancyproblem."

PricingtheScubaSchools
InadditiontotheschoolLeoiscurrentlyusing,youfind20otherscubaservicesinthephonebook.Youcallthose20
andgetpricequotesonhowmuchtheywouldchargetheKahanaperguestforaScubaCertificationCourse.
Prices
Youcreateahistogramoftheprices.Usethebinrangesprovidedinthedataspreadsheet,orexperimentwithyour
ownbins.IfyoudonothavetheExcelAnalysisToolpakinstalled,clickontheBriefcaselinklabeled"Histogram"to
seethefinishedhistogram.
Prices
Histogram
Thisdistributionisskewedtotheright,sinceatailofhigherpricesextendstotherightsideofthehistogram.The
shapeofthedistributionsuggeststhat:
Prices
11/135

5/13/2016

QuantitativeMethodsOnlineCourse

Histogram
Youcalculatethekeysummarystatistics.Thecorrectvaluesare(Mean,Median,StandardDeviation):
Prices
Histogram
Yourreportlooksgood.Thisgraphicisveryhelpful.Atthemoment,I'mpaying$330perguest,whichisabout
averagefortheisland.Clearly,Icouldgetacheaperdealonly6schoolswouldchargeahigherrate.Ontheother
hand,maybethesemoreexpensiveschoolsofferabetterdivingexperience?Iwonderhowsatisfiedmyguestshave
beenwiththecourseofferedbymycurrentcontractor...

Exercise1:VALinuxStockBonanza
Afteracompanycompletesitsinitialpublicoffering,howistheownershipofcommonstockdistributedbetween
individualsinthefirm,oftentermed"namedinsiders"?
Let'sexamineacompany,VALinux,thatchoosetosellitsstockinanInitialPublicOffering(IPO)duringtheIPO
crazeinthelate1990s.
Accordingtoitsprospectus,aftertheIPO,VALinuxwouldhavethefollowingdistributionofoutstandingsharesof
commonstockownedbyinsiders:
Source
FromtheVALinuxcommonstockdata,whatcouldwelearnbycreatingahistogram?(Choosethebestanswer)

Exercise2:EmployeeTurnover
Hereisahistogramgraphingannualturnoverratesataconsultingfirm.
Whichsummarystatisticbetterdescribesthesedata?

Exercise3:HonidewInternship
TheJ.B.HonidewCorporationoffersaprestigioussummerinternshiptofirstyearstudentsatalocalbusiness
school.ThehumanresourcesdepartmentofHonidewwantstopublishabrochuretoadvertisetheposition.
Toattractasuitablepoolofapplicants,thebrochureshouldgiveanindicationofHonidew'shighacademic
expectations.ThehumanresourcesmanagercalculatesthemeanGPAoftheprevious8interns,toincludeinthe
brochure.
ThemeanGPAoftheformerinternsis:
Interns'GPA's
In1997,J.B.Honidew'sgrandson'sgirlfriendwasawardedtheinternship,eventhoughherGPAwasonly3.35.In
thepresenceofoutliersorastronglyskeweddataset,themedianisoftenabettermeasureofthe'center'.What's
themedianGPAinthisdataset?
Interns'GPA's

Exercise4:ScubaRegulations
Safetyequipmenttypicallyneedstofallwithinveryprecisespecifications.Suchspecificationsapply,forexample,
toscubaequipmentusingadevicecalleda"rebreather"torecycleoxygenfromexhaledair.
Recycledairmustbeenrichedwiththerightamountofoxygenfromthetankbeforedeliverytothediver.With
toolittleoxygen,thedivercanbecomedisorientedtoomuch,andthedivercanexperienceoxygenpoisoning.
Minimizingthedeviationofoxygenconcentrationlevelsfromthespecifiedlevelisclearlyamatteroflifeand
death!
Ascubaequipmenttestinglabcomparedtheoxygenconcentrationsoftwodifferentbrandsofrebreathers,Aand
B.Examinethedata.Withoutdoinganycalculations,forwhichofthetworebreathersdoestheoxygen
concentrationappeartohavealowerstandarddeviation?
12/135

5/13/2016

QuantitativeMethodsOnlineCourse

NoticethatdatasetA'sextremevaluesareclosertothecenter,withmoredatapointsclosertothecenteroftheset.
Evenwithoutcalculations,wehaveagoodknackforseeingwhichsetismorevariable.
WecanbackupourobservationsbyusingthestandarddeviationformulaortheSTDEVfunctioninExcel,wecan
calculatethatthestandarddeviationofAis0.58%,whereasthatofBis1.05%.

Exercise5:FluctuationsinEnergyPrices
Afterdecadesofgovernmentcontrol,statesacrosstheUSarederegulatingenergymarkets.Inaderegulated
market,electricitypricestendtospikeintimesofhighdemand.
Thisvolatilityisaconcern.Aprimarybenefittoconsumersinaregulatedmarketisthatpricesarefairlystable.To
provideabaselinemeasureforthevolatilityofpricespriortoderegulation,wewanttocomputethestandard
deviationofpricesduringthe1990s,whenelectricitypriceswerelargelyregulated.
From1990to2000,theaveragenationalpriceinJulyof500kWofelectricityrangedbetween$45.02and$50.55.
Whatisthestandarddeviationoftheseelevenprices?
ElectricityPrices
Source
Excelmakesthejobmucheasier,becauseallthat'srequiredisenteringthedataintocellsandinputtingtherange
ofcellsintothe=STDEV()function.Theresultis$2.02.
Ontheotherhand,tocalculatethestandarddeviationbyhand,usetheformula:
First,calculatethemean,$48.40.Then,findthedifferencebetweeneachdatapointandthemean.Calculatethe
sumofthesesquareddifferences,40.79.Dividebythenumberofpointsminusone(111=10inthiscase)to
obtain4.08.Takingthesquarerootof4.08givesusthestandarddeviation,$2.02.

Exercise6:BigMartPersonalCareProducts
Supposeyouareapurchasingagentforawholesaleretailer,BigMart.BigMartoffersseveralgenericversionsof
householditems,likedeodorant,toconsumersataconsiderablediscount.
Every18months,BigMartrequestsbidsfrompersonalcarecompaniestoproducethesegenericproducts.
Aftersimplychoosingthelowestindividualbidderforyears,BigMarthasdecidedtointroduceavendor"score
card"thatmeasuresmultipleaspectsofeachvendor'sperformance.Oneofthecriteriaonthescorecardisthe
levelofyeartoyearfluctuationinthevendor'spricing.
Comparethevariabilityofpricesfromeachsupplier.Whichcompany'spricesvarytheleastfromyeartoyearin
relationtotheiraverageprice,asmeasuredbythecoefficientofvariation?

Summary
Pleasedwithyourwork,Alicedecidestoteachyoumoredatadescriptiontechniques,soyoucantakeoveragreater
shareoftheproject.

RelationshipsBetweenVariables
Sofar,youlearnedhowtoworkwithasinglevariable,butmanymanagerialproblemsinvolveseveralfactorsthatneed
tobeconsideredsimultaneously.

TwoVariables
Weusehistogramstohelpusanswerquestionsaboutonevariable.Howdowestarttoinvestigatepatternsand
trendswithtwovariables?
Let'slookattwodatasets:heightsandweightsofathletes.Whatcanwesayaboutthetwodatasets?Istherea
relationshipbetweenthetwo?
Ourintuitiontellsusthatheightandweightshouldberelated.Howcanweusethedatatoinformthatintuition?
Howcanweletthedatatelltheirstoryaboutthestrengthandnatureofthatrelationship?
13/135

5/13/2016

QuantitativeMethodsOnlineCourse

Asalways,oneofourfirststepsistotrytovisualizethedata.
Becauseweknowthateachheightandweightbelongtoaspecificathlete,wefirstpairthetwovariables,withone
heightweightpairforeachathlete.
Plottingthesedatapairsonaxesofheightandweightonedatapointforeachathleteinourdatasetwecansee
arelationshipbetweenheightandweight.Thistypeofgraphiscalleda"scatterdiagram."
Scatterdiagramsprovideavisualsummaryoftherelationshipbetweentwovariables.Theyareextremelyhelpful
inrecognizingpatternsinarelationship.Themoredatapointswehave,themoreapparenttherelationship
becomes.
Inourscatterdiagram,there'sacleargeneraltrend:tallerathletestendtobeheavier.
Weneedtobecarefulnottodrawconclusionsaboutcausalitywhenweseethesetypesofrelationships.
Growingtallermightmakeusabitheavier,butheightcertainlydoesn'ttellthewholestoryaboutourweights.
Assumingcausalityintheotherdirectionwouldbejustplainwrong.Althoughwemaywishotherwise,growing
heaviercertainlydoesn'tmakeustaller!
Thedirectionandextentofcausalitymightbeeasytounderstandwiththeheightandweightexample,butin
businesssituations,theseissuescanbequitesubtle.
Managerswhousedatatomakedecisionswithoutfirmunderstandingoftheunderlyingsituationoftenmake
blundersthatinhindsightcanappearasludicrousasassumingthatgainingweightcanmakeustaller.
Whydon'twetrygraphinganotherpairofdatasetstoseeifwecanidentifyarelationship?Onascatterdiagram,
weplotforeachdaythenumberofmassagespurchasedatasparesortversusthetotalnumberofguestsvisitingthe
resort.
Wecanseearelationshipbetweenthenumberofguestsandthenumberofmassages.Themoregueststhatstayat
theresort,themoremassagespurchasedtoapoint,wheremassagesleveloff.
Whydoesthenumberofmassagesreachaplateau?Weshouldinvestigatefurther.Perhapstherearelimited
numbersofmassageroomsatthespa.Scatterplotscangiveusinsightsthatpromptustoaskgoodquestions,those
thatdeepenourunderstandingoftheunderlyingcontextfromwhichthedataaredrawn.

VariableandTime
Sometimes,wearenotasinterestedintherelationshipbetweentwovariablesasweareinthebehaviorofa
singlevariableovertime.Insuchcases,wecanconsidertimeasoursecondvariable.
Supposeweareplanningthepurchaseofalargeamountofhighspeedcomputermemoryfromanelectronics
distributor.Experiencetellsusthesecomponentshavehighpricevolatility.Shouldwemakethepurchasenow?
Orwait?
Assumingwehavepricedatacollectedovertime,wecanplotascatterdiagramformemoryprice,inthesameway
weplottedheightandweight.Becausetimeisoneofthevariables,wecallthisgraphatimeseries.
Timeseriesareextremelyusefulbecausetheyputdatapointsintemporalorderandshowhowdatachangeover
time.Havepricesbeensteadilydecliningorrising?Orhavepricesbeenerraticovertime?Arethereseasonal
patterns,withpricesinsomemonthsconsistentlyhigherthaninothers?
Timeserieswillhelpusrecognizeseasonalpatternsandyearlytrends.Butwemustbecareful:weshouldn'trely
onlyonvisualanalysiswhenlookingforrelationshipsandpatterns.

FalseRelationships
Ourintuitiontellsusthatpairsofvariableswithastrongrelationshiponascatterplotmustberelatedtoeach
other.Butwemustbecareful:humanintuitionisn'tfoolproofandoftenweinferrelationshipswherethereare
none.Wemustbecarefultoavoidsomeofthesecommonpitfalls.
Let'slookatanexample.ForUSpresidentsofthelast150years,thereseemstobeaconnectionbetweenbeing
electedinayearthatisamultipleof20(1900,1920,1940,etc.)anddyinginoffice.AbrahamLincoln(electedin
1860)wasthefirstvictimofthisunfortunaterelationship.
14/135

5/13/2016

QuantitativeMethodsOnlineCourse

Source
JamesGarfield(elected1880)survivedhispresidency(butwasassasinatedtheyearafterheleftoffice),and
WilliamMcKinley(1900),WarrenHarding(1920),FranklinRoosevelt(1940),andJohnF.Kennedy(1960)all
diedinoffice.
Source
RonaldReagan(elected1980)onlynarrowlysurvivedanassassinationattempt.Whatdothedatasuggestabout
thepresidentelectedin2020?
Probablynothing.Unlesswehaveareasonabletheoryabouttheconnectionbetweenthetwovariables,the
relationshipisnomorethananinterestingcoincidence.

HiddenVariables
Evenwhentwodatasetsseemtobedirectlyrelated,wemayneedtoinvestigatefurthertounderstandthereason
fortherelationship.
Wemayfindthatthereasonisnotduetoanyfundamentalconnectionbetweenthetwovariablesthemselves,but
thattheyareinsteadmutuallyrelatedtoanotherunderlyingfactor.
Supposewe'reexaminingsalesoficehockeypucksandbaseballsatasportinggoodsstore.
Thesalesofthetwoproductsformarelationshiponascatterplot:whenpucksalesslump,baseballsalesjump.But
arethetwodatasetsactuallyrelated?Ifso,why?
Athird,hiddenfactorprobablydrivesbothdatasets:theseason.Inwinter,peopleplayicehockey.Inspringand
summer,peopleplaybaseball.
Ifwehadsimplyplottedpuckandbaseballsaleswithoutthinkingfurther,wemightnothaveconsideredthetime
ofyearatall.Wecouldhaveneglectedacriticalvariabledrivingthesalesofbothproducts.
Inmanybusinesscontexts,hiddenvariablescancomplicatetheinvestigationofarelationshipbetweenalmostany
twovariables.
Afinalpoint:Keepinmindthatscatterplotsdon'tproveanythingaboutcausality.Theyneverprovethatone
variablecausestheother,butsimplyillustratehowthedatabehave.

Summary
Plottingtwovariableshelpsusseerelationshipsbetweentwodatasets.Butevenwhenrelationshipsexist,westill
needtobeskeptical:istherelationshipplausible?Anapparentrelationshipbetweentwovariablesmaysimplybe
coincidental,ormaystemfromarelationshipeachvariablehaswithathird,oftenhiddenvariable.

CreatingScatterDiagrams
TocreateascatterdiagraminExcelwithtwodatasets,weneedtofirstpreparethedata,andthenuseExcel's
builtincharttoolstoplotthedata.
Toprepareourdata,weneedtobesurethateachdatapointinthefirstsetisalignedwithitscorrespondingvalue
intheotherset.Thesetsdon'tneedtobecontiguous,butit'seasierifthedataarealignedsidebysideintwo
columns.
Ifthedatasetsarenexttoeachother,simplyselectbothsets.
Next,fromtheInserttabinthetoolbar,selectScatterintheChartsbinfromtheRibbon,andchoosethefirst
type:ScatterwithOnlyMarkers.
Excelwillinsertanonspecificscatterplotintotheworksheet,withthefirstcolumnofdatarepresentedontheX
axisandthesecondcolumnofdataontheYaxis.
WecanincludeacharttitleandlabeltheaxesbyselectingQuickLayoutfromtheRibbonandchoosingLayout
1.
Thenwecanaddthecharttitleandlabeltheaxesbyselectingandeditingthetext.
15/135

5/13/2016

QuantitativeMethodsOnlineCourse

Finally,ourscatterdiagramiscomplete.YoucanexploremoreofExcel'snewChartToolstoeditanddesign
elementsofyourchart.

Correlation
Byplottingtwovariablesonascatterplot,wecanexaminetheirrelationship.Butcanwemeasurethestrengthof
thatrelationship?Canwedescribetherelationshipinastandardizedway?
Humanshaveanuncannyabilitytodiscernpatternsinvisualdisplaysofdata.We"know"whentherelationship
betweentwovariableslooksstrong...
...orweak...
...linear...
...ornonlinear...
...positive(whenonevariableincreases,theothertendstoincrease)...
...ornegative(whenonevariableincreases,theothertendstodecrease).
Supposewearetryingtodiscernifthereisalinearrelationshipbetweentwovariables.Intuitively,wenoticewhen
datapointsareclosetoanimaginarylinerunningthroughascatterplot.
Logically,thecloserthedatapointsaretothatline,themoreconfidentlywecansaythereisalinearrelationship
betweenthetwovariables.
However,itisusefultohaveasimplemeasuretoquantifyandcommunicatetootherswhatwesoreadilyperceive
visually.Thecorrelationcoefficientissuchameasure:itquantifiestheextenttowhichthereisalinearrelationship
betweentwovariables.
Todescribethestrengthofalinearrelationship,thecorrelationcoefficienttakesonvaluesbetween1and+1.Here's
astrongpositivecorrelation(about0.85)...
...andhere'sastrongnegativecorrelation(about0.90).
Ifeverypointfallsexactlyonalinewithanegativeslope,thecorrelationcoefficientisexactly1.
Attheextremesofthecorrelationcoefficient,weseerelationshipsthatareperfectlylinear,butwhathappensinthe
middle?
Evenwhenthecorrelationcoefficientis0,arelationshipmightexistjustnotalinearrelationship.Aswe'veseen,
scatterplotscanrevealpatternsandhelpusbetterunderstandthebusinesscontextthedatadescribe.
Toreinforceourunderstandingofhowourintuitionaboutthestrengthofalinearrelationshipbetweenvariables
translatesintoacorrelationcoefficient,let'srevisittheexamplesweanalyzedvisuallyearlier.

InfluenceofOutliers
Insomecases,thecorrelationcoefficientmaynottellthewholestory.Managerswanttounderstandthe
attendancepatternsoftheiremployees.Forexample,doworkers'absenceratesvarybytimeofyear?
Supposeamanagersuspectsthathisemployeesskipworktoenjoythegoodlifemoreoftenasthetemperature
rises.Afterpairingabsenceswithdailytemperaturedata,hefindsthecorrelationcoefficienttobe0.466.
Whilenotastronglinearrelationship,acoefficientof0.466doesindicateapositiverelationshipsuggestingthat
theweathermightindeedbetheculprit.
Butlookatthedatabesidesafewoutliers,thereisn'taclearrelationship.Seeingthescatterplot,themanager
mightrealizethatthethreeoutlierscorrespondtoalatesummer,threedaytransportationstrikethatkeptsome
workershomeboundthepreviousyear.
Withoutlookingatthedata,thecorrelationcoefficientcanleadusdownfalsepaths.Ifweexcludetheoutliers,the
relationshipdisappears,andthecorrelationessentiallydropstozero,quietinganysuspicionofweather.Whydo
theoutliersinfluenceourmeasureoflinearitysomuch?
Asasummarystatisticforthedata,thecorrelationcoefficientiscalculatednumerically,incorporatingthevalueof
16/135

5/13/2016

QuantitativeMethodsOnlineCourse

everydatapoint.Justasitdoeswiththemean,thisinclusivenesscangetusintotrouble...
Becausemeasureslikecorrelationgivemoreweighttopointsdistantfromthecenterofthedata,outlierscan
stronglyinfluencethecorrelationcoefficientoftheentireset.Inthesesituations,ourintuitionandthemeasure
weusetoquantifyourintuitioncanbequitedifferent.Weshouldalwaysattempttoreconcilethosedifferencesby
returningtothedata.

Summary
Thecorrelationcoefficientcharacterizesthestrengthanddirectionofalinearrelationshipbetweentwodatasets.
Thevalueofthecorrelationcoefficientrangesbetween1and+1.

FindinginExcel
Excel'sCORRELfunctioncalculatesthecorrelationcoefficientfortwovariables.Let'sreturntoourdataon
athletes'heightandweight.
Enterthedatasetintothespreadsheetastwopairedcolumns.Wemustmakesurethateachdatapointinthefirst
setisalignedwithitscorrespondingvalueintheotherset.
Tocomputethecorrelation,simplyenterthetwovariables'ranges,separatedbyacomma,intotheCORREL
functionasshownbelow.
Theorderinwhichthetwodatasetsareselecteddoesnotmatter,aslongasthedata"pairs"aremaintained.With
heightandweight,bothvaluescertainlyneedtorefertothesameperson!

OccupancyandArrivals
Aliceiseagertomoveforward:"Withyournewunderstandingofscatterdiagramsandcorrelation,you'llbeableto
helpmewithLeo'shoteloccupancyproblem."
Inthehotelindustry,oneofthemostimportantmanagementperformancemeasuresisroomoccupancyrate,the
percentageofavailableroomsoccupiedbyguests.
Alicesuggeststhatthemonthlyoccupancyratemightberelatedtothenumberofvisitorsarrivingontheislandeach
month.OnageographicallyisolatedlocationlikeHawaii,visitorsalmostallarrivebyairplaneorcruiseship,sostate
agenciescangatherveryprecisedataonarrivals.
Aliceasksyoutoinvestigatetherelationshipbetweenroomoccupancyratesandtheinfluxofvisitors,asmeasured
bytheaveragenumberofvisitorsarrivingtoKauaiperdayinagivenmonth.Shewantsagraphicaloverviewofthis
relationship,andameasureofitsstrength.
Leo'sfoldersincludedataonthenumberofarrivalsonKauai,andonaveragehoteloccupancyratesinKauai,as
trackedbytheHawaiiDepartmentofBusiness,EconomicDevelopment,andTourism.
KauaiData
Source
Thebestwaytographicallyrepresenttherelationshipbetweenarrivalsandoccupancyis:
KauaiData
Source
YougeneratethescatterdiagramusingthedatafileandExcel'sChartWizard.Therelationshipcanbecharacterized
as:
KauaiData
Source
Youcalculatethecorrelationcoefficient.Enterthecorrelationcoefficientindecimalnotationwith2digitstothe
rightofthedecimal,(e.g.,enter"5"as"5.00").Roundifnecessary.
KauaiData
Source
17/135

5/13/2016

QuantitativeMethodsOnlineCourse

Tofindthecorrelationcoefficient,opentheKahanaDatafile.Inanyemptycell,type=CORREL(B2:B37,C2:C37).
Whenyouhitenter,thecorrectanswer,0.71,willappear.
KauaiData
TogetherwithAlice,youcompileyourfindingsandpresentthemtoLeo.
Source
Isee.TherelationshipbetweenthenumberofpeoplearrivingonKauaiandtheisland'shoteloccupancyratefollows
ageneraltrend,butnotaprecisepattern.Lookatthis:intwomonthswithnearlythesameaveragenumberofdaily
arrivals,theoccupancyrateswereverydifferent68%inonemonthand82%intheother.
Butwhyshouldtheybesodifferent?Whenpeoplearriveontheisland,theyhavetosleepsomewhere.Domore
camperscometoKauaiinonemonth,andmorehotelpatronsintheother?
Well,thatmightbeoneexplanation.Therecouldbedifferencesinthetypeoftouristsarriving.Thevacation
preferencesofthearrivalswouldbewhatwecallahiddenvariable.
Anotherhiddenvariablemightbetheaveragelengthofstay.Ifthelengthofstayvariesmonthtomonth,thensowill
hoteloccupancy.When50arrivalscheckintoahotel,theoccupancyratewillbehigheriftheyspend10dayseachat
thehotelthaniftheyspendonly3days.
I'mfollowingyou,butI'mbeginningtoseethattheoccupancyissueismorecomplexthanIexpected.Let'sgetback
toitatalatertime.Thescubaschoolcontractismorepressingatthemoment.

Exercise1:TheEffectivenessofSearchEngines
Asonlineretailingexpands,manycompaniesareinterestedinknowinghoweffectivesearchenginesarein
helpingconsumersfindgoodsonline.
Computerscientistsstudytheeffectivenessofsuchsearchenginesandcomparehowmanyresultssearchengines
recallandtheprecisionwithwhichtheyrecallthem."Precision"isanotherwayofsayingthatthesearchfoundits
target,forexampleapagecontainingboththephrases"winterparka"and"EddieBauer."
WhatcouldyousayabouttherelationshipbetweenthePrecisionandthenumberofResultsRecalled?
Source

Exercise2:EducationandIncome
Isaneducationagoodinvestmentinyourfuture?Someverysuccessfulbusinessexecutivesarecollegedropouts,
butistherearelationshipinthegeneralpopulationbetweenincomeandeducationlevel?
Considerthefollowingscatterplot,whichliststheincomeandyearsofformaleducationfor18people.Isthe
correlation:
Source
Thoughweshouldalwayscalculatethecorrelationcoefficientifwewanttohaveaprecisemeasure,it'sgoodto
havearoughfeelforthecorrelationbetweentwovariablesweseeplottedonascatterdiagram.Fortheincome
educationdata,thecoefficientisnearestto:

Sampling&Estimation
Introduction:TheScubaProblem
LeoasksyoutohelphimevaluatetheKahana'scontractwiththescubaschool.
Scubadivinglessonsareanidealwayforourgueststoenjoytheirvacationortakeabreakfromtheirbusiness
activities.Wehaveanexcellentcoralreef,andscubadivingisbecomingverypopularamongvacationersandbusiness
travelers.
Westartedouryearrounddivingprogramlastyear,contractingalocaldivingschooltodoascubacertificationcourse.
Theoneyeartrialcontractisnowupforrenewal.
18/135

5/13/2016

QuantitativeMethodsOnlineCourse

Maintainingthescubaofferingsonsiteisn'tcheap.Wehavetostaffthescubadesksevendaysaweek,andwe
subsidizethecostsassociatedwitheachcourse.SoIwanttogetagoodhandleonhowsatisfiedtheguestsarewiththe
lessonsbeforeIdecidewhetherornottorenewthecontract.
Thehotelhasadatabasewithinformationaboutwhichgueststookscubalessonsandwhen.Feelfreetotakealookat
it,butIcan'tspendafortunefiguringthisout.AndIneedtoknowassoonaspossible,sinceourcontractexpiresatthe
endofthemonth.
Aliceconvincesyoutodosomefieldresearchandjoinherforascubadivinglesson.Youreturnlatethatafternoon
exhaustedbutexhilarated.Aliceisespeciallyenthusiastic.
"Well,Icertainlygivethelessonstwothumbsup.Andwehaven'tevenbeenouttoseayet!
"Butouropinionsalonecan'tdecidethematter.Weshouldn'tinferfromourexperiencethatLeo'sclienteleasawhole
enjoyedthescubacertificationcourse.Afterall,wemayhavecaughttheinstructoronhisbestdaythisyear."
Alicesuggestscreatingasurveytofindouthowsatisfiedguestsarewiththescubadivingschool.

GeneratingRandomSamples
Naturally,youcan'tasktheopinionofeveryguestwhotookscubalessonsoverthepastyear.Youhavetosurveyafew
guests,andfromtheiropinionsdrawconclusionsabouthotelguestsingeneral.Theguestsyouchoosetosurveymust
berepresentativeofalloftheguestswhohavetakenthescubacourseattheresort.Buthowcanyoubesureyougeta
goodsample?

HowtoCreateaRepresentativeandUnbiasedSample
Asmanagers,weoftenneedtoknowsomethingaboutalargegroupofpeopleorproducts.Forexample,howmany
defectivepartsdoesalargeplantproduceeachyear?WhataretheaverageannualearningsofaWallStreet
investmentbanker?Howmanypeopleinourindustryplantoattendtheannualconference?
Whenitistoocostlytogathertheinformationwewanttoknowabouteverypersonoreverythinginanentiregroup,
weoftenaskthequestionofasubset,orsampleofthegroup.Wethentrytousethatinformationtodraw
conclusionsaboutthewholegroup.
Totakeasample,wefirstselectelementsfromtheentiregroup,or"population,"atrandom.Wethenanalyzethat
sampleandtrytoinfersomethingaboutthetotalpopulationwe'reinterestedin.Forexample,wecouldselecta
sampleofpeopleinourindustry,askthemiftheyplantoattendtheannualconference,andtheninferfromtheir
answershowmanypeopleintheentireindustryplantoattend.
Forexample,if10%ofthepeopleinoursamplesaytheywillattend,wemightfeelquiteconfidentsayingthat
between7%and13%ofourentirepopulationwillattend.
Thisisthegeneralstructureofalltheproblemswe'lladdressinthisunitwe'llworkoutthedetailsaswego
forward.Wewanttoknowsomethingaboutapopulationlargeenoughtomakeexaminingeverypopulationmember
impractical.
Wefirstselectelementsfromthepopulationatrandom...
...thenanalyzethatsample...
...andthendrawaninferenceaboutthetotalpopulationwe'reinterestedin.

TakingaRandomSample
Thefirsttricktosamplingistomakesureweselectasamplethatbroadlyrepresentstheentiregroupwe're
interestedin.Forexample,wecouldn'tjustasktheconferenceorganizersiftheywantedtoattend.Theywouldnot
berepresentativeofthewholegrouptheywouldbebiasedinfavorofattendingtheconference!
Togetagoodsample,wemustmakesureweselectthesample"atrandom"fromthefullpopulation.Thismeans
thateverypersonorthinginthepopulationisequallylikelytobeselected.Ifthereare15,000peopleinthe
industry,andwearechoosingasampleof1,000,theneverypersonneedstohavethesamechance1outof15
ofbeingselected.
Selectingarandomsamplesoundseasy,butactuallydoingitcanbequitechallenging.Inthissection,we'llsee
19/135

5/13/2016

QuantitativeMethodsOnlineCourse

examplesofsomemajormistakespeoplehavemadewhiletryingtoselectarandomsample,andprovidesome
adviceabouthowtoavoidthemostcommontypesofsamplingerrors.
Insomecases,selectingarandomsamplecanbefairlyeasy.Ifwehaveacompletelistofeachmemberofthe
groupinadatabase,wecanjustassignauniquenumbertoeachmemberofthegroup.Wethenletacomputer
drawrandomnumbersfromthelist.Thiswouldensurethateachelementofthepopulationhasanequal
likelihoodofbeingselected.
Ifthepopulationaboutwhichweneedtoobtaininformationisnotlistedinaneasytoaccessdatabase,thetaskof
selectingasampleatrandombecomesmoredifficult.Inthesecases,wehavetobeextremelycarefulnotto
introduceabiasinthewayweselectthesample.
Forexample,ifwewanttoknowsomethingabouttheopinionsofanentirecompany,wecannotjustpick
employeesfromonedepartment.Wehavetomakesurethateachemployeehasanequalchanceofbeingincluded
inthesample.Adepartmentasawholemightbebiasedinfavorofoneopinion.

SampleSize
Oncewehavedecidedhowtoselectasample,wehavetoaskhowlargeoursampleneedstobe.Howmany
membersofthegroupdoweneedtostudytogetagoodestimateaboutwhatwewanttoknowabouttheentire
population?
Theansweris:Itdependsonhow"accurate"wewantourestimatetobe.Wemightexpectthatthelargerthe
population,thelargerthesamplesizeneededtoachieveagivenlevelofaccuracy,butthisisnottrue.
Asamplesizeof1,000randomlyselectedindividualscanoftengiveasatisfactoryestimationabouttheunderlying
population,aslongasthesampleisrepresentativeofthewholepopulation.Thisistrueregardlessofwhetherthe
populationconsistsofthousandsofemployeesormillionsoffactoryparts.
Sometimes,asamplesizeof100oreven50mightbeenoughwhenwearenotthatconcernedabouttheaccuracy
ofourestimate.Othertimes,wemightneedtosamplethousandstoobtaintheaccuracywerequire.
Laterinthisunit,wewillfindouthowtocalculateagoodsamplesize.Fornow,it'simportanttounderstandthat
thesamplesizedependsonthelevelofaccuracywerequire,notonthesizeofthepopulation.

LearningaboutaSample
Onceweselectoursample,weneedtomakesureweobtainaccurateinformationabouteachmemberofthe
sample.Forexample,ifwewanttolearnaboutthenumberofdefectsaplantproduces,wemustcarefullymeasure
eachiteminthesample.
Whenwewanttolearnsomethingaboutagroupofpeopleanddon'thaveanyexistingdata,weoftenuseasurvey
tolearnaboutanissueofinterest.Conductingasurveyraisesproblemsthatcanbesurprisinglytrickytoresolve.
First,howdowephraseourquestions?Isthereabiasinanyquestionsthatmightleadparticipantstoanswer
theminacertainway?Areanyquestionswordedambiguously?Ifsomeofthepeopleinthesampleinterpreta
questiononeway,andothersinterpretitdifferently,ourresultswillbemeaningless!
Second,howdowebestconductthesurvey?Shouldwesendthesurveyinthemail,orconductitoverthephone?
Shouldweinterviewsurveyparticipantsinperson,ordistributehandoutsatameeting?
Thereareadvantagesanddisadvantagestoallmethods.Asurveysentthroughthemailmayberelatively
inexpensive,butmighthaveaverylowresponserate.Thisisamajorproblemifthosewhorespondhavea
differentopinionthanthosewhodon'trespond.Afterall,thesampleismeanttolearnabouttheentire
population,notjustthosewithstrongopinions!
Creatingatelephonesurveycreatesotherissues:Whendowecallpeople?Whoishomeduringregularbusiness
hours?Mostlikelynotworkingprofessionals.Ontheotherhand,ifwecallhouseholdnumbersintheeveningthe
"happyhourcrowd"mightnotbeavailable.
Whenwedecidetoconductasurveyinperson,wehavetoconsiderwhetherthepresenceofthepersonaskingthe
questionsmightinfluencethesurveyresults.Arethesurveyparticipantslikelytoconcealcertaininformationout
ofembarrassment?Aretheylikelytoexaggerate?
Clearly,everysurveywillhavedifferentissuesthatweneedtoconfrontbeforegoingintothefieldtocollectthe
data.
20/135

5/13/2016

QuantitativeMethodsOnlineCourse

ResponseRates
Withanytypeofsurvey,wemustpaycloseattentiontotheresponserate.Wehavetobesurethatthosewho
respondtothesurveyanswerquestionsinmuchthesamewayasthosewhodon'trespondwouldanswerthem.
Otherwise,wewillhaveabiasedviewofwhatthewholepopulationthinks.
Surveyswithlowresponseratesareparticularlysusceptibletobias.Ifwegetalowresponserate,wemusttryto
followupwiththepeoplewhodidnotrespondthefirsttime.Weeitherneedtoincreasetheresponserateby
gettinganswersfromthosewhooriginallydidnotrespond,orwemustdemonstratethatthenonrespondents'
opinionsdonotdifferfromthoseoftherespondentsontheissueofinterest.
Trackingdowneveryoneinasampleandgettingtheirresponsecanbecostlyandtimeconsuming.Whenour
resourcesarelimited,itisoftenbettertotakeasmallsampleandrelentlesslypursueahighresponseratethanto
takealargersampleandsettleforalowresponserate.

Summary
Oftenitmakessensetoinferfactsaboutalargepopulationfromasmallersample.Tomakesoundinferences:

ClassicSamplingMistakes
Tounderstandtheimportanceofrepresentativesamples,let'sgobackinhistoryandlookatsomemistakesmade
intheLiteraryDigestpollof1936.
TheLiteraryDigest,apopularmagazineinthe1930's,hadcorrectlypredictedtheoutcomeofU.S,presidential
electionsfrom1916to1932.Whentheresultsofthe1936pollwereannounced,thepublicpaidattention.Who
wouldbecomethenextpresident?
Newscaster:"Onceagain,theLiteraryDigestsentoutasurveytotheAmericanpublic,asking,"Whomwillyou
voteforinthisyear'spresidentialelection?"ThismaywellbethelargestpollinAmericanhistory."
Newscaster:"TheDigestsentthesurveytoover10millionAmericansandovertwomillionresponded!"
Newscaster:"Andthesurveyresultspredict:AlfLandonwillbeatFranklinD.Rooseveltbyalargemarginand
becomePresidentoftheUnitedStates."
Asitturnedout,AlfLandondidnotbecomePresidentoftheUnitedStates.Instead,FranklinD.Rooseveltwasre
electedtoathirdterminofficeinthelargestlandslidevictoryrecordedtothatdate.Thiswasadevastatingblowto
theDigest'sreputation.Whatwentwrong?Howcouldsuchalargesurveybesofaroffthemark?
TheLiteraryDigestmadetwomistakesthatledittopredictthewrongelectionoutcome.First,itmailedthe
surveytopeopleonthreedifferentlists:themagazine'ssubscribers,carowners,andpeoplelistedintelephone
directories.Whatwaswrongwithchoosingasamplefromtheselists?
ThesamplewasnotrepresentativeoftheAmericanpublic.Mostlowerincomepeopledidnotsubscribetothe
Digestanddidnotownphonesorcarsbackin1936.Thisledthepolltobebiasedtowardshigherincome
householdsandgreatlydistortedthepoll'sresults.Lowerincomehouseholdsweremorelikelytovoteforthe
Democrat,Roosevelt,buttheywerenotincludedinthepoll.
Second,themagazinereliedonpeopletovoluntarilysendtheirresponsesbacktothemagazine.Outoftheten
millionvoterswhoweresentapoll,overtwomillionresponded.Twomillionisahugenumberofpeople.What
waswrongwiththissurvey?
Themistakewassimple:Republicans,whowantedpoliticalchange,feltmorestronglyabouttheelectionthan
Democrats.Democrats,whoweregenerallyhappywithRoosevelt'spolicies,werelessinterestedinreturningthe
survey.Amongthosewhoreceivedthesurvey,adisproportionatenumberofRepublicansresponded,andthe
resultsbecameevenmorebiased.
TheDigesthadputanunprecedentedeffortintothepollandhadstakeditsreputationonpredictingtheoutcome
oftheelection.Itsreputationwounded,theDigestwentoutofbusinesssoonthereafter.
Duringthesameelectionyear,alittleknownpsychologistnamedGeorgeGallupcorrectlypredictedwhatthe
Digestmissed:Roosevelt'svictory.WhatdidGallupdothattheLiteraryDigestdidnot?Didhecreateaneven
biggersample?
21/135

5/13/2016

QuantitativeMethodsOnlineCourse

Surprisingly,GeorgeGallupusedamuchsmallersample.Heknewthatlargesampleswerenoguaranteeof
accurateresultsiftheyweren'trandomlyselectedfromthepopulation.
Gallup'steaminterviewedonly3,000people,butmadesurethatthepeopletheyselectedweretruly
representativeoftheUSpopulation.Healsoinstructedhisteamtobepersistentinaskingtheopinionofeach
personinthesample,whichgeneratedahighresponserate.
Gallup'scorrectpredictionofthe1936electionwinnerboostedhisreputationandGallup'smethodofpollingsoon
becameastandardforpublicopinionpolls.
Today'spollsusuallyconsistofasampleofaroundathousandrandomlyselectedpeoplewhoaretruly
representativeoftheunderlyingpopulations.Forexample,lookatpollreportedinaleadingnewspaper:the
samplesizewilllikelybearoundathousand.
Anothercommonsurveymistakeisphrasingthequestionsinawaythatleadstoabiasedresponse.Let'stakea
lookatarecentexampleofabiasedquestion.
In1992,RossPerot,anindependentcontenderfortheUSPresidentialelection,conductedamailinsurveyto
showthatthepublicsupportedhisdesiretoabolishspecialinterestgroups.Thisisthequestionheasked:
Source
InPerot'smailinsurvey,99percentofrespondentssaid"yes"tothatquestion.Itseemedasifeveryonein
AmericaagreedwithPerot'sstance.
Source
SoonafterPerot'ssurvey,YankelovichPartners,anindependentmarketresearchfirm,conductedtwointeresting
followupsurveys.Inthefirstsurvey,itusedthesamequestionthatPerotaskedandfoundthat80percentofthe
populationfavoredpassingthelaw.YPattributedthedifferencetothefactthatitwasabletocreateamore
representativesamplethanPerot.
Source
Interestingly,Yankelovichthenconductedasimilarsurvey,butrephrasedthequestioninthefollowingway:
Source
Theresponsetothisquestionwasstrikinglydifferent.Only40percentofthesampledpopulationagreedto
prohibitcontributions.Asitturnedout,theresultsofthesurveyallcamedowntothewaythequestionwas
phrased.
Source
Foranysurveyweconduct,it'scriticaltophrasethequestioninthemostneutralwaypossibletoavoidbiasinthe
sampleresults.
Source
Thereallessonofthesetwoexamplesisthis:Howdataarecollectedisatleastasimportantashowdataare
analyzed.Asamplethatisunrepresentative,biased,ornotdrawnatrandomcangivehighlymisleadingresults.
Howsampledataarecollectedisatleastasimportantashowtheyareanalyzed.Knowingthatsampledataneedto
berepresentativeandunbiased,youconductasurveyofthehotelguests.

SolvingtheScubaProblem(PartI)
Howcanyoubestdetermineifhotelguestsareenjoyingthescubacourse?Bysearchingthehoteldatabase,you
determinethat2,804hotelgueststookscubatripsinthepastyear.Thescubacertificationcoursewasofferedyear
round.Thedatabaseincludeseachguest'sname,address,phonenumber,age,dateofarrival,lengthofstay,and
roomnumber.
Yourfirststepisdecidingwhattypeofsurveytoconductthatwillbeinexpensive,quick,andwillprovideagood
sampleofalltheguestswhotookscubalessons.
Shouldyoumailasurveytothewholelistofguestswhotookscubalessons,expectingthatasmallpercentagewill
respond,orconductatelephonesurvey,whichwouldlikelyprovideahigherresponserate,butcostmoreperguest
22/135

5/13/2016

QuantitativeMethodsOnlineCourse

contacted?
ToensureagoodresponserateandbecauseLeowantsananswerquicklyyouchoosetocontactcustomersby
phone.Alicewarnsthattokeepcostslow,youcanonlycontact50hotelguests,andremindsyoutocreatearandom,
representativesample.
Youopenupthelistofnamesinthehoteldatabase.Thenameswereenteredasguestsarrived.Tomakethings
simple,yourandomlyselectadateandthenrecordthefirst50guestsarrivingafterthatdatewhotookthecourse.
Youaskthehoteloperatortocallthemforyou,andtellhimtobepersistent.Eventuallyheisabletocontact45ofthe
guestsonthelist.Heasksthegueststoratetheirscubaexperienceona1to6scaleandreportstheresultsbackto
you.Clickthelinkbelowtoviewyoursample.
Entertheaveragesatisfactionlevelasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
"5"as"5.0").Roundifnecessary.
HotelDatabase
Youcomputetheaveragesatisfactionlevelandfindthatitis2.5.YougiveLeothenews.Heexplodes.
Twopointfive!That'simpossible!Iknowforsurethatitmustbehigherthanthat!You'dbettergooveryourdata
again.
Backinyourroom,youlookoveryourlistofdata.WhatshouldyoutellLeo?
Whatfactorisbiasingyourresults?
WhenyoureportthisnewstoLeo,hebeginstolaugh.
WewerehitwithahurricaneatthebeginningofApril.Halfthescubaclasseswerecancelled,andtheonesthatdid
meethadtodealwithchoppywaterandbadvisibility.Eventheweeksfollowingthehurricanewerebad.Usually
guestsseeamantarayeveryweek,andtheguestsinAprilcouldbarelyseetheunderwatercoral.Nowonderthey
weren'thappy.
YouassureLeoyouwillconductthesurveyagainwithamorerepresentativesample.Thistime,youmakesurethat
theguestsaretrulyrandomlyselected.Later,youhavenewdatainyourhandsfrom45randomlychosengueststhat
showtheaveragesatisfactionratetobe4.4ona1to6scale.Thestandarddeviationofthesampleis1.54.

Exercise1:TheBellComputerProblem
Mr.GavinCollinsistheChiefOperatingOfficerofBellComputers,amarketleaderinpersonalcomputers.This
morning,heopenedthelatestissueofBusiness4.0,abusinessjournal,andnoticedanarticleonBellComputers.
ThearticlepraisedthehighqualityandlowcostofthePCsmadebyBell.However,italsoincludedsomenegative
commentsaboutBell'scustomerservice.
Currently,customerserviceisonlyavailabletocustomersofBellComputersoverthephone.
CollinswantstounderstandmorefullywhatcustomersthinkofBell'scustomerservice.Hismarketing
departmentdesignsasurveythataskscustomerstorateBell'scustomerservicefrom1to10.
Howshouldheconductthesurvey?

Exercise2:TheWaveProblem
"Wave"isacompanythatmanufactureslaundrydetergentinseveralcountriesaroundtheworld.InIndia,the
competitionamonglaundrydetergentsisfierce.
ThesalespermonthofWavehavebeenconstantforthepastfiveyears.WaveCEOMr.Sharmainstructedhis
marketingteamtocomeupwithastrongadvertisingcampaignstressingWave'ssuperiorityoverother
competitors.WaveconductedasurveyinthemonthofJune.
Theyaskedthefollowingquestions:"HaveyouheardofWave?""DoyouthinkWaveisagoodproduct?""Doyou
noticeadifferenceinthecolorofyourclothesafterusingWave?"Then,citingtheresultsoftheirsurvey,Wave
airedamajortelevisioncampaignclaimingthat75%ofthepopulationthoughtthatWavewasagoodproduct.
YouareanewassociateatMadisonConsulting.Withyourpartner,Ms.Mehta,youhavebeenaskedtoconducta
23/135

5/13/2016

QuantitativeMethodsOnlineCourse

studyforWave'smaincompetitor,theCoralReefDetergentCompany,aboutwhetherWave'sclaimsholdwater.
CoralReefwondershowtheWaveresultsarepossible,consideringthatCoralReefholdsover45%ofthecurrent
marketshare.
Ms.Mehtahasbeengoingthroughthesurveymethodology,andshetellsyou,"Thissampleisobviouslynot
representativeandunbiased.CoralReefcandisputeWave'sclaim!"WhathasMs.Mehtanoticed?

Challenge:TheAirport
Youhavebeenaskedtoconductasurveytodeterminethepercentageofflightsarrivingatasmallairportthat
werefilledtocapacitythatmorning.Youdecidetostandoutsidetheairport'ssingleexitdoorandaskasampleof
60passengersleavingtheairporthowfulltheirflightwas.
Yourfirstthoughtistojustaskthefirst60passengersdepartingtheairporthowfulltheirflightwas,butyou
quicklyrealizethatthatcouldbeahighlybiasedsample.Any60peopleleavingatthesametimewouldlikelyhave
comefromonlyacoupleofflights,andyouwanttogetagoodsenseofwhatpercentofallflightsarrivingthat
morningwerefilledtocapacity.Thus,youdecidetorandomlyselect60peoplefromallthepassengersdeparting
thebuildingthatmorning.
Afterconductingyoursurvey,youtallytheresults:10peopledeclinetoanswer,30peopletellyouthattheirflight
wasfilledtocapacity,and20peopletellyouthattheirflightwasnotfilledtocapacity.Whatcanyouconcludefrom
yoursurveyresultssofar?
Whatistheproblemwithyoursurvey?
Toseethis,imaginethat10planeshavearrivedthatmorningfiveofwhichwerefull(having100passengers
each)andfiveofwhichhadonlyasinglepassengerontheplane.Inthiscase,halfoftheplaneswerefull.
However,almostallofthepassengers(500ofthetotal505)departingfromtheairportwouldreport(correctly!)
thattheyhadbeenonafullplane.Sincepeoplefromafullplanearemorelikelytobeselected,thereisa
systematicbiasinyourresponse.
Itisimportant,ineverysurvey,totrytomakeyoursampleasrepresentativeaspossible.Inthiscase,yoursample
wasnotrepresentativeoftheplanesarrivingtotheairport.
Abetterapproachmightbetoaskthepeopleyouselectwhattheirflightnumberwas,andthenaskthemhowfull
theirflightwas.Makesureyouhaveatleastonepassengerfromeveryplane.Thencounttheresponsesofonlyone
personfromeachflight.Byincludingonlyonepersonperflightinyoursample,youensurethatyoursampleisan
accuratepredictionofhowmanyplanesarefilledtocapacity.
Samplingiscomplicated,anditisimportanttothinkthroughallthefactorsthatmightinfluenceyourresults.In
thiscase,themistakeisthatyouaretryingtoestimateapopulationofplanesbysamplingapopulationof
passengers.Thismakesthesampleunrepresentativeoftheunderlyingpopulation.Byrandomlysamplingthe
passengersratherthantheflights,eachflightisnotequallylikelytobeselected,andthesampleisbiased.

ThePopulationMean
Youreporttheresultsofyoursurvey,thesamplemean,anditsstandarddeviationtoLeo.

TheScubaProblemII
Asamplemeanof4.4makesmoresensetome,butI'mstillabituneasyaboutyoursurveyresult.Afterall,you've
onlycollected45responses.
Ifyou'dchosendifferentpeople,theylikelywouldhavegivendifferentresponses.Whatifjustbychancethese
45peoplelovedthescubacourse,andnooneelsedid?
Youhaveagoodpointthere,Leo.Ourintuitionisthattheaveragesatisfactionrateforallguestsisn'ttoofarfrom
4.4,butatthispointwe'renotsureexactlyhowfarawayitmightbe.Withoutmorecalculations,allwecansayis
that4.4isthebestestimatewehave.Thatiswhy...
Waitaminute!Thisisveryunsatisfying.Areyoutellingmethatthere'snowaytogaugetheaccuracyofthissurvey
result?
Iftheresultsarealittleoff,that'snotaproblem.Butyouhavetotellmehowfarofftheymightbe.Whatifyou'reoff
bytwowholepoints,andthetruesatisfactionofmyhotelguestsis2.4,not4.4?Inthatcase,mydecisionwouldbe
24/135

5/13/2016

QuantitativeMethodsOnlineCourse

completelydifferent.
Ineedtoknowhowaccuratelythissamplereflectstheopinionsofallthehotelguestswhowentscubadiving!
Thesamplemeanisthebestpointestimateofthepopulationmean,butitcannottellyouhowaccuratelythesample
reflectsthepopulation.
AlicesuggestsgivingLeoarangeofvaluesthatisalmostcertaintocontainthepopulationmean."Wemaynotbe
abletopindownmeansatisfactionprecisely.ButconfiningittoarangeoflikelyvalueswillprovideLeowithenough
informationtomakeasoundbusinessdecision."
Thatsoundslikeagoodidea,butyouwonderhowtoactuallydoit.

UsingConfidenceIntervals
Thesamplemeanisthebestestimateofourpopulationmean.However,itisonlyapointestimate.Itdoesnotgive
usasenseofhowaccuratelythesamplemeanestimatesthepopulationmean.
Thinkaboutit.Ifweknowonlythesamplemean,whatcanwereallysayaboutthepopulationmean?Inthecaseof
ourscubaschool,whatcanwesayabouttheaveragesatisfactionrateofallscubadivinghotelguests?Coulditbe
4.3?4.0?4.7?2.0?
Tomakedecisionsasamanager,weneedtohavemorethanjustagoodpointestimate.Weneedtohaveasense
ofhowcloseorfarawaythetruepopulationmeanmightbefromourestimate.
Wecanindicatethemostlikelyvaluesofthetruepopulationmeanbycreatingarange,orinterval,aroundthe
samplemean.Ifweconstructitcorrectly,thisrangewillverylikelycontainthetruepopulationmean.
Forexample,byconstructingarange,wemightbeabletotellLeothatweareveryconfidentthatthetrueaverage
customersatisfactionforallscubaguestsfallsbetween4.2and4.6.
Knowingthatthetrueaverageisalmostcertainlybetween4.2and4.6,Leoisbetterequippedtomakeadecision
thanifhesimplyknewtheestimatedaverageof4.4.
Creatingarangearoundthesamplemeanisquiteeasy.First,weneedtoknowthreestatisticsofthesample:the
meanxbar,thestandarddeviations,andthesamplesizen.
Wealsoneedtoknowhow"confident"we'dliketobethattherangecontainsthetruemeanofthepopulation.For
anylevelof"confidence",thereisavaluewe'llcallztoputintotheformula.We'lllearnlaterinthisunitexactlywhat
wemeanby"confidence,"andhowtocomputez.Fornow,justkeepinmindthatforhigherlevelsofconfidence,
we'llneedtoputinalargervalueofz.
Usingthesenumbers,wecancreatearangearoundthesamplemeanaccordingtothefollowingformula:
Beforeweactuallyusetheformula,let'strytodevelopourintuitionabouttherangewe'recreating.Whereshould
therangebecentered?Howwidemusttherangebetomakeusconfidentthatitcontainsthetruepopulationmean?
Whatfactorswouldleadustoneedawiderornarrowerrange?
Let'sseehowthestatisticsofthesampleinfluencethelocationandwidthoftherange.Let'sstartwiththesample
mean.
Thesamplemeanisourbestestimateofthepopulationmean.Thissuggeststhatthesamplemeanshouldalwaysbe
thecenteroftherange.Movethesliderbartoseehowthesamplemeanaffectstherange.
Second,thewidthoftherangedependsonthestandarddeviationofthesample.Whenthesamplestandard
deviationislarge,wehavegreateruncertaintyabouttheaccuracyofthesamplemeanasanestimateofthe
populationmean.Thus,wehavetocreateawiderrangetobeconfidentthatitincludesthetruepopulationmean.
Ontheotherhand,ifthesamplestandarddeviationissmall,wefeelmoreconfidentthatoursamplemeanisan
accuratepredictorofthetruepopulationmean.Inthiscase,wecandrawamorenarrowrange.
Thelargerthestandarddeviation,thewidertherangemustbe.Movethesliderbartoseehowthesamplestandard
deviationaffectstherange.
Third,thewidthoftherangedependsonthesamplesize.Withaverysmallsample,it'squitepossiblethatoneor
twoatypicalpointsinthesamplecouldthrowthesamplemeanoffconsiderablyfromthetruepopulationmean.So
withasmallsample,weneedtocreateawiderangetofeelcomfortablethatthetruemeanislikelytobeinsideit.
25/135

5/13/2016

QuantitativeMethodsOnlineCourse

Thelargerthesample,themorecertainwecanbethatthesamplemeanrepresentsthepopulationmean.Witha
largesample,evenifoursampleincludesafewatypicalpoints,therearelikelytobemanymoretypicalpointsinthe
sampletocompensatefortheoutliers.Thus,withalargesample,wecanfeelcomfortablewithasmallrange.
Movethesliderbartoseehowthesamplesizeinfluencestherange.
Finally,thewidthoftherangedependsonourdesiredlevelofconfidence.Thelevelofconfidencestateshowcertain
wewanttobethattherangecontainsthemeanofthepopulation.Themoreconfidentwewanttobethattherange
containsthetruepopulationmean,thewiderwehavetomaketherange.
Ifourdesiredlevelofconfidenceisfairlylow,wecandrawamorenarrowrange.
Inthelanguageofstatistics,weindicateourlevelofconfidencebysaying,forexample,thatweare"95%confident"
thattherangecontainsthetruepopulationmean.Thismeansthereisa95%chancethattherangecontainsthetrue
populationmean.
Movethesliderbartoseehowtheconfidencelevelaffectstherange.
Thesevariablesdeterminethesizeoftherangethatwewanttoconstruct.Wewilllearnexactlyhowtoconstructthis
rangeinalatersection.
Fornow,allwehavetounderstandisthatthepopulationmeancanbestbeestimatedbyarangeofvaluesandthat
therangedependsonthreesamplestatisticsaswellasthelevelofconfidencethatwewanttoassigntotherange.

Summary
Thesamplemeanisourbestinitialestimateofthepopulationmean.Toindicatehowaccuratethisestimateis,we
constructarangearoundthesamplemeanthatlikelycontainsthepopulationmean.Thewidthoftherangeis
determinedbythesamplesize,samplestandarddeviation,andthelevelofconfidence.Theconfidencelevel
measureshowcertainwearethattherangeweconstructcontainsthetruepopulationmean.
Alicerecommendstakingastepbackfromsamplingandlearningaboutthenormaldistribution.

TheNormalDistribution
Alicerecommendstakingastepbackfromsamplingandlearningaboutthenormaldistribution.
Thenormaldistributionhelpsuscreatearangearoundasamplemeanthatislikelytocontainthetruepopulation
mean.Youcanusethenormaldistributiontoturntheintuitivenotionof"confidenceinyourestimate"intoaprecisely
definedconcept.Understandingthenormaldistributionwillalsogiveyoudeeperinsightintohowsamplingworks.
Thenormaldistributionisaprobabilitydistributionthatiscenteredatthemean.Itisshapedlikeabell,andis
sometimescalledthe"bellcurve."
Likeanyprobabilitydistribution,thenormaldistributionisshownontwoaxes:thexaxisforthevariablewe're
studyingwomen'sheights,forexampleandtheyaxisforthelikelihoodthatdifferentvaluesofthevariablewill
occur.
Forexample,fewwomenareveryshortandfewareverytall.Mostareinthemiddlesomewhere,withfairlyaverage
heights.Sincewomenofaverageheightaresomuchmorecommon,thedistributionofwomen'sheightsismuch
higherinthecenterneartheaverage,whichisabout63.5inches.
Asitturnsout,foraprobabilitydistributionlikethenormaldistribution,thepercentofallvaluesfallingintoaspecific
rangeisequaltotheareaunderthecurveoverthatrange.
Forexample,thepercentageofallwomenwhoarebetween61and66inchestallisequaltotheareaunderthecurve
overthatrange.
Thepercentageofallwomentallerthan66inchesisequaltotheareaunderthecurvetotherightof66inches.
Likeanyprobabilitydistribution,thetotalareaunderthecurveisequalto1,or100%,becausetheheightofevery
womanisrepresentedinthecurve.
Overtheyears,statisticianshavediscoveredthatmanypopulationshavethepropertiesofthenormaldistribution.For
example,IQtestscoresfollowanormaldistribution.TheweightsofpenniesproducedbyU.S.mintshavebeenshown
tofollowanormaldistribution.
26/135

5/13/2016

QuantitativeMethodsOnlineCourse

Butwhatissospecialaboutthiscurve?
First,thenormaldistribution'smeanandmedianareequal.Theyarelocatedexactlyatthecenterofthedistribution.
Hence,theprobabilitythatanormaldistributionwillhaveavaluelessthanthemeanis50%,andthattheprobability
itwillhaveavaluegreaterthanthemeanis50%.
Second,thenormaldistributionhasauniquesymmetricalshapearoundthismean.Howwideornarrowthecurveis
dependssolelyonthedistribution'sstandarddeviation.
Infact,thelocationandwidthofanynormalcurvearecompletelydeterminedbytwovariables:themeanandthe
standarddeviationofthedistribution.
Largestandarddeviationsmakethecurveveryflat.Smallstandarddeviationsproducetight,tallcurveswithmostof
thevaluesveryclosetothemean.
Howisthisinformationuseful?
Regardlessofhowwideornarrowthecurve,italwaysretainsitsbellshapedform.Becauseofthisuniqueshape,we
cancreateafewuseful"rulesofthumb"forthenormaldistribution.
Foranormaldistribution,about68%(roughlytwothirds)oftheprobabilityiscontainedintherangereachingone
standarddeviationawayfromthemeanoneitherside.
It'seasiesttoseethiswithastandardnormalcurve,whichhasameanofzeroandastandarddeviationofone.
Ifwegotwostandarddeviationsawayfromthemeanforastandardnormalcurvewe'llcoverabout95%ofthe
probability.
Theamazingthingaboutnormaldistributionsisthattheserulesofthumbholdforanynormaldistribution,nomatter
whatitsmeanorstandarddeviation.
Forexample,abouttwothirdsofallwomenhaveheightswithinonestandarddeviation,2.5inches,oftheaverage
height,whichis63.5inches.
95%ofwomenhaveheightswithintwostandarddeviations(or5inches)oftheaverageheight.
Toseehowtheserulesofthumbtranslateintospecificwomen'sheights,wecanlabelthexaxistwicetoshowwhich
valuescorrespondtobeingonestandarddeviationaboveorbelowthemean,whichvaluescorrespondtobeingtwo
standarddeviationsaboveorbelowthemean,andsoon.
Essentially,bylabelingthexaxistwicewearetranslatingthenormalcurveintoastandardnormalcurve,whichis
easiertoworkwith.
Forwomen'sheight,themeanis63.5andthestandarddeviationis2.5.So,onestandarddeviationabovethemeanis
63.5+2.5,andonestandarddeviationbelowthemeanis63.52.5.
Thus,wecanseethatabout68%ofallwomenhaveheightsbetween61and66inches,sinceweknowthatabout68%
oftheprobabilityisbetween1and+1onastandardnormalcurve.
Similarly,wecanreadtheheightscorrespondingtotwostandarddeviationsaboveandbelowthemeantoseethat
about95%ofallwomenhaveheightsbetween58.5and68.5inches.

Thezstatistic
Theuniqueshapeofthenormalcurveallowsustotranslateanynormaldistributionintoastandardnormalcurve,as
wedidwithwomen'sheightssimplybyrelabelingthexaxis.Todothismoreformally,weusesomethingcalledthe
zstatistic.
Foranormaldistribution,weusuallyrefertothenumberofstandarddeviationswemustmoveawayfromthemean
tocoveraparticularprobabilityas"z",orthe"zvalue."Foranyvalueofz,thereisaspecificprobabilityofbeing
withinzstandarddeviationsofthemean.
Forexample,forazvalueof1,theprobabilityofbeingwithinzstandarddeviationsofthemeanisabout68%,the
probabilityofbeingbetween1and+1onastandardnormalcurve.
Agoodwaytothinkaboutwhatthezstatisticcandoisthisanalogy:ifagianttellsyouhishouseisfourstepstothe
north,andyouwanttoknowhowmanystepsitwilltakeyoutogetthere,whatelsedoyouneedtoknow?
27/135

5/13/2016

QuantitativeMethodsOnlineCourse

Youwouldneedtoknowhowmuchbiggerhisstrideisthanyours.Fourstepscouldbeareallylongway.
Thesameistrueofastandarddeviation.Toknowhowfaryoumustgofromthemeantocoveracertainareaunder
thecurve,youhavetoknowthestandarddeviationofthedistribution.
Usingthezstatistic,wecanthen"standardize"thedistribution,makingitintoastandardnormaldistributionwith
ameanof0andastandarddeviationof1.Wearetranslatingtherealvalueinitsoriginalunitsinchesinour
exampleintoazvalue.
Thezstatistictranslatesanyvalueintoitscorrespondingzvaluesimplybysubtractingthemeananddividingbythe
standarddeviation.
Thus,forthewomen'sheightof66inches,thezvalue,z=(6663.5)/2.5,equals1.Therefore,66isexactlyone
standarddeviationabovethemean.
Essentially,thezstatisticallowsustomeasurethedistancefromthemeanintermsofstandarddeviationsinstead
ofrealvalues.Itgiveseveryonethesamesizefeetinstatistics.
Wecanextendtherulesofthumbwe'vedevelopedbeyondthetwocaseswe'velookedat.Forexample,wemaywant
toknowthelikelihoodofbeingwithin1.5standarddeviationsfromthemean,orwithinthreestandarddeviations
fromthemean.
Selectdifferentvaluesofzthatis,selectdifferentnumbersofstandarddeviationsfromthemeanandseehow
theprobabilitychanges.Besuretotryzvaluesof1and2toverifythatourrulesofthumbareontarget!
Sometimeswemaywanttogointheotherdirection,startingwiththeprobabilityandfiguringouthowmany
standarddeviationsarenecessaryoneithersideofthemeantocapturethatprobability.
Forexample,supposewewanttoknowhowmanystandarddeviationsweneedtobefromthemeantocapture95%
oftheprobability.
Oursecondruleofthumbtellsusthatwhenwemovetwostandarddeviationsfromthemean,wecaptureabout
95%oftheprobability.Moreprecisely,tocaptureexactly95%oftheprobability,wemustbewithin1.96standard
deviationsofthemean.
Thismeansthatforanormaldistribution,thereisa95%probabilityoffallingbetween1.96and1.96standard
deviationsfromthemean.
Selectdifferentprobabilitiesandseehowmanystandarddeviationswehavetomoveawayfromthemeantocover
thatprobability.
Wecancreateatablethatshowswhichvaluesofzcorrespondtoeachprobabilityorwecancalculatezusingasimple
functioninMicrosoftExcel.We'llexplainhowtousebothoftheseapproachesinthenextfewclips.
ztable
Remember,theprobabilitiesandtherulesofthumbswe'vedescribedapplyONLYtoanormaldistribution.Don't
thinkyoucanusethemforanydistribution!
Sometimes,probabilitiesareshowninotherforms.Ifwestartattheveryleftsideofthedistribution,thearea
underneaththecurveiscalledthecumulativeprobability.Forexample,theprobabilityofbeinglessthanthemean
is0.5,or50%.Thisisjustoneexampleofacumulativeprobability.
Acumulativeprobabilityof70%correspondstoapointthathas70%oftheareaunderthecurvetoitsleft.
ThereareeasywaystofindcumulativeprobabilitiesusingspreadsheetpackagessuchasMicrosoftExcel.You'llhave
opportunitiestopracticesolvingthesetypesofproblemsshortly.
Cumulativeprobabilitiescanbeusedtofindtheprobabilityofanyrangeofvalues.Forexample,tofindthe
percentageofallwomenwhohaveheightsbetween63.5and68inches,wewouldsimplysubtractthepercentwhose
heightsarelessthan63.5inchesfromthepercentwhoseheightsarelessthan68inches.

Summary
Thenormaldistributionhasauniquesymmetricalshapewhosecenterandwidtharecompletelydeterminedbyits
meananditsstandarddeviation.Foreverynormaldistribution,theprobabilityofbeingwithinaspecifiednumber
ofstandarddeviationsofthemeanisthesame.Thedistancefromthemean,asmeasuredinstandarddeviations,is
28/135

5/13/2016

QuantitativeMethodsOnlineCourse

knownasthezvalue.Usingthepropertiesofthenormaldistribution,wecancalculateaprobabilityassociatedwith
anyrangeofvalues.

UsingExcel'sNormalFunctions
Tofindthecumulativeprobabilityassociatedwithagivenzvalueforastandardnormalcurve,weusetheExcel
functionNORMSDIST.NotetheSbetweentheMandtheD.Itindicatesweareworkingwitha'standard'normal
curvewithmeanzeroandstandarddeviationone.
Forexample,tofindthecumulativeprobabilityforthezvalue1,weentertheExcelfunction=NORMSDIST(1).
Thevaluereturned,0.84,istheareaunderthestandardnormalcurvetotheleftof1.Thistellsusthatthe
probabilityofobtainingavaluelessthan1forastandardnormalcurveisabout84%.
Weshouldn'tbesurprisedthattheprobabilityofbeinglessthan1is84%.Why?First,weknowthatthenormal
curveissymmetric,sothereisa50%chanceofbeingbelowthemean.
Next,weknowthatabout68%oftheprobabilityforastandardnormalcurveisbetween1and+1.
Sincethenormalcurveissymmetric,halfofthat68%or34%oftheprobabilitymustliebetween0and1.
Puttingthesetwofactstogetherconfirmsthatthereisan84%chanceofobtainingavaluelessthan1forastandard
normalcurve.
Ifwewanttofindthecumulativeprobabilityofavalueinageneralnormalcurveonethatdoesnotnecessarily
haveameanofzeroandastandarddeviationofonewehavetwooptions.Oneoptionistofirststandardizethe
valueinquestiontofindtheequivalentzvalue,andthenusetheNORMSDISTtofindthecumulativeprobabilityfor
thatzvalue.
Forexample,ifwehaveanormaldistributionwithmean26andstandarddeviation8,wemaywishtoknowthe
probabilityofobtainingavaluelessthan24.
Standardizingcanbedoneeasilybyhand,butExcelalsohasaSTANDARDIZEfunction.Weenterthefunctionina
cellandinsertthreevalues:thevaluetobestandardized,andthemeanandstandarddeviationofthenormal
distribution.
Wefindthatthestandardizedvalue(orzvalue)of24foranormalcurvewithmean26andstandarddeviation8is
0.25.
Now,tofindthecumulativeprobabilityforthezvalue0.25,weentertheExcelfunction=NORMSDIST(0.25),
whichtellsusthattheprobabilityofavaluelessthan0.25onastandardnormalcurveis40%.Thus,theprobability
ofavaluelessthan24onanormalcurvewithmean26andstandarddeviation8is40%.
ThesecondwaytofindacumulativeprobabilityinageneralnormalcurveistousetheNORMDISTfunction.Here,
weenterthefunctioninacellandinsertfourvalues:thenumberwhosecumulativeprobabilitywewanttofind,the
meanandstandarddeviationofthenormaldistribution,andtheword"TRUE."
Aswithourpreviousapproach,wefindthattheprobabilityofobtainingavaluelessthan24onanormalcurvewith
mean26andstandarddeviation8is40%.
Thevalue"TRUE"tellsExceltoreturnacumulativeprobability.Ifinsteadof"TRUE"weenter"FALSE,"Excel
returnstheyvalueofthenormalcurvesomethingweareusuallynotinterestedin.
Quiteoften,wehaveacumulativeprobability,andwanttoworkbackwards,translatingitintoavalueonanormal
curve.
Supposewewanttofindthezvalueassociatedwiththecumulativeprobability95%.
Totranslateacumulativeprobabilitybacktoazvalueonthestandardnormalcurve,weusetheExcelfunction
NORMSINV.NoteonceagaintheS,whichtellsusweareworkingwithastandardnormalcurve.
Wefindthatthezvalueassociatedwiththecumulativeprobability95%is1.64.
Sometimeswemaywanttotranslateacumulativeprobabilitybacktoavalueonageneralnormalcurve.For
example,wemaywanttofindthevalueassociatedwiththecumulativeprobability95%foranormalcurvewith
mean26andstandarddeviation8.
29/135

5/13/2016

QuantitativeMethodsOnlineCourse

Ifwewanttotranslateacumulativeprobabilitybacktoavalueonageneralnormalcurve,weusetheNORMINV
function.NORMINVrequiresthreevalues:thecumulativeprobability,andthemeanandstandarddeviationofthe
normaldistributioninquestion.
Wefindthatthevalueassociatedwiththecumulativeprobability95%foranormalcurvewithmean26and
standarddeviation8is39.2.

Usingtheztable
ThepreviousclipshowsushowtousesoftwareprogramslikeExceltocalculatezvaluesandcumulativeprobabilities
forthenormalcurve.Anotherwaytofindzvaluesandcumulativeprobabilitiesistouseaztable.Usingztablesisa
bitmorecumbersomethanusingExcel,butithelpsreinforcetheconcepts.
Let'susetheztabletofindacumulativeprobability.Women'sheightsaredistributednormally,withmeanaround
63.5inches,andstandarddeviation2.5inches.Whatpercentageofwomenareshorterthan65.6inches?
First,wecalculatethezvaluefor65.6inches,0.84.Thecumulativeprobabilityassociatedwiththezvalueisthearea
underthestandardnormalcurvetotheleftofthezvalue.Thiscumulativeprobabilityisthepercentageofwomen
whoareshorterthan65.6inches.
Wenextusethetabletofindthecumulativeprobabilitycorrespondingtoazvalueof0.84.First,wefindtherowby
locatingthezvalueuptothefirstdigittotherightofthedecimalpoint,0.8.Thenwechoosethecolumn
correspondingtotheremainderofthezvalue(0.840.8=0.04).Thecumulativeprobabilityis0.7995.About80%
ofwomenareshorterthan65.6inches.
Findingthecumulativeprobabilityforavaluelessthanthemeanisabittrickier.Forexample,wemightwantto
knowwhatpercentageofwomenareshorterthan61.6inches.
Wefindthatthezvalueforaheightof61.6inchesisanegativenumber:0.76.
Whenazvalueisnegative,wemustfirstusethetabletofindthecumulativeprobabilitycorrespondingtothe
positivezvalue,inthiscase+0.76.Then,sincethenormalcurveissymmetric,wewillbeabletoconcludethatthe
probabilityofbeinglessthanthezvalue0.76isthesameastheprobabilityofbeinggreaterthanthezvalue+0.76.
Wefindthecumulativeprobabilityfor+0.76bylocatingtherowcorrespondingtothezvalueuptothefirstdigitto
therightofthedecimalpoint,0.7,andthecolumncorrespondingtotheremainderofthezvalue(0.760.7=
0.06).Thecumulativeprobabilityis0.7764.
Sincetheprobabilityofbeinglessthanazvalueof+0.76is0.7764,thentheprobabilityofbeinggreaterthanaz
valueof+0.76is10.7764=0.2236.Thus,wecanconcludethattheprobabilityofbeinglessthanazvalueof0.76
isalso0.2236.
Finally,wereachourconclusion.About22.36%ofwomenareshorterthan61.6inches.

PracticewithNormalCurves
Findthecumulativeprobabilityassociatedwiththezvalue2.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Findthecumulativeprobabilityassociatedwiththezvalue2.36.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Findthecumulativeprobabilityassociatedwiththezvalue1.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
30/135

5/13/2016

QuantitativeMethodsOnlineCourse

ztable
Excel
Findthecumulativeprobabilityassociatedwiththezvalue1.645.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Findthecumulativeprobabilityassociatedwiththezvalue1.645.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean100andstandarddeviation10,findthecumulativeprobabilityassociatedwiththe
value115.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean100andstandarddeviation10,findthecumulativeprobabilityassociatedwiththe
value80.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean100andstandarddeviation10,findtheprobabilityofobtainingavaluegreaterthan
80butlessthan115.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean80andstandarddeviation5,findtheprobabilityofobtainingavaluegreaterthan85
butlessthan95.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean47andstandarddeviation6,findtheprobabilityofobtainingavaluegreaterthan45.
Enteryouranswerindecimalnotationwith3digitstotherightofthedecimal,(e.g.,enter"5"as"5.000").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean47andstandarddeviation6,findtheprobabilityofobtainingavaluegreaterthan38
butlessthan45.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
31/135

5/13/2016

QuantitativeMethodsOnlineCourse

ztable
Excel
Findthezvalueassociatedwiththecumulativeprobabilityof60%.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Findthezvalueassociatedwiththecumulativeprobabilityof40%.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Findthezvalueassociatedwiththecumulativeprobabilityof2.5%.
Enteryouranswerindecimalnotationwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
Excel
Foranormalcurvewithmean222andstandarddeviation17,findthevalueassociatedwiththecumulative
probabilityof88%.
Enteryouranswerasaninteger(e.g.,"5").Roundifnecessary.
ztable
Excel
Foranormalcurvewithmean222andstandarddeviation17,findthevalueassociatedwiththecumulative
probabilityof28%.
Enteryouranswerasaninteger(e.g.,"5").Roundifnecessary.
ztable
Excel

TheCentralLimitTheorem
HowcanthenormaldistributionhelpyousampleLeo'shotelguests?
Howdotheuniquepropertiesofthenormaldistributionhelpuswhenweusearandomsampletoinfersomething
abouttheunderlyingpopulation?
Afterall,whenwesampleapopulation,weusuallyhavenoideawhetherornotthepopulationisnormally
distributed.We'retypicallysamplingbecausewedon'tevenknowthemeanofthepopulation!Ifthenormal
distributionissuchagreattool,whencanweuseit?
Itturnsoutthatevenifapopulationisnotnormallydistributed,thepropertiesofthenormaldistributionarevery
helpfultousinsampling.Toseewhy,let'sfirstlearnaboutawellestablishedstatisticalfactknownasthe"Central
LimitTheorem".

Definition
Roughlyspeaking,theCentralLimitTheoremsaysthatifwetookmanyrandomsamplesfromapopulationand
plottedthemeansofeachsample,thenassumingthesampleswetakearesufficientlylargetheresultingplot
ofthesamplemeanswouldlooknormallydistributed.
Furthermore,ifwetookenoughofthesesamples,themeanoftheresultingdistributionofsamplemeanswould
beequaltothetruemeanofthepopulation.
32/135

5/13/2016

QuantitativeMethodsOnlineCourse

Torepeat:nomatterwhattypeofdistributionthepopulationhasuniform,skewed,bimodal,orcompletely
bizarreifwetookenoughsamples,andthesamplesweresufficientlylarge,thenthemeansofthosesamples
wouldformanormaldistributioncenteredaroundthetruemeanofthepopulation.
TheCentralLimitTheoremisoneofthesubtlestaspectsofbasicstatistics.Itmayseemoddtobedrawinga
distributionofthemeansofmanysamples,butthatisexactlywhatwearedoing.We'llcallthisdistributionthe
DistributionofSampleMeans.(StatisticiansalsooftencallittheSamplingDistributionoftheMean).
Let'swalkthroughthisstepbystep.Ifwehaveapopulationanypopulationwecantakearandomsample.
Thissamplehasamean.
Wecanplotthatmeanonagraph.
Thenwetakeanothersample.Thatsamplealsohasamean,whichwealsoplotonthegraph.
Now,ifweplotalotofsamplemeansinthisway,theywillstarttoformanormaldistributionaroundthe
population'smean.
Themoresampleswetake,themorethegraphofthesamplemeanswouldlooklikeanormaldistribution.
Eventually,thegraphofthesamplemeanstheDistributionoftheSampleMeanswouldformanearlyperfect
replicaofanormaldistribution.
Now,nobodywouldactuallytakealotofsamples,calculateallofthesamplemeans,andthenconstructanormal
distributionwiththem.We'retakingalotofsamplesherejusttoletyouseethatgraphingthemeansofmany
sampleswouldgiveyouanormalcurve.
Intherealworld,wetakeasinglesampleandsqueezeitforalltheinformationit'sworth.Butwhatdoesthe
CentralLimitTheoremallowustosaybasedonthatsinglesample?
TheCentralLimitTheoremtellsusthatthemeanofthatonesampleispartofanormaldistribution.More
specifically,weknowthatthesamplemeanfallssomewhereinanormalDistributionofSampleMeansthatis
centeredatthetruepopulationmean.
TheCentralLimitTheoremissopowerfulforsamplingandestimationbecauseitallowsustoignorethe
underlyingdistributionofthepopulationwewanttolearnabout.SinceweknowtheDistributionofSample
Meansisnormallydistributedandcenteredatthetruepopulationmean,wecancompletelydisregardthe
underlyingdistributionofthepopulation.
Aswe'llseeshortly,becauseweknowsomuchaboutthenormaldistribution,wecanusetheinformationabout
theDistributionofSampleMeanstodrawconclusionsaboutthelikelihoodofdifferentvaluesoftheactual
populationmean.

Summary
TheCentralLimitTheoremstatesthatforanypopulationdistribution,themeansofsamplesfromthat
populationaredistributedapproximatelynormally.Themoresamples,andthelargerthesamplesize,thecloser
theDistributionofSampleMeansfitsanormalcurve.Themeanofasinglesampleliesonthisnormalcurve,so
wecanusethenormalcurve'sspecialpropertiestoextractmoreinformationfromasinglesamplemean.

Illustrating
Let'sseehowtheCentralLimitTheoremworksusingagraphicalillustration.Thethreeiconsaremarked
"Uniform,""Bimodal,"and"Skewed."Onalaterpage,clickingoneachofthethreesectionsinthenavigationwill
displayadifferentkindofdistribution.
Onthenextpage,clickingon"Uniform"willdisplayadistributionthatisuniforminshape,i.e.adistributionfor
whichallvaluesinaspecifiedrangeareequallylikelytooccur.Clickingon"Bimodal"willdisplayadistribution
thathastwoseparateareaswherevaluesaremorelikelytooccurthanelsewhere.Clickingon"Skewed"will
displayadistributionthatisnotsymmetricalvaluesaremorelikelytofallabovethemeanthanbelow.

Uniform
Thepopulationdistributionisonthetophalfofthepage.Let'stakeasampleofit.Thissamplehasamean.
Let'sstartbuildingadistributionofthesamplemeansonthebottomhalfofthepagebyplacingeachsample
33/135

5/13/2016

QuantitativeMethodsOnlineCourse

meanonagraph.Werepeatthisprocessseveraltimestocreateourdistribution.
Takeasample.Finditsmean.Recorditinthesamplemeanhistogram.Thishistogramapproximatesthe
distributionofthesamplemeans.
Aswecansee,theshapeoftheoriginaldistributiondoesn'tmatter.Thedistributionofthesamplemeanswill
alwaysformanormaldistribution.ThisiswhattheCentralLimitTheorempredicts.

Bimodal
Thepopulationdistributionisonthetophalfofthepage.Let'stakeasampleofit.Thissamplehasamean.
Let'sstartbuildingadistributionofthesamplemeansonthebottomhalfofthepagebyplacingeachsample
meanonagraph.Werepeatthisprocessseveraltimestocreateourdistribution.
Takeasample.Finditsmean.Recorditinthesamplemeanhistogram.Thishistogramapproximatesthe
distributionofthesamplemeans.
Aswecansee,theshapeoftheoriginaldistributiondoesn'tmatter.Thedistributionofthesamplemeanswill
alwaysformanormaldistribution.ThisiswhattheCentralLimitTheorempredicts.

Skewed
Thepopulationdistributionisonthetophalfofthepage.Let'stakeasampleofit.Thissamplehasamean.
Let'sstartbuildingadistributionofthesamplemeansonthebottomhalfofthepagebyplacingeachsample
meanonagraph.Werepeatthisprocessseveraltimestocreateourdistribution.
Takeasample.Finditsmean.Recorditinthesamplemeanhistogram.Thishistogramapproximatesthe
distributionofthesamplemeans.
Aswecansee,theshapeoftheoriginaldistributiondoesn'tmatter.Thedistributionofthesamplemeanswill
alwaysformanormaldistribution.ThisiswhattheCentralLimitTheorempredicts.
TheCentralLimitTheoremstatesthatthemeansofsufficientlylargesamplesarealwaysnormallydistributed,
akeyinsightthatwillallowyoutoestimatethepopulationmeanfromasample.

ConfidenceIntervals
UsingthepropertiesofthenormaldistributionandtheCentralLimitTheorem,youcanconstructarangeofvalues
thatisalmostcertaintocontainthepopulationmean.

EstimatingaPopulationMeanII
Foranormaldistribution,weknowthatifweselectavalueatrandom,itwillbewithintwostandarddeviationsof
thedistribution'smean95%ofthetime.
TheCentralLimitTheoremoffersustwoadditionalinsights.First,weknowthatthemeansofsufficientlylarge
samplesarenormallydistributed,regardlessofthedistributionoftheunderlyingpopulation.
Second,weknowthatthemeanoftheDistributionofSampleMeansisequaltothetruepopulationmean.
Combiningthesefactscangiveusameasureofhowaccuratelythemeanofasampleestimatesthepopulationmean.
Specifically,wecannowconcludethatifwetakeasufficientlylargesamplelet'ssayatleast30pointsfroma
population,thereisa95%chancethatthemeanofthatsamplefallswithintwostandarddeviationsofthetrue
populationmean.
Let'sbuildthisupstepbysteptomakesureweunderstandthelogic.
First,wetakeasamplefromapopulationandcomputeitsmean.Weknowthatthemeanofthatsampleisapoint
onanormaldistributiontheDistributionofSampleMeans.
Sincethemeanofoursampleisavaluerandomlyobtainedfromanormaldistribution,thereisa95%chancethat
thesamplemeaniswithintwostandarddeviationsofthemeanofthedistribution.
34/135

5/13/2016

QuantitativeMethodsOnlineCourse

TheCentralLimitTheoremtellsusthatthemeanofthatdistributionisthesameasthetruepopulationmean.Thus,
wecanconcludethatthereisa95%chancethatthesamplemeaniswithintwostandarddeviationsofthepopulation
mean.
Wehavearguedthat95%ofoursampleswillhaveameanwithintherangeshownaroundthetruepopulation
mean.
Nextwe'llturnthisaroundandlookatintervalsaroundsamplemeans,becausethat'sexactlywhataconfidence
intervalis.
Let'slookatintervalsaroundthemeansoftwodifferenttypesofsamples:thosewhosesamplemeansfallwithinthe
2standarddeviationrangearoundthepopulationmean(whichshouldbethecasefor95%ofallsamples)andthose
whosesamplemeansfalloutsidethe2standarddeviationrangearoundthepopulationmean(whichshouldbethe
casefor5%ofallsamples).
First,let'slookatasamplewhosemeanfallsoutsidethe2standarddeviationrangeshownaroundthepopulation
mean.
Sincethissamplemeanisoutsidetherange,itmustbemorethan2standarddeviationsawayfromthepopulation
mean.Sincethepopulationmeanismorethan2standarddeviationsawayfromthissamplemean,anintervalof
width2standarddeviationsaroundthissamplemeancouldnotcontainthetruepopulationmean.
Weknowthat5%ofallsamplesshouldhavesamplemeansoutsidethe2standarddeviationrangearoundthe
populationmean.Therefore5%ofallsamplesweobtainwillhaveintervalsthatdonotcontainthepopulationmean.
Nowlet'sthinkabouttheremaining95%ofsampleswhosemeansdofallwithinthe2standarddeviationrange
aroundthepopulationmean.
Ifwedrawanintervalofwidth2standarddeviationsaroundanyoneofthesesamplemeans,theintervalwould
containthetruepopulationmean.Thus,95%ofallsamplesweobtainwillhaveintervalsthatcontainthepopulation
mean.
We'vejustshownhowtogofromanysamplemeanapointestimatetoarangearoundthesamplemeana
95%confidenceinterval.We'vealsoarguedthat95%ofconfidenceintervalsobtainedinthiswayshouldcontainthe
truepopulationmean.
It'simportanttoemphasize:Wearenotsayingthat95%ofthetimeoursamplemeanisthepopulationmean,but
wearesayingthat95%ofthetimearangethatistwostandarddeviationswidecenteredaroundthesamplemean
containsthepopulationmean.
Tovisualizethegeneralconceptofaconfidenceinterval,imaginetaking20differentsamplesfromapopulationand
drawingaconfidenceintervalaroundeach.Asthediagramshows,onaverage95%oftheseintervalsor19outof
20wouldactuallycontainthepopulationmean.
Whatdoesthisinsightmeanforusasmanagers?Whenwesetaconfidencelevelof95%,weareagreeingtoan
approachthat1outof20timeswillgiveusanintervalthatdoesnotcontainthetruepopulationmean.Ifwearen't
comfortablewiththoseodds,weshouldraisetheconfidencelevel.
Ifweincreasetheconfidencelevelto98%,wehaveonlya1outof50chanceofobtaininganintervalthatdoesnot
containthetruepopulationmean.However,thishigherconfidencecomesatacost.Ifwekeepthesamesamplesize,
thentheconfidenceintervalwillwiden,therebydecreasingtheaccuracyofourestimate.Alternatively,tokeepthe
sameintervalwidth,wecanincreaseoursamplesize.
Howdoweknowifanintervalistoowide?Typically,ifwewouldmakeadifferentdecisionfordifferentvalues
withinaninterval,thatintervalistoowide.Let'slookatanexample.
Toestimatethepercentofpeopleinourindustrywhowillattendtheannualconference,wemightconstructa
confidenceintervalthatrangesfrom7%to13%.Ifwewouldselectadifferentconferencevenueifthetrue
percentagesis7%thanifitis13%,weneedtotightenourrange.
Now,beforewearereadytoactuallycreateourownconfidenceintervals,thereisatechnicalpointweneedtobe
acquaintedwith.WeneedtoknowthatthestandarddeviationoftheDistributionofSampleMeansis,the
standarddeviationoftheunderlyingpopulation,dividedbythesquarerootofn,thesamplesize.
Wewon'tprovethisfacthere,butsimplynotethatitistrue,andthatitshouldconfirmourgeneralintuitionabout
theDistributionofSampleMeans.Forexample,ifwehavehugesamples,we'dexpectthemeansofthoselarge
samplestobetightlyclusteredaroundthetruepopulationmean,andtherebyformanarrowdistribution.
35/135

5/13/2016

QuantitativeMethodsOnlineCourse

Summary
Aconfidenceintervalisanestimateforthemeanofapopulation.Itspecifiesarangethatislikelytocontainthe
populationmean.Aconfidenceintervaliscenteredatthemeanofasamplerandomlydrawnfromthepopulation
understudy.Whenwehaveaconfidencelevelof95%weexpectequallywideconfidenceintervalscenteredat95
outof100suchsamplemeanstocontainthepopulationmean.

FindingaConfidenceInterval
Youunderstandthetheorybehindaconfidenceinterval.Buthowdoyouactuallyconstructone?
Wecannowtranslatethepreviousdiscussionintoasimplemethodforfindingaconfidenceintervalforthemeanof
anypopulation.First,werandomlyselectasampleofsize30fromthepopulation.
Wethencomputethemeanandstandarddeviationofthesample.
Next,weassignthesamplemeanasthecenteroftheconfidenceinterval.
Tofindthewidthoftheinterval,wemustknowthelevelofconfidencewewanttoassigntotheinterval.Ifwewanta
95%confidenceinterval,theintervalshouldbe2timesthestandarddeviationofthepopulationdividedbythe
squarerootofn,thesamplesize.
Sincewetypicallydon'tknowthestandarddeviationofthepopulation,wesubstitutethebestestimatethatwedo
havethestandarddeviationofthesample.
Here'swhattheequationlookslikeforourexample.
Ifwewantalevelofconfidenceotherthan95%,insteadofmultiplyings/sqrt(n)by2,wemultiplybythezvalue
correspondingtothedesiredlevelofconfidence.
Wecanusethisformulatocomputeanyconfidenceinterval.Thereisonerestriction:inorderforittowork,the
samplesizehastobeatleast30.

WineLover'sMagazing
Let'swalkthroughanexample.WineLover'sMagazine'smanagershaveaskedustohelpthemestimatethe
averageageoftheirsubscriberssotheycanbettertargetpotentialadvertisers.
Wetellthemweplantosurveyasampleoftheirsubscribers.Theysaythey'recomfortablewithourworkingwith
asample,butemphasizethattheywanttobe95%confidentthattherangewegivethemcontainsthetrueaverage
ageofitsfullsetofsubscribers.
Weobtainsurveyresultsfrom60randomlychosensubscribersanddeterminethatthesamplehasameanof52
andastandarddeviationof40.
Tofindanappropriateconfidenceinterval,weincorporateinformationaboutthesampleintotheformula:
Thezvaluefora95%confidenceintervalisabout2,ormoreaccurately,about1.96.Thistellsusthata95%
confidenceintervalwouldbeginat52minus10.12,or41.88,andendatthemeanplus10.12,or62.12.
Wegivemanagementtherangefrom41.88to62.12asanestimateoftheaverageageofitssubscribers,telling
themtheycanbe95%confidentthatthetruepopulationmeanfallsbetweenthesevalues.
Whatifwewantaconfidencelevelotherthan95%?Wecanusethesamplemean,standarddeviation,andsize
fromthesampledata,buthowdoweobtaintherightzvalue?

Obtainingthezvalue
Thezvaluefor95%confidenceiswellknowntobeabout2,buthowdowefindazvalueforalesscommon
confidenceinterval?Tobe98%confidentthatourintervalcontainsthepopulationmean,howdoweobtainthe
appropriatezvalue?
Tofindthezvaluefor98%confidencelevel,weareessentiallyasking:Howfartotheleftandrightofthestandard
normalcurve'smeandowehavetogotocapture98%ofthearea?
36/135

5/13/2016

QuantitativeMethodsOnlineCourse

Capturing98%oftheareacenteredatthemeanofthenormalcurveleavestwoareasatthetails,eachcovering1%
oftheareaunderthecurve.Thezvalueoftherightboundaryisthezvalueassociatedwithacumulative
probabilityof99%thesumofthecentral98%andthe1%inthelefttail.
Convertingthedesiredconfidencelevelintothecorrespondingcumulativeprobabilityonthestandardnormal
curveisessentialbecauseExcel'sNORMSINVfunctionandtheztableworkwithcumulativeprobabilities.
Tofindthezvalueassociatedwithacumulativeprobabilityof99%,enterintoExcel=NORMSINV(0.99),which
returnsthezvalue2.33.Or,lookintheztableandfindthecellthatcontainsacumulativeprobabilityclosestto
0.9900.Thezvalueis2.33,thesumoftherowvalue2.3andthecolumnvalue0.03.
Tryfindingazvalueyourself.Findthezvalueassociatedwitha99.5%confidencelevelusingtheappropriate
normaldistributionfunctioninExcelorusingtheStandardNormalTable(ztable)inyourbriefcase.
Thecorrectzvalueforaconfidencelevelof99.5%is:
ztable
Excel
Ourfirststepistoconverttheconfidencelevelof99.5%intothecorrespondingcumulativeprobabilityonthe
standardnormalcurve.Todothis,notethattohave99.5%probabilityinthemiddleofthestandardnormalcurve,
wemustexcludeatotalareaof199.5%=0.5%fromthecurve.Thatareaisdividedintotwoequalpartsinthe
distribution'stails:0.25%ineachtail.
Wecannowseethatthecumulativeprobabilityassociatedwithconfidencelevelof99.5%is10.25%=99.75%.
Thus,thezvalueforaconfidencelevelof99.5%isthesameasthezvalueofacumulativeprobabilityof99.75%.
WefindthezvalueinExcelbyentering=NORMSINV(0.9975),whichreturnsthevalue2.81.Alternatively,we
couldfindthezvalueintheztablebylookinguptheprobability0.9975.

Summary
Tocalculateaconfidenceinterval,wetakeasample,computeitsmeanandstandarddeviation,andthenbuilda
rangearoundthesamplemeanwithaspecifiedlevelofconfidence.Theconfidencelevelindicateshowconfident
wearethatthesamplemeanwecollectedcontainsthepopulationmean.

UsingSmallSamples
Weassumedinourconfidencelimitcalculationsthatthesamplesizewasatleast30.Whatifitisn't?Whatifwe
haveonlyasmallsample?Let'sconsideradifferentsurvey,onethatconcernsadelicatematter.
Thebusinessmanagerofalargeoceanliner,theDemiurgosasksforourhelp.Shewantsustofindoutthevalueof
herguests'belongings.Sheneedsthisvaluetodeterminethecorrectinsuranceprotectionincaseguestbelongings
disappearfromtheircabins,aredestroyedinafire,orsinkwiththeship.
Shehasnoideahowvaluableherguests'belongingsare,butshefeelsuneasyaskingthemforthisinformation.
Sheiswillingtoaskonly16gueststoestimatethetotalvalueofthebelongingsintheircabins.Fromthissample,
weneedtoprepareanestimate.
Withasamplesizelessthan30,wecannotcalculateconfidenceintervalsinthesamewayaswithalargesample
size.Asmallsampleincreasesouruncertaintyabouttwoimportantaspectsofourestimateofthepopulation
mean.
First,withasmallsample,theconsequencesoftheCentralLimitTheoremarenotassured,sowecannotbesure
thatthesamplemeansfollowanormaldistribution.
Second,withasmallsample,wecan'tbesurethatthesamplestandarddeviationisagoodestimateofthe
populationstandarddeviation.
Duetotheseadditionaluncertainties,wecannotusezvaluestoconstructconfidenceintervals.Usingazvalue
wouldoverstateourconfidenceinourestimate.
Canwestillcreateaconfidenceinterval?Isthereawaytoestimatethepopulationmeanevenifwehaveonlya
handfulofdatapoints?
Itdepends:ifwedon'tknowanythingabouttheunderlyingpopulation,wecannotcreateaconfidenceinterval
37/135

5/13/2016

QuantitativeMethodsOnlineCourse

withfewerthan30datapoints.However,iftheunderlyingpopulationisnormallydistributedorevenroughly
normallydistributedwecanuseaconfidenceintervaltoestimatethepopulationmean.
Inpractice,aslongaswearesuretheunderlyingpopulationisnothighlyskewedorextremelybimodal,wecan
constructaconfidenceinterval,evenwhenwehaveasmallsample.However,wedoneedtomodifyourapproach
slightly.
Toestimatethepopulationmeanwithasmallsample,weuseatdistribution,whichwasdiscoveredintheearly
20thcenturyattheGuinnessBrewingCompanyinIreland.
Atdistributiongivesustvaluesinmuchthesamewayasanormaldistributiongivesuszvalues.Whatisthe
differencebetweenthenormaldistributionandthetdistribution?
Atdistributionlookssimilartoanormaldistribution,butisnotastallinthecenterandhasthickertails,because
itismorelikelythanthenormaldistributiontohavevaluesfallfartherawayfromthemean.
Therefore,thenormaldistribution's"rulesofthumb"for68%and95%probabilitiesnolongerhold.Forexample,
wemustgomorethan2standarddeviationsoneithersideofthemeantocapture95%oftheprobabilityforat
distribution.
Thus,toachievethesamelevelofconfidence,aconfidenceintervalbasedonatdistributionwillbewiderthan
onebasedonanormaldistribution.Thisreinforcesourintuition:wehavelesscertaintyaboutourestimatewitha
smallersample,soweneedawiderintervaltoachieveagivenlevelofconfidence.
Thetdistributionisalsodifferentbecauseitvarieswiththesamplesize:Foreachsamplesize,thereisadifferent
tvalueassociatedwithagivenlevelofconfidence.Thesmallerthesamplesizen,theshortertheheightandthe
thickerthetailsofthetdistributioncurve,andthefartherwehavetogofromthemeantoreachagivenlevelof
confidence.
Ontheotherhand,asthesamplesizeincreases,theshapeofthetdistributionbecomesmoreandmorelikethe
shapeofanormaldistribution.Oncewereachasamplesizeof30,thetdistributionbecomesvirtuallyidenticalto
thezdistribution,sotvaluesandzvaluescanbeusedinterchangeably.
Incidentally,wecanusethetdistributionevenforsamplesizeslargerthan30.However,mostpeopleusethez
distributionforlargersamples,partiallyoutofhabitandpartiallybecauseit'seasier,sincethezvaluedoesn'tvary
basedonthesamplesize.

Findingthetvalue
Tofindtherighttvalue,wefirsthavetoidentifythetdistributionthatcorrespondstooursamplesize.Wedo
thisbyfindingthenumberof"degreesoffreedom"ofthesample,whichforourpurposesissimplythesample
sizeminusone.Ifoursamplesizeis16,wehave15degreesoffreedom,andsoon.
Excelprovidesasimplefunctionforfindingtheappropriatetvalueforaconfidenceinterval.Ifweenter1
minusthelevelofconfidencewewantandthedegreesoffreedomintotheExcelfunctionTINV,Excelgivesus
theappropriatetvalue.
Forexample,fora95%confidenceintervalandasamplesizeofn=16,theExcelfunctionTINV(0.05,15)would
returnthevalue2.131.
Excel
Oncewefindthetvalue,weuseitjustlikeweusedthezvaluetofindaconfidenceinterval.Forexample,fort
=2.131,theappropriateconfidenceintervalis:
Excel
Ifwedon'thaveExcelhandy,wecanuseatdistributiontabletofindthetvalueassociatedwiththedegreesof
freedomandtheconfidencelevelwespecify.Whenusingdifferenttvaluetables,weneedtobecarefultonote
whichprobabilitythetabledepicts.
Excel
ttable
Sometablesreportvaluesassociatedwiththeconfidencelevel,like0.95.Othersreportvaluesbasedonthearea
inthetails,whichwouldbe0.05fora95%confidenceinterval.Ourttable,likemanyothers,reportsvalues
associatedwithacumulativeprobability,sofora95%levelofconfidence,wewouldhavetolookatacumulative
38/135

5/13/2016

QuantitativeMethodsOnlineCourse

probabilityof97.5%.
Excel
ttable

TheGoodShipDemiurgos
ReturningtothegoodshipDemiurgos,let'sdetermineanestimateoftheaveragevalueofpassengers'
belongings.Themanagersamples16guests,andreportsthattheyhaveanaverageof$10,200worthofclothing,
jewelry,andpersonaleffectsintheircabins.Fromhersurveynumbers,wecalculateastandarddeviationof
$4,800.
Weneedtodoublecheckthatthedistributionisn'ttooskewed,whichwemightexpect,sincesomeofthe
passengersarequitewealthy.Themanagerexplainsthattheinsurancepolicyhasalimitedliabilityclausethat
limitsapassenger'smaximumclaimto$20,000.
Above$20,000,passengers'ownhomeowners'policiesmustcoveranylosses.Thus,inthesurvey,ifaguest
reportedvaluesabove$20,000,themanagersimplyreported$20,000asthevaluetobecoveredforourdata
set.
Wesketchagraphofthe16valuesthatconfirmsthatthedistributionisnottooasymmetric,sowefeel
comfortableusingthetdistribution.
Sincewehaveasampleof16passengers,thereare15degreesoffreedom.TheExcelfunction=TINV(0.05,15)
tellsusthattheappropriatetvalueis2.131.
Excel
Usingtheconfidenceintervalformula,theguests'valuablesareworth$10,200plusorminus2.131times
$4,800overthesquarerootof16.Thus,thewidthoftheconfidenceintervalis2.131*4,800/4=$2,557,andwe
canreportthatweare95%confidentthattheaveragevalueofpassengers'belongingsisbetween$7,643and
$12,757.
Excel
ttable
WhatiftheDemiurgos'managerthinksthisintervalistoolarge?
Excel
ttable
Shewillhavetosurveymoreguests.Increasingthesamplesizecausesthetvaluetodecrease,andalsoincreases
thesizeofthedenominator(thesquarerootofn).Bothfactorsnarrowtheconfidenceinterval.
Excel
ttable
Forexample,ifsheasks10moreguests,andthestandarddeviationofthesampledoesnotchange,thetvalue
woulddropto2.06andthesquarerootofninthedenominatorwouldincrease.Thewidthoftheintervalwould
decreasesignificantly,from$2,557to$1,939.
Excel
ttable

Summary
Confidenceintervalscanbeconstructedevenwithasamplesizeoflessthan30,aslongasthepopulationis
roughlynormallydistributed(or,atleastnottooskewedorbimodal).Tofindaconfidenceintervalwithasmall
sample,useatdistribution.Tdistributionsareasetofdistributionsthatresemblethenormaldistribution,but
withshorterheightsnearthemeanandthickertails.Tofindaconfidenceintervalforasmallsamplesize,place
theappropriatetvalueintotheconfidenceintervalformula.

ChoosingaSampleSize
Whenwetakeasurvey,weoftenwantaspecificlevelofaccuracyinourestimateofthepopulationmean.For
39/135

5/13/2016

QuantitativeMethodsOnlineCourse

example,whenestimatingcarowners'averagespendingoncarrepairseachyear,wemightwanttobe95%
confidentthatourestimateiswithin$50ofthetruemean.
Weknowthatthesamplesizeofoursurveydirectlyaffectstheaccuracyofourestimate.Thelargerthesample
size,thetightertheconfidenceintervalandthemoreaccurateourestimate.Asampleofsizengivesusa
confidenceintervalthatextendsadistanceofdoneithersideofthemean:
Tofindthesamplesizenecessarytogiveusaspecifieddistancedfromthemean,wemusthaveanestimateof
sigma,thestandarddeviationofspending.Ifwedonothaveanestimatebasedonpastdataorsomeothersource,
wemighttakeapreliminarysurveytoobtainaroughestimateofsigma.
Inthisexample,weestimatesigmatobe$300basedonpastexperience.Sincewewanta95%levelofconfidence,
wesetz=1.96.Toensureourdesiredaccuracythatdisnomorethan$50wemustrandomlysampleatleast
139people.
Ingeneral,toensureaconfidenceintervalextendsadistanceofatmostdoneithersideofthemean,wechoosea
samplesizenthatsatisfiestheexpressionbelow.Wecandothiswithsimplealgebra,orbyusingtheattached
Excelutility.
ConfidenceIntervalUtility

Summary
Whenestimatingapopulationmean,wecanensurethatourconfidenceintervalextendsadistanceofatmostd
oneithersideofthemeanbychoosinganappropriatesamplesize.

StepbyStepGuide
Hereisastepbystepprocessforcreatingaconfidenceinterval:
First,wechoosealevelofconfidenceandasamplesizenappropriatetothedecisioncontext.
Second,wetakearandomsampleandfindthesamplemean.Thisisourbestestimateforthepopulationmean.
Third,wefindthesample'sstandarddeviation.
Fourth,findthezvalueortvalueassociatedwiththeproperconfidencelevel.Ifoursamplesizeisover30,we
findthezvalueforourconfidencelevel.Ifnot,wefindthetvalueforourconfidencelevelandwithdegreesof
freedom=samplesize1.
Fifth,wecalculatetheendpointsoftheconfidenceintervalusingtheformulaebelow.

Summary
Constructconfidenceintervalsusingthestepsoutlinedbelow.Withaconfidenceintervalderivedfroman
unbiasedrandomsample,wecansaythatthetruepopulationmeanfallswithintheintervalwiththe
correspondinglevelofconfidence.

ExcelUtility
ClickheretoopenanExcelutilitythatallowsyoutocreateconfidenceintervalsbyprovidingthesamplemean,
standarddeviation,size,anddesiredlevelofconfidence.Youshouldenterdataonlyintheyellowinputareasof
theutility.Toensureyouareusingtheutilitycorrectly,trytoreproducetheresultsfortheWineLover'sMagazine
andtheDemiurgosexamples.

SolvingtheScubaProblemII
ThesampleyoucollectedearlierhasallthedatayouneedtocreateaconfidenceintervalforLeo'sproblem.
YoutakeanotherlookatthesurveyyoucreatedearlierforLeo:yousampled45guests,andcalculatedthatthe
averagesatisfactionrateofthesamplewas4.4,withastandarddeviationof1.54.Usingthisinformation,youdecide
tocreatea95%confidenceintervalforLeo.
Yourcalculationsshowthefollowing:
40/135

5/13/2016

QuantitativeMethodsOnlineCourse

ConfidenceIntervalUtility
ztable
ttable
Tocreatea95%confidenceinterval,youtakethemeanofthesampleandadd/subtractthezvaluemultipliedbythe
samplestandarddeviationdividedbythesquarerootofthesamplesize.Usingthenumbersgiven,youobtaina
95%confidenceintervalbygoing0.45pointsaboveandbelowthesamplemeanof4.4,whichtranslatesintoa
confidenceintervalfrom3.95to4.85.
YoumeetwithLeoandtellhimthatyoucanbe95%certainthatthepopulationmeanfallsbetween3.95and4.85.
Leolooksatyournumbersandshakeshishead.
That'sjustnotaccurateenoughformetomakeadecision.Ifthemeaniscloseto4.85,I'dbehappy,butifit'scloser
to4,I'mconcerned.Canwenarrowtherangeatall?
Lookingoveryournotes,youthinkyoucangiveLeosomeoptions.
Whydon'tyoucreatealargersampleandreporttheresultsbacktome?
Youselectanother40guestsatrandomandaskthehoteloperatortoconductthesurveyforyouagain.Heisableto
reach25guests.Youcombinethetwosamples,whichgivesanewsamplesizeof70.
Forthecombinedsample,youfindthatthenewsamplemeanis4.5andthenewsamplestandarddeviationis1.2.
Armedwithmoredata,youcreateanotherconfidenceinterval.
Wecanbe95%certainthattheaveragesatisfactionofallhotelguestswiththescubaschoolisbetween:
ConfidenceIntervalUtility
ztable
ttable
Tocreatethis95%confidenceinterval,youtakethemeanofthesampleandadd/subtractthezvaluemultipliedby
thesamplestandarddeviationdividedbythesquarerootofthesamplesize.Usingthenumbersgiven,youobtaina
95%confidenceintervalbygoing0.28pointsaboveandbelowthesamplemeanof4.5,whichtranslatesintoa
confidenceintervalfrom4.22to4.78.
Thankyou.Iammuchhappierwiththisresult.Ihaveenoughinformationnowtodecidewhethertokeepthe
currentscubadivingschool.

Exercise1:TheVeetekVCRGambit
ToshiMatsumotoistheChiefOperatingOfficerofaconsumerelectronicsretailerwithover150storesspread
throughoutJapan.Foroverayear,thesalesofhighendVCRshavelagged,duetoashifttowardsDVDplayers.
Justtoday,ToshiheardthatVeetek,alargeSouthAsianelectronicsretailer,islookingtopurchaseabulkshipment
ofhighendVCRs.
ThiswouldbeaperfectopportunityforToshitoliquidateslowmovinginventorycurrentlylanguishingonthe
shelvesofhisstores.BeforehecallsVeetek,hewantstoknowhowmanyhighendVCRshecanpromise.Aftertwo
daysoffuriousphonecalls,hisdeputyhasgathereddatafrom36representativeoutletsinhisretailchain.
ThemeanhighendVCRinventoryineachstorepolledwas500units.Thestandarddeviationwas180.Toshi
needsyoutofinda95%confidenceintervalfortheaverageVCRinventoryperstore.Theintervalis:
ConfidenceIntervalUtility
ztable
ttable

Exercise2:PulluscularPigDisorder
PaulSegalmanagesthepigfarmingdivisionoftheagriculturalcompanyBowmanLyonsCenterville.Arumored
outbreakofPulluscularPigDisorder(PPD)inoneofPaul'sherdsisonthevergeofcausingapublicrelations
disaster.
ThemainsymptomofPPDisashrinkingbrain,andtheonlycertainwaytodiagnosePPDisbymeasuringbrain
sizepostmortem.
41/135

5/13/2016

QuantitativeMethodsOnlineCourse

PaulneedstoknowifhisherdisaffectedbyPPD,buthedoesnotwanttohavetoslaughterhundredsofswineto
findout.Atthepreliminarystage,hecanoffernomorethan5primeporkerstobeslaughteredanddiagnosed.
Forthepigsslaughtered,themeanbrainweightwas0.18lbs,withastandarddeviationof0.06lbs.With95%
confidence,inwhatrangedoestheherd'saveragebrainweightlie?
ConfidenceIntervalUtility
ztable
ttable

Proportions
Thenextmorning,youandAliceareabouttoheadofftothehotelpoolwhenLeocallsyou.

TheCustomerResponseProblem
I'msorrytodisturbyou,butIhaveanotherproblem,andIthinkyoumightbeabletohelp.
TheKahanaisaverypopularresortduringthesummertouristseason.Butthenumberofleisurevisitorsdrops
significantlyduringtheoffseason,fromSeptemberthroughFebruaryandthenAprilthroughMay.
Weusuallyhavequiteafewroomvacanciesduringthatperiodoftime.Weexpecttohaveabout200roomsvacant
forweeklongperiodsduringtheslowseasonthisyear.
I'vedevelopedanewprogramthatrewardsourbestguestswithaspecialdiscountiftheybookaweeklongstay
duringourslowperiod.Theywon'thavecompletedateflexibilityofcourse,butthesteepdiscountshouldmakethe
offerattractiveforthem.
Toseehowmanyofourpastguestswouldacceptsuchanoffer,Isentpromotionalbrochuresto100ofthem.The
deadlinebywhichtheyhadtorespondtotheofferhaspassed.Tenguestsrespondedwiththerequiredroomdeposit
priortothedeadlinethat'sasolid10percent.
Ifigureifwesendout2,000promotions,we'llgetabout200responses.
ThisisaniceideaLeo,butI'mconcerneditcouldbackfire.Ifmorethan10%respondtothisoffer,youmightendup
disappointingsomeoftheveryguestsyou'retryingtoreward.Or,iftoomanyrespondandyougivethemallthe
discount,you'llhavetoturnawaycustomerswillingtopayfullprice.
Thatisexactlymyconcern.Iwonderhowaccuratethe10%responserateis.Justbecauseitheldfor100guests,will
itholdfor2,000?Whatif11%actuallyrespondtothepromotions?
Imaginewhatwouldhappenif220guestsresponded.Idon'twanttoanger20loyalcustomersbytellingthemthe
offerisnotvalid,butIalsodon'twanttoturnawayfullpayinggueststoaccommodatetheextra20guestsata
discount.
I'mwillingtoreserve200roomsforthesediscountweeklongstaysduringtheslowseason.Howmanyreturnguests
canIsafelysendthediscountofferandbeconfidentthatnomorethan200willrespond?
YoucantellthatLeoisgrowingquitecomfortablewithrelyingonyourstatisticalmethods.Heseemsalmostas
interestedinthemasheisinyourresults.

ConfidenceIntervalsAndProportions
Sometimes,thequestionweposetomembersofasamplecallsforayesornoanswer.
Wemightsurveypeopleinatargetmarketandaskiftheyplantobuyanewcarthisyear.Orsurveyvotersandask
iftheyplantovotefortheincumbentcandidateforoffice.Orwemighttakeasampleoftheproductsourplant
producedyesterdayandcounthowmanyaredefective.
Eventhoughourquestionhasonlytwoanswers,westillhavetoaddressaninherentuncertainty:Weknowwhat
valuesourdatacantakeyesornobutwedon'tknowhowofteneachresponsewillbegiven.
Inthesecases,weusuallyconveythesurveyresultsbyreportingthepercentageofyesresponsesasaproportion,p
bar.Thisisourbestestimateofp,thetruepercentageof"yes"responsesintheunderlyingpopulation.
42/135

5/13/2016

QuantitativeMethodsOnlineCourse

Suppose,forexample,thatwehavepostedadvertisementsinthesubwaycarsonBoston's"RedLine,"andwantto
knowwhatpercentageofallpassengersremembersseeingourad.
Wecreateapropersurvey,andaskrandomlyselectedRedLinepassengersiftheyrememberseeingourad.300
passengersrespondtooursurvey,ofwhich100passengersreportrememberingthead.
Thenpbarissimply33%,whichisthenumberofpeoplethatrememberthead,100,dividedbythenumberof
respondents,300.
Theremaining200passengers,or67%ofthesample,reportnotrememberingthead.Thetwoproportionsalways
addupto1becausesurveyrespondentsreporteitherrememberingtheadornot.
Onceweknowtheproportionofthesample,wecandrawconclusionsaboutallRedLinepassengers.Ourbest
estimate,orpointestimate,forp,thepercentageofallpassengerswhorememberseeingourad,is33%.
Asmanagers,wetypicallywantmorethanthissimplepointestimatewewanttoknowhowaccuratetheestimate
is.Howfarfrom33%mightthetruepercentagebe?Canwesayconfidentlythatitisbetween30%and36%,for
example?
Whenweworkwithproportions,howdowefindaconfidenceintervalaroundourpointestimate?
Theprocessforcreatingaconfidenceintervalaroundaproportionisnearlyidenticaltotheprocesswe'veused
before.Theonlydifferenceisthatwecanapproximatethestandarddeviationofthepopulationwithasimple
formularatherthancalculatingitdirectlyfromtherawdata.
Basedonoursample,ourbestestimateofthetruepopulationproportionispbar,thepercentageof"yes"responses
inoursurvey.Statisticaltheorytellsusthatourbestestimateofthestandarddeviationofthetruepopulation
proportionisthesquarerootof[(pbar)*(1(pbar)].Wecanusethisapproximatestandarddeviationtodetermine
aconfidenceintervalfortheproportion.
ForourRedLinead,weapproximatethestandarddeviationwiththesquarerootof0.33times0.67,or0.47.A95%
confidenceintervalis0.33plusorminus1.96times0.47dividedbythesquarerootof300.Thisisequalto0.33
plusorminus0.053,or27.7%to38.3%.
Unfortunately,thereisonecatchwhenwecalculateconfidenceintervalsaroundproportions...

SampleSize
Samplesizematters,particularlywhendealingwithverysmallorverylargeproportions.Supposeweare
samplingNewYorkersforAmyotrophicLateralSclerosis,commonlyknownasLouGehrig'sDisease.IntheU.S.,
theoddsofhavingthediseasearelessthan1in10,000.Wouldoursamplebeusefulifwesurveyed100people?
No.Weprobablywouldn'tfindasinglepersonwiththediseaseinoursample.Sincethetrueproportionisvery
small,weneedtohavealargeenoughsampletomakesurewefindatleastafewpeoplewiththedisease.
Otherwise,wewillnothaveenoughdatatogetagoodestimateofthetrueproportion.
Thereisaguidelinewemustmeettomakesurethatoursampleislargeenoughwhenestimatingproportions.
Twoconditionsmustbemet:First,theproductofthesamplesizeandtheproportionmustbeatleast5.Second,
theproductofthesamplesizeand1minustheproportionmustalsobeatleast5.
Ifboththeserequirementsaremet,wecanusethesample.Essentially,thisguidelineguaranteesthatoursample
containsareasonablenumberof"yes"andareasonablenumberof"no"answers.Oursamplewillnotbeuseful
otherwise.
Toavoidaninvalidsample,weneedtocreatealargeenoughsamplesizetosatisfytherequirements.However,
sincewedon'tknowtheproportionpbarbeforesampling,wedon'tknowifthetwoconditionsaremetbefore
settingthesamplesize.Howcanwegetaroundthisproblem?

FindingaPreliminaryEstimateofpbar
Wecanobtainapreliminaryestimateofpbarusingeitheroftwomethods:first,wecanusepastexperience.For
example,toestimatetherateofLouGehrig'sdisease,wecanresearchtherateofoccurrenceinthegeneral
population.Thisisareasonablefirstestimateforpbar.
Inmanycases,however,wearesamplingforthefirsttime.Withoutpastexperience,wedon'tknowwhatpbar
mightbe.Inthiscase,itmaywellbeworthourtimetotakeasmalltestsampletoestimatetheproportion,pbar.
43/135

5/13/2016

QuantitativeMethodsOnlineCourse

Forexample,iftheproportionofyesanswersinoursmalltestsampleis3%,thenwecanuse3%asour
preliminaryestimateofpbar.
Substituting3%forpbarinourtworequirements,n(pbar)5andn(1(pbar))5,tellsusthatnmustsatisfy
n*0.035andn*0.975.Thusthesamplesizeweneedforourrealsamplemustbeatleast167.
Wewouldthenusearealsamplewithatleast167respondentstofindanactualsamplevalueofpbarto
createaconfidenceintervalforthepopulationproportion.

Summary
Proportionsareoftenusedtoindicatethefrequencyofsomecharacteristicinapopulation.Thesampleproportion
pbaristhenumberofoccurrencesofthecharacteristicinthesampledividedbythenumberofrespondents,the
samplesize.Itisourbestestimateofthetrueproportioninthepopulation.Wecanconstructaconfidenceinterval
forthepopulationproportion.Twoguidelinesforthesamplesizemustbemetforavalidconfidenceinterval:n(p
bar)andn(1(pbar))musteachbeatleastfive.

SolvingtheCustomerResponseProblem
Creatingconfidenceintervalsaroundproportionsisnotmuchdifferentfromcreatingthemaroundmeans.Finding
therightnumberofLeo'spromotionalbrochurestomailshouldbeeasy.
Leoneedstoknowhowaccuratethe10percentresponserateofhis100customersampleis.Willthisresponserate
holdfor2,000guests?Tohowmanyguestscanhesendthediscountofferforhis200rooms?
First,youcalculatea95%confidenceintervalfortheresponserate.
Enterthelowerboundasadecimalnumberwithtwodigitstotherightofthedecimal,(e.g.,enter"5"as"5.00").
Roundifnecessary.
ztable
ConfidenceIntervalUtility
The95%confidenceintervalfortheproportionestimateis0.0412to0.1588,or4.12%and15.88%.Youobtainthat
answerbyusingthesampledataandapplyingthefamiliarformula:
ThenaftergivingLeo'squestionssomethought,yourecommendtohimthathesendthemailingtoaspecific
numberofguests.
Enterthenumberofguestsasaninteger,(e.g.,"5").Roundifnecessary.
ztable
Basedontheconfidenceintervalfortheproportion,themaximumpercentageofpeoplewhoarelikelytorespondto
thediscountoffer(atthe95%confidencelevel)is15.88%.So,if15.88%ofpeopleweretorespondfor200rooms,
howmanypeopleshouldLeosendoutthesurveyto?Simplydivide200by0.1588togettotheanswer:Leoneedsto
sendoutthesurveytoatmost1,259pastcustomers.
Leoispleasedwithyourwork.Hetellsyoutorelaxandenjoytheresort.

Exercise1:GMWAutomotive
GMWisaGermanautomanufacturerthathasregionalsalessubsidiariesthroughouttheworld.ArturoLopez
headstheMexicansalesdivisionofthecompany'sLatinAmericansubsidiary.
GMWearnsadditionalprofitwhencustomerschoosetofinancetheircarpurchasewithaGMWfinancing
package.ArturohasbeenaskedtosubmitareporttotheGMWCEOinGermanyaboutthepercentageofGMW
customerswhooptforfinancing.
Arturohasaskedyou,anewmemberofthedivisionsalesteam,todeviseawaytoestimatethispercentage.You
takearandomsampleof64carssoldintheMexicansalesdivision,andfindthat13ofthem,orabout20.3%,
optedforGMWfinancing.
Ifyouwanttobe95%confidentinyourreporttoMr.Lopez,youshouldtellhimthatthepercentageofallMexican
customersoptingforGMWfinancingfallsintherange:
44/135

5/13/2016

QuantitativeMethodsOnlineCourse

ztable
ConfidenceIntervalUtility

Exercise2:CrownToothpaste
KayleighMarlonistheChiefBuyeratTarMart,acompanythatoperatesachainofsuperstoressellingdiscount
merchandise.TarMarthasahugenationalpresence,andmanufacturerscompetefiercelytogettheirproducts
ontoTarMart'sshelves.
CrownToothpaste,anewentrantinthetoothpastemarket,isoneofthem.KayleighagreedtostockCrownfor4
weeksanddisplayitprominently.Afterthatperiod,shewillstopstockingCrownunless5%ofTarMart's
customersboughtCrownorwereconsideringbuyingCrownwithinthenextmonth.
Thetrialperiodisnowover.KayleighhasaskedyoutotakeasampleofcustomerstoseeifTarMartshould
continuestockingCrown.Shewouldlikeyoutobeatleast95%confidentinyouranswer.
Thefirststepistodecidehowlargeasamplesizetochoose.Kayleightellsyouthat,inthepast,whenTarMart
introducedanewproduct,thepercentageofpeoplewhoexpressedinterestrangedbetween2%and10%.What
samplesizeshouldyouuse?
Youchooseasamplesizeof250.Afterconductingthesurvey,youfindthat10outof250peoplesurveyedhad
boughtCrownorwereconsideringbuyingCrownwithinthenextmonth.Whatisthe95%confidenceintervalfor
thepopulationproportion?
ztable
ConfidenceIntervalUtility
First,youfindthesampleproportion:10outof250isaproportionof4%.Youverifythatn(pbar)=250*0.04=
105andn(1(pbar))=250*0.96=2405.Then,usingtheformula,youfindtheconfidenceintervalaround
thesampleproportion.Theendpointsofthatintervalare1.6%and6.4%.

Challenge:OOPS!PackageDeliveries
OOPSisasmallpackagedeliveryservicewithworldwideoperations.CelineBedex,VPMarketing,hasheard
increasingcomplaintsaboutlatedeliveries,andwantstoknowhowmanyoftheshipmentsarelatebyonedayor
more.
Celinewouldlikeanestimateofthepercentageoflatedeliveries.Inasampleof256shipments,2weredelivered
late,aproportionofabout0.008,or0.8%.IfCelinewantstobe99%confidentintheresultofaconfidence
intervalcalculation,theintervalis:
Celinecollectsanewsample,thistimeof729shipments.Ofthese,8werelate.Celinecanbe99%confidentthat
thepopulationproportionoflatepackagesisbetween:
ztable
ConfidenceIntervalUtility
First,calculatethesampleproportionforthenewsample:8/729=0.011.Then,verifythatthenewsamplesize
satisfiestherulesofthumb.Bothn(pbar)andn(1(pbar))aregreaterthan5.
Usingthenewsamplesizeandsampleproportion,calculatetheconfidenceinterval:[0.1%,2.1%].

HypothesisTesting
Introduction
Afterfinishingthesamplingassignments,youandAlicedecidetotakesometimeofftoenjoythebeach.Justasyou
aregatheringyourbeachgear,Leogivesyouanothercall.

ImprovingtheKahana
Hithere!Don'tletmekeepyoufromenjoyingthebeach.IjustwantedtoletyouknowwhatI'dlikeyoutohelpme
withnext.I'vebeenworkingonideastoincreasetheKahana'sprofits.
45/135

5/13/2016

QuantitativeMethodsOnlineCourse

Isitpossibletoincreaseprofitsbyraisingtheroomprices?Thatwouldbeaneasysolution.
Iwishitwerethateasy.Roompricesareextremelycompetitiveandareoftenthefirstthingpotentialgueststake
intoconsideration.Soifweincreaseroomprices,I'mafraidwe'llhavefewerguests.Thatmightputusbackwhere
westartedfromwithprofitsorevenworse.
Whatotherfactorsinfluenceyourprofits?
Thetwomajoronesareroomoccupancyratesanddiscretionaryspending."Discretionaryspending"isthemoney
guestsspendonnonroomamenities.Youknow,food,drinks,spaservices,sportsactivities,andsoon.
AsamanagerIcanaffectavarietyoffactorsthatinfluencediscretionaryspending:thequalityoftherestaurant,for
example,orthetypesofamenitiesoffered.
Andyou'dlikeustohelpyouunderstandyourguests'discretionaryspendingpatternsbetter.
Right.ThenIcanexplorenewwaystoincreaseprofitsonnonroomamenities.Icanalsoseeifsomeofmyrecent
effortstoincreaseguestspendinghavepaidoff.
I'mparticularlyinterestedinrestaurantoperations.I'vemadesomechangestotherestaurantsrecently.For
example,Ihiredanewexecutivecheflastyear.I'dliketoknowifrestaurantrevenuesperpersonhavechanged
sincethen.
I'dalsoliketofindoutiftherenovationofourpremiercocktailloungehasresultedinhigherspendingonbeverages.
Finally,I'vebeenwonderingifdiscretionaryspendingpatternsaredifferentforleisureandbusinessguests.Ifso,I
mightchangeourmarketingcampaignstobettersuiteachofthosemarketsegments.
Whatrecordsdoyouhaveforustoworkwith?
Wedon'thaveaconsolidatedreportforthisyearyet,sowe'llneedtoconductsomesurveysandanalyzetheresults.
You'rereallygettingintothesestatisticalmethods,aren'tyou,Leo?

Definition
Leomadesomeimportantchangestohisbusinessandhehassomeideasofwhattheimpactofthesechangeshas
been.Howdoyouputhisideastothetest?
Asmanagers,weoftenneedtoputourclaims,ideas,ortheoriestothetestbeforewemakeimportantdecisions.
Basedonwhetherornotourclaimisstatisticallysupported,wemaywishtotakemanagerialaction.
Hypothesistestingisastatisticalmethodfortestingsuchclaims.Ahypothesisissimplyaclaimthatwewantto
substantiate.Tobegin,wewilllearnhowtotesthypothesesaboutpopulationmeans.
Forinstance,supposeweknowthatthehistoricalaveragenumberofdefectsinaproductionprocessis3defectsper
1,000unitsproduced.Wehaveahunchthatacertainchangetotheprocessanewmachine,sayhaschanged
thisnumber.Thehypothesiswewishtosubstantiateisthattheaveragedefectratehaschangedthatitisno
longer3per1,000.
Howdoweconductahypothesistest?First,wecollectarandomsampleofunitsproducedbytheprocess.Then,we
seewhetherornotwhatwelearnaboutthesamplesupportsourhypothesisthatthedefectratehaschanged.
Supposeoursamplehasanaveragedefectrateof2.7defectsper1,000.Basedonthissample,canweconfidentlysay
thatthedefectratehaschanged?
Thatdepends.Tofindout,weconstructarangearoundthehistoricaldefectrateof3thepopulationmeanthathas
beencastindoubt.Weconstructtherangesothatifthemeandefectrateinthepopulationisstill3,itisvery
likelyforthemeanofasampletakenfromthepopulationtofallwithinthatrange.
Theoutcomeofourtestwilldependonwhether2.7,themeanofthesamplewehavetaken,fallswithintherangeor
not.
Ifthesamplemeanof2.7fallsoutsideoftherange,wefeelcomfortablerejectingthehypothesisthatthedefectrate
isstill3.
However,ifthesamplemeanfallswithintherange,wedon'thaveenoughevidencetosupporttheclaimthatthe
46/135

5/13/2016

QuantitativeMethodsOnlineCourse

defectratehaschanged.
Thisexamplecapturestheessenceofhypothesistesting,butweneedtoformalizeourintuitionabouttheexample
anddefineournewstatisticaltechniquemoreprecisely.
Toconductahypothesistest,weformulatetwohypotheses:thesocallednullhypothesisandthealternative
hypothesis.
Basedonexperienceorconventionalwisdom,wehaveaninitialvalueofthepopulationmeaninmind.Thenull
hypothesisstatesthatthepopulationmeanisequaltothatinitialvalue:inourexample,thenullhypothesisstates
thatthecurrentpopulationmeanis3defectsper1,000.WeusetheGreeklettermutorepresentthepopulation
mean,inthiscasethecurrentaveragedefectrate.
Thealternativehypothesisistheclaimwearetryingtosubstantiate.Here,thealternativehypothesisisthatthe
averagedefectratehaschanged.Notethatthealternativehypothesisstatesthatthenullhypothesisdoesnothold.
Astheexamplesuggests,inahypothesistest,wetestthenullhypothesis.Basedonevidencewegatherfroma
sample,thereareonlytwopossibleconclusionswecandrawfromahypothesistest:eitherwerejectthenull
hypothesisorwedonotrejectit.
Sincethealternativehypothesisstatestheoppositeofthenullhypothesis,by"rejecting"thenullhypothesiswe
necessarily"accept"thealternativehypothesis.
Inourexample,theevidencefromoursamplewillhelpusdeterminewhetherornotweshouldrejectthenull
hypothesisthatthedefectrateisstill3infavorofthealternativehypothesisthatthedefectratehaschanged.
Basedonoursampleevidence,whichconclusionshouldwedraw?Werejectthenullhypothesisifitishighly
unlikelythatoursamplemeanwouldcomefromapopulationwiththemeanstatedbythenullhypothesis.
Forexample,ifthesamplewedrewhadadefectrateof14per1,000,wewouldrejectthenullhypothesis.Drawinga
samplewith14defectsfromapopulationwithanaveragedefectrateof3wouldbeveryunlikely.
"Wecannotrejectthenullhypothesisifitisreasonablylikelythatoursamplemeanwouldcomefromapopulation
withthemeanstatedbythenullhypothesis.Thenullhypothesismayormaynotbetrue:wesimplydon'thave
enoughevidencetodrawadefiniteconclusion."
Forexample,ifthesamplewedrewhadadefectrateof3.05per1,000,wecouldnotrejectthenullhypothesis,since
itwouldn'tbeunusualtorandomlydrawasamplewith3.05defectsfromapopulationwithanaveragedefectrateof
3.
Notethathavingthesample'saveragedefectrateverycloseto3doesnot"prove"thatthemeanis3.Thuswenever
saythatwe"accept"thenullhypothesiswesimplydon'trejectit.
Itisbecausewecannever"accept"thenullhypothesisthatwedonotposetheclaimthatweactuallywantto
substantiateasthenullhypothesissuchatestwouldneverallowusto"accept"ourclaim!Theonlywaywecan
substantiateourclaimistostateitastheoppositeofthenullhypothesis,andthenrejectthenullhypothesisbased
ontheevidence.
Itisimportantthatweunderstandexactlyhowtointerprettheresultsofahypothesistest.Let'sillustratethetwo
typesofconclusionswithananalogy:aUSjurytrial.
IntheUSjudicialsystem,theaccusedisconsideredinnocentuntilprovenguilty.So,thenullhypothesisisthatthe
accusedisinnocent.Thealternativehypothesisisthattheaccusedisguilty:thisistheclaimthattheprosecutionis
tryingtoprove.
Thetwopossibleoutcomesofajurytrialare"guilty"or"notguilty."Thejurydoesnotconvicttheaccusedunlessitis
certainbeyondreasonabledoubtthattheaccusedisguilty.Withinsufficientevidence,thejurycannotconcludethat
theaccusedtrulyisinnocent.Thejurysimplydeclaresthattheaccusedis"notguilty.
Similarly,inahypothesistest,ifourevidenceisnotstrongenoughtorejectthenullhypothesis,thenthatdoesnot
provethatthenullhypothesisistrue.Wesimplyhavefailedtoshowitisfalse,andthuscannotrejectit.
Ahypothesisisaclaimorassertionthatcanbetested.Onthebasisofahypothesistestweeitherrejectorleave
unchallengedaparticularstatement:thenullhypothesis.
AlicepromisesLeothatthetwoofyouwilldropbyhisofficefirstthinginthemorningtotestifLeo'ssurveyresults
supporthisclaimsthatfoodandbeveragespendingpatternshavechanged.
47/135

5/13/2016

QuantitativeMethodsOnlineCourse

Summary
Weusehypothesisteststosubstantiateaclaimaboutapopulationmean.Thenullhypothesisstatesthatthe
populationmeanisequaltoaninitialvaluethatisbasedonourexperienceorconventionalwisdom.Wetestthe
nullhypothesistolearnifweshouldrejectitinfavorofourclaim,thealternativehypothesis,whichstatesthatthe
nullhypothesisdoesnothold.

SinglePopulationMeans
Thenextmorning,Leoexplainsthemeasureshehasundertakentoincreasecustomerspendingonfoodand
beverages."I'dliketoseeifthey'vehadadiscernableimpactonmyguests'restaurantrelatedspendingpatterns."

TheRestaurantRevenueProblem
Lastyear,Imadetwomajorchangestorestaurantoperations:Ibroughtinanewexecutivechefandrenovatedthe
maincocktaillounge.
Thechefintroducedanewmenu:afusionoftraditionalHawaiianandFrenchcuisine.Sheputsomeelaborateitems
onthemenu,likethatmangoandbrietartIrecommendedtoyou.Shealsohasofferingsthatcatertosimplertastes.
Butthequestionis,haverestaurantprofitsbeenaffectedbythenewchef?
Sincewesetourfoodmarginsasafixedpercentageoffoodrevenue,Iknowthatifrevenueshaveincreased,profits
haveincreasedtoo.Basedonlastyear'sconsolidatedreports,theaveragespendingonfoodperpersonperdaywas
$55.I'mcurioustoseeifthathaschanged.
Inaddition,Irenovatedthecocktaillounge.Theoldbarwasdesignedpoorlyandusedspaceinefficiently.Nowmore
guestscanbeseatedinthelounge,andmoreseatshavegoodviewsoftheocean.
Ialsoinvestedinalargemachinethatmakesawidevarietyoffrozendrinks.Frozenpinacoladasarevery,very
popular.
Ihopemyinvestmentsinthebararepayingoffintermsofhigherguestspendingondrinks.Beverageshavehigh
margins,butI'mnotsureifbeveragesaleshaveincreasedenoughtocovertheinvestments.
Canwesay,forbeverages,asforfood,that"changesinrevenues"areagoodproxyfor"changesinprofits?"
Absolutely.Isetmyprofitmarginsasafixedpercentageofrevenuesforbeveragesaswell.Lastyear,theaverage
spendingonbeveragesperguestperdaywas$21.
Isn'tthathigh?
Well,wehavesomeverynicewinesinourrestaurants.
Wedon'thavetheconsolidatedreportyet,butI'vealreadyhadmystaffchoosearandomsampleofguests.
Wepulledtherestaurantandloungereceiptsfortheguestsinthesampleandnotedthreeitems:totalfoodrevenues,
totalbeveragerevenues,andnumberofguestsatthetable.Usingthisinformation,weshouldbeabletoestimatethe
dailyspendingonfoodandbeveragesperguest.
YoulookatLeo'sdataandwonderhowyoucandiscernwhetherLeo'schangesthenewchefandthebar
renovationshaveinfluencedtheresort'sprofits.

HypothesisTestsforSinglePopulationMeans
Leohasprepareddataforyou.Howareyougoingtoputittouse?
Ourfirsttypeofhypothesistestisusedtostudypopulationmeans.Let'swalkthroughanexampleofthistypeoftest.
Supposethemanagerofamovietheaterimplementedanewstrategyatthebeginningoftheyear:hestarted
showingoldclassicsinsteadofrecentreleases.
Heknowsthatpriortothechangeinstrategy,averagecustomersatisfactionwas6.7outofapossible10points.He
wouldliketoknowifaveragecustomersatisfactionhaschangedsincehealteredhistheater'sartisticfocus.
Themanager'snullhypothesisstatesthatthecurrentmeansatisfactionhasnotchangeditisstill6.7.Weusethe
48/135

5/13/2016

QuantitativeMethodsOnlineCourse

Greeklettermutorepresentthecurrentmeansatisfactionratingofthetheater'sentirefilmgoingpopulation.
Hisalternativehypothesisistheoppositeofthenullhypothesis:itstatesthataveragecustomersatisfactionisnow
different.
Tosubstantiatehisclaimthatthemeanhaschanged,themanagertakesarandomsampleof196moviegoers.Heis
carefultosampleacrossmovies,showtimes,anddates.Themeansatisfactionratingforthesampleis7.3,witha
standarddeviationof2.8.
Doesthefactthattherandomsample'smeanof7.3ishigherthanthehistoricalmeanof6.7indicatethatthisyear's
moviegoersreallyaremoresatisfied?
Or,isthemeanstillthesame,andthemanager"justhappened"topickasamplewithanunusuallyhighaverage
satisfactionrating?Thisisequivalenttoaskingthequestion:Ifthenullhypothesisistruetheaveragesatisfaction
isstill6.7wouldwebelikelytorandomlydrawthesamplethatwedid,withaveragesatisfaction7.3?
Toanswerthisquestion,wehavetofirstdefinewhatwemeanby"likely."Asinsamplingandestimation,we
typicallyuse95%asourthresholdleveloflikelihood.
Wethenconstructarangearoundthepopulationmeanspecifiedbyournullhypothesis.Therangeshouldbedrawn
sothatifthenullhypothesisistrue,95%ofallsamplesdrawnfromthepopulationwouldfallinthatrange.Inother
words,wecreatearangeoflikelysamplemeans.
Thecentrallimittheoremtellsusthatthedistributionofsamplemeansfollowsanormalcurve,sowecanuseits
familiarpropertiestofindprobabilities.Moreover,thedistributionofsamplemeansiscenteredatourassumed
populationmean,mu,andhasstandarddeviationsigma/sqrt(n).Wedon'tknowsigma,theunderlyingpopulation
standarddeviation,soweusethesamplestandarddeviationasourbestestimate.
Aswedowhenconstructing95%confidenceintervals,wecreatearangewithwidthz*s/sqrt(n)=1.96*s/sqrt(n)on
eithersideofthemean.However,whenweconductahypothesistest,wecentertherangearoundthemean
specifiedinthenullhypothesisbecausewealwaysstartahypothesistestbyassumingthenullhypothesisistrue.
Inourexample,thenullhypothesisisthatthepopulationmeanis6.7,nis196,andsis2.8.Our95%confidence
leveltranslatesintoazvalueof1.96.Weconstructtherangeoflikelysamplemeans:
Thistellsusthatifthepopulationmeanis6.7,thereisa95%chancethatthemeanofarandomlyselectedsample
willfallbetween6.3and7.1.
Now,ifwetakeasample,andthemeandoesnotfallwithintherangearound6.7,wecanrejectthenullhypothesis.
Why?Becauseifthepopulationmeanwere6.7,itwouldbeunlikelytocollectasamplewhosemeanfallsoutsidethis
range.
Theregionoutsidetherangeoflikelysamplemeansiscalledthe"rejectionregion,"sincewerejectthenull
hypothesisifoursamplemeanfallsintoit.Inthemovietheaterexample,therejectionregioncontainsallvaluesless
than6.3andallvaluesgreaterthan7.1.
Inthisexample,thesamplemean,7.3,fallsintherejectionregion,sowerejectthenullhypothesis.Wheneverwe
rejectthenullhypothesis,weineffectacceptthealternativehypothesis.
Weconcludethatcustomersatisfactionhasindeedchangedfromthehistoricalmeanvalueof6.7.
Ifoursamplemeanhadfallenwithintherangearound6.7,wecouldnotmakeadefinitestatementabout
moviegoers'satisfaction.Wewouldnothaveenoughevidencetostatethatthingshavechanged,butwecannever
claimthattheyhavedefinitelyremainedthesame.
Unlesswepolleverycustomer,we'llneverknowforsureifcustomersatisfactionhastrulychanged.Workingonly
withsampledata,thereisalwaysachancethatwe'lldrawthewrongconclusionaboutthepopulation.Wecango
wrongintwoways:rejectinganullhypothesisthatisinfacttrueorfailingtorejectanullhypothesisthatisinfact
false.Let'slookatthefirstofthese:thenullhypothesisistrue,butwerejectit.
Wechoosetheconfidencelevelsoitisunlikelybutnotimpossibleforthesamplemeantofallintherejection
regionwhenthenullhypothesisistrue.Inthiscase,weareusinga95%confidencelevel,sobyunlikelywemeana
5%chance.However,5%ofallsamplesfromapopulationwiththenullhypothesismeanwouldfallintherejection
region,sowhenwerejectanullhypothesis,thereisa5%chancewewilldosoerroneously.
Therefore,whenthesamplemeanfallsintherejectionregion,wecanonlybe95%confidentthatwearejustifiedin
rejectingthenullhypothesis.Hencewecontinuetospeakofaconfidencelevelof95%.
49/135

5/13/2016

QuantitativeMethodsOnlineCourse

Ahypothesistestwitha95%confidencelevelissaidtohavea5%levelofsignificance.A5%significancelevelsays
thatthereisa5%chanceofasamplemeanfallingintherejectionregionwhenthenullhypothesisistrue.Thisis
whatpeoplemeanwhentheysaythatsomethingis"statisticallysignificantata5%significancelevel.
Ifweincreaseourconfidencelevel,wewidentherangearoundthenullhypothesismean.Ata99%confidencelevel,
ourrangecaptures99%ofallsamplemeans.Thisreducesto1%ourchanceofrejectingthenullhypothesis
erroneously.Butdoingthishasadownside:bydecreasingthechanceofonetypeoferror,weincreasethechanceof
theothertype.
Thehighertheconfidencelevelthesmallertherejectionregion,andthelesslikelyitisthatwecanrejectthenull
hypothesiswhenitisinfactfalse.Thisdecreasesourchanceofbeingabletosubstantiatethealternativehypothesis
whenitistrue.Asmanagers,weneedtochoosetheconfidencelevelofourtestbasedontherelativecostsofmaking
eachtypeoferror.
Therangeoflikelysamplemeansshouldnotbeconfusedwithaconfidenceinterval.Confidenceintervalsarealways
constructedaroundsamplemeans,neveraroundpopulationmeans.Whenweconstructaconfidenceinterval,we
don'tevenhaveaninitialestimateofthepopulationmean.Constructingaconfidenceintervalisaprocessfor
estimatingthepopulationmean,notfortestingparticularclaimsaboutthatmean.

Summary
Inahypothesistestforpopulationmeans,weassumethatthenullhypothesisistrue.Then,weconstructarange
oflikelysamplemeansaroundthenullhypothesismean.Ifthesamplemeanwecollectfallsintherejection
region,werejectthenullhypothesis.Otherwise,wecannotrejectthenullhypothesis.Theconfidencelevel
measureshowconfidentwearethatwearejustifiedinrejectingthenullhypothesis.

OnesidedHypothesisTests
Themovietheatermanagerdidnothaveastrongconvictionaboutthedirectionofchangeforcustomer
satisfactionpriortoperformingthehypothesistest.
Hewantedtotestforchangeinbothdirectionsupordownandthusheusedatwosidedhypothesistest.The
nullhypothesisthatnochangehastakenplacecouldhavebeenwrongineitheroftwoways:Customer
satisfactionmayhaveincreasedordecreased.Thetwotailednatureofthetestwasreflectedinthetwosided
rangewedrewaroundthepopulationmean.
Sometimes,wemaywanttoknowiftheactualpopulationmeandiffersfromourinitialvalueofthepopulation
meaninaspecificdirection.Forinstance,ifthetheatermanagerwerequitesurethatsatisfactionhadnot
decreased,hewouldn'thavetotestinthatdirectionrather,he'donlyhavetotestforpositivechange.
Inthesecases,ouralternativehypothesisshouldclearlystatewhichdirectionofchangewewanttotestfor.These
kindsoftestsarecalledonesidedhypothesistests.Here,wesubstantiatetheclaimthatthemeanhasincreased
onlyifthesamplemeanissufficientlyhigherthan6.7,soourrejectionregionextendsonlytotheright.
Let'soutlinehowtoformulateoneandtwosidedtests.Foratwosidedtestwehaveaninitialunderstandingof
thepopulation:thepopulationmeanisequaltoaspecifiedinitialvalue.
Ifwewanttosubstantiatetheclaimthatapopulationmeanhaschanged,thenullhypothesisshouldstatethatthe
meanstillequalsthatinitialvalue.Thealternativehypothesisshouldstatethatthemeandoesnotequalthat
initialvalue.
Ifwewanttoknowthattheactualpopulationmeanisgreaterthantheinitialvaluethenullhypothesismean
thenthenullhypothesisshouldstatethatthepopulationmeanhasatmostthatvalue.Thealternativehypothesis
statesthatthemeanisgreaterthanthenullhypothesismean.
Likewise,ifwewanttosubstantiatetheclaimthatapopulationmeanislessthantheinitialvalue,thenull
hypothesisshouldstatethatthemeanisatleastthatinitialvalue.Thealternativehypothesisshouldstatethatthe
meanislessthanthenullhypothesismean,andtherejectionregionextendsonlytotheleft.
Whenweconductaonesidedhypothesistest,weneedtocreateaonesidedrangeoflikelysamplemeans.
Supposethetheatermanagerclaimsthatsatisfactionimproved.Asusual,hestatestheclaimhewantsto
substantiateashisalternativehypothesis.
The196personsamplehasmean7.3andstandarddeviation2.8.Doesthissampleprovidesufficientevidenceto
substantiatetheclaimthatmeansatisfactionincreased?Tofindout,themanagercreatesaonesidedrange:he
50/135

5/13/2016

QuantitativeMethodsOnlineCourse

assumesthepopulationmeanisthenullhypothesismean,6.7,andfindstherangethatcontainsthelower95%of
allsamplemeans.
Tofindthisrange,allheneedstodoiscalculateitsupperbound.Forwhatvaluewould95%ofallsamplemeans
belessthanthatvalue?
Tofindout,weusewhatweknowaboutthecumulativeprobabilityunderthenormalcurve:acumulative
probabilityof95%correspondstoazvalueof1.645.
ztable
Whyisthisdifferentfromthezvalueforatwosidedtestwitha95%confidencelevel?Foratwosidedtest,thez
valuecorrespondstoa97.5%cumulativeprobability,since2.5%oftheprobabilityisexcludedfromeachtail.Fora
onesidedtest,thezvaluecorrespondstoa95%cumulativeprobability,since5%oftheprobabilityisexcluded
fromtheuppertail.
ztable
Wenowhavealltheinformationweneedtofindtheupperboundontherangeoflikelysamplemeans.
Therejectionregioniseverythingabovethevalue7.0.Thesamplemeanfallsintherejectionregion,sothe
managerrejectsthenullhypothesis.Heisconfidentthatcustomersatisfactionishigher.

Summary
Whenwewanttotestforchangeinaspecificdirection,weuseaonesidedtest.Insteadoffindingarange
containing95%ofallsamplemeanscenteredatthenullhypothesismean,wefindaonesidedrange.We
calculateitsendpointusingthecumulativeprobabilityunderthenormalcurve.

ExcelUtility(SinglePopulations)
TheExcelUtilitylinkbelowallowsyoutoperformhypothesistestsforsinglepopulations.
Makesureyoudoatleastoneexamplebyhandtoensureyouthoroughlyunderstandthebasicconceptsbefore
usingtheutility.Youshouldenterdataonlyintheyellowinputareasoftheutility.Toensureyouareusingthe
utilitycorrectly,trytoreproducetheresultsforthetheatermanager'sexample.
ExcelUtilityforSinglepopulations

SolvingtheRestaurantRevenueProblem
Asinglepopulationhypothesistesttestsaclaimusingasamplefromasinglepopulation.Withaplaninmind,you
takealookatLeo'ssampledata.
YouarereadytoanalyzetheimpactofthechangesLeohasmadetohisrestaurantoperations.Youdrawatableto
organizethedatafromyoursampleondailyguestspendingonrestaurantfood.
OnechangeLeomadetohisrestaurantoperationswastohireanewchef.Hewantstoknowwhetheraverage
restaurantspendingperguesthaschangedsinceshetookoverthemenuandthekitchen.Thisisaclearcasefora
hypothesistest.
Lastyear'saveragespendingonfoodperpersonwas$55thisgivesyouaninitialvalueforthemean.
Leowantstoknowifmeanspendinghaschanged,soyouuseatwosidedtest.Youjotdownyournullhypothesis,
whichstatesthattheaveragerevenueperguestisstill$55.
Ifthenullhypothesisistrue,thedifferencebetweenthesamplemeanof$64andtheinitialvalueof$55canbe
accountedforbychance.
Youaddthealternativehypothesistoyournotes.
Next,youassumethatthenullhypothesisistrue:thepopulationmeanis$55.Nowyouneedtoconstructarangeof
likelysamplemeansaround$55andask:doesthesamplemeanof$64fallwithinthatrange?Ordoesitfallinthe
rejectionregion?
51/135

5/13/2016

QuantitativeMethodsOnlineCourse

Leodidn'tspecifywhatlevelofconfidencehewantedforyourresults.Youcallhimforclarification.
Isupposea95%confidencelevelisokay.I'dliketobemoreconfident,ofcourse.
Afteryoupointoutthathigherconfidencewouldreducehischancesofbeingabletosubstantiateachangein
spendingifachangehastakenplace,heagreesto95%.Youpulloutyourtrustycalculatorandgetreadytocomputea
rangearoundthenullhypothesismeanof$55.Consultingyournotes,youfindthecorrectformula:
Youfindtherangecontaining95%ofallsamplemeans.Itsendpointsare:
ztable
UtilityforSinglePopulations
Youpauseforamomenttoreflectontheinterpretationofthisrange.Supposethenullhypothesisistrue.Then19
outof20samplesofthissizefromthepopulationofhotelguestswouldhavemeansthatwouldfallinthecalculated
range.
Thesamplemeanof$64fallsoutsideofthisrange.YouandAlicereportyourresultstoLeo.
Lookslikehiringthatchefwasagooddecision.Theevidencesuggeststhatmeanspendingperpersonhasincreased.
I'mgladtohearit.Nowwhataboutrenovatingthebar?Canyourunasimilartesttoseeifthathasaffectedaverage
beveragespending?
Leoemphasizesthathecan'timaginethathisinvestmentsinthebarcouldhavereducedaveragebeveragespending
perguest.Hewantstoknowifspendinghasgoneup.Youdecidetodoaonesidedtest.
First,youwritedownallofLeo'sdata,alongwiththehypotheses:
Youneedtofindanupperboundsuchthat95%ofallsamplemeansaresmallerthanit.Todoso,youuseazvalueof
1.645.Theupperboundis$24.29.
Whatisthecorrectinterpretationofthisnumber?Giventhatthenullhypothesisistrue,
Therangeoflikelysamplemeanscontainsthecollectedsamplemeanof$24.Thistellsyouthat:
PresentingyourfullreporttoLeo,heappearsconfusedanddisappointed.
Howisthispossible?Whyhasn'trenovatingthebarincreasedrevenues?Evenifthefrozendrinkmachinedidn't
payoff,shouldn'ttheincreaseinseatshavehelped?
Firstofall,wehaven'tconcludedthataveragerevenuehasnotincreased.Wejustcan'tbesurethatithas.Thefact
thatoursamplemeanis$24vs.$21lastyeardoesnotallowustosayanythingdefinitiveaboutthechangein
averagebeveragerevenue.
Remember,wesetouttosubstantiateourhypothesisthatspendinghasimproved.Basedjustonthissample,weare
unabletoconcludethatspendinghasincreased.
Youaddedseatsandnowmorepeoplecanbeseatedinyourlounge.Butagreaternumberofguestsdoesnot
necessarilytranslateintomorespendingperperson.
Thatdoesmakealotofsense.
Youroverallrevenuesmayhaveactuallyincreased,becausemoreguestscanbeseatedinthelounge.
Gosh,I'mgladtohearthat.Foramomentthere,IthoughtIhadmadeareallybadinvestment.I'mquiteoptimistic
I'llseeajumpintotalbeveragerevenuesintheconsolidatedreportattheendoftheyear.
Whydon'twegofillthreeofthosenewseatsrightnow?

Exercise1:Oma'sPretzels
BlancheMcCarthyisthemarketingdirectorofOma'sOwnsnackfoodcompany.Oma'smakestoastedpretzel
snacks,andadvertisesthatthesepretzelscontainanaverageof112caloriesperserving.
Inarecenttest,anindependentconsumerresearchorganizationconductedanexperimenttoseeifthisclaimwas
true.Somewhattotheirsurprise,theresearchersfoundthattheaveragecaloriecontentinasampleof32bags
was102caloriesperserving.Thestandarddeviationofthesamplewas19.
52/135

5/13/2016

QuantitativeMethodsOnlineCourse

BlanchewouldliketoknowifthecaloriecontentofOma'spretzelsreallyhaschanged,soshecanmarketthem
appropriately.With99%confidence,dothesedataindicatethatthepretzels'caloriecontenthaschanged?
ztable
UtilityforSinglePopulations
Youbeginanyhypothesistestbyformulatinganullandanalternativehypothesis.Thenullhypothesisstatesthat
thepopulationmeanisequaltotheinitialvalue.Inthisproblem,thenullhypothesisisthatthecaloriccontentin
theactualpopulationiswhatOma'shasalwaysadvertised.
Thealternativehypothesisshouldcontradictthenullhypothesis.Foratwosidedtest,thealternativehypothesis
simplystatesthatthemeandoesnotequaltheinitialvalue.Atwosidedtestismoreappropriateinthisproblem,
sinceBlancheonlywantstoknowifthemeancaloriecontenthaschanged.
Youassumethatthenullhypothesisistrueandconstructarangeoflikelysamplemeansaroundthepopulation
mean.Usingthedataandtheappropriateformula,youfindtherange[102121].
Thesamplemeanof102fallsoutsideofthatrange,soyoucanrejectthenullhypothesis.Blanchecanbe99%
confidentthatthepopulationmeanisnot112.
WhymightBlanchehavechosena99%confidencelevelratherthanthemoretypical95%levelforhertest?
ztable
UtilityforSinglePopulations

Exercise2:TheClearwaterPowerCompany
TheClearwaterPowerCompanyproduceselectricalpowerfromcoal.Alocalenvironmentalgroupclaimsthat
Clearwater'semissionshaveraisedsulfurdioxidelevelsabovepermissiblestandardsinBlueSky,thetown
downwindoftheplant.
AccordingtoEnvironmentalProtectionAgencystandards,anacceptableaveragesulfurdioxidelevelis30parts
perbillion(ppb).AsClearwater'sPRconsultant,youwanttodefendthecompany,andyoutrytoanticipatethe
environmentalist'sargument.
Theenvironmentalgroupcollects36samplesonrandomlyselecteddaysoverthecourseofayear.Itfindsamean
sulfurdioxidecontentof35ppbwithastandarddeviationof24ppb.
Theenvironmentalistgroupwilluseahypothesistesttobackupitsclaimthatthesulfurdioxidelevelsarehigher
thanpermitted.Whichofthefollowingisanappropriatenullhypothesisforthisproblem?
ztable
UtilityforSinglePopulations
Theenvironmentalists'claimisthatsulfurdioxidelevelsarehigher,sotheywillwanttorunaonesidedtest.The
alternativehypothesisstatesthatthesulfurdioxidelevelsareabovetheacceptedstandard.Weassumetheywill
choosea95%confidencelevel.
Whatistherangeoflikelysamplemeans?
ztable
UtilityforSinglePopulations
Theycalculatetheonesidedrangearoundthenullhypothesismeanthatcontains95%ofallsamples.Thezvalue
foraonesided95%rangeis1.645.Theupperboundontherangeoflikelysamplemeansis36.58ppb.
Basedonyourcalculations,youshould:
ztable
UtilityforSinglePopulations

Exercise3:Neshey'sSmooches
YouaretheplantmanagerofaNeshey'schocolatefactory.Theshopwasfloodedduringtherecentstorms.The
machinethatwrapsNeshey'spopularchocolateconfection,Smooches,stillworks,butyouareafraiditmaynotbe
workingatitsformercapacity.
53/135

5/13/2016

QuantitativeMethodsOnlineCourse

Ifthemachineisn'tworkingattopcapacity,youwillneedtohaveitreplaced.
Whichtypeofhypothesistestismostappropriateforthisproblem?
Thehourlyoutputofthemachineisnormallydistributed.Beforetheflood,themachinewrappedanaverageof
340Smoochesperhour.Overthefirstweekaftertheflood,youcountedwrappedSmoochesduring32randomly
selectedonehourperiods.Themachineaveraged318Smoochesperhour,withastandarddeviationof44.
Youconductaonesidedhypothesistestusinga95%confidencelevel.Accordingtoyourcalculations,youshould:
Thenullhypothesisisthat340.Thealternativehypothesisisthat<340sinceyouareusingaonetailtest
andyouareassumingthatthenewpopulationmeanislowerthanthepopulationmeanbeforetheflood.
Identifytherelevantvalues.Thesamplesizen=32.Thestandarddeviations=44.Theappropriatezvalueis1.645
ifyouwanttocapture95%ofallsamplemeansinaonesidedrangearoundthenullhypothesismean.
Usetheformulaandcalculatethelowerbound,327.Thesamplemeanof318fallswelloutsideofthecalculated
rangeoflikelysamplemeans.Youacceptthisasstrongevidenceagainstthenullhypothesis,substantiatingthe
alternativehypothesisthatthemeanoutputratehasdropped.Youshouldreplacethemachine.

SinglePopulationProportions
Happywithyourworkonrestaurantspending,Leojumpsrightintothenextproblem."It'snotjusttherevenueofthe
restaurantsthatIcareabout,"Leosays,"It'salsomyguests'satisfactionwiththeirrestaurantexperience."

TheRestaurantAmbianceProblem
WhenIgoouttoeat,Iexpectmorethanjustexcellentfood.Thewholediningexperienceisessentialeverything
fromtheservice,tothedcor,tothedesignandqualityofthesilverware.
Andit'snotjustthatallofthesefactorsmustbeexcellentindividuallytheyhavetofittogether.Therestauranthas
tohaveambiance!I'msuremyguestshavesimilarexpectations,andIwanttobesuremyrestaurantmeetsthem.
Sincemynewchefintroducedmoresophisticatedcuisine,ImadesomechangestothedcorthatIthinkhave
improvedtheambiance.
Ittookmealongtimeandasubstantialamountofmoneytogeteverythingright,butI'mpleasedwiththeresult:
therestaurantsareelegantanddistinctlyHawaiian.Justlikethenewchef'scuisine.
Inthepast,I'vecontractedalocalmarketresearchfirmtoconductsurveys,askinggueststoratetheKahana's
restaurants'ambianceonascaleofonetofive.
Historically,thepercentageofpeoplethatratedambiancethetopscoreof5gavemeagoodideaofhowwellwewere
doing.Thatpercentagehasbeenveryhigh:72%.
I'vecollectedthisyear'sdataforyou.Canyoufigureoutifmyguestsarehappierwithmyrestaurants'ambiance?

HypothesisTestsforSinglePopulationProportions
AlicetellsyouthattestingLeo'sclaimaboutaproportionwillbeverysimilartotestingamean.
Oftenthesummarystatisticwewanttomakeaclaimaboutisaproportion.Howdowetestahypothesisabouta
populationproportioninsteadofapopulationmean?
Weknowfromourworkwithconfidenceintervalsthattheprocessesforestimatingpopulationproportionsand
populationmeansarevirtuallyidentical.Similarly,hypothesistestsforproportionsaremuchlikehypothesistests
formeans.
Becauseweareexaminingapopulationproportioninsteadofapopulationmean,weuseslightlydifferentnotation:
weusealowercaseptorepresentthepopulationproportioninplaceofforapopulationmean.Weconstructa
hypothesistesttotestaclaimaboutthevalueofp.
Again,weformulatenullandalternativehypotheses.Basedonconventionalwisdomorpastexperience,wehavean
initialunderstandingofthepopulationproportion.Thenullhypothesisforaproportionteststatestheinitial
understanding.Forexample,inatwosidedtest,thenullhypothesisassertsthatthepopulationproportion,p,is
54/135

5/13/2016

QuantitativeMethodsOnlineCourse

equaltotheinitialvaluewehadinmind.
Thealternativehypothesisistheclaimweareusingthehypothesistesttosubstantiate.Thealternativehypothesis
typicallystatestheoppositeofthenullhypothesis:itstatesthatourinitialunderstandingisincorrect.
Aswithpopulationmeans,wecollectarandomsampleandcalculatethesampleproportion,"pbar."However,fora
hypothesistestaboutapopulationproportion,wedon'tneedtocalculateastandarddeviationfromthesample.
Statisticaltheorytellsusthat,thestandarddeviationofthepopulationproportion,isthesquarerootof[p*(1
p)].Sincewealwaysstartthetestassumingthenullhypothesisistrue,wewillcalculateusingthenullhypothesis
proportion.
Analogouslytopopulationmeantests,wecreatearangeoflikelysampleproportionsaroundthenullhypothesis
proportion.Tocreatetherange,wesubstitutefor,thestandarddeviationoftheunderlyingnullhypothesis
population.
Ifoursampleproportionfallsoutsidetherangeoflikelysampleproportions,werejectthenullhypothesis.
Otherwise,wecannotrejectthenullhypothesis.

Summary
Inahypothesistestforpopulationproportions,weassumethatthenullhypothesisistrue.Then,weconstructa
rangeoflikelysampleproportionsaroundthenullhypothesisproportion.Ifthesampleproportionwecollectfalls
intherejectionregion,werejectthenullhypothesis.Otherwise,wecannotrejectthenullhypothesis.

SolvingtheRestaurantAmbianceProblem
Onceyouunderstandhypothesistestingformeans,usingthesametechniquesonproportionsiseasy.
Bynow,you'refamiliarwiththeconceptoftestingahypothesis.YourecognizethatLeo'srestaurantambiance
problemcallsforahypothesistestforapopulationproportion.
Leowantsyoutofindoutiftheproportionofhisgueststhatraterestaurantambiance"excellent"hasincreased.
Historically,thatpopulationproportionhasbeen0.72.SinceLeowantstoseeiftherehasbeenpositivechange,you
doaonesidedtest.
Theappropriatepairofhypothesesis:
Youaredoingaonesidedtesttoseeiftheproportionofguestsratingtherestaurant"excellent"hasincreased.The
alternativehypothesisstatesthattheproportionhasincreased,andthenullhypothesisstatesthatithasnot
increased.
YoulookatLeo'sdata.Thesampleproportionis0.81andthesamplesizeis126.
Butwhataboutthestandarddeviation?
ztable
UtilityforSinglePopulations
Here'showyoufindthestandarddeviationforaproportionproblem:
Usingtheappropriateformula,youcalculatethestandarddeviationtobe0.45.
Leowantedyoutousea95%confidencelevel.Nowyou'rereadytoconstructarangeoflikelysamplemeansaround
thenullhypothesisvalueofthepopulationproportion:0.72.
Findtherangeoflikelysampleproportionsaroundthenullhypothesisproportion,andformulateashortanswerfor
Leo.
ztable
UtilityforSinglePopulations
Aonesidedtestcallsforaonesidedrangeoflikelysampleproportions.Youneedtofindtheupperboundforthis
rangesuchthattherangecapturesthelower95%ofthesampleproportions.
Thezvalueforaonesided95%confidencelevelis1.645.Substitutethenullhypothesisproportion,0.72,forp.The
55/135

5/13/2016

QuantitativeMethodsOnlineCourse

upperboundfortherangecontainingthelower95%ofallsamplemeansis0.78.
Sincethesampleproportion0.81fallsintherejectionregion,yourejectthenullhypothesis.Thedataprovide
sufficientevidencethatthepopulationproportionhas,infact,changed.
AlicepresentsyourfindingstoLeo,tellinghimthatwith95%confidence,thedatayoucollectedindicatethatthe
differencebetweenthehistoricalpopulationproportionandtheproportionoftherandomsampleisnotdueto
chance.
Theproportionofyourgueststhatratetherestaurants'ambianceas"excellent"hasincreased.
JustwhatIwantedtohear!Thanks,youtwo.

Exercise1:TheVenturaInsuranceCompany
LutherLenya,thenewproductguruofTheVenturaAutomotiveInsuranceCompany,isconsideringmarketinga
specialinsurancepackagetomembersofcertainprofessionalgroups.
Inparticular,Lutherwantstocreateaspecialpackageforhealthprofessionals.
Tofindoutwhatratetochargeforthispackage,Lutherconductsapreliminarystudytoseeifhealthprofessionals
arelesslikelytobeinvolvedincaraccidentsthantherestofhiscustomerbase.
Ifthedataindicatethathealthprofessionalsarelesslikelytobeinvolvedincaraccidents,thenVenturacanoffer
healthprofessionalsalower,morecompetitiverate.
Inthepast5years,8.3%ofVentura'scustomershavebeeninvolvedinaccidents.Whichofthefollowingisthe
correctpairofhypothesesforsolvingLuther'sproblem?
Asampleof240customersinthehealthprofessionrevealsthat12(5.0%)havehadaccidents.
Ifheusesa95%confidencelevel,whichofthefollowingisthebestconclusionLuthercancometo?
ztable
UtilityforSinglePopulations
Youneedtofindarangeoflikelysampleproportions.Tofindthisrange,youcalculateastandarddeviation.The
standarddeviationis0.28.
Foraonesidedtest,aconfidencelevelof95%correspondstoazvalueof1.645.Thelowerboundofthisrangeis
0.054=5.4%.
Therangeoflikelysampleproportionsdoesnotcontain5.0%,soyoushouldrejectthenullhypothesis.With95%
confidence,theproportionofhealthprofessionalsinvolvedincaraccidentsislowerthantheproportionofthe
overallpopulationofdrivers.

PValues
Aftersleepingoveryouranalysisofrestaurantoperations,Leoseemsunsatisfied.

LeoDemandsaDeeperUnderstanding
Don'tgetmewrong,Iappreciateyourhardwork.Butlookhere:thesehypothesistestsresultina"reject/don't
reject"decision.IfIunderstandyoucorrectly,itdoesn'tmatterhowclosetotheborderoftherejectionregionour
samplestatisticfalls:"reject"is"reject."
Butcan'tyoutellmemore?Iwanttoknowhowstrongtheevidenceagainstthenullhypothesisis,notjustifitis
strongenough.
I'mgladyoubroughtthatissueup,Leo.Wehaveasecondmethodofdoinghypothesistests,onethatprovidesa
measureofthestrengthoftheevidence.

PValues
Theeveningbefore,Alicehadacquaintedyouwithpvalues:"Wecanusethepvaluemethodofhypothesistesting
56/135

5/13/2016

QuantitativeMethodsOnlineCourse

tomake'reject/notreject'decisionsinthesamewaywehavebeendoingallalong.Butthepvaluealsomeasuresthe
strengthofevidenceagainstanullhypothesis."
Inhypothesistestswe'vedonesofar,wefirstchosetheconfidencelevelofthetest.Theconfidenceleveltellsusthe
significancelevelofthetest,whichissimply1minustheconfidencelevel.
Typically,wechosea5%significancelevela95%confidencelevelasourthresholdvalueforrejection.Assuming
thatthenullhypothesisistrue,wereasonedthatcertainsamplemeanvaluesarelesslikelytoappearthanothers.If
themeanofthesamplewecollectedwassufficientlyunlikelytoappear(thatis,lessthan5%likely)weconsidered
thenullhypothesisimplausibleandrejectedit.
Now,ratherthansimplycheckingwhetherthelikelihoodofcollectingoursampleisaboveorbelowourchosen
threshold,we'llask:ifthenullhypothesisistrue,howlikelyisittochooseasamplewithameanatleastasfar
fromthenullhypothesismeanasthesamplemeanwecollected?
The"pvalue"measuresthislikelihood:ittellsushowlikelyitistocollectasamplemeanthatfallsatleastacertain
distancefromthenullhypothesismean.
Inthefamiliarhypothesistestingprocedure,ifthepvalueislessthanourthresholdof5%,werejectournull
hypothesis.
Thepvaluedoesmorethansimplyanswerthequestionofwhetherornotwecanrejectthehypothesis.Italso
indicatesthestrengthoftheevidenceforrejectingthenullhypothesis.Forexample,ifthepvalueis0.049,we
barelyhaveenoughevidencetorejectthenullhypothesisatthe0.05levelofsignificanceifitis0.001,wehave
strongevidenceforrejectingthehypothesis.
Let'slookatanexample.Recallthemovietheatermanagerwhowantedtoknowiftheaveragesatisfactionratefor
hisclientelehadchangedfromitshistoricalrateof6.7.
Tofindout,weconstructedtherange,6.3to7.1,whichwouldhavecontained95%ofthesamplemeansifthenull
hypothesismeanhadstillbeentrue.Sincethemeanofthesampleofcurrentmoviegoerswecollected,7.3,fell
outsideofthatrange,werejectedthenullhypothesis.
Because7.3fellintherejectionregion,weknowthatthelikelihoodofcollectingasamplemeanasextremeas7.3is
lessthan5%ifthenullhypothesisistrue.Nowlet'sfindoutexactlyhowunlikelyitisbycalculatingthepvalue.
Calculatingthepvalueisalittletricky,butwehaveallthetoolsweneedtodoit.Recallthatforsamplesofsufficient
size,thesamplemeansofanypopulationaredistributednormally.
Tocalculatethelikelihoodofacertainrangeofsamplemeanvaluesinourexample,samplemeanvaluesgreater
than7.3orlessthan6.1wejustneedtofindtheappropriateareaunderthedistributioncurveofthesample
means.
Tocalculatethepvalueforthistwosidedtest,wewanttofindtheareaunderthenormalcurvetotherightof7.3
andtotheleftof6.1.Thestandarddeviationinthisexampleis2.8,andthesamplesizeis196.
Wecancalculatethisprobabilitybyfirstcalculatingthezvalueassociatedwiththevalue7.3.Thatzvalueis3.
Then,wefindtheprobabilityofhavingazvaluelessthan3orgreaterthan3.
Theareatotheleftofthezvalueof3is0.00135.Theareatotherightofthezvalueof+3isthesamesize,sothe
totalareais0.0027.Thatisourpvalue.TheseareasandthepvaluecanbefoundinExcelusingthe
NORMSDIST(3)function,intheztable,orwiththeExcelutilityprovided.
Excel
ztable
UtilityforSinglePopulations
Ourpvaluecalculationtellsusthattheprobabilityofcollectingasamplemeanatleastasfarfrom6.7as7.3is
0.0027.Thepvalueislowerthan0.05.Thus,atasignificancelevelof0.05,wewouldrejectthenullhypothesisand
concludethatmoviegoers'averagesatisfactionratingisnolonger6.7.
Excel
ztable
UtilityforSinglePopulations
Butthepvalue0.0027ismuchsmallerthan0.05.Thus,wecanrejectthenullhypothesisat0.0027,amuchlower
significancelevel.Inotherwords,wecanrejectthenullhypothesiswith99.73%confidence.Ingeneral,thelowerthe
57/135

5/13/2016

QuantitativeMethodsOnlineCourse

pvalue,thehigherourconfidenceinrejectingthenullhypothesis.
Onesidedhypothesistestsarealsoeasilyconductedwithpvalues.Foronesidedtests,thepvalueistheareaunder
onesideofthecurve.Inourmovietheaterexample,ifthealternativehypothesisstatesthatthepopulationmeanis
largerthan6.7,thepvalueistheareaunderthenormalcurvetotherightofthesamplemeanof7.3.

Summary
Thepvaluemeasuresthestrengthoftheevidenceagainstthenullhypothesis.Itisthelikelihood,assumingthat
thenullhypothesisistrue,ofcollectingasamplemeanatleastasfarfromthenullhypothesismeanasthesample
actuallycollected.Wecomparethepvaluetothethresholdsignificanceleveltomakeareject/notrejectdecision.
Thepvaluealsotellsushowcomfortablewecanbewiththatdecision.

SolvingtheRestaurantRevenueProblem(PartII)
NowAliceexplainsthebasicsofpvaluestoLeo,soyoucanpresenttheresultsofyourrestaurantrevenuehypothesis
testagain.Thistime,you'llbeabletogiveLeoanideaofhowstrongthestatisticalevidenceis.
Leowantsyoutocompletethepvaluehypothesistestrightthereinhisoffice.You'realittlenervousyou'venever
hadaclientpeeringoveryourshoulderwhenyouwork.Butyouobligehim,becauseyou'regrowingmoreconfident
ofyourstatisticalskills.
Lookingbackatyournotesontheproblem,youfindthedataandthehypotheses.Youmakeamentalnotethatyou
aredoingatwosidedtesttoseewhetherornotaveragespendingonfoodhaschangedfromitshistoricallevelof
$55.
AneagerLeointerruptsyourthoughtprocess:
WhenyouranthehypothesistestearlierIhadyouusea95%confidencelevel.Thatcorrespondstoasignificance
levelof0.5,right?
Youpolitelyrespond:
Tofindthesignificancelevelcorrespondingtoaconfidencelevelof95%,simplysubtract95%from100%,and
convertintodecimalnotation:0.05.
AfteryouclarifyLeo'smistake,hesitsbackandletsyoufinishyouranalysiswithoutfurtherinterruption.First,you
findtheappropriatezvalue.Enterthezvalueasadecimalnumberwith2digitstotherightofthedecimal,(e.g.,
enter"5"as"5.00").Roundifnecessary.
Excel
ztable
Thecorrectzvalueis3.00,correspondingtoarighttailprobabilityof0.00135.You:
ztable
UtilityforSinglePopulations
Yoursamplehasamean(xbar=$64)thatis$9higherthantheassumedpopulationmean,$55.Youwantto
calculatethelikelihoodofgettingasamplemeanthatisatleastasfarfromthepopulationmeanasxbar.
Thatlikelihoodisnotjustthetailprobabilitytotherightofthesamplemean.Samplemeansontheothersideofthe
normalcurvearejustasfarfromthepopulationmeanasxbar.Theymustbeincluded,too,whenyoucalculatethe
pvalueforatwosidedhypothesistest.
Doublingtherighttailprobabilitygivesyouthecorrectpvalue:0.0027.
AlicesummarizesyourresultsforLeo.
Allwehavetodoiscomparethepvaluetothesignificancelevel.Thepvalue0.0027islessthanthesignificance
level0.05.Ourdataarestatisticallysignificantatthe0.05level.
Justaswecalculatedearlierbyconstructingarangearoundthenullhypothesismean,thepvaluemethodsuggests
thatwerejectthenullhypothesis.With95%confidence,averagefoodspendingperguesthaschanged.
Butnowwecanalsoseethattheevidenceisverystrong,becausethepvalueismuchlowerthanthesignificance
58/135

5/13/2016

QuantitativeMethodsOnlineCourse

level.Wecanclaimthatfoodspendinghaschangedatthe0.0027levelofsignificance.Thanks,youtwo.Ifeelmuch
morecomfortableconcludingthataverageguestspendinginmyrestauranthaschanged.

Exercise1:Oma'sRevisited
Inthefollowingexerciseyouwillrevisitanearlierproblem,thistimesolvingitwiththepvaluemethod.
BlancheMcCarthyisthemarketingdirectorofOma'sOwnsnackfoodcompany.Oma'smakestoastedpretzel
snacks.Eachbagofpretzelscontainsoneserving,andOma'sadvertisesthatthepretzelsnackscontainanaverage
of112caloriesperserving.
Inarecenttest,anindependentconsumerresearchorganizationconductedanexperimenttoseeifthisclaimwas
true.Theresearchersfoundthattheaveragecaloriecontentinasampleof32bagswas102caloriesperserving.
Thestandarddeviationofthesamplewas19.
BlanchewouldliketoknowifthecaloriecontentofOma'spretzelshasreallychanged,soshecanmarketthem
appropriately.Atthesignificancelevel0.01,dothesedataindicatethatthepretzels'caloriecontenthaschanged?
UtilityforSinglePopulations
Excel
ztable
Inthisproblem,thenullhypothesisisthattheactualpopulationmeaniswhatOma'shasalwaysadvertised.A
twosidetestismoreappropriateinthisproblem,sinceBlancheonlywantstoknowifthemeancaloriecontent
haschanged.
Assumingthatthenullhypothesisistrue,youfindazvalueforthesamplemeanof102usingtheappropriate
formula.Thezvalueis2.98.
UsingtheExcelNORMSDISTfunctionortheStandardNormalTable,youcanfindthecorrespondinglefttail
probabilityof0.0014.Foratwosidedtest,youdoublethisnumbertofindthepvalue,inthiscase0.0028.
Sincethispvalueislessthanthesignificancelevel,youcanrejectthenullhypothesis.Moreover,younowcansay
thatyouarerejectingthenullhypothesisatthe0.0028levelofsignificance.YoucanrecommendtoBlanchethat
shehavethelabelingchangedonthepretzelbags,andadjusthermarketingaccordingly.

Exercise2:Neshey'sRevisited
Inthefollowingchallengeyouwillrevisitanearlierproblem,thistimesolvingitwiththepvaluemethod.
YouaretheplantmanagerofaNeshey'schocolatefactory.Theshopwasfloodedduringtherecentstorms.The
machinethatwrapsNeshey'spopularchocolateconfection,Smooches,stillworks,butyouareafraiditmaynotbe
workingatitsformercapacity.
Ifthemachineisn'tworkingattopcapacity,youwillneedtohaveitreplaced.
Thehourlyoutputofthemachineisnormallydistributed.Beforetheflood,themachinewrappedanaverageof
340Smoochesperhour.Overthefirstweekaftertheflood,youcountedwrappedSmoochesduring32randomly
selectedonehourperiods.Themachineaveraged318Smoochesperhour,withastandarddeviationof44.
Youconductaonesidedhypothesistestusinga95%confidencelevel.Accordingtoyourcalculations,youshould:
UtilityforSinglePopulations
Excel
ztable
Thenullhypothesisisthatthepopulationmeanisnolowerthan340.Thealternativehypothesisisthatthe
populationmeanisnowlessthan340.Youareusingaonetailedtest,andyouareassumingthatthenew
populationmeanislowerthanthepopulationmeanbeforetheflood.
Identifythevaluesoftherelevantquantities.Usetheappropriateformulaandcalculatethezvalue.Thezvalueis
2.83.
Thiszvaluecorrespondstoalefttailprobabilityof0.0023.Thisisthetailyouareinterestedin,sinceyouare
conductingaonesidedtotesttoseeiftheactualpopulationmeanislessthanitwasinthepast.Thistail
probabilityisthepvalue.
59/135

5/13/2016

QuantitativeMethodsOnlineCourse

Sincethepvalueislessthanthesignificancelevel,yourejectthenullhypothesisthatthepopulationmeanis
unchanged.Moreover,younowcansaythatyouarerejectingthenullhypothesisatthe0.0023levelof
significance.Youshouldreplacethemachine.

ComparingTwoPopulations
Nowsatisfiedwithyouranalysisoftherestaurant,Leoasksyoutocomparethediscretionaryspendinghabitsoftwo
categoriesofguests:leisureandbusiness.

LeisureGuestsvs.BusinessGuests:Whospendsmore?
Everyhotelmanagerwrestleswiththeproblemofstretchinglimitedmarketingresources.Iwanttomakesurethat
I'mwiselyallocatingeachmarketingdollar.
Leisureguests,suchastouristsandhoneymooners,areespeciallyattractedtoHawaii.Also,manyprofessional
associationsliketohavetheirconventionshere,soourislandsattractbusinesstravelers,whomixbusinessand
pleasure.
Businesstravelerspaylowerroompricesbecauseconferencesbookroomsinbulk.Bulkreservationsaregoodforme
becausetheykeepmyoccupancylevelshigh.
However,Idon'thaveagoodsenseofwhetherthediscretionaryspendingofmybusinessguestsisdifferentfrom
thatofmyleisureguests:theymaytakefewerscubalessonsbutusethespaservicesmore,forexample.
Canyouhelpmefigureoutwhetherthereisanysignificantdifferencebetweenleisureandbusinesstravelers'
discretionaryspendinghabits?Yourconclusionsmightinfluencemymarketingefforts.
Icollectedtworandomsamples:oneofleisureguestsandoneofbusinessguests.Notincludingroom,meal,and
beveragechargesleisuretravelersspentanaverageof$75aday,comparedto$64adayforthebusinesstravelers.
Iknewthatthedifferencebetweenthetwoaveragesofthetwosamplescouldbeduetochance,soIthoughtI'dhave
youdoahypothesistesttofindout.
WhenIwascompilingthedataforyou,Irealizedthatmysampleswereofdifferentsizes.Iwasabletoget85leisure
gueststorespond,butonly76businessguestsreturnedmysurvey.
Whichfigurewillyouuseasthesamplesize?Orwillyouaddthemtogether?
Ialsorealizedthatwiththesedata,you'dhavetocalculatetwosamplestandarddeviations,oneforeachsample.
Howdoyougoaboutsolvingaproblemlikethis?

UsingHypothesisTeststoCompareTwoPopulationMeans
Howdoyoutestwhethertwopopulationshavedifferentmeans?
Sofar,we'veusedhypothesisteststostudythemeanorproportionofasinglepopulation.Often,managerswantto
comparethemeansorproportionsoftwodifferentpopulations:inthiscase,weuseatwopopulationhypothesis
test.Let'sclarifywhenweuseeachtypeoftest.
Weconductsinglepopulationtestswhenwehaveaninitialvalueforapopulationmeanandwanttotesttoseeifit
iscorrect.Singlepopulationtestsareespeciallyusefulwhenwesuspectthatthepopulationmeanhaschanged.For
example,weuseasinglepopulationtestwhenweknowthehistoricalaverageofapopulationandwanttotest
whetherthathistoricalaveragehaschanged.
Weconducttwopopulationteststocompareacharacteristicoftwogroupsforwhichwehaveaccesstosampledata
foreachgroup.Forexample,we'duseatwopopulationtesttostudywhichoftwoeducationalsoftwarepackages
betterpreparesstudentsfortheGMAT.Dothestudentsusingpackage1performbetterontheGMATthanthe
studentsusingpackage2?
Intwopopulationtests,wetaketwosamples,onefromeachpopulation.Foreachsample,wecalculatethesample
mean,standarddeviation,andsamplesize.
Wecanthenusethetwosetsofsampledatatotestclaimsaboutdifferencesbetweenthetwopopulations.For
example,whenwewanttoknowwhethertwopopulationshavedifferentmeans,weformulateanullhypothesis
statingthatthemeansarenotdifferent:thefirstpopulationmeanisequaltothesecond.
60/135

5/13/2016

QuantitativeMethodsOnlineCourse

Let'slookattheGMATsoftwarepackageexamplemoreclosely.Themanagerofoneeducationalsoftwarecompany
mightwonderiftheaverageGMATscoreofstudentsusinghersoftwareisdifferentfromtheaverageGMATscoreof
studentsusingthecompetitor'ssoftware.
SincethemanageronlywantstotestiftheaverageGMATscoresaredifferent,sheconductsatwosidedhypothesis
testfortwopopulations.ThenullhypothesisstatesthatthereisnodifferencebetweentheaverageGMATscoresof
thestudentswhousethetwocompanies'software.
ThealternativehypothesisstatesthattheaverageGMATscoresofthestudentswhousethetwocompanies'software
aredifferent.
WedenotetheaveragescoresofthetwopopulationsbytheGreeklettermuanddistinguishthemwithsubscripts.
Ourhypothesesare:
Tobe95%confidentintheresultofthetest,weuseasignificancelevelof0.05.
Wecollecttwosamples,onefromeachpopulation.Wedenotethesamplemeanswiththefamiliarxbar,whichwe
againdistinguishwithsubscripts.
WeareabletocollecttheGMATscoresof45peoplewhousedthecompany'ssoftware,and36peoplewhousedthe
competitor'ssoftware.Aswewillseeshortly,thedifferentsamplesizeswillnotposeaproblem.
Therespectivesamplemeansare650and630,andthestandarddeviationsare60and50.
Couldthetworandomsampleswepickedjusthappentohavedifferentmeansbychancebutreallyhavecomefrom
populationsthathavethesamepopulationmeans?
Thenullhypothesisstatesthatthereisnodifferenceinthetwopopulationmeans.Aswithsinglepopulationtests,
wetestthenullhypothesisbyaskinghowlikelyitwouldbetoproducethesampleresultsifthenullhypothesisisin
facttrue.
Thatis,iftheaverageGMATscoresforstudentsusingthetwodifferentsoftwarepackagesactuallyarethesame,
whatisthechancethattwosampleswecollectwouldhavesamplemeansasdifferentas650and630?
Ourintuitiontellsusthatthegreaterthedifferencebetweenthemeansofthetwosamples,themorelikelyitisthat
thesamplescamefromdifferentpopulations.Buthowdoweknowwhenthenumericaldifferenceislargeenoughto
bestatisticallysignificant?Whendowehaveenoughevidencetoactuallyconcludethatthetwopopulationsmustbe
different?
Weusepvaluestoanswerthisquestion.Firstwecalculateazvalueforthedifferenceofthesamplemeans,
incorporatingthedatafrombothpopulations.Itlooksabitcomplicated:
Let'scomputethezvalueforourexample.Sinceweassumethatthenullhypothesisistrue,wehave:
Usingtheformula,wefindthatthezvalueis1.64.
Foratwosidedtest,azvalueof1.64translatesintoaprobabilityinonetailof0.05,andthusapvalueof0.10.
Sincethispvalueisgreaterthanthesignificancelevelof0.05,wecannotrejectthenullhypothesis.
Inotherwords,thehighpvaluetellsusthatthereisinsufficientevidencefromthetwosamplestoconcludethatthe
averageGMATscoreofthestudentswhousethecompany'ssoftwareisdifferentfromtheaverageGMATscoreof
studentswhousethecompetitor'ssoftware.
Twopopulationhypothesistestscanbeperformedusingtheformulashownabove,oryoucanclickheretoaccessthe
Excelutilityforhypothesistesting.

Summary
Inahypothesistestfortwopopulationmeans,weassumeanullhypothesis:thatthetwopopulationmeansare
equal.Wecollectasamplefromeachpopulationandcalculateitssamplestatistics.Wecalculateapvalueforthe
differencebetweenthetwosamples.Ifthepvalueislessthanthesignificancelevel,werejectthenullhypothesis.

HypothesisTestsforTwoPopulationProportions
Often,managerswanttoknowiftwopopulationproportionsareequal.Forexample,amarketingmanagerofa
61/135

5/13/2016

QuantitativeMethodsOnlineCourse

packagedsnackfoodscompanymightwanttocomparethesnackfoodhabitsofdifferentstatesintheUS.
ThemarketingmanagermightthinkthattheproportionofconsumerswhofavorpotatochipsinTexasisdifferent
fromtheproportionofconsumerswhofavorpotatochipsinOklahoma.
Comparingtwopopulationproportionsissimilartocomparingtwopopulationmeans.Wehavetwopopulations:
thenullhypothesisstatesthattheirproportionsarethesamethealternativehypothesisstatesthattheyare
different.
Wecollectasamplefromeachpopulationandcalculateitssamplesizeandsampleproportion.Asinthesingle
populationproportiontest,wedon'tneedtofindthesamplestandarddeviation,sinceweknowthatthe
populationstandarddeviationisthesquarerootof
[p*(1p)].
Similarlytothehypothesistestsforcomparingtwopopulationmeans,wecalculateazvalueforthedifference
betweentheproportionsusingtheformulabelow:
Wetranslatethezvalueintoapvaluejustaswewouldforanyothertypeofhypothesistest.Ifthepvalueisless
thanoursignificancelevel,werejectthenullhypothesisandconcludethattheproportionsaredifferent.Ifthep
valueisgreaterthanthesignificancelevel,wedonotrejectthenullhypothesis.

OptionalExample
Let'stakeacloserlookatthestudyofsnackinghabitsinTexasandOklahoma.
Themanagerdoesnotwishtotestforaparticulardirectionofdifferencehejustwantstoknowifthe
proportionsaredifferent.Thus,heshoulduseatwosidedtest.
Themarketingmanagerwantstobe95%confidentintheresultofthistest,sothesignificancelevelis0.05.
Supposewecollectresponsesfrom400peopleinTexasand225peopleinOklahoma.Thesampleproportions
are45%and35%,respectively.
Couldthetworandomsampleswepickedjusthappentohavedifferentsampleproportions?Thatis,ifthetrue
proportionsofTexansandOklahomansfavoringpotatochipsactuallyarethesame,whatwouldbethechance
thatthesampleproportionsare45%and35%respectively?
Weusepvaluestoanswerthisquestion.First,wecalculateazvalueforthedifferenceofthesample
proportionsthatincorporatesthedatafrombothpopulations.Thenullhypothesisstatesthatthepopulation
proportionsareequal,sotheirdifferenceis0.
Thezvalueis2.48.
Foratwosidedtest,azvalueof2.48translatesintoaprobabilityinonetailof0.0065andhenceapvalueof
0.013.
Sincethispvalueislessthanthesignificancelevelof0.05,wecanrejectthenullhypothesis.
Inotherwords,thelowpvaluetellsusthatthereissufficientevidencefromthesamplestoconcludethatthere
isadifferencebetweentheproportionsofTexanandOklahomanpotatochiplovers.Wecanmakethisclaimata
0.013levelofsignificance.
Twopopulationhypothesistestsforpopulationproportionscanbeperformedusingtheformulashownabove,
oryoucanclickheretoaccesstheExcelutilityforhypothesistesting.

Summary
Inahypothesistestfortwopopulationproportions,weassumeanullhypothesis:thetwopopulation
proportionsareequal.Wecollecttwosamplesandcalculatethesampleproportions.Wecalculateapvaluefor
thedifferencebetweenthesampleproportions.Ifthepvalueislessthanthesignificancelevel,werejectthe
nullhypothesis.

ExcelUtility(TwoPopulations)
ClickheretoopenanExcelUtilitythatallowsyoutoperformhypothesistestsfortwopopulations.
62/135

5/13/2016

QuantitativeMethodsOnlineCourse

Makesureyoudoatleastoneexamplebyhandtoensureyouthoroughlyunderstandthebasicconceptsbefore
usingtheutility.Youshouldenterdataonlyintheyellowinputareasoftheutility.Toensureyouareusingthe
utilitycorrectly,trytoreproducetheresultsfortheGMATandpotatochipexamples.

SolvingtheLeisurevs.BusinessGuestSpendingProblem
Twopopulationhypothesistestshelpyoudeterminewhethertwopopulationshavedifferentmeans.Youuseatwo
populationtesttosolveLeo'sproblem.
Youhavetofindoutifleisureguests'averagedailydiscretionaryspendingisdifferentfrombusinessguests'average
dailydiscretionaryspending.
Leohasprovidedthesedata:
Nowit'stimetostatethenullhypothesis.Thebestformulationis:
Youwanttofindoutifthemeansoftwopopulationsaveragespendingbyleisureguestsvs.averagespendingby
businessguestsaredifferent.Thetwosamplesfromthosepopulationshavedifferentmeans:$75and$64,
respectively.
Thesamplesmaycomefrompopulationswiththesamemeans,andthenumericaldifferenceisduetochancein
gettingtheseparticularsamples.Itcouldbethatthefirstsamplejusthappenedtohaveahighmeanandthesecond
samplejusthappenedtohavealowermean.
Youtestthenullhypothesisthatthepopulationmeanshavethesamevalue.
Youmakenoteofyournullhypothesisandthecorrespondingalternativehypothesis.Youuseatwosidedtest
becauseyoudon'thaveanyreasontobelievethatonetypeofguestspendsmorethantheother.
AtLeo'srequestyoudoapvaluetestusingasignificancelevelof0.05.Tocalculatethepvalue,youfirstfindthez
value.
Enterthezvalueasadecimalnumberwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Roundif
necessary.
ztable
UtilityforTwoPopulations
Thezvalueis+2.12or2.12,dependingonhowyousetupthedifference,i.e.,inwhatorderyousubtractthesample
means.Eitherway,thefinalconclusionwillbethesame.
Youusethezvaluetocalculatethepvalue.Thepvalueis:
ztable
UtilityforTwoPopulations
Azvalueof2.12hasacumulativeprobabilityof0.9830.Yousubtractthisprobabilityfromthetotalprobability,1,
forarighttailprobabilityof0.017.
Becausewearedoingatwosidetest,wewanttomeasuretheprobabilityofextremevaluesonbothsides.Thus,we
double0.017togetapvalueof0.034.
Sincethepvalue0.034islessthanthesignificancelevel0.05,yourecommendtoLeothatherejectthenull
hypothesis.Theaveragedailydiscretionaryspendingperpersonisdifferentforleisureandbusinessguests.
Leoreadsyourreport:
Isee.Wecantelliftwopopulationmeansaredifferentbyrunningahypothesistestontheirdifference.Wetestthe
nullhypothesisthatthereisnodifference.
Asyousee,thepvalueislessthanthesignificancelevel.Thistellsyouthat...
...thedifferenceinthetwosamplemeansisprobablynotduetochance.Thespendinghabitsofthetwotypesof
guestsaredifferent.Gotit.

Exercise1:TheBurgerBaron
63/135

5/13/2016

QuantitativeMethodsOnlineCourse

ClaudeForbesisaregionalmanagerofTheBurgerBaronrestaurantchain.TheBaronwouldliketoopena
franchiseinSappington.Twopropertiesareupforsale,andClaudewantstochoosethelocationthathasthemost
traffic.
On54randomlyselecteddaysoverasixmonthperiod,anaverageof92carspassedbylocationAduringthe
lunchhour,fromnoonto1PM.On62randomlyselecteddays,101carspassedbylocationBduringthelunchhour.
Thestandarddeviationsforthetwolocationswere16and23,respectively.

IsthedifferencebetweentheamountoftrafficatlocationsAandBstatisticallysignificantatthe0.01level?
ztable
UtilityforTwoPopulations
First,yousetupthehypotheses.Thenullhypothesisstatesthatthereisnodifferenceinmeantrafficflowduring
thelunchhouratthetwolocations.Sinceyouareonlyinterestedinwhetherthetrafficisdifferentatthetwo
locations,youconductatwosidedhypothesistest.
Next,youfindthezvalueforthedifferencebetweenthetwosamplemeansusingtheappropriateformula.Thez
valueis2.47
Azvalueof2.47correspondstoalefttailprobabilityof0.0068.Sinceyouaredoingatwosidedtest,youmust
doublethisprobabilitytofindthecorrectpvalue:0.0136.
Thepvalueishigherthanthesignificancelevel.Youcannotrejectthenullhypothesis.Onthebasisofthesedata,
Claudehasinsufficientevidencetoshowthatonelocationhasmoretrafficthantheother.

Exercise2:KarnivorousKongvs.PeterthePipsqueak
TheMagicalToycompanymanufacturesalineofwrestlingactionfigures.MaudeTroston,theheadofthedoll
department,mustmakethefinaldecisionregardingwhichoftwomodels,"KarnivorousKong"or"Peterthe
Pipsqueak,"shouldbediscontinued.
MagicalToyshipscratescontainingequalnumbersofPipsqueaksandKongstoretailoutlets.45randomlyselected
toystoreshavesoldanaverageof78%ofallPipsqueaksdeliveredtothem.48otherstoreshavesoldanaverageof
84%oftheirKongs.
Atasignificancelevelof0.05,dothedataindicatethatoneactionfiguresellsbetterthantheother?
ztable
UtilityforTwoPopulations
Maudewantstoknowifonefiguresellsbetter,butdoesn'thaveapreconceptionaboutwhichfiguremightbe
flyingofftheshelves.Thusyouuseatwosidedtest,andyouralternativehypothesissimplystatesthatthereisa
differencebetweenthetwopopulationproportions.
Usingtheappropriateformulaandthedata,youfindthezvalueforthedifferencebetweenthesample
proportions.Thezvalueis0.738.
Azvalueof0.738correspondstoalefttailprobabilityof0.2303.Youdoublethistofindthepvalueforthetwo
sidedtest.Thepvalueis0.4606
Thepvalueisgreaterthanthesignificancelevel.Therefore,youcan'trejectthehypothesis.Thedatadonotshow
conclusivelythatoneactionfiguresellsbetterthantheother.

GrapefruitBizarre
TheRegalBeverageCompanymakesthesoftdrinkGrapefruitBizarre.Themarketingdepartmentwantsto
refocusitsenergiesandresources.
Youhavebeenaskedtodetermineifthereareregionaldifferencesinconsumers'responsetoadvertisementsfor
GrapefruitBizarre.Specifically,youmustfindoutiftheMidwestrespondstoGrapefruitBizarreadvertisements
aswellastheWestCoast.
Themarketingdepartmentisclamoringtostartasecondcampaign.Itclaimsthatadsthatareeffectiveonthe
64/135

5/13/2016

QuantitativeMethodsOnlineCourse

WestCoastdonotgooveraswellintheMidwest.Managementdemandsstatisticalevidenceatasignificancelevel
of0.05.
Inthecontextofafreemoviescreening,anadforGrapefruitBizarreisshownto173Midwesterners.Theviewers
hadbeenrandomlyselected,andhadnotpreviouslytastedthedrink.
Whenaskedlater,33%claimedthattheywereatleastmildlyinterestedintryingGrapefruitBizarre.Inasimilar
surveyconductedontheWestCoast,42%of152testsubjectsclaimedatleastamildinterestintryingGrapefruit
Bizarre.
Youcalculatethezvalueforthedifferenceinsampleproportions.
Enteryourzvalueasadecimalnumberwith2digitstotherightofthedecimal,(e.g.,enter"5"as"5.00").Round
ifnecessary.
ztable
UtilityforTwoPopulations
Thezvalueis+or1.68,dependingonhowyousetupthedifference,i.e.,inwhatorderyousubtractthesample
means.
Azvalueof1.68correspondstoalefttailprobabilityof0.0465.Whatdoyoureporttothemarketing
department?
ztable
UtilityforTwoPopulations
Youareconductingaonesidedhypothesistest.Thealternativehypothesisstatesthattheproportionofthe
MidwestsampleislessthantheproportionoftheWestCoastsample.Therefore,youareinterestedonlyinthe
lefttailprobability.Yourpvalueis0.0465.
0.0465islessthanthesignificancelevel0.05.Youshouldrejectthenullhypothesis.Midwesternersdonotrespond
totheadsaswellaspeoplefromtheWestCoast.Marketing'sclaimsarevalid.

Challenge:LeMerFashionDesign
UpscalefashiondesignerMarjorieLeMermustdecidefromwhichsuppliersheshouldpurchaseboltsofcloth.
RumorhasitthatBlueTex'sproductissuperiortoSouthernHalifax's.
Random10yardsectionsfrom43boltsofHalifax'sclothcontainameanof1.8flawsperyard.Similarsections
from42boltsofBlueTex'sproductcontain1.6flawsperyard.Thestandarddeviationsare0.3and0.6,
respectively.
MarjoriewantsyoutofindoutiftherumorsthatBlueTexmakesabetterproductarestatisticallywarranted.
Youconductaonesidedtest.Whichofthefollowingisthebestalternativehypothesis?
ztable
UtilityforTwoPopulations
Atwhatlevelarethesedatasignificant?
ztable
UtilityforTwoPopulations
Youfindthezvalueforthedifferencebetweenthetwosamplemeansusingtheappropriateformula.Thezvalue
is1.94.
Thecumulativeprobabilityforz=1.94is0.0262.Thisisthelefttailprobability.Sinceyouarerunningaone
sidedtest,0.0262isyourpvalue.
0.0262isgreaterthan0.01.Atthissignificancelevel,youwouldnotrejectthenullhypothesis.0.0262islessthan
0.05.Atthissignificancelevel,youwouldrejectthenullhypothesis.
SouthernHalifax'sproductisslightlycheaperthanBlueTex's.Allotherfactorsbeingequal,Marjoriewouldliketo
buythelessexpensiveproduct.Unlesssheis99%confidentthatthereisadifferenceinquality,shewillgowith
thecheapercloth.
65/135

5/13/2016

QuantitativeMethodsOnlineCourse

Basedonthisinformationandyourcalculationsdata,Marjorieshould:
ztable
UtilityforTwoPopulations
Thedataarenotsignificantatthe0.01level,soyoucan'trejectthenullhypothesisthattheyaredifferent.
Asignificancelevelof0.01correspondstoaconfidencelevelof99%.Soata99%confidencelevel,youcan'treject
thenullhypothesis.Marjoriecan'tbe99%confidentthatBlueTex'sproductisbetterthanHalifax's.
SinceHalifax'sproductischeaperandyoucan'tdetermineadifferenceinqualityatthelevelofstatistical
significantMarjorierequires,yourecommendHalifaxtoMarjorie.
"Goodwork!"saysAlice."You'rereadyforanewchallenge:investigatingrelationshipsbetweenvariables."

RegressionBasics
Introduction
Asyourelaxinyourroomduringabriefafternoondownpour,yourphonerings.

Leo'sBisqueDebacleandtheStaffingProblem
Leojustcalled.Hewantsustocometohisofficeimmediately.Hesoundsalittleangry.We'dbetternotkeephim
waiting.
I'msorryifIwasshortonthephone.I'mveryupset.Wejusthadalittleincidentdownintherestaurant.Aserver
spilledatureenofcrabbisqueononeofourmost"favored"guests,Mr.Pitt.
TheKahana'soccupancythisyearhasbeenhigherthanIexpected,andIhadtohireextrahelpfromastaffing
agency.Thosestaffingagencieschargeafortune,whichisespeciallyirritatingconsideringthattheemployeesthey
refertousareoftenpoorlysuitedtocustomerserviceinanupscalehotel.
Really,thisismyfaultfornothavingamoreeffectivestaffingprocess.IjustwishIcouldpredictmyneedsbetter.
Sometimes,whendemandislowerthanIexpected,I'moverstaffed.ThenIlosemoneypayingidlebellhops.IfIhad
agoodsenseofmystaffingneedsatleastamonthinadvance,Icouldavoidhiringworkersatthelastminuteand
havingidlestaff.
Ihadbeenthinkingthatthenumberofadvancereservationswouldgivemeagoodideaofhowhighmyoccupancy
wouldbeamonthdowntheroad.Butclearlyadvancereservationsdon'ttellmethewholestory.I'vebeenmaking
waytoomanyfalsepredictions.
Isthereanythingyoucandotohelpmehere?WhatpredictionsaboutoccupancycanImakebasedonadvance
bookings?AndhowmuchcanItrustthem?
We'lltakealookatthedataonadvancebookingsandoccupancyandletyouknowwhatwefindout.

IntroducingtheRegressionLine
AliceseemsconfidentthatthetwoofyoucanofferusefuladviceonLeo'sstaffingproblem:
"Thiswillbeagreatopportunityforyoutolearnregression.It'sapowerfulstatisticaltoolusedallthetimein
business:infinance,demandforecasting,marketresearchtonamejustafewareas.I'msureyou'lluseitinyour
MBAprogram.Andit'sagreatchancetoreviewwhatyou'velearnedsofar:sampling,confidenceintervals,and
hypothesistestingallplayapartinregression."
Aswehaveseen,itisoftenusefultoexaminetherelationshipbetweentwovariables.Usingscatterdiagrams,wecan
visualizesuchrelationships.
Wecanlearnmoreabouttherelationshipbyfindingthecorrelationcoefficient,whichmeasuresthestrengthofthe
linearrelationshiponascalefrom1to1.
Regressionisastatisticaltoolthatgoesevenfurther:itcanhelpusunderstandandcharacterizethespecific
structureoftherelationshipbetweentwovariables.
66/135

5/13/2016

QuantitativeMethodsOnlineCourse

Let'slookatanexample.JuliusTabinownsasmallfoodprocessingcompanythatproducesthespreadable
lunchmeatproductEasyMeat.Juliusistryingtounderstandtherelationshipbetweenhisfirm'sadvertisingandits
sales.
Totalsalesinthespreadablemeatindustryhavebeenfairlyflatoverthelastdecade,andJulius'competitors'actions
havebeenquitestable.Juliusbelievesthathisadvertisinglevelsinfluencehisfirm'ssalespositively,buthedoesn't
haveaclearunderstandingofwhattherelationshiplookslike.
Let'shavealookatdataonhisfirm'sadvertisingandsalesoverthelast10years.ClickontheExcellinktocreatethe
scatterdiagramyourselffromanExcelspreadsheet.
EasyMeatData
Plottingannualsalesagainstannualadvertisingexpendituresgivesusavisualsenseoftherelationshipbetweenthe
twovariables.Lookingatthegraph,wecanseethatasadvertisinghasgoneup,saleshavegenerallyincreased.The
relationshiplooksreasonablylinear.
EasyMeatData
Thecorrelationcoefficientforthetwovariablesis0.93,indicatingastronglinearrelationshipbetweenadvertising
andsales.
EasyMeatData
Whatifweweretodrawalinethatcharacterizesthisrelationship?Whichlinewouldbestfitthedata?Ourmind's
eyealreadyseeshowthetwovariablesarerelated,buthowcanweformalizeourvisualimpression?
EasyMeatData
Beforewestartanycalculations,let'slookatseverallinesthatcoulddescribetherelationship.
EasyMeatData
Oneoftheselinesmostaccuratelydescribestherelationshipbetweenthetwovariables:the"bestfit"orregression
line.
EasyMeatData
Inourexample,thebestfitlineisSales=333,831+50*Advertising.Forthisline,theyinterceptis
333,831andtheslopeis50.
EasyMeatData
Ingeneral,aregressionlinecanbedescribedbyasimplelinearequation,y=a+bx,withyinterceptaandslopeb.
EasyMeatData
Inthisequation,theyvariable,sales,iscalledthedependentvariable,tosuggestthatwethinkJulius'sales
dependtosomedegreeonhisadvertising.Thexvariable,advertising,iscalledtheindependentvariable,orthe
explanatoryvariable.
EasyMeatData
Whenweobservethatachangeintheindependentvariable(hereadvertising)istypicallyaccompaniedbya
proportionalchangeinthedependentvariable(heresales),regressionanalysiscanidentifyandformalizethat
relationship.
EasyMeatData

Summary
Regressionanalysishelpsusfindthemathematicalrelationshipbetweentwovariables.Wecanuseregressionto
describealinearrelationship:onethatcanberepresentedbyastraightlineandcharacterizedbyanequationof
theformy=a+bx.

TheUsesofRegression
67/135

5/13/2016

QuantitativeMethodsOnlineCourse

Whatkindsofquestionscanregressionanalysishelpanswer?
Howdoesregressionhelpusasmanagers?Incanhelpintwoways:first,ithelpsusforecast.Forexample,wecan
makepredictionsaboutfuturevaluesofsalesbasedonpossiblefuturevaluesofadvertising.
Second,ithelpsusdeepenourunderstandingofthestructureoftherelationshipbetweentwovariablesby
expressingtherelationshipmathematically.
EasyMeatData

UsingRegressionforForecasting
Let'stalkfirstabouthowmanagerscanuseregressiontoforecast.Inourexample,regressioncanhelpJulius
predicthiscompany'ssalesforaspecifiedlevelofadvertising.
Forexample,ifheplanstospend$65,000inadvertisingnextyear,whatmightweexpectsalestobe?
Ifwedidn'tknowanythingabouttherelationship,butonlyhadthehistoricaldata,wemightsimplynotethatthe
lasttimeJuliusspent$65,000onadvertising,hissaleswere$3,200,200.Butisthisthebestpredictionwecan
make?
EasyMeatData
Notatall.Regressionanalysisbringstheentiredatasettobearonourprediction.Ingeneral,thiswillallowusto
makemoreaccuratepredictionsthanifweinferthefuturevalueofsalesfromasingleobservationofadvertising
andsales.Havingidentifiedtherelationshipbetweenthetwovariablesfromthefulldataset,wecanapplyour
understandingofthatrelationshiptoourforecast.
EasyMeatData
Usingregressionanalysis,wefoundtheregressionlinetobeSales=333,831+50*Advertising.IfJuliusplansto
spend$65,000inadvertising,whatwouldwepredictsalestobe?
EasyMeatData
Thepointonthelineshowsuswhatlevelofsalestoexpect.Inthiscase,wewouldexpectsalesof$2,916,169.
EasyMeatData
Withregression,wecanforecastsalesforanyadvertisinglevelwithintherangeofadvertisinglevelswe've
seenhistorically.Forexample,evenifJuliushasneverspentexactly$50,000onadvertising,wecanstill
forecastacorrespondinglevelofsales.
EasyMeatData
Wemustbeextremelycautiousaboutforecastingsalesforvaluesofadvertisingbeyondtherangeofvalueswe
havealreadyobserved.Thefurtherwearefromthehistoricalvaluesofadvertising,themoreweshouldquestion
thereliabilityofourforecast.
EasyMeatData
Forexample,wemightfeelcomfortableforecastingsalesforadvertisinglevelsabitabovetheobservedrange
perhapsashighas$100,000or$105,000.Butweshouldn'tinferthatifJuliusspent$10milliononadvertising,he
wouldachieve$500millioninsales.Thetotalmarketforspreadablemeatisprobablymuchlessthan$500million
annually!
EasyMeatData
Likewise,wemightfeelcomfortableforecastingsalesforadvertisinglevelsjustbelowtheobservedrange.Butwe
certainlyshouldn'treportthatifJuliusspent$0onadvertisinghewouldhavenegativesales!
EasyMeatData
Ifwetrytouseourregressionequationtoforecastsalesforadvertisinglevelsoutsideofthehistoricalrange,we
areimplicitlyassumingthattherelationshipbetweenadvertisingandsalescontinuestobelinearoutsideofthe
historicalrange.
EasyMeatData
68/135

5/13/2016

QuantitativeMethodsOnlineCourse

Inreality,althoughtherelationshipmaybequitelinearfortherangeofvalueswe'veobserved,thecurvemaywell
leveloffforadvertisingvaluesmuchlowerormuchhigherthanthosewe'veobserved.Withnoobservations
outsidethehistoricaldatarange,wesimplydon'thaveevidenceaboutwhattherelationshiplookslikethere.
EasyMeatData
Anothercriticalcaveattokeepinmindisthatwheneverweusehistoricaldatatopredictfuturevalues,weare
assumingthatthepastisareasonablepredictorofthefuture.Thus,weshouldonlyuseregressiontopredictthe
futureifthegeneralcircumstancesthatheldinthepast,suchascompetition,industrydynamics,andeconomic
environment,areexpectedtoholdinthefuture.

TheStructureofaRelationship
Regressioncanbeusedtodeepenourunderstandingofthestructuralrelationshipbetweentwovariables.Ifwe
thinkaboutit,manybusinessdecisionsareaboutincreasingordecreasingonevariableinvestmentsor
advertising,forexampletoaffectsomeothervariableproductivity,brandrecognition,orprofits,forexample.
Regressioncanrevealthestructureofrelationshipsofthistype.
Ourregressionanalysisstipulatesalinearrelationshipbetweensalesandadvertising.Understanding"the
structure"ofthisrelationshiptranslatesintofindingandinterpretingthecoefficientsoftheregressionequation.
Aswe'venotedabove,theconstantterm333,831mayhavenorealmanagerialsignificanceitjust"anchors"the
regressionlinebytellingustheyintercept.We'veneverseenadvertisinglevelscloseto$0,sowecannotinferthat
spendingnomoneyonadvertisingwillleadtosalesof$333,831!
Themoreimportanttermistheadvertisingcoefficient,50,whichgivesustheslopeoftheline.Theadvertising
coefficienttellsushowsaleshavechangedonaverageasadvertisinghasincreased.
Inthepast,whenadvertisinghasincreasedby$10,000,whathasbeentheaveragecorrespondingchangeinsales?
Assumingthattherelationshipbetweensalesandadvertisingislinear,each$1increaseinadvertisingshouldbe
accompaniedbythesameaverageincreaseinsales.Inourexample,foreveryincremental$1inadvertising,sales
increaseonaverageby$50.Thus,foreveryincremental$10,000inadvertising,salesincreaseonaverageby
$500,000.
Theregressionlinegivesusinsightintohowtwovariablesarerelated.Asonevariableincreases,byhowmuch
doestheothervariabletypicallychange?Howmuchgrowthinsalescanweanticipatefromanincremental
increaseinadvertisingexpenditures?Regressionanalysishelpsmanagersanswerquestionslikethese.

Summary
Weuseregressionanalysisfortwoprimarypurposes:forecastingandstudyingthestructureoftherelationship
betweentwovariables.Wecanuseregressiontopredictthevalueofthedependentvariableforaspecifiedvalue
oftheindependentvariable.Theregressionequationalsotellsushowthedependentvariablehastypically
changedwithchangesintheindependentvariable.

Exercise1:SoftDrinkConsumption
Percapitaconsumptionofsoftdrinkbeveragesisrelatedtopercapitagrossdomesticproduct(GDP).Generally,
thehighertheGDPofacountry,themoresodaitscitizensconsume.Softdrinkconsumptionismeasuredin
numberof8ozservings.
Basedondatafrom12countries,therelationshipcanbeexpressedmathematicallyas:
Source
Basedonthisrelationship,youcanexpectthat,onaverage,foreachadditional$1,000ofpercapitaGDPa
country'ssodaconsumptionincreasesby:
Theregressionequationtellsusthatinourdataset,averagesodaconsumptionincreasesby0.018servingsfor
everyadditional$1ofpercapitaGDP.So,foranadditional$1,000,averageconsumptionincreasesby($1,000)
(0.018servings/$)=18servings.
ThepercapitaGDPintheNetherlandsis$25,034.Whatdoyoupredictistheaveragenumberofservingsofsoda
consumedintheNetherlandsperyear?
69/135

5/13/2016

QuantitativeMethodsOnlineCourse

Enterpredictedaveragesodaconsumption(inservings)asaninteger(e.g.,"5").Roundifnecessary.
Theregressionequationtellsusthataveragesodaconsumption=130+0.018*(percapitaGDP).Therefore,we
anticipatetheNetherlands'averagesodaconsumptiontobe580.6servings.
Althoughtheregressionpredictsasodaconsumptionofaround581servingsperpersonfortheNetherlands,the
actualmeasurednumberofservingsconsumedismuchlower:362.Thediscrepancyintheactualandpredicted
consumptionreinforcesthatpercapitaGDPaloneisnotaperfectpredictorofsodaconsumption.

CalculatingtheRegressionLine
Aregressionlinehelpsyouunderstandtherelationshipbetweentwovariablesandforecastfuturevaluesofthe
dependentvariable.Alicepointsouttoyouthatthesetwofeaturesofregressionanalysismakeitapowerfultoolfor
managerswhomakeimportantdecisionsintheuncertainworldofbusiness.
Buthowdoyougeneratearegressionlinefromobserveddata?Ofallthestraightlinesthatyoucoulddrawthrougha
scatterdiagram,whichoneistheregressionline?

TheAccuracyofaLine
Let'sreturntoJuliusTabin'ssalesandadvertisingdata.Aswecanseefromthegraph,nostraightlinecouldbe
drawnthatwouldpassthrougheverypointinthedataset.
Thisisnotsurprising.Typically,advertisingisnotaperfectpredictorofsales,sowedon'texpecteverydatapointto
fallinaperfectline.Theregressionlinedepictsthebestlinearrelationshipbetweenthetwovariables.Weattribute
thedifferencebetweentheactualdatapointsandthelinetotheinfluencethatothervariableshaveonsales,orto
chancealone.
Sincetheregressionlinedoesnotpassthrougheverypoint,thelinedoesnotfitthedataperfectly.Howaccurately
doestheregressionlinerepresentthedata?
Tomeasuretheaccuracyofaline,we'llquantifythedispersionofthedataaroundtheline.Let'slookatonelinewe
coulddrawthroughourdataset.
Let'sconsiderasecondline.Clickonthelinethatmorecloselyfitsthetendatapoints.
Althoughinthisexamplewecanseewhichoftwolinesismoreaccurate,itisusefultohaveaprecisemeasureofa
line'saccuracy.
Toquantifyhowaccuratelyalinefitsadataset,wemeasuretheverticaldistancebetweeneachdatapointandthe
line.
Whydon'twemeasuretheshortestdistancebetweenthepointandthelinethedistanceperpendiculartothe
line?Whydowemeasurevertically?
Wemeasureverticaldistancebecauseweareinterestedinhowwellthelinepredictsthevalueofthedependent
variable.Thedependentvariableinourcase,salesismeasuredontheverticalaxis.Foreachdatapoint,we
wanttoknow:howcloseisthevalueofsalespredictedbythelinetothehistoricallyobservedvalueofsales?
Fromnowonwewillrefertothisverticaldistancebetweenadatapointandthelineastheerrorinpredictionor
theresidualerror,orsimplytheerror.Theerroristhedifferencebetweentheobservedvalueandtheline's
predictionforourdependentvariable.Thisdifferencemaybeduetotheinfluenceofothervariablesortoplain
chance.
Goingforward,wewillrefertothevalueofthedependentvariablepredictedbythelineasyhatand
totheactualvalueofthedependentvariableasy.Thentheerrorisy(yhat),thedifference
betweentheactualandpredictedvaluesofthedependentvariable.
Thecompletemathematicaldescriptionoftherelationshipbetweenthedependentandindependentvariablesisy=
a+bx+error.Theyvalueofanydatapointisexactlydefinedbytheseterms:thevalueyhatgivenbythe
regressionlineplustheerror,y(yhat).
Collectively,theerrorsinpredictionforallthedatapointsmeasurehowaccuratelyalinefitsasetofdata.
Toquantifythetotalsizeoftheerrors,wecannotjustsumeachoftheverticaldistances.Ifwedid,positiveand
negativedistanceswouldcanceleachotherout.
70/135

5/13/2016

QuantitativeMethodsOnlineCourse

Instead,wetakethesquareofeachdistanceandthensumallthesquares,similarlytowhatwedowhenwecalculate
variance.
Thismeasure,calledtheSumofSquaredErrors(SSE),ortheResidualSumofSquares,givesusagood
measureofhowaccuratelyalinedescribesasetofdata.
Thelesswellthelinefitsthedata,thelargertheerrors,andthehighertheSumofSquaredErrors.

Summary
Tofindthelinethatbestfitsadataset,wefirstneedameasureoftheaccuracyofaline'sfit:theSumofSquared
Errors.TofindtheSumofSquaredErrors,wecalculatetheverticaldistancesfromthedatapointstotheline,
squarethedistances,andsumthesquares.

IdentifyingtheRegressionLine
Nowthatyouhaveawaytomeasurehowwellalinefitsasetofdata,youneedawaytoidentifythelinethat"best
fits"thedata:theregressionline.
WecancalculatetheSumofSquaredErrorsforanylinethatpassesthroughthedata.Ofcourse,differentlineswill
giveusdifferentSumsofSquaredErrors.Thelinewearelookingfortheregressionlineistheonewiththe
smallestSumofSquaredErrors.
Let'slookatseverallinesthatcoulddescribetherelationshipbetweenadvertisingandsalesinourexample.Our
intuitiontellsusthatthemiddlelineisamuchbetterfitthanlineaorlineb.
Let'scheckourintuition.Foreachline,wecancalculatetheSumofSquaredErrorstodetermineitsaccuracy.
ThelowertheSumofSquaredErrors,themorepreciselythelinefitsthedata,andthehighertheline'saccuracy.
Thelinethatmostaccuratelydescribestherelationshipbetweenadvertisingandsalestheregressionlineisthe
linethatminimizesthesumofsquares.Findingtheregressionlineforasetofdataisacalculationintensiveprocess
bestlefttostatisticalsoftware.

Summary
ThelinethatmostaccuratelyfitsthedatatheregressionlineisthelineforwhichtheSumofSquaredErrors
isminimized.

PerformingRegressionAnalysis
Note:UnlessyouhaveinstalledtheExcelDataAnalysisToolPakaddin,youwillnotbeabletodoregression
analysisusingtheregressiontool.However,wesuggestyoureadthroughthefollowinginstructionstolearnhow
Excel'sregressiontoolworks,soyoucanrunregressionsinthefuture,whenyoudohaveaccesstotheData
AnalysisToolpak.
Performingregressionanalysisbyhandisatimeconsumingprocess.Fortunately,statisticalsoftwarepackages
andmajorspreadsheetprogramsExcel,forexamplecandothenecessarycalculationsforyouinamatterof
seconds.ClickontheExcellinktoaccessthedatafilesoyoucanpracticedoingtheanalysisinExcelasyouread
throughtheinstructions.
EasyMeatData
Let'sgothroughtheprocessstepbystep.WestartwithdataenteredintwocolumnsinanExcelspreadsheet.Each
columncontainsvaluesofavariable.Toperformregressionanalysis,theremustbeanequalnumberofentriesin
eachcolumn.
UndertheDatatabinthetoolbarweselecttheDataAnalysisoption.
Awindowpopsopencontaininganalphabeticallistofstatisticaltools.Weselect"Regression"andclick"OK".
Anewwindowopensofferingseveraloptionsforregressionanalysis.
Intheregressionwindow,weseeapromptfieldtitled''InputYRange.''Init,weenterC1:C11,therangeofcells
71/135

5/13/2016

QuantitativeMethodsOnlineCourse

containingthecolumnlabel(C1)andthedata(C2:C11)forthedependentvariable:Sales($).
Werepeatthisforthepromptfieldtitled''InputXRange,''enteringB1:B11toincludeboththecolumnlabel(B1)
andthedata(B2:B11)fortheindependentvariable:Advertising($).
Sinceweincludedthecolumnlablesinrow1inourranges,wemustcheckthe"Labels"box.Includinglabelsis
helpfulbecauseExcelusesthelabelstoidentifythevariablecoefficientsintheoutputsheet.Ifyoudonotinclude
thelabelsinyourranges,donotcheckthelabelbox,orExcelwilltreatthefirstrowofdataaslabels,excluding
thoseentriesfromtheregression.
Finally,weselecttheoutputoption"NewWorksheetPly:",enterthenameforthenewworksheet,andclick"OK."
Excelopensanewworksheetwiththenamewespecified.Init,weseeanintimidatingarrayofdata.
Forthemoment,wearemainlyinterestedintheentriesinthecellslabeled"Coefficients",whichspecifythe
interceptandslopeoftheregressionline.
Notethatthelabel"Advertising($)"hasbeencarriedoverfromtheoriginaldatacolumn.Thecoefficientinthe
"Advertising($)"rowistheslopeoftheregressionline.
Fortheexercisesinthisunit,westronglyrecommendyoufindtherelevantdatainanExcelspreadsheetand
performtheregressionanalysesyourself.IfyoudonothavetheAnalysisToolpak,youcanopenafilecontaining
therelevantregressionoutput.
EasyMeatData
EasyMeatRegression

Exercise1:SoftDrinkConsumptionRevisited
TopracticeusingExcel'sregressiontool,runaregressionusingtheworldsoftdrinkconsumptiondatafroman
earlierexercise.UsesoftdrinkconsumptionforthedependentvariableandpercapitaGDPfortheindependent
variable.
SoftDrinkConsumptionData
Whatistheslopeoftheregressionline?Entertheslopeasadecimalnumberwith3digitstotherightofthe
decimalpoint(e.g.,enter"5"as"5.000").Roundifnecessary.
SoftDrinkConsumptionData
SoftDrinkConsumptionRegression
Source
WeruntheregressionbyselectingrangeC1:C13fortheYrange,thedependentvariableconsumption,andB1:B13
fortheXrange,theindependentvariableGDPpercapita.Wecheckthelabelbox,andseetheoutputbelow.The
slopeoftheregressionlineisthecoefficientoftheindependentvariable,GDPpercapita.
Whatistheinterceptoftheregressionline?Entertheinterceptasaninteger(e.g.,"5").Roundifnecessary.
SoftDrinkConsumptionData
SoftDrinkConsumptionRegression
Source
Theinterceptofthelineisthecoefficientlabeled"Intercept."

DeeperintoRegression
Equippedwiththebasictoolsneededtofindandinterprettheregressionline,youfeelreadytotackleLeo's
assignment.ButAlicecautionsyounottobehastyandurgesyoutoconsidersometrickyquestions:"Howwelldoesthe
regressionlineactuallycharacterizetherelationshipinthedata?Isastraightlineevenagooddescriptorofthe
relationship?"

QuantifyingthePredictivePowerofRegression
Howmuchdoestherelationshipbetweenadvertisingandsaleshelpusunderstandandpredictsales?We'dliketo
beabletoquantifythepredictivepoweroftherelationshipindeterminingsaleslevels.Howmuchmoredoweknow
72/135

5/13/2016

QuantitativeMethodsOnlineCourse

aboutsalesthankstotheadvertisingdata?
Toanswerthisquestionweneedabenchmarktellingushowmuchweknowaboutthebehaviorofsaleswithout
theadvertisingdata.Onlythendoesitmakesensetoaskhowmuchmoreinformationtheadvertisingdatagive
us.
Withouttheadvertisingdata,wehavethesalesdataalonetoworkwith.Usingnoinformationotherthanthesales
data,thebestpredictorforfuturesalesissimplythemeanofprevioussales.Thus,weusemeansalesasour
benchmark,anddrawa"meansalesline"throughthedata.
Let'scomparetheaccuracyoftheregressionlineandthemeansalesline.Wealreadyhaveameasureofhow
accuratelyanindividuallinefitsasetofdata:theSumofSquaredErrorsabouttheline.Nowwewantameasureof
howmuchmoreaccuratetheregressionlineisthanthemeanline.
Toobtainsuchameasure,we'llcalculatetheSumofSquaredErrorsforeachofthetwolines,andseehowmuch
smallertheerrorisaroundtheregressionlinethanaroundthemeanline.
TheSumofSquaredErrorsforthemeansaleslinemeasuresthetotalvariationinthesalesdata.Infact,itisthe
samemeasureofvariationweusetoderivethestandarddeviationofsales.WecalltheSumofSquaredErrorsfor
ourbenchmarkthemeansaleslinetheTotalSumofSquares.Here,theTotalSumofSquaresis8.01trillion.
TheSumofSquaredErrorsfortheregressionlineisoftencalledtheResidualSumofSquaredErrors,orthe
ResidualSumofSquares.TheResidualSumofSquaresisthevariationleft"unexplained"bytheregression.Here,
theResidualSumofSquaresis1.13trillion.
ThedifferencebetweentheTotalSumofSquaresandtheResidualSumofSquares,6.88trillioninthiscase,is
calledtheRegressionSumofSquares.TheRegressionSumofSquaresmeasuresthevariationinsales"explained"
bytheregressionline.
Excel'sregressionoutputreportsallthreeoftheseterms.
Astandardizedmeasureoftheregressionline'sexplanatorypoweriscalledRsquared.Rsquaredisthefractionof
thetotalvariationinthedependentvariablethatisexplainedbytheregressionline.
Rsquaredwillalwaysbebetween0and1atworst,theregressionlineexplainsnoneofthevariationinsalesat
bestitexplainsallofit.
WefindRsquaredbydividingthevariationexplainedbytheregressionlinetheRegressionSumofSquares
bythetotalvariationinthedependentvariabletheTotalSumofSquares.
Rsquaredispresentedeitherasafraction,apercentage,oradecimal.Wefindthatintheadvertisingandsales
example,theRsquaredvalueis6.88trillion/8.01trillion=0.859=85.9%.
AnequivalentapproachtocomputingRsquaredissomewhatlessintuitivebutmorecommon.Inthisapproachwe
firstfindthefractionofthetotalvariationinthedependentvariablethatisNOTexplainedbytheregressionline:
wedividetheResidualSumofSquaresbytheTotalSumofSquares.
Thenwesubtractthefractionofunexplainedvariationfrom1toobtainRsquared.
Fortunately,wedon'tneedtocalculateRsquaredourselvesExcelcomputesRsquaredandincludesitinthe
standardregressionoutput.
Inaregressionthathasonlyoneindependentvariable,Rsquarediscloselyrelatedtothecorrelationcoefficient
betweentheindependentanddependentvariables:thecorrelationcoefficientissimplythepositiveornegative
squarerootofRsquaredpositiveiftheslopeoftheregressionlineispositiveandnegativeiftheslopeofthe
regressionlineisnegative.
Excel'sregressionoutputalwayscomputesthesquarerootofRsquared,whichitlabels"MultipleR."

Summary
Rsquaredmeasureshowwellthebehavioroftheindependentvariableexplainsthebehaviorofthedependent
variable.RsquaredistheratiooftheRegressionSumofSquarestotheTotalSumofSquares.Assuch,ittellsus
whatproportionofthetotalvariationinthedependentvariableisexplainedbyitslinearrelationshipwiththe
independentvariable.

73/135

5/13/2016

QuantitativeMethodsOnlineCourse

ResidualAnalysis
Althoughtheregressionlineisthelinethatbestfitstheobserveddata,thedatapointstypicallydonotfall
preciselyontheline.Collectively,theverticaldistancesfromthedatatothelinetheerrorsmeasurehowwell
thelinefitsthedata.Theseerrorsarealsoknownasresiduals.
Acarefulstudyoftheresidualscantellusalotaboutaregressionanalysisandthevalidityoftheassumptionswe
baseiton.
Forexample,whenwerunaregression,weassumethatastraightlinebestdescribestherelationshipbetween
ourtwovariables.Infact,sometimestherelationshipmaybebetterdescribedbyacurve.
Inthisgraph,wecanclearlyseeanegativetrend.Ifwerunaregressiononthesedata,wefindarelativelyhighR
squared.Howdoweusetheresidualstocheckourassumptionthattherelationshipislinear?
First,wemeasuretheresiduals:thedistancefromthedatapointstotheregressionline.
Thenweplottheresidualsagainstthevaluesoftheindependentvariable.Thisgraphcalledaresidualplot
helpsusidentifypatternsintheresiduals.
Wecanrecognizeapatternintheresidualplot:acurve.Thispatternstronglyindicatesthatastraightlineisnot
thebestwaytoexpresstherelationshipbetweenthevariables:acurvewouldbeamuchbetterfit.
Aresidualplotoftenisbetterthantheoriginalscatterplotforrecognizingpatternsbecauseitisolatestheerrors
fromthegeneraltrendinthedata.Residualplotsarecriticalforstudyingerrorpatternsinmoreadvanced
regressionswithmultipleindependentvariables.
Iftheonlypatterninthedependentvariableisaccountedforbyalinearrelationshipwiththeindependent
variable,thenweshouldseenosystematicpatternintheresidualplot.Theresidualsshouldbespreadrandomly
aroundthehorizontalaxis.
Infact,thedistributionoftheresidualsshouldbeanormaldistribution,withmeanzero,andafixedvariance.
Residualsarecalledhomoskedasticiftheirdistributionshavethesamevariance.
Ifweseeapatterninthedistributionoftheresiduals,thenwecaninferthatthereismoretothebehaviorofthe
dependentvariablethanwhatisexplainedbyourlinearregression.Otherfactorsmaybeinfluencingthe
dependentvariable,ortheassumptionthattherelationshipislinearmaybeunwarranted.
We'vealreadyseenthepatternofacurvedrelationship.Whatotherpatternsmightwesee?
Let'slookatthisscatterdiagramanditscorrespondingresidualplot.Theresidualsappeartobegettinglargerfor
highervaluesoftheindependentvariable.Thisphenomenonisknownasheteroskedasticity.
Residualanalysisrevealsthatthedistributionoftheresidualschangeswiththeindependentvariable:the
varianceincreasesastheindependentvariableincreases.Sincethevarianceoftheresidualswhichcontributes
tothevariationofthedependentvariableisaffectedbythebehavioroftheindependentvariable,wecan
concludethattheremustbemoretothestorythanjustthelinearrelationship.
Thereareanumberofotherassumptionsaboutregressionwhosevaliditycanbetestedbyperformingaresidual
analysis.Althoughinteresting,theseusesofresidualanalysisarebeyondthescopeofthiscourse.

Summary
Acompleteregressionanalysisshouldincludeacarefulinspectionoftheresiduals.Plottheresidualsagainstthe
independentvariabletorevealpatternsinthedistributionoftheresiduals.

GraphingResidualLinesandResiduals
Note:UnlessyouhaveinstalledtheExcelDataAnalysisToolPakaddin,youwillnotbeabletoperformresidual
analysisusingtheregressiontool.However,wesuggestyoureadthroughtheinstructionstolearnhowExcel's
regressiontoolworks,soyoucanperformresidualanalysisinthefuture,whenyoudohaveaccesstotheData
AnalysisToolpak.
Tostudypatternsinresidualsandotherregressiondata,visualrepresentationscanbeveryhelpful.Toplota
regressionlinewefirstgenerateascatterdiagramofthedatawearestudying.
74/135

5/13/2016

QuantitativeMethodsOnlineCourse

Oncewehavethescatterdiagramweaddtheregressionlinebyrightclickingonanyoneofthedatapoints.A
menuwillpopup.Fromthatmenu,weselect"AddTrendline."
IntheTrendlineOptionsMenu,weselect"Linear"underTrend/RegressionType.
IfwewishtodisplaytheregressionequationandRsquaredvalueonthesamescatterplot,wecanselectthecheck
boxesnexttothoseitemsatthebottomandpress"Close."
Thescatterdiagramwillnowhavebeenaugmentedbytheregressionline,theregressionequation,andtheR
squaredvalue.Thisisaquickwaytoperformasimpleregressionanalysisfromascatterplot,thoughitdoesn't
providealloftheoutputwe'llwanttothoroughlyreviewtheresults.
ResidualanalysisisanoptioninExcel'sRegressiontool.Tocalculatetheresidualsandgeneratetheresidualplot,
weselect"ResidualPlots"intheResidualssectionoftheRegressionMenuandclick"Ok."
Theresidualsandtheirplotappearintheregressionoutput.

TheSignificanceofRegressionCoefficients
Rsquaredandtheresidualsarenottheonlythingstokeepaneyeonwhenrunningaregression.
Theregressionlineisthelinethatbestfitsourobserveddata.Butarewesurethattheregressionlinedepictsthe
"true"linearrelationshipbetweenthevariables?
Infact,theregressionlineisalmostneveraperfectdescriptorofthetruelinearrelationshipbetweenthevariables.
Why?Becausethedataweusetofindtheregressionlinetypicallyrepresentonlyasamplefromtheentire
populationofdatapertainingtotherelationship.
Returningtoourspreadablelunchmeatexample,advertisingandsalesdataforadifferentsetofyearswouldlikely
havegivenusadifferentline.Sinceeachregressionlinecomesfromalimitedsetofdata,itgivesusonlyan
approximationofthe"true"linearrelationshipbetweenthevariables.
Aswedoinsampling,we'lluseGreekletterstodenoteparametersdescribingthepopulation,andLatinlettersto
denotetheestimatesofthoseparametersbasedonsampledata.We'llusealphaandbetatorepresentthe
coefficientsofthe"true"linearrelationshipinthepopulation,andaandbtorepresentourestimatesofalphaand
beta.
Whenwecalculatethecoefficientsofaregressionlinefromasetofobserveddata,thevalueaisanestimateof
alpha,theinterceptofthe"true"line.Similarly,thevaluebisanestimateofbeta,theslopeofthetrueline.
Likeallestimatesbasedonsampledata,thecalculatedestimateforacoefficientisprobablydifferentfromthetrue
coefficient.Justasinsampling,tofindarangeoflikelyvaluesforthetruecoefficient,weconstructconfidence
intervalsaroundeachestimatedcoefficient.
Excel'soutputgivesus95%confidenceintervalsforeachcoefficientbydefault.
TheExceloutputforourEasyMeatexampletellsusthatthebestestimateoftheslopeis50,andweare95%
confidentthattheslopeofthetruelineisbetween33.5and66.5.
Wecanspecifyanyotherconfidencelevelwhensettinguptheregression.Wemightbeinterestedina99%
confidenceintervalfortheslopeoftheEasyMeatregressionline.
TheExceloutputforourEasyMeatexampletellsusthatthebestestimateoftheslopeis50,andweare99%
confidentthattheslopeofthetruelineisbetween25.9and74.1.

TestingforaLinearRelationship
Sincewedon'tknowtheexactvalueoftheslopeofthetrueadvertisingline,wemightwellquestionwhetherthere
actuallyISalinearrelationshipbetweenadvertisingandsales.Howcanweassureourselvesthatthepatternwe
seeinthesampledataisnotsimplyduetochance?
Iftheretrulywerenolinearrelationship,theninthefullpopulationofrelevantdata,changesinadvertisingwould
notcorrespondsystematicallytoproportionalchangesinsales.Inthatcase,theslopeofthebestfittinglineforthe
truerelationshipwouldbezero.
Therearetwoquickwaystotestiftheslopeofthetruelinemightbezero.Oneissimplytolookattheconfidence
75/135

5/13/2016

QuantitativeMethodsOnlineCourse

intervalfortheslopecoefficientandseeifitincludeszero.
The95%confidenceintervalfortheslopeinourcase,[33.5,66.5],doesnotincludezero,sowecanrejectwith95%
confidencethehypothesisthattheslopeofthetruelineiszero.Wecansaythatweare95%confidentthatthere
reallyisalinearrelationshipbetweenadvertisingandsales.
Alternatively,wecantesthowlikelyitisthattheslopereallyiszerousingahypothesistest.Weformulatethenull
hypothesisthatthereisnolinearrelationship:thetrueslope,beta,oftheregressionlineiszero.
Thenwecalculatethepvalue:thelikelihoodofhavingaslopeatleastasfarfromzeroastheslopecalculated,b,
assumingthattheslopeisinfactzero.
Inourexample,thepvaluetellsusthelikelihoodofhavingaslopeasfarfromzeroas50ifthetrueslopewerein
factzero.
Fortunately,Excelsavesusthetroubleofcalculatingthepvalueandreportsitalongwiththeregressionoutput.
Thispvaluetellsusexactlywhatwewanttoknow:iftherereallyisnolinearrelationshipbetweenthevariables,
howlikelyisitthatoursamplewouldhavegivenusaslopeaslargeas50?
Inourexample,thepvaluefortheadvertisingcoefficientis0.0001,indicatingthatwecanbeconfidentatthe
99.99%levelthatbeta,theslopeofthetrueline,isdifferentfromzero.Thus,weareconfidentatthe99.99%level
thattherereallyisalinearrelationshipbetweenadvertisingandsales
Ingeneral,ifthepvalueforaslopecoefficientislessthan0.05,thenwecanrejectthenullhypothesisthatthe
slopebetaiszero,andconcludewith95%confidencethatthereisalinearrelationshipbetweenthetwovariables.
Moreover,thesmallerthepvalue,themoreconfidentwearethatalinearrelationshipexists.
Ifthepvalueforaslopecoefficientisgreaterthan0.05,thenwedonothaveenoughevidencetoconcludewith
95%confidencethatthereisasignificantlinearrelationshipbetweenthevariables.
Additionally,pvaluescanbeusedtotesthypothesesabouttheinterceptofaregressionline.Inourexample,the
pvaluefortheinterceptis0.498,muchlargerthan0.05.However,sincetheinterceptsimplyanchorsthe
regressionline,whetherornotitiszeroisnotparticularlyimportant.
Wewon'tusethestandarderrorofthecoefficientinthiscourse,butwillsimplypointoutthatthestandarderror
issimilartoastandarddeviationofourdistributionforthecoefficient.Thus,the95%confidenceintervalis
approximately50+/2*(7.2).It'snotexact,becausethedistributionisn'tquitenormal,buttheconceptisvery
similar.
Wewon'tusethetstatinthiscourseeither,sincethepvaluegivestheequivalentinformationinamorereadily
usableform.Thetstattellsushowmanystandarderrorsthecoefficientisfromthevaluezero.Thus,ifthetstatis
greaterthan2,wearequitesure(approximately95%confident)thatthetruecoefficientisnotzero.

Summary
Theslopeandinterceptoftheregressionlineareestimatesbasedonsampledata:howcloselytheyapproximate
theactualvaluesisuncertain.Confidenceintervalsfortheregressioncoefficientsspecifyarangeoflikelyvalues
fortheregressioncoefficients.Excelreportsapvalueforeachcoefficient.Ifthepvalueforaslopecoefficientis
lessthan0.05wecanbe95%confidentthattheslopeisnonzero,andhencethatthereisalinearrelationship
betweentheindependentanddependentvariables.

RevisitingRsquaredandp
ItisimportantnottoconfusethepvalueforacoefficientwiththeRsquaredfortheregression.Rsquaredtellsus
whatpercentageofthevariationseeninthedependentvariableisexplainedbyitsrelationshipwiththe
independentvariable.Thepvaluetellsusthelikelihoodthatthereisnorealrelationshipbetweenthedependent
andindependentvariablesthatthetruecoefficientofthelineinthefullpopulationiszero.
Let'slookatanexamplethatillustratesthedifferenceinthesetwomeasures.Thisscatterdiagramshowsdata
tightlyclusteredaroundtheregressionline.RsquaredisveryhighverylittlevariationinYisleftunexplained
bytheline.ThepvalueonthecoefficientisverylowthereisclearlyasignificantlinearrelationshipbetweenX
andY.
ThesecondscatterdiagramhasamuchlowerRsquaredalotofvariationinYisleftunexplainedbytheline.
ButthereisnoquestionthatthereisasignificantrelationshipbetweenXandY,sothepvalueonthecoefficient
isagainverylow.Here,thelinearrelationshipmaynotexplainasmuchvariationasitdoesintheuppergraph,
76/135

5/13/2016

QuantitativeMethodsOnlineCourse

buttherelationshipisclearlysignificant.
Ahighpvalueindicatesalackofconfidenceintheunderlyinglinearrelationship.Wewouldnotexpecta
questionablerelationshiptoexplainmuchvariation.Soifwehaveonlyoneindependentvariableandithasahigh
pvalue,wewouldnotexpecttofindaveryhighRsquared.However,aswe'llseeshortly,thestorybecomesmore
complexwhenwehavetwoormoreindependentvariables.
Sofar,wehaven'tdiscussedhowsamplesizeaffectstheaccuracyofaregressionanalysis.Thelargerthesample
weusetoconducttheregressionanalysis,themoreprecisetheinformationweobtainaboutthetruenatureofthe
relationshipunderinvestigation.Specifically,thelargerthesample,thebetterourestimatesfortheslopeandthe
intercept,andthetightertheconfidenceintervalsaroundthoseestimates.
Let'slookathowsamplesizeaffectspvaluesinanexample.Thisscatterplotisbasedonalargesamplesize50
observations.WithapvalueofzeroweareextremelyconfidentthatthereisalinearrelationshipbetweenXand
Y,andourconfidenceintervalfortheslopecoefficientisquitetight.
Thesecondscatterplotisbasedonasamplesizeofonly10.Thepvaluerisesto0.07,sowecannotbe95%
confidentthatthereisalinearrelationshipbetweenXandY.Ourlackofconfidenceinourestimateisalsoevident
inthewideconfidenceintervalfortheslopecoefficient.

Summary
ThepvalueandR2 providedifferentinformation.Alinearrelationshipcanbesignificantbutnotexplaina
largepercentageofthevariation,sohavingalowpvaluedoesnotensureahighR2 .Samplesizeisanimportant
determinantofregressionaccuracy:aswithallsampling,largersamplesgivemoreaccurateestimates.

SolvingtheStaffingProblem
Awareofmanyofbasicregression'ssubtleties,youarereadytoturntoLeo'sstaffingproblem.
TheKahanahas500guestrooms.Overthelastthreeyears,averageyearlyoccupancyhasvariedagooddeal:my
recordlowwas250roomsoccupied,andmyhighwas484.Thatrangeofvariationmakesstaffplanningdifficult.
I'dliketobeabletopredicttheKahana'soccupancyonemonthinadvance.Sofar,I'vebeenmakingeducated
guessesaboutthelevelofoccupancyonemonthinthefuturebasedonthenumberofadvancebookings.Itakethe
numberofbookingsandaddanother50%,tobeonthesafeside.Obviously,Ihaven'tdoneverywellwiththat
method.
Youneedtofindoutmorepreciselywhattherelationshipisbetweenadvancebookingsandoccupancy.Aliceasks
youtorunaregressionanalysisonLeo'soccupancyandadvancebookingsdata.
BookingsandOccupancyData
Theresultsofyouranalysistellyouthat,forevery100additionaladvancebookings,
BookingsandOccupancyData
BookingsandOccupancyRegression
Whenyouruntheregressionanalysis,besureyouchoosethedependentandindependentvariablescorrectly.We
areusingadvancebookingstopredictoccupancy,suggestingthatwebelievethatoccupancylevels"depend"on
advancebookings.Thus,theoccupancyisthedependentvariable,andthenumberofadvancebookingsisthe
independentvariable.
Intheregressionoutput,findthecoefficientoftheslopeoftheregressionline.Thistellsyouhowmanyadditional
guestsLeocanexpectforeachadditionaladvancebooking.
Basedonthesedata,canyoube95%confidentthattheslopeoftheregressionlineisnot0?
BookingsandOccupancyData
BookingsandOccupancyRegression
Howmuchofthevariationinoccupancyisexplainedbythevariationinthenumberofadvancebookings?
BookingsandOccupancyData
BookingsandOccupancyRegression
77/135

5/13/2016

QuantitativeMethodsOnlineCourse

YouandAlicepresentyourfindingstoLeo.
Sothereisapositiverelationshipbetweenadvancebookingsandoccupancy.Butthepowerofadvancebookingsto
predictoccupancyisprettysmall.Morethanhalfofthevariationinyourroomoccupancyisduetootherfactors.
Isupposethemathematicalrelationshipyoufoundwillhelpmemakeslightlymoreinformedstaffingchoices.But
mostlyI'llstillbestumblinginthedark.
NotsofastLeo:Wemaybeabletoidentifyotherfactorslinkedtooccupancy,andusethemtocomeupwithan
improvedforecastingmodel.Whydon'twemeettomorrowanddiscusswhatotherfactorsmighthelpuspredict
occupancy?
Alright.Inthemeantime,I'llbedoingmybesttoappeaseMr.Pitt.He'stalkingaboutfilingsuitagainstme.Hesays
hewasburned.Burnedbyabisque!

Exercise1:InventoriesandCapacityUtilization
Youhavebeenaskedtoexaminetherelationshipbetweentwoimportantmacroeconomicquantities:thechange
inbusinessinventoriesandfactorycapacityutilizationlevels.
Ifyouwishtolearnwhatpercentofthevariationinchangesininventoriesisexplainedbycapacityutilization
levels,whichvariableshouldyouchooseasyourindependentvariable?
ClickheretoaccessUSeconomicdatafromtheyears19711986.Runtheregressionwiththechangeinbusiness
inventoriesasthedependentvariableandthecapacityutilizationastheindependentvariable.
Source
Usingtheregressionoutput,findtheslopeoftheregressionline.Entertheslopeasadecimalnumberwith2
digitstotherightofthedecimalpoint(e.g.,enter"5"as"5.00").Roundifnecessary.
InventoriesandCapacityData
InventoriesandCapacityRegression

Exercise2:CleverSolutions
GretaJohnisthehumanresourcesmanageratthesoftwareconsultingfirmCleverSolutions.Recently,someof
theprogrammershavebeenrestless:theseniorprogrammersfeelthatlengthofserviceandloyaltyhavenotbeen
rewardedintheircompensation.Thejuniorprogrammersthinkthatseniorityshouldnotbeamajorbasisforpay.
Gretawantssomeharddatatoinformthedebate.Asapreliminarystep,sheplotsemployees'salariesagainst
theirlengthofservice.Usingthedataprovided,performaregressionanalysiswithsalaryasthedependent
variableandlengthofserviceastheindependentvariable.
CleverSolutionsData
WhatistheaverageincreaseinaCleverSolutionsprogrammer'ssalaryperyearofservice?
CleverSolutionsData
CleverSolutionsRegression
Basedontheregressionanalysis,Gretacantellthat:
CleverSolutionsData
CleverSolutionsRegression

Exercise3:ProductivityandCompensation
Productivitymeasuresanation'saverageoutputperlaborhour.Itisoneofthemostcloselywatchedvariablesin
economics:asworkersproducemoreperhour,employerscanpaythemmorewithoutincreasingthepriceofthe
product.
Sincewagescanrisewithoutprovokingacorrespondingriseinconsumerprices,thegrowthofproductivityis
essentialtoarealincreaseinanation'sstandardofliving.
PeterAgarwal,astudentattheHarvardBusinessSchool,wantstoinvestigatetherelationshipbetweenchangein
78/135

5/13/2016

QuantitativeMethodsOnlineCourse

productivityandchangeinrealhourlycompensation.
Peterhasdataonchangeinproductivityandchangeincompensationfor8industrializednations.Thefiguresare
annualaveragesovertheperiodfrom19791990.
ProductivityandCompensationData
Source
Runaregressionwithchangeincompensationasthedependentvariableandchangeinproductivityasthe
independentvariable.
ProductivityandCompensationData
Source
Howmuchofthevariationinthechangeincompensationcanbeexplainedbythechangeinproductivity?Enter
thepercentageasdecimalnumberwith2digitstotherightofthedecimalpoint(e.g,enter''50%''as''0.50'').
Roundifnecessary.
ProductivityandCompensationData
ProductivityandCompensationRegression
Giventhesedata,Peterfindsthattherelationshipcanbemathematicallyexpressedas:
CanPeterclaim(witha95%levelofconfidence)thattherelationshipisstatisticallysignificant?
ProductivityandCompensationData
ProductivityandCompensationRegression
ThecoefficientfortheslopegivenbyExcelisanestimatebasedonthedatainPeter'ssample.Theestimateforthe
slopeoftheregressionlineisabout0.75.Iftheactualslopeoftherelationshipis0,thereisnosignificantlinear
relationshipbetweenthechangeinproductivityandthechangeincompensation.
Ontheregressionoutput,therearetwowaystotelliftheslopecoefficientissignificantatthe0.05level.First,we
canlookatthe95%confidenceintervalprovidedandseethatitrangesfrom0.05to+1.56.Sincethe95%
confidenceintervalcontainszero,thecoefficientisnotsignificantatthe0.05level.
Alternatively,wecannotethatthepvalueoftheslopecoefficient,0.0625,isgreaterthan0.05.Petercannotbe
95%confidentthattheactualslopeis0.
SincePetercannotbeconfidentthattheslopeisnotzero,hecannotbeconfidentthatthereisalinearrelationship
betweenthetwovariables.
SupposePetercollectsdataon8morecountries.Runtheregressionfortheentiredatasetwithchangein
compensationasthedependentvariableandchangeinproductivityastheindependentvariable.
ExpandedProductivityandCompensationData
Thenewdatasetindicatesthatthevariableshaveaslightlydifferentrelationship:
CanPeterclaim(witha95%levelofconfidence)thattherelationshipisstatisticallysignificant?
ExpandedProductivityandCompensationData
ExpandedProductivityandCompensationRegression
Whatmightexplainwhythecoefficientissignificantinthesecond(combined)dataset?
ExpandedProductivityandCompensationData
ExpandedProductivityandCompensationRegression

MultipleRegression
Introduction
Afterbrainstormingforseveralhours,youandAlicedeviseaplantoimproveyourregressionanalysisofthestaffing
problem,tohelpLeobetterpredicttheKahana'soccupancy.

79/135

5/13/2016

QuantitativeMethodsOnlineCourse

TheStaffingProblem(II)
Goodmorning.Ihopeyouhadagoodnight'ssleepandhavecomeupwithsomeideasonhowtobetterpredictmy
occupancy.
Formypart,Islepthorriblylastnight.Thisbisquelawsuitistakingyearsoffmylife.
I'msorrytohearthat.I'mafraidwedon'thaveanylegaladvicetogiveyou.Wedohavesomeideasabouthowto
improveyourabilitytoforecastyourhotel'soccupancy.
Wecandobetterthanexplaining39%ofthevariationinoccupancyaswedidwithourearlierregressionusing
advancebookingsastheindependentvariable.
Rememberwhenwefirstarrived,weanalyzedtherelationshipbetweenKauai'saveragehoteloccupancyratesand
arrivalsontheisland?Wefoundafairlystrongcorrelationbetweenoccupancyandarrivals:71%.
Source
ItookalookattherelationshipbetweenarrivalsonKauaiandyourhotel'soccupancynumbers.Intheregression
ofKahanaoccupancyversusarrivals,arrivalsexplain80%ofthevariationinoccupancy.That'smuchbetterthanthe
39%explainedbyadvancebookings.
BookingsandOccupancyData
Waitaminute.Thosenumbersadduptomorethan100%.Itseemslikeyourregressiontechniqueexplainsmore
variationthanthereis!
Goodobservation,Leo.Thenumbersdon'taddup.Thereasonisthatthereisastatisticalrelationshipbetween
arrivalsandadvancebookingsthatwearen'ttakingintoaccountwhenwerunthetworegressionsseparately.
Weintendtofindoneequationthatincorporatesthedataonbotharrivalsandadvancebookingsandtakesinto
accounttherelationshipbetweenthem.We'llalsoinvestigatetheimpactofotherfactors,suchasthebusiness
practicesofyourcompetitors.Whoisyourmaincompetitorinthearea?
ThatwouldbetheHotelExcelsior.Itsmanager,KnutSteinkalt,isarealcutthroat.He'salwaysofferingspecial
promotionsthatundercutmyroomprices.
Fortunately,theExcelsior,thoughveryluxurious,isnotnearlyasinvitingastheKahana.Thatplacefeelslikean
undertaker'sparlor!I'vebeenabletokeepaheadofoldKnutbyofferingabetterproduct.
We'llstudytheExcelsior'spromotions,andseeifthey'vehadasignificantinfluenceontheKahana'soccupancy.
Thanks.Letmeknowassoonasyouhavesomeresults.Ihavetowarnyou,though,Imaybeout:I'mgoingseemy
lawyersinHonoluluthisweektodiscussMr.Pitt'sbisquelawsuit.Whatamess!

IntroducingMultipleRegression
"Mostmanagementproblemsaretoocomplextobecompletelydescribedbytheinteractionsbetweenonlytwo
variables,"Alicetellsyou."Incorporatingmultipleindependentvariablescangivemanagersamoreaccurate
mathematicalrepresentationoftheirbusiness."
Whenbuyinganewhome,weknowthatthehouse'ssizeinfluencesitssellingprice.Allotherfactorsbeingequal,we
expectthatthelargerthehome,thehigheritsprice.
Wecangainabettersenseofthisrelationshipbygraphingdataonpriceandhousesizeinascatterdiagram,and
runningaregressionwithpriceasthedependentvariableandhousesizeastheindependentvariable.We'lluse
dataon15recenthomepurchasesinthetownofSilverhaven.
SilverhavenRealEstateData
WecantellfromthevalueofRsquared,26%,thattherelationshipisfairlyweak:housesizevariationexplainsonly
aboutaquarterofthevariationinprice.
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Thisshouldcomeasnosurprise:housesizeisnottheonlyvariableaffectingthepriceofahome.Thereareatleast
80/135

5/13/2016

QuantitativeMethodsOnlineCourse

threeotherimportantfactors,asanyrealestateagentwilltellyou:
Onefacetofadesirablelocationisalowcommutingtime,sowe'dexpectaveragecommutingtimetoberelatedto
price.Tostudythisrelationship,we'lluseaproxyvariable:thehouse'sdistancefromthebusinesscenterin
downtownSilverhaven.Aproxyvariableisavariablethatiscloselycorrelatedtothevariablewewanttoinvestigate,
buttypicallyhasmorereadilyavailabledata.
Thedataondistanceforthesame15housesrevealsanegativerelationship:thefartherawayfromdowntown,the
lessexpensivehousestendtobe.Again,thestrengthoftherelationshipisrelativelyweak:Rsquaredforthe
regressionofpriceversusdistanceis37%.
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Wemightbetemptedtothinkthat,ifhousesizeexplains26%ofthepriceofahouseanddistanceexplains37%,
thenthetwovariablestogetherwouldexplain63%oftheprice.Butwhatifthereisarelationshipbetweenthetwo
independentvariables,housesizeanddistance?
Infact,thecorrelationcoefficientofhousesizeanddistanceis31%.Aswemightexpect,thereisapositive
relationshipbetweenthetwovariables:aswemovefartherfromthecitycenter,housestendtobelarger.How
shouldwefactorthisrelationshipintoouranalysisofhowhousesizeanddistanceaffectthepriceofahouse?
Ratherthanconsideringeachindividualrelationship,weneedtofindawaytoexpressthethreewayrelationship
amongallthreevariables:price,housesize,andlocation.Insteadoftwoseparateequationsdescribingprice,each
withadifferentindependentvariable,weneedoneequationthatincludesbothindependentvariables.
Wefindthisthreewayrelationshipusingmultipleregression.Inmultipleregression,weadaptwhatweknow
aboutregressionwithoneindependentvariableoftencalledsimpleregressiontosituationsinwhichwetake
intoaccounttheinfluenceofseveralvariables.
Thinkingaboutseveralvariablessimultaneouslycanbequitechallenging.Givenonlythepriceofahome,wecannot
makeinferencesaboutitssizeandlocation.A$500,000homemightbeamansioninthecountryside,amodest
houseinthesuburbs,oracozycardboardboxonthecornerof1standMain.
Graphingdataonmorethantwovariablesposesitsownsetofdifficulties.Threevariablescanstillberepresented,
butbeyondthat,visualizationandgraphicalrepresentationbecomeessentiallyimpossible.
Althoughwecancarryovermanyofthecentralideasbehindsimpleregressiontothemultivariatecase,we'llhaveto
considerseveralinterestingcomplications.
Asmanagers,almostanyquantitywewishtostudywillbeinfluencedbymorethanonevariable:toconstructan
accuratemodelofabusiness'dynamics,we'llusuallyneedseveralvariables.Multipleregressionisanessentialand
powerfulmanagementtoolforanalyzingthesesituations.
IncorporatingmorethanoneindependentvariableintoyouranalysisoftheKahana'soccupancysoundslikeagood
idea.Buthowdoyouadaptregressionanalysistoaccommodatemultiplevariables?

Summary
Multipleregressionisanextensionofsimpleregressionthatallowsustoanalyzetherelationshipsbetween
multipleindependentvariablesandadependentvariable.Relationshipsamongindependentvariablescomplicate
multivariateregression.Withmorethantwoindependentvariables,graphingmultivariablerelationshipsis
impossible,sowemustproceedwithcautionandconductadditionalanalysestoidentifypatterns.
IncorporatingmorethanoneindependentvariableintoyouranalysisoftheKhana'soccupancysoundslikeagood
idea.Buthowdoyouadaptregressionanalysistoaccomodatemultiplevariables?

AdaptingBasicConcepts
"Multipleregressionequationslookverysimilartoregressionequationswithonlyoneindependentvariable,"Alice
explains."Butbecarefulyouhavetointerpretthemslightlydifferently."

InterpretingtheMultipleRegressionEquation
Inourhomepriceexample,wefoundtworegressionequations,onefortherelationshipbetweenpriceandhouse
81/135

5/13/2016

QuantitativeMethodsOnlineCourse

size,andonefortherelationshipbetweenpriceanddistance.Whatwilltheequationforthethreewayrelationship
betweenprice,thedependentvariable,andthetwoindependentvariables,housesizeanddistance,looklike?
Theregressionequationinourhousingexamplewillhavetheformbelow:housesizeanddistanceeachhavetheir
owncoefficients,andtheyaresummedtogetheralongwiththeconstantcoefficienta.
Ingeneral,thelinearequationforaregressionmodelwithkdifferentvariableshastheformbelow.Sincethe
coefficientsweobtainfromthedataarejustestimates,wemustdistinguishbetweentheidealizedequationthat
representsthe"true"relationshipandtheregressionlinethatestimatesthatrelationship.Toexpressthateventhe
"true"equationdoesnotfitperfectly,weincludeanerrortermintheidealizedequation.
Runningtheregressiongivesuscoefficientsforhousesizeanddistance:252and55,006,respectively.Wecanuse
thismultipleregressionequationtopredictthepriceofotherhousesnotinourdataset.Topredictahouse'sprice,
weneedtoknowonlyitssizeanditsdistancetodowntown.
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Suppose"Windsor"isamodestmansionof3,500squarefeet,locatedintheoutersuburbsofSilverhaven,
approximately11milesfromdowntown.Basedonourregressionequation,howmuchwouldweexpectWindsorto
sellfor?
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
WesimplyenterWindsor'ssquarefootageanddistancetodowntownintotheequation,andcalculateanexpected
sellingpriceof$699,938.
Let'stakeacloserlookatthecoefficientsinthehousingexample,focusingonthedistancecoefficient:55,006.This
coefficientissubstantiallydifferentfromthecoefficientintheoriginalsimpleregression:39,505.Whyisitso
different?
Thecoefficientsinthesimpleregressionandthecoefficientsinthemultipleregressionhaveverydifferent
meanings.Inthesimpleregressionequationofpriceversusdistance,weinterpretthecoefficient,39,505,inthe
followingway:foreveryadditionalmilefartherfromdowntown,weexpecthousepricetodecreasebyanaverageof
$39,505.
Wedescribethisaveragedecreaseof$39,505asagrosseffectitisanaveragecomputedovertherangeof
variationofallotherfactorsthatinfluenceprice.
Inthemultipleregressionofpriceversussizeanddistance,thevalueofthedistancecoefficient,55,006,isdifferent,
becauseithasadifferentmeaning.Here,thecoefficienttellsusthat,foreveryadditionalmile,weshouldexpectthe
pricetodecreaseby$55,006,providedthesizeofthehousestaysthesame.
Inotherwords,amonghousesthataresimilarlysized,weexpectpricestodecreaseby$55,006permileofdistance
todowntown.Werefertothisdecreaseastheneteffectofdistanceonprice.Alternatively,werefertoitas"the
effectofdistanceonpricecontrollingforhousesize".
Twohousesaresimilarinsize,butlocatedindifferentneighborhoods:"ShangriLa"isfivemilesfartherfrom
downtownthan"Xanadu."IfXanaduisvaluedat$450,000,howmuchwouldweexpectShangriLatocost?
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Sincethetwohousesarethesamesize,weusetheneteffectofdistanceonprice,
$55,006/mile,topredicttheexpecteddifferenceintheirsellingprices.ShangriLais5additionalmilesform
downtown,soitspriceshouldbe$55,006/mile*5miles=$275,030lessthanXanadu's,or$450,000$275,030=
$174,970.
"Valhalla"isanotherhouselocated5milesfartherfromdowntownthanXanadu.Wehavenoinformationaboutthe
relativesizesofthetwohomes.IfXanadu'ssellingpriceis$450,000,whatwouldweexpectValhalla'ssellingpriceto
be?
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Sincewecannotassumethatthesizesofthetwohousesareequal,weshouldnotcontrolforsize.Thusweusethe
grosseffectofdistanceonprice,
82/135

5/13/2016

QuantitativeMethodsOnlineCourse

$39,505/mile,topredicttheexpecteddifferenceinthetwohomes'sellingprices.Valhallais5additionalmilesfrom
downtown,soitspriceshouldbe$39,505/mile*5miles=$197,525lessthanXanadu's,or$450,000$197,525=
$252,475.
Let'strytobuildourintuitionaboutthedifferenceinthedistancecoefficientsinthesimpleandmultiple
regressions.Thecoefficientsaredifferentbecausetheyhavedifferentmeanings.Butwhatexactlyaccountsforthe
dropfrom39,505to55,006?
Inthemultipleregression,byessentiallyconsideringonlyhousesthatareofequalsize,weseparateouttheeffectof
housesizeonprice.Weareleftwithadistancecoefficientthatisnetrelativetohousesize.
Inthesimpleregression,thegrosseffectofdistance,$39,505/mile,representsanaverageovertherangeofhouse
sizes.Assuch,italsocapturessomeoftheeffectthathousesizehasonprice.Let'stakeacloserlookatthedistance
andhousesizedata.
CalculatingthecorrelationcoefficientbetweensizesofhomesandtheirdistancesfromdowntownSilverhaven,we
seethatthereisaslightpositiverelationship,withacorrelationcoefficientof31%.Inotherwords,aswemove
fartherfromdowntown,housestendtobelarger.
Wehaveseenthattwothingshappenaswemovefartherfromdowntownhousingpricesdropbecausethe
commuteislonger,andhousesizeincreases.Thefactthathousesizeincreaseswithdistancecomplicatesthepricing
story,becauselargerhousestendtobemoreexpensive.
Longerdistancesfromdowntowntranslateintotwodifferenteffectsonprice.Oneeffectofdistanceonpriceis
negative:asdistanceincreases,commutetimesincreaseandpricesdrop.
Asecondeffectofdistanceonpriceispositive:asdistanceincreases,housesizeincreases,andlargerhouses
correspondstohigherprices.
Runningamultipleregressionwithbothsizeanddistanceasindependentvariableshelpsteaseoutthesetwo
separateeffects.Whenwecontrolforhousesize,weseetheneteffectofdistanceonprice:pricesdropby$55,006
peradditionalmile.
Whenwedon'tcontrolforhousesize,theeffectofdistancealoneonpriceisconfoundedbythefactthathousesize
tendstoriseasdistanceincreases.The"real"effectofdistanceonpriceisdiminishedbytherelationshipbetween
priceandhousesize.
Whenwelookatthenetrelationshipbetweendistanceandprice,weconsideronlysimilarlysizedhouses.Nowwe
assumethatasdistancefromdowntowngrows,housesizestaysthesame.Ifhousesizedidn'tincreaseaswemoved
fartherout,priceswoulddropmoresharply:by$55,006ratherthan$39,505peradditionalmile.
Let'sanalyzethehousesizecoefficientinasimilarfashion.Inthemultivariableregressionmodelofhomeprices,
thehousesizecoefficientisnetrelativetodistance.Thecoefficientof252tellsustoexpectpricesofhomesequally
distantfromdowntowntoincreasebyanaverageof$252foreachadditionalsquarefootofsize.
Thegrosseffectofhousesizeonpriceis$167persquarefoot,considerablylessthan$252,theneteffect.Whenwe
donotcontrolfordistance,housesize"picksup"someofthenegativeeffectofdistance:thefactthatlargerhouses
aretypicallylocatedfartherfromthecityoffsetssomeoftheeffectofincreasedhousesize.
Weshouldalwaysbecarefultointerpretregressioncoefficientsproperly.Acoefficientis"net"withrespecttoall
variablesincludedintheregression,but"gross"withrespecttoallomittedvariables.Anincludedvariablemaybe
pickinguptheeffectsonpriceofavariablethatisnotincludedinthemodelschooldistrict,forexample.
Finally,weshouldnotethatthesecoefficientsarebasedonsampledata,andassuchareonlyestimatesofthe
coefficientsofthetruerelationship.Foreachindependentvariable,wemustinspectitspvalueintheregression
outputtomakesurethatitsrelationshipwiththedependentvariableissignificant.
Sincethepvalueislessthan0.05forbothhousesizeanddistancetodowntown,wecanbe95%confidentthatthe
truecoefficientsofthetwoindependentvariablesarenotzero.Inotherwords,weareconfidentthattherearelinear
relationshipsbetweeneachindependentvariableandhouseprice.
Therearefourstepsweshouldalwaysfollowwheninterpretingthecoefficientsofanindependentvariableina
multipleregression:

Summary
Weusemultipleregressiontounderstandthestructureofrelationshipsbetweenmultiplevariablesanda
83/135

5/13/2016

QuantitativeMethodsOnlineCourse

dependentvariable,andtoforecastvaluesofthedependentvariable.Acoefficientforanindependentvariablein
aregressionequationcharacterizesthenetrelationshipbetweentheindependentvariableandthedependent
variable:theeffectoftheindependentvariableonthedependentvariablewhenwecontrolfortheother
independentvariablesincludedintheregression.

PerforminginExcel
Note:UnlessyouhaveinstalledtheExcelDataAnalysisToolPakaddin,youwillnotbeabletoperformmultiple
regressionanalysisusingtheregressiontool.However,wesuggestyoureadthroughthefollowinginstructionsto
learnhowExcel'sregressiontoolworks,soyoucanperformmultipleregressioninthefuture,whenyoudohave
accesstotheDataAnalysisToolpak.
PerformingmultipleregressioninExcelisnearlyidenticaltoperformingsimpleregressions.Inthe"InputY
range"field,enterthecolumnwiththedataonthedependentvariableasyouwouldforasimpleregression.
Toenterdataontheindependentvariables,thecolumnsofalltheindependentvariablesmustbecontiguousand
havethesamenumberofrows.
Inthe"InputXRange"fieldandenterthecellreferenceofthetopcelloftheindependentvariableappearing
farthesttotheleft.Followingacolon,enterthecellreferenceofthebottomcelloftheindependentvariable
appearingfarthesttotheright.
Select"Labels"sothatExcelcanproperlylabeltheindependentvariablesintheoutput.Select"ResidualPlots"to
includetheresidualsandresidualplotsoftheresidualsagainsteachindependentvariableintheoutput.Enterthe
desired"ConfidenceLevel,"oracceptthedefaultlevelof95%.
Theregressionoutputprintsthevaluesofthecoefficientsforeachvariableinseparaterows,alongwiththep
valuesforthecoefficientsandthedesiredconfidenceintervals.

ResidualAnalysis
Whenrunningamultipleregressionyoumustdistinguishbetweennetandgrossrelationships.WhataboutR
squared?Howdoyoumeasurethepredictivepowerofaregressionmodelwithmultipleindependentvariables?
OneoftheSilverhavenhousesinourdataset,"Windsor,"soldfor$570,000,substantiallylessthanitspredicted
price,$699,938.RumorhasitthatWindsorishaunted,anditwashardtofindabuyerforit.Thedifference
betweentheactualpriceandthepredictedpriceistheresidualerror,inthiscase$129,938.
Insimpleregression,theresidualsthedifferencesbetweentheactualandpredictedvaluesofthedependent
variablesareeasytovisualizeonascatterplotofthedata.Theresidualsaretheverticaldistancesfromthe
regressionlinetothedatapoints.
Graphicallyrepresentingrelationshipsamongthreevariablesisabitdifficult.
Whenwehaveonlyoneindependentvariable,wecreateascatterplotofthedatapoints,thendrawalinethrough
thedatarepresentingtherelationshipdefinedbytheregressionequation:y=a+bx.
Withtwoindependentvariables,anordinaryscatterplotwillnolongerdo.Instead,wekeeptrackofthethird
variablebypicturingathreedimensionalspace.
Ourdatasetis"scattered"inthespacewithhousesizemeasuredononeaxis,distancetodowntownmeasuredon
anotheraxis,andthedependentvariable,price,measuredontheverticalaxis.
Theregressionequationwithtwoindependentvariablesdefinesaplanethatpassesthroughthedata.Theresiduals
thedifferencesbetweentheactualandpredictedpricesinthedatasetaretheverticaldistancesfromthe
regressionplanetothedatapoints.
Asinsimpleregression,theseverticaldistancesareknownasresidualsorerrors.Theregressionplaneisthe
planethat"bestfits"thedatainthesensethatitistheplaneforwhichthesumofthesquarederrorsisminimized.
Wecanplottheresidualsagainstthevaluesofeachindependentvariabletolookforpatternsthatcouldindicatethat
ourlinearregressionmodelisinadequateinsomeway.Hereistheresidualplotanalyzingthebehaviorofthe
residualsovertherangeoftheindependentvariablehousesize...
...andhereistheresidualplotanalyzingthebehavioroftheresidualsovertherangeoftheindependentvariable
distance.
84/135

5/13/2016

QuantitativeMethodsOnlineCourse

Whenlinearregressionisagoodmodelfortherelationshipsstudied,eachoftheresidualplotsshouldreveala
randomdistributionoftheresiduals.Thedistributionshouldbenormal,withmeanzeroandfixedvariance.
Theresidualplotagainstdistanceinthemultipleregressionlooksquitedifferentfromtheresidualplotagainst
distanceinthesimpleregression.Theyrepresentdifferentconcepts:thefirstgivesinsightintothenetrelationship
betweendistanceandpricecontrollingforhousesize,andthesecondgivesinsightintothegrossrelationship.
Withmorethantwoindependentvariables,visualizingtheresidualsinthecontextofthefullregressionrelationship
becomesessentiallyimpossible.Wecanonlythinkofresidualsintermsoftheirmeaning:thedifferencesbetween
theactualandpredictedvaluesofthedependentvariable.
Becauseresidualplotsalwaysinvolveonlytwovariablesthemagnitudeoftheresidualandoneoftheindependent
variablestheyprovideanindispensablevisualtoolfordetectingpatternssuchasheteroskedasticityandnon
linearityinregressionswithmultipleindependentvariables.

Summary
Theresiduals,orerrors,arethedifferencesbetweentheactualvaluesofthedependentvariableandthepredicted
valuesofthedependentvariable.Foraregressionwithtwoindependentvariables,theresidualsarethevertical
distancesfromtheregressionplanetothedatapoints.Wecangrapharesidualplotforeachindependentvariable
tohelpidentifypatternssuchasheteroskedasticityornonlinearity.

QuantifyingthePredictivePowerofMultipleRegression
Insimpleregressions,Rsquaredmeasuresthepredictiveorexplanatorypoweroftheindependentvariable:the
percentageofthevariationinthedependentvariableexplainedbytheindependentvariable.Whenwerunthe
regressionofhousepriceversushousesize,wefindanRsquaredof26%.TheRsquaredforpriceversusdistance
fromdowntownis37%.
Themultipleregressionofpriceversusthetwoindependentvariables,housesizeanddistance,alsoreturnsanR
squaredvalue:90%.Here,Rsquaredisthepercentageofpricevariationexplainedbythevariationinbothof
theindependentvariables.
Themultipleregression'sRsquaredismuchhigherthantheRsquaredofeithersimpleregression:Buthowcan
webesurewereallygainedpredictivepowerbyconsideringmorethanonevariable?
Infact,Rsquaredcannotdecreasebyaddinganotherindependentvariabletoamodelitcanonlystaythesame
orincrease,evenifthenewindependentvariableiscompletelyunrelatedtothedependentvariable.
TounderstandwhyRsquaredalwaysimproveswhenweaddanothervariable,let'sreturntothesimple
regressionofhousepriceversusdistancefromdowntown.Whenwelookatonlytwoobservationstwohouses
wecanfitalinethroughthemthatfitsperfectly.
Rsquaredforthesetwodatapointswiththislineis100%.Butclearlywecan'tusethislinetoexplainthetrue
relationshipbetweenhousepriceanddistance.ThehighRsquaredisaresultofthefactthatthenumberof
observations,2,issosmallrelativetothenumberofindependentvariables,1.
Ifwehave3observationsthreehouseswecan'tfindalinethatfitsperfectly.Here,Rsquaredis35%.
Butwhenweaddanotherindependentvariablenomatterhowirrelevantwecanfindaplanethatfitsthe
datapointsperfectly,increasingRsquaredupto100%.The"perfect"Rsquaredisagainduetothefactthatthe
numberofobservations,3,isonlyonemorethanthenumberofindependentvariables,2.
ImprovingRsquaredbyaddingirrelevantvariablesis"cheating."WecanalwaysincreaseRsquaredto100%by
addingindependentvariablesuntilwehaveonefewerthanthenumberofobservations.Weare"overfitting"the
datawhenweobtainaregressionequationinthisway:theequationfitsourparticulardatasetexactly,butalmost
surelydoesnotexplainthetruerelationshipbetweentheindependentanddependentvariables.
Tobalanceouttheeffectofthedifferencebetweenthenumberofobservationsandthenumberofindependent
variables,wemodifyRsquaredbyanadjustmentfactor.Thistransformationlooksquitecomplicated,butnotice
thatitislargelydeterminedbynk,thedifferencebetweenthenumberofobservationsnandthenumberof
independentvariablesk.
ThisadjustmentreducesRsquaredslightlyforeachvariableweadd:unlessthenewvariableexplainsenough
additionalvariancetoincreaseRsquaredbymorethantheadjustmentfactorreducesit,weshouldnotaddthe
newvariabletothemodel.
85/135

5/13/2016

QuantitativeMethodsOnlineCourse

Excelreportsboththe"raw"RsquaredandtheAdjustedRsquared.
ItiscriticaltouseadjustedRsquaredwhencomparingthepredictivepowerofregressionswithdifferent
numbersofindependentvariables.Forexample,sincetheadjustedRsquaredofthemultipleregressionofhouse
priceversushousesizeanddistanceisgreaterthantheadjustedRsquaredofeithersimpleregression,wecan
concludethatwegainedrealpredictivepowerbyconsideringbothindependentvariablessimultaneously.
Afinalcaveat:weshouldnevercompareRsquaredoradjustedRsquaredvaluesforregressionswithdifferent
dependentvariables.AnRsquaredof50%mightbeconsideredlowwhenwearetryingtoexplainproductsales,
sinceweexpecttobeabletoidentifyandquantifymanykeydriversofsales.AnRsquaredof50%wouldbe
consideredhighifweweretryingtoexplainhumanpersonalitytraits,sincetheyareinfluencedbysomany
factorsandrandomevents.

Summary
Rsquaredmeasureshowwellthebehavioroftheindependentvariablesexplainsthebehaviorofthedependent
variable.Itisthepercentageofvariationinthedependentvariableexplainedbyitsrelationshipwiththe
independentvariables.BecauseRsquaredneverdecreaseswhenindependentvariablesareaddedtoa
regression,wemultiplyitbyanadjustmentfactor.Thisadjustmentbalancesouttheapparentadvantagegained
justbyincreasingthenumberofindependentvariables.

SolvingtheStaffingProblem(II)
AliceurgesyoutouseyournewfoundknowledgetoanalyzetherelationshipbetweentheKahana'soccupancyand
advancedbookingsandKauai'sarrivals.
YouandAlicewanttofindthethreewayrelationshipbetweenthedependentvariable,theKahana'soccupancy,and
thetwoindependentvariables,arrivalsonKauaiandtheKahana'sadvancebookings.
ThefirstthingAliceasksyoutodoistomeasurethestrengthoftherelationshipbetweenthetwoindependent
variables.Enterthecorrelationcoefficientasadecimalnumberwiththreedigitstotherightofthedecimalpoint
(e.g.,enter"5"as"5.000").Roundifnecessary.
KahanaOccupancyData
Thecorrelationbetweenarrivalsandbookingsisn'tespeciallystrong,44%.
KahanaOccupancyData
YourunthemultipleregressionofoccupancyversusKauaiarrivalsandadvancebookings.
KahanaOccupancyData
Whichofthefollowingindicatesthatthemultipleregressionwithtwoindependentvariablesisanimprovement
overbothsimpleregressions?
KahanaOccupancyData
KahanaOccupancyRegressions
Dothedataindicatethattheregressioncoefficientsoftheindependentvariablesaresignificantatthe0.05level?
KahanaOccupancyData
KahanaOccupancyRegressions
SupposeLeohas200advancebookingsforthemonthofJanuaryand250advancebookingsforFebruary.Whatis
thebestestimateofhowmanymoreguestsLeocanexpectinFebruarycomparedtoJanuary?
KahanaOccupancyData
KahanaOccupancyRegressions
YouandAlicetrytocontactLeo,butheappearstobeunavailable.
LeomustbewithhislawyersinHonolulu.I'msurewe'llbeabletogetaholdofhimlater.
Inthemeantime,IcancompletemyresearchintotheExcelsior'spromotions.Andtherearesomecomplexitiesof
multipleregressionthatyoushouldbecomefamiliarwith...
86/135

5/13/2016

QuantitativeMethodsOnlineCourse

Exercise1:EmpireLearning
EmpireLearningisadeveloperofeducationalsoftware.CEOBillHartborneismakingabidforacontractto
createanelearningmoduleforanewclient.
Preparingthebidrequiresanestimateofthenumberoflaborhoursitwilltaketocreatethenewmodule.Bill
believesthatthelengthofamoduleandthecomplexityofitsanimationsdirectlyaffecttheamountoflabor
requiredtocompleteit.
BillhasdataonthelaborhoursEmpireusedtocompletepreviouscourses.Healsoknowsthenumberofpages
andtheanimationruntimeofeachpreviouscoursequantitieshethinksarereasonableproxiesforcourse
lengthandanimationcomplexity,respectively.
Performasimpleregressionanalysisforeachoftheindependentvariables:numberofpagesandruntimeof
animations.
EmpireLearningData
Whichfactorexplainsmorevariationinlaborhours?
EmpireLearningData
EmpireLearningRegressions
Inthesimpleregressions,whichoftheindependentvariablescontributessignificantlytothenumberoflabor
hoursittakesEmpiretocreateanelearningcourse?
EmpireLearningData
EmpireLearningRegressions
Thepvaluesforthecoefficientsonanimationruntimeandnumberofpagesare0.003and0.0002respectively
wellbelow0.05,themostcommonlyusedlevelofsignificance.Thus,weconcludethatbothindependentvariables
contributesignificantlyintheirrespectivesimpleregressionstothenumberoflaborhoursEmpiretakestocreate
anelearningcourse.
Runthemultipleregressionoflaborhoursversusnumberofpagesandruntimeofanimations.
EmpireLearningData
Accordingtothismultipleregression,whichoftheindependentvariablescontributessignificantlytothenumber
oflaborhoursittakesEmpiretocreateanelearningcourse?
EmpireLearningData
EmpireLearningRegressions
Thepvaluesforthecoefficientsonanimationruntimeandnumberofpagesare0.014and0.0015respectively
wellbelow0.05,themostcommonlyusedlevelofsignificance.Thus,weconcludethatbothindependentvariables
contributesignificantlyinthemultipleregressiontothenumberoflaborhoursEmpiretakestocreateane
learningcourse.

Exercise2:TheEmpireStrikesBack
Forthisexercise,refertotheregressionanalysesperformedinExercise1ofthissection.
EmpireLearningData
EmpireLearningRegressions
BillHartborne,CEOofEmpireLearning,isusingregressionanalysistopredictthenumberoflaborhoursitwill
takehisteamtocreateanewelearningcourse.HeisusingdataonpreviouscoursesEmpirecreated,withthe
numberofpagesandthetotalruntimeofanimationsasindependentvariables.
EmpireLearningData
EmpireLearningRegressions
Inthemultipleregressionoflaborhoursversusnumberofpagesandruntimeofanimations,thecoefficientof
0.84forthenumberofpagestellsusthat:
EmpireLearningData
87/135

5/13/2016

QuantitativeMethodsOnlineCourse

EmpireLearningRegressions
Inthemultipleregressionequation,thecoefficientoftheindependentvariable"numberofpages"isgross
relativeto:
EmpireLearningData
EmpireLearningRegressions

Challenge:ChildrenoftheEmpire
Forthisexercise,refertotheregressionanalysesperformedinExercise1ofthissection.
EmpireLearningData
EmpireLearningRegressions
BillHartborne,theCEOofEmpireLearning,isusingregressionanalysistopredictthenumberoflaborhoursit
willtakehisteamtocreateanewelearningcourse.HeisusingdataonpreviouscoursesEmpirecreated,withthe
numberofpagesandthetotalruntimeofanimationsasindependentvariables.
EmpireLearningData
EmpireLearningRegressions
Billbillsouthistalentat$70/hour.Basedonthemultipleregression,howmuchshouldhechargeforthelabor
contentofacoursewith400pagesand170secondsofanimations?Entertheestimatedcostofthelabor(in$)as
aninteger(e.g.,enter"$5.00"as"5").Roundifnecessary.
EmpireLearningData
EmpireLearningRegressions
Firstusetheregressionequationtopredictthenumberoflaborhoursrequiredtocompletethecourse.
EmpireLearningData
EmpireLearningRegressions
ThenmultiplythatnumberbyEmpireLearning'sbillingrateof$70/hourtofindthetotalamountheshould
chargeforthelaborcontentofthecourse,$77,210.
EmpireLearningData
EmpireLearningRegressions
Billissurethattheclientwillbalkatalaborbillofover$70,000.Heknowsthatanimationisimportanttothe
client,sodoesn'twanttocutcornersthere.However,hebelievesthathisleadwritercancoverthecontentinfewer
pageswithoutcompromisinghisrenownedclearandengagingprose.
EmpireLearningData
EmpireLearningRegressions
Toreducetotallaborcoststo$70,000,howmanypagesmustBillcutfromtheplantomeethisclient'scostlimits?
EmpireLearningData
EmpireLearningRegressions
Toreducethelaborbillfrom$77,210to$70,000,Billmustreducelaborcostsby$7,210.Toachievethisreduction,
Billmustcutthecontract'slaborhoursby103hours,sincehebillsouthistalentat$70/hour.
EmpireLearningData
EmpireLearningRegressions
Sincetheanimationruntimewillnotchange,weusethenetrelationshipbetweenlaborhoursandnumberof
pages,whichtellsusthateachadditionalpageconsumes0.84laborhours.Thus,Billmustreducethenumberof
pagesby123..
EmpireLearningData
EmpireLearningRegressions

NewConceptsinMultipleRegression
88/135

5/13/2016

QuantitativeMethodsOnlineCourse

"IexpectLeowillcallusthiseveningafterhismeetingwiththelawyers,"Alicepredicts."Ihopethingsaregoingwell.
IfMr.Pitt'slawsuitmaterializes,Leomightnothavemuchofabusinesslefttohelphimwith."

TheStaffingProblem(III)
Ijustspentthewholedayatmylawyers'offices.Pleasegivemesomegoodnewsabouttheoccupancyproblem.
Well,we'vefoundaregressionmodelthatincorporatesarrivalsonKauaiandadvancebookings.We'renowableto
explainabout86%ofthevariationinoccupancy.
KahanaOccupancyRegressions
That'sgreat.That'ssomuchbetterthanthe39%youcalculatedusingonlyadvancebookingsastheindependent
variable.Ishouldbeabletomakemuchmorereliablepredictionsbasedonyournewmodel!
Unfortunately,no.Althoughthisnewmodelhelpsusunderstandwhyyouroccupancyvaries,wecan'texactlyusethe
modeltomakepredictions.
Youcanuseadvancebookingstomakeapredictionaboutoccupancyinagivenmonthbecausethebookingsare
knowntoyouaheadoftime.Butyouwon'tgetthedataonthenumberofarrivalsinamonthuntilit'stoolateand
yourguestsarealreadyonyourdoorstep.
That'sterrible!Sure,it'snicetoknowhowtoday'soccupancyisaffectedbytoday'sarrivals.ButIneedtomake
businessdecisions!Ineedtoknowonemonthinadvancehowmanystafferstohire!Please,isn'ttheresomething
youcando?
Don'tgiveupyet,Leo.Westillhaveanumberofstatisticalapproachesatourdisposal.We'llhavesomethingforyou
whenyougetbackfromHonolulu.

Multicollinearity
"WeneedsomemoreadvancedstatisticaltoolstofindaregressionsuitabletoLeo'spurposes,"Alicetellsyou."And
thereisstillapitfallyou'llneedtolearntoavoidwhenusingmultipleregression."
Anotherkeyfactorthatinfluencesthepriceofahouseisthesizeofthepropertyitisbuiltonits"lotsize".
Naturally,we'dpaymoreforaspaciousacreoflandthanfor800crampedsquarefeet.
Usingnewdataonthelotsizesofthe15samplehousesinSilverhaven,runthesimpleregressionofhousepriceon
lotsize.Ifwedon'tcontrolforanyotherfactors,howmuchofthevariationinpricecanbeexplainedbyvariationin
lotsize?EnterRsquaredasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.50").Roundifnecessary.
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Lotsizeaccountsfor30percentofahome'sprice.Dothedataprovideevidencethatthereisasignificantlinear
relationshipbetweenhousepriceandlotsize?
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Thelowpvalueof0.033tellsusthatwecanbeconfidentthatthegrossrelationshipbetweenlotsizeandhomeprice
issignificant.Whathappenswhenweaddlotsizeasathirdindependentvariableinourmultipleregressionofprice
onthreeindependentvariables:housesize,distancefromdowntownSilverhaven,andlotsize?
Runamultipleregressionofpriceonthethreeindependentvariables:housesize,distance,andlotsize.Howdoes
theadditionofthenewindependentvariable,"lotsize"affectthepredictivepoweroftheregressionmodel?
SilverhavenRealEstateData
SilverhavenRealEstateRegressions
Byaddingtheindependentvariable"lotsize",weimproveadjustedRsquaredslightly:from89%to91%,tellingus
thatthepredictivepoweroftheregressionhasimproved.Whataboutthesignificanceoftheindependentvariables?
Hasaddingthenewvariablechangedthepvaluesofthecoefficients?
SilverhavenRealEstateRegressions
89/135

5/13/2016

QuantitativeMethodsOnlineCourse

Somethingoddhashappened.Inourearlierregressionwithtwoindependentvariables,thepvaluesforboththe
housesizeandthedistancecoefficientswerelessthan0.05.Now,addinglotsizeintotheequationhassomehow
raisedthepvalueforhousesize.
Thenewpvalueforhousesize,0.2179,issohighthatthereisnolongerevidenceofasignificantlinearrelationship
betweenpriceandhousesizeaftertakinglotsizeintoaccount.Howdoweexplainthisdropinsignificance?
Whenamultipleregressiondeliversasurprisingresultsuchasthis,wecanusuallyattributeittoarelationship
betweentwoormoreoftheindependentvariables.Let'slookatthedataonhousesizeandlotsize.
Thehighcorrelationcoefficientbetweenhousesizeandlotsize94%istheculpritintheCaseoftheDropping
Significance.
Whentwooftheindependentvariablesarehighlycorrelated,oneisessentiallyaproxyfortheother.This
phenomenoniscalledmulticollinearity.Inourexample,lotsizeisagoodproxyforhousesize.
Bothhousesizeandlotsizecontributetothepriceofahome.Butbecausethesetwovariablesarecloselycorrelated
inourdataset,thereisnotenoughinformationinthedatatodiscernhowtheircombinedcontributionsshouldbe
attributedtothetwoindependentvariables.
Theneteffectofhousesizeonpriceshouldtellustheeffectofhousesizeonpriceassumingthatthelotsizeisfixed.
However,wecan'tdetectthiseffectinthedata:housesizeandlotsizearesocloselyrelatedthatwe'veneverseen
housesizevarymuchwhenlotsizeisfixed.
Woulddroppingthevariablehousesizeimprovethepredictivepoweroftheregressionmodel?Themultiple
regressionswithandwithouthousesizehavedifferentnumbersofindependentvariables,soweuseadjustedR
squaredtocomparetheirpredictivepower.
Withouthousesize,adjustedRsquaredis90.89%,slightlylowerthan91.40%,theadjustedRsquaredforthe
regressionincludinghousesize.Thus,althoughtheregressionmodelcannotaccuratelyestimatetheeffectofhouse
sizewhenwecontrolforlotsizeanddistance,theadditionofhousesizedoeshelpexplainabitmoreofthevariance
insellingprice.

DiagnosingandTreatingMulticollinearity
AcommonindicationoflurkingmulticollinearityinaregressionisahighadjustedRsquaredvalueaccompanied
bylowsignificanceforoneormoreoftheindependentvariables.Onewaytodiagnosemulticollinearityistocheck
ifthepvalueonandindependentvariableriseswhenanewindependentvariableisadded,suggestingstrong
correlationbetweenthoseindependentvariables.
Howmuchofaproblemismulticollinearity?Thatdependsonwhatweareusingtheregressionanalysisfor.If
we'reusingittomakepredictions,multicollinearityisnotaproblem,assumingasalwaysthatthehistorically
observedrelationshipsamongthevariablescontinuetoholdgoingforward.
Inthehousepriceexample,we'dkeepthehousesizevariableinthemodel,becauseitspresenceimproves
adjustedRsquaredandbecauseourjudgmentwouldsuggestthathousesizeshouldhaveanimpactonprice
separatefromtheeffectoflotsize.
Ifwe'retryingtounderstandthenetrelationshipsoftheindependentvariables,multicollinearityisaserious
problemthatmustbeaddressed.Onewaytoreducemulticollinearityistoincreasethesamplesize.Themore
observationswehave,theeasieritwillbetodiscerntheneteffectsoftheindividualindependentvariables.
Wecanalsoreduceoreliminatemulticollinearitybyremovingoneofthecollinearindependentvariables.
Identifyingwhichonetoremoverequiresacarefulanalysisoftherelationshipsbetweentheindependent
variablesandthedependentvariable.Thisiswhereamanager'sdeepunderstandingofthedynamicsofthe
situationbecomesinvaluable.
Inourhomepriceexample,we'dexpectbothhousesizeandlotsizetohavesignificantanddiscernableeffectson
thepriceofahome:ashackonanacreoflandshouldcostlessthanamansiononasimilarproperty.
Tobetterunderstandtheneteffectofhousesizeweshouldprobablygatheralargersampletoreduce
multicollinearity.Ifwedidn'texpectlotsizeandhousesizetohavedistincteffectsonprice,wemightremove
housesizefromtheequation.

Summary
90/135

5/13/2016

QuantitativeMethodsOnlineCourse

Multicollinearityoccurswhensomeoftheindependentvariablesarestronglyinterrelated:distinguishingthe
respectiveeffectsofsomeoftheindependentvariablesonthedependentvariableisnotpossibleusingthe
availabledata.Multicollinearityistypicallynotaproblemwhenweuseregressionforforecasting.Whenusing
regressiontounderstandthenetrelationshipsbetweenindependentvariablesandthedependentvariable,
multicollinearityshouldbereducedoreliminated.

LaggedVariables
IntheSilverhavenrealestateexamplewelookedatanumberofhousesandcomparedfourcharacteristics:price,
housesize,lotsize,anddistancefromthecitycenter.Thesearecrosssectionaldata:welookedatacrosssection
oftheSilverhavenrealestatemarketataspecificpointintime.
Atimeseriesisasetofdatacollectedoverarangeoftime:eachdatapointpertainstoaspecifictimeperiod.
OurEasyMeatdataisanexampleofatimeserieswehavedataonsalesandadvertisinglevelsforten
consecutiveyears.
Sometimes,thevalueofthedependentvariableinagivenperiodisaffectedbythevalueofanindependent
variableinanearlierperiod.Weincorporatethedelayedeffectofanindependentvariableonsdependent
variableusingalaggedvariable.
Theeffectsofvariablessuchasadvertisingoftencarryoverintothelatertimeperiod.Forexample,lastyear's
EasyMeatadvertisinglevelsmaycontinuetoaffectthisyear'ssales.
Tostudythiscarryovereffect,weaddthelaggedvariable"previousyear'sadvertising."Ourregressionnowhas
twoindependentvariables:thecurrentyear'sadvertisinglevelandlastyear'sadvertisinglevel.
TorunaregressiononthelaggedEasyMeatadvertisingvariablewefirstneedtopreparethedataforthelagged
variable:wecopythecolumnofadvertisingdataovertoanewcolumn,shifteddownbyonerow.
Thefirstandlastdatapointsdrawattention:thefirsthasallthenecessarydataexceptalaggedadvertisingvalue,
andthelasthasalaggedvaluebutnootherinformation.Sinceweneedobservationswithdataonallvariables,we
areforcedtodiscardthefirstobservationaswellastheextraneouspieceofinformationinthelaggedvariable
column.
Ineffect,byintroducingalaggedvariable,weloseadatapoint,becausewehavenovaluefor"previousyear's
advertising"forthefirstobservation.
Weruntheregressionaswewouldanyordinarymultipleregression.Theequationweobtainis:
EasyMeatSalesData
EasyMeatSalesRegressions
Comparingtheoutputfromtheregressionswithandwithoutthelaggedvariable,wefirstnoticethatthenumber
ofobservationshasdecreasedfrom10to9.Doestheadditionofthelaggedadvertisingvariableimprovethe
regression?
ThelaggedvariabledecreasesadjustedRsquaredsubstantiallyfrom84.11%to76.62%.Moreover,thelagged
variable'shighpvalue0.33tellsusthatitseffectistooinsignificantforustodistinguishfromthecurrent
year'sadvertising'seffect.Weconcludethattheadditionofthelaggedvariablehasnotimprovedtheregression.
Wecanintroducevariableswithevenlongerlags.Wemighttrytouseadvertisingdatafromtwoormoreyearsin
thepasttohelpexplainthepresentsaleslevel.However,foreachadditionalperiodoflagweloseanotherdata
point.
RunningtheEasyMeatregressionwiththecurrentyear'sadvertising,lastyear'sadvertising,andtheadvertising
levelfromtwoyearsagoleavesusonlyeightobservations.Welosepredictivepower:adjustedRsquareddrops
from76.62%to62.47%.Moreover,noneofthecoefficientsissignificantatthe0.05level.
Addingalaggedvariableiscostlyintwoways.Thelossofadatapointdecreasesoursamplesize,whichreduces
theprecisionofourestimatesoftheregressioncoefficients.Atthesametime,becauseweareaddinganother
variable,wedecreaseadjustedRsquared.
Thus,weincludealaggedvariableonlyifwebelievethebenefitsofaddingitoutweighthelossofanobservation
andthe"penalty"imposedbytheadjustmenttoRsquared.
Despitethesecosts,laggedvariablescanbeveryuseful.Sincetheypertaintoprevioustimeperiods,theyare
usuallyavailableaheadoftime.Laggedvariablesareoftengood"leadingindicators"thathelpuspredictfuture
91/135

5/13/2016

QuantitativeMethodsOnlineCourse

valuesofadependentvariable.

Summary
Wecanuselaggedvariableswhendataconsistofatimeseries,andwebelievethatthevalueofthedependent
variableatonepointintimeisrelatedtothevalueofanindependentvariableatapreviouspointintime.
Laggedvariablesareespeciallyusefulforpredictionsincetheyareavailableaheadoftime.However,theycome
atacost:weloseoneobservationforeachtimeseriesintervalofdelayedeffectweincorporateintothelagged
variable.

DummyVariables
Sofar,we'vebeenconstructingregressionmodelsusingonlyquantitativevariablessuchasadvertising
expenditures,housesize,ordistance.Thesevariablesbytheirnaturetakeonnumericalvalues.
Butmanyvariableswestudyarequalitativeorcategorical:theydonotnaturallytakeonnumericalvalues,but
canbeclassifiedintocategories.
Forexample,inourhousepriceexample,acategoricalvariablemightbethehome'sprimaryconstruction
material:wood,brick,orstraw.Assigningnumericalvaluestosuchcategoriesdoesn'tmakesenseahousemade
ofwoodcannotbedescribedasabrickhouseplussomenumber.Howdoweincorporatecategoricalvariablesinto
aregressionanalysis?
Let'slookatanexamplefromJuliusTabin'sEasyMeatbusiness.WehavefoundthatEasyMeatsalesare
determinedtoalargeextentbyhowmuchhespendsonadvertising.Anotherfactoraffectingsalesiswhichflavor
ofEasyMeatJuliusfeaturesinhisadvertisingcampaigns.
JuliusproducesbothEasyMeatClassic,madefrombeefforthemostpart,andEasyMeatPoulk!,aporkand
poultryblend.
Hopingtomatchtherightspreadblemeatflavortothemoodofthenation,JuliusfeaturesPoulk!inhisad
campaignsinsomeyears.Inotheryears,classictakestheleadrole.
TostudytheeffectsofJulius'flavoroftheyearchoice,weuseatypeofcategoricalvariablecalledadummy
variable.Adummyvariabletakesononeoftwovalues:0or1,toindicatewhichoftwocategoriesadatapointfalls
into.
Thisdummyvariablewe'llcallit"Poulk!"flavorissetto1foryearswhenEasyMeatadsfeaturePoulk!,and0
foryearswhentheyfeatureClassic.
Runtheregressiononthedata,withthesalesasthedependentvariable,andadvertisingandthePoulk!flavor
dummyvariableastheindependentvariables.WhatisthecoefficientforthePoulk!flavorvariable?Enterthe
coefficientforthePoulk!flavorvariableasaninteger(e.g.,"5").Roundifnecessary.
EasyMeatSalesData
EasyMeatSalesRegressions
Wefindacoefficientof533,024forPoulk!flavor.Whatdoesthiscoefficienttellus?
Thecoefficient533,024tellsusthataftercontrollingforthelevelofadvertisingexpenditures,averagesalesare
$533,024higherwhenPoulk!istheflavoroftheyear,ratherthanEasyMeatClassic.
Thisregressionmodelcanbeexpressedgraphically:essentiallywehavetwoparallelregressionlines:oneforyears
inwhichClassicispromoted,andonefor"Poulk!Years."Theverticaldistancebetweenthelinesistheaverage
increaseinsalesJuliuswouldexpectifhechoosestofeaturePoulk!inagivenyear,aftercontrollingforadvertising
level.
Theslopeofeachlineisthesame:ittellsustheaverageincreaseinsaleswhenJuliusspendsanotherdollarin
advertisingaftercontrollingforwhichflavorisfeatured.
Inthisexample,wearbitrarilychosetosetthedummyvariableequaltozerofortheClassicflavor,i.e.tomake
Classicthe"basecase"flavor.IfwemadePoulk!thebasecaseflavor,wewouldobtainexactlythesamegraph.The
coefficientonflavornowwouldbe533,024,butitwouldagainindicatethatJuliusshouldexpectaveragesalesto
be$533,024lowerintheyearsthathisadsfeatureClassicratherthanPoulk!.
Asforanyindependentvariableinaregression,wecantestadummyvariable'ssignificancebycalculatingap
92/135

5/13/2016

QuantitativeMethodsOnlineCourse

valueforitscoefficient.Whenthepvalueislessthan0.05,wecanbe95%confidentthatthecoefficientisnotzero
andthatthedummyvariableissignificantlyrelatedtothedependentvariable.
IfJuliusproducedandpromotedmorethantwoflavors,we'dneedanotherdummyvariableforeachadditional
flavor.Ingeneral,we'dneedonefewerdummyvariablethanthenumberofflavors:onevariableforeachofthe
flavorsexceptforClassic,thebasecase.
Foreachyearthatagivenflavorisfeatured,itsdummyvariableissettoone,andallotherflavorvariablesareset
tozero.Ina"ClassicYear,"alldummyvariablesaresettozero
Whydon'twehaveexactlyasmanydummyvariablesascategories?SupposewehavejusttwocategoriesPoulk!
andClassicwithseparatedummyvariablesforeachPandC.InaPoulk!Year,P=1andC=0.InaClassic
Year,P=0andC=1.Theseconddummyvariablewouldbecompletelyredundant:sincethereareonly2flavors,
ifP=0weknowtheflavorisnotPoulk!,soitmustbeClassic.
Technically,includingbothdummyvariableswouldbeproblematicbecausetheywouldbeperfectlycorrelated:
wheneverP=1,C=0,andwheneverP=0,C=1.Perfectlycorrelatedvariablesareperfectlycollinear:wegain
nothingbyincludingbothandmustdroponetoavoidcripplingmulticollinearity.
Itisusefultonotethatrunningasimpleregressionanalysiswithonlyadummyvariableisequivalenttorunning
ahypothesistestforthedifferencebetweenthemeansoftwopopulations.Inthiscase,onepoulationwouldbe
salesinPoulk!years,andtheotherwouldbesalesinClassicyears.Fromthedatawefindthemeanandstandard
deviationofsalesinPoulk!andClassicyears.Thehypothesistestgivesusapvalueof0.43,sowecannotconclude
thatthemeansdifferwhenthefeatureflavorisdifferent.
Asimpleregressionofsalesonthedummyvariableflavorgivesusthesameresult.Thepvalueonthedummy
variableis0.45,tellingusthattheflavordummyvariableisnotstatisticallysignificant.Thepvaluesdifferslightly
forthetwoapproachesduetodifferencesinthewayExcelcalculatestheterms,butthetestsareconceptually
equivalent.

Summary
Wecanusedummyvariablestoincorporatequalitativevariablesintoaregressionanalysis.Dummyvariables
havethevalues0and1:thevalueis1whenanobservationfallsintoacategoryofthequalitativevariableand0
whenitdoesn't.Forqualitativevariableswithmorethantwocategories,weneedmultipledummyvariables:
onefewerthanthenumberofcategories.

SolvingtheStaffingProblem(III)
Withoneeyeonthelookoutforsignsoflurkingmulticollinearityandtwonewtypesofvariablesinyourtoolbox,you
setouttosettlethestaffingproblemforgood.
Youmighthavenoticedthatthenumberofarrivalsfollowsanannualseasonalpattern.Arrivalstendtodropoff
duringthelatesummer,surgeagaininOctober,anddropverylowfortherestoftheyear.
Source
TheypickupbrieflyinFebruary,butthetouristbusinessslowsthroughthespring.Duringtheearlysummer,
vacationersstartarrivingindroves,witharrivalspeakinginJuneorJuly.
Source
Perhapswecanusetheseasonalityofarrivalsinsomeway.Wemightcomeupwithalaggedvariablethatfunctions
asaproxyforthecurrentmonth'sarrivals.Thenwecanrunaregressionwiththelaggedvariable.Sincethevaluesof
alaggedvariablewouldbebasedonhistoricaldata,Leowouldknowthemaheadoftime.
Giventhatarrivalsfollowthisseasonalpattern,whichofthefollowingvariablesislikelytobeagoodproxyforthis
month'sarrivalsonKauai?
Source
YourunthesimpleregressionofKahanaoccupancyversus12monthlaggedarrivals.
KahanaOccupancyData
Source
93/135

5/13/2016

QuantitativeMethodsOnlineCourse

Whatisthelowestlevelatwhichthe"laggedarrivals"variableissignificant?
KahanaOccupancyData
KahanaOccupancyRegressions
Source
Theregressionoutputshowsapvalueof0.000forthelaggedarrivalscoefficient,farbelowthesignificancelevelof
0.01.
At55%,theRsquaredforthesimpleregressiononlaggedarrivalsismuchlowerthanforthesimpleregressionon
currentarrivals,80%.TheadjustedRsquaredisonly53%.Still,laggedarrivalshavesubstantialpredictivepower.
Runamultipleregressionofoccupancyversuslaggedarrivalsandadvancebookings.
KahanaOccupancyData
WhatistheadjustedRsquaredforthismultipleregression?EntertheadjustedRsquaredasadecimalnumber
withtwodigitstotherightofthedecimalpoint(e.g.,enter"50%"as"0.50").
KahanaOccupancyData
KahanaOccupancyRegressions
TheadjustedRsquaredof60%forthemultipleregressionishigherthanthe53%forthesimpleregression.Butis
thatthebestyoucando?Forthemoment,youreportyouranalysistoAlice.
ThismodelwiththelaggedvariableismoreusefultoLeo,evenifitspredictivepowerisstillprettylow.Butlet'slook
atthesedataonLeo'scompetitionthatIdugup.Maybewecanusethemtogiveusanevenbettermodel.
Leo'scompetitor,KnutSteinkaltattheHotelExcelsior,frequentlylaunchespromotioncampaigns:insomemonths,
Steinkaltslashesroompricesdramatically.
ThesedatashowthemonthsinwhichtheExcelsior'spromotionstookplaceinthelastthreeyears.Wecanusea
dummyvariabletoseehowthecompetition'spromotionshaveaffectedLeo'soccupancy.
KahanaOccupancyData
TheExcelsiorhastoadvertisethesepromotionalpackagesatleastamonthinadvancetoattractcustomers.Aslong
asLeokeepsaneyeontheExcelsior'sadvertising,he'llbealertedtothesepromotionsinenoughtimetotakethem
intoaccountwhenhemakesstaffingdecisionsforthefollowingmonth.
KahanaOccupancyData
UsingAlice'sresearch,yourunasimpleregressionofoccupancyversusthepromotions.Thenyourunthemultiple
regressionofoccupancyversusadvancebookings,laggedarrivals,andExcelsiorpromotions.
KahanaOccupancyData
SupposeLeohasusedthemultipleregressionequationtopredictJuly'soccupancyusingagivenlevelofadvance
bookingsandlastJuly'sarrivals.Justbeforehemakeshisstaffingdecisions,helearnsthat,unexpectedly,hisrival
KnuthascuttheExcelsior'sroompricesforJuly.Leoshouldrevisehispredictedoccupancyby:
KahanaOccupancyData
KahanaOccupancyRegressions
ThegrosseffectofExcelsiorpromotionsonKahanaoccupancy,areductionof52guests,isnotrelevantheresincewe
knowtheadvancedbookingsandlaggedarrivalsforJulyarestillthesameastheywerebeforetheExcelsior
launcheditscampaign.Leowantstousetheneteffectofthepromotiononhisoccupancylevels,assumingadvanced
bookingsandlaggedarrivalsremainfixedanaveragereductioninoccupancyof60guests.
AddingExcelsiorpromotionstotheregressionanalysis:
KahanaOccupancyData
KahanaOccupancyRegressions
UponLeo'sreturn,youeagerlyreportyourresults.
Thisisgoodwork.Naturally,I'dliketohaveevengreaterpredictivepowerthan84%,butIrealizeyouaren't
psychics.ThismodelwillreallyhelpmewhenIhirestaff.Thanks!
94/135

5/13/2016

QuantitativeMethodsOnlineCourse

Ihavesomegoodnewsofmyown.Mr.Pittagreednottofilesuitagainstme!Ididhavetopromisehimanextended
stayintheKahana'spenthouse,freeofcharge.
He'scomingnextmonththere'soneoccupantIdon'tneedtousestatisticstopredict.Thistime,I'llservethe
bisquemyself.
Thanksagainforallyourhelp!

Exercise1:TheKiwanaQuandary
LindaSzewczyk,marketingdirectorofAmalgamatedFruitsVegetables&Legumes(AFV&L),isresearchingthe
nation'sfruitconsumptionhabits.Inparticular,shewouldlikegreaterinsightintohouseholdconsumptionofthe
kiwana,acrossbreedofkiwisandbananasthatAFV&Lpioneered.
Naturally,oneimportantdeterminantofhouseholdconsumptionisthesizeofthehouseholdthenumberof
members.SinceAFV&Lhaspositionedthekiwanaasa"highend"fruit,Lindabelievesthathouseholdincome
mayalsoinfluenceitsconsumption.
Runamultipleregressionofhouseholdkiwanaconsumptionversushouseholdsizeandincome.Makenoteof
importantregressionparameterssuchasRsquared,adjustedRsquared,thecoefficients,andthecoefficients'
significance.Theincomevariablehasacoefficientof0.0004.Canavariablewithsuchasmallcoefficientbe
statisticallysignificant?
KiwanaConsumptionData
Theindependentvariable,income,isstatisticallysignificantsinceitspvalueislessthan0.05,themostcommon
levelofsignificance.Thesmallcoefficienttellsusthatforeveryadditional$10,000ofincome,averagekiwana
consumptionincreasesby4lbs.ayear.
Todate,AF&Lhasfocuseditsmarketingcampaignsonhighincome,highlyeducatedconsumers.Lindawouldlike
todeepenherunderstandingofhowtheeducationallevelofthehouseholdmembersmightaffecttheirappetite
forkiwanas.
KiwanaConsumptionData
Toincorporateeducationintoherkiwanaconsumptionanalysis,Lindaseparatedthehouseholdsinherdataset
intothreecategoriesbasedonthehighestlevelofeducationattainedbyanymemberofthehouseholdno
collegedegree,collegedegreebutnopostgraduatedegree,andpostgraduatedegree.Sherepresentsthese
categoriesusingtwodummyvariables"collegeonly"and"postgraduate."
Runaregressiononallfourindependentvariables.
KiwanaConsumptionData
Controllingforhouseholdsize,income,andpostgraduatedegree,howmanymorepoundsofkiwanasare
consumedinahouseholdinwhichthehighesteducationallevelisacollegedegree,comparedtoahouseholdin
whichnooneholdsacollegedegree?
KiwanaConsumptionData
KiwanaConsumptionRegressions
Thecoefficientforthedummyvariable"college,"51.6,tellsustheexpecteddifferenceinkiwanaconsumptionfor
"collegedegreesonly"householdscomparedtotheexcludededucationalcategory:householdsinwhichnoone
holdsacollegedegree.Thecoefficientdescribesthenetrelationshipbetween"collegedegreesonly"andhousehold
kiwanaconsumption,controllingforhouseholdsize,income,andpostgraduatedegree.
Controllingforhouseholdsize,howmanymorepoundsofkiwanasareconsumedinahouseholdinwhichthe
highesteducationallevelisapostgraduatedegree,comparedtoahouseholdinwhichthehighesteducational
levelisacollegedegree?Enterthedifferenceinconsumptionbetweenthetwohouseholdsasadecimalnumber
withtwodigitstotherightofthedecimalpoint.(e.g.,enter"5"as"5.00").Roundifnecessary.
KiwanaConsumptionData
KiwanaConsumptionRegressions
Whenyoucontrolforhouseholdsizeandincome,collegedegreehouseholdsconsume51.63lbsmorethannon
collegehouseholds.Postgraduatedegreehouseholdsconsume51.95lbsmorethannoncollegehouseholds.In
otherwords,postgraduatehouseholdsconsume0.32lbsmorethancollegehouseholds.
95/135

5/13/2016

QuantitativeMethodsOnlineCourse

Theanalysisofhouseholdkiwanaconsumptionindicates:
KiwanaConsumptionData
KiwanaConsumptionRegressions

Exercise2:TheReturnoftheEmpire
Forthisexercise,refertotheregressionanalysesyouraninexercise1oftheprevioussection.
EmpireLearningData
EmpireLearningRegressions
BillHartborne,theCEOofEmpireLearning,isusingregressionanalysistopredictthenumberoflaborhoursit
willtakehisteamtocreateanewelearningcourse.HeisusingdataonpreviouscoursesEmpirecreated,withthe
numberofpagesandthetotalanimationruntimeasindependentvariables.
EmpireLearningData
EmpireLearningRegressions
Billbelievesthatthenumberofillustrationsusedinthecoursemayalsohaveasignificantimpactonthenumber
oflaborhoursittakestocompleteanelearningcourse.Hewantstoaddthenumberofillustrationstothemodel
asanotherindependentvariable.
EmpireLearningData
EmpireLearningRegressions
Runthesimpleregressionoflaborhoursversusnumberofillustrations.
EmpireLearningData
EmpireLearningRegressions
Atwhichlevelisthenumberofillustrationsastatisticallysignificantindependentvariable?
EmpireLearningData
EmpireLearningRegressions
Runthemultipleregressionoflaborhoursversusnumberofpages,illustrations,andanimationruntime.
EmpireLearningData
EmpireLearningRegressions
Isthereevidenceofmulticollinearityinthedata?
EmpireLearningData
EmpireLearningRegressions
AcommonsymptomofmulticollinearityisahighadjustedRsquaredinthiscase94%accompaniedbyone
ormoreindependentvariableswithlowsignificance.Inthiscase,thecoefficientforthenumberofillustrationsis
notsignificantatthe0.05level,andthepvalueforthenumberofpageshasrisento0.0291,upfrom0.0015inthe
regressionwithoutillustrations.
Whichofthefollowingisthelikelyculpritofmulticollinearity?
EmpireLearningData
EmpireLearningRegressions
Multicollinearityoccurswhentherespectiveeffectsoftwoormoreindependentvariablesonthedependent
variablearenotdistinguishableinthedata.Thiscanbetheresultofcorrelatedindependentvariables.Thefact
thatthepvalueforthenumberofpagesriseswhenweaddtheillustrationsraisesoursuspicionsthatthenumber
ofillustrationsandthenumberofpagesmightbecorrelated.
Wecancomputethecorrelationbetweenthenumberofpagesandthenumberofillustrations:thecorrelation
coefficient,67%,isfairlyhigh.
Wecouldalsoattempttodiagnosethecauseofthemulticollinearitybyrunningaregressionoflaborhoursversus
numberofpagesandnumberofillustrationsomittinganimations.Here,thesignificanceofillustrationsis
extremelylow,withapvalueof0.85.
96/135

5/13/2016

QuantitativeMethodsOnlineCourse

EmpireLearningRegressions
Intheregressionoflaborhoursversusnumberofillustrationsandruntimeofanimationsomittingpagesthe
respectiveeffectsoftheindependentvariablesonthedependentvariablescanbedistinguished.Here,thep
valuesforbothvariablesaremuchlowerthan0.05.Alloftheevidencepointstoalinearrelationshipbetween
numberofpagesandnumberofillustrationsastheculpritforthemulticollearity.
EmpireLearningRegressions
Billwantstousetheregressionanalysistopredictthenumberoflaborhoursitwilltaketocompleteanewe
learningcourse.Comparingthetworegressionmodelsonewithallthreeindependentvariables,onewithout
illustrationswhichshouldBilluse?
EmpireLearningData
EmpireLearningRegressions

DecisionAnalysis
Introduction
ThreemoredaysbeforeyouleaveHawaii.Asexcitedasyouarebyyourimminentdebutatbusinessschool,youdon't
lookforwardtoyourdeparture.Youfindyourselfweighingtherelativemeritsofbusinessschoolandjuststayingon
Kauai.Leohasanopeningintherestaurant...
Theshrillringofthetelephoneinterruptsyourreverie.SoundingexcitedevenbyLeostandards,thehotelierdemands
yourimmediateattentioninhisoffice.

DiningatSea:TheChezTethysProblem
I'vehadanexcitingbusinessventureonmymindforsometime:Iwanttorunafloatingrestaurant.It'sgoingtobe
amazing,offeringspectacularviewsofvolcanicsilhouettesandthebrightlightsofresorttowns,thebestHawaiian
culinaryartistry,tastefullyalluringhuladancingandmusic,...
Slowdown,slowdown,Leo.I'minterestedtohearyouridea,butlet'sfocusonthebigpicturefirst.Afloating
restaurant?
Iwanttorunaworldclassrestaurantonayacht.ChezTethys,I'llcallit.I'vehadtheideaforsometime.ChezTethys
willbeHawaii'smostluxuriousdiningexperience.Theyachtwilltravelfromislandtoisland,dockinginadifferent
harboreachnight.
Seatingwillbelimited,andyoucanboardonlyatselectedtimes.Thechallengealoneofmakingareservationwill
makeitirresistible.ItwouldbecomeamatterofprestigetoincludeamealChezTethysonanytriptotheislands.
TheChezTethyshasbeenonmymindsincebeforeIinheritedtheKahana.I'dbeenlookingforinvestors,but
withoutluck.ThentheKahanafellintomylap,andthehotelhaspreoccupiedmesincethen.
I'mbeginningtofeelconfidentaboutrunningthehotelthankstoyoutwo,Ishouldadd.NowIfeellikeI'mready
toexpandmyoperations.Inadditiontoitsownprofits,justthinkoftheprestigeahighprofilefloatingrestaurant
wouldaddtotheKahana!
Listen,Leo,thisisanintriguingproposition,butI'mnotfullyconvincedthatafloatingrestaurantwouldreallycatch
on.Let'sgooveryourbusinessplan,andthenwe'llthinkabouthowlikelyitisthattheTethyswillbeaprofitable
venture.Onequickquestion:HowdoyouintendtofinancetheTethys?
I'mwayaheadofyou.I'vealreadybeentothebank,andIthinkIcantakeoutaloanusingtheKahanaascollateral.I
haveareallygoodfeelingabouttheTethys.I'mprettyconfidentthatIcanmakeafloatingrestaurantwork.
"Leo'sChezTethysideaisoutoftheordinary,butveryinteresting,"Alicemuses."Butwe'llneedtoguidehim
throughthedecisiontoexpandhisoperationssoextravagantly,especiallysincehe'sputtingtheKahanaatrisk.We
wouldn'twanthimtomakeachoicehe'llregret."

IntroducingDecisionAnalysis
S&CFilms,asmallfilmproductioncompany,justpurchasedascriptforanewfilmcalledCloven.Cloven'sunusual
97/135

5/13/2016

QuantitativeMethodsOnlineCourse

plothasthemakingsofacultclassic:apastoralromanceshatteredbyalivestockrevolutionandthecreationofa
poultrydominatedtotalitarianfarmstate.S&C'sownerandproducer,SethChaplin,enthusiasticallypitchesthe
film:"It'sAnimalFarmmeetsCasablanca".
SethenteredintonegotiationswithtwomajorstudiosPonyPicturesandK2Classicsandhammeredouta
prospectivedealwitheachofthem.
PonyPictureslikedthescriptandagreedtopayS&C$10milliontoproduceCloven,coveringtheproductioncosts
andaprofitmarginforS&C.SethestimatesClovenwouldcostS&C$9milliontoproduce.Underthisagreement,
Ponywouldacquireallrightstothemovie.
Theseconddeal,negotiatedwithK2Classics,stipulatesthatK2wouldmarketanddistributethemovie,andS&C
wouldcovertheproductioncostsitself.Underthisagreement,K2andS&Cwouldsplitownershipofthemovie.Seth
believesthatthemoviehashugepotential.Evenwithpartialownership,S&CwouldreapenormousprofitsifCloven
becomesablockbuster.
ThisdealspecifiesthatK2wouldcollectallrevenuesuntilitsmarketinganddistributioncostsarecovered,thentake
35%ofanyfurtherrevenues.S&Cwouldcollecttheremaining65%.IfClovenisablockbuster,S&Cwillmakea
killing.Ifitflops,S&Cwilllosesomeorallofitsproductioninvestment.
WhichdealshouldSethchoose?
Decisionmakingisanessentialmanagementresponsibility.Managersarechargedwithchoosingfrommultiple
coursesofactionthatcaneachleadtoradicallydifferentconsequences.Seth'sdecisionabouthowtofinanceCloven's
productionmightleadtoanythingfromblockbusterprofitstofinancialdisaster.
Whenfacedwithachoiceofactionsandarangeofpossibleoutcomes,decisionmakingcanbedifficult,inlargepart
duetouncertainty.Althoughthecourseofactionwechooseinfluencestheoutcome,muchofwhathappensisnot
onlybeyondourcontrol,butbeyondourpowerstopredictwithcertainty.
Nonetheless,decisionslikeSeth'sthatinvolveuncertaintycanbeanalyzedlogicallyandrigorously:decision
analysistoolscanhelpmanagersweighalternativeoptionsandmakeinformedandrationalchoices.
Inaworldofuncertainty,applyingdecisionanalysiswillnotguaranteethateachdecisionyoumakewillleadtothe
bestresult,oreventoagoodresult.Butifyouapplyeffectivedecisionanalysisconsistentlyoverthecourseofa
managerialcareer,youarealmostcertaintogainareputationforsoundjudgment.

Summary
Decisionanalysisisasetofformaltoolsthatcanhelpmanagersmakemoreinformeddecisionsinthefaceof
uncertainty.Althoughapplyingeventhemostrigorousdecisionanalysisdoesnotguaranteeinfallibility,itcan
helpyoumakesoundjudgmentsoverthecourseofyourcareer.

DecisionTrees
"TheproblemwithLeo'sfloatingrestaurantisthatitssuccesshingesonsomanyfactors,"Alicetellsyou."Some,Leo
canpredict,liketheapproximatepriceofayacht,orthecostoflabor.Butwhataboutconsumerdemand?Truthbe
told,notevenLeoknowswhethertheChezTethyswillcatchon.Leohastoincorporatethisuncertaintyintohis
decisionmakingprocess."

UncertaintyandProbability
ProducerSethChaplinboaststhatwhenClovenisreleasedintheaters,itwillbecomeaninstantclassic.Butevenan
incurableenthusiastlikeSethwouldneverclaimthatsuccessiscertain,andifaskedhowconfidentheisthatCloven
willachieveatleastmodestsuccess,hemightreplywithapercentage:"80%."
Uncertaintyclearlyhasaquantitativedimension:wecandistinguishbetweendegreesofuncertainty.
Butourintuitionaboutuncertaintyisoftenunderdeveloped.Howdowemeasureuncertaintyconsistentlyand
systematically?
Probabilityisameasureofuncertainty.Ameteorologistreportstheprobabilityofraininherforecast:"We'llsee
cloudyskiestomorrow,witha20%chanceofrain."
Probabilityismeasuredonascalefrom0to1,or0%to100%.Eventsthatareimpossiblearesaidtohavezero(or
98/135

5/13/2016

QuantitativeMethodsOnlineCourse

zeropercent)probability,andeventsthatareabsolutelycertainaresaidtohaveaprobabilityofone(or100%).But
howdoweinterpretprobabilitiesof20%or37.8%?
Let'slookataverysimpleandfamiliaruncertainevent:aspinofawheeloffortune.Ourwheelconsistsoftwo
equallysizedareas:anorangehalfandagreenhalf.Spinthewheelafewtimesandcountthenumberofgreenand
orangeoutcomesyouget.
Sincethetwoareasareofequalsize,we'dexpect"greens"and"oranges"tooccurwithequalfrequency.Abouthalfof
thespinswillendup"green"andhalfwillendup"orange."Ofcourse,ifweonlyspinafewtimes,wewon'texpect
exactlyhalfofthespinstoresultin"green."Butthemoretimeswespin,thecloserweexpectthepercentageof
greenswillbeto50%.Thisistheprobabilityofgettinga"green"eachtimewespin.
Thisinterpretationofprobabilityisarelativefrequencyinterpretation.Wehaveasetofeventsinourcase,
spinsandasetofoutcomes"green"or"orange."Weformtheratioofthenumberoftimesaparticularcolor
occurstothetotalnumberofspins.Theprobabilityofthatparticularcoloristhevaluethatratioapproachesasthe
totalnumberofspinsapproachesinfinity.
Let'slookatadifferentwheel.Thiswheelisalsodividedintotwoareas,butnowthegreenareaislarger:itcovers
threequartersofthewheel.
Whatistheprobabilityofaspinofthiswheelresultingintheoutcome"green"?
Sincethegreenareacoversthreequartersofthewholewheel,wecanexpectthatthreeoutoffourspinswillresultin
agreenoutcome.Thiscorrespondstoaprobabilityof75%.
Forsimpleuncertaineventslikeawheeloffortunegame,figuringouttheprobabilitiesoftheoutcomesiseasy.
Therearealimitednumberofpossibleoutcomesandtherearenohiddenfactorsaffectingtheoutcome.Besides,if
wehadanydifficultybelievingthattheprobabilityofa"green"is75%,wecouldsimplyspinthewheeloverandover
againtocalculateanapproximationoftheprobability.
Formorecomplexuncertaineventsliketheweatherorthesuccessofanewventure,findingtheprobabilitiesofthe
outcomesismoredifficult.Manyfactorscanaffecttheoutcomeindeed,wemaynotevenknowallthepossible
outcomes.Furthermore,theremaybenosimpleexperimentwecanrepeatedlyruntogenerateapproximationsof
therelativefrequencies.Howshouldwethinkaboutprobabilitiesinthesesituations?
Let'sthinkabouttheweather.Anytimeyouleaveyourhouse,youfacethedecisionofwhetherornottotakean
umbrella.Iftheskyisperfectlyblue,youprobablywon'ttakeone.Norwillyoudosoifthereareafewwhiteclouds
inthesky.
Ontheotherhand,iftheskyiscompletelyovercast,youmightconsidertakingsomekindofrainprotection,andif
it'salreadyraining,youalmostcertainlywould.You'dbeabletogivearoughestimateoftheprobabilityofrainfor
instanceifit'sovercast,youmightestimatea40%chanceofrain.
Thisestimatemayberough,butitisn'tcompletelyarbitrary,andit'sclearlyabetterestimatethaneither0%or
100%.Yoursubjectiveestimatemaybebasedonaninformalassessmentofrelativefrequency.Theestimateof
"40%"mightbeshorthandforthereflection:"Inmypastexperience,ithasrainedonalittleunderhalfthedays
whenitishasbeenthisovercastinthemorningatthistimeoftheyear,inthisgeographiclocation."
Althoughyouprobablyneversatdownandtabulatedthedaysaccordingtoseason,atmosphericconditions,and
rainfall,yourinformalassessmentispowerfulenoughtomakeareasonablechoiceabouttakingyourumbrella.
Thoughsometimes,youmaymakethewrongchoice.
Similarly,whenSethChaplingiveshisestimateofan80%chanceofCloven'ssuccess,heisprobablyarticulatingthe
following:"Inmyexperience,aboutfouroutoffivemoviesofasimilargenre,withasimilarbudget,releasedunder
similarconditionsweresuccessful."
Intheexampleswe'vediscussedthusfar,ouruncertaintyabouttheeventswefacedhasstemmedfromthefactthat
theeventshadnotyethappened.Sometimeswefaceuncertaintyforadifferentreason:aneventmayhavealready
occurred,butwesimplydon'tknowwhattheoutcomewas.Let'sreturntothewheeloffortunetoseehowwemight
thinkaboutuncertaintyinthesecases.
Imagineyouandafriendspinthewheel.Youcloseyoureyesandkeepthemshutwhileyourfriendobservesthe
wheel.Whilethewheelisspinning,bothyouandyourfriendareuncertainabouttheoutcome,andyouwouldeach
assesstheprobabilityof"green"at75%.
Whenthewheelstops,theresultbecomesdefinite.Yourfriendknowstheoutcome:forher,theprobabilityof
"green"isnoweither0%or100%.Butaslongasyoureyesareclosed,youremainuncertainabouttheoutcome.
99/135

5/13/2016

QuantitativeMethodsOnlineCourse

Ifsomeonepromisedtogiveyou$10ifyoucorrectlyguessedtheoutcomeofthatspinbeforeyouopenedyoureyes,
whichcolorwouldyouchoose?
Fromyourlimitedpointofview,thebestchoiceyoucanmakeis"green,"eventhoughtheoutcomemayalreadybea
definite"orange."Ifyouplayedthesamegameoverandoveragainandalwayschose"green,"youwouldwinthree
quartersofthegames.Theprobabilityoftheoutcometurningouttohavebeen"green"whenyouopenyoureyesis
75%.
Eventhoughtheeventhasalreadyoccurred,theoutcomeisuncertainfromyourpointofview.Itmakessenseto
measuretheuncertaintyduetoalackofinformationthesamewaywemeasuretheuncertaintyduetoourinability
topredicttheoutcomesoffutureevents.

Summary
Uncertaintymakesdecisionmakingchallenging.Wecanbeuncertainaboutoutcomesthathaveorhavenot
occurred.Tomakethemostinformeddecisions,wequantifyouruncertaintyusingprobabilitymeasures.
Probabilityismeasuredonascalefrom0%to100%:eventswith0%probabilityareimpossibleeventswith100%
probabilityarecertain.Anevent'sprobabilitycanoftenbedeterminedbyobservingtherelativefrequencyofits
occurrencewithinasetofopportunitiesfortheeventtooccur.Foreventsforwhichrelativefrequenciesare
difficulttoassess,weoftenmakesubjectiveestimatesoftheprobabilities.Thoughnotbasedon"hard"data,these
estimatesareoftenasufficientbasisforsounddecisionmaking.

StructuringDecisionTrees
Alicewhipsoutanotepadandpencil,andbeginsdrawingsomethingthatvaguelyresemblesasaguarocactus."Let's
lookathowwecananalyzeandstructuredecisionslikeLeo's."
Graphicalrepresentationsareoftenahighlyefficientwaytoorganizeandconveyinformation.Scatterplotsand
histogramsefficientlycommunicatedistributionsofdata,anorganizationalchartcanquicklyoutlinethestructureof
acompanyandpiechartseffectivelyexpressinformationaboutproportionsandprobabilities.
Istheresuchathingasagraphicaltoolthathelpsinformandorganizethedecisionmakingprocess?
Theanswerisyes,andthegraphicaltooliscalledadecisiontree.Let'slookatasimpledecisiontree.
SethChaplin'sbusinessdecisionwhethertoproduceClovenforPonyPicturesorinpartnershipwithK2Classics
involvestwoalternatives.Atsomepoint,Sethmustmakeadecision.Untilhedoes,therearetwopathsalong
whichhistorycanunfold.ClovenwilleitherbeownedbyPonyorcoownedbyS&CFilmsandK2Classics.A
differentbranchofthetreerepresentseachalternative.
Decisionsaren'ttheonlypointsatwhichalternativescanbranchoff.Eventswithuncertainoutcomesalsoleadto
branchingalternatives.Oncereleasedintheaters,Clovencouldbea"Blockbuster"hit,haveafair,but"Lackluster"
performance,or"Flop"completely.Eachoftheseoutcomescorrespondstoaseparatebranchonthedecisiontree.
Decisiontreeshavetwotypesofbranchingpoints,ornodes.Atdecisionnodes,thetreebranchesintothe
alternativesthedecisionmakercanchoose.Weusesquareboxestorepresentdecisionnodes.
Atchancenodes,thetreebranchesbecausetheuncertaintyofthebusinessworldpermitsmultiplepossible
outcomes.Werepresentchancenodesbycircles.
Wearrangethenodesfromlefttorightintheorderinwhichwewilleventuallydeterminetheirresults.InSeth's
example,thefirstthingtobedeterminedishisowndecisionaboutwhichdealtochoose.Thenextthingtobe
determinedisCloven'sboxofficeperformance.
Oneofthechallengesofcreatingadecisiontreeisdeterminingthecorrectsequenceofnodes:inwhatorderwillthe
relevantoutcomesbedetermined?whichevents"depend"ontheoccurrenceofotherevents?Whatalternativesare
createdorforeclosedbypriordecisionsorbytheoutcomesofuncertainevents?
Drawingadecisiontreeforcesustodelineateeachalternativeandclarifyourassumptionsaboutthosealternatives.
Thus,evenbeforeweuseatreetomakeadecision,simplystructuringthetreehelpsusclarifyandorganizeour
thoughtsaboutaproblem.Adecisiontreeisalsousefulinitsownrightasatoolforcommunicatingour
understandingofacomplexsituationtoothers.
Categorizingbranchingpointsasdecisionorchancenodesmakesclearwhicheventsandoutcomesareunderour
powerandwhicharebeyondourcontrol.
100/135

5/13/2016

QuantitativeMethodsOnlineCourse

Writingdownalternativesinasystematicfashionoftenallowsustothinkofideasfornewalternatives.Forinstance,
afterviewingthedecisiontree,Sethmightrealizethathewouldpreferadealthatpayspartofhisproductioncosts
butstillallowsasmallownershipstakeasopposedtoeitherofthedealshehashammeredoutwithPonyandK2.

Summary
Decisiontreesareagraphicaltoolmanagersusetoorganize,structure,andinformtheirdecisionmaking.
Decisiontreesbranchattwotypesofnodes:atdecisionnodes,thebranchesrepresentdifferentcoursesofaction
thedecisionmakercanchoose.Atchancenodes,thebranchesrepresentdifferentpossibleoutcomesofan
uncertaineventatthetimeoftheinitialdecision,thedecisionmakerdoesnotknowwhichoftheseoutcomeswill
occur(orhasoccurred).Thenodesarearrangedfromlefttoright,intheorderinwhichthedecisionmakerwill
determinewhichofthepossiblebranchesactuallyoccurs.

IncorporatingDataintoDecisionTrees
Layingoutadecisiontreemappingalogicalstructureandidentifyingthealternativescenariosisanessential
firststepinthedecisionmakingprocess.Buthowdowecomparedifferentscenarios?Onwhatcriteriamightwe
baseourpreferenceforonescenariooveranother?
Inthebusinessworld,successisoftenmeasuredinprofits.SethChaplinwillconsiderClovenasuccessifitbrings
inatleastamodestprofit.HowshouldSethincorporatepossibleprofitlevelsintohisdecisionanalysis?
Seth'sdecisiontreecontainsfourpossiblescenarios:fouruniquepathsfromhiscurrentdecisionnodeonthe
lefttotheultimateoutcomesontheright.HemustassigntoeachscenarioadollaramountS&C'sprofits.For
example,ifSethchoosestoproduceClovenforPonyPictures,hisexpectedprofitswillbe$1million.
WhatifSethchoosestheK2deal?Ifhehasasignificantownershipstakeinthemovie,hisprofitswilldependon
Cloven'sboxofficesuccess.Witha"Blockbuster"moviecome"Blockbuster"profits:Sethestimatesaround$6
million.
IfClovenperformanceis"Lackluster",SeththinkstheClovenprojectwouldbreakevenforS&Cfilms:S&C's
expectedprofitswouldberoughly$0.IfCloven"Flops,"S&C'sproductioncostswillexceeditsrevenuesfromthe
movie,leadingtoanexpectedlossof$2million.
Arethesethreetheonlypossibleoutcomes?Clearly,no.S&C'sprofitsonthereleaseofClovencouldbe$6.6
millionor$15.03.Infactthereisarangeofoutcomesstretchingfromprofitsinthehundredsofmillionsofdollars
thehistoricalupperlimitofmovieprofitstoalossof$9milliontheamountS&Cwillinvestinproduction.
Althoughitismathematicallypossibletoanalyzethefullrangeofpossibleoutcomes,theprocedurefordoingsois
usuallymorecomplicatedandtimeconsumingthaniswarrantedformanydecisionproblems.Inpractice,
consideringonlyafewrepresentativescenariosaswehavedoneintheClovenexamplewilltypicallyleadtogood
decisions.
Whenweuseascenariotorepresentarangeofpossibleoutcomes,theoutcomefigureweassigntothatscenario
shouldrepresenttheweightedaverageofalloutcomesintherangethatscenariorepresents.Forexample,inthe
Clovendecision,$0millionmightrepresenttheweightedaverageofallpossibleprofitsfrom$3millionto+$3
million.

DecisionTreesandProbabilities
HowcanSethusehisdecisiontreeandestimatedprofitvaluestochoosethebetterdeal?
IfSethchoosesthepartnershipwithK2,S&Ccouldmake$6million.Clearly,thatscenarioispreferabletothe
scenarioinwhichS&CsimplysellsCloventoPonyforaprofitof$1million.
IfS&CretainspartownershipofClovenandit"Flops"inthetheaters,S&Cstandstolose$2million.Thatscenario
isclearlyworsethanthescenarioinwhichS&CsellsCloventoPony.
IfClovenishighlylikelyto"Flop"intheaters,thenSethshouldchoosethePonydealifit'shighlylikelytobea
"Blockbuster"heshouldchoosetheK2deal.Clearly,thelikelihoodofthedifferentoutcomesintheK2dealshould
weighheavilyinSeth'sdecision.
Toeachbranchemanatingfromachancenodewemustassociateaprobability:theprobabilityofthatoutcome
occurring.Theseprobabilitiesmaybebasedonhistoricaldataoronourbestjudgmentofthelikelihoodofeach
outcome.
101/135

5/13/2016

QuantitativeMethodsOnlineCourse

Basedonthequalityofthescriptandhisyearsofexperienceinthemovieindustry,Sethestimatestheprobability
ofClovenbecominga"Blockbuster"at30%,theprobabilityofafair,but"Lackluster"performanceat50%,andthe
probabilityofa"Flop"at20%.
IfS&CsellsCloventoPony,thelevelofboxofficesuccessislargelyirrelevanttoSeth.S&C'sprofitsarecertainto
be$1million.
Whenassigningoutcomesandprobabilitiestochancenodes,wemustmeettworequirements.First,outcomes
thatbranchofffromthesamechancenodemustbe"mutuallyexclusive."Thus,forexample,twooutcomes
emanatingfromthesamenodecannotbothoccuratthesametime:theoccurrenceofoneoutcomesexcludesthe
occurrenceofanyother.
Second,thesetofoutcomesthatbranchofffromthesamechancenodemustbe"collectivelyexhaustive":the
branchesmustrepresentallpossibleoutcomes.
Inpractice,wetypicallydon'tdepicteverypossibleoutcomeseparately.ForCloven,we'vereducedthevastrange
ofpossiblefinancialoutcomestoamanageablesetofthreerepresentativeoutcomes.However,weconsiderthese
threeoutcomescollectivelyexhaustiveinthattheyrepresentthreerangesthattogetherexhaustallpossibilities.
Also,weusuallyconsiderextremelyunlikelybutnotimpossibleeventstobeincludedinoneoftheexisting
branches,orifsufficientlyunlikely,tobeirrelevanttodecisionmaking.IntheClovenexample,wedon'tincludea
branchrepresentingthesimultaneousdestructionofallexistingcopiesoftheClovenfilm.
Whenweconstructachancenode,wemustmakesurethattheoutcomesemanatingfromthatnodearemutually
exclusiveandcollectivelyexhaustive.Sinceitiscertainthatoneandonlyoneofthepossiblescenarioswilloccur,
theindividualprobabilitiesmustaddupto100%.

Summary
Foradecisiontreetoeffectivelyinformadecision,itmustincorporatetwotypesofrelevantdata:endpoint
valuescorrespondingtoeachscenarioandtheprobabilitiesofthepossibleoutcomesofeachuncertainevent.An
outcomevalueisassociatedwithascenarioauniquepathfromthefirstnodeontheleftofthetreetoan
endpointontheright.Weplacetheappropriateprobabilityoneachbranchemanatingfromachancenode.
Outcomesrepresentedbybranchesemanatingfromthesamechancenodemustbemutuallyexclusiveand
collectivelyexhaustive.

Exercise1:TheTardytechBlameGame
TedNesbitisaprojectmanageratTardytech,asoftwaredevelopmentcompany.Tedisplanningtheproject
scheduleandbudgetforthedevelopmentofanewpieceofcustombusinesssoftwareforaFortune500client.
Theclienthasimposedastrictdeadlineforcompletionoftheproject.However,basedonpastexperiencewith
similarclients,TedknowsthatthereisasignificantprobabilitythatTardytechwillnotbeabletomeetitsdeadline
dueinmostcasesduetotoclientsidedelays.
Experienceindicatesthat56%ofallTardytechprojectsarecompletedontime,42%aredelayedduetoclientside
delays,and2%aredelayedduetoerrorsandprocessfailuresatTardytech.
WhatistheprobabilitythatTardytech'snewprojectwillnotbecompletedontime?
ThetotalprobabilityofadelayisthesumoftheprobabilitiesofaclientsidedelayandanTardytechsidedelay:
44%.
ApotentialnewclienthasaskedTardytechtoreporttheprobabilitythataprojectwillnotbedelayeddueto
Tardytecherrorsorprocessfailures.WhatprobabilityshouldTedreport?
Only2%ofallprojectsaredelayedduetoproblemscausedbyTardytech.Theremainingprojectsareeither
completedontime(56%)ordelayedbyclientsidedelays(42%).Theprobabilitythataprojectwillnotbedelayed
byTardytechisthesumoftheprobabilitiesofthoseoutcomes,98%.
AnotherpotentialclienthasaskedTardytechtoreportthepercentageofalldelayedprojectswhosedelayscouldbe
attributedtoTardytech'serrorsandprocessfailures.WhatpercentageshouldTedreport?
Inthepast,44outof100projectswerenotcompletedontime.Onaverage,2projectsoutof44delayedprojects
weredelayedduetoTardytecherrorsandprocessfailures.Thus,theproportionofTardytechcauseddelaysamong
alldelaysis2/44,or4.5%.We'lldiscussandanalyzesimilarquestionsingreaterdepthinthenextunit.
102/135

5/13/2016

QuantitativeMethodsOnlineCourse

Exercise2:TheFirstBankofSilverhaven
AsaloanofficerattheFirstBankofSilverhaven,CarlaWudetermineswhetherornottograntloanstosmall
businessloanapplicants.
Of108recentsuccessfulsmallbusinessloanapplicantswithafullyearofpaymenthistory,28wereunabletomeet
atleastoneloanpaymentinthefirstyeartheyhadanoutstandingloan.
Whatistheprobabilitythatarandomlyselectedsmallbusinessinthispoolmissedapaymentinthefirstyearof
theloan?
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecialpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
Theprobabilitythatarandomlyselectedsmallbusinessmissedapaymentinthefirstyearoftheloanissimply
theratioofthenumberwhomissedatleastonepaymenttothetotalnumberofloans,i.e.,25.9%.

Exercise3:TheShippingBea
RobinBeaistheCEOofasmallshippingcompany.Sheneedstodecidewhetherornottoleaseanothertruckto
addtohercurrentfleet.
Ifsheleasesthetruck,shemaybeabletogeneratenewbusinessthatwouldincreasehercompany'sprofits.
However,ifadditionalbusinessfailstomaterialize,shemaynotbeabletocovertheincrementalleasingcosts.
Basedonherassessmentoffuturetrendsinthetransportationsector,Robinidentifiesthreescenariosthatmight
transpire:"Boom,""ModerateGrowth,"and"Slowdown,"dependingupontheperformanceoftheeconomyand
itsimpactonthetransportationsector.Sheassociateswitheachscenarioanestimateofthetotalprofitsan
expandedfleetwillgenerateforherfirm.
Ifshedoesn'tleasethenewtruck,Robinprobablywon'tgenerateasmuchrevenue,butherleasingcostswillbe
lower.Again,shebreaksdownthepossibleoutcomesifshedoesnotleasetheadditionaltruckintothreescenarios
correspondingtotheperformanceoftheeconomy.Thensheassociatesanestimateoftotalfirmprofitswitheach
scenario.
WhichofthefollowingtreesbestrepresentsRobin'sdecision?
ThefirstbranchingofthetreeoccursatthepointwhenRobinmakesherdecision:toleaseornottoleaseanew
truck.Asquaredecisionnodeshouldrepresentthisdecision.
EachofRobin'soptionssplitsintothreepossibleoutcomesdependingontheperformanceoftheeconomy.Robin
willlearnwhicheconomicscenariotranspiresonlyaftermakingherdecision.Thesecondbranchingrepresents
thisuncertainty.Thenodesofthesebranchesarechancenodesandshouldbedrawnascircles.

Exercise4:WheelofFortune
Youandafriendhaveawheeloffortunethathasa75%chanceofa"green"outcomeanda25%chanceofa"red"
outcome.Beforeyouspin,thefriendoffersyou$100ifyoucorrectlypredicttheoutcome.Ifyouchoosethewrong
color,youreceivenothing.Thegoodnews:youdon'thavetochooseuntilthespiniscomplete.Thebadnews:you
willhavetokeepyoureyescloseduntilthewheelhasstoppedandyouhavemadeyourprediction.
Youcloseyoureyesandkeepthemshutwhilethewheelspins.Whenthewheelstops,youreyesarestillclosed.
Younowmustdecidewhethertochoose"red"or"green.
Whichofthefollowingtreesbestrepresentsyourdecision?
Thenodesofadecisiontreearearrangedfromlefttorightintheorderinwhichwediscovertheirresults,notin
theorderinwhichtheeventsactuallyoccur.Thus,eventhoughthewheelstopsandtheresultisfinalizedbefore
youmakeyourdecision,becauseyoudon'tknowtheresultuntilafteryourdecisionismade,thechancenodefor
thespinresultshouldappearaftertotherightofthedecisionnode.

ComparingtheOutcomes
Alice'sdecisiontreesareneatdevicestohelporganizeandstructureadecisionproblem.Buthowaretheygoingto
103/135

5/13/2016

QuantitativeMethodsOnlineCourse

helpLeoevaluatehisoptionsandchoosethebestone?

IntroducingtheExpectedMonetaryValue
SethChaplinmustdecidehowtoproducethemovieCloven.Hehasmappedoutthelogicalstructureofhisdecision
inadecisiontreeandhasincorporatedtheappropriatedata,evaluatingeachscenariointermsofitsexpected
profitsanditsprobabilityofoccurring.Now,howshouldheusethetreetoinformhisdecision?
IfS&CFilmsworksinpartnershipwithK2andretainspartownershipofCloven,S&Ccouldearnaprofitof$6
million,ahighlydesirableoutcome.However,thatscenarioisrelativelyunlikely:30%.Howdowebalancethehigh
valueofthatoutcomeagainstitslowprobability?
Theansweriselegantandsimple:wemultiplytheoutcomevaluebyitsprobability:$6million*0.3=$1.8million.
Essentially,we"credit"the"Blockbuster"outcomewithonly30%ofitsvalue.
Similarly,wecredita"Lackluster"performanceof$0profitswithonlyhalfofitsvalue.
Themagnitudeofthelossincurredifthefilm"Flops"ismitigatedbyitslowlikelihoodof20%.
Thenweaddtheseweightedvaluestogether,foratotalof$1.4million.Thistotaliscalledtheexpectedmonetary
value(EMV)oftheK2option.TheEMVisaweightedaverageoftheexpectedoutcomesofthescenariosinwhich
S&CretainspartownershipofClovenasstipulatedintheK2deal.
Let'sreturntoourwheeloffortunegametobuildourintuitionfortheconceptoftheEMV.Thiswheelconsistsof
threeareas:"blue,""green,"and"red."Theblueareais30%ofthetotalarea.Ifaspinresultsin"blue,"youwin$6.
"Green"covers50%and"red"covers20%ofthewheel'sarea.Ifaspinresultsin"green,"yougainnothingatall,if
theresultis"red,"youlose$2.
Supposeyouplaythegame100times.Howmuchmoneydoyouthinkyouwillhavegainedorlostover100spins?
Whatdoyouestimatewillbeyouraverageyieldpergame?
About30%ofthetime,aspinwillresultin"blue."Inotherwords,youcanexpect30ofthe100spinstoyield$6.
About50%ofthetimeaspinwillresultin"green."Youexpect50ofthe100spinstoyield$0.Andabout20%ofthe
timeaspinwillresultin"red,"thatis,youexpect20ofthe100spinstocostyou$2.
Thetotalamountyoucanexpecttowinafterplayingthegame100timesis$140,andtheaverageyieldperspinis
$1.40.Thataverageistheexpectedmonetaryvalue(EMV)forasinglespin.
Theexpectedmonetaryvalueforasinglespinis$1.40.Ifyouspinthewheelonce,whichofthefollowingresultsis
leastlikelytooccurastheoutcome?
Theprobabilityofeachoutcomeisshownbelow.Withaprobabilityof0%,$1.40istheleastlikelyoutcome.
Itisimportanttounderstandthenatureoftheexpectedmonetaryvalue.Wedonotactually"expect"anoutcomeof
$1.40.Infact,$1.40isnotevenapossibleoutcome.TheEMVof$1.40isthelongtermaveragevalueofthe
outcomesofalargenumberspins.
IntheClovencase,theEMVof$1.4millionistheaverageamountofprofitsSethChaplincanexpecttomakewhen
heproducessimilarfilmsinsimilarcircumstances.
WecanusetheEMVasameasurewithwhichtocomparealternativeoptions.First,wecalculatetheEMVforeach
chancenode,beginningattherightofthetree.ForthechancenodeassociatedwiththeK2deal,theEMVis$1.4
million.
NowthatwehavecalculatedtheEMVoftheK2chancenode,wecan"collapse"thebranchesemanatingfromthe
chancenodetoasinglepoint.Goingforward,wecantreattheEMVof$1.4millionastheendpointvalueoftheK2
option.
TheEMVforthePonyOptionissimply$1.0million:theoutcomevaluemultipliedbyitsprobabilityof100%.
Atadecisionnode,wechoosethebestEMVofallthebranchesemanatingfromthatdecisionnode.InourCloven
example,thebestEMVistheonewiththehighestexpectedprofits.TheEMVoftheK2deal,$1.4million,exceeds
$1.0million,theEMVofthePonydeal.SelectingtheoptionwiththebestEMVandremovingallotheroptionsfrom
considerationisknownas"pruning"thetree.
104/135

5/13/2016

QuantitativeMethodsOnlineCourse

Anydecisiontreenomatterhowlargeorcomplexcanbeanalyzedusingtwosimpleprocedures.Ateachchance
node,calculatetheEMV,collapsethebranchestoapoint,andreplacethechancenodewithitsEMV.Ateach
decisionnode,comparetheEMVsandprunethebrancheswithlessfavorableEMVs.Thisentireprocessisknownas
foldingbackthedecisiontree.

Summary
Weoftenuseexpectedmonetaryvalue(EMV)toquantifythevalueofuncertainoutcomes.TheEMVisthesumof
thevaluesofthepossibleoutcomesofanuncertaineventaftereachhasbeenweightedbyitsprobabilityof
occurring.TheEMVcanbeinterpretedastheexpectedaverageoutcomevalueoftheuncertainevent,ifthat
uncertaineventwererepeatedalargenumberoftimes.Toanalyzeatree,we"folditback:"wemovefromright
(thefuture)toleft,findingtheEMVforeachnode.ForchancenodeswecalculatetheEMVasdescribedbelow.For
decisionnodeswesimplychoosetheoptionwiththebestEMVlowercostsorhigherprofitsamongthe
choicesrepresentedbyadecisionnode'sbranchesandprunetheothers.

RelevantCosts
BurningtousedecisiontreesandEMVs,youstarttodrawupthesquareandcircularnodesthatmakeupLeo'sChez
Tethysdecision.Alice,however,urgescaution:"Beforeyougettootriggerhappywiththosedecisiontrees,you
shouldbeawareofafewcommonpitfalls."
JenAmatohasbeendriving"Millie,"anoldjalopyofacar,forthepastfiveyears.MillieanOldsmobileDelta88
istwentyyearsold.Aweekago,theairconditionerbrokedown,andJenhaditreplacedatacostof$500.
Now,thedrivetrainiswornout.Replacingitwillcost$1,200.Ifshedoesn'trepairit,JencansellMillie"asis"for
about$300.Jenmustdecide:shouldshesellhercarorshouldshehavethedrivetrainreplaced?
IfJensellshercarnow,shewillnotevenrecoupher$500investmentintheairconditioner.Ifshehasthedrive
trainreplaced,herbelovedMilliewilllastalittlelonger,andJenwillbeabletoenjoythecoolairshespentsomuch
moneyon.HowshouldthefactthatJenhasn'thadtheopportunitytobenefitfromherairconditionerinvestment
affectherdecision?
Withabrokendrivetrain,Millieisworthabout$300.Accordingtohermechanic,ifJenpays$1,200toreplacethe
drivetrain,Milliewillbeworthapproximately$1,100intermsofresalevalue.IntheanalysisofJen'sdecision,what
roleshouldher$500investmentintheairconditionerplay?
InanyscenarioinwhichJensellshercar,the$500airconditionercosthasalreadybeenincurred,soitcouldbe
includedinthetotalcost.
Likewise,theairconditionerrepaircostsarepartoftheprehistoryofanyscenarioinwhichJenhasMillie'sdrive
trainreplaced.Here,too,the$500investmentintheairconditionercouldbeincludedasacost.
However,whenwecomparethetotalcostsofthetwooptions,werecognizethatthe$500doesnotcontributetoa
differenceintheoutcomevalues.Becausethe$500costisincurredinbothcases,itplaysnoroleinthe
comparisonofthetwooptions,andthusisirrelevanttoJen'sdecisionshewillhavespentthe$500nomatter
whatshedecidestodonow.
Coststhatwereincurredorcommittedtointhepast,beforeadecisionismade,contributetothetotalcostsofall
scenariosthatcouldpossiblyunfoldafterthedecisionismade.Assuchthesecostscalledsunkcostsshouldnot
haveanybearingonthedecision,becausewecannotdeviseascenarioinwhichtheyareavoided.
Itisn'twrongtoincludeasunkcostintheanalysisaslongasitisincludedinthevalueofeveryoutcome.However,
includingsunkcostsdistractsfromthedifferencesbetweenscenariostherelevantcosts.Imaginethecomplexity
ofJen'streeifsheincludedeverysunkcostshe'sincurredfromowningMilliefromtheoriginalpurchasepriceof
thecartothecostofallthegasolineshe'spumpedintoMillieovertheyears,toherexpenditureonadashboardhula
dancer.
Misinterpretingasunkcostasacostthatweighsononlysomeofthescenariosisacommonerror.Aftersinking$500
ofrepairsintohercar,sellingitfor$300willbequitepainfulforJen.Nonetheless,gooddecisionsaremadebased
onpossiblefutureoutcomes,notonthedesiretocorrectorjustifypastdecisionsormistakes.
Anothercommondecisionmakingerroristoomitfromtheanalysisrelevantcoststhatshouldhaveabearingon
thedecision.IfJenhasMillie'sdrivetrainreplaced,therepaircostsarejustoneofmanycostsinvolvedwiththat
option.GivenMillie'sage,thereisahighlikelihoodofanotherrepaircostsoon.Similarly,ifJensellsMillienow,she
willhavecostsassociatedwithbuyinganewcarorarrangingothertransportationservices.
105/135

5/13/2016

QuantitativeMethodsOnlineCourse

Opportunitycostsareanimportantcostcategorythatdecisionmakersoftenneglecttoincludeintheiranalyses.
Forexample,sellingthecarwillrequireJentodevote10hoursofhertimethatshewouldotherwisedevotetoher
parttimejobpaying$12perhour.Thus,Jenshouldadd$120inopportunitycoststotheoutcomesofthe"sell
Millie"decision.
Weshouldalsotakenonmonetarycostsintoaccount.Jenwillfeelsadatleavinghertrustedvehicleandcompanion
onmanyaroadtripbehind.Althoughsuchcostscanbedifficulttoquantify,theyshouldnotbeneglected.Wewill
seeshortlyhowtousesensitivityanalysistoincorporatenonmonetarycostsintoadecisionanalysis.

Summary
Amongthemostcommonerrorsindecisionanalysisisthefailuretoproperlyaccountforthecostsinvolvedin
differentpossiblescenarios.Ontheonehand,relevantcostssuchasopportunitycostsornonmonetary
consequencesareoftenomitted.Ontheotherhand,irrelevantcostssuchassunkcostsareincorrectlyincludedin
theanalysis.Sunkcostsarecoststhatwereincurredorcommittedtopriortomakingthedecisionandcannotbe
recoveredatthetimethedecisionisbeingmade.Sincethesecostsfactorintoanypossiblefutureoutcome,they
canbesafelyomittedfromtheanalysissunkcostsmustneverbeincludedinonlyselectedbranchesofadecision
tree.

TimeHorizons
Jenhasdecidedtoreplaceheroldcar,MillietheOldsmobile.Herfriend,Sven,isleavingthecountryfortwoyears
anddoesn'twanttopaytostorehisMazdaMiata.HeoffersJentwooptions.
Inthefirstoption,Jenleasesthecar,payingSven$700eachyear.WhenSvenreturnsfromabroad,hereclaims
hiscar.Inthesecondoption,Jenbuysthecaroutrightfor$4,000.HowshouldJencomparethesetwooptions?
WhatistheappropriatecostdifferenceJenshouldusetocomparethetwooptionsSvenhasoffered?
Tobegin,notethatinthefirstoption,thecostsarespreadacrosstwoyears,whereasinthesecondoption,Jen
makesasinglepaymentwhensheclosesthedealwithSven.Tocomparetheoptionsmeaningfully,wemustdefine
atimehorizonforthisproblem,thatis,wemustcomparetheircostsoveracommontimeperiod.Inthiscase,
twoyearsisaconvenienttimehorizon.
Next,notethatwecannotdirectlycompare$1,400,thesimplesumofthetwoleasepaymentsmadeattwo
differenttimes,tothe$4,000onetimecostofthepurchaseoptionpaidentirelyatthebeginningofthefirstyear.
Wecanonlycomparecostswhentheyarevaluedatthesamepointintime.
We'llcomparethepresentvalueofthecostsassociatedwitheachoptionthatis,thevalueofthecostsatthe
timeJenmakesherdecision.Inordertocomparethecosts,wefirstneedtoconvertthesecondinstallmentofthe
leasingpaymentintoitspresentvalue.
Jencurrentlyhassufficientcashforeitheralternativeinaninvestmentaccountwitha5%rateofreturn.Whatis
thepresentvalueofthesecondinstallmentof$700paidundertheleasingoption?
Thepresentvalueofthesecondinstallmentof$700is$666.67.Attheendofoneyear,$666.67residinginher
investmentaccounttodaywillhaveincreasedinvalueto$666.67*(1.05)=$700,theamountshemustpayforthe
secondleaseinstallment.
Thepresentvalueofthetotalcostoftheleasingoptionis$1,366.67.Canweusethisnumbertocomparethetwo
options?
Although$1,366.67and$4,000arenowcomparablecostsbecausetheyaregivenattheirpresentvalue,wehave
notyetconsideredthefactthatunderthepurchaseoption,JenwillowntheMiataattheendofthetwoyear
period.Intwoyears,JenexpectsthattheMiata'smarketvaluewillbearound$3,000.Thevalueofanassetatthe
endofthetimehorizonistypicallycalleditsterminalvalue.
What'sthenetpresentvalueofthecostofthepurchaseoptionthatis,thepresentvalueoffuturecashflowsafter
subtracting,or"nettingout,"theinitialpaymenttoSven?RecallthatJencurrentlyhashermoneyinvestedin
accountsthatearna5%averageannualreturn.
IfJenresellsthecarintwoyears,shewouldreceive$3,000.Thepresentvalueof$3,000receivedattheendof
twoyearsis$2,721.09.
Thenetpresentvalueofthecostofthepurchaseoptionis$1,278.91.
106/135

5/13/2016

QuantitativeMethodsOnlineCourse

IfcostistheonlydecidingfactorinJen'schoice,she'dsave$87.76ifshechosetopurchasetheMiata.Before
finalizingherdecision,Jenshouldmakesuretoweighotherconsiderationsrelevanttothedecision,suchas
whetherornotshewillneedtoborrowmoneytocoverotherexpensesthisyearifshespends$4,000tobuythe
Miatanow,whattheborrowingrateis,etc.

Summary
Whenwemakeadecision,weneedtochooseatimehorizonoverwhichwequantifytheoutcomesofour
decision.Tocomparemonetaryvaluesthattakeplaceatdifferenttimes,wemustaccountforthetimevalueof
moneybycomparingcashflowsattheirvaluesatacommonpointintime.Generally,wecomparethepresent
valuesornetpresentvaluesofdifferentoutcomesbydiscountingfuturecashflowsattheappropriate
discountrate.Terminalvaluesofassetsheldattheendofthetimehorizonmustbedeterminedanddiscounted
totheirpresentvalue.

SolvingtheChezTethysProblem
"Nowlet'sfindLeo,andgettoworkonhisdecisionproblem,"Alicesays,snappingherlaptopshut.
Icalculatedthatifthefloatingrestaurantideareallytakesoff,I'dmakeover$2millioninprofitsoverthenextfive
years.Andthat'safterIsubtractmyinitialinvestment,includingthepurchaseoftheship.
HowcertainareyouthattheTethyswillbethesuccessyouenvision?
Gosh,Idon'tknow.It'salmostlikeaflipofacointomechancesareaboutfifty/fiftythattheTethyswillbeabig
hit.
Thatsoundsalittletoooptimistic.Let'stakeacloselookatyourbusinessplan.
Thethreeofyouspendtherestofthedayinenergeticdiscussionandresearch.Finally,youagreeontwo
representativescenariosthatmightoccurifLeolaunchestheChezTethys:"Phenomenon,"or"Fad."
IftheTethysbecomesacultural"Phenomenon"withstayingpower,Leocanexpect$2millioninprofitsoverfive
years,intermsofnetpresentvalue.Leogrudginglyagreesthatthelikelihoodofthisscenarioisonly35%.
Alternatively,diningChezTethysmightbecomeapassing"Fad"foracoupleofyears,thenbereplacedby"TheNext
BigThing."Inthatcase,Leowouldfacesubstantiallosses,estimatedtohaveapresentvalueofabout$800,000.Leo
grantsthathisbrainchildhasa65%chanceofjustbeingafad.
WhatistheEMVoflaunchingtheChezTethys?
EntertheEMVin$millionsasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter
''$5,500,000''as''5.500'').Roundifnecessary.
TheexpectedmonetaryvalueoflaunchingtheTethysistheoutcomevalueofthe"Phenomenon"scenarioweighted
byitsprobabilityaddedtotheoutcomevalueofthe"Fad"scenarioweightedbyitsprobability,i.e.,$180,000.
Hmm.Icanseehowthisanalysisishelpful.I'mgladthatmyexpectedprofitsarepositive,butsomehowI'mnot
satisfied.Idon'tfeelverycomfortablewithsomeofourestimates.

Exercise1:TheShippingBeaFliesAgain
RobinBeaistheCEOofasmallshippingcompany.Sheneedstodecidewhetherornottoleaseanothertruckto
addtohercurrentfleet.
Robinidentifiesthreestatesofthetransportationsectorthatmightoccur:"Boom,""ModerateGrowth,"and
"Slowdown,".Herfirmsprofitsdependonwhethersheleasesanadditionaltruckandonthestateofthesector:
Sheassociatesanestimateoftotalfirmprofitswitheachofthesixscenarios.
WhatistheEMVofthedecisiontoleasethetruck?
EntertheEMVin$thousandsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500''as''5.5'').Roundifnecessary.
TheEMVofthedecisiontoleasethetruckis$31,000.Weighteachofthescenario'soutcomesbytheprobabilities
thattheywilloccur,thenaddtheweightedvalues.
107/135

5/13/2016

QuantitativeMethodsOnlineCourse

WhatistheEMVofthedecisiontonotleasethetruck?
EntertheEMVin$thousandsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500''as''5.5'').Roundifnecessary.
TheEMVoftheoptiontonotleasethetruckis$18,200.Weighteachofthescenario'soutcomesbythe
probabilitiesthattheywilloccur,thenaddtheweightedvalues.

Exercise2:TheSHAMHoftheCentury
JariLipponenoftheSilverhavenHomeforAbandonedMiniatureHorses(SHAMH)needsfundstomaintain
operations.Hecaneitherapplyforagovernmentgrantorrunalocalfundraiser,butthedemandsonhistimeare
toohighforhimtobeabletodoboth.
Jaribelieveshehasa90%chancethathewillwinagrantof$25,000ifhesubmitsthegrantapplication.Grants
inthiscategoryareforafixedamount,soifhelosesthegrant,he'llhavenomoneytoruntheSHAMH.
Basedonhispastfundraisingexperience,heestimatesthatifherunsalocalfundraiser,hehasa30%chanceof
raising$30,000and70%chanceofraising$20,000.
WhatistheEMVoflaunchingafundraiser?
EntertheEMVin$thousandsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500''as''5.5'').Roundifnecessary.
TofindtheEMVoflaunchingthefundraiser,weightthe$30,000outcomevaluebyitsprobabilityof30%andadd
thattothe$20,000outcomevalueweightedbyitsprobabilityof70%.
WhatistheEMVofapplyingforthegrant?
EntertheEMVin$thousandsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500''as''5.5'').Roundifnecessary.
TofindtheEMVofapplyingforthegrant,weightthegrantawardamountof$25,000bytheprobabilityof
winningit(90%)andaddthattothevalueoflosingthereward($0)weightedbytheprobabilityoflosing(10%).
TheEMVofthefundraiseroptionis$23,000,higherthantheEMVofapplyingforthegrant.Basedonthia
analysis,Jarishouldorganizeafundraiser.

Exercise3:TheGaiacorpsUpgrade
MarshaRatulangiisthechiefoperatingofficeratGaiacorps,anonprofitorganizationdedicatedtopreserving
naturalhabitatsaroundtheworld.Gaiacorps'IThardwareisaging,andMarshamustdecidewhethertoextend
thecurrentleaseonGaiacorps'ITdesktopcomputersorpurchasenewones.
WhichofthefollowingcostsisnotrelevanttoMarsha'sdecision?
Thecostoftheleaseextensionandthefuturemaintenancecostsofthecurrenthardwareareincurredonlyinthe
scenarioinwhichGaiacorpsextendsitscurrentlease.Thepurchasepriceofthenewhardwareisincurredonlyin
thescenarioinwhichGaiacorpspurchasesnewhardware.Whichofthethreecostsareincurreddependsonwhich
optionMarshachooses,soallthreearehighlyrelevanttoherdecision.
WhichofthefollowingcostsisrelevanttoMarsha'sdecision?
Thecostofthememorycardupgradeandthemaintenancecostspreviouslyinvestedinthecurrenthardwareare
sunkcoststhathavealreadybeenincurred.
Marsha'ssalaryisnotasunkcost,butisincurredinallpossiblescenarios.NeitheroftheoptionsMarshais
consideringwillaffecttheamountofthesethreecosts,thusallthreecostsareirrelevanttothedecision.
Incontrast,thecostofdisposingofthenewhardwareisrelevant,sinceitisincurredonlyinthecasethatMarsha
buysnewdesktopcomputers.

Exercise4:MoppinguptheEmpire
Underpressurefromhiscompany'sadhocadvisoryboardtoloweroperatingexpenses,theCEOofEmpire
108/135

5/13/2016

QuantitativeMethodsOnlineCourse

Learning,BillHartborne,isconsideringcancelingthebiweeklycleaningserviceforthecompanyoffices.Each
cleaningengagementcosts$75.
Insteadofhiringacleaningservice,Billcouldsimplycleantheofficehimselfwheneveraclientvisitstheoffice.
Theprobabilityofexactlyoneclientvisitingtheofficeinagivenmonthis25%,andtheprobabilitythattwoclients
willvisitis10%.Theprobabilitythatnoclientswillvisitis65%.
Billdrawsupthestructureofhisdecisioninthetreedepictedbelow.Givenaonemonthtimehorizon,whatis
yourbestestimateoftheEMVofthecostofnothiringacleaningservice?
Insteadofcleaningtheoffice,Billcouldbecreatingvalueforhiscompanybymakingsales,networking,boosting
employeemorale,orsimplyincreasinghisproductivitybynappingontheofficesofa.Weneedtocalculatethe
opportunitycostofthetimeBillwouldspendcleaningtheoffice,soweneedtoknowhowlonghespends
cleaning,andhowhighlyhistimeisvalued.
AssumingthatitalwaystakesBill2fullhourstocleanEmpireLearning'soffices,andthatBill'stimeisvaluedat
$200/hour,whatistheEMVofBillcleaningtheoffice?
EntertheEMVindollarsasaninteger(e.g.,enter''$5.00''as''5'').Roundifnecessary.
TheEMVofBillcleaningtheofficeis$180.Basedonthisanalysis,Billshouldcontinueemployingthecleaning
service.

Exercise5:TheErisShoeCompany
ValPurcell,CEOofasupplychainmanagementconsultingfirmPurcell&Co.,mustdecidewhetherornottoput
inabidforacontracttoreengineerthesupplychainofapotentialnewclient:theErisShoeCompany.
Creatingthebidwillcost$16,000inVal'stimeandlegalfees.Ifhisbidbeatsoutthecompetition,Valexpectsthe
contracttoreturnprofitsof$100,000fromwhichthecostofpreparingthebidhasnotyetbeensubtracted.Val
believeshehasa20%chanceofwinningthebid.
EriswillpaytheconsultingfeeandValwillaccruehisestimated$100,000profitsuponcompletionoftheproject,
whichisscheduledforoneyearfromnow.
Undertheseterms,whatistheexpectedmonetaryvalueofcreatingandsubmittingthebid?
TheanswercannotbedeterminedwithoutknowingPurcell'sdiscountrate.IfPurcellwinsthebid,itwon'treceive
itsconsultingfeeuntilcompletionoftheprojectoneyearfromnow.Cashflowsrelatedtothecontractshouldbe
discountedatPurcell'sdiscountrate.Forsimplicityassumethatthecost/profitfiguresrepresentcashoutflows
andinflows,andthatPurcell'sdiscountrateis15%.
WhatistheEMVfortheoptionofsubmittingthisbid?
EntertheEMVin$thousandsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
''$5,500''as''5.50'').Roundifnecessary.
TofindtheEMVofcreatingthebid,firstcalculatethepresentvalueofthecashinflowfromPurcell'sprofitsonthe
contract,tobereceivedayearfromnowuponcompletionoftheproject.
Todeterminethenetpresentvalueofwinningthebid,wesubtractthe$16,000bidpreparationcosts.
Then,usetheprobabilityofwinningthebidtoweighttheEMVsofthe"Win"and"Don'twin"scenariosto
determinetheEMVofplacingthebid.
BeforeValstartsworkonthebid,Erisdecidestomovetheproject'scompletiondatetotwoyearsafterthecontract
issigned.Againassumethatthecost/profitfiguresrepresentcashoutflowsandinflows,andthatPurcell's
discountrateis15%.WhatistheEMVforsubmittingthebidnow?
EntertheEMVindollarsasaninteger(e.g.,enter''$5.00''as''5'').Roundifnecessary.
TofindthenewEMVofputtinginthebid,firstcalculatethepresentvalueofthecashinflowfromPurcell'sprofits
onthecontract,tobereceivedtwoyearsfromnowuponcompletionoftheproject.
Todeterminethenetpresentvalueofwinningthebid,wesubtractthe$16,000bidpreparationcosts.
Then,usetheprobabilityofwinningthebidtoweighttheEMVsofthe"Win"and"Don'twin"scenariosand
109/135

5/13/2016

QuantitativeMethodsOnlineCourse

determinetheEMVofplacingthebid.TheEMVofputtinginthebidis$877.IfVal'spaymentsaredelayedby
anotheryear,hecannotexpecttheprojecttobeprofitable,soheshouldnotsubmitabid.Delayingtheprofits
changesVal'soptimaldecision!

Exercise6:TheCrumblingEmpire
CapWinestoneofUniversalLearningispreparingabidforanewbuildingtoaccommodateitsexpanding
operations.Asluckhasit,theheadquartersofformercompetitorEmpireLearningisforsalethroughasealedbid
auction.ThebuildingiswellsuitedtoUniversal'sneeds,containingcomputingequipment,networkinfrastructure
andotherimportantelearningaccessories.
Capestimatesthebuildingisworthabout$900,000tohim,andistryingtodecidewhatbidtoplace.Tosimplify
hisdecision,henarrowsdownhisbidchoicestofourpossiblebids.Hemuses:"WithlowerbidsIgainmore
value.IfIbid$600,000,I'llgetabuildingworth$900,000tome,soIgain$300,000invalue.Intermsofthe
valueIgain,thelowerthebid,thebetter.
"Ontheotherhand,"Capcontinues,"WithalowbidI'mnotlikelytowin.Fromthepointofviewofwinningthe
bid,thehigherthebidthebetter.HowdoIbalancethesetwoopposingfactors?
Caplaysoutadecisiontreewiththepossiblebidamounts,thelikelihoodofwinningforeachbid,andtheoutcome
values.WhatamountshouldCapbid?
Foldingbackthetree,wefindthe$800,000bidgivesthehighestexpectedmonetaryvalue.the$800,000bidbest
balancesthevaluegainedagainsttheprobabilityofwinning.

SensitivityAnalysis
StillinLeo'sofficeafteryouinitiallycalculatedtheexpectedmonetaryvalueoflaunchingtheChezTethys,youlistento
Leo'sgrowingconcernsabouttheestimatesthatinformhisdecision.

LeotheSkeptic:TheUncertainEstimatesProblem
Nowthatyou'vemappedoutthepossibledownsideanditsprobabilityforme,I'malittlediscouraged.Sure,the
analysisindicatesthatventuresliketheTethyswillbeprofitableonaverage,butthefactthatIhavealmostatwo
thirdschanceofan$800,000lossscaresme.
Whatifthepotentiallossisevengreater?Or,whatifit'sevenmorelikelythattheChezTethyswilljustbeapassing
"Fad?"
Bothpointsarewelltaken,Leo.I'lltellyouwhat:let'sbreakforlunch,andthendelvealittledeeperintoouranalysis.

ADecision'sSensitivitytoOutcomeEstimates
Overlunch,AlicecommentsonLeo'sreactiontoyouranalysis."Leo'squestionsbringustoacrucialcomponentof
decisionanalysis:sensitivityanalysis."
SethChaplinhasallbutdecidedhowtoproducethefilmCloven.Basedonhisinitialanalysis,heisinclinedto
produceCloveninpartnershipwithK2Classics,therebyretainingpartownershipofthefilm.TheEMVoftheK2
optionis$1.4million.TheEMVofthealternativeoptioninwhichSeth'scompanyproducesClovenforPony
Picturesandrelinquishesownershipinthefilmis$1.0million.
ButSeth'scalculationswerebasedonestimates.Theprobabilitiesandtheoutcomevaluesheusedinhisanalysis
wereeducatedguesses.Nomatterhowdetailedandrigorousthemethodology,adecisionanalysisisonlyasgoodas
thedataonwhichitisbased.WhatifSethisn'tcompletelycertainaboutthesedata?
Sethisparticularlyunsureaboutthe$6millionvalueheusedtorepresenttheprofitsassociatedwitha
"Blockbuster"film.WhatwouldtheEMVoftheK2optionbeif$4millionwereamorerepresentativevaluefor
"Blockbuster"profits?
EntertheEMVin$millionsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500,000''as''5.5'').Roundifnecessary.
If$4millionistheexpectedvalueofS&C'sprofitsforthe"Blockbuster"scenario,thentheEMVofa"Blockbuster"
dropsto$800,000,lessthanthe$1millionEMVofthePonyoption.Notethatifthe"Blockbuster"profitfiguredrops
110/135

5/13/2016

QuantitativeMethodsOnlineCourse

to$4million,Seth'soptimaldecisionchanges:heshouldnowchoosethePonyoption.Sethhaslearnedthathis
optimaldecisionissensitivetothefigureheusestorepresentS&C'sprofitsifClovenattains"Blockbuster"status.
IfSeth'soptimaldecisionswitcheswhenthe"Blockbuster"profitfiguredropsto$4million,hemightreasonably
wonderwhathisdecisionwouldbeforothervalues.Whatabout$5million?$4.5million?Howlowwouldthe
"Blockbuster"profitfigurehavetobetomakethePonyoptionpreferabletotheK2optionintermsoftheEMV?
Atwhat"Blockbuster"profitfiguredoesSeth'soptimaldecisionswitch?
EntertheEMVin$millionsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
''$5,500,000''as''5.5'').Roundifnecessary.
Toanswerthatquestion,wefirstwritetheEMVoftheK2optionwithavariable,B,torepresent"Blockbuster"
profitsinmillionsofdollars.
Now,wecomparetheEMVoftheK2optiontotheEMVofthePonyoption.TheK2EMVisgreaterthanthePony
EMVwhenthe"Blockbuster"profitfigureisgreaterthan$4.67million.
Wecall$4.67millionthebreakevenvaluefor"Blockbuster"profits:aboveit,theEMVcriterionrecommendsthe
K2optionbelowit,theEMVcriterionrecommendsthePonyoption.
Sethmaynotbesureifthe"Blockbuster"profitsarebestrepresentedby$6millionor$5millionor$4.8million,
butaslongasheisconfidentthatthefigureisgreaterthan$4.67million,heneednotloseanysleepoverfindinga
moreaccuratevalue.Knowingthathisestimatesforthe"Blockbuster"profitfigurearefirmlyononesideofthe
breakevenvalueallowshimtostopworryingabouttheprecisevalueofthatnumberinthedecisionanalysis,since
knowingthatnumberwithgreaterprecisionwillnotchangehisdecision.
Ontheotherhand,ifSeththinksthe"Blockbuster"profitfigurecouldbebelow$4.67million,hemaywishtoinvest
additionalresourcestofindamoreaccuratefiguretorepresent"Blockbuster"profits.
IfSeththinksthetruenumberisaroundthebreakevenvalueof$4.67million,butisn'tcertainifit'sslightlyabove
orslightlybelowthatfigure,hecanalsostopworryingabouttheprecisevalueofthenumber.Aslongasthefigureis
around$4.67million,SethshouldbeindifferentbetweentheK2andPonydeals,sincetheEMVsofthetwooptions
areaboutthesame.
Infact,agoodwaytocheckthatwehavecalculatedabreakevenvaluecorrectlyistosubstitutethevalueintothe
EMVcalculation:iftheoptionshavethesameEMV,thebreakevenvalueiscorrect.
Oncewecalculateabreakevenvalue,weknowwhetherornotexpendingadditionaltimeandotherresourcestofind
amoreaccurateestimateisworthwhile.Thebreakevenvalueestablishesacomfortzone:aslongasweareconfident
thatthevalueweareestimatingiswithinthezone,wecanfeelcomfortablechoosingtheoptionrecommendedby
ourinitialanalysis,basedonouroriginalestimate.
Ifwethinkthevalueweareestimatingcouldbeclosetothebreakevenvalue,weneedtobemorecautious.Ifthe
truevaluewearetryingtoestimatecouldlieoutsideofthecomfortzone,wemightwanttotrytomakeourestimate
ofthatvaluemoreaccuratebeforewereachafinaldecision.
Howconfidentwearethatthetruevalueweareestimatingliesinsidethecomfortzonegivenbythebreakeven
analysisisamatterofjudgmentandexperience.Sometimes,wemightcollectsampledatatoestimateanoutcome
value.Inthiscase,weshouldlookcloselyatthevariationinthedatatoseehowwidelyandinwhatwaythedatacan
vary.
Calculatingabreakevenvaluefordatausedinadecisionanalysisiscalledsensitivityanalysis:foreachestimated
valueintheanalysis,wechecktoseebyhowmuchitwouldhavetochangetoaffectourdecision,assumingour
estimatesforalltheotherdataarecorrect.
Sensitivityanalysisisanimportantandpowerfultoolformanagementdecisionmaking.Managerswhobase
decisionsonaninitialanalysiswithoutperformingsensitivityanalysisoncriticaldatarisklullingthemselvesintoa
falsesenseofsecurityintheirdecisions.

Summary
Aftercompletinganinitialdecisionanalysis,alwaysconductasensitivityanalysisforeachoutcomevalueestimate
youareuncomfortablewith.First,calculatetheoutcomevalue'sbreakevenvalue:thevalueforwhichtheEMVof
theoptioninitiallyrecommendedbythedecisionanalysisceasestobethebestEMV.Thebreakevenvaluedefines
acomfortzone:Ifwebelievethattheactualoutcomevaluemightbeoutsidethatzonetherebychangingthe
optimaldecisionweshouldreconsiderouranalysisandrefineourestimateoftheoutcomevalueinquestion.
111/135

5/13/2016

QuantitativeMethodsOnlineCourse

EvaluatingNonMonetaryConsequences
DuringhisnegotiationswithK2andPony,Sethrealizedthathedidnotparticularlylookforwardtoworkingwith
theK2team.Basedonpastexperience,heknowsthatinterpersonalfrictionscanbehighlyfrustratingandcan
makeacollaborationunpleasant.Thisfrustrationisclearlyacostalbeitanonmonetarycostassociatedwith
theK2option.
Sensitivityanalysiscangiveusa"realitycheck"onhowhighlywevaluenonmonetaryconsequencessuchas
frustration,reputationcostsandbenefits,andsentimentalvalues.Althoughitmaybedifficulttoassignavalueto
suchconsequences,wecanoftenanswerquestionsaboutthemostwe'dbewillingtopaytoavoid(ortoobtain)
thembycalculatingathresholdvalue.
ClearlySethwouldn'twanttospoilanymagicalHollywooddaysjusttomakeanadditional$5inprofits.Butthe
moretheK2optionpaysrelativetothealternatives,themorewillingSethmightbetosufferworkingwiththeK2
team.HowcanSethdeterminewhetherheshouldaccepttheK2dealandbeartheresultinginterpersonaltrials
andtribulations?
Sensitivityanalysiscanhelpusanalyzenonmonetaryconsequencessuchasfrustration.Let'suse"F"torepresent
thefrustrationcost(in$millions)SethwillincurifhehastoworkwiththeK2team.Sincefrustrationwilloccurin
anyscenarioinvolvingtheK2option,$FmillionmustbesubtractedfromalloutcomesassociatedwiththeK2
option.
HowlargemustthecostoffrustrationbeforSeth'soptimaldecisiontoswitchtothePonyoption?
EnterF,thecostoffrustration,in$millionsasdecimalnumberwithtwodigitstotherightofthedecimalpoint
(e.g.,enter''$5,500,000''as''5.50'').Roundifnecessary.
Adding$FmilliontotheoutcomeschangestheEMVoftheK2optionto$1.4million$Fmillion.ForSeth's
decisiontoswitchfromtheK2dealtothePonydeal,theEMVoftheK2option,$1.4million$Fmillion,must
dropbelow$1.0million.Inotherwords,Fmustsatisfytheinequalitybelow.
InordertolowertheEMVoftheK2optionbelowtheEMVofthePonyoption,thecostoffrustrationwouldhaveto
bevaluedatleastat$400,000.Sethneedstoaskhimselfifhewouldpay$400,000toavoidthefrustrationof
workingwithK2.Sensitivityanalysisprovidesaclear,monetaryupperboundagainstwhichhecanmeasurethe
strengthofhisfeelings.
Intheend,SethdecidesthathecanlearntolovetheK2teamfor$400,000.

Summary
Toincorporatenonmonetaryconsequencesintoadecision,firstfindtheoptionwiththebestEMV.Then,add
thenonmonetaryconsequencetotheoutcomevaluesofallscenariosaffectedbythatnonmonetary
consequence,andcalculatethebreakevenvalueforwhichtheoptionrecommendedbytheinitialdecision
analysisceasestohavethebestEMV.Thebreakevenvaluedefinesacomfortzone:Ifwebelievethattheactual
valueofthenonmonetaryconsequencesmightbeoutsidethatzonetherebychangingtheoptimaldecision
weshouldtrytogainafirmestimateofthenonmonetaryconsequence'svalue.

ADecision'sSensitivitytoProbabilityEstimates
SethissomewhatunsureabouthisestimatesfortheprobabilitiesofhowsuccessfulClovenwillbe.Heisquite
surethattheprobabilityofthefilmfloppingisaround20%,buthe'slesssureabouttheprobabilitiesof
"Lackluster"and"Blockbuster"performancelevels.HewantstoknowhowsensitivehisdecisiontochoosetheK2
dealistothevaluesoftheseprobabilities.
Let'scalltheprobabilityofa"Blockbuster""p."Sethisconfidentthattheprobabilityofa"Flop"is20%.Whatis
theprobabilityofa"Lackluster"outcome?
Sincethethreeoutcomesaremutuallyexclusiveandcollectivelyexhaustive,theirprobabilitiesmustaddto100%.
Sethisconfidentthattheprobabilityofa"Flop"is20%,soheknowsthattheprobabilitiesoftheremainingtwo
outcomes("Blockbuster"and"Lackluster")mustaddto80%.Thus,theprobabilityofa"Lackluster"performance
is0.8p.
Usingptodenotetheprobabilityofa"Blockbuster"outcomeand0.8ptodenotetheprobabilityofa"Lackluster"
outcome,whatistheEMVoftheK2option?
112/135

5/13/2016

QuantitativeMethodsOnlineCourse

TheEMVoftheK2optionisp*$6million$0.4million.Forwhatvaluesofp,theprobabilityofCloven
becominga"Blockbuster"film,isthePonyoptionpreferredtotheK2optiononthebasisofEMV?
Thebreakevenvaluefortheprobabilityofa"Blockbuster"hitis23.3%:whentheprobabilityisabove23.3%,the
EMVoftheK2optionishigherwhentheprobabilityisbelow23.3%,theEMVofthePonyoptionishigher.If
Sethisconfidentthattheprobabilityofa"Blockbuster"isatleast23.3%,hecanfeelcomfortablechoosingtheK2
option.Heneednotexpendadditionalefforttryingtofurtherrefinehisestimatesfortheprobabilitiesofdifferent
levelsofsuccess.
Decisionmakingisaniterative,multistepprocess.Whenanalyzingadecision,weshouldfirstconstructand
analyzeadecisiontreebasedonourbestestimatesoftheoutcomesandprobabilitiesinvolved.Afterreachinga
tentativedecision,itiscriticaltoscrutinizethedatausedinadecisionanalysisandconductsensitivityanalyses
foreachestimatethatwefeelunsureabout.
Aslongaswearecomfortablethatthetruevalueweareestimatingiswithintherangespecifiedbythebreakeven
calculation,wecanconfidentlyproceedwithourdecision.Ifnot,weshouldfocusoureffortsonrefiningour
estimatesforthosevaluestowhichourdecisionsaremostsensitive.
Finally,weshouldnotethatmanagerssometimesneedtotestthesensitivityoftheirdecisionstotwoormore
estimatessimultaneously.Sensitivityanalysistechniquescanbeextendedtoaddressthesesituationsthese
techniquesarebeyondthescopeofthiscourse.

Summary
Aftercompletinganinitialdecisionanalysis,alwaysconductasensitivityanalysisforeachprobabilityvalueyou
areuncomfortablewith.Firstcalculatetheprobability'sbreakevenvalue:theprobabilityforwhichtheEMVof
theoptioninitiallyrecommendedbythedecisionanalysisceasestobethebestEMV.Thebreakevenvalue
definesacomfortzone:Ifwebelievethattheactualprobabilitymightbeoutsidethatzonetherebychanging
theoptimaldecisionweshouldreconsiderouranalysisandrefineourestimateoftheprobabilityinquestion.

SolvingtheUncertainEstimatesProblem
Sensitivityanalysishelpsmanagerscopewiththeuncertaintythatsurroundstheestimatestheybasedecisionson.
Withyournewknowledge,you'rereadytoturntoLeo'sproblem.
YourinitialanalysisrecommendsthatLeolaunchhisfloatingrestaurant,butLeohasexpressedsomedoubtabout
theestimateofthelosseshe'dincuriftheTethysturnsouttobeapassing"Fad."
Howhighdothelosseshavetobeinthe"Fad"scenariotomakelaunchingtheTethysilladvisedintermsofEMV?
LaunchingtheTethyswouldbelessattractivethannotlaunchingitiftheEMVoflaunchingislessthan$0,theEMV
ofnotlaunching.Solvingtheinequalitybelow,wefindthatthelossesintheeventthattheTethysisjusta"Fad"
wouldhavetoexceed$1.08millionforLeotoabandontheprojectbasedontheEMVcriterion.
Assumethat,infact,Leo'sestimatethathewillincuran$800,000lossifChezTethysturnsouttobea"Fad"is
correct.Forwhatprobabilityofa"Fad"doeslaunchingtheTethysceasetobeaworthwhileventure,intermsofthe
EMV?
LaunchingtheTethysislessattractivethannotlaunchingitiftheEMVoflaunchingislessthan$0,theEMVofnot
launching.Solvingtheinequalitybelow,wefindthattheprobabilityofa"Fad"wouldhavetobehigherthan71%for
LeotoprefertoabandontheprojectbasedontheEMVcriterion.
Hmm,I'mfairlyconfidentthatourestimateofthepossiblelossisontargetcertainlyitwon'tbehigherthana
milliondollars!ButthemoreIthinkaboutit,themoreIwishwehadabetterestimatefortheprobabilitythatthe
TethyswillbethesuccessI'vealwaysdreamedof.
IhavesomeideasforwaystogetabetterhandleonthelikelihoodoftheTethys'success.ButIneedtomakesome
phonecallstoseeifI'montherighttrack.Let'sbreakfortodayandmeettomorrowmorning.

Exercise1:SensitiveasaTruck
RobinBeaistheCEOofasmallshippingcompany.Sheneedstodecidewhetherornottoleaseanothertruckto
addtohercurrentfleet.
Basedonherassessmentoffuturetrendsinthetransportationsector,Robinidentifiedthreescenariosthatmight
113/135

5/13/2016

QuantitativeMethodsOnlineCourse

occur:"Boom,""ModerateGrowth,"and"Slowdown,"correspondingtotheperformanceoftheeconomyandits
impactonthetransportationsector.Sheassociatedestimatesoftotalfirmprofitswitheachscenario,and
calculatedtheEMVsof"leasing"and"notleasing"anewtruck:$31,000and$18,200,respectively.
Robinisafraidthatshemayhavebeenalittleoveroptimisticinherestimateof$70,000oftotalprofitsinthe
scenarioinwhichshe"leasesanewtruck"andthetransportationsector"booms."Shewantstoknowhowsensitive
herdecisionistotheoutcomevalueofthisscenario.
WhatmustthevalueoftotalprofitsofthescenarioinwhichRobin"leasesthetruck"andthetransportationsector
"booms"beinorderforthe"don'tleasetruck"optiontobethemoreattractiveone?
Forsufficientlylowprofitsgeneratedbythenewtruckinthe"Boom"scenario,theEMVofleasingthetruckis
lowerthantheEMVofnotleasingthetruck.TofindthebreakevenvalueX,wesolvetheinequalitybetweenthe
twoEMVsshownbelow.ThisinequalityissolvedwhenX,thevalueoftotalprofitsintheleasetruckand"Boom"
scenario,is$27,333.

Exercise2:SneakersofDiscord
ValPurcell,CEOofthesupplychainmanagementconsultingfirmPurcell&Co.,mustdecidewhetherornottoput
inabidforacontracttoreengineertheprocurementprocessforapotentialnewclient:theErisShoeCompany.
Creatingthebidwillcost$16,000inVal'stimeandlegalfees.Ifhewinsthebid,hewillreceivepaymentforthe
projectuponcompletion,twoyearsaftersigningthecontract.Hehasestimatedthepresentvalueoftheprofitsif
hewinsthebidat$75,614.Afterfactoringinthe$16,000costofsubmittingthebid,thenetpresentvalueofthe
profitsifhewinsthebidis$59,614.
Valbelieveshehasa20%chanceofwinningthebid,sotheEMVforsubmittingthebidis$877:underthese
circumstances,Valcan'texpectsubmittingabidtobeprofitable.ButValhasn'tbeenabletofigureouthowstiff
thecompetitionforthecontractis,andhe'sfairlyuncertainabouttheprobabilityofwinning.
HowhighwouldtheprobabilityofwinningthebidhavetobetomakeValchangehismindandchoosetoputina
bid?
TochangeVal'sdecision,theEMVofbiddingwillhavetobehigherthantheEMVofnotbidding:$0.Tofindout
howhighp,theprobabilityofwinningthebidwouldhavetobe,wesolvetheinequalitybelow.Notethatthe
probabilityoflosingthebid(1p)mustchangeastheprobabilityofwinningchanges.Theinequalityissolved
whentheprobabilityofwinningthebidisgreaterthan21.2%.

Exercise3:TheSHAMHContinues
JariLipponenoftheSilverhavenHomeforAbandonedMiniatureHorses(SHAMH)needsfundstomaintain
operations.Hecaneitherapplyforagovernmentgrantorrunalocalfundraiser,butthedemandsonhistimeare
toohighforhimtobeabletodoboth.
Jaribelieveshehasa90%chancethathewillwinagrantof$25,000ifhesubmitsthegrantapplication.He
estimatesthatalocalfundraiser,hehasa30%chanceofraising$30,000and70%chanceofraising$20,000.The
EMVofthefundraiseroptionis$23,000,higherthan$22,500,theEMVofapplyingforthegrant.Basedonthe
initialanalysis,Jarishouldorganizeafundraiser.
Jarihasexpresseduncertaintyabouthisestimatefortheprobabilitiesofthetwolevelsoffundraisingsuccess.
Howlowwouldtheprobabilityoftheraising$30,000havetobetochangehisdecision?(Enteryouranswerasa
decimalnumberwithtwodigitstotherightofthedecimalpoint)
ForJaritochangehisinitialdecision,theEMVofthefundraisingoptionwouldhavetobelessthan$22,500.That
wouldbethecaseiftheprobabilityofthehighfundraisingsuccessvaluedat$30,000werelessthan25%.
Jarineeds$20,000tooperatetheSHAMHthroughthenextyear.Raising$25,000ormorewouldpermitJarito
operatetheSHAMHandexpandthecapacityoftheoperation,therebyallowingevenmoreabandonedminiature
horsestobesaved.ThisyearwillbeJari'slastrunningtheSHAMH,andhereallywantstoexpanditscapacity
beforeheleaves.
Ifhe'sunabletoraiseatleast$25,000tocovertheSHAMHexpansion,he'llbeverydisappointed.Howhighly
wouldJarihavetovaluehisdisappointmentinorderforhimtopreferthegovernmentgrantoptionthatismore
likelytosecurefundsforexpansion?Assumethathisoriginalprobabilityassessmentsforthesuccessofthe
fundraiserarecorrect.
114/135

5/13/2016

QuantitativeMethodsOnlineCourse

IfJariisonlymoderatelysuccessfulinhisfundraisingactivities,hewon'tbeabletoexpandtheSHAMHonhis
watch.ThedisappointmentcostDshouldbesubtractedfromtheoutcomevalueforthescenarioinwhichhetakes
inonly$20,000incontributions.Afterincorporatingthedisappointmentcost,theEMVofthefundraiseroption
becomes$23,0000.7D.
However,ifheappliesforthegrantanddoesn'twinit,hewon'tbeabletoexpand,either.Thedisappointment
costDmustalsobesubtractedfromtheoutcomevalueinthescenarioinwhichhewritesthegrantbutdoesn't
winit.Afterincorporatingthedisappointmentcost,theEMVofthegrantwritingoptionbecomes$22,5000.1D.
ThebreakevenvalueisthevalueofdisappointmentforwhichtheEMVsofthetwooptionsareequal.Thus,tofind
thebreakevenvalueofD,wemustsetupaninequalitybetweenthetwoEMVsandsolveforD.Inthiscase,the
disappointmentcostwouldhavetobegreaterthan$833.33inorderforJaritopreferthegrantwritingoption.

DecisionAnalysisII
ConditionalProbabilities
AfteraKahanabreakfastofcrabbenedict,youmeetoncemorewithLeo.

DiningChezTethys:TheMarketResearchProblem
So,I'vebeenthinking:mydecisionisheavilydependentonthelikelihoodthattheChezTethyswillbeasuccess.
Couldn'twedosomemarketresearchtofindouthowinterestedourtargetmarketwouldbe?ThenIcouldmakea
betterdecisiononethatwouldbelessrisky!
Sure,Leo.Butmarketresearchcostsmoney.Howmuchwouldyoubewillingtopayforinformationthatwouldhelp
youbetterpredictthesuccessoftheChezTethys?
Hmmm.Goodquestion...frankly,Idon'thaveacluehowtoevenstartthinkingaboutthat.Canyoutwohelpme?
"WhatevermarketresearchLeohasinmind,"remarksAlice,"itwon'trevealwithcertaintyhowconsumerswilltake
totheTethys.Inotherwords,we'reuncertainabouttheTethys'chanceofsuccess,andwe'llbeuncertainthatthe
marketresearchwecollectisaccurate."

JointandMarginalProbabilities
Ponderingtwolayersofuncertaintyyoubegintofeelvertigo...
Whenanalyzingadecision,wetypicallyuseestimatesfortheprobabilitiesandfinancialimplicationsofvarious
outcomesthatcouldoccur.Often,wecanimagineadditionalinformationscientifictests,marketresearchdata,or
professionalexpertisethatwouldhelpmakeourestimatesmoreaccurate.Howmuchshouldwebewillingtopay
forthistypeofinformation?Andhowdoweincorporatenewinformationintoourdecisionanalysis?
Beforeweanswerthesequestions,we'llneedtoexpandourunderstandingofprobabilityandintroducethe
importantconceptsofconditionalprobabilityandstatisticalindependence.Let'slookatanexamplefirst.
BritishautomakerChariot'smostsoughtaftermodelistheBenHur.ConsumerslovetheBennie,asit's
affectionatelycalled.Itwasofferedasalimitededition:todate,only1,000BenniesareontheroadinBritain.We'll
takeacloserlookattwopropertiesoftheBennie:its"Color"andits"Stereo."
TheBenniecomesintwocoloroptions:BurgundyandChampagne.Also,theBennieisofferedwithahighendcar
stereosystembySweetone.Let'slookatatableofthe1,000Benniescurrentlyontheroadandseehowthetwo
properties"Color"and"Stereo"aredistributed.
Ofthe1,000Bennies,150areBurgundyandhaveaSweetonestereo.600areBurgundyanddonothavethe
Sweetonestereo.Furthermore,thereare50ChampagneBennieswiththestereoand200without.
Let'saddanothercolumntoourtableandfillinthetotalnumbersofBurgundyandChampagneBennies.To
calculatethesenumbers,wesimplytakethesumsoftherows.Forexample,thenumberofBurgundyBenniesis
simplythenumberofBurgundyBennieswiththeSweetonestereoaddedtothenumberofBurgundyBennies
without.
Next,inthefinalrow,we'llfillinthetotalnumbersofBennieswithandwithouttheSweetonestereo.Here,we
simplytakethesumsofthetwocolumns.Inthebottomrightcell,weenterthetotalnumberofcars:1,000.
115/135

5/13/2016

QuantitativeMethodsOnlineCourse

Thisnumber1,000shouldbeequaltothesumofthenumbersinthefinalrow:thetotalnumberofBennies
withtheSweetonestereoaddedtothetotalnumberwithoutone.Also,itshouldbeequaltothesumofthenumbers
inthefinalcolumn:thenumberofBurgundyBenniesplusthenumberofChampagneBennies.Finally,since1,000
isthetotalnumberofallBennies,itshouldbethesumofalltheoriginalnumbersweenteredintothetable.
AVenndiagramisausefulgraphicalwaytorepresentthecontentsofthetable.TheBurgundyrectangleontop
representsthesetofallBurgundyBennies.TheChampagnerectanglebelowrepresentsthesetofallChampagne
Bennies.TherectanglesdonotintersectbecauseaBenniecannotbebothChampagneandBurgundy.
WenowaddapatternedrectangleonthelefttorepresentthesetofBennieswiththeSweetonestereo.Thearea
withoutthepatternrepresentsthesetofBenniesthatarenotequippedwiththeSweetone.Theseareasintersectthe
rectanglesthatrepresentthedistributionof"color":someBurgundyBenniesareSweetoneequippedsomearenot.
SomeChampagneBenniesareSweetoneequippedsomearenot.
ThesizesoftheareasintheVenndiagramaredirectlyproportionaltotheincidenceofthedifferentBennie
properties:Burgundy/ChampagneandSweetone/noSweetone.WeuseVenndiagramstoeffectivelycommunicate
informationaboutsetsofthingsforexample,BurgundycarsandSweetoneequippedcarsandtheir
interactions.
Thetableisausefultoolforcalculatingproportionsofpotentialinteresttomanagers.Forinstance,wecanfindthe
proportionofBurgundyBennieswiththeSweetonestereosimplybylocatingthecellthatcontainstheirnumberand
dividingitbythetotalnumberofcars:15%.
Or,tofindtheproportionofBurgundyBenniesingeneral,wefindthecellthatcontainsthetotalnumberof
BurgundyBenniesanddivideitbythetotalnumberofcars:75%.
Inthiswaywecancreateanentiretableofproportions.Theseproportionscanbeinterpretedasprobabilities.For
example,since15%oftheBenniesontheroadareBurgundycoloredandSweetoneequipped,theprobabilitythata
randomlyselectedBenniewillbeBurgundycoloredandhavetheSweetoneis15%.
TheprobabilitythatarandomlyselectedBenniewillbeBurgundyis75%.Goingforward,intalkingaboutthetable,
we'llusethewords"proportion"and"probability"interchangeably.
Theprobabilitiesontheinsideofthetablearecalledjointprobabilities:theprobabilitiesofasinglecarhaving
twoparticularBenniefeatures,forexample,BurgundycoloredandSweetoneequipped.We'lldenotejoint
probabilitiesinthefollowingway:P(Burgundy&Sweetone)istheprobabilitythataBennieisBurgundycolored
andSweetoneequipped.
Theprobabilitiesofeach"Color"or"Stereo"optionoccurringinagivenBennieBurgundyorChampagne,
SweetoneequippedornotSweetoneequippedareoftencalledmarginalprobabilities.Theyaredenoted
simplyasP(Burgundy),P(Champagne),P(Sweetone),andP(noSweetone).
Informationaboutthedistributionofpropertiesinpopulationsisoftenavailableintermsofprobabilities,sothe
tableofprobabilitiesisaverynaturalwaytorepresenttheBenniedata.

Summary
FortwoeventsAandBwithoutcomesA1,A2,etc.andB1,B2,etc.,respectively,thejointprobabilityP(A1&B1)
istheprobabilitythattheuncertaineventAhasoutcomeA1andtheuncertaineventBhasoutcomeB1.Thejoint
probabilitiesofallpossibleoutcomesoftwouncertaineventscanbesummarizedinaprobabilitytable.The
marginalprobabilityoftheoutcomeA1ofthefirstuncertaineventisthesumofthejointprobabilitiesof
outcomesA1andallpossibleoutcomesB1,B2,etc.oftheseconduncertainevent.

ConditionalProbabilities
AutomakerChariot'slimitededitionBenHurmodelcomesintwopossiblecolorsChampagneorBurgundy
andwithorwithoutahighendSweetonestereo.Thetablebelowshowsthedistributionoftheseproperties
"Color"and"Stereo"inthepopulationof1,000BenniesthatChariothassoldtodate.
RestrictingourfocustotheBurgundyBennies,weaskthefollowingquestion:amongthesetofBurgundyBennies,
whatistheproportionofBurgundyBennieswithaSweetonestereo?Stateddifferently:whatistheprobability
thatarandomlyselectedBenniehasaSweetonegiventhatweknowitisBurgundy?
Toanswerthequestion,wefindtheratioofSweetoneequippedBurgundyBenniesamongthesetofall
BurgundyBennies.Thatis,wedividethenumberofBenniesthatarebothBurgundyandhaveaSweetone
116/135

5/13/2016

QuantitativeMethodsOnlineCourse

150bythetotalnumberofBenniesthatareBurgundy750.Theprobabilityis20%.
Thisprobabilityiscalledaconditionalprobability:theprobabilitythataBennieisSweetoneequippedgiven
thatitisBurgundy.We'lldenotethisprobabilityP(Sweetone|Burgundy),andreadtheverticallineas"given."
WecancalculateP(Sweetone|Champagne)as:
P(Sweetone|Champagne)isthenumberofBenniesthatareChampagneandSweetoneequippeddividedbythe
totalnumberofChampagneBennies.
WhatisP(NoSweetone|Burgundy)?
Enterthepercentageasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.50").Roundifnecessary.
UsingthetableofactualnumbersofcarswecancalculateP(NoSweetone|Burgundy)andafourthconditional
probability,P(NoSweetone|Champagne).
Earlier,weusedthetableofactualnumbersofcarstocalculateatableofprobabilities.Whengiveninformationas
atableofprobabilities,wecanusetheprobabilitiestocalculateconditionalprobabilitiesaswell:wesimplyform
theratiosoftheappropriateprobabilities.
Forexample,theprobabilitythataBennieisSweetoneequippedgiventhatitisBurgundyistheprobabilityofa
Sweetoneequipped,BurgundyBenniedividedbytheprobabilityofaBurgundyBennie.
Infact,aconditionalprobabilityisformallydefinedintermsoftheratioofajointprobabilitytoamarginal
probability.
IfP(Sweetone|Burgundy)representstheprobabilitythataBennieisSweetoneequippedgiventhatitis
Burgundy,whatdoesP(Burgundy|Sweetone)represent?
P(Burgundy|Sweetone)representstheprobabilitythataBennieisBurgundygiventhatitisSweetone
equipped.NotethatP(Sweetone|Burgundy)andP(Burgundy|Sweetone)arenotthesame.
Wecancalculatetheotherconditionalprobabilities,thistimeconditioningtheproperty"Color"ontheproperty
"Stereo."Itisusefultowritetheconditionalprobailitiesnexttothetableofprobabilitiestofacilitatecalculation:
sinceP(Burgundy|Sweetone)andP(Champagne|Sweetone)requireonlytheprobabilitiesintheSweetone
column,wewritethemdirectlybelowtheSweetonecolumn.
Similarly,sinceP(Burgundy|NoSweetone)andP(Champagne|NoSweetone)requireonlytheprobabilitiesin
the"NoSweetone"column,wewritethemdirectlybelowthe"NoSweetone"column.Notethatthenewrows
mimictherowsintheoriginaltable:BurgundyontopandChampagnebelowit.
Similarly,wewritetheconditionalprobabilitiesfortheproperty"Stereo"conditionedontheproperty"Color"to
therightofthetable.Again,wemimicthecolumnsintheoriginaltable:acolumnfor"Sweetone"andonefor"No
Sweetone."Wenowhaveafulltableofjoint,marginal,andconditionalprobabilities.
Itisimportanttodoublecheckwhicheventweareconditioningon,andtodoa"realitycheck"tomakesureour
calculationsarerealistic.Forexample,theprobabilitythatarandomlychosenIndiancitizenistheprimeminister
isnearlyzero,buttheprobabilitythattheprimeministerofIndiaisanIndiancitizenis100%!
Thetableofjointprobabilitiesinformsourunderstandingofthelikelihoodofdifferentpropertiesoreventsina
crucialway:aswe'veseenintheBennieexample,oncewehavethejointprobabilities,wecancomputeany
conditionalprobabilityandanymarginalprobability.
Thus,whenpresentedwithadecisionprobleminwhichtheoutcomesareinfluencedbymultipleuncertain
events,constructingthejointprobabilitiesoftheseeventsisalmostalwaysawisefirststep.

Summary
TheconditionalprobabilityP(A|B)istheprobabilityoftheoutcomeAofoneuncertainevent,giventhatthe
outcomeBofaseconduncertaineventhasalreadyoccurred.Thetableofjointprobabilitiesprovidesallthe
informationneededtocomputeallconditionalprobabilities.Firstcalculatethemarginalprobabilitiesforeach
event,thencomputetheconditionalprobabilitiesasshownbelow:

StatisticalIndependence
117/135

5/13/2016

QuantitativeMethodsOnlineCourse

Let'sreturnoncemoretoourChariotBenHurexample.TheBenniecomesintwopossiblecolorsChampagne
orBurgundyandwithorwithoutahighendstereobySweetone.Theprobabilitytablebelowshowsthe
distributionoftheseproperties"Color"and"Stereo"inthepopulationof1,000BenniesthatChariothassoldto
date.
NotethattheproportionofSweetoneequippedBenniesintheBurgundypopulationisthesameastheproportion
ofSweetoneequippedBenniesintheoverallpopulation:20%.Inthelanguageofconditionalprobabilities,thisis
thesameassayingthatP(Sweetone|Burgundy)=P(Sweetone).
Inotherwords,ifwerandomlyselectaBennie,thendiscoveringthatitisBurgundygivesusnoadditional
informationaboutwhetherornotitisequippedwithaSweetonestereo,beyondwhatwehadbeforeweknewits
color:westillthinkthereisa20%chanceithasaSweetone.
Similarly,theproportionofBurgundyBenniesintheSweetoneequippedpopulationisthesameasthe
proportionofBurgundyBenniesintheoverallpopulation:P(Burgundy|Sweetone)=P(Burgundy)=75%.
ABennie'scolortellsusnothingaboutitsstereosystem.Itsstereosystemtellsusnothingaboutitscolor.When
thisistrue,wesaythatthattheBennieproperties"Color"and"Stereo"arestatisticallyindependent,or,more
simply,independent.
Ingeneral,wecaninterpretthefactthattwouncertaineventsareindependentinthefollowingway:knowingthat
oneeventhasoccurredgivesusnoadditionalinformationaboutwhetherornottheothereventhas.Forexample,
theresultsoftwospinsofawheeloffortuneareindependent.Thefirstresultdoesnotrevealanythingaboutthe
second.
WecanconfirmtheindependenceoftheBennie'sstereoandcolorbylookingatourVenndiagramandnotingthat
SweetoneequippedBenniesoccupythesamepercentageinthepopulationofburgundyBennies20%asthey
dointheentireBenniepopulation.
Thus,tofindthejointprobabilitythataBenniehasaSweetonestereoandisburgundy,wetake20%ofthe75%of
Benniesthatareburgundy,givingusajointprobabilityof15%.Thispropertyistrueforanytwostatistically
independentproperties:thejointprobabilitiesaresimplytheproductsofthemarginalprobabilities.
Althoughitmayseemplausibletoassumethatcertainpropertiesareindependent,managerswhotakestatistical
independenceforgranteddosoattheirperil.Weneedtoverifytheassumptionthatthepropertiesare
independentbylookingatandevaluatingdataorbyprovingindependenceonthetheoreticallevel.

StatisticalDependence
Whenaretwooutcomesstatisticallydependent?Let'slookatanotheroptionalfeatureoftheBennie:aunique
factoryinstalledtheftdiscouragementsystem(TDS).
UsingChariot'sdata,wecancreatethefollowingtableofthedistributionoftheproperties"Stereo"and
"Protection."Beforegoingforward,practiceyourskillsbycalculatingthejoint,marginal,andconditional
probabilitiesfortheseproperties.Placethemintheusualformatinthecompleteprobabilitytable.
Thecompletedtableisshownbelow.Whichofthefollowingistrue?
Fromthetable,wecanseethatP(TDS|Sweetone)isnotequaltoP(TDS).Wecaninferthattheproperties
"Stereo"and"Protection"arenotstatisticallyindependentonceyouknowthatarandomlyselectedBennie
hasaSweetone,youknowthechancesofithavingaTDSsystemaregreaterthantheywouldbeifyoudidn't
havethatinformation.
OnceagainourVenndiagramprovidesvisualconfirmation,inthiscasethatthepropertiesarenot
independent.Intheoverallpopulation,theproportionofBenniesthatareTDSprotectedisonly35%.However,
inthepopulationofSweetoneequippedBennies,theproportionissignificantlyhigher:75%.Whymightthat
be?
BenniebuyerswhooptfortheSweetonestereofeelthattheircarswillbeespeciallytargetedfortheftor
vandalism,andaremorelikelytochoosetheTDSoptionthanarebuyerswhochoosethelowendcarstereo.
AsavvycarthiefknowsthatthenextBenniehepasseshasa35%probabilityofbeingprotectedbyatheft
discouragementsystem.
OnceheidentifiesthataparticularBenniehasaSweetonestereo,hegainsmoreinformation:thenheknows
thatthecarhasa75%chanceofbeingTDSprotected.Thismayaffecthisdecisionaboutwhetherornottobreak
intothatcar.
118/135

5/13/2016

QuantitativeMethodsOnlineCourse

Summary
TwouncertaineventsAandBaresaidtobestatisticallyindependentifknowingthatAhasoccurreddoesnot
tellusanythingabouttheprobabilityofBoccurring,andviceversa.Statisticalindependenceoftwoeventscan
bedemonstratedbasedondataorprovedfromtheoryitshouldneverbeassumed.Eventsthatarenot
statisticallyindependentaresaidtobestatisticallydependent.

ConditionalProbabilitiesinDecisionAnalysis
Probabilitytheoryisfascinating,sure,butwhatdoconditionalprobabilitieshavetodowithdecisionanalysis?
Toseehowwemightapplyconditionalprobabilitiesinadecisionanalysis,let'srevisittheClovenfilmproduction
example.
SethChaplinofS&CFilmsisecstatic.InasushilunchmeetingwiththeagentofsuperstaractorShawnConnelly,
theagentagreedtotrytoconvinceConnellytobethevoiceofthemaincharacterinSeth'snewfilm,Cloven.
UnderSeth'scharmingyetirresistiblepressure,theagenttoldSeththatjustbetweenthetwoofthemhe
thoughthehada40%chanceofpersuadingConnellytotaketherole.Connelly'sfameissuchthatbylendingthe
prestigeofhisnametoCloven,thelikelihoodofCloven'ssuccessincreasesdramatically.
Sethhasbookedtheservicesofacrackanimationteam,sohe'llhavetostartproductionsoonprobablybefore
Connellymakesadecision.Butthatshouldn'thinderproductionbecauseConnellycanaddhisvoiceoverswellafter
theanimationhasbeencreated.
Sethhasnotyetsignedeitherofthetwodealshenegotiated,andfeelsheshouldreconsiderhisoptionsinlightof
Connelly'spossibleparticipation.WithadecentshotatstarpowerbehindCloven,Sethfeelshecancloseabetter
dealwithPony.Connelly'svoicewouldalsosubstantiallyincreasethelikelihoodofblockbustersuccess,makingthe
dealwithK2potentiallymorelucrative.
SethreturnstoPonyandhammersoutasecondagreement,contigentonConnelly'sparticipation.Ponywillpayan
additional$5million$15millionintotalifSethcanretainConnelly'svoiceactingservices.Evenafter
accountingforConnelly'ssalary,Sethexpectstomakeatotalof$2.2millioninprofitsfromthePonydealifConnelly
agreestotaketherole.
HowshouldallofthisnewinformationaffectSeth'sdecision?Let'stakealookatthenewdecisiontree.
IfSethchoosesthePonydeal,therearenowtwopossiblescenarios:inthefirst,Connellyagreestolendhisvoiceto
Cloven,inthesecond,Connellydeclines.Thesetwoscenariosareassociatedwithtwodifferentprofitoutcomes:$2.2
millionand$1million,respectively.
BasedonConnelly'sagent'sassessment,thereisa40%chancethatConnellywilltaketherole,anda60%chancethat
hewon't.
WhatabouttheK2option?Whichofthefollowingdecisiontreescorrectlyreflectsthenewlyintroduced
circumstances?
Thenodesofadecisiontreearearrangedfromlefttorightintheorderinwhichthedecisionmakerwilleventually
knowtheirresults.Inthiscase,SethwillfirstdiscoverwhetherornotConnellytakestheroleonlythenwillhesee
howClovenperformsattheboxoffice.
Whataretheprobabilitiesofthedifferentbranches?AsinthePonyoption,thefirstbranchingintheK2optionisa
splitbetweenthescenariosinwhichConnellysignsontotheClovenprojectandthoseinwhichhedoesn't.These
branchesareassociatedwithprobabilitiesof40%and60%,respectively.
Whataretheprobabilitiesofthesixfinalbranches?Let'slookcloselyatthetopthreebranches.Onceweknowthat
Connellyhassignedon,weneedtoknowwhattherespectiveprobabilitiesareofa"Blockbuster,"a"Lackluster"
performance,anda"Flop."Inotherwords,weneedtheconditionalprobabilitiesofthethreeoutcomes,giventhat
ConnellytakespartinCloven.
Inanydecisiontree,to"beatanode"istoassumethatalltheeventsonthepathleadingtothatnodefromtheleft
havealreadytakenplace.Thus,thedataweassociatewithanydecisionorchancenodedependdirectlyonthe
sequenceofeventsontheuniquepathleadinguptothatnodefromtheleft.
Specifically,theprobabilitiesafterachancenodemustbeconditionedonalleventsontheprecedingpath,andthe
outcomevaluesmustincorporatetheeffectsofalloftheprecedingevents.
119/135

5/13/2016

QuantitativeMethodsOnlineCourse

ReturningtoSeth'sdecision:ifConnellytakespartinCloven,Sethestimatesat50%eachtheconditionalprobability
thatClovenwillbea"Blockbuster"andtheconditionalprobabilitythatitwillhavea"Lackluster"theaterrun.Atthis
stageinConnelly'scareer,a"Flop"isvirtuallyimpossible.IfConnellydoesn'ttakepart,theconditionalprobabilities
aresimplytheoriginalprobabilitiesof30%,50%,and20%forthethreepossibleoutcomes.
Usingtheseconditionalprobabilities,wecandeterminewhichofthetwooptionsSethshouldchoosebycalculating
theEMVsofallthenodesandfoldingbackthedecisiontree.
Let'sbeginwiththePonyoption.WhatisitsEMV?EntertheEMVin$millionsasadecimalnumberwithtwodigits
torightofthedecimalpoint(e.g.,enter"$5,500,000"as"5.50").Roundifnecessary.
TheEMVofthePonyoptionis$1.48million.
NowwefindtheEMVoftheK2option,constructingitstepbystep.First,wemustfindtheEMVfortheeventthat
ConnellydecidestovoicethemaincharacterinCloven.ThatEMVis$3million.
Next,wefindtheEMVfortheeventinwhichConnellyrefusesthepart.ThatEMVissimplytheEMVoftheoriginal
K2option:$1.4million.
ThetotalEMVfortheK2optionis$2.04million:theEMVwithConnelly'sparticipationweightedbytheprobability
ofhisparticipation,plustheEMVofClovenfilmedwithoutConnelly,weightedbytheprobabilitythatherefusesthe
part.
ThepossibilitythatShawnConnellymighttakepartgreatlyenhancestheattractivenessoftheK2option.

Summary
Theeventualoutcomesofmanydecisionsinvolvesequentialuncertaineventswhoseoutcomesaredetermined
overtime.Toconductadecisionanalysisinsuchcases,weneedtheprobabilitiesofthefirstuncertainevent's
possibleoutcomesandconditionalprobabilitiesoffutureuncertainevents'possibleoutcomes,conditionedonthe
previousevents'outcomes.

Exercise1.:CaptainAhabFisheries
CaptainAhabFisheriescansherringandsardinesineitherspicytomatosauceorvegetableoil.Thetabletothe
rightsummarizesthedistributionofAhab'sproductline.Whatisthemarginalprobabilitythatarandomly
selectedcanoffishiscannedintomatosauce?
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
Themarginalprobabilityoftomatosauceis23%.Similarly,themarginalprobabilityofvegetableoilis77%.
Whatisthemarginalprobabilitythatarandomlyselectedcanoffishcontainssardines?

Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
Themarginalprobabilityofsardinesis40%.Similarly,themarginalprobabilityofherringis60%.
Whatistheconditionalprobabilitythatarandomlyselectedcanoffishiscannedintomatosauce,giventhatitisa
canofsardines?
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
Theconditionalprobabilityoftomatosaucegiventhatacancontainssardinesis35%.
Whatistheconditionalprobabilitythatarandomlyselectedcanoffishiscannedintomatosauce,giventhatitisa
canofherring?
Entertheprobabilityasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.50").Roundifnecessary.

120/135

5/13/2016

QuantitativeMethodsOnlineCourse

Theconditionalprobabilityoftomatosaucegiventhatacancontainsherringis15%.
Whichislarger,theconditionalprobabilitythatarandomlyselectedcanoffishcontainssardines,giventhatitis
cannedinvegetableoil,ortheconditionalprobabilitythatarandomlyselectedcanoffishiscannedinvegetable
oil,giventhatitcontainssardines?
Theconditionalprobabilityofacancontainingsardinesgiventhatitcontainsvegetableoilis34%.Theconditional
probabilityofthefishbeingcannedinvegetableoilgiventhatthecancontainssardinesis65%.Notethat
P(Sardines|Oil)isnotthesameasP(Oil|Sardines).

Exercise2:MutuallyExclusiveandCollectivelyExhaustive
Thetabletotherightshowsthreepossibleoutcomesofanuncertainevent.Thefactthatthesumofthethree
probabilitiesislessthan100%tellsus:
Wecannotinferanythingaboutwhetherornottheseeventsaremutuallyexclusivefromtheinformation
provided.TheeventsA,B,andCatleftaremutuallyexclusive.TheeventsD,E,andFatrightarenotmutually
exclusive.
Thetabletotherightshowsthreepossibleoutcomesofanuncertainevent.Thefactthatthesumofthethree
probabilitiesislessthan100%tellsus:
Forasetofeventstobecollectivelyexhaustive,theprobabilitiesoftheeventsmustsumtoatleast100%.
Thetabletotherightshowsthreepossibleoutcomesofanuncertainevent.Thefactthatthesumofthethree
probabilitiesisgreaterthan100%tellsus:
Theprobabilitiesoftheeventsinamutuallyexclusivesetcannotadduptomorethan100%.Iftheprobabilitiesof
asetofeventsadduptomorethan100%,thenatleasttwoofthemarenotmutuallyexclusive,i.e.,theycanoccur
simultaneously.
Thetabletotherightshowsthreepossibleoutcomesofanuncertainevent.Thefactthatthesumofthethree
probabilitiesisgreaterthan100%tellsus:
Wecannotinferanythingaboutwhetherornottheseeventsarecollectivelyexclusivefromtheinformation
provided.Belowareexamplesofeventswiththeseprobabilitiesofwhichonesetiscollectivelyexhaustiveandone
isn't.

TheValueofInformation
Marketresearchcanbeexpensive.IfthecostoftheresearchoutweighsitsvaluetoLeo,thenobviouslyheshouldn't
spendmoneyontheresearch."We'llneedtoassessthevalueofLeo'smarketresearch,"Alicetellsyou,"todetermine
whetherornotpayingforitisawisechoice."

TheExpectedValueofPerfectInformation
TheErisShoeCompanyisconsideringsourcingsomeofitsproductionfromthedevelopingcountryArboria.
ArboriawasalandofcivilwarandstrifeuntilacontroversialUNinterventiontwoyearsagoreconciledwarring
factionsandhelpedinstallanationalunitygovernment.Today,guerrillawarfarepersistsinthemountains,butthe
majorcoastalcitiesarerelativelysecure.
ErisCEOEmilyVillehasidentifiedArboriaasacandidatelocationduetoitslowlaborcostsandisconsidering
openingasmallbuyingofficeinthecapitalcity.However,shealsorecognizesthepotentialforsubstantialriskifcivil
warshouldbreakout,includingthelossofEris'investmentsintheofficeandthedisruptionofitssupplychain.
EmilyestimatesthepresentvalueofsourcingfromArboriaat$1millioninnetsavingsaslongastheArborian
productionfacilityandinfrastructureworkatahighlevelofreliability.Inheropinion,theprobabilityofsucha
"Serene"scenariois40%.
Emilybelievesthereisa40%chanceofamore"Troubled"environmentbesetwithminorsupplychaindisruptions,
whichwouldreduceEris'expectedcostsavingsto$0.5million.Finally,sheestimatesthelikelihoodofmajorpolitical
unrestat20%,andthenetpresentvalueofthelossesassociatedwithsucha"chaotic"scenarioat$1million.
InsteadofsourcingfromArboria,Eriscouldcontinueitscurrentsourcingagreements,whichwouldnotresultinany
costreduction.
121/135

5/13/2016

QuantitativeMethodsOnlineCourse

TheEMVofsourcingfromArboriais$0.4million.BasedontheEMVcriterion,Emilyshouldchoosetosourcefrom
Arboria.However,giventhestakesinvolved,Emilywouldlovetohaveadditionalinformationthatwouldincrease
herunderstandingofArboria'spoliticalsituation.
Ideally,she'dliketobaseherdecisiononperfectinformation,i.e.,she'dliketoknownowexactlywhichoneofthe
threeoutcomeswillmaterialize.Suchperfectinformationwouldbeextremelyvaluable:howmuchshouldEmilybe
willingtopayforit?
Managersoftenhavetheopportunitytogathermoredataorexpertisetohelpinformtheirbusinessdecisions.
Dependingonthebusinesscontext,newinformationcanbeobtainedfromstatisticaldata,expertconsultants,or
scientifictestsandexperiments.Inmanycases,suchinformationcanimprovetheaccuracyoftheestimates
incorporatedintoadecisionanalysis.
However,thecostofsuchdatacandragdownthebottomline.Clearlythecostofadditionalinformationmustbe
weighedagainstitsvalue.Howcanmanagersassessthevalueofinformation?
SupposeforthesakeofargumentthatEmilyknowsafortuneteller,FriedaFeatherlight,whohasgenuine
psychicpowersandcanaccuratelypredictthefuture,givingEmilytheperfectinformationshecraves.Howmuch
shouldEmilypayFriedaforhersupernaturalservices?
Toanswerthisquestion,wecalculatetheExpectedValueofPerfectInformationtheEVPIprovidedby
Frieda.Wecandeterminethisvaluebyframingadecision:shouldEmilypurchaseFrieda'sinformationbeforeshe
makesherdecisiontosourcefromArboria?
Characterizedasadecisionproblem,wecandeterminetheEVPIusingfamiliartools:adecisiontreeandthe
expectedmonetaryvalue.Let'sconstructatreeforEmily'sdecision,beginningwithadecisionnodethatbranches
intotwooptions:"Buytheinformation"or"Don'tbuytheinformation."
Thelowerbranch"Don'tbuytheinformation"issimplytheoriginaltreeforthedecisionaboutwhetherornot
tosourcefromArboria.ThistreeusesonlytheinformationEmilyalreadypossesses.TheEMVofthisoptionis$0.4
million.
IfEmilyengagesFrieda'sservicesand"buystheinformation,"threepossibilitiesemergebasedonwhichscenario
Friedapredicts"Serene,""Troubled,"or"Chaotic."Iftheprobabilitiesofthesescenariosactuallyoccurringare
40%,40%,and20%,respectively,whatisEmily'sbestassessmentofthelikelihoodthatFriedawillpredicteach
scenario?
Withoutfurtherinformation,Emilyhasnoreasontochangeherinitialassessments.UnlesssheprobesFriedafor
information,Emily'sbestguessthatFriedawillpredicteachoutcome"Serene,""Troubled,"and"Chaotic"is
thesameasherbestestimatesfortheprobabilitiesthatthoseoutcomeswillactuallyoccur:40%,40%,and20%,
respectively.
Let'sassumeforthemomentthatFriedawon'tchargeEmilyforherservices.Thebeautyofperfectinformationis
thatEmilycandelayherdecisionuntilaftershehasheardFrieda'scompletelyaccuratepredictionofwhatthe
Arborianpoliticalclimatewillbe.IfFriedapredictsa"Serene"sourcingexperiencethenEmilywillchoosetosource
fromArboria,andcostsavingsofabout$1millionarecertain.
Likewise,ifFriedapredictsa"Troubled"sourcingexperience,EmilywillsourcefromArboriaandcostsavingsof
about$0.5millionarecertain.IfFriedapredictsa"Chaotic"sourcingexperience,thenEmilywillchoosenotto
sourcefromArboria,therebyavoidingacertainlossinthe$1millionrange.Inthiscase,Eriswillreceivenocost
savings.
StillworkingundertheassumptionthatFriedawon'tchargeforherservices,whatistheEMVof"buyingthe
information?"
EntertheEMVin$millionsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,500,000"as"5.50").Roundifnecessary.
TheEMVof"buyingtheinformation"is$600,000.
At$600,000,theEMVof"buyingtheinformation"is$200,000higherthantheEMVof"notbuyingthe
information."ThisdifferencebetweentheEMVsoftheoptionwithperfectinformationandwithoutperfect
informationistheexpectedvalueofperfectinformation(EVPI).TheEVPIof$200,000isthemaximumamount
EmilyshouldpayFriedaforasance.
TheEVPIestablishesanupperboundonwhatweshouldpayforperfectinformation.Insomebusinesscases,
perfectornearperfectinformationmaybeavailablewithoutsupernaturalmeans.
122/135

5/13/2016

QuantitativeMethodsOnlineCourse

Forinstance,supposethataerospacecompanyAirbuswantstoknowifitcouldperfectallofthetechnologies
necessarytocreateasafeandreliableSpaceCruiser,aspaceshuttlefortouriststhatwouldhaveallthecomfortsofa
luxurycruiseship.
Airbuscouldbuildasmallbutcompletelyfunctionalprototypeandseeifitworks.AlthoughAirbusmightcollect
nearperfectinformationbydoingso,thecostofdevelopingtheSpaceCruiserprototypewouldlikelybehigherthan
theexpectedvalueofthatperfectinformation.
Instead,Airbusmightfirstusecomputersimulationsandlimitedfunctionalityprototypestoassessthetechnical
viabilityoftheSpaceCruiser.Theinformationthusgainedwouldn'tbeperfect,butitwouldbehelpfulanditwould
costfarlessthanafullyfunctionalprototype.
TheEVPIestablishesanupperboundonthevalueofanyinformation,perfectorflawed.Throughsampling,
educatedexpertise,orimperfecttesting,wemightbeabletogainbetterthoughnotperfectestimatesofthe
probabilitiesandtheoutcomevaluesofourdecision'spossiblescenerios.
Shortly,we'lllearnhowtovalueimperfectinformationinformationthatreduces,butdoesnoteliminate,our
uncertaintyaboutfutureevents.Forthetimebeing,weknowthatweshouldneverspendmoreforimperfect
informationthanwearewillingtopayforperfectinformation.
InherquestforaninfalliblechoiceintheArboriansourcingdecision,ifEmilywon'tpaymorethan$200,000for
perfectinformation,shecertainlyshouldn'tpaymorethan$200,000forimperfectinformation.Ifthepriceof
imperfectinformationexceedstheEVPI,weshouldnotexpendresourcesonit.

Summary
Asmanagers,wewouldliketoknowexactlywhichoutcomeswilloccursowecanmakethebestdecisions.Wecan
calculatetheexpectedvalueofsuchperfectinformationEVPItofindanupperlimitontheamountwe
wouldbewillingtopayforanyadditionalinformation.TocalculatetheEVPI,wefirstframethedecisionproblem
"tobuyornottobuytheperfectinformation."WesubtracttheEMVofbuyingperfectinformationassuming
it'sfreefromtheEMVofnotbuyingperfectinformationtofindtheEVPI.

TheExpectedValueofSampleInformation
Inalmostallcircumstances,perfectinformationiseitherimpossibletoassembleorprohibitivelyexpensive.In
thesecases,weuseimperfectinformationoftencalled"sample"informationtoinformourdecision.Aswith
perfectinformation,thecostofsampleinformationmustbeweighedagainstitsvalue.Asmanagers,howdowe
assessthevalueofsampleinformation?
Let'sreturntheErisShoeCompany'sdecisiontosourceproductioninArboria.EmilyVillehasdistinguished
threepossiblescenariosfortheArborianpoliticalclimate:"Serene,""Troubled,"and"Chaotic,"whichshebelieves
haveprobabilitiesof40%,40%,and20%respectively.Theoutcomessheassociateswiththesescenariosincost
savingsforErisare$1million,$0.5million,and$1million,respectively.
Emilywouldliketoimproveherestimatesoftheprobabilities.ShecontactsPoliFor,aconsultingcompanythat
specializesinbusinessoutlookintelligence.PoliForproducesananalysisofacountry'sorregion'spoliticalclimate,
andthenprovidesariskassessmentspecifictotheneedsofitsclient,distinguishingbetween"High"and"Low"
risksituations.
Inaworldofuncertainty,PoliFor'sassessmentsarenotalwaysaccurate.Sometimes,regionswith"Low"risk
assessmentsburstintorevolutionaryflame.Sometimes,regionswith"High"riskassessmentsturnouttobeas
stableandplacidasthemoon'sorbit.HowhighapriceshouldEmilypayforPoliFor'sservicesgiventhatPoliFor's
assessmentsarenotperfectlyreliable?
BasedonpreliminaryconversationswithPoliFor'sleadconsultant,EmilyassessestheprobabilitiesthatPoliFor
willreport"High"or"Low"riskforsourcinginArboriaat30%and70%,respectively.Shethenconsidershow
eachoftheserisklevelassessmentswouldaffectherestimatesoftherelativelikelihoodofherthreerepresentative
scenarios:"Serene,""Troubled,"and"Chaotic."
EmilybelievesthatifPoliForpredicts"Low"risk,thenthelikelihoodofa"Chaotic"politicalclimatewouldbelow:
5%,andtheprobabilitiesof"Troubled"or"Serene"scenarioswouldbe40%and55%respectively.Ifweuse"Low"
torepresentPoliForpredictingalowriskpoliticalenvironmentwhichofthefollowingbestexpressesthebeliefs
Emilyhassummarizedabove?
TheinformationuponwhichEmilyisbasingherassessmentsisthePoliForpredictionofa"Low"riskpolitical
123/135

5/13/2016

QuantitativeMethodsOnlineCourse

climate.Thus,sheisestimatingconditionalprobabilitiesthatEris'experiencesourcingfromArboriawillbe
"Serene,""Troubled,"or"Chaotic,"giventhattheconsultantspredicta"Low"levelofrisk.
Similarly,EmilyassessestheconditionalprobabilitiesthatEris'experiencesourcingfromArboriawillbe
"Serene,""Troubled,"or"Chaotic"giventhattheconsultantspredicta"High"levelofriskat5%,40%,and55%,
respectively.HowdowecalculatetheexpectedvalueoftheimperfectinformationPoliForoffers?
Aswedidforperfectinformation,weframethisquestionasadecisionproblem:shouldEmilypurchasethe
imperfectinformationfromPoliFor?First,we'llconstructadecisiontree.Thetreebeginswithadecisionnodefor
thechoiceEmilyfaces:"Buytheinformation"or"Don'tbuytheinformation.
Thelowerbranchfortheoption"Don'tbuytheinformation"istheoriginaltreethatEmilyconstructedusingher
initialestimatesfortheoutcomevaluesandprobabilitiesofthethreebasicscenarios.Whatshouldemanatetothe
rightfromthe"BuyInformation"branch?
Ifweareattheendofthe"BuyInformation"branch,weassumethatEmilyhaspurchasedtheinformation.In
thatcase,thefirstthingthatshewilldoisopenthereportenvelopeandexaminetheinformationtolearnwhat
levelriskPoliForhaspredicted.Thus,theupperbranchfortheoption"Buytheinformation"splitsthechance
nodeintotwobranches,oneforeachriskleveltheconsultingcompanymightpredict.
WhatshouldemanatefromeachoftheLowandHighbranches.
TheinformationthatEmilyhaspurchasedisvaluableonlyifsheusesittosupportherdecisionmaking.The
valueoftheinformationresidesinEmily'sabilitytomakehersourcingdecisionafterlearningthelevelofrisk
PoliForpredicts.Thus,aftereachprediction,weplaceadecisionnode:shouldshesourcefromArboriaorcontinue
tosourcefromhercurrentsuppliers?
Tocompletethedecisiontree,wemustincorporatethepossiblescenariosthatcanoccurifEmilychoosestosource
fromArboria:"Serene,"Troubled,"or"Chaotic."Whatprobabilitiesshouldweassigntothethreebranches
emanatingfromthechancenodehighlightedtotheright?
Atthehighlightednode,weassumethateveryeventontheuniquepathleadingfromthenodebacktothe
beginningofthetreehastranspired:EmilyboughttheinformationPoliForreported"High"riskandEmily
chosetosourcefromArboria.Thus,theprobabilitiesassignedtothebranchesmustbeconditionedonthose
events.Forexample,weassignP(Serene|High)tothe"Serene"scenarioonthe"High"branch.
Nowwecompletethetree,substitutingthevaluesfortheconditionalprobabilitiesandplacingtheappropriate
monetaryvaluesattheendpoints.Asusual,we'llassumeforthemomentthattheinformationisfree,andstartto
foldbackthetree.WhatistheEMVofthe"HighRisk"branch'sdecisionnode?
IfPoliForpredictsa"High"risklevel,EmilyshouldnotchoosetosourcefromArboriasincetheEMVofthe
"Source"fromArborianode,$0.3million,islessthantheEMVofthe"Don'tsource"node,$0million.Instead
sheshouldcontinuehercurrentsupplyarrangements,inwhichcaseshewillrealize$0incostsavings.
Foldingbackthedecisiontree,wefindanEMVof$490,000fortheoption"Buytheinformation."
WhatisthemostEmilyshouldpayfortheimperfectinformation?
ThedifferencebetweentheEMVsofthe"Buy"and"Don'tbuytheinformation"optionsis$90,000,theexpected
valueofimperfect(orsample)information(EVSI)providedbyPoliFor.TheEVSIisthemaximumamountthat
EmilyshouldbewillingtopayforPoliFor'sreport.
Aswewouldanticipate,theexpectedvalueofthisimperfectinformationisquiteabitlessthan$200,000,the
expectedvalueofperfectinformationwecomputedearlier.
PoliFor'sinformation,althoughimperfect,hasvaluebecauseitallowsEmilytoupdateheroriginalprobability
estimatesbasedontherisklevelPoliForpredicts.Shecanmakebetterdecisionsbasedonthesemoreaccurate
updatedprobabilityassessments.
TheprobabilityestimateswestartedwithEmily'sestimatesforP(Chaotic),etc.arecalledpriorprobabilities.
TheupdatedconditionalprobabilitiesP(Chaotic|HighRisk),P(Troubled|HighRisk),etc.,arecalled
posteriorprobabilities.

Summary
Byinvestinginprofessionalexpertise,statisticalstudiesortests,managerscanoftenimprovetheir
understandingofuncertainevents.Suchimperfect(or"sample")informationisitselfasourceofuncertainty
124/135

5/13/2016

QuantitativeMethodsOnlineCourse

becauseitisn'taperfectpredictoroftheoutcomesofinterest.Althoughimperfectinformationcanimproveour
predictions,suchinformationcomesataprice.Responsiblemanagerscalculatetheexpectedvalueofsample
information(EVSI),andpurchaseinformationonlyifitsexpectedvalueexceedsitscost.TocalculatetheEVSI,
weposethedecisionproblem"tobuyornottobuythesampleinformation,"andsubtracttheEMVofbuying
theinformationassumingthatit'sfreefromtheEMVofnotbuyingit.

UpdatingPriorProbabilites
Zeke"Claw"CrankshawownsandrunsOligarchol,oneofthelargestindependentoilproducersintheWestern
Hemisphere.
Recently,ZekeacquiredthemineralrightstoapieceoflandnearChanchitoGordoCanyon.Basedonhisown
professionalassessment,Zekebelievesthathehasa50%chanceoffindinganoilfieldifhedrillsinChanchito
Gordo,a50%chanceofofdrillingnothingbut"Dry"holes.
Likeanysubjectiveprobabilities,theseestimatesaretheproductofZeke'seducatedguesswork.Clearly,theoilis
eitherbeneaththesurfaceatChanchitooritisn't.However,Zeke'sexperiencetellshim,forexample,thatin
abouthalfofallsimilarsituationssimilarterrain,similarlyexposedrockstrata,etc.oilprospectorshave
struckoil.
Heestimatesthevalueofstrikingoilinpresentvalueoffutureprofitstobe$8millionafternettingout
drillingcosts.Ifallofhisboreholescomeup"Dry,"Zekewillhavelostthe$2millioncostofdrilling.Withthese
assessmentsinmind,Zekemustdecideifheshoulddrillontheproperty.
IfZekedecidesnottodrill,hisinvestmentinthemineralrightswillnotpayoffinthewayhehadhoped.
However,sincethepurchasepriceofthemineralrightshasalreadybeenincurred,itisasunkcostthatshouldn't
bearonthedecision.
WhatistheexpectedmonetaryvalueofdrillinginChanchitoGordo?
EntertheEMVin$millionsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,500,000"as"5.50").Roundifnecessary.
TheEMVofdrillinginChanchitoGordois$3million.
Withoutfurtherinformation,Zekeshouldstartshippingdrillbitsandoilrigsoverthedustybackcountryroadsto
Chanchito.ButoldZekeisnottheimpetuousyoungwhippersnapperoftheearlydaysofOligarchol,backwhenthe
companywasderisivelyreferredtoinindustrycirclesas"PipeDreams,Incorporated.
Now,Zekeishappytousethelatestinseismictestingtechnologytogainbetterinformationandmakemore
informeddecisions.Zekecanordera$100,000seismictestoftheChanchitoGordoproperty.Althoughtheseismic
testwillimprovehisestimateofthelikelihoodoffindingoil,itwon'tprovideperfectinformation.
Theseismictestwillreportoneoftwopossibleoutcomes:a"Positive"ora"Negative"result,dependingon
whetherornotthetestindicatesthepresenceofoil.Becausetheseismictest'saccuracyisakeydeterminantofits
value,Zekehasgatheredsomehistoricaldataonthetest'sperformance.
Zekebelievesthatifthereisanoilfieldonhisproperty,thentheprobabilitythatthetestwillreturna"Positive"
resultis90%.ThisistheconditionalprobabilityP(Positive|Oil).Clearlythetestisn'tperfectevenifthereisa
reserveonhisproperty,thereisa1in10chancethetestwillfailtodetectit.Thisistheconditionalprobabilityofa
"FalseNegative"result:P(Negative|Oil)=10%.
And,unfortunatelyforZeke,evenwhenthereisn'tadropofoiltobefound,thetestwillstillreportapositive
resultafalse"Positive"withprobability30%.Theprobabilitythattheseismictestwillindicatetheabsence
ofoilwhenthereis,infact,nooiltobefoundP(Negative|Dry)is70%.HowcanZekeusethesedatato
calculatetheEVSItheexpectedvalueofthetestinformation?
Asusual,westartwithadecisionnodethatbranchesintotwooptions:"Test"or"Don'tTest."Thelowerbranch,
"Don'tTest,"isthesameasZeke'soriginaltreeforwhethertodrillornot.TheEMVforthe"Don'tTest"branchis
$3millionitisbasedonlyontheinformationZekepreviouslyhadavailabletohim.
IfZekebuysthetest,hewilllearnthetest'sresultpositiveornegativebeforehehastodecidewhethertodrill
ornot.WhatprobabilityshouldZekeassigntothe"TestPositive"branch?
TheprobabilitythatZekeshouldassigntothe"TestPositive"branchisP(Positive).However,thisprobabilityis
notintheinformationZekeoriginallyhadavailable,whichissummarizedinthetabletotheright.
125/135

5/13/2016

QuantitativeMethodsOnlineCourse

TofindtheP(Positive),wemustfirstfindthejointprobabilitiesandenterthemintothetable.WhatisP(Oil|
Positive)?
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
TofindP(Oil|Positive),weusethefamiliarrelationshipforconditionalprobabilities.Ifyouwishtopractice,
calculatetheotherjointprobabilitiesandP(Positive)andP(Negative)beforeadvancingtothenextscreen.
Addingthejointprobabilities,wefindthatP(Positive)=60%andP(Negative)=40%.
Wearenowreadytocompletethetree.AfterZekelearnstotesttheresult,hemakesthedecisionaboutwhetheror
nottodrill.Ifhedrills,hewilleitherstrikeoilornot.Whatprobabilityshouldweassociatewiththebranch
highlightedtotheright?
IfZekeisatthehighlightednode,thenalltheeventsonthebranchleadinguptothenodemusthaveoccurred:he
decidedtotest,thetestwaspositive,andhedecidedtodrill.Thus,Zekemustconditiontheprobabilityofstriking
oilonpastevents:specifically,onthefactthatthetestreportwas"Positive."Thus,heshouldassociatethe
conditionalprobabilityP(Oil|Positive)withthehighlighted"Oil"branch.
Wefindtheconditionalprobabilitiesintheusualway.Thefulltableofjoint,marginal,andconditional
probabilitesisshowntotheright.
Ourtreenowhasallthenecessarydata.WefolditbackandfindthattheEMVforconductingaseismictest
assuminigthetestisfreeis$3.3million.
WhatistheEVSIoftheseismictest?
EntertheEVSIin$millionsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,500,000"as"5.50").Roundifnecessary.
TheEVSIis$300,000thedifferencebetweentheEMVofconductingthetest,$3.3million,andtheEMVofnot
conductingthetest,$3million.ThisisthemaximumamountZekeshouldbewillingtospendontheseismictest.
Sincetheexpectedvalueofthetestexceedsits$100,000cost,Zekeshouldorderthetest.

Summary
Often,informationisnotavailableintheformweneeditintoinformourdecisionprocessandthecalculation
oftheEVSI.Frequently,forexample,wehaveaccesstodataaboutthereliabilityofatest.Thereliabilityofa
testisgivenasthesetofconditionalprobabilitiesP(TRi|Si)whereTRiisatestresultandSjisoneofthe
scenariosthetestisintendedtohelppredict.Collectively,theseconditionalprobabilitiesrevealthetest'sability
topredicttheoutcomeoftheuncertaineventinquestion.Usingthisinformation,weupdateourinitial
estimatesfortheprobabilitiesofeachscenariothepriorprobabilitiesP(Si)toachieveimproved,
conditionalprobabilitiesofeachscenario,giventheresultofthetestposteriorprobabilitiesP(Si|TRi).We
usetheposteriorprobabilitiestocalculatetheEVSIandaidourdecisionprocess.

SolvingtheMarketResearchProblem
YouknowhowtocalculatetheexpectedvalueofinformationandsetanupperlimitontheamountLeoshouldbe
willingtopayformarketresearch.ButwhatkindofmarketresearchcouldLeohaveinmind?Andwoulditbeworth
itscost?
Leowantstoknowhowmuchmoneyheshouldbewillingtospendonadditionalinformationaboutconsumers'
likelyreactiontoafloatingrestaurant.First,youcalculatetheexpectedvalueofperfectinformationtogiveLeoon
absoluteupperboundontheamountheshouldpayforanykindofinformation.
Youbeginbysettingupadecisiontreetohelpinformthedecisionofwhetherornottobuyperfectinformation.
Whichofthefollowingbestrepresentsthe"BuyInformation"branchoftheperfectinformationtree?
Whateverthesourceofperfectinformation,itwillpredicttheoccurrenceofeithera"Phenomenon"ora"Fad."
Whichoneofthesescenariositwillpredictisuncertaintherespectiveprobabilitiesareequaltotheprobabilitiesof
thetwoscenarios:35%and65%.
OnceLeohasperfectinformation,hecanchoosetoeitherlaunchtheTethysventureornot.
WhatistheEMVofacquiringtheperfectinformation,assumingthatitisfree?
126/135

5/13/2016

QuantitativeMethodsOnlineCourse

EntertheEMVin$millionsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,500,000"as"5.50").Roundifnecessary.
IfLeoiscertainthattheChezTethyswillbea"Phenomenon,"hewilllaunchherandrealizea$2millionprofit.Ifhe
iscertainthattheTethyswouldbeamere"Fad,"he'llavoidthe$800,000lossbynotlaunching,therebyrealizingno
profitsorlossesinthefloatingrestaurantindustry.TheEMVofacquiringperfectinformationis$700,000.
RecallthattheEMVoflaunchingtheTethysbasedonLeo'soriginalestimateswas$180,000.Whatisthe
expectedvalueoftheperfectinformation?
EntertheEMVin$millionsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,500,000"as"5.50").Roundifnecessary.
Theexpectedvalueofperfectinformationisthedifferencebetween$700,000theEMVofacquiringtheperfect
informationandtheEMVofnotacquiringtheperfectinformation.Inthiscase,theEMVofnotacquiringthe
perfectinformationistheoriginalEMVoflaunchingtheChezTethys,$180,000.Thedifferenceis$520,000.
That'salotofmoney!Ihaveanideaforamarketresearcheventthatisexpensive,butit'smuchcheaperthanthat.
Let'sgettoit!
Notsofast,Leo.Upto$520,000iswhatyoushouldbewillingtopayforperfectinformation.Butrealworld
informationisusuallymuchlessthanperfect.Whatexactlydoyouhaveinmind?
Youtwoaregoingtolovethis.I'llrentacruiseshipanddoaweeklongtestcruisearoundtheislands,invitingguests
atthefinesthotelsonboardforfree.Atdinner,I'llrunanadvertisementfortheTethys,thensurveyalltheguests
andaskthemifthey'dbeinterestedindiningthere!
That'sanaudaciousmarketresearchplanforanaudaciousbusinessventure.Iliketheidea,though,andIthink
you'llgetmuchbetterinformationthanifyouhireafirmtodoconventionalmarketresearch.Let'strytoestimate
thevalueofthisinformation.
ThethreeofyoudebatethemeritsofLeo'smarketingevent.YousplittheresultsofLeo'seventintotworesponse
scenariosbasedonhowLeo'sguests/subjectsreceivehisdemonstration:"Enthusiastic"and"Tepid,"estimatingthe
probabilitiesoftheseresponsesat40%and60%,respectively.
IfthesubjectsrespondtoLeo'sdemonstration"Enthusiastically,"youbelievethattheTethysbecominga
"Phenomenon"ismuchmorelikelythanyourearlierestimate:65%.Ontheotherhand,ifthereceptionis"Tepid,"
youestimatethattheprobabilityofa"Phenomenon"isamere15%.
Giventhesedataandthetreetotheright,whatistheexpectedvalueofLeo'ssampleinformation?
EntertheEVSIin$millionsasadecimalnumberwithfourdigitstotherightofthedecimalpoint(e.g.,enter
"$5,555,000"as"5.5550").Roundifnecessary.
TocalculatetheEVSI,calculatetheEMVofLeo'srunninghisplannedevent.First,calculatetheEMVoflaunching
theTethysgiventhattheeventparticipants'responseis"Enthusiastic."TheEMVisthesumoftheoutcomesofthe
"Phenomenon"and"Fad"scenarios,aftertheyhavebeenweightedbytheconditionalprobabilitiesthatthese
scenarioswilloccurgivenan"Enthusiastic"reception:$1.02million.
SincetheEMVforlaunchingtheTethysishigherthantheEMVfornotlaunchingher,Leoshouldembarkonthe
Tethysventureifhisguest/subjectsrespond"Enthusiastically"tohisevent.Thus,theEMVifguestsgivean
"Enthusiastic"responsetoLeo'seventis$1.02million.
Next,calculatetheEMVoflaunchingtheTethysgiventhattheeventparticipants'responseis"Tepid."TheEMVis
thesumoftheoutcomesofthe"Phenomenon"and"Fad"scenarios,aftertheyhavebeenweightedbytheconditional
probabilitiesthatthesescenarioswilloccurgivenan"Tepid"reception:$380,000.
SincetheEMVforlaunchingtheTethysislowerthantheEMVformaintainingthestatusquoattheKahana,Leo
shouldnotembarkontheTethysventureiftheparticipants'responseis"Tepid".Thus,theEMVofa"Tepid"
responsetoLeo'seventis$0.
TheEMVofLeo'seventis$408,000:thesumoftheEMVsofthe"Enthusiastic"and"Tepid"responses,afterthey
havebeenweightedbytheprobabilitiesthattheseresponseswilloccur.
TheexpectedvalueofLeo'ssampleinformationisthedifferencebetweentheEMVofhiseventand$180,000the
EMVoflaunchingtheTethyswithoutfurtherinformation.Thedifferenceis$228,000.
Italkedtosomefriendslastnight,andthetotalcostofrentingandrunningacruiseshipforaweek,andshootingan
127/135

5/13/2016

QuantitativeMethodsOnlineCourse

advertisingshortcomestoatleast$240,000.
That'shigherthantheexpectedvalueofthisinformationwecalculated:$228,000.
Areyoutellingmethatmyeventwouldnotbeworthitscost?
I'mafraidso,Leo.Butthatdoesn'tnecessarilymeanthatyoushouldn'tdoit.Youmightchoosetohaveyourevent
evenifitsexpectedmonetaryvalueislow.It'slate.Let'sbreakfortonightandmeetoverbreakfasttomorrow.

Exercise1:TheFUDGroceryChain
ThegrocerychainFUDofferstwolinesofprivatelabelproducts:"FUDBrand,"and"PleasantValleyFare."The
twobrandseachareusedtolabelasetofgroceryitemswithverylittleoverlap.Forexample,"FUDBrand"isused
formeatandcheeseproducts"PleasantValleyFare"forcannedgoods.
EstherSmith,CEOofFUD,anticipatesthateliminatingthebrandwithweakerconsumerloyaltyandrepackaging
itsproductlineunderthelabelofthestrongerbrandwouldresultinanadditional$200,000inprofitsfrom
increasedsalesandfromcostsavingsonpackagingandadvertising.IfEsthereliminatesthestrongerbrand,she
anticipatesanetbottomlineimprovementof$0reductionsinsalesvolumeorpricediscountswouldroughly
offsetcostsavings.
Estherdoesn'tknowwhichofthebrandsenjoysgreatercustomerloyaltysincetheproductlinesdon'toverlap,
shecan'tdirectlycomparethetwobrands'performances.Initially,shebelievesthatthereisa50%chancethat
customersaremoreloyaltoFUDBrand,anda50%chancethattheyaremoreloyaltoPleasantValleyFare(PVF).
Basedonthisinformation,whichofthetwooptionseliminatingFUDBrandorPleasantValleyFarehasthe
higherEMV?
EachoptionhasanEMVof$100,000.BasedonEsther'scurrentinformation,shehasnobasisforpreferringone
brandovertheother.
SupposeEstherhadaccesstoperfectinformation,i.e.,shecoulddiscoverwithcertaintywhichbrandher
customersprefer,soshecouldeliminatetheweakerbrandandincreaseFUD'sprofitsby$200,000.Howmuch
shouldshebewillingtopayforsuchperfectinformation?
Entertheexpectedvalueofperfectinformationin$millionsasadecimalnumberwithtwodigitstotherightof
thedecimalpoint(e.g.,enter"$5,500,000"as"5.50").Roundifnecessary.
First,constructthetreeforthedecisiontobuyornottobuytheinformation.TheEMVfornotbuyingthe
informationis$100,000nomatterwhichbrandEstherchoosestoeliminate.
Thereisa50%chancethattheperfectinformationEstheracquireswillrevealthatcustomersprefertheFUD
Brandanda50%chanceitwillrevealthatcustomerspreferPleasantValleyFare.
OnceEstherhastheperfectinformation,shewillchoosetoeliminatethebrandthatcustomersarelessloyalto,
adding$200,000toFUD'sbottomline.
TheEMVofbuyingperfectinformationis$200,000.Theexpectedvalueofperfectinformation,EVPI,isthe
differencebetweentheEMVsofbuyingandnotbuyingperfectinformation:$100,000.
SinceEsthercannotgetperfectinformation,she'swillingtosettleforinformationthatislessthanperfectand
decidestohireamarketresearchfirmtodeterminewhichofFUD'stwobrandscustomersprefer.
EstherbelievesthattheprobabilitythatthemarketresearchfirmwillfindthatFUDBrandhashighercustomer
loyaltyis50%.Likewise,theprobabilitythatthemarketresearchfirmwillpredictthatPleasantValleyFarehas
highercustomerloyaltyis50%.
Whicheverbrandthemarketresearchfirmindicatesisstronger,Estherbelievesthatthefindingwillbecorrect
withprobability85%.Thus,thereisa15%chancethatthefindingthefirmprovideswillbewrong.
Whatistheexpectedvalueofthissampleinformation?
Entertheexpectedvalueofsampleinformationin$millionsasadecimalnumberwiththreedigitstotherightof
thedecimalpoint(e.g.,enter"$5,500,000"as"5.500").Roundifnecessary.
First,constructthetreeforthedecisiontobuyornottobuythesampleinformationusingtheprobabilitiesand
outcomescitedabove.TheEMVfornotbuyingtheinformationis$100,000nomatterwhichbrandEsther
128/135

5/13/2016

QuantitativeMethodsOnlineCourse

chooses.
OnceEstherhasthesampleinformation,shewillchoosetoeliminatethebrandthatthemarketresearchindicates
isweaker.TheEMVofeliminatingthebrandthatmarketresearchpredictsenjoyslesscustomerloyaltyis
$170,000.
TheEMVofbuyingthisimperfectinformationis$170,000.Thus,theexpectedvalueofsampleinformation,
EVSI,isthedifferencebetweentheEMVsofbuyingandnotbuyingperfectinformation:$70,000.

Exercise2:PPDImmunity
Afterapublicrelationsdebacleoverapossibleoutbreakofpulluscularpigdisorder(PPD)atoneofBowman
LyonsCenterville'shogfarms,PaulSegal,headofthehogfarmingdivision,isconsideringvaccinatinganother
herdatahogfarmlocatedneartheoriginaloutbreak.
ImmunitytoPPDisfairlycommonamongpigstheprobabilitythattheentireherdinquestionisimmuneis
fairlyhigh:60%.
However,ifanypigsintheherdarenotimmune,theexpectedcosttoBowmanLyonsCentervilleis$150,000.
ThiscosthasbeencalculatedbasedontheprobabilityofaPPDoutbreakandtheensuingcoststocontainthe
diseaseandmanagetheexpectedPRfallout.
Thecostofvaccinatingtheherdis$40,000.WhatistheEMVofthecostofnotvaccinatingtheherd?

EntertheEMVin$thousandsasadecimalnumberwithonedigittotherightofthedecimalpoint(e.g.,enter
"$5,500"as"5.5").Roundifnecessary.
TheEMVofthecostofnotvaccinatingtheherdis$60,000.Sincethecostofvaccinatingtheherd$40,000is
lowerthantheexpectedcostofnotvaccinatingtheherd,Paulshouldhavetheherdvaccinated,barringthe
emergenceofanyfurtherinformation.
ThereisatestthatPaulcouldperformontheherdtodetermineiftheherdisimmunetoPPD.Thistestdelivers
tworesults:"Positive"forimmunity,or"Negative"forimmunity.
Thetestisnotperfectlyaccurate:iftheentireherdisimmune,thetestwillreport"Positive"withprobability85%
and"Negative"withprobability15%.Similarly,ifanyofthepigsintheherdarenotimmune,thetestwillreport
"Positive"withprobability30%and"Negative"withprobability70%.
UsingPaul'spriorprobabilityfortheimmunityoftheherd,calculateP(Positive),themarginalprobabilitythat
thetestwillreporta"Positive"result.
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
First,calculatethejointprobabilitiesP(Positive&Immune)andP(PositiveandNotImmune),usingthemarginal
probabilitiesforimmunityandnonimmunity60%and40%,respectivelyandtheconditionalprobabilities
thatquantifythereliabilityofthetestP(Positive|Immune)andP(Positive|NotImmune).
Then,sumthejointprobabilitiestofindthemarginalprobabilityofa"Positive"testresult:63%.Sincethetest
results"Positive"and"Negative"aremutuallyexclusiveandcollectivelyexhaustiveevents,theprobabilityofa
"Negative"issimply100%minustheprobabilityofa"Positive,"i.e.,P(Negative)is37%.
ThetreebelowwillhelpPauldeterminewhetherornotorderingtheimmunitytestwillbeworthwhile.The
calculatedprobabilitiesofthetestresultsareenteredintheappropriatebranches.However,toreachadecision,
Paulneedstheconditionalprobabilitiesofimmunityandnonimmunityconditionedona"Positive"testresult.
WhatisP(Immune|Positive)?
Entertheprobabilityasadecimalnumberwiththreedigitstotherightofthedecimalpoint(e.g.,enter"50%"as
"0.500").Roundifnecessary.
TocalculateP(Immune|Positive),usethedefinitionofconditionalprobabilityanddividethejointprobability
P(Positive&Immune)bythemarginalprobabilityP(Positive).P(Immune|Positive)is81%.Sinceimmunity
andnonimmunityaremutuallyexclusiveandcollectivelyexhaustiveevents,P(NotImmune|Positive)issimply
100%minusP(Immune|Positive)i.e.,19%.
129/135

5/13/2016

QuantitativeMethodsOnlineCourse

Calculatetheremainingprobabilitiesonthedecisiontree.WhatisthehighestamountPaulshouldbewillingto
spendontheimmunitytest?
Entertheexpectedvalueofsampleinformationin$thousandsasadecimalnumberwithonedigittotherightof
thedecimalpoint(e.g.,enter"$5,500"as"5.5").Roundifnecessary.
Theexpectedvalueofnotvaccinatinggivena"Positive"testresultis$28,500.Sincethisislowerthanthecostof
vaccination,PaulshouldchoosenottovaccinateonEMVgrounds.
Theprobabilitythattheherdisimmunegiventhatthetestreports"Negative"is24.3%.SincetheEMVofthecost
ofvaccinatingtheherd($40,000)islowerthantheEMVofthecostofnotvaccinating($113,550)Paulshould
choosetovaccinateifthetestresultis"Negative".
TheEMVofconductingtheimmunitytestis$32,755,i.e.,thesumoftheEMVsof"Negative"and"Positive"test
results,weightedbytheirrespectiveprobabilities.
Theexpectedvalueofsampleinformation,EVSI,isthedifferencebetweentheEMVsoftestingandnottestingfor
immunity,i.e.,$7,245.ThemostPaulshouldbewillingtopayfortheimmunitytestis$7,245.

RiskAnalysis
Justasanelearningcoursemusthaveabittersweetlastsection,sotoomustyourinternshiponHawaiihaveafinal
day.ApartingPacificfrolicisrefreshingpreparationforyourmeetingwithLeo.

IntroducingRisk
Asyoudryinthemorningsun,AliceinvitesyoutoconsiderLeo'spositionshouldhewagertheKahanaonthe
Tethys'success."Evenifthepotentialprofitsaresubstantial,losingtheKahanaiftheTethystankswouldbeasevere
blow.Iwonderhowseriouslyhe'sconsideredthepotentialdownside."
Imagine:uponyourarrivalatbusinessschoolyoubuyyourselfanicenewcar.It'sasleek,powerful,BurgundyBen
HurbyChariotwithbothaSweetonestereoandaTheftDiscouragementSystemworthover$25,000.
Citydrivingcanberough:theprobabilitythatyou'llbeinvolvedinanaccidentthat"totals"yourcarreducesits
valuefrom$25,000tozeroinyourfirstyearis0.01%.Collisioninsurancetocoversuchalossinyournewcityis
optional:itwillcostyou$1,600peryearaboveandbeyondthepremiumforrequiredliabilityinsurance.Willyou
buythecollisioninsurance?
Mostpeoplewouldpaythepremiumtomitigatetherisksassociatedwithacaraccidentleadingtosuchlargelosses.
Butlet'slookatthedecisiontreeforbuyingcollisioninsurance.Forsimplicity,we'llignoreaccidentsthatdonottotal
yourcar.
Afterayearinschool,yourBennie'smarketvaluewillhaveapresentvalueof$25,000.Ifyoudon'tbuycollision
insuranceandthecaristotaled,you'lllosethefullvalueofthecar$25,000.Ifasweallhopeyouandyourcar
survivethecity'sroadsunscathed,youwon'tloseanything.
Optingforcollisioninsuranceentailsa$1,600insurancepremium.Ifyouaresparedthecalamityofawreck,you'll
haveincurredjustthepremiumcostattheendoftheyear.Iftraffictragedybefallsyou,yourinsurancecompany
willpayoutthevalueofthecarminusa$1,000deductible:soattheendoftheyearyouwillbeout$2,600:the
premiumplusthe$1,000deductible.
WhichoptionisbetterintermsoftheEMV?
At$2.50,theEMVofnotbuyingcollisioninsuranceismuchbetterthantheEMVofbuyingtheinsurance.
Essentially,ifyoubuythecollisioninsurance,youarepaying$1,600toavoidanexpectedlossof$2.50!Weshouldn't
besurprisedinsurancecompaniesarenotcharities,andfromtheirperspective,sellingyouinsurancehasavery
favorableEMV.
Still,amajorityofdriverschoosecollisioninsurance,eventhoughithasasignificantlylowerEMV.Whymightthis
bethecase?
Let'sconsideradifferentsituation.SupposeyouownedabrightandshinyminiaturewinduptoyBenniefor$25.
Wouldyoutakeouta$1.60insurancepolicyonit,eveniftheprobabilitywereupwardsof50%thatitwouldbe
steppedonorlostinthecomingyear?
Probablynot.A$25lossisnotevenaminorcatastrophe.IfyoulostoraccidentallydestroyedyourminiatureBennie,
130/135

5/13/2016

QuantitativeMethodsOnlineCourse

you'deitherreplaceitquickly,oracceptitsdeparturewithoutfuss.
Ifinsteadofanice,new,$25,000Bennieyouowneda20yearoldbeatupandrustyOldsmobileDelta88,worth
about$2,500,you'dalsothinktwiceaboutpaying$160toinsureitagainstcollision.Alossof$2,500though
stingingisunlikelytobeamajorsetback,especiallywhenweighedagainsta$160premium.
But,imaginethatyouassistedbyyourMBAdegreeamassamultimilliondollarfortuneoverthenextdecades.
Wouldyoustillinsurea$25,000caragainstcollision?Perhapsnot.
Clearly,thevalueofthepotentiallossrelativetoyournetworthhasanimportantinfluenceonyourwillingnessto
takeontheriskassociatedwithuncertainty.Supposeyouareinvitedtoplayagamblinggameinwhichyoucould
win$10millionwith50%probabilityorlose$2millionwith50%probability.EventhoughtheEMVofplayingthis
gameis$4million,you'dprobablydeclineparticipationunlessyoucanafforda$2millionloss.
Formostofus,thepainofthe$2millionlosswouldweighmoreheavilythanthejoyofwinning$10million,sowe
wouldnotacceptthisgamble.Forsomeonewhocanafforda$2millionloss,thisgamemightbeveryprofitable,
especiallyifplayedmultipletimes.
Thereareformalwaystomeasuresuchassessmentsbyassigningapersonalutilityvaluetoeachoutcome.Thenwe
canshowthatmaintainingthestatusquoourcurrentassetsprovidesgreaterutilitythanplayingthegamedoes.
Rigorousmethodsofquantifyingutilityarebeyondthiscourse'sscope.Fornow,let'slookathowwemightgain
insightintothepersonalutilityweassociatewithdifferentmonetaryoutcomes.
Returningtocollisioninsurance,let'ssummarizethescenariosandtheirprobabilitiesandoutcomevalues.Youhave
twooptions:"Buycollisioninsurance"and"Don'tbuycollisioninsurance.
Ifyouinsureagainstcollision,therearetwopossibleoutcomes:eitheryouavoidawreckforoneyearandlosethe
insurancepremiumof$1,600,ormisfortunestrikesandyourBennieistotaled.Inthelattercase,youlosethe
premiumandthedeductible.Theprobabilitiesassociatedwiththesetwooutcomesare99.99%and0.01%,
respectively.Thepossibleoutcomesandtheirrespectiveprobabilitiesarelistedinthetablebelow.
Ifyoudon'tinsureagainstcollisionandparkyourcarsafelyatyear'send,youwon'tsustainanylossatall.Ifthethe
unfortunatealternativeoccursandyourBennieistotaled,you'llhavelostitsvalueof$25,000.Again,the
probabilitiesoftheseoutcomesare99.99%and0.01%,respectively.
Wenowhavetwotablesthatcompletelysummarizethepossiblescenariosofourdecision.Themonetaryvaluesof
thesescenariosarearrangedfromtoptobottom,fromthebestoutcomestotheworst.Thesetablesoneforeach
optionaretogetherreferredtoastheriskprofilesforthedecision.
Fromtheriskprofiles,iteasytorecognizethattheoptionofnotbuyinginsurancewilldeliverthemostpreferred
outcomeifyoudon'thaveanaccidentbutwilldelivertheleastdesirableoutcomeifyoudo.Thisinsightprovidesthe
basisforyoutoprefertobuythecollisioninsurance:itexposesyoutolowerrisk,eventhoughithasalowerEMV.

Summary
TheEMVcriterionisnottheonlycriterionthatinformsdecisions.Besideswantingtomaximizeouraverage
outcomesinthelongrun,wewanttominimizeourexposuretorisk,especiallywhenpotentiallossesarehigh
relativetoourownnetworth.

RiskAttitudes
RecallSethChaplinandS&CFilms'productionofthemovieCloven.Sethhasachoicebetweentwooptions.Ina
dealwithPonyPictures,S&Cproducesthefilm,andPonyacquiresthecompleterightstoClovenforaflat
productionfee.InaseconddealwithK2Classics,S&Cretainspartownership,andthefinancialoutcomesforS&C
dependonCloven'sperformanceattheboxoffice.
SethknowsfrompreviousanalysisthattheK2dealhasahigherEMV:$2.04millionvs.$1.48millionforthePony
deal.However,healsoknowstheK2dealisriskier.Beforemakingafinaldecision,Sethwantstounderstandthe
fullimplicationsofhistwooptions.Let'sbuildriskprofilesforeachdeal.
Therearesevenpossiblescenariosthatcanoccur.TheoutcomevaluesofthesescenariosdependonSeth'schoice
ofaproductionpartner,onwhetherornotsuperstaractorShawnConnellylendshisgravellybaritonetothefilm's
leadcharacter,andontheaudience'sreceptionofthefilm.
Let'ssummarizethepossiblescenarios.IfSethchoosesthePonyoption,twoscenarioscouldoccur.Inone,
ConnellyparticipatesinthemovieandSeth'sprofitsare$2.2million.Intheother,Connellydoesnotparticipate
131/135

5/13/2016

QuantitativeMethodsOnlineCourse

inthemovieandSeth'sprofitsareonly$1million.IfSethoptsforthePonydeal,theprobabilitiesofthesetwo
scenariosoccurringare40%and60%,respectively.
IfSethtakestheK2option,therearefivepossiblescenarios,butonlythreepossibleoutcomes:"Blockbuster"
success,a"Lackluster"performance,anda"Flop.
A"Blockbuster"successcanoccurwhetherornotConnellyparticipatesintheproduction.Theprobabilityofa
"Blockbuster"is38%:thisprobabilityiscalculatedbyaddingthejointprobabilitythatClovenstarsConnellyand
isa"Blockbuster"tothejointprobabilitythatClovenisa"Blockbuster"withoutConnelly'sparticipation.
A"Lackluster"performancecanalsooccurwhetherornotConnellyparticipatesintheproduction.Theprobability
ofa"Lackluster"performanceis50%:thisprobabilityiscalculatedbyaddingthejointprobabilitythatCloven
starsConnellyandhasa"Lackluster"performancetothejointprobabilitythatClovenhasa"Lackluster"
performancewithoutConnelly'sparticipation.
A"Flop"occursonlywhenConnellydoesn'ttakepartintheproductionofCloven.Thetotalprobabilityofa"Flop"
isthejointprobabilityofClovenbeinga"Flop"andConnellynottakingpart:12%.
Sethnowhascompleteriskprofilesforthetwooptions.TheriskprofilesgiveSethaquickoverviewofthepossible
outcomesandtheirrespectiveprobabilitiesforeachchoice.Notethatwe'vecombinedscenariossothatwe
nowhaveprobabilitiesforeachdistinctoutcomevalue.HowdoesamanagerlikeSethuseriskprofilestoinforma
decision?
RiskprofilesallowSethtocompareoptionsnotjustintermsoftheirEMVsbutintermsoftherisktheyexpose
himto.FromSeth'sriskprofilewecanseethat,althoughtheK2optionhasthehigherEMV,itisalsoassociated
withgreaterrisk:a$2millionlossispossible,whereasthePonyoptiondoesn'tincludeanypossibleloss.
WhichoptionSethshouldchooseultimatelydependsonhowcomfortableheiswiththeriskassociatedwiththe
K2option.ThePonyoptioniscertaintodeliveraprofit.TheK2optionmightgenerateasubstantialloss.
Ifa$2millionlossismorethanSethbelieveshiscompanycanorshouldsustain,hemaywanttochoosethe
PonyoptiondespiteitslowerEMV.IfSethchoosesthePonyoption,hewillbedemonstratingthatheisrisk
averse:heischoosinganoptionwithalowerEMVinreturnforlowerexposuretorisk.
Mostpeopleareatleastslightlyriskaverse,asshownbytheirinclinationstobuyinsurance.Somepeopletendto
beriskseeking,thatis,theyarewillingtoforgooptionswithhigherEMVsandaccepthigherlevelsofriskin
returnforthepossibilityofextremelyhighreturns.
Althoughriskseekingbehaviorisgenerallyrareinthepopulationwhensignificantdownsideriskisinvolved,
manypeopletendtobesomewhatriskseekingwhenthevalueofthelossriskedislow.Forexample,theEMVfor
playingthelotteryislowerthanfornotplayingthelottery.However,sincetheamountriskedbyplayingthe
lotteryissolowthepriceofalotteryticketmanypeoplechoosetopayitinreturnfortheunlikelyoutcome
thattheywillwinamultimilliondollarprize.
WhenweusetheEMVasthebasisfordecisionmaking,weareactinginariskneutralmanner.Mostpeopleare
riskneutralwhentheymakeroutine"dayin,dayout"typesofdecisionsthoseforwhichpotentiallossesarenot
terriblyharmfultothedecisionmakerorherorganization.
PeoplefeelcomfortableusingtheEMVforroutinedecisionsbecausetheyfeelcomfortable"playingtheaverages":
somedecisionswillresultinlossesandsomeingains,butoverthelongrun,theoutcomeswillaverageout.We
expectthat,overtime,choosingoptionswiththehighestEMVswillleadtohighesttotalvalue.
MostpeoplefeelcomfortableusingEMVasthebasisfordecisionsthatarenotmaderepeatedlyprovidedthe
decisionshaveoutcomevaluessimilartothoseofother,moreroutinedecisionsthattheymakeonaregularbasis.
WhywouldPonyPicturesagreetopurchasetherightstoClovenandtakeonalltheriskassociatedwithmarketing
anddistributingthefilm?ForPony,thedealwithSethisaroutinedecision.Ponyisalargeenoughcompanyto
sustainapotentialmultimilliondollarlossifClovenflops.Ponyalsodistributes10to12moviesayear,sooverall,
thepositiveEMVofafilmreleasepromisesthatinthelongrun,Pony'soperationswillbeprofitable.
SethknowsthattheK2dealisveryrisky:ifhechoosestheK2deal,hewilltiehiscompany'sfinancialsuccessto
boxofficeperformanceandtoShawnConnelly'swhims.IfCloven"Flops,"S&CFilmsmightwellgobankrupt.
ButSethlikesagamble...

Summary
132/135

5/13/2016

QuantitativeMethodsOnlineCourse

Riskprofilesallowustoassesstheutilitydifferentoutcomesbringus,asopposedtotheirmonetaryvalue.
Theconcisesummaryriskprofilesprovidedhelpsuscompareandcontrastourdifferentdecisionoptions,
allowingustochoosetheoptionwepreferbasedonourattitudetorisk:riskaverse,riskseeking,orrisk
neutral.

SolvingtheMarketResearchProblem(II)
"AlthoughLeo'smarketresearcheventdoesn'tpayoffintermsofitsexpectedvalue,"Aliceexplains,"itcouldhelp
himmanagehisrisk."
Let'sassembletheriskprofileforLeo'sdecision,includingthedecisionaboutwhetherornottorunhismarket
researchevent.Wehavetokeepinmindthatthecostofrunninghiseventis$240,000.
WhichofthefollowingtablescorrectlysummarizestheoutcomesandprobabilitiesifLeochoosestorunhismarket
researchevent?
IfLeochoosestorunhismarketresearchevent,thentheparticipants'responsecanbeeither"Enthusiastic"or
"Tepid."Ifit's"Tepid,"Leowillchoosetostayoutofthefloatingrestaurantbusiness,sothereisa60%chancethat
Leowillincuronlythecostoftheevent,$240,000.
Iftheresponsetotheeventis"Enthusiastic,"thenthereisa65%chancethatLeowillmakeaprofitof$1.76million
theoriginalprofitestimateof$2million,minustheevent'scostof$240,000.Thelikelihoodofa$1.76million
profitis(40%)(65%),or26%.
Iftheresponsetotheeventis"Enthusiastic,"thenthereisan35%chancethatLeowillsufferalossof$1.04million
theoriginallossestimateof$800,000,minustheevent'scostof$240,000.Thelikelihoodofa$1.04millionlossis
(40%)(35%),or14%.
YounextcompletetheriskprofileforLeo'sprofitsifhechoosesnottoruntheevent.Youpresentyourriskanalysis
toLeotoshowhimwhyhemightwanttorunhiseventevenifitsexpectedmonetaryvalueisnotoptimal.
Isee.SoifIrunmyevent,I'mmostlikelytoincurthecostoftheevent,andnothingelse.But,iftheresponseis
"Enthusiastic,"Ishouldgoaheadwithmyplan.Then,thereisasmallchancethatIwillincuralargelossifthe
restaurantbecomesamere"Fad,"andagoodchancethattheTethyswillbecomea"Phenomenon."
Ontheotherhand,ifdon'trunmyevent,althoughmypotentialprofitsareslightlyhigher,theriskistoogreat.Ijust
havetoomuchtoloseatthispoint.
Atthesametime,IthinkIcanaffordtopayforamarketresearcheventnow,withoutgoingtothebank.Thenifmy
guestsreactfavorably,ImightconsiderdivingintotheTethysadventure.
Iwantyoutwotoenjoyyourlastdayhereandhavedinnerwithmetonight.Anywherespecialyou'dliketogo?
Hmmm.IknowthiswonderfulplacethatservesthemostdelightfulmixofFrenchandHawaiiancuisines...
Aw,shucks.

Exercise1:TheFrivolousLawsuitofD.Pitt
DanforthPittwasinjuredinabisquespillingincidentinaHawaiianluxuryresortrestaurant.Danforth,who
insistshehasn'tbeenabletoremovethescentofcrabfromhisknees,ispursuinghislegaloptions.
Afterinitialinvestigationandconsultation,Pitt'slawyersexplaintoDanforththathecanexpecttowin$250,000
indamageswith5%probabilityor$100,000with25%probability,beforesubtractingtheattorney'sfees.Thefees
associatedwithpursuinglegalactionareestimatedat$50,000,whichPittmustpaywhetherhewinsorlosesthe
suit.
WhatistheEMVofpursuinglegalactionagainsttheresorthotel?

EntertheEMVin$thousandsasadecimalnumberwithtwodigitstotherightofthedecimalpoint(e.g.,enter
"$5,000"as"5.00").Roundifnecessary.
TheEMVofpursuinglegalactionis
133/135

5/13/2016

QuantitativeMethodsOnlineCourse

$12,500.
Toavoidalegalbattle,thehotel'sowneragreestosettleoutofcourt,offeringDanforthtwofullweeksinthehotel's
penthousesuitefreeofcharge.Includingamenities,thevalueofthisofferis$12,000.
IfDanforthchoosestorejecttheoffertosettleoutofcourt,andtosuethehotelinstead,heshouldbecharacterized
aswhichofthefollowing?
SincetheEMVofpursuinglegalactionislowerthantheEMVofsettlingoutofcourt,butthepossiblegainsoflegal
action,thoughrelativelyunlikely,aremuchhigherthantheEMVofsettling,Danforthwouldhavetobe
characterizedasriskseeking.

FinalAssessmentTestIIntroduction
WelcometothepostassessmenttestfortheHBSQuantitativeMethodsTutorial.
Allquestionsmustbeansweredforyourexamtobescored.
Navigation:
Toadvancefromonequestiontothenext,selectoneoftheanswerchoicesor,ifapplicable,completewithyourown
choiceandclicktheSubmitbutton.Aftersubmittingyouranswer,youwillnotbeabletochangeit,somakesureyou
aresatisfiedwithyourselectionbeforeyousubmiteachanswer.Youmayalsoskipaquestionbypressingtheforward
advancearrow.PleasenotethatyoucanreturntoskippedquestionsusingtheJumptounansweredquestion
selectionmenuorthenavigationalarrowsatanytime.Althoughyoucanskipaquestion,youmustnavigatebacktoitand
answeritallquestionsmustbeansweredfortheexamtobescored.
Inthebriefcase,linkstoExcelspreadsheetscontainingzvalueandtvaluetablesaswellasutilitiesforfindingconfidence
intervalsandconductinghypothesistestsareprovidedforyourconvenience.Forsomequestions,additionallinksto
Excelspreadsheetscontainingrelevantdatawillappearimmediatelybelowthequestiontext.
Yourresultswillbedisplayedimmediatelyuponcompletionoftheexam.
Aftercompletion,youcanreviewyouranswersatanytimebyreturningtotheexam.
Goodluck!

FrequentlyAskedQuestions
Howdifficultarethequestionsontheexam?Theexamquestionshavealevelofdifficultysimilartothe
exercisesinthecourse.
CanIrefertostatisticstextbooksandonlineresourcestohelpmeduringthetest?Yes.Thisisanopen
bookexamination.
MayIreceiveassistanceontheexam?No.AlthoughwestronglyencouragecollaborativelearningatHBS,workon
examssuchastheassessmenttestsmustbeentirelyyourown.Thusyoumayneithergivenorreceivehelponany
examquestion.
Isthisatimedexam?No.Youshouldtakeabout6090minutestocompletetheexam,dependingonyour
familiaritywiththematerial,butyoumaytakelongerifyouneedto.
WhathappensifIam(ormyinternetconnectionis)interruptedwhiletakingtheexam?Youranswer
choiceswillberecordedforthequestionsyouwereabletocompleteandyouwillbeabletopickupwhereyouleftoff
whenyoureturntotheexamsite.
HowdoIseemyexamresults?Yourresultswillbedisplayedassoonasyousubmityouranswertothefinal
question.Theresultsscreenwillindicatewhichquestionsyouansweredcorrectly.

FinalAssessmentTestIIIntroduction
WelcometothesecondpostassessmenttestfortheHBSQuantitativeMethodsTutorial.
Allquestionsmustbeansweredforyourexamtobescored.
Navigation:
134/135

5/13/2016

QuantitativeMethodsOnlineCourse

Toadvancefromonequestiontothenext,selectoneoftheanswerchoicesor,ifapplicable,completewithyourown
choiceandclicktheSubmitbutton.Aftersubmittingyouranswer,youwillnotbeabletochangeit,somakesureyou
aresatisfiedwithyourselectionbeforeyousubmiteachanswer.Youmayalsoskipaquestionbypressingtheforward
advancearrow.PleasenotethatyoucanreturntoskippedquestionsusingtheJumptounansweredquestion
selectionmenuorthenavigationalarrowsatanytime.Althoughyoucanskipaquestion,youmustnavigatebacktoitand
answeritallquestionsmustbeansweredfortheexamtobescored.
Inthebriefcase,linkstoExcelspreadsheetscontainingzvalueandtvaluetablesareprovidedforyourconvenience.For
somequestions,additionallinkstoExcelspreadsheetscontainingrelevantdatawillappearimmediatelybelowthe
questiontext.
IMPORTANT:Takethisexamonlyaftertakingthepreviousfinalassessmentexam.Thereareonlytwo
finalassessmentexams.Thus,ifyoudidnotreceiveapassingscoreonthefirst,itiscriticalthatyou
reviewthecoursematerialcarefullybeforetakingthisexam.
Yourresultswillbedisplayedimmediatelyuponcompletionoftheexam.
Aftercompletion,youcanreviewyouranswersatanytimebyreturningtotheexam.
Goodluck!

FrequentlyAskedQuestions
Howdifficultarethequestionsontheexam?Theexamquestionshavealevelofdifficultysimilartothe
exercisesinthecourse.
CanIrefertostatisticstextbooksandonlineresourcestohelpmeduringthetest?Yes.Thisisanopen
bookexamination.
MayIreceiveassistanceontheexam?No.AlthoughwestronglyencouragecollaborativelearningatHBS,workon
examssuchastheassessmenttestsmustbeentirelyyourown.Thusyoumayneithergivenorreceivehelponany
examquestion.
Isthisatimedexam?No.Youshouldtakeabout6090minutestocompletetheexam,dependingonyour
familiaritywiththematerial,butyoumaytakelongerifyouneedto.
WhathappensifIam(ormyinternetconnectionis)interruptedwhiletakingtheexam?Youranswer
choiceswillberecordedforthequestionsyouwereabletocompleteandyouwillbeabletopickupwhereyouleftoff
whenyoureturntotheexamsite.
HowdoIseemyexamresults?Yourresultswillbedisplayedassoonasyousubmityouranswertothefinal
question.Theresultsscreenwillindicatewhichquestionsyouansweredcorrectly.
CopyrightHarvardBusinessSchoolPublishing.Copyingorpostingisaninfringementofcopyright.
Permissions@hbsp.harvard.eduor6177837860.

135/135

You might also like