Textbook Summaries

Chapter 1
Descriptive statistics deals with methods of organizing, summarizing, and presenting data
in a convenient and informative way. One form of descriptive statistics uses graphical
techniques that allow statistics practitioners to present data in ways that make it easy for
the reader to extract useful information.
Another form of descriptive statistics uses numerical techniques to summarize data. One
such method that you have already used frequently calculates the average or mean.
Measure of central location, example: the average (the mean)
Measure of variability, example: the range
Inferential statistics is a body of methods used to draw conclusions or inferences about
characteristics of populations based on sample data.
Exit polls: a random sample of voters who exit the polling booth are asked for whom they
voted.
Statistical inference problems involve three key concepts: the population, the sample, and
the statistical inference:
o A population is the group of all items of interest to a statistics practitioner. A
descriptive measure of a population is called a parameter.
o A sample is a set of data drawn from the studied population. A descriptive
measure of a sample is called a statistic.
o Statistical inference is the process of making an estimate, prediction, or decision
about a population based on sample data.
It is far easier and cheaper to take a sample from the population of interest and draw
conclusions or make estimates about the population on the basis of information provided
by the sample. However, such conclusions and estimates are not always going to be
correct. For this reason, we build into the statistical inference a measure of reliability.
There are two such measures: the confidence level and the significance level.
o The confidence level is the proportion of times that an estimating procedure will
be correct.
o When the purpose of the statistical inference is to draw a conclusion about a
population, the significance level measures how frequently the conclusion will be
wrong.
To help students understand the basic foundation, we offer two approaches. First, we will
teach readers how to create Excel spreadsheets that allow for what-if analyses. By
changing some of the input values, students can see for themselves how statistics works.
(The term is derived from "What happens to the statistics if I change this value?")
Second, we offer applets, which are computer programs that perform similar what-if
analyses or simulations.
Chapter 2
A variable is some characteristic of a population or sample.
The values of the variable are the possible observations of the variable.
Data are the observed values of a variable. Data is the plural of datum.
Interval data are real numbers, such as heights, weights, incomes, and distances. We also
refer to this type of data as quantitative or numerical.
The values of nominal data are categories. For example, responses to questions about
marital status produce nominal data. The values of this variable are single, married,
divorced, and widowed. Notice that the values are not numbers but instead are words that
describe the categories. We often record nominal data by arbitrarily assigning a number
to each category. Nominal data are also called qualitative or categorical.
The third type of data is ordinal. Ordinal data appear to be nominal, but the difference is
that the order of their values has meaning.
The difference between nominal and ordinal types of data is that the order of the values
of the latter indicates a higher rating. Consequently, when assigning codes to the values,
we should maintain the order of the values. It's not the magnitude of the values that is
important, it's their order.
Students often have difficulty distinguishing between ordinal and interval data. The
critical difference between them is that the intervals or differences between values of
interval data are consistent and meaningful (which is why this type of data is called
interval). For example, the difference between marks of 85 and 80 is the same five-mark
difference that exists between 75 and 70; that is, we can calculate the difference and
interpret the results.
Because the codes representing ordinal data are arbitrarily assigned except for the order,
we cannot calculate and interpret differences. For example, using a 1-2-3-4-5 coding
system to represent poor, fair, good, very good, and excellent, we note that the difference
between excellent and very good is identical to the difference between good and fair.
With a 6-18-23-45-88 coding, the difference between excellent and very good is 43, and
the difference between good and fair is 5.
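The coding example above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary and function names are invented here, and the two codings are the ones quoted in the text.

```python
# Two codings of the same ordinal scale; only the order carries meaning.
ratings = ["poor", "fair", "good", "very good", "excellent"]
coding_a = dict(zip(ratings, [1, 2, 3, 4, 5]))
coding_b = dict(zip(ratings, [6, 18, 23, 45, 88]))


def diff(coding, high, low):
    """Difference between the codes assigned to two ratings."""
    return coding[high] - coding[low]


for label, coding in [("1-2-3-4-5", coding_a), ("6-18-23-45-88", coding_b)]:
    top_gap = diff(coding, "excellent", "very good")
    mid_gap = diff(coding, "good", "fair")
    print(label, "excellent - very good =", top_gap, "| good - fair =", mid_gap)

# Sorting by either coding gives the same ranking of the categories.
order_a = sorted(ratings, key=coding_a.get)
order_b = sorted(ratings, key=coding_b.get)
print(order_a == order_b)
```

The differences change with the coding while the ranking does not, which is why only rank-based calculations are meaningful for ordinal data.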
All calculations are permitted on interval data. We often describe a set of interval data by
calculating the average. For example, the average of the 10 marks listed on page 13 is
70.3.
Because the codes of nominal data are completely arbitrary, we cannot perform any
calculations on these codes. Calculations based on the codes used to store this type of
data are meaningless. All that we are permitted to do with nominal data is count or
compute the percentages of the occurrences of each category.
The most important aspect of ordinal data is the order of the values. As a result, the only
permissible calculations are those involving a ranking process. For example, we can place
all the data in order and select the code that lies in the middle. This descriptive
measurement is called the median.
Hierarchy of Data: The data types can be placed in order of the permissible calculations.
At the top of the list, we place the interval data type because virtually all computations
are allowed. The nominal data type is at the bottom because no calculations other than
determining frequencies are permitted. (We are permitted to perform calculations using
the frequencies of codes, but this differs from performing calculations on the codes
themselves.) In between interval and nominal data lies the ordinal data type. Permissible
calculations are ones that rank the data.
As discussed, the only allowable calculation on nominal data is to count the frequency or
compute the percentage that each value of the variable represents. We can summarize the
data in a table, which presents the categories and their counts, called a frequency
distribution. A relative frequency distribution lists the categories and the proportion
with which each occurs. We can use graphical techniques to present a picture of the data.
There are two graphical methods we can use: the bar chart and the pie chart.
To construct a frequency and relative frequency distribution for nominal data in Excel:
1) Copy the data into Excel.
2) Activate any empty cell and type =COUNTIF([Input range],[Criteria])
The input range is the set of cells containing the data.
The criteria are the codes you want to count.
Example: To count the number of 1s (Working full-time), type
=COUNTIF(P1:P2024,1).
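For readers working outside Excel, the same frequency and relative frequency distributions can be sketched in Python. The codes below are made-up sample data (assumed for illustration, not taken from the textbook's data file), and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical coded responses: 1 = working full-time, 2 and 3 = other statuses.
codes = [1, 2, 1, 3, 1, 2, 1, 1, 3, 2]

# Counter tallies every code in one pass, much like one COUNTIF per code.
freq = Counter(codes)
n = len(codes)
rel_freq = {code: count / n for code, count in freq.items()}

for code in sorted(freq):
    print(code, freq[code], f"{rel_freq[code]:.2f}")
```

Counting is the only calculation performed on the codes themselves, which keeps the sketch within what is permitted for nominal data.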
The information contained in the data is summarized well in the table. However,
graphical techniques generally catch a reader's eye more quickly than does a table of
numbers. Two graphical techniques can be used to display the results shown in the table.
A bar chart is often used to display frequencies; a pie chart graphically shows relative
frequencies.
The bar chart is created by drawing a rectangle representing each category. The height of
the rectangle represents the frequency. The base is arbitrary.
If we wish to emphasize the relative frequencies instead of drawing the bar chart, we
draw a pie chart. A pie chart is simply a circle subdivided into slices that represent the
categories. It is drawn so that the size of each slice is proportional to the percentage
corresponding to that category.
To construct a bar and pie chart for frequency and relative frequency distribution for
nominal data in Excel:
1) After creating the frequency distribution, highlight the column of frequencies.
2) For a bar chart, click Insert, Column, and the first 2-D Column.
3) Click Chart Tools (if it does not appear, click inside the box containing the bar
chart) and Layout. This will allow you to make changes to the chart. We removed
the Gridlines, the Legend, and clicked the Data Labels to create the titles.
4) For a pie chart, click Pie and Chart Tools to edit the graph.
Interpretation: The bar chart focuses on the frequencies and the pie chart focuses on the
proportions.
Pie and bar charts are used widely in newspapers, magazines, and business and
government reports. One reason for this appeal is that they are eye-catching and can
attract the reader's interest whereas a table of numbers might not. Pie and bar charts are
frequently used to simply present numbers associated with categories. The only reason to
use a bar or pie chart in such a situation would be to enhance the reader's ability to grasp
the substance of the data.
There are no specific graphical techniques for ordinal data. Consequently, when we wish
to describe a set of ordinal data, we will treat the data as if they were nominal and use the
techniques described in this section. The only criterion is that the bars in bar charts
should be arranged in ascending (or descending) ordinal values; in pie charts, the wedges
are typically arranged clockwise in ascending or descending order.
Techniques applied to single sets of data are called univariate. There are many situations
where we wish to depict the relationship between variables; in such cases, bivariate
methods are required. A cross-classification table (also called a cross-tabulation table) is
used to describe the relationship between two nominal variables.
To describe the relationship between two nominal variables, we must remember that we
are permitted only to determine the frequency of the values. As a first step, we need to
produce a cross-classification table that lists the frequency of each combination of the
values of the two variables.
Excel can produce the cross-classification table using several methods. We will use and
describe the PivotTable in two ways: (1) to create the cross-classification table featuring
the counts and (2) to produce a table showing the row relative frequencies.
1) Click Insert and PivotTable.
2) Make sure that the Table/Range is correct.
3) Drag the Occupation button to the ROW section of the box. Drag the Newspaper
button to the COLUMN section. Drag the Reader button to the DATA field.
Right-click any number in the table, click Summarize Data By, and check Count.
To convert to row percentages, right-click any number, click Summarize Data By,
More options, and Show values as. Scroll down and click % of rows. (We then
formatted the data into decimals. To improve both tables, we substituted the
names of the occupations and newspapers.)
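The two PivotTable outputs (counts and row relative frequencies) can also be sketched in plain Python. The observations below are invented for illustration; the occupation and newspaper labels are placeholders, not the textbook's data.

```python
from collections import Counter

# Hypothetical (occupation, newspaper) pairs, one pair per respondent.
observations = [
    ("blue collar", "Post"), ("blue collar", "Sun"), ("white collar", "Post"),
    ("white collar", "Post"), ("professional", "Star"), ("blue collar", "Sun"),
]

# Cross-classification table: frequency of each combination of values.
counts = Counter(observations)

# Row totals, then row relative frequencies (each row sums to 1).
row_totals = Counter(occupation for occupation, _ in observations)
row_pct = {
    (occupation, paper): count / row_totals[occupation]
    for (occupation, paper), count in counts.items()
}

for (occupation, paper), count in sorted(counts.items()):
    print(occupation, paper, count, round(row_pct[(occupation, paper)], 3))
```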
There are several ways to store the data to be used in this section to produce a table or a
bar or pie chart.
1) The data are in two columns. The first column represents the categories of the
first nominal variable, and the second column stores the categories for the
second variable. Each row represents one observation of the two variables.
The number of observations in each column must be the same. Excel and
Minitab can produce a cross-classification table from these data. (To use
Excel's PivotTable, there also must be a third variable representing the
observation number.)
2) The data are stored in two or more columns, with each column representing
the same variable in a different sample or population. For example, the
variable may be the type of undergraduate degree of applicants to an MBA
program, and there may be five universities we wish to compare. To produce a
cross-classification table, we would have to count the number of observations
of each category (undergraduate degree) in each column.
3) The table representing counts in a cross-classification table may have already
been created.
Chapter 3
The histogram not only is a powerful graphical technique used to summarize interval data
but also is used to help explain an important aspect of probability.
We create a frequency distribution for interval data by counting the number of
observations that fall into each of a series of intervals, called classes, that cover the
complete range of observations.
A histogram is created by drawing rectangles whose bases are the intervals and whose
heights are the frequencies.
How to create a histogram:
1) Type or import the data into one column. In another column, type the upper limits
of the class intervals. Excel calls them bins.
2) Click Data, Data Analysis, and Histogram.
3) Specify the Input Range (A1:A201) and the Bin Range (B1:B9). Click Chart
Output. Click Labels if the first row contains names.
4) To remove the gaps, place the cursor over one of the rectangles and click the right
button of the mouse. Click (with the left button) Format Data Series, move the
pointer to Gap Width, and use the slider to change the number from 150 to 0.
The number of class intervals we select depends entirely on the number of observations
in the data set. The more observations we have, the larger the number of class intervals
we need to use to draw a useful histogram.
Approximate Number of Classes in Histograms
An alternative to the guidelines listed in the table above is to use Sturges's formula,
which recommends that the number of class intervals be determined by the following:
1) Number of class intervals = 1 + 3.3 log(n), where log is the base-10 logarithm
For example, if n = 50, Sturges's formula becomes
2) Number of class intervals = 1 + 3.3 log(50) = 1 + 3.3(1.7) = 6.6, which we round
to 7.
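The arithmetic in Sturges's formula is easy to reproduce. The following is a minimal sketch; the function name is an assumption made for this example.

```python
import math


def sturges_classes(n):
    """Number of class intervals suggested by 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


k = sturges_classes(50)
print(round(k, 1))  # unrounded suggestion, to one decimal place
print(round(k))     # rounded to a whole number of classes
```

For n = 50 this reproduces the worked example: about 6.6, rounded to 7 classes.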
Class Interval Widths
1) We determine the approximate width of the classes by subtracting the smallest
observation from the largest and dividing the difference by the number of classes.
Thus, class width = (largest observation - smallest observation)/number of
classes.
We then define our class limits by selecting a lower limit for the first class from which all
other limits are determined. The only condition we apply is that the first class interval
must contain the smallest observation.
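The width calculation and the resulting class limits can be sketched as follows. The data set, the rounded width of 10, and the lower limit of 0 are all assumptions chosen for illustration; upper limits are treated as inclusive, as Excel does with its bins.

```python
from bisect import bisect_left


def approximate_width(data, num_classes):
    """(largest observation - smallest observation) / number of classes."""
    return (max(data) - min(data)) / num_classes


def upper_limits(lower, width, num_classes):
    """Upper class limits generated from a chosen lower limit and width."""
    return [lower + width * i for i in range(1, num_classes + 1)]


data = [3, 8, 9, 12, 15, 21, 27, 33, 38, 48]   # hypothetical observations
width = approximate_width(data, 5)              # 9.0, rounded up to 10 below
bins = upper_limits(lower=0, width=10, num_classes=5)

# Count observations per class; a value equal to an upper limit falls in that class.
freq = [0] * len(bins)
for x in data:
    freq[bisect_left(bins, x)] += 1

print(width, bins, freq)
```

The lower limit of 0 satisfies the one stated condition: the first class contains the smallest observation (3).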
The purpose of drawing histograms, like that of all other statistical techniques, is to
acquire information. Once we have the information, we frequently need to describe what
we've learned to others. We describe the shape of histograms on the basis of the
following characteristics.
1) A histogram is said to be symmetric if, when we draw a vertical line down the
center of the histogram, the two sides are identical in shape and size.
2) A skewed histogram is one with a long tail extending to either the right or the left.
The former is called positively skewed, and the latter is called negatively
skewed.
A mode is the observation that occurs with the greatest frequency. A modal class is the
class with the largest number of observations. A unimodal histogram is one with a
single peak.
A bimodal histogram is one with two peaks, not necessarily equal in height. Bimodal
histograms often indicate that two different distributions are present.
A special type of symmetric unimodal histogram is one that is bell shaped.
One of the drawbacks of the histogram is that we lose potentially useful information by
classifying the observations.
By classifying the observations we did acquire useful information. However, the histogram
focuses our attention on the frequency of each class and by doing so sacrifices whatever
information was contained in the actual observations. A statistician named John Tukey
introduced the stem-and-leaf display, which is a method that to some extent overcomes
this loss.
The first step in developing a stem-and-leaf display is to split each observation into two
parts, a stem and a leaf.
There are several different ways of doing this. For example, the number 12.3 can be split
so that the stem is 12 and the leaf is 3. In this definition the stem consists of the digits to
the left of the decimal and the leaf is the digit to the right of the decimal. Another method
can define the stem as 1 and the leaf as 2 (ignoring the 3). In this definition the stem is
the number of tens and the leaf is the number of ones. After each stem, we list that stem's
leaves, usually in ascending order.
The stem-and-leaf display is similar to a histogram turned on its side. The length of each
line represents the frequency in the class interval defined by the stems. The advantage of
the stem-and-leaf display over the histogram is that we can see the actual observations.
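A stem-and-leaf display using the second definition (stem = number of tens, leaf = number of ones) can be sketched in a few lines. The function name and the sample values are assumptions for this example, and non-negative whole numbers are assumed.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by tens (stem), listing ones digits (leaves) in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf([12, 15, 21, 27, 29, 33, 8])
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Each printed line plays the role of one histogram bar turned on its side, but the individual observations remain readable.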
The frequency distribution lists the number of observations that fall into each class
interval. We can also create a relative frequency distribution by dividing the
frequencies by the number of observations.
As you can see, the relative frequency distribution highlights the proportion of the
observations that fall into each class. In some situations, we may wish to highlight the
proportion of observations that lie below each of the class limits. In such cases, we create
the cumulative relative frequency distribution.
Another way of presenting this information is the ogive, which is a graphical
representation of the cumulative relative frequencies.
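Relative and cumulative relative frequencies follow directly from the class frequencies. The sketch below uses made-up class frequencies; the cumulative values are the heights an ogive would plot at each upper class limit.

```python
from itertools import accumulate

frequencies = [3, 2, 2, 2, 1]   # hypothetical class frequencies
n = sum(frequencies)

# Relative frequency: class frequency divided by the number of observations.
relative = [f / n for f in frequencies]

# Cumulative relative frequency: running total of frequencies divided by n.
cumulative = [running / n for running in accumulate(frequencies)]

print(relative)
print(cumulative)
```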
Besides classifying data by type, we can also classify them according to whether the
observations are measured at the same time or whether they represent measurements at
successive points in time. The former are called cross-sectional data, and the latter time-
series data.
Time-series data are often graphically depicted on a line chart, which is a plot of the
variable over time. It is created by plotting the value of the variable on the vertical axis
and the time periods on the horizontal axis.
Graphical excellence: a term we apply to techniques that are informative and concise
and that impart information clearly to their viewers.
Graphical excellence is achieved when the following characteristics apply.
1) The graph presents large data sets concisely and coherently. Graphical
techniques were created to summarize and describe large data sets. Small data sets
are easily summarized with a table. One or two numbers can best be presented in
a sentence.
2) The ideas and concepts the statistics practitioner wants to deliver are clearly
understood by the viewer. The chart is designed to describe what would
otherwise be described in words. An excellent chart is one that can replace a
thousand words and still be clearly comprehended by its readers.
3) The graph encourages the viewer to compare two or more variables. Graphs
displaying only one variable provide very little information. Graphs are often best
used to depict relationships between two or more variables or to explain how and
why the observed results occurred.
4) The display induces the viewer to address the substance of the data and not
the form of the graph. The form of the graph is supposed to help present the
substance. If the form replaces the substance, the chart is not performing its
function.
5) There is no distortion of what the data reveal. You cannot make statistical
techniques say whatever you like. A knowledgeable reader will easily see through
distortions and deception.
Graphical Deception
o The first thing to watch for is a graph without a scale on one axis.
o A second trap to avoid is being influenced by a graph's caption.
o Perspective is often distorted if only absolute changes in value, rather than
percentage changes, are reported.
o A graph can be made to appear more dramatic by stretching the axis.
Chapter 4
There are three different measures that we use to describe the center of a set of data. The
first is the best known, the arithmetic mean, which we'll refer to simply as the mean or
the average. The mean is computed by summing the observations and dividing by the
number of observations.
We label the observations in a sample x1, x2, ..., xn, where x1 is the first observation, x2
is the second, and so on until xn, where n is the sample size. The sample mean
is denoted x̄ (x-bar). In a population, the number of observations is labeled N and the
population mean is denoted by μ (Greek letter mu).
The median is calculated by placing all the observations in order (ascending or
descending). The observation that falls in the middle is the median. The sample and
population medians are computed in the same way.
When there is an even number of observations, the median is determined by averaging
the two observations in the middle.
The mode is defined as the observation (or observations) that occurs with the greatest
frequency. Both the statistic and parameter are computed in the same way.
For populations and large samples, it is preferable to report the modal class.
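The three measures of central location can be computed with Python's standard `statistics` module. The ten observations below are a small sample assumed for illustration.

```python
import statistics

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]   # illustrative sample, n = 10

mean = statistics.mean(data)         # sum of observations / number of observations
median = statistics.median(data)     # even n: average of the two middle observations
modes = statistics.multimode(data)   # every value tied for the greatest frequency

print(mean, median, modes)
```

Because n is even here, the median is the average of the fifth and sixth ordered observations (8 and 9).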
With three measures from which to choose, which one should we use? There are several
factors to consider when making our choice of measure of central location. The mean is
generally our first selection. However, there are several circumstances when the median
is better. The mode is seldom the best measure of central location. One advantage the
median holds is that it is not as sensitive to extreme values as is the mean.
When the data are interval, we can use any of the three measures of central location.
However, for ordinal and nominal data, the calculation of the mean is not valid. Because
the calculation of the median begins by placing the data in order, this statistic is
appropriate for ordinal data. The mode, which is determined by counting the frequency of
each observation, is appropriate for nominal data. However, nominal data do not have a
center, so we cannot interpret the mode of nominal data in that way. It is generally
pointless to compute the mode of nominal data.
Measures of variability:
o Range = Largest observation - Smallest observation
The advantage of the range is its simplicity. The disadvantage is also its
simplicity. Because the range is calculated from only two observations, it
tells us nothing about the other observations.
o The variance and its related measure, the standard deviation, are arguably the
most important statistics.
Examine the formula for the sample variance, s^2 = Σ(xi - x̄)^2/(n - 1). It may appear to
be illogical that in calculating s^2 we divide by n - 1 rather than by n. However, we do so
for the following reason. Population parameters in practical settings are seldom known.
One objective of statistical inference is to estimate the parameter from the statistic. For
example, we estimate the population mean μ from the sample mean x̄. Although it is not
obviously logical, the statistic created by dividing Σ(xi - x̄)^2 by n - 1 is a better
estimator than the one created by dividing by n.
To compute the sample variance s^2, we begin by calculating the sample mean x̄. Next we
compute the difference (also called the deviation) between each observation and the mean.
We square the deviations and sum. Finally, we divide the sum of squared deviations by
n - 1.
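The steps for the sample variance translate directly into code. This is a sketch with invented data; the function name is hypothetical, and the result is compared against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = [(x - mean) ** 2 for x in xs]
    return sum(squared_deviations) / (n - 1)


data = [4, 8, 6, 5, 7]           # hypothetical sample
s2 = sample_variance(data)       # sample variance
s = math.sqrt(s2)                # standard deviation: positive square root of s2

print(s2, round(s, 4), statistics.variance(data))
```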
The variance provides us with only a rough idea about the amount of variation in the
data. However, this statistic is useful when comparing two or more sets of data of the
same type of variable. If the variance of one data set is larger than that of a second data
set, we interpret that to mean that the observations in the first set display more variation
than the observations in the second set.
The standard deviation is simply the positive square root of the variance.
Knowing the mean and standard deviation allows the statistics practitioner to extract
useful bits of information. The information depends on the shape of the histogram. If the
histogram is bell shaped, we can use the Empirical Rule.
o Approximately 68% of all observations fall within one standard deviation of the
mean.
o Approximately 95% of all observations fall within two standard deviations of the
mean.
o Approximately 99.7% of all observations fall within three standard deviations of
the mean.
A more general interpretation, which applies to histograms of any shape, comes from
Chebysheff's Theorem: for k > 1, at least 1 - 1/k^2 of all observations lie within k
standard deviations of the mean.
When k = 2, Chebysheff's Theorem states that at least three-quarters (75%) of all
observations lie within two standard deviations of the mean. With k = 3, Chebysheff's
Theorem states that at least eight-ninths (88.9%) of all observations lie within three
standard deviations of the mean.
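The two Chebysheff figures quoted above come from the bound 1 - 1/k^2, which holds for k > 1. A short sketch (the function name is an assumption) makes the arithmetic explicit:

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(k, round(chebysheff_lower_bound(k), 3))
```

Unlike the Empirical Rule, these are guaranteed minimums rather than approximations, which is why they are lower than the 95% and 99.7% figures for bell-shaped histograms.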
Measures of relative standing are designed to provide information about the position of
particular values relative to the entire data set.
The Pth percentile is the value for which P percent are less than that value and (100 - P)%
are greater than that value.
o Example: your SAT score is reported to be at the 60th percentile. This means that
60% of all the other marks are below yours and 40% are above it. You now know
exactly where you stand relative to the population of SAT scores.
We have special names for the 25th, 50th, and 75th percentiles. Because these three
statistics divide the set of data into quarters, these measures of relative standing are also
called quartiles. The first or lower quartile is labeled Q1. It is equal to the 25th
percentile. The second quartile, Q2, is equal to the 50th percentile, which is also the
median. The third or upper quartile, Q3, is equal to the 75th percentile.
The following formula allows us to approximate the location of any percentile:
o L_P = (n + 1)(P/100)
where L_P is the location of the Pth percentile.
Example:
o Placing the 10 observations in ascending order we get 0 0 5 7 8 9 12 14 22 33
o The location of the 25th percentile is L_25 = (10 + 1)(25/100) = (11)(.25) = 2.75
o The 25th percentile is three-quarters of the distance between the second (which is
0) and the third (which is 5) observations. Three-quarters of the distance is (.75)
(5 - 0) = 3.75.
o Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75.
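The location formula and the interpolation step in the example can be sketched as a small function. The function name and the handling of locations that fall outside the data (clamping to the smallest or largest observation) are assumptions of this sketch.

```python
def percentile(sorted_data, p):
    """Approximate the P-th percentile using the location L_P = (n + 1) * P / 100."""
    n = len(sorted_data)
    location = (n + 1) * p / 100
    lower_position = int(location)            # 1-based position of the lower neighbour
    fraction = location - lower_position      # how far to move toward the next value
    if lower_position < 1:
        return sorted_data[0]
    if lower_position >= n:
        return sorted_data[-1]
    lower = sorted_data[lower_position - 1]
    upper = sorted_data[lower_position]
    return lower + fraction * (upper - lower)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
q1 = percentile(data, 25)
q2 = percentile(data, 50)
q3 = percentile(data, 75)
print(q1, q2, q3)
```

With the ten observations from the example, the 25th percentile comes out as 3.75, matching the hand calculation. Spreadsheet and library percentile functions may use different interpolation conventions and give slightly different values.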
We can often get an idea of the shape of the histogram from the quartiles. For example, if
the first and second quartiles are closer to each other than are the second and third
quartiles, then the histogram is positively skewed. If the first and second quartiles are
farther apart than the second and third quartiles, then the histogram is negatively skewed.
If the difference between the first and second quartiles is approximately equal to the
difference between the second and third quartiles, then the histogram is approximately
symmetric.
The quartiles can be used to create another measure of variability, the interquartile
range: Interquartile range = Q3 - Q1
The interquartile range measures the spread of the middle 50% of the observations. Large
values of this statistic mean that the first and third quartiles are far apart, indicating a high
level of variability.
A box plot graphs five statistics: the minimum and maximum observations, and the first,
second, and third quartiles. It also depicts other features of a set of data.
The three vertical lines of the box are the first, second, and third quartiles. The lines
extending to the left and right are called whiskers. Any points that lie outside the
whiskers are called outliers. The whiskers extend outward to the smaller of 1.5 times the
interquartile range or to the most extreme point that is not an outlier.
Outliers are unusually large or small observations. Because an outlier is considerably
removed from the main body of the data set, its validity is suspect. Consequently, outliers
should be checked to determine that they are not the result of an error in recording their
values. Outliers can also represent unusual observations that should be investigated.
INSTRUCTIONS
o Type or import the data into one column or two or more adjacent columns. (Open
Xm0301.)
o Click Add-Ins, Data Analysis Plus, and Box Plot.
o Specify the Input Range (A1:A201).
o A box plot will be created for each column of data that you have specified or
highlighted.
Factors That Identify When to Compute Percentiles and Quartiles
1. Objective: Describe a single set of data
2. Type of data: Interval or ordinal
3. Descriptive measurement: Relative standing
Factors That Identify When to Compute the Interquartile Range
1. Objective: Describe a single set of data
2. Type of data: Interval or ordinal
3. Descriptive measurement: Variability
Chapter 3 (cont.)
Statistics practitioners frequently need to know how two interval variables are related.
The technique is called a scatter diagram.
To draw a scatter diagram, we need data for two variables. In applications where one
variable depends to some degree on the other variable, we label the dependent variable Y
and the other, called the independent variable, X. In other cases where no dependency is
evident, we label the variables arbitrarily.
To determine the strength of the linear relationship, we draw a straight line through the
points in such a way that the line represents the relationship. If most of the points fall
close to the line, we say that there is a linear relationship. If most of the points appear to
be scattered randomly with only a semblance of a straight line, there is no, or at best, a
weak linear relationship.
In general, if one variable increases when the other does, we say that there is a positive
linear relationship. When the two variables tend to move in opposite directions, we
describe the nature of their association as a negative linear relationship.
In interpreting the results of a scatter diagram it is important to understand that if two
variables are linearly related it does not mean that one is causing the other. In fact, we
can never conclude from such a relationship alone that one variable causes another
variable. We can express this more eloquently as "Correlation is not causation."
Chapter 4 (cont.)
In Chapter 3, we introduced the scatter diagram, a graphical technique that describes the
relationship between two interval variables. At that time, we pointed out that we were
particularly interested in the direction and strength of the linear relationship. We now
present three numerical measures of linear relationship that provide this information:
covariance, coefficient of correlation, and coefficient of determination.
Chapter 6
A random experiment is an action or process that leads to one of several possible
outcomes.
The first step in assigning probabilities is to produce a list of the outcomes. The listed
outcomes must be exhaustive, which means that all possible outcomes must be included.
In addition, the outcomes must be mutually exclusive, which means that no two
outcomes can occur at the same time.
A sample space of a random experiment is a list of all possible outcomes of the
experiment. The outcomes must be exhaustive and mutually exclusive.
There are three approaches to assigning probabilities:
o The classical approach is used by mathematicians to help determine probability
associated with games of chance. For example, the classical approach specifies
that the probabilities of heads and tails in the flip of a balanced coin are equal to
each other. Because the sum of the probabilities must be 1, the probability of
heads and the probability of tails are both 50%. Similarly, the six possible
outcomes of the toss of a balanced die have the same probability; each is assigned
a probability of 1/6.
o The relative frequency approach defines probability as the long-run relative
frequency with which an outcome occurs. For example, suppose that we know
that of the last 1,000 students who took the statistics course you're now taking,
200 received a grade of A. The relative frequency of A's is then 200/1000 or 20%.
This figure represents an estimate of the probability of obtaining a grade of A in
the course. It is only an estimate because the relative frequency approach defines
probability as the "long-run" relative frequency. One thousand students do not
constitute the long run. The larger the number of students whose grades we have
observed, the better the estimate becomes. In theory, we would have to observe an
infinite number of grades to determine the exact probability.
o When it is not reasonable to use the classical approach and there is no history of
the outcomes, we have no alternative but to employ the subjective approach. In
the subjective approach, we define probability as the degree of belief that we hold
in the occurrence of an event.
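The idea of a long-run relative frequency can be illustrated with a simulation. This is a hedged sketch, not part of the textbook: it simulates tosses of a balanced die with Python's pseudo-random generator (a fixed seed is assumed so that runs are repeatable) and reports the relative frequency of a six, which the classical approach sets at 1/6.

```python
import random


def estimate_probability(trials, seed=0):
    """Relative frequency of rolling a six in `trials` simulated tosses."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


for trials in (100, 10_000, 100_000):
    print(trials, round(estimate_probability(trials), 4))
```

As the number of tosses grows, the relative frequency tends to settle near 1/6, although any finite run gives only an estimate.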
An event is a collection or set of one or more simple events in a sample space.
The probability of an event is the sum of the probabilities of the simple events that
constitute the event.
No matter what method was used to assign probability, we interpret it using the relative
frequency approach for an infinite number of experiments.
The intersection of events A and B is the event that occurs when both A and B occur. It
is denoted as A and B. The probability of the intersection is called the joint probability.
For example, one way to toss a 3 with two dice is to toss a 1 on the first die and a 2 on
the second die, which is the intersection of two simple events.
Marginal probabilities, computed by adding across rows or down columns, are so
named because they are calculated in the margins of the table of joint probabilities.
We frequently need to know how two events are related. In particular, we would
like to know the probability of one event given the occurrence of another related
event. This probability is called a conditional probability. The conditional probability
that we seek is represented by P(B1 | A1), where the | represents the word "given".
One of the objectives of calculating conditional probability is to determine whether two
events are related. In particular, we would like to know whether they are independent
events.
Put another way, two events are independent if the probability of one event is not affected
by the occurrence of the other event.
Another event that is the combination of other events is the union. The union of events A
and B is the event that occurs when either A or B or both occur.
The complement of event A is the event that occurs when event A does not occur. The
complement of event A is denoted by A^C. The complement rule, P(A^C) = 1 - P(A), derives
from the fact that the probability of an event and the probability of the event's
complement must sum to 1.
The multiplication rule is used to calculate the joint probability of two events. It is based
on the formula for conditional probability, P(A | B) = P(A and B)/P(B). We derive the
multiplication rule simply by multiplying both sides by P(B), which gives
P(A and B) = P(A | B)P(B).
If A and B are independent events, P(A | B) = P(A) and P(B | A) = P(B). It follows that the
joint probability of two independent events is simply the product of the probabilities of
the two events, P(A and B) = P(A)P(B). We can express this as a special form of the
multiplication rule.
The addition rule enables us to calculate the probability of the union of two events:
P(A or B) = P(A) + P(B) - P(A and B).
As was the case with the multiplication rule, there is a special form of the addition rule.
When two events are mutually exclusive (which means that the two events cannot occur
together), their joint probability is 0, so P(A or B) = P(A) + P(B).
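The probability rules in this chapter can be exercised on a small table of joint probabilities. Everything in the sketch below is hypothetical: the event labels, the four joint probabilities, and the helper name `marginal` are assumptions made for illustration.

```python
import math

# Hypothetical joint probabilities P(A_i and B_j); they sum to 1.
joint = {
    ("A1", "B1"): 0.10, ("A1", "B2"): 0.30,
    ("A2", "B1"): 0.20, ("A2", "B2"): 0.40,
}


def marginal(table, event, position):
    """Add joint probabilities across a row (position 0) or down a column (position 1)."""
    return sum(p for key, p in table.items() if key[position] == event)


p_a1 = marginal(joint, "A1", 0)
p_b1 = marginal(joint, "B1", 1)
p_a1_and_b1 = joint[("A1", "B1")]

p_a1_given_b1 = p_a1_and_b1 / p_b1        # conditional probability
p_a1_or_b1 = p_a1 + p_b1 - p_a1_and_b1    # addition rule
p_not_a1 = 1 - p_a1                       # complement rule

# Independence check: does knowing B1 change the probability of A1?
independent = math.isclose(p_a1_given_b1, p_a1)

print(round(p_a1, 2), round(p_b1, 2), round(p_a1_given_b1, 4),
      round(p_a1_or_b1, 2), round(p_not_a1, 2), independent)
```

In this made-up table the conditional probability of A1 given B1 differs from the marginal probability of A1, so the two events are not independent.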
An effective and simpler method of applying the probability rules is the probability tree,
wherein the events in an experiment are represented by lines. The resulting figure
resembles a tree, hence the name.
The advantage of a probability tree is that it restrains its users from making the wrong
calculation. Once the tree is drawn and the probabilities of the branches inserted, virtually
the only allowable calculation is the multiplication of the probabilities of linked branches.
An easy check on those calculations is available. The joint probabilities at the ends of the
branches must sum to 1 because all possible events are listed.
Conditional probability is often used to gauge the relationship between two events. There
are situations, however, where we witness a particular event and we need to compute the
probability of one of its possible causes. Bayes's Law is the technique we use.
The probabilities P(A) and P(A^C) are called prior probabilities because they are
determined prior to the decision about taking the preparatory course. The conditional
probabilities are called likelihood probabilities for reasons that are beyond the
mathematics in this book. Finally, the conditional probability P(A | B) and similar
conditional probabilities P(A^C | B), P(A | B^C), and P(A^C | B^C) are called posterior
probabilities or revised probabilities because the prior probabilities are revised after the
decision about taking the preparatory course.
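Bayes's Law amounts to multiplying along the branches of a probability tree and then dividing one joint probability by the total for the observed event. The sketch below uses invented numbers, not the textbook's preparatory-course figures, and the function name is hypothetical.

```python
import math


def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A | B) from the prior P(A) and the likelihoods P(B | A) and P(B | A^C)."""
    joint_a = prior_a * likelihood_b_given_a                 # P(A and B)
    joint_not_a = (1 - prior_a) * likelihood_b_given_not_a   # P(A^C and B)
    return joint_a / (joint_a + joint_not_a)                 # P(A and B) / P(B)


# Hypothetical inputs: P(A) = 0.2, P(B | A) = 0.9, P(B | A^C) = 0.3.
p_a_given_b = posterior(0.2, 0.9, 0.3)
print(round(p_a_given_b, 4))
```

Here the prior probability of 0.2 is revised upward once B is observed, because B is three times as likely under A as under its complement.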
Chapter 7
A random variable is a function or rule that assigns a number to each outcome of an
experiment.
There are two types of random variables, discrete and continuous. A discrete random
variable is one that can take on a countable number of values.
A continuous random variable is one whose values are uncountable. An excellent
example of a continuous random variable is the amount of time to complete a task.
A probability distribution is a table, formula, or graph that describes the values of a
random variable and the probability associated with these values.
An uppercase letter will represent the name of the random variable, usually X. Its
lowercase counterpart will represent the value of the random variable. Thus, we represent
the probability that the random variable X will equal x as P(X = x) or more simply P(x).
The population mean is the weighted average of all of its values. The weights are the
probabilities. This parameter is also called the expected value of X and is represented by
E(X): E(X) = μ = Σ x P(x).
The population variance is calculated similarly. It is the weighted average of the squared
deviations from the mean: V(X) = σ^2 = Σ (x - μ)^2 P(x).
There is a shortcut calculation that simplifies the calculations for the population variance:
σ^2 = Σ x^2 P(x) - μ^2. This formula is not an approximation; it will yield the same value
as the formula above.
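The weighted-average definitions and the shortcut can be compared numerically. The distribution below (number of heads in two tosses of a balanced coin) is an example chosen for this sketch, and the shortcut is written as the sum of x^2 P(x) minus the squared mean.

```python
# Hypothetical distribution: number of heads in two tosses of a balanced coin.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# Expected value: weighted average of the values, weights are the probabilities.
mean = sum(x * p for x, p in distribution.items())

# Variance: weighted average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Shortcut: sum of x^2 * P(x), minus the squared mean.
shortcut = sum(x**2 * p for x, p in distribution.items()) - mean**2

print(mean, variance, shortcut)
```

Both variance calculations give the same value, as the text states.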
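As you will discover, we often create new variables that are functions of other random
variables. The laws of expected value and variance listed below allow us to quickly
determine the expected value and variance of these new variables. In the notation used
here, X is the random variable and c is a constant.
Laws of Expected Value
1) E(c) = c
2) E(X + c) = E(X) + c
3) E(cX) = cE(X)
Laws of Variance
1) V(c) = 0
2) V(X + c) = V(X)
3) V(cX) = c^2 V(X)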
