6/7/2017 https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm#sample_variance

Glossary of Statistical Terms
You can use the "find" (find in frame, find in page) feature of your browser to search the glossary.

0-9

0-1 box
A box of numbered tickets, in which each ticket is numbered either 0 or 1. See BOX MODEL.
Affine transformation.
See TRANSFORMATION.
Affirming the antecedent.
A valid logical argument that concludes from the PREMISE A → B and the premise A that therefore, B is true. The name comes from the fact that the argument affirms (i.e., asserts as true) the ANTECEDENT (A) in the CONDITIONAL.
Affirming the consequent.
A logical fallacy that argues from the PREMISE A → B and the premise B that therefore, A is true. The name comes from the fact that the argument affirms (i.e., asserts as true) the CONSEQUENT (B) in the CONDITIONAL.
Alternative Hypothesis.
In HYPOTHESIS TESTING, a NULL HYPOTHESIS (typically that there is no effect) is compared with an alternative hypothesis (typically that there is an effect, or that there is an effect of a particular sign). For example, in evaluating whether a new cancer remedy works, the null hypothesis typically would be that the remedy does not work, while the alternative hypothesis would be that the remedy does work. When the data are sufficiently improbable under the assumption that the null hypothesis is true, the null hypothesis is rejected in favor of the alternative hypothesis. (This does not imply that the data are probable under the assumption that the alternative hypothesis is true, nor that the null hypothesis is false, nor that the alternative hypothesis is true. Confused? Take a course in Statistics!)
and, &, conjunction, logical conjunction, ∧.
An operation on two logical PROPOSITIONS. If p and q are two PROPOSITIONS, (p & q) is a proposition that is true if both p and q are true; otherwise, (p & q) is false. The operation & is sometimes represented by the symbol ∧.
Ante.
The up-front cost of a bet: the money you must pay to play the game. From Latin for "before."
Antecedent.
In a CONDITIONAL p → q, the antecedent is p.
Appeal to Ignorance.
A logical fallacy: taking the absence of evidence to be evidence of absence. If something is not known to be false, assume that it is true; or if something is not known to be true, assume that it is false. For example, if I have no reason to think that anyone in Tajikistan wishes me well, that is not evidence that nobody in Tajikistan wishes me well.
Applet.
An applet is a small program that is automatically downloaded from a website to your computer when you visit a particular web page; it allows a page to be interactive, that is, to respond to your input. The applet runs on your computer, not the computer that hosted the web page. These materials contain many applets to illustrate statistical concepts and to help you to analyze data. Many of them are accessible directly from the tools page.
Association.
Two VARIABLES are associated if some of the variability of one can be accounted for by the other. In a SCATTERPLOT of the two variables, if the scatter in the values of the variable plotted on the vertical axis is smaller in narrow ranges of the variable plotted on the horizontal axis (i.e., in vertical "slices") than it is overall, the two variables are associated. The CORRELATION COEFFICIENT is a measure of linear association, which is a special case of association in which large values of one variable tend to occur with large values of the other, and small values of one tend to occur with small values of the other (positive association), or in which large values of one tend to occur with small values of the other, and vice versa (negative association).
Average.
A sometimes vague term. It usually denotes the ARITHMETIC MEAN, but it can also denote the MEDIAN, the MODE, the GEOMETRIC MEAN, and weighted means, among other things.
Axioms of Probability.
There are three axioms of probability: (1) Chances are always at least zero. (2) The chance that something happens is 100%. (3) If two events cannot both occur at the same time (if they are DISJOINT or MUTUALLY EXCLUSIVE), the chance that either one occurs is the sum of the chances that each occurs. For example, consider an experiment that consists of tossing a coin once. The first axiom says that the chance that the coin lands heads, for instance, must be at least zero. The second axiom says that the chance that the coin either lands heads or lands tails or lands on its edge or doesn't land at all is 100%. The third axiom says that the chance that the coin either lands heads or lands tails is the sum of the chance that the coin lands heads and the chance that the coin lands tails, because both cannot occur in the same coin toss. All other mathematical facts about probability can be derived from these three axioms. For example, it is true that the chance that an event does not occur is (100% − the chance that the event occurs). This is a consequence of the second and third axioms.

Base rate fallacy.
The base rate fallacy consists of failing to take into account prior probabilities (base rates) when computing CONDITIONAL PROBABILITIES from other conditional probabilities. It is related to the PROSECUTOR'S FALLACY. For instance, suppose that a test for the presence of some condition has a 1% chance of a false positive result (the test says the condition is present when it is not) and a 1% chance of a false negative result (the test says the condition is absent when the condition is present), so the test is 99% accurate. What is the chance that an item that tests positive really has the condition? The intuitive answer is 99%, but that is not necessarily true: the correct answer depends on the fraction f of items in the population that have the condition (and on whether the item tested is selected at random from the population). The chance that a randomly selected item that tests positive has the condition is 0.99f/(0.99f + 0.01(1 − f)), which could be much smaller than 99% if f is small. See BAYES' RULE.
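The dependence on the base rate f can be seen numerically. The following is a minimal sketch, assuming the 1% false positive and 1% false negative rates from the example above; the function name and the particular values of f are illustrative choices, not part of the glossary.

```python
def chance_condition_given_positive(f, false_pos=0.01, false_neg=0.01):
    """Chance that a randomly selected item that tests positive has the condition.

    f is the fraction of items in the population that have the condition.
    The error rates default to the 1% values used in the glossary example.
    """
    true_pos = 1.0 - false_neg  # chance the test is positive when the condition is present
    return true_pos * f / (true_pos * f + false_pos * (1.0 - f))


# Illustrative base rates (assumed values, chosen only to show the trend).
for f in (0.5, 0.1, 0.01, 0.001):
    print(f"base rate {f}: {chance_condition_given_positive(f):.3f}")
```

When f equals the false positive rate (1% here), only half of the positive results come from items that have the condition, even though the test is "99% accurate."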
Bayes' Rule.
Bayes' rule expresses the CONDITIONAL PROBABILITY of the EVENT A given the EVENT B in terms of the CONDITIONAL PROBABILITY of the EVENT B given the EVENT A and the unconditional probability of A:

$P(A|B) = P(B|A)P(A) / \left( P(B|A)P(A) + P(B|A^c)P(A^c) \right).$

In this expression, the unconditional probability of A is also called the PRIOR PROBABILITY of A, because it is the probability assigned to A prior to observing any data. Similarly, in this context, P(A|B) is called the POSTERIOR PROBABILITY of A given B, because it is the probability of A updated to reflect (i.e., to condition on) the fact that B was observed to occur.
Bernoulli's Inequality.
The Bernoulli Inequality says that if $x \ge -1$ then $(1+x)^n \ge 1 + nx$ for every integer $n \ge 0$. If n is even, the inequality holds for all x.
Bias.
A measurement procedure or ESTIMATOR is said to be biased if, on the average, it gives an answer that differs from the truth. The bias is the average (EXPECTED) difference between the measurement and the truth. For example, if you get on the scale with clothes on, that biases the measurement to be larger than your true weight (this would be a positive bias). The design of an experiment or of a survey can also lead to bias. Bias can be deliberate, but it is not necessarily so. See also NONRESPONSE BIAS.
Bimodal.
Having two MODES.
Bin.
See CLASS INTERVAL.
Binomial Coefficient.
See COMBINATIONS.
Binomial Distribution.
A random variable has a binomial distribution (with parameters n and p) if it is the number of "successes" in a fixed number n of INDEPENDENT random trials, all of which have the same probability p of resulting in "success." Under these assumptions, the probability of k successes (and n − k failures) is $_nC_k\, p^k (1-p)^{n-k}$, where $_nC_k$ is the number of COMBINATIONS of n objects taken k at a time: $_nC_k = n!/(k!(n-k)!)$. The EXPECTED VALUE of a RANDOM VARIABLE with the Binomial distribution is $np$, and the standard error of a random variable with the Binomial distribution is $\sqrt{np(1-p)}$. This page shows the PROBABILITY HISTOGRAM of the binomial distribution.
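As a check on these formulas, the probabilities can be tabulated directly and the expected value and standard error computed from the table. This is a sketch only; the helper name `binomial_pmf` and the parameters n = 25, p = 0.4 are assumptions chosen for illustration.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 25, 0.4  # illustrative parameters (assumed)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected value and standard error computed from the probability table itself.
expected_value = sum(k * pk for k, pk in enumerate(pmf))
standard_error = sqrt(sum((k - expected_value) ** 2 * pk for k, pk in enumerate(pmf)))

print(expected_value, n * p)                      # the two values should agree
print(standard_error, sqrt(n * p * (1 - p)))      # likewise
```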
Binomial Theorem.
The Binomial theorem says that $(x+y)^n = x^n + n x^{n-1} y + \cdots + {}_nC_k\, x^{n-k} y^k + \cdots + y^n$.
Bivariate.
Having or having to do with two VARIABLES. For example, bivariate data are data where we have two measurements of each "individual." These measurements might be the heights and weights of a group of people (an "individual" is a person), the heights of fathers and sons (an "individual" is a father-son pair), the pressure and temperature of a fixed volume of gas (an "individual" is the volume of gas under a certain set of experimental conditions), etc. SCATTERPLOTS, the CORRELATION COEFFICIENT, and REGRESSION make sense for bivariate data but not UNIVARIATE data. C.f. UNIVARIATE.
Blind, Blind Experiment.
In a blind experiment, the SUBJECTS do not know whether they are in the TREATMENT GROUP or the CONTROL GROUP. In order to have a blind experiment with human subjects, it is usually necessary to administer a PLACEBO to the control group.

Bootstrap estimate of STANDARD ERROR.
The name for this idea comes from the idiom "to pull oneself up by one's bootstraps," which connotes getting out of a hole without anything to stand on. The idea of the bootstrap is to assume, for the purposes of estimating uncertainties, that the sample is the population, then use the SE for sampling from the sample to estimate the SE of sampling from the population. For sampling from a box of numbers, the SD of the sample is the bootstrap estimate of the SD of the box from which the sample is drawn. For SAMPLE PERCENTAGES, this takes a particularly simple form: the SE of the SAMPLE PERCENTAGE of n draws from a box, with replacement, is $SD(\text{box})/\sqrt{n}$, where for a box that contains only zeros and ones, $SD(\text{box}) = \sqrt{(\text{fraction of ones in box}) \times (\text{fraction of zeros in box})}$. The bootstrap estimate of the SE of the SAMPLE PERCENTAGE consists of estimating SD(box) by $\sqrt{(\text{fraction of ones in sample}) \times (\text{fraction of zeros in sample})}$. When the sample size is large, this approximation is likely to be good.
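The recipe for sample percentages can be sketched in a few lines. The function name and the sample below (40 ones and 60 zeros) are hypothetical, introduced only to illustrate the calculation for draws with replacement from a 0-1 box.

```python
from math import sqrt


def bootstrap_se_sample_percentage(sample):
    """Bootstrap estimate of the SE of the sample percentage for a 0-1 sample.

    SD(box) is estimated by sqrt(fraction of ones * fraction of zeros) in the
    sample, then divided by sqrt(n), as for draws with replacement.
    """
    n = len(sample)
    fraction_ones = sum(sample) / n
    sd_estimate = sqrt(fraction_ones * (1 - fraction_ones))
    return sd_estimate / sqrt(n)


sample = [1] * 40 + [0] * 60  # hypothetical sample of n = 100 draws
se = bootstrap_se_sample_percentage(sample)
print(f"sample percentage: {100 * sum(sample) / len(sample):.0f}%")
print(f"bootstrap SE: {100 * se:.1f} percentage points")
```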
Box model.
An analogy between an experiment and drawing numbered tickets "at random" from a box with replacement. For example, suppose we are trying to evaluate a cold remedy by giving it or a placebo to a group of n individuals, randomly choosing half the individuals to receive the remedy and half to receive the placebo. Consider the median time to recovery for all the individuals (we assume everyone recovers from the cold eventually; to simplify things, we also assume that no one recovered in exactly the median time, and that n is even). By definition, half the individuals got better in less than the median time, and half in more than the median time. The individuals who received the treatment are a RANDOM SAMPLE of SIZE n/2 from the set of n subjects, half of whom got better in less than median time, and half in longer than median time. If the remedy is ineffective, the number of subjects who received the remedy and who recovered in less than median time is like the sum of n/2 draws with replacement from a box with two tickets in it: one with a "1" on it, and one with a "0" on it. This page illustrates the sampling distribution of random draws with or without replacement from a box of numbered tickets.
Breakdown Point.
The breakdown point of an ESTIMATOR is the smallest fraction of observations one must corrupt to make the estimator take any value one wants.

Categorical Variable.
A VARIABLE whose value ranges over categories, such as {red, green, blue}, {male, female}, {Arizona, California, Montana, New York}, {short, tall}, {Asian, African-American, Caucasian, Hispanic, Native American, Polynesian}, {straight, curly}, etc. Some categorical variables are ORDINAL. The distinction between categorical variables and QUALITATIVE VARIABLES is a bit blurry. C.f. QUANTITATIVE VARIABLE.
Causation, causal relation.
Two variables are causally related if changes in the value of one cause the other to change. For example, if one heats a rigid container filled with a gas, that causes the pressure of the gas in the container to increase.
Two variables can be ASSOCIATED without having any causal relation, and even if two variables have a causal relation, their CORRELATION can be small or zero.
Central Limit Theorem.
The central limit theorem states that the PROBABILITY HISTOGRAMS of the SAMPLE MEAN and SAMPLE SUM of n draws with replacement from a box of labeled tickets converge to a NORMAL CURVE as the SAMPLE SIZE n grows, in the following sense: As n grows, the area of the probability histogram for any range of values approaches the area under the NORMAL CURVE for the same range of values, converted to STANDARD UNITS. See also THE NORMAL APPROXIMATION.
Certain Event.
An EVENT is certain if its PROBABILITY is 100%. Even if an event is certain, it might not occur. However, by the COMPLEMENT RULE, the chance that it does not occur is 0%.
Chance variation, chance error.
A RANDOM VARIABLE can be decomposed into a sum of its EXPECTED VALUE and chance variation around its expected value. The expected value of the chance variation is zero; the STANDARD ERROR of the chance variation is the same as the STANDARD ERROR of the random variable, that is, the size of a "typical" difference between the RANDOM VARIABLE and its EXPECTED VALUE. See also SAMPLING ERROR.
Change of Units or Variables.
See also TRANSFORMATION.
Chebychev's Inequality.
For LISTS: For every number k > 0, the fraction of elements in a list that are k SDs or further from the ARITHMETIC MEAN of the list is at most $1/k^2$.
For RANDOM VARIABLES: For every number k > 0, the probability that a RANDOM VARIABLE X is k SEs or further from its EXPECTED VALUE is at most $1/k^2$.
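The version of the inequality for lists can be checked on any list of numbers. The sketch below uses a made-up list with one extreme value; the function name and the data are assumptions for illustration, and the list is assumed not to be constant (so its SD is positive).

```python
from math import sqrt


def fraction_k_sds_from_mean(values, k):
    """Fraction of list elements that are k SDs or further from the arithmetic mean.

    Assumes the list is not constant, so that its SD is positive.
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(1 for v in values if abs(v - mean) >= k * sd) / n


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]  # hypothetical list with one outlier
for k in (1.5, 2, 3):
    fraction = fraction_k_sds_from_mean(data, k)
    print(f"k = {k}: fraction = {fraction:.2f}, Chebychev bound = {1 / k**2:.2f}")
```

The bound is often far from tight: it must hold for every list, however skewed.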
Chi-square curve.
The chi-square curve is a family of curves that depend on a parameter called degrees of freedom (d.f.). The chi-square curve is an approximation to the PROBABILITY HISTOGRAM of the CHI-SQUARE STATISTIC for a MULTINOMIAL model if the EXPECTED number of outcomes in each category is large. The chi-square curve is positive, and its total area is 100%, so we can think of it as the probability histogram of a RANDOM VARIABLE. The balance point of the curve is d.f., so the expected value of the corresponding random variable would equal d.f. The STANDARD ERROR of the corresponding random variable would be $\sqrt{2\,\mathrm{d.f.}}$. As d.f. grows, the shape of the chi-square curve approaches the shape of the NORMAL CURVE. This page shows the chi-square curve.
Chi-square Statistic.
The chi-square statistic is used to measure the agreement between CATEGORICAL data and a MULTINOMIAL MODEL that predicts the relative frequency of outcomes in each possible category. Suppose there are n INDEPENDENT trials, each of which can result in one of k possible outcomes. Suppose that in each trial, the probability that outcome i occurs is $p_i$, for i = 1, 2, …, k, and that these probabilities are the same in every trial. The expected number of times outcome 1 occurs in the n trials is $n p_1$; more generally, the expected number of times outcome i occurs is

$\mathrm{expected}_i = n p_i.$

If the model be correct, we would expect the n trials to result in outcome i about $n p_i$ times, give or take a bit. Let $\mathrm{observed}_i$ denote the number of times an outcome of type i occurs in the n trials, for i = 1, 2, …, k. The chi-squared statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories:

$\text{chi-squared} = (\mathrm{observed}_1 - \mathrm{expected}_1)^2/\mathrm{expected}_1 + (\mathrm{observed}_2 - \mathrm{expected}_2)^2/\mathrm{expected}_2 + \cdots + (\mathrm{observed}_k - \mathrm{expected}_k)^2/\mathrm{expected}_k.$

As the sample size n increases, if the model is correct, the sampling distribution of the chi-squared statistic is approximated increasingly well by the chi-squared curve with

(#categories − 1) = k − 1

degrees of freedom (d.f.), in the sense that the chance that the chi-squared statistic is in any given range grows closer and closer to the area under the Chi-Squared curve over the same range. This page illustrates the sampling distribution of the chi-square statistic.
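The definition translates directly into a short computation. The sketch below assumes a hypothetical experiment of 60 rolls of a die with a model in which every face has chance 1/6; the counts and the function name are invented for illustration.

```python
def chi_squared_statistic(observed, probabilities):
    """Sum over categories of (observed_i - expected_i)^2 / expected_i, with expected_i = n * p_i."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 9, 11, 6, 14]   # hypothetical counts for faces 1..6 in 60 rolls
probabilities = [1 / 6] * 6        # multinomial model: a fair die

chi2 = chi_squared_statistic(observed, probabilities)
degrees_of_freedom = len(observed) - 1   # (#categories - 1) = k - 1

print(f"chi-squared = {chi2:.2f} with {degrees_of_freedom} d.f.")
```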
Class Boundary.
A point that is the left endpoint of one CLASS INTERVAL, and the right endpoint of another CLASS INTERVAL.
Class Interval.
In plotting a HISTOGRAM, one starts by dividing the range of values into a set of non-overlapping intervals, called class intervals, in such a way that every datum is contained in some class interval. See the related entries CLASS BOUNDARY and ENDPOINT CONVENTION.
Cluster Sample.
In a cluster sample, the SAMPLING UNIT is a collection of population units, not single population units. For example, techniques for adjusting the U.S. census start with a sample of geographic blocks, then (try to) enumerate all inhabitants of the blocks in the sample to obtain a sample of people. This is an example of a cluster sample. (The blocks are chosen separately from different strata, so the overall design is a STRATIFIED CLUSTER SAMPLE.)
Combinations.
The number of combinations of n things taken k at a time is the number of ways of picking a subset of k of the n things, without replacement, and without regard to the order in which the elements of the subset are picked. The number of such combinations is $_nC_k = n!/(k!(n-k)!)$, where k! (pronounced "k FACTORIAL") is $k \times (k-1) \times (k-2) \times \cdots \times 1$. The numbers $_nC_k$ are also called the Binomial coefficients. From a set that has n elements one can form a total of $2^n$ subsets of all sizes. For example, from the set {a, b, c}, which has 3 elements, one can form the $2^3 = 8$ subsets {}, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}. Because the number of subsets with k elements one can form from a set with n elements is $_nC_k$, and the total number of subsets of a set is the sum of the numbers of possible subsets of each size, it follows that $_nC_0 + {}_nC_1 + {}_nC_2 + \cdots + {}_nC_n = 2^n$. The calculator has a button (nCm) that lets you compute the number of combinations of m things chosen from a set of n things. To use the button, first type the value of n, then push the nCm button, then type the value of m, then press the "=" button.
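For readers without the calculator at hand, the same quantities can be computed in Python. This is a small sketch using the standard library's `math.comb`; it checks the factorial formula and the subset-counting identity on the {a, b, c} example above.

```python
from itertools import combinations
from math import comb, factorial

elements = ["a", "b", "c"]
n = len(elements)

# Number of subsets of each size k = 0, 1, ..., n.
counts = [comb(n, k) for k in range(n + 1)]

# Enumerate the subsets themselves, grouped by size.
subsets = [set(c) for k in range(n + 1) for c in combinations(elements, k)]

print(counts)        # number of subsets with 0, 1, 2, 3 elements
print(len(subsets))  # total number of subsets, which should equal 2**n

# The factorial formula nCk = n!/(k!(n-k)!) agrees with math.comb.
print(comb(25, 10) == factorial(25) // (factorial(10) * factorial(15)))
```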
Complement.
The complement of a SUBSET of a given SET is the collection of all ELEMENTS of the SET that are not ELEMENTS of the SUBSET.
Complement rule.
The probability of the COMPLEMENT of an event is 100% minus the probability of the event: $P(A^c) = 100\% - P(A)$.
Compound proposition.
A logical PROPOSITION formed from other propositions using logical operations such as !, |, XOR, &, and →.
Conditional Probability.
Suppose we are interested in the probability that some EVENT A occurs, and we learn that the EVENT B occurred. How should we update the probability of A to reflect this new knowledge? This is what the conditional probability does: it says how the additional knowledge that B occurred should affect the probability that A occurred quantitatively. For example, suppose that A and B are MUTUALLY EXCLUSIVE. Then if B occurred, A did not, so the conditional probability that A occurred given that B occurred is zero. At the other extreme, suppose that B is a SUBSET of A, so that A must occur whenever B does. Then if we learn that B occurred, A must have occurred too, so the conditional probability that A occurred given that B occurred is 100%. For in-between cases, where A and B intersect, but B is not a SUBSET of A, the conditional probability of A given B is a number between zero and 100%. Basically, one "restricts" the OUTCOME SPACE S to consider only the part of S that is in B, because we know that B occurred. For A to have happened given that B happened requires that AB (the INTERSECTION of A and B) happened, so we are interested in the event AB. To have a legitimate probability requires that P(S) = 100%, so if we are restricting the outcome space to B, we need to divide by the probability of B to make the probability of this new S be 100%. On this scale, the probability that AB happened is P(AB)/P(B). This is the definition of the conditional probability of A given B, provided P(B) is not zero (division by zero is undefined). Note that the special cases AB = {} (A and B are MUTUALLY EXCLUSIVE) and AB = B (B is a SUBSET of A) agree with our intuition as described at the top of this paragraph. Conditional probabilities satisfy the AXIOMS OF PROBABILITY, just as ordinary probabilities do.
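The definition P(AB)/P(B) and its two special cases can be illustrated with a small outcome space. The sketch below assumes one roll of a fair die, so that every outcome is equally likely; the helper names `prob` and `conditional_prob` and the particular events are hypothetical.

```python
from fractions import Fraction

outcome_space = {1, 2, 3, 4, 5, 6}  # assumed example: one roll of a fair die


def prob(event):
    """Probability of an event (a subset of the outcome space) with equally likely outcomes."""
    return Fraction(len(event & outcome_space), len(outcome_space))


def conditional_prob(a, b):
    """P(A|B) = P(AB)/P(B), defined only when P(B) is not zero."""
    if prob(b) == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return prob(a & b) / prob(b)


A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is at least 4

print(conditional_prob(A, B))                # in-between case
print(conditional_prob({1, 2}, {5, 6}))      # mutually exclusive events
print(conditional_prob({4, 5, 6}, {5, 6}))   # B is a subset of A
```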
Confidence Interval.
A confidence interval for a PARAMETER is a random interval constructed from data in such a way that the probability that the interval contains the true value of the parameter can be specified before the data are collected. Confidence intervals are demonstrated in this page.
Confidence Level.
The confidence level of a CONFIDENCE INTERVAL is the chance that the interval that will result once data are collected will contain the corresponding PARAMETER. If one computes confidence intervals again and again from independent data, the long-term limit of the fraction of intervals that contain the parameter is the confidence level.
Confounding.
When the differences between the TREATMENT and CONTROL groups other than the treatment produce differences in response that are not distinguishable from the effect of the TREATMENT, those differences between the groups are said to be confounded with the effect of the treatment (if any). For example, prominent statisticians questioned whether differences between individuals that led some to smoke and others not to (rather than the act of smoking itself) were responsible for the observed difference in the frequencies with which smokers and non-smokers contract various illnesses. If that were the case, those factors would be confounded with the effect of smoking. Confounding is quite likely to affect OBSERVATIONAL STUDIES and EXPERIMENTS that are not RANDOMIZED. Confounding tends to be decreased by RANDOMIZATION. See also SIMPSON'S PARADOX.
Continuity Correction.
In using the NORMAL APPROXIMATION to the BINOMIAL PROBABILITY HISTOGRAM, one can get more accurate answers by finding the area under the normal curve corresponding to half-integers, transformed to STANDARD UNITS. This is clearest if we are seeking the chance of a particular number of successes. For example, suppose we seek to approximate the chance of 10 successes in 25 INDEPENDENT trials, each with probability p = 40% of success. The number of successes in this scenario has a BINOMIAL DISTRIBUTION with parameters n = 25 and p = 40%. The EXPECTED number of successes is np = 10, and the STANDARD ERROR is $\sqrt{np(1-p)} = \sqrt{6} = 2.45$. If we consider the area under the NORMAL CURVE at the point 10 successes, transformed to STANDARD UNITS, we get zero: the area under a point is always zero. We get a better approximation by considering 10 successes to be the range from 9 1/2 to 10 1/2 successes. The only possible number of successes between 9 1/2 and 10 1/2 is 10, so this is exactly right for the BINOMIAL DISTRIBUTION. Because the NORMAL CURVE is CONTINUOUS and a BINOMIAL RANDOM VARIABLE is DISCRETE, we need to "smear out" the BINOMIAL probability over an appropriate range. The lower endpoint of the range, 9 1/2 successes, is (9.5 − 10)/2.45 = −0.20 STANDARD UNITS. The upper endpoint of the range, 10 1/2 successes, is (10.5 − 10)/2.45 = +0.20 STANDARD UNITS. The area under the NORMAL CURVE between −0.20 and +0.20 is about 15.8%. The true BINOMIAL probability is $_{25}C_{10}\,(0.4)^{10}(0.6)^{15} = 16\%$.
In a similar way, if we seek the NORMAL APPROXIMATION to the probability that a BINOMIAL RANDOM VARIABLE is in the range from i successes to k successes, inclusive, we should find the area under the NORMAL CURVE from i − 1/2 to k + 1/2 successes, transformed to STANDARD UNITS. If we seek the probability of more than i successes and fewer than k successes, we should find the area under the NORMAL CURVE corresponding to the range i + 1/2 to k − 1/2 successes, transformed to STANDARD UNITS. If we seek the probability of more than i but no more than k successes, we should find the area under the NORMAL CURVE corresponding to the range i + 1/2 to k + 1/2 successes, transformed to STANDARD UNITS. If we seek the probability of at least i but fewer than k successes, we should find the area under the NORMAL CURVE corresponding to the range i − 1/2 to k − 1/2 successes, transformed to STANDARD UNITS. Including or excluding the half-integer ranges at the ends of the interval in this manner is called the continuity correction.
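The worked example (10 successes in 25 trials with p = 40%) can be reproduced numerically. The sketch below builds the normal curve's area from the standard library's `math.erf`; the helper name `normal_cdf` is an assumption. Because the code does not round the standard units to ±0.20, its normal approximation comes out slightly larger than the 15.8% quoted above and closer to the exact binomial value.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Area under the normal curve to the left of z (z in standard units)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, p, k = 25, 0.4, 10
expected = n * p
se = sqrt(n * p * (1 - p))

# Continuity correction: treat "exactly k successes" as the range k - 1/2 to k + 1/2.
lower = (k - 0.5 - expected) / se
upper = (k + 0.5 - expected) / se
approx = normal_cdf(upper) - normal_cdf(lower)

# Exact binomial probability for comparison.
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"normal approximation with continuity correction: {approx:.4f}")
print(f"exact binomial probability:                      {exact:.4f}")
```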
Consequent.
In a CONDITIONAL p → q, the consequent is q.
Continuous Variable.
A QUANTITATIVE VARIABLE is continuous if its set of possible values is uncountable. Examples include temperature, exact height, exact age (including parts of a second). In practice, one can never measure a continuous variable to infinite precision, so continuous variables are sometimes approximated by DISCRETE VARIABLES. A RANDOM VARIABLE X is also called continuous if its set of possible values is uncountable, and the chance that it takes any particular value is zero (in symbols, if P(X = x) = 0 for every real number x). A random variable is continuous if and only if its CUMULATIVE PROBABILITY DISTRIBUTION FUNCTION is a continuous function (a function with no jumps).
Contrapositive.
If p and q are two LOGICAL PROPOSITIONS, then the contrapositive of the proposition (p → q) is the proposition ((!q) → (!p)). The contrapositive is LOGICALLY EQUIVALENT to the original proposition.
Control.
There are at least three senses of "control" in statistics: a member of the control group, to whom no TREATMENT is given; a CONTROLLED EXPERIMENT; and to CONTROL FOR a possible confounding variable.
Controlled experiment.
An EXPERIMENT that uses the METHOD OF COMPARISON to evaluate the EFFECT of a TREATMENT by comparing treated SUBJECTS with a control group, who do not receive the treatment.
Controlled, randomized experiment.
A CONTROLLED EXPERIMENT in which the assignment of SUBJECTS to the TREATMENT GROUP or CONTROL GROUP is done at random, for example, by tossing a coin.
Control for a variable.
To control for a variable is to try to separate its effect from the treatment effect, so it will not CONFOUND with the treatment. There are many methods that try to control for variables. Some are based on matching individuals between treatment and control; others use assumptions about the nature of the effects of the variables to try to model the effect mathematically, for example, using regression.
Control group.
The SUBJECTS in a CONTROLLED EXPERIMENT who do not receive the TREATMENT.
Convenience Sample.
A sample drawn because of its convenience; it is not a PROBABILITY SAMPLE. For example, I might take a sample of opinions in Berkeley (where I live) by just asking my 10 nearest neighbors. That would be a sample of convenience, and would be unlikely to be representative of all of Berkeley. Samples of convenience are not typically representative, and it is not possible to quantify how unrepresentative results based on samples of convenience are likely to be. Convenience samples are to be avoided, and results based on convenience samples are to be viewed with suspicion. See also QUOTA SAMPLE.
Converge, convergence.
A sequence of numbers $x_1, x_2, x_3, \ldots$ converges if there is a number x such that for any number E > 0, there is a number k (which can depend on E) such that $|x_j - x| < E$ whenever j > k. If such a number x exists, it is called the LIMIT of the sequence $x_1, x_2, x_3, \ldots$.
Convergence in probability.
A sequence of RANDOM VARIABLES $X_1, X_2, X_3, \ldots$ converges in probability if there is a random variable X such that for any number E > 0, the sequence of numbers

$P(|X_1 - X| < E),\ P(|X_2 - X| < E),\ P(|X_3 - X| < E),\ \ldots$

converges to 100%.
Converse.
If p and q are two LOGICAL PROPOSITIONS, then the converse of the proposition (p → q) is the proposition (q → p).
Correlation.
A measure of linear ASSOCIATION between two (ordered) LISTS. Two variables can be strongly correlated without having any causal relationship, and two variables can have a causal relationship and yet be uncorrelated.
Correlation coefficient.
The correlation coefficient r is a measure of how nearly a SCATTERPLOT falls on a straight line. The correlation coefficient is always between −1 and +1. To compute the correlation coefficient of a LIST of pairs of measurements (X, Y), first TRANSFORM X and Y individually into STANDARD UNITS. Multiply corresponding elements of the transformed pairs to get a single list of numbers. The correlation coefficient is the MEAN of that list of products. This page contains a tool that lets you generate BIVARIATE data with any correlation coefficient you want.
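The three steps of the recipe (convert to standard units, multiply corresponding elements, take the mean) can be sketched as follows. The function names and the small data lists are hypothetical; the SD is computed by dividing by the number of elements, and both lists are assumed to be non-constant.

```python
from math import sqrt


def standard_units(values):
    """Convert a list to standard units: subtract the mean, divide by the SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def correlation_coefficient(xs, ys):
    """Mean of the products of corresponding elements, in standard units."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


xs = [1, 2, 3, 4, 5]
print(correlation_coefficient(xs, [2 * x + 1 for x in xs]))   # points on a rising line
print(correlation_coefficient(xs, [-x for x in xs]))          # points on a falling line
print(correlation_coefficient(xs, [2, 1, 4, 3, 5]))           # scattered points
```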
Counting.
To count a set of things is to put it in one-to-one correspondence with a consecutive subset of the positive integers, starting with 1.
Countable Set.
A set is countable if its elements can be put in one-to-one correspondence with a subset of the integers. For example, the sets {0, 1, 7, −3}, {red, green, blue}, {…, −2, −1, 0, 1, 2, …}, {straight, curly}, and the set of all fractions, are countable. If a set is not countable, it is UNCOUNTABLE. The set of all real numbers is UNCOUNTABLE.
Cover.
A CONFIDENCE INTERVAL is said to cover if the interval contains the true value of the PARAMETER. Before the data are collected, the chance that the confidence interval will contain the parameter value is the COVERAGE PROBABILITY, which equals the CONFIDENCE LEVEL after the data are collected and the confidence interval is actually computed.
Coverage probability.
The coverage probability of a procedure for making CONFIDENCE INTERVALS is the chance that the procedure produces an interval that COVERS the truth.
Critical value
The critical value in an HYPOTHESIS TEST is the value of the TEST STATISTIC beyond which we would reject the NULL HYPOTHESIS. The critical value is set so that the probability that the TEST STATISTIC is beyond the critical value is at most equal to the SIGNIFICANCE LEVEL if the null hypothesis be true.
Cross-sectional study.
A cross-sectional STUDY compares different individuals to each other at the same time; it looks at a cross-section of a population. The differences between those individuals can CONFOUND with the effect being explored. For example, in trying to determine the effect of age on sexual promiscuity, a cross-sectional study would be likely to CONFOUND the effect of age with the effect of the mores the subjects were taught as children: the older individuals were probably raised with a very different attitude towards promiscuity than the younger subjects. Thus it would be imprudent to attribute differences in promiscuity to the aging process. C.f. LONGITUDINAL STUDY.
Cumulative Probability Distribution Function (cdf).
The cumulative distribution function of a RANDOM VARIABLE is the chance that the random variable is less than or equal to x, as a function of x. In symbols, if F is the cdf of the RANDOM VARIABLE X, then F(x) = P(X ≤ x). The cumulative distribution function must tend to zero as x approaches minus infinity, and must tend to unity as x approaches infinity. It is a positive function, and INCREASES MONOTONICALLY: if y > x, then F(y) ≥ F(x). The cumulative distribution function completely characterizes the PROBABILITY DISTRIBUTION of a RANDOM VARIABLE.

D
de Morgan's Laws
de Morgan's Laws are identities involving logical operations: the NEGATION of a CONJUNCTION is LOGICALLY EQUIVALENT to the DISJUNCTION of the negations, and the negation of a disjunction is logically equivalent to the conjunction of the negations. In symbols, !(p & q) = !p | !q and !(p | q) = !p & !q.
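Because p and q can each be only true or false, both identities can be verified by checking all four combinations. The sketch below does so with Python's `not`, `and`, and `or` standing in for !, &, and |; the function name is hypothetical.

```python
from itertools import product


def de_morgan_holds():
    """Check both of de Morgan's laws for every truth assignment to p and q."""
    for p, q in product([False, True], repeat=2):
        first_law = (not (p and q)) == ((not p) or (not q))    # !(p & q) = !p | !q
        second_law = (not (p or q)) == ((not p) and (not q))   # !(p | q) = !p & !q
        if not (first_law and second_law):
            return False
    return True


print(de_morgan_holds())
```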
Deck of Cards.
A standard deck of playing cards contains 52 cards, 13 each of four suits: spades, hearts, diamonds, and clubs. The thirteen cards of each suit are {ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king}. The face cards are {jack, queen, king}. It is typically assumed that if a deck of cards is shuffled well, it is equally likely to be in each possible ordering. There are 52! (52 FACTORIAL) possible orderings.
Dependent EVENTS, Dependent RANDOM VARIABLES.
Two EVENTS or RANDOM VARIABLES are dependent if they are not INDEPENDENT.
Dependent Variable.
In REGRESSION, the variable whose values are supposed to be explained by changes in the other variable (the INDEPENDENT or EXPLANATORY VARIABLE). Usually one regresses the dependent variable on the INDEPENDENT VARIABLE.
Density, Density Scale.
The vertical axis of a histogram has units of percent per unit of the horizontal axis. This is called a density scale; it measures how "dense" the observations are in each bin. See also PROBABILITY DENSITY.
Denying the antecedent.
A logical fallacy that argues from the PREMISE A → B and the premise !A that therefore, !B. The name comes from the fact that the operation denies (i.e., asserts the NEGATION of) the ANTECEDENT (A) in the CONDITIONAL.
Denying the consequent.
A valid logical argument that concludes from the PREMISE A → B and the premise !B that therefore, !A. The name comes from the fact that the operation denies (i.e., asserts the logical NEGATION of) the CONSEQUENT (B) in the CONDITIONAL.
Deviation.
A deviation is the difference between a datum and some reference value, typically the MEAN of the data. In computing the SD, one finds the RMS of the deviations from the MEAN, the differences between the individual data and the MEAN of the data.
Discrete Variable.
A QUANTITATIVE VARIABLE whose set of possible values is COUNTABLE. Typical examples of discrete variables are variables whose possible values are a subset of the integers, such as Social Security numbers, the number of people in a family, ages rounded to the nearest year, etc. Discrete variables are "chunky." C.f. CONTINUOUS VARIABLE. A discrete RANDOM VARIABLE is one whose set of possible values is COUNTABLE. A random variable is discrete if and only if its CUMULATIVE PROBABILITY DISTRIBUTION FUNCTION is a stair-step function; i.e., if it is piecewise constant and only increases by jumps.
Disjoint or Mutually Exclusive EVENTS.
Two EVENTS are disjoint or mutually exclusive if the occurrence of one is incompatible with the occurrence of the other; that is, if they can't both happen at once (if they have no outcome in common). Equivalently, two EVENTS are disjoint if their INTERSECTION is the EMPTY SET.

Disjoint or Mutually Exclusive SETS.
Two SETS are disjoint or mutually exclusive if they have no element in common. Equivalently, two SETS are disjoint if their INTERSECTION is the EMPTY SET.
Distribution.
The distribution of a set of numerical data is how their values are distributed over the real numbers. It is completely characterized by the EMPIRICAL DISTRIBUTION FUNCTION. Similarly, the PROBABILITY DISTRIBUTION of a random variable is completely characterized by its PROBABILITY DISTRIBUTION FUNCTION. Sometimes the word "distribution" is used as a synonym for the EMPIRICAL DISTRIBUTION FUNCTION or the PROBABILITY DISTRIBUTION FUNCTION. If two or more random variables are defined for the same experiment, they have a JOINT PROBABILITY DISTRIBUTION.
Distribution Function, Empirical.
The empirical (cumulative) distribution function of a set of numerical data is, for each real value of x, the fraction of observations that are less than or equal to x. A plot of the empirical distribution function is an uneven set of stairs. The width of the stairs is the spacing between adjacent data; the height of the stairs
depends on how many data have exactly the same value. The distribution function is zero for small enough (negative) values of x, and is unity for large enough values of x. It increases MONOTONICALLY: if y > x, the empirical distribution function evaluated at y is at least as large as the empirical distribution function evaluated at x.
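A direct, if inefficient, way to evaluate the empirical distribution function is to count the observations that are at most x. The sketch below assumes a small made-up data set; the function name `empirical_cdf` is hypothetical.

```python
def empirical_cdf(data):
    """Return a function F where F(x) is the fraction of observations <= x."""
    n = len(data)

    def F(x):
        return sum(1 for d in data if d <= x) / n

    return F


F = empirical_cdf([3, 1, 4, 1, 5])  # hypothetical data; the value 1 occurs twice

for x in (0, 1, 2, 3.5, 5, 100):
    print(f"F({x}) = {F(x)}")
```

The jump at x = 1 is twice as high as the other jumps because two observations share that value.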
Double-Blind, Double-Blind Experiment.
In a double-blind experiment, neither the SUBJECTS nor the people evaluating the subjects knows who is in the TREATMENT GROUP and who is in the CONTROL GROUP. This mitigates the PLACEBO EFFECT and guards against conscious and unconscious prejudice for or against the treatment on the part of the evaluators.

Ecological Correlation.
The CORRELATION between averages of groups of individuals, instead of individuals. Ecological correlation can
be misleading about the association of individuals.
Element of a SET.
See MEMBER.
Empirical Law of Averages.
The Empirical Law of Averages lies at the base of the FREQUENCY THEORY of probability. This law, which is, in
fact, an assumption about how the world works, rather than a mathematical or physical law, states that if
one repeats a RANDOM EXPERIMENT over and over, independently and under "identical" conditions, the fraction
of trials that result in a given outcome converges to a limit as the number of trials grows without bound.
Empty SET.
The empty SET, denoted {} or ∅, is the SET that has no MEMBERS.
Endpoint Convention.
In plotting a HISTOGRAM, one must decide whether to include a datum that lies at a class boundary with the
CLASS INTERVAL to the left or the right of the boundary. The rule for making this assignment is called an
endpoint convention. The two standard endpoint conventions are (1) to include the left endpoint of all class
intervals and exclude the right, except for the rightmost class interval, which includes both of its endpoints,
and (2) to include the right endpoint of all class intervals and exclude the left, except for the leftmost
interval, which includes both of its endpoints.
Estimator.
An estimator is a rule for "guessing" the value of a population PARAMETER based on a RANDOM SAMPLE from the
population. An estimator is a RANDOM VARIABLE, because its value depends on which particular sample is
obtained, which is random. A canonical example of an estimator is the SAMPLE MEAN, which is an estimator of
the POPULATION MEAN.
Event.
An event is a SUBSET of OUTCOME SPACE. An event determined by a RANDOM VARIABLE X is an event of the form
A = (X is in B), where B is a set of numbers. When the random variable X is observed, that determines whether
or not A occurs: if the value of X happens to be in B, A occurs; if not, A does not occur.
Exhaustive.
A collection of EVENTS {$A_1, A_2, A_3, \ldots$} exhausts the set A if, for the event A to occur, at least one
of those events must also occur; that is, if

$A \subseteq A_1 \cup A_2 \cup A_3 \cup \cdots$

If the event A is not specified, it is assumed to be the entire OUTCOME SPACE S.

Expectation, Expected Value.
The expected value of a RANDOM VARIABLE is the long-term limiting average of its values in independent
repeated experiments. The expected value of the random variable X is denoted EX or E(X). For a discrete
random variable (one that has a COUNTABLE number of possible values) the expected value is the weighted
average of its possible values, where the weight assigned to each possible value is the chance that the
random variable takes that value. One can think of the expected value of a random variable as the point at
which its PROBABILITY HISTOGRAM would balance, if it were cut out of a uniform material. Taking the expected
value is a LINEAR operation: if X and Y are two random variables, the expected value of their sum is the sum
of their expected values (E(X+Y) = E(X) + E(Y)), and the expected value of a constant a times a random
variable X is the constant times the expected value of X (E(a×X) = a×E(X)).
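The weighted-average definition and the two linearity properties can be checked exactly on a small case. The sketch below assumes a fair six-sided die as the example; the helper name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted average of the values; dist maps value -> probability."""
    return sum(value * prob for value, prob in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}   # one fair die
e_die = expected_value(die)

# Linearity: E(aX) = a E(X), checked with a = 3
tripled = {3 * face: prob for face, prob in die.items()}
assert expected_value(tripled) == 3 * e_die

# Linearity: E(X + Y) = E(X) + E(Y), here for two independently rolled dice
total = {}
for x, px in die.items():
    for y, py in die.items():
        total[x + y] = total.get(x + y, 0) + px * py
assert expected_value(total) == e_die + e_die

print(e_die, expected_value(total))
```

Exact fractions are used so that the equalities hold without rounding error. Additivity does not require independence; two independent dice are used here only because their joint distribution is easy to write down.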
Experiment.
What distinguishes an experiment from an OBSERVATIONAL STUDY is that in an experiment, the experimenter
decides who receives the TREATMENT.
Explanatory Variable.
In regression, the explanatory or INDEPENDENT VARIABLE is the one that is supposed to "explain" the other. For
example, in examining crop yield versus quantity of fertilizer applied, the quantity of fertilizer would be the
explanatory or INDEPENDENT VARIABLE, and the crop yield would be the DEPENDENT VARIABLE. In EXPERIMENTS, the
explanatory variable is the one that is manipulated; the one that is observed is the DEPENDENT VARIABLE.
Extrapolation.
See INTERPOLATION.

Factorial.
For an integer k that is greater than or equal to 1, k! (pronounced "k factorial") is
$k \times (k-1) \times (k-2) \times \cdots \times 1$. By convention, 0! = 1. There are k! ways of ordering k
distinct objects. For example, 9! is the number of batting orders of 9 baseball players, and 52! is the number
of different ways a standard deck of playing cards can be ordered.
Fair Bet.
A fair bet is one for which the EXPECTED VALUE of the payoff is zero, after accounting for the cost of the bet.
For example, suppose I offer to pay you $2 if a fair coin lands heads, but you must ANTE up $1 to play. Your
expected payoff is −$1 + $0×P(tails) + $2×P(heads) = −$1 + $2×50% = $0. This is a fair bet: in the long run,
if you made this bet over and over again, you would expect to break even.
False Discovery Rate.
In testing a collection of hypotheses, the false discovery rate is the fraction of rejected null hypotheses that
are rejected erroneously (the number of Type I errors divided by the number of rejected null hypotheses),
with the convention that if no hypothesis is rejected, the false discovery rate is zero.
Finite Population Correction.
When sampling without replacement, as in a SIMPLE RANDOM SAMPLE, the SE of sample sums and sample
means depends on the fraction of the population that is in the sample: the greater the fraction, the smaller
the SE. Sampling with replacement is like sampling from an infinitely large population. The adjustment to
the SE for sampling without replacement is called the finite population correction. The SE for sampling
without replacement is smaller than the SE for sampling with replacement by the finite population correction
factor $\sqrt{(N-n)/(N-1)}$. Note that for sample size n = 1, there is no difference between sampling with and
without replacement; the finite population correction is then unity. If the sample size is the entire population
of N units, there is no variability in the result of sampling without replacement (every member of the
population is in the sample exactly once), and the SE should be zero. This is indeed what the finite
population correction gives (the numerator vanishes).
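A short sketch of the correction factor follows. The function names are hypothetical, and the with-replacement SE of the sample mean is taken to be SD/√n, a standard result that this entry does not itself state.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def se_sample_mean(sd, N, n, replace):
    """SE of the sample mean of n draws from a population of N units with SD sd."""
    se_with = sd / math.sqrt(n)          # assumed formula for sampling with replacement
    return se_with if replace else fpc(N, n) * se_with

print(fpc(100, 1), fpc(100, 100))        # the two limiting cases discussed above
```

The two printed values correspond to the limiting cases in the entry: a sample of size 1 (no correction) and a sample consisting of the whole population (no variability).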
Fisher's exact test (for the equality of two percentages)
Consider two populations of zeros and ones. Let $p_1$ be the proportion of ones in the first population, and let
$p_2$ be the proportion of ones in the second population. We would like to test the NULL HYPOTHESIS that
$p_1 = p_2$ on the basis of a SIMPLE RANDOM SAMPLE from each population. Let $n_1$ be the size of the sample from
population 1, and let $n_2$ be the size of the sample from population 2. Let G be the total number of ones in
both samples. If the null hypothesis is true, the two samples are like one larger sample from a single
population of zeros and ones. The allocation of ones between the two samples would be expected to be
proportional to the relative sizes of the samples, but would have some chance variability. CONDITIONAL on G
and the two sample sizes, under the null hypothesis, the tickets in the first sample are like a random sample
of size $n_1$ without replacement from a collection of $N = n_1 + n_2$ units of which G are labeled with ones.
Thus, under the null hypothesis, the number of tickets labeled with ones in the first sample has (conditional
on G) a HYPERGEOMETRIC DISTRIBUTION with parameters N, G, and $n_1$. Fisher's exact test uses this distribution
to set the ranges of observed values of the number of ones in the first sample for which we would reject the
null hypothesis.
Football-Shaped Scatterplot.
In a football-shaped scatterplot, most of the points lie within a tilted oval, shaped more or less like a football.
A football-shaped scatterplot is one in which the data are HOMOSCEDASTICALLY scattered about a straight line.
Frame, sampling frame.
A sampling frame is a collection of UNITS from which a sample will be drawn. Ideally, the frame is identical to
the POPULATION we want to learn about; more typically, the frame is only a subset of the POPULATION of interest.
The difference between the frame and the POPULATION can be a source of BIAS in sampling design, if the
PARAMETER of interest has a different value for the frame than it does for the POPULATION. For example, one
might desire to estimate the current annual average income of 1998 graduates of the University of California
at Berkeley. I propose to use the SAMPLE MEAN income of a sample of graduates drawn at random. To
facilitate taking the sample and contacting the graduates to obtain income information from them, I might
draw names at random from the list of 1998 graduates for whom the alumni association has an accurate
current address. The population is the collection of 1998 graduates; the frame is those graduates who have
current addresses on file with the alumni association. If there is a tendency for graduates with higher
incomes to have up-to-date addresses on file with the alumni association, that would introduce a positive
BIAS into the annual average income estimated from the sample by the SAMPLE MEAN.
FPP.
Statistics, third edition, by Freedman, Pisani, and Purves, published by W.W. Norton, 1997.
Frequency theory of probability.
See PROBABILITY, THEORIES OF.
Frequency table.
A table listing the frequency (number) or relative frequency (fraction or percentage) of observations in
different ranges, called CLASS INTERVALS.
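Fundamental Rule of Counting.
If a sequence of experiments or trials $T_1, T_2, T_3, \ldots, T_k$ could result, respectively, in
$n_1, n_2, n_3, \ldots, n_k$ possible outcomes, and the numbers $n_1, n_2, n_3, \ldots, n_k$ do not depend on
which outcomes actually occurred, the entire sequence of k experiments has
$n_1 \times n_2 \times n_3 \times \cdots \times n_k$ possible outcomes.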
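The rule can be checked by brute-force enumeration. In the sketch below the three trials (a coin toss, a die roll, and drawing a suit) are made-up examples; the count from the rule is compared with the number of sequences actually generated.

```python
from itertools import product
from math import prod

# Hypothetical trials: a coin toss, a die roll, and a draw of one of 4 suits
outcomes = [("H", "T"), range(1, 7), ("clubs", "diamonds", "hearts", "spades")]

sequences = list(product(*outcomes))             # every possible sequence of outcomes
count_by_rule = prod(len(o) for o in outcomes)   # n1 x n2 x n3

assert len(sequences) == count_by_rule
print(count_by_rule)
```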

Game Theory.
A field of study that bridges mathematics, statistics, economics, and psychology. It is used to study
economic behavior, and to model conflict between nations, for example, "nuclear stalemate" during the Cold
War.
Geometric Distribution.
The geometric distribution describes the number of trials up to and including the first success, in
independent trials with the same probability of success. The geometric distribution depends only on the
single parameter p, the probability of success in each trial. For example, the number of times one must toss
a fair coin until the first time the coin lands heads has a geometric distribution with parameter p = 50%. The
geometric distribution assigns probability $p \times (1-p)^{k-1}$ to the event that it takes k trials to the
first success. The EXPECTED VALUE of the geometric distribution is $1/p$, and its SE is $\sqrt{1-p}/p$.
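As a numerical check of these formulas, the sketch below sums the probabilities over the first 199 trials for a fair coin (a truncation chosen for illustration; the neglected tail is negligible at p = 50%). The function name is hypothetical.

```python
import math

def geometric_pmf(k, p):
    """Chance that the first success occurs on trial k (k = 1, 2, ...)."""
    return p * (1 - p) ** (k - 1)

p = 0.5                      # fair coin
ks = range(1, 200)           # truncated support; assumption made for this sketch
total = sum(geometric_pmf(k, p) for k in ks)
mean = sum(k * geometric_pmf(k, p) for k in ks)
se = math.sqrt(sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks))

print(total, mean, se, math.sqrt(1 - p) / p)
```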
Geometric Mean.
The geometric mean of n numbers {$x_1, x_2, x_3, \ldots, x_n$} is the nth root of their product:

$(x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{1/n}$.
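A minimal sketch of the computation, assuming all the numbers are positive (an assumption added here so that the root and the logarithms are well defined). Both function names are illustrative.

```python
import math

def geometric_mean(xs):
    """n-th root of the product of n positive numbers."""
    return math.prod(xs) ** (1 / len(xs))

def geometric_mean_logs(xs):
    """Same quantity computed through logarithms, which avoids overflow for long lists."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(geometric_mean([1, 3, 9]), geometric_mean_logs([1, 3, 9]))
```

The second form is often preferred in practice because the product of many numbers can overflow or underflow floating-point arithmetic.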

Graph of Averages.
For BIVARIATE data, a graph of averages is a plot of the average values of one variable (say y) for small ranges
of values of the other variable (say x), against the value of the second variable (x) at the midpoints of the
ranges.

Heteroscedasticity.
"Mixed scatter." A SCATTERPLOT or RESIDUAL PLOT shows heteroscedasticity if the scatter in vertical slices through
the plot depends on where you take the slice. LINEAR REGRESSION is not usually a good idea if the data are
heteroscedastic.
Histogram.
A histogram is a kind of plot that summarizes how data are distributed. Starting with a set of CLASS INTERVALS,
the histogram is a set of rectangles ("BINS") sitting on the horizontal axis. The bases of the rectangles are the
CLASS INTERVALS, and their heights are such that their areas are proportional to the fraction of observations in
the corresponding CLASS INTERVALS. That is, the height of a given rectangle is the fraction of observations in
the corresponding CLASS INTERVAL, divided by the length of the corresponding CLASS INTERVAL. A histogram does
not need a vertical scale, because the total area of the histogram must equal 100%. The units of the vertical
axis are percent per unit of the horizontal axis. This is called the density scale. The horizontal axis of a
histogram needs a scale. If any observations coincide with the endpoints of CLASS INTERVALS, the ENDPOINT
CONVENTION is important.
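The density scale can be illustrated with a short computation. The sketch below uses made-up data and class intervals, and assumes the first endpoint convention described under ENDPOINT CONVENTION (left endpoints included, except that the rightmost interval includes both endpoints); the function name is hypothetical.

```python
def histogram_heights(data, edges):
    """Heights on the density scale: fraction in each class interval / its width."""
    n = len(data)
    heights = []
    last = len(edges) - 2                      # index of the rightmost class interval
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        # include the left endpoint; include the right endpoint only for the last interval
        count = sum(1 for x in data
                    if lo <= x < hi or (i == last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights

edges = [0, 1, 2, 4]                           # hypothetical intervals of unequal width
data = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]          # hypothetical observations
heights = histogram_heights(data, edges)
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
print(heights, sum(areas))
```

With unequal interval widths the tallest bin need not be the one with the most observations, which is why areas rather than heights represent fractions of the data.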
Historical Controls.
Sometimes, a TREATMENT GROUP is compared with individuals from another epoch who did not receive the
treatment; for example, in studying the possible effect of fluoridated water on childhood cancer, we might
compare cancer rates in a community before and after fluoride was added to the water supply. Those
individuals who were children before fluoridation started would comprise a historical control group.
Experiments and studies with historical controls tend to be more susceptible to confounding than those with
contemporary controls, because many factors that might affect the outcome other than the TREATMENT tend to
change over time as well. (In this example, the level of other potential carcinogens in the environment also
could have changed.)
Homoscedasticity.
"Same scatter." A SCATTERPLOT or RESIDUAL PLOT shows homoscedasticity if the scatter in vertical slices through
the plot does not depend much on where you take the slice. Cf. HETEROSCEDASTICITY.
House Edge.
In casino games, the expected payoff to the bettor is negative: the house (casino) tends to win money in the
long run. The amount of money the house would expect to win for each $1 wagered on a particular bet (such
as a bet on "red" in roulette) is called the house edge for that bet.
HTLWS.
The book How to Lie with Statistics by D. Huff.
Hypergeometric Distribution.
The hypergeometric distribution with parameters N, G and n is the distribution of the number of "good"
objects in a SIMPLE RANDOM SAMPLE of size n (i.e., a random sample without replacement in which every subset
of size n has the same chance of occurring) from a population of N objects of which G are "good." The
chance of getting exactly g good objects in such a sample is

${}_{G}C_{g} \times {}_{N-G}C_{n-g} \, / \, {}_{N}C_{n}$,

provided $g \le n$, $g \le G$, and $n - g \le N - G$. (The probability is zero otherwise.) The expected value of
the hypergeometric distribution is $n \times G/N$, and its standard error is

$\sqrt{(N-n)/(N-1)} \times \sqrt{n \times G/N \times (1 - G/N)}$.
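The probability formula, expected value, and standard error above can be cross-checked numerically. The sketch below uses a hypothetical example (the number of hearts in a 5-card hand from a 52-card deck); the function name is illustrative.

```python
from math import comb, sqrt

def hypergeom_pmf(g, N, G, n):
    """Chance of exactly g good objects in a simple random sample of size n."""
    if g < 0 or g > n or g > G or n - g > N - G:
        return 0.0                         # impossible outcomes have probability zero
    return comb(G, g) * comb(N - G, n - g) / comb(N, n)

N, G, n = 52, 13, 5                        # hypothetical: hearts in a 5-card hand
pmf = [hypergeom_pmf(g, N, G, n) for g in range(n + 1)]
mean = sum(g * p for g, p in enumerate(pmf))
sd = sqrt(sum((g - mean) ** 2 * p for g, p in enumerate(pmf)))

print(mean, n * G / N)
print(sd, sqrt((N - n) / (N - 1)) * sqrt(n * G / N * (1 - G / N)))
```

The first factor in the standard error is the finite population correction; without it the expression is the standard error of the corresponding binomial count.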
Hypothesis testing.
Statistical hypothesis testing is formalized as making a decision between rejecting or not rejecting a NULL
HYPOTHESIS, on the basis of a set of observations. Two types of errors can result from any decision rule (test):
rejecting the null hypothesis when it is true (a TYPE I ERROR), and failing to reject the null hypothesis when it
is false (a TYPE II ERROR). For any hypothesis, it is possible to develop many different decision rules (tests).
Typically, one specifies ahead of time the chance of a Type I error one is willing to allow. That chance is
called the SIGNIFICANCE LEVEL of the test or decision rule. For a given significance level, one way of deciding
which decision rule is best is to pick the one that has the smallest chance of a Type II error when a given
ALTERNATIVE HYPOTHESIS is true. The chance of correctly rejecting the null hypothesis when a given alternative
hypothesis is true is called the POWER of the test against that alternative.

iff, if and only if, ↔
If p and q are two LOGICAL PROPOSITIONS, then (p ↔ q) is a proposition that is true when both p and q are true,
and when both p and q are false. It is LOGICALLY EQUIVALENT to the proposition

((p → q) & (q → p))

and to the proposition

((p & q) | ((!p) & (!q))).

Implies, logical implication, →, conditional, if-then
Logical implication is an operation on two LOGICAL PROPOSITIONS. If p and q are two logical propositions,
(p → q), pronounced "p implies q" or "if p then q," is a logical proposition that is true if p is false, or if
both p and q are true. The proposition (p → q) is LOGICALLY EQUIVALENT to the proposition ((!p) | q). In the
conditional p → q, the ANTECEDENT is p and the CONSEQUENT is q.
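The equivalences stated for → and ↔ can be verified by enumerating the four rows of the truth table. In this sketch the function names `implies` and `iff` are illustrative, and Python's `and`, `or`, `not` stand in for the glossary's &, |, and !.

```python
from itertools import product

def implies(p, q):
    """Truth value of (p -> q): false only when p is true and q is false."""
    return (not p) or q

def iff(p, q):
    """Truth value of (p <-> q), built as ((p -> q) & (q -> p))."""
    return implies(p, q) and implies(q, p)

for p, q in product([True, False], repeat=2):
    # (p <-> q) agrees with ((p & q) | ((!p) & (!q))) in every row
    assert iff(p, q) == ((p and q) or ((not p) and (not q)))
    print(p, q, implies(p, q), iff(p, q))
```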
Independent, independence.
Two EVENTS A and B are (statistically) independent if the chance that they both happen simultaneously is the
product of the chances that each occurs individually; i.e., if P(AB) = P(A)×P(B). This is essentially equivalent
to saying that learning that one event occurs does not give any information about whether the other event
occurred too: the conditional probability of A given B is the same as the unconditional probability of A, i.e.,
P(A|B) = P(A). Two RANDOM VARIABLES X and Y are independent if all events they DETERMINE are independent,
for example, if the event {a < X ≤ b} is independent of the event {c < Y ≤ d} for all choices of a, b, c, and d.
A collection of more than two random variables is independent if for every proper subset of the variables,
every event determined by that subset of the variables is independent of every event determined by the
variables in the complement of the subset. For example, the three random variables X, Y, and Z are
independent if every event determined by X is independent of every event determined by Y; and every event
determined by X is independent of every event determined by Y and Z; and every event determined by Y is
independent of every event determined by X and Z; and every event determined by Z is independent of every
event determined by X and Y.
Independent and identically distributed (iid).
A collection of two or more random variables {$X_1, X_2, \ldots$} is independent and identically distributed if
the variables have the same probability distribution, and are INDEPENDENT.
Independent Variable.
In REGRESSION, the independent variable is the one that is supposed to explain the other; the term is a
synonym for "explanatory variable." Usually, one regresses the "dependent variable" on the "independent
variable." There is not always a clear choice of the independent variable. The independent variable is usually
plotted on the horizontal axis. Independent in this context does not mean the same thing as STATISTICALLY
INDEPENDENT.

Indicator RANDOM VARIABLE.
The indicator [RANDOM VARIABLE] of the EVENT A, often written $1_A$, is the RANDOM VARIABLE that equals unity
if A occurs, and zero if A does not occur. The EXPECTED VALUE of the indicator of A is the probability of A,
P(A), and the STANDARD ERROR of the indicator of A is $\sqrt{P(A) \times (1 - P(A))}$. The sum

$1_A + 1_B + 1_C + \cdots$

of the indicators of a collection of EVENTS {A, B, C, …} counts how many of the EVENTS {A, B, C, …} occur in a
given trial. The product of the indicators of a collection of events is the indicator of the intersection of the
events (the product equals one if and only if all of the indicators equal one). The maximum of the indicators of
a collection of events is the indicator of the union of the events (the maximum equals one if any of the
indicators equals one).
Interquartile Range (IQR).
The interquartile range of a LIST of numbers is the UPPER QUARTILE minus the LOWER QUARTILE.
Interpolation.
Given a set of BIVARIATE DATA (x, y), to impute a value of y corresponding to some value of x at which there is
no measurement of y is called interpolation, if the value of x is within the range of the measured values of x.
If the value of x is outside the range of measured values, imputing a corresponding value of y is called
EXTRAPOLATION.
Intersection.
The intersection of two or more sets is the set of elements that all the sets have in common; the elements
contained in every one of the sets. The intersection of the EVENTS A and B is written "A∩B," "A and B," and
"AB." Cf. UNION. See also VENN DIAGRAMS.
Invalid (logical) argument.
An invalid LOGICAL ARGUMENT is one in which the truth of the PREMISES does not guarantee the truth of the
CONCLUSION. For example, the following logical argument is invalid: If the forecast calls for rain, I will not wear
sandals. The forecast does not call for rain. Therefore, I will wear sandals. See also VALID ARGUMENT.

Joint Probability Distribution.
If $X_1, X_2, \ldots, X_k$ are RANDOM VARIABLES defined for the same experiment, their joint probability
distribution gives the probability of events determined by the collection of random variables: for any
collection of sets of numbers {$A_1, \ldots, A_k$}, the joint probability distribution determines

$P((X_1 \text{ is in } A_1) \text{ and } (X_2 \text{ is in } A_2) \text{ and } \cdots \text{ and } (X_k \text{ is in } A_k))$.

For example, suppose we roll two fair dice independently. Let $X_1$ be the number of spots that show on the
first die, and let $X_2$ be the total number of spots that show on both dice. Then the joint distribution of
$X_1$ and $X_2$ is as follows:

$P(X_1=1, X_2=2) = P(X_1=1, X_2=3) = P(X_1=1, X_2=4) = P(X_1=1, X_2=5) = P(X_1=1, X_2=6) = P(X_1=1, X_2=7) =$

$P(X_1=2, X_2=3) = P(X_1=2, X_2=4) = P(X_1=2, X_2=5) = P(X_1=2, X_2=6) = P(X_1=2, X_2=7) = P(X_1=2, X_2=8) = \cdots =$

$P(X_1=6, X_2=7) = P(X_1=6, X_2=8) = P(X_1=6, X_2=9) = P(X_1=6, X_2=10) = P(X_1=6, X_2=11) = P(X_1=6, X_2=12) = 1/36$.

If a collection of random variables is INDEPENDENT, their joint probability distribution is the product of their
MARGINAL PROBABILITY DISTRIBUTIONS, their individual probability distributions without regard for the value of
the other variables. In this example, the marginal probability distribution of $X_1$ is

$P(X_1=1) = P(X_1=2) = P(X_1=3) = P(X_1=4) = P(X_1=5) = P(X_1=6) = 1/6$,

and the marginal probability distribution of $X_2$ is

$P(X_2=2) = P(X_2=12) = 1/36$

$P(X_2=3) = P(X_2=11) = 1/18$

$P(X_2=4) = P(X_2=10) = 3/36$

$P(X_2=5) = P(X_2=9) = 1/9$

$P(X_2=6) = P(X_2=8) = 5/36$

$P(X_2=7) = 1/6$.

Note that $P(X_1=1, X_2=10) = 0$, while $P(X_1=1) \times P(X_2=10) = (1/6)(3/36) = 1/72$. The joint probability
is not equal to the product of the marginal probabilities: $X_1$ and $X_2$ are DEPENDENT random variables.
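The two-dice example can be reproduced by enumerating the 36 equally likely rolls. The sketch below is one way to do it; the dictionary layout and the helper name `marginal` are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of X1 (first die) and X2 (total of both dice)
joint = {}
for first, second in product(range(1, 7), repeat=2):
    key = (first, first + second)              # (value of X1, value of X2)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

def marginal(joint, index):
    """Sum the joint distribution over all values of the other variable."""
    out = {}
    for key, prob in joint.items():
        out[key[index]] = out.get(key[index], 0) + prob
    return out

m1, m2 = marginal(joint, 0), marginal(joint, 1)

# The joint chance of (X1 = 1, X2 = 10) versus the product of the marginals
print(joint.get((1, 10), 0), m1[1] * m2[10])
```

The pair (1, 10) never appears among the 36 rolls, so its joint probability is zero even though both marginal probabilities are positive.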

Law of Averages.
The Law of Averages says that the AVERAGE of INDEPENDENT observations of RANDOM VARIABLES that have the
same PROBABILITY DISTRIBUTION is increasingly likely to be close to the EXPECTED VALUE of the RANDOM VARIABLES as
the number of observations grows. More precisely, if $X_1, X_2, X_3, \ldots$ are independent RANDOM VARIABLES
with the same PROBABILITY DISTRIBUTION, and E(X) is their common EXPECTED VALUE, then for every number
$\varepsilon > 0$,

$P\{ |(X_1 + X_2 + \cdots + X_n)/n - E(X)| < \varepsilon \}$

converges to 100% as n grows. This is equivalent to saying that the sequence of sample means

$X_1, \ (X_1 + X_2)/2, \ (X_1 + X_2 + X_3)/3, \ \ldots$

CONVERGES IN PROBABILITY to E(X).
Law of Large Numbers.
The Law of Large Numbers says that in repeated, INDEPENDENT trials with the same probability p of success in
each trial, the percentage of successes is increasingly likely to be close to the chance of success as the
number of trials increases. More precisely, the chance that the percentage of successes differs from the
probability p by more than a fixed positive amount, e > 0, converges to zero as the number of trials n goes to
infinity, for every number e > 0. Note that in contrast to the difference between the percentage of successes
and the probability of success, the difference between the number of successes and the EXPECTED number of
successes, n×p, tends to grow as n grows.
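A small simulation can illustrate the contrast between the two differences. This is only a sketch: the seed, the trial counts, and the function name are arbitrary choices, and any single run is random, so it illustrates a tendency rather than proving it.

```python
import random

def success_gaps(p, n, seed=0):
    """Simulate n independent trials; return (count gap, percentage gap)."""
    rng = random.Random(seed)                    # fixed seed so the run is repeatable
    successes = sum(rng.random() < p for _ in range(n))
    return abs(successes - n * p), abs(successes / n - p)

for n in (100, 10_000, 1_000_000):
    count_gap, pct_gap = success_gaps(0.5, n)
    print(n, count_gap, pct_gap)
```

In typical runs the second column (the gap in the number of successes) tends to grow with n while the third column (the gap in the fraction of successes) shrinks.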
Limit.
See CONVERGE.
Linear Operation.
Suppose f is a function or operation that acts on things we shall denote generically by the lower-case Roman
letters x and y. Suppose it makes sense to multiply x and y by numbers (which we denote by a), and that it
makes sense to add things like x and y together. We say that f is linear if for every number a and every value
of x and y for which f(x) and f(y) are defined, (i) f(a×x) is defined and equals a×f(x), and (ii) f(x+y) is
defined and equals f(x) + f(y). Cf. AFFINE.
Linear association.
Two variables are linearly associated if a change in one is associated with a proportional change in the other,
with the same constant of proportionality throughout the range of measurement. The CORRELATION COEFFICIENT
measures the degree of linear association on a scale of −1 to 1.
List.
I use the term list to mean two things: either a MULTISET or (more often) a TUPLE. Lists are countable
collections (multisets) in some order (like a tuple). That is, it makes sense to talk about the 1st (or 7th, or
nth) element of a list, and the nth and mth elements of a list can be equal, even if n ≠ m (the elements of a
list need not be distinct).
Location, Measure of.
A measure of location is a way of summarizing what a "typical" element of a LIST is; it is a one-number
summary of a DISTRIBUTION. See also ARITHMETIC MEAN, MEDIAN, and MODE.
Logical argument.
A logical argument consists of one or more PREMISES, PROPOSITIONS that are assumed to be true, and a
conclusion, a proposition that is supposed to be guaranteed to be true (as a matter of pure logic) if the
premises are true. For example, the following is a logical argument:

p → q
p
Therefore, q.

This argument has two premises: p → q, and p. The conclusion of the argument is q. A logical argument is
VALID if the truth of the premises guarantees the truth of the conclusion; otherwise, the argument is INVALID.
That is, an argument with premises $p_1, p_2, \ldots, p_n$ and conclusion q is valid if the COMPOUND PROPOSITION

$(p_1 \ \&\ p_2 \ \&\ \cdots \ \&\ p_n) \to q$

is LOGICALLY EQUIVALENT to TRUE. The argument given above is valid because if it is true that p → q and that p
is true (the two premises), then q (the conclusion of the argument) must also be true.
Logically equivalent, logical equivalence.
Two PROPOSITIONS are logically equivalent if they always have the same truth value. That is, the propositions p
and q are logically equivalent if p is true whenever q is true and p is false whenever q is false. The
proposition (p ↔ q) is always true if and only if p and q are logically equivalent. For example, p is logically
equivalent to p, to (p & p), and to (p | p); (p | (!p)) is logically equivalent to TRUE; (p & !p) is logically
equivalent to FALSE; (p → p) is logically equivalent to TRUE; and (p → q) is logically equivalent to (!p | q).
Longitudinal study.
A STUDY in which individuals are followed over time, and compared with themselves at different times, to
determine, for example, the effect of aging on some measured variable. Longitudinal STUDIES provide much
more persuasive evidence about the effect of aging than do CROSS-SECTIONAL STUDIES.
Lower Quartile (LQ).
See QUARTILES.

Margin of error.
A measure of the uncertainty in an ESTIMATE of a PARAMETER; unfortunately, not everyone agrees what it should
mean. The margin of error of an estimate is typically one or two times the estimated STANDARD ERROR of the
estimate.
Marginal probability distribution.
The marginal probability distribution of a random variable that has a JOINT PROBABILITY DISTRIBUTION with some
other random variables is the probability distribution of that random variable without regard for the values
that the other random variables take. The marginal distribution of a discrete random variable $X_1$ that has a
joint distribution with other discrete random variables can be found from the joint distribution by summing
over all possible values of the other variables. For example, suppose we roll two fair dice independently. Let
$X_1$ be the number of spots that show on the first die, and let $X_2$ be the total number of spots that show on
both dice. Then the joint distribution of $X_1$ and $X_2$ is as follows:

$P(X_1=1, X_2=2) = P(X_1=1, X_2=3) = P(X_1=1, X_2=4) = P(X_1=1, X_2=5) = P(X_1=1, X_2=6) = P(X_1=1, X_2=7) =$

$P(X_1=2, X_2=3) = P(X_1=2, X_2=4) = P(X_1=2, X_2=5) = P(X_1=2, X_2=6) = P(X_1=2, X_2=7) = P(X_1=2, X_2=8) = \cdots =$

$P(X_1=6, X_2=7) = P(X_1=6, X_2=8) = P(X_1=6, X_2=9) = P(X_1=6, X_2=10) = P(X_1=6, X_2=11) = P(X_1=6, X_2=12) = 1/36$.

The marginal probability distribution of $X_1$ is

$P(X_1=1) = P(X_1=2) = P(X_1=3) = P(X_1=4) = P(X_1=5) = P(X_1=6) = 1/6$.

We can verify that the marginal probability that $X_1 = 1$ is indeed the sum of the joint probability
distribution over all possible values of $X_2$ for which $X_1 = 1$:

$P(X_1=1) = P(X_1=1, X_2=2) + P(X_1=1, X_2=3) + P(X_1=1, X_2=4) + P(X_1=1, X_2=5) + P(X_1=1, X_2=6) + P(X_1=1, X_2=7) = 6/36 = 1/6$.

Similarly, the marginal probability distribution of $X_2$ is

$P(X_2=2) = P(X_2=12) = 1/36$

$P(X_2=3) = P(X_2=11) = 1/18$

$P(X_2=4) = P(X_2=10) = 3/36$

$P(X_2=5) = P(X_2=9) = 1/9$

$P(X_2=6) = P(X_2=8) = 5/36$

$P(X_2=7) = 1/6$.

Again, we can verify that the marginal probability that $X_2 = 4$ is 3/36 by adding the joint probabilities for
all possible values of $X_1$ for which $X_2 = 4$:

$P(X_2=4) = P(X_1=1, X_2=4) + P(X_1=2, X_2=4) + P(X_1=3, X_2=4) = 3/36$.

Markov's Inequality.
For LISTS: If a list contains no negative numbers, the fraction of numbers in the list at least as large as any
given constant a > 0 is no larger than the ARITHMETIC MEAN of the list, divided by a.
For RANDOM VARIABLES: if a random variable X must be nonnegative, the chance that X exceeds any given
constant a > 0 is no larger than the EXPECTED VALUE of X, divided by a.
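The list version of the inequality is easy to check directly. The list and the thresholds in this sketch are made up for illustration, and the function name is hypothetical.

```python
def fraction_at_least(xs, a):
    """Fraction of the numbers in the list that are at least as large as a."""
    return sum(1 for x in xs if x >= a) / len(xs)

xs = [0, 1, 1, 2, 3, 5, 8, 13]        # hypothetical list with no negative numbers
mean = sum(xs) / len(xs)

for a in (1, 2, 5, 10, 20):
    # Markov's bound: the fraction >= a cannot exceed mean / a
    assert fraction_at_least(xs, a) <= mean / a
    print(a, fraction_at_least(xs, a), mean / a)
```

For small a the bound exceeds 1 and says nothing; it becomes informative only when a is larger than the mean.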
Maximum Likelihood Estimate (MLE).
The maximum likelihood estimate of a PARAMETER from data is the possible value of the PARAMETER for which
the chance of observing the data is largest. That is, suppose that the PARAMETER is p, and that we observe data
x. Then the maximum likelihood estimate of p is obtained as follows:

estimate p by the value q that makes P(observing x when the value of p is q) as large as possible.

For example, suppose we are trying to estimate the chance that a (possibly biased) coin lands heads when it
is tossed. Our data will be the number of times x the coin lands heads in n independent tosses of the coin.
The distribution of the number of times the coin lands heads is BINOMIAL with PARAMETERS n (known) and p
(unknown). The chance of observing x heads in n trials if the chance of heads in a given trial is q is
${}_{n}C_{x} \times q^{x} \times (1-q)^{n-x}$. The maximum likelihood estimate of p would be the value of q that
makes that chance largest. We can find that value of q explicitly using calculus; it turns out to be q = x/n,
the fraction of times the coin is observed to land heads in the n tosses. Thus the maximum likelihood estimate
of the chance of heads from the number of heads in n independent tosses of the coin is the observed fraction of
tosses in which the coin lands heads.
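Instead of calculus, the maximizing value can be located numerically. The sketch below evaluates the binomial chance on a grid of candidate values of q; the data (7 heads in 20 tosses), the grid spacing, and the function name are assumptions made for illustration.

```python
from math import comb

def binomial_likelihood(q, x, n):
    """Chance of x heads in n independent tosses if the chance of heads is q."""
    return comb(n, x) * q ** x * (1 - q) ** (n - x)

x, n = 7, 20                                  # hypothetical data: 7 heads in 20 tosses
grid = [i / 1000 for i in range(1001)]        # candidate values of q from 0 to 1
mle = max(grid, key=lambda q: binomial_likelihood(q, x, n))

print(mle, x / n)
```

A grid search only finds the best grid point; here the grid happens to contain x/n, so the two printed values agree.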
Mean, Arithmetic mean.
The sum of a LIST of numbers, divided by the number of elements in the list. See also AVERAGE.
Mean Squared Error (MSE).
The mean squared error of an ESTIMATOR of a PARAMETER is the EXPECTED VALUE of the square of the difference
between the estimator and the parameter. In symbols, if X is an estimator of the parameter t, then

$MSE(X) = E((X - t)^2)$.

The MSE measures how far the estimator is off from what it is trying to estimate, on the average in
repeated experiments. It is a summary measure of the accuracy of the estimator. It combines any tendency
of the estimator to overshoot or undershoot the truth (BIAS), and the variability of the estimator (SE). The
MSE can be written in terms of the BIAS and SE of the estimator:

$MSE(X) = (BIAS(X))^2 + (SE(X))^2$.
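The decomposition can be verified exactly on a toy case by listing every possible sample. Everything specific in this sketch is hypothetical: the four-element population, samples of size 2 drawn with replacement, and a deliberately biased estimator equal to 90% of the sample mean.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]                            # hypothetical population
t = Fraction(sum(population), len(population))       # parameter: the population mean

# All equally likely samples of size 2 drawn with replacement
samples = list(product(population, repeat=2))

def estimator(sample):
    """A deliberately biased estimator: 90% of the sample mean."""
    return Fraction(9, 10) * Fraction(sum(sample), len(sample))

values = [estimator(s) for s in samples]
expected = sum(values) / len(values)
bias = expected - t
se_squared = sum((v - expected) ** 2 for v in values) / len(values)
mse = sum((v - t) ** 2 for v in values) / len(values)

assert mse == bias ** 2 + se_squared
print(bias, se_squared, mse)
```

Exact fractions are used so that the identity holds with equality rather than only up to rounding.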

Median.
"Middle value" of a LIST. The smallest number such that at least half the numbers in the list are no greater
than it. If the list has an odd number of entries, the median is the middle entry in the list after sorting the list
into increasing order. If the list has an even number of entries, the median is the smaller of the two middle
numbers after sorting. The median can be estimated from a histogram by finding the smallest number such
that the area under the histogram to the left of that number is 50%.
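This convention (the smaller of the two middle numbers for even-length lists) differs from the common one that averages the two middle numbers. A sketch of the glossary's rule follows; the function name is illustrative, and Python's standard `statistics.median_low` and `statistics.median` are used only for comparison.

```python
import math
import statistics

def glossary_median(xs):
    """Smallest number such that at least half the list is no greater than it."""
    ordered = sorted(xs)
    # the entry at position ceil(n/2) (counting from 1) has at least half the list at or below it
    return ordered[math.ceil(len(ordered) / 2) - 1]

odd, even = [5, 1, 3], [4, 1, 3, 2]
print(glossary_median(odd), glossary_median(even))
print(statistics.median_low(even), statistics.median(even))
```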
Member of a set.
Something is a member (or element) of a SET if it is one of the things in the SET.
Method of Comparison.
The most basic and important method of determining whether a TREATMENT has an effect: compare what
happens to individuals who are treated (the TREATMENT GROUP) with what happens to individuals who are not
treated (the CONTROL GROUP).
Minimax Strategy.
In game theory, a minimax strategy is one that minimizes one's maximum loss, whatever the opponent
might do (whatever strategy the opponent might choose).
Mode.
For LISTS, the mode is a most common (frequent) value. A list can have more than one mode. For
HISTOGRAMS, a mode is a relative maximum ("bump").
Moment.
The kth moment of a LIST is the average value of the elements raised to the kth power; that is, if the list
consists of the N elements $x_1, x_2, \ldots, x_N$, the kth moment of the list is

$(x_1^k + x_2^k + \cdots + x_N^k)/N$.

The kth moment of a RANDOM VARIABLE X is the EXPECTED VALUE of $X^k$, $E(X^k)$.


Monotone, monotonic function.
A function is monotone if it only increases or only decreases: f increases monotonically (is monotonic
increasing) if x > y implies that f(x) ≥ f(y). A function f decreases monotonically (is monotonic decreasing) if
x > y implies that f(x) ≤ f(y). A function f is strictly monotonically increasing if x > y implies that
f(x) > f(y), and strictly monotonically decreasing if x > y implies that f(x) < f(y).
Multimodal Distribution.
A distribution with more than one MODE. The HISTOGRAM of a multimodal distribution has more than one
"bump."
Multinomial Distribution.
Consider a sequence of n INDEPENDENT trials, each of which can result in an outcome in any of k categories.
Let $p_j$ be the probability that each trial results in an outcome in category j, j = 1, 2, …, k, so

$p_1 + p_2 + \cdots + p_k = 100\%$.

The number of outcomes of each type has a multinomial distribution. In particular, the probability that the n
trials result in $n_1$ outcomes of type 1, $n_2$ outcomes of type 2, …, and $n_k$ outcomes of type k is

$n!/(n_1! \times n_2! \times \cdots \times n_k!) \times p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k}$,

if $n_1, \ldots, n_k$ are nonnegative integers that sum to n; the chance is zero otherwise.
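A direct transcription of this formula is sketched below. The function name and the example probabilities are hypothetical; the number of trials n is taken to be the sum of the counts passed in.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Chance of counts[j] outcomes in category j, for every j, in sum(counts) trials."""
    if any(c < 0 for c in counts):
        return 0.0
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)              # exact: each partial quotient is an integer
    return coef * prod(p ** c for p, c in zip(probs, counts))

probs = [0.2, 0.3, 0.5]                    # hypothetical category probabilities
n = 3
total = sum(multinomial_pmf([a, b, n - a - b], probs)
            for a in range(n + 1) for b in range(n + 1 - a))
print(multinomial_pmf([1, 1, 1], probs), total)
```

With only two categories the formula reduces to the binomial probability, which gives a quick sanity check.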
Multiplication rule.
The chance that EVENTS A and B both occur (i.e., that EVENT AB occurs), is the CONDITIONAL PROBABILITY that A
occurs given that B occurs, times the unconditional probability that B occurs.
Multiplicity in hypothesis tests.
In HYPOTHESIS TESTING, if more than one hypothesis is tested, the actual SIGNIFICANCE LEVEL of the combined
tests is not equal to the nominal SIGNIFICANCE LEVEL of the individual tests. See also FALSE DISCOVERY RATE.
Multivariate Data.
A set of measurements of two or more VARIABLES per individual. See BIVARIATE.
Multiset.
A multiset, also known as a bag, is a collection of things, but unlike a SET (which is also a collection of
things), the same object can occur in a multiset more than once. For instance, the sets {1, 2}, {1, 2, 2}, and
{1, 1, 1, 1, 1, 2, 2} are all equal, while the multisets [1, 2], [1, 2, 2], and [1, 1, 1, 1, 1, 2, 2] are all different.
However, order does not matter for sets or for multisets, so, for instance, {1, 2} = {2, 1} and
[1, 1, 1, 1, 1, 2, 2] = [2, 1, 1, 2, 1, 1, 1].
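The distinction can be illustrated in Python, where `set` ignores repeats and `collections.Counter` behaves like a multiset (it records how many times each element occurs and ignores order). Using `Counter` as a stand-in for a multiset is a modeling choice made for this sketch.

```python
from collections import Counter

# Sets ignore repeats, so these are all the same set
assert {1, 2} == {1, 2, 2} == {1, 1, 1, 1, 1, 2, 2}

# Counter behaves like a multiset: multiplicity matters, order does not
bag_a = Counter([1, 2])
bag_b = Counter([1, 2, 2])
bag_c = Counter([1, 1, 1, 1, 1, 2, 2])

assert bag_a != bag_b and bag_b != bag_c and bag_a != bag_c
assert bag_c == Counter([2, 1, 1, 2, 1, 1, 1])
print(bag_c)
```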
Mutually Exclusive.
See DISJOINT EVENTS or DISJOINT SETS.

Nearly normal distribution.
A population of numbers (a LIST of numbers) is said to have a nearly normal distribution if the HISTOGRAM of its
values in STANDARD UNITS nearly follows a NORMAL CURVE. More precisely, suppose that the MEAN of the list is
μ and the standard deviation of the list is SD. Then the list is nearly normally distributed if, for every two
numbers a < b, the fraction of numbers in the list that are between a and b is approximately equal to the
area under the normal curve between (a − μ)/SD and (b − μ)/SD.
Negative Binomial Distribution.
Consider a sequence of INDEPENDENT trials with the same probability p of success in each trial. The number of
trials up to and including the rth success has the negative binomial distribution with parameters p and r. If
the RANDOM VARIABLE N has the negative binomial distribution with parameters p and r, then

$P(N = k) = {}_{k-1}C_{r-1}\, p^{r} (1-p)^{k-r}$,

for k = r, r+1, r+2, …, and zero for k < r, because there must be at least r trials to have r successes. The
negative binomial distribution is derived as follows: for the rth success to occur on the kth trial, there must
have been r−1 successes in the first k−1 trials, and the kth trial must result in success. The chance of the
former is the chance of r−1 successes in k−1 INDEPENDENT trials with the same probability of success in each
trial, which, according to the BINOMIAL DISTRIBUTION with parameters n = k−1 and p, has probability

${}_{k-1}C_{r-1}\, p^{r-1} (1-p)^{k-r}$.

The chance of the latter event is p, by assumption. Because the trials are INDEPENDENT, we can find the
chance that both EVENTS occur by multiplying their chances together, which gives the expression for P(N = k)
above.
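The derivation above can be sketched numerically. This is a minimal illustration, not a library routine; the function name negative_binomial_pmf is hypothetical, and the values r = 3, p = 0.5 are chosen only for the example.

```python
from math import comb

def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """Chance that the r-th success occurs on trial k in independent trials."""
    if k < r:
        return 0.0  # fewer than r trials cannot contain r successes
    # r-1 successes in the first k-1 trials (binomial), times a success on trial k
    first_part = comb(k - 1, r - 1) * p ** (r - 1) * (1 - p) ** (k - r)
    return first_part * p

# With r = 3 and p = 0.5, the third success falls on trial 3 only if
# all three trials succeed, which has chance 0.5**3.
print(negative_binomial_pmf(3, 3, 0.5))

# The chances over k = r, r+1, ... should add up to (essentially) 1.
total = sum(negative_binomial_pmf(k, 3, 0.5) for k in range(3, 200))
print(total)
```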
No causation without manipulation.
A slogan attributed to Paul Holland. If the conditions were not deliberately manipulated (for example, if the
situation is an OBSERVATIONAL STUDY rather than an EXPERIMENT), it is unwise to conclude that there is any causal
relationship between the outcome and the conditions. See POST HOC ERGO PROPTER HOC and CUM HOC ERGO
PROPTER HOC.
Nonlinear Association.
The relationship between two variables is nonlinear if a change in one is associated with a change in the
other that depends on the value of the first; that is, if the change in the second is not simply proportional
to the change in the first, independent of the value of the first variable.
Nonresponse.
In surveys, it is rare that everyone who is "invited" to participate (everyone whose phone number is called,
everyone who is mailed a questionnaire, everyone an interviewer tries to stop on the street) in fact
responds. The difference between the "invited" sample sought, and that obtained, is the nonresponse.
Nonresponse bias.
In a survey, those who respond may differ from those who do not, in ways that are related to the effect one
is trying to measure. For example, a telephone survey of how many hours people work is likely to miss
people who are working late, and are therefore not at home to answer the phone. When that happens, the
survey may suffer from nonresponse bias. Nonresponse bias makes the result of a survey differ
systematically from the truth.
Nonresponse rate.
The fraction of nonresponders in a survey: the number of nonresponders divided by the number of people invited
to participate (the number sent questionnaires, the number of interview attempts, etc.). If the nonresponse rate is
appreciable, the survey may suffer from large NONRESPONSE BIAS.
Normal approximation.
The normal approximation to data is to approximate areas under the HISTOGRAM of data, transformed into
STANDARD UNITS, by the corresponding areas under the NORMAL CURVE.
Many probability distributions can be approximated by a normal distribution, in the sense that the area under
the probability histogram is close to the area under a corresponding part of the normal curve. To find the
corresponding part of the normal curve, the range must be converted to standard units, by subtracting the
EXPECTED VALUE and dividing by the STANDARD ERROR. For example, the area under the BINOMIAL probability
histogram for n = 50 and p = 30% between 9.5 and 17.5 is 74.2%. To use the normal approximation, we
transform the endpoints to standard units, by subtracting the EXPECTED VALUE (for the Binomial random
variable, n×p = 15 for these values of n and p) and dividing the result by the STANDARD ERROR (for a Binomial,
$(np(1-p))^{1/2}$ = 3.24 for these values of n and p). The normal approximation is the area under the
normal curve between (9.5 − 15)/3.24 = −1.697 and (17.5 − 15)/3.24 = 0.772; that area is 73.5%, slightly
smaller than the corresponding area under the binomial histogram. See also the CONTINUITY CORRECTION. The
tool on this page illustrates the normal approximation to the BINOMIAL PROBABILITY HISTOGRAM. Note that the
approximation gets worse when p gets close to 0 or 1, and that the approximation improves as n increases.
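The worked example (n = 50, p = 30%, range 9.5 to 17.5) can be reproduced with a short sketch. The helper names binomial_pmf and normal_cdf are hypothetical; the normal area is computed from the error function in Python's math module, which is one of several possible ways to do it.

```python
from math import comb, erf, sqrt

def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_cdf(z):
    """Area under the normal curve to the left of z (z in standard units)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.3
expected_value = n * p                    # 15
standard_error = sqrt(n * p * (1 - p))    # about 3.24

# The histogram bins centered at 10, 11, ..., 17 cover the range 9.5 to 17.5.
exact_area = sum(binomial_pmf(k, n, p) for k in range(10, 18))

# Convert the endpoints to standard units, then take the normal area between them.
z_low = (9.5 - expected_value) / standard_error
z_high = (17.5 - expected_value) / standard_error
approx_area = normal_cdf(z_high) - normal_cdf(z_low)

print(f"binomial area {exact_area:.3f}, normal approximation {approx_area:.3f}")
```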
Normal curve.
The normal curve is the familiar "bell curve," illustrated on this page. The mathematical expression for the
normal curve is $y = (2\pi)^{-1/2} e^{-x^2/2}$, where π is the ratio of the circumference of a circle to its diameter
(3.14159265…), and e is the base of the natural logarithm (2.71828…). The normal curve is symmetric
around the point x = 0, and positive for every value of x. The area under the normal curve is unity, and the SD
of the normal curve, suitably defined, is also unity. Many (but not most) HISTOGRAMS, converted into STANDARD
UNITS, approximately follow the normal curve.
Normal distribution.
A random variable X has a normal distribution with mean m and standard error s if for every pair of numbers
a ≤ b, the chance that a < (X−m)/s < b is

P(a < (X−m)/s < b) = area under the normal curve between a and b.

If there are numbers m and s such that X has a normal distribution with mean m and standard error s, then
X is said to have a normal distribution or to be normally distributed. If X has a normal distribution with mean
m = 0 and standard error s = 1, then X is said to have a standard normal distribution. The notation X ~ N(m, s²)
means that X has a normal distribution with mean m and standard error s; for example, X ~ N(0, 1) means X
has a standard normal distribution.
NOT, !, Negation, Logical Negation.
The negation of a LOGICAL PROPOSITION p, !p, is a proposition that is the logical opposite of p. That is, if p is
true, !p is false, and if p is false, !p is true. Negation takes precedence over other logical operations. Other
common symbols for the negation operator include ¬ and ~.
Null hypothesis.
In HYPOTHESIS TESTING, the hypothesis we wish to falsify on the basis of the data. The null hypothesis is
typically that something is not present, that there is no effect, or that there is no difference between
treatment and control.

Observational Study.
C.f. CONTROLLED EXPERIMENT.
Odds.
The odds in favor of an EVENT is the ratio of the PROBABILITY that the event occurs to the probability that the
event does not occur. For example, suppose an experiment can result in any of n possible outcomes, all
equally likely, and that k of the outcomes result in a "win" and n−k result in a "loss." Then the chance of
winning is k/n; the chance of not winning is (n−k)/n; and the odds in favor of winning are (k/n)/((n−k)/n) =
k/(n−k), which is the number of favorable outcomes divided by the number of unfavorable outcomes. Note
that odds are not synonymous with probability, but the two can be converted back and forth. If the odds in
favor of an event are q, then the probability of the event is q/(1+q). If the probability of an event is p, the
odds in favor of the event are p/(1−p) and the odds against the event are (1−p)/p.
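The two conversions can be written as a pair of small functions. This is a sketch with hypothetical function names; it assumes 0 ≤ p < 1 and q ≥ 0 so that neither division is by zero.

```python
def odds_from_probability(p: float) -> float:
    """Odds in favor of an event that has probability p (requires p < 1)."""
    return p / (1 - p)

def probability_from_odds(q: float) -> float:
    """Probability of an event whose odds in favor are q."""
    return q / (1 + q)

# Example: n = 4 equally likely outcomes, k = 3 of which are wins.
n, k = 4, 3
win_probability = k / n                       # 0.75
win_odds = odds_from_probability(win_probability)
print(win_odds, k / (n - k))                  # both equal the ratio of wins to losses
print(probability_from_odds(win_odds))        # converts back to 0.75
```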
One-sided Test.
C.f. TWO-SIDED TEST. An hypothesis test of the NULL HYPOTHESIS that the value of a parameter, μ, is equal to a
null value, μ0, designed to have POWER against either the ALTERNATIVE HYPOTHESIS that μ < μ0 or the ALTERNATIVE
μ > μ0 (but not both). For example, a SIGNIFICANCE LEVEL 5%, one-sided Z TEST of the null hypothesis that the
mean of a population equals zero against the alternative that it is greater than zero, would reject the null
hypothesis for values of

$z = \frac{\text{SAMPLE MEAN}}{\text{SE(SAMPLE MEAN)}} > 1.64.$
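A minimal sketch of the one-sided z test in the example follows. It makes assumptions the glossary entry does not state: the SE of the sample mean is estimated from the sample standard deviation, and the critical value is taken from the standard normal curve via statistics.NormalDist. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sided_z_test(sample, null_mean=0.0, level=0.05):
    """Test the null 'population mean = null_mean' against 'mean > null_mean'."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)               # estimated SE of the sample mean (assumption)
    z = (mean(sample) - null_mean) / se
    critical = NormalDist().inv_cdf(1 - level)  # about 1.64 for a 5% level
    return z, critical, z > critical

z, critical, reject = one_sided_z_test([0.5, 1.5, 1.0, 2.0, 0.0, 1.0])
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject null: {reject}")
```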
or, |, Disjunction, Logical Disjunction, ∨.
An operation on two LOGICAL PROPOSITIONS. If p and q are two propositions, (p | q) is a proposition that is true if
p is true or if q is true (or both); otherwise, it is false. That is, (p | q) is true unless both p and q are false.
The operation | is sometimes represented by the symbol ∨ and sometimes by the word or. C.f. EXCLUSIVE
DISJUNCTION, XOR.
Ordinal Variable.
A VARIABLE whose possible values have a natural order, such as {short, medium, long}, {cold, warm, hot}, or
{0, 1, 2, 3, …}. In contrast, a variable whose possible values are {straight, curly} or {Arizona, California,
Montana, New York} would not naturally be ordinal. Arithmetic with the possible values of an ordinal variable
does not necessarily make sense, but it does make sense to say that one possible value is larger than
another.
Outcome Space.
The outcome space is the set of all possible outcomes of a given RANDOM EXPERIMENT. The outcome space is
often denoted by the capital letter S.
Outlier.
An outlier is an observation that is many SD's from the MEAN. It is sometimes tempting to discard outliers,
but this is imprudent unless the cause of the outlier can be identified, and the outlier is determined to be
spurious. Otherwise, discarding outliers can cause one to underestimate the true variability of the
measurement process.

P-value.
Suppose we have a family of HYPOTHESIS TESTS of a NULL HYPOTHESIS that let us test the hypothesis at any
SIGNIFICANCE LEVEL p between 0 and 100% we choose. The P-value of the NULL HYPOTHESIS given the data is the
smallest significance level p for which any of the tests would have rejected the null hypothesis.
For example, let X be a TEST STATISTIC, and for p between 0 and 100%, let $x_p$ be the smallest number such
that, under the null hypothesis,

$P(X \le x_p) \ge p.$

Then for any p between 0 and 100%, the rule

reject the null hypothesis if $X < x_p$

tests the null hypothesis at significance level p. If we observed X = x, the P-value of the null hypothesis
given the data would be the smallest p such that $x < x_p$.
Parameter.
A numerical property of a POPULATION, such as its MEAN.
Partition.
A partition of an EVENT A is a collection of events $\{A_1, A_2, A_3, \ldots\}$ such that the events in the collection are
DISJOINT, and their UNION is A. That is,

$A_j \cap A_k = \{\}$ unless j = k, and

$A = A_1 \cup A_2 \cup A_3 \cup \cdots$.

If the event A is not specified, it is assumed to be the entire OUTCOME SPACE S.
Payoff Matrix.
A way of representing what each player in a game wins or loses, as a function of his and his opponent's
STRATEGIES.
Percentile.
The pth percentile of a LIST is the smallest number such that at least p% of the numbers in the list are no
larger than it. The pth percentile of a RANDOM VARIABLE is the smallest number such that the chance that the
random variable is no larger than it is at least p%. C.f. QUANTILE.
Permutation.
A permutation of a set is an arrangement of the elements of the set in some order. If the set has n things in
it, there are n! different orderings of its elements. For the first element in an ordering, there are n possible
choices, for the second, there remain n−1 possible choices, for the third, there are n−2, etc., and for the nth
element of the ordering, there is a single choice remaining. By the fundamental rule of counting, the total
number of sequences is thus n×(n−1)×(n−2)×…×1 = n!. Similarly, the number of orderings of length k one can
form from n ≥ k things is n×(n−1)×(n−2)×…×(n−k+1) = n!/(n−k)!. This is denoted $_nP_k$, the number of
permutations of n things taken k at a time. C.f. COMBINATIONS.
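The two counts can be checked by brute-force enumeration. This sketch uses itertools.permutations and math.perm from the Python standard library; the four-element list is an arbitrary example.

```python
from itertools import permutations
from math import factorial, perm

items = ["a", "b", "c", "d"]
n, k = len(items), 2

# Every ordering of all n elements: there should be n! of them.
orderings_of_all = list(permutations(items))

# Every ordering of length k: there should be n!/(n-k)! of them.
orderings_of_k = list(permutations(items, k))

print(len(orderings_of_all), factorial(n))
print(len(orderings_of_k), factorial(n) // factorial(n - k), perm(n, k))
```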
Placebo.
A "dummy" TREATMENT that has no pharmacological effect; e.g., a sugar pill.
Placebo effect.
The belief or knowledge that one is being treated can itself have an effect that CONFOUNDS with the real effect
of the treatment. Subjects given a PLACEBO as a painkiller report statistically significant reductions in pain in
randomized experiments that compare them with subjects who receive no treatment at all. This very real
psychological effect of a placebo, which has no direct biochemical effect, is called the placebo effect.
Administering a placebo to the CONTROL GROUP is thus important in experiments with human subjects; this is
the essence of a BLIND EXPERIMENT.
Point of Averages.
In a SCATTERPLOT, the point whose coordinates are the ARITHMETIC MEANS of the corresponding variables. For
example, if the variable X is plotted on the horizontal axis and the variable Y is plotted on the vertical axis,
the point of averages has coordinates (mean of X, mean of Y).
Poisson Distribution.
The Poisson distribution is a DISCRETE PROBABILITY DISTRIBUTION that depends on one PARAMETER, m. If X is a
RANDOM VARIABLE with the Poisson distribution with parameter m, then the probability that X = k is

$e^{-m} m^{k} / k!$, k = 0, 1, 2, …,

where e is the base of the natural logarithm and ! is the FACTORIAL function. For all other values of k, the
probability is zero.
The EXPECTED VALUE of the Poisson distribution with parameter m is m, and the STANDARD ERROR of the Poisson
distribution with parameter m is $m^{1/2}$.
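The formula, and the standard facts that the expected value is m and the standard error is the square root of m, can be checked numerically. In this sketch the function name poisson_pmf, the value m = 2.5, and the truncation of the sums at k = 99 are all choices made for illustration.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k: int, m: float) -> float:
    """Chance that a Poisson random variable with parameter m equals k."""
    if k < 0:
        return 0.0
    return exp(-m) * m**k / factorial(k)

m = 2.5
ks = range(100)  # the chance of values beyond 99 is negligible for this m

total = sum(poisson_pmf(k, m) for k in ks)
expected_value = sum(k * poisson_pmf(k, m) for k in ks)
standard_error = sqrt(sum((k - expected_value) ** 2 * poisson_pmf(k, m) for k in ks))

print(total, expected_value, standard_error, sqrt(m))
```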
Population.
A collection of UNITS being studied. UNITS can be people, places, objects, epochs, drugs, procedures, or many
other things. Much of statistics is concerned with estimating numerical properties (PARAMETERS) of an entire
population from a RANDOM SAMPLE of UNITS from the population.
Population Mean.
The MEAN of the numbers in a numerical population. For example, the population mean of a box of numbered
tickets is the mean of the LIST comprised of all the numbers on all the tickets. The population mean is a
PARAMETER. C.f. SAMPLE MEAN.
Population Percentage.
The percentage of UNITS in a POPULATION that possess a specified property. For example, the percentage of a
given collection of registered voters who are registered as Republicans. If each unit that possesses the
property is labeled with "1," and each unit that does not possess the property is labeled with "0," the
population percentage is the same as the mean of that LIST of zeros and ones; that is, the population
percentage is the POPULATION MEAN for a population of zeros and ones. The population percentage is a
PARAMETER. C.f. SAMPLE PERCENTAGE.
Population Standard Deviation.
The STANDARD DEVIATION of the values of a variable for a population. This is a PARAMETER, not a STATISTIC. C.f.
SAMPLE STANDARD DEVIATION.
Post hoc ergo propter hoc.
"After this, therefore because of this." A fallacy of logic known since classical times: inferring a CAUSAL
RELATION from CORRELATION. Don't do this at home!
Power.
Refers to an HYPOTHESIS TEST. The power of a test against a specific ALTERNATIVE HYPOTHESIS is the chance that
the test correctly rejects the NULL HYPOTHESIS when the ALTERNATIVE HYPOTHESIS is true.
Premise, logical premise.
A premise is a PROPOSITION that is assumed to be true as part of a LOGICAL ARGUMENT.
Prima facie.
Latin for "at first glance." "On the face of it." Prima facie evidence for something is information that at first
glance supports the conclusion. On closer examination, that might not be true; there could be another
explanation for the evidence.
Principle of insufficient reason (Laplace).
Laplace's PRINCIPLE OF INSUFFICIENT REASON says that if there is no reason to believe that the possible outcomes
of an experiment are not EQUALLY LIKELY, one should assume that the outcomes are equally likely. This is an
example of a FALLACY called APPEAL TO IGNORANCE.
Probability.
The probability of an EVENT is a number between zero and 100%. The meaning (interpretation) of probability
is the subject of THEORIES OF PROBABILITY, which differ in their interpretations. However, any rule for assigning
probabilities to EVENTS has to satisfy the AXIOMS OF PROBABILITY.
Probability density function.
The chance that a CONTINUOUS RANDOM VARIABLE is in any range of values can be calculated as the area under a
curve over that range of values. The curve is the probability density function of the random variable. That is,
if X is a continuous random variable, there is a function f(x) such that for every pair of numbers a ≤ b,

P(a ≤ X ≤ b) = (area under f between a and b);

f is the probability density function of X. For example, the probability density function of a random variable
with a STANDARD NORMAL DISTRIBUTION is the NORMAL CURVE. Only continuous random variables have probability
density functions.
Probability Distribution.
The probability distribution of a RANDOM VARIABLE specifies the chance that the variable takes a value in any
subset of the real numbers. (The subsets have to satisfy some technical conditions that are not important for
this course.) The probability distribution of a RANDOM VARIABLE is completely characterized by the CUMULATIVE
PROBABILITY DISTRIBUTION FUNCTION; the terms sometimes are used synonymously. The probability distribution of
a DISCRETE RANDOM VARIABLE can be characterized by the chance that the RANDOM VARIABLE takes each of its
possible values. For example, the probability distribution of the total number of spots S showing on the roll
of two fair dice can be written as a table:

s     P(S=s)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

The probability distribution of a CONTINUOUS RANDOM VARIABLE can be characterized by its PROBABILITY DENSITY
FUNCTION.
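The two-dice table can be rebuilt by listing all 36 equally likely rolls and counting how many give each total. This is a sketch for illustration; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes, tallied by total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Chance of each total, kept as an exact fraction of 36.
distribution = {s: Fraction(counts[s], 36) for s in sorted(counts)}

for s in distribution:
    print(s, f"{counts[s]}/36")
```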
Probability Histogram.
A probability histogram for a RANDOM VARIABLE is analogous to a HISTOGRAM of data, but instead of plotting the
area of the BINS proportional to the relative frequency of observations in the CLASS INTERVAL, one plots the area
of the BINS proportional to the probability that the RANDOM VARIABLE is in the CLASS INTERVAL.
Probability Sample.
A sample drawn from a population using a random mechanism so that every element of the population has
a known chance of ending up in the sample.
Probability, Theories of.
A theory of probability is a way of assigning meaning to probability statements such as "the chance that a
thumbtack lands point up is 2/3." That is, a theory of probability connects the mathematics of probability,
which is the set of consequences of the AXIOMS OF PROBABILITY, with the real world of observation and
experiment. There are several common theories of probability. According to the frequency theory of
probability, the probability of an event is the limit of the percentage of times that the event occurs in
repeated, independent trials under essentially the same circumstances. According to the subjective theory of
probability, a probability is a number that measures how strongly we believe an event will occur. The number
is on a scale of 0% to 100%, with 0% indicating that we are completely sure it won't occur, and 100%
indicating that we are completely sure that it will occur. According to the theory of EQUALLY LIKELY OUTCOMES, if
an experiment has n possible outcomes, and (for example, by symmetry) there is no reason that any of the n
possible outcomes should occur preferentially to any of the others, then the chance of each outcome is
100%/n. Each of these theories has its limitations, its proponents, and its detractors.
Proposition, logical proposition.
A logical proposition is a statement that can be either true or false. For example, "the sun is shining in
Berkeley right now" is a proposition. See also &, |, XOR, CONVERSE, CONTRAPOSITIVE and LOGICAL
ARGUMENT.
Prosecutor's Fallacy.
The prosecutor's fallacy consists of confusing two CONDITIONAL PROBABILITIES: P(A|B) and P(B|A). For instance,
P(A|B) could be the chance of observing the evidence if the accused is guilty, while P(B|A) is the chance that
the accused is guilty given the evidence. The latter might not make sense at all, but even when it does, the
two numbers need not be equal. This fallacy is related to a common misinterpretation of P-VALUES.

Qualitative Variable.
A qualitative variable is one whose values are adjectives, such as colors, genders, nationalities, etc. C.f.
QUANTITATIVE VARIABLE and CATEGORICAL VARIABLE.
Quantile.
The qth quantile of a LIST (0 < q ≤ 1) is the smallest number such that the fraction q or more of the elements
of the list are less than or equal to it. I.e., if the list contains n numbers, the qth quantile is the smallest
number Q such that at least n×q elements of the list are less than or equal to Q.
Quantitative Variable.
A variable that takes numerical values for which arithmetic makes sense, for example, counts, temperatures,
weights, amounts of money, etc. For some variables that take numerical values, arithmetic with those values
does not make sense; such variables are not quantitative. For example, adding and subtracting social
security numbers does not make sense. Quantitative variables typically have units of measurement, such as
inches, people, or pounds.
Quartiles.
There are three quartiles. The first or lower quartile (LQ) of a LIST is a number (not necessarily a number in
the list) such that at least 1/4 of the numbers in the list are no larger than it, and at least 3/4 of the numbers
in the list are no smaller than it. The second quartile is the MEDIAN. The third or upper quartile (UQ) is a
number such that at least 3/4 of the entries in the list are no larger than it, and at least 1/4 of the numbers
in the list are no smaller than it. To find the quartiles, first sort the list into increasing order. Find the smallest
integer that is at least as big as the number of entries in the list divided by four. Call that integer k. The kth
element of the sorted list is the lower quartile. Find the smallest integer that is at least as big as the number
of entries in the list divided by two. Call that integer l. The lth element of the sorted list is the median. Find
the smallest integer that is at least as large as the number of entries in the list times 3/4. Call that integer
m. The mth element of the sorted list is the upper quartile.
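The procedure for finding the quartiles can be sketched as follows. The function name quartiles is hypothetical, and the sketch assumes a non-empty list; note that the text counts positions from 1 while Python indexes from 0.

```python
from math import ceil

def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the sorting rule above."""
    ordered = sorted(values)
    n = len(ordered)
    k = ceil(n / 4)        # smallest integer at least n/4
    l = ceil(n / 2)        # smallest integer at least n/2
    m = ceil(3 * n / 4)    # smallest integer at least 3n/4
    # The k-th element in the text's 1-based counting is index k-1 in Python.
    return ordered[k - 1], ordered[l - 1], ordered[m - 1]

print(quartiles([7, 1, 3, 5, 2, 8, 6, 4]))   # eight entries: k = 2, l = 4, m = 6
print(quartiles([10, 20, 30, 40, 50]))       # five entries: k = 2, l = 3, m = 4
```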
Quota Sample.
A quota sample is a sample picked to match the population with respect to some summary characteristics. It
is not a RANDOM SAMPLE. For example, in an opinion poll, one might select a sample so that the proportions of
various ethnicities in the sample match the proportions of ethnicities in the overall population from which the
sample is drawn. Matching on summary statistics does not guarantee that the sample comes close to
matching the population with respect to the quantity of interest. As a result, quota samples are typically
biased, and the size of the bias is generally impossible to determine unless the result can be compared with
a known result for the whole population or for a random sample. Moreover, with a quota sample, it is
impossible to quantify how representative of the population a quota sample is likely to be; quota sampling
does not allow one to quantify the likely size of SAMPLING ERROR. Quota samples are to be avoided, and results
based on quota samples are to be viewed with suspicion. See also CONVENIENCE SAMPLE.

Random Error.
All measurements are subject to error, which can often be broken down into two components: a BIAS or
SYSTEMATIC ERROR, which affects all measurements the same way; and a random error, which is in general
different each time a measurement is made, and behaves like a number drawn with replacement from a box
of numbered tickets whose average is zero.
Random Event.
See RANDOM EXPERIMENT.
Random Experiment.
An experiment or trial whose outcome is not perfectly predictable, but for which the long-run relative
frequency of outcomes of different types in repeated trials is predictable. Note that "random" is different
from "haphazard," which does not necessarily imply long-term regularity.
Random Sample.
A random sample is a SAMPLE whose members are chosen at random from a given POPULATION in such a way
that the chance of obtaining any particular sample can be computed. The number of units in the sample is
called the sample size, often denoted n. The number of units in the population often is denoted N. Random
samples can be drawn with or without replacing objects between draws; that is, drawing all n objects in the
sample at once (a random sample without replacement), or drawing the objects one at a time, replacing
them in the population between draws (a random sample with replacement). In a random sample with
replacement, any given member of the population can occur in the sample more than once. In a random
sample without replacement, any given member of the population can be in the sample at most once. A
random sample without replacement in which every subset of n of the N units in the population is equally
likely is also called a SIMPLE RANDOM SAMPLE. The term random sample with replacement denotes a random
sample drawn in such a way that every ordered sequence of n units from the population is equally likely. See also
PROBABILITY SAMPLE.
Random Variable.
A random variable is an assignment of numbers to possible outcomes of a RANDOM EXPERIMENT. For example,
consider tossing three coins. The number of heads showing when the coins land is a random variable: it
assigns the number 0 to the outcome {T, T, T}, the number 1 to the outcome {T, T, H}, the number 2 to the
outcome {T, H, H}, and the number 3 to the outcome {H, H, H}.
Randomized Controlled Experiment.
An EXPERIMENT in which chance is deliberately introduced in assigning SUBJECTS to the TREATMENT and CONTROL
groups. For example, we could write an identifying number for each subject on a slip of paper, stir up the
slips of paper, and draw slips without replacement until we have drawn half of them. The subjects identified
on the slips drawn could then be assigned to treatment, and the rest to control. Randomizing the assignment
tends to decrease CONFOUNDING of the treatment effect with other factors, by making the treatment and control
groups roughly comparable in all respects but the treatment.
Range.
The range of a set of numbers is the largest value in the set minus the smallest value in the set. Note that
as a statistical term, the range is a single number, not a range of numbers.
Real number.
Loosely speaking, the real numbers are all numbers that can be represented as fractions (rational numbers),
whether proper or improper, and all numbers in between the rational numbers. That is, the real numbers
comprise the rational numbers and all limits of Cauchy sequences of rational numbers, where the Cauchy
sequence is with respect to the absolute value metric. (More formally, the real numbers are the completion
of the set of rational numbers in the topology induced by the absolute value function.) The real numbers
contain all integers, all fractions, and all irrational (and transcendental) numbers, such as π, e, and $2^{1/2}$.
There are UNCOUNTABLY many real numbers between 0 and 1; in contrast, there are only COUNTABLY many
rational numbers between 0 and 1.
Regression, Linear Regression.
Linear regression fits a line to a SCATTERPLOT in such a way as to minimize the sum of the squares of the
RESIDUALS. The resulting regression line, together with the STANDARD DEVIATIONS of the two variables or their
CORRELATION COEFFICIENT, can be a reasonable summary of a scatterplot if the scatterplot is roughly football-
shaped. In other cases, it is a poor summary. If we are regressing the variable Y on the variable X, and if Y
is plotted on the vertical axis and X is plotted on the horizontal axis, the regression line passes through the
POINT OF AVERAGES, and has slope equal to the CORRELATION COEFFICIENT times the SD of Y divided by the SD of
X. This page shows a scatterplot, with a button to plot the regression line.
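The description of the regression line (through the point of averages, with slope r × SD of Y / SD of X) can be turned into a short sketch. The function name and the five data points are hypothetical; the SDs are computed with statistics.pstdev, which divides by the number of data rather than by one less.

```python
from statistics import mean, pstdev

def regression_line(xs, ys):
    """Slope and intercept of the regression of Y on X, plus the correlation r."""
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # Correlation coefficient: mean of the products of deviations, in standard units.
    r = mean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (sd_x * sd_y)
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x   # forces the line through the point of averages
    return slope, intercept, r

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = regression_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept, sum(residuals))   # the residuals sum to zero up to rounding
```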
Regression Fallacy.
The regression fallacy is to attribute the REGRESSION EFFECT to an external cause.
Regression Toward the Mean, Regression Effect.
Suppose one measures two VARIABLES for each member of a group of individuals, and that the CORRELATION
COEFFICIENT of the variables is positive (negative). If the value of the first variable for a given individual is above
average, the value of the second variable for that individual is likely to be above (below) average, but by
fewer STANDARD DEVIATIONS than the first variable is. That is, the second observation is likely to be closer to the
mean in STANDARD UNITS. For example, suppose one measures the heights of fathers and sons. Each
individual is a (father, son) pair; the two variables measured are the height of the father and the height of
the son. These two variables will tend to have a positive correlation coefficient: fathers who are taller than
average tend to have sons who are taller than average. Consider a (father, son) pair chosen at random from
this group. Suppose the father's height is 3 SD above the average of all the fathers' heights. (The SD is the
STANDARD DEVIATION of the fathers' heights.) Then the son's height is also likely to be above the average of the
sons' heights, but by fewer than 3 SD (here the SD is the STANDARD DEVIATION of the sons' heights).
Rejection region.
In an HYPOTHESIS TEST using a TEST STATISTIC, the rejection region is the set of values of the test statistic for
which we reject the NULL HYPOTHESIS.
Residual.
The difference between a datum and the value predicted for it by a model. In LINEAR REGRESSION of a variable
plotted on the vertical axis onto a variable plotted on the horizontal axis, a residual is the "vertical" distance
from a datum to the line. Residuals can be positive (if the datum is above the line) or negative (if the datum
is below the line). PLOTS OF RESIDUALS can reveal computational errors in linear regression, as well as
conditions under which linear regression is inappropriate, such as NONLINEARITY and HETEROSCEDASTICITY. If linear
regression is performed properly, the sum of the residuals from the regression line must be zero; otherwise,
there is a computational error somewhere.
Residual Plot.
A residual plot for a regression is a plot of the RESIDUALS from the regression against the EXPLANATORY variable.
Resistant.
A STATISTIC is said to be resistant if corrupting a datum cannot change the statistic much. The MEAN is not
resistant; the MEDIAN is. See also BREAKDOWN POINT.
Root-mean-square (RMS).
The RMS of a LIST is the square root of the mean of the squares of the elements in the list. It is a measure
of the average "size" of the elements of the list. To compute the RMS of a list, you square all the entries,
average the numbers you get, and take the square root of that average.
Root-mean-square error (RMSE).
The RMSE of an ESTIMATOR of a PARAMETER is the square root of the MEAN SQUARED ERROR (MSE) of the
estimator. In symbols, if X is an estimator of the parameter t, then

$\mathrm{RMSE}(X) = \left( E\left( (X - t)^2 \right) \right)^{1/2}$.

The RMSE of an estimator is a measure of the expected error of the estimator. The units of RMSE are the
same as the units of the estimator. See also MEAN SQUARED ERROR.
rms Error of Regression.
The rms error of regression is the RMS of the vertical RESIDUALS from the regression line. For regressing Y on
X, the rms error of regression is equal to $(1 - r^2)^{1/2} \times SD_Y$, where r is the CORRELATION COEFFICIENT between X
and Y and $SD_Y$ is the STANDARD DEVIATION of the values of Y.
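The identity for the rms error of regression can be checked on a small data set. This is a sketch with made-up data; it computes the RMS of the residuals directly and compares it with the formula, using statistics.pstdev for the SD (dividing by the number of data).

```python
from math import sqrt
from statistics import mean, pstdev

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(xs), mean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = mean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (sd_x * sd_y)

# Regression line of Y on X, then the vertical residuals from it.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# RMS of the residuals: square, average, take the square root.
rms_of_residuals = sqrt(mean(e * e for e in residuals))
formula_value = sqrt(1 - r**2) * sd_y

print(rms_of_residuals, formula_value)   # the two agree up to rounding
```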

Sample.
A sample is a collection of UNITS from a POPULATION. See also RANDOM SAMPLE.
Sample Mean.
The arithmetic MEAN of a RANDOM SAMPLE from a population. It is a STATISTIC commonly used to estimate the
POPULATION MEAN. Suppose there are n data, $\{x_1, x_2, \ldots, x_n\}$. The sample mean is $(x_1 + x_2 + \cdots + x_n)/n$. The
EXPECTED VALUE of the sample mean is the POPULATION MEAN. For sampling with replacement, the SE of the
sample mean is the population STANDARD DEVIATION, divided by the square root of the SAMPLE SIZE. For sampling
without replacement, the SE of the sample mean is the FINITE POPULATION CORRECTION $((N-n)/(N-1))^{1/2}$ times the
SE of the sample mean for sampling with replacement, with N the size of the population and n the size of
the sample.
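The two SE formulas for the sample mean can be sketched as one function. The name se_sample_mean and the six-ticket box are hypothetical; the population SD is computed with statistics.pstdev.

```python
from math import sqrt
from statistics import pstdev

def se_sample_mean(population, n, with_replacement=True):
    """SE of the mean of a random sample of size n drawn from the population."""
    N = len(population)
    se = pstdev(population) / sqrt(n)       # sampling with replacement
    if not with_replacement:
        se *= sqrt((N - n) / (N - 1))       # finite population correction
    return se

box = [1, 2, 3, 4, 5, 6]
print(se_sample_mean(box, 2))                           # with replacement
print(se_sample_mean(box, 2, with_replacement=False))   # smaller, by the correction factor
print(se_sample_mean(box, 6, with_replacement=False))   # sampling the whole box: no chance error
```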
Sample Percentage.
The percentage of a RANDOM SAMPLE with a certain property, such as the percentage of voters registered as
Democrats in a SIMPLE RANDOM SAMPLE of voters. The sample percentage is a statistic commonly used to estimate
the population percentage. The EXPECTED VALUE of the sample percentage from a SIMPLE RANDOM SAMPLE or a
random sample with replacement is the population percentage. The SE of the sample percentage for
sampling with replacement is $(p(1-p)/n)^{1/2}$, where p is the population percentage and n is the sample size.
The SE of the sample percentage for sampling without replacement is the FINITE POPULATION CORRECTION
$((N-n)/(N-1))^{1/2}$ times the SE of the sample percentage for sampling with replacement, with N the size of the
population and n the size of the sample. The SE of the sample percentage is often estimated by the
BOOTSTRAP.
Sample Size.
The number of elements in a sample from a population.
Sound argument.
A LOGICAL ARGUMENT is sound if it is LOGICALLY VALID and its PREMISES are in fact true. An argument can be
logically valid and yet not sound, if its premises are false.
Sample Standard Deviation, S.
The sample standard deviation S is an estimator of the STANDARD DEVIATION of a population based on a RANDOM
SAMPLE from the population. The sample standard deviation is a STATISTIC that measures how "spread out" the
sample is around the SAMPLE MEAN. It is quite similar to the STANDARD DEVIATION of the sample, but instead of
averaging the squared DEVIATIONS (to get the RMS of the DEVIATIONS of the data from the SAMPLE MEAN) it divides
the sum of the squared deviations by (number of data − 1) before taking the square root. Suppose there are
n data, $\{x_1, x_2, \ldots, x_n\}$, with mean $M = (x_1 + x_2 + \cdots + x_n)/n$. Then

$s = \left( \frac{(x_1 - M)^2 + (x_2 - M)^2 + \cdots + (x_n - M)^2}{n - 1} \right)^{1/2}$.

The square of the sample standard deviation, $S^2$ (the SAMPLE VARIANCE), is an UNBIASED estimator of
the square of the SD of the population (the variance of the population).
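The difference between the SD of the sample and the sample standard deviation is only the divisor. A short sketch with made-up data, cross-checked against statistics.pstdev (divides by n) and statistics.stdev (divides by n − 1):

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
M = sum(data) / n                                   # sample mean
sum_sq_dev = sum((x - M) ** 2 for x in data)        # sum of squared deviations

sd_of_sample = sqrt(sum_sq_dev / n)                 # RMS of the deviations
sample_sd = sqrt(sum_sq_dev / (n - 1))              # divide by (number of data - 1)

print(sd_of_sample, pstdev(data))
print(sample_sd, stdev(data))
```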

Sample Sum.
The sum of a RANDOM SAMPLE from a population. The EXPECTED VALUE of the sample sum is the SAMPLE SIZE
times the POPULATION MEAN. For sampling with replacement, the SE of the sample sum is the population
STANDARD DEVIATION, times the square root of the SAMPLE SIZE. For sampling without replacement, the SE of the
sample sum is the FINITE POPULATION CORRECTION $((N-n)/(N-1))^{1/2}$ times the SE of the sample sum for sampling
with replacement, with N the size of the population and n the size of the sample.
Sample Survey.
A survey based on the responses of a sample of individuals, rather than the entire population.
Sample Variance.
The sample variance is the square of the SAMPLE STANDARD DEVIATION S. It is an UNBIASED estimator of the
square of the population STANDARD DEVIATION, which is also called the variance of the population.
Sampling distribution.
The sampling distribution of an ESTIMATOR is the PROBABILITY DISTRIBUTION of the estimator when it is applied to
RANDOM SAMPLES. The tool on this page allows you to explore empirically the sampling distribution of the
SAMPLE MEAN and the SAMPLE PERCENTAGE of random draws with or without replacement from a box of
numbered tickets.
Sampling error.
In estimating from a RANDOM SAMPLE, the difference between the ESTIMATOR and the PARAMETER can be written as
the sum of two components: BIAS and sampling error. The BIAS is the average error of the estimator over all
possible samples. The bias is not random. Sampling error is the component of error that varies from sample
to sample. The sampling error is random: it comes from "the luck of the draw" in which UNITS happen to be in
the sample. It is the CHANCE VARIATION of the estimator. The average of the sampling error over all possible
samples (the EXPECTED VALUE of the sampling error) is zero. The STANDARD ERROR of the estimator is a measure
of the typical size of the sampling error.
Sampling unit.
A sample from a population can be drawn one UNIT at a time, or more than one unit at a time (one can
sample clusters of units). The fundamental unit of the sample is called the sampling unit. It need not be a
unit of the population.
Scatterplot.
A scatterplot is a way to visualize BIVARIATE data. A scatterplot is a plot of pairs of measurements on a
collection of "individuals" (which need not be people). For example, suppose we record the heights and
weights of a group of 100 people. The scatterplot of those data would be 100 points. Each point represents
one person's height and weight. In a scatterplot of weight against height, the x-coordinate of each point
would be the height of one person, and the y-coordinate of that point would be the weight of the same person. In a
scatterplot of height against weight, the x-coordinates would be the weights and the y-coordinates would be
the heights.
Scientific Method.
The scientific method.
SD line.
For a SCATTERPLOT, a line that goes through the POINT OF AVERAGES, with slope equal to the ratio of the STANDARD
DEVIATIONS of the two plotted variables. If the variable plotted on the horizontal axis is called X and the
variable plotted on the vertical axis is called Y, the slope of the SD line is the SD of Y, divided by the SD of
X.
Secular Trend.
A linear association (trend) with time.
Selection Bias.
A systematic tendency for a sampling procedure to include and/or exclude UNITS of a certain type. For
example, in a QUOTA SAMPLE, unconscious prejudices or predilections on the part of the interviewer can result
in selection bias. Selection bias is a potential problem whenever a human has latitude in selecting individual
units for the sample; it tends to be eliminated by PROBABILITY SAMPLING schemes in which the interviewer is told
exactly whom to contact (with no room for individual choice).
Self-Selection.
Self-selection occurs when individuals decide for themselves whether they are in the CONTROL GROUP or the
TREATMENT GROUP. Self-selection is quite common in studies of human behavior. For example, studies of the
effect of smoking on human health involve self-selection: individuals choose for themselves whether or not to
smoke. Self-selection precludes an EXPERIMENT; it results in an OBSERVATIONAL STUDY. When there is
self-selection, one must be wary of possible confounding from factors that influence individuals' decisions to
belong to the treatment group.
Set.
A set is a collection of things (called elements), without regard to their order. An item is either in a set (it is
an element of the set), or it is not. It cannot be in the set more than once. Two sets are equal if they contain
exactly the same elements. For instance, the set {1, 2, 3, 4} is equal to the set {1, 4, 3, 2}, but not to the set
{0, 1, 2, 3}. As another example, the set {1, 2, 2} is equal to the set {1, 2}: they have the same two (distinct)
elements, 1 and 2.
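Python's built-in `set` type happens to follow the same rules, so the examples in this entry can be checked directly. This is only an illustration; the variable names are arbitrary.

```python
a = {1, 2, 3, 4}
b = {1, 4, 3, 2}   # same elements, different order
c = {0, 1, 2, 3}   # different elements
d = {1, 2, 2}      # the repeated 2 is stored only once

order_does_not_matter = (a == b)
different_elements_differ = (a != c)
repeats_collapse = (d == {1, 2}) and (len(d) == 2)
```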
Significance,Significancelevel,Statisticalsignificance.
Thesignificancelevelofan HYPOTHESISTESTisthechancethatthetesterroneouslyrejectsthe NULLHYPOTHESIS
whenthe NULLHYPOTHESISistrue.
Simple Random Sample.
A simple random sample of n units from a population is a random sample drawn by a procedure that is
equally likely to give every collection of n units from the population; that is, the probability that the sample
will consist of any given subset of n of the N units in the population is $1/{}_N C_n$. Simple random sampling is
sampling at random without replacement (without replacing the units between draws). A simple random
sample of size n from a population of N ≥ n units can be constructed by assigning a random number
between zero and one to each unit in the population, then taking those units that were assigned the n
largest random numbers to be the sample.
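The construction described in this entry (tag every unit with a random number, keep the n units with the largest tags) can be sketched as follows. The function name `simple_random_sample` and the optional `rng` argument are illustrative choices, not part of the glossary.

```python
import random

def simple_random_sample(population, n, rng=None):
    """Draw n units without replacement by ranking random tags."""
    rng = rng or random.Random()
    if n > len(population):
        raise ValueError("sample size n cannot exceed population size N")
    # Assign each unit (by index) a random number between zero and one.
    tagged = [(rng.random(), index) for index in range(len(population))]
    # Keep the units that received the n largest random numbers.
    tagged.sort(reverse=True)
    return [population[index] for _, index in tagged[:n]]

population = list(range(100))
sample = simple_random_sample(population, 10, random.Random(0))
```

In practice, `random.sample(population, n)` from the standard library draws a sample without replacement directly; the version above is written out only to mirror the wording of the definition.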
Simpson'sParadox.
Whatistrueforthepartsisnotnecessarilytrueforthewhole.Seealso CONFOUNDING.
SkewedDistribution.
Adistributionthatisnot SYMMETRICAL.
Spread,Measureof.
Seealso INTERQUARTILERANGE, RANGE,and STANDARDDEVIATION.
Square Root Law.
The Square Root Law says that the STANDARD ERROR (SE) of the SAMPLE SUM of n random draws with
replacement from a box of tickets with numbers on them is

$\mathrm{SE}(\text{sample sum}) = \sqrt{n} \times \mathrm{SD}(\text{box}),$

and the STANDARD ERROR of the SAMPLE MEAN of n random draws with replacement from a box of
tickets is

$\mathrm{SE}(\text{sample mean}) = \mathrm{SD}(\text{box}) / \sqrt{n},$

where SD(box) is the STANDARD DEVIATION of the list of the numbers on all the tickets in the box (including
repeated values).
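A short numerical sketch of the two formulas, using a made-up 0-1 box with three tickets labeled 0 and one labeled 1. The function names are hypothetical.

```python
import math
import statistics

def se_sample_sum(box, n):
    """SE of the sum of n draws with replacement: sqrt(n) * SD(box)."""
    return math.sqrt(n) * statistics.pstdev(box)

def se_sample_mean(box, n):
    """SE of the mean of n draws with replacement: SD(box) / sqrt(n)."""
    return statistics.pstdev(box) / math.sqrt(n)

box = [0, 0, 0, 1]   # made-up box; repeated values are listed individually
n = 100
se_sum = se_sample_sum(box, n)
se_mean = se_sample_mean(box, n)
```

Dividing the SE of the sample sum by n gives the SE of the sample mean, which is one way to see how the two formulas fit together.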
StandardDeviation(SD).
Thestandarddeviationofasetofnumbersisthe RMSofthesetof DEVIATIONSbetweeneachelementofthe
setandthe MEANoftheset.Seealso SAMPLESTANDARDDEVIATION.
Standard Error (SE).
The Standard Error of a RANDOM VARIABLE is a measure of how far it is likely to be from its EXPECTED VALUE; that
is, its scatter in repeated experiments. The SE of a random variable X is defined to be

$\mathrm{SE}(X) = \sqrt{E\big((X - E(X))^2\big)}.$

That is, the standard error is the square root of the EXPECTED squared difference between the random variable
and its expected value. The SE of a random variable is analogous to the SD of a LIST.
StandardNormalCurve.
See NORMALCURVE.
StandardUnits.
Avariable(asetofdata)issaidtobeinstandardunitsifits MEANiszeroandits STANDARDDEVIATIONisone.
You TRANSFORMasetofdataintostandardunitsbysubtractingthe MEANfromeachelementofthe LIST,and
dividingtheresultsbythe STANDARDDEVIATION.A RANDOMVARIABLEissaidtobeinstandardunitsifits EXPECTED
VALUEiszeroandits STANDARDERRORisone.You TRANSFORMa RANDOMVARIABLEtostandardunitsbysubtracting
its EXPECTEDVALUEthendividingbyits STANDARDERROR.
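For a list of data, the transformation to standard units can be sketched as below. The function name `to_standard_units` and the data are hypothetical; the SD used is the SD of the list (divisor equal to the number of data).

```python
import statistics

def to_standard_units(data):
    """Subtract the mean from each element, then divide by the SD."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    if sd == 0:
        raise ValueError("a list with SD zero cannot be standardized")
    return [(x - mean) / sd for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, SD 2
standardized = to_standard_units(data)
```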
Standardize.
To TRANSFORMinto STANDARDUNITS.
Statistic.
Anumberthatcanbecomputedfromdata,involvingnounknown PARAMETERS.Asafunctionofa RANDOM
SAMPLE,astatisticisa RANDOMVARIABLE.Statisticsareusedtoestimate PARAMETERS,andto TESTHYPOTHESES.
StratifiedSample.
Inastratifiedsample,subsetsofsamplingunitsareselectedseparatelyfromdifferent STRATA,ratherthan
fromthe FRAMEasawhole.
Stratifiedsampling
Theactofdrawinga STRATIFIEDSAMPLE.
Stratum.
In random sampling, sometimes the sample is drawn separately from different DISJOINT SUBSETS of the
population. Each such subset is called a stratum. (The plural of stratum is strata.) Samples drawn in such a
way are called STRATIFIED SAMPLES. Estimators based on stratified random samples can have smaller SAMPLING
ERRORS than estimators computed from SIMPLE RANDOM SAMPLES of the same size, if the average variability of
the VARIABLE of interest within strata is smaller than it is across the entire POPULATION; that is, if stratum
membership is associated with the variable.
For example, to determine average home prices in the U.S., it would be advantageous to stratify on
geography, because average home prices vary enormously with location. We might divide the country into
states, then divide each state into urban, suburban, and rural areas; then draw random samples separately
from each such division.
Studentizedscore
Theobservedvalueofastatistic,minustheexpectedvalueofthestatistic,dividedbytheestimated
standarderrorofthestatistic.
Student's t curve.
Student's t curve is a family of curves indexed by a parameter called the degrees of freedom, which can take
the values 1, 2, … . Student's t curve is used to approximate some probability histograms. Consider a
population of numbers that are NEARLY NORMALLY DISTRIBUTED and have POPULATION MEAN μ. Consider drawing a
RANDOM SAMPLE of size n with replacement from the population, and computing the SAMPLE MEAN M and the
SAMPLE STANDARD DEVIATION S. Define the RANDOM VARIABLE

$T = (M - \mu) / (S / \sqrt{n}).$

If the sample size n is large, the PROBABILITY HISTOGRAM of T can be approximated accurately by the NORMAL
CURVE. However, for small and intermediate values of n, Student's t curve with n−1 degrees of freedom
gives a better approximation. That is,

P(a < T < b) is approximately the area under Student's t curve with n−1 degrees
of freedom, from a to b.

Student's t curve can be used to test hypotheses about the population mean and construct confidence
intervals for the population mean, when the population distribution is known to be NEARLY NORMALLY DISTRIBUTED.
This page contains a tool that shows Student's t curve and lets you find the area under parts of the curve.
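The random variable T defined in this entry can be computed from a sample as in the sketch below. It assumes that the sample standard deviation S uses the divisor n−1 (which is what `statistics.stdev` computes); the function name `t_statistic`, the data, and the hypothesized mean are made up.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (T, degrees of freedom) for T = (M - mu) / (S / sqrt(n))."""
    n = len(sample)
    m = statistics.fmean(sample)     # sample mean M
    s = statistics.stdev(sample)     # sample SD S, divisor n - 1 (assumed)
    return (m - mu) / (s / math.sqrt(n)), n - 1

sample = [1, 2, 3, 4, 5]   # made-up data
t_value, degrees_of_freedom = t_statistic(sample, mu=2)
```

Finding the area under Student's t curve for a given number of degrees of freedom needs a table or a numerical library; that step is deliberately left out of this sketch.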
Subject,ExperimentalSubject.
Amemberofthe CONTROLGROUPorthe TREATMENTGROUP.
Subset.
Asubsetofagivensetisacollectionofthingsthatbelongtotheoriginalset.Everyelementofthesubset
mustbelongtotheoriginalset,butnoteveryelementoftheoriginalsetneedbeinasubset(otherwise,a
subsetwouldalwaysbeidenticaltothesetitcamefrom).
Survey.
See SAMPLESURVEY.
Symmetric Distribution.
The probability distribution of a random variable X is symmetric if there is a number a such that the chance
that X ≥ a+b is the same as the chance that X ≤ a−b for every value of b. A LIST of numbers has a symmetric
distribution if there is a number a such that the fraction of numbers in the list that are greater than or equal
to a+b is the same as the fraction of numbers in the list that are less than or equal to a−b, for every value of
b. In either case, the histogram or the probability histogram will be symmetrical about a vertical line drawn at
x = a.
Systematicerror.
Anerrorthataffectsallthemeasurementssimilarly.Forexample,ifaruleristooshort,everythingmeasured
withitwillappeartobelongerthanitreallyis(ignoring RANDOMERROR).Ifyourwatchrunsfast,everytime
intervalyoumeasurewithitwillappeartobelongerthanitreallyis(again,ignoring RANDOMERROR).
Systematicerrorsdonottendtoaverageout.
Systematic sample.
A systematic sample from a FRAME of UNITS is one drawn by listing the units and selecting every kth element
of the list. For example, if there are N units in the frame, and we want a sample of size N/10, we would take
every tenth unit: the first unit, the eleventh unit, the 21st unit, etc. Systematic samples are not RANDOM
SAMPLES, but they often behave essentially as if they were random, if the order in which the units appear in
the list is haphazard. Systematic samples are a special case of cluster samples.
Systematic random sample.
A SYSTEMATIC SAMPLE starting at a random point in the listing of UNITS in the FRAME, instead of starting at the
first unit. Systematic random sampling is better than systematic sampling, but typically not as good as SIMPLE
RANDOM SAMPLING.
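The two sampling schemes can be sketched with list slicing. The function names are hypothetical, and the random version assumes that "a random point" means a starting position chosen uniformly among the first k units of the listing; the glossary does not spell this out.

```python
import random

def systematic_sample(frame, k, start=0):
    """Take every kth unit of the listing, beginning at index `start`."""
    return frame[start::k]

def systematic_random_sample(frame, k, rng=None):
    """Systematic sample with a random start among the first k units (assumed)."""
    rng = rng or random.Random()
    return systematic_sample(frame, k, start=rng.randrange(k))

frame = list(range(1, 31))                 # units numbered 1 through 30
every_tenth = systematic_sample(frame, 10)
random_start = systematic_random_sample(frame, 10, random.Random(1))
```

With 30 units and k = 10, the non-random version returns the first, eleventh, and 21st units, matching the example in the entry.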

ttest.
An HYPOTHESISTESTbasedonapproximatingthe PROBABILITYHISTOGRAMofthe TESTSTATISTICby STUDENT'STCURVE.
ttestsusuallyareusedtotesthypothesesaboutthe MEANOFAPOPULATIONwhenthe SAMPLESIZEis
intermediateandthedistributionofthepopulationisknowntobe NEARLYNORMAL.
TestStatistic.
A STATISTICusedto TESTHYPOTHESES.Anhypothesistestcanbeconstructedbydecidingtorejectthe NULL
HYPOTHESISwhenthevalueoftheteststatisticisinsomerangeorcollectionofranges.Togetatestwitha
specified SIGNIFICANCELEVEL,thechancewhenthenullhypothesisistruethattheteststatisticfallsinthe
rangewherethehypothesiswouldberejectedmustbeatmostthespecifiedsignificancelevel.The Z
STATISTICisacommonteststatistic.
Transformation.
Transformations turn LISTS into other lists, or variables into other variables. For example, to transform a list of
temperatures in degrees Celsius into the corresponding list of temperatures in degrees Fahrenheit, you
multiply each element by 9/5, and add 32 to each product. This is an example of an affine transformation:
multiply by something and add something (y = ax + b is the general affine transformation of x; it's the
familiar equation of a straight line). In a linear transformation, you only multiply by something (y = ax). Affine
transformations are used to put variables in STANDARD UNITS. In that case, you subtract the MEAN and divide the
results by the SD. This is equivalent to multiplying by the reciprocal of the SD and adding the negative of
the MEAN, divided by the SD, so it is an affine transformation. Affine transformations with positive
multiplicative constants have a simple effect on the MEAN, MEDIAN, MODE, QUARTILES, and other PERCENTILES: the
new value of any of these is the old one, transformed using exactly the same formula. When the
multiplicative constant is negative, the MEAN, MEDIAN, and MODE are still transformed by the same rule, but
quartiles and percentiles are reversed: the qth quantile of the transformed distribution is the transformed
value of the (1−q)th quantile of the original distribution (ignoring the effect of data spacing). The effect of an
affine transformation on the SD, RANGE, and IQR is to make the new value the old value times the absolute
value of the number you multiplied the first list by: what you added does not affect them.
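The claims about the mean and the SD can be checked numerically on a small made-up list of temperatures. The helper name `affine` is hypothetical.

```python
import statistics

def affine(data, a, b):
    """Apply y = a*x + b to every element of the list."""
    return [a * x + b for x in data]

celsius = [0, 10, 20, 30]                    # made-up temperatures
fahrenheit = affine(celsius, 9 / 5, 32)      # positive multiplier
flipped = affine(celsius, -2, 5)             # negative multiplier

# The mean is transformed by the same formula as the data.
mean_f = statistics.fmean(fahrenheit)
mean_from_formula = 9 / 5 * statistics.fmean(celsius) + 32

# The SD is multiplied by |a|; the added constant has no effect.
sd_f = statistics.pstdev(fahrenheit)
sd_flipped = statistics.pstdev(flipped)
sd_c = statistics.pstdev(celsius)
```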
Treatment.
Thesubstanceorprocedurestudiedinan EXPERIMENTor OBSERVATIONALSTUDY.Atissueiswhetherthe
treatmenthasaneffectontheoutcomeorvariableofinterest.
TreatmentEffect.
Theeffectofthe TREATMENTonthevariableofinterest.Establishingwhetherthetreatmenthasaneffectis
thepointofan EXPERIMENT.
Treatmentgroup.
Theindividualswhoreceivethe TREATMENT,asopposedtothoseinthe CONTROLGROUP,whodonot.
Tuple, n-tuple.
A TUPLE is an ordered collection of things. Two tuples are equal if they contain the same things, in the same
order. For instance, the tuple (1, 2, 3) is equal to the tuple (1, 2, 3) but not equal to the tuple (1, 3, 2).
Tuples can contain repeated elements. For instance, the tuple (1, 2, 2) is not equal to the tuple (1, 2), nor to
the tuple (2, 2, 1). An n-tuple, where n is an integer, is a tuple with n positions. For example, (1, 2) is a
2-tuple (aka ordered pair) and (7, 3, 2, 2, 2, 1) is a 6-tuple.
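Python's built-in `tuple` type behaves the same way, so the examples in this entry can be checked directly. This is purely illustrative.

```python
same_order_equal = ((1, 2, 3) == (1, 2, 3))
different_order_unequal = ((1, 2, 3) != (1, 3, 2))
# Repeats count: (1, 2, 2) differs from both (1, 2) and (2, 2, 1).
repeats_matter = ((1, 2, 2) != (1, 2)) and ((1, 2, 2) != (2, 2, 1))
pair_length = len((1, 2))                   # a 2-tuple
six_tuple_length = len((7, 3, 2, 2, 2, 1))  # a 6-tuple
```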
Two-sided Hypothesis test.
C.f. ONE-SIDED TEST. An hypothesis test of the NULL HYPOTHESIS that the value of a parameter, μ, is equal to a
null value, μ0, designed to have POWER against the ALTERNATIVE HYPOTHESIS that either μ < μ0 or μ > μ0 (the
ALTERNATIVE HYPOTHESIS contains values on both sides of the null value). For example, a SIGNIFICANCE LEVEL 5%,
two-sided Z TEST of the null hypothesis that the mean of a population equals zero against the alternative that
it differs from zero would reject the null hypothesis for values of

$|z| = \left| \dfrac{\text{sample mean}}{\mathrm{SE}} \right| > 1.96.$

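A minimal sketch of the two-sided z test in this example, using only the Python standard library. The function name and its arguments are hypothetical, and the SE of the sample mean is taken as given rather than estimated from data.

```python
from statistics import NormalDist

def two_sided_z_test(sample_mean, se, null_value=0.0, level=0.05):
    """Return (z, p_value, reject) for a two-sided z test."""
    z = (sample_mean - null_value) / se
    # Cutoff for |z|: about 1.96 when the significance level is 5%.
    critical = NormalDist().inv_cdf(1 - level / 2)
    # Area under the normal curve in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > critical

z_hi, p_hi, reject_hi = two_sided_z_test(2.5, 1.0)
z_lo, p_lo, reject_lo = two_sided_z_test(-2.5, 1.0)
z_mid, p_mid, reject_mid = two_sided_z_test(1.0, 1.0)
```

Because the test is two-sided, large negative and large positive values of z lead to rejection alike.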

TypeIandTypeIIerrors.
Thesereferto HYPOTHESISTESTING.ATypeIerroroccurswhenthe NULLHYPOTHESISisrejectederroneouslywhen
itisinfacttrue.ATypeIIerroroccursifthe NULLHYPOTHESISisnotrejectedwhenitisinfactfalse.Seealso
SIGNIFICANCELEVELand POWER.

Unbiased.
Not BIASED; having zero bias.
Uncontrolled Experiment.
An EXPERIMENT in which there is no CONTROL GROUP; i.e., in which the METHOD OF COMPARISON is not used: the
experimenter decides who gets the TREATMENT, but the outcome of the treated group is not compared with the
outcome of a CONTROL GROUP that does not receive TREATMENT.
Uncorrelated.
Asetof BIVARIATEdataisuncorrelatedifits CORRELATIONCOEFFICIENTiszero.Two RANDOMVARIABLESare
uncorrelatedifthe EXPECTEDVALUEoftheirproductequalstheproductoftheir EXPECTEDVALUES.Iftworandom
variablesare INDEPENDENT,theyareuncorrelated.(The CONVERSEisnottrueingeneral.)
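To illustrate why the converse fails, here is a small made-up example: X takes the values −1, 0, and 1 with equal chance, and Y = X². The two are uncorrelated even though Y is completely determined by X. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

outcomes = [-1, 0, 1]
p = Fraction(1, 3)   # each value of X is equally likely

e_x = sum(p * x for x in outcomes)             # E(X)
e_y = sum(p * x**2 for x in outcomes)          # E(Y), with Y = X^2
e_xy = sum(p * x * x**2 for x in outcomes)     # E(XY) = E(X^3)

uncorrelated = (e_xy == e_x * e_y)

# Independence would require P(X=0 and Y=0) = P(X=0) * P(Y=0).
p_x0 = p
p_y0 = p                 # Y = 0 exactly when X = 0
p_x0_and_y0 = p
independent = (p_x0_and_y0 == p_x0 * p_y0)
```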
Uncountable.
Asetisuncountableifitisnot COUNTABLE.
Unimodal.
Havingexactlyone MODE.
Union.
The union of two or more sets is the set of objects contained by at least one of the sets. The union of the
EVENTS A and B is denoted "A+B", "A or B", or "A∪B". C.f. INTERSECTION.
Unit.
Amemberofa POPULATION.
Univariate.
Havingorhavingtodowithasingle VARIABLE.Someunivariatetechniquesand STATISTICSincludethe
HISTOGRAM, IQR , MEAN, MEDIAN, PERCENTILES, QUANTILES,and SD .C.f. BIVARIATE.
UpperQuartile(UQ).
See QUARTILES.

Valid(logical)argument.
Avalid LOGICALARGUMENTisoneinwhichthetruthofthe PREMISESindeedguaranteesthetruthofthe
CONCLUSION.Forexample,thefollowinglogicalargumentisvalid:Iftheforecastcallsforrain,Iwillnotwear
sandals.Theforecastcallsforrain.Therefore,Iwillnotwearsandals.Thisargumenthastwopremises
which,together,guaranteethetruthoftheconclusion.Anargumentcanbelogicallyvalidevenifits
premisesarefalse.Seealso INVALIDARGUMENTand SOUNDARGUMENT.
Variable.
Anumericalvalueoracharacteristicthatcandifferfromindividualtoindividual.Seealso CATEGORICAL
VARIABLE, QUALITATIVEVARIABLE, QUANTITATIVEVARIABLE, DISCRETEVARIABLE, CONTINUOUSVARIABLE,and RANDOMVARIABLE.
Variance, population variance.
The variance of a LIST is the square of the STANDARD DEVIATION of the list, that is, the average of the squares of
the deviations of the numbers in the list from their mean. The variance of a random variable X, Var(X), is the
expected value of the squared difference between the variable and its expected value:
$\mathrm{Var}(X) = E\big((X - E(X))^2\big)$. The variance of a random variable is the square of the STANDARD ERROR (SE) of the variable.
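Both definitions can be worked through on small made-up examples: the variance of a list, and the variance of the number shown by one roll of a fair six-sided die. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction
import statistics

# Variance of a list: average squared deviation from the mean.
data = [1, 2, 3, 4, 5]
mean = Fraction(sum(data), len(data))
list_variance = sum((x - mean) ** 2 for x in data) / len(data)
sd_squared = statistics.pstdev(data) ** 2   # square of the SD of the list

# Variance of a random variable: E((X - E(X))^2) for a fair die.
faces = range(1, 7)
p = Fraction(1, 6)
expected_value = sum(p * x for x in faces)
die_variance = sum(p * (x - expected_value) ** 2 for x in faces)
```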
Venn Diagram.
A pictorial way of showing the relations among SETS or EVENTS. The universal set or OUTCOME SPACE is usually
drawn as a rectangle; sets are regions within the rectangle. The overlap of the regions corresponds to the
INTERSECTION of the sets. If the regions do not overlap, the sets are DISJOINT. The part of the rectangle included
in one or more of the regions corresponds to the UNION of the sets. This page contains a tool that illustrates
Venn diagrams; the tool represents the probability of an event by the area of the event.

XOR,exclusivedisjunction.
XORisanoperationontwo LOGICALPROPOSITIONS.Ifpandqaretwo PROPOSITIONS,(pXORq)isaproposition
thatistrueifeitherpistrueorifqistrue,butnotboth.(pXORq)is LOGICALLYEQUIVALENTto((p|q)&!(p&
q)).
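The logical equivalence stated in this entry can be checked by enumerating all four combinations of truth values. The function name `xor` is an illustrative choice; the expression inside it is the one given above, written with Python's `or`, `and`, and `not`.

```python
from itertools import product

def xor(p, q):
    """(p | q) & !(p & q): true if exactly one of p and q is true."""
    return (p or q) and not (p and q)

# Truth table over every combination of truth values for p and q.
truth_table = {(p, q): xor(p, q) for p, q in product([False, True], repeat=2)}
```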

zscore
Theobservedvalueofthe ZSTATISTIC.
Z statistic.
A Z statistic is a TEST STATISTIC whose distribution under the NULL HYPOTHESIS has EXPECTED VALUE zero and can
be approximated well by the NORMAL CURVE. Usually, Z statistics are constructed by STANDARDIZING some other
STATISTIC. The Z statistic is related to the original statistic by

Z = (original − EXPECTED VALUE of original) / SE(original).

ztest
An HYPOTHESISTESTbasedonapproximatingthe PROBABILITYHISTOGRAMofthe ZSTATISTICunderthe NULL
HYPOTHESISbythe NORMALCURVE.

© 1997-2017. P.B. Stark. All rights reserved.
Last generated 6/7/2017, 3:16:20 PM. Content last modified 11 January 2017 07:37 PDT.
