
Using R to Analyze Data (Part 3)

Normal Distributions
Some typical assignment problems can be solved by knowing the Empirical Rule (also known as the 68-95-99.7 rule). For any normal distribution:
The total area under the curve is 100%.
50% of values lie on each side of the mean.
Approximately 68% of values lie between -1 and +1 standard deviations from the mean.
Approximately 95% of values lie between -2 and +2 standard deviations from the mean.
Approximately 99.7% of values lie between -3 and +3 standard deviations from the mean.
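The percentages in the rule can be checked in R with pnorm(), which returns the area under the normal curve to the left of a value. A minimal sketch using the standard normal (mean 0, standard deviation 1); the names `k` and `within_k_sd` are only illustrative:

```r
# Area to the left of the mean: half of the total area under the curve
pnorm(0)                        # 0.5

# Area between -k and +k standard deviations, for k = 1, 2, 3
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)           # 0.6827 0.9545 0.9973
```

Because these are areas of the standard normal, the same proportions hold for any mean and standard deviation; for example, pnorm(5, mean=3, sd=1) - pnorm(1, mean=3, sd=1) also gives about 0.9545.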
For these types of problems, you can reason your way to the answer without R. However, this only works when you are given specific information, such as being told that Linda's daily coffee consumption of 4 cups is exactly one standard deviation above the mean. The rule doesn't help as much if we know that she drinks 4.32 cups.
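When a value does not fall on a whole number of standard deviations, pnorm() still gives the answer directly. The text does not state Linda's mean and standard deviation, so this sketch assumes a mean of 3 cups and a standard deviation of 1 (the figures used in the Percentiles example, which put 4 cups exactly one standard deviation above the mean):

```r
# ASSUMED figures (not given for Linda in the text): mean 3, sd 1
mu <- 3
sigma <- 1

# 4 cups sits exactly one standard deviation above the mean
(4 - mu) / sigma                # 1

# 4.32 cups is 1.32 standard deviations above the mean
z <- (4.32 - mu) / sigma
pnorm(z)                        # about 0.907: proportion drinking less than 4.32 cups
```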
Percentiles
Suppose the mean daily coffee consumption is 3 cups with a standard deviation of 1 cup. Well, we can use the Empirical Rule to state that about 95% of people drink between 1 and 5 cups per day. But what if we are asked to find the 83rd percentile?
Technically, you are trying to find the inverse of the cumulative distribution function (the quantile function). But, to bring this down to earth, in R you would type
qnorm(0.83, mean=3, sd=1)
and you learn that the 83rd percentile corresponds to 3.95 cups of coffee.
What if the question was reversed and they wanted to know the percentile for 3.95 cups of coffee? In that case, you would enter
pnorm(3.95, mean=3, sd=1)   # about 0.83, i.e. the 83rd percentile
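qnorm() and pnorm() undo each other, which gives a quick way to check an answer. A short sketch (the variable name `cups` is illustrative):

```r
# 83rd percentile of daily coffee consumption (mean 3, sd 1)
cups <- qnorm(0.83, mean = 3, sd = 1)
round(cups, 2)                          # 3.95

# Feeding that value back into pnorm() recovers the 0.83
pnorm(cups, mean = 3, sd = 1)           # 0.83

# Equivalent: shift and scale the standard normal quantile
3 + 1 * qnorm(0.83)
```

With the rounded value, pnorm(3.95, mean=3, sd=1) returns about 0.829, which is why the reverse question lands on the 83rd percentile.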
Determining areas under the curve
Suppose we want to know the proportion of people drinking more than 4 cups of coffee per day. Well, we know that the mean is 3 and that 4 cups is just one standard deviation above the mean. So, we could use some logic and the Empirical Rule. But, to do this in R, we could again use pnorm:
pnorm(4, mean=3, sd=1)
But note that the answer is the proportion less than 4, and we want the proportion greater than 4. Well, the total proportion is 1.0, right? So just subtract R's answer from 1.
R tells us that .841 drink less than 4 cups of coffee a day. So, 1 - .841 = .159 (or 15.9%) drink more than 4 cups of coffee a day.
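The subtraction can also be left to R: pnorm() has a lower.tail argument, and setting lower.tail = FALSE returns the area to the right of the value instead of the left. A minimal sketch comparing the two approaches:

```r
# Proportion drinking less than 4 cups (area to the left)
below <- pnorm(4, mean = 3, sd = 1)
round(below, 3)                                            # 0.841

# Proportion drinking more than 4 cups: subtract from the total area of 1...
round(1 - below, 3)                                        # 0.159

# ...or ask pnorm() for the upper tail directly
round(pnorm(4, mean = 3, sd = 1, lower.tail = FALSE), 3)   # 0.159
```

For values far out in the tail, lower.tail = FALSE is also more precise than 1 - pnorm(...), because the subtraction can lose digits.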
But what if the assignment makes clear that what you have is a sample and not a population? For example, what if you want to know the likelihood that the average consumption of a randomly selected group of 5 coffee drinkers is less than 4 cups per day, when the population mean is 3 and the standard deviation is 1? This changes things, because now we are dealing with one of many possible samples and have to account for the sample size.
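To see why the sample size matters: the spread of sample means (the standard error) is the standard deviation divided by the square root of the sample size, so it shrinks as the group gets larger. A short sketch with a standard deviation of 1; the sample sizes other than 5 are arbitrary choices for comparison:

```r
sigma <- 1
n <- c(1, 5, 25, 100)              # sample sizes to compare (only 5 is from the example)

# Standard error of the mean for each sample size
se <- sigma / sqrt(n)
round(se, 3)                       # 1.000 0.447 0.200 0.100

# Probability that a sample mean falls below 4 when the population mean is 3
round(pnorm(4, mean = 3, sd = se), 3)   # 0.841 0.987 1.000 1.000
```

With a single drinker (n = 1) this reduces to the population calculation above; larger samples make a mean far from 3 increasingly unlikely.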
Calculate the z-score (standard score) like this. The numerator is the sample mean minus the population mean:
numerator = 4 - 3
The denominator is the standard deviation divided by the square root of the sample size (the standard error):
denominator = 1/sqrt(5)
So, the standard score is just
myCoffeez = numerator/denominator
And you get the likelihood (probability) via
pnorm(myCoffeez)
We get an answer of .987 (or 98.7%).
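The same steps can be wrapped in a small function and checked by simulation: draw many random samples of 5 from a normal population with mean 3 and standard deviation 1, and count how often the sample mean is below 4. A sketch; the function name prob_mean_below and the simulation settings (100,000 samples, seed 42) are hypothetical choices, not part of the original exercise:

```r
# Hypothetical helper: P(sample mean < x) for samples of size n
prob_mean_below <- function(x, mu, sigma, n) {
  z <- (x - mu) / (sigma / sqrt(n))   # standard score using the standard error
  pnorm(z)
}

prob_mean_below(4, mu = 3, sigma = 1, n = 5)   # about 0.987

# Simulation check: many samples of 5 drinkers each
set.seed(42)
sample_means <- replicate(100000, mean(rnorm(5, mean = 3, sd = 1)))
mean(sample_means < 4)                         # close to 0.987
```

The simulated proportion will not match exactly, but it should agree with the formula to about two decimal places.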
