Confidence Intervals in Statistics with Excel 2010
75 Problems & Detailed Solutions
An Ideal Supplement for Students & Statistics Instructors
(This excerpt contains only 21 of the 109 pages of the complete book.)
INTER-RATER RELIABILITY USING SAS: A Practical Guide for Nominal, Ordinal, and Interval Data
http://cixls.advancedanalyticsllc.com/index.html
CONFIDENCE INTERVALS
IN STATISTICS:
100 Problems & Solutions
A single copy of this document may be printed, and the printed copy may be shared
with other interested parties. However, this document is NOT to be transmitted in
any other form, including electronic or mechanical, photocopying, recording, or by
an information storage and retrieval system, except by a reviewer who may quote
brief passages in a review to be printed in a magazine or a newspaper, without
written permission from the publisher. For information, please contact Advanced
Analytics, LLC at the following address:
Advanced Analytics, LLC
PO BOX 2696
Gaithersburg, MD 20886-2696
e-mail: info@advancedanalyticsllc.com
Gwet, Kilem Li
Confidence Intervals in Statistics: 75 Problems and Solutions
A Practical Self-Study Guide for Students and Professionals/ By Kilem Li Gwet
p. cm.
1. Biostatistics
2. Statistical Methods
3. Statistics - Study - Learning. I. Title.
Contents
1. Introduction
2. Confidence Interval for a Population Mean
3. Confidence Interval for a Population Proportion
4. Excel's Analysis ToolPak
Introduction
K. Gwet (2011), The Practical Guide to Statistics: Applications with Excel, R, and Calc,
Advanced Analytics, LLC
For all practical purposes, a sample size n is considered large when it equals or exceeds
30, and small otherwise.
follows:
\[z_{\alpha/2} = \text{NORM.S.INV}(1-\alpha/2) \tag{2.2}\]
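For readers working outside Excel, the percentile of equation 2.2 can also be obtained in Python. The sketch below is only an illustration: the function name `z_critical` is a hypothetical helper, and it assumes that `statistics.NormalDist().inv_cdf` from the Python standard library plays the role of NORM.S.INV.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Return z_{alpha/2}, the 100(1 - alpha/2)th percentile of the
    standard Normal distribution, for a given confidence level."""
    alpha = 1.0 - confidence_level
    # inv_cdf is the inverse cumulative distribution function,
    # used here as the counterpart of Excel's NORM.S.INV
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(round(z_critical(0.95), 4))  # 1.96
print(round(z_critical(0.99), 4))  # 2.5758
```

The same helper can be reused for any confidence level strictly between 0 and 1.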
The confidence intervals of equation 2.1 are sometimes presented in the form,
When the size \(n\) of your sample is small (i.e. the number of sample elements is
below 30), you must assume that the input observations follow the Normal distribution (at least approximately)2, and treat separately the situations
where the population standard deviation is known and where it is unknown.
(i) Known Population Standard Deviation
\[C.I(\mu) = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\;;\;\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right), \tag{2.3}\]
where \(\alpha = 1 - (\text{Confidence Level})\), and \(z_{\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the standard Normal distribution, which is computed with Excel
2010 as shown in equation 2.2.
\[C.I(\mu) = \left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\;;\;\bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right), \tag{2.4}\]
2 Gwet (2011), in the book The Practical Guide to Statistics: Applications with Excel,
R, and Calc, discusses the situation where this assumption cannot be made. However, all
exercises in this section are based on the assumption of Normality or approximate Normality.
(2.5)
The rule of thumb for deciding whether or not to use the FPC is to
use it whenever the sampling fraction, that is, the ratio \(n/N\) of the
sample size to the population size, exceeds 0.05. When using the FPC
is deemed appropriate, all ratios of the
form \(\sigma/\sqrt{n}\) or \(s/\sqrt{n}\) must be replaced
with their adjusted versions \(\text{FPC}\cdot\sigma/\sqrt{n}\) and \(\text{FPC}\cdot s/\sqrt{n}\). The corresponding
confidence intervals must be computed accordingly.
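The size of the correction can be checked numerically. The sketch below is a minimal illustration that assumes the usual definition of the finite population correction, FPC = sqrt((N - n)/(N - 1)), which is not reproduced in this excerpt. The function names and the population sizes are hypothetical and chosen only for the demonstration.

```python
import math


def fpc(n: int, N: int) -> float:
    """Finite population correction, ASSUMING the usual definition
    sqrt((N - n) / (N - 1)) for a sample of size n from N units."""
    return math.sqrt((N - n) / (N - 1))


def adjusted_standard_error(sd: float, n: int, N: int) -> float:
    """Standard error sd / sqrt(n) multiplied by the FPC."""
    return fpc(n, N) * sd / math.sqrt(n)


# Hypothetical population sizes, for illustration only
for population_size in (10_000, 1_000, 200):
    fraction = 49 / population_size
    print(population_size, round(fraction, 4), round(fpc(49, population_size), 4))
```

With a very small sampling fraction the factor stays close to 1 and barely changes the standard error, whereas with a sample that covers a large share of the population it shrinks the standard error noticeably.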
Exercise 2.1
A sample of 49 observations is taken from a normal population with
a standard deviation of 10. The sample mean is 55. Determine the
99 percent confidence interval for the population mean.
Solution
Population Mean:                 \(\mu\) is unknown (a)
Population standard deviation:   \(\sigma = 10\)
Sample Size:                     \(n = 49\)
Sample Mean:                     \(\bar{x} = 55\)
Confidence Level:                0.99
(a) This quantity has to be unknown, otherwise there would not be any need to develop a
confidence interval for a known quantity.
Since the probability distribution of the raw data is normal, so is the probability distribution of the sample mean \(\bar{x}\). Equation 2.1(b) will be used because
the sample size exceeds 30 and the population
standard deviation \(\sigma\) is available.
Of all the elements needed to construct the confidence interval (cf. equation
2.1(b)), only \(z_{\alpha/2}\) is yet to be obtained. It follows from the confidence level
0.99 that \(\alpha = 1 - 0.99 = 0.01\) and \(\alpha/2 = 0.005\). Consequently, \(z_{0.005}\) represents
the 99.5th percentile3 of the Normal distribution needed to construct the 99%
confidence interval. It is obtained with Excel 2010 as shown in Figure 2.1.
The two confidence bounds of the interval are given by,
\[LB = \bar{x} - z_{0.005}\frac{\sigma}{\sqrt{n}} = 55 - 2.576 \times \frac{10}{\sqrt{49}} = 51.32,\]
\[UB = \bar{x} + z_{0.005}\frac{\sigma}{\sqrt{n}} = 55 + 2.576 \times \frac{10}{\sqrt{49}} = 58.68,\]
so that
\[CI(\mu) = (51.32\,;\,58.68).\]
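The interval of Exercise 2.1 can be cross-checked outside Excel. The following is a minimal sketch, assuming a known population standard deviation and using Python's standard library in place of NORM.S.INV; the function name `z_interval` is a hypothetical helper, not something defined in the book.

```python
import math
from statistics import NormalDist


def z_interval(mean: float, sigma: float, n: int, confidence_level: float):
    """Confidence interval for a population mean when the population
    standard deviation sigma is known (Normal-based interval)."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)            # z_{alpha/2} * sigma / sqrt(n)
    return mean - margin, mean + margin


# Exercise 2.1: n = 49, sigma = 10, sample mean = 55, confidence level 0.99
lower, upper = z_interval(55, 10, 49, 0.99)
print(f"({lower:.2f}; {upper:.2f})")  # (51.32; 58.68)
```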
It is generally recommended to round the lower bound down and the upper
bound up. That is, if for example the lower bound is 13.268, and you want to
display only 2 digits after the decimal point, then you should present 13.26 (and
not 13.27). On the other hand, an upper bound of 13.262 would be rounded
up to 13.27 (and not 13.26). The rationale is to ensure the validity of the
confidence level with respect to the final interval.
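This outward rounding rule is easy to get wrong with ordinary rounding functions, which round to the nearest value. The sketch below shows one possible way to apply it in Python with the `decimal` module; the helper names are hypothetical, and the two calls reuse the numbers from the paragraph above.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def round_lower_bound(value: float, digits: int = 2) -> float:
    """Round a lower confidence bound DOWN to the given number of decimals."""
    step = Decimal(1).scaleb(-digits)  # e.g. 0.01 when digits = 2
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_FLOOR))


def round_upper_bound(value: float, digits: int = 2) -> float:
    """Round an upper confidence bound UP to the given number of decimals."""
    step = Decimal(1).scaleb(-digits)
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_CEILING))


print(round_lower_bound(13.268))  # 13.26
print(round_upper_bound(13.262))  # 13.27
```

Going through `Decimal(str(value))` avoids binary floating-point surprises when the value sits very close to a rounding boundary.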
3 Note that \(99.5 = 100(1 - \alpha/2)\) is the recommended percentile in equation 2.1.
Figure 2.1. Calculating the 99.5th percentile of the Normal distribution with
Excel 2010
Exercise 2.2
A sample of 81 observations is taken from a normal population with
a standard deviation of 5. The sample mean is 40. Determine the 95
percent confidence interval for the population mean.
Solution
Population Mean:                 \(\mu\) is unknown (a)
Population standard deviation:   \(\sigma = 5\)
Sample Size:                     \(n = 81\)
Sample Mean:                     \(\bar{x} = 40\)
Confidence Level:                0.95
(a) This quantity has to be unknown, otherwise there would not be any need to develop a
confidence interval for a known quantity.
Since the probability distribution of the raw data is normal, so is the probability distribution of the sample mean \(\bar{x}\). Equation 2.1(b) will be used because
of the large sample size that exceeds 30, and the availability of the population
standard deviation \(\sigma\).
It follows from the confidence level 0.95 that \(\alpha = 1 - 0.95 = 0.05\) and
\(\alpha/2 = 0.025\). Consequently, \(z_{0.025}\) represents the 97.5th percentile of the Normal
Exercise 2.3
A sample of 10 observations is selected from a normal population
for which the population standard deviation is known to be 5. The
sample mean is 20.
a. Determine the standard error of the mean.
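Solution
Population Mean:                 \(\mu\) is unknown (a)
Population standard deviation:   \(\sigma = 5\)
Sample Size:                     \(n = 10\)
Sample Mean:                     \(\bar{x} = 20\)
(a) This quantity has to be unknown, otherwise there would not be any need to develop a
confidence interval for a known quantity.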
Confidence level \(= 1 - \alpha = 0.95\). The lower bound is \(LB = \bar{x} - z_{0.025}\,\sigma/\sqrt{n} =\)
Exercise 2.4
A research firm conducted a survey to determine the mean amount
steady smokers spend on cigarettes during a week. They found the
distribution of amounts spent per week followed the normal distribution with a standard deviation of $5. A sample of 49 steady smokers
revealed that \(\bar{x}\) = $20.
a. What is the point estimate of the population mean \(\mu\)? Explain
what it indicates.
b. Using the 95 percent level of confidence, determine the confidence interval for \(\mu\). Explain what it indicates.
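Solution
Population Mean:                 \(\mu\) is unknown
Distribution:                    Normal
Population standard deviation:   \(\sigma\) = $5
Sample Size:                     \(n = 49\)
Sample Mean:                     \(\bar{x}\) = $20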
(a) The point estimate of the population mean \(\mu\) is the sample mean \(\bar{x}\) = $20.
It represents our best guess of the magnitude of the actual and unknown
population mean \(\mu\).
2.5 as follows:
\[t = t_{\alpha/2,n-1} = t_{0.05,15} = \text{T.INV}(1 - 0.05, 15) = 1.75.\]
Note that \(\alpha = 1 - 0.90 = 0.10\), and \(0.05 = \alpha/2\). For a confidence level
of 90%, the value of \(t\) represents the 95th percentile of the t distribution
with \(n - 1 = 16 - 1 = 15\) degrees of freedom.
(d) We will develop the 90% confidence interval for the population mean
using expression 2.4. The lower bound LB and the upper bound UB of
the confidence interval are obtained as follows:
\[LB = \bar{x} - t_{0.05}\frac{s}{\sqrt{n}} = 60 - 1.75 \times \frac{20}{\sqrt{16}} = 51.2,\]
\[UB = \bar{x} + t_{0.05}\frac{s}{\sqrt{n}} = 60 + 1.75 \times \frac{20}{\sqrt{16}} = 68.8.\]
The 90% confidence interval of the population mean number of eggs per
chicken is,
\[CI(\mu) = (51.2\,;\,68.8),\]
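The arithmetic behind these two bounds can be reproduced with a few lines of Python. This is only a sketch: the t value 1.75 is taken from the text rather than computed, because Python's standard library has no Student t quantile function, and `t_interval` is a hypothetical helper name. The sample mean 60, standard deviation 20, and sample size 16 are the values used in this solution.

```python
import math


def t_interval(mean: float, s: float, n: int, t_value: float):
    """Bounds mean -/+ t * s / sqrt(n) for a supplied t percentile."""
    margin = t_value * s / math.sqrt(n)
    return mean - margin, mean + margin


# Values from this solution: mean = 60, s = 20, n = 16, t_{0.05,15} = 1.75
lower, upper = t_interval(60, 20, 16, 1.75)
print(lower, upper)  # 51.25 68.75
```

Rounding the lower bound down and the upper bound up to one decimal gives the reported interval (51.2; 68.8).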
Exercise 2.11
Merrill Lynch Securities and Health Care Retirement, Inc., are two
large employers in downtown Toledo, Ohio. They are considering
jointly offering child care for their employees. As a part of the feasibility study, they wish to estimate the mean weekly child-care cost of
their employees. A sample of 10 employees who use child care reveals
the following amounts spent last week.
$107 $92 $97 $95 $105 $95 $104
Solution
The population mean is \(\mu\) = the mean weekly child-care cost of the population
of Merrill Lynch Securities and Health Care Retirement, Inc. employees. It is
unknown and must be estimated with a 90% confidence interval.
Capture the 10 data points of this exercise in Excel. The data may be
organized either vertically or horizontally. Following the directives of section
4.2 of chapter 4, use the Descriptive Statistics module of Excel 2010's
Analysis ToolPak to produce Table 2.10 below (make sure to specify the correct
confidence level of 90%).
Table 2.10: Output of the Descriptive Statistics Module
Mean                        98.6
Standard Error              1.752
Median                      98
Mode                        95
Standard Deviation          5.542
Sample Variance             30.711
Kurtosis                    -1.304
Skewness                    0.165
Range                       16
Minimum                     91
Maximum                     107
Sum                         986
Count                       10
Confidence Level(90.0%)     3.212
\[C.I(\mu) = (95.38\,;\,101.82).\]
Interpretation
Although the feasibility study estimated the mean weekly child-care cost of
the employees to be about $98.6 based on a small sample of 10 employees, the
actual value of that mean is between $95.38 and $101.82 with 90% certainty,
as suggested by the confidence interval.
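The interval above is consistent with treating the last row of Table 2.10, "Confidence Level(90.0%)", as the margin of error to subtract from and add to the mean. The sketch below reproduces that arithmetic under this assumption and rounds the bounds outward to two decimals; `interval_from_margin` is a hypothetical helper, and the inputs are passed as strings so that `Decimal` handles them exactly.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def interval_from_margin(mean: str, margin: str, digits: int = 2):
    """Build (mean - margin, mean + margin), rounding the lower bound
    down and the upper bound up to the requested number of decimals."""
    step = Decimal(1).scaleb(-digits)  # 0.01 when digits = 2
    center, half_width = Decimal(mean), Decimal(margin)
    lower = (center - half_width).quantize(step, rounding=ROUND_FLOOR)
    upper = (center + half_width).quantize(step, rounding=ROUND_CEILING)
    return float(lower), float(upper)


# Mean and "Confidence Level(90.0%)" rows of Table 2.10
print(interval_from_margin("98.6", "3.212"))  # (95.38, 101.82)
```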
Exercise 2.12
The Greater Pittsburgh Area Chamber of Commerce wants to estimate the mean time workers who are employed in the downtown area
spend getting to work. A sample of 15 workers reveals the following
number of minutes spent traveling.
29 38 38 33 38 21 45 34
40 37 37 42 30 29 35
Solution
2.3
The narrower the confidence interval, the more information it provides on the real magnitude of the population mean. An overly wide confidence
interval, on the other hand, provides us with a wide range of possible values
for the population mean, and little information about the magnitude of the
mean itself. Consequently, when designing a research study, researchers often
want to determine the sample size needed to obtain a confidence interval with
a pre-specified length L.
We have seen that the confidence interval of the population mean generally has the form \((\bar{x} - E\,;\,\bar{x} + E)\), where
\(\bar{x}\) is the sample mean and \(E\) the
margin of error, defined as \(E = z_{\alpha/2}\,s/\sqrt{n}\) when \(n\) is reasonably large. Here \(n\) is the
sample size, \(s\) the standard deviation10, and \(z_{\alpha/2}\) the \(100(1-\alpha/2)\)th percentile
of the standard Normal distribution with \(\alpha = 1 - \text{Confidence Level}\).
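For a specified error margin \(E\) and confidence level \(1-\alpha\), the desired sample
size is given by,
\[n = \left(\frac{z_{\alpha/2}\,s}{E}\right)^2. \tag{2.7}\]
If it is the length \(L\) of the confidence interval that is provided, then the desired
sample size will be given by:
\[n = \left(\frac{z_{\alpha/2}\,s}{L/2}\right)^2 \tag{2.8}\]
In either case, the computed value is rounded up to the nearest integer11.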
Note that the sample mean is always the center of the confidence interval.
Therefore the population mean is always within the error margin of the sample
mean at the specified confidence level.
10 Note that s could be the true population standard deviation when it is known, or
could be its estimated value obtained from a pilot study. Even a small pilot may yield an
estimated standard deviation sufficiently precise for the purpose of calculating the sample
size.
11 Rounding up to the nearest integer means that 56.02, for example, will be rounded up to 57.
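Equations 2.7 and 2.8 translate directly into code. The sketch below is one possible implementation, not the book's own: the function names are hypothetical, the standard library's `NormalDist().inv_cdf` stands in for NORM.S.INV, and the inputs in the demonstration (s = 12, E = 3, L = 6) are made-up values chosen only to show that the two equations agree when L = 2E.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(s: float, margin: float, confidence_level: float) -> int:
    """Equation 2.7: n = (z_{alpha/2} * s / E)^2, rounded up."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * s / margin) ** 2)


def sample_size_for_length(s: float, length: float, confidence_level: float) -> int:
    """Equation 2.8: same formula with E replaced by L / 2."""
    return sample_size_for_margin(s, length / 2.0, confidence_level)


# Hypothetical inputs: an interval of length 6 has an error margin of 3
print(sample_size_for_margin(12, 3, 0.95))
print(sample_size_for_length(12, 6, 0.95))

# The rounding convention of footnote 11
print(math.ceil(56.02))  # 57
```

`math.ceil` always rounds up, which matches the convention that a computed size such as 56.02 calls for 57 observations.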
Exercise 2.40
A population is estimated to have a standard deviation of 10. We
want to estimate the population mean within 2, with a 95 percent
level of confidence. How large a sample is required?
Solution
Exercise 2.41
We want to estimate the population mean within 5, with a 99 percent
level of confidence. The population standard deviation is estimated
to be 15. How large a sample is required?
Solution
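The worked solutions of Exercises 2.40 and 2.41 are not part of this excerpt. As a rough check, the sketch below applies equation 2.7 to the numbers stated in the two exercises, assuming that "within 2" and "within 5" denote the error margin E. The helper `required_sample_size` is hypothetical, and the results are this sketch's output rather than figures quoted from the book.

```python
import math
from statistics import NormalDist


def required_sample_size(s: float, margin: float, confidence_level: float) -> int:
    """n = (z_{alpha/2} * s / E)^2 rounded up to the next integer."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * s / margin) ** 2)


# Exercise 2.40: standard deviation 10, margin 2, 95% confidence
print(required_sample_size(10, 2, 0.95))  # 97

# Exercise 2.41: standard deviation 15, margin 5, 99% confidence
print(required_sample_size(15, 5, 0.99))  # 60
```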
4.1
In Excel, select the File tab from the menu bar, then Options,
as shown in Figure 4.1. These actions should open the Excel Options form of
Figure 4.2.