
Chance Variability in Sampling

I. Expected Value and Standard Error in Sampling

We have analyzed chance variability using box models, and we have formulas for the
expected value and the standard error for the sum of the draws from a box with
replacement. We can also analyze chance variability in sampling using similar box
models. However, sampling involves drawing at random without replacement.
Consider the same box model from the previous unit:

Box: tickets marked 1, 2, 3; 2 draws

There are only 6 possible sums for 2 draws without replacement:

S1 = 1 + 2 = 3     S3 = 2 + 1 = 3     S5 = 3 + 1 = 4
S2 = 1 + 3 = 4     S4 = 2 + 3 = 5     S6 = 3 + 2 = 5

The average of these sums is $\mu_S = 4$ and the standard deviation is $\sigma_S = \sqrt{2/3} \approx 0.82$. Notice
that the expected value has not changed, but the standard error has. Drawing without
replacement gives fewer sums (6 instead of 9) than we had with replacement and also
eliminates the largest sum (6) and the smallest sum (2) that we had with replacement,
thus slightly reducing the variability of the sums of the draws. Therefore, the SE for
drawing without replacement is a little smaller than the SE for drawing with
replacement. We can calculate the SE for drawing without replacement using the
formula:

\[ \text{SE (without replacement)} = \text{Correction Factor (C.F.)} \times \text{SE (with replacement)} \]

where

\[ \text{C.F.} = \sqrt{\frac{\text{NUMBER OF TICKETS IN BOX} - \text{NUMBER OF DRAWS}}{\text{NUMBER OF TICKETS IN BOX} - 1}}. \]

   
For the box model above, SE (with replacement) = $\sqrt{2/3} \cdot \sqrt{2} = \sqrt{4/3}$ and C.F. = $\sqrt{\frac{3-2}{3-1}} = \sqrt{1/2}$. Thus, SE (without replacement) = $\sqrt{1/2} \cdot \sqrt{4/3} = \sqrt{2/3}$, and this
checks with the result above.
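
The check above can also be done by brute force. The following is a minimal Python sketch (the variable names are illustrative and not part of the original notes) that lists every ordered pair of draws without replacement, takes the SD of the resulting sums, and compares it with the correction-factor formula.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3]
draws = 2

# Every ordered way to draw 2 tickets without replacement (6 in all).
sums = [sum(p) for p in permutations(box, draws)]

ev = mean(sums)            # average of the six sums
se_direct = pstdev(sums)   # SD of the six sums = SE without replacement

# The same SE obtained from the correction-factor formula.
se_with = pstdev(box) * sqrt(draws)
cf = sqrt((len(box) - draws) / (len(box) - 1))
se_formula = cf * se_with

print(sorted(sums), ev, round(se_direct, 4), round(se_formula, 4))
```

Both routes should give the same SE, since the formula is just a shortcut for the enumeration.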

Note: When the number of tickets in the box is large relative to the number of draws,
the correction factor is nearly 1 and thus can be ignored. It is the absolute size of
the sample (number of draws) which determines accuracy, through the SE for
drawing with replacement. The size of the population (number of tickets in the
box) does not really matter. On the other hand, if the sample size is a substantial
fraction of the population size, the correction factor must be used. As a rule of
thumb, if the sample size is more than 10% of the population size, use the
correction factor.
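
To see how quickly the correction factor approaches 1, here is a small Python sketch. The function name `correction_factor` and the population sizes are chosen for illustration only; the sample size is held at 25 draws while the box grows.

```python
from math import sqrt

def correction_factor(tickets, draws):
    """C.F. = sqrt((tickets - draws) / (tickets - 1))."""
    return sqrt((tickets - draws) / (tickets - 1))

# Hold the sample at 25 draws and let the box grow.
for tickets in (100, 250, 1_000, 100_000, 10_000_000):
    fraction = 25 / tickets
    print(f"{tickets:>10,} tickets  sample = {fraction:8.4%} of box  "
          f"C.F. = {correction_factor(tickets, 25):.4f}")
```

When the sample is 25% of the box the factor is noticeably below 1, at 10% it is already about 0.95, and for very large boxes it is indistinguishable from 1.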

Example 1: A box contains 20 red marbles and 80 blue marbles. If 25 marbles are
drawn from the box at random without replacement, what is the probability of
getting more than 22 blue marbles?

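Box: 80 tickets marked 1 (blue), 20 tickets marked 0 (red); 25 draws

\[ \text{AV} = \frac{80(1) + 20(0)}{100} = .8 \qquad \text{EV} = (.8)(25) = 20 \]

\[ \text{SD} = (1 - 0)\sqrt{.8(.2)} = .4 \]

\[ \text{SE (without)} = \text{SE (with)} \times \text{C.F.} = (.4)(5)\sqrt{\frac{75}{99}} \approx 1.74 \]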
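
[Figure: normal curve centered at 20 (z = 0). The area between z = -1.15 and z = 1.15 is about 75%, leaving about 12.5% to the right of 22 (z = 1.15).]

\[ z = \frac{22 - 20}{1.74} \approx 1.15 \]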

Thus, there is about a 12.5% chance of getting more than 22 blue marbles.
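
The steps of Example 1 can be reproduced in a few lines. This is a sketch under the same normal approximation used above; `normal_cdf` is a helper introduced here (built on the standard library's `math.erf`) in place of a printed normal table, so the last digits may differ slightly from table lookups.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

tickets, draws = 100, 25

av = (80 * 1 + 20 * 0) / tickets      # average of the box: 0.8
sd = (1 - 0) * sqrt(av * (1 - av))    # SD of a 0-1 box: 0.4
ev = av * draws                       # expected number of blue marbles: 20

se_with = sd * sqrt(draws)
cf = sqrt((tickets - draws) / (tickets - 1))
se = se_with * cf                     # SE without replacement, about 1.74

z = (22 - ev) / se                    # about 1.15
chance = 1 - normal_cdf(z)            # area to the right of z
print(f"SE = {se:.2f}, z = {z:.2f}, chance = {chance:.1%}")
```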

Example 2: On the average, hotel guests weigh about 150 lb with a standard
deviation of 25 lb. An engineer is designing a large elevator to lift 16 people.
If she designs it to lift 2650 lb, what is the probability that it will be
overloaded by a random group of 16 people?

Box: all potential hotel guests; 16 draws

AV = 150 lb     EV = (150 lb)(16) = 2400 lb

SD = 25 lb      SE (without) $\approx$ SE (with) = (25 lb)(4) = 100 lb

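[Figure: normal curve centered at 2400 lb (z = 0). The area between z = -2.5 and z = 2.5 is about 98.76%, leaving about 0.62% to the right of 2650 lb (z = 2.5).]

\[ z = \frac{2650 - 2400}{100} = 2.5 \]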

Thus, there is about a 0.62% chance of overloading the elevator.

II. Accuracy of Percents

Consider the following problem:

In a recent survey by the Corporation for Public Broadcasting, 47% of youths
aged 6 to 17 responded that they had a television in their room. If a random
sample of 400 youths aged 6 to 17 is taken, what is the probability that more
than 50% of the sample had a television in their room?

This problem could be done in the same way as the two previous examples, using the EV and
SE for the number of youths in the sample who had a television in their room.
However, we could work the problem using the percent of youths in the sample who
had a television in their room by making the following modifications:

\[ \text{EV(\%)} = \frac{\text{EV}}{\text{NUMBER OF DRAWS}} \times 100\% \]

\[ \text{SE(\%)} = \frac{\text{SE}}{\text{NUMBER OF DRAWS}} \times 100\% \]

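Box: 47 tickets marked 1 (television), 53 tickets marked 0 (no television); 400 draws

\[ \text{AV} = .47 \qquad \text{EV} = (.47)(400) = 188 \qquad \text{EV(\%)} = \frac{188}{400} \times 100\% = 47\% \]

\[ \text{SD} = \sqrt{(.47)(.53)} \approx .499 \qquad \text{SE} = (.499)(20) = 9.98 \qquad \text{SE(\%)} = \frac{9.98}{400} \times 100\% \approx 2.5\% \]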

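[Figure: normal curve centered at 47% (z = 0). The area between z = -1.2 and z = 1.2 is about 77%, leaving about 11.5% to the right of 50% (z = 1.2).]

\[ z = \frac{50\% - 47\%}{2.5\%} = 1.2 \]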

Thus, there is about an 11.5% chance that more than 50% of the sample had a
television in their room.

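Note: If N = NUMBER OF DRAWS, then EV = (AV OF BOX)(N) and SE =
(SD OF BOX)($\sqrt{N}$). Thus, we get the following formulas:

\[ \text{EV(\%)} = \frac{\text{EV}}{N} \times 100\% = \frac{(\text{AV OF BOX})(N)}{N} \times 100\% = (\text{AV OF BOX}) \times 100\% \]

\[ \text{SE(\%)} = \frac{\text{SE}}{N} \times 100\% = \frac{(\text{SD OF BOX})(\sqrt{N})}{N} \times 100\% = \frac{\text{SD OF BOX}}{\sqrt{N}} \times 100\% \]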
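
The shortcut formula for SE(%) is easy to try out. The sketch below uses hypothetical helper names (`normal_cdf`, `se_percent`); it reworks the television example and then shows that quadrupling the number of draws halves SE(%). It assumes draws with replacement from a 0-1 box, so no correction factor is applied.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def se_percent(p, n):
    """SE(%) = (SD of a 0-1 box) / sqrt(n) * 100, where p is the fraction of 1s."""
    return sqrt(p * (1 - p)) / sqrt(n) * 100

# Television example: 47% of the box is 1s, 400 draws.
se_pct = se_percent(0.47, 400)     # about 2.5 percentage points
z = (50 - 47) / se_pct             # about 1.2
chance = 1 - normal_cdf(z)         # chance the sample percent exceeds 50%
print(f"SE(%) = {se_pct:.2f}, z = {z:.2f}, chance = {chance:.1%}")

# Four times as many draws gives half the SE(%).
for n in (100, 400, 1600):
    print(n, round(se_percent(0.47, n), 2))
```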

III. Accuracy of Averages

Consider the following problem:

The average IQ score on the Stanford-Binet Intelligence Test is 100 with a
standard deviation of 15. If a random sample of 25 IQ scores is chosen, what is
the probability that the average IQ of the sample is less than 95?

This problem could be done using the EV and SE for the total of the IQ points of the
sample. However, we could work the problem using the average by making the
following modifications:

\[ \text{EV(AVE)} = \frac{\text{EV}}{\text{NUMBER OF DRAWS}} \]

\[ \text{SE(AVE)} = \frac{\text{SE}}{\text{NUMBER OF DRAWS}} \]

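Box: all IQ scores; 25 draws

\[ \text{AV} = 100 \qquad \text{EV} = 100(25) = 2500 \qquad \text{EV(AVE)} = \frac{2500}{25} = 100 \]

\[ \text{SD} = 15 \qquad \text{SE} = 15(5) = 75 \qquad \text{SE(AVE)} = \frac{75}{25} = 3 \]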

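[Figure: normal curve centered at 100 (z = 0). The area between z = -1.67 and z = 1.67 is about 91%, leaving about 4.5% to the left of 95 (z = -1.67).]

\[ z = \frac{95 - 100}{3} \approx -1.67 \]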

Thus, there is about a 4.5% chance that the average of the sample is less than 95.

Note: If N = NUMBER OF DRAWS, then EV = (AV OF BOX)(N) and SE =
(SD OF BOX)($\sqrt{N}$). Thus, we get the following formulas:

\[ \text{EV(AVE)} = \frac{\text{EV}}{N} = \frac{(\text{AV OF BOX})(N)}{N} = \text{AV OF BOX} \]

\[ \text{SE(AVE)} = \frac{\text{SE}}{N} = \frac{(\text{SD OF BOX})(\sqrt{N})}{N} = \frac{\text{SD OF BOX}}{\sqrt{N}} \]
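
Working with the sum and working with the average are two routes to the same z-score. The following Python sketch (variable names are illustrative) runs the IQ example both ways: an average below 95 for 25 draws is the same event as a total below 95 × 25 = 2375.

```python
from math import sqrt

av, sd, n = 100, 15, 25
cutoff_avg = 95

# Route 1: the sum of the draws.
ev_sum = av * n                        # 2500
se_sum = sd * sqrt(n)                  # 75
z_sum = (cutoff_avg * n - ev_sum) / se_sum

# Route 2: the average of the draws.
ev_avg = ev_sum / n                    # 100
se_avg = se_sum / n                    # 3, the same as sd / sqrt(n)
z_avg = (cutoff_avg - ev_avg) / se_avg

print(round(z_sum, 2), round(z_avg, 2))
```

Dividing both the distance from the EV and the SE by N leaves their ratio unchanged, which is why the two z-scores agree.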

IV. Confidence Intervals

So far in our analyses of chance variability in sampling, we have begun with data for
the population (contents of the box model) and used them to make estimates about
the sample (draws from the box). However, this is not the “real” sampling problem
in which we begin with data for the sample and use them to make estimates about the
population. Consider the following sampling problem:

In a recent poll, a random sample of 400 registered voters from the entire state of
N.C. was taken. 55% of this sample indicated that they favored a state lottery.
What is the estimated percent of all N.C. voters who favor the lottery and what is
the chance variability for this estimate?

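Population (all registered voters in NC); 400 draws
   AV = ?
   SD $\approx$ .497 (estimated from the sample SD by the bootstrap method)

Sample: 55 tickets marked 1 (favor), 45 tickets marked 0 (do not favor)
   AV = .55 = 55%
   SD = $\sqrt{.55(.45)} \approx .497$

\[ \text{SE} \approx \sqrt{400}\,(.497) = 9.94 \]

\[ \text{SE(\%)} \approx \frac{9.94}{400} \times 100\% \approx 2.5\% \]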

Thus, we could estimate the population percent to be about 55%, and the
chance variability for the sample percent is about 2.5%.

In sampling problems similar to the previous one, in which we want to estimate the
population percent, we can construct a confidence interval for this population percent
as follows:

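\[ \text{M\% Confidence Interval for Population \%} = \text{Sample \%} \pm (z_{M\%})(\text{SE(\%)}) \]

where $z_{M\%}$ is the z-score for which the area under the normal curve between $-z$ and $z$ is M%.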

For the previous example, construct a 90% confidence interval for the percent of the
population that favors the state lottery:

\[ \text{90\% Confidence Interval} = 55\% \pm (1.64)(2.5\%) = 55\% \pm 4.1\% = 50.9\% \text{ to } 59.1\% \]
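
The recipe for a percent confidence interval can be wrapped in a short function. This is a sketch only: the function name `percent_confidence_interval` is hypothetical, the z-score is passed in by hand from a normal table (1.64 for 90%, as above), and no correction factor is applied because the sample is a tiny fraction of the population.

```python
from math import sqrt

def percent_confidence_interval(sample_pct, n, z):
    """Return (low, high) in percent for the population percent."""
    p = sample_pct / 100
    sd = sqrt(p * (1 - p))           # bootstrap estimate of the box SD
    se_pct = sd / sqrt(n) * 100      # SE(%) for n draws
    margin = z * se_pct
    return sample_pct - margin, sample_pct + margin

low, high = percent_confidence_interval(55, 400, 1.64)
print(f"{low:.1f}% to {high:.1f}%")
```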

Note that the confidence interval depends on the sample. If other researchers were to
take random samples of 400 registered voters, they would probably get slightly
different confidence intervals. However, for about 90% of the samples, the interval
would cover the true percent of the population that favors the state lottery. We can
also construct confidence intervals for a population average as follows:

\[ \text{M\% Confidence Interval for Population Average} = \text{Sample Average} \pm (z_{M\%})(\text{SE(AVE)}) \]

For example, consider the following sampling problem:

There are 2700 institutions of higher learning in the U.S. Suppose that a random
sample of 36 of these institutions is taken and the average enrollment of this
sample is 3700 with a standard deviation of 6000. Construct a 95% confidence
interval for the average enrollment of all 2700 institutions.

Population (all 2700 institutions of higher learning in the U.S.); 36 draws
   AV = ?
   SD $\approx$ 6000 (estimated from the sample SD)

Sample
   AV = 3700
   SD = 6000

\[ \text{SE(AVE)} \approx \frac{6000}{\sqrt{36}} = 1000 \]

\[ \text{95\% Confidence Interval} = 3700 \pm 2(1000) = 3700 \pm 2000 = 1700 \text{ to } 5700 \]
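
The statement earlier in this section that about 90% of the intervals cover the true population percent can be checked by simulation. The sketch below makes assumptions not in the original notes: the true percent is taken to be exactly 55%, the population is treated as large enough that draws are effectively with replacement, and the number of trials (2000) and the random seed are arbitrary.

```python
import random
from math import sqrt

def covers(true_p, n, z, rng):
    """Draw one sample of size n and report whether its interval covers true_p."""
    hits = sum(rng.random() < true_p for _ in range(n))
    p_hat = hits / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z * SE, as a fraction
    return p_hat - margin <= true_p <= p_hat + margin

rng = random.Random(0)     # fixed seed so the run is repeatable
trials = 2000
coverage = sum(covers(0.55, 400, 1.64, rng) for _ in range(trials)) / trials
print(f"{coverage:.1%} of the intervals covered the true percent")
```

Each simulated researcher gets a slightly different interval, but the fraction of intervals that cover the true value should land close to 90%.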

V. Confidence Intervals for Small Samples

With a large sample, the SD of the sample is a good estimate for the SD of the
population. However, with a small sample, the SD of the sample is not likely to be
an accurate estimate of the SD of the population. Instead, the SD of the population
is estimated by the larger quantity $\text{SD}^+$, given as follows:

\[ \text{SD}^+ = \text{SD(sample)} \times \sqrt{\frac{N}{N-1}}, \text{ where } N \text{ is the sample size.} \]

Also, with a large sample, we can construct the confidence interval using the
appropriate z-score from the normal curve since the probability histogram for the
sums of the draws from a box model (with or without replacement) will follow the
normal curve (Central Limit Theorem). However, with a small sample, the
probability histogram for the sums of the draws from a box model without
replacement does not follow the normal curve. If, instead, the population (contents of
the box) is normally distributed, then the probability histogram will follow a
Student’s curve with (N – 1) degrees of freedom, where N is the sample size.
Therefore, in order to construct a confidence interval for the population average
based on a small sample of size N, use the following:

\[ \text{M\% Confidence Interval for Population Average} = \text{Sample Average} \pm (t_{M\%})(\text{SE(AVE)}) \]

where $t_{M\%}$ is the appropriate t-score from the Student's curve with $(N - 1)$ degrees
of freedom and $\text{SE(AVE)} = \dfrac{\text{SD}^+}{\sqrt{N}}$.

Consider the following problem:

A quality control technician wishes to estimate the average weight of widgets in
a large shipment. She takes a random sample of 9 widgets and gets the following
weights (in oz): 3.0, 3.2, 3.4, 3.5, 3.7, 3.7, 3.7, 4.0, and 4.2. Construct a 90%
confidence interval for the average weight of all the widgets in the shipment.

Population (large shipment of widgets); 9 draws
   AV = ?
   $\text{SD}^+ = \sqrt{\frac{9}{8}}\,(.353) \approx .374$ oz

Sample
   AV = 3.6 oz
   SD = .353 oz

\[ \text{SE(AVE)} \approx \frac{.374}{\sqrt{9}} \approx .125 \text{ oz} \]

Use the Student’s curve for df = 9 – 1 = 8. Look under the column headed by
5%; this gives an area of 90% between – t and t. Thus, t = 1.86.

\[ \text{90\% Confidence Interval} = 3.6 \text{ oz} \pm (1.86)(.125 \text{ oz}) = 3.6 \text{ oz} \pm 0.23 \text{ oz} = 3.37 \text{ oz to } 3.83 \text{ oz} \]
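
The widget calculation can be reproduced from the raw weights. In this Python sketch the t-score 1.86 is copied by hand from the Student's table as in the text rather than computed, and the variable names are illustrative. It also shows that the standard library's `statistics.stdev` (which divides by N - 1) gives the same value as scaling the sample SD by $\sqrt{N/(N-1)}$.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

weights = [3.0, 3.2, 3.4, 3.5, 3.7, 3.7, 3.7, 4.0, 4.2]   # oz
n = len(weights)

sample_avg = mean(weights)                 # 3.6 oz
sd_sample = pstdev(weights)                # SD of the sample, about .353 oz
sd_plus = sd_sample * sqrt(n / (n - 1))    # SD+, about .374 oz
se_avg = sd_plus / sqrt(n)                 # SE(AVE), about .125 oz

t_90 = 1.86   # Student's table, df = n - 1 = 8, 90% between -t and t
low = sample_avg - t_90 * se_avg
high = sample_avg + t_90 * se_avg

print(f"SD+ = {sd_plus:.3f} (stdev gives {stdev(weights):.3f})")
print(f"90% CI: {low:.2f} oz to {high:.2f} oz")
```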

Practice Sheet – Chance Variability in Sampling
1. A box contains 80 red marbles and 20 blue marbles. A random sample of 25
marbles is drawn from the box without replacement. What is the probability of
getting exactly 20 red marbles?
2. The average weight of adult males is 170lb with a standard deviation of 25lb. If
a random sample of 10 adult males gets on an elevator, what is the probability that
the total weight of the males will exceed the 2000lb load capacity of the elevator?
3. Of the 3600 WFU students, 35% are in-state students. If a random sample of 900
WFU students is chosen, what is the probability that the percent of in-state
students in the sample is between 33% and 37%?
4. The average SAT score in N.C. last year was 890 with a standard deviation of
210.
(a) If a random sample of 25 SAT scores from N.C. is taken, what is the
probability that the average SAT score of the sample is greater than 900?
(b) Same question as (a) except sample size is 400.
(c) Same question as (a) except sample size is 1600.
5. According to the American Red Cross, 37% of the population have O+ blood
type. If a random sample of 400 donors attends a blood drive, what is the
probability that the percent of O+ blood types in the sample is greater than 40%?
6. A group of 50,000 tax forms shows an average gross income of $37,000 with a
standard deviation of $20,000. Also, 20% of the forms show a gross income over
$50,000.
(a) If a random sample of 400 tax forms is taken, what is the probability that
between 19% and 21% of the sample show gross incomes over $50,000?
(b) If a random sample of 25 tax forms is taken, what is the probability that the
average gross income of the sample is less than $35,000?
7. The average IQ score on the Stanford-Binet Intelligence Test is 100 with a
standard deviation of 15. Assume that IQ scores are normally distributed.
(a) What percent of all IQ scores would be greater than 121?
(b) If a random sample of 400 IQ scores is taken, what is the probability that more
than 10% of the sample scores would be greater than 121?
(c) If a random sample of 25 IQ scores is taken, what is the probability that the
average IQ of the sample is less than 105?
8. A box contains 400 marbles; some are red and the rest are blue. A random sample
of 100 marbles is drawn from the box without replacement. There are 68 red
marbles in this sample. Construct, if possible, a 95% confidence interval for the
percent of red marbles in the original box.
9. A quality control technician wishes to estimate the average weight of widgets in a
large shipment. She takes a random sample of 16 widgets and finds the average
weight to be 120g with SD of 15g. Construct, if possible, an 80% confidence
interval for the average weight of all the widgets in the shipment.

10. There are 2700 institutions of higher learning in the U.S. Suppose that a random
sample of these institutions is chosen and the average enrollment of this sample is
3700 with SD of 6500.
(a) If the sample size is 400, construct if possible a 95% confidence interval for
the average enrollment of all 2700 institutions.
(b) Repeat (a) if the sample size is 64.
11. In a certain town, there are 25,000 households. A simple random sample of 500
households is taken. 179 of the sample households had dishwashers, and 498 had
refrigerators.
(a) Construct, if possible, a 95% confidence interval for the percent of all 25,000
households with dishwashers.
(b) Repeat (a) for refrigerators.

12. A quality control technician wishes to estimate the nicotine content of the
cigarettes produced by a tobacco company. She takes a random sample of 7
cigarettes and gets the following nicotine contents (in mg): 16, 17, 19, 20, 20,
21, and 22. Construct if possible a 90% confidence interval for the average
nicotine content of all the cigarettes produced by the company.

Solution Key for Chance Variability in Sampling


1. 22%
2. .007%
3. 85%
4. (a) 40.5% (b) 17% (c) 3%
5. 11%
6. (a) 38% (b) 31%
7. (a) 8% (b) 7% (c) 95.5%
8. 60.12% to 75.88%
9. 114.8g to 125.2g
10. (a) 3115 to 4285 (b) 2120 to 5280
11. (a) 31.6% to 40.0% (b) 99.05% to 100.15% (cannot do!)
12. 17.7 mg to 20.9 mg
