
SAMPLING DISTRIBUTION

MRS. PADILLA
AP STATISTICS
CHAPTER 18
OBJECTIVES
To use simulation to generate approximate
sampling distributions of common summary
statistics such as the sample mean and the sample
proportion.
To describe the shape, center, and spread of the
sampling distributions of common summary
statistics without actually generating them.
To use sampling distributions to determine which
results are reasonably likely and which would be
considered rare.
DEFINITION
A sampling distribution is the distribution of the values of a summary
statistic that you get from taking repeated random samples of the same size.
Steps:
1. Take a random sample of a fixed size n from a
population.
2. Compute a summary statistic (such as the sample mean or sample
proportion) for the sample.
3. Repeat Steps 1 and 2 many times.
4. Display the distribution of your summary statistics
(make a histogram of the values of the mean or proportion).
5. Examine the displayed distribution for shape, center,
and spread, as well as outliers or other deviations.

**NOTE: These steps describe how to use simulation to make an approximate
sampling distribution. To construct an exact (theoretical) sampling
distribution, you would instead use combinations, ${}_{n}C_{r}$.
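The simulation steps above can be sketched in Python as follows. This is a minimal illustration, not a required method: the population (60 ones for "success" and 40 zeros for "failure"), the sample size n = 40, the 10,000 repetitions, and the helper name `simulate_sampling_distribution` are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_sampling_distribution(population, n, reps, statistic, seed=0):
    """Approximate a sampling distribution by simulation (hypothetical helper).

    Step 1: take a random sample of size n from the population.
    Step 2: compute the summary statistic for that sample.
    Step 3: repeat Steps 1 and 2 `reps` times.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # random sample, with replacement
        values.append(statistic(sample))
    return values


# Assumed population: 1 = success, 0 = failure, true proportion p = 0.6.
population = [1] * 60 + [0] * 40
phats = simulate_sampling_distribution(population, n=40, reps=10_000,
                                       statistic=statistics.fmean)

# Steps 4 and 5: summarize center and spread (a histogram of `phats`
# would show the shape).
print("center:", round(statistics.fmean(phats), 3))
print("spread:", round(statistics.stdev(phats), 3))
```

With this setup the center should land near 0.6 and the spread near $\sqrt{0.6(0.4)/40} \approx 0.077$; sampling with replacement keeps the draws independent.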
REASONABLY LIKELY/RARE EVENTS
Reasonably Likely Events: Values that lie in the
middle 95% of a sampling distribution. (E.g., in a
normal distribution, reasonably likely events lie
within about two standard deviations of the mean.)

Rare Events: Values that lie in the outer 5% of a
sampling distribution. (E.g., in a normal
distribution, rare events are those values that lie
more than about two standard deviations from the mean.)
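For a simulated sampling distribution of any shape, the middle 95% can be read off directly from its 2.5th and 97.5th percentiles. The sketch below assumes a hypothetical simulated distribution (10,000 means of samples of size 25 from a normal population with mean 50 and standard deviation 10); the function names `middle_95` and `classify` are illustrative only.

```python
import random
import statistics


def middle_95(values):
    """Return the 2.5th and 97.5th percentiles of the simulated values."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def classify(value, values):
    """Label a value using the middle 95% of a simulated sampling distribution."""
    low, high = middle_95(values)
    return "reasonably likely" if low <= value <= high else "rare"


# Hypothetical simulated sampling distribution of the sample mean.
rng = random.Random(4)
sim = [statistics.fmean(rng.gauss(50, 10) for _ in range(25)) for _ in range(10_000)]

low, high = middle_95(sim)
print(round(low, 1), round(high, 1))  # close to 50 - 1.96(2) and 50 + 1.96(2)
print(classify(52, sim), classify(55, sim))
```

Because this assumed population is normal, the cut points land close to two standard errors (here $10/\sqrt{25} = 2$) on either side of 50, matching the normal rule of thumb.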

Parameter, Statistic
A parameter is a number that describes the population. In
statistical practice, the value of a parameter is usually not known,
because we usually cannot examine entire populations.

A statistic is a number that can be computed from the sample
data without use of any unknown parameter. In practice, we
often use a statistic to estimate an unknown parameter.

Statistic versus Parameter: MEANS
The mean income of the sample households contacted by the
Current Population Survey was x̄ = $60,528. The number
$60,528 is a statistic because it describes this one Current
Population Survey sample. The population that the poll wants to draw
conclusions about is all 113 million U.S. households. The
parameter of interest is the mean income of all of these
households. We don't know the value of this parameter.
Sampling Distributions of the
Number of Successes
Properties: If a random sample of size n is selected from a
population with proportion of successes p, then the sampling distribution of
the number of successes X:
Has a mean of $\mu_X = np$
Has a standard deviation of $\sigma_X = \sqrt{np(1-p)}$
Will be approximately normal as long as n is large enough

As a guideline, if both np and n(1-p) are at least 10, then using the normal
distribution as an approximation to the shape of the sampling distribution
will give reasonably accurate results.
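The three properties and the guideline can be collected into one small function. This is a sketch with a hypothetical name, `count_distribution`; the values n = 100, p = 0.5 and n = 20, p = 0.1 are chosen only for illustration.

```python
import math


def count_distribution(n, p):
    """Mean, standard deviation, and normal-model check for the number of successes X."""
    mean = n * p                                   # mu_X = np
    sd = math.sqrt(n * p * (1 - p))                # sigma_X = sqrt(np(1 - p))
    normal_ok = n * p >= 10 and n * (1 - p) >= 10  # guideline: both at least 10
    return mean, sd, normal_ok


print(count_distribution(100, 0.5))  # (50.0, 5.0, True)
print(count_distribution(20, 0.1))   # normal_ok is False because np = 2
```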
Example
Question:
The use of seat belts continues to rise in the U.S., with
overall seat belt usage of 82%. Mississippi lags
behind the rest of the nation; only about 60% wear
seat belts. Suppose you take a random sample of 40
Mississippians. How many do you expect to
wear them? What is the probability that 30 or more
people in your sample wear seat belts?

Solution: You would expect 60% of 40, or 24, to be
wearing them. Let's estimate the probability that 30 or more
wear seat belts.
First we must check that both np and n(1-p) are at least 10
(this ensures the binomial distribution is mound-shaped enough
for the normal distribution to be a reasonable model):
$np = 40(0.6) = 24 \qquad n(1-p) = 40(1-0.6) = 16$

Thus, the sampling distribution of the number of successes is approximately normal,
with mean and standard error of:
$\mu_X = np = 40(0.6) = 24$
$\sigma_X = \sqrt{np(1-p)} = \sqrt{40(0.6)(1-0.6)} \approx 3.098$

The z-score for 30 successes is:
$z = \frac{x - \mu_X}{\sigma_X} = \frac{x - np}{\sqrt{np(1-p)}} \approx \frac{30 - 24}{3.098} \approx 1.937$

The proportion below this value of z is .9736; therefore the probability of
getting 30 or more successes is 1 - .9736 = .0264.

CALCULATOR INPUT:
normalcdf(-1E99, 1.937, 0, 1) = .9736273976
1 - .9736273976 = .0263726024
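The same calculation can be reproduced in Python, using `statistics.NormalDist` from the standard library in place of the calculator's `normalcdf`. For comparison, the sketch also sums the exact binomial probabilities with combinations (`math.comb`), the ${}_{n}C_{r}$ approach mentioned in the note on exact sampling distributions.

```python
import math
from statistics import NormalDist

n, p = 40, 0.6
mean = n * p                        # mu_X = np = 24
se = math.sqrt(n * p * (1 - p))     # sigma_X = sqrt(np(1 - p)), about 3.098

# Normal approximation, like 1 - normalcdf(-1E99, z, 0, 1) on the calculator.
z = (30 - mean) / se
p_normal = 1 - NormalDist().cdf(z)

# Exact binomial probability P(X >= 30) = sum of nCr * p^r * (1 - p)^(n - r).
p_exact = sum(math.comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(30, n + 1))

print(f"z = {z:.3f}, normal approximation = {p_normal:.4f}")
print(f"exact binomial = {p_exact:.4f}")
```

The normal approximation comes out at about 0.0264, matching the calculator result up to the rounding of z. The exact binomial sum is noticeably larger (roughly 0.035), a reminder that the normal model is only an approximation.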
Point Estimators
Definition of the two statistics called Point Estimators:
When inferences are made from the sample to the population, the
sample mean is viewed as an estimator of the mean of the population
from which the sample was selected.
Similarly, the proportion of successes in a sample is an estimator of
the proportion of successes in the population.

Properties of Point Estimators:
When you use a summary statistic from a sample to estimate a parameter of a
population, there are two properties you would like that summary statistic to have.

1. The summary statistic should be unbiased: the mean of its sampling distribution
should equal the parameter being estimated.

2. The summary statistic should have as little variability as possible and
should have a standard error that decreases as the sample size increases.
Point Estimators continued
Example
Estimating the Distance of the Farthest Galaxy
The observatory is given the names of 31 galaxies. Your job is to
estimate the maximum distance from Earth among the 31 galaxies.
You have the resources to measure the distance of only 10 galaxies,
which you will select at random. The distances in megaparsecs
(Mpc) are:
.008 .76 .81 7.2 7.5 9.7 10.6 11.2 11.4 11.6 11.7 13.2
15 15 15.3 15.5 15.7 16.1 16.1 16.8 16.8 20.9 22.9 22.9
24.1 25.9 26.2 29.2 31.6 58.7 93
The maximum distance, 93 Mpc, is the population parameter you hope to
estimate.
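The slide does not say which statistic to use, so the sketch below assumes the natural choice, the largest distance in the sample, and simulates its sampling distribution for random samples of 10 of the 31 galaxies. The seed and the 10,000 repetitions are arbitrary. Because a sample maximum can never exceed the population maximum, its sampling distribution centers below 93 Mpc, which illustrates a biased estimator.

```python
import random
import statistics

# Distances (Mpc) of the 31 galaxies listed above.
distances = [0.008, 0.76, 0.81, 7.2, 7.5, 9.7, 10.6, 11.2, 11.4, 11.6, 11.7, 13.2,
             15, 15, 15.3, 15.5, 15.7, 16.1, 16.1, 16.8, 16.8, 20.9, 22.9, 22.9,
             24.1, 25.9, 26.2, 29.2, 31.6, 58.7, 93]
parameter = max(distances)  # the population maximum, 93 Mpc

rng = random.Random(1)
# Each simulated sample: 10 galaxies chosen at random without replacement.
# Assumed statistic: the largest distance in that sample.
sample_maxes = [max(rng.sample(distances, 10)) for _ in range(10_000)]

print("parameter:", parameter)
print("mean of sample maximums:", round(statistics.fmean(sample_maxes), 1))
print("share of samples that include 93 Mpc:",
      round(sum(m == parameter for m in sample_maxes) / len(sample_maxes), 2))
```

Only about 10/31, roughly a third, of the samples contain the 93 Mpc galaxy; every other sample underestimates the parameter.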

Statistic versus Parameter: PROPORTIONS
The Gallup Poll asked a random sample of 515 US adults whether they believe
in ghosts. Of the respondents, 160 said YES. So the proportion of the sample
who say they believe in ghosts is:
$\hat{p} = \frac{160}{515} \approx 0.31$

The number 0.31 is a statistic. We can use it to estimate our parameter of interest:
p, the proportion of all US adults who believe in ghosts.
---------------------------------------------------------------------------------------------------

Example 2:
1) State whether each number in bold italics is a parameter or a statistic.
2) Use appropriate notation to describe each number.

The Tennessee STAR experiment randomly assigned children to regular or small
classes during their first four years of school. When these children reached high
school, 40.2% of blacks from small classes took the ACT or SAT. Only 31.7%
of blacks from regular classes took one of these exams.
Sampling Distribution of the Sample Proportion
PROPERTIES: If a random sample of size n is selected from a population with
proportion of successes p, the sampling distribution of the sample proportion $\hat{p}$ has these properties:

The mean of the sampling distribution is equal to the mean of the population, or
$\mu_{\hat{p}} = p$

The standard error of the sampling distribution is equal to the standard deviation
of the population, $\sigma = \sqrt{p(1-p)}$, divided by the square root of the sample size:
$\sigma_{\hat{p}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}$

As the sample size gets larger, the shape of the sampling distribution becomes more
normal and will be approximately normal if n is large enough.

As a guideline, if both np and n(1-p) are at least 10, then using the normal distribution as an
approximation of the shape of the sampling distribution will give reasonably accurate
results.
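These formulas translate directly into code. The sketch below (hypothetical function name `phat_distribution`) computes the standard error as the population standard deviation $\sqrt{p(1-p)}$ divided by $\sqrt{n}$ and confirms it equals $\sqrt{p(1-p)/n}$; the values n = 100 and p = 0.5 are illustrative.

```python
import math


def phat_distribution(n, p):
    """Center and spread of the sampling distribution of the sample proportion."""
    mean = p                              # mu_phat = p
    sigma = math.sqrt(p * (1 - p))        # SD of a population of successes/failures
    se = sigma / math.sqrt(n)             # sigma_phat = sigma / sqrt(n)
    return mean, se


mean, se = phat_distribution(100, 0.5)
print(mean, round(se, 4))  # 0.5 0.05
# The same value written as sqrt(p(1 - p) / n):
print(math.isclose(se, math.sqrt(0.5 * 0.5 / 100)))
```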
Assumptions and Conditions
The model for the distribution of sample proportions
rests on two assumptions:
1. The sampled values must be independent of each other.
2. The sample size, n, must be large enough.

Corresponding Conditions to be checked prior to using the Normal to
model the distribution of sample proportions:
1. 10% Condition: the sample size n is no more than 10% of the population size.
2. Success/Failure Condition: both np and n(1-p) are at least 10.



NOTE: Sample proportions, like sample means, are among the most common
summary statistics in practical use, and knowledge of their behavior is
fundamental to the study of statistics.
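Both conditions can be checked mechanically before reaching for the normal model. This is an illustrative sketch; the function name `check_conditions` and the population sizes used below (1,000,000 and 150) are hypothetical.

```python
def check_conditions(n, p, population_size):
    """Report the 10% and Success/Failure conditions for a sample proportion."""
    return {
        # Sample is no more than 10% of the population (supports independence).
        "10% condition": n <= 0.10 * population_size,
        # Expected successes and failures are both at least 10.
        "success/failure condition": n * p >= 10 and n * (1 - p) >= 10,
    }


print(check_conditions(40, 0.6, 1_000_000))  # both conditions met
print(check_conditions(20, 0.1, 150))        # neither condition met
```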
Example
If you take a random sample of 40 Mississippians, what is the probability that 75%
or more wear seat belts?

This is the same question asked in the previous example, because 30 out of 40 is
75%. However, the notation (and computations) will reflect that you now are
working with the proportion of successes in the sample rather than the
number of successes.
Some notation will make your work more efficient:
Suppose you count the number of successes X in a random sample of size n from
a population in which the true proportion of successes is given by p.
Then the sample proportion is
$\hat{p} = \frac{X}{n}$

So, suppose your sample of automobile drivers contains 26 who use seat belts.
Then n = 40; p = 0.60; and
$\hat{p} = \frac{26}{40} = 0.65$
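Working in proportions, the question can be finished with the sampling distribution of $\hat{p}$. A minimal sketch using the standard library's `statistics.NormalDist`: the z-score for $\hat{p} = 0.75$ equals the z-score for 30 successes, so the probability matches the earlier count-based answer of about 0.0264.

```python
import math
from statistics import NormalDist

n, p = 40, 0.60
se = math.sqrt(p * (1 - p) / n)      # sigma_phat = sqrt(p(1 - p) / n)

z = (0.75 - p) / se                  # same z as for 30 successes out of 40
prob = 1 - NormalDist(mu=p, sigma=se).cdf(0.75)   # P(phat >= 0.75)

p_hat = 26 / n                       # the sample with 26 seat-belt users
print(f"SE = {se:.4f}, z = {z:.3f}, P(phat >= 0.75) = {prob:.4f}")
print(f"p-hat = {p_hat:.2f}")
```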
Example
Sampling Distribution of the Sample Mean
PROPERTIES: If a random sample of size n is selected from a population
with mean $\mu$ and standard deviation $\sigma$, then:

CENTER: the mean $\mu_{\bar{x}}$ of the sampling distribution of $\bar{x}$ equals the mean of
the population, $\mu$:
$\mu_{\bar{x}} = \mu$

SPREAD: the standard deviation $\sigma_{\bar{x}}$ of the sampling distribution of $\bar{x}$,
called the standard error of the mean, equals the standard deviation of the
population, $\sigma$, divided by the square root of the sample size n:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

SHAPE: the shape of the sampling distribution will be approximately normal
if the population is approximately normal; for other populations, the
sampling distribution becomes more normal as n increases (this property is
called the Central Limit Theorem).
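A quick simulation can make the three properties concrete. The sketch below assumes a hypothetical, strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1), draws 10,000 samples of size n = 30, and compares the center and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

rng = random.Random(2)
mu, sigma, n, reps = 1.0, 1.0, 30, 10_000   # exponential(1) has mean 1 and SD 1

# Each entry is the mean of one random sample of size n.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print("center:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("spread:", round(statistics.stdev(means), 3), "vs sigma/sqrt(n) =",
      round(sigma / math.sqrt(n), 3))
```

A histogram of `means` would look far more symmetric than the skewed population, as the Central Limit Theorem describes.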
Example
Given the population distribution:

# of Children    Proportion of Families
0                0.524
1                0.201
2                0.179
3                0.070
4 (or more)      0.026

with $\mu_X \approx 0.9$ and $\sigma_X \approx 1.1$.

Five sampling distributions of the sample mean for samples of the noted sizes:

Sample Size    Mean     SE
1              0.873    1.1
4              0.873    0.55
10             0.873    0.35
20             0.873    0.25
40             0.873    0.17
Population     0.873    1.1

What is the probability that a random sample of 20 families in
the US will have an average of fewer than 1.5 children?
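The rounded values $\mu \approx 0.9$ and $\sigma \approx 1.1$, and the SE column, can be recomputed from the population table. This sketch treats "4 (or more)" as exactly 4, which is an assumption; with it, the mean is 0.873 and each SE is $\sigma/\sqrt{n}$.

```python
import math

# Population table: number of children (4 stands in for "4 or more").
children = [0, 1, 2, 3, 4]
proportions = [0.524, 0.201, 0.179, 0.070, 0.026]

mu = sum(x * w for x, w in zip(children, proportions))
sigma = math.sqrt(sum((x - mu) ** 2 * w for x, w in zip(children, proportions)))
print(f"mu = {mu:.3f}, sigma = {sigma:.2f}")

sizes = [1, 4, 10, 20, 40]
ses = [sigma / math.sqrt(n) for n in sizes]
for n, se in zip(sizes, ses):
    print(f"n = {n:2d}: mean = {mu:.3f}, SE = {se:.2f}")
```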
Solution
Let's look at the sampling distribution for n = 20; we see that the sampling
distribution of the sample mean is approximately normal. Since the mean of the population
is 0.9 and the standard deviation is 1.1, for the sampling distribution we have:
$\mu_{\bar{x}} = \mu = 0.9$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{20}} \approx 0.25$

The z-score for the value 1.5 is:
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx \frac{1.5 - 0.9}{0.25} = 2.4$

Using the z-table, the proportion below z = 2.4 is 0.9918, which is the probability
that the sample mean will fall below 1.5. In a random sample of 20 families, it
is almost certain that the average number of children per family will be less than 1.5.
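The probability can be checked with `statistics.NormalDist`. Keeping the unrounded standard error, $1.1/\sqrt{20} \approx 0.246$, gives z of about 2.44 and a probability of about 0.993; the 0.9918 above comes from rounding the standard error to 0.25 first, and both lead to the same conclusion.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.9, 1.1, 20
se = sigma / math.sqrt(n)                 # about 0.246 before rounding

z_exact = (1.5 - mu) / se                 # about 2.44
z_rounded = (1.5 - mu) / 0.25             # the rounded SE gives 2.4

std_normal = NormalDist()                 # mean 0, SD 1
print(round(std_normal.cdf(z_exact), 4))
print(round(std_normal.cdf(z_rounded), 4))
```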
Using Properties of the Sampling Distribution
of the mean

1. When can you use the property that the mean of the sampling distribution
of the mean is equal to the mean of the population, $\mu_{\bar{x}} = \mu$?
- In random sampling, it is always true. The shape of the population
doesn't matter, nor how large the sample is, how large the population is, or
whether you sample with or without replacement.

2. When can you use the property that the standard error of the sampling distribution
of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$?

- You can use this formula with a population of any shape and with any sample
size as long as you sample randomly, either with replacement, or without
replacement when the sample size is less than 10% of the population size. If you are taking
a random sample without replacement from a small population, use the following formula instead:
$SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$
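A small helper shows when the correction factor matters. This is a sketch with a hypothetical function name, `standard_error_of_mean`; the population sizes N = 50 and N = 1,000,000 are illustrative. For a large population the factor $\sqrt{(N-n)/(N-1)}$ is essentially 1, so the two formulas agree.

```python
import math


def standard_error_of_mean(sigma, n, N=None):
    """SE of the sample mean; pass N to apply the finite population correction
    for sampling without replacement from a population of size N."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


print(round(standard_error_of_mean(1.1, 20), 4))              # no correction
print(round(standard_error_of_mean(1.1, 20, N=1_000_000), 4)) # huge population
print(round(standard_error_of_mean(1.1, 20, N=50), 4))        # small population
```

Sampling the entire population (n = N) leaves no sampling variability, so the corrected SE is 0 in that case.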

3. To compute probabilities, you find the z-score and then use the table to get
the probability. This doesn't work unless the sampling distribution is approximately normal.
When can you assume the sampling distribution is approximately normal?

- With a large enough sample, the sampling distribution is approximately normal, although there is no hard
and fast rule for knowing what is large enough. We will learn some guidelines
later. For now, if you are told the population is approximately normally distributed,
you can assume that the sampling distribution of the mean is approximately normal too, no matter what
the sample size. If you are told the sample size is very large, it's safe to assume the
sampling distribution of the mean is approximately normal.

Properties of the Sampling Distribution
of the sum and difference
Suppose two values are taken randomly from two populations with means
$\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the sampling
distributions of the sum and difference of the two values have means of:
$\mu_{sum} = \mu_1 + \mu_2$
$\mu_{difference} = \mu_1 - \mu_2$

If the two values are selected independently, the variances of the sum and
difference are:
$\sigma_{sum}^2 = \sigma_1^2 + \sigma_2^2$
$\sigma_{difference}^2 = \sigma_1^2 + \sigma_2^2$
(The variances add for the difference as well as for the sum.)

The shapes of the sampling distributions of the sum and difference depend on the shapes of the two
original populations. If both are normally distributed, so are the sampling distributions of the sum
and difference.
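A simulation can confirm that means add or subtract while variances add in both cases. The two normal populations below (means 10 and 4, standard deviations 3 and 4) are hypothetical, and the values are drawn independently, as the variance rule requires.

```python
import random
import statistics

rng = random.Random(3)
reps = 50_000
# Hypothetical independent populations: N(10, 3^2) and N(4, 4^2).
x1 = [rng.gauss(10, 3) for _ in range(reps)]
x2 = [rng.gauss(4, 4) for _ in range(reps)]

sums = [a + b for a, b in zip(x1, x2)]
diffs = [a - b for a, b in zip(x1, x2)]

# Theory: means 14 and 6; variance 3^2 + 4^2 = 25 for both.
print("sum:        mean", round(statistics.fmean(sums), 2),
      "variance", round(statistics.variance(sums), 1))
print("difference: mean", round(statistics.fmean(diffs), 2),
      "variance", round(statistics.variance(diffs), 1))
```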
Probabilities involving Sample Totals
Properties of the Sampling Distribution of the Sum of a Sample
If a random sample of size n is selected from a distribution with mean $\mu$
and standard deviation $\sigma$, then:
The mean of the sampling distribution of the sum is:
$\mu_{sum} = n\mu$

The standard error of the sampling distribution of the sum is:
$\sigma_{sum} = n\sigma_{\bar{x}} = n\frac{\sigma}{\sqrt{n}} = \sqrt{n}\,\sigma$

The shape of the sampling distribution will be approximately normal if the population is
approximately normally distributed; for other populations, the sampling distribution
will become more normal as n increases.
Example
Sometimes situations are stated in terms of the total number in a sample rather
than the average number. What is the probability that a random sample of 20
families in the U.S. will have a total of 30 or fewer children?

Solution: The sampling distribution of the sum is approximately normal because it has the
same shape as the sampling distribution of the mean. Its mean is
$\mu_{sum} = n\mu = 20(0.9) = 18$

and its standard error is
$\sigma_{sum} = \sqrt{n}\,\sigma = \sqrt{20}(1.1) \approx 5$

The z-score for the total of 30 children is
$z = \frac{\text{sample sum} - \mu_{sum}}{\sigma_{sum}} = \frac{30 - 20(0.9)}{\sqrt{20}(1.1)} \approx 2.4$

Therefore the probability is approximately 0.9918.
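The sum-of-a-sample formulas and this example can be checked together. The sketch below (standard library only) keeps the unrounded standard error $\sqrt{20}(1.1) \approx 4.92$, so z comes out near 2.44, exactly the z-score from the sample-mean example, since dividing the total by n = 20 turns "30 or fewer children" into "an average of 1.5 or fewer".

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.9, 1.1, 20
mean_sum = n * mu                  # mu_sum = n * mu = 18
se_sum = math.sqrt(n) * sigma      # sigma_sum = sqrt(n) * sigma, about 4.92

z_sum = (30 - mean_sum) / se_sum
z_mean = (30 / n - mu) / (sigma / math.sqrt(n))   # same question, as an average

print(round(z_sum, 3), round(z_mean, 3))
print(round(NormalDist().cdf(z_sum), 4))          # P(total <= 30)
```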
Reasonably likely totals
In the previous example of a random sample of 20 families, what total
numbers of children are reasonably likely?

We know the sampling distribution of the total number of children is
approximately normal with a mean of 18 and a standard error of 5. The reasonably likely
outcomes are those within about two (more precisely, 1.96) standard errors of the mean.
Therefore this is the interval:
$18 \pm 1.96(5)$
or a total number of children between 8.2 and 27.8.
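The interval $18 \pm 1.96(5)$ can be computed directly; `NormalDist().inv_cdf(0.975)` from Python's standard library supplies the 1.96 that bounds the middle 95% of a normal distribution. The mean 18 and rounded standard error 5 are taken from the example.

```python
from statistics import NormalDist

mean_sum, se_sum = 18, 5
z_star = NormalDist().inv_cdf(0.975)       # about 1.96

low = mean_sum - z_star * se_sum
high = mean_sum + z_star * se_sum
print(f"z* = {z_star:.2f}: reasonably likely totals from {low:.1f} to {high:.1f}")
```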
If we look at the shape of the binomial distribution, dividing by n should have no effect on the
overall shape, so we'll get an idea of what the sampling distribution of the sample proportion
looks like.

The Effect of the Parameter (for these samples, I use sample size n = 10)
Finding the Shape of the Sampling Distribution
The Effect of Sample Size
Let's set the parameter at a fairly extreme value, say p = 0.1, and see what happens
as the sample size increases. We'll start with n = 20.
Variability of a Statistic
The variability of a statistic is described by the spread of its sampling distribution.
This spread is determined by the sampling design and the size of the sample.
Larger samples give smaller spread. As long as the population is much larger
than the sample (say, at least 10 times as large), the spread of the sampling
distribution is approximately the same for any population size.
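The effect of sample size on spread can be seen by simulation. This sketch assumes p = 0.1 (the value used above) and, as hypothetical choices, sample sizes 20, 80, and 320 with 5,000 repetitions each; quadrupling n should roughly halve the spread, in line with $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics


def simulated_spread(p, n, reps=5_000, seed=5):
    """Standard deviation of simulated sample proportions (hypothetical helper)."""
    rng = random.Random(seed)
    phats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    return statistics.stdev(phats)


p = 0.1
results = {n: simulated_spread(p, n) for n in (20, 80, 320)}
for n, spread in results.items():
    theory = math.sqrt(p * (1 - p) / n)
    print(f"n = {n:3d}: simulated {spread:.4f}, formula {theory:.4f}")
```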
Both bias and variability describe what
happens when we take many shots at
the target.

Bias means that our aim is off and we
consistently miss the bull's-eye in the
same direction. Our sample values do
not center on the population value.

High Variability means that repeated
shots are widely scattered on the target.
Variability of a Statistic
The figures below show histograms of four sampling distributions of statistics
intended to estimate the same parameter.
Example
Summary of Sampling Distributions
Key Facts about Sampling Distributions
A simulated, or approximate, sampling distribution is
the distribution of the sample statistic for a large
number of repeated random samples

Sampling Distributions, like data distributions, are best
described by shape, center, and spread

The standard deviation of a sampling distribution is
called the standard error

Many, but not all, sampling distributions are
approximately normal. For normal distributions,
reasonably likely outcomes are those that fall within
approximately two standard errors of the mean

If the mean of the sampling distribution is equal to the
population parameter being estimated, then the
summary statistic you are using is an unbiased
estimator of that parameter.
Summary continued

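                        MEAN              STANDARD DEVIATION         SIZE
POPULATION PARAMETER    $\mu$             $\sigma$                   N
SAMPLE STATISTIC        $\bar{x}$         $s$                        n
SAMPLING DISTRIBUTION   $\mu_{\bar{x}}$   $\sigma_{\bar{x}}$ (SE)

                                                OF THE MEAN                                  OF THE SUM (times n)
CENTER   All Populations                        $\mu_{\bar{x}} = \mu$                        $\mu_{sum} = n\mu$
CENTER   Special Case of Binomial Populations   $\mu_{\hat{p}} = p$                          $\mu_{sum} = np$
SPREAD   All Populations                        $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$         $\sigma_{sum} = \sqrt{n}\,\sigma$
SPREAD   Special Case of Binomial Populations   $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$         $\sigma_X = \sqrt{np(1-p)}$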
