
(More from 7.5) Determining the Minimum Sample Size for a Proportion



Given a confidence level, $1 - \alpha$, the sampling error, SE, is the greatest possible distance
between the point estimate and the value of the proportion it is estimating:

$SE = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$




Suppose, before we take a sample, we decide we want a sampling error no greater than some
number SE (maybe 0.03). To save time and money, we want to limit the sample size as
much as possible. Since SE depends largely on n, we can rearrange the equation above to get

$n = \frac{z_{\alpha/2}^2 \, p(1-p)}{SE^2}$

rounded up to the nearest whole number.

Note: Since p is unknown, use your best guess or let p = 0.50 (which gives a very
conservative sample size, because p(1 - p) is largest at p = 0.50).
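The sample-size formula can be sketched in a few lines of Python. This is only an illustration: the function name `min_sample_size` and the alternative guess p = 0.2 are assumptions made here, and the critical value is computed with the standard library instead of being read from a table.

```python
import math
from statistics import NormalDist


def min_sample_size(confidence, max_error, p=0.5):
    """Smallest n for which z_{alpha/2} * sqrt(p(1-p)/n) <= max_error."""
    alpha = 1.0 - confidence
    # Critical value z_{alpha/2}: upper-tail area alpha/2 under the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = (z ** 2) * p * (1.0 - p) / max_error ** 2
    # Always round up so the error bound is still met.
    return math.ceil(n)


print(min_sample_size(0.95, 0.03))         # conservative choice p = 0.50
print(min_sample_size(0.95, 0.03, p=0.2))  # hypothetical prior guess p = 0.20
```

A guess for p that is far from 0.50 gives a smaller required n, which is why p = 0.50 is the safe default when nothing is known about p.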


Ex. A news organization wants to predict, with 95% confidence, the proportion of
Americans who believe Barack Obama is doing a good job as President of the United
States.

What is the minimum sample size needed if you are to be accurate within 3% of the
population proportion?






Ex. ABC News and the Washington Post conducted a survey where they asked adults: Have
you ever seen anything that you believe was a spacecraft from another planet?

Find the minimum sample size necessary to ensure that the population proportion is
within 5 percent of the sample proportion. (Assume a confidence level of 99%)




Chapter 8 Inferences Based on a Single Sample: Tests of Hypotheses

Where We've Been
Calculated point estimators of population parameters
Used the sampling distribution of a statistic to assess the reliability of an estimate through a
confidence interval

Where We're Going
Test a specific value of a population parameter
Measure the reliability of the test


Section 8.1 The Elements of a Hypothesis Test

A hypothesis is a claim or statement about the value of a single population characteristic or
the values of several population characteristics.

A hypothesis test is a process that uses sample data to decide between two competing claims
(hypotheses) about a population characteristic.

The Competing Hypotheses
The null hypothesis, denoted by $H_0$, is a claim about a population characteristic that is
initially assumed to be true. Think of this statement as a representation of the status quo.
The alternative hypothesis, denoted by $H_a$, is the competing claim. There must be
strong evidence in favor of this hypothesis before it is accepted.
The test statistic is a value that measures the distance between the observed value of $\bar{X}$
and the hypothesized value of $\mu$.


The Outcome of a Hypothesis Test
In carrying out a test of $H_0$ vs. $H_a$:
If the test statistic has a high probability when $H_0$ is true, then $H_0$ is not rejected.
If the test statistic has a (very) low probability when $H_0$ is true, then $H_0$ is rejected.

Defining the Null and Alternative Hypotheses
The form of a null hypothesis is
$H_0$: population characteristic = hypothesized value

The alternative hypothesis will have one of the following three forms:
$H_a$: population characteristic > hypothesized value
$H_a$: population characteristic < hypothesized value
$H_a$: population characteristic ≠ hypothesized value

Ex. A battery manufacturer says the mean life of its batteries is 300 min. Consumer advocates
claim that the mean is not 300 minutes. Write the null and alternative hypotheses.







Ex. In the past, it was believed that the proportion of Americans without health care coverage
was 0.20. Experts now believe this proportion has increased. Write the null and
alternative hypotheses.







Comparing hypothesis tests to the U.S. legal system.

Ex. In a jury trial, the defendant is presumed innocent unless proven guilty. The null
and alternative hypotheses are:
$H_0$: Defendant is innocent
$H_a$: Defendant is guilty

The alternative is accepted only if the defendant is shown to be guilty beyond a reasonable
doubt. If not, the jury must fail to reject the null; that is, find the defendant
not guilty.

In the jury trial example, an error is made if an innocent defendant is found guilty (alternative
hypothesis accepted) or if a guilty defendant is set free (null hypothesis is accepted).
Obviously, we want to minimize the risk of making such errors.
Errors associated with hypothesis tests.
Suppose for the moment that we can either accept the null or reject the null.

If the null is rejected when it is true, a Type I error is made; the probability of this error is
denoted by $\alpha$. If the null is accepted when it is false, a Type II error is made; the
probability of this error is denoted by $\beta$.
                                       True State of Nature
Decision                               $H_0$ is True    $H_a$ is True
Accept Null (assume $H_0$ is true)     Correct          Type II Error
Reject Null (assume $H_a$ is true)     Type I Error     Correct

Measuring Type I and Type II Errors
The significance level is the maximum allowable probability of making a Type I error. We
denote this probability by $\alpha$. This level is typically 0.01, 0.05, or 0.10. Since $\alpha$ is a value that we
select, we can control the probability of making a Type I error. When we choose $\alpha$, we are
essentially saying that, if the null hypothesis is true, there is a probability of $\alpha$ that we will
reject it incorrectly.
The probability of making a Type II error is denoted by $\beta$. Because $\beta$ is often unknown and
very difficult to determine, we want to avoid making a Type II error. We do this
by never accepting the null; instead, we fail to reject it.
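The meaning of the significance level can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population (normal, mean 100, standard deviation 15), the sample size, and the number of trials are all hypothetical. The null hypothesis is true in every trial, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=40, trials=4000, seed=1):
    """Fraction of upper-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0                    # hypothetical population; H0: mu = mu0 holds
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # critical value z_alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if z > z_crit:                          # rejecting a true null is a Type I error
            rejections += 1
    return rejections / trials


print(type_i_error_rate())            # should land near alpha = 0.05
print(type_i_error_rate(alpha=0.10))  # should land near alpha = 0.10
```

The observed rejection rate tracks whatever $\alpha$ is chosen, which is the sense in which the Type I error probability is under our control.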

Ex. Water samples are taken from water used for cooling as it is being discharged from a
power plant into a river. It has been determined that as long as the mean temperature of
the discharged water is at most 150, there will be no negative effects on the river's
ecosystem. To investigate if the plant is in compliance with regulations, monitors will
take 50 water samples at randomly selected times and record the temperature of each
sample.

a. State the null and alternative hypotheses




b. In the context of this problem, describe Type I and Type II errors.






c. Which type of error would you consider more serious? Explain.

Section 8.2 Large-Sample Test of a Hypothesis about a Population Mean

When do we decide to reject the null hypothesis?

Note: There are two common methods for deciding whether or not to reject the null hypothesis:
Rejection Regions or P-values

Rejection Regions: Depending on the alternative hypothesis, $H_a$, and the significance level $\alpha$,
critical values are used to define the rejection region. These critical values are found using the same
techniques used to find critical values for confidence intervals.



Ex. Find the rejection regions for the following $H_a$ and $\alpha$.

a. $H_a: \mu < k$ and $\alpha = 0.10$

b. $H_a: \mu > k$ and $\alpha = 0.05$

c. $H_a: \mu \ne k$ and $\alpha = 0.01$


Rejection Regions for Common Values of $\alpha$
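The critical values that bound these rejection regions can be tabulated with Python's standard library. This is a minimal sketch; the helper name `critical_values` is an assumption made for illustration, and the three $\alpha$ values are the typical levels named earlier in these notes.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (z_alpha, z_{alpha/2}) for the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_one = std_normal.inv_cdf(1.0 - alpha)        # used for one-tailed tests
    z_two = std_normal.inv_cdf(1.0 - alpha / 2.0)  # used for two-tailed tests
    return z_one, z_two


for alpha in (0.10, 0.05, 0.01):
    z_one, z_two = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: "
          f"lower-tailed z < {-z_one:.3f}; "
          f"upper-tailed z > {z_one:.3f}; "
          f"two-tailed |z| > {z_two:.3f}")
```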

Large-Sample Hypothesis Test For A Mean (aka Z-Test)

Conditions for a valid test:
1) A random sample is selected from the target population.
2) The sample size n is large (n ≥ 30).

Hypotheses
$H_0: \mu = \mu_0$
$H_A: \mu > \mu_0$, $\mu < \mu_0$, or $\mu \ne \mu_0$


Test Statistic: $z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$ (use s in place of $\sigma$ if necessary)

Rejection Region and Conclusion:
For a one-tailed test: $H_0$ should be rejected if $z > z_{\alpha}$ (upper-tailed) or $z < -z_{\alpha}$ (lower-tailed).
For a two-tailed test: $H_0$ should be rejected if $|z| > z_{\alpha/2}$.
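The rejection-region procedure can be sketched as a short function. The name `z_test_mean` and its `tail` argument are assumptions made for illustration; the numbers plugged in are those of the EPA example that follows, treated as an upper-tailed test with s used in place of $\sigma$.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sd, n, alpha, tail):
    """Large-sample z-test for a mean; tail is 'upper', 'lower', or 'two'."""
    z = (xbar - mu0) / (sd / math.sqrt(n))
    if tail == "two":
        z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
        reject = abs(z) > z_crit
    else:
        z_crit = NormalDist().inv_cdf(1.0 - alpha)        # z_alpha
        reject = z > z_crit if tail == "upper" else z < -z_crit
    return z, z_crit, reject


# H0: mu = 5.00 vs. Ha: mu > 5.00, with n = 45 readings.
z, z_crit, reject = z_test_mean(xbar=5.32, mu0=5.00, sd=1.13, n=45,
                                alpha=0.05, tail="upper")
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Keeping the tail explicit avoids rejecting on a large test statistic that points in the wrong direction for a one-tailed alternative.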


Ex. Suppose the EPA is investigating a company for not complying with carbon monoxide
standards. If the mean reading is higher than 5.00, the EPA will levy fines against the
company. A random sample of 45 readings is taken and the results were: $\bar{X} = 5.32$
and s = 1.13. Should the EPA fine the company for non-compliance? Use $\alpha = 0.05$.

Hypotheses Test Procedure/Assumptions




Test statistic Significance Level/Rejection Region/






Decision and conclusion.

Ex. On a self-image test, the mean score for public-assistance recipients is expected to be 65.
A random sample of 38 recipients in a particular district is given the test and they achieve
a mean score of $\bar{X} = 63.1$ and a standard deviation of s = 5.83. Do the recipients in this
district test lower than the expected average? Use $\alpha = 0.01$.

Hypotheses Test statistic






Significance Level/Rejection Region/

Decision and conclusion.





Ex. The National Institute of Diabetes and Digestive and Kidney Diseases reports that the
average cost of bariatric (weight loss) surgery is $22,500. You think this information is
incorrect. You randomly select 30 bariatric surgery patients and find that the average cost
for their surgeries is $21,545 with a standard deviation of $3015. Is there enough
evidence to support your claim at $\alpha = 0.10$?

Hypotheses Test statistic






Significance Level/Rejection Region/

Decision and conclusion.



Section 8.3 Observed Significance Levels: P Values

The P-value (also sometimes called the observed significance level) is a measure of
inconsistency between the hypothesized value for a population characteristic and the observed
sample. It is the probability, assuming that $H_0$ is true, of obtaining a test statistic value at least
as inconsistent with $H_0$ as what actually resulted.

For an upper-tailed test this is $P(z \ge z^* \mid H_0)$, where $z^*$ is the value of the test statistic.
The lower this probability, the stronger the evidence against $H_0$.

A decision as to whether $H_0$ should be rejected results from comparing the P-value to the
chosen $\alpha$:

$H_0$ should be rejected if P-value ≤ $\alpha$.
$H_0$ should not be rejected if P-value > $\alpha$.


Finding p-values

One-tailed Tests



Two-tailed Tests


(For the two-tailed test, P-value = $2 \cdot P(z > |\text{test statistic}|)$.)
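The three P-value cases can be sketched with the standard normal CDF from Python's standard library. The function name `p_value` and its `tail` argument are assumptions made for illustration; the numbers are those of the pit-stop example that follows, treated as a lower-tailed test with s used in place of $\sigma$.

```python
import math
from statistics import NormalDist


def p_value(z, tail):
    """P-value of a z test statistic; tail is 'lower', 'upper', or 'two'."""
    std_normal = NormalDist()
    if tail == "lower":
        return std_normal.cdf(z)                 # P(Z <= z)
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)           # P(Z >= z)
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))  # 2 * P(Z > |z|)


# H0: mu = 13 vs. Ha: mu < 13, with n = 32 pit stops.
z = (12.9 - 13.0) / (0.19 / math.sqrt(32))
p = p_value(z, "lower")
print(f"z = {z:.2f}, P-value = {p:.4f}, reject H0 at alpha = 0.01: {p <= 0.01}")
```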

Ex. In auto racing, a pit crew claims that its mean pit stop time (for 4 new tires and fuel) is
less than 13 seconds. A random selection of 32 pit stop times has a sample mean of 12.9
seconds and a standard deviation of 0.19 seconds. Is there enough evidence to support the
claim at $\alpha = 0.01$?
Hypotheses Test statistic




P-Value




Decision and conclusion.





Ex. A national organization has been working with utilities throughout the nation to find sites
for large wind machines that generate electricity. Wind speeds must average more than
22 miles per hour (mph) for a site to be acceptable. Recently, the organization conducted
wind speed tests at a particular site. Based on a sample of n = 33 wind speed recordings
(taken at random intervals), the wind speed at the site averaged 22.8 mph, with a standard
deviation of s = 4.3 mph. Determine whether the site meets the organization's
requirements. Use a significance level of 0.01.
Hypotheses Test statistic



P-Value




Decision and conclusion.



Section 8.4 Small-Sample Test of a Hypothesis about a Population Mean
Conditions for a valid test:
1) A random sample is selected from the target population.
2) The population from which the sample is selected is approximately normal

Hypotheses
$H_0: \mu = \mu_0$
$H_A: \mu > \mu_0$, $\mu < \mu_0$, or $\mu \ne \mu_0$


Test Statistic: $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$ with d.f. = n - 1

Rejection Region and Conclusion:
For a one-tailed test: $H_0$ should be rejected if $t > t_{\alpha}$ (upper-tailed) or $t < -t_{\alpha}$ (lower-tailed).
For a two-tailed test: $H_0$ should be rejected if $|t| > t_{\alpha/2}$.
Note: P-values may also be used in place of rejection regions.
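A minimal sketch of the small-sample procedure, using the numbers of the bottling example that follows and treating it as a lower-tailed test. The helper name `t_statistic` is an assumption made for illustration, and the critical value 1.721 is not computed here: it is $t_{0.05}$ with 21 degrees of freedom as read from a standard t table.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# H0: mu = 12 vs. Ha: mu < 12, alpha = 0.05, n = 22 bottles.
t, df = t_statistic(xbar=11.7, mu0=12.0, s=0.4, n=22)

# Assumption: t_{0.05} with 21 degrees of freedom, taken from a t table.
T_CRIT = 1.721

reject = t < -T_CRIT  # lower-tailed rejection region
print(f"t = {t:.2f}, d.f. = {df}, reject H0: {reject}")
```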

Ex. A bottling company produces bottles that hold 12 ounces of liquid. Periodically, the
company gets complaints that their bottles are not holding enough liquid. To test this
claim, the bottling company randomly samples 22 bottles and finds the average amount
of liquid held by the bottles is 11.7 ounces with a standard deviation of 0.4 ounces.
Conduct a hypothesis test to determine if the complaints are valid and the company is
under-filling the bottles. Use a significance level of 0.05
Hypotheses Test statistic





Significance Level/Rejection Region/

Decision and conclusion.

Ex. An ink cartridge for a laser printer is advertised to print an average of 10,000 pages. A
random sample of eight businesses that have recently bought this cartridge are asked to
report the number of pages printed by a single cartridge. The results are shown.
9771 9811 9885 9914 9625 10,079 10,145 10,214

Assume that the data belong to a normal population. Test the null hypothesis that the
mean number of pages is 10,000 against the alternative that it is not. Use $\alpha = 0.10$.

(from StatCrunch)
Hypothesis test results:
μ : mean of pages
$H_0$ : μ = 10000
$H_A$ : μ ≠ 10000
Variable   Sample Mean   Std. Err.   DF   T-Stat        P-value
pages      9930.5        71.1924     7    -0.9762279    0.3615
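The reported figures can be reproduced from the raw data with the standard library alone. This is a sketch under two stated assumptions: it uses 9625 as the fifth page count, because that is the value that reproduces the reported sample mean of 9930.5, and it obtains the two-tailed P-value by numerically integrating the t density with Simpson's rule in place of a library routine. The helper names `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math
from statistics import mean, stdev

# Assumption: fifth value taken as 9625 (this reproduces the reported mean 9930.5).
pages = [9771, 9811, 9885, 9914, 9625, 10079, 10145, 10214]
mu0 = 10000

n = len(pages)
df = n - 1
xbar = mean(pages)
std_err = stdev(pages) / math.sqrt(n)  # s / sqrt(n)
t = (xbar - mu0) / std_err


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value, 1 - P(-|t| < T < |t|), by Simpson's rule (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * half_area


p = two_tailed_p(t, df)
print(f"mean = {xbar}, std. err. = {std_err:.4f}, DF = {df}, "
      f"T-stat = {t:.4f}, P-value = {p:.4f}")
```

With $\alpha = 0.10$, the decision then follows from comparing the printed P-value with 0.10.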

Hypotheses Test Procedure/Assumptions







Test statistic Significance Level/Rejection Region/







Decision and conclusion.



Section 8.5 Large-Sample Test of a Hypothesis about a Population Proportion

Conditions for a valid test:
1. $\hat{p}$ is the sample proportion from a random sample.
2. The sample size is large: $np_0 \ge 15$ and $n(1 - p_0) \ge 15$.
3. Sampling is done without replacement and the sample size is no more than 10% of the
population size.

Hypotheses
$H_0: p = p_0$
$H_A: p > p_0$, $p < p_0$, or $p \ne p_0$


Test Statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$


P-Value/Rejection Region: computed as shown earlier

Conclusion:
Rejection Region Method
For a one-tailed test: $H_0$ should be rejected if $z > z_{\alpha}$ (upper-tailed) or $z < -z_{\alpha}$ (lower-tailed).
For a two-tailed test: $H_0$ should be rejected if $|z| > z_{\alpha/2}$.

P-value Method
$H_0$ should be rejected if P-value ≤ $\alpha$.
$H_0$ should not be rejected if P-value > $\alpha$.
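The proportion test can be sketched in the same way. The function name `z_test_proportion` and its `tail` argument are assumptions made for illustration; the numbers are those of the Supreme Court example that follows, treated as a lower-tailed test.

```python
import math
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n, tail):
    """Large-sample z-test for a proportion; tail is 'lower', 'upper', or 'two'."""
    # Large-sample condition from the list of conditions above.
    if n * p0 < 15 or n * (1 - p0) < 15:
        raise ValueError("sample too small for the normal approximation")
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1.0 - cdf(z)
    else:
        p_value = 2.0 * (1.0 - cdf(abs(z)))
    return z, p_value


# 471 of 1000 adults; H0: p = 0.50 vs. Ha: p < 0.50, alpha = 0.05.
z, p_value = z_test_proportion(p_hat=471 / 1000, p0=0.50, n=1000, tail="lower")
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value <= 0.05}")
```

Note that the standard error uses the hypothesized $p_0$, not the sample proportion, because the test statistic is computed assuming $H_0$ is true.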

Ex. In a representative sample of 1000 adult Americans, only 471 could name at least one
justice who is currently serving on the U.S. Supreme Court (Ipsos, January 10, 2006).
Using a significance level of $\alpha = 0.05$, carry out a hypothesis test to determine if there is
convincing evidence to support the claim that fewer than half of adult Americans can
name at least one justice currently serving on the Supreme Court.

Hypotheses Test statistic






P-Value/Rejection Region




Decision and conclusion.






Ex. A medical researcher estimates that more than 55 percent of American adults eat
breakfast every day. In a random sample of 250 adults, 57.4 percent say that they eat
breakfast every day. At $\alpha = 0.10$, is there enough evidence to support the researcher's
claim?

Hypotheses Test statistic





P-Value/Rejection Region




Decision and conclusion.
