Lecture Notes on Introductory Statistics, IX


(P.P. Leung)
Chapter 9 Hypothesis Tests for One Population Mean
9.1 The nature of Hypothesis Testing
9.2 Terms, Errors, and Hypotheses
9.3 Hypothesis Tests For One Population Mean when σ is known
9.5 P-values
9.6 Hypothesis tests for one population mean when σ is unknown
{Hypothesis testing is another aspect of inferential statistics (estimating parameters
from sample statistics), alongside confidence intervals. The examples still concentrate on
one population mean.
Null hypothesis -- the hypothesis to be tested.
Alternative hypothesis (or research hypothesis) -- the hypothesis set against the null
hypothesis.
300
=5 25

H0: x 3003 = x
Ha: x 3003 x

}
H0 Ha Hypothesis test ()

9.1 The nature of Hypothesis Testing


Hypothesis -- a statement that something is true.
Null and Alternative Hypotheses:
Null Hypothesis -- A hypothesis to be tested. We use the symbol H0 to represent the null
hypothesis.
Alternative Hypothesis -- A hypothesis to be considered as an alternative to the null
hypothesis. We use the symbol Ha to represent the alternative hypothesis.
Form of Hypotheses:
Null hypothesis -- H0: μ = μ0 (must be an equality)

Alternative hypothesis:
Two-tailed test -- Ha: μ ≠ μ0,
Left-tailed test -- Ha: μ < μ0,
Right-tailed test -- Ha: μ > μ0.
Examples P.385 ~ P.387 Ex9.1~3, for the following examples:
a. Determine the null hypothesis for the hypothesis test.
b. Determine the alternative hypothesis for the hypothesis test.
c. Classify the hypothesis test as two-tailed, left-tailed, or right-tailed.
1. Quality Assurance. A snack-food company produces 454 g bags of pretzels. Although the
actual net weights deviate slightly from 454g and vary from one bag to another, the
company insists that the mean net weight of the bags be kept at 454g. Indeed, if the mean
net weight is less than 454g, the company will be short-changing its customers; and if the
mean net weight exceeds 454g, the company will be unnecessarily overfilling the bags.
As part of its program, the quality assurance department periodically performs a
hypothesis test to decide whether the packaging machine is working properly, that is, to
decide whether the mean net weight of all bags packaged is 454g.
Ans: Let μ denote the mean net weight of all bags packaged.
(a) The packaging machine is working properly, or symbolically, H0: μ = 454 g.
(b) The packaging machine is not working properly, or symbolically, Ha: μ ≠ 454 g.
(c) Since the ≠ sign appears in the alternative hypothesis, the test is two-tailed.
2. Prices of History Books. The R.R. Bowker Company of New York collects information
on the retail prices of books and publishes the data in Publishers Weekly. In 1997, the
mean retail price of history books was $43.50. Suppose that we want to perform a
hypothesis test to decide whether this year's mean retail price of history books has
increased from the 1997 mean.
Ans: Let μ denote this year's mean retail price of history books.
(a) This year's mean retail price of history books equals the 1997 mean of $43.50, i.e.
H0: μ = $43.50.
(b) This year's mean retail price of history books is greater than the 1997 mean of
$43.50, that is, Ha: μ > $43.50.
(c) Since the > sign appears in the alternative hypothesis, the test is right-tailed.
3. Poverty and Calcium. Calcium is the most abundant mineral in the body and also one of
the most important. It works with phosphorus to build and maintain bones and teeth.
According to the Food and Nutrition Board of the National Academy of Sciences, the
recommended daily allowance (RDA) of calcium for adults is 800 milligrams (mg).

Suppose that we want to perform a hypothesis test to decide whether the average person
with an income below the poverty level gets less than the RDA of 800 mg.
Ans: Let μ denote the mean calcium intake (per day) of all people with incomes below the
poverty level.
(a) The mean calcium intake of all people with incomes below the poverty level equals
800 mg per day, i.e. H0: μ = 800 mg.
(b) The mean calcium intake of all people with incomes below the poverty level is less
than the RDA of 800 mg per day, i.e. Ha: μ < 800 mg.
(c) Since the < sign appears in the alternative hypothesis, the test is left-tailed.
The logic of Hypothesis Testing:
Basic logic of Hypothesis Testing:
Take a random sample from the population. If the sample data are consistent with the null
hypothesis, do not reject the null hypothesis; if the sample data are inconsistent with the
null hypothesis (in the direction of the alternative hypothesis), reject the null hypothesis
and conclude that the alternative hypothesis is true.
Example P.388 Ex9.4, Quality Assurance. A company that produces snack food uses a
machine to package 454 g bags of pretzels. We assume that the net weights are normally
distributed and that the population standard deviation of all such weights is σ = 7.8 g. A
random sample of 25 bags of pretzels has the net weights, in grams, displayed in the
table shown.
465   456   438   454   447
449   442   449   446   447
468   433   454   463   450
446   447   456   452   444
447   456   456   435   450

Mean (x̄) = 450
Do the data provide sufficient evidence to conclude that the packaging machine is not
working properly? We use the following steps in order to answer the question.
a. State the null and alternative hypotheses for the hypothesis test.
b. Discuss the logic behind carrying out the hypothesis test.
c. Identify the distribution of the variable x̄, that is, the sampling distribution of the
sample mean for samples of size 25.
d. Obtain a precise criterion for deciding whether to reject the null hypothesis in favor of
the alternative hypothesis.
e. Apply the criterion in part (d) to the sample data and state the conclusion.

Ans: Let μ denote the mean net weight of all bags packaged.
(a) The null and alternative hypotheses for the hypothesis test are
H0: μ = 454 g (the packaging machine is working properly)
Ha: μ ≠ 454 g (the packaging machine is not working properly)
(b) The logic: if the null hypothesis is true, that is, if μ = 454 g, the mean weight, x̄, of
the sample of 25 bags of pretzels should approximately equal 454 g. We say
"approximately equal" because we cannot expect a sample mean to equal the
population mean exactly; some sampling error is to be anticipated. However, if the sample
mean weight differs too much from 454 g, we would be inclined to reject the null
hypothesis and conclude that the alternative hypothesis is true. As we shall show in
part (d), we can use our knowledge of the sampling distribution of the sample mean
to decide how much difference is too much.
(c) With n = 25 and σ = 7.8 g, the sampling distribution of the sample mean has
μ_x̄ = μ (which we don't know),
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7.8}{\sqrt{25}} = 1.56, $$
and x̄ is normally distributed.
(d) The 68.26-95.44-99.74 rule states that, for a normally distributed variable, 95.44%
of all possible observations lie within two standard deviations to either side of the
mean. Applying this part of the rule to the variable x̄ and referring to part (c), we see
that 95.44% of all samples of 25 bags of pretzels have mean weights within 2(1.56)
= 3.12 g of μ. Or, equivalently, only 4.56% of all samples of 25 bags of pretzels
have mean weights that are not within 3.12 g of μ.
Criterion: if the mean weight, x̄, of the 25 bags of pretzels sampled is more than two
standard deviations (3.12 g) from 454 g, reject the null hypothesis, μ = 454 g, and conclude
that the alternative hypothesis, μ ≠ 454 g, is true. Otherwise, do not reject the null hypothesis.
(e) The mean weight, x̄, of the sample of 25 bags of pretzels is 450 g. Therefore,
$$ z = \frac{\bar{x} - 454}{1.56} = \frac{450 - 454}{1.56} = -2.56. $$
Because the mean weight of the 25 bags of pretzels sampled is more than two
standard deviations from 454 g, we reject the null hypothesis, μ = 454 g, and
conclude that the alternative hypothesis, μ ≠ 454 g, is true.
The data provide sufficient evidence to conclude that the packaging machine is not
working properly.
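The arithmetic in parts (c) to (e) can be checked with a short Python sketch. It uses the 25
weights from the table and takes σ = 7.8 g, the value implied by σ/√n = 1.56; the variable
names are illustrative only and not part of the notes.

```python
import math

# Net weights (g) of the 25 sampled bags, from the table above.
weights = [465, 456, 438, 454, 447,
           449, 442, 449, 446, 447,
           468, 433, 454, 463, 450,
           446, 447, 456, 452, 444,
           447, 456, 456, 435, 450]

mu0 = 454.0      # hypothesized mean under H0
sigma = 7.8      # population standard deviation (assumed known)
n = len(weights)

x_bar = sum(weights) / n          # sample mean
se = sigma / math.sqrt(n)         # standard deviation of x-bar
z = (x_bar - mu0) / se            # distance from mu0 in standard deviations

reject_h0 = abs(z) > 2            # the "more than two standard deviations" criterion
print(f"x_bar = {x_bar:.1f}, se = {se:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```

The two-standard-deviation cutoff corresponds to a significance level of roughly 4.56%, as
noted in part (d).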

9.2 Terms, Errors, and Hypotheses


Some Additional Terminology:
Test Statistic -- The statistic used as a basis for deciding whether the null hypothesis
should be rejected.
Rejection Region -- The set of values for the test statistic that leads to rejection of the
null hypothesis.
Nonrejection Region -- The set of values for the test statistic that leads to nonrejection
of the null hypothesis.
Critical Values -- The values of the test statistic that separate the rejection and
nonrejection regions. A critical value is considered part of the rejection region.

(Figure: rejection regions and nonrejection region for a two-tailed test. H0 is rejected in
either tail and not rejected in the central nonrejection region between the critical values.)
The alternative hypotheses:
                    Two-tailed test    Left-tailed test    Right-tailed test
Sign in Ha          ≠                  <                   >
Rejection region    Both sides         Left side           Right side

Type I and II errors:
Decisions and errors:
Decision            H0 is True          H0 is False
Do not reject H0    Correct decision    Type II error
Reject H0           Type I error        Correct decision

Type I and II Errors:
Type I error -- Rejecting the null hypothesis when it is in fact true.
Type II error -- Not rejecting the null hypothesis when it is in fact false.

Example P.394 Ex 9.5, Quality Assurance. Consider once again the pretzel packaging
hypothesis test. The null and alternative hypotheses are
H0: μ = 454 g (the packaging machine is working properly)
Ha: μ ≠ 454 g (the packaging machine is not working properly),
where μ is the mean net weight of all bags of pretzels packaged. Explain what each of
the following terms would mean.
a. Type I error
b. Type II error
c. Correct decision
Recall that the results of sampling 25 bags of pretzels led to rejection of the null
hypothesis, μ = 454 g, that is, to the conclusion that μ ≠ 454 g. Classify that conclusion
by error type or as a correct decision if
d. the mean net weight, μ, is in fact 454 g.
e. the mean net weight, μ, is in fact not 454 g.
Ans: (a) In fact, μ = 454 g, but the results of the sampling lead to the conclusion that μ ≠ 454 g.
In fact, the packaging machine is working properly, but we conclude that it is not.
(b) In fact, μ ≠ 454 g, but the results of the sampling do not lead to rejection of μ = 454 g.
In fact, the packaging machine is not working properly, but we fail to conclude that it is not.
(c) A correct decision can occur in either of 2 ways:
1. When in fact μ = 454 g, the results of the sampling do not lead to rejection of
μ = 454 g. The packaging machine is working properly, and we do not conclude otherwise.
2. When in fact μ ≠ 454 g, the results of the sampling lead to the conclusion that
μ ≠ 454 g. The packaging machine is not working properly, and we conclude that it is not.
(d) Type I error. In fact μ = 454 g, but we have rejected it.
(e) A correct decision. In fact μ ≠ 454 g, and we have concluded that μ ≠ 454 g.
Probabilities of Type I and Type II errors:
Significance level -- The probability of making a Type I error, that is, of rejecting a
true null hypothesis, is called the significance level, α, of a hypothesis test.
Relation between Type I and Type II Error Probabilities -- For a fixed sample size, the
smaller we specify the significance level, α, the larger will be the probability, β, of not
rejecting a false null hypothesis.

(Figure: sampling distributions showing the regions of α and β. As the value of α
decreases, the value of β increases, but not by an equal amount.)
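The trade-off between α and β can be illustrated numerically. The sketch below reuses the
pretzel test (μ0 = 454 g, σ = 7.8 g, n = 25) and assumes, purely for illustration, that the
true mean is 450 g; that value and the function name `beta` are not from the notes. It
computes β for a two-tailed z-test at several significance levels with the standard-library
`statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def beta(alpha, mu0, mu_true, sigma, n):
    """P(do not reject H0) for a two-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))    # mean of z under the true mean
    # Under mu_true, z ~ Normal(shift, 1); H0 is kept when -z_crit < z < z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

alphas = [0.10, 0.05, 0.01]
betas = [beta(a, mu0=454, mu_true=450, sigma=7.8, n=25) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  ->  beta = {b:.3f}")
```

Under these assumptions β grows as α shrinks, which is the relation stated above for a
fixed sample size.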

Possible Conclusions for a Hypothesis Test:
If the null hypothesis is rejected, we conclude that the alternative hypothesis is true.
If the null hypothesis is not rejected, we conclude that the data do not provide sufficient
evidence to support the alternative hypothesis.
The test results are statistically significant at the α level -- The null hypothesis is
rejected in a hypothesis test performed at the significance level α.
The test results are not statistically significant at the α level -- The null hypothesis is
not rejected in a hypothesis test performed at the significance level α.

9.3 Hypothesis Tests For One Population Mean when σ is known


Conducting a hypothesis test:
1. find the critical value(s),
2. compute the test statistic and see on which side of the critical value it falls, and
3. draw the conclusion.
Obtaining Critical Values:
Suppose that a hypothesis test is to be performed at a specified significance level, α. Then
the critical value(s) must be chosen so that, if the null hypothesis is true, the probability is
α that the test statistic will fall in the rejection region.
Reminder: α -- the significance level (= the probability of making a Type I error).
Example P.401 Ex9.6. Determine the critical value(s) for a hypothesis test at the 5%
significance level (α = 0.05) if the test is
a. two-tailed,
b. left-tailed,
c. right-tailed.
Ans:
(Figure: three standard normal curves showing the "Reject H0" and "Do not reject H0"
regions and the critical values for the two-tailed (left), left-tailed (middle), and
right-tailed (right) tests.)
(a) The left diagram: for α = 0.05, z_{α/2} = z_{0.025} = 1.96, and the critical values are ±1.96.
(b) The middle diagram: for α = 0.05, z_α = z_{0.05} = 1.645, and the critical value is
−1.645.
(c) The right diagram: for α = 0.05, z_α = z_{0.05} = 1.645, and the critical value is 1.645.
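The most common significance levels are 0.10, 0.05 and 0.01. They give five distinct tail
areas, 0.10, 0.05, 0.025, 0.01 and 0.005, rather than six, because 0.05 occurs twice: as α in
a one-tailed test at the 5% level and as α/2 in a two-tailed test at the 10% level.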
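Critical values such as these can also be computed instead of read from Table II. The
following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the
helper name `z_critical` and its `tail` argument are illustrative choices, not notation from
the notes.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical value(s) of a z-test; tail is 'two', 'left' or 'right'."""
    std_normal = NormalDist()
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        return (-z, z)
    z = std_normal.inv_cdf(1 - alpha)           # z_alpha
    return (-z,) if tail == "left" else (z,)

for tail in ("two", "left", "right"):
    print(tail, [round(c, 3) for c in z_critical(0.05, tail)])
```

The computed values agree with the table values 1.96 and 1.645 after rounding.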
The One-Sample z-Test for a Population Mean (Critical-Value Approach):
Assumptions
1. Normal population or large sample.
2. σ is known.
Step 0. Define what μ is.
Step 1. The null hypothesis is H0: μ = μ0, and the alternative hypothesis is
Ha: μ ≠ μ0 (Two-tailed)  or  Ha: μ < μ0 (Left-tailed)  or  Ha: μ > μ0 (Right-tailed)
Step 2. Decide on the significance level α.
Step 3. Compute the value of the test statistic
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. $$
Step 4. The critical value(s) are
±z_{α/2} (Two-tailed)  or  −z_α (Left-tailed)  or  z_α (Right-tailed)
Use Table II to find the critical value(s).
Step 5. If the value of the test statistic falls in the rejection region, reject H0; otherwise, do
not reject H0.
Step 6. Interpret the results of the hypothesis test.
The hypothesis test is exact for normal populations and is approximately correct for large
samples from nonnormal populations.
Note: By saying that the hypothesis test is exact, we mean that the true significance level
equals α; by saying that it is approximately correct, we mean that the true significance
level only approximately equals α.
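Steps 3 to 5 of the procedure can be sketched as a small Python function. This is only an
illustration: the function name `one_mean_z_test`, its arguments, and the numbers in the
usage lines (μ0 = 100, σ = 15, n = 36, x̄ = 106) are hypothetical, and the critical value is
computed with `statistics.NormalDist` rather than looked up in Table II.

```python
import math
from statistics import NormalDist

def one_mean_z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Steps 3-5 of the critical-value approach; tail is 'two', 'left' or 'right'."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))        # Step 3: test statistic
    if tail == "two":                                 # Step 4: critical value(s)
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) >= crit
    elif tail == "left":
        crit = -NormalDist().inv_cdf(1 - alpha)
        reject = z <= crit
    else:
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z >= crit
    return z, crit, reject                            # Step 5: decision

# Hypothetical numbers: H0: mu = 100 vs Ha: mu > 100, sigma = 15, n = 36, x_bar = 106.
z, crit, reject = one_mean_z_test(106, 100, 15, 36, alpha=0.05, tail="right")
print(f"z = {z:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```

The comparisons use `>=` and `<=` because a critical value is considered part of the
rejection region.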
When to use the one-mean z-test procedure:
1. For small samples (say, of size less than 15), the z-test should be used only when the
variable under consideration is normally distributed or very close to being so.
2. For samples of moderate size (say, between 15 and 30), the z-test can be used unless
the data contain outliers or the variable under consideration is far from being normally
distributed.
3. For large samples (say, of size 30 or more), the z-test can be used
essentially without restriction. However, if outliers are present and their removal is not
justified, the effect of the outliers on the hypothesis test should be examined; that is, you
should perform the hypothesis test with and without the outliers. If the conclusion
remains the same either way, you may be content to take that as your conclusion and
close the investigation. But if the conclusion is affected, you probably should make the
more conservative conclusion, use a different procedure, or take another sample.
4. If outliers are present but their removal is justified and results in a data set for which the
z-test is appropriate (as previously stated), the procedure can be used.
Example P.405 Ex9.7, Prices of History Books. The R.R. Bowker Company of New York
collects information on the retail prices of books and publishes the data in Publishers
Weekly. In 1997, the mean retail price of history books was $43.50. This year's retail
prices for 40 randomly selected history books are shown in the following table. At the
1% significance level, do the data provide sufficient evidence to conclude that this year's
mean retail price of all history books has increased from the 1997 mean of $43.50?
Assume that the standard deviation of prices for this year's history books is $7.61.
48.04   38.29   39.38   46.03
45.75   39.92   46.86   47.77
43.04   53.74   39.07   54.72
39.84   42.93   44.40   42.99
43.32   42.98   52.74   64.42
39.74   48.20   44.37   43.74
45.84   42.94   55.78   44.46
33.12   56.97   49.48   46.13
67.41   48.52   61.08   34.38
45.80   64.21   53.30   34.69

Mean (x̄) = $46.91        s = $8.11

Ans: We constructed a normal probability plot, a histogram, a stem-and-leaf diagram, and a
boxplot for these data. There are no outliers. As the sample size is 40, which is large, and
the population standard deviation is known, we can apply the z-test procedure to
perform the required hypothesis test.
Step 1. State the null and alternative hypotheses.
Let μ denote this year's mean retail price of all history books.
H0: μ = $43.50 (mean price has not increased)
Ha: μ > $43.50 (mean price has increased).
The hypothesis test is right-tailed because a greater-than sign (>) appears in the
alternative hypothesis.
Step 2. Decide on the significance level α.
As given in the question, the significance level is 1%, i.e. α = 0.01.
Step 3. Compute the value of the test statistic
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. $$
The known data are: μ0 = 43.50, x̄ = 46.91, σ = 7.61 and n = 40.
$$ z = \frac{46.91 - 43.50}{7.61/\sqrt{40}} = 2.83. $$
Step 4. The critical value for a right-tailed test is z_α.
As α = 0.01, the critical value is z_{0.01}. From Table II, z_{0.01} = 2.33.
Step 5. If the value of the test statistic falls in the rejection region, reject H0; otherwise, do
not reject H0.
The value of the test statistic, 2.83 (> 2.33), falls in the rejection region. We reject H0.
The test results are statistically significant at the 1% level.
Step 6. Interpret the results of the hypothesis test.
At the 1% significance level, the data provide sufficient evidence to conclude that this
year's mean retail price of all history books has increased from the 1997 mean of
$43.50.
Example P.406 Ex9.8, Poverty and Calcium. The background was given in the Poverty and
Calcium example of Section 9.1. A random sample of 18 people with incomes below the
poverty level gives the daily calcium intakes (in mg) shown in the following table.
686   433    743   647   734   641
993   620    574   634   850   858
992   775   1113   672   879   609

Mean (x̄) = 747.4        s = 178.8

At the 5% significance level, do the data provide sufficient evidence to conclude that the
mean calcium intake of all people with incomes below the poverty level is less than the
RDA of 800 mg? Assume that σ = 188 mg.
Ans: A normal probability plot reveals no outliers and a roughly normal distribution.
Although the sample size is only n = 18, the z-test procedure therefore applies.
Step 1. State the null and alternative hypotheses.
Let μ denote the mean calcium intake (per day) of all people with incomes below the
poverty level. The null and alternative hypotheses are
H0: μ = 800 mg (mean calcium intake is not less than the RDA)
Ha: μ < 800 mg (mean calcium intake is less than the RDA)
Step 2. Decide on the significance level α.
The significance level is 5%, i.e. α = 0.05.
Step 3. Compute the value of the test statistic
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. $$
The known data are: μ0 = 800 mg, x̄ = 747.4 mg, σ = 188 mg and n = 18.
$$ z = \frac{747.4 - 800}{188/\sqrt{18}} = -1.19. $$
Step 4. The critical value for a left-tailed test is −z_α.
As α = 0.05, the critical value is −z_{0.05} = −1.645.
Step 5. If the value of the test statistic falls in the rejection region, reject H0; otherwise, do
not reject H0.
Since z = −1.19 > −z_{0.05} = −1.645, the test statistic lies to the right of the critical value
for a left-tailed test; that is, it does not fall in the rejection region, so we do not reject
H0. The test results are not statistically significant at the 5% level.
Step 6. Interpret the results of the hypothesis test.
At the 5% significance level, the data do not provide sufficient evidence to conclude that
the mean calcium intake of all people with incomes below the poverty level is less than
the RDA of 800 mg.
Example P.407 Ex9.9, Clocking the Cheetah. The cheetah (Acinonyx jubatus) is
the fastest land mammal on earth and is highly specialized to run down prey. According
to the Cheetah Conservation of Southern Africa Trade Environment Database, the
cheetah often exceeds speeds of 60 miles per hour (mph) and has been clocked at speeds
of more than 70 mph.
One common estimate of mean top speed for cheetahs is 60 mph. The following table
gives the top speeds, in mph, over a quarter mile for a sample of 35 cheetahs. At the 5%
significance level, do the data provide sufficient evidence to conclude that the mean top
speed of all cheetahs differs from 60 mph? Assume that the population standard
deviation of top speeds is 3.2 mph.
57.3   57.5   59.0   56.5   61.3   57.6   59.2
65.0   60.1   59.7   62.6   52.6   60.7   62.3
65.2   54.8   55.4   55.5   57.8   58.7   57.8
60.9   75.3   60.6   58.1   55.9   61.6   59.6
59.8   63.4   54.7   60.2   52.4   58.3   66.0

Mean (x̄) = 59.5        s = 4.3

Ans: A frequency histogram for the data suggests that the top speed of 75.3 mph is an outlier.
Thus, we first apply the z-test procedure to the full data set and then repeat it on the
data set without the outlier.
Step 1. State the null and alternative hypotheses.
Let μ denote the mean top speed of all cheetahs.
The null and alternative hypotheses are
H0: μ = 60 mph (mean top speed of cheetahs is 60 mph)
Ha: μ ≠ 60 mph (mean top speed of cheetahs is not 60 mph)
Step 2. Decide on the significance level α.
The significance level is 5%, i.e. α = 0.05.
Step 3. Compute the value of the test statistic
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. $$
The known data are: μ0 = 60 mph, x̄ = 59.526 mph (59.5 when rounded), σ = 3.2 mph and n = 35.
$$ z = \frac{59.526 - 60}{3.2/\sqrt{35}} = -0.88. $$
Step 4. The critical values for a two-tailed test are ±z_{α/2}.
As α = 0.05, the critical values are ±z_{0.025} = ±1.96.
Step 5. If the value of the test statistic falls in the rejection region, reject H0; otherwise, do
not reject H0.
Since |z| = 0.88 < z_{0.025} = 1.96, the test statistic lies between the two critical values;
that is, it does not fall in the rejection region, so we do not reject H0. The test
results are not statistically significant at the 5% level.
Step 6. Interpret the results of the hypothesis test.
At the 5% significance level, the data do not provide sufficient evidence to conclude that
the mean top speed of all cheetahs differs from 60 mph.
Find the effect of the outlier, 75.3 mph:
After removing the outlier (new x̄ = 59.06, n = 34), we find that the value of the test
statistic is z = −1.71, which still lies in the nonrejection region, although it is much closer
to the critical value. In this case, removing the outlier does not affect the conclusion of
the hypothesis test. We can probably accept that the mean top speed of all cheetahs is
roughly 60 mph.
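The with-and-without-outlier comparison is easy to reproduce. The sketch below recomputes
both test statistics from the 35 speeds in the table; the helper name `z_statistic` is
illustrative, and the critical value 1.96 is taken from Table II as in Step 4.

```python
import math

# Top speeds (mph) of the 35 sampled cheetahs, from the table above.
speeds = [57.3, 57.5, 59.0, 56.5, 61.3, 57.6, 59.2,
          65.0, 60.1, 59.7, 62.6, 52.6, 60.7, 62.3,
          65.2, 54.8, 55.4, 55.5, 57.8, 58.7, 57.8,
          60.9, 75.3, 60.6, 58.1, 55.9, 61.6, 59.6,
          59.8, 63.4, 54.7, 60.2, 52.4, 58.3, 66.0]

def z_statistic(data, mu0, sigma):
    """One-sample z statistic computed from raw data."""
    x_bar = sum(data) / len(data)
    return (x_bar - mu0) / (sigma / math.sqrt(len(data)))

z_crit = 1.96                                   # z_{0.025} from Table II
z_full = z_statistic(speeds, 60, 3.2)
trimmed = [s for s in speeds if s != 75.3]      # drop the suspected outlier
z_trim = z_statistic(trimmed, 60, 3.2)
for label, z in (("with outlier", z_full), ("without outlier", z_trim)):
    print(f"{label}: z = {z:.2f}, reject H0: {abs(z) >= z_crit}")
```

Both statistics stay inside the nonrejection region, so the decision is the same either way.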
Statistical significance versus practical significance:
Statistical significance means that the data provide sufficient evidence to conclude that
the truth is different from the stated H0. However, it does not necessarily mean that the
difference is important in any practical sense.
The formula for the test statistic can be written as
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{x} - \mu_0}{\sigma}\,\sqrt{n}, $$
in which |x̄ − μ0| may be small; but if n is large, the value of z may still be large enough
to fall into the rejection region. In such a case, the difference |x̄ − μ0| is statistically
significant although it may not be practically significant.
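A small numerical illustration of this point, using made-up numbers (a difference of 0.5
with σ = 10, neither taken from the notes): the same small difference gives a larger z as
the sample size grows.

```python
import math

def z_for(diff, sigma, n):
    """z statistic when the sample mean differs from mu0 by diff."""
    return diff / sigma * math.sqrt(n)

diff, sigma = 0.5, 10.0          # hypothetical: a small difference relative to sigma
sizes = [25, 400, 10_000]
z_values = [z_for(diff, sigma, n) for n in sizes]
for n, z in zip(sizes, z_values):
    print(f"n = {n:>6}: z = {z:.2f}, significant at 5% (two-tailed): {abs(z) >= 1.96}")
```

Only the largest sample makes the difference statistically significant, yet the difference
itself is the same in all three cases.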

9.5 P-values
Critical-value approach -- use the critical value in a hypothesis test; this is the
approach we used above.
P-value approach -- treat the observed value of the test statistic (the sample value) as the
critical value and find the corresponding tail probability.
P-value (observed significance level or probability value):
1. The percentage of samples that would yield a value of the test statistic as extreme
as or more extreme than that observed, if the null hypothesis is true.
2. The probability of observing a value of the test statistic as extreme as or more extreme
than that observed. To obtain the P-value of a hypothesis test, we assume that the null
hypothesis is true. By "extreme" we mean far from what we would expect to observe if
the null hypothesis is true. We use the letter P to denote the P-value.
Small P-values provide evidence against the null hypothesis; large P-values do not. The
smaller (closer to 0) the P-value, the stronger the evidence is against the null hypothesis.
Obtaining P-values for a one-sample z-test -- the P-value depends on whether the test is
two-tailed or one-tailed (left-tailed or right-tailed).
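How the tail of the test determines the P-value can be sketched as follows. The function name
`z_p_value` and the test statistics ±2.0 are illustrative assumptions; the areas come from
`statistics.NormalDist` instead of Table II.

```python
from statistics import NormalDist

def z_p_value(z0, tail):
    """P-value of a one-sample z-test; tail is 'left', 'right' or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z0)                  # area to the left of z0
    if tail == "right":
        return 1 - cdf(z0)              # area to the right of z0
    return 2 * (1 - cdf(abs(z0)))       # both tails beyond |z0|

print(round(z_p_value(-2.0, "left"), 4))
print(round(z_p_value(2.0, "right"), 4))
print(round(z_p_value(2.0, "two"), 4))
```

For the same magnitude of the test statistic, the two-tailed P-value is twice the one-tailed
one.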
Example P.422 Ex9.12, Prices of History Books. Consider the history book hypothesis test,
where we wanted to decide whether this year's mean retail price of all history books has
increased from the 1997 mean of $43.50. Recall that the null and alternative hypotheses
are (with μ denoting this year's mean retail price of all history books)
H0: μ = $43.50 (mean price has not increased)
Ha: μ > $43.50 (mean price has increased).
The hypothesis test is right-tailed because a greater-than sign (>) appears in the
alternative hypothesis. Use the data from the previous example, x̄ = $46.91 and σ = $7.61,
to obtain and interpret the P-value of the hypothesis test.
Ans: The test statistic is
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{46.91 - 43.50}{7.61/\sqrt{40}} = 2.83. $$
For a right-tailed test, the P-value is the probability of observing z ≥ 2.83, found by
solving z_P = 2.83 for P.
From Table II, the area to the left of 2.83 is 0.9977, so P = 1 − 0.9977 = 0.0023.
Interpretation -- If the null hypothesis is true, we would observe a value of the test
statistic z of 2.83 or greater only about 23 times in 10,000. In other words, if the null
hypothesis is true, a random sample of 40 history books would have a mean of $46.91
or greater only about 0.23% of the time.
Example P.423 Ex9.13, Clocking the Cheetah. In the previous example we
conducted a hypothesis test to decide whether the mean top speed of all cheetahs differs
from 60 mph. Recall that the null and alternative hypotheses are
H0: μ = 60 mph (mean top speed of cheetahs is 60 mph)
Ha: μ ≠ 60 mph (mean top speed of cheetahs is not 60 mph)
where μ denotes the mean top speed of all cheetahs. The hypothesis test is two-tailed
because the not-equal sign (≠) appears in the alternative hypothesis.
From the data above, n = 35, x̄ = 59.526, σ = 3.2, and the datum 75.3 mph is an outlier.
a. Obtain and interpret the P-value of the hypothesis test, using the unabridged data (i.e.
including the outlier).
b. Obtain and interpret the P-value of the hypothesis test, using the abridged data (i.e.
with the outlier removed).
c. Comment on the effect that removing the outlier has on the evidence against the null
hypothesis.
Ans: (a) The test statistic for the unabridged data is
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{59.526 - 60}{3.2/\sqrt{35}} = -0.88. $$
Since it is a two-tailed test, the P-value is found by solving z_{P/2} = |z| = 0.88 for P:
P/2 = 0.1894, P = 0.3789.
The P-value obtained is 0.3789.
Interpretation -- If the null hypothesis is true, we would observe a value of the test
statistic z of magnitude 0.88 or greater more than 37 times in 100. In other words, if
the null hypothesis is true, a random sample of 35 cheetahs would have a mean top
speed at least as far from 60 mph as that of our sample (60 − 59.526 = 0.474 mph) more
than 37% of the time.
(b) Without the outlier, the test statistic becomes −1.71, and the resulting P-value is
0.0872 (two-tailed). The interpretation is similar to (a).
(c) Parts (a) and (b) show how the strength of the evidence against the null
hypothesis depends on the outlier. If the outlier is retained, there is virtually no
evidence against the null hypothesis; if the outlier is removed, there is moderate
evidence against the null hypothesis.
P-value approach to hypothesis testing:
P-value as the Observed Significance Level:
The P-value of a hypothesis test equals the smallest significance level at which the null
hypothesis can be rejected, that is, the smallest significance level for which the observed
sample data result in rejection of H0.
Decision Criterion for a Hypothesis Test Using the P-value:
If the P-value is less than or equal to the specified significance level, reject the null
hypothesis; otherwise, do not reject the null hypothesis.
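The equivalence between the two decision rules can be checked numerically. In this sketch
the right-tailed test statistic z0 = 2.0 and the list of significance levels are illustrative
assumptions; for each level it compares the critical-value decision with the rule "reject
when P ≤ α".

```python
from statistics import NormalDist

std_normal = NormalDist()
z0 = 2.0                                   # illustrative right-tailed test statistic
p_value = 1 - std_normal.cdf(z0)           # about 0.0228

def rejects_by_critical_value(alpha):
    """Right-tailed critical-value decision at level alpha."""
    return z0 >= std_normal.inv_cdf(1 - alpha)

levels = [0.10, 0.05, 0.025, 0.02, 0.01]
decisions = {a: rejects_by_critical_value(a) for a in levels}
for a in levels:
    print(f"alpha = {a}: reject by critical value = {decisions[a]}, "
          f"P <= alpha = {p_value <= a}")
```

H0 is rejected at every listed level at or above the P-value and at none below it, which is
what "smallest significance level at which the null hypothesis can be rejected" means.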
The One-Sample z-Test for a Population Mean (P-value Approach):
Assumptions
1. Normal population or large sample
2. σ is known
Step 1. The null hypothesis is H0: μ = μ0, and the alternative hypothesis is
Ha: μ ≠ μ0 (Two-tailed)  or  Ha: μ < μ0 (Left-tailed)  or  Ha: μ > μ0 (Right-tailed)
Step 2. Decide on the significance level α.
Step 3. Compute the value of the test statistic
$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = z_0 $$
(Denote that value as z0.)
Step 4. Use Table II to obtain the P-value by solving for P:
−z_P = z0 (Left-tailed)  or  z_P = z0 (Right-tailed)  or  ±z_{P/2} = z0 (Two-tailed)
(Notice that the P subscript uses the z_α notation.)
Step 5. If P ≤ α, reject H0; otherwise, do not reject H0.
Step 6. Interpret the results of the hypothesis test.
The hypothesis test is exact for normal populations and is approximately correct for large
samples from nonnormal populations.


Example P.426 Ex9.14, Poverty and Calcium. A random sample of 18 people with
incomes below the poverty level gives the daily calcium intakes (in mg) shown in the
following table. At the 5% significance level, do the data provide sufficient evidence to
conclude that the mean calcium intake of all people with incomes below the poverty level
is less than the RDA of 800 mg? Assume that σ = 188 mg.
686   433    743   647   734   641
993   620    574   634   850   858
992   775   1113   672   879   609

Mean (x̄) = 747.4        s = 178.8

Ans: A normal probability plot of the above data reveals no outliers and roughly a normal
distribution. We can apply the z-test procedure (P-value approach).
Step 1. State the null and alternative hypotheses.
Let μ denote the mean calcium intake (per day) of all people with incomes below the
poverty level. The null and alternative hypotheses are
H0: μ = 800 mg (mean calcium intake is not less than the RDA)
Ha: μ < 800 mg (mean calcium intake is less than the RDA).
The hypothesis test is left-tailed because a less-than sign (<) appears in the alternative
hypothesis.
Step 2. Decide on the significance level, α.
The significance level is given as 5%, i.e. α = 0.05.
Step 3. Compute the value of the test statistic
$$ z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{747.4 - 800}{188/\sqrt{18}} = -1.19. $$
Step 4. Use Table II to obtain the P-value.
The P-value is found by solving −z_P = z0 for P, i.e. −z_P = −1.19, which gives P = 0.1170.
Step 5. If P ≤ α, reject H0; otherwise, do not reject H0.
Since P = 0.1170 > α = 0.05, we do not reject H0. The test results are not statistically
significant at the 5% level.
Step 6. Interpret the results of the hypothesis test.
At the 5% significance level, the data do not provide sufficient evidence to conclude that
the mean calcium intake of all people with incomes below the poverty level is less than
the RDA of 800 mg.
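The whole P-value approach for this example can be sketched end to end from the raw data.
Variable names are illustrative. Because the code does not round z0 to two decimals before
finding the area, its P-value (about 0.118) differs slightly from the table-based 0.1170.

```python
import math
from statistics import NormalDist, mean

# Daily calcium intakes (mg) of the 18 sampled people, from the table above.
intakes = [686, 433, 743, 647, 734, 641,
           993, 620, 574, 634, 850, 858,
           992, 775, 1113, 672, 879, 609]

mu0, sigma, alpha = 800, 188, 0.05
x_bar = mean(intakes)
z0 = (x_bar - mu0) / (sigma / math.sqrt(len(intakes)))
p_value = NormalDist().cdf(z0)               # left-tailed: area to the left of z0
reject_h0 = p_value <= alpha
print(f"x_bar = {x_bar:.1f}, z0 = {z0:.2f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```

The decision matches the worked answer: P exceeds 0.05, so H0 is not rejected.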
Comparison of the Critical-value and P-value approaches:
Step   Critical-value approach                         P-value approach
1      State the null & alternative hypotheses.        State the null & alternative hypotheses.
2      Decide on the significance level, α.            Decide on the significance level, α.
3      Compute the test statistic, z.                  Compute the test statistic, z.
4      Determine the critical value(s).                Determine the P-value.
5      If the test statistic falls in the rejection    If P ≤ α, reject H0; otherwise, do not.
       region, reject H0; otherwise, do not.
6      Interpret the result of the test.               Interpret the result of the test.

Using the P-value to assess the evidence against the null hypothesis:
P-value              Evidence against H0
P > 0.10             Weak or none
0.05 < P ≤ 0.10      Moderate
0.01 < P ≤ 0.05      Strong
P ≤ 0.01             Very strong
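The guideline table translates directly into a small lookup. The function name
`evidence_against_h0` is illustrative; 0.3789 and 0.0872 are the two cheetah P-values from
the earlier example, and the other two inputs are made-up values.

```python
def evidence_against_h0(p):
    """Map a P-value to the verbal scale in the table above."""
    if p > 0.10:
        return "Weak or none"
    if p > 0.05:
        return "Moderate"
    if p > 0.01:
        return "Strong"
    return "Very strong"

for p in (0.3789, 0.0872, 0.03, 0.004):
    print(p, "->", evidence_against_h0(p))
```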

Hypothesis Test without Significance Levels:


Many researchers do not explicitly refer to significance levels or critical values. Instead, they
simply obtain the P-value of the hypothesis test and use it (or let the reader use it) to assess
the strength of the evidence against the null hypothesis.

9.6 Hypothesis tests for one population mean when σ is unknown


(Recall from the earlier discussion of confidence intervals:
when σ is known we use the standard normal distribution, i.e. Table II;
when σ is unknown we use the t-distribution, i.e. Table IV.)
P-value for a t-test -- same as the P-value for a z-test, but use Table IV instead.
Estimating the P-value of a t-test -- use Table IV or computer software, e.g. Excel.
Example -- P.435 Ex9.16. Use Table IV to estimate the P-value of each t-test.
a. Left-tailed test, n = 12, and t = −1.938
b. Two-tailed test, n = 25, and |t| = 0.895
Ans: We adapt Step 4 of the z-test procedure (P-value approach).
Step 4. Use Table IV to obtain the P-value.
or
or
t P = t0
t P = t0
t P / 2 = t0
(Left-tailed)
(Right-tailed)
(Two-tailed)

18

(a) Apply: t P = t0, to become t P = 1.938.


Difficulties: for df = n 1 = 11, t0.05= 1.796 and t0.025 = 2.201. There is no 1.938.
Conclusion: 0.025 < P < 0.05. We can reject H0 for |test statistic| < 0.05, and can not
reject H0 for |test statistic| < 0.025. For test statistics lie between (0.025, 0.05), look
for the exact P-value from other sources, e.g. for tP = 1.938, P = 0.039354 by using
Excel.
(b) Apply: t P / 2 = t0, to become t P / 2 = 0.895.
Difficulties: for df = n 1 = 24, t0.1= 1.318. There is no smaller value.
Conclusion: P > 2(0.1) = 0.2. We can not reject H0 for |test statistic| < 0.2. For test
statistic with value larger than 0.2 we have to look for the P-value from other
sources, e.g. tP/2 = 0.895, P = 2(0.18984) = 0.3797 by using Excel.
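When no spreadsheet is at hand, the exact P-values in Example 9.16 can be approximated numerically. The sketch below uses only the Python standard library and integrates the t density with Simpson's rule. The helper names `t_pdf`, `t_cdf` and `t_p_value` are illustrative, and part (a) takes the left-tailed statistic as t0 = −1.938 (an assumption consistent with the Excel value quoted above).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_p_value(t0: float, df: int, tail: str) -> float:
    """P-value of a one-mean t-test for the given tail ('left', 'right', 'two')."""
    if tail == "left":
        return t_cdf(t0, df)
    if tail == "right":
        return 1.0 - t_cdf(t0, df)
    if tail == "two":
        return 2.0 * (1.0 - t_cdf(abs(t0), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")


p_a = t_p_value(-1.938, df=11, tail="left")  # part (a), n = 12
p_b = t_p_value(0.895, df=24, tail="two")    # part (b), n = 25
print(f"(a) P = {p_a:.4f}")
print(f"(b) P = {p_b:.4f}")
```

Both results should fall inside the bounds read from Table IV (0.025 < P < 0.05 and P > 0.20) and agree closely with the Excel figures.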
The One-mean t-Test for a Population Mean (Critical-Value Approach)
Assumptions
1. Normal population or large sample
2. σ is unknown
Step 1 The null and alternative hypotheses are
H0: μ = μ0, and
Ha: μ ≠ μ0 (Two-tailed)  or  Ha: μ < μ0 (Left-tailed)  or  Ha: μ > μ0 (Right-tailed)
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Step 4 The critical value(s) are
±t_{α/2} (Two-tailed)  or  −t_α (Left-tailed)  or  t_α (Right-tailed)
with df = n − 1. Use Table IV to find the critical value(s).
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not
reject H0.
Step 6 Interpret the results of the hypothesis test.
The hypothesis test is exact for normal populations and is approximately correct for large
samples from nonnormal populations.
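The critical-value steps can also be carried out without Table IV. The following is a sketch under stated assumptions: `t_upper_critical` finds t_α by bisection on a numerically integrated t CDF, assumes 0 < α < 0.5 and a critical value below 50, and all function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_upper_critical(area: float, df: int) -> float:
    """t-value with the given area to its right (assumes 0 < area < 0.5)."""
    lo, hi = 0.0, 50.0  # assumed search bracket
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > area:
            lo = mid  # right-tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def reject_h0(t0: float, alpha: float, df: int, tail: str) -> bool:
    """Step 5: does the test statistic fall in the rejection region?"""
    if tail == "two":
        return abs(t0) >= t_upper_critical(alpha / 2, df)
    if tail == "left":
        return t0 <= -t_upper_critical(alpha, df)
    if tail == "right":
        return t0 >= t_upper_critical(alpha, df)
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(round(t_upper_critical(0.05, 11), 3))   # compare with Table IV, df = 11
print(round(t_upper_critical(0.025, 11), 3))
print(reject_h0(-1.938, 0.05, 11, "left"))
```

The printed critical values can be checked against the Table IV entries quoted earlier for df = 11 (1.796 and 2.201).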
The One-mean t-Test for a Population Mean (P-Value Approach)
Assumptions
1. Normal population or large sample
2. σ is unknown
Step 1 The null and alternative hypotheses are
H0: μ = μ0, and
Ha: μ ≠ μ0 (Two-tailed)  or  Ha: μ < μ0 (Left-tailed)  or  Ha: μ > μ0 (Right-tailed)
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
and denote that value by t0.
Step 4 The P-value is determined by
t_{P/2} = |t0| (Two-tailed)  or  −t_P = t0 (Left-tailed)  or  t_P = t0 (Right-tailed)
where P denotes the P-value and t_P is the t-value with area P to its right. With
df = n − 1, use Table IV to estimate the P-value.
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
Step 6 Interpret the results of the hypothesis test.
The hypothesis test is exact for normal populations and is approximately correct for large
samples from nonnormal populations.
Example P. 438 Ex9.17, Acid Rain and Lake Acidity, acid rain from the burning of fossil
fuels has caused many of the lakes around the world to become acidic. The biology in
these lakes often collapses because of the rapid and unfavourable changes in water
chemistry. A lake is classified as nonacidic if it has a pH greater than 6.
Aldo Marchetto and Andrea Lami measured the pH of high mountain lakes in the
Southern Alps and reported their findings in the paper Reconstruction of pH by
Chrysophycean Scales in Some Lakes of the Southern Alps (Hydrobiologia, 1994, Vol.
274, pp.83-90). The following table shows the pH levels obtained by the researchers for
15 lakes. At the 5% significance level, do the data provide sufficient evidence to
conclude that, on average, high mountain lakes in the Southern Alps are nonacidic?
7.2   7.3   6.1   6.9   6.6
7.3   6.3   5.5   6.3   6.5
5.7   6.9   6.7   7.9   5.8

Mean = 6.6, s = 0.672

Ans: A normal probability plot of the data reveals no outliers and is quite linear.
Step 1 State the null and alternative hypotheses.
Let μ denote the mean pH level of all high mountain lakes in the Southern Alps.
H0: μ = 6 (mean pH level is not greater than 6)
Ha: μ > 6 (mean pH level is greater than 6)
Since the (>) sign appears in Ha, it is a right-tailed test.
Step 2 Decide on the significance level, α.
The significance level is 5%, i.e. α = 0.05.
Step 3 Compute the value of the test statistic.
The data give x̄ = 6.6, μ0 = 6, s = 0.672 and n = 15. The test statistic is
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{6.6 - 6}{0.672/\sqrt{15}} = 3.458.$
Critical-Value Approach
Step 4 The critical value for a right-tailed test is t_α, with df = n − 1.
From df = 14 and α = 0.05, Table IV gives t0.05 = 1.761.
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do
not reject H0.
Since t = 3.458 exceeds t0.05 = 1.761, the test statistic falls in the rejection region and we
reject H0. The test results are statistically significant at the 5% level.

P-value Approach
Step 4 The t-statistic has df = n − 1 = 14. Use Table IV to estimate the P-value, or obtain it
exactly by using technology.
Here t_P = t0 = 3.458. The largest value tabulated for df = 14 is t0.005 = 2.977, which is
still smaller than 3.458, so P < 0.005. By using Excel, P = 0.00192.
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
Since P < 0.005, which is less than one tenth of the significance level 0.05, we reject H0.
The test results are statistically significant at the 5% level, and provide very strong
evidence against the null hypothesis.

Step 6 Interpret the results of the hypothesis test.


At the 5% significance level, the data provide sufficient evidence to conclude that, on
average, high mountain lakes in the Southern Alps are nonacidic.
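The arithmetic of this example can be reproduced from the raw pH readings. The sketch below is an illustration only: it uses the standard library, a numerically integrated t CDF (`t_pdf` and `t_cdf` are hypothetical helper names), and reads the 15 values from the table in the example.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


ph = [7.2, 7.3, 6.1, 6.9, 6.6,
      7.3, 6.3, 5.5, 6.3, 6.5,
      5.7, 6.9, 6.7, 7.9, 5.8]
mu0, alpha = 6.0, 0.05

n = len(ph)
x_bar, s = mean(ph), stdev(ph)             # sample mean and sample std. dev.
t0 = (x_bar - mu0) / (s / math.sqrt(n))    # Step 3: test statistic
p_value = 1.0 - t_cdf(t0, n - 1)           # Step 4: right-tailed P-value
decision = "reject H0" if p_value <= alpha else "do not reject H0"  # Step 5

print(f"x-bar = {x_bar:.2f}, s = {s:.3f}, t = {t0:.3f}, P = {p_value:.5f}")
print(decision)
```

Because s is not rounded to 0.672 before dividing, the computed t may differ from 3.458 in the last digit.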

***** End of Chapter IX *****


Maple Programs:
pc := COLOR(RGB, .95,.6,.45): bc := COLOR(RGB, .45,.6,.95): #define colours pc and bc.
z2plot := proc(z1,z2)
local f,A,B,M,prob,T1,T2,l_tail,r_tail,T3;
f := x -> exp(-(x^2)/2)/sqrt(2*Pi);
A := plot( f(t), t = -4..z1, color = pc, filled = true);
M := plot( f(t), t = z1..z2, color = bc, filled = true);
B := plot( f(t), t = z2..4, color = pc, filled = true);
prob := evalf(int(f(t),t=z1..z2),4);
T1 := plots[textplot]([(z1+z2)/2, (f(z1)+f(z2))/2, convert(prob, string)],
font=[TIMES,BOLD,18]);
T2 := plots[textplot]({[z1,-0.04,z1],[z2,-0.04,z2]},font=[TIMES,BOLD,12]);
l_tail := evalf(int(f(t),t=-4..z1),4);
r_tail := evalf(int(f(t),t=z2..4),4);
T3 := plots[textplot]({[z1-1, f(z1)/2, l_tail], [z2+1, f(z2)/2, r_tail]}, font =
[TIMES,BOLD,18]);
plots[display]({A,M,B,T1,T2,T3});
end:
pc := COLOR(RGB, .95,.6,.45): bc := COLOR(RGB, .45,.6,.95): #define colours
ziplot := proc(prob)
local f,z,G1,G2,Gt1,Gt2;
if (prob<0 or prob>1) then
ERROR(`Probability must be inside [0,1]`)
else
f := x -> exp(-(x^2)/2)/sqrt(2*Pi);
z := evalf(stats[statevalf, icdf, normald](prob), 4);
G1 := plot(f(t),t=-4..z, color=bc, filled=true);
G2 := plot(f(t), t=z..4,color=pc, filled=true);
Gt1:= plots[textplot]({[z-1,f(z)/2, prob],[z+1,f(z)/2,1-prob]},font=[TIMES,BOLD,18]);
Gt2 := plots[textplot]({[z,-0.04,z]},font=[TIMES,BOLD,18]);
plots[display]({G1,G2,Gt1,Gt2});
fi;
end:
t2plot := proc(t1,t2,df)
local f,A,B,M,prob,Tp1,Tp2,l_tail,r_tail,Tp3;
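# note: the curve drawn is the standard normal density; only the labelled areas use the t-distribution
f := x -> exp(-1/2*x^2)/sqrt(2*Pi);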
A := plot( f(x), x = -4..t1, color = pc, filled = true);
M := plot( f(x), x = t1..t2, color = bc, filled = true);
B := plot( f(x), x = t2..4, color = pc, filled = true);
prob := stats[statevalf,cdf,studentst[df]](t2) - stats[statevalf,cdf,studentst[df]](t1);
prob := evalf(prob,4); # refer to prob by name rather than the ditto operator
Tp1 := plots[textplot]([(t1+t2)/2, (f(t1)+f(t2))/2, convert(prob, string)],
font=[TIMES,BOLD,18]);
Tp2 := plots[textplot]({[t1,-0.04,t1],[t2,-0.04,t2]},font=[TIMES,BOLD,12]);
l_tail := stats[statevalf,cdf,studentst[df]](t1); l_tail:=evalf(l_tail,4);
r_tail := 1-stats[statevalf,cdf,studentst[df]](t2); r_tail:=evalf(r_tail,4);
Tp3 := plots[textplot]({[t1-1, f(t1)/2, l_tail], [t2+1, f(t2)/2, r_tail]}, font =
[TIMES,BOLD,18]);
plots[display]({A,M,B,Tp1,Tp2,Tp3});
end:
tiplot := proc(prob, df)
local f,t,G1,G2,Gt1,Gt2;
if (prob<0 or prob>1) then
ERROR(`Probability must be inside [0,1]`)
else
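# note: the curve drawn is the standard normal density; only the quantile t uses the t-distribution
f := x -> exp(-(x^2)/2)/sqrt(2*Pi);
t := stats[statevalf, icdf, studentst[df]](prob);
t := evalf(t,4); # refer to t by name rather than the ditto operator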
G1 := plot(f(x),x=-4..t, color=bc, filled=true);
G2 := plot(f(x), x=t..4,color=pc, filled=true);
Gt1 := plots[textplot]({[t-1,f(t)/2, prob],[t+1,f(t)/2,1-prob]},font=[TIMES,BOLD,18]);
Gt2 := plots[textplot]({[t,-0.04,t]},font=[TIMES,BOLD,12]);
plots[display]({G1,G2,Gt1,Gt2});
fi;
end:

Review Problems
Understanding the Concepts and Skills
1. Explain the meaning of each term.
a. Null hypothesis
b. Alternative hypothesis
c. Test statistic
d. Rejection region
e. Nonrejection region
f. Critical value(s)
Ans: a. The null hypothesis is a hypothesis to be tested.
b. The alternative hypothesis is a hypothesis to be considered as an alternate to the null
hypothesis.
c. The test statistic is the statistic used as a basis for deciding whether the null
hypothesis should be rejected.
d. The rejection region is the set of values for the test statistic that leads to rejection of
the null hypothesis.
e. The nonrejection region is the set of values for the test statistic that leads to


nonrejection of the null hypothesis.


f. The critical values are the values of the test statistic that separate the rejection and
nonrejection regions. The critical values are considered part of the rejection region.
2. The following statement appeared on a box of Tide laundry detergent: "Individual
packages of Tide may weigh slightly more or less than the marked weight due to normal
variations incurred with high speed packaging machines, but each day's production of Tide
will average slightly above the marked weight."
a. Explain in statistical terms what the statement means.
b. Describe in words a hypothesis test for checking the statement.
c. Suppose that the marked weight is 76 ounces. State in words the null and alternative
hypotheses for the hypothesis test. Then express those hypotheses in statistical
terminology.
Ans: a. The weight of a package of Tide is a variable. A particular package may weigh
slightly more or less than the marked weight. The mean weight of all packages
produced on any specified day (the population mean weight for that day) exceeds the
marked weight.
b. The null hypothesis would be that the population mean weight for a specified day
equals the marked weight; the alternative hypothesis would be that the population
mean weight for the specified day exceeds the marked weight.
c. The null hypothesis would be that the population mean weight for a specified day
equals the marked weight of 76 oz; the alternative hypothesis would be that the
population mean weight for the specified day exceeds the marked weight of 76 oz. In
statistical terminology, the hypothesis test would be H0: μ = 76 oz and Ha: μ > 76 oz,
where μ is the mean weight of all packages produced on the specified day.
3. Regarding a hypothesis test:
a. What is the procedure, generally, for deciding whether the null hypothesis should be
rejected?
b. How can the procedure identified in part (a) be made objective and precise?
Ans: a. Obtain the data from a random sample of the population or from a designed
experiment. If the data are consistent with the null hypothesis, do not reject the null
hypothesis; if the data are inconsistent with the null hypothesis, reject the null
hypothesis and conclude that the alternative hypothesis is true.
b. We establish a precise criterion for deciding whether to reject the null hypothesis
prior to obtaining the data.
4. There are three possible alternative hypotheses in a hypothesis test for a population mean.
Identify them and explain when each is used.
Ans: Two-tailed test, Ha: μ ≠ μ0. Used when the primary concern is deciding whether a
population mean, μ, is different from a specified value μ0.
Left-tailed test, Ha: μ < μ0. Used when the primary concern is deciding whether a
population mean, μ, is less than a specified value μ0.
Right-tailed test, Ha: μ > μ0. Used when the primary concern is deciding whether a
population mean, μ, is greater than a specified value μ0.
5. Two types of incorrect decisions can be made in a hypothesis test: a Type I error and a
Type II error.
a. Explain the meaning of each type of error.
b. Identify the letter used to represent the probability of each type of error.
c. If the null hypothesis is in fact true, only one type of error is possible. Which type is
that? Explain your answer.
d. If you fail to reject the null hypothesis, only one type of error is possible. Which type is
that? Explain your answer.
Ans: a. A Type I error is the incorrect decision of rejecting a true null hypothesis.
A Type II error is the incorrect decision of not rejecting a false null hypothesis.
b. α and β, respectively
c. A Type I error
d. A Type II error
6. Suppose that you want to conduct a right-tailed hypothesis test at the 5% significance
level. How must the critical value be chosen?
Ans: It must be chosen so that, if the null hypothesis is true, the probability equals 0.05 that
the test statistic will fall in the rejection region, in this case, to the right of the critical
value.
7. In each part, we have identified a hypothesis testing procedure for a population mean. State
the assumptions required and the test statistic used in each case.
a. One-mean t-test
b. One-mean z-test
*c. Wilcoxon signed-rank test
Ans: a. Assumptions: simple random sample; normal population or large sample; σ unknown.
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
b. Assumptions: simple random sample; normal population or large sample; σ known.
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
c. Assumptions: simple random sample; symmetric population. Test statistic:
W = sum of the positive ranks.


8. What is meant when we say that a hypothesis test is
a. exact?
b. approximately correct?
Ans: a. The true significance level equals α.
b. The true significance level only approximately equals α.
9. Discuss the difference between statistical significance and practical significance.
Ans: The results of a hypothesis test are statistically significant if the null hypothesis is
rejected at the specified significance level. Statistical significance means that the data
provide sufficient evidence to conclude that the truth is different from the stated null
hypothesis. It does not necessarily mean that the difference is important in any practical
sense.
10. For a fixed sample size, what happens to the probability of a Type II error if the
significance level is decreased from 0.05 to 0.01?
Ans: It increases.
*11. Regarding the power of a hypothesis test:
a. What does it represent?
b. What happens to the power of a hypothesis test if the significance level is kept at 0.01
while the sample size is increased from 50 to 100?
Ans: a. The probability of rejecting a false null hypothesis
b. It increases.
12. Regarding the P-value of a hypothesis test:
a. What is the P-value of a hypothesis test?
b. Answer true or false: A P-value of 0.02 provides more evidence against the null
hypothesis than a P-value of 0.03. Explain your answer.
c. Answer true or false: A P-value of 0.74 provides essentially no evidence against the null
hypothesis. Explain your answer.
d. Explain why the P-value of a hypothesis test is also referred to as the observed
significance level.
Ans: a. The P-value is the probability, calculated under the assumption that the null
hypothesis is true, of observing a value of the test statistic as extreme as or more
extreme than that observed. By extreme we mean "far from what we would expect to
observe if the null hypothesis is true."
b. True
c. True


d. Because it is the smallest significance level for which the observed sample data result
in rejection of the null hypothesis.
13. Discuss the differences between the critical-value and P-value approaches to hypothesis
testing.
14. Identify two advantages of nonparametric methods over parametric methods. When is a
parametric procedure preferred? Explain your answer.
15. Cheese Consumption. The U.S. Department of Agriculture reports in Food
Consumption, Prices, and Expenditures that the average American consumed 30.0 lb of
cheese in 2001. Cheese consumption has increased steadily since 1960 when the average
American ate only 8.3 lb of cheese annually. Suppose that you want to decide whether
last year's mean cheese consumption is greater than the 2001 mean.
a. Identify the null hypothesis.
b. Identify the alternative hypothesis.
c. Classify the hypothesis test as two tailed, left tailed, or right tailed.
Ans: Let μ denote last year's mean cheese consumption by Americans.
a. H0: μ = 30.0 lb
b. Ha: μ > 30.0 lb
c. Right tailed
16. The following graph portrays the decision criterion for a hypothesis test about a
population mean, μ. The null hypothesis for the test is H0: μ = μ0, and the test statistic is
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.

The curve shown in the graph reveals the implications of the decision criterion if in fact
the null hypothesis is true.

[Graph: normal curve with a "Do not reject H0" region on the left and a "Reject H0" region on the right]

Determine the
a. rejection region.

b. nonrejection region.


c. critical value(s).
d. significance level.
e. Draw a graph that depicts the answers you obtained in parts (a)-(d).
f. Classify the hypothesis test as two tailed, left tailed, or right tailed.
Ans: a. z > 1.28
b. z < 1.28
c. z = 1.28
d. α = 0.10
f. Right tailed
17. Cheese Consumption. The null and alternative hypotheses for the hypothesis test in
Problem 15 are:
H0: μ = 30.0 lb (mean has not increased)
Ha: μ > 30.0 lb (mean has increased),
where μ is last year's mean cheese consumption for all Americans. Explain what each of
the following would mean.
a. Type I error
b. Type II error
c. Correct decision
Now suppose that the results of carrying out the hypothesis test lead to non-rejection of
the null hypothesis. Classify that decision by error type or as a correct decision if in fact
last year's mean cheese consumption
d. has not increased from the 2001 mean of 30.0 lb.
e. has increased from the 2001 mean of 30.0 lb.
Ans: a. A Type I error would occur if in fact μ = 30.0 lb, but the results of the sampling lead
to the conclusion that μ > 30.0 lb.
b. A Type II error would occur if in fact μ > 30.0 lb, but the results of the sampling fail
to lead to that conclusion.
c. A correct decision would occur if in fact μ = 30.0 lb and the results of the sampling
do not lead to the rejection of H0, or if in fact μ > 30.0 lb and the results of the
sampling lead to that conclusion.
d. Correct decision
e. Type II error.
*18. Cheese Consumption. Refer to Problem 15. Suppose that you decide to use a z-test
with a significance level of 0.10 and a sample size of 35. Assume that σ = 6.9 lb.
a. Determine the probability of a Type I error.
b. If last year's mean cheese consumption was 30.5 lb, identify the distribution of the
variable x̄, that is, the sampling distribution of the mean for samples of size 35.
c. Use part (b) to determine the probability, β, of a Type II error if in fact last year's
mean cheese consumption was 30.5 lb.
d. Repeat parts (b) and (c) if in fact last year's mean cheese consumption was 31.0 lb,
31.5 lb, 32.0 lb, 32.5 lb, 33.0 lb, 33.5 lb, and 34.0 lb.
e. Use your answers from parts (c) and (d) to construct a table of selected Type II error
probabilities and powers similar to Table 9.8 on page 433.


f. Use your answer from part (e) to construct the power curve.
Using a sample size of 60 instead of 35, repeat
g. part (b).
h. part (c).
i. part (d).
j. part (e).
k. part (f).
l. Compare your power curves for the two sample sizes and explain the principle being
illustrated.
Ans: Note: The answers obtained to many of the parts of this problem may vary depending on
when and how much intermediate rounding is done. We used statistical software to get
the answers to most parts of this problem.
a. 0.10
b. Approximately normal with a mean of 30.5 and a standard deviation of 6.9/√35 ≈ 1.17.
c. 0.8031
d. Approximately normal with the specified mean and a standard deviation of
6.9/√35 ≈ 1.17. The Type II error probabilities, β, are shown in the table in part (e).
g. Approximately normal with a mean of 30.5 and a standard deviation of 6.9/√60 ≈ 0.89.
h. 0.7643
i. Approximately normal with the specified mean and a standard deviation of
6.9/√60 ≈ 0.89. The Type II error probabilities, β, are shown in the table in part (j).
l. For a fixed significance level, increasing the sample size increases the power.
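The Type II error probabilities and powers asked for in parts (c) to (k) can be tabulated with a few lines of code. This is a sketch for the right-tailed z-test of Problem 18 using `statistics.NormalDist`; the function name `type_ii_error` and its default arguments are illustrative choices, not part of the problem statement.

```python
import math
from statistics import NormalDist


def type_ii_error(mu_true: float, mu0: float = 30.0, sigma: float = 6.9,
                  n: int = 35, alpha: float = 0.10) -> float:
    """beta for a right-tailed one-mean z-test when the true mean is mu_true."""
    se = sigma / math.sqrt(n)                          # std. dev. of x-bar
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # H0 is not rejected when x-bar < x_crit; beta is the chance of that
    # event under the sampling distribution centred at the true mean.
    return NormalDist(mu_true, se).cdf(x_crit)


for n in (35, 60):
    print(f"n = {n}")
    for k in range(8):
        mu = 30.5 + 0.5 * k
        beta = type_ii_error(mu, n=n)
        print(f"  mu = {mu:4.1f}   beta = {beta:.4f}   power = {1 - beta:.4f}")
```

For each true mean the power with n = 60 should exceed the power with n = 35, which is the principle stated in part (l).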

19. Cheese Consumption. Refer to Problem 15. The following table provides last year's
cheese consumption, in pounds, for 35 randomly selected Americans.
42   25   29   34   38   36   30
29   28   32   24   43   22   38
32   28   41   20   35   24   29
40   29   22   33   23   27   32
33   33   32   18   40   32   25

a. At the 10% significance level, do the data provide sufficient evidence to conclude that
last year's mean cheese consumption for all Americans has increased over the 2001
mean? Assume that σ = 6.9 lb. For your hypothesis test, use a z-test and the
critical-value approach. (Note: The sum of the data is 1078 lb.)
b. Given the conclusion in part (a), if an error has been made, what type must it be?
Explain your answer.
Ans: a. H0: μ = 30.0 lb, Ha: μ > 30.0 lb; α = 0.10; z = 0.69; critical value = 1.28; do not reject
H0; at the 10% significance level, the data do not provide sufficient evidence to
conclude that last year's mean cheese consumption for all Americans has increased
over the 2001 mean of 30.0 lb.
b. A Type II error because, given that the null hypothesis was not rejected, the only error
that could be made is the error of not rejecting a false null hypothesis.
20. Cheese Consumption. Refer to Problem 19.
a. Repeat the hypothesis test, using the P-value approach to hypothesis testing.
b. Use Table 9.12 on page 444 to assess the strength of the evidence against the null
hypothesis.
Ans: a. H0: μ = 30.0 lb, Ha: μ > 30.0 lb; α = 0.10; z = 0.69; P = 0.2451; do not reject H0; at
the 10% significance level, the data do not provide sufficient evidence to conclude
that last year's mean cheese consumption for all Americans has increased over the
2001 mean of 30.0 lb.
b. The data provide at most weak evidence against the null hypothesis.
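Problems 19 and 20 can be checked side by side. The sketch below applies both the critical-value and the P-value approach to the 35 observations with σ = 6.9 lb, using `statistics.NormalDist`; the variable names are illustrative. Because z is not rounded to 0.69 before the tail area is computed, the P-value differs slightly from the table-based 0.2451.

```python
import math
from statistics import NormalDist

cheese = [42, 25, 29, 34, 38, 36, 30,
          29, 28, 32, 24, 43, 22, 38,
          32, 28, 41, 20, 35, 24, 29,
          40, 29, 22, 33, 23, 27, 32,
          33, 33, 32, 18, 40, 32, 25]
mu0, sigma, alpha = 30.0, 6.9, 0.10

n = len(cheese)
x_bar = sum(cheese) / n
z0 = (x_bar - mu0) / (sigma / math.sqrt(n))   # test statistic

z_crit = NormalDist().inv_cdf(1 - alpha)      # right-tailed critical value
p_value = 1.0 - NormalDist().cdf(z0)          # right-tailed P-value

reject_by_critical_value = z0 >= z_crit
reject_by_p_value = p_value <= alpha

print(f"x-bar = {x_bar:.1f}, z = {z0:.2f}, critical value = {z_crit:.2f}")
print(f"P = {p_value:.4f}")
print(reject_by_critical_value, reject_by_p_value)
```

The two approaches always give the same decision, since z0 ≥ z_α exactly when P ≤ α.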
21. Purse Snatching. The U.S. Federal Bureau of Investigation (FBI) compiles information
on robbery and property crimes, by type and selected characteristic, and publishes its
findings in Population-at-Risk Rates and Selected Crime Indicators. According to that
document, the mean value lost to purse snatching was $332 in 2002. For last year, 12
randomly selected purse-snatching offenses yielded the following values lost, to the
nearest dollar.
207   422   272   362   165   269
237   226   205   348   266   430

Use a t-test with either the critical-value approach or the P-value approach to decide, at
the 5% significance level, whether last year's mean value lost to purse snatching has
decreased from the 2002 mean. The mean and standard deviation of the data are $284.1
and $86.9, respectively.
Ans: H0: μ = $332, Ha: μ < $332; α = 0.05; t = -1.909; critical value = -1.796; 0.025 < P < 0.05;
reject H0; at the 5% significance level, the data provide sufficient evidence to conclude
that last year's mean value lost to purse snatching has decreased from the 2002 mean of
$332.
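The left-tailed t-test of Problem 21 can be verified from the raw data. This is a sketch using only the standard library; the t CDF is obtained by numerical integration, and `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


purse = [207, 422, 272, 362, 165, 269,
         237, 226, 205, 348, 266, 430]
mu0, alpha = 332.0, 0.05

n = len(purse)
x_bar, s = mean(purse), stdev(purse)
t0 = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t0, n - 1)                 # left-tailed: area to the left of t0
decision = "reject H0" if p_value <= alpha else "do not reject H0"

print(f"x-bar = {x_bar:.1f}, s = {s:.1f}, t = {t0:.3f}, P = {p_value:.4f}")
print(decision)
```

The computed P-value should lie between 0.025 and 0.05, matching the bounds read from Table IV in the answer.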
*22. Purse Snatching. Refer to Problem 21.
a. Perform the required hypothesis test, using the Wilcoxon signed-rank test.
b. In performing the hypothesis test in part (a), what assumption did you make about the
distribution of last year's values lost to purse snatching?
c. In Problem 21, we used the t-test to perform the hypothesis test. The assumption in


that problem is that last year's values lost to purse snatching are normally distributed.
If that assumption is true, why is it permissible to perform a Wilcoxon signed-rank
test for the mean value lost?
Ans: a. H0: μ = $332, Ha: μ < $332; α = 0.05; W = 17; critical value = 17; P = 0.046; reject
H0; at the 5% significance level, the data provide sufficient evidence to conclude that
last year's mean value lost to purse snatching has decreased from the 2002 mean of
$332.
b. It is symmetric.
c. Because a normal distribution is symmetric.
*23. Purse Snatching. Refer to Problems 21 and 22. If in fact last year's values lost to purse
snatching are normally distributed, which is the preferred procedure for performing the
hypothesis test, the t-test or the Wilcoxon signed-rank test? Explain your answer.
Ans: t-test
24. Betting the Spreads. College basketball, and particularly the NCAA basketball
tournament, is a popular venue for gambling, from novices in office betting pools to the
high roller. To encourage uniform betting across teams, Las Vegas oddsmakers assign a
point spread to each game. The point spread is the oddsmakers' prediction for the number
of points by which the favored team will win. If you bet on the favorite, you win the bet
provided the favorite wins by more than the point spread; otherwise, you lose the bet. Is
the point spread a good measure of the relative ability of the two teams? H. Stern and B.
Mock addressed this question in the paper "College Basketball Upsets: Will a 16-Seed
Ever Beat a 1-Seed?" (Chance, Vol. 11(1), pp. 27-31). They obtained the difference
between the actual margin of victory and the point spread, called the point-spread error,
for 2109 college basketball games. The mean point-spread error was found to be -0.2
point with a standard deviation of 10.9 points. For a particular game, a point-spread error
of 0 indicates that the point spread was a perfect estimate of the two teams' relative
abilities.
a. If, on average, the oddsmakers are estimating correctly, what is the (population) mean
point-spread error?
b. Use the data to decide, at the 5% significance level, whether the (population) mean
point-spread error differs from 0.
c. Interpret your answer in part (b).
Ans: a. 0 points
b. H0: μ = 0 points, Ha: μ ≠ 0 points; α = 0.05; t = -0.843; critical values = ±1.96;
P > 0.20; do not reject H0.
c. At the 5% significance level, the data do not provide sufficient evidence to conclude


that the population mean point-spread error differs from 0. In fact, because P > 0.20,
there is virtually no evidence against the null hypothesis that the population mean
point-spread error equals 0.

Problems 25 and 26 each include a normal probability plot and either a frequency histogram
or a stem-and-leaf diagram for a set of sample data. The intent is to use the sample data to
perform a hypothesis test for the mean of the population from which data were obtained. In
each case, consult the graphs provided to decide whether to use the z-test, the t-test, or
neither. Explain your answer.
25. The normal probability plot and histogram of the data are depicted in Fig. 9.44; σ is
known.
Ans: It is probably okay to use the z-test because the sample size is large and σ is known.
However, it does appear from the normal probability plot that there may be outliers, so
one should proceed cautiously in using the z-test.
26. The normal probability plot and stem-and-leaf diagram of the data are depicted in Fig.
9.45; σ is unknown.
Ans: It appears that the variable under consideration is far from being normally distributed
and, in fact, has a left-skewed distribution. However, the sample size is large and the
plots reveal no outliers. Keeping in mind that σ is unknown, it is probably reasonable to
use the t-test.
*27. Refer to Problems 25 and 26.
a. In each case, consult the appropriate graphs to decide whether using the Wilcoxon
signed-rank test is reasonable for performing a hypothesis test for the mean of the
population from which the data were obtained. Give reasons for your answers.
b. For each case where using either the z-test or the t-test is reasonable and where using
the Wilcoxon signed-rank test is also appropriate, decide which test is preferable. Give
reasons for your answers.
Ans: a. In view of the graphs, it appears reasonable to assume that, in Problem 25, the
variable under consideration has (approximately) a symmetric distribution but not so
in Problem 26. Consequently, it would be reasonable to use the Wilcoxon signed-rank
test in the first case, but not the second.
b. In Problem 25, it is a tough call between the Wilcoxon signed-rank test and the z-test
but, considering the possible outliers, the Wilcoxon signed-rank test is probably the
better one to use.


*28. Nursing-Home Costs. The cost of staying in a nursing home in the United States is
rising dramatically, as reported in the August 5, 2003 issue of The Wall Street Journal. In
May 2002, the average cost of a private room in a nursing home was $168 per day. For
August 2003, a random sample of 11 nursing homes yielded the following daily costs, in
dollars, for a private room in a nursing home.
73   199   192   181   182   250   159   182   208   129   282

a. Apply the t-test to decide at the 10% significance level whether the average cost for a
private room in a nursing home in August 2003 exceeded that in May 2002.
b. Repeat part (a) by using the Wilcoxon signed-rank test.
c. Obtain a normal probability plot, a boxplot, a stem-and-leaf diagram, and a histogram
of the sample data.
d. Discuss the discrepancy in results between the t-test and the Wilcoxon signed-rank
test.
Ans: a. H0: μ = $168, Ha: μ > $168; α = 0.10; t = 1.03; critical value = 1.372; P > 0.10; do not
reject H0; at the 10% significance level, the data do not provide sufficient evidence to
conclude that the average cost for a private room in a nursing home in August 2003
exceeded that in May 2002.
b. H0: μ = $168, Ha: μ > $168; α = 0.10; W = 48; critical value = 48; P = 0.099; reject
H0; at the 10% significance level, the data provide sufficient evidence to conclude
that the average cost for a private room in a nursing home in August 2003 exceeded
that in May 2002.
d. From part (c), we find that the variable under consideration appears to be symmetric,
but that the data contain outliers. This explains the discrepancy between the results of
the two tests. In view of the small sample size, the Wilcoxon signed-rank test is
preferable to the t-test.
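The Wilcoxon statistics quoted in Problems 22 and 28 can be recomputed directly from the definition "W = sum of the positive ranks". The sketch below is illustrative only: the function name `wilcoxon_w` is hypothetical, tied absolute differences receive their average rank, and zero differences are dropped (an assumption that does not arise in these two data sets).

```python
def wilcoxon_w(data, mu0):
    """Sum of the ranks of the positive differences x - mu0."""
    diffs = [x - mu0 for x in data if x != mu0]   # drop zero differences
    ordered = sorted(diffs, key=abs)              # rank by absolute difference
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1                # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    return sum(r for d, r in zip(ordered, ranks) if d > 0)


nursing = [73, 199, 192, 181, 182, 250, 159, 182, 208, 129, 282]
purse = [207, 422, 272, 362, 165, 269, 237, 226, 205, 348, 266, 430]

print(wilcoxon_w(nursing, 168))   # Problem 28(b)
print(wilcoxon_w(purse, 332))     # Problem 22(a)
```

Only the test statistic is computed here; the critical values and P-values still come from the Wilcoxon table or from software.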

Working With Large Data Sets


29. Beef Consumption. According to Food Consumption, Prices, and Expenditures,
published by the U.S. Department of Agriculture, the mean consumption of beef per
person in 2002 was 64.5 lb (boneless, trimmed weight). A sample of 40 people taken this
year yielded the data, in pounds, on last year's beef consumption given on the WeissStats
CD. Use the technology of your choice to do the following.
a. Obtain a normal probability plot, a boxplot, a histogram, and a stem-and-leaf diagram
of the data on beef consumptions.
b. Decide, at the 5% significance level, whether last year's mean beef consumption is less


than the 2002 mean of 64.5 lb. Apply the one-mean t-test.
c. The sample data contain four potential outliers: 0, 0, 8, and 20. Remove those four
observations, repeat the hypothesis test in part (b), and compare your result with that
obtained in part (b).
d. Assuming that the four potential outliers are not recording errors, comment on the
advisability of removing them from the sample data before performing the hypothesis
test.
e. What action would you take regarding this hypothesis test?
*30. Beef Consumption. Use the technology of your choice to do the following.
a. Repeat parts (b) and (c) of Problem 29 by using the Wilcoxon signed-rank test.
b. Compare your results from part (a) with those in Problem 29.
c. Discuss the reasonableness of using the Wilcoxon signed-rank test here.
31. Body Mass Index. Body Mass Index (BMI) is a measure of body fat based on height and
weight. According to the document Dietary Guidelines for Americans, published by the
U.S. Department of Agriculture and the U.S. Department of Health and Human Services,
for adults, a BMI of greater than 25 indicates an above healthy weight. The BMIs of 75
randomly selected U.S. adults provided the data on the WeissStats CD. Use the
technology of your choice to do the following.
a. Obtain a normal probability plot, a boxplot, and a histogram of the data.
b. Based on your graphs from part (a), is it reasonable to apply the one-mean z-test to the
data? Explain your answer
