
Chapter 1

Overview of Statistics
1.1 Answers will vary.
1.2 a. Answers will vary.
b. An hour with an expert at the beginning of a project could be the smartest move a manager can make.
Consider bringing in a consultant when your team lacks certain critical skills, or when an unbiased or
informed view cannot be found inside your organization. Expert consultants can handle domineering or
indecisive team members, personality clashes, fears about adverse findings, and local politics. As in any
business decision, the costs of paying for statistical assistance must be weighed against the benefits. Costs
include the statistician's fee and more time invested at the beginning of a project, which may mean results
are not immediate. Benefits include better sampling strategies that can result in more useful data, a better
understanding of what information can be extracted from the data, and greater confidence in the results.

1.3 a. The average business school graduate should expect to use computers to manipulate the data.
b. Answers will vary.
1.4 a. Answers will vary. Why Study: In fact, most college graduates will use statistics every day. Why Not
Study: It is difficult to become a statistical expert after taking one introductory college course. A
business person should hire statistical experts and have faith that those who are using statistics are doing
it correctly.
b. Answers will vary.
c. The strategy is to arrive at an absurd result and then conclude that the original assumption must have
been wrong, since it produced this absurd result. This is also known as proof by contradiction. It makes
use of the law of the excluded middle: a statement which cannot be false must be true. If you state that
you will never use statistics in your business profession, then you might conclude that you shouldn't
study statistics. However, if the original assumption of never using statistics is wrong, then the
conclusion of not needing to study statistics is also wrong.
1.5 a. An understanding of statistics helps one determine what data is necessary by requiring us to state our
questions up front. We can then determine the proper amount of data needed, sample sizes, to confidently
answer our questions.
b. Answers will vary.
c. Answers will vary.
1.6 a. Yes, the summary is succinct. No, the purpose was not clear: why do we want to know the weight of
Tootsie Rolls? Yes, the sampling method was explained. Yes, the findings were clearly stated. No
implications of the study were noted. Yes, jargon is a problem: non-statisticians will not know what a
FPCF or a confidence interval is. To improve this report, the writer should restate it in layman's terms.
b. Yes, the summary is succinct. No, the purpose was not clear. Why do we want to know the proportion of
pages with advertisements? Yes, the sampling method was explained. Yes, the findings were clearly
stated. Jargon is not a problem here as it was previously. To improve, the writer should state the purpose.
1.7 a. The graph is more helpful. The visual illustration of the distribution focuses the reader on the
experience of the many (in this case 16 to 20 years and 21 to 25 years). We can quickly see that the
typical financial planner has between 16 and 25 years' experience.
1.8 Answers will vary.
1.9 Answers will vary.
1.10 a. It is not obvious that there is a direct cause and effect relationship between an individual choosing to use a
radar detector and that individual choosing to vote and wear a seatbelt. There may be unidentified
variables that are related to these three individual characteristics. Users of radar detectors may drive
faster and thus recognize risks and therefore are more likely to wear seatbelts. Users may also be more
prone to vote, since they are concerned more about government policies and who will support their desire
to have government influence minimized.
b. Increasing the use of radar detectors may not influence those who obey laws and are less concerned with
government limitations.
1.11 a. No, the method did not work in the sense that he increased his chances of winning by picking the
numbers the way he did. Every combination of six numbers has the same chance of winning. The fact
that this winner chose his numbers based on his family's birthdays and school grade does not increase
the chance of his winning.
b. Someone who picks 1-2-3-4-5-6 has just as much chance of winning as anyone else (see (a)).
1.12 a. The phrase "much more" is not quantified. The study report is not mentioned. There is no way to determine
the veracity of this statement. Six causes of car accidents could be poor weather, road construction, heavy
traffic, inexperienced driver, engine failure, drinking. Smoking is not on this list.
b. Smokers might pay less attention to driving when lighting a cigarette.
1.13 Many people have math phobia and because statistics involves math the subject can sound scary. The
subject of statistics has a reputation for being difficult and this can cause fear of the unknown. There is
usually not the same fear towards an ethics class. This is because there is much more emphasis on
unethical behavior in the media, to the extent that ethical behavior and an understanding of how to be
ethical is widely accepted as a requirement to graduate and then succeed.
1.14 Random sampling of cans of sauce for a specific manufacturer can be used to assess quality control.
1.15 a. The consultant can analyze the responses from the 80 purchasing managers noting that the linen supplier
should not make any conclusions about the managers who did not respond. The consultant should not use
the responses to ambiguous questions. She should suggest that the supplier redesign both the survey
questions and the survey methods to increase the response rates.
b. An imperfect analysis would be a mistake because the supplier may make changes to their business that
upset those customers not responding to the survey or those customers not sent a survey.
c. As a consultant it would be important to point out the problems with the survey instrument and the survey
method and suggest alternatives for improvement.
1.16 All of these involve taking samples from the population of interest and estimating the value of the variable of
interest.
1.17 a. Class attendance, time spent studying, natural ability of student, interest level in subject, instructors
ability, performance in course prerequisites. Smoking is not on the list.
b. Most likely students who earn A's are also making good decisions about their health. Students who smoke
might also be making poor choices surrounding their study habits.
c. Giving up smoking alone may not stop a student from using poor study habits nor is it likely to increase
their interest in a topic.
1.18 Curiosity, parents who smoke, friends who smoke, seeing teenagers smoke in movies and TV, boredom,
wanting to look cool. Yes, seeing movie and TV stars smoking was on the list.
1.19 a. We need to know the total number of philosophy majors to evaluate this.
b. We don't know the number of students in each major from this table.
c. This statement suffers from self-selection bias. There are likely many more marketing majors who choose
to take the GMAT and therefore a wider range of abilities than the abilities of physics majors who choose
to take the GMAT.
d. The GMAT is just one indicator of managerial skill and ability. It is not the only predictor of success in
management.
1.20 a. The graph is much more useful. We can clearly see that as chest measurement increases body fat also
increases. It is not a perfectly linear relationship but the relationship is there nevertheless.
b. The last two data points on the far right show a very high chest measurement but the body fat percentage
has leveled off.
1.21 a. Its R² value is quite close to 1, indicating a good fit to the actual data. I feel that G.E. is one of the
most respected corporations in the world because of its strong management and name recognition. Its
valuable assets make it poised for steady growth over the next decade.
b. If a country's unemployment rate is too high, it could cause a downturn in its economy.
c. You cannot have a negative number of people unemployed; therefore, this forecast is very unlikely.
d. This is not a well designed graph because its title is too long and there are no labels on the axes.
e. This graph has no clear border to give it a sense of containment. It is dealing with three separate pieces of
information. In this graph, the same data is presented, but in a deceptive manner. The sources do not
contain enough detail.
1.22 Answers will vary.
Chapter 2
Data Collection
2.1 Observation: a single data point. Variable: a characteristic about an individual.
2.2 Answers will vary.
2.3 a. attribute
b. attribute
c. discrete numerical
d. continuous numerical
e. discrete numerical
f. discrete numerical
g. continuous numerical
2.4 a. continuous numerical
b. discrete numerical
c. attribute
d. continuous numerical
e. attribute
f. discrete numerical
2.5 Answers will vary.
2.6 Answers will vary.
2.7 a. ratio
b. ordinal
c. nominal
d. interval
e. ratio
f. ordinal
2.8 a. ratio
b. ratio
c. interval
d. nominal
e. nominal
f. nominal
2.9 Answers will vary.
2.10 a. ordinal or interval
b. ordinal
c. nominal
d. ratio
2.11 a. cross-sectional
b. time series
c. time series
d. cross-sectional.
2.12 a. time series
b. cross-sectional
c. time series
d. cross-sectional
2.13 a. time series
b. cross-sectional.
c. time series.
d. cross-sectional.
2.14 Answers will vary.
2.15 a. Census
b. Sample
c. Sample
d. Census
2.16 a. parameter
b. parameter
c. statistic.
d. statistic
2.17 a. Sample
b. Census
c. Sample
d. Census
2.18 Use the rule of thumb N = 20n (the population should be at least 20 times the sample size):
a. N = 20 × 10 = 200
b. N = 20 × 50 = 1,000
c. N = 20 × 1,000 = 20,000
2.19 a. Convenience
b. Systematic
c. Judgment or biased
2.20 a. No. In the rush to leave the theater, stop at the restroom, use their cell phones, etc., it would not be
possible for everyone to have an equal chance to be included in the sample.
b. Might only get those who didn't like the movie and couldn't wait to leave. There might not be a large
enough crowd for every 10th person to be representative, and leaving the theatre is not a linearly
organized event. Might get an unrepresentative sample by only selecting those with earrings.
c. Only those who liked the movie or really hated the movie might respond, a bias due to self-selection.
2.21 Answers will vary.
2.22 a. 0.50
b. Answers will vary.
c. Due to random variation the sample may not be representative.
2.23 Answers will vary.
2.24 a. Response bias.
b. Self-selection bias, coverage error.
c. coverage error, self-selection bias.
2.25 a. Telephone or Web
b. Direct observation
c. Interview, Web, or mail
d. Interview or Web
2.26 a. Mail
b. Direct observation, through customer invoices/receipts
c. Mail
d. Interview
2.27 Version 1: Most would say yes. Version 2: More varied responses.
2.28 Does not include all possible responses or allow for the responder to pick something other than those
presented.
2.29 a. Continuous numerical
b. Attribute.
c. Discrete numerical.
d. Discrete numerical.
e. Continuous numerical.
2.30 a. ordinal (seeds represent a ranking of the players)
b. ratio
c. ratio
d. ratio
e. ratio, zero is meaningful.
2.31 Answers will vary.
2.32 Answers will vary.
2.33 Q1 Attribute, nominal
Q2 Continuous, ratio
Q3 Attribute, nominal
Q4 Continuous, ratio
Q5 Discrete, ratio
Q6 Discrete, ratio
Q7 Attribute, nominal
Q8 Attribute, interval
Q9 Continuous, ratio
Q10 Discrete, ratio
Q11 Continuous, ratio
Q12 Discrete, ratio
Q13 Attribute, nominal
Q14 Discrete, ratio
Q15 Continuous, ratio
Q16 Discrete, ratio
Q17 Attribute, interval
Q18 Attribute, nominal
Q19 Attribute, interval
Q20 Attribute, nominal
2.34 a. Census.
b. Sample: too costly to track each can
c. Census: can count them all quickly and cheaply
d. Census: as long as the company can easily generate the value from its human resource center.
2.35 a. Statistic
b. Parameter
c. Statistic
d. Parameter
2.36 Answers will vary.
2.37 a. Number of employees or industry
b. There may be differences in profitability based on number of employees or industry type therefore we
should be sure to take a sample that includes both types of industries.
c. Underrepresentation of chemical companies.
2.38 a. Cluster sampling. Easier to define geographic areas within a state where gasoline is sold. Gasoline stations
are not everywhere, thus simple random sampling or stratified sampling doesn't make sense.
b. Population is finite. It is listable.
2.39 Use mail or telephone. Census not possible.
2.40 a. Could use cluster sampling as grocery stores are in well defined locations. Identify clusters within each
state.
b. The sample frame is all stores in the US selling peanut butter. This population is very large, approaching
infinity, but could still be listed.
c. A census is not possible given the size and scope of the investigation.
2.41 a. Cluster sampling
b. Finite and listable
c. Yes.
2.42 a. No. It would have been too costly and taken too much time to observe everyone who used the restroom.
b. The population is finite but not listable.
c. Judgment
d. Direct observation
e. Interviewer bias.
2.43 a. Cluster Sampling
b. It doesn't change the results but you cannot use the results to make conclusions about all salmon advertised
as wild.
2.44 a. Answers will vary.
b. Convenience.
c. No. The population is too large.
d. Population can be treated as infinite and unlistable.
2.45 a. Telephone or mail,
b. Finite and listable
2.46 Simple random sample or systematic sampling.
2.47 a. No
b. Ordering of the list could influence the make up of the first sample.
2.48 a. Judgment or convenience
b. Non-response bias is always present in surveys. Coverage error may occur since we don't know beforehand
who has radar detectors and who doesn't, so we may overrepresent one group.
c. No causation shown so conclusions are not trustworthy.
2.49 a. Cluster sampling, neighborhoods are natural clusters.
b. Picking a day near a holiday with heavy trash.
2.50 a. Convenience sampling.
b. Based on such a small sample, that may not be representative of the entire population, it would be incorrect
to make such a statement.
c. Perhaps, if the block is representative of the city, or an area within the city, or even his local neighborhood,
then such an inference might be valid, but confined to a specific geographic area.
d. Coverage
2.51 a. Systematic
b. Simple random sample
c. Simple random sample or systematic
d. Simple random sample or systematic
e. Stratified
2.52 a. Systematic: every 5th person who emerges from the office; or obtain data on n randomly selected patients
and visits and analyze.
b. Direct observation for a specific time period, such as all day Wednesday.
c. n convenient places
d. Last n flights
e. Direct observation of gasoline prices at selected stations over a two week period.
2.53 a. Sales, store type
b. Yes
c. Simple random sample
2.54 a. No, one has to sample because the population is infinite and unlistable. A census is not possible.
b. One could stratify by state or county because geographic regions may differ.
2.55 a. No, the population is too large therefore sampling is required.
b. Systematic.
2.56 Judgment sampling or systematic sampling were the most likely sampling methods. A census is not possible
because the population is too large.
2.57 Convenience sample because any other method would have been more expensive and time consuming.
2.58 a. Judgment sampling.
b. Simple random sample would be impossible because it would be impossible to identify the individuals in
the population.
2.59 Education and income could affect who uses the no-call list.
a. They won't reach those who purchase such services. Same response for b and c.
2.60 Selection (only those who survived would be in the sample); coverage: may include those who were least
exposed to such hazards.
2.61 a. Ordinal
b. That the intervals are equal.
2.62 For each question, the difficulty is deciding what the possible responses should be and giving a realistic range
of responses.
2.63 a. Rate the effectiveness of this professor. 1 = Excellent to 5 = Poor.
b. Rate your satisfaction with the President's economic policy. 1 = Very Satisfied to 5 = Very Dissatisfied.
c. How long did you wait to see your doctor? Less than 15 minutes, between 15 and 30 minutes, between 30
minutes and 1 hour, more than 1 hour.
2.64 a. It depends on the questions asked. It is possible that more could agree the law should be upheld, even
though on moral grounds they oppose it.
b. Setting aside your moral and personal beliefs, given that abortion is legal, should the laws be upheld?
Setting aside the fact that abortion is legal, do you believe that killing an unborn child is moral?
c. Do you believe abortion should stay legal?
2.65 Answers will vary, one consideration would be to ask the questions as a yes or no and then provide a list of
whys or ask the respondent to list reasons for yes or no answer.
2.66 Ordinal measure. There is no numerical scale and the intervals are not considered equal.
2.67 a. Likert scale.
b. Should add a middle category labeled "Neither Agree Nor Disagree" and remove the "Undecided" category.
2.68 a. A constrained response scale.
b. A Likert scale would be better.
c. Self-selection bias. People with very bad experiences might respond more often than people with
acceptable experiences.
Chapter 3
Describing Data Visually
3.1
Approximately symmetric, but can be viewed as skewed to the left.
3.2
Distribution appears symmetric.
3.3
Sarah's Calls:
Bob's Calls:
Sarah makes more calls than Bob and her calls are shorter in duration.
3.4 a. 7 bins of 20
b. Answers will vary. Too few bins (less than five) or too many bins (more than 15) might hide the skewness
in the distribution.
3.5 a. 6 bins of 100
b. Answers will vary. Too few bins (less than five) or too many bins (more than 15) might hide the skewness
in the distribution.
3.6 a. 4 bins of 10
b. Answers will vary. Too few bins (less than five) or too many bins (more than 15) might hide the skewness
in the distribution.
3.7 Sample default graph given. Answers will vary as to modification.
3.8 Default graph presented, answers will vary with respect to modifications made.
3.9 Default graphs presented, answers will vary with respect to modifications made.

3.10 Default graphs for a, b, and c.
a.
b. c.
3.11 a. Sample default graph presented.
3.12 a. Sample default graph presented.
b. The relationship is negative, linear and strong.
3.13 a. Sample default graph presented.
b. There is a strong, positive relationship between midterm exam scores and final exam scores.
3.14 a. Sample default graph presented.
b. There is a weak, positive linear relationship.
3.15 a. Sample default graph presented.
b. There is weak, negative linear relationship.
3.16 Sample default graphs presented for a, b, and c.
3.17 Sample default graphs presented for a, b, and c.
3.18 Sample default graphs presented for a and b.
3.19 a.
b.
Frequency Distribution - Quantitative
Nurse/Bed
lower upper midpoint width frequency percent cumulative frequency cumulative percent
0.8 1 0.9 0.2 2 5 2 5
1 1.2 1.1 0.2 5 12.5 7 17.5
1.2 1.4 1.3 0.2 13 32.5 20 50
1.4 1.6 1.5 0.2 10 25 30 75
1.6 1.8 1.7 0.2 4 10 34 85
1.8 2 1.9 0.2 3 7.5 37 92.5
2 2.2 2.1 0.2 1 2.5 38 95
2.2 2.4 2.3 0.2 1 2.5 39 97.5
2.4 2.6 2.5 0.2 0 0 39 97.5
2.6 2.8 2.7 0.2 1 2.5 40 100
c. The distribution is skewed to the right. Almost half the observations are between 1.2 and 1.6. (Note: the
dotplot and histogram were generated in MegaStat; the frequency distribution was calculated using Excel's
Data Analysis tool.)
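For readers who want to reproduce this kind of frequency distribution outside MegaStat or Excel, here is a minimal Python sketch using NumPy; the data array is a hypothetical placeholder, not the exercise's actual nurse/bed data set.

    import numpy as np

    # Hypothetical nurse/bed ratios; substitute the exercise's actual data.
    data = np.array([1.3, 0.9, 1.5, 1.2, 1.4, 2.1, 1.7, 1.1, 2.7, 1.3])

    edges = np.arange(0.8, 2.9, 0.2)            # bin edges 0.8, 1.0, ..., 2.8
    freq, edges = np.histogram(data, bins=edges)

    cum, n = freq.cumsum(), freq.sum()
    for lo, hi, f, c in zip(edges[:-1], edges[1:], freq, cum):
        print(f"{lo:3.1f} < {hi:3.1f}  mid={(lo+hi)/2:4.2f}  "
              f"f={f:2d} ({100*f/n:5.1f}%)  cum={c:2d} ({100*c/n:5.1f}%)")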
3.20 a.
b.
Frequency Distribution - Quantitative
Data
lower upper midpoint width frequency percent cumulative frequency cumulative percent
0 < 5 3 5 20 23.0 20 23.0
5 < 10 8 5 23 26.4 43 49.4
10 < 15 13 5 15 17.2 58 66.7
15 < 20 18 5 7 8.0 65 74.7
20 < 25 23 5 7 8.0 72 82.8
25 < 30 28 5 7 8.0 79 90.8
30 < 35 33 5 3 3.4 82 94.3
35 < 40 38 5 3 3.4 85 97.7
40 < 45 43 5 0 0.0 85 97.7
45 < 50 47 5 2 2.3 87 100.0
c. Distribution is skewed to the right. More games seem to be decided by smaller margins of victory than
large margins of victory.
3.21 a.
b.
Frequency Distribution - Quantitative
Data
lower upper midpoint width frequency percent cumulative frequency cumulative percent
1 < 2 2 1 24 36.9 24 36.9
2 < 3 3 1 12 18.5 36 55.4
3 < 4 4 1 9 13.8 45 69.2
4 < 5 5 1 2 3.1 47 72.3
5 < 6 6 1 3 4.6 50 76.9
6 < 7 7 1 4 6.2 54 83.1
7 < 8 8 1 1 1.5 55 84.6
8 < 9 9 1 1 1.5 56 86.2
9 < 10 10 1 0 0.0 56 86.2
10 < 11 11 1 1 1.5 57 87.7
11 < 12 12 1 0 0.0 57 87.7
12 < 13 13 1 1 1.5 58 89.2
13 < 14 14 1 3 4.6 61 93.8
14 < 15 15 1 0 0.0 61 93.8
15 < 16 16 1 0 0.0 61 93.8
16 < 17 17 1 0 0.0 61 93.8
17 < 18 18 1 0 0.0 61 93.8
18 < 19 19 1 1 1.5 62 95.4
19 < 20 20 1 0 0.0 62 95.4
20 < 21 21 1 1 1.5 63 96.9
21 < 22 22 1 0 0.0 63 96.9
22 < 23 23 1 0 0.0 63 96.9
23 < 24 24 1 0 0.0 63 96.9
24 < 25 25 1 0 0.0 63 96.9
25 < 26 26 1 0 0.0 63 96.9
26 < 27 27 1 1 1.5 64 98.5
27 < 28 28 1 0 0.0 64 98.5
28 < 29 29 1 0 0.0 64 98.5
29 < 30 29 1 1 1.5 65 100.0
c. The distribution is skewed to the right. Three-fourths of the calls are 6 minutes or less in duration.
3.22 a.
b.
Frequency Distribution - Quantitative
Calories Per Gram
lower upper midpoint width frequency percent cumulative frequency cumulative percent
1.60 < 1.80 1.70 0.20 1 6.7 1 6.7
1.80 < 2.00 1.90 0.20 1 6.7 2 13.3
2.00 < 2.20 2.10 0.20 3 20.0 5 33.3
2.20 < 2.40 2.30 0.20 1 6.7 6 40.0
2.40 < 2.60 2.50 0.20 4 26.7 10 66.7
2.60 < 2.80 2.70 0.20 4 26.7 14 93.3
2.80 < 3.00 2.90 0.20 1 6.7 15 100.0
c. The distribution appears to be skewed to the left. The sample size is not large enough to draw valid
inferences about shape.
3.23 a.
b.
Frequency Distribution - Quantitative
Data
lower upper midpoint width frequency percent cumulative frequency cumulative percent
80 < 90 85 10 1 2.7 1 2.7
90 < 100 95 10 3 8.1 4 10.8
100 < 110 105 10 8 21.6 12 32.4
110 < 120 115 10 4 10.8 16 43.2
120 < 130 125 10 6 16.2 22 59.5
130 < 140 135 10 4 10.8 26 70.3
140 < 150 145 10 4 10.8 30 81.1
150 < 160 155 10 5 13.5 35 94.6
160 < 170 165 10 1 2.7 36 97.3
170 < 180 175 10 1 2.7 37 100.0
c. The shape is somewhat symmetrical.
3.24 a. Column chart with 3D visual effect.
b. Strengths: labels on X and Y axes; good proportions; no distracting pictures. Weaknesses: title doesn't
make sense; non-zero origin; 3D effect does not add to the presentation; tick marks do not stand out;
measurement units on the Y axis are missing; vague source.
c. Correct weaknesses as noted; a 2D column chart would be less distracting, and a zero origin on the Y axis
would show realistic differences between years.
3.25 a. Bar chart with 3D visual effect.
b. Strengths: Good proportions, no distracting pictures. Weaknesses: No labels on X and Y axes, Title
unclear, 3D effect does not add to presentation.
c. Correct weaknesses as noted. 2D bar chart with zero origin on X axis would improve chart.
3.26 a. Line chart
b. Strengths: labels on X and Y axes, measurement included on Y axis, Good title; No distracting pictures;
good use of gridlines Weaknesses: magnitude difference between net income and sales, no source.
c. Correct weaknesses as noted, use of logarithmic scale would correct proportion issue and show growth
rates more clearly.
3.27 a. Map
b. They are appropriate when patterns of variation across space are of interest. Not sure that the question
being addressed requires a map. It is difficult to assess the question from the information on the map.
The actual share rather than above or below the national average would have more meaning.
c. A pie chart showing distribution by largest states and an other category. Set of pie charts by region, with
values on national shares. A histogram of percentages would show distribution.
3.28 a. Standard pie chart.
b. Strengths: source identified, answers the question posed. Weaknesses: Other category quite large.
c. Might change title: Distribution of Advertising Dollars in the United States, 2001. Would add the total
dollars spent on advertising as well.
3.29 a. Exploded pie chart.
b. Strengths include details on names and values, differentiating between types of countries, good use of
color.
c. Improvement might be percentage shares instead of actual values, and include just the total level of
imports. Might consider a sorted column chart with OPEC and Non-OPEC countries color coded.
3.30 a. Line chart.
b. Strengths: labels on X and Y axes, use of gridlines. Weaknesses: distracting pictures, dramatic title,
non-zero origin, no source.
c. Correct weaknesses as noted; keep the graph as a line graph.
3.31 a. Pictogram.
b. Weakness: Title unclear. Uses two pieces of the paw to illustrate a single number. Strengths: Visually
appealing, values and labels easy to understand.
c. Should use a better title and try to use a single piece to represent each value. A pie chart with exploding
piece for Shelter/Rescue might be just as effective. Or perhaps a Pareto chart would work well.
3.32 a. Pictogram
b. Strengths: labels on X and Y axes, zero origin, Short descriptive title, detailed, accurate source.
Weaknesses: use of Area Trick to make 2000 appear much larger than 1980 values.
c. Correct weaknesses as noted, regular column chart would be better choice.
3.33 a. Figure B is the better choice, even though it has a non-zero origin. The fitted regression line shows the
actual change in noise level due to a one-unit increase in airspeed (nautical miles per hour). The
inclusion of this line and its equation overcomes any exaggeration in the trend from the use of a
non-zero origin.
b. A one nautical mile per hour increase in airspeed leads to a .0765-decibel increase in noise. A one hundred
nautical mile per hour increase leads to a 7.65-decibel increase.
c. Yes, it does seem logical.
d. It appears that the average level of noise is around 95 decibels given an airspeed of about 350. Most jets
cruise at around 400 or more, so the noise level is generally between that of a hair dryer and chain saw.
3.34 a.
b. A table showing the data might be useful given the small number of countries.
3.35 a.
b. A table, bar, or column chart would also work.
3.36 a.
b. A table or column chart would also be informative.
3.37 a.
b. A column chart would also work.
3.38 a.
b. A pie chart would also work
3.39 a.
b. A pie chart would also work.
3.40 a.
b. A bar chart would also be informative.
3.41 a.
b. A bar chart would also be informative.
3.42 a.
b. Side by side pie charts or a bar chart would also work.
3.43 a.
b. A bar chart or column chart would also work.
3.44 a.
b. A bar chart would also work.
3.45 a.
b. A line plot would also work.
3.46 a.
b. A column chart would also work.
3.47 a.
b. A bar chart would also work.
3.48 a.
b. A bar chart would also work.
3.49 a.
b. A bar chart would also work.
3.50 No, a table is the best way to display the data. Showing four different years would make a graph too cluttered.
3.51 No, a table is the best way to display the data. The rows may not add up to 100 due to rounding.
Chapter 4
Descriptive Statistics
4.1 b.
Descriptive statistics
Data
count 32
mean 27.34
sample variance 61.65
sample standard deviation 7.85
minimum 9
maximum 42
range 33

4.2 b.
Descriptive Statistics
Data
count 28
mean 107.25
sample variance 1,043.75
sample standard deviation 32.31
minimum 52
maximum 176
range 124
4.3 b.
Descriptive statistics
Data
count 65
mean 4.48
sample variance 34.47
sample standard deviation 5.87
minimum 1
maximum 29
range 28
4.4 a.
Quiz 1 Quiz 2 Quiz 3 Quiz 4
Count 10 10 10 10
Mean 72.00 72.00 76.00 76.00
Median 72.00 72.00 72.00 86.50
Mode 60.00 65.00 72.00 none
b. No, they don't agree for all quizzes. The mean and the median are the same for quiz 1 and quiz 2, and the
mode and the median are the same for quiz 3.
c. The mode is an unreliable measure of central tendency for quantitative data. Where the mean and median
disagree, one should look at the shape of the distribution to see which measure is more appropriate.
d. Quiz 1 and Quiz 2 have a symmetric distribution. Quiz 3 is skewed right and Quiz 4 is skewed left.
e. Students on average did better on quizzes 3 and 4.

4.5 a.
Descriptive Statistics
Data
count 32
mean 27.34
mode 26.00
median 26.00
b. The mode and median are the same, but the mean is greater than the median.
c. The data are skewed to the left. See 4.1.
d. The mode is not a reliable measure of central tendency for quantitative variables.
4.6 a.
Descriptive Statistics
Data
count 28
mean 107.25
median 106.00
mode 95.00
b. The measures are close in value.
c. Symmetric. See the dot plot and histogram in 4.2.
d. The mode is not a reliable measure of central tendency.
4.7 b.

Descriptive Statistics Data
count 65
mean 4.48
median 2.00
mode 1.00
c. The mean is more than twice the median and the median is twice the mode.
d. The data are skewed to the right based on the histogram in 4.3.
e. Because the data is heavily skewed, the median is a better measure of central tendency.
4.8 b.
Descriptive Statistics Mon Tue Wed Thu
mean 4 4.7 3.8 1.9
median 5 5 5 1
trimmed mean 4 4.7 3.8 1.9
geometric mean 2.89 3.76 NA 1.26
midrange 5 5 5 5.5
c. The geometric mean is very different from the other measures. The mean, median, and midrange are close
in value.
d. The mean or median are better measures of central tendency for this type of data. More empty seats on
Monday and Tuesday than on Wednesday and Thursday. If one stands by on Thursday, there is little
chance that they will get on a flight compared to earlier in the week.
4.9 a.
mean 27.34
midrange 25.50
geometric mean 26.08
trim mean 27.47
b. The mean and the trimmed mean are similar, but they are both greater than the midrange and the geometric
mean.
4.10 a.
mean 107.25
geometric mean 102.48
trimmed mean 106
midrange 114
b. The mean and trimmed mean are similar and fall between the geometric mean and midrange.
4.11 a.
mean 4.48
midrange 15.00
geometric mean 2.60
trimmed mean 3.61
b. The mean is greater than the geometric and trimmed mean. The midrange is almost 4 times the mean. This
is not surprising given the distribution of the data and a few calls lasting for a very long duration.
c. The data are skewed to the right.
d. The mean or trimmed mean are appropriate measures of central tendency for this data set.
4.12 39.9%, using the geometric mean growth rate formula:
G = (x_n/x_1)^(1/(n−1)) − 1 = (60.6/15.8)^(1/4) − 1 = 0.399
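As a check, the same growth-rate formula is easy to evaluate in Python; the n = 5 observations (four growth periods) are inferred from the 1/4 exponent in the printed solution.

    # Geometric mean growth rate: G = (x_n / x_1) ** (1 / (n - 1)) - 1
    x_first, x_last, n = 15.8, 60.6, 5
    g = (x_last / x_first) ** (1 / (n - 1)) - 1
    print(f"G = {g:.3f}")   # G = 0.399, about 39.9% per period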
4.13 a.

Sample A: Sample B: Sample C:
Mean 7 62 1001
Sample Standard Deviation 1 1 1
b. The midpoint of each sample is the mean. The other 2 data points are exactly 1 standard deviation from the
mean. The idea is to illustrate that the standard deviation is not a function of the value of the mean.
4.14 a.
Data Set A: Data Set B: Data Set C:
Mean 7.0000 7.0000 7.0000
Sample Standard Deviation 1.0000 2.1602 3.8944
Population Standard Deviation 0.8165 2.0000 3.7417
b. The sample standard deviation is larger than the population standard deviation for the same data set.
Samples can have similar means, but different standard deviations.
4.15
Stock s/x̄ CV
Stock A 5.25/24.50 21.43%
Stock B 12.25/147.25 8.32%
Stock C 2.08/5.75 36.17%
a. Stock C, the one with the smallest standard deviation and smallest mean, has the greatest relative variation.
b. The stocks have different values therefore directly comparing the standard deviations is not a good
comparison of risk. The variation relative to the mean value is more appropriate.
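A quick Python sketch of the comparison, using the standard deviations and means from the table above:

    # Coefficient of variation: CV = 100 * s / xbar
    stocks = {"Stock A": (5.25, 24.50),
              "Stock B": (12.25, 147.25),
              "Stock C": (2.08, 5.75)}
    for name, (s, xbar) in stocks.items():
        print(f"{name}: CV = {100 * s / xbar:.2f}%")  # 21.43%, 8.32%, 36.17%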
4.16
Quiz 1 Quiz 2 Quiz 3 Quiz 4
Count 10 10 10 10
Mean 72.00 72.00 76.00 76.00
sample standard deviation 13.23 6.67 11.41 27.43
coefficient of variation (CV) 18.38% 9.26% 15.02% 36.09%
b. Scores, on average, are higher for Quiz 3 and Quiz 4. Quiz 2 has the least relative variation and Quiz 4 the
most. As the quiz scores increase from quiz to quiz, the variation within a quiz increases.
4.17
Sample standard deviation 5.87
Mean absolute deviation 3.92
4.18 From Megastat:
empirical rule
mean - 1s 19.49
mean + 1s 35.20
percent in interval (68.26%) 68.8%
mean - 2s 11.64
mean + 2s 43.05
percent in interval (95.44%) 96.9%
mean - 3s 3.79
mean + 3s 50.90
percent in interval (99.73%) 100.0%
low extremes 0
low outliers 0
high outliers 0
high extremes 0
b. There are no outliers based on the empirical rule. No unusual data values.
c. Assume the data are normally distributed based on the empirical rule results.
d. Yes, the sample size is large enough.
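The empirical-rule screen in 4.18 can be reproduced with a short Python sketch; the data array here is a hypothetical placeholder rather than the exercise's data.

    import numpy as np

    data = np.array([23, 30, 26, 42, 9, 28, 31, 25, 22, 35])  # hypothetical
    xbar, s = data.mean(), data.std(ddof=1)   # sample standard deviation
    for k, normal_pct in [(1, 68.26), (2, 95.44), (3, 99.73)]:
        lo, hi = xbar - k * s, xbar + k * s
        pct = 100 * np.mean((data >= lo) & (data <= hi))
        print(f"mean ± {k}s: [{lo:.2f}, {hi:.2f}]  {pct:.1f}% inside "
              f"(normal: {normal_pct}%)")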
4.19 From MegaStat:
empirical rule
mean - 1s -1.39
mean + 1s 10.35
percent in interval (68.26%) 87.7%
mean - 2s -7.27
mean + 2s 16.22
percent in interval (95.44%) 93.8%
mean - 3s -13.14
mean + 3s 22.09
percent in interval (99.73%) 96.9%
low extremes 0
low outliers 0
high outliers 4
high extremes 4
b. Yes, there are 8 outliers.
c. There are more observations in the mean ± 1s interval than the empirical rule would indicate, 87.7% vs.
68.26%, and fewer in the mean ± 2s interval, 93.8% vs. 95.44%. The data do not seem to come from a
normal distribution.
d. Yes, there are enough sample points to assess normality.
4.20 a.
1st quartile 22.50
Median 26.00
3rd quartile 33.00
midhinge 27.75
b.
c. The median number of customers is 26. Days with 22 or fewer customers are in the bottom quartile; days
with 33 or more customers are in the upper quartile. The midhinge, a measure of central tendency, is
27.75. The box plot displays the first and third quartiles and the median, and indicates that there are no
extreme values or outliers.
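The quartile statistics in 4.20 can be checked in Python; note that quartile conventions differ across packages (NumPy's default linear interpolation may differ slightly from MegaStat), and the data array is a hypothetical placeholder.

    import numpy as np

    data = np.array([26, 22, 33, 25, 28, 30, 21, 26, 35, 24])  # hypothetical
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    midhinge = (q1 + q3) / 2
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # box-plot outlier fences
    print(q1, median, q3, midhinge, fences)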
4.21 a.
1st quartile 1.00
3rd quartile 5.00
interquartile range 4.00
mode 1.00
median 2.00
low extremes 0
low outliers 0
high outliers 4
high extremes 4
midhinge 3
b. The median call length was 2 minutes. Calls lasting less than 1 minute were in the bottom 25%; calls
lasting more than 5 minutes were in the top 25%. The midhinge was a call length of 3 minutes. The
midhinge is a measure of central tendency and, like the median, is not influenced by extreme values.
c. The box plot confirms the quartile calculations and reveals that there are 6 calls of unusually long duration,
4 of them extreme in length.
4.22 a. Mean = 725. Median = 720. Mode = 730. SD = 114.3. Q1 = 662.5. Q3 = 755.
b. The typical student pays $725 per month.
c. The measures do tend to agree.
d. From Megastat:
empirical rule
mean - 1s 610.39
mean + 1s 838.95
percent in interval (68.26%) 70.0%
mean - 2s 496.10
mean + 2s 953.23
percent in interval (95.44%) 96.7%
mean - 3s 381.82
mean + 3s 1,067.51
percent in interval (99.73%) 100.0%
low extremes 0
low outliers 1
high outliers 3
high extremes 0
There are four outliers.
e. It is possible that the data are normally distributed based on the empirical rule results.
4.23 a. Mean = 66.2. Median = 48. Mode = 48. Midrange = 108.
b. The median is the best measure of central tendency for a data set that is skewed.
c. The typical number of pages in a mail order catalog is 48.
d. SD = 36.36.
e. From Megastat:
empirical rule
mean - 1s 29.74
mean + 1s 102.56
percent in interval (68.26%) 90.0%
mean - 2s -6.67
mean + 2s 138.97
percent in interval (95.44%) 95.0%
mean - 3s -43.08
mean + 3s 175.38
percent in interval (99.73%) 95.0%
low extremes 0
low outliers 0
high outliers 1
high extremes 1
There are two high outliers.
4.24 a.
b.
Mean 17.83
median 18.00
mode 17.00
geometric mean 17.11
midrange 20.00
c. The mean is the best measure of central tendency.
d. The typical ladder weighs 17 or 18 pounds.
4.25 a.
Most travelers buy their tickets between 1 and 21 days before their trip. Half of these buy the ticket about 2
weeks before.
b. Mean = 26.71. Median = 14.5. Mode = 11. Midrange = 124.5.
c. Q1 = 7.75, Q3 = 20.25, Midhinge = 14, and CQV = 44.64%.
d. The geometric mean is only valid for data greater than zero.
e. The median is the best measure of central tendency because the data is quantitative and heavily skewed
right.
4.26 The mode is the appropriate measure of central tendency because the data are categorical.
4.27 a. Stock funds: x̄ = 1.329, median = 1.22, mode = 0.99. Bond funds: x̄ = 0.875, median = 0.85, mode =
0.64.
b. The central tendency of stock fund expense ratios is higher than bond funds.
c. Stock funds: s = 0.5933, CV = 44.65%. Bond funds: s = 0.4489, CV = 51.32%. The stock funds have less
variability relative to the mean.
d. Stock funds: Q1 = 1.035, Q3 = 1.565, Midhinge = 1.3. Bond funds: Q1 = 0.64, Q3 = 0.99, Midhinge = 0.815.
Stock funds have higher expense ratios in general than bond funds.
4.28 a. Mean = 34.54. Median = 33.0. Mode = 23. Midrange = 42.
b. The mean or median would be an appropriate measure of central tendency because the data is fairly
symmetric.
c. SD = 10.31. CV = 29.85%.
d. The number of raisins in a box is based on weight, not quantity. The size of the raisins varies; therefore the
number will vary.
4.29 a. Mean = 52.15. Median = 48.5. Mode = 47.0. Midrange = 60.0.
b. Geometric mean = 50.92.
c. All measures of central tendency are fairly close therefore use the mean.
4.30 a. Most brands contain between 160 and 230 milligrams of sodium. Five brands are very low in sodium.
b. Mean = 179.9. Median = 195.0. Mode = 225. Midrange = 187.5.
c. The median and midrange are better measures than the mean and mode because the data appear to have
outliers.
d. The geometric mean cannot be used because there are data values equal to zero.
f. There are five low outliers and 1 high outlier.
4.31 a.
The dot plot shows that most of the data is centered around 6500 yards. The distribution is skewed to the left.
b. x̄ = 6,335.52, median = 6,400.0, mode = 6,500.0, and midrange = 6,361.5.
c. Best: the median, because the data are quantitative and skewed left. Worst: the mode, because the data are
quantitative and very few values repeat.
d. This data is not highly skewed. The geometric mean works well for skewed data.
4.32 a.
The dot plot shows a fairly uniform distribution.
b. Mean = 7.52. Median = 6.8. Mode = 9.0. Midrange = 9.25.
c. The mean or the median are the best measures of central tendency because the data is fairly symmetric. The
worst measure would be the mode because there are very few values that repeat.
4.33 a. Male: Midhinge = 177, CQV = 2.82%. Female: Midhinge = 163.5, CQV = 2.75%. These statistics are
appropriate because we have specific percentiles but not the entire data set.
c. Yes, height percentiles do change. The population is slowly increasing in height.
4.34 a. Mean = 95.1. Median = 90.0. There is no mode.
b. The median is the best measure of central tendency because the data set has two high outliers.
4.35 a. x̄ = 3,012.44, median = 2,550.5. There is no value for the mode.
b. The typical cricket club's income is approximately 2.5 million.
4.36 The coefficient of variation for the plumbing supplier's vinyl washers is 6053/24212 = 25%. The coefficient
of variation for steam boilers is 1.7/6.8 = 25%. The demand patterns exhibit similar relative variation, even
though the standard deviations are very different.
4.37 The coefficient of variation for the lab mouse = 0.9/18 = 5%. The coefficient of variation for the lab rat =
20/300 = 6.67%. The rat has more relative variation, based on its higher coefficient of variation; the
weights of lab mice vary less around their mean.
4.38 a. See table below for CV values.
Comparative Returns on Four Types of Investments
Investment Mean Return Standard Deviation Coefficient of Variation
Venture funds (adjusted) 19.2 14.0 72.92%
All common stocks 15.6 14.0 89.74%
Real estate 11.5 16.8 146.09%
Federal short-term paper 6.7 1.9 28.36%
b. The standard deviations are an absolute not relative measure of dispersion. It is best to use the CV when
comparing across variables that have different means.
c. The risk and returns are captured by the CV. Federal short term paper has the lowest CV and hence lowest
risk, real estate the greatest risk. Venture funds have lower risk and greater return than common stocks
based on the CV.
4.39 a. CV Tuition Plans = 100*2.7%/6.3% = 42.86%. CV SP500 = 100*15.8%/12.9% = 122.48%.
b. We use the CV to compare return and risk. The standard deviation measures dispersion in the units of the
variable; by itself it cannot be used to compare distributions with different means.
c. The tuition plans have lower returns than the SP 500, but less risk as measured by the CV. This is not
surprising since the goal of a tuition plan is to ensure that a minimum amount of money is available at the
time the plan matures, thus parents and students are willing to take a lower return in exchange for lower
risk.
4.40 a. Midrange = (180+60)/2 = 120.
b. Assuming normality is important so that we can estimate the mean with the midrange.
c. Caffeine levels in brewed coffee are dependent on many factors including brand of coffee, grind of coffee
beans, and brew time. It is likely that the distribution is skewed to the right.
4.41 a. Midrange = (.92 + .79)/2 = .855.
b. A normal distribution is plausible here because there are likely to be controls on the level of chlorine added
to the water. There will be some variation around the mean but it will be predictable.
4.42 a. The distribution should be skewed to the right because the mean is greater than the median.
b. Most ATM transactions will tend to be low in value, but a few will be very large.
4.43 a. The distribution should be skewed to the right because the mean is greater than the median.
b. Most patrons keep books out for a week or so. There will be a few patrons that keep a book out much
longer.
4.44 a. The distribution should be skewed to the left because the mean is less than the median.
b. It appears that most students scored a C or higher but there were a few students that may not have studied
for the exam.
4.45 a. The distribution of the number of DVDs owned by a family would likely be skewed to the right. Most
families will own a fairly small number of DVDs but a few families will own many.
b. Mean > median > mode.
4.46 a. The histogram should show a symmetrical distribution.
b. Answers will vary.
4.47 a. One would expect the mean to be close in value to the median, or slightly higher.
b. In general, the life span would have a normal distribution. If skewed, the distribution is more likely skewed
right than left. Life span is bounded below by zero but is unbounded in the positive direction.
4.48 a. The mean would be greater than the median. There are likely to be a few waiting times that are extremely
long.
b. If someone dies while waiting for a transplant that value should not be included in the mean or median
calculation.
4.49 a. It is the midrange, not the median.
b. The midrange is influenced by outliers, and salaries tend to be skewed to the right. The community should
base charges on the median.
4.50 a. The distribution would be skewed right.
b. Switching from the mean to the median would trigger a penalty sooner because the median is less than the
mean.
c. The union would oppose this change because they would probably have to pay more penalties.
4.51 a. and c.
Week 1 Week 2 Week 3 Week 4
mean 50.00 50.00 50.00 50.00
sample standard deviation 10.61 10.61 10.61 10.61
median 50.00 52.00 56.00 47.00
b. Based on the mean and standard deviation it appears that the distributions are the same.
d.


e. Based on the medians and dotplots, the distributions are quite different.
4.52 Results will vary by student.
4.53 a. For 1990:
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
1 2 39 1.5 58.5 240.1898
2 3 35 2.5 87.5 76.83767
3 4 27 3.5 94.5 6.264302
4 5 26 4.5 117 6.98517
5 6 24 5.5 132 55.32743
6 7 30 6.5 195 190.2588
7 8 9 7.5 67.5 111.4075
8 9 1 8.5 8.5 20.41526
Total 191
Average = 3.981675
Standard Deviation = 1.92993846
CV = 0.484705
For 2000:
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
1 2 54 1.5 81 210.3826
2 3 44 2.5 110 41.72649
3 4 23 3.5 80.5 0.015762
4 5 22 4.5 99 23.16691
5 6 26 5.5 143 106.7403
6 7 16 6.5 104 146.5241
7 8 5 7.5 37.5 81.05055
8 9 1 8.5 8.5 25.26247
Total 191
Average = 3.473822
Standard Deviation = 1.82795415
CV = 0.526208
b. The average fertility rate is approximately 4 children per woman with a standard deviation of 1.9. The
central tendency and dispersion have decreased slightly from 1990 to 2000.
c. We could more easily have seen which countries are at the high end of the distribution.
d. A frequency table makes it easier to see the distribution and to create a histogram.
4.54 a.
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
119 120 1 119.5 119.5 12.29671111
120 121 5 120.5 602.5 31.41688889
121 122 16 121.5 1944 36.32071111
122 123 22 122.5 2695 5.647644444
123 124 12 123.5 1482 2.920533333
124 125 9 124.5 1120.5 20.0704
125 126 5 125.5 627.5 31.08355556
126 127 3 126.5 379.5 36.61013333
127 128 2 127.5 255 40.38008889
Total 75
Average = 123.0067
Standard Deviation = 1.71143478
CV = 0.01391335
b. The average winning time is 123.0 seconds and the standard deviation of winning time is 1.71 seconds. The
distribution may be slightly skewed right.
c. The raw data would show us the years that the winning times were much longer than the average.
d. Because the overall distribution on time is slightly skewed right it is possible that the times within an
interval are also skewed right. We give equal weight to the midpoints of each interval and our estimate of
the mean could be too high.
4.55 a.
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
0 250 2 125 250 1396836.735
250 500 5 375 1875 1715306.122
500 1000 14 750 10500 621607.1429
1000 2000 14 1500 21000 4071607.143
Total 35
Average = 960.7143
Standard Deviation = 479.133935
CV = 0.498726773
b. The average number of degree days is 960 and the standard deviation is 480.
c. The raw data would have been useful for creating a histogram to see the shape of the distribution.
d. Equal intervals of 250 might have spread the data out too far resulting in classes with frequency of zero.
Yes, it does affect the calculations.
4.56 a.
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
20 30 1 25 25 1522.529796
30 40 9 35 315 7579.238754
40 50 20 45 900 7234.90965
50 60 17 55 935 1383.006536
60 70 36 65 2340 34.60207612
70 80 67 75 5025 8078.123799
80 90 3 85 255 1320.530565
Total 153
Average = 64.01961
Standard Deviation = 13.3655442
CV = 0.208772665
b. The average life expectancy is approximately 64 years with a standard deviation of 13.4 years. The
distribution appears to be skewed to the left.
c. The raw data would allow us to see those countries that have a very small life expectancy.
d. A frequency table makes it easier to see the distribution and to create a histogram.
4.57 a.
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
40 50 12 45 540 2771.049596
50 60 116 55 6380 3131.910804
60 80 74 70 5180 7112.648981
80 100 2 90 180 1776.547482
Total 204
Average = 60.19608
Standard Deviation = 8.53626193
CV = 0.141807609
b. No, the unequal class sizes don't hamper the calculations. Class sizes are unequal to ensure that no class
has a zero frequency.
4.58 a.
From To frequency (F) midpoint (M) F*M F*(M-xbar)^2
0 3 23 1.5 34.5 3697.366199
3 6 46 4.5 207 4309.350046
6 10 37 8 296 1412.625655
10 20 44 15 660 29.66347078
20 30 31 25 775 3629.967891
30 50 23 40 920 15334.7461
Total 204
Average = 14.17892
Standard Deviation = 11.8308521
CV = 0.834397173
b. The unequal class sizes don't hamper the calculations. Unequal class sizes can be used when a distribution
is strongly skewed to avoid classes with zero frequencies.
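The grouped-data formulas used throughout 4.53-4.58 are straightforward to script; this sketch recomputes 4.58 from its class limits and frequencies using midpoints m = (lower + upper)/2, mean = Σfm/n, and variance = Σf(m − x̄)²/(n − 1).

    # (lower, upper, frequency) for the 4.58 classes
    classes = [(0, 3, 23), (3, 6, 46), (6, 10, 37),
               (10, 20, 44), (20, 30, 31), (30, 50, 23)]
    n = sum(f for _, _, f in classes)
    xbar = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    var = sum(f * ((lo + hi) / 2 - xbar) ** 2 for lo, hi, f in classes) / (n - 1)
    print(f"n={n}, mean={xbar:.2f}, sd={var**0.5:.2f}, CV={var**0.5/xbar:.3f}")
    # n=204, mean=14.18, sd=11.83, CV=0.834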
4.59 a. We can find the median class because we know the frequency within each class.
b. Unequal class sizes can be used when a distribution is strongly skewed to avoid classes with zero
frequencies.
4.60 a. We can find the median class because we know the frequency within each class.
b. Unequal class sizes can be used when a distribution is strongly skewed to avoid classes with zero
frequencies.
Chapter 5
Probability
5.1 a. S = {(V,B), (V,E), (V,O), (M,B), (M,E), (M,O), (A,B), (A,E), (A,O)}
b. Events are not equally likely. Borders probably carries more books than other merchandise.
5.2 a. S = {(S,L), (S,T), (S,B), (P,L), (P,T), (P,B), (C,L), (C,T), (C,B)}
b. There are different likelihoods of risk levels among the 3 types of business forms; therefore the different
elementary events will have different likelihoods.
5.3 a. S = {(L,B), (L,B), (R,B), (R,B)}
b. Events are not equally likely. There are more right handed people than left handed people.
5.4 a. S ={(1,H), (2,H), (3,H), (4,H), (5,H), (6,H), (1,T), (2,T), (3,T), (4,T), (5,T), (6,T)}
b. Yes, assuming that we have a fair die and fair coin.
5.5 a. Opinion of experienced stock brokers or empirical.
b. From historical data of IPOs or based on judgments.
5.6 a. Subjective
b. Opinion of a group of telecommunication stock brokers.
5.7 a. Empirical
b. Historical data of past launches.
5.8 a. Classical
b. There are 36 different outcomes from rolling two dice, and there are 6 ways to roll a 7, so P(7) = 6/36 = .167.
5.9 a. P(A ∪ B) = .4 + .5 − .05 = .85.
b. P(A | B) = .05/.50 = .10.
c. P(B | A) = .05/.4 = .125.
d.
5.10 a. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .7 + .3 − 0 = 1.0.
b. P(A | B) = P(A ∩ B)/P(B) = .00/.30 = .00.
c. The intersection is an empty set because P(A ∩ B) = 0.
5.11 a. P(S) = .217.
b. P(S′) = 1 − .217 = .783.
c. Odds for S: .217/.783 = .277 to 1.
d. Odds against S: .783/.217 = 3.61 to 1.
5.12 a. .017/.983 = .0173 to 1.
b. .983/.017 = 57.8 to 1.
5.13 a. X = 1 if the drug is approved, 0 otherwise.
b. X = 1 if batter gets a hit, 0 otherwise.
c. X = 1 if breast cancer detected, 0 otherwise.
5.14 a. (admitted unconditionally, admitted conditionally, not admitted)
b. (completed pass, incomplete pass, intercepted pass)
c. (deposit, withdrawal, bill payment, funds transfer)
5.15 a. P(S′) = 1 − .246 = .754. There is a 75.4% chance that a female aged 18-24 is a nonsmoker.
b. P(S ∪ C) = .246 + .830 − .232 = .844. There is an 84.4% chance that a female aged 18-24 is a smoker or is
Caucasian.
c. P(S | C) = .232/.830 = .2795. Given that a female aged 18-24 is Caucasian, there is a 27.95% chance that
she is a smoker.
d. P(S ∩ C′) = P(S) − P(S ∩ C) = .246 − .232 = .014. P(S | C′) = .014/.17 = .0824. Given that a female aged
18-24 is not Caucasian, there is an 8.24% chance that she smokes.
5.16 P(A ∩ B) = P(A) × P(B) = .40 × .50 = .20.
5.17 a. P(A | B) = P(A ∩ B)/P(B) = .05/.50 = .10.
b. No, A and B are not independent because P(A | B) ≠ P(A).
5.18 a. P(A) × P(B) = .40 × .60 = .24 and P(A ∩ B) = .24; therefore A and B are independent.
b. P(A) × P(B) = .90 × .20 = .18 and P(A ∩ B) = .18; therefore A and B are independent.
c. P(A) × P(B) = .50 × .70 = .35 and P(A ∩ B) = .25; therefore A and B are not independent.
5.19 a. P(V ∪ M) = .70 + .60 − .50 = .80.
b. P(V ∩ M) ≠ P(V) × P(M); therefore V and M are not independent.
5.20 a. There is a 25% chance that a clock will not ring (a failure, F). Both clocks would have to fail for him to
oversleep. Assuming independence: P(F1 ∩ F2) = P(F1) × P(F2) = .25 × .25 = .0625.
b. The probability that at least one of the three clocks rings is 1 − P(F1)P(F2)P(F3) = 1 − (.25)(.25)(.25) =
.9844, which is less than 99%.
5.21 Five nines reliability means P(not failing) = .99999. For the power system, P(not failing) = 1 − (.05)³ =
.999875. The system does not meet the test.
5.22 a. P(A) = 100/200 = .50. There is a 50% chance that a student is an accounting major.
b. P(M) = 102/200 = .51. There is a 51% chance that a student is male.
c. P(A ∩ M) = 56/200 = .28. There is a 28% chance that a student is a male accounting major.
d. P(F ∩ S) = 24/200 = .12. There is a 12% chance that a student is a female statistics major.
e. P(A | M) = 56/102 = .549. There is a 54.9% chance that a male student is an accounting major.
f. P(A | F) = P(F ∩ A)/P(F) = (44/200)/(98/200) = .4489. There is a 44.89% chance that a female student is
an accounting major.
g. P(F | S) = P(F ∩ S)/P(S) = (24/200)/(40/200) = .60. There is a 60% chance that a statistics student is
female.
h. P(E ∪ F) = P(E) + P(F) − P(E ∩ F) = 60/200 + 98/200 − 30/200 = 128/200 = .64. There is a 64% chance
that a student is an economics major or a female.
5.23 Gender and major are not independent. For example, P(A ∩ F) = .22, while P(A)P(F) = .245. Because the
values are not equal, the events are not independent.
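The independence check in 5.23 amounts to comparing P(A ∩ F) with P(A)P(F); a minimal Python sketch using the counts from the 5.22 table:

    n = 200
    p_a_and_f = 44 / n            # female accounting majors
    p_a, p_f = 100 / n, 98 / n    # accounting majors; females
    print(p_a_and_f, p_a * p_f)   # 0.22 vs 0.245 -> not independent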
5.24 a. P(D3) = 15/38 = .3947.
b. P(Y3) = 15/38 = .3947.
c. P(Y3 | D1) = P(Y3 ∩ D1)/P(D1) = (2/38)/(11/38) = .1818.
d. P(D1 | Y3) = P(Y3 ∩ D1)/P(Y3) = (2/38)/(15/38) = .1333.
5.25 Tree probabilities: P(S1) = .8 (takes a shopping cart) and P(S2) = .2 (does not take a shopping cart). The
conditional probabilities for the three payment types are P(C1 | S1) = .7, P(C2 | S1) = .2, P(C3 | S1) = .1 and
P(C1 | S2) = .5, P(C2 | S2) = .4, P(C3 | S2) = .1; each set of branches sums to 1.00.
Joint probabilities:
P(C1 ∩ S1) = .8 × .7 = .56
P(C2 ∩ S1) = .8 × .2 = .16
P(C3 ∩ S1) = .8 × .1 = .08
P(C1 ∩ S2) = .2 × .5 = .10
P(C2 ∩ S2) = .2 × .4 = .08
P(C3 ∩ S2) = .2 × .1 = .02
The six joint probabilities sum to 1.00.
5.26
5.27 Let A = using the drug, with P(A) = .04 and P(A′) = .96. Let T be a positive test result. False positive:
P(T | A′) = .05. False negative: P(T′ | A) = .10, so P(T | A) = 1 − .10 = .90. P(T) = (.90)(.04) + (.05)(.96) =
.084. P(A | T) = (.90)(.04)/.084 = .4286.
5.28 P(A) = .5, P(B) = .5, P(D) = .04, P(D′) = .96, P(D | A) = .06. P(A | D) = P(D ∩ A)/P(D) =
P(D | A)·P(A)/P(D) = (.06 × .5)/.04 = .75.
5.29 Let W = suitcase contains a weapon, with P(W) = .001 and P(W′) = .999. Let A be the alarm triggering.
False positive: P(A | W′) = .02. False negative: P(A′ | W) = .02, so P(A | W) = 1 − .02 = .98. P(A) =
(.001)(.98) + (.02)(.999) = .02096. P(W | A) = (.98)(.001)/.02096 = .04676.
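Both screening problems (5.27 and 5.29) use the same Bayes' theorem structure; here it is for 5.29 in a short Python sketch.

    # P(W|A) = P(A|W)P(W) / [P(A|W)P(W) + P(A|W')P(W')]
    p_w = 0.001               # prior: suitcase contains a weapon
    p_a_given_w = 0.98        # 1 - false negative rate
    p_a_given_not_w = 0.02    # false positive rate
    p_a = p_a_given_w * p_w + p_a_given_not_w * (1 - p_w)
    print(p_a, p_a_given_w * p_w / p_a)   # 0.02096, 0.04676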
5.30 a. 26 × 26 × 10 × 10 × 10 × 10 = 6,760,000 unique IDs.
b. No, that only yields 26,000 unique IDs.
c. As growth occurs over time, you would not ever have to worry about a duplicate id nor have to generate
new ones.
5.31 a. 10⁶ = 1,000,000.
b. 10⁵ = 100,000.
c. 10⁶ = 1,000,000.
5.32 a. nPr = n!/(n − r)! with n = 4, r = 4 gives 4! = 24 ways.
b. ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD,
CADB, CBAD, CBDA, CDBA, CDAB, DABC, DACB, DBAC, DBCA, DCAB, DCBA
5.33 a. 7! = 5,040 ways.
b. No, too many!
5.34 a. n = 8 and r = 3: 336.
b. n = 8 and r = 5: 6,720.
c. n = 8 and r = 1: 8.
d. n = 8 and r = 8: 40,320.
5.35 a. 8C3 = 56.
b. 8C5 = 56.
c. 8C1 = 8.
d. 8C8 = 1.
5.36 a. nPr = n!/(n − r)! with n = 10, r = 4 gives 5,040.
b. nCr = n!/(r!(n − r)!) with n = 10, r = 4 gives 210.
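Python 3.8+ provides both counting rules directly in the standard library, which makes it easy to verify 5.32-5.36:

    import math

    print(math.perm(4, 4))     # 24    (5.32a)
    print(math.perm(10, 4))    # 5040  (5.36a)
    print(math.comb(10, 4))    # 210   (5.36b)
    print([math.comb(8, r) for r in (3, 5, 1, 8)])   # [56, 56, 8, 1] (5.35)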
5.40 a. Sample Space is made up of pairs denoted by x. There are 21 combinations.
Brown Yellow Red Blue Orange Green
Brown x
Yellow x x
Red x x x
Blue x x x x
Orange x x x x x
Green x x x x x x
b. P(BR)*P(BR) = .13*.13 = .0169.
c. P(BL)*P(BL) = .24*.24 = .0576.
d. P(G)*P(G) = .16*.16 = .0256.
e. P(BR)*P(G) = .13*.16 = .0208.
5.41 a. An empirical probability using response frequencies from the survey.
b. Odds for failure: .44/.56 = .786.
5.42 a. Subjective
b. P(loss) = 0.01408
5.43 No; the law of large numbers says that the larger the sample, the closer the sample results will be to the true
value. If Tom Brookens increases his times at bat, he'll get closer and closer to his true batting average,
which is probably close to .176.
5.44 a. Subjective
b. Bob probably based this estimate on his swimming ability and success of others who have completed this
feat.
5.45 a. Empirical or subjective
b. Most likely estimated by interviewing ER doctors.
c. The sample could have been small and may not have been a representative sample of all doctors.
5.46 a. Empirical
b. From a sample of births in the U.S.
c. It depends on the size of the sample and how closely the sample represented the US as a whole.
5.47 a. Subjective
b. Simulated experiment using a computer model.
c. The estimate is probably not very accurate. Results are highly dependent on the computer simulation and
the data put into the model.
5.48 a. 114/108 = .8814
b. Empirical
5.49 a. Empirical or subjective
b. Observation or survey
c. The estimate is probably not very accurate. Observation is difficult and survey results may be biased.
5.50 a. .227/.773 = .29 to 1 in favor of using a debit card.
b. .773/.227 = 3.4 to 1 against using a debit card.
5.51 Let H = child has high cholesterol and N = child has normal cholesterol, with P(H) = .1 and P(N) = .9
independently for each child. The tree yields eight joint probabilities:
P(H1H2H3) = (.1)(.1)(.1) = .001
P(H1H2N3) = (.1)(.1)(.9) = .009
P(H1N2H3) = (.1)(.9)(.1) = .009
P(H1N2N3) = (.1)(.9)(.9) = .081
P(N1H2H3) = (.9)(.1)(.1) = .009
P(N1H2N3) = (.9)(.1)(.9) = .081
P(N1N2H3) = (.9)(.9)(.1) = .081
P(N1N2N3) = (.9)(.9)(.9) = .729
5.52 a.
5.53 Odds against an Acura Integra being stolen = .987/.013 = 76 to 1.
5.54 a. .33/(1 − .33) = .493 to 1 in favor of being killed.
b. .67/.33 = 2.03 to 1 against being killed.
5.55 P(Detroit Wins) = 50/51 = .9804. P(New Jersey Wins) = 5/6 = .8333.
5.56 a. 2⁹ = 512 separate codes.
b. 2¹⁰ = 1,024 separate codes.
c. (1/512) × 1000 = 1.95 times (approximately 2), or (1/1024) × 1000 = 0.977 (approximately 1). We assume
that each combination is selected independently.
5.57 a. 26^3 * 10^3 = 17,576,000.
b. 36^6 = 2,176,782,336.
c. 0 and 1 might be disallowed since they are similar in appearance to letters like O and I.
d. Yes, 2.1 billion unique plates should be enough.
e. 34^6 = 1,544,804,416.
5.58 Suppose the correct order for the meals is ABC. The possible incorrect orders are: ACB (1 correct meal), BAC (1 correct meal), BCA (0 correct meals), CAB (0 correct meals), and CBA (1 correct meal).
a. P(No diner gets the correct meal) = 1/3
b. P(Exactly one diner gets the correct meal) = 1/2
c. P(Exactly two diners get the correct meal) = 0
d. P(All three diners get the correct meal) = 1/6
5.59 7P3 = 210.
5.60 a. The first 4 cards are aces: (4/52)(3/51)(2/50)(1/49) = 3.694E-06.
b. Any four cards in the hand are aces. The sample space includes: AAAAN, AAANA, AANAA, ANAAA,
NAAAA. Each of these sample spaces has the same probability as given in part a; therefore, the
probability that any four of the five cards are aces is 5*(3.694E-06) = 1.847E-05.
5.61 a. P(Two aces) = (4/52)(3/51) = 0.00452.
b. P(Two red cards) = (26/52)(25/51) = 0.245098.
c. P(Two red aces) = (2/52)(1/51) = 0.000754.
d. P(Two honor cards) = (20/52)(19/51) = 0.143288.
5.62 Let F denote a failure and S denote a non-failure. Use the multiplicative rule of probability for independent
events:
a. P(F1 F2) = P(F1)*P(F2) = .02*.02 = .0004.
b. P(S1 S2) = P(S1)*P(S2) = .98*.98 = .9604.
c. The sample space for all events is: F1F2, S1F2, F1S2, S1S2. There is only one event that does not contain at least one failure, S1S2. The probability that one or the other will fail is 1 - P(S1S2) = 1 - .9604 = .0396.
5.63 No, P(A)P(B) ≠ .05.
5.64 P(B1 B2) = P(B1)*P(B2) = .5*.5 = .25.
5.65 a. Having an independent back up power system for the computers might have eliminated the delayed flights.
b. If the cost due to delayed/cancelled flights, weighted by the risk of a power outage, is greater than
$100,000 then the airline can justify the expenditure.
5.66 a. Independent
b. These are typically considered dependent. Insurance rates are higher for most men because they are
involved in more accidents.
c. Dependent, most calls are during regular business hours when the office is open.
5.67 Assuming independence, P(3 cases won out of next 3) = .7^3 = .343.
5.68 a. If p is the probability of failure, we can set p^k = 0.00001, plug in p = 0.01, take the log of both sides, and solve for k. In this case, k = 2.50, which rounds up to the next higher integer, so 3 are required.
b. For p = .10, k = 5, so 5 servers are required.
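Note: In Excel, for example, =LOG(0.00001)/LOG(0.01) returns 2.5 for part a, and =LOG(0.00001)/LOG(0.1) returns 5 for part b.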
5.69 Assuming independence, P(4 adults say yes) = .56^4 = 0.0983.
5.70 a. P(fatal accident over a lifetime) = 1 - P(no fatal accident over a lifetime) = 1 - (3,999,999/4,000,000)^50,000 = .012422.
b. The probability of an accident each time you get behind the wheel is so small that an individual might take
the risk.
5.71 See the Excel Spreadsheet in Learning Stats: 05-13 Birthday Problem.xls.
For 2 riders: P(no match) = .9973.
For 10 riders: P(no match) = 0.8831.
For 20 riders: P(no match) = 0.5886.
For 50 riders: P(no match) = 0.0296.
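Note: One way to check these values without the spreadsheet is Excel's counting functions, since P(no match) = =PERMUT(365,n)/365^n; for example =PERMUT(365,20)/365^20 returns 0.5886.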
5.72 See the Excel Spreadsheet in Learning Stats: 05-13 Birthday Problem.xls.
If there are 23 riders, P(match) = .50730.
If there are 32 riders, P(match) = .75.
5.73 a. i. .4825. The probability of seeing a car in a shopping mall parking lot is .4825.
ii. .25. The probability of seeing a vehicle in the Great Lakes shopping mall is .25.
iii. .115. The probability of seeing a parked truck in a shopping mall is .115.
iv. .19. The probability of seeing a parked SUV at the Somerset mall is .19.
v. .64. The probability of seeing a parked car at the Jamestown mall is .64.
vi. .3316. The probability that a parked car is at the Jamestown mall is .3316.
vii. .09. The probability that a parked vehicle is a car and is at the Great Lakes mall is .09.
viii. .015. The probability a parked vehicle is a truck and is at the Oakland mall is .015.
ix. .0325. The probability that a parked vehicle is a Minivan and is at the Jamestown mall is .0325.
b. Yes, the vehicle type and mall location are dependent. For example, P(T)*P(O) = (.115)(.25) = .02875 while P(T and O) = .015. Because .02875 ≠ .015, the events are dependent.
5.74 a. i. P(S) = 320/1000 = .32. The likelihood of a male 18-24 smoking is .32.
ii. P(W) =850/1000 = .85. The likelihood of a male 18-24 being white is .85.
iii. P(S | W) = P(S and W)/P(W) = .29/.85 = .3412. The likelihood of a white male 18-24 being a smoker is .3412.
iv. P(S | B) = P(S and B)/ P(B) = (30/1000)/(150/1000) = .200. The likelihood of a black male 18-24 being a
smoker is .20.
v. P(S and W) = 290/1000 = .290. The likelihood of a male 18-24 being a smoker and being white is .290.
vi. P(N and B) = 120/1000 = .12 The likelihood of a male 18-24 not smoking and being black is .12.
b. P(S and W) = .29 and P(S)*P(W) = .32*.85 = .272. P(S and B) = .030 and P(S)*P(B) = .32*.15 = .048. Yes, the smoking rates suggest that race and smoking are dependent.
d. If smoking is dependent on race, then health officials might target or design special programs based on
race.
5.75 a. i. .5588. The probability that the forecasters predicted a decline in interest rates is .5588.
ii. .5294. The probability there was a rise in interest rates is .5294.
iii. .3684. Given that the forecasters predicted a decline in interest rates, the probability that there was an
actual decline is .3684.
iv. .4. Given that the forecasters predicted an increase in interest rates, the probability that there was an actual
increase is .4.
v. .1765. The probability that in a given year there was both a forecasted increase and actual increase in
interest rates is .1765.
vi. .2059. The probability that in a given year there was both a forecasted decline and actual decline in interest
rates is .2059.
b. No, P(A) = .4705 and P(A | F) = .3684. Interest rates moved down 47% of the time, yet the forecasters' predictions of a decline were accurate only 37% of the time.
5.76 a. i. P(B) = 25/61 = .4098. The likelihood of a climb is .4098.
ii. P(L) = 14/61= .2295. The likelihood of a low noise level is .2295.
iii. P(H) = 18/61 =.2951. The likelihood of a high noise level is .2951.
iv. P(H | C) = 3/8 = .3750. The likelihood of a high noise given you are in the cruise phase is .3750.
v. P(H | D) = 14/28 = .5. The likelihood of a high noise given you are in the descent phase is .5.
vi. P(D | L) = 6/14 = .4286. The likelihood of being in the descent phase given you experience a low noise
is .4286.
vii. P(L and B) = .0984. The likelihood of both a low noise and climbing is .0984.
viii. P(L and C)= .0328. The likelihood of both a low noise and cruising is .0328.
ix. P(H and C) = .0492. The likelihood of both a high noise and cruising is .0492.
b. Flight noise is dependent on flight phase. For example: P(H) = .2951 and P(H|C) = .375. If independent the
two probabilities would be the same.
5.77 a. i. .7403 ii. .1244 iii. .1385 iv. .6205 v. .7098 vi. .9197 vii. .1485 viii. .1274 ix. .0042.
b. Yes, the probability of being a nonsmoker increases with level of education.
5.78 a. P(L) =.1321.
b. P(C) = .2075.
c. P(M) =.3208.
d. P(F) = .3019.
e. P(C | F) = 15/48 = .3125.
f. P(C | M) = P(C and M)/P(M) = .0692/.3208 = .2157 (i.e., 11/51).
5.79
Cancer No Cancer Totals
Positive Test 4 500 504
Negative Test 0 9496 9496
Totals 4 9996 10000
P(Cancer | Positive Test) = 4/504 = 0.00794.
5.80
Authorized Not Auth Totals
Denied Access 19,000 999,999 1,018,999
Allowed Access 18,981,000 1 18,981,001
Totals 19,000,000 1,000,000 20,000,000
P(Authorized | Denied Access) = 19,000/1,018,999 = .01865.
5.81
Accident No Accident Totals
Right-Handed 3240 5760 9000
Left-Handed 520 480 1000
Totals 3760 6240 10000
P(Left-Handed | Accident) = 520/3760 = .1383.
Chapter 6
Discrete Distributions
6.1 A is a probability distribution, since the sum of P(x) is 1 and all probabilities are nonnegative, while B and C
are not probability distributions since the sum of P(x) is .95 for B and 1.30 for C.
6.2 E(X) = 70, V(X) = 100, σ = 10. Distribution is skewed to the right. The worksheet is:
x P(x) xP(x) x-E(X) P(x)[x-E(X)]²
60 0.40 24.00 -10.00 40.00
70 0.30 21.00 0.00 0.00
80 0.20 16.00 10.00 20.00
90 0.10 9.00 20.00 40.00
Total 1.00 70.00 100.00
6.3 E(X) = 2.25, V(X) = 1.6875, σ = 1.299. Distribution is skewed to the right.
x P(x) xP(x) x-E(X) P(x)[x-E(X)]²
0 0.05 0.00 -2.25 0.25
1 0.30 0.30 -1.25 0.47
2 0.25 0.50 -0.25 0.02
3 0.20 0.60 0.75 0.11
4 0.15 0.60 1.75 0.46
5 0.05 0.25 2.75 0.38
Total 1.00 2.25 1.6875
6.4 E(X) = (215,000)(.00000884)+(0)(.99999116) = 1.9006.
6.5 Expected payout = E(X) = 1000(.01) + (0)(.99) = $10, so the company adds $25 and charges $35.
6.6 The expected winning is E(X) = 28,000,000(.000000023) + 0(.999999977) = $0.644. Since the cost of the ticket is $1.00, the ticket's expected net value is -$0.356.
6.7 Expected Loss = 250(.3) + 950(.3) + 0(.4) = $360 million.
6.8 Using a = 0000 and b = 9999, μ = (0 + 9999)/2 = 4999.5 and σ = √([(9999 - 0 + 1)² - 1]/12) = 2886.75.
6.9 a. With a = 20 and b = 60, μ = (20 + 60)/2 = 40 and σ = √([(60 - 20 + 1)² - 1]/12) = 11.83.
b. P(X < 40) = (1/40)(20) = .50 and P(X > 30) = (1/40)(30) = .75.
6.10 Using a = 1 and b = 500,000, μ = (1 + 500000)/2 = 250000.5 and σ = √([(500000 - 1 + 1)² - 1]/12) = 144337.6.
6.11 a. With a = 1 and b = 31, μ = (1 + 31)/2 = 16 and σ = √([(31 - 1 + 1)² - 1]/12) = 8.944.
b. Yes, if conception is random within each month.
6.12 a. μ = 1.5, σ = .50; =1+INT(2*RAND())
b. μ = 3.0, σ = 1.414; =1+INT(5*RAND())
c. μ = 49.5, σ = 28.87; =0+INT(100*RAND())
d. Answers will vary.
6.13 Answers may vary and 0 and 1 are interchangeable.
a. 1 = correct, 0 = incorrect
b. 1= insured, 0 = uninsured
c. 1 = busy, 0 = not busy
d. 1 = lost weight, 0 = no weight loss
6.14 a. π = .5 (desirable)
b. π = .5 (desirable)
c. π = .8 (undesirable)
d. π = .5 (desirable)
6.15 a. μ = (8)(.1) = 0.8, σ = √((8)(.1)(1 - .1)) = 0.8485
b. μ = (10)(.4) = 4, σ = √((10)(.4)(1 - .4)) = 1.5492
c. μ = (12)(.5) = 6, σ = √((12)(.5)(1 - .5)) = 1.7321
d. μ = (30)(.9) = 27, σ = √((30)(.9)(1 - .9)) = 1.6432
e. μ = (80)(.7) = 56, σ = √((80)(.7)(1 - .7)) = 4.0988
f. μ = (20)(.8) = 16, σ = √((20)(.8)(1 - .8)) = 1.7889
6.16 a. P(X = 2) = .1488
b. P(X = 1) = .0403
c. P(X = 3) = .0015
d. P(X = 5) = 0.0074
6.17 a. P(X ≤ 3) = .9437
b. P(X > 7) = 1 - P(X ≤ 6) = .1719
c. P(X < 3) = P(X ≤ 2) = .0705
d. P(X ≥ 10) = .00417
6.18 a. P(X < 4) = P(X ≤ 3) = .9744
b. P(X ≥ 3) = 1 - P(X ≤ 2) = .5801
c. P(X ≤ 9) = .7207
d. P(X > 10) = 1 - P(X ≤ 10) = .9183
6.19 a. P(X = 0) = .10737
b. P(X ≥ 2) = 1 - P(X ≤ 1) = .62419
c. P(X < 3) = P(X ≤ 2) = .6778
d. μ = nπ = (10)(.2) = 2
e. σ = √((10)(.2)(1 - .2)) = 1.2649
f. See below.
g. Skewed to the right.
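Note: These binomial values can be reproduced in Excel, for example =1-BINOMDIST(1,10,0.2,TRUE) returns .62419 for part b.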
6.20 a. P(X = 0) = .54036
b. P(X = 1)= .34128
c. P(X = 2) = .09879
d. P(X 2) = .98043
e. See below, skewed to the right.
6.21 a. P(X = 10) = .00098
b. P(X ≥ 5) = 1 - P(X ≤ 4) = .62305
c. P(X < 3) = P(X ≤ 2) = .05469
d. P(X ≤ 6) = .82813
6.22 a. P(X = 8) = .016796
b. P(X ≥ 5) = 1 - P(X ≤ 4) = .59409
c. P(X ≥ 5) = 1 - P(X ≤ 4) = .59409
d. μ = nπ = (8)(.6) = 4.8 and σ = √((8)(.6)(1 - .6)) = 1.386
e. It is almost symmetric (slightly left-skewed).
6.23 a. λ = 1, μ = 1.0 and σ = 1
b. λ = 2, μ = 2.0 and σ = 1.414
c. λ = 4, μ = 4.0 and σ = 2.0
d. λ = 9, μ = 9.0 and σ = 3
e. λ = 12, μ = 12.0 and σ = 3.464
6.24 a. λ = 0.1, P(X = 2) = .00452
b. λ = 2.2, P(X = 1) = .24377
c. λ = 1.6, P(X = 3) = .13783
d. λ = 4.0, P(X = 6) = .10420
e. λ = 12.0, P(X = 10) = .10484
6.25 a. λ = 4.3, P(X ≤ 3) = .37715
b. λ = 5.2, P(X > 7) = .15508
c. λ = 2.7, P(X < 3) = .49362
d. λ = 11.0, P(X ≤ 10) = .45989
6.26 a. λ = 5.8, P(X < 4) = P(X ≤ 3) = .16996
b. λ = 4.8, P(X ≥ 3) = 1 - P(X ≤ 2) = 1 - .14254 = .85746
c. λ = 7.0, P(X ≤ 9) = .83050
d. λ = 8.0, P(X > 10) = 1 - P(X ≤ 10) = 1 - .81589 = .18411
6.27 a. P(X ≥ 1) = 1 - P(X ≤ 0) = 1 - .09072 = .90928
b. P(X = 0) = .09072
c. P(X > 3) = 1 - P(X ≤ 3) = 1 - .77872 = .22128
d. Skewed right.
6.28 a. Cancellations are independent and similar to arrivals.
b. P(X = 0) = .22313
c. P(X = 1) = .33470
d. P(X > 2) = 1 - P(X ≤ 2) = 1 - .80885 = .19115
e. P(X ≥ 5) = 1 - P(X ≤ 4) = 1 - .98142 = .01858
6.29 a. Most likely goals arrive independently.
b. P(X ≥ 1) = 1 - P(X ≤ 0) = 1 - .06721 = .93279
c. P(X ≥ 4) = 1 - P(X ≤ 3) = 1 - .71409 = .28591
d. Skewed right.
6.30 a. Not independent events, the warm room leads to yawns from all.
b. Answers will vary.
6.31 Let = n = (500)(.003) = 1.5
a. P(X 2) = 1 P(X 1) = 1 .55783 = .44217
b. P(X < 4) = .93436
c. Use the Poisson when n is large and is small.
d. Yes, based on our rule of thumb n 20 and .05
6.32 Let λ = nπ = (100000)(.000002) = 2
a. P(X ≥ 1) = 1 - P(X = 0) = 1 - .13534 = .86466
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .40601 = .59399
c. Excel could be used, otherwise n is too large for practical calculations.
d. Yes, based on our rule of thumb n ≥ 20 and π ≤ .05.
6.33 a. μ = (200)(.03) = 6 letters
b. σ = √((200)(.03)(1 - .03)) = 2.413
c. For λ = nπ = 6, P(X ≥ 10) = 1 - P(X ≤ 9) = 1 - .91608 = .08392
d. For λ = nπ = 6, P(X ≤ 4) = .28506
e. Excel could be used, otherwise n is too large for practical calculations.
f. Yes, based on our rule of thumb n ≥ 20 and π ≤ .05.
6.34 a. Range 0 to 3, P(X = 3) = .03333
b. Range 0 to 3, P(X = 2) = .13158
c. Range 0 to 4, P(X = 1) = .44691
d. Range 0 to 7, P(X = 3) = .10980
6.35 The distribution is symmetric with a small range (2 to 4).
6.36 a. Let X = number of incorrect answers in sample.
b. P(X = 0) = .31741
c. P(X ≥ 1) = 1 - P(X = 0) = 1 - .31741 = .68259
d. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .74062 = .25938
e. Skewed right.
6.37 a. Let X = number of incorrect vouchers in sample.
b. P(X = 0) = .06726
c. P(X = 1) = .25869
d. P(X ≥ 3) = 1 - P(X ≤ 2) = 1 - .69003 = .30997
e. Fairly symmetric.
6.38 a. Let X = number of HIV specimens in sample.
b. P(X = 0) = .30604
c. P(X < 3) = P(X ≤ 2) = .95430
d. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .74324 = .25676
e. Skewed right.
6.39* a. 3/100 < .05, okay to use binomial approximation
b. 10/200 = .05, not < .05, so don't use the binomial approximation
c. 12/160 > .05, don't use the binomial approximation
d. 7/500 < .05, okay to use binomial approximation
6.40* a. P(X = 0) = .59049 (B) or .58717 (H)
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .91854 = .08146 (B) or .0792 (H)
c. n/N = 5/200 < 0.05 so the binomial approximation is OK.
6.41* a. P(X = 0) = 0.34868 (B) or .34516 (H)
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .73610 = .26390 (B) or .26350 (H)
c. P(X < 4) = P(X ≤ 3) = .98720 (B) or .98814 (H)
d. n/N = 10/500 = .02 so we can use the binomial with π = s/N = 50/500 = .1
6.42* a. P(X = 6) = .26214 (B) or .25967 (H)
b. P(X ≥ 4) = 1 - P(X ≤ 3) = 1 - .09888 = .90112 (B) or .90267 (H)
c. n/N = 6/400 < 0.05 so we can use the binomial with π = s/N = 320/400 = .8
6.43* a. P(X = 5) = .03125 when π = .5
b. P(X = 3) = .14063 when π = .25
c. P(X = 4) = .03840 when π = .60
6.44* a. Geometric mean is 1/π = 1/(.20) = 5
b. Using the geometric CDF, P(X ≤ 10) = 1 - (1 - π)^x = 1 - (1 - .20)^10 = .89263
6.45* a. Geometric mean is 1/π = 1/(.50) = 2
b. Using the geometric CDF, P(X ≤ 10) = 1 - (1 - π)^x = 1 - (1 - .50)^10 = .99902
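Note: No special Excel function is needed here; for example, =1-(1-0.2)^10 returns .89263 and =1-(1-0.5)^10 returns .99902.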
6.46* a. μ = 79.6*2.54 = 202.184 cm
b. σ = 3.24*2.54 = 8.2296 cm
c. Rule 1 for the mean and Rule 2 for the std dev.
6.47* a. Applying Rule 3, we add the means for each month to get μ = 9500 + 7400 + 8600 = $25,500. Applying Rule 4, we add the variances for each month and then take the square root of this sum to find the std dev for the quarter: σ² = 1250 + 1425 + 1610 = 4285 and so s = (4285)^.5 = 65.4599 is the std dev for the quarter.
b. Rule 4 assumes that the sales for each month, in this case, are independent of each other. This may not be valid, given that a prior month's sales usually influence the next month's sales.
6.48 The probability of a payout is 1 - .99842 = .00158. The expected payout is (.00158)(1,000,000) = $1,580. To break even, the company would charge $1,580.
6.49 E(X) = (100)(1/6) - (15)(5/6) = 16.67 - 12.50 = $4.17. On average, you would win more than you lose. If you have to pay more than $4.17 to play, a rational person wouldn't play (unless very risk loving).
6.50 The expected loss is E(X) = (250)(.02) + (0)(.98) = $5 which exceeds the $4 cost of insurance (assuming you
would lose the entire cost of the PDA). Statistically, it is worth it to insure to obtain worry-free shipping,
despite the small likelihood of a loss.
6.51 a. If uniform, μ = (1 + 44)/2 = 22.5 and σ = √([(44 - 1 + 1)² - 1]/12) = 12.698. Quite a big difference from what is expected.
b. What was the sample size? One might also want to see a histogram.
6.52 a. If uniform, μ = (1 + 5)/2 = 3 and σ = √([(5 - 1 + 1)² - 1]/12) = 1.414.
b. Answers will vary.
c. Answers will vary.
d. =1+INT(5*RAND())
6.53 a. π = .80 (answers will vary).
b. π = .300 (answers will vary).
c. π = .50 (answers will vary).
d. π = .80 (answers will vary).
e. Outcomes of one trial might influence the next. For example, if I fail to make a free throw because I shot
the ball long, I will adjust my next shot to be a little shorter, hence, violating the independence rule.
6.54 a. P(X = 0) = .66342
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .94276 = .05724
c. Binomial μ = nπ = (8)(.05) = 0.4 and σ = √(nπ(1 - π)) = √(8(.05)(.95)) = 0.616
d. Strongly skewed to the right.
6.55 a. Define X to be the number that fail. P(X = 0) = .59049
b. P(X = 1) = .32805.
c. Strongly skewed to the right.
6.56 a. P(X = 0) = .10737
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .37581 = .62419
c. P(X = 10) = .00000
d. Slightly skewed to the right.
6.57 a. P(X = 0) = .06250
b. P(X ≥ 2) = 1 - P(X ≤ 1) = 1 - .31250 = .68750
c. P(X ≥ 2) = .68750
d. Symmetric.
6.58 a. P(X = 0) =.01680
b. P(X = 1) = .08958
c. P(X = 2) = .20902
d. P(X ≤ 2) = .31539
e. Slightly skewed right.
6.59 a. =BINOMDIST(3,20,0.3,FALSE)
b. =BINOMDIST(7,50,0.1,FALSE)
c. =BINOMDIST(6,80,0.05,TRUE)
d. =1-BINOMDIST(29,120,0.2,TRUE)
6.60 a. P(X ≥ 14) = 1 - P(X ≤ 13) = 1 - .942341 = .057659
b. P(X ≥ 15) = .0207, therefore a score of 15 would be needed.
6.61 a. P(X = 0) = .48398
b. P(X ≥ 3) = 1 - P(X ≤ 2) = 1 - .97166 = .02834
c. For this binomial, μ = nπ = (10)(.07) = 0.7 defaults
6.62 Using Excel: =BINOMDIST(0,14,0.08,FALSE) = 0.311193
6.63 Binomial with n = 16, π = .8:
a. P(X ≥ 10) = 1 - P(X ≤ 9) = 1 - .02666 = .97334
b. P(X < 8) = P(X ≤ 7) = .00148
6.64 a. =POISSON(7,10,FALSE) = .0901
b. =POISSON(3,10,FALSE) = .0076
c. =POISSON(4,10,TRUE) = .0292
d. =1-POISSON(10,10,TRUE) = .4170
6.65 Let X = the number of no-shows. Then:
a. If n = 10 and π = .10, then P(X = 0) = .34868.
b. If n = 11 and π = .10, then P(X ≥ 1) = 1 - P(X = 0) = 1 - .31381 = .68619.
c. If they sell 11 seats, there is no way that more than 1 will be bumped.
d. Let X = the number who do show up and set π = .90. We want P(X ≥ 10) ≥ .95 so we use Excel's function =1-BINOMDIST(9,n,0.9,TRUE) for various values of n. It turns out that n = 13 will suffice.
n P(X ≤ 9) P(X ≥ 10)
11 0.30264 0.69736
12 0.11087 0.88913
13 0.03416 0.96584
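Note: For example, =1-BINOMDIST(9,13,0.9,TRUE) returns 0.96584, confirming the last row of the table.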
6.66 a. Let X be the number that are not working. As long as no more than 2 are not working, he will have enough. Using Excel's =BINOMDIST(2,10,0.2,1) we calculate P(X ≤ 2) = .67780.
b. Let X be the number that are working and set π = .8. We want P(X ≥ 8) ≥ .95 so we use Excel's function =1-BINOMDIST(7,n,0.8,TRUE) for various values of n. It turns out that n = 13 will suffice.
n P(X ≤ 7) P(X ≥ 8)
10 0.32220 0.67780
11 0.16114 0.83886
12 0.07256 0.92744
13 0.03004 0.96996
6.67 a. Because calls to a fire station within a minute are most likely all about the same fire, the calls are not
independent.
b. Answers will vary.
6.68 a. Defects happen randomly and are independent events.
b. P(X = 5) = .17479
c. P(X ≥ 11) = 1 - P(X ≤ 10) = 1 - 0.9823 = .0177
d. Right-skewed.
6.69 a. Storms happen at different times throughout the year and seem to be independent occurrences.
b. P(X ≥ 5) = 1 - P(X ≤ 4) = 1 - .00181 = .99819
c. P(X > 20) = 1 - P(X ≤ 20) = 1 - .95209 = .04791
d. Fairly symmetric due to large λ.
6.70 a. Near collisions are random and independent events.
b. P(X ≥ 1) = 1 - P(X = 0) = 1 - .30119 = .69881
c. P(X > 3) = 1 - P(X ≤ 3) = 1 - .96623 = .03377
d. See below.
6.71 a. Assume that cancellations are independent of each other and occur randomly.
b. P(X = 0) = .22313
c. P(X = 1) = .33470
d. P(X > 2) = 1 - P(X ≤ 2) = 1 - .80885 = .19115
e. P(X ≥ 5) = 1 - P(X ≤ 4) = 1 - .98142 = .01858
6.72 a. The number of fatal crashes occurs randomly and each crash is independent of the others.
b. P(X ≥ 4) = 1 - P(X ≤ 3) = 1 - .69194 = .30806
c. P(X ≤ 3) = .69194
d. Given the historical mean of λ = 2.8 for that decade, 4 or more crashes in one year was not very unusual (31% chance of occurring), assuming independent events.
6.73 a. We assume that paint defects are independent events, distributed randomly over the surface. For this problem, we would use a mean of λ = 2.4 defects per 3 square meter area.
b. P(X = 0) = .09072
c. P(X = 1) = .21772
d. P(X ≤ 1) = .30844
6.74 a. We assume that paint defects are independent events, distributed randomly over the surface.
b. P(X ≤ 4) = .02925
c. P(X > 15) = 1 - P(X ≤ 15) = 1 - .95126 = .04874
d. Fairly symmetric due to large λ.
6.75 a. λ = 1/30 supernova per year = 0.033333 and σ = √λ = √(1/30) = .182574.
b. P(X ≥ 1) = 1 - P(X = 0) = 1 - λ^0 e^(-λ)/0! = 1 - e^(-.033333) = 1 - .96722 = .03278
c. Appendix B does not have λ = .033333.
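Note: In Excel, for example, =1-POISSON(0,1/30,TRUE) (or equivalently =1-EXP(-1/30)) returns .03278.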
6.76 a. Earthquakes are random and independent events. No one can predict when they will occur.
b. P(X < 3) = P(X ≤ 2) = .87949
c. P(X > 5) = 1 - P(X ≤ 5) = 1 - .9985 = .0015
6.77 a. Crashes are unrelated events; we can't predict them, so they do happen randomly. A single crash does not necessarily impact any other car crashes. This assumption may be unrealistic.
b. P(X ≥ 1) = 1 - P(X = 0) = 1 - .13534 = .86466
c. P(X < 5) = P(X ≤ 4) = .94735
d. Skewed to the right.
6.78* Binomial n = 2500, π = .001 or Poisson with λ = 2.5 leaks per 2500 meters. Using the Poisson distribution:
a. P(X = 0) = 0.0828
b. P(X ≥ 3) = 1 - P(X ≤ 2) = 1 - .54381 = .45619
c. Skewed right.
d. Skewness = 1/(2.5) = .400
e. n is too large to be convenient.
f. n ≥ 20 and π ≤ .05 so the Poisson is accurate.
6.79* a. n = 200, π = .02. Define X to be the number of twin births in 200 deliveries. E(X) = (200)(.02) = 4.
b. P(X = 0) = .01759
c. P(X = 1) = .07326
d. Using the Poisson approximation to the binomial with λ = 4:
P(X = 0) = .01832 from =POISSON(0,4,FALSE)
P(X = 1) = .07179 from =POISSON(1,4,FALSE)
e. Yes, the approximation is justified. Our rule of thumb is n ≥ 20 and π ≤ .05, which is met here, and the probabilities from the Poisson are similar to the binomial.
6.80* a. Binomial P(X = 0) = .00226, Poisson P(X = 0) = .00248
b. Binomial P(X = 1) = .01399, Poisson P(X = 1) = .01487
c. Binomial P(X = 2) = .04304, Poisson P(X = 2) = .04462
d. Set λ = nπ = (200)(.03) = 6.0
e. Yes, n ≥ 20 and π ≤ .05 and the probabilities are similar.
6.81* a. For the binomial, μ = nπ = (4386)(.00114) = 5 is the expected number killed.
b. For the binomial, σ = √(nπ(1 - π)) = √((4386)(.00114)(.99886)) = 2.235
c. Using the Poisson approximation with λ = 5.0, P(X < 5) = P(X ≤ 4) = .44049
d. P(X > 10) = 1 - P(X ≤ 10) = 1 - .98631 = .01369
e. Yes, the approximation is justified. Our rule of thumb is n ≥ 20 and π ≤ .05, which is met.
6.82* a. μ = nπ = (500)(.02) = 10.
b. Using the Poisson approximation with λ = nπ = (500)(.02) = 10 we get P(X ≤ 5) = .06709.
6.83 a. P(X = 5 | N = 52, s = 13, n = 5) = .000495.
b. No, since n/N = 5/52 exceeds .05 (our rule of thumb for a binomial approximation.)
6.84 a. Sampling without replacement, n/N < 0.05
b. Range of X is 0 to 2.
c. (Probability table not reproduced here.)
6.85* a. Geometric mean is 1/π = 1/(.08) = 12.5 cars
b. Geometric std. dev. is √((1 - π)/π²) = √(.92/(.08)²) = 11.99 cars
c. Using the geometric CDF, P(X ≤ 5) = 1 - (1 - π)^x = 1 - (1 - .08)^5 = .3409
6.86* a. Geometric mean is 1/π = 1/(.07) = 14.29 operations
b. Using the geometric CDF, P(X ≥ 20) = 1 - P(X ≤ 19) = 1 - [1 - (1 - π)^x] = (1 - π)^x = (1 - .07)^19 = .2519
6.87* a. Geometric mean is 1/π = 1/(.05) = 20
b. Using the geometric CDF, P(X ≤ 29) = 1 - (1 - π)^x = 1 - (1 - .05)^29 = 1 - .2259 = .7741
6.88 a. 1/π = 1/(.02) = 50
b. σ = √((1 - π)/π²) = √(.98/(.02)²) = 49.5
c. One would have to examine a large number of checks to find the first abnormal one. Since most would be OK, it would be easy to lose concentration. The same applies to airport security inspectors.
6.89 The total number of values in the uniform distribution is n = b - a + 1. Since P(x) = 1/(b - a + 1) is a constant for all x, the sum is simply that constant multiplied by n, or (b - a + 1)/(b - a + 1) = 1.
6.90 a. μ = (a + b)/2 = (0 + 9999)/2 = 4999.5
σ = √([(b - a + 1)² - 1]/12) = √([(9999 - 0 + 1)² - 1]/12) = 2886.8
6.91 a. (233.1)(0.4536) = 105.734 is the mean in kilograms
b. (34.95)(0.4536) = 15.8533 is the std dev in kilograms
c. Rule 1 for the mean and Rule 2 for the std dev.
6.92 a. By Rule 1, expected total cost is vμQ + F = (8)(25000) + 150,000 = $350,000.
By Rule 2, std dev. of total cost is vσQ = (8)(2000) = $16,000.
b. To break even, we want TR - TC = 0 where TR = expected total revenue and TC = expected total cost. Since TR = (Price)(Quantity) = PQ we set PQ - (vQ + F) = 0 and solve for P to get P(25000) - 350000 = 0 or P = $14. For a profit of $20,000 we have P(25000) - 370000 = 0 or P = $14.80.
6.93* a. Using Rule 3: μX+Y = μX + μY = 70 + 80 = 150
b. Using Rule 4: σX+Y = √(σX² + σY²) = √(64 + 36) = 10
c. Rule 4 assumes independent test scores. Most likely these variables are not independent. The score the
student got on the first exam may influence the score on the second exam (i.e. studied more, attended
class more frequently, sought tutoring).
6.94* Using Rule 3: μX+Y = 20 + 10 + 14 + 6 + 48 = 98 hours
Using Rule 4: σX+Y = √(16 + 4 + 9 + 4 + 36) = 8.31 (assuming independent steps)
The 2-sigma interval around the mean is μ ± 2σ, or 98 ± (2)(8.31). The range is 81.4 to 114.6 hours.
6.95 a. By Rule 1, mean of total cost: vμQ + F = (2225)(7) + 500 = $16,075
By Rule 2, std dev. of total cost: vσQ = (2225)(2) = $4,450
By Rule 1, expected revenue is E(PQ) = PμQ = (2850)(7) = $19,950
Expected profit is TR - TC = 19,950 - 16,075 = $3,875
6.96* Adding 50 will raise the mean by 50 using Rule 1: μaX+b = aμX + b = (1)(25) + 50 = 75. Multiplying by 3 will also raise the mean by 50 using Rule 1: μaX+b = aμX + b = (3)(25) + 0 = 75. The first transformation will shift the distribution to the right without affecting the standard deviation by Rule 2: σaX+b = aσX = (1)(6) = 6. The second transformation will spread out the distribution, since the standard deviation will also increase using Rule 2: σaX+b = aσX = (3)(6) = 18, and some scores will exceed 100.
6.97* a. This is a binomial with μ = nπ = (240)(.25) = 60
b. This is a binomial with σ² = nπ(1 - π) = (240)(.25)(.75) = 45, so σ = 6.7082
c. μ ± 1σ is 60 ± (1)(6.7082), or 53.3 days to 66.7 days
μ ± 2σ is 60 ± (2)(6.7082), or 46.6 days to 73.4 days
These intervals contain about 68% and 95% of the X values if the shape of the binomial is approximately normal. In this case, that is true, as you can see by printing the binomial PDF.
Chapter 7
Continuous Distributions
7.1 a. D
b. C
c. C
7.2 a. C
b. D
c. C
7.3 In order to be a valid PDF, total area under f(x) must equal 1.
a. Area = .25(1) = .25 therefore this is not a PDF.
b. This is a valid PDF.
c. Area = (2)(2) = 2 therefore it is not a PDF.
7.4 For a continuous PDF, we use the area under the curve to measure the probability. The area above a single
point is defined to be zero so if we summed up all the point probabilities we would have a sum equal to
zero. In addition, by definition there are an infinite number of points in the interval over which a
continuous random variable is defined.
7.5 a. μ = (0 + 10)/2 = 5, σ = √((10 - 0)²/12) = 2.886751
b. μ = (200 + 100)/2 = 150, σ = √((200 - 100)²/12) = 28.86751
c. μ = (1 + 99)/2 = 50, σ = √((99 - 1)²/12) = 28.29016
7.6 a. P(X < 10) for U(0,50) = (10-0)/(50-0) = 0.2
b. P(X > 500) for U(0,1000) = (1000-500)/(1000-0) = 0.5
c. P(25 < X < 45) for U(15,65) = (45-25)/(65-15) = .4
7.7 P(X = 25) = 0 for a continuous uniform distribution. Therefore using < or ≤ yields the same result.
7.8 a. μ = (2500 + 4500)/2 = 3500
b. σ = √((4500 - 2500)²/12) = 577.3503
c. The first quartile is the midpoint between a and the median: (3500 + 2500)/2 = 3000. The third quartile is the midpoint between the median and b: (4500 + 3500)/2 = 4000.
d. P(X < 3000) = P(2500 < X < 3000) = (3000 - 2500)/(4500 - 2500) = 0.25.
e. P(X > 4000) = P(4000 < X < 4500) = (4500 - 4000)/(4500 - 2500) = 0.25.
f. P(3000 < X < 4000) = (4000 - 3000)/(4500 - 2500) = 0.50.
7.9 The curves differ by their mean, standard deviation, and height.
7.10 a. The maximum height is 0.0798. (Plug = 75 and =5 into the PDF.)
b. No, f(x) does not touch the X axis at any point. The distribution is asymptotic.
7.11 It says that for data from a normal distribution we expect
about 68.26% will lie within μ ± 1σ
about 95.44% will lie within μ ± 2σ
about 99.73% will lie within μ ± 3σ
7.12 a. Yes
b. No, distribution could be skewed. Direction of skewness depends on how one defines years of education
and which geographic region one is interested in.
c. No, distribution could be skewed right. Most bills will be delivered within a week but there may be a few
that take much longer.
d. Yes, but there could be outliers.
7.13 a. .1915.
b. .1915.
c. .5000.
d. 0
7.14 a. P(Z < 2.15) - P(Z < 1.22) = .9842 - .8888 = .0954
b. P(Z < 3.00) - P(Z < 2.00) = .99865 - .9772 = .02145
c. P(Z < 2.00) - P(Z < -2.00) = .9772 - .0228 = .9544
d. 1 - P(Z < .50) = 1 - .6915 = .3085
7.15 a. P(Z < 2.15) - P(Z < -1.22) = .9842 - .1112 = .8730
b. P(Z < 2.00) - P(Z < -3.00) = .9772 - .00135 = .97585
c. P(Z < 2.00) = .9772
d. P(Z = 0) = 0.
7.16 a. NORMDIST(232000,232000,7000,TRUE) = 0.50
b. NORMDIST(239000,232000,7000,TRUE) - NORMDIST(232000,232000,7000,TRUE) = 0.341345
c. NORMDIST(239000,232000,7000,TRUE) = 0.841345
d. NORMDIST(245000,232000,7000,TRUE) = 0.968355
e. 1 - NORMDIST(225000,232000,7000,TRUE) = 0.84134474
7.17 a. NORMDIST(300,290,14,TRUE) = 0.762475
b. 1 - NORMDIST(250,290,14,TRUE) = 0.997863
c. NORMDIST(310,290,14,TRUE) - NORMDIST(275,290,14,TRUE) = 0.781448
7.18 a. Use Excel's NORMINV function. NORMINV(0.975,3.3,.13) = 3.554795 and NORMINV(.025,3.3,.13) = 3.045. The middle 95% is in the interval 3.045 to 3.555.
b. 1 - NORMDIST(3.50,3.3,.13,TRUE) = 0.061967919
7.19 Use Excel's NORMINV function to give the X value associated with the cumulative probability.
a. NORMINV(.9,10,3) = 13.84465582
b. NORMINV(.5,10,3) = 10
c. NORMINV(.95,10,3) = 14.93456043
d. NORMINV(.2,10,3) = 7.475136874
e. NORMINV(.1,10,3) = 6.155344182
f. NORMINV(.25,10,3), NORMINV(.75,10,3) = 7.976531423, 12.02346858
g. NORMINV(.93,10,3) = 14.42737385
h. NORMINV(.025,10,3), NORMINV(.975,10,3) = 4.120111638, 15.87988836
i. NORMINV(.07,10,3) = 5.572626152
7.20 Use Excel's NORMINV function to give the X value associated with the cumulative probability.
a. NORMINV(.9,360,9) = 371.5339675
b. NORMINV(.5,360,9) = 360
c. NORMINV(.95,360,9) = 374.8036813
d. NORMINV(.2,360,9) = 352.4254106
e. NORMINV(.1,360,9) = 348.4660325
f. NORMINV(.25,360,9), NORMINV(.75,360,9) = 353.9295943, 366.0704057
g. NORMINV(.9,360,9) = 371.5339675
h. NORMINV(.025,360,9), NORMINV(.975,360,9) = 342.3603349, 377.6396651
i. NORMINV(.96,360,9) = 375.7561746
7.21 a. P(X ≥ 8) = 1 - NORMDIST(8,6.9,1.2,TRUE) = 0.179659. This probability indicates that the event is not common but not unlikely.
b. NORMINV(.9,6.9,1.2) = 8.437862 pounds
c. 95% of birth weights would be between 4.5 and 9.3 pounds: NORMINV(0.025,6.9,1.2), NORMINV(.975,6.9,1.2) = 4.548045, 9.251955
7.22 a. NORMINV(.95,600,100) = 764.4853476
b. NORMINV(.25,600,100) = 532.5510474
c. NORMINV(0.1,600,100) , NORMINV(0.9,600,100) = 471.8448, 728.1551939
7.23 a. NORMDIST(110,100,15,TRUE) = 0.747507533
b. NORMDIST(2,0,1,TRUE) = 0.977249938
c. NORMDIST(5000,6000,1000,TRUE) = 0.15865526
d. NORMDIST(450,600,100,TRUE) = 0.066807229
7.24 a. NORMDIST(110,100,15,TRUE) - NORMDIST(80,100,15,TRUE) = .6563
b. NORMDIST(2,0,1,TRUE) - NORMDIST(1.5,0,1,TRUE) = .0441
c. NORMDIST(7000,6000,1000,TRUE) - NORMDIST(4500,6000,1000,TRUE) = .7745
d. NORMDIST(450,600,100,TRUE) - NORMDIST(225,600,100,TRUE) = .0667
7.25 a. NORMINV(.1,360,9) = 348.4660325
b. NORMINV(.32,360,9) = 355.7907105
c. NORMINV(.75,360,9) = 366.0704057
d. NORMINV(.9,360,9) = 371.5339675
e. NORMINV(.999,360,9) = 387.8122732
f. NORMINV(.9999,360,9) = 393.4718124
7.26 a. 1 - NORMDIST(60,40,28,TRUE) = 0.2375
b. NORMDIST(20,40,28,TRUE) = 0.2375
c. 1 - NORMDIST(10,40,28,TRUE) = 0.8580
7.27* The Excel formula could be =NORMSINV(RAND()).
7.28* The Excel formula could be =NORMINV(RAND(),4000,200).
Note: The probabilities below were calculated using Appendix C-2.
7.29* nπ ≥ 5 and n(1 - π) ≥ 5, so we can use the normal approximation to the binomial.
μ = nπ = 70, σ = √(nπ(1 - π)) = 8.07
a. P(X < 50) ≈ P(X ≤ 49.5) = P(Z < -2.54) = .0055
b. P(X > 100) ≈ P(X ≥ 100.5) = P(Z > 3.78) = .00008
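Note: For example, part a can also be computed directly in Excel as =NORMDIST(49.5,70,8.07,TRUE), which returns .0055.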
7.30* nπ ≥ 5 and n(1 - π) ≥ 5 (800*.03 = 24, 800*.97 = 776), so we can use the normal approximation.
a. μ = 24, σ = 4.8249
b. P(X ≥ 20) ≈ P(X ≥ 19.5) = .8238
c. P(X > 30) ≈ P(X ≥ 30.5) = .0885
7.31* nπ ≥ 5 and n(1 - π) ≥ 5 (200*.90 = 180, 200*.1 = 20), so we can use the normal approximation.
μ = 180, σ = 4.2426
a. P(X ≥ 175) ≈ P(X ≥ 174.5) = .9032
b. P(X < 190) ≈ P(X ≤ 189.5) = .9875
7.32* nπ ≥ 5 and n(1 - π) ≥ 5 (8465*.048 = 406.32, 8465*.952 = 8058.68), so we can use the normal approximation.
a. μ = nπ = 406.32
b. P(X ≥ 400) ≈ P(X ≥ 399.5) = .6368
c. P(X < 450) ≈ P(X ≤ 449.5) = .9861
7.33* Let μ = λ = 28, σ = √λ = 5.29.
a. P(X > 35) ≈ P(X ≥ 35.5) = .0778
b. P(X < 25) ≈ P(X ≤ 24.5) = .2546
c. λ ≥ 20, therefore the normal approximation is appropriate.
d. .0823 and .3272. Not as close as one might wish.
7.34* Let μ = λ = 150, σ = √λ = 12.25.
a. P(X ≥ 175) ≈ P(X ≥ 174.5) = .0228
b. P(X < 125) ≈ P(X ≤ 124.5) = .0188
c. λ ≥ 20, therefore the normal approximation is appropriate.
d. From Excel: =1-POISSON(174,150,1) = .0248 and =POISSON(124,150,1) = .01652. The probabilities are fairly close.
7.35 a. P(X > 7) = .1225
b. P(X < 2) = .4512
7.36 a. P(X > 30 minutes) = .1225
b. P(X < 15 minutes) = .6501
c. P(15 < X < 30) = .8775 - .6501 = .2274
7.37 a. P(X < 60 seconds) = .9975
b. P(X > 30 seconds) = .0498
c. P(X > 45 seconds) = .0111
7.38 There are 26,280 hours in 3 years. A warranty claim can be filed if the hard drive fails within the first three years. P(X < 26280) = 1 - e^(-λx) = 1 - e^(-(1/250000)(26280)) = 1 - e^(-0.10512) = 1 - .9002 = .0998
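Note: In Excel, for example, =EXPONDIST(26280,1/250000,TRUE) returns .0998.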
7.39 a. P(X > t) = .5. Solving for t: t = -ln(.5)/4.2 = .1650 hours.
b. t = -ln(.25)/4.2 = .3301 hours
c. t = -ln(.1)/4.2 = .5482 hours
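Note: For example, =-LN(0.5)/4.2 in Excel returns .1650.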
7.40 a. P(X > t) = .5. Solving for t: t = -ln(.5)/.5 = 1.3863 minutes
b. t = -ln(.75)/.5 = .5754 minutes
c. t = -ln(.7)/.5 = .7134 minutes
7.41 a. P(X > t) = .5. Use λ = 1/20 = .05. Solving for t: t = -ln(.5)/.05 = 13.86 minutes.
b. The distribution of time is skewed to the right, therefore the median < mean.
c. t = -ln(.25)/.05 = 27.73 minutes.
7.42 a. P(X > t) = .9. Solving for t: t = -ln(.9)/.125 = .843 years.
b. t = -ln(.8)/.125 = 1.785 years
7.43* a. μ = (0 + 25 + 75)/3 = 33.33
b. σ = 15.59
c. P(X < 25) = .3333
d. Shaded area represents the probability.
7.44* a. μ = (50 + 105 + 65)/3 = 73.33
b. σ = 11.61
c. P(X > 75) = .4091
d. Shaded area represents the probability.
7.45 a. D
b. C
c. C
7.46 a. Area = .5(2) = 1 therefore this is a valid PDF.
b. Area = (2)(2) = 2 therefore this is not a valid PDF.
c. Area = (.5)(2)(2) = 1 therefore this is a valid PDF.
7.47 a. μ = 45.
b. σ = 11.547
c. P(X > 45) = (65 - 45)/(65 - 25) = 0.5
d. P(X > 55) = (65 - 55)/(65 - 25) = 0.25
e. P(30 < X < 60) = (60 - 30)/(65 - 25) = 0.75
7.48 a. μ = 27.5.
b. σ = 12.99
c. Q1 = 16.25, Q3 = 38.75
d. P(re-swiping) = 1 - P(10 < X < 40) = 1 - .6667 = .3333. 33.33% must re-swipe.
7.49 Answers will vary.
a. Suggested response: Not normal, unknown shape.
b. Would expect distribution to be skewed to the right.
c. Normal
d. Normal
7.50 a. μ = .86.
b. σ = 0.0693
c. P(X > .80) = (.98 - .80)/(.98 - .74) = .75
d. P(X < .85) = (.85 - .74)/(.98 - .74) = .4583
e. P(.8 < X < .9) = (.9 - .8)/(.98 - .74) = .4167
f. The chlorine is added to kill bacteria.
7.51 a. NORMINV(.5,450,80) = 450
b. NORMINV(.25,450,80) = 396.041
c. NORMINV(.9,450,80) = 552.524
d. NORMINV(.2,450,80) = 382.670
e. NORMINV(.95,450,80) = 581.588
f. NORMINV(.25,450,80), NORMINV(.75,450,80) = 396.041, 503.960
g. NORMINV(.2,450,80) =382.670
h. NORMINV(.025,450,80), NORMINV(.975,450,80) = 293.203, 606.798
i. NORMINV(0.99,450,80) = 636.108
7.52 a. The likelihood of a value greater than the mean is .50.
b. This corresponds to P(Z > 1) = .1587
c. This corresponds to P(Z > 2) = 0.02275
d. This corresponds to P(-2 < Z <2) = .9545
7.53 a. P(X>130) = .2266
b. P(X<100) = .2266
c. P(X<91) = .1151
7.54 a. P(X<579) = .5
b. P(X>590) = .2160
c. P(X<600) = .9332
7.55* a. P(28 < X < 32) = .8413 - .1587 = .6826
b. P(X<22.5) = 8.84E-05
7.56 a. 1-P(X<120) = .7881
b. 1- P(X<180) = .0548
c. NORMINV(.95, 140,25) =181.121
d. NORMINV(.99,140,25) = 198.159
7.57 P(1.975<X<2.095) = .0455
7.58 a. P(X<40) = .0013
b. P(X>55) = .1711
c. Assumed a normal distribution.
d. P(40<X<62.8) = .9973. .9973*73.1 million = 72.9 million.
7.59 a. P(X>50) = .2177
b. P(X<29) = .2965
c. P(40<X<50) = .2223
d. Assumed that email consults had a symmetrical distribution that was bell shaped. Even though the number
of email consults is a discrete random variable it is possible to approximate with the normal distribution.
7.60 a. P(X > 30) = .0038
b. P(all three men finish in time) = (1 - .0038)^3 = .9886
7.61 P(X>90) = .2742. Assume that the distribution on time was normal.
7.62 The next procedure will be delayed if the tubal ligation runs 40 minutes or longer. P(X>40) = .1056
7.63 P(X>5200) = .2016
7.64 a. P(X< 135) = .3085
b. P(X > 175) = .0668
c. P(125 < X < 165) = .8413 - .1587 = .6826
d. With variability, physicians run the risk of not treating a patient with dangerous blood pressure or treating a
patient with healthy blood pressure. Understanding variability allows physicians to minimize the chances
of making these two types of errors.
7.65 a. John scored better than only 5.26% of the others.
b. Mary scored above average, better than approximately 70% of others.
c. Zak scored better than 96.33% of others.
d. Frieda scored better than 99.34% of others.
7.66 a. False, the normal distribution is asymptotic. Thus, a value outside the given interval is possible.
b. False, the standardized values do allow for meaningful comparison. Z scores are unit free.
c. False, the normal distribution is a family of distributions, each having the same shape, but different
means and standard deviations.
7.67* a. For route A: P(X<54) = .5. For route B: P(X<54) = .0228. He should take route A.
b. For route A: P(X<60) = .8413. For route B: P(X<60) = .5. He should take route A.
c. For route A: P(X<66) = .9722. For route B: P(X<66) = .9772. He could take either route. Because the
standard deviation is smaller for route B, the chance of getting to the airport in under 66 minutes is the
same for each route.
7.68 a. Underfilling the bottle means putting less than 500 ml in the bottle. Find the value of μ for which P(X > 500) = .95. This corresponds to z = -1.645. Using the z-score formula to solve for μ, we find that the mean should be set at 508.225 ml.
b. To ensure that 99% contain at least 500 ml, set the mean at 511.63.
c. To ensure that 99.9% contain at least 500 ml, set the mean at 515.45.
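Note: The answers imply σ = 5 ml. Assuming that, the means can be computed in Excel as =500+NORMSINV(0.95)*5 = 508.22, =500+NORMSINV(0.99)*5 = 511.63, and =500+NORMSINV(0.999)*5 = 515.45.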
7.69 Find the value x such that P(X > x) = .80. This corresponds to z = -.842. Using the z-score formula to solve for x, we find that the minimum length should be 11.49 inches.
7.70* a. For method A: P(X<28) = .5. For method B: P(X<28) = .0228. Method A is preferred.
b. For method A: P(X<38) = .9938. For method B: P(X<38) = .9987. Method B is preferred.
c. For method A: P(X<66) = .9722. For method B: P(X<66) = .9772. Either method is acceptable.
7.71* a. P(X > ) = .5 (property of the normal distribution). Assuming independence, the probability that both
exceed the mean is: .5*.5 =.25.
b. P(X < ) = .5 (property of the normal distribution). Assuming independence, the probability that both are
less than the mean is: .5*.5 =.25.
c. P(X<) = .5 (property of the normal distribution). Assuming independence, the probability that one is
greater than and one is less than the mean is: .5*.5 =.25. There are two combinations that yield this, so
the likelihood is: .25+.25 = .50 that one exceeds the mean and one is less than the mean.
d. P(X = ) = 0, this is a property of a continuous random variable. The probability that both equal the mean
is zero.
7.72* Use the normal approximation to the binomial distribution because we clearly meet the requirement that nπ ≥ 5 and n(1 - π) ≥ 5.
a. P(X ≥ 50) ≈ P(X ≥ 49.5) = .0646.
b. P(X < 35) ≈ P(X ≤ 34.5) = .1894.
7.73* P(X < 20) ≈ P(X ≤ 19.5) = .102
7.74* Use the normal approximation to the binomial distribution because we clearly meet the requirement that nπ ≥ 5 and n(1 - π) ≥ 5. Q1 = 16,966, Q3 = 17,034. Use the NORMINV function in Excel with μ = 17000 and σ = 50.4975.
7.75* Use the normal approximation to the binomial distribution because we clearly meet the requirement that nπ ≥ 5 and n(1 - π) ≥ 5: nπ = 100*.25 = 25, n(1 - π) = 100*.75 = 75.
a. Find the value x such that P(X ≥ x) = .05. This corresponds to z = 1.645. Using the z-score formula and solving for x, we find that the minimum score should be 32.12.
b. Find the value x such that P(X ≥ x) = .01. This corresponds to z = 2.326. Using the z-score formula and solving for x, we find that the minimum score should be 35.07.
c. Q1 = 22.08, Q3 = 27.92.
7.76* Use the normal approximation to the binomial distribution because we clearly meet the requirement that nπ ≥ 5 and n(1 - π) ≥ 5: nπ = 200*.8 = 160, n(1 - π) = 200*.2 = 40.
a. P(X < 150) ≈ P(X ≤ 149.5) = .0317.
b. P(X ≥ 150) ≈ P(X ≥ 149.5) = 1 - .0317 = .9683.
7.77* Use μ = 30 and σ = 5.422.
a. P(X ≥ 25) ≈ P(X ≥ 24.5) = .8448
b. P(X > 40) ≈ P(X ≥ 40.5) = .0264
7.78* Converting the rate from days to years, λ = 73. Let μ = λ = 73 and σ = √λ = 8.544. It is appropriate to use the normal approximation given that λ ≥ 30.
P(X < 60) ≈ P(X ≤ 59.5) = .0570.
7.79 a. P(X < 10,000) = 1 - e^(-λx) = 1 - e^(-(1/10000)(10000)) = 1 - e^(-1) = 0.632120559
b. The distribution is skewed to the right therefore the mean is greater than the median. The probability
calculated above makes sense.
7.80 a. P(X > 6) = e^(-.1*6) = .5488
b. P(X > 12) = e^(-.1*12) = .3012
c. P(X > 24) = e^(-.1*24) = .0907
d. P(6 < X < 12) = (1 - .3012) - (1 - .5488) = .2476
7.81 a. P(X > 15,000) = e^(-λx) = e^(-(1/25000)(15000)) = e^(-0.6) = .5488
b. Ten years is 87,600 hours. If the airplane is flown 25% of the time then that would be 21,900 hours of use. Find P(X < 21900) = 1 - e^(-λx) = 1 - e^(-(1/25000)(21900)) = .5836.
7.82 a. P(X > 50,000) = e^(-λx) = e^(-(1/16667)(50000)) = .0498 and P(X > 50,000) = e^(-λx) = e^(-(1/66667)(50000)) = .472
b. There has been approximately a tenfold increase in the reliability of the engine between 1982 and 1992.
7.83 a. μ = (300 + 350 + 490)/3 = 380.
b. σ = √((300² + 350² + 490² - 300*350 - 300*490 - 350*490)/18) = 40.21
c. P(X > 400) = (490 - 400)²/((490 - 300)(490 - 350)) = .3045.
7.84* a. μ = (50 + 95 + 60)/3 = 68.33.
b. σ = √((50² + 60² + 95² - 50*60 - 50*95 - 60*95)/18) = 9.65
c. P(X < 75) = 1 - (95 - 75)²/((95 - 50)(95 - 60)) = .7460.
d. Shaded area represents the probability.
7.85* a. μ = (500 + 700 + 2100)/3 = 1100.
b. σ = √((500² + 700² + 2100² - 500*700 - 500*2100 - 700*2100)/18) = 355.90
c. P(X > 750) = (2100 - 750)²/((2100 - 500)(2100 - 700)) = .8136.
d. Shaded area represents the probability.
7.86* a. The z scores were, respectively, 5.75, 4.55, 5.55, and 5.45.
b. If the exams scores had a historical mean and standard deviation of 80 and 20 with a normal distribution
then the exam scores reported by the four officers were highly unlikely.
Chapter 8
Sampling Distributions and Estimation
8.1 a. σx̄ = σ/√n = 32/√4 = 16
b. σx̄ = σ/√n = 32/√16 = 8
c. σx̄ = σ/√n = 32/√64 = 4
8.2 a. x̄ ± 1.96σ/√n = 200 ± 1.96(12/√36), or (196.08, 203.92).
b. x̄ ± 1.96σ/√n = 1000 ± 1.96(15/√9), or (990.2, 1009.8).
c. x̄ ± 1.96σ/√n = 50 ± 1.96(1/√25), or (49.608, 50.392).
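Note: The margin of error can be checked with Excel's CONFIDENCE function, for example =CONFIDENCE(0.05,12,36) returns 3.92 for part a.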
8.3 a. x̄ ± 1.96σ = 4.035 ± 1.96(0.005), or (4.0252, 4.0448).
b. x̄ ± 1.96σ/√n = 4.035 ± 1.96(0.005/√25), or (4.03304, 4.03696).
c. In either case, we would conclude that our sample came from a population that did not have a population
mean equal to 4.035.
8.4 a. 1. No, for n = 1 the 100 samples don't represent a normal distribution. 2. The distribution of the sample means becomes more normally distributed as n increases. 3. The standard error becomes closer to that predicted by the CLT the larger the sample becomes. 4. This demonstration reveals that if numerous samples are taken and analyzed we can confirm the CLT. In the real world, based on our notion of the true mean, we can assess this: we can generate the 95% range and determine whether our values are within this range or not. Also, recognize that there is a low probability of this single range not being representative.
b. Same conclusions as part a.
8.5 a. x̄ ± 1.645σ/√n = 14 ± 1.645(4/√5), or (11.057, 16.943).
b. x̄ ± 2.576σ/√n = 37 ± 2.576(5/√15), or (33.675, 40.325).
c. x̄ ± 1.96σ/√n = 121 ± 1.96(15/√25), or (115.12, 126.88).
8.6 Exam 1: x̄ ± 1.96σ/√n = 75 ± 1.96(7/√10), or (70.661, 79.339).
Exam 2: x̄ ± 1.96σ/√n = 79 ± 1.96(7/√10), or (74.661, 83.339).
Exam 3: x̄ ± 1.96σ/√n = 65 ± 1.96(7/√10), or (60.661, 69.339).
The intervals for Exams 1 and 2 overlap, but the Exam 3 interval does not overlap either of them. This suggests that Exam 3 had a different (lower) population mean.
8.7 x̄ ± 1.96σ/√n = 2.475 ± 1.96(0.005/√15), or (2.4725, 2.4775).
8.8 a. x̄ ± t(s/√n) = 24 ± 1.9432(3/√7), or (21.797, 26.203).
b. x̄ ± t(s/√n) = 42 ± 2.8982(6/√18), or (37.901, 46.099).
c. x̄ ± t(s/√n) = 119 ± 2.0518(14/√28), or (113.571, 124.429).
Note: t values are found using the Excel formula =TINV(1-cc, n-1) where cc is the confidence coefficient. For part a this would be =TINV(1-.90, 6).
8.9 a. Appendix D = 2.262, Excel = tinv(.05, 9) = 2.2622
b. Appendix D = 2.602, Excel = tinv(.02, 15) = 2.6025
c. Appendix D = 1.678 ,Excel = tinv(.10, 47) =1.6779
8.10 a. Appendix D = 2.021, Excel = tinv(.05, 40) = 2.0211
b. Appendix D = 1.990, Excel = tinv(.05, 80) = 1.9901
c. Appendix D = 1.984, Excel = tinv(.05, 100) = 1.984
All are fairly close to 1.96.
8.11 a. x̄ ± t(s/√n) = 45.66 ± 2.0860(27.79/√21), or (33.01, 58.31).
b. The confidence interval could be made narrower by increasing the size of the sample or decreasing the confidence level.
8.12 a. x̄ ± t(s/√n) = 19.88 ± 1.753(3.649/√16), or (18.276, 21.474).
Note: t values are found using the Excel formula =TINV(1-cc, n-1) where cc is the confidence coefficient. For part a this would be =TINV(1-.90, 15).
8.13 x̄ ± t(s/√n) = 812.5 ± 3.1058(78.407/√12), or (742.20, 882.80).
8.14 a. x̄ ± t(s/√n) = 24520 ± 2.2622(17541.81/√10), or (11971, 37069).
b. Increase the sample size or decrease the confidence level.
c. It is unclear whether this distribution is normal or not. There appear to be outliers.
8.15 a. 1. x̄ ± t(s/√n) = 85 ± 2.2622(4.3716/√10), or (81.873, 88.127).
2. x̄ ± t(s/√n) = 88.6 ± 2.2622(8.127/√10), or (82.787, 94.414).
3. x̄ ± t(s/√n) = 76 ± 2.2622(3.712/√10), or (73.345, 78.655).
b. Confidence intervals 1 and 2 overlap. The scores on exam 3 are very different than the first two. There
was a decrease in the average exam score on the third exam.
c. Here the standard deviation is not known, so use the t-distribution.
8.16 a. σ = √(nπ(1 - π)) = √(30*.5*.5) = 2.739
b. σ = √(nπ(1 - π)) = √(50*.2*.8) = 2.828
c. σ = √(nπ(1 - π)) = √(100*.1*.9) = 3.0
d. σ = √(nπ(1 - π)) = √(500*.005*.995) = 1.577
Normality is okay except in d (nπ = 2.5 < 10).
8.17 a. 1.96√(π(1 - π)/n) = 1.96√(.5(1 - .5)/250) = .0620
b. 1.96√(π(1 - π)/n) = 1.96√(.5(1 - .5)/125) = .0877
c. 1.96√(π(1 - π)/n) = 1.96√(.5(1 - .5)/65) = .1216
8.18 a. p ± z√(p(1 - p)/n) = .048 ± 1.96√(.048(1 - .048)/500), or (.0293, .0667).
b. Yes, .048*500 = 24, which is larger than 10.
c. The Very Quick Rule would not work well here because p is small.
8.19 a. p ± z√(p(1 - p)/n) = .3654 ± 1.645√(.3654(1 - .3654)/52), or (.2556, .4752).
b. .3654*52 = 19. The normality assumption is met.
8.20 a. p ± z√(p(1 - p)/n) = .419 ± 1.645√(.419(1 - .419)/43), or (.2948, .5424).
b. Given np = 18 we can assume normality.
8.21 a. p ± z√(p(1 - p)/n) = .046 ± 1.96√(.046(1 - .046)/250), or (.0201, .0719).
b. Given np = 11.5 we can assume normality.
8.22 a. p ± z√(p(1 - p)/n) = .48 ± 2.326√(.48(1 - .48)/50), or (.3157, .6443).
b. Yes, np = 24, which is greater than 10.
8.23 a. p ± z√(p(1 - p)/n) = .2353 ± 2.326√(.2353(1 - .2353)/136), or (.1507, .3199).
b. Yes, np = 32, which is greater than 10.
c. n = (z/E)²π(1 - π) = (1.645/.06)²(.2353)(1 - .2353) = 136
n = (z/E)²π(1 - π) = (1.96/.03)²(.2353)(1 - .2353) = 768
d. When the desired error decreases and the desired confidence increases, the sample size must increase.
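Note: For example, =(1.645/0.06)^2*0.2353*(1-0.2353) returns about 135.3 (round up to 136) and =(1.96/0.03)^2*0.2353*(1-0.2353) returns about 768.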
8.24 Using MegaStat:
a. 55
b. 217
c. 865
8.25 Using MegaStat: 25
8.26 Assume a normal distribution. Estimate sigma using σ ≈ (28 - 20)/4 = 2. From Megastat: n = 11.
8.27 Assume a normal distribution. Estimate sigma using σ ≈ (200 - 100)/4 = 25. From Megastat: n = 97.
8.28 Assume a Poisson distribution. σ = √λ = 2.1213. From Megastat: n = 98.
8.29 a. n = (zσ/E)² = (1.96*86.75/25)² = 47
b. We assumed normality and estimated σ = (3450 - 3103)/4 = 86.75.
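Note: For example, =(1.96*86.75/25)^2 returns 46.3, which rounds up to n = 47.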
8.30 a. Using Megastat: 2165. We use π = .5.
b. Sampling method: Perhaps a poll via the Internet.
8.31 a. Using Megastat: 1692. We use π = .5.
b. Sampling method: Mailed survey.
8.32 a. Using Megastat: 601. We use π = .5.
b. Sampling method: Direct observation.
8.33 a. Using Megastat: 2401. We use π = .5.
b. Sampling method: Random sample via telephone or Internet survey.
8.34 a. From Megastat: (-1,243.75, -86.23)
b. (-1,268.2, -61.80) The intervals are similar and show that the true mean difference is less than zero.
c. μ1: (740, 1,462) μ2: (1,246, 2,286) Yes, the intervals overlap.
d. According to the answers from parts a and b, one would conclude that there is a difference in means.
8.35* a. From Megastat: (-1.16302, 0.830302)
b. (-1.15081, 0.79081) The intervals are similar. Assumptions about the variances did not make a big difference.
c. μ1: (7.95, 9.33) μ2: (8.1, 9.54) Yes, the intervals overlap.
d. From all calculations it appears that there is not a significant difference in the two means.
8.36* From Megastat and assuming equal variances, (-181.84, -16.50). Conclude that undergrads tend to pay less.
8.37* From Megastat: (-.1584, .1184). Because zero falls in the interval, we cannot conclude there is a difference in the proportions.
8.38* From Megastat: (.1352, .6148). Because the interval is greater than zero, we can conclude that π1 > π2.
8.39* From Megastat: (.0063, .1937). Because the interval is greater than zero, we can conclude that π1 > π2.
8.40* a. From Megastat: (53.60, 248.72)
b. (81.08, 323.63)
8.41 From Megastat: (1.01338, 1.94627)
(Take the square root of the lower and upper CI values given to get the CI for the standard deviation of the population.)
8.42* From Megastat: (0.731438, 2.321751)
8.43 From Megastat: (5.882, 10.914)
(Take the square root of the lower and upper CI values given to get the CI for the standard deviation of the
population.)
8.44 Students should observe that raw data histograms will show a uniform distribution whereas the histogram of sample means shows a shape less uniform and closer to a normal distribution. The average of the raw data will be equal to the average of the sample means. The population mean is 49.5 so we would expect our data set to have a value close to this. The population standard deviation is 28.58 so we would expect the raw data to have a sample standard deviation close to this. We would also expect the standard deviation of the means to be close to 14.29 (28.58/sqrt(4)). The point of this exercise is to observe that the average of sample means is close to the population mean but the standard deviation of sample means is smaller.
8.45 a. Because the diameter is continuous there will always be slight variation in values from nickel to nickel.
b. From Megastat: (.8332, .8355)
c. The t distribution assumes a normal population, but in practice, this assumption can be relaxed, as long as
the population is not badly skewed. We assume that here.
d. Use n = (zσ/E)² to estimate the sample size. z = 2.576, and so n = 95.
8.46 a. From Megastat: (3.2283, 3.3813)
b. Use n = (zσ/E)² to estimate the sample size. z = 1.645, and so n = 53.
c. The flow of the mixture might be one factor. There are many other possibilities.
8.47 a. From Megastat: (29.4427, 39.6342)
b. The cups may be of different sizes. Having different people take the sample and count can cause a lack of consistency in the sampling process. Different boxes may have different numbers of raisins.
d. A quality control system would increase consistency by monitoring system so producer knows what is
expected from their process and how their process varies. Then producer can work on minimizing
variation by eliminating causes of variation.
8.48 a. From Megastat: (266.76, 426.24)
b. Zero values suggest a left skewed distribution.
c. Use n = (zσ/E)² with z = 2.576 to get n = 1927.
d. Because n > N, increase the desired error. For example, if E = 50 then n = 78.
8.49 a. From Megastat: (19.249, 20.689)
b. Fuel economy can also vary due to tire pressure and weather. There may be more than sampling variability
contributing to differences in sample means.
8.50 a. From Megastat: (7.292, 9.98)
b. There are possible outliers that make the normal distribution questionable.
c. Use n = (zσ/E)² with z = 2.326 to get n = 38.
8.51 a. From Megastat: (33.013, 58.315)
b. With repair costs it is possible the distribution is skewed to the right. Also, the population size is small
relative to the sample size which might cause problems.
c. Use n = (zσ/E)² with z = 1.96 to get n = 119.
d. From Megastat: (21.26, 40.14)
8.52 a. From Megastat: (3,230, 3,326)
b. Use n = (zσ/E)² with z = 1.96 to get n = 91.
c. The line chart shows a decrease in the number of steps over time.
8.53 a. From Megastat: (29.078, 29.982)
b. Normality is a common distribution for height but at younger ages it is possible to see high outliers.
c. Use n = (zσ/E)² with z = 1.96 to get n = 116.
8.54 a. From Megastat: (74.02, 86.81)
b. The sample is somewhat small and the length of the commercial could be a function of the type of time out
called.
8.55 a. From Megastat: (48.515, 56.965)
b. The distribution is more likely to be skewed to the right with a few CDs having very long playing times.
c. Use n = (zσ/E)² with z = 1.96 to get n = 75.
8.56 a. Estimated standard deviation using the uniform approximation σ = √((b - a)²/12) and the normal approximation σ = (b - a)/4, where b is the maximum and a is the minimum of the range:
Uniform Distribution / Normal Distribution
Chromium 0.0635 0.055
Selenium 0.0004 0.00035
0.0043 0.00375
Fluoride 0.0289 0.0250
b. An estimate of the standard deviation is necessary to calculate the sample size needed for a desired error
and confidence level.
8.57 a. From Megastat: (.125, .255)
b. Normality can be assumed. np = 19.
c. Use n = (z/E)²π(1 - π) with z = 1.645 to get n = 463.
d. A quality control manager needs to understand that the sample proportion will usually be different from the
population proportion but that the way the sample proportion varies is predictable.
8.58 a. From Megastat: (.039, .117)
b. Different industries may have different quantities and types of records, especially if they have many
government contracts.
8.59 a. From Megastat: (.092, .134)
b. Yes, np = 69.
c. Use n = (z/E)²π(1 - π) with z = 1.645 to get n = 677.
d. Yes, the results could be very different today. There is a stronger focus on nutrition today than there was
10-15 years ago.
8.60 a. From Megastat: (.176, .294)
b. Normality assumption holds: np = 47 and n(1 - p) = 153.
c. No, the Very Quick Rule suggests that p should be close to .5. In this example, p = .235.
d. n = 304.
e. Frequent sampling of the noodle mix would help the manufacturer identify problems and stay on target.
8.61 a. From Megastat: (.595, .733)
b. No, viewers of the late night program are a unique group of television watchers compared to the rest of the
population. Not all TV watchers stay up late to watch the late night programs.
8.62 a. From Megastat: (.012, .014)
b. The sample size is large enough to make the normal assumption valid.
8.63 a. standard error = √(p(1 - p)/n) = .0127
b. From Megastat: (.121, .171)
c. No, np = 112.
d. VQR: (.1103, .1821)
8.64 a. From Megastat: (.093, .130)
b. Normality assumption holds.
c. VQR: (.0753, .1472). This interval is wider than the one found in part a.
d. This is one batch of popcorn kernels, not a random sample from the food producer. It is not clear that the
person taking the sample used random sampling techniques. We also do not know the age of the popcorn
that was popped.
8.65 a. From Megastat: (.001, .002)
b. No, we would like to know the proportion of all mosquitoes killed, not the proportion of mosquitoes killed
out of bugs killed.
8.66 a. From Megastat: (.616, .824)
b. Yes, the sample size is large enough to use the normal assumption.
c. Contacting the longshoremens union might help the sampling process.
8.67* With normality: (.011, .087). With binomial (.0181, .1031).
8.68* With normality (.002, .402). With binomial (.0433, .4809). Normality is not justified because n is too small.
8.69* a. From Megastat: (.914, 1.006)
b. The upper limit on the confidence interval is greater than 1.
c. Normality is not justified therefore use the binomial approach.
d. From MINITAB: (.8629, .9951)
8.70 a. margin of error = 1.96√(.5(1 - .5)/2277) = .0205
b. From Megastat: (.423, .457)
c. Because the interval falls below .5 we would conclude that it is unlikely 50% of the voters opposed the
signing.
8.71 a. Margin of error = 1.96√(.5(1 - .5)/600) = .04. Assume that p = .5 and a 95% confidence level.
8.72* From Megastat: (10.82, 16.96)
8.73* From Megastat: (-2.51, .11). Because zero falls in the interval we cannot conclude that there is a difference in learning methods.
8.74* a. From Megastat: (.976, 2.13)
b. Assume equal variances and normal populations.
8.75* From Megastat: (-.1063, .0081). Because zero falls in the interval we cannot conclude there is a difference in the two proportions.
8.76* From Megastat: (-.124, .027). Because zero falls in the interval we cannot conclude there is a difference in the two proportions.
Chapter 9
One-Sample Hypothesis Tests
9.1 Graphs should show a normal distribution with a mean of 80.
a. Rejection region in the lower tail.
b. Rejection region in both tails.
c. Rejection region in the upper tail.
9.2 a. .05*1000 = 50 times
b. .01*1000 = 10 times
c. .001*1000 = 1 time
9.3 a. Null hypothesis: the man is not having a heart attack. Type I error: I admit him when he does not have a heart attack. Type II error: I fail to admit him, he does have a heart attack, and dies. It is better to make a Type I than a Type II error, since a Type II error is fatal.
b. Type I error: I reject the null and let them land even though they could have stayed up for 15 minutes (or more). Type II error: I don't let the plane land and the plane runs out of fuel. It is more costly to make a Type II error.
c. Type I error: I reject the null and rush out to Staples, get caught in the snow, and fail to finish the report (when, had I stayed, I would have finished it). Type II error: I run out of ink and can't finish the report. Better to stay and try to finish the report; in fact, better to print out some of it than none of it.
9.4 Costly improvements may be too small to be noticed by customers and they may be unwilling to pay for
the improvement.
9.5 a. Null hypothesis: Employee is not using illegal drugs.
Alternative hypothesis: Employee is using illegal drugs.
b. Type I error: Test is positive for drugs when no drugs are being used by the individual
Type II error: Test is negative for drug use when the person is using drugs.
c. I might dismiss or discipline someone who is a non-drug user (Type I error). They could sue for
wrongful damages. I might keep on someone who should be dismissed and they cause serious injury via
a work related accident to themselves or others (Type II error). A Type II error could have more serious
consequences than a Type I error.
9.6 a. Null hypothesis: There is no fire.
Alternative hypothesis: There is a fire.
b. Type I error: A smoke detector sounds an alarm when there is no fire.
Type II error: A smoke detector does not sound an alarm when there is a fire.
c. Consequence of making a type I error is that some guests will be inconvenienced by a false alarm and
there is the cost of having the fire department summoned. Consequence of making a type II error is that
the hotel will burn down and perhaps kill or injure many.
d. Reducing the risk of a Type II error (β) would increase the likelihood of making a Type I error, increasing the likelihood of a false alarm. Guests of the hotel and perhaps the fire department would be affected.
9.7 a. z = (.25 - .20)/√(.20(1 - .20)/100) = 2.0. From Appendix C: p-value = .046.
b. z = 1.90, p-value = .9713
c. z = 1.14, p-value = .1271
9.8 a. z = (.7 - .6)/√(.6(1 - .6)/80) = 1.83. From Appendix C: p-value = .0339.
b. z = 2.07, p-value = .0384
c. z = 2.33, p-value = .0098
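A minimal Python sketch of the one-sample proportion z test used in 9.7 and 9.8, assuming scipy is available (the numbers are those of 9.8a):

    from math import sqrt
    from scipy.stats import norm

    p, pi0, n = 0.7, 0.6, 80
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)   # test statistic
    print(round(z, 2), round(norm.sf(z), 4))    # 1.83 and about .034, the right-tail p-value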
9.9 a. .30*20 = 6 < 10 and .7*20 = 14 > 10 (cannot assume normality)
b. .05*50 = 2.5 < 10 and .95*50 = 47.5 >10 (cannot assume normality)
c. .10*400 = 40 > 10 and .9*400 = 360 > 10 (normality can be assumed)
9.10 H0: π = .10 versus H1: π > .10. z = 2.67, p-value = .0038 so reject H0 and conclude that the proportion who believe Pepsi is concerned with consumers' health has increased.
9.11 a. H0: π = .997 versus H1: π < .997. Reject the null hypothesis if z < -1.645. z = -1.83 so reject H0.
b. Yes, .997*2880 = 2866 and .003*2880 = 14, both are greater than 10.
c. A Type I error would be sending back an acceptable shipment. This could be a problem if the hospital runs low on insulin syringes. A Type II error would be accepting a bad shipment. This will be a problem if a defective syringe is used for an insulin injection.
d. p-value = .034
e. Reducing α would result in an increase in Type II errors. This type of error would most likely be the one with the more severe consequence; therefore reducing α is not a good idea.
9.12 a. H0: π ≤ .20 versus H1: π > .20. Reject the null hypothesis if z > 1.645. z = 2.52 so reject H0. The proportion of stores that sell cigarettes to underage teens is higher than the goal.
b. 95% CI: (.2084, .3041)
c. A two-tailed test uses α/2 to determine the critical value of the test statistic. A CI uses α/2 to determine the value of the test statistic used in the margin of error calculation. Because these are the same values, a CI can be used to make a conclusion about a two-tailed hypothesis test.
9.13 a. H0: π ≤ .50 versus H1: π > .50. Reject the null hypothesis if z > 1.645. z = 2.0 so reject H0. The proportion of calls lasting more than 2 minutes is greater than .5.
b. p-value = .0228
c. Yes, a difference of 12.5% is important.
9.14 a. H0: π ≥ .052 versus H1: π < .052. Reject the null hypothesis if z < -2.33. z = -1.72, therefore fail to reject the null hypothesis.
b. p-value = .0427
c. nπ0 = 15.6 > 10 so normality is justified.
9.15 H0: π ≤ .50 versus H1: π > .50. Reject the null hypothesis if z > 1.645. z = 6.87 so reject H0.
9.16* a. Normality is not justified. nπ0 = 6 < 10.
b. Using the normal assumption, z = 2.31 and the p-value = .0105.
c. Using the binomial distribution, P(X ≥ 10) = .0193.
d. The normal probability is only an approximation to the binomial probability.
9.17* a. Using the binomial distribution, P(X ≥ 4 | n = 2000, π = .001) = .143. The standard is being met.
b. Because there are only 4 defective items observed we cannot assume normality.
9.18* a. P(X ≥ 2 | n = 20, π = .0005) = .00005. The students at Juilliard do differ from the national norm.
b. Normality cannot be assumed because there were fewer than 10 students in the sample with perfect pitch.
9.19 a. z = (63 - 60)/(8/√16) = 1.5, p-value = .1336
b. z = 2.0, p-value = .0228.
c. z = 3.75, p-value = .0001.
9.20 a. p-value = P(Z > 1.34) = .0901
b. p-value = P(Z < -2.07) = .0192
c. p-value = 2*P(Z < -1.69) = .091
9.21 H0: μ = 2.035 oz versus H1: μ > 2.035 oz. z = 2.50 and p-value = .0062. Reject H0. The mean weight is heavier than it is supposed to be.
9.22 a. H0: μ ≤ 195 flights/hour versus H1: μ > 195 flights/hour. Reject H0 if z > 1.96.
b. z = 2.11 so reject the null hypothesis and conclude that the average number of arrivals has increased. If we had used α = .01, we would have failed to reject the null hypothesis.
c. We have assumed a normal population or at least one that is not badly skewed.
9.23 a. H0: μ = 10 oz versus H1: μ ≠ 10 oz. Reject H0 if z > 1.96 or z < -1.96.
b. z = 0.7835 so we fail to reject the null hypothesis (p = .4333).
c. We assume the population is normally distributed.
9.24 a. Using Excel: TDIST(1.677,12,1) = .0597, fail to reject H0 at α = .05.
b. TDIST(2.107,4,1) = .0514, fail to reject H0 at α = .05.
c. TDIST(1.865,33,2) = .0711, fail to reject H0 at α = .05.
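For readers without Excel, the same one-tailed and two-tailed p-values can be obtained from scipy's t distribution; a sketch assuming scipy is available:

    from scipy.stats import t

    print(t.sf(1.677, 12))       # one tail, about .0597, matches TDIST(1.677,12,1)
    print(2 * t.sf(1.865, 33))   # two tails, about .0711, matches TDIST(1.865,33,2)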
9.25 a. t = (203 - 200)/(8/√16) = 1.5, p-value = .1544. Fail to reject the null hypothesis.
b. t = 2.0, p-value = .0285. Reject the null hypothesis.
c. t = 3.75, p-value = .0003. Reject the null hypothesis.
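A sketch of the full calculation for 9.25a in Python (statistic plus two-tailed p-value):

    from math import sqrt
    from scipy.stats import t

    xbar, mu0, s, n = 203, 200, 8, 16
    tcalc = (xbar - mu0) / (s / sqrt(n))   # 1.5
    print(2 * t.sf(abs(tcalc), n - 1))     # about .1544 with 15 d.f.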
9.26 a. H0: μ ≥ 530 bags/hour versus H1: μ < 530 bags/hour. Reject H0 if t < -1.753. t = -1.60 so fail to reject the null hypothesis.
b. When problems arise there could be an inordinately low number of bags processed. This would create a
skewed distribution.
9.27 a. H0: μ ≥ 400 sf/gal versus H1: μ < 400 sf/gal. Reject H0 if t < -1.476. t = -1.98, therefore reject H0.
b. Yes, if α were less than or equal to .05 our decision would be different.
c. p-value = .0525. Because .0525 < .10, we would reject the null hypothesis.
d. A significant result in a hypothesis does not always translate to a practically important difference. In this
case, if the painter plans his or her paint purchase based on coverage of 400 square feet per gallon, but in
reality the paint covers 5% less, the painter may run short on large jobs. A difference of 5% may not
matter on a small job.
9.28 a. H0: μ ≥ 18 oz versus H1: μ < 18 oz. Reject H0 if t < -1.74. t = -2.28, therefore reject H0.
b. Yes, if α = .01, we would have failed to reject the null hypothesis.
c. p-value = .018. Because the p-value < .05 we reject the null hypothesis.
9.29 a. H0: μ ≥ 19 minutes versus H1: μ < 19 minutes. Reject H0 if t < -1.729. t = -2.555, therefore reject H0.
d. p-value = .0097. Because the p-value < .05 we reject the null hypothesis.
9.30 a. H0: μ ≤ 30,000 miles versus H1: μ > 30,000 miles. Reject H0 if t > 1.325. t = 1.53, therefore reject H0.
This dealer shows a significantly greater mean number of miles than the national average for two year
leases.
9.31 a. H0: μ = 3.25 versus H1: μ ≠ 3.25. Reject H0 if t > 2.11 or t < -2.11. t = 1.70, therefore fail to reject H0.
b. The 95% confidence interval is: (3.2257, 3.4743). Because this interval does contain 3.25 we would fail
to reject the null hypothesis.
c. We constructed the 95% confidence interval using the t statistic associated with .025 in the tail areas. This is the same area we used to determine the critical value for the t statistic. A two-tailed test and a confidence interval should always result in the same conclusion as long as α is the same for both.
9.32* a. Power = .4622
b. Power = .7974
c. Power = .9459
9.33* a. Power = .3404
b. Power = .7081
c. Power = .9107
When you decrease α, the power will decrease as shown above.
9.34* a. Power = .0924
b. Power = .3721
c. Power = .7497
Using Learning Stats Excel file 09-08 PowerCurvesDIY.xls (power curve charts omitted).
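The power values above come from the Learning Stats worksheet; as an illustrative sketch only, the power of a right-tailed z test can also be computed directly. The inputs mu0, mu1, sigma, and n in the example call are hypothetical placeholders, not the exercise's data.

    from math import sqrt
    from scipy.stats import norm

    def power_right_tail(mu0, mu1, sigma, n, alpha=0.05):
        # Power = P(reject H0 | true mean is mu1) for H1: mu > mu0
        zcrit = norm.ppf(1 - alpha)
        return norm.sf(zcrit - (mu1 - mu0) / (sigma / sqrt(n)))

    print(power_right_tail(80, 82, 8, 25))   # about .35 with these made-up values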
9.35 a. Power = .2595
b. Power = .6388
c. Power = .9123
(Power curve charts from Learning Stats omitted.)
9.36* H0: σ² ≥ 24 versus H1: σ² < 24. Reject the null hypothesis if χ² > 16.92 or χ² < 3.325. χ² = 6.0, therefore we fail to reject H0.
9.37* H0: σ² ≤ 1.21 versus H1: σ² > 1.21. Reject the null hypothesis if χ² > 28.87. χ² = 29.16, therefore we reject H0.
9.38* H0: σ² = 0.01 versus H1: σ² ≠ 0.01. Reject the null hypothesis if χ² > 27.49 or χ² < 6.262. χ² = 14.65, therefore we fail to reject H0.
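The chi-square critical values in 9.38 can be reproduced with scipy; a sketch assuming n = 16 (so 15 d.f.) and α = .05 split between the tails:

    from scipy.stats import chi2

    df = 15
    print(chi2.ppf(0.975, df))   # 27.49, upper critical value
    print(chi2.ppf(0.025, df))   # 6.262, lower critical value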
9.39* H0: σ² = 625 versus H1: σ² ≠ 625. Reject the null hypothesis if χ² > 21.92 or χ² < 3.816. χ² = 8.26, therefore we fail to reject H0.
9.40 a. P(Type II error) = 0.
b. This is bad policy because the chance of making a Type I error is uncontrolled.
9.41 a. P(Type I error) = 0.
b. This is bad policy because the chance of making a Type II error is uncontrolled.
9.42 a. H0: μ ≤ 90 versus H1: μ > 90.
b. A Type I error occurs when the physician concludes a patient has high blood pressure when they do not. A Type II error occurs when the physician concludes that a patient's blood pressure is OK when it is too high.
c. A Type II error would have a more serious consequence. The patient could have severe health problems
if high blood pressure is undiagnosed.
9.43 a. H0: User is authorized versus H1: User is unauthorized.
b. Type I error occurs when the scanner fails to admit an authorized user. Type II error occurs when the
scanner admits an unauthorized user.
c. A Type II error has a more serious consequence. Allowing entry to an unauthorized user could result in
damage to the plant or possibly even a terrorist attack.
9.44 P(Type II error) = 0. We've rejected the null hypothesis; therefore it is impossible to make a Type II error.
9.45 P(Type I error) = 0. There can be no Type I error if we fail to reject the null hypothesis.
9.46 a. H0: A patient does not have cancerous cells versus H1: A patient has cancerous cells. A false negative is a Type II error and means that the test shows no cancerous cells are present when in fact there are. A false positive is a Type I error and means that the test shows cancerous cells are present when they are not.
b. In this case null stands for absence.
c. The patient bears the cost of a false negative. If their health problems are not diagnosed early they will
not seek treatment. The insurance company bears the costs of a false positive. Typically more tests will
need to be done to check the results.
9.47 a. H0: A patient does not have an infected appendix versus H1: A patient does have an infected appendix. A
Type I error occurs when a healthy appendix is removed. A Type II error occurs when an infected
appendix goes undetected. The consequences of a Type I error include all the risks one is subjected to
when undergoing surgery as well as the cost of an unnecessary operation. The consequences of a Type II
error include a ruptured appendix which can cause serious health issues.
b. Type II error rates are high because diagnosing appendicitis is actually quite difficult. Type I error rates
are high because the consequences of not removing an infected appendix are very serious.
9.48 a. Type I error: You should have been accepted, but the scanner rejected you. Type II error: You should
have been rejected, but the scanner accepted you.
b. The consequence of falsely rejecting someone is not as severe as falsely accepting someone. Or it could
be that the scanner is dirty and cannot read the fingerprint accurately.
9.49 The likelihood of the PSA test result showing positive for cancer is 25%. The patient who is told he has
cancer as well as his family is affected. Most likely, with an error rate this high, the physician would
perform a second test to verify the results.
9.50 This is the probability of making a Type I error. It means that half of the women who do not have cancer will initially be told that they do, not that half of the women tested have cancer.
9.51 a. A two-tailed test would be used. You would not want to overfill or under-fill the can.
b. Overfilling costs you money and under-filling cheats the customer.
c. Because the weight is normally distributed and the population standard deviation is known the sample
mean will have a normal distribution.
d. Reject the null hypothesis if z > 2.575 or z < -2.575.
9.52 a. Because the population distribution is normal and you know the population standard deviation, you
should use the normal distribution for the sampling distribution on the sample mean.
b. H0: μ = 520 versus H1: μ ≠ 520. Reject H0 if z > 1.96 or z < -1.96.
c. z = 5.0 therefore reject the null hypothesis. The sample result is highly significant showing there is a
difference in the mean fill.
9.53 a. H0: μ ≥ 90 versus H1: μ < 90.
b. t = (X̄ - μ0)/(s/√n). Reject H0 if t < -2.998.
c. t = (88.375 - 90)/(4.984/√8) = -0.92. Because -0.92 > -2.998, we fail to reject the null hypothesis. The sample does not give enough evidence to reject Bob's claim that he is a 90+ student.
d. We assume that the population distribution is normal.
e. The p-value = .1936. Because .1936 > .01 we fail to reject the null hypothesis.
9.54 a. H0: μ ≤ 10 pages versus H1: μ > 10 pages. Reject H0 if t > 2.441. t = 5.90 so reject the null hypothesis and conclude that the true mean is greater than 10 pages.
b. The p-value ≈ 0 so we would reject the null hypothesis.
9.55 a. H0: μ ≥ 2.268 grams versus H1: μ < 2.268 grams. Reject H0 if t < -1.761. t = -1.79 so reject the null hypothesis and conclude that the true mean is less than 2.268 grams.
b. With use, the metal could erode slightly so that the average weight is less than the newly minted dimes.
9.56 a. H0: π ≤ .50 versus H1: π > .50. Reject H0 if z > 1.282. z = 2.07 so reject the null hypothesis and conclude that the true proportion is greater than .5.
b. p-value = .0194 so we would reject the null hypothesis. The coin is biased towards heads.
9.57 a. H0: π ≤ .10 versus H1: π > .10. Reject H0 if z > 1.645. z = 2.00 so reject the null hypothesis and conclude that the true proportion is greater than .1.
b. Yes, if α were less than .0228, our decision would be different.
c. p-value = .0228. Conclude that more than 10% of all one-dollar bills have something extra written on
them.
9.58 a. H0: π ≤ .25 versus H1: π > .25. Reject H0 if z > 1.645. z = 1.39 so fail to reject the null hypothesis.
b. This is not a close decision.
c. We assume a normal distribution for the sample statistic, p. This makes sense because both nπ > 10 and n(1 - π) > 10.
9.59 a. H0: π ≤ .05 versus H1: π > .05. Reject H0 if z > 1.96. z = 1.95 so we fail to reject the null hypothesis at the .025 level of significance. The standard is not being violated.
b. p-value = .0258. .0258 > .025 therefore fail to reject the null hypothesis. This decision is very close.
9.60 a. H0: μ ≤ 30 years versus H1: μ > 30 years. Reject H0 if t > 1.796. t = 3.10 so reject the null hypothesis and conclude that the true mean age is greater than 30 years.
b. The sample mean was 33.92. This difference is probably unimportant.
c. p-value = .0051 which is much smaller than .05 so the result is statistically significant.
9.61 a. H0: π ≤ .10 versus H1: π > .10. Reject H0 if z > 1.645. z = 1.11 so fail to reject the null hypothesis. We do not have strong evidence to conclude that more than 10% of all flights have contaminated drinking water.
b. p-value = .1327.
9.62 H0: π ≤ .95 versus H1: π > .95. Reject H0 if z > 1.96. z = 2.05 so reject the null hypothesis and conclude that the true proportion is greater than .95. The company is exceeding its goal.
9.63 a. H0: π ≥ .50 versus H1: π < .50. Reject H0 if z < -1.645. z = -2.07 so reject the null hypothesis and conclude that the true proportion is less than .5.
b. p-value = .0193. .0193 < .05 therefore we would reject the null hypothesis.
c. The sample proportion was .46. This is a difference of 4%. This is an important difference. There are
thousands of college athletes in the US. Increasing the graduation rate for college athletes is a goal that
many universities are striving for today.
9.64 a. H0: μ ≤ $250 versus H1: μ > $250. Reject H0 if t > 1.711. t = 1.64 so we fail to reject the null hypothesis. It does not appear that the average out-of-pocket expense is greater than $250.
b. The decision is fairly close.
9.65 a. 95% CI (.173, .2684)
b. This sample is consistent with the hypothesis that no more than 25% of hams are underweight.
H0: π ≤ .25 versus H1: π > .25. However, if the goal were stated as having less than 25% of the hams underweight, the set of hypotheses would be: H0: π ≥ .25 versus H1: π < .25. In this case, the sample would not support the goal.
c. A confidence interval is equivalent to a two-tailed test because the critical value of the test statistic used
in the hypothesis test is the same value used to calculate the margin of error in the confidence interval.
9.66 H0: μ ≤ 5 days versus H1: μ > 5 days. Reject H0 if t > 1.796. t = 0.10 so we fail to reject the null hypothesis. It does not appear that the average repair time is longer than 5 days, so the goal is being met.
9.67 a. H0: μ ≤ 300 rebounds versus H1: μ > 300 rebounds. Reject H0 if t > 2.201. t = 0.204 so we fail to reject the null hypothesis. It does not appear that the average number of rebounds is greater than 300.
b. There may be outliers in the population of NBA players.
9.68 a. H0: μ = 1.223 kg versus H1: μ ≠ 1.223 kg. Reject H0 if t > 2.201 or t < -2.201.
b. t = 0.33 so we fail to reject the null hypothesis. It does not appear that the mean weight is different from
1.223 kg.
9.69* a. P(X ≥ 3 | n = 100, π = .01) = .0794. Because .0794 > .025 we fail to reject the null hypothesis.
b. The p-value is .0794. This sample does not contradict the automakers claim.
9.70* a. P(X ≥ 2 | n = 36, π = .02) = .1618. Because .1618 > .10 we fail to reject the null hypothesis.
b. p-value = .1618. This sample does not show that the standard is exceeded.
9.71* H0: π ≤ .50 versus H1: π > .50. Let n = 16 and x = 10. Find P(X ≥ 10 | n = 16, π = .5) = .2272. Because .2272 > .10, we cannot conclude that more than 50% feel better with the experimental medication.
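The exact binomial p-value in 9.71 can be checked directly; a sketch in Python, where binom.sf(9, ...) gives P(X ≥ 10):

    from scipy.stats import binom

    print(binom.sf(9, 16, 0.5))   # P(X >= 10 | n = 16, pi = .5) = .2272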
9.72 H0: π ≥ .10 versus H1: π < .10. P(X = 0 | n = 31, π = .1) = .0382. Because .0382 < .10, we can reject the null hypothesis. It appears that the on-time percentage has fallen.
9.73* a. From MINITAB: 95% confidence interval is (0, .0154).
b. A binomial distribution should be used because the number of successes in the sample is 0, which is less than 10.
c. Yes, this sample shows that the proportion of patients who experience restenosis is less than 5%.
9.74 a. The p-value is .042. A sample proportion as extreme would occur by chance about 42 times in 1,000
samples if in fact the null hypothesis were true. This is fairly convincing evidence that the drug is
effective.
b. A p-value of .087 is approximately twice .042. This sample is less convincing of the effectiveness of the
drug.
9.75 a. The p-value tells us the chance of making this particular sample observation if in fact the null hypothesis is true. A small p-value says that there is a very small chance of making this sample observation assuming the null hypothesis is true; therefore, our assumption about the null hypothesis must be false.
9.76* Using the worksheet 09-08 PowerCurvesDIY.xls (power curve chart omitted).
9.77* Using the worksheet 09-08 PowerCurvesDIY.xls (power curve charts omitted).
9.78* H0: σ² = 64 versus H1: σ² ≠ 64. Reject the null hypothesis if χ² > 39.36 or χ² < 12.40. χ² = 24.68, therefore we fail to reject H0.
9.79* a. H0: μ ≤ 106 versus H1: μ > 106. Reject the null hypothesis if t > 2.807. t = 131.04 so reject the null hypothesis. The mean brightness is considerably greater than 106.
b. H0: σ² ≥ .0025 versus H1: σ² < .0025. Reject the null hypothesis if χ² < 9.26. χ² = 12.77, therefore we would fail to reject the null hypothesis. This sample does not provide evidence that the variance is less than .0025.
9.80 Answers will vary but should consider the following points:
a. The null hypothesis is that the patient's cholesterol is less than the threshold of treatable hypercholesterolemia. The alternative is that the patient's cholesterol is greater than the threshold of treatable hypercholesterolemia. A Type I error is a false positive; we reject the null when it is true. A Type II error is a false negative; we fail to reject the null hypothesis when the null is false.
b. Discussion should focus on the costs borne by the doctor for a false negative vs. costs borne by patient
for living with a false positive (both financial as well as psychological.)
c. Patient wants to minimize a Type I error. Doctor or HMO want to minimize a Type II error.
d. Discussion could include proper diet, American fast food culture, the movie Supersize Me, choice of food, an individual's right to eat what they want, the responsibility (or not) of businesses to offer alternative foods to help lower cholesterol, and the responsibility of individuals with respect to food choices.
Chapter 10
Two-Sample Hypothesis Tests
10.1 For each problem, the following formulas were used:
zcalc = (p1 - p2)/√[pc(1 - pc)(1/n1 + 1/n2)], where pc = (x1 + x2)/(n1 + n2) = (combined number of successes)/(combined sample size)
a. Standard error: .0987
Z Test Statistic: 2.43
p-value: 0.0075
Z Critical: -2.3263
Decision is not close: reject H0
b. Standard error: .0884
Z Test Statistic: 2.26
p-value: .0237
Z Critical: +/- 1.645
Decision is not close: reject H0
c. Standard error: .07033
Z Test Statistic: 1.7063
p-value: 0.0440
Z Critical: -1.645
Decision is close: reject H0
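A minimal Python sketch of the pooled two-proportion z test defined above; the counts in the example call are hypothetical, chosen only to show the mechanics:

    from math import sqrt
    from scipy.stats import norm

    def two_prop_z(x1, n1, x2, n2):
        p1, p2 = x1 / n1, x2 / n2
        pc = (x1 + x2) / (n1 + n2)                  # pooled proportion
        se = sqrt(pc * (1 - pc) * (1/n1 + 1/n2))    # pooled standard error
        return (p1 - p2) / se

    z = two_prop_z(40, 100, 25, 100)                # hypothetical counts
    print(z, 2 * norm.sf(abs(z)))                   # statistic and two-tailed p-value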
10.2 For each problem, the same formulas as in 10.1 are used:
zcalc = (p1 - p2)/√[pc(1 - pc)(1/n1 + 1/n2)], where pc = (x1 + x2)/(n1 + n2) = (combined number of successes)/(combined sample size)
a. Standard error: .0555
Z Test Statistic: 1.4825
p-value: 0.1382
Z Critical: +/- 1.9600
Decision is not close: fail to reject H0
b. Standard error: .0618
Z Test Statistic: 2.162
p-value: .0153
Z Critical: 2.3263
Decision is not close: reject H0
c. Standard error: .01526
Z Test Statistic: 1.638
p-value: .0507
Z Critical: 1.645
Decision is close: fail to reject H0
10.3 a. Define π1 = proportion of shoppers that paid by debit card in 1999. Define π2 = proportion of shoppers that paid by debit card in 2004.
H0: π1 = π2 versus H1: π1 < π2. This is a left-tailed test. Reject the null hypothesis if z < -2.33.
b. z = -2.28 so we fail to reject the null hypothesis (although the decision is close). The sample does not provide strong enough evidence to conclude that there is a difference in the two proportions.
c. p-value = .0113.
d. Normality is assumed since n1p1 > 10 and n2p2 > 10.
10.4 a. Define π1 = proportion of loyal mayonnaise purchasers. Define π2 = proportion of loyal soap purchasers.
H0: π1 = π2 versus H1: π1 ≠ π2. This is a two-tailed test. Reject the null hypothesis if z < -1.96 or z > 1.96. z = 1.725, therefore we fail to reject the null hypothesis. The sample evidence does not show a significant difference in the two proportions.
b. 95% confidence interval: (-.015, .255). Yes, the interval does contain zero.
10.5 a. Define π1 = proportion of respondents in the first group (the group given the gift certificate). Define π2 = proportion of respondents in the second group.
H0: π1 = π2 versus H1: π1 ≠ π2. This is a two-tailed test. Reject the null hypothesis if z < -1.96 or z > 1.96. z = 2.021, therefore we reject the null hypothesis. The sample shows a significant difference in response rates.
b. 95% confidence interval: (.0013, .0787). No, the interval does not contain zero. We estimate that the response rate for the group given the gift certificate is higher than for the group that did not receive the gift certificate.
10.6 a. Define π1 = proportion of flights with contaminated water in August and September 2004. Define π2 = proportion of flights with contaminated water in November and December 2004.
H0: π1 = π2 versus H1: π1 < π2. Reject the null hypothesis if z < -1.645. z = -1.1397 so we fail to reject the null hypothesis. The level of contamination was not lower in the first sample.
b. p-value: .1272
c. From the public health perspective, importance outweighs significance. Our sample information did not allow us to conclude that the contamination proportion has gone down after sanitation improvements.
d. Yes, normality is assumed because both n1p1 > 10 and n2p2 > 10.
10.7 a. Survival rates: 28/39 = .72 and 50/53 = .94, respectively. Reject the null hypothesis that the survival rates are equal if z < -1.28. z = -2.975 so we reject the null hypothesis. The survival rate for people with pets is higher than for those without pets.
b. In the second sample, n2(1 - p2) < 10.
c. It is not clear that owning a pet is the direct cause of longer survival. There may be underlying causes
that contribute to longer survival that were not identified in the study.
10.8 a. H0: πM = πW versus H1: πM ≠ πW. Reject the null hypothesis if z < -1.645 or z > 1.645.
b. pM = .60 and pW = .6875
c. z = -0.69, p-value = .492. The sample does not show a significant difference in proportions.
d. Normality can be assumed because both n1p1 > 10 and n2p2 > 10.
10.9 a. H0: πB = πC versus H1: πB ≠ πC. Reject the null hypothesis if z < -1.96 or z > 1.96. z = 0.669 so we fail to reject the null hypothesis. This sample does not give enough evidence to conclude that the proportions are different.
b. Normality cannot be assumed because n2p2 < 10.
10.10 a. H0: π2 - π1 ≤ .05 versus H1: π2 - π1 > .05. Reject the null hypothesis if z > 1.28.
b. z = 1.14 and the p-value = .1272 so we fail to reject the null hypothesis. The percentage of shoppers paying by debit card did not increase by 5%.
10.11 a. H0: π1 - π2 ≤ .10 versus H1: π1 - π2 > .10. Reject the null hypothesis if z > 1.645.
b. z = .63 and the p-value = .2644 so we fail to reject the null hypothesis. The proportion of calls lasting at least five minutes has not decreased by 10%.
10.12 a. H0: π1 - π2 ≤ .20 versus H1: π1 - π2 > .20. Reject the null hypothesis if z > 1.96.
b. z = 3.006 and the p-value = .0013 so we reject the null hypothesis. The response rate did increase by at least 20%.
10.13 Use the following formula for each test in (a)-(c), substituting the appropriate values:
tcalc = (X̄1 - X̄2)/√[sp²(1/n1 + 1/n2)], where sp² = [(n1 - 1)s1² + (n2 - 1)s2²]/[(n1 - 1) + (n2 - 1)]
a. d.f.: 28
Standard error: 0.0931
t-calculated: -2.1483
p-value: 0.0202
t-critical: -2.0484
Decision: Reject
Formula for p-value: =TDIST(ABS(-2.1483),28,2)
b. d.f.: 39
Standard error: 1.8811
t-calculated: -1.5948
p-value: .1188
t-critical: +/- 2.0227
Decision: Not Reject
Formula for p-value: =TDIST(ABS(-1.5948),39,2)
c. d.f.: 27
Standard error: 1.0335
t-calculated: 1.9351
p-value: 0.0318
t-critical: 1.7033
Decision: Reject
Formula for p-value: =TDIST(ABS(1.9351),27,1)
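A sketch of the pooled-variance t statistic from the formula above, written as a Python helper that takes summary statistics (any example inputs would be hypothetical):

    from math import sqrt

    def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
        # pooled variance and t statistic with n1 + n2 - 2 d.f.
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
        return (xbar1 - xbar2) / sqrt(sp2 * (1/n1 + 1/n2)), n1 + n2 - 2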
10.14 Use the following formulas for each test in (a)-(c), substituting the appropriate values:
tcalc = (X̄1 - X̄2)/√(s1²/n1 + s2²/n2), with d.f. = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]
a. d.f.: 24
Standard error: 0.0931
t-calculated: -2.1483
p-value: 0.0210
t-critical: -2.0639
Decision: Reject
Formula for p-value: =TDIST(ABS(-2.1483),24,2)
b. d.f.: 32
Standard error: 1.9275
t-calculated: -1.5564
p-value: .1294
t-critical: +/- 2.0369
Decision: Not Reject
Formula for p-value: =TDIST(ABS(-1.5564),32,2)
c. d.f.: 23
Standard error: 1.0403
t-calculated: 1.9226
p-value: 0.0335
t-critical: 1.7139
Decision: Reject
Formula for p-value: =TDIST(ABS(1.9226),23,1)
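The Welch-Satterthwaite d.f. formula above is tedious by hand; a minimal sketch of it in Python:

    def welch_df(s1, n1, s2, n2):
        # degrees of freedom for the unequal-variances t test
        v1, v2 = s1**2 / n1, s2**2 / n2
        return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))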
10.15 a. H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Reject the null hypothesis if t < -1.677 or t > 1.677 (48 df). t = 0.7981 so we fail to reject the null hypothesis. There is no difference in the average length of stay between men and women pneumonia patients.
b. The p-value = .4288.
10.16 a. H0: μexped = μexplo versus H1: μexped < μexplo. Reject the null hypothesis if t < -2.552 (18 df). t = -3.704 so we reject the null hypothesis. The average MPG is lower for the Expedition than the Explorer.
b. The p-value = .0008.
10.17 a. H0: μ1 ≤ μ2 versus H1: μ1 > μ2. Reject the null hypothesis if t > 2.462 (29 df). t = 1.902 so we fail to reject the null hypothesis. The average purchase amount when the music is slow is not significantly greater than when the music is fast.
b. The p-value = .0336.
10.18 H0: μ1 ≤ μ2 versus H1: μ1 > μ2. Reject the null hypothesis if t > 2.074 (22 df). t = 2.26 so we reject the null hypothesis. The average shoe size appears to have increased.
10.19 H0: μ1 ≥ μ2 versus H1: μ1 < μ2. Reject the null hypothesis if t < -2.145. t = -2.22 so we reject the null hypothesis. The sample data show a significant decrease in the average number of migraines each month when using Topiramate.
10.20 a. Define the difference as New - Old. H0: μd ≤ 0 versus H1: μd > 0. Reject the null hypothesis if t > 1.833 (9 df). t = 2.03 so we reject the null hypothesis. The new battery shows significantly greater average hours of charge.
b. The decision is not close. The p-value is .0363 which is less than .05.
c. Yes, this is an important difference. The sample showed a difference of 5 hours.
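A sketch of a paired t test like the one in 10.20 using scipy; the two lists are hypothetical data (the exercise data are not reproduced here), and the two-tailed p-value is halved for the one-sided H1: μd > 0.

    from scipy.stats import ttest_rel

    new = [43, 48, 51, 46, 50, 49, 47, 52, 45, 48]   # hypothetical pairs
    old = [41, 44, 49, 44, 46, 47, 44, 50, 42, 45]
    tstat, p_two = ttest_rel(new, old)
    print(tstat, p_two / 2)   # one-sided p-value for H1: mu_d > 0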
10.21 a. Define the difference as Daughter's Height - Mother's Height. H0: μd ≤ 0 versus H1: μd > 0. Reject the null hypothesis if t > 1.943 (6 df). t = 1.93 so we fail to reject the null hypothesis. There is not a significant difference in height between mothers and daughters.
b. The decision is close. The p-value is .0509 which is slightly greater than .05.
c. A daughter's height is affected by her father's height as well as her grandparents'. Nutrition also plays a role in a person's development.
10.22 a. Define the difference as Old - New. H0: μd ≤ 0 versus H1: μd > 0. Reject the null hypothesis if t > 2.132 (4 df). t = 2.64 so we reject the null hypothesis. The new method shows a significantly faster average.
b. The decision is not close. The p-value is .0287 which is less than .05.
10.23 a. Define the difference as No Late Fee - Late Fee. H0: μd ≤ 0 versus H1: μd > 0. Reject the null hypothesis if t > 1.383 (9 df). t = 2.86 so we reject the null hypothesis. The average number of rentals has increased.
b. The decision is not close. The p-value is .0094 which is less than .10.
c. Yes, this is an important difference. The sample showed an average increase of 2 rentals per month
which is a 20% increase. This means more revenue for the store.
10.24 a. Define the difference as Daughter - Mother. H0: μd ≤ 0 versus H1: μd > 0. Reject the null hypothesis if t > 2.718. t = 3.17 so we reject the null hypothesis. The average shoe size of a daughter is greater than her mother's.
b. The decision is not close. The p-value is .0045 which is less than .01.
c. Not sure if this is an important distinction. The sample showed a difference of less than a whole shoe
size.
d. In general, adults are showing a trend of increasing size.
10.25 Define the difference as Entry - Exit. H0: μd = 0 versus H1: μd ≠ 0. Reject the null hypothesis if t > 3.499 or t < -3.499 (7 df). t = 1.71 so we fail to reject the null hypothesis. There is no difference between the number of entry failures and exit failures. The decision is not close. The p-value is .1307, which is much greater than .01.
10.26 a. H0: σ1² = σ2² versus H1: σ1² ≠ σ2². Reject H0 if F > 4.76 or F < .253 (ν1 = 10, ν2 = 7). F = 2.54 so we fail to reject the null hypothesis.
b. H0: σ1² = σ2² versus H1: σ1² < σ2². Reject H0 if F < .264 (ν1 = 7, ν2 = 7). F = .247 so we reject the null hypothesis.
c. H0: σ1² = σ2² versus H1: σ1² > σ2². Reject H0 if F > 2.80 (ν1 = 9, ν2 = 12). F = 19.95 so we reject the null hypothesis.
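The F critical values in 10.26a can be reproduced with scipy; a sketch assuming α = .05 split between the two tails:

    from scipy.stats import f

    df1, df2 = 10, 7
    print(f.ppf(0.975, df1, df2))   # about 4.76, upper critical value
    print(f.ppf(0.025, df1, df2))   # about 0.253, lower critical value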
10.27 a. H0: μ1 ≥ μ2 versus H1: μ1 < μ2. Reject the null hypothesis if t < -1.86 (8 df). t = -4.29 so we reject the null hypothesis. The sample provides evidence that the mean sound level has been reduced with the new flooring.
b. H0: σ1² = σ2² versus H1: σ1² ≠ σ2². Reject H0 if F > 9.60 or F < .104 (ν1 = 4, ν2 = 4). F = .6837 so we fail to reject the null hypothesis. The variance has not changed.
10.28 H0: σ1² = σ2² versus H1: σ1² < σ2². Reject H0 if F < .3549 (ν1 = 11, ν2 = 11). F = .103 so we reject the null hypothesis. The new drill has a reduced variance.
10.29 a. H0: μ1 ≤ μ2 versus H1: μ1 > μ2. Reject the null hypothesis if t > 1.714 (11 df). t = 3.163 so we reject the null hypothesis. The sample provides evidence that the mean weight of an international bag is greater than that of a domestic bag.
b. H0: σ1² ≤ σ2² versus H1: σ1² > σ2². Reject H0 if F > 2.65 or F < .33 (ν1 = 9, ν2 = 14). F = 6.74 so we reject the null hypothesis. The variance of international bag weight is greater than that of domestic bag weight.
10.30 a. H0: π1 = π2 versus H1: π1 > π2.
b. Reject the null hypothesis if z > 2.326.
c. p1 = .0001304, p2 = .0000296, z = 2.961
d. We reject the null hypothesis. The sample evidence shows a significant difference in the two proportions.
e. The p-value = .0031. This result is not due to chance.
f. Yes, accidents have severe consequences therefore even small reductions make a difference.
g. The normality assumption is questionable because there were only 4 accidents observed with the yellow
fire trucks.
10.31 a. H0: π1 = π2 versus H1: π1 > π2.
b. Reject the null hypothesis if z > 1.645.
c. p1 = .980, p2 = .93514, z = 4.507
d. We reject the null hypothesis. The sample evidence shows a significant difference in the two proportions.
e. The p-value ≈ .0000. This result is not due to chance.
f. Normality assumption is valid because both n1p1 > 10 and n2p2 > 10.
10.32 a. H0: π1 = π2 versus H1: π1 > π2. Reject the null hypothesis if z > 1.645.
b. p1 = .169, p2 = .1360.
c. z = 2.98, p-value = .003. Reject the null hypothesis.
d. The increase is most likely due to an increase in women executives and an increased awareness of the
benefit of having more diverse boards.
10.33 a. p1 = .17822, p2 = .143.
b. z = 1.282, p-value = .2000. Because the p-value is greater than .05, we fail to reject the null hypothesis.
There is not enough evidence in this sample to conclude that there is a difference in the proportion of
minority men (out of all males) and minority women (out of all females) on Fortune 100 boards.
10.34 a. H0: π1 = π2 versus H1: π1 > π2. Reject the null hypothesis if z > 1.28. p1 = .40, p2 = .3333. z = .4839, p-value = .3142. We fail to reject the null hypothesis.
b. Yes, the normality assumption is valid.
c. Early finishers might know the material better and finish faster. On the other hand, if a student has not studied they might quickly write an answer down and turn in their exam just to get it over with.
10.35 a. H0: π1 = π2 versus H1: π1 ≠ π2. Reject the null hypothesis if z > 2.576 or z < -2.576. z = 2.506 so we fail to reject the null hypothesis. The decision is very close.
b. The p-value = .0122.
c. Normality assumption is valid because both n1p1 > 10 and n2p2 > 10.
d. Gender differences may imply different marketing strategies.
10.36 a. H0: π1 = π2 versus H1: π1 > π2.
b. z = 9.65, p-value ≈ 0.
c. Normality assumption is valid.
d. Yes, this difference is quite important because the safety of children is involved.
10.37 a. H0: π1 ≥ π2 versus H1: π1 < π2. Reject the null hypothesis if z < -2.33. z = -8.003 so we reject the null hypothesis.
b. p-value = .0000. This is less than .01 so the difference is quite significant.
c. Normality can be assumed.
10.38. a. pE = .1842, pW = .2580.
b. H0: πE = πW versus H1: πE ≠ πW. Reject the null hypothesis if z > 1.96 or z < -1.96. z = -2.46 so we reject the null hypothesis and conclude that a greater proportion of large gloves is sold on the west side of Vail.
c. There could be a different type of skier on the east side of Vail, perhaps more children ski on the east
side as opposed to the west side.
10.39 a. H0: π1 ≤ π2 versus H1: π1 > π2. Reject the null hypothesis if z > 2.326.
b. z = 2.932, p-value = .0017. The p-value is less than .01 so we would reject the null hypothesis.
c. Normality assumption is valid.
d. While the difference may seem small on paper, breast cancer has very serious consequences. Small
reductions are important.
e. Were diet, smoking, exercise, hereditary factors considered?
10.40 H0: πP = πX versus H1: πP ≠ πX. Reject the null hypothesis if z > 1.645 or z < -1.645. z = 1.222 so we fail to reject the null hypothesis.
10.41 a. H0: π1 ≤ π2 versus H1: π1 > π2. Reject the null hypothesis if z > 2.326.
b. z = 8.254, p-value = .0000.
c. Normality assumption is valid.
d. Yes, the difference is important because the risk is almost three times greater for those with a family
history of heart disease.
10.42 a. Group 1: (.300, .700)
Group 2: (.007, .257)
Group 3: (.015, .098)
b. H0: π1 = π2 versus H1: π1 ≠ π2. z = 2.803, p-value = .0051. Reject the null hypothesis.
c. While the confidence intervals may be more appealing, the procedure in part b is more appropriate.
d. Normality is questionable because there were fewer than 10 observations in groups 2 and 3.
10.43 H0: μ1 ≤ μ2 versus H1: μ1 > μ2. Reject the null hypothesis if t > 2.374 (84 df). t = 4.089 so we reject the null hypothesis and conclude that the virtual team mean is higher.
10.44 a. H0: π1 = π2 versus H1: π1 < π2.
b. z = -3.987, p-value = .0000. Reject the null hypothesis.
c. Normality can be assumed.
d. Yes, differences are important.
e. Physicians should ask about exercise habits, nutrition, weight, smoking etc.
10.45 a. H0: π1 = π2 versus H1: π1 < π2.
b. z = -2.7765, p-value = .0027. Reject the null hypothesis.
c. Normality can be assumed.
d. Yes, differences are important.
e. Exercise habits, nutrition, weight, smoking etc., might influence the decision. Also, many people cannot
afford them and lack insurance to pay the costs.
10.46 a. H0: μ1 = μ2 versus H1: μ1 > μ2.
b. t = 1.221, p-value = .1141
c. The results of the sample are not statistically significant although students might think 8 points is an
important difference.
d. Yes, the sample standard deviations appear similar.
e. F = .620. FL = 0.488, FR = 2.05. Fail to reject the null hypothesis. This sample does not provide evidence
that the variances are different.
10.47 a. H0: μ1 ≤ μ2 versus H1: μ1 > μ2. Assuming unequal variances, t = 1.718. The p-value is .0525. We fail to reject the null hypothesis.
b. A paired sample test may have made more sense. By comparing the costs from one year to the next for
the same 10 companies we would have eliminated a source of variation due to different businesses.
10.48 a. H0: μ1 = μ2 versus H1: μ1 > μ2.
b. t = 2.640, p-value = .0050. Because the p-value < .01 we would reject the null hypothesis.
c. The distribution could be skewed to the right by one or two extremely long calls. A heavily skewed
distribution could make the t distribution an unwise choice.
10.49 a. (Dot plots for the new bumper group and the control group omitted.)
b. H0: μ1 ≥ μ2 versus H1: μ1 < μ2.
c. Assuming equal variances, reject H0 if t < -1.729 with df = 19.
d. t = -1.63.
e. Fail to reject the null hypothesis.
f. The p-value = .0600. This decision was close.
g. A sample difference of approximately 3 days downtime would be considered important but the variation
in the downtimes is large enough that we cannot conclude the true means are different.
10.50 a. H0: μN ≤ μS versus H1: μN > μS. Reject the null hypothesis if t > 2.650 using df = 13.
b. t = 5.29.
c. This sample provides strong evidence that the average spending in the northern region is much higher
than average spending in the southern region.
d. Folks in the south may use services differently or may be older.
10.51 a. Use a two-tailed test comparing two means assuming unequal variances.
b. H0: 1 = 2 versus H1: 1 2.
c. t = 2.651 with df = 86. Because the p-value is .0096, we easily reject the null hypothesis at α = .05. Although the sample difference isn't large, large samples have high power.
d. Students might be more alert in the morning.
e. Yes, the standard deviations are similar.
f. H0: σ1² = σ2² versus H1: σ1² ≠ σ2². Reject H0 if F < .53 or F > 1.88 (ν1 = 41, ν2 = 45). F = 1.39 so we fail to reject the null hypothesis.
10.52 a. H0: μ1 = μ2 versus H1: μ1 ≠ μ2.
b. Reject the null hypothesis if t < -1.686 or t > 1.686.
c. t = 1.549
d. Because -1.686 < t < 1.686, we fail to reject the null hypothesis.
e. The p-value = .130. Because the p-value > .10, we fail to reject the null hypothesis.
10.53 a. Dot plots suggest that the means differ and the variances differ. Note the outlier in men's salaries.
b. H0: μ1 ≤ μ2 versus H1: μ1 > μ2.
c. Reject the null hypothesis if t > 2.438 with df = 35.
d. Assuming equal variances, t = 4.742.
e. Reject the null hypothesis at = .05. Men are paid more on average.
f. p-value = .0000. This shows that the sample result would be unlikely if H0 were true.
g. Yes, the large difference suggests gender discrimination.
10.54 a. H0: μ1 = μ2 versus H1: μ1 ≠ μ2.
b. Reject the null hypothesis if t > 2.045 or t < -2.045.
c. t = 1.623
d. Fail to reject the null hypothesis. There appears to be no difference in the average order size between
Friday and Saturday night.
e. p-value = .1154
10.55 a. The distributions appear skewed to the right.
b. H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Assume equal variances.
c. Reject the null hypothesis if t > 2.663 or t < -2.663.
d. t = .017.
e. We fail to reject the null hypothesis. It does not appear that the means are different.
f. The p-value = .9886. This indicates that the sample result shows no significant difference.
10.56 H0: σ1² = σ2² versus H1: σ1² ≠ σ2². Fcalculated = 1.991. The p-value = .0981. We cannot reject the null hypothesis at α = .05. The variances are not different.
10.57 a. H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Assume equal variances.
b. Reject the null hypothesis if t < -1.673 or t > 1.673. Use 55 df.
c. Since t = 3.162 and the p-value = .0025, we reject the null hypothesis. Mean sales are lower on the east
side.
10.58 a. H0: μd = 0 versus H1: μd ≠ 0.
b. Reject the null hypothesis if t < -2.776 or t > 2.776.
c. t = 1.31. Fail to reject the null hypothesis. The average sales appear to be the same.
10.59 H0: σ1² = σ2² versus H1: σ1² ≠ σ2². df1 = 30, df2 = 29. Reject the null hypothesis if F > 2.09 or F < .47. Fcalculated = .76 so we fail to reject the null hypothesis. The variances are not different.
10.60 a. H0: μd = 0 versus H1: μd ≠ 0.
b. Reject the null hypothesis if t > 2.045 or t < -2.045.
c. t = 1.256.
d. We fail to reject the null hypothesis.
e. The p-value = .2193. There is no evidence that the heart rates are different before and after a class break.
10.61 Assume independent samples. H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Assume equal variances. Reject the null hypothesis if the p-value < .01. t = .05 and the p-value = .9622 (two-tailed test) so we fail to reject the null hypothesis. The average assessed values from the company's assessor and the employee's assessor are the same.
10.62 H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Reject the null hypothesis if the p-value is less than .10. t = 1.336 and the p-value = .2004 (two-tailed test) so we fail to reject the null hypothesis. The average sizes of the homes in the two neighborhoods are the same.
10.63 H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Assume unequal variances. t = 1.212 with p-value = .2433. Fail to reject the null hypothesis. The average defect rates appear to be the same. It is questionable whether the normal assumption applies because of the very low incidence of bad pixels. Perhaps the Poisson distribution should be used.
10.64 H0: μd = 0 versus H1: μd ≠ 0. Reject the null hypothesis if the p-value < .10. t = 1.76 and the p-value = .1054. We fail to reject the null hypothesis but the decision is quite close.
10.65 a. H0: σA² = σB² versus H1: σA² > σB². df1 = 11, df2 = 11. Reject the null hypothesis if F > 3.53. Fcalculated = 9.86 so we reject the null hypothesis. Portfolio A has greater variance than portfolio B.
b. H0: μ1 = μ2 versus H1: μ1 ≠ μ2. Assume unequal variances (from part a). t = .49 with a p-value = .6326. We fail to reject the null hypothesis. The portfolio means are equal.
Chapter 11
Analysis of Variance
11.1 a. The hypotheses to be tested are:
H0: μA = μB = μC (mean scrap rates are the same)
H1: Not all the means are equal (at least one mean is different)
b. One factor, F = 5.31 and the critical value for α = .05 is F2,12 = 3.89.
c. We reject the null hypothesis since the test statistic exceeds the critical value.
d. The p-value of .0223 is less than .05. At least one mean scrap rate differs from the others.
e. From the dot plot, we see Plant B above the overall mean and Plant C below the overall mean.
Mean n Std. Dev Treatment
12.30 5 1.573 Plant A
13.96 5 2.077 Plant B
9.58 5 2.651 Plant C
11.95 15 2.728 Total
One-Factor ANOVA
Source SS df MS F p-value
Treatment 48.897 2 24.4487 5.31 .0223
Error 55.260 12 4.6050
Total 104.157 14
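A sketch of the same one-factor ANOVA in Python; the three lists are hypothetical scrap-rate samples (the exercise data are not reproduced here), and the last printed value is the .05 critical value F(2,12) quoted above:

    from scipy.stats import f, f_oneway

    plant_a = [11.2, 12.4, 13.1, 10.9, 13.9]   # hypothetical samples
    plant_b = [13.0, 14.5, 12.2, 16.1, 14.0]
    plant_c = [8.1, 10.2, 9.0, 12.5, 8.1]
    F, p = f_oneway(plant_a, plant_b, plant_c)
    print(F, p, f.ppf(0.95, 2, 12))            # compare F to 3.89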
11.2 a. The hypotheses to be tested are:
H0: μ1 = μ2 = μ3 = μ4 (physician means are the same)
H1: Not all the means are equal (at least one mean is different)
b. One factor, F = 3.50 and the critical value for α = .05 is F3,24 = 3.01.
c. We reject the null hypothesis since the test statistic exceeds the critical value (close).
d. The p-value of .0310 is less than .05. At least one physician mean differs from the others.
e. From the dot plot, we see Physician 1 and Physician 3 below the overall mean and Physician 2 above the
overall mean.
Mean n Std. Dev Treatment
28.3 7 4.89 Physician 1
34.2 6 4.12 Physician 2
27.3 8 4.62 Physician 3
32.0 7 4.24 Physician 4
30.2 28 5.08 Total
One-Factor ANOVA
Source SS df MS F p-value
Treatment 212.35 3 70.782 3.50 .0310
Error 485.76 24 20.240
Total 698.11 27
11.3 a. The hypotheses to be tested are:
H0: μ1 = μ2 = μ3 = μ4 (mean GPAs are the same)
H1: Not all the means are equal (at least one mean is different)
b. One factor, F = 3.52 and the critical value for α = .05 is F3,24 = 3.01.
c. We reject the null hypothesis since the test statistic exceeds the critical value (close).
d. The p-value of .0304 is less than .05. At least one GPA mean differs from the others.
e. From the dot plot, we see the GPA for Accounting below the overall mean and Human Resources and
Marketing above the overall mean.
Mean n Std. Dev Treatment
2.834 7 0.5053 Accounting
3.024 7 0.1776 Finance
3.241 7 0.3077 Human Resources
3.371 7 0.2575 Marketing
3.118 28 0.3785 Total
One-Factor ANOVA
Source SS df MS F p-value
Treatment 1.1812 3 0.39372 3.52 .0304
Error 2.6867 24 0.11195
Total 3.8679 27
11.4 a. The hypotheses to be tested are:
H0: μ1 = μ2 = μ3 = μ4 (mean sales are the same)
H1: Not all the means are equal (at least one mean is different)
b. One factor, F = 4.71 and the critical value for α = .05 is F3,16 = 3.24.
c. We reject the null hypothesis since the test statistic exceeds the critical value.
d. The p-value of .0153 is less than .05. At least one mean differs from the others.
e. From the dot plot, we see the weekly sales for Stores 2 and 3 below the overall mean and Store 1 above
the overall mean.
Mean n Std. Dev Treatment
108.0 5 5.34 Store 1
87.4 5 10.83 Store 2
91.0 5 11.11 Store 3
101.0 5 10.30 Store 4
96.9 20 12.20 Total
One-Factor ANOVA
Source SS df MS F p-value
Treatment 1,325.35 3 441.783 4.71 .0153
Error 1,501.20 16 93.825
Total 2,826.55 19
11.5 Using Tukey simultaneous comparison t-values, Plant B and Plant C differ. Using the pairwise t-tests, Plant
B and Plant C differ.
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 12)
Plant C Plant A Plant B
9.58 12.30 13.96
Plant C 9.58
Plant A 12.30 2.00
Plant B 13.96 3.23 1.22
critical values for experimentwise error rate:
0.05 2.67
0.01 3.56
p-values for pairwise t-tests
Plant C Plant A Plant B
9.58 12.30 13.96
Plant C 9.58
Plant A 12.30 .0682
Plant B 13.96 .0073 .2448
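The Tukey comparison value for Plants B and C can be verified from the ANOVA table in 11.1 (MSE = 4.6050, five observations per plant); a sketch:

    from math import sqrt

    mse, n_b, n_c = 4.6050, 5, 5
    t_bc = (13.96 - 9.58) / sqrt(mse * (1/n_b + 1/n_c))
    print(round(t_bc, 2))   # 3.23, which exceeds the .05 critical value 2.67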
11.6 Using Tukey simultaneous comparison t-values, Physicians 2 and 3 differ. Using the pairwise t-tests,
Physicians 2 and 3 are one pair and Physicians 1 and 2 are another pair.
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 24)
                Physician 3   Physician 1   Physician 4   Physician 2
                27.3          28.3          32.0          34.2
Physician 3 27.3
Physician 1 28.3 0.44
Physician 4 32.0 2.04 1.54
Physician 2 34.2 2.85 2.35 0.87
critical values for experimentwise error rate:
0.05 2.76
0.01 3.47
p-values for pairwise t-tests
                Physician 3   Physician 1   Physician 4   Physician 2
                27.3          28.3          32.0          34.2
Physician 3 27.3
Physician 1 28.3 .6604
Physician 4 32.0 .0525 .1355
Physician 2 34.2 .0089 .0274 .3953
11.7 Using Tukey simultaneous comparison t-values, Marketing and Accounting differ. Using the pairwise t-
tests, Marketing and Accounting are one pair and Human Resources and Accounting are another pair.
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 24)
Accounting Finance Human Resources Marketing
2.834 3.024 3.241 3.371
Accounting 2.834
Finance 3.024              1.06
Human Resources 3.241      2.28      1.21
Marketing 3.371            3.00      1.94      0.73
critical values for experimentwise error rate:
0.05 2.76
0.01 3.47
p-values for pairwise t-tests
Accounting Finance Human Resources Marketing
2.834 3.024 3.241 3.371
Accounting 2.834
Finance 3.024              .2986
Human Resources 3.241      .0320     .2365
Marketing 3.371            .0062     .0641     .4743
11.8 Using Tukey simultaneous comparison t-values, Store 1 and Store 2 differ. Using the pairwise t-tests, Store
1 and Store 2 are one pair, Store 4 and Store 2 are another pair, and Store 1 and Store 3 are a third.
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 16)
Store 2 Store 3 Store 4 Store 1
87.4 91.0 101.0 108.0
Store 2 87.4
Store 3 91.0 0.59
Store 4 101.0 2.22 1.63
Store 1 108.0 3.36 2.77 1.14
critical values for experimentwise error rate:
0.05 2.86
0.01 3.67
p-values for pairwise t-tests
Store 2 Store 3 Store 4 Store 1
87.4 91.0 101.0 108.0
Store 2 87.4
Store 3 91.0 .5650
Store 4 101.0 .0412 .1221
Store 1 108.0 .0040 .0135 .2700
For Exercises 11.9 through 11.12, the hypotheses to be tested are:
H0: σ1² = σ2² = ... = σc²
H1: Not all the σj² are equal
where c = the number of groups. The test statistic is Fmax = s²max/s²min.
Critical values of Fmax may be found in Table 11.5 using degrees of freedom given by:
Numerator: ν1 = c
Denominator: ν2 = n/c - 1 (round down to the next lower integer if necessary).
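A sketch of the Fmax statistic using the three sample variances from Exercise 11.9 below:

    variances = [2.475, 4.313, 7.027]                  # Plants A, B, C
    print(round(max(variances) / min(variances), 2))   # Fmax = 2.84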
11.9* The critical value from Table 11.5 is 15.5 (df1 = c = 3, df2 = n/c - 1 = 4). We fail to reject the null hypothesis of variance homogeneity since the test statistic Fmax = 7.027/2.475 = 2.84 is less than the critical value. This result agrees with Levene's test (p-value = .843) and the confidence intervals overlap.
Mean    n    Std. Dev   Variance   Treatment
12.30   5    1.573      2.475      Plant A
13.96   5    2.077      4.313      Plant B
9.58    5    2.651      7.027      Plant C
11.95   15   2.728                 Total
(MINITAB Test for Equal Variances for Scrap Rate: Bartlett's test statistic 0.95, p-value = .622; Levene's test statistic 0.17, p-value = .843. The 95% Bonferroni confidence intervals for the standard deviations overlap.)
11.10* The critical value from Table 11.5 is 10.4 (df1 = c = 4, df2 = n/c - 1 = 6). We fail to reject the null hypothesis of variance homogeneity since the test statistic Fmax = 23.90/16.97 = 1.41 is less than the critical value. This result agrees with Levene's test (p-value = .885) and the confidence intervals overlap.
Mean   n    Std. Dev   Variance   Treatment
28.3   7    4.89       23.90      Physician 1
34.2   6    4.12       16.97      Physician 2
27.3   8    4.62       21.36      Physician 3
32.0   7    4.24       18.00      Physician 4
30.2   28   5.08                  Total
(MINITAB Test for Equal Variances for Wait Time: Bartlett's test statistic 0.20, p-value = .978; Levene's test statistic 0.21, p-value = .885. The 95% Bonferroni confidence intervals for the standard deviations overlap.)
11.11* The critical value from Table 11.5 is 10.4 (df1 = c = 4, df2 = n/c - 1 = 6). We fail to reject the null hypothesis of variance homogeneity since the test statistic Fmax = (0.2553)/(0.0315) = 8.10 is less than the critical value. This result agrees with Levene's test (p-value = .145). However, both tests are closer than in Exercises 11.9 and 11.10 (the high variance in Accounting is striking, even though the confidence intervals do overlap).
Mean    n    Std. Dev   Variance   Treatment
2.834   7    0.5053     0.2553     Accounting
3.024   7    0.1776     0.0315     Finance
3.241   7    0.3077     0.0947     Human Resources
3.371   7    0.2575     0.0663     Marketing
3.118   28   0.3785                Total
(MINITAB Test for Equal Variances for GPA: Bartlett's test statistic 6.36, p-value = .095; Levene's test statistic 1.98, p-value = .145. The 95% Bonferroni confidence intervals for the standard deviations overlap.)
11.12* The critical value from Table 11.5 is 20.6 (df1 = c = 4, df2 = n/c - 1 = 4). We fail to reject the null hypothesis of variance homogeneity since the test statistic Fmax = 123.5/28.5 = 4.33 is less than the critical value. This result agrees with Levene's test (p-value = .810) and the confidence intervals overlap.
Mean    n    Std. Dev   Variance   Treatment
108.0   5    5.34       28.50      Store 1
87.4    5    10.83      117.30     Store 2
91.0    5    11.11      123.50     Store 3
101.0   5    10.30      106.00     Store 4
96.9    20   12.20                 Total
(MINITAB Test for Equal Variances for Sales: Bartlett's test statistic 2.07, p-value = .558; Levene's test statistic 0.32, p-value = .810. The 95% Bonferroni confidence intervals for the standard deviations overlap.)
11.13 a. Date is the blocking factor and Plant is the treatment or research interest.
Rows (Date):
H0: A1 = A2 = A3 = 0
H1: Not all the Aj are equal to zero
Columns (Plant):
H0: B1 = B2 = B3 = B4 = 0
H1: Not all the Bk are equal to zero
b. See tables.
c. Plant means differ at α = .05, F = 41.19, p-value = .0002. The blocking factor (date) is also significant, F = 8.62, p-value = .0172.
d. A test statistic of this magnitude would arise about 2 times in 10,000 samples if the null were true.
e. Plot suggests that Plants 1 and 2 are below overall mean, Plants 3 and 4 above.
ANOVA table: Two factor without replication
Source SS df MS F p-value
Treatments (Plant) 216.25 3 72.083 41.19 .0002
Blocks (Date) 30.17 2 15.083 8.62 .0172
Error 10.50 6 1.750
Total 256.92 11
Mean n Std. Dev Factor Level
20.333 3 1.528 Plant 1
18 3 2 Plant 2
29 3 2.646 Plant 3
25 3 2.646 Plant 4
21.5 4 4.041 Mar 4
25.25 4 5.377 Mar 11
22.5 4 5.508 Mar 18
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 6)
Plant 2 Plant 1 Plant 4 Plant 3
18.000 20.333 25.000 29.000
Plant 2 18.000
Plant 1 20.333 2.16
Plant 4 25.000 6.48 4.32
Plant 3 29.000 10.18 8.02 3.70
critical values for experimentwise error rate:
0.05 3.46
0.01 4.97
p-values for pairwise t-tests
Plant 2 Plant 1 Plant 4 Plant 3
18.000 20.333 25.000 29.000
Plant 2 18.000
Plant 1 20.333 .0741
Plant 4 25.000 .0006 .0050
Plant 3 29.000 .0001 .0002 .0100
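A sketch of a two-factor ANOVA without replication in Python using statsmodels; the data frame below uses hypothetical scrap rates and made-up column names, so only the layout (one observation per plant per date) mirrors the exercise:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.DataFrame({
        'rate':  [21, 18, 30, 25, 23, 20, 31, 27, 17, 16, 26, 23],  # hypothetical
        'plant': ['P1', 'P2', 'P3', 'P4'] * 3,
        'date':  ['Mar4'] * 4 + ['Mar11'] * 4 + ['Mar18'] * 4,
    })
    model = ols('rate ~ C(plant) + C(date)', data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # treatment (plant) and block (date) F tests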
11.14 a. Vehicle Size is the blocking factor and Fuel Type is the treatment or research interest.
Rows (Vehicle Size):
H0: A1 = A2 = A3 = A4 = 0
H1: Not all the Aj are equal to zero
Columns (Fuel Type):
H0: B1 = B2 = B3 = B4 = B5 = 0
H1: Not all the Bk are equal to zero
b. See tables.
c. Fuel type means differ at α = .05, F = 6.94, p-value = .0039. The blocking factor (Vehicle Size) is also significant, F = 34.52, p-value = .0000.
d. A test statistic of this magnitude would arise about 39 times in 10,000 samples if the null were true.
e. Plot suggests that 89 Octane and 91 Octane are somewhat above the overall mean. The Tukey tests show a
significant difference in fuel economy between Ethanol 10% and 89 Octane, Ethanol 10% and 91 Octane,
and 87 Octane and 91 Octane. The pairwise t-tests confirm this plus a couple of weaker differences.
ANOVA table: Two factor without replication
Source SS df MS F p-value
Treatments (Fuel Type) 54.065 4 13.5163 6.94 .0039
Blocks (Vehicle Size) 201.612 3 67.2040 34.52 0.0000
Error 23.363 12 1.9469
Total 279.040 19
Mean n Std. Dev Group
22.5750 4 3.5575 87 Octane
25.5500 4 3.2254 89 Octane
25.8000 4 4.2716 91 Octane
22.7500 4 4.5625 Ethanol 5%
21.8250 4 3.5874 Ethanol 10%

28.0200 5 2.0130 Compact
25.4200 5 2.3392 Mid-Size
21.1600 5 1.4293 Full-Size
20.2000 5 2.7911 SUV
23.7000 20 3.8323 Total

Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 12)
Ethanol
10% 87 Octane Ethanol 5% 89 Octane 91 Octane
21.8250 22.5750 22.7500 25.5500 25.8000
Ethanol 10%
21.825
0
87 Octane
22.575
0 0.76
Ethanol 5% 22.750 0.94 0.18
120
0
89 Octane
25.550
0 3.78 3.02 2.84
91 Octane
25.800
0 4.03 3.27 3.09 0.25
critical values for experiment wise error rate:
0.05 3.19
0.01 4.13
p-values for pairwise t-tests
              Ethanol 10%   87 Octane   Ethanol 5%   89 Octane   91 Octane
                 21.8250     22.5750      22.7500     25.5500     25.8000
Ethanol 10%   21.8250
87 Octane     22.5750   .4618
Ethanol 5%    22.7500   .3670   .8622
89 Octane     25.5500   .0026   .0108   .0150
91 Octane     25.8000   .0017   .0067   .0093   .8043
11.15 a. Exam is the blocking factor and Professor is the treatment or research interest.
Rows (Exam):
H0: A1 = A2 = A3 = A4 = 0
H1: Not all the Aj are equal to zero
Columns (Professor)
H0: B1 = B2 = B3 = B4 = B5 = 0
H1: Not all the Bk are equal to zero
b. See tables.
c. Professor means are on the borderline at α = .05, F = 3.26, p-value = .0500. The blocking factor (Exam) is not significant, F = 1.11, p-value = .3824.
d. A test statistic of this magnitude would arise about 5 times in 100 samples if the null were true.
e. Plot shows no consistent differences in means for professors. The Tukey tests and the pairwise tests are not
calculated since the treatments do not significantly affect the exam scores.
ANOVA table: Two factor without replication
Source SS df MS F p-value
Treatments (Professors) 134.403 4 33.6008 3.26 .0500
Blocks (Exams) 34.404 3 11.4680 1.11 .3824
Error 123.721 12 10.3101
Total 292.528 19
Mean n Std. Dev Group
76.325 4 4.8321 Prof. Argand
75.225 4 2.3977 Prof. Blague
80.250 4 3.1859 Prof. Clagmire
76.700 4 3.1591 Prof. Dross
72.200 4 1.8655 Prof. Ennuyeux

78.040 5 6.1064 Exam 1
75.120 5 3.0971 Exam 2
76.660 5 2.5530 Exam 3
74.740 5 3.3366 Final
76.140 20 3.9238 Total

11.16 a. Qtr is the blocking factor and Store is the treatment or research interest.
Rows (Qtr):
H0: A1 = A2 = A3 = A4 = 0
H1: Not all the Aj are equal to zero
Columns (Store)
H0: B1 = B2 = B3 = 0
H1: Not all the Bk are equal to zero
b. See tables.
c. Store means do not differ at α = .05, F = 1.60, p-value = .2779. The blocking factor (Qtr) is significant, F = 15.58, p-value = .0031.
d. A test statistic of this magnitude would arise about 28 times in 100 samples if the null were true.
e. Plot shows no consistent differences in means for stores. The Tukey tests and the pairwise tests are not
calculated since the treatments do not significantly affect the sales.
ANOVA table: Two factor without replication
Source SS df MS F p-value
Treatments (Store) 41,138.67 2 20,569.333 1.60 .2779
Blocks (Qtr) 601,990.92 3 200,663.639 15.58 .0031
Error 77,277.33 6 12,879.556
Total 720,406.92 11
Mean n Std. Dev
1,456.250 4 231.073 Store 1
1,375.250 4 261.153 Store 2
1,518.250 4 323.770 Store 3

1,509.000 3 205.263 Qtr 1
1,423.333 3 59.878 Qtr 2
1,120.667 3 63.760 Qtr 3
1,746.667 3 97.079 Qtr 4
1,449.917 12 255.913 Total
11.17 Factor A: Row Effect (Year)
H0: A1 = A2 = A3 = 0 year means are the same
H1: Not all the Aj are equal to zero year means differ
Factor B: Column Effect (Portfolio Type)
H0: B1 = B2 = B3 = B4 = 0 stock portfolio type means are the same
H1: Not all the Bk are equal to zero stock portfolio type means differ
Interaction Effect (Year×Portfolio)
H0: All the ABjk are equal to zero there is no interaction effect
H1: Not all ABjk are equal to zero there is an interaction effect
b. See tables.
c. Years differ at α = .05, F = 66.82, p-value < .0001. Portfolios differ at α = .05, F = 5.48, p-value = .0026. Interaction is significant at α = .05, F = 4.96, p-value = .0005.
d. The small p-values indicate that the sample would be unlikely if the null were true.
e. The interaction plot lines do cross and support the interaction found and reported above. The visual
indications of interaction are strong for the portfolio returns data.
Table of Means
Factor 2 (Portfolio Type)
Factor 1 (Year)   Health   Energy   Retail   Leisure   Mean
2004               15.74    22.20    18.36    18.52    18.71
2005               22.84    27.98    23.92    25.46    25.05
2006               13.24    12.62    19.90    10.98    14.19
Mean               17.27    20.93    20.73    18.32    19.31
Two-Factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Year) 1,191.584 2 595.7922 66.82 1.34E-14
Factor 2 (Portfolio) 146.553 3 48.8511 5.48 .0026
Interaction 265.192 6 44.1986 4.96 .0005
Error 427.980 48 8.9162
Total 2,031.309 59
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 48)
Row 3 Row 1 Row 2
14.19 18.71 25.05
Row 3 14.19
Row 1 18.71 4.79
Row 2 25.05 11.51 6.72
critical values for experimentwise error rate:
0.05 2.42
0.01 3.07
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 48)
Health Leisure Retail Energy
17.27 18.32 20.73 20.93
Health 17.27
Leisure 18.32 0.96
Retail 20.73 3.17 2.21
Energy 20.93 3.36 2.40 0.19
critical values for experimentwise error rate:
0.05 2.66
0.01 3.29
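Note: a replicated two-factor ANOVA with interaction, such as this one, can be reproduced in software. A minimal sketch using pandas and statsmodels follows; the eight observations are hypothetical (the exercise itself has five replicates per Year×Portfolio cell), so only the syntax is being illustrated.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical 2 x 2 layout with 2 replicates per cell.
    df = pd.DataFrame({
        "year": ["2004"] * 4 + ["2005"] * 4,
        "portfolio": ["Health", "Health", "Energy", "Energy"] * 2,
        "ret": [15.1, 16.3, 21.8, 22.6, 22.1, 23.5, 27.4, 28.6],
    })
    # C(year) * C(portfolio) expands to both main effects plus the interaction.
    model = smf.ols("ret ~ C(year) * C(portfolio)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # SS, df, F, p-value for each effect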
11.18 Factor A: Row Effect (Year)
H0: A1 = A2 = A3 = 0 year means are the same
H1: Not all the Aj are equal to zero year means differ
Factor B: Column Effect (Department)
H0: B1 = B2 = B3 = 0 department means are the same
H1: Not all the Bk are equal to zero department type means differ
Interaction Effect (Year×Department)
H0: All the ABjk are equal to zero there is no interaction effect
H1: Not all ABjk are equal to zero there is an interaction effect
b. See tables.
c. Years do not differ at α = .05, F = 0.64, p-value = .5365. Departments differ at α = .05, F = 12.66, p-value = .0004. Interaction is not significant at α = .05, F = 2.38, p-value = .0899.
d. The p-values range from highly significant (Department) to insignificant (Year). The interaction effect, if
any, is weak since about 9 samples in 100 would show an F statistic this large in the absence of interaction.
e. The interaction plot lines do cross for Department, but are approximately parallel for Year and support
the lack of interaction found and reported above. The visual indications of interaction are, therefore,
non-existent for the team ratings.
Table of Means
Factor 2 (Department)
Factor 1 (Year) Marketing Engineering Finance
2004 84.7 73.0 89.3 82.3
2005 79.0 77.0 89.7 81.9
2006 88.7 79.7 84.7 84.3
84.1 76.6 87.9 82.9
Two factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Year) 30.52 2 15.259 0.64 .5365
Factor 2 (Department) 599.41 2 299.704 12.66 .0004
Interaction 225.48 4 56.370 2.38 .0899
Error 426.00 18 23.667
Total 1,281.41 26
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 18)
Engineering Marketing Finance
76.6 84.1 87.9
Engineering 76.6
Marketing 84.1 3.29
Finance 87.9 4.94 1.65
critical values for experimentwise error rate:
0.05 2.55
0.01 3.32
11.19 Factor A: Row Effect (Age Group)
H0: A1 = A2 = A3 = A4 = 0 age group means are the same
H1: Not all the Aj are equal to zero age group means differ
Factor B: Column Effect (Region)
H0: B1 = B2 = B3 = B4= 0 region means are the same
H1: Not all the Bk are equal to zero region means differ
Interaction Effect (Age Group×Region)
H0: All the ABjk are equal to zero there is no interaction effect
H1: Not all ABjk are equal to zero there is an interaction effect
b. See tables.
c. Age groups differ at α = .05, F = 36.96, p-value < .0001. Regions do not differ at α = .05, F = 0.55, p-value = .6493. Interaction is significant at α = .05, F = 3.66, p-value = .0010.
d. The p-values range from highly significant (Age Group) to insignificant (Region). The interaction effect is
significant since only about 1 sample in 1000 would show an F statistic this large in the absence of
interaction.
e. The interaction plot lines do cross (e.g., Midwest crosses the others by age group), but visually there is not a strong indication of interaction. This is perhaps because the data range is not large (data appear to be rounded to the nearest .1, so there is only 2-digit accuracy).
Table of Means
Factor 2 (Region)
Factor 1 (Age Group) Northeast Southeast Midwest West
Youth (under 18) 4.00 4.12 3.68 4.06 3.97
College (18-25) 3.86 3.70 3.88 3.78 3.81
Adult (25-64) 3.50 3.42 3.76 3.52 3.55
Senior (65 +) 3.42 3.52 3.18 3.36 3.37
3.70 3.69 3.63 3.68 3.67
Two factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Age Group) 4.193 3 1.3975 36.96 5.56E-14
Factor 2 (Region) 0.062 3 0.0208 0.55 .6493
Interaction 1.245 9 0.1383 3.66 .0010
Error 2.420 64 0.0378
Total 7.920 79
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 64)
                   Senior (65+)   Adult (25-64)   College (18-25)   Youth (under 18)
                       3.37            3.55             3.81              3.97
Senior (65+)      3.37
Adult (25-64)     3.55    2.93
College (18-25)   3.81    7.07    4.15
Youth (under 18)  3.97    9.68    6.75    2.60
critical values for experimentwise error rate:
0.05 2.64
0.01 3.25
11.20 Factor A: Row Effect (Quarter)
H0: A1 = A2 = A3 = A4 = 0 quarter means are the same
H1: Not all the Aj are equal to zero quarter means differ
Factor B: Column Effect (Supplier)
H0: B1 = B2 = B3 = 0 supplier means are the same
H1: Not all the Bk are equal to zero supplier means differ
Interaction Effect (Quarter×Supplier)
H0: All the ABjk are equal to zero there is no interaction effect
H1: Not all ABjk are equal to zero there is an interaction effect
b. See tables.
c. Quarters differ at α = .05, F = 6.01, p-value = .0020. Suppliers differ at α = .05, F = 4.30, p-value = .0211. Interaction is not significant at α = .05, F = 0.44, p-value = .8446.
d. The p-values indicate that both main effects are significant. The interaction effect is not significant, since
about 84 samples in 100 would show an F statistic this large in the absence of interaction.
e. The interaction plot lines do not cross to a noticeable degree, so we see no evidence of interaction.
Table of Means
Factor 2 (Supplier)
Factor 1 (Quarter) Supplier 1 Supplier 2 Supplier 3
Qtr 1 12.3 10.8 14.3 12.4
Qtr 2 12.3 11.0 12.3 11.8
Qtr 3 10.3 8.5 10.0 9.6
Qtr 4 10.5 9.5 10.8 10.3
11.3 9.9 11.8 11.0
Two factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Quarter) 63.23 3 21.076 6.01 .0020
Factor 2 (Supplier) 30.17 2 15.083 4.30 .0211
Interaction 9.33 6 1.556 0.44 .8446
Error 126.25 36 3.507
Total 228.98 47
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 36)
Qtr 3 Qtr 4 Qtr 2 Qtr 1
9.6 10.3 11.8 12.4
Qtr 3 9.6
Qtr 4 10.3 0.87
Qtr 2 11.8 2.94 2.07
Qtr 1 12.4 3.71 2.83 0.76
critical values for experimentwise error rate:
0.05 2.70
0.01 3.35
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 36)
Supplier 2 Supplier 1 Supplier 3
9.9 11.3 11.8
Supplier 2 9.9
Supplier 1 11.3 2.08
Supplier 3 11.8 2.83 0.76
critical values for experimentwise error rate:
0.05 2.45
0.01 3.11
11.21 We fail to reject the null hypothesis of equal means. The p-value (.1000) exceeds .05. There is no significant difference among the GPAs. We ignore importance, since the results are not significant. The dot plot comparison confirms that differences are not strong. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 13.7 (df1 = c = 4, df2 = n/c − 1 = 25/4 − 1 ≈ 5). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (0.3926)/(0.0799) = 4.91 is less than the critical value. This result agrees with Bartlett's test (p-value = .290) and Levene's test (p-value = .606), and the confidence intervals overlap.
Mean n Std. Dev Variance Group
2.484 5 0.6240 0.3894 Freshman
2.916 7 0.6265 0.3926 Sophomore
3.227 7 0.2826 0.0799 Junior
3.130 6 0.4447 0.1978 Senior
2.968 25 0.5477 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 1.8180 3 0.60599 2.36 .1000
Error 5.3812 21 0.25625
Total 7.1992 24
[Minitab output, Test for Equal Variances for GPA: 95% Bonferroni confidence intervals for the class standard deviations (all overlapping). Bartlett's Test: statistic 3.75, p-value .290. Levene's Test: statistic 0.63, p-value .606.]
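Note: Hartley's Fmax statistic is simple enough to verify directly. A minimal numpy sketch, using the four class variances from the table above:

    import numpy as np

    # Sample variances for Freshman, Sophomore, Junior, Senior.
    variances = np.array([0.3894, 0.3926, 0.0799, 0.1978])
    fmax = variances.max() / variances.min()
    print(round(fmax, 2))  # 4.91
    # Compare with the Table 11.5 critical value 13.7 (df1 = c = 4, df2 = n/c - 1 = 5);
    # since 4.91 < 13.7, we fail to reject homogeneous variances.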
11.22 We fail to reject the null hypothesis of equal means. The p-value (.0523) exceeds .05, although it is a very close decision. We ignore importance, since the results are not significant. The dot plot does suggest that differences exist. A larger sample might be in order. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 8.38 (df1 = c = 3, df2 = n/c − 1 = 23/3 − 1 ≈ 6). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (451.11)/(89.41) = 5.05 is less than the critical value. This result agrees with Bartlett's test (p-value = .092) and Levene's test (p-value = .140), and the confidence intervals overlap. However, the variances would have been judged unequal had we used α = .10.
Mean n Std. Dev Variance Group
261.2 5 11.95 142.70 Budgets
238.0 10 21.24 451.11 Payables
244.4 8 9.46 89.41 Pricing
245.3 23 17.91 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 1,803.76 2 901.880 3.43 .0523
Error 5,256.68 20 262.834
Total 7,060.43 22
[Minitab output, Test for Equal Variances for Days: 95% Bonferroni confidence intervals for the department standard deviations (Budgets, Payables, Pricing; all overlapping). Bartlett's Test: statistic 4.77, p-value .092. Levene's Test: statistic 2.17, p-value .140.]
11.23 We reject the null hypothesis of equal means. The p-value (.0022) is less than .05. Even a small difference in output could be important in a large array of solar cells. The dot plot does suggest that differences exist. Cell Type C is above the overall mean, while Cell Type B is below the overall mean. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 10.8 (df1 = c = 3, df2 = n/c − 1 = 18/3 − 1 = 5). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (4.57)/(4.00) = 1.14 is less than the critical value. This result agrees with Levene's test (p-value = .975) and the confidence intervals overlap. Tukey tests show that C differs from A and B.
Mean n Std. Dev Variances Group
123.8 6 2.04 4.17 Cell Type A
123.0 6 2.00 4.00 Cell Type B
127.8 6 2.14 4.57 Cell Type C
124.9 18 2.91 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 80.11 2 40.056 9.44 .0022
Error 63.67 15 4.244
Total 143.78 17
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 15)
              Cell Type B   Cell Type A   Cell Type C
                 123.0         123.8         127.8
Cell Type B   123.0
Cell Type A   123.8    0.70
Cell Type C   127.8    4.06    3.36
critical values for experimentwise error rate:
0.05 2.60
0.01 3.42
p-values for pairwise t-tests
              Cell Type B   Cell Type A   Cell Type C
                 123.0         123.8         127.8
Cell Type B   123.0
Cell Type A   123.8    .4943
Cell Type C   127.8    .0010    .0043
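Note: MegaStat's Tukey comparison t-value is the difference in means divided by √(MSE(1/ni + 1/nj)), and its critical value is the studentized range quantile divided by √2. A minimal check using scipy (the means used here are rounded, so the computed t differs slightly from the printed 4.06):

    import numpy as np
    from scipy.stats import studentized_range

    n, k, df_e, mse = 6, 3, 15, 4.244      # from the ANOVA table above
    mean_b, mean_c = 123.0, 127.8          # rounded group means
    t_cb = (mean_c - mean_b) / np.sqrt(mse * (1 / n + 1 / n))
    print(round(t_cb, 2))                  # ~4.04 (4.06 from unrounded means)
    t_crit = studentized_range.ppf(0.95, k, df_e) / np.sqrt(2)
    print(round(t_crit, 2))                # 2.60, the 0.05 critical value above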
[Minitab output, Test for Equal Variances for Watts: 95% Bonferroni confidence intervals for the cell-type standard deviations (all overlapping). Bartlett's Test: statistic 0.02, p-value .989. Levene's Test: statistic 0.03, p-value .975.]
11.24 We cannot reject the null hypothesis of equal means. The p-value (.4188) exceeds .05. Since the means do not differ significantly, the issue of importance is moot. The dot plot does not suggest any differences. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 15.5 (df1 = c = 3, df2 = n/c − 1 = 15/3 − 1 = 4). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (246,324)/(103,581) = 2.38 is less than the critical value. This result agrees with Bartlett's test (p-value = .715) and Levene's test (p-value = .569), and the confidence intervals overlap.
Mean n Std. Dev Variance Group
1,282.0 5 496.31 246,324 Goliath
1,376.0 5 321.84 103,581 Varmint
1,638.0 5 441.78 195,170 Weasel
1,432.0 15 424.32 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 340,360.00 2 170,180.000 0.94 .4188
Error 2,180,280.00 12 181,690.000
Total 2,520,640.00 14
[Minitab output, Test for Equal Variances for Damage: 95% Bonferroni confidence intervals for the vehicle standard deviations (Goliath, Varmint, Weasel; all overlapping). Bartlett's Test: statistic 0.67, p-value .715. Levene's Test: statistic 0.59, p-value .569.]
11.25 We cannot reject the null hypothesis of equal means. The p-value (.1857) exceeds .05. Since the means do not differ significantly, the issue of importance is moot. The dot plot does not suggest any differences. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 20.6 (df1 = c = 4, df2 = n/c − 1 = 22/4 − 1 ≈ 4). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (141.610)/(45.428) = 3.12 is less than the critical value. This result agrees with Bartlett's test (p-value = .739) and Levene's test (p-value = .645), and the confidence intervals overlap.
Mean n Std. Dev Variances Group
14.2 5 8.29 68.724 Hospital A
21.5 4 11.90 141.610 Hospital B
16.9 7 7.95 63.203 Hospital C
9.3 6 6.74 45.428 Hospital D
15.0 22 8.98 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 388.96 3 129.655 1.79 .1857
Error 1,305.99 18 72.555
Total 1,694.95 21
[Minitab output, Test for Equal Variances for Wait: 95% Bonferroni confidence intervals for the hospital standard deviations (Hospitals A–D; all overlapping). Bartlett's Test: statistic 1.26, p-value .739. Levene's Test: statistic 0.56, p-value .645.]
11.26 We reject the null hypothesis of equal means. The p-value (.0029) is less than .05. Productivity differences could be important in a competitive market, and might signal a need for additional worker training. The dot plot suggests that Plant B has lower productivity. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 6.94 (df1 = c = 3, df2 = n/c − 1 = 25/3 − 1 ≈ 7). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (2.9791)/(0.68558) = 4.35 is less than the critical value. This result agrees with Bartlett's test (p-value = .122) and Levene's test (p-value = .208), and the confidence intervals overlap.
Mean n Std. Dev Variance Group
3.97 9 0.828 0.685584 Plant A
3.02 6 1.094 1.196836 Plant B
5.57 10 1.726 2.979076 Plant C
4.38 25 1.647 Total
One factor ANOVA
Source SS df MS F p-value
Treatment 26.851 2 13.4253 7.72 .0029
Error 38.269 22 1.7395
Total 65.120 24
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 22)
Plant B Plant A Plant C
3.02 3.97 5.57
Plant B 3.02
Plant A 3.97 1.37
Plant C 5.57 3.75 2.65
critical values for experimentwise error rate:
0.05 2.52
0.01 3.25
p-values for pairwise t-tests
Plant B Plant A Plant C
3.02 3.97 5.57
Plant B 3.02
Plant A 3.97 .1855
Plant C 5.57 .0011 .0148
[Minitab output, Test for Equal Variances for Output: 95% Bonferroni confidence intervals for the plant standard deviations (Plants A–C; all overlapping). Bartlett's Test: statistic 4.21, p-value .122. Levene's Test: statistic 1.69, p-value .208.]
11.27 It appears that the researcher is not treating this as a randomized block, since both factors appear to be of
research interest. Hence, this will be referred to as a two-factor ANOVA without replication.
Factor A (Method):
H0: A1 = A2 = A3 = 0
H1: Not all the Aj are equal to zero
Factor B (Road Condition):
H0: B1 = B2 = B3 = 0
H1: Not all the Bk are equal to zero
Mean stopping distance is significantly affected by surface (p = .0002) but not by braking method (p = .5387). Tukey tests show significant differences between Ice and the other two surfaces. To test for homogeneous variances, the critical value of Hartley's statistic is F3,2 = 87.5. Since Fmax = 1.37 (for Method) and Fmax = 14.5 (for Surface), we cannot reject the hypothesis of equal variances.
Table of Means
Mean n Std. Dev Group
452.000 3 9.849 Ice
184.667 3 37.528 Split Traction
154.000 3 11.358 Packed Snow

271.000 3 151.803 Pumping
249.667 3 177.827 Locked
270.000 3 164.739 ABS
263.556 9 143.388 Total
Two-Factor ANOVA Without Replication
Source SS df MS F p-value
Column (Surface) 161,211.56 2 80,605.778 134.39 .0002
Row (Method) 869.56 2 434.778 0.72 .5387
Error 2,399.11 4 599.778
Total 164,480.22 8
Post hoc analysis
Tukey simultaneous comparison t-values (d.f. = 4)
Packed Snow Split Traction Ice
154.000 184.667 452.000
Packed Snow 154.000
Split Traction 184.667 1.53
Ice 452.000 14.90 13.37
critical values for experimentwise error rate:
0.05 3.56
0.01 5.74
p-values for pairwise t-tests
Packed Snow Split Traction Ice
154.000 184.667 452.000
Packed Snow 154.000
Split Traction 184.667 .1999
Ice 452.000 .0001 .0002
11.28 We cannot reject the null hypothesis of equal means. The p-value (.3744) exceeds .05. The dot plot does not show large differences among manufacturers. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 333 (df1 = c = 7, df2 = n/c − 1 = 22/7 − 1 ≈ 2). We fail to reject the null hypothesis of homogeneous variances since the test statistic Fmax = (.000372567)/(.000013816) = 27.0 is less than the critical value. This result agrees with Bartlett's test (p-value = .315) and Levene's test (p-value = .729), and the confidence intervals overlap (although they are rather strange in appearance).
Table of Means
Mean n Std. Dev Variance Group
0.03255 2 0.004455 0.000019847 Aunt Millie's
0.03648 5 0.015563 0.000242207 Brownberry
0.02950 2 0.008768 0.000076878 Compass Food
0.03943 3 0.004277 0.000018293 Interstate Brand Co.
0.02007 3 0.019302 0.000372567 Koepplinger's Bakery
0.03437 3 0.003717 0.000013816 Metz Baking Co.
0.04223 4 0.009982 0.000099640 Pepperidge Farm
0.03441 22 0.012320 Total
One Factor ANOVA
Source         SS         df   MS          F      p-value
Manufacturer   0.001014    6   0.0001690   1.17   .3744
Error          0.002174   15   0.0001449
Total          0.003188   21
[Minitab output, Test for Equal Variances for Fat: 95% Bonferroni confidence intervals for the seven manufacturers' standard deviations (all overlapping). Bartlett's Test: statistic 7.07, p-value .315. Levene's Test: statistic 0.60, p-value .729.]
11.29 We cannot reject the null hypothesis of equal means. The p-value (.8166) exceeds .05. The dot plot does not show large differences among groups, although the fourth quintile seems to have smaller variance. For tests of homogeneity of variances, the critical value of Hartley's Fmax statistic from Table 11.5 is 7.11 (df1 = c = 5, df2 = n/c − 1 = 50/5 − 1 = 9). We reject the null hypothesis of homogeneous variances since the test statistic Fmax = 112.04/14.13 = 7.93 exceeds the critical value. This result agrees with Bartlett's test (p-value = .036) and Levene's test (p-value = .018), even though the confidence intervals do overlap.
Table of Means
Mean    n   Std. Dev   Variance   Group
30.00  10    9.548      91.16     Quintile 1
29.12  10   10.213     104.31     Quintile 2
31.17  10   10.585     112.04     Quintile 3
28.71  10    3.759      14.13     Quintile 4
26.66  10    6.305      39.75     Quintile 5
29.13  50    8.286                Total
One Factor ANOVA
Source SS df MS F p-value
Treatment 111.959 4 27.9897 0.39 .8166
Error 3,252.570 45 72.2793
Total 3,364.529 49
[Minitab output, Test for Equal Variances for Dropout: 95% Bonferroni confidence intervals for the quintile standard deviations (Quintiles 1–5; overlapping). Bartlett's Test: statistic 10.28, p-value .036. Levene's Test: statistic 3.33, p-value .018.]
11.30 This is a replicated experiment with two factors and interaction. Based on the p-values, we conclude that means differ for Angle (p = .0088) and for Vehicle (p = .0007). However, there is no significant interaction for Angle×Vehicle (p = .6661). The interaction plots support this conclusion, as the lines do not cross. The Tukey tests say that pairwise means differ for Rear End and Slant, and that Goliath differs from Varmint and Weasel.
Table of Means
Factor 2 (Vehicle)
Factor 1 (Angle) Goliath Varmint Weasel
Head-On 983.3 1,660.0 1,896.7 1,513.3
Slant 1,470.0 1,733.3 1,996.7 1,733.3
Rear end 973.3 1,220.0 1,513.3 1,235.6
1,142.2 1,537.8 1,802.2 1,494.1
Two Factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Angle) 1,120,029.63 2 560,014.815 6.22 .0088
Factor 2 (Vehicle) 1,985,985.19 2 992,992.593 11.04 .0007
Interaction 216,637.04 4 54,159.259 0.60 .6661
Error 1,619,400.00 18 89,966.667
Total 4,942,051.85 26
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 18)
Rear end Head-On Slant
1,235.6 1,513.3 1,733.3
Rear end 1,235.6
Head-On 1,513.3 1.96
Slant 1,733.3 3.52 1.56
critical values for experimentwise error rate:
0.05 2.55
0.01 3.32
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 18)
Goliath Varmint Weasel
1,142.2 1,537.8 1,802.2
Goliath 1,142.2
Varmint 1,537.8 2.80
Weasel 1,802.2 4.67 1.87
critical values for experimentwise error rate:
0.05 2.55
0.01 3.32
11.31 This is a replicated experiment with two factors and interaction. The only difference between this experiment and the previous one is that the sample size is doubled, which raises the F statistics and reduces the p-values. Based on the p-values, we conclude that means differ for Crash Type (p < .0001) and for Vehicle (p < .0001). However, there is no significant interaction for Crash Type×Vehicle (p = .2168). Notice, however, that the interaction p-value is smaller than in the previous experiment, showing that larger sample size alone (ceteris paribus) can make an effect more significant. The interaction plots support the conclusion of no interaction, as the lines do not cross to any major extent. The Tukey tests suggest that pairwise means differ for Rear End and Slant, and that Goliath differs from Varmint and Weasel.
Table of Means
Factor 2 (Vehicle)
Factor 1 (Angle) Goliath Varmint Weasel
Head On 983.3 1,660.0 1,896.7 1,513.3
Slant 1,470.0 1,733.3 1,996.7 1,733.3
Rear end 973.3 1,220.0 1,513.3 1,235.6
1,142.2 1,537.8 1,802.2 1,494.1
Two Factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Angle) 2,240,059.26 2 1,120,029.630 15.56 7.30E-06
Factor 2 (Vehicle) 3,971,970.37 2 1,985,985.185 27.59 1.51E-08
Interaction 433,274.07 4 108,318.519 1.50 .2168
Error 3,238,800.00 45 71,973.333
Total 9,884,103.70 53
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 45)
Rear end Head On Slant
1,235.6 1,513.3 1,733.3
Rear end 1,235.6
Head On 1,513.3 3.11
Slant 1,733.3 5.57 2.46
critical values for experimentwise error rate:
0.05 2.42
0.01 3.08
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 45)
Goliath Varmint Weasel
1,142.2 1,537.8 1,802.2
Goliath 1,142.2
Varmint 1,537.8 4.42
Weasel 1,802.2 7.38 2.96
critical values for experimentwise error rate:
0.05 2.42
0.01 3.08
11.32 This is a replicated experiment with two factors and interaction. Based on the p-values, the means differ for Temperature (p = .0000) and for PVC Type (p = .0013). However, there is no interaction for Temperature×PVC Type (p = .9100). We conclude that the burst strength is affected by temperature and by PVC type, but not by the interaction between temperature and PVC type. The dot plots suggest that PVC2 is the best brand. The pairwise Tukey tests indicate that there is a difference between PVC2 and PVC3, but no difference between PVC1 and PVC2 or PVC1 and PVC3.
Table of Means
Factor 2 (PVC Type)
Factor 1 (Temperature) PVC1 PVC2 PVC3
Hot (70 Degrees C) 268.0 287.0 258.0 271.0
Warm (40 Degrees C) 314.0 334.3 306.0 318.1
Cool (10 Degrees C) 354.0 361.3 335.3 350.2
312.0 327.6 299.8 313.1
Two Factor ANOVA with Replication
Source SS df MS F p-value
Factor 1 (Temperature) 28,580.22 2 14,290.111 81.04 0.0000
Factor 2 (PVC Type) 3,488.89 2 1,744.444 9.89 .0013
Interaction 171.56 4 42.889 0.24 .9100
Error 3,174.00 18 176.333
Total 35,414.67 26
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 18)
              Hot (70°C)   Warm (40°C)   Cool (10°C)
                271.0         318.1         350.2
Hot (70°C)    271.0
Warm (40°C)   318.1    7.53
Cool (10°C)   350.2   12.66    5.13
critical values for experimentwise error rate:
0.05 2.55
0.01 3.32
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 18)
PVC3 PVC1 PVC2
299.8 312.0 327.6
PVC3 299.8
PVC1 312.0 1.95
PVC2 327.6 4.44 2.48
critical values for experimentwise error rate:
0.05 2.55
0.01 3.32
11.33 This is a two-factor ANOVA without replication. We conclude that tax audit rate is not significantly affected
by year (p = 0.6153) but is significantly affected by taxpayer class (p < .0001). There is no interaction as
there is no replication. MegaStat calls the column factor the treatment but the problem wording suggests
that both factors are of research interest.
Table of Means
Mean n Std. Dev Group
2.21500 10 1.55302 1990
2.31800 10 1.64241 1991
2.08000 10 1.53937 1992
1.85700 10 1.30518 1993
1.99800 10 1.36745 1994
2.41400 10 1.50986 1995

1.00167 6 0.49906 1040 A TPI
0.95000 6 0.20995 1040 TPI < $25,000
0.74167 6 0.17360 1040 TPI $25,000-50,000
1.06167 6 0.23853 1040 TPI $50,000-100,000
4.37167 6 1.30310 1040 TPI > $100,000
3.02000 6 1.69680 C-GR < $25,000
2.55833 6 0.38301 C-GR $25,000-100,000
3.81333 6 0.25524 C-GR > $100,000
1.32167 6 0.23216 F-GR < $100,000
2.63000 6 0.80230 F-GR > $100,000
2.14700 60 1.43886 Total
Two Factor ANOVA without Replication
Source SS df MS F p-value
Column (Year) 2.1594 5 0.43189 0.72 .6153
Row (Taxpayer Class) 92.8146 9 10.31274 17.08 8.46E-12
Error 27.1744 45 0.60388
Total 122.1485 59
11.34 This is a two-factor ANOVA with replication and interaction. Based on the p-values, we conclude that the
means differ by Weight (p = .0009) and by Medication (p = .0119). There is no significant interaction effect Weight×Medication (p = .9798).
Table of Means
Means: Factor 2 (Medication)
Factor 1 (Weight) Med 1 Med 2 Med 3 Med 4
1.1 or Less 133.0 141.0 136.0 127.5 134.4
1.1 to 1.3 140.5 141.5 140.5 132.0 138.6
1.3 to 1.5 148.5 153.0 148.5 140.0 147.5
140.7 145.2 141.7 133.2 140.2
Source SS df MS F p-value
Factor 1 (Weight) 717.58 2 358.792 13.25 .0009
Factor 2 (Medication) 459.00 3 153.000 5.65 .0119
Interaction 27.75 6 4.625 0.17 .9798
Error 325.00 12 27.083
Total 1,529.33 23
Post hoc analysis for Factor 1
Tukey simultaneous comparison t-values (d.f. = 12)
1.1 or Less 1.1 to 1.3 1.3 to 1.5
134.4 138.6 147.5
1.1 or Less 134.4
1.1 to 1.3 138.6 1.63
1.3 to 1.5 147.5 5.04 3.41
critical values for experimentwise error rate:
0.05 2.67
0.01 3.56
Post hoc analysis for Factor 2
Tukey simultaneous comparison t-values (d.f. = 12)
Med 4 Med 1 Med 3 Med 2
133.2 140.7 141.7 145.2
Med 4 133.2
Med 1 140.7 2.50
Med 3 141.7 2.83 0.33
Med 2 145.2 3.99 1.50 1.16
critical values for experimentwise error rate:
0.05 2.97
0.01 3.89
11.35 This is a two-factor ANOVA with replication and interaction. We conclude that means do not differ by Instructor Gender (p = .43) or by Student Gender (p = .24), but there is an interaction effect between the two factors, Instructor Gender×Student Gender (p = .03). The sample size is very large, so it is unlikely that any effect was overlooked (the test should have excellent power).
11.36 This is an unreplicated two-factor ANOVA. Although MegaStat calls it a randomized block ANOVA, the
wording of the problem suggests that both factors are of research interest. We conclude that texture is not
significantly affected by age group (p = 0.2999) or by surface type (p = 0.2907). The dot plots support these
conclusions, since there are no strong or consistent differences in the groups. No interaction is estimated
since there is no replication.
Table of Means
Mean n Std. Dev
5.1500 4 1.2261 Shiny
5.3750 4 0.8846 Satin
6.1250 4 0.4924 Pebbled
4.9500 4 0.8851 Pattern

5.7750 4 1.1236 Youth (Under 21)
5.7250 4 0.4031 Adult (21 to 39)
5.4500 4 0.9292 Middle-Age (40 to 61)
4.6500 4 0.9983 Senior (62 and over)
5.4000 16 0.9345 Total
Two Factor ANOVA without Replication
Source SS df MS F p-value
Columns (Surface) 3.165 3 1.0550 1.42 .2999
Rows (Age Group) 3.245 3 1.0817 1.46 .2907
Error 6.690 9 0.7433
Total 13.100 15
11.37 This is an unreplicated two-factor ANOVA. Although MegaStat calls it a randomized block ANOVA, the
wording of the problem suggests that both factors are of research interest. Call waiting time is not
significantly affected by day of the week (p = 0.1760) but is significantly affected by time of day (p =
0.0001) as indicated in the bar chart of means. No interaction is estimated since there is no replication.
Table of Means
Mean n Std. Dev
49.077 26 25.575 Mon
60.269 26 35.629 Tue
53.692 26 28.369 Wed
49.577 26 28.365 Thu
44.808 26 17.253 Fri

43.200 5 15.786 6:00
62.400 5 16.502 6:30
61.800 5 32.874 7:00
68.400 5 28.789 7:30
65.800 5 11.883 8:00
65.200 5 20.042 8:30
57.800 5 22.061 9:00
60.800 5 36.224 9:30
60.000 5 18.371 10:00
88.200 5 45.779 10:30
45.600 5 19.256 11:00
34.400 5 5.727 11:30
70.200 5 29.811 12:00
53.000 5 16.598 12:30
47.000 5 14.577 13:00
69.600 5 41.107 13:30
86.800 5 35.024 14:00
38.000 5 3.674 14:30
35.200 5 1.095 15:00
43.200 5 11.167 15:30
28.800 5 19.045 16:00
27.400 5 10.922 16:30
32.600 5 4.506 17:00
26.800 5 3.271 17:30
42.400 5 41.283 18:00
24.000 5 17.436 18:30
51.485 130 27.745 Total

Two Factor ANOVA without Replication
Source SS df MS F p-value
Columns (Day of Week) 3,537.58 4 884.396 1.62 .1760
Rows (Time of Day) 41,046.47 25 1,641.859 3.00 .0001
Error 54,716.42 100 547.164
Total 99,300.47 129
[Bar chart: mean call volume by time of day, 6:00 through 18:30.]
11.38 (a) This is a two-factor ANOVA. (b) There are 4 friends since df = 3 (df = r − 1) and 3 months since df = 2 (df = c − 1). The total number of observations is 36 since df = 35 (df = n − 1). Thus, since the data matrix is 4×3 (12 cells), there must have been 36/12 = 3 observations per cell (i.e., 3 bowling scores per friend per month). (c) Based on the p-values, we see Month (p = .0002) is significant at α = .01, Friend (p < .0001) is significant at α = .01, and there is only a weak interaction since Month×Friend (p = .0786) is only significant at α = .10. We conclude that mean bowling scores are influenced by the month, the friend, and possibly by an interaction between the month (time of year) and the bowler.
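Note: the degrees-of-freedom bookkeeping used here is easy to script; a minimal sketch:

    # Recovering the design of 11.38 from the ANOVA degrees of freedom.
    df_rows, df_cols, df_total = 3, 2, 35
    r = df_rows + 1            # 4 friends (df = r - 1)
    c = df_cols + 1            # 3 months (df = c - 1)
    n = df_total + 1           # 36 bowling scores (df = n - 1)
    per_cell = n // (r * c)    # 3 observations per cell
    print(r, c, n, per_cell)   # 4 3 36 3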
11.39 (a) This is a randomized block (unreplicated two-factor) ANOVA. (b) Based on the p-values, air pollution is significantly affected by car type (p < .0001) and time of day (p < .0001). (c) Variances may appear to be unequal. Equal variances are important because analysis of variance assumes that observations on the response variable are from normally distributed populations that have the same variance. However, we cannot rely on our eyes alone to judge variances, and we should do a test for homogeneity. (d) In Hartley's test, for freeway, we get Fmax = (14333.7)/(2926.7) = 4.90, which is less than the critical value from Table 11.5, F4,4 = 20.6, so we fail to reject the hypothesis of equal variances. Similarly, we fail to reject the null of equal variances for time of day, since Fmax = (14333.6)/(872.9) = 16.4 is less than the critical value from Table 11.5, F5,3 = 50.7.
11.40 (a) This is a two-factor ANOVA with replication. (b) There are 5 suppliers since df = 4 (df = r − 1) and 4 quarters since df = 3 (df = c − 1). The total number of observations is 100 since df = 99 (df = n − 1). Therefore, we have a 5×4 data matrix (20 cells), which implies 100/20 = 5 observations per cell (i.e., 5 observations per supplier per quarter). (c) Based on the p-values for Quarter (p = .0009) and Supplier (p < .0001), we conclude that both main effects are significant at α = .01. However, there is also a very strong interaction effect Quarter×Supplier (p = .0073). We conclude that shipment times are influenced by the quarter, the supplier, and the interaction between the quarter (time of year) and the supplier. However, in view of the interaction effect, the main effects may be problematic.
11.41 (a) This is a one-factor ANOVA. The number of bowlers is 5 since df = 4 (df = c − 1); that is, there were 5 data columns. The sample size is 67 since df = 66 (df = n − 1). (c) Based on the p-value from the ANOVA table (p < .0001), we reject the null hypothesis of no difference between the mean scores and conclude there is a difference. (d) The sample variances range from 77.067 to 200.797. To test the hypothesis of homogeneity, we compare Hartley's critical value F5,12 = 5.30 with the sample statistic Fmax = (200.797)/(77.067) = 2.61, and fail to reject the hypothesis of equal variances.
11.42 (a) This is a one-factor ANOVA. (b) Based on the p-value from the ANOVA table (essentially zero), we strongly reject the null hypothesis of no difference in mean profit/asset ratios. (c) The plots indicate that company size (as measured by employees) does affect profitability per dollar of assets. There are possible outliers in several of the groups. (d) Variances may be unequal, based on the dot plots and possible outliers. (e) To test the hypothesis of homogeneity, we compare Hartley's critical value F4,123 = 1.96 with the sample statistic Fmax = (34.351)/(8.108) = 4.24 and reject the hypothesis of equal variances. There isn't anything we can do about it, though. (f) Specifically, the Tukey tests show that small companies differ significantly from medium, large, and huge companies (although the latter three categories are not different at α = .05).
Chapter 12
Bivariate Regression
12.1 For each sample: H0: ρ = 0 versus H1: ρ ≠ 0.
Summary Table
Sample   df    r     tcalc   tcrit   rcrit   Decision
a        18   .45    2.138   2.101   .444    Reject
b        28   .35    1.977   1.701   .306    Reject
c         5   .60    1.677   2.015   .669    Fail to Reject
d        59   .30    2.416   2.390   .297    Reject
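Note: each row of the table follows from t = r√(df)/√(1 − r²) and rcritical = tcritical/√(tcritical² + df). A minimal scipy sketch for sample (a):

    import math
    from scipy.stats import t as t_dist

    r, df = 0.45, 18                            # sample (a), two-tailed alpha = .05
    t_calc = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    t_crit = t_dist.ppf(0.975, df)
    r_crit = t_crit / math.sqrt(t_crit ** 2 + df)
    print(round(t_calc, 3))                     # 2.138
    print(round(t_crit, 3), round(r_crit, 3))   # 2.101 0.444 -> reject H0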
12.2 a. The scatter plot shows a positive correlation between hours worked and weekly pay.
b.
Hours Worked (X)   Weekly Pay (Y)   (xi − x̄)²   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
       10                93            100         7056            840
       15               171             25           36             30
       20               204              0          729              0
       20               156              0          441              0
       35               261            225         7056           1260
x̄ = 20, ȳ = 177, SSxx = 350, SSyy = 15318, SSxy = 2130

r = SSxy/√(SSxx·SSyy) = 2130/√((350)(15318)) = .9199
c. t.025 = 3.182
d. t = r√(n − 2)/√(1 − r²) = .9199√(5 − 2)/√(1 − (.9199)²) = 4.063. We reject the null hypothesis of zero correlation.
e. p-value = .0269.
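Note: the sums of squares and r above can be verified with a few lines of numpy:

    import numpy as np

    x = np.array([10, 15, 20, 20, 35])        # hours worked
    y = np.array([93, 171, 204, 156, 261])    # weekly pay
    ssxx = ((x - x.mean()) ** 2).sum()        # 350
    ssyy = ((y - y.mean()) ** 2).sum()        # 15318
    ssxy = ((x - x.mean()) * (y - y.mean())).sum()  # 2130
    print(round(ssxy / np.sqrt(ssxx * ssyy), 4))    # r = 0.9199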
12.3 a. The scatter plot shows a negative correlation between operators and wait time.
b.
Operators (X)   Wait (Y)   (xi − x̄)²   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
      4            385          4          1444           −76
      5            335          1           144            12
      6            383          0          1296             0
      7            344          1             9            −3
      8            288          4          3481          −118
x̄ = 6, ȳ = 347, SSxx = 10, SSyy = 6374, SSxy = −185

r = −185/√((10)(6374)) = −.7328
c. t.025 = 3.182
d. t = −.7328√(5 − 2)/√(1 − (−.7328)²) = −1.865. We fail to reject the null hypothesis of zero correlation.
e. p-value = .159.
12.4 a. The scatter plot shows little correlation between age and amount spent.

b. rcalculated = −.292
c. t.025 = 2.306
d. t = −.292√(10 − 2)/√(1 − (−.292)²) = −.864
e. rcritical = t.025/√(t.025² + n − 2) = 2.306/√((2.306)² + 10 − 2) = .632
f. Because |rcalculated| = .292 does not exceed .632, we fail to reject the null hypothesis of zero correlation.
12.5 a. The scatter plot shows a positive correlation between returns from last year and returns from this year.
b. rcalculated = .5313
c. t.025 = 2.131
d. t = .5313√(17 − 2)/√(1 − (.5313)²) = 2.429
e. rcritical = 2.131/√((2.131)² + 17 − 2) = .482
f. Because rcalculated (.5313) > .482, we reject the null hypothesis of zero correlation.
12.6 a. The scatter plot shows a positive correlation between orders and ship cost.
b. rcalculated = .820
c. t.025 = 2.228
d. t = .820√(12 − 2)/√(1 − (.820)²) = 4.530
e. rcritical = 2.228/√((2.228)² + 12 − 2) = .576
f. Because rcalculated (.820) > .576, we reject the null hypothesis of zero correlation.
12.7 a. Correlation Matrix
           1-Year   10-Year
1-Year      1.000
3-Year      -.095
5-Year       .014
10-Year      .341    1.000
n = 12 (sample size)
.576 critical value, α = .05 (two-tail)
.708 critical value, α = .01 (two-tail)
d. There were positive correlations between years 3 and 5 and years 5 and 10. Higher returns in Year 3 lead
to higher returns in Year 5 and also in Year 10.
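Note: a correlation matrix like this one can be produced with numpy's corrcoef. A minimal sketch (the 12×4 matrix below is randomly generated as a stand-in, since the actual fund returns are in the textbook data file):

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical stand-in for 12 funds x 4 horizons (1-, 3-, 5-, 10-year).
    returns = rng.normal(8.0, 3.0, size=(12, 4))
    corr = np.corrcoef(returns, rowvar=False)   # 4 x 4 correlation matrix
    print(corr.round(3))
    # With n = 12, an |r| above .576 is significant at alpha = .05 (two-tail).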
12.8 a. An increase in the price of $1 reduces its expected sales by 37.5 units.
b. Sales = 842 − 37.5(20) = 92
c. From a practical point of view, no. A zero price is unrealistic.
12.9 a. Increasing the size by 1 square foot raises the price by $150.
b. HomePrice = 125000 + 150*(2000) = $425,000
c. No, the intercept has no meaning.
12.10 a. Increasing revenue by $1 million raises net income by $30,700 (0.0307 million).
b. If revenue is zero, the model says net income is 2277 million dollars. Since a firm with no revenue would not be expected to show net income, the intercept does not seem meaningful.
c. Net Income = 2277 + .0307(1000) = 2307.7 million dollars
12.11 a. Increasing the median income by $1,000 raises the median home price by $2610.
b. If median income is zero, then the model suggests that median home price is $51,300. While it does not
seem logical that the median family income for any city is zero, it is unclear what the lower bound would
be.
c. HomePrice = 51.3 + 2.61(50) = 181.8, or $181,800
HomePrice = 51.3 + 2.61(100) = 312.3, or $312,300
12.12 a. Increasing the number of hours worked per week by 1 hour reduces the expected number of credits by .07.
b. Yes, the intercept makes sense in this situation. It is possible that a student does not have a job outside of school.
c. Credits = 15.4 − .07(0) = 15.4 credits
Credits = 15.4 − .07(40) = 12.6 credits
The more hours a student works, the fewer credits (courses) he will take on average.
12.13 a. Chevy Blazer: a one year increase in vehicle age reduces the price by $1050.
Chevy Silverado: a one year increase in vehicle age reduces the price by $1339.
b. Chevy Blazer: If age = 0 then price = $16,189. This could be the price of a new Blazer.
Chevy Silverado: If age = 0 then price = $22,951. This could be the price of a new Silverado.
c. 16,189 − 1,050(5) = $10,939
22,951 − 1,339(5) = $16,256
12.14 a. Tips = 20+ 10*Hours (Answers will vary.)
b. One hour of work yields on average $10 in tips.
c. The intercept has no meaning in this case.
12.15 a. Units Sold = 300 − 150*Price (Answers will vary.)
b. A one-dollar reduction in price increases units sold by 150 on average.
c. If price is zero, then units sold = 300. This is not meaningful; price is never zero.
12.16 a.
Hours Worked (X)   Weekly Pay (Y)   (xi − x̄)²   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
       10                93            100         7056            840
       15               171             25           36             30
       20               204              0          729              0
       20               156              0          441              0
       35               261            225         7056           1260
x̄ = 20, ȳ = 177, SSxx = 350, SSyy = 15318, SSxy = 2130
b. b1 = 2130/350 = 6.086, b0 = 177 − 6.086(20) = 55.286, so ŷ = 55.286 + 6.086x
c.
xi    yi     ŷi        yi − ŷi    (yi − ŷi)²   (ŷi − ȳ)²   (yi − ȳ)²
10    93    116.146   −23.146     535.737     3703.209      7056
15   171    146.576    24.424     596.532      925.620        36
20   204    177.006    26.994     728.676        0.000       729
20   156    177.006   −21.006     441.252        0.000       441
35   261    268.296    −7.296      53.232     8334.960      7056
Sums: SSE = 2355.429, SSR = 12963.79, SST = 15318
d. R² = 12,963/15,318 = .8462
e.
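Note: parts (b) and (d) can be checked with a short least-squares sketch in numpy:

    import numpy as np

    x = np.array([10, 15, 20, 20, 35])
    y = np.array([93, 171, 204, 156, 261])
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    r2 = ((y_hat - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    print(round(b1, 3), round(b0, 3), round(r2, 4))  # 6.086 55.286 0.8462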
12.17 a.
Operators (X)   Wait (Y)   (xi − x̄)²   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
      4            385          4          1444           −76
      5            335          1           144            12
      6            383          0          1296             0
      7            344          1             9            −3
      8            288          4          3481          −118
x̄ = 6, ȳ = 347, SSxx = 10, SSyy = 6374, SSxy = −185
b. b1 = −185/10 = −18.5, b0 = 347 + 18.5(6) = 458, so ŷ = 458 − 18.5x
c.
xi    yi     ŷi       yi − ŷi    (yi − ŷi)²   (ŷi − ȳ)²   (yi − ȳ)²
 4   385    384.0       1.0         1.00      1369.00       1444
 5   335    365.5     −30.5       930.25       342.25        144
 6   383    347.0      36.0      1296.00         0.00       1296
 7   344    328.5      15.5       240.25       342.25          9
 8   288    310.0     −22.0       484.00      1369.00       3481
Sums: SSE = 2951.5, SSR = 3422.5, SST = 6374
d. R² = 3,422.5/6,374.0 = .5369
e.
12.18 a. and b.
c. An increase of 1% in last year's return leads to an increase, on average, of .458% in this year's return.
d. If last year's return is zero, this year's return is 11.155%. Yes, this is meaningful; returns can be zero.
e. R² = .2823. Only 28.23% of the variation in this year's return is explained by last year's return.
12.19 a. and b.
c. An increase of 100 orders leads to an average increase in shipping cost of $493.22.
d. The intercept is not meaningful in this case.
e. R² = .6717. 67.17% of the variation in shipping costs is explained by the number of orders.
12.20 a. and b.
c. An increase in age of 10 years leads to an average decrease in spending of $0.53.
d. The intercept is not meaningful in this case.
e. R² = .0851. Only 8.51% of the variation in spending is due to variation in age; the age of the consumer has little impact on the amount spent.
12.21 a. Y = 557.4511 + 3.0047*X
b. The 95% confidence interval is 3.0047 ± 2.042(0.8820), or (1.203, 4.806).
c. H0: β1 ≤ 0 versus H1: β1 > 0. Reject the null hypothesis if t > 1.697. t = 3.407, so we reject the null hypothesis.
d. p-value = .000944 so we reject the null hypothesis. The slope is positive. Increased debt is correlated
with increased NFL team value.
12.22 a. Y = 7.6425 + 0.9467*X
b. The 95% confidence interval is 0.9467 ± 2.145(0.0936), or (0.7460, 1.1473).
c. H0: β1 ≤ 0 versus H1: β1 > 0. Reject the null hypothesis if t > 1.761. t = 10.118, so we reject the null hypothesis.
d. p-value = .000 so we reject the null hypothesis. The slope is positive. Increased revenue is correlated
with increased expenses.
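Note: the slope confidence intervals in 12.21 and 12.22 both follow the pattern b1 ± t.025·SE. A minimal scipy sketch for 12.22:

    from scipy.stats import t as t_dist

    b1, se, df = 0.9467, 0.0936, 14      # slope, standard error, n - 2
    t_crit = t_dist.ppf(0.975, df)       # about 2.145
    print(round(b1 - t_crit * se, 4), round(b1 + t_crit * se, 4))
    # about (0.7459, 1.1475); the printed interval differs only by rounding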
12.23 a. Y = 1.8064 + .0039*X
b. Intercept: t = 1.8064/0.6116 = 2.954, Slope: t = 0.0039/0.0014 = 2.786 (Excel output may be different
due to internal rounding.)
c. df = 10, t.025 = 2.228.
d. Intercept: p-value = .0144. Slope: p-value = .0167.
e. (2.869)² = 8.23
f. This model fits the data fairly well. The F statistic is highly significant, and R² = .452 indicates that almost half of the variation in annual taxes is explained by home price.
12.24 a. Y = 614.930 − 109.11*X
b. Intercept: t = 614.930/51.2343 = 12.002. Slope: t = −109.112/51.3623 = −2.124.
c. df = 18, t.025 = 2.101.
d. Intercept: p-value = .0000. Slope: p-value = .0478.
e. (−2.124)² = 4.51
f. This model has a poor fit. The F statistic is barely significant at a level of .05 and R² = .20: only 20% of the variation in units sold can be explained by average price.
12.25 a.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5313
R Square             0.2823
Adjusted R Square    0.2345
Standard Error       4.3346
Observations         17

ANOVA
             df   SS         MS         F        Significance F
Regression    1   110.8585   110.8585   5.9002   .0282
Residual     15   281.8321    18.7888
Total        16   392.6906

            Coefficients   Std Error   t Stat   p-value   Lower 95%   Upper 95%
Intercept    11.1549        2.1907     5.0918   .0001      6.4854     15.8243
Last Year     0.4580        0.1885     2.4290   .0282      0.0561      0.8598
b. (0.0561, 0.8598). This interval does not contain zero; therefore we can conclude that the slope is greater than zero.
c. The t statistic is 2.429 and the p-value is .0282. Because the p-value is less than .05, we can conclude that the slope is positive.
d. F = 5.90 with a p-value = .0282. This indicates that the model does provide some fit to the data.
e. The p-values match, and (2.429)² = 5.90.
f. This model provides a modest fit to the data. Although the F statistic is significant, R² shows that only 28% of the variation in this year's return is explained by last year's return.
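Note: part (e)'s check — that the ANOVA F equals the squared slope t statistic in a bivariate regression — can be confirmed from the unrounded output values:

    # Unrounded t and F for the slope, from the regression output above.
    t_slope = 2.429041
    f_stat = 5.9002402
    print(round(t_slope ** 2, 4), round(f_stat, 4))  # both 5.9002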
12.26 a.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8196
R Square             0.6717
Adjusted R Square    0.6388
Standard Error       599.0290
Observations         12

ANOVA
             df   SS             MS             F         Significance F
Regression    1    7,340,819.55   7,340,819.55  20.4573   .0011
Residual     10    3,588,357.12     358,835.71
Total        11   10,929,176.67

            Coefficients   Std Error   t Stat    p-value   Lower 95%    Upper 95%
Intercept    −31.1895      1059.8678   −0.0294   .9771     −2392.7222   2330.3432
Orders         4.9322         1.0905    4.5230   .0011         2.5024      7.3619
b. (2.502, 7.362). This interval does not contain zero; therefore we can conclude that the slope is greater than zero.
c. The t statistic is 4.523 and the p-value is .0011. Because the p-value is less than .05, we can conclude that the slope is positive.
d. F = 20.46 with a p-value = .0011. This indicates that the model does provide some fit to the data.
e. The p-values match, and (4.523)² = 20.46.
f. This model provides a good fit to the data. The F statistic is highly significant and R² shows that 67% of the variation in shipping cost is explained by the number of orders.
12.27 a.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.2918
R Square             0.0851
Adjusted R Square    −0.0292
Standard Error       2.1284
Observations         10

ANOVA
             df   SS        MS       F        Significance F
Regression    1    3.3727   3.3727   0.7445   .4133
Residual      8   36.2396   4.5299
Total         9   39.6123

            Coefficients   Std Error   t Stat    p-value   Lower 95%   Upper 95%
Intercept     6.9609        2.0885      3.3330   .0103      2.1449     11.7770
Age          −0.0530        0.0614     −0.8629   .4133     −0.1946      0.0886
b. (−0.1946, 0.0886). This interval does contain zero; therefore we cannot conclude that the slope is greater than zero.
c. The t statistic is −0.863 and the p-value is .4133. Because the p-value is greater than .05, we cannot conclude that the slope is positive.
d. F = 0.745 with a p-value = .4133. This indicates that the model does not fit the data.
e. The p-values match, and (−0.863)² = 0.745.
f. This model does not fit the data. The F statistic is not significant and R² shows that only 8.5% of the variation in dollars spent is explained by the moviegoer's age.
12.28 For only two of the data sets, F and I, are the data time series. The rest are cross-sectional data.
12.29 Answers will vary.
12.30 Answers will vary.
12.31 For A, B, D, F, H, and I we should expect a positive sign for the slope.
12.32 A: positive relationship between income and home price
B: positive relationship between employees and revenue
C: positive relationship between ELOS and ALOS
D: positive relationship between HP and Cruising Speed
E: inverse relationship between years in circulation and weight of a nickel
F: no relationship between changes in the money supply and changes in the CPI
G: inverse relationship between weight of a car and the gas mileage it gets in the city.
H: positive relationship between fat calories per gram and calories per gram
I: positive relationship between usage of electricity and monthly expenditure.
[Scatter plots for Data Sets A through I.]
12.33 A: Yes
B: Yes
C: Yes
D: Yes
E: Yes
F: No
G: Yes
H: Yes
I: Yes
12.34 A: An increase in median income of $1,000 increases home price by $2,609.8. No, the intercept does not have meaning.
B: An increase in the number of employees by 1 unit increases revenue by .304 units. Yes, the intercept does have meaning; it is possible that revenue is zero.
C: An increase in ELOS of 1 month increases ALOS by 1.03 months. No, the intercept does not have meaning.
D: An increase of one unit of horsepower increases cruise speed by .1931 mph. No, the intercept does not have meaning.
E: An increase in age of 1 year reduces the weight by 0.004 grams.
F: An increase in M1 of 1% in the prior year increases the CPI by .1993% in the current year.
G: An increase in the weight of a car by 1 pound reduces its city mpg by 0.0045 mpg.
H: An increase in the fat calories per gram by 1 increases total calories per gram by 2.2179.
I: An increase of 1 kWh of usage increases the monthly expenditure by $0.1037.
For 12.35 through 12.43, filling out the MegaStat Regression dialog box as displayed below will provide the information required for these questions. The dialog box displayed is for Data Set A.
12.36 a. A: No, it means that the slope is different from zero.
B: No, it means that the slope is different from zero.
C: No, it means that the slope is different from zero.
D: No, it means that the slope is different from zero.
E: No, it means that the slope is different from zero.
F: Yes, it means that the slope is not different from zero.
G: No, it means that the slope is different from zero.
H: No, it means that the slope is different from zero.
I: No, it means that the slope is different from zero.
b. The hypothesis for each data set is: H0: β1 = 0 versus H1: β1 ≠ 0.
A: DF = 32, t-critical = 2.037; (e) Yes.
B: DF = 22, t-critical = 2.074; (e) Yes.
C: DF = 32, t-critical = 2.145; (e) Yes.
D: DF = 50, t-critical = 2.009; (e) Yes.
E: DF = 29, t-critical = 2.045; (e) Yes.
F: DF = 39, t-critical = 2.023; (e) No.
G: DF = 41, t-critical = 2.020; (e) Yes.
H: DF = 18, t-critical = 2.101; (e) Yes.
I: DF = 22, t-critical = 2.074; (e) Yes.
c. The p-value measures the chance of making this sample observation if the null hypothesis were true. Small p-values tell us that the null is likely false.
d. The p-value approach is easier since the p-value is reported as part of the regression output and can easily
be compared to the level of significance.
e. See part b above.
12.37 A: (a) Good; (b) see 12.36 (c); (c) Yes.
B: (a) Very Good; (b) see 12.36 (c); (c) Yes.
C: (a) Very Good; (b) see 12.36 (c); (c) Yes.
D: (a) Good; (b) see 12.36 (c); (c) Yes.
E: (a) Good; (b) see 12.36 (c); (c) Yes.
F: (a) Very Poor; (b) see 12.36 (c); (c) No.
G: (a) Good; (b) see 12.36 (c); (c) Yes.
H: (a) Very Good; (b) see 12.36 (c); (c) Yes.
I: (a) Excellent; (b) see 12.36 (c); (c) Yes.
12.38 A: observations 20 and 29 have unusual residuals; no outliers.
B: observations 6 and 21 have unusual residuals; no outliers.
C: no observations have unusual residuals or outliers.
D: observation 42 has an unusual residual; observation 28 is an outlier.
E: observations 5, 8, and 28 have unusual residuals.
F: observations 14, 19, and 20 have unusual residuals.
G: observation 42 is an outlier. There are no unusual residuals.
H: There are no unusual residuals or outliers.
I: observation 14 has an unusual residual and observation 16 is an outlier.
12.39* Assumption of normal errors violated for: G and I
12.40* Heteroscedasticity a problem for: None
12.41* The Durbin-Watson test is appropriate only for F and I. For F the value is .58, indicating that autocorrelation is present; for I the value is 1.95, indicating that autocorrelation is not present.
12.42* Answers will vary.
12.43* A: observation 8 has high leverage.
B: observations 2 and 8 have high leverage.
C: observations 4 and 10 have high leverage.
D: observations 2, 3, 4, 14, 16, and 17 have high leverage.
E: observations 5 and 25 have high leverage.
F: observations 27, 33, 37, and 41 have high leverage.
G: observations 12, 13 and 22 have high leverage.
H: observations 6 and 11 have high leverage.
I: observations 9 and 13 have high leverage.
12.44 No, r measures the strength and direction of the linear relationship, but not the amount of variation explained by the explanatory variable; that is the role of R².
12.45 H0: ρ = 0 versus H1: ρ ≠ 0. tcritical = 2.3069. t = 2.3256, so we reject the null hypothesis. The correlation is not zero.
12.46 The correlation coefficient is only .13, indicating a very weak positive correlation between prices on successive days. The result is highly significant because the large sample size increases the power of the test; with enough data, even very small correlations will show statistical significance even though the correlation is not practically important.
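Note: to see how sample size alone drives the significance of a small correlation such as r = .13, here is a minimal scipy sketch (the sample sizes are illustrative):

    import math
    from scipy.stats import t as t_dist

    r = 0.13
    for n in (30, 300, 3000):               # illustrative sample sizes
        t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
        p = 2 * t_dist.sf(abs(t), n - 2)
        print(n, round(t, 2), round(p, 4))  # same r, shrinking p-value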
12.47 a. Y = 55.2 + .73(2000) = 1515.2 total free throws expected.
b. No, the intercept is not meaningful. You can't make free throws without attempting them.
c. Quick rule: ŷi ± t·syx with df = 27: 1515.2 ± 2.052(53.2), or (1406.03, 1624.37).
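Note: the quick-rule interval in part (c) can be reproduced with a minimal scipy sketch:

    from scipy.stats import t as t_dist

    y_hat, s_yx, df = 1515.2, 53.2, 27
    t_crit = t_dist.ppf(0.975, df)          # about 2.052
    print(round(y_hat - t_crit * s_yx, 2),
          round(y_hat + t_crit * s_yx, 2))  # about (1406.04, 1624.36)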


12.48 a. Y = 30.7963 + .0343*X (R² = .202, syx = 6.816)
b. DF = 33, t.025 = 2.035
c. t = 2.889 so we will reject the null hypothesis that the slope is zero.
d. We are 95% confident that the slope is contained in the interval .0101 to .0584. This CI does not contain
zero, hence, there is a relationship between the weekly pay and the income tax withheld.
e. (2.889)² = 8.3463
f. The value of R-squared assigns only 20% of the variation in income tax withholding to weekly pay. While the F statistic is significant, the fit is only modest.

12.49 a. Y = 1743.57 − 1.2163*X (R² = .370, syx = 286.793)
b. DF = 13, t.025 = 2.160
c. t = −2.764, so we reject the null hypothesis that the slope is zero.
d. We are 95% confident that the slope is contained in the interval −2.1617 to −0.2656. This CI does not contain zero; hence, there is a relationship between monthly maintenance spending and monthly machine downtime.
e. (−2.764)² = 7.64
f. The value of R-squared assigns only 37% of the variation in monthly machine downtime to the monthly
maintenance spending (dollars). Thus, throwing more money at the problem of downtime will not
completely resolve the issue. Indicates that there are most likely other reasons why machines have the
amount of downtime incurred.
12.50 a. Y = 6.5763 + 0.0452*X (R² = .519, syx = 6.977)
b. DF= 62, t.025 = 2.00 (using DF = 60)
c. t = 8.183 so we will reject the null hypothesis that the slope is zero.
d. We are 95% confident that the slope is contained in the interval 0.0342 to 0.0563. This CI does not
contain zero, hence, there is a relationship between the total assets (billions) and total revenue (billions).
e. (8.183)² = 66.96
f. The value of R-squared assigns 51.9% of the variation in total revenue (billions) to the total assets
(billions). Thus, increasing assets will lead to an increase in income. However, the results also indicate
that there are most likely other reasons why companies earn the revenue they do.
12.51 a. r = .677
b. The critical values for α = .01 are ±.393. The correlation coefficient of .677 is outside these limits, so we reject the hypothesis of no correlation; the sample evidence supports the notion of positive correlation.
c. The scatterplot shows a positive correlation between IBM and EDS stock prices.
12.52 a.
b. r = .792. This shows a fairly strong positive linear relationship between gestation and longevity.
c. At α = .01, the correlation coefficient of .792 is outside the critical range ±.537. We reject the hypothesis of no correlation. There is significant correlation.
12.53 a. The scatter plot indicates that there is a negative correlation between life expectancy and fertility.
b. r = −.846. There is a strong negative linear relationship between a nation's life expectancy and its fertility rate.
c. At α = .01, the correlation coefficient of −.846 is outside the critical range ±.463. We reject the hypothesis of no correlation. There is a negative correlation between life expectancy and fertility rates.
12.54 a. The scatter plot shows almost no pattern.
b. r = −.105. At α = .05, the correlation coefficient of −.105 is not outside the critical range ±.381. We fail to reject the hypothesis of no correlation. It appears there is very little relationship between price and accuracy rating of speakers.
12.55 For each of these, the scatter plot will contain the answers to (a), (b), and (d) with respect to the fitted
equation.
c. Salary: The fit is good. Assessed: The fit is excellent. HomePrice2: The fit is good.
d. Salary: An increase in the age by 1 year increases salary by $1447.4.
Assessed: An increase in 1 sq. ft. of floor space increases assessed value by $313.30.
HomePrice2: An increase in 1 sq. ft. of home size increases the selling price by $209.20.
e. The intercept is not meaningful for any of these data sets, as a zero value for any of the X's cannot realistically result in a positive Y value.
12.56 a. t = estimated slope / standard error

Dependent Variable          Estimated Slope   Std Error   t      p-value   Differ from 0?
Highest grade achieved      -0.027            0.009       3.00   0.008     Yes
Reading grade equivalent    -0.07             0.018       3.89   0         Yes
Class standing              -0.006            0.003       2.00   0.048     No
Absence from school         4.8               1.7         2.82   0.006     Yes
Grammatical reasoning       0.159             0.062       2.57   0.012     No
Vocabulary                  -0.124            0.032       3.88   0         Yes
Hand-eye coordination       0.041             0.018       2.28   0.02      No
Reaction time               11.8              6.66        1.77   0.08      No
Minor antisocial behavior   -0.639            0.36        1.77   0.082     No
c. It would be inappropriate to assume cause and effect without a better understanding of how the study was
conducted.
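A minimal sketch of the computation behind each table row. The residual degrees of freedom (df = 98) is an assumed value for illustration only; the study's own df would give slightly different p-values:

```python
# t = estimated slope / standard error; two-tailed p-value from the t distribution.
from scipy import stats

def slope_test(b, se, df=98):   # df = 98 is assumed, for illustration only
    t = b / se
    return t, 2 * stats.t.sf(abs(t), df)

t, p = slope_test(-0.027, 0.009)
print(round(t, 2), round(p, 3))  # -3.0, p ≈ .003 at df = 98
```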
12.57 a. See scatter plot.
c. The fit of this regression is weak, as given by R² = 0.2474. Only 24.7% of the variation in % Operating Margin is explained by % Equity Financing.
12.58 a. See scatter plot.
c. The fit of this regression is very good, as given by the R² value 0.8216. The regression line shows a strong positive linear relationship between molecular weight and retention time: the greater the molecular weight, the greater the retention time.
12.59 a. Based on both R² = 0 and the p-value > .10, there is no relationship between class size and teacher ratings.
b. Given that R² = 0, this bivariate model does not explain teacher ratings. Other factors would be students' expected GPA, years teaching, core class, age of student, gender of student, gender of instructor, etc. Answers will vary with respect to other teachers.
12.60 a. The scatter plot shows a positive relationship.
c. The fit of this regression is very good, as given by the R² value .8206. The regression line shows a strong positive linear relationship between revenue and profit: the greater the revenue, the greater the profit.
12.61 a. The slope of each model indicates the impact an additional year of age has on the asking price. This relationship is negative for each model, indicating that an additional year of age reduces the asking price. The impact ranges from a low for the Taurus (an additional year reduces the asking price by $906) to a high for the Ford Explorer (an additional year reduces the asking price by $2,452).
b. The intercepts could indicate the price of a new vehicle.
c. Based on the R-squared values: the fit is very good for the Explorer, the F-150 Pickup, and the Taurus. The fit is weak for the Mustang. One reason for the seemingly poor fit for the Mustang is that it is a collector item (if in good condition), so age is a less important factor in determining the asking price.
d. Answers will vary, but a bivariate model for 3 of the vehicles explains approximately 2/3 of the variation
in asking price at a minimum. Other factors: condition of the car, collector status, proposed usage, price
of a new vehicle.
12.62 a. The regression results are not significant, based on the p-value, for the 1-year holding period. The results for the 2-year period are significant at the 5% level, while beyond 2 years the results are significant at the 1% level. For each regression there is an inverse relationship between P/E and the stock return. For the 8-year and 10-year periods the slope is approximately −1. The R-squared increases as the holding period increases, indicating that the P/E ratio explains a greater portion of the variation in stock return the longer the stock is held.
b. Yes, given the data are time series, the potential for autocorrelation is present. Also, it is commonly
recognized that stock returns do exhibit a high degree of autocorrelation, as do most financial series.
12.63 a. Using Father's Height: My Predicted Height = 71 + 2.5 = 73.5. My actual height = 73.
Using Average of Parents' Height: My Predicted Height = 68 + 2.5 = 70.5.
b. Fairly accurate: within 0.5 when using my father's height, within 2.5 when using average parent height. Maybe there is improved accuracy using only father's height for males.
c. Regression analysis of separate samples of daughters and sons, each with the respective average height of parents.
Chapter 13
Multiple Regression
13.1 a. Y = 4.31 − 0.082*ShipCost + 2.265*PrintAds + 2.498*WebAds + 16.7*Rebate%
b. The coefficient of ShipCost says that each additional $1 of shipping cost reduces net revenue by about $0.082.
The coefficient of PrintAds says that each additional $1000 of print ads adds about $2,265 to net revenue.
The coefficient of WebAds says that each additional $1000 of web ads adds about $2,498 to net revenue.
The coefficient of Rebate% says that each additional percentage point in the rebate rate adds about $16,700 to net revenue.
c. The intercept is meaningless. You have to ship some product, so shipping cost can't be zero; rebates and ads, however, can be zero.
d. NetRevenue = $467,160.
13.2 a. Y = 1225 + 11.52*FloorSpace − 6.935*CompetingAds − .1496*Price
b. The coefficient of FloorSpace says that each additional square foot of floor space adds about 11.52 to sales (in thousands of dollars).
The coefficient of CompetingAds says that each additional $1000 of competing ads reduces sales by about 6.935 (in thousands of dollars).
The coefficient of Price says that each additional $1 of advertised price reduces sales by about .1496 (in thousands of dollars).
c. No. If all of these variables were zero, you wouldn't sell a bike (no one will advertise a bike for zero).
d. Sales = $48.6 thousand
13.3 a. DF are 4, 45
b. F.05 = 2.61, using df = 4, 40.
c. F = 64,853/4,990 = 12.997. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = β4 = 0)
H1: At least one coefficient is non-zero
d. R² = 259,412/483,951 = .536; R²adj = 1 − (1 − .536)(49/45) = .4948
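A quick sketch verifying the 13.3 arithmetic directly from the sums of squares:

```python
# F, R^2, and adjusted R^2 from the ANOVA decomposition in 13.3.
ss_reg, ss_total = 259_412, 483_951
n, k = 50, 4                         # df = (4, 45), so n - k - 1 = 45
ss_err = ss_total - ss_reg

F = (ss_reg / k) / (ss_err / (n - k - 1))
r2 = ss_reg / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(F, 3), round(r2, 3), round(r2_adj, 4))  # 12.997 0.536 0.4948
```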
13.4 a. DF are 3, 26
b. F.05 = 2.98, using df = 3, 26.
c. F = 398,802/14,590 = 27.334. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = 0)
H1: At least one coefficient is non-zero
d. R² = 1,196,410/1,575,741 = .759; R²adj = 1 − (1 − .759)(29/26) = .731
13.5 a.
Predictor    t-value       p-value
Intercept    0.0608585     0.9517414
ShipCost     -0.0175289    0.9860922
PrintAds     2.1571429     0.0363725
WebAds       2.9537661     0.0049772
Rebate%      4.6770308     (< .0001)
b. t.005 = 2.69. WebAds and Rebate% differ significantly from zero (p-value < .01 and |t| > 2.69).
c. See table in part a.
13.6 a.
Predictor      t-value       p-value
Intercept      3.0843192     0.0034816
FloorSpace     8.6631579     (< .0001)
CompetingAds   -1.7759283    0.0825069
Price          -1.6752548    0.1008207
b. t.005 = 2.779. Only FloorSpace differs significantly from zero (p-value < .01 and |t| > 2.779).
c. See the table in part a.
13.7 Use formulas in text 13.11b and 13.12b and t with 34 df and .025 in the upper tail:
ŷi ± t(n−k−1)·SE = ŷi ± 2.032*3620 = ŷi ± 7355.84
Using the quick rule: ŷi ± 2SE = ŷi ± 2*(3620) = ŷi ± 7240
Yes, the quick rule gives similar results.
13.8 Use formulas 13.11b and 13.12b and t with 20 df and .025 in the upper tail:
ŷi ± t(n−k−1)·SE = ŷi ± 2.086*1.17 = ŷi ± 2.44062
Using the quick rule: ŷi ± 2SE = ŷi ± 2*(1.17) = ŷi ± 2.34
Yes, the quick rule gives similar results.
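A small sketch of the exact half-width versus the quick rule, using the 13.8 numbers (the fitted value ŷ = 50.0 is assumed purely for illustration):

```python
# Approximate prediction interval: yhat ± t*SE, versus the quick rule yhat ± 2*SE.
from scipy import stats

yhat, se, df = 50.0, 1.17, 20        # yhat is an assumed fitted value
t = stats.t.ppf(0.975, df)           # 2.086
print(round(t * se, 5))              # 2.44062 (exact half-width)
print(round(2 * se, 2))              # 2.34 (quick-rule half-width)
```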
13.9 All are cross-sectional data except for Data Set D which is time series.
13.10 Answers will vary. Casual observation indicates that X and Y data for each data set are well conditioned.
13.11 Answers will vary. Sample Answers based on selection of independent variables:
A: Length (-) Width (-) Weight (-) Japan (+)
B: Price (-) Shelf (+)
C: Floor (+) Offices (+) Entrances (+) Age (-) Freeway (+)
D: CapUtil (+) ChgM1 (+) ChgM2 (+) ChgM3 (+)
E: Dropout (-) EdSpend (+) Urban (+) Age (-) FemLab (+) Neast (+) Seast (+) West (+)
F: TotalHP (+) NumBlades (+) Turbo (+)
G: MW(+) BP (-) RI (+) H1 (+) H2 (+) H3 (+) H4 (+) H5 (+)
H: Height (+) Line (+) LB (+) DB (+) RB (+)
I: Age (+) Weight (+) Height (+) Neck (+) Chest (+) Abdomen (+) Hip (+) Thigh (+)
J: Age (-) Car (+) Truck (+) SUV (+)
13.12 Evans's Rule (n/k ≥ 10): A, B, D, F, H, J
Doane's Rule (n/k ≥ 5): C, E, G, I
13.13 Data Set A: y = 43.9932 − 0.0039Length − 0.1064Width − 0.0041Weight − 1.3228Japan.
Data Set B: y = 87.1968 − 0.0016Price + 1.3881Shelf.
Data Set C: y = −59.3894 + 0.2509Floor + 97.7927Offices + 72.8405Entrances − 0.4570Age + 116.1786Freeway.
Data Set D: y = −21.6531 + 0.2745CapUtil + 0.2703ChgM1 − 0.2012ChgM2 + 0.4630ChgM3.
Data Set E: y = −2.1471 − 0.0258Dropout + 0.0006EdSpend + 0.0891Urban − 0.2685Age + 0.3516Femlab + 3.9749Neast + 1.4456Seast + 1.8117West.
Data Set F: y = −696.9390 + 0.3927Year + 0.1787TotalHP + 8.8269NumBlades + 15.9752Turbo.
Data Set G: y = 51.3827 − 0.1772MW + 1.4901BP − 13.1620RI − 13.8067H1 − 6.4334H2 − 12.2297H3 − 0.5823H4.
Data Set H: y = −12.0098 + 2.8141Height + 69.0801Line + 23.7299LB − 5.3320DB.
Data Set I: y = −35.4309 + 0.0905Age − 0.1928Weight − 0.0642Height − 0.3348Neck + 0.0239Chest + 0.9132Abdomen − 0.3107Hip + 0.7787Thigh.
Data Set J: y = 15,340.7233 − 693.9768Age − 533.5731Car + 5,748.1799Truck + 3,897.5375SUV.
The Regression Analysis output for each data set follows. Please refer to the output for answers to questions 13.14–13.17.
Data Set A
R²            0.703
Adjusted R²   0.671      n 43
R             0.838      k 4
Std. Error    2.505      Dep. Var. City

ANOVA table
Source       SS         df    MS         F       p-value
Regression   563.9264    4    140.9816   22.46   1.40E-09
Residual     238.5387   38      6.2773
Total        802.4651   42

variables    coefficients   std. error   t (df=38)   p-value    95% lower   95% upper   VIF
Intercept    43.9932        8.4767        5.190      7.33E-06    26.8330     61.1534
Length       -0.0039        0.0445       -0.087      .9311       -0.0939      0.0862    2.672
Width        -0.1064        0.1395       -0.763      .4501       -0.3888      0.1759    2.746
Weight       -0.0041        0.0008       -4.955      1.53E-05    -0.0058     -0.0024    2.907
Japan        -1.3228        0.8146       -1.624      .1127       -2.9718      0.3262    1.106
Mean VIF     2.358
Data Set B
R²            0.034
Adjusted R²   0.000      n 27
R             0.185      k 2
Std. Error    4.060      Dep. Var. Accuracy

ANOVA table
Source       SS         df    MS        F      p-value
Regression    14.0006    2     7.0003   0.42   .6588
Residual     395.6290   24    16.4845
Total        409.6296   26

variables    coefficients   std. error   t (df=24)   p-value    95% lower   95% upper   VIF
Intercept    87.1968        2.4030       36.286      1.76E-22    82.2372     92.1564
Price        -0.0016        0.0047       -0.338      .7382       -0.0113      0.0081    1.054
Shelf        1.3881         1.8307        0.758      .4557       -2.3903      5.1666    1.054
Data Set C
R²            0.967
Adjusted R²   0.961      n 32
R             0.983      k 5
Std. Error    90.189     Dep. Var. Assessed

ANOVA table
Source       SS               df    MS             F        p-value
Regression   6,225,261.2561    5    1,245,052.25   153.07   2.01E-18
Residual       211,486.6189   26        8,134.11
Total        6,436,747.8750   31

variables    coefficients   std. error   t (df=26)   p-value    95% lower   95% upper   VIF
Intercept    -59.3894       71.9826      -0.825      .4168      -207.3520     88.5731
Floor        0.2509         0.0218       11.494      1.08E-11      0.2060      0.2957   3.757
Offices      97.7927        30.8056       3.175      .0038        34.4708    161.1146   3.267
Entrances    72.8405        38.7501       1.880      .0714        -6.8115    152.4924   1.638
Age          -0.4570        1.2011       -0.380      .7067        -2.9258      2.0118   1.169
Freeway      116.1786       34.7721       3.341      .0025        44.7035    187.6536   1.185
Mean VIF     2.203
Data Set D
R²            0.347
Adjusted R²   0.275      n 41
R             0.589      k 4
Std. Error    2.672      Dep. Var. ChgCPI

ANOVA table
Source       SS         df    MS        F      p-value
Regression   136.8772    4    34.2193   4.79   .0033
Residual     257.0584   36     7.1405
Total        393.9356   40

variables    coefficients   std. error   t (df=36)   p-value    95% lower   95% upper   VIF
Intercept    -21.6531       9.5228       -2.274      .0290      -40.9662     -2.3399
CapUtil      0.2745         0.1130        2.429      .0203        0.0453      0.5038    1.205
ChgM1        0.2703         0.1069        2.530      .0159        0.0536      0.4870    1.193
ChgM2        -0.2012        0.2981       -0.675      .5040       -0.8058      0.4034    5.017
ChgM3        0.4630         0.2463        1.879      .0683       -0.0366      0.9626    4.489
Mean VIF     2.976
Data Set E
R²            0.729
Adjusted R²   0.677      n 50
R             0.854      k 8
Std. Error    2.128      Dep. Var. ColGrad%

ANOVA table
Source       SS         df    MS        F       p-value
Regression   500.5063    8    62.5633   13.82   1.78E-09
Residual     185.6579   41     4.5282
Total        686.1642   49

variables    coefficients   std. error   t (df=41)   p-value    95% lower   95% upper   VIF
Intercept    -2.1471        11.3532      -0.189      .8509      -25.0753     20.7811
Dropout      -0.0258        0.0564       -0.458      .6495       -0.1398      0.0881    2.189
EdSpend      0.0006         0.00036045    1.568      .1245       -0.0002      0.0013    2.343
Urban        0.0891         0.0253        3.520      .0011        0.0380      0.1402    1.492
Age          -0.2685        0.2517       -1.067      .2923       -0.7769      0.2398    1.640
Femlab       0.3516         0.0894        3.935      .0003        0.1711      0.5321    1.652
Neast        3.9749         1.0908        3.644      .0007        1.7720      6.1778    2.254
Seast        1.4456         1.2430        1.163      .2516       -1.0647      3.9559    3.112
West         1.8117         0.9069        1.998      .0524       -0.0198      3.6432    1.831
Mean VIF     2.064
Data Set F
R²            0.768
Adjusted R²   0.750      n 55
R             0.876      k 4
Std. Error    18.097     Dep. Var. Cruise

ANOVA table
Source       SS            df    MS          F       p-value
Regression   54,232.9050    4    13,558.23   41.40   2.75E-15
Residual     16,375.2041   50       327.50
Total        70,608.1091   54

variables    coefficients   std. error   t (df=50)   p-value    95% lower     95% upper   VIF
Intercept    -696.9390      393.3465     -1.772      .0825      -1,486.9990     93.1209
Year         0.3927         0.1991        1.972      .0541          -0.0073      0.7927   1.131
TotalHP      0.1787         0.0195        9.167      2.76E-12        0.1396      0.2179   1.459
NumBlades    8.8269         5.7530        1.534      .1313          -2.7284     20.3823   1.716
Turbo        15.9752        6.2959        2.537      .0143           3.3296     28.6208   1.201
Mean VIF     1.377
Data Set G
R²            0.987
Adjusted R²   0.983      n 35
R             0.993      k 7
Std. Error    8.571      Dep. Var. Ret

ANOVA table
Source       SS             df    MS            F        p-value
Regression   146,878.2005    7    20,982.6001   285.64   1.27E-23
Residual       1,983.3648   27        73.4580
Total        148,861.5653   34

variables    coefficients   std. error   t (df=27)   p-value    95% lower   95% upper   VIF
Intercept    51.3827        162.7418      0.316      .7546      -282.5356    385.3010
MW           -0.1772        0.3083       -0.575      .5701        -0.8097      0.4553   21.409
BP           1.4901         0.1831        8.139      9.64E-09      1.1144      1.8657   31.113
RI           -13.1620       107.2293     -0.123      .9032      -233.1782    206.8542   13.115
H1           -13.8067       9.7452       -1.417      .1680       -33.8022      6.1888    9.235
H2           -6.4334        8.6848       -0.741      .4652       -24.2531     11.3863    2.816
H3           -12.2297       8.1138       -1.507      .1434       -28.8779      4.4184    2.458
H4           -0.5823        4.8499       -0.120      .9053       -10.5335      9.3689    1.793
Mean VIF     11.706
Data Set H
R²            0.806
Adjusted R²   0.789      n 50
R             0.898      k 4
Std. Error    19.256     Dep. Var. Weight

ANOVA table
Source       SS            df    MS          F       p-value
Regression   69,245.0001    4    17,311.25   46.69   1.84E-15
Residual     16,685.0799   45       370.78
Total        85,930.0800   49

variables    coefficients   std. error   t (df=45)   p-value    95% lower   95% upper   VIF
Intercept    -12.0098       118.3477     -0.101      .9196      -250.3743    226.3547
Height       2.8141         1.6495        1.706      .0949        -0.5083      6.1364   2.257
Line         69.0801        10.1884       6.780      2.16E-08     48.5597     89.6006   3.141
LB           23.7299        8.9644        2.647      .0111         5.6748     41.7851   1.734
DB           -5.3320        8.0565       -0.662      .5115       -21.5587     10.8947   1.502
Mean VIF     2.158
Data Set I
R²            0.841
Adjusted R²   0.810      n 50
R             0.917      k 8
Std. Error    3.957      Dep. Var. Fat%

ANOVA table
Source       SS           df    MS         F       p-value
Regression   3,399.1446    8    424.8931   27.14   4.82E-14
Residual       641.8882   41     15.6558
Total        4,041.0328   49

variables    coefficients   std. error   t (df=41)   p-value    95% lower   95% upper   VIF
Intercept    -35.4309       24.9040      -1.423      .1624       -85.7256     14.8639
Age          0.0905         0.0880        1.028      .3099        -0.0872      0.2682    1.712
Weight       -0.1928        0.0783       -2.462      .0181        -0.3510     -0.0346   31.111
Height       -0.0642        0.1160       -0.554      .5827        -0.2984      0.1700    1.689
Neck         -0.3348        0.4023       -0.832      .4100        -1.1472      0.4776    5.472
Chest        0.0239         0.1788        0.133      .8945        -0.3373      0.3850   11.275
Abdomen      0.9132         0.1640        5.570      1.77E-06      0.5821      1.2444   17.714
Hip          -0.3107        0.2749       -1.130      .2649        -0.8658      0.2445   25.899
Thigh        0.7787         0.2907        2.678      .0106         0.1915      1.3658   11.931
Mean VIF     13.350
Data Set J
R²            0.139
Adjusted R²   0.134      n 637
R             0.373      k 4
Std. Error    8573.178   Dep. Var. Price

ANOVA table
Source       SS               df     MS              F       p-value
Regression    7,512,691,866     4    1,878,172,966   25.55   1.20E-19
Residual     46,451,606,047   632       73,499,377
Total        53,964,297,913   636

variables    coefficients   std. error   t (df=632)   p-value    95% lower     95% upper     VIF
Intercept    15,340.7233    1,239.0560    12.381      1.12E-31    12,907.5563   17,773.8903
Age          -693.9768      117.6801      -5.897      6.02E-09      -925.0682     -462.8853   1.017
Car          -533.5731      1,225.8598    -0.435      .6635       -2,940.8263    1,873.6802   3.201
Truck        5,748.1799     1,318.6111     4.359      1.52E-05     3,158.7885    8,337.5713   2.662
SUV          3,897.5375     1,315.4861     2.963      .0032        1,314.2828    6,480.7923   2.749
Mean VIF     2.407
13.14 Answers will vary by data set; see output. The main conclusion is that if the 95 percent confidence interval contains the value 0, the predictor coefficient is not significantly different from zero. Predictors whose confidence intervals do not include zero are those that do have an impact on the dependent variable.
13.15 The hypothesis for each data set is: H0: βi = 0 versus H1: βi ≠ 0.
The predictor variables for which the null hypothesis is rejected are the same ones whose 95 percent confidence intervals did not include zero in 13.14.
A: DF = 38, t-critical = 2.024
B: DF = 24, t-critical = 2.064
C: DF = 26, t-critical = 2.056
D: DF = 36, t-critical = 2.028
E: DF = 41, t-critical = 2.020
F: DF = 50, t-critical = 2.009
G: DF = 27, t-critical = 2.052
H: DF = 45, t-critical = 2.014
I: DF = 41, t-critical = 2.020
J: DF = 632, t-critical = 1.96
13.16 a. For full-model results see the output provided: the significant predictors are those whose p-values are less than 0.05.
b. Yes, the predictors found to have significant coefficients from the t tests are the same ones that are significant using the p-values.
c. Most prefer the p-value approach because it is easier to check for significance.
13.17 A: Very Good
B: Very Poor
C: Excellent
D: Poor
E: Very Good
F: Very Good
G: Excellent
H: Very Good
I: Very Good
J: Poor
13.18 Standard errors are calculated for each full model. Use equation 13.11b to construct the prediction intervals: ŷi ± t(n−k−1)·SE.
A: ŷi ± 2.024*2.505 = ŷi ± 5.07012
B: ŷi ± 2.064*176.291 = ŷi ± 363.864624
C: ŷi ± 2.056*90.189 = ŷi ± 185.428584
D: ŷi ± 2.028*2.672 = ŷi ± 5.418816
E: ŷi ± 2.020*2.128 = ŷi ± 4.29856
F: ŷi ± 2.009*18.097 = ŷi ± 36.356873
G: ŷi ± 2.052*8.571 = ŷi ± 17.587692
H: ŷi ± 2.014*19.256 = ŷi ± 38.781584
I: ŷi ± 2.020*3.957 = ŷi ± 7.99314
J: ŷi ± 1.96*8573.178 = ŷi ± 16,803.42888
13.19 a.
Data Set A Correlation Matrix
Length Width Weight Japan
Length 1.000
Width .720 1.000
Weight .753 .739 1.000
Japan -.160 -.267 -.093 1.000
43 sample size
.301 critical value .05 (two-tail)
.389 critical value .01 (two-tail)
Yes, length, width, and weight are correlated with each other.
Data Set B Correlation Matrix
Price Shelf
Price 1.000
Shelf -.227 1.000
27 sample size
.381 critical value .05 (two-tail)
.487 critical value .01 (two-tail)
No correlation found.
Data Set C Correlation Matrix
            Offices   Entrances   Freeway
Offices     1.000
Entrances   .444      1.000
Age         -.241     .136
Freeway     -.368     -.082       1.000
32 sample size
.349 critical value .05 (two-tail)
.449 critical value .01 (two-tail)
Offices and Entrances are correlated with each other.
Data Set D Correlation Matrix
CapUtil ChgM1 ChgM2 ChgM3
CapUtil 1.000
ChgM1 -.241 1.000
ChgM2 -.265 .266 1.000
ChgM3 -.071 .080 .857 1.000
41 sample size
.308 critical value .05 (two-tail)
.398 critical value .01 (two-tail)
M2 and M3 are highly correlated.
Data Set E Correlation Matrix
          Dropout   EdSpend   Urban   Age     Femlab   Neast   Seast   West    Midwest
Dropout   1.000
EdSpend   -.047     1.000
Urban     .096      .260      1.000
Age       -.067     .340      -.099   1.000
Femlab    -.445     .258      .101    -.226   1.000
Neast     -.009     .667      .080    .316    .169     1.000
Seast     .550      -.394     -.380   .089    -.495    -.298   1.000
West      -.059     -.135     .352    -.428   .138     -.331   -.350   1.000
Midwest   -.466     -.108     -.066   .053    .182     -.315   -.333   -.370   1.000
50 sample size
.279 critical value .05 (two-tail)
.361 critical value .01 (two-tail)
Regional differences correlated with other predictor variables. This is to be expected as regional
differences influence college graduation rate as well as the factors that influence those rates.
Data Set F Correlation Matrix
            TotalHP   NumBlades
TotalHP     1.000
NumBlades   .491      1.000
Turbo       .096      .388
55 sample size
.266 critical value .05 (two-tail)
.345 critical value .01 (two-tail)
Number of blades is correlated with both Turbo and TotalHP.
Data Set G Correlation Matrix
35 sample size
.334 critical value .05 (two-tail)
.430 critical value .01 (two-tail)
BP correlated with MW, RI and H1
RI correlated with H5
H5 correlated with H1
Data Set H Correlation Matrix
         Height   Line    LB      DB      RB
Height   1.000
Line     .683     1.000
LB       .032     -.359   1.000
DB       -.351    -.381   -.266   1.000
RB       -.447    -.403   -.281   -.298   1.000
50 sample size
.279 critical value .05 (two-tail)
.361 critical value .01 (two-tail)
Line, DB, and RB are correlated with Height.
DB and Line are correlated.
RB and Line are correlated.
These correlations make sense. Each position is specialized, so if you are fit
for one, chances are you are not fit for any other.
Data Set I Correlation Matrix
          Age     Weight   Height   Neck    Chest   Abdomen   Hip     Thigh
Age       1.000
Weight    .265    1.000
Height    -.276   .109     1.000
Neck      .176    .882     .201     1.000
Chest     .376    .912     .014     .820    1.000
Abdomen   .442    .915     -.052    .781    .942    1.000
Hip       .314    .959     -.045    .804    .911    .942      1.000
Thigh     .219    .937     -.037    .823    .859    .890      .938    1.000
50 sample size
.279 critical value .05 (two-tail)
.361 critical value .01 (two-tail)
Weight is correlated with the body parts given. Other body parts are correlated with each other. This is
not unexpected.
Data Set J Correlation Matrix
        Age     Car     Truck   SUV     Van
Age     1.000
Car     .003    1.000
Truck   .017    -.478   1.000
SUV     -.092   -.495   -.308   1.000
Van     .106    -.283   -.176   -.182   1.000
637 sample size
.078 critical value .05 (two-tail)
.102 critical value .01 (two-tail)
SUV correlated with Age, Car, and Truck
Van is correlated with all variables
Car is correlated with Truck, SUV and Van
Such a large n reduces the critical values so even small r is significant.
13.20 a. See output.
b. Multicollinearity is a potential problem if the VIF is greater than 10 (rule of thumb): G and I have potential multicollinearity problems based on the VIFs.
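A sketch of how each VIF can be computed from auxiliary regressions, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors (the data below are made up to force collinearity):

```python
import numpy as np

def vifs(X):
    """VIF for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)       # made-up collinear data, for illustration
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=100), rng.normal(size=100)])
print(vifs(X))                       # first two VIFs exceed 10
```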
13.21 A: observation 42 is an outlier, no unusual residuals.
B: observation 11 has an unusual residual; no outliers.
C: no observations have unusual residuals or outliers.
D: observations 19 and 20 have unusual residuals; no outliers.
E: observation 6 has an unusual residual; there are no outliers.
F: observations 23, 39 and 46 have unusual residuals; no outliers.
G: observations 15, 17, and 25 have unusual residuals; no outliers.
H: observations 1, 6, 26, and 48 have unusual residuals; no outliers.
I: no unusual residuals or outliers.
J: observations 246, 397, and 631-632 have unusual residuals; 212, 342, and 502 are outliers.
13.22 A: observations 2, 8, 13, and 21 have high leverage.
B: observation 18 has high leverage.
C: No Leverage effects present.
D: observations 16 and 33 have high leverage.
E: observations 2, 44 and 48 have high leverage.
F: observations 43 and 46 have high leverage.
G: observation 24 has high leverage.
H: observations 20 and 44 have high leverage.
I: observations 5, 15, 36, 39, and 42 have high leverage.
J: observations 1-4, 52, 77, 92-101, 116, 126, 178-181, 184, 270-272, 298, 493, 502, 522, 554, 556-564, 611-
624, 627-628 have high leverage.
13.23 Normality is a problem for J.
13.24 Heteroscedasticity is a concern for J.
13.25 Durbin-Watson for D: 0.74, indicating autocorrelation could be a potential problem.
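For reference, the Durbin-Watson statistic is easy to compute from a residual vector; a sketch with simulated positively autocorrelated residuals (illustrative data only, not Data Set D):

```python
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no autocorrelation,
# well below 2 (like the 0.74 for Data Set D) suggests positive autocorrelation.
import numpy as np

def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)       # simulated AR(1) residuals, illustrative
e = np.zeros(41)
for t in range(1, 41):
    e[t] = 0.7 * e[t - 1] + rng.normal()
print(round(durbin_watson(e), 2))    # well below 2
```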
13.26 a. Each slope measures the additional revenue earned by selling one more unit (one more car, truck, or
SUV, respectively.)
b. The intercept is not meaningful. Ford has to sell at least one car, truck, or SUV to earn revenue. No sales mean no revenue.
c. The predictor variables are highly correlated to each other (multicollinearity problem), as well as related
to missing variables that influence their sales as well as revenue.
13.27 The sample size is too small relative to the number of predictors. Using the following:
Evans's Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
Doane's Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
A researcher would have to either reduce the number of predictors or increase the size of the sample. With 8 predictors, one needs a minimum of 40 observations using Doane's Rule or 80 using Evans's Rule. If increasing the sample size is not feasible, then pairwise t-tests on group means could be performed by recalculating the groupings of the proposed binaries.
13.28 a. One binary must be omitted to prevent perfect multicollinearity.
b. Same reasoning as in (a). The rule is that if you use an intercept, you must use one less binary than the total number of categories; the intercept absorbs the omitted binary.
c. Monday: 11.2 + 1.19 = 12.39
d. Shift 3: 11.2. Shift 1 and Shift 2 have lower AvgOccupancy, given that they have negative coefficients.
e. The intercept represents the AvgOccupancy on Sundays during Shift 3.
f. The fit is poor.
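A tiny sketch of how the fitted 13.28 equation is evaluated. The intercept 11.2 and the Monday coefficient 1.19 come from the problem; the other coefficients are made-up placeholders (the answer only says Shifts 1-2 are negative):

```python
# Omitted-binary logic: Sunday and Shift 3 are absorbed into the intercept.
day = {"Mon": 1.19}                       # only Monday's 1.19 is given
shift = {"Shift1": -0.8, "Shift2": -0.5}  # placeholder values, assumed

def avg_occupancy(d, s, intercept=11.2):
    return intercept + day.get(d, 0.0) + shift.get(s, 0.0)

print(avg_occupancy("Mon", "Shift3"))   # 11.2 + 1.19 = 12.39, as in part (c)
print(avg_occupancy("Sun", "Shift3"))   # 11.2, the intercept itself, as in part (e)
```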
13.29 Main points:
1. The regression as a whole is not significant based on the p-value of .3710
2. R-squared is 0.117 indicating a very poor fit.
3. Examination of the individual regression coefficients indicates that the two binary variables are not
significantly different from zero, p-values >.10.
4. Conclusion: cost per average load does not differ based on whether or not it is a top-load washer or
whether or not powder was used. No apparent cost savings based on washer type or detergent type.
13.30 Main points:
1. The best model in terms of fit as measured by R-squared is the model with NVAR = 3, although it is only a small improvement over the model with NVAR = 2. No gain in fit is achieved by adding LifeExp and Density.
2. Examination of the individual regression coefficients indicates that InfMort and Literate have p-values < .01 and GDPCap has a p-value < .05.
3. Conclusion: Infant mortality and literacy have the greatest impact on birth rates.
13.31 a. Yes, the coefficients make sense, except for TrnOvr. One would think that turnovers would actually
reduce the number of wins, not increase them.
b. No. It is negative, and the number of games won is limited to zero or greater. You can't win games with all of the predictors equal to 0.
c. One needs either 5 or 10 observations per predictor. Here we have 6 predictors, so we need a minimum of 30 observations under Doane's Rule, but we only have 23. Yes, the sample size is a problem.
d. Rebounds and points are highly correlated. We don't need both of them; the multicollinearity inflates the variance of the rebounds coefficient, which increases the denominator of the test statistic, biasing it toward non-rejection of the null hypothesis.
13.32 Main points:
1. The regression as a whole indicates a very strong fit.
2. R-squared is .81. The predictor variables as a group explain 81.1% of the variation in Salary.
3. Examination of the individual regression coefficients indicates that all of the variables are significantly
different from zero, p-values <0.01
4. Conclusion: The ethnicity of a professor does matter. A professor who is African-American earns on average $2,093 less than one who is not. Assistant professors earn on average $6,438 less than other professors. New hires earn less than those who have been there for some time. The finding that those who have been there longer have higher salaries reflects rank and the tenure system. The finding that race still matters after controlling for this supports the presence of racial discrimination.
13.33 a. Both men and women who had prior marathon experience had lower times on average than those who
were running for the first time.
b. No, the intercept does not have any meaning. If all predictor/binary variables were 0, then you wouldn't have an individual racer.
c. Nonlinearity is suspected for age, weight, and height. In this model, increases in age decrease time but at an increasing rate; increases in weight decrease time but at an increasing rate; and increases in height increase time but at a decreasing rate.
d. The model predicted that I would run the marathon in about 12½ hours. And that could be right: I can walk 4 mph, so it would take at least 6 to 7 hours minimum!
Chapter 14
Time Series Analysis
Note: For questions that deal with the best model, the student should consider four criteria:
Occam's Razor: Would a simpler model suffice?
Overall fit: How does the trend fit the past data?
Believability: Does the extrapolated trend look right?
Fit to recent data: Does the fitted trend match the last few data points?
14.1 a. See graphs.
b. Diesel fuel may be cheaper and may offer better MPG.
c. See graphs.
d. Exponential model is simple and looks like a good fit.
e. Linear seems too conservative (about 41,000 by 2006). Quadratic is reasonable (about 61,000 by 2006).
Exponential is aggressive (about 84,000 by 2006).
t Linear Exponential Quadratic
12 33,980 49,723 43,895
13 37,256 64,778 52,130
14 40,533 84,391 61,127
[Fitted trends, 1993-2003: linear y = 3276.5x − 5338.6 (R² = 0.8338); quadratic y = 381.37x² − 1299.9x + 4576.9 (R² = 0.9219); exponential y = 2080.2e^(0.2645x) (R² = 0.9372).]
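The forecast table can be reproduced by plugging t = 12, 13, 14 into the three fitted trends above; a minimal sketch (values agree with the table up to rounding of the printed coefficients):

```python
import math

linear = lambda t: 3276.5 * t - 5338.6
quadratic = lambda t: 381.37 * t**2 - 1299.9 * t + 4576.9
exponential = lambda t: 2080.2 * math.exp(0.2645 * t)

for t in (12, 13, 14):
    print(t, round(linear(t)), round(exponential(t)), round(quadratic(t)))
# t = 14 gives about 40,532 / 84,391 / 61,127
```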
14.2 a. Non-linear (decline, then rise).
b. Rise of internet, increase in high-speed connections.
c. Quadratic is by far the best fit.
d. Quadratic forecast is about 10.0 for 2007.
e. Linear and exponential forecasts are flat, but could be right.
t   Linear   Exponential   Quadratic
8   7.23     7.20          9.71
[Fitted trends, 2000-2006: linear y = 0.0357x + 6.9429 (R² = 0.009); quadratic y = 0.2071x² − 1.6214x + 9.4286 (R² = 0.9219); exponential y = 6.8934e^(0.0055x) (R² = 0.0106).]
14.3 a. Somewhat linear, but maybe slowing.
b. Rise of internet, higher-speed connections, increased PC use, hacker culture.
c. See graphs.
d. Use the criteria from the text to assess. Students may like the quadratic because it has a good fit, but also because its projections look more reasonable than the other two models.
e. Quadratic forecasts about 120. But, in hindsight, exponential (almost 240) may have come closer to foretelling the virus explosion of the first decade of the 2000s.
t   Linear   Exponential   Quadratic
8   135.7    236.92        119.4
[Fitted trends, 1996-2002: linear y = 18.143x − 9.4286 (R² = 0.9244); quadratic y = −1.3571x² + 29x − 25.714 (R² = 0.9399); exponential y = 9.4401e^(0.4028x) (R² = 0.8785).]
14.4 a. Fairly linear increase.
b. Needs of the family and expenses incurred require more income.
c. See graphs.
d. Each has merit. Simplicity favors the linear, but the quadratic captures the slowing.
e. Linear gives 3,770 hours; quadratic only about 3,693 hours.
t    Linear   Exponential   Quadratic
19   3,770    3,782         3,693
[Fitted trends, 1982-1998: linear y = 29.611x + 3206.9 (R² = 0.9232); quadratic y = −1.2051x² + 52.508x + 3130.5 (R² = 0.9559); exponential y = 3211.8e^(0.0086x) (R² = 0.9139).]
14.5 a. Quite linear.
b. Rise of health care concerns, change to healthy diet
c. yt = 581.73 + 25.55t.
d. Increased capacity needed for production and distribution.
e. Using t = 6 we get y6 = 581.73 + 25.55(6) = 735.
[Fitted trends, 1980-2000: linear y = 25.55x + 581.73 (R² = 0.9898); quadratic y = −0.7786x² + 30.221x + 576.28 (R² = 0.9911); exponential y = 584.92e^(0.0389x) (R² = 0.9886).]
14.6 a. See graphs. A cyclical pattern is observed (positive autocorrelation).
b. As m increases (i.e., more smoothing) the trendline does not describe the data as well. m = 2 gives the best fit while m = 5 gives the most smoothing. A larger m is actually helpful if we are trying to reduce the impact of day-to-day fluctuations (if you wanted to track the data exactly, why smooth at all?).
c. Can help anticipate the next day's rate. Trend models aren't much help.
14.7 a. Yields seem to exhibit a cyclical pattern that is not like any standard trend model (looks like positive autocorrelation), so exponential smoothing seems like a good choice for making a one-period forecast.
b-f. MegaStat output is lengthy. Here is a summary table:
                               α = .10   α = .20   α = .30
Mean Squared Error             0.039     0.028     0.021
Mean Absolute Percent Error    3.8%      3.1%      2.7%
Percent Positive Errors        42.3%     46.2%     51.9%
Forecast for Period 53         4.3       4.37      4.43
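A minimal sketch of the smoothing recursion itself (the yield values below are made up for illustration; they are not the exercise's data set):

```python
# Simple exponential smoothing: F_{t+1} = alpha*y_t + (1 - alpha)*F_t.
# Larger alpha tracks recent data more closely; smaller alpha smooths more.
def ses_forecast(y, alpha):
    f = y[0]                          # common initialization: F_1 = y_1
    for obs in y:
        f = alpha * obs + (1 - alpha) * f
    return f                          # one-period-ahead forecast

yields = [4.20, 4.28, 4.25, 4.31, 4.40]   # illustrative values only
for a in (0.10, 0.20, 0.30):
    print(a, round(ses_forecast(yields, a), 2))
```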
14.8 a. Seasonality is present as well as a positive trend.
Calculation of Seasonal Indexes
Year        Q1      Q2      Q3      Q4
1                           0.923   1.170
2           0.858   0.982   0.935   1.141
3           0.875   1.027   1.030   1.208
4           0.788   0.974   1.007   1.160
5           0.847   0.981   1.002   1.160
6           0.866   0.979
mean:       0.847   0.988   0.979   1.168    3.982
adjusted:   0.850   0.993   0.984   1.173    4.000
b. Time and seasonal binaries are significant.
Regression Analysis
R²            0.842
Adjusted R²   0.809      n 24
R             0.918      k 4
Std. Error    528.304    Dep. Var. Revenue

variables   coefficients   std. error   t (df=19)   p-value
Intercept   5,845.3500     308.8056     18.929      8.64E-14
Qtr1        -1,835.2060    308.6711     -5.946      1.01E-05
Qtr2        -1,111.0262    306.6461     -3.623      .0018
Qtr3        -1,145.5131    305.4247     -3.751      .0014
t           111.1536       15.7861      7.041       1.06E-06
c. Forecasts for 2005:
Period Forecasts
2005 Q1 6788.98
2005 Q2 7624.32
2005 Q3 7700.98
2005 Q4 8957.65
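The 2005 forecasts follow directly from the fitted equation in part (b); a short sketch reproducing them:

```python
# Revenue = 5845.35 - 1835.206*Qtr1 - 1111.0262*Qtr2 - 1145.5131*Qtr3 + 111.1536*t
b0, b1, b2, b3, bt = 5845.3500, -1835.2060, -1111.0262, -1145.5131, 111.1536

def forecast(t, qtr):                 # Qtr4 is the omitted binary
    return b0 + b1 * (qtr == 1) + b2 * (qtr == 2) + b3 * (qtr == 3) + bt * t

for t, q in [(25, 1), (26, 2), (27, 3), (28, 4)]:   # 2005 = periods 25-28
    print(t, round(forecast(t, q), 2))   # 6788.98, 7624.32, 7700.98, 8957.65
```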
14.9 a. No trend present, but there is seasonality. It is important to emphasize that 1.00 is the key reference point for a multiplicative seasonal index. The creative student might plot the seasonal indexes. A chart can show that May, Jun, Sep, Oct, and Nov are above average (seasonal index exceeding 1.00). MegaStat's trend fitted to the deseasonalized data (shown below) indicates a slight downward trend, but without a t-statistic we cannot tell if it is significant.
Calculation of Seasonal Indexes
Year       Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1                                                          0.981   0.611   1.087   1.127   1.066   0.925
2          0.907   0.869   1.062   1.111   0.991   1.003   0.890   0.711   0.980   1.288   1.684   1.060
3          0.597   0.869   1.175   0.663   0.794   1.154   1.017   1.108   1.136   1.168   1.128   0.749
4          1.006   0.527   0.636   1.065   2.484   1.000   0.957
mean:      0.836   0.755   0.958   0.946   1.423   1.052   0.961   0.810   1.068   1.194   1.293   0.911
adjusted:  0.822   0.742   0.941   0.930   1.399   1.034   0.945   0.796   1.049   1.174   1.271   0.896
[Chart: plot of the monthly seasonal indexes, Jan-Dec.]
b. Fit of the multiple regression model is not very good (R² = .282, R²adj = .036) and only April (t = 2.296, p = .0278) shows significant seasonality at α = .05. This is reasonable, since spring might cause a spike in Corvette sales. The trend term (Time) is negative, but is not significant.
R²            0.282
Adjusted R²   0.036      n 48
R             0.531      k 12
Std. Error    802.557    Dep. Var. Sales
variables coefficients std. error t (df=35) p-value
Intercept 2,391.5417 477.6169 5.007 1.57E-05
Jan -242.5153 575.3861 -0.421 .6760
Feb 410.7361 574.0241 0.716 .4790
Mar 523.9875 572.7890 0.915 .3666
Apr 1,312.4889 571.6817 2.296 .0278
May 546.4903 570.7028 0.958 .3448
Jun 298.2417 569.8531 0.523 .6040
Jul -62.7569 569.1332 -0.110 .9128
Aug 470.7444 568.5434 0.828 .4133
Sep 599.9958 568.0843 1.056 .2981
Oct 910.2472 567.7561 1.603 .1179
Nov 87.2486 567.5592 0.154 .8787
Time -7.5014 8.6341 -0.869 .3909
c. Forecasts for 2004 are shown. These forecasts involve plugging the correct binary value (0, 1) for each
month into the fitted regression equation. It is simpler than it seems at first glance, since all binaries are
zero except the month being forecast. Nonetheless, it will be a challenge for most students.
Period Forecast Period Forecast
January 1,781.46 July 1,916.21
February 2,427.21 August 2,442.21
March 2,532.96 September 2,563.96
April 3,313.96 October 2,866.71
May 2,540.46 November 2,036.21
June 2,284.71 December 1,941.46
14.10 a. See graphs for each series
b. Answers will vary. All have positive upward trend. Spirit is an aggressive new airline. To some extent,
its growth is more rapid because it is starting from a smaller base.
c. See graphs.
d. For all three variables, the linear model gives a good fit, and by Occam's Razor we might choose it for short-term forecasts. However, in many business situations the exponential model is also viable; in this case it gives somewhat more aggressive forecasts.
e. Forecasts for period t = 6 using the linear model:
Revenue: y6 = 67.1(6) + 83.9 = 486.5
Aircraft: y6 = 3.5(6) + 12.1 = 33.1
Employees: y6 = 362.4(6) + 606.4 = 2781
[Graphs: Revenue, Aircraft, and Employees trends.]
14.11 a. A dual-scale graph is helpful, due to differing magnitudes (or separate graphs).
b. Electronic sales are large, but have a declining trend (yt = 31,578 − 1615.7t, R² = .9449). Mechanical sales are small, but with a rising trend (yt = 2467 + 40.543t, R² = .7456).
c. Electronic sales are falling at 6.27% (yt = 32,091e^(−0.0627t), R² = .9413) while mechanical are rising at 1.54% (yt = 2470.6e^(0.0154t), R² = .7465).
d. Fascination with electronic gadgets may be waning and/or competitors may be moving in on the Swiss watch industry. They may have a stronger specialty niche in mechanical watches.
e. Electronic forecast for 2004 is about 20,268, mechanical about 2,750.
Electronic: y7 = 31,578 − 1615.7(7) = 20,268
Mechanical: y7 = 2466.9 + 40.543(7) = 2,750
[Dual-scale chart: Mechanical (left axis) and Electronic (right axis) watch sales, 1998-2003.]
14.12 a. See graph. Trend is steadily upwards.
b. People have more leisure time and use it watching the tube.
c. See graph. Linear gives a good fit.
d. The linear trend should give good results based on overall fit and fit to recent data.
e. y12 = 18.609(12) + 256.98 = 480.3 minutes
f. Yes, hours in a day are finite (24 hours) and people also must work and sleep.
14.13 a. See graph. Trend is downwards.
b. Possible reasons for voter apathy might be disillusionment with politicians.
c. See graph. Linear gives a reasonable fit. Quadratic is not shown (Occam's Razor).
d. The linear trend should give good results based on overall fit and fit to recent data.
e. y19 = −0.3672(19) + 59.144 = 52.2 percent.
f. The Committee for the Study of the American Electorate reported that more than 122 million people
voted in the 2004 Presidential election, the highest turnout (60.7 percent) since 1968. Source:
http://www.washingtonpost.com. This is much higher than any of the forecasting models would have
predicted. This shows that past trends may not be a guide to the future.
14.14 a. See graph.
b. Steady upward trend since 1996.
c. Fitted trend equations:
Linear: yt = 11.599 + .8125t (R² = 0.6192)
Exponential: yt = 12.551e^(.00424t) (R² = 0.5972)
Quadratic: yt = 18.396 − 1.736t + .1699t² (R² = 0.9658)
d. Quadratic is the best model based on overall fit and fit to recent data.
Quadratic forecast: y15 = 18.396 − 1.736(15) + .1699(15²) = 30.58
14.15 a. See graph. All three are increasing steadily.
b. Linear growth due to convenience of credit and flexibility in when to pay the bills.
c. Linear model seems appropriate (Occam's Razor, good fit to recent data).
d. Forecasts for 2004 (t = 10):
Total: y10 = 994.86 + 112.45(10) = 2119.36
Non-revolving: y10 = 583.58 + 73.75(10) = 1321.1
Revolving: y10 = 411.17 + 38.76(10) = 798.8
[Chart: consumer credit, 1995-2004, with fitted trends Total y = 112.45x + 994.86, Non-revolving y = 73.75x + 583.58, Revolving y = 38.7x + 411.17.]
14.16 a. See graph.
b. Answers will vary. Airplanes are capital goods, affected by business cycle (stronger demand expected
during periods of growth), interest rates (credit availability for financing), and foreign demand (exchange
rates, economic conditions abroad, foreign competition). There appears to be a structural break after the
1979 peak (cyclical prior to 1979).
c. No fitted trend can capture the whole pattern of aircraft sales.
d. See graph. Even this subset of the data has a problematic downturn that all of the standard trend models will miss. This may be related to the 2001 attack on the U.S. World Trade Center.
[Chart: U.S. Manufactured General Aviation Shipments, 1966-2003 (subset 1992-2004 shown, planes per year), with fitted exponential trend y = 721.53e^(0.1159x), R² = 0.808.]
e. Exponential trend is yt = 721.53e^(.1159t) (R² = .805).
f. Exponential forecast for 2004 is y13 = 721.53e^(.1159(13)) = 3,255. This forecast is aggressive, considering the recent past. A judgment forecast might be better, just by eyeballing the most recent data, e.g., y13 = 2,500 (better than 2002-2003, worse than 2000-2001). It is well to remind students that equations do not always give better forecasts than judgment, especially when patterns are erratic.
14.17 Answers depend on which series is chosen. Soft drinks (both regular and diet) are increasing. Whole milk is decreasing, while reduced-fat milk increased and then leveled off. Beer and wine are flat, while liquor decreased over this period. Tastes in beverages are influenced by prices, fads, perceived health benefits, calories, changing age demographics, advertising, and many other factors that students will think of. In the absence of strong trends (all but soft drinks) a judgment forecast will be reasonable (2005 same as 2000). For soft drinks, a linear forecast would probably make sense.
[Chart: U.S. Beverage Consumption, 1980-2000, gallons per capita: whole milk, reduced-fat milk, diet soft drinks, regular soft drinks, fruit juices, beer, wine, liquor.]
14.18 a. See graphs.
b. Federal taxes and spending are political variables that are affected by perceived societal needs, the
balance of political power in Congress, and leadership from the executive branch. Economic growth,
inflation, business cycles, and economic policies underlie changes in the GDP and federal debt.
c. Until 2000, receipts were growing faster than outlays (5.13% versus 3.95% in the exponential model), leading to a rising budget surplus. Since then, the opposite is true. Federal debt was growing more slowly (4.93%) than the GDP (5.91%) over this entire period, though since 2000 it appears that federal debt is increasing more rapidly. It would be useful to look up recent data in the Economic Report of the President (www.gpoaccess.gov/eop/) to see what has been happening since 2004.
d. Answers will vary. The main difficulty is deciding what trend (if any) can be used to fit budget receipts,
since there has been a structural change since 2000.
e. Answers will vary. Relevant groups are Congress, Federal Reserve, President, businesses, households.
14.19 a. Both have slight downward trends.
b. The CD data file goes back to 1970, while the textbook data set starts at 1980. Using the textbook data, we fit trends to 1980-2005 and equate the women's trend to the men's trend (148.66 − 0.2186t = 129.96 − .0145t). Solving for t gives t = 92, or about year 2071 (recall that 2005 is t = 26). Whether this will actually happen depends on human physiology and its limits, as well as issues such as performance-enhancing drugs. If a student uses the entire data set 1970 to 2005, the equated trends (173.62 − 1.0916t = 133.75 − .1531t) converge at t = 42.5, or about 2012 (in the whole data set, 2005 is t = 36). Both graphs are shown below.
c. Moving average gives a good fit.
d. Yes, it is reasonable since the trend is slight.
[Chart: Data from 1980-2005 only, women's and men's fitted trends y = −0.2186x + 148.66 and y = −0.0145x + 129.96.]
[Chart: Data from 1970-2005, women's and men's fitted trends y = −1.0916x + 173.64 and y = −0.1531x + 133.75.]
[Chart: 3-period moving average, men and women, 1980-2004.]
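The convergence year in part (b) is just the intersection of the two fitted lines; a sketch of the algebra:

```python
# Solve a0 + a1*t = b0 + b1*t for t (women's vs. men's 1980-2005 trends).
def intersect(a0, a1, b0, b1):
    return (b0 - a0) / (a1 - b1)

t = intersect(148.66, -0.2186, 129.96, -0.0145)
print(round(t, 1), 1979 + round(t))   # t ≈ 91.6, about year 2071
```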
14.20 a. See graphs.
b. See graphs.
c. Answers will vary depending on the example chosen. Main commercial banks have a downward trend while branches have an upward trend. The rise of online banking, ATM machines, and direct deposit are examples of why branch use has increased. For savings institutions, both series have downward trends.
d. See graphs.
e. Answers will vary.
f. There is little difference between the two models; both yield almost identical results. The linear model is therefore used, since it is easier to work with.
[Chart: Banks-Main and Banks-Branches, 1995-2003, with fitted trends y = −273.47x + 10052 (R² = 0.9741, main) and y = 1407.2x + 55024 (R² = 0.9812, branches).]
[Chart: Savings-Main and Savings-Branches, 1995-2003, with fitted trends y = −181.87x + 13745 (R² = 0.8545) and y = −73.967x + 2044.4 (R² = 0.9639).]
14.21 a. See graph. Trend is rapid growth.
b. Answers will vary, but there has been a dramatic increase in shares, need to have quick access to travel,
unevenness of commercial airplane travel, burden of security procedures for commercial air travel,
convenience of smaller airports (closer to ultimate destination than main hubs), and rise of a highly
compensated executive class with access to company expense accounts.
c. Possibly, if underlying causes remain unchanged, such as continued crowding and deterioration of quality in commercial air travel (e.g., bankrupt airlines, labor disputes). However, the exponential model's fit to recent data is poor.
d. The exponential forecast for 2003 is so aggressive as to be unconvincing (y18 = 3.8256e^(.4512(18)) = 12,879). Perhaps a judgment forecast would be better (e.g., somewhere around 8,000).
[Chart: Fractional Aircraft Ownership, 1986-2002, with fitted exponential trend y = 3.8256e^(0.4512x), R² = 0.9708.]
14.22 a. See graph.
b. See graph.
c. Use the criteria from above and in the text to assess. Answers will vary, but all series exhibit a small, positive upward trend. The linear trend model works best for cars, least for light trucks, based on overall fit and fit to recent data.
d. Purchasers of new vehicles, producers of new vehicles, governments, oil companies and refiners.
14.23 a. See graph. Except for the spike in 2001 (possibly related to the 9-11 attacks) there is little trend.
b. Answers will vary, but the fitted trend, while not strong, is positive.
c. A linear trend does not fit the data. It is tempting to try a higher-order polynomial, but such a model would be useless for forecasting.
d. We would expect it to be around 140 (judgment based on the last 4 years, excluding 2001).
[Chart: U.S. Law Enforcement Officers Killed, 1994-2002, with fitted linear trend y = 3.9167x + 119.64, R² = 0.1136.]
14.24 a. See graph.
b. Answers will vary, but the trend, while not strong, has been negative and declining in the numbers killed.
Awareness of how to avoid lightning (despite popularity of outdoor sports like golf, climbing, boating).
c. The exponential model is a good overall fit and is a simple way to describe the observed trend in a
believable way.
d. See graph. The 2005 forecast is y14 = 329.01e^(−.1415(14)) = 45.4 deaths.
[Chart: U.S. lightning deaths, 1940-2000, with fitted exponential trend y = 329.01e^(−0.1415x), R² = 0.9253.]
14.25 a. See graph.
b. Answers will vary. A linear trend does not fit the data well. The quadratic does a good job of capturing
the pattern, and its forecasts look reasonable (similar to what someone might get using a judgment
forecast based on the most recent data).
c. See graph. The quadratic forecast is y11 = 62.72(11)² − 771.68(11) + 11384 = 10,485.
[Chart: 1993-2002 data with fitted quadratic trend y = 62.72x² − 771.68x + 11384, R² = 0.8741.]
14.26 a. See graph.
b. See graph.
c. Yes, but will be beyond 2040.
d. Trend equations are given in the graph.
e. The issue is of interest to athletes, trainers, ergonomists, and sports physiologists. Set 11.967 − 0.0804x = 10.541 − 0.0402x and solve for x = 37.14 (round up to 38). Year 38 is 2084. It would be conceivable if trends stay the same. However, if men get faster as women get faster, and if the trends are nonlinear, it will take even longer for convergence to happen.
14.27 Answers will vary based on series selected. It is clear that the oil consumption in the US is increasing.
This is a good opportunity for students to access the internet (e.g., the Energy Information Agency
www.eia.doe.gov) to get more information to augment their preconceived ideas and facts.
14.28 Answers will vary based on the series selected. The trend for these series has been positive and increasing, but at a decreasing rate. Results indicate rising costs of incarceration, since all series show a projected increase. The legal, demographic, and sociological roots of these trends are worth exploring. This is a good opportunity for students to access the internet (e.g., www.ojp.usdoj.gov/bjs) to get more information to augment their preconceived ideas and facts.
[Chart: U.S. correctional populations, 1986-2010: Total, Probation, Jail, Prison, Parole.]
14.29 Answers will vary based on the series selected. There are different trends for males vs. females on the verbal as well as the mathematical portion of the exam. There are implications for learning styles by gender, college admissions, and perhaps even career choices. This is a good opportunity for students to access the Web (e.g., www.collegeboard.com) to get more information to augment their preconceived ideas and facts.
14.30 a. See graph.
b. Answers will vary. Students will see that m = 2 (shown below) gives the best fit. However, m = 2 offers little smoothing. Being heavily influenced by the most recent data values could be a disadvantage in making a future projection.
c. Yes, based on overall fit, fit to recent data, and the inapplicability of any of the usual trend models.
14.31 a. See graphs.
b. See graphs. The degree of smoothing varies dramatically as α is increased.
c. For this data α = .20 seems to track the recent data well, yet provides enough smoothing to iron out the blips. It gives enough weight to recent data to bring its forecasts above 1.80 (lagging but reflecting the recent rise in rates). In contrast, α = .05 or α = .10 are not responsive to recent data (too much smoothing), so they give a forecast below 1.80. While α = .50 gives a good fit, it does not smooth the data very much. Forecasters generally use smaller values (the default of α = .20 is a common choice).
d. Yes, based on graphs and fitted trendline.
[Charts: exponential smoothing with α = .05, .10, .20, .50.]
14.32 a. See graph.
b. Yes. Seasonality is quite apparent. There are highs and lows, depending on the month. There is also an
upward trend. The seasonal swings seem to increase in amplitude.
c. See output below. It is important to note that 1.00 is the key reference point for a multiplicative seasonal
index. The creative student might plot the seasonal indexes to show that Nov through Mar (the winter
months) are above 1.00, Apr is near 1.00, and the summer and fall months are below 1.00.
d. December is the highest, August is the lowest. This is logical, based on the season's impact on heating degree-days (presumably the residence is in a northern climate).
e. There is some upward trend. Logical, since generally, fossil fuels are getting more expensive.
f. We used December as our norm. All binaries are significant and negative (although Jan and Feb are significant only at a rather large α), meaning that December is the highest month (all other months reduce the cost below December). The next two highest months are February and January.
R²            0.803
Adjusted R²   0.743      n 48
R             0.896      k 11
Std. Error    22.411     Dep. Var. Cost

ANOVA table
Source       SS            df    MS           F       p-value
Regression   73,856.4535   11    6,714.2230   13.37   1.23E-09
Residual     18,081.2191   36      502.2561
variables coefficients std. error t (df=36) p-value
Intercept 133.1025 11.2055 11.878 5.16E-14
Jan -19.4400 15.8470 -1.227 .2279
Feb -16.4225 15.8470 -1.036 .3070
Mar -44.1600 15.8470 -2.787 .0084
Apr -62.6500 15.8470 -3.953 .0003
May -95.5400 15.8470 -6.029 6.36E-07
Jun -113.1775 15.8470 -7.142 2.14E-08
Jul -96.8575 15.8470 -6.112 4.92E-07
Aug -114.0950 15.8470 -7.200 1.80E-08
Sep -108.0000 15.8470 -6.815 5.74E-08
Oct -77.4050 15.8470 -4.885 2.14E-05
Nov -38.8175 15.8470 -2.450 .0193
14.33 a. See graph. There is no apparent trend.
b. Yes, one sees the demand for permits start to increase before the prime building months of the summer
season. Seasonality is apparent in the recurring cycles.
c. See output and graph. Note that 1.00 is the key reference point for a multiplicative seasonal index. The
creative student might plot the seasonal indexes to show that April through October seasonal indexes
exceed 1.00, while the other months are below 1.00.
d. April, May, June have the most; while January, December, February have the least. Yes, this is logical,
based on weather patterns.
e. No, there is no trend (see graph).
14.34 a. See graph. There is a pattern, but it is cyclical and obviously not easily modeled.
b. Yes, but not easily detected. Shipments are slower in the first quarter and stronger in the fourth quarter.
c. See output and graph. Yes, there is an upward trend, but with a heavy cyclical pattern.
14.35 a. See graph.
b. See output and graph. Yes, there is an upward trend in the deseasonalized data. This is logical, given consumers' dependence on credit. There is a hint of a level jump in 2004.
c. Highest: November, December, January (holiday buying and bill-paying). Lowest: October, September, July. Yes, this is logical. Credit increases due to the Christmas buying season, drops off during the month of July (vacation), and also in September and October (kids back to school, waiting for the Christmas spending season). Most of the months that are below 1.00 are only slightly below.
14.36 a. See graph.
b. See output. Yes, there is a trend (a little hard to see, because the scale is magnified by the seasonal spike every December).
c. December is the highest (greatly different from the other months) and January is the lowest. Based on the Christmas retail season, this makes sense. The other months are not too far from 1.00 (the all-important reference point that indicates an average month in a multiplicative model). There is a trend in the deseasonalized data.
d. See output. The seasonal binaries are constructed with the highest month, December, as the base. Each seasonal binary measures how far that month's sales fall below December's. All seasonal binaries are negative, indicating that December is indeed the highest month. All coefficients, including Time, are significant in the regression model. The model can be used to estimate monthly jewelry sales in the U.S.
R²            0.967
Adjusted R²   0.960      n 72
R             0.983      k 12
Std. Error    149.970    Dep. Var. Sales
variables coefficients std. error t (df=59) p-value
Intercept 3,455.0500 71.1368 48.569 2.81E-49
Jan -2,798.6837 87.1031 -32.131 4.43E-39
Feb -2,575.2579 87.0134 -29.596 4.27E-37
Mar -2,665.1655 86.9322 -30.658 6.05E-38
Apr -2,604.2397 86.8595 -29.982 2.08E-37
May -2,343.4806 86.7952 -27.000 6.66E-35
Jun -2,470.0548 86.7395 -28.477 3.59E-36
Jul -2,534.4623 86.6923 -29.235 8.42E-37
Aug -2,479.3698 86.6537 -28.612 2.76E-36
Sep -2,570.6107 86.6237 -29.676 3.68E-37
Oct -2,503.1849 86.6022 -28.904 1.58E-36
Nov -2,165.5925 86.5893 -25.010 4.27E-33
Time 6.2409 0.8624 7.237 1.08E-09
14.37 a. See graph.
b. See output. No strong trend is found. There is only slight seasonality, with December demand for cash
apparently being about 2.3% higher than the other months (seasonal index 1.023) and the other months
having seasonal indexes very near 1.00. Only February is noticeably below 1.00 (seasonal index is
0.989), perhaps because it is a shortened month.
c. See output. December was omitted (to serve as a reference point). All the other months have negative
coefficients, indicating that December was, on average, the highest month in terms of M1. February,
May, and October are the only months with somewhat significant seasonality in the regression.
R² 0.121
Adjusted R² 0.000 n 84
R 0.348 k 12
Std. Error 31.417 Dep. Var. M1
variables coefficients std. error t (df=71) p-value
Intercept 1,132.0405 13.7115 82.561 2.79E-72
Jan -21.5584 16.8664 -1.278 .2053
Feb -42.1388 16.8537 -2.500 .0147
Mar -31.9335 16.8422 -1.896 .0620
Apr -17.6567 16.8319 -1.049 .2977
May -38.0086 16.8228 -2.259 .0269
Jun -31.4176 16.8150 -1.868 .0658
Jul -29.2837 16.8083 -1.742 .0858
Aug -33.2498 16.8028 -1.979 .0517
Sep -29.5302 16.7986 -1.758 .0831
Oct -35.4392 16.7955 -2.110 .0384
Nov -24.2767 16.7937 -1.446 .1527
Time 0.0375 0.1428 0.263 .7934
Using the fitted regression equation, here are forecasts for 2002:
Period Forecast
85 1,089.90
86 1,100.11
87 1,114.38
88 1,094.03
89 1,100.62
90 1,102.76
91 1,098.79
92 1,102.51
93 1,096.60
94 1,107.76
95 1,132.08
96 1,132.04
14.38 a. See graph. The trend is positive and increasing. The MegaStat seasonal indexes show that the 2nd
quarter is the highest (1.085, or 8.5% above normal) and the 1st quarter is the lowest (.877, or about 12%
below normal). This is consistent with temperature and the seasons in the U.S. as they might affect sales
of Coca Cola products.
b. All seasonal binaries are significant. Each measures departure from quarter 4 (i.e., Qtr4 was the omitted
seasonal binary). Results confirm quarter 1 is below quarter 4 and quarters 2 and 3 are above it. The
time coefficient is also significant confirming the presence of a trend.
c. See output.
R² 0.866
Adjusted R² 0.837 n 24
R 0.930 k 4
Std. Error 197.012 Dep. Var. Revenue
variables coefficients std. error t (df=19) p-value
Intercept 4,713.1250 115.1581 40.927 5.40E-20
Qtr1 -443.7292 115.1079 -3.855 .0011
Qtr2 596.6250 114.3528 5.217 4.90E-05
Qtr3 487.8125 113.8973 4.283 .0004
time 20.3125 5.8869 3.450 .0027
Here are the forecasts for 2005 from the fitted regression equation:
Period Forecast
25 4772
26 5833
27 5738
28 5277
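These forecasts come straight from the fitted equation: for each period, add the intercept, the coefficient of
that quarter's binary (zero for the omitted Qtr4), and the time coefficient times the period number. A minimal
Python sketch (coefficients copied from the output above; small differences from the printed forecasts are
expected because the printed coefficients are rounded):

# Forecasts for 2005 from the Exercise 14.38 seasonal-binary regression
coef = {"Intercept": 4713.1250, "Qtr1": -443.7292,
        "Qtr2": 596.6250, "Qtr3": 487.8125, "time": 20.3125}

for t, qtr in zip(range(25, 29), ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]):
    # Qtr4 was the omitted (base) binary, so it contributes nothing
    y = coef["Intercept"] + coef.get(qtr, 0.0) + coef["time"] * t
    print(t, round(y))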
14.39 a. See output; the regression is significant. All binaries and time are significant, except for January and
February. Each seasonal binary measures the distance from December (i.e., December is the omitted
binary).
R² 0.871
Adjusted R² 0.845 n 72
R 0.933 k 12
Std. Error 488.245 Dep. Var. Issued
variables coefficients std. error t (df=59) p-value
Intercept 5,485.9750 231.5951 23.688 7.91E-32
Jan 575.5780 283.5755 2.030 .0469
Feb 580.1012 283.2834 2.048 .0450
Mar 1,343.2911 283.0189 4.746 1.36E-05
Apr 1,344.6476 282.7821 4.755 1.32E-05
May 1,586.8375 282.5729 5.616 5.56E-07
Jun 2,525.5274 282.3915 8.943 1.42E-12
Jul 2,665.3839 282.2380 9.444 2.10E-13
Aug 2,933.4071 282.1123 10.398 5.84E-15
Sep 2,386.9304 282.0144 8.464 9.03E-12
Oct 1,910.4536 281.9445 6.776 6.47E-09
Nov 1,215.1435 281.9026 4.311 .0001
Time -36.5232 2.8077 -13.008 5.57E-19
b. Applying the fitted regression equation, here are the forecasts for 1997. No web site could be found to
check these forecasts.
Period Forecast
73 3,395.36
74 3,363.36
75 4,090.03
76 4,054.86
77 4,260.53
78 5,162.69
79 5,266.03
80 5,497.53
81 4,914.53
82 4,401.53
83 3,669.69
84 2,418.03
14.40* Translate each equation into the form yt = y0(1 + r)^t. For each equation, we have y0 = a and r = e^b - 1. We
then verify that both forms yield the same result for t = 3 (an arbitrary choice). Students may not think of
doing this kind of confirmation, but it makes the equivalency clear.
a. yt = 456(1 + .1309)^t
b. yt = 228(1 + .0779)^t
c. yt = 456(1 - 0.0373)^t
y0 r y0(1 + r)^t at t = 3 yt = a·e^(bt) at t = 3
a. 456 0.1309 659.5071 659.5071
b. 228 0.0779 285.5296 285.5296
c. 456 -0.0373 406.8696 406.8696
14.41* Translate each of the following fitted compound interest trend models into an exponential model of the
form yt = a·e^(bt). For each equation, we set a = y0 and b = ln(1 + r). We then verify that both forms yield
the same result for t = 3. Students may not think of doing this kind of confirmation, but it makes the
equivalency clear.
a. yt = 123 e^(.0853t)
b. yt = 654 e^(.1964t)
c. yt = 308 e^(-0.0598t)
a b y0(1 + r)^t at t = 3 yt = a·e^(bt) at t = 3
a. 123 0.0853 158.8506 158.8506
b. 654 0.1964 1178.825 1178.825
c. 308 -0.0598 257.4562 257.4562
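The equivalence check in 14.40 and 14.41 is easy to script; a minimal Python sketch for the 14.40 cases (any
value of t would serve as well as t = 3):

import math

# Compound form y0*(1+r)^t vs. exponential form y0*e^(b*t) with b = ln(1+r)
t = 3
for y0, r in [(456, 0.1309), (228, 0.0779), (456, -0.0373)]:
    b = math.log(1 + r)                              # implied exponential slope
    print(y0 * (1 + r) ** t, y0 * math.exp(b * t))   # the two values agree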
Chapter 15
Chi-Square Tests
15.1 a. H0: Earnings and Approach are independent.
b. Degrees of Freedom = (r-1)(c-1) = (4-1)(3-1) = 6
c. CHIINV(.01,6) = 16.81 and test statistic = 127.57.
d. Test statistic is 127.57 (p < .0001) so reject the null at α = .01.
e. The Business Combinations and No Effect cell contributes the most.
f. All expected frequencies exceed 5.
g. p-value is near zero (observed difference not due to chance).
Increase Decrease No Effect Total
Expenses and Losses Observed 133 113 23 269
Expected 142.07 83.05 43.88 269.00
(O - E)² / E 0.58 10.80 9.93 21.31
Revenue and Gains Observed 86 20 8 114
Expected 60.21 35.20 18.59 114.00
(O - E)² / E 11.05 6.56 6.04 23.64
Business Combinations Observed 12 22 33 67
Expected 35.39 20.69 10.93 67.00
(O - E)² / E 15.46 0.08 44.58 60.12
Other Approaches Observed 41 4 20 65
Expected 34.33 20.07 10.60 65.00
(O - E)² / E 1.30 12.87 8.33 22.49
Total Observed 272 159 84 515
Expected 272.00 159.00 84.00 515.00
(O - E)² / E 28.38 30.31 68.88 127.57
127.57 chi-square
6 df
p-value
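Although the solutions use MegaStat, the same contingency-table test can be checked in Python with SciPy
(an illustrative sketch, not the method assumed by the text):

import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies for Exercise 15.1 (rows = approach, columns = effect)
obs = np.array([[133, 113, 23],
                [ 86,  20,  8],
                [ 12,  22, 33],
                [ 41,   4, 20]])
chi2, p, dof, expected = chi2_contingency(obs)
print(chi2, dof, p)   # about 127.57 with 6 d.f.; p is near zero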
15.2 a. H0: Age Group and Ownership are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(4-1) = 3
c. CHIINV(.01,3) = 11.34 and test statistic = 19.31.
d. Since the p-value (.0002) is less than .01, we reject the null and find dependence.
e. Adults and Europe and Adults and Latin America contribute the most.
f. All expected frequencies exceed 5.
g. The p-value from MegaStat shows that observed difference would arise by chance only 2 times in 10,000
samples if the two variables really were independent.
U.S. Europe Asia Latin America Total
Observed 80 89 69 65 303
Expected 75.75 75.75 75.75 75.75 303.00
(O - E)² / E 0.24 2.32 0.60 1.53 4.68
Observed 20 11 31 35 97
Expected 24.25 24.25 24.25 24.25 97.00
(O - E)² / E 0.74 7.24 1.88 4.77 14.63
Observed 100 100 100 100 400
Expected 100.00 100.00 100.00 100.00 400.00
(O - E)² / E 0.98 9.56 2.48 6.29 19.31
19.31 chi-square
3 df
.0002 p-value
15.3 a. H0: Verbal and Quantitative are independent
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. CHIINV(.005,4) = 14.86 and test statistic = 55.88.
d. Test statistic is 55.88 (p < .0001) so we reject the null at α = .005.
e. The upper left cell (Under 25 and Under 25) contributes the most.
f. Expected frequency is less than 5 in two cells.
g. p-value is nearly zero (observed difference not due to chance).
Under 25 25 to 35 35 or More Total
Under 25 Observed 25 9 1 35
Expected 10.50 14.00 10.50 35.00
(O - E)² / E 20.02 1.79 8.60 30.40
25 to 35 Observed 4 28 18 50
Expected 15.00 20.00 15.00 50.00
(O - E)² / E 8.07 3.20 0.60 11.87
35 or More Observed 1 3 11 15
Expected 4.50 6.00 4.50 15.00
(O - E)² / E 2.72 1.50 9.39 13.61
Total Observed 30 40 30 100
Expected 30.00 40.00 30.00 100.00
(O - E)² / E 30.81 6.49 18.58 55.88
55.88 chi-square
4 df
.0000 p-value
15.4 a. H0: Privilege Level and Disciplinary Action are independent.
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(2-1) = 2
c. CHIINV(.01,2) = 9.210 and test statistic = 13.77.
d. Since the p-value (.0010) is less than .01, we reject the null and find dependence.
e. Not Disciplined and Low contributes the most.
f. All expected frequencies exceed 5 except Low and Not Disciplined.
g. The p-value from MegaStat shows that observed difference would arise by chance only 1 time in 1000
samples if the two variables really were independent.
Disciplined Not Disciplined Total
Low Observed 20 11 31
Expected 26.29 4.71 31.00
(O - E)² / E 1.51 8.42 9.93
Medium Observed 42 3 45
Expected 38.17 6.83 45.00
(O - E)² / E 0.38 2.15 2.53
High Observed 33 3 36
Expected 30.54 5.46 36.00
(O - E)² / E 0.20 1.11 1.31
Total Observed 95 17 112
Expected 95.00 17.00 112.00
(O - E)² / E 2.09 11.68 13.77
13.77 chi-square
2 df
.0010 p-value
15.5 a. H0: Return Rate and Notification are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
c. CHIINV(.025,1) = 5.024 and test statistic = 5.42 (close decision).
d. Since the p-value (.0199) is less than .025, we reject the null and find dependence.
e. Returned and No, Returned and Yes contribute the most.
f. All expected frequencies exceed 5.
g. The p-value from MegaStat shows that observed difference would arise by chance only 20 times in 1000
samples if the two variables really were independent.
Returned Not Returned Total
Observed 39 155 194
Expected 30.66 163.34 194.00
(O - E)² / E 2.27 0.43 2.70
Observed 22 170 192
Expected 30.34 161.66 192.00
(O - E)² / E 2.29 0.43 2.72
Observed 61 325 386
Expected 61.00 325.00 386.00
(O - E)² / E 4.56 0.86 5.42
5.42 chi-square
1 df
.0199 p-value
Hypothesis test for two independent proportions
p1 p2 pc
0.6393 0.4769 0.5026 p (as decimal)
39/61 155/325 194/386 p (as fraction)
39 155 194 X
61 325 386 n
0.1624 difference
0. hypothesized difference
0.0698 std. error
2.33 z
.0199 p-value (two-tailed)
5.419795 z-squared
0.0302 confidence interval 95.% lower
0.2946 confidence interval 95.% upper
0.1322 half-width
h. In a two-tailed two-sample z test for π1 = π2 we verify that z² is the same as the chi-square statistic, as
presented in the table above. The p-value is the same (.0199) and z² = 5.42 (after rounding) is equal to
the chi-square value from the previous table.
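The z² = chi-square identity for a 2×2 table is easy to verify numerically; a Python sketch using the counts
above (pooled two-proportion z test versus the uncorrected chi-square):

import math
import numpy as np
from scipy.stats import chi2_contingency

x1, n1, x2, n2 = 39, 61, 155, 325            # returned counts and group sizes
p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)                   # pooled proportion 194/386
se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                           # about 2.33
chi2 = chi2_contingency(np.array([[39, 155], [22, 170]]), correction=False)[0]
print(z ** 2, chi2)                          # both about 5.42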
15.6 Most purchasers are 18-24 years of age. The fewest were bought by those 65 and over. At α = .01, this
sample does contradict the assumption that readership is uniformly distributed among these six age groups,
since the p-value is less than .01.
Goodness of Fit Test
observed expected O - E (O - E)² / E % of chisq
38 20.000 18.000 16.200 51.76
28 20.000 8.000 3.200 10.22
19 20.000 -1.000 0.050 0.16
16 20.000 -4.000 0.800 2.56
10 20.000 -10.000 5.000 15.97
9 20.000 -11.000 6.050 19.33
120 120.000 0.000 31.300 100.00
31.30 chi-square
5 df
8.17E-06 p-value
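A uniform goodness-of-fit test is a one-liner in SciPy (illustrative sketch; when no expected frequencies are
supplied, equal expected frequencies are assumed):

from scipy.stats import chisquare

# Exercise 15.6: six age groups, n = 120, expected 20 per group
chi2, p = chisquare([38, 28, 19, 16, 10, 9])
print(chi2, p)   # about 31.30 with 5 d.f., p = 8.17E-06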
15.7 Vanilla and Mocha are the leading flavors. Coffee is the least favorite as measured by sales. However, at
α = .05, this sample does not contradict the assumption that sales are the same for each beverage, since the
p-value (.8358) is greater than .05.
Goodness of Fit Test
observed expected O - E (O - E)² / E % of chisq
18 21.000 -3.000 0.429 50.00
23 21.000 2.000 0.190 22.22
23 21.000 2.000 0.190 22.22
20 21.000 -1.000 0.048 5.56
84 84.000 0.000 0.857 100.00
.86 chi-square
3 df
.8358 p-value
15.8 The graph reveals that 0 and 8 occur the most frequently, while 1 has the smallest occurrence rate. At
α = .05, we cannot reject the hypothesis that the digits are from a uniform population, since the p-value
(.5643) is greater than .05.
Goodness of Fit Test
observed expected O - E (O - E)² / E % of chisq
33 27.000 6.000 1.333 17.31
17 27.000 -10.000 3.704 48.08
25 27.000 -2.000 0.148 1.92
30 27.000 3.000 0.333 4.33
31 27.000 4.000 0.593 7.69
28 27.000 1.000 0.037 0.48
24 27.000 -3.000 0.333 4.33
25 27.000 -2.000 0.148 1.92
32 27.000 5.000 0.926 12.02
25 27.000 -2.000 0.148 1.92
270 270.000 0.000 7.704 100.00
7.70 chi-square
9 df
.5643 p-value
15.9 At α = .05, we cannot reject the hypothesis that the movie goers are from a uniform population, since the
p-value (.1247) is greater than .05.
Goodness of Fit Test
Age observed expected O - E (O - E)² / E % of chisq
10 < 20 5 8.000 -3.000 1.125 11.25
20 < 30 6 8.000 -2.000 0.500 5.00
30 < 40 10 8.000 2.000 0.500 5.00
40 < 50 3 8.000 -5.000 3.125 31.25
50 < 60 14 8.000 6.000 4.500 45.00
60 < 70 9 8.000 1.000 0.125 1.25
70 < 80 9 8.000 1.000 0.125 1.25
56 56.000 0.000 10.000 100.00
10.00 chi-square
6 df
.1247 p-value
15.10 a. The sample mean and standard deviation are close to those used to generate the values.
Sample Generated
5.20 5.00
2.4075 2.2361
b. and c. See table below. Since the p-value (.7853) is greater than 0.05, we don't reject the null hypothesis
that the observations come from a Poisson distribution. Note that the end categories were
collapsed so that their expected frequencies would be at least 5. A common error that students make is to
fail to check that their probabilities sum to 1 and that the expected frequencies sum to n. If these sum to
less than expected, it is an indication that they forgot the Poisson probabilities beyond the highest
observed value (X = 12, 13, 14, ... etc.).
X P(X) observed expected O - E (O - E)² / E
2 or less 0.12465 6 6.2326 -0.2326 0.0087
3 0.14037 7 7.0187 -0.0187 0.0000
4 0.17547 7 8.7734 -1.7734 0.3585
5 0.17547 8 8.7734 -0.7734 0.0682
6 0.14622 10 7.3111 2.6889 0.9889
7 0.10444 3 5.2222 -2.2222 0.9456
8 or more 0.13337 9 6.6686 2.3314 0.8151
1.00000 50 50.0000 0.0000 3.1850
3.185 chi-square
6 df
.7853 p-value
d. If we use λ = 5.2 instead of λ = 5, the test statistic changes and we lose one degree of freedom (because
λ is being estimated from the sample). However, in this case, the p-value (.7610) is about the same, so we
still fail to reject the hypothesis of a Poisson distribution.
X P(X) observed expected O - E (O - E)² / E
2 or less 0.10879 6 5.4393 0.5607 0.0578
3 0.12928 7 6.4639 0.5361 0.0445
4 0.16806 7 8.4031 -1.4031 0.2343
5 0.17479 8 8.7393 -0.7393 0.0625
6 0.15148 10 7.5740 2.4260 0.7771
7 0.11253 3 5.6264 -2.6264 1.2260
8 or more 0.15508 9 7.7539 1.2461 0.2002
1.00000 50 50.0000 0.0000 2.602
2.602 chi-square
5 df
.7610 p-value
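The collapsing of end categories and the lost degree of freedom can both be handled programmatically; a
Python sketch of part (d), assuming the tabulated observed counts (ddof=1 because λ was estimated from the
sample):

from scipy.stats import poisson, chisquare

lam, n = 5.2, 50
# Category probabilities: P(X<=2), P(X=3), ..., P(X=7), open-ended P(X>=8)
probs = ([poisson.cdf(2, lam)]
         + [poisson.pmf(k, lam) for k in range(3, 8)]
         + [1 - poisson.cdf(7, lam)])   # open tail keeps the probabilities summing to 1
f_obs = [6, 7, 7, 8, 10, 3, 9]
f_exp = [n * pr for pr in probs]
print(chisquare(f_obs, f_exp, ddof=1))  # about (2.602, .7610) with 7-1-1 = 5 d.f.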
15.11 Using the sample mean λ = 4.948717949, the test statistic is 3.483 (p-value = .4805) with d.f. = 6-1-1 = 4.
The critical value for α = .05 is 9.488 so we cannot reject the hypothesis of a Poisson distribution. Note
that the end categories were collapsed so that their expected frequencies would be at least 5. A common
error that students make is to fail to check that their probabilities sum to 1 and that the expected
frequencies sum to n. If these sum to less than expected, it is an indication that they forgot the Poisson
probabilities beyond the highest observed value (X = 11, 12, 13, ... etc.).
X P(X) Obs Exp O - E (O - E)² / E
2 or less 0.12904 3 5.032 -2.032 0.821
3 0.14326 5 5.587 -0.587 0.062
4 0.17724 9 6.912 2.088 0.631
5 0.17542 10 6.841 3.159 1.458
6 0.14468 5 5.643 -0.643 0.073
7 or more 0.23036 7 8.984 -1.984 0.438
1.00000 39 39.000 0.000 3.483
15.12 At α = .05, you cannot reject the hypothesis that truck arrivals per day follow a Poisson process, since
the p-value (.2064) is greater than .05. For this test, we use the estimated sample mean λ = 2.6. Note
that the top categories were collapsed so that their expected frequencies would be at least 5. A common
error that students make is to fail to check that their probabilities sum to 1 and that the expected
frequencies sum to n. If these sum to less than expected, it is an indication that they forgot the Poisson
probabilities beyond the highest observed value (X = 8, 9, 10, ... etc.).
X Days P(X) Exp Obs-Exp Chi-Square
0 4 0.07427 7.4274 -3.4274 1.5816
1 23 0.19311 19.3111 3.6889 0.7047
2 28 0.25104 25.1045 2.8955 0.3340
3 22 0.21757 21.7572 0.2428 0.0027
4 8 0.14142 14.1422 -6.1422 2.6677
5 or more 15 0.12258 12.2577 2.7423 0.6135
Total 100 1.00000 100.0000 0.0000 5.9041
df 4
p-value 0.2064
15.13 From the sample, x-bar = 75.375 and s = 8.943376. Set ej = 40/8 = 5. Students might form categories
somewhat differently, so results may vary slightly depending on rounding. Using Visual Statistics with 8
classes with class limits chosen to ensure equal expected frequencies (the optimal expected frequencies option),
the test statistic is 6.000 (p-value = .3062) using d.f. = 8-2-1 = 5. The critical value for α = .05 is 11.07
so we cannot reject the hypothesis of a normal distribution. Visual Statistics is helpful because you can
adjust for expected frequencies less than 5 easily and quickly.
Score Obs Exp Obs - Exp Chi-Square
Under 65.09 5 5.000 0.000 0.000
65.09 < 69.34 3 5.000 -2.000 0.800
69.34 < 72.53 5 5.000 0.000 0.000
72.53 < 75.38 3 5.000 -2.000 0.800
75.38 < 78.22 9 5.000 4.000 3.200
78.22 < 81.41 7 5.000 2.000 0.800
81.41 < 85.66 4 5.000 -1.000 0.200
85.66 or more 4 5.000 -1.000 0.200
Total 40 40.000 0.000 6.000
224
15.14 For this test, we use the estimated sample mean 31.1512 and standard deviation 9.890436. Set ej = 42/8
= 5.25. Students might form categories somewhat differently, so results may vary slightly depending on
rounding. Results shown below are from Visual Statistics using the option for equal (optimal) expected
frequencies. At α = .025, you cannot reject the hypothesis that carry-out orders follow a normal
population, since the p-value (.7074) is greater than .025.
Cost of Order Obs Exp Obs-Exp Chi-Square
Under 19.77 6 5.2500 0.7500 0.1070
19.77 < 24.48 6 5.2500 0.7500 0.1070
24.48 < 28.00 6 5.2500 0.7500 0.1070
28.00 < 31.15 3 5.2500 -2.2500 0.9640
31.15 < 34.30 5 5.2500 -0.2500 0.0120
34.30 < 37.82 6 5.2500 0.7500 0.1070
37.82 < 42.53 3 5.2500 -2.2500 0.9640
42.53 or more 7 5.2500 1.7500 0.5830
Total 42 42.0000 0.0000 2.9520
d.f. 5
p-value .7074
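The class limits that equalize expected frequencies are just inverse-normal quantiles of the fitted distribution;
a Python sketch using the Exercise 15.14 estimates (NumPy and SciPy assumed available):

import numpy as np
from scipy.stats import norm

mean, sd, k = 31.1512, 9.890436, 8
# Cut points at cumulative probabilities 1/8, 2/8, ..., 7/8 give ej = 42/8 each
cuts = norm.ppf(np.arange(1, k) / k, loc=mean, scale=sd)
print(np.round(cuts, 2))   # about [19.77 24.48 28.00 31.15 34.30 37.82 42.53]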
15.15* The probability plot looks rather linear, yet the p-value (.033) for the Anderson-Darling test is less than
α = .05. This tends to contradict the chi-square test used in Exercise 15.13. However, the Kolmogorov-
Smirnov test (DMax = .158) does not reject normality (p > .20). The data are a borderline case, having
some characteristics of a normal distribution. If we have to choose one test, the A-D is the most
powerful.
[Figure: Probability Plot of ExamScore (Normal): Mean 75.38, StDev 8.943, N 40, AD 0.811, p-value 0.033]
15.16* The probability plot looks linear and the p-value (.404) for the Anderson-Darling test exceeds α = .05.
The Kolmogorov-Smirnov test (DMax = .085) does not reject normality (p > .20). Therefore, we cannot
reject the hypothesis of normality.
[Figure: Probability Plot of Cost of Order (Normal): Mean 31.15, StDev 9.890, N 42, AD 0.373, p-value 0.404]
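For reference, SciPy also offers an Anderson-Darling routine, though it reports the statistic against critical
values rather than a p-value (a sketch with a simulated stand-in sample, since the raw order data live in the
textbook's data file):

import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(42)
x = rng.normal(loc=31.15, scale=9.89, size=42)   # stand-in for the 42 orders
result = anderson(x, dist='norm')
print(result.statistic)          # A-D statistic
print(result.critical_values)    # at the 15%, 10%, 5%, 2.5%, 1% levels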
15.17 a. H0: Pay Category and Job Satisfaction are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(3-1) = 2
c. CHIINV(.05,2) = 5.991 and test statistic = 4.54.
d. Since the p-value (.1032) is greater than .05, we cannot reject the null and find independence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies
g. The p-value from MegaStat shows that observed difference would arise by chance only 103 times in
1000 samples if the two variables really were independent.
Satisfied Neutral Dissatisfied Total
Observed 20 13 2 35
Expected 15.28 13.80 5.92 35.00
(O - E)² / E 1.46 0.05 2.59 4.10
Observed 135 127 58 320
Expected 139.72 126.20 54.08 320.00
(O - E)² / E 0.16 0.01 0.28 0.45
Observed 155 140 60 355
Expected 155.00 140.00 60.00 355.00
(O - E)² / E 1.62 0.05 2.88 4.54
4.54 chi-square
2 df
.1032 p-value
15.18 a. H0: Credits Earned and Certainty of Major are independent.
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. CHIINV(.01,4) = 13.28
d. Since the p-value (.0052) is less than .01, we can reject the null and conclude dependence.
e. Highlighted cells contribute the most (see table).
f. No small expected frequencies
g. The p-value from MegaStat shows that observed difference would arise by chance only 5 times in 1000
samples if the two variables really were independent.
Very Uncertain Somewhat Certain Very Certain Total
0 to 9 Observed 12 8 3 23
Expected 7.55 6.83 8.63 23.00
(O - E)² / E 2.63 0.20 3.67 6.50
10 to 59 Observed 8 4 10 22
Expected 7.22 6.53 8.25 22.00
(O - E)² / E 0.08 0.98 0.37 1.44
60 or more Observed 1 7 11 19
Expected 6.23 5.64 7.13 19.00
(O - E)² / E 4.39 0.33 2.11 6.83
Total Observed 21 19 24 64
Expected 21.00 19.00 24.00 64.00
(O - E)² / E 7.11 1.51 6.15 14.76
14.76 chi-square
4 df
.0052 p-value
15.19 a. H0: Order Handed In and Grade are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
c. CHIINV(.1,1) = 2.706 and test statistic = 0.23.
d. Since the p-value is greater than .10, we cannot reject the null and find independence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies
g. The p-value from MegaStat shows that observed difference would arise by chance 628 times in 1000
samples if the two variables really were independent, so the sample result is not convincing.
h. See table below. The z² does equal the chi-square value and gives the same two-tailed p-value.
Earlier Hand-In Later Hand-In Total
B or better Observed 10 8 18
Expected 9.18 8.82 18.00
(O - E)² / E 0.07 0.08 0.15
C or worse Observed 15 16 31
Expected 15.82 15.18 31.00
(O - E)² / E 0.04 0.04 0.09
Total Observed 25 24 49
Expected 25.00 24.00 49.00
(O - E)² / E 0.11 0.12 0.23
.23 chi-square
1 df
.6284 p-value
Hypothesis test for two independent proportions
p1 p2 pc
0.4 0.3333 0.3673 p (as decimal)
10/25 8/24 18/49 p (as fraction)
10. 8. 18. X
25 24 49 n
0.0667 difference
0. hypothesized difference
0.1378 std. error
0.48 z
0.2304 z-squared
.6284 p-value (two-tailed)
15.20 a. H0: Type of Planning and Competition are independent.
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. CHIINV(.05,4) = 9.488 and test statistic = 24.59.
d. Since the p-value (.0001) is less than .05, we can reject the null, i.e., we find dependence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies
g. The p-value from MegaStat shows that observed difference would arise by chance only 1 time in 10,000
samples if the two variables really were independent.
Limited Constituency Comprehensive Total
Observed 11 25 33 69
Expected 23.87 23.87 21.26 69.00
(O - E)² / E 6.94 0.05 6.49 13.48
Moderate Observed 19 23 15 57
Expected 19.72 19.72 17.56 57.00
(O - E)² / E 0.03 0.55 0.37 0.94
Observed 43 25 17 85
Expected 29.41 29.41 26.18 85.00
(O - E)² / E 6.28 0.66 3.22 10.16
Observed 73 73 65 211
Expected 73.00 73.00 65.00 211.00
(O - E)² / E 13.25 1.26 10.08 24.59
24.59 chi-square
4 df
.0001 p-value
15.21 a. H0: Graduation and Sport are independent. H1: Graduation and Sport are not independent.
b. Degrees of Freedom = (r-1)(c-1) = (12-1)(2-1) = 11
c. CHIINV(.01,11) = 24.73 and test statistic = 82.73.
d. Since the p-value (less than .0001) is less than .01, we can reject the null, i.e., we find dependence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies
g. The tiny p-value from MegaStat shows that observed difference would not arise by chance if the two
variables really were independent. Point out to students that the large sample size could make almost any
deviation from independence significant.
Grad in 6 Years Not Grad in 6 Years Total
Tennis Observed 42 16 58
Expected 29.61 28.39 58.00
(O - E)² / E 5.18 5.40 10.58
Swimming Observed 116 51 167
Expected 85.27 81.73 167.00
(O - E)² / E 11.07 11.55 22.63
Soccer Observed 35 17 52
Expected 26.55 25.45 52.00
(O - E)² / E 2.69 2.80 5.49
Gymnastics Observed 40 23 63
Expected 32.17 30.83 63.00
(O - E)² / E 1.91 1.99 3.90
Golf Observed 30 21 51
Expected 26.04 24.96 51.00
(O - E)² / E 0.60 0.63 1.23
Track Observed 97 69 166
Expected 84.76 81.24 166.00
(O - E)² / E 1.77 1.84 3.61
Football Observed 267 317 584
Expected 298.19 285.81 584.00
(O - E)² / E 3.26 3.40 6.67
Wrestling Observed 70 87 157
Expected 80.16 76.84 157.00
(O - E)² / E 1.29 1.34 2.63
Baseball Observed 77 98 175
Expected 89.36 85.64 175.00
(O - E)² / E 1.71 1.78 3.49
Hockey Observed 39 66 105
Expected 53.61 51.39 105.00
(O - E)² / E 3.98 4.16 8.14
Basketball Observed 36 61 97
Expected 49.53 47.47 97.00
(O - E)² / E 3.70 3.86 7.55
Other Observed 18 5 23
Expected 11.74 11.26 23.00
(O - E)² / E 3.33 3.48 6.81
Total Observed 867 831 1698
Expected 867.00 831.00 1698.00
(O - E)² / E 40.49 42.24 82.73
82.73 chi-square
11 df
4.36E-13 p-value
15.22 a. H0: Vehicle Type and Mall Location are independent.
b. Degrees of Freedom = (r-1)(c-1) = (5-1)(4-1) = 12
c. CHIINV(.05,12) = 21.03 and test statistic = 24.53.
d. Since the p-value (.0172) is less than .05, we can reject the null, i.e., we find dependence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. Small expected frequencies in the full size van row.
g. The p-value (.0172) from MegaStat shows that observed difference would arise by chance about 17 times
in 1,000 samples if the two variables really were independent.
Somerset Oakland Great Lakes Jamestown Total
Car Observed 44 49 36 64 193
Expected 48.25 48.25 48.25 48.25 193.00
O - E -4.25 0.75 -12.25 15.75 0.00
(O - E)² / E 0.37 0.01 3.11 5.14 8.64
Minivan Observed 21 15 18 13 67
Expected 16.75 16.75 16.75 16.75 67.00
O - E 4.25 -1.75 1.25 -3.75 0.00
(O - E)² / E 1.08 0.18 0.09 0.84 2.19
Full-size Van Observed 2 3 3 2 10
Expected 2.50 2.50 2.50 2.50 10.00
O - E -0.50 0.50 0.50 -0.50 0.00
(O - E)² / E 0.10 0.10 0.10 0.10 0.40
SUV Observed 19 27 26 12 84
Expected 21.00 21.00 21.00 21.00 84.00
O - E -2.00 6.00 5.00 -9.00 0.00
(O - E)² / E 0.19 1.71 1.19 3.86 6.95
Truck Observed 14 6 17 9 46
Expected 11.50 11.50 11.50 11.50 46.00
O - E 2.50 -5.50 5.50 -2.50 0.00
(O - E)² / E 0.54 2.63 2.63 0.54 6.35
Total Observed 100 100 100 100 400
Expected 100.00 100.00 100.00 100.00 400.00
O - E 0.00 0.00 0.00 0.00 0.00
(O - E)² / E 2.29 4.64 7.12 10.48 24.53
24.53 chi-square
12 df
.0172 p-value
15.23 a. H0: Smoking and Race are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
c. CHIINV(.005,1) = 7.879 and test statistic = 5.84 (for males) and 14.79 (for females).
d. For males, the p-value (.0157) is not less than .005, so we cannot reject the hypothesis of independence.
However, for females, the p-value (.0001) is less than .005 so we conclude dependence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies.
g. The p-value for males is just within the chance level, while the female p-value indicates significance.
Males
Smoker Nonsmoker Total
Observed 145 280 425
Expected 136.00 289.00 425.00
(O - E)² / E 0.60 0.28 0.88
Observed 15 60 75
Expected 24.00 51.00 75.00
(O - E)² / E 3.38 1.59 4.96
Observed 160 340 500
Expected 160.00 340.00 500.00
(O - E)² / E 3.97 1.87 5.84
5.84 chi-square
1 df
.0157 p-value
Females
Smoker Nonsmoker Total
Observed 116 299 415
Expected 102.09 312.91 415.00
(O - E)² / E 1.90 0.62 2.51
Observed 7 78 85
Expected 20.91 64.09 85.00
(O - E)² / E 9.25 3.02 12.27
Observed 123 377 500
Expected 123.00 377.00 500.00
(O - E)² / E 11.15 3.64 14.79
14.79 chi-square
1 df
.0001 p-value
15.24 a. H0: Cockpit Noise Level and Flight Phase are independent.
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. CHIINV(.05,4) = 9.488 and test statistic = 15.16.
d. Since the p-value (.0044) is less than .05, we can reject the null, i.e., we find dependence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. Small expected frequencies in the Cruise column.
g. The p-value (.0044) from MegaStat shows that observed difference would arise by chance about 44 times
in 1,000 samples if the two variables really were independent.
Climb Cruise Descent Total
Observed 6 2 6 14
Expected 5.74 1.84 6.43 14.00
(O - E)² / E 0.01 0.01 0.03 0.05
Medium Observed 18 3 8 29
Expected 11.89 3.80 13.31 29.00
(O - E)² / E 3.15 0.17 2.12 5.43
Observed 1 3 14 18
Expected 7.38 2.36 8.26 18.00
(O - E)² / E 5.51 0.17 3.98 9.67
Observed 25 8 28 61
Expected 25.00 8.00 28.00 61.00
(O - E)² / E 8.67 0.36 6.13 15.16
15.16 chi-square
4 df
.0044 p-value
15.25 a. H0: Actual Change and Forecasted Change are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
c. CHIINV(.10,1) = 2.706 and test statistic = 1.80.
d. Since the p-value (.1792) exceeds .10, we cannot reject the null, i.e., we find independence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies.
g. The p-value (.1792) from MegaStat shows that observed difference would arise by chance about 18 times
in 100 samples if the two variables really were independent.
Decline Rise Total
Observed 7 12 19
Expected 8.94 10.06 19.00
(O - E)² / E 0.42 0.37 0.80
Observed 9 6 15
Expected 7.06 7.94 15.00
(O - E)² / E 0.53 0.47 1.01
Observed 16 18 34
Expected 16.00 18.00 34.00
(O - E)² / E 0.96 0.85 1.80
1.80 chi-square
1 df
.1792 p-value
15.26 a. H0: Smoking and Education Level are independent.
b. Degrees of Freedom = (r-1)(c-1) = (4-1)(4-1) = 6
c. CHIINV(.005,6) = 18.55 and test statistic = 227.78.
d. Since the p-value (less than .0001) is smaller than .005, we reject the null, i.e., we find dependence.
e. First and fourth rows contribute the most; see table and (O - E)² / E.
f. No small expected frequencies.
g. The tiny p-value from MegaStat is highly significant. Point out to students that this is partly an artifact
due to the huge sample size (i.e., in large samples, just about any deviation from independence would be
significant).
No Smoking < 1/2 Pack >= 1/2 Pack Total
< High School Observed 641 196 196 1033
Expected 764.76 139.74 128.50 1033.00
(O - E)² / E 20.03 22.65 35.46 78.14
High School Observed 1370 290 270 1930
Expected 1428.83 261.09 240.08 1930.00
(O - E)² / E 2.42 3.20 3.73 9.35
Some College Observed 635 68 53 756
Expected 559.69 102.27 94.04 756.00
(O - E)² / E 10.13 11.48 17.91 39.53
College Observed 550 30 18 598
Expected 442.72 80.90 74.39 598.00
(O - E)² / E 26.00 32.02 42.74 100.76
Total Observed 3196 584 537 4317
Expected 3196.00 584.00 537.00 4317.00
(O - E)² / E 58.58 69.36 99.84 227.78
227.78 chi-square
6 df
2.28E-46 p-value
15.27 a. H0: ROI and Sales Growth are independent.
b. For the 2×2 table: Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
For the 3×3 table: Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. For the 2×2 table: CHIINV(.05,1) = 3.841 and test statistic = 7.15.
For the 3×3 table: CHIINV(.05,4) = 9.488 and test statistic = 12.30.
d. For the 2×2 table: Conclude dependence since p-value = .0075 is smaller than .05.
For the 3×3 table: Conclude dependence since p-value = .0153 is smaller than .05.
e. First column contributes the most; see table and (O - E)² / E.
f. No small expected frequencies.
g. The tables agree. Both p-values are significant at = .05.
2×2 Cross-Tabulation of Companies
Low High Total
Observed 24 16 40
Expected 17.88 22.12 40.00
(O - E)² / E 2.09 1.69 3.78
Observed 14 31 45
Expected 20.12 24.88 45.00
(O - E)² / E 1.86 1.50 3.36
Observed 38 47 85
Expected 38.00 47.00 85.00
(O - E)² / E 3.95 3.20 7.15
7.15 chi-square
1 df
.0075 p-value
3×3 Cross-Tabulation of Companies
Low Medium High Total
Observed 9 12 7 28
Expected 5.27 12.52 10.21 28.00
(O - E)² / E 2.64 0.02 1.01 3.67
Medium Observed 6 14 7 27
Expected 5.08 12.07 9.85 27.00
(O - E)² / E 0.17 0.31 0.82 1.30
Observed 1 12 17 30
Expected 5.65 13.41 10.94 30.00
(O - E)² / E 3.82 0.15 3.36 7.33
Observed 16 38 31 85
Expected 16.00 38.00 31.00 85.00
(O - E)² / E 6.63 0.48 5.19 12.30
12.30 chi-square
4 df
.0153 p-value
15.28 a. H0: Type of Cola Drinker and Correct Response are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(2-1) = 1
c. CHIINV(.05,1) = 3.841 and test statistic = 0.63.
d. Since the p-value (.4282) exceeds .05, we cannot reject the null, i.e., we find independence.
e. Highlighted cells contribute the most; see table and (O - E)² / E.
f. No small expected frequencies.
g. The p-value shows that the observed difference would arise by chance about 43 times in 100 samples if the
two variables really were independent. We get the same p-value result using a two-tailed test of two
proportions, and z² = 0.79² = .63, the same as the chi-square test statistic (except for rounding).
Regular Cola Diet Cola Total
Observed 7 7 14
Expected 5.78 8.22 14.00
(O - E)² / E 0.26 0.18 0.44
Observed 12 20 32
Expected 13.22 18.78 32.00
(O - E)² / E 0.11 0.08 0.19
Observed 19 27 46
Expected 19.00 27.00 46.00
(O - E)² / E 0.37 0.26 0.63
.63 chi-square
1 df
.4282 p-value
Hypothesis test for two independent proportions
p1 p2 pc
0.3684 0.2593 0.3043 p (as decimal)
7/19 7/27 14/46 p (as fraction)
7. 7. 14. X
19 27 46 n
0.1092 difference
0. hypothesized difference
0.1378 std. error
0.79 z
0.6277 z-squared
.4282 p-value (two-tailed)
15.29 a. H0: Student Category and Reason for Choosing are independent.
b. Degrees of Freedom = (r-1)(c-1) = (3-1)(3-1) = 4
c. CHIINV(.01,4) = 13.28 and test statistic = 54.18.
d. Since the p-value (less than .0001) is smaller than .01, we reject the null, i.e., we find dependence.
e. No consistent pattern; see table and (O - E)² / E.
f. No small expected frequencies.
g. Tiny p-value indicates that the variables are not independent.
Tuition Location Reputation Total
Freshmen Observed 50 30 35 115
Expected 30.49 34.41 50.09 115.00
O - E 19.51 -4.41 -15.09 0.00
(O - E)² / E 12.48 0.57 4.55 17.59
Transfers Observed 15 29 20 64
Expected 16.97 19.15 27.88 64.00
O - E -1.97 9.85 -7.88 0.00
(O - E)² / E 0.23 5.06 2.23 7.52
MBAs Observed 5 20 60 85
Expected 22.54 25.44 37.03 85.00
O - E -17.54 -5.44 22.97 0.00
(O - E)² / E 13.65 1.16 14.25 29.06
Total Observed 70 79 115 264
Expected 70.00 79.00 115.00 264.00
O - E 0.00 0.00 0.00 0.00
(O - E)² / E 26.36 6.79 21.03 54.18
54.18 chi-square
4 df
4.83E-11 p-value
15.30 a. H0: Dominance of Parent and Favoring Legalizing Marijuana are independent.
b. Degrees of Freedom = (r-1)(c-1) = (2-1)(3-1) = 2
c. CHIINV(.10,2) = 4.605 and test statistic = 4.23.
d. Since the p-value (.1204) exceeds .10, we cannot reject the null, i.e., we find independence.
e. No consistent pattern; see table and (O - E)² / E.
f. One expected frequency is below 5 (Father and Yes).
g. This is a close decision. The test statistic does not quite exceed the critical value, but is close.
Mother Neither Father Total
Observed 9 13 12 34
Expected 12.24 11.56 10.20 34.00
(O - E)² / E 0.86 0.18 0.32 1.35
Observed 9 4 3 16
Expected 5.76 5.44 4.80 16.00
(O - E)² / E 1.82 0.38 0.68 2.88
Observed 18 17 15 50
Expected 18.00 17.00 15.00 50.00
(O - E)² / E 2.68 0.56 0.99 4.23
4.23 chi-square
2 df
.1204 p-value
15.31 At α = .10, this sample does not contradict the assumption that Presidents' deaths are uniformly distributed
among the four seasons, since the p-value (.6695) is greater than .10. No parameters are estimated, so
d.f. = c-1-m = 4-1-0 = 3.
observed expected O - E (O - E)² / E % of chisq
11 9.000 2.000 0.444 28.57
9 9.000 0.000 0.000 0.00
10 9.000 1.000 0.111 7.14
6 9.000 -3.000 1.000 64.29
36 36.000 0.000 1.556 100.00
1.56 chi-square
3 df
.6695 p-value
15.32 At α = .05, this sample does not contradict the assumption that the 50 answers are uniformly distributed
since the p-value (.6268) is greater than .05. No parameters are estimated, so d.f. = c-1-m = 5-1-0 = 4.
observed expected O - E (O - E)² / E % of chisq
8 10.000 -2.000 0.400 15.38
8 10.000 -2.000 0.400 15.38
9 10.000 -1.000 0.100 3.85
11 10.000 1.000 0.100 3.85
14 10.000 4.000 1.600 61.54
50 50.000 0.000 2.600 100.00
2.60 chi-square
4 df
.6268 p-value
15.33 To obtain expected values, multiply the U.S. proportions by 50. At α = .05, Oxnard employees do not
differ significantly from the national distribution, since the p-value (.1095) exceeds .05. No parameters
are estimated, so d.f. = c-1-m = 4-1-0 = 3. A common error that students may make is to treat
percentages as if they were frequencies (i.e., to convert the Oxnard frequencies to percentages). Doing
so is a serious error because it doubles the sample size.
observed expected O - E (O - E)² / E % of chisq
4 8.250 -4.250 2.189 36.22
20 22.900 -2.900 0.367 6.08
15 12.200 2.800 0.643 10.63
11 6.650 4.350 2.845 47.07
50 50.000 0.000 6.045 100.00
6.045 chi-square
3 df
.1095 p-value
15.34 At α = .01, you cannot reject the hypothesis that the digits are from a uniform population since the
p-value (.6570) is greater than .01. There are 356 occurrences since 89 × 4 = 356. No parameters are
estimated, so d.f. = c-1-m = 10-1-0 = 9.
observed expected O - E (O - E)² / E % of chisq
39 35.600 3.400 0.325 4.77
27 35.600 -8.600 2.078 30.51
35 35.600 -0.600 0.010 0.15
39 35.600 3.400 0.325 4.77
35 35.600 -0.600 0.010 0.15
35 35.600 -0.600 0.010 0.15
27 35.600 -8.600 2.078 30.51
42 35.600 6.400 1.151 16.90
36 35.600 0.400 0.004 0.07
41 35.600 5.400 0.819 12.03
356 356.000 0.000 6.809 100.00
6.81 chi-square
9 df
.6570 p-value
15.35 At α = .10, you cannot reject the hypothesis that the die is fair, since the p-value (.4934) is greater than
.10. No parameters are estimated, so d.f. = c-1-m = 6-1-0 = 5.
observed expected O - E (O - E)² / E % of chisq
7 10.000 -3.000 0.900 20.45
14 10.000 4.000 1.600 36.36
9 10.000 -1.000 0.100 2.27
13 10.000 3.000 0.900 20.45
7 10.000 -3.000 0.900 20.45
10 10.000 0.000 0.000 0.00
60 60.000 0.000 4.400 100.00
4.40 chi-square
5 df
.4934 p-value
15.36 At α = .025, you cannot reject the hypothesis that goals per game follow a Poisson process, since the
p-value (.9293) is greater than .025. One parameter is estimated, so d.f. = c-1-m = 7-1-1 = 5. A common
error that students may make is to fail to define the top category as open ended (X = 6, 7, 8, ...) so that the
last entry in the P(X) column actually is P(X ≥ 6) = 1 - P(X ≤ 5). If this error is made, the probabilities
will sum to less than 1 and the expected frequencies will sum to less than 232. Another common mistake
is not combining end categories to enlarge expected frequencies (e.g., Cochran's Rule requires ej ≥ 5).
Goals fj P(X) ej fj - ej (fj - ej)² (fj - ej)² / ej
0 19 0.08387 19.4586 -0.4586 0.21031 0.01081
1 49 0.20788 48.2271 0.7729 0.59733 0.01239
2 60 0.25760 59.7642 0.2358 0.05559 0.00093
3 47 0.21282 49.3742 -2.3742 5.63674 0.11416
4 32 0.13187 30.5928 1.4072 1.98010 0.06472
5 18 0.06536 15.1646 2.8354 8.03976 0.53017
6 or more 7 0.04060 9.4185 -2.4185 5.84900 0.62101
Total games 232 1.00000 232 0.0000 1.35419
Total goals 575
Mean goals/game 2.478448
chi-square 1.354
df 5
p-value 0.92926
15.37* The estimated mean is λ = 1.06666667. For d.f. = c-1-m = 4-1-1 = 2 the critical value is CHIINV(.025,2) =
7.378; the test statistic is 4.947 (p = .0943) so we can't reject the hypothesis of a Poisson distribution. A
common error that students may make is to fail to define the top category as open ended (X = 3, 4, 5, ...)
so that the last entry in the P(X) column actually is P(X ≥ 3) = 1 - P(X ≤ 2). If this error is made, the
probabilities will sum to less than 1 and the expected frequencies will sum to less than 60. Another
common mistake is not combining the top categories to enlarge expected frequencies (e.g., Cochran's
Rule requires ej ≥ 5).
X fj P(X) ej fj - ej (fj - ej)² / ej
0 25 0.344154 20.64923 4.35077 0.917
1 18 0.367097 22.02584 -4.02584 0.736
2 8 0.195785 11.74712 -3.74712 1.195
3 or more 9 0.092964 5.57781 3.42219 2.100
Total 60 1.000000 60.00000 0.00000 4.947
15.38 Results may vary, depending on which software package was used, how the categories were defined, and
which options were selected (e.g., equal expected frequencies versus equal class widths). Results are
shown for Visual Statistics (chi-square test with equal expected frequency option) and MINITAB
(histogram with fitted normal curve and Anderson-Darling test). For the chi-square test, we use d.f. =
c-3 since two parameters are estimated, i.e., c-1-m = c-1-2 = c-3. Note that the chi-square test's
p-value may not agree with the A-D test's p-value. Point out to students that the chi-square test is based on
grouped frequencies, whereas the A-D test is based on individual data values, and hence they may
disagree. The A-D test is more powerful, but its methods are less intuitive for most students.
Data Set A Kentucky Derby Winning Times, 1950-2005 (n = 56)
Time Obs Exp Obs-Exp Chi-Square
Under 120.7 6 5.6 0.4 0.029
120.7 < 121.2 7 5.6 1.4 0.35
121.2 < 121.6 0 5.6 -5.6 5.6
121.6 < 121.9 7 5.6 1.4 0.35
121.9 < 122.1 6 5.6 0.4 0.029
122.1 < 122.4 14 5.6 8.4 12.6
122.4 < 122.7 1 5.6 -4.6 3.779
122.7 < 123.1 5 5.6 -0.6 0.064
123.1 < 123.5 5 5.6 -0.6 0.064
123.5 or more 5 5.6 -0.6 0.064
Total 56 56 0 22.929
Parameters from sample d.f. = 7 p < 0.002
[Figure: Probability Plot of Derby Time (Normal): Mean 122.1, StDev 1.100, N 56, AD 0.482, p-value 0.223]
Data Set B National League Runs Scored Leader, 1900-2004 (n = 105)
Runs Obs Exp Obs-Exp Chi-Square
Under 103.7 11 13.13 -2.13 0.344
103.7 < 110.9 12 13.13 -1.13 0.096
110.9 < 116.3 16 13.13 2.88 0.630
116.3 < 121.2 15 13.13 1.88 0.268
121.2 < 126.0 14 13.13 0.88 0.058
126.0 < 131.4 11 13.13 -2.13 0.344
131.4 < 138.6 13 13.13 -0.13 0.001
138.6 or more 13 13.13 -0.13 0.001
Total 105 105 0 1.743
Parameters from sample d.f. = 5 p < 0.883
[Figure: Histogram of Derby Time with fitted normal curve: Mean 122.1, StDev 1.100, N 56]
[Figure: Probability Plot of Runs (Normal): Mean 121.2, StDev 15.13, N 105, AD 0.306, p-value 0.561]
Data Set C Weights (in grams) of Pieces of Halloween Candy (n = 78)
Weight (gm) Obs Exp Obs-Exp Chi-Square
Under 1.120 9 11.14 -2.14 0.412
1.120 < 1.269 9 11.14 -2.14 0.412
1.269 < 1.385 14 11.14 2.86 0.733
1.385 < 1.492 13 11.14 1.86 0.31
1.492 < 1.607 8 11.14 -3.14 0.886
1.607 < 1.757 17 11.14 5.86 3.079
1.757 or more 8 11.14 -3.14 0.886
Total 78 78 0 6.718
Parameters from sample d.f. = 4 p < 0.152
[Figure: Probability Plot of Candy Wt (gm) (Normal): Mean 1.438, StDev 0.2985, N 78, AD 0.555, p-value 0.148]
Data Set D Price-Earnings Ratios for Specialty Retailers (n = 58)
PE Ratio Obs Exp Obs-Exp Chi-Square
Under 10.35 4 8.29 -4.29 2.217
10.35 < 15.19 11 8.29 2.71 0.889
15.19 < 18.92 13 8.29 4.71 2.682
18.92 < 22.39 13 8.29 4.71 2.682
22.39 < 26.12 6 8.29 -2.29 0.631
26.12 < 30.96 6 8.29 -2.29 0.631
30.96 or more 5 8.29 -3.29 1.303
Total 58 58 0 11.034
Parameters from sample d.f. = 4 p < 0.026
[Figure: Histogram of Candy Wt (gm) with fitted normal curve: Mean 1.438, StDev 0.2985, N 78]
[Figure: Histogram of Runs with fitted normal curve: Mean 121.2, StDev 15.13, N 105]
[Figure: Probability Plot of PE Ratio (Normal): Mean 20.66, StDev 9.651, N 58, AD 2.307, p-value <0.005]
Data Set E U.S. Presidents Ages at Inauguration (n = 43)
Age Obs Exp Obs-Exp Chi-Square
Under 48.88 6 7.17 -1.17 0.19
48.88 < 52.20 10 7.17 2.83 1.12
52.20 < 54.86 5 7.17 -2.17 0.655
54.86 < 57.52 11 7.17 3.83 2.05
57.52 < 60.84 2 7.17 -5.17 3.725
60.84 or more 9 7.17 1.83 0.469
Total 43 43 0 8.209
Parameters from sample d.f. = 3 p < 0.042
[Figure: Probability Plot of Age (Normal): Mean 54.86, StDev 6.186, N 43, AD 0.304, p-value 0.556]
Data Set F Weights of 31 Randomly-Chosen Circulated Nickels (n = 31) Nickels
Weight (gm) Obs Exp Obs-Exp Chi-Square
Under 4.908 4 5.17 -1.17 0.263
4.908 < 4.943 4 5.17 -1.17 0.263
4.943 < 4.972 6 5.17 0.83 0.134
4.972 < 5.000 4 5.17 -1.17 0.263
5.000 < 5.036 8 5.17 2.83 1.554
5.036 or more 5 5.17 -0.17 0.005
Total 31 31 0 2.484
Parameters from sample d.f. = 3 p < 0.478
[Figure: Histogram of Age with fitted normal curve: Mean 54.86, StDev 6.186, N 43]
[Figure: Histogram of PE Ratio with fitted normal curve: Mean 20.66, StDev 9.651, N 58]
[Figure: Probability Plot of Nickel Wt (gm) (Normal): Mean 4.972, StDev 0.06623, N 31, AD 0.881, p-value 0.021]
15.39* In this problem, the estimated Poisson mean is λ = 0.702479339 runs/inning. For d.f. = c-m-1 = 4-1-1
= 2, the critical value is CHIINV(.05,2) = 5.991. The test statistic is 64.02 (p-value less than .0001) so
we reject the hypothesis that runs per inning are Poisson. A common error that students may make is to
fail to define the top category as open ended (X = 3, 4, 5, ...) so that the last entry in the P(X) column
actually is P(X ≥ 3) = 1 - P(X ≤ 2). If this error is made, the probabilities will sum to less than 1 and the
expected frequencies will sum to less than 121. Another common mistake is not combining the top
categories to enlarge expected frequencies (e.g., Cochran's Rule requires ej ≥ 5). Why is the Poisson fit
poor? Perhaps because runs in baseball are conditional on arrival at another base (usually) and waiting at
that base before proceeding to a run scored following someone else's hit (so the runner can arrive at
home plate), whereas in hockey a goal is scored or not when you arrive at the opponent's goal.
Runs fj P(X) ej fj - ej (fj - ej)² / ej
0 83 0.495356 59.93803 23.06197 8.873405
1 15 0.347977 42.10523 -27.10523 17.44898
2 7 0.122223 14.78903 -7.78903 4.102294
3 or more 16 0.034444 4.16771 11.83229 33.59226
Total 121 1.000000 121.00000 0.00000 64.01695
15.40* Different pitchers are faced. It is not the same pitcher throughout the game. Hitters also vary by inning.
Some innings have the team's best hitters up, others do not. Also, batters face different degrees of
pressure in different innings, depending on how close the game is, how many are out, and so on. At
α = .01, you can reject the hypothesis that the runs scored per inning are from a uniform population since the
p-value (.0084) is less than .01. We use d.f. = c-m-1 = 9-0-1 = 8.
Runs by Inning - Test for Uniform Distribution
Inning fj ej fj - ej (fj - ej)² (fj - ej)² / ej
1 9 9.44 -0.44 0.19753 0.02092
2 15 9.44 5.56 30.86420 3.26797
3 10 9.44 0.56 0.30864 0.03268
4 4 9.44 -5.44 29.64198 3.13856
5 17 9.44 7.56 57.08642 6.04444
6 10 9.44 0.56 0.30864 0.03268
7 8 9.44 -1.44 2.08642 0.22092
8 11 9.44 1.56 2.41975 0.25621
9 1 9.44 -8.44 71.30864 7.55033
Total 85 20.56471
df 8
p-value 0.008398
[Figure: Histogram of Nickel Wt (gm) with fitted normal curve: Mean 4.972, StDev 0.06623, N 31]
15.41* Results will vary, but should be close to the intended distribution in a chi-square test or z-test for the
intended mean. However, students often are surprised at how much a normal sample can differ from a
perfect bell-shape, and even the mean and standard deviation may not be on target. That is the nature
of random sampling, as we learned in Chapters 8 and 9. The following sample was created using Excel's
function =NORMINV(RAND(),0,1). The histogram is reasonably bell-shaped. The test statistic for zero
mean is z = (10)(0.031808) = 0.3181 (p-value = .7504) so the mean is not significantly different from 0.
The standard deviation is a little larger than 1. The chi-square test from Visual Statistics (using equal
class widths) gives a p-value of .834 so the hypothesis of normality should not be rejected.
Descriptive statistics
count 100
mean 0.031808
sample std dev 1.117752
minimum -2.35182753
maximum 3.337159939
1st quartile -0.672417
median -0.005821
3rd quartile 0.824195
Classes Obs Exp Obs-Exp Chi-Square
Under -1.641 8 6.73 1.27 0.24
-1.641 < -0.930 10 12.76 -2.76 0.596
-0.930 < -0.218 24 21.66 2.34 0.254
-0.218 < 0.493 23 24.85 -1.85 0.138
0.493 < 1.204 21 19.29 1.71 0.152
1.204 < 1.915 10 10.12 -0.12 0.001
1.915 or more 4 4.6 -0.6 0.079
Total 100 100 0 1.461
d.f. = 4 p < 0.834
15.42* Results will vary, but should be close to the intended distribution in a chi-square test or test for the
desired mean. However, students often are surprised at how much a uniform sample can differ from a
perfect uniform shape, and even the mean and standard deviation may not be on target. That is the
nature of random sampling, as we learned in Chapters 8 and 9. The following sample was created using
Excel's function =RAND(). The histogram does not look quite uniform. The test statistic for μ = 0.50000
is z = (0.533437 - 0.500000)/.0288675 = 1.1583 (p-value = .2467) so the mean is not significantly
different from 0.5000. The standard deviation (.29142) is a little larger than .28868. However, the chi-
square test from Visual Statistics (dividing the observed range into 8 classes) does give a p-value less
than .05 (mainly due to the surfeit of observations in the sixth class) so the hypothesis of uniformity
might be rejected.
[Figure: Histogram of the uniform sample (percent scale, bins from 0.0000 to 1.0000)]
Descriptive statistics
count 100
mean 0.533437
sample variance 0.084926
sample standard deviation 0.291421
minimum 0.017795
maximum 0.998479
1st quartile 0.275343
median 0.620489
3rd quartile 0.772453
Classes Obs Exp Obs-Exp Chi-Square
Under 0.1404 12 12.5 -0.5 0.02
0.1404 < 0.2630 10 12.5 -2.5 0.5
0.2630 < 0.3856 14 12.5 1.5 0.18
0.3856 < 0.5081 9 12.5 -3.5 0.98
0.5081 < 0.6307 6 12.5 -6.5 3.38
0.6307 < 0.7533 23 12.5 10.5 8.82
0.7533 < 0.8759 12 12.5 -0.5 0.02
0.8759 or more 14 12.5 1.5 0.18
Total 100 100 0 14.08
Parameters from user d.f. = 7 p < 0.050
15.43* Results will vary, but should be close to the intended distribution in a chi-square test or z-test for the
intended mean. However, students often are surprised at how much a Poisson sample can differ from its
expected shape, and even the mean and standard deviation may not be on target. That is the nature of
random sampling, as we learned in Chapters 8 and 9. The following sample was created using Excel's
Tools > Data Analysis > Random Number Generation with a mean of λ = 4. The histogram looks fine.
The test statistic for μ = 4 is z = (4.17 - 4)/(2/√100) = 0.85 (p-value = .3953) so the mean does not differ
significantly from 4. The standard deviation (2.09) is very close to 2 (the square root of λ = 4). The chi-
square test from Visual Statistics gives a p-value of .722 so the hypothesis of a Poisson distribution should
not be rejected.
[Figure: Histogram of the Poisson sample (percent scale)]
Descriptive Statistics
count 100
mean 4.17
sample standard deviation 2.09
minimum 1
maximum 11
1st quartile 3.00
median 4.00
3rd quartile 5.00
interquartile range 2.00
mode 4.00
X Values Obs Exp Obs-Exp Chi-Square
1 or less 9 9.16 -0.16 0.003
2 11 14.65 -3.65 0.91
3 20 19.54 0.46 0.011
4 24 19.54 4.46 1.02
5 14 15.63 -1.63 0.17
6 8 10.42 -2.42 0.562
7 7 5.95 1.05 0.184
8 3 2.98 0.02 0
9 or more 4 2.13 1.87 1.631
Total 100 100 0 4.49
d.f. = 7 p < 0.722
Chapter 16
Nonparametric Tests
16.1 Since the p-value (from MegaStat) is greater than .05, we fail to reject the null hypothesis of randomness.
Runs Test for Random Sequence
n runs
12 7 A
15 7 B
27 14 total
14.333 expected value
2.515 standard deviation
-0.133 z test statistic
-0.066 z (with continuity correction)
.9472 p-value (two-tailed)
Note: MegaStat uses a continuity correction (subtracting 0.5 from the difference in the numerator when R
is below its expected value), which will lead to different z values and p-values than if the textbook
formula is used. MegaStat's p-value is shown.
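The uncorrected runs-test z statistic follows directly from the formulas; a minimal Python sketch that
reproduces the MegaStat numbers for 16.1 (no continuity correction):

import math

n1, n2, runs = 12, 15, 14                 # A's, B's, and observed runs
n = n1 + n2
mu = 2 * n1 * n2 / n + 1                  # expected runs = 14.333
var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
z = (runs - mu) / math.sqrt(var)          # -0.133
print(mu, math.sqrt(var), z)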
16.2 Since the p-value (from MegaStat) is greater than .10, we fail to reject the null hypothesis of randomness.
Runs Test for Random Sequence
n runs
10 6 F
14 6 T
24 12 total
12.667 expected value
2.326 standard deviation
-0.287 z test statistic
0.072 z (with continuity correction)
.9429 p-value (two-tailed)
Note: MegaStat uses a continuity correction (subtracting 0.5 from the difference in the numerator when R
is below its expected value), which will lead to different z values and p-values than if the textbook
formula is used. MegaStat's p-value is shown.
16.3 a. At α = .10, the population median does not differ from 50 (p-value = .4732). The worksheet and test
statistic calculation are shown.
Student xi xi - 50 | xi - 50 | Rank R+ R-
1 74 24 24 14.5 14.5
2 5 -45 45 24 24
3 87 37 37 20 20
4 26 -24 24 14.5 14.5
5 60 10 10 5.5 5.5
6 99 49 49 27 27
7 37 -13 13 9 9
8 45 -5 5 3 3
9 7 -43 43 22 22
10 78 28 28 17 17
11 70 20 20 13 13
12 84 34 34 19 19
13 97 47 47 25 25
14 93 43 43 22 22
15 54 4 4 2 2
16 24 -26 26 16 16
17 62 12 12 7.5 7.5
18 32 -18 18 12 12
19 60 10 10 5.5 5.5
20 66 16 16 11 11
21 2 -48 48 26 26
22 43 -7 7 4 4
23 62 12 12 7.5 7.5
24 7 -43 43 22 22
25 100 50 50 28 28
26 64 14 14 10 10
27 17 -33 33 18 18
28 48 -2 2 1 1
406 234.5 171.5
Test Statistic:
z = [W - n(n+1)/4] / sqrt[n(n+1)(2n+1)/24] = [234.5 - 28(28+1)/4] / sqrt[28(28+1)(56+1)/24]
= (234.5 - 203) / 43.9147 = 0.7173
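The same statistic can be scripted; a Python sketch that mirrors the worksheet (average ranks for ties, zero
differences dropped, no continuity correction):

import numpy as np
from scipy.stats import rankdata

def signed_rank_z(x, m0):
    d = np.asarray(x, dtype=float) - m0
    d = d[d != 0]                        # drop zero differences
    n = len(d)
    ranks = rankdata(np.abs(d))          # average ranks for ties
    w = ranks[d > 0].sum()               # W = sum of positive ranks (234.5 here)
    mu = n * (n + 1) / 4
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mu) / sigma              # 0.7173 for the 28 scores above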
b. The histogram appears platykurtic, but the A-D test statistic (p = .468) indicates that the hypothesis of
normality should not be rejected.
[Figure: MINITAB Summary for Score: Mean 53.679, StDev 30.283, N 28, A-Squared 0.34, p-value 0.468]
16.4 a. In the Wilcoxon signed-rank test at α = .05, there is a difference in the population median scores on
the two exams (p-value = .00234). The worksheet is shown.
Student Exam 1 Exam 2 d |d| Rank R+ R-
9 52 53 -1 1 1.5 1.5
12 95 96 -1 1 1.5 1.5
8 71 69 2 2 3 3
20 54 58 -4 4 4 4
10 79 84 -5 5 5.5 5.5
14 81 76 5 5 5.5 5.5
3 65 59 6 6 7 7
16 54 47 7 7 8 8
4 60 68 -8 8 9.5 9.5
18 92 100 -8 8 9.5 9.5
15 59 68 -9 9 11.5 11.5
17 75 84 -9 9 11.5 11.5
7 72 82 -10 10 13 13
1 70 81 -11 11 14.5 14.5
19 70 81 -11 11 14.5 14.5
5 63 75 -12 12 16.5 16.5
11 84 96 -12 12 16.5 16.5
2 74 89 -15 15 18 18
13 83 99 -16 16 19 19
6 58 77 -19 19 20 20
210 23.5 186.5
Test Statistic:
z = [W - n(n+1)/4] / sqrt[n(n+1)(2n+1)/24] = [23.5 - 20(20+1)/4] / sqrt[20(20+1)(40+1)/24]
= (23.5 - 105) / 26.7862 = -3.04
b. In a t-test at α = .05, there is no difference in the population mean scores on the two exams since the
p-value is greater than .05. The two tests reveal different results. Samples are too small for a meaningful
test for normality. The MegaStat results are shown.
Hypothesis Test: Independent Groups (t-test, pooled variance)
Exam 1 Exam 2
70.55 77.10 mean
12.55 15.26 std. dev.
20 20 n
38 df
-6.550 difference (Exam 1 - Exam 2)
195.178 pooled variance
13.971 pooled std. dev.
4.418 standard error of difference
0 hypothesized difference
-1.48 t
.1464 p-value (two-tailed)
16.5 a. At α = .05, there is no difference in the medians, since the p-value is greater than .05. MegaStat uses a
correction for ties, so students may get different z values and p-values. The calculations and p-value
shown are from MegaStat, on the assumption that students will use MegaStat for the calculations.
Wilcoxon - Mann/Whitney Test
n sum of ranks
10 135 Bob's Portfolio
12 118 Tom's Portfolio
22 253
115.00 expected value
15.16 standard deviation
0.96 z, uncorrected
1.29 z, corrected for ties
.1983 p-value (two-tailed)
b. MegaStat's results are shown, assuming equal variances (t = 1.62, p = .0606). At α = .05, there is no
difference in the means, since the p-value is greater than .05. If you assume unequal variances, the result
is similar (t = 1.661, p = .0565). Both tests lead to the same decision. Samples are too small for a
meaningful test for normality.
Hypothesis Test: Independent Groups (t-test, pooled variance)
Bob's Portfolio Tom's Portfolio
6.040 4.100 mean
2.352 3.119 std. dev.
10 12 n
20 df
1.9400 difference (Bob's Portfolio - Tom's Portfolio)
7.8392 pooled variance
2.7999 pooled std. dev.
1.1988 standard error of diff
0 hypothesized difference
1.618 t
.0606 p-value (one-tail upper)
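In Python the analogous test is scipy.stats.mannwhitneyu, which reports the U statistic rather than
MegaStat's rank sum (a sketch with hypothetical stand-in returns; substitute the actual portfolio data from
the exercise file):

from scipy.stats import mannwhitneyu

bob = [6.0, 7.1, 4.4, 8.2, 3.9, 5.5, 6.8, 7.7, 4.9, 5.9]          # stand-in values
tom = [4.1, 2.8, 5.0, 3.3, 6.2, 1.9, 4.7, 3.8, 5.6, 2.4, 4.4, 4.3]
u, p = mannwhitneyu(bob, tom, alternative='two-sided')
print(u, p)   # two-tailed p-value for comparing the two medians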
16.6 a. We fail to reject the null hypothesis of equal medians since the p-value (.0700) is greater
than .05. MegaStat's results are shown.
Wilcoxon - Mann/Whitney Test
n sum of ranks
9 125 Old Bumper
12 106 New Bumper
21 231
99.00 expected value
14.07 standard deviation
1.81 z
.0700 p-value (two-tailed)
b. We fail to reject the null hypothesis of equal means since the p-value is greater
than .05. MegaStat's results are shown. We have the same decision as in (a). Samples are too small for
a meaningful test for normality.
Hypothesis Test: Independent Groups (t-test, pooled variance)
Old Bumper New Bumper
1,766.11 1,101.42 mean
837.62 696.20 std. dev.
9 12 n
19 df
664.694 difference (Old - New)
576,031.463 pooled variance
758.967 pooled std. dev.
334.673 standard error of difference
0 hypothesized difference
1.99 t
.0616 p-value (two-tailed)
.0308 p-value (one-tailed)
16.7 MegaStat results are shown. At α = .05, there is no difference in median volatility in these four
portfolios (p = .0892). The ANOVA test gives the same conclusion, but the decision is very close
(p = .0552). Had we used α = .10, the difference would have been significant in either test.
Kruskal-Wallis Test
Median n Avg. Rank
16.20 15 20.03 Health
22.70 12 35.13 Energy
21.05 14 29.71 Retail
18.10 13 26.69 Leisure
19.65 54 Total
6.511 H (corrected for ties)
3 d.f.
.0892 p-value
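For students using Python instead of MegaStat, scipy's kruskal returns the tie-corrected H and its chi-square p-value (a sketch; the four lists are placeholders standing in for the Health, Energy, Retail, and Leisure volatility columns):

from scipy.stats import kruskal

# Placeholder data only; substitute the 15, 12, 14, and 13 observed volatilities.
health  = [16.2, 14.8, 19.5, 12.1, 17.0, 21.3]
energy  = [22.7, 25.1, 18.9, 30.2, 21.4, 19.8]
retail  = [21.0, 19.8, 24.3, 16.7, 22.5, 20.1]
leisure = [18.1, 15.5, 23.9, 20.0, 17.2, 16.9]

h_stat, p_value = kruskal(health, energy, retail, leisure)  # d.f. = k - 1 = 3
print(h_stat, p_value)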
One factor ANOVA
Mean n Std. Dev
17.34 15 4.630 Health
23.18 12 6.311 Energy
20.62 14 4.032 Retail
19.14 13 6.711 Leisure
19.92 54 5.716 Total
ANOVA table
Source SS df MS F p-value
Treatment 241.815 3 80.6049 2.71 .0552
Error 1,489.913 50 29.7983
Total 1,731.728 53
Based on the four individual histograms, we would doubt normality. However, each sample is rather small for a normality test. Pooling the samples, we get a p-value of .490 for MINITAB's Anderson-Darling test statistic, so normality can't be rejected.
[MINITAB graphical summary for Volatility: N = 54, Mean = 19.920 (95% CI 18.360 to 21.481), StDev = 5.716, Median = 19.650, Min = 4.900, Max = 32.500, Anderson-Darling A² = 0.34, p-value = .490]
16.8 a. At α = .05, there is a difference in median productivity since the p-value is less than .05.
Kruskal-Wallis Test
Median n Avg. Rank
4.10 9 11.61 Station A
2.90 6 6.67 Station B
5.40 10 18.05 Station C
4.50 25 Total
9.479 H (corrected for ties)
2 d.f.
.0087 p-value
multiple comparison values for avg. ranks
8.63 10.58
b. At α = .05, there is a difference in mean productivity since the p-value is less than .05.
One factor ANOVA
Mean n Std. Dev
3.97 9 0.828 Station A
3.02 6 1.094 Station B
5.57 10 1.726 Station C
4.38 25 1.647 Total
ANOVA table
Source SS df MS F p-value
Treatment 26.851 2 13.4253 7.72 .0029
Error 38.269 22 1.7395
Total 65.120 24
The samples are rather small for a normality test. Pooling the samples, we get a p-value of .392 for MINITAB's Anderson-Darling test statistic, so normality can't be rejected.
[MINITAB graphical summary for Units Per Hour: N = 25, Mean = 4.3800 (95% CI 3.7001 to 5.0599), StDev = 1.6472, Median = 4.5000, Min = 1.9000, Max = 8.4000, Anderson-Darling A² = 0.37, p-value = .392]
16.9 The median ratings of surfaces do not differ at α = .05 since the p-value is greater than .05.
Friedman Test
Sum of Ranks Avg. Rank
9.00 2.25 Shiny
10.00 2.50 Satin
17.50 4.38 Pebbled
10.00 2.50 Pattern
13.50 3.38 Embossed
60.00 3.00 Total
4 n
5.013 chi-square (corrected for ties)
4 d.f.
.2860 p-value
multiple comparison values for avg. ranks
3.14(.05) 3.68(.01)
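scipy's friedmanchisquare reproduces this chi-square statistic (without MegaStat's tie correction); a sketch with placeholder ratings, one list per surface across the four matched blocks:

from scipy.stats import friedmanchisquare

# Placeholder ratings from the 4 blocks (raters); substitute the textbook data.
shiny    = [3, 2, 4, 2]
satin    = [2, 3, 3, 3]
pebbled  = [5, 5, 4, 5]
pattern  = [1, 2, 2, 4]
embossed = [4, 4, 1, 1]

# Each argument is one treatment measured across the same blocks; d.f. = k - 1 = 4.
chi2, p_value = friedmanchisquare(shiny, satin, pebbled, pattern, embossed)
print(chi2, p_value)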
16.10 The median sales of coffee sizes do not differ at α = .05, since the p-value is greater than .05.
Friedman Test
Sum of Ranks Avg. Rank
10.00 2.00
10.00 2.00 Medium
10.00 2.00
30.00 2.00 Total
5 n
0.000 chi-square (corrected for ties)
2 d.f.
1.0000 p-value
multiple comparison values for avg. ranks
1.51 1.86
16.11 a. Worksheet is shown for rank correlation.
Profit in year: Rank in year:
Obs Company 2004 2005 2004 2005
1 Campbell Soup 595 647 6 7
2 ConAgra Foods 775 880 5 5
3 Dean Foods 356 285 10 10
4 Del Monte Foods 134 165 13 14
5 Dole Food 105 134 15 15
6 Flowers Foods 15 51 19 18
7 General Mills 917 1055 3 3
8 H. J. Heinz 566 804 7 6
9 Hershey Foods 458 591 8 8
10 Hormel Foods 186 232 12 11
11 Interstate Bakeries 27 -26 17 20
12 J. M. Schmucker 96 111 16 16
13 Kellogg 787 891 4 4
14 Land O'Lakes 107 21 14 19
15 McCormick 211 215 11 13
16 Pepsico 3568 4212 1 1
17 Ralcorp Holdings 7 65 20 17
18 Sara Lee 1221 1272 2 2
19 Smithfield Foods 26 227 18 12
20 Wm. Wrigley, Jr. 446 493 9 9
Rank sum: 210 210
b. Spearman rank correlation found by using the Excel function CORREL on the rank columns is 0.9338.
c. The t-statistic for the Spearman rank correlation is 11.076. Clearly, we can reject the hypothesis of no correlation at any of the customary levels.
t = rs√[(n - 2)/(1 - rs²)] = 0.9338√[(20 - 2)/(1 - 0.9338²)] = 11.076
Critical values:
α = 0.025: t.025 = 2.093, reject H0
α = 0.01: t.01 = 2.539, reject H0
α = 0.005: t.005 = 2.861, reject H0
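These figures can be reproduced in Python from the profit columns in the worksheet above (a sketch; spearmanr handles the tied ranks automatically, and the t formula is the one shown above):

import math
from scipy.stats import spearmanr

profit2004 = [595, 775, 356, 134, 105, 15, 917, 566, 458, 186,
              27, 96, 787, 107, 211, 3568, 7, 1221, 26, 446]
profit2005 = [647, 880, 285, 165, 134, 51, 1055, 804, 591, 232,
              -26, 111, 891, 21, 215, 4212, 65, 1272, 227, 493]

rs, _ = spearmanr(profit2004, profit2005)     # rank correlation, about .934
n = len(profit2004)
t = rs * math.sqrt((n - 2) / (1 - rs ** 2))   # t statistic on n - 2 = 18 d.f.
print(rs, t)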
d. MegaStat's calculations are shown.
Spearman Coefficient of Rank Correlation
2004 2005
2004 1.000
2005 .934 1.000
20 sample size
.444 critical value .05 (two-tail)
.561 critical value .01 (two-tail)
e. Calculated using the CORREL function on the actual data (not the ranks), we get r = 0.9960.
f. In this example, there is no strong argument for the Spearman test since the data are ratio. However, the
assumption of normality may be dubious (samples are too small for a reliable normality test).
16.12 There is a discrepancy between the textbook's data and the student CD data. The textbook's margin answer is based on the CD data. Students' answers will depend on which data set they use. Calculations for each data set are shown below. This discrepancy will be corrected in future editions.
Data Set from CD: Textbook Data Set:
12-Mo 5-Yr 12-Mo 5-Yr
12-Mo 1.000 12-Mo 1.000
5-Yr .742 1.000 5-Yr .373 1.000
24 sample size 24 sample size
.404 critical value .05 (two-tail)
.515 critical value .01 (two-tail)
The worksheets for each data set are shown:
Data Set from CD Ranks: Textbook Data Set Ranks:
Fund 12-Mo 5-Yr Fund 12-Mo 5-Yr
1 17.5 18.5 1 17.5 18.5
2 1 5 2 1 5
3 12 11 3 12 11
4 6 4 4 6 4
5 7 1 5 7 1
6 15 14 6 15 14
7 23 22 7 23 22
8 10 7 8 10 7
9 9 10 9 9 10
10 16 21 10 16 21
11 11 9 11 14 9
12 17.5 23 12 4 23
13 22 15 13 11 15
14 19 24 14 17.5 24
15 21 20 15 22 20
16 3 2 16 19 2
17 8 18.5 17 21 18.5
18 24 12 18 3 12
19 2 3 19 8 3
20 13 8 20 24 8
21 14 17 21 2 17
22 4 13 22 13 13
23 5 6 23 5 6
24 20 16 24 20 16
Rank sum: 300 Rank sum: 300
e. Pearson correlation found by using the Excel function CORREL is 0.6560 (from student CD data set) or
0.2796 (from the data printed in the textbook).
f. In this example, there is no strong argument for the Spearman test since the data are ratio. Despite the
low outlier in 5-year returns, both samples pass the test for normality (p = .541 and .460 respectively).
The following tests are based on the CD data set.
[MINITAB graphical summary for 12-Mo: N = 24, Mean = 7.9542 (95% CI 5.4615 to 10.4469), StDev = 5.9032, Median = 8.6500, Min = -2.4000, Max = 21.1000, Anderson-Darling A² = 0.31, p-value = .541]
[MINITAB graphical summary for 5-Yr: N = 24, Mean = 8.3875 (95% CI 6.6049 to 10.1701), StDev = 4.2216, Median = 9.1500, Min = -2.9000, Max = 14.7000, Anderson-Darling A² = 0.34, p-value = .460]
16.13 Since the p-value (.5300) is greater than .05, we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
21 14 B
29 14 A
50 28 total
25.360 expected value
3.408 standard deviation
0.775 z test statistic
0.628 z (with continuity correction)
.5300 p-value (two-tailed)
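Common Python libraries do not expose this exact test, so here is a minimal sketch of the large-sample runs test (the helper is my own; the continuity-corrected z shrinks |R - E(R)| by 0.5, reproducing MegaStat's corrected value):

import math
from scipy.stats import norm

def runs_test(seq):
    # Large-sample runs test for randomness of a two-symbol sequence.
    n1 = sum(1 for s in seq if s == seq[0])
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1                       # E(R) under H0
    sd = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    z = (runs - mu) / sd
    z_corr = math.copysign(max(abs(runs - mu) - 0.5, 0), runs - mu) / sd
    return z, z_corr, 2 * norm.sf(abs(z_corr))     # two-tailed p-value

For 16.13 (n1 = 21, n2 = 29, R = 28), this gives z = 0.775, corrected z = 0.628, and p = .5300.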
16.14 Since the p-value (.9145) is greater than .01 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
21 9 H
14 8 M
35 17 total
17.800 expected value
2.794 standard deviation
-0.286 z test statistic
-0.107 z (with continuity correction)
.9145 p-value (two-tailed)
16.15 Since the p-value (.6245) is greater than .05 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
14 8 T
11 7 F
25 15 total
13.320 expected value
2.411 standard deviation
0.697 z test statistic
0.490 z (with continuity correction)
.6245 p-value (two-tailed)
16.16 Since the p-value (.2163) is greater than .01 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
21 10 N
12 10 H
33 20 total
16.273 expected value
2.610 standard deviation
1.428 z test statistic
1.237 z (with continuity correction)
.2163 p-value (two-tailed)
16.17 Since the p-value (.2135) is greater than .05 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
18 11 C
16 11 X
34 22 total
17.941 expected value
2.861 standard deviation
1.419 z test statistic
1.244 z (with continuity correction)
.2135 p-value (two-tailed)
16.18 Since the p-value (.2288) is greater than .05 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
34 13 Up
27 13 Dn
61 26 total
31.098 expected value
3.821 standard deviation
-1.334 z test statistic
-1.204 z (with continuity correction)
.2288 p-value (two-tailed)
16.19 Since the p-value (.1508) is greater than .05 we fail to reject the null hypothesis of randomness. Note:
MegaStat's z-value subtracts a continuity correction of 0.5 from the numerator of the test statistic when R
is less than its expected value. This will give a slightly different result than the formula shown in the
textbook (see Siegel and Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill,
1988).
H0: Events follow a random pattern
H1: Events do not follow a random pattern
Runs Test for Random Sequence
n runs
13 5 Lo
11 4 Hi
24 9 total
12.917 expected value
2.378 standard deviation
-1.647 z test statistic
-1.437 z (with continuity correction)
.1508 p-value (two-tailed)
16.20 MegaStat results are shown. At α = .10, the median ELOS does not differ for the two groups, since the p-value (.5720) is greater than .10. The hypotheses are:
H0: M1 = M2 (no difference in ELOS)
H1: M1 ≠ M2 (ELOS differs for the two groups)
Wilcoxon - Mann/Whitney Test
n sum of ranks
10 124 Clinic A
12 129 Clinic B
22 253
115.00 expected value
15.04 standard deviation
0.57 z, corrected for ties
.5720 p-value (two-tailed)
Although the histogram is somewhat platykurtic in appearance, normality may be assumed at α = .10 based on the Anderson-Darling p-value (.147). To perform this test, the two samples were pooled. Even so, the sample is rather small for a normality test.
[MINITAB graphical summary for Weeks: N = 22, Mean = 30.364 (95% CI 25.525 to 35.202), StDev = 10.913, Median = 30.000, Min = 16.000, Max = 52.000, Anderson-Darling A² = 0.54, p-value = .147]
16.21 MegaStat results are shown. At α = .05, the median defect counts do not differ for the two groups, since the p-value (.4731) is greater than .05. The hypotheses are:
H0: M1 = M2 (no difference in number of bad pixels)
H1: M1 ≠ M2 (number of bad pixels differs for the two groups)
Wilcoxon - Mann/Whitney Test
n sum of ranks
12 162.5
12 137.5
24 300
150.00 expected value
16.73 standard deviation
0.72 z, corrected for ties
.4731 p-value (two-tailed)
The histogram is strongly right-skewed and the Anderson-Darling p-value is small (less than .005) so the
assumption of normality is untenable. To perform this test, the two samples were pooled. Even so, the
sample is rather small for a normality test.
[MINITAB graphical summary for Bad Pixels: N = 24, Mean = 1.4583 (95% CI 0.8116 to 2.1051), StDev = 1.5317, Median = 1.0000, Min = 0.0000, Max = 5.0000, Anderson-Darling A² = 1.36, p-value < .005]
16.22 MegaStat results are shown. At α = .05, the median weights do not differ for the two groups, since the p-value (.1513) is greater than .05. The hypotheses are:
H0: M1 = M2 (no difference in weight of linemen)
H1: M1 ≠ M2 (weight of linemen differs for the two teams)
Wilcoxon - Mann/Whitney Test
n sum of ranks
14 226 WeightD
13 152 WeightP
27 378
196.00 expected value
20.56 standard deviation
1.44 z, corrected for ties
.1513 p-value (two-tailed)
The histogram is bell-shaped, and normality may be assumed at any common α based on the Anderson-Darling p-value (.360). To perform this test, the two samples were pooled.
[MINITAB graphical summary for Pounds: N = 27, Mean = 298.89 (95% CI 292.74 to 305.04), StDev = 15.55, Median = 300.00, Min = 274.00, Max = 338.00, Anderson-Darling A² = 0.39, p-value = .360]
16.23 MegaStat results are shown. At α = .05, the median difference in heart rates does not differ from zero, since the p-value (.0923) is greater than .05. Note: The 5th observation is omitted because its difference is zero (heart rate of 82 before and after), which leaves only n = 29. The hypotheses are:
H0: Md = 0 (the median difference in pulse rate is zero)
H1: Md ≠ 0 (the median difference in pulse rate is not zero)
Wilcoxon Signed Rank Test
Paired Data: Before - After
149 sum of positive ranks
286 sum of negative ranks
29 n
217.50 expected value
40.69 standard deviation
-1.68 z, corrected for ties
.0923 p-value (two-tailed)
The worksheet is shown:
Student Before After d | d | Rank R+ R-
1 60 62 -2 2 12 12
2 70 76 -6 6 24.5 24.5
3 77 78 -1 1 5 5
4 80 83 -3 3 16.5 16.5
5 82 82 0 0
6 82 83 -1 1 5 5
7 41 66 -25 25 29 29
8 65 63 2 2 12 12
9 58 60 -2 2 12 12
10 50 54 -4 4 20.5 20.5
11 82 93 -11 11 27 27
12 56 55 1 1 5 5
13 71 67 4 4 20.5 20.5
14 67 68 -1 1 5 5
15 66 75 -9 9 26 26
16 70 64 6 6 24.5 24.5
17 69 66 3 3 16.5 16.5
18 64 69 -5 5 23 23
19 70 73 -3 3 16.5 16.5
20 59 58 1 1 5 5
21 62 65 -3 3 16.5 16.5
22 66 68 -2 2 12 12
23 81 77 4 4 20.5 20.5
24 56 57 -1 1 5 5
25 64 62 2 2 12 12
26 78 79 -1 1 5 5
27 75 74 1 1 5 5
28 66 67 -1 1 5 5
29 59 63 -4 4 20.5 20.5
30 98 82 16 16 28 28
Rank sum: 435 149 286
The histograms are bell-shaped, and normality may be assumed at any common α based on the Anderson-Darling p-values (.543 and .388).
[MINITAB graphical summary for Before: N = 30, Mean = 68.133 (95% CI 63.854 to 72.413), StDev = 11.461, Median = 66.500, Min = 41.000, Max = 98.000, Anderson-Darling A² = 0.31, p-value = .543]
[MINITAB graphical summary for After: N = 30, Mean = 69.633 (95% CI 66.050 to 73.217), StDev = 9.597, Median = 67.500, Min = 54.000, Max = 93.000, Anderson-Darling A² = 0.38, p-value = .388]
16.24 MegaStat results are shown. At α = .05, the median downtimes (days) do not differ for the two groups, since the p-value (.1735) is greater than .05. The hypotheses are:
H0: M1 = M2 (no difference in number of repair incidents)
H1: M1 ≠ M2 (number of repair incidents differs for the two groups)
Wilcoxon - Mann/Whitney Test
n sum of ranks
12 112.5 New Bumper
9 118.5 Old Bumper
21 231 total
132.00 expected value
13.96 standard deviation
-1.36 z, corrected for ties
.1735 p-value (two-tailed)
The histogram is right-skewed, but the Anderson-Darling p-value (.088) would not lead to rejection of the hypothesis of normality at α = .05. To perform this test, the two samples were pooled. Even so, the sample is rather small for a normality test.
[MINITAB graphical summary for Days: N = 21, Mean = 7.1905 (95% CI 5.2293 to 9.1516), StDev = 4.3084, Median = 7.0000, Min = 1.0000, Max = 18.0000, Anderson-Darling A² = 0.63, p-value = .088]
16.25 MegaStat results are shown. At α = .01, the median square footage does differ for the two groups, since the p-value (.0022) is less than .01. The hypotheses are:
H0: M1 = M2 (no difference in square footage)
H1: M1 ≠ M2 (square footage differs for the two groups)
Wilcoxon - Mann/Whitney Test
n sum of ranks
11 79.5 Grosse Hills
11 173.5 Haut Nez Estates
22 253 total
126.50 expected value
15.21 standard deviation
-3.06 z, corrected for ties
.0022 p-value (two-tailed)
The histogram is bell-shaped, and normality may be assumed at any common α based on the Anderson-Darling p-value (.359). To perform this test, the two samples were pooled.
[MINITAB graphical summary for Sq Ft: N = 22, Mean = 3559.1 (95% CI 3343.1 to 3775.0), StDev = 487.0, Median = 3450.0, Min = 2800.0, Max = 4850.0, Anderson-Darling A² = 0.39, p-value = .359]
16.26 MegaStat results are shown. At α = .01, the median GPA does not differ among the classes, since the p-value (.1791) is greater than .01.
H0: All c population medians are the same
H1: Not all the population medians are the same
Kruskal-Wallis Test
Median n Avg. Rank
2.19 5 7.40
2.96 7 11.93
3.26 7 16.21
3.10 6 15.17
3.01 25 Total
4.902 H (corrected for ties)
3 d.f.
.1791 p-value
multiple comparison values for avg. ranks
10.98 13.09
The histogram is rather bimodal, but normality cannot be rejected at any common α based on the Anderson-Darling p-value (.277). To perform this test, the four samples were pooled.
[MINITAB graphical summary for GPA: N = 25, Mean = 2.9680 (95% CI 2.7419 to 3.1941), StDev = 0.5477, Median = 3.0100, Min = 1.9100, Max = 3.8900, Anderson-Darling A² = 0.43, p-value = .277]
16.27 MegaStat results are shown. At α = .01, the median crash damage does not differ among the cars, since the p-value (.4819) is greater than .01. Sample sizes are too small for a reasonable test for normality, even if the samples are combined.
H0: All c population medians are the same
H1: Not all the population medians are the same
Kruskal-Wallis Test
Median n Avg. Rank
1,220.00 5 6.40
1,390.00 5 7.80
1,830.00 5 9.80
1,390.00 15 Total
1.460 H (corrected for ties)
2 d.f.
.4819 p-value
multiple comparison values for avg. ranks
6.77 8.30
16.28 MegaStat results are shown. At α = .05, the median waiting time does not differ among the hospitals, since the p-value (.1775) is greater than .05.
H0: All c population medians are the same
H1: Not all the population medians are the same
Kruskal-Wallis Test
Median n Avg. Rank
11.00 5 11.20 Hospital A
18.00 7 15.43 Hospital B
11.50 6 10.33 Hospital C
7.00 4 6.75 Hospital D
13.50 22 Total
4.923 H (corrected for ties)
3 d.f.
.1775 p-value
multiple comparison values for avg. ranks
10.33 12.31
The histogram looks rather right-skewed, but normality may be assumed at any common α based on the Anderson-Darling p-value (.561). To perform this test, the four samples were pooled.
[MINITAB graphical summary for Minutes: N = 22, Mean = 15.045 (95% CI 11.062 to 19.029), StDev = 8.984, Median = 13.500, Min = 0.000, Max = 36.000, Anderson-Darling A² = 0.30, p-value = .561]
16.29 MegaStat results are shown. At α = .05, the median output (watts) does differ among the three types of cells, since the p-value (.0104) is smaller than .05.
H0: All c population medians are the same
H1: Not all the population medians are the same
Kruskal-Wallis Test
Median n Avg. Rank
123.50 6 7.75
122.00 6 6.00
128.00 6 14.75
125.00 18 Total
9.139 H (corrected for ties)
2 d.f.
.0104 p-value
multiple comparison values for avg. ranks
7.38 9.05
The histogram appears right-skewed, but normality may be assumed at any common α based on the Anderson-Darling p-value (.524). To perform this test, the three samples were pooled.
[MINITAB graphical summary for Watts: N = 18, Mean = 124.89 (95% CI 123.44 to 126.34), StDev = 2.91, Median = 125.00, Min = 121.00, Max = 131.00, Anderson-Darling A² = 0.31, p-value = .524]
16.30 MegaStat results are shown. At α = .01, the median stopping distance does not differ among the types of braking method, since the p-value (.2636) is greater than .01. A normality test is not practical with only nine data points.
H0: All c populations have the same median (braking method not related to stopping distance).
H1: Not all the populations have the same median (braking method related to stopping distance).
Friedman Test
Sum of Ranks Avg. Rank
6.00 2.00 Pumping
4.00 1.33
8.00 2.67
18.00 2.00 Total
3 n
2.667 chi-square (corrected for ties)
2 d.f.
.2636 p-value
multiple comparison values for avg. ranks
1.95 2.40
16.31 MegaStat results are shown. At α = .01, the median waiting time does not differ by time of day, since the p-value (.5964) is greater than .01.
H0: All c populations have the same median (waiting time not related to time of day).
H1: Not all the populations have the same median (waiting time related to time of day).
Friedman Test
Sum of Ranks Avg. Rank
76.00 2.92
87.50 3.37
78.00 3.00
79.50 3.06
69.00 2.65
390.00 3.00 Total
26 n
2.773 chi-square (corrected for ties)
4 d.f.
.5964 p-value
multiple comparison values for avg. ranks
1.23 1.44
All five histograms are skewed. Only one histogram (Thursday) might be considered normal, according
to the Anderson-Darling test (individual tests not shown). Combining the five days (shown below) we
reject normality (p < .005). There are high outliers. A non-parametric test is desirable.
[MINITAB graphical summary for Waiting Time: N = 130, Mean = 51.485 (95% CI 46.670 to 56.299), StDev = 27.745, Median = 42.000, Min = 0.000, Max = 152.000, Anderson-Darling A² = 5.13, p-value < .005]
16.32 At α = .05, there is a significant rank correlation between gestation and longevity, since the Spearman coefficient of rank correlation exceeds the critical value given in the MegaStat output (.769 > .423).
H0: ρs ≤ 0 (no positive relationship between gestation and longevity)
H1: ρs > 0 (there is a positive relationship between gestation and longevity)
Spearman Coefficient of Rank Correlation
Gestation Longevity
Gestation 1.000
Longevity .769 1.000
22 sample size
.423 critical value .05 (two-tail)
.537 critical value .01 (two-tail)
Responses to analyzing the question on human beings will vary (you could do a regression and show that the prediction for humans does not fit the regression, since we have gestation of about 270 days and live, say, 75 years on average). The histograms are right-skewed. Using α = .05, based on the Anderson-Darling test, normality is rejected for gestation (p = .012) but not for longevity (p = .068). It should be noted that neither sample would pass the normality test at α = .10, so there is some reason to doubt normality. Thus, the nonparametric Spearman test is attractive.
[MINITAB graphical summary for Gestation: N = 22, Mean = 194.82 (95% CI 116.69 to 272.94), StDev = 176.20, Median = 129.50, Min = 13.00, Max = 660.00, Anderson-Darling A² = 0.96, p-value = .012]
[MINITAB graphical summary for Longevity: N = 22, Mean = 11.682 (95% CI 8.296 to 15.068), StDev = 7.637, Median = 12.000, Min = 1.000, Max = 35.000, Anderson-Darling A² = 0.67, p-value = .068]
16.33 At α = .05, there is a significant rank correlation between fertility in 1990 and fertility in 2000, since the Spearman coefficient of rank correlation exceeds the critical value given in the MegaStat output (.812 > .514).
H0: ρs ≤ 0 (no positive relationship exists)
H1: ρs > 0 (there is a positive relationship)
Spearman Coefficient of Rank Correlation
1990 2000
1990 1.000
2000 .812 1.000
15 sample size
.514 critical value .05 (two-tail)
.641 critical value .01 (two-tail)
The histogram is bell-shaped, and normality may be assumed at any common α based on the Anderson-Darling p-value (.437). To perform this test, the two samples were pooled.
[MINITAB graphical summary for Fertility: N = 30, Mean = 1.5700 (95% CI 1.4811 to 1.6589), StDev = 0.2380, Median = 1.6000, Min = 1.1000, Max = 2.1000, Anderson-Darling A² = 0.36, p-value = .437]
16.34 At α = .01, there is not a significant rank correlation between calories and sodium, since the Spearman coefficient of rank correlation is less than the critical value given in the MegaStat output (.229 < .623). Samples are too small for a reliable test for normality. However, there is one severe outlier in the calories (possibly a data recording error), and the sodium histogram is somewhat right-skewed. All in all, the nonparametric test seems like a good idea.
H0: ρs ≤ 0 (no positive relationship between calories and sodium)
H1: ρs > 0 (there is a positive relationship between calories and sodium)
Spearman Coefficient of Rank Correlation
Fat (g) Calories Sodium (mg)
Fat (g) 1.000
Calories .680 1.000
Sodium (mg) .559 .229 1.000
16 sample size
.497 critical value .05 (two-tail)
.623 critical value .01 (two-tail)
16.35 At α = .05, there is a significant rank correlation between colon cancer rate and per capita meat consumption, since the Spearman coefficient of rank correlation exceeds the critical value given in the MegaStat output (.813 > .413).
H0: ρs ≤ 0 (no positive relationship)
H1: ρs > 0 (there is a positive relationship)
Spearman Coefficient of Rank Correlation
Colon Cancer Rate Per Capita Meat
Colon Cancer Rate 1.000
Per Capita Meat .813 1.000
23 sample size
.413 critical value .05 (two-tail)
.526 critical value .01 (two-tail)
The colon cancer histogram is right-skewed, and its Anderson-Darling p-value (.027) suggests non-normality at α = .05. However, the meat consumption histogram appears normal, and the Anderson-Darling p-value (.621) confirms this.
[MINITAB graphical summary for Colon Cancer Rate: N = 23, Mean = 14.474 (95% CI 10.141 to 18.806), StDev = 10.019, Median = 12.500, Min = 1.100, Max = 41.800, Anderson-Darling A² = 0.83, p-value = .027]
[MINITAB graphical summary for Per Capita Meat: N = 23, Mean = 138.30 (95% CI 107.19 to 169.42), StDev = 71.96, Median = 134.00, Min = 19.00, Max = 313.00, Anderson-Darling A² = 0.28, p-value = .621]
16.36 At α = .05, there is a significant rank correlation between gas prices and carbon emissions, since the Spearman coefficient of rank correlation exceeds the critical value in absolute value, as given in the MegaStat output (|-.588| > .355).
H0: ρs = 0 (no relationship)
H1: ρs ≠ 0 (there is a relationship)
Spearman Coefficient of Rank Correlation
Gas Price ($/L) CO2/GDP (kg/$)
Gas Price ($/L) 1.000
CO2/GDP (kg/$) -.588 1.000
31 sample size
.355 critical value .05 (two-tail)
.456 critical value .01 (two-tail)
The gas price histogram appears left-skewed, but its Anderson-Darling p-value (.169) suggests normality at α = .05. However, the CO2 histogram is strongly right-skewed and non-normal (p < .005).
[MINITAB graphical summary for Gas Price ($/L): N = 31, Mean = 0.83648 (95% CI 0.74983 to 0.92314), StDev = 0.23623, Median = 0.87900, Min = 0.38100, Max = 1.19800, Anderson-Darling A² = 0.52, p-value = .169]
[MINITAB graphical summary for CO2/GDP (kg/$): N = 31, Mean = 0.68581 (95% CI 0.47288 to 0.89873), StDev = 0.58049, Median = 0.41000, Min = 0.13000, Max = 2.08000, Anderson-Darling A² = 2.55, p-value < .005]
16.37 At α = .05, there is a significant rank correlation between this week's points and last week's points, since the Spearman coefficient of rank correlation exceeds the critical value given in the MegaStat output (.812 > .444). We are not surprised, since team rankings usually do not change much from week to week. There is no reason to expect normality since ratings do not tend toward a common mean.
H0: ρs ≤ 0 (no positive relationship)
H1: ρs > 0 (there is a positive relationship)
Spearman Coefficient of Rank Correlation
This Week Last Week
This Week 1.000
Last Week .812 1.000
20 sample size
.444 critical value .05 (two-tail)
.561 critical value .01 (two-tail)
Chapter 17
Quality Management
17.1 a. See text, p 732.
b. See text, p 731.
c. See text, pp. 732-733.
17.2 Common cause variation is normal and expected. Special cause variation is abnormal.
17.3 Zero variation is an asymptote of aspiration, not achievable in human endeavors.
17.4 Answers will vary. Students may see themselves as internal customers of higher education and employers as external customers, or may refer to their place of employment or to organizations they patronize, such as Starbucks, or music that they like.
17.5 Answers will vary. Use Likert scales for service attributes.
a. Cleanliness of vehicle, full gas tank, waiting time for sales help.
b. Length of queues, friendliness of staff (Likert), interest paid on accounts.
c. Price, seat comfort, picture quality (Likert scale for all).
17.6 Examples of barriers include employee fear, inadequate equipment, inadequate equipment maintenance,
insufficient employee training, flawed process design, unclear task definitions, poor supervision, lack of
support for employees.
17.7 Deming felt that most workers want to do a good job, but are often hampered by the work environment,
management policies, and fear of reprisal if they report problems.
17.8 Students may name Deming, Shewhart, Ishikawa, Taguchi, and others they've heard of.
17.9 Deming's 14 points (abbreviated) are on p. 736. See www.deming.org for a more complete list.
17.10 Techniques of SPC (statistical process control) are a specific subset of the tools of TQM (total quality management) and CQI (continuous quality improvement).
17.11 Define parameters, set targets, monitor until stable, check capability, look for sources of variation or
nonconformance, make changes, repeat steps.
17.12 Attribute control charts are for nominal data (e.g., proportion conforming) while variable control charts are
for ratio or interval data (e.g., means).
17.13 a. Sampling frequency depends on cost and physical possibility of sampling.
b. For normal data, small samples may suffice for a mean (Central Limit Theorem).
c. Large samples may be needed for a proportion to get sufficient precision.
17.14 We can estimate σ using the sample standard deviation (s), or using R̄/d2, where R̄ is the average range and d2 is a control chart factor from Table 17.4, or using the average of the sample standard deviations of many samples (s̄). If the process standard deviation σ is known, we do not need to estimate it. But a little thought will show that σ can only be known from one of the preceding methods.
17.15 This is the Empirical Rule (see Chapters 4 and 7):
a. Within ±1 standard deviation: 68.26 percent of the time
b. Within ±2 standard deviations: 95.44 percent of the time
c. Within ±3 standard deviations: 99.73 percent of the time
17.16 Students may need to be reminded that sigma here refers to the standard error of the mean, σ/√n. (A small code sketch for checking Rules 1 and 4 follows the list.)
Rule 1. Single point outside 3 sigma
Rule 2. Two of three successive points outside 2 sigma on same side of centerline
Rule 3. Four of five successive points outside 1 sigma on same side of centerline
Rule 4. Nine successive points on same side of centerline
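A minimal sketch of automated checks for Rules 1 and 4 (the helper names are my own; means is a list of sample means, mu the centerline, and se the standard error σ/√n):

def rule1_violations(means, mu, se):
    # Rule 1: any single point outside the 3-sigma control limits.
    return [i for i, m in enumerate(means) if abs(m - mu) > 3 * se]

def rule4_violation(means, mu):
    # Rule 4: nine successive points on the same side of the centerline.
    run, side = 0, None
    for m in means:
        s = m > mu
        run = run + 1 if s == side else 1
        side = s
        if run >= 9:
            return True
    return False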
17.17 UCL = x̄ + 3(R̄/d2)/√n = 12.5 + 3(0.42/2.326)/√5 = 12.742
LCL = x̄ - 3(R̄/d2)/√n = 12.5 - 3(0.42/2.326)/√5 = 12.258
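These limits are easy to script; a sketch (my own function, with d2 supplied from the control chart factor table, e.g., 2.326 for n = 5):

import math

def xbar_limits(xbar, rbar, n, d2):
    # 3-sigma x-bar chart limits using sigma-hat = Rbar/d2.
    sigma_hat = rbar / d2                  # estimate of the process sigma
    margin = 3 * sigma_hat / math.sqrt(n)  # three standard errors of the mean
    return xbar - margin, xbar + margin

print(xbar_limits(12.5, 0.42, 5, 2.326))   # 17.17: (12.258, 12.742)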
17.18 UCL = x̄ + 3(R̄/d2)/√n = 400 + 3(5/2.059)/√4 = 403.643
LCL = x̄ - 3(R̄/d2)/√n = 400 - 3(5/2.059)/√4 = 396.357
17.19 x̄ = (72.25 + 74.25 + ... + 82.25)/9 = 76.5 and R̄ = (43 + 31 + ... + 41)/9 = 30
Estimated σ is R̄/d2 = 30/2.059 = 14.572, giving UCL = 98.37 and LCL = 54.63.
17.20 x̄ = (5.52 + 5.51 + ... + 5.51)/8 = 5.50 and R̄ = (0.13 + 0.11 + ... + 0.13)/8 = 0.110
Estimate of μ = 5.50 and estimate of σ = R̄/d2 = (0.110)/(2.326) = 0.0473
17.21 R̄ = 0.82 (centerline)
UCL = D4R̄ = (2.004)(0.82) = 1.64328
LCL = D3R̄ = (0)(0.82) = 0
17.22 R̄ = 12 (centerline for R chart)
UCL = D4R̄ = (2.574)(12) = 30.888 (upper control limit)
LCL = D3R̄ = (0)(12) = 0 (lower control limit)
17.23 By either criterion, the process is within the acceptable standard (Cp = 1.67, Cpk = 1.67).
Cp index: Cp = (USL - LSL)/(6σ) = (725 - 715)/(6(1)) = 1.667
Cpk index: zUSL = (USL - μ)/σ = (725 - 720)/1 = 5.00 and zLSL = (μ - LSL)/σ = (720 - 715)/1 = 5.00
zmin = min(zUSL, zLSL) = min{5.00, 5.00} = 5.00, so Cpk = zmin/3 = 5.00/3 = 1.667
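A sketch of the capability calculations as a reusable function (the function name is my own, not from the text):

def capability(mu, sigma, lsl, usl):
    # Return (Cp, Cpk) for a process with mean mu and standard deviation sigma.
    cp = (usl - lsl) / (6 * sigma)   # potential capability
    z_usl = (usl - mu) / sigma       # sigmas from mean to upper spec
    z_lsl = (mu - lsl) / sigma       # sigmas from mean to lower spec
    cpk = min(z_usl, z_lsl) / 3      # actual capability; penalizes an off-center mean
    return cp, cpk

print(capability(720, 1, 715, 725))  # 17.23: Cp = Cpk = 1.667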
17.24 If the minimum capability index is 1.33, this process meets the Cp criterion but fails on the Cpk criterion.
Cp index: Cp = (USL - LSL)/(6σ) = (0.432 - 0.423)/(6(0.001)) = 1.50
Cpk index: zUSL = (0.432 - 0.426)/0.001 = 6.00 and zLSL = (0.426 - 0.423)/0.001 = 3.00
zmin = min{6.00, 3.00} = 3.00, so Cpk = 3.00/3 = 1.00
17.25 If the minimum capability index is 1.33, the process fails on both criteria, especially Cpk due to bad centering (Cp = 1.17, Cpk = 0.67).
Cp index: Cp = (USL - LSL)/(6σ) = (55.9 - 55.2)/(6(0.1)) = 1.1667
Cpk index: zUSL = (55.9 - 55.4)/0.1 = 5.00 and zLSL = (55.4 - 55.2)/0.1 = 2.00
zmin = min{5.00, 2.00} = 2.00, so Cpk = 2.00/3 = 0.667
17.26 Yes, it is OK to assume normality since nπ = (500)(.02) = 10.
UCL = π + 3√(π(1 - π)/n) = .02 + 3√((.02)(.98)/500) = .0388
LCL = π - 3√(π(1 - π)/n) = .02 - 3√((.02)(.98)/500) = .0012
17.27 Yes, it is safe to assume normality since nπ = (20)(.50) = 10.
UCL = π + 3√(π(1 - π)/n) = .50 + 3√((.5)(.5)/20) = .8354
LCL = π - 3√(π(1 - π)/n) = .50 - 3√((.5)(.5)/20) = .1646
17.28 Since n(1 - π) = 40(.10) = 4 is not greater than 10, we can't assume normality.
UCL = π + 3√(π(1 - π)/n) = .90 + 3√((.90)(.10)/40) = 1.042302 (use 1.000 since UCL cannot exceed 1)
LCL = π - 3√(π(1 - π)/n) = .90 - 3√((.90)(.10)/40) = .757698
17.29 Services are often assessed using percent conforming or acceptable quality, so we use p charts.
17.30 The charts and their purposes are:
a. The x̄ chart monitors a process mean for samples of n items. It requires estimates of μ and σ (or R̄ and the control chart factor d2).
b. The R chart monitors variation around the mean for samples of n items. It requires an estimate of R̄ (or σ) and the control chart factor D4.
c. The p chart monitors the proportion of conforming items in samples of n items. It requires an estimate of π.
d. The I chart monitors individual items when inspection is continuous (n = 1). It requires estimates of μ and σ.
17.31 Answers will vary. For example:
a. GPA, number of classes re-taken, faculty recommendation letters (Likert).
b. Knowledge of material, enthusiasm, organization, fairness (Likert scales for all).
c. Number of bounced checks, size of monthly bank balance errors, unpaid VISA balance.
d. Number of print errors, clarity of graphs, useful case studies (Likert scales for last two).
17.32 Answers will vary. For example:
a. Percent of time out of range, frequency of poor reception, perceived ease of use of menus.
b. Percent of time server is unavailable, frequency of spam or pop-up ads.
c. Customer wait in queue to pick up or drop off, rating of garment cleanliness, rating of staff courtesy.
d. Waiting time in office, staff courtesy, percent of cost covered by insurance.
e. Waiting time for service, perceived quality of haircut, rating of friendliness of haircutter.
f. Waiting time for service, perceived quality of food, rating of staff courtesy.
17.33 Answers will vary. For example:
a. MPG, repair cost.
b. Frequency of jams, ink cost.
c. Frequency of re-flushes, water consumption.
d. Battery life, ease of use (Likert scale).
e. Cost, useful life, image sharpness (Likert scale).
f. Cost, useful life, watts per lumen.
17.34 a. Sampling (not cost effective to test every engine).
b. 100% inspection (airlines record fuel usage and passenger load on every flight).
c. 100% inspection (McDonald's computers would have this information for each day).
d. Sampling (you cant test the life of every battery).
e. Sampling (cost might prohibit hospitals from recording this in normal bookkeeping).
17.35 x̄ is normally distributed from the Central Limit Theorem for sufficiently large values of n (i.e., symmetric distribution). However, the range and standard deviation do not follow a normal distribution (e.g., the standard deviation has a chi distribution).
17.36 Answers will vary. It is because x̄ is normally distributed from the Central Limit Theorem for sufficiently large values of n. However, some processes may not be normal, and subgroups typically are too small for the CLT to apply unless the data are at least symmetric (see Chapter 8). For small n, normality would exist if the underlying process generates normally distributed data, a reasonable assumption for many, but not all, processes (especially in manufacturing). If non-normal, special techniques are required (beyond the scope of an introductory class in statistics).
17.37 a. Variation and chance defects are inevitable in all human endeavors.
b. Some processes have very few defects (maybe zero in the short run, but not in the long run).
c. Quarterbacks cannot complete all their passes, etc.
17.38 Answers will vary, depending on how diligent a web search is conducted.
17.39 Answers will vary (e.g., forgot to set clock, clock set incorrectly, couldn't find backpack, stopped to charge cell phone, had to shovel snow in driveway, alarm didn't go off, traffic, car won't start, can't find parking).
17.40 Answers will vary (addition or subtraction error, forgot to record a deposit or withdrawal, recorded data incorrectly, e.g., $54.65 instead of $56.54, missing check number, lost debit card receipt).
17.41 Answers will vary (e.g., weather, union slowdown, pilot arrived late, crew change required, de-icing planes in winter, traffic congestion at takeoff, no arrival gate available).
17.42 a. If μ = 1.00 mils and σ = 0.07 mils, and if the minimum capability index is 1.33, this process is well below capability standards (Cp = Cpk = 0.95).
Cp index: Cp = (USL - LSL)/(6σ) = (1.20 - 0.80)/(6(0.07)) = 0.952
Cpk index: zUSL = (1.20 - 1.00)/0.07 = 2.857 and zLSL = (1.00 - 0.80)/0.07 = 2.857
zmin = min{2.857, 2.857} = 2.857, so Cpk = 2.857/3 = 0.952
b. If μ = 1.00 mils and σ = 0.05 mils, and if the minimum capability index is 1.33, this process meets capability standards (Cp = Cpk = 1.33).
Cp index: Cp = (1.20 - 0.80)/(6(0.05)) = 1.33
Cpk index: zUSL = (1.20 - 1.00)/0.05 = 4.00 and zLSL = (1.00 - 0.80)/0.05 = 4.00
zmin = min{4.00, 4.00} = 4.00, so Cpk = 4.00/3 = 1.333
c. The point is to show that a reduction in the process standard deviation can improve the capability index.
17.43 a. If μ = 1.00 mils and σ = 0.05 mils, and if the minimum capability index is 1.33, this process meets capability standards (Cp = Cpk = 1.33).
Cp index: Cp = (1.20 - 0.80)/(6(0.05)) = 1.33
Cpk index: zUSL = (1.20 - 1.00)/0.05 = 4.00 and zLSL = (1.00 - 0.80)/0.05 = 4.00
zmin = min{4.00, 4.00} = 4.00, so Cpk = 4.00/3 = 1.333
b. If μ = 0.90 mils and σ = 0.05 mils, and if the minimum capability index is 1.33, this process meets the Cp criterion but fails on Cpk (Cp = 1.33, Cpk = 0.67).
Cp index: Cp = (1.20 - 0.80)/(6(0.05)) = 1.33
Cpk index: zUSL = (1.20 - 0.90)/0.05 = 6.00 and zLSL = (0.90 - 0.80)/0.05 = 2.00
zmin = min{6.00, 2.00} = 2.00, so Cpk = 2.00/3 = 0.667
c. This example shows why we need more than just the Cp index. A change in the process mean can reduce
the Cpk index, even though the Cp index is unaffected.
17.44 a. If μ = 140 mg and σ = 5 mg, and if the minimum capability index is 1.33, this process meets capability standards (Cp = Cpk = 1.33).
Cp index: Cp = (160 - 120)/(6(5)) = 1.33
Cpk index: zUSL = (160 - 140)/5 = 4.00 and zLSL = (140 - 120)/5 = 4.00
zmin = min{4.00, 4.00} = 4.00, so Cpk = 4.00/3 = 1.333
b. If μ = 140 mg and σ = 3 mg, and if the minimum capability index is 1.33, this process exceeds capability standards (Cp = Cpk = 2.22).
Cp index: Cp = (160 - 120)/(6(3)) = 2.22
Cpk index: zUSL = (160 - 140)/3 = 6.67 and zLSL = (140 - 120)/3 = 6.67
zmin = min{6.67, 6.67} = 6.67, so Cpk = 6.67/3 = 2.22
c. The point is to show that a reduction in the process standard deviation can improve the capability of a
process that already meets the requirement.
17.45 a. UCL = μ + 3σ/√n = 6050 + 3(100/√3) = 6223.205 and LCL = μ - 3σ/√n = 6050 - 3(100/√3) = 5876.795
b. Chart violates no rules.
c. Process is in control.
17.46 a. Histogram is bell-shaped and probability plot is linear with one possible low outlier (the Anderson-Darling statistic has p-value = .296).
[MINITAB normal probability plot of Pounds: Mean = 6073, StDev = 85.93, N = 24, AD = 0.422, p-value = .296]
b. Yes, it approximates the normal distribution.
c. Sample mean is 6072.625 and the sample standard deviation is 85.92505. They are both close to the
process values.
17.47 a. UCL = μ + 3σ/√n = 1.00 + 3(.07/√5) = 1.0939 and LCL = μ - 3σ/√n = 1.00 - 3(.07/√5) = .9061
b. Chart violates no rules.
c. Process is in control.
17.48 a. Histogram is bell-shaped and probability plot is linear with one possible high outlier (the Anderson-Darling statistic has p-value = .656).
[MINITAB normal probability plot of Mils: Mean = 1.006, StDev = 0.06547, N = 35, AD = 0.270, p-value = .656]
b. The distribution is approximately normal.
c. Sample mean is 1.006 and the sample standard deviation is 0.0655, both close to the process values.
17.49 a. Cp = 1.00 and Cpk = 0.83.
Cp index: Cp = (30 - 18)/(6(2)) = 1.00
Cpk index: zUSL = (30 - 23)/2 = 3.50 and zLSL = (23 - 18)/2 = 2.50
zmin = min{3.50, 2.50} = 2.50, so Cpk = 2.50/3 = 0.833
b. If the minimum capability index is 1.33, this process is well below capability standards.
c. The frequency of the door being opened; the door not being closed tightly.
17.50 a. UCL = 23.00 + 3(2/√4) = 26.00 and LCL = 23.00 - 3(2/√4) = 20.00
b. Control chart suggests a downward trend but does not violate Rule 4.
c. The sixth mean hits the UCL, so possibly not in control.
17.51 a. The sample mean of 23.025 and the standard deviation of 2.006 are very close to the process values (μ = 23, σ = 2).
b. The histogram is symmetric, though perhaps platykurtic. The probability plot is linear, but the Anderson-Darling test statistic has a p-value below .005, so the data fail the normality test.
[MINITAB normal probability plot of Temperature: Mean = 23.03, StDev = 2.006, N = 80, AD = 1.348, p-value < .005]
17.52 a. Cp = 2.00 and Cpk = 2.00. If the minimum capability index is 1.33, this process is capable.
Cp index: Cp = (14.6 - 13.4)/(6(.10)) = 2.00
Cpk index: zUSL = (14.60 - 14.00)/.10 = 6.00 and zLSL = (14.00 - 13.40)/.10 = 6.00
zmin = min{6.00, 6.00} = 6.00, so Cpk = 6.00/3 = 2.00
b. Since the process is capable, there is no reason to change unless the customers can see the variation.
17.53 a. Histogram is arguably normal, though somewhat bimodal. However, the probability plot is linear and the Anderson-Darling test's p-value of .795 indicates a good fit to a normal distribution.
[MINITAB normal probability plot of Hours: Mean = 8785, StDev = 216.1, N = 20, AD = 0.224, p-value = .795]
b. Center line = 8760, UCL = 8760 + 3(200/√5) = 9028.33, and LCL = 8760 - 3(200/√5) = 8491.67
c. Center line = 8784.75, UCL = 8784.75 + 3(216.1398/√5) = 9074.73, and LCL = 8784.75 - 3(216.1398/√5) = 8494.77
d. The UCL and LCL from the sample differ substantially from those based on the assumed process
parameters. This small sample is perhaps not a reliable basis for setting the UCL and LCL.
17.54 a. Cp = 2.00 and Cpk = 1.83.
Cp index: Cp = (477 - 453)/(6(2)) = 2.00
Cpk index: zUSL = (477 - 466)/2 = 5.50 and zLSL = (466 - 453)/2 = 6.50
zmin = min{5.50, 6.50} = 5.50, so Cpk = 5.50/3 = 1.83
b. If the minimum capability index is 1.33, this process is capable.
c. Need to pack the box without crushing the Chex. Given the small size and fragility of each Chex, this would be difficult to attain.
17.55 a. UCL = 465 + 3(3/√3) = 470.20 and LCL = 465 - 3(3/√3) = 459.80
b. No rules are violated. The process is in control.
17.56 a. The histogram and probability plot do not appear grossly non-normal, but the p-value (.042) for the Anderson-Darling test suggests that the box fill may not be normal (our conclusion depends on α).
[MINITAB normal probability plot of Grams: Mean = 464.2, StDev = 2.797, N = 30, AD = 0.761, p-value = .042]
b. Sample mean is 464.2 which is very close to 465.
17.57 a. From MegaStat: UCL = 12.22095, LCL = 11.75569, centerline = 11.98832
b. Process appears to be in control.
c. The histogram and probability plot (Anderson-Darling p-value = .871) suggest a normal distribution.
[MINITAB normal probability plot of Weight in Grams: Mean = 11.99, StDev = 0.2208, N = 84, AD = 0.204, p-value = .871]
17.58 a. UCL = π + 3√(π(1 - π)/n) = .06 + 3√((.06)(.94)/200) = .11038
LCL = π - 3√(π(1 - π)/n) = .06 - 3√((.06)(.94)/200) = .0096
b. Yes, 200(.06) = 12 and 200(.94) = 188 both are greater than 10.
17.59 a. UCL = π + 3√(π(1 - π)/n) = .05 + 3√((.05)(.95)/100) = .1154
LCL = π - 3√(π(1 - π)/n) = .05 - 3√((.05)(.95)/100) = -.0154 (set to .0000 since the LCL can't be negative)
b. Sample 7 hits the LCL, so the process may not be in control.
c. Samples are too small to assume normality (nπ = (100)(.05) = 5, which is less than 10). It is better to use MINITAB's binomial option to set the control limits.
17.60 Chart A: Trend
Chart B: Oscillation
Chart C: Level Shift
Chart D: Instability
Chart E: None
Chart F: Cyclical
17.61 Chart A: Rule 4.
Chart B: No rules violated.
Chart C: Rule 4.
Chart D: Rules 1, 4.
Chart E: No rules violated.
Chart F: Rules 1, 2.
17.62 Each pattern is clearly evident, except possibly instability in the third series. See charts below.
Yes, upward trend present. Yes, downward trend present.
Yes, instability (Rule 1). Yes, cyclical (only 6 centerline crossings).
Maybe oscillation (14 centerline crossings).
17.63 Each pattern is clearly evident, except possibly instability in the third series.
[Control charts omitted; panel titles: In Control?, Up Trend?, Down Trend?, Unstable?, Cycle?]
Yes, in control (no rules violated). Yes, upward trend, but no rule violations.
Yes, downward trend, but no rule violations. Yes, unstable (Rule 1).
Yes, cycle (only 7 centerline crossings).
17.64 Answers will vary; sample points are presented.
a. p-chart since these are attributes (e.g., percent of patients who received aspirin).
b. Randomness cant be overcome completely, each doctor/nurse interprets standards differently. Also,
there may be cost constraints or issues of time in emergency situations.
c. Reduction in variation raises costs: training, monitoring, and evaluating.
17.65 a. Type I error: disease is not present, but remove meat anyway (sample result shows process out of control
when it is not). Type II error: disease is present, but fail to remove the meat (sample result shows
process in control when in fact it is not).
b. 27,400,000/4,000 = 6,850 two-ton truckloads. Need to know the cost of trucks, drivers, and disposal
fees. But where to put it?
c. NIMBY (not in my backyard).
17.66 a. Type I error: disease is not present, but remove meat anyway (sample result shows process out of control
when it is not). Type II error: disease is present, but fail to remove the meat (sample result shows
process in control when in fact it is not).
b. Public cannot see the salmonella, so they have a collective interest in hiring an agent on their behalf to
inspect the meat (e.g., government inspectors, who presumably are unbiased and have the safety of the
public in mind).
17.67 Type I error: cereal actually was safe for human consumption, but was discarded. Type II: cereal was
unsafe for human consumption, but was sold. The former is a cost borne by the company (lost profit)
and perhaps by consumers (higher prices), while the latter is a cost borne both by the public (possible
health hazard) and by the company (possible litigation). Although both parties bear some cost, the costs
are different in nature and may differ in magnitude as well. Consumers cannot see or detect pesticide, so
government inspection is needed to protect the public interest and to apply the laws.
17.68 Answers will vary. It may be hard to define a defective M&M. Since sampling is easy (and tasty), it should be possible to inspect enough M&Ms to get a reliable confidence interval for π.
17.69 Answers will vary. Observers may disagree as to what constitutes a defect. Presumably, a newer car has
fewer paint defects, though it does depend on usage.
17.70 Answers will vary. It may be hard to define a broken Cheerio. Randomness should be attainable.