
Practical Data Analysis with JMP

Instructor Solutions
Robert Carver

This set of Instructor Solutions is a companion piece to the following SAS Press book: Carver, Robert. Practical Data Analysis with JMP. Copyright 2010, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. Practical Data Analysis with JMP Copyright 2010, SAS Institute Inc., Cary, NC, USA ISBN 978-1-60764-475-0 ISBN 978-1-60764-487-3 (electronic book) All rights reserved. Produced in the United States of America. For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987). SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513. 1st printing, July 2010 SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hardcopy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.


Instructor Solutions
Chapter 2

Scenario 1 Student answers will vary. Answers will depend on the data set the student selects to input into a new JMP data table.

Scenario 2
a. Quantity of cement (component 1), expressed as kg in a m^3 mixture.
b. Quantity of Superplasticizer (component 5), expressed as kg in a m^3 mixture.
c. Quantity of Fine Aggregate (component 7), expressed as kg in a m^3 mixture.

Scenario 3 Columns that need to be corrected: DMDMARTL, RIDEXPRG, BPQ150A.

Scenario 4 NHANES does not contain experimental data because the researchers are not manipulating any of the variables. The data were obtained through observation, not through a designed experiment.

Scenario 5 Student answers will vary. The Excel sheet should be imported into JMP.

Scenario 6 This data table contains monthly stock values and volume from the FTSE 100 index, from 1 January 2003 through 1 December 2007. Data were collected by observation on the first day of each month. The date column is ordinal because it is a chronological variable. Open, High, Low, Close, Volume, and change% are all continuous columns containing numeric measurements. Open is the FTSE 100 index's opening price. High is the high price for the day. Low is the low price for that day. Close is the closing price for that day. Volume is the number of shares exchanged during the day. change% is how much the index changed from open to close.


Scenario 7 This data table contains significant statistics from earthquakes recorded worldwide between August 20, 2009 and September 19, 2009. Data was collected by observation on the first day of each month. The date column is ordinal because it is a chronological variable. Latitude is a continuous variable giving the latitudinal coordinate of where the earthquake took place. Longitude is also a continuous variable, giving the longitudinal coordinate of where the earthquake took place. Magnitude is a continuous measurement of how strong the earthquake was, while depth is a continuous variable describing how far from the surface the epicenter was. Time is an ordinal column describing when the earthquake took place. These data were obtained by observation.

Scenario 8 This table contains observational data from the WHO regarding tobacco use, cardiovascular disease, and cancer rates. Code is a nominal variable uniquely identifying each nation. Country is a nominal variable that gives the name of the country to which the data relate. Region is also a nominal variable, indicating the region where the country is located. TobaccoUse is a continuous, observed variable describing the prevalence of tobacco use in that country. Female and Male are both continuous, observed variables that describe the prevalence of tobacco use for each gender. CVmort is the mortality rate from cardiovascular disease for the country and CancerMort is the cancer mortality rate for the country. Both are continuous.

Scenario 9 The sampling weight is equal across the stratified sample at approximately 1.6, reflecting the fact that there are approximately equal numbers of countries within each of the four quartiles.


Instructor Solutions
Chapter 3
Scenario 1
a. Using the grabber tool, click and drag upwards to increase the number of bars in the histogram. A second peak near 80 appears as the number of bars increases, while the peak at 75 remains.
b. To make the peaks more distinct, increase the spread on the axis. This effectively zooms in on the two peaks.
Distributions: LifeExp

Quantiles
100.0%  maximum   83.529
 99.5%            83.529
 97.5%            81.9905
 90.0%            80.15
 75.0%  quartile  76.966
 50.0%  median    72.902
 25.0%  quartile  64.622
 10.0%            51.7618
  2.5%            44.7239
  0.5%            40.119
  0.0%  minimum   40.119

Moments
Mean            69.47079
Std Dev         10.417608
Std Err Mean    0.7460204
Upper 95% Mean  70.942142
Lower 95% Mean  67.999438
N               195

[Histogram of LifeExp with the axis zoomed to the range 67 to 85]

c. Scale can be manipulated in order to change the apparent center, shape, and spread of a histogram, so it is important to carefully analyze and think critically about the choice of scale on an axis.
d. Africa is home to the 7 countries with the shortest life expectancies in the world. Social and economic factors like healthcare, education, unemployment, income, and even political stability can account for this.



Scenario 2
a. This histogram has a shape that is skewed to the left, a mean of about 64, and a spread described by a range from 35 to 80. It has one peak.

Distributions: LifeExp

Quantiles
100.0%  maximum   78.3
 99.5%            78.3
 97.5%            76.9976
 90.0%            75.1846
 75.0%  quartile  71.77
 50.0%  median    67.231
 25.0%  quartile  57.251
 10.0%            48.8434
  2.5%            42.6994
  0.5%            39.906
  0.0%  minimum   39.906

Moments
Mean            64.364077
Std Dev         9.8722116
Std Err Mean    0.7069638
Upper 95% Mean  65.758399
Lower 95% Mean  62.969755
N               195

[Histogram of LifeExp, axis from 40 to 80]

b. The five-number summary from 1985 has a minimum of 39.9, a 25% quartile of 57.2, a median of 67.2, a 75% quartile of 71.8, and a maximum of 78.3. The 2010 dataset has a minimum of 40.1, a 25% quartile of 64.6, a median of 72.9, a 75% quartile of 76.9, and a maximum of 83.5. Clearly, every statistic from the five-number summary has increased, indicating that life expectancy has gone up across the entirety of the distribution. However, the lower end of the distribution has increased by less than the higher end of the distribution.
c. The standard deviation is 9.87 in the 1985 data compared to 10.4 in the 2010 data.
d. Similar to the 2010 histogram, the mean is less than the median in 1985, which is indicative of a distribution skewed to the left, as we see in the histogram.
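The comparison in part b can also be laid out side by side. The following is only an illustrative sketch: it copies the quantiles reported in the two JMP Quantiles outputs into Python dictionaries (the names `summary_1985`, `summary_2010`, and `changes` are ours, not JMP's) and reports the change in each statistic.

```python
# Five-number summaries copied from the JMP Quantiles output for LifeExp.
summary_1985 = {"min": 39.906, "q1": 57.251, "median": 67.231, "q3": 71.77, "max": 78.3}
summary_2010 = {"min": 40.119, "q1": 64.622, "median": 72.902, "q3": 76.966, "max": 83.529}


def changes(before, after):
    """Return the change in each summary statistic, rounded to 3 decimals."""
    return {key: round(after[key] - before[key], 3) for key in before}


diffs = changes(summary_1985, summary_2010)
for name, delta in diffs.items():
    # A leading "+" marks an increase from 1985 to 2010.
    print(f"{name:>6}: {delta:+.3f}")
```

Every difference is positive, which matches the observation that the whole distribution shifted upward.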

Scenario 3 a. The points furthest to the left and right indicate the minimum and maximum respectively. In each boxplot, the ends of the box represent the first and third quartiles, and the line within the box represents the median. The diamond shows the location of the mean. We see a handful of outlying points in the LifeSpan boxplot, but not in the TotalSleep plot.


b. The Total Sleep boxplot is more symmetric: the box is approximately in the center of the graph with points evenly spread on either side. In addition, the median and mean diamond coincide, and the median line is centered in the box. Similarly, the mean and median of Total Sleep are very close in value (10.45 and 10.53). In contrast, the LifeSpan boxplot has a long upper whisker with high outliers, but a very short lower whisker. The mean exceeds the median.
c. 99.5% of the species have a life span of less than 100 years.
d. Nondreaming is more symmetrical and has a center of about 8. It ranges from 2 to 17.9 hours. Dreaming is skewed to the right and has a center of 1.8. It ranges from 0 to 6.6.
e. The animals that get the most sleep tend to be relatively small animals and have low predation, exposure, and danger values.
f. In the distribution of body weight, there is a large gap between the two elephants and the great majority of the other mammals, which all weigh significantly less than the two elephants.
g. The animals that sleep in the most exposed locations are also the largest in terms of body weight. This may be because larger animals cannot hide as easily, or because, due to sheer size, they can sleep in exposed locations safely.



Scenario 4
a. Volume has a nearly symmetrical and normal distribution. It ranges from 1043.49 to 2115.33 with a median of 1726.22 and a mean of 1710.49.
b. Change% has a nearly symmetrical shape ranging from -9.466 to 8.654. Its center can be described by the mean of 0.8639 and the median of 1.226.
c. The FTSE declines approximately 25% of the time.
d. Both graphs clearly show the range of the Close variable. The steady growth over time is clear in the line chart, but the histogram loses the information related to time. On the other hand, the two peaks that are so evident in the histogram are invisible in the line graph.
[Chart: line graph of Close (vertical axis, 0 to 8000) by Date (monthly, 01/02/2003 through 12/03/2007)]

e. This line graph shows fluctuation without any obvious pattern. The monthly percentage change seems to vary at random from month to month, typically remaining approximately between -3% and +3%. There is no obvious growth over the five years, in contrast to the closing index value.


Scenario 5
a. The histogram has four bars. DFW, LAX, and ORD all have high counts of over 4000, while LAS is low with a count of only around 400.
Distributions: DEST

Frequencies
Level   Count   Prob
DFW      4601   0.31098
LAS       408   0.02758
LAX      4215   0.28489
ORD      5571   0.37655
Total   14795   1.00000
N Missing  0
4 Levels

[Bar chart of DEST with bars for DFW, LAS, LAX, and ORD]

b. The distribution is strongly skewed to the right. It has a median of -1 and a mean of 9.3. The range is from -42 minutes (42 minutes early) to 965 minutes. From the histogram and descriptive statistics, it is clear that the vast majority of flights are delayed for less than 90 minutes, with approximately half arriving slightly early.
Distributions: DELAY

Quantiles
100.0%  maximum   965
 99.5%            234.23
 97.5%            116.15
 90.0%            42
 75.0%  quartile  13
 50.0%  median    -1
 25.0%  quartile  -10
 10.0%            -16
  2.5%            -24
  0.5%            -30
  0.0%  minimum   -42

Moments
Mean            9.3178528
Std Dev         39.406454
Std Err Mean    0.3336058
Upper 95% Mean  9.9717648
Lower 95% Mean  8.6639408
N               13953

[Histogram of DELAY, axis from 0 to 800]
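As a cross-check on the Moments output for DELAY, the standard error and the 95% limits for the mean can be recomputed from the reported mean, standard deviation, and N. This is a rough sketch, not JMP's own procedure: it assumes a normal quantile in place of the t quantile, which makes almost no difference with a sample this large.

```python
import math
from statistics import NormalDist

# Values copied from the Moments output for DELAY.
mean, std_dev, n = 9.3178528, 39.406454, 13953

# Standard error of the mean: standard deviation divided by sqrt(n).
std_err = std_dev / math.sqrt(n)

# Assumption: the normal quantile stands in for the t quantile.
z = NormalDist().inv_cdf(0.975)
lower, upper = mean - z * std_err, mean + z * std_err

print(f"Std Err Mean   {std_err:.7f}")
print(f"Lower 95% Mean {lower:.4f}")
print(f"Upper 95% Mean {upper:.4f}")
```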


c. Because airlines attempt to schedule arrivals accurately, it is unlikely that very many flights would be extraordinarily early. However, given the many possible reasons for delays and the nature of travel, some flights can be exceptionally late. The practical minimum sets a lower bound for this variable, but there isn't a comparable upper bound. As such, a few flights with very long delays will tend to skew the data.
d. Nearly half of all flights are delayed.

Scenario 6
a. TobaccoUse is somewhat symmetrical with a mean of 24.77 and a median of 25.6. It ranges from 4.3 to 51.8.
b. CancerMort is skewed to the right with a mean of 132.3 and a median of 133. It ranges from 60 to 306.
c. CVMort has two peaks, at around 150 and 400. It is skewed to the right. It has a mean of 355.5 and a median of 375. It ranges from 106 to 713.
d. Overall, TobaccoUse is more uniform than CancerMort and CVMort. CancerMort has one peak and CVMort has two peaks. CVMort has the largest range and TobaccoUse has the smallest range. TobaccoUse is the most symmetrical of the three, while CancerMort and CVMort are both skewed right.
e. Europe & Central Asia and Sub-Saharan Africa have the highest counts of countries in this data table. South Asia has the lowest count, and the Americas, East Asia & Pacific, and Middle East & North Africa all fall in the middle.
f. Women generally use tobacco products less than men do. The center of the male distribution is around 35, compared to around 10 for women.


Instructor Solutions
Chapter 4

Scenario 1 a.
[Mosaic Plot by Region: vertical axis from 0.00 to 1.00; categories Public, Private, and Both]

Public provision is most common by far in the Americas and Europe & Central Asia. Private provision seems to be the norm in the rest of the world. Most areas have relatively few countries with both public and private provision, though such arrangements are fairly common (more than 25% of countries) in the Americas.


c.

The fitted line and RSquare value are shown above. There is a strong, positive linear relationship between the two variables.


Instructor Solutions
Chapter 5

Scenario 1 NOTE: parts a through d use a contingency table like this one:

a. Pr(Sub-Saharan Africa) = 0.2654
b. Pr(Longer than 90) = 0.6296
c. Pr(Longer than 90 and Sub-Saharan Africa) = 0.1358
d. Pr(Longer than 90|Sub-Saharan Africa) = 0.5116

NOTE: for the remaining parts of this problem, use a contingency table like this one:


e. Pr(Both) = 0.1656
f. Pr(Both or SESAP) = Pr(Both) + Pr(SESAP) - Pr(Both and SESAP) = 0.1656 + 0.1401 - 0.0255 = 0.2802
g. Pr(Both|SESAP) = 0.1818
h. No. Comparing the probabilities from parts e and g, we see that Pr(Both) ≠ Pr(Both|SESAP). Therefore the two are not independent.
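The general addition rule in part f and the independence comparison in part h can be sketched in a few lines. This is an illustration only: the function name `prob_union` and the tolerance used to decide whether two probabilities are "equal" are our assumptions, and the comparison is descriptive, not a formal test of independence.

```python
import math


def prob_union(p_a, p_b, p_a_and_b):
    """General addition rule: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)."""
    return p_a + p_b - p_a_and_b


# Marginal and joint probabilities read from the contingency table.
p_both, p_sesap, p_both_and_sesap = 0.1656, 0.1401, 0.0255

p_union = prob_union(p_both, p_sesap, p_both_and_sesap)

# Conditional probability from the rounded table entries; it differs
# slightly from the 0.1818 that JMP reports from the raw counts.
p_both_given_sesap = p_both_and_sesap / p_sesap

# Assumed tolerance of 0.005 for treating two probabilities as equal.
looks_independent = math.isclose(p_both, p_both_given_sesap, abs_tol=0.005)

print(round(p_union, 4), round(p_both_given_sesap, 4), looks_independent)
```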

Scenario 2
NOTE: This contingency table provides the necessary information to respond to all parts:


a. Pr(Mexican American) = 0.2303
b. Pr(Mexican American or Never Married) = Pr(M.A.) + Pr(N.M.) - Pr(M.A. and N.M.) = 0.2303 + 0.3596 - 0.0949 = 0.495
c. Pr(Mexican American and Never Married) = 0.0949
d. Pr(Never Married|Mexican American) = 0.4114
e. No. In part d we found that Pr(Never Married|Mexican American) = 0.4114. The marginal probability Pr(Never Married) = 0.3596. Because the probabilities are unequal, we find that the events are not independent.

Scenario 3
For all of the questions that follow, we can use this contingency table:


a. Pr(Binge at least once a week) = 0.1276
b. Pr(Never binge) = 0.4212
c. Pr(Accident) = 0.0560
d. Pr(Accident or binge at least once a week) = Pr(Accident) + Pr(at least once a week) - Pr(Accident and binge at least once a week) = 0.0560 + 0.1276 - 0.0184 = 0.1652
e. Pr(Accident|binge at least once a week) = 0.1443
f. Pr(Binge at least once a week|Accident) = 0.3286
g. No. Comparing the results in parts a and f, or parts c and e, should lead to the conclusion that because the relevant marginal probabilities do not equal the corresponding conditionals, the events are not independent.
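Parts e and f condition the same joint probability on two different events, which is why the answers differ so much. A minimal sketch, using the rounded probabilities reported above (the helper name `conditional` is ours):

```python
def conditional(p_joint, p_condition):
    """Pr(A | B) = Pr(A and B) / Pr(B); Pr(B) must be positive."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition


p_accident = 0.0560            # part c
p_weekly_binge = 0.1276        # part a
p_accident_and_weekly = 0.0184 # joint probability used in part d

# Same numerator, different denominators.
p_accident_given_binge = conditional(p_accident_and_weekly, p_weekly_binge)
p_binge_given_accident = conditional(p_accident_and_weekly, p_accident)

print(f"Pr(Accident | weekly binge) ~ {p_accident_given_binge:.4f}")
print(f"Pr(weekly binge | Accident) ~ {p_binge_given_accident:.4f}")
```

Because the inputs are rounded to four decimals, the results can differ from the JMP values in the last digit.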

Scenario 4
NOTE: Different contingency tables are needed for different parts of this problem.

a.Pr(Not in labor force) =0.3122


b.Pr(South) =0.3500


Parts c and d rely on this table:

c. Pr(Part-time or female) = Pr(part-time) + Pr(female) - Pr(part-time and female) = 0.1354 + 0.5647 - 0.0964 = 0.6037
d. Pr(Part-time|female) = 0.1707
e. The marital status column identifies three types of respondents who are not married: those who are divorced, never married, or widowed. To find the probability of selecting a person who is not married, we sum the probabilities of these three categories: Pr(Not Married) = 0.12718 + 0.23420 + 0.08281 = 0.44419.



This table can be used for Part f:

f.No. In part a we found Pr(not in labor force) = 0.3122. From this contingency table we see that Pr(not in labor force|divorced) = 0.2638. Because these two probabilities are unequal, we conclude that the two events are not independent.

Scenario 5 a.Pr(Central)= 0.2863


b. Pr(rupture) = 0.2125
c. This problem is complicated by the fact that most cells in this column are blank and the remaining cells contain the label "Yes". There are 189 "Yes" values and 468 rows in all. Therefore Pr(Evacuation) = 189/468 = 0.4038.
d. Just as in the prior question, we are impeded by missing data here. If we construct a contingency table of incident type vs. Evac, we find a conditional probability of 0.2291. If we replace the missing cells in the Evac column with "No", we find Pr(Evac|Rupture) = 0.3516. In the context of this question, either response is acceptable.


e. Pr(Rupture or Explosion) = Pr(Rupture) + Pr(Explosion) - Pr(Rupture and Explosion) = 0.2230 + 0.3676 - 0.0784 = 0.5122


Instructor Solutions
Chapter 6

Scenario 1 a.
The normal quantile plot appears to the left. The distribution is strongly skewed positively, and therefore the normal model is not suitable for this variable.

b. For the normal model described, 14.96% of the distribution lies to the left of 1.319.
c. Pr(X > 5.5) = 1 - 0.9426 = 0.0574. In comparison, based on the reported quantiles, we find that more than 10% of the observed data lies above 5.5 children per woman.


Scenario 2
a. In the shadowgram to the left we see a generally symmetric distribution that seems to be mound-shaped. There may be some indication of a secondary peak at approximately 299,950 km/sec., but the overall impression is that the distribution might be well described by the normal model.

b. In the normal quantile plot the points closely follow the 45-degree diagonal line, further suggesting the suitability of the normal model.

c. The data set provides some support for the assumption. Michelson's various measurements of the speed of light seem to vary according to an approximate normal distribution.


Scenario 3
a. This distribution shows mild skewness. The lower tail is truncated and therefore shorter and thicker than a normal distribution would be.

b. This table summarizes the comparison between the theoretical normal distribution identified in the problem and the percentiles of the actual observed data:

%ile   N(35.818,16.708)   Compressive Strength   Difference
10     14.408             14.203                  0.205
25     24.550             23.695                  0.855
50     35.818             34.443                  1.375
75     47.086             46.209                  0.877
90     57.228             58.979                 -1.751

The normal model matches the observed data most closely at the 10th percentile. At other points in the distribution, it either over- or under-estimates the value.
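The theoretical column of the table can be reproduced outside JMP. The sketch below assumes that N(35.818, 16.708) gives the mean and standard deviation (not the variance), and uses Python's standard-library `statistics.NormalDist`; the observed percentiles are copied from the table.

```python
from statistics import NormalDist

# Assumption: 16.708 is the standard deviation of the fitted normal model.
model = NormalDist(mu=35.818, sigma=16.708)

# Observed percentiles of Compressive Strength, copied from the table.
observed = {10: 14.203, 25: 23.695, 50: 34.443, 75: 46.209, 90: 58.979}

rows = {}
for pct, actual in observed.items():
    theoretical = model.inv_cdf(pct / 100)   # quantile of the normal model
    rows[pct] = (theoretical, theoretical - actual)
    print(f"{pct:>3}  {theoretical:8.3f}  {actual:8.3f}  {theoretical - actual:+7.3f}")

# Percentile at which the model is closest to the data.
closest = min(rows, key=lambda pct: abs(rows[pct][1]))
print("closest match at percentile", closest)
```

Small differences in the third decimal place relative to the table are due to rounding in the reported parameters.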

Scenario 4
a. Student answers will vary. Most will likely choose the weekly change column corresponding to the Hang Seng market index, but others might select a different column (e.g. Tel Aviv or S&P). In these graphs, the points track most closely to the diagonal line.
b. Student answers will vary here as well. The FTSE and Istanbul (IGBM) weekly changes have normal quantile plots that deviate most from the diagonal line.
c. The mean and standard deviation of the changes in Hang Seng for the weeks observed are -1.102065 and 5.242892. For a normal distribution with that mean and standard deviation, Pr(X < 0) = 0.5832, or approximately 0.58.



d. Looking at the table of quantiles (left), we see that the 75th percentile is at 2.43% and the 50th percentile is at -1.5195%. We know therefore that the Hang Seng index lost value somewhere between 50% and 75% of the time. This is consistent with the result in part c.
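The probability in part c can be checked with a short calculation. This sketch assumes the mean weekly change is negative (-1.102065), which is the sign that reproduces the reported probability of 0.5832 and agrees with the quantiles in part d.

```python
from statistics import NormalDist

# Assumption: mean weekly change of -1.102065 with standard deviation 5.242892.
hang_seng = NormalDist(mu=-1.102065, sigma=5.242892)

# Probability that a weekly change is below zero, i.e. the index lost value.
p_loss = hang_seng.cdf(0)

print(f"Pr(X < 0) is approximately {p_loss:.4f}")
```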

Scenario 5
Use these graphs to respond to all parts:

a. This histogram is mound-shaped with a single peak centered near 500 minutes. The large majority of respondents report between approximately 300 and 700 minutes of sleep per week.
b. No, the normal model is not suitable. We see considerable deviation from the normal pattern in the upper and lower tails.
c. The Age histogram is more skewed than the Sleeping histogram, with distinct secondary peaks in each tail. It appears to be centered near 40, but with the peaks in the tails it is difficult to generalize about the degree of dispersion. Again, the normal quantile plot casts doubt on using a normal model for this variable. The normal model seems to fit acceptably near the center of the distribution, but deviates quite dramatically in the tails.


Scenario 6
These graphs can be used to respond to parts a and b.

a. Closing values appear to be symmetric and bimodal, with peaks between 4000-5000 and 6000-6500. The center of the distribution is close to 5000 and it ranges from approximately 3500 to 7000. In contrast, the %change column is moderately symmetric with a single peak just above 0. Most of the distribution lies between -5% and +5%.
b. Though neither distribution is perfectly described as normal, the normal quantile plot for the change % column lies closer to the diagonal than the corresponding plot for the Close column.
c. The volume column has a normal quantile plot that looks quite close to a normal distribution. It would be well described by a model ~N(1710.4911, 203.1369).


Instructor Solutions
Chapter 7

Scenario 1
a. Student answers will vary due to the operation of the random number generator.
b. Use the Binomial distribution to find that the probability of drawing a sample with 250 households all of which have Internet service (i.e., none without) is 2.8409e-22, which is approximately equal to 0.
c. The probability that a SRS of 250 households would include 25 or fewer homes without Internet service is 0.00031368.
d. No. If we scroll through the table, we notice that the cumulative probabilities approach 1 after approximately 60 homes without service. In other words, in samples of 250 homes it is almost certain that 60 or fewer would have no service. It is virtually impossible to obtain a simple random sample of 250 homes with no Internet service at all.
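The binomial probabilities in parts b through d can be approximated without JMP. The sketch below is hedged on one point: the proportion of homes without Internet service is not restated in this solution, so p = 0.18 is an assumption, chosen because it reproduces the 2.8409e-22 reported in part b. The helper names are ours.

```python
from math import comb


def binom_pmf(k, n, p):
    """Pr(X = k) for a binomial distribution with n trials and success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Pr(X <= k), summed term by term."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n = 250
p = 0.18  # ASSUMPTION: proportion of homes without Internet service

p_none_without = binom_pmf(0, n, p)   # part b: every home has service
p_25_or_fewer = binom_cdf(25, n, p)   # part c
p_60_or_fewer = binom_cdf(60, n, p)   # part d: cumulative probability near 1

print(f"Pr(X = 0)   = {p_none_without:.4e}")
print(f"Pr(X <= 25) = {p_25_or_fewer:.6f}")
print(f"Pr(X <= 60) = {p_60_or_fewer:.4f}")
```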

Scenario 2 a. The proportion of countries in Sub-Saharan Africa is 0.24227.


b. As shown to the right, the mean is 39.286 deaths per 1,000 live births; the standard deviation is 37.806.

c.

Student answers will vary. Above we find the results of one random sample: only 5 of the 30 countries are in Sub-Saharan Africa (16.7%). The mean mortality rate in the sample is 31.27 (note that in this sample only 28 of 30 countries reported an infant mortality rate). In general, students' results will not match the population values shown in parts a and b due to sampling variation.


Scenario 3
a. Student answers will vary. In general, the sampling distribution will be bell-shaped and symmetrical, centered very near 0.40 and ranging from about 0.35 to 0.45.
b. Student answers will vary again. In general, the sampling distribution will be roughly bell-shaped and symmetrical, centered very near 0.40. It will be more squat and have a larger range, about 0.30 to 0.50. The reduction of sample size by a factor of 4 essentially doubles the spread of the sampling distribution.
c. Student answers will vary again. In general, the sampling distribution will be roughly bell-shaped and possibly a little left skewed, centered very near 0.95. Compared to the distributions in parts a and b, this distribution will be steep and range only from about 0.90 to 1.00.
d. Small samples have more variability than larger ones; the comparison of the sampling distributions in (a) and (b) above illustrates that. Generally, to reduce sampling variation by one-half, we must increase the sample size by a factor of 4.
e. In part c we notice that the population with a proportion of 0.95 generates samples with comparatively small standard errors. The risks associated with sampling variation tend to be smaller in more uniform populations.
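The pattern described in parts b, d, and e follows from the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below is illustrative only: the sample sizes 1000 and 250 are assumptions (they are not stated in this solution, but they are consistent with the ranges quoted in parts a and b), and `se_proportion` is our own helper name.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# ASSUMED sample sizes: a large sample and one a quarter of its size.
n_large, n_small = 1000, 250

se_a = se_proportion(0.40, n_large)   # setting like part a
se_b = se_proportion(0.40, n_small)   # setting like part b
se_c = se_proportion(0.95, n_large)   # setting like part c

# Cutting n by a factor of 4 doubles the standard error.
ratio = se_b / se_a

print(f"SE (p=0.40, n={n_large}) = {se_a:.4f}")
print(f"SE (p=0.40, n={n_small}) = {se_b:.4f}  (ratio {ratio:.2f})")
print(f"SE (p=0.95, n={n_large}) = {se_c:.4f}")
```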

Scenario 4
a. Student responses will vary. In general, the sampling distribution will be bell-shaped and symmetrical, centered very near 15 with an overall standard error (std. deviation of the sample means) approximately equal to 0.10 and ranging from about 14.7 to 15.3.
b. Student responses will again vary. In general, the sampling distribution will be bell-shaped and symmetrical, centered very near 15 with an overall standard error (std. deviation of the sample means) approximately equal to 0.20 and ranging from about 14.4 to 15.6.
c. Student responses will again vary. In general, the sampling distribution will be bell-shaped and symmetrical, centered very near 15 with an overall standard error (std. deviation of the sample means) approximately equal to 0.40 and ranging from about 13.8 to 16.2.
d. Student responses will again vary. In general, thanks to the Central Limit Theorem, the sampling distribution will be bell-shaped and symmetrical, centered very near 15 with an overall standard error (std. deviation of the sample means) again approximately equal to 0.10 and ranging from about 14.7 to 15.3.
e. The results will be very similar to parts a and d, though each student may have slightly different numerical results.
f. Reducing the sample size gradually increases the standard error of the sampling distribution (i.e. increases the variability across samples). Populations with relatively large standard deviations generate samples with comparatively large sampling variation. With samples this large (n = 1000) the shape of the parent population has no appreciable effect on the center, shape, or spread of the sampling distribution.


Instructor Solutions
Chapter 8

Scenario 1
a. Based on the analysis shown to the left, 95 of 447 disruptions with known causes were ruptures. The estimated confidence interval is from 0.177 to 0.253. We can be 95% confident that the true population proportion is somewhere between 0.177 and 0.253.
b. We can be 90% confident that the true proportion of ruptures is between 0.182 and 0.246.
c. When we lower the confidence level, the interval becomes narrower.
d. Probably not. We are 95% confident that the population proportion is between 17.7% and 25.3%. As such, it would be implausible to conclude that the true value is only 15%.
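For readers who want to reproduce the limits in parts a and b by hand, the sketch below computes a Wilson score interval for 95 ruptures out of 447 disruptions. The method JMP used is not stated in this solution; the Wilson score formula is an assumption on our part, chosen because it matches the reported limits to three decimals, whereas the simpler Wald interval gives roughly 0.175 to 0.250.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


lo95, hi95 = wilson_interval(95, 447, confidence=0.95)
lo90, hi90 = wilson_interval(95, 447, confidence=0.90)

print(f"95% interval: {lo95:.3f} to {hi95:.3f}")
print(f"90% interval: {lo90:.3f} to {hi90:.3f}")
```

The 90% interval sits inside the 95% interval, which is the narrowing described in part c.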

Scenario 2 a. Yes. We have a random sample of sufficient size to invoke the Central Limit Theorem.
b. We are 95% confident that the proportion of homes without Internet service is between 0.102 and 0.188.

c.

With a p-Value of 0.0556, this sample falls just short of statistical significance. The sample does not provide sufficient evidence to conclude that the rate is currently below 18%.

d. Now the confidence interval is narrower (from 0.12 to 0.16), and we would reject the null hypothesis and conclude that the current proportion of homes without Internet service is less than 0.18.

e. A larger sample with the very same proportion provides more precision in the confidence interval (i.e., a narrower interval) and enhances the statistical significance of the test result.

Scenario 3 a. Yes. We have a random sample of sufficient size to invoke the Central Limit Theorem.
b. We can be 95% confident that the population proportion is between 0.073 and 0.083.
c. We can be 99% confident that the population proportion is between 0.071 and 0.085. Both intervals are centered at the same value, but the 99% interval is wider than the 95% interval.
d. We can be 90% confident that the population proportion is between 0.073 and 0.083.
e. The lower the confidence level, the narrower the interval.

Scenario 4 a. Yes. We have a random sample of sufficient size to invoke the Central Limit Theorem.
b. We can be 95% confident that between 4.9% and 6.4% of all individuals in the 18-39 age range have been in accidents after drinking.


c.

Because of the question's wording, a two-tailed test is most appropriate here. Based on this random sample, we can confidently conclude that it is not credible that 10% of the population binge drinks at least once per week. If anything, this sample suggests a higher population proportion.

Scenario 5 a. It depends. The total sample size is 189; because some events or combinations of events are relatively rare, it may be the case that np < 5, in which case we should not interpret the inferential results.
b. 56 of the 189 observations were of dolphins feeding in the evening. The relative frequency is high enough to satisfy the conditions for inference. We can be 95% confident that the population proportion is between 0.236 and 0.365.
c. Although the observed relative frequency is 0.53, and thus greater than 0.5, the p-Value is 0.362, which is high enough that we can readily attribute the result to sampling error. In other words, a null hypothesis that the population proportion is 0.50 or less is still plausible, so we fail to reject the null.

Scenario 6 a.

We can be 90% confident that the proportion of trading days on which McDonald's stock increases is somewhere between 0.423 and 0.569.

b.

This interval is a bit wider than the earlier one: both are centered at 0.496, but the 95% interval reaches from 0.4098 to 0.5824.


Instructor Solutions
Chapter 9

Scenario 1 a. Probably. These columns contain continuous data, and though both distributions are strongly right-skewed, both have a sufficiently large number of observations to rely on the Central Limit Theorem. The critical question is whether we can view this particular time period as representative of the overall process of pipeline disruptions; if we can regard it as random, then we can proceed to make inferences.
b.

We can be 95% confident that the mean dollar value of property damage associated with pipeline disruptions is somewhere between -$623,201 and $3,295,892.
c. The 90% interval is -$307,156 to $2,979,847. We can be 90% confident that the mean damage cost lies between these two values.
d. The interval remains centered at $1,336,345 but becomes narrower when we reduce the confidence level.
e.

We can be 99% confident that the mean dollar cost of lost natural gas is between $6,646.48 and $39,415.62.
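The claim in part d about the property damage intervals can be checked with simple arithmetic. The sketch below (Python, not JMP) takes the lower bounds of the 95% and 90% intervals as negative, which is what the stated center of $1,336,345 implies; the function name is illustrative only.

```python
def center_and_half_width(lower, upper):
    """Return the midpoint and half-width (margin of error) of an interval."""
    return (lower + upper) / 2, (upper - lower) / 2

# Bounds from parts b and c, with negative lower bounds (see the note above)
center95, margin95 = center_and_half_width(-623_201, 3_295_892)
center90, margin90 = center_and_half_width(-307_156, 2_979_847)
print(center95, margin95)
print(center90, margin90)
```

Both intervals share the same midpoint, and only the margin of error changes with the confidence level.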

f. The test results are shown to the right. The relevant p-Value for the alternative hypothesis that μ < 20,000 is 0.3425, too high to reject the null at any conventional significance level.

g.

Student answers will vary but should conclude that if the null hypothesis were that μ = approximately $31,100, then we would reject the null in favor of the one-sided alternative hypothesis.

h. Student answers will vary. A useful approach is to comment on how the power of the test changes at different true values of the population mean with the alpha level at (say) 0.05 and the sample size fixed at 468. So, for instance, if the population mean is just slightly less than $20,000 (say, $19,786), the power of this test is just slightly more than alpha. If the true mean were approximately $12,700, power would increase to 0.20. For the power level to reach 0.5, the true mean would need to be roughly $5,300. At the minimum possible loss of $0, the power of the test to discern the false null hypothesis would be approximately 0.72.


Scenario 2 a.

Yes. We do not know the population standard deviation, so we will use the t-distribution. Because the sample is small (n = 20) we want to see if the sample data suggest that the population is roughly normal in shape. The histogram and normal quantile plots indicate mild skewness but no serious indication of non-normality.

b.

Based on this sample data, we can be 90% confident that the speed of light is between 299,810.5 and 299,852.5 km per second.

c. From the confidence interval in part b we can see that Michelson would probably have (erroneously) concluded that the value 300,000 kps is not credible. The two-tailed hypothesis test yields a P-value < 0.0001 and a test statistic equal to 13.898; Michelson would have rejected a null hypothesis that the constant speed of light is 300,000 kps.
d. Student answers may vary, but assuming a significance level of 0.05 and a two-tailed test, if the null value were approximately 299,857 Michelson would not have rejected the null hypothesis.
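The Michelson answers can be roughly reconstructed from the 90% interval alone. The sketch below (Python, not JMP) backs out the standard error from the reported interval of 299,810.5 to 299,852.5 with n = 20, then computes the t statistic against 300,000 and the upper end of a 95% interval. The critical values 1.7291 and 2.0930 are standard t-table entries for 19 degrees of freedom; because the interval endpoints are rounded, the statistic comes out near, not exactly at, the reported magnitude of 13.898.

```python
# Reported 90% interval for the speed of light (km per second), n = 20
lower, upper = 299_810.5, 299_852.5
mean = (lower + upper) / 2
margin = (upper - lower) / 2

# Two-sided t critical values for 19 degrees of freedom (from a t table)
T_90 = 1.7291
T_95 = 2.0930

std_error = margin / T_90
t_stat = (mean - 300_000) / std_error
upper_95 = mean + T_95 * std_error
print(round(t_stat, 2), round(upper_95))
```

The upper end of the 95% interval lands at about 299,857, which is why a null value near that figure would not be rejected in a two-tailed test at the 0.05 level.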

Scenario 3 a. Student answers will vary. On the one hand, because both measurements refer to the same child's height, we expect them to be quite similar. On the other hand, when a person stands the spine may compress slightly, so that standing height measurements may be less than reclining measurements.
b. From the test results we see that we reject the null hypothesis that the mean difference = 0. It is not credible that the paired measurements are equal; the 95% confidence interval suggests that recumbent length is on average somewhere between 0.8 and 1.0 cm more than the comparable measurement of height.

Scenario 4 a. Yes. We do not know the population standard deviation, so we will use the t-distribution. Because the sample is so large (n = 6774) we can rely on the Central Limit Theorem to proceed.

b. We can be 95% confident that the mean flight delay was somewhere between 10.47 and 12.29 minutes.

c. No. The interval is an estimate of the population mean, not the range of individual values. The interval provides an estimate of the location of the population mean, acknowledging the uncertainty that arises from using a sample.
d.

For this test we see that the reported p-value is 0.0910. Because this exceeds 0.05, we fail to reject the null hypothesis; there is insufficient evidence to conclude that the mean is less than 12 minutes.

e. If the true population mean were actually 10 minutes, the power of this test would be approximately 0.996. In other words, if the mean flight delay were really 10 minutes, this test would almost certainly detect that the mean is less than 12 minutes.
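The p-value in part d and the power in part e can both be approximated from the 95% interval in part b. The sketch below (Python, not JMP) uses the normal distribution in place of t, a reasonable assumption with n = 6774, and backs out the standard error from the reported interval of 10.47 to 12.29 minutes; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()

# Reported 95% interval for the mean delay (minutes)
lower, upper = 10.47, 12.29
mean = (lower + upper) / 2
std_error = (upper - lower) / 2 / z.inv_cdf(0.975)

# One-sided test of H0: mean >= 12 against Ha: mean < 12
null_mean, alpha = 12.0, 0.05
p_value = z.cdf((mean - null_mean) / std_error)

# Power if the true mean were 10 minutes
cutoff = null_mean + z.inv_cdf(alpha) * std_error  # reject when the sample mean falls below this
power = z.cdf((cutoff - 10.0) / std_error)
print(round(p_value, 3), round(power, 3))
```

Under these assumptions the results agree with the reported p-value of 0.0910 and power of about 0.996 to three decimals.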

Scenario 5 a. Yes. We do not know the population standard deviation, so we will use the t-distribution. Because the sample is so large (n = 1455) we can rely on the Central Limit Theorem to proceed.

b.

We can be 95% confident that the mean departure delay is between 14.63 and 19.04 minutes.

c.

We can be 95% confident that the mean time from scheduled departure to wheels off is between 31.75 and 36.33 minutes.

d.

We can be 95% confident that there is a mean difference of between 16.64 and 17.75 minutes between these two ways of measuring the delay in the departures of flights. Not surprisingly, the time from scheduled departure to wheels off is considerably longer than the reported departure delay figure on average.


Instructor Solutions
Chapter 10

Scenario 1 NOTE: Complete answers should note that we have continuous data, independent samples, and that the samples in each part of the question are large enough to rely on the Central Limit Theorem.
a.
We can be 95% confident that the mean difference in Body Mass Index between men and women is between 0.34761 and 1.07684.

b. We can be 95% confident that the mean difference in Systolic Blood Pressure between men and women is between -5.9501 and -3.7012.

c.

We can be 95% confident that the mean difference in Diastolic Blood Pressure between men and women is between -4.387 and -2.7689.

d.

We can be 95% confident that the mean difference in Waist Circumference between men and women is between -5.0091 and -3.249 cm.


Scenario 2 a. We should first note that we have modest sample sizes (n = 35 and n = 43) from strongly skewed distributions. Therefore, we should be reluctant to interpret the resulting interval at all. However, the reported 95% confidence interval is from -$11,026,606 to +$32,748,087.
b. We should first note that we have modest sample sizes (n = 35 and n = 43) from strongly skewed distributions. Therefore, we should use the Wilcoxon test rather than the t (results below). The relevant P-value is 0.3018, which is insufficient to reject the null hypothesis.

Scenario 3 a.
We should first note that we have strongly skewed distributions but the sample sizes are reasonably large. Therefore, we can proceed to interpret the results of a t-test. In this test, there is compelling evidence to suggest that it does not take longer to secure the area after a rupture than after a leak; to the contrary, leaks require more time.
b. We can be 95% confident that the difference in property damage costs between ruptures and leaks is somewhere between $23,315 and $171,415.

c.

In this case the different tests of homogeneity of variance lead to different conclusions. Using Levene's test, we would fail to reject the null hypothesis of equal variances; with F Test 2-sided, we would reject the null and conclude that the variances are unequal. Given the ambiguity, it is safer to conclude that the variances are unequal when conducting the tests of means (above).


Scenario 4 a. Student answers will differ. We have only 8 individuals without PD, and for the baseline pitch and jitter, the distributions appear bimodal with few observations in the center; shimmer may be normally distributed for non-PD observations. Among individuals with PD (n = 24) the distributions tend to be skewed. As such, with non-normal distributions and small samples, this sample does not satisfy the conditions for the use of the t-test.
b. Based on the Wilcoxon test (assuming a significance level of α = 0.05) we fail to reject the null hypothesis that the mean fundamental frequency is equal for both groups. There is no significant difference in this sample data.

c.

Based on the Wilcoxon test (assuming a significance level of α = 0.05) we reject the null hypothesis that the mean jitter measurement is equal for both groups. There is a statistically significant difference in this sample data.

d.

Based on the Wilcoxon test (assuming a significance level of α = 0.05) we reject the null hypothesis that the mean shimmer measurement is equal for both groups. There is a statistically significant difference in this sample data.


Scenario 5 a. If we rely on Levene's test, we conclude that there is insufficient evidence that the variances are different; the F Test 2-sided leads to the opposite conclusion. To be safe we'll use the t-test assuming unequal variances for the next question.

b.

We can be 95% confident that flight delays on Skywest are, on average, between 2.7 and 5.3 minutes shorter than those on American Airlines.

Scenario 6 a.
Using just the 2003 data, we estimate with 95% confidence that females reported sleeping between 5.55 and 12.95 minutes more than males.

b.

We can infer that people spent more time on email in 2007 than in 2003: we estimate with 95% confidence that the mean time devoted to email was somewhere between 0.6 and 1.26 minutes longer per day in 2007 than in 2003.

c.

Combining all of the data from both years, we can conclude with 95% confidence that men spend, on average, 2.3 to 6.1 fewer minutes per day socializing than do women.


Instructor Solutions
Chapter 11

Scenario 1 a.
No. At the 0.05 level of significance we reject the null hypothesis of equal probabilities.

b.

Based on this sample, we would conclude that the variables are independent. We do not have sufficient evidence to conclude that the two variables are not independent (assuming a significance level of 0.05).

c.

Based on this sample, we would conclude that the variables are independent. We do not have sufficient evidence to conclude that the two variables are not independent (assuming a significance level of 0.05).


Scenario 2 a.

Because there are some cells with very small counts and expected counts, we should use caution making inferences from the Chi-Square test. However, we can note that the evidence points towards rejection of the null hypothesis of independence and we can also note (for example) that dolphins were regularly observed feeding in the morning and evening, but rarely if ever at other times.
b. No. At the 0.05 level of significance we reject the null hypothesis of equal probabilities.

Scenario 3 a.
No. At the 0.05 level of significance we reject the null hypothesis that Provider and Region are independent.

b.

No. At the 0.05 level of significance we reject the null hypothesis that Provider and MatLeave90+ are independent.


c.

No. At the 0.05 level of significance we reject the null hypothesis that MatLeave90+ and Region are independent.

Scenario 4 a.
Because there is a substantial proportion of cells with very small expected counts, we should use caution making inferences from the Chi-Square test. However, we can note that the evidence points toward rejecting the null hypothesis of independence. We might observe (for example) that married respondents were disproportionately non-Hispanic whites.
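The caution about small expected counts can be made concrete. The sketch below (Python, not JMP) computes expected cell counts under independence for a two-way table and reports the share of cells below 5. The observed counts are hypothetical, not the survey data, and the threshold of 5 is the rule of thumb used elsewhere in these solutions.

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def share_below(expected, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [value for row in expected for value in row]
    return sum(value < threshold for value in cells) / len(cells)

# Hypothetical observed counts (not the survey data)
observed = [[30, 10, 2],
            [20, 5, 1],
            [4, 2, 1]]
expected = expected_counts(observed)
print(round(share_below(expected), 2))
```

In this made-up table four of the nine cells have expected counts under 5, the kind of situation in which the test result should be read with caution.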

Scenario 5 a.
No. At the 0.05 level of significance we reject the null hypothesis that binge drinking regularity and involvement in car accidents are independent. Students who report binging at least once a week are far more likely to have been involved in an accident than other students.


Scenario 6 a.
The Chi-Square goodness-of-fit test indicates that the five categories are not equally distributed across mammalian species. We reject the null hypothesis that all proportions are equal at 0.20.
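As an illustration of the goodness-of-fit test with all five proportions set to 0.20, the sketch below (Python, not JMP) computes the Pearson chi-square statistic. The counts are hypothetical, not the mammal data, and 9.488 is the standard upper 5% critical value for 4 degrees of freedom.

```python
def chi_square_gof(observed, probabilities):
    """Pearson chi-square statistic for observed counts against hypothesized probabilities."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts in five categories (not the mammal data)
observed = [20, 12, 8, 6, 4]
statistic = chi_square_gof(observed, [0.20] * 5)

# Upper 5% critical value of the chi-square distribution with 5 - 1 = 4 degrees of freedom
CRITICAL_VALUE = 9.488
print(round(statistic, 2), statistic > CRITICAL_VALUE)
```

A statistic above the critical value leads to rejecting the null hypothesis of equal proportions at the 0.05 level.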

b.

In this case the Chi-Square goodness-of-fit test does not reject the null hypothesis of equal distribution. In other words, we should NOT conclude that species are unequally distributed across the predation index.

c.
The total sample size here leads to many cells with expected counts < 5, making the Chi-Square test unreliable. That said, the test results point in the direction of rejecting the null hypothesis.

Scenario 7 a.
According to the Chi-Square test the two variables are not independent. There is sufficient evidence to reject a null hypothesis that they are independent.


b.

According to the Chi-Square test the two variables are not independent. There is sufficient evidence to reject a null hypothesis that they are independent.

c.

According to the Chi-Square test the two variables are not independent. There is sufficient evidence to reject a null hypothesis that they are independent.

d.

According to the Chi-Square test the two variables are not independent. There is sufficient evidence to reject a null hypothesis that they are independent.


Instructor Solutions
Chapter 12

Scenario 1 a.

In this case we find that the regional variances are not equal but the residuals do appear to be approximately normal. According to Welch's test, the mean birthrate is not equal across the regions of the world. Strictly speaking we cannot rely on a formal test to determine which regions differ. Visual inspection of the means diamonds in the Oneway graph suggests that Sub-Saharan birth rates are unusually high, and that birth rates in Europe and Central Asia are unusually low.

b.

As in part a, we find unequal variances across regions and moderately normal residuals. We can turn to Welch's test and conclude that fertility rates are not equal in the regions of the world, but cannot conduct a formal test of the differences. From the means diamonds it appears that Sub-Saharan Africa has unusually high fertility rates.


c.

We start by evaluating conditions. The Residual by Predicted Plot raises some question about the equality of variances, but it is not definitive. The residuals do not appear to be normally distributed, but we have reasonably large samples and can rely on the Central Limit Theorem. We find no significant interaction term, and we do find a significant main effect associated with the Provider of benefits. It appears that countries with private provision of maternity benefits have significantly higher rates of maternal mortality.
d.

We start by evaluating conditions. The Residual by Predicted Plot does not raise questions about the equality of variances. The residuals do not appear to be normally distributed, but we have reasonably large samples and can rely on the Central Limit Theorem. We find no significant interaction term, nor do we find any significant main effects. It appears that these two aspects of maternity leave policies have no detectable effect on under-5 mortality.

Scenario 2
a.

We see no evidence that the ANOVA assumptions have been violated; variances across the three groups appear to be equal and residuals are approximately normal. The F Ratio of 4.6275 and corresponding P-value of 0.0187 indicate that we should reject the null hypothesis of equal means; there is compelling evidence that the different additives lead to different mean changes.
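For readers who want to see what lies behind an F Ratio like the one reported above, the sketch below (Python, not JMP) computes a one-way ANOVA F ratio from first principles. The three groups of values are hypothetical, not the insulation data, so the result is not expected to match 4.6275.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical temperature changes for three groups (not the insulation data)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(round(f_ratio(groups), 2))
```

A large ratio means the group means differ by more than the within-group scatter would suggest, which is the basis for rejecting the null hypothesis of equal means.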


b.

Because we have a control group, we should use Dunnett's method to compare the means.

c. We find that there is a significant improvement in insulation with the super additive: the temperature change is smallest with that additive. The company should switch from regular to super.

Scenario 3
a.

We start by evaluating conditions, and find no signs that the sample data violate the conditions for inference.

A review of the Effect Tests shows that we have a significant interaction effect as well as significant main effects. This tells us that prices vary by city and by model, and what's more, the impact of model varies across the cities.


When we apply Tukey's HSD (output not shown fully here) we see the complexity of the interactions; we should not make statements about main effects but can use the connecting letters report to identify differences among the model-city combinations.
b.

We start by evaluating conditions, and find no signs that the sample data violate the conditions for inference.

A review of the Effect Tests shows that we have a significant interaction effect as well as one significant main effect for City. This tells us that mileage varies by city but that it does so differently depending on the model. There is no main effect for model.


Looking at Tukey's HSD for the significant interactions we see the complexity of the interactions; we should not make statements about main effects but can use the connecting letters report to identify differences among the model-city combinations.

Scenario 4 a.

As usual we start by evaluating assumptions. We have a very large sample, so the Central Limit Theorem applies and we need not be concerned with normality (above we see the residuals are unimodal and symmetric, but depart from the normal model in the tails). We also see evidence that the variances are unequal. In practice, because of the very large sample it is not surprising that we find significant differences.

Both Welch's test and the standard ANOVA results strongly indicate that there are significant differences in group means. There is no control group here. Tukey's HSD indicates that employed people at work get the least sleep and unemployed people who are looking report the most. All others are significantly different from those two groups, but indistinguishable from one another.

b.

We start again by evaluating assumptions. We have a very large sample, so the Central Limit Theorem applies and we need not be concerned with normality (above we see the residuals are unimodal and symmetric, but depart from the normal model in the tails). We don't get a default plot of residuals vs. predicted values, but we'll treat the variances as equal for now. Adding the second variable (sex) to the model does not improve it much. As we can see in the effects tests, sex has no main effect, but there is a significant interaction effect. The effect of employment status on an individual's sleeping patterns is different for men and women.

Scenario 5 a.

As we can see from the output, the sample data seem to violate the assumptions of normality and equal variance. Each of the regional subsamples is large enough to rely on the Central Limit Theorem with respect to normality.


Using Welch's test (below) we would conclude that the mean costs of property damage are not identical across the regions.

b. Here again we see evidence that the sample data violate the assumptions of normality and equal variance. The two-way ANOVA results do not suggest significant main effects. The profiler reveals a possible minor interaction for cases in the Southwest region, but given the questions about assumptions it is inappropriate to draw conclusions.
c. The distribution of residuals (not shown here) raises questions about normality and the usual tests indicate that the variances of the different disruption-type subgroups are unequal. According to Welch's test, there is at least one disruption type that differs from the others in terms of time required to make the area safe.

d.

There are no significant main or interaction effects in this two-way ANOVA.


Scenario 6
a.

We start by examining assumptions. The residuals appear to be normally distributed (the sample sizes are large enough to rely on the Central Limit Theorem in this case), but the subsamples appear not to share a common variance. Both Welch's test and the conventional ANOVA find no significant differences among group means.
b.

In this analysis the assumption of normality is satisfied; the tests for equal variances are not all in agreement so we may question that assumption. Both Welch's test and the ANOVA indicate a significant difference in mean diameters for at least one machine. Tukey's HSD finds that machine C334 has lower mean diameters than the other machines.


c.

The assumption of normality does appear to be satisfied; visual inspection of residuals vs. predicted values does not reveal any obvious differences in group variances.

The interaction plots indicate interaction effects between operator and machine, making it difficult to interpret the main effects of machine and operator separately.

d. The interaction plot is a bit difficult to read because the Operator initials are superimposed on one another. The profiler makes it easier to see that the extent to which machines produce tubing of differing widths varies by operator. Thus, for example, when Operator RMM is involved, machine A455 regularly makes the widest diameters; otherwise it does not. RMM's tubing diameters appear to vary widely by machine, whereas DRJ's do not.


Instructor Solutions
Chapter 13

Scenario 1 a.

Above are the regression results for adult females. We find a significant relationship between waist circumference and BMI, with the waist measurement accounting for about 88% of the variation in BMI. Each additional centimeter of waist circumference is associated with an increase of 0.3847 in BMI.
b. In the regression within the chapter using adult males, RSquare was 0.862445; here it is 0.880098, indicating that the model with adult females fits just slightly better than the model with males.

c.

If we restrict the analysis to females under the age of 17 we find a slightly stronger relationship between Waist and BMI. The estimated slope is slightly smaller than before (0.344 vs. 0.385) but otherwise the regression models are very similar.

Scenario 2
a.

In this regression we find a weak (R2 = 0.31) but highly significant positive relationship. Subjects who differ in age by 1 year tend to differ in systolic BP by approximately 0.47 points on average, with the older subject having the higher reading. This is not a strong relationship because age accounts for less than one-third of the variation in systolic BP.


b. NOTE: The question does not specify which column should be treated as Y and which as X. Because systolic pressure is the pressure of blood leaving the heart, and diastolic is the pressure of returning blood, it makes sense to use Diastolic as Y. Students who reverse the columns will see the same R2 and significance levels.

Here we find a significant but weak positive relationship. For each additional 1 point of systolic BP, diastolic increases by 0.388 points.

c. The scatterplot to the left shows little or no relationship between pulse and systolic BP. If anything, there may be a very weak negative relationship here, contrary to the suspicion expressed in the question.

Scenario 3
a. The estimated equation (not reproduced here) has R² = 0.979, indicating a very strong relationship and excellent fit. Despite the strong summary statistics, the scatterplot clearly raises doubt about the linear model: the points seem to bend around the line, suggesting that the relationship is not best described as a line.

Scenario 4 a.

R² = 0.03. This regression shows a weak but significant negative relationship between mileage and price for used cars. The farther a car has been driven, the lower its price on average (about 4 cents per mile). However, there is considerable scatter around the line.

Scenario 5 a.
In the scatterplot we see a moderately strong positive linear association.

b.

The Random Walk model specifies a slope of 1 and intercept of 0. In the table of parameter estimates, we see that we reject the null that the intercept = 0. Moreover, in a custom test comparing the estimated slope to a hypothetical parameter of 1.0, we reject the null; that is, we find that 0.906 is significantly different from 1. Therefore, the Random Walk model does not suit this set of data.
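
The custom test of the slope against 1.0 can be sketched outside JMP as follows. This is a minimal illustration, not JMP's own procedure: the function name `slope_t_statistic` and the `previous`/`current` values are hypothetical. It computes the usual t statistic, (estimated slope minus hypothesized slope) divided by the standard error of the slope, which would then be compared with a t distribution on n - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(x, y, hypothesized_slope):
    """Return (slope, se_slope, t) for H0: slope == hypothesized_slope."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2)
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, (slope - hypothesized_slope) / se_slope

# Hypothetical series: each value paired with the value one period earlier.
previous = [10.0, 10.4, 10.1, 10.9, 11.3, 11.0, 11.8, 12.1]
current = [10.4, 10.1, 10.9, 11.3, 11.0, 11.8, 12.1, 12.0]
slope, se, t = slope_t_statistic(previous, current, 1.0)
print(f"slope = {slope:.3f}, SE = {se:.3f}, t vs. 1.0 = {t:.2f}")
```

A large absolute t value would lead us to reject a slope of 1, as the custom test in the solution does.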

Scenario 6
a.

Using the Haydn data we find a similar story to the one we saw with Mozart. We again find the Golden Mean model plausible.

b. Here, the R² value (not shown) is 0.889; with the Mozart data R² was 0.938, which is slightly better. In both cases the linear model fits the data very well.

Scenario 7 a.

Here we find a significant, but weak, negative relationship. On average, each additional day of gestation is associated with a reduction of 0.02 hours of sleep per night. Gestation accounts for only about 40% of the variation in total sleep, so it is a fair predictor of sleep hours.

b.

Here the linear model is a very poor fit to the data. We have three mammals with very large brains (African elephants, Asian elephants, and humans) and these distort the estimated parameters. Although we do find a significant negative slope, it is clear from the graph that the model is not suitable for estimating hours of sleep for most species.

Scenario 8
a.

We find a non-significant relationship here: Tobacco Use is not a useful predictor of cancer deaths in a country.

b.

This is also a non-significant relationship. Tobacco Use does not predict cardiovascular mortality rate.

c. The aggregate prevalence of tobacco use obscures the fine distinctions in the amount and length of tobacco use in individuals. We'd really want to look at data at the individual level in order to determine the degree to which increased tobacco use influences the risks of death from cancer or from cardiovascular disease.

Scenario 9 a.

This is a highly significant, but weak, positive relationship. For each additional kg of cement in the mixture, compressive strength increases on average by 0.08 megapascals.

b.

Now we find a highly significant but weak negative relationship. Increased water decreases compressive strength. Specifically, each additional kg of water reduces compressive strength on average by 0.23 megapascals.

Scenario 10
a. There are slight differences, but when we round the major statistics we find that all four models are nearly identical: Yi = 3 + 0.5 Xi. All R² values (0.66) and p-values (0.0022 for the slope) are the same.

b. The linear model is an apt description of these points. There is a general linear trend with points scattering evenly above and below the line.

c. In the other three graphs, the points do not fall in a linear pattern at all. This illustrates a substantial risk in running a linear regression without first examining the data visually. (In JMP we always see a scatterplot of the points either prior to fitting a model or in conjunction with fitting a model.)
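
The point can be checked numerically. The summary statistics quoted in this scenario match those of the well-known Anscombe quartet, so the sketch below assumes (the solution does not say so explicitly) that the data resemble Anscombe's published series, and it uses the first two of them. The helper `fit_line` is illustrative, not a JMP function.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope, r_squared) for a least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst

# Assumed data: Anscombe's series I and II, which share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

fit1 = fit_line(x, y1)  # roughly linear scatter
fit2 = fit_line(x, y2)  # smooth curve, not linear at all
for name, (b0, b1, r2) in (("I", fit1), ("II", fit2)):
    print(f"Series {name}: intercept = {b0:.2f}, slope = {b1:.2f}, R2 = {r2:.2f}")
```

Both fits round to the same line and the same goodness of fit even though only the first series is reasonably described by a line, which is why the plot has to be inspected first.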

Instructor Solutions
Chapter 14

Scenario 1 a.

We first performed this regression in Chapter 13. Above are the regression results for adult females. We find a significant relationship between waist circumference and BMI, with the waist measurement accounting for about 88% of the variation in BMI. Each additional centimeter of waist circumference is associated with an increase of 0.3847 in BMI. When we save the residuals and check their normality, we find the normality assumption seems to be reasonable. The graph of residuals vs. predicted values suggests that the dispersion of residuals increases as predicted values increase, though it is not an overly dramatic tendency. We can probably trust this model for predictions.
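
The visual check of residual dispersion against predicted values can be complemented by a rough numerical comparison. The sketch below is an informal aid under stated assumptions, not a formal test and not a JMP feature: it fits a line to hypothetical data, sorts the residuals by fitted value, and compares the standard deviation of the residuals in the lower and upper halves. The function name `residual_spread_by_half` is illustrative.

```python
from statistics import mean, stdev

def residual_spread_by_half(x, y):
    """Return (sd_low, sd_high): residual SDs for low vs. high fitted values."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Pair each fitted value with its residual, then sort by fitted value
    pairs = sorted(
        (intercept + slope * a, b - (intercept + slope * a)) for a, b in zip(x, y)
    )
    half = len(pairs) // 2
    low = [resid for _, resid in pairs[:half]]
    high = [resid for _, resid in pairs[half:]]
    return stdev(low), stdev(high)

# Hypothetical data whose scatter widens as x grows.
x = list(range(1, 11))
y = [2 * xi + ((-1) ** xi) * 0.1 * xi for xi in x]
sd_low, sd_high = residual_spread_by_half(x, y)
print(f"residual SD, lower half: {sd_low:.3f}; upper half: {sd_high:.3f}")
```

A clearly larger spread in the upper half is the numerical counterpart of the fan shape described above.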

b. The 95% confidence interval for the slope is approximately [0.358, 0.411]. This means we are 95% confident that the mean increase in BMI for each additional 1 cm of waistline is between 0.358 and 0.411.

c. Looking at the fitted line graph, it appears that the mean BMI for women with 68 cm waists is approximately 18.

Scenario 2
a.

Once again we see the suggestion of heteroskedasticity on the left side of the graph. The residuals are largely normal in shape, except for a single right-side outlier. We can probably use the model safely.

b.

The residuals for this regression appear to satisfy the assumptions of constant variance and normality. There is some indication that the variance increases moving from left to right, but the evidence is ambiguous.

c. The scatterplot to the left shows little or no relationship between pulse and systolic BP. If anything, there may be a very weak negative relationship here, contrary to the suspicion expressed in the question.

The residuals graphs cast doubt on normality (though the Central Limit Theorem applies); there does not seem to be a problem with constant variance.

Scenario 3
a.

In Chapter 13 we noted that despite the strong summary statistics, the scatterplot very clearly indicates some doubt about the linear model: the points seem to bend around the line, suggesting that the relationship is not best described as a line. The Residual by Predicted plot very clearly depicts both the nonlinearity and the heteroskedasticity. Moreover, normality is also questionable, though we do have a large sample.

Scenario 4 a.

The residuals are clearly not normally distributed, though the sample is reasonably large. There is no obvious problem with constant variance.

b. The 95% confidence interval for the marginal decrease in price associated with each additional mile driven is [-$0.076, -$0.003].

c. Student answers will vary. The prediction bands on this graph are quite wide, and even with rescaling the axes it is difficult to read predicted values of Y. A reasonable response would be that the price should fall between $6,200 and $19,500.

Scenario 5 a.
In the scatterplot we see a moderately strong positive linear association.

b.

These residuals should raise no concerns about heteroskedasticity or normality. Because these are time-series data, we also want to look at the residuals in sequence. Once again, the oscillations appear to satisfy the conditions of least squares regression: there are no long runs of positive and negative residuals.

b.

The Random Walk model specifies a slope of 1 and intercept of 0. In the table of parameter estimates, we see that we reject the null that the intercept = 0. Moreover, in a custom test comparing the estimated slope to a hypothetical parameter of 1.0, we reject the null; that is, we find that 0.906 is significantly different from 1. Therefore, the Random Walk model does not suit this set of data.

Scenario 6
a.

With the Haydn data, in the Residual vs. Partb plot we find a heteroskedastic pattern; the residuals do deviate from normality, but the distribution is single peaked, moderately and we have a large sample.

b.

With the Mozart data we also find heteroskedasticity and probable non-normality. Both issues present reasons not to interpret the regression results. With the relatively small Mozart sample, we cannot rely on the Central Limit Theorem with regard to the non-normality.

Scenario 7 a.

Here we find a heteroskedastic pattern in which the variability of residuals diminishes as the Gestation period lengthens. Normality is not ideal, but the sample size is large enough to rely on the CLT. Given the non-constant variance, we should be reluctant to interpret or use the results of the regression.

Scenario 8
a.

Recall that we find a non-significant relationship here: Tobacco Use is not a useful predictor of cancer deaths in a country. The residuals seem to show more variability in the middle range of tobacco use (non-constant variance), and residuals are nearly normal, with a long upper tail but large sample size. This model is not useful for inference.

b.

Recall that this is also a non-significant relationship. Tobacco Use does not predict cardiovascular mortality rate. The residuals indicate some possible curvature (non-linearity) as well as heteroskedasticity. They are not very close to a normal distribution, though the large sample size would permit us to invoke the CLT. This model should not be put to use based on this sample.

Scenario 9 a.

These residuals look good: the Residual vs. Cement plot shows an even scatter above and below the 0-line, and the normal quantile plot shows that the residuals follow a nearly normal distribution except for the lower tail. In any case, we have a very large sample, so the CLT applies. We can safely interpret the results. This is a highly significant, but weak, positive relationship. For each additional kg of cement in the mixture, compressive strength increases on average by 0.08 megapascals.

b.

In this regression the variability of the residuals seems slightly larger in the middle range of X values than elsewhere, but the pattern is not definitive. There are also clear problems with normality, but the CLT applies. Hence, we can go ahead and interpret the results. Now we find a highly significant but weak negative relationship. Increased water decreases compressive strength. Specifically, each additional kg of water reduces compressive strength on average by 0.23 megapascals.

Scenario 10
a.

Above are the four plots of residuals vs. X; normality plots are not shown here. The residuals in the first regression are homoskedastic and approximately normal. The others indicate non-linearity and/or heteroskedasticity. Normality plots also indicate non-normal residuals in these small samples.

b. The four residual vs. X plots indicate that only the first model is suitable for interpretation and use.

Instructor Solutions
Chapter 15

Scenario 1 a.

The residual plots from this multiple regression model are very similar to those from the simple regression using Waist circumference as the only predictor (see those graphs below). We can use this set of data for estimation. The regression results themselves are shown here:

We find a strong relationship between BMI and the model, but this model is not much of an improvement over the previous model (shown again below). The intercept has changed dramatically, though in this model the intercept does not have much meaning. The effect size for the Waist measurement is almost equal to that of the single-variable model, and the coefficient of height is not significant at the customary 0.05 level, though it is significant at the 0.10 level. The two-variable model shows only a very small improvement in goodness of fit in comparison to the single-variable model. In short, the addition of the height data does not improve the model in any material way.

We first performed this regression in Chapter 13. Above are the regression results for adult females. We find a significant relationship between waist circumference and BMI, with the waist measurement accounting for about 88% of the variation in BMI. Each additional centimeter of waist circumference is associated with an increase of 0.3847 in BMI. When we save the residuals and check their normality, we find the normality assumption seems to be reasonable. The graph of residuals vs. predicted values suggests that the dispersion of residuals increases as predicted values increase, though it is not an overly dramatic tendency. We can probably trust this model for predictions.

b. We can safely apply the model, and the 95% CI for the slope is approximately [0.359, 0.411]; with the simpler model the corresponding interval was very nearly the same: [0.358, 0.411]. This means we are 95% confident that the mean increase in BMI for each additional 1 cm of waistline is between 0.359 and 0.411.

c.

In the model using waist and thigh circumference (note a typographical error in early printings of the book, which refer to this as wrist circumference), we find residuals that are approximately normal and more heteroskedastic than in our prior models. In this sense, the model is less attractive than the earlier ones. On the other hand, the goodness of fit is improved (Adj. RSquare, see below, now equals 0.92), and both slopes are statistically significant and make logical sense.

d. Student responses will vary depending on choice of variables. Complete answers should include residual graphs to check normality and heteroskedasticity. A model including BMTRI (tri-skin fold) does add to the explanatory power of the model in comparison to those shown earlier.

Scenario 2
a. Student answers will vary. One rotated scatterplot is shown here (including a density ellipsoid). We see a weak tendency for systolic BP to increase both as age and weight increase.

b.

The residuals indicate a possible problem with heteroskedasticity; the sample size is probably large enough to rely on the CLT in the face of possible non-normality. When we look at the regression results, we conclude that there is a significant positive relationship between systolic BP and weight, but that age has no significant effect once weight is considered. The overall model fit is poor.

c.

Here again we find concerns about heteroskedasticity and normality; if we continue on to interpret the coefficient estimates, we see that the Diastolic BP adds little to the model. The estimated value is not significantly different from zero, and the adjusted R² is very nearly the same as in the prior model using just 2 factors. This model is no meaningful improvement over the prior one.

d.Student answers will vary. Among the promising table columns to include in a model are age in years and waist circumference. The key in these responses is whether students accurately assess the residuals and the significance and meaning of parameter estimates.

Scenario 3
a.

The leverage plots immediately suggest a problem with collinearity, which is confirmed by the very high VIFs in the table of parameter estimates (below):

This model should not be used or interpreted.
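
To make the VIF figures concrete: with only two predictors, each predictor's variance inflation factor reduces to 1 / (1 - r²), where r is the correlation between the two. The sketch below is a simplified illustration under that two-predictor assumption (JMP's VIFs for a larger model regress each predictor on all the others). The mortality figures are hypothetical, not the book's data.

```python
from math import sqrt
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(a, b)
    return 1 / (1 - r ** 2)

# Hypothetical infant and under-5 mortality rates that move almost in lockstep.
infant = [5, 12, 30, 48, 60, 85]
under5 = [6, 15, 38, 62, 80, 118]
vif = vif_two_predictors(infant, under5)
print(f"VIF = {vif:.1f}")
```

Values far above 10, as here, signal that the two columns carry nearly the same information, so their separate slopes cannot be estimated reliably.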

b. By dropping the MortUnder5 column, we resolve the collinearity problem. Residual graphs appear to be acceptable, so this model can be interpreted.

Other things being equal, nations with higher maternal mortality have higher birthrates; similarly nations with higher infant mortality also have higher birthrates. The size of the effect is given by the estimated slopes above. Both variables are statistically significant, and the model collectively accounts for approximately 78% of the variation in birthrates around the world in 2005.

Scenario 4 a.

When we estimate a simple linear model using gestation as the factor, we find a heteroskedastic pattern in which the variability of residuals diminishes as the Gestation period lengthens. Normality is not ideal, but the sample size is large enough to rely on the CLT. Given the non-constant variance, we should be reluctant to interpret or use the results of the regression.

b.

With the addition of the BrainWeight variable, the residuals are still heteroskedastic, suggesting caution in interpreting the other results. The leverage plot (not shown) for BrainWt indicates a possible collinearity problem.

The Brain Weight variable is not significant at the customary 5% level, though the P-value is small (0.0628). This model is not a substantial improvement over the first model.

c.

This model is not an improvement over the prior two. We still see heteroskedasticity in the plot of residuals vs. fitted values (not shown here). We see evidence of collinearity in the large VIF for BrainWt, and only the Gestation variable is statistically significant.

Scenario 5 a.

In the correlation matrix we find that the Basic Goods index is most highly correlated with the General Index. The simple model that estimates monthly values of the General IIP from the Basic Goods IIP provides an excellent goodness of fit and the sample is large enough to invoke the CLT. However, we do see some evidence of nonlinearity in the plot of residuals vs. fitted values (below):

Given the R² value of nearly 0.99, the non-linearity may not be a major problem. The estimation results are as follows:

An increase of 1 in the Basic Goods index will be accompanied on average by an increase of approximately 1.4 in the General Index.

b. Student answers to this question will vary. In most cases the second variable will be highly correlated with the Basic Goods index, raising the issue of collinearity. VIFs will tend to be large (> 10), and non-linearity will persist.

c. See discussion in part (b) above. It is not surprising that these index variables are all highly correlated because they all measure different aspects of the fundamental production activity within the Indian economy, and all reflect the general level of economic activity.

Scenario 6
a. Student models will vary. Here is one plausible result using the Enfield and Orono columns:

The residuals appear to have a non-constant variance, which raises a problem with using this model for prediction or estimation. The model's adjusted R² is approximately 0.9, which indicates a very good fit. Both variables are statistically significant and we see no real evidence of collinearity.

b. All of these communities have been exposed to the same state and national trends described in the question. Thus, the same factors that have led to reduced waste collections in one community also lead to reduced collections in another.

Instructor Solutions
Chapter 16

Scenario 1 a.

The key results are shown above. Compared to the model using waist circumference only, this model has a slightly higher adjusted RSquare and smaller Root Mean Square Error. Both variables are statistically significant. The residuals vs. fits graph is quite similar in both models, and this model makes logical sense.

b.

Adding the interaction term improves the model slightly. The interaction term is statistically significant and the fit is very slightly better.

Scenario 2 a.

The leverage plots indicate collinearity problems, which are borne out by the parameter estimates. We see that the model has rather poor fit, and only the Weight variable is statistically significant.

b.

Adding the diastolic blood pressure measurement does not help; it is not statistically significant (as shown above). The summary-of-fit measures are slightly worse in this model, and the new variable also shows a collinearity problem.

Scenario 3 a.

This model fits the data rather well, and all coefficients are significant. We find that, other things being equal, higher rates of maternal mortality are associated with higher birthrates, and that after controlling for differences in maternal mortality, countries that do not offer lengthy maternity leaves have higher birthrates than countries with longer leaves. Residuals appear to be normally distributed with equal variances.

b. There is no significant interaction. Adding the interaction term does not improve the model.

Scenario 4 a.
For Denmark, the annual growth rate is e^0.24485191 - 1 = 0.277, or 27.7% per year.

b. For Malaysia, the annual growth rate is e^0.32149081 - 1 = 0.379, or 37.9% per year.

c.

For the U.S., the annual growth rate is e^0.22281871 - 1 = 0.250, or 25.0% per year.

d. The log-linear model does not fit any of these countries perfectly but is useful in all cases. The US has the slowest rate of growth, followed by Denmark, Malaysia and then Thailand.
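
The conversion from a log-linear slope to an annual growth rate can be sketched as follows. The exponents are the values quoted in parts a through c, taken here to be the estimated slopes of ln(Y) on year; the function name `annual_growth_rate` is illustrative. The rate is e raised to the slope, minus 1.

```python
from math import exp

def annual_growth_rate(log_linear_slope):
    """Convert a slope on ln(Y) per year into a proportional growth rate."""
    return exp(log_linear_slope) - 1

# Slopes as quoted in the solutions above.
slopes = {
    "Denmark": 0.24485191,
    "Malaysia": 0.32149081,
    "United States": 0.22281871,
}
for country, b in slopes.items():
    print(f"{country}: {annual_growth_rate(b):.1%} per year")
```

Subtracting 1 matters: e raised to the slope is the year-over-year multiplier (for example, about 1.277 for Denmark), and the growth rate is the part of that multiplier above 1.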

Scenario 5 a.

The logistic regression results appear to the left. The regressor, PPE, is statistically significant and we see that patients with Parkinson's Disease have significantly lower PPE values than patients without PD. In the Logistic Plot, the dark markers are patients with PD; we see that the estimated curve distinguishes between PD and non-PD patients.

b. This model is an improvement over the one estimated in the chapter. The ChiSquare statistic is larger here, as is the RSquare (U) statistic. In the earlier model, we had non-significant variables, but this time the independent variable is significant. The researchers did succeed.

Scenario 6 a.
The results are to the left. We find that the whole model is significant with a rather poor fit, as measured by U. Other things being equal, the longer Part a is the lower the odds that it was composed by Haydn. Conversely, the longer Part b is (holding Part a constant) the higher the odds that it was composed by Haydn.

b. [Note: to solve this problem, one needs to refer to outside sources about logistic regression.] To decide which composer is more likely to have written a sonata with a 72-measure Parta and a 112-measure Partb, we first substitute the values into the estimated equation: Log odds = 1.92488592 - 0.1249799(72) + 0.05302013(112) = -1.13541232. This is the log of the odds ratio for Haydn/Mozart, so the odds ratio is e^(-1.13541232) = 0.3213. Because the estimated ratio is well below 1, it is far more likely that Mozart would have composed such a sonata rather than Haydn.
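
The substitution can be checked with a few lines of Python. This sketch assumes a negative sign on the Parta coefficient; that sign is what reproduces the quoted odds of 0.3213 and agrees with the statement in part a that longer Part a sections lower the odds for Haydn. The constant and function names are illustrative.

```python
from math import exp

# Coefficients as quoted in the solution; the sign of COEF_PARTA is an assumption.
INTERCEPT = 1.92488592
COEF_PARTA = -0.1249799
COEF_PARTB = 0.05302013

def haydn_log_odds(parta, partb):
    """Log odds that a sonata with these section lengths is by Haydn."""
    return INTERCEPT + COEF_PARTA * parta + COEF_PARTB * partb

log_odds = haydn_log_odds(72, 112)
odds = exp(log_odds)              # odds of Haydn relative to Mozart
probability = odds / (1 + odds)   # implied probability of Haydn
print(f"log odds = {log_odds:.4f}, odds = {odds:.4f}, P(Haydn) = {probability:.3f}")
```

Converting the odds to a probability (odds divided by 1 plus odds) gives a more direct reading: an implied chance of a bit under one in four that the composer is Haydn.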

Scenario 7 a. Here are the results for the quadratic and linear fits:

We can see that the quadratic model has better goodness of fit statistics, and graphically it is clear that the parabolic model fits the observed points better than the linear model.

b. Once again the quadratic model is the better fit. The graph to the left makes the matter clear. Moreover, the RSquare statistics of the linear and quadratic models are 0.46 and 0.65, respectively. The root mean square error is similarly smaller in the quadratic model: 27.1 for the linear regression and 21.7 for the quadratic.
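
The comparison of linear and quadratic fits can be sketched in plain Python. This is a minimal illustration with hypothetical data, not the scenario's data table; `solve` and `polyfit_rmse` are illustrative helpers. It solves the least-squares normal equations directly and reports RSquare and root mean square error, with the error degrees of freedom taken as n minus the number of estimated coefficients.

```python
from math import sqrt

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polyfit_rmse(x, y, degree):
    """Least-squares polynomial fit; return (coefficients, r_squared, rmse)."""
    k = degree + 1
    # Normal equations (X'X) coef = X'y with columns 1, x, x^2, ...
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1 - sse / sst, sqrt(sse / (len(x) - k))

# Hypothetical data that rise and then fall.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 9, 14, 18, 19, 18, 15, 9, 3]
lin = polyfit_rmse(x, y, 1)
quad = polyfit_rmse(x, y, 2)
print(f"linear:    RSquare = {lin[1]:.3f}, RMSE = {lin[2]:.2f}")
print(f"quadratic: RSquare = {quad[1]:.3f}, RMSE = {quad[2]:.2f}")
```

Because the quadratic model spends one more degree of freedom, comparing RMSE (which adjusts for that) alongside RSquare is the fairer check, as the solution does.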

Instructor Solutions
Chapter 17

Scenario 1 a.

As shown above, using a 6-month season is a minor improvement over the 3-month season. The variance, MAPE, and MAE are smaller with this model than the earlier model, and RSquare is very slightly higher.

b.

As we can see above, the summary measures for the quadratic model (left) are slightly better than for the log-linear model.

c.

As shown above, the AR(2,1) model is an improvement as indicated by all measures of fit.

Scenario 2 a. Student answers will vary. Responses should note that Durables show a marked upward trend with a likely seasonal component. Below are summary results for several reasonable approaches. Among the methods available through the Time Series platform, Linear Exponential Smoothing outperforms the others according to the measures we have studied. The adjusted RSquare statistics for the regression-based models are inferior to all but the AR(1) model, as follows: Linear (0.854), Quadratic (0.855), LogLinear (0.867).

b. This table summarizes the results for all of the models shown in Part a:

Period  Actual  Holt    Winters  AR(1)   AR(1,1)  Linear  Quadratic  LogLinear
112     413.6   412.57  410.41   434.91  434.31   417.23  425.80     430.08
113     439.8   414.35  412.27   427.41  440.49   419.15  428.17     432.80
114     432.3   416.13  418.55   420.36  441.01   421.14  430.64     435.63
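
One way to compare these forecasts is to apply the MAE and MAPE definitions to the three forecast periods. The sketch below does that with the values from the table in Part b. Two assumptions are worth flagging: the three regression columns (Linear, Quadratic, LogLinear) are read row by row from the flattened table, and the function names mae and mape are hypothetical helpers rather than JMP output. The measures JMP reports in its model summaries are computed over the fitted series, so these three-period figures are a supplement, not a substitute.

```python
actual = [413.6, 439.8, 432.3]            # periods 112-114

# Forecasts by model for periods 112-114. The last three rows assume the
# regression values in the flattened table are listed period by period.
forecasts = {
    "Holt":      [412.57, 414.35, 416.13],
    "Winters":   [410.41, 412.27, 418.55],
    "AR(1)":     [434.91, 427.41, 420.36],
    "AR(1,1)":   [434.31, 440.49, 441.01],
    "Linear":    [417.23, 419.15, 421.14],
    "Quadratic": [425.80, 428.17, 430.64],
    "LogLinear": [430.08, 432.80, 435.63],
}

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

for model, predicted in forecasts.items():
    print(f"{model:10s} MAE = {mae(actual, predicted):6.2f}   "
          f"MAPE = {mape(actual, predicted):5.2f}%")
```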

Scenario 3 a. This is an annual series and therefore there can be no seasonal component.
b. Student answers will vary. For Denmark, Linear Exponential Smoothing provides a very good fit:

c. Student answers will vary. For the Malaysia data, a 3rd-degree polynomial (cubic) model provides a very good fit:

d. Student answers will vary. For the United States data, a quadratic model provides a very good fit:


e. These countries are all best approximated by different models. Effective time-series modeling requires the use of a variety of approaches.

Scenario 4 a. The fertility rate in Brazil has declined following an S-shaped curve:

An AR(1,1) model fits moderately well, with relatively high RSquare (0.969), low variance (0.077), and MAPE and MAE of 5.35% and 0.20, respectively.

b. The decline in the Russian Federation fertility rate has been rather irregular, and will not be well modeled by any of the regression methods. Simple exponential smoothing or AR(1,1) models serve well.



c. India's decline is very regular, especially since 1960. Linear Exponential Smoothing (Holt's method) and AR(1,1) models both fit extremely well.

d. The decline in China's fertility rate has been rather irregular, and will not be well modeled by any of the regression methods. An AR(1,1) model fits well.

e.

f. It is difficult to say with certainty. The AR(1,1) model produces a 2010 estimate of 2.025, which is closer to the UN figure than any of the other models presented in the chapter.


Scenario 5 a. Bangor: For this series, an AR(4,1) works moderately well. The strong seasonal element here suggests that points are correlated with the observation 4 quarters earlier.

b. Bucksport: After the fourth quarter in the data table (i.e. from 2000 onward) something dramatically changed in Bucksport. The series is very stable and generally stationary from 2000 onward. An AR(4,0) or AR(4,1) model works well.

c. Enfield: This pattern is much like the one in Bangor; once again an AR(4,1) model fits well.

d. Orono: The pattern in Orono maps closely to that in Bucksport. An AR(4,0) or AR(4,1) model works well.

e. Winslow: Here we see the dramatic change occurring roughly halfway through the time series. Simple exponential smoothing provides a reasonably good model.

f. The autoregressive model with a lag of 4 quarters worked well in most cases, but not all. It makes sense that solid waste generation varies seasonally. On the other hand, it is pretty evident from the graphs that in some communities there were major changes in practice at some point during the period, which makes it unlikely that any one approach would work equally well.
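
The reasoning behind the 4-quarter lag can be checked numerically: in a quarterly series with a seasonal pattern, each observation should be strongly correlated with the one 4 quarters earlier. The sketch below computes sample autocorrelations at several lags. The series is hypothetical (a fixed seasonal pattern repeated for ten years), and the helper name autocorr is an illustrative choice, not a JMP function.

```python
def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Hypothetical quarterly tonnage: the same four-quarter pattern for 10 years.
seasonal = [10, 14, 20, 12]
series = [100 + seasonal[t % 4] for t in range(40)]

for lag in (1, 2, 3, 4):
    print(f"lag {lag}: r = {autocorr(series, lag):+.3f}")
```

A lag-4 autocorrelation that stands well above the values at lags 1 through 3 is the signature of a quarterly seasonal component. A series with a structural break, such as the ones described for Bucksport and Winslow, would not show so clean a pattern.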


Scenario 6 a. CO2 emissions in Afghanistan have fallen since the series began, and have leveled off (with minor increases) in the most recent years. For this series, a log-linear model fits quite well (Rsqr = 0.905). The other time series methods do not fit quite as well, though an AR(1,1) provides a good fit.

b. In the Bahamas there has also been a steady decline, with one unusual jump in 2002. This irregularity makes it difficult to fit a regression trend model well, and none of our models will account for the small spike completely. A 3rd-degree polynomial (cubic) provides a moderately good fit, as does AR(1,1).

c. In sharp contrast to the prior two graphs, China's CO2 emissions have been rapidly rising. A 3rd-degree polynomial (cubic) provides a moderately good fit, as does AR(1,1).

d. The pattern in the Sudan time series is somewhat similar to China in recent years, but in the early 1990s carbon emissions were falling. A 2nd-degree (quadratic) polynomial model fits reasonably well. An AR(1) model is also a good fit, but AR(1,1) is not, in contrast to earlier series.

e. CO2 emissions in the US rose for much of the period and seem to have leveled off, presenting a quite different pattern from the prior 4 nations. A 2nd-degree polynomial fits best.

f. There is no single model that fits all of these series, probably because the extent of use of CO2-generating technologies varies considerably across these countries. Some are reducing emissions while others are making greater use of activities that emit CO2.


Scenario 7 a. The series to the left would be poorly described with any type of linear trend model because it exhibits several changes of direction. Because we have just 6 months of data, we should not use Winters' method, which accounts for seasonal variation.

b. Student answers will vary. A simple exponential smoothing model fits the data as well as any other model in the chapter. The five daily forecasts from that model are all the same: approximately $57.59.
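
Students sometimes ask why all five forecasts are identical. Simple exponential smoothing tracks only a level, with no trend or seasonal term, so every forecast beyond the data equals the last smoothed level. The sketch below illustrates this under stated assumptions: the prices are hypothetical, the smoothing weight alpha is fixed at an arbitrary value (JMP estimates it from the data), and the level is initialized at the first observation, which is one common convention among several.

```python
def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing: return `horizon` forecasts beyond the data."""
    level = series[0]                      # initialize the level at the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    # With no trend or seasonal term, every future forecast equals the last level.
    return [level] * horizon

prices = [56.2, 57.1, 58.4, 57.9, 57.3, 57.6]   # hypothetical daily closing prices
forecasts = ses_forecast(prices, alpha=0.4, horizon=5)
print([round(f, 2) for f in forecasts])
```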

Scenario 8 a. The Nikkei225 has the highest correlation with the S&P500 (r = 0.9812), and the FTSE100 is close behind (r = 0.9810).

b. The models should be for the Nikkei and S&P. The two series are shown below.

For the S&P no model is perfect, but AR(2,1) provides comparably low variance, MAE, and MAPE, and high RSqr.

Much like the S&P series, the Nikkei is well modeled with an AR(2,1) model.



c. Yes. Both markets are engaged in competition in the same global markets, and move very closely together, as indicated by their very high correlation.

d. Student answers will vary. The AR(2,1) model works rather well for the HangSeng data. Based on that model, the forecasts are as follows:

01/05/2009  13972.2114
01/12/2009  13804.8764
01/19/2009  13510.4924

The graph to the right shows the extrapolation. We can be 95% confident that the index would lie within the blue confidence limits. Beyond that it is difficult to specify a confidence level in the point estimates, but they appear to be a reasonable extrapolation beyond the observed data.


Instructor Solutions
Chapter 18

Scenario 1 a. The first 5 rows are shown to the left.

b. The table now has 80 rows in all, rather than 40, due to the addition of a fourth binary variable. Because we're using a different random seed, the randomization of runs (expressed in the patterns) has also changed from the table shown in Figure 18.5.

c. Assuming we follow the example presented in the chapter, we now have 50 experimental runs, the first 10 of which are assigned to team member #1. Each team member will perform 10 of the 16 possible runs, with each member having a slightly different pattern assigned randomly.

d. This question asks us to modify the original design from part a to be a half-fractional design, with 40 runs rather than the original 80.

As shown in the aliasing table, we lose the ability to differentiate between three pairs of interaction effects.

Scenario 2 a. There will be 32 runs in a Resolution IV, full-factorial design.


b.

c. [NOTE: The question should read: Briefly explain what happens when we move from a two-factor screening design to a five-factor design.] In a two-factor screening design there would be just four runs (2^2 = 4), whereas the five-factor design has 2^5 = 32 runs.



d. With an additional factor level for the first categorical factor, the number of runs increases to 48.
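
The run counts in this scenario follow from multiplying the number of levels of each factor. As a quick check, the sketch below enumerates every combination for a two-factor design, a five-factor two-level design, and a design whose first factor has three levels. The function name full_factorial is a hypothetical helper, and the levels are coded simply as 0, 1, 2 rather than with JMP's pattern notation.

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Enumerate every combination of factor levels (levels coded 0, 1, 2, ...)."""
    return list(product(*[range(n) for n in levels_per_factor]))

two_factor = full_factorial([2, 2])               # 2^2 runs
five_factor = full_factorial([2, 2, 2, 2, 2])     # 2^5 runs
extra_level = full_factorial([3, 2, 2, 2, 2])     # first factor gains a third level

print(len(two_factor), len(five_factor), len(extra_level))
```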

Scenario 3 a. The first five rows of the data table, including Patterns, are shown above.

b. To accommodate 72 subjects, we now specify 5 replications (that is, 6 sets of 12 runs each) for a total of 72 runs. The patterns are now as follows:

Note that the runs have again been randomized; in this particular case, the 2-interruption condition does not apply to any of the first five experimental subjects, but does appear for the 6th subject.

c. With 72 subjects, the prediction profiler shows that the variance ranges from approximately 0.042 to approximately 0.056. With 144 subjects, the corresponding variance range is reduced by half, ranging from approximately 0.021 to 0.028.
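
The halving of the variance range can be motivated with a short derivation. It rests on two assumptions not stated in the solution: that the profiler reports the relative prediction variance at a point x_0, and that the 144-run design is the 72-run design repeated twice (both being replicates of the same 12-run base). Here X denotes the model matrix of the 72-run design and X_2 that of the 144-run design.

```latex
\[
\operatorname{Var}_{\mathrm{rel}}(x_0) = x_0^{\top}\,(X^{\top}X)^{-1}\,x_0 ,
\qquad
X_2 = \begin{bmatrix} X \\ X \end{bmatrix}
\;\Longrightarrow\;
X_2^{\top}X_2 = 2\,X^{\top}X
\;\Longrightarrow\;
x_0^{\top}\,(X_2^{\top}X_2)^{-1}\,x_0
  = \tfrac{1}{2}\;x_0^{\top}\,(X^{\top}X)^{-1}\,x_0 .
\]
```

Under those assumptions, doubling the number of subjects halves the relative prediction variance at every point in the design space, which agrees with the move from roughly 0.042 to 0.056 down to roughly 0.021 to 0.028.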

Scenario 4 a. Categorical factors: type of incentive, timing of incentive, survey mode, guarantee vs. lottery.
Continuous factors: duration of survey, number of contacts made, amount of money offered.

b. [Student answers will vary]

Type of incentive: monetary/non-monetary. Might also include none as a control, or vary the specific non-monetary incentives.
Timing of incentive: as described, point of contact vs. completion of survey.
Survey mode: telephone, email.
Nature of gift: guarantee vs. entry into lottery.

c. Assuming that we use the minimal number of factor levels described in b, and two factor levels for the continuous factors, we would have four dichotomous categorical factors and three continuous factors. This would, then, require 2 x 2 x 2 x 2 x 2 x 2 x 2 = 2^7 = 128 runs.


d. We have 8 patterns, repeated 4 times for a total of 32 runs. Each factor level appears 16 times within the design, and the runs are randomized (for example, above we see the first 8 rows and notice that rows 1, 2 and 5 are all the same pattern). Many of the conceivable patterns (e.g., -------) do not appear in the design. This is a Resolution 3 design that will allow us to discern main effects but no interactions.
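
To see how 7 two-level factors can be screened with only 8 distinct patterns, the sketch below builds a 2^(7-4) fractional factorial from a full factorial in three base factors. The generators used here (D = AB, E = AC, F = BC, G = ABC) are a common textbook choice and are an assumption; JMP may select different generators, so the specific patterns need not match the table in the solution. The seed value is likewise arbitrary.

```python
import random
from itertools import product

def fractional_factorial_7_4():
    """Build the 8 runs of a 2^(7-4) design; levels coded -1 and +1."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):     # full factorial in A, B, C
        d, e, f, g = a * b, a * c, b * c, a * b * c  # assumed generators
        runs.append((a, b, c, d, e, f, g))
    return runs

base = fractional_factorial_7_4()
design = base * 4                       # 4 replicates of each pattern -> 32 runs
random.Random(2010).shuffle(design)     # randomize run order (arbitrary seed)

for run in design[:8]:
    print("".join("+" if level > 0 else "-" for level in run))
```

With these generators each factor sits at its low level in 16 of the 32 runs and at its high level in the other 16, and the all-low pattern never occurs because D, E, and F are products of two low levels. Every main effect is aliased with two-factor interactions, which is why a Resolution III design cannot separate interactions from main effects.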

Scenario 5 a. Here are the first five rows of the table:

b. Here are the first five rows of this table:

c. The full-factorial design has 480 runs and the response-surface custom design has 640. In the initial design, the AntiUV additive is tested at levels of 3, 5 and 10, with each of the three tested in one-third of the runs. In the revised design, the levels are 3, 6.5, and 10, but the intermediate 6.5 level is only tested in 5% of the experimental runs.


Scenario 6
a. This table has 72,072 rows. Here are the first five:

b. This table has only 27,027 rows with these as the first five:

c. In the full factorial design, every combination of all levels of the five factors (2 x 3 x 2 x 3 x 2 = 72) is tested, whereas in the reduced custom design far fewer are tested because interactions are limited to two factors at a time.


Instructor Solutions
Chapter 19

Scenario 1 a.

Machine A455

Machine C334

As we can see in the graphs above, Machine C335 may have an unstable standard deviation and machine A455 shows two sample means beyond the control limits. These machines should be inspected closely for possible adjustment.



b. Measurements for operator MKS (charts shown above) appear to be out of control throughout Phase 1.

c. This capability analysis shows that 5% of the observations lie outside the capability limits, indicating that the process is capable of producing tubing that is within .5 mm of 4.5.

Scenario 2 a.
This process is out of control at one point. Because a day with 0 cancellations is desirable, we should not be concerned about dates with values below the LCL. However, the chart shows 1 date well above the UCL.


b. This is essentially the same data as the previous chart, again showing the process out of control at several points. Because a day with 0 cancellations is desirable, we should not be concerned about dates with values below the LCL. However, the chart shows 2 dates above the UCL, and the final date appears to be at the UCL.

Scenario 3 a.
Production of basic goods has been rising steadily over time, which is a good thing. This is not a process designed for a constant target, but rather one of continuous growth.

b. This is a distinct seasonal pattern; not only does the level of production vary by season, but the variability of production also varies predictably by season.



c. Once again we see a steady pattern of growth, with clear seasonal variation. In contrast to the control chart for Basic Goods, the one for NonDurables may exhibit a more linear upward trend, and substantial growth in variability (the R Chart) in the most recent years. Because the need for basic goods probably follows the growth in population we might expect steady growth akin to population trends.

d. Again we see a strong seasonal pattern, though the amount of variability increases noticeably on the right side of the graph. As the total amount of NonDurables produced has increased, so has the extent of seasonal variation, perhaps caused by underlying changes in the demand for non-durables.

Scenario 4 a. In most regions except for the Southwest, the standard deviations are sufficiently unstable that we should not interpret the Xbar charts. In the Southwest, the standard deviations have been steadily increasing, and the limited data (only five sample means) indicate increasing mean times to restore the area to safety, though still within control limits.

b. In most regions except for the Eastern regions, the standard deviations are sufficiently unstable that we should not interpret the Xbar charts. In the East, there is a single sample in which the standard deviation spikes well above the UCL. Otherwise, both the S chart and the Xbar chart show a process that is mostly in control.


Scenario 5 a. Bangor: The S chart is stable; early in the study period there was one year below the LCL. Otherwise Bangor has remained within limits, though the 6 most recent years have been above average.

b. Bucksport: The S chart is stable except for the first year, as is the Xbar chart. After an initial year with extraordinarily high output, the process has remained very stable.

c. Enfield: This pattern is much like the one in Bangor. The S chart is stable throughout. Enfield has remained within limits, though 5 of the 6 most recent years have been above average.

d. Orono: The pattern in Orono is similar to Bucksport. The S chart is stable except for the first year, as is the Xbar chart. After an initial year with extraordinarily high output, the process has remained very stable; however, because the control limits are computed using all years in Orono, we see a control chart in which no single sample falls within the control limits. The first year was so extraordinarily large compared to later years that every subsequent year lies below the LCL.



e. Winslow: In year 3 the S chart (not shown) shows the sample standard deviation above the UCL; otherwise the standard deviations are moderately stable. The Xbar chart shows a process out of control until year 6, after which the process seems to be in control.
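
The Orono result in part d, where no sample falls inside the control limits, can be reproduced with a small calculation. The sketch below computes Xbar chart limits from the average subgroup standard deviation using the standard textbook formula (center line plus or minus 3 times S-bar divided by c4 times the square root of n). The data are hypothetical, with one extreme first year followed by nine stable years; the function names c4 and xbar_limits are illustrative, and JMP's exact sigma estimate may differ in detail.

```python
import math
import statistics

def c4(n):
    """Bias-correction constant for the sample standard deviation (subgroup size n)."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def xbar_limits(subgroups):
    """Return (subgroup means, LCL, UCL) for an Xbar chart based on S-bar."""
    n = len(subgroups[0])                      # assumes equal subgroup sizes
    means = [statistics.mean(g) for g in subgroups]
    s_bar = statistics.mean(statistics.stdev(g) for g in subgroups)
    center = statistics.mean(means)
    half_width = 3 * s_bar / (c4(n) * math.sqrt(n))
    return means, center - half_width, center + half_width

# Hypothetical quarterly tonnage: one extreme first year, then nine stable years.
years = [[1000, 1010, 990, 1000]] + [[100, 102, 98, 100] for _ in range(9)]
means, lcl, ucl = xbar_limits(years)
inside = [m for m in means if lcl <= m <= ucl]
print(f"LCL = {lcl:.1f}, UCL = {ucl:.1f}, subgroups inside limits: {len(inside)}")
```

Because the single extreme year pulls the center line far above the later yearly means while the within-year variation stays small, the limits are narrow and sit between the two groups: the first year lies above the UCL and every later year lies below the LCL.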

Scenario 6 a. Emissions in most regions are relatively stable. In Africa (shown to the left), both the ranges and means have been steadily rising over the 15-year period.

b. The general message in this set of charts is the same as in Part a. The Xbar charts are all identical to what we saw previously. The S Chart for Africa appears to exhibit more dramatic growth than the Range chart.


Scenario 7 a.
Given the instability in the standard deviations, we should be reluctant to interpret the Xbar chart. However, we might observe that for roughly the first 10 samples both the standard deviations and means tended to be substantially higher than for the remainder of the period. It would appear that there was a fundamental process change leading to shorter and more predictable departure delays sometime around the 10th sample.

b. The pattern in these charts is very similar to that of the prior charts except that the magnitude of the times is larger in this case. Again, with the instability of the standard deviations it is risky to interpret the Xbar chart. We do, however, see a dramatic change around sample 10.



c. We need to select the weekday flights; because the target involves individual flights, we make a Run Chart. The critical capability limit here is the USL, which we set at 20 minutes; the other values may be set to zero. We see that 16% of the flights had delays of more than 20 minutes. Therefore the current process is not capable of meeting the goal.

Scenario 8 a.
It appears that the variability of the process standard deviation has increased over time, with one recent S above the UCL. Nearly all of the sample means are within the control limits; early in the observation period (roughly the first 15 samples) the mean magnitudes remained quite close to 4.0. Since that time, the fluctuations in mean magnitude have increased even as the mean appears to have remained stable.


b.

With a larger sample size there are naturally fewer sample means. The averages in both mean charts are the same, but otherwise the computed values are different. In this chart, the control limits for both graphs are closer to the mean than in the earlier chart. Again we see increasing oscillation in the sample standard deviations, but otherwise this process is in control.
