
Stat 305 Final Practice Solutions

1. Enterprise Industries produces Fresh, a brand of liquid laundry detergent. In order to more
effectively manage its inventory, the company would like to better predict demand for Fresh. To
develop a prediction model, the company has gathered data concerning demand for Fresh over
the last 30 sales periods (each sales period is defined to be a four-week period). For this data
set, let

$x_1$ = the price (in dollars) of Fresh as offered by Enterprise Industries in the sales period minus
the average industry price (in dollars) of competitors' similar detergents in the sales period

$x_2$ = Enterprise Industries' advertising expenditure (in hundreds of thousands of dollars) to
promote Fresh in the sales period

$y$ = the demand for Fresh (in hundreds of thousands of bottles) in the sales period
Refer to Output A for parts (a)-(b).
a) [4] Based on your interpretation of the scatterplots provided, state the model equations that
might adequately describe the relationship of i) $y$ with $x_1$ and ii) $y$ with $x_2$. Your answers
here should be similar in form to the following incorrect answer:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$.

i) $y = \beta_0 + \beta_1 x_1 + \varepsilon$

ii) $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon$

b) [2] If one would fit the model $y \approx \beta_0 + \beta_1 x_1$ (which may or may not correctly reflect the
relationship between $y$ and $x_1$) to the data set, what would be the value of $R^2$, the coefficient of
determination?

$R^2 = 0.8897^2 = 0.7916$

From among several models for $y$ as a function of $x_1$ and $x_2$, the following model (Model 1) was
selected: $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \beta_3 x_1 + \beta_4 x_1 x_2 + \varepsilon$.
c) [5] A normal quantile plot of residuals and a plot of the residuals versus the predicted values
are shown in Output B. Describe how you may use these plots to examine whether certain
model assumptions are appropriate here. State the assumptions under consideration and
identify clearly the plot you would use for assessing each assumption.

The model assumes $\varepsilon \sim \text{iid } N(0, \sigma^2)$.

The constant variance assumption can be assessed with the plot of the residuals versus the
predicted values. If the residuals form a fan shape, the constant variance assumption may not
be appropriate for the data. If the model is appropriate for the data, the residuals should
form a patternless cloud of roughly constant spread.

The normality assumption can be assessed with the normal quantile plot. If the points follow
a fairly linear pattern, especially in the middle of the plot, the normality assumption is
reasonable for the data.
Refer to Output C for parts (d)-(g).
d) [4] Predict the demand for the next sales period (in hundreds of thousands of bottles) if the
price difference will be -0.20 (dollars) and the advertising expenditure for Fresh will be 5.0
(hundreds of thousands of dollars).

$$\hat{y} = 29.11 - 7.61(5) + 0.67(5^2) + 11.13(-0.20) - 1.48(-0.20 \times 5) = 7.064$$
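
The arithmetic in part (d) can be checked with a short script. This is only a sketch: the coefficients are the rounded estimates quoted in the worked answer (Output C is not reproduced here), their signs are chosen so that the sum matches the stated result, and the function and variable names are illustrative.

```python
# Rounded Model 1 estimates as quoted in the solution to part (d).
# Signs are an assumption chosen so the arithmetic reproduces the stated answer.
B0, B_ADV, B_ADV_SQ, B_PRICE, B_INTERACTION = 29.11, -7.61, 0.67, 11.13, -1.48


def predict_demand(price_diff, adv):
    """Predicted demand (hundreds of thousands of bottles) under Model 1."""
    return (B0
            + B_ADV * adv
            + B_ADV_SQ * adv ** 2
            + B_PRICE * price_diff
            + B_INTERACTION * price_diff * adv)


y_hat = predict_demand(price_diff=-0.20, adv=5.0)
print(round(y_hat, 3))
```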
e) [5] Calculate a 90% confidence interval for $\beta_2$ using the information provided. Use this interval
to test the hypotheses concerning whether a quadratic term is needed in the model or not.
State your decision.

$$0.6712 \pm 1.708(0.2027) = (0.32, 1.02)$$

Since 0 is not in the interval, we can conclude that $\beta_2 \neq 0$. Hence the quadratic
term is needed in the model. Note that the value used for $t$ is based on 25 df.
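
The interval in part (e) follows the usual form, estimate plus or minus (t quantile) times (standard error). The sketch below recomputes it using the three numbers quoted in the solution. The helper name `t_interval` is hypothetical, and the critical value 1.708 is copied from the solution instead of being looked up in code.

```python
def t_interval(estimate, std_error, t_crit):
    """Two-sided interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Values quoted in the solution: estimate, standard error, t quantile (25 df).
lower, upper = t_interval(0.6712, 0.2027, 1.708)
contains_zero = lower <= 0.0 <= upper

print(round(lower, 2), round(upper, 2), contains_zero)
```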
f) [4] State the null and the alternative hypotheses concerning whether the interaction term is
needed in the model or not. Continue to follow the five-step format to perform a hypothesis test.

$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
$$t = \frac{-1.4777 - 0}{0.6672} = -2.21$$
p-value = 0.0361

Since the p-value is less than 0.05, we can reject the null hypothesis and conclude that
the interaction term is needed in the model.
g) [2] Give the estimate for $\sigma^2$.
$\hat{\sigma}^2 = \text{MSE} = 0.04258$

h) [6] Since Enterprise Industries has to pay someone to visit several stores and gather
information on the prices for similar detergents produced by competitors during every sales
period, Enterprise Industries is wondering if using only advertising expenditure to predict
demand is equivalent to using both advertising expenditure and price difference to predict
demand. Output D contains the output for a model that uses only advertising expenditure to
predict demand, i.e., $y = \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \varepsilon$ (Model 2). Follow the five-step format to justify
using Model 1 or Model 2 to predict demand. Note that you will need to provide the value of a
test statistic that is distributed according to an F -distribution.

$H_0: \beta_3 = \beta_4 = 0$
$H_a$: at least one $\beta_j \neq 0$ for $j = 3, 4$
$$f = \frac{(2.18 - 1.06)/(27 - 25)}{1.06/25} = 13.2 \sim F_{2,25}$$
$Q(.95) = 3.39$
p-value < .001

The p-value is less than 0.05, so we reject $H_0$. Therefore, use the full model (Model 1) to predict
demand.
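
The same reduced-versus-full comparison appears again in Problems 3(g) and 4(b), so it may help to see the statistic as a small function. This is a sketch under the assumption that each model is summarized by its error sum of squares and error degrees of freedom. The name `partial_f` is illustrative, and the 3.39 cutoff is the tabled F quantile quoted above.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a reduced model to a full model.

    Numerator: drop in error sum of squares per parameter removed.
    Denominator: mean squared error of the full model.
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Model 2 (reduced): SSE 2.18 on 27 df. Model 1 (full): SSE 1.06 on 25 df.
f_stat = partial_f(2.18, 27, 1.06, 25)
reject_h0 = f_stat > 3.39  # 0.95 quantile of F(2, 25), taken from the solution

print(round(f_stat, 1), reject_h0)
```

Calling `partial_f(246.63, 23, 11.489, 21)` or `partial_f(364346.2, 33, 321450.5, 32)` gives the statistics for the later problems in the same way.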
Output A:

Output B:

Output C:

Output D:

2. Fill in the blanks for the following Analysis of Variance table.

Source     DF      Sum of Squares   Mean Square   F Ratio
Model      4       400              ____c____     ___e___
Error      __a__   ___b___          ____d____
C. Total   24      800

Answers: a=20, b=400, c=100, d=20, e=5.
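
Each blank follows from the identities that tie an ANOVA table together: degrees of freedom and sums of squares add up to the totals, a mean square is a sum of squares divided by its degrees of freedom, and the F ratio is the model mean square over the error mean square. The sketch below assumes a Model DF of 4, which is the value implied by the stated answers (24 - 20).

```python
# Known table entries. The Model DF of 4 is an assumption implied by the answers.
df_model, df_total = 4, 24
ss_model, ss_total = 400.0, 800.0

a = df_total - df_model   # Error DF
b = ss_total - ss_model   # Error sum of squares
c = ss_model / df_model   # Model mean square
d = b / a                 # Error mean square (MSE)
e = c / d                 # F ratio

print(a, b, c, d, e)
```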


3. A student measured his car mileage at different combinations of speed (55, 60, 65 and 70 mph) and
octane (87 and 90). The data set consists of 24 observations - three observations for each
combination of levels. The fitted model is in the form $\hat{y} = b_0 + b_1 x_{\text{speed}} + b_2 x_{\text{octane}}$.
Refer to Output A to answer the following questions.
a) Calculate $R^2$, the coefficient of determination.
$R^2 = 235.14 / 246.63 = 0.95$
b) Calculate the residual for the observation given in the first line of the data table.

$$y_1 - \hat{y}_1 = 30 - (-133.029 - 0.176 \times 55 + 1.981 \times 87) = 30 - 29.637 = 0.363$$
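
A residual is the observed value minus the fitted value, which the following sketch recomputes for the first data row. The coefficients are the rounded estimates quoted in the solution, with signs chosen to match the stated fitted value, so the result may differ from the solution's figure in the third decimal place. The names are illustrative.

```python
# Rounded estimates quoted in the solution (signs assumed from the fitted value).
B0, B_SPEED, B_OCTANE = -133.029, -0.176, 1.981


def fitted_mileage(speed, octane):
    """Fitted mileage (mpg) from the two-predictor linear model."""
    return B0 + B_SPEED * speed + B_OCTANE * octane


observed = 30.0  # mileage in the first line of the data table
residual = observed - fitted_mileage(speed=55, octane=87)

print(round(fitted_mileage(55, 87), 3), round(residual, 3))
```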
c) Give a 99% confidence interval for $\beta_1$.

$$-0.176 \pm 2.831(0.027) = (-0.252, -0.100)$$
d) Give the estimate for $\sigma$.

$$\hat{\sigma} = \sqrt{0.547} = 0.7396$$

f) Interpret the confidence interval calculated by JMP for the observation described by the last line
in the data table.
For a speed of 55 mph and 87 octane, we are 95% confident that the average mileage is
between 28.97 and 30.19 mpg.

g) To compare model $y \approx \beta_0 + \beta_1 x_{\text{speed}} + \beta_2 x_{\text{octane}}$ to model $y \approx \beta_0$, state the null and alternative
hypotheses, the formula for the test statistic, the formula with the appropriate values as provided
by the JMP output, the p-value and the conclusion.

$H_0: \beta_1 = \beta_2 = 0$
$H_a$: at least one of $\beta_1$, $\beta_2 \neq 0$
$$f = \frac{(246.63 - 11.489)/(23 - 21)}{11.489/21} = 214.90$$
$P[F_{2,21} > f] < .0001$

Since the p-value is less than 0.05, we can conclude that at least one of the parameters is not
equal to zero. Therefore, the model that includes both speed and octane along with the
appropriate parameter estimates should be used to predict average mileage.
Output A:

First five lines from the data table:

Speed   Octane   Mileage   Lower 95% Mean mileage   Upper 95% Mean mileage
55      87       30        28.9687622               30.1929045
60      87       29        28.2334503               29.164883
65      87       28        27.3517837               28.2832163
70      87       27        26.3237622               27.5479045
55      87       30.5      28.9687622               30.1929045

4. The Department of Transportation (DOT) conducted an experiment to determine the relationship
between the curing process, which is characterized by time and temperature, and maximum
compressive strength of concrete (psi). There are four levels for time: 1, 2, 5 and 10 days. There
are three levels for temperature: 40, 60 and 80 degrees Fahrenheit. And there are three
observations for each combination of levels.
a) In order for inference about quantities such as $\beta_1$ to be valid, what do we need to assume about
errors (residuals) for any linear regression model?
$\varepsilon \sim \text{iid Normal}(0, \sigma^2)$
b) Compare the model $y \approx \beta_0 + \beta_1 x_{\text{time}} + \beta_2 x_{\text{temp}}$ (refer to Output B) to the model
$y \approx \beta_0 + \beta_1 x_{\text{time}} + \beta_2 x_{\text{temp}} + \beta_3 x_{\text{time}} x_{\text{temp}}$ (refer to Output C). Which model (along with the
appropriate parameter estimates) would you use to predict strength, and why? Follow the five-step format
and provide a test statistic that is distributed according to the F-distribution.
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
$$f = \frac{(364346.2 - 321450.5)/(33 - 32)}{321450.5/32} = 4.27 \sim F_{1,32}$$
Using $F_{1,30}$: $Q(.95) = 4.17 < 4.27 < Q(.99) = 7.56$, so $.01 <$ p-value $< .05$.

Since the p-value is less than .05, there is enough evidence to conclude that $\beta_3 \neq 0$. Hence, we should
fit $y \approx \beta_0 + \beta_1 x_{\text{time}} + \beta_2 x_{\text{temp}} + \beta_3 x_{\text{time}} x_{\text{temp}}$ to the data set and use the model with the
appropriate parameter estimates to predict strength.

Output B:

Output C:

5. Circle either T (true) or F (false).

T  F   Suppose a 95% confidence interval for the difference of two population
       means is (-1.3, 4.1). According to the 95% confidence interval, the p-value would
       be less than 0.05 based on a null hypothesis that states there is no difference.

T  F   When one says (0, 10) is a 95% CI for $\mu$, one means that the probability
       that $\mu$ lies within the interval is .95.

T  F   A 99% confidence interval is wider than a 95% confidence interval for a given
       data set.

Answers: F, F, T.

6. Fill in the blank(s) with the appropriate answer.

a) A Type __________ error occurs when one says that there is a difference between two population
means when the difference is zero as stated by the null hypothesis.
b) The letters iid stand for __________________________________________________.
c) The sample mean for large samples (samples with 30 or more values) is approximately
normally distributed according to the ____________ ____________ Theorem.

Answers: a) I; b) independently and identically distributed; c) Central Limit.

7. A new experimental drug to reduce cholesterol was developed. Five people were chosen to receive
the new drug. Each person had his/her cholesterol measured before taking the drug. Then each
person took the drug for a six-week period and had his/her cholesterol measured again.
a) Give and interpret a 95% confidence interval for the mean difference between the before and
after cholesterol measurements.

Person   1     2     3     4     5
Before   200   220   180   195   240
After    180   190   165   175   180

Differences (after - before): -20, -30, -15, -20, -60

$\bar{d} = -29$, $s_d^2 = 330$, $df = n - 1 = 5 - 1 = 4$

$$\left(-29 - 2.776\sqrt{\frac{330}{5}},\; -29 + 2.776\sqrt{\frac{330}{5}}\right) = (-51.55, -6.45)$$

We are 95% confident that the mean decrease in cholesterol after taking the new drug for six weeks
will be between 6.45 and 51.55.
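
The paired interval can be reproduced from the raw before and after measurements. This sketch uses only Python's standard library. The critical value 2.776 (t with 4 df) is copied from the solution instead of being computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

before = [200, 220, 180, 195, 240]
after = [180, 190, 165, 175, 180]

# Paired differences, taken as after minus before to match the solution.
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)        # sample mean of the differences
s2_d = variance(diffs)     # sample variance (n - 1 in the denominator)
t_crit = 2.776             # 0.975 quantile of t with 4 df, from the solution

half_width = t_crit * sqrt(s2_d / len(diffs))
ci = (d_bar - half_width, d_bar + half_width)

print(d_bar, s2_d, tuple(round(x, 2) for x in ci))
```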

8. An engineer is concerned about spring lifetimes ($10^3$ cycles) under two different levels of stress:
900 N/mm^2 and 950 N/mm^2. Below are the data.
950 N/mm^2: 225, 171, 198, 189, 189, 135, 162, 135, 117, 162
900 N/mm^2: 216, 162, 153, 216, 225, 216, 306, 225, 243, 189
Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between 900 N/mm^2 stress level and 950 N/mm^2 stress level is not equal to zero.

$H_0: \mu_{900} - \mu_{950} = 0$
$H_a: \mu_{900} - \mu_{950} \neq 0$

$\bar{x}_{950} = 168.3$, $s_{950}^2 = 1098.9$
$\bar{x}_{900} = 215.1$, $s_{900}^2 = 1844.1$

$$s_p = \sqrt{\frac{1098.9(9) + 1844.1(9)}{18}} = 38.36$$

$$t = \frac{215.1 - 168.3 - 0}{38.36\sqrt{\frac{1}{10} + \frac{1}{10}}} = 2.73$$

With 18 df, 2.73 falls between the .99 quantile (2.552) and the .995 quantile (2.878) of the $t$ distribution, so
$.005 < .5p < .01$, that is, $.01 < p < .02$.

The p-value is less than .05, so we will reject the null hypothesis and conclude that there is a difference
in mean lifetimes between 900 N/mm^2 stress level and 950 N/mm^2 stress level.
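
The pooled standard deviation and the t statistic can be recomputed directly from the two samples. The following is a sketch using only the standard library. It assumes equal population variances, as the pooled procedure in the solution does, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

life_950 = [225, 171, 198, 189, 189, 135, 162, 135, 117, 162]
life_900 = [216, 162, 153, 216, 225, 216, 306, 225, 243, 189]

n_900, n_950 = len(life_900), len(life_950)

# Pooled variance: sample variances weighted by their degrees of freedom.
pooled_var = (((n_900 - 1) * variance(life_900)
               + (n_950 - 1) * variance(life_950))
              / (n_900 + n_950 - 2))
s_p = sqrt(pooled_var)

# Two-sample t statistic for H0: mu_900 - mu_950 = 0.
t_stat = (mean(life_900) - mean(life_950)) / (s_p * sqrt(1 / n_900 + 1 / n_950))

print(round(s_p, 2), round(t_stat, 2))
```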
9. An engineer is concerned about spring lifetimes ($10^3$ cycles) under two different levels of stress:
900 N/mm^2 and 950 N/mm^2. This time the engineer performed this experiment with a total of 100
springs. Below are the data.

$\bar{x}_{950} = 154.1$, $s_{950}^2 = 1315.2$, $n_{950} = 60$
$\bar{x}_{900} = 168.8$, $s_{900}^2 = 1902.8$, $n_{900} = 40$

Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between 900 N/mm^2 stress level and 950 N/mm^2 stress level is not equal to zero.

$H_0: \mu_{900} - \mu_{950} = 0$
$H_a: \mu_{900} - \mu_{950} \neq 0$

$$z = \frac{168.8 - 154.1 - 0}{\sqrt{\frac{1902.8}{40} + \frac{1315.2}{60}}} = 1.76$$

$P[|Z| > 1.76] = 2P[Z < -1.76] = .0784$

The p-value is greater than .05, so we will not reject the null hypothesis. Hence, there is not enough
evidence to conclude that there is a difference in mean lifetimes between 900 N/mm^2 stress level and
950 N/mm^2 stress level.
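
For large samples, the statistic and its two-sided p-value can be recomputed from the summary statistics alone. This sketch uses the standard normal tail identity P[|Z| > z] = erfc(z / sqrt(2)), so no statistical package is needed. The variable names are illustrative.

```python
from math import erfc, sqrt

# Summary statistics given in the problem.
mean_900, var_900, n_900 = 168.8, 1902.8, 40
mean_950, var_950, n_950 = 154.1, 1315.2, 60

# Large-sample z statistic for H0: mu_900 - mu_950 = 0.
std_error = sqrt(var_900 / n_900 + var_950 / n_950)
z = (mean_900 - mean_950) / std_error

# Two-sided p-value for a standard normal: P[|Z| > |z|].
p_value = erfc(abs(z) / sqrt(2))

print(round(z, 2), round(p_value, 4), p_value > 0.05)
```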
