Chapter 11
Model Building
11.2
a.
b.
c.
d.
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat\beta_1 - 0}{s_{\hat\beta_1}} = \frac{-941.900}{275.08} = -3.42$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (2 + 1) = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110.
Since the observed value of the test statistic falls in the rejection region (t = −3.42 < −2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05.
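Every coefficient t test in this chapter uses the same arithmetic. You divide the estimate by its standard error and compare the result with a tabled critical value. The Python sketch below is illustrative only. The function name coef_t_test is hypothetical, and the code does not compute t quantiles, so the critical value must still come from Table VI for df = n − (k + 1). It reproduces part d above and Exercise 11.4 a:

```python
def coef_t_test(beta_hat, se, t_crit, alternative="two-sided", beta_null=0.0):
    """Test H0: beta = beta_null with t = (beta_hat - beta_null) / se.

    t_crit is the tabled critical value: t_{alpha/2} for a two-sided
    test, t_{alpha} for a one-sided test, with df = n - (k + 1).
    Returns the t statistic and whether H0 is rejected.
    """
    t = (beta_hat - beta_null) / se
    if alternative == "two-sided":
        reject = abs(t) > t_crit
    elif alternative == "greater":   # Ha: beta > beta_null
        reject = t > t_crit
    elif alternative == "less":      # Ha: beta < beta_null
        reject = t < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t, reject


# Exercise 11.2 d: two-sided test, df = 17, t.025 = 2.110
t, reject = coef_t_test(-941.900, 275.08, 2.110)
print(round(t, 2), reject)   # -3.42 True

# Exercise 11.4 a: upper-tailed test, df = 22, t.05 = 1.717
t, reject = coef_t_test(3.1, 2.3, 1.717, alternative="greater")
print(round(t, 2), reject)   # 1.35 False
```

The same call with alternative="less" covers lower-tailed tests such as Exercise 11.12 c.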
e.
For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:
(1230.501, 372.381)
f.
R² = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2.
R²a = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.
g.
h.
The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y for any α greater than .005.
11.4
a.
H0: β1 = 0
Ha: β1 > 0
The test statistic is $t = \frac{\hat\beta_1 - 0}{s_{\hat\beta_1}} = \frac{3.1}{2.3} = 1.35$
The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.
Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 ≯ 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.
b.
H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is $t = \frac{\hat\beta_2 - 0}{s_{\hat\beta_2}} = \frac{.92}{.27} = 3.41$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.
Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), H0 is rejected. There is sufficient evidence to indicate β2 ≠ 0 at α = .05.
c.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.05 = 1.717. The confidence interval is:
d.
For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:
11.6
a.
For x2 = 1 and x3 = 3,
E(y) = 1 + 2x1 + 1 − 3(3)
E(y) = 2x1 − 7
The graph is:
b.
For x2 = −1 and x3 = 1,
E(y) = 1 + 2x1 + (−1) − 3(1)
E(y) = 2x1 − 3
The graph is:
c.
They are parallel, each with a slope of 2. They have different y-intercepts.
d.
11.8
No. There may be other independent variables that are important that have not been
included in the model, while there may also be some variables included in the model which
are not important. The only conclusion is that at least one of the independent variables is a
good predictor of y.
11.10
a.
b.
R² = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.
c.
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.58/5}{(1 - .58)/[66 - (5 + 1)]} = 16.57$
d.
The rejection region requires α = .10 in the upper tail of the F distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 66 − (5 + 1) = 60. From Table VIII, Appendix B, F.10 = 1.95. The rejection region is F > 1.95.
Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
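The global F statistic can be computed directly from R², k and n. The Python sketch below assumes only the formula used above. The name global_f_from_r2 is a hypothetical helper, and the tabled F value must still be looked up separately:

```python
def global_f_from_r2(r2, k, n):
    """Global F statistic for H0: beta_1 = ... = beta_k = 0.

    F = (R^2 / k) / ((1 - R^2) / [n - (k + 1)]), with nu1 = k and
    nu2 = n - (k + 1) degrees of freedom.
    """
    nu2 = n - (k + 1)
    if nu2 <= 0:
        raise ValueError("need n > k + 1 to estimate the error variance")
    f = (r2 / k) / ((1 - r2) / nu2)
    return f, k, nu2


f, nu1, nu2 = global_f_from_r2(0.58, k=5, n=66)   # Exercise 11.10
print(round(f, 2), nu1, nu2)   # 16.57 5 60
```

The same helper reproduces F = 3.72 for Exercise 11.14 (R² = .31, k = 7, n = 66) and F = 1.06 for Exercise 11.22 (R² = .95, k = 18, n = 20). It also gives F ≈ 202.8 for Exercise 11.34 and F = 46.88 for Exercise 11.44.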
11.12
a.
ŷ = 3.70 + .34x1 + .49x2 + .72x3 + 1.14x4 + 1.51x5 + .26x6 − .14x7 − .10x8 − .10x9
b.
β̂0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.
β̂1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.
β̂2 = 0.49. For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.
β̂3 = 0.72. For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.
β̂4 = 1.14. For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.
β̂5 = 1.51. For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.
β̂6 = 0.26. For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.
β̂7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.
β̂8 = −0.10. For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.
β̂9 = −0.10. For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.
c.
H0: β7 = 0
Ha: β7 < 0
The test statistic is $t = \frac{\hat\beta_7 - 0}{s_{\hat\beta_7}} = \frac{-.14 - 0}{.14} = -1.00$
The rejection region requires α = .05 in the lower tail of the t distribution with df = n − (k + 1) = 234 − (9 + 1) = 224. From Table VI, Appendix B, t.05 = 1.645. The rejection region is t < −1.645.
Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the mean number of runs decreases as the number of runners caught stealing increases, holding all other variables constant, at α = .05.
d.
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 224, t.025 = 1.96. The 95% confidence interval is:
$\hat\beta_5 \pm t_{.025} s_{\hat\beta_5} \Rightarrow 1.51 \pm 1.96(.05) \Rightarrow 1.51 \pm .098 \Rightarrow (1.412, 1.608)$
We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant.
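Each confidence interval for a single β in this chapter has the form β̂ ± t(α/2)·s_β̂. The Python sketch below reproduces part d. The name coef_ci is hypothetical. The standard error .05 is inferred from the reported half-width, so it is an assumption rather than a printed value. The second call uses the printout values of Exercise 11.16 d.

```python
def coef_ci(beta_hat, se, t_crit):
    """Two-sided confidence interval beta_hat +/- t_crit * se for one beta.

    t_crit is t_{alpha/2} with df = n - (k + 1), read from a t table.
    """
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width


# Exercise 11.12 d: se = .05 is inferred from the half-width .098 = 1.96(.05)
lo, hi = coef_ci(1.51, 0.05, 1.96)
print(round(lo, 3), round(hi, 3))   # 1.412 1.608

# Exercise 11.16 d, from the printout: beta_hat3 = -0.04941, s = 0.02926
lo, hi = coef_ci(-0.04941, 0.02926, 1.96)
print(round(lo, 5), round(hi, 5))   # -0.10676 0.00794
```

With −0.4578, 0.1283 and t.005 ≈ 2.704, the same helper also gives the 99% interval (−.8047, −.1109) of Exercise 11.18 d.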
11.14
a.
b.
R² = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is explained by the model containing the 7 independent variables.
c.
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.31/7}{(1 - .31)/[66 - (7 + 1)]} = 3.72$
The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 66 − (7 + 1) = 58. From Table XI, Appendix B, F.01 = 2.95. The rejection region is F > 2.95.
Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is useful in the prediction of the natural log of the level of CO2 emissions in 1996 at α = .01.
d.
The test statistic is t = 2.52 and the p-value is p < .05. Since the observed p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investment in 1980 is a useful predictor of CO2 emissions in 1996 at α = .05.
11.16
a.
Predictor  Coef      SE Coef   T      P
Constant   -108.07   62.70     -1.72  0.087
x1         0.08509   0.08221   1.03   0.302
x2         3.771     1.619     2.33   0.021
x3         -0.04941  0.02926   -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance
Source          DF   SS       MS     F     P
Regression      3    53794    17931  1.89  0.135
Residual Error  140  1330210  9501
Total           143  1384003
b.
s = 97.48. We would expect about 95% of the observed values of DDT level to fall within 2s = 2(97.48) = 194.96 units of their least squares predicted values.
c.
To determine if at least one of the variables is useful in predicting the DDT level, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at α = .05.
d.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 144 − (3 + 1) = 140, t.025 = 1.96. The 95% confidence interval is:
$\hat\beta_3 \pm t_{.025} s_{\hat\beta_3} \Rightarrow -0.04941 \pm 1.96(0.02926) \Rightarrow -0.04941 \pm 0.05735 \Rightarrow (-0.10676, 0.00794)$
We are 95% confident that the mean DDT level will change by between −0.10676 and 0.00794 for each additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there is insufficient evidence that weight and DDT level are linearly related.
11.18
a.
Predictor  Coef      SE Coef  T      P
Constant   12.180    4.402    2.77   0.009
x1         -0.02654  0.05349  -0.50  0.623
x2         -0.4578   0.1283   -3.57  0.001

S = 3.519   R-Sq = 52.9%   R-Sq(adj) = 50.5%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   542.03   271.02  21.88  0.000
Residual Error  39  483.08   12.39
Total           41  1025.12
b.
β̂2 = −.458. We estimate that the mean weight change will decrease by .458% for each additional increase of 1% in acid-detergent fiber, with digestion efficiency held constant.
c.
d.
For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 42 − (2 + 1) = 39, t.005 ≈ 2.704. The 99% confidence interval is:
$\hat\beta_2 \pm t_{.005} s_{\hat\beta_2} \Rightarrow -0.4578 \pm 2.704(0.1283) \Rightarrow -0.4578 \pm 0.3469 \Rightarrow (-0.8047, -0.1109)$
We are 99% confident that the change in mean weight change for each 1% increase in acid-detergent fiber, holding digestion efficiency constant, is between −.8047% and −.1109%.
e.
R² = R-Sq = 52.9%. 52.9% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber.
R²a = R-Sq(adj) = 50.5%. 50.5% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber, adjusting for the sample size and the number of parameters in the model.
f.
11.20
a.
b.
c.
β̂3 = .384.
d.
e.
11.22
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.95/18}{(1 - .95)/[20 - (18 + 1)]} = 1.06$
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table IX, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9.
Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 ≯ 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05.
Note: Although R² is large, there are so many variables in the model that ν2 is small.
11.24
a.
Predictor  Coef     SE Coef  T      P
Constant   131.92   25.69    5.13   0.000
Pounds     2.726    2.275    1.20   0.248
Units      0.04722  0.09335  0.51   0.620
Weight     -2.5874  0.6428   -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      3   5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9  96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds  1   3400.6
Units   1   198.4
Weight  1   1559.3
b.
The test statistic is $F = \frac{MSR}{MSE} = \frac{1719.4}{96.2} = 17.87$
The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table XI, Appendix B, F.01 = 5.29. The rejection region is F > 5.29.
Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01.
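Every quantity used in this exercise can also be recomputed from the ANOVA table alone, which is a useful check on a printout. The Python sketch below returns three values: F = MSR/MSE, R² = SSR/SS(Total), and R²a = 1 − MSE/[SS(Total)/(n − 1)]. The name anova_summary is a hypothetical helper.

```python
def anova_summary(ss_reg, ss_err, df_reg, df_err):
    """F, R^2 and adjusted R^2 from the regression and error rows of an ANOVA table."""
    msr = ss_reg / df_reg              # mean square for regression
    mse = ss_err / df_err              # mean square for error, s^2
    ss_total = ss_reg + ss_err         # SS(Total) = SSyy
    df_total = df_reg + df_err         # n - 1
    f = msr / mse
    r2 = ss_reg / ss_total
    r2_adj = 1 - mse / (ss_total / df_total)
    return f, r2, r2_adj


f, r2, r2_adj = anova_summary(5158.3, 1539.9, df_reg=3, df_err=16)  # Exercise 11.24
print(round(f, 2), round(r2, 3), round(r2_adj, 3))   # 17.87 0.77 0.727
```

Applied to the Exercise 11.28 table (4809.4, 3202.3, 3, 26), it gives F ≈ 13.02, R² ≈ .600 and R²a ≈ .554, which agrees with that printout.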
c.
H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = .51. The p-value = .620. We reject H0 if p-value < α. Since .620 > .05, do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.
d.
R² is printed as R-Sq. R² = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.
e.
If the average number of pounds per shipment increases from 20 to 21, the estimated change in mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.40 less, if the variables x1 and x2 are held constant.
f.
g.
No. Regression analysis only determines if variables are related. It cannot be used to
determine cause and effect.
11.26
From the printout, the 90% prediction interval is (−151.996, 175.4874). We are 90% confident that an actual DDT level for a fish caught 100 miles upstream that is 40 centimeters long and weighs 800 grams will be between −151.996 and 175.4874. Since the DDT level cannot be negative, the interval would be between 0 and 175.4874.
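The point prediction at the center of this interval comes from plugging the new x values into the least squares equation from Exercise 11.16. The Python sketch below assumes the coefficients are ordered constant, mile, length, weight. That ordering is inferred from the text rather than printed. With it, the prediction matches the midpoint of the printed interval to within rounding. The helper name predict is hypothetical.

```python
def predict(coefs, x):
    """Least squares prediction y-hat = b0 + b1*x1 + ... + bk*xk.

    coefs = [b0, b1, ..., bk]; x = [x1, ..., xk].
    """
    b0, *slopes = coefs
    if len(slopes) != len(x):
        raise ValueError("need exactly one x value per slope coefficient")
    return b0 + sum(b * xi for b, xi in zip(slopes, x))


# Exercise 11.16 coefficients; assumed order: constant, mile, length, weight
coefs = [-108.07, 0.08509, 3.771, -0.04941]
y_hat = predict(coefs, [100, 40, 800])   # 100 miles, 40 cm, 800 g
print(round(y_hat, 2))                   # 11.75

lower, upper = -151.996, 175.4874        # printed 90% prediction interval
print(round((lower + upper) / 2, 2))     # 11.75
```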
11.28
a.
Predictor  Coef      SE Coef   T      P
Constant   -102.36   29.21     -3.50  0.002
Altitude   0.004091  0.001218  3.36   0.002
Latit      3.4511    0.7949    4.34   0.000
Coast      -0.14286  0.03634   -3.93  0.001

S = 11.10   R-Sq = 60.0%   R-Sq(adj) = 55.4%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      3   4809.4  1603.1  13.02  0.000
Residual Error  26  3202.3  123.2
Total           29  8011.7

Source    DF  Seq SS
Altitude  1   730.7
Latit     1   2175.3
Coast     1   1903.4

Fit    SE Fit  95.0% CI        95.0% PI
29.25  5.60    (17.75, 40.76)  (3.71, 54.80)
b.
To determine if the first-order model is useful for predicting annual precipitation, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3
The test statistic is F = 13.02 and the p-value is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that the model is useful for predicting annual precipitation at α = .05.
c.
11.30
Predictor  Coef       StDev     T      P
Constant   0.9326     0.2482    3.76   0.002
x1         -0.024272  0.004900  -4.95  0.000
x2         0.14206    0.07573   1.88   0.080
x5         0.38457    0.09801   3.92   0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF  SS       MS      F     P
Regression      3   6.8701   2.2900  9.95  0.001
Residual Error  15  3.4509   0.2301
Total           18  10.3210

Source  DF  Seq SS
x1      1   1.4016
x2      1   1.9263
x5      1   3.5422

Unusual Observations
x1    y      Fit    StDev Fit  Residual  St Resid
40.0  3.200  2.068  0.239      1.132     2.72R

Fit     StDev Fit  95.0% CI         95.0% PI
-0.098  0.232      (-0.592, 0.396)  (-1.233, 1.038)

The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between −1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
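The printed intervals can be reproduced from Fit, StDev Fit and S. The confidence interval for the mean uses only the standard error of the fit. The prediction interval also adds the error variance s². The Python sketch below assumes t.025 = 2.131 for df = 15, taken from a t table. The name ci_and_pi is hypothetical.

```python
import math


def ci_and_pi(fit, se_fit, s, t_crit):
    """Intervals at a new x from the printed Fit, SE Fit and s.

    CI for E(y): fit +/- t * se_fit
    PI for y:    fit +/- t * sqrt(s^2 + se_fit^2)
    """
    ci_half = t_crit * se_fit
    pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Exercise 11.30: Fit = -0.098, StDev Fit = 0.232, S = 0.4796, df = 15
ci, pi = ci_and_pi(-0.098, 0.232, 0.4796, 2.131)
print([round(v, 3) for v in ci])   # [-0.592, 0.396]
print([round(v, 3) for v in pi])   # [-1.233, 1.037]
```

The upper prediction limit comes out as 1.037, while the printout shows 1.038. The small difference reflects rounding of the printed inputs.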
11.32
a.
b.
11.34
a.
$R^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{21}{479} = .956$
b.
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.956/3}{(1 - .956)/[32 - (3 + 1)]} = 202.8$
The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table IX, Appendix B, F.05 = 2.95. The rejection region is F > 2.95.
Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.
c.
d.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is $t = \frac{\hat\beta_3 - 0}{s_{\hat\beta_3}} = \frac{10}{4} = 2.5$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix B, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048.
Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
11.36
a.
H0: β1 = β2 = β3 = 0
Ha: At least one βi is not 0
The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, willingness of the consumer to shop at a retailer's store in the future, at α = .05.
b.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = −3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
c.
When x2 = 1,
Since no value is given for β0, we will use β0 = 1 for graphing purposes. Using MINITAB, a graph might look like:
[Figure: Scatterplot of YHAT vs X1 when X2=1]
d.
When x2 = 7,
Since no value is given for β0, we will again use β0 = 1 for graphing purposes.
[Figure: Scatterplot of YHAT vs X1 when X2=7]
e.
[Figure: YHAT vs X1 for x2=1 and x2=7 on the same axes]
Since the lines are not parallel, it indicates that interaction is present.
11.38
a.
E(y) = β0 + β1x1 + β2x2 + β3x1x2
b.
If x1 and x2 interact to affect y then the effect of x1 on y depends on the level of x2.
Also, the effect of x2 on y depends on the level of x1.
c.
Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate x1 and x2 interact to affect y.
d.
β1 corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the negative feeling score gets lower as the number of people behind increases, then β2 is negative.
11.40
a.
If client credibility and linguistic delivery style interact, then the effect of client
credibility on the likelihood value depends on the level of linguistic delivery style.
b.
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
c.
d.
H0: β3 = 0
Ha: β3 ≠ 0
e.
f.
11.42
a.
b.
H0: β4 = 0
c.
d.
e.
R² = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.
11.44
a.
β̂1 = .02. The mean level of support for a military response is estimated to increase by .02 for each one-day increase in level of TV news exposure, all other variables held constant.
b.
H0: β1 = 0
Ha: β1 > 0
The p-value is p = .03/2 = .015. Since the p-value is less than α (p = .015 < .05), H0 is rejected. There is sufficient evidence to indicate that an increase in TV news exposure is associated with an increase in support for military resolution, all other variables held constant, at α = .05.
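This p-value, and the one in Exercise 11.68 b, halve a two-tailed p-value assumed to come from the printout. Halving is valid only when the estimate lies in the direction of Ha. The Python sketch below applies that rule. The name one_tailed_p is hypothetical, and the sketch assumes H0: β = 0 and a symmetric t distribution.

```python
def one_tailed_p(two_tailed_p, estimate, alternative):
    """One-tailed p-value for H0: beta = 0 from a printed two-tailed p-value.

    The t distribution is symmetric, so the one-tailed p-value is p/2 when
    the estimate falls in the direction of Ha, and 1 - p/2 otherwise.
    """
    if alternative == "greater":
        in_direction = estimate > 0
    elif alternative == "less":
        in_direction = estimate < 0
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return two_tailed_p / 2 if in_direction else 1 - two_tailed_p / 2


print(one_tailed_p(0.03, 0.02, "greater"))    # Exercise 11.44 b: 0.015
print(one_tailed_p(0.001, 0.296, "greater"))  # Exercise 11.68 b: 0.0005
```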
c.
To determine if the relationship between support for military resolution and gender depends on political knowledge, we test:
H0: β8 = 0
Ha: β8 ≠ 0
The p-value is p = .02. Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate that the relationship between support for a military resolution and gender depends on political knowledge, all other variables held constant, at α = .05.
d.
To determine if the relationship between support for military resolution and race depends on political knowledge, we test:
H0: β9 = 0
Ha: β9 ≠ 0
The p-value is p = .08. Since the p-value is not less than α (p = .08 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge, all other variables held constant, at α = .05.
e.
f.
R² = .194.
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.194/9}{(1 - .194)/[1763 - (9 + 1)]} = 46.88$
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 9 and ν2 = n − (k + 1) = 1763 − (9 + 1) = 1753. From Table IX, Appendix B, F.05 ≈ 1.88. The rejection region is F > 1.88.
Since the observed value of the test statistic falls in the rejection region (F = 46.88 > 1.88), H0 is rejected. There is sufficient evidence to indicate that the model is useful at α = .05.
11.46
a.
H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is $t = \frac{\hat\beta_2 - 0}{s_{\hat\beta_2}} = \frac{.47 - 0}{.15} = 3.133$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at α = .05.
b.
H0: β2 = 0
Ha: β2 > 0
The test statistic is the same as in part a, t = 3.133.
The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.
11.48
a.
b.
It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².
c.
It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.
11.50
a.
b.
β̂1 = 321.67. Since the quadratic term is included in the model, the linear term is just a location parameter and has no practical interpretation on its own.
c.
d.
Since no data have been collected past 1999, we have no idea if the relationship
between the two variables from 1984 to 1999 will remain the same until 2021.
11.52
a.
[Figure: plot of yhat versus Dose, Dose from 0 to 800]
b.
c.
d.
11.54
a.
b.
c.
[Figure: scatterplot of International versus Domestic]
From the plot, it appears that the first-order model might fit the data better. There does not appear to be much of a curve to the relationship.
d.
Predictor  Coef      SE Coef   T      P
Constant   202.9     245.0     0.83   0.424
Domestic   -0.581    1.510     -0.38  0.707
Dsq        0.003638  0.002085  1.74   0.107

S = 142.696   R-Sq = 78.8%   R-Sq(adj) = 75.2%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      2   906515   453258  22.26  0.000
Residual Error  12  244345   20362
Total           14  1150860

Source    DF  Seq SS
Domestic  1   844526
Dsq       1   61990

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2
H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = 1.74.
The p-value is p = 0.107. Since the p-value is greater than α = .05 (p = 0.107 > .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at α = .05.
e.
From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.
11.56
a.
b.
It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².
c.
It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.
11.58
a.
[Figure: character scatterplot of demand versus day (X), with X from 0 to 40]
b.
From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.
c.
Predictor  Coef    Stdev  t-ratio  p
Constant   2752.4  613.5  4.49     0.000
X          122.34  26.08  4.69     0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE      DF  SS         MS        F      p
Regression  1   79775688   79775688  22.01  0.000
Error       38  137726224  3624374
Total       39  217501920

Unusual Observations
Obs.  X     Y      Fit   Stdev.Fit  Residual  St.Resid
27    27.0  2007   6056  345        -4049     -2.16R
40    40.0  11520  7646  591        3874      2.14R

To see if there is a significant linear relationship between day and demand, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 4.69.
The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05.
d.
Predictor  Coef     Stdev  t-ratio  p
Constant   5120.2   816.9  6.27     0.000
X          -215.92  91.89  -2.35    0.024
XSQ        8.250    2.173  3.80     0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE      DF  SS         MS        F      p
Regression  2   118377056  59188528  22.09  0.000
Error       37  99124856   2679050
Total       39  217501920

SOURCE  DF  SEQ SS
X       1   79775688
XSQ     1   38601372

Unusual Observations
Obs.  X     Y     Fit   Stdev.Fit  Residual  St.Resid
27    27.0  2007  5305  357        -3298     -2.06R

H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = 3.80.
The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at α = .05.
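As a check on the two printouts, the fitted values for the unusual observations can be recomputed from the rounded coefficients. The Python sketch below only evaluates the printed equations. It does not refit either model, because the raw data are not listed here. The name poly_value is a hypothetical helper.

```python
def poly_value(coefs, x):
    """Evaluate b0 + b1*x + b2*x**2 + ... (coefficients in increasing power)."""
    return sum(b * x ** p for p, b in enumerate(coefs))


first_order = [2752.4, 122.34]            # part c
second_order = [5120.2, -215.92, 8.250]   # part d

day, demand = 27, 2007                     # observation 27
for name, coefs in (("first-order", first_order), ("second-order", second_order)):
    fit = poly_value(coefs, day)
    print(name, round(fit), round(demand - fit))
# first-order 6056 -4049
# second-order 5305 -3298
```

The residual for observation 27 shrinks in the second-order fit, which is consistent with the improvement in R-sq from 36.7% to 54.4%.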
e.
Since the quadratic term is significant in the second-order model in part d, the second-order model is better.
11.60
a.
b.
β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable.
β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable.
c.
d.
The test statistic is $F = \frac{MSR}{MSE} = \frac{2059.5}{83.3} = 24.72$
Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 2 and denominator df = n − (k + 1) = 15 − (2 + 1) = 12. From Table IX, Appendix B, F.05 = 3.89. The rejection region is F > 3.89.
Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
11.64
a.
A confidence interval for the difference of two population means could be used. Since both sample sizes are over 30, the large-sample confidence interval is used (with independent samples).
b.
Let x1 = 1 if public college, 0 otherwise.
The model is E(y) = β0 + β1x1.
c.
β1 is the difference between the two population means. A point estimate for β1 is β̂1. A confidence interval for β1 could be used to estimate the difference in the two population means.
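The claim that β̂1 estimates the difference in means can be checked numerically. With a single 0/1 dummy, the least squares slope equals the difference between the two group sample means. The intercept equals the mean of the x1 = 0 group. The data in this Python sketch are made up purely for illustration and are not from the exercise. The name simple_ols is a hypothetical helper.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares b0, b1 for E(y) = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: x1 = 1 if public college, 0 otherwise
x1 = [1, 1, 1, 0, 0, 0, 0]
y = [12.0, 15.0, 18.0, 20.0, 22.0, 24.0, 26.0]
b0, b1 = simple_ols(x1, y)
public = mean(yi for xi, yi in zip(x1, y) if xi == 1)   # 15.0
other = mean(yi for xi, yi in zip(x1, y) if xi == 0)    # 23.0
print(round(b0, 6), round(b1, 6))   # 23.0 -8.0, i.e. other and public - other
```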
11.66
a.
Let x1 = 1 if no, 0 if yes.
The model would be E(y) = β0 + β1x1.
In this model, β0 is the mean job preference for those who responded 'yes' to the question "Flextime of the position applied for" and β1 is the difference in the mean job preference between those who responded 'no' to the question and those who answered 'yes'.
b.
Let x1 = 1 if referral, 0 if not.
Let x2 = 1 if on-premise, 0 if not.
c.
Let x1 = 1 if counseling, 0 if not.
Let x2 = 1 if active search, 0 if not.
d.
Let x1 = 1 if not married, 0 if married.
The model would be E(y) = β0 + β1x1.
In this model, β0 is the mean job preference for those who responded married to marital status and β1 is the difference in the mean job preference between those who responded not married and those who answered married.
e.
Let x1 = 1 if female, 0 if male.
The model would be E(y) = β0 + β1x1.
In this model, β0 is the mean job preference for males and β1 is the difference in the mean job preference between females and males.
11.68
a.
β̂4 = .296. The difference in the mean value of DTVA between when the operating earnings are negative and lower than last year and when the operating earnings are not negative and lower than last year is estimated to be .296, holding all other variables constant.
b.
To determine if the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms, we test:
H0: β4 = 0
Ha: β4 > 0
The p-value for this test is p = .001/2 = .0005. Since the p-value is so small, we would reject H0 for α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.
c.
R²a = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5 independent variables, adjusted for the number of variables in the model and the sample size.
11.70
a.
To determine if there is a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 8.14.
Since neither n nor α is given, we cannot determine the exact rejection region. However, since the data used span 1972 to 1997, we can safely assume that df = n − 2 is at least 2. With α = .05, the critical value of t for the rejection region is then no larger than t.025 = 4.303 (the value for df = 2). Thus, with α = .05, t = 8.14 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.
However, the value of R² is .1818. The model used explains only 18.18% of the variability in the monthly rate of return. This is not a particularly large value.
To determine if there is a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 3.46.
Since neither n nor α is given, we cannot determine the exact rejection region. However, we can again assume that df = n − 2 is at least 3. With α = .05, the critical value of t for the rejection region is then no larger than t.025 = 3.182 (the value for df = 3). Thus, with α = .05, t = 3.46 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.
However, the value of R² is .0387. The model used explains only 3.87% of the variability in the monthly rate of return. This is a very small value.
b.
For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy.
For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy.
c.
The least squares prediction equation for the equity REIT index is:
ŷ = 0.01863 − 0.01582x.
When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 − 0.01582(1) = 0.00281.
a.
b.
x3 = 1 if level 3, 0 otherwise
c.
11.74
11.76
d.
e.
a.
b.
11.78
a.
Predictor  Coef     StDev    T      P
Constant   -2.210   1.250    -1.77  0.085
x1         0.07831  0.04947  1.58   0.122
x2         10.354   8.538    1.21   0.233
x1x2       -0.0948  0.1418   -0.67  0.508

R-Sq = 44.1%   R-Sq(adj) = 39.7%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression      3   452.54   150.85  10.01  0.000
Residual Error  38  572.58   15.07
Total           41  1025.12

Source  DF  Seq SS
x1      1   384.24
x2      1   61.57
x1x2    1   6.73

Unusual Observations
Obs  x1    y
12   30.0  -8.500
37   42.5  8.000
40   75.0  8.500
c.
The slope is .0783. For each unit increase in digestion efficiency, the mean weight
change is estimated to increase by .0783 for goslings fed plants.
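With a 0/1 dummy and an interaction term, each group gets its own line. The intercept is β0 + β2x2 and the slope is β1 + β3x2. The Python sketch below applies this to the printed estimates. The name lines_by_group is hypothetical. Following part c, x2 = 0 is taken to be the plant diet, and the sketch does not name the other diet.

```python
def lines_by_group(b0, b1, b2, b3):
    """Intercept and slope in x1 for each level of a 0/1 dummy x2 in
    E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return {
        0: (b0, b1),              # x2 = 0: E(y) = b0 + b1*x1
        1: (b0 + b2, b1 + b3),    # x2 = 1: E(y) = (b0 + b2) + (b1 + b3)*x1
    }


lines = lines_by_group(-2.210, 0.07831, 10.354, -0.0948)   # Exercise 11.78
for level, (intercept, slope) in lines.items():
    print(level, round(intercept, 3), round(slope, 5))
# 0 -2.21 0.07831
# 1 8.144 -0.01649
```

Whether the two slopes differ significantly is exactly the question tested in part d (H0: β3 = 0).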
d.
To determine if the slopes associated with the two diets differ, we test:
H0: β3 = 0
Ha: β3 ≠ 0
11.80
a.
Let x2 = 1 if intervention group, 0 otherwise.
The first-order model would be:
E(y) = β0 + β1x1 + β2x2
b.
If pretest score and group interact, the model would be:
E(y) = β0 + β1x1 + β2x2 + β3x1x2
d.
For the control group, x2 = 0. The model including the interaction is:
E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1
For the intervention group, x2 = 1. The model including the interaction is:
E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β1x1 + β2 + β3x1 = (β0 + β2) + (β1 + β3)x1
The slope of the model for the control group is β1. The slope of the model for the intervention group is β1 + β3.
11.82
a.
b.
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
E(y) = β0 + β1x1 + β2(1) = β0 + β2 + β1x1
d.
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β2 + (β1 + β3)x1
By adding variables to the model, SSE will decrease or stay the same. Thus, SSE_C ≤ SSE_R. The only circumstance under which we will reject H0 is if SSE_C is much smaller than SSE_R. If SSE_C is much smaller than SSE_R, F will be large. Thus, the test is only one-tailed.
11.86
a.
b.
c.
d.
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(1250.2 - 1125.2)/(5 - 2)}{1125.2/[30 - (5 + 1)]} = \frac{41.6667}{46.8833} = .89$
The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k − g = 5 − 2 = 3 and denominator df = n − (k + 1) = 30 − (5 + 1) = 24. From Table IX, Appendix B, F.05 = 3.01. The rejection region is F > 3.01.
Since the observed value of the test statistic does not fall in the rejection region (F = .89 ≯ 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.
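The nested-model F statistic used above can be sketched in Python as follows. The name nested_f is hypothetical, and the tabled critical value is still needed for the decision. The second call applies the same formula to the SSE values printed for Exercise 11.90. It takes n = 50 from the Total df of 49 and compares the complete model with the model that drops the two interaction terms.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Partial F for H0: the k - g extra betas in the complete model are all 0.

    k = number of betas (excluding beta_0) in the complete model,
    g = number in the reduced model.
    """
    nu1 = k - g
    nu2 = n - (k + 1)
    f = ((sse_reduced - sse_complete) / nu1) / (sse_complete / nu2)
    return f, nu1, nu2


f, nu1, nu2 = nested_f(1250.2, 1125.2, k=5, g=2, n=30)   # Exercise 11.86 d
print(round(f, 2), nu1, nu2)   # 0.89 3 24

# Exercise 11.90, interaction terms (H0: beta4 = beta5 = 0):
# complete model SSE = 3618994; reduced model (Age, AgeSq, Status) SSE = 3723332
f, nu1, nu2 = nested_f(3723332, 3618994, k=5, g=3, n=50)
print(round(f, 2), nu1, nu2)   # 0.63 2 44
```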
11.88
a.
b.
c.
d.
Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence
to indicate at least one of the seven diagnostic variables contributes information for
the prediction of y.
11.90
a.
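b.
To determine if the quadratic terms are important, we test:
H0: β2 = β5 = 0
Ha: At least one βi ≠ 0, i = 2, 5
c.
To determine if the interaction terms are important, we test:
H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 4, 5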
d.
From MINITAB, the outputs from fitting the three models are:
Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor      Coef   SE Coef      T      P
Constant       83.4     316.3   0.26  0.793
Age           -5.74     18.68  -0.31  0.760
AgeSq        0.2361    0.2549   0.93  0.359
Status        -62.1     354.8  -0.18  0.862
AgeSt          5.36     24.81   0.22  0.830
AgeSqSt     -0.2337    0.4080  -0.57  0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       5  1186549  237310  2.89  0.024
Residual Error  44  3618994   82250
Total           49  4805542

Source   DF  Seq SS
Age       1  865746
AgeSq     1  138871
Status    1   77594
AgeSt     1   77342
AgeSqSt   1   26996
Model without the quadratic terms (Age, Status, AgeSt):

Predictor      Coef   SE Coef      T      P
Constant     -176.1     145.0  -1.21  0.231
Age          11.166     3.902   2.86  0.006
Status        196.5     178.9   1.10  0.278
AgeSt       -11.432     6.763  -1.69  0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1116017  372006  4.64  0.006
Residual Error  46  3689526   80207
Total           49  4805543

Source  DF  Seq SS
Age      1  865746
Status   1   21097
AgeSt    1  229174
Model without the interaction terms (Age, AgeSq, Status):

Predictor     Coef  SE Coef      T      P
Constant     165.8    182.7   0.91  0.369
Age          -8.81    10.89  -0.81  0.423
AgeSq       0.2535   0.1632   1.55  0.127
Status      -105.6    107.9  -0.98  0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1082210  360737  4.46  0.008
Residual Error  46  3723332   80942
Total           49  4805542

Source  DF  Seq SS
Age      1  865746
AgeSq    1  138871
Status   1   77594
Test for part b:
The test statistic is:
F = [(3,689,526 − 3,618,994)/2] / (3,618,994/44) = 35,266/82,250 = .429
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the
upper tail of the F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44
denominator degrees of freedom. From Table IX, Appendix B, F.05 ≈ 3.23. The
rejection region is F > 3.23.
Since the observed value of the test statistic does not fall in the rejection region (F =
.429 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the
quadratic terms are important for predicting market value at α = .05.
Test for part c:
The test statistic is:
F = [(3,723,332 − 3,618,994)/2] / (3,618,994/44) = 52,169/82,250 = .634
The rejection region is the same as in the previous test. Reject H0 if F > 3.23.
Since the observed value of the test statistic does not fall in the rejection region
(F = .634 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the
interaction terms are important for predicting market value at α = .05.
11.92
a.
The reduced model for testing if the mean posttest scores differ for the intervention
and control groups would be:
E(y) = β0 + β1x1
b.
The reported p-value is .03. Since the p-value is small, H0 is rejected. There is
evidence to indicate that the mean posttest sun safety knowledge scores differ for the
intervention and control groups for any α > .03.
c.
The reported p-value is .033. Since the p-value is small, H0 is rejected. There is
evidence to indicate that the mean posttest sun safety comprehension scores differ for
the intervention and control groups for any α > .033.
d.
The reported p-value is .322. Since the p-value is not small, H0 is not rejected. There
is insufficient evidence to indicate that the mean posttest sun safety application scores differ
for the intervention and control groups for any α < .322.
11.94
a.
b.
To determine whether there are differences in mean emotional distress levels that are
attributable to exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5
c.
To determine whether there are differences in mean emotional distress levels that are
attributable to exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5
The test statistic is F = .93.
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k −
g = 5 − 2 = 3 and ν2 = n − (k + 1) = 200 − (5 + 1) = 194. From Table IX, Appendix
B, F.05 ≈ 2.60. The rejection region is F > 2.60.
Since the observed value of the test statistic does not fall in the rejection region
(F = .93 < 2.60), H0 is not rejected. There is insufficient evidence to indicate that
there are differences in mean emotional distress levels that are attributable to exposure
group at α = .05.
11.96
a.
The best one-variable predictor of y is the one whose t statistic has the largest absolute
value. The t statistics, t = β̂i/sβ̂i, for each of the variables are:
Independent Variable    t = β̂i/sβ̂i
x1                      1.6/.42 = 3.81
x2                      .9/.01 = 90
x3                      3.4/1.14 = 2.98
x4                      2.5/2.06 = 1.21
x5                      4.4/.73 = 6.03
x6                      .3/.35 = .86
The variable x2 is the best one-variable predictor of y. The absolute value of the
corresponding t statistic is 90. This is larger than any of the others.
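The selection rule above can be sketched in a few lines of Python. The estimates and standard errors are the ones listed in the table. Signs may have been lost in extraction, but the choice depends only on |t|. The dictionary and variable names are illustrative.

```python
# (estimate, standard error) for each candidate variable, as listed above.
# Signs may have been lost in extraction; only |t| matters for the choice.
coef_se = {
    "x1": (1.6, 0.42),
    "x2": (0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t = estimate / standard error for each variable
t_stats = {name: b / se for name, (b, se) in coef_se.items()}

# The best one-variable predictor has the largest |t|
best = max(t_stats, key=lambda name: abs(t_stats[name]))
print(best, round(t_stats[best], 2))  # x2 90.0
```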
b.
Yes. In the stepwise procedure, the first variable entered is the one which has the
largest absolute value of t, provided the absolute value of t falls in the rejection
region.
c.
Once x2 is entered, the next variable that is entered is the one that, in conjunction with
x2, has the largest absolute t value associated with it.
11.98
a.
In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.
b.
In step 2, all two-variable models are fit, where one of the variables is the best one
selected in step 1. Thus, a total of 10 two-variable models are fit.
c.
In the 11th step, only one model is fit: the model containing all the independent
variables.
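The model counts in parts a through c follow a simple pattern, shown in the Python sketch below. It assumes that exactly one variable enters at each step and that none is ever removed. A real stepwise run can drop variables, which changes the counts. The function name is hypothetical.

```python
def models_fit_per_step(p):
    """Candidate models fit at each step of a forward-entry search over p
    variables, assuming exactly one variable enters per step and none leaves."""
    return [p - (step - 1) for step in range(1, p + 1)]


counts = models_fit_per_step(11)
print(counts[0], counts[1], counts[-1], sum(counts))  # 11 10 1 66
```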
d.
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β7x7 + β9x9 + β10x10 + β11x11
e.
67.7% of the total sample variability of overall satisfaction is explained by the model
containing the independent variables safety on bus, seat availability, dependability,
travel time, convenience of route, safety at bus stops, hours of service, and frequency
of service.
f.
Using stepwise regression does not guarantee that the best model will be found.
There may be better combinations of the independent variables that are never found,
because of the order in which the independent variables are entered into the model.
11.100 a.
The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved
shape. Such a pattern usually indicates that curvature needs to be added to the model.
b.
The plot of the residuals reveals a nonrandom pattern. The plot of the residuals versus the
predicted values shows a pattern where the spread of the residuals increases
as ŷ increases. This indicates that the variance of the random error, ε, becomes
larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values
in the model, this implies that the variance of ε is not constant for all settings of the
x's.
c.
This plot reveals an outlier, since all or almost all of the residuals should fall within 3
standard deviations of their mean of 0.
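The 3-standard-deviation rule in part c can be sketched in Python as follows. This sketch scales each residual by s only. MINITAB's standardized residuals also adjust for leverage, so they can differ somewhat. The residual values in the usage line are hypothetical.

```python
def flag_outliers(residuals, s, cutoff=3.0):
    """Return indices of residuals more than `cutoff` standard deviations from 0.

    Uses the simple scaling e / s; packages such as MINITAB also adjust
    for leverage, so their standardized residuals can differ somewhat.
    """
    return [i for i, e in enumerate(residuals) if abs(e / s) > cutoff]


# Hypothetical residuals with s = 1.0
print(flag_outliers([0.5, -1.2, 4.0, -3.5, 2.9], s=1.0))  # [2, 3]
```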
d.
This frequency distribution of the residuals is skewed to the right. This may be due to
outliers or could indicate the need for a transformation of the dependent variable.
11.102 a.
Since all the pairwise correlations are .45 or less in absolute value, there is little
evidence of extreme multicollinearity.
b.
No. The overall model test is significant (p < .001). This implies that at least one
variable contributes to the prediction of the urban/rural rating. Looking at the
individual t-tests, there are several that are significant, namely x1, x3, and x5. There is
no evidence that multicollinearity is present.
11.106 a.
Predictor        Coef    SE Coef      T      P
Constant       2.7944     0.4363   6.40  0.000
Income      -0.000164   0.006564  -0.02  0.980
Size          0.38348    0.07189   5.33  0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  15.0027  7.5013  14.52  0.000
Residual Error  23  11.8839  0.5167
Total           25  26.8865

Source  DF   Seq SS
Income   1   0.2989
Size     1  14.7037
No; Income and household size do not seem to be highly correlated. The correlation
coefficient between income and household size is .137.
b.
[Figures: histogram of the residuals (Frequency versus Residual); residuals versus fitted values; residuals versus Income; residuals versus Size]
Yes; the residuals versus income and the residuals versus household size exhibit a curved shape.
Such a pattern could indicate that a second-order model may be more appropriate.
c.
No; the plot of the residuals versus the predicted values reveals varying spreads for different
values of ŷ. This implies that the variance of ε is not constant for all settings of the
x's.
d.
Yes; the outlier shows up in several plots and is the 26th household (food consumption
= $7500, income = $7300, and household size = 5).
e.
No; the frequency distribution of the residuals shows that the outlier skews the
frequency distribution to the right.
[Figures: normal probability plot of the standardized residuals (Percent versus Standardized Residual); standardized residuals versus fitted values; histogram of the standardized residuals; standardized residuals versus observation order; standardized residuals versus WEIGHT, LENGTH, and MILE]
From the normal probability plot, the points do not fall on a straight line, indicating the
residuals are not normal. The histogram of the residuals indicates the residuals are
skewed to the right, which also indicates that the residuals are not normal. The plot of
the residuals versus ŷ indicates that there is at least one outlier and the variance is
not constant. One observation has a standardized residual of more than 10 and several
others have standardized residuals greater than 3. This is also evident in the plots of the
residuals versus each of the independent variables. Since the assumptions of normality
and constant variance appear to be violated, we could consider transforming the data.
We should also check the outlying observations to see if there are any errors connected
with these observations.
11.110 a.
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0
The test statistic is:
F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.83/4) / {(1 − .83)/[25 − (4 + 1)]} = 24.41
The rejection region requires α = .05 in the upper tail of the F distribution with
numerator df = k = 4 and denominator df = n − (k + 1) = 25 − (4 + 1) = 20. From
Table IX, Appendix B, F.05 = 2.87. The rejection region is F > 2.87.
Since the observed value of the test statistic falls in the rejection region (F = 24.41
> 2.87), H0 is rejected. There is sufficient evidence to indicate at least one of the
parameters is nonzero at α = .05.
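The global F statistic depends only on R², k, and n, so the Python sketch below can check it. The function name is hypothetical. The three input sets are the values used here and in the global tests of Exercises 11.124 and 11.128.

```python
def global_f_from_r2(r2, k, n):
    """Global F statistic for H0: beta1 = ... = betak = 0, computed from R^2."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


for r2, k, n in [(0.83, 4, 25), (0.24, 5, 191), (0.45, 5, 45)]:
    print(round(global_f_from_r2(r2, k, n), 2))
# prints 24.41, 11.68, 6.38 (one per line)
```

This form is convenient when a report gives only R² and the sample size and omits the full ANOVA table.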
b.
H0: β1 = 0
Ha: β1 < 0
The test statistic is t = (β̂1 − 0)/sβ̂1 = (−2.43 − 0)/1.21 = −2.01
The rejection region requires α = .05 in the lower tail of the t distribution with df =
n − (k + 1) = 25 − (4 + 1) = 20. From Table VI, Appendix B, t.05 = 1.725. The
rejection region is t < −1.725.
Since the observed value of the test statistic falls in the rejection region (t = −2.01
< −1.725), H0 is rejected. There is sufficient evidence to indicate β1 is less than 0 at
α = .05.
c.
H0: β2 = 0
Ha: β2 > 0
The test statistic is t = (β̂2 − 0)/sβ̂2 = (.05 − 0)/.16 = .31
The rejection region requires α = .05 in the upper tail of the t distribution. From part
b above, the rejection region is t > 1.725.
Since the observed value of the test statistic does not fall in the rejection region (t =
.31 < 1.725), H0 is not rejected. There is insufficient evidence to indicate β2 is
greater than 0 at α = .05.
d.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = (β̂3 − 0)/sβ̂3 = (.62 − 0)/.26 = 2.38
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with
df = 20. From Table VI, Appendix B, t.025 = 2.086. The rejection region is t < −2.086
or t > 2.086.
Since the observed value of the test statistic falls in the rejection region (t = 2.38 >
2.086), H0 is rejected. There is sufficient evidence to indicate β3 is different from 0 at
α = .05.
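The three individual-coefficient tests in parts b through d follow one pattern, shown in the Python sketch below. The negative sign of β̂1 in part b is inferred from the lower-tailed rejection, because minus signs were lost in extraction. The critical values are the tabled ones quoted above. The function names are hypothetical.

```python
def t_stat(beta_hat, se, beta_null=0.0):
    """t statistic for H0: beta_i = beta_null."""
    return (beta_hat - beta_null) / se


def reject(t, t_crit, tail):
    """Decision rule; tail is 'lower', 'upper', or 'two' (two-tailed)."""
    if tail == "lower":
        return t < -t_crit
    if tail == "upper":
        return t > t_crit
    return abs(t) > t_crit


# Parts b, c, d: (estimate, standard error, tabled critical value, tail)
tests = {
    "b": (-2.43, 1.21, 1.725, "lower"),
    "c": (0.05, 0.16, 1.725, "upper"),
    "d": (0.62, 0.26, 2.086, "two"),
}
for part, (b, se, crit, tail) in tests.items():
    t = t_stat(b, se)
    print(part, round(t, 2), reject(t, crit, tail))
# b -2.01 True
# c 0.31 False
# d 2.38 True
```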
11.112 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample
means. The further x1, x2, and x3 are from their means, the larger the error. When x1 = 60,
x2 = .4, and x3 = 900, these values are outside the observed ranges of the x values.
When x1 = 30, x2 = .6, and x3 = 1300, the values are within the observed ranges
and consequently are closer to their means. Thus, when x1 = 30, x2 = .6, and
x3 = 1300, the error of prediction is smaller.
11.114 From the plot of the residuals for the straight line model, there appears to be a mound shape
which implies the quadratic model should be used.
11.116 a.
b.
F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]}
d.
The rejection region requires α = .05 in the upper tail of the F distribution with
numerator df = 2 and denominator df = 29. From Table IX, Appendix B, F.05 = 3.33.
The rejection region is F > 3.33.
11.118 a.
b.
11.120 a.
b.
11.122 a.
b.
c.
The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA,
and first-year graduate GPA should all be positively correlated with final GPA.
x5 = 1 if student cohort year 2, 0 otherwise
x6 = 1 if student cohort year 3, 0 otherwise
d.
e.
β5 = difference in mean final GPA between student cohort year 2 and year 1.
β6 = difference in mean final GPA between student cohort year 3 and year 1.
f.
g.
11.124 a.
β2 = difference in the mean salaries between whites and nonwhites, all other variables
held constant.
β3 = change in the mean salary for each additional year of education, all other
variables held constant.
β4 = change in the mean salary for each additional year of tenure with firm, all other
variables held constant.
β5 = change in the mean salary for each additional hour worked per week, all other
variables held constant.
b.
β̂2: We estimate the difference in the mean salaries between whites and nonwhites to
be
β̂3: We estimate the change in the mean salary for each additional year of education
to be $1.519, all other variables held constant.
β̂4: We estimate the change in the mean salary for each additional year of tenure
with firm to be $.320, all other variables held constant.
β̂5: We estimate the change in the mean salary for each additional hour worked per
week to be $.205, all other variables held constant.
c.
R² = .240. 24% of the total variability of salaries is explained by the model containing
gender, race, educational level, tenure with firm, and number of hours worked per
week.
To determine if the model is useful for predicting annual salary, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0
The test statistic is:
F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.24/5) / {(1 − .24)/[191 − (5 + 1)]} = 11.68
The rejection region requires α = .05 in the upper tail of the F distribution with
numerator df = k = 5 and denominator df = n − (k + 1) = 191 − (5 + 1) = 185. From
Table IX, Appendix B, F.05 ≈ 2.21. The rejection region is F > 2.21.
Since the observed value of the test statistic falls in the rejection region (F = 11.68 >
2.21), H0 is rejected. There is sufficient evidence to indicate the model containing
gender, race, educational level, tenure with firm, and number of hours worked per
week is useful for predicting annual salary at α = .05.
d.
To determine if male managers are paid more than female managers, we test:
H0: β1 = 0
Ha: β1 > 0
The p-value for this one-tailed test is less than .05/2 = .025. Since the p-value is less
than α = .05, H0 is rejected. There is evidence to indicate male managers are paid
more than female managers, holding all other variables constant, for any α > .025.
e.
The salary paid an individual depends on many factors other than gender. Thus, in
order to adjust for other factors influencing salary, we include them in the model.
11.126 a.
The main effects model would be: E(y) = β0 + β1x1 + β8x8
b.
β̂1 = −.28. The mean value for the relative error of the effort estimate for developers
is estimated to be .28 units below that of project leaders, holding previous accuracy
constant.
β̂8 = .27. The mean value for the relative error of the effort estimate if previous
accuracy is more than 20% is estimated to be .27 units above that if previous
accuracy is less than 20%, holding company role of estimator constant.
c.
One possible reason the sign of β̂1 is opposite from what is expected is that company
role of estimator and previous accuracy may be correlated.
11.128 a.
R² = .45. 45% of the total variability of the suicide rates is explained by the model
containing unemployment rate, percentage of females in the work force, divorce rate,
logarithm of GNP, and annual percent change in GNP.
To determine if the model is useful for predicting suicide rate, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0
The test statistic is:
F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.45/5) / {(1 − .45)/[45 − (5 + 1)]} = 6.38
The rejection region requires α = .05 in the upper tail of the F distribution with
numerator df = k = 5 and denominator df = n − (k + 1) = 45 − (5 + 1) = 39. From
Table IX, Appendix B, F.05 ≈ 2.45. The rejection region is F > 2.45.
Since the observed value of the test statistic falls in the rejection region (F = 6.38 >
2.45), H0 is rejected. There is sufficient evidence to indicate the model containing
unemployment rate, percentage of females in the work force, divorce rate, logarithm of
GNP, and annual percent change in GNP is useful for predicting suicide rate at α = .05.
b.
β̂2: We estimate the change in suicide rate for each unit change in percentage of
females in the work force to be .0231, all other variables held constant.
β̂3: We estimate the change in suicide rate for each unit change in divorce rate to be
.0765, all other variables held constant.
β̂4: We estimate the change in suicide rate for each unit change in logarithm of GNP
to be .2760, all other variables held constant.
β̂5: We estimate the change in suicide rate for each unit change in annual percent
change in GNP to be .0018, all other variables held constant.
The p-values for unemployment rate and percentage of females in the work force are
less than .05. This indicates that both are important in predicting suicide rate. The p-values
for divorce rate, logarithm of GNP, and annual percent change in GNP are all
greater than .10. This indicates that none of these variables appears to be important in
predicting suicide rate. We must view these conclusions with caution. Some of these
independent variables may be highly correlated with each other. If so, some of the
variables declared nonsignificant may be significant if the other variables are removed
from the model.
c.
d.
Curvature: It may be possible that the relationship between the suicide rate and some
of the independent variables is not linear, but curved. Thus, some of the variables that
do not appear to be useful predictors may, in fact, be useful predictors if second-order terms were added to the model.
Interaction: Again, it may be possible that the effect of some independent variables
on the suicide rate is different for different levels of other independent variables. This
possibility should be explored before throwing out certain independent variables.
Multicollinearity: Some of these independent variables may be highly correlated with
each other. If so, some of the variables declared nonsignificant may be significant if
other variables are removed from the model.
11.130 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable,
say CEO income, on the dependent variable profit (y) depends on the level of the second
variable, stock percentage.
11.132 a.
ANALYSIS OF VARIANCE
                    SUM OF          MEAN
SOURCE     DF      SQUARES        SQUARE    F VALUE    PROB>F
MODEL       3  25784705.01    8594901.67    241.758    0.0001
ERROR      16    568826.19   35551.63709
C TOTAL    19  26353531.20

ROOT MSE   188.5514     R-SQUARE    0.9784
DEP MEAN     3014.2     ADJ R-SQ    0.9744
C.V.       6.255438

PARAMETER ESTIMATES
                  PARAMETER       STANDARD     T FOR H0:
VARIABLE   DF      ESTIMATE          ERROR   PARAMETER=0    PROB > |T|
INTERCEP    1    1333.17830      290.99944         4.581        0.0003
X1          1   -0.15122302     0.37864583        -0.399        0.6949
X2          1   -2.62532461     5.34596285        -0.491        0.6300
X1X2        1    0.05195415    0.006863831         7.569        0.0001
b.
The test statistic is F = MSR/MSE = 8,594,901.67/35,551.637 = 241.758
The rejection region requires α = .05 in the upper tail of the F distribution with
numerator df = k = 3 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From
Table IX, Appendix B, F.05 = 3.24. The rejection region is F > 3.24.
Since the observed value of the test statistic falls in the rejection region (F = 241.758
> 3.24), H0 is rejected. There is sufficient evidence to indicate the model is useful at
α = .05.
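The Python sketch below rebuilds the derived entries of the SAS ANOVA table (MEAN SQUARE, F VALUE, R-SQUARE, ROOT MSE) from its sums of squares and degrees of freedom. It uses only the values printed in the output above. The variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table above
ss_model, df_model = 25784705.01, 3
ss_error, df_error = 568826.19, 16
ss_total = ss_model + ss_error  # C TOTAL sum of squares

msr = ss_model / df_model   # mean square for the model
mse = ss_error / df_error   # mean square error
f = msr / mse               # global F statistic (F VALUE)
r2 = ss_model / ss_total    # coefficient of determination (R-SQUARE)
root_mse = math.sqrt(mse)   # s, the estimated error standard deviation (ROOT MSE)

# Agrees with the SAS table to the printed precision
print(round(msr, 2), round(mse, 3), round(f, 3), round(r2, 4), round(root_mse, 4))
```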
c.
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = (β̂3 − 0)/sβ̂3 = .05195415/.006863831 = 7.569.
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with
df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VI, Appendix B, t.025 = 2.120. The
rejection region is t < −2.120 or t > 2.120.
Since the observed value of the test statistic falls in the rejection region (t = 7.569 >
2.120), H0 is rejected. There is sufficient evidence to indicate the interaction between
advertising expenditure and shelf space is present at α = .05.
d.
Advertising expenditure and shelf space are said to interact if the effect of advertising
expenditure on sales is different at different levels of shelf space.
e.
If a first-order model was used, the effect of advertising expenditure on sales would
be the same regardless of the amount of shelf space. If interaction really exists, the
effect of advertising expenditure on sales would depend on which level of shelf space
was present.
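With the interaction term, the estimated effect of a one-unit increase in x1 is β̂1 + β̂3x2, so the effect depends on x2. The Python sketch below uses the X1 and X1X2 estimates from the SAS output. It does not restate which of x1 and x2 is advertising expenditure and which is shelf space; that mapping follows the exercise's own variable definitions. The x2 values in the loop are hypothetical and may lie outside the observed range.

```python
# Estimates from the SAS output for E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2
B1 = -0.15122302   # X1
B3 = 0.05195415    # X1X2

def slope_in_x1(x2):
    """Estimated change in mean y per unit increase in x1, holding x2 fixed."""
    return B1 + B3 * x2

# Hypothetical x2 settings: the estimated x1 effect changes with x2
for x2 in (0, 5, 10):
    print(x2, round(slope_in_x1(x2), 4))
```

In a first-order model, this slope would be the same constant at every x2.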
11.134 a.
Predictor        Coef       StDev      T      P
Constant       42.247       5.712   7.40  0.000
x           -0.011404    0.005053  -2.26  0.037
xsq        0.00000061  0.00000037   1.66  0.115

S = 21.81   R-Sq = 34.9%   R-Sq(adj) = 27.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       2   4325.4  2162.7  4.55  0.026
Residual Error  17   8085.5   475.6
Total           19  12410.9

Source  DF  Seq SS
x        1  3013.3
xsq      1  1312.1

Unusual Observations
Obs     x1     y     Fit  StDev Fit  Residual  St Resid
 16   9150  4.60  -11.21      16.24     15.81    1.09 X
 17  15022  2.20    8.09      21.40     -5.89   -1.41 X
c.
From MINITAB, the test statistic for H0: β2 = 0 is t = 1.66 with p-value = .115. Since the p-value is
greater than α = .05, do not reject H0. There is insufficient evidence to indicate that a
curvilinear relationship exists between dissolved phosphorus percentage and soil loss
at α = .05.
11.136 a.
b.
Predictor          Coef        StDev      T      P
Constant          28.87        12.67   2.28  0.034
x1          -0.00000011   0.00000028  -0.38  0.708
x2               0.8440       0.2326   3.63  0.002
x3              -0.3600       0.1316  -2.74  0.013
x4              -0.3003       0.1834  -1.64  0.117

S = 5.989   R-Sq = 51.2%   R-Sq(adj) = 41.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   753.76  188.44  5.25  0.005
Residual Error  20   717.40   35.87
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R
The least squares prediction line is ŷ = 28.9 − .00000011x1 + .844x2 − .360x3 − .300x4.
To determine if the model is useful for predicting percentage of problem mortgages,
we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one of the coefficients is nonzero
The test statistic is F = MS(Model)/MSE = 5.25.
The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is
rejected. There is sufficient evidence to indicate the model is useful in predicting
percentage of problem mortgages at α = .05.
c.
β̂0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem.
β̂1 = −0.00000011. For each unit increase in total mortgage loans, the mean
percentage of problem mortgages is estimated to decrease by 0.00000011, holding
percentage of invested assets, percentage of commercial mortgages, and percentage of
residential mortgages constant.
β̂2 = 0.844. For each unit increase in percentage of invested assets, the mean
percentage of problem mortgages is estimated to increase by 0.844, holding total
mortgage loans, percentage of commercial mortgages, and percentage of residential
mortgages constant.
β̂4 = −0.300. For each unit increase in percentage of residential mortgages, the mean
percentage of problem mortgages is estimated to decrease by 0.300, holding total
mortgage loans, percentage of invested assets, and percentage of commercial
mortgages constant.
d.
From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in
the model as second order terms.
e.
Predictor          Coef        StDev      T      P
Constant          56.17        13.81   4.07  0.001
x1          -0.00000008   0.00000025  -0.31  0.760
x2              -1.8177       0.9935  -1.83  0.084
x3              -0.4494       0.1127  -3.99  0.001
x4               0.2227       0.6079   0.37  0.718
x2sq            0.07707      0.02665   2.89  0.010
x4sq           -0.01887      0.02334  -0.81  0.429

S = 4.956   R-Sq = 69.9%   R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6  1029.03  171.51  6.98  0.001
Residual Error  18   442.13   24.56
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17
x2sq     1  259.22
x4sq     1   16.05

Unusual Observations
Obs        x1       y     Fit  StDev Fit  Residual  St Resid
  4  11940345  32.600  26.777      4.038     5.823     2.03R
 10   5328142   7.500  16.105      2.599    -8.605    -2.04R
 12   4905123  27.000  16.559      3.607    10.441     3.07R
 20   2978628   3.200  11.759      2.679    -8.559    -2.05R
The test statistic is F = MS(Model)/MSE = 6.98.
The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is
rejected. There is sufficient evidence to indicate the model is useful in predicting
percentage of problem mortgages at α = .05.
f.
The test statistic is:
F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(717.40 − 442.13)/(6 − 4)] / {442.13/[25 − (6 + 1)]} = 5.60
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 =
k − g = 6 − 4 = 2 and ν2 = n − (k + 1) = 25 − (6 + 1) = 18. From Table IX,
Appendix B, F.05 = 3.55. The rejection region is F > 3.55.
Since the observed value of the test statistic falls in the rejection region (F = 5.60 >
3.55), H0 is rejected. There is sufficient evidence to indicate one or more of the
second-order terms of our model contribute information for the prediction of the
percentage of problem mortgages at α = .05.
11.138 a.
ANALYSIS OF VARIANCE
                   SUM OF         MEAN
SOURCE     DF     SQUARES       SQUARE    F VALUE    PROB>F
MODEL       3  2396.36410    798.78803     99.394    0.0001
ERROR      16   128.58590      8.03662
C TOTAL    19  2524.95000

ROOT MSE    2.83489     R-SQUARE    0.9491
DEP MEAN   23.05000     ADJ R-SQ    0.9395
C.V.       12.29889

PARAMETER ESTIMATES
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF     ESTIMATE         ERROR   PARAMETER=0    PROB > |T|
INTERCEP    1   -11.768830    3.05032146        -3.858        0.0014
X1          1    10.293782    1.43788129         7.159        0.0001
X1SQ        1    -0.417991    0.16132974        -2.591        0.0197
X2          1    13.244076    1.50325080         8.810        0.0001
The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:
DEP VARIABLE: Y
ANALYSIS OF VARIANCE
                   SUM OF         MEAN
SOURCE     DF     SQUARES       SQUARE    F VALUE    PROB>F
MODEL       1  1.25000000   1.25000000      0.009    0.9258
ERROR      18  2523.70000    140.20556
C TOTAL    19  2524.95000

ROOT MSE   11.84084     R-SQUARE    0.0005
DEP MEAN      23.05     ADJ R-SQ   -0.0550
C.V.       51.37025

PARAMETER ESTIMATES
                  PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF      ESTIMATE         ERROR   PARAMETER=0    PROB > |T|
INTERCEP    1   23.30000000    3.74440323         6.223        0.0001
X2          1   -0.50000000    5.29538583        -0.094        0.9258
The test statistic is:
F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(2523.70 − 128.5859)/(3 − 1)] / {128.5859/[20 − (3 + 1)]} = 149.01
The rejection region requires α = .10 in the upper tail of the F distribution with
numerator df = k − g = 3 − 1 = 2 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16.
From Table VIII, Appendix B, F.10 = 2.67. The rejection region is F > 2.67.
Since the observed value of the test statistic falls in the rejection region (F = 149.01
> 2.67), H0 is rejected. There is sufficient evidence to indicate the age of the machine
contributes information to the model at α = .10.
After adjusting for machine type, there is evidence that down time is related to age.
11.140 a.
For both sunny weekdays and sunny weekend days, as the predicted high temperature
increases, so does the predicted day's attendance. However, the predicted day's
attendance on sunny weekend days increases at a faster rate than on sunny weekdays.
Also, the predicted day's attendance is higher on sunny weekend days than on sunny
weekdays.
b.
H0: β4 = 0
Ha: β4 ≠ 0
The test statistic is t = (β̂4 − 0)/sβ̂4 = 15/3 = 5
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with
df = n − (k + 1) = 30 − (4 + 1) = 25. From Table VI, Appendix B, t.025 = 2.06. The
rejection region is t < −2.06 or t > 2.06.
Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06),
H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful
addition to the model at α = .05.
c.
d.
The width of the interval in Exercise 11.139e is 1245 − 645 = 600, while the width is
850 − 800 = 50 for the model containing the interaction term. A narrower interval reflects
a smaller estimated variance of the prediction error. This implies that the interaction term is quite
useful in predicting daily attendance. It has reduced the unexplained error.
e.
11.142 a.
Predictor        Coef     SE Coef      T      P
Constant       -9.917       1.354  -7.32  0.000
x1            0.16681     0.02124   7.85  0.000
x2            0.13760     0.02673   5.15  0.000
x1sq       -0.0011082   0.0001173  -9.45  0.000
x2sq       -0.0008433   0.0001594  -5.29  0.000
x1x2        0.0002411   0.0001440   1.67  0.103

S = 0.1871   R-Sq = 93.7%   R-Sq(adj) = 92.7%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       5  17.5827  3.5165  100.41  0.000
Residual Error  34   1.1908  0.0350
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552
x1x2     1  0.0982
b.
The standard deviation for the first-order model is s = .4023. The standard deviation
for the second-order model is s = .1871.
The relative precision for the first-order model is 2(.4023) = .8046. The relative
precision for the second-order model is 2(.1871) = .3742.
c.
The test statistic is F = MSR/MSE = 3.5165/.0350 = 100.41
The p-value is approximately 0 (reported as 0.000). Since the p-value is less than α = .05, H0 is rejected. There is
sufficient evidence to indicate the model is useful for predicting GPA at α = .05.
d.
[Figures: Residuals Versus x1 (response is y); Residuals Versus x2 (response is y)]
The plots of the residuals against x1 and against x2 for the second-order model
show no mound or bowl shape in either graph. This implies that second order is the highest order necessary. The second-order terms have eliminated the mound shape seen in the
plots of the residuals against x1 and the residuals against x2 for the first-order model.
From the plots and the results of the tests in 11.145, it appears the second-order model
is preferable for predicting GPA.
f.
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the
upper tail of the F distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) =
40 − (5 + 1) = 34. From Table IX, Appendix B, F.05 ≈ 2.92. The rejection region is
F > 2.92.
Since the observed value of the test statistic falls in the rejection region (F = 45.68 >
2.92), H0 is rejected. There is sufficient evidence that at least one second-order term
is useful at α = .05.
11.144 a.
b.
x3 = 1 if brand 3, 0 otherwise
c.
Several models were fit to obtain the final model. I first fit a model with only the main effects for
Floor, Distance, View, Endunit, and Furnish. Of these, only Furnish, adjusted for the other variables,
was not significant. See the output below.
The regression equation is
Price = 184 - 3.81 Floor + 1.74 Distance + 40.3 View - 32.7 Endunit
        + 4.28 Furnish

Predictor      Coef    Stdev  t-ratio      p
Constant    183.570    5.221    35.16  0.000
Floor       -3.8076   0.7482    -5.09  0.000
Distance     1.7414   0.3750     4.64  0.000
View         40.325    3.456    11.67  0.000
Endunit     -32.716    9.581    -3.41  0.001
Furnish       4.279    3.602     1.19  0.236

s = 24.39   R-sq = 49.4%   R-sq(adj) = 48.2%

Analysis of Variance
SOURCE       DF      SS     MS      F      p
Regression    5  118091  23618  39.69  0.000
Error       203  120802    595
Total       208  238893

SOURCE    DF  SEQ SS
Floor      1   14149
Distance   1   21208
View       1   75065
Endunit    1    6829
Furnish    1     840
I then added Floor² and Distance² to the model with all main effects. For this model, all of the main
effects, including Furnish, were significant along with both squared terms. The output follows.
The regression equation is
Price = 220 - 13.3 Floor - 7.01 Distance + 38.9 View - 22.0 Endunit + 7.31 Furnish + 1.05 FlSq + 0.572 DiSq

Predictor       Coef     Stdev   t-ratio       p
Constant     220.258     8.178     26.93   0.000
Floor        -13.296     3.253     -4.09   0.000
Distance      -7.007     1.614     -4.34   0.000
View          38.927     3.202     12.16   0.000
Endunit      -21.967     9.086     -2.42   0.017
Furnish        7.308     3.419      2.14   0.034
FlSq          1.0512    0.3492      3.01   0.003
DiSq          0.5719    0.1033      5.54   0.000

s = 22.49     R-sq = 57.4%     R-sq(adj) = 56.0%

Analysis of Variance

SOURCE        DF        SS       MS       F       p
Regression     7    137234    19605   38.76   0.000
Error        201    101659      506
Total        208    238893

SOURCE        DF    SEQ SS
Floor          1     14149
Distance       1     21208
View           1     75065
Endunit        1      6829
Furnish        1       840
FlSq           1      3640
DiSq           1     15503
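The two squared terms can also be tested together with a nested-model (partial) F test, the same kind of test used at the start of this section for second-order terms. The sketch below takes SSE for the main-effects model (reduced) and for the model with FlSq and DiSq added (complete) from the two outputs above; the variable names are illustrative only.

```python
# SSE values copied from the two Minitab fits on the same n = 209 observations.
sse_reduced = 120802      # main effects only
sse_complete = 101659     # main effects + FlSq + DiSq
k_reduced, k_complete = 5, 7   # predictors (excluding the constant) in each model
n = 209

df_num = k_complete - k_reduced          # number of terms being tested (2)
df_den = n - (k_complete + 1)            # error df of the complete model (201)

# Partial F = [(SSE_R - SSE_C) / df_num] / [SSE_C / df_den]
f_partial = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
print(f"partial F = {f_partial:.2f} on ({df_num}, {df_den}) df")
```

The drop in SSE (19143) equals the sum of the sequential SS for FlSq and DiSq (3640 + 15503), and the resulting F of about 18.9 is far above an α = .05 critical value of roughly 3 for (2, 201) df, consistent with keeping both squared terms.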
I then ran a stepwise regression, forcing all the main effects and the two squared terms into the model,
to see whether any two-way interaction terms should be added. Only the interaction between Floor and
View (FV) was significant. The output for the final model is:
The regression equation is
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq - 6.04 FV

Predictor       Coef     Stdev   t-ratio       p
Constant     206.123     8.379     24.60   0.000
Floor         -9.927     3.186     -3.12   0.002
Distance      -7.020     1.539     -4.56   0.000
View          65.952     6.619      9.96   0.000
Endunit      -22.451     8.662     -2.59   0.010
Furnish        6.485     3.265      1.99   0.048
FlSq          1.0207    0.3330      3.07   0.002
DiSq         0.57720   0.09848      5.86   0.000
FV            -6.037     1.312     -4.60   0.000

s = 21.44     R-sq = 61.5%     R-sq(adj) = 60.0%

Analysis of Variance

SOURCE        DF        SS       MS       F       p
Regression     8    146965    18371   39.97   0.000
Error        200     91928      460
Total        208    238893

SOURCE        DF    SEQ SS
Floor          1     14149
Distance       1     21208
View           1     75065
Endunit        1      6829
Furnish        1       840
FlSq           1      3640
DiSq           1     15503
FV             1      9731
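When a single term is added, the partial F test and the t test for that term are equivalent: F equals t². The sketch below checks this for FV using SSE from the previous model (without FV) and from the final model (with FV), together with FV's coefficient and standard error from the output above. Small differences come from rounding in the printed output.

```python
# SSE without and with the Floor x View interaction, from the Minitab outputs.
sse_without_fv, sse_with_fv = 101659, 91928
df_err = 200                                   # error df of the final model

# One added term, so the numerator df is 1.
f_fv = (sse_without_fv - sse_with_fv) / (sse_with_fv / df_err)

t_fv = -6.037 / 1.312                          # Coef / Stdev for FV
print(f"partial F = {f_fv:.2f}, t^2 = {t_fv ** 2:.2f}")
```

Both quantities come out near 21.17, which is why the stepwise procedure's t-based decision and a partial F test agree on keeping FV.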
This final model fits fairly well. The R-squared value is .615, so 61.5% of the variation in prices is
explained by the model containing the following variables: Floor and Floor-squared, Distance and
Distance-squared, View, Endunit, Furnish, and the interaction of Floor and View. The residual plots
are as follows:
From the residual plots, the residuals appear to be approximately normally distributed, but there may be a
couple of outliers: two points have standardized residuals less than -3. The variance also appears to be
constant. Thus, the model looks fairly good, although a higher R-squared value would be preferable.
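To make the outlier rule concrete, the sketch below computes standardized residuals as e_i / (s·sqrt(1 - h_i)), where h_i is the leverage, and flags points beyond ±3. For brevity it assumes a single predictor, where the leverage has the closed form 1/n + (x_i - x̄)²/Sxx; the condo model above has eight predictors and would need the full hat matrix. The data are hypothetical, built with one point pushed well below the line.

```python
import math

def standardized_residuals(x, y):
    """Residuals divided by s * sqrt(1 - h_i) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))    # root MSE, 2 parameters
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]   # leverage h_i
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

# Hypothetical data: y = 2 + 3x with small alternating noise,
# plus one point moved 8 units below the line.
x = list(range(1, 21))
y = [2 + 3 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[10] -= 8

z = standardized_residuals(x, y)
outliers = [i for i, zi in enumerate(z) if abs(zi) > 3]
print(outliers)
```

Only the shifted point is flagged, and its standardized residual is negative, which matches the pattern described for the two suspect condo sales.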
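The final model is:
$$\text{Price} = 206 - 9.93\,\text{Floor} - 7.02\,\text{Distance} + 66.0\,\text{View} - 22.5\,\text{Endunit} + 6.48\,\text{Furnish} + 1.02\,\text{FlSq} + 0.577\,\text{DiSq} - 6.04\,\text{FV}$$
where FlSq = Floor², DiSq = Distance², and FV = Floor × View.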
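The final model can be turned into a small prediction function. This sketch uses the full-precision coefficients from the Minitab output and assumes View, Endunit, and Furnish are 0/1 indicators (ocean view, end unit, furnished). The function name and the example unit are hypothetical, and the units of Price and Distance are not stated in this section.

```python
# Coefficients copied from the final Minitab fit.
COEF = {
    "Constant": 206.123, "Floor": -9.927, "Distance": -7.020,
    "View": 65.952, "Endunit": -22.451, "Furnish": 6.485,
    "FlSq": 1.0207, "DiSq": 0.57720, "FV": -6.037,
}

def predict_price(floor, distance, view, endunit, furnish):
    """Predicted Price from the final model; view, endunit, furnish are 0/1."""
    terms = {
        "Floor": floor, "Distance": distance, "View": view,
        "Endunit": endunit, "Furnish": furnish,
        "FlSq": floor ** 2,          # Floor squared
        "DiSq": distance ** 2,       # Distance squared
        "FV": floor * view,          # Floor x View interaction
    }
    return COEF["Constant"] + sum(COEF[name] * value for name, value in terms.items())

# Hypothetical unit: floor 3, distance 2, ocean view, not an end unit, furnished.
print(round(predict_price(3, 2, 1, 0, 1), 1))
```

Because Endunit and Furnish enter only as main effects, changing either one while holding everything else fixed shifts the prediction by exactly its coefficient (-22.451 or 6.485), which is what the Endunit and Furnish graphs below describe.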
I have included graphs to indicate how each variable affects the price. These graphs reflect the
relationship between Price and a selected variable, holding the other variables constant.
The first graph shows Price versus Floor for each level of View, since Floor and View interact. Both
curves are bent, reflecting the quadratic relationship between Floor and Price. For the non-ocean view,
the price is fairly flat: it decreases slightly as Floor increases up to about Floor 5 and then increases
slightly. For the ocean view, the price decreases at a decreasing rate as Floor increases.
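The turning points in the first graph follow from the fitted coefficients. Holding the other variables fixed, Price depends on Floor through (b_Floor + b_FV·View)·Floor + b_FlSq·Floor², a parabola with its minimum at -(linear coefficient)/(2·quadratic coefficient). The sketch below evaluates that minimum for each level of View.

```python
# Coefficients from the final model.
b_floor, b_flsq, b_fv = -9.927, 1.0207, -6.037

def floor_turning_point(view):
    """Floor at which predicted Price is lowest, for View = 0 or 1."""
    linear = b_floor + b_fv * view      # slope in Floor depends on View
    return -linear / (2 * b_flsq)       # vertex of the parabola

for view in (0, 1):
    print(view, round(floor_turning_point(view), 2))
```

For the non-ocean view the minimum is near Floor 4.9, matching the description above. For the ocean view it is near Floor 7.8, so over floors below that point the predicted price keeps falling but more slowly; the actual floor range of the data is not given in this section.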
The second graph shows Price versus Distance. Again, the curved line reflects the quadratic relationship.
As the distance increases, the price decreases until a distance of about 6 is reached, and then it begins to
increase again.
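The distance at which the price bottoms out can be read from the Distance and DiSq coefficients. The sketch below computes the vertex of the fitted parabola and also checks which whole-number distance in a hypothetical range of 1 to 10 gives the lowest contribution to Price.

```python
# Coefficients from the final model.
b_dist, b_disq = -7.020, 0.57720

def distance_effect(d):
    """Contribution of Distance and Distance^2 to predicted Price."""
    return b_dist * d + b_disq * d ** 2

vertex = -b_dist / (2 * b_disq)                        # continuous minimum
best_integer = min(range(1, 11), key=distance_effect)  # hypothetical grid 1..10
print(round(vertex, 2), best_integer)
```

The vertex sits just above 6, which is why the graph shows the price decreasing up to a distance of about 6 and rising afterward.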
The third graph shows Price versus View for each Floor. Again, because of the significant interaction, the
relationship between Price and View must be examined at each floor. For all floors, the price with an
ocean view is higher than the price with a non-ocean view. However, the size of the difference between the
two views depends on the floor.
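Holding the other variables fixed, the ocean-view premium at a given floor is b_View + b_FV·Floor, so it shrinks as the floor rises. The sketch below tabulates the premium for hypothetical floors 1 through 8 and computes the floor at which the fitted premium would reach zero.

```python
# Coefficients from the final model.
b_view, b_fv = 65.952, -6.037

def ocean_view_premium(floor):
    """Predicted Price(View=1) minus Price(View=0) at a given floor."""
    return b_view + b_fv * floor

break_even_floor = -b_view / b_fv      # floor where the premium would be zero
for floor in range(1, 9):              # hypothetical floors 1-8
    print(floor, round(ocean_view_premium(floor), 2))
print(round(break_even_floor, 2))
```

The premium stays positive as long as Floor is below about 10.9, so "ocean view is higher on every floor" holds for any building with fewer than 11 floors; the floor range itself is not stated in this section.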
The fourth graph shows Price versus Endunit. From the graph, end units are priced lower than the other
units.
The last graph shows Price versus Furnish. From the graph, furnished units are priced higher than
unfurnished units.