
Chapter 11: Multiple Regression and Model Building

11.2

a.

β̂0 = 506.346, β̂1 = 941.900, β̂2 = −429.060

b.

ŷ = 506.346 + 941.900x1 − 429.060x2

c.

SSE = 151,016, MSE = 8883, s = 94.251

We expect about 95% of the y-values to fall within 2s = 2(94.251) = 188.502 units of the fitted regression equation.
d.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (β̂1 − 0)/s_β̂1 = 941.900/275.08 = 3.42

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (2 + 1) = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110.

Since the observed value of the test statistic falls in the rejection region (t = 3.42 > 2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05.
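This t test is easy to check numerically. A minimal sketch using SciPy (the estimate 941.900 and standard error 275.08 come from the printout; `scipy.stats.t.ppf` supplies the critical value in place of Table VI):

```python
from scipy import stats

beta1_hat = 941.900   # estimated coefficient beta-hat_1 from the printout
se_beta1 = 275.08     # estimated standard error of beta-hat_1
n, k = 20, 2          # sample size and number of predictors

t_stat = (beta1_hat - 0) / se_beta1    # about 3.42
df = n - (k + 1)                       # 17
t_crit = stats.t.ppf(1 - 0.025, df)    # two-tailed critical value, about 2.110

reject = abs(t_stat) > t_crit          # True: reject H0 at alpha = .05
```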
e.

For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:

β̂2 ± t.025 s_β̂2 ⇒ −429.060 ± 2.110(379.83) ⇒ −429.060 ± 801.441 ⇒ (−1230.501, 372.381)
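The interval can also be produced programmatically; a sketch with SciPy, using the estimate and standard error from the printout:

```python
from scipy import stats

beta2_hat = -429.060   # estimated coefficient beta-hat_2
se_beta2 = 379.83      # its estimated standard error
df = 20 - (2 + 1)      # n - (k + 1) = 17

t_crit = stats.t.ppf(1 - 0.025, df)    # t.025 with 17 df
half_width = t_crit * se_beta2
ci = (beta2_hat - half_width, beta2_hat + half_width)
# ci is approximately (-1230.5, 372.4), matching the interval above
```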
f.

R² = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2.

Ra² = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.


g.

To determine if at least one of the independent variables is significant in predicting y, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = 7.22.

Since no α level was given, we will choose α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 2 and ν2 = n − (k + 1) = 20 − (2 + 1) = 17. From Table IX, Appendix B, F.05 = 3.59. The rejection region is F > 3.59.

Since the observed value of the test statistic falls in the rejection region (F = 7.22 > 3.59), H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α = .05.

h.

The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at any α greater than 0.005.

11.4

a.

We are given β̂1 = 3.1, s_β̂1 = 2.3, and n = 25.

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = (β̂1 − 0)/s_β̂1 = 3.1/2.3 = 1.35

The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 < 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.
b.

We are given β̂2 = .92, s_β̂2 = .27, and n = 25.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s_β̂2 = .92/.27 = 3.41


The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), reject H0. There is sufficient evidence to indicate β2 ≠ 0 at α = .05.
c.

For confidence coefficient .90, = 1 .90 = .10 and /2 = .10/2 = .05. From Table
VI, Appendix B, with df = n (k + 1) = 25 (2 + 1) = 22, t.05 = 1.717. The
confidence interval is:

1 t.05 s 3.1 1.717(2.3) 3.1 3.949 (.849, 7.049)


1

We are 90% confident that 1 falls between .849 and 7.049.


d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:

β̂2 ± t.005 s_β̂2 ⇒ .92 ± 2.819(.27) ⇒ .92 ± .761 ⇒ (.159, 1.681)

We are 99% confident that β2 falls between .159 and 1.681.


11.6

a.

For x2 = 1 and x3 = 3,

E(y) = 1 + 2x1 + (1) − 3(3)
E(y) = 2x1 − 7

The graph is:


b.

For x2 = −1 and x3 = 1,

E(y) = 1 + 2x1 + (−1) − 3(1)
E(y) = 2x1 − 3

The graph is:
The graph is:

c.

They are parallel, each with a slope of 2. They have different y-intercepts.

d.

The relationship will be parallel lines.

11.8

No. There may be other important independent variables that have not been included in the model, while some variables included in the model may not be important. The only conclusion is that at least one of the independent variables is a good predictor of y.

11.10

a.

The first-order model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

R² = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.

c.

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.58/5) / [(1 − .58)/(66 − (5 + 1))] = 16.57

d.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 66 − (5 + 1) = 60. From Table VIII, Appendix B, F.10 = 1.95. The rejection region is F > 1.95.

Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
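The F statistic built from R² can be sketched in a few lines; SciPy's `f.ppf` replaces the table lookup (it returns an F.10 critical value of about 1.95):

```python
from scipy import stats

R2, k, n = 0.58, 5, 66
df2 = n - (k + 1)                       # 60 error degrees of freedom

F = (R2 / k) / ((1 - R2) / df2)         # about 16.57
F_crit = stats.f.ppf(1 - 0.10, k, df2)  # upper-tail F.10 critical value
reject = F > F_crit                     # True: the model is useful
```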

11.12

a.

The least squares prediction equation is:

ŷ = 3.70 + .34x1 + .49x2 + .72x3 + 1.14x4 + 1.51x5 + .26x6 − .14x7 − .10x8 − .10x9


b.

β̂0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.

β̂1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.

β̂2 = 0.49. For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.

β̂3 = 0.72. For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.

β̂4 = 1.14. For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.

β̂5 = 1.51. For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.

β̂6 = 0.26. For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.

β̂7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.

β̂8 = −0.10. For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.

β̂9 = −0.10. For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.
c.

H0: β7 = 0
Ha: β7 < 0

The test statistic is t = (β̂7 − 0)/s_β̂7 = (−.14 − 0)/.14 = −1.00

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − (k + 1) = 234 − (9 + 1) = 224. From Table VI, Appendix B, t.05 = 1.645. The rejection region is t < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 > −1.645), H0 is not rejected. There is insufficient evidence to indicate that the mean number of runs decreases as the number of runners caught stealing increases, holding all other variables constant, at α = .05.


d.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 224, t.025 = 1.96. The 95% confidence interval is:

β̂5 ± t_α/2 s_β̂5 ⇒ 1.51 ± 1.96(.05) ⇒ 1.51 ± 0.098 ⇒ (1.412, 1.608)

We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant.
11.14

a.

R² = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is explained by the model containing the 7 independent variables.

b.

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.31/7) / [(1 − .31)/(66 − (7 + 1))] = 3.72

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 66 − (7 + 1) = 58. From Table XI, Appendix B, F.01 = 2.95. The rejection region is F > 2.95.

Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is useful in the prediction of the natural log of the level of CO2 emissions in 1996 at α = .01.
c.

To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test:

H0: β1 = 0
Ha: β1 ≠ 0

d.

The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a useful predictor of CO2 emissions in 1996 at α = .05.

11.16

a.

From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef       SE Coef      T      P
Constant    -108.07       62.70     -1.72  0.087
Mile           0.08509     0.08221   1.03  0.302
Length         3.771       1.619     2.33  0.021
Weight        -0.04941     0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance

Source           DF       SS      MS     F      P
Regression        3     53794   17931  1.89  0.135
Residual Error  140   1330210    9501
Total           143   1384003

The least squares prediction equation is:

ŷ = −108.07 + 0.08509x1 + 3.771x2 − 0.04941x3


b.

s = 97.48. We would expect about 95% of the observed values of DDT level to fall
within 2s or 2(97.48) = 194.96 units of their least squares predicted values.

c.

To determine if at least one of the variables is useful in predicting the DDT level, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 > .05), H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at α = .05.
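The p-value MINITAB reports for this global F test can be reproduced from the ANOVA degrees of freedom; a short sketch with SciPy:

```python
from scipy import stats

F, df1, df2 = 1.89, 3, 140           # F statistic and df from the ANOVA table
p_value = stats.f.sf(F, df1, df2)    # upper-tail area, about .135
reject = p_value < 0.05              # False: fail to reject H0
```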
d.

To determine if DDT level increases as length increases, we test:

H0: β2 = 0
Ha: β2 > 0

The test statistic is t = 2.33.

The p-value is p = .021/2 = .0105. Since the p-value is less than α (p = .0105 < .05), H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at α = .05.

The observed significance level is p = .0105.
e.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 144 − 4 = 140, t.025 = 1.96. The 95% confidence interval is:

β̂3 ± t_α/2 s_β̂3 ⇒ −0.04941 ± 1.96(0.02926) ⇒ −0.04941 ± 0.05735 ⇒ (−0.10676, 0.00794)

We are 95% confident that the mean DDT level will change from −0.10676 to 0.00794 for each additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.


11.18

a.

From MINITAB, the output is:

Regression Analysis: WeightChg versus Digest, Fiber

The regression equation is
WeightChg = 12.2 - 0.0265 Digest - 0.458 Fiber

Predictor     Coef       SE Coef     T      P
Constant    12.180       4.402      2.77  0.009
Digest      -0.02654     0.05349   -0.50  0.623
Fiber       -0.4578      0.1283    -3.57  0.001

S = 3.519   R-Sq = 52.9%   R-Sq(adj) = 50.5%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       2    542.03   271.02  21.88  0.000
Residual Error  39    483.08    12.39
Total           41   1025.12

ŷ = 12.2 − .0265x1 − .458x2


b.

β̂0 = 12.2 = the estimate of the y-intercept.

β̂1 = −.0265. We estimate that the mean weight change will decrease by .0265% for each additional increase of 1% in digestion efficiency, with acid-detergent fiber held constant.

β̂2 = −.458. We estimate that the mean weight change will decrease by .458% for each additional increase of 1% in acid-detergent fiber, with digestion efficiency held constant.
c.

To determine if digestion efficiency is a useful predictor of weight change, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = −.50. The p-value is p = .623. Since the p-value is greater than α (p = .623 > .01), H0 is not rejected. There is insufficient evidence to indicate that digestion efficiency is a useful linear predictor of weight change at α = .01.

d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 42 − (2 + 1) = 39, t.005 ≈ 2.704. The 99% confidence interval is:

β̂2 ± t.005 s_β̂2 ⇒ −.4578 ± 2.704(.1283) ⇒ −.4578 ± .3469 ⇒ (−.8047, −.1109)

We are 99% confident that the change in mean weight change for each unit change in acid-detergent fiber, holding digestion efficiency constant, is between −.8047% and −.1109%.


e.

R² = R-Sq = 52.9%. 52.9% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber.

Ra² = R-Sq(adj) = 50.5%. 50.5% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber, adjusting for the sample size and the number of parameters in the model.

f.

To determine if at least one of the variables is useful in predicting weight change, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 21.88 and the p-value is p = .000. Since the p-value is less than α = .05 (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting weight change at α = .05.
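The mechanics of a two-predictor least squares fit like the MINITAB run above can be sketched with NumPy. The data values below are hypothetical placeholders, not the exercise's actual data; only the design-matrix setup and the normal-equations solution are the point:

```python
import numpy as np

# Hypothetical digestion-efficiency (x1) and fiber (x2) values, NOT the
# exercise's actual data; they only illustrate the mechanics of the fit.
x1 = np.array([30.0, 45.0, 55.0, 60.0, 70.0, 75.0, 80.0])
x2 = np.array([25.0, 20.0, 18.0, 15.0, 12.0, 10.0,  8.0])
y  = np.array([ 1.0,  3.5,  4.0,  6.0,  7.5,  9.0, 10.0])

# Design matrix with an intercept column, then solve for (b0, b1, b2).
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat
sse = float(np.sum((y - y_fit) ** 2))   # sum of squared errors
```

With the real data, `beta_hat` would reproduce the Coef column of the printout and `sse` the Residual Error SS.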

11.20

a.

The least squares prediction equation is:

ŷ = 4.30 − .002x1 + .336x2 + .384x3 + .067x4 − .143x5 + .081x6 + .134x7

b.

To determine if the model is adequate, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 7

The test statistic is F = 111.1 (from the table).

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 268 − (7 + 1) = 260. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01.

Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is rejected. There is sufficient evidence to indicate that the model is adequate for predicting the logarithm of the audit fees at α = .05.

c.

β̂3 = .384. For each additional subsidiary of the auditee, the mean of the logarithm of audit fee is estimated to increase by .384 units.


d.

To determine if β4 > 0, we test:

H0: β4 = 0
Ha: β4 > 0

The test statistic is t = 1.76 (from the table).

The p-value for the test is .079. Since the p-value is not less than α (p = .079 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that β4 > 0, holding all the other variables constant, at α = .05.

e.

To determine if β1 < 0, we test:

H0: β1 = 0
Ha: β1 < 0

The test statistic is t = −0.049 (from the table).

The p-value for the test is .961. Since the p-value is not less than α (p = .961 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that β1 < 0, holding all the other variables constant, at α = .05. There is insufficient evidence to indicate that the new auditors charge less than incumbent auditors.

11.22

To determine if the model is useful, we test:

H0: β1 = β2 = ... = β18 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ..., 18

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.95/18) / [(1 − .95)/(20 − (18 + 1))] = 1.06

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table IX, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 < 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05.

Note: Although R² is large, there are so many variables in the model that ν2 is small.
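The note can be made concrete: with 18 predictors and only 1 error degree of freedom, even R² = .95 produces a tiny F statistic against an enormous critical value. A sketch with SciPy:

```python
from scipy import stats

R2, k, n = 0.95, 18, 20
df2 = n - (k + 1)                       # only 1 error degree of freedom

F = (R2 / k) / ((1 - R2) / df2)         # about 1.06, despite R2 = .95
F_crit = stats.f.ppf(1 - 0.05, k, df2)  # roughly 246: a huge critical value
reject = F > F_crit                     # False: cannot conclude the model is useful
```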


11.24

a.

From MINITAB, the output is:


Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef       SE Coef     T      P
Constant    131.92      25.69       5.13  0.000
Pounds        2.726      2.275      1.20  0.248
Units         0.04722    0.09335    0.51  0.620
Weight       -2.5874     0.6428    -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    5158.3   1719.4  17.87  0.000
Residual Error  16    1539.9     96.2
Total           19    6698.2

Source   DF  Seq SS
Pounds    1  3400.6
Units     1   198.4
Weight    1  1559.3

The least squares equation is:

ŷ = 131.92 + 2.726x1 + .0472x2 − 2.587x3
b.

To test the usefulness of the model, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, for i = 1, 2, 3

The test statistic is F = MSR/MSE = 1719.4/96.2 = 17.87

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table XI, Appendix B, F.01 = 5.29. The rejection region is F > 5.29.

Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01.
c.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = .51. The p-value = .620. We reject H0 if the p-value < α. Since .620 > .05, do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.


d.

R² is printed as R-Sq. R² = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.

e.

If the average number of pounds per shipment increases from 20 to 21, the estimated change in the mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables x1 and x2 are held constant.

f.

Since s = Standard Error = 9.81, we can estimate approximately to within 2s, or 2(9.81) = 19.62 hours.

g.

No. Regression analysis only determines if variables are related. It cannot be used to
determine cause and effect.

11.26

From the printout, the 90% prediction interval is (−151.996, 175.4874). We are 90% confident that the actual DDT level for a fish caught 100 miles upstream that is 40 centimeters long and weighs 800 grams will be between −151.996 and 175.4874. Since the DDT level cannot be negative, the interval would be between 0 and 175.4874.

11.28

a.

From MINITAB, the output is:


Regression Analysis: Precip versus Altitude, Latit, Coast

The regression equation is
Precip = - 102 + 0.00409 Altitude + 3.45 Latit - 0.143 Coast

Predictor     Coef        SE Coef      T      P
Constant   -102.36       29.21       -3.50  0.002
Altitude      0.004091    0.001218    3.36  0.002
Latit         3.4511      0.7949      4.34  0.000
Coast        -0.14286     0.03634    -3.93  0.001

S = 11.10   R-Sq = 60.0%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    4809.4   1603.1  13.02  0.000
Residual Error  26    3202.3    123.2
Total           29    8011.7

Source     DF  Seq SS
Altitude    1   730.7
Latit       1  2175.3
Coast       1  1903.4

Predicted Values for New Observations

New Obs    Fit   SE Fit       95.0% CI          95.0% PI
1        29.25     5.60   (17.75, 40.76)    (3.71, 54.80)

Values of Predictors for New Observations

New Obs  Altitude  Latit  Coast
1          6360     36.6    145

The fitted regression line is:

ŷ = −102.36 + 0.00409x1 + 3.4511x2 − 0.1429x3


b.

To determine if the first-order model is useful for predicting annual precipitation, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = 13.02 and the p-value is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that the model is useful for predicting annual precipitation at α = .05.
c.

The prediction interval is (3.71, 54.80).

With 95% confidence, we can conclude that the annual precipitation for an individual meteorological station with characteristics x1 = 6360 feet, x2 = 36.6, and x3 = 145 miles will fall between 3.71 inches and 54.80 inches.

11.30

The first-order model is:

E(y) = β0 + β1x1 + β2x2 + β3x5

We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).

Using MINITAB, the output is:

The regression equation is
y = 0.993 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor     Coef        StDev        T      P
Constant     0.9326      0.2482      3.76  0.002
x1          -0.024272    0.004900   -4.95  0.000
x2           0.14206     0.07573     1.88  0.080
x5           0.38457     0.09801     3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    6.8701   2.2900  9.95  0.001
Residual Error  15    3.4509   0.2301
Total           18   10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Unusual Observations

Obs    x1      y     Fit  StDev Fit  Residual  St Resid
     40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values

   Fit  StDev Fit        95.0% CI            95.0% PI
-0.098      0.232   (-0.592, 0.396)    (-1.233, 1.038)

The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between −1.233 and 1.038 kV/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
11.32

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3

11.34

a.

R² = 1 − SSE/SSyy = 1 − 21/479 = .956

95.6% of the total variability of the y values is explained by this model.


b.

To test the utility of the model, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.956/3) / [(1 − .956)/(32 − (3 + 1))] = 202.8

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table IX, Appendix B, F.05 = 2.95. The rejection region is F > 2.95.

Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.


c.

The relationship between y and x1 depends on the level of x2.

d.

To determine if x1 and x2 interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 − 0)/s_β̂3 = 2.5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix B, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048.

Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
11.36

a.

To determine if the overall model is useful for predicting y, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi is not 0

The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, willingness of the consumer to shop at a retailer's store in the future, at α = .05.
b.

To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at the retailer's shop in the future, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = −3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest interact to affect willingness to shop at the retailer's shop in the future at α = .05.
c.

When x2 = 1,

ŷ = β̂0 + .426x1 + .044x2 − .157x1x2
  = β̂0 + .426x1 + .044(1) − .157x1(1)
  = β̂0 + .044 + (.426 − .157)x1
  = β̂0 + .044 + .269x1

Since no value is given for β̂0, we will use β̂0 = 1 for graphing purposes. Using MINITAB, a graph might look like:
[MINITAB scatterplot of ŷ vs x1 when x2 = 1]

d.

When x2 = 7,

ŷ = β̂0 + .426x1 + .044x2 − .157x1x2
  = β̂0 + .426x1 + .044(7) − .157x1(7)
  = β̂0 + .308 + (.426 − 1.099)x1
  = β̂0 + .308 − .673x1

Since no value is given for β̂0, we will again use β̂0 = 1 for graphing purposes.


Using MINITAB, a graph might look like:

[MINITAB scatterplot of ŷ vs x1 when x2 = 7]

e.

Using MINITAB, both plots on the same graph would be:

[MINITAB scatterplot of ŷ vs x1 for x2 = 1 and x2 = 7]
Since the lines are not parallel, it indicates that interaction is present.
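The non-parallel lines can be verified by computing the partial slope of ŷ in x1 at each level of x2; a short sketch using the coefficients above:

```python
def slope_in_x1(x2, b1=0.426, b3=-0.157):
    """Partial slope of y-hat with respect to x1 at a given x2.

    With the interaction term, d(y-hat)/dx1 = b1 + b3*x2,
    so the slope changes with x2.
    """
    return b1 + b3 * x2

slope_at_1 = slope_in_x1(1)   # .426 - .157(1) =  .269
slope_at_7 = slope_in_x1(7)   # .426 - .157(7) = -.673
# Different slopes at x2 = 1 and x2 = 7: the lines cannot be parallel,
# which is exactly the signature of interaction.
```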
11.38

a.

The hypothesized regression model including the interaction between x1 and x2 would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

If x1 and x2 interact to affect y, then the effect of x1 on y depends on the level of x2. Also, the effect of x2 on y depends on the level of x1.


c.

Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate x1 and x2 interact to affect y.

d.

β1 corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the negative feeling score gets lower as the number of people behind increases, then β2 is negative.

11.40

a.

If client credibility and linguistic delivery style interact, then the effect of client credibility on the likelihood value depends on the level of linguistic delivery style.

b.

To determine the overall model adequacy, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

c.

The test statistic is F = 55.35 and the p-value is p < 0.0005.

Since the p-value is so small (p < 0.0005), H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate that the model is adequate for any α > 0.0005.

d.

To determine if client credibility and linguistic delivery style interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

e.

The test statistic is t = 4.008 and the p-value is p < 0.005.

Since the p-value is so small (p < 0.005), H0 is rejected. There is sufficient evidence to indicate that client credibility and linguistic delivery style interact for any α > 0.005.

f.

When x1 = 22, the least squares line is:

ŷ = 15.865 + 0.037(22) − 0.678x2 + 0.036x2(22) = 16.679 + 0.114x2

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is 0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.114.
g.

When x1 = 46, the least squares line is:

ŷ = 15.865 + 0.037(46) − 0.678x2 + 0.036x2(46) = 17.567 + 0.978x2

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is 0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.978.


11.42

a.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

H0: β4 = 0

c.

t = 4.408, p-value = .001

Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to indicate that the strength of the client-therapist relationship contributes information for the prediction of a client's reaction for any α > .001.

d.

Answers may vary.

e.

R² = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.

11.44

a.

β̂1 = .02. The mean level of support for a military response is estimated to increase by .02 for each day increase in the level of TV news exposure, all other variables held constant.

b.

To determine if an increase in TV news exposure is associated with an increase in support for military resolution, we test:

H0: β1 = 0
Ha: β1 > 0

The p-value is p = .03/2 = .015. Since the p-value is less than α (p = .015 < .05), H0 is rejected. There is sufficient evidence to indicate that an increase in TV news exposure is associated with an increase in support for military resolution, all other variables held constant, at α = .05.
c.

To determine if the relationship between support for military resolution and gender depends on political knowledge, we test:

H0: β8 = 0
Ha: β8 ≠ 0

The p-value is p = .02. Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate that the relationship between support for a military resolution and gender depends on political knowledge, all other variables held constant, at α = .05.
d.

To determine if the relationship between support for military resolution and race depends on political knowledge, we test:

H0: β9 = 0
Ha: β9 ≠ 0

The p-value is p = .08. Since the p-value is not less than α (p = .08 > .05), H0 is not rejected. There is insufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge, all other variables held constant, at α = .05.
e.

R² = .194. 19.4% of the variation in support for military resolution is explained by the model containing the seven independent variables and the two interaction terms.

f.

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 9

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.194/9) / [(1 − .194)/(1763 − (9 + 1))] = 46.88

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 9 and ν2 = n − (k + 1) = 1763 − (9 + 1) = 1753. From Table IX, Appendix B, F.05 ≈ 1.88. The rejection region is F > 1.88.

Since the observed value of the test statistic falls in the rejection region (F = 46.88 > 1.88), H0 is rejected. There is sufficient evidence to indicate that the model is useful at α = .05.
11.46

a.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s_β̂2 = (.47 − 0)/.15 = 3.133

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at α = .05.
b.

H0: β2 = 0
Ha: β2 > 0

The test statistic is the same as in part a, t = 3.133.

The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.


11.48

a.

b.

It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².

c.

It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.

11.50

a.

β̂0 has no meaning because x = 0 would not be in the observed range of values. In this case, x is the year, with values between 1984 and 1999.

b.

β̂1 = 321.67. Since the quadratic effect is included in the model, the linear term is just a location parameter and has no meaning.

c.

β̂2 = .0794. Since the value of β̂2 is positive, the curvature is upward.

d.

Since no data have been collected past 1999, we have no idea if the relationship between the two variables from 1984 to 1999 will remain the same until 2021.


11.52 a. Using MINITAB, a sketch of the least squares prediction equation is a scatterplot of ŷ (vertical axis, 0 to 12) versus Dose (horizontal axis, 0 to 800).

b. For x = 500, ŷ = 10.25 + .0053(500) - .0000266(500²) = 10.25 + 2.65 - 6.65 = 6.25

c. For x = 0, ŷ = 10.25 + .0053(0) - .0000266(0²) = 10.25

d. For x = 100, ŷ = 10.25 + .0053(100) - .0000266(100²) = 10.25 + .53 - .266 = 10.514
This value is slightly larger than that for the control group (10.25).

For x = 200, ŷ = 10.25 + .0053(200) - .0000266(200²) = 10.25 + 1.06 - 1.064 = 10.246
This value is slightly smaller than that for the control group (10.25). So, the largest value of x which yields an estimated weight change that is closest to, but just less than, the estimated weight change for the control group is x = 200.
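These evaluations can be checked by coding the fitted quadratic directly (the coefficients are taken from the exercise's fitted model):

```python
def weight_change(dose):
    """Fitted quadratic from the exercise: yhat = 10.25 + .0053x - .0000266x^2."""
    return 10.25 + 0.0053 * dose - 0.0000266 * dose ** 2

# Evaluate at the doses used in parts b-d.
predictions = {x: weight_change(x) for x in (0, 100, 200, 500)}
```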

11.54 a. A first-order model is:
E(y) = β0 + β1x

b. A second-order model is:
E(y) = β0 + β1x + β2x²


c. Using MINITAB, a scattergram of these data is a scatterplot of International gross revenue (vertical axis, 0 to 1200) versus Domestic gross revenue (horizontal axis, 100 to 600).

From the plot, it appears that the first-order model might fit the data better. There does not appear to be much of a curve to the relationship.
d. Using MINITAB, the output is:

Regression Analysis: International versus Domestic, Dsq

The regression equation is
International = 203 - 0.58 Domestic + 0.00364 Dsq

Predictor    Coef       SE Coef    T       P
Constant     202.9      245.0      0.83    0.424
Domestic     -0.581     1.510      -0.38   0.707
Dsq          0.003638   0.002085   1.74    0.107

S = 142.696   R-Sq = 78.8%   R-Sq(adj) = 75.2%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      2     906515    453258   22.26   0.000
Residual Error  12    244345    20362
Total           14    1150860

Source    DF   Seq SS
Domestic  1    844526
Dsq       1    61990
To investigate the usefulness of the model, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = 22.26.

The p-value is p = 0.000. Since the p-value is so small, we reject H0. There is sufficient evidence to indicate the model is useful for predicting foreign gross revenue.

To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 1.74.

The p-value is p = 0.107. Since the p-value is greater than α = .05 (p = 0.107 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at α = .05.
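The t statistic and two-tailed p-value for the Dsq term can be recovered from the printed coefficient and standard error (a sketch assuming SciPy):

```python
from scipy.stats import t as t_dist

coef, se, df_error = 0.003638, 0.002085, 12   # values from the MINITAB printout
t_stat = coef / se
p_value = 2 * t_dist.sf(abs(t_stat), df_error)
```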
e. From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.

11.56

a. (Sketch omitted.)

b. It moves the graph to the right (-2x) or to the left (+2x) compared to the graph of y = 1 + x².

c. It controls whether the graph opens upward (+x²) or downward (-x²). It also controls how steep the curvature is; i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.


11.58 a. A scatterplot of the data shows demand (vertical axis, roughly 3500 to 10500) against day (horizontal axis, 0 to 40).

b. From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.

c. Using MINITAB, the output for fitting a first-order model is:

The regression equation is
Y = 2752 + 122 X

Predictor   Coef     Stdev   t-ratio   p
Constant    2752.4   613.5   4.49      0.000
X           122.34   26.08   4.69      0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE      DF    SS          MS         F       p
Regression  1     79775688    79775688   22.01   0.000
Error       38    137726224   3624374
Total       39    217501920

Unusual Observations
Obs.   X      Y       Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007    6056   345         -4049      -2.16R
40     40.0   11520   7646   591         3874       2.14R

R denotes an obs. with a large st. resid.


To see if there is a significant linear relationship between day and demand, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 4.69.

The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05.
d. Using MINITAB, the output for fitting a second-order model is:

The regression equation is
Y = 5120 - 216 X + 8.25 XSQ

Predictor   Coef      Stdev   t-ratio   p
Constant    5120.2    816.9   6.27      0.000
X           -215.92   91.89   -2.35     0.024
XSQ         8.250     2.173   3.80      0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE      DF    SS          MS         F       p
Regression  2     118377056   59188528   22.09   0.000
Error       37    99124856    2679050
Total       39    217501920

SOURCE   DF   SEQ SS
X        1    79775688
XSQ      1    38601372

Unusual Observations
Obs.   X      Y      Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007   5305   357         -3298      -2.06R

R denotes an obs. with a large st. resid.

To see if there is a significant quadratic relationship between day and demand, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 3.80.

The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at α = .05.
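The first- and second-order fits can be mimicked with NumPy's polyfit; this sketch uses simulated day/demand data (not the exercise's), just to show the mechanics of adding the squared term:

```python
import numpy as np

rng = np.random.default_rng(0)
day = np.arange(1.0, 41.0)    # days 1..40
# Simulated demand with a known quadratic trend plus noise.
demand = 5000 - 200 * day + 8 * day**2 + rng.normal(0, 100, day.size)

first_order = np.polyfit(day, demand, 1)    # [slope, intercept]
second_order = np.polyfit(day, demand, 2)   # [quad coef, linear coef, intercept]
```

With this much data and a true quadratic trend, the fitted quadratic coefficient lands close to the value used in the simulation.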


e. Since the quadratic term is significant in the second-order model in part d, the second-order model is better.

11.60 The model is E(y) = β0 + β1x1 + β2x2

where
x1 = 1 if the variable is at level 2, 0 otherwise
x2 = 1 if the variable is at level 3, 0 otherwise

β0 = mean value of y when the qualitative variable is at level 1.
β1 = difference in mean value of y between level 2 and level 1 of the qualitative variable.
β2 = difference in mean value of y between level 3 and level 1 of the qualitative variable.
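A tiny numerical illustration of this coding (the level means here are invented for the example):

```python
# Hypothetical means of y at the three levels of the qualitative variable.
level_means = {1: 10.0, 2: 14.0, 3: 7.0}

beta0 = level_means[1]                   # mean at level 1 (base level)
beta1 = level_means[2] - level_means[1]  # level 2 vs. level 1
beta2 = level_means[3] - level_means[1]  # level 3 vs. level 1

def mean_response(level):
    """E(y) = beta0 + beta1*x1 + beta2*x2 with the dummy coding above."""
    x1 = 1 if level == 2 else 0
    x2 = 1 if level == 3 else 0
    return beta0 + beta1 * x1 + beta2 * x2
```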
11.62 a. The least squares prediction equation is:
ŷ = 80 + 16.8x1 + 40.4x2

b. β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable.

β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable.

c. The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3.

The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At least one mean (μ1, μ2, or μ3) is different.

d. The test statistic is F = MSR/MSE = 2059.5/83.3 = 24.72

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 2 and denominator df = n - (k + 1) = 15 - (2 + 1) = 12. From Table IX, Appendix B, F.05 = 3.89. The rejection region is F > 3.89.

Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
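Part d's F computation, with the Table IX lookup replaced by SciPy (a sketch):

```python
from scipy.stats import f

MSR, MSE = 2059.5, 83.3
F_stat = MSR / MSE                   # about 24.72
f_crit = f.isf(0.05, dfn=2, dfd=12)  # Table IX gives 3.89
reject_h0 = F_stat > f_crit
```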
11.64 a. A confidence interval for the difference of two population means could be used. Since both sample sizes are over 30, the large-sample confidence interval is used (with independent samples).

b. Let x1 = 1 if public college, 0 otherwise.
The model is E(y) = β0 + β1x1

c. β1 is the difference between the two population means. A point estimate for β1 is β̂1. A confidence interval for β1 could be used to estimate the difference in the two population means.

11.66 a. Let x1 = 1 if no, 0 if yes.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "yes" to the question "Flextime of the position applied for," and β1 is the difference in the mean job preference between those who responded "no" and those who responded "yes."

b. Let x1 = 1 if referral, 0 if not
x2 = 1 if on-premise, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to level of day care support required, β1 is the difference in the mean job preference between those who responded "referral" and those who responded "none," and β2 is the difference in the mean job preference between those who responded "on-premise" and those who responded "none."

c. Let x1 = 1 if counseling, 0 if not
x2 = 1 if active search, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to spousal transfer support required, β1 is the difference in the mean job preference between those who responded "counseling" and those who responded "none," and β2 is the difference in the mean job preference between those who responded "active search" and those who responded "none."

d. Let x1 = 1 if not married, 0 if married.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "married" to marital status, and β1 is the difference in the mean job preference between those who responded "not married" and those who responded "married."

e. Let x1 = 1 if female, 0 if male.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for males, and β1 is the difference in the mean job preference between females and males.

11.68 a. β̂4 = .296. The difference in the mean value of DTVA between firms whose operating earnings are negative and lower than last year and firms whose operating earnings are not negative and lower than last year is estimated to be .296, holding all other variables constant.

b. To determine if the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms, we test:

H0: β4 = 0
Ha: β4 > 0

The p-value for this test is p = .001/2 = .0005. Since the p-value is so small, we would reject H0 for α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.

c. Ra² = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5 independent variables, adjusted for the number of variables in the model and the sample size.

11.70 a. To determine if there is a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 8.14.

Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 2 since the data used are from 1972 through 1997. With α = .05, the critical value of t for the rejection region will be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.

However, the value of R² is .1818. The model used is explaining only 18.18% of the variability in the monthly rate of return. This is not a particularly large value.

To determine if there is a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 3.46.

Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 4 since the data used are from 1972 through 1997. With α = .05, the critical value of t for the rejection region will be smaller than 3.182. Thus, with α = .05, t = 3.46 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.

However, the value of R² is .0387. The model used is explaining only 3.87% of the variability in the monthly rate of return. This is a very small value.
b. For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy. For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy.

c. The least squares prediction equation for the equity REIT index is:
ŷ = 0.01863 - 0.01582x

When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 - 0.01582(1) = .00281

When the Federal Reserve's monetary policy is expansive, x = 0. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 - 0.01582(0) = .01863
11.72 a. The first-order model is E(y) = β0 + β1x1

b. The new model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
x3 = 1 if level 3, 0 otherwise

c. To allow for interactions, the model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

d. The response lines will be parallel if β4 = β5 = 0.

e. There will be one response line if β2 = β3 = β4 = β5 = 0.

11.74

a. When x2 = x3 = 0, E(y) = β0 + β1x1
When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3

b. For level 1, ŷ = 44.8 + 2.2x1
For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1

11.76 The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4

where x1 is the quantitative variable and
x2 = 1 if level 2 of the qualitative variable, 0 otherwise
x3 = 1 if level 3 of the qualitative variable, 0 otherwise
x4 = 1 if level 4 of the qualitative variable, 0 otherwise

11.78 a. E(y) = β0 + β1x1 + β2x2 + β3x1x2

where x2 = 1 if diet is duck chow, 0 otherwise

b. Using MINITAB, the printout is:

The regression equation is
WtChg = -2.21 + 0.0783 x1 + 10.4 x2 - 0.095 x1x2

Predictor   Coef      StDev     T       P
Constant    -2.210    1.250     -1.77   0.085
x1          0.07831   0.04947   1.58    0.122
x2          10.354    8.538     1.21    0.233
x1x2        -0.0948   0.1418    -0.67   0.508

S = 3.882   R-Sq = 44.1%   R-Sq(adj) = 39.7%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      3     452.54    150.85   10.01   0.000
Residual Error  38    572.58    15.07
Total           41    1025.12

Source   DF   Seq SS
x1       1    384.24
x2       1    61.57
x1x2     1    6.73

Unusual Observations
Obs   x1     WtChg    Fit     StDev Fit   Residual   St Resid
12    30.0   -8.500   0.139   0.802       -8.639     -2.27R
37    42.5   8.000    7.445   2.990       0.555      0.22 X
40    75.0   8.500    6.910   2.077       1.590      0.48 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted equation is ŷ = -2.21 + .0783x1 + 10.4x2 - .095x1x2

c. For diet = plants, x2 = 0:

ŷ = -2.21 + .0783x1 + 10.4(0) - .095x1(0) = -2.21 + .0783x1

The slope is .0783. For each unit increase in digestion efficiency, the mean weight change is estimated to increase by .0783 for goslings fed plants.

d. For diet = duck chow, x2 = 1:

ŷ = -2.21 + .0783x1 + 10.4(1) - .095x1(1) = 8.19 - .0167x1

The slope is -.0167. For each unit increase in digestion efficiency, the mean weight change is estimated to decrease by .0167 for goslings fed duck chow.

e. To determine if the slopes associated with the two diets differ, we test:

H0: β3 = 0
Ha: β3 ≠ 0

From MINITAB, the test statistic is t = -.67 with p-value = .508.

Since α = .05 is less than the p-value, we fail to reject H0. There is insufficient evidence to conclude that the slopes associated with the two diets are significantly different at α = .05.
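The two fitted lines in parts c and d follow directly from the interaction model's coefficients:

```python
# Coefficients of yhat = b0 + b1*x1 + b2*x2 + b3*x1*x2 from the printout.
b0, b1, b2, b3 = -2.21, 0.0783, 10.4, -0.095

# Plants diet (x2 = 0): intercept b0, slope b1.
plants_intercept, plants_slope = b0, b1
# Duck chow diet (x2 = 1): intercept b0 + b2, slope b1 + b3.
chow_intercept, chow_slope = b0 + b2, b1 + b3
```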
11.80 a. Let x2 = 1 if intervention group, 0 otherwise.
The first-order model would be:

E(y) = β0 + β1x1 + β2x2

b. For the control group, x2 = 0. The first-order model is:

E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model is:

E(y) = β0 + β1x1 + β2(1) = (β0 + β2) + β1x1

In both models, the slope of the line is β1.

c. If pretest score and group interact, the model would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d. For the control group, x2 = 0. The model including the interaction is:

E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

For the intervention group, x2 = 1. The model including the interaction is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the model for the control group is β1. The slope of the model for the intervention group is β1 + β3.
11.82 a. The first-order model is:

E(y) = β0 + β1x1 + β2x2

b. For the high-tech firms, x2 = 1. The model for high-tech firms is:

E(y) = β0 + β1x1 + β2(1) = (β0 + β2) + β1x1

The slope of the line would be β1.

c. The new model would include the interaction term:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d. For the high-tech firms, x2 = 1. The model for high-tech firms is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the line would be β1 + β3.

11.84 By adding variables to the model, SSE will decrease or stay the same. Thus, SSE_C ≤ SSE_R. The only circumstance under which we will reject H0 is if SSE_C is much smaller than SSE_R. If SSE_C is much smaller than SSE_R, F will be large. Thus, the test is one-tailed.

11.86 a. Ha: At least one βi ≠ 0, i = 3, 4, 5

b. The reduced model would be E(y) = β0 + β1x1 + β2x2

c. The numerator df = k - g = 5 - 2 = 3 and the denominator df = n - (k + 1) = 30 - (5 + 1) = 24.

d. H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(1250.2 - 1125.2)/(5 - 2)] / [1125.2/(30 - (5 + 1))]
  = 41.6667/46.8833 = .89

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k - g = 3 and denominator df = n - (k + 1) = 24. From Table IX, Appendix B, F.05 = 3.01. The rejection region is F > 3.01.

Since the observed value of the test statistic does not fall in the rejection region (F = .89 < 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.
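The nested-model (partial) F test used here generalizes to any reduced/complete pair; a small sketch assuming SciPy:

```python
from scipy.stats import f

def nested_f(sse_r, sse_c, k, g, n):
    """Partial F for H0: the extra (k - g) terms in the complete model are zero."""
    numerator = (sse_r - sse_c) / (k - g)
    denominator = sse_c / (n - (k + 1))
    return numerator / denominator

F_stat = nested_f(sse_r=1250.2, sse_c=1125.2, k=5, g=2, n=30)
f_crit = f.isf(0.05, dfn=3, dfd=24)   # Table IX gives 3.01
```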
11.88 a. Let variables x1 through x4 be the Demographic variables, x5 through x11 the Diagnostic variables, x12 through x15 the Treatment variables, and x16 through x21 the Community variables. The complete model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10 + β11x11 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

b. To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test:

H0: β5 = β6 = ... = β11 = 0
Ha: At least one βi ≠ 0, i = 5, 6, ..., 11

c. The reduced model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

d. Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at least one of the seven Diagnostic variables contributes information for the prediction of y.

11.90

a. The complete second-order model is:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

where x1 = age and
x2 = 1 if current, 0 otherwise

b. To determine if the quadratic terms are important, we test:

H0: β2 = β5 = 0

c. To determine if the interaction terms are important, we test:

H0: β4 = β5 = 0

d. From MINITAB, the outputs from fitting the three models are:

Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor   Coef      SE Coef   T       P
Constant    83.4      316.3     0.26    0.793
Age         -5.74     18.68     -0.31   0.760
AgeSq       0.2361    0.2549    0.93    0.359
Status      -62.1     354.8     -0.18   0.862
AgeSt       5.36      24.81     0.22    0.830
AgeSqSt     -0.2337   0.4080    -0.57   0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      5     1186549   237310   2.89   0.024
Residual Error  44    3618994   82250
Total           49    4805542

Source    DF   Seq SS
Age       1    865746
AgeSq     1    138871
Status    1    77594
AgeSt     1    77342
AgeSqSt   1    26996

Regression Analysis: Value versus Age, Status, AgeSt

The regression equation is
Value = -176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor   Coef      SE Coef   T       P
Constant    -176.1    145.0     -1.21   0.231
Age         11.166    3.902     2.86    0.006
Status      196.5     178.9     1.10    0.278
AgeSt       -11.432   6.763     -1.69   0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      3     1116017   372006   4.64   0.006
Residual Error  46    3689526   80207
Total           49    4805543

Source   DF   Seq SS
Age      1    865746
Status   1    21097
AgeSt    1    229174

Regression Analysis: Value versus Age, AgeSq, Status

The regression equation is
Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor   Coef     SE Coef   T       P
Constant    165.8    182.7     0.91    0.369
Age         -8.81    10.89     -0.81   0.423
AgeSq       0.2535   0.1632    1.55    0.127
Status      -105.6   107.9     -0.98   0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      3     1082210   360737   4.46   0.008
Residual Error  46    3723332   80942
Total           49    4805542

Source   DF   Seq SS
Age      1    865746
AgeSq    1    138871
Status   1    77594

Test for part b:

The test statistic is:
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(3,689,526 - 3,618,994)/2] / 82,250 = .429

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44 denominator degrees of freedom. From Table IX, Appendix B, F.05 ≈ 3.23. The rejection region is F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .429 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the quadratic terms are important for predicting market value at α = .05.

Test for part c:

The test statistic is:
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(3,723,332 - 3,618,994)/(5 - 3)] / 82,250 = .634

The rejection region is the same as in the previous test. Reject H0 if F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .634 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the interaction terms are important for predicting market value at α = .05.


11.92 a. The reduced model for testing if the mean posttest scores differ for the intervention and control groups would be:

E(y) = β0 + β1x1

b. The reported p-value is .03. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety knowledge scores differ for the intervention and control groups for α > .03.

c. The reported p-value is .033. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety comprehension scores differ for the intervention and control groups for α > .033.

d. The reported p-value is .322. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate that the mean posttest sun safety application scores differ for the intervention and control groups for α < .322.

11.94

a. To determine whether the rate of increase of emotional distress with experience is different for the two groups, we test:

H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 4, 5

b. To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

c. To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(795.23 - 783.9)/(5 - 2)] / [783.9/(200 - (5 + 1))] = .93

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 5 - 2 = 3 and ν2 = n - (k + 1) = 200 - (5 + 1) = 194. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60.

Since the observed value of the test statistic does not fall in the rejection region (F = .93 < 2.60), H0 is not rejected. There is insufficient evidence to indicate that there are differences in mean emotional distress levels that are attributable to exposure group at α = .05.

11.96 a. The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t statistics for each of the variables are:

Independent Variable   t = β̂i/s(β̂i)
x1                     t = 1.6/.42 = 3.81
x2                     t = .9/.01 = 90
x3                     t = 3.4/1.14 = 2.98
x4                     t = 2.5/2.06 = 1.21
x5                     t = 4.4/.73 = 6.03
x6                     t = .3/.35 = .86

The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t score is 90. This is larger than any of the others.
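The screening computation in the table can be made explicit in code:

```python
# (estimate, standard error) pairs for each candidate predictor.
candidates = {
    "x1": (1.6, 0.42), "x2": (0.9, 0.01), "x3": (3.4, 1.14),
    "x4": (2.5, 2.06), "x5": (4.4, 0.73), "x6": (0.3, 0.35),
}

t_stats = {name: b / s for name, (b, s) in candidates.items()}
best = max(t_stats, key=lambda name: abs(t_stats[name]))  # largest |t| wins
```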

b. Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute value of t, provided the absolute value of the t falls in the rejection region.

c. Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the largest absolute t value associated with it.

11.98 a. In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.

b. In step 2, all two-variable models are fit, where one of the variables is the best one selected in step 1. Thus, a total of 10 two-variable models are fit.

c. In the 11th step, only one model is fit: the model containing all the independent variables.

d. The model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β7x7 + β9x9 + β10x10 + β11x11

e. 67.7% of the total sample variability of overall satisfaction is explained by the model containing the independent variables safety on bus, seat availability, dependability, travel time, convenience of route, safety at bus stops, hours of service, and frequency of service.

f. Using stepwise regression does not guarantee that the best model will be found. There may be better combinations of the independent variables that are never found, because of the order in which the independent variables are entered into the model.

11.100 a.

The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved
shape. Such a pattern usually indicates that curvature needs to be added to the model.

b. The plot of the residuals reveals a nonrandom pattern. The plot of residuals versus predicted values shows the range of the residuals increasing as ŷ increases. This indicates that the variance of the random error, ε, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of ε is not constant for all settings of the x's.

c.

This plot reveals an outlier, since all or almost all of the residuals should fall within 3
standard deviations of their mean of 0.

d.

This frequency distribution of the residuals is skewed to the right. This may be due to
outliers or could indicate the need for a transformation of the dependent variable.

11.102 a. Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme multicollinearity.

b. No. The overall model test is significant (p < .001). This implies that at least one variable contributes to the prediction of the urban/rural rating. Looking at the individual t-tests, several are significant, namely those for x1, x3, and x5. There is no evidence that multicollinearity is present.

11.104 First, we need to compute the value of the residual:

Residual = y - ŷ = 87 - 29.63 = 57.37

We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is 57.37/24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard deviations from the regression line, this point is not considered an outlier.
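The same check, coded (a sketch):

```python
y, y_hat, s = 87.0, 29.63, 24.68

residual = y - y_hat                # 57.37
std_residual = residual / s         # residual in units of s
is_outlier = abs(std_residual) > 3  # 3-standard-deviation rule
```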


11.106 a. From MINITAB, the output is:

Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor   Coef        SE Coef    T       P
Constant    2.7944      0.4363     6.40    0.000
Income      -0.000164   0.006564   -0.02   0.980
Size        0.38348     0.07189    5.33    0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      2     15.0027   7.5013   14.52   0.000
Residual Error  23    11.8839   0.5167
Total           25    26.8865

Source   DF   Seq SS
Income   1    0.2989
Size     1    14.7037

Correlations: Income, Size

Pearson correlation of Income and Size = -0.137
P-Value = 0.506

No; income and household size do not seem to be highly correlated. The correlation coefficient between income and household size is -.137.
b. Using MINITAB, the residual plots are a histogram of the residuals, a plot of residuals versus the fitted values, residuals versus Income, and residuals versus Size (response is Food).

Yes; the plots of residuals versus income and residuals versus household size exhibit a curved shape. Such a pattern could indicate that a second-order model may be more appropriate.


c. No; the plot of residuals versus the predicted values reveals varying spreads for different values of ŷ. This implies that the variance of ε is not constant for all settings of the x's.

d. Yes; the outlier shows up in several plots and is the 26th household (food consumption = $7500, income = $7300, and household size = 5).

e. No; the frequency distribution of the residuals shows that the outlier skews the distribution to the right.

11.108 Using MINITAB, the residual plots are a normal probability plot of the residuals, a histogram of the residuals, residuals versus the fitted values, residuals versus the order of the data, and residuals versus each of WEIGHT, LENGTH, and MILE (response is DDT).

From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates that the residuals are not normal. The plot of the residuals versus ŷ indicates that there is at least one outlier and the variance is not constant. One observation has a standardized residual of more than 10 and several others have standardized residuals greater than 3. This is also evident in the plots of the residuals versus each of the independent variables. Since the assumptions of normality and constant variance appear to be violated, we could consider transforming the data. We should also check the outlying observations to see if there are any errors connected with these observations.
11.110 a. To determine if at least one of the parameters is not zero, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.83/4) / {(1 - .83)/[25 - (4 + 1)]} = 24.41

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 4 and denominator df = n - (k + 1) = 25 - (4 + 1) = 20. From Table IX, Appendix B, F.05 = 2.87. The rejection region is F > 2.87.

Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is rejected. There is sufficient evidence to indicate at least one of the parameters is nonzero at α = .05.
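The F-from-R² computation above is mechanical enough to script. A minimal Python sketch (the helper name is mine, not from the text):

```python
def global_f_from_r2(r2, n, k):
    # Global F statistic for H0: beta_1 = ... = beta_k = 0,
    # computed from R-squared: F = (R^2/k) / ((1 - R^2)/(n - (k + 1)))
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

F = global_f_from_r2(0.83, 25, 4)
print(round(F, 2))  # 24.41; compare with the tabled F.05 = 2.87 on (4, 20) df
```

The same helper reproduces the F statistics reported for the other exercises that give only R², n, and k.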
b. H0: β1 = 0
Ha: β1 < 0

The test statistic is t = (β̂1 - 0)/s_β̂1 = (-2.43 - 0)/1.21 = -2.01

The rejection region requires α = .05 in the lower tail of the t distribution with df = n - (k + 1) = 25 - (4 + 1) = 20. From Table VI, Appendix B, t.05 = 1.725. The rejection region is t < -1.725.

Since the observed value of the test statistic falls in the rejection region (t = -2.01 < -1.725), H0 is rejected. There is sufficient evidence to indicate β1 is less than 0 at α = .05.
c. H0: β2 = 0
Ha: β2 > 0

The test statistic is t = (β̂2 - 0)/s_β̂2 = (.05 - 0)/.16 = .31

The rejection region requires α = .05 in the upper tail of the t distribution. From part b above, the rejection region is t > 1.725.

Since the observed value of the test statistic does not fall in the rejection region (t = .31 ≯ 1.725), H0 is not rejected. There is insufficient evidence to indicate β2 is greater than 0 at α = .05.
d. H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 - 0)/s_β̂3 = (.62 - 0)/.26 = 2.38

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = 20. From Table VI, Appendix B, t.025 = 2.086. The rejection region is t < -2.086 or t > 2.086.

Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is rejected. There is sufficient evidence to indicate β3 is different from 0 at α = .05.


11.112 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample
means. The further x1, x2, and x3 are from their means, the larger the error. When x1 = 60,
x2 = .4, and x3 = 900, the observed values are outside the observed ranges of the x values.
When x1 = 30, x2 = .6, and x3 = 1300, the observed values are within the observed ranges
and consequently the x values are closer to their means. Thus, when x1 = 30, x2 = .6, and
x3 = 1300, the error of prediction is smaller.
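The same idea is easiest to see in the one-predictor case, where the standard error of prediction is s·sqrt(1 + 1/n + (x0 - x̄)²/SSxx) and grows with the distance of x0 from x̄. A small sketch with made-up values (s, n, x̄, and SSxx below are illustrative, not from the exercise):

```python
from math import sqrt

def se_pred(s, n, x0, xbar, ss_xx):
    # Standard error of prediction in simple linear regression:
    # s * sqrt(1 + 1/n + (x0 - xbar)^2 / SSxx)
    return s * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ss_xx)

# hypothetical values: s = 2, n = 20, xbar = 30, SSxx = 400
near = se_pred(2, 20, 30, 30, 400)  # at the mean of x
far = se_pred(2, 20, 60, 30, 400)   # far outside the observed range
print(near < far)  # True: prediction error grows away from the mean
```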
11.114 From the plot of the residuals for the straight line model, there appears to be a mound shape
which implies the quadratic model should be used.
11.116 a. H0: β4 = β5 = 0
Ha: At least one of β4 and β5 ≠ 0

b. The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2² + β4x1x2 + β5x1x2²

is fit to the 35 data points, yielding a sum of squares for error, denoted SSEC. The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2²

is also fit to the data and its sum of squares for error is obtained, denoted SSER. Then the test statistic is:

F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}

where k = 5, g = 3, and n = 35.

c. The numerator degrees of freedom is k - g = 5 - 3 = 2, and the denominator degrees of freedom is n - (k + 1) = 35 - (5 + 1) = 29.

d. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = 2 and denominator df = 29. From Table IX, Appendix B, F.05 = 3.33. The rejection region is F > 3.33.
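The partial F statistic and its degrees of freedom can be computed directly from the two SSEs. A sketch (the function name and the placeholder SSE values are mine; the exercise supplies none):

```python
def nested_f(sse_r, sse_c, n, k, g):
    # Partial (nested) F test of the complete model's extra terms:
    # F = ((SSE_R - SSE_C)/(k - g)) / (SSE_C/(n - (k + 1)))
    df1, df2 = k - g, n - (k + 1)
    f_stat = ((sse_r - sse_c) / df1) / (sse_c / df2)
    return f_stat, df1, df2

# With k = 5, g = 3, n = 35 the df are (2, 29), matching part c;
# the SSE values here are placeholders.
_, df1, df2 = nested_f(sse_r=100.0, sse_c=80.0, n=35, k=5, g=3)
print(df1, df2)  # 2 29
```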

11.118 a. E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

b. E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3

where x1, x2, and x3 are as in part a.
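The 0/1 coding in part a can be generated mechanically. A small sketch (the helper is hypothetical, with level 1 as the base level):

```python
def dummies(level, n_levels=3, base=1):
    # One indicator per non-base level; the base level is the reference
    # (all zeros), matching x2 = 1 for level 2 and x3 = 1 for level 3.
    return tuple(1 if level == j else 0
                 for j in range(1, n_levels + 1) if j != base)

print(dummies(1), dummies(2), dummies(3))  # (0, 0) (1, 0) (0, 1)
```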


11.120 a. E(y) = β0 + β1x1 + β2x2

b. E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x2² + β5x1x2

11.122 a. 1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical scale is meaningless in this situation. (It is possible to consider this as a quantitative variable. However, for this problem we will consider it as qualitative.)

b. The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year graduate GPA should all be positively correlated to final GPA.

c. x5 = 1 if student entered doctoral program in year 3, 0 otherwise
   x6 = 1 if student entered doctoral program in year 5, 0 otherwise

d. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6

e. β0 = the y-intercept for students entering in year 1.

β1 = the final GPA will increase by β1 for each additional increase of one unit of GMAT score, holding the remaining variables constant.

β2 = the final GPA will increase by β2 for each additional increase of one unit of verbal GMAT score, holding the remaining variables constant.

β3 = the final GPA will increase by β3 for each additional increase of one undergraduate GPA point, holding the remaining variables constant.

β4 = the final GPA will increase by β4 for each additional increase of one first-year graduate GPA point, holding the remaining variables constant.

β5 = difference in mean final GPA between the year 3 cohort and the year 1 cohort.

β6 = difference in mean final GPA between the year 5 cohort and the year 1 cohort.
f. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x1x5 + β8x1x6 + β9x2x5 + β10x2x6 + β11x3x5 + β12x3x6 + β13x4x5 + β14x4x6

g. For the year 1 cohort, x5 = x6 = 0. The model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5(0) + β6(0) + β7x1(0) + β8x1(0) + β9x2(0) + β10x2(0) + β11x3(0) + β12x3(0) + β13x4(0) + β14x4(0)
     = β0 + β1x1 + β2x2 + β3x3 + β4x4

The slopes for the four variables are β1, β2, β3 and β4, respectively.

11.124 a. The hypothesized model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

β0 = y-intercept. It has no interpretation in this model.

β1 = difference in the mean salaries between males and females, all other variables held constant.

β2 = difference in the mean salaries between whites and nonwhites, all other variables held constant.

β3 = change in the mean salary for each additional year of education, all other variables held constant.

β4 = change in the mean salary for each additional year of tenure with firm, all other variables held constant.

β5 = change in the mean salary for each additional hour worked per week, all other variables held constant.
b. The least squares equation is:

ŷ = 15.491 + 12.774x1 + .713x2 + 1.519x3 + .32x4 + .205x5

β̂0 = estimate of the y-intercept. It has no interpretation in this model.

β̂1: We estimate the difference in the mean salaries between males and females to be $12.774, all other variables held constant.

β̂2: We estimate the difference in the mean salaries between whites and nonwhites to be $.713, all other variables held constant.

β̂3: We estimate the change in the mean salary for each additional year of education to be $1.519, all other variables held constant.

β̂4: We estimate the change in the mean salary for each additional year of tenure with firm to be $.320, all other variables held constant.

β̂5: We estimate the change in the mean salary for each additional hour worked per week to be $.205, all other variables held constant.


c. R² = .240. 24% of the total variability of salaries is explained by the model containing gender, race, educational level, tenure with firm, and number of hours worked per week.

To determine if the model is useful for predicting annual salary, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.24/5) / {(1 - .24)/[191 - (5 + 1)]} = 11.68

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n - (k + 1) = 191 - (5 + 1) = 185. From Table IX, Appendix B, F.05 ≈ 2.21. The rejection region is F > 2.21.

Since the observed value of the test statistic falls in the rejection region (F = 11.68 > 2.21), H0 is rejected. There is sufficient evidence to indicate the model containing gender, race, educational level, tenure with firm, and number of hours worked per week is useful for predicting annual salary at α = .05.
d. To determine if male managers are paid more than female managers, we test:

H0: β1 = 0
Ha: β1 > 0

The p-value given for the test is less than .05/2 = .025. Since the p-value is less than α = .05, there is evidence to reject H0. There is evidence to indicate male managers are paid more than female managers, holding all other variables constant, for α > .025.

e. The salary paid an individual depends on many factors other than gender. Thus, in order to adjust for other factors influencing salary, we include them in the model.

11.126 a. The main effects model would be: E(y) = β0 + β1x1 + β8x8

b. β̂1 = -.28. The mean value of the relative error of the effort estimate for developers is estimated to be .28 units below that of project leaders, holding previous accuracy constant.

β̂8 = .27. The mean value of the relative error of the effort estimate if previous accuracy is more than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding company role of estimator constant.

c. One possible reason for the sign of β̂1 being opposite from what is expected could be that company role of estimator and previous accuracy could be correlated.


11.128 a. R² = .45. 45% of the total variability of the suicide rates is explained by the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP, and annual percent change in GNP.

To determine if the model is useful for predicting suicide rate, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.45/5) / {(1 - .45)/[45 - (5 + 1)]} = 6.38

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n - (k + 1) = 45 - (5 + 1) = 39. From Table IX, Appendix B, F.05 ≈ 2.45. The rejection region is F > 2.45.

Since the observed value of the test statistic falls in the rejection region (F = 6.38 > 2.45), H0 is rejected. There is sufficient evidence to indicate the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP and annual percent change in GNP is useful for predicting suicide rate at α = .05.
b. β̂0 = .002 = estimate of the y-intercept. It has no interpretation in this model.

β̂1: We estimate the change in suicide rate for each unit change in unemployment rate to be .0204, all other variables held constant.

β̂2: We estimate the change in suicide rate for each unit change in percentage of females in the work force to be .0231, all other variables held constant.

β̂3: We estimate the change in suicide rate for each unit change in divorce rate to be .0765, all other variables held constant.

β̂4: We estimate the change in suicide rate for each unit change in logarithm of GNP to be .2760, all other variables held constant.

β̂5: We estimate the change in suicide rate for each unit change in annual percent change in GNP to be .0018, all other variables held constant.

The p-values for unemployment rate and percentage of females in the work force are less than .05. This indicates that both are important in predicting suicide rate. The p-values for divorce rate, logarithm of GNP, and annual percent change in GNP are all greater than .10. This indicates that none of these variables are important in predicting suicide rate. We must view these conclusions with caution. Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if the other variables are removed from the model.


c. To determine if unemployment rate is a useful predictor of the suicide rate, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The p-value = .002. Since this p-value is less than α = .05, there is evidence to reject H0. There is sufficient evidence to indicate unemployment rate is a useful predictor of the suicide rate at α = .05.

d. Curvature: It may be possible that the relationship between the suicide rate and some of the independent variables is not linear, but curved. Thus, some of the variables that do not appear to be useful predictors may, in fact, be useful predictors if a second-order term were added to the model.

Interaction: Again, it may be possible that the effect of some independent variables on the suicide rate is different for different levels of other independent variables. This possibility should be explored before throwing out certain independent variables.

Multicollinearity: Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if other variables are removed from the model.

11.130 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable,
say CEO income, on the dependent variable profit (y) depends on the level of the second
variable, stock percentage.
11.132 a. The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       3       25784705.01     8594901.67    241.758    0.0001
ERROR      16         568826.19    35551.63709
C TOTAL    19       26353531.20

ROOT MSE    188.5514    R-SQUARE    0.9784
DEP MEAN      3014.2    ADJ R-SQ    0.9744
C.V.        6.255438

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1           1333.17830         290.99944                    4.581        0.0003
X1           1          -0.15122302        0.37864583                   -0.399        0.6949
X2           1          -2.62532461        5.34596285                   -0.491        0.6300
X1X2         1           0.05195415       0.006863831                    7.569        0.0001

The fitted model is ŷ = 1333.18 - .151x1 - 2.625x2 + .052x1x2

b. To determine if the overall model is useful, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = MSR/MSE = 8,594,901.67/35,551.637 = 241.758

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table IX, Appendix B, F.05 = 3.24. The rejection region is F > 3.24.

Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05.
c. To determine if the interaction is present, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 - 0)/s_β̂3 = 7.569.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VI, Appendix B, t.025 = 2.120. The rejection region is t < -2.120 or t > 2.120.

Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and shelf space is present at α = .05.
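Each t statistic in the parameter-estimates table is just the estimate divided by its standard error, which is easy to verify against the SAS output:

```python
def t_stat(estimate, std_error):
    # t for H0: parameter = 0 is simply estimate / standard error
    return estimate / std_error

# X1X2 row of the SAS output above
t = t_stat(0.05195415, 0.006863831)
print(round(t, 3))  # 7.569, matching T FOR H0: PARAMETER=0
```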


d. Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on sales is different at different levels of shelf space.

e.

If a first-order model was used, the effect of advertising expenditure on sales would
be the same regardless of the amount of shelf space. If interaction really exists, the
effect of advertising expenditure on sales would depend on which level of shelf space
was present.


11.134 a. There is a curvilinear trend.

b. From MINITAB, the output is:

The regression equation is y = 42.2 - 0.0114x + 0.000001 xsq

Predictor      Coef        StDev        T      P
Constant       42.247      5.712        7.40   0.000
x             -0.011404    0.005053    -2.26   0.037
xsq            0.00000061  0.00000037   1.66   0.115

S = 21.81    R-Sq = 34.9%    R-Sq(adj) = 27.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       2   4325.4  2162.7  4.55  0.026
Residual Error  17   8085.5   475.6
Total           19  12410.9

Source  DF  Seq SS
x        1  3013.3
xsq      1  1312.1

Unusual Observations
Obs      x     y     Fit  StDev Fit  Residual  St Resid
 16   9150  4.60  -11.21      16.24     15.81    1.09 X
 17  15022  2.20    8.09      21.40     -5.89   -1.41 X

X denotes an observation whose X value gives it large influence.

The fitted model is ŷ = 42.2 - .0114x + .00000061x²

c. To determine if a curvilinear relationship exists, we test:

H0: β2 = 0
Ha: β2 ≠ 0

From MINITAB, the test statistic is t = 1.66 with p-value = .115. Since the p-value is greater than α = .05, do not reject H0. There is insufficient evidence to indicate that a curvilinear relationship exists between dissolved phosphorus percentage and soil loss at α = .05.
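The reported p-value can be reproduced from the t statistic and the error degrees of freedom (n - (k + 1) = 20 - (2 + 1) = 17), assuming SciPy is available:

```python
from scipy.stats import t

# Two-sided p-value for the quadratic coefficient: t = 1.66, df = 17
t_obs, df = 1.66, 17
p_value = 2 * t.sf(t_obs, df)
print(round(p_value, 3))  # close to the reported .115
```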
11.136 a. The first-order model for this problem is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

b. Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 28.9 -0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor         Coef       StDev      T      P
Constant         28.87       12.67   2.28  0.034
x1         -0.00000011  0.00000028  -0.38  0.708
x2              0.8440      0.2326   3.63  0.002
x3             -0.3600      0.1316  -2.74  0.013
x4             -0.3003      0.1834  -1.64  0.117

S = 5.989    R-Sq = 51.2%    R-Sq(adj) = 41.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   753.76  188.44  5.25  0.005
Residual Error  20   717.40   35.87
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R

R denotes an observation with a large standardized residual

The least squares prediction line is ŷ = 28.9 - .00000011x1 + .844x2 - .360x3 - .300x4.

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 5.25

The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05.
c. β̂0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem.

β̂1 = -0.00000011. For each unit increase in total mortgage loans, the mean percentage of problem mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage of commercial mortgages, and percentage of residential mortgages constant.

β̂2 = 0.844. For each unit increase in percentage of invested assets, the mean percentage of problem mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial mortgages, and percentage of residential mortgages constant.

β̂3 = -0.360. For each unit increase in percentage of commercial mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of invested assets, and percentage of residential mortgages constant.

β̂4 = -0.300. For each unit increase in percentage of residential mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of invested assets, and percentage of commercial mortgages constant.


d. Using MINITAB, the scattergrams are:

From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as second-order terms.

e. Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 56.2 -0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor         Coef       StDev      T      P
Constant         56.17       13.81   4.07  0.001
x1         -0.00000008  0.00000025  -0.31  0.760
x2             -1.8177      0.9935  -1.83  0.084
x3             -0.4494      0.1127  -3.99  0.001
x4              0.2227      0.6079   0.37  0.718
x2sq           0.07707     0.02665   2.89  0.010
x4sq          -0.01887     0.02334  -0.81  0.429

S = 4.956    R-Sq = 69.9%    R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6  1029.03  171.51  6.98  0.001
Residual Error  18   442.13   24.56
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17
x2sq     1  259.22
x4sq     1   16.05

Unusual Observations
Obs        x1       y     Fit  StDev Fit  Residual  St Resid
  4  11940345  32.600  26.777      4.038     5.823     2.03R
 10   5328142   7.500  16.105      2.599    -8.605    -2.04R
 12   4905123  27.000  16.559      3.607    10.441     3.07R
 20   2978628   3.200  11.759      2.679    -8.559    -2.05R

R denotes an observation with a large standardized residual

The least squares prediction equation is

ŷ = 56.2 - .00000008x1 - 1.82x2 - .449x3 + .223x4 + .0771x2² - .0189x4²

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 6.98

The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05.
f. To determine if one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages, we test:

H0: β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(717.40 - 442.13)/(6 - 4)] / {442.13/[25 - (6 + 1)]} = 5.60

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = (k - g) = (6 - 4) = 2 and ν2 = n - (k + 1) = 25 - (6 + 1) = 18. From Table IX, Appendix B, F.05 = 3.55. The rejection region is F > 3.55.

Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), H0 is rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages at α = .05.
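Plugging the two SSEs into the partial F formula reproduces the statistic; with SciPy available, the p-value confirms the table-based decision:

```python
from scipy.stats import f

# Reduced vs. complete model for the mortgage data (hypotheses above)
sse_r, sse_c = 717.40, 442.13   # SSEs from the two MINITAB fits
n, k, g = 25, 6, 4
df1, df2 = k - g, n - (k + 1)
F = ((sse_r - sse_c) / df1) / (sse_c / df2)
p = f.sf(F, df1, df2)
print(round(F, 2), p < 0.05)  # F about 5.6; p-value below .05, so reject H0
```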
11.138 a. Using SAS, the output for fitting the model is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       3        2396.36410      798.78803     99.394    0.0001
ERROR      16         128.58590        8.03662
C TOTAL    19        2524.95000

ROOT MSE     2.83489    R-SQUARE    0.9491
DEP MEAN    23.05000    ADJ R-SQ    0.9395
C.V.        12.29889

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1          -11.768830        3.05032146                   -3.858        0.0014
X1           1           10.293782        1.43788129                    7.159        0.0001
X1SQ         1           -0.417991        0.16132974                   -2.591        0.0197
X2           1           13.244076        1.50325080                    8.810        0.0001

The fitted model is: ŷ = -11.8 + 10.3x1 - .418x1² + 13.2x2


b. To determine if the second-order term is necessary, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = -2.591.

The p-value is p = .0197. Since the p-value is less than α (p = .0197 < .05), H0 is rejected. There is sufficient evidence to conclude that the second-order term in the model proposed by the operations manager is necessary at α = .05.
c. The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       1        1.25000000     1.25000000      0.009    0.9258
ERROR      18     2523.70000       140.20556
C TOTAL    19     2524.95000

ROOT MSE    11.84084    R-SQUARE     0.0005
DEP MEAN       23.05    ADJ R-SQ    -0.0550
C.V.        51.37025

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1         23.30000000        3.74440323                    6.223        0.0001
X2           1         -0.50000000        5.29538583                   -0.094        0.9258

The fitted model is ŷ = 23.3 - .5x2.

The hypotheses are:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(2523.7 - 128.586)/(3 - 1)] / {128.586/[20 - (3 + 1)]} = 1197.557/8.036625 = 149.01

The rejection region requires α = .10 in the upper tail of the F distribution with numerator df = k - g = 3 - 1 = 2 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B, F.10 = 2.67. The rejection region is F > 2.67.

Since the observed value of the test statistic falls in the rejection region (F = 149.01 > 2.67), H0 is rejected. There is sufficient evidence to indicate the age of the machine contributes information to the model at α = .10.

After adjusting for machine type, there is evidence that down time is related to age.
After adjusting for machine type, there is evidence that down time is related to age.
11.140 a. For a sunny weekday, x1 = 0 and x2 = 1:

x3 = 70:  ŷ = 250 - 700(0) + 100(1) + 5(70) + 15(0)(70) = 700
x3 = 80:  ŷ = 250 - 700(0) + 100(1) + 5(80) + 15(0)(80) = 750
x3 = 90:  ŷ = 800
x3 = 100: ŷ = 850

For a sunny weekend, x1 = 1 and x2 = 1:

x3 = 70:  ŷ = 250 - 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050
x3 = 80:  ŷ = 250 - 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250
x3 = 90:  ŷ = 1450
x3 = 100: ŷ = 1650

For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so does the predicted day's attendance. However, the predicted day's attendance on sunny weekend days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher on sunny weekend days than on sunny weekdays.
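The predicted values above come from evaluating the fitted interaction equation, which is easy to check in code:

```python
def attendance(x1, x2, x3):
    # Fitted interaction model from the exercise:
    # y-hat = 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3
    return 250 - 700 * x1 + 100 * x2 + 5 * x3 + 15 * x1 * x3

# sunny weekday vs. sunny weekend at a predicted high of 70
print(attendance(0, 1, 70), attendance(1, 1, 70))  # 700 1050
```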
b. To determine if the interaction term is a useful addition to the model, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = β̂4/s_β̂4 = 15/3 = 5

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1) = 30 - (4 + 1) = 25. From Table VI, Appendix B, t.025 = 2.06. The rejection region is t < -2.06 or t > 2.06.

Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful addition to the model at α = .05.
c. For x1 = 0, x2 = 1, and x3 = 95,

ŷ = 250 - 700(0) + 100(1) + 5(95) + 15(0)(95) = 825

d. The width of the interval in Exercise 11.139e is 1245 - 645 = 600, while the width is 850 - 800 = 50 for the model containing the interaction term. The smaller the width of the interval, the smaller the variance. This implies that the interaction term is quite useful in predicting daily attendance. It has reduced the unexplained error.


e. Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be interpreted with caution. For all observed values of x3 (temperature), the interaction term value is greater than 700.

11.142 a. From MINITAB, the output is:
Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor        Coef    SE Coef      T      P
Constant       -9.917      1.354  -7.32  0.000
x1            0.16681    0.02124   7.85  0.000
x2            0.13760    0.02673   5.15  0.000
x1sq       -0.0011082  0.0001173  -9.45  0.000
x2sq       -0.0008433  0.0001594  -5.29  0.000
x1x2        0.0002411  0.0001440   1.67  0.103

S = 0.1871    R-Sq = 93.7%    R-Sq(adj) = 92.7%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       5  17.5827  3.5165  100.41  0.000
Residual Error  34   1.1908  0.0350
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552
x1x2     1  0.0982

The least squares prediction equation is:

ŷ = -9.917 + .167x1 + .138x2 - .00111x1² - .000843x2² + .000241x1x2

b.

The standard deviation for the first-order model is s = .4023. The standard deviation
for the second-order model is s = .1871.
The relative precision for the first-order model is 2(.4023) = .8046. The relative
precision for the second-order model is 2(.1871) = .3742.

c. To determine if the model is useful, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ..., 5

The test statistic is F = MSR/MSE = 3.5165/.0350 = 100.41

The p-value is .0000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting GPA at α = .05.


d. To determine if the interaction term is important, we test:

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = 1.67.

The p-value is .103. Since the p-value is not less than α = .10, H0 is not rejected. There is insufficient evidence to indicate the interaction term is important for predicting GPA at α = .10.
e. From MINITAB, the plots are:

[Plot of residuals versus x1 (response is y).]

[Plot of residuals versus x2 (response is y).]

The residual plots of the residuals against x1 and against x2 for the second-order model indicate there is no mound or bowl shape in either graph. This implies that second-order is the highest order necessary. We have eliminated the mound shape from the plots of the residuals against x1 and the residuals against x2 for the first-order model. From the plots and the results of the tests in 11.145, it appears the second-order model is preferable for predicting GPA.
f. To see if the second-order terms are useful, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(5.9876 - 1.1908)/3] / .0350 = 45.68

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 5 - 2 = 3 and ν2 = n - (k + 1) = 40 - (5 + 1) = 34. From Table IX, Appendix B, F.05 ≈ 2.92. The rejection region is F > 2.92.

Since the observed value of the test statistic falls in the rejection region (F = 45.68 > 2.92), H0 is rejected. There is sufficient evidence that at least one second-order term is useful at α = .05.


11.144 a. The model is E(y) = β0 + β1x1

A sketch of the response curve might be:

[Sketch omitted.]

b. The model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if brand 2, 0 otherwise
      x3 = 1 if brand 3, 0 otherwise

A sketch of the response curve might be:

[Sketch omitted.]

c. The model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

A sketch of the response curve might be:

[Sketch omitted.]

The Condo Sales Case

(To accompany Chapters 10 and 11)

Several models were fit to obtain the final model. I first fit a model with only the main effects for
Floor, Distance, View, Endunit, and Furnish. Of these, only Furnish, adjusted for the other variables,
was not significant. See the output below.
The regression equation is
Price = 184 - 3.81 Floor + 1.74 Distance + 40.3 View - 32.7 Endunit + 4.28 Furnish

Predictor      Coef   Stdev  t-ratio      p
Constant    183.570   5.221    35.16  0.000
Floor       -3.8076  0.7482    -5.09  0.000
Distance     1.7414  0.3750     4.64  0.000
View         40.325   3.456    11.67  0.000
Endunit     -32.716   9.581    -3.41  0.001
Furnish       4.279   3.602     1.19  0.236

s = 24.39    R-sq = 49.4%    R-sq(adj) = 48.2%

Analysis of Variance
SOURCE       DF      SS     MS      F      p
Regression    5  118091  23618  39.69  0.000
Error       203  120802    595
Total       208  238893

SOURCE    DF  SEQ SS
Floor      1   14149
Distance   1   21208
View       1   75065
Endunit    1    6829
Furnish    1     840

I then added Floor² and Distance² to the model with all main effects. For this model, all of the main effects, including Furnish, were significant along with both squared terms. The output follows.

The regression equation is
Price = 220 - 13.3 Floor - 7.01 Distance + 38.9 View - 22.0 Endunit + 7.31 Furnish + 1.05 FlSq + 0.572 DiSq

Predictor      Coef   Stdev  t-ratio      p
Constant    220.258   8.178    26.93  0.000
Floor       -13.296   3.253    -4.09  0.000
Distance     -7.007   1.614    -4.34  0.000
View         38.927   3.202    12.16  0.000
Endunit     -21.967   9.086    -2.42  0.017
Furnish       7.308   3.419     2.14  0.034
FlSq         1.0512  0.3492     3.01  0.003
DiSq         0.5719  0.1033     5.54  0.000

s = 22.49    R-sq = 57.4%    R-sq(adj) = 56.0%

Analysis of Variance
SOURCE       DF      SS     MS      F      p
Regression    7  137234  19605  38.76  0.000
Error       201  101659    506
Total       208  238893

SOURCE    DF  SEQ SS
Floor      1   14149
Distance   1   21208
View       1   75065
Endunit    1    6829
Furnish    1     840
FlSq       1    3640
DiSq       1   15503

I then ran a stepwise regression, forcing all of the main effects and the two squared terms into the model, to see whether any two-way interaction terms should be added. Of these, only the interaction between Floor and View was significant. The output from the final model is:
The regression equation is
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit
        + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq - 6.04 FV

Predictor      Coef     Stdev   t-ratio       p
Constant    206.123     8.379     24.60   0.000
Floor        -9.927     3.186     -3.12   0.002
Distance     -7.020     1.539     -4.56   0.000
View         65.952     6.619      9.96   0.000
Endunit     -22.451     8.662     -2.59   0.010
Furnish       6.485     3.265      1.99   0.048
FlSq         1.0207    0.3330      3.07   0.002
DiSq        0.57720   0.09848      5.86   0.000
FV           -6.037     1.312     -4.60   0.000

s = 21.44    R-sq = 61.5%    R-sq(adj) = 60.0%

Analysis of Variance
SOURCE        DF        SS       MS       F       p
Regression     8    146965    18371   39.97   0.000
Error        200     91928      460
Total        208    238893

SOURCE     DF   SEQ SS
Floor       1    14149
Distance    1    21208
View        1    75065
Endunit     1     6829
Furnish     1      840
FlSq        1     3640
DiSq        1    15503
FV          1     9731

This final model is fairly good. The R-squared value is .615; thus, 61.5% of the variation in price can be explained by the model that includes the following variables: Floor and Floor-squared, Distance and Distance-squared, View, Endunit, Furnish, and the Floor-by-View interaction. The residual plots
are as follows:

From the residual plots, it appears that the residuals are approximately normally distributed, but there may be a couple of outliers, as indicated by the two points whose standardized residuals are less than -3. The variance also appears to be constant. Thus, the model looks fairly good, although it would be better if the R-squared value were higher.
The final model is:
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit + 6.48 Furnish
        + 1.02 FlSq + 0.577 DiSq - 6.04 FV
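The final equation can be turned into a small prediction function. A minimal sketch using the full-precision coefficients from the output; the example unit (floor 5, distance 6, ocean view, not an end unit, furnished) is hypothetical:

```python
# Fitted Price from the final model, where FlSq = Floor**2,
# DiSq = Distance**2, and FV = Floor * View.

def predict_final(floor, distance, view, endunit, furnish):
    """Return the fitted Price from the final model."""
    return (206.123
            - 9.927 * floor
            - 7.020 * distance
            + 65.952 * view
            - 22.451 * endunit
            + 6.485 * furnish
            + 1.0207 * floor ** 2
            + 0.57720 * distance ** 2
            - 6.037 * floor * view)

# Hypothetical example unit
price = predict_final(floor=5, distance=6, view=1, endunit=0, furnish=1)
print(round(price, 2))  # → 202.92
```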
I have included graphs to indicate how each variable affects the price. These graphs reflect the
relationship between Price and a selected variable, holding the other variables constant.
The first graph plots Price by Floor for each level of View, since Floor and View interact. Both lines are curved to reflect the quadratic relationship between Floor and Price. For the non-ocean view, the price is fairly constant: it decreases slightly as Floor increases up to Floor 5, then increases slightly thereafter. For the ocean view, the price decreases at a decreasing rate as Floor increases.
The second graph plots Price by Distance. Again, the quadratic relationship is reflected by the curved line: as distance increases, the price decreases until a distance of about 6 is reached, and then begins to increase again.
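The turning points described above follow directly from the quadratic coefficients: a curve of the form a·x + b·x² reaches its minimum at x = -a/(2b). A minimal sketch checking both turning points against the final model's coefficients:

```python
# For a quadratic effect a*x + b*x**2, the turning point is at x = -a / (2*b).
# Coefficients are taken from the final model's output.

def vertex(a, b):
    """Return the x-value that minimizes a*x + b*x**2 (for b > 0)."""
    return -a / (2 * b)

dist_min = vertex(-7.020, 0.57720)           # Distance effect
floor_min_nonocean = vertex(-9.927, 1.0207)  # Floor effect when View = 0

print(round(dist_min, 2))            # → 6.08: price is lowest near distance 6
print(round(floor_min_nonocean, 2))  # → 4.86: price is lowest near Floor 5
```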


The third graph is a graph of the Price by View, for each Floor. Again, we must look at the relationship
between Price and View at each Floor because of the significant interaction. For all Floors, the price of
the Ocean View is higher than the price of the Non-ocean View. However, the difference in the two
views depends on the floor.
The fourth graph plots Price by Endunit. From the graph, the price of the end units is lower than that of the other units.
The last graph is a graph of the Price by Furnish. From the graph, the price of the furnished units is
higher than the price of the non-furnished units.
