
Chapter 11: Multiple Regression and Model Building

11.2

a.

β̂0 = 506.346, β̂1 = 941.900, β̂2 = −429.060

b.

ŷ = 506.346 + 941.900x1 − 429.060x2

c.

SSE = 151,016, MSE = 8883, s = 94.251

We expect about 95% of the y-values to fall within 2s = 2(94.251) = 188.502 units of the fitted regression equation.
d.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (β̂1 − 0)/s_β̂1 = 941.900/275.08 = 3.42

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (2 + 1) = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110.

Since the observed value of the test statistic falls in the rejection region (t = 3.42 > 2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05.
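This t test is easy to check numerically. A minimal sketch using SciPy (the estimate 941.900 and standard error 275.08 come from the printout; `scipy.stats.t.ppf` supplies the critical value in place of Table VI):

```python
from scipy import stats

beta1_hat = 941.900   # estimated coefficient beta-hat_1 from the printout
se_beta1 = 275.08     # estimated standard error of beta-hat_1
n, k = 20, 2          # sample size and number of predictors

t_stat = (beta1_hat - 0) / se_beta1    # about 3.42
df = n - (k + 1)                       # 17
t_crit = stats.t.ppf(1 - 0.025, df)    # two-tailed critical value, about 2.110

reject = abs(t_stat) > t_crit          # True: reject H0 at alpha = .05
```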
e.

For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:

β̂2 ± t.025 s_β̂2 ⇒ −429.060 ± 2.110(379.83) ⇒ −429.060 ± 801.441 ⇒ (−1230.501, 372.381)
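The interval can also be produced programmatically; a sketch with SciPy, using the estimate and standard error from the printout:

```python
from scipy import stats

beta2_hat = -429.060   # estimated coefficient beta-hat_2
se_beta2 = 379.83      # its estimated standard error
df = 20 - (2 + 1)      # n - (k + 1) = 17

t_crit = stats.t.ppf(1 - 0.025, df)    # t.025 with 17 df
half_width = t_crit * se_beta2
ci = (beta2_hat - half_width, beta2_hat + half_width)
# ci is approximately (-1230.5, 372.4), matching the interval above
```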
f.

R² = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2.

Ra² = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.


g.

To determine if at least one of the independent variables is significant in predicting y, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = 7.22.

Since no α level was given, we will choose α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 2 and ν2 = n − (k + 1) = 20 − (2 + 1) = 17. From Table IX, Appendix B, F.05 = 3.59. The rejection region is F > 3.59.

Since the observed value of the test statistic falls in the rejection region (F = 7.22 > 3.59), H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α = .05.

h.

The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at any α greater than 0.005.

11.4

a.

We are given β̂1 = 3.1, s_β̂1 = 2.3, and n = 25.

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = (β̂1 − 0)/s_β̂1 = 3.1/2.3 = 1.35

The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 < 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.
b.

We are given β̂2 = .92, s_β̂2 = .27, and n = 25.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s_β̂2 = .92/.27 = 3.41


The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), reject H0. There is sufficient evidence to indicate β2 ≠ 0 at α = .05.
c.

For confidence coefficient .90, = 1 .90 = .10 and /2 = .10/2 = .05. From Table
VI, Appendix B, with df = n (k + 1) = 25 (2 + 1) = 22, t.05 = 1.717. The
confidence interval is:

1 t.05 s 3.1 1.717(2.3) 3.1 3.949 (.849, 7.049)


1

We are 90% confident that 1 falls between .849 and 7.049.


d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:

β̂2 ± t.005 s_β̂2 ⇒ .92 ± 2.819(.27) ⇒ .92 ± .761 ⇒ (.159, 1.681)

We are 99% confident that β2 falls between .159 and 1.681.


11.6

a.

For x2 = 1 and x3 = 3,

E(y) = 1 + 2x1 + (1) − 3(3)
E(y) = 2x1 − 7

The graph is:


b.

For x2 = −1 and x3 = 1,

E(y) = 1 + 2x1 + (−1) − 3(1)
E(y) = 2x1 − 3

The graph is:
The graph is:

c.

They are parallel, each with a slope of 2. They have different y-intercepts.

d.

The relationship will be parallel lines.

11.8

No. There may be other important independent variables that have not been included in the model, while some variables included in the model may not be important. The only conclusion is that at least one of the independent variables is a good predictor of y.

11.10

a.

The first-order model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

R² = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.

c.

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.58/5) / [(1 − .58)/(66 − (5 + 1))] = 16.57

d.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 66 − (5 + 1) = 60. From Table VIII, Appendix B, F.10 = 1.95. The rejection region is F > 1.95.

Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
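The F statistic built from R² can be sketched in a few lines; SciPy's `f.ppf` replaces the table lookup (it returns an F.10 critical value of about 1.95):

```python
from scipy import stats

R2, k, n = 0.58, 5, 66
df2 = n - (k + 1)                       # 60 error degrees of freedom

F = (R2 / k) / ((1 - R2) / df2)         # about 16.57
F_crit = stats.f.ppf(1 - 0.10, k, df2)  # upper-tail F.10 critical value
reject = F > F_crit                     # True: the model is useful
```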

11.12

a.

The least squares prediction equation is:

ŷ = 3.70 + .34x1 + .49x2 + .72x3 + 1.14x4 + 1.51x5 + .26x6 − .14x7 − .10x8 − .10x9


b.

β̂0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.

β̂1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.

β̂2 = 0.49. For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.

β̂3 = 0.72. For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.

β̂4 = 1.14. For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.

β̂5 = 1.51. For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.

β̂6 = 0.26. For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.

β̂7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.

β̂8 = −0.10. For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.

β̂9 = −0.10. For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.
c.

H0: β7 = 0
Ha: β7 < 0

The test statistic is t = (β̂7 − 0)/s_β̂7 = (−.14 − 0)/.14 = −1.00

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − (k + 1) = 234 − (9 + 1) = 224. From Table VI, Appendix B, t.05 = 1.645. The rejection region is t < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 > −1.645), H0 is not rejected. There is insufficient evidence to indicate that the mean number of runs decreases as the number of runners caught stealing increases, holding all other variables constant, at α = .05.


d.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 224, t.025 = 1.96. The 95% confidence interval is:

β̂5 ± t_α/2 s_β̂5 ⇒ 1.51 ± 1.96(.05) ⇒ 1.51 ± 0.098 ⇒ (1.412, 1.608)

We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant.
11.14

a.

R² = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is explained by the model containing the 7 independent variables.

b.

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.31/7) / [(1 − .31)/(66 − (7 + 1))] = 3.72

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 66 − (7 + 1) = 58. From Table XI, Appendix B, F.01 = 2.95. The rejection region is F > 2.95.

Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is useful in the prediction of the natural log of the level of CO2 emissions in 1996 at α = .01.
c.

To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test:

H0: β1 = 0
Ha: β1 ≠ 0

d.

The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a useful predictor of CO2 emissions in 1996 at α = .05.

11.16

a.

From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef       SE Coef      T      P
Constant    -108.07       62.70     -1.72  0.087
Mile           0.08509     0.08221   1.03  0.302
Length         3.771       1.619     2.33  0.021
Weight        -0.04941     0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance

Source           DF       SS      MS     F      P
Regression        3     53794   17931  1.89  0.135
Residual Error  140   1330210    9501
Total           143   1384003

The least squares prediction equation is:

ŷ = −108.07 + 0.08509x1 + 3.771x2 − 0.04941x3


b.

s = 97.48. We would expect about 95% of the observed values of DDT level to fall
within 2s or 2(97.48) = 194.96 units of their least squares predicted values.

c.

To determine if at least one of the variables is useful in predicting the DDT level, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 > .05), H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at α = .05.
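The p-value MINITAB reports for this global F test can be reproduced from the ANOVA degrees of freedom; a short sketch with SciPy:

```python
from scipy import stats

F, df1, df2 = 1.89, 3, 140           # F statistic and df from the ANOVA table
p_value = stats.f.sf(F, df1, df2)    # upper-tail area, about .135
reject = p_value < 0.05              # False: fail to reject H0
```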
d.

To determine if DDT level increases as length increases, we test:

H0: β2 = 0
Ha: β2 > 0

The test statistic is t = 2.33.

The p-value is p = .021/2 = .0105. Since the p-value is less than α (p = .0105 < .05), H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at α = .05.

The observed significance level is p = .0105.
e.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 144 − 4 = 140, t.025 = 1.96. The 95% confidence interval is:

β̂3 ± t_α/2 s_β̂3 ⇒ −0.04941 ± 1.96(0.02926) ⇒ −0.04941 ± 0.05735 ⇒ (−0.10676, 0.00794)

We are 95% confident that the mean DDT level will change from −0.10676 to 0.00794 for each additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.


11.18

a.

From MINITAB, the output is:

Regression Analysis: WeightChg versus Digest, Fiber

The regression equation is
WeightChg = 12.2 - 0.0265 Digest - 0.458 Fiber

Predictor     Coef       SE Coef     T      P
Constant    12.180       4.402      2.77  0.009
Digest      -0.02654     0.05349   -0.50  0.623
Fiber       -0.4578      0.1283    -3.57  0.001

S = 3.519   R-Sq = 52.9%   R-Sq(adj) = 50.5%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       2    542.03   271.02  21.88  0.000
Residual Error  39    483.08    12.39
Total           41   1025.12

ŷ = 12.2 − .0265x1 − .458x2


b.

β̂0 = 12.2 = the estimate of the y-intercept.

β̂1 = −.0265. We estimate that the mean weight change will decrease by .0265% for each additional increase of 1% in digestion efficiency, with acid-detergent fiber held constant.

β̂2 = −.458. We estimate that the mean weight change will decrease by .458% for each additional increase of 1% in acid-detergent fiber, with digestion efficiency held constant.
c.

To determine if digestion efficiency is a useful predictor of weight change, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = −.50. The p-value is p = .623. Since the p-value is greater than α (p = .623 > .01), H0 is not rejected. There is insufficient evidence to indicate that digestion efficiency is a useful linear predictor of weight change at α = .01.

d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 42 − (2 + 1) = 39, t.005 ≈ 2.704. The 99% confidence interval is:

β̂2 ± t.005 s_β̂2 ⇒ −.4578 ± 2.704(.1283) ⇒ −.4578 ± .3469 ⇒ (−.8047, −.1109)

We are 99% confident that the change in mean weight change for each unit change in acid-detergent fiber, holding digestion efficiency constant, is between −.8047% and −.1109%.


e.

R² = R-Sq = 52.9%. 52.9% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber.

Ra² = R-Sq(adj) = 50.5%. 50.5% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber, adjusting for the sample size and the number of parameters in the model.

f.

To determine if at least one of the variables is useful in predicting weight change, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 21.88 and the p-value is p = .000. Since the p-value is less than α = .05 (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting weight change at α = .05.
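The mechanics of a two-predictor least squares fit like the MINITAB run above can be sketched with NumPy. The data values below are hypothetical placeholders, not the exercise's actual data; only the design-matrix setup and the normal-equations solution are the point:

```python
import numpy as np

# Hypothetical digestion-efficiency (x1) and fiber (x2) values, NOT the
# exercise's actual data; they only illustrate the mechanics of the fit.
x1 = np.array([30.0, 45.0, 55.0, 60.0, 70.0, 75.0, 80.0])
x2 = np.array([25.0, 20.0, 18.0, 15.0, 12.0, 10.0,  8.0])
y  = np.array([ 1.0,  3.5,  4.0,  6.0,  7.5,  9.0, 10.0])

# Design matrix with an intercept column, then solve for (b0, b1, b2).
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta_hat
sse = float(np.sum((y - y_fit) ** 2))   # sum of squared errors
```

With the real data, `beta_hat` would reproduce the Coef column of the printout and `sse` the Residual Error SS.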

11.20

a.

The least squares prediction equation is:

ŷ = 4.30 − .002x1 + .336x2 + .384x3 + .067x4 − .143x5 + .081x6 + .134x7

b.

To determine if the model is adequate, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 7

The test statistic is F = 111.1 (from the table).

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 268 − (7 + 1) = 260. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01.

Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is rejected. There is sufficient evidence to indicate that the model is adequate for predicting the logarithm of the audit fees at α = .05.

c.

β̂3 = .384. For each additional subsidiary of the auditee, the mean of the logarithm of audit fee is estimated to increase by .384 units.


d.

To determine if β4 > 0, we test:

H0: β4 = 0
Ha: β4 > 0

The test statistic is t = 1.76 (from the table).

The p-value for the test is .079. Since the p-value is not less than α (p = .079 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that β4 > 0, holding all the other variables constant, at α = .05.

e.

To determine if β1 < 0, we test:

H0: β1 = 0
Ha: β1 < 0

The test statistic is t = −0.049 (from the table).

The p-value for the test is .961. Since the p-value is not less than α (p = .961 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that β1 < 0, holding all the other variables constant, at α = .05. There is insufficient evidence to indicate that the new auditors charge less than incumbent auditors.

11.22

To determine if the model is useful, we test:

H0: β1 = β2 = ... = β18 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ..., 18

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.95/18) / [(1 − .95)/(20 − (18 + 1))] = 1.06

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table IX, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 < 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05.

Note: Although R² is large, there are so many variables in the model that ν2 is small.
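The note can be made concrete: with 18 predictors and only 1 error degree of freedom, even R² = .95 produces a tiny F statistic against an enormous critical value. A sketch with SciPy:

```python
from scipy import stats

R2, k, n = 0.95, 18, 20
df2 = n - (k + 1)                       # only 1 error degree of freedom

F = (R2 / k) / ((1 - R2) / df2)         # about 1.06, despite R2 = .95
F_crit = stats.f.ppf(1 - 0.05, k, df2)  # roughly 246: a huge critical value
reject = F > F_crit                     # False: cannot conclude the model is useful
```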


11.24

a.

From MINITAB, the output is:


Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef       SE Coef     T      P
Constant    131.92      25.69       5.13  0.000
Pounds        2.726      2.275      1.20  0.248
Units         0.04722    0.09335    0.51  0.620
Weight       -2.5874     0.6428    -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    5158.3   1719.4  17.87  0.000
Residual Error  16    1539.9     96.2
Total           19    6698.2

Source   DF  Seq SS
Pounds    1  3400.6
Units     1   198.4
Weight    1  1559.3

The least squares equation is:

ŷ = 131.92 + 2.726x1 + .0472x2 − 2.587x3
b.

To test the usefulness of the model, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, for i = 1, 2, 3

The test statistic is F = MSR/MSE = 1719.4/96.2 = 17.87

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table XI, Appendix B, F.01 = 5.29. The rejection region is F > 5.29.

Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01.
c.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = .51. The p-value = .620. We reject H0 if the p-value < α. Since .620 > .05, do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.


d.

R² is printed as R-Sq. R² = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.

e.

If the average number of pounds per shipment increases from 20 to 21, the estimated change in the mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables x1 and x2 are held constant.

f.

Since s = Standard Error = 9.81, we can estimate approximately to within 2s, or 2(9.81) = 19.62 hours.

g.

No. Regression analysis only determines if variables are related. It cannot be used to
determine cause and effect.

11.26

From the printout, the 90% prediction interval is (−151.996, 175.4874). We are 90% confident that the actual DDT level for a fish caught 100 miles upstream that is 40 centimeters long and weighs 800 grams will be between −151.996 and 175.4874. Since the DDT level cannot be negative, the interval would be between 0 and 175.4874.

11.28

a.

From MINITAB, the output is:


Regression Analysis: Precip versus Altitude, Latit, Coast

The regression equation is
Precip = - 102 + 0.00409 Altitude + 3.45 Latit - 0.143 Coast

Predictor     Coef        SE Coef      T      P
Constant   -102.36       29.21       -3.50  0.002
Altitude      0.004091    0.001218    3.36  0.002
Latit         3.4511      0.7949      4.34  0.000
Coast        -0.14286     0.03634    -3.93  0.001

S = 11.10   R-Sq = 60.0%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    4809.4   1603.1  13.02  0.000
Residual Error  26    3202.3    123.2
Total           29    8011.7

Source     DF  Seq SS
Altitude    1   730.7
Latit       1  2175.3
Coast       1  1903.4

Predicted Values for New Observations

New Obs    Fit   SE Fit       95.0% CI          95.0% PI
1        29.25     5.60   (17.75, 40.76)    (3.71, 54.80)

Values of Predictors for New Observations

New Obs  Altitude  Latit  Coast
1          6360     36.6    145

The fitted regression line is:

ŷ = −102.36 + 0.00409x1 + 3.4511x2 − 0.1429x3


b.

To determine if the first-order model is useful for predicting annual precipitation, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = 13.02 and the p-value is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that the model is useful for predicting annual precipitation at α = .05.
c.

The prediction interval is (3.71, 54.80).

With 95% confidence, we can conclude that the annual precipitation for an individual meteorological station with characteristics x1 = 6360 feet, x2 = 36.6, and x3 = 145 miles will fall between 3.71 inches and 54.80 inches.

11.30

The first-order model is:

E(y) = β0 + β1x1 + β2x2 + β3x5

We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).

Using MINITAB, the output is:

The regression equation is
y = 0.993 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor     Coef        StDev        T      P
Constant     0.9326      0.2482      3.76  0.002
x1          -0.024272    0.004900   -4.95  0.000
x2           0.14206     0.07573     1.88  0.080
x5           0.38457     0.09801     3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance

Source          DF      SS       MS      F      P
Regression       3    6.8701   2.2900  9.95  0.001
Residual Error  15    3.4509   0.2301
Total           18   10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Unusual Observations

Obs    x1      y     Fit  StDev Fit  Residual  St Resid
     40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values

   Fit  StDev Fit        95.0% CI            95.0% PI
-0.098      0.232   (-0.592, 0.396)    (-1.233, 1.038)

The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between −1.233 and 1.038 kV/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
11.32

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3

11.34

a.

R² = 1 − SSE/SSyy = 1 − 21/479 = .956

95.6% of the total variability of the y values is explained by this model.


b.

To test the utility of the model, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.956/3) / [(1 − .956)/(32 − (3 + 1))] = 202.8

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table IX, Appendix B, F.05 = 2.95. The rejection region is F > 2.95.

Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.


c.

The relationship between y and x1 depends on the level of x2.

d.

To determine if x1 and x2 interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 − 0)/s_β̂3 = 2.5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix B, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048.

Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
11.36

a.

To determine if the overall model is useful for predicting y, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi is not 0

The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, willingness of the consumer to shop at a retailer's store in the future, at α = .05.
b.

To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at the retailer's shop in the future, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = −3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest interact to affect willingness to shop at the retailer's shop in the future at α = .05.
c.

When x2 = 1,

ŷ = β̂0 + .426x1 + .044x2 − .157x1x2
  = β̂0 + .426x1 + .044(1) − .157x1(1)
  = β̂0 + .044 + (.426 − .157)x1
  = β̂0 + .044 + .269x1

Since no value is given for β̂0, we will use β̂0 = 1 for graphing purposes. Using MINITAB, a graph might look like:
[MINITAB scatterplot of ŷ vs x1 when x2 = 1]

d.

When x2 = 7,

ŷ = β̂0 + .426x1 + .044x2 − .157x1x2
  = β̂0 + .426x1 + .044(7) − .157x1(7)
  = β̂0 + .308 + (.426 − 1.099)x1
  = β̂0 + .308 − .673x1

Since no value is given for β̂0, we will again use β̂0 = 1 for graphing purposes.


Using MINITAB, a graph might look like:

[MINITAB scatterplot of ŷ vs x1 when x2 = 7]

e.

Using MINITAB, both plots on the same graph would be:

[MINITAB scatterplot of ŷ vs x1 for x2 = 1 and x2 = 7]
Since the lines are not parallel, it indicates that interaction is present.
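The non-parallel lines can be verified by computing the partial slope of ŷ in x1 at each level of x2; a short sketch using the coefficients above:

```python
def slope_in_x1(x2, b1=0.426, b3=-0.157):
    """Partial slope of y-hat with respect to x1 at a given x2.

    With the interaction term, d(y-hat)/dx1 = b1 + b3*x2,
    so the slope changes with x2.
    """
    return b1 + b3 * x2

slope_at_1 = slope_in_x1(1)   # .426 - .157(1) =  .269
slope_at_7 = slope_in_x1(7)   # .426 - .157(7) = -.673
# Different slopes at x2 = 1 and x2 = 7: the lines cannot be parallel,
# which is exactly the signature of interaction.
```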
11.38

a.

The hypothesized regression model including the interaction between x1 and x2 would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

If x1 and x2 interact to affect y, then the effect of x1 on y depends on the level of x2. Also, the effect of x2 on y depends on the level of x1.


c.

Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate x1 and x2 interact to affect y.

d.

β1 corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the negative feeling score gets lower as the number of people behind increases, then β2 is negative.

11.40

a.

If client credibility and linguistic delivery style interact, then the effect of client credibility on the likelihood value depends on the level of linguistic delivery style.

b.

To determine the overall model adequacy, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

c.

The test statistic is F = 55.35 and the p-value is p < 0.0005.

Since the p-value is so small (p < 0.0005), H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate that the model is adequate for any α > 0.0005.

d.

To determine if client credibility and linguistic delivery style interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

e.

The test statistic is t = 4.008 and the p-value is p < 0.005.

Since the p-value is so small (p < 0.005), H0 is rejected. There is sufficient evidence to indicate that client credibility and linguistic delivery style interact for any α > 0.005.

f.

When x1 = 22, the least squares line is:

ŷ = 15.865 + 0.037(22) − 0.678x2 + 0.036x2(22) = 16.679 + 0.114x2

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is 0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.114.
g.

When x1 = 46, the least squares line is:

ŷ = 15.865 + 0.037(46) − 0.678x2 + 0.036x2(46) = 17.567 + 0.978x2

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is 0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.978.


11.42

a.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

H0: β4 = 0

c.

t = 4.408, p-value = .001

Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to indicate that the strength of the client-therapist relationship contributes information for the prediction of a client's reaction for any α > .001.

d.

Answers may vary.

e.

R² = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.

11.44

a.

β̂1 = .02. The mean level of support for a military response is estimated to increase by .02 for each day increase in the level of TV news exposure, all other variables held constant.

b.

To determine if an increase in TV news exposure is associated with an increase in support for military resolution, we test:

H0: β1 = 0
Ha: β1 > 0

The p-value is p = .03/2 = .015. Since the p-value is less than α (p = .015 < .05), H0 is rejected. There is sufficient evidence to indicate that an increase in TV news exposure is associated with an increase in support for military resolution, all other variables held constant, at α = .05.
c.

To determine if the relationship between support for military resolution and gender depends on political knowledge, we test:

H0: β8 = 0
Ha: β8 ≠ 0

The p-value is p = .02. Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate that the relationship between support for a military resolution and gender depends on political knowledge, all other variables held constant, at α = .05.
d.

To determine if the relationship between support for military resolution and race depends on political knowledge, we test:

H0: β9 = 0
Ha: β9 ≠ 0

The p-value is p = .08. Since the p-value is not less than α (p = .08 > .05), H0 is not rejected. There is insufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge, all other variables held constant, at α = .05.
e.

R² = .194. 19.4% of the variation in support for military resolution is explained by the model containing the seven independent variables and the two interaction terms.

f.

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 9

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (.194/9) / [(1 − .194)/(1763 − (9 + 1))] = 46.88

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 9 and ν2 = n − (k + 1) = 1763 − (9 + 1) = 1753. From Table IX, Appendix B, F.05 ≈ 1.88. The rejection region is F > 1.88.

Since the observed value of the test statistic falls in the rejection region (F = 46.88 > 1.88), H0 is rejected. There is sufficient evidence to indicate that the model is useful at α = .05.
11.46

a.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = (β̂2 − 0)/s_β̂2 = (.47 − 0)/.15 = 3.133

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at α = .05.
b.

H0: β2 = 0
Ha: β2 > 0

The test statistic is the same as in part a, t = 3.133.

The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.

Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.


11.48

a.

b.

It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².

c.

It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.

11.50

a.

β̂0 has no meaning because x = 0 would not be in the observed range of values. In this case, x is the year, with values between 1984 and 1999.

b.

β̂1 = 321.67. Since the quadratic effect is included in the model, the linear term is just a location parameter and has no meaning.

c.

β̂2 = .0794. Since the value of β̂2 is positive, the curvature is upward.

d.

Since no data have been collected past 1999, we have no idea if the relationship between the two variables from 1984 to 1999 will remain the same until 2021.


11.52 a. Using MINITAB, a sketch of the least squares prediction equation is a scatterplot of ŷ (vertical axis, 0 to 12) versus Dose (horizontal axis, 0 to 800).

b. For x = 500, ŷ = 10.25 + .0053(500) - .0000266(500²) = 10.25 + 2.65 - 6.65 = 6.25

c. For x = 0, ŷ = 10.25 + .0053(0) - .0000266(0²) = 10.25

d. For x = 100, ŷ = 10.25 + .0053(100) - .0000266(100²) = 10.25 + .53 - .266 = 10.514
This value is slightly larger than that for the control group (10.25).

For x = 200, ŷ = 10.25 + .0053(200) - .0000266(200²) = 10.25 + 1.06 - 1.064 = 10.246
This value is slightly smaller than that for the control group (10.25). So, the largest value of x which yields an estimated weight change that is closest to, but just less than, the estimated weight change for the control group is x = 200.
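These evaluations can be checked by coding the fitted quadratic directly (the coefficients are taken from the exercise's fitted model):

```python
def weight_change(dose):
    """Fitted quadratic from the exercise: yhat = 10.25 + .0053x - .0000266x^2."""
    return 10.25 + 0.0053 * dose - 0.0000266 * dose ** 2

# Evaluate at the doses used in parts b-d.
predictions = {x: weight_change(x) for x in (0, 100, 200, 500)}
```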

11.54 a. A first-order model is:
E(y) = β0 + β1x

b. A second-order model is:
E(y) = β0 + β1x + β2x²


c. Using MINITAB, a scattergram of these data is a scatterplot of International gross revenue (vertical axis, 0 to 1200) versus Domestic gross revenue (horizontal axis, 100 to 600).

From the plot, it appears that the first-order model might fit the data better. There does not appear to be much of a curve to the relationship.
d. Using MINITAB, the output is:

Regression Analysis: International versus Domestic, Dsq

The regression equation is
International = 203 - 0.58 Domestic + 0.00364 Dsq

Predictor    Coef       SE Coef    T       P
Constant     202.9      245.0      0.83    0.424
Domestic     -0.581     1.510      -0.38   0.707
Dsq          0.003638   0.002085   1.74    0.107

S = 142.696   R-Sq = 78.8%   R-Sq(adj) = 75.2%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      2     906515    453258   22.26   0.000
Residual Error  12    244345    20362
Total           14    1150860

Source    DF   Seq SS
Domestic  1    844526
Dsq       1    61990
To investigate the usefulness of the model, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = 22.26.

The p-value is p = 0.000. Since the p-value is so small, we reject H0. There is sufficient evidence to indicate the model is useful for predicting foreign gross revenue.

To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 1.74.

The p-value is p = 0.107. Since the p-value is greater than α = .05 (p = 0.107 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at α = .05.
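The t statistic and two-tailed p-value for the Dsq term can be recovered from the printed coefficient and standard error (a sketch assuming SciPy):

```python
from scipy.stats import t as t_dist

coef, se, df_error = 0.003638, 0.002085, 12   # values from the MINITAB printout
t_stat = coef / se
p_value = 2 * t_dist.sf(abs(t_stat), df_error)
```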
e. From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.

11.56

a. (Sketch omitted.)

b. It moves the graph to the right (-2x) or to the left (+2x) compared to the graph of y = 1 + x².

c. It controls whether the graph opens upward (+x²) or downward (-x²). It also controls how steep the curvature is; i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.


11.58 a. A scatterplot of the data shows demand (vertical axis, roughly 3500 to 10500) against day (horizontal axis, 0 to 40).

b. From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.

c. Using MINITAB, the output for fitting a first-order model is:

The regression equation is
Y = 2752 + 122 X

Predictor   Coef     Stdev   t-ratio   p
Constant    2752.4   613.5   4.49      0.000
X           122.34   26.08   4.69      0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE      DF    SS          MS         F       p
Regression  1     79775688    79775688   22.01   0.000
Error       38    137726224   3624374
Total       39    217501920

Unusual Observations
Obs.   X      Y       Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007    6056   345         -4049      -2.16R
40     40.0   11520   7646   591         3874       2.14R

R denotes an obs. with a large st. resid.


To see if there is a significant linear relationship between day and demand, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 4.69.

The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05.
d. Using MINITAB, the output for fitting a second-order model is:

The regression equation is
Y = 5120 - 216 X + 8.25 XSQ

Predictor   Coef      Stdev   t-ratio   p
Constant    5120.2    816.9   6.27      0.000
X           -215.92   91.89   -2.35     0.024
XSQ         8.250     2.173   3.80      0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE      DF    SS          MS         F       p
Regression  2     118377056   59188528   22.09   0.000
Error       37    99124856    2679050
Total       39    217501920

SOURCE   DF   SEQ SS
X        1    79775688
XSQ      1    38601372

Unusual Observations
Obs.   X      Y      Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007   5305   357         -3298      -2.06R

R denotes an obs. with a large st. resid.

To see if there is a significant quadratic relationship between day and demand, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = 3.80.

The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at α = .05.
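The first- and second-order fits can be mimicked with NumPy's polyfit; this sketch uses simulated day/demand data (not the exercise's), just to show the mechanics of adding the squared term:

```python
import numpy as np

rng = np.random.default_rng(0)
day = np.arange(1.0, 41.0)    # days 1..40
# Simulated demand with a known quadratic trend plus noise.
demand = 5000 - 200 * day + 8 * day**2 + rng.normal(0, 100, day.size)

first_order = np.polyfit(day, demand, 1)    # [slope, intercept]
second_order = np.polyfit(day, demand, 2)   # [quad coef, linear coef, intercept]
```

With this much data and a true quadratic trend, the fitted quadratic coefficient lands close to the value used in the simulation.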


e. Since the quadratic term is significant in the second-order model in part d, the second-order model is better.

11.60 The model is E(y) = β0 + β1x1 + β2x2

where
x1 = 1 if the variable is at level 2, 0 otherwise
x2 = 1 if the variable is at level 3, 0 otherwise

β0 = mean value of y when the qualitative variable is at level 1.
β1 = difference in mean value of y between level 2 and level 1 of the qualitative variable.
β2 = difference in mean value of y between level 3 and level 1 of the qualitative variable.
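A tiny numerical illustration of this coding (the level means here are invented for the example):

```python
# Hypothetical means of y at the three levels of the qualitative variable.
level_means = {1: 10.0, 2: 14.0, 3: 7.0}

beta0 = level_means[1]                   # mean at level 1 (base level)
beta1 = level_means[2] - level_means[1]  # level 2 vs. level 1
beta2 = level_means[3] - level_means[1]  # level 3 vs. level 1

def mean_response(level):
    """E(y) = beta0 + beta1*x1 + beta2*x2 with the dummy coding above."""
    x1 = 1 if level == 2 else 0
    x2 = 1 if level == 3 else 0
    return beta0 + beta1 * x1 + beta2 * x2
```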
11.62 a. The least squares prediction equation is:
ŷ = 80 + 16.8x1 + 40.4x2

b. β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable.

β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable.

c. The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3.

The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At least one mean (μ1, μ2, or μ3) is different.

d. The test statistic is F = MSR/MSE = 2059.5/83.3 = 24.72

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 2 and denominator df = n - (k + 1) = 15 - (2 + 1) = 12. From Table IX, Appendix B, F.05 = 3.89. The rejection region is F > 3.89.

Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
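Part d's F computation, with the Table IX lookup replaced by SciPy (a sketch):

```python
from scipy.stats import f

MSR, MSE = 2059.5, 83.3
F_stat = MSR / MSE                   # about 24.72
f_crit = f.isf(0.05, dfn=2, dfd=12)  # Table IX gives 3.89
reject_h0 = F_stat > f_crit
```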
11.64 a. A confidence interval for the difference of two population means could be used. Since both sample sizes are over 30, the large-sample confidence interval is used (with independent samples).

b. Let x1 = 1 if public college, 0 otherwise.
The model is E(y) = β0 + β1x1

c. β1 is the difference between the two population means. A point estimate for β1 is β̂1. A confidence interval for β1 could be used to estimate the difference in the two population means.

11.66 a. Let x1 = 1 if no, 0 if yes.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "yes" to the question "Flextime of the position applied for," and β1 is the difference in the mean job preference between those who responded "no" and those who responded "yes."

b. Let x1 = 1 if referral, 0 if not
x2 = 1 if on-premise, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to level of day care support required, β1 is the difference in the mean job preference between those who responded "referral" and those who responded "none," and β2 is the difference in the mean job preference between those who responded "on-premise" and those who responded "none."

c. Let x1 = 1 if counseling, 0 if not
x2 = 1 if active search, 0 if not

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded "none" to spousal transfer support required, β1 is the difference in the mean job preference between those who responded "counseling" and those who responded "none," and β2 is the difference in the mean job preference between those who responded "active search" and those who responded "none."

d. Let x1 = 1 if not married, 0 if married.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded "married" to marital status, and β1 is the difference in the mean job preference between those who responded "not married" and those who responded "married."

e. Let x1 = 1 if female, 0 if male.
The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for males, and β1 is the difference in the mean job preference between females and males.

11.68 a. β̂4 = .296. The difference in the mean value of DTVA between firms whose operating earnings are negative and lower than last year and firms whose operating earnings are not negative and lower than last year is estimated to be .296, holding all other variables constant.

b. To determine if the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms, we test:

H0: β4 = 0
Ha: β4 > 0

The p-value for this test is p = .001/2 = .0005. Since the p-value is so small, we would reject H0 for α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.

c. Ra² = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5 independent variables, adjusted for the number of variables in the model and the sample size.

11.70 a. To determine if there is a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 8.14.

Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 2 since the data used are from 1972 through 1997. With α = .05, the critical value of t for the rejection region will be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.

However, the value of R² is .1818. The model used is explaining only 18.18% of the variability in the monthly rate of return. This is not a particularly large value.

To determine if there is a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 3.46.

Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 4 since the data used are from 1972 through 1997. With α = .05, the critical value of t for the rejection region will be smaller than 3.182. Thus, with α = .05, t = 3.46 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05.

However, the value of R² is .0387. The model used is explaining only 3.87% of the variability in the monthly rate of return. This is a very small value.
b. For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy. For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy.

c. The least squares prediction equation for the equity REIT index is:
ŷ = 0.01863 - 0.01582x

When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 - 0.01582(1) = .00281

When the Federal Reserve's monetary policy is expansive, x = 0. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 - 0.01582(0) = .01863
11.72 a. The first-order model is E(y) = β0 + β1x1

b. The new model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
x3 = 1 if level 3, 0 otherwise

c. To allow for interactions, the model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

d. The response lines will be parallel if β4 = β5 = 0.

e. There will be one response line if β2 = β3 = β4 = β5 = 0.

11.74

a. When x2 = x3 = 0, E(y) = β0 + β1x1
When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3

b. For level 1, ŷ = 44.8 + 2.2x1
For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1

11.76 The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4

where x1 is the quantitative variable and
x2 = 1 if level 2 of the qualitative variable, 0 otherwise
x3 = 1 if level 3 of the qualitative variable, 0 otherwise
x4 = 1 if level 4 of the qualitative variable, 0 otherwise

11.78 a. E(y) = β0 + β1x1 + β2x2 + β3x1x2

where x2 = 1 if diet is duck chow, 0 otherwise

b. Using MINITAB, the printout is:

The regression equation is
WtChg = -2.21 + 0.0783 x1 + 10.4 x2 - 0.095 x1x2

Predictor   Coef      StDev     T       P
Constant    -2.210    1.250     -1.77   0.085
x1          0.07831   0.04947   1.58    0.122
x2          10.354    8.538     1.21    0.233
x1x2        -0.0948   0.1418    -0.67   0.508

S = 3.882   R-Sq = 44.1%   R-Sq(adj) = 39.7%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      3     452.54    150.85   10.01   0.000
Residual Error  38    572.58    15.07
Total           41    1025.12

Source   DF   Seq SS
x1       1    384.24
x2       1    61.57
x1x2     1    6.73

Unusual Observations
Obs   x1     WtChg    Fit     StDev Fit   Residual   St Resid
12    30.0   -8.500   0.139   0.802       -8.639     -2.27R
37    42.5   8.000    7.445   2.990       0.555      0.22 X
40    75.0   8.500    6.910   2.077       1.590      0.48 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted equation is ŷ = -2.21 + .0783x1 + 10.4x2 - .095x1x2

c. For diet = plants, x2 = 0:

ŷ = -2.21 + .0783x1 + 10.4(0) - .095x1(0) = -2.21 + .0783x1

The slope is .0783. For each unit increase in digestion efficiency, the mean weight change is estimated to increase by .0783 for goslings fed plants.

d. For diet = duck chow, x2 = 1:

ŷ = -2.21 + .0783x1 + 10.4(1) - .095x1(1) = 8.19 - .0167x1

The slope is -.0167. For each unit increase in digestion efficiency, the mean weight change is estimated to decrease by .0167 for goslings fed duck chow.

e. To determine if the slopes associated with the two diets differ, we test:

H0: β3 = 0
Ha: β3 ≠ 0

From MINITAB, the test statistic is t = -.67 with p-value = .508.

Since α = .05 is less than the p-value, we fail to reject H0. There is insufficient evidence to conclude that the slopes associated with the two diets are significantly different at α = .05.
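The two fitted lines in parts c and d follow directly from the interaction model's coefficients:

```python
# Coefficients of yhat = b0 + b1*x1 + b2*x2 + b3*x1*x2 from the printout.
b0, b1, b2, b3 = -2.21, 0.0783, 10.4, -0.095

# Plants diet (x2 = 0): intercept b0, slope b1.
plants_intercept, plants_slope = b0, b1
# Duck chow diet (x2 = 1): intercept b0 + b2, slope b1 + b3.
chow_intercept, chow_slope = b0 + b2, b1 + b3
```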
11.80 a. Let x2 = 1 if intervention group, 0 otherwise.
The first-order model would be:

E(y) = β0 + β1x1 + β2x2

b. For the control group, x2 = 0. The first-order model is:

E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model is:

E(y) = β0 + β1x1 + β2(1) = (β0 + β2) + β1x1

In both models, the slope of the line is β1.

c. If pretest score and group interact, the model would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d. For the control group, x2 = 0. The model including the interaction is:

E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

For the intervention group, x2 = 1. The model including the interaction is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the model for the control group is β1. The slope of the model for the intervention group is β1 + β3.
11.82 a. The first-order model is:

E(y) = β0 + β1x1 + β2x2

b. For the high-tech firms, x2 = 1. The model for high-tech firms is:

E(y) = β0 + β1x1 + β2(1) = (β0 + β2) + β1x1

The slope of the line would be β1.

c. The new model would include the interaction term:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

d. For the high-tech firms, x2 = 1. The model for high-tech firms is:

E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the line would be β1 + β3.

11.84 By adding variables to the model, SSE will decrease or stay the same. Thus, SSE_C ≤ SSE_R. The only circumstance under which we will reject H0 is if SSE_C is much smaller than SSE_R. If SSE_C is much smaller than SSE_R, F will be large. Thus, the test is one-tailed.

11.86 a. Ha: At least one βi ≠ 0, i = 3, 4, 5

b. The reduced model would be E(y) = β0 + β1x1 + β2x2

c. The numerator df = k - g = 5 - 2 = 3 and the denominator df = n - (k + 1) = 30 - (5 + 1) = 24.

d. H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(1250.2 - 1125.2)/(5 - 2)] / [1125.2/(30 - (5 + 1))]
  = 41.6667/46.8833 = .89

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k - g = 3 and denominator df = n - (k + 1) = 24. From Table IX, Appendix B, F.05 = 3.01. The rejection region is F > 3.01.

Since the observed value of the test statistic does not fall in the rejection region (F = .89 < 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.
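The nested-model (partial) F test used here generalizes to any reduced/complete pair; a small sketch assuming SciPy:

```python
from scipy.stats import f

def nested_f(sse_r, sse_c, k, g, n):
    """Partial F for H0: the extra (k - g) terms in the complete model are zero."""
    numerator = (sse_r - sse_c) / (k - g)
    denominator = sse_c / (n - (k + 1))
    return numerator / denominator

F_stat = nested_f(sse_r=1250.2, sse_c=1125.2, k=5, g=2, n=30)
f_crit = f.isf(0.05, dfn=3, dfd=24)   # Table IX gives 3.01
```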
11.88 a. Let variables x1 through x4 be the Demographic variables, x5 through x11 the Diagnostic variables, x12 through x15 the Treatment variables, and x16 through x21 the Community variables. The complete model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10 + β11x11 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

b. To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test:

H0: β5 = β6 = ... = β11 = 0
Ha: At least one βi ≠ 0, i = 5, 6, ..., 11

c. The reduced model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

d. Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at least one of the seven Diagnostic variables contributes information for the prediction of y.

11.90

a. The complete second-order model is:

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

where x1 = age and
x2 = 1 if current, 0 otherwise

b. To determine if the quadratic terms are important, we test:

H0: β2 = β5 = 0

c. To determine if the interaction terms are important, we test:

H0: β4 = β5 = 0

d. From MINITAB, the outputs from fitting the three models are:

Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor   Coef      SE Coef   T       P
Constant    83.4      316.3     0.26    0.793
Age         -5.74     18.68     -0.31   0.760
AgeSq       0.2361    0.2549    0.93    0.359
Status      -62.1     354.8     -0.18   0.862
AgeSt       5.36      24.81     0.22    0.830
AgeSqSt     -0.2337   0.4080    -0.57   0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      5     1186549   237310   2.89   0.024
Residual Error  44    3618994   82250
Total           49    4805542

Source    DF   Seq SS
Age       1    865746
AgeSq     1    138871
Status    1    77594
AgeSt     1    77342
AgeSqSt   1    26996

Regression Analysis: Value versus Age, Status, AgeSt

The regression equation is
Value = -176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor   Coef      SE Coef   T       P
Constant    -176.1    145.0     -1.21   0.231
Age         11.166    3.902     2.86    0.006
Status      196.5     178.9     1.10    0.278
AgeSt       -11.432   6.763     -1.69   0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      3     1116017   372006   4.64   0.006
Residual Error  46    3689526   80207
Total           49    4805543

Source   DF   Seq SS
Age      1    865746
Status   1    21097
AgeSt    1    229174

Regression Analysis: Value versus Age, AgeSq, Status

The regression equation is
Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor   Coef     SE Coef   T       P
Constant    165.8    182.7     0.91    0.369
Age         -8.81    10.89     -0.81   0.423
AgeSq       0.2535   0.1632    1.55    0.127
Status      -105.6   107.9     -0.98   0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance
Source          DF    SS        MS       F      P
Regression      3     1082210   360737   4.46   0.008
Residual Error  46    3723332   80942
Total           49    4805542

Source   DF   Seq SS
Age      1    865746
AgeSq    1    138871
Status   1    77594

Test for part b:

The test statistic is:
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(3,689,526 - 3,618,994)/2] / 82,250 = .429

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44 denominator degrees of freedom. From Table IX, Appendix B, F.05 ≈ 3.23. The rejection region is F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .429 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the quadratic terms are important for predicting market value at α = .05.

Test for part c:

The test statistic is:
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(3,723,332 - 3,618,994)/(5 - 3)] / 82,250 = .634

The rejection region is the same as in the previous test. Reject H0 if F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .634 < 3.23), H0 is not rejected. There is insufficient evidence to indicate the interaction terms are important for predicting market value at α = .05.


11.92 a. The reduced model for testing if the mean posttest scores differ for the intervention and control groups would be:

E(y) = β0 + β1x1

b. The reported p-value is .03. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety knowledge scores differ for the intervention and control groups for α > .03.

c. The reported p-value is .033. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety comprehension scores differ for the intervention and control groups for α > .033.

d. The reported p-value is .322. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate that the mean posttest sun safety application scores differ for the intervention and control groups for α < .322.

11.94

a. To determine whether the rate of increase of emotional distress with experience is different for the two groups, we test:

H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 4, 5

b. To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

c. To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is
F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]
  = [(795.23 - 783.9)/(5 - 2)] / [783.9/(200 - (5 + 1))] = .93

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 5 - 2 = 3 and ν2 = n - (k + 1) = 200 - (5 + 1) = 194. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60.

Since the observed value of the test statistic does not fall in the rejection region (F = .93 < 2.60), H0 is not rejected. There is insufficient evidence to indicate that there are differences in mean emotional distress levels that are attributable to exposure group at α = .05.

11.96 a. The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t statistics for each of the variables are:

Independent Variable   t = β̂i/s(β̂i)
x1                     t = 1.6/.42 = 3.81
x2                     t = .9/.01 = 90
x3                     t = 3.4/1.14 = 2.98
x4                     t = 2.5/2.06 = 1.21
x5                     t = 4.4/.73 = 6.03
x6                     t = .3/.35 = .86

The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t score is 90. This is larger than any of the others.
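The screening computation in the table can be made explicit in code:

```python
# (estimate, standard error) pairs for each candidate predictor.
candidates = {
    "x1": (1.6, 0.42), "x2": (0.9, 0.01), "x3": (3.4, 1.14),
    "x4": (2.5, 2.06), "x5": (4.4, 0.73), "x6": (0.3, 0.35),
}

t_stats = {name: b / s for name, (b, s) in candidates.items()}
best = max(t_stats, key=lambda name: abs(t_stats[name]))  # largest |t| wins
```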

b. Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute value of t, provided the absolute value of the t falls in the rejection region.

c. Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the largest absolute t value associated with it.

11.98 a. In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.

b. In step 2, all two-variable models are fit, where one of the variables is the best one selected in step 1. Thus, a total of 10 two-variable models are fit.

c. In the 11th step, only one model is fit: the model containing all the independent variables.

d. The model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β7x7 + β9x9 + β10x10 + β11x11

e. 67.7% of the total sample variability of overall satisfaction is explained by the model containing the independent variables safety on bus, seat availability, dependability, travel time, convenience of route, safety at bus stops, hours of service, and frequency of service.

f. Using stepwise regression does not guarantee that the best model will be found. There may be better combinations of the independent variables that are never found, because of the order in which the independent variables are entered into the model.

11.100 a.

The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved
shape. Such a pattern usually indicates that curvature needs to be added to the model.

b. The plot of the residuals reveals a nonrandom pattern. The plot of residuals versus predicted values shows the range of the residuals increasing as ŷ increases. This indicates that the variance of the random error, ε, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of ε is not constant for all settings of the x's.

c.

This plot reveals an outlier, since all or almost all of the residuals should fall within 3
standard deviations of their mean of 0.

d.

This frequency distribution of the residuals is skewed to the right. This may be due to
outliers or could indicate the need for a transformation of the dependent variable.

11.102 a. Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme multicollinearity.

b. No. The overall model test is significant (p < .001). This implies that at least one variable contributes to the prediction of the urban/rural rating. Looking at the individual t-tests, several are significant, namely those for x1, x3, and x5. There is no evidence that multicollinearity is present.

11.104 First, we need to compute the value of the residual:

Residual = y - ŷ = 87 - 29.63 = 57.37

We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is 57.37/24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard deviations from the regression line, this point is not considered an outlier.
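The same check, coded (a sketch):

```python
y, y_hat, s = 87.0, 29.63, 24.68

residual = y - y_hat                # 57.37
std_residual = residual / s         # residual in units of s
is_outlier = abs(std_residual) > 3  # 3-standard-deviation rule
```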


11.106 a. From MINITAB, the output is:

Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor   Coef        SE Coef    T       P
Constant    2.7944      0.4363     6.40    0.000
Income      -0.000164   0.006564   -0.02   0.980
Size        0.38348     0.07189    5.33    0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance
Source          DF    SS        MS       F       P
Regression      2     15.0027   7.5013   14.52   0.000
Residual Error  23    11.8839   0.5167
Total           25    26.8865

Source   DF   Seq SS
Income   1    0.2989
Size     1    14.7037

Correlations: Income, Size

Pearson correlation of Income and Size = -0.137
P-Value = 0.506

No; income and household size do not seem to be highly correlated. The correlation coefficient between income and household size is -.137.
b. Using MINITAB, the residual plots are a histogram of the residuals, a plot of residuals versus the fitted values, residuals versus Income, and residuals versus Size (response is Food).

Yes; the plots of residuals versus income and residuals versus household size exhibit a curved shape. Such a pattern could indicate that a second-order model may be more appropriate.


c. No; the plot of residuals versus the predicted values reveals varying spreads for different values of ŷ. This implies that the variance of ε is not constant for all settings of the x's.

d. Yes; the outlier shows up in several plots and is the 26th household (food consumption = $7500, income = $7300, and household size = 5).

e. No; the frequency distribution of the residuals shows that the outlier skews the distribution to the right.

11.108 Using MINITAB, the residual plots are a normal probability plot of the residuals, a histogram of the residuals, residuals versus the fitted values, residuals versus the order of the data, and residuals versus each of WEIGHT, LENGTH, and MILE (response is DDT).

From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates that the residuals are not normal. The plot of the residuals versus ŷ indicates that there is at least one outlier and the variance is not constant. One observation has a standardized residual of more than 10 and several others have standardized residuals greater than 3. This is also evident in the plots of the residuals versus each of the independent variables. Since the assumptions of normality and constant variance appear to be violated, we could consider transforming the data. We should also check the outlying observations to see if there are any errors connected with these observations.
11.110 a. To determine if at least one of the parameters is not zero, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.83/4) / {(1 - .83)/[25 - (4 + 1)]} = 24.41

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 4 and denominator df = n - (k + 1) = 25 - (4 + 1) = 20. From Table IX, Appendix B, F.05 = 2.87. The rejection region is F > 2.87.

Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is rejected. There is sufficient evidence to indicate at least one of the parameters is nonzero at α = .05.
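The F-from-R² computation above is mechanical enough to script. A minimal Python sketch (the helper name is mine, not from the text):

```python
def global_f_from_r2(r2, n, k):
    # Global F statistic for H0: beta_1 = ... = beta_k = 0,
    # computed from R-squared: F = (R^2/k) / ((1 - R^2)/(n - (k + 1)))
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

F = global_f_from_r2(0.83, 25, 4)
print(round(F, 2))  # 24.41; compare with the tabled F.05 = 2.87 on (4, 20) df
```

The same helper reproduces the F statistics reported for the other exercises that give only R², n, and k.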
b. H0: β1 = 0
Ha: β1 < 0

The test statistic is t = (β̂1 - 0)/s_β̂1 = (-2.43 - 0)/1.21 = -2.01

The rejection region requires α = .05 in the lower tail of the t distribution with df = n - (k + 1) = 25 - (4 + 1) = 20. From Table VI, Appendix B, t.05 = 1.725. The rejection region is t < -1.725.

Since the observed value of the test statistic falls in the rejection region (t = -2.01 < -1.725), H0 is rejected. There is sufficient evidence to indicate β1 is less than 0 at α = .05.
c. H0: β2 = 0
Ha: β2 > 0

The test statistic is t = (β̂2 - 0)/s_β̂2 = (.05 - 0)/.16 = .31

The rejection region requires α = .05 in the upper tail of the t distribution. From part b above, the rejection region is t > 1.725.

Since the observed value of the test statistic does not fall in the rejection region (t = .31 ≯ 1.725), H0 is not rejected. There is insufficient evidence to indicate β2 is greater than 0 at α = .05.
d. H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 - 0)/s_β̂3 = (.62 - 0)/.26 = 2.38

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = 20. From Table VI, Appendix B, t.025 = 2.086. The rejection region is t < -2.086 or t > 2.086.

Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is rejected. There is sufficient evidence to indicate β3 is different from 0 at α = .05.


11.112 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample
means. The further x1, x2, and x3 are from their means, the larger the error. When x1 = 60,
x2 = .4, and x3 = 900, the observed values are outside the observed ranges of the x values.
When x1 = 30, x2 = .6, and x3 = 1300, the observed values are within the observed ranges
and consequently the x values are closer to their means. Thus, when x1 = 30, x2 = .6, and
x3 = 1300, the error of prediction is smaller.
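The same idea is easiest to see in the one-predictor case, where the standard error of prediction is s·sqrt(1 + 1/n + (x0 - x̄)²/SSxx) and grows with the distance of x0 from x̄. A small sketch with made-up values (s, n, x̄, and SSxx below are illustrative, not from the exercise):

```python
from math import sqrt

def se_pred(s, n, x0, xbar, ss_xx):
    # Standard error of prediction in simple linear regression:
    # s * sqrt(1 + 1/n + (x0 - xbar)^2 / SSxx)
    return s * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ss_xx)

# hypothetical values: s = 2, n = 20, xbar = 30, SSxx = 400
near = se_pred(2, 20, 30, 30, 400)  # at the mean of x
far = se_pred(2, 20, 60, 30, 400)   # far outside the observed range
print(near < far)  # True: prediction error grows away from the mean
```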
11.114 From the plot of the residuals for the straight line model, there appears to be a mound shape
which implies the quadratic model should be used.
11.116 a. H0: β4 = β5 = 0
Ha: At least one of β4 and β5 ≠ 0

b. The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2² + β4x1x2 + β5x1x2²

is fit to the 35 data points, yielding a sum of squares for error, denoted SSEC. The regression model

E(y) = β0 + β1x1 + β2x2 + β3x2²

is also fit to the data and its sum of squares for error is obtained, denoted SSER. Then the test statistic is:

F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]}

where k = 5, g = 3, and n = 35.

c. The numerator degrees of freedom is k - g = 5 - 3 = 2, and the denominator degrees of freedom is n - (k + 1) = 35 - (5 + 1) = 29.

d. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = 2 and denominator df = 29. From Table IX, Appendix B, F.05 = 3.33. The rejection region is F > 3.33.
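The partial F statistic and its degrees of freedom can be computed directly from the two SSEs. A sketch (the function name and the placeholder SSE values are mine; the exercise supplies none):

```python
def nested_f(sse_r, sse_c, n, k, g):
    # Partial (nested) F test of the complete model's extra terms:
    # F = ((SSE_R - SSE_C)/(k - g)) / (SSE_C/(n - (k + 1)))
    df1, df2 = k - g, n - (k + 1)
    f_stat = ((sse_r - sse_c) / df1) / (sse_c / df2)
    return f_stat, df1, df2

# With k = 5, g = 3, n = 35 the df are (2, 29), matching part c;
# the SSE values here are placeholders.
_, df1, df2 = nested_f(sse_r=100.0, sse_c=80.0, n=35, k=5, g=3)
print(df1, df2)  # 2 29
```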

11.118 a. E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

b. E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3

where x1, x2, and x3 are as in part a.
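The 0/1 coding in part a can be generated mechanically. A small sketch (the helper is hypothetical, with level 1 as the base level):

```python
def dummies(level, n_levels=3, base=1):
    # One indicator per non-base level; the base level is the reference
    # (all zeros), matching x2 = 1 for level 2 and x3 = 1 for level 3.
    return tuple(1 if level == j else 0
                 for j in range(1, n_levels + 1) if j != base)

print(dummies(1), dummies(2), dummies(3))  # (0, 0) (1, 0) (0, 1)
```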


11.120 a. E(y) = β0 + β1x1 + β2x2

b. E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x2² + β5x1x2

11.122 a. 1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical scale is meaningless in this situation. (It is possible to consider this as a quantitative variable. However, for this problem we will consider it as qualitative.)

b. The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year graduate GPA should all be positively correlated to final GPA.

c. x5 = 1 if student entered doctoral program in year 3, 0 otherwise
   x6 = 1 if student entered doctoral program in year 5, 0 otherwise

d. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6

e. β0 = the y-intercept for students entering in year 1.

β1 = the final GPA will increase by β1 for each additional increase of one unit of GMAT score, holding the remaining variables constant.

β2 = the final GPA will increase by β2 for each additional increase of one unit of verbal GMAT score, holding the remaining variables constant.

β3 = the final GPA will increase by β3 for each additional increase of one undergraduate GPA point, holding the remaining variables constant.

β4 = the final GPA will increase by β4 for each additional increase of one first-year graduate GPA point, holding the remaining variables constant.

β5 = difference in mean final GPA between the year 3 cohort and the year 1 cohort.

β6 = difference in mean final GPA between the year 5 cohort and the year 1 cohort.
f. E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x1x5 + β8x1x6 + β9x2x5 + β10x2x6 + β11x3x5 + β12x3x6 + β13x4x5 + β14x4x6

g. For the year 1 cohort, x5 = x6 = 0. The model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5(0) + β6(0) + β7x1(0) + β8x1(0) + β9x2(0) + β10x2(0) + β11x3(0) + β12x3(0) + β13x4(0) + β14x4(0)
     = β0 + β1x1 + β2x2 + β3x3 + β4x4

The slopes for the four variables are β1, β2, β3 and β4, respectively.

11.124 a. The hypothesized model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

β0 = y-intercept. It has no interpretation in this model.

β1 = difference in the mean salaries between males and females, all other variables held constant.

β2 = difference in the mean salaries between whites and nonwhites, all other variables held constant.

β3 = change in the mean salary for each additional year of education, all other variables held constant.

β4 = change in the mean salary for each additional year of tenure with firm, all other variables held constant.

β5 = change in the mean salary for each additional hour worked per week, all other variables held constant.
b. The least squares equation is:

ŷ = 15.491 + 12.774x1 + .713x2 + 1.519x3 + .32x4 + .205x5

β̂0 = estimate of the y-intercept. It has no interpretation in this model.

β̂1: We estimate the difference in the mean salaries between males and females to be $12.774, all other variables held constant.

β̂2: We estimate the difference in the mean salaries between whites and nonwhites to be $.713, all other variables held constant.

β̂3: We estimate the change in the mean salary for each additional year of education to be $1.519, all other variables held constant.

β̂4: We estimate the change in the mean salary for each additional year of tenure with firm to be $.320, all other variables held constant.

β̂5: We estimate the change in the mean salary for each additional hour worked per week to be $.205, all other variables held constant.


c. R² = .240. 24% of the total variability of salaries is explained by the model containing gender, race, educational level, tenure with firm, and number of hours worked per week.

To determine if the model is useful for predicting annual salary, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.24/5) / {(1 - .24)/[191 - (5 + 1)]} = 11.68

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n - (k + 1) = 191 - (5 + 1) = 185. From Table IX, Appendix B, F.05 ≈ 2.21. The rejection region is F > 2.21.

Since the observed value of the test statistic falls in the rejection region (F = 11.68 > 2.21), H0 is rejected. There is sufficient evidence to indicate the model containing gender, race, educational level, tenure with firm, and number of hours worked per week is useful for predicting annual salary at α = .05.
d. To determine if male managers are paid more than female managers, we test:

H0: β1 = 0
Ha: β1 > 0

The p-value given for the test is less than .05/2 = .025. Since the p-value is less than α = .05, there is evidence to reject H0. There is evidence to indicate male managers are paid more than female managers, holding all other variables constant, for α > .025.

e. The salary paid an individual depends on many factors other than gender. Thus, in order to adjust for other factors influencing salary, we include them in the model.

11.126 a. The main effects model would be: E(y) = β0 + β1x1 + β8x8

b. β̂1 = -.28. The mean value of the relative error of the effort estimate for developers is estimated to be .28 units below that of project leaders, holding previous accuracy constant.

β̂8 = .27. The mean value of the relative error of the effort estimate if previous accuracy is more than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding company role of estimator constant.

c. One possible reason for the sign of β̂1 being opposite from what is expected could be that company role of estimator and previous accuracy could be correlated.


11.128 a. R² = .45. 45% of the total variability of the suicide rates is explained by the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP, and annual percent change in GNP.

To determine if the model is useful for predicting suicide rate, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 - R²)/[n - (k + 1)]} = (.45/5) / {(1 - .45)/[45 - (5 + 1)]} = 6.38

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n - (k + 1) = 45 - (5 + 1) = 39. From Table IX, Appendix B, F.05 ≈ 2.45. The rejection region is F > 2.45.

Since the observed value of the test statistic falls in the rejection region (F = 6.38 > 2.45), H0 is rejected. There is sufficient evidence to indicate the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP and annual percent change in GNP is useful for predicting suicide rate at α = .05.
b. β̂0 = .002 = estimate of the y-intercept. It has no interpretation in this model.

β̂1: We estimate the change in suicide rate for each unit change in unemployment rate to be .0204, all other variables held constant.

β̂2: We estimate the change in suicide rate for each unit change in percentage of females in the work force to be .0231, all other variables held constant.

β̂3: We estimate the change in suicide rate for each unit change in divorce rate to be .0765, all other variables held constant.

β̂4: We estimate the change in suicide rate for each unit change in logarithm of GNP to be .2760, all other variables held constant.

β̂5: We estimate the change in suicide rate for each unit change in annual percent change in GNP to be .0018, all other variables held constant.

The p-values for unemployment rate and percentage of females in the work force are less than .05. This indicates that both are important in predicting suicide rate. The p-values for divorce rate, logarithm of GNP, and annual percent change in GNP are all greater than .10. This indicates that none of these variables are important in predicting suicide rate. We must view these conclusions with caution. Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if the other variables are removed from the model.


c. To determine if unemployment rate is a useful predictor of the suicide rate, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The p-value = .002. Since this p-value is less than α = .05, there is evidence to reject H0. There is sufficient evidence to indicate unemployment rate is a useful predictor of the suicide rate at α = .05.

d. Curvature: It may be possible that the relationship between the suicide rate and some of the independent variables is not linear, but curved. Thus, some of the variables that do not appear to be useful predictors may, in fact, be useful predictors if a second-order term were added to the model.

Interaction: Again, it may be possible that the effect of some independent variables on the suicide rate is different for different levels of other independent variables. This possibility should be explored before throwing out certain independent variables.

Multicollinearity: Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if other variables are removed from the model.

11.130 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable,
say CEO income, on the dependent variable profit (y) depends on the level of the second
variable, stock percentage.
11.132 a. The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       3       25784705.01     8594901.67    241.758    0.0001
ERROR      16         568826.19    35551.63709
C TOTAL    19       26353531.20

ROOT MSE    188.5514    R-SQUARE    0.9784
DEP MEAN      3014.2    ADJ R-SQ    0.9744
C.V.        6.255438

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1           1333.17830         290.99944                    4.581        0.0003
X1           1          -0.15122302        0.37864583                   -0.399        0.6949
X2           1          -2.62532461        5.34596285                   -0.491        0.6300
X1X2         1           0.05195415       0.006863831                    7.569        0.0001

The fitted model is ŷ = 1333.18 - .151x1 - 2.625x2 + .052x1x2

b. To determine if the overall model is useful, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = MSR/MSE = 8,594,901.67/35,551.637 = 241.758

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table IX, Appendix B, F.05 = 3.24. The rejection region is F > 3.24.

Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05.
c. To determine if the interaction is present, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (β̂3 - 0)/s_β̂3 = 7.569.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VI, Appendix B, t.025 = 2.120. The rejection region is t < -2.120 or t > 2.120.

Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and shelf space is present at α = .05.
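Each t statistic in the parameter-estimates table is just the estimate divided by its standard error, which is easy to verify against the SAS output:

```python
def t_stat(estimate, std_error):
    # t for H0: parameter = 0 is simply estimate / standard error
    return estimate / std_error

# X1X2 row of the SAS output above
t = t_stat(0.05195415, 0.006863831)
print(round(t, 3))  # 7.569, matching T FOR H0: PARAMETER=0
```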


d. Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on sales is different at different levels of shelf space.

e.

If a first-order model was used, the effect of advertising expenditure on sales would
be the same regardless of the amount of shelf space. If interaction really exists, the
effect of advertising expenditure on sales would depend on which level of shelf space
was present.


11.134 a. There is a curvilinear trend.

b. From MINITAB, the output is:

The regression equation is y = 42.2 - 0.0114x + 0.000001 xsq

Predictor      Coef        StDev        T      P
Constant       42.247      5.712        7.40   0.000
x             -0.011404    0.005053    -2.26   0.037
xsq            0.00000061  0.00000037   1.66   0.115

S = 21.81    R-Sq = 34.9%    R-Sq(adj) = 27.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       2   4325.4  2162.7  4.55  0.026
Residual Error  17   8085.5   475.6
Total           19  12410.9

Source  DF  Seq SS
x        1  3013.3
xsq      1  1312.1

Unusual Observations
Obs      x     y     Fit  StDev Fit  Residual  St Resid
 16   9150  4.60  -11.21      16.24     15.81    1.09 X
 17  15022  2.20    8.09      21.40     -5.89   -1.41 X

X denotes an observation whose X value gives it large influence.

The fitted model is ŷ = 42.2 - .0114x + .00000061x²

c. To determine if a curvilinear relationship exists, we test:

H0: β2 = 0
Ha: β2 ≠ 0

From MINITAB, the test statistic is t = 1.66 with p-value = .115. Since the p-value is greater than α = .05, do not reject H0. There is insufficient evidence to indicate that a curvilinear relationship exists between dissolved phosphorus percentage and soil loss at α = .05.
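The reported p-value can be reproduced from the t statistic and the error degrees of freedom (n - (k + 1) = 20 - (2 + 1) = 17), assuming SciPy is available:

```python
from scipy.stats import t

# Two-sided p-value for the quadratic coefficient: t = 1.66, df = 17
t_obs, df = 1.66, 17
p_value = 2 * t.sf(t_obs, df)
print(round(p_value, 3))  # close to the reported .115
```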
11.136 a. The first-order model for this problem is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

b. Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 28.9 -0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor         Coef       StDev      T      P
Constant         28.87       12.67   2.28  0.034
x1         -0.00000011  0.00000028  -0.38  0.708
x2              0.8440      0.2326   3.63  0.002
x3             -0.3600      0.1316  -2.74  0.013
x4             -0.3003      0.1834  -1.64  0.117

S = 5.989    R-Sq = 51.2%    R-Sq(adj) = 41.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   753.76  188.44  5.25  0.005
Residual Error  20   717.40   35.87
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R

R denotes an observation with a large standardized residual

The least squares prediction line is ŷ = 28.9 - .00000011x1 + .844x2 - .360x3 - .300x4.

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 5.25

The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05.
c. β̂0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem.

β̂1 = -0.00000011. For each unit increase in total mortgage loans, the mean percentage of problem mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage of commercial mortgages, and percentage of residential mortgages constant.

β̂2 = 0.844. For each unit increase in percentage of invested assets, the mean percentage of problem mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial mortgages, and percentage of residential mortgages constant.

β̂3 = -0.360. For each unit increase in percentage of commercial mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of invested assets, and percentage of residential mortgages constant.

β̂4 = -0.300. For each unit increase in percentage of residential mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of invested assets, and percentage of commercial mortgages constant.


d. Using MINITAB, the scattergrams are:

From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as second-order terms.

e. Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 56.2 -0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor         Coef       StDev      T      P
Constant         56.17       13.81   4.07  0.001
x1         -0.00000008  0.00000025  -0.31  0.760
x2             -1.8177      0.9935  -1.83  0.084
x3             -0.4494      0.1127  -3.99  0.001
x4              0.2227      0.6079   0.37  0.718
x2sq           0.07707     0.02665   2.89  0.010
x4sq          -0.01887     0.02334  -0.81  0.429

S = 4.956    R-Sq = 69.9%    R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       6  1029.03  171.51  6.98  0.001
Residual Error  18   442.13   24.56
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17
x2sq     1  259.22
x4sq     1   16.05

Unusual Observations
Obs        x1       y     Fit  StDev Fit  Residual  St Resid
  4  11940345  32.600  26.777      4.038     5.823     2.03R
 10   5328142   7.500  16.105      2.599    -8.605    -2.04R
 12   4905123  27.000  16.559      3.607    10.441     3.07R
 20   2978628   3.200  11.759      2.679    -8.559    -2.05R

R denotes an observation with a large standardized residual

The least squares prediction equation is

ŷ = 56.2 - .00000008x1 - 1.82x2 - .449x3 + .223x4 + .0771x2² - .0189x4²

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 6.98

The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05.
f. To determine if one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages, we test:

H0: β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(717.40 - 442.13)/(6 - 4)] / {442.13/[25 - (6 + 1)]} = 5.60

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = (k - g) = (6 - 4) = 2 and ν2 = n - (k + 1) = 25 - (6 + 1) = 18. From Table IX, Appendix B, F.05 = 3.55. The rejection region is F > 3.55.

Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), H0 is rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages at α = .05.
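Plugging the two SSEs into the partial F formula reproduces the statistic; with SciPy available, the p-value confirms the table-based decision:

```python
from scipy.stats import f

# Reduced vs. complete model for the mortgage data (hypotheses above)
sse_r, sse_c = 717.40, 442.13   # SSEs from the two MINITAB fits
n, k, g = 25, 6, 4
df1, df2 = k - g, n - (k + 1)
F = ((sse_r - sse_c) / df1) / (sse_c / df2)
p = f.sf(F, df1, df2)
print(round(F, 2), p < 0.05)  # F about 5.6; p-value below .05, so reject H0
```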
11.138 a. Using SAS, the output for fitting the model is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       3        2396.36410      798.78803     99.394    0.0001
ERROR      16         128.58590        8.03662
C TOTAL    19        2524.95000

ROOT MSE     2.83489    R-SQUARE    0.9491
DEP MEAN    23.05000    ADJ R-SQ    0.9395
C.V.        12.29889

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1          -11.768830        3.05032146                   -3.858        0.0014
X1           1           10.293782        1.43788129                    7.159        0.0001
X1SQ         1           -0.417991        0.16132974                   -2.591        0.0197
X2           1           13.244076        1.50325080                    8.810        0.0001

The fitted model is: ŷ = -11.8 + 10.3x1 - .418x1² + 13.2x2


b. To determine if the second-order term is necessary, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = -2.591.

The p-value is p = .0197. Since the p-value is less than α (p = .0197 < .05), H0 is rejected. There is sufficient evidence to conclude that the second-order term in the model proposed by the operations manager is necessary at α = .05.
c. The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       1        1.25000000     1.25000000      0.009    0.9258
ERROR      18     2523.70000       140.20556
C TOTAL    19     2524.95000

ROOT MSE    11.84084    R-SQUARE     0.0005
DEP MEAN       23.05    ADJ R-SQ    -0.0550
C.V.        51.37025

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1         23.30000000        3.74440323                    6.223        0.0001
X2           1         -0.50000000        5.29538583                   -0.094        0.9258

The fitted model is ŷ = 23.3 - .5x2.

The hypotheses are:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(2523.7 - 128.586)/(3 - 1)] / {128.586/[20 - (3 + 1)]} = 1197.557/8.036625 = 149.01

The rejection region requires α = .10 in the upper tail of the F distribution with numerator df = k - g = 3 - 1 = 2 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B, F.10 = 2.67. The rejection region is F > 2.67.

Since the observed value of the test statistic falls in the rejection region (F = 149.01 > 2.67), H0 is rejected. There is sufficient evidence to indicate the age of the machine contributes information to the model at α = .10.

After adjusting for machine type, there is evidence that down time is related to age.
After adjusting for machine type, there is evidence that down time is related to age.
11.140 a. For a sunny weekday, x1 = 0 and x2 = 1:

x3 = 70:  ŷ = 250 - 700(0) + 100(1) + 5(70) + 15(0)(70) = 700
x3 = 80:  ŷ = 250 - 700(0) + 100(1) + 5(80) + 15(0)(80) = 750
x3 = 90:  ŷ = 800
x3 = 100: ŷ = 850

For a sunny weekend, x1 = 1 and x2 = 1:

x3 = 70:  ŷ = 250 - 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050
x3 = 80:  ŷ = 250 - 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250
x3 = 90:  ŷ = 1450
x3 = 100: ŷ = 1650

For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so does the predicted day's attendance. However, the predicted day's attendance on sunny weekend days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher on sunny weekend days than on sunny weekdays.
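The predicted values above come from evaluating the fitted interaction equation, which is easy to check in code:

```python
def attendance(x1, x2, x3):
    # Fitted interaction model from the exercise:
    # y-hat = 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3
    return 250 - 700 * x1 + 100 * x2 + 5 * x3 + 15 * x1 * x3

# sunny weekday vs. sunny weekend at a predicted high of 70
print(attendance(0, 1, 70), attendance(1, 1, 70))  # 700 1050
```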
b. To determine if the interaction term is a useful addition to the model, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = β̂4/s_β̂4 = 15/3 = 5

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n - (k + 1) = 30 - (4 + 1) = 25. From Table VI, Appendix B, t.025 = 2.06. The rejection region is t < -2.06 or t > 2.06.

Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful addition to the model at α = .05.
c. For x1 = 0, x2 = 1, and x3 = 95,

ŷ = 250 - 700(0) + 100(1) + 5(95) + 15(0)(95) = 825

d. The width of the interval in Exercise 11.139e is 1245 - 645 = 600, while the width is 850 - 800 = 50 for the model containing the interaction term. The smaller the width of the interval, the smaller the variance. This implies that the interaction term is quite useful in predicting daily attendance. It has reduced the unexplained error.


e. Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be interpreted with caution. For all observed values of x3 (temperature), the interaction term value is greater than 700.

11.142 a. From MINITAB, the output is:
Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor        Coef    SE Coef      T      P
Constant       -9.917      1.354  -7.32  0.000
x1            0.16681    0.02124   7.85  0.000
x2            0.13760    0.02673   5.15  0.000
x1sq       -0.0011082  0.0001173  -9.45  0.000
x2sq       -0.0008433  0.0001594  -5.29  0.000
x1x2        0.0002411  0.0001440   1.67  0.103

S = 0.1871    R-Sq = 93.7%    R-Sq(adj) = 92.7%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       5  17.5827  3.5165  100.41  0.000
Residual Error  34   1.1908  0.0350
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552
x1x2     1  0.0982

The least squares prediction equation is:

ŷ = -9.917 + .167x1 + .138x2 - .00111x1² - .000843x2² + .000241x1x2

b.

The standard deviation for the first-order model is s = .4023. The standard deviation
for the second-order model is s = .1871.
The relative precision for the first-order model is 2(.4023) = .8046. The relative
precision for the second-order model is 2(.1871) = .3742.

c. To determine if the model is useful, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ..., 5

The test statistic is F = MSR/MSE = 3.5165/.0350 = 100.41

The p-value is .0000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting GPA at α = .05.


d. To determine if the interaction term is important, we test:

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = 1.67.

The p-value is .103. Since the p-value is not less than α = .10, H0 is not rejected. There is insufficient evidence to indicate the interaction term is important for predicting GPA at α = .10.
e. From MINITAB, the plots are:

[Plot of residuals versus x1 (response is y).]

[Plot of residuals versus x2 (response is y).]

The residual plots of the residuals against x1 and against x2 for the second-order model indicate there is no mound or bowl shape in either graph. This implies that second-order is the highest order necessary. We have eliminated the mound shape from the plots of the residuals against x1 and the residuals against x2 for the first-order model. From the plots and the results of the tests in 11.145, it appears the second-order model is preferable for predicting GPA.
f. To see if the second-order terms are useful, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER - SSEC)/(k - g)] / {SSEC/[n - (k + 1)]} = [(5.9876 - 1.1908)/3] / .0350 = 45.68

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k - g = 5 - 2 = 3 and ν2 = n - (k + 1) = 40 - (5 + 1) = 34. From Table IX, Appendix B, F.05 ≈ 2.92. The rejection region is F > 2.92.

Since the observed value of the test statistic falls in the rejection region (F = 45.68 > 2.92), H0 is rejected. There is sufficient evidence that at least one second-order term is useful at α = .05.


11.144 a. The model is E(y) = β0 + β1x1

A sketch of the response curve might be:

[Sketch omitted.]

b. The model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if brand 2, 0 otherwise
      x3 = 1 if brand 3, 0 otherwise

A sketch of the response curve might be:

[Sketch omitted.]

c. The model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

A sketch of the response curve might be:

[Sketch omitted.]

The Condo Sales Case

(To accompany Chapters 10 and 11)

Several models were fit to obtain the final model. I first fit a model with only the main effects for
Floor, Distance, View, Endunit, and Furnish. Of these, only Furnish, adjusted for the other variables,
was not significant. See the output below.
The regression equation is
Price = 184 - 3.81 Floor + 1.74 Distance + 40.3 View - 32.7 Endunit + 4.28 Furnish

Predictor      Coef   Stdev  t-ratio      p
Constant    183.570   5.221    35.16  0.000
Floor       -3.8076  0.7482    -5.09  0.000
Distance     1.7414  0.3750     4.64  0.000
View         40.325   3.456    11.67  0.000
Endunit     -32.716   9.581    -3.41  0.001
Furnish       4.279   3.602     1.19  0.236

s = 24.39    R-sq = 49.4%    R-sq(adj) = 48.2%

Analysis of Variance
SOURCE       DF      SS     MS      F      p
Regression    5  118091  23618  39.69  0.000
Error       203  120802    595
Total       208  238893

SOURCE    DF  SEQ SS
Floor      1   14149
Distance   1   21208
View       1   75065
Endunit    1    6829
Furnish    1     840

I then added Floor² and Distance² to the model with all main effects. For this model, all of the main effects, including Furnish, were significant along with both squared terms. The output follows.

The regression equation is
Price = 220 - 13.3 Floor - 7.01 Distance + 38.9 View - 22.0 Endunit + 7.31 Furnish + 1.05 FlSq + 0.572 DiSq

Predictor      Coef   Stdev  t-ratio      p
Constant    220.258   8.178    26.93  0.000
Floor       -13.296   3.253    -4.09  0.000
Distance     -7.007   1.614    -4.34  0.000
View         38.927   3.202    12.16  0.000
Endunit     -21.967   9.086    -2.42  0.017
Furnish       7.308   3.419     2.14  0.034
FlSq         1.0512  0.3492     3.01  0.003
DiSq         0.5719  0.1033     5.54  0.000

s = 22.49    R-sq = 57.4%    R-sq(adj) = 56.0%

Analysis of Variance
SOURCE       DF      SS     MS      F      p
Regression    7  137234  19605  38.76  0.000
Error       201  101659    506
Total       208  238893

SOURCE    DF  SEQ SS
Floor      1   14149
Distance   1   21208
View       1   75065
Endunit    1    6829
Furnish    1     840
FlSq       1    3640
DiSq       1   15503

I then ran a stepwise regression, forcing all of the main effects and the two squared terms into the model, to see whether any two-way interaction terms should be added. Of these, only the interaction between Floor and View was significant. The output from the final model is:
The regression equation is
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit
        + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq - 6.04 FV

Predictor      Coef     Stdev   t-ratio       p
Constant    206.123     8.379     24.60   0.000
Floor        -9.927     3.186     -3.12   0.002
Distance     -7.020     1.539     -4.56   0.000
View         65.952     6.619      9.96   0.000
Endunit     -22.451     8.662     -2.59   0.010
Furnish       6.485     3.265      1.99   0.048
FlSq         1.0207    0.3330      3.07   0.002
DiSq        0.57720   0.09848      5.86   0.000
FV           -6.037     1.312     -4.60   0.000

s = 21.44    R-sq = 61.5%    R-sq(adj) = 60.0%

Analysis of Variance
SOURCE        DF        SS       MS       F       p
Regression     8    146965    18371   39.97   0.000
Error        200     91928      460
Total        208    238893

SOURCE     DF   SEQ SS
Floor       1    14149
Distance    1    21208
View        1    75065
Endunit     1     6829
Furnish     1      840
FlSq        1     3640
DiSq        1    15503
FV          1     9731

This final model is fairly good. The R-squared value is .615; thus, 61.5% of the variation in price can be explained by the model that includes the following variables: Floor and Floor-squared, Distance and Distance-squared, View, Endunit, Furnish, and the Floor-by-View interaction. The residual plots
are as follows:

From the residual plots, it appears that the residuals are approximately normally distributed, but there may be a couple of outliers, as indicated by the two points whose standardized residuals are less than -3. The variance also appears to be constant. Thus, the model looks fairly good, although it would be better if the R-squared value were higher.
The final model is:
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit + 6.48 Furnish
        + 1.02 FlSq + 0.577 DiSq - 6.04 FV
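The final equation can be turned into a small prediction function. A minimal sketch using the full-precision coefficients from the output; the example unit (floor 5, distance 6, ocean view, not an end unit, furnished) is hypothetical:

```python
# Fitted Price from the final model, where FlSq = Floor**2,
# DiSq = Distance**2, and FV = Floor * View.

def predict_final(floor, distance, view, endunit, furnish):
    """Return the fitted Price from the final model."""
    return (206.123
            - 9.927 * floor
            - 7.020 * distance
            + 65.952 * view
            - 22.451 * endunit
            + 6.485 * furnish
            + 1.0207 * floor ** 2
            + 0.57720 * distance ** 2
            - 6.037 * floor * view)

# Hypothetical example unit
price = predict_final(floor=5, distance=6, view=1, endunit=0, furnish=1)
print(round(price, 2))  # → 202.92
```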
I have included graphs to indicate how each variable affects the price. These graphs reflect the
relationship between Price and a selected variable, holding the other variables constant.
The first graph plots Price by Floor for each level of View, since Floor and View interact. Both lines are curved to reflect the quadratic relationship between Floor and Price. For the non-ocean view, the price is fairly constant: it decreases slightly as Floor increases up to Floor 5, then increases slightly thereafter. For the ocean view, the price decreases at a decreasing rate as Floor increases.
The second graph plots Price by Distance. Again, the quadratic relationship is reflected by the curved line: as distance increases, the price decreases until a distance of about 6 is reached, and then begins to increase again.
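The turning points described above follow directly from the quadratic coefficients: a curve of the form a·x + b·x² reaches its minimum at x = -a/(2b). A minimal sketch checking both turning points against the final model's coefficients:

```python
# For a quadratic effect a*x + b*x**2, the turning point is at x = -a / (2*b).
# Coefficients are taken from the final model's output.

def vertex(a, b):
    """Return the x-value that minimizes a*x + b*x**2 (for b > 0)."""
    return -a / (2 * b)

dist_min = vertex(-7.020, 0.57720)           # Distance effect
floor_min_nonocean = vertex(-9.927, 1.0207)  # Floor effect when View = 0

print(round(dist_min, 2))            # → 6.08: price is lowest near distance 6
print(round(floor_min_nonocean, 2))  # → 4.86: price is lowest near Floor 5
```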


The third graph is a graph of the Price by View, for each Floor. Again, we must look at the relationship
between Price and View at each Floor because of the significant interaction. For all Floors, the price of
the Ocean View is higher than the price of the Non-ocean View. However, the difference in the two
views depends on the floor.
The fourth graph plots Price by Endunit. From the graph, the price of the end units is lower than that of the other units.
The last graph is a graph of the Price by Furnish. From the graph, the price of the furnished units is
higher than the price of the non-furnished units.
