
CHAPTER 8
THE COMPARISON OF TWO POPULATIONS
8-1.  n = 25    D̄ = 19.08    sD = 30.67
      H0: μD = 0    H1: μD ≠ 0
      t(24) = (D̄ − D0)/(sD/√n) = 19.08/(30.67/√25) = 3.11
      Reject H0 at α = 0.01.

      Template (Paired Difference Test, populations normal): size 25, average difference 19.08 (D̄),
      stdev. of difference 30.67 (sD), test statistic t = 3.1105, df = 24;
      p-value for H0: μ1 − μ2 = 0 is 0.0048 (Reject at α = 5%).
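The paired-difference arithmetic above can be cross-checked in a few lines of Python. This is a sketch assuming SciPy is available; `paired_t_from_stats` is a helper name chosen here, not something from the text or the Excel template.

```python
from math import sqrt
from scipy import stats

def paired_t_from_stats(d_bar, s_d, n, d0=0.0):
    """Paired-difference t statistic and two-tailed p-value from summary stats."""
    t = (d_bar - d0) / (s_d / sqrt(n))       # t = (D-bar - D0) / (sD / sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)     # two-tailed p-value on n - 1 df
    return t, p

t, p = paired_t_from_stats(19.08, 30.67, 25)
# t ≈ 3.1105, two-tailed p ≈ 0.0048 — reject H0 at α = 0.01
```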

8-2.  n = 40    D̄ = 5    sD = 2.3
      H0: μD = 0    H1: μD ≠ 0
      t(39) = (5 − 0)/(2.3/√40) = 13.75
      Strongly reject H0.  95% C.I. for μD: 5 ± 2.023(2.3/√40) = [4.26, 5.74].
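The confidence interval in 8-2 follows the same pattern for any confidence level. A minimal sketch assuming SciPy; `paired_ci` is a hypothetical helper name:

```python
from math import sqrt
from scipy import stats

def paired_ci(d_bar, s_d, n, conf=0.95):
    """t-based confidence interval for the mean paired difference."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)  # e.g. ≈ 2.023 for df = 39
    half = t_crit * s_d / sqrt(n)
    return d_bar - half, d_bar + half

lo, hi = paired_ci(5, 2.3, 40)
# ≈ (4.26, 5.74), matching the interval computed above
```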


8-3.  n = 12    D̄ = 3.67    sD = 2.45    (D = Movie − Commercial)
      H0: μD = 0    H1: μD ≠ 0

      (template: Testing Paired Difference.xls, sheet: Sample Data)
      Paired Difference Test: average difference 3.66667 (D̄), stdev. of difference 2.44949 (sD),
      test statistic t = 4.4907, df = 11. p-values: H0: μ1 − μ2 = 0, 0.0020 (Reject at α = 5%);
      H0: μ1 − μ2 ≥ 0, 0.9990; H0: μ1 − μ2 ≤ 0, 0.0010 (Reject).
      (The raw paired viewer data are not reproduced here.)

      At α = 0.05, we reject H0. There are more viewers for movies than commercials.
8-4.  n = 60    D̄ = 0.2    sD = 1
      H0: μD ≤ 0    H1: μD > 0
      t(59) = (0.2 − 0)/(1/√60) = 1.549.  At α = 0.05, we cannot reject H0.

      Template (Paired Difference Test): size 60, average difference 0.2, stdev. of difference 1 (sD),
      test statistic t = 1.5492, df = 59. p-values: H0: μ1 − μ2 = 0, 0.1267;
      H0: μ1 − μ2 ≥ 0, 0.9367; H0: μ1 − μ2 ≤ 0, 0.0633.

8-5.  n = 15    D̄ = 3.2    sD = 8.436    (D = After − Before)
      H0: μD ≤ 0    H1: μD > 0
      t(14) = (3.2 − 0)/(8.436/√15) = 1.469
      At α = 5%, do not reject H0. There is no evidence that the shelf facings are effective.


8-6.  n = 12    D̄ = 37.08    sD = 43.99    (D = France − Spain)
      H0: μD = 0    H1: μD ≠ 0

      (template: Testing Paired Difference.xls, sheet: Sample Data)
      Paired Difference Test: size 12, average difference 37.0833 (D̄), stdev. of difference
      43.9927 (sD), test statistic t = 2.9200, df = 11. p-values: H0: μ1 − μ2 = 0, 0.0139
      (Reject at α = 5%); H0: μ1 − μ2 ≥ 0, 0.9930; H0: μ1 − μ2 ≤ 0, 0.0070 (Reject).
      (The raw France/Spain price data are not reproduced here.)

      Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France,
      based on this small sample. p-value = 0.0139
8-7.  n = 60    σD = 1.0    α = 0.01    Power at μD = 0.1
      H0: μD ≤ 0    H1: μD > 0
      C = μ0 + 2.326(σ/√n) = 0 + 2.326(1/√60) = 0.30029
      We need:
      P(D̄ > C | μD = 0.1) = P(D̄ > 0.30029 | μD = 0.1)
      = P(Z > (0.30029 − 0.1)/(1/√60))
      = P(Z > 1.551) = 0.0604
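The power computation in 8-7 is mechanical enough to script. A sketch assuming SciPy; `power_one_sided` is a name invented here:

```python
from math import sqrt
from scipy import stats

def power_one_sided(mu_alt, sigma, n, alpha=0.01, mu0=0.0):
    """Power of the one-sided z test H0: mu <= mu0 vs H1: mu > mu0,
    evaluated at the alternative mu = mu_alt (sigma known)."""
    se = sigma / sqrt(n)
    c = mu0 + stats.norm.ppf(1 - alpha) * se  # rejection cutoff for the sample mean
    return stats.norm.sf((c - mu_alt) / se)   # P(mean > c | mu = mu_alt)

power = power_one_sided(mu_alt=0.1, sigma=1.0, n=60)
# ≈ 0.0604, matching the hand computation above
```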


8-8.  n = 20    D̄ = 1.25    sD = 42.896
      H0: μD = 0    H1: μD ≠ 0
      t(19) = (1.25 − 0)/(42.89/√20) = 0.13
      Do not reject H0; no evidence of a difference.

      Template (Paired Difference Test): size 20, average difference 1.25, stdev. of difference
      42.89, test statistic t = 0.1303, df = 19; p-value for H0: μ1 − μ2 = 0 is 0.8977.

8-9.  n1 = 100    n2 = 100    x̄1 = 76.5    x̄2 = 88.1    s1 = 38    s2 = 40
      H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0

      (Template: Testing Population Means.xls; need to use the t-test since the population std.
      devs. are unknown.)
      F ratio 1.10803, p-value 0.6108: the equal-variance assumption is reasonable.
      Assuming equal variances: pooled variance 1522 (s²p), test statistic t = −2.1025, df = 198.
      p-values: H0: μ1 − μ2 = 0, 0.0368 (Reject at α = 5%); H0: μ1 − μ2 ≥ 0, 0.0184 (Reject);
      H0: μ1 − μ2 ≤ 0, 0.9816.
      95% C.I. for μ1 − μ2: −11.6 ± 10.8801 = [−22.48, −0.7199]

      Reject H0. There is evidence that gasoline outperforms ethanol.
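The pooled two-sample t statistic used throughout this section can be reproduced from summary statistics alone. A sketch assuming SciPy; `pooled_t_from_stats` is a helper name chosen here:

```python
from math import sqrt
from scipy import stats

def pooled_t_from_stats(x1, s1, n1, x2, s2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_two = 2 * stats.t.sf(abs(t), df)
    return t, df, p_two

t, df, p = pooled_t_from_stats(76.5, 38, 100, 88.1, 40, 100)
# t ≈ -2.1025, df = 198, two-tailed p ≈ 0.0368 — the template's values for 8-9
```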


8-10. n1 = n2 = 30
      Nikon (1):   x̄1 = 8.5    s1 = 2.1
      Minolta (2): x̄2 = 7.8    s2 = 1.8
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
      z = (8.5 − 7.8)/√(2.1²/30 + 1.8²/30) = 1.386
      Do not reject H0. There is no evidence of a difference in the average ratings of the two
      cameras.
8-11. Bel Air (1): n1 = 32    x̄1 = 2.5M     s1 = 0.41M
      Marin (2):   n2 = 35    x̄2 = 4.32M    s2 = 0.87M
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0

      (Template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test
      since the population std. devs. are unknown.)
      F ratio 4.50268, p-value 0.0001: the equal-variance assumption is questionable.
      Assuming equal variances: pooled variance 0.47609 (s²p), t = −10.7845, df = 65;
      p-value for H0: μ1 − μ2 = 0 is 0.0000 (Reject at α = 5%);
      95% C.I.: −1.82 ± 0.33704 = [−2.157, −1.48296].
      Assuming unequal variances: t = −11.101, df = 49; p-value 0.0000 (Reject);
      95% C.I.: −1.82 ± 0.32946 = [−2.1495, −1.49054].

      Reject H0. There is evidence that the average Bel Air price is lower.
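The unequal-variance row of the template can be reproduced directly with SciPy's summary-statistics t-test, which implements the Welch form when `equal_var=False`. A sketch with the 8-11 numbers:

```python
from scipy import stats

# Welch (unequal-variance) two-sample test from summary statistics,
# matching the template's "Assuming Population Variances are Unequal" row.
res = stats.ttest_ind_from_stats(mean1=2.5, std1=0.41, nobs1=32,
                                 mean2=4.32, std2=0.87, nobs2=35,
                                 equal_var=False)
# res.statistic ≈ -11.101; res.pvalue ≈ 0 (two-tailed) — reject H0
```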
8-12. (template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test
      since the population std. devs. are unknown)
      H0: μJ − μSP = 0    H1: μJ − μSP ≠ 0

      Evidence: Sample 1: n = 40, x̄ = 15, s = 3;  Sample 2: n = 40, x̄ = 6.2, s = 3.5
      F ratio 1.36111, p-value 0.3398: the equal-variance assumption is reasonable.
      Pooled variance 10.625 (s²p), test statistic t = 12.0735, df = 78.
      p-values: H0: μ1 − μ2 = 0, 0.0000 (Reject at α = 5%); H0: μ1 − μ2 ≥ 0, 1.0000;
      H0: μ1 − μ2 ≤ 0, 0.0000 (Reject).
      95% C.I. for μ1 − μ2: 8.8 ± 1.45107 = [7.34893, 10.2511]

      Reject the null hypothesis. The global equities outperform the U.S. market.
8-13. Music:  n1 = 128    x̄1 = 23.5    s1 = 12.2
      Verbal: n2 = 212    x̄2 = 18.0    s2 = 10.5
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
      z = (23.5 − 18.0)/√(12.2²/128 + 10.5²/212) = 4.24
      Reject H0. Music is probably more effective.

      Template (z-test from stats, known population std. devs. 12.2 and 10.5):
      test statistic z = 4.2397, p-value 0.0000 (Reject at α = 5%).

8-14. n1 = 13    n2 = 13    x̄1 = 20.385    x̄2 = 10.385    s1 = 7.622    s2 = 4.292    α = .05
      H0: μ1 = μ2    H1: μ1 ≠ μ2
      s²p = [(13 − 1)(7.622)² + (13 − 1)(4.292)²] / (13 + 13 − 2) = 38.2581
      t(24) = (20.385 − 10.385) / √[38.2581(1/13 + 1/13)] = 4.1219
      df = 24. Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do
      differ.
8-15. Liz (1):    n1 = 32    x̄1 = 4,238       s1 = 1,002.5
      Calvin (2): n2 = 37    x̄2 = 3,888.72    s2 = 876.05

      a. one-tailed: H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
      b. z = (4,238 − 3,888.72 − 0)/√(1,002.5²/32 + 876.05²/37) = 1.53
      c. At α = 0.05, the critical point is 1.645. Do not reject H0: we cannot conclude that Liz
         Claiborne models get more money, on the average.
      d. p-value = .5 − .437 = .063 (It is the probability of committing a Type I error if we
         choose to reject and H0 happens to be true.)
      e. (using n1 = 10, n2 = 11)
         s²p = [(10 − 1)(1,002.5)² + (11 − 1)(876.05)²] / (10 + 11 − 2) = 879,983.804
         t = (4,238 − 3,888.72) / √[879,983.804(1/10 + 1/11)] = 0.8522,  df = 19
8-16. (template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test
      since the population std. devs. are unknown)
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0

      Evidence: Sample 1: n = 28, x̄ = 0.19, s = 5.72;  Sample 2: n = 28, x̄ = 0.72, s = 5.1
      F ratio 1.25792, p-value 0.5552: the equal-variance assumption is reasonable.
      Pooled variance 29.3642 (s²p), test statistic t = −0.3660, df = 54.
      p-values: H0: μ1 − μ2 = 0, 0.7158; H0: μ1 − μ2 ≥ 0, 0.3579; H0: μ1 − μ2 ≤ 0, 0.6421.
      99% C.I. for μ1 − μ2: −0.53 ± 3.86682 = [−4.3968, 3.33682]

      Do not reject the null hypothesis at α = 1%. Pre-earnings announcements have no impact on
      earnings on stock investments.
8-17. Non-research (1): n1 = 255    s1 = 0.64
      Research (2):     n2 = 300    s2 = 0.85
      x̄2 − x̄1 = 2.54
      95% C.I. for μ2 − μ1: (x̄2 − x̄1) ± z.α/2 √(s1²/n1 + s2²/n2)
      = 2.54 ± 1.96 √(.64²/255 + .85²/300) = [2.416, 2.664] percent.

8-18. Audio (1): n1 = 25    x̄1 = 87    s1 = 12
      Video (2): n2 = 20    x̄2 = 64    s2 = 23
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
      t(43) = (x̄1 − x̄2 − 0) / √{[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) · (1/n1 + 1/n2)} = 4.326
      Reject H0. Audio is probably better (higher average purchase intent). Waldenbooks should
      concentrate on audio.

      Template: pooled variance 314.116 (s²p), test statistic t = 4.3257, df = 43;
      p-value for H0: μ1 − μ2 = 0 is 0.0001 (Reject at α = 5%).

8-19. With training (1):    n1 = 13    x̄1 = 55    s1 = 8
      Without training (2): n2 = 15    x̄2 = 48    s2 = 6    (figures in $1,000s)
      H0: μ1 − μ2 ≤ 4,000    H1: μ1 − μ2 > 4,000
      t(26) = [(55 − 48) − 4] / √{[(12)(8²) + (14)(6²)]/26 · (1/13 + 1/15)} = 1.132
      The critical value at α = .05 for t(26) in a right-hand tailed test is 1.706. Since
      1.132 < 1.706, there is no evidence at α = .05 that the program executives get an average of
      $4,000 per year more than other executives of comparable levels.

8-20. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μP − μL = 0    H1: μP − μL ≠ 0

      Evidence: Sample 1 (Prague): n = 20, x̄ = 1, s = 1.1;  Sample 2 (London): n = 20, x̄ = 6,
      s = 2.5
      F ratio 5.16529, p-value 0.0008: the variances are not equal.
      Assuming unequal variances: test statistic t = −8.1868, df = 26.
      p-values: H0: μ1 − μ2 = 0, 0.0000 (Reject at α = 5%); H0: μ1 − μ2 ≥ 0, 0.0000 (Reject);
      H0: μ1 − μ2 ≤ 0, 1.0000.
      95% C.I. for μ1 − μ2: −5 ± 1.25539 = [−6.2554, −3.74461]

      Reject the null hypothesis: the average cost of beer is cheaper in Prague. Londoners save
      between $3.74 and $6.26.
8-21. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0

      Evidence: US: n = 15, x̄ = 3.8, s = 2.2;  China: n = 18, x̄ = 6.1, s = 5.3
      F ratio 5.80372, p-value 0.0018: the equal-variance assumption is violated.
      Assuming unequal variances: test statistic t = −1.676, df = 23.
      p-values: H0: μ1 − μ2 = 0, 0.1073; H0: μ1 − μ2 ≥ 0, 0.0536; H0: μ1 − μ2 ≤ 0, 0.9464.
      99% C.I. for μ1 − μ2: −2.3 ± 3.85252 = [−6.1525, 1.55252]

      Do not reject the null hypothesis (p-value = 0.1073): investment returns are the same in
      China and the US.
8-22. Old (1): n1 = 19    x̄1 = 8.26    s1 = 1.43
      New (2): n2 = 23    x̄2 = 9.11    s2 = 1.56
      H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
      t(40) = (9.11 − 8.26 − 0) / √{[18(1.43²) + 22(1.56²)]/40 · (1/19 + 1/23)} = 1.82
      Some evidence to reject H0 (p-value = 0.038) for the t-distribution with df = 40, in a
      one-tailed test.
8-23. Take the proposed route as population 1 and the alternate route as population 2. Assume
      equal variances for both populations.
      H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
      p-value from the template = 0.8674. Cannot reject H0.

8-24. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0

      Evidence: Sample 1: n = 20, x̄ = 3.56, s = 2.8;  Sample 2: n = 20, x̄ = 4.84, s = 3.2
      F ratio 1.30612, p-value 0.5662: the equal-variance assumption is reasonable.
      Pooled variance 9.04 (s²p), test statistic t = −1.3463, df = 38.
      p-values: H0: μ1 − μ2 = 0, 0.1862; H0: μ1 − μ2 ≥ 0, 0.0931; H0: μ1 − μ2 ≤ 0, 0.9069.

      Do not reject the null hypothesis at α = 5%. Neither investment outperforms the other.
8-25. Yes (1): n1 = 25    x̄1 = 12      s1 = 2.5
      No (2):  n2 = 25    x̄2 = 13.5    s2 = 1
      Assume independent random sampling from normal populations with equal population variances.
      H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
      t(48) = (13.5 − 12) / √{[24(2.5²) + 24(1²)]/48 · (1/25 + 1/25)} = 2.785
      At α = 0.05, reject H0. Also reject at α = 0.01. p-value = 0.0038.

      Template: pooled variance 3.625 (s²p), test statistic t = −2.7854, df = 48;
      p-values: H0: μ1 − μ2 = 0, 0.0076 (Reject at α = 5%); H0: μ1 − μ2 ≥ 0, 0.0038 (Reject).

8-26. H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
      z = (.1331 − .105 − 0) / √{[20(.09²) + 27(.122²)]/47 · (1/21 + 1/28)} = 0.8887
      Do not reject H0. There is no evidence of a difference in average stock returns for the two
      periods.
8-27. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μN − μO ≤ 0    H1: μN − μO > 0

      Evidence: Sample 1: n = 8, x̄ = 3, s = 2;  Sample 2: n = 10, x̄ = 2.3, s = 2.1
      F ratio 1.1025, p-value 0.9186: the equal-variance assumption is reasonable.
      Pooled variance 4.23063 (s²p), test statistic t = 0.7175, df = 16.
      p-values: H0: μ1 − μ2 = 0, 0.4834; H0: μ1 − μ2 ≥ 0, 0.7583; H0: μ1 − μ2 ≤ 0, 0.2417.

      Do not reject the null hypothesis (p-value = 0.2417). The new advertising firm has not
      resulted in significantly higher sales.
8-28. From Problem 8-25:  n1 = n2 = 25    x̄1 = 12    x̄2 = 13.5    s1 = 2.5    s2 = 1
      We want a 95% C.I. for μ2 − μ1:
      (x̄2 − x̄1) ± 2.011 √{[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) · (1/n1 + 1/n2)}
      = (13.5 − 12) ± 2.011 √{[24(2.5²) + 24(1²)]/48 · (1/25 + 1/25)}
      = [0.4170, 2.5830] percent.

8-29. Before (1): n1 = 100    x1 = 85
      After (2):  n2 = 100    x2 = 68
      H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
      z = (p̂1 − p̂2) / √[p̂(1 − p̂)(1/n1 + 1/n2)] = (.85 − .68) / √[(.765)(.235)(1/100 + 1/100)]
      = 2.835
      Reject H0. On-time departure percentage has probably declined after NW's merger with
      Republic. p-value = 0.0023.

      Template: proportions 0.8500 and 0.6800, pooled p-hat 0.7650, test statistic z = 2.8351;
      p-values: H0: p1 − p2 = 0, 0.0046 (Reject at α = 5%); H0: p1 − p2 ≥ 0, 0.9977;
      H0: p1 − p2 ≤ 0, 0.0023 (Reject).

8-30.

      Small towns (1): n1 = 1,000    x1 = 850
      Big cities (2):  n2 = 2,500    x2 = 1,950
      H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
      z = (850/1,000 − 1,950/2,500) / √[(2,800/3,500)(1 − 2,800/3,500)(1/1,000 + 1/2,500)] = 4.677
      Reject H0. There is strong evidence that the percentage of word-of-mouth recommendations in
      small towns is greater than it is in large metropolitan areas.
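The pooled two-proportion z test used in 8-29 and 8-30 is a one-liner to verify. A sketch assuming SciPy; `two_prop_z` is a name chosen here, shown with the 8-30 numbers:

```python
from math import sqrt
from scipy import stats

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0; one-tailed p-value for H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                    # combined-sample proportion
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return z, stats.norm.sf(z)

z, p = two_prop_z(850, 1000, 1950, 2500)
# z ≈ 4.677; the one-tailed p-value is far below 0.0001 — reject H0
```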
8-31. n1 = 31    x1 = 11    n2 = 50    x2 = 19
      H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
      z = (p̂1 − p̂2) / √[p̂(1 − p̂)(1/n1 + 1/n2)] = 0.228
      Do not reject H0. There is no evidence that one corporate raider is more successful than the
      other.
8-32. Before campaign (1): n1 = 2,060    p̂1 = 0.13
      After campaign (2):  n2 = 5,000    p̂2 = 0.19
      H0: p2 − p1 ≤ .05    H1: p2 − p1 > .05
      z = (p̂2 − p̂1 − D) / √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
      = (0.19 − 0.13 − .05) / √[(.13)(.87)/2,060 + (.19)(.81)/5,000] = 1.08
      No evidence to reject H0; cannot conclude that the campaign has increased the proportion of
      people who prefer California wines by over 0.05.

8-33. 95% C.I. for p2 − p1:
      (p̂2 − p̂1) ± 1.96 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
      = .06 ± 1.96 √[(.13)(.87)/2,060 + (.19)(.81)/5,000] = [0.0419, 0.0781]
      We are 95% confident that the increase in the proportion of the population preferring
      California wines is anywhere from 4.19% to 7.81%.
      Template (95% confidence interval): 0.0600 ± 0.0181 = [0.0419, 0.0782]

8-34.

      The statement to be tested must be hypothesized before looking at the data:
      Chase Man. (1):  n1 = 650    x1 = 48
      Manuf. Han. (2): n2 = 480    x2 = 20
      H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
      z = (p̂1 − p̂2) / √[p̂(1 − p̂)(1/n1 + 1/n2)] = 2.248
      Reject H0. p-value = 0.0122.
8-35. American execs (1): n1 = 120    x1 = 34
      European execs (2): n2 = 200    x2 = 41
      H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
      z = (.283 − .205) / √[(.234)(1 − .234)(1/120 + 1/200)] = 1.601
      At α = 0.05, there is no evidence to conclude that the proportion of American executives who
      prefer the A380 is greater than that of European executives. (p-value = 0.0547.)

      Template: proportions 0.2833 and 0.2050, pooled p-hat 0.2344, test statistic z = 1.6015;
      p-values: H0: p1 − p2 = 0, 0.1093; H0: p1 − p2 ≥ 0, 0.9454; H0: p1 − p2 ≤ 0, 0.0546.

8-36.

      Cleveland (1): n1 = 1,000    x1 = 75    p̂1 = .075
      Chicago (2):   n2 = 1,000    x2 = 72    p̂2 = .072
      H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
      p̂ = (72 + 75)/2,000 = .0735
      z = (p̂1 − p̂2) / √[p̂(1 − p̂)(1/n1 + 1/n2)] = 0.257
      We cannot reject H0. p-value = 0.7971

8-37. (Use template: testing difference in proportions.xls)
      H0: pQ − pN = 0    H1: pQ − pN ≠ 0

      Evidence: Sample 1: n = 100, x = 18, p̂ = 0.1800;  Sample 2: n = 100, x = 6, p̂ = 0.0600
      Pooled p-hat 0.1200, test statistic z = 2.6112; p-value for H0: p1 − p2 = 0 is 0.0090
      (Reject at α = 5%).

      Reject the null hypothesis: the new accounting method is more effective.
8-38. (Use template: testing difference in proportions.xls)
      H0: pC − pD = 0    H1: pC − pD ≠ 0

      Evidence: Sample 1: n = 100, x = 32, p̂ = 0.3200;  Sample 2: n = 100, x = 19, p̂ = 0.1900
      Pooled p-hat 0.2550, test statistic z = 2.1090; p-value for H0: p1 − p2 = 0 is 0.0349.

      Do not reject the null hypothesis at α = 1%: the proportions are not significantly different.
8-39. Motorola (1):  n1 = 120    x1 = 101    p̂1 = .842
      Blaupunkt (2): n2 = 200    x2 = 110    p̂2 = .550
      H0: p1 ≤ p2    H1: p1 > p2
      p̂ = (101 + 110)/320 = .659
      z = (.842 − .550) / √[(.659)(1 − .659)(1/120 + 1/200)] = 5.33
      Strongly reject H0; Motorola's system is superior (p-value is very small).


8-40. Old method (1): n1 = 40    s1² = 1,288
      New method (2): n2 = 15    s2² = 1,112
      H0: σ1² ≤ σ2²    H1: σ1² > σ2²    (use α = .05)
      F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
      The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not
      reject H0. There is no evidence that the variance of the new production method is smaller.

      F-Test for Equality of Variances: sizes 40 and 15, variances 1288 and 1112;
      test statistic F = 1.158273, df1 = 39, df2 = 14.
      p-values: H0: σ1² − σ2² = 0, 0.7977; H0: σ1² − σ2² ≥ 0, 0.6012; H0: σ1² − σ2² ≤ 0, 0.3988.

8-41.

      Test the equal-variance assumption of Problem 8-27:
      H0: σ1² = σ2²    H1: σ1² ≠ σ2²
      F = 1.1025
      Template (Assumptions — Populations Normal; H0: Population Variances Equal):
      F ratio 1.1025, p-value 0.9186.
      Do not reject H0. The variances are equal.
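The variance-ratio tests in 8-40 and 8-41 can be scripted from the sample variances alone. A sketch assuming SciPy; `f_test_var` is a name invented here, shown with the 8-40 numbers:

```python
from scipy import stats

def f_test_var(var1, n1, var2, n2):
    """F statistic var1/var2 and right-tail p-value for H1: sigma1^2 > sigma2^2."""
    f = var1 / var2
    return f, stats.f.sf(f, n1 - 1, n2 - 1)  # P(F(n1-1, n2-1) >= f)

f, p = f_test_var(1288, 40, 1112, 15)
# F ≈ 1.158 on (39, 14) df; right-tail p ≈ 0.40 — do not reject H0
```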


8-42. Yes (1): n1 = 25    s1 = 2.5
      No (2):  n2 = 25    s2 = 1
      H0: σ1² = σ2²    H1: σ1² ≠ σ2²
      Put the larger s² in the numerator and use α/2:
      F(24,24) = s1²/s2² = (2.5)²/(1)² = 6.25
      From the F table using α = .01, the critical point is F(24,24) = 2.66. Therefore, reject H0.
      The population variances are not equal at α = 2(.01) = 0.02.

      F-Test for Equality of Variances: sizes 25 and 25, variances 6.25 and 1;
      test statistic F = 6.25, df1 = 24, df2 = 24; p-value for H0: σ1² − σ2² = 0 is 0.0000
      (Reject at α = 5%).

8-43. n1 = 21    s1 = .09    n2 = 28    s2 = .122
      F(27,20) = (.122)²/(.09)² = 1.838
      At α = .10, we cannot reject H0 because the critical point for α = .05 from the table with
      df = (30, 20) is 2.04 and for df = (24, 20) it is 2.08. We did not reject H0 at α = .10, so
      we would also not reject it at α = .02. Hence this particular C.I. contains the value 1.00.
8-44. Before (1): n1 = 12    s1² = 16,390.545
      After (2):  n2 = 11    s2² = 86,845.764
      H0: σ1² = σ2²    H1: σ1² ≠ σ2²
      F(10,11) = 86,845.764/16,390.545 = 5.298
      The critical point from the table, using α = .01, is F(10,11) = 4.54. Therefore, reject H0.
      The population variances are probably not equal. p-value < .02 (double the α).

      F-Test for Equality of Variances: sizes 11 and 12, variances 86845.76 and 16390.55;
      test statistic F = 5.298528, df1 = 10, df2 = 11.
      p-values: H0: σ1² − σ2² = 0, 0.0109 (Reject at α = 1%); H0: σ1² − σ2² ≥ 0, 0.9945;
      H0: σ1² − σ2² ≤ 0, 0.0055 (Reject).

8-45. n1 = 25    s1 = 2.5    n2 = 25    s2 = 3.1
      H0: σ1² = σ2²    H1: σ1² ≠ σ2²    α = .02
      F(24,24) = (3.1)²/(2.5)² = 1.538
      From the table: F.01(24,24) = 2.66. Do not reject H0. There is no evidence that the
      variances in the two waiting lines are unequal.
8-46. nA = 25    sA² = 6.52    nB = 22    sB² = 3.47
      H0: σA² = σB²    H1: σA² > σB²    α = .01
      F(24,21) = 6.52/3.47 = 1.879
      The critical point for α = .01 is F(24,21) = 2.80. Do not reject H0. There is no evidence
      that stock A is riskier than stock B.

      F-Test for Equality of Variances: sizes 25 and 22, variances 6.52 and 3.47;
      test statistic F = 1.878963, df1 = 24, df2 = 21.
      p-values: H0: σ1² − σ2² = 0, 0.1485; H0: σ1² − σ2² ≥ 0, 0.9258; H0: σ1² − σ2² ≤ 0, 0.0742.

8-47.

The assumptions we need are: independent random sampling from the populations in question,
and normal population distributions. The normality assumption is not terribly crucial as long as
no serious violations of this assumption exist. In time series data, the assumption of random
sampling is often violated when the observations are dependent on each other through time. We
must be careful.

8-48. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μLeg − μKnee = 0    H1: μLeg − μKnee ≠ 0

      Evidence: Sample 1: n = 200, x̄ = 10402, s = 8500;  Sample 2: n = 200, x̄ = 11359, s = 9100
      F ratio 1.14616, p-value 0.3367: the equal-variance assumption is reasonable.
      Pooled variance 7.8E+07 (s²p), test statistic t = −1.0869, df = 398.
      p-values: H0: μ1 − μ2 = 0, 0.2778; H0: μ1 − μ2 ≥ 0, 0.1389; H0: μ1 − μ2 ≤ 0, 0.8611.

      Do not reject the null hypothesis at α = 5%. The average costs of the two procedures are
      similar.
8-49. 99% C.I. for μLeg − μKnee:
      −957 ± 2278.97 = [−3235.97, 1321.97]
      The C.I. contains zero, as expected from the results of Problem 8-48.
8-50. n = 11    Σd = 51    d̄ = 4.636    sd = 7.593
      H0: μd ≤ 0    H1: μd > 0
      t(10) = 4.636/(7.593/√11) = 2.025
      Reject H0. Performance did improve after the sessions.

8-51. For Problem 8-50, the 95% C.I. is D̄ ± t.025(10) sd/√n:
      = 4.636 ± 2.228(7.593/√11) = 4.636 ± 5.101 = [−0.465, 9.737]

      Template (95% confidence interval): 4.636 ± 5.10105 = [−0.465, 9.73705]

8-52. (Use template: testing difference in proportions.xls)
      H0: pNFL − pSCI = 0    H1: pNFL − pSCI ≠ 0

      Evidence: Sample 1: n = 200, x = 96, p̂ = 0.4800;  Sample 2: n = 200, x = 52, p̂ = 0.2600
      Pooled p-hat 0.3700, test statistic z = 4.5567.
      p-values: H0: p1 − p2 = 0, 0.0000 (Reject at α = 5%); H0: p1 − p2 ≥ 0, 1.0000;
      H0: p1 − p2 ≤ 0, 0.0000 (Reject).

      Reject H0. There is evidence that NFL viewers watch more commercials than those viewing
      Survivor.
8-53. 99% C.I. for pNFL − pSCI (the difference between viewing commercials for NFL viewers vs.
      Survivor viewers): 0.2200 ± 0.1211 = [0.0989, 0.3411]
      The C.I. does not contain zero, as expected.

8-54. (Use template: testing difference in means.xls; need to use the t-test since the population
      std. devs. are unknown)
      H0: μCR − μGuat = 0    H1: μCR − μGuat ≠ 0

      Evidence: Sample 1: n = 15, x̄ = 1242, s = 50;  Sample 2: n = 15, x̄ = 1240, s = 50
      F ratio 1, p-value 1.0000: the equal-variance assumption is reasonable.
      Pooled variance 2500 (s²p), test statistic t = 0.1095, df = 28.
      p-values: H0: μ1 − μ2 = 0, 0.9136; H0: μ1 − μ2 ≥ 0, 0.5432; H0: μ1 − μ2 ≤ 0, 0.4568.

      Do not reject the null hypothesis at α = 5%. The number of roses imported from both
      countries is about the same.

8-55. n1 = 80    x1 = 60    n2 = 100    x2 = 65
      p̂ = 125/180 = .6944
      H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
      z = (p̂1 − p̂2 − 0) / √[p̂(1 − p̂)(1/n1 + 1/n2)]
      = (.75 − .65) / √[(.6944)(1 − .6944)(1/80 + 1/100)] = 1.447
      Do not reject H0. There is no evidence that one movie will be more successful than the other
      (p-value = 0.1478).

      Template: proportions 0.7500 and 0.6500, pooled p-hat 0.6944, test statistic z = 1.4473;
      p-value for H0: p1 − p2 = 0 is 0.1478.

8-56.

      95% C.I. for the difference between the two population proportions:
      (p̂1 − p̂2) ± 1.96 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
      = 0.10 ± 1.96 √[(.75)(.25)/80 + (.65)(.35)/100] = [−0.0332, 0.2332]
      Yes, 0 is in the C.I., as expected from the results of Problem 8-55.
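The unpooled interval used in 8-56 (and 8-33, 8-69) is easy to verify numerically. A sketch assuming SciPy; `prop_diff_ci` is a helper name chosen here, shown with the 8-55 data:

```python
from math import sqrt
from scipy import stats

def prop_diff_ci(x1, n1, x2, n2, conf=0.95):
    """Unpooled z-based confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = stats.norm.ppf(1 - (1 - conf) / 2)  # ≈ 1.96 for 95%
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half, (p1 - p2) + half

lo, hi = prop_diff_ci(60, 80, 65, 100)
# ≈ (-0.0332, 0.2332): zero is inside, consistent with not rejecting H0 in 8-55
```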


8-57. K: nK = 12    x̄K = 12.55     sK = .7342281
      L: nL = 12    x̄L = 11.925    sL = .3078517
      H0: μK − μL = 0    H1: μK − μL ≠ 0
      t(22) = (12.55 − 11.925) / √{[11(.7342281²) + 11(.3078517²)]/22 · (1/12 + 1/12)} = 2.719
      Reject H0. The critical points for t(22) at α = .02 are ±2.508; at α = .01 they are ±2.819.
      So .01 < p-value < .02. The L-boat is probably faster.

      Template: pooled variance 0.31693 (s²p), test statistic t = 2.7194, df = 22;
      p-value for H0: μ1 − μ2 = 0 is 0.0125 (Reject at α = 5%).

8-58.

      Do Problem 8-57 with the data taken as paired. For the differences K − L:
      n = 12    D̄ = .625    sD = .7723929
      (The 12 raw differences are not reproduced here.)
      t(11) = (.625 − 0)/(.7723929/√12) = 2.803
      2.718 < 2.803 < 3.106 (between the critical points of t(11) for α = .01 and .02).
      Hence .01 < p-value < .02, as before in Problem 8-57 (the pairing did not help much here;
      we reach the same conclusion).

      Template (Paired Difference Test): size 12, average difference 0.625, stdev. of difference
      0.77239, test statistic t = 2.8031, df = 11. p-values: H0: μ1 − μ2 = 0, 0.0172 (Reject at
      α = 5%); H0: μ1 − μ2 ≥ 0, 0.9914; H0: μ1 − μ2 ≤ 0, 0.0086 (Reject).

8-59.

      (Use template: testing difference in proportions.xls)
      H0: pWest − pSouth = 0    H1: pWest − pSouth ≠ 0

      Evidence: Sample 1: n = 1000, x = 49.5, p̂ = 0.0495;  Sample 2: n = 1000, x = 67.9,
      p̂ = 0.0679
      Pooled p-hat 0.0587, test statistic z = −1.7503; p-value for H0: p1 − p2 = 0 is 0.0801.

      Do not reject the null hypothesis at α = 5%: the delinquency rates are the same.
8-60. IIT (1):        n1 = 100    p̂1 = 0.94
      Competitor (2): n2 = 125    p̂2 = 0.92
      H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
      p̂ = (94 + 115)/225 = .9289
      z = .02 / √[(.9289)(1 − .9289)(1/100 + 1/125)] = 0.58
      Do not reject H0. There is no evidence that one program is more successful than the other.
8-61. Design (1): n1 = 15    x̄1 = 2.17333      s1 = .3750555
      Design (2): n2 = 13    x̄2 = 2.5153846    s2 = .3508232
      H0: μ2 − μ1 = 0    H1: μ2 − μ1 ≠ 0
      t(26) = (2.5153846 − 2.173333) / √{[14(.3750555²) + 12(.3508232²)]/26 · (1/15 + 1/13)}
      = 2.479
      p-value = .02. Reject H0. Design 1 is probably faster.


8-62. H0: σ1² = σ2²    H1: σ1² ≠ σ2²
      F(14,12) = s1²/s2² = (.3750555)²/(.3508232)² = 1.143
      Do not reject H0 at α = 0.10 (since 1.143 < 2.62; it is also < 2.10, so the p-value > 0.20).
      The solution of Problem 8-61 is valid with respect to the equal-variance requirement.

8-63. A = After:  nA = 16    x̄A = 91.75      sA = 5.0265959
      B = Before: nB = 15    x̄B = 84.7333    sB = 5.3514573
      H0: μA − μB ≤ 5    H1: μA − μB > 5
      t(29) = (91.75 − 84.733 − 5) / √{[15(5.0265959²) + 14(5.3514573²)]/29 · (1/16 + 1/15)}
      = 1.08
      Do not reject H0. There is no evidence that advertising is effective.


8-64. H0: σ1² = σ2²    H1: σ1² ≠ σ2²
      F(14,15) = (5.3514573)²/(5.0265959)² = 1.133
      Do not reject H0 at α = 0.10. There is no evidence that the population variances are not
      equal.

      F-Test for Equality of Variances: sizes 15 and 16, variances 28.6381 and 25.26667;
      test statistic F = 1.133434, df1 = 14, df2 = 15.
      p-values: H0: σ1² − σ2² = 0, 0.8100; H0: σ1² − σ2² ≥ 0, 0.5950; H0: σ1² − σ2² ≤ 0, 0.4050.

8-65.

      From Problem 8-48:  sL = 8500    sK = 9100
      H0: σL² = σK²    H1: σL² ≠ σK²
      F = 9100²/8500² = 1.146    p = 0.34
      Do not reject the null hypothesis of equal variances.
      (Template: F ratio 1.14616, p-value 0.3367.)

8-66. H0: σK² = σL²    H1: σK² ≠ σL²
      F(11,11) = (.7342281)²/(.3078517)² = 5.688
      The critical point for α = 0.02 is about 4.5. Therefore, reject H0. Thus the analysis in
      Problem 8-57 is not valid, and we need to use the unequal-variance test. That test also
      gives t = 2.719, but the df are obtained using Equation (8-6):
      df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
      = approximately 14 (rounded downward).
      t.02(14) = 2.624 < 2.719 < 2.977 = t.01(14), hence 0.01 < p-value < 0.02. Reject H0.
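The Equation (8-6) degrees-of-freedom formula is worth checking numerically, since the rounding step matters. A minimal sketch (plain Python; `welch_df` is a name invented here), using the 8-57 standard deviations:

```python
def welch_df(s1, n1, s2, n2):
    """Satterthwaite-style approximate df for the unequal-variance t test,
    as in Equation (8-6): (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1))."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_df(0.7342281, 12, 0.3078517, 12)
# ≈ 14.75, rounded down to 14 as in the solution above
```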
8-67. Differences A − B: n = 16    D̄ = −2.375    sD = 9.7425185
      (The 16 raw differences are not reproduced here.)
      H0: μD = 0    H1: μD ≠ 0
      t(15) = (−2.375 − 0)/(9.7425185/√16) = −0.9751
      Do not reject H0. There is no evidence that one package is better liked than the other.

      Template (Paired Difference Test): size 16, average difference −2.375, stdev. of difference
      9.74252, test statistic t = −0.9751, df = 15. p-values: H0: μ1 − μ2 = 0, 0.3450;
      H0: μ1 − μ2 ≥ 0, 0.1725; H0: μ1 − μ2 ≤ 0, 0.8275.

8-68.

      Supplier A: nA = 200    xA = 12
      Supplier B: nB = 250    xB = 38
      H0: pA − pB = 0    H1: pA − pB ≠ 0
      p̂ = (12 + 38)/450 = .1111
      z = (p̂A − p̂B − 0) / √[p̂(1 − p̂)(1/nA + 1/nB)]
      = (.06 − .152) / √[(.1111)(.8889)(1/200 + 1/250)] = −3.086
      Reject H0. p-value = .002. Supplier A is probably more reliable, as its proportion of
      defective components is lower.
8-69. 95% C.I. for the difference in the proportion of defective items for the two suppliers:
      (p̂B − p̂A) ± 1.96 √[p̂A(1 − p̂A)/nA + p̂B(1 − p̂B)/nB]
      = .092 ± 1.96(.0282415) = [0.0366, 0.1474]

      Template (95% confidence interval): 0.0920 ± 0.0554 = [0.0366, 0.1474]

8-70.

90% C.I. for the difference in average occupancy rate at the Westin Plaza Hotel before and after
the advertising:
2

15(5.0265959) 14(5.3514573) 1
1
( x B x A ) 1.699

29
15 16

= 7.016667 3.1666375 = [3.85, 10.18] percent occupancy.
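Both interval half-widths above (the unpooled-proportion interval in 8-69 and the pooled-variance interval in 8-70, with t.05(29) = 1.699) can be reproduced directly; a sketch:

```python
import math

# 8-69: 95% C.I. half-width for the difference in defective proportions
p_a, n_a = 0.06, 200
p_b, n_b = 0.152, 250
half_69 = 1.96 * math.sqrt(p_a*(1 - p_a)/n_a + p_b*(1 - p_b)/n_b)

# 8-70: 90% C.I. half-width using the pooled variance; t.05(29) = 1.699
n1, s1 = 16, 5.0265959
n2, s2 = 15, 5.3514573
sp2 = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2)
half_70 = 1.699 * math.sqrt(sp2 * (1/n1 + 1/n2))

print(round(half_69, 4), round(half_70, 4))   # 0.0554 3.1666
```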


8-71.  (Use template: testing difference in means.xls)
(Need to use the t-test since the population std. dev. is unknown.)
H0: μB − μO = 0   H1: μB − μO ≠ 0
Evidence: Sample 1: n = 25, x̄ = 60, s = 14; Sample 2: n = 20, x̄ = 65, s = 8
Test of equal variances: F ratio = 3.0625, p-value = 0.0155. The assumption of equal variances is violated.
Assuming population variances are unequal: Test Statistic t = −1.5048, df = 39
H0: μ1 − μ2 = 0: p-value = 0.1404; H0: μ1 − μ2 ≥ 0: p-value = 0.0702; H0: μ1 − μ2 ≤ 0: p-value = 0.9298
At α = 5%, do not reject the null hypothesis. The price of the two virtual dolls is about the same.
8-72.  (Use template: testing difference in means.xls)
(Need to use the t-test since the population std. dev. is unknown.)
H0: μA − μB = 0   H1: μA − μB ≠ 0
Evidence: Sample 1: n = 74, x̄ = 28, s = 6; Sample 2: n = 65, x̄ = 22, s = 6
Test of equal variances: F ratio = 1, p-value = 1.0000. Assume equal variances.
Pooled Variance = 36; Test Statistic t = 5.8825; df = 137
At α = 5%: H0: μ1 − μ2 = 0: p-value = 0.0000, Reject; H0: μ1 − μ2 ≥ 0: p-value = 1.0000; H0: μ1 − μ2 ≤ 0: p-value = 0.0000, Reject
Reject the null hypothesis: the average returns are not the same.


8-73.  (Use template: testing difference in means.xls; sheet: t-test from stats)
H0: μ2 − μ1 = 0   H1: μ2 − μ1 ≠ 0
Evidence: Sample 1: n = 74, x̄ = 50, s = 20; Sample 2: n = 65, x̄ = 14, s = 8
Test of equal variances: F ratio = 6.25, p-value = 0.0000. The assumption of equal variances is violated.
Assuming population variances are unequal: Test Statistic t = 14.2414, df = 98
At α = 5%: H0: μ1 − μ2 = 0: p-value = 0.0000, Reject; H0: μ1 − μ2 ≥ 0: p-value = 1.0000; H0: μ1 − μ2 ≤ 0: p-value = 0.0000, Reject
95% Confidence Interval for the difference in population means: 36 ± 5.01643 = [30.9836, 41.0164]
The 95% CI: [$30.98M, $41.02M]


8-74.  a. n1 = 2500, x̄1 = 39; n2 = 2500, x̄2 = 35; s1 = s2 = 2; α = .05
H0: μ1 = μ2   H1: μ1 ≠ μ2
z = (39 − 35)/√(2²/2500 + 2²/2500) = 70.711
Reject H0. The average workweek has shortened.
b. 95% C.I.: (39 − 35) ± 1.96√(2²/2500 + 2²/2500) = 4 ± .1109 = [3.8891, 4.1109]
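Both parts use the same standard error of the difference; a quick check in Python:

```python
import math

n1 = n2 = 2500
xbar1, xbar2 = 39, 35
s = 2                                 # common standard deviation

se = math.sqrt(s**2/n1 + s**2/n2)
z = (xbar1 - xbar2) / se
half = 1.96 * se                      # 95% C.I. half-width
print(round(z, 3), round(half, 4))    # 70.711 0.1109
```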

8-75.  (Use template: testing difference in means.xls; sheet: t-test from stats)
H0: μ2 − μ1 = 0   H1: μ2 − μ1 ≠ 0
Evidence: Sample 1: n = 25, x̄ = 1.7, s = 0.4; Sample 2: n = 25, x̄ = 1.5, s = 0.7
Test of equal variances: F ratio = 3.0625, p-value = 0.0081. The assumption of equal variances is violated.
Assuming population variances are unequal: Test Statistic t = 1.24035, df = 38
H0: μ1 − μ2 = 0: p-value = 0.2225
At α = 5%, do not reject the null hypothesis. The mean catches are about the same. p-value = 0.2225


8-76.  Yes. Lower-income households are less likely to have internet access. (p-value = 0.0038)
Comparing Two Population Proportions:
Evidence: Sample 1: n = 500, x = 350, p̂ = 0.7000; Sample 2: n = 500, x = 310, p̂ = 0.6200
Hypothesized difference zero; pooled p̂ = 0.6600; Test Statistic z = 2.6702
At α = 5%: H0: p1 − p2 = 0: p-value = 0.0076, Reject; H0: p1 − p2 ≥ 0: p-value = 0.9962; H0: p1 − p2 ≤ 0: p-value = 0.0038, Reject

8-77.  The 95% C.I. contains 0, which supports the results from 8-75.
95% Confidence Interval for the difference in population means: 0.2 ± 0.32642 = [−0.1264, 0.5264]

8-78.  The ratio of the variances is 3.18. The degrees of freedom for both samples are 10 − 1 = 9. Using the F table with 9 degrees of freedom in both the numerator and the denominator, we find a value of 3.18 at α = 0.05. Therefore, there is a 5% chance of observing a ratio at least this large when the two population variances are equal.

8-79.  (Use template: testing difference in means.xls; sheet: t-test from data)
1. Assuming equal variances:
H0: μ2 − μ1 = 0   H1: μ2 − μ1 ≠ 0
Data (Co.1, n = 11; Co.2, n = 9): 2570, 2480, 2870, 2975, 2055, 2940, 2850, 2475, 2660, 1940, 2380, 2590, 2550, 2485, 2585, 2710, 2100, 2655, 1950, 2115
Evidence: Sample 1: n = 11, x̄ = 2623.18, s = 174.087; Sample 2: n = 9, x̄ = 2342.22, s = 393.55
Pooled Variance = 85673.3; Test Statistic t = 2.1356; df = 18
At α = 5%: H0: μ1 − μ2 = 0: p-value = 0.0467, Reject; H0: μ1 − μ2 ≥ 0: p-value = 0.9766; H0: μ1 − μ2 ≤ 0: p-value = 0.0234, Reject
At the 0.05 level of significance, reject the null hypothesis that the charges are the same.

2. Test the assumption of equal variances:
H0: σ1² = σ2²   H1: σ1² ≠ σ2²
F ratio = 5.11054, p-value = 0.0193
Reject the null hypothesis: the variances are not equal.

3. Assuming unequal variances:
H0: μ2 − μ1 = 0   H1: μ2 − μ1 ≠ 0
Test Statistic t = 1.98846; df = 10
H0: μ1 − μ2 = 0: p-value = 0.0748; H0: μ1 − μ2 ≥ 0: p-value = 0.9626; H0: μ1 − μ2 ≤ 0: p-value = 0.0374, Reject
Do not reject the null hypothesis: the charges are not significantly different.


Case 10: Tiresome Tires II
1) Do not reject the null hypothesis at α = 5%.
Evidence: Sample 1: n = 40, x̄ = 2742.5, s = 32.8883; Sample 2: n = 40, x̄ = 2729.35, s = 38.3189
Test of equal variances: F ratio = 1.16512, p-value = 0.6356; assume equal variances.
Pooled Variance = 1274.99; Test Statistic t = 1.6470; df = 78
H0: μ1 − μ2 ≤ 0: p-value = 0.0518. At α = 5%, do not reject.
95% Confidence Interval for the difference in population means: 13.15 ± 15.8956

2) Increasing α would decrease β. Increasing α to any value above 5.18% will cause the null hypothesis to be rejected.

3) Paired difference test: Reject the null hypothesis (p-value = 0.0471).
Data: 40 paired observations (Old Method, New Method); difference defined as Sample 1 − Sample 2.
Average Difference = 13.15; Stdev. of Difference = 48.4877; Test Statistic t = 1.7152; df = 39
H0: μ1 − μ2 ≤ 0: p-value = 0.0471. At α = 5%, Reject.

4) Reducing the variance of the new process will decrease the chances of a Type I error.
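The paired-difference statistic in part 3 follows directly from the summary values; a quick check in Python:

```python
import math

n = 40
d_bar = 13.15      # average difference, old method - new method
s_d = 48.4877      # standard deviation of the differences

t = d_bar / (s_d / math.sqrt(n))
print(round(t, 4))   # 1.7152
```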


CHAPTER 9
ANALYSIS OF VARIANCE
9-1.  H0: μ1 = μ2 = μ3 = μ4
H1: not all four population means are equal
(Dot plots of four sample configurations for groups 1 through 4:)
All 4 are different
2 equal; 2 different
3 equal; 1 different
2 equal; other 2 equal but different from first 2

9-2.

ANOVA assumptions: normal populations with equal variance. Independent random sampling
from the r populations.

9-3.

A series of paired t-tests would be dependent on each other. There is no control over the probability of a Type I error for the joint series of tests.

9-4.  r = 5; n1 = n2 = . . . = n5 = 21; n = 105
dfs of F are 4 and 100. Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is evidence that not all 5 plants have equal average output.
F Distribution (1-Tail) critical values: 10%: 2.0019; 5%: 2.4626; 1%: 3.5127; 0.50%: 3.9634

9-5.  r = 4; n1 = 52, n2 = 38, n3 = 43, n4 = 47
Computed F = 12.53. Reject H0. The average price per lot is not equal at all 4 cities. Feel very strongly about rejecting the null hypothesis, as the critical point of F(3,176) for α = .01 is approximately 3.8.
F Distribution (1-Tail) critical values: 10%: 2.1152; 5%: 2.6559; 1%: 3.8948; 0.50%: 4.4264

9-6.

Originally, treatments referred to the different types of agricultural experiments being performed on a crop; today the term is used interchangeably to refer to the different populations in the study. Errors are the differences between the data points and their sample means.

9-7.

Because the sum of all the deviations from a mean is equal to 0.


9-8.
9-9.

Total deviation = xij x = ( x i x ) + x ij xi


= treatment deviation + error deviation.
The sum of squares principle says that the sum of the squared total deviations of all the data
points is equal to the sum of the squared treatment deviations plus the sum of all squared error
deviations in the data.

9-10.

An error is any deviation from a sample mean that is not explained by differences among populations. An error may be due to a host of factors not studied in the experiment.

9-11.

Both MSTR and MSE are sample statistics subject to natural variation about their own means. (If x̄ > μ0 we cannot immediately reject H0 in a single-sample case either.)

9-12.

The main principle of ANOVA is that if the r population means are not all equal then it is likely
that the variation of the data points about their sample means will be small compared to the
variation of the sample means about the grand mean.

9-13.

Distances among populations means manifest themselves in treatment deviations that are large
relative to error deviations. When these deviations are squared, added, and then divided by dfs,
they give two variances. When the treatment variance is (significantly) greater than the error
variance, population mean differences are likely to exist.

9-14.

a) degrees of freedom for Factor: 4 − 1 = 3
b) degrees of freedom for Error: 80 − 4 = 76
c) degrees of freedom for Total: 80 − 1 = 79

9-15.  SST = SSTR + SSE, but MST = SST/(n − 1) does not equal MSTR + MSE. A counterexample:
Let n = 21, r = 6, SST = 100, SSTR = 85, SSE = 15.
Then SST = SSTR + SSE = 85 + 15 = 100.
But SST/(n − 1) = 100/20 = 5, while MSTR + MSE = SSTR/(r − 1) + SSE/(n − r) = 85/5 + 15/15 = 18.
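The counterexample's arithmetic, checked in Python:

```python
n, r = 21, 6
SST, SSTR, SSE = 100, 85, 15

assert SST == SSTR + SSE      # the sums of squares are additive

MST  = SST / (n - 1)          # 5.0
MSTR = SSTR / (r - 1)         # 17.0
MSE  = SSE / (n - r)          # 1.0
print(MST, MSTR + MSE)        # 5.0 18.0 -- the mean squares are not additive
```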

9-16.

When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two independent, unbiased estimators of the common population variance σ², hence this ratio does not follow an F distribution.

9-17.  For each observation xij, we know that (tot.) = (treat.) + (error):
xij − x̄ = (x̄i − x̄) + (xij − x̄i)
Squaring both sides of the equation:
(xij − x̄)² = (x̄i − x̄)² + 2(x̄i − x̄)(xij − x̄i) + (xij − x̄i)²
Now sum this over all observations (all treatments i = 1, . . . , r; and within treatment i, all observations j = 1, . . . , ni):
Σi Σj (xij − x̄)² = Σi Σj (x̄i − x̄)² + Σi Σj 2(x̄i − x̄)(xij − x̄i) + Σi Σj (xij − x̄i)²
Notice that the first sum on the R.H.S. equals Σi ni(x̄i − x̄)², since for each i the summand doesn't vary over the ni values of j. Similarly, the second sum is 2 Σi [(x̄i − x̄) Σj (xij − x̄i)]. But for each fixed i, Σj (xij − x̄i) = 0, since this is just the sum of all deviations from the mean within treatment i. Thus the whole second sum on the R.H.S. above is 0, and the equation is now
Σi Σj (xij − x̄)² = Σi ni(x̄i − x̄)² + Σi Σj (xij − x̄i)²
which is precisely Equation (9-12).


9-18.  (From Minitab):
Source     df  SS      MS      F
Treatment  2   381127  190563  20.71
Error      27  248460  9202
Total      29  629587
The critical point for F(2,27) at α = 0.01 is 5.49. Therefore, reject H0. The average range of the 3 prototype planes is probably not equal.

ANOVA Table (α = 5%):
Source   SS      df  MS         F        Fcritical  p-value
Between  381127  2   190563.33  20.7084  3.3541     0.0000  Reject
Within   248460  27  9202.2222
Total    629587  29

9-19.

(Template: Anova.xls, sheet: 1-way):
ANOVA Table (α = 5%):
Source   SS       df  MS      F       Fcritical  p-value
Between  187.696  3   62.565  11.494  2.9467     0.0000  Reject
Within   152.413  28  5.4433
Total    340.108  31

MINITAB output
One-way ANOVA: UK, Mex, UAE, Oman
Source  DF  SS      MS     F      P
Factor  3   187.70  62.57  11.49  0.000
Error   28  152.41  5.44
Total   31  340.11
S = 2.333   R-Sq = 55.19%   R-Sq(adj) = 50.39%

Level  N  Mean    StDev
UK     8  60.160  2.535
Mex    8  58.390  2.405
UAE    8  55.190  2.224
Oman   8  54.124  2.149
Pooled StDev = 2.333
(Individual 95% CIs for the means, based on pooled StDev, span roughly 52.5 to 60.0.)

Critical point F(3,28) for α = 0.05 is 2.9467. Therefore we reject H0. There is evidence of differences in the average price per barrel of oil from the four sources. The Rotterdam oil market may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We need to assume independent random samples from these populations, and normal populations with equal population variance. Observations are time-dependent (days during February), thus the assumptions could be violated. This is a limitation of the study. Another limitation is that February may be different from other months.

9-20.

An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference
in their perceptions on the roles played by African American models in commercials.

9-21.

(From Minitab):
Source
Treatment
Error
Total

df
2
38
40

SS
91.0426
140.529
231.571

9-4

MS
45.5213
3.69812

F
12.31

Chapter 09 - Analysis of Variance

p-value = .0001. Critical point for F (2,38) at = .05 is 3.245. Therefore, reject H0. There is a
difference in the length of time it takes to make a decision.

5%

ANOVA Table

Source
SS
df
MS
Fcritical
p-value
F
Between 91.0426
2 45.521302 12.3093042 3.2448213 0.0001 Reject
Within 140.529 38 3.6981215
Total 231.571 40
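The F statistic in any of these one-way tables is just the ratio of the two mean squares; a quick check in Python with the values above:

```python
SSTR, SSE = 91.0426, 140.529
df_tr, df_e = 2, 38

MSTR, MSE = SSTR / df_tr, SSE / df_e
F = MSTR / MSE
print(round(MSTR, 4), round(F, 2))   # 45.5213 12.31
```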

9-22.

An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference
in the monetary-economic reaction to the three inflation fighting policies.

9-23.

The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the
performances of the four different portfolios are significantly different.

9-24.  95% C.I. for the mean responses:
Martinique: x̄2 ± t(α/2)√(MSE/n2) = 75 ± 1.96√(504.4/40) = [68.04, 81.96]
Eleuthera: 73 ± 1.96√(MSE/n3) = [66.04, 79.96]
Paradise Island: 91 ± 1.96√(MSE/n4) = [84.04, 97.96]
St. Lucia: 85 ± 1.96√(MSE/n5) = [78.04, 91.96]

9-25.  Where do differences exist in the circle-square-triangle populations from Table 9-1, using Tukey? From the text: MSE = 2.125
triangles: n1 = 4, x̄1 = 6
squares:   n2 = 4, x̄2 = 11.5
circles:   n3 = 3, x̄3 = 2
For α = .01, qα(r, n − r) = q.01(3,8) = 5.63. Smallest ni is 3:
T = q√(MSE/3) = 5.63√(2.125/3) = 4.738
|x̄1 − x̄2| = 5.5 > 4.738   sig.
|x̄2 − x̄3| = 9.5 > 4.738   sig.
|x̄1 − x̄3| = 4.0 < 4.738   n.s.
Thus: μ1 = μ3; μ2 > μ1; μ2 > μ3
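The Tukey criterion T and the pairwise decisions can be checked numerically; a sketch using the values above:

```python
import math

MSE = 2.125
q = 5.63      # q.01(3, 8) from the studentized-range table
n_min = 3     # smallest group size

T = q * math.sqrt(MSE / n_min)
means = {'triangles': 6, 'squares': 11.5, 'circles': 2}
print(round(T, 3))                                      # 4.738
print(abs(means['squares'] - means['circles']) > T)     # True  (significant)
print(abs(means['triangles'] - means['circles']) > T)   # False (n.s.)
```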

9-26.  Find which prototype planes are different in Problem 9-18:
MSE = 9,202; ni = 10 for all i
x̄A = 4,407   x̄B = 4,230   x̄C = 4,135
For α = .05, q(3,27) ≈ 3.51. T = 3.51√(9,202/10) = 106.475
|x̄A − x̄B| = 177 > 106.475   sig.
|x̄B − x̄C| = 95 < 106.475    n.s.
|x̄A − x̄C| = 272 > 106.475   sig.
Prototype A is shown to have higher average range than both B and C. Prototypes B and C have no significant difference in average range (all conclusions are at α = 0.05).
Tukey test for pairwise comparison of group means (r = 3, n − r = 27, q0 = 3.51, T = 106.476): B vs. A: Sig; C vs. A: Sig; C vs. B: n.s.

9-27.

Since H0 was rejected in Problem 9-19, there are significant differences.
T = q.05(4,28)√(5.4433/8) = 4.04√(0.6804) = 3.332
|UK − MEX| = |60.16 − 58.39| = 1.77
|UK − UAE| = |60.16 − 55.19| = 4.97
|UK − OMAN| = |60.16 − 54.1238| = 6.0362
|MEX − UAE| = |58.39 − 55.19| = 3.2
|MEX − OMAN| = |58.39 − 54.1238| = 4.2662
|UAE − OMAN| = |55.19 − 54.1238| = 1.0662
Differences exceeding T = 3.332 are significant: UK vs. UAE, UK vs. Oman, and Mex vs. Oman; the remaining pairs are not significantly different.
Tukey test for pairwise comparison of group means (r = 4, n − r = 28, q0 = 4.04, T = 3.33248): Mex vs. UK: n.s.; UAE vs. UK: Sig; UAE vs. Mex: n.s.; Oman vs. UK: Sig; Oman vs. Mex: Sig; Oman vs. UAE: n.s.

9-28.

(Question has no relevance to 9-20)

9-29.

Degrees of freedom for Factor: 3 − 1 = 2
Degrees of freedom for Error: 157 − 3 = 154
Degrees of freedom for Total: 157 − 1 = 156
The overall F test indicates that there is a difference in the groups' reactions to pricing tactics. The subsequent information also indicates that there is a significant difference between each of the groups' reactions.

9-30.

a) Total sample size = 275


b) The critical value for F(.05, 2, 272) is 3.029; therefore the overall ANOVA test is very significant.
c) Monopoly prices are significantly different than limited competition and strong competition.


9-31.

We cannot extend the results to planes built after the analysis. We used fixed effects here, not
random effects. The 3 prototypes were not randomly chosen from a population of levels as would
be required for the random effects model.

9-32.

A randomized complete block design is a design with restricted randomization. Each block of
experimental units is assigned to treatments with randomization of treatments within the block.

9-33.

Fly all 3 planes on the same route every time. The route (flown by the 3 planes) is the block.

9-34.

Look at the residuals. If the spread of the residuals is not equal, we probably have unequal σ², and the assumption of equal variances is violated. A histogram of the residuals will reveal normality violations.

9-35.

Otherwise you are not randomly sampling from a population of treatments, and inference is not
valid for the entire population.

9-36.

No. Rotterdam (and Arabian Light) was not randomly chosen.

9-37.

If the locations and the artists are chosen randomly, we have a random effects model.

9-38.

1. Testing for possible interactions among factor levels.


2. Efficiency.

9-39.

Limitations and problems: (1) We don't know the overall significance level of the 3 tests; (2) If we have 1 observation per cell then there are 0 degrees of freedom for error. Also, for a fixed sample size, there is a reduction of the df for error.

9-40.

1. As more factors are included, df for error decreases.


2. As more factors are included, we lose the control on , and the probability of at least one
Type I error increases.

9-41.

Since there are interactions, there are differences in emotions averaged over all levels of
advertisements.

9-42.

At α = 0.05:
Location: F = 50.6, significant
Job type: F = 50.212, significant
Interaction: F = 2.14, n.s.

ANOVA Table (α = 5%):
Source       SS        df  MS       F       Fcritical  p-value
Location     2520.988  2   1260.49  50.645  3.1239     0.0000  Reject
Job Type     2499.432  2   1249.72  50.212  3.1239     0.0000  Reject
Interaction  212.716   4   53.179   2.1367  2.4989     0.0850
Error        1792      72  24.8889
Total        7025.136  80

9-43.  Cell sample sizes (50 observations per cell):
            ABC  CBS  NBC
Morning     50   50   50
Evening     50   50   50
Late Night  50   50   50

Source       SS    df   MS     F
Network      145   2    72.5   5.16
Newstime     160   2    80     5.69
Interaction  240   4    60     4.27
Error        6200  441  14.06
Total        6745  449

From the table: F.01(4,400) = 3.36 and F.01(2,400) = 4.66. Therefore, all are significant at α = 0.01. There are interactions. There are Network main effects averaged over Newstime levels. There are Newstime main effects averaged over Network levels.

9-44.
a. Levels of task difficulty: a − 1 = 1; therefore a = 2
b. Levels of effort: b − 1 = 1; therefore b = 2
c. There are no task-difficulty main effects because the p-value = 0.5357
d. There are effort main effects because the p-value < 0.0001
e. There are no significant interactions, as the p-value = 0.1649.

9-45.
a. Explained is Treatment: Treat = Factor A + Factor B + (AB)
b. Levels of exercise price: a − 1 = 2; therefore a = 3
c. Levels of time of expiration: b − 1 = 1; therefore b = 2
d. ab(n − 1) = 144, a = 3, b = 2; therefore n − 1 = 24, n = 25, N = 25 × 6 = 150
e. n = 25
f. There are no exercise-price main effects (F = 0.42 < 1).
g. There are time-of-expiration main effects at α = 0.05 but not at α = 0.01, because F(1,144) = 4.845; from the F table, for dfs = 1, 150, the critical point for α = 0.05 is 3.91 and for α = 0.01 it is 6.81.
h. There are no interactions: F = .193 < 1
i. There is some evidence for time-of-expiration main effects. There is no evidence for exercise-price main effects or interaction effects.
j. For time-of-expiration main effects, .01 < p-value < .05. For the other two tests, the p-values are very high.
k. We could use a t-test for time-of-expiration effects: t²(144) = F(1,144)

9-46.

Since there are interactions but neither of the main factors have significant F-tests, a likely
conclusion is that the two factors work in opposite directions, i.e., inverse to each other.

9-47.

Advantages: reduced experimental errors (the effects of extraneous factors) and greater economy
of sample sizes.

9-48.

Use blocking by firm, to reduce the error contributions arising from differences between firms.

9-49.

Could use a randomized blocking design: 4 observations, UK, Mexico, UAE, Oman at 4
locations and 4 different dates.

9-50.

A good blocking variable would be size of firm in terms of total assets or total sales, etc.

9-51.

Yes. Have people of the same occupation/age/demographics use sweaters of the 3 kinds under
study. Each group of 3 people are a block.

9-52.

As stated in 9-23, a good blocking variable would be some measure of diversity in the portfolio.

9-53.

We could group the executives into blocks according to some choice of common characteristics
such as age, sex, years employed at current firm, etc. The different blocks for the chosen attribute
would then form a third variable beyond Location and Type to use in a 3-way ANOVA.

9-54.

We must assume no block-factor interactions.

9-55.

SSTR = 3,233; SSE = 12,386; n = 100 blocks
df treatment = r − 1 = 2   df error = (n − 1)(r − 1) = 99(2) = 198
F = MSTR/MSE = (3,233/2)/(12,386/198) = 25.84
Reject H0. The p-value is very small. There are differences among the 3 sweeteners. We should be very confident of the results. Blocking reduces experimental error here, as people of the same weight/age/sex will tend to behave homogeneously with respect to losing weight.
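The randomized-block F statistic follows from the sums of squares and their dfs; a quick check in Python:

```python
SSTR, SSE = 3233, 12386
n_blocks, r = 100, 3

df_tr = r - 1                      # 2
df_e = (n_blocks - 1) * (r - 1)    # 198

F = (SSTR / df_tr) / (SSE / df_e)
print(round(F, 2))   # 25.84
```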

9-56.

n = 70 blocks; r = 4; SSTR = 9,875; SSBL = 1,445; SST = 22,364
SSE = 22,364 − 1,445 − 9,875 = 11,044
MSE = 11,044/[(69)(3)] = 53.35   MSTR = 9,875/3 = 3,291.67
F(3,207) = MSTR/MSE = 61.7
Reject H0. The p-value is very small. Not all of the four methods are equally effective.


9-57.

SSTR = 7,102; SSE = 10,511; r = 8; ni = 20 for all i
MSTR = SSTR/(r − 1) = 7,102/7 = 1,014.57
MSE = SSE/(n − r) = 10,511/(160 − 8) = 69.15
F(7,152) = 14.67 > 2.76 (critical point for α = 0.01). Therefore, reject H0. Not all tapes are equally appealing. The p-value is very small.

9-58.

n1 = 32, n2 = 30, n3 = 38, n4 = 41; n = 141
MSTR = SSTR/(r − 1) = 4,537/3 = 1,512.33
F(3,137) = MSTR/MSE = 1,512.33/412 = 3.67
2.67 (critical at α = 0.05) < 3.67 < 3.92 (critical at α = 0.01)
We can reject H0 at α = 0.05. There is some evidence that the four names are not all equally well liked.
9-59.

Software packages: 3; Computers: 4; n = 60 observations per cell
SS software = 77,645; SS computer = 54,521; SS interaction = 88,699; SSE = 434,557

Source       SS       df   MS          F
software     77,645   2    38,822.5    63.25
computer     54,521   3    18,173.667  29.60
interaction  88,699   6    14,783.167  24.09
error        434,557  708  613.78
Total        655,422  719

Both main effects and the interactions are highly significant.
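The MS and F columns above are just SS/df ratios over the common MSE; a quick check in Python:

```python
# Sums of squares and degrees of freedom from the table above
sources = {'software': (77645, 2), 'computer': (54521, 3), 'interaction': (88699, 6)}
SSE, df_e = 434557, 708
MSE = SSE / df_e   # 613.78

F_stats = {name: (ss / df) / MSE for name, (ss, df) in sources.items()}
for name, F in F_stats.items():
    print(name, round(F, 2))
```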


9-60.

Treatment df = r − 1 = 2; Block df = 74; Total df = 224 (total sample size was 225)
Error df = (n − 1)(r − 1) = (74)(2) = 148
Critical value of F(.05, 2, 148) = 3.0572, which is less than F = 13.65. The results are significant.


9-61.
Source       SS       df   MS        F
pet          22,245   3    7,415     1.93
location     34,551   3    11,517    2.99
interaction  31,778   9    3,530.89  0.92
error        554,398  144  3,849.99
Total        642,972  159

There are no interactions. There are no pet main effects.
2.68 (critical at α = 0.05) < 2.99 < 3.92 (critical at α = 0.01)
Thus there are location main effects at α = 0.05.

9-62.  F-ratio = 4.5471, p-value = .0138 (using a computer). At α = 0.05, only groups 1 and 3 are significantly different from each other: the Drug group is significantly different from the No-Treatment group.

ANOVA Table (α = 5%):
Source   SS       df  MS        F       Fcritical  p-value
Between  3203.12  2   1601.56   4.5471  3.1239     0.0138  Reject
Within   25359.6  72  352.2167
Total    28562.7  74

95% Confidence Intervals of Group Means:
Drug: 24.16 ± 7.4824   Placebo: 27.8 ± 7.4824   No-Treatment: 39.48 ± 7.4824

Tukey test for pairwise comparison of group means (r = 3, n − r = 72, q0 = 3.41, T = 12.7994): the only significant pair is Drug vs. No-Treatment.

9-63.


a. Blocking (repeated measures) is more efficient as every person is his/her own control.
Reductions in errors. Limitations? Maybe carryover effects from trial to trial.


b. SSTR = 44,572; SSE = 112,672; r = 3; n = 30
MSTR = 44,572/2 = 22,286   MSE = 112,672/[(29)(2)] = 1,942.62
F(2,58) = 11.47. Reject H0.

9-64.  n1 = n2 = n3 = 15; r = 3. A one-way ANOVA gives an F value of 22.21, which is significant even at α < 0.001, hence we reject the hypothesis of no differences among the three models. MSE = 48.1, so at α = 0.01 we use the critical point q = 4.37 (closest to the required value for dfs = 3, 42), giving the Tukey criterion T = q√(MSE/ni) = 7.83. Observed means:
x̄GI = 124.73   x̄P = 121.40   x̄Z = 108.73
|x̄GI − x̄P| = 3.33   |x̄GI − x̄Z| = 16.00*   |x̄P − x̄Z| = 12.67*
Using T = 7.83, we reject the hypotheses μGI = μZ and μP = μZ (at the 0.01 level of significance), but not the μGI = μP hypothesis.

ANOVA Table (α = 5%):
Source   SS       df  MS         F        Fcritical  p-value
Between  2137.78  2   1068.8889  22.2083  3.2199     0.0000  Reject
Within   2021.47  42  48.1302
Total    4159.24  44

95% Confidence Intervals of Group Means:
GI: 124.733 ± 3.6149   Phillips: 121.4 ± 3.6149   Zenith: 108.733 ± 3.6149

Tukey test for pairwise comparison of group means (r = 3, n − r = 42, q0 = 4.37, T = 7.82789): Phillips vs. GI: n.s.; Zenith vs. GI: Sig; Zenith vs. Phillips: Sig.

9-65.

n = 50; r = 3; SSTR = 128,889; SSE = 42,223,987
F(2,98) = (128,889/2)/(42,223,987/98) = 0.14958
Do not reject the null hypothesis.
9-66.  t²(df) = F(1,df)

9-67.

Rents are equal on average. There is no evidence of differences among the four cities.

9-68.

Answers will vary depending upon which report is selected.

9-69.

A one-way ANOVA strongly rejects H0. For the three levels of Store, the 95% confidence intervals calculated for the means, as shown, do not overlap at all.
Case 11: Rating Wines
(Template: ANOVA.xls, sheet: 1-Way)
Data: four grape types; sample sizes: Chard n = 11, Merlot n = 10, C.Blanc n = 13, C.Sauv n = 11
Chard: 89, 88, 89, 78, 80, 86, 87, 88, 88, 89, 88
Merlot, C.Blanc, C.Sauv (ratings as given in the template): 91, 81, 92, 88, 81, 89, 99, 81, 89, 90, 82, 9, 91, 81, 92, 88, 78, 90, 88, 79, 91, 89, 80, 93, 90, 83, 91, 87, 81, 97, 88, 88, 85, 86

1) Do not reject the null hypothesis: there is no difference in the average ratings due to the type of grape.

ANOVA Table (α = 5%):
Source   SS       df  MS      F       Fcritical  p-value
Between  411.617  3   137.21  0.8594  2.8327     0.4698
Within   6545.63  41  159.65
Total    6957.24  44

Case 12: Checking out Checkout


1. One-way ANOVA (n = 10 per scanner):
     Scan1  Scan2  Scan3
1    16     13     18
2    15     18     19
3    12     13     15
4    15     15     14
5    16     18     19
6    15     14     16
7    15     15     17
8    14     15     14
9    12     14     15
10   14     16     17

ANOVA Table (α = 5%):
Source   SS     df  MS      F       Fcritical  p-value
Between  20.6   2   10.3    3.4893  3.3541     0.0449  Reject
Within   79.7   27  2.9519
Total    100.3  29
Reject the null hypothesis of equal number of scans per minute.

2. Rows = clerks, columns = scanners:
ANOVA Table (α = 5%):
Source       SS        df  MS       F       Fcritical  p-value
Row          20.76667  4   5.19167  2.1239  2.5787     0.0934
Column       90.7      2   45.35    18.552  3.2043     0.0000  Reject
Interaction  14.13333  8   1.76667  0.7227  2.1521     0.6705
Error        110       45  2.44444
Total        235.6     59

Reject the null hypothesis of equal number of scans per minute (columns).
Do not reject the null hypothesis that the clerks are equally efficient.
There are no interaction effects present.
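Part 1's ANOVA can be recomputed from the scanner data (laid out row-wise as reconstructed from the template; the reconstruction reproduces the template's sums of squares); a sketch in Python:

```python
# Scanner data, one list per scanner (10 observations each)
scan1 = [16, 15, 12, 15, 16, 15, 15, 14, 12, 14]
scan2 = [13, 18, 13, 15, 18, 14, 15, 15, 14, 16]
scan3 = [18, 19, 15, 14, 19, 16, 17, 14, 15, 17]
groups = [scan1, scan2, scan3]

n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

# Between-groups and within-groups sums of squares
SSTR = sum(len(g) * (sum(g)/len(g) - grand)**2 for g in groups)
SSE = sum((x - sum(g)/len(g))**2 for g in groups for x in g)

r = len(groups)
F = (SSTR / (r - 1)) / (SSE / (n - r))
print(round(SSTR, 1), round(SSE, 1), round(F, 4))   # 20.6 79.7 3.4893
```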


CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION
(The template for this chapter is: Simple Regression.xls.)
10-1.

A statistical model is a set of mathematical formulas and assumptions that describe some real-world situation.

10-2.

Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model
parameters; 3) Test the validity of the model; and 4) Use the model.

10-3.

Assumptions of the simple linear regression model: 1) A straight-line relationship between X and Y; 2) The values of X are fixed; 3) The regression errors, ε, are identically normally distributed random variables, uncorrelated with each other through time.

10-4.

β0 is the Y-intercept of the regression line, and β1 is the slope of the line.

10-5.

The conditional mean of Y, E(Y | X), is the population regression line.

10-6.

The regression model is used for understanding the relationship between the two variables, X and
Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the
variable X.

10-7.

The error term captures the randomness in the process. Since X is assumed nonrandom, the addition of ε makes the result (Y) a random variable. The error term captures the effects on Y of a host of unknown random components not accounted for by the simple linear regression model.

10-8.

The equation represents a simple linear regression model without an intercept (constant) term.

10-9.

The least-squares procedure produces the best estimated regression line in the sense that the line lies "inside" the data set. The line is the best unbiased linear estimator of the true regression line, as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of the data points from the line.

10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the
determination of the estimators of the line parameters because the procedure is based on
minimizing the squared distances from the line. Since outliers have large squared distances they
exert undue influence on the line. A more robust procedure may be appropriate when outliers
exist.

10-1

Chapter 10 - Simple Linear Regression and Correlation

10-11. (Template: Simple Regression.xls, sheet: Regression)
Data (X = income quantile, coded 1 through 5; Y = wealth; residuals in parentheses):
1: 17.3 (0.8); 2: 23.6 (−3.02); 3: 40.2 (3.46); 4: 45.8 (−1.06); 5: 56.8 (−0.18)
95% Confidence Interval for Slope: 10.12 ± 2.77974
95% Confidence Interval for Intercept: 6.38 ± 9.21937
Regression Equation: Wealth Growth = 6.38 + 10.12 × Income Quantile
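The fitted coefficients can be reproduced with the usual least-squares formulas, assuming the quantiles are coded 1 through 5 (consistent with the residuals above); a sketch in Python:

```python
x = [1, 2, 3, 4, 5]                   # income quantile coding (assumed)
y = [17.3, 23.6, 40.2, 45.8, 56.8]    # wealth growth

n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
ss_xy = sum((xi - xbar)*(yi - ybar) for xi, yi in zip(x, y))
ss_x = sum((xi - xbar)**2 for xi in x)

b1 = ss_xy / ss_x          # slope
b0 = ybar - b1*xbar        # intercept
print(round(b1, 2), round(b0, 2))   # 10.12 6.38
```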

10-12. b1 = SSXY /SSX = 934.49/765.98 = 1.22


10-13. (Template: Simple Regression.xls, sheet: Regression)
Thus, b0 = −3.057 and b1 = 0.187.
r² = 0.9217 (Coefficient of Determination); r = 0.9601 (Coefficient of Correlation)
95% C.I. for β1: 0.18663 ± 0.03609; s(b1) = 0.0164
95% C.I. for β0: −3.05658 ± 2.1372; s(b0) = 0.97102
95% Prediction Interval for Y given X = 10: −1.19025 ± 2.8317; s = 0.99538 (standard error of prediction)

ANOVA Table:
Source  SS       df  MS       F        Fcritical  p-value
Regn.   128.332  1   128.332  129.525  4.84434    0.0000
Error   10.8987  11  0.99079
Total   139.231  12

10-14. b1 = SSXY/SSX = 2.11
b0 = ȳ − b1x̄ = 165.3 − (2.11)(88.9) = −22.279
10-15. (Template: Simple Regression.xls, sheet: Regression) Inflation (X) and return on stocks (Y):
r² = 0.0873 (Coefficient of Determination)
r = 0.2955 (Coefficient of Correlation)
Regression Equation: Return = 16.0961 + 0.96809 Inflation
95% C.I. for the slope β1: 0.96809 ± 2.7972; s(b1) = 1.18294 (Standard Error of Slope)
95% C.I. for the intercept β0: 16.0961 ± 17.3299; s(b0) = 7.32883 (Standard Error of Intercept)
s = 20.8493 (Standard Error of prediction)

ANOVA Table
Source   SS        df   MS        F         Fcritical   p-value
Regn.    291.134    1   291.134   0.66974   5.59146     0.4401
Error    3042.87    7   434.695
Total    3334       8


[Scatter plot of Return (Y) against Inflation (X) with fitted line y = 0.9681x + 16.096]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
10-16. (Template: Simple Regression.xls, sheet: Regression) Average value of Aston Martin:

Year (X)    1960     1970     1980     1990     2000
Value (Y)   180000   40000    60000    160000   200000
Error       84000    -72000   -68000   16000    40000

r² = 0.1203 (Coefficient of Determination)
r = 0.3468 (Coefficient of Correlation)
95% C.I. for the slope β1: 1600 ± 7949.76; s(b1) = 2498 (Standard Error of Slope)
95% C.I. for the intercept β0: -3040000 ± 1.6E+07; s(b0) = 4946165 (Standard Error of Intercept)
s = 78993.7 (Standard Error of prediction)

ANOVA Table
Source   SS        df   MS        F         Fcritical   p-value
Regn.    2.6E+09    1   2.6E+09   0.41026   10.128      0.5674
Error    1.9E+10    3   6.2E+09
Total    2.1E+10    4


[Scatter plot of Value against Year with fitted line y = 1600x - 3E+06]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
Limitations: sample size is very small.
Hidden variables: the 70s and 80s models have a different valuation than other decades possibly
due to a different model or style.
10-17. Regression equation is:
Credit Card Transactions = 177.641 + 0.6202 Debit Card Transactions
r² = 0.9624 (Coefficient of Determination)
r = 0.9810 (Coefficient of Correlation)
95% C.I. for the slope β1: 0.6202 ± 0.17018; s(b1) = 0.06129 (Standard Error of Slope)
95% C.I. for the intercept β0: 177.641 ± 110.147; s(b0) = 39.6717 (Standard Error of Intercept)
s = 56.9747 (Standard Error of prediction)

ANOVA Table
Source   SS        df   MS        F         Fcritical   p-value
Regn.    332366     1   332366    102.389   7.70865     0.0005
Error    12984.5    4   3246.12
Total    345351     5

There is no implication for causality. A third-variable influence could be increases in per capita income or GDP growth.

10-18. SSE = Σ(y − b0 − b1x)². Take partial derivatives with respect to b0 and b1:

∂SSE/∂b0 = ∂/∂b0 [Σ(y − b0 − b1x)²] = −2Σ(y − b0 − b1x)
∂SSE/∂b1 = ∂/∂b1 [Σ(y − b0 − b1x)²] = −2Σx(y − b0 − b1x)

Setting the two partial derivatives to zero and simplifying, we get:

Σ(y − b0 − b1x) = 0   and   Σx(y − b0 − b1x) = 0.

Expanding, we get:

Σy − nb0 − b1Σx = 0   and   Σxy − b0Σx − b1Σx² = 0

Solving the above two equations simultaneously for b0 and b1 gives the required results.
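The closed-form solutions b1 = SSXY/SSX and b0 = ȳ − b1x̄ can be checked against the two first-order conditions directly; a minimal Python sketch (the data values below are hypothetical, chosen only for illustration):

```python
# Verify the normal-equation solutions on a small hypothetical data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_x = sum((xi - xbar) ** 2 for xi in x)
b1 = ss_xy / ss_x          # slope, Equation (10-10)
b0 = ybar - b1 * xbar      # intercept
# The residuals must satisfy the two normal equations:
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
assert abs(sum(resid)) < 1e-9                                # sum of residuals = 0
assert abs(sum(r * xi for r, xi in zip(resid, x))) < 1e-9    # x-weighted sum = 0
print(round(b1, 2), round(b0, 2))
```

Both conditions hold at the least-squares solution and at no other (b0, b1) pair, which is what makes the two normal equations a valid characterization of the fit.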
10-19. 99% C.I. for β1: 1.25533 ± 2.807(0.04972) = [1.1158, 1.3949].
The confidence interval does not contain zero.


10-20. MSE = 7.629. From the ANOVA table for Problem 10-11:

ANOVA Table
Source   SS        df   MS
Regn.    1024.14    1   1024.14
Error    22.888     3   7.62933
Total    1047.03    4

10-21. From the regression results for Problem 10-11:
s(b0) = 2.89694 (Standard Error of Intercept)   s(b1) = 0.87346 (Standard Error of Slope)

10-22. From the regression results for Problem 10-11:
95% C.I. for the slope: 10.12 ± 2.77974 = [7.34026, 12.89974]
95% C.I. for the intercept: 6.38 ± 9.21937 = [−2.83937, 15.59937]


10-23. s(b0) = 0.971   s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1: 0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.
From the template: 95% C.I. for the slope: 0.18663 ± 0.03609; 95% C.I. for the intercept: −3.05658 ± 2.1372.

10-24. s(b0) = 85.4395   s(b1) = 0.15336. The estimate of the regression variance is MSE = 8122.
95% C.I. for β1: 1.5518 ± 2.776(0.1534) = [1.126, 1.978]. Zero is not in the range.
From the template: 95% C.I. for the slope: 1.55176 ± 0.42578; 95% C.I. for the intercept: −255.943 ± 237.219.

10-25. s² gives us information about the variation of the data points about the computed regression line.
10-26. In correlation analysis, the two variables, X and Y, are viewed in a symmetric way: neither is treated as dependent and the other as independent, as is the case in regression analysis. In correlation analysis we are interested in the relation between two random variables, both assumed normally distributed.
10-27. From the regression results for Problem 10-11: r = 0.9890 (Coefficient of Correlation)
10-28. r = 0.9601 (Coefficient of Correlation)


10-29. t(3) = 0.3468/√[(1 − 0.1203)/3] = 0.640
Accept H0. The two variables are not linearly correlated.


10-30. Yes. For example, suppose n = 5 and r = 0.51; then:
t = r/√[(1 − r²)/(n − 2)] = 1.02, and we do not reject H0. But if we take n = 10,000 and r = 0.04, we get t = 4.00, which leads to strong rejection of H0.
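The dependence of this t statistic on the sample size can be checked directly; a small Python sketch (`corr_t` is a hypothetical helper name, not from the text):

```python
import math

def corr_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2 (hypothetical helper)."""
    return r / math.sqrt((1 - r * r) / (n - 2))

# Small sample with a sizable r: not significant.
t_small = corr_t(0.51, 5)
# Very large sample with a tiny r: strongly significant.
t_large = corr_t(0.04, 10000)
print(round(t_small, 2), round(t_large, 2))
```

The same correlation coefficient can thus be "insignificant" or "highly significant" depending only on n, which is the point of the problem.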


10-31. We have r = 0.875 and n = 10. Conducting the test:
t(8) = r/√[(1 − r²)/(n − 2)] = 0.875/√[(1 − 0.875²)/8] = 5.11
There is statistical evidence of a correlation between the prices of gold and of copper.
Limitations: the data are time-series data, hence not independent random samples. Also, the data set contains only 10 points.

10-34. n = 65   r = 0.37
t(63) = 0.37/√[(1 − 0.37²)/63] = 3.16
Yes. Significant. There is a correlation between the two variables.


10-35. z′ = ½ ln[(1 + r)/(1 − r)] = ½ ln(1.37/0.63) = 0.3884
ζ0 = ½ ln[(1 + ρ0)/(1 − ρ0)] = ½ ln(1.22/0.78) = 0.2237
σz′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ0)/σz′ = (0.3884 − 0.2237)/0.127 = 1.297
Cannot reject H0.
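The Fisher z-transform test above (r = 0.37, n = 65 from 10-34, null value ρ0 = 0.22) can be reproduced in a few lines of Python:

```python
import math

# Fisher z-transform test of H0: rho = 0.22, given r = 0.37 and n = 65.
r, rho0, n = 0.37, 0.22, 65
z_r   = 0.5 * math.log((1 + r) / (1 - r))        # transformed sample r
zeta0 = 0.5 * math.log((1 + rho0) / (1 - rho0))  # transformed null value
sigma = 1 / math.sqrt(n - 3)                     # standard error of z'
z = (z_r - zeta0) / sigma
print(round(z, 3))  # below 1.96, so cannot reject H0 at alpha = 0.05
```

The transform makes the sampling distribution approximately normal even when ρ0 is far from zero, which is why it is used here instead of the t test for ρ = 0.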

10-36. Using the TINV(α, df) function in Excel, where df = n − 2 = 52: TINV(0.05, 52) = 2.006645 and TINV(0.01, 52) = 2.6737.
Reject H0 at 0.05 but not at 0.01. There is evidence of a linear relationship at α = 0.05 only.
10-37. t(16) = b1/s(b1) = 3.1/2.89 = 1.0727.
Do not reject H0. There is no evidence of a linear relationship at any α.
10-38. Using the regression results for problem 10-11:
critical value of t (two-tailed, α = 0.05): t(0.025, 3) = 3.182
computed value of t is: t = b1/s(b1) = 10.12 / 0.87346 = 11.586
Reject H0. There is strong evidence of a linear relationship.


10-39. t (11) = b1/s(b1) = 0.187/0.016 = 11.69


Reject H0. There is strong evidence of a linear relationship between the two variables.
10-40. b1/ s(b1) = 1600/2498 = 0.641
Do not reject H0. There is no evidence of a linear relationship.
10-41. t (58) = b1/s(b1) = 1.24/0.21 = 5.90
Yes, there is evidence of a linear relationship.
10-42. Using the Excel function, TDIST(x,df,#tails) to estimate the p-value for the t-test results, where
x = 1.51, df = 585692 2 = 585690, #tails = 2 for a 2-tail test:
TDIST(1.51, 585690, 2) = 0.131.
The corresponding p-value for the results is 0.131. The regression is not significant even at the 0.10 level of significance.
10-43. t (211) = z = b1/s(b1) = 0.68/12.03 = 0.0565
Do not reject H0. There is no evidence of a linear relationship at any α. (Why report such results?)
10-44. b1 = 5.49 s(b1) = 1.21
t (26) = 4.537
Yes, there is evidence of a linear relationship.
10-45. The coefficient of determination indicates that 9% of the variation in customer satisfaction can be explained by changes in a customer's materialism measurement.
10-46 a. The model should not be used for prediction purposes because only 2.0% of the
variation in pension funding is explained by its relationship with firm profitability.
b. The model explains virtually nothing.
c. Probably not. The model explains too little.
10-47. In the Problem 10-11 regression results, r² = 0.9781. Thus, 97.8% of the variation in wealth growth is explained by the income quantile.

10-48. In Problem 10-13, r² = 0.922. Thus, 92.2% of the variation in the dependent variable is explained by the regression relationship.
10-49. r² in Problem 10-16: r² = 0.1203
10-50. Reading directly from the MINITAB output: r² = 0.962


10-51. Based on the coefficient of determination values for the five countries, the UK model explains
31.7% of the variation in long-term bond yields relative to the yield spread. This is the best
predictive model of the five. The next best model is the one for Germany, which explains 13.3%
of the variation. The regression models for Canada, Japan, and the US do not predict long-term
yields very well.
10-52. From the information provided, the slope coefficient of the equation is equal to -14.6. Since its
value is not close to zero (which would indicate that a change in bond ratings has no impact on
yields), it would indicate that a linear relationship exists between bond ratings and bond yields.
This is in line with the reported coefficient of determination of 61.56%.
10-53. r² in Problem 10-15: r² = 0.0873

10-54. Σ(y − ȳ)² = Σ[(y − ŷ) + (ŷ − ȳ)]² = Σ[(y − ŷ)² + 2(y − ŷ)(ŷ − ȳ) + (ŷ − ȳ)²]
= Σ(y − ŷ)² + 2Σ(y − ŷ)(ŷ − ȳ) + Σ(ŷ − ȳ)²
But: 2Σ(y − ŷ)(ŷ − ȳ) = 2Σŷ(y − ŷ) − 2ȳΣ(y − ŷ) = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to zero. The second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)².

10-55. From Equation (10-10): b1 = SSXY/SSX. From Equation (10-31): SSR = b1SSXY.
Hence, SSR = (SSXY/SSX)SSXY = (SSXY)²/SSX.
10-56. Using the results for Problem 10-11:
F = 134.238 > F(1,3) = 10.128 (p-value = 0.0014)
Reject H0.

10-57. F(1,11) = 129.525   Fcritical = 4.84434   p-value = 0.0000
t(11) = 11.381; t² = 11.381² = 129.53 = the F-statistic value already calculated.

10-58. F(1,4) = 102.389   Fcritical = 7.70865   p-value = 0.0005
t(4) = 10.119; t² = (10.119)² = 102.39 = F

10-59. F (1,7) = 0.66974 Do not reject H0.

10-60. F(1,102) = MSR/MSE = (87,691/1)/(12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.
10-61. t²(k) = F(1,k). Thus, F(1,20) = [b1/s(b1)]² = (2.556/4.122)² = 0.3845
Do not reject H0. There is no evidence of a linear relationship.

10-62. t²(k) = [b1/s(b1)]² = [(SSXY/SSX)/(s/√SSX)]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= (SSXY/SSX)²/(MSE/SSX) = (SS²XY/SSX)/MSE = (SSR/1)/MSE = MSR/MSE = F(1,k)
[because SS²XY/SSX = SSR by Equations (10-31) and (10-10)]
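The identity t² = F can also be verified numerically on any simple-regression data set; a Python sketch on hypothetical data:

```python
# Numeric check of the identity t^2 = F(1, k) for simple regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ss_x = sum((xi - xbar) ** 2 for xi in x)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_x
b0 = ybar - b1 * xbar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
ssr = ss_xy ** 2 / ss_x          # SSR = SSXY^2 / SSX
t = b1 / (mse / ss_x) ** 0.5     # t = b1 / s(b1)
f = ssr / mse                    # F = MSR / MSE, 1 numerator df
assert abs(t * t - f) < 1e-9     # the identity holds exactly
print(round(t, 3), round(f, 3))
```

This mirrors the algebra above: squaring the t statistic reproduces the F ratio term by term.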


10-63. a. Heteroscedasticity.
b. No apparent inadequacy.
c. Data display curvature, not a straight-line relationship.
10-64. a. No apparent inadequacy.
b. A pattern of increase with time.
10-65. a. No serious inadequacy.
b. Yes. A deviation from the normal-distribution assumption is apparent.


10-66. Using the results for Problem 10-11:

Durbin-Watson statistic: d = 3.39862

[Residual plot of Error against X]

Residual variance fluctuates; with only 5 data points the residuals appear to be normally distributed.

[Normal probability plot of residuals]

10-67. Residuals plotted against the independent variable of Problem 10-14 (Quality, ranging from 30 to 80):

[Residual plot]

No apparent inadequacy.

Durbin-Watson statistic: d = 2.0846
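The Durbin-Watson statistic reported by the template is d = Σ(eₜ − eₜ₋₁)² / Σeₜ²; a small Python sketch (the residual series below is hypothetical):

```python
# Durbin-Watson statistic from a residual series; values near 2 suggest
# no first-order autocorrelation. Residuals here are hypothetical.
e = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
d = (sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
     / sum(ei ** 2 for ei in e))
print(round(d, 3))
```

Values well below 2 indicate positive autocorrelation and values well above 2 indicate negative autocorrelation, which is how the d values quoted in these solutions are read.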

10-68. Durbin-Watson statistic: d = 1.70855

Plot shows some curvature.

10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16)√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,854.4, 7,248.3]
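The interval can be reproduced from the summary numbers quoted above (t = 2.069, s = 318.16, n = 25, x̄ = 3,177.92, SSX = 40,947,557.84); a short Python sketch:

```python
import math

# 95% prediction interval for Y at x = 5,000, American Express example,
# computed from the summary statistics quoted in the solution.
yhat = 274.85 + 1.2553 * 5000            # point prediction
t_crit, s, n = 2.069, 318.16, 25
xbar, ss_x = 3177.92, 40947557.84
half = t_crit * s * math.sqrt(1 + 1 / n + (5000 - xbar) ** 2 / ss_x)
print(round(yhat - half, 1), round(yhat + half, 1))
```

The "1 +" inside the root is what distinguishes a prediction interval for an individual Y from the narrower confidence interval for E[Y | X].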
10-70. Given that the slope of the equation for 10-52 is 14.6, if the rating falls by 3 the yield should
increase by 43.8 basis points.
10-71. For a 99% P.I.: t.005(23) = 2.807
6,551.35 ± (2.807)(318.16)√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,605.75, 7,496.95]
10-72. Point prediction: ŷ = 6.38 + 10.12(4) = 46.86
The 99% P.I.: 46.86 ± 18.3946 = [28.465, 65.255]

10-73. The 99% P.I. for Y given X = 5: 56.98 ± 20.407 = [36.573, 77.387]

10-74. The 95% P.I. for Y given X = 1990: 144000 ± 286633 = [−142633, 430633]
10-75. The 95% P.I. for Y given X = 2000: 160000 ± 317990 = [−157990, 477990]

10-76. Point prediction: ŷ = 16.0961 + 0.96809(5) = 20.9365
10-77.
a) Simple regression equation: Y = 2.779337X − 0.284157; when X = 10, Y = 27.5092
b) Forcing through the origin: Y = 2.741537X; when X = 10, Y = 27.41537
c) Forcing through (5, 13): Y = 2.825566X − 1.12783; when X = 10, Y = 27.12783
d) Forcing the slope to 2: Y = 2X + 4.236; when X = 10, Y = 24.236
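The three constrained fits have simple closed forms; a Python sketch on hypothetical data (the data and the fixed point/slope below are illustrative assumptions, not the problem's data):

```python
# Constrained least-squares fits: (b) through the origin,
# (c) through a fixed point, (d) with a fixed slope.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.9, 5.2, 8.1, 10.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Through the origin: minimize sum (y - b1 x)^2  =>  b1 = sum(xy) / sum(x^2)
b1_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Through a fixed point (x0, y0): fit the shifted data with no intercept
x0, y0 = 2.0, 5.0
b1_pt = (sum((xi - x0) * (yi - y0) for xi, yi in zip(x, y))
         / sum((xi - x0) ** 2 for xi in x))
b0_pt = y0 - b1_pt * x0          # the line passes through (x0, y0) exactly

# Fixed slope m: only the intercept is free  =>  b0 = ybar - m * xbar
m = 2.0
b0_slope = ybar - m * xbar
print(round(b1_origin, 3), round(b0_pt, 3), round(b0_slope, 3))
```

Each constraint removes one free parameter, so each fit reduces to a one-dimensional least-squares problem with its own closed-form solution.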


10-78. Using the Excel function TINV(x, df), where x = the p-value of 0.034 and df = 2058 − 2:
TINV(0.034, 2056) = 2.121487. Since the slope coefficient = −0.051, the t-value is negative: t = −2.121487.
a) Standard error of the slope: s(b1) = |b1/t| = 0.051/2.121487 = 0.02404
b) Using α = 0.05, we would reject the null hypothesis of no relationship between the response variable and the predictor, based on the reported p-value of 0.034.
10-79. Given the reported p-value, we would reject the null hypothesis of no relationship between
neuroticism and job performance. Given the reported coefficient of determination, 19% of the
variation in job performance can be explained by neuroticism.
10-80. The t-statistic for the reported information is:
t = b1/s(b1) = 0.233/0.055 = 4.236
Using Excel function, TDIST(t,df,#tails), we get a p-value of 0.000068:


TDIST(4.236, 70, 2) = 6.8112E-05. There is a linear relationship between frequency of online
shopping and the level of perceived risk.

10-81. (From Minitab)

The regression equation is
Stock Close = 67.6 + 0.407 Oper Income

Predictor   Coef      Stdev     t-ratio   p
Constant    67.62     12.32     5.49      0.000
Oper Inc    0.40725   0.03579   11.38     0.000

s = 9.633   R-sq = 89.0%   R-sq(adj) = 88.3%

Analysis of Variance
SOURCE       DF   SS      MS      F        p
Regression    1   12016   12016   129.49   0.000
Error        16    1485      93
Total        17   13500

Stock close based on an operating income of $305M is ŷ = $56.24.

(Minitab results for Log Y)

The regression equation is
Log_Stock Close = 2.32 + 0.00552 Oper Inc

Predictor   Coef        Stdev       t-ratio   p
Constant    2.3153      0.1077      21.50     0.000
Oper Inc    0.0055201   0.0003129   17.64     0.000

s = 0.08422   R-sq = 95.1%   R-sq(adj) = 94.8%

Analysis of Variance
SOURCE       DF   SS       MS       F        p
Regression    1   2.2077   2.2077   311.25   0.000
Error        16   0.1135   0.0071
Total        17   2.3212

Unusual Observations
Obs.   x     y        Fit      Stdev.Fit   Residual   St.Resid
1      240   3.8067   3.6401   0.0366      0.1666     2.20R

R denotes an obs. with a large st. resid.

Stock close based on an operating income of $305M is ŷ = $54.80.


The regression using the Log of monthly stock closings is a better fit. Operating Income explains
over 95% of the variation in the log of monthly stock closings versus 89% for non-transformed Y.
10-82. a) The calculated t-value for the slope coefficient is:
t = b1/s(b1) = 0.92/0.01 = 92.00
Using the Excel function TDIST(t, df, #tails), we get a p-value of approximately 0:
TDIST(92.0, 598, 2) = 0. There is a linear relationship.
b) The excess return would be 0.9592:
FER = 0.95 + 0.92(0.01) = 0.9592
10-83.
a) Adding 2 to all X values: new regression: Y = 5X − 3.
Since the intercept is b0 = Ȳ − b1X̄, the only thing that changes is that X̄ increases by 2. Therefore, the intercept changes by the change in X̄ times the slope: 7 − (2)(5) = −3.
b) Adding 2 to all Y values: new regression: Y = 5X + 9.
Using the formula for the intercept, only the value of Ȳ changes, by 2. Therefore, the intercept changes by 2.
c) Multiplying all X values by 2: new regression: Y = 2.5X + 7
d) Multiplying all Y values by 2: new regression: Y = 10X + 7
10-84. You are minimizing the squared deviations from the former x-values instead of the former y-values.
10-85.
a) Y = 3.820133X + 52.273036
b) 90% C.I. for the slope: 3.82013 ± 0.4531 = [3.36703, 4.27323]
c) r² = 0.9449, very high; F = 222.931 (p-value = 0.000): both indicate that X affects Y
d) Since the 99% C.I. for the slope, 3.82013 ± 0.77071, does not contain the value 0, the slope is not 0
e) Y = 90.47436 when X = 10
f) X = 12.49354
g) Residuals appear to be random; Durbin-Watson statistic: d = 2.56884
h) The normal probability plot appears to be a little flatter than normal

Case 13: Level of Leverage

a) Leverage = −0.118 − 0.040 (Rights)
b) Using the Excel function TDIST(t, df, #tails): TDIST(2.62, 1307, 2) = 0.0089. There is a linear relationship.
c) The reported coefficient of determination indicates that shareholder rights explain 16.5% of the variation in a firm's leverage.
Case 14: Risk and Return
1) Y = 1.166957X − 1.090724

2) The stock has above-average risk: b1 > 1.10
3) 95% C.I. for the slope: 1.16696 ± 0.37405
4) When X = 10, Y = 10.5788; 95% P.I.: 10.5788 ± 5.35692
5) Residuals appear random; Durbin-Watson statistic: d = 0.83996


6) The normal probability plot of residuals is a little flatter than normal.

7) Y = 1.157559X − 0.945353; when X = 6, Y = 6.0
Risk has dropped a little but it is still above average since b1 > 1.10


Chapter 11 - Multiple Regression

CHAPTER 11
MULTIPLE REGRESSION
(The template for this chapter is: Multiple Regression.xls.)
11-1.

The assumptions of the multiple regression model are that the errors are normally and independently distributed with mean zero and common variance σ². We also assume that the Xi are fixed quantities rather than random variables; at any rate, they are independent of the error terms. The assumption of normality of the errors is needed for conducting tests about the regression model.

11-2.

Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per
increase of 1 unit in promotional experiences.

11-3.

In a correlational analysis, we are interested in the relationships among the variables. On the
other hand, in a regression analysis with k independent variables, we are interested in the effects
of the k variables (considered fixed quantities) on the dependent variable only (and not on one
another).

11-4.

A response surface is a generalization to higher dimensions of the regression line of simple linear regression. For example, when 2 independent variables are used, each in the first order only, the response surface is a plane in 3-dimensional Euclidean space. When 7 independent variables are used, each in the first order, the response surface is a 7-dimensional hyperplane in 8-dimensional Euclidean space.

11-5.

8 equations.

11-6.

The least-squares estimators of the parameters of the multiple regression model, obtained as
solutions of the normal equations.

11-7. The normal equations are:

ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²

852 = 100b0 + 155b1 + 88b2
11,423 = 155b0 + 2,125b1 + 1,055b2
8,320 = 88b0 + 1,055b1 + 768b2

From the first equation: b0 = (852 − 155b1 − 88b2)/100. Substituting:

11,423 = 155(852 − 155b1 − 88b2)/100 + 2,125b1 + 1,055b2
8,320 = 88(852 − 155b1 − 88b2)/100 + 1,055b1 + 768b2
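Rather than substituting by hand, the three normal equations can be solved directly as a linear system; a Python sketch using NumPy:

```python
import numpy as np

# Solve the three normal equations of 11-7 as a 3x3 linear system.
A = np.array([[100.0,  155.0,   88.0],
              [155.0, 2125.0, 1055.0],
              [ 88.0, 1055.0,  768.0]])
rhs = np.array([852.0, 11423.0, 8320.0])
b0, b1, b2 = np.linalg.solve(A, rhs)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The coefficient matrix is symmetric because it is built from the sums ΣXi and ΣXiXj, which is always the case for least-squares normal equations.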


Continue solving the equations to obtain the solutions:

b0 = −1.1454469   b1 = 0.0487011   b2 = 10.897682

11-8. Using SYSTAT:
DEP VAR: VALUE   N: 9   MULTIPLE R: .909   SQUARED MULTIPLE R: .826
ADJUSTED SQUARED MULTIPLE R: .769   STANDARD ERROR OF ESTIMATE: 59.477

VARIABLE   COEFFICIENT   STD ERROR   STD COEF   TOLERANCE   T        P(2 TAIL)
CONSTANT   -9.800        80.763      0.000                  -0.121   0.907
SIZE       0.173         0.040       0.753      0.9614430   4.343    0.005
DISTANCE   31.094        14.132      0.382      0.9614430   2.200    0.070

ANALYSIS OF VARIANCE
SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO   P
REGRESSION   101032.867        2   50516.433     14.280    0.005
RESIDUAL     21225.133         6   3537.522

Multiple Regression Results (template):

            Intercept   Size      Distance
b           -9.7997     0.17331   31.094
s(b)        80.7627     0.0399    14.132
t           -0.1213     4.34343   2.2002
p-value     0.9074      0.0049    0.0701
VIF                     1.0401    1.0401

ANOVA Table
Source   SS        df   MS       F       FCritical   p-value
Regn.    101033     2   50516    14.28   5.1432      0.0052
Error    21225.1    6   3537.5
Total    122258     8

R² = 0.8264   Adjusted R² = 0.7685   s = 59.477


11-9.

With no advertising and no spending on in-store displays, sales are b0 = 47.165 (thousand) on average. For each unit (thousand) increase in advertising expenditure, keeping in-store display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand). Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising constant, there is an average increase in sales of b2 = 1.149 (thousand).

11-10. We test whether there is a linear relationship between Y and any of the Xi variables (that is, with at least one of the Xi). If the null hypothesis is not rejected, there is nothing more to do since there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further analyses to determine which of the variables have a linear relationship with Y and which do not, and we need to develop the regression model.
11-11. Degrees of freedom for error = n − 13.
11-12. k = 2   n = 82   SSE = 8,650   SSR = 988
MSR = SSR/k = 988/2 = 494
SST = SSR + SSE = 988 + 8,650 = 9,638
MSE = SSE/[n − (k + 1)] = 8,650/79 = 109.4937
F = MSR/MSE = 494/109.4937 = 4.5116
Using the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and the dfs refer to the degrees of freedom in the numerator and denominator, respectively:
FDIST(4.5116, 2, 79) = 0.013953
Yes, there is evidence of a linear regression relationship at α = 0.05, but not at α = 0.01.
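The same F test can be reproduced in Python, assuming SciPy is available (`stats.f.sf` gives the upper-tail probability, matching Excel's FDIST):

```python
from scipy import stats

# F test of 11-12: k = 2 predictors, n = 82 observations.
k, n = 2, 82
ssr, sse = 988.0, 8650.0
msr = ssr / k
mse = sse / (n - (k + 1))
f = msr / mse
p = stats.f.sf(f, k, n - (k + 1))   # upper-tail p-value, like FDIST
print(round(f, 4), round(p, 4))
```

The p-value of about 0.014 sits between 0.01 and 0.05, which is exactly why the conclusion differs at the two significance levels.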

11-13. F(4,40) = MSR/MSE = (7,768/4)/[(15,673 − 7,768)/40] = 1,942/197.625 = 9.827

Yes, there is evidence of a linear regression relationship between Y and at least one of the
independent variables.
11-14.
Source       SS        df   MS         F
Regression   7,474.0    3   2,491.33   48.16
Error        672.5     13   51.73
Total        8,146.5   16

Since the F-ratio is highly significant, there is evidence of a linear regression relationship
between overall appeal score and at least one of the three variables prestige, comfort, and
economy.
11-15. When the sample size is small, and when the degrees of freedom for error are relatively small, so that adding a variable (and thus losing a degree of freedom for error) is a substantial loss.


11-16. R 2 = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R 2 cannot
decrease.
11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the
presence of old ones. It does not apply in the case where all we consider is a single independent
variable.
11-18. By the definition of the adjusted coefficient of determination, Equation (11-13):

R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 1 − (SSE/SST)·(n − 1)/(n − (k + 1))

But SSE/SST = 1 − R², so the above is equal to:

1 − (1 − R²)(n − 1)/(n − (k + 1))

which is Equation (11-14).
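Equation (11-14) is easy to wrap as a small helper; the sketch below reproduces the 11-21 numbers (n = 17, k = 3), with `adjusted_r2` a hypothetical helper name:

```python
# Adjusted coefficient of determination, Equation (11-14).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Reproduces 11-21: R^2 = 0.9174 with n = 17 observations, k = 3 predictors.
print(round(adjusted_r2(0.9174, 17, 3), 4))
```

Because (n − 1)/(n − (k + 1)) > 1, the adjusted value is always below R², and the penalty grows with k, which is what makes it useful for comparing models with different numbers of variables.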

11-19. The mean square error gives a good indication of the variation of the errors in regression. However, other measures such as the coefficient of multiple determination and the adjusted coefficient of multiple determination are useful in evaluating the proportion of the variation in the dependent variable explained by the regression, thus giving us a more meaningful measure of the regression fit.
11-20. Given an adjusted R² = 0.021, only 2.1% of the variation in the stock return is explained by the four independent variables.
Using the Excel function FDIST(F, dfN, dfD) to return the p-value: FDIST(2.27, 4, 433) = 0.06093
There is evidence of a linear regression relationship at α = 0.10 only.

11-21. R² = 7,474.0/8,146.5 = 0.9174. A good regression.
R̄² = 1 − (1 − 0.9174)(16/13) = 0.8983
s = √MSE = √51.73 = 7.192

11-22. Given R² = 0.94, k = 2 and n = 383, the adjusted R² is:
R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.94)(382/380) = 0.9397
Therefore, security and time effects characterize 93.97% of the variation in market price. Given the value of the adjusted R², the model is a reliable predictor of market price.

11-23. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.918)(16/12) = 0.8907
Since R̄² has decreased, do not include the new variable.


11-24. Given R² = 0.769, k = 6 and n = 242:
R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.769)(241/235) = 0.7631
Since R̄² = 76.31%, approximately 76% of the variation in the information price is characterized by the 6 independent marketing variables.
Using the Excel function FDIST(F, dfN, dfD) to return the p-value: FDIST(44.8, 6, 235) = 2.48855E-36
There is evidence of a linear regression relationship at all α levels.
11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and stock price ranking as the two horizontal axes:
RETURN = 0.484 − 0.030(SIZRNK) − 0.017(PRCRNK)
The t-test for a linear relationship between returns and firm size ranking is highly significant, but not for returns against stock price ranking.
b. We know that R̄² = 0.093 and n = 50, k = 2. Using Equation (11-14), R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)), and solving for R²:
R² = 1 − (1 − R̄²)(n − (k + 1))/(n − 1) = 1 − (1 − 0.093)(47/49) = 0.130
Thus, 13% of the variation is due to the two independent variables.

c. The adjusted R 2 is quite low, indicating that the regression on both variables is not a good
model. They should try regressing on size alone.
11-26. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.72)(712/710) = 0.719
Based solely on this information, this is not a bad regression model.


11-27. k = 8   n = 500   SSE = 6,179   SST = 23,108

Source   SS       df    MS          F
Regn.    16929      8   2116.125    168.153
Error    6179     491   12.5845
Total    23108    499

Using the Excel function FDIST(F, dfN, dfD) to return the p-value: FDIST(168.153, 8, 491) = 0.00 approximately
There is evidence of a linear regression relationship at all α levels.
R² = SSR/SST = 0.7326
R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 0.7282
MSE = 12.5845

11-28. A joint confidence region for both parameters is a set of pairs of likely values of β1 and β2 at 95%. This region accounts for the mutual dependency of the estimators and hence is elliptical rather than rectangular. This is why the region may not contain a bivariate point included in the separate univariate confidence intervals for the two parameters.
11-29. Assuming a very large sample size, we use z = bi/s(bi) for testing the significance of each of the slope parameters, with α = 0.05. Critical value of |z| = 1.96.
For firm size: z = 0.06/0.005 = 12.00 (significant)
For firm profitability: z = −5.533 (significant)
For fixed-asset ratio: z = −0.08
For growth opportunities: z = −0.72
For nondebt tax shield: z = 4.29 (significant)
The slope estimates with respect to firm size, firm profitability and nondebt tax shield are not zero. The adjusted R-square indicates that 16.5% of the variation in governance level is explained by the five independent variables. Next step: exclude fixed-asset ratio and growth opportunities from the regression and see what happens to the adjusted R-square.
11-30. 1. The usual caution about the possibility of a Type I error.
2. Multicollinearity may make the tests unreliable.
3. Autocorrelation in the errors may make the tests unreliable.
11-31. 95% C.I.s for β2 through β5:
β2: 5.6 ± 1.96(1.3) = [3.052, 8.148]
β3: 10.35 ± 1.96(6.88) = [−3.135, 23.835]
β4: 3.45 ± 1.96(2.7) = [−1.842, 8.742]
β5: −4.25 ± 1.96(0.38) = [−4.995, −3.505]
The intervals for β3 and β4 contain the point 0.

11-32. Use z = bi/s(bi) for testing the significance of each of the slope parameters, with α = 0.05. Critical value of |z| = 1.96.

For unexpected accruals: z = −2.0775/0.4111 = −5.054 (significant)
For auditor quality: z = 0.5176
For return on investment: z = 1.7785
For expenditure on R&D: z = 2.1161 (significant)
The R-square indicates that 36.5% of the variation in a firm's reputation can be explained by the four independent variables listed.
11-33. Yes. Considering the joint confidence region for both slope parameters is equivalent to conducting an F test for the existence of a linear regression relationship. Since (0,0) is not in the joint 95% region, this is equivalent to rejecting the null hypothesis of the F test at α = 0.05.
11-34. Prestige is not significant (or at least appears so, pending further analysis). Comfort and
Economy are significant (Comfort only at the 0.05 level). The regression should be rerun with
variables deleted.
11-35. Variable Lend seems insignificant because of collinearity with M1 or Price.
11-36. a. As Price is dropped, Lend becomes significant: there is, apparently, a collinearity between
Lend and Price.
b.,c. The best model so far is the one in Table 11-9, with M1 and Price only. The adjusted R 2 for
that model is higher than for the other regressions.
d. For the model in this problem, MINITAB reports F = 114.09. Highly significant. For the
model in Table 11-9: F = 150.67. Highly significant.
e. s = 0.3697. For Problem 11-35: s = 0.3332. As a variable is deleted, s (and its square, MSE)
increases.
f. In Problem 11-35: MSE = s 2 = (0.3332)2 = 0.111.
11-37. Autocorrelation of the regression error may cause this.
11-38. Use z = bi/s(bi) for testing the significance of each of the slope parameters, with α = 0.05. Critical value of |z| = 1.96.
For new technological process: z = −0.014/0.004 = −3.50 (significant)
For organizational innovation: z = 0.25
For commercial innovation: z = 3.2 (significant)
For R&D: z = 4.50 (significant)
All but organizational innovation are important independent variables in explaining employment growth. The R-square indicates that 74.3% of the variation in employment growth is explained by the four independent variables in the equation.


11-39. Regress Profits on Employees and Revenues.

Data
Sl.No.   Profits   Employees   Revenues
  1       -1221      96400       17440
  2       -2808      63000       13724
  3        -773      70600       13303
  4         248      39100        9510
  5          38      37680        8870
  6        1461      31700        6846
  7         442      32847        5937
  8          14      12867        2445
  9          57      11475        2254
 10         108       6000        1311

Multiple Regression Results
           Intercept    Employees     Revenues
b          834.9510     0.0085493   -0.1741487
s(b)       621.1993     0.0644170    0.3409295
t            1.3441     0.1327      -0.5108
p-value      0.2208     0.8982       0.6252
VIF                    29.8304      29.8304

ANOVA Table
Source     SS             df    MS             F       F-Critical   p-value
Regn.      4507008.861     2    2253504.430    2.166   4.737        0.1852
Error      7281731.539     7    1040247.363
Total     11788740.4       9    1309860.044

s = 1019.925    R² = 0.3823    Adjusted R² = 0.2058

Correlation matrix
            Employees   Revenues
Employees    1.0000
Revenues     0.9831     1.0000
Profits     -0.5994    -0.6171

Regression Equation:
Profits = 834.95 + 0.009 Employees - 0.174 Revenues
The regression equation is not significant (F value), and there is a large amount of
multicollinearity present between the two independent variables (0.9831). There is so much
multicollinearity present that the negative partial correlations between the independent variables
and profits are not maintained in the regression results (both of the parameters of the independent
variables should be negative). None of the values of the parameters are significant.
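The sign flip described above can be reproduced from the problem's data; a sketch assuming NumPy is available:

```python
# Sketch reproducing the 11-39 results with NumPy: the near-perfect correlation
# between Employees and Revenues (r close to 0.983) lets one OLS coefficient
# come out positive even though both predictors correlate negatively with Profits.
import numpy as np

profits   = np.array([-1221, -2808, -773, 248, 38, 1461, 442, 14, 57, 108], float)
employees = np.array([96400, 63000, 70600, 39100, 37680, 31700, 32847,
                      12867, 11475, 6000], float)
revenues  = np.array([17440, 13724, 13303, 9510, 8870, 6846, 5937,
                      2445, 2254, 1311], float)

r12 = np.corrcoef(employees, revenues)[0, 1]            # predictor-predictor correlation
X = np.column_stack([np.ones_like(profits), employees, revenues])
b, *_ = np.linalg.lstsq(X, profits, rcond=None)         # OLS: intercept, b1, b2
print(round(r12, 4), np.round(b, 4))
```

The printed correlation and coefficients should match the template output quoted above.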

11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in
the model.


11-41.
a) residuals appear to be normally distributed
b) residuals are not normally distributed
11-42. An outlier is an observation far from the others.
11-43. A plot of the data or a plot of the residuals will reveal outliers. Also, most computer packages
(e.g., MINITAB) will automatically report all outliers and suspected outliers.
11-44. Outliers, unless they are due to errors in recording the data, may contain important information
about the process under study and should not be blindly discarded. The relationship of the true
data may well be nonlinear.
11-45. An outlier tends to tilt the regression surface toward it, because of the high influence of a large
squared deviation in the least-squares formula, thus creating a possible bias in the results.
11-46. An influential observation is one that exerts relatively strong influence on the regression surface.
For example, if all the data lie in one region in X-space and one observation lies far away in X, it
may exert strong influence on the estimates of the regression parameters.
11-47. This creates a bias. In any case, there is no reason to force the regression surface to go through
the origin.
11-48. The residual plot in Figure 11-16 exhibits strong heteroscedasticity.
11-49. The regression relationship may be quite different in a region where we have no observations
from what it is in the estimation-data region. Thus predicting outside the range of available data
may create large errors.
11-50. y = 47.165 + 1.599(8) + 1.149(12) = 73.745 (thousands), i.e., $73,745.
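The arithmetic in 11-50 can be checked directly from the fitted equation:

```python
# The 11-50 point prediction, computed from the fitted equation
# y-hat = 47.165 + 1.599*x1 + 1.149*x2 evaluated at x1 = 8, x2 = 12.

def predict(x1, x2):
    return 47.165 + 1.599 * x1 + 1.149 * x2

print(round(predict(8, 12), 3))   # 73.745 (thousands), i.e., $73,745
```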
11-51. In Problem 11-8: X2 (distance) is not a significant variable, but we use the complete original regression relationship given in that problem anyway (since this problem calls for it):
ŷ = -9.800 + 0.173X1 + 31.094X2
ŷ(1800, 2.0) = -9.800 + (0.173)1800 + (31.094)2.0 = 363.78

11-52. Using the regression coefficients reported in Problem 11-25:

ŷ = 0.484 - 0.030 Sizrnk - 0.017 Prcrnk = 0.484 - 0.030(5.0) - 0.017(6.0) = 0.232
11-53. Estimated SE( Y ) is obtained as:
(3.939 0.6846)/4 = 0.341.
Estimated SE(E(Y | x)) is obtained as: (3.939 0.1799)/4 = 0.085.


11-54. From MINITAB:

Fit: 73.742    StDev Fit: 2.765
95% C.I.: [67.203, 80.281]    95% P.I.: [65.793, 81.692]
(all numbers are in thousands)
11-55. The estimators are the same although their standard errors are different.
11-56. A prediction interval reflects more variation than a confidence interval for the conditional mean
of Y. The additional variation is the variation of the actual predicted value about the conditional
mean of Y (the estimator of which is itself a random variable).
11-57. This is a regression with one continuous variable and one dummy variable. Both variables are significant. Thus there are two distinct regression lines. The coefficient of determination is respectably high. During times of restricted trade with the Orient, the company sells 26,540 more units per month, on average.
11-58. Use the following formula for testing the significance of each of the slope parameters:

z = b_i / s(b_i),  with α = 0.05; critical value of |z| = 1.96

For the dummy variable: z = -0.003 / 0.29 = -0.0103, which is not significant. A firm's being regulated or not does not affect its leverage level.
11-59. Two-way ANOVA.
11-60. Use analysis of covariance. Run it as a regression; Length of Stay is the concomitant variable.
11-61. Early investment is not statistically significant (or may be collinear with another variable). Rerun the regression without it. The dummy variables are both significant; there is a distinct line (or plane, if you do include the insignificant variable) for each type of firm.
11-62. This is a second-order regression model in three independent variables with cross-terms.
11-63. The STEPWISE routine chooses Price and M1 * Price as the best set of explanatory variables.
This gives the estimated regression relationship:
Exports = 1.39 + 0.0229Price + 0.00248M1 * Price
The t-statistics are: 2.36, 4.57, 9.08, respectively. R 2 = 0.822.
11-64. The STEPWISE routine chooses the three original variables: Prod, Prom, and Book, with no
squares. Thus the original regression model of Example 11-3 is better than a model with squared
terms.


Example 11-3 with production costs squared: higher s than original model.

Multiple Regression Results
           Intercept     prod      promo      book     prod^2
b            7.04103   3.10543    2.2761    7.1125    -0.017
s(b)         5.82083   1.76478    0.262     1.9099     0.1135
t            1.20963   1.75967    8.6887    3.7241    -0.15
p-value      0.2451    0.0988     0.0000    0.0020     0.8827
VIF                   34.5783     1.7050    1.2454    32.3282

ANOVA Table
Source     SS         df     MS       F        F-Critical   p-value
Regn.      6325.48     4     1581.4   109.07   3.0556       0.0000
Error       217.472   15     14.498
Total      6542.95    19     344.37

s = 3.8076    R² = 0.9668    Adjusted R² = 0.9579

Example 11-3 with production and promotion costs squared: higher s and slightly higher R².

Multiple Regression Results
           Intercept     prod      promo      book     prod^2   promo^2
b            5.30825   4.29943    1.2803    6.7046   -0.0948    0.0731
s(b)         5.84748   1.95614    0.8094    1.8942    0.1262    0.0564
t            0.90778   2.19792    1.5817    3.5396   -0.7511    1.297
p-value      0.3794    0.0453     0.1360    0.0033    0.4651    0.2156
VIF                   44.4155    17.0182    1.2807   41.7465   16.2580

ANOVA Table
Source     SS         df     MS       F        F-Critical   p-value
Regn.      6348.81     5     1269.8   91.564   2.9582       0.0000
Error       194.145   14     13.867
Total      6542.95    19     344.37

s = 3.7239    R² = 0.9703    Adjusted R² = 0.9597

Example 11-3 with promotion costs squared: slightly lower s, slightly higher R².

Multiple Regression Results
           Intercept     prod      promo      book    promo^2
b            9.21031   2.86071    1.5635    7.0476    0.053
s(b)         2.64412   0.39039    0.7057    1.8114    0.0489
t            3.48332   7.3279     2.2157    3.8908    1.0844
p-value      0.0033    0.0000     0.0426    0.0014    0.2953
VIF                    1.8219    13.3224    1.2062   12.5901

ANOVA Table
Source     SS         df     MS       F        F-Critical   p-value
Regn.      6340.98     4     1585.2   117.74   3.0556       0.0000
Error       201.967   15     13.464
Total      6542.95    19     344.37

s = 3.6694    R² = 0.9691    Adjusted R² = 0.9609

11-65. Use the following formula for testing the significance of each of the slope parameters:

z = b_i / s(b_i),  with α = 0.05; critical value of |z| = 1.96

For After * Bankdep: z = -0.398 / 0.035 = -11.3714 (significant interaction)
For After * Bankdep * ROA: z = 2.7193 (significant interaction)
For After * ROA: z = -3.00 (significant interaction)
For Bankdep * ROA: z = -3.9178 (significant interaction)
An adjusted R-square of 0.53 indicates that 53% of the variation in bank equity is explained by the interactions among the independent variables.
11-66. The squared X1 variable and the cross-product term appear not significant. Drop the least significant term first, i.e., the squared X1, and rerun the regression. See what happens to the cross-product term now.
11-67. Try a quadratic regression (you should get a negative estimated x² coefficient).
11-68. Try a quadratic regression (you should get a positive estimated x² coefficient). Also try a cubic polynomial.
11-69. Linearizing a model; finding a more parsimonious model than is possible without a
transformation; stabilizing the variance.


11-70. A transformed model may be more parsimonious, when the model describes the process well.
11-71. Try the transformation logY.
11-72. A good model is log(Exports) versus log(M1) and log(Price). This model has R² = 0.8652, which implies a multiplicative relation.
11-73. A logarithmic model.
11-74. This dataset fits an exponential model, so use a logarithmic transformation to linearize it.
11-75. A multiplicative relation (Equation (11-26)) with multiplicative errors. The reported error term,
, is the logarithm of the multiplicative error term. The transformed error term is assumed to
satisfy the usual model assumptions.
11-76. An exponential model: Y = e^(β0 + β1x1 + β2x2) = e^(3.79 + 1.66x1 + 2.91x2)

11-77. No. We cannot find a transformation that will linearize this model.
11-78. Take logs of both sides of the equation, giving:
log Q = log β0 + β1 log C + β2 log K + β3 log L + log ε
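The log-linearization in 11-78 can be sketched numerically: simulate a multiplicative model, regress logs on logs, and recover the exponents. All parameter values below are made up for illustration.

```python
# Sketch of the 11-78 transformation: simulate Q = b0 * C^b1 * K^b2 * L^b3
# (illustrative parameters), then recover the exponents by OLS on the logged
# equation. Noiseless data is used so the recovery is exact.
import numpy as np

rng = np.random.default_rng(0)
n = 200
C, K, L = rng.uniform(1.0, 10.0, (3, n))
Q = 2.0 * C**0.3 * K**0.5 * L**0.2

X = np.column_stack([np.ones(n), np.log(C), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
print(np.round(coef, 3))   # intercept is log(2.0); slopes are 0.3, 0.5, 0.2
```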
11-79. Take reciprocals of both sides of the equation.
11-80. The square-root transformation Y′ = √Y.
11-81. No. They minimize the sum of the squared deviations relevant to the estimated, transformed
model.
11-82. It is possible that the relation between a firm's total assets and bank equity is not linear. Including the logarithm of a firm's total assets is an attempt to linearize that relationship.
11-83. Correlation matrix:

         Earn    Prod    Prom
Prod     .867
Prom     .882    .638
Book     .547    .402    .319

As evidenced by the relatively low correlations between the independent variables, multicollinearity does not seem to be serious here.


11-84. The VIFs are: 1.82, 1.70, 1.20. No severe multicollinearity is present.
11-85. The sample correlation is 0.740. VIF = 2.2, indicating a minor multicollinearity problem.
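With only one other regressor, the VIF reduces to 1 / (1 - r²), which reproduces the value quoted in 11-85:

```python
# VIF for a two-regressor model: VIF = 1 / (1 - r^2). With r = 0.740 this
# gives roughly 2.21, matching the VIF = 2.2 quoted above.

def vif_two_regressors(r):
    """Variance inflation factor given the correlation r with the other regressor."""
    return 1.0 / (1.0 - r**2)

print(round(vif_two_regressors(0.740), 2))   # → 2.21
```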
11-86.
a) Ŷ = 11.031 + 0.41869 X1 - 7.2579 X2 + 37.181 X3

Multiple Regression Results
           Intercept       X1         X2        X3
b            11.031     0.41869   -7.2579    37.181
s(b)         20.9905    0.28418    5.3287    26.545
t             0.52552   1.47334   -1.362      1.4007
p-value       0.6107    0.1714     0.2031     0.1916
VIF                     1.0561   557.7      557.9

ANOVA Table
Source     SS        df     MS       F        F-Critical   p-value
Regn.      2459.78    3     819.93   1.3709   3.7083       0.3074
Error      5981.02   10     598.1
Total      8440.8    13     649.29

s = 24.456    R² = 0.2914    Adjusted R² = 0.0788

b) Ŷ = 20.8808 + 0.29454 X1 + 16.583 X2 - 81.717 X3

Multiple Regression Results
           Intercept       X1         X2         X3
b            20.8808    0.29454   16.583     -81.717
s(b)         23.5983    0.29945   23.96      119.5
t             0.88484   0.98361    0.6921     -0.6838
p-value       0.3970    0.3485     0.5046      0.5096
VIF                     1.0262   9867.0     9867.4

ANOVA Table
Source     SS        df     MS       F        F-Critical   p-value
Regn.      1605.98    3     535.33   0.7832   3.7083       0.5300
Error      6834.82   10     683.48
Total      8440.8    13     649.29

s = 26.143    R² = 0.1903    Adjusted R² = -0.0527

c) All the parameters of the equation change values and some change signs. X2 and X3 are highly correlated. Solution: use either X2 or X3, but not both.
d) Yes, the correlation matrix indicated that X2 and X3 were correlated:

        X1        X2        X3
X1     1.0000
X2    -0.0137    1.0000
X3    -0.0237    0.9991    1.0000

11-87. Artificially high variances of regression coefficient estimators; unexpected magnitudes of some
coefficient estimates; sometimes wrong signs of these coefficients. Large changes in coefficient
estimates and standard errors as a variable or a data point is added or deleted.
11-88. Perfect collinearity exists when at least one variable is a linear combination of other variables. This causes the determinant of the X′X matrix to be zero, and thus the matrix is non-invertible. The estimation procedure breaks down in such cases. (Other, less technical, explanations based on the text will suffice.)
11-89. Not true. Predictions may be good when carried out within the same region of the
multicollinearity as used in the estimation procedure.
11-90. No. There are probably no relationships between Y and either of the two independent variables.
11-91. X 2 and X 3 are probably collinear.
11-92. Delete one of the variables X 2, X 3, X 4 to check for multicollinearity among a subset of these
three variables, or whether they are all insignificant.
11-93. Drop some of the other variables one at a time and see what happens to the suspected sign of the
estimate.
11-94. The purpose of the test is to check for a possible violation of the assumption that the regression
errors are uncorrelated with each other.
11-95. Autocorrelation is correlation of a variable with itself, lagged back in time. Third-order
autocorrelation is a correlation of a variable with itself lagged 3 periods back in time.
11-96. First-order autocorrelation is a correlation of a variable with itself lagged one period back in
time. Not necessarily: a partial fifth-order autocorrelation may exist without a first-order
autocorrelation.
11-97. 1) The test checks only for first-order autocorrelation. 2) The test may not be conclusive.
3) The usual limitations of a statistical test owing to the two possible types of errors.


11-98. DW = 0.93,  n = 21,  k = 2
dL = 1.13,  dU = 1.54,  4 - dL = 2.87,  4 - dU = 2.46
At the 0.10 level, there is some evidence of a positive first-order autocorrelation.
11-99. DW = 2.13,  n = 20,  k = 3
dL = 1.00,  dU = 1.68,  4 - dL = 3.00,  4 - dU = 2.32
At the 0.10 level, there is no evidence of a first-order autocorrelation.
(Template output: Durbin-Watson d = 2.125388)

11-100. DW = 1.79,  n = 10,  k = 2. Since the table does not list values for n = 10, we will use the closest table values, those for n = 15 and k = 2:
dL = 0.95,  dU = 1.54,  4 - dL = 3.05,  4 - dU = 2.46
At the 0.10 level, there is no evidence of a first-order autocorrelation. Note that the table values decrease as n decreases, and thus our conclusion would probably also hold if we knew the actual critical points for n = 10 and used them.
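The DW statistic tested in 11-98 through 11-100 is simple to compute from residuals; a minimal sketch (the residual series is made up for illustration):

```python
# A from-scratch Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# d near 2 suggests no first-order autocorrelation; d well below 2 suggests
# positive autocorrelation, d well above 2 suggests negative autocorrelation.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

residuals = [0.5, -0.3, 0.4, -0.6, 0.2, 0.1, -0.4, 0.3]   # illustrative values
print(round(durbin_watson(residuals), 3))
```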
11-101. Suppose that we have time-series data and that it is known that, if the data are autocorrelated, by the nature of the variables the correlation can only be positive. In such cases, where the hypothesis is made before looking at the actual data, a one-sided DW test may be appropriate. (And similarly for a negative autocorrelation.)
11-102. DW analysis of the results from Problem 11-39:
Durbin-Watson d = 1.552891, with k = 2 independent variables and n = 10.
Table 7 for the critical values of the DW statistic begins with sample size 15, which is a little larger than our sample. Using the values for size 15 as an approximation, we have, for α = 0.05, dL = 0.95 and dU = 1.54. The value of d is slightly larger than dU, indicating no autocorrelation.

Residual plot with Employees on the x-axis:

[Residual Plot: residuals (roughly -2000 to 2000) plotted against Employees (0 to 120,000).]

11-103. F(r, n-(k+1)) = [(SSE_R - SSE_F) / r] / MSE_F = [(6.996 - 6.9898) / 2] / 0.1127 = 0.0275

Cannot reject H0. The two variables should definitely be dropped; they add nothing to the model.
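The partial-F computation in 11-103 can be checked directly with the quoted numbers:

```python
# The partial-F test for dropping r variables from the full model:
# F = ((SSE_R - SSE_F) / r) / MSE_F, with the values quoted in 11-103.

def partial_f(sse_reduced, sse_full, r_dropped, mse_full):
    return ((sse_reduced - sse_full) / r_dropped) / mse_full

F = partial_f(6.996, 6.9898, 2, 0.1127)
print(round(F, 4))   # approximately 0.0275
```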
11-104. Ŷ = 47.16 + 1.599X1 + 1.149X2. The STEPWISE regression routine selects both variables for the equation. R² = 0.961.
11-105. The STEPWISE procedure selects all three variables. R 2 = 0.9667.
11-106. All-possible-regressions is the best procedure because it evaluates every possibility. It is expensive in computer time; however, as computing power and speed increase, it becomes a very viable option. Forward selection is limited by the fact that once a variable is in, there is no way it can come out once it becomes insignificant in the presence of new variables. Backward elimination is similarly limited. Stepwise regression is an excellent method that enjoys very wide use and that has stood the test of time. It has the advantages of both the forward and the backward methods, without their limitations.
11-107. Because a variable may lose explanatory power and become insignificant once other variables
are added to the model.
11-108. Highest adjusted R²; lowest MSE; highest R² for a given number of variables and the assessment of the increase in R² as we increase the number of variables; Mallows's Cp.
11-109. No. There may be several different best models. A model may be best using one criterion, and
not the best using another criterion.


11-110. Results will vary. Sample regression for Australia.
(Data source: Foreign Statistics / Handbook of International Economic Statistics / Tables)

Australia
Year    Real GDP   Defense Exp % GDP   Population   Grain Yields
1970      171            2.3              14.6         1,219
1980      238            2.7              17.0         1,052
1990      328            2.2              17.3         1,670
1992      330            2.3              17.5         1,800
1993      342            2.6              17.7         2,000
1994      359            2.5              17.9         1,230
1995      369            2.7              18.1         1,800
1996      382            2.6              18.3         2,090
1997      394            2.5              18.4         1,790

Multiple Regression Results
           Intercept   Defense Exp % GDP   Population   Grain Yields
b           -583.38        -64.709           58.04         0.035
s(b)         123.753        45.0667           8.8057       0.0246
t             -4.714        -1.4358           6.5912       1.4181
p-value       0.0053         0.2105           0.0012       0.2154
VIF                          1.3387           2.0253       1.6331

ANOVA Table
Source     SS        df     MS        F        F-Critical   p-value
Regn.      40654.3    3     13551     33.218   5.4094       0.0010
Error       2039.75   5       407.95
Total      42694      8      5336.8

s = 20.198    R² = 0.9522    Adjusted R² = 0.9236

Correlation matrix
               Defense   Population   Grain Yields
Defense         1.0000
Population      0.4444     1.0000
Grain Yields    0.0689     0.5850       1.0000
Real GDP        0.2573     0.9484       0.7023

Partial F Calculations (Australia)
Independent variables in full model: k = 3
Independent variables dropped from the model: r = 2
SSE_F = 2039.748    SSE_R = 39867.85
Partial F = 46.36369,  p-value = 0.0010

The model is significant, with a high R², a high F-value, and low multicollinearity.


11-111. Substitution of a variable with its logarithm transforms a non-linear model to a linear model. In
this case, the logarithm of size of fund has a linear relationship with the dependent variables.
11-112. Since the t-statistic for each variable alone is significant, and given the R-square, we can conclude that a good linear relation exists between the dependent and independent variables. Since the t-statistics of the cross products are not significant, there is no relation among the independent variables and the cross products. In conclusion, there is only a linear relationship among the dependent and independent variables.
11-113. Using MINITAB:

Regression Analysis: Com. Eff. versus Sincerity, Excitement, ...

The regression equation is
Com. Eff. = -36.5 + 0.098 Sincerity + 1.99 Excitement + 0.507 Ruggedness
            - 0.366 Sophistication

Predictor           Coef   SE Coef      T      P
Constant          -36.49     24.27  -1.50  0.171
Sincerity         0.0983    0.3021   0.33  0.753
Excitement        1.9859    0.2063   9.63  0.000
Ruggedness        0.5071    0.7540   0.67  0.520
Sophistication   -0.3664    0.3643  -1.01  0.344

S = 3.68895   R-Sq = 94.6%   R-Sq(adj) = 91.8%

Based on the p-values for the estimated coefficients, only the assessed excitement variable is
significant. The adjusted R-square indicates that 91.8% of the variation in commercial
effectiveness is explained by the model. The ANOVA test indicates that a linear relation exists
between the dependent and independent variables.


Analysis of Variance

Source          DF       SS      MS      F      P
Regression       4  1890.36  472.59  34.73  0.000
Residual Error   8   108.87   13.61
Total           12  1999.23

11-114. STEPWISE chooses only Number of Rooms and Assessed Value.
b0 = 91018,  b1 = 7844,  b2 = 0.2338,  R² = 0.591
11-115. Answers to this web exercise will vary with selected countries and date of access.
Case 15: Return on Capital for Four Different Sectors

Indicator variables used (Banking is the base sector):
Sector         I1   I2   I3
Banking         0    0    0
Computers       1    0    0
Construction    0    1    0
Energy          0    0    1

1. Multiple Regression Results

           Intercept     Sales      Oper M    Debt/C      I1        I2        I3
b            14.6209   2.30E-05    0.0824   -0.0919    10.051    2.8059   -1.6419
s(b)          2.51538  2.60E-05    0.0553    0.0444     2.0249   2.2756    1.8725
t             5.81259  0.88781     1.4905   -2.0692     4.9636   1.2331   -0.8769
p-value       0.0000   0.3770      0.1396    0.0414     0.0000   0.2208    0.3829
VIF                    1.2472      1.2212    1.6224     1.8560   1.8219    1.9096

Based on the regression coefficients of I1, I2, I3, the ranking of the sectors from highest return to lowest will be:
Computers, Construction, Banking, Energy

2. From the "Partial F" sheet, the p-value is almost zero. Hence the type of industry is significant.

3. 95% Prediction Intervals:

Sector          95% Prediction Interval
Banking         12.9576 ± 12.977
Computers       23.0082 ± 13.295
Construction    15.7635 ± 13.139
Energy          11.3157 ± 12.864


CHAPTER 12
TIME SERIES, FORECASTING, AND INDEX NUMBERS
12-1.

Trend analysis is a quick method of determining in which general direction the data are moving
through time. The method lacks, however, the theoretical justification of regression analysis
because of the inherent autocorrelations and the intended use of the method in extrapolation
beyond the estimation data set.

12-2. The trend regression is:
b0 = 28.7273,  b1 = -0.6947,  r² = 0.511
ŷ = 28.7273 - 0.6947 t
ŷ(Jul-2007) = 12.055% for t = 24

(Using the template: Trend Forecast.xls)

Forecasting with Trend
 t    Z-hat
24    12.0553
25    11.3607
26    10.666
27     9.9713
28     9.27668

Regression Statistics: r² = 0.5111, MSE = 22.24426, Slope = -0.69466, Intercept = 28.72727

Forecast for July, 2007 (t = 24) = 12.0553%


12-3. The trend regression is:
b0 = 34.818,  b1 = 12.566,  r² = 0.9858
ŷ = 34.818 + 12.566 t
ŷ(2008) = 198.182,  ŷ(2009) = 210.748

(Using the template: Trend Forecast.xls)


Forecasting with Trend

Data
Period    t    Zt
1996      1    53
1997      2    65
1998      3    74
1999      4    85
2000      5    92
2001      6   105
2002      7   120
2003      8   128
2004      9   144
2005     10   158
2006     11   179
2007     12   195

Forecast
 t    Z-hat
13    198.182
14    210.748
15    223.315
16    235.881
17    248.448
18    261.014
19    273.58
20    286.147
21    298.713
22    311.28
23    323.846
24    336.413

Regression Statistics: r² = 0.9858, MSE = 32.51189, Slope = 12.56643, Intercept = 34.81818

Forecast for 2008 (t = 13) = 198.182 and for 2009 (t = 14) = 210.748
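The trend fit behind 12-3 can be reproduced with an ordinary least-squares line of Z on t; a sketch assuming NumPy:

```python
# OLS trend fit for the 12-3 data: recovers the template's slope and intercept
# and the t = 13 (year 2008) forecast.
import numpy as np

t = np.arange(1, 13)
Z = np.array([53, 65, 74, 85, 92, 105, 120, 128, 144, 158, 179, 195], float)

slope, intercept = np.polyfit(t, Z, 1)       # degree-1 polynomial = linear trend
forecast_2008 = intercept + slope * 13
print(round(slope, 5), round(intercept, 5), round(forecast_2008, 3))
```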

12-4. The trend regression is:
b0 = -0.873,  b1 = 3.327,  r² = 0.8961
ŷ = -0.873 + 3.327 t
ŷ = 39.05% for t = 12

(Using the template: Trend Forecast.xls)

Forecasting with Trend
 t    Z-hat
12    39.0545
13    42.3818
14    45.7091
15    49.0364
16    52.3636

Regression Statistics: r² = 0.8961, MSE = 15.68081, Slope = 3.327273, Intercept = -0.87273

Forecast for next year (t = 12) = 39.05%


12-5.

No, because of the seasonality.

12-6.

No. Cycles are not well modeled by trend analysis.

12-7. The term "seasonal variation" is reserved for variation with a cycle of one year.

12-8.

There will be too few degrees of freedom for error.

12-9.

The weather, for one thing, changes from year to year. Thus sales of winter clothing, as an
example, would have a variable seasonal component.


12-10. Using MINITAB to conduct a multiple regression with a time variable and 11 dummy variables:

Regression Analysis: profit versus t, jan, ...

The regression equation is
profit = 0.163 + 0.0521 t + 0.123 jan + 0.121 feb + 0.319 mar + 0.567 apr
         + 0.615 may + 0.413 jun + 0.510 jul + 0.758 aug + 0.856 sep
         + 0.904 oct + 0.602 nov

Predictor      Coef    SE Coef      T      P
Constant     0.1625    0.3104   0.52  0.611
t           0.05208    0.01129  4.61  0.001
jan          0.1229    0.3543   0.35  0.735
feb          0.1208    0.3505   0.34  0.737
mar          0.3188    0.3470   0.92  0.378
apr          0.5667    0.3439   1.65  0.128
may          0.6146    0.3411   1.80  0.099
jun          0.4125    0.3387   1.22  0.249
jul          0.5104    0.3366   1.52  0.158
aug          0.7583    0.3349   2.26  0.045
sep          0.8563    0.3336   2.57  0.026
oct          0.9042    0.3326   2.72  0.020
nov          0.6021    0.3320   1.81  0.097

S = 0.331834   R-Sq = 83.2%   R-Sq(adj) = 64.8%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression      12  5.9783  0.4982  4.52  0.009
Residual Error  11  1.2112  0.1101
Total           23  7.1896

The adjusted R-square is reasonable. Setting t = 25, Jan = 1, and the rest of the months = 0, we get a forecasted value for Jan 2007 of about 1.588:

Predicted Values for New Observations
New Obs     Fit   SE Fit          95% CI            95% PI
      1  1.5875   0.3104  (0.9043, 2.2707)  (0.5874, 2.5876)

Values of Predictors for New Observations
New Obs     t    jan   (feb through nov all 0.000000)
      1  25.0   1.00

12-11. Using trend analysis:

The trend regression is:
b0 = 8165707,  b1 = 40169.72,  r² = 0.9715
ŷ = 8165707 + 40169.72 t
ŷ = 8728083 for t = 14

(Using the template: Trend Forecast.xls)

Forecasting with Trend
 t    Z-hat
14    8728083
15    8768252
16    8808422
17    8848592
18    8888761

Regression Statistics: r² = 0.9715, MSE = 7.82E+08, Slope = 40169.72, Intercept = 8165707

Forecast for next year (t = 14) = 8728083

12-12. Using a computer:

Trend line:  Zhat(t) = 7.2043 - 0.0194 t
C(t) = CMA / Zhat(t);  Ratio = 100 * Z(t) / CMA;  Deseasonalized = Z(t) / S%

 t   Mon.   Z(t)   Zhat(t)   CMA    C(t)    Ratio    S       Deseas.
 1   Jul    7.40    7.18                             95.68    7.73
 2   Aug    6.80    7.17                             92.25    7.37
 3   Sep    6.40    7.15                             90.57    7.07
 4   Oct    6.60    7.13                             97.57    6.76
 5   Nov    6.50    7.11                             95.96    6.77
 6   Dec    6.00    7.09                             92.22    6.51
 7   Jan    7.00    7.07     7.02   0.993   99.76   102.47    6.83
 8   Feb    6.70    7.05     7.01   0.995   95.54    98.21    6.82
 9   Mar    8.20    7.03     7.05   1.002  116.38   114.41    7.17
10   Apr    7.80    7.01     7.10   1.012  109.92   110.59    7.05
11   May    7.70    6.99     7.15   1.022  107.76   109.60    7.03
12   Jun    7.30    6.97     7.20   1.032  101.45   100.45    7.27
13   Jul    7.00    6.95     7.25   1.043   96.55    95.68    7.32
14   Aug    7.10    6.93     7.30   1.052   97.32    92.25    7.70
15   Sep    6.90    6.91     7.30   1.057   94.47    90.57    7.62
16   Oct    7.30    6.89     7.29   1.057  100.17    97.57    7.48
17   Nov    7.00    6.87     7.28   1.059   96.16    95.96    7.29
18   Dec    6.70    6.86     7.25   1.058   92.41    92.22    7.27
19   Jan    7.60    6.84     7.20   1.053  105.62   102.47    7.42
20   Feb    7.20    6.82     7.11   1.043  101.29    98.21    7.33
21   Mar    7.90    6.80     7.00   1.029  112.92   114.41    6.90
22   Apr    7.70    6.78     6.89   1.017  111.73   110.59    6.96
23   May    7.60    6.76     6.79   1.005  111.90   109.60    6.93
24   Jun    6.70    6.74     6.71   0.996   99.88   100.45    6.67
25   Jul    6.30    6.72     6.62   0.985   95.21    95.68    6.58
26   Aug    5.70    6.70     6.51   0.971   87.58    92.25    6.18
27   Sep    5.60    6.68     6.43   0.963   87.05    90.57    6.18
28   Oct    6.10    6.66     6.40   0.960   95.37    97.57    6.25
29   Nov    5.80    6.64                             95.96    6.04
30   Dec    5.90    6.62                             92.22    6.40
31   Jan    6.20    6.60                            102.47    6.05
32   Feb    6.00    6.58                             98.21    6.11
33   Mar    7.30    6.56                            114.41    6.38
34   Apr    7.40    6.54                            110.59    6.69

-----------FORECAST-----------
t = 35 (May): (Zhat = 6.525)(S = 109.60) / 100 = 7.15

Template forecast is 7.045.
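The ratio-to-moving-average step in 12-12 can be sketched in code, using the first 24 months of the series above: a centered (2x12) moving average, then 100 * Z(t) / CMA(t) as the raw seasonal ratio.

```python
# Centered moving average and ratio-to-moving-average, reproducing the first
# few ratios of 12-12 (99.76, 95.54, ...).

def centered_ma(z, period=12):
    """Centered moving average: average of two overlapping 12-term means."""
    half = period // 2
    out = [None] * len(z)
    for t in range(half, len(z) - half):
        w1 = sum(z[t - half:t + half]) / period
        w2 = sum(z[t - half + 1:t + half + 1]) / period
        out[t] = (w1 + w2) / 2
    return out

z = [7.4, 6.8, 6.4, 6.6, 6.5, 6.0, 7.0, 6.7, 8.2, 7.8, 7.7, 7.3,
     7.0, 7.1, 6.9, 7.3, 7.0, 6.7, 7.6, 7.2, 7.9, 7.7, 7.6, 6.7]
cma = centered_ma(z)
ratios = [100 * z[t] / cma[t] for t in range(len(z)) if cma[t] is not None]
print([round(r, 2) for r in ratios[:4]])   # → [99.76, 95.54, 116.38, 109.92]
```

Averaging these ratios by month (and normalizing to sum to 1200) yields the seasonal indices S shown in the table.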

12-13. (Using the template: Trend+Season Forecasting.xls, sheet: monthly)

Forecasting with Trend and Seasonality

 t   Year   Month     Y     Deseasonalized
 1   2004   11 Nov   0.38     0.40913
 2   2004   12 Dec   0.38     0.41684
 3   2005    1 Jan   0.44     0.45224
 4   2005    2 Feb   0.42     0.42406
 5   2005    3 Mar   0.44     0.48048
 6   2005    4 Apr   0.46     0.49272
 7   2005    5 May   0.48     0.45687
 8   2005    6 Jun   0.49     0.45687
 9   2005    7 Jul   0.51     0.4539
10   2005    8 Aug   0.52     0.44922
11   2005    9 Sep   0.45     0.44242
12   2005   10 Oct   0.4      0.43222
13   2005   11 Nov   0.39     0.4199
14   2005   12 Dec   0.37     0.40587
15   2006    1 Jan   0.38     0.39057
16   2006    2 Feb   0.37     0.37357
17   2006    3 Mar   0.33     0.36036
18   2006    4 Apr   0.33     0.35347
19   2006    5 May   0.32     0.30458
20   2006    6 Jun   0.32     0.29837
21   2006    7 Jul   0.32     0.2848
22   2006    8 Aug   0.31     0.26781

Trend Equation: Intercept = 0.518283, Slope = -0.00818

Forecasts
 t   Year   Month     Y
23   2006    9 Sep   0.33587
24   2006   10 Oct   0.29803
25   2006   11 Nov   0.29152
26   2006   12 Dec   0.27867

Forecast for Sep, 2006 = 0.33587

12-14. (Using the template: Trend+Season Forecasting.xls, sheet: monthly)

Forecasting with Trend and Seasonality

Forecast for Oct, 2006 = 28.73718

 t   Year   Month     Y    Deseasonalized
 1   2005    1 Jan   14      16.8856
 2   2005    2 Feb   10      22.2728
 3   2005    3 Mar   50      54.0922
 4   2005    4 Apr   24      24.6668
 5   2005    5 May   16      15.3033
 6   2005    6 Jun   15      15.8805
 7   2005    7 Jul   20      22.3533
 8   2005    8 Aug   42      22.5141
 9   2005    9 Sep   18      21.3884
10   2005   10 Oct   26      20.2627
11   2005   11 Nov   21      20.6647
12   2005   12 Dec   20      21.4286
13   2006    1 Jan   18      21.71
14   2006    2 Feb   10      22.2728
15   2006    3 Mar   22      23.8006
16   2006    4 Apr   24      24.6668
17   2006    5 May   26      24.8678
18   2006    6 Jun   24      25.4087
19   2006    7 Jul   18      20.1179
20   2006    8 Aug   58      31.0909
21   2006    9 Sep   40      47.5297

Trend Equation: Intercept = 22.54861, Slope = -0.00694

Forecasts
 t   Year   Month      Y
22   2006   10 Oct   28.73718
23   2006   11 Nov   22.75217
24   2006   12 Dec   20.88982
25   2007    1 Jan   18.55136

The forecast for October is considerably less than the actual percents recorded for August and
September. The forecast reflects the historical percentage of negative stories instead of the
recent past history.
12-15. (Using the template: Trend+Season Forecasting.xls)

Forecasting with Trend and Seasonality (quarterly)

 t   Year   Q     Y     Deseasonalized
 1   2005   1    3.4      3.869621
 2   2005   2    4.5      4.150717
 3   2005   3    4        4.258289
 4   2005   4    5        4.554288
 5   2006   1    4.2      4.78012
 6   2006   2    5.4      4.98086
 7   2006   3    4.9      5.216404
 8   2006   4    5.7      5.191888
 9   2007   1    4.6      5.23537

Forecasts
 t   Year   Q      Y
10   2007   2    6.20676
11   2007   3    5.56327
12   2007   4    6.71894

Seasonal Indices
 Q    Index
 1    87.86
 2   108.42
 3    93.93
 4   109.79
(sum = 400)

Forecast for Q2, 2007 = 6.20676

12-16. Assuming a weight of 0.4.
(Using the template: Exponential Smoothing.xls)

Exponential Smoothing:  MAE = 3.3688,  MAPE = 7.91%,  MSE = 18.2177

Period   Actual   Forecast
45        27      27.6959
46        26      27.4175
47        27      26.8505
48        28      26.9103
49                27.3462

Forecast for next quarter = 27.3462


12-17. Using a computer:  Zhat(1) = Z(1) = 57

w = 0.3:
Zhat( 2): 0.3(57.00) + 0.7(57.00) = 57.00
Zhat( 3): 0.3(58.00) + 0.7(57.00) = 57.30
Zhat( 4): 0.3(60.00) + 0.7(57.30) = 58.11
Zhat( 5): 0.3(54.00) + 0.7(58.11) = 56.88
Zhat( 6): 0.3(56.00) + 0.7(56.88) = 56.61
Zhat( 7): 0.3(53.00) + 0.7(56.61) = 55.53
Zhat( 8): 0.3(55.00) + 0.7(55.53) = 55.37
Zhat( 9): 0.3(59.00) + 0.7(55.37) = 56.46
Zhat(10): 0.3(62.00) + 0.7(56.46) = 58.12
Zhat(11): 0.3(57.00) + 0.7(58.12) = 57.79
Zhat(12): 0.3(50.00) + 0.7(57.79) = 55.45
Zhat(13): 0.3(48.00) + 0.7(55.45) = 53.21
Zhat(14): 0.3(52.00) + 0.7(53.21) = 52.85
Zhat(15): 0.3(55.00) + 0.7(52.85) = 53.50
Zhat(16): 0.3(58.00) + 0.7(53.50) = 54.85
Zhat(17): 0.3(61.00) + 0.7(54.85) = 56.69

w = 0.8:
Zhat( 2): 0.8(57.00) + 0.2(57.00) = 57.00
Zhat( 3): 0.8(58.00) + 0.2(57.00) = 57.80
Zhat( 4): 0.8(60.00) + 0.2(57.80) = 59.56
Zhat( 5): 0.8(54.00) + 0.2(59.56) = 55.11
Zhat( 6): 0.8(56.00) + 0.2(55.11) = 55.82
Zhat( 7): 0.8(53.00) + 0.2(55.82) = 53.56
Zhat( 8): 0.8(55.00) + 0.2(53.56) = 54.71
Zhat( 9): 0.8(59.00) + 0.2(54.71) = 58.14
Zhat(10): 0.8(62.00) + 0.2(58.14) = 61.23
Zhat(11): 0.8(57.00) + 0.2(61.23) = 57.85
Zhat(12): 0.8(50.00) + 0.2(57.85) = 51.57
Zhat(13): 0.8(48.00) + 0.2(51.57) = 48.71
Zhat(14): 0.8(52.00) + 0.2(48.71) = 51.34
Zhat(15): 0.8(55.00) + 0.2(51.34) = 54.27
Zhat(16): 0.8(58.00) + 0.2(54.27) = 57.25
Zhat(17): 0.8(61.00) + 0.2(57.25) = 60.25

The w = .8 forecasts follow the raw data much more closely. This makes sense because the raw
data jump back and forth fairly abruptly, so we need a high w for the forecasts to respond to
those oscillations sooner.
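The recursion used in 12-17, F(t+1) = w*Z(t) + (1-w)*F(t) with F(1) = Z(1), can be sketched as:

```python
# Simple exponential smoothing; reproduces the final w = 0.3 and w = 0.8
# forecasts computed above for the 12-17 data.

def exponential_smoothing(z, w):
    """Return one-step-ahead forecasts F(1), ..., F(n+1)."""
    forecasts = [z[0]]                             # F(1) = Z(1)
    for obs in z:
        forecasts.append(w * obs + (1 - w) * forecasts[-1])
    return forecasts

z = [57, 58, 60, 54, 56, 53, 55, 59, 62, 57, 50, 48, 52, 55, 58, 61]
f03 = exponential_smoothing(z, 0.3)
f08 = exponential_smoothing(z, 0.8)
print(round(f03[-1], 2), round(f08[-1], 2))   # → 56.69 60.25
```

A larger w weights recent observations more heavily, which is why the w = 0.8 forecasts track the jumpy raw series more closely.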


12-18. Using a computer:

w = 0.7

Zhat(1) = Z(1) = 195

Zhat( 2): 0.7(195.00) + 0.3(195.00) = 195.00
Zhat( 3): 0.7(193.00) + 0.3(195.00) = 193.60
Zhat( 4): 0.7(190.00) + 0.3(193.60) = 191.08
Zhat( 5): 0.7(185.00) + 0.3(191.08) = 186.82
Zhat( 6): 0.7(180.00) + 0.3(186.82) = 182.05
Zhat( 7): 0.7(190.00) + 0.3(182.05) = 187.61
Zhat( 8): 0.7(185.00) + 0.3(187.61) = 185.78
Zhat( 9): 0.7(186.00) + 0.3(185.78) = 185.94
Zhat(10): 0.7(184.00) + 0.3(185.94) = 184.58
Zhat(11): 0.7(185.00) + 0.3(184.58) = 184.87
Zhat(12): 0.7(198.00) + 0.3(184.87) = 194.06
Zhat(13): 0.7(199.00) + 0.3(194.06) = 197.52
Zhat(14): 0.7(200.00) + 0.3(197.52) = 199.26
Zhat(15): 0.7(201.00) + 0.3(199.26) = 200.48
Zhat(16): 0.7(199.00) + 0.3(200.48) = 199.44
Zhat(17): 0.7(187.00) + 0.3(199.44) = 190.73
Zhat(18): 0.7(186.00) + 0.3(190.73) = 187.42
Zhat(19): 0.7(191.00) + 0.3(187.42) = 189.93
Zhat(20): 0.7(195.00) + 0.3(189.93) = 193.48
Zhat(21): 0.7(200.00) + 0.3(193.48) = 198.04
Zhat(22): 0.7(200.00) + 0.3(198.04) = 199.41
Zhat(23): 0.7(190.00) + 0.3(199.41) = 192.82
Zhat(24): 0.7(186.00) + 0.3(192.82) = 188.05
Zhat(25): 0.7(196.00) + 0.3(188.05) = 193.61
Zhat(26): 0.7(198.00) + 0.3(193.61) = 196.68
Zhat(27): 0.7(200.00) + 0.3(196.68) = 199.01
---------------FORECAST-------------
Zhat(28): 0.7(200.00) + 0.3(199.01) = 199.70


Exponential Smoothing
MAE      MAPE     MSE
4.8241   2.52%    34.8155

w = 0.7
  t     Zt    Forecast    |Error|    %Error    Error^2
  1    195    195
  2    193    195
  3    190    193.6       3.6        1.89%      12.96
  4    185    191.08      6.08       3.29%      36.9664
  5    180    186.824     6.824      3.79%      46.567
  6    190    182.047     7.9528     4.19%      63.247
  7    185    187.614     2.61416    1.41%       6.83383
  8    186    185.784     0.21575    0.12%       0.04655
  9    184    185.935     1.93527    1.05%       3.74529
 10    185    184.581     0.41942    0.23%       0.17591
 11    198    184.874     13.1258    6.63%     172.287
 12    199    194.062     4.93775    2.48%      24.3814
 13    200    197.519     2.48132    1.24%       6.15697
 14    201    199.256     1.7444     0.87%       3.04292
 15    199    200.477     1.47668    0.74%       2.18059
 16    187    199.443     12.443     6.65%     154.828
 17    186    190.733     4.7329     2.54%      22.4004
 18    191    187.42      3.58013    1.87%      12.8173
 19    195    189.926     5.07404    2.60%      25.7459
 20    200    193.478     6.52221    3.26%      42.5392
 21    200    198.043     1.95666    0.98%       3.82853
 22    190    199.413     9.413      4.95%      88.6046
 23    186    192.824     6.8239     3.67%      46.5656
 24    196    188.047     7.95283    4.06%      63.2475
 25    198    193.614     4.38585    2.22%      19.2357
 26    200    196.684     3.31575    1.66%      10.9942
 27    200    199.005     0.99473    0.50%       0.98948
 28           199.702
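The forecast column and the MAE can be reproduced directly. This is a sketch of the recursion; note that the template reports errors only from period 3 onward, which the script mirrors.

```python
# w = 0.7 smoothing of the 12-18 series, with MAE over the errors the
# template reports (periods 3 through 27).
w = 0.7
z = [195, 193, 190, 185, 180, 190, 185, 186, 184, 185, 198, 199, 200,
     201, 199, 187, 186, 191, 195, 200, 200, 190, 186, 196, 198, 200, 200]

forecasts = [z[0]]                         # F(1) = Z(1) = 195
for obs in z:
    forecasts.append(w * obs + (1 - w) * forecasts[-1])
# forecasts[-1] is F(28), the next-quarter forecast

errors = [abs(z[t] - forecasts[t]) for t in range(2, 27)]  # periods 3..27
mae = sum(errors) / len(errors)

print(round(forecasts[-1], 2), round(mae, 4))
```

This reproduces the next-quarter forecast of 199.70 and an MAE of about 4.8241, matching the template summary.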


12-19. Assuming a weight of 0.9
(Using the template: Exponential Smoothing.xls)

Exponential Smoothing
w = 0.9
  t       Zt         Forecast
  1    2565942       2565942
  2    2724292       2565942
  3    3235231       2708457
  4    3863508       3182554
  5    4819747       3795413
  6    5371689       4717314
  7    6119114       5306251
  8                  6037828

Forecast for 2007 = 6037828
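The same recursion, sketched in Python for the seven annual observations:

```python
# w = 0.9 smoothing of the annual series in 12-19.
w = 0.9
data = [2565942, 2724292, 3235231, 3863508, 4819747, 5371689, 6119114]

f = data[0]                    # F(1) = Z(1)
for z in data:
    f = w * z + (1 - w) * f    # F(t+1) = w*Z(t) + (1 - w)*F(t)

print(round(f))  # forecast for 2007
```

The result rounds to 6037828, the forecast for 2007 shown above.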


12-20. Answers will vary.
12-21. Equation (12-11):

	Zhat(t+1) = wZ(t) + w(1-w)Z(t-1) + w(1-w)^2 Z(t-2) + w(1-w)^3 Z(t-3) + ...

The same equation for Zhat(t) (shifting all subscripts back by 1):

	Zhat(t) = wZ(t-1) + w(1-w)Z(t-2) + w(1-w)^2 Z(t-3) + w(1-w)^3 Z(t-4) + ...

Now multiplying this second equation throughout by (1-w) gives:

	(1-w)Zhat(t) = w(1-w)Z(t-1) + w(1-w)^2 Z(t-2) + w(1-w)^3 Z(t-3) + w(1-w)^4 Z(t-4) + ...

Now note that all the terms on the right side of the equation above are identical to all the terms in
Equation (12-11) after the term wZ(t). Hence we can substitute in Equation (12-11) the
left-hand side of our last equation, (1-w)Zhat(t), for all the terms past the first. This gives us:

	Zhat(t+1) = wZ(t) + (1-w)Zhat(t)

which is Equation (12-12).
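The substitution in 12-21 can also be checked numerically: on a finite series, the recursive form (12-12) and the expanded weighted-sum form (12-11), with the leftover tail of weights collapsing onto the first observation, give identical forecasts. A sketch:

```python
# Check that the recursive form (12-12) matches the expanded weighted-sum
# form (12-11) on a finite series.
def recursive(series, w):
    f = series[0]                       # F(1) = Z(1)
    for z in series:
        f = w * z + (1 - w) * f         # Equation (12-12)
    return f

def weighted_sum(series, w):
    t = len(series)
    # w*Z(t) + w(1-w)Z(t-1) + ... + w(1-w)^(t-2)Z(2), Equation (12-11),
    # plus the remaining weight (1-w)^(t-1) on the seed Z(1)
    total = sum(w * (1 - w) ** k * series[t - 1 - k] for k in range(t - 1))
    return total + (1 - w) ** (t - 1) * series[0]

data = [57, 58, 60, 54, 56, 53, 55, 59, 62, 57]
for w in (0.3, 0.7):
    assert abs(recursive(data, w) - weighted_sum(data, w)) < 1e-9
print("forms agree")
```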


12-22. Equation (12-13) is:

	Zhat(t+1) = Z(t) + (1-w)(Zhat(t) - Z(t))

Multiplying out we get:

	Zhat(t+1) = Z(t) + (1-w)Zhat(t) - (1-w)Z(t) = Z(t) - (1-w)Z(t) + (1-w)Zhat(t)
	          = wZ(t) + (1-w)Zhat(t),

which is Equation (12-12).


12-23. Simply divide each CPI by 289.1/100 = 2.891; thus:

year    old CPI    new CPI
1950     72.1       24.9
1951     77.8       26.9
1952     79.5       27.5
1953     80.1       27.7
 ...      ...        ...
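The rebasing can be sketched in a line of Python, using the divisor 2.891 from the solution:

```python
# Rebase the CPI series of 12-23: divide each old value by 289.1/100 = 2.891.
old_cpi = {1950: 72.1, 1951: 77.8, 1952: 79.5, 1953: 80.1}
new_cpi = {year: round(v / 2.891, 1) for year, v in old_cpi.items()}
print(new_cpi)
```

This reproduces the new-base values 24.9, 26.9, 27.5, and 27.7 in the table above.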

12-24. 168.77 in July 2000 and 173.48 in June 2001.


12-25. A simple price index reflects changes in a single price variable over time, relative to a single
base time period.
12-26. Index numbers are used as deflators for comparing values and prices over time in a way that
prevents a given inflationary factor from affecting comparisons. They are also used to provide an
aggregate measure of changes over time in several related variables.
12-27. a. 1988, the year in which the index equals 100 (1993 index / 1988 index = 163/100).
b. Just divide each index number by 1.63 (i.e., by 163/100).
c. It fell, from 145% of the 1988 output down to 133% of that output.
d. Big increase in the mid-80s, then a sharp drop in 1986, tumbling for three more years,
then slowly climbing back up until 1995, then a drop-off.

a)
Price Index, Base Year 1988 (index for base year = 100)
Year    Price    Index
1984     175      175
1985     190      190
1986     132      132
1987      96       96
1988     100      100
1989      78       78
1990     131      131
1991     135      135
1992     154      154
1993     163      163
1994     178      178
1995     170      170
1996     145      145
1997     133      133


c)
Price Index, Base Year 1993 (price for base year = 163)
Year    Price    Index
1984     175     107.36
1985     190     116.56
1986     132      80.982
1987      96      58.896
1988     100      61.35
1989      78      47.853
1990     131      80.368
1991     135      82.822
1992     154      94.479
1993     163     100
1994     178     109.2
1995     170     104.29
1996     145      88.957
1997     133      81.595

12-28. Divide each data point by (Jan. 2004 value)/100 = 1.44; thus:

Jun. 03: 98.6      Jul. 03: 95.14

12-29. Since a yearly cycle has 12 months and there are only 18 data points, a seasonal/cyclical
decomposition isn't feasible. Simple linear regression, with the successive months numbered
1, 2, ..., gives SALES = 4.23987 - 0.03870 MONTH; thus for July 2004 (month #19), the forecast is
3.5046.
(Using the template: Trend Forecast.xls)


Forecasting with Trend

Data
Period    t     Zt
jan       1    4.4
feb       2    4.2
mar       3    3.8
apr       4    4.1
may       5    4.1
jun       6    4
jul       7    4
aug       8    3.9
sep       9    3.9
oct      10    3.8
nov      11    3.7
dec      12    3.7
jan      13    3.8
feb      14    3.9
mar      15    3.8
apr      16    3.7
may      17    3.5
jun      18    3.4

Forecast
 t     Z-hat
19     3.50458
20     3.46588
21     3.42718

The forecast of sales for July, 2004 is 3.5 million units.

Regression Statistics
r^2          0.7285
MSE          0.016906
Slope       -0.0387
Intercept    4.239869
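A direct least-squares fit reproduces the template's coefficients. This is a sketch of the computation, not the template itself:

```python
# Least-squares trend line for the 18 monthly sales figures in 12-29.
sales = [4.4, 4.2, 3.8, 4.1, 4.1, 4.0, 4.0, 3.9, 3.9,
         3.8, 3.7, 3.7, 3.8, 3.9, 3.8, 3.7, 3.5, 3.4]
n = len(sales)
t = list(range(1, n + 1))
t_bar = sum(t) / n
z_bar = sum(sales) / n

# slope = Sxy / Sxx; intercept = z_bar - slope * t_bar
slope = (sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, sales))
         / sum((ti - t_bar) ** 2 for ti in t))
intercept = z_bar - slope * t_bar

forecast_19 = intercept + slope * 19   # month #19 = July 2004
print(round(slope, 4), round(intercept, 4), round(forecast_19, 4))
```

The slope and intercept round to -0.0387 and 4.2399, and the month-19 forecast to about 3.5046, matching the regression statistics above.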

12-30. Trend analysis is a quick, if sometimes inaccurate, method that can give good results. The
additive and multiplicative TSCI models are sometimes useful, although they lack a firm
theoretical framework. Exponential smoothing methods are good models. The ones described in
this book do not handle seasonality, but extensions are possible. This author believes that
Box-Jenkins ARIMA models are the way to go. One limitation of these models is the need for
large data sets.

12-31. Exponential smoothing models smooth out sharp variations in the data and produce forecasts
that follow a type of average movement in the data. The greater the weighting factor w, the more
closely the exponential smoothing series follows the data, and the forecasts tend to track the
variations in the data more closely.


12-32. Using MINITAB: Stat > Time Series > Moving Average

Moving Average for Data
Moving Average Length

Accuracy Measures
MAPE    1.69534
MAD     1.75000
MSD     3.66964

Forecasts
Period    Forecast    Lower      Upper
13        103.375     99.6204    107.130

Forecast for next period = 103.375


12-33. Assuming a weight of 0.4
Use the template: Exponential Smothing.xls
w
t
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

0.4
Zt
18
17
15
14
15
11
8
5
4
3
5
4
6
5
7
8

Forecast
18
18
17.6
16.56
15.536
15.3216
13.593
11.3558
8.81347
6.88808
5.33285
5.19971
4.71983
5.2319
5.13914
5.88348
6.73009

y(2007) = 6.73009


12-34. a) Raised the seasonal index for April to 99.38 from 99.29. We would expect to see the April
index change by a significant amount; the reason it does not is the averaging involved in the
moving-average calculations.
b) Raised the seasonal index for April to 122.27 from 99.29.
c) Raised the seasonal index for December to 100.16 from 100.09. We would expect the
December index to change by a significant amount; it does not, again because of the
moving-average calculations.
d) Very high or low values for data points at the beginning or end of a series have little impact
on the seasonal index due to their limited influence in the moving-average computations.
12-35. (Using the template: Trend Forecast.xls)

Forecasting with Trend
Data
Period    t     Zt
1998      1    6.3
1999      2    6.6
2000      3    7.3
2001      4    7.4
2002      5    7.8
2003      6    6.9
2004      7    7.8

Forecast
 t    Z-hat
 8    7.95714
 9    8.15714
10    8.35714

Forecast for 2005 = 7.957

Regression Statistics
r^2          0.5552
MSE          0.179429
Slope        0.2
Intercept    6.357143
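The same least-squares fit, sketched for the seven annual observations:

```python
# Least-squares trend line for the annual series in 12-35.
z = [6.3, 6.6, 7.3, 7.4, 7.8, 6.9, 7.8]
n = len(z)
t_bar = (n + 1) / 2                 # mean of t = 1..7
z_bar = sum(z) / n

sxy = sum((i + 1 - t_bar) * (v - z_bar) for i, v in enumerate(z))
sxx = sum((i + 1 - t_bar) ** 2 for i in range(n))
slope = sxy / sxx
intercept = z_bar - slope * t_bar

forecast_2005 = intercept + slope * 8   # t = 8 corresponds to 2005
print(round(slope, 3), round(forecast_2005, 3))
```

The slope comes out exactly 0.2 and the 2005 forecast rounds to 7.957, as in the template output.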

12-36. Answers will vary.


Case 16: Auto Parts Sales Forecast

1)
Forecasts
 t    Year    Q    Y
17    2002    1    $85,455,550.30
18    2002    2    $108,706,616.14
19    2002    3    $97,706,824.92
20    2002    4    $105,724,455.54


Using Excel's regression tool with the Centered Moving Average (col. G of the template) as our
Y and the values under t (col. B of the template) as our X, we get the following supporting detail
for the Trend + Seasonal model:

Regression Statistics
Multiple R           0.89727
R Square             0.805093
Adjusted R Square    0.785602
Standard Error       1.558112
Observations         12

             Coefficients    Standard Error    t Stat      P-value
Intercept    152.2638        1.195366          127.3785    2.18E-17
time         -0.83741        0.130296          -6.42701    7.57E-05

(Note: the coefficient values are identical to those generated by the template.)

ANOVA
              df    SS          MS          F           Significance F
Regression     1    100.2802    100.2802    41.30642    7.57E-05
Residual      10    24.27713    2.427713
Total         11    124.5573

2) Multiple Regression Equation:

Y = -2693200091 - 8445234.547 M2 + 82447357.24 NF - 3768891 Oil Price

Multiple Regression Results
             0               1               2                          3
             Intercept       M2 Index        Non Farm Activity Index    Oil Price
b            -2693200091     -8445234.547    82447357.24                -3768891
s(b)         1096606287      101021547.4     38350031.1                 1263314.066
t            -2.455940771    -0.083598349    2.149864156                -2.983336528
p-value      0.0303          0.9348          0.0527                     0.0114

ANOVA Table
Source    SS             df    MS             F            FCritical    p-value
Regn.     3.77493E+15     3    1.25831E+15    18.243631    3.4902996    0.0001
Error     8.2767E+14     12    6.89725E+13
Total     4.6026E+15     15    3.0684E+14

R^2 = 0.8202    Adjusted R^2 = 0.77521656    s = 8304970.102

3) Forecasted values using the regression model:

Quarter    Forecast
2002/Q1    $81,337,085.11
2002/Q2    $55,574,874.53
2002/Q3    $60,903,732.58
2002/Q4    $59,868,829.41
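Applying the part-2 equation is a matter of plugging in predictor values. A sketch, with made-up inputs (the m2, nf, and oil values below are hypothetical, not taken from the case data):

```python
# Apply the fitted part-2 regression equation to one set of predictor values.
# The inputs m2=2.5, nf=34.7, oil=17.0 are hypothetical illustration values.
def predict_sales(m2, nf, oil):
    return (-2693200091
            - 8445234.547 * m2       # M2 Index
            + 82447357.24 * nf       # Non Farm Activity Index
            - 3768891 * oil)         # Oil Price

y = predict_sales(m2=2.5, nf=34.7, oil=17.0)
print(round(y, 2))
```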

4. Add the new data:

   Y        X1       X2            X3             X4      X5   X6   X7
Sales      Ones   M2 Index   Non Farm Act. Idx  Oil Price  Q2   Q3   Q4
35452300     1    2.356464        34.2            19.15     0    0    0
41469361     1    2.357643        34.27           16.46     1    0    0
40981634     1    2.364126        34.3            18.83     0    1    0
42777164     1    2.379493        34.33           19.75     0    0    1
43491652     1    2.373544        34.4            18.53     0    0    0
57669446     1    2.387192        34.33           17.61     1    0    0
59476149     1    2.403903        34.37           17.95     0    1    0
76908559     1    2.42073         34.43           15.84     0    0    1
63103070     1    2.431623        34.37           14.28     0    0    0
84457560     1    2.441958        34.5            13.02     1    0    0
67990330     1    2.447452        34.5            15.89     0    1    0
68542620     1    2.445616        34.53           16.91     0    0    1
73457391     1    2.45601         34.6            16.29     0    0    0
89124339     1    2.48364         34.7            17        1    0    0
85891854     1    2.532692        34.67           18.2      0    1    0
69574971     1    2.564984        34.73           17        0    0    1

Multiple Regression Results
           0               1               2                  3               4            5            6
           Intercept       M2 Index        Non Farm           Oil Price       Q2           Q3           Q4
                                           Activity Index
b          -2655354679     -12780153.29    81566233.8         -3827527.175    5802059      7127252.8    3211850.1
s(b)       1219227600      118142020       43101535.65        1534592.501     6575281.3    6653402.9    6716387.2
t          -2.177899088    -0.108176187    1.892420596        -2.494165177    0.8824047    1.0712192    0.478211
p-value    0.0574          0.9162          0.0910             0.0342          0.4005       0.3120       0.6439
VIF                        9.9367          9.0506             1.4058          1.6411       1.6803       1.7123

ANOVA Table
Source    SS             df    MS             F            FCritical    p-value
Regn.     3.89129E+15     6    6.48548E+14    8.2058616    3.3737564    0.0031
Error     7.11312E+14     9    7.90347E+13
Total     4.6026E+15     15    3.0684E+14

R^2 = 0.8455    Adjusted R^2 = 0.742423692    s = 8890145.586

Regression Equation:
Sales = -2655354679 - 12780153.29 M2 + 81566233.8 NFAI - 3827527.175 Oil P + 5802059 Q2
+ 7127252.8 Q3 + 3211850.1 Q4
5. Forecast for next four quarters:

Quarter    Sales
02 Q1      76344324
02 Q2      56495768
02 Q3      62878143
02 Q4      57771809

6. Partial F-test:
H0: β4 = β5 = β6 = 0
H1: not all are zero
(Remember, to drop the three indicator variables, they must be the last three independent
variables in the data sheet of the template.)

Partial F Calculations
# Independent variables in full model            6
# Independent variables dropped from the model   3
SSEF         7.11E+14
SSER         8.28E+14
Partial F    0.490747
p-value      0.6973

The p-value = 0.6973 is very high. Do not reject the null hypothesis: the indicator variables are
not significant.
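The partial F statistic in the table can be checked by hand. A sketch using the SSE values from the two ANOVA tables, with n = 16 observations and k = 6 predictors in the full model:

```python
# Partial F statistic for dropping the r = 3 indicator variables:
# F = ((SSER - SSEF) / r) / (SSEF / (n - k - 1))
sse_full = 7.11312e14      # SSE of the full model (part 4)
sse_reduced = 8.2767e14    # SSE of the reduced model (part 2)
r = 3                      # variables dropped
df_error_full = 9          # n - k - 1 = 16 - 6 - 1

partial_f = ((sse_reduced - sse_full) / r) / (sse_full / df_error_full)
print(round(partial_f, 4))
```

The statistic comes out about 0.4907, matching the template's Partial F of 0.490747.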
7. Comparing the three model forecasts:
It would be ideal to have the values for 2004 to compare the forecasts to the actual values.
However, these values are not available. The next step is to compare the three models on
R^2, F, and the standard error of the model.

Model               R^2      F         Std. error
Trend + Seasonal    0.805    41.306    1.558
MR (part 2)         0.820    18.244    8,304,970.1
MR (part 4)         0.846    8.206     8,890,145.6

Clearly, the best model is the Trend + Seasonal model, with the smallest standard error and the
highest F-value. The only significant independent variable in the multiple regression models is
oil price, and a regression of sales on oil price alone yields an R^2 of 0.33 and a very high
standard error.
