
Model Selection: MWD Test

Problem 11.17
a.

Estimate a log-linear model using the given data. You can use any combination of the independent
variables you choose. I estimated a regression with ln(Q) = f(ln(disposable income), ln(price of
chicken), ln(composite price index)). [Note: this is not the best model.] Be sure to store the
residuals; this also stores the predicted Y (Y-hat).
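For readers working outside Excel, the step above can be sketched in Python. This is a minimal illustration, not the worksheet's own method: the data below are synthetic placeholders (the problem's data set is not reproduced here), and the names `ols`, `income`, `p_chick`, `p_comp`, and `q` are assumptions made for the example.

```python
import numpy as np

def ols(y, X):
    """Least-squares fit of y on X; an intercept column is added here.

    Returns the coefficients, the fitted values, and the residuals.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    return beta, fitted, y - fitted

# Hypothetical data standing in for the 23 observations of the problem.
rng = np.random.default_rng(0)
n = 23
income = np.linspace(400.0, 2500.0, n)
p_chick = rng.uniform(35.0, 60.0, n)
p_comp = rng.uniform(40.0, 110.0, n)
q = 8.0 * income**0.45 * p_chick**-0.3 * np.exp(rng.normal(0.0, 0.03, n))

# Log-linear model: regress ln(Q) on the logs of the regressors.
ln_X = np.log(np.column_stack([income, p_chick, p_comp]))
beta_log, ln_q_hat, resid_log = ols(np.log(q), ln_X)

print("coefficients:", np.round(beta_log, 3))
print("first fitted ln(Q):", round(float(ln_q_hat[0]), 3))
```

Keeping `ln_q_hat` and `resid_log` around mirrors the instruction to store the residuals and predicted values, since both are needed later in the test.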

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.990
R Square              0.980
Adjusted R Square     0.977
Standard Error        0.028
Observations             23

ANOVA
              df       SS       MS         F   Significance F
Regression     3    0.759    0.253   315.206            0.000
Residual      19    0.015    0.001
Total         22    0.775

                      Coefficients   Standard Error   t Stat   P-value
Intercept                    2.030            0.119   17.103     0.000
ln(disposable inc)           0.481            0.068    7.058     0.000
ln(Pchick)                  -0.351            0.079   -4.416     0.000
ln(Pcomp index)             -0.061            0.130   -0.470     0.644

Here are the predicted values of ln(Y) and the residuals (e) from the log-linear regression.

RESIDUAL OUTPUT

Observation   Predicted ln(Q(Chick))   Residuals
          1                    3.343      -0.018
          2                    3.396       0.002
          3                    3.405      -0.011
          4                    3.432      -0.005
          5                    3.487      -0.046
          6                    3.509      -0.003
          7                    3.524       0.048
          8                    3.589       0.005
          9                    3.614      -0.011
         10                    3.630       0.018
         11                    3.670       0.029
         12                    3.706      -0.010
         13                    3.738      -0.005
         14                    3.645       0.054
         15                    3.711      -0.005
         16                    3.703      -0.011
         17                    3.774      -0.020
         18                    3.821      -0.034
         19                    3.819       0.025
         20                    3.874       0.050
         21                    3.942      -0.028
         22                    3.958      -0.013
         23                    3.980      -0.011

b.

Now estimate a linear regression using the same collection of independent variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.959
R Square              0.920
Adjusted R Square     0.907
Standard Error        2.246
Observations             23

ANOVA
              df         SS        MS        F   Significance F
Regression     3   1100.112   366.704   72.716            0.000
Residual      19     95.816     5.043
Total         22   1195.929

                      Coefficients   Standard Error   t Stat   P-value
Intercept                   32.587            3.974    8.200     0.000
Disposable inc               0.010            0.004    2.283     0.034
Pchick                      -0.296            0.131   -2.257     0.036
Pcomp index                  0.106            0.072    1.466     0.159

Here are the predicted values of Y and the residuals (e) from the linear regression.

RESIDUAL OUTPUT
Observation   Predicted Q(Chick)   Residuals
          1               30.835      -3.035
          2               32.318      -2.418
          3               32.010      -2.210
          4               32.633      -1.833
          5               33.509      -2.309
          6               34.131      -0.831
          7               34.364       1.236
          8               35.520       0.880
          9               35.838       0.862
         10               36.524       1.876
         11               38.360       2.040
         12               38.344       1.956
         13               40.190       1.610
         14               38.052       2.348
         15               40.058       0.642
         16               40.915      -0.815
         17               43.058      -0.358
         18               43.669       0.431
         19               42.754       3.946
         20               47.491       3.109
         21               52.710      -2.610
         22               53.639      -1.939
         23               55.476      -2.576

c.

To choose between the two functional forms we use the MWD (MacKinnon-White-Davidson) test.

H0: Linear model: Y is a linear function of the Xs

H1: Log-linear model: ln(Y) is a linear function of the Xs or the ln(X)s.


Take the natural log of Y-hat from the linear regression; that is, take the natural log of the predicted Y in the
table above. [Note: this is not the same as the predicted(lnY) from the first regression.] You would get the
following values:
ln(Y-hat)
3.429
3.476
3.466
3.485
3.512
3.530
3.537
3.570
3.579
3.598
3.647
3.647
3.694
3.639
3.690
3.711
3.763
3.777
3.755
3.861
3.965
3.982
4.016

We use these values to create the variable Z1, the difference between ln(Y-hat) from the linear regression
and predicted(lnY) from the log-linear regression:

Z1 = ln(Y-hat) - predicted(lnY)

0.09
0.08
0.06
0.05
0.02
0.02
0.01
-0.02
-0.04
-0.03
-0.02
-0.06
-0.04
-0.01
-0.02
0.01
-0.01
-0.04
-0.06
-0.01
0.02
0.02
0.04
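
The ln(Y-hat) and Z1 columns can be reproduced from the two residual tables. The sketch below copies the predicted values printed earlier in this solution; because those values are rounded to three decimals, the results should only be expected to match the listed Z1 values to about two decimals. The variable names are illustrative.

```python
import math

# Predicted Q from the linear regression (second residual table).
q_hat = [30.835, 32.318, 32.010, 32.633, 33.509, 34.131, 34.364, 35.520,
         35.838, 36.524, 38.360, 38.344, 40.190, 38.052, 40.058, 40.915,
         43.058, 43.669, 42.754, 47.491, 52.710, 53.639, 55.476]

# Predicted ln(Q) from the log-linear regression (first residual table).
ln_q_fit = [3.343, 3.396, 3.405, 3.432, 3.487, 3.509, 3.524, 3.589,
            3.614, 3.630, 3.670, 3.706, 3.738, 3.645, 3.711, 3.703,
            3.774, 3.821, 3.819, 3.874, 3.942, 3.958, 3.980]

# ln(Y-hat): log of the linear model's fitted values.
ln_q_hat = [math.log(v) for v in q_hat]

# Z1 = ln(Y-hat) - predicted(lnY)
z1 = [a - b for a, b in zip(ln_q_hat, ln_q_fit)]

for i, (a, z) in enumerate(zip(ln_q_hat, z1), start=1):
    print(f"{i:2d}  ln(Y-hat) = {a:.3f}  Z1 = {z:+.2f}")
```
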
Regress Y on the Xs and Z1 to get the following results:
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.991
R Square              0.981
Adjusted R Square     0.977
Standard Error        1.120
Observations             23

ANOVA
              df         SS        MS         F   Significance F
Regression     4   1173.361   293.340   233.967            0.000
Residual      18     22.568     1.254
Total         22   1195.929

                                  Coefficients   Standard Error   t Stat   P-value
Intercept                               30.766            1.996   15.416     0.000
Disposable inc                           0.008            0.002    3.994     0.001
Pchick                                  -0.187            0.067   -2.796     0.012
Pcomp index                              0.087            0.036    2.404     0.027
Z1 = ln(Y-hat) - predicted(lnY)        -45.125            5.904   -7.643     0.000

Reject H0 if Z1 is statistically significant. In this case Z1 is statistically significant (t = -7.643,
p = 0.000), so we reject H0 and conclude that H1, the log-linear model, is the better fit.
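
The decision on Z1 is an ordinary t test on one coefficient. As a small check using the estimates reported above, the sketch below recomputes the t statistic and compares it with the two-sided 5% critical value for 18 degrees of freedom (2.101 from standard t tables). The 5% level is an assumption of this example; the solution itself does not state a significance level.

```python
# Reported estimates for Z1 in the regression of Y on the Xs and Z1.
coef_z1 = -45.125
se_z1 = 5.904
df_resid = 18  # 23 observations minus 5 estimated coefficients

t_stat = coef_z1 / se_z1

# Two-sided 5% critical value of the t distribution with 18 df
# (taken from standard tables; the 5% level is assumed here).
T_CRIT_5PCT = 2.101

reject_linear = abs(t_stat) > T_CRIT_5PCT
print(f"t = {t_stat:.3f}, df = {df_resid}, reject H0 (linear model): {reject_linear}")
```
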
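Now calculate Z2 = antilog(predicted(lnY)) - Y-hat, where predicted(lnY) comes from the log-linear regression
and Y-hat from the linear regression, and regress lnY on the natural logs of the Xs and the new variable Z2.
[Note: the antilog is taken of predicted(lnY), not of ln(Y-hat). The antilog of ln(Y-hat) is just Y-hat, which
would make Z2 zero apart from rounding error.]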
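
As an illustration of the standard MWD definition, Z2 = antilog(predicted(lnY)) - Y-hat, the sketch below builds Z2 from the predicted values printed in the two residual tables. Those inputs are rounded to three decimals, so the Z2 values are approximate, and the variable names are illustrative.

```python
import math

# Predicted Q from the linear regression (second residual table).
q_hat = [30.835, 32.318, 32.010, 32.633, 33.509, 34.131, 34.364, 35.520,
         35.838, 36.524, 38.360, 38.344, 40.190, 38.052, 40.058, 40.915,
         43.058, 43.669, 42.754, 47.491, 52.710, 53.639, 55.476]

# Predicted ln(Q) from the log-linear regression (first residual table).
ln_q_fit = [3.343, 3.396, 3.405, 3.432, 3.487, 3.509, 3.524, 3.589,
            3.614, 3.630, 3.670, 3.706, 3.738, 3.645, 3.711, 3.703,
            3.774, 3.821, 3.819, 3.874, 3.942, 3.958, 3.980]

# Z2 = antilog(predicted(lnY)) - Y-hat:
# exponentiate the log-linear fit, then subtract the linear fit.
z2 = [math.exp(b) - a for a, b in zip(q_hat, ln_q_fit)]

for i, z in enumerate(z2, start=1):
    print(f"{i:2d}  Z2 = {z:+.3f}")
```

Computed this way, Z2 is measured in the units of Y and is clearly different from zero for most observations.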
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.991
R Square              0.982
Adjusted R Square     0.978
Standard Error        0.028
Observations             23

ANOVA
              df       SS       MS         F   Significance F
Regression     4    0.761    0.190   242.066            0.000
Residual      18    0.014    0.001
Total         22    0.775

                         Coefficients    Standard Error   t Stat   P-value
Intercept                       2.015             0.118   17.067     0.000
ln(disposable inc)              0.454             0.071    6.393     0.000
ln(Pchick)                     -0.292             0.093   -3.157     0.005
ln(Pcomp index)                -0.066             0.129   -0.515     0.613
Z2                    38073961326.163   31880130123.800   -1.194     0.248

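Here we find that the variable Z2 is not statistically significant (p = 0.248), so we do not reject the
log-linear model. Together with the rejection of the linear model above, we conclude that the log-linear
model is the appropriate model.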
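
The whole procedure can be collected into one function. The sketch below is a minimal version under stated assumptions: it uses NumPy only, takes logs of every regressor in the log-linear model, and runs on synthetic data generated from a log-linear relationship. The names `ols_t` and `mwd_test` and the data are hypothetical and are not part of the original problem.

```python
import numpy as np

def ols_t(y, X):
    """OLS with an intercept; returns fitted values and t statistics."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    resid = y - fitted
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    return fitted, beta / np.sqrt(np.diag(cov))

def mwd_test(y, X):
    """Return (t on Z1 in the linear model, t on Z2 in the log-linear model)."""
    y_hat, _ = ols_t(y, X)                    # linear model
    ln_fit, _ = ols_t(np.log(y), np.log(X))   # log-linear model
    if np.any(y_hat <= 0):
        raise ValueError("linear fitted values must be positive to take logs")
    z1 = np.log(y_hat) - ln_fit               # Z1 = ln(Y-hat) - predicted(lnY)
    z2 = np.exp(ln_fit) - y_hat               # Z2 = antilog(predicted(lnY)) - Y-hat
    _, t_lin = ols_t(y, np.column_stack([X, z1]))
    _, t_log = ols_t(np.log(y), np.column_stack([np.log(X), z2]))
    return float(t_lin[-1]), float(t_log[-1])

# Synthetic data (assumption): 23 observations from a log-linear relationship.
rng = np.random.default_rng(1)
n = 23
X = np.column_stack([np.linspace(5.0, 15.0, n), rng.uniform(2.0, 4.0, n)])
y = 3.0 * X[:, 0]**0.8 * X[:, 1]**-0.4 * np.exp(rng.normal(0.0, 0.02, n))

t_z1, t_z2 = mwd_test(y, X)
print(f"t on Z1 (tests linear H0):     {t_z1:.3f}")
print(f"t on Z2 (tests log-linear H0): {t_z2:.3f}")
```

Each t statistic is then compared with the critical value for n minus the number of estimated coefficients degrees of freedom: a significant Z1 counts against the linear model, and a significant Z2 counts against the log-linear model.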
