
Week 6: Guided solutions for Wooldridge exercises

P1.1 It does not make sense to pose the question in terms of causality. Economists would assume
that students choose a mix of studying and working (and other activities, such as attending class,
leisure, and sleeping) based on rational behavior, such as maximizing utility subject to the
constraint that there are only 168 hours in a week. We can then use statistical methods to
measure the association between studying and working, including regression analysis that we
cover starting in Chapter 2. But we would not be claiming that one variable causes the other.
They are both choice variables of the student.
P1.2 (i) Ideally, we could randomly assign students to classes of different sizes. That is, each
student is assigned a different class size without regard to any student characteristics such as
ability and family background. For reasons we will see in Chapter 2, we would like substantial
variation in class sizes (subject, of course, to ethical considerations and resource constraints).
(ii) A negative correlation means that larger class size is associated with lower performance.
We might find a negative correlation because larger class size actually hurts performance.
However, with observational data, there are other reasons we might find a negative relationship.
For example, children from more affluent families might be more likely to attend schools with
smaller class sizes, and affluent children generally score better on standardized tests. Another
possibility is that, within a school, a principal might assign the better students to smaller classes.
Or, some parents might insist their children are in the smaller classes, and these same parents
tend to be more involved in their children's education.
(iii) Given the potential for confounding factors, some of which are listed in (ii), finding a
negative correlation would not be strong evidence that smaller class sizes actually lead to better
performance. Some way of controlling for the confounding factors is needed, and this is the
subject of multiple regression analysis.
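To make the confounding argument in (ii) and (iii) concrete, here is a hypothetical simulation in Python. All numbers are invented for illustration and do not come from any data set: test scores depend only on a made-up family-affluence index, and class size has no causal effect at all. When affluent families end up in smaller classes, class size and scores are still clearly negatively correlated; when class size is assigned at random, as in (i), the correlation is close to zero.

```python
import random


def corr(a, b):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


random.seed(42)
n = 10_000

obs_size, obs_score = [], []  # observational: affluence drives class size
rct_size, rct_score = [], []  # experiment: class size assigned at random

for _ in range(n):
    affluence = random.gauss(0, 1)  # hypothetical family-background index
    # Score depends on affluence only: class size has NO causal effect here
    score = 70 + 5 * affluence + random.gauss(0, 5)

    # Observational world: more affluent families end up in smaller classes
    obs_size.append(25 - 3 * affluence + random.gauss(0, 2))
    obs_score.append(score)

    # Random assignment ignores student characteristics (similar spread of sizes)
    rct_size.append(25 + random.gauss(0, 3.6))
    rct_score.append(score)

print(f"observational corr(size, score):  {corr(obs_size, obs_score):+.2f}")
print(f"random-assignment corr:           {corr(rct_size, rct_score):+.2f}")
```

The negative observational correlation here is produced entirely by the omitted affluence variable, which is exactly why (iii) calls for controlling for confounders rather than reading the correlation causally.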
C1.1 (i) The average of educ is about 12.6 years. There are two people reporting zero years of
education, and 19 people reporting 18 years of education.
In EViews, open the variable educ and select View => Descriptive Statistics & Tests =>
Stats Table.
(ii) The average of wage is about $5.90, which seems low in 2005.
In EViews, open the variable wage and select View => Descriptive Statistics & Tests =>
Stats Table.
(iii) Using Table B-60 in the 2004 Economic Report of the President, the CPI was 56.9 in
1976 and 184.0 in 2003.

(iv) To convert 1976 dollars into 2003 dollars, we use the ratio of the CPIs, which is
184/56.9 ≈ 3.23. Therefore, the average hourly wage in 2003 dollars is roughly
3.23($5.90) ≈ $19.06, which is a reasonable figure.
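The CPI conversion in (iv) is a single ratio, sketched below in Python with the CPI values and average wage quoted above. Using the unrounded ratio (about 3.2337) gives roughly $19.08; the $19.06 in the text comes from rounding the ratio to 3.23 before multiplying.

```python
# CPI figures as quoted above (2004 Economic Report of the President, Table B-60)
cpi_1976 = 56.9
cpi_2003 = 184.0
avg_wage_1976 = 5.90  # sample average hourly wage, in 1976 dollars

ratio = cpi_2003 / cpi_1976        # 2003 dollars per 1976 dollar
wage_2003 = avg_wage_1976 * ratio  # same wage expressed in 2003 dollars
# The hand calculation in the text rounds the ratio to two decimals first
wage_2003_rounded_ratio = avg_wage_1976 * round(ratio, 2)

print(f"CPI ratio:                    {ratio:.4f}")
print(f"wage in 2003 $ (exact ratio): {wage_2003:.2f}")
print(f"wage in 2003 $ (ratio 3.23):  {wage_2003_rounded_ratio:.2f}")
```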
(v) The sample contains 252 women (the number of observations with female = 1) and 274
men.
In EViews, open the variable female and select View => One-Way Tabulation. Press OK with the
default configuration.
C1.2 (i) There are 1,388 women in the sample. To make a frequency table in EViews, open cigs,
select View => One-Way Tabulation, and press OK with the default configuration.

Tabulation of CIGS
Number of categories: 18

                               Cumulative   Cumulative
Value     Count     Percent    Count        Percent
0         1176      84.73      1176         84.73
1         3         0.22       1179         84.94
2         4         0.29       1183         85.23
3         7         0.50       1190         85.73
4         9         0.65       1199         86.38
5         19        1.37       1218         87.75
6         6         0.43       1224         88.18
7         4         0.29       1228         88.47
8         5         0.36       1233         88.83
9         1         0.07       1234         88.90
10        55        3.96       1289         92.87
12        5         0.36       1294         93.23
15        19        1.37       1313         94.60
20        62        4.47       1375         99.06
30        5         0.36       1380         99.42
40        6         0.43       1386         99.86
46        1         0.07       1387         99.93
50        1         0.07       1388         100.00
Total     1388      100.00     1388         100.00

There are 1,388 observations in the sample, and 1,176 have cigs = 0. So 1,388 - 1,176 = 212
women have cigs > 0, i.e., they smoked during pregnancy.
(ii) Open cigs. Again, in View select Descriptive Statistics & Tests => Stats Table.
The average of cigs is about 2.09, but this includes
the 1,176 women who did not smoke.
Reporting just the average masks the fact that almost
85 percent of the women did not smoke. It makes more
sense to say that the typical woman does not smoke
during pregnancy; indeed, the median number of
cigarettes smoked is zero.

              CIGS
Mean          2.087176
Median        0.000000
Maximum       50.00000
Minimum       0.000000
Std. Dev.     5.972688
Skewness      3.560448
Kurtosis      17.93397

Jarque-Bera   15830.76
Probability   0.000000

Sum           2897.000
Sum Sq. Dev.  49478.45

Observations  1388

(iii) Open cigs. Then click Sample and type cigs > 0 as the IF condition.
Again, in View select Descriptive Statistics & Tests => Stats Table.


The average of cigs for women with cigs > 0 is about 13.7. Of course, this is much higher than
the average over the entire sample because we are excluding all of the non-smokers (1,176
zeros). This is an example of a conditional average, because we are conditioning on smoking.
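The figures in (i) through (iii) can be cross-checked from the frequency table alone. The Python sketch below expands the tabulation of cigs back into one value per woman and recomputes the count of smokers, the mean, median and standard deviation, and the conditional mean for cigs > 0. The counts are transcribed from the tabulation; everything else is plain standard-library code.

```python
from statistics import mean, median, stdev

# Frequency table of cigs (value: count), transcribed from the EViews tabulation
cigs_counts = {
    0: 1176, 1: 3, 2: 4, 3: 7, 4: 9, 5: 19, 6: 6, 7: 4, 8: 5, 9: 1,
    10: 55, 12: 5, 15: 19, 20: 62, 30: 5, 40: 6, 46: 1, 50: 1,
}

# Expand the table back into one observation per woman
cigs = [value for value, count in cigs_counts.items() for _ in range(count)]
smokers = [c for c in cigs if c > 0]  # the cigs > 0 subsample used in (iii)

print("observations:", len(cigs), " sum:", sum(cigs))
print("smokers:", len(smokers))
print("mean:", round(mean(cigs), 6), " median:", median(cigs))
print("std. dev. (n - 1 divisor):", round(stdev(cigs), 6))
print("mean given cigs > 0:", round(mean(smokers), 1))
```

The unconditional mean is pulled toward zero by the 1,176 non-smokers, while the conditional mean over the 212 smokers is more than six times larger.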

(iv) First remove the cigs > 0 sample restriction: open Sample and delete the IF condition.
Then open fatheduc and in View select Descriptive Statistics & Tests => Stats Table. The
average of fatheduc is about 13.2.
To count the missing values, in View select One-Way Tabulation and set the options so that NAs
are included in the table. There are 196 observations with a missing value for fatheduc, leaving
us only 1,192 observations that can be used to compute the average.
Tabulation of FATHEDUC
Date: 01/30/10   Time: 10:33
Sample: 1 1388
Included observations: 1388
Number of categories: 19

                               Cumulative   Cumulative
Value     Count     Percent    Count        Percent
NA        196       14.12      196          14.12
1         1         0.07       197          14.19
2         2         0.14       199          14.34
3         4         0.29       203          14.63
4         3         0.22       206          14.84
5         4         0.29       210          15.13
6         10        0.72       220          15.85
7         10        0.72       230          16.57
8         22        1.59       252          18.16
9         17        1.22       269          19.38
10        49        3.53       318          22.91
11        64        4.61       382          27.52
12        443       31.92      825          59.44
13        87        6.27       912          65.71
14        115       8.29       1027         73.99
15        43        3.10       1070         77.09
16        189       13.62      1259         90.71
17        32        2.31       1291         93.01
18        97        6.99       1388         100.00
Total     1388      100.00     1388         100.00
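The point about missing values can be checked with the tabulation above. The Python sketch below recomputes the average of fatheduc using only the 1,192 non-missing observations; as a contrast, it also shows the wrong figure you would get by dividing by all 1,388 observations, which implicitly treats every NA as zero years of schooling.

```python
# Tabulation of fatheduc (years of schooling: count), excluding the NA row
fatheduc_counts = {
    1: 1, 2: 2, 3: 4, 4: 3, 5: 4, 6: 10, 7: 10, 8: 22, 9: 17,
    10: 49, 11: 64, 12: 443, 13: 87, 14: 115, 15: 43, 16: 189,
    17: 32, 18: 97,
}
n_missing = 196  # NA row of the tabulation

n_used = sum(fatheduc_counts.values())                     # usable observations
total = sum(v * c for v, c in fatheduc_counts.items())     # sum of fatheduc
avg = total / n_used                                       # correct average
naive_avg = total / (n_used + n_missing)                   # wrong: NAs counted as 0

print(f"usable observations: {n_used} of {n_used + n_missing}")
print(f"average fatheduc:    {avg:.1f}")
print(f"NAs treated as zero: {naive_avg:.1f}  (incorrect)")
```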

(v) This time open faminc and in View select Descriptive Statistics & Tests => Stats Table.
The average and standard deviation of faminc are about 29.027 and 18.739, respectively, but
faminc is measured in thousands of dollars. So, in dollars, the average and standard deviation
are $29,027 and $18,739.

              FAMINC
Mean          29.02666
Median        27.50000
Maximum       65.00000
Minimum       0.500000
Std. Dev.     18.73928
Skewness      0.617620
Kurtosis      2.473396

Jarque-Bera   104.2811
Probability   0.000000

Sum           40289.00
Sum Sq. Dev.  487060.0

Observations  1388

C1.3 (i) In EViews, open the variable math4 and select View => Descriptive Statistics &
Tests => Stats Table. The largest value is 100, and the smallest is 0.
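(ii) In EViews, open the variable math4 and select View => One-Way Tabulation. Press OK with the
default configuration. Because the maximum of math4 is 100, every school in the [100, 120) bin
has a pass rate of exactly 100. So 38 out of 1,823 schools, or about 2.1 percent of the sample,
have a perfect math pass rate.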
Tabulation of MATH4
Date: 01/30/10   Time: 10:44
Sample: 1 1823
Included observations: 1823
Number of categories: 6

                                 Cumulative   Cumulative
Value       Count     Percent    Count        Percent
[0, 20)     31        1.70       31           1.70
[20, 40)    128       7.02       159          8.72
[40, 60)    264       14.48      423          23.20
[60, 80)    622       34.12      1045         57.32
[80, 100)   740       40.59      1785         97.92
[100, 120)  38        2.08       1823         100.00
Total       1823      100.00     1823         100.00

(iii) 17. To get this number, open math4 and in Sample type:

(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least
in 2001, the reading test was harder to pass.
(v) In EViews, type cor math4 read4 in the command window. The sample correlation
between math4 and read4 is about .843, which is a very high degree of (linear) association. Not
surprisingly, schools that have high pass rates on one test have a strong tendency to have high
pass rates on the other test.
          MATH4       READ4
MATH4     1.000000    0.842728
READ4     0.842728    1.000000

(vi) In EViews, open the variable exppp and select View => Descriptive Statistics &
Tests => Stats Table. The average of exppp is about $5,194.87. The standard deviation is
$1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88
and the maximum is $11,957.64.]
(vii) a) $100 \cdot (6{,}000 - 5{,}500)/5{,}500 \approx 9.1\%$ and b) $100 \cdot [\log(6{,}000) - \log(5{,}500)] \approx 8.7\%$.
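Both calculations in (vii) are sketched below in Python, with log meaning the natural logarithm as in the text. For an increase, 100 times the log difference always falls a little below the exact percentage change, and the gap grows as the change gets larger; for a change of this size the two are close.

```python
import math

old, new = 5_500, 6_000  # spending per pupil before and after, in dollars

pct_change = 100 * (new - old) / old                 # exact percentage change
log_approx = 100 * (math.log(new) - math.log(old))   # 100 x change in natural log

print(f"exact percentage change: {pct_change:.1f}%")
print(f"log approximation:       {log_approx:.1f}%")
```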

C1.4 (i) In EViews, open the variable train and select View => One-Way Tabulation. Press OK with
the default configuration. 185/445 ≈ .416 is the fraction of men receiving job training, or
about 41.6%.
Tabulation of TRAIN
Date: 01/30/10   Time: 11:04
Sample: 1 445
Included observations: 445
Number of categories: 2

                               Cumulative   Cumulative
Value     Count     Percent    Count        Percent
0         260       58.43      260          58.43
1         185       41.57      445          100.00
Total     445       100.00     445          100.00

(ii) Open re78. Click Sample and (a) type train = 0 under IF, which gives the sample of men
who haven't received training, and (b) type train = 1 under IF, which gives the sample of men
who have received training. In both cases, find the average by selecting
View => Descriptive Statistics & Tests => Stats Table. For men receiving job training, the
average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78
is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men
receiving the job training had earnings about 40% higher than those not receiving training
(1.80/4.55 ≈ .40).
(iii) Open unem78. Click Sample and (a) type train = 0 under IF, which gives the sample of men
who haven't received training, and (b) type train = 1 under IF, which gives the sample of men
who have received training. In both cases, find the fraction unemployed by selecting
View => One-Way Tabulation (press OK with the default configuration).
About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4%
for men not receiving training. This, too, is a big difference.
Tabulation of UNEM78
Date: 01/30/10   Time: 11:13
Sample: 1 445 IF TRAIN=0
Included observations: 260
Number of categories: 2

                               Cumulative   Cumulative
Value     Count     Percent    Count        Percent
0         168       64.62      168          64.62
1         92        35.38      260          100.00
Total     260       100.00     260          100.00

Tabulation of UNEM78
Date: 01/30/10   Time: 11:15
Sample: 1 445 IF TRAIN=1
Included observations: 185
Number of categories: 2

                               Cumulative   Cumulative
Value     Count     Percent    Count        Percent
0         140       75.68      140          75.68
1         45        24.32      185          100.00
Total     185       100.00     185          100.00
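The group comparisons in (i) through (iii) reduce to a few ratios, sketched below in Python. The counts come from the tabulations above; the re78 group means are the rounded figures reported in (ii) (in thousands of dollars), so the earnings results inherit that rounding.

```python
# Counts taken from the EViews tabulations above
n_total, n_train = 445, 185
n_control = n_total - n_train       # 260 men without training
unem_train, unem_control = 45, 92   # men with unem78 = 1 in each group

# Average re78 by group, in thousands of dollars (rounded values from the text)
re78_train, re78_control = 6.35, 4.55

share_trained = n_train / n_total
unem_rate_train = unem_train / n_train
unem_rate_control = unem_control / n_control
earn_gap = re78_train - re78_control          # in thousands of dollars
earn_gap_pct = 100 * earn_gap / re78_control  # gap relative to the untrained mean

print(f"fraction trained:        {share_trained:.3f}")
print(f"unemployment, trained:   {unem_rate_train:.1%}")
print(f"unemployment, untrained: {unem_rate_control:.1%}")
print(f"earnings gap: ${1000 * earn_gap:,.0f} ({earn_gap_pct:.0f}% higher)")
```

These are raw differences in means and rates; as (iv) notes, judging whether they are statistically significant requires the tools of later chapters.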

(iv) The differences in earnings and unemployment rates suggest the training program had
strong, positive effects. Our conclusions about economic significance would be stronger if we
could also establish statistical significance (which is done in Computer Exercise C9.10 in
Chapter 9).

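P2.1 In the equation $y = \beta_0 + \beta_1 x + u$, let $\alpha_0 = E(u)$, which need not be zero.
Add and subtract $\alpha_0$ on the right-hand side to get $y = (\alpha_0 + \beta_0) + \beta_1 x + (u - \alpha_0)$.
Call the new error $e = u - \alpha_0$, so that $E(e) = E(u) - \alpha_0 = 0$. The new intercept is
$\alpha_0 + \beta_0$, but the slope is still $\beta_1$.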

P2.2 (i) Let $y_i = GPA_i$, $x_i = ACT_i$, and $n = 8$. Then $\bar{x} = 25.875$, $\bar{y} = 3.2125$,
$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = 5.8125$, and $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 56.875$. From equation (2.9), we
obtain the slope as $\hat{\beta}_1 = 5.8125/56.875 \approx .1022$, rounded to four places after the decimal.
From (2.17), $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \approx 3.2125 - (.1022)(25.875) \approx .5681$. So we can write

$$\widehat{GPA} = .5681 + .1022\,ACT, \qquad n = 8.$$

The intercept does not have a useful interpretation because ACT is not close to zero for the
population of interest. If ACT is 5 points higher, $\widehat{GPA}$ increases by $.1022(5) = .511$.
(ii) The fitted values and residuals, rounded to four decimal places, are given along with
the observation number i and GPA in the following table:

i    GPA    Fitted GPA    Residual
1    2.8    2.7143         .0857
2    3.4    3.0209         .3791
3    3.0    3.2253        -.2253
4    3.5    3.3275         .1725
5    3.6    3.5319         .0681
6    3.0    3.1231        -.1231
7    2.7    3.1231        -.4231
8    3.7    3.6341         .0659

You can verify that the residuals, as reported in the table, sum to -.0002, which is pretty close to
zero given the inherent rounding error.

(iii) When ACT = 20, $\widehat{GPA} = .5681 + .1022(20) \approx 2.61$.

(iv) The sum of squared residuals, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, is about .4347 (rounded to four decimal
places), and the total sum of squares, $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$, is about 1.0288. So the R-squared
from the regression is

$$R^2 = 1 - SSR/SST \approx 1 - (.4347/1.0288) \approx .577.$$

Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of
students.
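As a check on (i) through (iv), here is a minimal Python sketch that recomputes the estimates with the simple-regression formulas. The ACT scores are not listed in this section: the values below are an assumption, recovered by inverting the fitted values in the table (for example, .5681 + .1022(21) = 2.7143), and they reproduce the mean 25.875 and the sums 5.8125 and 56.875 quoted in (i).

```python
# ACT scores recovered from the fitted values in (ii); GPA values as reported
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
n = len(act)

x_bar = sum(act) / n
y_bar = sum(gpa) / n

# OLS slope and intercept from the simple-regression formulas
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
s_xx = sum((x - x_bar) ** 2 for x in act)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in act]
resid = [y - f for y, f in zip(gpa, fitted)]

ssr = sum(u ** 2 for u in resid)            # sum of squared residuals
sst = sum((y - y_bar) ** 2 for y in gpa)    # total sum of squares
r2 = 1 - ssr / sst

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum of residuals (unrounded) = {sum(resid):.2e}")
print(f"SSR = {ssr:.4f}, SST = {sst:.5f}, R^2 = {r2:.3f}")
print(f"prediction at ACT = 20: {b0 + b1 * 20:.2f}")
```

Because nothing is rounded along the way, the residuals sum to zero up to floating-point error, whereas the four-decimal residuals in the table sum to -.0002.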

C2.1 (i) In both cases, open the variable and select View => Descriptive Statistics & Tests =>
Stats Table. The average prate is about 87.36 and the average mrate is about .732.
(ii) To estimate the equation in EViews, type ls prate c mrate in the command window. The
estimated equation is

$$\widehat{prate} = 83.08 + 5.86\,mrate, \qquad n = 1{,}534,\ R^2 = .075.$$

Dependent Variable: PRATE
Method: Least Squares
Date: 01/30/10   Time: 11:41
Sample: 1 1534
Included observations: 1534

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            83.07546       0.563284      147.4840       0.0000
MRATE        5.861079       0.527011      11.12137       0.0000

R-squared             0.074703     Mean dependent var      87.36291
Adjusted R-squared    0.074099     S.D. dependent var      16.71654
S.E. of regression    16.08528     Akaike info criterion   8.394989
Sum squared resid     396383.8     Schwarz criterion       8.401945
Log likelihood        -6436.956    Hannan-Quinn criter.    8.397578
F-statistic           123.6848     Durbin-Watson stat      1.908008
Prob(F-statistic)     0.000000

(iii) The intercept implies that, even if mrate = 0, the predicted participation rate is 83.08
percent. The coefficient on mrate implies that a one-dollar increase in the match rate (a fairly
large increase) is estimated to increase prate by 5.86 percentage points. This assumes, of
course, that this change in prate is possible (if, say, prate is already at 98, this interpretation
makes no sense).

(iv) If we plug mrate = 3.5 into the equation, we get $\widehat{prate} = 83.08 + 5.86(3.5) = 103.59$.
This is impossible, as we can have at most a 100 percent participation rate. This illustrates that,
especially when dependent variables are bounded, a simple regression model can give strange
predictions for extreme values of the independent variable. (In the sample of 1,534 firms, only
34 have mrate ≥ 3.5.)
(v) mrate explains about 7.5% of the variation in prate. This is not much, and suggests that
many other factors influence 401(k) plan participation rates.
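A small Python sketch of the prediction in (iv), using the unrounded coefficients from the EViews output; predict_prate is a hypothetical helper introduced only for illustration. Evaluating at the sample mean of mrate (.732, from (i)) also shows that the OLS line passes, approximately (since .732 is rounded), through the point of means, giving a value close to the mean of prate, 87.36.

```python
# Coefficients copied from the EViews output for: ls prate c mrate
b0 = 83.07546  # intercept (C)
b1 = 5.861079  # slope on MRATE


def predict_prate(mrate: float) -> float:
    """Hypothetical helper: fitted participation rate (percent) from the simple regression."""
    return b0 + b1 * mrate


# Zero match, the (rounded) sample mean of mrate, and the extreme value from (iv)
for m in (0.0, 0.732, 3.5):
    p = predict_prate(m)
    flag = "  <- exceeds the 100% ceiling" if p > 100 else ""
    print(f"mrate = {m:5.3f}: predicted prate = {p:6.2f}{flag}")
```

Nothing in the linear model stops predictions from leaving the [0, 100] range, which is why the prediction at mrate = 3.5 is not meaningful.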
